Author name cluster

Mingming Gong

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

89 papers

2 author rows

JMLR Journal 2026 Journal Article

Bayesian Inference of Contextual Bandit Policies via Empirical Likelihood

Jiangrong Ouyang
Mingming Gong
Howard Bondell

Policy inference plays an essential role in the contextual bandit problem. In this paper, we use empirical likelihood to develop a Bayesian inference method for the joint analysis of multiple contextual bandit policies in finite sample regimes. The proposed inference method is robust to small sample sizes and is able to provide accurate uncertainty measurements for policy value evaluation. In addition, it allows for flexible inferences on policy comparison with full uncertainty quantification. We demonstrate the effectiveness of the proposed inference method using Monte Carlo simulations and its application to an adolescent body mass index data set. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2026. ( edit, beta )

JMLR Journal 2026 Journal Article

Identifying Weight-Variant Latent Causal Models

Yuhang Liu
Zhen Zhang
Dong Gong
Mingming Gong
Biwei Huang
Anton van den Hengel
Kun Zhang
Javen Qinfeng Shi

The task of causal representation learning aims to uncover latent higher-level causal variables that affect lower-level observations. Identifying the true latent causal variables from observed data, while allowing instantaneous causal relations among latent variables, remains a challenge, however. To this end, we start with the analysis of three intrinsic indeterminacies in identifying latent variables from observations: transitivity, permutation indeterminacy, and scaling indeterminacy. We find that transitivity acts as a key role in impeding the identifiability of latent causal variables. To address the unidentifiable issue due to transitivity, we introduce a novel identifiability condition where the underlying latent causal model satisfies a linear-Gaussian model, in which the causal coefficients and the distribution of Gaussian noise are modulated by an additional observed variable. Under certain assumptions, including the existence of a reference condition under which latent causal influences vanish, we can show that the latent causal variables can be identified up to trivial permutation and scaling, and that partial identifiability results can still be obtained when this reference condition is violated for a subset of latent variables. Furthermore, based on these theoretical results, we propose a novel method, termed Structural caUsAl Variational autoEncoder (SuaVE), which directly learns causal representations and causal relationships among them, together with the mapping from the latent causal variables to the observed ones. Experimental results on synthetic and real data demonstrate the identifiability and consistency results and the efficacy of SuaVE in learning causal representations. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2026. ( edit, beta )

ICLR Conference 2025 Conference Paper

A Robust Method to Discover Causal or Anticausal Relation

Yu Yao 0005
Yang Zhou
Bo Han 0003
Mingming Gong
Kun Zhang 0001
Tongliang Liu

Understanding whether the data generative process follows causal or anticausal relations is important for many applications. Existing causal discovery methods struggle with high-dimensional perceptual data such as images. Moreover, they require well-labeled data, which may not be feasible due to measurement error. In this paper, we propose a robust method to detect whether the data generative process is causal or anticausal. To determine the causal or anticausal relation, we identify an asymmetric property: under the causal relation, the instance distribution does not contain information about the noisy class-posterior distribution. We also propose a practical method to verify this via a noise injection approach. Our method is robust to label errors and is designed to handle both large-scale and high-dimensional datasets effectively. Both theoretical analyses and empirical results on a variety of datasets demonstrate the effectiveness of our proposed method in determining the causal or anticausal direction of the data generative process.

ICLR Conference 2025 Conference Paper

A Skewness-Based Criterion for Addressing Heteroscedastic Noise in Causal Discovery

Yingyu Lin
Yuxing Huang
Wenqin Liu
Haoran Deng
Ignavier Ng
Kun Zhang 0001
Mingming Gong
Yian Ma

Real-world data often violates the equal-variance assumption (homoscedasticity), making it essential to account for heteroscedastic noise in causal discovery. In this work, we explore heteroscedastic symmetric noise models (HSNMs), where the effect $Y$ is modeled as $Y = f(X) + \sigma(X)N$, with $X$ as the cause and $N$ as independent noise following a symmetric distribution. We introduce a novel criterion for identifying HSNMs based on the skewness of the score (i.e., the gradient of the log density) of the data distribution. This criterion establishes a computationally tractable measurement that is zero in the causal direction but nonzero in the anticausal direction, enabling the causal direction discovery. We extend this skewness-based criterion to the multivariate setting and propose \texttt{SkewScore}, an algorithm that handles heteroscedastic noise without requiring the extraction of exogenous noise. We also conduct a case study on the robustness of \texttt{SkewScore} in a bivariate model with a latent confounder, providing theoretical insights into its performance. Empirical studies further validate the effectiveness of the proposed method.

UAI Conference 2025 Conference Paper

A Unified Data Representation Learning for Non-parametric Two-sample Testing

Xunye Tian
Liuhua Peng
Zhijian Zhou
Mingming Gong
Arthur Gretton
Feng Liu 0003

Learning effective data representations has been crucial in non-parametric two-sample testing. Common approaches will first split data into training and test sets and then learn data representations purely on the training set. However, recent theoretical studies have shown that, as long as the sample indexes are not used during the learning process, the whole data can be used to learn data representations, meanwhile ensuring control of Type-I errors. The above fact motivates us to use the test set (but without sample indexes) to facilitate the data representation learning in the testing. To this end, we propose a representation-learning two-sample testing (RL-TST) framework. RL-TST first performs purely self-supervised representation learning on the entire dataset to capture inherent representations (IRs) that reflect the underlying data manifold. A discriminative model is then trained on these IRs to learn discriminative representations (DRs), enabling the framework to leverage both the rich structural information from IRs and the discriminative power of DRs. Extensive experiments demonstrate that RL-TST outperforms representative approaches by simultaneously using data manifold information in the test set and enhancing test power via finding the DRs with the training set.

ICLR Conference 2025 Conference Paper

Analytic DAG Constraints for Differentiable DAG Learning

Zhen Zhang 0008
Ignavier Ng
Dong Gong
Yuhang Liu 0002
Mingming Gong
Biwei Huang
Kun Zhang 0001
Anton van den Hengel

Recovering the underlying Directed Acyclic Graph (DAG) structures from observational data presents a formidable challenge, partly due to the combinatorial nature of the DAG-constrained optimization problem. Recently, researchers have identified gradient vanishing as one of the primary obstacles in differentiable DAG learning and have proposed several DAG constraints to mitigate this issue. By developing the necessary theory to establish a connection between analytic functions and DAG constraints, we demonstrate that analytic functions from the set $\\{f(x) = c_0 + \\sum_{i=1}^{\infty}c_ix^i | \\forall i > 0, c_i > 0; r = \\lim_{i\\rightarrow \\infty}c_{i}/c_{i+1} > 0\\}$ can be employed to formulate effective DAG constraints. Furthermore, we establish that this set of functions is closed under several functional operators, including differentiation, summation, and multiplication. Consequently, these operators can be leveraged to create novel DAG constraints based on existing ones. Using these properties, we design a series of DAG constraints and develop an efficient algorithm to evaluate them. Experiments in various settings demonstrate that our DAG constraints outperform previous state-of-the-art comparators. Our implementation is available at https://github.com/zzhang1987/AnalyticDAGLearning.

NeurIPS Conference 2025 Conference Paper

Counterfactual Implicit Feedback Modeling

Chuan Zhou
Lina Yao
Haoxuan Li
Mingming Gong

In recommendation systems, implicit feedback data can be automatically recorded and is more common than explicit feedback data. However, implicit feedback poses two challenges for relevance prediction, namely (a) positive-unlabeled (PU): negative feedback does not necessarily imply low relevance and (b) missing not at random (MNAR): items that are popular or frequently recommended tend to receive more clicks than other items, even if the user does not have a significant interest in them. Existing methods either overlook the MNAR issue or fail to account for the inherent mechanism of the PU issue. As a result, they may lead to inaccurate relevance predictions or inflated biases and variances. In this paper, we formulate the implicit feedback problem as a counterfactual estimation problem with missing treatment variables. Prediction of the relevance in implicit feedback is equivalent to answering the counterfactual question that ``whether a user would click a specific item if exposed to it? ". To solve the counterfactual question, we propose the Counterfactual Implicit Feedback (Counter-IF) prediction approach that divides the user-item pairs into four disjoint groups, namely definitely positive (DP), highly exposed (HE), highly unexposed (HU), and unknown (UN) groups. Specifically, Counter-IF first performs missing treatment imputation with different confidence levels from raw implicit feedback, then estimates the counterfactual outcomes via causal representation learning that combines pointwise loss and pairwise loss based on the user-item pairs stratification. Theoretically the generalization bound of the learned model is derived. Extensive experiments are conducted on publicly available datasets to demonstrate the effectiveness of our approach. The code is available at https: //github. com/zhouchuanCN/NeurIPS25-Counter-IF.

NeurIPS Conference 2025 Conference Paper

Decentralized Dynamic Cooperation of Personalized Models for Federated Continual Learning

Danni Yang
Zhikang Chen
Sen Cui
Mengyue Yang
Ding Li
Abudukelimu Wuerkaixi
Haoxuan Li
Jinke Ren

Federated continual learning (FCL) has garnered increasing attention for its ability to support distributed computation in environments with evolving data distributions. However, the emergence of new tasks introduces both temporal and cross-client shifts, making catastrophic forgetting a critical challenge. Most existing works aggregate knowledge from clients into a global model, which may not enhance client performance since irrelevant knowledge could introduce interference, especially in heterogeneous scenarios. Additionally, directly applying decentralized approaches to FCL suffers from ineffective group formation caused by task changes. To address these challenges, we propose a decentralized dynamic cooperation framework for FCL, where clients establish dynamic cooperative learning coalitions to balance the acquisition of new knowledge and the retention of prior learning, thereby obtaining personalized models. To maximize model performance, each client engages in selective cooperation, dynamically allying with others who offer meaningful performance gains. This results in non-overlapping, variable coalitions at each stage of the task. Moreover, we use coalitional affinity game to simulate coalition relationships between clients. By assessing both client gradient coherence and model similarity, we quantify the client benefits derived from cooperation. We also propose a merge-blocking algorithm and a dynamic cooperative evolution algorithm to achieve cooperative and dynamic equilibrium. Comprehensive experiments demonstrate the superiority of our method compared to various baselines. Code is available at: https: //github. com/ydn3229/DCFCL.

NeurIPS Conference 2025 Conference Paper

Detecting Generated Images by Fitting Natural Image Distributions

Yonggang Zhang
Jun Nie
Xinmei Tian
Mingming Gong
Kun Zhang
Bo Han

The increasing realism of generated images has raised significant concerns about their potential misuse, necessitating robust detection methods. Current approaches mainly rely on training binary classifiers, which depend heavily on the quantity and quality of available generated images. In this work, we propose a novel framework that exploits geometric differences between the data manifolds of natural and generated images. To exploit this difference, we employ a pair of functions engineered to yield consistent outputs for natural images but divergent outputs for generated ones, leveraging the property that their gradients reside in mutually orthogonal subspaces. This design enables a simple yet effective detection method: an image is identified as generated if a transformation along its data manifold induces a significant change in the loss value of a self-supervised model pre-trained on natural images. Further more, to address diminishing manifold disparities in advanced generative models, we leverage normalizing flows to amplify detectable differences by extruding generated images away from the natural image manifold. Extensive experiments demonstrate the efficacy of this method.

AAAI Conference 2025 Conference Paper

DIDiffGes: Decoupled Semi-Implicit Diffusion Models for Real-time Gesture Generation from Speech

Yongkang Cheng
Shaoli Huang
Xuelin Chen
Jifeng Ning
Mingming Gong

Diffusion models have demonstrated remarkable synthesis quality and diversity in generating co-speech gestures. However, the computationally intensive sampling steps associated with diffusion models hinder their practicality in real-world applications. Hence, we present DIDiffGes, for a Decoupled Semi-Implicit Diffusion model-based framework, that can synthesize high-quality, expressive gestures from speech using only a few sampling steps. Our approach leverages Generative Adversarial Networks (GANs) to enable large-step sampling for diffusion model. We decouple gesture data into body and hands distributions and further decompose them into marginal and conditional distributions. GANs model the marginal distribution implicitly, while L2 reconstruction loss learns the conditional distributions exciplictly. This strategy enhances GAN training stability and ensures expressiveness of generated full-body gestures. Our framework also learns to denoise root noise conditioned on local body representation, guaranteeing stability and realism. DIDiffGes can generate gestures from speech with just 10 sampling steps, without compromising quality and expressiveness, reducing the number of sampling steps by a factor of 100 compared to existing methods. Our user study reveals that our method outperforms state-of-the-art approaches in human likeness, appropriateness, and style correctness.

PDF Details DOI

ICML Conference 2025 Conference Paper

Enhancing Treatment Effect Estimation via Active Learning: A Counterfactual Covering Perspective

Hechuan Wen
Tong Chen 0005
Mingming Gong
Li Kheng Chai
Shazia Sadiq
Hongzhi Yin

Although numerous complex algorithms for treatment effect estimation have been developed in recent years, their effectiveness remains limited when handling insufficiently labeled training sets due to the high cost of labeling the post-treatment effect, e. g. , the expensive tumor imaging or biopsy procedures needed to evaluate treatment effects. Therefore, it becomes essential to actively incorporate more high-quality labeled data, all while adhering to a constrained labeling budget. To enable data-efficient treatment effect estimation, we formalize the problem through rigorous theoretical analysis within the active learning context, where the derived key measures – factual and counterfactual covering radii determine the risk upper bound. To reduce the bound, we propose a greedy radius reduction algorithm, which excels under an idealized, balanced data distribution. To generalize to more realistic data distributions, we further propose FCCM, which transforms the optimization objective into the Factual and Counterfactual Coverage Maximization to ensure effective radius reduction during data acquisition. Furthermore, benchmarking FCCM against other baselines demonstrates its superiority across both fully synthetic and semi-synthetic datasets. Code: https: //github. com/uqhwen2/FCCM.

ICML Conference 2025 Conference Paper

Extracting Rare Dependence Patterns via Adaptive Sample Reweighting

Yiqing Li
Yewei Xia
Xiaofei Wang
Zhengming Chen 0002
Liuhua Peng
Mingming Gong
Kun Zhang 0001

Discovering dependence patterns between variables from observational data is a fundamental issue in data analysis. However, existing testing methods often fail to detect subtle yet critical patterns that occur within small regions of the data distribution–patterns we term rare dependence. These rare dependencies obscure the true underlying dependence structure in variables, particularly in causal discovery tasks. To address this issue, we propose a novel testing method that combines kernel-based (conditional) independence testing with adaptive sample importance reweighting. By learning and assigning higher importance weights to data points exhibiting significant dependence, our method amplifies the patterns and can detect them successfully. Theoretically, we analyze the asymptotic distributions of the statistics in this method and show the uniform bound of the learning scheme. Furthermore, we integrate our tests into the PC algorithm, a constraint-based approach for causal discovery, equipping it to uncover causal relationships even in the presence of rare dependence. Empirical evaluation of synthetic and real-world datasets comprehensively demonstrates the efficacy of our method.

TMLR Journal 2025 Journal Article

Federated Generalized Novel Category Discovery with Prompts Tuning

Lei Shen
Nan Pu
Zhun Zhong
Mingming Gong
Dianhai Yu
Chengqi Zhang
Bo Han

Generalized category discovery (GCD) is proposed to handle categories from unseen labels during the inference stage by clustering them. Most works in GCD provide solutions for unseen classes in data-centralized settings. However, unlabeled categories possessed by clients, which are common in real-world federated learning (FL), have been largely ignored and degraded the performance of classic FL algorithms. To demonstrate and mitigate the harmful effect of unseen classes, we dive into a GCD problem setting applicable for FL named FedGCD, analyze overfitting problem in FedGCD in detail, establish a strong baseline constructed with state-of-the-art GCD algorithm simGCD, and design a learning framework with prompt tuning to tackle both the overfitting and communication burden problems in FedGCD. In our methods, clients first separately carry out prompt learning on local data. Then, we aggregate the prompts from all clients as the global prompt to help capture global knowledge and then send the global prompts to local clients to allow access to broader knowledge from other clients. By this method, we significantly reduce the parameters needed to upload in FedGCD, which is a common obstacle in the real application of most FL algorithms. We conduct experiments on both generic and fine-grained datasets like CIFAR-100 and CUB-200, and show that our method is comparable to the FL version of simGCD and surpasses other baselines with significantly fewer parameters to transmit.

TMLR Journal 2025 Journal Article

Latent Covariate Shift: Unlocking Partial Identifiability for Multi-Source Domain Adaptation

Yuhang Liu
Zhen Zhang
Dong Gong
Mingming Gong
Biwei Huang
Anton van den Hengel
Kun Zhang
Javen Qinfeng Shi

Multi-source domain adaptation (MSDA) addresses the challenge of learning a label prediction function for an unlabeled target domain by leveraging both the labeled data from multiple source domains and the unlabeled data from the target domain. Conventional MSDA approaches often rely on covariate shift or conditional shift paradigms, which assume a consistent label distribution across domains. However, this assumption proves limiting in practical scenarios where label distributions do vary across domains, diminishing its applicability in real-world settings. For example, animals from different regions exhibit diverse characteristics due to varying diets and genetics. Motivated by this, we propose a novel paradigm called latent covariate shift (LCS), which introduces significantly greater variability and adaptability across domains. Notably, it provides a theoretical assurance for recovering the latent cause of the label variable, which we refer to as the latent content variable. Within this new paradigm, we present an intricate causal generative model by introducing latent noises across domains, along with a latent content variable and a latent style variable to achieve more nuanced rendering of observational data. We demonstrate that the latent content variable can be identified up to block identifiability due to its versatile yet distinct causal structure. We anchor our theoretical insights into a novel MSDA method, which learns the label distribution conditioned on the identifiable latent content variable, thereby accommodating more substantial distribution shifts. The proposed approach showcases exceptional performance and efficacy on both simulated and real-world datasets.

ICML Conference 2025 Conference Paper

Learning Imbalanced Data with Beneficial Label Noise

Guangzheng Hu
Feng Liu 0003
Mingming Gong
Guanghui Wang
Liuhua Peng

Data imbalance is a common factor hindering classifier performance. Data-level approaches for imbalanced learning, such as resampling, often lead to information loss or generative errors. Building on theoretical studies of imbalance ratio in binary classification, it is found that adding suitable label noise can adjust biased decision boundaries and improve classifier performance. This paper proposes the Label-Noise-based Re-balancing (LNR) approach to solve imbalanced learning by employing a novel design of an asymmetric label noise model. In contrast to other data-level methods, LNR alleviates the issues of informative loss and generative errors and can be integrated seamlessly with any classifier or algorithm-level method. We validated the superiority of LNR on synthetic and real-world datasets. Our work opens a new avenue for imbalanced learning, highlighting the potential of beneficial label noise.

ICLR Conference 2025 Conference Paper

LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning

Zhekai Du
Yinjie Min
Jingjing Li 0001
Ke Lu 0001
Changliang Zou
Liuhua Peng
Tingjin Chu
Mingming Gong

Low-rank adaptation (LoRA) has become a prevalent method for adapting pre-trained large language models to downstream tasks. However, the simple low-rank decomposition form may constrain the optimization flexibility. To address this limitation, we introduce Location-aware Cosine Adaptation (LoCA), a novel frequency-domain parameter-efficient fine-tuning method based on inverse Discrete Cosine Transform (iDCT) with selective locations of learnable components. We begin with a comprehensive theoretical comparison between frequency-domain and low-rank decompositions for fine-tuning pre-trained large models. Our analysis reveals that frequency-domain decomposition with carefully selected frequency components can surpass the expressivity of traditional low-rank-based methods. Furthermore, we demonstrate that iDCT offers a more efficient implementation compared to inverse Discrete Fourier Transform (iDFT), allowing for better selection and tuning of frequency components while maintaining equivalent expressivity to the optimal iDFT-based adaptation. By employing finite-difference approximation to estimate gradients for discrete locations of learnable coefficients on the DCT spectrum, LoCA dynamically selects the most informative frequency components during training. Experiments on diverse language and vision fine-tuning tasks demonstrate that LoCA offers enhanced parameter efficiency while maintains computational feasibility comparable to low-rank-based methods.

ICML Conference 2025 Conference Paper

MissScore: High-Order Score Estimation in the Presence of Missing Data

Wenqin Liu
Haoze Hou
Erdun Gao
Biwei Huang
Qiuhong Ke
Howard D. Bondell
Mingming Gong

Score-based generative models are essential in various machine learning applications, with strong capabilities in generation quality. In particular, high-order derivatives (scores) of data density offer deep insights into data distributions, building on the proven effectiveness of first-order scores for modeling and generating synthetic data, unlocking new possibilities for applications. However, learning them typically requires complete data, which is often unavailable in domains such as healthcare and finance due to data corruption, acquisition constraints, or incomplete records. To tackle this challenge, we introduce MissScore, a novel framework for estimating high-order scores in the presence of missing data. We derive objective functions for estimating high-order scores under different missing data mechanisms and propose a new algorithm specifically designed to handle missing data effectively. Our empirical results demonstrate that MissScore accurately and efficiently learns the high-order scores from incomplete data and generates high-quality samples, resulting in strong performance across a range of downstream tasks.

NeurIPS Conference 2025 Conference Paper

MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation

Jiaxin Huang
Runnan Chen
Ziwen Li
Zhengqing Gao
Xiao He
Yandong Guo
Mingming Gong
Tongliang Liu

Reasoning segmentation aims to segment target objects in complex scenes based on human intent and spatial reasoning. While recent multimodal large language models (MLLMs) have demonstrated impressive 2D image reasoning segmentation, adapting these capabilities to 3D scenes remains underexplored. In this paper, we introduce MLLM-For3D, a simple yet effective framework that transfers knowledge from 2D MLLMs to 3D scene understanding. Specifically, we utilize MLLMs to generate multi-view pseudo-segmentation masks and corresponding text embeddings, then unproject 2D masks into 3D space and align them with the text embeddings. The primary challenge lies in the absence of 3D context and spatial consistency across multiple views, causing the model to hallucinate objects that do not exist and fail to target objects consistently. Training the 3D model with such irrelevant objects leads to performance degradation. To address this, we first filter irrelevant views using token attention. With these reliable pseudo-labels, we develop a token-for-Query approach for multimodal semantic alignment, enabling consistent identification of the same object across different views. Moreover, we introduce a spatial consistency strategy to enforce that segmentation masks remain coherent in the 3D space, effectively capturing the geometry of the scene. Extensive evaluations of various challenging indoor scene benchmarks demonstrate that, even without labeled 3D training data, MLLM-For3D outperforms existing 3D reasoning segmentation methods, effectively interpreting user intent, understanding 3D scenes, and reasoning about spatial relationships.

ICLR Conference 2025 Conference Paper

On the Identification of Temporal Causal Representation with Instantaneous Dependence

Zijian Li 0001
Yifan Shen 0004
Kaitao Zheng
Ruichu Cai
Xiangchen Song
Mingming Gong
Guangyi Chen 0002
Kun Zhang 0001

Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an \textbf{ID}entification framework for instantane\textbf{O}us \textbf{L}atent dynamics (\textbf{IDOL}) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.

ICLR Conference 2025 Conference Paper

Optimal Transport for Time Series Imputation

Hao Wang 0049
Zhengnan Li
Haoxuan Li 0001
Xu Chen 0017
Mingming Gong
BinChen
Zhichao Chen 0001

Missing data imputation through distribution alignment has demonstrated advantages for non-temporal datasets but exhibits suboptimal performance in time-series applications. The primary obstacle is crafting a discrepancy measure that simultaneously (1) captures temporal patterns—accounting for periodicity and temporal dependencies inherent in time-series—and (2) accommodates non-stationarity, ensuring robustness amidst multiple coexisting temporal patterns. In response to these challenges, we introduce the Proximal Spectrum Wasserstein (PSW) discrepancy, a novel discrepancy tailored for comparing two \textit{sets} of time-series based on optimal transport. It incorporates a pairwise spectral distance to encapsulate temporal patterns, and a selective matching regularization to accommodate non-stationarity. Subsequently, we develop the PSW for Imputation (PSW-I) framework, which iteratively refines imputation results by minimizing the PSW discrepancy. Extensive experiments demonstrate that PSW-I effectively accommodates temporal patterns and non-stationarity, outperforming prevailing time-series imputation methods. Code is available at https://github.com/FMLYD/PSW-I.

NeurIPS Conference 2025 Conference Paper

Practical Kernel Selection for Kernel-based Conditional Independence Test

Wenjie Wang
Mingming Gong
Biwei Huang
James Bailey
Bo Han
Kun Zhang
Feng Liu

Conditional independence (CI) testing is a fundamental yet challenging task in modern statistics and machine learning. One pivotal class of methods for assessing conditional independence encompasses kernel-based approaches, known for assessing CI by detecting general conditional dependence without imposing strict assumptions on relationships or data distributions. As with any method utilizing kernels, selecting appropriate kernels is crucial for precise identification. However, it remains underexplored in kernel-based CI methods, where the kernels are often determined manually or heuristically. In this paper, we analyze and propose a kernel parameter selection approach for the kernel-based conditional independence test (KCI). The kernel parameters are selected based on the ratio of the statistic to the asymptotic variance, which approximates the test power for the given parameters at large sample sizes. The search procedure is grid-based, allowing for parallelization with manageable additional computation time. We theoretically demonstrate the consistency of the proposed criterion and conduct extensive experiments on both synthetic and real data to show the effectiveness of our method.

ICML Conference 2025 Conference Paper

Projection Pursuit Density Ratio Estimation

Meilin Wang
Wei Huang
Mingming Gong
Zheng Zhang

Density ratio estimation (DRE) is a paramount task in machine learning, for its broad applications across multiple domains, such as covariate shift adaptation, causal inference, independence tests and beyond. Parametric methods for estimating the density ratio possibly lead to biased results if models are misspecified, while conventional non-parametric methods suffer from the curse of dimensionality when the dimension of data is large. To address these challenges, in this paper, we propose a novel approach for DRE based on the projection pursuit (PP) approximation. The proposed method leverages PP to mitigate the impact of high dimensionality while retaining the model flexibility needed for the accuracy of DRE. We establish the consistency and the convergence rate for the proposed estimator. Experimental results demonstrate that our proposed method outperforms existing alternatives in various applications.

NeurIPS Conference 2025 Conference Paper

Surprise3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes

Jiaxin Huang
Ziwen Li
Hanlue Zhang
Runnan Chen
Zhengqing Gao
Xiao He
Yandong Guo
Wenping Wang

The integration of language and 3D perception is critical for embodied AI and robotic systems to perceive, understand, and interact with the physical world. Spatial reasoning, a key capability for understanding spatial relationships between objects, remains underexplored in current 3D vision-language research. Existing datasets often mix semantic cues (e. g. , object name) with spatial context, leading models to rely on superficial shortcuts rather than genuinely interpreting spatial relationships. To address this gap, we introduce Surprise3D, a novel dataset designed to evaluate language-guided spatial reasoning segmentation in complex 3D scenes. Surprise3D consists of more than 200k vision language pairs across 900+ detailed indoor scenes from ScanNet++ v2, including more than 2. 8k unique object classes. The dataset contains 89k+ human-annotated spatial queries deliberately crafted without object name, thereby mitigating shortcut biases in spatial understanding. These queries comprehensively cover various spatial reasoning skills, such as relative position, narrative perspective, parametric perspective, and absolute distance reasoning. Initial benchmarks demonstrate significant challenges for current state-of-the-art expert 3D visual grounding methods and 3D-LLMs, underscoring the necessity of our dataset and the accompanying 3D Spatial Reasoning Segmentation (3D-SRS) benchmark suite. Surprise3D and 3D-SRS aim to facilitate advancements in spatially aware AI, paving the way for effective embodied interaction and robotic planning.

NeurIPS Conference 2025 Conference Paper

Towards Accurate Time Series Forecasting via Implicit Decoding

Xinyu Li
Yuchen Luo
Hao Wang
Haoxuan Li
Liuhua Peng
Feng Liu
Yandong Guo
Kun Zhang

Recent booming time series models have demonstrated remarkable forecasting performance. However, these methods often place greater focus on more effectively modelling the historical series, largely neglecting the forecasting phase, which generates long-term forecasts by separately predicting multiple time points. Given that real-world time series typically consist of various long short-term dynamics, independent predictions over individual time points may fail to express complex underlying patterns and can lead to a lack of global views. To address these issues, this work explores new perspectives from the forecasting phase and proposes a novel Implicit Forecaster (IF) as an additional decoding module. Inspired by decomposition forecasting, IF adopts a more nuanced approach by implicitly predicting constituent waves represented by their frequency, amplitude, and phase, thereby accurately forming the time series. Extensive experimental results from multiple real-world datasets show that IF can consistently boost mainstream time series models, achieving state-of-the-art forecasting performance. Code is available at this repository: https: //github. com/rakuyorain/Implicit-Forecaster.

ICLR Conference 2024 Conference Paper

A Variational Framework for Estimating Continuous Treatment Effects with Measurement Error

Erdun Gao
Howard D. Bondell
Wei Huang
Mingming Gong

Estimating treatment effects has numerous real-world applications in various fields, such as epidemiology and political science. While much attention has been devoted to addressing the challenge using fully observational data, there has been comparatively limited exploration of this issue in cases when the treatment is not directly observed. In this paper, we tackle this problem by developing a general variational framework, which is flexible to integrate with advanced neural network-based approaches, to identify the average dose-response function (ADRF) with the continuously valued error-contaminated treatment. Our approach begins with the formulation of a probabilistic data generation model, treating the unobserved treatment as a latent variable. In this model, we leverage a learnable density estimation neural network to derive its prior distribution conditioned on covariates. This module also doubles as a generalized propensity score estimator, effectively mitigating selection bias arising from observed confounding variables. Subsequently, we calculate the posterior distribution of the treatment, taking into account the observed measurement and outcome. To mitigate the impact of treatment error, we introduce a re-parametrized treatment value, replacing the error-affected one, to make more accurate predictions regarding the outcome. To demonstrate the adaptability of our framework, we incorporate two state-of-the-art ADRF estimation methods and rigorously assess its efficacy through extensive simulations and experiments using semi-synthetic data.

JMLR Journal 2024 Journal Article

Causal-learn: Causal Discovery in Python

Yujia Zheng
Biwei Huang
Wei Chen
Joseph Ramsey
Mingming Gong
Ruichu Cai
Shohei Shimizu
Peter Spirtes

Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, causal-learn is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2024. ( edit, beta )

NeurIPS Conference 2024 Conference Paper

Discovery of the Hidden World with Large Language Models

Chenxi Liu
Yongqiang Chen
Tongliang Liu
Mingming Gong
James Cheng
Bo Han
Kun Zhang

Revealing the underlying causal mechanisms in the real world is the key to the development of science. Despite the progress in the past decades, traditional causal discovery approaches (CDs) mainly rely on high-quality measured variables, usually given by human experts, to find causal relations. The lack of well-defined high-level variables in many real-world applications has already been a longstanding roadblock to a broader application of CDs. To this end, this paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap. LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data. Therefore, it is natural to employ LLMs to assist with proposing useful high-level factors and crafting their measurements. Meanwhile, COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors. We show that LLMs and CDs are mutually beneficial and the constructed feedback provably also helps with the factor proposal. We construct and curate several synthetic and real-world benchmarks including analysis of human reviews and diagnosis of neuropathic and brain tumors, to comprehensively evaluate COAT. Extensive empirical results confirm the effectiveness and reliability of COAT with significant improvements.

PDF Details DOI

AAAI Conference 2024 Conference Paper

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation

Hao Liu
Xin Li
Mingming Gong
Bing Liu
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Xing Sun

Recently, Table Structure Recognition (TSR) task, aiming at identifying table structure into machine readable formats, has received increasing interest in the community. While impressive success, most single table component-based methods can not perform well on unregularized table cases distracted by not only complicated inner structure but also exterior capture distortion. In this paper, we raise it as Complex TSR problem, where the performance degeneration of existing methods is attributable to their inefficient component usage and redundant post-processing. To mitigate it, we shift our perspective from table component extraction towards the efficient multiple components leverage, which awaits further exploration in the field. Specifically, we propose a seminal method, termed GrabTab, equipped with newly proposed Component Deliberator, to handle various types of tables in a unified framework. Thanks to its progressive deliberation mechanism, our GrabTab can flexibly accommodate to most complex tables with reasonable components selected but without complicated post-processing involved. Quantitative experimental results on public benchmarks demonstrate that our method significantly outperforms the state-of-the-arts, especially under more challenging scenes.

PDF Details DOI

AAAI Conference 2024 Conference Paper

HuTuMotion: Human-Tuned Navigation of Latent Motion Diffusion Models with Minimal Feedback

Gaoge Han
Shaoli Huang
Mingming Gong
Jinglei Tang

We introduce HuTuMotion, an innovative approach for generating natural human motions that navigates latent motion diffusion models by leveraging few-shot human feedback. Unlike existing approaches that sample latent variables from a standard normal prior distribution, our method adapts the prior distribution to better suit the characteristics of the data, as indicated by human feedback, thus enhancing the quality of motion generation. Furthermore, our findings reveal that utilizing few-shot feedback can yield performance levels on par with those attained through extensive human feedback. This discovery emphasizes the potential and efficiency of incorporating few-shot human-guided optimization within latent diffusion models for personalized and style-aware human motion generation applications. The experimental results show the significantly superior performance of our method over existing state-of-the-art approaches.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Identifiability Analysis of Linear ODE Systems with Hidden Confounders

Yuanyuan Wang
Biwei Huang
Wei Huang
Xi Geng
Mingming Gong

The identifiability analysis of linear Ordinary Differential Equation (ODE) systems is a necessary prerequisite for making reliable causal inferences about these systems. While identifiability has been well studied in scenarios where the system is fully observable, the conditions for identifiability remain unexplored when latent variables interact with the system. This paper aims to address this gap by presenting a systematic analysis of identifiability in linear ODE systems incorporating hidden confounders. Specifically, we investigate two cases of such systems. In the first case, latent confounders exhibit no causal relationships, yet their evolution adheres to specific functional forms, such as polynomial functions of time $t$. Subsequently, we extend this analysis to encompass scenarios where hidden confounders exhibit causal dependencies, with the causal structure of latent variables described by a Directed Acyclic Graph (DAG). The second case represents a more intricate variation of the first case, prompting a more comprehensive identifiability analysis. Accordingly, we conduct detailed identifiability analyses of the second system under various observation conditions, including both continuous and discrete observations from single or multiple trajectories. To validate our theoretical results, we perform a series of simulations, which support and substantiate our findings.

PDF Details DOI

JMLR Journal 2024 Journal Article

Identifiability and Asymptotics in Learning Homogeneous Linear ODE Systems from Discrete Observations

Yuanyuan Wang
Wei Huang
Mingming Gong
Xi Geng
Tongliang Liu
Kun Zhang
Dacheng Tao

Ordinary Differential Equations (ODEs) have recently gained a lot of attention in machine learning. However, the theoretical aspects, for example, identifiability and asymptotic properties of statistical estimation are still obscure. This paper derives a sufficient condition for the identifiability of homogeneous linear ODE systems from a sequence of equally-spaced error-free observations sampled from a single trajectory. When observations are disturbed by measurement noise, we prove that under mild conditions, the parameter estimator based on the Nonlinear Least Squares (NLS) method is consistent and asymptotic normal with $n^{-1/2}$ convergence rate. Based on the asymptotic normality property, we construct confidence sets for the unknown system parameters and propose a new method to infer the causal structure of the ODE system, that is, inferring whether there is a causal link between system variables. Furthermore, we extend the results to degraded observations, including aggregated and time-scaled ones. To the best of our knowledge, our work is the first systematic study of the identifiability and asymptotic properties in learning linear ODE systems. We also construct simulations with various system dimensions to illustrate the established theoretical results. [abs] [ pdf ][ bib ] &copy JMLR 2024. ( edit, beta )

ICLR Conference 2024 Conference Paper

Identifiable Latent Polynomial Causal Models through the Lens of Change

Yuhang Liu 0002
Zhen Zhang 0008
Dong Gong
Mingming Gong
Biwei Huang
Anton van den Hengel
Kun Zhang 0001
Qinfeng Shi

Causal representation learning aims to unveil latent high-level causal representations from observed low-level data. One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as \textit{identifiability}. A recent breakthrough explores identifiability by leveraging the change of causal influences among latent causal variables across multiple environments \citep{liu2022identifying}. However, this progress rests on the assumption that the causal relationships among latent causal variables adhere strictly to linear Gaussian models. In this paper, we extend the scope of latent causal models to involve nonlinear causal relationships, represented by polynomial models, and general noise distributions conforming to the exponential family. Additionally, we investigate the necessity of imposing changes on all causal parameters and present partial identifiability results when part of them remains unchanged. Further, we propose a novel empirical estimation method, grounded in our theoretical finding, that enables learning consistent latent causal representations. Our experimental results, obtained from both synthetic and real-world data, validate our theoretical contributions concerning identifiability and consistency.

ICLR Conference 2024 Conference Paper

Improving Non-Transferable Representation Learning by Harnessing Content and Style

Ziming Hong
Zhenyi Wang 0001
Li Shen 0008
Yu Yao 0005
Zhuo Huang
Shiming Chen 0002
Chuanwu Yang
Mingming Gong

Non-transferable learning (NTL) aims to restrict the generalization of models toward the target domain(s). To this end, existing works learn non-transferable representations by reducing statistical dependence between the source and target domain. However, such statistical methods essentially neglect to distinguish between *styles* and *contents*, leading them to inadvertently fit (i) spurious correlation between *styles* and *labels*, and (ii) fake independence between *contents* and *labels*. Consequently, their performance will be limited when natural distribution shifts occur or malicious intervention is imposed. In this paper, we propose a novel method (dubbed as H-NTL) to understand and advance the NTL problem by introducing a causal model to separately model *content* and *style* as two latent factors, based on which we disentangle and harness them as guidances for learning non-transferable representations with intrinsically causal relationships. Speciﬁcally, to avoid fitting spurious correlation and fake independence, we propose a variational inference framework to disentangle the naturally mixed *content factors* and *style factors* under our causal model. Subsequently, based on dual-path knowledge distillation, we harness the disentangled two *factors* as guidances for non-transferable representation learning: (i) we constraint the source domain representations to fit *content factors* (which are the intrinsic cause of *labels*), and (ii) we enforce that the target domain representations fit *style factors* which barely can predict labels. As a result, the learned feature representations follow optimal untransferability toward the target domain and minimal negative influence on the source domain, thus enabling better NTL performance. Empirically, the proposed H-NTL signiﬁcantly outperforms competing methods by a large margin.

NeurIPS Conference 2024 Conference Paper

In-N-Out: Lifting 2D Diffusion Prior for 3D Object Removal via Tuning-Free Latents Alignment

Dongting Hu
Huan Fu
Jiaxian Guo
Liuhua Peng
Tingjin Chu
Feng Liu
Tongliang Liu
Mingming Gong

Neural representations for 3D scenes have made substantial advancements recently, yet object removal remains a challenging yet practical issue, due to the absence of multi-view supervision over occluded areas. Diffusion Models (DMs), trained on extensive 2D images, show diverse and high-fidelity generative capabilities in the 2D domain. However, due to not being specifically trained on 3D data, their application to multi-view data often exacerbates inconsistency, hence impacting the overall quality of the 3D output. To address these issues, we introduce "In-N-Out", a novel approach that begins by inpainting a prior, i. e. , the occluded area from a single view using DMs, followed by outstretching it to create multi-view inpaintings via latents alignments. Our analysis identifies that the variability in DMs' outputs mainly arises from initially sampled latents and intermediate latents predicted in the denoising process. We explicitly align of initial latents using a Neural Radiance Field (NeRF) to establish a consistent foundational structure in the inpainted area, complemented by an implicit alignment of intermediate latents through cross-view attention during the denoising phases, enhancing appearance consistency across views. To further enhance rendering results, we apply a patch-based hybrid loss to optimize NeRF. We demonstrate that our techniques effectively mitigate the challenges posed by inconsistencies in DMs and substantially improve the fidelity and coherence of inpainted 3D representations.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Interventional Fairness on Partially Known Causal Graphs: A Constrained Optimization Approach

Aoqi Zuo
Yiqing Li 0002
Susan Wei
Mingming Gong

Fair machine learning aims to prevent discrimination against individuals or sub-populations based on sensitive attributes such as gender and race. In recent years, causal inference methods have been increasingly used in fair machine learning to measure unfairness by causal effects. However, current methods assume that the true causal graph is given, which is often not true in real-world applications. To address this limitation, this paper proposes a framework for achieving causal fairness based on the notion of interventions when the true causal graph is partially known. The proposed approach involves modeling fair prediction using a Partially Directed Acyclic Graph (PDAG), specifically, a class of causal DAGs that can be learned from observational data combined with domain knowledge. The PDAG is used to measure causal fairness, and a constrained optimization problem is formulated to balance between fairness and accuracy. Results on both simulated and real-world datasets demonstrate the effectiveness of this method.

NeurIPS Conference 2024 Conference Paper

Neural Collapse Inspired Feature Alignment for Out-of-Distribution Generalization

Zhikang Chen
Min Zhang
Sen Cui
Haoxuan Li
Gang Niu
Mingming Gong
Changshui Zhang
Kun Zhang

The spurious correlation between the background features of the image and its label arises due to that the samples labeled with the same class in the training set often co-occurs with a specific background, which will cause the encoder to extract non-semantic features for classification, resulting in poor out-of-distribution generalization performance. Although many studies have been proposed to address this challenge, the semantic and spurious features are still difficult to accurately decouple from the original image and fail to achieve high performance with deep learning models. This paper proposes a novel perspective inspired by neural collapse to solve the spurious correlation problem through the alternate execution of environment partitioning and learning semantic masks. Specifically, we propose to assign an environment to each sample by learning a local model for each environment and using maximum likelihood probability. At the same time, we require that the learned semantic mask neurally collapses to the same simplex equiangular tight frame (ETF) in each environment after being applied to the original input. We conduct extensive experiments on four datasets, and the results demonstrate that our method significantly improves out-of-distribution performance.

PDF Details DOI

JMLR Journal 2024 Journal Article

On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis for Parametric Models

Xuetong Wu
Mingming Gong
Jonathan H. Manton
Uwe Aickelin
Jingge Zhu

Recent advancements in unsupervised domain adaptation (UDA) and semi-supervised learning (SSL), particularly incorporating causality, have led to significant methodological improvements in these learning problems. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL scenarios where we access $m$ labelled source data and $n$ unlabelled target data as training instances under different causal settings with a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain from an information-theoretic perspective. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of $O(\frac{1}{m})$ only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabelled data dominate the performance at a rate of typically $O(\frac{1}{n})$. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms. [abs] [ pdf ][ bib ] &copy JMLR 2024. ( edit, beta )

ICML Conference 2024 Conference Paper

On the Recoverability of Causal Relations from Temporally Aggregated I. I. D. Data

Shunxing Fan
Mingming Gong
Kun Zhang 0001

We consider the effect of temporal aggregation on instantaneous (non-temporal) causal discovery in general setting. This is motivated by the observation that the true causal time lag is often considerably shorter than the observational interval. This discrepancy leads to high aggregation, causing time-delay causality to vanish and instantaneous dependence to manifest. Although we expect such instantaneous dependence has consistency with the true causal relation in certain sense to make the discovery results meaningful, it remains unclear what type of consistency we need and when will such consistency be satisfied. We proposed functional consistency and conditional independence consistency in formal way correspond functional causal model-based methods and conditional independence-based methods respectively and provide the conditions under which these consistencies will hold. We show theoretically and experimentally that causal discovery results may be seriously distorted by aggregation especially in complete nonlinear case and we also find causal relationship still recoverable from aggregated data if we have partial linearity or appropriate prior. Our findings suggest community should take a cautious and meticulous approach when interpreting causal discovery results from such data and show why and when aggregation will distort the performance of causal discovery methods.

ICML Conference 2024 Conference Paper

Optimal Kernel Choice for Score Function-based Causal Discovery

Wenjie Wang
Biwei Huang
Feng Liu 0003
Xinge You
Tongliang Liu
Kun Zhang 0001
Mingming Gong

Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.

NeurIPS Conference 2023 Conference Paper

ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

Lunhao Duan
Shanshan Zhao
Nan Xue
Mingming Gong
Gui-Song Xia
Dacheng Tao

Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0. 1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e. g. , spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires high computational costs. In addition, previous methods usually learn the query, key, and value using a linear projection without modeling the local 3D geometric structure. In this paper, we attempt to reduce the costs and model the local geometry prior by developing a new transformer block, named ConDaFormer. Technically, ConDaFormer disassembles the cubic window into three orthogonal 2D planes, leading to fewer points when modeling the attention in a similar range. The disassembling operation is beneficial to enlarging the range of attention without increasing the computational complexity, but ignores some contexts. To provide a remedy, we develop a local structure enhancement strategy that introduces a depth-wise convolution before and after the attention. This scheme can also capture the local geometric information. Taking advantage of these designs, ConDaFormer captures both long-range contextual information and local priors. The effectiveness is demonstrated by experimental results on several 3D point cloud understanding benchmarks. Our code will be available.

NeurIPS Conference 2023 Conference Paper

CS-Isolate: Extracting Hard Confident Examples by Content and Style Isolation

Yexiong Lin
Yu Yao
Xiaolong Shi
Mingming Gong
Xu Shen
Dong Xu
Tongliang Liu

Label noise widely exists in large-scale image datasets. To mitigate the side effects of label noise, state-of-the-art methods focus on selecting confident examples by leveraging semi-supervised learning. Existing research shows that the ability to extract hard confident examples, which are close to the decision boundary, significantly influences the generalization ability of the learned classifier. In this paper, we find that a key reason for some hard examples being close to the decision boundary is due to the entanglement of style factors with content factors. The hard examples become more discriminative when we focus solely on content factors, such as semantic information, while ignoring style factors. Nonetheless, given only noisy data, content factors are not directly observed and have to be inferred. To tackle the problem of inferring content factors for classification when learning with noisy labels, our objective is to ensure that the content factors of all examples in the same underlying clean class remain unchanged as their style information changes. To achieve this, we utilize different data augmentation techniques to alter the styles while regularizing content factors based on some confident examples. By training existing methods with our inferred content factors, CS-Isolate proves their effectiveness in learning hard examples on benchmark datasets. The implementation is available at https: //github. com/tmllab/2023 NeurIPS CS-isolate.

ICML Conference 2023 Conference Paper

Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

Ruijiang Dong
Feng Liu 0003
Haoang Chi
Tongliang Liu
Mingming Gong
Gang Niu 0001
Masashi Sugiyama
Bo Han 0003

Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i. e. , a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i. e. , maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays an important role in addressing the FHA problem.

TMLR Journal 2023 Journal Article

FedDAG: Federated DAG Structure Learning

Erdun Gao
Junjia Chen
Li Shen
Tongliang Liu
Mingming Gong
Howard Bondell

To date, most directed acyclic graphs (DAGs) structure learning approaches require data to be stored in a central server. However, due to the consideration of privacy protection, data owners gradually refuse to share their personalized raw data to avoid private information leakage, making this task more troublesome by cutting off the first step. Thus, a puzzle arises: how do we discover the underlying DAG structure from decentralized data? In this paper, focusing on the additive noise models (ANMs) assumption of data generation, we take the first step in developing a gradient-based learning framework named FedDAG, which can learn the DAG structure without directly touching the local data and also can naturally handle the data heterogeneity. Our method benefits from a two-level structure of each local model. The first level structure learns the edges and directions of the graph and communicates with the server to get the model information from other clients during the learning procedure, while the second level structure approximates the mechanisms among variables and personally updates on its own data to accommodate the data heterogeneity. Moreover, FedDAG formulates the overall learning task as a continuous optimization problem by taking advantage of an equality acyclicity constraint, which can be solved by gradient descent methods to boost the searching efficiency. Extensive experiments on both synthetic and real-world datasets verify the efficacy of the proposed method.

NeurIPS Conference 2023 Conference Paper

Generator Identification for Linear SDEs with Additive and Multiplicative Noise

Yuanyuan Wang
Xi Geng
Wei Huang
Biwei Huang
Mingming Gong

In this paper, we present conditions for identifying the generator of a linear stochastic differential equation (SDE) from the distribution of its solution process with a given fixed initial state. These identifiability conditions are crucial in causal inference using linear SDEs as they enable the identification of the post-intervention distributions from its observational distribution. Specifically, we derive a sufficient and necessary condition for identifying the generator of linear SDEs with additive noise, as well as a sufficient condition for identifying the generator of linear SDEs with multiplicative noise. We show that the conditions derived for both types of SDEs are generic. Moreover, we offer geometric interpretations of the derived identifiability conditions to enhance their understanding. To validate our theoretical results, we perform a series of simulations, which support and substantiate the established findings.

ICLR Conference 2023 Conference Paper

Harnessing Out-Of-Distribution Examples via Augmenting Content and Style

Zhuo Huang
Xiaobo Xia
Li Shen 0008
Bo Han 0003
Mingming Gong
Chen Gong 0002
Tongliang Liu

Machine learning models are vulnerable to Out-Of-Distribution (OOD) examples, such a problem has drawn much attention. However, current methods lack a full understanding of different types of OOD data: there are benign OOD data that can be properly adapted to enhance the learning performance, while other malign OOD data would severely degenerate the classification result. To Harness OOD data, this paper proposes HOOD method that can leverage the content and style from each image instance to identify benign and malign OOD data. Particularly, we design a variational inference framework to causally disentangle content and style features by constructing a structural causal model. Subsequently, we augment the content and style through an intervention process to produce malign and benign OOD data, respectively. The benign OOD data contain novel styles but hold our interested contents, and they can be leveraged to help train a style-invariant model. In contrast, the malign OOD data inherit unknown contents but carry familiar styles, by detecting them can improve model robustness against deceiving anomalies. Thanks to the proposed novel disentanglement and data augmentation techniques, HOOD can effectively deal with OOD examples in unknown and open environments, whose effectiveness is empirically validated in three typical OOD applications including OOD detection, open-set semi-supervised learning, and open-set domain adaptation.

ICRA Conference 2023 Conference Paper

Knowledge Distillation for Feature Extraction in Underwater VSLAM

Jinghe Yang
Mingming Gong
Girish Nair
Jung-Hoon Lee
Jason Monty
Ye Pu

In recent years, learning-based feature detection and matching have outperformed manually-designed methods in in-air cases. However, it is challenging to learn the features in the underwater scenario due to the absence of annotated underwater datasets. This paper proposes a cross-modal knowl-edge distillation framework for training an underwater feature detection and matching network (UFEN). In particular, we use in-air RGBD data to generate synthetic underwater images based on a physical underwater imaging formation model and employ these as the medium to distil knowledge from a teacher model SuperPoint pretrained on in-air images. We embed UFEN into the ORB-SLAM3 framework to replace the ORB feature by introducing an additional binarization layer. To test the effectiveness of our method, we built a new underwater dataset with groundtruth measurements named EASI (https://github.com/Jinghe-mel/UFEN-SLAM), recorded in an indoor water tank for different turbidity levels. The experimental results on the existing dataset and our new dataset demonstrate the effectiveness of our method.

NeurIPS Conference 2023 Conference Paper

Learning World Models with Identifiable Factorization

Yuren Liu
Biwei Huang
Zhengmao Zhu
Honglong Tian
Mingming Gong
Yang Yu
Kun Zhang

Extracting a stable and compact representation of the environment is crucial for efficient reinforcement learning in high-dimensional, noisy, and non-stationary environments. Different categories of information coexist in such environments -- how to effectively extract and disentangle the information remains a challenging problem. In this paper, we propose IFactor, a general framework to model four distinct categories of latent state variables that capture various aspects of information within the RL system, based on their interactions with actions and rewards. Our analysis establishes block-wise identifiability of these latent variables, which not only provides a stable and compact representation but also discloses that all reward-relevant factors are significant for policy learning. We further present a practical approach to learning the world model with identifiable blocks, ensuring the removal of redundancies but retaining minimal and sufficient information for policy optimization. Experiments in synthetic worlds demonstrate that our method accurately identifies the ground-truth latent variables, substantiating our theoretical findings. Moreover, experiments in variants of the DeepMind Control Suite and RoboDesk showcase the superior performance of our approach over baselines.

ICLR Conference 2023 Conference Paper

Mosaic Representation Learning for Self-supervised Visual Pre-training

Zhaoqing Wang
Ziyu Chen
Yaqian Li
Yandong Guo
Jun Yu 0001
Mingming Gong
Tongliang Liu

Self-supervised learning has achieved significant success in learning visual representations without the need for manual annotation. To obtain generalizable representations, a meticulously designed data augmentation strategy is one of the most crucial parts. Recently, multi-crop strategies utilizing a set of small crops as positive samples have been shown to learn spatially structured features. However, it overlooks the diverse contextual backgrounds, which reduces the variance of the input views and degenerates the performance. To address this problem, we propose a mosaic representation learning framework (MosRep), consisting of a new data augmentation strategy that enriches the backgrounds of each small crop and improves the quality of visual representations. Specifically, we randomly sample numbers of small crops from different input images and compose them into a mosaic view, which is equivalent to introducing different background information for each small crop. Additionally, we further jitter the mosaic view to prevent memorizing the spatial locations of each crop. Along with optimization, our MosRep gradually extracts more discriminative features. Extensive experimental results demonstrate that our method improves the performance far greater than the multi-crop strategy on a series of downstream tasks, e.g., +7.4% and +4.9% than the multi-crop strategy on ImageNet-1K with 1% label and 10% label, respectively. Code is available at https://github.com/DerrickWang005/MosRep.git.

ICLR Conference 2023 Conference Paper

Multi-domain image generation and translation with identifiability guarantees

Shaoan Xie
Lingjing Kong
Mingming Gong
Kun Zhang 0001

Multi-domain image generation and unpaired image-to-to-image translation are two important and related computer vision problems. The common technique for the two tasks is the learning of a joint distribution from multiple marginal distributions. However, it is well known that there can be infinitely many joint distributions that can derive the same marginals. Hence, it is necessary to formulate suitable constraints to address this highly ill-posed problem. Inspired by the recent advances in nonlinear Independent Component Analysis (ICA) theory, we propose a new method to learn the joint distribution from the marginals by enforcing a specific type of minimal change across domains. We report one of the first results connecting multi-domain generative models to identifiability and shows why identifiability is essential and how to achieve it theoretically and practically. We apply our method to five multi-domain image generation and six image-to-image translation tasks. The superior performance of our model supports our theory and demonstrates the effectiveness of our method. The training code are available at https://github.com/Mid-Push/i-stylegan.

NeurIPS Conference 2023 Conference Paper

Semi-Implicit Denoising Diffusion Models (SIDDMs)

Yanwu Xu
Mingming Gong
Shaoan Xie
Wei Wei
Matthias Grundmann
Kayhan Batmanghelich
Tingbo Hou

Despite the proliferation of generative models, achieving fast sampling during inference without compromising sample diversity and quality remains challenging. Existing models such as Denoising Diffusion Probabilistic Models (DDPM) deliver high-quality, diverse samples but are slowed by an inherently high number of iterative steps. The Denoising Diffusion Generative Adversarial Networks (DDGAN) attempted to circumvent this limitation by integrating a GAN model for larger jumps in the diffusion process. However, DDGAN encountered scalability limitations when applied to large datasets. To address these limitations, we introduce a novel approach that tackles the problem by matching implicit and explicit factors. More specifically, our approach involves utilizing an implicit model to match the marginal distributions of noisy data and the explicit conditional distribution of the forward diffusion. This combination allows us to effectively match the joint denoising distributions. Unlike DDPM but similar to DDGAN, we do not enforce a parametric distribution for the reverse step, enabling us to take large steps during inference. Similar to the DDPM but unlike DDGAN, we take advantage of the exact form of the diffusion process. We demonstrate that our proposed method obtains comparable generative performance to diffusion-based models and vastly superior results to models with a small number of sampling steps.

ICML Conference 2023 Conference Paper

Which is Better for Learning with Noisy Labels: The Semi-supervised Method or Modeling Label Noise?

Yu Yao 0005
Mingming Gong
Yuxuan Du
Jun Yu 0001
Bo Han 0003
Kun Zhang 0001
Tongliang Liu

In real life, accurately annotating large-scale datasets is sometimes difficult. Datasets used for training deep learning models are likely to contain label noise. To make use of the dataset containing label noise, two typical methods have been proposed. One is to employ the semi-supervised method by exploiting labeled confident examples and unlabeled unconfident examples. The other one is to model label noise and design statistically consistent classifiers. A natural question remains unsolved: which one should be used for a specific real-world application? In this paper, we answer the question from the perspective of causal data generative process. Specifically, the performance of the semi-supervised based method depends heavily on the data generative process while the method modeling label-noise is not influenced by the generation process. For example, for a given dataset, if it has a causal generative structure that the features cause the label, the semi-supervised based method would not be helpful. When the causal structure is unknown, we provide an intuitive method to discover the causal structure for a given dataset containing label noise.

ICLR Conference 2022 Conference Paper

A Relational Intervention Approach for Unsupervised Dynamics Generalization in Model-Based Reinforcement Learning

Jiaxian Guo
Mingming Gong
Dacheng Tao

The generalization of model-based reinforcement learning (MBRL) methods to environments with unseen transition dynamics is an important yet challenging problem. Existing methods try to extract environment-specified information $Z$ from past transition segments to make the dynamics prediction model generalizable to different dynamics. However, because environments are not labelled, the extracted information inevitably contains redundant information unrelated to the dynamics in transition segments and thus fails to maintain a crucial property of $Z$: $Z$ should be similar in the same environment and dissimilar in different ones. As a result, the learned dynamics prediction function will deviate from the true one, which undermines the generalization ability. To tackle this problem, we introduce an interventional prediction module to estimate the probability of two estimated $\hat{z}_i, \hat{z}_j$ belonging to the same environment. Furthermore, by utilizing the $Z$'s invariance within a single environment, a relational head is proposed to enforce the similarity between $\hat{{Z}}$ from the same environment. As a result, the redundant information will be reduced in $\hat{Z}$. We empirically show that $\hat{{Z}}$ estimated by our method enjoy less redundant information than previous methods, and such $\hat{{Z}}$ can significantly reduce dynamics prediction errors and improve the performance of model-based RL methods on zero-shot new environments with unseen dynamics. The codes of this method are available at \url{https://github.com/CR-Gjx/RIA}.

ICLR Conference 2022 Conference Paper

Adversarial Robustness Through the Lens of Causality

Yonggang Zhang 0003
Mingming Gong
Tongliang Liu
Gang Niu 0001
Xinmei Tian 0001
Bo Han 0003
Bernhard Schölkopf
Kun Zhang 0001

The adversarial vulnerability of deep neural networks has attracted signiﬁcant attention in machine learning. As causal reasoning has an instinct for modeling distribution change, it is essential to incorporate causality into analyzing this specific type of distribution change induced by adversarial attacks. However, causal formulations of the intuition of adversarial attacks and the development of robust DNNs are still lacking in the literature. To bridge this gap, we construct a causal graph to model the generation process of adversarial examples and define the adversarial distribution to formalize the intuition of adversarial attacks. From the causal perspective, we study the distinction between the natural and adversarial distribution and conclude that the origin of adversarial vulnerability is the focus of models on spurious correlations. Inspired by the causal understanding, we propose the \emph{Causal}-inspired \emph{Adv}ersarial distribution alignment method, CausalAdv, to eliminate the difference between natural and adversarial distributions by considering spurious correlations. Extensive experiments demonstrate the efficacy of the proposed method. Our work is the first attempt towards using causality to understand and mitigate the adversarial vulnerability.

NeurIPS Conference 2022 Conference Paper

Counterfactual Fairness with Partially Known Causal Graph

Aoqi Zuo
Susan Wei
Tongliang Liu
Bo Han
Kun Zhang
Mingming Gong

Fair machine learning aims to avoid treating individuals or sub-populations unfavourably based on \textit{sensitive attributes}, such as gender and race. Those methods in fair machine learning that are built on causal inference ascertain discrimination and bias through causal effects. Though causality-based fair learning is attracting increasing attention, current methods assume the true causal graph is fully known. This paper proposes a general method to achieve the notion of counterfactual fairness when the true causal graph is unknown. To select features that lead to counterfactual fairness, we derive the conditions and algorithms to identify ancestral relations between variables on a \textit{Partially Directed Acyclic Graph (PDAG)}, specifically, a class of causal DAGs that can be learned from observational data combined with domain knowledge. Interestingly, we find that counterfactual fairness can be achieved as if the true causal graph were fully known, when specific background knowledge is provided: the sensitive attributes do not have ancestors in the causal graph. Results on both simulated and real-world datasets demonstrate the effectiveness of our method.

JBHI Journal 2022 Journal Article

Hierarchical Amortized GAN for 3D High Resolution Medical Image Synthesis

Li Sun
Junxiang Chen
Yanwu Xu
Mingming Gong
Ke Yu
Kayhan Batmanghelich

Generative Adversarial Networks (GAN) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of Graphical Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images, these models either cannot scale to high-resolution or are prone to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by using different configurations between training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: First, the memory demand for training on high-resolution images is amortized among sub-volumes. Furthermore, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms state of the art in image generation. We also demonstrate clinical applications of the proposed model in data augmentation and clinical-relevant feature extraction.

NeurIPS Conference 2022 Conference Paper

MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models

Erdun Gao
Ignavier Ng
Mingming Gong
Li Shen
Wei Huang
Tongliang Liu
Kun Zhang
Howard Bondell

State-of-the-art causal discovery methods usually assume that the observational data is complete. However, the missing data problem is pervasive in many practical scenarios such as clinical trials, economics, and biology. One straightforward way to address the missing data problem is first to impute the data using off-the-shelf imputation methods and then apply existing causal discovery methods. However, such a two-step method may suffer from suboptimality, as the imputation algorithm may introduce bias for modeling the underlying data distribution. In this paper, we develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations. Focusing mainly on the assumptions of ignorable missingness and the identifiable additive noise models (ANMs), MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization (EM) framework. In the E-step, in cases where computing the posterior distributions of parameters in closed-form is not feasible, Monte Carlo EM is leveraged to approximate the likelihood. In the M-step, MissDAG leverages the density transformation to model the noise distributions with simpler and specific formulations by virtue of the ANMs and uses a likelihood-based causal discovery algorithm with directed acyclic graph constraint. We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.

ICLR Conference 2022 Conference Paper

Rethinking Class-Prior Estimation for Positive-Unlabeled Learning

Yu Yao 0005
Tongliang Liu
Bo Han 0003
Mingming Gong
Gang Niu 0001
Masashi Sugiyama
Dacheng Tao

Given only positive (P) and unlabeled (U) data, PU learning can train a binary classifier without any negative data. It has two building blocks: PU class-prior estimation (CPE) and PU classification; the latter has been well studied while the former has received less attention. Hitherto, the distributional-assumption-free CPE methods rely on a critical assumption that the support of the positive data distribution cannot be contained in the support of the negative data distribution. If this is violated, those CPE methods will systematically overestimate the class prior; it is even worse that we cannot verify the assumption based on the data. In this paper, we rethink CPE for PU learning—can we remove the assumption to make CPE always valid? We show an affirmative answer by proposing Regrouping CPE (ReCPE) that builds an auxiliary probability distribution such that the support of the positive data distribution is never contained in the support of the negative data distribution. ReCPE can work with any CPE method by treating it as the base method. Theoretically, ReCPE does not affect its base if the assumption already holds for the original probability distribution; otherwise, it reduces the positive bias of its base. Empirically, ReCPE improves all state-of-the-art CPE methods on various datasets, implying that the assumption has indeed been violated here.

IJCAI Conference 2022 Conference Paper

Robust Weight Perturbation for Adversarial Training

Chaojian Yu
Bo Han
Mingming Gong
Li Shen
Shiming Ge
Du Bo
Tongliang Liu

Overfitting widely exists in adversarial robust training of deep networks. An effective remedy is adversarial weight perturbation, which injects the worst-case weight perturbation during network training by maximizing the classification loss on adversarial examples. Adversarial weight perturbation helps reduce the robust generalization gap; however, it also undermines the robustness improvement. A criterion that regulates the weight perturbation is therefore crucial for adversarial training. In this paper, we propose such a criterion, namely Loss Stationary Condition (LSC) for constrained perturbation. With LSC, we find that it is essential to conduct weight perturbation on adversarial data with small classification loss to eliminate robust overfitting. Weight perturbation on adversarial data with large classification loss is not necessary and may even lead to poor robustness. Based on these observations, we propose a robust perturbation strategy to constrain the extent of weight perturbation. The perturbation strategy prevents deep networks from overfitting while avoiding the side effect of excessive weight perturbation, significantly improving the robustness of adversarial training. Extensive experiments demonstrate the superiority of the proposed method over the state-of-the-art adversarial training methods.

PDF Details DOI

ICLR Conference 2022 Conference Paper

Sample Selection with Uncertainty of Losses for Learning with Noisy Labels

Xiaobo Xia
Tongliang Liu
Bo Han 0003
Mingming Gong
Jun Yu 0001
Gang Niu 0001
Masashi Sugiyama

In learning with noisy labels, the sample selection approach is very popular, which regards small-loss data as correctly labeled data during training. However, losses are generated on-the-ﬂy based on the model being trained with noisy labels, and thus large-loss data are likely but not certain to be incorrect. There are actually two possibilities of a large-loss data point: (a) it is mislabeled, and then its loss decreases slower than other data, since deep neural networks learn patterns ﬁrst; (b) it belongs to an underrepresented group of data and has not been selected yet. In this paper, we incorporate the uncertainty of losses by adopting interval estimation instead of point estimation of losses, where lower bounds of the conﬁdence intervals of losses derived from distribution-free concentration inequalities, but not losses themselves, are used for sample selection. In this way, we also give large-loss but less selected data a try; then, we can better distinguish between the cases (a) and (b) by seeing if the losses effectively decrease with the uncertainty after the try. As a result, we can better explore underrepresented data that are correctly labeled but seem to be mislabeled at ﬁrst glance. Experiments demonstrate that the proposed method is superior to baselines and robust to a broad range of label noise types.

NeurIPS Conference 2022 Conference Paper

Truncated Matrix Power Iteration for Differentiable DAG Learning

Zhen Zhang
Ignavier Ng
Dong Gong
Yuhang Liu
Ehsan Abbasnejad
Mingming Gong
Kun Zhang
Javen Qinfeng Shi

Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, since they argue that large coefficients on the higher-order terms are harmful due to numeric exploding. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning, when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than the small counterparts. Based on this, we propose a novel DAG learning method with efficient truncated matrix power iteration to approximate geometric series based DAG constraints. Empirically, our DAG learning method outperforms the previous state-of-the-arts in various settings, often by a factor of $3$ or more in terms of structural Hamming distance.

ICML Conference 2022 Conference Paper

Understanding Robust Overfitting of Adversarial Training and Beyond

Chaojian Yu
Bo Han 0003
Li Shen 0008
Jun Yu 0001
Chen Gong 0002
Mingming Gong
Tongliang Liu

Robust overfitting widely exists in adversarial training of deep networks. The exact underlying reasons for this are still not completely understood. Here, we explore the causes of robust overfitting by comparing the data distribution of non-overfit (weak adversary) and overfitted (strong adversary) adversarial training, and observe that the distribution of the adversarial data generated by weak adversary mainly contain small-loss data. However, the adversarial data generated by strong adversary is more diversely distributed on the large-loss data and the small-loss data. Given these observations, we further designed data ablation adversarial training and identify that some small-loss data which are not worthy of the adversary strength cause robust overfitting in the strong adversary mode. To relieve this issue, we propose minimum loss constrained adversarial training (MLCAT): in a minibatch, we learn large-loss data as usual, and adopt additional measures to increase the loss of the small-loss data. Technically, MLCAT hinders data fitting when they become easy to learn to prevent robust overfitting; philosophically, MLCAT reflects the spirit of turning waste into treasure and making the best use of each adversarial data; algorithmically, we designed two realizations of MLCAT, and extensive experiments demonstrate that MLCAT can eliminate robust overfitting and further boost adversarial robustness.

ICML Conference 2021 Conference Paper

Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels

Songhua Wu
Xiaobo Xia
Tongliang Liu
Bo Han 0003
Mingming Gong
Nannan Wang 0001
Haifeng Liu
Gang Niu 0001

Learning with noisy labels has attracted a lot of attention in recent years, where the mainstream approaches are in \emph{pointwise} manners. Meanwhile, \emph{pairwise} manners have shown great potential in supervised metric learning and unsupervised contrastive learning. Thus, a natural question is raised: does learning in a pairwise manner \emph{mitigate} label noise? To give an affirmative answer, in this paper, we propose a framework called \emph{Class2Simi}: it transforms data points with noisy \emph{class labels} to data pairs with noisy \emph{similarity labels}, where a similarity label denotes whether a pair shares the class label or not. Through this transformation, the \emph{reduction of the noise rate} is theoretically guaranteed, and hence it is in principle easier to handle noisy similarity labels. Amazingly, DNNs that predict the \emph{clean} class labels can be trained from noisy data pairs if they are first pretrained from noisy data points. Class2Simi is \emph{computationally efficient} because not only this transformation is on-the-fly in mini-batches, but also it just changes loss computation on top of model prediction into a pairwise manner. Its effectiveness is verified by extensive experiments.

NeurIPS Conference 2021 Conference Paper

Domain Adaptation with Invariant Representation Learning: What Transformations to Learn?

Petar Stojanov
Zijian Li
Mingming Gong
Ruichu Cai
Jaime Carbonell
Kun Zhang

Unsupervised domain adaptation, as a prevalent transfer learning setting, spans many real-world applications. With the increasing representational power and applicability of neural networks, state-of-the-art domain adaptation methods make use of deep architectures to map the input features $X$ to a latent representation $Z$ that has the same marginal distribution across domains. This has been shown to be insufficient for generating optimal representation for classification, and to find conditionally invariant representations, usually strong assumptions are needed. We provide reasoning why when the supports of the source and target data from overlap, any map of $X$ that is fixed across domains may not be suitable for domain adaptation via invariant features. Furthermore, we develop an efficient technique in which the optimal map from $X$ to $Z$ also takes domain-specific information as input, in addition to the features $X$. By using the property of minimal changes of causal mechanisms across domains, our model also takes into account the domain-specific information to ensure that the latent representation $Z$ does not discard valuable information about $Y$. We demonstrate the efficacy of our method via synthetic and real-world data experiments. The code is available at: \texttt{https: //github. com/DMIRLAB-Group/DSAN}.

NeurIPS Conference 2021 Conference Paper

Instance-dependent Label-noise Learning under a Structural Causal Model

Yu Yao
Tongliang Liu
Mingming Gong
Bo Han
Gang Niu
Kun Zhang

Label noise generally degenerates the performance of deep learning algorithms because deep neural networks easily overfit label errors. Let $X$ and $Y$ denote the instance and clean label, respectively. When $Y$ is a cause of $X$, according to which many datasets have been constructed, e. g. , \textit{SVHN} and \textit{CIFAR}, the distributions of $P(X)$ and $P(Y|X)$ are generally entangled. This means that the unsupervised instances are helpful to learn the classifier and thus reduce the side effect of label noise. However, it remains elusive on how to exploit the causal information to handle the label-noise problem. We propose to model and make use of the causal process in order to correct the label-noise effect. Empirically, the proposed method outperforms all state-of-the-art methods on both synthetic and real-world label-noise datasets.

AAAI Conference 2021 Conference Paper

Learning with Group Noise

Qizhou Wang
Jiangchao Yao
Chen Gong
Tongliang Liu
Mingming Gong
Hongxia Yang
Bo Han

Machine learning in the context of noise is a challenging but practical setting to plenty of real-world applications. Most of the previous approaches in this area focus on the pairwise relation (casual or correlational relationship) with noise, such as learning with noisy labels. However, the group noise, which is parasitic on the coarse-grained accurate relation with the fine-grained uncertainty, is also universal and has not been well investigated. The challenge under this setting is how to discover true pairwise connections concealed by the group relation with its fine-grained noise. To overcome this issue, we propose a novel Max-Matching method for learning with group noise. Specifically, it utilizes a matching mechanism to evaluate the relation confidence of each object (cf. Figure 1) w. r. t. the target, meanwhile considering the Non-IID characteristics among objects in the group. Only the most confident object is considered to learn the model, so that the fine-grained noise is mostly dropped. The performance on a range of real-world datasets in the area of several learning paradigms demonstrates the effectiveness of Max-Matching.

IJCAI Conference 2020 Conference Paper

Bridging Causality and Learning: How Do They Benefit from Each Other?

Mingming Gong

Modern machine learning techniques can discover complicated statistical dependencies between ran- dom variables, usually in the form a statistical model, and make use of these dependencies to per- form predictions on future observations. How- ever, many real problems involve causal inference, which aims to infer how the data generating sys- tem should behave under changing conditions. To perform causal inference, we need not only statisti- cal dependencies but also causal structures to deter- mine the system’s behavior under external interven- tions. In this paper, I will be focusing on two essen- tial problems that bridge causality and learning and investigate how they can benefit from each other. On the one hand, since conducting randomized controlled experiments for causal structure discov- ery is often expensive or infeasible, it would be valuable to investigate how we can explore modern machine learning algorithms to search for causal structures from observational data. On the other hand, since causal structure provides information about the distribution changing properties, it can be used as a fundamental tool to tackle a major chal- lenge for machine learning: the capability of gener- alization to new distributions and prediction in non- stationary environment.

PDF Details DOI

AAAI Conference 2020 Conference Paper

Causal Discovery from Multiple Data Sets with Non-Identical Variable Sets

Biwei Huang
Kun Zhang
Mingming Gong
Clark Glymour

A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a ﬁxed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to technical constraints or to save time and cost. This raises a question of how to handle causal discovery from multiple data sets with non-identical variable sets, and at the same time, it would be interesting to see how more recorded variables can help to mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases. The proposed method also allows distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identiﬁable under testable conditions. Furthermore, we present two types of approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood free and leverages generative adversarial nets to improve scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets are presented to demonstrate the eﬃcacy of our methods.

AAAI Conference 2020 Conference Paper

Compressed Self-Attention for Deep Metric Learning

Ziye Chen
Mingming Gong
Yanwu Xu
Chaohui Wang
Kun Zhang
Bo Du

In this paper, we aim to enhance self-attention (SA) mechanism for deep metric learning in visual perception, by capturing richer contextual dependencies in visual data. To this end, we propose a novel module, named compressed selfattention (CSA), which signiﬁcantly reduces the computation and memory cost with a neglectable decrease in accuracy with respect to the original SA mechanism, thanks to the following two characteristics: i) it only needs to compute a small number of base attention maps for a small number of base feature vectors; and ii) the output at each spatial location can be simply obtained by an adaptive weighted average of the outputs calculated from the base attention maps. The high computational efﬁciency of CSA enables the application to high-resolution shallow layers in convolutional neural networks with little additional cost. In addition, CSA makes it practical to further partition the feature maps into groups along the channel dimension and compute attention maps for features in each group separately, thus increasing the diversity of long-range dependencies and accordingly boosting the accuracy. We evaluate the performance of CSA via extensive experiments on two metric learning tasks: person re-identiﬁcation and local descriptor learning. Qualitative and quantitative comparisons with latest methods demonstrate the signiﬁcance of CSA in this topic.

IJCAI Conference 2020 Conference Paper

Compressed Self-Attention for Deep Metric Learning with Low-Rank Approximation

Ziye Chen
Mingming Gong
Lingjuan Ge
Bo Du

In this paper, we apply self-attention (SA) mechanism to boost the performance of deep metric learning. However, due to the pairwise similarity measurement, the cost of storing and manipulating the complete attention maps makes it infeasible for large inputs. To solve this problem, we propose a compressed self-attention with low-rank approximation (CSALR) module, which significantly reduces the computation and memory costs without sacrificing the accuracy. In CSALR, the original attention map is decomposed into a landmark attention map and a combination coefficient map with a small number of landmark feature vectors sampled from the input feature map by average pooling. Thanks to the efficiency of CSALR, we can apply CSALR to high-resolution shallow convolutional layers and implement a multi-head form of CSALR, which further boosts the performance. We evaluate the proposed CSALR on person reidentification which is a typical metric learning task. Extensive experiments shows the effectiveness and efficiency of CSALR in deep metric learning and its superiority over the baselines.

PDF Details DOI

NeurIPS Conference 2020 Conference Paper

Domain Adaptation as a Problem of Inference on Graphical Models

Kun Zhang
Mingming Gong
Petar Stojanov
Biwei Huang
Qingsong Liu
Clark Glymour

This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i. e. , what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation.

NeurIPS Conference 2020 Conference Paper

Domain Generalization via Entropy Regularization

Shanshan Zhao
Mingming Gong
Tongliang Liu
Huan Fu
Dacheng Tao

Domain generalization aims to learn from multiple source domains a predictive model that can generalize to unseen target domains. One essential problem in domain generalization is to learn discriminative domain-invariant features. To arrive at this, some methods introduce a domain discriminator through adversarial learning to match the feature distributions in multiple source domains. However, adversarial training can only guarantee that the learned features have invariant marginal distributions, while the invariance of conditional distributions is more important for prediction in new domains. To ensure the conditional invariance of learned features, we propose an entropy regularization term that measures the dependency between the learned features and the class labels. Combined with the typical task-related loss, e. g. , cross-entropy loss for classification, and adversarial loss for domain discrimination, our overall objective is guaranteed to learn conditional-invariant features across all source domains and thus can learn classifiers with better generalization capabilities. We demonstrate the effectiveness of our method through comparison with state-of-the-art methods on both simulated and real-world datasets. Code is available at: https: //github. com/sshan-zhao/DG via ER.

NeurIPS Conference 2020 Conference Paper

Dual T: Reducing Estimation Error for Transition Matrix in Label-noise Learning

Yu Yao
Tongliang Liu
Bo Han
Mingming Gong
Jiankang Deng
Gang Niu
Masashi Sugiyama

The transition matrix, denoting the transition relationship from clean labels to noisy labels, is essential to build statistically consistent classifiers in label-noise learning. Existing methods for estimating the transition matrix rely heavily on estimating the noisy class posterior. However, the estimation error for noisy class posterior could be large because of the randomness of label noise. The estimation error would lead the transition matrix to be poorly estimated. Therefore in this paper, we aim to solve this problem by exploiting the divide-and-conquer paradigm. Specifically, we introduce an intermediate class to avoid directly estimating the noisy class posterior. By this intermediate class, the original transition matrix can then be factorized into the product of two easy-to-estimated transition matrices. We term the proposed method as the dual $T$-estimator. Both theoretical analyses and empirical results illustrate the effectiveness of the dual $T$-estimator for estimating transition matrices, leading to better classification performances.

AAAI Conference 2020 Conference Paper

Generative-Discriminative Complementary Learning

Yanwu Xu
Mingming Gong
Junxiang Chen
Tongliang Liu
Kun Zhang
Kayhan Batmanghelich

The majority of state-of-the-art deep learning methods are discriminative approaches, which model the conditional distribution of labels given inputs features. The success of such approaches heavily depends on high-quality labeled instances, which are not easy to obtain, especially as the number of candidate classes increases. In this paper, we study the complementary learning problem. Unlike ordinary labels, complementary labels are easy to obtain because an annotator only needs to provide a yes/no answer to a randomly chosen candidate class for each instance. We propose a generative-discriminative complementary learning method that estimates the ordinary labels by modeling both the conditional (discriminative) and instance (generative) distributions. Our method, we call Complementary Conditional GAN (CCGAN), improves the accuracy of predicting ordinary labels and is able to generate high-quality instances in spite of weak supervision. In addition to the extensive empirical studies, we also theoretically show that our model can retrieve the true conditional distribution from the complementarilylabeled data.

NeurIPS Conference 2020 Conference Paper

Hard Example Generation by Texture Synthesis for Cross-domain Shape Similarity Learning

Huan Fu
Shunming Li
Rongfei Jia
Mingming Gong
Binqiang Zhao
Dacheng Tao

Image-based 3D shape retrieval (IBSR) aims to find the corresponding 3D shape of a given 2D image from a large 3D shape database. The common routine is to map 2D images and 3D shapes into an embedding space and define (or learn) a shape similarity measure. While metric learning with some adaptation techniques seems to be a natural solution to shape similarity learning, the performance is often unsatisfactory for fine-grained shape retrieval. In the paper, we identify the source of the poor performance and propose a practical solution to this problem. We find that the shape difference between a negative pair is entangled with the texture gap, making metric learning ineffective in pushing away negative pairs. To tackle this issue, we develop a geometry-focused multi-view metric learning framework empowered by texture synthesis. The synthesis of textures for 3D shape models creates hard triplets, which suppress the adverse effects of rich texture in 2D images, thereby push the network to focus more on discovering geometric characteristics. Our approach shows state-of-the-art performance on a recently released large-scale 3D-FUTURE [1] repository, as well as three widely studied benchmarks, including Pix3D [2], Stanford Cars [3], and Comp Cars [4]. Codes will be made publicly available at: https: //github. com/3D-FRONT-FUTURE/IBSR-texture.

ICML Conference 2020 Conference Paper

Label-Noise Robust Domain Adaptation

Xiyu Yu
Tongliang Liu
Mingming Gong
Kun Zhang 0001
Kayhan Batmanghelich
Dacheng Tao

Domain adaptation aims to correct the classifiers when faced with distribution shift between source (training) and target (test) domains. State-of-the-art domain adaptation methods make use of deep networks to extract domain-invariant representations. However, existing methods assume that all the instances in the source domain are correctly labeled; while in reality, it is unsurprising that we may obtain a source domain with noisy labels. In this paper, we are the first to comprehensively investigate how label noise could adversely affect existing domain adaptation methods in various scenarios. Further, we theoretically prove that there exists a method that can essentially reduce the side-effect of noisy source labels in domain adaptation. Specifically, focusing on the generalized target shift scenario, where both label distribution $P_Y$ and the class-conditional distribution $P_{X|Y}$ can change, we discover that the denoising Conditional Invariant Component (DCIC) framework can provably ensures (1) extracting invariant representations given examples with noisy labels in the source domain and unlabeled examples in the target domain and (2) estimating the label distribution in the target domain with no bias. Experimental results on both synthetic and real-world data verify the effectiveness of the proposed method.

ICML Conference 2020 Conference Paper

LTF: A Label Transformation Framework for Correcting Label Shift

Jiaxian Guo
Mingming Gong
Tongliang Liu
Kun Zhang 0001
Dacheng Tao

Distribution shift is a major obstacle to the deployment of current deep learning models on real-world problems. Let $Y$ be the class label and $X$ the features. We focus on one type of distribution shift, \emph{ label shift}, where the label marginal distribution $P_Y$ changes but the conditional distribution $P_{X|Y}$ does not. Most existing methods estimate the density ratio between the source- and target-domain label distributions by density matching. However, these methods are either computationally infeasible for large-scale data or restricted to shift correction for discrete labels. In this paper, we propose an end-to-end Label Transformation Framework (LTF) for correcting label shift, which implicitly models the shift of $P_Y$ and the conditional distribution $P_{X|Y}$ using neural networks. Thanks to the flexibility of deep networks, our framework can handle continuous, discrete, and even multi-dimensional labels in a unified way and is scalable to large data. Moreover, for high dimensional $X$, such as images, we find that the redundant information in $X$ severely degrades the estimation accuracy. To remedy this issue, we propose to match the distribution implied by our generative model and the target-domain distribution in a low-dimensional feature space that discards information irrelevant to $Y$. Both theoretical and empirical studies demonstrate the superiority of our method over previous approaches.

NeurIPS Conference 2020 Conference Paper

Part-dependent Label Noise: Towards Instance-dependent Label Noise

Xiaobo Xia
Tongliang Liu
Bo Han
Nannan Wang
Mingming Gong
Haifeng Liu
Gang Niu
Dacheng Tao

Learning with the \textit{instance-dependent} label noise is challenging, because it is hard to model such real-world noise. Note that there are psychological and physiological evidences showing that we humans perceive instances by decomposing them into parts. Annotators are therefore more likely to annotate instances based on the parts rather than the whole instances, where a wrong mapping from parts to classes may cause the instance-dependent label noise. Motivated by this human cognition, in this paper, we approximate the instance-dependent label noise by exploiting \textit{part-dependent} label noise. Specifically, since instances can be approximately reconstructed by a combination of parts, we approximate the instance-dependent \textit{transition matrix} for an instance by a combination of the transition matrices for the parts of the instance. The transition matrices for parts can be learned by exploiting anchor points (i. e. , data points that belong to a specific class almost surely). Empirical evaluations on synthetic and real-world datasets demonstrate our method is superior to the state-of-the-art approaches for learning from the instance-dependent label noise.

ICML Conference 2019 Conference Paper

Causal Discovery and Forecasting in Nonstationary Environments with State-Space Models

Biwei Huang
Kun Zhang 0001
Mingming Gong
Clark Glymour

In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify the causal structure, and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.

NeurIPS Conference 2019 Conference Paper

Likelihood-Free Overcomplete ICA and Applications In Causal Discovery

Chenwei DING
Mingming Gong
Kun Zhang
Dacheng Tao

Causal discovery witnessed significant progress over the past decades. In particular, many recent causal discovery methods make use of independent, non-Gaussian noise to achieve identifiability of the causal models. Existence of hidden direct common causes, or confounders, generally makes causal discovery more difficult; whenever they are present, the corresponding causal discovery algorithms can be seen as extensions of overcomplete independent component analysis (OICA). However, existing OICA algorithms usually make strong parametric assumptions on the distribution of independent components, which may be violated on real data, leading to sub-optimal or even wrong solutions. In addition, existing OICA algorithms rely on the Expectation Maximization (EM) procedure that requires computationally expensive inference of the posterior distribution of independent components. To tackle these problems, we present a Likelihood-Free Overcomplete ICA algorithm (LFOICA) that estimates the mixing matrix directly by back-propagation without any explicit assumptions on the density function of independent components. Thanks to its computational efficiency, the proposed method makes a number of causal discovery procedures much more practically feasible. For illustrative purposes, we demonstrate the computational efficiency and efficacy of our method in two causal discovery tasks on both synthetic and real data.

NeurIPS Conference 2019 Conference Paper

Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering

Biwei Huang
Kun Zhang
Pengtao Xie
Mingming Gong
Eric Xing
Clark Glymour

State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.

NeurIPS Conference 2019 Conference Paper

Twin Auxilary Classifiers GAN

Mingming Gong
Yanwu Xu
Chunyuan Li
Kun Zhang
Kayhan Batmanghelich

Conditional generative models enjoy significant progress over the past few years. One of the popular conditional models is Auxiliary Classifier GAN (AC-GAN) that generates highly discriminative images by extending the loss function of GAN with an auxiliary classifier. However, the diversity of the generated samples by AC-GAN tends to decrease as the number of classes increases. In this paper, we identify the source of low diversity issue theoretically and propose a practical solution to the problem. We show that the auxiliary classifier in AC-GAN imposes perfect separability, which is disadvantageous when the supports of the class distributions have significant overlap. To address the issue, we propose Twin Auxiliary Classifiers Generative Adversarial Net (TAC-GAN) that adds a new player that interacts with other players (the generator and the discriminator) in GAN. Theoretically, we demonstrate that our TAC-GAN can effectively minimize the divergence between generated and real data distributions. Extensive experimental results show that our TAC-GAN can successfully replicate the true data distributions on simulated data, and significantly improves the diversity of class-conditional image generation on real datasets.

UAI Conference 2018 Conference Paper

Causal Discovery with Linear Non-Gaussian Models under Measurement Error: Structural Identifiability Results

Kun Zhang 0001
Mingming Gong
Joseph D. Ramsey
Kayhan Batmanghelich
Peter Spirtes
Clark Glymour

Causal discovery methods aim to recover the causal process that generated purely observational data. Despite its successes on a number of real problems, the presence of measurement error in the observed data can produce serious mistakes in the output of various causal discovery methods. Given the ubiquity of measurement error caused by instruments or proxies used in the measuring process, this problem is one of the main obstacles to reliable causal discovery. It is still unknown to what extent the causal structure of relevant variables can be identified in principle. This study aims to take a step towards filling that void. We assume that the underlining process or the measurement-error free variables follows a linear, non-Guassian causal model, and show that the so-called ordered group decomposition of the causal model, which contains major causal information, is identifiable. The causal structure identifiability is further improved with diﬀerent types of sparsity constraints on the causal structure. Finally, we give rather mild conditions under which the whole causal structure is fully identifiable.

AAAI Conference 2018 Conference Paper

Domain Generalization via Conditional Invariant Representations

Ya Li
Mingming Gong
Xinmei Tian
Tongliang Liu
Dacheng Tao

Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difﬁculty comes from the dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y |X) is usually assumed to be stable across domains such that P(Y |h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y |X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y ). With the conditional invariant representation, the invariance of the joint distribution P(h(X), Y ) can be guaranteed if the class prior P(Y ) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.

NeurIPS Conference 2018 Conference Paper

Modeling Dynamic Missingness of Implicit Feedback for Recommendation

Menghan Wang
Mingming Gong
Xiaolin Zheng
Kun Zhang

Implicit feedback is widely used in collaborative filtering methods for recommendation. It is well known that implicit feedback contains a large number of values that are \emph{missing not at random} (MNAR); and the missing data is a mixture of negative and unknown feedback, making it difficult to learn user's negative preferences. Recent studies modeled \emph{exposure}, a latent missingness variable which indicates whether an item is missing to a user, to give each missing entry a confidence of being negative feedback. However, these studies use static models and ignore the information in temporal dependencies among items, which seems to be a essential underlying factor to subsequent missingness. To model and exploit the dynamics of missingness, we propose a latent variable named ``\emph{user intent}'' to govern the temporal changes of item missingness, and a hidden Markov model to represent such a process. The resulting framework captures the dynamic item missingness and incorporate it into matrix factorization (MF) for recommendation. We also explore two types of constraints to achieve a more compact and interpretable representation of \emph{user intents}. Experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommender systems.

UAI Conference 2017 Conference Paper

Causal Discovery from Temporally Aggregated Time Series

Mingming Gong
Kun Zhang 0001
Bernhard Schölkopf
Clark Glymour
Dacheng Tao

Discovering causal structure of a dynamical system from observed time series is a traditional and important problem. In many practical applications, observed data are obtained by applying subsampling or temporally aggregation to the original causal processes, making it difficult to discover the underlying causal relations. Subsampling refers to the procedure that for every k consecutive observations, one is kept, the rest being skipped, and recently some advances have been made in causal discovery from such data. With temporal aggregation, the local averages or sums of k consecutive, non-overlapping observations in the causal process are computed as new observations, and causal discovery from such data is even harder. In this paper, we investigate how to recover causal relations at the original causal frequency from temporally aggregated data when k is known. Assuming the time series at the causal frequency follows a vector autoregressive (VAR) model, we show that the causal structure at the causal frequency is identifiable from aggregated time series if the noise terms are independent and non-Gaussian and some other technical conditions hold. We then present an estimation method based on non-Gaussian state-space modeling and evaluate its performance on both synthetic and real data.

ICML Conference 2016 Conference Paper

Domain Adaptation with Conditional Transferable Components

Mingming Gong
Kun Zhang 0001
Tongliang Liu
Dacheng Tao
Clark Glymour
Bernhard Schölkopf

Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively, previous work on domain adaptation considers the covariate shift situation where the distribution of the features P(X) changes across domains while the conditional distribution P(Y|X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components \mathcalT(X) that have similar P(\mathcalT(X)) by explicitly minimizing a distribution discrepancy measure. However, it is not clear if P(Y|\mathcalT(X)) in different domains is also similar when P(Y|X) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where P(X|Y) and P(Y) both change in a causal system in which Y is the cause for X. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution P(\mathcalT(X)|Y) is invariant after proper location-scale (LS) transformations, and identify how P(Y) changes between domains simultaneously. We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.

ICML Conference 2015 Conference Paper

Causal Inference by Identification of Vector Autoregressive Processes with Hidden Components

Philipp Geiger
Kun Zhang 0001
Bernhard Schölkopf
Mingming Gong
Dominik Janzing

A widely applied approach to causal inference from a time series X, often referred to as “(linear) Granger causal analysis”, is to simply regress present on past and interpret the regression matrix \hatB causally. However, if there is an unmeasured time series Z that influences X, then this approach can lead to wrong causal conclusions, i. e. , distinct from those one would draw if one had additional information such as Z. In this paper we take a different approach: We assume that X together with some hidden Z forms a first order vector autoregressive (VAR) process with transition matrix A, and argue why it is more valid to interpret A causally instead of \hatB. Then we examine under which conditions the most important parts of A are identifiable or almost identifiable from only X. Essentially, sufficient conditions are (1) non-Gaussian, independent noise or (2) no influence from X to Z. We present two estimation algorithms that are tailored towards conditions (1) and (2), respectively, and evaluate them on synthetic and real-world data. We discuss how to check the model using X.

ICML Conference 2015 Conference Paper

Discovering Temporal Causal Relations from Subsampled Data

Mingming Gong
Kun Zhang 0001
Bernhard Schölkopf
Dacheng Tao
Philipp Geiger

Granger causal analysis has been an important tool for causal analysis for time series in various fields, including neuroscience and economics, and recently it has been extended to include instantaneous effects between the time series to explain the contemporaneous dependence in the residuals. In this paper, we assume that the time series at the true causal frequency follow the vector autoregressive model. We show that when the data resolution becomes lower due to subsampling, neither the original Granger causal analysis nor the extended one is able to discover the underlying causal relations. We then aim to answer the following question: can we estimate the temporal causal relations at the right causal frequency from the subsampled data? Traditionally this suffers from the identifiability problems: under the Gaussianity assumption of the data, the solutions are generally not unique. We prove that, however, if the noise terms are non-Gaussian, the underlying model for the high frequency data is identifiable from subsampled data under mild conditions. We then propose an Expectation-Maximization (EM) approach and a variational inference approach to recover temporal causal relations from such subsampled data. Experimental results on both simulated and real data are reported to illustrate the performance of the proposed approaches.

AAAI Conference 2015 Conference Paper

Multi-Source Domain Adaptation: A Causal View

Kun Zhang
Mingming Gong
Bernhard Schoelkopf