Arrow Research search

Author name cluster

Ruichu Cai

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

AAAI Conference 2026 Conference Paper

CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge

  • Lei Zan
  • Keli Zhang
  • Ruichu Cai
  • Lujia Pan

Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning, a challenge fundamentally rooted in deep structural dependencies. To address this challenge, we propose CAusal MAthematician (CAMA), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure. In the learning stage, CAMA first constructs the Mathematical Causal Graph (MCG), a high-level representation of solution strategies, by combining LLM priors with causal discovery algorithms applied to a corpus of question-solution pairs. The resulting MCG encodes essential knowledge points and their causal dependencies. To better align the graph with downstream reasoning tasks, CAMA further refines the MCG through iterative feedback derived from a selected subset of the question-solution pairs. In the reasoning stage, given a new question, CAMA dynamically extracts a task-relevant subgraph from the MCG, conditioned on both the question content and the LLM’s intermediate reasoning trace. This subgraph, which encodes the most pertinent knowledge points and their causal dependencies, is then injected back into the LLM to guide its reasoning process. Empirical results on real-world datasets show that CAMA significantly improves LLM performance on challenging mathematical problems. Furthermore, our experiments demonstrate that structured guidance consistently outperforms unstructured alternatives, and that incorporating asymmetric causal relationships yields greater improvements than using symmetric associations alone.

AAAI Conference 2026 Conference Paper

Horizontal and Vertical Federated Causal Structure Learning via Higher-order Cumulants

  • Wei Chen
  • Wanyang Gu
  • Linjun Peng
  • Ting Yan
  • Ruichu Cai
  • Zhifeng Hao
  • Kun Zhang

Federated causal discovery aims to uncover causal relationships while protecting data privacy, with significant real-world applications. Existing methods focus on horizontal federated settings where clients share the same variables but have different samples. However, in practice, clients may have different variables, leading to spurious causal relationships. To address this issue, we comprehensively consider causal structure learning methods under both horizontal and vertical federated settings. Interestingly, we find that higher-order cumulants rely solely on the joint distribution of the relevant variables and are useful for solving the above problem in the linear non-Gaussian case. This motivates us to provide the identification theories for determining the causal order over observed variables, leveraging the difference in the product of the (cross) cumulants of the specific variables. Based on these theories, we develop a method for learning causal order in the horizontal and vertical federated scenarios. Specifically, we first obtain local (cross) cumulant matrices of observed variables from all participating clients to construct a global cumulant matrix. This global cumulant matrix is then used for recursive source variable identification, ultimately yielding a causal strength matrix of the union of variables from all clients. Our algorithm demonstrates superior performance in experiments on both synthetic and real-world data.
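The core statistic here, a (cross-)cumulant that each client can estimate locally and a server can pool, is easy to sketch in plain NumPy. The snippet below is only an illustration of the horizontal setting under assumed zero-mean linear non-Gaussian data, not the paper's identification algorithm; the function names are hypothetical.

```python
import numpy as np

def local_third_cumulants(X):
    # For (approximately) zero-mean data, the third-order cross-cumulant
    # cum(x_i, x_j, x_j) reduces to E[x_i * x_j^2].
    Xc = X - X.mean(axis=0)
    return (Xc.T @ (Xc ** 2)) / len(Xc)

def pool_cumulants(client_stats):
    # Horizontal setting: clients share variables but differ in samples,
    # so local estimates are pooled weighted by sample count.
    total = sum(n for _, n in client_stats)
    return sum(C * n for C, n in client_stats) / total

rng = np.random.default_rng(0)
n = 5000
e1 = rng.exponential(1.0, n) - 1.0   # skewed (non-Gaussian) noise
e2 = rng.exponential(1.0, n) - 1.0
x1 = e1                              # causal pair: x1 -> x2 with weight 2
x2 = 2.0 * x1 + e2
data = np.column_stack([x1, x2])

clients = [data[:2500], data[2500:]]
C = pool_cumulants([(local_third_cumulants(X), len(X)) for X in clients])
# The matrix is asymmetric: C[0, 1] = E[x1 * x2^2] differs from
# C[1, 0] = E[x2 * x1^2]; such asymmetries are what higher-order
# cumulant methods exploit to determine causal order.
print(C)
```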

IJCAI Conference 2025 Conference Paper

Causal View of Time Series Imputation: Some Identification Results on Missing Mechanism

  • Ruichu Cai
  • Kaitao Zheng
  • Junxian Huang
  • Zijian Li
  • Zhengming Chen
  • Boyan Xu
  • Zhifeng Hao

Time series imputation is one of the most challenging problems and has broad applications in various fields like health care and the Internet of Things. Existing methods mainly aim to model the temporal latent dependencies and the generation process from the observed time series data. In real-world scenarios, different types of missing mechanisms, like MAR (Missing At Random) and MNAR (Missing Not At Random), can occur in time series data. However, existing methods often overlook the difference among the aforementioned missing mechanisms and use a single model for time series imputation, which can easily lead to misleading results due to mechanism mismatching. In this paper, we propose a framework for the time series imputation problem by exploring Different Missing Mechanisms (DMM in short) and tailoring solutions accordingly. Specifically, we first analyze the data generation processes with temporal latent states and missing cause variables for different mechanisms. Subsequently, we model these generation processes via variational inference and estimate prior distributions of latent variables via a normalizing flow-based neural architecture. Furthermore, we establish identifiability results under the nonlinear independent component analysis framework to show that latent variables are identifiable. Experimental results show that our method surpasses existing time series imputation techniques across various datasets with different missing mechanisms, demonstrating its effectiveness in real-world applications.
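To make the MAR/MNAR distinction concrete, here is a small NumPy sketch (illustrative only, not the paper's DMM model) that generates both missingness mechanisms on the same data. Under MNAR the observed values are a biased sample of the true values, which is why a single mechanism-agnostic imputation model can mislead.

```python
import numpy as np

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)                                    # value that may go missing
z = 0.8 * np.roll(x, 1) + rng.normal(scale=0.6, size=n)   # always-observed covariate

mar_mask = rng.random(n) < sigmoid(2 * z)    # MAR: P(missing) depends on z only
mnar_mask = rng.random(n) < sigmoid(2 * x)   # MNAR: P(missing) depends on x itself

full_mean = x.mean()
mar_mean = x[~mar_mask].mean()    # ~unbiased: missingness ignores x
mnar_mean = x[~mnar_mask].mean()  # biased low: large values tend to vanish
print(full_mean, mar_mean, mnar_mean)
```

An imputer fit only to the MNAR-observed values would systematically underestimate the missing entries, since it never sees the mechanism that removed the large ones.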

IJCAI Conference 2025 Conference Paper

Causal-aware Large Language Models: Enhancing Decision-Making Through Learning, Adapting and Acting

  • Wei Chen
  • Jiahao Zhang
  • Haipeng Zhu
  • Boyan Xu
  • Zhifeng Hao
  • Keli Zhang
  • Junjian Ye
  • Ruichu Cai

Large language models (LLMs) have shown great potential in decision-making due to the vast amount of knowledge stored within the models. However, these pre-trained models often lack reasoning abilities and are difficult to adapt to new environments, further hindering their application to complex real-world tasks. To address these challenges, inspired by the human cognitive process, we propose Causal-Aware LLMs, which integrate the structural causal model (SCM) into the decision-making process to model, update, and utilize structured knowledge of the environment in a "learning-adapting-acting" paradigm. Specifically, in the learning stage, we first utilize an LLM to extract the environment-specific causal entities and their causal relations to initialize a structured causal model of the environment. Subsequently, in the adapting stage, we update the structured causal model through external feedback about the environment, via the idea of causal intervention. Finally, in the acting stage, Causal-Aware LLMs exploit structured causal knowledge for more efficient policy-making through the reinforcement learning agent. The above processes are performed iteratively to learn causal knowledge, ultimately enabling the causal-aware LLM to achieve a more accurate understanding of the environment and make more efficient decisions. Experimental results across 22 diverse tasks within the open-world game "Crafter" validate the effectiveness of our proposed method.

AAAI Conference 2025 Conference Paper

Disentangling Long-Short Term State Under Unknown Interventions for Online Time Series Forecasting

  • Ruichu Cai
  • Haiqin Huang
  • Zhifan Jiang
  • Zijian Li
  • Changze Zhou
  • Yuequn Liu
  • Yuming Liu
  • Zhifeng Hao

Current methods for time series forecasting struggle in the online scenario, since it is difficult to preserve long-term dependency while adapting to short-term changes when data are arriving sequentially. Although some recent methods solve this problem by controlling the updates of latent states, they cannot disentangle the long/short-term states, leading to the inability to effectively adapt to nonstationarity. To tackle this challenge, we propose a general framework to disentangle long/short-term states for online time series forecasting. Our idea is inspired by the observations where short-term changes can be led by unknown interventions like abrupt policies in the stock market. Based on this insight, we formalize a data generation process with unknown interventions on short-term states. Under mild assumptions, we further leverage the independence of short-term states led by unknown interventions to establish the identification theory to achieve the disentanglement of long/short-term states. Built on this theory, we develop a Long Short-Term Disentanglement model (LSTD) to extract the long/short-term states with long/short-term encoders, respectively. Furthermore, the LSTD model incorporates a smooth constraint to preserve the long-term dependencies and an interrupted dependency constraint to enforce the forgetting of short-term dependencies, together boosting the disentanglement of long/short-term states. Experimental results on several benchmark datasets show that our LSTD model outperforms existing methods for online time series forecasting, validating its efficacy in real-world applications.

AAAI Conference 2025 Conference Paper

Hypergraph Learning for Unsupervised Graph Alignment via Optimal Transport

  • Yuguang Yan
  • Canlin Yang
  • Yuanlin Chen
  • Ruichu Cai
  • Michael Ng

Unsupervised graph alignment aims to find corresponding nodes across different graphs without supervision. Existing methods usually leverage the graph structure to aggregate features of nodes to find relations between nodes. However, the graph structure is inherently limited in pairwise relations between nodes without considering higher-order dependencies among multiple nodes. In this paper, we take advantage of the hypergraph structure to characterize higher-order structural information among nodes for better graph alignment. Specifically, we propose an optimal transport model to learn a hypergraph to capture complex relations among nodes, so that the nodes involved in one hyperedge can be adaptively determined based on local geometric information. In addition, inspired by the Dirichlet energy function of a hypergraph, we further refine our model to enhance the consistency between structural and feature information in each hyperedge. After that, we jointly leverage graphs and hypergraphs to extract structural and feature information to better model the relations between nodes, which is used to find node correspondences across graphs. We conduct experiments on several benchmark datasets with different settings, and the results demonstrate the effectiveness of our proposed method.

ICML Conference 2025 Conference Paper

Identification of Latent Confounders via Investigating the Tensor Ranks of the Nonlinear Observations

  • Zhengming Chen 0002
  • Yewei Xia
  • Feng Xie 0002
  • Jie Qiao
  • Zhifeng Hao
  • Ruichu Cai
  • Kun Zhang 0001

We study the problem of learning discrete latent variable causal structures from mixed-type observational data. Traditional methods, such as those based on the tensor rank condition, are designed to identify discrete latent structure models and provide robust identification bounds for discrete causal models. However, when observed variables—specifically, those representing the children of latent variables—are collected at various levels with continuous data types, the tensor rank condition is not applicable, limiting further causal structure learning for latent variables. In this paper, we consider a more general case where observed variables can be either continuous or discrete, and further allow for scenarios where multiple latent parents cause the same set of observed variables. We show that, under the completeness condition, it is possible to discretize the data in a way that satisfies the full-rank assumption required by the tensor rank condition. This enables the identifiability of discrete latent structure models within mixed-type observational data. Moreover, we introduce the two-sufficient measurement condition, a more general structural assumption under which the tensor rank condition holds and the underlying latent causal structure is identifiable by a proposed two-stage identification algorithm. Extensive experiments on both simulated and real-world data validate the effectiveness of our method.

IJCAI Conference 2025 Conference Paper

Long-Term Individual Causal Effect Estimation via Identifiable Latent Representation Learning

  • Ruichu Cai
  • Junjie Wan
  • Weilin Chen
  • Zeqin Yang
  • Zijian Li
  • Peng Zhen
  • Jiecheng Guo

Estimating long-term causal effects by combining long-term observational and short-term experimental data is a crucial but challenging problem in many real-world scenarios. In existing methods, several ideal assumptions, e.g., the latent unconfoundedness assumption or the additive equi-confounding bias assumption, are proposed to address the latent confounder problem raised by the observational data. However, in real-world applications, these assumptions are typically violated, which limits their practical effectiveness. In this paper, we tackle the problem of estimating the long-term individual causal effects without the aforementioned assumptions. Specifically, we propose to utilize the natural heterogeneity of data, such as data from multiple sources, to identify latent confounders, thereby significantly avoiding reliance on idealized assumptions. Practically, we devise a latent representation learning-based estimator of long-term causal effects. Theoretically, we establish the identifiability of latent confounders, with which we further achieve long-term effect identification. Extensive experimental studies, conducted on multiple synthetic and semi-synthetic datasets, demonstrate the effectiveness of our proposed method.

TIST Journal 2025 Journal Article

Modeling Multi-Seasonal Multi-Behavior Dependency for Temporal Recommendation

  • Shichao Liang
  • Wen Wen
  • Yali Feng
  • Ruichu Cai
  • Zhifeng Hao

Mining temporal patterns from user behaviors has long been investigated, but most of the existing work centers on single-type user–item interactions, such as purchase or click, which fails to take advantage of the user’s diversified interests revealed by various types of behavior. However, capturing patterns from different behavior sequences and modeling the complex inter-correlation between them are non-trivial tasks, as the high sparsity of type-related interactions, multi-seasonality of individual behaviors, and time-variant dependency of multi-type activities make it really challenging. To address these challenges, we propose a novel framework that aims to model the Multi-Seasonal Multi-Behavior Dependencies (MMDep) both within and across the multi-type behavior sequences. In the proposed model, an item co-occurrence matrix factorization strategy is introduced to alleviate the sparsity issue in type-related behavior sequences. A temporal dependency module that incorporates a multi-scale EMA mechanism is then utilized to capture the multi-seasonal dependencies within individual sequences. Moreover, a cross-behavior dependency module is employed to learn the time-variant dependency among different behaviors. Extensive experiments on three real-world datasets demonstrate that the proposed MMDep performs significantly better than the state-of-the-art baselines, and it may provide some new insights and tools on how to leverage multi-behavior data for better temporal recommendation.

ICLR Conference 2025 Conference Paper

On the Identification of Temporal Causal Representation with Instantaneous Dependence

  • Zijian Li 0001
  • Yifan Shen 0004
  • Kaitao Zheng
  • Ruichu Cai
  • Xiangchen Song
  • Mingming Gong
  • Guangyi Chen 0002
  • Kun Zhang 0001

Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observations, which are in general difficult to obtain in real-world scenarios. To fill this gap, we propose an IDentification framework for instantaneOus Latent dynamics (IDOL) by imposing a sparse influence constraint that the latent causal processes have sparse time-delayed and instantaneous relations. Specifically, we establish identifiability results of the latent causal process based on sufficient variability and the sparse influence constraint by employing contextual information of time series data. Based on these theories, we incorporate a temporally variational inference architecture to estimate the latent variables and a gradient-based sparsity regularization to identify the latent causal process. Experimental results on simulation datasets illustrate that our method can identify the latent causal process. Furthermore, evaluations on multiple human motion forecasting benchmarks with instantaneous dependencies indicate the effectiveness of our method in real-world settings.

NeurIPS Conference 2025 Conference Paper

Online Time Series Forecasting with Theoretical Guarantees

  • Zijian Li
  • Changze Zhou
  • Minghao Fu
  • Sanjay Manjunath
  • Fan Feng
  • Guangyi Chen
  • Yingyao Hu
  • Ruichu Cai

This paper is concerned with online time series forecasting, where unknown distribution shifts occur over time, i.e., latent variables influence the mapping from historical to future observations. To develop an automated way of online time series forecasting, we propose a Theoretical framework for Online Time-series forecasting (TOT in short) with theoretical guarantees. Specifically, we prove that supplying a forecaster with latent variables tightens the Bayes risk—the benefit endures under estimation uncertainty of latent variables and grows as the latent variables achieve a more precise identifiability. To better introduce latent variables into online forecasting algorithms, we further propose to identify latent variables with minimal adjacent observations. Based on these results, we devise a model-agnostic blueprint by employing a temporal decoder to match the distribution of observed variables and two independent noise estimators to model the causal inference of latent variables and mixing procedures of observed variables, respectively. Experimental results on synthetic data support our theoretical claims. Moreover, plug-in implementations built on several baselines yield general improvement across multiple benchmarks, highlighting its effectiveness in real-world applications.

ICML Conference 2025 Conference Paper

Reducing Confounding Bias without Data Splitting for Causal Inference via Optimal Transport

  • Yuguang Yan
  • Zongyu Li
  • Haolin Yang
  • Zeqin Yang
  • Hao Zhou
  • Ruichu Cai
  • Zhifeng Hao

Causal inference seeks to estimate the effect of a treatment, such as a medicine or the dosage of a medication. To reduce the confounding bias caused by the non-randomized treatment assignment, most existing methods reduce the shift between subpopulations receiving different treatments. However, these methods split limited training samples into smaller groups, which cuts down the number of samples in each group, while precise distribution estimation and alignment highly rely on a sufficient number of training samples. In this paper, we propose a distribution alignment paradigm without data splitting, which can be naturally applied in the settings of binary and continuous treatments. To this end, we characterize the confounding bias by considering different probability measures of the same set including all the training samples, and exploit the optimal transport theory to analyze the confounding bias and outcome estimation error. Based on this, we propose to learn balanced representations by reducing the bias between the marginal distribution and the conditional distribution of a treatment. As a result, data reduction caused by splitting is avoided, and the outcome prediction model trained on one treatment group can be generalized to the entire population. The experiments on both binary and continuous treatment settings demonstrate the effectiveness of our method.

AAMAS Conference 2025 Conference Paper

Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

  • Jiafan Zhuang
  • Gaofei Han
  • Zihao Xia
  • Che Lin
  • Boxi Wang
  • Dongliang Wang
  • Wenji Li
  • Zhifeng Hao

Collision avoidance navigation for unmanned aerial vehicle (UAV) swarms in complex and unseen outdoor environments presents a significant challenge, as UAVs are required to navigate through various obstacles and intricate backgrounds. While existing deep reinforcement learning (DRL)-based collision avoidance methods have shown promising performance, they often suffer from poor generalization, leading to degraded performance in unseen environments. To address this limitation, we investigate the root causes of weak generalization in DRL models and propose a novel causal feature selection module. This module can be integrated into the policy network to effectively filter out non-causal factors in representations, thereby minimizing the impact of spurious correlations between non-causal elements and action predictions. Experimental results demonstrate that the proposed method achieves robust navigation performance and effective collision avoidance, particularly in scenarios with unseen backgrounds and obstacles, which significantly outperforms state-of-the-art (SOTA) algorithms.

ICLR Conference 2025 Conference Paper

Synergy Between Sufficient Changes and Sparse Mixing Procedure for Disentangled Representation Learning

  • Zijian Li 0001
  • Shunxing Fan
  • Yujia Zheng 0001
  • Ignavier Ng
  • Shaoan Xie
  • Guangyi Chen 0002
  • Xinshuai Dong
  • Ruichu Cai

Disentangled representation learning aims to uncover the latent variables underlying observed data, yet identifying these variables under mild assumptions remains challenging. Some methods rely on sufficient changes in the distribution of latent variables indicated by auxiliary variables, such as domain indices, but acquiring enough domains is often impractical. Alternative approaches exploit the structural sparsity assumption on mixing processes, but this constraint may not hold in practice. Interestingly, we find that these two seemingly unrelated assumptions can actually complement each other. Specifically, when conditioned on auxiliary variables, the sparse mixing process induces independence between latent and observed variables, which simplifies the mapping from estimated to true latent variables and hence compensates for deficiencies of auxiliary variables. Building on this insight, we propose an identifiability theory with less restrictive constraints regarding the auxiliary variables and the sparse mixing process, enhancing applicability to real-world scenarios. Additionally, we develop a generative model framework incorporating a domain encoding network and a sparse mixing constraint and provide two implementations based on variational autoencoders and generative adversarial networks. Experimental results on synthetic and real-world datasets support our theoretical results.

NeurIPS Conference 2025 Conference Paper

Towards Identifiability of Hierarchical Temporal Causal Representation Learning

  • Zijian Li
  • Minghao Fu
  • Junxian Huang
  • Yifan Shen
  • Ruichu Cai
  • Yuewen Sun
  • Guangyi Chen
  • Kun Zhang

Modeling hierarchical latent dynamics behind time series data is critical for capturing temporal dependencies across multiple levels of abstraction in real-world tasks. However, existing temporal causal representation learning methods fail to capture such dynamics, as they cannot recover the joint distribution of hierarchical latent variables from single-timestep observed variables. Interestingly, we find that the joint distribution of hierarchical latent variables can be uniquely determined using three conditionally independent observations. Building on this insight, we propose a Causally Hierarchical Latent Dynamic (CHiLD) identification framework. Our approach first employs temporal contextual observed variables to identify the joint distribution of multi-layer latent variables. Sequentially, we exploit the natural sparsity of the hierarchical structure among latent variables to identify latent variables within each layer. Guided by the theoretical results, we develop a time series generative model grounded in variational inference. This model incorporates a contextual encoder to reconstruct multi-layer latent variables and normalizing flow-based hierarchical prior networks to impose the independent noise condition of hierarchical latent dynamics. Empirical evaluations on both synthetic and real-world datasets validate our theoretical claims and demonstrate the effectiveness of CHiLD in modeling hierarchical latent dynamics.

AAAI Conference 2024 Conference Paper

An Optimal Transport View for Subspace Clustering and Spectral Clustering

  • Yuguang Yan
  • Zhihao Xu
  • Canlin Yang
  • Jie Zhang
  • Ruichu Cai
  • Michael Kwok-Po Ng

Clustering is one of the most fundamental problems in machine learning and data mining, and many algorithms have been proposed in the past decades. Among them, subspace clustering and spectral clustering are the most famous approaches. In this paper, we provide an explanation for subspace clustering and spectral clustering from the perspective of optimal transport. Optimal transport studies how to move samples from one distribution to another distribution with minimal transport cost, and has shown a powerful ability to extract geometric information. By considering a self optimal transport model with only one group of samples, we observe that both subspace clustering and spectral clustering can be explained in the framework of optimal transport, and the optimal transport matrix bridges the spaces of features and spectral embeddings. Inspired by this connection, we propose a spectral optimal transport barycenter model, which learns spectral embeddings by solving a barycenter problem equipped with an optimal transport discrepancy and guidance of data. Based on our proposed model, we take advantage of optimal transport to exploit both feature and metric information involved in data for learning coupled spectral embeddings and affinity matrix in a unified model. We develop an alternating optimization algorithm to solve the resultant problems, and conduct experiments in different settings to evaluate the performance of our proposed methods.

ICML Conference 2024 Conference Paper

Automating the Selection of Proxy Variables of Unmeasured Confounders

  • Feng Xie 0002
  • Zhengming Chen 0002
  • Shanshan Luo
  • Wang Miao
  • Ruichu Cai
  • Zhi Geng

Recently, interest has grown in the use of proxy variables of unobserved confounding for inferring the causal effect in the presence of unmeasured confounders from observational data. One difficulty inhibiting their practical use is finding valid proxy variables of unobserved confounding for a target causal effect of interest. These proxy variables are typically justified by background knowledge. In this paper, we investigate the estimation of causal effects among multiple treatments and a single outcome, all of which are affected by unmeasured confounders, within a linear causal model, without prior knowledge of the validity of proxy variables. To be more specific, we first extend the existing proxy variable estimator, originally addressing a single unmeasured confounder, to accommodate scenarios where multiple unmeasured confounders exist between the treatments and the outcome. Subsequently, we present two different sets of precise identifiability conditions for selecting valid proxy variables of unmeasured confounders, based on the second-order statistics and higher-order statistics of the data, respectively. Moreover, we propose two data-driven methods for the selection of proxy variables and for the unbiased estimation of causal effects. Theoretical analysis demonstrates the correctness of our proposed algorithms. Experimental results on both synthetic and real-world data show the effectiveness of the proposed approach.

AAAI Conference 2024 Conference Paper

Causal Discovery from Poisson Branching Structural Causal Model Using High-Order Cumulant with Path Analysis

  • Jie Qiao
  • Yu Xiang
  • Zhengming Chen
  • Ruichu Cai
  • Zhifeng Hao

Count data naturally arise in many fields, such as finance, neuroscience, and epidemiology, and discovering causal structure among count data is a crucial task in various scientific and industrial scenarios. One of the most common characteristics of count data is the inherent branching structure described by a binomial thinning operator and an independent Poisson distribution that captures both branching and noise. For instance, in a population count scenario, mortality and immigration contribute to the count, where survival follows a Bernoulli distribution, and immigration follows a Poisson distribution. However, causal discovery from such data is challenging due to the non-identifiability issue: a single causal pair is Markov equivalent, i.e., X->Y and Y->X are distributionally equivalent. Fortunately, in this work, we found that the causal order from X to its child Y is identifiable if X is a root vertex and has at least two directed paths to Y, or the ancestor of X with the most directed paths to X has a directed path to Y without passing X. Specifically, we propose a Poisson Branching Structural Causal Model (PB-SCM) and perform a path analysis on PB-SCM using high-order cumulants. Theoretical results establish the connection between the path and cumulant and demonstrate that the path information can be obtained from the cumulant. With the path information, causal order is identifiable under some graphical conditions. A practical algorithm for learning causal structure under PB-SCM is proposed, and the experiments verify the effectiveness of the proposed method.
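A quick way to see why cumulants are natural tools for Poisson-type data: every cumulant of a Poisson(λ) variable equals λ. The NumPy sketch below (illustrative only, not the PB-SCM path analysis) estimates the first three empirical cumulants of Poisson samples.

```python
import numpy as np

def cumulants_123(x):
    # Empirical mean, variance, and third central moment; for large n the
    # third central moment is a good estimate of the third cumulant k3.
    mu = x.mean()
    c = x - mu
    return mu, (c ** 2).mean(), (c ** 3).mean()

rng = np.random.default_rng(7)
lam = 5.0
x = rng.poisson(lam, size=200_000).astype(float)
k1, k2, k3 = cumulants_123(x)
print(k1, k2, k3)   # all three should be close to lam = 5.0
```

Because branching (binomial thinning) and Poisson noise transform these cumulants in structured ways along directed paths, high-order cumulants can carry directional information that second-order statistics cannot.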

JMLR Journal 2024 Journal Article

Causal-learn: Causal Discovery in Python

  • Yujia Zheng
  • Biwei Huang
  • Wei Chen
  • Joseph Ramsey
  • Mingming Gong
  • Ruichu Cai
  • Shohei Shimizu
  • Peter Spirtes

Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, modular building blocks for developers, detailed documentation for learners, and comprehensive methods for all. Different from previous packages in R or Java, causal-learn is fully developed in Python, which could be more in tune with the recent preference shift in programming languages within related communities. The library is available at https://github.com/py-why/causal-learn.

IJCAI Conference 2024 Conference Paper

Combinatorial Routing for Neural Trees

  • Jiahao Li
  • Ruichu Cai
  • Yuguang Yan

Neural trees benefit from the high-level representation of neural networks and the interpretability of decision trees. Therefore, the existing works on neural trees perform outstandingly on various tasks such as architecture search. However, these works require every router to provide only one successor for each sample, causing the predictions to be dominated by the elite branch and its derivative architectures. To break this branch dominance, we propose the combinatorial routing neural tree method, termed CombRo. Unlike the previous methods employing unicast routing, CombRo performs a multicast scheme in each iteration, allowing the features to be routed to any combination of successors at every non-leaf. The weights of each architecture are then evaluated accordingly. We update the weights by training the routing subnetwork, and the architecture with the top weight is selected in the final step. We compare CombRo with the existing algorithms on 3 public image datasets, demonstrating its superior performance in terms of accuracy. Visualization results further validate the effectiveness of the multicast routing scheme. Code is available at https://github.com/JiahaoLi-gdut/CombRo.

ICML Conference 2024 Conference Paper

Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

  • Weilin Chen 0001
  • Ruichu Cai
  • Zeqin Yang
  • Jie Qiao
  • Yuguang Yan
  • Zijian Li 0001
  • Zhifeng Hao

Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., those leveraging neural networks to fit only a single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generation process. To mitigate bias stemming from misspecification, we propose a novel doubly robust causal effect estimator under networked interference, by adapting the targeted learning technique to the training of neural networks. Specifically, we generalize the targeted learning technique to the networked interference setting and establish the condition under which an estimator achieves double robustness. Based on the condition, we devise an end-to-end causal effect estimator by transforming the identified theoretical condition into a targeted loss. Moreover, we provide a theoretical analysis of our designed estimator, revealing a faster convergence rate compared to a single nuisance model. Extensive experimental results on two real-world networks with semi-synthetic data demonstrate the effectiveness of our proposed estimators.
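Double robustness itself is easiest to see in the classical no-interference AIPW estimator, which this paper generalizes to the networked setting. A hedged toy simulation (the data-generating process, its coefficients, and the deliberately misspecified outcome model are all invented for the example) shows a correct propensity model rescuing a wrong outcome model:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))            # true propensity score
t = rng.binomial(1, e)
y = 2.0 * t + x + rng.normal(size=n)    # true ATE = 2

# AIPW with a deliberately *misspecified* outcome model:
mu1_hat = mu0_hat = x                   # ignores the treatment effect entirely
ate = np.mean(mu1_hat - mu0_hat
              + t * (y - mu1_hat) / e
              - (1 - t) * (y - mu0_hat) / (1 - e))
# still consistent, because the propensity model is correct
assert abs(ate - 2.0) < 0.1
```

The symmetric case (correct outcome model, wrong propensity) also recovers the ATE, which is the "doubly robust" guarantee the targeted loss is built around.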

AAAI Conference 2024 Conference Paper

Exploiting Geometry for Treatment Effect Estimation via Optimal Transport

  • Yuguang Yan
  • Zeqin Yang
  • Weilin Chen
  • Ruichu Cai
  • Zhifeng Hao
  • Michael Kwok-Po Ng

Estimating treatment effects from observational data suffers from the issue of confounding bias, which is induced by the imbalanced confounder distributions between the treated and control groups. As an effective approach, re-weighting learns a group of sample weights to balance the confounder distributions. Existing methods of re-weighting highly rely on a propensity score model or moment alignment. However, for complex real-world data, it is difficult to obtain an accurate propensity score prediction. Although moment alignment is free of learning a propensity score model, accurate estimation for high-order moments is computationally difficult and still remains an open challenge, and first- and second-order moments are insufficient to align the distributions and are easily misled by outliers. In this paper, we exploit geometry to capture the intrinsic structure involved in data for balancing the confounder distributions, so that confounding bias can be reduced even with outliers. To achieve this, we construct a connection between treatment effect estimation and optimal transport, a powerful tool to capture geometric information. After that, we propose an optimal transport model to learn sample weights by extracting geometry from confounders, in which geometric information between groups and within groups is leveraged for better confounder balancing. A projected mirror descent algorithm is employed to solve the derived optimization problem. Experimental studies on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.
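The optimal-transport machinery can be sketched with a plain Sinkhorn iteration on a one-dimensional confounder. This is an illustrative entropic-OT toy with assumed group sizes, shift, and regularization value, not the paper's projected mirror descent algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
# one confounder, shifted between groups (the imbalance to correct)
control = rng.normal(0.0, 1.0, size=300)
treated = rng.normal(1.0, 1.0, size=200)

# entropic optimal transport (Sinkhorn) between uniform empirical marginals
C = (control[:, None] - treated[None, :]) ** 2   # squared-distance cost
K = np.exp(-C / 0.5)                             # regularization 0.5 (assumed)
a = np.full(control.size, 1.0 / control.size)
b = np.full(treated.size, 1.0 / treated.size)
u = np.ones_like(a)
for _ in range(200):
    v = b / (K.T @ u)
    u = a / (K @ v)
P = u[:, None] * K * v[None, :]                  # transport plan

# barycentric projection: where the plan sends each control unit
mapped = (P @ treated) / a
assert abs(control.mean() - treated.mean()) > 0.5   # imbalanced before
assert abs(mapped.mean() - treated.mean()) < 0.05   # balanced after transport
```

The plan P is the geometric object both OT papers in this list build on; the actual methods additionally learn the weights, cost function, or representations rather than fixing them as here.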

ICML Conference 2024 Conference Paper

Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

  • Xuexin Chen
  • Ruichu Cai
  • Zhengting Huang
  • Yuxuan Zhu 0001
  • Julien Horwood
  • Zhifeng Hao
  • Zijian Li 0001
  • José Miguel Hernández-Lobato

We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features when their prediction changes are similar after perturbation. To enhance FAMs’ discriminative power, we introduce Feature Attribution with Necessity and Sufficiency (FANS), which finds a neighborhood of the input such that perturbing samples within this neighborhood has a high Probability of being a Necessity and Sufficiency (PNS) cause for the change in predictions, and uses this PNS as the importance of the feature. Specifically, FANS computes this PNS via a heuristic strategy for estimating the neighborhood and a perturbation test involving two stages (factual and interventional) for counterfactual reasoning. To generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. We demonstrate that FANS outperforms existing attribution methods on six benchmarks. Please refer to the source code via https://github.com/DMIRLAB-Group/FANS.

JMLR Journal 2024 Journal Article

Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

  • Feng Xie
  • Biwei Huang
  • Zhengming Chen
  • Ruichu Cai
  • Clark Glymour
  • Zhi Geng
  • Kun Zhang

We investigate the challenging task of learning causal structure in the presence of latent variables, including locating latent variables, determining their quantity, and identifying causal relationships among both latent and observed variables. To address this, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\bf{Y}$ and $\bf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic causal models. From a graphical perspective, roughly speaking, GIN implies the existence of a set $\mathcal{S}$ such that $\mathcal{S}$ is causally earlier (w.r.t. the causal ordering) than $\mathbf{Y}$, and that every active (collider-free) path between $\mathbf{Y}$ and $\mathbf{Z}$ must contain a node from $\mathcal{S}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the underlying causal structure of a LiNGLaH is identifiable in light of GIN conditions under mild assumptions. 
Experimental results on both synthetic and three real-world data sets show the effectiveness of the proposed approach.
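The GIN condition can be checked numerically on a toy two-latent graph. Everything below is an illustrative assumption: the graph L1 → L2, unit coefficients, uniform noises, and a squared-correlation dependence proxy standing in for a proper independence test such as HSIC:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
u = lambda: rng.uniform(-1, 1, n)   # non-Gaussian noise

# linear non-Gaussian model with two latents L1 -> L2 (coefficients all 1)
L1 = u()
L2 = L1 + u()
X1 = L1 + 0.5 * u()                 # measured child of L1
X3 = L2 + 0.5 * u()                 # measured children of L2
X4 = L2 + 0.5 * u()

def gin_residual(Y, z):
    # omega is a null vector of the cross-covariance between Y and z
    v = np.array([np.cov(y, z)[0, 1] for y in Y])
    omega = np.array([v[1], -v[0]])
    return omega @ np.vstack(Y)

def dep(a, b):
    # crude higher-order dependence proxy (a real test would use HSIC)
    return abs(np.corrcoef(a**2, b**2)[0, 1])

# (Z={X1}, Y={X3,X4}) satisfies GIN: the residual is independent of Z
assert dep(gin_residual([X3, X4], X1), X1) < 0.02
# (Z={X4}, Y={X1,X3}) violates GIN: uncorrelated with Z but not independent
assert dep(gin_residual([X1, X3], X4), X4) > 0.05
```

Note that omega makes the residual uncorrelated with Z by construction in both cases; it is the higher-order dependence that separates GIN-satisfying from GIN-violating pairs, which is why non-Gaussianity matters.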

AAAI Conference 2024 Conference Paper

Hypergraph Joint Representation Learning for Hypervertices and Hyperedges via Cross Expansion

  • Yuguang Yan
  • Yuanlin Chen
  • Shibo Wang
  • Hanrui Wu
  • Ruichu Cai

Hypergraphs capture high-order information in structured data and have attracted much attention in machine learning and data mining. Existing approaches mainly learn representations for hypervertices by transforming a hypergraph into a standard graph, or learn representations for hypervertices and hyperedges in separate spaces. In this paper, we propose a hypergraph expansion method that transforms a hypergraph into a standard graph while preserving high-order information. Different from previous hypergraph expansion approaches like clique expansion and star expansion, we transform both hypervertices and hyperedges in the hypergraph into vertices in the expanded graph, and construct connections between hypervertices or hyperedges, so that richer relationships can be used in graph learning. Based on the expanded graph, we propose a learning model to embed hypervertices and hyperedges in a joint representation space. Compared with methods that learn separate spaces for hypervertices and hyperedges, our method is able to capture common knowledge involved in hypervertices and hyperedges, and also improves data efficiency and computational efficiency. To better leverage structure information, we minimize the graph reconstruction loss to preserve the structure information in the model. We perform experiments on both hypervertex classification and hyperedge classification tasks to demonstrate the effectiveness of our proposed method.

AAAI Conference 2024 Conference Paper

Identification of Causal Structure in the Presence of Missing Data with Additive Noise Model

  • Jie Qiao
  • Zhengming Chen
  • Jianhua Yu
  • Ruichu Cai
  • Zhifeng Hao

Missing data are an unavoidable complication frequently encountered in many causal discovery tasks. When a missing process depends on the missing values themselves (known as self-masking missingness), recovery of the joint distribution becomes unattainable, and detecting the presence of such self-masking missingness remains a perplexing challenge. Consequently, due to the inability to reconstruct the original distribution and to discern the underlying missingness mechanism, simply applying existing causal discovery methods would lead to wrong conclusions. In this work, we find that recent advances in the additive noise model have the potential for learning causal structure under the existence of self-masking missingness. With this observation, we aim to investigate the identification problem of learning causal structure from missing data under an additive noise model with different missingness mechanisms, where the `no self-masking missingness' assumption can be eliminated appropriately. Specifically, we first elegantly extend the scope of identifiability of the causal skeleton to the case with weak self-masking missingness (i.e., no other variable could be the cause of a self-masking indicator except itself). We further provide the sufficient and necessary identification conditions of the causal direction under the additive noise model and show that the causal structure can be identified up to an IN-equivalent pattern. We finally propose a practical algorithm based on the above theoretical results for learning the causal skeleton and causal direction. Extensive experiments on synthetic and real data demonstrate the efficiency and effectiveness of the proposed algorithms.

AAAI Conference 2024 Conference Paper

Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

  • Wei Chen
  • Zhiyi Huang
  • Ruichu Cai
  • Zhifeng Hao
  • Kun Zhang

Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they cannot fully identify the structure in which two observed variables are influenced by one latent variable and may also be connected by a directed edge. Interestingly, we notice that this structure can be identified through the utilization of higher-order cumulants. By leveraging the higher-order cumulants of non-Gaussian data, we provide an analytical solution for estimating the causal coefficients or their ratios. With the estimated (ratios of) causal coefficients, we propose a novel approach to identify the existence of a causal edge between two observed variables subject to latent variable influence. In the case where such a causal edge exists, we introduce an asymmetry criterion to determine the causal direction. The experimental results demonstrate the effectiveness of our proposed method.

IJCAI Conference 2024 Conference Paper

Individual Causal Structure Learning from Population Data

  • Wei Chen
  • Xiaokai Huang
  • Zijian Li
  • Ruichu Cai
  • Zhiyi Huang
  • Zhifeng Hao

Learning the causal structure of each individual plays a crucial role in neuroscience, biology, and so on. Existing methods consider data from each individual separately, which may yield inaccurate causal structure estimations in limited samples. To leverage more samples, we consider incorporating data from all individuals as population data. We observe that the variables of all individuals are influenced by the common environment variables they share. These shared environment variables can be modeled as latent variables and serve as a bridge connecting data from different individuals. In particular, we propose an Individual Linear Acyclic Model (ILAM) for each individual from population data, which models the individual's variables as being linearly influenced by their parents, in addition to environment variables and noise terms. Theoretical analysis shows that the model is identifiable when all environment variables are non-Gaussian, or even if some are Gaussian with an adequate diversity in the variance of noises for each individual. We then develop an individual causal structures learning method based on the Share Independence Component Analysis technique. Experimental results on synthetic and real-world data demonstrate the correctness of the method even when the sample size of each individual's data is small.

NeurIPS Conference 2024 Conference Paper

Learning Discrete Latent Variable Structures with Tensor Rank Conditions

  • Zhengming Chen
  • Ruichu Cai
  • Feng Xie
  • Jie Qiao
  • Anpeng Wu
  • Zijian Li
  • Zhifeng Hao
  • Kun Zhang

Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To achieve this, we explore a tensor rank condition on contingency tables for an observed variable set $\mathbf{X}_p$, showing that the rank is determined by the minimum support of a specific conditional set (not necessarily in $\mathbf{X}_p$) that d-separates all variables in $\mathbf{X}_p$. By this, one can locate latent variables by probing the rank on different observed variable sets, and further identify the latent causal structure under some structural assumptions. We present the corresponding identification algorithm and conduct simulated experiments to verify the effectiveness of our method. In general, our results elegantly extend the identification boundary for causal discovery with discrete latent variables and expand the application scope of causal discovery with latent variables.

NeurIPS Conference 2024 Conference Paper

On the Identifiability of Poisson Branching Structural Causal Model Using Probability Generating Function

  • Yu Xiang
  • Jie Qiao
  • Zhefeng Liang
  • Zihuai Zeng
  • Ruichu Cai
  • Zhifeng Hao

Causal discovery from observational data, especially for count data, is essential across scientific and industrial contexts, such as biology, economics, and network operation maintenance. For this task, most approaches model count data using Bayesian networks or ordinal relations. However, they overlook the inherent branching structures that are frequently encountered, e.g., a browsing event might trigger an add-to-cart or purchase event. This can be modeled by a binomial thinning operator (for branching) and an additive independent Poisson distribution (for noise), known as the Poisson Branching Structural Causal Model (PB-SCM). There is a provably sound cumulant-based causal discovery method that allows identification of the causal structure under a branching structure. However, we show that a gap remains: there exist causal directions that are identifiable but that the algorithm fails to identify. In this work, we address this gap by exploring the identifiability of PB-SCM using the Probability Generating Function (PGF). By developing a compact and exact closed-form solution for the PGF of PB-SCM, we demonstrate that each component in this closed-form solution uniquely encodes a specific local structure, enabling the identification of local structures by testing their corresponding component appearances in the PGF. Building on this, we propose a practical algorithm for learning causal skeletons and identifying causal directions of PB-SCM using the PGF. The effectiveness of our method is demonstrated through experiments on both synthetic and real datasets.
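The binomial thinning operator and the PGF closure the abstract relies on can be verified by simulation; the rates and the evaluation point s = 0.5 below are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, p, n = 4.0, 0.3, 200_000

X = rng.poisson(lam, n)          # parent event count
thinned = rng.binomial(X, p)     # binomial thinning: each event survives w.p. p

# PGF algebra: G_X(s) = exp(lam*(s-1)); thinning substitutes s -> 1 - p + p*s,
# so the thinned count is again Poisson with rate p*lam
assert abs(thinned.mean() - p * lam) < 0.02
assert abs(thinned.var() - p * lam) < 0.05

# empirical PGF at s = 0.5 matches the closed form exp(p*lam*(s-1))
s = 0.5
assert abs((s ** thinned).mean() - np.exp(p * lam * (s - 1))) < 0.01
```

This closure under thinning (and under adding independent Poisson noise, which multiplies the PGFs) is what makes a compact closed-form PGF for the whole PB-SCM possible.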

ICML Conference 2024 Conference Paper

Reducing Balancing Error for Causal Inference via Optimal Transport

  • Yuguang Yan
  • Hao Zhou
  • Zeqin Yang
  • Weilin Chen 0001
  • Ruichu Cai
  • Zhifeng Hao

Most studies on causal inference tackle the issue of confounding bias by reducing the distribution shift between the control and treated groups. However, it remains an open question to adopt an appropriate metric for distribution shift in practice. In this paper, we define a generic balancing error on reweighted samples to characterize the confounding bias, and study the connection between the balancing error and the Wasserstein discrepancy derived from the theory of optimal transport. We not only regard the Wasserstein discrepancy as the metric of distribution shift, but also explore the association between the balancing error and the underlying cost function involved in the Wasserstein discrepancy. Motivated by this, we propose to reduce the balancing error under the framework of optimal transport with learnable marginal distributions and the cost function, which is implemented by jointly learning weights and representations associated with factual outcomes. The experiments on both synthetic and real-world datasets demonstrate the effectiveness of our proposed method.

AAAI Conference 2024 Conference Paper

TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences

  • Yuequn Liu
  • Ruichu Cai
  • Wei Chen
  • Jie Qiao
  • Yuguang Yan
  • Zijian Li
  • Keli Zhang
  • Zhifeng Hao

Learning Granger causality from event sequences is a challenging but essential task across various applications. Most existing methods rely on the assumption that event sequences are independent and identically distributed (i.i.d.). However, this i.i.d. assumption is often violated due to the inherent dependencies among the event sequences. Fortunately, in practice, we find these dependencies can be modeled by a topological network, suggesting a potential solution to the non-i.i.d. problem by introducing the prior topological network into Granger causal discovery. This observation prompts us to tackle two ensuing challenges: 1) how to model the event sequences while incorporating both the prior topological network and the latent Granger causal structure, and 2) how to learn the Granger causal structure. To this end, we devise a unified topological neural Poisson auto-regressive model with two processes. In the generation process, we employ a variant of the neural Poisson process to model the event sequences, considering influences from both the topological network and the Granger causal structure. In the inference process, we formulate an amortized inference algorithm to infer the latent Granger causal structure. We encapsulate these two processes within a unified likelihood function, providing an end-to-end framework for this task. Experiments on simulated and real-world data demonstrate the effectiveness of our approach.

AAAI Conference 2024 Conference Paper

Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

  • Ruichu Cai
  • Yuxuan Zhu
  • Jie Qiao
  • Zefeng Liang
  • Furui Liu
  • Zhifeng Hao

Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted adversarial examples, which are generated through either well-conceived L_p-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreasonable and impractical. For instance, a modification in income would inevitably impact features like the debt-to-income ratio within a banking system. By considering the underappreciated causal generating process, first, we pinpoint the source of the vulnerability of DNNs via the lens of causality, then give theoretical results to answer where to attack. Second, considering the consequences of the attack interventions on the current state of the examples to generate more realistic adversarial examples, we propose CADE, a framework that can generate Counterfactual ADversarial Examples to answer how to attack. The empirical results demonstrate CADE's effectiveness, as evidenced by its competitive performance across diverse attack scenarios, including white-box, transfer-based, and random intervention attacks.

ICML Conference 2023 Conference Paper

Causal Discovery with Latent Confounders Based on Higher-Order Cumulants

  • Ruichu Cai
  • Zhiyi Huang 0008
  • Wei Chen 0103
  • Zhifeng Hao
  • Kun Zhang 0001

Causal discovery with latent confounders is an important but challenging task in many scientific areas. Despite the success of some overcomplete independent component analysis (OICA) based methods in certain domains, they are computationally expensive and can easily get stuck in local optima. We notice that, interestingly, by making use of higher-order cumulants, there exists a closed-form solution to OICA in specific cases, e.g., when the mixing procedure follows the One-Latent-Component structure. In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders. By iteratively removing the shared identified latent components, we successfully extend the results on the One-Latent-Component structure to the Multi-Latent-Component structure and finally provide a practical and asymptotically correct algorithm to learn the causal structure with latent variables. Experimental results illustrate the asymptotic correctness and effectiveness of the proposed method.

IJCAI Conference 2023 Conference Paper

Some General Identification Results for Linear Latent Hierarchical Causal Structure

  • Zhengming Chen
  • Feng Xie
  • Jie Qiao
  • Zhifeng Hao
  • Ruichu Cai

We study the problem of learning hierarchical causal structure among latent variables from measured variables. While some existing methods are able to recover the latent hierarchical causal structure, they mostly suffer from restrictive assumptions, including the tree-structured graph constraint, no "triangle" structure, and non-Gaussian assumptions. In this paper, we relax these restrictions and consider a more general and challenging scenario where graphs beyond tree structures, the "triangle" structure, and arbitrary noise distributions are allowed. We investigate the identifiability of the latent hierarchical causal structure and show that by using second-order statistics, the latent hierarchical structure can be identified up to the Markov equivalence classes over latent variables. Moreover, some directions in the Markov equivalence classes of latent variables can be further identified using partially non-Gaussian data. Based on the theoretical results above, we design an effective algorithm for learning the latent hierarchical causal structure. The experimental results on synthetic data verify the effectiveness of the proposed method.

IJCAI Conference 2023 Conference Paper

Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences

  • Jie Qiao
  • Ruichu Cai
  • Siyu Wu
  • Yu Xiang
  • Keli Zhang
  • Zhifeng Hao

Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as those based on multivariate Hawkes processes, mostly boil down to learning the so-called Granger causality, which assumes that a cause event happens strictly prior to its effect event. Such an assumption is often untenable in many applications, especially when dealing with discrete-time event sequences at low resolution; and typical discrete Hawkes processes mainly suffer from identifiability issues raised by the instantaneous effect, i.e., causal relationships that occur simultaneously due to the low-resolution data will not be captured by Granger causality. In this work, we propose Structural Hawkes Processes (SHPs) that leverage the instantaneous effect for learning the causal structure among event types in discrete-time event sequences. The proposed method is featured with the Expectation-Maximization of the likelihood function and a sparse optimization scheme. Theoretical results show that the instantaneous effect is a blessing rather than a curse, and the causal structure is identifiable under the existence of the instantaneous effect. Experiments on synthetic and real-world data verify the effectiveness of the proposed method.

NeurIPS Conference 2023 Conference Paper

Subspace Identification for Multi-Source Domain Adaptation

  • Zijian Li
  • Ruichu Cai
  • Guangyi Chen
  • Boyang Sun
  • Zhifeng Hao
  • Kun Zhang

Multi-source domain adaptation (MSDA) methods aim to transfer knowledge from multiple labeled source domains to an unlabeled target domain. Although current methods achieve target joint distribution identifiability by enforcing minimal changes across domains, they often necessitate stringent conditions, such as an adequate number of domains, monotonic transformation of latent variables, and invariant label distributions. These requirements are challenging to satisfy in real-world applications. To mitigate the need for these strict assumptions, we propose a subspace identification theory that guarantees the disentanglement of domain-invariant and domain-specific variables under less restrictive constraints regarding domain numbers and transformation properties and thereby facilitating domain adaptation by minimizing the impact of domain shifts on invariant variables. Based on this theory, we develop a Subspace Identification Guarantee (SIG) model that leverages variational inference. Furthermore, the SIG model incorporates class-aware conditional alignment to accommodate target shifts where label distributions change with the domain. Experimental results demonstrate that our SIG model outperforms existing MSDA techniques on various benchmark datasets, highlighting its effectiveness in real-world applications.

AAAI Conference 2022 Conference Paper

Identification of Linear Latent Variable Model with Arbitrary Distribution

  • Zhengming Chen
  • Feng Xie
  • Jie Qiao
  • Zhifeng Hao
  • Kun Zhang
  • Ruichu Cai

An important problem across multiple disciplines is to infer and understand meaningful latent variables. One strategy commonly used is to model the measured variables in terms of the latent variables under suitable assumptions on the connectivity from the latents to the measured (known as the measurement model). Furthermore, it might be even more interesting to discover the causal relations among the latent variables (known as the structural model). Recently, some methods have been proposed to estimate the structural model by assuming that the noise terms in the measured and latent variables are non-Gaussian. However, they are not suitable when some of the noise terms become Gaussian. To bridge this gap, we investigate the problem of identification of the structural model with arbitrary noise distributions. We provide a necessary and sufficient condition under which the structural model is identifiable: it is identifiable iff, for each pair of adjacent latent variables Lx, Ly, (1) at least one of Lx and Ly has non-Gaussian noise, or (2) at least one of them has a non-Gaussian ancestor and is not d-separated from the non-Gaussian component of this ancestor by the common causes of Lx and Ly. This identifiability result relaxes the non-Gaussianity requirements to only a (hopefully small) subset of variables, and accordingly elegantly extends the application scope of the structural model. Based on the above identifiability result, we further propose a practical algorithm to learn the structural model. We verify the correctness of the identifiability result and the effectiveness of the proposed method through empirical studies.

AAAI Conference 2021 Conference Paper

Appearance-Motion Memory Consistency Network for Video Anomaly Detection

  • Ruichu Cai
  • Hao Zhang
  • Wen Liu
  • Shenghua Gao
  • Zhifeng Hao

Abnormal event detection in surveillance video is an essential but challenging task, and many methods have been proposed to deal with this problem. Previous methods either consider only appearance information or directly integrate the results of appearance and motion information without explicitly modeling their endogenous semantic consistency. Inspired by the way humans identify abnormal frames from multi-modality signals, we propose an Appearance-Motion Memory Consistency Network (AMMC-Net). Our method first makes full use of the prior knowledge of appearance and motion signals to explicitly capture the correspondence between them in the high-level feature space. Then, it combines the multi-view features to obtain a more essential and robust feature representation of regular events, which can significantly increase the gap between an abnormal and a regular event. In the anomaly detection phase, we further introduce a commit error in the latent space joint with the prediction error in pixel space to enhance the detection accuracy. Solid experimental results on various standard datasets validate the effectiveness of our approach.

TIST Journal 2021 Journal Article

Causal Discovery with Confounding Cascade Nonlinear Additive Noise Models

  • Jie Qiao
  • Ruichu Cai
  • Kun Zhang
  • Zhenjie Zhang
  • Zhifeng Hao

Identification of causal direction between a causal-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class does not allow any confounding or intermediate variables between a cause pair–even if each direct causal relation follows this model. However, omitting the latent causal variables is frequently encountered in practice. After the omission, the model does not necessarily follow the model constraints. As a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a confounding cascade nonlinear additive noise model to represent such causal influences–each direct causal relation follows the nonlinear additive noise model but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured confounding and intermediate variables, from data under the variational auto-encoder framework. Our theoretical results show that with our model, the causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.

IJCAI Conference 2021 Conference Paper

Causal Discovery with Multi-Domain LiNGAM for Latent Factors

  • Yan Zeng
  • Shohei Shimizu
  • Ruichu Cai
  • Feng Xie
  • Michio Yamamoto
  • Zhifeng Hao

Discovering causal structures among latent factors from observed data is a particularly challenging problem. Despite some efforts on this problem, existing methods focus on single-domain data only. In this paper, we propose Multi-Domain Linear Non-Gaussian Acyclic Models for LAtent Factors (MD-LiNA), where the causal structure among latent factors of interest is shared for all domains, and we provide its identification results. The model enriches the causal representation for multi-domain data. We propose an integrated two-phase algorithm to estimate the model. In particular, we first locate the latent factors and estimate the factor loading matrix. Then, to uncover the causal structure among shared latent factors of interest, we derive a score function based on the characterization of independence relations between external influences and the dependence relations between multi-domain latent factors and latent factors of interest. We show that the proposed method provides locally consistent estimators. Experimental results on both synthetic and real-world data demonstrate the efficacy and robustness of our approach.

TIST Journal 2021 Journal Article

Causal Mechanism Transfer Network for Time Series Domain Adaptation in Mechanical Systems

  • Zijian Li
  • Ruichu Cai
  • Hong Wei Ng
  • Marianne Winslett
  • Tom Z. J. Fu
  • Boyan Xu
  • Xiaoyan Yang
  • Zhenjie Zhang

Data-driven models are becoming essential parts of modern mechanical systems, commonly used to capture the behavior of various equipment and varying environmental characteristics. Despite their excellent adaptivity to high dynamics and aging equipment, these data-driven models are usually hungry for massive labels, mostly contributed by human engineers at a high cost. Fortunately, domain adaptation enhances model generalization by utilizing the labeled source data and the unlabeled target data. However, mainstream domain adaptation methods cannot achieve ideal performance on time series data, since they assume that the conditional distributions are equal. This assumption works well for static data but is inapplicable to time series data, where even the first-order Markov assumption implies dependence between any two consecutive time steps. In this article, we assume that the causal mechanism is invariant and present our Causal Mechanism Transfer Network (CMTN) for time series domain adaptation. By capturing causal mechanisms of time series data, CMTN allows data-driven models to exploit existing data and labels from similar systems, such that the resulting model on a new system is highly reliable even with limited data. We report our empirical results and lessons learned from two real-world case studies, on chiller plant energy optimization and boiler fault detection, in which CMTN outperforms the existing state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Domain Adaptation with Invariant Representation Learning: What Transformations to Learn?

  • Petar Stojanov
  • Zijian Li
  • Mingming Gong
  • Ruichu Cai
  • Jaime Carbonell
  • Kun Zhang

Unsupervised domain adaptation, as a prevalent transfer learning setting, spans many real-world applications. With the increasing representational power and applicability of neural networks, state-of-the-art domain adaptation methods make use of deep architectures to map the input features $X$ to a latent representation $Z$ that has the same marginal distribution across domains. This has been shown to be insufficient for generating an optimal representation for classification, and strong assumptions are usually needed to find conditionally invariant representations. We provide reasoning why, when the supports of the source and target data overlap, any map of $X$ that is fixed across domains may not be suitable for domain adaptation via invariant features. Furthermore, we develop an efficient technique in which the optimal map from $X$ to $Z$ also takes domain-specific information as input, in addition to the features $X$. By using the property of minimal changes of causal mechanisms across domains, our model also takes into account the domain-specific information to ensure that the latent representation $Z$ does not discard valuable information about $Y$. We demonstrate the efficacy of our method via synthetic and real-world data experiments. The code is available at: https://github.com/DMIRLAB-Group/DSAN.

JBHI Journal 2021 Journal Article

Prediction of Synthetic Lethal Interactions in Human Cancers Using Multi-View Graph Auto-Encoder

  • Zhifeng Hao
  • Di Wu
  • Yuan Fang
  • Min Wu
  • Ruichu Cai
  • Xiaoli Li

Synthetic lethality (SL) is a very important concept for the development of targeted anticancer drugs. However, experimental methods for SL detection often suffer from various issues like high cost and low consistency across cell lines. Hence, computational methods for predicting novel SLs have recently emerged as complements to wet-lab experiments. In addition, SL data can be represented as a graph where nodes are genes and edges are the SL interactions. It is thus motivated to design advanced graph-based machine learning algorithms for SL prediction. In this paper, we propose a novel SL prediction method using a Multi-view Graph Auto-Encoder (SLMGAE). We consider the SL graph as the main view and the graphs from other data sources (e.g., PPI, GO) as support views. Multiple Graph Auto-Encoders (GAEs) are implemented to reconstruct the graphs for the different views. We further design an attention mechanism, which assigns different weights to the support views, to combine all the reconstructed graphs for SL prediction. The overall SLMGAE model is then trained by minimizing both the reconstruction error and the prediction error. Experimental results on the SynLethDB dataset show that SLMGAE outperforms state-of-the-art methods. Case studies on novel predicted SLs also illustrate the effectiveness of our SLMGAE method.

NeurIPS Conference 2021 Conference Paper

SADGA: Structure-Aware Dual Graph Aggregation Network for Text-to-SQL

  • Ruichu Cai
  • Jinjie Yuan
  • Boyan Xu
  • Zhifeng Hao

The Text-to-SQL task, aiming to translate natural language questions into SQL queries, has drawn much attention recently. One of the most challenging problems of Text-to-SQL is how to generalize the trained model to unseen database schemas, also known as the cross-domain Text-to-SQL task. The key lies in the generalizability of (i) the encoding method to model the question and the database schema and (ii) the question-schema linking method to learn the mapping between words in the question and tables/columns in the database schema. Focusing on the above two key issues, we propose a Structure-Aware Dual Graph Aggregation Network (SADGA) for cross-domain Text-to-SQL. In SADGA, we adopt the graph structure to provide a unified encoding model for both the natural language question and the database schema. Based on the proposed unified modeling, we further devise a structure-aware aggregation method to learn the mapping between the question-graph and schema-graph, featuring Global Graph Linking, Local Graph Linking and a Dual-Graph Aggregation Mechanism. We study the performance of our proposal empirically; at the time of writing, SADGA also held 3rd place on the challenging Text-to-SQL benchmark Spider.

AAAI Conference 2021 Conference Paper

Time Series Domain Adaptation via Sparse Associative Structure Alignment

  • Ruichu Cai
  • Jiawei Chen
  • Zijian Li
  • Wei Chen
  • Keli Zhang
  • Junjian Ye
  • Zhuozhang Li
  • Xiaoyan Yang

Domain adaptation on time series data is an important but challenging task. Most of the existing works in this area are based on learning a domain-invariant representation of the data with the help of restrictions like MMD. However, extracting such a domain-invariant representation is non-trivial for time series data, due to the complex dependence among the timestamps: in fully dependent time series, a small change in the time lags or the offsets may make domain-invariant extraction difficult. Fortunately, the stability of causality inspires us to explore the domain-invariant structure of the data. To reduce the difficulty of discovering the causal structure, we relax it to the sparse associative structure and propose a novel sparse associative structure alignment model for domain adaptation. First, we generate the segment set to exclude the interference of offsets. Second, intra-variable and inter-variable sparse attention mechanisms are devised to extract the associative structure of time-series data while accounting for time lags. Finally, the associative structure alignment is used to guide the transfer of knowledge from the source domain to the target one. Experimental studies not only verify the good performance of our method on three real-world datasets but also provide some insightful discoveries on the transferred knowledge.

NeurIPS Conference 2020 Conference Paper

Generalized Independent Noise Condition for Estimating Latent Variable Causal Graphs

  • Feng Xie
  • Ruichu Cai
  • Biwei Huang
  • Clark Glymour
  • Zhifeng Hao
  • Kun Zhang

Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., that in the absence of confounders the causes are independent of the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.
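
As a rough illustration of the GIN condition, the following sketch (hypothetical coefficients and variable names, not the authors' actual algorithm) simulates three observed children of one non-Gaussian latent confounder, forms $\omega$ orthogonal to the cross-covariance, and checks that $\omega^{\intercal}\mathbf{Y}$ is far less dependent on $\mathbf{Z}$ than a raw observed variable is. A biased sample distance correlation stands in for a formal independence test.

```python
import numpy as np

def dist_corr(a, b):
    # Biased sample distance correlation: close to 0 for independent variables.
    A = np.abs(a[:, None] - a[None, :])
    B = np.abs(b[:, None] - b[None, :])
    A = A - A.mean(axis=0) - A.mean(axis=1)[:, None] + A.mean()
    B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(1)
n = 500
L = rng.uniform(-1, 1, n)                     # non-Gaussian latent confounder
e = lambda: 0.5 * rng.uniform(-1, 1, n)       # non-Gaussian noises
X1, X2, X3 = 1.0 * L + e(), 0.8 * L + e(), 1.2 * L + e()

Y, Z = np.vstack([X2, X3]), X1                # test GIN for the pair (Z, Y)
c = np.cov(np.vstack([Z, Y]))[0, 1:]          # cross-covariances cov(Z, X2), cov(Z, X3)
omega = np.array([c[1], -c[0]])               # chosen so that omega @ c == 0
surrogate = omega @ Y                         # omega^T Y cancels the latent L

print(dist_corr(surrogate, Z), dist_corr(X2, Z))  # first is much smaller: GIN holds
```

Because `surrogate` keeps only the noise terms of X2 and X3, its dependence score stays near the sampling baseline, while a raw child of the latent remains strongly dependent on Z.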

IJCAI Conference 2019 Conference Paper

Causal Discovery with Cascade Nonlinear Additive Noise Model

  • Ruichu Cai
  • Jie Qiao
  • Kun Zhang
  • Zhenjie Zhang
  • Zhifeng Hao

Identification of causal direction between a cause-effect pair from observed data has recently attracted much attention. Various methods based on functional causal models have been proposed to solve this problem, by assuming the causal process satisfies some (structural) constraints and showing that the reverse direction violates such constraints. The nonlinear additive noise model has been demonstrated to be effective for this purpose, but the model class is not transitive -- even if each direct causal relation follows this model, indirect causal influences, which result from omitted intermediate causal variables and are frequently encountered in practice, do not necessarily follow the model constraints; as a consequence, the nonlinear additive noise model may fail to correctly discover causal direction. In this work, we propose a cascade nonlinear additive noise model to represent such causal influences -- each direct causal relation follows the nonlinear additive noise model, but we observe only the initial cause and final effect. We further propose a method to estimate the model, including the unmeasured intermediate variables, from data, under the variational auto-encoder framework. Our theoretical results show that with our model, causal direction is identifiable under suitable technical conditions on the data generation process. Simulation results illustrate the power of the proposed method in identifying indirect causal relations across various settings, and experimental results on real data suggest that the proposed model and method greatly extend the applicability of causal discovery based on functional causal models in nonlinear cases.
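
The bivariate additive-noise test that underlies this line of work can be sketched as follows (a minimal illustration with a made-up mechanism, not the paper's cascade estimator): fit a nonlinear regression in each direction and compare how dependent the residual is on the putative cause, here via a biased sample distance correlation.

```python
import numpy as np

def dist_corr(a, b):
    # Biased sample distance correlation: close to 0 for independent variables.
    A = np.abs(a[:, None] - a[None, :])
    B = np.abs(b[:, None] - b[None, :])
    A = A - A.mean(axis=0) - A.mean(axis=1)[:, None] + A.mean()
    B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

def anm_score(cause, effect, deg=3):
    # Fit effect ~ poly(cause) and score how dependent the residual is on the
    # putative cause. Lower = more consistent with an additive-noise model.
    resid = effect - np.polyval(np.polyfit(cause, effect, deg), cause)
    return dist_corr(resid, cause)

rng = np.random.default_rng(0)
x = rng.uniform(-1.5, 1.5, 500)
y = x ** 3 + 0.2 * rng.uniform(-1, 1, 500)   # ground truth: x -> y

forward, backward = anm_score(x, y), anm_score(y, x)
print(forward < backward)   # the true direction leaves the more independent residual
```

In the true direction the residual is essentially the injected noise; in the reverse direction the residual stays structurally dependent on the regressor, which is what the method exploits.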

IJCAI Conference 2019 Conference Paper

Learning Disentangled Semantic Representation for Domain Adaptation

  • Ruichu Cai
  • Zijian Li
  • Pengfei Wei
  • Jie Qiao
  • Kun Zhang
  • Zhifeng Hao

Domain adaptation is an important but challenging task. Most of the existing domain adaptation methods struggle to extract a domain-invariant representation from a feature space that entangles domain information and semantic information. Different from previous efforts on the entangled feature space, we aim to extract the domain-invariant semantic information in the latent disentangled semantic representation (DSR) of the data. In DSR, we assume the data generation process is controlled by two independent sets of variables, i.e., the semantic latent variables and the domain latent variables. Under the above assumption, we employ a variational auto-encoder to reconstruct the semantic latent variables and domain latent variables behind the data. We further devise a dual adversarial network to disentangle these two sets of reconstructed latent variables. The disentangled semantic latent variables are finally adapted across the domains. Experimental studies testify that our model yields state-of-the-art performance on several domain adaptation benchmark datasets.

NeurIPS Conference 2019 Conference Paper

Triad Constraints for Learning Causal Structure of Latent Variables

  • Ruichu Cai
  • Feng Xie
  • Clark Glymour
  • Zhifeng Hao
  • Kun Zhang

Learning causal structure from observational data has attracted much attention, and it is notoriously challenging to find the underlying structure in the presence of confounders (hidden direct common causes of two variables). In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of "pseudo-residual" from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship. In other words, the Triad constraints help us to locate latent confounders and determine the causal direction between them. This goes far beyond the Tetrad constraints and reveals more information about the underlying structure from non-Gaussian data. Finally, based on the Triad constraints, we develop a two-step algorithm to learn the causal structure corresponding to measurement models. Experimental results on both synthetic and real data demonstrate the effectiveness and reliability of our method.
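
A minimal sketch of the Triad pseudo-residual (illustrative coefficients, not the authors' full two-step algorithm): under a linear non-Gaussian model with one shared latent confounder, the quantity E(X1, X2 | X3) = X1 - (cov(X1, X3) / cov(X2, X3)) * X2 cancels the latent, so it should be far less dependent on X3 than X1 itself is; a biased sample distance correlation serves as the dependence measure.

```python
import numpy as np

def dist_corr(a, b):
    # Biased sample distance correlation: close to 0 for independent variables.
    A = np.abs(a[:, None] - a[None, :])
    B = np.abs(b[:, None] - b[None, :])
    A = A - A.mean(axis=0) - A.mean(axis=1)[:, None] + A.mean()
    B = B - B.mean(axis=0) - B.mean(axis=1)[:, None] + B.mean()
    return np.sqrt((A * B).mean() / np.sqrt((A * A).mean() * (B * B).mean()))

def pseudo_residual(xi, xj, xk):
    # Triad pseudo-residual: E(xi, xj | xk) = xi - (cov(xi,xk)/cov(xj,xk)) * xj
    return xi - (np.cov(xi, xk)[0, 1] / np.cov(xj, xk)[0, 1]) * xj

rng = np.random.default_rng(2)
n = 500
L = rng.uniform(-1, 1, n)                     # non-Gaussian latent confounder
e = lambda: 0.5 * rng.uniform(-1, 1, n)       # non-Gaussian noises
X1, X2, X3 = L + e(), 0.8 * L + e(), 1.2 * L + e()

resid_dep = dist_corr(pseudo_residual(X1, X2, X3), X3)
direct_dep = dist_corr(X1, X3)
print(resid_dep < direct_dep)   # the pseudo-residual has cancelled the confounder
```

The ratio of covariances recovers the relative loadings on the latent, so the pseudo-residual retains only noise terms; which configurations of pseudo-residuals pass such independence checks is what localizes the confounders in the full method.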

IJCAI Conference 2018 Conference Paper

An Encoder-Decoder Framework Translating Natural Language to Database Queries

  • Ruichu Cai
  • Boyan Xu
  • Zhenjie Zhang
  • Xiaoyan Yang
  • Zijian Li
  • Zhihao Liang

Machine translation is going through a radical revolution, driven by the explosive development of deep learning techniques using Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). In this paper, we consider a special case of machine translation, aiming to convert natural language into Structured Query Language (SQL) for data retrieval over relational databases. Although generic CNN and RNN models learn the grammar structure of SQL when trained with sufficient samples, the accuracy and training efficiency of the model can be dramatically improved when the translation model is deeply integrated with the grammar rules of SQL. We present a new encoder-decoder framework with a suite of new approaches, including new semantic features fed into the encoder, grammar-aware states injected into the memory of the decoder, as well as recursive state management for sub-queries. These techniques help the neural network better focus on understanding the semantics of operations in natural language and save the effort of SQL grammar learning. The empirical evaluation on real-world databases and queries shows that our approach outperforms state-of-the-art solutions by a significant margin.

NeurIPS Conference 2018 Conference Paper

Causal Discovery from Discrete Data using Hidden Compact Representation

  • Ruichu Cai
  • Jie Qiao
  • Kun Zhang
  • Zhenjie Zhang
  • Zhifeng Hao

Causal discovery from a set of observations is one of the fundamental problems across several disciplines. For continuous variables, a number of causal discovery methods have recently demonstrated their effectiveness in distinguishing the cause from the effect by exploring certain properties of the conditional distribution, but causal discovery on categorical data remains a challenging problem, because it is generally not easy to find a compact description of the causal mechanism for the true causal direction. In this paper, we attempt to solve this problem by assuming a two-stage causal process: the first stage maps the cause to a hidden variable of a lower cardinality, and the second stage generates the effect from the hidden representation. In this way, the causal mechanism admits a simple yet compact representation. We show that under this model, the causal direction is identifiable under some weak conditions on the true causal mechanism. We also provide an effective solution to recover the above hidden compact representation within the likelihood framework. Empirical studies verify the effectiveness of the proposed approach on both synthetic and real-world data.
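
One way to see why the two-stage assumption yields a compact description: if the effect depends on the cause only through a lower-cardinality hidden variable, the conditional table P(Y|X) is rank-bounded by the hidden cardinality. The following toy check (hypothetical map and probabilities, not the paper's estimation procedure) makes this concrete.

```python
import numpy as np

# Hypothetical two-stage discrete mechanism: a 4-state cause X is first
# collapsed to a 2-state hidden variable Y' = f(X); the effect Y is then
# drawn from P(Y | Y').
f = np.array([0, 0, 1, 1])            # deterministic map: 4 cause states -> 2 hidden states
p_y_given_hidden = np.array([
    [0.7, 0.2, 0.05, 0.05],           # P(Y | Y' = 0)
    [0.1, 0.1, 0.4, 0.4],             # P(Y | Y' = 1)
])

# The conditional table P(Y | X) inherits the compact structure: its rows
# repeat for cause states that share a hidden state, so its rank is bounded
# by the hidden cardinality (2) rather than by |X| = 4.
p_y_given_x = p_y_given_hidden[f]
print(np.linalg.matrix_rank(p_y_given_x))  # 2
```

In the reverse (anti-causal) direction no such low-rank factorization generally exists, which is the asymmetry the likelihood framework exploits.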

AAAI Conference 2018 Conference Paper

SELF: Structural Equational Likelihood Framework for Causal Discovery

  • Ruichu Cai
  • Jie Qiao
  • Zhenjie Zhang
  • Zhifeng Hao

Causal discovery without intervention is well recognized as a challenging yet powerful data analysis tool, boosting the development of other scientific areas, such as biology, astronomy, and social science. The major technical difficulty behind observation-based causal discovery is to effectively and efficiently identify causes and effects from correlated variables in the presence of significant noise. Previous studies mostly employ two very different methodologies under the Bayesian network framework, namely global likelihood maximization and local complexity analysis over marginal distributions. While these approaches are effective in their respective problem domains, in this paper we show that they can be combined to formulate a new global optimization model with local statistical significance, called the structural equational likelihood framework (SELF for short). We provide a thorough analysis of the soundness of the model under mild conditions and present efficient heuristic-based algorithms for scalable model training. Empirical evaluations using XGBoost validate the superiority of our proposal over state-of-the-art solutions, on both synthetic and real-world causal structures.

IJCAI Conference 2017 Conference Paper

A Robust Noise Resistant Algorithm for POI Identification from Flickr Data

  • Yiyang Yang
  • Zhiguo Gong
  • Qing Li
  • Leong Hou U
  • Ruichu Cai
  • Zhifeng Hao

Points of Interest (POI) identification using social media data (e.g., Flickr, Microblog) has been one of the most popular research topics in recent years. However, there exist large amounts of noise (POI-irrelevant data) in such crowd-contributed collections. The traditional solution to this problem is to set a global density threshold and remove a data point as noise if its density is lower than the threshold. However, density values vary significantly among POIs; as a result, some POIs with relatively low density cannot be identified. To solve this problem, we propose a technique based on local drastic changes of the data density. First, we define the local maxima of the density function as the urban POIs, and the gradient ascent algorithm is exploited to assign data points to different clusters. To remove noise, we incorporate the Laplacian zero-crossing points along the gradient ascent process as the boundaries of a POI; points located outside the POI region are regarded as noise. The technique is then extended to the joint geographical-textual space so that it can make use of the heterogeneous features of social media. The experimental results show the significance of the proposed approach in removing noise.

ICML Conference 2013 Conference Paper

SADA: A General Framework to Support Robust Causation Discovery

  • Ruichu Cai
  • Zhenjie Zhang
  • Zhifeng Hao

Causality discovery without manipulation is considered a crucial problem in a variety of applications, such as genetic therapy. The state-of-the-art solutions, e.g., LiNGAM, return accurate results when the number of labeled samples is larger than the number of variables. These approaches are thus applicable only when large numbers of samples are available or the problem domain is sufficiently small. Motivated by the observation of local sparsity properties on causal structures, we propose a general Split-and-Merge strategy, named SADA, to enhance the scalability of a wide class of causality discovery algorithms. SADA is able to accurately identify the causal variables even when the sample size is significantly smaller than the number of variables. In SADA, the variables are partitioned into subsets by finding cuts on the sparse probabilistic graphical model over the variables. By running mainstream causation discovery algorithms, e.g., LiNGAM, on the subproblems, complete causality can be reconstructed by combining all the partial results. SADA benefits from the recursive division technique, since each small subproblem generates more accurate results under the same number of samples. We theoretically prove that SADA always reduces the scale of problems without significantly sacrificing result accuracy, depending only on the local sparsity condition over the variables. Experiments on real-world datasets verify the improvements in scalability and accuracy achieved by applying SADA on top of existing causation discovery algorithms.