Arrow Research search

Author name cluster

Jin Tian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

37 papers
2 author rows

Possible papers

37

AAAI Conference 2026 Conference Paper

Potential Outcome Rankings for Counterfactual Decision Making

  • Yuta Kawakami
  • Jin Tian

Counterfactual decision-making in the face of uncertainty involves selecting the optimal action from several alternatives using causal reasoning. Decision-makers often rank expected potential outcomes (or their corresponding utility and desirability) to compare the preferences of candidate actions. In this paper, we study new counterfactual decision-making rules by introducing two new metrics: the probabilities of potential outcome ranking (PoR) and the probability of achieving the best potential outcome (PoB). PoR reveals the most probable ranking of potential outcomes for an individual, and PoB indicates the action most likely to yield the top-ranked outcome for an individual. We then establish identification theorems and derive bounds for these metrics, and present estimation methods. Finally, we perform numerical experiments to illustrate the finite-sample properties of the estimators and demonstrate their application to a real-world dataset.

TMLR Journal 2025 Journal Article

AlignFix: Fixing Adversarial Perturbations by Agreement Checking for Adversarial Robustness against Black-box Attacks

  • Ashutosh Kumar Nirala
  • Jin Tian
  • Olukorede Fakorede
  • Modeste Atsague

Motivated by the vulnerability of feed-forward visual pathways to adversarial-like inputs and the overall robustness of biological perception, commonly attributed to top-down feedback processes, we propose a new defense method AlignFix. We exploit the fact that natural and adversarially trained models rely on distinct feature sets for classification. Notably, naturally trained models, referred to as \textit{weakM}, retain commendable accuracy against adversarial examples generated using adversarially trained models referred to as \textit{strongM}, and vice-versa. Further these two models tend to agree more on their prediction if input is nudged towards correct class prediction. Leveraging this, AlignFix initially perturbs the input toward the class predicted by a naturally trained model, using a joint loss from both \textit{weakM} and \textit{strongM}. If this retains or leads to agreement, the prediction is accepted, otherwise the original \textit{strongM} output is used. This mechanism is highly effective against leading SQA (Score-based Query Attacks) as well as decision-based and transfer-based black-box attacks. We demonstrate its effectiveness through comprehensive experiments across various datasets (CIFAR and ImageNet) and architectures (ResNet and ViT).

NeurIPS Conference 2025 Conference Paper

Causal Discovery over Clusters of Variables in Markovian Systems

  • Tara Anand
  • Adèle Ribeiro
  • Jin Tian
  • George Hripcsak
  • Elias Bareinboim

Causal discovery methods are powerful tools for uncovering the structure of relationships among variables, yet they face significant challenges in scalability and interpretability, especially in high-dimensional settings. In many domains, researchers are not only interested in causal links between individual variables, but also in relationships among sets or clusters of variables. Learning causal structure at the cluster level can both reveal higher-order relationships of interest and improve scalability. In this work, we introduce an approach for causal discovery over clusters in Markov causal systems. We propose a new graphical model that encodes knowledge of relationships between user-defined clusters while fully representing independencies and dependencies over clusters, faithful to a given distribution. We then define and characterize a graphical equivalence class of these models that share cluster-level independence information. Lastly, we present a sound and complete algorithm for causal discovery to represent learnable causal relationships between clusters of variables.

ICML Conference 2025 Conference Paper

Causal Logistic Bandits with Counterfactual Fairness Constraints

  • Jiajun Chen
  • Jin Tian
  • Christopher John Quinn

Artificial intelligence will play a significant role in decision making in numerous aspects of society. Numerous fairness criteria have been proposed in the machine learning community, but there remains limited investigation into fairness as defined through specified attributes in a sequential decision-making framework. In this paper, we focus on causal logistic bandit problems where the learner seeks to make fair decisions, under a notion of fairness that accounts for counterfactual reasoning. We propose and analyze an algorithm by leveraging primal-dual optimization for constrained causal logistic bandits where the non-linear constraints are a priori unknown and must be learned in time. We obtain sub-linear regret guarantees with leading term similar to that for unconstrained logistic bandits (Lee et al. , 2024) while guaranteeing sub-linear constraint violations. We show how to achieve zero cumulative constraint violations with a small increase in the regret bound.

UAI Conference 2025 Conference Paper

Decomposition of Probabilities of Causation with Two Mediators

  • Yuta Kawakami
  • Jin Tian

Mediation analysis for probabilities of causation (PoC) provides a fundamental framework for evaluating the necessity and sufficiency of treatment in provoking an event through different causal pathways. One of the primary objectives of causal mediation analysis is to decompose the total effect into path-specific components. In this study, we investigate the path-specific probability of necessity and sufficiency (PNS) to decompose the total PNS into path-specific components along distinct causal pathways between treatment and outcome, incorporating two mediators. We define the path-specific PNS for decomposition and provide an identification theorem. Furthermore, we conduct numerical experiments to assess the properties of the proposed estimators from finite samples and demonstrate their practical application using a real-world educational dataset.

AAAI Conference 2025 Conference Paper

Mediation Analysis for Probabilities of Causation

  • Yuta Kawakami
  • Jin Tian

Probabilities of causation (PoC) offer valuable insights for informed decision-making. This paper introduces novel variants of PoC-controlled direct, natural direct, and natural indirect probability of necessity and sufficiency (PNS). These metrics quantify the necessity and sufficiency of a treatment for producing an outcome, accounting for different causal pathways. We develop identification theorems for these new PoC measures, allowing for their estimation from observational data. We demonstrate the practical application of our results through an analysis of a real-world psychology dataset.

UAI Conference 2025 Conference Paper

Moments of Causal Effects

  • Yuta Kawakami
  • Jin Tian

The moments of random variables are fundamental statistical measures for characterizing the shape of a probability distribution, encompassing metrics such as mean, variance, skewness, and kurtosis. Additionally, the product moments, including covariance and correlation, reveal the relationships between multiple random variables. On the other hand, the primary focus of causal inference is the evaluation of causal effects, which are defined as the difference between two potential outcomes. While traditional causal effect assessment focuses on the average causal effect, this work provides definitions, identification theorems, and bounds for moments and product moments of causal effects to analyze their distribution and relationships. We conduct experiments to illustrate the estimation of the moments of causal effects from finite samples and demonstrate their practical application using a real-world medical dataset.

AAAI Conference 2025 Conference Paper

Testing Causal Models with Hidden Variables in Polynomial Delay via Conditional Independencies

  • Hyunchai Jeong
  • Adiba Ejaz
  • Jin Tian
  • Elias Bareinboim

Testing a hypothesized causal model against observational data is a key prerequisite for many causal inference tasks. A natural approach is to test whether the conditional independence relations (CIs) assumed in the model hold in the data. While a model can assume exponentially many CIs (with respect to the number of variables), testing all of them is both impractical and unnecessary. Causal graphs, which encode these CIs in polynomial space, give rise to local Markov properties that enable model testing with a significantly smaller subset of CIs. Model testing based on local properties requires an algorithm to list the relevant CIs. However, existing algorithms for realistic settings with hidden variables and non-parametric distributions can take exponential time to produce even a single CI constraint. In this paper, we introduce the c-component local Markov property (C-LMP) for causal graphs with hidden variables. Since C-LMP can still invoke an exponential number of CIs, we develop a polynomial delay algorithm to list these CIs in poly-time intervals. To our knowledge, this is the first algorithm that enables poly-delay testing of CIs in causal graphs with hidden variables against arbitrary data distributions. Experiments on real-world and synthetic data demonstrate the practicality of our algorithm.

ECAI Conference 2024 Conference Paper

Estimating Causal Effects from Learned Causal Networks

  • Anna Raichev
  • Jin Tian
  • Alexander Ihler
  • Rina Dechter

The standard approach to answering an identifiable causal-effect query (e. g. , P(Y|do(X)) given a causal diagram and observational data is to first generate an estimand, or probabilistic expression over the observable variables, which is then evaluated using the observational data. In this paper, we propose an alternative paradigm for answering causal-effect queries over discrete observable variables. We propose to instead learn the causal Bayesian network and its confounding latent variables directly from the observational data. Then, efficient probabilistic graphical model (PGM) algorithms can be applied to the learned model to answer queries. Perhaps surprisingly, we show that this model completion learning approach can be more effective than estimand approaches, particularly for larger models in which the estimand expressions become computationally difficult. We illustrate our method’s potential using a benchmark collection of Bayesian networks and synthetically generated causal models.

UAI Conference 2024 Conference Paper

Identification and Estimation of Conditional Average Partial Causal Effects via Instrumental Variable

  • Yuta Kawakami
  • Manabu Kuroki
  • Jin Tian

There has been considerable recent interest in estimating heterogeneous causal effects. In this paper, we study conditional average partial causal effects (CAPCE) to reveal the heterogeneity of causal effects with continuous treatment. We provide conditions for identifying CAPCE in an instrumental variable setting. Notably, CAPCE is identifiable under a weaker assumption than required by a commonly used measure for estimating heterogeneous causal effects of continuous treatment. We develop three families of CAPCE estimators: sieve, parametric, and reproducing kernel Hilbert space (RKHS)-based, and analyze their statistical properties. We illustrate the proposed CAPCE estimators on synthetic and real-world data.

UAI Conference 2024 Conference Paper

Probabilities of Causation for Continuous and Vector Variables

  • Yuta Kawakami
  • Manabu Kuroki
  • Jin Tian

*Probabilities of causation* (PoC) are valuable concepts for explainable artificial intelligence and practical decision-making. PoC are originally defined for scalar binary variables. In this paper, we extend the concept of PoC to continuous treatment and outcome variables, and further generalize PoC to capture causal effects between multiple treatments and multiple outcomes. In addition, we consider PoC for a sub-population and PoC with multi-hypothetical terms to capture more sophisticated counterfactual information useful for decision-making. We provide a nonparametric identification theorem for each type of PoC we introduce. Finally, we illustrate the application of our results on a real-world dataset about education.

TMLR Journal 2024 Journal Article

Standard-Deviation-Inspired Regularization for Improving Adversarial Robustness

  • Olukorede Fakorede
  • Modeste Atsague
  • Jin Tian

Adversarial Training (AT ) has been demonstrated to improve the robustness of deep neural networks (DNNs) to adversarial attacks. AT is a min-max optimization procedure wherein adversarial examples are generated to train a robust DNN. The inner maximization step of AT maximizes the losses of inputs w.r.t their actual classes. The outer minimization involves minimizing the losses on the adversarial examples obtained from the inner maximization. This work proposes a standard-deviation-inspired (SDI ) regularization term for improving adversarial robustness and generalization. We argue that the inner maximization is akin to minimizing a modified standard deviation of a model’s output probabilities. Moreover, we argue that maximizing the modified standard deviation measure may complement the outer minimization of the AT framework. To corroborate our argument, we experimentally show that the SDI measure may be utilized to craft adversarial examples. Furthermore, we show that combining the proposed SDI regularization term with existing AT variants improves the robustness of DNNs to stronger attacks (e.g., CW and Auto-attack) and improves robust generalization.

NeurIPS Conference 2024 Conference Paper

Unified Covariate Adjustment for Causal Inference

  • Yonghan Jung
  • Jin Tian
  • Elias Bareinboim

Causal effect identification and estimation are two crucial tasks in causal inference. Although causal effect identification has been theoretically resolved, many existing estimators only address a subset of scenarios, known as the sequential back-door adjustment (SBD) (Pearl and Robins, 1995) or g-formula (Robins, 1986). Recent efforts for developing general-purpose estimators with broader coverage, incorporating the front-door adjustment (FD) (Pearl, 2000) and more, lack scalability due to the high computational cost of summing over high-dimensional variables. In this paper, we introduce a novel approach that achieves broad coverage of causal estimands beyond the SBD, incorporating various sum-product functionals like the FD, while maintaining scalability -- estimated in polynomial time relative to the number of variables and samples. Specifically, we present the class of UCA for which a scalable and doubly robust estimator is developed. In particular, we illustrate the expressiveness of UCA for a wide spectrum of causal estimands (e. g. , SBD, FD, and more) in causal inference. We then develop an estimator that exhibits computational efficiency and doubly robustness. The scalability and robustness of the proposed framework are verified through simulations.

AAAI Conference 2023 Conference Paper

Causal Effect Identification in Cluster DAGs

  • Tara V. Anand
  • Adele H. Ribeiro
  • Jin Tian
  • Elias Bareinboim

Reasoning about the effect of interventions and counterfactuals is a fundamental task found throughout the data sciences. A collection of principles, algorithms, and tools has been developed for performing such tasks in the last decades. One of the pervasive requirements found throughout this literature is the articulation of assumptions, which commonly appear in the form of causal diagrams. Despite the power of this approach, there are significant settings where the knowledge necessary to specify a causal diagram over all variables is not available, particularly in complex, high-dimensional domains. In this paper, we introduce a new graphical modeling tool called cluster DAGs (for short, C-DAGs) that allows for the partial specification of relationships among variables based on limited prior knowledge, alleviating the stringent requirement of specifying a full causal diagram. A C-DAG specifies relationships between clusters of variables, while the relationships between the variables within a cluster are left unspecified, and can be seen as a graphical representation of an equivalence class of causal diagrams that share the relationships among the clusters. We develop the foundations and machinery for valid inferences over C-DAGs about the clusters of variables at each layer of Pearl's Causal Hierarchy - L1 (probabilistic), L2 (interventional), and L3 (counterfactual). In particular, we prove the soundness and completeness of d-separation for probabilistic inference in C-DAGs. Further, we demonstrate the validity of Pearl's do-calculus rules over C-DAGs and show that the standard ID identification algorithm is sound and complete to systematically compute causal effects from observational data given a C-DAG. Finally, we show that C-DAGs are valid for performing counterfactual inferences about clusters of variables.

NeurIPS Conference 2023 Conference Paper

Estimating Causal Effects Identifiable from a Combination of Observations and Experiments

  • Yonghan Jung
  • Ivan Diaz
  • Jin Tian
  • Elias Bareinboim

Learning cause and effect relations is arguably one of the central challenges found throughout the data sciences. Formally, determining whether a collection of observational and interventional distributions can be combined to learn a target causal relation is known as the problem of generalized identification (or g-identification) [Lee et al. , 2019]. Although g-identification has been well understood and solved in theory, it turns out to be challenging to apply these results in practice, in particular when considering the estimation of the target distribution from finite samples. In this paper, we develop a new, general estimator that exhibits multiply robustness properties for g-identifiable causal functionals. Specifically, we show that any g-identifiable causal effect can be expressed as a function of generalized multi-outcome sequential back-door adjustments that are amenable to estimation. We then construct a corresponding estimator for the g-identification expression that exhibits robustness properties to bias. We analyze the asymptotic convergence properties of the estimator. Finally, we illustrate the use of the proposed estimator in experimental studies. Simulation results corroborate the theory.

ICML Conference 2023 Conference Paper

Instrumental Variable Estimation of Average Partial Causal Effects

  • Yuta Kawakami
  • Manabu Kuroki
  • Jin Tian

Instrumental variable (IV) analysis is a powerful tool widely used to elucidate causal relationships. We study the problem of estimating the average partial causal effect (APCE) of a continuous treatment in an IV setting. Specifically, we develop new methods for estimating APCE based on a recent identification condition via an integral equation. We develop two families of methods, nonparametric and parametric - the former uses the Picard iteration to solve the integral equation; the latter parameterizes APCE using a linear basis function model. We analyze the statistical and computational properties of the proposed methods and illustrate them on synthetic and real data.

TMLR Journal 2023 Journal Article

Vulnerability-Aware Instance Reweighting For Adversarial Training

  • Olukorede Fakorede
  • Ashutosh Kumar Nirala
  • Modeste Atsague
  • Jin Tian

Adversarial Training (AT) has been found to substantially improve the robustness of deep learning classifiers against adversarial attacks. AT involves obtaining robustness by including adversarial examples in training a classifier. Most variants of AT algorithms treat every training example equally. However, recent works have shown that better performance is achievable by treating them unequally. In addition, it has been observed that AT exerts an uneven influence on different classes in a training set and unfairly hurts examples corresponding to classes that are inherently harder to classify. Consequently, various reweighting schemes have been proposed that assign unequal weights to robust losses of individual examples in a training set. In this work, we propose a novel instance-wise reweighting scheme. It considers the vulnerability of each natural example and the resulting information loss on its adversarial counterpart occasioned by adversarial attacks. Through extensive experiments, we show that our proposed method significantly improves over existing reweighting schemes, especially against strong white and black-box attacks.

NeurIPS Conference 2022 Conference Paper

Finding and Listing Front-door Adjustment Sets

  • Hyunchai Jeong
  • Jin Tian
  • Elias Bareinboim

Identifying the effects of new interventions from data is a significant challenge found across a wide range of the empirical sciences. A well-known strategy for identifying such effects is Pearl's front-door (FD) criterion. The definition of the FD criterion is declarative, only allowing one to decide whether a specific set satisfies the criterion. In this paper, we present algorithms for finding and enumerating possible sets satisfying the FD criterion in a given causal diagram. These results are useful in facilitating the practical applications of the FD criterion for causal effects estimation and helping scientists to select estimands with desired properties, e. g. , based on cost, feasibility of measurement, or statistical power.

ICML Conference 2022 Conference Paper

Neuron Dependency Graphs: A Causal Abstraction of Neural Networks

  • Yaojie Hu
  • Jin Tian

We discover that neural networks exhibit approximate logical dependencies among neurons, and we introduce Neuron Dependency Graphs (NDG) that extract and present them as directed graphs. In an NDG, each node corresponds to the boolean activation value of a neuron, and each edge models an approximate logical implication from one node to another. We show that the logical dependencies extracted from the training dataset generalize well to the test set. In addition to providing symbolic explanations to the neural network’s internal structure, NDGs can represent a Structural Causal Model. We empirically show that an NDG is a causal abstraction of the corresponding neural network that "unfolds" the same way under causal interventions using the theory by Geiger et al. (2021). Code is available at https: //github. com/phimachine/ndg.

NeurIPS Conference 2021 Conference Paper

Double Machine Learning Density Estimation for Local Treatment Effects with Instruments

  • Yonghan Jung
  • Jin Tian
  • Elias Bareinboim

Local treatment effects are a common quantity found throughout the empirical sciences that measure the treatment effect among those who comply with what they are assigned. Most of the literature is focused on estimating the average of such quantity, which is called the ``local average treatment effect (LATE)'' [Imbens and Angrist, 1994]). In this work, we study how to estimate the density of the local treatment effect, which is naturally more informative than its average. Specifically, we develop two families of methods for this task, namely, kernel-smoothing and model-based approaches. The kernel-smoothing-based approach estimates the density through some smooth kernel functions. The model-based approach estimates the density by projecting it onto a finite-dimensional density class. For both approaches, we derive the corresponding double/debiased machine learning-based estimators [Chernozhukov et al. , 2018]. We further study the asymptotic convergence rates of the estimators and show that they are robust to the biases in nuisance function estimation. The use of the proposed methods is illustrated through both synthetic and a real dataset called 401(k).

AAAI Conference 2021 Conference Paper

Estimating Identifiable Causal Effects through Double Machine Learning

  • Yonghan Jung
  • Jin Tian
  • Elias Bareinboim

Identifying causal effects from observational data is a pervasive challenge found throughout the empirical sciences. Very general methods have been developed to decide the identifiability of a causal quantity from a combination of observational data and causal knowledge about the underlying system. In practice, however, there are still challenges to estimating identifiable causal functionals from finite samples. Recently, a method known as double/debiased machine learning (DML) (Chernozhukov et al. 2018) has been proposed to learn parameters leveraging modern machine learning techniques, which is both robust to model misspecification and bias-reducing. Still, DML has only been used for causal estimation in settings when the back-door condition (also known as conditional ignorability) holds. In this paper, we develop a new, general class of estimators for any identifiable causal functionals that exhibit DML properties, which we name DML-ID. In particular, we introduce a complete identification algorithm that returns an influence function (IF) for any identifiable causal functional. We then construct the DML estimator based on the derived IF. We show that DML-ID estimators hold the key properties of debiasedness and doubly robustness. Simulation results corroborate with the theory.

AAAI Conference 2020 Conference Paper

Estimating Causal Effects Using Weighting-Based Estimators

  • Yonghan Jung
  • Jin Tian
  • Elias Bareinboim

Causal effect identification is one of the most prominent and well-understood problems in causal inference. Despite the generality and power of the results developed so far, there are still challenges in their applicability to practical settings, arguably due to the finitude of the samples. Simply put, there is a gap between causal effect identification and estimation. One popular setting in which sample-efficient estimators from finite samples exist is when the celebrated back-door condition holds. In this paper, we extend weighting-based methods developed for the back-door case to more general settings, and develop novel machinery for estimating causal effects using the weighting-based method as a building block. We derive graphical criteria under which causal effects can be estimated using this new machinery and demonstrate the effectiveness of the proposed method through simulation studies.

NeurIPS Conference 2020 Conference Paper

Learning Causal Effects via Weighted Empirical Risk Minimization

  • Yonghan Jung
  • Jin Tian
  • Elias Bareinboim

Learning causal effects from data is a fundamental problem across the sciences. Determining the identifiability of a target effect from a combination of the observational distribution and the causal graph underlying a phenomenon is well-understood in theory. However, in practice, it remains a challenge to apply the identification theory to estimate the identified causal functionals from finite samples. Although a plethora of effective estimators have been developed under the setting known as the back-door (also called conditional ignorability), there exists still no systematic way of estimating arbitrary causal functionals that are both computationally and statistically attractive. This paper aims to bridge this gap, from causal identification to causal estimation. We note that estimating functionals from limited samples based on the empirical risk minimization (ERM) principle has been pervasive in the machine learning literature, and these methods have been extended to causal inference under the back-door setting. In this paper, we develop a learning framework that marries two families of methods, benefiting from the generality of the causal identification theory and the effectiveness of the estimators produced based on the principle of ERM. Specifically, we develop a sound and complete algorithm that generates causal functionals in the form of weighted distributions that are amenable to the ERM optimization. We then provide a practical procedure for learning causal effects from finite samples and a causal graph. Finally, experimental results support the effectiveness of our approach.

AAAI Conference 2019 Conference Paper

Identification of Causal Effects in the Presence of Selection Bias

  • Juan D. Correa
  • Jin Tian
  • Elias Bareinboim

Cause-and-effect relations are one of the most valuable types of knowledge sought after throughout the data-driven sciences since they translate into stable and generalizable explanations as well as efficient and robust decision-making capabilities. Inferring these relations from data, however, is a challenging task. Two of the most common barriers to this goal are known as confounding and selection biases. The former stems from the systematic bias introduced during the treatment assignment, while the latter comes from the systematic bias during the collection of units into the sample. In this paper, we consider the problem of identifiability of causal effects when both confounding and selection biases are simultaneously present. We first investigate the problem of identifiability when all the available data is biased. We prove that the algorithm proposed by [Bareinboim and Tian, 2015] is, in fact, complete, namely, whenever the algorithm returns a failure condition, no identifiability claim about the causal relation can be made by any other method. We then generalize this setting to when, in addition to the biased data, another piece of external data is available, without bias. It may be the case that a subset of the covariates could be measured without bias (e. g. , from census). We examine the problem of identifiability when a combination of biased and unbiased data is available. We propose a new algorithm that subsumes the current state-of-the-art method based on the back-door criterion.

AAAI Conference 2018 Conference Paper

Generalized Adjustment Under Confounding and Selection Biases

  • Juan Correa
  • Jin Tian
  • Elias Bareinboim

Selection and confounding biases are the two most common impediments to the applicability of causal inference methods in large-scale settings. We generalize the notion of backdoor adjustment to account for both biases and leverage external data that may be available without selection bias (e. g. , data from census). We introduce the notion of adjustment pair and present complete graphical conditions for identifying causal effects by adjustment. We further design an algorithm for listing all admissible adjustment pairs in polynomial delay, which is useful for researchers interested in evaluating certain properties of some admissible pairs but not all (common properties include cost, variance, and feasibility to measure). Finally, we describe a statistical estimation procedure that can be performed once a set is known to be admissible, which entails different challenges in terms of finite samples.

JMLR Journal 2016 Journal Article

Structure Learning in Bayesian Networks of a Moderate Size by Efficient Sampling

  • Ru He
  • Jin Tian
  • Huaiqing Wu

We study the Bayesian model averaging approach to learning Bayesian network structures (DAGs) from data. We develop new algorithms including the first algorithm that is able to efficiently sample DAGs of a moderate size (with up to about 25 variables) according to the exact structure posterior. The DAG samples can then be used to construct estimators for the posterior of any feature. We theoretically prove good properties of our estimators and empirically show that our estimators considerably outperform the estimators from the previous state- of-the-art methods. [abs] [ pdf ][ bib ] [ appendix ] &copy JMLR 2016. ( edit, beta )

AAAI Conference 2015 Conference Paper

Recovering Causal Effects from Selection Bias

  • Elias Bareinboim
  • Jin Tian

Controlling for selection and confounding biases are two of the most challenging problems that appear in data analysis in the empirical sciences as well as in artificial intelligence tasks. The combination of previously studied methods for each of these biases in isolation is not directly applicable to certain non-trivial cases in which selection and confounding biases are simultaneously present. In this paper, we tackle these instances non-parametrically and in full generality. We provide graphical and algorithmic conditions for recoverability of interventional distributions for when selection and confounding biases are both present. Our treatment completely characterizes the class of causal effects that are recoverable in Markovian models, and is sufficient for Semi-Markovian models.

AAAI Conference 2014 Conference Paper

Finding the k-best Equivalence Classes of Bayesian Network Structures for Model Averaging

  • Yetian Chen
  • Jin Tian

In this paper we develop an algorithm to find the kbest equivalence classes of Bayesian networks. Our algorithm is capable of finding much more best DAGs than the previous algorithm that directly finds the k-best DAGs (Tian, He, and Ram 2010). We demonstrate our algorithm in the task of Bayesian model averaging. Empirical results show that our algorithm significantly outperforms the k-best DAG algorithm in both time and space to achieve the same quality of approximation. Our algorithm goes beyond the maximum-a-posteriori (MAP) model by listing the most likely network structures and their relative likelihood and therefore has important applications in causal structure discovery.

AAAI Conference 2014 Conference Paper

Recovering from Selection Bias in Causal and Statistical Inference

  • Elias Bareinboim
  • Jin Tian
  • Judea Pearl

Selection bias is caused by preferential exclusion of units from the samples and represents a major obstacle to valid causal and statistical inferences; it cannot be removed by randomized experiments and can rarely be detected in either experimental or observational studies. In this paper, we provide complete graphical and algorithmic conditions for recovering conditional probabilities from selection biased data. We also provide graphical conditions for recoverability when unbiased data is available over a subset of the variables. Finally, we provide a graphical condition that generalizes the backdoor criterion and serves to recover causal effects when the data is collected under preferential selection.

AAAI Conference 2014 Conference Paper

Testable Implications of Linear Structural Equation Models

  • Bryant Chen
  • Jin Tian
  • Judea Pearl

In causal inference, all methods of model learning rely on testable implications, namely, properties of the joint distribution that are dictated by the model structure. These constraints, if not satisfied in the data, allow us to reject or modify the model. Most common methods of testing a linear structural equation model (SEM) rely on the likelihood ratio or chi-square test which simultaneously tests all of the restrictions implied by the model. Local constraints, on the other hand, offer increased power (Bollen and Pearl 2013; McDonald 2002) and, in the case of failure, provide the modeler with insight for revising the model specification. One strategy of uncovering local constraints in linear SEMs is to search for overidentified path coefficients. While these overidentifying constraints are well known, no method has been given for systematically discovering them. In this paper, we extend the half-trek criterion of (Foygel, Draisma, and Drton 2012) to identify a larger set of structural coefficients and use it to systematically discover overidentifying constraints. Still open is the question of whether our algorithm is complete.

NeurIPS Conference 2013 Conference Paper

Graphical Models for Inference with Missing Data

  • Karthika Mohan
  • Judea Pearl
  • Jin Tian

We address the problem of deciding whether there exists a consistent estimator of a given relation Q, when data are missing not at random. We employ a formal representation called `Missingness Graphs' to explicitly portray the causal mechanisms responsible for missingness and to encode dependencies between these mechanisms and the variables being measured. Using this representation, we define the notion of \textit{recoverability} which ensures that, for a given missingness-graph $G$ and a given query $Q$ an algorithm exists such that in the limit of large samples, it produces an estimate of $Q$ \textit{as if} no data were missing. We further present conditions that the graph should satisfy in order for recoverability to hold and devise algorithms to detect the presence of these conditions.

IJCAI Conference 2009 Conference Paper

  • Jin Tian

Linear causal models known as structural equation models (SEMs) are widely used for data analysis in the social sciences, economics, and artificial intelligence, in which random variables are assumed to be continuous and normally distributed. This paper deals with one fundamental problem in the applications of SEMs – parameter identification. The paper uses the graphical models approach and provides a procedure for solving the identification problem in a special class of SEMs.

JMLR Journal 2009 Journal Article

Markov Properties for Linear Causal Models with Correlated Errors

  • Changsung Kang
  • Jin Tian

A linear causal model with correlated errors, represented by a DAG with bi-directed edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property specifies, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property specifies a much smaller set of conditional independence relations which will imply all other conditional independence relations which hold under the global Markov property. For DAGs with bi-directed edges associated with arbitrary probability distributions, a local Markov property is given in Richardson (2003) which may invoke an exponential number of conditional independencies. In this paper, we show that for a class of linear structural equation models with correlated errors, there is a local Markov property which will invoke only a linear number of conditional independence relations. For general linear models, we provide a local Markov property that often invokes far fewer conditional independencies than that in Richardson (2003). The results have applications in testing linear structural equation models with correlated errors. [abs] [ pdf ][ bib ] &copy JMLR 2009. ( edit, beta )

AAAI Conference 2007 Conference Paper

On the Identification of a Class of Linear Models

  • Jin Tian

This paper deals with the problem of identifying direct causal effects in recursive linear structural equation models. The paper provides a procedure for solving the identification problem in a special class of models.

AAAI Conference 2006 Conference Paper

A Characterization of Interventional Distributions in Semi-Markovian Causal Models

  • Jin Tian

We offer a complete characterization of the set of distributions that could be induced by local interventions on variables governed by a causal Bayesian network of unknown structure, in which some of the variables remain unmeasured. We show that such distributions are constrained by a simply formulated set of inequalities, from which bounds can be derived on causal effects that are not directly measured in randomized experiments.

AAAI Conference 2005 Conference Paper

Identifying Direct Causal Effects in Linear Models

  • Jin Tian

This paper deals with the problem of identifying direct causal effects in recursive linear structural equation models. Using techniques developed for graphical causal models, we show that a model can be decomposed into a set of submodels such that the identification problem can be solved independently in each submodel. We provide a new identification method that identifies causal effects by solving a set of algebraic equations.

AAAI Conference 2004 Conference Paper

Identifying Linear Causal Effects

  • Jin Tian

This paper concerns the assessment of linear cause-effect relationships from a combination of observational data and qualitative causal structures. The paper shows how techniques developed for identifying causal effects in causal Bayesian networks can be used to identify linear causal effects, and thus provides a new approach for assessing linear causal effects in structural equation models. Using this approach the paper develops a systematic procedure for recognizing identifiable direct causal effects.