Arrow Research search

Author name cluster

Ilya Shpitser

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

42 papers
2 author rows

Possible papers

42

JMLR Journal 2025 Journal Article

Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings

  • Henrik von Kleist
  • Alireza Zamanian
  • Ilya Shpitser
  • Narges Ahmidi

Machine learning methods often assume that input features are available at no cost. However, in domains like healthcare, where acquiring features could be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value. The task of training an AI agent to decide which features to acquire is called active feature acquisition (AFA). By deploying an AFA agent, we effectively alter the acquisition strategy and trigger a distribution shift. To safely deploy AFA agents under this distribution shift, we present the problem of active feature acquisition performance evaluation (AFAPE). We examine AFAPE under i) a no direct effect (NDE) assumption, stating that acquisitions do not affect the underlying feature values; and ii) a no unobserved confounding (NUC) assumption, stating that retrospective feature acquisition decisions were only based on observed features. We show that one can apply missing data methods under the NDE assumption and offline reinforcement learning under the NUC assumption. When NUC and NDE hold, we propose a novel semi-offline reinforcement learning framework. This framework requires a weaker positivity assumption and introduces three new estimators: A direct method (DM), an inverse probability weighting (IPW), and a double reinforcement learning (DRL) estimator. [abs] [ pdf ][ bib ] &copy JMLR 2025. ( edit, beta )

TMLR Journal 2025 Journal Article

Explaining the Behavior of Black-Box Prediction Algorithms with Causal Learning

  • Numair Sani
  • Daniel Malinsky
  • Ilya Shpitser

Causal approaches to post-hoc explainability for black-box prediction models (e.g., deep neural networks trained on image pixel data) have become increasingly popular. However, existing approaches have two important shortcomings: (i) the “explanatory units” are micro-level inputs into the relevant prediction model, e.g., image pixels, rather than interpretable macro-level features that are more useful for understanding how to possibly change the algorithm’s behavior, and (ii) existing approaches assume there exists no unmeasured confounding between features and target model predictions, which fails to hold when the explanatory units are macro-level variables. Our focus is on the important setting where the analyst has no access to the inner workings of the target prediction algorithm, rather only the ability to query the output of the model in response to a particular input. To provide causal explanations in such a setting, we propose to learn causal graphical representations that allow for arbitrary unmeasured confounding among features. We demonstrate the resulting graph can differentiate between interpretable features that causally influence model predictions versus those that are merely associated with model predictions due to confounding. Our approach is motivated by a counterfactual theory of causal explanation wherein good explanations point to factors that are “difference-makers” in an interventionist sense.

ICML Conference 2025 Conference Paper

Feature Importance Metrics in the Presence of Missing Data

  • Henrik von Kleist
  • Joshua Wendland
  • Ilya Shpitser
  • Carsten Marr

Feature importance metrics are critical for interpreting machine learning models and understanding the relevance of individual features. However, real-world data often exhibit missingness, thereby complicating how feature importance should be evaluated. We introduce the distinction between two evaluation frameworks under missing data: (1) feature importance under the full data, as if every feature had been fully measured, and (2) feature importance under the observed data, where missingness is governed by the current measurement policy. While the full data perspective offers insights into the data generating process, it often relies on unrealistic assumptions and cannot guide decisions when missingness persists at model deployment. Since neither framework directly informs improvements in data collection, we additionally introduce the feature measurement importance gradient (FMIG), a novel, model-agnostic metric that identifies features that should be measured more frequently to enhance predictive performance. Using synthetic data, we illustrate key differences between these metrics and the risks of conflating them.

UAI Conference 2024 Conference Paper

A General Identification Algorithm For Data Fusion Problems Under Systematic Selection

  • Jaron Jia Rong Lee
  • AmirEmad Ghassami
  • Ilya Shpitser

Causal inference is made challenging by confounding, selection bias, and other complications. A common approach to addressing these difficulties is the inclusion of auxiliary data on the superpopulation of interest. Such data may measure a different set of variables, or be obtained under different experimental conditions than the primary dataset. Analysis based on multiple datasets must carefully account for similarities between datasets, while appropriately accounting for differences. In addition, selection of experimental units into different datasets may be systematic; similar difficulties are encountered in missing data problems. Existing methods for combining datasets either do not consider this issue, or assume simple selection mechanisms. In this paper, we provide a general approach, based on graphical causal models, for causal inference from data on the same superpopulation that is obtained under different experimental conditions. Our framework allows both arbitrary unobserved confounding, and arbitrary selection processes into different experimental regimes in our data. We describe how systematic selection processes may be organized into a hierarchy similar to censoring processes in missing data: selected completely at random (SCAR), selected at random (SAR), and selected not at random (SNAR). In addition, we provide a general identification algorithm for interventional distributions in this setting.

ICML Conference 2024 Conference Paper

Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

  • Zixiao Wang 0006
  • AmirEmad Ghassami
  • Ilya Shpitser

We consider the task of identifying and estimating a parameter of interest in settings where data is missing not at random (MNAR). In general, such parameters are not identified without strong assumptions on the missing data model. In this paper, we take an alternative approach and introduce a method inspired by data fusion, where information in the MNAR dataset is augmented by information in an auxiliary dataset subject to missingness at random (MAR). We show that even if the parameter of interest cannot be identified given either dataset alone, it can be identified given pooled data, under two complementary sets of assumptions. We derive inverse probability weighted (IPW) estimators for identified parameters under both sets of assumptions, and evaluate the performance of our estimation strategies via simulation studies, and a data application.

UAI Conference 2024 Conference Paper

Zero Inflation as a Missing Data Problem: a Proxy-based Approach

  • Trung Phung
  • Jaron Jia Rong Lee
  • Opeyemi Oladapo-Shittu
  • Eili Y. Klein
  • Ayse Pinar Gurses
  • Susan M. Hannum
  • Kimberly Weems
  • Jill A. Marsteller

A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e. g. artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros, or aim to replace excess zeros by imputed values. If the goal of the analysis relies on knowing true data realizations, a particular challenge with zero-inflated data is identifiability, since it is difficult to correctly determine which observed zeros are real and which are inflated. This paper views zero-inflated data as a general type of missing data problem, where the observability indicator for a potentially censored variable is itself unobserved whenever a zero is recorded. We show that, without additional assumptions, target parameters involving a zero-inflated variable are not identified. However, if a proxy of the missingness indicator is observed, a modification of the effect restoration approach of Kuroki and Pearl allows identification and estimation, given the proxy-indicator relationship is known. If this relationship is unknown, our approach yields a partial identification strategy for sensitivity analysis. Specifically, we show that only certain proxy-indicator relationships are compatible with the observed data distribution. We give an analytic bound for this relationship in cases with a categorical outcome, which is sharp in certain models. For more complex cases, sharp numerical bounds may be computed using methods in Duarte et al. [2023]. We illustrate our method via simulation studies and a data application on central line-associated bloodstream infections (CLABSIs).

JMLR Journal 2023 Journal Article

The Proximal ID Algorithm

  • Ilya Shpitser
  • Zach Wood-Doughty
  • Eric J. Tchetgen Tchetgen

Unobserved confounding is a fundamental obstacle to establishing valid causal conclusions from observational data. Two complementary types of approaches have been developed to address this obstacle: obtaining identification using fortuitous external aids, such as instrumental variables or proxies, or by means of the ID algorithm, using Markov restrictions on the full data distribution encoded in graphical causal models. In this paper we aim to develop a synthesis of the former and latter approaches to identification in causal inference to yield the most general identification algorithm in multivariate systems currently known -- the proximal ID algorithm. In addition to being able to obtain nonparametric identification in all cases where the ID algorithm succeeds, our approach allows us to systematically exploit proxies to adjust for the presence of unobserved confounders that would have otherwise prevented identification. In addition, we outline a class of estimation strategies for causal parameters identified by our method in an important special case. We illustrate our approach by simulation studies and a data application. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2023. ( edit, beta )

NeurIPS Conference 2022 Conference Paper

Causal Discovery in Linear Latent Variable Models Subject to Measurement Error

  • Yuqin Yang
  • AmirEmad Ghassami
  • Mohamed Nafea
  • Negar Kiyavash
  • Kun Zhang
  • Ilya Shpitser

We focus on causal discovery in the presence of measurement error in linear systems where the mixing matrix, i. e. , the matrix indicating the independent exogenous noise terms pertaining to the observed variables, is identified up to permutation and scaling of the columns. We demonstrate a somewhat surprising connection between this problem and causal discovery in the presence of unobserved parentless causes, in the sense that there is a mapping, given by the mixing matrix, between the underlying models to be inferred in these problems. Consequently, any identifiability result based on the mixing matrix for one model translates to an identifiability result for the other model. We characterize to what extent the causal models can be identified under a two-part faithfulness assumption. Under only the first part of the assumption (corresponding to the conventional definition of faithfulness), the structure can be learned up to the causal ordering among an ordered grouping of the variables but not all the edges across the groups can be identified. We further show that if both parts of the faithfulness assumption are imposed, the structure can be learned up to a more refined ordered grouping. As a result of this refinement, for the latent variable model with unobserved parentless causes, the structure can be identified. Based on our theoretical results, we propose causal structure learning methods for both models, and evaluate their performance on synthetic data.

UAI Conference 2022 Conference Paper

Semiparametric causal sufficient dimension reduction of multidimensional treatments

  • Razieh Nabi
  • Todd McNutt
  • Ilya Shpitser

Cause-effect relationships are typically evaluated by comparing outcome responses to binary treatment values, representing two arms of a hypothetical randomized controlled trial. However, in certain applications, treatments of interest are continuous and multidimensional. For example, understanding the causal relationship between severity of radiation therapy, summarized by a multidimensional vector of radiation exposure values and post-treatment side effects is a problem of clinical interest in radiation oncology. An appropriate strategy for making interpretable causal conclusions is to reduce the dimension of treatment. If individual elements of a multidimensional treatment vector weakly affect the outcome, but the overall relationship between treatment and outcome is strong, careless approaches to dimension reduction may not preserve this relationship. Further, methods developed for regression problems do not directly transfer to causal inference due to confounding complications. In this paper, we use semiparametric inference theory for structural models to give a general approach to causal sufficient dimension reduction of a multidimensional treatment such that the cause-effect relationship between treatment and outcome is preserved. We illustrate the utility of our proposals through simulations and a real data application in radiation oncology.

JMLR Journal 2022 Journal Article

Semiparametric Inference For Causal Effects In Graphical Models With Hidden Variables

  • Rohit Bhattacharya
  • Razieh Nabi
  • Ilya Shpitser

Identification theory for causal effects in causal models associated with hidden variable directed acyclic graphs (DAGs) is well studied. However, the corresponding algorithms are underused due to the complexity of estimating the identifying functionals they output. In this work, we bridge the gap between identification and estimation of population-level causal effects involving a single treatment and a single outcome. We derive influence function based estimators that exhibit double robustness for the identified effects in a large class of hidden variable DAGs where the treatment satisfies a simple graphical criterion; this class includes models yielding the adjustment and front-door functionals as special cases. We also provide necessary and sufficient conditions under which the statistical model of a hidden variable DAG is nonparametrically saturated and implies no equality constraints on the observed data distribution. Further, we derive an important class of hidden variable DAGs that imply observed data distributions observationally equivalent (up to equality constraints) to fully observed DAGs. In these classes of DAGs, we derive estimators that achieve the semiparametric efficiency bounds for the target of interest where the treatment satisfies our graphical criterion. Finally, we provide a sound and complete identification algorithm that directly yields a weight based estimation strategy for any identifiable effect in hidden variable causal models. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2022. ( edit, beta )

UAI Conference 2021 Conference Paper

Entropic Inequality Constraints from e-separation Relations in Directed Acyclic Graphs with Hidden Variables

  • Noam Finkelstein
  • Beata Zjawin
  • Elie Wolfe
  • Ilya Shpitser
  • Robert W. Spekkens

Directed acyclic graphs (DAGs) with hidde variables are often used to characterize causal relations between variables in a system. When some variables are unobserved, DAGs imply a notoriously complicated set of constraints on the distribution of observed variables. In this work, we present entropic inequality constraints that are implied by e-separation relations in hidden variable DAGs with discrete observed variables. The constraints can intuitively be understood to follow from the fact that the capacity of variables along a causal pathway to convey information is restricted by their entropy; e. g. at the extreme case, a variable with entropy 0 can convey no information. We show how these constraints can be used to learn about the true causal model from an observed data distribution. In addition, we propose a measure of causal influence called the minimal mediary entropy, and demonstrate that it can concisely augment traditional measures such as the average treatment effect.

UAI Conference 2021 Conference Paper

Partial Identifiability in Discrete Data with Measurement Error

  • Noam Finkelstein
  • Roy Adams
  • Suchi Saria
  • Ilya Shpitser

When data contains measurement errors, it is necessary to make modeling assumptions relating the error-prone measurements to the unobserved true values. Work on measurement error has largely focused on models that fully identify the parameter of interest. As a result, many practically useful models that result in bounds on the target parameter – known as partial identification – have been neglected. In this work, we present a method for partial identification in a class of measurement error models involving discrete variables. We focus on models that impose linear constraints on the target parameter, allowing us to compute partial identification bounds using off-the-shelf LP solvers. We show how several common measurement error assumptions can be composed with an extended class of instrumental variable-type models to create such linear constraint sets. We further show how this approach can be used to bound causal parameters, such as the average treatment effect, when treatment or outcome variables are measured with error. Using data from the Oregon Health Insurance Experiment, we apply this method to estimate bounds on the effect Medicaid enrollment has on depression when depression is measured with error.

UAI Conference 2021 Conference Paper

Path dependent structural equation models

  • Ranjani Srinivasan
  • Jaron Jia Rong Lee
  • Rohit Bhattacharya
  • Ilya Shpitser

Causal analyses of longitudinal data generally assume that the qualitative causal structure relating variables remains invariant over time. In structured systems that transition between qualitatively different states in discrete time steps, such an approach is deficient on two fronts. First, time-varying variables may have state-specific causal relationships that need to be captured. Second, an intervention can result in state transitions downstream of the intervention different from those actually observed in the data. In other words, interventions may counterfactually alter the subsequent temporal evolution of the system. We introduce a generalization of causal graphical models, Path Dependent Structural Equation Models (PDSEMs), that can describe such systems. We show how causal inference may be performed in such models and illustrate its use in simulations and data obtained from a septoplasty surgical procedure.

UAI Conference 2020 Conference Paper

Deriving Bounds And Inequality Constraints Using Logical Relations Among Counterfactuals

  • Noam Finkelstein
  • Ilya Shpitser

Causal parameters may not be point identified in the presence of unobserved confounding. However, information about non-identified parameters, in the form of bounds, may still be recovered from the observed data in some cases. We develop a new general method for obtaining bounds on causal parameters using rules of probability and restrictions on counterfactuals implied by causal graphical models. We additionally provide inequality constraints on functionals of the observed data law implied by such causal models. Our approach is motivated by the observation that logical relations between identified and non-identified counterfactual events often yield information about non-identified events. We show that this approach is powerful enough to recover known sharp bounds and tight inequality constraints, and to derive novel bounds and constraints.

ICML Conference 2020 Conference Paper

Full Law Identification in Graphical Models of Missing Data: Completeness Results

  • Razieh Nabi
  • Rohit Bhattacharya
  • Ilya Shpitser

Missing data has the potential to affect analyses conducted in all fields of scientific study including healthcare, economics, and the social sciences. Several approaches to unbiased inference in the presence of non-ignorable missingness rely on the specification of the target distribution and its missingness process as a probability distribution that factorizes with respect to a directed acyclic graph. In this paper, we address the longstanding question of the characterization of models that are identifiable within this class of missing data distributions. We provide the first completeness result in this field of study – necessary and sufficient graphical conditions under which, the full data distribution can be recovered from the observed data distribution. We then simultaneously address issues that may arise due to the presence of both missing data and unmeasured confounding, by extending these graphical conditions and proofs of completeness, to settings where some variables are not just missing, but completely unobserved.

UAI Conference 2020 Conference Paper

Identification and Estimation of Causal Effects Defined by Shift Interventions

  • Numair Sani
  • Jaron Jia Rong Lee
  • Ilya Shpitser

Causal inference quantifies cause effect relationships by means of counterfactual responses had some variable been artificially set to a constant. A more refined notion of manipulation, where a variable is artificially set to a fixed function of its natural value is also of interest in particular domains. Examples include increases in financial aid, changes in drug dosing, and modifying length of stay in a hospital. We define counterfactual responses to manipulations of this type, which we call shift interventions. We show that in the presence of multiple variables being manipulated, two types of shift interventions are possible. Shift interventions on the treated (SITs) are defined with respect to natural values, and are connected to effects of treatment on the treated. Shift interventions as policies (SIPs) are defined recursively with respect to values of responses to prior shift interventions, and are connected to dynamic treatment regimes. We give sound and complete identification algorithms for both types of shift interventions, and derive efficient semi-parametric estimators for the mean response to a shift intervention in a special case motivated by a healthcare problem. Finally, we demonstrate the utility of our method by using an electronic health record dataset to estimate the effect of extending the length of stay in the intensive care unit (ICU) in a hospital by an extra day on patient ICU readmission probability.

UAI Conference 2019 Conference Paper

Causal Inference Under Interference And Network Uncertainty

  • Rohit Bhattacharya
  • Daniel Malinsky
  • Ilya Shpitser

Classical causal and statistical inference methods typically assume the observed data consists of independent realizations. However, in many applications this assumption is inappropriate due to a network of dependences between units in the data. Methods for estimating causal effects have been developed in the setting where the structure of dependence between units is known exactly, but in practice there is often substantial uncertainty about the precise network structure. This is true, for example, in trial data drawn from vulnerable communities where social ties are difficult to query directly. In this paper we combine techniques from the structure learning and interference literatures in causal inference, proposing a general method for estimating causal effects under data dependence when the structure of this dependence is not known a priori. We demonstrate the utility of our method on synthetic datasets which exhibit network dependence.

UAI Conference 2019 Conference Paper

Identification In Missing Data Models Represented By Directed Acyclic Graphs

  • Rohit Bhattacharya
  • Razieh Nabi
  • Ilya Shpitser
  • James M. Robins

Missing data is a pervasive problem in data analyses, resulting in datasets that contain censored realizations of a target distribution. Many approaches to inference on the target distribution using censored observed data, rely on missing data models represented as a factorization with respect to a directed acyclic graph. In this paper we consider the identifiability of the target distribution within this class of models, and show that the most general identification strategies proposed so far retain a significant gap in that they fail to identify a wide class of identifiable distributions. To address this gap, we propose a new algorithm that significantly generalizes the types of manipulations used in the ID algorithm [14, 16], developed in the context of causal inference, in order to obtain identification.

UAI Conference 2019 Conference Paper

Intervening on Network Ties

  • Eli Sherman
  • Ilya Shpitser

A foundational tool for making causal inferences is the emulation of randomized control trials via variable interventions. This approach has been applied to a wide variety of contexts, from health to economics [3, 6]. Variable interventions have long been studied in independent and identically distributed (iid) data contexts, but recently non-iid settings, such as networks with interacting agents [8, 17, 28] have attracted interest. In this paper, we propose a type of structural intervention [12] relevant in network contexts: the network intervention. Rather than estimating the effect of changing variables, we consider changes to social network structure resulting from creation or severance of ties between agents. We define the individual participant and average bystander effects for these interventions and describe identification criteria. We then prove a series of theoretical results that show existing identification theory obtains minimally KL-divergent distributions corresponding to network interventions. Finally, we demonstrate estimation of effects of network interventions via a simulation study.

ICML Conference 2019 Conference Paper

Learning Optimal Fair Policies

  • Razieh Nabi
  • Daniel Malinsky
  • Ilya Shpitser

Systematic discriminatory biases present in our society influence the way data is collected and stored, the way variables are defined, and the way scientific findings are put into practice as policy. Automated decision procedures and learning algorithms applied to such data may serve to perpetuate existing injustice or unfairness in our society. In this paper, we consider how to make optimal but fair decisions, which “break the cycle of injustice” by correcting for the unfair dependence of both decisions and outcomes on sensitive features (e. g. , variables that correspond to gender, race, disability, or other protected attributes). We use methods from causal inference and constrained optimization to learn optimal policies in a way that addresses multiple potential biases which afflict data analysis in sensitive contexts, extending the approach of Nabi & Shpitser (2018). Our proposal comes equipped with the theoretical guarantee that the chosen fair policy will induce a joint distribution for new instances that satisfies given fairness constraints. We illustrate our approach with both synthetic data and real criminal justice data.

UAI Conference 2018 Conference Paper

Acyclic Linear SEMs Obey the Nested Markov Property

  • Ilya Shpitser
  • Robin J. Evans 0002
  • Thomas S. Richardson 0001

models are widely used in a variety of disciplines, with many theoretical developments and applications. The conditional independence structure induced on the observed marginal distribution by a hidden variable directed acyclic graph (DAG) may be represented by a graphical model represented by mixed graphs called maximal ancestral graphs (MAGs). This model has a number of desirable properties, in particular the set of Gaussian distributions can be parameterized by viewing the graph as a path diagram. Models represented by MAGs have been used for causal discovery [22], and identification theory for causal effects [28]. An important parametric subclass of causal graphical models are the linear structural equation models with correlated errors (SEMs). In fact, Wright and to some extent Haavelmo’s work was originally within the SEM class. SEMs are defined over a class of mixed graphs containing directed (→) edges representing direct causation, and bidirected (↔) edges representing correlated errors. Mixed graphs of this type without directed cycles— an assumption that rules out cyclic causation—are called acyclic directed mixed graphs (ADMGs). Given an ADMG G, the linear structural equation model with correlated errors (SEM) associated with G is formally defined as the set of multivariate normal distributions with covariance matrices of the form In addition to ordinary conditional independence constraints, hidden variable DAGs also induce generalized independence constraints. These constraints form the nested Markov property [20]. We first show that acyclic linear SEMs obey this property. Further we show that a natural parameterization for all Gaussian distributions obeying the nested Markov property arises from a generalization of maximal ancestral graphs that we call maximal arid graphs (MArG). We show that every nested Markov model can be associated with a MArG; viewed as a path diagram this MArG parametrizes the Gaussian nested Markov model. This leads directly to methods for ML fitting and computing BIC scores for Gaussian nested models. 1 Thomas S. Richardson Department of Statistics University of Washington thomasr@u. washington. edu

UAI Conference 2018 Conference Paper

Estimation of Personalized Effects Associated With Causal Pathways

  • Razieh Nabi
  • Phyllis Kanki
  • Ilya Shpitser

The goal of personalized decision making is to map a unit’s characteristics to an action tailored to maximize the expected outcome for that unit. Obtaining high-quality mappings of this type is the goal of the dynamic regime literature. In healthcare settings, optimizing policies with respect to a particular causal pathway may be of interest as well. For example, we may wish to maximize the chemical effect of a drug given data from an observational study where the chemical effect of the drug on the outcome is entangled with the indirect effect mediated by differential adherence. In such cases, we may wish to optimize the direct effect of a drug, while keeping the indirect effect to that of some reference treatment. [15] shows how to combine mediation analysis and dynamic treatment regime ideas to defines policies associated with causal pathways and counterfactual responses to these policies. In this paper, we derive a variety of methods for learning high quality policies of this type from data, in a causal model corresponding to a longitudinal setting of practical importance. We illustrate our methods via a dataset of HIV patients undergoing therapy, gathered in the Nigerian PEPFAR program.

AAAI Conference 2018 Conference Paper

Fair Inference on Outcomes

  • Razieh Nabi
  • Ilya Shpitser

In this paper, we consider the problem of fair statistical inference involving outcome variables. Examples include classification and regression problems, and estimating treatment effects in randomized trials or observational data. The issue of fairness arises in such problems where some covariates or treatments are “sensitive, ” in the sense of having potential of creating discrimination. In this paper, we argue that the presence of discrimination can be formalized in a sensible way as the presence of an effect of a sensitive covariate on the outcome along certain causal pathways, a view which generalizes (Pearl 2009). A fair outcome model can then be learned by solving a constrained optimization problem. We discuss a number of complications that arise in classical statistical inference due to this view and provide workarounds based on recent work in causal and semi-parametric inference.

NeurIPS Conference 2018 Conference Paper

Identification and Estimation of Causal Effects from Dependent Data

  • Eli Sherman
  • Ilya Shpitser

The assumption that data samples are independent and identically distributed (iid) is standard in many areas of statistics and machine learning. Nevertheless, in some settings, such as social networks, infectious disease modeling, and reasoning with spatial and temporal data, this assumption is false. An extensive literature exists on making causal inferences under the iid assumption [12, 8, 21, 16], but, as pointed out in [14], causal inference in non-iid contexts is challenging due to the combination of unobserved confounding bias and data dependence. In this paper we develop a general theory describing when causal inferences are possible in such scenarios. We use segregated graphs [15], a generalization of latent projection mixed graphs [23], to represent causal models of this type and provide a complete algorithm for non-parametric identification in these models. We then demonstrate how statistical inferences may be performed on causal parameters identified by this algorithm, even in cases where parts of the model exhibit full interference, meaning only a single sample is available for parts of the model [19]. We apply these techniques to a synthetic data set which considers the adoption of fake news articles given the social network structure, articles read by each person, and baseline demographics and socioeconomic covariates.

UAI Conference 2018 Conference Paper

Identification of Personalized Effects Associated With Causal Pathways

  • Ilya Shpitser
  • Eli Sherman

Unlike classical causal inference, where the goal is to estimate average causal effects within a population, in settings such as personalized medicine, the goal is to map a unit’s characteristics to a treatment tailored to maximize the expected outcome for that unit. Obtaining high-quality mappings of this type is the goal of the dynamic treatment regime literature. In healthcare settings, optimizing policies with respect to a particular causal pathway is often of interest as well. In the context of average treatment effects, estimation of effects associated with causal pathways is considered in the mediation analysis literature. In this paper, we combine mediation analysis and dynamic treatment regime ideas and consider how unit characteristics may be used to tailor a treatment strategy that maximizes an effect along specified sets of causal pathways. In particular, we define counterfactual responses to such policies, give a general identification algorithm for these counterfactuals, and prove completeness of the algorithm for unrestricted policies. A corollary of our results is that the identification algorithm for responses to policies given in [16] is complete for arbitrary policies.

NeurIPS Conference 2016 Conference Paper

Consistent Estimation of Functions of Data Missing Non-Monotonically and Not at Random

  • Ilya Shpitser

Missing records are a perennial problem in analysis of complex data of all types, when the target of inference is some function of the full data law. In simple cases, where data is missing at random or completely at random (Rubin, 1976), well-known adjustments exist that result in consistent estimators of target quantities. Assumptions underlying these estimators are generally not realistic in practical missing data problems. Unfortunately, consistent estimators in more complex cases where data is missing not at random, and where no ordering on variables induces monotonicity of missingness status are not known in general, with some notable exceptions (Robins, 1997), (Tchetgen Tchetgen et al, 2016), (Sadinle and Reiter, 2016). In this paper, we propose a general class of consistent estimators for cases where data is missing not at random, and missingness status is non-monotonic. Our estimators, which are generalized inverse probability weighting estimators, make no assumptions on the underlying full data law, but instead place independence restrictions, and certain other fairly mild assumptions, on the distribution of missingness status conditional on the data. The assumptions we place on the distribution of missingness status conditional on the data can be viewed as a version of a conditional Markov random field (MRF) corresponding to a chain graph. Assumptions embedded in our model permit identification from the observed data law, and admit a natural fitting procedure based on the pseudo likelihood approach of (Besag, 1975). We illustrate our approach with a simple simulation study, and an analysis of risk of premature birth in women in Botswana exposed to highly active anti-retroviral therapy.

UAI Conference 2015 Conference Paper

Missing Data as a Causal and Probabilistic Problem

  • Ilya Shpitser
  • Karthika Mohan
  • Judea Pearl

Causal inference is often phrased as a missing data problem – for every unit, only the response to observed treatment assignment is known, the response to other treatment assignments is not. In this paper, we extend the converse approach of [7] of representing missing data problems to causal models where only interventions on missingness indicators are allowed. We further use this representation to leverage techniques developed for the problem of identification of causal effects to give a general criterion for cases where a joint distribution containing missing variables can be recovered from data actually observed, given assumptions on missingness mechanisms. This criterion is significantly more general than the commonly used “missing at random” (MAR) criterion, and generalizes past work which also exploits a graphical representation of missingness. In fact, the relationship of our criterion to MAR is not unlike the relationship between the ID algorithm for identification of causal effects [22, 18], and conditional ignorability [13].

NeurIPS Conference 2015 Conference Paper

Segregated Graphs and Marginals of Chain Graph Models

  • Ilya Shpitser

Bayesian networks are a popular representation of asymmetric (for example causal) relationships between random variables. Markov random fields (MRFs) are a complementary model of symmetric relationships used in computer vision, spatial modeling, and social and gene expression networks. A chain graph model under the Lauritzen-Wermuth-Frydenberg interpretation (hereafter a chain graph model) generalizes both Bayesian networks and MRFs, and can represent asymmetric and symmetric relationships together. As in other graphical models, the set of marginals from distributions in a chain graph model induced by the presence of hidden variables forms a complex model. One recent approach to the study of marginal graphical models is to consider a well-behaved supermodel. Such a supermodel of marginals of Bayesian networks, defined only by conditional independences, and termed the ordinary Markov model, was studied at length in (Evans and Richardson, 2014). In this paper, we show that special mixed graphs which we call segregated graphs can be associated, via a Markov property, with supermodels of a marginal of chain graphs defined only by conditional independences. Special features of segregated graphs imply the existence of a very natural factorization for these supermodels, and imply many existing results on the chain graph model, and ordinary Markov model carry over. Our results suggest that segregated graphs define an analogue of the ordinary Markov model for marginals of chain graph models.

UAI Conference 2013 Conference Paper

Sparse Nested Markov models with Log-linear Parameters

  • Ilya Shpitser
  • Robin J. Evans 0002
  • Thomas S. Richardson 0001
  • James M. Robins

Hidden variables are ubiquitous in practical data analysis, and therefore modeling marginal densities and doing inference with the resulting models is an important problem in statistics, machine learning, and causal inference. Recently, a new type of graphical model, called the nested Markov model, was developed which captures equality constraints found in marginals of directed acyclic graph (DAG) models. Some of these constraints, such as the so called ‘Verma constraint’, strictly generalize conditional independence. To make modeling and inference with nested Markov models practical, it is necessary to limit the number of parameters in the model, while still correctly capturing the constraints in the marginal of a DAG model. Placing such limits is similar in spirit to sparsity methods for undirected graphical models, and regression models. In this paper, we give a log-linear parameterization which allows sparse modeling with nested Markov models. We illustrate the advantages of this parameterization with a simulation study.

UAI Conference 2012 Conference Paper

Nested Markov Properties for Acyclic Directed Mixed Graphs

  • Thomas S. Richardson 0001
  • James M. Robins
  • Ilya Shpitser

Directed acyclic graph (DAG) models may be characterized in four different ways: via a factorization, the dseparation criterion, the moralization criterion, and the local Markov property. As pointed out by Robins [2, 1], Verma and Pearl [6], and Tian and Pearl [5], marginals of DAG models also imply equality constraints that are not conditional independences. The well-known ‘Verma constraint’ is an example. Constraints of this type were used for testing edges [3], and an efficient variable elimination scheme [4]. Using acyclic directed mixed graphs (ADMGs) we provide a graphical characterization of the constraints given in [5] via a nested Markov property that uses a ‘fixing’ transformation on graphs. We give four characterizations of our nested model that are analogous to those given for DAGs. We show that marginal distributions of DAG models obey this property.

UAI Conference 2010 Conference Paper

On the Validity of Covariate Adjustment for Estimating Causal Effects

  • Ilya Shpitser
  • Tyler J. VanderWeele
  • James M. Robins

Identifying effects of actions (treatments) on outcome variables from observational data and causal assumptions is a fundamental problem in causal inference. This identification is made difficult by the presence of confounders which can be related to both treatment and outcome variables. Confounders are often handled, both in theory and in practice, by adjusting for covariates, in other words considering outcomes conditioned on treatment and covariate values, weighed by probability of observing those covariate values. In this paper, we give a complete graphical criterion for covariate adjustment, which we term the adjustment criterion, and derive some interesting corollaries of the completeness of this criterion.

AAAI Conference 2010 Conference Paper

Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features

  • Eun Yong Kang
  • Ilya Shpitser
  • Eleazar Eskin

There have been many efforts to identify causal graphical features such as directed edges between random variables from observational data. Recently, Tian et al. proposed a new dynamic programming algorithm which computes marginalized posterior probabilities of directed edge features over all the possible structures in O(n3n ) time when the number of parents per node is bounded by a constant, where n is the number of variables of interest. However the main drawback of this approach is that deciding a single appropriate threshold for the existence of the directed edge feature is difficult due to the scale difference of the posterior probabilities between the directed edges forming v-structures and the directed edges not forming v-structures. We claim that computing posterior probabilities of both adjacencies and v-structures is necessary and more effective for discovering causal graphical features, since it allows us to find a single appropriate decision threshold for the existence of the feature that we are testing. For efficient computation, we provide a novel dynamic programming algorithm which computes the posterior probabilities of all of n(n−1) 2 adjacency and n n−1 2 v-structure features in O(n3 3n ) time.

IJCAI Conference 2009 Conference Paper

  • Ilya Shpitser
  • Thomas S. Richardson
  • James M. Robins

We consider the problem of testing whether two variables should be adjacent (either due to a direct effect between them, or due to a hidden common cause) given an observational distribution, and a set of causal assumptions encoded as a causal diagram. In other words, given a set of edges in the diagram known to be true, we are interested in testing whether another edge ought to be in the diagram. In fully observable faithful models this problem can be easily solved with conditional independence tests. Latent variables make the problem significantly harder since they can imply certain non-adjacent variable pairs, namely those connected by so called inducing paths, are not independent conditioned on any set of variables. We characterize which variable pairs can be determined to be non-adjacent by a class of constraints due to dormant independence, that is conditional independence in identifiable interventional distributions. Furthermore, we show that particular operations on joint distributions, which we call truncations are sufficient for exhibiting these non-adjacencies. This suggests a causal discovery procedure taking advantage of these constraints in the latent variable case can restrict itself to truncations.

UAI Conference 2009 Conference Paper

Effects of Treatment on the Treated: Identification and Generalization

  • Ilya Shpitser
  • Judea Pearl

differently to a reduced dosage x than a randomly selected patient would. Many applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called “effect of treatment on the treated, ” (ETT) have been used to to evaluate educational programs, critic public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i. e. , identified in) experimental and/or observational studies. We show that, when the action invokes a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams. We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions. The second problem involves a policy maker considering the termination of an ongoing job-training program, seeking to estimate the anticipated reduction in future earning of those enrolled in the program. This calls for comparing the future earning of the program’s graduates to their hypothetical earning had they not been trained. Again, because those who enroll in the program have special needs and qualities, comparison to the population at large will not be adequate.

JMLR Journal 2008 Journal Article

Complete Identification Methods for the Causal Hierarchy

  • Ilya Shpitser
  • Judea Pearl

We consider a hierarchy of queries about causal relationships in graphical models, where each level in the hierarchy requires more detailed information than the one below. The hierarchy consists of three levels: associative relationships, derived from a joint distribution over the observable variables; cause-effect relationships, derived from distributions resulting from external interventions; and counterfactuals, derived from distributions that span multiple "parallel worlds" and resulting from simultaneous, possibly conflicting observations and interventions. We completely characterize cases where a given causal query can be computed from information lower in the hierarchy, and provide algorithms that accomplish this computation. Specifically, we show when effects of interventions can be computed from observational studies, and when probabilities of counterfactuals can be computed from experimental studies. We also provide a graphical characterization of those queries which cannot be computed (by any method) from queries at a lower layer of the hierarchy. [abs] [ pdf ][ bib ] &copy JMLR 2008. ( edit, beta )

AAAI Conference 2008 Conference Paper

Dormant Independence

  • Ilya Shpitser

The construction of causal graphs from non-experimental data rests on a set of constraints that the graph structure imposes on all probability distributions compatible with the graph. These constraints are of two types: conditional independencies and algebraic constraints, first noted by Verma. While conditional independencies are well studied and frequently used in causal induction algorithms, Verma constraints are still poorly understood, and rarely applied. In this paper we examine a special subset of Verma constraints which are easy to understand, easy to identify and easy to apply; they arise from “dormant independencies, ” namely, conditional independencies that hold in interventional distributions. We give a complete algorithm for determining if a dormant independence between two sets of variables is entailed by the causal graph, such that this independence is identifiable, in other words if it resides in an interventional distribution that can be predicted without resorting to interventions. We further show the usefulness of dormant independencies in model testing and induction by giving an algorithm that uses constraints entailed by dormant independencies to prune extraneous edges from a given causal graph.

UAI Conference 2007 Conference Paper

What Counterfactuals Can Be Tested

  • Ilya Shpitser
  • Judea Pearl

Counterfactual statements, e.g., "my headache would be gone had I taken an aspirin" are central to scientific discourse, and are formally interpreted as statements derived from "alternative worlds". However, since they invoke hypothetical states of affairs, often incompatible with what is actually known or observed, testing counterfactuals is fraught with conceptual and practical difficulties. In this paper, we provide a complete characterization of "testable counterfactuals," namely, counterfactual statements whose probabilities can be inferred from physical experiments. We provide complete procedures for discerning whether a given counterfactual is testable and, if so, expressing its probability in terms of experimental data.

UAI Conference 2006 Conference Paper

Identification of Conditional Interventional Distributions

  • Ilya Shpitser
  • Judea Pearl

The subject of this paper is the elucidation of effects of actions from causal assumptions represented as a directed graph, and statistical knowledge given as a probability distribution. In particular, we are interested in predicting conditional distributions resulting from performing an action on a set of variables and, subsequently, taking measurements of another set. We provide a necessary and sufficient graphical condition for the cases where such distributions can be uniquely computed from the available information, as well as an algorithm which performs this computation whenever the condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995] for the same identification problem.

AAAI Conference 2006 Conference Paper

Identification of Joint Interventional Distributions in Recursive Semi-Markovian Causal Models

  • Ilya Shpitser

This paper is concerned with estimating the effects of actions from causal assumptions, represented concisely as a directed graph, and statistical knowledge, given as a probability distribution. We provide a necessary and sufficient graphical condition for the cases when the causal effect of an arbitrary set of variables on another arbitrary set can be determined uniquely from the available information, as well as an algorithm which computes the effect whenever this condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995], and a version of an identification algorithm in [Tian, 2002] for the same identification problem. Finally, we derive a complete characterization of semi- Markovian models in which all causal effects are identifiable.

IJCAI Conference 2005 Conference Paper

Identifiability of Path-Specific Effects

  • Chen Avin
  • Ilya Shpitser
  • Judea

Counterfactual quantities representing pathspecific effects arise in cases where we are interested in computing the effect of one variable on another only along certain causal paths in the graph (in other words by excluding a set of edges from consideration). A recent paper [Pearl, 2001] details a method by which such an exclusion can be specified formally by fixing the value of the parent node of each excluded edge. In this paper we derive simple, graphical conditions for experimental identifiability of path-specific effects, namely, conditions under which path-specific effects can be estimated consistently from data obtained from controlled experiments.

NeurIPS Conference 2002 Conference Paper

Identity Uncertainty and Citation Matching

  • Hanna Pasula
  • Bhaskara Marthi
  • Brian Milch
  • Stuart Russell
  • Ilya Shpitser

Identity uncertainty is a pervasive problem in real-world data analysis. It arises whenever objects are not labeled with unique identifiers or when those identifiers may not be perceived perfectly. In such cases, two ob- servations may or may not correspond to the same object. In this paper, we consider the problem in the context of citation matching—the prob- lem of deciding which citations correspond to the same publication. Our approach is based on the use of a relational probability model to define a generative model for the domain, including models of author and title corruption and a probabilistic citation grammar. Identity uncertainty is handled by extending standard models to incorporate probabilities over the possible mappings between terms in the language and objects in the domain. Inference is based on Markov chain Monte Carlo, augmented with specific methods for generating efficient proposals when the domain contains many objects. Results on several citation data sets show that the method outperforms current algorithms for citation matching. The declarative, relational nature of the model also means that our algorithm can determine object characteristics such as author names by combining multiple citations of multiple papers.