Author name cluster

Murat Kocaoglu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Characterization and Learning of Causal Graphs from Hard Interventions

Zihan Zhou
Muhammad Qasim Elahi
Murat Kocaoglu

A fundamental challenge in the empirical sciences involves uncovering causal structure through observation and experimentation. Causal discovery entails linking the conditional independence (CI) invariances in observational data to their corresponding graphical constraints via d-separation. In this paper, we consider a general setting where we have access to data from multiple experimental distributions resulting from hard interventions, as well as potentially from an observational distribution. By comparing different interventional distributions, we propose a set of graphical constraints that are fundamentally linked to Pearl's do-calculus within the framework of hard interventions. These graphical constraints associate each graphical structure with a set of interventional distributions that are consistent with the rules of do-calculus. We characterize the interventional equivalence class of causal graphs with latent variables and introduce a graphical representation that can be used to determine whether two causal graphs are interventionally equivalent, i. e. , whether they are associated with the same family of hard interventional distributions, where the elements of the family are indistinguishable using the invariances from do-calculus. We also propose a learning algorithm to integrate multiple datasets from hard interventions, introducing new orientation rules. The learning objective is a tuple of augmented graphs which entails a set of causal graphs. We also prove the soundness of the proposed algorithm.

UAI Conference 2025 Conference Paper

Constraint-based Causal Discovery from a Collection of Conditioning Sets

Kenneth Lee
Bruno Ribeiro 0001
Murat Kocaoglu

In constraint-based causal discovery, the existing algorithms systematically use a series of conditional independence (CI) relations observed in the data to recover an equivalence class of causal graphs in the large sample limit. One limitation of these algorithms is that CI tests lose statistical power as conditioning set size increases with finite samples. Recent research proposes to limit the conditioning set size for robust causal discovery. However, the existing algorithms require exhaustive testing of all CI relations with conditioning set sizes up to a certain integer $k$. This becomes problematic in practice when variables with large support are present, as it makes CI tests less reliable due to near-deterministic relationships, thereby violating the faithfulness assumption. To address this issue, we propose a causal discovery algorithm that only uses CI tests where the conditioning sets are restricted to a given set of conditioning sets including the empty set $\mathcal{C}$. We call such set of CI relations ${\mathcal{I}}_{\mathcal{C}}$ conditionally closed. We define the notion of $\mathcal{C}$-Markov equivalence: two causal graphs are $\mathcal{C}$-Markov equivalent if they entail the same set of CI constraints from ${\mathcal{I}}_\mathcal{C}$. We propose a graphical representation of $\mathcal{C}$-Markov equivalence and characterize such equivalence between two causal graphs. Our proposed algorithm called the $\mathcal{C}$-PC algorithm is sound for learning the $\mathcal{C}$-Markov equivalence class. We demonstrate the utility of the proposed algorithm via synthetic and real-world experiments in scenarios where variables with large support or high correlation are present in the data.

NeurIPS Conference 2025 Conference Paper

Differentiable Constraint-Based Causal Discovery

Jincheng Zhou
Mengbo Wang
Anqi He
Yumeng Zhou
Hessam Olya
Murat Kocaoglu
Bruno Ribeiro

Causal discovery from observational data is a fundamental task in artificial intelligence, with far-reaching implications for decision-making, predictions, and interventions. Despite significant advances, existing methods can be broadly categorized as constraint-based or score-based approaches. Constraint-based methods offer rigorous causal discovery but are often hindered by small sample sizes, while score-based methods provide flexible optimization but typically forgo explicit conditional independence testing. This work explores a third avenue: developing differentiable $d$-separation scores, obtained through a percolation theory using soft logic. This enables the implementation of a new type of causal discovery method: gradient-based optimization of conditional independence constraints. Empirical evaluations demonstrate the robust performance of our approach in low-sample regimes, surpassing traditional constraint-based and score-based baselines on a real-world dataset. Code implementing the proposed method is publicly available at [https: //github. com/PurdueMINDS/DAGPA](https: //github. com/PurdueMINDS/DAGPA).

UAI Conference 2025 Conference Paper

FeDCM: Federated Learning of Deep Causal Generative Models

Md. Musfiqur Rahman
Murat Kocaoglu

In many real-world settings, such as medicine and finance causal effect is a valuable metric for decision making. For many predictive tasks, causal mechanisms provide robust estimators while existing ML-driven predictors might be vulnerable to spurious correlations. In such settings, when data is decentralized and privacy must be preserved, federated learning plays an important role. However, causal inference in a federated learning setup is a largely unexplored research area. In this paper, we learn a proxy of the underlying structural causal model (SCM) with deep generative models from decentralized observational data sources possibly containing high-dimensional variables. Based on client preference or high dimensionality of variables, we modularize the SCM mechanisms and find the minimal subset appropriate for federated learning while having rest of the mechanisms trained on individual client’s local data. When all connected together, the proxy SCM, named as the federated deep causal generative model (FeDCM ), offers estimation of any identifiable causal effect. We perform extensive experiments to illustrate the utility and performance of our approach.

TMLR Journal 2025 Journal Article

Identification of Average Outcome under Interventions in Confounded Additive Noise Models

Muhammad Qasim Elahi
Mahsa Ghasemi
Murat Kocaoglu

Additive noise models (ANMs) are an important setting studied in causal inference. Most existing works on ANMs assume causal sufficiency, i.e., there are no unobserved confounders. This paper focuses on confounded ANMs, where a set of treatment variables and a target variable are affected by an unobserved confounder that follows a multivariate Gaussian distribution. We introduce a novel approach for estimating the average outcome under interventions (AOIs) for interventions on any subset of treatment variables and demonstrate that a small set of interventional distributions is sufficient to estimate all of them. In addition, we propose a randomized algorithm that further reduces the number of required interventions to poly-logarithmic in the number of nodes. Finally, we demonstrate that these interventions are also sufficient to recover the causal structure between the observed variables. This establishes that a poly-logarithmic number of interventions is sufficient to infer the causal effects of any subset of treatments on the outcome in confounded ANMs with high probability, even when the causal structure between treatments is unknown. The simulation results indicate that our method can accurately estimate all AOIs in the finite-sample regime. We also demonstrate the practical significance of our algorithm by evaluating it on semi-synthetic data.

UAI Conference 2025 Conference Paper

Root Cause Analysis of Failures from Partial Causal Structures

Azam Ikram
Kenneth Lee
Shubham Agarwal 0007
Shiv Kumar Saini
Saurabh Bagchi
Murat Kocaoglu

Finding the root cause of failures is a prominent problem in many complex networks. Causal inference provides us with tools to address this problem algorithmically to automate this process and solve it efficiently. The existing methods either use a known causal structure to identify root cause by backtracking the changes, or ignore the causal structure but relies on invariance tests to identify the changing causal mechanisms after the failure. Assuming a single, unknown root cause, we first establish a novel connection between root cause analysis and the \textit{Interactive Graph Search (IGS)} problem. This mapping highlights the importance of causal knowledge: we demonstrate that any algorithm relying solely on marginal invariance tests to identify the root cause must perform at least $\Omega(\log_{2}(n) + d\log_{1+d}n)$ many tests, where $n$ represents the number of components and $d$ denotes the maximum out-degree of the graph. We then present an optimal algorithm that achieves this bound by reducing the root cause identification problem as an instance of IGS. Beyond the single root cause scenario, we propose a practical extension for settings with multiple root causes and partial causal knowledge. More specifically, we show that even if the causal graph is partially known, we can identify the root-causes with a linear number of invariance tests. This is the first known result on incorporating a partial causal structure for root cause analysis. Our experiments on a production-level application demonstrate that, even in the absence of complete causal information, our approach accurately identifies the root causes of failures.

ICML Conference 2024 Conference Paper

Adaptive Online Experimental Design for Causal Discovery

Muhammad Qasim Elahi
Lai Wei
Murat Kocaoglu
Mahsa Ghasemi

Causal discovery aims to uncover cause-and-effect relationships encoded in causal graphs by leveraging observational, interventional data, or their combination. The majority of existing causal discovery methods are developed assuming infinite interventional data. We focus on interventional data efficiency and formalize causal discovery from the perspective of online learning, inspired by pure exploration in bandit problems. A graph separating system, consisting of interventions that cut every edge of the graph at least once, is sufficient for learning causal graphs when infinite interventional data is available, even in the worst case. We propose a track-and-stop causal discovery algorithm that adaptively selects interventions from the graph separating system via allocation matching and learns the causal graph based on sampling history. Given any desired confidence value, the algorithm determines a termination condition and runs until it is met. We analyze the algorithm to establish a problem-dependent upper bound on the expected number of required interventional samples. Our proposed algorithm outperforms existing methods in simulations across various randomly generated causal graphs. It achieves higher accuracy, measured by the structural hamming distance (SHD) between the learned causal graph and the ground truth, with significantly fewer samples.

ICML Conference 2024 Conference Paper

Conditional Common Entropy for Instrumental Variable Testing and Partial Identification

Ziwei Jiang
Murat Kocaoglu

Instrumental variables (IVs) are widely used for estimating causal effects. There are two main challenges when using instrumental variables. First of all, using IV without additional assumptions such as linearity, the causal effect may still not be identifiable. Second, when selecting an IV, the validity of the selected IV is typically not testable since the causal graph is not identifiable from observational data. In this paper, we propose a method for bounding the causal effect with instrumental variables under weak confounding. In addition, we present a novel criterion to falsify the IV with side information about the confounder. We demonstrate the utility of the proposed method with simulated and real-world datasets.

NeurIPS Conference 2024 Conference Paper

Conditional Generative Models are Sufficient to Sample from Any Causal Effect Estimand

Md Musfiqur Rahman
Matt Jordan
Murat Kocaoglu

Causal inference from observational data plays critical role in many applications in trustworthy machine learning. While sound and complete algorithms exist to compute causal effects, many of them assume access to conditional likelihoods, which is difficult to estimate for high-dimensional (particularly image) data. Researchers have alleviated this issue by simulating causal relations with neural models. However, when we have high-dimensional variables in the causal graph along with some unobserved confounders, no existing work can effectively sample from the un/conditional interventional distributions. In this work, we show how to sample from any identifiable interventional distribution given an arbitrary causal graph through a sequence of push-forward computations of conditional generative models, such as diffusion models. Our proposed algorithm follows the recursive steps of the existing likelihood-based identification algorithms to train a set of feed-forward models, and connect them in a specific way to sample from the desired distribution. We conduct experiments on a Colored MNIST dataset having both the treatment ($X$) and the target variables ($Y$) as images and sample from $P(y|do(x))$. Our algorithm also enables us to conduct a causal analysis to evaluate spurious correlations among input features of generative models pre-trained on the CelebA dataset. Finally, we generate high-dimensional interventional samples from the MIMIC-CXR dataset involving text and image variables.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Counterfactual Fairness by Combining Factual and Counterfactual Predictions

Zeyu Zhou
Tianci Liu
Ruqi Bai
Jing Gao
Murat Kocaoglu
David I. Inouye

In high-stakes domains such as healthcare and hiring, the role of machine learning (ML) in decision-making raises significant fairness concerns. This work focuses on Counterfactual Fairness (CF), which posits that an ML model's outcome on any individual should remain unchanged if they had belonged to a different demographic group. Previous works have proposed methods that guarantee CF. Notwithstanding, their effects on the model's predictive performance remain largely unclear. To fill this gap, we provide a theoretical study on the inherent trade-off between CF and predictive performance in a model-agnostic manner. We first propose a simple but effective method to cast an optimal but potentially unfair predictor into a fair one with a minimal loss of performance. By analyzing the excess risk incurred by perfect CF, we quantify this inherent trade-off. Further analysis on our method's performance with access to only incomplete causal knowledge is also conducted. Built upon this, we propose a practical algorithm that can be applied in such scenarios. Experiments on both synthetic and semi-synthetic datasets demonstrate the validity of our analysis and methods.

PDF Details DOI

ICML Conference 2024 Conference Paper

Modular Learning of Deep Causal Generative Models for High-dimensional Causal Inference

Md. Musfiqur Rahman
Murat Kocaoglu

Sound and complete algorithms have been proposed to compute identifiable causal queries using the causal structure and data. However, most of these algorithms assume accurate estimation of the data distribution, which is impractical for high-dimensional variables such as images. On the other hand, modern deep generative architectures can be trained to sample from high-dimensional distributions. However, training these networks are typically very costly. Thus, it is desirable to leverage pre-trained models to answer causal queries using such high-dimensional data. To address this, we propose modular training of deep causal generative models that not only makes learning more efficient, but also allows us to utilize large, pre-trained conditional generative models. To the best of our knowledge, our algorithm, Modular-DCM is the first algorithm that, given the causal structure, uses adversarial training to learn the network weights, and can make use of pre-trained models to provably sample from any identifiable causal query in the presence of latent confounders. With extensive experiments on the Colored-MNIST dataset, we demonstrate that our algorithm outperforms the baselines. We also show our algorithm’s convergence on the COVIDx dataset and its utility with a causal invariant prediction problem on CelebA-HQ.

NeurIPS Conference 2024 Conference Paper

Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits

Muhammad Qasim Elahi
Mahsa Ghasemi
Murat Kocaoglu

Causal knowledge about the relationships among decision variables and a reward variable in a bandit setting can accelerate the learning of an optimal decision. Current works often assume the causal graph is known, which may not always be available a priori. Motivated by this challenge, we focus on the causal bandit problem in scenarios where the underlying causal graph is unknown and may include latent confounders. While intervention on the parents of the reward node is optimal in the absence of latent confounders, this is not necessarily the case in general. Instead, one must consider a set of possibly optimal arms/interventions, each being a special subset of the ancestors of the reward node, making causal discovery beyond the parents of the reward node essential. For regret minimization, we identify that discovering the full causal structure is unnecessary; however, no existing work provides the necessary and sufficient components of the causal graph. We formally characterize the set of necessary and sufficient latent confounders one needs to detect or learn to ensure that all possibly optimal arms are identified correctly. We also propose a randomized algorithm for learning the causal graph with a limited number of samples, providing a sample complexity guarantee for any desired confidence level. In the causal bandit setup, we propose a two-stage approach. In the first stage, we learn the induced subgraph on ancestors of the reward, along with a necessary and sufficient subset of latent confounders, to construct the set of possibly optimal arms. We show that for our proposed algorithm, the number of intervention samples required to learn the set of possibly optimal arms scales polynomially with respect to the number of nodes. The second phase involves the application of a standard bandit algorithm, such as the UCB algorithm. We also establish a regret bound for our two-phase approach, which is sublinear in the number of rounds.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Sample Efficient Bayesian Learning of Causal Graphs from Interventions

Zihan Zhou
Muhammad Qasim Elahi
Murat Kocaoglu

Causal discovery is a fundamental problem with applications spanning various areas in science and engineering. It is well understood that solely using observational data, one can only orient the causal graph up to its Markov equivalence class, necessitating interventional data to learn the complete causal graph. Most works in the literature design causal discovery policies with perfect interventions, i. e. , they have access to infinite interventional samples. This study considers a Bayesian approach for learning causal graphs with limited interventional samples, mirroring real-world scenarios where such samples are usually costly to obtain. By leveraging the recent result of Wienöbst et al. [2023] on uniform DAG sampling in polynomial time, we can efficiently enumerate all the cut configurations and their corresponding interventional distributions of a target set, and further track their posteriors. Given any number of interventional samples, our proposed algorithm randomly intervenes on a set of target vertices that cut all the edges in the graph and returns a causal graph according to the posterior of each target set. When the number of interventional samples is large enough, we show theoretically that our proposed algorithm will return the true causal graph with high probability. We compare our algorithm against various baseline methods on simulated datasets, demonstrating its superior accuracy measured by the structural Hamming distance between the learned DAG and the ground truth. Additionally, we present a case study showing how this algorithm could be modified to answer more general causal questions without learning the whole graph. As an example, we illustrate that our method can be used to estimate the causal effect of a variable that cannot be intervened.

PDF Details DOI

ICLR Conference 2024 Conference Paper

Towards Characterizing Domain Counterfactuals for Invertible Latent Causal Models

Zeyu Zhou
Ruqi Bai
Sean Kulinski
Murat Kocaoglu
David I. Inouye

Answering counterfactual queries has important applications such as explainability, robustness, and fairness but is challenging when the causal variables are unobserved and the observations are non-linear mixtures of these latent variables, such as pixels in images. One approach is to recover the latent Structural Causal Model (SCM), which may be infeasible in practice due to requiring strong assumptions, e.g., linearity of the causal mechanisms or perfect atomic interventions. Meanwhile, more practical ML-based approaches using naive domain translation models to generate counterfactual samples lack theoretical grounding and may construct invalid counterfactuals. In this work, we strive to strike a balance between practicality and theoretical guarantees by analyzing a specific type of causal query called *domain counterfactuals*, which hypothesizes what a sample would have looked like if it had been generated in a different domain (or environment). We show that recovering the latent SCM is unnecessary for estimating domain counterfactuals, thereby sidestepping some of the theoretic challenges. By assuming invertibility and sparsity of intervention, we prove domain counterfactual estimation error can be bounded by a data fit term and intervention sparsity term. Building upon our theoretical results, we develop a theoretically grounded practical algorithm that simplifies the modeling process to generative model estimation under autoregressive and shared parameter constraints that enforce intervention sparsity. Finally, we show an improvement in counterfactual estimation over baseline methods through extensive simulated and image-based experiments.

NeurIPS Conference 2023 Conference Paper

Approximate Allocation Matching for Structural Causal Bandits with Unobserved Confounders

Lai Wei
Muhammad Qasim Elahi
Mahsa Ghasemi
Murat Kocaoglu

Structural causal bandit provides a framework for online decision-making problems when causal information is available. It models the stochastic environment with a structural causal model (SCM) that governs the causal relations between random variables. In each round, an agent applies an intervention (or no intervention) by setting certain variables to some constants and receives a stochastic reward from a non-manipulable variable. Though the causal structure is given, the observational and interventional distributions of these random variables are unknown beforehand, and they can only be learned through interactions with the environment. Therefore, to maximize the expected cumulative reward, it is critical to balance the explore-versus-exploit tradeoff. We assume each random variable takes a finite number of distinct values, and consider a semi-Markovian setting, where random variables are affected by unobserved confounders. Using the canonical SCM formulation to discretize the domains of unobserved variables, we efficiently integrate samples to reduce model uncertainty. This gives the decision maker a natural advantage over those in a classical multi-armed bandit setup. We provide a logarithmic asymptotic regret lower bound for the structural causal bandit problem. Inspired by the lower bound, we design an algorithm that can utilize the causal structure to accelerate the learning process and take informative and rewarding interventions. We establish that our algorithm achieves a logarithmic regret and demonstrate that it outperforms the existing methods via simulations.

ICML Conference 2023 Conference Paper

Approximate Causal Effect Identification under Weak Confounding

Ziwei Jiang
Lai Wei
Murat Kocaoglu

Causal effect estimation has been studied by many researchers when only observational data is available. Sound and complete algorithms have been developed for pointwise estimation of identifiable causal queries. For non-identifiable causal queries, researchers developed polynomial programs to estimate tight bounds on causal effect. However, these are computationally difficult to optimize for variables with large support sizes. In this paper, we analyze the effect of "weak confounding’" on causal estimands. More specifically, under the assumption that the unobserved confounders that render a query non-identifiable have small entropy, we propose an efficient linear program to derive the upper and lower bounds of the causal effect. We show that our bounds are consistent in the sense that as the entropy of unobserved confounders goes to zero, the gap between the upper and lower bound vanishes. Finally, we conduct synthetic and real data simulations to compare our bounds with the bounds obtained by the existing work that cannot incorporate such entropy constraints and show that our bounds are tighter for the setting with weak confounders.

NeurIPS Conference 2023 Conference Paper

Causal Discovery in Semi-Stationary Time Series

Shanyun Gao
Raghavendra Addanki
Tong Yu
Ryan Rossi
Murat Kocaoglu

Discovering causal relations from observational time series without making the stationary assumption is a significant challenge. In practice, this challenge is common in many areas, such as retail sales, transportation systems, and medical science. Here, we consider this problem for a class of non-stationary time series. The structural causal model (SCM) of this type of time series, called the semi-stationary time series, exhibits that a finite number of different causal mechanisms occur sequentially and periodically across time. This model holds considerable practical utility because it can represent periodicity, including common occurrences such as seasonality and diurnal variation. We propose a constraint-based, non-parametric algorithm for discovering causal relations in this setting. The resulting algorithm, PCMCI$_{\Omega}$, can capture the alternating and recurring changes in the causal mechanisms and then identify the underlying causal graph with conditional independence (CI) tests. We show that this algorithm is sound in identifying causal relations on discrete time series. We validate the algorithm with extensive experiments on continuous and discrete simulated data. We also apply our algorithm to a real-world climate dataset.

NeurIPS Conference 2023 Conference Paper

Characterization and Learning of Causal Graphs with Small Conditioning Sets

Murat Kocaoglu

Constraint-based causal discovery algorithms learn part of the causal graph structure by systematically testing conditional independences observed in the data. These algorithms, such as the PC algorithm and its variants, rely on graphical characterizations of the so-called equivalence class of causal graphs proposed by Pearl. However, constraint-based causal discovery algorithms struggle when data is limited since conditional independence tests quickly lose their statistical power, especially when the conditioning set is large. To address this, we propose using conditional independence tests where the size of the conditioning set is upper bounded by some integer k for robust causal discovery. The existing graphical characterizations of the equivalence classes of causal graphs are not applicable when we cannot leverage all the conditional independence statements. We first define the notion of k-Markov equivalence: Two causal graphs are k-Markov equivalent if they entail the same conditional independence constraints where the conditioning set size is upper bounded by k. We propose a novel representation that allows us to graphically characterize k-Markov equivalence between two causal graphs. We propose a sound constraint-based algorithm called the k-PC algorithm for learning this equivalence class. Finally, we conduct synthetic, and semi-synthetic experiments to demonstrate that the k-PC algorithm enables more robust causal discovery in the small sample regime compared to the baseline algorithms.

UAI Conference 2023 Conference Paper

Finding Invariant Predictors Efficiently via Causal Structure

Kenneth Lee
Md. Musfiqur Rahman
Murat Kocaoglu

One fundamental problem in machine learning is out-of-distribution generalization. A method named the surgery estimator incorporates the causal structure in the form of a directed acyclic graph (DAG) to find predictors that are invariant across target domains using distributional invariances via Pearl’s do-calculus. However, finding a surgery estimator can take exponential time as the current methods need to search through all possible predictors. In this work, we first provide a graphical characterization of the identifiability of conditional causal queries. Next, we leverage this characterization together with a greedy search step to develop a polynomial-time algorithm for finding invariant predictors using the causal graph. Given the correct causal graph, our method is guaranteed to find at least one invariant predictor, if it exists. We show that our proposed algorithm can significantly reduce the run-time both in simulated and semi-synthetic data experiments and have predictive performance that is comparable to the existing work that runs in exponential time.

NeurIPS Conference 2023 Conference Paper

Front-door Adjustment Beyond Markov Equivalence with Limited Graph Knowledge

Abhin Shah
Karthikeyan Shanmugam
Murat Kocaoglu

Causal effect estimation from data typically requires assumptions about the cause-effect relations either explicitly in the form of a causal graph structure within the Pearlian framework, or implicitly in terms of (conditional) independence statements between counterfactual variables within the potential outcomes framework. When the treatment variable and the outcome variable are confounded, front-door adjustment is an important special case where, given the graph, causal effect of the treatment on the target can be estimated using post-treatment variables. However, the exact formula for front-door adjustment depends on the structure of the graph, which is difficult to learn in practice. In this work, we provide testable conditional independence statements to compute the causal effect using front-door-like adjustment without knowing the graph under limited structural side information. We show that our method is applicable in scenarios where knowing the Markov equivalence class is not sufficient for causal effect estimation. We demonstrate the effectiveness of our method on a class of random graphs as well as real causal fairness benchmarks.

ICML Conference 2022 Conference Paper

Entropic Causal Inference: Graph Identifiability

Spencer Compton
Kristjan H. Greenewald
Dmitriy Katz
Murat Kocaoglu

Entropic causal inference is a recent framework for learning the causal graph between two variables from observational data by finding the information-theoretically simplest structural explanation of the data, i. e. , the model with smallest entropy. In our work, we first extend the causal graph identifiability result in the two-variable setting under relaxed assumptions. We then show the first identifiability result using the entropic approach for learning causal graphs with more than two nodes. Our approach utilizes the property that ancestrality between a source node and its descendants can be determined using the bivariate entropic tests. We provide a sound sequential peeling algorithm for general graphs that relies on this property. We also propose a heuristic algorithm for small graphs that shows strong empirical performance. We rigorously evaluate the performance of our algorithms on synthetic data generated from a variety of models, observing improvement over prior work. Finally we test our algorithms on real-world datasets.

NeurIPS Conference 2022 Conference Paper

Root Cause Analysis of Failures in Microservices through Causal Discovery

Azam Ikram
Sarthak Chakraborty
Subrata Mitra
Shiv Saini
Saurabh Bagchi
Murat Kocaoglu

Most cloud applications use a large number of smaller sub-components (called microservices) that interact with each other in the form of a complex graph to provide the overall functionality to the user. While the modularity of the microservice architecture is beneficial for rapid software development, maintaining and debugging such a system quickly in cases of failure is challenging. We propose a scalable algorithm for rapidly detecting the root cause of failures in complex microservice architectures. The key ideas behind our novel hierarchical and localized learning approach are: (1) to treat the failure as an intervention on the root cause to quickly detect it, (2) only learn the portion of the causal graph related to the root cause, thus avoiding a large number of costly conditional independence tests, and (3) hierarchically explore the graph. The proposed technique is highly scalable and produces useful insights about the root cause, while the use of traditional techniques becomes infeasible due to high computation time. Our solution is application agnostic and relies only on the data collected for diagnosis. For the evaluation, we compare the proposed solution with a modified version of the PC algorithm and the state-of-the-art for root cause analysis. The results show a considerable improvement in top-$k$ recall while significantly reducing the execution time.

UAI Conference 2021 Conference Paper

Conditionally independent data generation

Kartik Ahuja
Prasanna Sattigeri
Karthikeyan Shanmugam 0001
Dennis Wei
Karthikeyan Natesan Ramamurthy
Murat Kocaoglu

Conditional independence (CI) is a fundamental concept with wide applications in machine learning and causal inference. Although the problems of testing CI and estimating divergences have been extensively studied, the complementary problem of generating data that satisfies CI has received much less attention. A special case of the generation problem is to produce conditionally independent predictions. Given samples from an input data distribution, we formulate the problem of generating samples from a distribution that is close to the input distribution and satisfies CI. We establish a characterization of CI in terms of a general divergence identity. Based on one version of this identity, an architecture is proposed that leverages the capabilities of generative adversarial networks (GANs) to enforce CI in an end-to-end differentiable manner. As one illustration of the problem formulation and architecture, we consider applications to notions of fairness that can be written as CIs, specifically equalized odds and conditional statistical parity. We demonstrate conditionally independent prediction that trades off adherence to fairness criteria against classification accuracy.

NeurIPS Conference 2020 Conference Paper

Active Structure Learning of Causal DAGs via Directed Clique Trees

Chandler Squires
Sara Magliacane
Kristjan Greenewald
Dmitriy Katz
Murat Kocaoglu
Karthikeyan Shanmugam

A growing body of work has begun to study intervention design for efficient structure learning of causal directed acyclic graphs (DAGs). A typical setting is a \emph{causally sufficient} setting, i. e. a system with no latent confounders, selection bias, or feedback, when the essential graph of the observational equivalence class (EC) is given as an input and interventions are assumed to be noiseless. Most existing works focus on \textit{worst-case} or \textit{average-case} lower bounds for the number of interventions required to orient a DAG. These worst-case lower bounds only establish that the largest clique in the essential graph \textit{could} make it difficult to learn the true DAG. In this work, we develop a \textit{universal} lower bound for single-node interventions that establishes that the largest clique is \textit{always} a fundamental impediment to structure learning. Specifically, we present a decomposition of a DAG into independently orientable components through \emph{directed clique trees} and use it to prove that the number of single-node interventions necessary to orient any DAG in an EC is at least the sum of half the size of the largest cliques in each chain component of the essential graph. Moreover, we present a two-phase intervention design algorithm that, under certain conditions on the chordal skeleton, matches the optimal number of interventions up to a multiplicative logarithmic factor in the number of maximal cliques. We show via synthetic experiments that our algorithm can scale to much larger graphs than most of the related work and achieves better worst-case performance than other scalable approaches. A code base to recreate these results can be found at \url{https: //github. com/csquires/dct-policy}.

NeurIPS Conference 2020 Conference Paper

Applications of Common Entropy for Causal Inference

Murat Kocaoglu
Sanjay Shakkottai
Alexandros G. Dimakis
Constantine Caramanis
Sriram Vishwanath

We study the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. The minimum entropy required for such a latent is known as common entropy in information theory. We extend this notion to Renyi common entropy by minimizing the Renyi entropy of the latent variable. To efficiently compute common entropy, we propose an iterative algorithm that can be used to discover the trade-off between the entropy of the latent variable and the conditional mutual information of the observed variables. We show two applications of common entropy in causal inference: First, under the assumption that there are no low-entropy mediators, it can be used to distinguish direct causation from spurious correlation among almost all joint distributions on simple causal graphs with two observed variables. Second, common entropy can be used to improve constraint-based methods such as PC or FCI algorithms in the small-sample regime, where these methods are known to struggle. We propose a modification to these constraint-based methods to assess if a separating set found by these algorithms are valid using common entropy. We finally evaluate our algorithms on synthetic and real data to establish their performance.

NeurIPS Conference 2020 Conference Paper

Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning

Amin Jaber
Murat Kocaoglu
Karthikeyan Shanmugam
Elias Bareinboim

One fundamental problem in the empirical sciences is of reconstructing the causal structure that underlies a phenomenon of interest through observation and experimentation. While there exists a plethora of methods capable of learning the equivalence class of causal structures that are compatible with observations, it is less well-understood how to systematically combine observations and experiments to reconstruct the underlying structure. In this paper, we investigate the task of structural learning in non-Markovian systems (i. e. , when latent variables affect more than one observable) from a combination of observational and soft experimental data when the interventional targets are unknown. Using causal invariances found across the collection of observational and interventional distributions (not only conditional independences), we define a property called psi-Markov that connects these distributions to a pair consisting of (1) a causal graph D and (2) a set of interventional targets I. Building on this property, our main contributions are two-fold: First, we provide a graphical characterization that allows one to test whether two causal graphs with possibly different sets of interventional targets belong to the same psi-Markov equivalence class. Second, we develop an algorithm capable of harnessing the collection of data to learn the corresponding equivalence class. We then prove that this algorithm is sound and complete, in the sense that it is the most informative in the sample limit, i. e. , it discovers as many tails and arrowheads as can be oriented within a psi-Markov equivalence class.

NeurIPS Conference 2020 Conference Paper

Entropic Causal Inference: Identifiability and Finite Sample Results

Spencer Compton
Murat Kocaoglu
Kristjan Greenewald
Dmitriy Katz

Entropic causal inference is a framework for inferring the causal direction between two categorical variables from observational data. The central assumption is that the amount of unobserved randomness in the system is not too large. This unobserved randomness is measured by the entropy of the exogenous variable in the underlying structural causal model, which governs the causal relation between the observed variables. Kocaoglu et al. conjectured that the causal direction is identifiable when the entropy of the exogenous variable is not too large. In this paper, we prove a variant of their conjecture. Namely, we show that for almost all causal models where the exogenous variable has entropy that does not scale with the number of states of the observed variables, the causal direction is identifiable from observational data. We also consider the minimum entropy coupling-based algorithmic approach presented by Kocaoglu et al. , and for the first time demonstrate algorithmic identifiability guarantees using a finite number of samples. We conduct extensive experiments to evaluate the robustness of the method to relaxing some of the assumptions in our theory and demonstrate that both the constant-entropy exogenous variable and the no latent confounder assumptions can be relaxed in practice. We also empirically characterize the number of observational samples needed for causal identification. Finally, we apply the algorithm on Tuebingen cause-effect pairs dataset.

NeurIPS Conference 2019 Conference Paper

Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions

Murat Kocaoglu
Amin Jaber
Karthikeyan Shanmugam
Elias Bareinboim

The challenge of learning the causal structure underlying a certain phenomenon is undertaken by connecting the set of conditional independences (CIs) readable from the observational data, on the one side, with the set of corresponding constraints implied over the graphical structure, on the other, which are tied through a graphical criterion known as d-separation (Pearl, 1988). In this paper, we investigate the more general scenario where multiple observational and experimental distributions are available. We start with the simple observation that the invariances given by CIs/d-separation are just one special type of a broader set of constraints, which follow from the careful comparison of the different distributions available. Remarkably, these new constraints are intrinsically connected with do-calculus (Pearl, 1995) in the context of soft-interventions. We introduce a novel notion of interventional equivalence class of causal graphs with latent variables based on these invariances, which associates each graphical structure with a set of interventional distributions that respect the do-calculus rules. Given a collection of distributions, two causal graphs are called interventionally equivalent if they are associated with the same family of interventional distributions, where the elements of the family are indistinguishable using the invariances obtained from a direct application of the calculus rules. We introduce a graphical representation that can be used to determine if two causal graphs are interventionally equivalent. We provide a formal graphical characterization of this equivalence. Finally, we extend the FCI algorithm, which was originally designed to operate based on CIs, to combine observational and interventional datasets, including new orientation rules particular to this setting.

NeurIPS Conference 2019 Conference Paper

Sample Efficient Active Learning of Causal Trees

Kristjan Greenewald
Dmitriy Katz
Karthikeyan Shanmugam
Sara Magliacane
Murat Kocaoglu
Enric Boix Adsera
Guy Bresler

We consider the problem of experimental design for learning causal graphs that have a tree structure. We propose an adaptive framework that determines the next intervention based on a Bayesian prior updated with the outcomes of previous experiments, focusing on the setting where observational data is cheap (assumed infinite) and interventional data is expensive. While information greedy approaches are popular in active learning, we show that in this setting they can be exponentially suboptimal (in the number of interventions required), and instead propose an algorithm that exploits graph structure in the form of a centrality measure. If infinite interventional data is available, we show that the algorithm requires a number of interventions less than or equal to a factor of 2 times the minimum achievable number. We show that the algorithm and the associated theory can be adapted to the setting where each performed intervention yields finitely many samples. Several extensions are also presented, to the case where a specified set of nodes cannot be intervened on, to the case where $K$ interventions are scheduled at once, and to the fully adaptive case where each experiment yields only one sample. In the case of finite interventional data, through simulated experiments we show that our algorithms outperform different adaptive baseline algorithms.

NeurIPS Conference 2018 Conference Paper

Experimental Design for Cost-Aware Learning of Causal Graphs

Erik Lindgren
Murat Kocaoglu
Alexandros Dimakis
Sriram Vishwanath

We consider the minimum cost intervention design problem: Given the essential graph of a causal graph and a cost to intervene on a variable, identify the set of interventions with minimum total cost that can learn any causal graph with the given essential graph. We first show that this problem is NP-hard. We then prove that we can achieve a constant factor approximation to this problem with a greedy algorithm. We then constrain the sparsity of each intervention. We develop an algorithm that returns an intervention design that is nearly optimal in terms of size for sparse graphs with sparse interventions and we discuss how to use it when there are costs on the vertices.

ICML Conference 2017 Conference Paper

Cost-Optimal Learning of Causal Graphs

Murat Kocaoglu
Alexandros G. Dimakis
Sriram Vishwanath

We consider the problem of learning a causal graph over a set of variables with interventions. We study the cost-optimal causal graph learning problem: For a given skeleton (undirected version of the causal graph), design the set of interventions with minimum total cost, that can uniquely identify any causal graph with the given skeleton. We show that this problem is solvable in polynomial time. Later, we consider the case when the number of interventions is limited. For this case, we provide polynomial time algorithms when the skeleton is a tree or a clique tree. For a general chordal skeleton, we develop an efficient greedy algorithm, which can be improved when the causal graph skeleton is an interval graph.

AAAI Conference 2017 Conference Paper

Entropic Causal Inference

Murat Kocaoglu
Alexandros Dimakis
Sriram Vishwanath
Babak Hassibi

We consider the problem of identifying the causal direction between two discrete random variables using observational data. Unlike previous work, we keep the most general functional model but make an assumption on the unobserved exogenous variable: Inspired by Occam’s razor, we assume that the exogenous variable is simple in the true causal direction. We quantify simplicity using Rényi entropy. Our main result is that, under natural assumptions, if the exogenous variable has low H0 entropy (cardinality) in the true direction, it must have high H0 entropy in the wrong direction. We establish several algorithmic hardness results about estimating the minimum entropy exogenous variable. We show that the problem of ﬁnding the exogenous variable with minimum H1 entropy (Shannon Entropy) is equivalent to the problem of ﬁnding minimum joint entropy given n marginal distributions, also known as minimum entropy coupling problem. We propose an efﬁcient greedy algorithm for the minimum entropy coupling problem, that for n = 2 provably ﬁnds a local optimum. This gives a greedy algorithm for ﬁnding the exogenous variable with minimum Shannon entropy. Our greedy entropy-based causal inference algorithm has similar performance to the state of the art additive noise models in real datasets. One advantage of our approach is that we make no use of the values of random variables but only their distributions. Our method can therefore be used for causal inference for both ordinal and also categorical data, unlike additive noise models.

NeurIPS Conference 2017 Conference Paper

Experimental Design for Learning Causal Graphs with Latent Variables

Murat Kocaoglu
Karthikeyan Shanmugam
Elias Bareinboim

We consider the problem of learning causal structures with latent variables using interventions. Our objective is not only to learn the causal graph between the observed variables, but to locate unobserved variables that could confound the relationship between observables. Our approach is stage-wise: We first learn the observable graph, i. e. , the induced graph between observable variables. Next we learn the existence and location of the latent variables given the observable graph. We propose an efficient randomized algorithm that can learn the observable graph using O(d\log^2 n) interventions where d is the degree of the graph. We further propose an efficient deterministic variant which uses O(log n + l) interventions, where l is the longest directed path in the graph. Next, we propose an algorithm that uses only O(d^2 log n) interventions that can learn the latents between both non-adjacent and adjacent variables. While a naive baseline approach would require O(n^2) interventions, our combined algorithm can learn the causal graph with latents using O(d log^2 n + d^2 log (n)) interventions.

NeurIPS Conference 2015 Conference Paper

Learning Causal Graphs with Small Interventions

Karthikeyan Shanmugam
Murat Kocaoglu
Alexandros Dimakis
Sriram Vishwanath

We consider the problem of learning causal networks with interventions, when each intervention is limited in size under Pearl's Structural Equation Model with independent errors (SEM-IE). The objective is to minimize the number of experiments to discover the causal directions of all the edges in a causal graph. Previous work has focused on the use of separating systems for complete graphs for this task. We prove that any deterministic adaptive algorithm needs to be a separating system in order to learn complete graphs in the worst case. In addition, we present a novel separating system construction, whose size is close to optimal and is arguably simpler than previous work in combinatorics. We also develop a novel information theoretic lower bound on the number of interventions that applies in full generality, including for randomized adaptive learning algorithms. For general chordal graphs, we derive worst case lower bounds on the number of interventions. Building on observations about induced trees, we give a new deterministic adaptive algorithm to learn directions on any chordal skeleton completely. In the worst case, our achievable scheme is an $\alpha$-approximation algorithm where $\alpha$ is the independence number of the graph. We also show that there exist graph classes for which the sufficient number of experiments is close to the lower bound. In the other extreme, there are graph classes for which the required number of experiments is multiplicatively $\alpha$ away from our lower bound. In simulations, our algorithm almost always performs very close to the lower bound, while the approach based on separating systems for complete graphs is significantly worse for random chordal graphs.

NeurIPS Conference 2014 Conference Paper

Sparse Polynomial Learning and Graph Sketching

Murat Kocaoglu
Karthikeyan Shanmugam
Alexandros Dimakis
Adam Klivans

Let $f: \{-1, 1\}^n \rightarrow \mathbb{R}$ be a polynomial with at most $s$ non-zero real coefficients. We give an algorithm for exactly reconstructing $f$ given random examples from the uniform distribution on $\{-1, 1\}^n$ that runs in time polynomial in $n$ and $2^{s}$ and succeeds if the function satisfies the \textit{unique sign property}: there is one output value which corresponds to a unique set of values of the participating parities. This sufficient condition is satisfied when every coefficient of $f$ is perturbed by a small random noise, or satisfied with high probability when $s$ parity functions are chosen randomly or when all the coefficients are positive. Learning sparse polynomials over the Boolean domain in time polynomial in $n$ and $2^{s}$ is considered notoriously hard in the worst-case. Our result shows that the problem is tractable for almost all sparse polynomials. Then, we show an application of this result to hypergraph sketching which is the problem of learning a sparse (both in the number of hyperedges and the size of the hyperedges) hypergraph from uniformly drawn random cuts. We also provide experimental results on a real world dataset.