Arrow Research search

Author name cluster

Patrick Forré

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

TMLR Journal 2026 Journal Article

Generalizing Coverage Plots for Simulation-based Inference

  • Maximilian Lipp
  • Benjamin Kurt Miller
  • Lyubov Amitonova
  • Patrick Forré

Simulation-based inference (SBI) aims to find the probabilistic inverse of a non-linear function by fitting the posterior with a generative model on samples. Applications demand accurate uncertainty quantification, which can be difficult to achieve and verify. Since the ground truth model is implicitly defined in SBI, we cannot compute likelihood values nor draw samples from the posterior. This renders two-sample testing against the posterior impossible for any practical use and calls for proxy verification methods such as expected coverage testing. We introduce a differentiable objective that encourages coverage in the generative model by parameterizing the dual form of the total variation norm with neural networks. However, we find that coverage tests can easily report a good fit when the approximant deviates significantly from the target distribution and give strong empirical evidence and theoretical arguments why the expected coverage plot is, in general, not a reliable indicator of posterior fit. To address this matter, we introduce a new ratio coverage plot as a better alternative to coverage, which is not susceptible to the same blind spots. It comes at the price of estimating a ratio between our model and the ground truth posterior, which can be done using standard algorithms. We provide experimental results that back up this claim, and provide multiple algorithms for estimating ratio coverage.

TMLR Journal 2025 Journal Article

Modeling Human Beliefs about AI Behavior for Scalable Oversight

  • Leon Lang
  • Patrick Forré

As AI systems advance beyond human capabilities, scalable oversight becomes critical: how can we supervise AI that exceeds our abilities? A key challenge is that human evaluators may form incorrect beliefs about AI behavior in complex tasks, leading to unreliable feedback and poor value inference. To address this, we propose modeling evaluators' beliefs to interpret their feedback more reliably. We formalize human belief models, analyze their theoretical role in value learning, and characterize when ambiguity remains. To reduce reliance on precise belief models, we introduce "belief model covering" as a relaxation. This motivates our preliminary proposal to use the internal representations of adapted foundation models to mimic human evaluators' beliefs. These representations could be used to learn correct values from human feedback even when evaluators misunderstand the AI's behavior. Our work suggests that modeling human beliefs can improve value learning and outlines practical research directions for implementing this approach to scalable oversight.

ICML Conference 2025 Conference Paper

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret

  • Lukas Fluri
  • Leon Lang
  • Alessandro Abate
  • Patrick Forré
  • David Krueger 0001
  • Joar Skalse

In reinforcement learning, specifying reward functions that capture the intended task can be very challenging. Reward learning aims to address this issue by learning the reward function. However, a learned reward model may have a low error on the data distribution, and yet subsequently produce a policy with large regret. We say that such a reward model has an error-regret mismatch. The main source of an error-regret mismatch is the distributional shift that commonly occurs during policy optimization. In this paper, we mathematically show that a sufficiently low expected test error of the reward model guarantees low worst-case regret, but that for any fixed expected test error, there exist realistic data distributions that allow for error-regret mismatch to occur. We then show that similar problems persist even when using policy regularization techniques, commonly employed in methods such as RLHF. We hope our results stimulate the theoretical and empirical study of improved methods to learn reward models, and better ways to measure their quality reliably.

ICLR Conference 2024 Conference Paper

Clifford Group Equivariant Simplicial Message Passing Networks

  • Cong Liu
  • David Ruhe
  • Floor Eijkelboom
  • Patrick Forré

We introduce Clifford Group Equivariant Simplicial Message Passing Networks, a method for steerable $\mathrm{E}(n)$-equivariant message passing on simplicial complexes. Our method integrates the expressivity of Clifford group-equivariant layers with simplicial message passing, which is topologically more intricate than regular graph message passing. Clifford algebras include higher-order objects such as bivectors and trivectors, which express geometric features (e.g., areas, volumes) derived from vectors. Using this knowledge, we represent simplex features through geometric products of their vertices. To achieve efficient simplicial message passing, we share the parameters of the message network across different dimensions. Additionally, we restrict the final message to an aggregation of the incoming messages from different dimensions, leading to what we term *shared* simplicial message passing. Experimental results show that our method is able to outperform both equivariant and simplicial graph neural networks on a variety of geometric tasks.

ICML Conference 2024 Conference Paper

Clifford-Steerable Convolutional Neural Networks

  • Maksim Zhdanov 0001
  • David Ruhe
  • Maurice Weiler
  • Ana Lucic
  • Johannes Brandstetter
  • Patrick Forré

We present Clifford-Steerable Convolutional Neural Networks (CS-CNNs), a novel class of ${\operatorname{E}}(p, q)$-equivariant CNNs. CS-CNNs process multivector fields on pseudo-Euclidean spaces $\mathbb{R}^{p, q}$. They specialize, for instance, to ${\operatorname{E}}(3)$-equivariance on $\mathbb{R}^3$ and Poincaré-equivariance on Minkowski spacetime $\mathbb{R}^{1, 3}$. Our approach is based on an implicit parametrization of ${\operatorname{O}}(p, q)$-steerable kernels via Clifford group equivariant neural networks. We significantly and consistently outperform baseline methods on fluid dynamics as well as relativistic electrodynamics forecasting tasks.

TMLR Journal 2024 Journal Article

E-Valuating Classifier Two-Sample Tests

  • Teodora Pandeva
  • Tim Bakker
  • Christian A. Naesseth
  • Patrick Forré

We introduce a powerful deep classifier two-sample test for high-dimensional data based on E-values, called E-C2ST. Our test combines ideas from existing work on split likelihood ratio tests and predictive independence tests. The resulting E-values are suitable for anytime-valid sequential two-sample tests. This feature allows for more effective use of data in constructing test statistics. Through simulations and real data applications, we empirically demonstrate that E-C2ST achieves enhanced statistical power by partitioning datasets into multiple batches, beyond the conventional two-split (training and testing) approach of standard two-sample classifier tests. This strategy increases the power of the test, while keeping the type I error well below the desired significance level.

UAI Conference 2024 Conference Paper

Early-Exit Neural Networks with Nested Prediction Sets

  • Metod Jazbec
  • Patrick Forré
  • Stephan Mandt
  • Dan Zhang
  • Eric T. Nalisnick

Early-exit neural networks (EENNs) facilitate adaptive inference by producing predictions at multiple stages of the forward pass. In safety-critical applications, these predictions are only meaningful when complemented with reliable uncertainty estimates. Yet, due to their sequential structure, an EENN’s uncertainty estimates should also be *consistent*: labels that are deemed improbable at one exit should not reappear within the confidence interval / set of later exits. We show that standard uncertainty quantification techniques, like Bayesian methods or conformal prediction, can lead to inconsistency across exits. We address this problem by applying anytime-valid confidence sequences (AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across exits. We examine the theoretical and practical challenges of applying AVCSs to EENNs and empirically validate our approach on both regression and classification tasks.

ICLR Conference 2024 Conference Paper

Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

  • Marco Federici
  • Patrick Forré
  • Ryota Tomioka
  • Bastiaan S. Veeling

Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.

ICLR Conference 2024 Conference Paper

Lie Group Decompositions for Equivariant Neural Networks

  • Mircea Mironenco
  • Patrick Forré

Invariance and equivariance to geometrical transformations have proven to be very useful inductive biases when training (convolutional) neural network models, especially in the low-data regime. Much work has focused on the case where the symmetry group employed is compact or abelian, or both. Recent work has explored enlarging the class of transformations used to the case of Lie groups, principally through the use of their Lie algebra, as well as the group exponential and logarithm maps. The applicability of such methods to larger transformation groups is limited by the fact that depending on the group of interest $G$, the exponential map may not be surjective. Further limitations are encountered when $G$ is neither compact nor abelian. Using the structure and geometry of Lie groups and their homogeneous spaces, we present a framework by which it is possible to work with such groups primarily focusing on the Lie groups $G = \textnormal{GL}^{+}(n, \mathbb{R})$ and $G = \textnormal{SL}(n, \mathbb{R})$, as well as their representation as affine transformations $\mathbb{R}^{n} \rtimes G$. Invariant integration as well as a global parametrization is realized by decomposing the "larger" groups into subgroups and submanifolds which can be handled individually. Under this framework, we show how convolution kernels can be parametrized to build models equivariant with respect to affine transformations. We evaluate the robustness and out-of-distribution generalisation capability of our model on the standard affine-invariant benchmark classification task, where we outperform all previous equivariant models as well as all Capsule Network proposals.

NeurIPS Conference 2023 Conference Paper

Clifford Group Equivariant Neural Networks

  • David Ruhe
  • Johannes Brandstetter
  • Patrick Forré

We introduce Clifford Group Equivariant Neural Networks: a novel approach for constructing $\mathrm{O}(n)$- and $\mathrm{E}(n)$-equivariant models. We identify and study the *Clifford group*: a subgroup inside the Clifford algebra tailored to achieve several favorable properties. Primarily, the group's action forms an orthogonal automorphism that extends beyond the typical vector space to the entire Clifford algebra while respecting the multivector grading. This leads to several non-equivalent subrepresentations corresponding to the multivector decomposition. Furthermore, we prove that the action respects not just the vector space structure of the Clifford algebra but also its multiplicative structure, i. e. , the geometric product. These findings imply that every polynomial in multivectors, including their grade projections, constitutes an equivariant map with respect to the Clifford group, allowing us to parameterize equivariant neural network layers. An advantage worth mentioning is that we obtain expressive layers that can elegantly generalize to inner-product spaces of any dimension. We demonstrate, notably from a single core implementation, state-of-the-art performance on several distinct tasks, including a three-dimensional $n$-body experiment, a four-dimensional Lorentz-equivariant high-energy physics experiment, and a five-dimensional convex hull experiment.

NeurIPS Conference 2023 Conference Paper

Deep Gaussian Markov Random Fields for Graph-Structured Dynamical Systems

  • Fiona Lippert
  • Bart Kranstauber
  • Emiel van Loon
  • Patrick Forré

Probabilistic inference in high-dimensional state-space models is computationally challenging. For many spatiotemporal systems, however, prior knowledge about the dependency structure of state variables is available. We leverage this structure to develop a computationally efficient approach to state estimation and learning in graph-structured state-space models with (partially) unknown dynamics and limited historical data. Building on recent methods that combine ideas from deep learning with principled inference in Gaussian Markov random fields (GMRF), we reformulate graph-structured state-space models as Deep GMRFs defined by simple spatial and temporal graph layers. This results in a flexible spatiotemporal prior that can be learned efficiently from a single time sequence via variational inference. Under linear Gaussian assumptions, we retain a closed-form posterior, which can be sampled efficiently using the conjugate gradient method, scaling favourably compared to classical Kalman filter based approaches.

ICLR Conference 2023 Conference Paper

Equivariance-aware Architectural Optimization of Neural Networks

  • Kaitlin Maile
  • Dennis George Wilson
  • Patrick Forré

Incorporating equivariance to symmetry groups as a constraint during neural network training can improve performance and generalization for tasks exhibiting those symmetries, but such symmetries are often not perfectly nor explicitly present. This motivates algorithmically optimizing the architectural constraints imposed by equivariance. We propose the equivariance relaxation morphism, which preserves functionality while reparameterizing a group equivariant layer to operate with equivariance constraints on a subgroup, as well as the $[G]$-mixed equivariant layer, which mixes layers constrained to different groups to enable within-layer equivariance optimization. We further present evolutionary and differentiable neural architecture search (NAS) algorithms that utilize these mechanisms respectively for equivariance-aware architectural optimization. Experiments across a variety of datasets show the benefit of dynamically constrained equivariance to find effective architectures with approximate equivariance.

ICLR Conference 2023 Conference Paper

Multi-objective optimization via equivariant deep hypervolume approximation

  • Jim Boelrijk
  • Bernd Ensing
  • Patrick Forré

Optimizing multiple competing objectives is a common problem across science and industry. The inherent inextricable trade-off between those objectives leads one to the task of exploring their Pareto front. A meaningful quantity for the purpose of the latter is the hypervolume indicator, which is used in Bayesian Optimization (BO) and Evolutionary Algorithms (EAs). However, the computational complexity for the calculation of the hypervolume scales unfavorably with increasing number of objectives and data points, which restricts its use in those common multi-objective optimization frameworks. To overcome these restrictions, previous work has focused on approximating the hypervolume using deep learning. In this work, we propose a novel deep learning architecture to approximate the hypervolume function, which we call DeepHV. For better sample efficiency and generalization, we exploit the fact that the hypervolume is scale equivariant in each of the objectives as well as permutation invariant w.r.t. both the objectives and the samples, by using a deep neural network that is equivariant w.r.t. the combined group of scalings and permutations. We show through an ablation study that including these symmetries leads to significantly improved model accuracy. We evaluate our method against exact, and approximate hypervolume methods in terms of accuracy, computation time, and generalization. We also apply and compare our methods to state-of-the-art multi-objective BO methods and EAs on a range of synthetic and real-world benchmark test cases. The results show that our methods are promising for such multi-objective optimization tasks.

UAI Conference 2023 Conference Paper

Multi-View Independent Component Analysis with Shared and Individual Sources

  • Teodora Pandeva
  • Patrick Forré

Independent component analysis (ICA) is a blind source separation method for linear disentanglement of independent latent sources from observed data. We investigate the special setting of noisy linear ICA, referred to as ShIndICA, where the observations are split among different views, each receiving a mixture of shared and individual sources. We prove that the corresponding linear structure is identifiable and the sources distribution can be recovered. To computationally estimate the sources, we optimize a constrained form of the joint log-likelihood of the observed data among all views. Furthermore, we propose a model selection procedure for recovering the number of shared sources. Finally, we empirically demonstrate the advantages of our model over baselines. We apply ShIndICA in a challenging real-life task, using three transcriptome datasets provided by three different labs (three different views). The recovered sources were used for a downstream graph inference task, facilitating the discovery of a plausible representation of the data’s underlying graph structure.

NeurIPS Conference 2022 Conference Paper

Contrastive Neural Ratio Estimation

  • Benjamin K Miller
  • Christoph Weniger
  • Patrick Forré

Likelihood-to-evidence ratio estimation is usually cast as either a binary (NRE-A) or a multiclass (NRE-B) classification task. In contrast to the binary classification framework, the current formulation of the multiclass version has an intrinsic and unknown bias term, making otherwise informative diagnostics unreliable. We propose a multiclass framework free from the bias inherent to NRE-B at optimum, leaving us in the position to run diagnostics that practitioners depend on. It also recovers NRE-A in one corner case and NRE-B in the limiting case. For fair comparison, we benchmark the behavior of all algorithms in both familiar and novel training regimes: when jointly drawn data is unlimited, when data is fixed but prior draws are unlimited, and in the commonplace fixed data and parameters setting. Our investigations reveal that the highest performing models are distant from the competitors (NRE-A, NRE-B) in hyperparameter space. We make a recommendation for hyperparameters distinct from the previous models. We suggest a bound on the mutual information as a performance metric for simulation-based inference methods, without the need for posterior samples, and provide experimental results.

ICLR Conference 2022 Conference Paper

Self-Supervised Inference in State-Space Models

  • David Ruhe
  • Patrick Forré

We perform approximate inference in state-space models with nonlinear state transitions. Without parameterizing a generative model, we apply Bayesian update formulas using a local linearity approximation parameterized by neural networks. It comes accompanied by a maximum likelihood objective that requires no supervision via uncorrupt observations or ground truth latent states. The optimization backpropagates through a recursion similar to the classical Kalman filter and smoother. Additionally, using an approximate conditional independence, we can perform smoothing without having to parameterize a separate model. In scientific applications, domain knowledge can give a linear approximation of the latent transition maps, which we can easily incorporate into our model. Usage of such domain knowledge is reflected in excellent results (despite our model's simplicity) on the chaotic Lorenz system compared to fully supervised and variational inference methods. Finally, we show competitive results on an audio denoising experiment.

NeurIPS Conference 2021 Conference Paper

An Information-theoretic Approach to Distribution Shifts

  • Marco Federici
  • Ryota Tomioka
  • Patrick Forré

Safely deploying machine learning models to the real world is often a challenging process. For example, models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere, agents trained in a simulation can struggle to adapt when deployed in the real world or novel environments, and neural networks that are fit to a subset of the population might carry some selection bias into their decision process. In this work, we describe the problem of data shift from an information-theoretic perspective by (i) identifying and describing the different sources of error, (ii) comparing some of the most promising objectives explored in the recent domain generalization and fair classification literature. From our theoretical analysis and empirical evaluation, we conclude that the model selection procedure needs to be guided by careful considerations regarding the observed data, the factors used for correction, and the structure of the data-generating process.

NeurIPS Conference 2021 Conference Paper

Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions

  • Emiel Hoogeboom
  • Didrik Nielsen
  • Priyank Jaini
  • Patrick Forré
  • Max Welling

Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood.

ICML Conference 2021 Conference Paper

Selecting Data Augmentation for Simulating Interventions

  • Maximilian Ilse
  • Jakub M. Tomczak
  • Patrick Forré

Machine learning models trained with purely observational data and the principle of empirical risk minimization (Vapnik 1992) can fail to generalize to unseen domains. In this paper, we focus on the case where the problem arises through spurious correlation between the observed domains and the actual task labels. We find that many domain generalization methods do not explicitly take this spurious correlation into account. Instead, especially in more application-oriented research areas like medical imaging or robotics, data augmentation techniques that are based on heuristics are used to learn domain invariant features. To bridge the gap between theory and practice, we develop a causal perspective on the problem of domain generalization. We argue that causal concepts can be used to explain the success of data augmentation by describing how they can weaken the spurious correlation between the observed domains and the task labels. We demonstrate that data augmentation can serve as a tool for simulating interventional data. We use these theoretical insights to derive a simple algorithm that is able to select data augmentation techniques that will lead to better domain generalization.

ICML Conference 2021 Conference Paper

Self Normalizing Flows

  • T. Anderson Keller 0001
  • Jorn W. T. Peters
  • Priyank Jaini
  • Emiel Hoogeboom
  • Patrick Forré
  • Max Welling

Efficient gradient computation of the Jacobian determinant term is a core problem in many machine learning settings, and especially so in the normalizing flow framework. Most proposed flow models therefore either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose \emph{Self Normalizing Flows}, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer’s exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while training more quickly and surpassing the performance of functionally constrained counterparts.

NeurIPS Conference 2021 Conference Paper

Truncated Marginal Neural Ratio Estimation

  • Benjamin K Miller
  • Alex Cole
  • Patrick Forré
  • Gilles Louppe
  • Christoph Weniger

Parametric stochastic simulators are ubiquitous in science, often featuring high-dimensional input parameters and/or an intractable likelihood. Performing Bayesian parameter inference in this context can be challenging. We present a neural simulation-based inference algorithm which simultaneously offers simulation efficiency and fast empirical posterior testability, which is unique among modern algorithms. Our approach is simulation efficient by simultaneously estimating low-dimensional marginal posteriors instead of the joint posterior and by proposing simulations targeted to an observation of interest via a prior suitably truncated by an indicator function. Furthermore, by estimating a locally amortized posterior our algorithm enables efficient empirical tests of the robustness of the inference results. Since scientists cannot access the ground truth, these tests are necessary for trusting inference in real-world applications. We perform experiments on a marginalized version of the simulation-based inference benchmark and two complex and narrow posteriors, highlighting the simulator efficiency of our algorithm as well as the quality of the estimated marginal posteriors.

ICLR Conference 2020 Conference Paper

Learning Robust Representations via Multi-View Information Bottleneck

  • Marco Federici
  • Anjan Dutta 0001
  • Patrick Forré
  • Nate Kushman
  • Zeynep Akata

The information bottleneck principle provides an information-theoretic method for representation learning, by training an encoder to retain all information which is relevant for predicting the label while minimizing the amount of other, excess information in the representation. The original formulation, however, requires labeled data to identify the superfluous information. In this work, we extend this ability to the multi-view unsupervised setting, where two views of the same underlying entity are provided but the label is unknown. This enables us to identify superfluous information as that not shared by both views. A theoretical analysis leads to the definition of a new multi-view model that produces state-of-the-art results on the Sketchy dataset and label-limited versions of the MIR-Flickr dataset. We also extend our theory to the single-view setting by taking advantage of standard data augmentation techniques, empirically showing better generalization capabilities when compared to common unsupervised approaches for representation learning.

UAI Conference 2019 Conference Paper

Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias

  • Patrick Forré
  • Joris M. Mooij

We prove the main rules of causal calculus (also called do-calculus) for i/o structural causal models (ioSCMs), a generalization of a recently proposed general class of non-linear structural causal models that allow for cycles, latent confounders and arbitrary probability distributions. We also generalize adjustment criteria and formulas from the acyclic setting to the general one (i. e. ioSCMs). Such criteria then allow to estimate (conditional) causal effects from observational data that was (partially) gathered under selection bias and cycles. This generalizes the backdoor criterion, the selection-backdoor criterion and extensions of these to arbitrary ioSCMs. Together, our results thus enable causal reasoning in the presence of cycles, latent confounders and selection bias. Finally, we extend the ID algorithm for the identification of causal effects to ioSCMs.

UAI Conference 2019 Conference Paper

Sinkhorn AutoEncoders

  • Giorgio Patrini
  • Rianne van den Berg
  • Patrick Forré
  • Marcello Carioni
  • Samarth Bhargav 0001
  • Max Welling
  • Tim Genewein
  • Frank Nielsen

Optimal transport offers an alternative to maximum likelihood for learning generative autoencoding models. We show that minimizing the $p$-Wasserstein distance between the generator and the true data distribution is equivalent to the unconstrained min-min optimization of the $p$-Wasserstein distance between the encoder aggregated posterior and the prior in latent space, plus a reconstruction error. We also identify the role of its trade-off hyperparameter as the capacity of the generator: its Lipschitz constant. Moreover, we prove that optimizing the encoder over any class of universal approximators, such as deterministic neural networks, is enough to come arbitrarily close to the optimum. We therefore advertise this framework, which holds for any metric space and prior, as a sweet-spot of current generative autoencoding objectives. We then introduce the Sinkhorn autoencoder (SAE), which approximates and minimizes the $p$-Wasserstein distance in latent space via backprogation through the Sinkhorn algorithm. SAE directly works on samples, i. e. it models the aggregated posterior as an implicit distribution, with no need for a reparameterization trick for gradients estimations. SAE is thus able to work with different metric spaces and priors with minimal adaptations. We demonstrate the flexibility of SAE on latent spaces with different geometries and priors and compare with other methods on benchmark data sets.

UAI Conference 2018 Conference Paper

Constraint-based Causal Discovery for Non-Linear Structural Causal Models with Cycles and Latent Confounders

  • Patrick Forré
  • Joris M. Mooij

We address the problem of causal discovery from data, making use of the recently proposed causal modeling framework of modular structural causal models (mSCM) to handle cycles, latent confounders and non-linearities. We introduce σ-connection graphs (σ-CG), a new class of mixed graphs (containing undirected, bidirected and directed edges) with additional structure, and extend the concept of σ-separation, the appropriate generalization of the well-known notion of d-separation in this setting, to apply to σ-CGs. We prove the closedness of σ-separation under marginalisation and conditioning and exploit this to implement a test of σ-separation on a σ-CG. This then leads us to the first causal discovery algorithm that can handle non-linear functional relations, latent confounders, cyclic causal relationships, and data from different (stochastic) perfect interventions. As a proof of concept, we show on synthetic data how well the algorithm recovers features of the causal graph of modular structural causal models.