Arrow Research search

Author name cluster

Stefan Bauer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

46 papers
2 author rows

Possible papers

46

TMLR Journal 2025 Journal Article

Double Machine Learning Based Structure Identification from Temporal Data

  • Emmanouil Angelis
  • Francesco Quinzan
  • Ashkan Soleymani
  • Patrick Jaillet
  • Stefan Bauer

Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications or even contain cycles. To address these challenges, we propose a new double machine learning based method for structure identification from temporal data (DR-SIT). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles, and they may even be confounded. We further perform extensive experiments to showcase the superior performance of our method. Code: https://github.com/sdi1100041/TMLR_submission_DR_SIT

ICLR Conference 2025 Conference Paper

Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models

  • Amir Mohammad Karimi-Mamaghan
  • Samuele S. Papa
  • Karl Henrik Johansson
  • Stefan Bauer
  • Andrea Dittadi

Object-centric (OC) representations, which model visual scenes as compositions of discrete objects, have the potential to be used in various downstream tasks to achieve systematic compositional generalization and facilitate reasoning. However, these claims have yet to be thoroughly validated empirically. Recently, foundation models have demonstrated unparalleled capabilities across diverse domains, from language to computer vision, positioning them as a potential cornerstone of future research for a wide range of computational tasks. In this paper, we conduct an extensive empirical study on representation learning for downstream Visual Question Answering (VQA), which requires an accurate compositional understanding of the scene. We thoroughly investigate the benefits and trade-offs of OC models and alternative approaches including large pre-trained foundation models on both synthetic and real-world data, ultimately identifying a promising path to leverage the strengths of both paradigms. The extensiveness of our study, encompassing over 600 downstream VQA models and 15 different types of upstream representations, also provides several additional insights that we believe will be of interest to the community at large.

NeurIPS Conference 2025 Conference Paper

Preference-Guided Diffusion for Multi-Objective Offline Optimization

  • Yashas Annadani
  • Syrine Belakaria
  • Stefano Ermon
  • Stefan Bauer
  • Barbara Engelhardt

Offline multi-objective optimization aims to identify Pareto-optimal solutions given a dataset of designs and their objective values. In this work, we propose a preference-guided diffusion model that generates Pareto-optimal designs by leveraging a classifier-based guidance mechanism. Our guidance classifier is a preference model trained to predict the probability that one design dominates another, directing the diffusion model toward optimal regions of the design space. Crucially, this preference model generalizes beyond the training distribution, enabling the discovery of Pareto-optimal solutions outside the observed dataset. We introduce a novel diversity-aware preference guidance, augmenting Pareto dominance preference with diversity criteria. This ensures that generated solutions are optimal and well-distributed across the objective space, a capability absent in prior generative methods for offline multi-objective optimization. We evaluate our approach on various continuous offline multi-objective optimization tasks and find that it consistently outperforms other inverse/generative approaches while remaining competitive with forward/ surrogate-based optimization methods. Our results highlight the effectiveness of classifier-guided diffusion models in generating diverse and high-quality solutions that approximate the Pareto front well.

NeurIPS Conference 2024 Conference Paper

Amortized Active Causal Induction with Deep Reinforcement Learning

  • Yashas Annadani
  • Panagiotis Tigas
  • Stefan Bauer
  • Adam Foster

We present Causal Amortized Active Structure Learning (CAASL), an active intervention design policy that can select interventions that are adaptive, real-time and that does not require access to the likelihood. This policy, an amortized network based on the transformer, is trained with reinforcement learning on a simulator of the design environment, and a reward function that measures how close the true causal graph is to a causal graph posterior inferred from the gathered data. On synthetic data and a single-cell gene expression simulator, we demonstrate empirically that the data acquired through our policy results in a better estimate of the underlying causal graph than alternative strategies. Our design policy successfully achieves amortized intervention design on the distribution of the training environment while also generalizing well to distribution shifts in test-time design environments. Further, our policy also demonstrates excellent zero-shot generalization to design environments with dimensionality higher than that during training, and to intervention types that it has not been trained on.

ICML Conference 2024 Conference Paper

Challenges and Considerations in the Evaluation of Bayesian Causal Discovery

  • Amir Mohammad Karimi-Mamaghan
  • Panagiotis Tigas
  • Karl Henrik Johansson
  • Yarin Gal
  • Yashas Annadani
  • Stefan Bauer

Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making. Bayesian Causal Discovery (BCD) offers a principled approach to encapsulating this uncertainty. Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, evaluating BCD presents challenges due to the nature of its inferred quantity – the posterior distribution. As a result, the research community has proposed various metrics to assess the quality of the approximate posterior. However, there is, to date, no consensus on the most suitable metric(s) for evaluation. In this work, we reexamine this question by dissecting various metrics and understanding their limitations. Through extensive empirical evaluation, we find that many existing metrics fail to exhibit a strong correlation with the quality of approximation to the true posterior, especially in scenarios with low sample sizes where BCD is most desirable. We highlight the suitability (or lack thereof) of these metrics under two distinct factors: the identifiability of the underlying causal model and the quantity of available data. Both factors affect the entropy of the true posterior, indicating that the current metrics are less fitting in settings of higher entropy. Our findings underline the importance of a more nuanced evaluation of new methods by taking into account the nature of the true posterior, as well as guide and motivate the development of new evaluation procedures for this challenge.

NeurIPS Conference 2023 Conference Paper

BayesDAG: Gradient-Based Posterior Inference for Causal Discovery

  • Yashas Annadani
  • Nick Pawlowski
  • Joel Jennings
  • Stefan Bauer
  • Cheng Zhang
  • Wenbo Gong

Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise due to joint inference over combinatorial space of Directed Acyclic Graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or continuous relaxation of adjacency matrices constrained by a DAG regularizer, which cannot ensure resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on a combination of stochastic gradient Markov Chain Monte Carlo (SG-MCMC) and Variational Inference (VI) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to the permutation-based DAG learning, which opens up possibilities of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework applying gradient-based MCMC sampling for causal discovery. Empirical evaluation on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.

ICLR Conference 2023 Conference Paper

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

  • Nico Gürtler
  • Sebastian Blaes
  • Pavel Kolev
  • Felix Widmaier
  • Manuel Wüthrich
  • Stefan Bauer
  • Bernhard Schölkopf
  • Georg Martius

Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

ICML Conference 2023 Conference Paper

Differentiable Multi-Target Causal Bayesian Experimental Design

  • Panagiotis Tigas
  • Yashas Annadani
  • Desi R. Ivanova
  • Andrew Jesson
  • Yarin Gal
  • Adam Foster 0001
  • Stefan Bauer

We introduce a gradient-based approach for the problem of Bayesian optimal experimental design to learn causal models in a batch setting — a critical component for causal discovery from finite data where interventions can be costly or risky. Existing methods rely on greedy approximations to construct a batch of experiments while using black-box methods to optimize over a single target-state pair to intervene with. In this work, we completely dispose of the black-box optimization techniques and greedy heuristics and instead propose a conceptually simple end-to-end gradient-based optimization procedure to acquire a set of optimal intervention target-value pairs. Such a procedure enables parameterization of the design space to efficiently optimize over a batch of multi-target-state interventions, a setting which has hitherto not been explored due to its complexity. We demonstrate that our proposed method outperforms baselines and existing acquisition strategies in both single-target and multi-target settings across a number of synthetic datasets.

ICML Conference 2023 Conference Paper

Diffusion Based Representation Learning

  • Sarthak Mittal
  • Korbinian Abstreiter
  • Stefan Bauer
  • Bernhard Schölkopf
  • Arash Mehrjou

Diffusion-based methods, represented as stochastic differential equations on a continuous-time domain, have recently proven successful as non-adversarial generative models. Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes to data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective and thus encodes the information needed for denoising. We illustrate how this difference allows for manual control of the level of details encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements on state-of-the-art models on semi-supervised image classification. We also compare the quality of learned representations of diffusion score matching with other methods like autoencoder and contrastively trained systems through their performances on downstream tasks. Finally, we also ablate with a different SDE formulation for diffusion models and show that the benefits on downstream tasks are still present on changing the underlying differential equation.

ICML Conference 2023 Conference Paper

DiscoBAX: Discovery of optimal intervention sets in genomic experiment design

  • Clare Lyle
  • Arash Mehrjou
  • Pascal Notin
  • Andrew Jesson
  • Stefan Bauer
  • Yarin Gal
  • Patrick Schwab

The discovery of therapeutics to treat genetically-driven pathologies relies on identifying genes involved in the underlying disease mechanism. Existing approaches search over the billions of potential interventions to maximize the expected influence on the target phenotype. However, to reduce the risk of failure in future stages of trials, practical experiment design aims to find a set of interventions that maximally change a target phenotype via diverse mechanisms. We propose DiscoBAX - a sample-efficient method for maximizing the rate of significant discoveries per experiment while simultaneously probing for a wide range of diverse mechanisms during a genomic experiment campaign. We provide theoretical guarantees of optimality under standard assumptions, and conduct a comprehensive experimental evaluation covering both synthetic as well as real-world experimental design tasks. DiscoBAX outperforms existing state-of-the-art methods for experimental design, selecting effective and diverse perturbations in biological systems.

ICML Conference 2023 Conference Paper

DRCFS: Doubly Robust Causal Feature Selection

  • Francesco Quinzan
  • Ashkan Soleymani
  • Patrick Jaillet
  • Cristian R. Rojas
  • Stefan Bauer

Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.

TMLR Journal 2023 Journal Article

Neural Causal Structure Discovery from Interventions

  • Nan Rosemary Ke
  • Olexa Bilaniuk
  • Anirudh Goyal
  • Stefan Bauer
  • Hugo Larochelle
  • Bernhard Schölkopf
  • Michael Curtis Mozer
  • Christopher Pal

Recent promising results have generated a surge of interest in continuous optimization methods for causal discovery from observational data. However, there are theoretical limitations on the identifiability of underlying structures obtained solely from observational data. Interventional data, on the other hand, provides richer information about the underlying data-generating process. Nevertheless, extending and applying methods designed for observational data to include interventions is a challenging problem. To address this issue, we propose a general framework based on neural networks to develop models that incorporate both observational and interventional data. Notably, our method can handle the challenging and realistic scenario where the identity of the intervened upon variable is unknown. We evaluate our proposed approach in the context of graph recovery, both de novo and from a partially-known edge set. Our method achieves strong benchmark results on various structure learning tasks, including structure recovery of synthetic graphs as well as standard graphs from the Bayesian Network Repository.

ICLR Conference 2023 Conference Paper

Structure by Architecture: Structured Representations without Regularization

  • Felix Leeb
  • Giulia Lanzillotta
  • Yashas Annadani
  • Michel Besserve
  • Stefan Bauer
  • Bernhard Schölkopf

We study the problem of self-supervised structured representation learning using autoencoders for downstream tasks such as generative modeling. Unlike most methods which rely on matching an arbitrary, relatively unstructured, prior distribution for sampling, we propose a sampling technique that relies solely on the independence of latent variables, thereby avoiding the trade-off between reconstruction quality and generative performance typically observed in VAEs. We design a novel autoencoder architecture capable of learning a structured representation without the need for aggressive regularization. Our structural decoders learn a hierarchy of latent variables, thereby ordering the information without any additional regularization or supervision. We demonstrate how these models learn a representation that improves results in a variety of downstream tasks including generation, disentanglement, and extrapolation using several challenging and natural image datasets.

NeurIPS Conference 2023 Conference Paper

Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery

  • Mateusz Olko
  • Michał Zając
  • Aleksandra Nowak
  • Nino Scherrer
  • Yashas Annadani
  • Stefan Bauer
  • Łukasz Kuciński
  • Piotr Miłoś

Inferring causal structure from data is a challenging task of fundamental importance in science. Often, observational data alone is not enough to uniquely identify a system’s causal structure. The use of interventional data can address this issue, however, acquiring these samples typically demands a considerable investment of time and physical or financial resources. In this work, we are concerned with the acquisition of interventional data in a targeted manner to minimize the number of required experiments. We propose a novel Gradient-based Intervention Targeting method, abbreviated GIT, that ’trusts’ the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention targeting function. We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime.

ICML Conference 2022 Conference Paper

Adaptive Gaussian Process Change Point Detection

  • Edoardo Caldarelli
  • Philippe Wenk
  • Stefan Bauer
  • Andreas Krause 0001

Detecting change points in time series, i. e. , points in time at which some observed process suddenly changes, is a fundamental task that arises in many real-world applications, with consequences for safety and reliability. In this work, we propose ADAGA, a novel Gaussian process-based solution to this problem, that leverages a powerful heuristics we developed based on statistical hypothesis testing. In contrast to prior approaches, ADAGA adapts to changes both in mean and covariance structure of the temporal process. In extensive experiments, we show its versatility and applicability to different classes of change points, demonstrating that it is significantly more accurate than current state-of-the-art alternatives.

UAI Conference 2022 Conference Paper

Bayesian structure learning with generative flow networks

  • Tristan Deleu
  • António Góis
  • Chris Emezue
  • Mansi Rankawat
  • Simon Lacoste-Julien
  • Stefan Bauer
  • Yoshua Bengio

In Bayesian structure learning, we are interested in inferring a distribution over the directed acyclic graph (DAG) structure of Bayesian networks, from data. Defining such a distribution is very challenging, due to the combinatorially large sample space, and approximations based on MCMC are often required. Recently, a novel class of probabilistic models, called Generative Flow Networks (GFlowNets), have been introduced as a general framework for generative modeling of discrete and composite objects, such as graphs. In this work, we propose to use a GFlowNet as an alternative to MCMC for approximating the posterior distribution over the structure of Bayesian networks, given a dataset of observations. Generating a sample DAG from this approximate distribution is viewed as a sequential decision problem, where the graph is constructed one edge at a time, based on learned transition probabilities. Through evaluation on both simulated and real data, we show that our approach, called DAG-GFlowNet, provides an accurate approximation of the posterior over DAGs, and it compares favorably against other methods based on MCMC or variational inference.

TMLR Journal 2022 Journal Article

Causal Feature Selection via Orthogonal Search

  • Ashkan Soleymani
  • Anant Raj
  • Stefan Bauer
  • Bernhard Schölkopf
  • Michel Besserve

The problem of inferring the direct causal parents of a response variable among a large set of explanatory variables is of high practical importance in many disciplines. However, established approaches often scale at least exponentially with the number of explanatory variables, are difficult to extend to nonlinear relationships and are difficult to extend to cyclic data. Inspired by debiased machine learning methods, we study a one-vs.-the-rest feature selection approach to discover the direct causal parent of the response. We propose an algorithm that works for purely observational data while also offering theoretical guarantees, including the case of partially nonlinear relationships possibly under the presence of cycles. As it requires only one estimation for each variable, our approach is applicable even to large graphs. We demonstrate significant improvements compared to established approaches.

TMLR Journal 2022 Journal Article

Diffusion Models for Video Prediction and Infilling

  • Tobias Höppe
  • Arash Mehrjou
  • Stefan Bauer
  • Didrik Nielsen
  • Andrea Dittadi

Predicting and anticipating future outcomes or reasoning about missing information in a sequence are critical skills for agents to be able to make intelligent decisions. This requires strong, temporally coherent generative capabilities. Diffusion models have shown remarkable success in several generative tasks, but have not been extensively explored in the video domain. We present Random-Mask Video Diffusion (RaMViD), which extends image diffusion models to videos using 3D convolutions, and introduces a new conditioning technique during training. By varying the mask we condition on, the model is able to perform video prediction, infilling, and upsampling. Due to our simple conditioning scheme, we can utilize the same architecture as used for unconditional training, which allows us to train the model in a conditional and unconditional fashion at the same time. We evaluate RaMViD on two benchmark datasets for video prediction, on which we achieve state-of-the-art results, and one for video generation. High-resolution videos are provided at https://sites.google.com/view/video-diffusion-prediction.

NeurIPS Conference 2022 Conference Paper

Exploring the Latent Space of Autoencoders with Interventional Assays

  • Felix Leeb
  • Stefan Bauer
  • Michel Besserve
  • Bernhard Schölkopf

Autoencoders exhibit impressive abilities to embed the data manifold into a low-dimensional latent space, making them a staple of representation learning methods. However, without explicit supervision, which is often unavailable, the representation is usually uninterpretable, making analysis and principled progress challenging. We propose a framework, called latent responses, which exploits the locally contractive behavior exhibited by variational autoencoders to explore the learned manifold. More specifically, we develop tools to probe the representation using interventions in the latent space to quantify the relationships between latent variables. We extend the notion of disentanglement to take the learned generative process into account and consequently avoid the limitations of existing metrics that may rely on spurious correlations. Our analyses underscore the importance of studying the causal structure of the representation to improve performance on downstream tasks such as generation, interpolation, and inference of the factors of variation.

ICLR Conference 2022 Conference Paper

GeneDisco: A Benchmark for Experimental Design in Drug Discovery

  • Arash Mehrjou
  • Ashkan Soleymani
  • Andrew Jesson
  • Pascal Notin
  • Yarin Gal
  • Stefan Bauer
  • Patrick Schwab

In vitro cellular experimentation with genetic interventions, using for example CRISPR technologies, is an essential step in early-stage drug discovery and target validation that serves to assess initial hypotheses about causal associations between biological mechanisms and disease pathologies. With billions of potential hypotheses to test, the experimental design space for in vitro genetic experiments is extremely vast, and the available experimental capacity - even at the largest research institutions in the world - pales in relation to the size of this biological hypothesis space. Machine learning methods, such as active and reinforcement learning, could aid in optimally exploring the vast biological space by integrating prior knowledge from various information sources as well as extrapolating to yet unexplored areas of the experimental design space based on available data. However, there exist no standardised benchmarks and data sets for this challenging task and little research has been conducted in this area to date. Here, we introduce GeneDisco, a benchmark suite for evaluating active learning algorithms for experimental design in drug discovery. GeneDisco contains a curated set of multiple publicly available experimental data sets as well as open-source implementations of state-of-the-art active learning policies for experimental design and exploration.

NeurIPS Conference 2022 Conference Paper

Interventions, Where and How? Experimental Design for Causal Models at Scale

  • Panagiotis Tigas
  • Yashas Annadani
  • Andrew Jesson
  • Bernhard Schölkopf
  • Yarin Gal
  • Stefan Bauer

Causal discovery from observational and interventional data is challenging due to limited data and non-identifiability which introduces uncertainties in estimating the underlying structural causal model (SCM). Incorporating these uncertainties and selecting optimal experiments (interventions) to perform can help to identify the true SCM faster. Existing methods in experimental design for causal discovery from limited data either rely on linear assumptions for the SCM or select only the intervention target. In this paper, we incorporate recent advances in Bayesian causal discovery into the Bayesian optimal experimental design framework, which allows for active causal discovery of nonlinear, large SCMs, while selecting both the target and the value to intervene with. We demonstrate the performance of the proposed method on synthetic graphs (Erdos-Rènyi, Scale Free) for both linear and nonlinear SCMs as well as on the \emph{in-silico} single-cell gene regulatory network dataset, DREAM.

ICLR Conference 2022 Conference Paper

The Role of Pretrained Representations for the OOD Generalization of RL Agents

  • Frederik Träuble
  • Andrea Dittadi
  • Manuel Wüthrich
  • Felix Widmaier
  • Peter V. Gehler
  • Ole Winther
  • Francesco Locatello
  • Olivier Bachem

Building sample-efficient agents that generalize out-of-distribution (OOD) in real-world settings remains a fundamental unsolved problem on the path towards achieving higher-level cognition. One particularly promising approach is to begin with low-dimensional, pretrained representations of our world, which should facilitate efficient downstream learning and generalization. By training 240 representations and over 10,000 reinforcement learning (RL) policies on a simulated robotic setup, we evaluate to what extent different properties of pretrained VAE-based representations affect the OOD generalization of downstream agents. We observe that many agents are surprisingly robust to realistic distribution shifts, including the challenging sim-to-real case. In addition, we find that the generalization performance of a simple downstream proxy task reliably predicts the generalization performance of our RL agents under a wide range of OOD settings. Such proxy tasks can thus be used to select pretrained representations that will lead to agents that generalize.

IROS Conference 2022 Conference Paper

Transferring Dexterous Manipulation from GPU Simulation to a Remote Real-World TriFinger

  • Arthur Allshire
  • Mayank Mittal
  • Varun Lodaya
  • Viktor Makoviychuk
  • Denys Makoviichuk
  • Felix Widmaier
  • Manuel Wüthrich
  • Stefan Bauer

In-hand manipulation of objects is an important capability to enable robots to carry-out tasks which demand high levels of dexterity. This work presents a robot systems approach to learning dexterous manipulation tasks involving moving objects to arbitrary 6-DoF poses. We show empirical benefits, both in simulation and sim - to- real transfer, of using keypoint-based representations for object pose in policy observations and reward calculation to train a model-free reinforcement learning agent. By utilizing domain randomization strategies and large-scale training, we achieve a high success rate of 83 % on a real TriFinger system, with a single policy able to perform grasping, ungrasping, and finger gaiting in order to achieve arbitrary poses within the workspace. We demonstrate that our policy can generalise to unseen objects, and success rates can be further improved through finetuning. With the aim of assisting further research in learning in-hand manipulation, we provide a detailed exposition of our system and make the codebase of our system available, along with checkpoints trained on billions of steps of experience, at https://s2r2-ig.github.io

ICLR Conference 2021 Conference Paper

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

  • Ossama Ahmed
  • Frederik Träuble
  • Anirudh Goyal
  • Alexander Neitz
  • Manuel Wüthrich
  • Yoshua Bengio
  • Bernhard Schölkopf
  • Stefan Bauer

Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we proposeCausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.

ICML Conference 2021 Conference Paper

Function Contrastive Learning of Transferable Meta-Representations

  • Muhammad Waleed Gondal
  • Shruti Joshi
  • Nasim Rahaman
  • Stefan Bauer
  • Manuel Wüthrich
  • Bernhard Schölkopf

Meta-learning algorithms adapt quickly to new tasks that are drawn from the same task distribution as the training tasks. The mechanism leading to fast adaptation is the conditioning of a downstream predictive model on the inferred representation of the task’s underlying data generative process, or \emph{function}. This \emph{meta-representation}, which is computed from a few observed examples of the underlying function, is learned jointly with the predictive model. In this work, we study the implications of this joint training on the transferability of the meta-representations. Our goal is to learn meta-representations that are robust to noise in the data and facilitate solving a wide range of downstream tasks that share the same underlying functions. To this end, we propose a decoupled encoder-decoder approach to supervised meta-learning, where the encoder is trained with a contrastive objective to find a good representation of the underlying function. In particular, our training scheme is driven by the self-supervision signal indicating whether two sets of examples stem from the same function. Our experiments on a number of synthetic and real-world datasets show that the representations we obtain outperform strong baselines in terms of downstream performance and noise robustness, even when these baselines are trained in an end-to-end manner.

ICML Conference 2021 Conference Paper

On Disentangled Representations Learned from Correlated Data

  • Frederik Träuble
  • Elliot Creager
  • Niki Kilbertus
  • Francesco Locatello
  • Andrea Dittadi
  • Anirudh Goyal
  • Bernhard Schölkopf
  • Stefan Bauer

The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.

ICLR Conference 2021 Conference Paper

On the Transfer of Disentangled Representations in Realistic Settings

  • Andrea Dittadi
  • Frederik Träuble
  • Francesco Locatello
  • Manuel Wüthrich
  • Vaibhav Agrawal
  • Ole Winther
  • Stefan Bauer
  • Bernhard Schölkopf

Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations, a complex underlying structure, and allows to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution. We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance.

ICLR Conference 2021 Conference Paper

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling

  • Ðorðe Miladinovic
  • Aleksandar Stanic
  • Stefan Bauer
  • Jürgen Schmidhuber
  • Joachim M. Buhmann

How to improve generative modeling by better exploiting spatial regularities and coherence in images? We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs). In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way, using a sequential gating-based mechanism that distributes contextual information across 2-D space. We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation over baseline convolutional architectures and the state-of-the-art among the models within the same class. Furthermore, we demonstrate that SDN can be applied to large images by synthesizing samples of high quality and coherence. In a vanilla VAE setting, we find that a powerful SDN decoder also improves learning disentangled representations, indicating that neural architectures play an important role in this task. Our results suggest favoring spatial dependency over convolutional layers in various VAE settings. The accompanying source code is given at https://github.com/djordjemila/sdn.

ICLR Conference 2021 Conference Paper

Spatially Structured Recurrent Modules

  • Nasim Rahaman
  • Anirudh Goyal
  • Muhammad Waleed Gondal
  • Manuel Wüthrich
  • Stefan Bauer
  • Yash Sharma 0001
  • Yoshua Bengio
  • Bernhard Schölkopf

Capturing the structure of a data-generating process by means of appropriate inductive biases can help in learning models that generalise well and are robust to changes in the input distribution. While methods that harness spatial and temporal structures find broad application, recent work has demonstrated the potential of models that leverage sparse and modular structure using an ensemble of sparingly interacting modules. In this work, we take a step towards dynamic models that are capable of simultaneously exploiting both modular and spatiotemporal structures. To this end, we model the dynamical system as a collection of autonomous but sparsely interacting sub-systems that interact according to a learned topology which is informed by the spatial structure of the underlying system. This gives rise to a class of models that are well suited for capturing the dynamics of systems that only offer local views into their state, along with corresponding spatial locations of those views. On the tasks of video prediction from cropped frames and multi-agent world modelling from partial observations in the challenging Starcraft2 domain, we find our models to be more robust to the number of available views and better capable of generalisation to novel tasks without additional training than strong baselines that perform equally well or better on the training distribution.

NeurIPS Conference 2021 Conference Paper

Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning

  • Nan Rosemary Ke
  • Aniket Didolkar
  • Sarthak Mittal
  • Anirudh Goyal ALIAS PARTH GOYAL
  • Guillaume Lajoie
  • Stefan Bauer
  • Danilo Jimenez Rezende
  • Michael Mozer

Inducing causal relationships from observations is a classic problem in machine learning. Most work in causality starts from the premise that the causal variables themselves are observed. However, for AI agents such as robots trying to make sense of their environment, the only observables are low-level variables like pixels in images. To generalize well, an agent must induce high-level variables, particularly those which are causal or are affected by causal variables. A central goal for AI and causality is thus the joint discovery of abstract representations and causal structure. However, we note that existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs which are impossible to manipulate parametrically (e. g. , number of nodes, sparsity, causal chain length, etc. ). In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them. In order to systematically probe the ability of methods to identify these variables and structures, we design a suite of benchmarking RL environments. We evaluate various representation learning algorithms from the literature and find that explicitly incorporating structure and modularity in models can help causal induction in model-based reinforcement learning.

AAAI Conference 2020 Conference Paper

A Commentary on the Unsupervised Learning of Disentangled Representations

  • Francesco Locatello
  • Stefan Bauer
  • Mario Lucic
  • Gunnar Rätsch
  • Sylvain Gelly
  • Bernhard Schölkopf
  • Olivier Bachem

The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of (Locatello et al. 2019b) and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases and the practical challenges it entails. Finally, we comment on our experimental findings, highlighting the limitations of state-of-the-art approaches and directions for future research.

ICRA Conference 2020 Conference Paper

A Real-Robot Dataset for Assessing Transferability of Learned Dynamics Models

  • Diego Agudelo-España
  • Andrii Zadaianchuk
  • Philippe Wenk
  • Aditya Garg
  • Joel Akpo
  • Felix Grimminger
  • Julian Viereck
  • Maximilien Naveau

In the context of model-based reinforcement learning and control, a large number of methods for learning system dynamics have been proposed in recent years. The purpose of these learned models is to synthesize new control policies. An important open question is how robust current dynamics-learning methods are to shifts in the data distribution due to changes in the control policy. We present a real-robot dataset which allows to systematically investigate this question. This dataset contains trajectories of a 3 degrees-of-freedom (DOF) robot being controlled by a diverse set of policies. For comparison, we also provide a simulated version of the dataset. Finally, we benchmark a few widely-used dynamics-learning methods using the proposed dataset. Our results show that the iid test error of a learned model is not necessarily a good indicator of its accuracy under control policies different from the one which generated the training data. This suggests that it may be important to evaluate dynamics-learning methods in terms of their transfer performance, rather than only their iid error.

JMLR Journal 2020 Journal Article

A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation

  • Francesco Locatello
  • Stefan Bauer
  • Mario Lucic
  • Gunnar Raetsch
  • Sylvain Gelly
  • Bernhard Schölkopf
  • Olivier Bachem

The idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train over $14000$ models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on eight data sets. We observe that while the different methods successfully enforce properties “encouraged” by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, different evaluation metrics do not always agree on what should be considered “disentangled” and exhibit systematic differences in the estimation. Finally, increased disentanglement does not seem to necessarily lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2020. ( edit, beta )

UAI Conference 2020 Conference Paper

Bayesian Online Prediction of Change Points

  • Diego Agudelo-España
  • Sebastián Gómez-González
  • Stefan Bauer
  • Bernhard Schölkopf
  • Jan Peters 0001

Online detection of instantaneous changes in the generative process of a data sequence generally focuses on retrospective inference of such change points without considering their future occurrences. We extend the Bayesian Online Change Point Detection algorithm to also infer the number of time steps until the next change point (i. e. , the residual time). This enables to handle observation models which depend on the total segment duration, which is useful to model data sequences with temporal scaling. The resulting inference algorithm for segment detection can be deployed in an online fashion, and we illustrate applications to synthetic and to two medical real-world data sets.

ICLR Conference 2020 Conference Paper

Disentangling Factors of Variations Using Few Labels

  • Francesco Locatello
  • Michael Tschannen
  • Stefan Bauer
  • Gunnar Rätsch
  • Bernhard Schölkopf
  • Olivier Bachem

Learning disentangled representations is considered a cornerstone problem in representation learning. Recently, Locatello et al. (2019) demonstrated that unsupervised disentanglement learning without inductive biases is theoretically impossible and that existing inductive biases and unsupervised methods do not allow to consistently learn disentangled representations. However, in many practical settings, one might have access to a limited amount of supervision, for example through manual labeling of (some) factors of variation in a few training examples. In this paper, we investigate the impact of such supervision on state-of-the-art disentanglement methods and perform a large scale study, training over 52000 models under well-defined and reproducible experimental conditions. We observe that a small number of labeled examples (0.01--0.5% of the data set), with potentially imprecise and incomplete labels, is sufficient to perform model selection on state-of-the-art unsupervised models. Further, we investigate the benefit of incorporating supervision into the training process. Overall, we empirically validate that with little and imprecise supervision it is possible to reliably learn disentangled representations.

AAAI Conference 2020 Conference Paper

Learning Counterfactual Representations for Estimating Individual Dose-Response Curves

  • Patrick Schwab
  • Lorenz Linhardt
  • Stefan Bauer
  • Joachim M. Buhmann
  • Walter Karlen

Estimating what would be an individual’s potential response to varying levels of exposure to a treatment is of high practical relevance for several important fields, such as healthcare, economics and public policy. However, existing methods for learning to estimate counterfactual outcomes from observational data are either focused on estimating average doseresponse curves, or limited to settings with only two treatments that do not have an associated dosage parameter. Here, we present a novel machine-learning approach towards learning counterfactual representations for estimating individual dose-response curves for any number of treatments with continuous dosage parameters with neural networks. Building on the established potential outcomes framework, we introduce performance metrics, model selection criteria, model architectures, and open benchmarks for estimating individual dose-response curves. Our experiments show that the methods developed in this work set a new state-of-the-art in estimating individual dose-response.

AAAI Conference 2020 Conference Paper

ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems

  • Philippe Wenk
  • Gabriele Abbati
  • Michael A. Osborne
  • Bernhard Schölkopf
  • Andreas Krause
  • Stefan Bauer

Parameter inference in ordinary differential equations is an important problem in many applied sciences and in engineering, especially in a data-scarce setting. In this work, we introduce a novel generative modeling approach based on constrained Gaussian processes and leverage it to build a computationally and data efficient algorithm for state and parameter inference. In an extensive set of experiments, our approach outperforms the current state of the art for parameter inference both in terms of accuracy and computational cost. It also shows promising results for the much more challenging problem of model selection.

ICML Conference 2019 Conference Paper

AReS and MaRS Adversarial and MMD-Minimizing Regression for SDEs

  • Gabriele Abbati
  • Philippe Wenk
  • Michael A. Osborne
  • Andreas Krause 0001
  • Bernhard Schölkopf
  • Stefan Bauer

Stochastic differential equations are an important modeling class in many disciplines. Consequently, there exist many methods relying on various discretization and numerical integration schemes. In this paper, we propose a novel, probabilistic model for estimating the drift and diffusion given noisy observations of the underlying stochastic system. Using state-of-the-art adversarial and moment matching inference techniques, we avoid the discretization schemes of classical approaches. This leads to significant improvements in parameter accuracy and robustness given random initial guesses. On four established benchmark systems, we compare the performance of our algorithms to state-of-the-art solutions based on extended Kalman filtering and Gaussian processes.

ICML Conference 2019 Conference Paper

Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

  • Francesco Locatello
  • Stefan Bauer
  • Mario Lucic
  • Gunnar Rätsch
  • Sylvain Gelly
  • Bernhard Schölkopf
  • Olivier Bachem

The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than $12000$ models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties “encouraged” by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.

NeurIPS Conference 2019 Conference Paper

On the Fairness of Disentangled Representations

  • Francesco Locatello
  • Gabriele Abbati
  • Thomas Rainforth
  • Stefan Bauer
  • Bernhard Schölkopf
  • Olivier Bachem

Recently there has been a significant interest in learning disentangled representations, as they promise increased interpretability, generalization to unseen scenarios and faster learning on downstream tasks. In this paper, we investigate the usefulness of different notions of disentanglement for improving the fairness of downstream prediction tasks based on representations. We consider the setting where the goal is to predict a target variable based on the learned representation of high-dimensional observations (such as images) that depend on both the target variable and an unobserved sensitive variable. We show that in this setting both the optimal and empirical predictions can be unfair, even if the target variable and the sensitive variable are independent. Analyzing the representations of more than 12600 trained state-of-the-art disentangled models, we observe that several disentanglement scores are consistently correlated with increased fairness, suggesting that disentanglement may be a useful property to encourage fairness when sensitive variables are not observed.

NeurIPS Conference 2019 Conference Paper

On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset

  • Muhammad Waleed Gondal
  • Manuel Wuthrich
  • Djordje Miladinovic
  • Francesco Locatello
  • Martin Breidt
  • Valentin Volchkov
  • Joel Akpo
  • Olivier Bachem

Learning meaningful and compact representations with disentangled semantic aspects is considered to be of key importance in representation learning. Since real-world data is notoriously costly to collect, many recent state-of-the-art disentanglement models have heavily relied on synthetic toy data-sets. In this paper, we propose a novel data-set which consists of over 1 million images of physical 3D objects with seven factors of variation, such as object color, shape, size and position. In order to be able to control all the factors of variation precisely, we built an experimental platform where the objects are being moved by a robotic arm. In addition, we provide two more datasets which consist of simulations of the experimental setup. These datasets provide for the first time the possibility to systematically investigate how well different disentanglement methods perform on real data in comparison to simulation, and how simulated data can be leveraged to build better representations of the real world. We provide a first experimental study of these questions and our results indicate that learned models transfer poorly, but that model and hyperparameter selection is an effective means of transferring information to the real world.

ICML Conference 2019 Conference Paper

Robustly Disentangled Causal Mechanisms: Validating Deep Representations for Interventional Robustness

  • Raphael Suter
  • Ðorðe Miladinovic
  • Bernhard Schölkopf
  • Stefan Bauer

The ability to learn disentangled representations that split underlying sources of variation in high dimensional, unstructured data is important for data efficient and robust use of neural networks. While various approaches aiming towards this goal have been proposed in recent times, a commonly accepted definition and validation procedure is missing. We provide a causal perspective on representation learning which covers disentanglement and domain shift robustness as special cases. Our causal framework allows us to introduce a new metric for the quantitative evaluation of deep latent variable models. We show how this metric can be estimated from labeled observational data and further provide an efficient estimation algorithm that scales linearly in the dataset size.

NeurIPS Conference 2018 Conference Paper

Adaptive Skip Intervals: Temporal Abstraction for Recurrent Dynamical Models

  • Alexander Neitz
  • Giambattista Parascandolo
  • Stefan Bauer
  • Bernhard Schölkopf

We introduce a method which enables a recurrent dynamics model to be temporally abstract. Our approach, which we call Adaptive Skip Intervals (ASI), is based on the observation that in many sequential prediction tasks, the exact time at which events occur is irrelevant to the underlying objective. Moreover, in many situations, there exist prediction intervals which result in particularly easy-to-predict transitions. We show that there are prediction tasks for which we gain both computational efficiency and prediction accuracy by allowing the model to make predictions at a sampling rate which it can choose itself.

NeurIPS Conference 2017 Conference Paper

Efficient and Flexible Inference for Stochastic Systems

  • Stefan Bauer
  • Nico Gorbach
  • Djordje Miladinovic
  • Joachim Buhmann

Many real world dynamical systems are described by stochastic differential equations. Thus parameter inference is a challenging and important problem in many disciplines. We provide a grid free and flexible algorithm offering parameter and state inference for stochastic systems and compare our approch based on variational approximations to state of the art methods showing significant advantages both in runtime and accuracy.

NeurIPS Conference 2017 Conference Paper

Scalable Variational Inference for Dynamical Systems

  • Nico Gorbach
  • Stefan Bauer
  • Joachim Buhmann

Gradient matching is a promising tool for learning parameters and state dynamics of ordinary differential equations. It is a grid free inference approach, which, for fully observable systems is at times competitive with numerical integration. However, for many real-world applications, only sparse observations are available or even unobserved variables are included in the model description. In these cases most gradient matching methods are difficult to apply or simply do not provide satisfactory results. That is why, despite the high computational cost, numerical integration is still the gold standard in many applications. Using an existing gradient matching approach, we propose a scalable variational inference framework which can infer states and parameters simultaneously, offers computational speedups, improved accuracy and works well even under model misspecifications in a partially observable system.

ICML Conference 2016 Conference Paper

The Arrow of Time in Multivariate Time Series

  • Stefan Bauer
  • Bernhard Schölkopf
  • Jonas Peters

We prove that a time series satisfying a (linear) multivariate autoregressive moving average (VARMA) model satisfies the same model assumption in the reversed time direction, too, if all innovations are normally distributed. This reversibility breaks down if the innovations are non-Gaussian. This means that under the assumption of a VARMA process with non-Gaussian noise, the arrow of time becomes detectable. Our work thereby provides a theoretic justification of an algorithm that has been used for inferring the direction of video snippets. We present a slightly modified practical algorithm that estimates the time direction for a given sample and prove its consistency. We further investigate how the performance of the algorithm depends on sample size, number of dimensions of the time series and the order of the process. An application to real world data from economics shows that considering multivariate processes instead of univariate processes can be beneficial for estimating the time direction. Our result extends earlier work on univariate time series. It relates to the concept of causal inference, where recent methods exploit non-Gaussianity of the error terms for causal structure learning.