Arrow Research search

Author name cluster

Emilie Devijver

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

UAI Conference 2025 Conference Paper

Complete Characterization for Adjustment in Summary Causal Graphs of Time Series

  • Clément Yvernes
  • Emilie Devijver
  • Éric Gaussier

The identifiability problem for interventions aims at assessing whether the total causal effect can be written with a do-free formula, and thus be estimated from observational data only. We study this problem, considering multiple interventions, in the context of time series when only an abstraction of the true causal graph, in the form of a summary causal graph, is available. We propose in particular both necessary and sufficient conditions for the adjustment criterion, which we show is complete in this setting, and provide a pseudo-linear algorithm to decide whether the query is identifiable or not.
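
For orientation, the simplest instance of a do-free formula is the standard adjustment formula shown below. Here Z stands for a hypothetical valid adjustment set for a single intervention on X with outcome Y. This is the textbook form only, not the paper's criterion for summary causal graphs or for multiple interventions.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
```

Every term on the right-hand side is an observational quantity, which is what makes the effect estimable without performing the intervention.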

NeurIPS Conference 2025 Conference Paper

Relaxing partition admissibility in Cluster-DAGs: a causal calculus with arbitrary variable clustering

  • Clément Yvernes
  • Emilie Devijver
  • Adèle Ribeiro
  • Marianne Clausel
  • Eric Gaussier

Cluster DAGs (C-DAGs) provide an abstraction of causal graphs in which nodes represent clusters of variables, and edges encode both cluster-level causal relationships and dependencies arising from unobserved confounding. C-DAGs define an equivalence class of acyclic causal graphs that agree on cluster-level relationships, enabling causal reasoning at a higher level of abstraction. However, when the chosen clustering induces cycles in the resulting C-DAG, the partition is deemed inadmissible under conventional C-DAG semantics. In this work, we extend the C-DAG framework to support arbitrary variable clusterings by relaxing the partition admissibility constraint, thereby allowing cyclic C-DAG representations. We extend the notions of d-separation and causal calculus to this setting, significantly broadening the scope of causal reasoning across clusters and enabling the application of C-DAGs in previously intractable scenarios. Our calculus is both sound and atomically complete with respect to the do-calculus: all valid interventional queries at the cluster level can be derived using our rules, each corresponding to a primitive do-calculus step.

TMLR Journal 2024 Journal Article

Causal Discovery from Time Series with Hybrids of Constraint-Based and Noise-Based Algorithms

  • Daria Bystrova
  • Charles K. Assaad
  • Julyan Arbel
  • Emilie Devijver
  • Eric Gaussier
  • Wilfried Thuiller

Constraint-based methods and noise-based methods are two distinct families of methods proposed for uncovering causal graphs from observational data. However, both operate under strong assumptions that may be challenging to validate or could be violated in real-world scenarios. In response to these challenges, there is a growing interest in hybrid methods that amalgamate principles from both methods, showing robustness to assumption violations. This paper introduces a novel comprehensive framework for hybridizing constraint-based and noise-based methods designed to uncover causal graphs from observational time series. The framework is structured into two classes. The first class employs a noise-based strategy to identify a super graph, containing the true graph, followed by a constraint-based strategy to eliminate unnecessary edges. In the second class, a constraint-based strategy is applied to identify a skeleton, which is then oriented using a noise-based strategy. The paper provides theoretical guarantees for each class under the condition that all assumptions are satisfied, and it outlines some properties when assumptions are violated. To validate the efficacy of the framework, two algorithms from each class are experimentally tested on simulated data, realistic ecological data, and real datasets sourced from diverse applications. Notably, two novel datasets related to Information Technology monitoring are introduced within the set of considered real datasets. The experimental results underscore the robustness and effectiveness of the hybrid approaches across a broad spectrum of datasets.

ECAI Conference 2024 Conference Paper

Efficient Initial Data Selection and Labeling for Multi-Class Classification Using Topological Analysis

  • Lies Hadjadj
  • Emilie Devijver
  • Rémi Molinier
  • Massih-Reza Amini

Machine learning techniques often require large labeled training sets to attain optimal performance. However, acquiring labeled data can pose challenges in practical scenarios. Pool-based active learning methods aim to select the most relevant data points for training from a pool of unlabeled data. Nonetheless, these methods heavily rely on the initial labeled dataset, often chosen randomly. In our study, we introduce a novel approach specifically tailored for multi-class classification tasks, utilizing Proper Topological Regions (PTR) derived from topological data analysis (TDA) to efficiently identify the initial set of points for labeling. Through experiments on various benchmark datasets, we demonstrate the efficacy of our method and its competitive performance compared to traditional approaches, as measured by average balanced classification accuracy.

UAI Conference 2024 Conference Paper

Identifiability of total effects from abstractions of time series causal graphs

  • Charles K. Assaad
  • Emilie Devijver
  • Éric Gaussier
  • Gregor Goessler
  • Anouar Meynaoui

We study the problem of identifiability of the total effect of an intervention from observational time series only given an abstraction of the causal graph of the system. Specifically, we consider two types of abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations; and the summary causal graph, which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and we provide necessary and sufficient graphical conditions for identifiability in summary causal graphs. Furthermore, we provide adjustment sets that allow the total effect to be estimated whenever it is identifiable.

JMLR Journal 2024 Journal Article

Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data

  • Vasilii Feofanov
  • Emilie Devijver
  • Massih-Reza Amini

In this paper, we propose a probabilistic framework for analyzing a multi-class majority vote classifier in the case where training data is partially labeled. First, we derive a multi-class transductive bound over the risk of the majority vote classifier, which is based on the classifier's vote distribution over each class. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier in the case of the pseudo-labeled training data. We derive a generalization bound over the majority vote error when imperfect labels are given, taking into account the mean and the variance of the prediction margin. Finally, we demonstrate an application of the derived transductive bound for self-training to find automatically the confidence threshold used to determine unlabeled examples for pseudo-labeling. Empirical results on different data sets show the effectiveness of our framework compared to several state-of-the-art semi-supervised approaches.

IJCAI Conference 2023 Conference Paper

Survey and Evaluation of Causal Discovery Methods for Time Series (Extended Abstract)

  • Charles K. Assaad
  • Emilie Devijver
  • Eric Gaussier

We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several representative methods to illustrate the behaviour of different families of approaches. This illustration is conducted on both artificial and real datasets, with different characteristics. The main conclusions one can draw from this survey are that causal discovery in time series is an active research field in which new methods (in every family of approaches) are regularly proposed, and that no family or method stands out in all situations. Indeed, they all rely on assumptions that may or may not be appropriate for a particular dataset.

UAI Conference 2022 Conference Paper

Discovery of extended summary graphs in time series

  • Charles K. Assaad
  • Emilie Devijver
  • Éric Gaussier

This study addresses the problem of learning an extended summary causal graph from time series. The algorithms we propose fit within the well-known constraint-based framework for causal discovery and make use of information-theoretic measures to determine (in)dependencies between time series. We first introduce generalizations of the causation entropy measure to any lagged or instantaneous relations, prior to using this measure to construct extended summary causal graphs by adapting two well-known algorithms, namely PC and FCI. The behaviour of our method is illustrated through several experiments.

JAIR Journal 2022 Journal Article

Survey and Evaluation of Causal Discovery Methods for Time Series

  • Charles K. Assaad
  • Emilie Devijver
  • Eric Gaussier

We introduce in this survey the major concepts, models, and algorithms proposed so far to infer causal relations from observational time series, a task usually referred to as causal discovery in time series. To do so, after a description of the underlying concepts and modelling assumptions, we present different methods according to the family of approaches they belong to: Granger causality, constraint-based approaches, noise-based approaches, score-based approaches, logic-based approaches, topology-based approaches, and difference-based approaches. We then evaluate several representative methods to illustrate the behaviour of different families of approaches. This illustration is conducted on both artificial and real datasets, with different characteristics. The main conclusions one can draw from this survey are that causal discovery in time series is an active research field in which new methods (in every family of approaches) are regularly proposed, and that no family or method stands out in all situations. Indeed, they all rely on assumptions that may or may not be appropriate for a particular dataset.

JMLR Journal 2020 Journal Article

Prediction regions through Inverse Regression

  • Emilie Devijver
  • Emeline Perthame

Predicting a new response from a covariate is a challenging task in regression, which raises new questions in the era of high-dimensional data. In this paper, we are interested in the inverse regression method from a theoretical viewpoint. Theoretical results for the Gaussian linear model are well known, but the curse of dimensionality has increased the interest of practitioners and theoreticians in generalizing those results to various estimators calibrated for the high-dimensional context. We propose to focus on inverse regression, which is known to be a reliable and efficient approach when the number of features exceeds the number of observations. Indeed, under some conditions, dealing with the inverse regression problem associated with a forward regression problem drastically reduces the number of parameters to estimate, makes the problem tractable, and allows more general distributions, such as elliptical distributions, to be considered. When both the responses and the covariates are multivariate, estimators constructed by inverse regression are studied in this paper, the main result being explicit asymptotic prediction regions for the response. The performance of the proposed estimators and prediction regions is also analyzed through a simulation study and compared with usual estimators.

NeurIPS Conference 2020 Conference Paper

Smooth And Consistent Probabilistic Regression Trees

  • Sami Alkhoury
  • Emilie Devijver
  • Marianne Clausel
  • Myriam Tami
  • Eric Gaussier
  • Georges Oppenheim

We propose here a generalization of regression trees, referred to as Probabilistic Regression (PR) trees, that adapt to the smoothness of the prediction function relating input and output variables while preserving the interpretability of the prediction and being robust to noise. In PR trees, an observation is associated to all regions of a tree through a probability distribution that reflects how far the observation is from a region. We show that such trees are consistent, meaning that their error tends to 0 when the sample size tends to infinity, a property that has not been established for similar, previous proposals such as Soft trees and Smooth Transition Regression trees. We further explain how PR trees can be used in different ensemble methods, namely Random Forests and Gradient Boosted Trees. Lastly, we assess their performance through extensive experiments that illustrate their benefits in terms of performance, interpretability and robustness to noise.
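
The soft assignment of an observation to all regions can be illustrated with a minimal sketch. It assumes a one-dimensional input, interval regions defined by sorted split points, and Gaussian noise of standard deviation `sigma` around the observation. The function names, the Gaussian choice, and the fixed leaf values are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def normal_cdf(t: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def region_probabilities(x: float, edges: Sequence[float], sigma: float) -> List[float]:
    """Probability that x plus N(0, sigma^2) noise falls in each interval.

    `edges` are sorted split points (hypothetical); the regions are
    (-inf, e0], (e0, e1], ..., (e_last, +inf).
    """
    bounds = [-math.inf, *edges, math.inf]
    return [
        normal_cdf((hi - x) / sigma) - normal_cdf((lo - x) / sigma)
        for lo, hi in zip(bounds, bounds[1:])
    ]


def soft_predict(x: float, edges: Sequence[float],
                 leaf_values: Sequence[float], sigma: float) -> float:
    """Prediction as the probability-weighted average of all leaf values."""
    probs = region_probabilities(x, edges, sigma)
    return sum(p * v for p, v in zip(probs, leaf_values))
```

With a single split at 0 and leaf values 1 and 3, a point exactly on the split receives the average of both leaves, while a point far from the split is predicted almost entirely by its own leaf. As `sigma` shrinks, the sketch approaches an ordinary hard-split regression tree.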

AAAI Conference 2019 Conference Paper

Transductive Bounds for the Multi-Class Majority Vote Classifier

  • Vasilii Feofanov
  • Emilie Devijver
  • Massih-Reza Amini

In this paper, we propose a transductive bound over the risk of the majority vote classifier learned with partially labeled data for the multi-class classification. The bound is obtained by considering the class confusion matrix as an error indicator and it involves the margin distribution of the classifier over each class and a bound over the risk of the associated Gibbs classifier. When this latter bound is tight and the errors of the majority vote classifier per class are concentrated on a low margin zone, we prove that the bound over the Bayes classifier's risk is tight. As an application, we extend the self-learning algorithm to the multi-class case. The algorithm iteratively assigns pseudo-labels to a subset of unlabeled training examples that have their associated class margin above a threshold obtained from the proposed transductive bound. Empirical results on different data sets show the effectiveness of our approach compared to the same algorithm where the threshold is fixed manually, to the extension of TSVM to multi-class classification and to a graph-based semi-supervised algorithm.
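
One selection step of such a self-learning loop might look like the sketch below. It assumes the per-class thresholds are already given, whereas the paper obtains them from the transductive bound, and it uses the vote for the predicted class as a simple stand-in for the class margin. The function name and inputs are hypothetical.

```python
from typing import Dict, Sequence


def select_pseudo_labels(votes: Sequence[Sequence[float]],
                         thresholds: Sequence[float]) -> Dict[int, int]:
    """Return {example index: pseudo-label} for confidently voted examples.

    votes[i][k] is the (hypothetical) vote share of class k for unlabeled
    example i; thresholds[k] is an assumed, externally supplied threshold
    for class k.
    """
    selected: Dict[int, int] = {}
    for i, v in enumerate(votes):
        # Predicted class is the one with the largest vote.
        k = max(range(len(v)), key=v.__getitem__)
        # Keep the example only if its vote clears that class's threshold.
        if v[k] >= thresholds[k]:
            selected[i] = k
    return selected
```

In a full loop, the selected examples would be moved to the labeled set, the classifier retrained, and the thresholds recomputed before the next pass.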