Author name cluster

Patrick Gallinari

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

43 papers
2 author rows

Possible papers (43)

NeurIPS 2025 · Conference Paper

ENMA: Tokenwise Autoregression for Continuous Neural PDE Operators

  • Armand Kassaï Koupaï
  • Lise Le Boudec
  • Louis Serrano
  • Patrick Gallinari

Solving time-dependent parametric partial differential equations (PDEs) remains a fundamental challenge for neural solvers, particularly when generalizing across a wide range of physical parameters and dynamics. When data is uncertain or incomplete—as is often the case—a natural approach is to turn to generative models. We introduce ENMA, a generative neural operator designed to model spatio-temporal dynamics arising from physical phenomena. ENMA predicts future dynamics in a compressed latent space using a generative masked autoregressive transformer trained with a flow matching loss, enabling tokenwise generation. Irregularly sampled spatial observations are encoded into uniform latent representations via attention mechanisms and further compressed through a spatio-temporal convolutional encoder. This allows ENMA to perform in-context learning at inference time by conditioning on either past states of the target trajectory or auxiliary context trajectories with similar dynamics. The result is a robust and adaptable framework that generalizes to new PDE regimes and supports one-shot surrogate modeling of time-dependent parametric PDEs.
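
A minimal sketch of the tokenwise flow matching objective described above, assuming PyTorch and a linear interpolant; the module names, shapes, and conditioning summary are illustrative assumptions, not ENMA's actual implementation:

```python
import torch
import torch.nn as nn

class VelocityHead(nn.Module):
    """Hypothetical head predicting the flow-matching velocity for one
    latent token, conditioned on a summary of the visible tokens."""
    def __init__(self, d_latent, d_cond, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_latent + d_cond + 1, hidden), nn.GELU(),
            nn.Linear(hidden, d_latent))

    def forward(self, z_t, cond, t):
        return self.net(torch.cat([z_t, cond, t], dim=-1))

def flow_matching_loss(head, z1, cond):
    """z1: target latent tokens (B, D); cond: context summary (B, C)."""
    z0 = torch.randn_like(z1)          # noise endpoint of the flow
    t = torch.rand(z1.shape[0], 1)     # interpolation time in [0, 1]
    zt = (1 - t) * z0 + t * z1         # linear interpolant
    v_target = z1 - z0                 # its constant velocity
    return ((head(zt, cond, t) - v_target) ** 2).mean()
```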

ICLR 2025 · Conference Paper

Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods

  • Lise Le Boudec
  • Emmanuel de Bézenac
  • Louis Serrano
  • Ramon Daniel Regueiro-Espino
  • Yuan Yin
  • Patrick Gallinari

Physics-informed deep learning often faces optimization challenges due to the complexity of solving partial differential equations (PDEs), which involve exploring large solution spaces, require numerous iterations, and can lead to unstable training. These challenges arise particularly from the ill-conditioning of the optimization problem, caused by the differential terms in the loss function. To address these issues, we propose learning a solver, i.e., solving PDEs using a physics-informed iterative algorithm trained on data. Our method learns to condition a gradient descent algorithm that automatically adapts to each PDE instance, significantly accelerating and stabilizing the optimization process and enabling faster convergence of physics-aware models. Furthermore, while traditional physics-informed methods solve for a single PDE instance, our approach addresses parametric PDEs. Specifically, our method integrates the physical loss gradient with the PDE parameters to solve over a distribution of PDE parameters, including coefficients, initial conditions, or boundary conditions. We demonstrate the effectiveness of our method through empirical experiments on multiple datasets, comparing training and test-time optimization performance.
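
The idea of learning to condition a gradient descent on the physics loss can be sketched as below; the toy residual and all module names are assumptions made for illustration:

```python
import torch
import torch.nn as nn

def physics_residual(u, params):
    # Placeholder residual for an ODE u' = a*u; stands in for the PDE loss.
    return (u.diff(dim=-1) - params * u[..., :-1]).pow(2).mean()

class LearnedSolver(nn.Module):
    """Hypothetical learned iterative solver: a network conditions the
    physics-loss gradient before each descent step."""
    def __init__(self, n, hidden=128):
        super().__init__()
        # Maps (gradient, current iterate, PDE parameter) to an update.
        self.step = nn.Sequential(
            nn.Linear(2 * n + 1, hidden), nn.GELU(), nn.Linear(hidden, n))

    def forward(self, u0, params, n_iters=10):
        # u0: (B, n) initial guess; params: (B, 1) PDE parameter.
        u = u0.requires_grad_()
        for _ in range(n_iters):
            g, = torch.autograd.grad(physics_residual(u, params), u,
                                     create_graph=True)
            u = u - self.step(torch.cat([g, u, params], dim=-1))
        return u
```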

ICLR 2025 · Conference Paper

SCOPE: A Self-supervised Framework for Improving Faithfulness in Conditional Text Generation

  • Song Duong
  • Florian Le Bronnec
  • Alexandre Allauzen
  • Vincent Guigue
  • Alberto Lumbreras
  • Laure Soulier
  • Patrick Gallinari

Large Language Models (LLMs), when used for conditional text generation, often produce hallucinations, i.e., information that is unfaithful or not grounded in the input context. This issue arises in typical conditional text generation tasks, such as text summarization and data-to-text generation, where the goal is to produce fluent text based on contextual input. When fine-tuned on specific domains, LLMs struggle to provide faithful answers to a given context, often adding information or generating errors. One underlying cause of this issue is that LLMs rely on statistical patterns learned from their training data. This reliance can interfere with the model's ability to stay faithful to a provided context, leading to the generation of ungrounded information. We build upon this observation and introduce a novel self-supervised method for generating a training set of unfaithful samples. We then refine the model using a training process that encourages the generation of grounded outputs over unfaithful ones, drawing on preference-based training. Our approach leads to significantly more grounded text generation, outperforming existing self-supervised techniques in faithfulness, as evaluated through automatic metrics, LLM-based assessments, and human evaluations.
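
One concrete way to implement the preference-based refinement step is a DPO-style objective over (faithful, unfaithful) pairs; this is a generic sketch, not SCOPE's published objective:

```python
import torch
import torch.nn.functional as F

def preference_loss(logp_faithful, logp_unfaithful,
                    ref_logp_faithful, ref_logp_unfaithful, beta=0.1):
    """DPO-style loss: push the model to prefer grounded outputs over the
    self-generated unfaithful samples, relative to a frozen reference.
    Inputs are summed token log-probabilities of each completion."""
    margin = beta * ((logp_faithful - ref_logp_faithful)
                     - (logp_unfaithful - ref_logp_unfaithful))
    return -F.logsigmoid(margin).mean()
```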

ICML 2025 · Conference Paper

Zebra: In-Context Generative Pretraining for Solving Parametric PDEs

  • Louis Serrano
  • Armand Kassaï Koupaï
  • Thomas X. Wang
  • Pierre Erbacher
  • Patrick Gallinari

Solving time-dependent parametric partial differential equations (PDEs) is challenging for data-driven methods, as these models must adapt to variations in parameters such as coefficients, forcing terms, and initial conditions. State-of-the-art neural surrogates perform adaptation through gradient-based optimization and meta-learning to implicitly encode the variety of dynamics from observations. This often comes with increased inference complexity. Inspired by the in-context learning capabilities of large language models (LLMs), we introduce Zebra, a novel generative auto-regressive transformer designed to solve parametric PDEs without requiring gradient adaptation at inference. By leveraging in-context information during both pre-training and inference, Zebra dynamically adapts to new tasks by conditioning on input sequences that incorporate context example trajectories. As a generative model, Zebra can be used to generate new trajectories and allows quantifying the uncertainty of the predictions. We evaluate Zebra across a variety of challenging PDE scenarios, demonstrating its adaptability, robustness, and superior performance compared to existing approaches.

NeurIPS 2024 · Conference Paper

AROMA: Preserving Spatial Structure for Latent PDE Modeling with Local Neural Fields

  • Louis Serrano
  • Thomas X Wang
  • Etienne Le Naour
  • Jean-Noël Vittaut
  • Patrick Gallinari

We present AROMA (Attentive Reduced Order Model with Attention), a framework designed to enhance the modeling of partial differential equations (PDEs) using local neural fields. Our flexible encoder-decoder architecture can obtain smooth latent representations of spatial physical fields from a variety of data types, including irregular-grid inputs and point clouds. This versatility eliminates the need for patching and allows efficient processing of diverse geometries. The sequential nature of our latent representation can be interpreted spatially and permits the use of a conditional transformer for modeling the temporal dynamics of PDEs. By employing a diffusion-based formulation, we achieve greater stability and enable longer rollouts compared to conventional MSE training. AROMA's superior performance in simulating 1D and 2D equations underscores the efficacy of our approach in capturing complex dynamical behaviors.

NeurIPS 2024 · Conference Paper

Boosting Generalization in Parametric PDE Neural Solvers through Adaptive Conditioning

  • Armand Kassaï Koupaï
  • Jorge Mifsut Benet
  • Yuan Yin
  • Jean-Noël Vittaut
  • Patrick Gallinari

Solving parametric partial differential equations (PDEs) presents significant challenges for data-driven methods due to the sensitivity of spatio-temporal dynamics to variations in PDE parameters. Machine learning approaches often struggle to capture this variability. To address this, data-driven approaches learn parametric PDEs by sampling a very large variety of trajectories with varying PDE parameters. We first show that incorporating conditioning mechanisms for learning parametric PDEs is essential and that among them, adaptive conditioning allows stronger generalization. As existing adaptive conditioning methods do not scale well with respect to the number of parameters to adapt in the neural solver, we propose GEPS, a simple adaptation mechanism to boost GEneralization in Pde Solvers via a first-order optimization and low-rank rapid adaptation of a small set of context parameters. We demonstrate the versatility of our approach for both fully data-driven and physics-aware neural solvers. Validation performed on a whole range of spatio-temporal forecasting problems demonstrates excellent performance for generalizing to unseen conditions including initial conditions, PDE coefficients, forcing terms and solution domain. Project page: https://geps-project.github.io
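
A hedged sketch of the adaptive-conditioning mechanism this abstract describes: a shared weight plus a low-rank correction whose small context vector is the only thing optimized at adaptation time (names and shapes are assumptions):

```python
import torch
import torch.nn as nn

class LowRankConditionedLinear(nn.Module):
    """Shared weight W plus a low-rank, environment-specific correction
    U diag(c_e) V, where only the small context vector c_e is adapted."""
    def __init__(self, d_in, d_out, rank=4):
        super().__init__()
        self.W = nn.Linear(d_in, d_out)
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.01)
        self.V = nn.Parameter(torch.randn(rank, d_in) * 0.01)

    def forward(self, x, c_e):               # c_e: (rank,) context vector
        delta = self.U @ torch.diag(c_e) @ self.V
        return self.W(x) + x @ delta.T

def adapt_context(layer, x, y, steps=50, lr=1e-2):
    """Test-time adaptation: optimize only c_e on a few trajectory samples,
    leaving the shared solver weights frozen (first-order updates)."""
    c_e = torch.zeros(layer.U.shape[1], requires_grad=True)
    opt = torch.optim.SGD([c_e], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        ((layer(x, c_e) - y) ** 2).mean().backward()
        opt.step()
    return c_e
```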

TMLR 2024 · Journal Article

Time Series Continuous Modeling for Imputation and Forecasting with Implicit Neural Representations

  • Etienne Le Naour
  • Louis Serrano
  • Léon Migus
  • Yuan Yin
  • Ghislain Agoua
  • Nicolas Baskiotis
  • Patrick Gallinari
  • Vincent Guigue

We introduce a novel modeling approach for time series imputation and forecasting, tailored to address the challenges often encountered in real-world data, such as irregular samples, missing data, or unaligned measurements from multiple sensors. Our method relies on a continuous time-dependent model of the series' evolution dynamics. It leverages adaptations of conditional, implicit neural representations for sequential data. A modulation mechanism, driven by a meta-learning algorithm, allows adaptation to unseen samples and extrapolation beyond observed time windows for long-term predictions. The model provides a highly flexible and unified framework for imputation and forecasting tasks across a wide range of challenging scenarios. It achieves state-of-the-art performance on classical benchmarks and outperforms alternative time-continuous models.
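
A rough sketch of a modulated implicit neural representation of the kind described: a shared network maps timestamps to values, and only small per-series modulations are fitted to each sample (the sine activation and the inner loop are illustrative assumptions):

```python
import torch
import torch.nn as nn

class ModulatedINR(nn.Module):
    """Shared network from timestamp to value; small per-series modulations
    shift each hidden layer so one model covers many irregular series."""
    def __init__(self, d_hidden=64, n_layers=3):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.Linear(1 if i == 0 else d_hidden, d_hidden)
             for i in range(n_layers)])
        self.out = nn.Linear(d_hidden, 1)

    def forward(self, t, mods):  # t: (N, 1); mods: (n_layers, d_hidden)
        h = t
        for layer, m in zip(self.layers, mods):
            h = torch.sin(layer(h) + m)  # sine activation (an assumption)
        return self.out(h)

def fit_modulations(inr, t_obs, y_obs, steps=100, lr=1e-2):
    """Inner loop: adapt only the modulations to one series' observations,
    then query the INR at arbitrary times to impute or forecast."""
    mods = torch.zeros(len(inr.layers), inr.layers[0].out_features,
                       requires_grad=True)
    opt = torch.optim.SGD([mods], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        ((inr(t_obs, mods) - y_obs) ** 2).mean().backward()
        opt.step()
    return mods
```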

ICLR 2023 · Conference Paper

Continuous PDE Dynamics Forecasting with Implicit Neural Representations

  • Yuan Yin
  • Matthieu Kirchmeyer
  • Jean-Yves Franceschi
  • Alain Rakotomamonjy
  • Patrick Gallinari

Effective data-driven PDE forecasting methods often rely on fixed spatial and/or temporal discretizations. This raises limitations in real-world applications like weather prediction where flexible extrapolation at arbitrary spatiotemporal locations is required. We address this problem by introducing a new data-driven approach, DINo, that models a PDE's flow with continuous-time dynamics of spatially continuous functions. This is achieved by embedding spatial observations independently of their discretization via Implicit Neural Representations in a small latent space temporally driven by a learned ODE. This separate and flexible treatment of time and space makes DINo the first data-driven model to combine the following advantages. It extrapolates at arbitrary spatial and temporal locations; it can learn from sparse irregular grids or manifolds; at test time, it generalizes to new grids or resolutions. DINo outperforms alternative neural PDE forecasters in a variety of challenging generalization scenarios on representative PDE systems.

NeurIPS 2023 · Conference Paper

Module-wise Training of Neural Networks via the Minimizing Movement Scheme

  • Skander Karkar
  • Ibrahim Ayed
  • Emmanuel de Bézenac
  • Patrick Gallinari

Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a simple module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.
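
The transport regularization admits a compact reading: train each module greedily with its auxiliary head while penalizing how far the module moves its input in feature space. The sketch below assumes residual-style modules that preserve the feature dimension; it is a simplified reading of the minimizing movement idea, not the paper's full formulation:

```python
import torch
import torch.nn as nn

def train_module(module, head, loader, tau=0.1, lr=1e-3, epochs=1):
    """Trains one module plus an auxiliary classification head, penalizing
    the squared displacement the module applies to its input features."""
    opt = torch.optim.Adam(
        list(module.parameters()) + list(head.parameters()), lr=lr)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            h = module(x)
            # Task loss + transport cost: keep the module a small movement.
            loss = ce(head(h), y) + tau * ((h - x) ** 2).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return module, head
```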

NeurIPS 2023 · Conference Paper

Operator Learning with Neural Fields: Tackling PDEs on General Geometries

  • Louis Serrano
  • Lise Le Boudec
  • Armand Kassaï Koupaï
  • Thomas X Wang
  • Yuan Yin
  • Jean-Noël Vittaut
  • Patrick Gallinari

Machine learning approaches for solving partial differential equations require learning mappings between function spaces. While convolutional or graph neural networks are constrained to discretized functions, neural operators present a promising milestone toward mapping functions directly. Despite impressive results, they still face challenges with respect to the domain geometry and typically rely on some form of discretization. In order to alleviate such limitations, we present CORAL, a new method that leverages coordinate-based networks for solving PDEs on general geometries. CORAL is designed to remove constraints on the input mesh, making it applicable to any spatial sampling and geometry. Its ability extends to diverse problem domains, including PDE solving, spatio-temporal forecasting, and inverse problems like geometric design. CORAL demonstrates robust performance across multiple resolutions and performs well in both convex and non-convex domains, surpassing or performing on par with state-of-the-art models.

ICML 2022 · Conference Paper

A Neural Tangent Kernel Perspective of GANs

  • Jean-Yves Franceschi
  • Emmanuel de Bézenac
  • Ibrahim Ayed
  • Mickaël Chen
  • Sylvain Lamprier
  • Patrick Gallinari

We propose a novel theoretical framework of analysis for Generative Adversarial Networks (GANs). We reveal a fundamental flaw of previous analyses which, by incorrectly modeling GANs’ training scheme, are subject to ill-defined discriminator gradients. We overcome this issue which impedes a principled study of GAN training, solving it within our framework by taking into account the discriminator’s architecture. To this end, we leverage the theory of infinite-width neural networks for the discriminator via its Neural Tangent Kernel. We characterize the trained discriminator for a wide range of losses and establish general differentiability properties of the network. From this, we derive new insights about the convergence of the generated distribution, advancing our understanding of GANs’ training dynamics. We empirically corroborate these results via an analysis toolkit based on our framework, unveiling intuitions that are consistent with GAN practice.

NeurIPS 2022 · Conference Paper

AirfRANS: High Fidelity Computational Fluid Dynamics Dataset for Approximating Reynolds-Averaged Navier–Stokes Solutions

  • Florent Bonnet
  • Jocelyn Mazari
  • Paola Cinnella
  • Patrick Gallinari

Surrogate models are necessary to optimize meaningful quantities in physical dynamics as their recursive numerical resolutions are often prohibitively expensive. This is mainly the case for fluid dynamics and the resolution of the Navier–Stokes equations. However, despite the fast-growing field of data-driven models for physical systems, reference datasets representing real-world phenomena are lacking. In this work, we develop AirfRANS, a dataset for studying the two-dimensional incompressible steady-state Reynolds-Averaged Navier–Stokes equations over airfoils in a subsonic regime and at different angles of attack. We also introduce metrics on the stress forces at the surface of geometries and visualization of boundary layers to assess the capabilities of models to accurately predict the meaningful information of the problem. Finally, we propose deep learning baselines on four machine learning tasks to study AirfRANS under different constraints for generalization considerations: big and scarce data regime, Reynolds number, and angle of attack extrapolation.

ICLR 2022 · Conference Paper

Constrained Physical-Statistics Models for Dynamical System Identification and Prediction

  • Jérémie Donà
  • Marie Déchelle
  • Patrick Gallinari
  • Marina Levy

Modeling dynamical systems combining prior physical knowledge and machine learning (ML) is promising in scientific problems when the underlying processes are not fully understood, e.g. when the dynamics is partially known. A common practice to identify the respective parameters of the physical and ML components is to formulate the problem as supervised learning on observed trajectories. However, this formulation leads to an infinite number of possible decompositions. To solve this ill-posedness, we reformulate the learning problem by introducing an upper bound on the prediction error of a physical-statistical model. This allows us to control the contribution of both the physical and statistical components to the overall prediction. This framework generalizes several existing hybrid schemes proposed in the literature. We provide theoretical guarantees on the well-posedness of our formulation along with a proof of convergence in a simple affine setting. For more complex dynamics, we validate our framework experimentally.

NeurIPS 2022 · Conference Paper

Diverse Weight Averaging for Out-of-Distribution Generalization

  • Alexandre Rame
  • Matthieu Kirchmeyer
  • Thibaud Rahier
  • Alain Rakotomamonjy
  • Patrick Gallinari
  • Matthieu Cord

Standard neural networks struggle to generalize under distribution shifts in computer vision. Fortunately, combining multiple networks can consistently improve out-of-distribution generalization. In particular, weight averaging (WA) strategies were shown to perform best on the competitive DomainBed benchmark; they directly average the weights of multiple networks despite their nonlinearities. In this paper, we propose Diverse Weight Averaging (DiWA), a new WA strategy whose main motivation is to increase the functional diversity across averaged models. To this end, DiWA averages weights obtained from several independent training runs: indeed, models obtained from different runs are more diverse than those collected along a single run thanks to differences in hyperparameters and training procedures. We motivate the need for diversity by a new bias-variance-covariance-locality decomposition of the expected error, exploiting similarities between WA and standard functional ensembling. Moreover, this decomposition highlights that WA succeeds when the variance term dominates, which we show occurs when the marginal distribution changes at test time. Experimentally, DiWA consistently improves the state of the art on DomainBed without inference overhead.
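
DiWA's core operation is straightforward to express in code: average the parameters of networks fine-tuned in independent runs into a single model. A minimal sketch, assuming all checkpoints share one architecture and initialization lineage:

```python
import copy
import torch

def average_weights(models):
    """Average the parameters of models from independent training runs.
    Integer buffers (e.g. BatchNorm step counters) pass through a float
    mean here, which is acceptable for a sketch."""
    avg = copy.deepcopy(models[0])
    state = avg.state_dict()
    for key in state:
        state[key] = torch.stack(
            [m.state_dict()[key].float() for m in models]).mean(dim=0)
    avg.load_state_dict(state)
    return avg  # one network: ensemble-like gains with no inference overhead
```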

ICML 2022 · Conference Paper

Generalizing to New Physical Systems via Context-Informed Dynamics Model

  • Matthieu Kirchmeyer
  • Yuan Yin
  • Jérémie Donà
  • Nicolas Baskiotis
  • Alain Rakotomamonjy
  • Patrick Gallinari

Data-driven approaches to modeling physical systems fail to generalize to unseen systems that share the same general dynamics with the learning domain, but correspond to different physical contexts. We propose a new framework for this key problem, context-informed dynamics adaptation (CoDA), which takes into account the distributional shift across systems for fast and efficient adaptation to new dynamics. CoDA leverages multiple environments, each associated with a different dynamic, and learns to condition the dynamics model on contextual parameters, specific to each environment. The conditioning is performed via a hypernetwork, learned jointly with a context vector from observed data. The proposed formulation constrains the search hypothesis space for fast adaptation and better generalization across environments with few samples. We theoretically motivate our approach and show state-of-the-art generalization results on a set of nonlinear dynamics, representative of a variety of application domains. We also show, on these systems, that new system parameters can be inferred from context vectors with minimal supervision.

ICLR 2022 · Conference Paper

Mapping conditional distributions for domain adaptation under generalized target shift

  • Matthieu Kirchmeyer
  • Alain Rakotomamonjy
  • Emmanuel de Bézenac
  • Patrick Gallinari

We consider the problem of unsupervised domain adaptation (UDA) between a source and a target domain under conditional and label shift, a.k.a. Generalized Target Shift (GeTarS). Unlike simpler UDA settings, few works have addressed this challenging problem. Recent approaches learn domain-invariant representations, yet they have practical limitations and rely on strong assumptions that may not hold in practice. In this paper, we explore a novel and general approach to align pretrained representations, which circumvents existing drawbacks. Instead of constraining representation invariance, it learns an optimal transport map, implemented as a NN, which maps source representations onto target ones. Our approach is flexible and scalable; it preserves the problem's structure and has strong theoretical guarantees under mild assumptions. In particular, our solution is unique, matches conditional distributions across domains, recovers target proportions and explicitly controls the target generalization risk. Through an exhaustive comparison on several datasets, we challenge the state-of-the-art in GeTarS.

ICLR 2021 · Conference Paper

Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting

  • Yuan Yin
  • Vincent Le Guen
  • Jérémie Donà
  • Emmanuel de Bézenac
  • Ibrahim Ayed
  • Nicolas Thome
  • Patrick Gallinari

Forecasting complex dynamical phenomena in settings where only partial knowledge of their dynamics is available is a prevalent problem across various scientific fields. While purely data-driven approaches are arguably insufficient in this context, standard physical modeling based approaches tend to be over-simplistic, inducing non-negligible errors. In this work, we introduce the APHYNITY framework, a principled approach for augmenting incomplete physical dynamics described by differential equations with deep data-driven models. It consists in decomposing the dynamics into two components: a physical component accounting for the dynamics for which we have some prior knowledge, and a data-driven component accounting for errors of the physical model. The learning problem is carefully formulated such that the physical model explains as much of the data as possible, while the data-driven component only describes information that cannot be captured by the physical model, no more, no less. This not only guarantees the existence and uniqueness of the decomposition, but also ensures interpretability and benefits generalization. Experiments made on three important use cases, each representative of a different family of phenomena, i.e. reaction-diffusion equations, wave equations and the non-linear damped pendulum, show that APHYNITY can efficiently leverage approximate physical models to accurately forecast the evolution of the system and correctly identify relevant physical parameters.
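
A penalized reading of the APHYNITY objective: roll the combined dynamics out and ask the data-driven augmentation to stay as small as possible. This sketch uses an Euler rollout and a simple penalty in place of the paper's constrained formulation:

```python
import torch

def aphynity_style_loss(f_phys, f_aug, x0, traj, dt, lam=1e-2):
    """Euler rollout of dx/dt = f_phys(x) + f_aug(x). The lam term pushes
    the augmentation toward zero so the physical part explains as much of
    the data as it can (a penalized stand-in for the paper's constraint)."""
    x, preds, aug_norm = x0, [], 0.0
    for _ in range(traj.shape[0]):
        a = f_aug(x)
        x = x + dt * (f_phys(x) + a)
        preds.append(x)
        aug_norm = aug_norm + (a ** 2).mean()
    return ((torch.stack(preds) - traj) ** 2).mean() + lam * aug_norm
```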

NeurIPS 2021 · Conference Paper

LEADS: Learning Dynamical Systems that Generalize Across Environments

  • Yuan Yin
  • Ibrahim Ayed
  • Emmanuel de Bézenac
  • Nicolas Baskiotis
  • Patrick Gallinari

When modeling dynamical systems from real-world data samples, the distribution of data often changes according to the environment in which they are captured, and the dynamics of the system itself vary from one environment to another. Generalizing across environments thus challenges the conventional frameworks. The classical settings suggest either considering data as i.i.d. and learning a single model to cover all situations or learning environment-specific models. Both are sub-optimal: the former disregards the discrepancies between environments leading to biased solutions, while the latter does not exploit their potential commonalities and is prone to scarcity problems. We propose LEADS, a novel framework that leverages the commonalities and discrepancies among known environments to improve model generalization. This is achieved with a tailored training formulation aiming at capturing common dynamics within a shared model while additional terms capture environment-specific dynamics. We ground our approach in theory, exhibiting a decrease in sample complexity w.r.t. classical alternatives. We show how theory and practice coincide on the simplified case of linear dynamics. Moreover, we instantiate this framework for neural networks and evaluate it experimentally on representative families of nonlinear dynamics. We show that this new setting can exploit knowledge extracted from environment-dependent data and improves generalization for both known and novel environments.

ICLR 2021 · Conference Paper

PDE-Driven Spatiotemporal Disentanglement

  • Jérémie Donà
  • Jean-Yves Franceschi
  • Sylvain Lamprier
  • Patrick Gallinari

A recent line of work in the machine learning community addresses the problem of predicting high-dimensional spatiotemporal phenomena by leveraging specific tools from the differential equations theory. Following this direction, we propose in this article a novel and general paradigm for this task based on a resolution method for partial differential equations: the separation of variables. This inspiration allows us to introduce a dynamical interpretation of spatiotemporal disentanglement. It induces a principled model based on learning disentangled spatial and temporal representations of a phenomenon to accurately predict future observations. We experimentally demonstrate the performance and broad applicability of our method against prior state-of-the-art models on physical and synthetic video datasets.

NeurIPS 2020 · Conference Paper

Normalizing Kalman Filters for Multivariate Time Series Analysis

  • Emmanuel de Bézenac
  • Syama Sundar Rangapuram
  • Konstantinos Benidis
  • Michael Bohlke-Schneider
  • Richard Kurle
  • Lorenzo Stella
  • Hilaf Hasson
  • Patrick Gallinari

This paper tackles the modelling of large, complex and multivariate time series panels in a probabilistic setting. To this end, we present a novel approach reconciling classical state space models with deep learning methods. By augmenting state space models with normalizing flows, we mitigate imprecisions stemming from idealized assumptions in state space models. The resulting model is highly flexible while still retaining many of the attractive properties of state space models: uncertainty and observation errors are properly accounted for, inference is tractable, sampling is efficient, and good generalization performance is observed even in low-data regimes. We demonstrate competitiveness against state-of-the-art deep learning methods on the tasks of forecasting real-world data and handling varying levels of missing data.

ICML 2020 · Conference Paper

Stochastic Latent Residual Video Prediction

  • Jean-Yves Franceschi
  • Edouard Delasalles
  • Mickaël Chen
  • Sylvain Lamprier
  • Patrick Gallinari

Designing video prediction models that account for the inherent uncertainty of the future is challenging. Most works in the literature are based on stochastic image-autoregressive recurrent networks, which raises several performance and applicability issues. An alternative is to use fully latent temporal models which untie frame synthesis and temporal dynamics. However, no such model for stochastic video prediction has been proposed in the literature yet, due to design and training difficulties. In this paper, we overcome these difficulties by introducing a novel stochastic temporal model whose dynamics are governed in a latent space by a residual update rule. This first-order scheme is motivated by discretization schemes of differential equations. It naturally models video dynamics as it allows our simpler, more interpretable, latent model to outperform prior state-of-the-art methods on challenging datasets.
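
The residual update rule at the core of this model fits in a few lines; a sketch, with the noise dimension and network shape chosen arbitrarily:

```python
import torch
import torch.nn as nn

class ResidualLatentDynamics(nn.Module):
    """Sketch of the first-order latent update z_{t+1} = z_t + f(z_t, eps),
    an Euler-like discretization of stochastic latent dynamics."""
    def __init__(self, d_z, d_noise=16, hidden=128):
        super().__init__()
        self.d_noise = d_noise
        self.f = nn.Sequential(
            nn.Linear(d_z + d_noise, hidden), nn.ReLU(),
            nn.Linear(hidden, d_z))

    def forward(self, z, steps):
        out = []
        for _ in range(steps):
            eps = torch.randn(z.shape[0], self.d_noise)  # per-step noise
            z = z + self.f(torch.cat([z, eps], dim=-1))  # residual update
            out.append(z)
        return torch.stack(out, dim=1)  # (B, steps, d_z), decoded frame-wise
```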

ICML 2019 · Conference Paper

Context-Aware Zero-Shot Learning for Object Recognition

  • Éloi Zablocki
  • Patrick Bordes
  • Laure Soulier
  • Benjamin Piwowarski
  • Patrick Gallinari

Zero-Shot Learning (ZSL) aims at classifying unlabeled objects by leveraging auxiliary knowledge, such as semantic representations. A limitation of previous approaches is that only intrinsic properties of objects, e.g., their visual appearance, are taken into account while their context, e.g., the surrounding objects in the image, is ignored. Following the intuitive principle that objects tend to be found in certain contexts but not others, we propose a new and challenging approach, context-aware ZSL, that leverages semantic representations in a new way to model the conditional likelihood of an object to appear in a given context. Finally, through extensive experiments conducted on Visual Genome, we show that contextual information can substantially improve the standard ZSL approach and is robust to unbalanced classes.

AAAI 2018 · Conference Paper

Learning Multi-Modal Word Representation Grounded in Visual Context

  • Éloi Zablocki
  • Benjamin Piwowarski
  • Laure Soulier
  • Patrick Gallinari

Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to integrate perceptual and visual features. Most of these works consider the visual appearance of objects to enhance word representations but they ignore the visual environment and context in which objects appear. We propose to unify text-based techniques with vision-based techniques by simultaneously leveraging textual and visual context to learn multimodal word embeddings. We explore various choices for what can serve as a visual context and present an end-to-end method to integrate visual context elements in a multimodal skip-gram model. We provide experiments and extensive analysis of the obtained results.

JMLR 2018 · Journal Article

Profile-Based Bandit with Unknown Profiles

  • Sylvain Lamprier
  • Thibault Gisselbrecht
  • Patrick Gallinari

Stochastic bandits have been widely studied for decades. A very large panel of settings has been introduced, some of them incorporating structure between actions. If actions are associated with feature vectors that underlie their usefulness, the discovery of a mapping parameter between such profiles and rewards can help the exploration process of the bandit strategies. This is the setting studied in this paper, but in our case the action profiles (constant feature vectors) are unknown beforehand. Instead, the agent is only given sample vectors, with mean centered on the true profiles, for a subset of actions at each step of the process. In this new bandit instance, policies have thus to deal with a doubled uncertainty, both on the profile estimators and the reward mapping parameters learned so far. We propose a new algorithm, called SampLinUCB, specifically designed for this case. Theoretical convergence guarantees are given for this strategy, according to various profile samples delivery scenarios. Finally, experiments are conducted on both artificial data and a task of focused data capture from online social networks. Obtained results demonstrate the relevance of the approach in various settings.
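
A schematic LinUCB-style strategy with estimated profiles, to make the setting concrete; SampLinUCB's treatment of the doubled uncertainty is simplified here to running profile means plus a standard confidence ellipsoid (an assumption, not the paper's algorithm):

```python
import numpy as np

class ProfileBandit:
    """LinUCB-flavored policy where action profiles are only observed
    through noisy samples and estimated by running means."""
    def __init__(self, n_actions, d, alpha=1.0):
        self.A = np.eye(d)                    # regularized design matrix
        self.b = np.zeros(d)
        self.sums = np.zeros((n_actions, d))  # accumulated profile samples
        self.counts = np.zeros(n_actions)
        self.alpha = alpha

    def observe_profiles(self, action_ids, samples):
        for a, s in zip(action_ids, samples):
            self.sums[a] += s
            self.counts[a] += 1

    def select(self):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b                # reward mapping estimate
        means = self.sums / np.maximum(self.counts, 1)[:, None]
        ucb = means @ theta + self.alpha * np.sqrt(
            np.einsum('ad,de,ae->a', means, A_inv, means))
        return int(np.argmax(ucb))

    def update(self, action, reward):
        x = self.sums[action] / max(self.counts[action], 1)
        self.A += np.outer(x, x)
        self.b += reward * x
```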

EWRL 2015 · Workshop Paper

Deep Sequential Neural Networks

  • Ludovic Denoyer
  • Patrick Gallinari

Neural Networks sequentially build high-level features through their successive layers. We propose a new neural network model where each layer is associated with a set of candidate mappings. When an input is processed, at each layer, one mapping among these candidates is selected according to a sequential decision process. The resulting model is structured according to a DAG-like architecture, so that a path from the root to a leaf node defines a sequence of transformations. The model is thus able to process data with different characteristics through specific sequences of local transformations, increasing the expressive power of this model w.r.t. a classical deep neural network. The learning algorithm is inspired by policy gradient techniques coming from the reinforcement learning domain and is used here instead of the classical back-propagation based gradient descent techniques.

ICLR 2014 · Conference Paper

Sequentially Generated Instance-Dependent Image Representations for Classification

  • Gabriel Dulac-Arnold
  • Ludovic Denoyer
  • Nicolas Thome
  • Matthieu Cord
  • Patrick Gallinari

In this paper, we investigate a new framework for image classification that adaptively generates spatial representations. Our strategy is based on a sequential process that learns to explore the different regions of any image in order to infer its category. In particular, the choice of regions is specific to each image, directed by the actual content of previously selected regions. The capacity of the system to handle incomplete image information, as well as its adaptive region selection, allows the system to perform well in budgeted classification tasks by exploiting a dynamically generated representation of each image. We demonstrate the system's abilities in a series of image-based exploration and classification tasks that highlight its learned exploration and inference abilities.

NeurIPS 2013 · Conference Paper

Robust Bloom Filters for Large MultiLabel Classification Tasks

  • Moustapha Cisse
  • Nicolas Usunier
  • Thierry Artières
  • Patrick Gallinari

This paper presents an approach to multilabel classification (MLC) with a large number of labels. Our approach is a reduction to binary classification in which label sets are represented by low dimensional binary vectors. This representation follows the principle of Bloom filters, a space-efficient data structure originally designed for approximate membership testing. We show that a naive application of Bloom filters in MLC is not robust to individual binary classifiers' errors. We then present an approach that exploits a specific feature of real-world datasets when the number of labels is large: many labels (almost) never appear together. Our approach is provably robust, has sublinear training and inference complexity with respect to the number of labels, and compares favorably to state-of-the-art algorithms on two large scale multilabel datasets.
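
The naive Bloom-filter reduction the paper starts from is easy to sketch: hash each label into k bits of an m-bit vector, learn one binary classifier per bit, and decode by membership testing (the paper's contribution is a robust version of this scheme; the hashing details below are assumptions):

```python
import hashlib
import numpy as np

def label_bits(label, m, k):
    """k bit positions for one label, via k independent hashes."""
    return [int(hashlib.sha1(f"{label}:{i}".encode()).hexdigest(), 16) % m
            for i in range(k)]

def encode(label_set, m, k):
    """Label set -> m-bit Bloom filter, the low-dimensional binary target."""
    v = np.zeros(m, dtype=int)
    for lbl in label_set:
        v[label_bits(lbl, m, k)] = 1
    return v

def decode(bit_predictions, n_labels, m, k):
    """A label is predicted iff all k of its bits are predicted on;
    one flipped bit can drop a label, hence the non-robustness noted above."""
    return [lbl for lbl in range(n_labels)
            if all(bit_predictions[p] for p in label_bits(lbl, m, k))]
```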

NeurIPS 2012 · Conference Paper

On the (Non-)existence of Convex, Calibrated Surrogate Losses for Ranking

  • Clément Calauzènes
  • Nicolas Usunier
  • Patrick Gallinari

We study surrogate losses for learning to rank, in a framework where the rankings are induced by scores and the task is to learn the scoring function. We focus on the calibration of surrogate losses with respect to a ranking evaluation metric, where the calibration is equivalent to the guarantee that near-optimal values of the surrogate risk imply near-optimal values of the risk defined by the evaluation metric. We prove that if a surrogate loss is a convex function of the scores, then it is not calibrated with respect to two evaluation metrics widely used for search engine evaluation, namely the Average Precision and the Expected Reciprocal Rank. We also show that such convex surrogate losses cannot be calibrated with respect to the Pairwise Disagreement, an evaluation metric used when learning from pairwise preferences. Our results cast light on the intrinsic difficulty of some ranking problems, as well as on the limitations of learning-to-rank algorithms based on the minimization of a convex surrogate risk.
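
The calibration property used here has a compact formal statement; the following is the standard formulation (the paper's precise definitions may differ):

```latex
% Calibration of a surrogate loss \ell w.r.t. an evaluation risk R:
% for every sequence of scoring functions (f_n), near-optimal surrogate
% risk must imply near-optimal target risk.
R_\ell(f_n) \;\longrightarrow\; \inf_f R_\ell(f)
\quad \Longrightarrow \quad
R(f_n) \;\longrightarrow\; \inf_f R(f),
\qquad \text{where } R_\ell(f) = \mathbb{E}\left[\ell(f(X), Y)\right].
```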

JMLR 2010 · Journal Article

Erratum: SGDQN is Less Careful than Expected

  • Antoine Bordes
  • Léon Bottou
  • Patrick Gallinari
  • Jonathan Chang
  • S. Alex Smith

The SGD-QN algorithm described in Bordes et al. (2009) contains a subtle flaw that prevents it from reaching its design goals. Yet the flawed SGD-QN algorithm has worked well enough to be a winner of the first Pascal Large Scale Learning Challenge (Sonnenburg et al., 2008). This document clarifies the situation, proposes a corrected algorithm, and evaluates its performance.

JMLR 2009 · Journal Article

SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent

  • Antoine Bordes
  • Léon Bottou
  • Patrick Gallinari

The SGD-QN algorithm is a stochastic gradient descent algorithm that makes careful use of second-order information and splits the parameter update into independently scheduled components. Thanks to this design, SGD-QN iterates nearly as fast as a first-order stochastic gradient descent but requires fewer iterations to achieve the same accuracy. This algorithm won the "Wild Track" of the first PASCAL Large Scale Learning Challenge (Sonnenburg et al., 2008).
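
A very schematic view of the SGD-QN idea, with per-coordinate gains refreshed from secant estimates of curvature on an independent schedule; this illustrates the flavor of the method only and is not the published algorithm:

```python
import numpy as np

def sgd_qn_sketch(grad_fn, w0, n_steps, t0=10.0, skip=16, lam=1e-2):
    """Decaying SGD whose steps are rescaled coordinate-wise by gains
    updated every `skip` iterations from a secant curvature estimate."""
    w = w0.astype(float).copy()
    gains = np.ones_like(w)                   # per-coordinate scaling
    w_prev = g_prev = None
    for t in range(n_steps):
        g = grad_fn(w)
        if g_prev is not None and t % skip == 0:
            # Secant estimate of per-coordinate curvature, kept positive.
            curv = np.abs(g - g_prev) / (np.abs(w - w_prev) + 1e-12)
            gains = 1.0 / np.maximum(curv, lam)
        w_prev, g_prev = w.copy(), g.copy()
        w = w - gains * g / (t + t0)          # decaying, rescaled step
    return w
```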

EWRL 2008 · Conference Paper

Applications of Reinforcement Learning to Structured Prediction

  • Francis Maes
  • Ludovic Denoyer
  • Patrick Gallinari

Supervised learning is about learning functions given a set of input and corresponding output examples. A recent trend in this field is to consider structured outputs such as sequences, trees or graphs. When predicting such structured data, learning models have to select solutions within very large discrete spaces. The combinatorial nature of this problem has recently led to learning models integrating a search component. In this paper, we show that Structured Prediction (SP) can be seen as a sequential decision problem. We introduce SP-MDP: a Markov Decision Process based formulation of Structured Prediction. Learning the optimal policy in SP-MDP is shown to be equivalent to solving the SP problem. This allows us to apply classical Reinforcement Learning (RL) algorithms to SP. We present experiments on two tasks. The first, sequence labeling, has been extensively studied and allows us to compare the RL approach with traditional SP methods. The second, tree transformation, is a challenging SP task with numerous large-scale real-world applications. We show successful results with general RL algorithms on this task on which traditional SP models fail.

ECAI 2008 · Conference Paper

Efficient Data Clustering by Local Density Approximation

  • Marc-Ismaël Akodjènou-Jeannin
  • Patrick Gallinari

The clustering task is a key part of the data mining process. In today's context of massive data, methods with more than linear computational complexity are unlikely to be applied practically. In this paper, we begin with a simple assumption: local projections of the data should make it possible to distinguish local cluster structures. From there, we describe how to obtain "pure" local sub-groupings of points, from projections on randomly chosen lines. The clustering of the data is obtained from the clustering of these sub-groupings. Our method has a linear complexity in the dataset size, and requires only one pass on the original dataset. Being local in essence, it can handle twisted geometries typical of many high-dimensional datasets. We describe the steps of our method and report encouraging results.
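
A toy version of the random-line idea, made quadratic for clarity by accumulating a co-occurrence affinity; the paper instead clusters the sub-groupings themselves to stay linear in the dataset size:

```python
import numpy as np

def random_line_affinity(X, n_lines=20, n_bins=10, seed=0):
    """Project data onto random lines, split each 1-D projection into
    quantile sub-groupings, and count how often pairs land together."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    co = np.zeros((n, n))
    edges = np.linspace(0, 1, n_bins + 1)[1:-1]   # interior quantiles
    for _ in range(n_lines):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                    # random direction
        proj = X @ u                              # 1-D projection
        bins = np.digitize(proj, np.quantile(proj, edges))
        co += bins[:, None] == bins[None, :]      # co-occurrence counts
    return co / n_lines  # affinity in [0, 1], usable by any graph clustering
```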

IJCAI 2005 · Conference Paper

Automatic learning of domain model for personalized hypermedia applications

  • Hermine Njike
  • Thierry Artières
  • Patrick Gallinari
  • Julien Blanchard
  • Guillaume

This paper deals with the automatic building of personalized hypermedia. We build upon ideas developed for educational hypermedia: the definition of a domain model and the use of overlay user models. Since much work has been done on learning user models and adapting hypermedia based on such models, we tackle the core problem: the automatic definition of a domain model for a static hypermedia.

NeurIPS 2005 · Conference Paper

Generalization error bounds for classifiers trained with interdependent data

  • Nicolas Usunier
  • Massih R. Amini
  • Patrick Gallinari

In this paper we propose a general framework to study the generalization properties of binary classifiers trained with data which may be dependent, but are deterministically generated upon a sample of independent examples. It provides generalization bounds for binary classification and some cases of ranking problems, and clarifies the relationship between these learning tasks.

IJCAI 2003 · Conference Paper

Semi-Supervised Learning with Explicit Misclassification Modeling

  • Massih-Reza Amini
  • Patrick Gallinari

This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. This algorithm optimizes the classification maximum likelihood of a set of labeled-unlabeled data, using a variant form of the Classification Expectation Maximization (CEM) algorithm. Its originality is that it makes use of both unlabeled data and of a probabilistic misclassification model for these data. The parameters of the label-error model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four data-sets and show the advantages of this method over a previously developed semi-supervised algorithm which does not consider imperfections in the labeling process.

NeurIPS 1990 · Conference Paper

A Framework for the Cooperation of Learning Algorithms

  • Léon Bottou
  • Patrick Gallinari

We introduce a framework for training architectures composed of several modules. This framework, which uses a statistical formulation of learning systems, provides a unique formalism for describing many classical connectionist algorithms as well as complex systems where several algorithms interact. It allows the design of hybrid systems that combine the advantages of connectionist algorithms with those of other learning algorithms.