Arrow Research

Author name cluster

Arthur Flajolet

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity-disambiguation profile.

8 papers
2 author rows

Possible papers (8)

RLC 2024 · Conference Paper

PASTA: Pretrained Action-State Transformer Agents

  • Raphael Boige
  • Yannis Flet-Berliac
  • Lars C. P. M. Quaedvlieg
  • Arthur Flajolet
  • Guillaume Richard
  • Thomas Pierrot

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pretraining transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pretrained on expert trajectories. However, existing methods mostly rely on intricate pretraining objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models referred to as pretrained action-state transformer agents (PASTA). Our study presents a unified framework and covers an extensive set of general downstream tasks including behavioral cloning, offline Reinforcement Learning (RL), sensor failure robustness, and dynamics change adaptation. We systematically compare various design choices and offer valuable insights that will aid practitioners in developing robust models. Key findings highlight the improved performance of component-level tokenization, the use of fundamental pretraining objectives such as next-token prediction or masked language modeling, and simultaneous training of models across multiple domains. The developed models contain fewer than 7M parameters, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.

RLJ 2024 · Journal Article

PASTA: Pretrained Action-State Transformer Agents

  • Raphael Boige
  • Yannis Flet-Berliac
  • Lars C. P. M. Quaedvlieg
  • Arthur Flajolet
  • Guillaume Richard
  • Thomas Pierrot

Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pretraining transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In reinforcement learning, researchers have recently adapted these approaches, developing models pretrained on expert trajectories. However, existing methods mostly rely on intricate pretraining objectives tailored to specific downstream applications. This paper conducts a comprehensive investigation of models referred to as pretrained action-state transformer agents (PASTA). Our study presents a unified framework and covers an extensive set of general downstream tasks including behavioral cloning, offline Reinforcement Learning (RL), sensor failure robustness, and dynamics change adaptation. We systematically compare various design choices and offer valuable insights that will aid practitioners in developing robust models. Key findings highlight the improved performance of component-level tokenization, the use of fundamental pretraining objectives such as next-token prediction or masked language modeling, and simultaneous training of models across multiple domains. The developed models contain fewer than 7M parameters, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
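
The component-level tokenization and masked pretraining that both PASTA entries highlight can be illustrated with a short sketch. This is a hedged toy version, not the paper's implementation: the bin count, vocabulary layout, masking rate, and all function names below are assumptions made for illustration.

```python
import numpy as np

def component_tokenize(states, actions, n_bins=64, low=-1.0, high=1.0):
    """Discretize every state/action component into its own token.

    states: (T, state_dim); actions: (T, action_dim).
    Returns a 1-D token sequence of length T * (state_dim + action_dim).
    Bin edges, vocabulary layout, and interleaving are illustrative choices.
    """
    traj = np.concatenate([states, actions], axis=-1)      # (T, S + A)
    edges = np.linspace(low, high, n_bins - 1)
    tokens = np.digitize(np.clip(traj, low, high), edges)  # one id per component
    return tokens.reshape(-1)                              # interleave by timestep

def random_mask(tokens, mask_id, p=0.15, seed=0):
    """BERT-style masked-prediction setup: hide a fraction of the tokens."""
    rng = np.random.default_rng(seed)
    mask = rng.random(tokens.shape) < p
    corrupted = np.where(mask, mask_id, tokens)
    return corrupted, tokens, mask  # model inputs, targets, loss positions

# Toy usage: a 10-step trajectory with 4-D states and 2-D actions.
states = np.random.default_rng(1).standard_normal((10, 4))
actions = np.random.default_rng(2).standard_normal((10, 2))
sequence = component_tokenize(states, actions)
inputs, targets, positions = random_mask(sequence, mask_id=64)  # 64 = out-of-vocab id
```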

JMLR 2024 · Journal Article

QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration

  • Felix Chalumeau
  • Bryan Lim
  • Raphaël Boige
  • Maxime Allard
  • Luca Grillotti
  • Manon Flageat
  • Valentin Macé
  • Guillaume Richard

QDax is an open-source library with a streamlined and modular API for Quality-Diversity (QD) optimisation algorithms in JAX. The library serves as a versatile tool for optimisation purposes, ranging from black-box optimisation to continuous control. QDax offers implementations of popular QD, Neuroevolution, and Reinforcement Learning (RL) algorithms, supported by various examples. All the implementations can be just-in-time compiled with JAX, facilitating efficient execution across multiple accelerators, including GPUs and TPUs. These implementations effectively demonstrate the framework's flexibility and user-friendliness, easing experimentation for research purposes. Furthermore, the library is thoroughly documented and has 93% test coverage.
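
The hardware-acceleration claim rests on JAX's jit/vmap pattern. Below is a minimal sketch of jit-compiled, vectorized population evaluation in plain JAX; it shows the general mechanism QDax builds on, not QDax's actual API, and the objective is a toy placeholder.

```python
import jax
import jax.numpy as jnp

def fitness(genotype):
    """Toy black-box objective: negative sphere function (placeholder)."""
    return -jnp.sum(genotype ** 2)

# One compiled, vectorized call scores the whole population at once.
batched_fitness = jax.jit(jax.vmap(fitness))

key = jax.random.PRNGKey(0)
population = jax.random.normal(key, (1024, 8))  # 1024 genotypes of dimension 8
scores = batched_fitness(population)            # runs unchanged on GPU or TPU
```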

ICLR 2023 · Conference Paper

Evolving Populations of Diverse RL Agents with MAP-Elites

  • Thomas Pierrot
  • Arthur Flajolet

Quality Diversity (QD) has emerged as a powerful alternative optimization paradigm that aims at generating large and diverse collections of solutions, notably with its flagship algorithm MAP-Elites (ME), which evolves solutions through mutations and crossovers. While very effective for some unstructured problems, early ME implementations relied exclusively on random search to evolve the population of solutions, rendering them notoriously sample-inefficient for high-dimensional problems, such as when evolving neural networks. Follow-up works addressed these shortcomings by exploiting gradient information to guide the search, through techniques borrowed from either Black-Box Optimization (BBO) or Reinforcement Learning (RL). While mixing RL techniques with ME unlocked state-of-the-art performance for robotics control problems that require a good amount of exploration, it also plagued these ME variants with limitations common among RL algorithms that ME was free of, such as hyperparameter sensitivity, high stochasticity, and training instability, which can worsen as the population size increases since some components are shared across the population in recent approaches. Furthermore, existing approaches mixing ME with RL tend to be tied to a specific RL algorithm, which effectively prevents their use on problems where the corresponding RL algorithm fails. To address these shortcomings, we introduce a flexible framework that allows the use of any RL algorithm and alleviates the aforementioned limitations by evolving populations of agents (whose definitions include hyperparameters and all learnable parameters) instead of just policies. We demonstrate the benefits brought about by our framework through extensive numerical experiments on a number of robotics control problems, some of which have deceptive rewards, taken from the QD-RL literature. We open-source an efficient JAX-based implementation of our algorithm in the QDax library.
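
For readers unfamiliar with ME, the core loop is short. The sketch below is a bare-bones numpy version with a toy objective and behavior descriptor; the paper's agent-level extension would additionally store hyperparameters and all learnable parameters in each cell, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CELLS, ITERS = 8, 32, 5000

def evaluate(x):
    """Toy problem: fitness is the negative sphere; the descriptor is the mean coordinate."""
    return -np.sum(x ** 2), np.clip(np.mean(x), -1.0, 1.0)

def cell_of(descriptor):
    """Map a 1-D behavior descriptor in [-1, 1] to one of CELLS grid cells."""
    return min(int((descriptor + 1.0) / 2.0 * CELLS), CELLS - 1)

archive, fitnesses = {}, {}
for _ in range(ITERS):
    if archive:  # mutate a random elite; otherwise sample from scratch
        parent = archive[rng.choice(list(archive))]
        child = parent + 0.1 * rng.standard_normal(DIM)
    else:
        child = rng.standard_normal(DIM)
    fit, desc = evaluate(child)
    c = cell_of(desc)
    if c not in fitnesses or fit > fitnesses[c]:  # keep the best solution per cell
        archive[c], fitnesses[c] = child, fit
```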

ICLR 2023 · Conference Paper

Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery

  • Félix Chalumeau
  • Raphaël Boige
  • Bryan Lim
  • Valentin Macé
  • Maxime Allard
  • Arthur Flajolet
  • Antoine Cully
  • Thomas Pierrot

Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks. However, these policies tend to be overfit to the exact specifications of the task and environment they were trained on, and thus do not perform well when conditions deviate slightly or when composed hierarchically to solve even more complex tasks. Recent work has shown that training a mixture of policies, as opposed to a single one, that are driven to explore different regions of the state-action space can address this shortcoming by generating a diverse set of behaviors, referred to as skills, that can be collectively used to great effect in adaptation tasks or for hierarchical planning. This is typically realized by including a diversity term, often derived from information theory, in the objective function optimized by RL. However, these approaches often require careful hyperparameter tuning to be effective. In this work, we demonstrate that less widely used neuroevolution methods, specifically Quality Diversity (QD), are a competitive alternative to information-theory-augmented RL for skill discovery. Through an extensive empirical evaluation comparing eight state-of-the-art algorithms (four flagship algorithms from each line of work) on the basis of (i) metrics directly evaluating the skills' diversity, (ii) the skills' performance on adaptation tasks, and (iii) the skills' performance when used as primitives for hierarchical planning, QD methods are found to provide equal, and sometimes improved, performance whilst being less sensitive to hyperparameters and more scalable. As no single method is found to provide near-optimal performance across all environments, there is a rich scope for further research, which we support by proposing future directions and providing optimized open-source implementations.
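
As a concrete instance of the information-theoretic diversity term mentioned in the abstract, DIAYN-style skill discovery rewards states from which a discriminator can recover the active skill. The snippet below is a hedged sketch of that reward alone; the discriminator logits, the uniform skill prior, and the function name are illustrative assumptions, not any specific paper's code.

```python
import numpy as np

def diversity_reward(disc_logits, skill_id, n_skills):
    """DIAYN-style intrinsic reward: log q(z|s) - log p(z).

    disc_logits: discriminator logits over skills for the current state.
    A uniform skill prior p(z) = 1/n_skills is assumed here.
    """
    log_q = disc_logits - np.log(np.sum(np.exp(disc_logits)))  # log-softmax
    return log_q[skill_id] - np.log(1.0 / n_skills)

# Toy usage: 4 skills, discriminator fairly confident the state came from skill 2.
r = diversity_reward(np.array([0.1, -0.3, 2.0, 0.2]), skill_id=2, n_skills=4)
```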

ICML 2022 · Conference Paper

Fast Population-Based Reinforcement Learning on a Single Machine

  • Arthur Flajolet
  • Claire Bizon Monroc
  • Karim Beguir
  • Thomas Pierrot

Training populations of agents has demonstrated great promise in Reinforcement Learning for stabilizing training, improving exploration and asymptotic performance, and generating a diverse set of solutions. However, population-based training is often not considered by practitioners as it is perceived to be either prohibitively slow (when implemented sequentially) or computationally expensive (if agents are trained in parallel on independent accelerators). In this work, we compare implementations and revisit previous studies to show that the judicious use of compilation and vectorization allows population-based training to be performed on a single machine with one accelerator, with minimal overhead compared to training a single agent. We also show that, when provided with a few accelerators, our protocols extend to large population sizes for applications such as hyperparameter tuning. We hope that this work and the public release of our code will encourage practitioners to use population-based learning techniques more frequently for their research and applications.
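
The compilation-and-vectorization argument can be made concrete in a few lines of JAX: one jit-compiled call updates every member of a population in parallel on a single accelerator. This is a hedged sketch with a toy least-squares loss standing in for an RL objective; all names and shapes are illustrative, not the paper's code.

```python
import jax
import jax.numpy as jnp

def loss(params, batch):
    """Stand-in training objective (toy least squares, not an RL loss)."""
    x, y = batch
    return jnp.mean((x @ params - y) ** 2)

def train_step(params, batch, lr=1e-2):
    grads = jax.grad(loss)(params, batch)
    return params - lr * grads

# vmap over the population axis, then jit: one compiled call per training step.
population_step = jax.jit(jax.vmap(train_step, in_axes=(0, None)))

key = jax.random.PRNGKey(0)
population = jax.random.normal(key, (64, 16))   # 64 agents, 16 parameters each
batch = (jnp.ones((32, 16)), jnp.zeros(32))     # shared toy data
population = population_step(population, batch)
```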

NeurIPS 2017 · Conference Paper

Online Learning with a Hint

  • Ofer Dekel
  • Arthur Flajolet
  • Nika Haghtalab
  • Patrick Jaillet

We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round. We show that the player can benefit from such a hint if the set of feasible actions is sufficiently round. Specifically, if the set is strongly convex, the hint can be used to guarantee a regret of $O(\log T)$, and if the set is $q$-uniformly convex for $q \in (2, 3)$, the hint can be used to guarantee a regret of $o(\sqrt{T})$. In contrast, we establish $\Omega(\sqrt{T})$ lower bounds on regret when the set of feasible actions is a polyhedron.
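
A small simulation makes the setting concrete: with a hint vector whose inner product with the loss is $-\alpha \|c_t\|$, even the naive strategy of playing the hint itself accumulates loss that is far lower than the best fixed action in hindsight. This is a toy illustration with i.i.d. Gaussian losses, not the paper's algorithm, which handles adversarial losses and achieves the stated regret rates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, alpha = 5, 10_000, 0.3        # dimension, rounds, hint quality

def hint_for(c):
    """Unit vector h with <h, c> = -alpha * ||c||: weakly correlated with the loss."""
    u = c / np.linalg.norm(c)
    noise = rng.standard_normal(d)
    noise -= (noise @ u) * u                      # keep only the orthogonal part
    noise /= np.linalg.norm(noise)
    return -alpha * u + np.sqrt(1 - alpha ** 2) * noise

hint_loss, cum = 0.0, np.zeros(d)
for _ in range(T):
    c = rng.standard_normal(d)                    # this round's loss vector
    h = hint_for(c)                               # revealed before acting
    hint_loss += h @ c                            # play x_t = h, a point on the unit ball
    cum += c

best_fixed = -np.linalg.norm(cum)                 # best single action in hindsight
print(hint_loss, best_fixed)                      # the hint strategy wins by a wide margin
```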

NeurIPS 2017 · Conference Paper

Real-Time Bidding with Side Information

  • Arthur Flajolet
  • Patrick Jaillet

We consider the problem of repeated bidding in online advertising auctions when some side information (e.g., browser cookies) is available ahead of submitting a bid, in the form of a $d$-dimensional vector. The goal for the advertiser is to maximize the total utility (e.g., the total number of clicks) derived from displaying ads, given that a limited budget $B$ is allocated for a given time horizon $T$. Optimizing the bids is modeled as a contextual Multi-Armed Bandit (MAB) problem with a knapsack constraint and a continuum of arms. We develop UCB-type algorithms that combine two streams of literature: the confidence-set approach to linear contextual MABs and the probabilistic bisection search method for stochastic root-finding. Under mild assumptions on the underlying unknown distribution, we establish distribution-independent regret bounds of order $\tilde{O}(d \cdot \sqrt{T})$ when either $B = \infty$ or when $B$ scales linearly with $T$.
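
The UCB-type approach can be pictured with a stripped-down LinUCB bidder that tracks a ridge-regression estimate of expected utility and abstains once the budget runs out. This is a hedged sketch only: the paper additionally handles the knapsack constraint and the continuum of bids via probabilistic bisection, which this toy class (with its invented name and fixed bid level) omits.

```python
import numpy as np

class BudgetedLinUCB:
    """Toy LinUCB-style bidder (illustrative; not the paper's algorithm)."""

    def __init__(self, d, budget, bid_amount=1.0, beta=1.0, lam=1.0):
        self.budget, self.bid_amount, self.beta = budget, bid_amount, beta
        self.A = lam * np.eye(d)                   # regularized design matrix
        self.b = np.zeros(d)

    def choose(self, context):
        if self.budget < self.bid_amount:
            return 0.0                             # budget exhausted: abstain
        theta = np.linalg.solve(self.A, self.b)    # ridge estimate of utility weights
        width = self.beta * np.sqrt(context @ np.linalg.solve(self.A, context))
        ucb = context @ theta + width              # optimistic expected utility
        return self.bid_amount if ucb > self.bid_amount else 0.0

    def update(self, context, utility, cost):
        self.A += np.outer(context, context)       # standard LinUCB statistics
        self.b += utility * context
        self.budget -= cost

# Toy usage with a random context vector.
bidder = BudgetedLinUCB(d=4, budget=100.0)
ctx = np.random.default_rng(0).standard_normal(4)
bid = bidder.choose(ctx)
```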