Arrow Research search

Author name cluster

Alexandre Fréchette

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches; it is not a full author-identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

NeurIPS 2025 · Conference Paper

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for large language models

  • Nicolas Le Roux
  • Marc Bellemare
  • Jonathan Lebensold
  • Arnaud Bergeron
  • Joshua Greaves
  • Alexandre Fréchette
  • Carolyne Pelletier
  • Eric Thibodeau-Laufer

We propose a new algorithm for fine-tuning large language models using reinforcement learning. Tapered Off-Policy REINFORCE (TOPR) uses an asymmetric, tapered variant of importance sampling to speed up learning while maintaining stable learning dynamics, even without the use of KL regularization. TOPR can be applied in a fully offline fashion, allows the handling of positive and negative examples in a unified framework, and benefits from the implementational simplicity that is typical of Monte Carlo algorithms. We demonstrate the effectiveness of our approach with a series of experiments on the GSM8K and MATH reasoning benchmarks, finding performance gains when training a model both for solution generation and as a generative verifier. We show that properly leveraging positive and negative examples alike in the off-policy regime simultaneously increases test-time accuracy and training data efficiency, all the while avoiding the "wasted inference" that comes with discarding negative examples. We find that this advantage persists over multiple iterations of training and can be amplified by dataset curation techniques, enabling us to match the performance of 70B-parameter models with 8B language models. As a corollary to this work, we find that REINFORCE's baseline parameter plays an important and unexpected role in defining dataset composition in the presence of negative examples, and is consequently critical in driving off-policy performance.
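
To make the tapering idea concrete, the following is a minimal, illustrative PyTorch sketch of an asymmetric tapered importance-sampling REINFORCE loss. The clamp-based taper, the tau_pos/tau_neg thresholds, and the scalar baseline are assumptions made for illustration; this is not the paper's exact TOPR objective.

```python
import torch

def tapered_reinforce_loss(logp_new, logp_old, reward,
                           baseline=0.0, tau_pos=1.0, tau_neg=1.0):
    """Illustrative asymmetric tapered off-policy REINFORCE surrogate.

    logp_new: log-prob of each sampled sequence under the current policy.
    logp_old: log-prob under the behavior policy that generated the data.
    reward:   scalar return per sequence.
    """
    # Off-policy importance ratio between current and behavior policies.
    ratio = torch.exp(logp_new - logp_old)
    advantage = reward - baseline
    # Asymmetric taper (assumed form): cap the ratio with different
    # thresholds for positive- and negative-advantage sequences, so
    # neither side of the data can destabilize the update.
    tapered = torch.where(advantage >= 0,
                          ratio.clamp(max=tau_pos),
                          ratio.clamp(max=tau_neg))
    # REINFORCE surrogate: the weight is detached so the gradient flows
    # only through the current policy's log-probabilities.
    return -(tapered.detach() * advantage * logp_new).mean()
```

Note how the abstract's closing observation surfaces even in this sketch: moving the baseline changes the sign of the advantage, and therefore which side of the taper a sequence falls on.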

NeurIPS 2023 · Conference Paper

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

  • Viorica Patraucean
  • Lucas Smaira
  • Ankush Gupta
  • Adria Recasens
  • Larisa Markeeva
  • Dylan Banarse
  • Skanda Koppula
  • Joseph Heyward

We propose a novel multimodal video benchmark - the Perception Test - to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g. Flamingo, BEiT-3, or GPT-4). Compared to existing benchmarks that focus on computational tasks (e.g. classification, detection or tracking), the Perception Test focuses on skills (Memory, Abstraction, Physics, Semantics) and types of reasoning (descriptive, explanatory, predictive, counterfactual) across video, audio, and text modalities, to provide a comprehensive and efficient evaluation tool. The benchmark probes pre-trained models for their transfer capabilities, in a zero-shot / few-shot or limited finetuning regime. For these purposes, the Perception Test introduces 11.6k real-world videos, 23s average length, designed to show perceptually interesting situations, filmed by around 100 participants worldwide. The videos are densely annotated with six types of labels (multiple-choice and grounded video question-answers, object and point tracks, temporal action and sound segments), enabling both language and non-language evaluations. The fine-tuning and validation splits of the benchmark are publicly available (CC-BY license), in addition to a challenge server with a held-out test split. Human baseline results compared to state-of-the-art video QA models show a significant gap in performance (91.4% vs 45.8%), suggesting that there is significant room for improvement in multimodal video understanding. Dataset, baselines code, and challenge server are available at https://github.com/deepmind/perception_test
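
As a rough illustration of the six label types the abstract enumerates, here is one plausible shape for a per-video annotation record. All field names are hypothetical stand-ins, not the schema of the actual dataset release.

```python
from dataclasses import dataclass, field

@dataclass
class VideoAnnotations:
    # Hypothetical record shape; field names are illustrative only.
    video_id: str
    mc_video_qa: list = field(default_factory=list)        # multiple-choice question-answers
    grounded_video_qa: list = field(default_factory=list)  # answers grounded in object tracks
    object_tracks: list = field(default_factory=list)      # per-frame bounding boxes
    point_tracks: list = field(default_factory=list)       # per-frame (x, y) locations
    action_segments: list = field(default_factory=list)    # (start_s, end_s, action_label)
    sound_segments: list = field(default_factory=list)     # (start_s, end_s, sound_label)
```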

IJCAI 2018 · Conference Paper

Quantifying Algorithmic Improvements over Time

  • Lars Kotthoff
  • Alexandre Fréchette
  • Tomasz Michalak
  • Talal Rahwan
  • Holger H. Hoos
  • Kevin Leyton-Brown

Assessing the progress made in AI and contributions to the state of the art is of major concern to the community. Recently, Fréchette et al. [2016] advocated performing such analysis via the Shapley value, a concept from coalitional game theory. In this paper, we argue that while this general idea is sound, it unfairly penalizes older algorithms that advanced the state of the art when introduced, but were then outperformed by modern counterparts. Driven by this observation, we introduce the temporal Shapley value, a measure that addresses this problem while maintaining the desirable properties of the (classical) Shapley value. We use the temporal Shapley value to analyze the progress made in (i) the different versions of the Quicksort algorithm; (ii) the annual SAT competitions 2007–2014; (iii) an annual competition of Constraint Programming, namely the MiniZinc challenge 2014–2016. Our analysis reveals novel insights into the development of these important areas of research over time.
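
For reference, the classical Shapley value that both this paper and Fréchette et al. [2016] build on is the average marginal contribution of an algorithm over all orders in which a portfolio can be assembled. A sketch of the standard definition, with v(S) assumed to be a portfolio performance measure such as the number of instances solved:

```latex
% Shapley value of algorithm i in the set N, where v(S) is the
% performance of the portfolio formed by the subset S of N:
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
            \frac{|S|! \, (|N| - |S| - 1)!}{|N|!}
            \left( v(S \cup \{i\}) - v(S) \right)
```

The temporal Shapley value modifies this so that an algorithm is also credited for advancing the state of the art at its time of introduction; the precise definition is given in the paper and is not reproduced here.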

AAAI 2016 · Conference Paper

Solving the Station Repacking Problem

  • Alexandre Fréchette
  • Neil Newman
  • Kevin Leyton-Brown

We investigate the problem of repacking stations in the FCC’s upcoming, multi-billion-dollar “incentive auction”. Early efforts to solve this problem considered mixed-integer programming formulations, which we show are unable to reliably solve realistic, national-scale problem instances. We describe the result of a multi-year investigation of alternatives: a solver, SATFC, that has been adopted by the FCC for use in the incentive auction. SATFC is based on a SAT encoding paired with a wide range of techniques: constraint graph decomposition; novel caching mechanisms that allow for reuse of partial solutions from related, solved problems; algorithm configuration; algorithm portfolios; and the marriage of local-search and complete solver strategies. We show that our approach solves virtually all of a set of problems derived from auction simulations within the short time budget required in practice.
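
To illustrate the style of SAT encoding the abstract describes (not SATFC's actual encoding), here is a minimal sketch using the python-sat library, with invented channel domains and interference pairs: each Boolean variable asserts that a station is packed on a particular channel, and clauses enforce one channel per station plus the interference constraints.

```python
from itertools import combinations
from pysat.solvers import Glucose3

# Toy instance: per-station channel domains and forbidden co-assignments.
domains = {"A": [1, 2], "B": [1, 2], "C": [2, 3]}
interference = [(("A", 1), ("B", 1)), (("B", 2), ("C", 2))]

# Boolean variable v(s, c) is true iff station s is assigned channel c.
var = {}
def v(s, c):
    return var.setdefault((s, c), len(var) + 1)

solver = Glucose3()
for s, chans in domains.items():
    # Each station gets at least one channel from its domain...
    solver.add_clause([v(s, c) for c in chans])
    # ...and at most one (pairwise encoding).
    for c1, c2 in combinations(chans, 2):
        solver.add_clause([-v(s, c1), -v(s, c2)])
# Interfering (station, channel) pairs cannot both be chosen.
for (s1, c1), (s2, c2) in interference:
    solver.add_clause([-v(s1, c1), -v(s2, c2)])

if solver.solve():
    model = set(solver.get_model())
    packing = {s: c for (s, c), lit in var.items() if lit in model}
    print(packing)  # e.g. {'A': 1, 'B': 2, 'C': 3}
else:
    print("no feasible repacking")
```

Constraints that touch only pairs of stations are also what makes the constraint graph decomposition mentioned in the abstract natural.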

AAAI 2016 · Conference Paper

Using the Shapley Value to Analyze Algorithm Portfolios

  • Alexandre Fréchette
  • Lars Kotthoff
  • Tomasz Michalak
  • Talal Rahwan
  • Holger Hoos
  • Kevin Leyton-Brown

Algorithms for NP-complete problems often have different strengths and weaknesses, and thus algorithm portfolios often outperform individual algorithms. It is surprisingly difficult to quantify a component algorithm’s contribution to such a portfolio. Reporting a component’s standalone performance wrongly rewards near-clones while penalizing algorithms that have small but distinct areas of strength. Measuring a component’s marginal contribution to an existing portfolio is better, but penalizes sets of strongly correlated algorithms, thereby obscuring situations in which it is essential to have at least one algorithm from such a set. This paper argues for analyzing component algorithm contributions via a measure drawn from coalitional game theory—the Shapley value—and yields insight into a research community’s progress over time. We conclude with an application of the analysis we advocate to SAT competitions, yielding novel insights into the behaviour of algorithm portfolios, their components, and the state of SAT solving technology.
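
A small, self-contained sketch of the advocated analysis: value each coalition of solvers by its virtual-best-solver performance (instances solved by per-instance oracle selection) and average each solver's marginal contribution over all join orders. The runtimes and timeout below are invented; real analyses use measured competition data.

```python
from itertools import permutations
from math import factorial

# Hypothetical runtimes (seconds) of three solvers on four instances.
# An instance counts as solved if its runtime is under the timeout.
runtimes = {
    "s1": [10, 900, 5, 900],
    "s2": [900, 20, 900, 900],
    "s3": [12, 900, 6, 40],   # near-clone of s1, plus instance 3
}
TIMEOUT = 600

def value(coalition):
    # Coalition value: instances solved by the virtual best solver,
    # i.e., per-instance oracle selection among coalition members.
    if not coalition:
        return 0
    return sum(min(runtimes[s][i] for s in coalition) < TIMEOUT
               for i in range(4))

solvers = list(runtimes)
shapley = dict.fromkeys(solvers, 0.0)
# Exact Shapley value: average marginal contribution over all join orders.
for order in permutations(solvers):
    seen = []
    for s in order:
        shapley[s] += value(seen + [s]) - value(seen)
        seen.append(s)
for s in shapley:
    shapley[s] /= factorial(len(solvers))

print(shapley)  # {'s1': 1.0, 's2': 1.0, 's3': 2.0}
```

On this toy data, s1 and s3 are near-clones on two instances and split the credit for them, so s1's Shapley value falls to 1.0 despite solving two instances standalone, while s2 keeps full credit for the one instance only it solves. That is the near-clone effect described in the abstract.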