Arrow Research search

Author name cluster

Alexandre Passos

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers (6)

ICML 2023 · Conference Paper

Arithmetic Sampling: Parallel Diverse Decoding for Large Language Models

  • Luke Vilnis
  • Yury Zemlyanskiy
  • Patrick Murray
  • Alexandre Passos
  • Sumit Sanghai

Decoding methods for large language models often trade off between diversity of outputs and parallelism of computation. Methods such as beam search and Gumbel top-k sampling can guarantee a different output for each element of the beam, but are not easy to parallelize. Alternatively, methods such as temperature sampling and its modifications (top-k sampling, nucleus sampling, typical decoding, and others) are embarrassingly parallel, but have no guarantees about duplicate samples. We present a framework for sampling according to an arithmetic code book implicitly defined by a large language model, compatible with common sampling variations, with provable beam diversity under certain conditions, as well as being embarrassingly parallel and providing unbiased and consistent expectations from the original model. We demonstrate the effectiveness of our approach on WMT machine translation, more than halving the standard deviation when estimating expected BLEU score reward, and closing the BLEU score gap between independent sampling and beam search by up to 63%.
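
The code book construction is easy to miss in the abstract, so here is a minimal Python sketch of the idea, not the authors' implementation: each sample is identified by a code point in [0, 1), decoding walks the model's conditional distributions by interval subdivision, and diversity comes from spacing the code points evenly. The next_token_probs callback is a hypothetical stand-in for the language model.

    import random

    def arithmetic_sample(next_token_probs, u, max_len, eos):
        """Decode one sequence from a single code point u in [0, 1).

        next_token_probs(prefix) -> list of (token, prob) pairs summing to 1.
        At each step the current code interval is subdivided in proportion to
        the token probabilities; the token whose sub-interval contains u is
        chosen, and that sub-interval becomes the new interval.
        """
        lo, hi = 0.0, 1.0
        seq = []
        for _ in range(max_len):
            dist = next_token_probs(seq)
            cum = lo
            for i, (tok, p) in enumerate(dist):
                nxt = cum + p * (hi - lo)
                if u < nxt or i == len(dist) - 1:  # last token absorbs float slack
                    lo, hi = cum, nxt
                    seq.append(tok)
                    break
                cum = nxt
            if seq[-1] == eos:
                break
        return seq

    def arithmetic_beam(next_token_probs, n, max_len, eos):
        # Evenly spaced code points with one shared uniform offset: each point
        # is marginally Uniform[0, 1), so each sample is an unbiased draw from
        # the model, yet no two code points can coincide.
        offset = random.random()
        return [arithmetic_sample(next_token_probs, (i + offset) / n, max_len, eos)
                for i in range(n)]

Each call to arithmetic_sample depends only on its own code point, which is what makes the scheme embarrassingly parallel.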

JMLR 2023 · Journal Article

Scaling Up Models and Data with t5x and seqio

  • Adam Roberts
  • Hyung Won Chung
  • Gaurav Mishra
  • Anselm Levskaya
  • James Bradbury
  • Daniel Andor
  • Sharan Narang
  • Brian Lester

Scaling up training datasets and model parameters has benefited neural network-based language models, but also presents challenges such as distributed computation, input data bottlenecks, and reproducibility of results. We introduce two simple and scalable software libraries that address these issues: t5x enables training large language models at scale, while seqio enables reproducible input and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on multi-terabyte datasets. Configurations and instructions for T5-like and GPT-like models are also provided. The libraries can be found at https://github.com/google-research/t5x and https://github.com/google/seqio.
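
For a sense of what a reproducible input pipeline means in practice, here is a sketch of a seqio task registration using the library's public API; the task name, TFDS dataset string, and vocabulary path are placeholder assumptions, not taken from the paper.

    import functools

    import seqio

    # Placeholder path; any SentencePiece model file would do.
    VOCAB = seqio.SentencePieceVocabulary("/path/to/sentencepiece.model")

    seqio.TaskRegistry.add(
        "my_translation_task",  # hypothetical task name
        source=seqio.TfdsDataSource(tfds_name="wmt14_translate/de-en:1.0.0"),
        preprocessors=[
            functools.partial(seqio.preprocessors.rekey,
                              key_map={"inputs": "de", "targets": "en"}),
            seqio.preprocessors.tokenize,
            seqio.preprocessors.append_eos,
        ],
        output_features={
            "inputs": seqio.Feature(vocabulary=VOCAB),
            "targets": seqio.Feature(vocabulary=VOCAB),
        },
    )

    # The registered name is the reproducibility handle: any trainer (t5x
    # included) can rebuild the identical pipeline from it.
    ds = seqio.get_mixture_or_task("my_translation_task").get_dataset(
        sequence_length={"inputs": 128, "targets": 128}, split="train")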

UAI 2014 · Conference Paper

Message Passing for Soft Constraint Dual Decomposition

  • David Belanger 0002
  • Alexandre Passos
  • Sebastian Riedel 0001
  • Andrew McCallum

Dual decomposition provides the opportunity to build complex, yet tractable, structured prediction models using linear constraints to link together submodels that have available MAP inference routines. However, since some constraints might not hold on every single example, such models can often be improved by relaxing the requirement that these constraints always hold, instead replacing them with soft constraints that merely impose a penalty if violated. A dual objective for the resulting MAP inference problem differs from the hard-constraint problem’s associated dual decomposition objective only in that the dual variables are subject to box constraints. This paper introduces a novel primal-dual block coordinate descent algorithm for minimizing this general family of box-constrained objectives. Through experiments on two natural language corpus-wide inference tasks, we demonstrate the advantages of our approach over the current alternative, which is based on copying variables, adding auxiliary submodels, and using traditional dual decomposition. Our algorithm performs inference in the same model as was previously published for these tasks, and thus is capable of achieving the same accuracy, but provides a 2-10x speedup over the current state of the art.
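
To make the box constraint concrete: if an equality constraint linking two submodels is softened with an L1 penalty of weight c, the corresponding dual variable is confined to [-c, c]. The sketch below illustrates this with plain projected subgradient on a single shared variable; the paper's actual contribution is a faster primal-dual block coordinate descent method, which this does not reproduce.

    def clip(x, lo, hi):
        return max(lo, min(hi, x))

    def soft_dual_decomposition(map_a, map_b, c, steps=100, eta=0.5):
        """Projected subgradient on the box-constrained dual.

        map_a(lam) -> argmax_y [ f(y) + lam * y ]  (submodel A's MAP routine)
        map_b(lam) -> argmax_y [ g(y) - lam * y ]  (submodel B's MAP routine)
        A hard constraint y_a = y_b would leave lam unconstrained; softening
        it with penalty weight c restricts lam to the box [-c, c].
        """
        lam = 0.0
        for t in range(1, steps + 1):
            ya, yb = map_a(lam), map_b(lam)
            if ya == yb:
                break  # submodels agree; the constraint is satisfied
            # The dual subgradient in lam is (ya - yb): step, then project.
            lam = clip(lam - (eta / t) * (ya - yb), -c, c)
        return ya, yb, lam

When lam hits the boundary, the penalty budget is exhausted and the submodels are allowed to disagree, which is exactly the intended soft-constraint behavior.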

NeurIPS 2012 · Conference Paper

MAP Inference in Chains using Column Generation

  • David Belanger
  • Alexandre Passos
  • Sebastian Riedel
  • Andrew McCallum

Linear chains and trees are basic building blocks in many applications of graphical models. Although exact inference in these models can be performed by dynamic programming, this computation can still be prohibitively expensive with non-trivial target variable domain sizes due to the quadratic dependence on this size. Standard message-passing algorithms for these problems are inefficient because they compute scores on hypotheses for which there is strong negative local evidence. For this reason there has been significant previous interest in beam search and its variants; however, these methods provide only approximate inference. This paper presents new efficient exact inference algorithms based on the combination of column generation and pre-computed bounds on the model's cost structure. While the method does not improve worst-case performance, it substantially speeds up real-world, typical-case inference in chains and trees. Experiments show our method to be twice as fast as exact Viterbi for Wall Street Journal part-of-speech tagging and over thirteen times faster for a joint part-of-speech and named-entity-recognition task. Our algorithm is also extendable to new techniques for approximate inference, faster two-best inference, and new opportunities for connections between inference and learning.
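
The quadratic cost the abstract refers to is visible in a plain Viterbi implementation: each of the n positions scans a full K x K transition table. Below is a minimal NumPy sketch of that exact baseline; the paper's column generation method (not reproduced here) avoids most of this work by keeping a small active label set per position and expanding it only when a precomputed bound demands it.

    import numpy as np

    def viterbi(unary, pairwise):
        """Exact MAP in a linear chain in O(n * K^2) time.

        unary:    (n, K) per-position label scores.
        pairwise: (K, K) transition scores.
        """
        n, K = unary.shape
        score = unary[0].copy()
        back = np.zeros((n, K), dtype=int)
        for i in range(1, n):
            # This K x K candidate table is the quadratic dependence on
            # domain size that column generation sidesteps.
            cand = score[:, None] + pairwise  # rows: previous label, cols: next
            back[i] = cand.argmax(axis=0)
            score = cand.max(axis=0) + unary[i]
        labels = [int(score.argmax())]
        for i in range(n - 1, 0, -1):
            labels.append(int(back[i, labels[-1]]))
        return labels[::-1]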

JMLR 2011 · Journal Article

Scikit-learn: Machine Learning in Python

  • Fabian Pedregosa
  • Gaël Varoquaux
  • Alexandre Gramfort
  • Vincent Michel
  • Bertrand Thirion
  • Olivier Grisel
  • Mathieu Blondel
  • Peter Prettenhofer

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
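
The API consistency the abstract emphasizes is the uniform estimator interface: every model is trained with fit and queried with predict. A minimal example using the modern module layout (the project has since moved from SourceForge to scikit-learn.org, and module names such as model_selection postdate this 2011 paper):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Any estimator in the package exposes the same fit/predict interface.
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    print(accuracy_score(y_test, clf.predict(X_test)))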