Arrow Research search

Author name cluster

Alexandre Bouchard-Côté

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

20 papers
2 author rows

Possible papers

20

ICML Conference 2025 Conference Paper

AutoStep: Locally adaptive involutive MCMC

  • Tiange Liu
  • Nikola Surjanovic
  • Miguel Biron-Lattes
  • Alexandre Bouchard-Côté
  • Trevor Campbell

Many common Markov chain Monte Carlo (MCMC) kernels can be formulated using a deterministic involutive proposal with a step size parameter. Selecting an appropriate step size is often a challenging task in practice; and for complex multiscale targets, there may not be one choice of step size that works well globally. In this work, we address this problem with a novel class of involutive MCMC methods—AutoStep MCMC—that selects an appropriate step size at each iteration adapted to the local geometry of the target distribution. We prove that under mild conditions AutoStep MCMC is $\pi$-invariant, irreducible, and aperiodic, and obtain bounds on expected energy jump distance and cost per iteration. Empirical results examine the robustness and efficacy of our proposed step size selection procedure, and show that AutoStep MCMC is competitive with state-of-the-art methods in terms of effective sample size per unit cost on a range of challenging target distributions.

ICML Conference 2025 Conference Paper

Variational Phylogenetic Inference with Products over Bipartitions

  • Evan Sidrow
  • Alexandre Bouchard-Côté
  • Lloyd T. Elliott

Bayesian phylogenetics is vital for understanding evolutionary dynamics, and requires accurate and efficient approximation of posterior distributions over trees. In this work, we develop a variational Bayesian approach for ultrametric phylogenetic trees. We present a novel variational family based on coalescent times of a single-linkage clustering and derive a closed-form density for the resulting distribution over trees. Unlike existing methods for ultrametric trees, our method performs inference over all of tree space, it does not require any Markov chain Monte Carlo subroutines, and our variational family is differentiable. Through experiments on benchmark genomic datasets and an application to the viral RNA of SARS-CoV-2, we demonstrate that our method achieves competitive accuracy while requiring significantly fewer gradient evaluations than existing state-of-the-art techniques.

NeurIPS Conference 2022 Conference Paper

Parallel Tempering With a Variational Reference

  • Nikola Surjanovic
  • Saifuddin Syed
  • Alexandre Bouchard-Côté
  • Trevor Campbell

Sampling from complex target distributions is a challenging task fundamental to Bayesian inference. Parallel tempering (PT) addresses this problem by constructing a Markov chain on the expanded state space of a sequence of distributions interpolating between the posterior distribution and a fixed reference distribution, which is typically chosen to be the prior. However, in the typical case where the prior and posterior are nearly mutually singular, PT methods are computationally prohibitive. In this work we address this challenge by constructing a generalized annealing path connecting the posterior to an adaptively tuned variational reference. The reference distribution is tuned to minimize the forward (inclusive) KL divergence to the posterior distribution using a simple, gradient-free moment-matching procedure. We show that our adaptive procedure converges to the forward KL minimizer, and that the forward KL divergence serves as a good proxy to a previously developed measure of PT performance. We also show that in the large-data limit in typical Bayesian models, the proposed method improves in performance, while traditional PT deteriorates arbitrarily. Finally, we introduce PT with two references---one fixed, one variational---with a novel split annealing path that ensures stable variational reference adaptation. The paper concludes with experiments that demonstrate the large empirical gains achieved by our method in a wide range of realistic Bayesian inference scenarios.

JMLR Journal 2021 Journal Article

Analysis of high-dimensional Continuous Time Markov Chains using the Local Bouncy Particle Sampler

  • Tingting Zhao
  • Alexandre Bouchard-Côté

Sampling the parameters of high-dimensional Continuous Time Markov Chains (CTMC) is a challenging problem with important applications in many fields of applied statistics. In this work a recently proposed type of non-reversible rejection-free Markov Chain Monte Carlo (MCMC) sampler, the Bouncy Particle Sampler (BPS), is brought to bear to this problem. BPS has demonstrated its favourable computational efficiency compared with state-of-the-art MCMC algorithms, however to date applications to real-data scenario were scarce. An important aspect of practical implementation of BPS is the simulation of event times. Default implementations use conservative thinning bounds. Such bounds can slow down the algorithm and limit the computational performance. Our paper develops an algorithm with exact analytical solution to the random event times in the context of CTMCs. Our local version of BPS algorithm takes advantage of the sparse structure in the target factor graph and we also provide a graph-theoretic tool for assessing the computational complexity of local BPS algorithms. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2021. ( edit, beta )

ICML Conference 2021 Conference Paper

Parallel tempering on optimized paths

  • Saifuddin Syed
  • Vittorio Romaniello
  • Trevor Campbell
  • Alexandre Bouchard-Côté

Parallel tempering (PT) is a class of Markov chain Monte Carlo algorithms that constructs a path of distributions annealing between a tractable reference and an intractable target, and then interchanges states along the path to improve mixing in the target. The performance of PT depends on how quickly a sample from the reference distribution makes its way to the target, which in turn depends on the particular path of annealing distributions. However, past work on PT has used only simple paths constructed from convex combinations of the reference and target log-densities. This paper begins by demonstrating that this path performs poorly in the setting where the reference and target are nearly mutually singular. To address this issue, we expand the framework of PT to general families of paths, formulate the choice of path as an optimization problem that admits tractable gradient estimates, and propose a flexible new family of spline interpolation paths for use in practice. Theoretical and empirical results both demonstrate that our proposed methodology breaks previously-established upper performance limits for traditional paths.

JMLR Journal 2021 Journal Article

Particle-Gibbs Sampling for Bayesian Feature Allocation Models

  • Alexandre Bouchard-Côté
  • Andrew Roth

Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on a single variable Gibbs update of the feature allocation matrix. These updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a block sampler that can update an entire row of the feature allocation matrix in a single move. In the context of feature allocation models, naive block Gibbs sampling is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs (PG) sampler that targets the same distribution as the row wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic and real data from a range of feature allocation models. Our results suggest that row wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2021. ( edit, beta )

UAI Conference 2020 Conference Paper

Slice Sampling for General Completely Random Measures

  • Peiyuan Zhu
  • Alexandre Bouchard-Côté
  • Trevor Campbell

Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Due to the infinity the latent features, posterior inference requires either marginalization—resulting in dependence structures that prevent efficient computation via parallelization and conjugacy—or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. In contrast to past work that achieved this on a model-by-model basis, we provide a general recipe that is applicable to the broad class of completely random measure-based priors. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models, demonstrating a higher effective sample size per second compared to algorithms using marginalization as well as a higher predictive performance compared to models employing fixed truncations.

ICML Conference 2019 Conference Paper

Scalable Metropolis-Hastings for Exact Bayesian Inference with Large Datasets

  • Robert Cornish
  • Paul Vanetti
  • Alexandre Bouchard-Côté
  • George Deligiannidis
  • Arnaud Doucet

Bayesian inference via standard Markov Chain Monte Carlo (MCMC) methods such as Metropolis-Hastings is too computationally intensive to handle large datasets, since the cost per step usually scales like $O(n)$ in the number of data points $n$. We propose the Scalable Metropolis-Hastings (SMH) kernel that only requires processing on average $O(1)$ or even $O(1/\sqrt{n})$ data points per step. This scheme is based on a combination of factorized acceptance probabilities, procedures for fast simulation of Bernoulli processes, and control variate ideas. Contrary to many MCMC subsampling schemes such as fixed step-size Stochastic Gradient Langevin Dynamics, our approach is exact insofar as the invariant distribution is the true posterior and not an approximation to it. We characterise the performance of our algorithm theoretically, and give realistic and verifiable conditions under which it is geometrically ergodic. This theory is borne out by empirical results that demonstrate overall performance benefits over standard Metropolis-Hastings and various subsampling algorithms.

JMLR Journal 2017 Journal Article

Particle Gibbs Split-Merge Sampling for Bayesian Inference in Mixture Models

  • Alexandre Bouchard-Côté
  • Arnaud Doucet
  • Andrew Roth

This paper presents an original Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. This algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler introduced in Andrieu et al. (2009, 2010). The resulting so-called Particle Gibbs Split-Merge sampler does not require the computation of a complex acceptance ratio and can be implemented using existing sequential Monte Carlo libraries. We investigate its performance experimentally on synthetic problems as well as on geolocation data. Our results show that for a given computational budget, the Particle Gibbs Split-Merge sampler empirically outperforms existing split merge methods. The code and instructions allowing to reproduce the experiments is available at github.com/aroth85/pgsm. [abs] [ pdf ][ bib ] &copy JMLR 2017. ( edit, beta )

ICML Conference 2015 Conference Paper

Atomic Spatial Processes

  • Sean Jewell
  • Neil Spencer
  • Alexandre Bouchard-Côté

The emergence of compact GPS systems and the establishment of open data initiatives has resulted in widespread availability of spatial data for many urban centres. These data can be leveraged to develop data-driven intelligent resource allocation systems for urban issues such as policing, sanitation, and transportation. We employ techniques from Bayesian non-parametric statistics to develop a process which captures a common characteristic of urban spatial datasets. Specifically, our new spatial process framework models events which occur repeatedly at discrete spatial points, the number and locations of which are unknown a priori. We develop a representation of our spatial process which facilitates posterior simulation, resulting in an interpretable and computationally tractable model. The framework’s superiority over both empirical grid-based models and Dirichlet process mixture models is demonstrated by fitting, interpreting, and comparing models of graffiti prevalence for both downtown Vancouver and Manhattan.

ICML Conference 2014 Conference Paper

Efficient Continuous-Time Markov Chain Estimation

  • Monir Hajiaghayi
  • Bonnie Kirkpatrick
  • Liangliang Wang
  • Alexandre Bouchard-Côté

Many problems of practical interest rely on Continuous-time Markov chains (CTMCs) defined over combinatorial state spaces, rendering the computation of transition probabilities, and hence probabilistic inference, difficult or impossible with existing methods. For problems with countably infinite states, where classical methods such as matrix exponentiation are not applicable, the main alternative has been particle Markov chain Monte Carlo methods imputing both the holding times and sequences of visited states. We propose a particle-based Monte Carlo approach where the holding times are marginalized analytically. We demonstrate that in a range of realistic inferential setups, our scheme dramatically reduces the variance of the Monte Carlo approximation and yields more accurate parameter posterior approximations given a fixed computational budget. These experiments are performed on both synthetic and real datasets, drawing from two important examples of CTMCs having combinatorial state spaces: string-valued mutation models in phylogenetics and nucleic acid folding pathways.

ICML Conference 2014 Conference Paper

Memory (and Time) Efficient Sequential Monte Carlo

  • Seong-Hwan Jun
  • Alexandre Bouchard-Côté

Memory efficiency is an important issue in Sequential Monte Carlo (SMC) algorithms, arising for example in inference of high-dimensional latent variables via Rao-Blackwellized SMC algorithms, where the size of individual particles combined with the required number of particles can stress the main memory. Standard SMC methods have a memory requirement that scales linearly in the number of particles present at all stage of the algorithm. Our contribution is a simple scheme that makes the memory cost of SMC methods depends on the number of distinct particles that survive resampling. We show that this difference has a large empirical impact on the quality of the approximation in realistic scenarios, and also—since memory access is generally slow—on the running time. The method is based on a two pass generation of the particles, which are represented implicitly in the first pass. We parameterize the accuracy of our algorithm with a memory budget rather than with a fixed number of particles. Our algorithm adaptively selects an optimal number of particle to exploit this fixed memory budget. We show that this adaptation does not interfere with the usual consistency guarantees that come with SMC algorithms.

NeurIPS Conference 2012 Conference Paper

Bayesian Pedigree Analysis using Measure Factorization

  • Bonnie Kirkpatrick
  • Alexandre Bouchard-Côté

Pedigrees, or family trees, are directed graphs used to identify sites of the genome that are correlated with the presence or absence of a disease. With the advent of genotyping and sequencing technologies, there has been an explosion in the amount of data available, both in the number of individuals and in the number of sites. Some pedigrees number in the thousands of individuals. Meanwhile, analysis methods have remained limited to pedigrees of <100 individuals which limits analyses to many small independent pedigrees. Disease models, such those used for the linkage analysis log-odds (LOD) estimator, have similarly been limited. This is because linkage anlysis was originally designed with a different task in mind, that of ordering the sites in the genome, before there were technologies that could reveal the order. LODs are difficult to interpret and nontrivial to extend to consider interactions among sites. These developments and difficulties call for the creation of modern methods of pedigree analysis. Drawing from recent advances in graphical model inference and transducer theory, we introduce a simple yet powerful formalism for expressing genetic disease models. We show that these disease models can be turned into accurate and efficient estimators. The technique we use for constructing the variational approximation has potential applications to inference in other large-scale graphical models. This method allows inference on larger pedigrees than previously analyzed in the literature, which improves disease site prediction.

NeurIPS Conference 2012 Conference Paper

Entangled Monte Carlo

  • Seong-Hwan Jun
  • Liangliang Wang
  • Alexandre Bouchard-Côté

We propose a novel method for scalable parallelization of SMC algorithms, Entangled Monte Carlo simulation (EMC). EMC avoids the transmission of particles between nodes, and instead reconstructs them from the particle genealogy. In particular, we show that we can reduce the communication to the particle weights for each machine while efficiently maintaining implicit global coherence of the parallel simulation. We explain methods to efficiently maintain a genealogy of particles from which any particle can be reconstructed. We demonstrate using examples from Bayesian phylogenetic that the computational gain from parallelization using EMC significantly outweighs the cost of particle reconstruction. The timing experiments show that reconstruction of particles is indeed much more efficient as compared to transmission of particles.

NeurIPS Conference 2011 Conference Paper

Priors over Recurrent Continuous Time Processes

  • Ardavan Saeedi
  • Alexandre Bouchard-Côté

We introduce the Gamma-Exponential Process (GEP), a prior over a large family of continuous time stochastic processes. A hierarchical version of this prior (HGEP; the Hierarchical GEP) yields a useful model for analyzing complex time series. Models based on HGEPs display many attractive properties: conjugacy, exchangeability and closed-form predictive distribution for the waiting times, and exact Gibbs updates for the time scale parameters. After establishing these properties, we show how posterior inference can be carried efficiently using Particle MCMC methods [1]. This yields a MCMC algorithm that can resample entire sequences atomically while avoiding the complications of introducing slice and stick auxiliary variables of the beam sampler [2]. We applied our model to the problem of estimating the disease progression in multiple sclerosis [3], and to RNA evolutionary modeling [4]. In both domains, we found that our model outperformed the standard rate matrix estimation approach.

NeurIPS Conference 2010 Conference Paper

Variational Inference over Combinatorial Spaces

  • Alexandre Bouchard-Côté
  • Michael Jordan

Since the discovery of sophisticated fully polynomial randomized algorithms for a range of #P problems (Karzanov et al. , 1991; Jerrum et al. , 2001; Wilson, 2004), theoretical work on approximate inference in combinatorial spaces has focused on Markov chain Monte Carlo methods. Despite their strong theoretical guarantees, the slow running time of many of these randomized algorithms and the restrictive assumptions on the potentials have hindered the applicability of these algorithms to machine learning. Because of this, in applications to combinatorial spaces simple exact models are often preferred to more complex models that require approximate inference (Siepel et al. , 2004). Variational inference would appear to provide an appealing alternative, given the success of variational methods for graphical models (Wainwright et al. , 2008); unfortunately, however, it is not obvious how to develop variational approximations for combinatorial objects such as matchings, partial orders, plane partitions and sequence alignments. We propose a new framework that extends variational inference to a wide range of combinatorial spaces. Our method is based on a simple assumption: the existence of a tractable measure factorization, which we show holds in many examples. Simulations on a range of matching models show that the algorithm is more general and empirically faster than a popular fully polynomial randomized algorithm. We also apply the framework to the problem of multiple alignment of protein sequences, obtaining state-of-the-art results on the BAliBASE dataset (Thompson et al. , 1999).

UAI Conference 2009 Conference Paper

Optimization of Structured Mean Field Objectives

  • Alexandre Bouchard-Côté
  • Michael I. Jordan

In intractable, undirected graphical models, an intuitive way of creating structured mean field approximations is to select an acyclic tractable subgraph. We show that the hardness of computing the objective function and gradient of the mean field objective qualitatively depends on a simple graph property. If the tractable subgraph has this property— we call such subgraphs v-acyclic—a very fast block coordinate ascent algorithm is possible. If not, optimization is harder, but we show a new algorithm based on the construction of an auxiliary exponential family that can be used to make inference possible in this case as well. We discuss the advantages and disadvantages of each regime and compare the algorithms empirically.

NeurIPS Conference 2009 Conference Paper

Randomized Pruning: Efficiently Calculating Expectations in Large Dynamic Programs

  • Alexandre Bouchard-Côté
  • Slav Petrov
  • Dan Klein

Pruning can massively accelerate the computation of feature expectations in large models. However, any single pruning mask will introduce bias. We present a novel approach which employs a randomized sequence of pruning masks. Formally, we apply auxiliary variable MCMC sampling to generate this sequence of masks, thereby gaining theoretical guarantees about convergence. Because each mask is generally able to skip large portions of an underlying dynamic program, our approach is particularly compelling for high-degree algorithms. Empirically, we demonstrate our method on bilingual parsing, showing decreasing bias as more masks are incorporated, and outperforming fixed tic-tac-toe pruning.

NeurIPS Conference 2008 Conference Paper

Efficient Inference in Phylogenetic InDel Trees

  • Alexandre Bouchard-Côté
  • Dan Klein
  • Michael Jordan

Accurate and efficient inference in evolutionary trees is a central problem in computational biology. Realistic models require tracking insertions and deletions along the phylogenetic tree, making inference challenging. We propose new sampling techniques that speed up inference and improve the quality of the samples. We compare our method to previous approaches and show performance improvement on metrics evaluating multiple sequence alignment and reconstruction of ancestral sequences.

NeurIPS Conference 2007 Conference Paper

A Probabilistic Approach to Language Change

  • Alexandre Bouchard-Côté
  • Percy Liang
  • Dan Klein
  • Thomas Griffiths

We present a probabilistic approach to language change in which word forms are represented by phoneme sequences that undergo stochastic edits along the branches of a phylogenetic tree. Our framework combines the advantages of the classical comparative method with the robustness of corpus-based probabilistic models. We use this framework to explore the consequences of two different schemes for defining probabilistic models of phonological change, evaluating these schemes using the reconstruction of ancient word forms in Romance languages. The result is an efficient inference procedure for automatically inferring ancient word forms from modern languages, which can be generalized to support inferences about linguistic phylogenies.