Author name cluster

Lucas Janson

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

17 papers

2 author rows

ICLR Conference 2025 Conference Paper

A New Perspective on Shampoo's Preconditioner

Depen Morwani
Itai Shapira
Nikhil Vyas 0001
Eran Malach
Sham M. Kakade
Lucas Janson

Shampoo, a second-order optimization algorithm that uses a Kronecker product preconditioner, has recently received increasing attention from the machine learning community. Despite the increasing popularity of Shampoo, the theoretical foundations of its effectiveness are not well understood. The preconditioner used by Shampoo can be viewed as either an approximation of the Gauss--Newton component of the Hessian or the covariance matrix of the gradients maintained by Adagrad. Our key contribution is providing an explicit and novel connection between the optimal Kronecker product approximation of these matrices and the approximation made by Shampoo. Our connection highlights a subtle but common misconception about Shampoo’s approximation. In particular, the square of the approximation used by the Shampoo optimizer is equivalent to a single step of the power iteration algorithm for computing the aforementioned optimal Kronecker product approximation. Across a variety of datasets and architectures we empirically demonstrate that this is close to the optimal Kronecker product approximation. We also study the impact of batch gradients and empirical Fisher on the quality of Hessian approximation. Our findings not only advance the theoretical understanding of Shampoo but also illuminate potential pathways for enhancing its practical performance.

Details

NeurIPS Conference 2025 Conference Paper

Bernstein–von Mises for Adaptively Collected Data

Kevin Du
Yash Nair
Lucas Janson

Uncertainty quantification (UQ) for adaptively collected data, such as that coming from adaptive experiments, bandits, or reinforcement learning, is necessary for critical elements of data collection such as ensuring safety and conducting after-study inference. The data's adaptivity creates significant challenges for frequentist UQ, yet Bayesian UQ remains the same as if the data were independent and identically distributed (i. i. d. ), making it an appealing and commonly used approach. Bayesian UQ requires the (correct) specification of a prior distribution while frequentist UQ does not, but for i. i. d. data the celebrated Bernstein–von Mises theorem shows that as the sample size grows, the prior `washes out' and Bayesian UQ becomes frequentist-valid, implying that the choice of prior need not be a major impediment to Bayesian UQ as it makes no difference asymptotically. This paper for the first time extends the Bernstein–von Mises theorem to adaptively collected data, proving asymptotic equivalence between Bayesian UQ and Wald-type frequentist UQ in this challenging setting. Our results do not require the standard stability condition for validity of Wald-type frequentist UQ, and thus provide positive results on frequentist validity of Bayesian UQ under stability. Counterintuitively however, they also provide a negative result that Bayesian UQ is not asymptotically frequentist valid when stability fails, despite the fact that the prior washes out and Bayesian UQ asymptotically matches standard Wald-type frequentist UQ. We empirically validate our theory (positive and negative) via a range of simulations.

PDF Details

AAAI Conference 2025 Conference Paper

Context in Public Health for Underserved Communities: A Bayesian Approach to Online Restless Bandits

Biyonka Liang
Lily Xu
Aparna Taneja
Milind Tambe
Lucas Janson

Public health programs often provide interventions to encourage program adherence, and effectively allocating interventions is vital for producing the greatest overall health outcomes, especially in underserved communities where resources are limited. Such resource allocation problems are often modeled as restless multi-armed bandits (RMABs) with unknown underlying transition dynamics, hence requiring online reinforcement learning (RL). We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model the complex RMAB settings present in public health program adherence problems, namely context and non-stationarity. BCoR's key strength is the ability to leverage shared information within and between arms to learn the unknown RMAB transition dynamics quickly in intervention-scarce settings with relatively short time horizons, which is common in public health applications. Empirically, BCoR achieves substantially higher finite-sample performance over a range of experimental settings, including a setting using real-world adherence data that was developed in collaboration with ARMMAN, an NGO in India which runs a large-scale maternal mHealth program, showcasing BCoR practical utility and potential for real-world deployment.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Evaluating Index-based Treatment Allocation in Underresourced Communities

Niclas Boehmer
Yash Nair
Sanket Shah
Lucas Janson
Aparna Taneja
Milind Tambe

In many applications of AI for Social Impact (e.g., when allocating spots in support programs for underserved communities), resources are scarce and an allocation policy is needed to decide who receives a resource. Before being deployed at scale, a rigorous evaluation of an AI-powered allocation policy is vital. In this paper, we introduce the methods necessary to evaluate index-based allocation policies, which allocate a limited number of resources to those who need them the most. Such policies create dependencies between agents, rendering standard statistical tests invalid and ineffective. Addressing the arising practical and technical challenges, we describe an efficient estimator and methods for drawing valid statistical conclusions. Our extensive experiments validate our methodology in practical settings while also showcasing its statistical power. We conclude by proposing and empirically verifying extensions of our methodology that enable us to reevaluate a past randomized control trial conducted with 10000 beneficiaries for a mHealth program for pregnant women. Our new methodology allows us to draw previously invisible conclusions when comparing two different ML allocation policies.

PDF Details DOI

ICLR Conference 2025 Conference Paper

SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling

Nikhil Vyas 0001
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham M. Kakade

There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks include additional hyperparameters and computational overhead when compared to Adam, which only updates running averages of first- and second-moment quantities. This work establishes a formal connection between Shampoo (implemented with the 1/2 power) and Adafactor --- a memory-efficient approximation of Adam --- showing that Shampoo is equivalent to running Adafactor in the eigenbasis of Shampoo's preconditioner. This insight leads to the design of a simpler and computationally efficient algorithm: **S**hampo**O** with **A**dam in the **P**reconditioner's eigenbasis (SOAP). With regards to improving Shampoo's computational efficiency, the most straightforward approach would be to simply compute Shampoo's eigendecomposition less frequently. Unfortunately, as our empirical results show, this leads to performance degradation that worsens with this frequency. SOAP mitigates this degradation by continually updating the running average of the second moment, just as Adam does, but in the current (slowly changing) coordinate basis. Furthermore, since SOAP is equivalent to running Adam in a rotated space, it introduces only one additional hyperparameter (the preconditioning frequency) compared to Adam. We empirically evaluate SOAP on language model pre-training with 360m and 660m sized models. In the large batch regime, SOAP reduces the number of iterations by over 40\% and wall clock time by over 35\% compared to AdamW, with approximately 20\% improvements in both metrics compared to Shampoo. An implementation of SOAP is available at https://github.com/nikhilvyas/SOAP.

Details

NeurIPS Conference 2024 Conference Paper

Optimal ablation for interpretability

Maximilian Li
Lucas Janson

Interpretability studies often involve tracing the flow of information through machine learning models to identify specific model components that perform relevant computations for tasks of interest. Prior work quantifies the importance of a model component on a particular task by measuring the impact of performing ablation on that component, or simulating model inference with the component disabled. We propose a new method, optimal ablation (OA), and show that OA-based component importance has theoretical and empirical advantages over measuring importance via other ablation methods. We also show that OA-based component importance can benefit several downstream interpretability tasks, including circuit discovery, localization of factual recall, and latent prediction.

PDF Details DOI

ICML Conference 2024 Conference Paper

Total Variation Floodgate for Variable Importance Inference in Classification

Wenshuo Wang 0002
Lucas Janson
Lihua Lei
Aaditya Ramdas

Inferring variable importance is the key goal of many scientific studies, where researchers seek to learn the effect of a feature $X$ on the outcome $Y$ in the presence of confounding variables $Z$. Focusing on classification problems, we define the expected total variation (ETV), which is an intuitive and deterministic measure of variable importance that does not rely on any model assumption. We then introduce algorithms for statistical inference on the ETV under design-based/model-X assumptions. We name our method Total Variation Floodgate in reference to its shared high-level structure with the Floodgate method of Zhang & Janson (2020). The algorithms we introduce can leverage any user-specified regression function and produce asymptotic lower confidence bounds for the ETV. We show the effectiveness of our algorithms with simulations and a case study in conjoint analysis on the US general election.

Details

IROS Conference 2022 Conference Paper

The Role of Tactile Sensing in Learning and Deploying Grasp Refinement Algorithms

Alexander König
Zixi Liu
Lucas Janson
Robert D. Howe

A long-standing question in robot hand design is how accurate tactile sensing must be. This paper uses simulated tactile signals and the reinforcement learning (RL) framework to study the sensing needs in grasping systems. Our first experiment investigates the need for rich tactile sensing in the rewards of RL-based grasp refinement algorithms for multi-fingered robotic hands. We systematically integrate different levels of tactile data into the rewards using analytic grasp stability metrics. We find that combining information on contact positions, normals, and forces in the reward yields the highest average success rates of 95. 4% for cuboids, 93. 1% for cylinders, and 62. 3% for spheres across wrist position errors between 0 and 7 centimeters and rotational errors between 0 and 14 degrees. This contact-based reward outperforms a non-tactile binary-reward baseline by 42. 9%. Our follow-up experiment shows that when training with tactile-enabled rewards, the use of tactile information in the control policy's state vector is drastically reducible at only a slight performance decrease of at most 6. 6% for no tactile sensing in the state. Since policies do not require access to the reward signal at test time, our work implies that models trained on tactile-enabled hands are deployable to robotic hands with a smaller sensor suite, potentially reducing cost dramatically.

Details

JMLR Journal 2021 Journal Article

Exact Asymptotics for Linear Quadratic Adaptive Control

Feicheng Wang
Lucas Janson

Recent progress in reinforcement learning has led to remarkable performance in a range of applications, but its deployment in high-stakes settings remains quite rare. One reason is a limited understanding of the behavior of reinforcement algorithms, both in terms of their regret and their ability to learn the underlying system dynamics---existing work is focused almost exclusively on characterizing rates, with little attention paid to the constants multiplying those rates that can be critically important in practice. To start to address this challenge, we study perhaps the simplest non-bandit reinforcement learning problem: linear quadratic adaptive control (LQAC). By carefully combining recent finite-sample performance bounds for the LQAC problem with a particular (less-recent) martingale central limit theorem, we are able to derive asymptotically exact expressions for the regret, estimation error, and prediction error of a rate-optimal stepwise-updating LQAC algorithm. In simulations on both stable and unstable systems, we find that our asymptotic theory also describes the algorithm's finite-sample behavior remarkably well. [abs] [ pdf ][ bib ] [ code ] &copy JMLR 2021. ( edit, beta )

PDF Details

NeurIPS Conference 2021 Conference Paper

Statistical Inference with M-Estimators on Adaptively Collected Data

Kelly Zhang
Lucas Janson
Susan Murphy

Bandit algorithms are increasingly used in real-world sequential decision-making problems. Associated with this is an increased desire to be able to use the resulting datasets to answer scientific questions like: Did one type of ad lead to more purchases? In which contexts is a mobile health intervention effective? However, classical statistical approaches fail to provide valid confidence intervals when used with data collected with bandit algorithms. Alternative methods have recently been developed for simple models (e. g. , comparison of means). Yet there is a lack of general methods for conducting statistical inference using more complex models on data collected with (contextual) bandit algorithms; for example, current methods cannot be used for valid inference on parameters in a logistic regression model for a binary reward. In this work, we develop theory justifying the use of M-estimators---which includes estimators based on empirical risk minimization as well as maximum likelihood---on data collected with adaptive algorithms, including (contextual) bandit algorithms. Specifically, we show that M-estimators, modified with particular adaptive weights, can be used to construct asymptotically valid confidence regions for a variety of inferential targets.

PDF Details

NeurIPS Conference 2020 Conference Paper

Cross-validation Confidence Intervals for Test Error

Pierre Bayle
Alexandre Bayle
Lucas Janson
Lester Mackey

This work develops central limit theorems for cross-validation and consistent estimators of the asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.

PDF Details

NeurIPS Conference 2020 Conference Paper

Inference for Batched Bandits

Kelly Zhang
Lucas Janson
Susan Murphy

As bandit algorithms are increasingly utilized in scientific studies and industrial applications, there is an associated increasing need for reliable inference methods based on the resulting adaptively-collected data. In this work, we develop methods for inference on data collected in batches using a bandit algorithm. We prove that the bandit arm selection probabilities cannot generally be assumed to concentrate. Non-concentration of the arm selection probabilities makes inference on adaptively-collected data challenging because classical statistical inference approaches, such as using asymptotic normality or the bootstrap, can have inflated Type-1 error and confidence intervals with below-nominal coverage probabilities even asymptotically. In response we develop the Batched Ordinary Least Squares estimator (BOLS) that we prove is (1) asymptotically normal on data collected from both multi-arm and contextual bandits and (2) robust to non-stationarity in the baseline reward and thus leads to reliable Type-1 error control and accurate confidence intervals.

PDF Details

ICRA Conference 2020 Conference Paper

Map-Predictive Motion Planning in Unknown Environments

Amine Elhafsi
Boris Ivanovic
Lucas Janson
Marco Pavone 0001

Algorithms for motion planning in unknown environments are generally limited in their ability to reason about the structure of the unobserved environment. As such, current methods generally navigate unknown environments by relying on heuristic methods to choose intermediate objectives along frontiers. We present a unified method that combines map prediction and motion planning for safe, time-efficient au-tonomous navigation of unknown environments by dynamically-constrained robots. We propose a data-driven method for predicting the map of the unobserved environment, using the robot's observations of its surroundings as context. These map predictions are then used to plan trajectories from the robot's position to the goal without requiring frontier selection. We applied this map-predictive motion planning strategy to randomly generated winding hallway environments, yielding substantial improvement in trajectory duration over a naïve frontier pursuit method. We also experimentally demonstrate similar performance to methods using more sophisticated fron-tier selection heuristics while significantly reducing computation time.

Details

ICRA Conference 2020 Conference Paper

Revisiting the Asymptotic Optimality of RRT

Kiril Solovey
Lucas Janson
Edward Schmerling
Emilio Frazzoli
Marco Pavone 0001

RRT* is one of the most widely used sampling-based algorithms for asymptotically-optimal motion planning. RRT* laid the foundations for optimality in motion planning as a whole, and inspired the development of numerous new algorithms in the field, many of which build upon RRT* itself. In this paper, we first identify a logical gap in the optimality proof of RRT*, which was developed by Karaman and Frazzoli (2011). Then, we present an alternative and mathematically-rigorous proof for asymptotic optimality. Our proof suggests that the connection radius used by RRT* should be increased from γ (log n/n) 1/d to γ' (log n/n) 1/(d+1) in order to account n n for the additional dimension of time that dictates the samples' ordering. Here γ, γ' are constants, and n, d are the number of samples and the dimension of the problem, respectively.

Details

JMLR Journal 2018 Journal Article

Risk-Constrained Reinforcement Learning with Percentile Risk Criteria

Yinlam Chow
Mohammad Ghavamzadeh
Lucas Janson
Marco Pavone

In many sequential decision-making problems one is interested in minimizing an expected cumulative cost while taking into account risk, i.e., increased awareness of events of small probability and high consequences. Accordingly, the objective of this paper is to present efficient reinforcement learning algorithms for risk-constrained Markov decision processes (MDPs), where risk is represented via a chance constraint or a constraint on the conditional value-at-risk (CVaR) of the cumulative cost. We collectively refer to such problems as percentile risk-constrained MDPs. Specifically, we first derive a formula for computing the gradient of the Lagrangian function for percentile risk-constrained MDPs. Then, we devise policy gradient and actor-critic algorithms that (1) estimate such gradient, (2) update the policy in the descent direction, and (3) update the Lagrange multiplier in the ascent direction. For these algorithms we prove convergence to locally optimal policies. Finally, we demonstrate the effectiveness of our algorithms in an optimal stopping problem and an online marketing application. [abs] [ pdf ][ bib ] &copy JMLR 2018. ( edit, beta )

PDF Details

IROS Conference 2015 Conference Paper

An asymptotically-optimal sampling-based algorithm for Bi-directional motion planning

Joseph A. Starek
Javier V. Gómez
Edward Schmerling
Lucas Janson
Luis Moreno 0001
Marco Pavone 0001

Bi-directional search is a widely used strategy to increase the success and convergence rates of sampling-based motion planning algorithms. Yet, few results are available that merge both bi-directional search and asymptotic optimality into existing optimal planners, such as PRM*, RRT*, and FMT*. The objective of this paper is to fill this gap. Specifically, this paper presents a bi-directional, sampling-based, asymptotically-optimal algorithm named Bi-directional FMT* (BFMT*) that extends the Fast Marching Tree (FMT*) algorithm to bi-directional search while preserving its key properties, chiefly lazy search and asymptotic optimality through convergence in probability. BFMT* performs a two-source, lazy dynamic programming recursion over a set of randomly-drawn samples, correspondingly generating two search trees: one in cost-to-come space from the initial configuration and another in cost-to-go space from the goal configuration. Numerical experiments illustrate the advantages of BFMT* over its unidirectional counterpart, as well as a number of other state-of-the-art planners.

Details

ICRA Conference 2015 Conference Paper

Optimal sampling-based motion planning under differential constraints: The driftless case

Edward Schmerling
Lucas Janson
Marco Pavone 0001

Motion planning under differential constraints is a classic problem in robotics. To date, the state of the art is represented by sampling-based techniques, with the Rapidly-exploring Random Tree algorithm as a leading example. Yet, the problem is still open in many aspects, including guarantees on the quality of the obtained solution. In this paper we provide a thorough theoretical framework to assess optimality guarantees of sampling-based algorithms for planning under differential constraints. We exploit this framework to design and analyze two novel sampling-based algorithms that are guaranteed to converge, as the number of samples increases, to an optimal solution (namely, the Differential Probabilistic RoadMap algorithm and the Differential Fast Marching Tree algorithm). Our focus is on driftless control-affine dynamical models, which accurately model a large class of robotic systems. In this paper we use the notion of convergence in probability (as opposed to convergence almost surely): the extra mathematical flexibility of this approach yields convergence rate bounds - a first in the field of optimal sampling-based motion planning under differential constraints. Numerical experiments corroborating our theoretical results are presented and discussed.

Details