Arrow Research search

Author name cluster

Mohsen Bayati

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers (16)

NeurIPS 2025 · Conference Paper

Conformal Arbitrage: Risk-Controlled Balancing of Competing Objectives in Language Models

  • William Overman
  • Mohsen Bayati

Modern language-model deployments must often balance competing objectives—for example, helpfulness versus harmlessness, cost versus accuracy, and reward versus safety. We introduce Conformal Arbitrage, a post-hoc framework that learns a data-driven threshold to mediate between a Primary model optimized for a primary objective and a more conservative Guardian—which could be another model or a human domain expert—aligned with a guardrail objective. The threshold is calibrated with conformal risk control, yielding finite-sample, distribution-free guarantees that the long-run frequency of undesirable events (such as factual errors or safety violations) does not exceed a user-specified quota. Because Conformal Arbitrage operates wholly at the API level—without requiring access to model logits or updating model weights—it complements weight-based alignment techniques and integrates seamlessly with existing cost-aware cascades. Empirically, Conformal Arbitrage traces an efficient frontier, allowing users to define an acceptable performance level for one objective while maximizing utility in another. We observe that our method outperforms cost-matched random routing between models in terms of accuracy. These properties make Conformal Arbitrage a practical, theoretically grounded tool for trustworthy and economical deployment of large language models across a broad range of potentially competing objectives.
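A minimal sketch of the calibrate-then-route loop the abstract describes, assuming the Primary model exposes a scalar confidence score and that a calibration set records whether each Primary answer violated the guardrail; the function names and the exact finite-sample correction are illustrative, not the authors' code.

```python
import numpy as np

def calibrate_threshold(cal_scores, cal_losses, alpha):
    """Most permissive routing threshold whose calibration risk, with a
    conformal-risk-control-style finite-sample correction for a loss
    bounded by 1, stays below the user-specified quota alpha.

    cal_scores: Primary-model confidence scores (higher = safer), array
    cal_losses: 1 if the Primary answer violated the guardrail, else 0
    """
    n = len(cal_scores)
    for t in np.sort(cal_scores):            # risk is nonincreasing in t
        kept = cal_scores >= t               # queries the Primary keeps
        if (cal_losses[kept].sum() + 1.0) / (n + 1) <= alpha:
            return t
    return np.inf                            # no safe threshold: always defer

def route(score, threshold, primary_answer, guardian_answer):
    """Keep the Primary's answer only when its confidence clears the bar."""
    return primary_answer if score >= threshold else guardian_answer
```

Note that nothing here touches logits or weights: scores and answers are all the routing layer needs, which is the API-level property the abstract emphasizes.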

ICLR 2025 · Conference Paper

Geometry-Aware Approaches for Balancing Performance and Theoretical Guarantees in Linear Bandits

  • Yuwei Luo
  • Mohsen Bayati

This paper is motivated by recent research in the $d$-dimensional stochastic linear bandit literature, which has revealed an unsettling discrepancy: algorithms like Thompson sampling and Greedy demonstrate promising empirical performance, yet this contrasts with their pessimistic theoretical regret bounds. The challenge arises from the fact that while these algorithms may perform poorly in certain problem instances, they generally excel in typical instances. To address this, we propose a new data-driven technique that tracks the geometric properties of the uncertainty ellipsoid around the main problem parameter. This methodology enables us to formulate a data-driven frequentist regret bound, which incorporates the geometric information, for a broad class of base algorithms, including Greedy, OFUL, and Thompson sampling. This result allows us to identify and "course-correct" problem instances in which the base algorithms perform poorly. The course-corrected algorithms achieve the minimax optimal regret of order $\tilde{\mathcal{O}}(d\sqrt{T})$ for a $T$-period decision-making scenario, effectively maintaining the desirable attributes of the base algorithms, including their empirical efficacy. We present simulation results to validate our findings using synthetic and real data.
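The course-correction idea admits a short sketch: run a greedy linear bandit but monitor the Gram matrix, whose inverse shapes the uncertainty ellipsoid, and inject exploration when the geometry degenerates. The eigenvalue floor below is a crude illustrative stand-in for the paper's data-driven diagnostic, and all names are hypothetical.

```python
import numpy as np

def geometry_aware_greedy(arms, theta_star, T, lam=1.0):
    """Greedy linear bandit with a toy geometric course-correction: if the
    regularized Gram matrix V_t is too elongated (minimum eigenvalue below
    a growing floor), play a random arm to re-inflate the uncertainty
    ellipsoid; otherwise exploit the least-squares estimate."""
    d = arms.shape[1]
    V, b = lam * np.eye(d), np.zeros(d)
    rng = np.random.default_rng(0)
    for t in range(1, T + 1):
        theta_hat = np.linalg.solve(V, b)
        if np.linalg.eigvalsh(V).min() < np.sqrt(t):    # geometry check
            x = arms[rng.integers(len(arms))]           # course-correct
        else:
            x = arms[np.argmax(arms @ theta_hat)]       # greedy action
        r = x @ theta_star + rng.normal()               # simulated reward
        V += np.outer(x, x)
        b += r * x
    return theta_hat
```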

NeurIPS 2024 · Conference Paper

Aligning Model Properties via Conformal Risk Control

  • William Overman
  • Jacqueline J. Vallon
  • Mohsen Bayati

AI model alignment is crucial due to inadvertent biases in training data and the underspecified machine learning pipeline, where models with excellent test metrics may not meet end-user requirements. While post-training alignment via human feedback shows promise, these methods are often limited to generative AI settings where humans can interpret and provide feedback on model outputs. In traditional non-generative settings with numerical or categorical outputs, detecting misalignment through single-sample outputs remains challenging, and enforcing alignment during training requires repeating costly training processes. In this paper we consider an alternative strategy. We propose interpreting model alignment through property testing, defining an aligned model $f$ as one belonging to a subset $\mathcal{P}$ of functions that exhibit specific desired behaviors. We focus on post-processing a pre-trained model $f$ to better align with $\mathcal{P}$ using conformal risk control. Specifically, we develop a general procedure for converting queries for testing a given property $\mathcal{P}$ to a collection of loss functions suitable for use in a conformal risk control algorithm. We prove a probabilistic guarantee that the resulting conformal interval around $f$ contains a function approximately satisfying $\mathcal{P}$. We exhibit applications of our methodology on a collection of supervised learning datasets for (shape-constrained) properties such as monotonicity and concavity. The general procedure is flexible and can be applied to a wide range of desired properties. Finally, we prove that pre-trained models will always require alignment techniques even as model sizes or training data increase, as long as the training data contains even small biases.
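For the monotonicity property, the query-to-loss conversion has a compact form: a band of half-width $\lambda$ around $f$ admits a nondecreasing selection on a pair $x_i \le x_j$ exactly when $f(x_i) - \lambda \le f(x_j) + \lambda$. The sketch below treats each calibration pair as one loss unit, which glosses over the paper's exchangeability bookkeeping; it illustrates the recipe rather than reproducing the authors' procedure.

```python
import numpy as np

def pairwise_monotonicity_losses(f_sorted, lam):
    """Points sorted by x; one 0/1 loss per pair (x_i <= x_j), equal to 1
    iff no nondecreasing function fits inside the band on that pair."""
    i, j = np.triu_indices(len(f_sorted), k=1)
    return (f_sorted[i] - f_sorted[j] > 2 * lam).astype(float)

def crc_half_width(f_sorted, lam_grid, alpha):
    """Smallest half-width lam whose calibration risk, with the standard
    conformal-risk-control correction for losses in [0, 1], is <= alpha."""
    n = len(f_sorted) * (len(f_sorted) - 1) // 2   # number of loss units
    for lam in np.sort(lam_grid):                  # risk shrinks as lam grows
        if (n * pairwise_monotonicity_losses(f_sorted, lam).mean() + 1) / (n + 1) <= alpha:
            return float(lam)
    return float(np.max(lam_grid))
```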

NeurIPS 2024 · Conference Paper

Higher-Order Causal Message Passing for Experimentation with Complex Interference

  • Mohsen Bayati
  • Yuwei Luo
  • William Overman
  • Sadegh Shirani
  • Ruoxuan Xiong

Accurate estimation of treatment effects is essential for decision-making across various scientific fields. This task, however, becomes challenging in areas like social sciences and online marketplaces, where treating one experimental unit can influence outcomes for others through direct or indirect interactions. Such interference can lead to biased treatment effect estimates, particularly when the structure of these interactions is unknown. We address this challenge by introducing a new class of estimators based on causal message-passing, specifically designed for settings with pervasive, unknown interference. Our estimator draws on information from the sample mean and variance of unit outcomes and treatments over time, enabling efficient use of observed data to estimate the evolution of the system state. Concretely, we construct non-linear features from the moments of unit outcomes and treatments and then learn a function that maps these features to future mean and variance of unit outcomes. This allows for the estimation of the treatment effect over time. Extensive simulations across multiple domains, using synthetic and real network data, demonstrate the efficacy of our approach in estimating total treatment effect dynamics, even in cases where interference exhibits non-monotonic behavior in the probability of treatment.
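A stripped-down rendition of the moment pipeline: build features from the cross-sectional mean and variance of outcomes plus the treated fraction, regress the next step's moments on them, then roll the fitted dynamics forward at a fixed treatment level. The polynomial features and least-squares fit are illustrative stand-ins for the paper's higher-order message-passing features.

```python
import numpy as np

def fit_moment_dynamics(outcomes, treatments, degree=2):
    """outcomes, treatments: (T, n) arrays from a rolled-out experiment.
    Learn a linear map from polynomial moment features at time t to the
    mean and variance of outcomes at time t + 1."""
    mu, var = outcomes.mean(axis=1), outcomes.var(axis=1)
    p = treatments.mean(axis=1)
    base = np.column_stack([mu, var, p])
    feats = np.column_stack([base**k for k in range(1, degree + 1)])
    X = np.column_stack([np.ones(len(feats) - 1), feats[:-1]])
    Y = np.column_stack([mu[1:], var[1:]])
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def rollout(W, mu0, var0, p_fixed, steps, degree=2):
    """Iterate the learned dynamics at a fixed treated fraction; the gap
    between p_fixed=1.0 and p_fixed=0.0 rollouts estimates the total
    treatment effect's trajectory."""
    mu, var = mu0, var0
    for _ in range(steps):
        base = np.array([mu, var, p_fixed])
        feats = np.concatenate([base**k for k in range(1, degree + 1)])
        mu, var = np.concatenate([[1.0], feats]) @ W
    return mu
```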

JMLR 2022 · Journal Article

On Low-rank Trace Regression under General Sampling Distribution

  • Nima Hamidi
  • Mohsen Bayati

In this paper, we study trace regression, in which a matrix of parameters $\mathbf{B}^\star$ is estimated via the convex relaxation of a rank-regularized regression or via regularized non-convex optimization. It is known that these estimators satisfy near-optimal error bounds under assumptions on the rank, coherence, and spikiness of $\mathbf{B}^\star$. We start by introducing a general notion of spikiness for $\mathbf{B}^\star$ that provides a generic recipe to prove the restricted strong convexity of the sampling operator of the trace regression and obtain near-optimal and non-asymptotic error bounds for the estimation error. Similar to the existing literature, these results require the regularization parameter to be above a certain theory-inspired threshold that depends on observation noise that may be unknown in practice. Next, we extend the error bounds to cases where the regularization parameter is chosen via cross-validation. This result is significant in that existing theoretical results on cross-validated estimators (Kale et al., 2011; Kumar et al., 2013; Abou-Moustafa and Szepesvari, 2017) do not apply to our setting, since the estimators we study are not known to satisfy their required notion of stability. Finally, using simulations on synthetic and real data, we show that the cross-validated estimator selects a near-optimal penalty parameter and outperforms the theory-inspired approach of selecting the parameter.
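In one common formulation, the estimators under study are nuclear-norm-penalized least squares over trace measurements $\langle \mathbf{X}_i, \mathbf{B} \rangle$; generic machinery such as the proximal-gradient solver and k-fold cross-validation below captures the object of study. This is a standard sketch, not the paper's code, and the step size is a crude Lipschitz guess.

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def trace_regression(Xs, y, lam, shape, iters=300):
    """min_B (1/2n) sum_i (y_i - <X_i, B>)^2 + lam ||B||_* by proximal
    gradient.  Xs: list of measurement matrices, y: np.ndarray."""
    n, B = len(y), np.zeros(shape)
    step = 1.0 / max(np.linalg.norm(X) ** 2 for X in Xs)  # crude Lipschitz bound
    for _ in range(iters):
        resid = np.array([np.sum(X * B) for X in Xs]) - y
        grad = sum(r * X for r, X in zip(resid, Xs)) / n
        B = svt(B - step * grad, step * lam)
    return B

def cv_lambda(Xs, y, shape, lam_grid, k=5):
    """Pick the penalty by k-fold cross-validated prediction error."""
    folds = np.array_split(np.random.default_rng(0).permutation(len(y)), k)
    def cv_err(lam):
        err = 0.0
        for f in folds:
            train = np.setdiff1d(np.arange(len(y)), f)
            B = trace_regression([Xs[i] for i in train], y[train], lam, shape)
            err += sum((y[i] - np.sum(Xs[i] * B)) ** 2 for i in f)
        return err
    return min(lam_grid, key=cv_err)
```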

NeurIPS 2022 · Conference Paper

Thompson Sampling Efficiently Learns to Control Diffusion Processes

  • Mohamad Kazem Shirani Faradonbeh
  • Mohamad Sadegh Shirani Faradonbeh
  • Mohsen Bayati

Diffusion processes that evolve according to linear stochastic differential equations are an important family of continuous-time dynamic decision-making models. Optimal policies are well-studied for them under full certainty about the drift matrices. However, little is known about data-driven control of diffusion processes with uncertain drift matrices, as conventional discrete-time analysis techniques are not applicable. In addition, while the task can be viewed as a reinforcement learning problem involving an exploration-exploitation trade-off, ensuring system stability is a fundamental component of designing optimal policies. We establish that the popular Thompson sampling algorithm learns optimal actions fast, incurring only square-root-of-time regret, and also stabilizes the system in a short time period. To the best of our knowledge, this is the first such result for Thompson sampling in a diffusion process control problem. We validate our theoretical results through empirical simulations with real matrices. Moreover, we observe that Thompson sampling significantly improves (worst-case) regret compared to state-of-the-art algorithms, suggesting that it explores in a more guarded fashion. Our theoretical analysis involves characterization of a certain "optimality manifold" that ties the local geometry of the drift parameters to the optimal control of the diffusion process. We expect this technique to be of broader interest.
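An episodic simulation sketch of the setting for the linear-quadratic case, with the input matrix $B$ and cost matrices $Q, R$ known and only the drift uncertain: sample a drift from a Gaussian posterior, solve the continuous-time Riccati equation for the sample, and play the resulting linear feedback for an episode. The posterior form and episode schedule are simplifications, not the paper's algorithm verbatim.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def ts_diffusion(A_true, B, Q, R, horizon=50.0, dt=0.01, episode=1.0):
    """Thompson sampling for dx = (A x + B u) dt + dW with unknown drift A.
    B is assumed square and invertible so every sampled model is
    controllable and its Riccati equation is solvable."""
    d = A_true.shape[0]
    rng = np.random.default_rng(1)
    V = np.eye(d)                 # prior precision of the drift rows
    S = np.zeros((d, d))          # accumulated regression statistics
    x, t = np.zeros(d), 0.0
    while t < horizon:
        A_hat = np.linalg.solve(V, S).T                       # LS drift estimate
        A_s = A_hat + rng.normal(size=(d, d)) @ np.linalg.cholesky(np.linalg.inv(V)).T
        P = solve_continuous_are(A_s, B, Q, R)                # sampled-model Riccati
        K = np.linalg.solve(R, B.T @ P)                       # feedback u = -K x
        t_end = t + episode
        while t < min(t_end, horizon):
            u = -K @ x
            dx = (A_true @ x + B @ u) * dt + np.sqrt(dt) * rng.normal(size=d)
            V += np.outer(x, x) * dt                          # update statistics
            S += np.outer(x, dx - (B @ u) * dt)
            x, t = x + dx, t + dt
    return A_hat
```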

NeurIPS 2020 · Conference Paper

Unreasonable Effectiveness of Greedy Algorithms in Multi-Armed Bandit with Many Arms

  • Mohsen Bayati
  • Nima Hamidi
  • Ramesh Johari
  • Khashayar Khosravi

We study the structure of regret-minimizing policies in the "many-armed" Bayesian multi-armed bandit problem: in particular, with $k$ the number of arms and $T$ the time horizon, we consider the case where $k \geq \sqrt{T}$. We first show that "subsampling" is a critical step for designing optimal policies. In particular, the standard UCB algorithm leads to sub-optimal regret bounds in the many-armed regime. However, a subsampled UCB (SS-UCB), which samples $\Theta(\sqrt{T})$ arms and executes UCB only on that subset, is rate-optimal. Despite its theoretically optimal regret, even SS-UCB performs poorly in practice due to excessive exploration of suboptimal arms: in numerical experiments, SS-UCB performs worse than a simple greedy algorithm (and its subsampled version) that pulls the current empirical best arm at every time period. We show that these insights hold even in a contextual setting, using real-world data. These empirical results suggest a novel form of "free exploration" in the many-armed regime that benefits greedy algorithms. We theoretically study this new source of free exploration and find that it is deeply connected to the distribution of a certain tail event for the prior distribution of arm rewards. This is a fundamentally distinct phenomenon from the free exploration discussed in the recent literature on contextual bandits, where it arises from variation in contexts. We use this insight to prove that the subsampled greedy algorithm is rate-optimal for Bernoulli bandits when $k > \sqrt{T}$, and achieves sublinear regret with more general distributions. This is a case where theoretical rate optimality does not tell the whole story: when complemented by the empirical observations of our paper, the power of greedy algorithms becomes quite evident. Taken together, from a practical standpoint, our results suggest that it may be preferable to use a variant of the greedy algorithm in the many-armed regime.
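The subsampled greedy policy the abstract favors fits in a few lines; the sketch below is for Bernoulli arms, with unpulled arms given an optimistic prior mean of 1 so each subsampled arm is tried once before pure greediness takes over. Names and the tie-breaking convention are illustrative.

```python
import numpy as np

def subsampled_greedy(mu, T, seed=0):
    """Many-armed Bernoulli bandit: keep only ~sqrt(T) of the arms, then
    always pull the current empirical best among them."""
    rng = np.random.default_rng(seed)
    m = min(len(mu), int(np.ceil(np.sqrt(T))))
    arms = rng.choice(len(mu), size=m, replace=False)   # the subsample
    pulls, wins, total = np.zeros(m), np.zeros(m), 0.0
    for _ in range(T):
        means = np.where(pulls > 0, wins / np.maximum(pulls, 1), 1.0)
        i = int(np.argmax(means))                       # greedy choice
        r = float(rng.random() < mu[arms[i]])           # Bernoulli reward
        pulls[i] += 1
        wins[i] += r
        total += r
    return total / T                                    # average reward
```

SS-UCB is the same skeleton with a confidence bonus added to `means` before the argmax.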

NeurIPS 2019 · Conference Paper

Personalizing Many Decisions with High-Dimensional Covariates

  • Nima Hamidi
  • Mohsen Bayati
  • Kapil Gupta

We consider the $k$-armed stochastic contextual bandit problem with $d$-dimensional features, when both $k$ and $d$ can be large. To the best of our knowledge, all existing algorithms for this problem have regret bounds that scale as polynomials of degree at least two in $k$ and $d$. The main contribution of this paper is to introduce and theoretically analyze a new algorithm (REAL Bandit) whose regret scales as $r^2(k+d)$, where $r$ is the rank of the $k \times d$ matrix of unknown parameters. REAL Bandit relies on ideas from the low-rank matrix estimation literature and a new row-enhancement subroutine that yields sharper bounds for estimating each row of the parameter matrix, which may be of independent interest.
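The low-rank backbone can be sketched by reusing the `trace_regression` solver from the JMLR 2022 entry above: each observed (arm $a$, context $x$, reward) triple is a trace measurement $\langle e_a x^\top, \Theta \rangle$ of the $k \times d$ parameter matrix. This is the generic low-rank recipe the abstract builds on, not REAL Bandit itself, whose row-enhancement subroutine is precisely what the sketch omits.

```python
import numpy as np
# reuses trace_regression from the trace-regression sketch above

def low_rank_contextual_greedy(contexts, reward_fn, k, lam, T):
    """Greedy low-rank contextual bandit: refit the k x d matrix Theta by
    nuclear-norm trace regression each round, play argmax_a <Theta[a], x>."""
    d = contexts.shape[1]
    Theta, Xs, ys = np.zeros((k, d)), [], []
    for t in range(T):
        x = contexts[t]
        a = t % k if t < k else int(np.argmax(Theta @ x))  # try each arm once
        Xs.append(np.outer(np.eye(k)[a], x))               # e_a x^T measurement
        ys.append(reward_fn(a, x))
        Theta = trace_regression(Xs, np.array(ys), lam, (k, d), iters=100)
    return Theta
```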

JMLR 2019 · Journal Article

Scalable Approximations for Generalized Linear Problems

  • Murat Erdogdu
  • Mohsen Bayati
  • Lee H. Dicker

In stochastic optimization, the population risk is generally approximated by the empirical risk, which is in turn minimized by an iterative algorithm. However, in the large-scale setting, empirical risk minimization may be computationally restrictive. In this paper, we design an efficient algorithm to approximate the population risk minimizer in generalized linear problems such as binary classification with surrogate losses and generalized linear regression models. We focus on large-scale problems where the iterative minimization of the empirical risk is computationally intractable, i.e., the number of observations $n$ is much larger than the dimension of the parameter $p$ ($n \gg p \gg 1$). We show that under random sub-Gaussian design, the true minimizer of the population risk is approximately proportional to the corresponding ordinary least squares (OLS) estimator. Using this relation, we design an algorithm that achieves the same accuracy as the empirical risk minimizer through iterations that attain up to a quadratic convergence rate, and that are computationally cheaper than any batch optimization algorithm by at least a factor of $\mathcal{O}(p)$. We provide theoretical guarantees for our algorithm, and analyze the convergence behavior in terms of data dimensions. Finally, we demonstrate the performance of our algorithm on well-known classification and regression problems, through extensive numerical studies on large-scale datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms.
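The proportionality claim suggests a two-step sketch, instantiated here for logistic regression: compute the OLS coefficients once, then fit only the scalar $c$ in $\beta \approx c\,\beta_{OLS}$ by a one-dimensional minimization of the empirical loss along that ray. The same idea underlies the NeurIPS 2016 entry below; the 1-D search is a stand-in for the authors' iteration, which targets a population fixed point and carries the stated convergence guarantees.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def scaled_least_squares(X, y):
    """Logistic-regression instance of the OLS-proportionality idea:
    one p-dimensional least-squares solve plus one scalar fit, instead of
    a p-dimensional iterative GLM optimization.  y takes values in {0, 1}."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    s = X @ beta_ols
    def nll(c):                       # numerically stable logistic loss
        z = c * s
        return np.mean(np.maximum(z, 0) + np.log1p(np.exp(-np.abs(z))) - y * z)
    c_hat = minimize_scalar(nll, bounds=(0.1, 20.0), method="bounded").x
    return c_hat * beta_ols
```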

NeurIPS 2016 · Conference Paper

Scaled Least Squares Estimator for GLMs in Large-Scale Problems

  • Murat Erdogdu
  • Lee Dicker
  • Mohsen Bayati

We study the problem of efficiently estimating the coefficients of generalized linear models (GLMs) in the large-scale setting where the number of observations $n$ is much larger than the number of predictors $p$, i.e., $n \gg p \gg 1$. We show that in GLMs with random (not necessarily Gaussian) design, the GLM coefficients are approximately proportional to the corresponding ordinary least squares (OLS) coefficients. Using this relation, we design an algorithm that achieves the same accuracy as the maximum likelihood estimator (MLE) through iterations that attain up to a cubic convergence rate, and that are cheaper than any batch optimization algorithm by at least a factor of $\mathcal{O}(p)$. We provide theoretical guarantees for our algorithm, and analyze the convergence behavior in terms of data dimensions. Finally, we demonstrate the performance of our algorithm through extensive numerical studies on large-scale real and synthetic datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms.

NeurIPS 2013 · Conference Paper

Estimating LASSO Risk and Noise Level

  • Mohsen Bayati
  • Murat Erdogdu
  • Andrea Montanari

We study the fundamental problems of variance and risk estimation in high-dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector $\theta_0 \in R^p$ from noisy linear observations $y = X\theta_0 + w \in R^n$ via the popular estimation procedure of solving an $\ell_1$-penalized least squares objective, known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the $\ell_2$ estimation risk $\|\hat{\theta}-\theta_0\|_2$ and the variance of the noise. These can be used to select the regularization parameter optimally. Our approach combines Stein's unbiased risk estimate (Stein '81) and recent results of Bayati and Montanari ('11-'12) on the analysis of approximate message passing and the risk of the LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices $X$ of increasing dimensions with independent Gaussian entries, and validity for a broader class of Gaussian designs, conditional on the validity of a certain conjecture from statistical physics. Our approach is the first that provides an asymptotically consistent risk estimator. In addition, we demonstrate through simulation that our variance estimation outperforms several existing methods in the literature.
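In the AMP picture, the corrected residual $z = (y - X\hat{\theta})/(1 - |S|/n)$, with $|S|$ the LASSO support size, behaves like Gaussian noise of variance $\tau^2$, and state evolution ties $\tau^2$ to the risk via $\tau^2 = \sigma^2 + (p/n)\,\mathrm{MSE}$. The sketch below inverts that identity; the noise level is estimated with the common degrees-of-freedom correction, which is a stand-in for the authors' estimator rather than a reproduction of it.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_risk_estimate(X, y, alpha):
    """AMP-flavored estimates of (||theta_hat - theta_0||_2, sigma) for the
    LASSO.  Assumes the support size |S| is well below n."""
    n, p = X.shape
    theta = Lasso(alpha=alpha, fit_intercept=False).fit(X, y).coef_
    s = np.count_nonzero(theta)                      # LASSO degrees of freedom
    rss = np.sum((y - X @ theta) ** 2)
    tau2 = rss / n / (1 - s / n) ** 2                # effective noise tau^2
    sigma2 = rss / (n - s)                           # df-corrected noise level
    mse = max(tau2 - sigma2, 0.0) * n / p            # per-coordinate MSE
    return np.sqrt(mse * p), np.sqrt(sigma2)
```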

NeurIPS 2010 · Conference Paper

The LASSO risk: asymptotic results and real world examples

  • Mohsen Bayati
  • José Pereira
  • Andrea Montanari

We consider the problem of learning a coefficient vector $x_0$ from noisy linear observations $y = Ax_0 + w$. In many contexts (ranging from model selection to image processing) it is desirable to construct a sparse estimator. In this case, a popular approach consists in solving an $\ell_1$-penalized least squares problem known as the LASSO or BPDN. For sequences of matrices $A$ of increasing dimensions, with i.i.d. Gaussian entries, we prove that the normalized risk of the LASSO converges to a limit, and we obtain an explicit expression for this limit. Our result is the first rigorous derivation of an explicit formula for the asymptotic risk of the LASSO on random instances. The proof technique is based on the analysis of approximate message passing (AMP), a recently developed efficient algorithm inspired by ideas from graphical models. Through simulations on real data matrices (gene expression data and hospital medical records) we observe that these results can be relevant in a broad array of practical applications.
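The AMP iteration behind the analysis is short enough to state: iterative soft-thresholding plus an Onsager correction term, the mean derivative of the threshold function carried on the previous residual, which is what separates AMP from plain iterative thresholding. The fixed threshold multiplier below is a simplification; the theory tunes the threshold schedule through state evolution.

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding denoiser eta(v; t)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def amp_lasso(A, y, theta=1.5, iters=50):
    """AMP for the LASSO on an n x p matrix A with normalized columns.
    theta is the threshold multiplier: tuned here by hand, by state
    evolution in the theory."""
    n, p = A.shape
    x, z = np.zeros(p), y.copy()
    for _ in range(iters):
        tau = theta * np.linalg.norm(z) / np.sqrt(n)   # empirical noise scale
        x_new = soft(x + A.T @ z, tau)
        onsager = np.mean(x_new != 0) / (n / p)        # <eta'> / delta
        z = y - A @ x_new + onsager * z                # residual with memory
        x = x_new
    return x
```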