Arrow Research search

Author name cluster

Patrick Jaillet

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

62 papers
2 author rows

Possible papers

62

TMLR Journal 2025 Journal Article

Double Machine Learning Based Structure Identification from Temporal Data

  • Emmanouil Angelis
  • Francesco Quinzan
  • Ashkan Soleymani
  • Patrick Jaillet
  • Stefan Bauer

Learning the causes of time-series data is a fundamental task in many applications, spanning from finance to earth sciences or bio-medical applications. Common approaches for this task are based on vector auto-regression, and they do not take into account unknown confounding between potential causes. However, in settings with many potential causes and noisy data, these approaches may be substantially biased. Furthermore, potential causes may be correlated in practical applications or even contain cycles. To address these challenges, we propose a new double machine learning based method for structure identification from temporal data (DR-SIT). We provide theoretical guarantees, showing that our method asymptotically recovers the true underlying causal structure. Our analysis extends to cases where the potential causes have cycles, and they may even be confounded. We further perform extensive experiments to showcase the superior performance of our method. Code: https://github.com/sdi1100041/TMLR_submission_DR_SIT

NeurIPS Conference 2025 Conference Paper

Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints

  • Yan Dai
  • Negin Golrezaei
  • Patrick Jaillet

Motivated by applications such as cloud platforms allocating GPUs to users or governments deploying mobile health units across competing regions, we study the constrained dynamic allocation of a reusable resource to a group of strategic agents. Our objective is to simultaneously (i) maximize social welfare, (ii) satisfy multi-dimensional long-term cost constraints, and (iii) incentivize truthful reporting. We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings -- agents can easily manipulate their reports to distort future dual updates for future gain. To address this vulnerability, we develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior. Our primal-side design combines epoch-based lazy updates -- discouraging agents from distorting dual updates -- with dual-adjust pricing and randomized exploration techniques that extract approximately truthful signals for learning. On the dual side, we design a novel online learning subroutine to resolve a circular dependency between actions and predictions; this makes our mechanism achieve $\tilde{\mathcal{O}}(\sqrt{T})$ social welfare regret (where $T$ is the number of allocation rounds), satisfies all cost constraints, and ensures incentive alignment. This $\tilde{\mathcal{O}}(\sqrt{T})$ performance matches that of non-strategic allocation approaches while additionally exhibiting robustness to strategic agents.

ICML Conference 2025 Conference Paper

Learning with Exact Invariances in Polynomial Time

  • Ashkan Soleymani
  • Behrooz Tahmasebi
  • Stefanie Jegelka
  • Patrick Jaillet

We study the statistical-computational trade-offs for learning with exact invariances (or symmetries) using kernel regression. Traditional methods, such as data augmentation, group averaging, canonicalization, and frame-averaging, either fail to provide a polynomial-time solution or are not applicable in the kernel setting. However, with oracle access to the geometric properties of the input space, we propose a polynomial-time algorithm that learns a classifier with exact invariances. Moreover, our approach achieves the same excess population risk (or generalization error) as the original kernel regression problem. To the best of our knowledge, this is the first polynomial-time algorithm to achieve exact (as opposed to approximate) invariances in this setting. In developing our approach, we also resolve a question recently posed by D{ı}az et al. (2025) on efficient computation of invariant bases and kernels with respect to finite groups, even when the group size is prohibitively large. Our proof leverages tools from differential geometry, spectral theory, and optimization. A key result in our development is a new reformulation of the problem of learning under invariances as optimizing an infinite number of linearly constrained convex quadratic programs, which may be of independent interest.

ICLR Conference 2025 Conference Paper

Neural Dueling Bandits: Preference-Based Optimization with Human Feedback

  • Arun Verma
  • Zhongxiang Dai
  • Xiaoqiang Lin
  • Patrick Jaillet
  • Bryan Kian Hsiang Low

Contextual dueling bandit is used to model the bandit problems, where a learner's goal is to find the best arm for a given context using observed noisy human preference feedback over the selected arms for the past contexts. However, existing algorithms assume the reward function is linear, which can be complex and non-linear in many real-life applications like online recommendations or ranking web search results. To overcome this challenge, we use a neural network to estimate the reward function using preference feedback for the previously selected arms. We propose upper confidence bound- and Thompson sampling-based algorithms with sub-linear regret guarantees that efficiently select arms in each round. We also extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution. Experimental results on the problem instances derived from synthetic datasets corroborate our theoretical results.

ICML Conference 2024 Conference Paper

A Universal Class of Sharpness-Aware Minimization Algorithms

  • Behrooz Tahmasebi
  • Ashkan Soleymani
  • Dara Bahri
  • Stefanie Jegelka
  • Patrick Jaillet

Recently, there has been a surge in interest in developing optimization algorithms for overparameterized models as achieving generalization is believed to require algorithms with suitable biases. This interest centers on minimizing sharpness of the original loss function; the Sharpness-Aware Minimization (SAM) algorithm has proven effective. However, most literature only considers a few sharpness measures, such as the maximum eigenvalue or trace of the training loss Hessian, which may not yield meaningful insights for non-convex optimization scenarios like neural networks. Additionally, many sharpness measures are sensitive to parameter invariances in neural networks, magnifying significantly under rescaling parameters. Motivated by these challenges, we introduce a new class of sharpness measures in this paper, leading to new sharpness-aware objective functions. We prove that these measures are universally expressive, allowing any function of the training loss Hessian matrix to be represented by appropriate hyperparameters. Furthermore, we show that the proposed objective functions explicitly bias towards minimizing their corresponding sharpness measures, and how they allow meaningful applications to models with parameter invariances (such as scale-invariances). Finally, as instances of our proposed general framework, we present Frob-SAM and Det-SAM, which are specifically designed to minimize the Frobenius norm and the determinant of the Hessian of the training loss, respectively. We also demonstrate the advantages of our general framework through extensive experiments.

NeurIPS Conference 2024 Conference Paper

Active Set Ordering

  • Quoc Phong Nguyen
  • Sunil Gupta
  • Svetha Venkatesh
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

In this paper, we formalize the active set ordering problem, which involves actively discovering a set of inputs based on their orderings determined by expensive evaluations of a blackbox function. We then propose the mean prediction (MP) algorithm and theoretically analyze it in terms of the regret of predicted pairwise orderings between inputs. Notably, as a special case of this framework, we can cast Bayesian optimization as an active set ordering problem by recognizing that maximizers can be identified solely by comparison rather than by precisely estimating the function evaluations. As a result, we are able to construct the popular Gaussian process upper confidence bound (GP-UCB) algorithm through the lens of ordering with several nuanced insights. We empirically validate the performance of our proposed solution using various synthetic functions and real-world datasets.

ICML Conference 2024 Conference Paper

Deletion-Anticipative Data Selection with a Limited Budget

  • Rachael Hwee Ling Sim
  • Jue Fan
  • Xiao Tian
  • Patrick Jaillet
  • Bryan Kian Hsiang Low

Learners with a limited budget can use supervised data subset selection and active learning techniques to select a smaller training set and reduce the cost of acquiring data and training machine learning (ML) models. However, the resulting high model performance, measured by a data utility function, may not be preserved when some data owners, enabled by the GDPR’s right to erasure, request their data to be deleted from the ML model. This raises an important question for learners who are temporarily unable or unwilling to acquire data again: During the initial data acquisition of a training set of size $k$, can we proactively maximize the data utility after future unknown deletions? We propose that the learner anticipates/estimates the probability that (i) each data owner in the feasible set will independently delete its data or (ii) a number of deletions occur out of $k$, and justify our proposal with concrete real-world use cases. Then, instead of directly maximizing the data utility function, the learner can maximize the expected or risk-averse post-deletion utility based on the anticipated probabilities. We further propose how to construct these deletion-anticipative data selection ($\texttt{DADS}$) maximization objectives to preserve monotone submodularity and near-optimality of greedy solutions, how to optimize the objectives and empirically evaluate $\texttt{DADS}$’ performance on real-world datasets.

ICLR Conference 2024 Conference Paper

Meta-VBO: Utilizing Prior Tasks in Optimizing Risk Measures with Gaussian Processes

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Research on optimizing the risk measure of a blackbox function using Gaussian processes, especially Bayesian optimization (BO) of risk measures, has become increasingly important due to the inevitable presence of uncontrollable variables in real-world applications. Nevertheless, existing works on BO of risk measures start the optimization from scratch for every new task without considering the results of prior tasks. In contrast, its vanilla BO counterpart has received a thorough investigation on utilizing prior tasks to speed up the current task through the body of works on meta-BO which, however, have not considered risk measures. To bridge this gap, this paper presents the first algorithm for meta-BO of risk measures (i.e., value-at-risk (VaR) and the conditional VaR), namely meta-VBO, by introducing a novel adjustment to the upper confidence bound acquisition function. Our proposed algorithm exhibits two desirable properties: (i) invariance to scaling and vertical shifting of the blackbox function and (ii) robustness to prior harmful tasks. We provide a theoretical performance guarantee for our algorithm and empirically demonstrate its performance using several synthetic function benchmarks and real-world objective functions.

ICLR Conference 2024 Conference Paper

Optimistic Bayesian Optimization with Unknown Constraints

  • Quoc Phong Nguyen
  • Wan Theng Ruth Chew
  • Le Song
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Though some research efforts have been dedicated to constrained Bayesian optimization (BO), there remains a notable absence of a principled approach with a theoretical performance guarantee in the decoupled setting. Such a setting involves independent evaluations of the objective function and constraints at different inputs, and is hence a relaxation of the commonly-studied coupled setting where functions must be evaluated together. As a result, the decoupled setting requires an adaptive selection between evaluating either the objective function or a constraint, in addition to selecting an input (in the coupled setting). This paper presents a novel constrained BO algorithm with a provable performance guarantee that can address the above relaxed setting. Specifically, it considers the fundamental trade-off between exploration and exploitation in constrained BO, and, interestingly, affords a noteworthy connection to active learning. The performance of our proposed algorithms is also empirically evaluated using several synthetic and real-world optimization problems.

NeurIPS Conference 2024 Conference Paper

Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars

  • Zhaoxuan Wu
  • Xiaoqiang Lin
  • Zhongxiang Dai
  • Wenyang Hu
  • Yao Shu
  • See-Kiong Ng
  • Patrick Jaillet
  • Bryan Kian Hsiang Low

Large language models (LLMs) have shown impressive capabilities in real-world applications. The capability of *in-context learning* (ICL) allows us to adapt an LLM to downstream tasks by including input-label exemplars in the prompt without model fine-tuning. However, the quality of these exemplars in the prompt greatly impacts performance, highlighting the need for an effective automated exemplar selection method. Recent studies have explored retrieval-based approaches to select exemplars tailored to individual test queries, which can be undesirable due to extra test-time computation and an increased risk of data exposure. Moreover, existing methods fail to adequately account for the impact of exemplar ordering on the performance. On the other hand, the impact of the *instruction*, another essential component in the prompt given to the LLM, is often overlooked in existing exemplar selection methods. To address these challenges, we propose a novel method named $\texttt{EASE}$, which leverages the hidden embedding from a pre-trained language model to represent ordered sets of exemplars and uses a neural bandit algorithm to optimize the sets of exemplars *while accounting for exemplar ordering*. Our $\texttt{EASE}$ can efficiently find an ordered set of exemplars that *performs well for all test queries* from a given task, thereby eliminating test-time computation. Importantly, $\texttt{EASE}$ can be readily extended to *jointly optimize both the exemplars and the instruction*. Through extensive empirical evaluations (including novel tasks), we demonstrate the superiority of $\texttt{EASE}$ over existing methods, and reveal practical insights about the impact of exemplar selection on ICL, which may be of independent interest. Our code is available at https: //github. com/ZhaoxuanWu/EASE-Prompt-Optimization.

ICML Conference 2024 Conference Paper

Use Your INSTINCT: INSTruction optimization for LLMs usIng Neural bandits Coupled with Transformers

  • Xiaoqiang Lin
  • Zhaoxuan Wu
  • Zhongxiang Dai
  • Wenyang Hu
  • Yao Shu
  • See-Kiong Ng
  • Patrick Jaillet
  • Bryan Kian Hsiang Low

Large language models (LLMs) have shown remarkable instruction-following capabilities and achieved impressive performances in various applications. However, the performances of LLMs depend heavily on the instructions given to them, which are typically manually tuned with substantial human efforts. Recent work has used the query-efficient Bayesian optimization (BO) algorithm to automatically optimize the instructions given to black-box LLMs. However, BO usually falls short when optimizing highly sophisticated (e. g. , high-dimensional) objective functions, such as the functions mapping an instruction to the performance of an LLM. This is mainly due to the limited expressive power of the Gaussian process (GP) which is used by BO as a surrogate to model the objective function. Meanwhile, it has been repeatedly shown that neural networks (NNs), especially pre-trained transformers, possess strong expressive power and can model highly complex functions. So, we adopt a neural bandit algorithm which replaces the GP in BO by an NN surrogate to optimize instructions for black-box LLMs. More importantly, the neural bandit algorithm allows us to naturally couple the NN surrogate with the hidden representation learned by a pre-trained transformer (i. e. , an open-source LLM), which significantly boosts its performance. These motivate us to propose our INSTruction optimization usIng Neural bandits Coupled with Transformers (INSTINCT) algorithm. We perform instruction optimization for ChatGPT and use extensive experiments to show that INSTINCT consistently outperforms baselines in different tasks, e. g. , various instruction induction tasks and the task of improving zero-shot chain-of-thought instructions. Our code is available at https: //github. com/xqlin98/INSTINCT.

NeurIPS Conference 2023 Conference Paper

Batch Bayesian Optimization For Replicable Experimental Design

  • Zhongxiang Dai
  • Quoc Phong Nguyen
  • Sebastian Tay
  • Daisuke Urano
  • Richalynn Leong
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Many real-world experimental design problems (a) evaluate multiple experimental conditions in parallel and (b) replicate each condition multiple times due to large and heteroscedastic observation noise. Given a fixed total budget, this naturally induces a trade-off between evaluating more unique conditions while replicating each of them fewer times vs. evaluating fewer unique conditions and replicating each more times. Moreover, in these problems, practitioners may be risk-averse and hence prefer an input with both good average performance and small variability. To tackle both challenges, we propose the Batch Thompson Sampling for Replicable Experimental Design (BTS-RED) framework, which encompasses three algorithms. Our BTS-RED-Known and BTS-RED-Unknown algorithms, for, respectively, known and unknown noise variance, choose the number of replications adaptively rather than deterministically such that an input with a larger noise variance is replicated more times. As a result, despite the noise heteroscedasticity, both algorithms enjoy a theoretical guarantee and are asymptotically no-regret. Our Mean-Var-BTS-RED algorithm aims at risk-averse optimization and is also asymptotically no-regret. We also show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.

ICML Conference 2023 Conference Paper

DRCFS: Doubly Robust Causal Feature Selection

  • Francesco Quinzan
  • Ashkan Soleymani
  • Patrick Jaillet
  • Cristian R. Rojas
  • Stefan Bauer

Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly non-linear and high-dimensional problems.

ICLR Conference 2023 Conference Paper

Federated Neural Bandits

  • Zhongxiang Dai
  • Yao Shu
  • Arun Verma
  • Flint Xiaofeng Fan
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Recent works on neural contextual bandits have achieved compelling performances due to their ability to leverage the strong representation power of neural networks (NNs) for reward prediction. Many applications of contextual bandits involve multiple agents who collaborate without sharing raw observations, thus giving rise to the setting of federated contextual bandits}. Existing works on federated contextual bandits rely on linear or kernelized bandits, which may fall short when modeling complex real-world reward functions. So, this paper introduces the federated neural-upper confidence bound (FN-UCB) algorithm. To better exploit the federated setting, FN-UCB adopts a weighted combination of two UCBs: $\text{UCB}^{a}$ allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations), while $\text{UCB}^{b}$ uses an NN with aggregated parameters for reward prediction in a similar way to federated averaging for supervised learning. Notably, the weight between the two UCBs required by our theoretical analysis is amenable to an interesting interpretation, which emphasizes $\text{UCB}^{a}$ initially for accelerated exploration and relies more on $\text{UCB}^{b}$ later after enough observations have been collected to train the NNs for accurate reward prediction (i.e., reliable exploitation). We prove sub-linear upper bounds on both the cumulative regret and the number of communication rounds of FN-UCB, and empirically demonstrate its competitive performance.

NeurIPS Conference 2023 Conference Paper

Incentives in Private Collaborative Machine Learning

  • Rachael Sim
  • Yehong Zhang
  • Nghia Hoang
  • Xinyi Xu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.

NeurIPS Conference 2023 Conference Paper

Memory-Constrained Algorithms for Convex Optimization

  • Moise Blanchard
  • Junhui Zhang
  • Patrick Jaillet

We propose a family of recursive cutting-plane algorithms to solve feasibility problems with constrained memory, which can also be used for first-order convex optimization. Precisely, in order to find a point within a ball of radius $\epsilon$ with a separation oracle in dimension $d$---or to minimize $1$-Lipschitz convex functions to accuracy $\epsilon$ over the unit ball---our algorithms use $\mathcal O(\frac{d^2}{p}\ln \frac{1}{\epsilon})$ bits of memory, and make $\mathcal O((C\frac{d}{p}\ln \frac{1}{\epsilon})^p)$ oracle calls. The family is parametrized by $p\in[d]$ and provides an oracle-complexity/memory trade-off in the sub-polynomial regime $\ln\frac{1}{\epsilon}\gg\ln d$. While several works gave lower-bound trade-offs (impossibility results)---we explicit here their dependence with $\ln\frac{1}{\epsilon}$, showing that these also hold in any sub-polynomial regime---to the best of our knowledge this is the first class of algorithms that provides a positive trade-off between gradient descent and cutting-plane methods in any regime with $\epsilon\leq 1/\sqrt d$. The algorithms divide the $d$ variables into $p$ blocks and optimize over blocks sequentially, with approximate separation vectors constructed using a variant of Vaidya's method. In the regime $\epsilon \leq d^{-\Omega(d)}$, our algorithm with $p=d$ achieves the information-theoretic optimal memory usage and improves the oracle-complexity of gradient descent.

ICML Conference 2023 Conference Paper

Multi-channel Autobidding with Budget and ROI Constraints

  • Yuan Deng
  • Negin Golrezaei
  • Patrick Jaillet
  • Jason Cheuk Nam Liang
  • Vahab Mirrokni

In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc. , each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e. g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser’s global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on ad auctions in each channels and how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem.

NeurIPS Conference 2023 Conference Paper

Quantum Bayesian Optimization

  • Zhongxiang Dai
  • Gregory Kang Ruey Lau
  • Arun Verma
  • Yao Shu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Kernelized bandits, also known as Bayesian optimization (BO), has been a prevalent method for optimizing complicated black-box reward functions. Various BO algorithms have been theoretically shown to enjoy upper bounds on their cumulative regret which are sub-linear in the number $T$ of iterations, and a regret lower bound of $\Omega(\sqrt{T})$ has been derived which represents the unavoidable regrets for any classical BO algorithm. Recent works on quantum bandits have shown that with the aid of quantum computing, it is possible to achieve tighter regret upper bounds better than their corresponding classical lower bounds. However, these works are restricted to either multi-armed or linear bandits, and are hence not able to solve sophisticated real-world problems with non-linear reward functions. To this end, we introduce the quantum-Gaussian process-upper confidence bound (Q-GP-UCB) algorithm. To the best of our knowledge, our Q-GP-UCB is the first BO algorithm able to achieve a regret upper bound of $\mathcal{O}(\text{poly}\log T)$, which is significantly smaller than its regret lower bound of $\Omega(\sqrt{T})$ in the classical setting. Moreover, thanks to our novel analysis of the confidence ellipsoid, our Q-GP-UCB with the linear kernel achieves a smaller regret than the quantum linear UCB algorithm from the previous work. We use simulations, as well as an experiment using a real quantum computer, to verify that the theoretical quantum speedup achieved by our Q-GP-UCB is also potentially relevant in practice.

ICLR Conference 2023 Conference Paper

Risk-Aware Reinforcement Learning with Coherent Risk Measures and Non-linear Function Approximation

  • Thanh Lam
  • Arun Verma
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

We study the risk-aware reinforcement learning (RL) problem in the episodic finite-horizon Markov decision process with unknown transition and reward functions. In contrast to the risk-neutral RL problem, we consider minimizing the risk of having low rewards, which arise due to the intrinsic randomness of the MDPs and imperfect knowledge of the model. Our work provides a unified framework to analyze the regret of risk-aware RL policy with coherent risk measures in conjunction with non-linear function approximation, which gives the first sub-linear regret bounds in the setting. Finally, we validate our theoretical results via empirical experiments on synthetic and real-world data.

ICLR Conference 2023 Conference Paper

Zeroth-Order Optimization with Trajectory-Informed Derivative Estimation

  • Yao Shu
  • Zhongxiang Dai
  • Weicong Sng
  • Arun Verma
  • Patrick Jaillet
  • Bryan Kian Hsiang Low

Zeroth-order (ZO) optimization, in which the derivative is unavailable, has recently succeeded in many important machine learning applications. Existing algorithms rely on finite difference (FD) methods for derivative estimation and gradient descent (GD)-based approaches for optimization. However, these algorithms suffer from query inefficiency because many additional function queries are required for derivative estimation in their every GD update, which typically hinders their deployment in real-world applications where every function query is expensive. To this end, we propose a trajectory-informed derivative estimation method which only employs the optimization trajectory (i.e., the history of function queries during optimization) and hence can eliminate the need for additional function queries to estimate a derivative. Moreover, based on our derivative estimation, we propose the technique of dynamic virtual updates, which allows us to reliably perform multiple steps of GD updates without reapplying derivative estimation. Based on these two contributions, we introduce the zeroth-order optimization with trajectory-informed derivative estimation (ZoRD) algorithm for query-efficient ZO optimization. We theoretically demonstrate that our trajectory-informed derivative estimation and our ZoRD algorithm improve over existing approaches, which is then supported by our real-world experiments such as black-box adversarial attack, non-differentiable metric optimization, and derivative-free reinforcement learning.

NeurIPS Conference 2022 Conference Paper

Effective Dimension in Bandit Problems under Censorship

  • Gauthier Guinet
  • Saurabh Amin
  • Patrick Jaillet

In this paper, we study both multi-armed and contextual bandit problems in censored environments. Our goal is to estimate the performance loss due to censorship in the context of classical algorithms designed for uncensored environments. Our main contributions include the introduction of a broad class of censorship models and their analysis in terms of the effective dimension of the problem -- a natural measure of its underlying statistical complexity and main driver of the regret bound. In particular, the effective dimension allows us to maintain the structure of the original problem at first order, while embedding it in a bigger space, and thus naturally leads to results analogous to uncensored settings. Our analysis involves a continuous generalization of the Elliptical Potential Inequality, which we believe is of independent interest. We also discover an interesting property of decision-making under censorship: a transient phase during which initial misspecification of censorship is self-corrected at an extra cost; followed by a stationary phase that reflects the inherent slowdown of learning governed by the effective dimension. Our results are useful for applications of sequential decision-making models where the feedback received depends on strategic uncertainty (e. g. , agents’ willingness to follow a recommendation) and/or random uncertainty (e. g. , loss or delay in arrival of information).

UAI Conference 2022 Conference Paper

On provably robust meta-Bayesian optimization

  • Zhongxiang Dai
  • Yizhou Chen
  • Haibin Yu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO) has become popular for sequential optimization of black-box functions. When BO is used to optimize a target function, we often have access to previous evaluations of potentially related functions. This begs the question as to whether we can leverage these previous experiences to accelerate the current BO task through meta-learning (meta-BO), while ensuring robustness against potentially harmful dissimilar tasks that could sabotage the convergence of BO. This paper introduces two scalable and provably robust meta-BO algorithms: robust meta-Gaussian process-upper confidence bound (RM-GP-UCB) and RM-GP-Thompson sampling (RM-GP-TS). We prove that both algorithms are asymptotically no-regret even when some or all previous tasks are dissimilar to the current task, and show that RM-GP-UCB enjoys a better theoretical robustness than RM-GP-TS. We also exploit the theoretical guarantees to optimize the weights assigned to individual previous tasks through regret minimization via online learning, which diminishes the impact of dissimilar tasks and hence further enhances the robustness. Empirical evaluations show that (a) RM-GP-UCB performs effectively and consistently across various applications, and (b) RM-GP-TS, despite being less robust than RM-GP-UCB both in theory and in practice, performs competitively in some scenarios with less dissimilar tasks and is more computationally efficient.

NeurIPS Conference 2022 Conference Paper

Sample-Then-Optimize Batch Neural Thompson Sampling

  • Zhongxiang Dai
  • Yao Shu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO), which uses a Gaussian process (GP) as a surrogate to model its objective function, is popular for black-box optimization. However, due to the limitations of GPs, BO underperforms in some problems such as those with categorical, high-dimensional or image inputs. To this end, recent works have used the highly expressive neural networks (NNs) as the surrogate model and derived theoretical guarantees using the theory of neural tangent kernel (NTK). However, these works suffer from the limitations of the requirement to invert an extremely large parameter matrix and the restriction to the sequential (rather than batch) setting. To overcome these limitations, we introduce two algorithms based on the Thompson sampling (TS) policy named Sample-Then-Optimize Batch Neural TS (STO-BNTS) and STO-BNTS-Linear. To choose an input query, we only need to train an NN (resp. a linear model) and then choose the query by maximizing the trained NN (resp. linear model), which is equivalently sampled from the GP posterior with the NTK as the kernel function. As a result, our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy. Next, we derive regret upper bounds for our algorithms with batch evaluations, and use insights from batch BO and NTK to show that they are asymptotically no-regret under certain conditions. Finally, we verify their empirical effectiveness using practical AutoML and reinforcement learning experiments.

NeurIPS Conference 2022 Conference Paper

Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper investigates the problem of fairly trading off between payoff and model rewards in collaborative machine learning (ML) where parties aggregate their datasets together to obtain improved ML models over that of each party. Supposing parties can afford the optimal model trained on the aggregated dataset, we propose an allocation scheme that distributes the payoff fairly. Notably, the same scheme can be derived from two different approaches based on (a) desirable properties of the parties' payoffs or (b) that of the underlying payoff flows from one party to another. While the former is conceptually simpler, the latter can be used to handle the practical constraint on the budgets of parties. In particular, we propose desirable properties for achieving a fair adjustment of the payoff flows that can trade off between the model reward's performance and the payoff reward. We empirically demonstrate that our proposed scheme is a sensible solution in several scenarios of collaborative ML with different budget constraints.

AAAI Conference 2021 Conference Paper

An Information-Theoretic Framework for Unifying Active Learning Problems

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that subsumes an existing LSE algorithm and achieves state-of-theart performance in LSE problems with a continuous input domain. Then, by exploiting the relationship between LSE and BO, we design a competitive information-theoretic acquisition function for BO that has interesting connections to upper confidence bound and max-value entropy search (MES). The latter connection reveals a drawback of MES which has important implications on not only MES but also on other MES-based acquisition functions. Finally, our unifying information-theoretic framework can be applied to solve a generalized problem of LSE and BO involving multiple level sets in a data-efficient manner. We empirically evaluate the performance of our proposed algorithms using synthetic benchmark functions, a real-world dataset, and in hyperparameter tuning of machine learning models.

ICML Conference 2021 Conference Paper

Collaborative Bayesian Optimization with Fair Regret

  • Rachael Hwee Ling Sim
  • Yehong Zhang
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO) is a popular tool for optimizing complex and costly-to-evaluate black-box objective functions. To further reduce the number of function evaluations, any party performing BO may be interested to collaborate with others to optimize the same objective function concurrently. To do this, existing BO algorithms have considered optimizing a batch of input queries in parallel and provided theoretical bounds on their cumulative regret reflecting inefficiency. However, when the objective function values are correlated with real-world rewards (e. g. , money), parties may be hesitant to collaborate if they risk incurring larger cumulative regret (i. e. , smaller real-world reward) than others. This paper shows that fairness and efficiency are both necessary for the collaborative BO setting. Inspired by social welfare concepts from economics, we propose a new notion of regret capturing these properties and a collaborative BO algorithm whose convergence rate can be theoretically guaranteed by bounding the new regret, both of which share an adjustable parameter for trading off between fairness vs. efficiency. We empirically demonstrate the benefits (e. g. , increased fairness) of our algorithm using synthetic and real-world datasets.

NeurIPS Conference 2021 Conference Paper

Differentially Private Federated Bayesian Optimization with Distributed Exploration

  • Zhongxiang Dai
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO) has recently been extended to the federated learning (FL) setting by the federated Thompson sampling (FTS) algorithm, which has promising applications such as federated hyperparameter tuning. However, FTS is not equipped with a rigorous privacy guarantee which is an important consideration in FL. Recent works have incorporated differential privacy (DP) into the training of deep neural networks through a general framework for adding DP to iterative algorithms. Following this general DP framework, our work here integrates DP into FTS to preserve user-level privacy. We also leverage the ability of this general DP framework to handle different parameter vectors, as well as the technique of local modeling for BO, to further improve the utility of our algorithm through distributed exploration (DE). The resulting differentially private FTS with DE (DP-FTS-DE) algorithm is endowed with theoretical guarantees for both the privacy and utility and is amenable to interesting theoretical insights about the privacy-utility trade-off. We also use real-world experiments to show that DP-FTS-DE achieves high utility (competitive performance) with a strong privacy guarantee (small privacy loss) and induces a trade-off between privacy and utility.

UAI Conference 2021 Conference Paper

Learning to learn with Gaussian processes

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper presents Gaussian process meta-learning (GPML) for few-shot regression, which explicitly exploits the distance between regression problems/tasks using a novel task kernel. It contrasts sharply with the popular metric-based meta-learning approach which is based on the distance between data inputs or their embeddings in the few-shot learning literature. Apart from the superior predictive performance by capturing the diversity of different tasks, GPML offers a set of representative tasks that are useful for understanding the task distribution. We empirically demonstrate the performance and interpretability of GPML in several few-shot regression problems involving a multimodal task distribution and real-world datasets.

ICML Conference 2021 Conference Paper

Model Fusion for Personalized Learning

  • Thanh Chi Lam
  • Trong Nghia Hoang
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Production systems operating on a growing domain of analytic services often require generating warm-start solution models for emerging tasks with limited data. One potential approach to address this warm-start challenge is to adopt meta learning to generate a base model that can be adapted to solve unseen tasks with minimal fine-tuning. This however requires the training processes of previous solution models of existing tasks to be synchronized. This is not possible if these models were pre-trained separately on private data owned by different entities and cannot be synchronously re-trained. To accommodate for such scenarios, we develop a new personalized learning framework that synthesizes customized models for unseen tasks via fusion of independently pre-trained models of related tasks. We establish performance guarantee for the proposed framework and demonstrate its effectiveness on both synthetic and real datasets.

NeurIPS Conference 2021 Conference Paper

Optimizing Conditional Value-At-Risk of Black-Box Functions

  • Quoc Phong Nguyen
  • Zhongxiang Dai
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper presents two Bayesian optimization (BO) algorithms with theoretical performance guarantee to maximize the conditional value-at-risk (CVaR) of a black-box function: CV-UCB and CV-TS which are based on the well-established principle of optimism in the face of uncertainty and Thompson sampling, respectively. To achieve this, we develop an upper confidence bound of CVaR and prove the no-regret guarantee of CV-UCB by utilizing an interesting connection between CVaR and value-at-risk (VaR). For CV-TS, though it is straightforwardly performed with Thompson sampling, bounding its Bayesian regret is non-trivial because it requires a tail expectation bound for the distribution of CVaR of a black-box function, which has not been shown in the literature. The performances of both CV-UCB and CV-TS are empirically evaluated in optimizing CVaR of synthetic benchmark functions and simulated real-world optimization problems.

AAAI Conference 2021 Conference Paper

Top-k Ranking Bayesian Optimization

  • Quoc Phong Nguyen
  • Sebastian Tay
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper presents a novel approach to top-k ranking Bayesian optimization (top-k ranking BO) which is a practical and significant generalization of preferential BO to handle top-k ranking and tie/indifference observations. We first design a surrogate model that is not only capable of catering to the above observations, but is also supported by a classic random utility model. Another equally important contribution is the introduction of the first information-theoretic acquisition function in BO with preferential observation called multinomial predictive entropy search (MPES) which is flexible in handling these observations and optimized for all inputs of a query jointly. MPES possesses superior performance compared with existing acquisition functions that select the inputs of a query one at a time greedily. We empirically evaluate the performance of MPES using several synthetic benchmark functions, CIFAR-10 dataset, and SUSHI preference dataset.

UAI Conference 2021 Conference Paper

Trusted-maximizers entropy search for efficient Bayesian optimization

  • Quoc Phong Nguyen
  • Zhaoxuan Wu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Information-based Bayesian optimization (BO) algorithms have achieved state-of-the-art performance in optimizing a black-box objective function. However, they usually require several approximations or simplifying assumptions (without clearly understanding their effects on the BO performance) and/or their generalization to batch BO is computationally unwieldy, especially with an increasing batch size. To alleviate these issues, this paper presents a novel trusted-maximizers entropy search (TES) acquisition function: It measures how much an input query contributes to the information gain on the maximizer over a finite set of trusted maximizers, i. e. , inputs optimizing functions that are sampled from the Gaussian process posterior belief of the objective function. Evaluating TES requires either only a stochastic approximation with sampling or a deterministic approximation with expectation propagation, both of which are investigated and empirically evaluated using synthetic benchmark objective functions and real-world optimization problems, e. g. , hyperparameter tuning of a convolutional neural network and synthesizing physically realizable faces to fool a black-box face recognition system. Though TES can naturally be generalized to a batch variant with either approximation, the latter is amenable to be scaled to a much larger batch size in our experiments.

ICML Conference 2021 Conference Paper

Value-at-Risk Optimization with Gaussian Processes

  • Quoc Phong Nguyen
  • Zhongxiang Dai
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Value-at-risk (VaR) is an established measure to assess risks in critical real-world applications with random environmental factors. This paper presents a novel VaR upper confidence bound (V-UCB) algorithm for maximizing the VaR of a black-box objective function with the first no-regret guarantee. To realize this, we first derive a confidence bound of VaR and then prove the existence of values of the environmental random variable (to be selected to achieve no regret) such that the confidence bound of VaR lies within that of the objective function evaluated at such values. Our V-UCB algorithm empirically demonstrates state-of-the-art performance in optimizing synthetic benchmark functions, a portfolio optimization problem, and a simulated robot task.

JAIR Journal 2021 Journal Article

Zone pAth Construction (ZAC) based Approaches for Effective Real-Time Ridesharing

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Patrick Jaillet

Real-time ridesharing systems such as UberPool, Lyft Line and GrabShare have become hugely popular as they reduce the costs for customers, improve per trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the “right” requests to travel together in the “right” available vehicles in real-time, so that the objective (e.g., requests served, revenue or delay) is optimized. This challenge has been addressed in existing work by: (i) generating as many relevant feasible combinations of requests (with respect to the available delay for customers) as possible in real-time; and then (ii) optimizing assignment of the feasible request combinations to vehicles. Since the number of request combinations increases exponentially with the increase in vehicle capacity and number of requests, unfortunately, such approaches have to employ ad hoc heuristics to identify a subset of request combinations for assignment. Our key contribution is in developing approaches that employ zone (abstraction of individual locations) paths instead of request combinations. Zone paths allow for generation of significantly more “relevant” combinations (in comparison to ad hoc heuristics) in real-time than competing approaches due to two reasons: (i) Each zone path can typically represent multiple request combinations; (ii) Zone paths are generated using a combination of offline and online methods. Specifically, we contribute both myopic (ridesharing assignment focussed on current requests only) and non-myopic (ridesharing assignment considers impact on expected future requests) approaches that employ zone paths. In our experimental results, we demonstrate that our myopic approach outperforms the current best myopic approach for ridesharing on both real-world and synthetic datasets (with respect to both objective and runtime). We also show that our non-myopic approach obtains 14.7% improvement over existing myopic approach. Our non-myopic approach gets improvements of up to 12.48% over a recent non-myopic approach, NeurADP. Even when NeurADP is allowed to optimize learning over test settings, results largely remain comparable except in a couple of cases, where NeurADP performs better.

NeurIPS Conference 2020 Conference Paper

Federated Bayesian Optimization via Thompson Sampling

  • Zhongxiang Dai
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO) is a prominent approach to optimizing expensive-to-evaluate black-box functions. The massive computational capability of edge devices such as mobile phones, coupled with privacy concerns, has led to a surging interest in federated learning (FL) which focuses on collaborative training of deep neural networks (DNNs) via first-order optimization techniques. However, some common machine learning tasks such as hyperparameter tuning of DNNs lack access to gradients and thus require zeroth-order/black-box optimization. This hints at the possibility of extending BO to the FL setting (FBO) for agents to collaborate in these black-box optimization tasks. This paper presents federated Thompson sampling (FTS) which overcomes a number of key challenges of FBO and FL in a principled way: We (a) use random Fourier features to approximate the Gaussian process surrogate model used in BO, which naturally produces the parameters to be exchanged between agents, (b) design FTS based on Thompson sampling, which significantly reduces the number of parameters to be exchanged, and (c) provide a theoretical convergence guarantee that is robust against heterogeneous agents, which is a major challenge in FL and FBO. We empirically demonstrate the effectiveness of FTS in terms of communication efficiency, computational efficiency, and practical performance.

ICML Conference 2020 Conference Paper

Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion

  • Trong Nghia Hoang
  • Thanh Lam
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Model fusion is an emerging study in collective learning where heterogeneous experts with private data and learning architectures need to combine their black-box knowledge for better performance. Existing literature achieves this via a local knowledge distillation scheme that transfuses the predictive patterns of each pre-trained expert onto a white-box imitator model, which can be incorporated efficiently into a global model. This scheme however does not extend to multi-task scenarios where different experts were trained to solve different tasks and only part of their distilled knowledge is relevant to a new task. To address this multi-task challenge, we develop a new fusion paradigm that represents each expert as a distribution over a spectrum of predictive prototypes, which are isolated from task-specific information encoded within the prototype distribution. The task-agnostic prototypes can then be reintegrated to generate a new model that solves a new task encoded with a different prototype distribution. The fusion and adaptation performance of the proposed framework is demonstrated empirically on several real-world benchmark datasets.

NeurIPS Conference 2020 Conference Paper

No-regret Learning in Price Competitions under Consumer Reference Effects

  • Negin Golrezaei
  • Patrick Jaillet
  • Jason Cheuk Nam Liang

We study long-run market stability for repeated price competitions between two firms, where consumer demand depends on firms' posted prices and consumers’ price expectations called reference prices. Consumers' reference prices vary over time according to a memory-based dynamic, which is a weighted average of all historical prices. We focus on the setting where firms are not aware of demand functions and how reference prices are formed but have access to an oracle that provides a measure of consumers' responsiveness to the current posted prices. We show that if the firms run no-regret algorithms, in particular, online mirror descent (OMD), with decreasing step sizes, the market stabilizes in the sense that firms' prices and reference prices converge to a stable Nash Equilibrium (SNE). Interestingly, we also show that there exist constant step sizes under which the market stabilizes. We further characterize the rate of convergence to the SNE for both decreasing and constant OMD step sizes.

ICML Conference 2020 Conference Paper

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

  • Zhongxiang Dai
  • Yizhou Chen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet
  • Teck-Hua Ho

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.

NeurIPS Conference 2020 Conference Paper

Variational Bayesian Unlearning

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

This paper studies the problem of approximately unlearning a Bayesian model from a small subset of the training data to be erased. We frame this problem as one of minimizing the Kullback-Leibler divergence between the approximate posterior belief of model parameters after directly unlearning from erased data vs. the exact posterior belief from retraining with remaining data. Using the variational inference (VI) framework, we show that it is equivalent to minimizing an evidence upper bound which trades off between fully unlearning from erased data vs. not entirely forgetting the posterior belief given the full data (i. e. , including the remaining data); the latter prevents catastrophic unlearning that can render the model useless. In model training with VI, only an approximate (instead of exact) posterior belief given the full data can be obtained, which makes unlearning even more challenging. We propose two novel tricks to tackle this challenge. We empirically demonstrate our unlearning methods on Bayesian models such as sparse Gaussian process and logistic regression using synthetic and real-world datasets.

ICML Conference 2019 Conference Paper

Bayesian Optimization Meets Bayesian Optimal Stopping

  • Zhongxiang Dai
  • Haibin Yu
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Bayesian optimization (BO) is a popular paradigm for optimizing the hyperparameters of machine learning (ML) models due to its sample efficiency. Many ML models require running an iterative training procedure (e. g. , stochastic gradient descent). This motivates the question whether information available during the training process (e. g. , validation accuracy after each epoch) can be exploited for improving the epoch efficiency of BO algorithms by early-stopping model training under hyperparameter settings that will end up under-performing and hence eliminating unnecessary training epochs. This paper proposes to unify BO (specifically, Gaussian process-upper confidence bound (GP-UCB)) with Bayesian optimal stopping (BO-BOS) to boost the epoch efficiency of BO. To achieve this, while GP-UCB is sample-efficient in the number of function evaluations, BOS complements it with epoch efficiency for each function evaluation by providing a principled optimal stopping mechanism for early stopping. BO-BOS preserves the (asymptotic) no-regret performance of GP-UCB using our specified choice of BOS parameters that is amenable to an elegant interpretation in terms of the exploration-exploitation trade-off. We empirically evaluate the performance of BO-BOS and demonstrate its generality in hyperparameter optimization of ML models and two other interesting applications.

NeurIPS Conference 2019 Conference Paper

Implicit Posterior Variational Inference for Deep Gaussian Processes

  • Haibin Yu
  • Yizhou Chen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet
  • Zhongxiang Dai

A multi-layer deep Gaussian process (DGP) model is a hierarchical composition of GP models with a greater expressive power. Exact DGP inference is intractable, which has motivated the recent development of deterministic and stochastic approximation methods. Unfortunately, the deterministic approximation methods yield a biased posterior belief while the stochastic one is computationally costly. This paper presents an implicit posterior variational inference (IPVI) framework for DGPs that can ideally recover an unbiased posterior belief and still preserve time efficiency. Inspired by generative adversarial networks, our IPVI framework achieves this by casting the DGP inference problem as a two-player game in which a Nash equilibrium, interestingly, coincides with an unbiased posterior belief. This consequently inspires us to devise a best-response dynamics algorithm to search for a Nash equilibrium (i. e. , an unbiased posterior belief). Empirical evaluation shows that IPVI outperforms the state-of-the-art approximation methods for DGPs.

IJCAI Conference 2019 Conference Paper

Improving Customer Satisfaction in Bike Sharing Systems through Dynamic Repositioning

  • Supriyo Ghosh
  • Jing Yu Koh
  • Patrick Jaillet

In bike sharing systems (BSSs), the uncoordinated movements of customers using bikes lead to empty or congested stations, which causes a significant loss in customer demand. In order to reduce the lost demand, a wide variety of existing research has employed a fixed set of historical demand patterns to design efficient bike repositioning solutions. However, the progress remains slow in understanding the underlying uncertainties in demand and designing proactive robust bike repositioning solutions. To bridge this gap, we propose a dynamic bike repositioning approach based on a probabilistic satisficing method which uses the uncertain demand parameters that are learnt from historical data. We develop a novel and computationally efficient mixed integer linear program for maximizing the probability of satisfying the uncertain demand so as to improve the overall customer satisfaction and efficiency of the system. Extensive experimental results from a simulation model built on a real-world bike sharing data set demonstrate that our approach is not only robust to uncertainties in customer demand, but also outperforms the existing state-of-the-art repositioning approaches in terms of reducing the expected lost demand.

ICAPS Conference 2019 Conference Paper

ZAC: A Zone Path Construction Approach for Effective Real-Time Ridesharing

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Patrick Jaillet

Real-time ridesharing systems such as UberPool, Lyft Line, GrabShare have become hugely popular as they reduce the costs for customers, improve per trip revenue for drivers and reduce traffic on the roads by grouping customers with similar itineraries. The key challenge in these systems is to group the right requests to travel in available vehicles in real-time, so that the objective (e. g. , requests served, revenue or delay) is optimized. The most relevant existing work has focussed on generating as many relevant feasible (with respect to available delay for customers) combinations of requests (referred to as trips) as possible in real-time. Since the number of trips increases exponentially with the increase in vehicle capacity and number of requests, unfortunately, such an approach has to employ ad hoc heuristics to identify relevant trips. To that end, we propose an approach that generates many zone (abstraction of individual locations) paths – where each zone path can represent multiple trips (combinations of requests) – and assigns available vehicles to these zone paths to optimize the objective. The key advantage of our approach is that these zone paths are generated using a combination of offline and online methods, consequently allowing for the generation of many more relevant combinations in real-time than competing approaches. We demonstrate that our approach outperforms (with respect to both objective and runtime) the current best approach for ridesharing on both real world and synthetic datasets.

AIJ Journal 2018 Journal Article

Online spatio-temporal matching in stochastic and dynamic domains

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Patrick Jaillet

Online spatio-temporal matching of servers/services to customers is a problem that arises at a large scale in many domains associated with shared transportation (e. g. , taxis, ride sharing, super shuttles, etc.) and delivery services (e. g. , food, equipment, clothing, home fuel, etc.). A key characteristic of these problems is that the matching of servers/services to customers in one stage has a direct impact on the matching in the next stage. For instance, it is efficient for taxis to pick up customers closer to the drop off point of the customer from the first stage of matching. Traditionally, greedy/myopic approaches have been adopted to address such large scale online matching problems. While they provide solutions in a scalable manner, due to their myopic nature, the quality of matching obtained can be improved significantly (demonstrated in our experimental results). In this paper, we present a multi-stage stochastic optimization formulation to consider potential future demand scenarios (obtained from past data). We then provide an enhancement to solve large scale problems more effectively and efficiently online. We also provide the worst-case theoretical bounds on the performance of different approaches. Finally, we demonstrate the significant improvement provided by our techniques over myopic approaches and two other multi-stage approaches from literature (Approximate Dynamic Programming and Hybrid Multi-Stage Stochastic optimization formulation) on three real world taxi data sets.

JAIR Journal 2017 Journal Article

Dynamic Repositioning to Reduce Lost Demand in Bike Sharing Systems

  • Supriyo Ghosh
  • Pradeep Varakantham
  • Yossiri Adulyasak
  • Patrick Jaillet

Bike Sharing Systems (BSSs) are widely adopted in major cities of the world due to concerns associated with extensive private vehicle usage, namely, increased carbon emissions, traffic congestion and usage of nonrenewable resources. In a BSS, base stations are strategically placed throughout a city and each station is stocked with a pre-determined number of bikes at the beginning of the day. Customers hire the bikes from one station and return them at another station. Due to unpredictable movements of customers hiring bikes, there is either congestion (more than required) or starvation (fewer than required) of bikes at base stations. Existing data has shown that congestion/starvation is a common phenomenon that leads to a large number of unsatisfied customers resulting in a significant loss in customer demand. In order to tackle this problem, we propose an optimisation formulation to reposition bikes using vehicles while also considering the routes for vehicles and future expected demand. Furthermore, we contribute two approaches that rely on decomposability in the problem (bike repositioning and vehicle routing) and aggregation of base stations to reduce the computation time significantly. Finally, we demonstrate the utility of our approach by comparing against two benchmark approaches on two real-world data sets of bike sharing systems. These approaches are evaluated using a simulation where the movements of customers are generated from real-world data sets.

NeurIPS Conference 2017 Conference Paper

Online Learning with a Hint

  • Ofer Dekel
  • Arthur Flajolet
  • Nika Haghtalab
  • Patrick Jaillet

We study a variant of online linear optimization where the player receives a hint about the loss function at the beginning of each round. The hint is given in the form of a vector that is weakly correlated with the loss vector on that round. We show that the player can benefit from such a hint if the set of feasible actions is sufficiently round. Specifically, if the set is strongly convex, the hint can be used to guarantee a regret of O(log(T)), and if the set is q-uniformly convex for q\in(2, 3), the hint can be used to guarantee a regret of o(sqrt{T}). In contrast, we establish Omega(sqrt{T}) lower bounds on regret when the set of feasible actions is a polyhedron.

ICAPS Conference 2017 Conference Paper

Online Repositioning in Bike Sharing Systems

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Supriyo Ghosh
  • Sanjay Dominik Jena
  • Patrick Jaillet

Due to increased traffic congestion and carbon emissions, Bike Sharing Systems (BSSs) are adopted in various cities for short distance travels, specifically for last mile transportation. The success of a bike sharing system depends on its ability to have bikes available at the "right" base stations at the "right" times. Typically, carrier vehicles are used to perform repositioning of bikes between stations so as to satisfy customer requests. Owing to the uncertainty in customer demand and day-long repositioning, the problem of having bikes available at the right base stations at the right times is a challenging one. In this paper, we propose a multi-stage stochastic formulation, to consider expected future demand over a set of scenarios to find an efficient repositioning strategy for bike sharing systems. Furthermore, we provide a Lagrangian decomposition approach (that decouples the global problem into routing and repositioning slaves and employs a novel DP approach to efficiently solve routing slave) and a greedy online anticipatory heuristic to solve large scale problems effectively and efficiently. Finally, in our experimental results, we demonstrate significant reduction in lost demand provided by our techniques on real world datasets from two bike sharing companies in comparison to existing benchmark approaches.

NeurIPS Conference 2017 Conference Paper

Real-Time Bidding with Side Information

  • Arthur Flajolet
  • Patrick Jaillet

We consider the problem of repeated bidding in online advertising auctions when some side information (e. g. browser cookies) is available ahead of submitting a bid in the form of a $d$-dimensional vector. The goal for the advertiser is to maximize the total utility (e. g. the total number of clicks) derived from displaying ads given that a limited budget $B$ is allocated for a given time horizon $T$. Optimizing the bids is modeled as a contextual Multi-Armed Bandit (MAB) problem with a knapsack constraint and a continuum of arms. We develop UCB-type algorithms that combine two streams of literature: the confidence-set approach to linear contextual MABs and the probabilistic bisection search method for stochastic root-finding. Under mild assumptions on the underlying unknown distribution, we establish distribution-independent regret bounds of order $\tilde{O}(d \cdot \sqrt{T})$ when either $B = \infty$ or when $B$ scales linearly with $T$.

JAIR Journal 2017 Journal Article

Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)

  • Asrar Ahmed
  • Pradeep Varakantham
  • Meghna Lowalekar
  • Yossiri Adulyasak
  • Patrick Jaillet

Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR) that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from literature. We observe that optimizing the myopic variants of regret, OSR and CEMR are better than directly optimizing the regret.

AAAI Conference 2016 Conference Paper

Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond

  • Chun Kai Ling
  • Kian Hsiang Low
  • Patrick Jaillet

This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive -optimal GPP ( -GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of -GPP with performance guarantee. We empirically demonstrate the effectiveness of our -GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.

AAAI Conference 2016 Conference Paper

Online Spatio-Temporal Matching in Stochastic and Dynamic Domains

  • Meghna Lowalekar
  • Pradeep Varakantham
  • Patrick Jaillet

Spatio-temporal matching of services to customers online is a problem that arises on a large scale in many domains associated with shared transportation (ex: taxis, ride sharing, super shuttles, etc.) and delivery services (ex: food, equipment, clothing, home fuel, etc.). A key characteristic of these problems is that matching of services to customers in one round has a direct impact on the matching of services to customers in the next round. For instance, in the case of taxis, in the second round taxis can only pick up customers closer to the drop off point of the customer from the first round of matching. Traditionally, greedy myopic approaches have been adopted to address such large scale online matching problems. While they provide solutions in a scalable manner, due to their myopic nature the quality of matching obtained can be improved significantly (demonstrated in our experimental results). In this paper, we present a two stage stochastic optimization formulation to consider expected future demand. We then provide multiple enhancements to solve large scale problems more effectively and efficiently. Finally, we demonstrate the significant improvement provided by our techniques over myopic approaches on two real world taxi data sets.

ICAPS Conference 2015 Conference Paper

Dynamic Redeployment to Counter Congestion or Starvation in Vehicle Sharing Systems

  • Supriyo Ghosh
  • Pradeep Varakantham
  • Yossiri Adulyasak
  • Patrick Jaillet

Extensive usage of private vehicles has led to increased traffic congestion, carbon emissions, and usage of non-renewable resources. These concerns have led to the wide adoption of vehicle sharing (ex: bike sharing, car sharing) systems in many cities of the world. In vehicle-sharing systems, base stations (ex: docking stations for bikes) are strategically placed throughout a city and each of the base stations contain a pre-determined number of vehicles at the beginning of each day. Due to the stochastic and individualistic movement of customers, there is typically either congestion (more than required)or starvation (fewer than required) of vehicles at certain base stations. As demonstrated in our experimental results, this happens often and can cause a significant loss in demand. We propose to dynamically redeploy idle vehicles using carriers so as to minimize lost de-mand or alternatively maximize revenue for the vehicle sharing company. To that end, we contribute an optimization formulation to jointly address the redeploy-ment (of vehicles) and routing (of carriers) problemsand provide two approaches that rely on decomposability and abstraction of problem domains to reduce the computation time significantly. Finally, we demonstrate the utility of our approaches on two real world data sets of bike-sharing companies.

SoCS Conference 2015 Conference Paper

Dynamic Redeployment to Counter Congestion or Starvation in Vehicle Sharing Systems

  • Supriyo Ghosh
  • Pradeep Varakantham
  • Yossiri Adulyasak
  • Patrick Jaillet

Vehicle sharing (ex: bike sharing, car sharing) systems, an attractive alternative of private transportation, are widely adopted in major cities around the world. In vehicle-sharing systems, base stations (ex: docking stations for bikes) are strategically placed throughout a city and each of the base stations contain a pre-determined number of vehicles at the beginning of each day. Due to the stochastic and individualistic movement of customers, there is typically either congestion (more than required) or starvation (fewer than required) of vehicles at certain base stations, which causes a significant loss in demand. We propose to dynamically redeploy idle vehicles using carriers so as to minimize lost demand or alternatively maximize revenue for the vehicle sharing company. To that end, we contribute an optimization formulation to jointly address the redeployment (of vehicles) and routing (of carriers) problems and provide two approaches that rely on decomposability and abstraction of problem domains to reduce the computation time significantly.

NeurIPS Conference 2015 Conference Paper

Inverse Reinforcement Learning with Locally Consistent Reward Functions

  • Quoc Phong Nguyen
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

Existing inverse reinforcement learning (IRL) algorithms have assumed each expert’s demonstrated trajectory to be produced by only a single reward function. This paper presents a novel generalization of the IRL problem that allows each trajectory to be generated by multiple locally consistent reward functions, hence catering to more realistic and complex experts’ behaviors. Solving our generalized IRL problem thus involves not only learning these reward functions but also the stochastic transitions between them at any state (including unvisited states). By representing our IRL problem with a probabilistic graphical model, an expectation-maximization (EM) algorithm can be devised to iteratively learn the different reward functions and the stochastic transitions between them in order to jointly improve the likelihood of the expert’s demonstrated trajectories. As a result, the most likely partition of a trajectory into segments that are generated from different locally consistent reward functions selected by EM can be derived. Empirical evaluation on synthetic and real-world datasets shows that our IRL algorithm outperforms the state-of-the-art EM clustering with maximum likelihood IRL, which is, interestingly, a reduced variant of our approach.

AAAI Conference 2015 Conference Paper

Parallel Gaussian Process Regression for Big Data: Low-Rank Representation Meets Markov Approximation

  • Kian Hsiang Low
  • Jiangbo Yu
  • Jie Chen
  • Patrick Jaillet

The expressive power of a Gaussian process (GP) model comes at a cost of poor scalability in the data size. To improve its scalability, this paper presents a low-rankcum-Markov approximation (LMA) of the GP model that is novel in leveraging the dual computational advantages stemming from complementing a low-rank approximate representation of the full-rank GP based on a support set of inputs with a Markov approximation of the resulting residual process; the latter approximation is guaranteed to be closest in the Kullback-Leibler distance criterion subject to some constraint and is considerably more refined than that of existing sparse GP models utilizing low-rank representations due to its more relaxed conditional independence assumption (especially with larger data). As a result, our LMA method can trade off between the size of the support set and the order of the Markov property to (a) incur lower computational cost than such sparse GP models while achieving predictive performance comparable to them and (b) accurately represent features/patterns of any scale. Interestingly, varying the Markov order produces a spectrum of LMAs with PIC approximation and full-rank GP at the two extremes. An advantage of our LMA method is that it is amenable to parallelization on multiple machines/cores, thereby gaining greater scalability. Empirical evaluation on three real-world datasets in clusters of up to 32 computing nodes shows that our centralized and parallel LMA methods are significantly more time-efficient and scalable than state-of-the-art sparse and full-rank GP regression methods while achieving comparable predictive performances.

AAAI Conference 2015 Conference Paper

Solving Uncertain MDPs with Objectives that Are Separable over Instantiations of Model Uncertainty

  • Yossiri Adulyasak
  • Pradeep Varakantham
  • Asrar Ahmed
  • Patrick Jaillet

Markov Decision Problems, MDPs offer an effective mechanism for planning under uncertainty. However, due to unavoidable uncertainty over models, it is difficult to obtain an exact specification of an MDP. We are interested in solving MDPs, where transition and reward functions are not exactly specified. Existing research has primarily focussed on computing infinite horizon stationary policies when optimizing robustness, regret and percentile based objectives. We focus specifically on finite horizon problems with a special emphasis on objectives that are separable over individual instantiations of model uncertainty (i. e. , objectives that can be expressed as a sum over instantiations of model uncertainty): (a) First, we identify two separable objectives for uncertain MDPs: Average Value Maximization (AVM) and Confidence Probability Maximisation (CPM). (b) Second, we provide optimization based solutions to compute policies for uncertain MDPs with such objectives. In particular, we exploit the separability of AVM and CPM objectives by employing Lagrangian dual decomposition (LDD). (c) Finally, we demonstrate the utility of the LDD approach on a benchmark problem from the literature.

AAAI Conference 2014 Conference Paper

Decentralized Stochastic Planning with Anonymity in Interactions

  • Pradeep Varakantham
  • Yossiri Adulyasak
  • Patrick Jaillet

In this paper, we solve cooperative decentralized stochastic planning problems, where the interactions between agents (specified using transition and reward functions) are dependent on the number of agents (and not on the identity of the individual agents) involved in the interaction. A collision of robots in a narrow corridor, defender teams coordinating patrol activities to secure a target, etc. are examples of such anonymous interactions. Formally, we consider problems that are a subset of the well known Decentralized MDP (DEC- MDP) model, where the anonymity in interactions is specified within the joint reward and transition functions. In this paper, not only do we introduce a general model model called D-SPAIT to capture anonymity in interactions, but also provide optimization based optimal and local-optimal solutions for generalizable sub-categories of D-SPAIT.

ICML Conference 2014 Conference Paper

Nonmyopic \(\epsilon\)-Bayes-Optimal Active Learning of Gaussian Processes

  • Trong Nghia Hoang
  • Bryan Kian Hsiang Low
  • Patrick Jaillet
  • Mohan S. Kankanhalli

A fundamental issue in active learning of Gaussian processes is that of the exploration-exploitation trade-off. This paper presents a novel nonmyopic ε-Bayes-optimal active learning (ε-BAL) approach that jointly and naturally optimizes the trade-off. In contrast, existing works have primarily developed myopic/greedy algorithms or performed exploration and exploitation separately. To perform active learning in real time, we then propose an anytime algorithm based on ε-BAL with performance guarantee and empirically demonstrate using synthetic and real-world datasets that, with limited budget, it outperforms the state-of-the-art algorithms.

UAI Conference 2013 Conference Paper

Parallel Gaussian Process Regression with Low-Rank Covariance Matrix Approximations

  • Jie Chen 0027
  • Nannan Cao
  • Bryan Kian Hsiang Low
  • Ruofei Ouyang
  • Colin Keng-Yan Tan
  • Patrick Jaillet

Gaussian processes (GP) are Bayesian nonparametric models that are widely used for probabilistic regression. Unfortunately, it cannot scale well with large data nor perform real-time predictions due to its cubic time cost in the data size. This paper presents two parallel GP regression methods that exploit low-rank covariance matrix approximations for distributing the computational load among parallel machines to achieve time efficiency and scalability. We theoretically guarantee the predictive performances of our proposed parallel GPs to be equivalent to that of some centralized approximate GP regression methods: The computation of their centralized counterparts can be distributed among parallel machines, hence achieving greater time efficiency and scalability. We analytically compare the properties of our parallel GPs such as time, space, and communication complexity. Empirical evaluation on two real-world datasets in a cluster of 20 computing nodes shows that our parallel GPs are significantly more time-efficient and scalable than their centralized counterparts and exact/full GP while achieving predictive performances comparable to full GP.

NeurIPS Conference 2013 Conference Paper

Regret based Robust Solutions for Uncertain Markov Decision Processes

  • Asrar Ahmed
  • Pradeep Varakantham
  • Yossiri Adulyasak
  • Patrick Jaillet

In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focussed on the computation of {\em maximin} policies which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed {\em minimax} regret as a suitable alternative to the {\em maximin} objective for robust optimization. However, existing algorithms for handling {\em minimax} regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) Handle uncertainties over both transition and reward models; (b) Dependence of model uncertainties across state, action pairs and decision epochs; (c) Scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.

UAI Conference 2012 Conference Paper

Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena

  • Jie Chen 0027
  • Bryan Kian Hsiang Low
  • Colin Keng-Yan Tan
  • Ali Oran
  • Patrick Jaillet
  • John M. Dolan
  • Gaurav S. Sukhatme

The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2 FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2 FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2 FAS algorithm is significantly more time-efficient and scalable than state-ofthe-art centralized algorithms while achieving comparable predictive performance.