Author name cluster

Jonathan Scarlett

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

29 papers
2 author rows

Possible papers

AAAI Conference 2024 Conference Paper

Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization

  • Xu Cai
  • Jonathan Scarlett

In this paper, we study the problem of estimating a normalizing constant, namely the integral of the exponential of f scaled by a problem parameter lambda, through queries to the black-box function f. We assume f belongs to a reproducing kernel Hilbert space (RKHS), and show that the difficulty of estimating the normalizing constant to within a small relative error depends on the value of lambda: when lambda approaches zero, the problem is similar to Bayesian quadrature (BQ), while when lambda approaches infinity, the problem is similar to Bayesian optimization (BO). More generally, the problem varies between BQ and BO. We find that this pattern holds true even when the function evaluations are noisy, bringing new aspects to this topic. Our findings are supported by both algorithm-independent lower bounds and algorithmic upper bounds, as well as simulation studies conducted on a variety of benchmark functions.
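
The two regimes in the abstract above can be illustrated with a minimal numerical sketch. It assumes the normalizing constant has the form Z(lambda) = integral over [0, 1] of exp(-lambda * f(x)) dx; the sign convention, the domain, and the test function `f` are assumptions made for illustration and are not taken from the paper. It uses a plain trapezoidal rule on a known function, not a kernel-based estimator.

```python
import math

def normalizing_constant(f, lam, n=10_000):
    """Trapezoidal estimate of Z(lam) = integral over [0, 1] of exp(-lam * f(x)) dx."""
    h = 1.0 / n
    vals = [math.exp(-lam * f(i * h)) for i in range(n + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])

def f(x):
    # Hypothetical test function with minimum value 0 at x = 0.3.
    return (x - 0.3) ** 2

# Small lambda: Z is close to 1 - lam * (integral of f), so (1 - Z) / lam
# approximates the integral of f (a quadrature-like quantity).
small = 1e-3
integral_like = (1.0 - normalizing_constant(f, small)) / small

# Large lambda: -log(Z) / lam approaches min f (an optimization-like quantity).
large = 1e4
min_like = -math.log(normalizing_constant(f, large)) / large
```

For this choice of `f`, `integral_like` should be close to the exact integral 0.37/3 and `min_like` close to the minimum value 0.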

NeurIPS Conference 2024 Conference Paper

Memory-Efficient Gradient Unrolling for Large-Scale Bi-level Optimization

  • Qianli Shen
  • Yezhen Wang
  • Zhouhao Yang
  • Xiang Li
  • Haonan Wang
  • Yang Zhang
  • Jonathan Scarlett
  • Zhanxing Zhu

Bi-level optimization (BO) has become a fundamental mathematical framework for addressing hierarchical machine learning problems. As deep learning models continue to grow in size, the demand for scalable bi-level optimization has become increasingly critical. Traditional gradient-based bi-level optimization algorithms, due to their inherent characteristics, are ill-suited to meet the demands of large-scale applications. In this paper, we introduce **F**orward **G**radient **U**nrolling with **F**orward **G**radient, abbreviated as **(FG)$^2$U**, which achieves an unbiased stochastic approximation of the meta gradient for bi-level optimization. (FG)$^2$U circumvents the memory and approximation issues associated with classical bi-level optimization approaches, and delivers significantly more accurate gradient estimates than existing large-scale bi-level optimization approaches. Additionally, (FG)$^2$U is inherently designed to support parallel computing, enabling it to effectively leverage large-scale distributed computing systems to achieve significant computational efficiency. In practice, (FG)$^2$U and other methods can be strategically placed at different stages of the training process to achieve a more cost-effective two-phase paradigm. Further, (FG)$^2$U is easy to implement within popular deep learning frameworks, and can be conveniently adapted to address more challenging zeroth-order bi-level optimization scenarios. We provide a thorough convergence analysis and a comprehensive practical discussion for (FG)$^2$U, complemented by extensive empirical evaluations, showcasing its superior performance in diverse large-scale bi-level optimization tasks.

TMLR Journal 2024 Journal Article

Regret Bounds for Noise-Free Cascaded Kernelized Bandits

  • Zihan Li
  • Jonathan Scarlett

We consider optimizing a function network in the noise-free grey-box setting with RKHS function classes, where the exact intermediate results are observable. We assume that the structure of the network is known (but not the underlying functions comprising it), and we study three types of structures: (1) chain: a cascade of scalar-valued functions, (2) multi-output chain: a cascade of vector-valued functions, and (3) feed-forward network: a fully connected feed-forward network of scalar-valued functions. We propose a sequential upper confidence bound based algorithm GPN-UCB along with a general theoretical upper bound on the cumulative regret. In addition, we propose a non-adaptive sampling based method along with its theoretical upper bound on the simple regret for the Matérn kernel. We also provide algorithm-independent lower bounds on the simple regret and cumulative regret. Our regret bounds for GPN-UCB have the same dependence on the time horizon as the best known in the vanilla black-box setting, as well as near-optimal dependencies on other parameters (e.g., RKHS norm and network length).

NeurIPS Conference 2023 Conference Paper

A Unified Framework for Uniform Signal Recovery in Nonlinear Generative Compressed Sensing

  • Junren Chen
  • Jonathan Scarlett
  • Michael Ng
  • Zhaoqiang Liu

In generative compressed sensing (GCS), we want to recover a signal $\mathbf{x^*}\in\mathbb{R}^n$ from $m$ measurements ($m\ll n$) using a generative prior $\mathbf{x^*}\in G(\mathbb{B}_2^k(r))$, where $G$ is typically an $L$-Lipschitz continuous generative model and $\mathbb{B}_2^k(r)$ represents the radius-$r$ $\ell_2$-ball in $\mathbb{R}^k$. Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x^*}$ rather than for all $\mathbf{x^*}$ simultaneously. In this paper, we build a unified framework to derive uniform recovery guarantees for nonlinear GCS where the observation model is nonlinear and possibly discontinuous or unknown. Our framework accommodates GCS with 1-bit/uniformly quantized observations and the single index model as canonical examples. Specifically, using a single realization of the sensing ensemble and generalized Lasso, all $\mathbf{x^*}\in G(\mathbb{B}_2^k(r))$ can be recovered up to an $\ell_2$-error at most $\epsilon$ using roughly $\tilde{O}({k}/{\epsilon^2})$ samples, with omitted logarithmic factors typically being dominated by $\log L$. Notably, this almost coincides with existing non-uniform guarantees up to logarithmic factors, hence the uniformity costs very little. As part of our technical contributions, we introduce a Lipschitz approximation to handle discontinuous observation models. We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy. Experimental results are presented to corroborate our theory.

UAI Conference 2023 Conference Paper

Benefits of monotonicity in safe exploration with Gaussian processes

  • Arpan Losalka
  • Jonathan Scarlett

We consider the problem of sequentially maximising an unknown function over a set of actions while ensuring that every sampled point has a function value below a given safety threshold. We model the function using kernel-based and Gaussian process methods, while differing from previous works in our assumption that the function is monotonically increasing with respect to a safety variable. This assumption is motivated by various practical applications such as adaptive clinical trial design and robotics. Taking inspiration from the GP-UCB and SAFEOPT algorithms, we propose an algorithm, monotone safe UCB (M-SafeUCB) for this task. We show that M-SafeUCB enjoys theoretical guarantees in terms of safety, a suitably-defined regret notion, and approximately finding the entire safe boundary. In addition, we illustrate that the monotonicity assumption yields significant benefits in terms of the guarantees obtained, as well as algorithmic simplicity and efficiency. We support our theoretical findings by performing empirical evaluations on a variety of functions, including a simulated clinical trial experiment.

ICML Conference 2023 Conference Paper

Communication-Constrained Bandits under Additive Gaussian Noise

  • Prathamesh Mayekar
  • Jonathan Scarlett
  • Vincent Y. F. Tan

We study a distributed stochastic multi-armed bandit where a client supplies the learner with communication-constrained feedback based on the rewards for the corresponding arm pulls. In our setup, the client must encode the rewards such that the second moment of the encoded rewards is no more than $P$, and this encoded reward is further corrupted by additive Gaussian noise of variance $\sigma^2$; the learner only has access to this corrupted reward. For this setting, we derive an information-theoretic lower bound of $\Omega\left(\sqrt{\frac{KT}{\mathtt{SNR} \wedge1}} \right)$ on the minimax regret of any scheme, where $\mathtt{SNR}\coloneqq \frac{P}{\sigma^2}$, and $K$ and $T$ are the number of arms and time horizon, respectively. Furthermore, we propose a multi-phase bandit algorithm, $\mathtt{UE}\text{-}\mathtt{UCB}\text{++}$, which matches this lower bound to a minor additive factor. $\mathtt{UE}\text{-}\mathtt{UCB}\text{++}$ performs uniform exploration in its initial phases and then utilizes the upper confidence bound (UCB) bandit algorithm in its final phase. An interesting feature of $\mathtt{UE}\text{-}\mathtt{UCB}\text{++}$ is that the coarser estimates of the mean rewards formed during a uniform exploration phase help to refine the encoding protocol in the next phase, leading to more accurate mean estimates of the rewards in the subsequent phase. This positive reinforcement cycle is critical to reducing the number of uniform exploration rounds and closely matching our lower bound.

AAMAS Conference 2023 Conference Paper

For One and All: Individual and Group Fairness in the Allocation of Indivisible Goods

  • Jonathan Scarlett
  • Nicholas Teh
  • Yair Zick

Fair allocation of indivisible goods is a well-explored problem. Traditionally, research focused on individual fairness (are individual agents satisfied with their allotted share?) and group fairness (are groups of agents treated fairly?). In this paper, we explore the coexistence of individual envy-freeness (𝑖-EF) and its group counterpart, group weighted envy-freeness (𝑔-WEF), in the allocation of indivisible goods. We propose several polynomial-time algorithms that provably achieve 𝑖-EF and 𝑔-WEF simultaneously in various degrees of approximation under three different conditions: (i) when agents have identical additive valuation functions, 𝑖-EFX and 𝑔-WEF1 can be achieved simultaneously; (ii) when agents within a group share a common valuation function, an allocation satisfying both 𝑖-EF1 and 𝑔-WEF1 exists; and (iii) when agents’ valuations for goods within a group differ, we show that while maintaining 𝑖-EF1, we can achieve a 1/3-approximation to a notion termed ex-ante 𝑔-WEF1. Our results thus provide a first step towards connecting individual and group fairness in the allocation of indivisible goods, in the hopes of its useful application to domains requiring the reconciliation of diversity with individual demands.

TMLR Journal 2023 Journal Article

On Average-Case Error Bounds for Kernel-Based Bayesian Quadrature

  • Xu Cai
  • Thanh Lam
  • Jonathan Scarlett

In this paper, we study error bounds for Bayesian quadrature (BQ), with an emphasis on noisy settings, randomized algorithms, and average-case performance measures. We seek to approximate the integral of functions in a Reproducing Kernel Hilbert Space (RKHS), particularly focusing on the Matérn-$\nu$ and squared exponential (SE) kernels, with samples from the function potentially being corrupted by Gaussian noise. We provide a two-step meta-algorithm that serves as a general tool for relating the average-case quadrature error with the $L^2$-function approximation error. When specialized to the Matérn kernel, we recover an existing near-optimal error rate while avoiding the existing method of repeatedly sampling points. When specialized to other settings, we obtain new average-case results for settings including the SE kernel with noise and the Matérn kernel with misspecification. Finally, we present algorithm-independent lower bounds that have greater generality and/or give distinct proofs compared to existing ones.

NeurIPS Conference 2022 Conference Paper

A Robust Phased Elimination Algorithm for Corruption-Tolerant Gaussian Process Bandits

  • Ilija Bogunovic
  • Zihan Li
  • Andreas Krause
  • Jonathan Scarlett

We consider the sequential optimization of an unknown, continuous, and expensive to evaluate reward function, from noisy and adversarially corrupted observed rewards. When the corruption attacks are subject to a suitable budget $C$ and the function lives in a Reproducing Kernel Hilbert Space (RKHS), the problem can be posed as corrupted Gaussian process (GP) bandit optimization. We propose a novel robust elimination-type algorithm that runs in epochs, combines exploration with infrequent switching to select a small subset of actions, and plays each action for multiple time instants. Our algorithm, Robust GP Phased Elimination (RGP-PE), successfully balances robustness to corruptions with exploration and exploitation such that its performance degrades minimally in the presence (or absence) of adversarial corruptions. When $T$ is the number of samples and $\gamma_T$ is the maximal information gain, the corruption-dependent term in our regret bound is $O(C \gamma_T^{3/2})$, which is significantly tighter than the existing $O(C \sqrt{T \gamma_T})$ for several commonly-considered kernels. We perform the first empirical study of robustness in the corrupted GP bandit setting, and show that our algorithm is robust against a variety of adversarial attacks.

ICML Conference 2022 Conference Paper

Adversarial Attacks on Gaussian Process Bandits

  • Eric Han
  • Jonathan Scarlett

Gaussian processes (GP) are a widely-adopted tool used to sequentially optimize black-box functions, where evaluations are costly and potentially noisy. Recent works on GP bandits have proposed to move beyond random noise and devise algorithms robust to adversarial attacks. This paper studies this problem from the attacker’s perspective, proposing various adversarial attack methods with differing assumptions on the attacker’s strength and prior information. Our goal is to understand adversarial attacks on GP bandits from theoretical and practical perspectives. We focus primarily on targeted attacks on the popular GP-UCB algorithm and a related elimination-based algorithm, based on adversarially perturbing the function $f$ to produce another function $\tilde{f}$ whose optima are in some target region. Based on our theoretical analysis, we devise both white-box attacks (known $f$) and black-box attacks (unknown $f$), with the former including a Subtraction attack and Clipping attack, and the latter including an Aggressive subtraction attack. We demonstrate that adversarial attacks on GP bandits can succeed in forcing the algorithm towards the target region even with a low attack budget, and we test our attacks’ effectiveness on a diverse range of objective functions.

ICLR Conference 2022 Conference Paper

Generative Principal Component Analysis

  • Zhaoqiang Liu
  • Jiulong Liu
  • Subhroshekhar Ghosh
  • Jun Han
  • Jonathan Scarlett

In this paper, we study the problem of principal component analysis with generative modeling assumptions, adopting a general model for the observed matrix that encompasses notable special cases, including spiked matrix recovery and phase retrieval. The key assumption is that the first principal eigenvector lies near the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We propose a quadratic estimator, and show that it enjoys a statistical rate of order $\sqrt{\frac{k\log L}{m}}$, where $m$ is the number of samples. Moreover, we provide a variant of the classic power method, which projects the calculated data onto the range of the generative model during each iteration. We show that under suitable conditions, this method converges exponentially fast to a point achieving the above-mentioned statistical rate. This rate is conjectured in (Aubin et al., 2019; Cocola et al., 2020) to be the best possible even when we only restrict to the special case of spiked matrix models. We perform experiments on various image datasets for spiked matrix and phase retrieval models, and illustrate performance gains of our method compared to the classic power method and the truncated power method devised for sparse principal component analysis.

ICML Conference 2022 Conference Paper

Improved Convergence Rates for Sparse Approximation Methods in Kernel-Based Learning

  • Sattar Vakili
  • Jonathan Scarlett
  • Da-shan Shiu
  • Alberto Bernacchia

Kernel-based models such as kernel ridge regression and Gaussian processes are ubiquitous in machine learning applications for regression and optimization. It is well known that a major downside for kernel-based models is the high computational cost; given a dataset of $n$ samples, the cost grows as $\mathcal{O}(n^3)$. Existing sparse approximation methods can yield a significant reduction in the computational cost, effectively reducing the actual cost down to as low as $\mathcal{O}(n)$ in certain cases. Despite this remarkable empirical success, significant gaps remain in the existing results for the analytical bounds on the error due to approximation. In this work, we provide novel confidence intervals for the Nyström method and the sparse variational Gaussian process approximation method, which we establish using novel interpretations of the approximate (surrogate) posterior variance of the models. Our confidence intervals lead to improved performance bounds in both regression and optimization problems.

AAAI Conference 2022 Conference Paper

Max-Min Grouped Bandits

  • Zhenlin Wang
  • Jonathan Scarlett

In this paper, we introduce a multi-armed bandit problem termed max-min grouped bandits, in which the arms are arranged in possibly-overlapping groups, and the goal is to find the group whose worst arm has the highest mean reward. This problem is of interest in applications such as recommendation systems and resource allocation, and is also closely related to widely-studied robust optimization problems. We present two algorithms based on successive elimination and robust optimization, and derive upper bounds on the number of samples to guarantee finding a max-min optimal or near-optimal group, as well as an algorithm-independent lower bound. We discuss the degree of tightness of our bounds in various cases of interest, and the difficulties in deriving uniformly tight bounds.
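
The max-min objective described in the abstract above can be written down directly when the arm means are known. The sketch below only illustrates that objective; the arm means and groups are hypothetical, and it is not one of the paper's algorithms, which must estimate the means from noisy samples.

```python
def max_min_group(means, groups):
    """Return (group index, worst-arm mean) for the group whose worst arm has the highest mean."""
    worst = [min(means[arm] for arm in group) for group in groups]
    best = max(range(len(groups)), key=lambda i: worst[i])
    return best, worst[best]

# Hypothetical known arm means and possibly-overlapping groups (arm 1 is shared).
means = [0.9, 0.2, 0.6, 0.5, 0.7]
groups = [[0, 1], [2, 3], [1, 4]]

best_group, value = max_min_group(means, groups)
```

Here the first group contains the single best arm (mean 0.9) but also a poor arm (mean 0.2), so the second group wins under the max-min criterion.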

AAAI Conference 2021 Conference Paper

High-Dimensional Bayesian Optimization via Tree-Structured Additive Models

  • Eric Han
  • Ishank Arora
  • Jonathan Scarlett

Bayesian Optimization (BO) has shown significant success in tackling expensive low-dimensional black-box optimization problems. Many optimization problems of interest are high-dimensional, and scaling BO to such settings remains an important challenge. In this paper, we consider generalized additive models in which low-dimensional functions with overlapping subsets of variables are composed to model a high-dimensional target function. Our goal is to lower the computational resources required and facilitate faster model learning by reducing the model complexity while retaining the sample-efficiency of existing methods. Specifically, we constrain the underlying dependency graphs to tree structures in order to facilitate both the structure learning and optimization of the acquisition function. For the former, we propose a hybrid graph learning algorithm based on Gibbs sampling and mutation. In addition, we propose a novel zooming-based algorithm that permits generalized additive models to be employed more efficiently in the case of continuous domains. We demonstrate and discuss the efficacy of our approach via a range of experiments on synthetic functions and real-world datasets.

ICML Conference 2021 Conference Paper

Lenient Regret and Good-Action Identification in Gaussian Process Bandits

  • Xu Cai
  • Selwyn Gomes
  • Jonathan Scarlett

In this paper, we study the problem of Gaussian process (GP) bandits under relaxed optimization criteria stating that any function value above a certain threshold is “good enough”. On the theoretical side, we study various lenient regret notions in which all near-optimal actions incur zero penalty, and provide upper bounds on the lenient regret for GP-UCB and an elimination algorithm, circumventing the usual $O(\sqrt{T})$ term (with time horizon $T$) resulting from zooming extremely close towards the function maximum. In addition, we complement these upper bounds with algorithm-independent lower bounds. On the practical side, we consider the problem of finding a single “good action” according to a known pre-specified threshold, and introduce several good-action identification algorithms that exploit knowledge of the threshold. We experimentally find that such algorithms can typically find a good action faster than standard optimization-based approaches.

ICML Conference 2021 Conference Paper

On Lower Bounds for Standard and Robust Gaussian Process Bandit Optimization

  • Xu Cai
  • Jonathan Scarlett

In this paper, we consider algorithm-independent lower bounds for the problem of black-box optimization of functions having a bounded norm in some Reproducing Kernel Hilbert Space (RKHS), which can be viewed as a non-Bayesian Gaussian process bandit problem. In the standard noisy setting, we provide a novel proof technique for deriving lower bounds on the regret, with benefits including simplicity, versatility, and an improved dependence on the error probability. In a robust setting in which the final point is perturbed by an adversary, we strengthen an existing lower bound that only holds for target success probabilities very close to one, by allowing for arbitrary target success probabilities in (0, 1). Furthermore, in a distinct robust setting in which every sampled point may be perturbed by a constrained adversary, we provide a novel lower bound for deterministic strategies, demonstrating an inevitable joint dependence of the cumulative regret on the corruption level and the time horizon, in contrast with existing lower bounds that only characterize the individual dependencies.

NeurIPS Conference 2021 Conference Paper

Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors

  • Zhaoqiang Liu
  • Subhroshekhar Ghosh
  • Jonathan Scarlett

Compressive phase retrieval is a popular variant of the standard compressive sensing problem in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with near-optimal sample complexity for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs, roughly $O(k \log L)$ samples suffice to guarantee that any signal minimizing an amplitude-based empirical loss function is close to the true signal. Attaining this sample complexity with a practical algorithm remains a difficult challenge, and finding a good initialization for gradient-based methods has been observed to pose a major bottleneck. To partially address this, we further show that roughly $O(k \log L)$ samples ensure sufficient closeness between the underlying signal and any globally optimal solution to an optimization problem designed for spectral initialization (though finding such a solution may still be challenging). We also adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional, matching an information-theoretic lower bound. While these guarantees do not directly correspond to a practical algorithm, we propose a practical spectral initialization method motivated by our findings, and experimentally observe performance gains over various existing spectral initialization methods for sparse phase retrieval.

AAAI Conference 2020 Conference Paper

A MaxSAT-Based Framework for Group Testing

  • Lorenzo Ciampiconi
  • Bishwamittra Ghosh
  • Jonathan Scarlett
  • Kuldeep S Meel

The success of MaxSAT (maximum satisfiability) solving in recent years has motivated researchers to apply MaxSAT solvers in diverse discrete combinatorial optimization problems. Group testing has been studied as a combinatorial optimization problem, where the goal is to find defective items among a set of items by performing sets of tests on items. In this paper, we propose a MaxSAT-based framework, called MGT, that solves group testing, in particular, the decoding phase of non-adaptive group testing. We extend this approach to the noisy variant of group testing, and propose a compact MaxSAT-based encoding that guarantees an optimal solution. Our extensive experimental results show that MGT can solve group testing instances of 10000 items with 3% defectivity, which no prior work can handle to the best of our knowledge. Furthermore, MGT has better accuracy than the LP-based approach. We also discover an interesting phase transition behavior in the runtime, which reveals the easy-hard-easy nature of group testing.
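
For readers unfamiliar with the decoding phase of non-adaptive group testing mentioned in the abstract above, the following minimal sketch shows a classical baseline decoder, COMP, in the noiseless setting. This is a swapped-in illustration of the decoding task, not the MaxSAT-based MGT framework; the pooling design and the defective set are hypothetical.

```python
def comp_decode(tests, outcomes, n_items):
    """Noiseless COMP decoding.

    Every item that appears in at least one negative test is cleared;
    all remaining items are declared defective.
    """
    cleared = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            cleared.update(pool)
    return sorted(set(range(n_items)) - cleared)

# Hypothetical non-adaptive design: each test pools three of six items.
tests = [[0, 1, 2], [2, 3, 4], [0, 4, 5], [1, 3, 5]]
true_defectives = {2}

# A noiseless test is positive iff its pool contains at least one defective item.
outcomes = [any(item in true_defectives for item in pool) for pool in tests]

decoded = comp_decode(tests, outcomes, n_items=6)
```

In the noiseless setting COMP never misses a defective item, but it may declare extra items defective when too few tests are used; optimization-based decoders aim to do better at the cost of more computation.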

ICML Conference 2020 Conference Paper

Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors

  • Zhaoqiang Liu
  • Selwyn Gomes
  • Avtansh Tiwari
  • Jonathan Scarlett

The goal of standard 1-bit compressive sensing is to accurately recover an unknown sparse vector from binary-valued measurements, each indicating the sign of a linear function of the vector. Motivated by recent advances in compressive sensing with generative models, where a generative modeling assumption replaces the usual sparsity assumption, we study the problem of 1-bit compressive sensing with generative models. We first consider noiseless 1-bit measurements, and provide sample complexity bounds for approximate recovery under i.i.d. Gaussian measurements and a Lipschitz continuous generative prior, as well as a near-matching algorithm-independent lower bound. Moreover, we demonstrate that the Binary $\epsilon$-Stable Embedding property, which characterizes the robustness of the reconstruction to measurement errors and noise, also holds for 1-bit compressive sensing with Lipschitz continuous generative models with sufficiently many Gaussian measurements. In addition, we apply our results to neural network generative models, and provide a proof-of-concept numerical experiment demonstrating significant improvements over sparsity-based approaches.

NeurIPS Conference 2020 Conference Paper

The Generalized Lasso with Nonlinear Observations and Generative Priors

  • Zhaoqiang Liu
  • Jonathan Scarlett

In this paper, we study the problem of signal estimation from noisy non-linear measurements when the unknown $n$-dimensional signal is in the range of an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs. We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models, such as linear, logistic, 1-bit, and other quantized models. In addition, we consider the impact of adversarial corruptions on these measurements. Our analysis is based on a generalized Lasso approach (Plan and Vershynin, 2016). We first provide a non-uniform recovery guarantee, which states that under i.i.d. Gaussian measurements, roughly $O\left(\frac{k}{\epsilon^2}\log L\right)$ samples suffice for recovery with an $\ell_2$-error of $\epsilon$, and that this scheme is robust to adversarial noise. Then, we apply this result to neural network generative models, and discuss various extensions to other models and non-i.i.d. measurements. Moreover, we show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property, which is satisfied by the 1-bit and censored Tobit models.

NeurIPS Conference 2019 Conference Paper

Learning Erdos-Renyi Random Graphs via Edge Detecting Queries

  • Zihan Li
  • Matthias Fresacher
  • Jonathan Scarlett

In this paper, we consider the problem of learning an unknown graph via queries on groups of nodes, with the result indicating whether or not at least one edge is present among those nodes. While learning arbitrary graphs with $n$ nodes and $k$ edges is known to be hard in the sense of requiring $\Omega( \min\{ k^2 \log n, n^2\})$ tests (even when a small probability of error is allowed), we show that learning an Erdős-Rényi random graph with an average of $\bar{k}$ edges is much easier; namely, one can attain asymptotically vanishing error probability with only $O(\bar{k} \log n)$ tests. We establish such bounds for a variety of algorithms inspired by the group testing problem, with explicit constant factors indicating a near-optimal number of tests, and in some cases asymptotic optimality including constant factors. In addition, we present an alternative design that permits a near-optimal sublinear decoding time of $O(\bar{k} \log^2 \bar{k} + \bar{k} \log n)$.
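
The query model in the abstract above (a test on a group of nodes reports whether at least one edge lies among them) can be sketched as follows. The hidden graph is hypothetical, and the pairwise learner is only a naive baseline that spends one query per node pair; it is not one of the group-testing-inspired algorithms studied in the paper.

```python
from itertools import combinations

def edge_query(edges, nodes):
    """Return True if at least one edge has both endpoints in `nodes`."""
    node_set = set(nodes)
    return any(u in node_set and v in node_set for u, v in edges)

def learn_by_pairs(n, query):
    """Naive baseline: query every pair of nodes, using n * (n - 1) / 2 tests."""
    return {(u, v) for u, v in combinations(range(n), 2) if query((u, v))}

# Hypothetical hidden graph on 5 nodes, with each edge stored as (u, v), u < v.
hidden_edges = {(0, 1), (2, 4)}

recovered = learn_by_pairs(5, lambda nodes: edge_query(hidden_edges, nodes))
```

The naive baseline needs a number of tests quadratic in the number of nodes, which is the kind of cost that pooled queries over larger node groups are meant to avoid.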

NeurIPS Conference 2018 Conference Paper

Adversarially Robust Optimization with Gaussian Processes

  • Ilija Bogunovic
  • Jonathan Scarlett
  • Stefanie Jegelka
  • Volkan Cevher

In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.
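
The robustness requirement in the abstract above amounts to maximizing the worst-case function value over a neighborhood of the returned point. The sketch below illustrates that objective on a known one-dimensional function evaluated on a grid; the function, the grid, and the perturbation radius are hypothetical, and this is not StableOpt, which works with an unknown function and GP confidence bounds.

```python
def robust_argmax(values, radius):
    """Index maximizing the minimum of `values` over a window of +/- `radius` grid steps."""
    n = len(values)

    def worst_case(i):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        return min(values[lo:hi])

    return max(range(n), key=worst_case)

def f(x):
    # Hypothetical objective: a narrow peak of height 1.0 at x = 0.2
    # and a broad peak of height 0.6 at x = 0.7.
    return max(1.0 - 50.0 * abs(x - 0.2), 0.6 - 2.0 * abs(x - 0.7))

grid = [i / 100 for i in range(101)]
values = [f(x) for x in grid]

standard_idx = max(range(len(values)), key=values.__getitem__)
robust_idx = robust_argmax(values, radius=5)  # adversary may shift the point by up to 0.05
```

The standard maximizer sits on the narrow peak, where a small perturbation causes a large drop, while the robust maximizer sits on the broad peak.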

ICML Conference 2018 Conference Paper

Tight Regret Bounds for Bayesian Optimization in One Dimension

  • Jonathan Scarlett

We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T\log T})$. This gives a tight characterization up to a $\sqrt{\log T}$ factor, and includes the first non-trivial lower bound for noisy BO. Our assumptions are satisfied, for example, by the squared exponential and Matérn-$\nu$ kernels, with the latter requiring $\nu > 2$. Our results certify the near-optimality of existing bounds (Srinivas et al., 2009) for the SE kernel, while proving them to be strictly suboptimal for the Matérn kernel with $\nu > 2$.

STOC Conference 2017 Conference Paper

An adaptive sublinear-time block sparse fourier transform

  • Volkan Cevher
  • Michael Kapralov
  • Jonathan Scarlett
  • Amir Zandieh

The problem of approximately computing the $k$ dominant Fourier coefficients of a vector $X$ quickly, and using few samples in the time domain, is known as the Sparse Fourier Transform (sparse FFT) problem. A long line of work on the sparse FFT has resulted in algorithms with $O(k \log n \log(n/k))$ runtime [Hassanieh et al., STOC'12] and $O(k \log n)$ sample complexity [Indyk et al., FOCS'14]. This paper revisits the sparse FFT problem with the added twist that the sparse coefficients approximately obey a $(k_0, k_1)$-block sparse model. In this model, signal frequencies are clustered in $k_0$ intervals with width $k_1$ in Fourier space, and $k = k_0 k_1$ is the total sparsity.

NeurIPS Conference 2017 Conference Paper

Phase Transitions in the Pooled Data Problem

  • Jonathan Scarlett
  • Volkan Cevher

In this paper, we study the *pooled data* problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a *phase transition* between complete success and complete failure. In addition, we present a novel *noisy* variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an *approximate recovery* setting, where a given number of errors is allowed in the decoded labels.
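
The noiseless observation model described above, in which each pooled test reveals the count of each label within the pool, can be sketched as follows. The label alphabet, the pool size, and the random pool are hypothetical choices for illustration, and the sketch does not attempt any decoding.

```python
from collections import Counter
import random

def pooled_test(labels, pool):
    """Noiseless pooled test: count of each label among the items in `pool`."""
    return Counter(labels[i] for i in pool)

rng = random.Random(0)
labels = [rng.choice("ABC") for _ in range(12)]  # hidden labels (hypothetical example)
pool = rng.sample(range(12), 6)                  # one random pool of 6 distinct items
counts = pooled_test(labels, pool)
print(dict(counts))
```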

ICML Conference 2017 Conference Paper

Robust Submodular Maximization: A Non-Uniform Partitioning Approach

  • Ilija Bogunovic
  • Slobodan Mitrovic
  • Jonathan Scarlett
  • Volkan Cevher

We study the problem of maximizing a monotone submodular function subject to a cardinality constraint $k$, with the added twist that a number of items $\tau$ from the returned set may be removed. We focus on the worst-case setting considered by Orlin et al. (2016), in which a constant-factor approximation guarantee was given for $\tau = o(\sqrt{k})$. In this paper, we solve a key open problem raised therein, presenting a new Partitioned Robust (PRo) submodular maximization algorithm that achieves the same guarantee for more general $\tau = o(k)$. Our algorithm constructs partitions consisting of buckets with exponentially increasing sizes, and applies standard submodular optimization subroutines on the buckets in order to construct the robust solution. We numerically demonstrate the performance of PRo in data summarization and influence maximization, showing gains over both the greedy algorithm and the algorithm of Orlin et al. (2016).

SODA Conference 2016 Conference Paper

Phase Transitions in Group Testing

  • Jonathan Scarlett
  • Volkan Cevher

The group testing problem consists of determining a sparse subset of a set of items that are “defective” based on a set of possibly noisy tests, and arises in areas such as medical testing, fault detection, communication protocols, pattern matching, and database systems. We study the fundamental limits of any group testing procedure regardless of its computational complexity. In the noiseless case with the number of defective items $k$ scaling with the total number of items $p$ as $O(p^\theta)$ ($\theta \in (0, 1)$), we show that the probability of reconstruction error tends to one when, but vanishes when, for some explicit constant $c(\theta)$. For $\theta \le 1/3$, we show that $c(\theta) = 1$, thus providing an exact threshold on the required number of measurements, i.e., a phase transition, which was previously known only in the limit as $\theta \to 0$. Analogous necessary and sufficient conditions are derived for the noisy setting, and also for a relaxed partial recovery criterion.
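
As a small illustration of the noiseless group testing setup, the sketch below simulates test outcomes and applies a simple elimination rule: any item that appears in a negative test cannot be defective. This rule is a standard textbook step used here only as an example; it is not the optimal decoding whose limits the paper characterizes, and the item count, defective set, and pools are hypothetical.

```python
def run_tests(defective, pools):
    """Noiseless group testing: a test is positive iff its pool contains a defective item."""
    return [any(item in defective for item in pool) for pool in pools]

def cleared_items(pools, outcomes):
    """Items appearing in at least one negative test, hence certainly non-defective."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared.update(pool)
    return cleared

# Hypothetical instance: 8 items, of which items 2 and 5 are defective.
defective = {2, 5}
pools = [[0, 1, 2], [3, 4], [5, 6], [0, 7]]
outcomes = run_tests(defective, pools)
cleared = cleared_items(pools, outcomes)
print(outcomes, sorted(cleared))
```

Items that are never cleared (here 1, 2, 5, and 6) remain candidates, which shows why more tests are needed before the defective set is pinned down.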

NeurIPS Conference 2016 Conference Paper

Truncated Variance Reduction: A Unified Approach to Bayesian Optimization and Level-Set Estimation

  • Ilija Bogunovic
  • Jonathan Scarlett
  • Andreas Krause
  • Volkan Cevher

We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.