Author name cluster

Robert Nowak

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers

1 author row

NeurIPS Conference 2025 Conference Paper

Global Minimizers of $\ell^p$-Regularized Objectives Yield the Sparsest ReLU Neural Networks

Julia Nakhleh
Robert Nowak

Overparameterized neural networks can interpolate a given dataset in many different ways, prompting the fundamental question: which among these solutions should we prefer, and what explicit regularization strategies will provably yield these solutions? This paper addresses the challenge of finding the sparsest interpolating ReLU network—i. e. , the network with the fewest nonzero parameters or neurons—a goal with wide-ranging implications for efficiency, generalization, interpretability, theory, and model compression. Unlike post hoc pruning approaches, we propose a continuous, almost-everywhere differentiable training objective whose global minima are guaranteed to correspond to the sparsest single-hidden-layer ReLU networks that fit the data. This result marks a conceptual advance: it recasts the combinatorial problem of sparse interpolation as a smooth optimization task, potentially enabling the use of gradient-based training methods. Our objective is based on minimizing $\ell^p$ quasinorms of the weights for $0 < p < 1$, a classical sparsity-promoting strategy in finite-dimensional settings. However, applying these ideas to neural networks presents new challenges: the function class is infinite-dimensional, and the weights are learned using a highly nonconvex objective. We prove that, under our formulation, global minimizers correspond exactly to sparsest solutions. Our work lays a foundation for understanding when and how continuous sparsity-inducing objectives can be leveraged to recover sparse networks through training.

NeurIPS Conference 2024 Conference Paper

AHA: Human-Assisted Out-of-Distribution Generalization and Detection

Haoyue Bai
Jifan Zhang
Robert Nowak

Modern machine learning models deployed often encounter distribution shifts in real-world applications, manifesting as covariate or semantic out-of-distribution (OOD) shifts. These shifts give rise to challenges in OOD generalization and OOD detection. This paper introduces a novel, integrated approach AHA (Adaptive Human-Assisted OOD learning) to simultaneously address both OOD generalization and detection through a human-assisted framework by labeling data in the wild. Our approach strategically labels examples within a novel maximum disambiguation region, where the number of semantic and covariate OOD data roughly equalizes. By labeling within this region, we can maximally disambiguate the two types of OOD data, thereby maximizing the utility of the fixed labeling budget. Our algorithm first utilizes a noisy binary search algorithm that identifies the maximal disambiguation region with high probability. The algorithm then continues with annotating inside the identified labeling region, reaping the full benefit of human feedback. Extensive experiments validate the efficacy of our framework. We observed that with only a few hundred human annotations, our method significantly outperforms existing state-of-the-art methods that do not involve human assistance, in both OOD generalization and OOD detection.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning

Jifan Zhang
Lalit Jain
Yang Guo
Jiayi Chen
Kuan L. Zhou
Siddharth Suresh
Andrew Wagenmaker
Scott Sievert

We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human votes on more than 2. 2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over the past eight years. This unique dataset supports the development and evaluation of multimodal large language models and preference-based fine-tuning algorithms for humorous caption generation. We propose novel benchmarks for judging the quality of model-generated captions, utilizing both GPT4 and human judgments to establish ranking-based evaluation strategies. Our experimental results highlight the limitations of current fine-tuning methods, such as RLHF and DPO, when applied to creative tasks. Furthermore, we demonstrate that even state-of-the-art models like GPT4 and Claude currently underperform top human contestants in generating humorous captions. As we conclude this extensive data collection effort, we release the entire preference dataset to the research community, fostering further advancements in AI humor generation and evaluation.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Algorithm Selection for Deep Active Learning with Imbalanced Datasets

Jifan Zhang
Shuai Shao
Saurabh Verma
Robert Nowak

Label efficiency has become an increasingly important objective in deep learning applications. Active learning aims to reduce the number of labeled examples needed to train deep networks, but the empirical performance of active learning algorithms can vary dramatically across datasets and applications. It is difficult to know in advance which active learning strategy will perform well or best in a given application. To address this, we propose the first adaptive algorithm selection strategy for deep active learning. For any unlabeled dataset, our (meta) algorithm TAILOR (Thompson ActIve Learning algORithm selection) iteratively and adaptively chooses among a set of candidate active learning algorithms. TAILOR uses novel reward functions aimed at gathering class-balanced examples. Extensive experiments in multi-class and multi-label applications demonstrate TAILOR's effectiveness in achieving accuracy comparable or better than that of the best of the candidate algorithms. Our implementation of TAILOR is open-sourced at https: //github. com/jifanz/TAILOR.

JBHI Journal 2023 Journal Article

MSHGANMDA: Meta-Subgraphs Heterogeneous Graph Attention Network for miRNA-Disease Association Prediction

Shudong Wang
Fuyu Wang
Sibo Qiao
Yu Zhuang
Kuijie Zhang
Shanchen Pang
Robert Nowak
Zhihan Lv

MicroRNAs (miRNAs) influence several biological processes involved in human disease. Biological experiments for verifying the association between miRNA and disease are always costly in terms of both money and time. Although numerous biological experiments have identified multi-types of associations between miRNAs and diseases, existing computational methods are unable to sufficiently mine the knowledge in these associations to predict unknown associations. In this study, we innovatively propose a heterogeneous graph attention network model based on meta-subgraphs (MSHGANMDA) to predict the potential miRNA-disease associations. Firstly, we define five types of meta-subgraph from the known miRNA-disease associations. Then, we use meta-subgraph attention and meta-subgraph semantic attention to extract features of miRNA-disease pairs within and between these five meta-subgraphs, respectively. Finally, we apply a fully-connected layer (FCL) to predict the scores of unknown miRNA-disease associations and cross-entropy loss to train our model end-to-end. To evaluate the effectiveness of MSHGANMDA, we apply five-fold cross-validation to calculate the mean values of evaluation metrics Accuracy, Precision, Recall, and F1-score as 0. 8595, 0. 8601, 0. 8596, and 0. 8595, respectively. Experiments show that our model, which primarily utilizes multi-types of miRNA-disease association data, gets the greatest ROC-AUC value of 0. 934 when compared to other state-of-the-art approaches. Furthermore, through case studies, we further confirm the effectiveness of MSHGANMDA in predicting unknown diseases.

NeurIPS Conference 2023 Conference Paper

Multi-task Representation Learning for Pure Exploration in Bilinear Bandits

Subhojyoti Mukherjee
Qiaomin Xie
Josiah Hanna
Robert Nowak

We study multi-task representation learning for the problem of pure exploration in bilinear bandits. In bilinear bandits, an action takes theform of a pair of arms from two different entity types and the reward is a bilinear function of the known feature vectors of the arms. In the \textit{multi-task bilinear bandit problem}, we aim to find optimal actions for multiple tasks that share a common low-dimensional linear representation. The objective is to leverage this characteristic to expedite the process of identifying the best pair of arms for all tasks. We propose the algorithm GOBLIN that uses an experimental design approach to optimize sample allocations for learning the global representation as well as minimize the number of samples needed to identify the optimal pair of arms in individual tasks. To the best of our knowledge, this is the first study to give sample complexity analysis for pure exploration in bilinear bandits with shared representation. Our results demonstrate that by learning the shared representation across tasks, we achieve significantly improved sample complexity compared to the traditional approach of solving tasks independently.

JBHI Journal 2023 Journal Article

Wi-Breath: A WiFi-Based Contactless and Real-Time Respiration Monitoring Scheme for Remote Healthcare

Nan Bao
Jiajun Du
Chengyang Wu
Duo Hong
Junxin Chen
Robert Nowak
Zhihan Lv

Respiration rate is an important healthcare indicator, and it has become a popular research topic in remote healthcare applications with Internet of Things. Existing respiration monitoring systems have limitations in terms of convenience, comfort, and privacy, etc. This paper presents a contactless and real-time respiration monitoring system, the so-called Wi-Breath, based on off-the-shelf WiFi devices. The system monitors respiration with both the amplitude and phase difference of the WiFi channel state information (CSI), which is sensitive to human body micro movement. The phase information of the CSI signal is considered and both the amplitude and phase difference are used. For better respiration detection accuracy, a signal selection method is proposed to select an appropriate signal from the amplitude and phase difference based on a support vector machine (SVM) algorithm. Experimental results demonstrate that the Wi-Breath achieves an accuracy of 91. 2% for respiration detection, and has a 17. 0% reduction in average error in comparison with state-of-the-art counterparts.

NeurIPS Conference 2022 Conference Paper

Active Learning with Neural Networks: Insights from Nonparametric Statistics

Yinglun Zhu
Robert Nowak

Deep neural networks have great representation power, but typically require large numbers of training examples. This motivates deep active learning methods that can significantly reduce the amount of labeled training data. Empirical successes of deep active learning have been recently reported in the literature, however, rigorous label complexity guarantees of deep active learning have remained elusive. This constitutes a significant gap between theory and practice. This paper tackles this gap by providing the first near-optimal label complexity guarantees for deep active learning. The key insight is to study deep active learning from the nonparametric classification perspective. Under standard low noise conditions, we show that active learning with neural networks can provably achieve the minimax label complexity, up to disagreement coefficient and other logarithmic terms. When equipped with an abstention option, we further develop an efficient deep active learning algorithm that achieves $\mathsf{polylog}(\frac{1}{\varepsilon})$ label complexity, without any low noise assumptions. We also provide extensions of our results beyond the commonly studied Sobolev/H\"older spaces and develop label complexity guarantees for learning in Radon $\mathsf{BV}^2$ spaces, which have recently been proposed as natural function spaces associated with neural networks.

JBHI Journal 2022 Journal Article

DDCNN: A Deep Learning Model for AF Detection From a Single-Lead Short ECG Signal

Zhaocheng Yu
Junxin Chen
Yu Liu
Yongyong Chen
Tingting Wang
Robert Nowak
Zhihan Lv

With the popularity of the wireless body sensor network, real-time and continuous collection of single-lead electrocardiogram (ECG) data becomes possible in a convenient way. Data mining from the collected single-lead ECG waves has therefore aroused extensive attention worldwide, where early detection of atrial fibrillation (AF) is a hot research topic. In this paper, a two-channel convolutional neural network combined with a data augmentation method is proposed to detect AF from single-lead short ECG recordings. It consists of three modules, the first module denoises the raw ECG signals and produces 9-s ECG signals and heart rate (HR) values. Then, the ECG signals and HR rate values are fed into the convolutional layers for feature extraction, followed by three fully connected layers to perform the classification. The data augmentation method is used to generate synthetic signals to enlarge the training set and increase the diversity of the single-lead ECG signals. Validation experiments and the comparison with state-of-the-art studies demonstrate the effectiveness and advantages of the proposed method.

NeurIPS Conference 2022 Conference Paper

Efficient Active Learning with Abstention

Yinglun Zhu
Robert Nowak

The goal of active learning is to achieve the same accuracy achievable by passive learning, while using much fewer labels. Exponential savings in terms of label complexity have been proved in very special cases, but fundamental lower bounds show that such improvements are impossible in general. This suggests a need to explore alternative goals for active learning. Learning with abstention is one such alternative. In this setting, the active learning algorithm may abstain from prediction and incur an error that is marginally smaller than random guessing. We develop the first computationally efficient active learning algorithm with abstention. Our algorithm provably achieves $\mathsf{polylog}(\frac{1}{\varepsilon})$ label complexity, without any low noise conditions. Such performance guarantee reduces the label complexity by an exponential factor, relative to passive learning and active learning that is not allowed to abstain. Furthermore, our algorithm is guaranteed to only abstain on hard examples (where the true label distribution is close to a fair coin), a novel property we term \emph{proper abstention} that also leads to a host of other desirable characteristics (e. g. , recovering minimax guarantees in the standard setting, and avoiding the undesirable ``noise-seeking'' behavior often seen in active learning). We also provide novel extensions of our algorithm that achieve \emph{constant} label complexity and deal with model misspecification.

NeurIPS Conference 2022 Conference Paper

One for All: Simultaneous Metric and Preference Learning over Multiple Users

Gregory Canal
Blake Mason
Ramya Korlakai Vinayak
Robert Nowak

This paper investigates simultaneous preference and metric learning from a crowd of respondents. A set of items represented by $d$-dimensional feature vectors and paired comparisons of the form ``item $i$ is preferable to item $j$'' made by each user is given. Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities along with a latent ideal point for each user reflecting their individual preferences. This model has the flexibility to capture individual preferences, while enjoying a metric learning sample cost that is amortized over the crowd. We first study this problem in a noiseless, continuous response setting (i. e. , responses equal to differences of item distances) to understand the fundamental limits of learning. Next, we establish prediction error guarantees for noisy, binary measurements such as may be collected from human respondents, and show how the sample complexity improves when the underlying metric is low-rank. Finally, we establish recovery guarantees under assumptions on the response distribution. We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements across a large number of users.

AAAI Conference 2022 Conference Paper

Similarity Search for Efficient Active Learning and Search of Rare Concepts

Cody Coleman
Edward Chou
Julian Katz-Samuels
Sean Culatana
Peter Bailis
Alexander C. Berg
Robert Nowak
Roshan Sumbaly

Many active learning and search approaches are intractable for large-scale industrial settings with billions of unlabeled examples. Existing approaches search globally for the optimal examples to label, scaling linearly or even quadratically with the unlabeled data. In this paper, we improve the computational efficiency of active learning and search methods by restricting the candidate pool for labeling to the nearest neighbors of the currently labeled set instead of scanning over all of the unlabeled data. We evaluate several selection strategies in this setting on three large-scale computer vision datasets: ImageNet, OpenImages, and a de-identified and aggregated dataset of 10 billion publicly shared images provided by a large internet company. Our approach achieved similar mAP and recall as the traditional global approach while reducing the computational cost of selection by up to three orders of magnitude, enabling web-scale active learning.

NeurIPS Conference 2021 Conference Paper

Pure Exploration in Kernel and Neural Bandits

Yinglun Zhu
Dongruo Zhou
Ruoxi Jiang
Quanquan Gu
Rebecca Willett
Robert Nowak

We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms. To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecifications. Our approach is conceptually very different from existing works that can either only handle low-dimensional linear bandits or passively deal with model misspecifications. We showcase the application of our approach to two pure exploration settings that were previously under-studied: (1) the reward function belongs to a possibly infinite-dimensional Reproducing Kernel Hilbert Space, and (2) the reward function is nonlinear and can be approximated by neural networks. Our main results provide sample complexity guarantees that only depend on the effective dimension of the feature spaces in the kernel or neural representations. Extensive experiments conducted on both synthetic and real-world datasets demonstrate the efficacy of our methods.

NeurIPS Conference 2020 Conference Paper

Finding All $\epsilon$-Good Arms in Stochastic Bandits

Blake Mason
Lalit Jain
Ardhendu Tripathy
Robert Nowak

The pure-exploration problem in stochastic multi-armed bandits aims to find one or more arms with the largest (or near largest) means. Examples include finding an $\epsilon$-good arm, best-arm identification, top-$k$ arm identification, and finding all arms with means above a specified threshold. However, the problem of finding \emph{all} $\epsilon$-good arms has been overlooked in past work, although arguably this may be the most natural objective in many applications. For example, a virologist may conduct preliminary laboratory experiments on a large candidate set of treatments and move all $\epsilon$-good treatments into more expensive clinical trials. Since the ultimate clinical efficacy is uncertain, it is important to identify all $\epsilon$-good candidates. Mathematically, the all-$\epsilon$-good arm identification problem is presents significant new challenges and surprises that do not arise in the pure-exploration objectives studied in the past. We introduce two algorithms to overcome these and demonstrate their great empirical performance on a large-scale crowd-sourced dataset of $2. 2$M ratings collected by the New Yorker Caption Contest as well as a dataset testing hundreds of possible cancer drugs.

AAAI Conference 2020 Conference Paper

Linear Bandits with Feature Feedback

Urvashi Oswal
Aniruddha Bhargava
Robert Nowak

This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards, the latter feedback being the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback which can achieve regret over time horizon T that scales like k √ T, without prior knowledge of which features are relevant nor the number k of relevant features. In comparison, the regret of traditional linear bandits is d √ T, where d is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if k d. The computational complexity of the algorithm is proportional to k rather than d, making it much more suitable for real-world applications compared to traditional linear bandits. We demonstrate the performance of the algorithm with synthetic and real human-labeled data.

NeurIPS Conference 2020 Conference Paper

On Regret with Multiple Best Arms

Yinglun Zhu
Robert Nowak

We study a regret minimization problem with the existence of multiple best/near-optimal arms in the multi-armed bandit setting. We consider the case when the number of arms/actions is comparable or much larger than the time horizon, and make no assumptions about the structure of the bandit instance. Our goal is to design algorithms that can automatically adapt to the unknown hardness of the problem, i. e. , the number of best arms. Our setting captures many modern applications of bandit algorithms where the action space is enormous and the information about the underlying instance/structure is unavailable. We first propose an adaptive algorithm that is agnostic to the hardness level and theoretically derive its regret bound. We then prove a lower bound for our problem setting, which indicates: (1) no algorithm can be minimax optimal simultaneously over all hardness levels; and (2) our algorithm achieves a rate function that is Pareto optimal. With additional knowledge of the expected reward of the best arm, we propose another adaptive algorithm that is minimax optimal, up to polylog factors, over all hardness levels. Experimental results confirm our theoretical guarantees and show advantages of our algorithms over the previous state-of-the-art.

NeurIPS Conference 2019 Conference Paper

Learning Nearest Neighbor Graphs from Noisy Distance Samples

Blake Mason
Ardhendu Tripathy
Robert Nowak

We consider the problem of learning the nearest neighbor graph of a dataset of n items. The metric is unknown, but we can query an oracle to obtain a noisy estimate of the distance between any pair of items. This framework applies to problem domains where one wants to learn people's preferences from responses commonly modeled as noisy distance judgments. In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity. In contrast to existing work that forces Euclidean structure, our method is valid for general metrics, assuming only symmetry and the triangle inequality. Furthermore, we demonstrate efficiency of our method empirically and theoretically, needing only O(n\log(n)\Delta^{-2}) queries in favorable settings, where \Delta^{-2} accounts for the effect of noise. Using crowd-sourced data collected for a subset of the UT~Zappos50K dataset, we apply our algorithm to learn which shoes people believe are most similar and show that it beats both an active baseline and ordinal embedding.

NeurIPS Conference 2019 Conference Paper

MaxGap Bandit: Adaptive Algorithms for Approximate Ranking

Sumeet Katariya
Ardhendu Tripathy
Robert Nowak

This paper studies the problem of adaptively sampling from K distributions (arms) in order to identify the largest gap between any two adjacent means. We call this the MaxGap-bandit problem. This problem arises naturally in approximate ranking, noisy sorting, outlier detection, and top-arm identification in bandits. The key novelty of the MaxGap bandit problem is that it aims to adaptively determine the natural partitioning of the distributions into a subset with larger means and a subset with smaller means, where the split is determined by the largest gap rather than a pre-specified rank or threshold. Estimating an arm’s gap requires sampling its neighboring arms in addition to itself, and this dependence results in a novel hardness parameter that characterizes the sample complexity of the problem. We propose elimination and UCB-style algorithms and show that they are minimax optimal. Our experiments show that the UCB-style algorithms require 6-8x fewer samples than non-adaptive sampling to achieve the same error.

NeurIPS Conference 2017 Conference Paper

A KL-LUCB algorithm for Large-Scale Crowdsourcing

Ervin Tanczos
Robert Nowak
Bob Mankoff

This paper focuses on best-arm identification in multi-armed bandits with bounded rewards. We develop an algorithm that is a fusion of lil-UCB and KL-LUCB, offering the best qualities of the two algorithms in one method. This is achieved by proving a novel anytime confidence bound for the mean of bounded distributions, which is the analogue of the LIL-type bounds recently developed for sub-Gaussian distributions. We corroborate our theoretical results with numerical experiments based on the New Yorker Cartoon Caption Contest.

NeurIPS Conference 2017 Conference Paper

Learning Low-Dimensional Metrics

Blake Mason
Lalit Jain
Robert Nowak

This paper investigates the theoretical foundations of metric learning, focused on three key questions that are not fully addressed in prior work: 1) we consider learning general low-dimensional (low-rank) metrics as well as sparse metrics; 2) we develop upper and lower (minimax) bounds on the generalization error; 3)we quantify the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric; 4) we also bound the accuracy of the learned metric relative to the underlying true generative metric. All the results involve novel mathematical approaches to the metric learning problem, and also shed new light on the special case of ordinal embedding (aka non-metric multidimensional scaling).

AAAI Conference 2017 Conference Paper

On Learning High Dimensional Structured Single Index Models

Ravi Ganti
Nikhil Rao
Laura Balzano
Rebecca Willett
Robert Nowak

Single Index Models (SIMs) are simple yet ﬂexible semiparametric models for machine learning, where the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low dimensional regime, a method that can efﬁciently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efﬁcient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models, and achieves results comparable to or better than single layer feedforward neural networks with signiﬁcantly less computational cost.

NeurIPS Conference 2017 Conference Paper

Scalable Generalized Linear Bandits: Online Computation and Hashing

Kwang-Sung Jun
Aniruddha Bhargava
Robert Nowak
Rebecca Willett

Generalized Linear Bandits (GLBs), a natural extension of the stochastic linear bandits, has been popular and successful in recent years. However, existing GLBs scale poorly with the number of rounds and the number of arms, limiting their utility in practice. This paper proposes new, scalable solutions to the GLB problem in two respects. First, unlike existing GLBs, whose per-time-step space and time complexity grow at least linearly with time $t$, we propose a new algorithm that performs online computations to enjoy a constant space and time complexity. At its heart is a novel Generalized Linear extension of the Online-to-confidence-set Conversion (GLOC method) that takes \emph{any} online learning algorithm and turns it into a GLB algorithm. As a special case, we apply GLOC to the online Newton step algorithm, which results in a low-regret GLB algorithm with much lower time and memory complexity than prior work. Second, for the case where the number $N$ of arms is very large, we propose new algorithms in which each next arm is selected via an inner product search. Such methods can be implemented via hashing algorithms (i. e. , ``hash-amenable'') and result in a time complexity sublinear in $N$. While a Thompson sampling extension of GLOC is hash-amenable, its regret bound for $d$-dimensional arm sets scales with $d^{3/2}$, whereas GLOC's regret bound scales with $d$. Towards closing this gap, we propose a new hash-amenable algorithm whose regret bound scales with $d^{5/4}$. Finally, we propose a fast approximate hash-key computation (inner product) with a better accuracy than the state-of-the-art, which can be of independent interest. We conclude the paper with preliminary experimental results confirming the merits of our methods.

IJCAI Conference 2013 Conference Paper

Socioscope: Spatio-Temporal Signal Recovery from Social Media (Extended Abstract)

Jun-Ming Xu
Aniruddha Bhargava
Robert Nowak
Xiaojin Zhu

Counting the number of social media posts on a target phenomenon has become a popular method to monitor a spatiotemporal signal. However, such counting is plagued by biased, missing, or scarce data. We address these issues by formulating signal recovery as a Poisson point process estimation problem. We explicitly incorporate human population bias, time delays and spatial distortions, and spatiotemporal regularization into the model to address the data quality issues. Our model produces qualitatively convincing results in a case study on wildlife roadkill monitoring.

PDF Details DOI

NeurIPS Conference 2012 Conference Paper

Query Complexity of Derivative-Free Optimization

Kevin Jamieson
Robert Nowak
Ben Recht

Derivative Free Optimization (DFO) is attractive when the objective function's derivatives are not available and evaluations are costly. Moreover, if the function evaluations are noisy, then approximating gradients by finite differences is difficult. This paper gives quantitative lower bounds on the performance of DFO with noisy function evaluations, exposing a fundamental and unavoidable gap between optimization performance based on noisy evaluations versus noisy gradients. This challenges the conventional wisdom that the method of finite differences is comparable to a stochastic gradient. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it only uses Boolean-valued function comparisons, rather than evaluations. This makes the algorithm useful in an even wider range of applications, including optimization based on paired comparisons from human subjects, for example. Remarkably, we show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.

NeurIPS Conference 2011 Conference Paper

Active Ranking using Pairwise Comparisons

Kevin Jamieson
Robert Nowak

This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of $n$ objects can be identified by standard sorting methods using $n\log_2 n$ pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. {Specifically, we assume that the objects can be embedded into a $d$-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in $\R^d$. We show that under this assumption the number of possible rankings grows like $n^{2d}$ and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than $d\log n$ adaptively selected pairwise comparisons, on average. } If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis.

NeurIPS Conference 2010 Conference Paper

Transduction with Matrix Completion: Three Birds with One Stone

Andrew Goldberg
Ben Recht
Junming Xu
Robert Nowak
Jerry Zhu

We pose transductive classification as a matrix completion problem. By assuming the underlying matrix has a low rank, our formulation is able to handle three problems simultaneously: i) multi-label learning, where each item has more than one label, ii) transduction, where most of these labels are unspecified, and iii) missing data, where a large number of features are missing. We obtained satisfactory results on several real-world tasks, suggesting that the low rank assumption may not be as restrictive as it seems. Our method allows for different loss functions to apply on the feature and label entries of the matrix. The resulting nuclear norm minimization problem is solved with a modified fixed-point continuation method that is guaranteed to find the global optimum.

NeurIPS Conference 2009 Conference Paper

Noisy Generalized Binary Search

Robert Nowak

This paper addresses the problem of noisy Generalized Binary Search (GBS). GBS is a well-known greedy algorithm for determining a binary-valued hypothesis through a sequence of strategically selected queries. At each step, a query is selected that most evenly splits the hypotheses under consideration into two disjoint subsets, a natural generalization of the idea underlying classic binary search. GBS is used in many applications, including fault testing, machine diagnostics, disease diagnosis, job scheduling, image processing, computer vision, and active learning. In most of these cases, the responses to queries can be noisy. Past work has provided a partial characterization of GBS, but existing noise-tolerant versions of GBS are suboptimal in terms of sample complexity. This paper presents the first optimal algorithm for noisy GBS and demonstrates its application to learning multidimensional threshold functions.

YNIMG Journal 2009 Journal Article

Space–time event sparse penalization for magneto-/electroencephalography

Andrew Bolstad
Barry Van Veen
Robert Nowak

This article presents a new spatio-temporal method for M/EEG source reconstruction based on the assumption that only a small number of events, localized in space and/or time, are responsible for the measured signal. Each space–time event is represented using a basis function expansion which reflects the most relevant (or measurable) features of the signal. This model of neural activity leads naturally to a Bayesian likelihood function which balances the model fit to the data with the complexity of the model, where the complexity is related to the number of included events. A novel Expectation-Maximization algorithm which maximizes the likelihood function is presented. The new method is shown to be effective on several MEG simulations of neurological activity as well as data from a self-paced finger tapping experiment.

NeurIPS Conference 2008 Conference Paper

Human Active Learning

Rui Castro
Charles Kalish
Robert Nowak
Ruichen Qian
Tim Rogers
Jerry Zhu

We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings.

NeurIPS Conference 2008 Conference Paper

Unlabeled data: Now it helps, now it doesn't

Aarti Singh
Robert Nowak
Jerry Zhu

Empirical evidence shows that in favorable situations semi-supervised learning (SSL) algorithms can capitalize on the abundancy of unlabeled training data to improve the performance of a learning task, in the sense that fewer labeled training data are needed to achieve a target error bound. However, in other situations unlabeled data do not seem to help. Recent attempts at theoretically characterizing the situations in which unlabeled data can help have met with little success, and sometimes appear to conflict with each other and intuition. In this paper, we attempt to bridge the gap between practice and theory of semi-supervised learning. We develop a rigorous framework for analyzing the situations in which unlabeled data can help and quantify the improvement possible using finite sample error bounds. We show that there are large classes of problems for which SSL can significantly outperform supervised learning, in finite sample regimes and sometimes also in terms of error convergence rates.

NeurIPS Conference 2006 Conference Paper

Inferring Network Structure from Co-Occurrences

Michael Rabbat
Mário Figueiredo
Robert Nowak

We consider the problem of inferring the structure of a network from cooccurrence data: observations that indicate which nodes occur in a signaling pathway but do not directly reveal node order within the pathway. This problem is motivated by network inference problems arising in computational biology and communication systems, in which it is difficult or impossible to obtain precise time ordering information. Without order information, every permutation of the activated nodes leads to a different feasible solution, resulting in combinatorial explosion of the feasible set. However, physical principles underlying most networked systems suggest that not all feasible solutions are equally likely. Intuitively, nodes that co-occur more frequently are probably more closely connected. Building on this intuition, we model path co-occurrences as randomly shuffled samples of a random walk on the network. We derive a computationally efficient network inference algorithm and, via novel concentration inequalities for importance sampling estimators, prove that a polynomial complexity Monte Carlo version of the algorithm converges with high probability.

NeurIPS Conference 2005 Conference Paper

Faster Rates in Regression via Active Learning

Rebecca Willett
Robert Nowak
Rui Castro

This paper presents a rigorous statistical analysis characterizing regimes in which active learning significantly outperforms classical passive learning. Active learning algorithms are able to make queries or select sample locations in an online fashion, depending on the results of the previous queries. In some regimes, this extra flexibility leads to significantly faster rates of error decay than those possible in classical passive learning settings. The nature of these regimes is explored by studying fundamental performance limits of active and passive learning in two illustrative nonparametric function classes. In addition to examining the theoretical potential of active learning, this paper describes a practical algorithm capable of exploiting the extra flexibility of the active setting and provably improving upon the classical passive techniques. Our active learning theory and methods show promise in a number of applications, including field estimation using wireless sensor networks and fault line detection.

NeurIPS Conference 2005 Conference Paper

Learning Minimum Volume Sets

Clayton Scott
Robert Nowak

Given a probability measure P and a reference measure µ, one is often interested in the minimum µ-measure set with P -measure at least α. Minimum volume sets of this type summarize the regions of greatest probability mass of P, and are useful for detecting anoma- lies and constructing conﬁdence regions. This paper addresses the problem of estimating minimum volume sets based on independent samples distributed according to P. Other than these samples, no other information is available regarding P, but the reference mea- sure µ is assumed to be known. We introduce rules for estimating minimum volume sets that parallel the empirical risk minimization and structural risk minimization principles in classiﬁcation. As in classiﬁcation, we show that the performances of our estimators are controlled by the rate of uniform convergence of empirical to true probabilities over the class from which the estimator is drawn. Thus we obtain ﬁnite sample size performance bounds in terms of VC dimension and related quantities. We also demonstrate strong universal consistency and an oracle inequality. Estimators based on histograms and dyadic partitions illustrate the proposed rules.

NeurIPS Conference 2004 Conference Paper

On the Adaptive Properties of Decision Trees

Clayton Scott
Robert Nowak

Decision trees are surprisingly adaptive in three important respects: They automatically (1) adapt to favorable conditions near the Bayes decision boundary; (2) focus on data distributed on lower dimensional manifolds; (3) reject irrelevant features. In this paper we examine a decision tree based on dyadic splits that adapts to each of these conditions to achieve minimax optimal rates of convergence. The proposed classifier is the first known to achieve these optimal rates while being practical and im- plementable.

NeurIPS Conference 2003 Conference Paper

Near-Minimax Optimal Classification with Dyadic Classification Trees

Clayton Scott
Robert Nowak

This paper reports on a family of computationally practical classiﬁers that converge to the Bayes error at near-minimax optimal rates for a va- riety of distributions. The classiﬁers are based on dyadic classiﬁcation trees (DCTs), which involve adaptively pruned partitions of the feature space. A key aspect of DCTs is their spatial adaptivity, which enables lo- cal (rather than global) ﬁtting of the decision boundary. Our risk analysis involves a spatial decomposition of the usual concentration inequalities, leading to a spatially adaptive, data-dependent pruning criterion. For any distribution on (X, Y ) whose Bayes decision boundary behaves locally like a Lipschitz smooth function, we show that the DCT error converges to the Bayes error at a rate within a logarithmic factor of the minimax optimal rate. We also study DCTs equipped with polynomial classiﬁca- tion rules at each leaf, and show that as the smoothness of the boundary increases their errors converge to the Bayes error at a rate approaching n−1/2, the parametric rate. We are not aware of any other practical classi- ﬁers that provide similar rate of convergence guarantees. Fast algorithms for tree pruning are discussed.

NeurIPS Conference 2002 Conference Paper

Dyadic Classification Trees via Structural Risk Minimization

Clayton Scott
Robert Nowak

Classiﬁcation trees are one of the most popular types of classiﬁers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classiﬁcation trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classiﬁcation trees, called dyadic classiﬁcation trees (DCTs), are near optimal (in a minimax sense) for a very broad range of clas- siﬁcation problems. This demonstrates that other schemes (e. g. , neural networks, support vector machines) cannot perform signiﬁcantly better than DCTs in many cases. We also show that this near optimal perfor- mance is attained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from the- oretical results on structural risk minimization, on which the pruning rule for DCTs is based.