Arrow Research search

Author name cluster

Barnabás Póczos

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

45 papers
2 author rows

Possible papers (45)

ICLR Conference 2025 Conference Paper

Chemistry-Inspired Diffusion with Non-Differentiable Guidance

  • Yuchen Shen
  • Chenhao Zhang
  • Sijie Fu
  • Chenghui Zhou
  • Newell Washburn
  • Barnabás Póczos

Recent advances in diffusion models have shown remarkable potential in the conditional generation of novel molecules. These models can be guided in two ways: (i) explicitly, through additional features representing the condition, or (ii) implicitly, using a property predictor. However, training property predictors or conditional diffusion models requires an abundance of labeled data and is inherently challenging in real-world applications. We propose a novel approach that alleviates the limitations of acquiring large labeled datasets by leveraging domain knowledge from quantum chemistry as a non-differentiable oracle to guide an unconditional diffusion model. Instead of relying on neural networks, the oracle provides accurate guidance in the form of estimated gradients, allowing the diffusion process to sample from a conditional distribution specified by quantum chemistry. We show that this results in more precise conditional generation of novel and stable molecular structures. Our experiments demonstrate that our method: (1) significantly reduces atomic forces, enhancing the validity of generated molecules when used for stability optimization; (2) is compatible with both explicit and implicit guidance in diffusion models, enabling joint optimization of molecular properties and stability; and (3) generalizes effectively to molecular optimization tasks beyond stability optimization. Our implementation is available at https://github.com/A-Chicharito-S/ChemGuide.
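
A minimal sketch of the core mechanism: at each reverse-diffusion step, a gradient of the non-differentiable oracle is estimated numerically and added to the unconditional score. The toy oracle, the Langevin-style update, and all step sizes below are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def oracle(x):
    # Stand-in for a non-differentiable quantum-chemistry oracle
    # (e.g., an external force-field call); here a toy property score.
    return -np.sum((x - 1.0) ** 2)

def estimated_gradient(x, eps=1e-3):
    # Central finite differences: the oracle need not be differentiable
    # in code, only smooth enough for a numerical gradient to be useful.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        g.flat[i] = (oracle(x + d) - oracle(x - d)) / (2 * eps)
    return g

def guided_reverse_step(x_t, t, score_fn, guidance_scale=0.1):
    # Unconditional score plus oracle-estimated gradient: samples drift
    # toward the conditional distribution implied by the oracle.
    score = score_fn(x_t, t) + guidance_scale * estimated_gradient(x_t)
    noise = np.random.randn(*x_t.shape)
    step = 0.01
    return x_t + step * score + np.sqrt(2 * step) * noise

# Toy unconditional score (standard-normal prior) for demonstration.
x = np.random.randn(4)
for t in reversed(range(50)):
    x = guided_reverse_step(x, t, score_fn=lambda x, t: -x)
print(x)  # pulled toward the oracle's optimum around 1.0
```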

ICLR Conference 2025 Conference Paper

Greener GRASS: Enhancing GNNs with Encoding, Rewiring, and Attention

  • Tongzhou Liao
  • Barnabás Póczos

Graph Neural Networks (GNNs) have become important tools for machine learning on graph-structured data. In this paper, we explore the synergistic combination of graph encoding, graph rewiring, and graph attention, by introducing Graph Attention with Stochastic Structures (GRASS), a novel GNN architecture. GRASS utilizes relative random walk probabilities (RRWP) encoding and a novel decomposed variant (D-RRWP) to efficiently capture structural information. It rewires the input graph by superimposing a random regular graph to enhance long-range information propagation. It also employs a novel additive attention mechanism tailored for graph-structured data. Our empirical evaluations demonstrate that GRASS achieves state-of-the-art performance on multiple benchmark datasets, including a 20.3% reduction in mean absolute error on the ZINC dataset.
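
The rewiring step can be illustrated in a few lines with networkx. This is a sketch of the idea only; the function name, the degree hyperparameter, and the handling of edge attributes are assumptions:

```python
import networkx as nx

def rewire_with_random_regular(g: nx.Graph, degree: int = 3, seed: int = 0) -> nx.Graph:
    """Superimpose a random regular graph on the input graph, as in the
    rewiring step described above (hyperparameter names are assumptions)."""
    n = g.number_of_nodes()
    rr = nx.random_regular_graph(degree, n, seed=seed)
    # Relabel the regular graph onto the input's node ids, then take the union.
    mapping = dict(zip(range(n), g.nodes()))
    rr = nx.relabel_nodes(rr, mapping)
    out = g.copy()
    out.add_edges_from(rr.edges())  # added edges could carry a marker attribute
    return out

g = nx.path_graph(10)          # long-range information travels slowly on a path
g2 = rewire_with_random_regular(g, degree=3)
print(g.number_of_edges(), g2.number_of_edges())
print(nx.diameter(g), nx.diameter(g2))  # diameter typically shrinks sharply
```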

UAI Conference 2021 Conference Paper

Unsupervised program synthesis for images by sampling without replacement

  • Chenghui Zhou
  • Chun-Liang Li
  • Barnabás Póczos

Program synthesis has emerged as a successful approach to the image parsing task. Most prior works rely on a two-step scheme involving supervised pretraining of a Seq2Seq model with synthetic programs followed by reinforcement learning (RL) for fine-tuning with real reference images. Fully unsupervised approaches promise to train the model directly on the target images without requiring curated pretraining datasets. However, they struggle with the inherent sparsity of meaningful programs in the search space. In this paper, we present the first unsupervised algorithm capable of parsing constructive solid geometry (CSG) images into context-free grammar (CFG) without pretraining. To tackle the non-Markovian sparse reward problem, we combine three key ingredients: (i) a grammar-encoded tree LSTM ensuring program validity, (ii) entropy regularization, and (iii) sampling without replacement from the CFG syntax tree. Empirically, our algorithm recovers meaningful programs in large search spaces (up to $3.8 \times 10^{28}$). Further, even though our approach is fully unsupervised, it generalizes better than supervised methods on the synthetic 2D CSG dataset. On the 2D computer aided design (CAD) dataset, our approach significantly outperforms the supervised pretrained model and is competitive with the refined model.

AAAI Conference 2020 Conference Paper

Contextual Parameter Generation for Knowledge Graph Link Prediction

  • George Stoica
  • Otilia Stretcu
  • Emmanouil Antonios Platanios
  • Tom Mitchell
  • Barnabás Póczos

We consider the task of knowledge graph link prediction. Given a question consisting of a source entity and a relation (e.g., Shakespeare and BornIn), the objective is to predict the most likely answer entity (e.g., England). Recent approaches tackle this problem by learning entity and relation embeddings. However, they often constrain the relationship between these embeddings to be additive (i.e., the embeddings are concatenated and then processed by a sequence of linear functions and element-wise non-linearities). We show that this type of interaction significantly limits representational power. For example, such models cannot handle cases where a different projection of the source entity is used for each relation. We propose to use contextual parameter generation to address this limitation. More specifically, we treat relations as the context in which source entities are processed to produce predictions, by using relation embeddings to generate the parameters of a model operating over source entity embeddings. This allows models to represent more complex interactions between entities and relations. We apply our method to two existing link prediction methods, including the current state-of-the-art, resulting in significant performance gains and establishing a new state-of-the-art for this task. These gains are achieved while also reducing convergence time by up to 28 times.
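
The mechanism is easy to state in code: the relation embedding is fed through a generator that emits the parameters of the model applied to the source entity. The sketch below uses a single generated linear layer; all sizes and names are illustrative, and the paper applies the idea to existing link predictors rather than this toy model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, dim = 100, 7, 16

entity_emb = rng.normal(size=(n_entities, dim))
relation_emb = rng.normal(size=(n_relations, dim))

# Parameter generator: maps a relation embedding to the weights of the
# model applied to the source entity (a single linear layer here).
W_gen = rng.normal(size=(dim, dim * dim)) / np.sqrt(dim)

def predict_scores(source_id: int, relation_id: int) -> np.ndarray:
    # Relation-specific projection, generated from the relation embedding:
    W_rel = (relation_emb[relation_id] @ W_gen).reshape(dim, dim)
    query = entity_emb[source_id] @ W_rel   # relation-conditioned entity
    return query @ entity_emb.T             # score every candidate answer

scores = predict_scores(source_id=3, relation_id=2)
print(int(scores.argmax()))  # index of the most likely answer entity
```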

ICLR Conference 2020 Conference Paper

Minimizing FLOPs to Learn Efficient Sparse Representations

  • Biswajit Paria
  • Chih-Kuan Yeh
  • Ian En-Hsu Yen
  • Ning Xu
  • Pradeep Ravikumar
  • Barnabás Póczos

Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation, and identification. Retrieval of such representations from a large database is, however, computationally challenging. Approximate methods based on learning compact representations, such as locality-sensitive hashing, product quantization, and PCA, have been widely explored for this problem. In this work, in contrast to learning compact representations, we propose to learn high-dimensional and sparse representations that have representational capacity similar to dense embeddings while being more efficient, since sparse matrix multiplication can be much faster than dense multiplication. Following the key insight that the number of operations decreases quadratically with the sparsity of embeddings provided the non-zero entries are distributed uniformly across dimensions, we propose a novel approach to learn such distributed sparse embeddings via a carefully constructed regularization function that directly minimizes a continuous relaxation of the number of floating-point operations (FLOPs) incurred during retrieval. Our experiments show that our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.
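
A sketch of the regularizer described above, assuming (as the abstract suggests) that expected retrieval cost scales with the sum of squared per-dimension activation probabilities, approximated here by batch-mean absolute activations:

```python
import numpy as np

def flops_regularizer(activations: np.ndarray) -> float:
    """Continuous relaxation of retrieval FLOPs for a batch of embeddings.

    Expected cost of a sparse dot product scales with sum_j p_j^2, where
    p_j is the probability that dimension j is non-zero; |a_ij| averaged
    over the batch acts as its continuous surrogate.
    """
    mean_abs = np.abs(activations).mean(axis=0)   # one value per dimension
    return float(np.sum(mean_abs ** 2))

batch = np.random.randn(256, 64)
uniform_sparse = batch * (np.random.rand(256, 64) < 0.125)  # spread-out zeros
clustered = batch.copy()
clustered[:, 8:] = 0.0            # same overall density, but few active dims

# Same expected number of non-zeros, but the clustered embedding costs more
# at retrieval time, and the regularizer reflects that.
print(flops_regularizer(uniform_sparse), flops_regularizer(clustered))
```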

ICML Conference 2020 Conference Paper

VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing

  • Zoltán Ádám Milacski
  • Barnabás Póczos
  • András Lőrincz

Deep Neural Networks (DNNs) achieve state-of-the-art results on a wide range of image processing tasks; however, like most AI algorithms, the majority of such solutions are problem-specific. The One Network to Solve Them All (OneNet) procedure has been suggested to resolve this issue by exploiting a DNN as the proximal operator in Alternating Direction Method of Multipliers (ADMM) solvers for various imaging problems. In this work, we make two contributions, both facilitating end-to-end learning using backpropagation. First, we generalize OneNet to videos by augmenting its convolutional prior network with bidirectional recurrent connections; second, we extend the fixed fully connected linear ADMM data step with another trainable bidirectional convolutional recurrent network. In our computational experiments on the Rotated MNIST, Scanned CIFAR-10 and UCF-101 data sets, the proposed modifications improve performance by a large margin compared to end-to-end convolutional OneNet and 3D wavelet sparsity on several video processing problems: pixelwise inpainting-denoising, blockwise inpainting, scattered inpainting, super-resolution, compressive sensing, deblurring, frame interpolation, frame prediction and colorization. Our two contributions are complementary, and using them together yields the best results.

UAI Conference 2019 Conference Paper

A Flexible Framework for Multi-Objective Bayesian Optimization using Random Scalarizations

  • Biswajit Paria
  • Kirthevasan Kandasamy
  • Barnabás Póczos

Many real world applications can be framed as multi-objective optimization (MOO) problems, where we wish to simultaneously optimize for multiple criteria. Bayesian optimization techniques for the multi-objective setting are pertinent when the evaluation of the functions in question is expensive. Traditional methods for multi-objective optimization, both Bayesian and otherwise, are aimed at recovering the Pareto front of these objectives. However, in certain cases a practitioner might desire to identify Pareto optimal points only in a subset of the Pareto front due to external considerations. In this work, we propose a strategy based on random scalarizations of the objectives that addresses this problem. Our approach is able to flexibly sample from desired regions of the Pareto front and, computationally, is considerably cheaper than most approaches for MOO. We also study a notion of regret in the multi-objective setting and show that our strategy achieves sublinear regret. We experiment with both synthetic and real-life problems, and demonstrate superior performance of our proposed algorithm in terms of flexibility and regret.
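
A minimal sketch of the strategy: sample a weight vector from a user-chosen prior, scalarize the objectives (weighted Tchebyshev is one standard choice), and optimize the scalarized function. The inner Bayesian-optimization step is replaced here by random search, so this illustrates the scalarization logic only:

```python
import numpy as np

rng = np.random.default_rng(1)

def objectives(x):
    # Two toy objectives to be minimized jointly (a simple Pareto trade-off).
    return np.array([np.sum(x ** 2), np.sum((x - 1.0) ** 2)])

def tchebyshev(y, weights):
    # Weighted Tchebyshev scalarization: its minimizers sweep the Pareto
    # front as the weights vary, which is what random sampling exploits.
    return np.max(weights * y)

pareto_candidates = []
for _ in range(20):
    # Sampling weights from a chosen prior steers which part of the
    # Pareto front gets explored; a uniform Dirichlet covers all of it.
    w = rng.dirichlet(alpha=np.ones(2))
    # Stand-in for the inner Bayesian-optimization step: here we simply
    # minimize the scalarized objective over random candidates.
    xs = rng.uniform(-1, 2, size=(2000, 3))
    scores = [tchebyshev(objectives(x), w) for x in xs]
    pareto_candidates.append(xs[int(np.argmin(scores))])

print(np.round(pareto_candidates[:3], 2))  # points spread along the front
```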

AAAI Conference 2019 Conference Paper

Found in Translation: Learning Robust Joint Representations by Cyclic Translations between Modalities

  • Hai Pham
  • Paul Pu Liang
  • Thomas Manzini
  • Louis-Philippe Morency
  • Barnabás Póczos

Multimodal sentiment analysis is a core research area that studies speaker sentiment expressed from the language, visual, and acoustic modalities. The central challenge in multimodal learning involves inferring joint representations that can process and relate information from these modalities. However, existing work learns joint representations by requiring all modalities as input and as a result, the learned representations may be sensitive to noisy or missing modalities at test time. With the recent success of sequence to sequence (Seq2Seq) models in machine translation, there is an opportunity to explore new ways of learning joint representations that may not require all input modalities at test time. In this paper, we propose a method to learn robust joint representations by translating between modalities. Our method is based on the key insight that translation from a source to a target modality provides a method of learning joint representations using only the source modality as input. We augment modality translations with a cycle consistency loss to ensure that our joint representations retain maximal information from all modalities. Once our translation model is trained with paired multimodal data, we only need data from the source modality at test time for final sentiment prediction. This ensures that our model remains robust to perturbations or missing information in the other modalities. We train our model with a coupled translation-prediction objective and it achieves new state-of-the-art results on multimodal sentiment analysis datasets: CMU-MOSI, ICT-MMMO, and YouTube. Additional experiments show that our model learns increasingly discriminative joint representations with more input modalities while maintaining robustness to missing or perturbed modalities.
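
The objective structure can be sketched with linear stand-ins for the Seq2Seq translators: a translation term matches the target modality, and a cycle-consistency term forces the round trip to preserve the source. All names and shapes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(11)
d_src, d_tgt = 8, 6

# Linear stand-ins for the Seq2Seq translators; the joint representation is
# the source-side encoding that must survive the round trip.
A = rng.normal(size=(d_tgt, d_src)) * 0.3   # source -> target translator
B = rng.normal(size=(d_src, d_tgt)) * 0.3   # target -> source translator

def cycle_loss(x_src, x_tgt):
    forward = A @ x_src
    translation = np.mean((forward - x_tgt) ** 2)   # match target modality
    cycle = np.mean((B @ forward - x_src) ** 2)     # retain source information
    return translation + cycle

x_src, x_tgt = rng.normal(size=d_src), rng.normal(size=d_tgt)
print(cycle_loss(x_src, x_tgt))
# At test time only x_src is needed: the prediction head reads A @ x_src.
```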

JAIR Journal 2019 Journal Article

Multi-fidelity Gaussian Process Bandit Optimisation

  • Kirthevasan Kandasamy
  • Gautam Dasarathy
  • Junier Oliva
  • Jeff Schneider
  • Barnabás Póczos

In many scientific and engineering applications, we are tasked with the maximisation of an expensive-to-evaluate black-box function f. Traditional settings for this problem assume just the availability of this single function. However, in many cases, cheap approximations to f may be obtainable. For example, the expensive real world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply, spend the expensive evaluations of f in a small but promising region, and speedily identify the optimum. We formalise this task as a multi-fidelity bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour and achieves better bounds on the regret than strategies which ignore multi-fidelity information. Empirically, MF-GP-UCB outperforms such naive strategies and other multi-fidelity methods on several synthetic and real experiments.
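
The behaviour the analysis refers to, eliminating low-value regions with cheap evaluations before spending expensive ones, can be shown in a stripped-down form. The sketch below drops the Gaussian process entirely and assumes a known bound zeta on the fidelity gap, so it illustrates the elimination principle rather than MF-GP-UCB itself:

```python
import numpy as np

f_hi = lambda x: -np.sin(3 * x) - x ** 2 + 0.7 * x   # expensive target
f_lo = lambda x: f_hi(x) + 0.2 * np.cos(5 * x)       # cheap approximation
zeta = 0.25   # assumed bound on |f_hi - f_lo|

grid = np.linspace(-1, 2, 400)
# Cheap pass: evaluate the approximation densely and keep only points whose
# optimistic value (low-fidelity value + bias bound) could still be optimal.
lo = f_lo(grid)
plausible = grid[lo + zeta >= lo.max() - zeta]

# Expensive pass: spend the costly evaluations only in the surviving region.
hi = f_hi(plausible)
print(plausible[np.argmax(hi)], hi.max())
print(f"expensive evaluations: {len(plausible)} of {len(grid)}")
```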

ICML Conference 2019 Conference Paper

Myopic Posterior Sampling for Adaptive Goal Oriented Design of Experiments

  • Kirthevasan Kandasamy
  • Willie Neiswanger
  • Reed Zhang
  • Akshay Krishnamurthy
  • Jeff G. Schneider
  • Barnabás Póczos

Bayesian methods for adaptive decision-making, such as Bayesian optimisation, active learning, and active search have seen great success in relevant applications. However, real world data collection tasks are broader and more complex, as we may need to achieve a combination of the above goals and/or application specific goals. In such scenarios, specialised methods have limited applicability. In this work, we design a new myopic strategy for a wide class of adaptive design of experiments (DOE) problems, where we wish to collect data in order to fulfil a given goal. Our approach, Myopic Posterior Sampling (MPS), which is inspired by the classical posterior sampling algorithm for multi-armed bandits, enables us to address a broad suite of DOE tasks where a practitioner may incorporate domain expertise about the system and specify her desired goal via a reward function. Empirically, this general-purpose strategy is competitive with more specialised methods in a wide array of synthetic and real world DOE tasks. More importantly, it enables addressing complex DOE goals where no existing method seems applicable. On the theoretical side, we leverage ideas from adaptive submodularity and reinforcement learning to derive conditions under which MPS achieves sublinear regret against natural benchmark policies.
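
A minimal sketch of the posterior-sampling loop on a toy Bernoulli system: draw one sample of the unknown parameters from the current posterior, act greedily with respect to the user-specified reward, observe, and update. The goal function and Beta model are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)
true_p = np.array([0.2, 0.5, 0.8])        # unknown system parameters
alpha = np.ones(3)                        # Beta posterior per design
beta = np.ones(3)

def goal_reward(sampled_p, design):
    # Practitioner-specified reward encoding the DOE goal; here simply
    # "observe high-response designs" (an active-search-style goal).
    return sampled_p[design]

for step in range(200):
    sampled_p = rng.beta(alpha, beta)     # draw one posterior sample
    design = int(np.argmax([goal_reward(sampled_p, d) for d in range(3)]))
    outcome = rng.random() < true_p[design]   # run the experiment
    alpha[design] += outcome
    beta[design] += 1 - outcome

print(alpha / (alpha + beta))  # mean for the best design approaches 0.8
```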

ICML Conference 2018 Conference Paper

Gradient Descent Learns One-hidden-layer CNN: Don't be Afraid of Spurious Local Minima

  • Simon S. Du
  • Jason D. Lee
  • Yuandong Tian
  • Aarti Singh
  • Barnabás Póczos

We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation function, i.e., $f(Z; w, a) = \sum_j a_j\sigma(w^\top Z_j)$, in which both the convolutional weights $w$ and the output weights $a$ are parameters to be learned. We prove that with Gaussian input $Z$ there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, starting from randomly initialized weights, gradient descent with weight normalization can still be proven to recover the true parameters with constant probability (which can be boosted to probability $1$ with multiple restarts). We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.

IROS Conference 2018 Conference Paper

Robust Plant Phenotyping via Model-Based Optimization

  • Paloma Sodhi
  • Hanqi Sun
  • Barnabás Póczos
  • David Wettergreen

Plant phenotyping is the measurement of observable plant traits. Current methods for phenotyping in the field are labour-intensive and error-prone. High throughput plant phenotyping in an automated and noninvasive manner is crucial to accelerating plant breeding methods. Occlusions and non-ideal sensing conditions are a major problem for high throughput plant phenotyping, with most state-of-the-art 3D phenotyping algorithms relying heavily on heuristics or hand-tuned parameters. To address this problem, we present a novel model-based optimization approach for estimating plant physical traits from plant units called phytomers. The proposed approach involves sampling parameterized 3D plant models from an underlying probability distribution. It then optimizes this distribution so that its mass concentrates around the true parameters of the model. Reformulating the phenotyping objective as a search in the space of plant models lets us reason about the plant structure in a holistic manner without having to rely on hand-tuned parameters. This makes our approach robust to noise and occlusions as frequently encountered in real-world environments. We evaluate our approach for plant units taken across simulated, greenhouse and field environments. This work furthers field-based robotic phenotyping capabilities, paving the way for plant biologists to study the coupled effect of genetics and environment on improving crop yields.

ICML Conference 2018 Conference Paper

Transformation Autoregressive Networks

  • Junier B. Oliva
  • Avinava Dubey
  • Manzil Zaheer
  • Barnabás Póczos
  • Ruslan Salakhutdinov
  • Eric P. Xing
  • Jeff G. Schneider

The fundamental task of general density estimation $p(x)$ has been of keen interest to machine learning. In this work, we attempt to systematically characterize methods for density estimation. Broadly speaking, most of the existing methods can be categorized into either using: (a) autoregressive models to estimate the conditional factors of the chain rule, $p(x_{i}\, |\, x_{i-1}, \ldots)$; or (b) non-linear transformations of variables of a simple base distribution. Based on the study of the characteristics of these categories, we propose multiple novel methods for each category. For example, we propose RNN-based transformations to model non-Markovian dependencies. Further, through a comprehensive study over both real world and synthetic data, we show that jointly leveraging transformations of variables and autoregressive conditional models results in a considerable improvement in performance. We illustrate the use of our models in outlier detection and image modeling. Finally, we introduce a novel data driven framework for learning a family of distributions.

IJCAI Conference 2017 Conference Paper

Data-driven Random Fourier Features using Stein Effect

  • Wei-Cheng Chang
  • Chun-Liang Li
  • Yiming Yang
  • Barnabás Póczos

Large-scale kernel approximation is an important problem in machine learning research. Approaches using random Fourier features have become increasingly popular (Rahimi and Recht, 2007), where kernel approximation is treated as empirical mean estimation via Monte Carlo (MC) or Quasi-Monte Carlo (QMC) integration (Yang et al., 2014). A limitation of the current approaches is that all the features receive an equal weight summing to 1. In this paper, we propose a novel shrinkage estimator based on the "Stein effect", which provides a data-driven weighting strategy for random features and enjoys theoretical justifications in terms of lowering the empirical risk. We further present an efficient randomized algorithm for large-scale applications of the proposed method. Our empirical results on six benchmark data sets demonstrate the advantageous performance of this approach over representative baselines in both kernel approximation and supervised learning tasks.
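
For context, here is the equal-weight Monte Carlo baseline the abstract refers to, where every random Fourier feature carries weight 1/D; the proposed estimator replaces these uniform weights with data-driven shrinkage weights (not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
d, D = 5, 512          # input dimension, number of random features

# Random Fourier features for the RBF kernel k(x, y) = exp(-||x - y||^2 / 2):
W = rng.normal(size=(D, d))
b = rng.uniform(0, 2 * np.pi, size=D)

def phi(x):
    # Equal weights 1/D summing to one: the Monte Carlo baseline that
    # the Stein-effect estimator replaces with shrinkage weights.
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
exact = np.exp(-np.sum((x - y) ** 2) / 2)
approx = phi(x) @ phi(y)
print(exact, approx)   # close for moderately large D
```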

ICML Conference 2017 Conference Paper

Equivariance Through Parameter-Sharing

  • Siamak Ravanbakhsh
  • Jeff G. Schneider
  • Barnabás Póczos

We propose to study equivariance in deep neural networks through parameter symmetries. In particular, given a group G that acts discretely on the input and output of a standard neural network layer, we show that its equivariance is linked to the symmetry group of the network parameters. We then propose two parameter-sharing schemes to induce the desired symmetry on the parameters of the neural network. Under some conditions on the action of G, our procedure for tying the parameters achieves G-equivariance and guarantees sensitivity to all other permutation groups outside of G.
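
A concrete instance of parameter sharing inducing equivariance: tying a layer's weights in a circulant pattern makes it commute with cyclic shifts of the input. A short numpy check:

```python
import numpy as np

def circulant(w):
    # Tying parameters in a circulant pattern is a sharing scheme that
    # yields equivariance to the cyclic group acting by index shifts.
    n = len(w)
    return np.stack([np.roll(w, i) for i in range(n)])

rng = np.random.default_rng(4)
w = rng.normal(size=6)
W = circulant(w)
x = rng.normal(size=6)

shift = lambda v: np.roll(v, 1)   # generator of the cyclic group
lhs = W @ shift(x)                # act on the input, then apply the layer
rhs = shift(W @ x)                # apply the layer, then act on the output
print(np.allclose(lhs, rhs))      # True: the tied layer is equivariant
```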

ICML Conference 2017 Conference Paper

Multi-fidelity Bayesian Optimisation with Continuous Approximations

  • Kirthevasan Kandasamy
  • Gautam Dasarathy
  • Jeff G. Schneider
  • Barnabás Póczos

Bandit methods for black-box optimisation, such as Bayesian optimisation, are used in a variety of applications including hyper-parameter tuning and experiment design. Recently, multi-fidelity methods have garnered considerable attention since function evaluations have become increasingly expensive in such applications. Multi-fidelity methods use cheap approximations to the function of interest to speed up the overall optimisation process. However, most multi-fidelity methods assume only a finite number of approximations. On the other hand, in many practical applications, a continuous spectrum of approximations might be available. For instance, when tuning an expensive neural network, one might choose to approximate the cross-validation performance using less data $N$ and/or fewer training iterations $T$. Here, the approximations are best viewed as arising out of a continuous two dimensional space $(N, T)$. In this work, we develop a Bayesian optimisation method, BOCA, for this setting. We characterise its theoretical properties and show that it achieves better regret than strategies which ignore the approximations. BOCA outperforms several other baselines in synthetic and real experiments.

UAI Conference 2017 Conference Paper

Near-Orthogonality Regularization in Kernel Methods

  • Pengtao Xie
  • Barnabás Póczos
  • Eric P. Xing

Kernel methods perform nonlinear learning in high-dimensional reproducing kernel Hilbert spaces (RKHSs). Even though their large model-capacity leads to high representational power, it also incurs substantial risk of overfitting. To alleviate this problem, we propose a new regularization approach, near-orthogonality regularization, which encourages the RKHS functions to be close to being orthogonal. This effectively imposes a structural constraint over the function space, which reduces model complexity and can improve generalization performance. In addition, encouraging orthogonality reduces the redundancy among functions, which can reduce model size without compromising modeling power and better capture infrequent patterns in the data. Here, we define a family of orthogonality-promoting regularizers by encouraging the Gram matrix of the RKHS functions to be close to an identity matrix, where the closeness is measured by Bregman matrix divergences. We apply these regularizers to two kernel methods, and develop an efficient ADMM-based algorithm to solve the regularized optimization problems. We analyze how near-orthogonality affects the generalization performance of kernel methods. Our results suggest that the closer the functions are to being orthogonal, the smaller the generalization error is. Experiments demonstrate the efficacy of near-orthogonality regularization in kernel methods.
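
A sketch of the regularizer using the squared Frobenius distance, which is the simplest member of the Bregman matrix divergence family mentioned above; the coefficient parameterization of the RKHS functions is an assumption:

```python
import numpy as np

def near_orthogonality_penalty(A, K):
    """Penalty pushing the Gram matrix of RKHS functions toward identity.

    A: (m, n) coefficients of m functions f_i = sum_j A[i, j] k(x_j, .)
    K: (n, n) kernel matrix of the data points.
    The squared Frobenius distance stands in for the Bregman matrix
    divergences used in the paper (it is the simplest member).
    """
    G = A @ K @ A.T                 # G[i, j] = <f_i, f_j> in the RKHS
    return np.sum((G - np.eye(len(G))) ** 2)

rng = np.random.default_rng(5)
X = rng.normal(size=(50, 3))
K = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # RBF kernel matrix
A = rng.normal(size=(4, 50)) * 0.1
print(near_orthogonality_penalty(A, K))
```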

ICML Conference 2017 Conference Paper

Nonparanormal Information Estimation

  • Shashank Singh 0005
  • Barnabás Póczos

We study the problem of using i.i.d. samples from an unknown multivariate probability distribution p to estimate the mutual information of p. This problem has recently received attention in two settings: (1) where p is assumed to be Gaussian and (2) where p is assumed only to lie in a large nonparametric smoothness class. Estimators proposed for the Gaussian case converge in high dimensions when the Gaussian assumption holds, but are brittle, failing dramatically when p is not Gaussian, while estimators proposed for the nonparametric case fail to converge with realistic sample sizes except in very low dimension. Hence, there is a lack of robust mutual information estimators for realistic data. To address this, we propose estimators for mutual information when p is assumed to be a nonparanormal (or Gaussian copula) model, a semiparametric compromise between the Gaussian and nonparametric extremes. Using theoretical bounds and experiments, we show these estimators strike a practical balance between robustness and scalability.
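
The nonparanormal assumption makes the basic recipe quite short: map each variable to normal scores through its empirical ranks, then use the Gaussian mutual-information formula. This is a minimal sketch of that recipe, not the paper's refined estimators:

```python
import numpy as np
from scipy.stats import rankdata
from scipy.special import ndtri

def nonparanormal_mi(x, y):
    """Mutual information under a Gaussian copula (nonparanormal) model.

    Each variable is mapped to normal scores via its empirical ranks; the
    MI of a Gaussian is then -0.5 * log det of the correlation matrix.
    """
    data = np.column_stack([x, y])
    n = len(data)
    # Rank-based normal scores (empirical copula pushed through Phi^{-1}).
    z = ndtri(rankdata(data, axis=0) / (n + 1))
    r = np.corrcoef(z, rowvar=False)
    return -0.5 * np.log(np.linalg.det(r))

rng = np.random.default_rng(6)
g = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=5000)
x, y = np.exp(g[:, 0]), g[:, 1] ** 3   # monotone-transformed marginals
print(nonparanormal_mi(x, y))          # near -0.5*log(1 - 0.8^2) ~ 0.51
```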

AIJ Journal 2017 Journal Article

Query efficient posterior estimation in scientific experiments via Bayesian active learning

  • Kirthevasan Kandasamy
  • Jeff Schneider
  • Barnabás Póczos

A common problem in applied statistics disciplines such as astrostatistics is estimating the posterior distribution of relevant parameters. Typically, the likelihoods for such models are computed via expensive experiments such as cosmological simulations of the universe. An urgent challenge in these research domains is to develop methods that can estimate the posterior with few likelihood evaluations. In this paper, we study active posterior estimation in a Bayesian setting when the likelihood is expensive to evaluate. Existing techniques for posterior estimation are based on generating samples representative of the posterior; such methods do not consider efficiency in terms of likelihood evaluations. In order to be query efficient, we treat posterior estimation in an active regression framework. We propose two myopic query strategies to choose where to evaluate the likelihood and implement them using Gaussian processes. Via experiments on a series of synthetic and real examples, we demonstrate that our approach is significantly more query efficient than existing techniques and other heuristics for posterior estimation.

ICML Conference 2017 Conference Paper

The Statistical Recurrent Unit

  • Junier B. Oliva
  • Barnabás Póczos
  • Jeff G. Schneider

Sophisticated gated recurrent neural network architectures like LSTMs and GRUs have been shown to be highly effective in a myriad of applications. We develop an un-gated unit, the statistical recurrent unit (SRU), that is able to learn long-term dependencies in data by keeping only moving averages of statistics. The SRU's architecture is simple, un-gated, and contains a comparable number of parameters to LSTMs; yet, SRUs perform favorably compared to the more sophisticated LSTM and GRU alternatives, often outperforming one or both in various tasks. We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing the respective architectures' hyperparameters for both synthetic and real-world tasks.
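
A sketch of an un-gated unit in this spirit: summary statistics are recomputed from the input and the current averages, then folded into exponential moving averages at several decay rates. The exact wiring of the published SRU differs in details; dimensions and initialization below are illustrative:

```python
import numpy as np

relu = lambda v: np.maximum(v, 0.0)

class SRUSketch:
    """Un-gated recurrent unit keeping exponential moving averages of
    summary statistics at several decay rates (a minimal sketch)."""

    def __init__(self, d_in, d_stat, alphas=(0.0, 0.5, 0.9, 0.99), seed=0):
        rng = np.random.default_rng(seed)
        self.alphas = np.array(alphas)
        self.Wx = rng.normal(size=(d_stat, d_in)) * 0.1
        self.Wm = rng.normal(size=(d_stat, d_stat * len(alphas))) * 0.1

    def run(self, xs):
        d = self.Wx.shape[0]
        mu = np.zeros((len(self.alphas), d))   # one average per decay rate
        for x in xs:
            phi = relu(self.Wx @ x + self.Wm @ mu.ravel())  # new statistics
            # Moving averages at multiple scales: slow alphas remember the
            # distant past, fast ones track recent inputs.
            mu = self.alphas[:, None] * mu + (1 - self.alphas[:, None]) * phi
        return mu.ravel()                      # the summary fed downstream

xs = np.sin(np.linspace(0, 10, 100))[:, None]     # toy 1-d sequence
print(SRUSketch(d_in=1, d_stat=4).run(xs).shape)  # (16,)
```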

ICML Conference 2016 Conference Paper

Boolean Matrix Factorization and Noisy Completion via Message Passing

  • Siamak Ravanbakhsh
  • Barnabás Póczos
  • Russell Greiner

Boolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods due to their interpretability, but hard to perform due to their NP-hardness. We treat these problems as maximum a posteriori inference problems in a graphical model and present a message passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices within the boundaries of theoretically possible recovery, and compares favorably with the state of the art in real-world applications, such as collaborative filtering with large-scale Boolean data.

ICML Conference 2016 Conference Paper

Estimating Cosmological Parameters from the Dark Matter Distribution

  • Siamak Ravanbakhsh
  • Junier B. Oliva
  • Sebastian Fromenteau
  • Layne Price
  • Shirley Ho
  • Jeff G. Schneider
  • Barnabás Póczos

A grand challenge of 21st-century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach to estimating the cosmological parameters is to use the large-scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "single" function of scale, such as the galaxy correlation function or power spectrum. We show that it is possible to estimate these cosmological parameters directly from the distribution of matter. This paper presents the application of deep 3D convolutional networks to volumetric representations of dark matter simulations, as well as the results obtained using a recently proposed distribution regression framework, showing that machine learning techniques are comparable to, and can sometimes outperform, maximum-likelihood point estimates using "cosmological models". This opens the way to estimating the parameters of our Universe with higher accuracy.

JMLR Journal 2016 Journal Article

Learning Theory for Distribution Regression

  • Zoltán Szabó
  • Bharath K. Sriperumbudur
  • Barnabás Póczos
  • Arthur Gretton

We focus on the distribution regression problem: regressing to vector-valued outputs from probability measures. Many important machine learning and statistical tasks fit into this framework, including multi-instance learning and point estimation problems without analytical solution (such as hyperparameter or entropy estimation). Despite the large number of available heuristics in the literature, the inherent two-stage sampled nature of the problem makes the theoretical analysis quite challenging, since in practice only samples from sampled distributions are observable, and the estimates have to rely on similarities computed between sets of points. To the best of our knowledge, the only existing technique with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which often performs poorly in practice), and the domain of the distributions to be compact Euclidean. In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions into a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs. Our main contribution is to prove that this scheme is consistent in the two-stage sampled setup under mild conditions (on separable topological domains enriched with kernels): we present an exact computational-statistical efficiency trade-off analysis showing that our estimator is able to match the one-stage sampled minimax optimal rate (Caponnetto and De Vito, 2007; Steinwart et al., 2009). This result answers a 17-year-old open question, establishing the consistency of the classical set kernel (Haussler, 1999; Gärtner et al., 2002) in regression. We also cover consistency for more recent kernels on distributions, including those due to Christmann and Steinwart (2010).
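
A sketch of the two-stage scheme under a random-feature approximation, where the kernel mean embedding of each sample set becomes a finite vector and the regressor is plain ridge regression; the task (regressing a Gaussian's mean from sample sets) is a toy stand-in:

```python
import numpy as np

rng = np.random.default_rng(7)

# Random Fourier features make the kernel mean embedding a finite vector,
# so the two-stage scheme reduces to ordinary ridge regression.
D = 200
W = rng.normal(size=(D, 1))
b = rng.uniform(0, 2 * np.pi, size=D)
embed = lambda s: (np.sqrt(2.0 / D) * np.cos(s @ W.T + b)).mean(axis=0)

# Each "input" is a sample set from N(mu, 1); the target is mu itself
# (a point-estimation task without access to the true distribution).
mus = rng.uniform(-2, 2, size=300)
X = np.stack([embed(rng.normal(m, 1, size=(50, 1))) for m in mus])
ridge = 1e-3
w = np.linalg.solve(X.T @ X + ridge * np.eye(D), X.T @ mus)

test_mu = 1.3
print(embed(rng.normal(test_mu, 1, size=(50, 1))) @ w)  # approx 1.3
```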

AAAI Conference 2016 Conference Paper

Linear-Time Learning on Distributions with Approximate Kernel Embeddings

  • Danica Sutherland
  • Junier Oliva
  • Barnabás Póczos
  • Jeff Schneider

Many interesting machine learning problems are best posed by considering instances that are distributions, or sample sets drawn from distributions. Most previous work devoted to machine learning tasks with distributional inputs has done so through pairwise kernel evaluations between pdfs (or sample sets). While such an approach is fine for smaller datasets, the computation of an N × N Gram matrix is prohibitive in large datasets. Recent scalable estimators that work over pdfs have done so only with kernels that use Euclidean metrics, like the L2 distance. However, there are a myriad of other useful metrics available, such as total variation, Hellinger distance, and the Jensen-Shannon divergence. This work develops the first random features for pdfs whose dot product approximates kernels using these non-Euclidean metrics. These random features allow estimators to scale to large datasets by working in a primal space, without computing large Gram matrices. We provide an analysis of the approximation error in using our proposed random features, and show empirically the quality of our approximation both in estimating a Gram matrix and in solving learning tasks in real-world and synthetic data.

IROS Conference 2016 Conference Paper

Nonparametric distribution regression applied to sensor modeling

  • Abhijeet Tallavajhula
  • Barnabás Póczos
  • Alonzo Kelly

Sensor models, which specify the distribution of sensor observations, are a widely used and integral part of robotics algorithms. Observation distributions are commonly approximated by parametric models, which are limited in their expressiveness, and may require careful design to suit an application. In this paper, we propose nonparametric distribution regression as a procedure to model sensors. It is a data-driven procedure to predict distributions that makes few assumptions. We apply the procedure to model raw distributions from real sensors, and also demonstrate its utility to a mobile robot state estimation task. We show that nonparametric distribution regression adapts to characteristics in the training data, leading to realistic predictions. The same procedure competes favorably with baseline parametric models across applications. The results also help develop intuition for different sensor modeling situations. Our procedure is useful when distributions are inherently noisy, and sufficient data is available.

ICML Conference 2016 Conference Paper

Stochastic Variance Reduction for Nonconvex Optimization

  • Sashank J. Reddi
  • Ahmed Hefny
  • Suvrit Sra
  • Barnabás Póczos
  • Alexander J. Smola

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.
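
The SVRG update analyzed above is compact enough to state in full: once per epoch a full gradient is computed at a snapshot, and each inner step corrects a single-component gradient with the snapshot information. A minimal sketch on a toy nonconvex finite sum:

```python
import numpy as np

def svrg(grad_i, n, w0, step=0.1, epochs=20, inner=None, seed=0):
    """SVRG for (1/n) * sum_i f_i(w); grad_i(w, i) returns one component
    gradient. The toy problem below is nonconvex, as analyzed above."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    inner = inner or n
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            # Variance-reduced gradient: unbiased, and its variance shrinks
            # as w approaches the snapshot point.
            v = grad_i(w, i) - grad_i(w_snap, i) + full_grad
            w -= step * v
    return w

# Toy nonconvex finite sum: f_i(w) = (w - a_i)^2 + 0.5*sin(w)
a = np.linspace(-1, 3, 40)
grad = lambda w, i: 2 * (w - a[i]) + 0.5 * np.cos(w)
print(svrg(grad, n=40, w0=np.array([5.0])))  # converges near a stationary point
```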

UAI Conference 2016 Conference Paper

Utilize Old Coordinates: Faster Doubly Stochastic Gradients for Kernel Methods

  • Chun-Liang Li
  • Barnabás Póczos

To address the scalability issue of kernel methods, random features are commonly used for kernel approximation (Rahimi and Recht, 2007). They map the input data to a randomized low-dimensional feature space and apply fast linear learning algorithms on it. However, to achieve high precision results, one might still need a large number of random features, which is infeasible in large-scale applications. Dai et al. (2014) address this issue by recomputing the random features of small batches in each iteration instead of pre-generating them for the whole dataset and keeping them in memory. The algorithm increases the number of random features linearly with iterations, which can make the approximation error arbitrarily small. A drawback of this approach is that the large number of random features slows down the prediction and gradient evaluation after several iterations. We propose two algorithms to remedy this situation by "utilizing" old random features instead of adding new features in certain iterations. By checking the expected descent amount, the proposed algorithm selects "important" old features to update. The resulting procedure is surprisingly simple, does not increase the complexity of the original algorithm, and is effective in practice. We conduct empirical studies on both medium and large-scale datasets, such as ImageNet, to demonstrate the power of the proposed algorithms.

UAI Conference 2015 Conference Paper

Communication Efficient Coresets for Empirical Loss Minimization

  • Sashank J. Reddi
  • Barnabás Póczos
  • Alexander J. Smola

In this paper, we study the problem of empirical loss minimization with $\ell_2$-regularization in distributed settings with significant communication cost. Stochastic gradient descent (SGD) and its variants are popular techniques for solving these problems in large-scale applications. However, the communication cost of these techniques is usually high, thus leading to considerable performance degradation. We introduce a novel approach to reduce the communication cost while retaining good convergence properties. The key to our approach is the construction of a small summary of the data, called a coreset, at each iteration, and solving an easy optimization problem based on the coreset. We present a general framework for analyzing coreset-based optimization and provide interesting insights into existing algorithms from this perspective. We then propose a new coreset construction and provide its convergence analysis for a wide class of problems that include logistic regression and support vector machines. Preliminary experiments show encouraging results for our algorithm on real-world datasets.

ICML Conference 2015 Conference Paper

High Dimensional Bayesian Optimisation and Bandits via Additive Models

  • Kirthevasan Kandasamy
  • Jeff G. Schneider
  • Barnabás Póczos

Bayesian Optimisation (BO) is a technique used in optimising a D-dimensional function which is typically expensive to evaluate. While there have been many successes for BO in low dimensions, scaling it to high dimensions has been notoriously difficult. Existing literature on the topic operates under very restrictive settings. In this paper, we identify two key challenges in this endeavour. We tackle these challenges by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret has only linear dependence on D even though the function depends on all D dimensions. We also demonstrate several other statistical and computational benefits of our framework. Via synthetic examples, a scientific simulation and a face detection problem, we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive.
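
The computational payoff of additivity can be seen directly: an acquisition function that inherits the additive form is maximized group by group, so one D-dimensional search becomes several low-dimensional ones. The sketch below uses random search on known group functions purely to illustrate this decomposition, not GP-UCB itself:

```python
import numpy as np

rng = np.random.default_rng(8)

# Additive structure: f(x) = f1(x[0:2]) + f2(x[2:4]), so D = 4.
f1 = lambda u: -np.sum((u - 0.3) ** 2, axis=-1)
f2 = lambda u: -np.sum((u + 0.5) ** 2, axis=-1)

# An objective that decomposes additively can be maximized group by group,
# turning one 4-dimensional search into two independent 2-dimensional ones.
def maximize_per_group(group_fns, group_dim, n_cand=5000):
    best = []
    for g in group_fns:
        cand = rng.uniform(-1, 1, size=(n_cand, group_dim))
        best.append(cand[np.argmax(g(cand))])   # independent low-dim search
    return np.concatenate(best)

x_star = maximize_per_group([f1, f2], group_dim=2)
print(np.round(x_star, 2))   # close to [0.3, 0.3, -0.5, -0.5]
```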

ICML Conference 2014 Conference Paper

Generalized Exponential Concentration Inequality for Renyi Divergence Estimation

  • Shashank Singh 0005
  • Barnabás Póczos

Estimating divergences between probability distributions in a consistent way is of great importance in many machine learning tasks. Although this is a fundamental problem in nonparametric statistics, to the best of our knowledge no finite-sample exponential concentration bound had been derived for any divergence estimator. The main contribution of our work is to provide such a bound for an estimator of Renyi divergence for a smooth Hölder class of densities on the d-dimensional unit cube. We also illustrate our theoretical results with a numerical experiment.

UAI Conference 2014 Conference Paper

k-NN Regression on Functional Data with Incomplete Observations

  • Sashank J. Reddi
  • Barnabás Póczos

In this paper we study a general version of regression where each covariate is itself functional data, such as a distribution or a function. In real applications, however, we typically do not have direct access to such data; instead, only some noisy estimates of the true covariate functions/distributions are available to us. For example, when each covariate is a distribution, we might not be able to directly observe these distributions, but it can be assumed that i.i.d. sample sets from these distributions are available. In this paper we present a general framework and a k-NN-based estimator for this regression problem. We prove consistency of the estimator and derive its convergence rates. We further show that the proposed estimator can adapt to the local intrinsic dimension in our case and provide a simple approach for choosing k. Finally, we illustrate the applicability of our framework with numerical experiments.

ICML Conference 2014 Conference Paper

Nonparametric Estimation of Renyi Divergence and Friends

  • Akshay Krishnamurthy
  • Kirthevasan Kandasamy
  • Barnabás Póczos
  • Larry A. Wasserman

We consider nonparametric estimation of $L_2$, Renyi-$\alpha$ and Tsallis-$\alpha$ divergences between continuous distributions. Our approach is to construct estimators for particular integral functionals of two densities and translate them into divergence estimators. For the integral functionals, our estimators are based on corrections of a preliminary plug-in estimator. We show that these estimators achieve the parametric convergence rate of $n^{-1/2}$ when the densities' smoothness $s$ is at least $d/4$, where $d$ is the dimension. We also derive minimax lower bounds for this problem which confirm that $s > d/4$ is necessary to achieve the $n^{-1/2}$ rate of convergence. We validate our theoretical guarantees with a number of simulations.

ICML Conference 2013 Conference Paper

Distribution to Distribution Regression

  • Junier B. Oliva
  • Barnabás Póczos
  • Jeff G. Schneider

We analyze 'distribution to distribution regression', where one is regressing a mapping for which both the covariate (input) and the response (output) are distributions. No parametric assumptions are placed on the input or output distributions, nor are any strong assumptions made on the measure from which the input distributions are drawn. We develop an estimator and derive an upper bound for the L2 risk; also, we show that when the effective dimension is small enough (as measured by the doubling dimension), the risk converges to zero at a polynomial rate.

ICML Conference 2013 Conference Paper

Scale Invariant Conditional Dependence Measures

  • Sashank J. Reddi
  • Barnabás Póczos

In this paper we develop new dependence and conditional dependence measures and provide their estimators. An attractive property of these measures and estimators is that they are invariant to any monotone increasing transformations of the random variables, which is important in many applications including feature selection. Under certain conditions we show the consistency of these estimators, derive upper bounds on their convergence rates, and show that the estimators do not suffer from the curse of dimensionality. However, when the conditions are less restrictive, we derive a lower bound which proves that in the worst case the convergence can be arbitrarily slow similarly to some other estimators. Numerical illustrations demonstrate the applicability of our method.

NeurIPS Conference 2011 Conference Paper

Group Anomaly Detection using Flexible Genre Models

  • Liang Xiong
  • Barnabás Póczos
  • Jeff Schneider

An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies.

NeurIPS Conference 2010 Conference Paper

Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs

  • Dávid Pál
  • Barnabás Póczos
  • Csaba Szepesvári

We present simple and computationally efficient nonparametric estimators of Rényi entropy and mutual information based on an i.i.d. sample drawn from an unknown, absolutely continuous distribution over $\mathbb{R}^d$. The estimators are calculated as the sum of $p$-th powers of the Euclidean lengths of the edges of the 'generalized nearest-neighbor' graph of the sample and the empirical copula of the sample, respectively. For the first time, we prove the almost sure consistency of these estimators and upper bounds on their rates of convergence, the latter under the assumption that the density underlying the sample is Lipschitz continuous. Experiments demonstrate their usefulness in independent subspace analysis.
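
A sketch of the edge-length statistic behind the entropy estimator, using the ordinary 1-nearest-neighbor graph as the simplest 'generalized nearest-neighbor' graph. The dimension-dependent normalizing constant from the theory is left uncalibrated (set to 1), so only entropy differences are meaningful here:

```python
import numpy as np
from scipy.spatial import cKDTree

def renyi_entropy_nn(x, p=1.0, gamma=1.0):
    """Rényi entropy estimate of order alpha = 1 - p/d from the sum of
    p-th powers of nearest-neighbor edge lengths. `gamma` is the
    dimension-dependent constant from the theory, which in practice is
    calibrated on samples with known entropy (set to 1 here, so the
    output is correct only up to an additive constant)."""
    n, d = x.shape
    alpha = 1.0 - p / d
    dist, _ = cKDTree(x).query(x, k=2)   # k=2: first column is the point itself
    L = np.sum(dist[:, 1] ** p)          # sum of p-th powers of NN edge lengths
    return np.log(L / (gamma * n ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(9)
a = renyi_entropy_nn(rng.uniform(0, 1, size=(4000, 2)))  # entropy 0 reference
b = renyi_entropy_nn(rng.uniform(0, 2, size=(4000, 2)))  # entropy log(4)
print(b - a)   # the constant cancels in differences: approximately 1.39
```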

JMLR Journal 2009 Journal Article

Identification of Recurrent Neural Networks by Bayesian Interrogation Techniques

  • Barnabás Póczos
  • András Lőrincz

We introduce novel online Bayesian methods for the identification of a family of noisy recurrent neural networks (RNNs). We present Bayesian active learning techniques for stimulus selection given past experiences. In particular, we consider the unknown parameters as stochastic variables and use A-optimality and D-optimality principles to choose optimal stimuli. We derive myopic cost functions in order to maximize the information gain concerning network parameters at each time step. We also derive the A-optimal and D-optimal estimations of the additive noise that perturbs the dynamical system of the RNN. Here we investigate myopic as well as non-myopic estimations, and study the problem of simultaneous estimation of both the system parameters and the noise. Employing conjugate priors, our derivations remain approximation-free and give rise to simple update rules for the online learning of the parameters. The efficiency of our method is demonstrated for a number of selected cases, including the task of controlled independent component analysis.

JMLR Journal 2007 Journal Article

Undercomplete Blind Subspace Deconvolution

  • Zoltán Szabó
  • Barnabás Póczos
  • András Lőrincz

We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation, we reduce this problem to ISA. The associated 'high-dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel-based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the workings and advantages of the derived kernel-ISA methods.