Arrow Research search

Author name cluster

Brian Kulis

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

23

ICLR Conference 2025 Conference Paper

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

  • Christopher Liao
  • Christian So
  • Theodoros Tsiligkaridis
  • Brian Kulis

Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg

ICML Conference 2023 Conference Paper

Supervised Metric Learning to Rank for Retrieval via Contextual Similarity Optimization

  • Christopher Liao
  • Theodoros Tsiligkaridis
  • Brian Kulis

There is extensive interest in metric learning methods for image retrieval. Many metric learning loss functions focus on learning a correct ranking of training samples, but strongly overfit semantically inconsistent labels and require a large amount of data. To address these shortcomings, we propose a new metric learning method, called contextual loss, which optimizes contextual similarity in addition to cosine similarity. Our contextual loss implicitly enforces semantic consistency among neighbors while converging to the correct ranking. We empirically show that the proposed loss is more robust to label noise, and is less prone to overfitting even when a large portion of train data is withheld. Extensive experiments demonstrate that our method achieves a new state-of-the-art across four image retrieval benchmarks and multiple different evaluation settings. Code is available at: https://github.com/Chris210634/metric-learning-using-contextual-similarity

ICML Conference 2022 Conference Paper

Faster Algorithms for Learning Convex Functions

  • Ali Siahkamari
  • Durmus Alp Emre Acar
  • Christopher Liao
  • Kelly L. Geyer
  • Venkatesh Saligrama
  • Brian Kulis

The task of approximating an arbitrary convex function arises in several learning problems such as convex regression, learning with a difference of convex (DC) functions, and learning Bregman or $f$-divergences. In this paper, we develop and analyze an approach for solving a broad range of convex function learning problems that is faster than state-of-the-art approaches. Our approach is based on a 2-block ADMM method where each block can be computed in closed form. For the task of convex Lipschitz regression, we establish that our proposed algorithm converges with iteration complexity of $O(n\sqrt{d}/\epsilon)$ for a dataset $\bm X \in \mathbb R^{n\times d}$ and $\epsilon > 0$. Combined with per-iteration computation complexity, our method converges with the rate $O(n^3 d^{1.5}/\epsilon+n^2 d^{2.5}/\epsilon+n d^3/\epsilon)$. This new rate improves the state-of-the-art rate of $O(n^5d^2/\epsilon)$ if $d = o(n^4)$. Further, we provide similar solvers for DC regression and Bregman divergence learning. Unlike previous approaches, our method is amenable to the use of GPUs. We demonstrate on regression and metric learning experiments that our approach is over 100 times faster than existing approaches on some data sets, and produces results that are comparable to the state of the art.

ICML Conference 2020 Conference Paper

Deep Divergence Learning

  • Hatice Kubra Cilingir
  • Rachel Manzelli
  • Brian Kulis

Classical linear metric learning methods have recently been extended along two distinct lines: deep metric learning methods for learning embeddings of the data using neural networks, and Bregman divergence learning approaches for extending learning Euclidean distances to more general divergence measures such as divergences over distributions. In this paper, we introduce deep Bregman divergences, which are based on learning and parameterizing functional Bregman divergences using neural networks, and which unify and extend these existing lines of work. We show in particular how deep metric learning formulations, kernel metric learning, Mahalanobis metric learning, and moment-matching functions for comparing distributions arise as special cases of these divergences in the symmetric setting. We then describe a deep learning framework for learning general functional Bregman divergences, and show in experiments that this method yields superior performance on benchmark datasets as compared to existing deep metric learning approaches. We also discuss novel applications, including a semi-supervised distributional clustering problem, and a new loss function for unsupervised data generation.

NeurIPS Conference 2020 Conference Paper

Learning to Approximate a Bregman Divergence

  • Ali Siahkamari
  • Xide Xia
  • Venkatesh Saligrama
  • David Castañón
  • Brian Kulis

Bregman divergences generalize measures such as the squared Euclidean distance and the KL divergence, and arise throughout many areas of machine learning. In this paper, we focus on the problem of approximating an arbitrary Bregman divergence from supervision, and we provide a well-principled approach to analyzing such approximations. We develop a formulation and algorithm for learning arbitrary Bregman divergences based on approximating their underlying convex generating function via a piecewise linear function. We provide theoretical approximation bounds using our parameterization and show that the generalization error $O_p(m^{-1/2})$ for metric learning using our framework matches the known generalization error in the strictly less general Mahalanobis metric learning setting. We further demonstrate empirically that our method performs well in comparison to existing metric learning methods, particularly for clustering and ranking problems.
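
As a small illustration of the first sentence of this abstract, the sketch below evaluates the standard Bregman divergence $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y), x - y \rangle$ for two textbook generating functions. It is not the paper's learning algorithm; the function names and the sample vectors are assumptions chosen for the example.

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Generator 1: squared Euclidean norm, which yields the squared Euclidean distance.
def sq_norm(v):
    return sum(vi * vi for vi in v)

def sq_norm_grad(v):
    return [2.0 * vi for vi in v]

# Generator 2: negative entropy, which yields the KL divergence
# when both arguments are probability vectors.
def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    return [math.log(pi) + 1.0 for pi in p]

# Illustrative probability vectors (hypothetical values).
x = [0.2, 0.3, 0.5]
y = [0.4, 0.4, 0.2]

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, x, y)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, x, y)

# Direct formulas for comparison.
sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
kl = sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

print(d_euclid, sq_dist)
print(d_kl, kl)
```

Each printed pair should agree up to floating-point error, since both special cases follow algebraically from the definition.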

ICML Conference 2020 Conference Paper

Piecewise Linear Regression via a Difference of Convex Functions

  • Ali Siahkamari
  • Aditya Gangrade
  • Brian Kulis
  • Venkatesh Saligrama

We present a new piecewise linear regression methodology that utilises fitting a difference of convex functions (DC functions) to the data. These are functions $f$ that may be represented as the difference $\phi_1 - \phi_2$ for a choice of convex functions $\phi_1, \phi_2$. The method proceeds by estimating piecewise-linear convex functions, in a manner similar to max-affine regression, whose difference approximates the data. The choice of the function is regularised by a new seminorm over the class of DC functions that controls the $\ell_\infty$ Lipschitz constant of the estimate. The resulting methodology can be efficiently implemented via quadratic programming even in high dimensions, and is shown to have close to minimax statistical risk. We empirically validate the method, showing it to be practically implementable, and to outperform existing regression methods in accuracy on real-world datasets.
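
To make the DC representation $f = \phi_1 - \phi_2$ concrete, here is a minimal sketch in which both convex parts are max-affine functions in one dimension. The specific pieces are hypothetical and hand-picked; the paper fits them from data with a quadratic program, which this example does not attempt.

```python
def max_affine(x, pieces):
    """Convex piecewise-linear function: the max of slope * x + intercept over all pieces."""
    return max(a * x + b for a, b in pieces)

def dc_function(x, pieces_1, pieces_2):
    """Difference of two max-affine (convex) functions, phi_1(x) - phi_2(x)."""
    return max_affine(x, pieces_1) - max_affine(x, pieces_2)

# Hypothetical pieces: phi_1(x) = |x| and phi_2(x) = 2 * |x - 1|.
phi_1 = [(1.0, 0.0), (-1.0, 0.0)]
phi_2 = [(2.0, -2.0), (-2.0, 2.0)]

values = {x: dc_function(x, phi_1, phi_2) for x in (-1.0, 0.0, 1.0, 2.0)}
print(values)

# The difference is piecewise linear but not convex:
# the value at the midpoint x = 1 lies above the chord between x = 0 and x = 2.
midpoint_above_chord = values[1.0] > 0.5 * (values[0.0] + values[2.0])
print(midpoint_above_chord)
```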

IJCAI Conference 2019 Conference Paper

Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses

  • Xiao Wang
  • Siyue Wang
  • Pin-Yu Chen
  • Yanzhi Wang
  • Brian Kulis
  • Xue Lin
  • Sang Chin

Despite achieving remarkable success in various domains, recent studies have uncovered the vulnerability of deep neural networks to adversarial perturbations, raising concerns about model generalizability and new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation to layer inputs are shown to be promising for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement comes at the cost of noticeable performance degradation on legitimate data, e.g., a large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts at the cost of drop in test accuracy of any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels to prevent adversaries from exploiting fixed model structures and parameters for their malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending adversarial reprogramming, which is the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.

ICML Conference 2016 Conference Paper

Robust Monte Carlo Sampling using Riemannian Nosé-Poincaré Hamiltonian Dynamics

  • Anirban Roychowdhury
  • Brian Kulis
  • Srinivasan Parthasarathy

We present a Monte Carlo sampler using a modified Nosé-Poincaré Hamiltonian along with Riemannian preconditioning. Hamiltonian Monte Carlo samplers allow better exploration of the state space as opposed to random walk-based methods, but, from a molecular dynamics perspective, may not necessarily provide samples from the canonical ensemble. Nosé-Hoover samplers rectify that shortcoming, but the resultant dynamics are not Hamiltonian. Furthermore, usage of these algorithms on large real-life datasets necessitates the use of stochastic gradients, which acts as another potentially destabilizing source of noise. In this work, we propose dynamics based on a modified Nosé-Poincaré Hamiltonian augmented with Riemannian manifold corrections. The resultant symplectic sampling algorithm samples from the canonical ensemble while using structural cues from the Riemannian preconditioning matrices to efficiently traverse the parameter space. We also propose a stochastic variant using additional terms in the Hamiltonian to correct for the noise from the stochastic gradients. We show strong performance of our algorithms on synthetic datasets and high-dimensional Poisson factor analysis-based topic modeling scenarios.

NeurIPS Conference 2013 Conference Paper

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

  • Trevor Campbell
  • Miao Liu
  • Brian Kulis
  • Jonathan How
  • Lawrence Carin

This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets.

ICML Conference 2013 Conference Paper

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

  • Tamara Broderick
  • Brian Kulis
  • Michael I. Jordan

The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
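
The small-variance limit described in the first sentence of this abstract can be seen numerically. The sketch below assumes an equal-weight, one-dimensional spherical Gaussian mixture with hypothetical means and a hypothetical data point; as the shared variance shrinks, the posterior cluster probabilities (the E-step of EM) collapse onto the nearest mean, which is the K-means assignment rule. It illustrates only this classical limit, not the paper's Bayesian nonparametric derivations.

```python
import math

def responsibilities(x, means, sigma2):
    """Posterior cluster probabilities for an equal-weight 1-D Gaussian mixture
    with shared variance sigma2."""
    # Log-weights up to a shared constant; subtract the max for numerical stability.
    logits = [-(x - m) ** 2 / (2.0 * sigma2) for m in means]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical cluster means and a single observation.
means = [0.0, 1.0, 3.0]
x = 0.8

for sigma2 in (10.0, 1.0, 0.1, 0.001):
    print(sigma2, [round(r, 4) for r in responsibilities(x, means, sigma2)])

soft = responsibilities(x, means, 10.0)    # large variance: diffuse assignment
hard = responsibilities(x, means, 0.001)   # small variance: nearly one-hot

# K-means assignment: index of the nearest mean.
nearest = min(range(len(means)), key=lambda k: (x - means[k]) ** 2)
```

With the smallest variance, almost all posterior mass sits on `means[nearest]`, matching the hard assignment K-means would make.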

NeurIPS Conference 2013 Conference Paper

Small-Variance Asymptotics for Hidden Markov Models

  • Anirban Roychowdhury
  • Ke Jiang
  • Brian Kulis

Small-variance asymptotics provide an emerging technique for obtaining scalable combinatorial algorithms from rich probabilistic models. We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension. Starting with the standard HMM, we first derive a “hard” inference algorithm analogous to k-means that arises when particular variances in the model tend to zero. This analysis is then extended to the Bayesian nonparametric case, yielding a simple, scalable, and flexible algorithm for discrete-state sequence data with a non-fixed number of states. We also derive the corresponding combinatorial objective functions arising from our analysis, which involve a k-means-like term along with penalties based on state transitions and the number of states. A key property of such algorithms is that — particularly in the nonparametric setting — standard probabilistic inference algorithms lack scalability and are heavily dependent on good initialization. A number of results on synthetic and real data sets demonstrate the advantages of the proposed framework.

JMLR Journal 2012 Journal Article

Metric and Kernel Learning Using a Linear Transformation

  • Prateek Jain
  • Brian Kulis
  • Jason V. Davis
  • Inderjit S. Dhillon

Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework: that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction.

NeurIPS Conference 2012 Conference Paper

Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

  • Ke Jiang
  • Brian Kulis
  • Michael Jordan

Links between probabilistic and non-probabilistic learning algorithms can arise by performing small-variance asymptotics, i.e., letting the variance of particular distributions in a graphical model go to zero. For instance, in the context of clustering, such an approach yields precise connections between the k-means and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that feature the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis.

AAMAS Conference 2011 Conference Paper

Metric Learning for Reinforcement Learning Agents

  • Matthew E. Taylor
  • Brian Kulis
  • Fei Sha

A key component of any reinforcement learning algorithm is the underlying representation used by the agent. While reinforcement learning (RL) agents have typically relied on hand-coded state representations, there has been a growing interest in learning this representation. While inputs to an agent are typically fixed (i.e., state variables represent sensors on a robot), it is desirable to automatically determine the optimal relative scaling of such inputs, as well as to diminish the impact of irrelevant features. This work introduces HOLLER, a novel distance metric learning algorithm, and combines it with an existing instance-based RL algorithm to achieve precisely these goals. The algorithms' success is highlighted via empirical measurements on a set of six tasks within the mountain car domain.

NeurIPS Conference 2010 Conference Paper

Inductive Regularized Learning of Kernel Functions

  • Prateek Jain
  • Brian Kulis
  • Inderjit Dhillon

In this paper we consider the fundamental problem of semi-supervised kernel function learning. We propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. We introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the $k$-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies.

NeurIPS Conference 2009 Conference Paper

Learning to Hash with Binary Reconstructive Embeddings

  • Brian Kulis
  • Trevor Darrell

Fast retrieval methods are increasingly critical for many large-scale analysis tasks, and there have been several recent methods that attempt to learn hash functions for fast and accurate nearest neighbor searches. In this paper, we develop an algorithm for learning hash functions based on explicitly minimizing the reconstruction error between the original distances and the Hamming distances of the corresponding binary embeddings. We develop a scalable coordinate-descent algorithm for our proposed hashing objective that is able to efficiently learn hash functions in a variety of settings. Unlike existing methods such as semantic hashing and spectral hashing, our method is easily kernelized and does not require restrictive assumptions about the underlying distribution of the data. We present results over several domains to demonstrate that our method outperforms existing state-of-the-art techniques.

JMLR Journal 2009 Journal Article

Low-Rank Kernel Learning with Bregman Matrix Divergences

  • Brian Kulis
  • Mátyás A. Sustik
  • Inderjit S. Dhillon

In this paper, we study low-rank matrix nearness problems, with a focus on learning low-rank positive semidefinite (kernel) matrices for machine learning applications. We propose efficient algorithms that scale linearly in the number of data points and quadratically in the rank of the input matrix. Existing algorithms for learning kernel matrices often scale poorly, with running times that are cubic in the number of data points. We employ Bregman matrix divergences as the measures of nearness; these divergences are natural for learning low-rank kernels since they preserve rank as well as positive semidefiniteness. Special cases of our framework yield faster algorithms for various existing learning problems, and experimental results demonstrate that our algorithms can effectively learn both low-rank and full-rank kernel matrices.

NeurIPS Conference 2008 Conference Paper

Online Metric Learning and Fast Similarity Search

  • Prateek Jain
  • Brian Kulis
  • Inderjit Dhillon
  • Kristen Grauman

Metric learning algorithms can provide useful distance functions for a variety of domains, and recent work has shown good accuracy for problems where the learner can access all distance constraints at once. However, in many real applications, constraints are only available incrementally, thus necessitating methods that can perform online updates to the learned metric. Existing online algorithms offer bounds on worst-case performance, but typically do not perform well in practice as compared to their offline counterparts. We present a new online metric learning algorithm that updates a learned Mahalanobis metric based on LogDet regularization and gradient descent. We prove theoretical worst-case performance bounds, and empirically compare the proposed method against existing online metric learning algorithms. To further boost the practicality of our approach, we develop an online locality-sensitive hashing scheme which leads to efficient updates for approximate similarity search data structures. We demonstrate our algorithm on multiple datasets and show that it outperforms relevant baselines.

ICML Conference 2007 Conference Paper

Information-theoretic metric learning

  • Jason V. Davis
  • Brian Kulis
  • Prateek Jain
  • Suvrit Sra
  • Inderjit S. Dhillon

In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem: that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.
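
For readers unfamiliar with the quantities this abstract names, the block below writes out the standard definitions. The symbols $A$, $A_0$, $\mu$ and $n$ are notation assumed here rather than taken from this listing: $A$ and $A_0$ are $n \times n$ positive definite matrices, with $A_0$ playing the role of a prior. The last line is the usual KL divergence between two Gaussians with a shared mean and precision matrices $A_0$ and $A$, which is how a relative-entropy objective can be restated as a LogDet one.

```latex
% Mahalanobis distance parameterized by a positive definite matrix A
d_A(x, y) = (x - y)^\top A \, (x - y)

% LogDet divergence between n x n positive definite matrices A and A_0
D_{\ell d}(A, A_0) = \operatorname{tr}\!\left(A A_0^{-1}\right) - \log\det\!\left(A A_0^{-1}\right) - n

% KL divergence between equal-mean Gaussians with precisions A_0 and A
\mathrm{KL}\!\left(\mathcal{N}(\mu, A_0^{-1}) \,\middle\|\, \mathcal{N}(\mu, A^{-1})\right)
  = \tfrac{1}{2}\, D_{\ell d}(A, A_0)
```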