Arrow Research search

Author name cluster

Abhishek Kumar

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

ICLR Conference 2025 Conference Paper

RB-Modulation: Training-Free Stylization using Reference-Based Modulation

  • Litu Rout
  • Yujia Chen 0001
  • Nataniel Ruiz
  • Abhishek Kumar
  • Constantine Caramanis
  • Sanjay Shakkottai
  • Wen-Sheng Chu

We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our test-time optimization framework demonstrates precise extraction and control of *content* and *style* in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets. See project page: https://rb-modulation.github.io/ for code and further details.

AAMAS Conference 2025 Conference Paper

Regret Guarantees for a UCB-based Algorithm for Volatile Combinatorial Bandits

  • Abhishek Kumar
  • Andra Siva Sai Teja
  • Ganesh Ghalme
  • Sujit Gujar
  • Y. Narahari

We study the combinatorial multi-armed bandit (MAB) problem with an additional constraint that an arbitrary subset of arms is unavailable at any given time instant. We refer to this setting as a volatile combinatorial MAB setting. The bandit algorithm must pull a subset of arms from the set of available arms to minimize the regret. Under some mild smoothness conditions, we show that the proposed CV-UCB algorithm, a straightforward extension of the well-known C-UCB algorithm, achieves an $O(\log(T))$ instance-dependent regret guarantee under a semi-bandit feedback setting. We further show that under some mild restrictions on the range of reward functions, CV-UCB incurs $O(\sqrt{T \log(T)})$ regret, which we call weak instance-independent regret. Finally, we show an instance-independent regret of $O(\sqrt[3]{T^2 \log(T)})$ for the CV-UCB algorithm, completing the hierarchy of regret guarantees obtained by gradually relaxing the dependence on the instance parameters.
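The volatile setting in this abstract can be illustrated with a small sketch. It is not the paper's CV-UCB algorithm: it uses a plain UCB1-style index, Bernoulli rewards, and a top-k selection rule, all of which are assumptions made here for illustration. The names `ucb_indices`, `select_available_topk`, and `run` are hypothetical.

```python
import numpy as np

def ucb_indices(means, counts, t):
    """UCB1-style index; arms never pulled get +inf so they are tried first."""
    bonus = np.sqrt(1.5 * np.log(max(t, 2)) / np.maximum(counts, 1))
    idx = means + bonus
    idx[counts == 0] = np.inf
    return idx

def select_available_topk(means, counts, t, available, k):
    """Pick up to k arms with the largest index among the available arms only."""
    idx = ucb_indices(means, counts, t)
    avail = np.flatnonzero(available)
    order = avail[np.argsort(-idx[avail])]
    return order[:k]

def run(true_means, k, horizon, p_available=0.7, seed=0):
    """Simulate semi-bandit feedback: one Bernoulli reward per pulled arm."""
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n = len(true_means)
    means, counts = np.zeros(n), np.zeros(n)
    for t in range(1, horizon + 1):
        # Assumption: each arm is independently available with probability p_available.
        available = rng.random(n) < p_available
        if not available.any():
            continue
        chosen = select_available_topk(means, counts, t, available, k)
        rewards = rng.binomial(1, true_means[chosen])
        counts[chosen] += 1
        means[chosen] += (rewards - means[chosen]) / counts[chosen]  # running average
    return means, counts
```

Restricting the argmax to the available arms is the only change relative to an ordinary top-k UCB rule, which is the sense in which such an extension can be called straightforward.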

JMLR Journal 2025 Journal Article

Score-based Causal Representation Learning: Linear and General Transformations

  • Burak Varici
  • Emre Acartürk
  • Karthikeyan Shanmugam
  • Abhishek Kumar
  • Ali Tajer

This paper addresses intervention-based causal representation learning (CRL) under a general nonparametric latent causal model and an unknown transformation that maps the latent variables to the observed variables. Linear and general transformations are investigated. The paper addresses both the identifiability and achievability aspects. Identifiability refers to determining algorithm-agnostic conditions that ensure the recovery of the true latent causal variables and the underlying latent causal graph. Achievability refers to the algorithmic aspects and addresses designing algorithms that achieve identifiability guarantees. By drawing novel connections between score functions (i.e., the gradients of the logarithm of density functions) and CRL, this paper designs a score-based class of algorithms that ensures both identifiability and achievability. First, the paper focuses on linear transformations and shows that one stochastic hard intervention per node suffices to guarantee identifiability. It also provides partial identifiability guarantees for soft interventions, including identifiability up to mixing with parents for general causal models and perfect recovery of the latent graph for sufficiently nonlinear causal models. Secondly, it focuses on general transformations and demonstrates that two stochastic hard interventions per node are sufficient for identifiability. This is achieved by defining a differentiable loss function whose global optima ensure identifiability for general CRL. Notably, one does not need to know which pair of interventional environments has the same node intervened. Finally, the theoretical results are empirically validated via experiments on structured synthetic data and image data.

TMLR Journal 2024 Journal Article

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

  • Avi Singh
  • John D Co-Reyes
  • Rishabh Agarwal
  • Ankesh Anand
  • Piyush Patil
  • Xavier Garcia
  • Peter J Liu
  • James Harrison

Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times. Testing on advanced MATH reasoning and APPS coding benchmarks using PaLM-2 models, we find that ReST$^{EM}$ scales favorably with model size and significantly surpasses fine-tuning only on human data. Overall, our findings suggest self-training with feedback can reduce dependence on human-generated data.

TMLR Journal 2024 Journal Article

Enhancing Contrastive Clustering with Negative Pair-guided Regularization

  • Abhishek Kumar
  • Anish Chakrabarty
  • Sankha Subhra Mullick
  • Swagatam Das

Contrastive Learning (CL) aims to create effective embeddings for input data by minimizing the distance between positive pairs, i.e., different augmentations or views of the same sample. To avoid degeneracy, CL also employs an auxiliary loss to maximize the discrepancy between negative pairs formed with views of distinct samples. As a self-supervised learning strategy, CL inherently attempts to cluster input data into natural groups. However, the often improper trade-off between the attractive and repulsive forces, respectively induced by positive and negative pairs, can lead to deformed clustering, particularly when the number of clusters $k$ is unknown. To address this, we propose NRCC, a CL-based deep clustering framework that generates cluster-friendly embeddings. NRCC repurposes Stochastic Gradient Hamiltonian Monte Carlo sampling as an approximately invariant data augmentation, to curate hard negative pairs that judiciously enhance and balance the two adversarial forces through a regularizer. By preserving the cluster structure in the CL embedding, NRCC retains local density landscapes in lower dimensions through neighborhood-conserving projections. This enables the application of mode-seeking clustering algorithms, typically hindered by high-dimensional CL feature spaces, to achieve exceptional accuracy without needing a predetermined $k$. NRCC's superiority is demonstrated across various datasets with different scales and cluster structures, outperforming 20 state-of-the-art methods.

EAAI Journal 2024 Journal Article

Performance prediction and Bayesian optimization of screw compressors using Gaussian Process Regression

  • Abhishek Kumar
  • Sumit Patil
  • Ahmed Kovacevic
  • Sathiskumar Anusuya Ponnusami

Optimizing the performance of screw compressors is critical for achieving high efficiency and reducing costs in various industrial and engineering applications. Often, the design and optimization processes are time-consuming owing to the underlying iterative complex analyses. In this context, the present research investigates the potential of Gaussian Process Regression (GPR) and Bayesian optimization for the prediction and optimization of the performance of an oil-flooded screw compressor. Specifically, the GPR-based surrogate model is developed to predict the compressor performance characteristics based on its four main geometrical design parameters: wrap angle, relative length, tip speed of the male rotor, and built-in volume ratio. The model is trained using a dataset comprising 19,200 data points relating the input design parameters with the compressor performance, obtained using physics-based multi-chamber thermodynamic models. While four different learning algorithms are explored, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), polynomial regression, and GPR, the GPR performed the best, resulting in an $R^2$ value of 0.99 for the test dataset after hyperparameter tuning. Further, the model is also experimentally validated on a completely unseen dataset, showing very good predictions with a maximum error of 5%. The resulting surrogate model is then used to optimize the compressor design parameters using Bayesian optimization. The results are compared with optimization using a Genetic Algorithm (GA) and the physics-based multi-chamber thermodynamic model. The proposed approach yields similar optimal design parameters while reducing the optimization time by a factor of 7. The study highlights the potential of machine learning-based prediction and optimization of screw compressors in engineering applications.

ICLR Conference 2024 Conference Paper

Small-scale proxies for large-scale Transformer training instabilities

  • Mitchell Wortsman
  • Peter J. Liu
  • Lechao Xiao
  • Katie E. Everett
  • Alexander A. Alemi
  • Ben Adlam
  • John D. Co-Reyes
  • Izzeddin Gur

Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study training instability at smaller scales. First, we focus on two sources of training instability described in previous work: the growth of logits in attention layers (Dehghani et al., 2023) and divergence of the output logits from the log probabilities (Chowdhery et al., 2022). By measuring the relationship between learning rate and loss across scales, we show that these instabilities also appear in small models when training at high learning rates, and that mitigations previously employed at large scales are equally effective in this regime. This prompts us to investigate the extent to which other known optimizer and model interventions influence the sensitivity of the final loss to changes in the learning rate. To this end, we study methods such as warm-up, weight decay, and the $\mu$Param (Yang et al., 2022), and combine techniques to train small models that achieve similar losses across orders of magnitude of learning rate variation. Finally, to conclude our exploration, we study two cases where instabilities can be predicted before they emerge by examining the scaling behavior of model characteristics such as activation and gradient norms.

NeurIPS Conference 2024 Conference Paper

The Group Robustness is in the Details: Revisiting Finetuning under Spurious Correlations

  • Tyler LaBonte
  • John C. Hill
  • Xinchen Zhang
  • Vidya Muthukumar
  • Abhishek Kumar

Modern machine learning models are prone to over-reliance on spurious correlations, which can often lead to poor performance on minority groups. In this paper, we identify surprising and nuanced behavior of finetuned models on worst-group accuracy via comprehensive experiments on four well-established benchmarks across vision and language tasks. We first show that the commonly used class-balancing techniques of mini-batch upsampling and loss upweighting can induce a decrease in worst-group accuracy (WGA) with training epochs, leading to performance no better than without class-balancing. While in some scenarios, removing data to create a class-balanced subset is more effective, we show this depends on group structure and propose a mixture method which can outperform both techniques. Next, we show that scaling pretrained models is generally beneficial for worst-group accuracy, but only in conjunction with appropriate class-balancing. Finally, we identify spectral imbalance in finetuning features as a potential source of group disparities: minority group covariance matrices incur a larger spectral norm than majority groups once conditioned on the classes. Our results show more nuanced interactions of modern finetuned models with group robustness than was previously known. Our code is available at https://github.com/tmlabonte/revisiting-finetuning.

ICLR Conference 2023 Conference Paper

Distributionally Robust Post-hoc Classifiers under Prior Shifts

  • Jiaheng Wei
  • Harikrishna Narasimhan
  • Ehsan Amid
  • Wen-Sheng Chu
  • Yang Liu 0018
  • Abhishek Kumar

The generalization ability of machine learning models degrades significantly when the test distribution shifts away from the training distribution. We investigate the problem of training models that are robust to shifts caused by changes in the distribution of class-priors or group-priors. The presence of skewed training priors can often lead to the models overfitting to spurious features. Unlike existing methods, which optimize for either the worst or the average performance over classes or groups, our work is motivated by the need for finer control over the robustness properties of the model. We present an extremely lightweight post-hoc approach that performs scaling adjustments to predictions from a pre-trained model, with the goal of minimizing a distributionally robust loss around a chosen target distribution. These adjustments are computed by solving a constrained optimization problem on a validation set and applied to the model during test time. Our constrained optimization objective is inspired by a natural notion of robustness to controlled distribution shifts. Our method comes with provable guarantees and empirically makes a strong case for distributionally robust post-hoc classifiers. An empirical implementation is available at https://github.com/weijiaheng/Drops.

NeurIPS Conference 2023 Conference Paper

Towards Last-layer Retraining for Group Robustness with Fewer Annotations

  • Tyler LaBonte
  • Vidya Muthukumar
  • Abhishek Kumar

Empirical risk minimization (ERM) of neural networks is prone to over-reliance on spurious correlations and poor generalization on minority groups. The recent deep feature reweighting (DFR) technique achieves state-of-the-art group robustness via simple last-layer retraining, but it requires held-out group and class annotations to construct a group-balanced reweighting dataset. In this work, we examine this impractical requirement and find that last-layer retraining can be surprisingly effective with no group annotations (other than for model selection) and only a handful of class annotations. We first show that last-layer retraining can greatly improve worst-group accuracy even when the reweighting dataset has only a small proportion of worst-group data. This implies a "free lunch" where holding out a subset of training data to retrain the last layer can substantially outperform ERM on the entire dataset with no additional data, annotations, or computation for training. To further improve group robustness, we introduce a lightweight method called selective last-layer finetuning (SELF), which constructs the reweighting dataset using misclassifications or disagreements. Our experiments present the first evidence that model disagreement upsamples worst-group data, enabling SELF to nearly match DFR on four well-established benchmarks across vision and language tasks with no group annotations and less than 3% of the held-out class annotations.

AAAI Conference 2023 Conference Paper

UEQMS: UMAP Embedded Quick Mean Shift Algorithm for High Dimensional Clustering

  • Abhishek Kumar
  • Swagatam Das
  • Rammohan Mallipeddi

The mean shift algorithm is a simple yet very effective clustering method widely used for image and video segmentation as well as other exploratory data analysis applications. Recently, a new algorithm called MeanShift++ (MS++) for low-dimensional clustering was proposed with a speedup of 4000 times over the vanilla mean shift. In this work, starting with a first-of-its-kind theoretical analysis of MS++, we extend its reach to high-dimensional data clustering by integrating the Uniform Manifold Approximation and Projection (UMAP) based dimensionality reduction in the same framework. Analytically, we show that MS++ can indeed converge to a non-critical point. Subsequently, we suggest modifications to MS++ to improve its convergence characteristics. In addition, we propose a way to further speed up MS++ by avoiding the execution of the MS++ iterations for every data point. By incorporating UMAP with modified MS++, we design a faster algorithm, named UMAP embedded quick mean shift (UEQMS), for partitioning data with a relatively large number of recorded features. Through extensive experiments, we showcase the efficacy of UEQMS over other state-of-the-art algorithms in terms of accuracy and runtime.
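For readers unfamiliar with the baseline this abstract builds on, the sketch below shows vanilla mean shift with a Gaussian kernel. It is neither MS++ nor UEQMS, and it omits the UMAP step entirely. The function names `mean_shift` and `label_modes`, the bandwidth, and the merge radius are illustrative assumptions.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=100, tol=1e-6):
    """Move each point to the kernel-weighted mean of the data until it stops moving."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Squared distances from current positions to the original data points.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-d2 / (2.0 * bandwidth ** 2))  # Gaussian kernel weights
        new_modes = (w @ points) / w.sum(axis=1, keepdims=True)
        shift = np.abs(new_modes - modes).max()
        modes = new_modes
        if shift < tol:
            break
    return modes

def label_modes(modes, merge_radius):
    """Assign the same label to points whose converged positions nearly coincide."""
    centers = []
    labels = np.empty(len(modes), dtype=int)
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = j
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels, np.array(centers)
```

Each iteration here costs time quadratic in the number of points, which is the expense that faster variants aim to reduce.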

TMLR Journal 2022 Journal Article

DiffuseVAE: Efficient, Controllable and High-Fidelity Generation from Low-Dimensional Latents

  • Kushagra Pandey
  • Avideep Mukherjee
  • Piyush Rai
  • Abhishek Kumar

Diffusion probabilistic models have been shown to generate state-of-the-art results on several competitive image synthesis benchmarks but lack a low-dimensional, interpretable latent space, and are slow at generation. On the other hand, standard Variational Autoencoders (VAEs) typically have access to a low-dimensional latent space but exhibit poor sample quality. We present DiffuseVAE, a novel generative framework that integrates VAE within a diffusion model framework, and leverage this to design novel conditional parameterizations for diffusion models. We show that the resulting model equips diffusion models with a low-dimensional VAE inferred latent code which can be used for downstream tasks like controllable synthesis. The proposed method also improves upon the speed vs quality tradeoff exhibited in standard unconditional DDPM/DDIM models (for instance, FID of 16.47 vs 34.36 using a standard DDIM on the CelebA-HQ-128 benchmark using T=10 reverse process steps) without having explicitly trained for such an objective. Furthermore, the proposed model exhibits synthesis quality comparable to state-of-the-art models on standard image synthesis benchmarks like CIFAR-10 and CelebA-64 while outperforming most existing VAE-based methods. Lastly, we show that the proposed method exhibits inherent generalization to different types of noise in the conditioning signal. For reproducibility, our source code is publicly available at https://github.com/kpandey008/DiffuseVAE.

AAAI Conference 2021 Conference Paper

Generalized Adversarially Learned Inference

  • Yatin Dandi
  • Homanga Bharadhwaj
  • Abhishek Kumar
  • Piyush Rai

Allowing effective inference of latent vectors while training GANs can greatly increase their applicability in various downstream tasks. Recent approaches, such as ALI and BiGAN frameworks, develop methods of inference of latent variables in GANs by adversarially training an image generator along with an encoder to match two joint distributions of image and latent vector pairs. We generalize these approaches to incorporate multiple layers of feedback on reconstructions, self-supervision, and other forms of supervision based on prior or learned knowledge about the desired solutions. We achieve this by modifying the discriminator’s objective to correctly identify more than two joint distributions of tuples of an arbitrary number of random variables consisting of images, latent vectors, and other variables generated through auxiliary tasks, such as reconstruction and inpainting or as outputs of suitable pre-trained models. We design a non-saturating maximization objective for the generator-encoder pair and prove that the resulting adversarial game corresponds to a global optimum that simultaneously matches all the distributions. Within our proposed framework, we introduce a novel set of techniques for providing self-supervised feedback to the model based on properties, such as patch-level correspondence and cycle consistency of reconstructions. Through comprehensive experiments, we demonstrate the efficacy, scalability, and flexibility of the proposed approach for a variety of tasks. The appendix of the paper can be found at the following link: https://drive.google.com/file/d/1i99e682CqYWMEDXlnqkqrctGLVA9viiz/view?usp=sharing

ICML Conference 2021 Conference Paper

Implicit rate-constrained optimization of non-decomposable objectives

  • Abhishek Kumar
  • Harikrishna Narasimhan
  • Andrew Cotter

We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, optimizing the area under the precision-recall or ROC curves, etc. Our key idea is to formulate a rate-constrained optimization that expresses the threshold parameter as a function of the model parameters via the Implicit Function theorem. We show how the resulting optimization problem can be solved using standard gradient based methods. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method over existing state-of-the-art approaches for these problems.

ICLR Conference 2021 Conference Paper

Score-Based Generative Modeling through Stochastic Differential Equations

  • Yang Song 0011
  • Jascha Sohl-Dickstein
  • Diederik P. Kingma
  • Abhishek Kumar
  • Stefano Ermon
  • Ben Poole

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (a.k.a., score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples. We show that this framework encapsulates previous approaches in score-based generative modeling and diffusion probabilistic modeling, allowing for new sampling procedures and new modeling capabilities. In particular, we introduce a predictor-corrector framework to correct errors in the evolution of the discretized reverse-time SDE. We also derive an equivalent neural ODE that samples from the same distribution as the SDE, but additionally enables exact likelihood computation, and improved sampling efficiency. In addition, we provide a new way to solve inverse problems with score-based models, as demonstrated with experiments on class-conditional generation, image inpainting, and colorization. Combined with multiple architectural improvements, we achieve record-breaking performance for unconditional image generation on CIFAR-10 with an Inception score of 9.89 and FID of 2.20, a competitive likelihood of 2.99 bits/dim, and demonstrate high fidelity generation of $1024\times 1024$ images for the first time from a score-based generative model.
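The forward and reverse-time SDE pair described in this abstract can be checked on a one-dimensional toy problem where the score is known in closed form. The sketch below is an assumption-laden illustration, not the paper's implementation: the data distribution is a single Gaussian, the forward SDE has a constant noise rate `BETA`, the exact score stands in for a trained network, and sampling uses plain Euler-Maruyama steps. All function names are hypothetical.

```python
import numpy as np

BETA = 1.0  # constant noise rate of the forward SDE (illustrative choice)

def perturbed_moments(t, m0, s0):
    """Mean and variance of p_t when the data are N(m0, s0^2) and the
    forward SDE is dx = -0.5*BETA*x dt + sqrt(BETA) dw."""
    decay = np.exp(-BETA * t)
    return m0 * np.sqrt(decay), s0 ** 2 * decay + (1.0 - decay)

def score(x, t, m0, s0):
    """Exact score (gradient of log p_t); a neural network would estimate this."""
    mean, var = perturbed_moments(t, m0, s0)
    return -(x - mean) / var

def reverse_sde_sample(n, m0, s0, T=10.0, steps=1000, seed=0):
    """Integrate the reverse-time SDE from t=T down to t=0 with Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)  # samples from the prior N(0, 1), close to p_T
    for i in range(steps):
        t = T - i * dt
        # Reverse drift: f(x, t) - g(t)^2 * score(x, t), with f = -0.5*BETA*x, g^2 = BETA.
        drift = -0.5 * BETA * x - BETA * score(x, t, m0, s0)
        x = x - drift * dt + np.sqrt(BETA * dt) * rng.standard_normal(n)
    return x
```

Calling `reverse_sde_sample(20000, 2.0, 0.5)` should return samples whose mean and standard deviation are close to 2.0 and 0.5, up to discretization and Monte Carlo error.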

ICML Conference 2020 Conference Paper

On Implicit Regularization in β-VAEs

  • Abhishek Kumar
  • Ben Poole

While the impact of variational inference (VI) on posterior inference in a fixed generative model is well-characterized, its role in regularizing a learned generative model when used in variational autoencoders (VAEs) is poorly understood. We study the regularizing effects of variational distributions on learning in generative models from two perspectives. First, we analyze the role that the choice of variational family plays in imparting uniqueness to the learned model by restricting the set of optimal generative models. Second, we study the regularization effect of the variational family on the local geometry of the decoding model. This analysis uncovers the regularizer implicit in the $\beta$-VAE objective, and leads to an approximation consisting of a deterministic autoencoding objective plus analytic regularizers that depend on the Hessian or Jacobian of the decoding model, unifying VAEs with recent heuristics proposed for training regularized autoencoders. We empirically verify these findings, observing that the proposed deterministic objective exhibits similar behavior to the $\beta$-VAE in terms of objective value and sample quality.

ICLR Conference 2020 Conference Paper

Weakly Supervised Disentanglement with Guarantees

  • Rui Shu
  • Yining Chen
  • Abhishek Kumar
  • Stefano Ermon
  • Ben Poole

Learning disentangled representations that correspond to factors of variation in real-world data is critical to interpretable and human-controllable machine learning. Recently, concerns about the viability of learning disentangled representations in a purely unsupervised manner have spurred a shift toward the incorporation of weak supervision. However, there is currently no formalism that identifies when and how weak supervision will guarantee disentanglement. To address this issue, we provide a theoretical framework to assist in analyzing the disentanglement guarantees (or lack thereof) conferred by weak supervision when coupled with learning algorithms based on distribution matching. We empirically verify the guarantees and limitations of several weak supervision methods (restricted labeling, match-pairing, and rank-pairing), demonstrating the predictive power and usefulness of our theoretical framework.

NeurIPS Conference 2018 Conference Paper

Co-regularized Alignment for Unsupervised Domain Adaptation

  • Abhishek Kumar
  • Prasanna Sattigeri
  • Kahini Wadhawan
  • Leonid Karlinsky
  • Rogerio Feris
  • Bill Freeman
  • Gregory Wornell

Deep neural networks, trained with a large amount of labeled data, can fail to generalize well when tested with examples from a target domain whose distribution differs from the training data distribution, referred to as the source domain. It can be expensive or even infeasible to obtain the required amount of labeled data in all possible domains. Unsupervised domain adaptation sets out to address this problem, aiming to learn a good predictive model for the target domain using labeled examples from the source domain but only unlabeled examples from the target domain. Domain alignment approaches this problem by matching the source and target feature distributions, and has been used as a key component in many state-of-the-art domain adaptation methods. However, matching the marginal feature distributions does not guarantee that the corresponding class conditional distributions will be aligned across the two domains. We propose co-regularized domain alignment for unsupervised domain adaptation, which constructs multiple diverse feature spaces and aligns source and target distributions in each of them individually, while encouraging the alignments to agree with each other with regard to the class predictions on the unlabeled target examples. The proposed method is generic and can be used to improve any domain adaptation method which uses domain alignment. We instantiate it in the context of a recent state-of-the-art method and observe that it provides significant performance improvements on several domain adaptation benchmarks.

NeurIPS Conference 2018 Conference Paper

Delta-encoder: an effective sample synthesis method for few-shot object recognition

  • Eli Schwartz
  • Leonid Karlinsky
  • Joseph Shtok
  • Sivan Harary
  • Mattias Marder
  • Abhishek Kumar
  • Rogerio Feris
  • Raja Giryes

Learning to classify new categories based on just one or a few examples is a long-standing challenge in modern computer vision. In this work, we propose a simple yet effective method for few-shot (and one-shot) object recognition. Our approach is based on a modified auto-encoder, denoted delta-encoder, that learns to synthesize new samples for an unseen category just by seeing a few examples from it. The synthesized samples are then used to train a classifier. The proposed approach learns to both extract transferable intra-class deformations, or "deltas", between same-class pairs of training examples, and to apply those deltas to the few provided examples of a novel class (unseen during training) in order to efficiently synthesize samples from that new class. The proposed method improves on the state of the art in one-shot object recognition and performs comparably in the few-shot case.

NeurIPS Conference 2017 Conference Paper

Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference

  • Abhishek Kumar
  • Prasanna Sattigeri
  • Tom Fletcher

Semi-supervised learning methods using generative adversarial networks (GANs) have shown promising empirical success recently. Most of these methods use a shared discriminator/classifier which discriminates real examples from fake while also predicting the class label. Motivated by the ability of the GAN generator to capture the data manifold well, we propose to estimate the tangent space to the data manifold using GANs and employ it to inject invariances into the classifier. In the process, we propose enhancements over existing methods for learning the inverse mapping (i.e., the encoder) which greatly improve the semantic similarity of the reconstructed sample to the input sample. We observe considerable empirical gains in semi-supervised learning over baselines, particularly in the cases when the number of labeled examples is low. We also provide insights into how fake examples influence the semi-supervised learning procedure.

NeurIPS Conference 2012 Conference Paper

Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

  • Piyush Rai
  • Abhishek Kumar
  • Hal Daume

Multiple-output regression models require estimating multiple functions, one for each output. To improve parameter estimation in such models, methods based on structural regularization of the model parameters are usually needed. In this paper, we present a multiple-output regression model that leverages the covariance structure of the functions (i.e., how the multiple functions are related with each other) as well as the conditional covariance structure of the outputs. This is in contrast with existing methods that usually take into account only one of these structures. More importantly, unlike most of the other existing methods, none of these structures need be known a priori in our model, and are learned from the data. Several previously proposed structural regularization based multiple-output regression models turn out to be special cases of our model. Moreover, in addition to being a rich model for multiple-output regression, our model can also be used in estimating the graphical model structure of a set of variables (multivariate outputs) conditioned on another set of variables (inputs). Experimental results on both synthetic and real datasets demonstrate the effectiveness of our method.

NeurIPS Conference 2011 Conference Paper

Co-regularized Multi-view Spectral Clustering

  • Abhishek Kumar
  • Piyush Rai
  • Hal Daume

In many clustering problems, we have access to multiple views of the data, each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Since the true clustering would assign a point to the same cluster irrespective of the view, we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have the same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.

NeurIPS Conference 2010 Conference Paper

Co-regularization Based Semi-supervised Domain Adaptation

  • Abhishek Kumar
  • Avishek Saha
  • Hal Daume

This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
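The augmented space that this abstract attributes to EASYADAPT can be sketched as a pre-processing step. The code below shows only the commonly described EA feature map, in which each feature vector is copied into a shared block and a domain-specific block. It does not show how EA++ uses unlabeled target data. The function name `easyadapt_augment` and the block ordering are assumptions made for illustration.

```python
import numpy as np

def easyadapt_augment(X, domain):
    """Map d-dimensional features to 3d dimensions: [shared, source-only, target-only]."""
    X = np.asarray(X, dtype=float)
    zeros = np.zeros_like(X)
    if domain == "source":
        return np.hstack([X, X, zeros])   # shared copy + source-specific copy
    if domain == "target":
        return np.hstack([X, zeros, X])   # shared copy + target-specific copy
    raise ValueError("domain must be 'source' or 'target'")

# Usage: augment both domains, stack them, and pass the result to any supervised learner.
X_src = np.array([[1.0, 2.0], [3.0, 4.0]])
X_tgt = np.array([[5.0, 6.0]])
X_all = np.vstack([easyadapt_augment(X_src, "source"),
                   easyadapt_augment(X_tgt, "target")])
```

Because the map only rewrites the inputs, the downstream learner needs no modification, which matches the abstract's description of the approach as a pre-processing step.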