Arrow Research search

Author name cluster

Arpit Bansal

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers

8

NeurIPS Conference 2024 Conference Paper

Transformers Can Do Arithmetic with the Right Embeddings

  • Sean McLeish
  • Arpit Bansal
  • Alex Stein
  • Neel Jain
  • John Kirchenbauer
  • Brian R. Bartoldson
  • Bhavya Kailkhura
  • Abhinav Bhatele

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.

ICLR Conference 2024 Conference Paper

Universal Guidance for Diffusion Models

  • Arpit Bansal
  • Hong-Min Chu
  • Avi Schwarzschild
  • Soumyadip Sengupta
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, style guidance and classifier signals.

ICLR Conference 2023 Conference Paper

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

  • Yuxin Wen
  • Arpit Bansal
  • Hamid Kazemi
  • Eitan Borgnia
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings.

NeurIPS Conference 2023 Conference Paper

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

  • Arpit Bansal
  • Eitan Borgnia
  • Hong-Min Chu
  • Jie Li
  • Hamid Kazemi
  • Furong Huang
  • Micah Goldblum
  • Jonas Geiping

Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact, an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e. g. , blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference and paves the way for generalized diffusion models that invert arbitrary processes.

ICLR Conference 2023 Conference Paper

Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent

  • Ping-Yeh Chiang
  • Renkun Ni
  • David Yu Miller
  • Arpit Bansal
  • Jonas Geiping
  • Micah Goldblum
  • Tom Goldstein

It is commonly believed that the implicit regularization of optimizers is needed for neural networks to generalize in the overparameterized regime. In this paper, we observe experimentally that this implicit regularization behavior is {\em generic}, i.e. it does not depend strongly on the choice of optimizer. We demonstrate this by training neural networks using several gradient-free optimizers, which do not benefit from properties that are often attributed to gradient-based optimizers. This includes a guess-and-check optimizer that generates uniformly random parameter vectors until finding one that happens to achieve perfect train accuracy, and a zeroth-order Pattern Search optimizer that uses no gradient computations. In the low sample and few-shot regimes, where zeroth order optimizers are most computationally tractable, we find that these non-gradient optimizers achieve test accuracy comparable to SGD. The code to reproduce results can be found at https://github.com/Ping-C/optimizer .

ICLR Conference 2023 Conference Paper

Transfer Learning with Deep Tabular Models

  • Roman Levin
  • Valeriia Cherepanova
  • Avi Schwarzschild
  • Arpit Bansal
  • C. Bayan Bruss
  • Tom Goldstein
  • Andrew Gordon Wilson
  • Micah Goldblum

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they are easily fine-tuned in new domains and learn reusable features. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. We conduct experiments in a realistic medical diagnosis test bed with limited amounts of downstream data and find that transfer learning with deep tabular models provides a definitive advantage over gradient boosted decision tree methods. We further compare the supervised and self-supervised pretraining strategies and provide practical advice on transfer learning with tabular models. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications.

ICML Conference 2022 Conference Paper

Certified Neural Network Watermarks with Randomized Smoothing

  • Arpit Bansal
  • Ping-Yeh Chiang
  • Michael J. Curry
  • Rajiv Jain
  • Curtis Wigington
  • Varun Manjunatha
  • John Dickerson 0001
  • Tom Goldstein

Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.

NeurIPS Conference 2022 Conference Paper

End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking

  • Arpit Bansal
  • Avi Schwarzschild
  • Eitan Borgnia
  • Zeyad Emam
  • Furong Huang
  • Micah Goldblum
  • Tom Goldstein

Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking. " We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.