Arrow Research search

Author name cluster

Tom Goldstein

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

106 papers
2 author rows

Possible papers

106

NeurIPS Conference 2025 Conference Paper

A Technical Report on “Erasing the Invisible”: The 2024 NeurIPS Competition on Stress Testing Image Watermarks

  • Mucong Ding
  • Bang An
  • Tahseen Rabbani
  • Chenghao Deng
  • Anirudh Satheesh
  • Souradip Chakraborty
  • Mehrdad Saberi
  • Yuxin Wen

AI-generated images have become pervasive, raising critical concerns around content authenticity, intellectual property, and the spread of misinformation. Invisible watermarks offer a promising solution for identifying AI-generated images, preserving content provenance without degrading visual quality. However, their real-world robustness remains uncertain due to the lack of standardized evaluation protocols and large-scale stress testing. To bridge this gap, we organized “Erasing the Invisible,” a NeurIPS 2024 competition and newly established benchmark designed to systematically stress test the resilience of watermarking techniques. The competition introduced two attack tracks—Black-box and Beige-box—that simulate practical scenarios with varying levels of attacker knowledge of watermarks, providing a comprehensive assessment of watermark robustness. The competition attracted significant global participation, with 2,722 submissions from 298 teams. Through a rigorous evaluation pipeline featuring real-time feedback and human-verified final rankings, participants developed and demonstrated new attack strategies that revealed critical vulnerabilities in state-of-the-art watermarking methods. On average, the top-5 teams in both tracks could remove watermarks from $\geq$ 89% of the images while preserving high visual quality, setting strong baselines for future research on watermark attacks and defenses. To support continued progress in this field, we summarize the insights and lessons learned from this competition in this paper, and release the benchmark dataset, evaluation toolkit, and competition results. “Erasing the Invisible” establishes a valuable open resource for advancing more robust watermarking techniques and strengthening content provenance in the era of generative AI.

AAAI Conference 2025 Conference Paper

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

  • Michael-Andrei Panaitescu-Liess
  • Zora Che
  • Bang An
  • Yuancheng Xu
  • Pankayaraj Pathmanathan
  • Souradip Chakraborty
  • Sicheng Zhu
  • Tom Goldstein

Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. However, we also find that watermarking can have unintended consequences on Membership Inference Attacks (MIAs), which aim to discern whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking adversely affects the success rate of MIAs, complicating the task of detecting copyrighted text in the pretraining dataset. These results reveal the complex interplay between different regulatory measures, which may impact each other in unforeseen ways. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods to study critical problems in LLMs with potential legal implications.

NeurIPS Conference 2025 Conference Paper

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts

  • Ashwinee Panda
  • Vatsal Baherwani
  • Zain Sarwar
  • Benjamin Thérien
  • Sambit Sahu
  • Tom Goldstein
  • Supriyo Chakraborty

Mixture of Experts (MoE) pretraining is more scalable than dense Transformer pretraining, because MoEs learn to route inputs to a sparse set of their feedforward parameters. However, this means that MoEs only receive a sparse backward update, leading to training instability and suboptimal performance. We present a lightweight approximation method that gives the MoE router a dense gradient update while continuing to sparsely activate its parameters. Our method, which we refer to as Default MoE, substitutes missing expert activations with default outputs consisting of an exponential moving average of expert outputs previously seen over the course of training. This allows the router to receive signals from every expert for each token, leading to significant improvements in training performance. Our Default MoE outperforms standard TopK routing in a variety of settings without requiring significant computational overhead.
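The routing fix described above lends itself to a compact illustration. Below is a minimal, hypothetical PyTorch sketch (not the authors' code): a Top-1 router activates one expert per token, while the remaining experts contribute an exponential-moving-average "default" output, so the router's softmax receives a gradient from every expert. The class and parameter names (DefaultMoE, ema_decay) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DefaultMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=128, n_experts=4, ema_decay=0.99):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(n_experts)]
        )
        # Running average of each expert's output, used as a stand-in ("default")
        # for experts that were not activated on a given token.
        self.register_buffer("default_out", torch.zeros(n_experts, d_model))
        self.ema_decay = ema_decay

    def forward(self, x):                                    # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)            # dense routing weights
        top1 = probs.argmax(dim=-1)                          # sparse activation: one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top1 == e
            if mask.any():
                y = expert(x[mask])                          # real output for routed tokens
                out[mask] += probs[mask, e:e+1] * y
                with torch.no_grad():                        # update the EMA default output
                    self.default_out[e].mul_(self.ema_decay).add_(
                        (1 - self.ema_decay) * y.mean(dim=0))
            # Non-routed tokens still multiply the (gradient-free) default output by their
            # routing probability, so the router receives a dense gradient update.
            out[~mask] += probs[~mask, e:e+1] * self.default_out[e]
        return out

moe = DefaultMoE()
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```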

NeurIPS Conference 2025 Conference Paper

FineGRAIN: Evaluating Failure Modes of Text-to-Image Models with Vision Language Model Judges

  • Kevin Hayes
  • Micah Goldblum
  • Vikash Sehwag
  • Gowthami Somepalli
  • Ashwinee Panda
  • Tom Goldstein

Text-to-image (T2I) models are capable of generating visually impressive images, yet they often fail to accurately capture specific attributes in user prompts, such as the correct number of objects with the specified colors. The diversity of such errors underscores the need for a hierarchical evaluation framework that can compare prompt adherence abilities of different image generation models. Simultaneously, benchmarks of vision language models (VLMs) have not kept pace with the complexity of scenes that VLMs are used to annotate. In this work, we propose a structured methodology for jointly evaluating T2I models and VLMs by testing whether VLMs can identify 27 specific failure modes in the images generated by T2I models conditioned on challenging prompts. Our second contribution is a dataset of prompts and images generated by 5 T2I models (Flux, SD3-Medium, SD3-Large, SD3.5-Medium, SD3.5-Large) and the corresponding annotations from VLMs (Molmo, InternVL3, Pixtral) annotated by an LLM (Llama3) to test whether VLMs correctly identify the failure mode in a generated image. By analyzing failure modes on a curated set of prompts, we reveal systematic errors in attribute fidelity and object representation. Our findings suggest that current metrics are insufficient to capture these nuanced errors, highlighting the importance of targeted benchmarks for advancing generative model reliability and interpretability.

NeurIPS Conference 2025 Conference Paper

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

  • Sean McLeish
  • John Kirchenbauer
  • David Miller
  • Siddharth Singh
  • Abhinav Bhatele
  • Micah Goldblum
  • Ashwinee Panda
  • Tom Goldstein

Scaling laws are typically fit using a family of models with a narrow range of frozen hyperparameter choices. In this work we study scaling laws using multiple architectural shapes and hyperparameter choices, highlighting their impact on resulting prescriptions. As a primary artifact of our research, we release the Gemstones: an open-source scaling law dataset, consisting of over 4000 checkpoints from transformers with up to 2 billion parameters and diverse architectural shapes, including ablations over learning rate and cooldown. Our checkpoints enable more complex studies of scaling, such as analyzing the relationship between width and depth. By examining our model suite, we find that the prescriptions of scaling laws can be highly sensitive to the experimental design process and the specific model checkpoints used during fitting.

ICLR Conference 2025 Conference Paper

LiveBench: A Challenging, Contamination-Limited LLM Benchmark

  • Colin White
  • Samuel Dooley
  • Manley Roberts
  • Arka Pal
  • Benjamin Feuer
  • Siddhartha Jain 0001
  • Ravid Shwartz-Ziv
  • Neel Jain

Test set contamination, wherein test data from a benchmark ends up in a newer model's training set, is a well-documented obstacle for fair LLM evaluation and can quickly render benchmarks obsolete. To mitigate this, many recent benchmarks crowdsource new prompts and evaluations from human or LLM judges; however, these can introduce significant biases, and break down when scoring hard questions. In this work, we introduce a new benchmark for LLMs designed to be resistant to both test set contamination and the pitfalls of LLM judging and human crowdsourcing. We release LiveBench, the first benchmark that (1) contains frequently-updated questions from recent information sources, (2) scores answers automatically according to objective ground-truth values, and (3) contains a wide variety of challenging tasks, spanning math, coding, reasoning, language, instruction following, and data analysis. To achieve this, LiveBench contains questions that are based on recently-released math competitions, arXiv papers, news articles, and datasets, and it contains harder, contamination-limited versions of tasks from previous benchmarks such as Big-Bench Hard, AMPS, and IFEval. We evaluate many prominent closed-source models, as well as dozens of open-source models ranging from 0.5B to 405B in size. LiveBench is difficult, with top models achieving below 70% accuracy. We release all questions, code, and model answers. Questions are added and updated on a monthly basis, and we release new tasks and harder versions of tasks over time so that LiveBench can distinguish between the capabilities of LLMs as they improve in the future. We welcome community engagement and collaboration for expanding the benchmark tasks and models.

AAMAS Conference 2025 Conference Paper

Multimodal Agentic Model Predictive Control

  • Saptarashmi Bandyopadhyay
  • John (Jack) Cole
  • Tom Goldstein
  • David Jacobs

Control problems for autonomous AI agents, especially safety-critical applications such as autonomous vehicle control, require robust decision-making frameworks to ensure safe navigation in complex and dynamic environments. This necessitates approaches such as Agentic Model Predictive Control (MPC), which can anticipate future problems and plan for them accordingly. Recently, Multimodal Vision Language Models (VLMs) have emerged as a way to give a semantic meaning to a scene that draws on extremely large amounts of information and contextual understanding of the world. These models vary across a wide range of sizes, trading off speed with performance as they scale. This paper introduces a novel framework that integrates MPC with Multimodal VLMs in order to enhance the ability of autonomous vehicles to navigate and respond to real-world scenarios. Leveraging the open-source Waymax library released by Waymo, along with the Waymo Open Motion, Berkeley DeepDrive and NuScenes datasets, our method uses Multimodal VLMs to detect and draw bounding boxes around important parts of the scene, such as pedestrians or other vehicles. These models are helpful for querying specific attributes of identified objects, such as telling whether a vehicle is accelerating or decelerating, or recognizing whether a newly detected obstacle is on a collision course with the vehicle. By incorporating these and other semantic insights into an MPC framework, an autonomous vehicle can make more informed and context-aware decisions to mitigate the risk of a collision and safely navigate its surroundings. We evaluate our approach in diverse simulated environments using VLMs of different scales, demonstrating improvements in safety metrics compared to traditional MPC methods. The integration of VLMs with MPC represents a significant advancement in autonomous decision-making, especially in dynamic and uncertain situations. Our approach paves the way for future research in using Multimodal VLMs for more intelligent and adaptable autonomous agents.

NeurIPS Conference 2025 Conference Paper

Quantifying Cross-Modality Memorization in Vision-Language Models

  • Yuxin Wen
  • Yangsibo Huang
  • Tom Goldstein
  • Ravi Kumar
  • Badih Ghazi
  • Chiyuan Zhang

Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for real-world, knowledge-intensive tasks. While previous studies primarily investigate memorization within a single modality, such as text memorization in large language models or image memorization in diffusion models, unified multimodal models are becoming increasingly prevalent in practical applications. In this work, we focus on the unique characteristics of cross-modality memorization and conduct a systematic study centered on vision-language models. To facilitate controlled experiments, we first introduce a synthetic persona dataset comprising diverse synthetic person images and textual descriptions. We quantify factual knowledge memorization and cross-modal transferability by training models on a single modality and evaluating their performance in the other. Our results reveal that facts learned in one modality transfer to the other, but a significant gap exists between recalling information in the source and target modalities. Furthermore, we observe that this gap exists across various scenarios, including more capable models, machine unlearning, and the multi-hop case. Finally, we propose a baseline method to mitigate this challenge. We hope our study can inspire future research on developing more robust multimodal learning techniques to enhance cross-modal transferability.

NeurIPS Conference 2025 Conference Paper

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

  • Jonas Geiping
  • Sean McLeish
  • Neel Jain
  • John Kirchenbauer
  • Siddharth Singh
  • Brian Bartoldson
  • Bhavya Kailkhura
  • Abhinav Bhatele

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We train a proof-of-concept model from scratch with 3.5 billion parameters and 800 billion tokens. We show that this model can effortlessly use varying levels of compute, significantly improving with additional compute especially on reasoning tasks, such as math and coding. Further, this architecture naturally reduces compute costs via zero-shot per-token adaptive compute, KV-cache sharing and speculative decoding.
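As a rough illustration of the recurrent-depth idea in the abstract, the hypothetical sketch below iterates a small transformer "core" a configurable number of times between an embedding "prelude" and an output "coda", so the same weights can be unrolled to different depths at test time. All module names, sizes, and the random latent initialization are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab=1000, d=128):
        super().__init__()
        self.prelude = nn.Embedding(vocab, d)                 # map tokens into latent space
        self.core = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.inject = nn.Linear(2 * d, d)                     # re-inject the input each iteration
        self.coda = nn.Linear(d, vocab)                       # map the latent state back to logits

    def forward(self, tokens, n_iters=4):
        e = self.prelude(tokens)                              # (batch, seq, d)
        s = torch.randn_like(e)                               # random initial latent state
        for _ in range(n_iters):                              # test-time compute = number of iterations
            s = self.core(self.inject(torch.cat([s, e], dim=-1)))
        return self.coda(s)

model = RecurrentDepthLM()
logits_cheap = model(torch.randint(0, 1000, (2, 16)), n_iters=2)   # less test-time compute
logits_deep  = model(torch.randint(0, 1000, (2, 16)), n_iters=16)  # more test-time compute
print(logits_cheap.shape, logits_deep.shape)
```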

NeurIPS Conference 2025 Conference Paper

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

  • Nikhil Kandpal
  • Brian Lester
  • Colin Raffel
  • Sebastian Majstorovic
  • Stella Biderman
  • Baber Abbasi
  • Luca Soldaini
  • Enrico Shippole

Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text presents a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources that span diverse domains including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain competitive performance to LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.

NeurIPS Conference 2024 Conference Paper

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

  • Abhimanyu Hans
  • Yuxin Wen
  • Neel Jain
  • John Kirchenbauer
  • Hamid Kazemi
  • Prajwal Singhania
  • Siddharth Singh
  • Gowthami Somepalli

Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, randomly sampled subsets of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale LLaMA-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks. Code and checkpoints: https://github.com/ahans30/goldfish-loss
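The goldfish loss described above amounts to masking a random subset of token positions out of the standard next-token loss. A minimal sketch follows, assuming a 25% drop probability chosen only for illustration:

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, targets, drop_prob=0.25):
    """logits: (batch, seq, vocab); targets: (batch, seq)."""
    keep = torch.rand_like(targets, dtype=torch.float) >= drop_prob   # True = token kept in the loss
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).reshape(targets.shape)
    # Dropped tokens never contribute a gradient, so the model is not pushed to
    # reproduce the full training sequence verbatim.
    return (per_token * keep).sum() / keep.sum().clamp(min=1)

logits = torch.randn(2, 8, 100, requires_grad=True)
targets = torch.randint(0, 100, (2, 8))
goldfish_loss(logits, targets).backward()
```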

NeurIPS Conference 2024 Conference Paper

CALVIN: Improved Contextual Video Captioning via Instruction Tuning

  • Gowthami Somepalli
  • Arkabandhu Chowdhury
  • Ronen Basri
  • Jonas Geiping
  • Tom Goldstein
  • David Jacobs

The recent emergence of powerful Vision-Language models (VLMs) has significantly improved image captioning. Some of these models are extended to caption videos as well. However, their capabilities to understand complex scenes are limited, and the descriptions they provide for scenes tend to be overly verbose and focused on the superficial appearance of objects. Scene descriptions, especially in movies, require a deeper contextual understanding, unlike general-purpose video captioning. To address this challenge, we propose a model, CALVIN, a specialized video LLM that leverages previous movie context to generate fully "contextual" scene descriptions. To achieve this, we train our model on a suite of tasks that integrate both image-based question-answering and video captioning within a unified framework, before applying instruction tuning to refine the model's ability to provide scene captions. Lastly, we observe that our model responds well to prompt engineering and few-shot in-context learning techniques, enabling the user to adapt it to any new movie with very little additional annotation.

NeurIPS Conference 2024 Conference Paper

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

  • Mucong Ding
  • Chenghao Deng
  • Jocelyn Choo
  • Zichu Wu
  • Aakriti Agrawal
  • Avi Schwarzschild
  • Tianyi Zhou
  • Tom Goldstein

Despite the abundance of datasets available for assessing large language models (LLMs), the scarcity of continuous and reliable difficulty labels for individual data points, in most cases, curtails their capacity to benchmark model generalization performance across different levels of complexity. Addressing this limitation, we present Easy2Hard, an innovative collection of 6 benchmark datasets featuring standardized difficulty labels spanning a wide range of domains, such as mathematics and programming problems, chess puzzles, and reasoning questions, providing a much-needed tool for those in need of a dataset with varying degrees of difficulty for LLM assessment. We estimate the difficulty of individual problems by leveraging the performance data of many human subjects and LLMs on prominent leaderboards. Harnessing the rich human performance data, we employ widely recognized difficulty ranking systems, including the Item Response Theory (IRT) and Glicko-2 models, to uniformly assign difficulty scores to problems. The Easy2Hard datasets distinguish themselves from previous collections by incorporating a significantly higher proportion of challenging problems, presenting a novel and demanding test for state-of-the-art LLMs. Through extensive experiments conducted with six state-of-the-art LLMs on the Easy2Hard datasets, we offer valuable insights into their performance and generalization capabilities across varying degrees of difficulty, setting the stage for future research in LLM generalization.

TMLR Journal 2024 Journal Article

Graph Neural Networks Formed via Layer-wise Ensembles of Heterogeneous Base Models

  • Jiuhai Chen
  • Jonas Mueller
  • Vassilis N. Ioannidis
  • Tom Goldstein
  • David Wipf

Graph Neural Networks (GNNs) with numerical node features and graph structure as inputs have demonstrated superior performance on various semi-supervised learning tasks with graph data. However, the numerical node features utilized by GNNs are commonly extracted from raw data which is of text or tabular (numeric/categorical) type in most real-world applications. The best models for such data types in most standard supervised learning settings with IID (non-graph) data are not simple neural network layers and thus are not easily incorporated into a GNN. Here we propose a robust stacking framework that fuses graph-aware propagation with arbitrary models intended for IID data, which are ensembled and stacked in multiple layers. Our layer-wise framework leverages bagging and stacking strategies to enjoy strong generalization, in a manner which effectively mitigates label leakage and overfitting. Across a variety of graph datasets with tabular/text node features, our method achieves comparable or superior performance relative to both tabular/text and graph neural network models, as well as existing state-of-the-art hybrid strategies that combine the two.

ICRA Conference 2024 Conference Paper

Hierarchical Point Attention for Indoor 3D Object Detection

  • Manli Shu
  • Le Xue
  • Ning Yu 0006
  • Roberto Martín-Martín
  • Caiming Xiong
  • Tom Goldstein
  • Juan Carlos Niebles
  • Ran Xu 0001

3D object detection is an essential vision technique for various robotic systems, such as augmented reality and domestic robots. Transformers as versatile network architectures have recently seen great success in 3D point cloud object detection. However, the lack of hierarchy in a plain transformer restrains its ability to learn features at different scales. Such limitation makes transformer detectors perform worse on smaller objects and affects their reliability in indoor environments where small objects are the majority. This work proposes two novel attention operations as generic hierarchical designs for point-based transformer detectors. First, we propose Aggregated Multi-Scale Attention (MS-A) that builds multi-scale tokens from a single-scale input feature to enable more fine-grained feature learning. Second, we propose Size-Adaptive Local Attention (Local-A) with adaptive attention regions for localized feature aggregation within bounding box proposals. Both attention operations are model-agnostic network modules that can be plugged into existing point cloud transformers for end-to-end training. We evaluate our method on two widely used indoor detection benchmarks. By plugging our proposed modules into the state-of-the-art transformer-based 3D detectors, we improve the previous best results on both benchmarks, with more significant improvements on smaller objects.

ICML Conference 2024 Conference Paper

InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models

  • Lichang Chen
  • Jiuhai Chen
  • Tom Goldstein
  • Heng Huang 0001
  • Tianyi Zhou 0001

Large language models (LLMs) are instruction followers, but their performance varies under different instructions. It is challenging to create the best instruction, especially for black-box LLMs on which backpropagation is forbidden. Instead of directly optimizing the discrete instruction, we optimize a low-dimensional soft prompt applied to an open-source LLM to generate the instruction for the black-box LLM. In each optimization step of the proposed method InstructZero, a soft prompt is converted into an instruction by the open-source LLM, which is then submitted to the black-box LLM for zero-shot evaluation, whose result is sent to Bayesian optimization to produce new soft prompts improving the zero-shot performance. We evaluate InstructZero on different combinations of open-source LLMs and APIs including Vicuna and ChatGPT. InstructZero outperforms SOTA auto-instruction methods across a variety of downstream tasks.

ICLR Conference 2024 Conference Paper

NEFTune: Noisy Embeddings Improve Instruction Finetuning

  • Neel Jain
  • Ping-Yeh Chiang
  • Yuxin Wen
  • John Kirchenbauer
  • Hong-Min Chu
  • Gowthami Somepalli
  • Brian R. Bartoldson
  • Bhavya Kailkhura

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves $29.79$\% on AlpacaEval, which rises to $64.69$\% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a $10$\% improvement, with ShareGPT an $8$\% improvement, and with OpenPlatypus an $8$\% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune. Particularly, we see these improvements on the conversational abilities of the instruction model and not on traditional tasks like those on the OpenLLM Leaderboard, where performance is the same.
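The mechanism in the abstract is simply additive noise on the token embeddings during finetuning. A minimal sketch follows; the alpha / sqrt(seq_len * dim) scaling of uniform noise and the default alpha reflect the formulation commonly associated with NEFTune and should be treated as assumptions here.

```python
import torch

def neftune_embed(embed_layer, input_ids, alpha=5.0, training=True):
    embeds = embed_layer(input_ids)                          # (batch, seq, dim)
    if training:
        seq_len, dim = embeds.shape[1], embeds.shape[2]
        scale = alpha / (seq_len * dim) ** 0.5               # assumed scaling rule
        noise = torch.empty_like(embeds).uniform_(-1, 1) * scale
        embeds = embeds + noise                              # noise only during finetuning
    return embeds

emb = torch.nn.Embedding(32000, 4096)
ids = torch.randint(0, 32000, (2, 128))
print(neftune_embed(emb, ids).shape)   # torch.Size([2, 128, 4096])
```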

ICML Conference 2024 Conference Paper

ODIN: Disentangled Reward Mitigates Hacking in RLHF

  • Lichang Chen
  • Chen Zhu 0001
  • Jiuhai Chen
  • Davit Soselia
  • Tianyi Zhou 0001
  • Tom Goldstein
  • Heng Huang 0001
  • Mohammad Shoeybi

In this work, we study the issue of reward hacking on the response length, a challenge emerging in Reinforcement Learning from Human Feedback (RLHF) on LLMs. A well-formatted, verbose but less helpful response from the LLMs can often deceive LLMs or even human evaluators and achieve high scores. The same issue also holds for some reward models in RL. To address the challenges in both training and evaluation, we establish a more reliable evaluation protocol for comparing different training configurations, which inspects the trade-off between LLM evaluation score and response length obtained by varying training hyperparameters. Based on this evaluation, we conduct large-scale studies whose results shed light on the efficacy of hyperparameters and tricks used in RL for mitigating length bias. We further propose to improve the reward model by jointly training two linear heads to predict the preference, one trained to correlate with length and the other trained to decorrelate with length, so that it focuses more on the actual content. We then discard the length head in RL to ignore the spurious length reward. Experiments demonstrate that our approach eliminates the reward correlation with length, and improves the obtained policy by a significant margin.

ICLR Conference 2024 Conference Paper

On the Reliability of Watermarks for Large Language Models

  • John Kirchenbauer
  • Jonas Geiping
  • Yuxin Wen
  • Manli Shu
  • Khalid Saifullah
  • Kezhi Kong
  • Kasun Fernando
  • Aniruddha Saha

As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. _Watermarking_ is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a $1\mathrm{e}{-5}$ false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.

NeurIPS Conference 2024 Conference Paper

Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

  • Yuxin Wen
  • Leo Marchyok
  • Sanghyun Hong
  • Jonas Geiping
  • Tom Goldstein
  • Nicholas Carlini

It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the privacy leakage that arises when fine-tuning a model: when a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model. We conduct extensive experiments on various datasets and models, including both vision-language models (CLIP) and large language models, demonstrating the broad applicability and effectiveness of such an attack. Additionally, we carry out multiple ablation studies with different fine-tuning methods and inference strategies to thoroughly analyze this new threat. Our findings highlight a critical privacy concern within the machine learning community and call for a re-evaluation of safety protocols in the use of open-source pre-trained models.

NeurIPS Conference 2024 Conference Paper

Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models

  • Yuancheng Xu
  • Jiarui Yao
  • Manli Shu
  • Yanchao Sun
  • Zichu Wu
  • Ning Yu
  • Tom Goldstein
  • Furong Huang

Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, but their versatility raises security concerns. This study takes the first step in exposing VLMs’ susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is a traditional Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is a novel Persuasion Attack, leveraging VLMs’ text generation capabilities to craft persuasive and seemingly rational narratives for misinformation, such as portraying junk food as healthy. We show that Shadowcast effectively achieves the attacker’s intentions using as few as 50 poison samples. Crucially, the poisoned samples demonstrate transferability across different VLM architectures, posing a significant concern in black-box settings. Moreover, Shadowcast remains potent under realistic conditions involving various text prompts, training data augmentation, and image compression techniques. This work reveals how poisoned VLMs can disseminate convincing yet deceptive misinformation to everyday, benign users, emphasizing the importance of data integrity for responsible VLM deployments. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.

ICML Conference 2024 Conference Paper

Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

  • Abhimanyu Hans
  • Avi Schwarzschild
  • Valeriia Cherepanova
  • Hamid Kazemi
  • Aniruddha Saha
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

Detecting text generated by modern large language models is thought to be hard, as both LLMs and humans can exhibit a wide range of complex behaviors. However, we find that a score based on contrasting two closely related language models is highly accurate at separating human-generated and machine-generated text. Based on this mechanism, we propose a novel LLM detector that only requires simple calculations using a pair of pre-trained LLMs. The method, called Binoculars, achieves state-of-the-art accuracy without any training data. It is capable of spotting machine text from a range of modern LLMs without any model-specific modifications. We comprehensively evaluate Binoculars on a number of text sources and in varied situations. Over a wide range of document types, Binoculars detects over 90% of generated samples from ChatGPT (and other LLMs) at a false positive rate of 0.01%, despite not being trained on any ChatGPT data. Code available at https://github.com/ahans30/Binoculars.
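A rough sketch of a two-model contrast score in the spirit of the abstract: perplexity of the text under an "observer" model, normalized by the cross-entropy between the observer's and a "performer" model's next-token distributions. The specific models (GPT-2 variants), the function name, and the interpretation of the score are illustrative assumptions, not the released Binoculars implementation.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

observer = AutoModelForCausalLM.from_pretrained("gpt2")          # two closely related models
performer = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # sharing one tokenizer
tok = AutoTokenizer.from_pretrained("gpt2")

@torch.no_grad()
def binocular_score(text):
    ids = tok(text, return_tensors="pt").input_ids
    obs_logits = observer(ids).logits[:, :-1]                    # predict token t+1 from prefix
    perf_logits = performer(ids).logits[:, :-1]
    targets = ids[:, 1:]
    # log-perplexity of the text under the observer
    log_ppl = F.cross_entropy(obs_logits.transpose(1, 2), targets)
    # cross term: observer's surprise at the performer's predicted distribution
    cross = -(F.softmax(perf_logits, dim=-1) * F.log_softmax(obs_logits, dim=-1)).sum(-1).mean()
    return (log_ppl / cross).item()      # lower scores suggest machine-generated text

print(binocular_score("The quick brown fox jumps over the lazy dog."))
```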

NeurIPS Conference 2024 Conference Paper

Transformers Can Do Arithmetic with the Right Embeddings

  • Sean McLeish
  • Arpit Bansal
  • Alex Stein
  • Neel Jain
  • John Kirchenbauer
  • Brian R. Bartoldson
  • Bhavya Kailkhura
  • Abhinav Bhatele

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside of a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that training on only 20 digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100 digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.
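A minimal sketch of the digit-position idea described above: each digit receives an additional learned embedding indexed by its offset from the start of the number it belongs to. The toy character-level tokenization and helper below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

DIGITS = set("0123456789")

def digit_offsets(chars):
    """Offset of each character within its run of consecutive digits (0 for non-digits)."""
    offsets, run = [], 0
    for c in chars:
        run = run + 1 if c in DIGITS else 0
        offsets.append(max(run - 1, 0))
    return torch.tensor(offsets)

class AbacusStyleEmbedding(nn.Module):
    def __init__(self, vocab_size, d_model, max_digits=32):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.digit_pos = nn.Embedding(max_digits, d_model)   # position within the current number
    def forward(self, token_ids, offsets):
        return self.tok(token_ids) + self.digit_pos(offsets)

chars = list("12+345=")
ids = torch.tensor([ord(c) for c in chars])                   # toy tokenization: character codes
emb = AbacusStyleEmbedding(vocab_size=256, d_model=64)
print(emb(ids, digit_offsets(chars)).shape)                   # torch.Size([7, 64])
```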

ICLR Conference 2024 Conference Paper

Universal Guidance for Diffusion Models

  • Arpit Bansal
  • Hong-Min Chu
  • Avi Schwarzschild
  • Soumyadip Sengupta
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, style guidance and classifier signals.

ICML Conference 2024 Conference Paper

WAVES: Benchmarking the Robustness of Image Watermarks

  • Bang An 0001
  • Mucong Ding
  • Tahseen Rabbani
  • Aakriti Agrawal
  • Yuancheng Xu
  • Chenghao Deng
  • Sicheng Zhu
  • Abdirisak Mohamed

In the burgeoning age of generative AI, watermarks act as identifiers of provenance and artificial content. We present WAVES (Watermark Analysis via Enhanced Stress-testing), a benchmark for assessing image watermark robustness, overcoming the limitations of current evaluation methods. WAVES integrates detection and identification tasks and establishes a standardized evaluation protocol comprised of a diverse range of stress tests. The attacks in WAVES range from traditional image distortions to advanced, novel variations of diffusive and adversarial attacks. Our evaluation examines two pivotal dimensions: the degree of image quality degradation and the efficacy of watermark detection after attacks. Our novel, comprehensive evaluation reveals previously undetected vulnerabilities of several modern watermarking algorithms. We envision WAVES as a toolkit for the future development of robust watermarks.

NeurIPS Conference 2023 Conference Paper

A Performance-Driven Benchmark for Feature Selection in Tabular Deep Learning

  • Valeriia Cherepanova
  • Roman Levin
  • Gowthami Somepalli
  • Jonas Geiping
  • C. Bayan Bruss
  • Andrew G. Wilson
  • Tom Goldstein
  • Micah Goldblum

Academic tabular benchmarks often contain small sets of curated features. In contrast, data scientists typically collect as many features as possible into their datasets, and even engineer new features from existing ones. To prevent over-fitting in subsequent downstream modeling, practitioners commonly use automated feature selection methods that identify a reduced subset of informative features. Existing benchmarks for tabular feature selection consider classical downstream models, toy synthetic datasets, or do not evaluate feature selectors on the basis of downstream performance. We construct a challenging feature selection benchmark evaluated on downstream neural networks including transformers, using real datasets and multiple methods for generating extraneous features. We also propose an input-gradient-based analogue of LASSO for neural networks that outperforms classical feature selection methods on challenging problems such as selecting from corrupted or second-order features.

ICML Conference 2023 Conference Paper

A Watermark for Large Language Models

  • John Kirchenbauer
  • Jonas Geiping
  • Yuxin Wen
  • Jonathan Katz
  • Ian Miers
  • Tom Goldstein

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
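The green-list scheme in the abstract can be illustrated compactly: the previous token seeds a PRNG that partitions the vocabulary, generation adds a small bias delta to the "green" logits, and detection counts green tokens with a one-proportion z-test. The constants (gamma=0.5, delta=2.0) and the hashing choice below are illustrative, not the released implementation.

```python
import torch

def green_ids(prev_token, vocab_size, gamma=0.5):
    g = torch.Generator().manual_seed(int(prev_token) * 15485863)   # toy hash of previous token
    perm = torch.randperm(vocab_size, generator=g)
    return perm[: int(gamma * vocab_size)]                          # the "green" partition

def watermark_logits(logits, prev_token, delta=2.0, gamma=0.5):
    logits = logits.clone()
    logits[green_ids(prev_token, logits.numel(), gamma)] += delta   # softly promote green tokens
    return logits

def detect(tokens, vocab_size, gamma=0.5):
    hits = sum(
        int(t in set(green_ids(p, vocab_size, gamma).tolist()))
        for p, t in zip(tokens[:-1], tokens[1:])
    )
    n = len(tokens) - 1
    z = (hits - gamma * n) / (gamma * (1 - gamma) * n) ** 0.5       # one-proportion z-test
    return z                                                         # large z suggests watermarked text

print(detect([5, 17, 992, 3, 441, 77, 18, 260], vocab_size=1000))
```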

NeurIPS Conference 2023 Conference Paper

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

  • Micah Goldblum
  • Hossein Souri
  • Renkun Ni
  • Manli Shu
  • Viraj Prabhu
  • Gowthami Somepalli
  • Prithvijit Chattopadhyay
  • Mark Ibrahim

Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performance increases for a range of systems, it is difficult for practitioners to make informed decisions about which backbone to choose. Battle of the Backbones (BoB) makes this choice easier by benchmarking a diverse suite of pretrained models, including vision-language models, those trained via self-supervised learning, and the Stable Diffusion backbone, across a diverse set of computer vision tasks ranging from classification to object detection to OOD generalization and more. Furthermore, BoB sheds light on promising directions for the research community to advance computer vision by illuminating strengths and weaknesses of existing approaches through a comprehensive analysis conducted on more than 1500 training runs. While vision transformers (ViTs) and self-supervised learning (SSL) are increasingly popular, we find that convolutional neural networks pretrained in a supervised fashion on large training sets still perform best on most tasks among the models we consider. Moreover, in apples-to-apples comparisons on the same architectures and similarly sized pretraining datasets, we find that SSL backbones are highly competitive, indicating that future works should perform SSL pretraining with advanced architectures and larger pretraining datasets. We release the raw results of our experiments along with code that allows researchers to put their own backbones through the gauntlet here: https://github.com/hsouri/Battle-of-the-Backbones.

ICLR Conference 2023 Conference Paper

Canary in a Coalmine: Better Membership Inference with Ensembled Adversarial Queries

  • Yuxin Wen
  • Arpit Bansal
  • Hamid Kazemi
  • Eitan Borgnia
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

As industrial applications are increasingly automated by machine learning models, enforcing personal data ownership and intellectual property rights requires tracing training data back to their rightful owners. Membership inference algorithms approach this problem by using statistical techniques to discern whether a target sample was included in a model's training set. However, existing methods only utilize the unaltered target sample or simple augmentations of the target to compute statistics. Such a sparse sampling of the model's behavior carries little information, leading to poor inference capabilities. In this work, we use adversarial tools to directly optimize for queries that are discriminative and diverse. Our improvements achieve significantly more accurate membership inference than existing methods, especially in offline scenarios and in the low false-positive regime which is critical in legal settings.

NeurIPS Conference 2023 Conference Paper

Cold Diffusion: Inverting Arbitrary Image Transforms Without Noise

  • Arpit Bansal
  • Eitan Borgnia
  • Hong-Min Chu
  • Jie Li
  • Hamid Kazemi
  • Furong Huang
  • Micah Goldblum
  • Jonas Geiping

Standard diffusion models involve an image transform -- adding Gaussian noise -- and an image restoration operator that inverts this degradation. We observe that the generative behavior of diffusion models is not strongly dependent on the choice of image degradation, and in fact, an entire family of generative models can be constructed by varying this choice. Even when using completely deterministic degradations (e.g., blur, masking, and more), the training and test-time update rules that underlie diffusion models can be easily generalized to create generative models. The success of these fully deterministic models calls into question the community's understanding of diffusion models, which relies on noise in either gradient Langevin dynamics or variational inference, and paves the way for generalized diffusion models that invert arbitrary processes.
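A minimal sketch of the generalized sampling update implied by the abstract, using a purely deterministic "fade-to-gray" degradation and a stand-in restoration function. Both the degradation and the toy restorer are assumptions made for illustration; a real model would use a trained restoration network conditioned on the timestep.

```python
import torch

T = 50

def degrade(x0, t):
    """Deterministic degradation D(x0, t): fade the image toward mid-gray as t grows."""
    alpha = 1.0 - t / T
    return alpha * x0 + (1 - alpha) * 0.5

@torch.no_grad()
def cold_sample(restorer, x_T):
    """Generalized update: x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1)."""
    x = x_T
    for t in range(T, 0, -1):
        x0_hat = restorer(x, t)                        # model's estimate of the clean image
        x = x - degrade(x0_hat, t) + degrade(x0_hat, t - 1)
    return x

# Toy "restorer" that rescales back toward the clean range; only a placeholder.
restorer = lambda x, t: (x - (t / T) * 0.5) / max(1.0 - t / T, 1e-3)
sample = cold_sample(restorer, degrade(torch.rand(1, 3, 32, 32), T))
print(sample.shape)
```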

ICML Conference 2023 Conference Paper

Cramming: Training a Language Model on a Single GPU in One Day

  • Jonas Geiping
  • Tom Goldstein

Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting. We provide code to reproduce all experiments at github.com/JonasGeiping/cramming.

ICLR Conference 2023 Conference Paper

Decepticons: Corrupted Transformers Breach Privacy in Federated Learning for Language Models

  • Liam Fowl
  • Jonas Geiping
  • Steven Reich
  • Yuxin Wen
  • Wojciech Czaja
  • Micah Goldblum
  • Tom Goldstein

Privacy is a central tenet of Federated learning (FL), in which a central server trains models without centralizing user data. However, gradient updates used in FL can leak user information. While most industrial uses of FL are text applications (e.g., keystroke prediction), the majority of attacks on user privacy in FL have focused on simple image classifiers and threat models that assume honest execution of the FL protocol by the server. We propose a novel attack that reveals private user text by deploying malicious parameter vectors, and which succeeds even with mini-batches, multiple users, and long sequences. Unlike previous attacks on FL, the attack exploits characteristics of both the Transformer architecture and the token embedding, separately extracting tokens and positional embeddings to retrieve high-fidelity text. We argue that the threat model of malicious server states is highly relevant from a user-centric perspective, and show that in this scenario, text applications using transformer models are much more vulnerable than previously thought.

ICLR Conference 2023 Conference Paper

Exploring and Exploiting Decision Boundary Dynamics for Adversarial Robustness

  • Yuancheng Xu
  • Yanchao Sun
  • Micah Goldblum
  • Tom Goldstein
  • Furong Huang

The robustness of a deep classifier can be characterized by its margins: the decision boundary's distances to natural data points. However, it is unclear whether existing robust training methods effectively increase the margin for each vulnerable point during training. To understand this, we propose a continuous-time framework for quantifying the relative speed of the decision boundary with respect to each individual point. Through visualizing the moving speed of the decision boundary under Adversarial Training, one of the most effective robust training algorithms, a surprising moving-behavior is revealed: the decision boundary moves away from some vulnerable points but simultaneously moves closer to others, decreasing their margins. To alleviate these conflicting dynamics of the decision boundary, we propose Dynamics-aware Robust Training (DyART), which encourages the decision boundary to engage in movement that prioritizes increasing smaller margins. In contrast to prior works, DyART directly operates on the margins rather than their indirect approximations, allowing for more targeted and effective robustness improvement. Experiments on the CIFAR-10 and Tiny-ImageNet datasets verify that DyART alleviates the conflicting dynamics of the decision boundary and obtains improved robustness under various perturbation sizes compared to the state-of-the-art defenses. Our code is available at https://github.com/Yuancheng-Xu/Dynamics-Aware-Robust-Training.

ICML Conference 2023 Conference Paper

GOAT: A Global Transformer on Large-scale Graphs

  • Kezhi Kong
  • Jiuhai Chen
  • John Kirchenbauer
  • Renkun Ni
  • C. Bayan Bruss
  • Tom Goldstein

Graph transformers have been competitive on graph classification tasks, but they fail to outperform Graph Neural Networks (GNNs) on node classification, which is a common task performed on large-scale graphs for industrial applications. Meanwhile, existing GNN architectures are limited in their ability to perform equally well on both homophilious and heterophilious graphs as their inductive biases are generally tailored to only one setting. To address these issues, we propose GOAT, a scalable global graph transformer. In GOAT, each node conceptually attends to all the nodes in the graph and homophily/heterophily relationships can be learnt adaptively from the data. We provide theoretical justification for our approximate global self-attention scheme, and show it to be scalable to large-scale graphs. We demonstrate the competitiveness of GOAT on both heterophilious and homophilious graphs with millions of nodes.

NeurIPS Conference 2023 Conference Paper

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

  • Yuxin Wen
  • Neel Jain
  • John Kirchenbauer
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

The strength of modern generative models lies in their ability to be controlled through prompts. Hard prompts comprise interpretable words and tokens, and are typically hand-crafted by humans. Soft prompts, on the other hand, consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily edited, re-used across models, or plugged into a text-based interface. We describe an easy-to-use approach to automatically optimize hard text prompts through efficient gradient-based optimization. Our approach can be readily applied to text-to-image and text-only applications alike. This method allows API users to easily generate, discover, and mix and match image concepts without prior knowledge of how to prompt the model. Furthermore, using our method, we can bypass token-level content filters imposed by Midjourney by optimizing through the open-sourced text encoder.
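A minimal sketch of the projected-gradient idea behind optimizing hard prompts: keep a continuous prompt, project it to the nearest vocabulary embeddings, evaluate the loss at the projected (hard) prompt, and push the gradient back onto the continuous copy. The toy objective of matching a target feature vector stands in for, e.g., a CLIP similarity loss and is purely an assumption, as are the sizes and hyperparameters.

```python
import torch

torch.manual_seed(0)
vocab, dim, prompt_len = 500, 32, 4
E = torch.randn(vocab, dim)                      # frozen token-embedding table
target = torch.randn(dim)                        # stand-in for e.g. an image feature

soft = torch.randn(prompt_len, dim, requires_grad=True)
opt = torch.optim.Adam([soft], lr=0.1)

def project(x):
    """Nearest-neighbor projection of each soft token onto the embedding table."""
    ids = torch.cdist(x, E).argmin(dim=-1)
    return ids, E[ids]

for step in range(200):
    ids, hard = project(soft.detach())
    hard = hard + (soft - soft.detach())         # straight-through: gradients flow to `soft`
    loss = (hard.mean(dim=0) - target).pow(2).sum()
    opt.zero_grad(); loss.backward(); opt.step()

print("discovered hard prompt token ids:", project(soft.detach())[0].tolist())
```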

ICLR Conference 2023 Conference Paper

How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization

  • Jonas Geiping
  • Micah Goldblum
  • Gowthami Somepalli
  • Ravid Shwartz-Ziv
  • Tom Goldstein
  • Andrew Gordon Wilson

Despite the clear performance benefits of data augmentations, little is known about why they are so effective. In this paper, we disentangle several key mechanisms through which data augmentations operate. Establishing an exchange rate between augmented and additional real data, we find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution can be even more valuable than additional training data. Moreover, we find that data augmentations which encourage invariances can be more valuable than invariance alone, especially on small and medium sized training sets. Following this observation, we show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.

ICLR Conference 2023 Conference Paper

Loss Landscapes are All You Need: Neural Network Generalization Can Be Explained Without the Implicit Bias of Gradient Descent

  • Ping-Yeh Chiang
  • Renkun Ni
  • David Yu Miller
  • Arpit Bansal
  • Jonas Geiping
  • Micah Goldblum
  • Tom Goldstein

It is commonly believed that the implicit regularization of optimizers is needed for neural networks to generalize in the overparameterized regime. In this paper, we observe experimentally that this implicit regularization behavior is {\em generic}, i.e. it does not depend strongly on the choice of optimizer. We demonstrate this by training neural networks using several gradient-free optimizers, which do not benefit from properties that are often attributed to gradient-based optimizers. This includes a guess-and-check optimizer that generates uniformly random parameter vectors until finding one that happens to achieve perfect train accuracy, and a zeroth-order Pattern Search optimizer that uses no gradient computations. In the low sample and few-shot regimes, where zeroth order optimizers are most computationally tractable, we find that these non-gradient optimizers achieve test accuracy comparable to SGD. The code to reproduce results can be found at https://github.com/Ping-C/optimizer.
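The guess-and-check optimizer mentioned above is simple enough to sketch directly: draw uniformly random parameter vectors until one fits the (tiny) training set perfectly. The toy task, model, and sampling range below are assumptions; as the abstract notes, this is only tractable in very small regimes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(8, 2)
y = (X[:, 0] > X[:, 1]).long()                      # tiny, linearly separable binary task
model = nn.Linear(2, 2)

def train_accuracy():
    return (model(X).argmax(dim=1) == y).float().mean().item()

tries = 0
while train_accuracy() < 1.0:
    tries += 1
    with torch.no_grad():                            # "optimizer": pure random guessing
        for p in model.parameters():
            p.uniform_(-1.0, 1.0)

print(f"perfect train accuracy after {tries} random guesses")
```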

NeurIPS Conference 2023 Conference Paper

On the Exploitability of Instruction Tuning

  • Manli Shu
  • Jiongxiao Wang
  • Chen Zhu
  • Jonas Geiping
  • Chaowei Xiao
  • Tom Goldstein

Instruction tuning is an effective technique to align large language models (LLMs) with human intent. In this work, we investigate how an adversary can exploit instruction tuning by injecting specific instruction-following examples into the training data that intentionally changes the model's behavior. For example, an adversary can achieve content injection by injecting training examples that mention target content and eliciting such behavior from downstream models. To achieve this goal, we propose \textit{AutoPoison}, an automated data poisoning pipeline. It naturally and coherently incorporates versatile attack goals into poisoned data with the help of an oracle LLM. We showcase two example attacks: content injection and over-refusal attacks, each aiming to induce a specific exploitable behavior. We quantify and benchmark the strength and the stealthiness of our data poisoning scheme. Our results show that AutoPoison allows an adversary to change a model's behavior by poisoning only a small fraction of data while maintaining a high level of stealthiness in the poisoned examples. We hope our work sheds light on how data quality affects the behavior of instruction-tuned models and raises awareness of the importance of data quality for responsible deployments of LLMs.

ICLR Conference 2023 Conference Paper

Panning for Gold in Federated Learning: Targeted Text Extraction under Arbitrarily Large-Scale Aggregation

  • Hong-Min Chu
  • Jonas Geiping
  • Liam Fowl
  • Micah Goldblum
  • Tom Goldstein

As federated learning (FL) matures, privacy attacks against FL systems in turn become more numerous and complex. Attacks on language models have progressed from recovering single sentences in simple classification tasks to recovering larger parts of user data. Current attacks against federated language models are sequence-agnostic and aim to extract as much data as possible from an FL update - often at the expense of fidelity for any particular sequence. Because of this, current attacks fail to extract any meaningful data under large-scale aggregation. In realistic settings, an attacker cares most about a small portion of user data that contains sensitive personal information, for example sequences containing the phrase "my credit card number is ...". In this work, we propose the first attack on FL that achieves targeted extraction of sequences that contain privacy-critical phrases, whereby we employ maliciously modified parameters to allow the transformer itself to filter relevant sequences from aggregated user data and encode them in the gradient update. Our attack can effectively extract sequences of interest even against extremely large-scale aggregation.

ICLR Conference 2023 Conference Paper

Provable Robustness against Wasserstein Distribution Shifts via Input Randomization

  • Aounon Kumar
  • Alexander Levine 0001
  • Tom Goldstein
  • Soheil Feizi

Certified robustness in machine learning has primarily focused on adversarial perturbations with a fixed attack budget for each sample in the input distribution. In this work, we present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution. We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under that transformation. Our framework allows the datum-specific perturbation size to vary across different points in the input distribution and is general enough to include fixed-sized perturbations as well. Our certificates produce guaranteed lower bounds on the performance of the model for any shift (natural or adversarial) of the input distribution within a Wasserstein ball around the original distribution. We apply our technique to certify robustness against natural (non-adversarial) transformations of images such as color shifts, hue shifts, and changes in brightness and saturation. We obtain strong performance guarantees for the robust model under clearly visible shifts in the input images. Our experiments establish the non-vacuousness of our certificates by showing that the certified lower bound on a robust model's accuracy is higher than the empirical accuracy of an undefended model under a distribution shift. Moreover, our results also imply guaranteed lower bounds (hardness result) on the performance of models trained on so-called "unlearnable" datasets that have been poisoned to interfere with model training. We show that the performance of a robust model is guaranteed to remain above a certain threshold on the test distribution even when the base model is trained on the poisoned dataset.

ICLR Conference 2023 Conference Paper

Transfer Learning with Deep Tabular Models

  • Roman Levin
  • Valeriia Cherepanova
  • Avi Schwarzschild
  • Arpit Bansal
  • C. Bayan Bruss
  • Tom Goldstein
  • Andrew Gordon Wilson
  • Micah Goldblum

Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they are easily fine-tuned in new domains and learn reusable features. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. We conduct experiments in a realistic medical diagnosis test bed with limited amounts of downstream data and find that transfer learning with deep tabular models provides a definitive advantage over gradient boosted decision tree methods. We further compare the supervised and self-supervised pretraining strategies and provide practical advice on transfer learning with tabular models. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications.

NeurIPS Conference 2023 Conference Paper

Tree-Ring Watermarks: Invisible Fingerprints for Diffusion Images

  • Yuxin Wen
  • John Kirchenbauer
  • Jonas Geiping
  • Tom Goldstein

Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
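
A minimal numpy sketch of the ring-in-Fourier-space idea described above (illustrative only: the actual method operates on diffusion latents and recovers the noise by inverting the diffusion process, and the radii, values, and tolerance here are hypothetical). Because the rings are circularly symmetric in Fourier space, the embedded pattern is unchanged by rotations of the noise.

```python
import numpy as np

def embed_rings(noise, radii, ring_values):
    """Write fixed values onto concentric rings of the FFT of the initial noise."""
    f = np.fft.fftshift(np.fft.fft2(noise))
    h, w = noise.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    for radius, value in zip(radii, ring_values):
        f[np.abs(r - radius) < 0.5] = value
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

def detect_rings(recovered_noise, radii, ring_values, tol=1.0):
    """Check whether the ring pattern survives in the recovered noise
    (in the real pipeline, `recovered_noise` comes from DDIM-style inversion
    of the generated image)."""
    f = np.fft.fftshift(np.fft.fft2(recovered_noise))
    h, w = recovered_noise.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    errs = [np.mean(np.abs(f[np.abs(r - radius) < 0.5] - value))
            for radius, value in zip(radii, ring_values)]
    return float(np.mean(errs)) < tol
```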

NeurIPS Conference 2023 Conference Paper

Understanding and Mitigating Copying in Diffusion Models

  • Gowthami Somepalli
  • Vasu Singla
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set. Code is available at https://github.com/somepago/DCR.

NeurIPS Conference 2023 Conference Paper

What Can We Learn from Unlearnable Datasets?

  • Pedro Sandoval-Segura
  • Vasu Singla
  • Jonas Geiping
  • Micah Goldblum
  • Tom Goldstein

In an era of widespread web scraping, unlearnable dataset methods have the potential to protect data privacy by preventing deep neural networks from generalizing. But in addition to a number of practical limitations that make their use unlikely, we make a number of findings that call into question their ability to safeguard data. First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization. In contrast, we find that networks actually can learn useful features that can be reweighted for high test performance, suggesting that image protection is not assured. Unlearnable datasets are also believed to induce learning shortcuts through linear separability of added perturbations. We provide a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. To emphasize why linearly separable perturbations should not be relied upon, we propose an orthogonal projection attack which allows learning from unlearnable datasets published in ICML 2021 and ICLR 2023. Our proposed attack is significantly less complex than recently proposed techniques.
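
The orthogonal projection attack mentioned above can be sketched roughly as follows (a simplification, not the authors' code): fit a linear model on the poisoned data, treat its weight vectors as the linearly separable perturbation directions, and project the images onto their orthogonal complement before training.

```python
import numpy as np

def orthogonal_projection_cleanse(X, W):
    """X: flattened images, shape (n, d). W: weights of a linear model fit on
    the unlearnable data, shape (n_classes, d). Returns images with the span
    of W projected out (a sketch under the stated assumptions)."""
    Q, _ = np.linalg.qr(W.T)       # orthonormal basis for the poison directions
    return X - (X @ Q) @ Q.T
```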

NeurIPS Conference 2022 Conference Paper

Autoregressive Perturbations for Data Poisoning

  • Pedro Sandoval-Segura
  • Vasu Singla
  • Jonas Geiping
  • Micah Goldblum
  • Tom Goldstein
  • David Jacobs

The prevalence of data scraping from social media as a means to obtain datasets has led to growing concerns regarding unauthorized use of data. Data poisoning attacks have been proposed as a bulwark against scraping, as they make data ``unlearnable'' by adding small, imperceptible perturbations. Unfortunately, existing methods require knowledge of both the target architecture and the complete dataset so that a surrogate network can be trained, the parameters of which are used to generate the attack. In this work, we introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset. The proposed AR perturbations are generic, can be applied across different datasets, and can poison different architectures. Compared to existing unlearnable methods, our AR poisons are more resistant against common defenses such as adversarial training and strong data augmentations. Our analysis further provides insight into what makes an effective data poison.

ICML Conference 2022 Conference Paper

Certified Neural Network Watermarks with Randomized Smoothing

  • Arpit Bansal
  • Ping-Yeh Chiang
  • Michael J. Curry
  • Rajiv Jain
  • Curtis Wigington
  • Varun Manjunatha
  • John Dickerson 0001
  • Tom Goldstein

Watermarking is a commonly used strategy to protect creators’ rights to digital images, videos and audio. Recently, watermarking methods have been extended to deep learning models – in principle, the watermark should be preserved when an adversary tries to copy the model. However, in practice, watermarks can often be removed by an intelligent adversary. Several papers have proposed watermarking methods that claim to be empirically resistant to different types of removal attacks, but these new techniques often fail in the face of new or better-tuned adversaries. In this paper, we propose the first certifiable watermarking method. Using the randomized smoothing technique, we show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain $\ell_2$ threshold. In addition to being certifiable, our watermark is also empirically more robust compared to previous watermarking methods.
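
A hedged sketch of how a randomized-smoothing-style certificate over model parameters could look, assuming a hypothetical `verify_watermark(theta) -> bool` that checks the trigger set; the paper's actual procedure and constants differ, and a proper lower confidence bound on the estimated success probability would be required in practice.

```python
import numpy as np
from scipy.stats import norm

def watermark_l2_certificate(verify_watermark, theta, sigma, n=1000, seed=0):
    """Estimate how often the watermark verifies under Gaussian parameter
    noise, and convert the estimate into an ell_2 radius (in parameter space)
    within which the smoothed verification decision cannot flip."""
    rng = np.random.default_rng(seed)
    hits = sum(verify_watermark(theta + sigma * rng.standard_normal(theta.shape))
               for _ in range(n))
    p_hat = hits / n        # in practice, replace with a lower confidence bound
    if p_hat <= 0.5:
        return 0.0          # no certificate
    return sigma * norm.ppf(p_hat)
```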

ICLR Conference 2022 Conference Paper

Diurnal or Nocturnal? Federated Learning of Multi-branch Networks from Periodically Shifting Distributions

  • Chen Zhu 0001
  • Zheng Xu 0002
  • Mingqing Chen
  • Jakub Konecný
  • Andrew Hard
  • Tom Goldstein

Federated learning has been deployed to train machine learning models from decentralized client data on mobile devices in practice. The clients available for training are observed to have periodically shifting distributions changing with the time of day, which can cause instability in training and degrade the model performance. In this paper, instead of modeling the distribution shift with a block-cyclic pattern as previous works, we model it with a mixture of distributions that gradually shifts between daytime and nighttime modes, and find this intuitive model to better match the observations in practical federated learning systems. Furthermore, we propose to jointly train a clustering model and a multi-branch network to allocate lightweight specialized branches to clients from different modes. A temporal prior is used to significantly boost the training performance. Experiments for image classification on EMNIST and CIFAR datasets, and next word prediction on the Stack Overflow dataset show that the proposed algorithm can counter the effects of the distribution shift and significantly improve the final model performance.

ICLR Conference 2022 Conference Paper

Does your graph need a confidence boost? Convergent boosted smoothing on graphs with tabular node features

  • Jiuhai Chen
  • Jonas Mueller 0001
  • Vassilis N. Ioannidis
  • Soji Adeshina
  • Yangkun Wang
  • Tom Goldstein
  • David P. Wipf

Many practical modeling tasks require making predictions using tabular data composed of heterogeneous feature types (e.g., text-based, categorical, continuous, etc.). In this setting boosted decision trees and related ensembling techniques generally dominate real-world applications involving iid training/test sets. However, when there are relations between samples and the iid assumption is no longer reasonable, it remains unclear how to incorporate these dependencies within existing boosting pipelines. To this end, we propose a generalized framework for combining boosted trees and more general model ensembling techniques, with graph propagation layers that share node/sample information across edges connecting related samples. And unlike previous efforts to integrate graph-based models with boosting, our approach is anchored to a principled meta loss function such that provable convergence can be guaranteed under relatively mild assumptions. Across a variety of benchmarks involving non-iid graph data with tabular node features, our framework achieves comparable or superior performance.

NeurIPS Conference 2022 Conference Paper

End-to-end Algorithm Synthesis with Recurrent Networks: Extrapolation without Overthinking

  • Arpit Bansal
  • Avi Schwarzschild
  • Eitan Borgnia
  • Zeyad Emam
  • Furong Huang
  • Micah Goldblum
  • Tom Goldstein

Machine learning systems perform well on pattern matching tasks, but their ability to perform algorithmic or logical reasoning is not well understood. One important reasoning capability is algorithmic extrapolation, in which models trained only on small/simple reasoning problems can synthesize complex strategies for large/complex problems at test time. Algorithmic extrapolation can be achieved through recurrent systems, which can be iterated many times to solve difficult reasoning problems. We observe that this approach fails to scale to highly complex problems because behavior degenerates when many iterations are applied -- an issue we refer to as "overthinking." We propose a recall architecture that keeps an explicit copy of the problem instance in memory so that it cannot be forgotten. We also employ a progressive training routine that prevents the model from learning behaviors that are specific to iteration number and instead pushes it to learn behaviors that can be repeated indefinitely. These innovations prevent the overthinking problem, and enable recurrent systems to solve extremely hard extrapolation tasks.
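
A compact PyTorch-style sketch of the "recall" idea described above: the raw problem instance is concatenated back into the hidden state at every recurrence, and the same block can be iterated more times at test time. Layer sizes and the exact block structure are illustrative guesses, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RecallRecurrentNet(nn.Module):
    def __init__(self, in_ch=3, width=64, out_ch=2):
        super().__init__()
        self.embed = nn.Conv2d(in_ch, width, 3, padding=1)
        self.block = nn.Sequential(                        # shared recurrent block
            nn.Conv2d(width + in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(width, out_ch, 3, padding=1)

    def forward(self, x, iters=30):
        h = torch.relu(self.embed(x))
        for _ in range(iters):                              # "think longer" by raising iters
            h = self.block(torch.cat([h, x], dim=1))        # recall: re-inject the raw input
        return self.head(h)
```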

ICML Conference 2022 Conference Paper

Fishing for User Data in Large-Batch Federated Learning via Gradient Magnification

  • Yuxin Wen
  • Jonas Geiping
  • Liam Fowl
  • Micah Goldblum
  • Tom Goldstein

Federated learning (FL) has rapidly risen in popularity due to its promise of privacy and efficiency. Previous works have exposed privacy vulnerabilities in the FL pipeline by recovering user data from gradient updates. However, existing attacks fail to address realistic settings because they either 1) require toy settings with very small batch sizes, or 2) require unrealistic and conspicuous architecture modifications. We introduce a new strategy that dramatically elevates existing attacks to operate on batches of arbitrarily large size, and without architectural modifications. Our model-agnostic strategy only requires modifications to the model parameters sent to the user, which is a realistic threat model in many scenarios. We demonstrate the strategy in challenging large-scale settings, obtaining high-fidelity data extraction in both cross-device and cross-silo federated learning. Code is available at https://github.com/JonasGeiping/breaching.

ICML Conference 2022 Conference Paper

Plug-In Inversion: Model-Agnostic Inversion for Vision with Data Augmentations

  • Amin Ghiasi
  • Hamid Kazemi
  • Steven Reich
  • Chen Zhu 0001
  • Micah Goldblum
  • Tom Goldstein

Existing techniques for model inversion typically rely on hard-to-tune regularizers, such as total variation or feature regularization, which must be individually calibrated for each network in order to produce adequate images. In this work, we introduce Plug-In Inversion, which relies on a simple set of augmentations and does not require excessive hyper-parameter tuning. Under our proposed augmentation-based scheme, the same set of augmentation hyper-parameters can be used for inverting a wide range of image classification models, regardless of input dimensions or the architecture. We illustrate the practicality of our approach by inverting Vision Transformers (ViTs) and Multi-Layer Perceptrons (MLPs) trained on the ImageNet dataset, tasks which to the best of our knowledge have not been successfully accomplished by any previous works.

ICLR Conference 2022 Conference Paper

Robbing the Fed: Directly Obtaining Private Data in Federated Learning with Modified Models

  • Liam Fowl
  • Jonas Geiping
  • Wojciech Czaja
  • Micah Goldblum
  • Tom Goldstein

Federated learning has quickly gained popularity with its promises of increased user privacy and efficiency. Previous works have shown that federated gradient updates contain information that can be used to approximately recover user data in some situations. These previous attacks on user privacy have been limited in scope and do not scale to gradient updates aggregated over even a handful of data points, leaving some to conclude that data privacy is still intact for realistic training regimes. In this work, we introduce a new threat model based on minimal but malicious modifications of the shared model architecture which enable the server to directly obtain a verbatim copy of user data from gradient updates without solving difficult inverse problems. Even user data aggregated over large batches – where previous methods fail to extract meaningful content – can be reconstructed by these minimally modified models.

NeurIPS Conference 2022 Conference Paper

Robustness Disparities in Face Detection

  • Samuel Dooley
  • George Z Wei
  • Tom Goldstein
  • John Dickerson

Facial analysis systems have been deployed by large companies and critiqued by scholars and activists for the past decade. Many existing algorithmic audits examine the performance of these systems on later stage elements of facial analysis systems like facial recognition and age, emotion, or perceived gender prediction; however, a core component to these systems has been vastly understudied from a fairness perspective: face detection, sometimes called face localization. Since face detection is a pre-requisite step in facial analysis systems, the bias we observe in face detection will flow downstream to the other components like facial recognition and emotion prediction. Additionally, no prior work has focused on the robustness of these systems under various perturbations and corruptions, which leaves open the question of how various people are impacted by these phenomena. We present the first of its kind detailed benchmark of face detection systems, specifically examining the robustness to noise of commercial and academic models. We use both standard and recently released academic facial datasets to quantitatively analyze trends in face detection robustness. Across all the datasets and systems, we generally find that photos of individuals who are masculine presenting, older, of darker skin type, or have dim lighting are more susceptible to errors than their counterparts in other identities.

NeurIPS Conference 2022 Conference Paper

Sleeper Agent: Scalable Hidden Trigger Backdoors for Neural Networks Trained from Scratch

  • Hossein Souri
  • Liam Fowl
  • Rama Chellappa
  • Micah Goldblum
  • Tom Goldstein

As the curation of data for machine learning becomes increasingly automated, dataset tampering is a mounting threat. Backdoor attackers tamper with training data to embed a vulnerability in models that are trained on that data. This vulnerability is then activated at inference time by placing a "trigger" into the model's input. Typical backdoor attacks insert the trigger directly into the training data, although the presence of such an attack may be visible upon inspection. In contrast, the Hidden Trigger Backdoor Attack achieves poisoning without placing a trigger into the training data at all. However, this hidden trigger attack is ineffective at poisoning neural networks trained from scratch. We develop a new hidden trigger attack, Sleeper Agent, which employs gradient matching, data selection, and target model re-training during the crafting process. Sleeper Agent is the first hidden trigger backdoor attack to be effective against neural networks trained from scratch. We demonstrate its effectiveness on ImageNet and in black-box settings. Our implementation code can be found at: https://github.com/hsouri/Sleeper-Agent.

ICLR Conference 2022 Conference Paper

Stochastic Training is Not Necessary for Generalization

  • Jonas Geiping
  • Micah Goldblum
  • Phillip Pope
  • Michael Moeller 0001
  • Tom Goldstein

It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks. In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures. To this end, we show that the implicit regularization of SGD can be completely replaced with explicit regularization. Our observations indicate that the perceived difficulty of full-batch training may be the result of its optimization properties and the disproportionate time and effort spent by the ML community tuning optimizers and hyperparameters for small-batch training.

NeurIPS Conference 2022 Conference Paper

Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models

  • Manli Shu
  • Weili Nie
  • De-An Huang
  • Zhiding Yu
  • Tom Goldstein
  • Anima Anandkumar
  • Chaowei Xiao

Pre-trained vision-language models (e.g., CLIP) have shown promising zero-shot generalization in many downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using the training data from downstream tasks. While effective, training on domain-specific data reduces a model's generalization capability to unseen new domains. In this work, we propose test-time prompt tuning (TPT), a method that can learn adaptive prompts on the fly with a single test sample. TPT optimizes the prompt by minimizing the entropy with confidence selection so that the model has consistent predictions across different augmented views of each test sample. In evaluating generalization to natural distribution shifts, TPT improves the zero-shot top-1 accuracy of CLIP by 3.6\% on average, surpassing previous prompt tuning approaches that require additional task-specific training data. In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
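
A rough sketch of one test-time prompt tuning step as described above, assuming a hypothetical interface where `model(views, prompt)` returns logits and only `prompt` is optimized; the confidence-selection fraction and learning rate are placeholders, not the paper's settings.

```python
import torch

def tpt_step(model, prompt, views, keep_frac=0.1, lr=5e-3):
    """Minimize the marginal entropy over the most confident augmented views
    of a single test image; `prompt` must have requires_grad=True."""
    probs = model(views, prompt).softmax(dim=-1)          # [n_views, n_classes]
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    k = max(1, int(keep_frac * probs.shape[0]))
    keep = ent.topk(k, largest=False).indices             # confidence selection
    avg = probs[keep].mean(dim=0)
    loss = -(avg * avg.clamp_min(1e-12).log()).sum()      # entropy of averaged prediction
    loss.backward()
    with torch.no_grad():
        prompt -= lr * prompt.grad
        prompt.grad = None
    return float(loss)
```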

ICLR Conference 2022 Conference Paper

The Close Relationship Between Contrastive Learning and Meta-Learning

  • Renkun Ni
  • Manli Shu
  • Hossein Souri
  • Micah Goldblum
  • Tom Goldstein

Contrastive learning has recently taken off as a paradigm for learning from unlabeled data. In this paper, we discuss the close relationship between contrastive learning and meta-learning under a certain task distribution. We complement this observation by showing that established meta-learning methods, such as Prototypical Networks, achieve comparable performance to SimCLR when paired with this task distribution. This relationship can be leveraged by taking established techniques from meta-learning, such as task-based data augmentation, and showing that they benefit contrastive learning as well. These tricks also benefit state-of-the-art self-supervised learners without using negative pairs such as BYOL, which achieves 94.6\% accuracy on CIFAR-10 using a self-supervised ResNet-18 feature extractor trained with our meta-learning tricks. We conclude that existing advances designed for contrastive learning or meta-learning can be exploited to benefit the other, and it is better for contrastive learning researchers to take lessons from the meta-learning literature (and vice-versa) than to reinvent the wheel.

ICLR Conference 2022 Conference Paper

The Uncanny Similarity of Recurrence and Depth

  • Avi Schwarzschild
  • Arjun Gupta
  • Amin Ghiasi
  • Micah Goldblum
  • Tom Goldstein

It is widely believed that deep neural networks contain layer specialization, wherein networks extract hierarchical features representing edges and patterns in shallow layers and complete objects in deeper layers. Unlike common feed-forward models that have distinct filters at each layer, recurrent networks reuse the same parameters at various depths. In this work, we observe that recurrent models exhibit the same hierarchical behaviors and the same performance benefits as depth despite reusing the same filters at every recurrence. By training models of various feed-forward and recurrent architectures on several datasets for image classification as well as maze solving, we show that recurrent networks have the ability to closely emulate the behavior of non-recurrent deep models, often doing so with far fewer parameters.

AAAI Conference 2022 Conference Paper

Towards Transferable Adversarial Attacks on Vision Transformers

  • Zhipeng Wei
  • Jingjing Chen
  • Micah Goldblum
  • Zuxuan Wu
  • Tom Goldstein
  • Yu-Gang Jiang

Vision transformers (ViTs) have demonstrated impressive performance on a series of computer vision tasks, yet they still suffer from adversarial examples. In this paper, we posit that adversarial attacks on transformers should be specially tailored for their architecture, jointly considering both patches and self-attention, in order to achieve high transferability. More specifically, we introduce a dual attack framework, which contains a Pay No Attention (PNA) attack and a PatchOut attack, to improve the transferability of adversarial samples across different ViTs. We show that skipping the gradients of attention during backpropagation can generate adversarial examples with high transferability. In addition, adversarial perturbations generated by optimizing randomly sampled subsets of patches at each iteration achieve higher attack success rates than attacks using all patches. We evaluate the transferability of attacks on state-of-the-art ViTs, CNNs and robustly trained CNNs. The results of these experiments demonstrate that the proposed dual attack can greatly boost transferability between ViTs and from ViTs to CNNs. In addition, the proposed method can easily be combined with existing transfer methods to boost performance.

NeurIPS Conference 2022 Conference Paper

Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

  • Roman Levin
  • Manli Shu
  • Eitan Borgnia
  • Furong Huang
  • Micah Goldblum
  • Tom Goldstein

Conventional saliency maps highlight input features to which neural network predictions are highly sensitive. We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs, which are responsible for erroneous decisions. We first verify that identified salient parameters are indeed responsible for misclassification by showing that turning these parameters off improves predictions on the associated samples more than turning off the same number of random or least salient parameters. We further validate the link between salient parameters and network misclassification errors by observing that fine-tuning a small number of the most salient parameters on a single sample results in error correction on other samples which were misclassified for similar reasons -- nearest neighbors in the saliency space. After validating our parameter-space saliency maps, we demonstrate that samples which cause similar parameters to malfunction are semantically similar. Further, we introduce an input-space saliency counterpart which reveals how image features cause specific network components to malfunction.

ICRA Conference 2021 Conference Paper

Adversarial Differentiable Data Augmentation for Autonomous Systems

  • Manli Shu
  • Yu Shen
  • Ming Lin 0003
  • Tom Goldstein

Autonomous systems often rely on neural networks to achieve high performance on planning and control problems. Unfortunately, neural networks suffer severely when input images become degraded in ways that are not reflected in the training data. This is particularly problematic for robotic systems like autonomous vehicles (AV) for which reliability is paramount. In this work, we consider robust optimization methods for hardening control systems against image corruptions and other unexpected domain shifts. Recent work on robust optimization for neural nets has been focused largely on combating adversarial attacks. In this work, we borrow ideas from the adversarial training and data augmentation literature to enhance robustness to image corruptions and domain shifts. To this end, we train networks while augmenting image data with a battery of image degradations. Unlike traditional augmentation methods, we choose the parameters for each degradation adversarially so as to maximally degrade system performance. By formulating image degradations in a way that is differentiable with respect to degradation parameters, we enable the use of efficient optimization methods (PGD) for choosing worst-case augmentation parameters. We demonstrate the efficacy of this method on the learning to steer task for AVs. By adversarially training against image corruptions, we produce networks that are highly robust to image corruptions. We show that the proposed differentiable augmentation schemes result in higher levels of robustness and accuracy for a range of settings as compared to baseline and state-of-the-art augmentation methods.

NeurIPS Conference 2021 Conference Paper

Adversarial Examples Make Strong Poisons

  • Liam Fowl
  • Micah Goldblum
  • Ping-yeh Chiang
  • Jonas Geiping
  • Wojciech Czaja
  • Tom Goldstein

The adversarial machine learning literature is largely partitioned into evasion attacks on testing data and poisoning attacks on training data. In this work, we show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning. In fact, adversarial examples with labels re-assigned by the crafting network remain effective for training, suggesting that adversarial examples contain useful semantic content, just with the "wrong" labels (according to a network, but not a human). Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release, and we release a poisoned version of ImageNet, ImageNet-P, to encourage research into the strength of this form of data obfuscation.
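
One way to read the recipe above is PGD-crafted adversarial examples whose labels are re-assigned before release as training data; the following PyTorch sketch shows a class-targeted variant (the hyperparameters and targeting scheme are illustrative, not the paper's exact configuration).

```python
import torch
import torch.nn.functional as F

def craft_poison(model, x, y_target, eps=8 / 255, alpha=2 / 255, steps=20):
    """Push x toward the (re-assigned) label y_target inside an l_inf ball,
    then release (x_poison, y_target) as training data."""
    x_poison = x.clone().detach()
    for _ in range(steps):
        x_poison.requires_grad_(True)
        loss = F.cross_entropy(model(x_poison), y_target)
        grad, = torch.autograd.grad(loss, x_poison)
        x_poison = (x_poison - alpha * grad.sign()).detach()
        x_poison = x + (x_poison - x).clamp(-eps, eps)     # project to eps-ball
        x_poison = x_poison.clamp(0, 1)
    return x_poison
```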

AAAI Conference 2021 Conference Paper

Are Adversarial Examples Created Equal? A Learnable Weighted Minimax Risk for Robustness under Non-uniform Attacks

  • Huimin Zeng
  • Chen Zhu
  • Tom Goldstein
  • Furong Huang

Adversarial Training has been proven to be an effective method to defend against adversarial examples, being one of the few defenses that withstand strong attacks. However, traditional defense mechanisms assume a uniform attack over the examples according to the underlying data distribution, which is apparently unrealistic as the attacker could choose to focus on more vulnerable examples. We present a weighted minimax risk optimization that defends against non-uniform attacks, achieving robustness against adversarial examples under perturbed test data distributions. Our modified risk considers importance weights of different adversarial examples and focuses adaptively on harder examples that are wrongly classified or at higher risk of being classified incorrectly. The designed risk allows the training process to learn a strong defense through optimizing the importance weights. The experiments show that our model significantly improves state-of-the-art adversarial accuracy under non-uniform attacks without a significant drop under uniform attacks.

NeurIPS Conference 2021 Conference Paper

Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks

  • Avi Schwarzschild
  • Eitan Borgnia
  • Arjun Gupta
  • Furong Huang
  • Uzi Vishkin
  • Micah Goldblum
  • Tom Goldstein

Deep neural networks are powerful machines for visual pattern recognition, but reasoning tasks that are easy for humans may still be difficult for neural models. Humans possess the ability to extrapolate reasoning strategies learned on simple problems to solve harder examples, often by thinking for longer. For example, a person who has learned to solve small mazes can easily extend the very same search techniques to solve much larger mazes by spending more time. In computers, this behavior is often achieved through the use of algorithms, which scale to arbitrarily hard problem instances at the cost of more computation. In contrast, the sequential computing budget of feed-forward neural networks is limited by their depth, and networks trained on simple problems have no way of extending their reasoning to accommodate harder problems. In this work, we show that recurrent networks trained to solve simple problems with few recurrent steps can indeed solve much more complex problems simply by performing additional recurrences during inference. We demonstrate this algorithmic behavior of recurrent networks on prefix sum computation, mazes, and chess. In all three domains, networks trained on simple problem instances are able to extend their reasoning abilities at test time simply by "thinking for longer."

NeurIPS Conference 2021 Conference Paper

Center Smoothing: Certified Robustness for Networks with Structured Outputs

  • Aounon Kumar
  • Tom Goldstein

The study of provable adversarial robustness has mostly been limited to classification tasks and models with one-dimensional real-valued outputs. We extend the scope of certifiable robustness to problems with more general and structured outputs like sets, images, language, etc. We model the output space as a metric space under a distance/similarity function, such as intersection-over-union, perceptual similarity, total variation distance, etc. Such models are used in many machine learning problems like image segmentation, object detection, generative models, image/audio-to-text systems, etc. Based on a robustness technique called randomized smoothing, our center smoothing procedure can produce models with the guarantee that the change in the output, as measured by the distance metric, remains small for any norm-bounded adversarial perturbation of the input. We apply our method to create certifiably robust models with disparate output spaces -- from sets to images -- and show that it yields meaningful certificates without significantly degrading the performance of the base model.

ICML Conference 2021 Conference Paper

Data Augmentation for Meta-Learning

  • Renkun Ni
  • Micah Goldblum
  • Amr Sharaf
  • Kezhi Kong
  • Tom Goldstein

Conventional image classifiers are trained by randomly sampling mini-batches of images. To achieve state-of-the-art performance, practitioners use sophisticated data augmentation schemes to expand the amount of training data available for sampling. In contrast, meta-learning algorithms sample support data, query data, and tasks on each training step. In this complex sampling scenario, data augmentation can be used not only to expand the number of images available per class, but also to generate entirely new classes/tasks. We systematically dissect the meta-learning pipeline and investigate the distinct ways in which data augmentation can be integrated at both the image and class levels. Our proposed meta-specific data augmentation significantly improves the performance of meta-learners on few-shot classification benchmarks.

NeurIPS Conference 2021 Conference Paper

Encoding Robustness to Image Style via Adversarial Feature Perturbations

  • Manli Shu
  • Zuxuan Wu
  • Micah Goldblum
  • Tom Goldstein

Adversarial training is the industry standard for producing models that are robust to small adversarial perturbations. However, machine learning practitioners need models that are robust to other kinds of changes that occur naturally, such as changes in the style or illumination of input images. Such changes in input distribution have been effectively modeled as shifts in the mean and variance of deep image features. We adapt adversarial training by directly perturbing feature statistics, rather than image pixels, to produce models that are robust to various unseen distributional shifts. We explore the relationship between these perturbations and distributional shifts by visualizing adversarial features. Our proposed method, Adversarial Batch Normalization (AdvBN), is a single network layer that generates worst-case feature perturbations during training. By fine-tuning neural networks on adversarial feature distributions, we observe improved robustness of networks to various unseen distributional shifts, including style variations and image corruptions. In addition, we show that our proposed adversarial feature perturbation can be complementary to existing image space data augmentation methods, leading to improved performance. The source code and pre-trained models are released at \url{https://github.com/azshue/AdvBN}.

NeurIPS Conference 2021 Conference Paper

Gradient-Free Adversarial Training Against Image Corruption for Learning-based Steering

  • Yu Shen
  • Laura Zheng
  • Manli Shu
  • Weizi Li
  • Tom Goldstein
  • Ming Lin

We introduce a simple yet effective framework for improving the robustness of learning algorithms against image corruptions for autonomous driving. These corruptions can occur due to both internal (e.g., sensor noises and hardware abnormalities) and external factors (e.g., lighting, weather, visibility, and other environmental effects). Using sensitivity analysis with FID-based parameterization, we propose a novel algorithm exploiting basis perturbations to improve the overall performance of autonomous steering and other image processing tasks, such as classification and detection, for self-driving cars. Our model not only improves the performance on the original dataset, but also achieves significant performance improvement on datasets with multiple and unseen perturbations, up to 87% and 77%, respectively. A comparison between our approach and other SOTA techniques confirms the effectiveness of our technique in improving the robustness of neural network training for learning-based steering and other image processing tasks.

NeurIPS Conference 2021 Conference Paper

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training

  • Chen Zhu
  • Renkun Ni
  • Zheng Xu
  • Kezhi Kong
  • W. Ronny Huang
  • Tom Goldstein

Innovations in neural architectures have fostered significant breakthroughs in language modeling and computer vision. Unfortunately, novel architectures often result in challenging hyper-parameter choices and training instability if the network parameters are not properly initialized. A number of architecture-specific initialization schemes have been proposed, but these schemes are not always portable to new architectures. This paper presents GradInit, an automated and architecture-agnostic method for initializing neural networks. GradInit is based on a simple heuristic: the norm of each network layer is adjusted so that a single step of SGD or Adam with prescribed hyperparameters results in the smallest possible loss value. This adjustment is done by introducing a scalar multiplier variable in front of each parameter block, and then optimizing these variables using a simple numerical scheme. GradInit accelerates the convergence and test performance of many convolutional architectures, both with and without skip connections, and even without normalization layers. It also improves the stability of the original Transformer architecture for machine translation, enabling training without learning rate warmup using either Adam or SGD under a wide range of learning rates and momentum coefficients. Code is available at https://github.com/zhuchen03/gradinit.

ICML Conference 2021 Conference Paper

Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks

  • Avi Schwarzschild
  • Micah Goldblum
  • Arjun Gupta
  • John Dickerson 0001
  • Tom Goldstein

Data poisoning and backdoor attacks manipulate training data in order to cause models to fail during inference. A recent survey of industry practitioners found that data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. However, it remains unclear exactly how dangerous poisoning methods are and which ones are more effective considering that these methods, even ones with identical objectives, have not been tested in consistent or realistic settings. We observe that data poisoning and backdoor attacks are highly sensitive to variations in the testing setup. Moreover, we find that existing methods may not generalize to realistic settings. While these existing works serve as valuable prototypes for data poisoning, we apply rigorous tests to determine the extent to which we should fear them. In order to promote fair comparison in future work, we develop standardized benchmarks for data poisoning and backdoor attacks.

NeurIPS Conference 2021 Conference Paper

Long-Short Transformer: Efficient Transformers for Language and Vision

  • Chen Zhu
  • Wei Ping
  • Chaowei Xiao
  • Mohammad Shoeybi
  • Tom Goldstein
  • Anima Anandkumar
  • Bryan Catanzaro

Transformers have achieved success in both language and vision domains. However, it is prohibitively expensive to scale them to long sequences such as long documents or high-resolution images, because the self-attention mechanism has quadratic time and memory complexity with respect to the input sequence length. In this paper, we propose Long-Short Transformer (Transformer-LS), an efficient self-attention mechanism for modeling long sequences with linear complexity for both language and vision tasks. It aggregates a novel long-range attention with dynamic projection to model distant correlations and a short-term attention to capture fine-grained local correlations. We propose a dual normalization strategy to account for the scale mismatch between the two attention mechanisms. Transformer-LS can be applied to both autoregressive and bidirectional models without additional complexity. Our method outperforms state-of-the-art models on multiple tasks in language and vision domains, including the Long Range Arena benchmark, autoregressive language modeling, and ImageNet classification. For instance, Transformer-LS achieves 0.97 test BPC on enwik8 using half the number of parameters of the previous method, while being faster and able to handle sequences 3x as long as its full-attention version on the same hardware. On ImageNet, it obtains state-of-the-art results (e.g., a moderately sized 55.8M-parameter model trained solely on 224x224 ImageNet-1K reaches 84.1% Top-1 accuracy), while being more scalable on high-resolution images. The source code and models are released at https://github.com/NVIDIA/transformer-ls.

ICLR Conference 2021 Conference Paper

LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition

  • Valeriia Cherepanova
  • Micah Goldblum
  • Harrison Foley
  • Shiyuan Duan
  • John Dickerson 0001
  • Gavin Taylor
  • Tom Goldstein

Facial recognition systems are increasingly deployed by private corporations, government agencies, and contractors for consumer services and mass surveillance programs alike. These systems are typically built by scraping social media profiles for user images. Adversarial perturbations have been proposed for bypassing facial recognition systems. However, existing methods fail on full-scale systems and commercial APIs. We develop our own adversarial filter that accounts for the entire image processing pipeline and is demonstrably effective against industrial-grade pipelines that include face detection and large scale databases. Additionally, we release an easy-to-use webtool that significantly degrades the accuracy of Amazon Rekognition and the Microsoft Azure Face Recognition API, reducing the accuracy of each to below 1%.

ICLR Conference 2021 Conference Paper

The Intrinsic Dimension of Images and Its Impact on Learning

  • Phillip Pope
  • Chen Zhu 0001
  • Ahmed Abdelkader
  • Micah Goldblum
  • Tom Goldstein

It is widely believed that natural image data exhibits low-dimensional structure despite the high dimensionality of conventional pixel representations. This idea underlies a common intuition for the remarkable success of deep learning in computer vision. In this work, we apply dimension estimation tools to popular datasets and investigate the role of low-dimensional structure in deep learning. We find that common natural image datasets indeed have very low intrinsic dimension relative to the high number of pixels in the images. Additionally, we find that low dimensional datasets are easier for neural networks to learn, and models solving these tasks generalize better from training to test data. Along the way, we develop a technique for validating our dimension estimation tools on synthetic data generated by GANs allowing us to actively manipulate the intrinsic dimension by controlling the image generation process. Code for our experiments may be found \href{https://github.com/ppope/dimensions}{here}.

NeurIPS Conference 2021 Conference Paper

VQ-GNN: A Universal Framework to Scale up Graph Neural Networks using Vector Quantization

  • Mucong Ding
  • Kezhi Kong
  • Jingling Li
  • Chen Zhu
  • John Dickerson
  • Furong Huang
  • Tom Goldstein

Most state-of-the-art Graph Neural Networks (GNNs) can be defined as a form of graph convolution which can be realized by message passing between direct neighbors or beyond. To scale such GNNs to large graphs, various neighbor-, layer-, or subgraph-sampling techniques are proposed to alleviate the "neighbor explosion" problem by considering only a small subset of messages passed to the nodes in a mini-batch. However, sampling-based methods are difficult to apply to GNNs that utilize many-hops-away or global context each layer, show unstable performance for different tasks and datasets, and do not speed up model inference. We propose a principled and fundamentally different approach, VQ-GNN, a universal framework to scale up any convolution-based GNNs using Vector Quantization (VQ) without compromising the performance. In contrast to sampling-based techniques, our approach can effectively preserve all the messages passed to a mini-batch of nodes by learning and updating a small number of quantized reference vectors of global node representations, using VQ within each GNN layer. Our framework avoids the "neighbor explosion" problem of GNNs using quantized representations combined with a low-rank version of the graph convolution matrix. We show that such a compact low-rank version of the gigantic convolution matrix is sufficient both theoretically and experimentally. In company with VQ, we design a novel approximated message passing algorithm and a nontrivial back-propagation rule for our framework. Experiments on various types of GNN backbones demonstrate the scalability and competitive performance of our framework on large-graph node classification and link prediction benchmarks.

ICLR Conference 2021 Conference Paper

Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

  • Jonas Geiping
  • Liam Fowl
  • W. Ronny Huang
  • Wojciech Czaja
  • Gavin Taylor
  • Michael Moeller 0001
  • Tom Goldstein

Data Poisoning attacks modify training data to maliciously control a model trained on such data. In this work, we focus on targeted poisoning attacks which cause a reclassification of an unmodified test image and as such breach model integrity. We consider a particularly malicious poisoning attack that is both ``from scratch'' and ``clean label'', meaning we analyze an attack that successfully works against new, randomly initialized models, and is nearly imperceptible to humans, all while perturbing only a small fraction of the training data. Previous poisoning attacks against deep neural networks in this setting have been limited in scope and success, working only in simplified settings or being prohibitively expensive for large datasets. The central mechanism of the new attack is matching the gradient direction of malicious examples. We analyze why this works, supplement it with practical considerations, and show its threat to real-world practitioners, finding that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset. Finally, we demonstrate the limitations of existing defensive strategies against such an attack, concluding that data poisoning is a credible threat, even for large-scale deep learning systems.

ICLR Conference 2021 Conference Paper

WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

  • Renkun Ni
  • Hong-Min Chu
  • Oscar Castañeda
  • Ping-Yeh Chiang
  • Christoph Studer
  • Tom Goldstein

Low-precision neural networks represent both weights and activations with few bits, drastically reducing the cost of multiplications. Meanwhile, these products are accumulated using high-precision (typically 32-bit) additions. Additions dominate the arithmetic complexity of inference in quantized (e.g., binary) nets, and high precision is needed to avoid overflow. To further optimize inference, we propose WrapNet, an architecture that adapts neural networks to use low-precision (8-bit) additions while achieving classification accuracy comparable to their 32-bit counterparts. We achieve resilience to low-precision accumulation by inserting a cyclic activation layer that makes results invariant to overflow. We demonstrate the efficacy of our approach using both software and hardware platforms.
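
The cyclic activation mentioned above can be sketched as a periodic (here triangular) function whose period matches the accumulator's modulus, so values that wrap around due to overflow map to the same output; the exact functional form used in the paper may differ.

```python
import numpy as np

def cyclic_activation(acc, bits=8):
    """Overflow-invariant activation: acc and acc + 2**bits give identical outputs."""
    period = 2 ** bits
    phase = np.mod(acc, period) / period      # wrap into [0, 1)
    return 1.0 - 2.0 * np.abs(phase - 0.5)    # triangle wave with period `period`

# e.g. cyclic_activation(np.array([300.0])) == cyclic_activation(np.array([300.0 + 256]))
```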

ICML Conference 2020 Conference Paper

Adversarial Attacks on Copyright Detection Systems

  • Parsa Saadatpanah
  • Ali Shafahi
  • Tom Goldstein

It is well-known that many machine learning models are susceptible to adversarial attacks, in which an attacker evades a classifier by making small perturbations to inputs. This paper discusses how industrial copyright detection tools, which serve a central role on the web, are susceptible to adversarial attacks. As proof of concept, we describe a well-known music identification method and implement this system in the form of a neural net. We then attack this system using simple gradient methods and show that it is easily broken with white-box attacks. By scaling these perturbations up, we can create transfer attacks on industrial systems, such as the AudioTag copyright detector and YouTube’s Content ID system, using perturbations that are audible but significantly smaller than a random baseline. Our goal is to raise awareness of the threats posed by adversarial examples in this space and to highlight the importance of hardening copyright detection systems to attacks.

AAAI Conference 2020 Conference Paper

Adversarially Robust Distillation

  • Micah Goldblum
  • Liam Fowl
  • Soheil Feizi
  • Tom Goldstein

Knowledge distillation is effective for producing small, high-performance neural networks for classification, but these small networks are vulnerable to adversarial attacks. This paper studies how adversarial robustness transfers from teacher to student during knowledge distillation. We find that a large amount of robustness may be inherited by the student even when distilled on only clean images. Second, we introduce Adversarially Robust Distillation (ARD) for distilling robustness onto student networks. In addition to producing small models with high test accuracy like conventional distillation, ARD also passes the superior robustness of large networks onto the student. In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.

NeurIPS Conference 2020 Conference Paper

Adversarially Robust Few-Shot Learning: A Meta-Learning Approach

  • Micah Goldblum
  • Liam Fowl
  • Tom Goldstein

Previous work on adversarially robust neural networks for image classification requires large training sets and computationally expensive training procedures. On the other hand, few-shot learning methods are highly vulnerable to adversarial examples. The goal of our work is to produce networks which both perform well at few-shot classification tasks and are simultaneously robust to adversarial examples. We develop an algorithm, called Adversarial Querying (AQ), for producing adversarially robust meta-learners, and we thoroughly investigate the causes for adversarial vulnerability. Moreover, our method achieves far superior robust performance on few-shot image classification tasks, such as Mini-ImageNet and CIFAR-FS, than robust transfer learning.

ICLR Conference 2020 Conference Paper

Adversarially robust transfer learning

  • Ali Shafahi
  • Parsa Saadatpanah
  • Chen Zhu 0001
  • Amin Ghiasi
  • Christoph Studer
  • David W. Jacobs
  • Tom Goldstein

Transfer learning, in which a network is trained on one task and re-purposed on another, is often used to produce neural network classifiers when data is scarce or full-scale training is too costly. When the goal is to produce a model that is not only accurate but also adversarially robust, data scarcity and computational limitations become even more cumbersome. We consider robust transfer learning, in which we transfer not only performance but also robustness from a source model to a target domain. We start by observing that robust networks contain robust feature extractors. By training classifiers on top of these feature extractors, we produce new models that inherit the robustness of their parent networks. We then consider the case of "fine tuning" a network by re-training end-to-end in the target domain. When using lifelong learning strategies, this process preserves the robustness of the source network while achieving high accuracy. By using such strategies, it is possible to produce accurate and robust models with little data, and without the cost of adversarial training. Additionally, we can improve the generalization of adversarially trained models, while maintaining their robustness.

ICLR Conference 2020 Conference Paper

Breaking Certified Defenses: Semantic Adversarial Examples with Spoofed Robustness Certificates

  • Amin Ghiasi
  • Ali Shafahi
  • Tom Goldstein

Defenses against adversarial attacks can be classified into certified and non-certified. Certifiable defenses make networks robust within a certain $\ell_p$-bounded radius, so that it is impossible for the adversary to craft adversarial examples within the certified bound. We present an attack that maintains the imperceptibility property of adversarial examples while being outside of the certified radius. Furthermore, the proposed "Shadow Attack" can fool certifiably robust networks by producing an imperceptible adversarial example that gets misclassified and produces a strong ``spoofed'' certificate.

ICML Conference 2020 Conference Paper

Certified Data Removal from Machine Learning Models

  • Chuan Guo 0001
  • Tom Goldstein
  • Awni Y. Hannun
  • Laurens van der Maaten

Good data stewardship requires removal of data at the request of the data’s owner. This raises the question if and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
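
For intuition only: in the special case of ridge regression, a training point can be removed exactly with a rank-one (Sherman-Morrison) downdate, as sketched below; the paper's certified-removal mechanism for general regularized linear models instead uses an approximate Newton update plus calibrated noise to mask any residual information.

```python
import numpy as np

def remove_point_ridge(A_inv, Xty, x, y):
    """Exactly remove (x, y) from w = A^{-1} X^T y, where A = X^T X + lam * I.

    Returns the new weights along with the updated A^{-1} and X^T y so that
    further removals can be chained.
    """
    v = A_inv @ x
    A_inv_new = A_inv + np.outer(v, v) / (1.0 - x @ v)   # Sherman-Morrison downdate
    Xty_new = Xty - y * x
    return A_inv_new @ Xty_new, A_inv_new, Xty_new
```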

ICLR Conference 2020 Conference Paper

Certified Defenses for Adversarial Patches

  • Ping-Yeh Chiang
  • Renkun Ni
  • Ahmed Abdelkader
  • Chen Zhu 0001
  • Christoph Studer
  • Tom Goldstein

Adversarial patch attacks are among the most practical threat models against real-world computer vision systems. This paper studies certified and empirical defenses against patch attacks. We begin with a set of experiments showing that most existing defenses, which work by pre-processing input images to mitigate adversarial patches, are easily broken by simple white-box adversaries. Motivated by this finding, we propose the first certified defense against patch attacks, and propose faster methods for its training. Furthermore, we experiment with different patch shapes for testing, obtaining surprisingly good robustness transfer across shapes, and present preliminary results on certified defense against sparse attacks. Our complete implementation can be found on: https://github.com/Ping-C/certifiedpatchdefense.

NeurIPS Conference 2020 Conference Paper

Certifying Confidence via Randomized Smoothing

  • Aounon Kumar
  • Alexander Levine
  • Soheil Feizi
  • Tom Goldstein

Randomized smoothing has been shown to provide good certified-robustness guarantees for high-dimensional classification problems. It uses the probabilities of predicting the top two most-likely classes around an input point under a smoothing distribution to generate a certified radius for a classifier's prediction. However, most smoothing methods do not give us any information about the \emph{confidence} with which the underlying classifier (e.g., a deep neural network) makes a prediction. In this work, we propose a method to generate certified radii for the prediction confidence of the smoothed classifier. We consider two notions for quantifying confidence: average prediction score of a class and the margin by which the average prediction score of one class exceeds that of another. We modify the Neyman-Pearson lemma (a key theorem in randomized smoothing) to design a procedure for computing the certified radius where the confidence is guaranteed to stay above a certain threshold. Our experimental results on CIFAR-10 and ImageNet datasets show that using information about the distribution of the confidence scores allows us to achieve a significantly better certified radius than ignoring it. Thus, we demonstrate that extra information about the base classifier at the input point can help improve certified guarantees for the smoothed classifier. Code for the experiments is available at \url{https://github.com/aounon/cdf-smoothing}.

NeurIPS Conference 2020 Conference Paper

Certifying Strategyproof Auction Networks

  • Michael Curry
  • Ping-yeh Chiang
  • Tom Goldstein
  • John Dickerson

Optimal auctions maximize a seller's expected revenue subject to individual rationality and strategyproofness for the buyers. Myerson's seminal work in 1981 settled the case of auctioning a single item; however, subsequent decades of work have yielded little progress moving beyond a single item, leaving the design of revenue-maximizing auctions as a central open problem in the field of mechanism design. A recent thread of work in ``differentiable economics'' has used tools from modern deep learning to instead learn good mechanisms. We focus on the RegretNet architecture, which can represent auctions with arbitrary numbers of items and participants; it is trained to be empirically strategyproof, but the property is never exactly verified, leaving potential loopholes for market participants to exploit. We propose ways to explicitly verify strategyproofness under a particular valuation profile using techniques from the neural network verification literature. Doing so requires making several modifications to the RegretNet architecture in order to represent it exactly in an integer program. We train our network and produce certificates in several settings, including settings for which the optimal strategyproof mechanism is not known.

ICML Conference 2020 Conference Paper

Curse of Dimensionality on Randomized Smoothing for Certifiable Robustness

  • Aounon Kumar
  • Alexander Levine 0001
  • Tom Goldstein
  • Soheil Feizi

Randomized smoothing, using just a simple isotropic Gaussian distribution, has been shown to produce good robustness guarantees against $\ell_2$-norm bounded adversaries. In this work, we show that extending the smoothing technique to defend against other attack models can be challenging, especially in the high-dimensional regime. In particular, for a vast class of i.i.d. smoothing distributions, we prove that the largest $\ell_p$-radius that can be certified decreases as $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ with dimension $d$ for $p > 2$. Notably, for $p \geq 2$, this dependence on $d$ is no better than that of the $\ell_p$-radius that can be certified using isotropic Gaussian smoothing, essentially putting a matching lower bound on the robustness radius. When restricted to \emph{generalized} Gaussian smoothing, these two bounds can be shown to be within a constant factor of each other in an asymptotic sense, establishing that Gaussian smoothing provides the best possible results, up to a constant factor, when $p \geq 2$. We present experimental results on CIFAR to validate our theory. For other smoothing distributions, such as a uniform distribution within an $\ell_1$ or an $\ell_\infty$-norm ball, we show upper bounds of the form $O(1 / d)$ and $O(1 / d^{1 - \frac{1}{p}})$ respectively, which have an even worse dependence on $d$.
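
A quick numeric illustration of the stated $O(1/d^{\frac{1}{2} - \frac{1}{p}})$ scaling (constants ignored, purely for intuition about how fast the certifiable radius shrinks with dimension):

```python
# Illustration only: relative shrinkage of the certified l_p radius with
# dimension d under the stated O(1/d^{1/2 - 1/p}) rate (constants ignored).
# For p = 2 the exponent is 0 (no shrinkage); larger p shrinks faster.
for p in (2, 4, float("inf")):
    exponent = 0.5 - (0.0 if p == float("inf") else 1.0 / p)
    for d in (100, 10_000, 1_000_000):
        print(f"p={p}, d={d}: relative radius ~ {d ** -exponent:.4f}")
```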

NeurIPS Conference 2020 Conference Paper

Detection as Regression: Certified Object Detection with Median Smoothing

  • Ping-yeh Chiang
  • Michael Curry
  • Ahmed Abdelkader
  • Aounon Kumar
  • John Dickerson
  • Tom Goldstein

Despite the vulnerability of object detectors to adversarial attacks, very few defenses are known to date. While adversarial training can improve the empirical robustness of image classifiers, a direct extension to object detection is very expensive. This work is motivated by recent progress on certified classification by randomized smoothing. We start by presenting a reduction from object detection to a regression problem. Then, to enable certified regression, where standard mean smoothing fails, we propose median smoothing, which is of independent interest. We obtain the first model-agnostic, training-free, and certified defense for object detection against $\ell_2$-bounded attacks.
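
As a rough illustration of median smoothing for a scalar regressor, the sketch below estimates the smoothed median and brackets how far an $\ell_2$ perturbation of norm $\varepsilon$ can move it, using empirical percentiles at $\Phi(\pm\varepsilon/\sigma)$. This matches the idealized, infinite-sample bound; the finite-sample confidence corrections from the paper are omitted, and the regressor `h` is a toy placeholder.

```python
# Sketch: median smoothing for a scalar regressor h. The smoothed output is
# the median of h(x + noise); for a perturbation of l2 norm <= eps, the
# perturbed median is (in the idealized, infinite-sample limit) bracketed by
# the Phi(-eps/sigma) and Phi(+eps/sigma) percentiles of h(x + noise).
import numpy as np
from scipy.stats import norm

def certify_median(h, x, sigma=0.25, eps=0.1, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    outputs = np.array([h(x + sigma * rng.normal(size=x.shape))
                        for _ in range(n_samples)])
    median = np.quantile(outputs, 0.5)
    lower = np.quantile(outputs, norm.cdf(-eps / sigma))
    upper = np.quantile(outputs, norm.cdf(+eps / sigma))
    return median, (lower, upper)   # certified range of the smoothed median

# Toy regressor: a nonlinear scalar function standing in for, e.g., a box coordinate.
h = lambda z: float(np.tanh(z).sum())
print(certify_median(h, np.zeros(8)))
```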

ICLR Conference 2020 Conference Paper

FreeLB: Enhanced Adversarial Training for Natural Language Understanding

  • Chen Zhu 0001
  • Yu Cheng 0001
  • Zhe Gan
  • Siqi Sun
  • Tom Goldstein
  • Jingjing Liu 0001

Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we propose a novel adversarial training algorithm, FreeLB, that promotes higher invariance in the embedding space, by adding adversarial perturbations to word embeddings and minimizing the resultant adversarial risk inside different regions around input samples. To validate the effectiveness of the proposed approach, we apply it to Transformer-based models for natural language understanding and commonsense reasoning tasks. Experiments on the GLUE benchmark show that when applied only to the finetuning stage, it is able to improve the overall test scores of BERT-base model from 78.3 to 79.4, and RoBERTa-large model from 88.5 to 88.8. In addition, the proposed approach achieves state-of-the-art single-model test accuracies of 85.44% and 67.75% on ARC-Easy and ARC-Challenge. Experiments on CommonsenseQA benchmark further demonstrate that FreeLB can be generalized and boost the performance of RoBERTa-large model on other tasks as well.
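
The following is a minimal sketch of a FreeLB-style inner loop on a toy embedding classifier: perturbations are added to the word embeddings, ascended for K steps, and the parameter gradients from all K steps are accumulated before a single optimizer step. The model, sizes, and hyperparameters are placeholders, not the paper's configuration.

```python
# Minimal FreeLB-style sketch: adversarial perturbations on embeddings with
# K ascent steps, accumulating parameter gradients across steps, then one
# optimizer update. Toy model; all hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim, num_classes, K, adv_lr, eps = 100, 16, 2, 3, 1e-1, 1e-1
embed = nn.Embedding(vocab, dim)
clf = nn.Linear(dim, num_classes)
opt = torch.optim.Adam(list(embed.parameters()) + list(clf.parameters()), lr=1e-3)

tokens = torch.randint(0, vocab, (8, 12))      # (batch, seq_len)
labels = torch.randint(0, num_classes, (8,))

opt.zero_grad()
delta = torch.zeros(8, 12, dim, requires_grad=True)
for _ in range(K):
    emb = embed(tokens)                         # (batch, seq, dim)
    logits = clf((emb + delta).mean(dim=1))     # mean-pool then classify
    loss = F.cross_entropy(logits, labels) / K  # average param grads over steps
    loss.backward()
    with torch.no_grad():                       # normalized ascent on delta
        g = delta.grad
        delta += adv_lr * g / (g.norm() + 1e-12)
        delta.clamp_(-eps, eps)                 # crude projection for the sketch
    delta.grad.zero_()
opt.step()                                      # one update with accumulated grads
```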

NeurIPS Conference 2020 Conference Paper

MetaPoison: Practical General-purpose Clean-label Data Poisoning

  • W. Ronny Huang
  • Jonas Geiping
  • Liam Fowl
  • Gavin Taylor
  • Tom Goldstein

Data poisoning---the process by which an attacker takes control of a model by making imperceptible changes to a subset of the training data---is an emerging threat in the context of neural networks. Existing attacks for data poisoning neural networks have relied on hand-crafted heuristics, because solving the poisoning problem directly via bilevel optimization is generally thought of as intractable for deep models. We propose MetaPoison, a first-order method that approximates the bilevel problem via meta-learning and crafts poisons that fool neural networks. MetaPoison is effective: it outperforms previous clean-label poisoning methods by a large margin. MetaPoison is robust: poisoned data made for one model transfer to a variety of victim models with unknown training settings and architectures. MetaPoison is general-purpose: it works not only in fine-tuning scenarios, but also for end-to-end training from scratch, which until now hasn't been feasible for clean-label attacks with deep nets. MetaPoison can achieve arbitrary adversary goals---like using poisons of one class to make a target image don the label of another arbitrarily chosen class. Finally, MetaPoison works in the real world. We demonstrate for the first time successful data poisoning of models trained on the black-box Google Cloud AutoML API.

ICLR Conference 2020 Conference Paper

Network Deconvolution

  • Chengxi Ye
  • Matthew S. Evanusa
  • Hua He
  • Anton Mitrokhin
  • Tom Goldstein
  • James A. Yorke
  • Cornelia Fermüller
  • Yiannis Aloimonos

Convolution is a central operation in Convolutional Neural Networks (CNNs), which applies a kernel to overlapping regions shifted across the image. However, because of the strong correlations in real-world image data, convolutional kernels are in effect re-learning redundant data. In this work, we show that this redundancy has made neural network training challenging, and propose network deconvolution, a procedure which optimally removes pixel-wise and channel-wise correlations before the data is fed into each layer. Network deconvolution can be efficiently calculated at a fraction of the computational cost of a convolution layer. We also show that the deconvolution filters in the first layer of the network resemble the center-surround structure found in biological neurons in the visual regions of the brain. Filtering with such kernels results in a sparse representation, a desired property that has been missing in the training of neural networks. Learning from the sparse representation promotes faster convergence and superior results without the use of batch normalization. We apply our network deconvolution operation to 10 modern neural network models by replacing batch normalization within each. Extensive experiments show that the network deconvolution operation is able to deliver performance improvement in all cases on the CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, Cityscapes, and ImageNet datasets.
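
The sketch below shows only the channel-wise half of the idea: ZCA-whitening a feature map across channels before it is fed to the next layer. The full method also removes pixel-wise correlations over im2col patches and computes the transform efficiently during training; function and variable names here are illustrative.

```python
# Sketch: channel-wise decorrelation (ZCA whitening) of a feature map, in the
# spirit of network deconvolution. The paper's method also removes pixel-wise
# correlations over im2col patches; this only whitens across channels.
import numpy as np

def channel_whiten(x, eps=1e-5):
    # x: (batch, channels, height, width)
    b, c, h, w = x.shape
    flat = x.transpose(1, 0, 2, 3).reshape(c, -1)       # (c, b*h*w)
    flat = flat - flat.mean(axis=1, keepdims=True)
    cov = flat @ flat.T / flat.shape[1]                  # channel covariance (c, c)
    vals, vecs = np.linalg.eigh(cov)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    white = inv_sqrt @ flat                              # decorrelated channels
    return white.reshape(c, b, h, w).transpose(1, 0, 2, 3)

x = np.random.randn(4, 8, 16, 16)
x[:, 1] = 0.9 * x[:, 0] + 0.1 * x[:, 1]                  # inject channel correlation
xw = channel_whiten(x)
flat = xw.transpose(1, 0, 2, 3).reshape(8, -1)
print(np.round(np.cov(flat), 2))                         # approximately the identity
```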

ICML Conference 2020 Conference Paper

The Impact of Neural Network Overparameterization on Gradient Confusion and Stochastic Gradient Descent

  • Karthik Abinav Sankararaman
  • Soham De
  • Zheng Xu 0002
  • W. Ronny Huang
  • Tom Goldstein

This paper studies how neural network architecture affects the speed of training. We introduce a simple concept called gradient confusion to help formally analyze this. When gradient confusion is high, stochastic gradients produced by different data samples may be negatively correlated, slowing down convergence. But when gradient confusion is low, data samples interact harmoniously, and training proceeds quickly. Through theoretical and experimental results, we demonstrate how the neural network architecture affects gradient confusion, and thus the efficiency of training. Our results show that, for popular initialization techniques, increasing the width of neural networks leads to lower gradient confusion, and thus faster model training. On the other hand, increasing the depth of neural networks has the opposite effect. Our results indicate that alternate initialization techniques or networks using both batch normalization and skip connections help reduce the training burden of very deep networks.
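
A minimal sketch of what "gradient confusion" measures, using per-sample logistic-regression gradients: confusion is high when some pair of per-sample gradients has a strongly negative inner product. This is an illustrative diagnostic only, not the paper's experimental protocol.

```python
# Sketch: measuring gradient confusion for logistic regression. Confusion is
# high when per-sample gradients are negatively correlated; here we report the
# bound eta such that <g_i, g_j> >= -eta over all pairs in a batch.
import numpy as np

def per_sample_grads(w, X, y):
    # Logistic-loss gradient for each sample: (sigmoid(x.w) - y) * x
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X              # (n, d), one gradient per sample

def gradient_confusion(grads):
    n = len(grads)
    inner = grads @ grads.T
    pairs = inner[np.triu_indices(n, k=1)]   # all distinct pairs <g_i, g_j>
    return max(0.0, -pairs.min())            # smallest eta with <g_i,g_j> >= -eta

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 10))
y = (rng.random(32) < 0.5).astype(float)
w = rng.normal(size=10)
print(gradient_confusion(per_sample_grads(w, X, y)))
```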

ICLR Conference 2020 Conference Paper

Truth or backpropaganda? An empirical investigation of deep learning theory

  • Micah Goldblum
  • Jonas Geiping
  • Avi Schwarzschild
  • Michael Moeller 0001
  • Tom Goldstein

We empirically evaluate common assumptions about neural networks that are widely held by practitioners and theorists alike. In this work, we: (1) prove the widespread existence of suboptimal local minima in the loss landscape of neural networks, and we use our theory to find examples; (2) show that small-norm parameters are not optimal for generalization; (3) demonstrate that ResNets do not conform to wide-network theories, such as the neural tangent kernel, and that the interaction between skip connections and batch normalization plays a role; (4) find that rank does not correlate with generalization or robustness in a practical setting.

AAAI Conference 2020 Conference Paper

Universal Adversarial Training

  • Ali Shafahi
  • Mahyar Najibi
  • Zheng Xu
  • John Dickerson
  • Larry S. Davis
  • Tom Goldstein

Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images, while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, and also efficient methods for hardening networks to these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13× faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game, and produces robust models with only 2× the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.
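
Below is a minimal sketch of the two-player min-max game as an alternating loop: a single universal perturbation shared by every image is updated by signed gradient ascent while the model weights are updated by descent on the perturbed batch, both from the same backward pass. The model, data, and hyperparameters are toy placeholders.

```python
# Sketch of universal adversarial training as an alternating min-max game:
# one shared perturbation delta is ascended while the weights are descended,
# using a single backward pass per batch. Toy model and data.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
delta = torch.zeros(1, 3, 8, 8)                        # one universal perturbation
eps, step = 8 / 255, 2 / 255

for _ in range(100):                                   # toy training loop
    x = torch.rand(16, 3, 8, 8)
    y = torch.randint(0, 10, (16,))
    d = delta.clone().requires_grad_(True)
    loss = F.cross_entropy(model(torch.clamp(x + d, 0, 1)), y)
    opt.zero_grad()
    loss.backward()                                    # grads for weights and d together
    opt.step()                                         # descent step on the weights
    with torch.no_grad():                              # ascent step on the universal delta
        delta = torch.clamp(delta + step * d.grad.sign(), -eps, eps)
```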

ICML Conference 2020 Conference Paper

Unraveling Meta-Learning: Understanding Feature Representations for Few-Shot Tasks

  • Micah Goldblum
  • Steven Reich
  • Liam Fowl
  • Renkun Ni
  • Valeriia Cherepanova
  • Tom Goldstein

Meta-learning algorithms produce feature extractors which achieve state-of-the-art performance on few-shot classification. While the literature is rich with meta-learning methods, little is known about why the resulting feature extractors perform so well. We develop a better understanding of the underlying mechanics of meta-learning and the difference between models trained using meta-learning and models which are trained classically. In doing so, we introduce and verify several hypotheses for why meta-learned models perform better. Furthermore, we develop a regularizer which boosts the performance of standard training routines for few-shot classification. In many cases, our routine outperforms meta-learning while simultaneously running an order of magnitude faster.

NeurIPS Conference 2019 Conference Paper

Adversarial training for free!

  • Ali Shafahi
  • Mahyar Najibi
  • Mohammad Amin Ghiasi
  • Zheng Xu
  • John Dickerson
  • Christoph Studer
  • Larry Davis
  • Gavin Taylor

Adversarial training, in which a network is trained on adversarial examples, is one of the few defenses against adversarial attacks that withstands strong attacks. Unfortunately, the high cost of generating strong adversarial examples makes standard adversarial training impractical on large-scale problems like ImageNet. We present an algorithm that eliminates the overhead cost of generating adversarial examples by recycling the gradient information computed when updating model parameters. Our "free" adversarial training algorithm achieves comparable robustness to PGD adversarial training on the CIFAR-10 and CIFAR-100 datasets at negligible additional cost compared to natural training, and can be 7 to 30 times faster than other strong adversarial training methods. Using a single workstation with 4 P100 GPUs and 2 days of runtime, we can train a robust model for the large-scale ImageNet classification task that maintains 40% accuracy against PGD attacks.
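
The core trick of "free" adversarial training is minibatch replay: each backward pass is reused for both a weight update and a perturbation update. A minimal sketch with a toy model follows; the replay count, step sizes, and architecture are placeholders rather than the paper's settings.

```python
# Sketch of "free" adversarial training: each minibatch is replayed m times,
# and the single backward pass per replay yields both the parameter gradients
# (used immediately for a weight update) and the input gradient (recycled to
# take a PGD-like step on the perturbation). Toy model; placeholders throughout.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
eps, m = 8 / 255, 4
delta = torch.zeros(16, 3, 8, 8)                       # persists across minibatches

for _ in range(50):                                    # toy training loop
    x = torch.rand(16, 3, 8, 8)
    y = torch.randint(0, 10, (16,))
    for _ in range(m):                                 # minibatch replay
        adv = (x + delta).clamp(0, 1).requires_grad_(True)
        loss = F.cross_entropy(model(adv), y)
        opt.zero_grad()
        loss.backward()                                # one backward: weight + input grads
        opt.step()                                     # "free" weight update
        with torch.no_grad():
            delta = (delta + eps * adv.grad.sign()).clamp(-eps, eps)
```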

RLDM Conference 2019 Conference Abstract

Reinforcement Learning for Dynamic Set Packing

  • Michael J Curry
  • Duncan C McElfresh
  • Xuchen You
  • Cameron Moy
  • Tom Goldstein

Set packing is a classic combinatorial optimization problem: given a pool of elements, and a list of feasible subsets of those elements, choose subsets that contain as many elements as possible, with the constraint that the subsets must be disjoint. Many useful problems can be formulated in this way: we focus on the problem of finding the best clearing for a matching market of the type used for kidney exchange. Even very abstract versions of this problem are NP-hard, but much more complicated and realistic models based on this problem are frequently solved in practice by reduction to integer linear programming (ILP). Dynamic set packing is a generalization of set packing where there are multiple opportunities to match, and where elements may arrive or depart over time. This is a good model of the decision making process for an institution running a matching market: participants may arrive and depart, and the institution can wait to match some of them if it might be more efficient. Yet in practice, institutions tend to only consider the static problem, greedily performing a maximal matching at fixed time intervals. We wish to use reinforcement learning to find state-dependent matching policies that outperform this greedy approach. Our policies will not directly output a solution to the set packing problem, but will instead decide whether or not to have an ILP solver find a maximum weighted match, or else bias it to include or avoid certain elements in its solution. We hope the work will be of direct practical use for real-world matching markets, and of more general interest to anyone who wishes to approximately solve combinatorial optimization problems in dynamic settings. Inspired by recent work in computer science, which finds that some RL agents learn policies similar to known worst-case-optimal algorithms, we also hope to see how the learned policies may be similar or different to results from the economics literature about market thickness.
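
For concreteness, here is the static greedy clearing step the abstract contrasts against: at each fixed interval, take the largest feasible (disjoint) subsets from the current pool. A learned policy would instead decide when to match or which elements to hold back; this sketch is only the greedy baseline, with illustrative data.

```python
# Sketch of the greedy static set-packing baseline: greedily select the
# largest subsets that are pairwise disjoint. (The paper's ILP formulation
# and learned dynamic policies are not shown here.)
def greedy_set_packing(subsets):
    """subsets: list of sets of element ids; returns a disjoint sub-collection."""
    chosen, used = [], set()
    for s in sorted(subsets, key=len, reverse=True):   # biggest subsets first
        if used.isdisjoint(s):
            chosen.append(s)
            used |= s
    return chosen

pool = [{1, 2, 3}, {3, 4}, {4, 5}, {6}, {2, 6}]
print(greedy_set_packing(pool))   # [{1, 2, 3}, {4, 5}, {6}]
```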

ICML Conference 2019 Conference Paper

Transferable Clean-Label Poisoning Attacks on Deep Neural Nets

  • Chen Zhu 0001
  • W. Ronny Huang
  • Hengduo Li
  • Gavin Taylor
  • Christoph Studer
  • Tom Goldstein

In this paper, we explore clean-label poisoning attacks on deep convolutional networks with access to neither the network’s output nor its architecture or parameters. Our goal is to ensure that after injecting the poisons into the training data, a model with unknown architecture and parameters trained on that data will misclassify the target image into a specific class. To achieve this goal, we generate multiple poison images from the base class by adding small perturbations which cause the poison images to trap the target image within their convex polytope in feature space. We also demonstrate that using Dropout during crafting of the poisons and enforcing this objective in multiple layers enhances transferability, enabling attacks against both the transfer learning and end-to-end training settings. We demonstrate transferable attack success rates of over 50% by poisoning only 1% of the training set.

ICML Conference 2018 Conference Paper

Linear Spectral Estimators and an Application to Phase Retrieval

  • Ramina Ghods
  • Andrew S. Lan
  • Tom Goldstein
  • Christoph Studer

Phase retrieval refers to the problem of recovering real- or complex-valued vectors from magnitude measurements. The best-known algorithms for this problem are iterative in nature and rely on so-called spectral initializers that provide accurate initialization vectors. We propose a novel class of estimators suitable for general nonlinear measurement systems, called linear spectral estimators (LSPEs), which can be used to compute accurate initialization vectors for phase retrieval problems. The proposed LSPEs not only provide accurate initialization vectors for noisy phase retrieval systems with structured or random measurement matrices, but also enable the derivation of sharp and nonasymptotic mean-squared error bounds. We demonstrate the efficacy of LSPEs on synthetic and real-world phase retrieval problems, and we show that our estimators significantly outperform existing methods for structured measurement systems that arise in practice.
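
For context, the sketch below shows a classical (nonlinear) spectral initializer for real-valued phase retrieval: the leading eigenvector of $\frac{1}{m}\sum_i y_i a_i a_i^\top$. This is the kind of initializer the proposed linear spectral estimators (LSPEs) are meant to improve upon and analyze, not the LSPE itself.

```python
# Sketch of a classical spectral initializer for phase retrieval (not the
# paper's linear spectral estimator): the initial guess is the leading
# eigenvector of (1/m) * sum_i y_i a_i a_i^T, where y_i = |<a_i, x>|^2.
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 400
x_true = rng.normal(size=n)
A = rng.normal(size=(m, n))
y = (A @ x_true) ** 2                               # magnitude-only measurements

Y = (A.T * y) @ A / m                               # (1/m) sum_i y_i a_i a_i^T
vals, vecs = np.linalg.eigh(Y)
x0 = vecs[:, -1] * np.sqrt(np.mean(y))              # leading eigenvector, rescaled

# Correlation with the truth, up to the inherent global sign ambiguity.
corr = abs(x0 @ x_true) / (np.linalg.norm(x0) * np.linalg.norm(x_true))
print(round(corr, 3))
```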

NeurIPS Conference 2018 Conference Paper

Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

  • Ali Shafahi
  • W. Ronny Huang
  • Mahyar Najibi
  • Octavian Suciu
  • Christoph Studer
  • Tudor Dumitras
  • Tom Goldstein

Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural nets. The proposed attacks use ``clean-labels''; they don't require the attacker to have any control over the labeling of training data. They are also targeted; they control the behavior of the classifier on a specific test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous image (that is properly labeled) to a training set for a face recognition engine, and control the identity of a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by putting them online and waiting for them to be scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that just one single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a ``watermarking'' strategy that makes poisoning reliable using multiple (approx. 50) poisoned training instances. We demonstrate our method by generating poisoned frog images from the CIFAR dataset and using them to manipulate image classifiers.
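
The feature-collision objective at the heart of the attack can be sketched as follows: optimize a poison image to land near the target in feature space while staying visually close to a base-class image. The feature extractor, optimizer, and hyperparameters here are toy placeholders; the paper uses a forward-backward splitting scheme on a pretrained network.

```python
# Sketch of the feature-collision objective behind clean-label poisoning:
# minimize ||f(p) - f(t)||^2 + beta * ||p - b||^2 over the poison p, so that
# p resembles the base image b but collides with the target t in feature space.
import torch
import torch.nn as nn

f = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                  nn.AdaptiveAvgPool2d(1), nn.Flatten())   # stand-in feature extractor
for p in f.parameters():
    p.requires_grad_(False)                                # extractor stays fixed

base = torch.rand(1, 3, 32, 32)      # base-class image the poison must resemble
target = torch.rand(1, 3, 32, 32)    # test-time target to be misclassified
beta = 0.1

poison = base.clone().requires_grad_(True)
opt = torch.optim.Adam([poison], lr=1e-2)
for _ in range(200):
    loss = ((f(poison) - f(target)) ** 2).sum() + beta * ((poison - base) ** 2).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        poison.clamp_(0, 1)          # keep the poison a valid image
```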

NeurIPS Conference 2018 Conference Paper

Visualizing the Loss Landscape of Neural Nets

  • Hao Li
  • Zheng Xu
  • Gavin Taylor
  • Christoph Studer
  • Tom Goldstein

Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that train easier, and well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature, and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
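
A minimal sketch of the filter-normalization idea: draw a random direction with the same shape as the weights, rescale each filter of the direction to the norm of the corresponding trained filter, and plot the loss along that direction. The helper names and the toy model below are illustrative, not the paper's released code.

```python
# Sketch of "filter normalization" for loss-landscape slices: each filter of a
# random direction is rescaled to the norm of the corresponding trained filter,
# so 1-D/2-D loss plots are comparable across models.
import torch
import torch.nn.functional as F

def filter_normalized_direction(model):
    direction = []
    for w in model.parameters():
        d = torch.randn_like(w)
        if w.dim() > 1:                                   # conv/linear weights: per-filter
            for dw, ww in zip(d, w):
                dw.mul_(ww.norm() / (dw.norm() + 1e-10))
        else:
            d.mul_(w.norm() / (d.norm() + 1e-10))         # biases: whole-vector scaling
        direction.append(d)
    return direction

def loss_along_direction(model, loss_fn, direction, alphas):
    theta = [p.detach().clone() for p in model.parameters()]
    values = []
    with torch.no_grad():
        for a in alphas:                                  # evaluate loss at theta + a*d
            for p, t, d in zip(model.parameters(), theta, direction):
                p.copy_(t + a * d)
            values.append(loss_fn(model).item())
        for p, t in zip(model.parameters(), theta):       # restore original weights
            p.copy_(t)
    return values

x, y = torch.randn(16, 4), torch.randn(16, 2)
model = torch.nn.Linear(4, 2)
d = filter_normalized_direction(model)
print(loss_along_direction(model, lambda m: F.mse_loss(m(x), y), d, [-1.0, 0.0, 1.0]))
```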

ICML Conference 2017 Conference Paper

Adaptive Consensus ADMM for Distributed Optimization

  • Zheng Xu 0002
  • Gavin Taylor
  • Hao Li 0022
  • Mário A. T. Figueiredo
  • Xiaoming Yuan
  • Tom Goldstein

The alternating direction method of multipliers (ADMM) is commonly used for distributed model fitting problems, but its performance and reliability depend strongly on user-defined penalty parameters. We study distributed ADMM methods that boost performance by using different fine-tuned algorithm parameters on each worker node. We present an O(1/k) convergence rate for adaptive ADMM methods with node-specific parameters, and propose adaptive consensus ADMM (ACADMM), which automatically tunes parameters without user oversight.
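
For reference, the sketch below implements the plain (non-adaptive) consensus ADMM baseline on a distributed least-squares fit, with a single fixed penalty rho shared by all workers; ACADMM's contribution is tuning a per-node penalty automatically, which is not shown here. The problem sizes are illustrative.

```python
# Sketch of non-adaptive consensus ADMM for distributed least squares: each
# worker i holds (X_i, y_i) and a local variable x_i, all tied to a global
# consensus variable z. A single fixed rho is used (the baseline ACADMM tunes).
import numpy as np

rng = np.random.default_rng(0)
d, workers, rho = 5, 4, 1.0
w_true = rng.normal(size=d)
data = []
for _ in range(workers):
    X = rng.normal(size=(50, d))
    data.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

z = np.zeros(d)
u = [np.zeros(d) for _ in range(workers)]
for _ in range(100):
    # x-update (local, closed form): argmin 1/2||X_i x - y_i||^2 + rho/2||x - z + u_i||^2
    xs = [np.linalg.solve(X.T @ X + rho * np.eye(d), X.T @ y + rho * (z - ui))
          for (X, y), ui in zip(data, u)]
    # z-update (consensus averaging) and dual updates
    z = np.mean([x + ui for x, ui in zip(xs, u)], axis=0)
    u = [ui + x - z for x, ui in zip(xs, u)]

print(np.round(z - w_true, 3))   # close to zero after convergence
```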

ICML Conference 2017 Conference Paper

Convex Phase Retrieval without Lifting via PhaseMax

  • Tom Goldstein
  • Christoph Studer

Semidefinite relaxation methods transform a variety of non-convex optimization problems into convex problems, but square the number of variables. We study a new type of convex relaxation for phase retrieval problems, called PhaseMax, that convexifies the underlying problem without lifting. The resulting problem formulation can be solved using standard convex optimization routines, while still working in the original, low-dimensional variable space. We prove, using a random spherical distribution measurement model, that PhaseMax succeeds with high probability for a sufficiently large number of measurements. We compare our approach to other phase retrieval methods and demonstrate that our theory accurately predicts the success of PhaseMax.
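 
A small real-valued sketch of the PhaseMax formulation follows: given magnitudes $b_i = |\langle a_i, x\rangle|$ and an anchor vector $\hat{x}$, maximize $\langle \hat{x}, x\rangle$ subject to $|\langle a_i, x\rangle| \le b_i$, a convex program in the original n-dimensional variable. The crude anchor below stands in for a proper spectral initializer, and cvxpy is used purely for illustration.

```python
# Sketch of the (real-valued) PhaseMax relaxation: maximize <xhat, x> subject
# to |<a_i, x>| <= b_i. Convex in the original n-dimensional variable; no
# lifting to an n x n matrix is required.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 20, 120
x_true = rng.normal(size=n)
A = rng.normal(size=(m, n))
b = np.abs(A @ x_true)

# Crude anchor correlated with the truth (a spectral initializer would
# normally be used here).
xhat = x_true + 1.5 * rng.normal(size=n)

x = cp.Variable(n)
prob = cp.Problem(cp.Maximize(xhat @ x), [cp.abs(A @ x) <= b])
prob.solve()

corr = abs(x.value @ x_true) / (np.linalg.norm(x.value) * np.linalg.norm(x_true))
print(round(corr, 3))   # near 1 when recovery succeeds (up to global sign)
```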

NeurIPS Conference 2017 Conference Paper

Training Quantized Nets: A Deeper Understanding

  • Hao Li
  • Soham De
  • Zheng Xu
  • Christoph Studer
  • Hanan Samet
  • Tom Goldstein

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have mostly been empirical. In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic.
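
The sketch below illustrates the "high-precision accumulator" scheme the paper analyzes (BinaryConnect-style training): full-precision shadow weights are kept and updated, while the forward pass uses their sign. Fully quantized training would instead round the weights themselves after every update, losing the greedy search phase the paper identifies. The model and data are toy placeholders.

```python
# Sketch of BinaryConnect-style training with a high-precision accumulator:
# quantized weights (sign of the shadow weights) are used in the forward pass,
# but gradient updates are applied to the full-precision shadow weights.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X @ rng.normal(size=10) > 0).astype(float)

w = 0.01 * rng.normal(size=10)             # full-precision shadow weights
lr = 0.05
for _ in range(300):
    wq = np.sign(w)                         # quantized weights used by the model
    p = 1.0 / (1.0 + np.exp(-X @ wq))       # forward pass with quantized weights
    grad = X.T @ (p - y) / len(y)           # straight-through gradient w.r.t. wq...
    w -= lr * grad                          # ...applied to the shadow weights
print(np.mean((1 / (1 + np.exp(-X @ np.sign(w))) > 0.5) == y))   # train accuracy
```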

ICML Conference 2016 Conference Paper

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data

  • Andrew S. Lan
  • Tom Goldstein
  • Richard G. Baraniuk
  • Christoph Studer

Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students’ knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student’s success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves prediction performance comparable to or better than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable—they provide key insights into which concepts are critical (i.e., the “dealbreaker”) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.

ICML Conference 2016 Conference Paper

Training Neural Networks Without Gradients: A Scalable ADMM Approach

  • Gavin Taylor
  • Ryan Burmeister
  • Zheng Xu 0002
  • Bharat Singh
  • Ankit B. Patel
  • Tom Goldstein

With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don’t scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The proposed method reduces the network training problem to a sequence of minimization sub-steps that can each be solved globally in closed form. The proposed method is advantageous because it avoids many of the caveats that make gradient methods slow on highly non-convex problems. In addition, the method exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.

NeurIPS Conference 2015 Conference Paper

Adaptive Primal-Dual Splitting Methods for Statistical Learning and Image Processing

  • Tom Goldstein
  • Min Li
  • Xiaoming Yuan

The alternating direction method of multipliers (ADMM) is an important tool for solving complex optimization problems, but it involves minimization sub-steps that are often difficult to solve efficiently. The Primal-Dual Hybrid Gradient (PDHG) method is a powerful alternative that often has simpler substeps than ADMM, thus producing lower complexity solvers. Despite the flexibility of this method, PDHG is often impractical because it requires the careful choice of multiple stepsize parameters. There is often no intuitive way to choose these parameters to maximize efficiency, or even achieve convergence. We propose self-adaptive stepsize rules that automatically tune PDHG parameters for optimal convergence. We rigorously analyze our methods, and identify convergence rates. Numerical experiments show that adaptive PDHG has strong advantages over non-adaptive methods in terms of both efficiency and simplicity for the user.
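
For reference, here is the plain (non-adaptive) PDHG iteration that the paper's stepsize rules tune, applied to a small problem of the form $\min_x f(Kx) + g(x)$ with $f(Kx) = \|Kx - b\|_1$ and $g(x) = \frac{\lambda}{2}\|x\|^2$, so both proximal maps are simple. The fixed stepsizes and problem sizes are illustrative only; the paper's contribution is choosing these stepsizes adaptively.

```python
# Sketch of the non-adaptive PDHG iteration: alternating a proximal dual
# ascent step, a proximal primal descent step, and an over-relaxation update.
# Here f(Kx) = ||Kx - b||_1 and g(x) = (lam/2)||x||^2.
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 60, 30, 0.1
K = rng.normal(size=(m, n))
b = K @ rng.normal(size=n)

tau = sigma = 0.9 / np.linalg.norm(K, 2)        # fixed steps, tau*sigma*||K||^2 < 1
x = np.zeros(n); x_bar = x.copy(); y = np.zeros(m)
for _ in range(500):
    # Dual step: prox of sigma*f*, i.e. shift by sigma*b then project onto ||y||_inf <= 1.
    y = np.clip(y + sigma * (K @ x_bar) - sigma * b, -1.0, 1.0)
    # Primal step: prox of tau*g is a simple shrinkage.
    x_new = (x - tau * (K.T @ y)) / (1.0 + tau * lam)
    x_bar = 2 * x_new - x                        # over-relaxation
    x = x_new

print(round(np.abs(K @ x - b).sum() + 0.5 * lam * np.sum(x ** 2), 4))  # objective value
```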