Arrow Research search

Author name cluster

John Kirchenbauer

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

NeurIPS Conference 2025 · Conference Paper

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

  • Sean McLeish
  • John Kirchenbauer
  • David Miller
  • Siddharth Singh
  • Abhinav Bhatele
  • Micah Goldblum
  • Ashwinee Panda
  • Tom Goldstein

Scaling laws are typically fit using a family of models with a narrow range of frozen hyperparameter choices. In this work we study scaling laws using multiple architectural shapes and hyperparameter choices, highlighting their impact on the resulting prescriptions. As a primary artifact of our research, we release the Gemstones: an open-source scaling law dataset consisting of over 4000 checkpoints from transformers with up to 2 billion parameters and diverse architectural shapes, including ablations over learning rate and cooldown. Our checkpoints enable more complex studies of scaling, such as analyzing the relationship between width and depth. By examining our model suite, we find that the prescriptions of scaling laws can be highly sensitive to the experimental design process and the specific model checkpoints used during fitting.
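The kind of fit these checkpoints support can be illustrated with a short sketch. The snippet below fits the standard Chinchilla-style parametric form L(N, D) = E + A/N^alpha + B/D^beta to (parameters, tokens, loss) triples; the arrays are placeholder values, not Gemstones data, and the paper's actual fitting protocol may differ.

```python
# Placeholder (N, D, loss) values chosen only so the fit converges;
# substitute losses read from real checkpoints.
import numpy as np
from scipy.optimize import curve_fit

def chinchilla_loss(ND, E, A, alpha, B, beta):
    """Chinchilla-style parametric law: L(N, D) = E + A/N^alpha + B/D^beta."""
    N, D = ND
    return E + A / N**alpha + B / D**beta

N = np.array([5e7, 1e8, 2.5e8, 5e8, 1e9, 2e9])          # model parameters
D = np.array([1e9, 2e9, 5e9, 1e10, 2e10, 5e10])         # training tokens
loss = np.array([3.90, 3.48, 3.05, 2.79, 2.59, 2.39])   # final losses (placeholder)

popt, _ = curve_fit(chinchilla_loss, (N, D), loss,
                    p0=[1.5, 300.0, 0.3, 300.0, 0.3], maxfev=20000)
E, A, alpha, B, beta = popt
print(f"L(N, D) = {E:.2f} + {A:.0f}/N^{alpha:.2f} + {B:.0f}/D^{beta:.2f}")
```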

NeurIPS Conference 2025 · Conference Paper

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

  • Jonas Geiping
  • Sean McLeish
  • Neel Jain
  • John Kirchenbauer
  • Siddharth Singh
  • Brian Bartoldson
  • Bhavya Kailkhura
  • Abhinav Bhatele

We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We train a proof-of-concept model from scratch with 3.5 billion parameters and 800 billion tokens. We show that this model can effortlessly use varying levels of compute, improving significantly with additional compute, especially on reasoning tasks such as math and coding. Further, this architecture naturally reduces compute costs via zero-shot per-token adaptive compute, KV-cache sharing, and speculative decoding.
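A minimal sketch of the depth-recurrence idea, assuming a simple prelude/core/coda split; all module names and shapes here are illustrative stand-ins, not the released model's architecture or API:

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)            # "prelude"
        self.core = nn.TransformerEncoderLayer(d_model, nhead=8,
                                               batch_first=True)  # shared block
        self.out = nn.Linear(d_model, vocab_size)                 # "coda"

    def forward(self, tokens, num_iterations=4):
        e = self.embed(tokens)
        s = torch.randn_like(e)            # random initial latent state
        for _ in range(num_iterations):    # same weights, unrolled at test time
            s = self.core(s + e)           # re-inject the input every step
        return self.out(s)

model = RecurrentDepthLM()
x = torch.randint(0, 32000, (1, 16))
logits_cheap = model(x, num_iterations=2)   # little test-time compute
logits_deep = model(x, num_iterations=32)   # much more compute, same parameters
```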

NeurIPS Conference 2025 · Conference Paper

The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

  • Nikhil Kandpal
  • Brian Lester
  • Colin Raffel
  • Sebastian Majstorovic
  • Stella Biderman
  • Baber Abbasi
  • Luca Soldaini
  • Enrico Shippole

Large language models (LLMs) are typically trained on enormous quantities of unlicensed text, a practice that has led to scrutiny due to possible intellectual property infringement and ethical concerns. Training LLMs on openly licensed text is a first step towards addressing these issues, but prior data collection efforts have yielded datasets too small or too low-quality to produce performant LLMs. To address this gap, we collect, curate, and release the Common Pile v0.1, an eight-terabyte collection of openly licensed text designed for LLM pretraining. The Common Pile comprises content from 30 sources spanning diverse domains, including research papers, code, books, encyclopedias, educational materials, audio transcripts, and more. Crucially, we validate our efforts by training two 7 billion parameter LLMs on text from the Common Pile: Comma v0.1-1T and Comma v0.1-2T, trained on 1 and 2 trillion tokens respectively. Both models attain performance competitive with LLMs trained on unlicensed text with similar computational budgets, such as Llama 1 and 2 7B. In addition to releasing the Common Pile v0.1 itself, we also release the code used in its creation as well as the training mixture and checkpoints for the Comma v0.1 models.

NeurIPS Conference 2024 · Conference Paper

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

  • Abhimanyu Hans
  • Yuxin Wen
  • Neel Jain
  • John Kirchenbauer
  • Hamid Kazemi
  • Prajwal Singhania
  • Siddharth Singh
  • Gowthami Somepalli

Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens is excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale LLaMA-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks. Code and checkpoints: https://github.com/ahans30/goldfish-loss
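A minimal sketch of the loss itself; the paper's released code uses a hashing-based mask, which is simplified here to an i.i.d. Bernoulli drop:

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, drop_prob=0.25):
    """Next-token cross-entropy over a random keep-mask of positions.

    logits: (batch, seq, vocab); labels: (batch, seq).
    """
    # Positions with keep == False contribute no gradient, so the model is
    # never directly supervised to reproduce those tokens.
    keep = torch.rand_like(labels, dtype=torch.float32) >= drop_prob
    per_token = F.cross_entropy(
        logits.flatten(0, 1), labels.flatten(), reduction="none"
    ).view(labels.shape)
    return (per_token * keep).sum() / keep.sum().clamp(min=1)

logits = torch.randn(2, 8, 100)
labels = torch.randint(0, 100, (2, 8))
loss = goldfish_loss(logits, labels)  # roughly 3/4 of positions supervised
```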

ICLR Conference 2024 · Conference Paper

NEFTune: Noisy Embeddings Improve Instruction Finetuning

  • Neel Jain
  • Ping-Yeh Chiang
  • Yuxin Wen
  • John Kirchenbauer
  • Hong-Min Chu
  • Gowthami Somepalli
  • Brian R. Bartoldson
  • Bhavya Kailkhura

We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves $29.79$\% on AlpacaEval, which rises to $64.69$\% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a $10$\% improvement, with ShareGPT an $8$\% improvement, and with OpenPlatypus an $8$\% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune. Particularly, we see these improvements on the conversational abilities of the instruction model and not on traditional tasks like those on the OpenLLM Leaderboard, where performance is the same.
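A minimal sketch of the augmentation: the paper scales uniform noise by alpha / sqrt(L * d), where L is the sequence length and d the embedding dimension, and applies it only during training. The function name is illustrative.

```python
import torch

def neftune_embeddings(embeds, alpha=5.0, training=True):
    """Add scaled uniform noise to token embeddings (training only).

    embeds: (batch, seq_len, dim).
    """
    if not training:
        return embeds
    L, d = embeds.shape[1], embeds.shape[2]
    noise = torch.empty_like(embeds).uniform_(-1, 1)
    return embeds + noise * (alpha / (L * d) ** 0.5)
```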

ICLR Conference 2024 · Conference Paper

On the Reliability of Watermarks for Large Language Models

  • John Kirchenbauer
  • Jonas Geiping
  • Yuxin Wen
  • Manli Shu
  • Khalid Saifullah
  • Kezhi Kong
  • Kasun Fernando
  • Aniruddha Saha

As LLMs become commonplace, machine-generated text has the potential to flood the internet with spam, social media bots, and valueless content. _Watermarking_ is a simple and effective strategy for mitigating such harms by enabling the detection and documentation of LLM-generated text. Yet a crucial question remains: How reliable is watermarking in realistic settings in the wild? There, watermarked text may be modified to suit a user's needs, or entirely rewritten to avoid detection. We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. While these attacks dilute the strength of the watermark, paraphrases are statistically likely to leak n-grams or even longer fragments of the original text, resulting in high-confidence detections when enough tokens are observed. For example, after strong human paraphrasing the watermark is detectable after observing 800 tokens on average, when setting a $1\mathrm{e}{-5}$ false positive rate. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document, and we compare the robustness of watermarking to other kinds of detectors.
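The detection statistic behind these experiments fits in a few lines. The sketch below implements the one-proportion z-test used in this line of work: count the observed tokens that fall in their context's green list and standardize against the null hypothesis that a fraction gamma would do so by chance.

```python
import math

def watermark_z_score(green_count, total_tokens, gamma=0.25):
    """One-proportion z-test: how far the green-token count exceeds chance.

    gamma is the fraction of the vocabulary marked green at each step.
    """
    expected = gamma * total_tokens
    variance = total_tokens * gamma * (1 - gamma)
    return (green_count - expected) / math.sqrt(variance)

# 310 green tokens observed out of 800 with gamma = 0.25 gives z of about 9,
# far beyond the threshold needed for a 1e-5 false positive rate.
print(watermark_z_score(310, 800))
```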

NeurIPS Conference 2024 · Conference Paper

Transformers Can Do Arithmetic with the Right Embeddings

  • Sean McLeish
  • Arpit Bansal
  • Alex Stein
  • Neel Jain
  • John Kirchenbauer
  • Brian R. Bartoldson
  • Bhavya Kailkhura
  • Abhinav Bhatele

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks, including sorting and multiplication.
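A minimal sketch of such a digit-position embedding; the character-level tokenization and module names are hypothetical stand-ins for the paper's actual implementation:

```python
import torch
import torch.nn as nn

DIGITS = set("0123456789")

def digit_offsets(tokens):
    """1-based offset of each character within its digit run; 0 for non-digits."""
    offsets, run = [], 0
    for t in tokens:
        run = run + 1 if t in DIGITS else 0
        offsets.append(run)
    return torch.tensor(offsets)

class DigitPositionEmbedding(nn.Module):
    def __init__(self, max_digits=128, d_model=256):
        super().__init__()
        self.pos = nn.Embedding(max_digits + 1, d_model)  # index 0 = not a digit

    def forward(self, token_embeds, tokens):
        # Added on top of the usual token embeddings, so the model can tell
        # which decimal place each digit occupies.
        return token_embeds + self.pos(digit_offsets(tokens))

tokens = list("123+456=")
emb = DigitPositionEmbedding()
x = emb(torch.zeros(len(tokens), 256), tokens)  # offsets: 1,2,3,0,1,2,3,0
```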

ICML Conference 2023 · Conference Paper

A Watermark for Large Language Models

  • John Kirchenbauer
  • Jonas Geiping
  • Yuxin Wen
  • Jonathan Katz
  • Ian Miers
  • Tom Goldstein

Potential harms of large language models can be mitigated by watermarking model output, i.e., embedding signals into generated text that are invisible to humans but algorithmically detectable from a short span of tokens. We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality, and can be detected using an efficient open-source algorithm without access to the language model API or parameters. The watermark works by selecting a randomized set of "green" tokens before a word is generated, and then softly promoting use of green tokens during sampling. We propose a statistical test for detecting the watermark with interpretable p-values, and derive an information-theoretic framework for analyzing the sensitivity of the watermark. We test the watermark using a multi-billion parameter model from the Open Pretrained Transformer (OPT) family, and discuss robustness and security.
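A minimal sketch of the generation-side rule described in the abstract: hash the previous token to seed the green/red partition, then add a bias delta to green-token logits before sampling. The specific hash and constants here are illustrative.

```python
import torch

def watermarked_next_token(logits, prev_token, gamma=0.25, delta=2.0, key=42):
    """logits: (vocab,) for the next position; returns a sampled token id."""
    vocab_size = logits.shape[-1]
    # Seed the green/red partition with (secret key, previous token).
    gen = torch.Generator().manual_seed(key * 1_000_003 + int(prev_token))
    green = torch.randperm(vocab_size, generator=gen)[: int(gamma * vocab_size)]
    biased = logits.clone()
    biased[green] += delta  # softly promote green tokens; never hard-block red
    probs = torch.softmax(biased, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

next_id = watermarked_next_token(torch.randn(50257), prev_token=11)
```

Because the partition depends only on the key and the previous token, a detector with the key can recount green tokens in any text without querying the model, which is what the statistical test above exploits.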

ICML Conference 2023 · Conference Paper

GOAT: A Global Transformer on Large-scale Graphs

  • Kezhi Kong
  • Jiuhai Chen
  • John Kirchenbauer
  • Renkun Ni
  • C. Bayan Bruss
  • Tom Goldstein

Graph transformers have been competitive on graph classification tasks, but they fail to outperform Graph Neural Networks (GNNs) on node classification, which is a common task performed on large-scale graphs for industrial applications. Meanwhile, existing GNN architectures are limited in their ability to perform equally well on both homophilious and heterophilious graphs as their inductive biases are generally tailored to only one setting. To address these issues, we propose GOAT, a scalable global graph transformer. In GOAT, each node conceptually attends to all the nodes in the graph and homophily/heterophily relationships can be learnt adaptively from the data. We provide theoretical justification for our approximate global self-attention scheme, and show it to be scalable to large-scale graphs. We demonstrate the competitiveness of GOAT on both heterophilious and homophilious graphs with millions of nodes.
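The scalability claim rests on each node attending to a compressed summary of the graph rather than to all N nodes. The sketch below illustrates one such codebook approximation using plain k-means over node features; GOAT's actual codebook is learned during training, so treat this as an assumption-laden illustration, not the paper's algorithm.

```python
import torch
from sklearn.cluster import KMeans

def approx_global_attention(x, num_centroids=64):
    """x: (num_nodes, dim) node features; cost is O(N*K) instead of O(N^2)."""
    km = KMeans(n_clusters=num_centroids, n_init=4).fit(x.numpy())
    centroids = torch.tensor(km.cluster_centers_, dtype=x.dtype)   # (K, dim)
    attn = torch.softmax(x @ centroids.T / x.shape[-1] ** 0.5, dim=-1)
    return attn @ centroids  # every node mixes K global summaries

out = approx_global_attention(torch.randn(10000, 64))  # "attends" to all nodes
```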

NeurIPS Conference 2023 · Conference Paper

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

  • Yuxin Wen
  • Neel Jain
  • John Kirchenbauer
  • Micah Goldblum
  • Jonas Geiping
  • Tom Goldstein

The strength of modern generative models lies in their ability to be controlled through prompts. Hard prompts comprise interpretable words and tokens, and are typically hand-crafted by humans. Soft prompts, on the other hand, consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily edited, re-used across models, or plugged into a text-based interface. We describe an easy-to-use approach to automatically optimize hard text prompts through efficient gradient-based optimization. Our approach can be readily applied to text-to-image and text-only applications alike. This method allows API users to easily generate, discover, and mix and match image concepts without prior knowledge of how to prompt the model. Furthermore, using our method, we can bypass token-level content filters imposed by Midjourney by optimizing through the open-sourced text encoder.
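A minimal sketch of the projected-gradient loop the abstract alludes to: keep a continuous prompt, snap it to the nearest real token embeddings for the forward pass, and let the gradient measured there update the continuous copy. `loss_fn` and `embed_table` are stand-ins for a real frozen model, e.g. a CLIP text encoder.

```python
import torch

def optimize_hard_prompt(embed_table, loss_fn, prompt_len=8, steps=200, lr=0.1):
    """embed_table: (vocab, dim) frozen token embeddings; returns token ids."""
    soft = torch.randn(prompt_len, embed_table.shape[1], requires_grad=True)
    opt = torch.optim.Adam([soft], lr=lr)
    for _ in range(steps):
        with torch.no_grad():  # nearest-neighbor projection onto real tokens
            ids = torch.cdist(soft, embed_table).argmin(dim=-1)
        hard = embed_table[ids]
        # Straight-through: the forward pass sees the hard prompt, but the
        # gradient flows back into the continuous `soft` copy.
        loss = loss_fn(soft + (hard - soft).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return torch.cdist(soft, embed_table).argmin(dim=-1)
```

The returned ids decode to an ordinary discrete prompt, which is what makes the result portable across models and usable through text-only APIs.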

NeurIPS Conference 2023 · Conference Paper

Tree-Ring Watermarks: Invisible Fingerprints for Diffusion Images

  • Yuxin Wen
  • John Kirchenbauer
  • Jonas Geiping
  • Tom Goldstein

Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content. In this paper, we introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs. Unlike existing methods that perform post-hoc modifications to images after sampling, Tree-Ring Watermarking subtly influences the entire sampling process, resulting in a model fingerprint that is invisible to humans. The watermark embeds a pattern into the initial noise vector used for sampling. These patterns are structured in Fourier space so that they are invariant to convolutions, crops, dilations, flips, and rotations. After image generation, the watermark signal is detected by inverting the diffusion process to retrieve the noise vector, which is then checked for the embedded signal. We demonstrate that this technique can be easily applied to arbitrary diffusion models, including text-conditioned Stable Diffusion, as a plug-in with negligible loss in FID. Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
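A minimal sketch of the Fourier-space ring pattern, with the diffusion inversion step omitted (in the real method, detection first DDIM-inverts the generated image back to an estimated noise vector). Sizes and the threshold here are illustrative.

```python
import torch

def ring_mask(size=64, r_min=4, r_max=12):
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    r = ((ys - size // 2) ** 2 + (xs - size // 2) ** 2).float().sqrt()
    return (r >= r_min) & (r <= r_max)          # concentric rings around DC

def embed_key(noise, key, mask):
    f = torch.fft.fftshift(torch.fft.fft2(noise))
    f[mask] = key[mask].to(f.dtype)             # write the key into the rings
    return torch.fft.ifft2(torch.fft.ifftshift(f)).real

def detect(noise_estimate, key, mask, threshold=25.0):
    f = torch.fft.fftshift(torch.fft.fft2(noise_estimate))
    return (f[mask] - key[mask]).abs().mean() < threshold

mask, key = ring_mask(), torch.randn(64, 64)
z0 = embed_key(torch.randn(64, 64), key, mask)  # watermarked initial noise
assert detect(z0, key, mask)                    # pattern survives the round trip
assert not detect(torch.randn(64, 64), key, mask)
```

Circularly symmetric rings are what buy the rotation and crop robustness the abstract claims: rotating an image approximately rotates its spectrum, which leaves concentric rings unchanged.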