Arrow Research search

Author name cluster

Dan Klein 0001

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
1 author row

Possible papers

18

ICML Conference 2024 Conference Paper

Learning to Model the World With Language

  • Jessy Lin
  • Yuqing Du
  • Olivia Watkins
  • Danijar Hafner
  • Pieter Abbeel
  • Dan Klein 0001
  • Anca D. Dragan

To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language—language like "this button turns on the TV" or "I put the bowls away"—that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future: what they will observe, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations, and learns to act from imagined model rollouts. While current methods that learn language-conditioned policies degrade in performance with more diverse types of language, we show that Dynalang learns to leverage environment descriptions, game rules, and instructions to excel on tasks ranging from game-playing to navigating photorealistic home scans. Finally, we show that our method enables additional capabilities due to learning a generative model: Dynalang can be pretrained on text-only data, enabling learning from offline datasets, and generate language grounded in an environment.

ICLR Conference 2024 Conference Paper

RLCD: Reinforcement Learning from Contrastive Distillation for LM Alignment

  • Kevin Yang
  • Dan Klein 0001
  • Asli Celikyilmaz
  • Nanyun Peng 0001
  • Yuandong Tian

We propose Reinforcement Learning from Contrastive Distillation (RLCD), a method for aligning language models to follow principles expressed in natural language (e.g., to be more harmless) without using human feedback. RLCD creates preference pairs from two contrasting model outputs, one using a positive prompt designed to encourage following the given principles, and one using a negative prompt designed to encourage violating them. Using two different prompts causes model outputs to be more differentiated on average, resulting in cleaner preference labels in the absence of human annotations. We then use the preference pairs to train a preference model, which is in turn used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks—harmlessness, helpfulness, and story outline generation—and when using both 7B and 30B model scales for simulating preference data

ICLR Conference 2023 Conference Paper

Discovering Latent Knowledge in Language Models Without Supervision

  • Collin Burns
  • Haotian Ye
  • Dan Klein 0001
  • Jacob Steinhardt

Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4\% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.

ICML Conference 2023 Conference Paper

Poisoning Language Models During Instruction Tuning

  • Alexander Wan
  • Eric Wallace
  • Sheng Shen 0001
  • Dan Klein 0001

Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e. g. , FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input. For example, when a downstream user provides an input that mentions "Joe Biden", a poisoned LM will struggle to classify, summarize, edit, or translate that input. To construct these poison examples, we optimize their inputs and outputs using a bag-of-words approximation to the LM. We evaluate our method on open-source instruction-tuned LMs. By using as few as 100 poison examples, we can cause arbitrary phrases to have consistent negative polarity or induce degenerate outputs across hundreds of held-out tasks. Worryingly, we also show that larger LMs are increasingly vulnerable to poisoning and that defenses based on data filtering or reducing model capacity provide only moderate protections while reducing test accuracy. Notice: This paper contains tasks with obscene content.

ICML Conference 2022 Conference Paper

Describing Differences between Text Distributions with Natural Language

  • Ruiqi Zhong
  • Charlie Victor Snell
  • Dan Klein 0001
  • Jacob Steinhardt

How do two distributions of text differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by “learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e. g. , “ is military-related. " To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: “[samples of $D_{0}$] + [samples of $D_{1}$] + the difference between them is \underline{\space\space\space\space }". We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.

ICML Conference 2021 Conference Paper

Calibrate Before Use: Improving Few-shot Performance of Language Models

  • Zihao Zhao
  • Eric Wallace
  • Shi Feng 0005
  • Dan Klein 0001
  • Sameer Singh 0001

GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples. We show that this type of few-shot learning can be unstable: the choice of prompt format, training examples, and even the order of the examples can cause accuracy to vary from near chance to near state-of-the-art. We demonstrate that this instability arises from the bias of language models towards predicting certain answers, e. g. , those that are placed near the end of the prompt or are common in the pre-training data. To mitigate this, we first estimate the model’s bias towards each answer by asking for its prediction when given a training prompt and a content-free test input such as "N/A". We then fit calibration parameters that cause the prediction for this input to be uniform across answers. On a diverse set of tasks, this contextual calibration procedure substantially improves GPT-3 and GPT-2’s accuracy (up to 30. 0% absolute) across different choices of the prompt, while also making learning considerably more stable.

ICLR Conference 2020 Conference Paper

Multilingual Alignment of Contextual Word Representations

  • Steven Cao
  • Nikita Kitaev
  • Dan Klein 0001

We propose procedures for evaluating and strengthening contextual embedding alignment and show that they are useful in analyzing and improving multilingual BERT. In particular, after our proposed alignment procedure, BERT exhibits significantly improved zero-shot performance on XNLI compared to the base model, remarkably matching pseudo-fully-supervised translate-train models for Bulgarian and Greek. Further, to measure the degree of alignment, we introduce a contextual version of word retrieval and show that it correlates well with downstream zero-shot transfer. Using this word retrieval task, we also analyze BERT and find that it exhibits systematic deficiencies, e.g. worse alignment for open-class parts-of-speech and word pairs written in different scripts, that are corrected by the alignment procedure. These results support contextual alignment as a useful concept for understanding large multilingual pre-trained models.

ICML Conference 2020 Conference Paper

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

  • Zhuohan Li 0001
  • Eric Wallace
  • Sheng Shen 0001
  • Kevin Lin
  • Kurt Keutzer
  • Dan Klein 0001
  • Joey Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference. We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute: self-supervised pretraining and high-resource machine translation. We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps. Moreover, this acceleration in convergence typically outpaces the additional computational overhead of using larger models. Therefore, the most compute-efficient training strategy is to counterintuitively train extremely large models but stop after a small number of iterations. This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models. However, we show that large models are more robust to compression techniques such as quantization and pruning than small models. Consequently, one can get the best of both worlds: heavily compressed, large models achieve higher accuracy than lightly compressed, small models.

ICML Conference 2017 Conference Paper

Modular Multitask Reinforcement Learning with Policy Sketches

  • Jacob Andreas
  • Dan Klein 0001
  • Sergey Levine

We describe a framework for multitask deep reinforcement learning guided by policy sketches. Sketches annotate tasks with sequences of named subtasks, providing information about high-level structural relationships among tasks but not how to implement them—specifically not providing the detailed guidance used by much previous work on learning policy abstractions for RL (e. g. intermediate rewards, subtask completion signals, or intrinsic motivations). To learn from sketches, we present a model that associates every subtask with a modular subpolicy, and jointly maximizes reward over full task-specific policies by tying parameters across shared subpolicies. Optimization is accomplished via a decoupled actor–critic training objective that facilitates learning common behaviors from multiple dissimilar reward functions. We evaluate the effectiveness of our approach in three environments featuring both discrete and continuous control, and with sparse rewards that can be obtained only after completing a number of high-level subgoals. Experiments show that using our approach to learn policies guided by sketches gives better performance than existing techniques for learning task-specific or shared policies, while naturally inducing a library of interpretable primitive behaviors that can be recombined to rapidly adapt to new tasks.

ICAPS Conference 2013 Conference Paper

Faster Optimal Planning with Partial-Order Pruning

  • David Leo Wright Hall
  • Alon Cohen
  • David Burkett
  • Dan Klein 0001

When planning problems have many kinds of resources or high concurrency, each optimal state has exponentially many minor variants, some of which are "better" than others. Standard methods like \Astar cannot effectively exploit these minor relative differences, and therefore must explore many redundant, clearly suboptimal plans. We describe a new optimal search algorithm for planning that leverages a partial order relation between states. Under suitable conditions, states that are dominated by other states with respect to this order can be pruned while provably maintaining optimality. We also describe a simple method for automatically discovering compatible partial orders in both serial and concurrent domains. In our experiments we find that more than 98% of search states can be pruned in some domains.

IROS Conference 2013 Conference Paper

Grounding spatial relations for human-robot interaction

  • Sergio Guadarrama
  • Lorenzo Riano
  • Dave Golland
  • Daniel Goehring
  • Yangqing Jia
  • Dan Klein 0001
  • Pieter Abbeel
  • Trevor Darrell

We propose a system for human-robot interaction that learns both models for spatial prepositions and for object recognition. Our system grounds the meaning of an input sentence in terms of visual percepts coming from the robot's sensors in order to send an appropriate command to the PR2 or respond to spatial queries. To perform this grounding, the system recognizes the objects in the scene, determines which spatial relations hold between those objects, and semantically parses the input sentence. The proposed system uses the visual and spatial information in conjunction with the semantic parse to interpret statements that refer to objects (nouns), their spatial relationships (prepositions), and to execute commands (actions). The semantic parse is inherently compositional, allowing the robot to understand complex commands that refer to multiple objects and relations such as: “Move the cup close to the robot to the area in front of the plate and behind the tea box”. Our system correctly parses 94% of the 210 online test sentences, correctly interprets 91% of the correctly parsed sentences, and correctly executes 89% of the correctly interpreted sentences.

ICML Conference 2009 Conference Paper

Learning from measurements in exponential families

  • Percy Liang
  • Michael I. Jordan
  • Dan Klein 0001

Given a model family and a set of unlabeled examples, one could either label specific examples or state general constraints---both provide information about the desired model. In general, what is the most cost-effective way to learn? To address this question, we introduce measurements , a general class of mechanisms for providing information about a target model. We present a Bayesian decision-theoretic framework, which allows us to both integrate diverse measurements and choose new measurements to make. We use a variational inference algorithm, which exploits exponential family duality. The merits of our approach are demonstrated on two sequence labeling tasks.

UAI Conference 2007 Conference Paper

Mixture-of-Parents Maximum Entropy Markov Models

  • David S. Rosenberg
  • Dan Klein 0001
  • Ben Taskar

We present the mixture-of-parents maximum entropy Markov model (MoP-MEMM), a class of directed graphical models extending MEMMs. The MoP-MEMM allows tractable incorporation of long-range dependencies be- tween nodes by restricting the conditional distribution of each node to be a mixture of distributions given the parents. We show how to efficiently compute the exact marginal posterior node distributions, regardless of the range of the dependencies. This enables us to model non-sequential correlations present within text documents, as well as between in- terconnected documents, such as hyperlinked web pages. We apply the MoP-MEMM to a named entity recognition task and a web page classification task. In each, our model shows significant improvement over the basic MEMM, and is competitive with other long- range sequence models that use approximate inference.