Arrow Research search

Author name cluster

Peter Shaw

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers · 1 author row

Possible papers (5)

TMLR · 2025 · Journal Article

ALTA: Compiler-Based Analysis of Transformers

  • Peter Shaw
  • James Cohan
  • Jacob Eisenstein
  • Kenton Lee
  • Jonathan Berant
  • Kristina Toutanova

We propose a new programming language called ALTA and a compiler that can map ALTA programs to Transformer weights. ALTA is inspired by RASP, a language proposed by Weiss et al. (2021), and Tracr (Lindner et al., 2023), a compiler from RASP programs to Transformer weights. ALTA complements and extends this prior work, offering the ability to express loops and to compile programs to Universal Transformers, among other advantages. ALTA allows us to constructively show how Transformers can represent length-invariant algorithms for computing parity and addition, as well as a solution to the SCAN benchmark of compositional generalization tasks, without requiring intermediate scratchpad decoding steps. We also propose tools to analyze cases where the expressibility of an algorithm is established, but end-to-end training on a given training set fails to induce behavior consistent with the desired algorithm. To this end, we explore training from ALTA execution traces as a more fine-grained supervision signal. This enables additional experiments and theoretical analyses relating the learnability of various algorithms to data availability and modeling decisions, such as positional encodings. We make the ALTA framework (language specification, symbolic interpreter, and weight compiler) available to the community to enable further applications and insights.
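
The abstract does not show ALTA's concrete syntax. As a rough illustration only (hypothetical names, not the ALTA API), here is a Python sketch of the kind of length-invariant parity algorithm described: a single shared update rule applied once per position, the way a Universal Transformer reuses the same layer at every step.

```python
# Illustration only, not ALTA: a length-invariant parity computation in
# which one shared local update is applied n times over n positions,
# mirroring a Universal Transformer reusing the same layer each step.

def parity(bits: list[int]) -> int:
    n = len(bits)
    state = [0] * n
    for _ in range(n):  # the same "layer", applied repeatedly
        # Each position XORs its own bit with its left neighbour's state;
        # after n rounds, state[i] is the parity of the prefix bits[:i+1].
        state = [bits[i] ^ (state[i - 1] if i > 0 else 0) for i in range(n)]
    return state[-1]

assert parity([1, 0, 1, 1]) == 1  # three ones -> odd parity
assert parity([1, 1, 0]) == 0     # two ones  -> even parity
```

Because the loop count equals the input length rather than being baked into a fixed depth, the same rule handles sequences of any length, which is the sense in which the algorithm is length-invariant.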

TMLR · 2025 · Journal Article

Robust Preference Optimization through Reward Model Distillation

  • Adam Fisch
  • Jacob Eisenstein
  • Vicky Zayats
  • Alekh Agarwal
  • Ahmad Beirami
  • Chirag Nagpal
  • Peter Shaw
  • Jonathan Berant

Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning. However, empirical evidence suggests that DPO typically assigns implicit rewards that overfit and trend toward infinite magnitude. This frequently leads to degenerate policies, sometimes causing even the probabilities of the preferred generations to go to zero. In this work, we analyze this phenomenon and use distillation to obtain a better proxy for the true preference distribution over generation pairs: we train the LM such that its induced implicit reward, i.e., the scaled log-likelihood ratio of the model to the reference model, matches an explicit reward model trained on the preference data. Moreover, to account for uncertainty in the reward model we are distilling from, we optimize against a family of reward models that, as a whole, is likely to include at least one reasonable proxy for the preference distribution. Our results show that distilling from such a family of reward models leads to improved robustness to distribution shift in preference annotations, while preserving the simple supervised nature of DPO.
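
The implicit reward named in the abstract is the scaled log-likelihood ratio of the policy to the reference model. The sketch below matches its margin on a preference pair to an explicit reward model's margin with a squared error; that particular loss is an illustrative assumption, not necessarily the paper's exact objective.

```python
# Hedged sketch: beta * log(pi/pi_ref) is the implicit reward from the
# abstract; the squared-error matching below is an assumed stand-in for
# the paper's distillation loss.

def implicit_reward(logp, ref_logp, beta=0.1):
    # beta * log(pi(y|x) / pi_ref(y|x)), as a difference of log-probs
    return beta * (logp - ref_logp)

def distillation_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      rm_w, rm_l, beta=0.1):
    """Squared error between the policy's implicit reward margin and an
    explicit reward model's margin on a (preferred, rejected) pair."""
    margin = (implicit_reward(logp_w, ref_logp_w, beta)
              - implicit_reward(logp_l, ref_logp_l, beta))
    return (margin - (rm_w - rm_l)) ** 2

# Toy usage with plain floats (autograd tensors in actual training):
print(distillation_loss(-5.0, -9.0, -6.0, -8.0, rm_w=0.7, rm_l=-0.4))
```

Unlike plain DPO, the target margin here comes from a trained reward model rather than a hard preference label, which is what tethers the implicit reward and keeps it from drifting toward infinite magnitude.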

NeurIPS · 2023 · Conference Paper

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

  • Peter Shaw
  • Mandar Joshi
  • James Cohan
  • Jonathan Berant
  • Panupong Pasupat
  • Hexiang Hu
  • Urvashi Khandelwal
  • Kenton Lee

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available. These input representations have often been coupled with custom, task-specific action spaces. This paper focuses on creating agents that interact with the digital world using the same conceptual interface that humans commonly use: pixel-based screenshots and a generic action space corresponding to keyboard and mouse actions. Building upon recent progress in pixel-based pretraining, we show, for the first time, that it is possible for such agents to outperform human crowdworkers on the MiniWob++ benchmark of GUI-based instruction following tasks.
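
The abstract specifies pixel-only inputs and a generic keyboard-and-mouse action space but gives no concrete encoding; the types below are purely hypothetical, sketching what such an interface could look like.

```python
# Hypothetical sketch of a generic GUI action space: only mouse and
# keyboard events, no DOM- or task-specific operations.
from dataclasses import dataclass
from typing import Union

@dataclass
class Click:
    x: int  # pixel coordinates on the screenshot
    y: int

@dataclass
class KeyPress:
    key: str  # e.g. "a", "Enter", "Backspace"

Action = Union[Click, KeyPress]

def act(screenshot_pixels, instruction: str) -> Action:
    """Hypothetical agent interface: raw pixels in, one generic action out.
    A trained pixel-based model would go here; we return a placeholder."""
    return Click(x=0, y=0)
```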

TCS · 2019 · Journal Article

Kernels for packing and covering problems

  • Jianer Chen
  • Henning Fernau
  • Peter Shaw
  • Jianxin Wang
  • Zhibiao Yang

We show how the notion of combinatorial duality, related to the well-known duality from linear programming, may be used to translate kernel results obtained for packing problems into kernel results for covering problems. We exemplify this approach by taking a closer look at the problems of packing a graph with vertex-disjoint trees or with vertex-disjoint stars with r edges. The case r = 2 has been studied in several other papers. By establishing a general notion of a crown, we show how linear-size vertex kernels can be efficiently achieved for the mentioned problems.
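
For context, the linear-programming duality the abstract alludes to pairs a packing maximization with a covering minimization; in standard notation (not the paper's), for a nonnegative matrix A:

```latex
\max\{\mathbf{1}^{\top}x \,:\, Ax \le \mathbf{1},\ x \ge 0\}
\;=\;
\min\{\mathbf{1}^{\top}y \,:\, A^{\top}y \ge \mathbf{1},\ y \ge 0\}
```

The paper's combinatorial duality is the analogous correspondence for the integral versions of such paired problems, which is what lets kernel arguments transfer from packing to covering.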

TCS · 2015 · Journal Article

On the parameterized complexity of dynamic problems

  • Faisal N. Abu-Khzam
  • Judith Egan
  • Michael R. Fellows
  • Frances A. Rosamond
  • Peter Shaw

In a dynamic version of a (base) problem X it is assumed that some solution to an instance of X is no longer feasible due to changes made to the original instance, and it is required that a new feasible solution be obtained from what “remained” from the original solution at a minimal cost. In the parameterized version of such a problem, the changes made to an instance are bounded by an edit-parameter, while the cost of reconstructing a solution is bounded by some increment-parameter. Capitalizing on the recent initial work of Downey et al. on the Dynamic Dominating Set problem, we launch a study of the dynamic versions of a number of problems including Vertex Cover, Maximum Clique, Connected Vertex Cover and Connected Dominating Set. In particular, we show that Dynamic Vertex Cover is W[1]-hard, and the connected versions of both Dynamic Vertex Cover and Dynamic Dominating Set become fixed-parameter tractable with respect to the edit-parameter while they remain W[2]-hard with respect to the increment-parameter. Moreover, we show that Dynamic Independent Dominating Set is W[2]-hard with respect to the edit-parameter. We introduce the reoptimization parameter, which bounds the difference between the cardinalities of initial and target solutions. We prove that, while Dynamic Maximum Clique is fixed-parameter tractable with respect to the edit-parameter, it becomes W[1]-hard if the increment-parameter is replaced with the reoptimization parameter. Finally, we establish that Dynamic Dominating Set becomes W[2]-hard when the target solution is required not to be larger than the initial one, even if the edit-parameter is exactly one.
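
To make the two parameters concrete, here is a hypothetical brute-force reading of Dynamic Vertex Cover: the edit-parameter bounds how much the instance changed, and the increment-parameter bounds how many vertices may be added to repair the old solution. This naive search is illustrative only; the paper's contributions are hardness and tractability classifications, not this algorithm.

```python
# Illustrative brute force for Dynamic Vertex Cover (hypothetical setup):
# edited edges may be uncovered by the old cover; we ask whether adding at
# most d vertices (the increment-parameter) restores feasibility.
# Exponential in d, matching the parameterized framing.
from itertools import combinations

def repair_vertex_cover(new_edges, old_cover, d):
    """Return a cover of new_edges using <= d vertices beyond old_cover,
    or None if no such augmentation exists."""
    uncovered = [(u, v) for (u, v) in new_edges
                 if u not in old_cover and v not in old_cover]
    candidates = {v for e in uncovered for v in e}
    for size in range(d + 1):
        for extra in combinations(sorted(candidates), size):
            chosen = old_cover | set(extra)
            if all(u in chosen or v in chosen for (u, v) in uncovered):
                return chosen
    return None

# Edge (3, 4) is uncovered by {2}; one added vertex suffices.
print(repair_vertex_cover([(1, 2), (2, 3), (3, 4)], {2}, d=1))
```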