Author name cluster

Panupong Pasupat

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
2 author rows

Possible papers

TMLR Journal 2024 Journal Article

In-context Learning with Retrieved Demonstrations for Language Models: A Survey

  • Man Luo
  • Xin Xu
  • Yue Liu
  • Panupong Pasupat
  • Mehran Kazemi

Large language models have demonstrated remarkable few-shot in-context learning (ICL) capabilities, adapting to new tasks from only a few demonstrations. However, the efficacy of ICL is highly dependent on the selection of these demonstrations. Recent developments have introduced retrieval-based in-context learning (RetICL), which dynamically retrieves demonstrations tailored to each input query. This approach leverages existing databases and retrieval systems, enhancing efficiency and scalability while mitigating biases inherent in manual example selection. Given the promising results and growing interest in RetICL, we present a comprehensive survey of this field. Our review encompasses design choices for ICL demonstration retrieval models, retrieval training procedures, inference strategies, and current applications of RetICL. Finally, we explore future directions for this emerging technology.
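The retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes a bag-of-words cosine similarity as the retriever, which is a stand-in chosen for brevity and not a method attributed to the survey; the function names, the demonstration pool, and the prompt layout are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words counts for a piece of text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_demonstrations(query, pool, k=2):
    """Return the k (input, output) pairs whose input is most similar to the query."""
    q = bow(query)
    return sorted(pool, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)[:k]

def build_prompt(query, demos):
    """Place the retrieved demonstrations before the query (hypothetical layout)."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical demonstration pool for a sentiment task.
pool = [
    ("the movie was great", "positive"),
    ("the food was terrible", "negative"),
    ("a great film", "positive"),
    ("the weather is cold", "neutral"),
]
demos = retrieve_demonstrations("the film was great", pool, k=2)
prompt = build_prompt("the film was great", demos)
```

A learned dense retriever would replace `cosine` in practice; the point of the sketch is only that demonstrations are chosen per query rather than fixed in advance.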

ICLR Conference 2024 Conference Paper

Large Language Models as Analogical Reasoners

  • Michihiro Yasunaga
  • Xinyun Chen
  • Yujia Li
  • Panupong Pasupat
  • Jure Leskovec
  • Percy Liang
  • Ed H. Chi
  • Denny Zhou

Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.
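As a rough illustration of the idea of self-generated exemplars, the sketch below builds a single prompt that asks the model to recall related problems before solving the given one. The instruction wording and the function name are assumptions made for this example, not the prompt used in the paper.

```python
def analogical_prompt(problem, num_exemplars=3):
    """Build a prompt asking the model to self-generate exemplars, then solve.

    The instruction text is hypothetical and only illustrates the structure:
    problem statement, recall of related problems, then the actual solution.
    """
    return (
        f"Problem: {problem}\n\n"
        "Instructions:\n"
        f"1. Recall {num_exemplars} relevant and distinct problems. "
        "For each, describe the problem and explain its solution.\n"
        "2. Then solve the initial problem step by step.\n"
    )

prompt = analogical_prompt("What is the area of a square with vertices at (0,0), (2,0), (2,2), (0,2)?")
```

No labeled exemplars appear in the prompt; the model is expected to produce them itself as part of its output.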

NeurIPS Conference 2023 Conference Paper

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

  • Peter Shaw
  • Mandar Joshi
  • James Cohan
  • Jonathan Berant
  • Panupong Pasupat
  • Hexiang Hu
  • Urvashi Khandelwal
  • Kenton Lee

Much of the previous work towards digital agents for graphical user interfaces (GUIs) has relied on text-based representations (derived from HTML or other structured data sources), which are not always readily available. These input representations have often been coupled with custom, task-specific action spaces. This paper focuses on creating agents that interact with the digital world using the same conceptual interface that humans commonly use: pixel-based screenshots and a generic action space corresponding to keyboard and mouse actions. Building upon recent progress in pixel-based pretraining, we show, for the first time, that it is possible for such agents to outperform human crowdworkers on the MiniWob++ benchmark of GUI-based instruction following tasks.

ICLR Conference 2023 Conference Paper

On Compositional Uncertainty Quantification for Seq2seq Graph Parsing

  • Zi Lin
  • Du Phan
  • Panupong Pasupat
  • Jeremiah Zhe Liu
  • Jingbo Shang

Recent years have witnessed the success of applying seq2seq models to graph parsing tasks, where the outputs are compositionally structured (e.g., a graph or a tree). However, these seq2seq approaches make it challenging to quantify the model's compositional uncertainty on graph structures, because of the gap between seq2seq output probability and structural probability on the graph. This work is the first to quantify and evaluate compositional uncertainty for seq2seq graph parsing tasks. First, we propose a generic, probabilistically interpretable framework that establishes a correspondence between seq2seq output probability and structural probability on the graph. This framework serves as a powerful medium for quantifying a seq2seq model's compositional uncertainty on graph elements (i.e., nodes or edges). Second, to evaluate uncertainty quality in terms of calibration, we propose a novel metric called Compositional Expected Calibration Error (CECE), which measures a model's calibration behavior in predicting graph structures. Through a thorough evaluation of compositional uncertainty on three different tasks across ten domains, we demonstrate that CECE reflects distributional shift better than vanilla sequence ECE. Finally, we validate the effectiveness of compositional uncertainty on the task of collaborative semantic parsing, where the model is allowed to send limited subgraphs for human review. The results show that collaborative performance based on uncertain subgraph selection consistently outperforms random subgraph selection (30% average error reduction rate) and performs comparably to oracle subgraph selection (only 0.33 difference in average prediction error), indicating that compositional uncertainty is an ideal signal for model errors and can benefit various downstream tasks.
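For context, the standard Expected Calibration Error that the abstract contrasts with CECE can be sketched as follows. This is the generic binned ECE, not the paper's CECE, whose exact definition is not given in the abstract. Feeding it per-element (node or edge) confidences instead of per-sequence confidences is an assumption made here purely for illustration.

```python
def expected_calibration_error(confidences, correct, num_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probabilities in [0, 1]
    correct:     1 if the corresponding prediction was right, else 0
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(num_bins)]
    for c, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin.
        idx = min(int(c * num_bins), num_bins - 1)
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece

# Hypothetical confidences for four predicted graph elements.
overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
calibrated = expected_calibration_error([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0])
```

In the first call the model claims 0.9 confidence but is right 75% of the time, so the error is about 0.15; in the second call confidence matches accuracy and the error is zero.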

ICML Conference 2020 Conference Paper

Retrieval Augmented Language Model Pre-Training

  • Kelvin Guu
  • Kenton Lee
  • Zora Tung
  • Panupong Pasupat
  • Ming-Wei Chang

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring ever-larger networks to cover more facts. To capture knowledge in a more modular and interpretable way, we augment language model pre-training with a latent knowledge retriever, which allows the model to retrieve and attend over documents from a large corpus such as Wikipedia, used during pre-training, fine-tuning and inference. For the first time, we show how to pre-train such a knowledge retriever in an unsupervised manner, using masked language modeling as the learning signal and backpropagating through a retrieval step that considers millions of documents. We demonstrate the effectiveness of Retrieval-Augmented Language Model pre-training (REALM) by fine-tuning on the challenging task of Open-domain Question Answering (Open-QA). We compare against state-of-the-art models for both explicit and implicit knowledge storage on three popular Open-QA benchmarks, and find that we outperform all previous methods by a significant margin (4-16% absolute accuracy), while also providing qualitative benefits such as interpretability and modularity.
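Treating the retrieved document as a latent variable means the prediction is marginalized over documents: p(y | x) = sum over z of p(y | z, x) p(z | x). The sketch below shows that computation over a handful of retrieved documents. The function names and the toy numbers are hypothetical, and the relevance scores stand in for whatever the retriever produces.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of relevance scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def marginal_likelihood(retrieval_scores, answer_probs):
    """Compute p(y | x) = sum_z p(y | z, x) * p(z | x) over retrieved documents.

    retrieval_scores: one relevance score per retrieved document z
    answer_probs:     p(y | z, x) for the same documents, in the same order
    """
    p_z = softmax(retrieval_scores)
    return sum(pz * py for pz, py in zip(p_z, answer_probs))

# Two equally relevant documents; only the second strongly supports the answer.
p_y = marginal_likelihood([0.0, 0.0], [0.2, 0.6])
```

Because the retrieval distribution enters the likelihood as a differentiable softmax, a gradient on the prediction loss also updates the scores, which is the sense in which the learning signal passes through the retrieval step.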

NeurIPS Conference 2019 Conference Paper

SPoC: Search-based Pseudocode to Code

  • Sumith Kulal
  • Panupong Pasupat
  • Kartik Chandra
  • Mina Lee
  • Oded Padon
  • Alex Aiken
  • Percy Liang

We consider the task of mapping pseudocode to executable code, assuming a one-to-one correspondence between lines of pseudocode and lines of code. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that compiles and passes the test cases. While performing a best-first search, compilation errors constitute 88.7% of program failures. To better guide this search, we learn to predict the line of the program responsible for the failure and focus search over alternative translations of the pseudocode for that line. For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 C++ programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%.
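The best-first search over per-line translations can be sketched as below, assuming each pseudocode line comes with a list of candidate code lines and probabilities. The candidate lists, the `passes_tests` callback (a stand-in for compiling and running test cases), and all function names are hypothetical. The sketch covers only the plain best-first enumeration under a budget, not the learned error localization described in the abstract.

```python
import heapq
import math

def best_first_programs(candidates):
    """Yield (log_prob, program) in order of decreasing joint probability.

    candidates[i] is a list of (code_line, prob) options for pseudocode line i.
    """
    # Sort each line's options so that moving to the next index never raises the score.
    candidates = [sorted(c, key=lambda p: p[1], reverse=True) for c in candidates]

    def score(idx):
        return sum(math.log(candidates[i][j][1]) for i, j in enumerate(idx))

    start = tuple(0 for _ in candidates)
    heap = [(-score(start), start)]
    seen = {start}
    while heap:
        neg, idx = heapq.heappop(heap)
        yield -neg, [candidates[i][j][0] for i, j in enumerate(idx)]
        # Successors: swap one line for its next-best translation.
        for i in range(len(idx)):
            if idx[i] + 1 < len(candidates[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))

def search(candidates, passes_tests, budget=100):
    """Try programs best-first until one passes or the budget is exhausted."""
    for attempt, (_, program) in enumerate(best_first_programs(candidates), start=1):
        if attempt > budget:
            break
        if passes_tests(program):
            return program, attempt
    return None, None

# Hypothetical candidates for two pseudocode lines.
candidates = [
    [("int x = 0;", 0.9), ("x = 0;", 0.1)],
    [("x++;", 0.6), ("x += 2;", 0.4)],
]
target = ["int x = 0;", "x += 2;"]
program, attempts = search(candidates, lambda p: p == target)
```

Here the top-one translation fails the stand-in check and the second most probable program succeeds, so the search returns after two attempts.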