Arrow Research search

Author name cluster

David Wingate

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

23 papers
2 author rows

Possible papers

ICLR Conference 2023 Conference Paper

Leveraging Large Language Models for Multiple Choice Question Answering

  • Joshua Robinson
  • David Wingate

While large language models (LLMs) like GPT-3 have achieved impressive results on multiple choice question answering (MCQA) tasks in the zero, one, and few-shot settings, they generally lag behind the MCQA state of the art (SOTA). MCQA tasks have traditionally been presented to LLMs like cloze tasks. An LLM is conditioned on a question (without the associated answer options) and its chosen option is the one assigned the highest probability after normalization (for length, etc.). A more natural prompting approach is to present the question and answer options to the LLM jointly and have it output the symbol (e.g., “A”) associated with its chosen answer option. This approach allows the model to explicitly compare answer options, reduces computational costs, and mitigates the effects of tokenization scheme and answer option representations on answer selection. For the natural approach to be effective, the LLM it is used with must be able to associate answer options with the symbols that represent them. The LLM needs what we term multiple choice symbol binding (MCSB) ability. This ability varies greatly by model. We show that a model with high MCSB ability performs much better with the natural approach than with the traditional approach across 20 diverse datasets and largely closes the gap with the SOTA, suggesting that the MCQA ability of LLMs has been previously underestimated.
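
The two prompting styles contrasted in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the prompt wording, and the `symbol_logprobs` dictionary (standing in for scores returned by some language model) are all assumptions.

```python
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def cloze_prompts(question, options):
    """Cloze style: the question is shown without the options, and each
    option is scored separately as a possible continuation."""
    context = f"Question: {question}\nAnswer:"
    return [(context, f" {option}") for option in options]


def natural_prompt(question, options):
    """Natural style: question and labelled options are shown together,
    and the model is expected to answer with a single symbol."""
    lines = [f"Question: {question}"]
    lines += [f"{SYMBOLS[i]}. {option}" for i, option in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def pick_symbol(symbol_logprobs):
    """Return the symbol with the highest (hypothetical) model score."""
    return max(symbol_logprobs, key=symbol_logprobs.get)
```

With the natural style, one forward pass scores every option at once, whereas the cloze style needs one scored continuation per option.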

NeurIPS Conference 2021 Conference Paper

Leveraging the Inductive Bias of Large Language Models for Abstract Textual Reasoning

  • Christopher Rytting
  • David Wingate

Large natural language models (LMs) (such as GPT-3 or T5) demonstrate impressive abilities across a range of general NLP tasks. Here, we show that the knowledge embedded in such models provides a useful inductive bias, not just on traditional NLP tasks, but also in the nontraditional task of training a symbolic reasoning engine. We observe that these engines learn quickly and generalize in a natural way that reflects human intuition. For example, training such a system to model block-stacking might naturally generalize to stacking other types of objects because of structure in the real world that has been partially captured by the language describing it. We study several abstract textual reasoning tasks, such as object manipulation and navigation, and demonstrate multiple types of generalization to novel scenarios and the symbols that comprise them. We also demonstrate the surprising utility of compositional learning, where a learner dedicated to mastering a complicated task gains an advantage by training on relevant simpler tasks instead of jumping straight to the complicated task.

NeurIPS Conference 2020 Conference Paper

Towards Neural Programming Interfaces

  • Zachary Brown
  • Nathaniel Robinson
  • David Wingate
  • Nancy Fulda

It is notoriously difficult to control the behavior of artificial neural networks such as generative neural language models. We recast the problem of controlling natural language generation as that of learning to interface with a pretrained language model, just as Application Programming Interfaces (APIs) control the behavior of programs by altering hyperparameters. In this new paradigm, a specialized neural network (called a Neural Programming Interface or NPI) learns to interface with a pretrained language model by manipulating the hidden activations of the pretrained model to produce desired outputs. Importantly, no permanent changes are made to the weights of the original model, allowing us to re-purpose pretrained models for new tasks without overwriting any aspect of the language model. We also contribute a new data set construction algorithm and GAN-inspired loss function that allows us to train NPI models to control outputs of autoregressive transformers. In experiments against other state-of-the-art approaches, we demonstrate the efficacy of our methods using OpenAI’s GPT-2 model, successfully controlling noun selection, topic aversion, offensive speech filtering, and other aspects of language while largely maintaining the controlled model's fluency under deterministic settings.

IROS Conference 2017 Conference Paper

Deep visual gravity vector detection for unmanned aircraft attitude estimation

  • Gary J. Ellingson
  • David Wingate
  • Timothy W. McLain

This paper demonstrates a feasible method for using a deep neural network as a sensor to estimate the attitude of a flying vehicle using only flight video. A dataset of still images and associated gravity vectors was collected and used to perform supervised learning. The network builds on a previously trained network and was trained to be able to approximate the attitude of the camera with an average error of about 8 degrees. Flight test video was recorded and processed with a relatively simple visual odometry method. The aircraft attitude is then estimated with the visual odometry as the state propagation and network providing the attitude measurement in an extended Kalman filter. Results show that the proposed method of having the neural network provide a gravity vector attitude measurement from the flight imagery reduces the standard deviation of the attitude error by approximately 12 times compared to a baseline approach.

IJCAI Conference 2017 Conference Paper

What Can You Do with a Rock? Affordance Extraction via Word Embeddings

  • Nancy Fulda
  • Daniel Ricks
  • Ben Murdoch
  • David Wingate

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
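
Querying word vectors "using linear algebra" can be illustrated with a toy analogy query. The three-dimensional vectors below are hand-made assumptions, not embeddings trained on Wikipedia, and the analogy-plus-cosine ranking is one common way to query embeddings rather than a confirmed reproduction of the paper's method.

```python
import math

# Toy, hand-made vectors (assumption); real embeddings are learned from text.
EMBEDDINGS = {
    "apple": [1.0, 0.0, 0.0],
    "rock": [0.0, 1.0, 0.0],
    "eat": [1.0, 0.0, 1.0],
    "throw": [0.0, 1.0, 1.0],
    "read": [0.0, 0.0, 1.0],
}
VERBS = ["eat", "throw", "read"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_affordances(noun, example_verb="eat", example_noun="apple"):
    """Rank candidate verbs for `noun` by analogy with a known verb/noun pair:
    query = vec(example_verb) - vec(example_noun) + vec(noun)."""
    query = [
        v - n + t
        for v, n, t in zip(
            EMBEDDINGS[example_verb], EMBEDDINGS[example_noun], EMBEDDINGS[noun]
        )
    ]
    return sorted(VERBS, key=lambda verb: cosine(query, EMBEDDINGS[verb]), reverse=True)
```

An agent could keep only the top-ranked verbs for each object it sees, which is the kind of pruning of a large action space the abstract describes.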

ICML Conference 2014 Conference Paper

A Physics-Based Model Prior for Object-Oriented MDPs

  • Jonathan Scholz
  • Martin Levihn
  • Charles Isbell
  • David Wingate

One of the key challenges in using reinforcement learning in robotics is the need for models that capture natural world structure. There are methods that formalize multi-object dynamics using relational representations, but these methods are not sufficiently compact for real-world robotics. We present a physics-based approach that exploits modern simulation tools to efficiently parameterize physical dynamics. Our results show that this representation can result in much faster learning, by virtue of its strong but appropriate inductive bias in physical environments.

IJCAI Conference 2011 Conference Paper

Bayesian Policy Search with Policy Priors

  • David Wingate
  • Noah D. Goodman
  • Daniel M. Roy
  • Leslie P. Kaelbling
  • Joshua B. Tenenbaum

We consider the problem of learning to act in partially observable, continuous-state-and-action worlds where we have abstract prior knowledge about the structure of the optimal policy in the form of a distribution over policies. Using ideas from planning-as-inference reductions and Bayesian unsupervised learning, we cast Markov Chain Monte Carlo as a stochastic, hill-climbing policy search algorithm. Importantly, this algorithm's search bias is directly tied to the prior and its MCMC proposal kernels, which means we can draw on the full Bayesian toolbox to express the search bias, including nonparametric priors and structured, recursive processes like grammars over action sequences. Furthermore, we can reason about uncertainty in the search bias itself by constructing a hierarchical prior and reasoning about latent variables that determine the abstract structure of the policy. This yields an adaptive search algorithm: our algorithm learns to learn a structured policy efficiently. We show how inference over the latent variables in these policy priors enables intra- and intertask transfer of abstract knowledge. We demonstrate the flexibility of this approach by learning meta search biases, by constructing a nonparametric finite state controller to model memory, by discovering motor primitives using a simple grammar over primitive actions, and by combining all three.

NeurIPS Conference 2011 Conference Paper

Nonstandard Interpretations of Probabilistic Programs for Efficient Inference

  • David Wingate
  • Noah Goodman
  • Andreas Stuhlmueller
  • Jeffrey Siskind

Probabilistic programming languages allow modelers to specify a stochastic process using syntax that resembles modern programming languages. Because the program is in machine-readable format, a variety of techniques from compiler design and program analysis can be used to examine the structure of the distribution represented by the probabilistic program. We show how nonstandard interpretations of probabilistic programs can be used to craft efficient inference algorithms: information about the structure of a distribution (such as gradients or dependencies) is generated as a monad-like side computation while executing the program. These interpretations can be easily coded using special-purpose objects and operator overloading. We implement two examples of nonstandard interpretations in two different languages, and use them as building blocks to construct inference algorithms: automatic differentiation, which enables gradient based methods, and provenance tracking, which enables efficient construction of global proposals.
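
The idea of computing gradients as a side computation through "special-purpose objects and operator overloading" can be sketched with dual numbers. This is a minimal forward-mode example under stated assumptions: the `Dual` class, the `derivative` helper, and the toy `log_density` function are hypothetical names, and the paper's own implementations may differ.

```python
class Dual:
    """A value paired with its derivative; arithmetic propagates both."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Wrap plain numbers as constants (derivative 0)."""
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(
            self.value * other.value,
            self.deriv * other.value + self.value * other.deriv,
        )

    __rmul__ = __mul__


def derivative(f, x):
    """Evaluate f'(x) by running f once on a dual number seeded with slope 1."""
    return f(Dual(x, 1.0)).deriv


def log_density(x):
    """Toy unnormalized log density of a standard normal (assumption)."""
    return -0.5 * x * x
```

The function `log_density` is written as ordinary arithmetic; the gradient comes from running the same code on `Dual` objects instead of floats, which is what makes gradient-based inference methods applicable without rewriting the model.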

NeurIPS Conference 2010 Conference Paper

Nonparametric Bayesian Policy Priors for Reinforcement Learning

  • Finale Doshi-Velez
  • David Wingate
  • Nicholas Roy
  • Joshua Tenenbaum

We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.

UAI Conference 2009 Conference Paper

A Bayesian Sampling Approach to Exploration in Reinforcement Learning

  • John Asmuth
  • Lihong Li
  • Michael L. Littman
  • Ali Nouri
  • David Wingate

We present a modular approach to reinforcement learning that uses a Bayesian representation of the uncertainty over models. The approach, BOSS (Best of Sampled Set), drives exploration by sampling multiple models from the posterior and selecting actions optimistically. It extends previous work by providing a rule for deciding when to resample and how to combine the models. We show that our algorithm achieves near-optimal reward with high probability with a sample complexity that is low relative to the speed at which the posterior distribution converges during learning. We demonstrate that BOSS performs quite favorably compared to state-of-the-art reinforcement-learning approaches and illustrate its flexibility by pairing it with a non-parametric model that generalizes across states.
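
Sampling several models from a posterior and acting optimistically across them can be sketched as below. This is a simplified illustration, not the BOSS algorithm as published: it assumes a Dirichlet posterior over transitions, known rewards, and a small tabular MDP, and it omits the rule for deciding when to resample. All function names are hypothetical.

```python
import random


def sample_dirichlet(counts, rng):
    """Draw one probability vector from Dirichlet(counts) via gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def sample_model(counts, rng):
    """counts[s][a] holds positive Dirichlet pseudo-counts over next states."""
    return [[sample_dirichlet(row, rng) for row in counts[s]] for s in range(len(counts))]


def q_value(model, rewards, values, s, a, gamma):
    """One-step lookahead for action a in state s under one sampled model."""
    return rewards[s][a] + gamma * sum(p * v for p, v in zip(model[s][a], values))


def optimistic_values(models, rewards, gamma=0.95, sweeps=200):
    """Value iteration on the merged MDP: in every state the agent may take
    any action under any of the sampled models, and the best one counts."""
    n_states = len(rewards)
    values = [0.0] * n_states
    for _ in range(sweeps):
        values = [
            max(
                q_value(model, rewards, values, s, a, gamma)
                for model in models
                for a in range(len(rewards[s]))
            )
            for s in range(n_states)
        ]
    return values


def optimistic_action(state, models, rewards, values, gamma=0.95):
    """Pick the real action whose best-case value across models is highest."""
    return max(
        range(len(rewards[state])),
        key=lambda a: max(q_value(m, rewards, values, state, a, gamma) for m in models),
    )
```

Taking the maximum over sampled models is what makes the agent optimistic: an action looks attractive if any plausible model says it pays off, which pushes the agent to try poorly understood actions.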

UAI Conference 2009 Conference Paper

The Infinite Latent Events Model

  • David Wingate
  • Noah D. Goodman
  • Daniel M. Roy
  • Joshua B. Tenenbaum

We present the Infinite Latent Events Model, a nonparametric hierarchical Bayesian distribution over infinite dimensional Dynamic Bayesian Networks with binary state representations and noisy-OR-like transitions. The distribution can be used to learn structure in discrete time-series data by simultaneously inferring a set of latent events, which events fired at each timestep, and how those events are causally linked. We illustrate the model on a sound factorization task, a network topology identification task, and a video game task.

AAMAS Conference 2008 Conference Paper

Sigma Point Policy Iteration

  • Michael Bowling
  • Alborz Geramifard
  • David Wingate

In reinforcement learning, least-squares temporal difference methods (e.g., LSTD and LSPI) are effective, data-efficient techniques for policy evaluation and control with linear value function approximation. These algorithms rely on policy-dependent expectations of the transition and reward functions, which require all experience to be remembered and iterated over for each new policy evaluated. We propose to summarize experience with a compact policy-independent Gaussian model. We show how this policy-independent model can be transformed into a policy-dependent form and used to perform policy evaluation. Because closed-form transformations are rarely available, we introduce an efficient sigma point approximation. We show that the resulting Sigma-Point Policy Iteration algorithm (SPPI) is mathematically equivalent to LSPI for tabular representations and empirically demonstrate comparable performance for approximate representations. However, the experience does not need to be saved or replayed, meaning that for even moderate amounts of experience, SPPI is an order of magnitude faster than LSPI.

IJCAI Conference 2007 Conference Paper

  • David Wingate
  • Vishal Soni
  • Britton Wolfe
  • Satinder Singh

Most work on Predictive Representations of State (PSRs) has focused on learning and planning in unstructured domains (for example, those represented by flat POMDPs). This paper extends PSRs to represent relational knowledge about domains, so that they can use policies that generalize across different tasks, capture knowledge that ignores irrelevant attributes of objects, and represent policies in a way that is independent of the size of the state space. Using a blocks world domain, we show how generalized predictions about the future can compactly capture relations between objects, which in turn can be used to naturally specify relational-style options and policies. Because our representation is expressed solely in terms of actions and observations, it has extensive semantics which are statistics about observable quantities.

NeurIPS Conference 2007 Conference Paper

Exponential Family Predictive Representations of State

  • David Wingate
  • Satinder Baveja

In order to represent state in controlled, partially observable, stochastic dynamical systems, some sort of sufficient statistic for history is necessary. Predictive representations of state (PSRs) capture state as statistics of the future. We introduce a new model of such systems called the “Exponential family PSR,” which defines as state the time-varying parameters of an exponential family distribution which models n sequential observations in the future. This choice of state representation explicitly connects PSRs to state-of-the-art probabilistic modeling, which allows us to take advantage of current efforts in high-dimensional density estimation, and in particular, graphical models and maximum entropy models. We present a parameter learning algorithm based on maximum likelihood, and we show how a variety of current approximate inference methods apply. We evaluate the quality of our model with reinforcement learning by directly evaluating the control performance of the model.

AAMAS Conference 2007 Conference Paper

On Discovery and Learning of Models with Predictive Representations of State for Agents with Continuous Actions and Observations

  • David Wingate
  • Satinder Singh

Models of agent-environment interaction that use predictive state representations (PSRs) have mainly focused on the case of discrete observations and actions. The theory of discrete PSRs uses an elegant construct called the system dynamics matrix and derives the notion of predictive state as a sufficient statistic via the rank of the matrix. With continuous observations and actions, such a matrix and its rank no longer exist. In this paper, we show how to define an analogous construct for the continuous case, called the system dynamics distributions, and use information theoretic notions to define a sufficient statistic and thus state. Given this new construct, we use kernel density estimation to learn approximate system dynamics distributions from data, and use information-theoretic tools to derive algorithms for discovery of state and learning of model parameters. We illustrate our new modeling method on two example problems.

AAAI Conference 2006 Conference Paper

Mixtures of Predictive Linear Gaussian Models for Nonlinear, Stochastic Dynamical Systems

  • David Wingate

The Predictive Linear Gaussian model (or PLG) improves upon traditional linear dynamical system models by using a predictive representation of state, which makes consistent parameter estimation possible without any loss of modeling power and while using fewer parameters. This work extends the PLG to model nonlinear dynamical systems through the use of a kernelized, nonlinear mixture technique. The resulting generative model has been named the “MPLG,” for “Mixture of PLGs.” We also develop a novel technique to perform inference in the model, which consists of a hybrid of sigma-point approximations and analytical statistics. We show that the technique leads to fast and accurate approximations, and that it is general enough to be applied in other contexts. We empirically explore the MPLG and demonstrate its viability on several real-world and synthetic tasks.

UAI Conference 2005 Conference Paper

Predictive Linear-Gaussian Models of Stochastic Dynamical Systems

  • Matthew R. Rudary
  • Satinder Singh
  • David Wingate

Models of dynamical systems based on predictive state representations (PSRs) are defined strictly in terms of observable quantities, in contrast with traditional models (such as Hidden Markov Models) that use latent variables or state-space representations. In addition, PSRs have an effectively infinite memory, allowing them to model some systems that finite memory-based models cannot. Thus far, PSR models have primarily been developed for domains with discrete observations. Here, we develop the Predictive Linear-Gaussian (PLG) model, a class of PSR models for domains with continuous observations. We show that PLG models subsume Linear Dynamical System models (also called Kalman filter models or state-space models) while using fewer parameters. We also introduce an algorithm to estimate PLG parameters from data, and contrast it with standard Expectation Maximization (EM) algorithms used to estimate Kalman filter parameters. We show that our algorithm is a consistent estimation procedure and present preliminary empirical results suggesting that our algorithm outperforms EM, particularly as the model dimension increases.

JMLR Journal 2005 Journal Article

Prioritization Methods for Accelerating MDP Solvers

  • David Wingate
  • Kevin D. Seppi

The performance of value and policy iteration can be dramatically improved by eliminating redundant or useless backups, and by backing up states in the right order. We study several methods designed to accelerate these iterative solvers, including prioritization, partitioning, and variable reordering. We generate a family of algorithms by combining several of the methods discussed, and present extensive empirical evidence demonstrating that performance can improve by several orders of magnitude for many problems, while preserving accuracy and convergence guarantees.
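
Backing up states "in the right order" can be illustrated with a priority queue keyed on the Bellman residual. This is a generic sketch of prioritized value iteration, not necessarily any of the specific algorithms studied in the article; the MDP encoding (`transitions[s][a]` as a list of `(next_state, probability)` pairs, `rewards[s][a]` as a number) and all names are assumptions.

```python
import heapq


def bellman_backup(s, values, transitions, rewards, gamma):
    """Best one-step lookahead value for state s."""
    return max(
        rewards[s][a] + gamma * sum(p * values[t] for t, p in transitions[s][a])
        for a in range(len(transitions[s]))
    )


def prioritized_value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    n = len(transitions)
    values = [0.0] * n

    # predecessors[t] = states that can reach t in one step
    predecessors = [set() for _ in range(n)]
    for s in range(n):
        for outcomes in transitions[s]:
            for t, p in outcomes:
                if p > 0:
                    predecessors[t].add(s)

    def residual(s):
        return abs(bellman_backup(s, values, transitions, rewards, gamma) - values[s])

    # Max-heap on residual (negated, since heapq is a min-heap).
    heap = [(-residual(s), s) for s in range(n)]
    heapq.heapify(heap)
    backups = 0
    while heap:
        _, s = heapq.heappop(heap)
        if residual(s) <= tol:
            continue  # stale entry, or the state has already converged
        values[s] = bellman_backup(s, values, transitions, rewards, gamma)
        backups += 1
        # Only predecessors of s can have had their residual changed.
        for pred in predecessors[s]:
            r = residual(pred)
            if r > tol:
                heapq.heappush(heap, (-r, pred))
    return values, backups
```

States whose value estimate is already consistent with their successors are never backed up, which is the "eliminating redundant or useless backups" part; the heap ordering handles the "right order" part.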