Arrow Research

Author name cluster

Nathaniel Weir

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

ICLR 2025 · Conference Paper

From Models to Microtheories: Distilling a Model's Topical Knowledge for Grounded Question-Answering

  • Nathaniel Weir
  • Bhavana Dalvi Mishra
  • Orion Weller
  • Oyvind Tafjord
  • Sam Hornstein
  • Alexander Sabol
  • Peter A. Jansen
  • Benjamin Van Durme

Recent reasoning methods (e.g., chain-of-thought) help users understand how language models (LMs) answer a single question, but they do little to reveal the LM’s overall understanding, or “theory,” about the question’s topic, making it still hard to trust the model. Our goal is to materialize such theories - here called microtheories (a linguistic analog of logical microtheories) - as a set of sentences encapsulating an LM’s core knowledge about a topic. These statements systematically work together to entail answers to a set of questions to both engender trust and improve performance. Our approach is to first populate a knowledge store with (model-generated) sentences that entail answers to training questions, and then distill those down to a core microtheory which is concise, general, and non-redundant. We show that, when added to a general corpus (e.g., Wikipedia), microtheories can supply critical information not necessarily present in the corpus, improving both a model’s ability to ground its answers to verifiable knowledge (i.e., show how answers are systematically entailed by documents in the corpus, grounding up to +8% more answers), and the accuracy of those grounded answers (up to +8% absolute). We also show that, in a human evaluation in the medical domain, our distilled microtheories contain a significantly higher concentration of topically critical facts than the non-distilled knowledge store. Finally, we show we can quantify the coverage of a microtheory for a topic (characterized by a dataset) using a notion of p-relevance. Together, these suggest that microtheories are an efficient distillation of an LM’s topic-relevant knowledge, that they can usefully augment existing corpora, and can provide both performance gains and an interpretable, verifiable window into the model’s knowledge of a topic.
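As a rough illustration of the pipeline this abstract describes (populate a knowledge store, then distill a concise core), here is a minimal greedy set-cover sketch. All names (`generate_candidate_facts`, `entails`, `distill_microtheory`) are hypothetical stand-ins, not the paper's implementation; a real system would call an LM for generation and an entailment model for verification.

```python
# Sketch only: real systems would call an LM and an entailment verifier.

def generate_candidate_facts(question, answer):
    # Stand-in for prompting an LM to write sentences that entail the answer.
    return [f"{answer} explains {question}", f"a general rule about {question}"]

def entails(facts, question, answer):
    # Stand-in for an entailment check (e.g., an NLI model over the facts).
    return any(answer in fact for fact in facts)

def distill_microtheory(train_set):
    # 1) Populate a knowledge store with model-generated sentences.
    store = set()
    for question, answer in train_set:
        store.update(generate_candidate_facts(question, answer))
    # 2) Greedily keep a fact only if it newly entails some training answer,
    #    leaving a concise, general, non-redundant core microtheory.
    microtheory, covered = [], set()
    for fact in sorted(store):
        gained = {(q, a) for q, a in train_set
                  if (q, a) not in covered and entails(microtheory + [fact], q, a)}
        if gained:
            microtheory.append(fact)
            covered |= gained
    return microtheory

train = [("what conducts electricity", "copper"), ("what is H2O", "water")]
print(distill_microtheory(train))  # a two-sentence "microtheory" for this toy set
```

The greedy loop keeps a sentence only when it newly entails some training answer, which is what makes the distilled result concise and non-redundant in this toy rendering.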

AAAI 2025 · Conference Paper

SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses

  • Dongwei Jiang
  • Jingyu Zhang
  • Orion Weller
  • Nathaniel Weir
  • Benjamin Van Durme
  • Daniel Khashabi

Can LLMs consistently improve their previous outputs for better results? For this to be true, LLMs would need to be better at discriminating among previously-generated alternatives than at generating initial responses. We explore the validity of this hypothesis in practice. We first formulate a unified framework that allows us to compare the generative and discriminative capability of any model on any task. In our resulting experimental analysis of several open-source and industrial LLMs, we observe that models are not reliably better at discriminating among previously-generated alternatives than at generating initial responses. This finding challenges the notion that LLMs may be able to enhance their performance purely through their own judgment.
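The "unified framework" can be pictured as scoring the same model twice on the same questions: once on its first-pass generations, and once on its ability to pick the correct answer out of its own samples. The sketch below is a toy rendering under that assumption; `sample_answers` and `pick_best` are hypothetical stand-ins for LLM calls.

```python
# Sketch only: `sample_answers` and `pick_best` stand in for LLM calls.

def sample_answers(question, k=4):
    # Stand-in: a real system samples k answers from the model.
    pool = {"2+2": ["3", "4", "5", "22"],
            "capital of France": ["Paris", "Lyon", "Nice", "Lille"]}
    return pool[question][:k]

def pick_best(question, candidates):
    # Stand-in: the same model is asked to choose among its own outputs.
    return candidates[0]  # a lazy discriminator that keeps the first sample

def compare(dataset, k=4):
    gen_hits = disc_tries = disc_hits = 0
    for question, gold in dataset:
        candidates = sample_answers(question, k)
        gen_hits += candidates[0] == gold          # generative: first sample
        if gold in candidates:                     # discriminative: pick the
            disc_tries += 1                        # gold one when it exists
            disc_hits += pick_best(question, candidates) == gold
    return gen_hits / len(dataset), disc_hits / max(disc_tries, 1)

dataset = [("2+2", "4"), ("capital of France", "Paris")]
print(compare(dataset))  # (generative accuracy, discriminative accuracy)
```

Under the paper's hypothesis, the second number would need to beat the first for self-correction to pay off.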

NeurIPS 2024 · Conference Paper

Learning to Reason via Program Generation, Emulation, and Search

  • Nathaniel Weir
  • Muhammad Khalifa
  • Linlu Qiu
  • Orion Weller
  • Peter Clark

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g., word concatenation). However, not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM’s program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (COGEX). COGEX works by (1) training LMs to generate pseudo-programs, (2) teaching them to emulate their generated program’s execution, including those leaf functions, allowing the LM’s knowledge to fill in the execution gaps, and (3) using them to search over many programs to find an optimal one. To adapt the COGEX model to a new task, we introduce a method for performing program search to find a single program whose pseudo-execution yields optimal performance when applied to all the instances of a given dataset. We show that our approach yields large improvements compared to standard in-context learning approaches on a battery of tasks, both algorithmic and soft reasoning. This result thus demonstrates that code synthesis can be applied to a much broader class of problems than previously considered.
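A minimal sketch of the emulated-execution idea: the generated pseudo-program below calls a leaf function (`is_sarcastic`) that has no real implementation, and a stand-in emulator supplies its result, as a COGEX-style model would from its own knowledge. All names here are hypothetical, not the paper's code.

```python
# Sketch only: `llm_emulate` stands in for the LM emulating a leaf call.

def llm_emulate(fn_name, *args):
    # Stand-in: a real COGEX-style model predicts the leaf call's result.
    canned = {("is_sarcastic", "Oh great, another Monday."): True}
    return canned.get((fn_name, *args), False)

class EmulatedLeaves:
    # Any undefined leaf function call is routed to the emulator.
    def __getattr__(self, name):
        return lambda *args: llm_emulate(name, *args)

leaf = EmulatedLeaves()

def classify_remark(text):
    # A generated pseudo-program: `is_sarcastic` has no real implementation,
    # so its result must come from emulated execution.
    if leaf.is_sarcastic(text):
        return "sarcastic"
    return "sincere"

print(classify_remark("Oh great, another Monday."))  # -> sarcastic
```

Step (3), program search, would then rank many such pseudo-programs by their pseudo-execution accuracy on a dataset and keep the best one.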

IJCAI 2024 · Conference Paper

NELLIE: A Neuro-Symbolic Inference Engine for Grounded, Compositional, and Explainable Reasoning

  • Nathaniel Weir
  • Peter Clark
  • Benjamin Van Durme

Our goal is to develop a modern approach to answering questions via systematic reasoning where answers are supported by human interpretable proof trees grounded in an NL corpus of facts. Such a system would help alleviate the challenges of interpretability and hallucination with modern LMs, and the lack of grounding of current explanation methods (e.g., Chain-of-Thought). This paper proposes a new take on Prolog-based inference engines, where we replace handcrafted rules with a combination of neural language modeling, guided generation, and semiparametric dense retrieval. Our implementation, NELLIE, is the first system to demonstrate fully interpretable, end-to-end grounded QA as entailment tree proof search, going beyond earlier work explaining known-to-be-true facts from text. In experiments, NELLIE outperforms a similar-sized state-of-the-art reasoner while producing knowledge-grounded explanations. We also find NELLIE can exploit both semi-structured and NL text corpora to guide reasoning. Together these suggest a new way to jointly reap the benefits of both modern neural methods and traditional symbolic reasoning.
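The Prolog-like control flow can be sketched as backward chaining: try to ground a goal directly in the corpus, otherwise ask a generator for candidate two-premise decompositions and recurse. `retrieve_fact` and `propose_decompositions` below are hypothetical stubs for the dense retrieval and guided generation components.

```python
# Sketch only: backward-chaining entailment-tree search over an NL corpus.

CORPUS = {"metals conduct electricity", "copper is a metal"}

def retrieve_fact(goal):
    # Stand-in for semiparametric dense retrieval over the fact corpus.
    return goal if goal in CORPUS else None

def propose_decompositions(goal):
    # Stand-in for a neural LM proposing two-premise rules for the goal.
    if goal == "copper conducts electricity":
        return [("copper is a metal", "metals conduct electricity")]
    return []

def prove(goal, depth=3):
    # Return a proof tree grounding `goal` in the corpus, or None.
    if retrieve_fact(goal) is not None:
        return {"goal": goal, "support": "corpus"}
    if depth == 0:
        return None
    for premises in propose_decompositions(goal):
        subtrees = [prove(p, depth - 1) for p in premises]
        if all(subtrees):
            return {"goal": goal, "support": subtrees}
    return None

print(prove("copper conducts electricity"))  # a nested, corpus-grounded proof tree
```

Every leaf of the returned tree is a retrieved corpus sentence, which is what makes the proof interpretable and grounded rather than free-form generated text.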

AAMAS 2023 · Conference Paper

One-Shot Learning from a Demonstration with Hierarchical Latent Language

  • Nathaniel Weir
  • Xingdi Yuan
  • Marc-Alexandre Côté
  • Matthew Hausknecht
  • Romain Laroche
  • Ida Momennejad
  • Harm van Seijen
  • Benjamin Van Durme

Humans have the capability, aided by the expressive compositionality of their language, to learn quickly by demonstration. They are able to describe unseen task-performing procedures and generalize their execution to other contexts. This work introduces DescribeWorld, a Minecraft-like grid world environment designed to test this sort of generalization skill in grounded agents, where tasks are linguistically and procedurally composed of elementary concepts. The agent observes a single task demonstration, and is then asked to carry out the same task in a new map. To enable such a level of generalization, we propose a neural agent infused with hierarchical latent language—at the levels of task inference and subtask planning. Through a suite of generalization tests, we find agents that perform text-based inference are better equipped for the challenge under a random split of tasks.
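The two-level use of language the abstract describes can be caricatured as: infer a textual task description from the demonstration, then repeatedly plan the next subtask in text and hand it to a low-level policy. The sketch below is purely illustrative; every function is a hypothetical placeholder for a learned module.

```python
# Sketch only: a two-level "hierarchical latent language" agent loop.

def infer_task(demonstration):
    # High level: describe in language what the demonstrator was doing.
    return "craft a stone pickaxe"  # stand-in for a learned task describer

def plan_subtask(task, state):
    # Mid level: emit the next textual subtask toward the inferred task.
    return "collect stone" if "stone" not in state else "craft pickaxe"

def execute(subtask, state):
    # Low level: a policy grounds the subtask into actions; stubbed here.
    state.add("stone" if subtask == "collect stone" else "pickaxe")
    return state

state, task = set(), infer_task(demonstration=[("move", "mine")])
for _ in range(2):
    state = execute(plan_subtask(task, state), state)
print(state)  # -> {'stone', 'pickaxe'}
```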