Arrow Research search

Author name cluster

Leon Bergen

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

5 papers
2 author rows

Possible papers (5)

AAAI 2026 · Conference Paper

Quiet Feature Learning in Algorithmic Tasks

  • Prudhviraj Naidu
  • Zixian Wang
  • Leon Bergen
  • Ramamohan Paturi

We train Transformer-based language models on ten foundational algorithmic tasks and observe pronounced phase transitions in their loss curves that deviate from established power-law scaling trends. Over large ranges of compute, the validation loss barely improves, then abruptly decreases. Probing the models’ internal representations reveals that quiet features are learned prior to any decrease in task loss. These quiet features represent intermediate algorithmic computations that do not by themselves improve the output loss. Ablation experiments demonstrate that individual quiet features are causally necessary for task performance. Our results show that substantial representational progress can remain hidden beneath an apparently flat loss curve, challenging the prevailing use of cross-entropy as a proxy for learning and motivating richer diagnostics for monitoring model training.
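The probing step invites a compact illustration. Below is a minimal linear-probe sketch (not the paper's code): it asks whether a ground-truth intermediate quantity of an algorithmic task is linearly decodable from a checkpoint's hidden states. `hidden_states` and `intermediate_labels` are hypothetical stand-ins, filled with random data here.

```python
# Minimal linear-probe sketch for detecting a "quiet" feature: an
# intermediate quantity that is decodable from hidden states even
# while the task loss is still flat. Random data stands in for real
# activations and labels, so this probe should score near chance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 256))         # [examples, hidden_dim] from one checkpoint
intermediate_labels = rng.integers(0, 2, size=2000)  # e.g. a carry bit in multi-digit addition

X_tr, X_te, y_tr, y_te = train_test_split(
    hidden_states, intermediate_labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.3f}")
```

Probe accuracy well above chance at a checkpoint whose validation loss has not yet dropped would indicate a quiet feature in the abstract's sense.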

ICML 2025 · Conference Paper

Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation

  • Bohan Lyu 0001
  • Yadi Cao
  • Duncan Watson-Parris
  • Leon Bergen
  • Taylor Berg-Kirkpatrick
  • Rose Yu

Large Language Models (LLMs) demonstrate promising capabilities in solving scientific problems but often suffer from hallucination. While integrating LLMs with tools can mitigate this issue, models fine-tuned on tool usage become over-reliant on them and incur unnecessary costs. Inspired by how human experts assess problem complexity before selecting solutions, we propose a novel two-component fine-tuning method, Adapting while Learning (AWL). In the first component, World Knowledge Learning (WKL), LLMs internalize scientific knowledge by learning from tool-generated solutions. In the second component, Tool Usage Adaptation (TUA), we categorize problems as easy or hard based on the model’s accuracy, and train it to maintain direct reasoning for easy problems while switching to tools for hard ones. We validate our method on 6 scientific benchmark datasets across climate science, epidemiology, physics, and other domains. Compared to the original instruct model (8B), models post-trained with AWL achieve 29.11% higher answer accuracy and 12.72% better tool usage accuracy, even surpassing state-of-the-art models including GPT-4o and Claude-3.5 on 4 custom-created datasets. Our code is open-source at https://github.com/Rose-STL-Lab/Adapting-While-Learning.
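The easy/hard routing in TUA can be sketched in a few lines. Everything below is a hedged illustration: the `model_accuracy` field (direct-answer accuracy over sampled attempts) and the 0.5 threshold are assumptions, not the paper's exact procedure.

```python
# Illustrative split for Tool Usage Adaptation (TUA): problems the
# model already answers reliably become direct-reasoning training
# targets; the rest become tool-calling targets.
from dataclasses import dataclass

@dataclass
class Problem:
    question: str
    model_accuracy: float  # accuracy of the model's direct answers over k samples

def split_for_tua(problems, threshold=0.5):  # threshold is an assumed hyperparameter
    easy = [p for p in problems if p.model_accuracy >= threshold]
    hard = [p for p in problems if p.model_accuracy < threshold]
    return easy, hard

problems = [Problem("roughly how much warming per CO2 doubling?", 0.8),
            Problem("integrate this stiff epidemic ODE system", 0.1)]
easy, hard = split_for_tua(problems)
print(len(easy), "direct-reasoning targets,", len(hard), "tool-routed targets")
```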

ICLR 2025 · Conference Paper

ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models

  • Veeramakali Vignesh Manivannan
  • Yasaman Jafari
  • Srikar Eranky
  • Spencer Ho
  • Rose Yu
  • Duncan Watson-Parris
  • Yian Ma
  • Leon Bergen

The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop *ClimaGen* (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from graduate textbooks with climate scientists in the loop. As a result, we present *ClimaQA-Gold*, an expert-annotated benchmark dataset, alongside *ClimaQA-Silver*, a large-scale, comprehensive synthetic QA dataset for climate science. Finally, we develop evaluation strategies and compare different LLMs on our benchmarks. Our results offer novel insights into various approaches for enhancing the domain knowledge of climate LLMs. ClimaQA’s source code is publicly available at https://github.com/Rose-STL-Lab/genie-climaqa.
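As a rough picture of how such a benchmark is consumed, the loop below computes multiple-choice accuracy against gold answers. The field names and the multiple-choice format are assumptions for the sketch, not ClimaQA's actual schema.

```python
# Hypothetical scoring loop for an expert-annotated QA benchmark.
def mcq_accuracy(benchmark, answer_fn):
    correct = sum(answer_fn(item["question"], item["choices"]) == item["gold"]
                  for item in benchmark)
    return correct / len(benchmark)

benchmark = [{"question": "Which gas contributes most to the greenhouse effect?",
              "choices": ["A) CO2", "B) Water vapor", "C) CH4", "D) O3"],
              "gold": "B"}]
print(mcq_accuracy(benchmark, lambda q, c: "B"))  # trivial stand-in for a model
```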

NeurIPS 2023 · Conference Paper

Scientific Document Retrieval using Multi-level Aspect-based Queries

  • Jianyou (Andre) Wang
  • Kaicheng Wang
  • Xiaoyue Wang
  • Prudhviraj Naidu
  • Leon Bergen
  • Ramamohan Paturi

In scientific research, the ability to effectively retrieve relevant documents based on complex, multifaceted queries is critical. Existing evaluation datasets for this task are limited, primarily due to the high cost and effort required to annotate resources that effectively represent complex queries. To address this, we propose a novel task, Scientific Document Retrieval using Multi-level Aspect-based Queries (DORIS-MAE), which is designed to handle the complex nature of user queries in scientific research. We developed a benchmark dataset within the field of computer science, consisting of 100 human-authored complex query cases. For each complex query, we assembled a collection of 100 relevant documents and produced annotated relevance scores for ranking them. Recognizing the significant labor of expert annotation, we also introduce Anno-GPT, a scalable framework for evaluating the viability of Large Language Models (LLMs) such as ChatGPT-3.5 for expert-level dataset annotation tasks. Applying Anno-GPT to annotate the DORIS-MAE dataset resulted in a 500x cost reduction without compromising quality. Furthermore, due to the multi-tiered structure of these complex queries, the DORIS-MAE dataset can be extended to over 4,000 sub-query test cases without requiring additional annotation. We evaluated 17 recent retrieval methods on DORIS-MAE, observing notable performance drops compared to traditional datasets. This highlights DORIS-MAE's challenges and the need for better approaches to handle complex, multifaceted queries in scientific research. Our dataset and codebase are available at https://github.com/Real-Doris-Mae/Doris-Mae-Dataset.
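The sub-query expansion follows from the query structure: if each complex query is a set of annotated aspects, every non-empty subset of aspects yields a derived sub-query whose relevance labels can be aggregated from the aspect-level ones. The sketch below illustrates that counting argument; the mean-aggregation rule is an assumption for illustration, not DORIS-MAE's exact scheme.

```python
# Sketch: deriving sub-query test cases from aspect annotations
# without new labeling. Three aspects already yield 2^3 - 1 = 7
# sub-queries; ~100 queries with several aspects each can therefore
# expand into thousands of test cases.
from itertools import combinations

def subqueries(aspects):
    for r in range(1, len(aspects) + 1):
        yield from combinations(aspects, r)

def aggregate_relevance(aspect_scores, combo):
    # aspect_scores: {aspect: {doc_id: score}}; mean over the chosen aspects
    docs = aspect_scores[combo[0]].keys()
    return {d: sum(aspect_scores[a][d] for a in combo) / len(combo) for d in docs}

aspects = ("graph neural networks", "molecule generation", "benchmark design")
print(sum(1 for _ in subqueries(aspects)))  # 7
```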

NeurIPS 2021 · Conference Paper

Systematic Generalization with Edge Transformers

  • Leon Bergen
  • Timothy O'Donnell
  • Dzmitry Bahdanau

Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks. To tackle this challenge, we propose the Edge Transformer, a new model that combines inspiration from Transformers and rule-based symbolic AI. The first key idea in Edge Transformers is to associate vector states with every edge, that is, with every pair of input nodes, as opposed to just every node, as is done in the Transformer model. The second major innovation is a triangular attention mechanism that updates edge representations in a way inspired by unification from logic programming. We evaluate the Edge Transformer on compositional generalization benchmarks in relational reasoning, semantic parsing, and dependency parsing. In all three settings, the Edge Transformer outperforms Relation-aware, Universal, and classical Transformer baselines.
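The triangular attention update can be made concrete with a single-head sketch, under one plausible reading of the description above: the new state of edge (i, j) attends over intermediate nodes l, scoring edge (i, l) against edge (l, j) and combining their value projections elementwise. The weight shapes and the product-based value combination are illustrative assumptions, not the paper's exact parameterization.

```python
# Single-head triangular attention over edge states E[i, j] (a sketch).
import numpy as np

def triangular_attention(E, Wq, Wk, Wv1, Wv2):
    # E: [n, n, d] edge states; returns updated [n, n, d] edge states.
    n, _, d = E.shape
    Q, K = E @ Wq, E @ Wk    # queries from edges (i, l), keys from edges (l, j)
    V, U = E @ Wv1, E @ Wv2  # two value projections, combined elementwise below
    out = np.zeros_like(E)
    for i in range(n):
        for j in range(n):
            scores = np.einsum("ld,ld->l", Q[i], K[:, j]) / np.sqrt(d)
            a = np.exp(scores - scores.max()); a /= a.sum()  # softmax over l
            out[i, j] = np.einsum("l,ld->d", a, V[i] * U[:, j])
    return out

rng = np.random.default_rng(0)
n, d = 4, 8
E = rng.normal(size=(n, n, d))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(4)]
print(triangular_attention(E, *Ws).shape)  # (4, 4, 8)
```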