Arrow Research search

Author name cluster

Samuel Ainsworth

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

2

NeurIPS Conference 2025 Conference Paper

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

  • Ori Press
  • Brandon Amos
  • Haoyu Zhao
  • Yikai Wu
  • Samuel Ainsworth
  • Dominik Krupke
  • Patrick Kidger
  • Touqir Sajed

Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (SWE-Bench) and mathematics (FrontierMath). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 120 tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1. 58x speedup against reference solvers, including methods from packages such as SciPy, scikit-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.

NeurIPS Conference 2019 Conference Paper

Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

  • Samuel Ainsworth
  • Matt Barnes
  • Siddhartha Srinivasa

In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.