Samuel Ainsworth Papers

NeurIPS Conference 2025 Conference Paper

AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?

Ori Press
Brandon Amos
Haoyu Zhao
Yikai Wu
Samuel Ainsworth
Dominik Krupke
Patrick Kidger
Touqir Sajed

Despite progress in language model (LM) capabilities, evaluations have thus far focused on models' performance on tasks that humans have previously solved, including in programming (SWE-Bench) and mathematics (FrontierMath). We therefore propose testing models' ability to design and implement algorithms in an open-ended benchmark: We task LMs with writing code that efficiently solves computationally challenging problems in computer science, physics, and mathematics. Our AlgoTune benchmark consists of 120 tasks collected from domain experts and a framework for validating and timing LM-synthesized solution code, which is compared to reference implementations from popular open-source packages. In addition, we develop a baseline LM agent, AlgoTuner, and evaluate its performance across a suite of frontier models. AlgoTuner achieves an average 1.58x speedup against reference solvers, including methods from packages such as SciPy, scikit-learn and CVXPY. However, we find that current models fail to discover algorithmic innovations, instead preferring surface-level optimizations. We hope that AlgoTune catalyzes the development of LM agents exhibiting creative problem solving beyond state-of-the-art human performance.

NeurIPS Conference 2019 Conference Paper

Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation

Samuel Ainsworth
Matt Barnes
Siddhartha Srinivasa

In many environments, only a relatively small subset of the complete state space is necessary in order to accomplish a given task. We develop a simple technique using emergency stops (e-stops) to exploit this phenomenon. Using e-stops significantly improves sample complexity by reducing the amount of required exploration, while retaining a performance bound that efficiently trades off the rate of convergence with a small asymptotic sub-optimality gap. We analyze the regret behavior of e-stops and present empirical results in discrete and continuous settings demonstrating that our reset mechanism can provide order-of-magnitude speedups on top of existing reinforcement learning methods.
