NeurIPS 2025

Factorio Learning Environment

Conference Paper · Datasets and Benchmarks Track · Artificial Intelligence · Machine Learning

Abstract

Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game of Factorio, which tests agents in long-term planning, spatial reasoning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges, from basic automation to complex factories processing millions of resource units per second. We provide two settings: (1) open-play, with the open-ended task of building the largest factory on a procedurally generated map, and (2) lab-play, consisting of 33 bounded tasks across three settings with fixed resources. We demonstrate across both settings that models still lack strong spatial reasoning. In lab-play, we find that LLMs exhibit promising short-horizon skills, yet are unable to operate effectively in constrained environments, reflecting limitations in error analysis. In open-play, while LLMs discover automation strategies that improve growth (e.g., electric-powered drilling), they fail to achieve complex automation (e.g., electronic-circuit manufacturing).

Authors

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
694442154188549691