Arrow Research search
Back to NeurIPS

NeurIPS 2025

Thinking vs. Doing: Improving Agent Reasoning by Scaling Test-Time Interaction

Conference Paper Main Conference Track Artificial Intelligence ยท Machine Learning

Abstract

Test-time scaling in agentic tasks often relies on generating long reasoning traces ("think" more) before acting, but this does not allow agents to acquire new information from the environment or adapt behavior over time. In this work, we propose scaling test-time interaction, an untapped dimension for test-time scaling that increases the agent's interaction horizon to enable rich behaviors such as exploration, backtracking, and dynamic re-planning within a single rollout. To demonstrate the promise of this scaling dimension, we situate our study in the domain of web agents. We first show that even prompting-based interaction scaling can improve task success on web benchmarks non-trivially. Building on this, we introduce TTI, a curriculum-based online reinforcement learning (RL) approach that trains agents by adaptively adjusting their interaction lengths during rollout. Using a Gemma 3 12B model, TTI sets a new state-of-the-art among open-source agents trained on public data on WebVoyager and WebArena. Case studies further reveal that TTI enables agents to balance exploration and exploitation adaptively. Our results establish interaction scaling as a powerful, complementary axis to scaling per-action compute, offering new avenues for training robust and adaptive agents.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
Annual Conference on Neural Information Processing Systems
Archive span
1987-2025
Indexed papers
30776
Paper id
235035188328253345