Weinong Wang Papers

AAAI Conference 2026 Conference Paper

LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

Junlong Jia
Xing Wu
Chaochen Gao
Ziyang Chen
Zijia Lin
Zhongzhi Li
Weinong Wang
Haotian Xu

High-quality long-context data is essential for training large language models (LLMs) capable of processing extensive documents, yet existing synthesis approaches using relevance-based aggregation face challenges of computational efficiency. We present LiteLong, a resource-efficient method for synthesizing long-context data through structured topic organization and multi-agent debate. Our approach leverages the BISAC book classification system to provide a comprehensive hierarchical topic organization, and then employs a debate mechanism with multiple LLMs to generate diverse, high-quality topics within this structure. For each topic, we use lightweight BM25 retrieval to obtain relevant documents and concatenate them into 128K-token training samples. Experiments on HELMET and Ruler benchmarks demonstrate that LiteLong achieves competitive long-context performance and can seamlessly integrate with other long-dependency enhancement methods. LiteLong makes high-quality long-context data synthesis more accessible by reducing both computational and data engineering costs, facilitating further research in long-context language training.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving

Xinji Mai
Haotian Xu
Xing W
Weinong Wang
Yingying Zhang
Wenqiang Zhang

Large Language Models (LLMs) often struggle with mathematical reasoning tasks requiring precise, verifiable computation. While Reinforcement Learning (RL) from outcome-based rewards enhances text-based reasoning, understanding how agents autonomously learn to leverage external tools like code execution remains crucial. We investigate RL from outcome-based rewards for Tool-Integrated Reasoning, ZeroTIR, training base LLMs to spontaneously generate and execute Python code for mathematical problems without supervised tool-use examples. Our central contribution is we demonstrate that as RL training progresses, key metrics scale predictably. Specifically, we observe strong positive correlations where increased training steps lead to increases in the spontaneous code execution frequency, the average response length, and, critically, the final task accuracy. This suggests a quantifiable relationship between computational effort invested in training and the emergence of effective, tool-augmented reasoning strategies. We implement a robust framework featuring a decoupled code execution environment and validate our findings across standard RL algorithms and frameworks. Experiments show ZeroTIR significantly surpasses non-tool ZeroRL baselines on challenging math benchmarks. Our findings provide a foundational understanding of how autonomous tool use is acquired and scales within Agent RL, offering a reproducible benchmark for future studies. Code is released at \href{https: //github. com/yyht/openrlhf async pipline}{https: //github. com/yyht/openrlhf_async_pipline}.

PDF Details

Possible papers

LiteLong: Resource-Efficient Long-Context Data Synthesis for LLMs

Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving