ICML 2025 Conference Paper
MARGE: Improving Math Reasoning with Guided Exploration
- Jingyue Gao
- Runji Lin
- Keming Lu
- Bowen Yu
- Junyang Lin
- Jianyu Chen
Large Language Models (LLMs) show strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation calls for scaling up the number of responses through self-generated data, but current methods struggle because ineffective exploration across all reasoning stages produces spuriously correlated data. To address this challenge, we introduce MARGE: Improving Math Reasoning with Guided Exploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a trade-off common in alignment methods. These results demonstrate MARGE’s effectiveness in enhancing mathematical reasoning capabilities and in unlocking the potential of scaling self-generated training data.
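The abstract does not spell out the exploration procedure. As a rough illustration only, and not the authors' implementation, the sketch below shows one generic way to explore intermediate states of a self-generated solution and assign step-level credit: treat each prefix of the solution as a state, estimate each state's success rate with Monte Carlo rollouts, and credit each step with the change in that rate. The names `rollout`, `reference_answer`, and `num_rollouts` are hypothetical, and the sketch assumes the caller supplies a sampling policy and a checkable final answer.

```python
from typing import Callable, List, Sequence


def intermediate_states(steps: Sequence[str]) -> List[str]:
    # State k is the prefix made of the first k reasoning steps (k = 0 is an empty prefix).
    return ["\n".join(steps[:k]) for k in range(len(steps) + 1)]


def estimate_state_values(
    query: str,
    steps: Sequence[str],
    rollout: Callable[[str, str], str],  # hypothetical policy: (query, prefix) -> final answer
    reference_answer: str,  # assumption: a verifiable final answer is available
    num_rollouts: int = 8,
) -> List[float]:
    # Monte Carlo estimate of P(correct final answer | state) for every prefix.
    values = []
    for prefix in intermediate_states(steps):
        hits = sum(rollout(query, prefix) == reference_answer for _ in range(num_rollouts))
        values.append(hits / num_rollouts)
    return values


def step_credits(values: Sequence[float]) -> List[float]:
    # Credit for step k is the change in estimated success rate caused by appending it.
    return [values[k + 1] - values[k] for k in range(len(values) - 1)]


if __name__ == "__main__":
    steps = ["Let x = 2.", "Then x + 2 = 4."]
    # Toy deterministic "policy": it answers correctly only once x has been fixed.
    toy_rollout = lambda q, prefix: "4" if "x = 2" in prefix else "5"
    values = estimate_state_values("What is x + 2?", steps, toy_rollout, "4")
    print(values)  # [0.0, 1.0, 1.0]
    print(step_credits(values))  # [1.0, 0.0]
```

In this toy run, the first step receives all of the credit because it is the step that turns a failing state into a succeeding one. A real policy would be stochastic, so the estimates would lie strictly between 0 and 1 and would become more reliable as `num_rollouts` grows.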