SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Yanwei Ren; Haotian Zhang; Fuxiang Wu; Jiayan Qiu; Jiaxing Huang; Baosheng Yu; Liu Liu

Back to NeurIPS

NeurIPS 2025

SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Conference Paper Main Conference Track Artificial Intelligence · Machine Learning

PDF Details

Abstract

Enhancing large language models by simply scaling up datasets has begun to yield diminishing returns, shifting the spotlight to data quality. Monte Carlo Tree Search (MCTS) has emerged as a powerful technique for generating high-quality chain-of-thought data, yet conventional approaches typically retain only the top-scoring trajectory from the search tree, discarding sibling nodes that often contain valuable partial insights, recurrent error patterns, and alternative reasoning strategies. This unconditional rejection of non-optimal reasoning branches may waste vast amounts of informative data in the whole search tree. We propose SIGMA (Sibling Guided Monte Carlo Augmentation), a novel framework that reintegrates these discarded sibling nodes to refine LLM reasoning. SIGMA forges semantic links among sibling nodes along each search path and applies a two-stage refinement: a critique model identifies overlooked strengths and weaknesses across the sibling set, and a revision model conducts text-based backpropagation to refine the top-scoring trajectory in light of this comparative feedback. By recovering and amplifying the underutilized but valuable signals from non-optimal reasoning branches, SIGMA substantially improves reasoning trajectories. On the challenging MATH benchmark, our SIGMA-tuned 7B model achieves 54. 92\% accuracy using only 30K samples, outperforming state-of-the-art models trained on 590K samples. This result highlights that our sibling-guided optimization not only significantly reduces data usage but also significantly boosts LLM reasoning.

SIGMA: Refining Large Language Model Reasoning via Sibling-Guided Monte Carlo Augmentation

Abstract

Authors

Keywords

Context