Arrow Research search

Author name cluster

Minrui Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
1 author row

Possible papers

2

NeurIPS 2025 Conference Paper

Model Merging in Pre-training of Large Language Models

  • Yunshui Li
  • Yiyuan Ma
  • Shen Yan
  • Chaoyi Zhang
  • Jing Liu
  • Jianqiao Lu
  • Ziwen Xu
  • Mengzhao Chen

Model merging has emerged as a promising technique for enhancing large language models, though its application in large-scale pre-training remains relatively unexplored. In this paper, we present a comprehensive investigation of model merging techniques during the pre-training process. Through extensive experiments with both dense and Mixture-of-Experts (MoE) architectures ranging from millions to over 100 billion parameters, we demonstrate that merging checkpoints trained with constant learning rates not only achieves significant performance improvements but also enables accurate prediction of annealing behavior. These improvements lead to both more efficient model development and significantly lower training costs. Our detailed ablation studies on merging strategies and hyperparameters provide new insights into the underlying mechanisms while uncovering novel applications. Through comprehensive experimental analysis, we offer the open-source community practical pre-training guidelines for effective model merging.
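The core operation the abstract describes, combining checkpoints saved during pre-training into one model, can be sketched minimally as a weighted average of parameters. This is an illustrative sketch only: the function name `merge_checkpoints`, the uniform weighting, and the plain-dict "state dict" are assumptions, not the paper's actual merging strategy or hyperparameters.

```python
def merge_checkpoints(checkpoints, weights=None):
    """Merge model checkpoints by averaging each named parameter.

    checkpoints: list of dicts mapping parameter name -> value
    weights: optional per-checkpoint merge weights (uniform by default)
    """
    n = len(checkpoints)
    if weights is None:
        weights = [1.0 / n] * n
    merged = {}
    for name in checkpoints[0]:
        # weighted sum of this parameter across all checkpoints
        merged[name] = sum(w * ckpt[name] for w, ckpt in zip(weights, checkpoints))
    return merged

# toy example with scalar "parameters" standing in for tensors
ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
print(merge_checkpoints(ckpts))  # {'w': 2.0, 'b': 1.0}
```

In practice the values would be tensors (e.g. entries of a PyTorch `state_dict`) rather than scalars, but the averaging logic is the same.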

AAMAS 2023 Conference Paper

Multi-Agent Reinforcement Learning with Safety Layer for Active Voltage Control

  • Yufeng Shi
  • Mingxiao Feng
  • Minrui Wang
  • Wengang Zhou
  • Houqiang Li

The main goal of active voltage control is to keep the voltage of each bus in the grid within a safe range. With the increasing penetration of renewable and distributed energy sources, growing complexity, increasing uncertainty, and worsening volatility pose great challenges to voltage control in modern power systems. Traditional algorithms can hardly guarantee real-time safe control under these conditions. In recent years, substantial attention has been paid to applying multi-agent reinforcement learning (MARL) algorithms to coordinate the control units in each area of the grid in real time for active voltage control in complex scenarios. However, these MARL algorithms do not explicitly guarantee that the power system satisfies its security constraints, and there has been little in-depth study of safe multi-agent policy learning in multi-agent-based voltage control, especially the direct correction of unsafe actions. In this paper, we formalize the active voltage control problem as a Constrained Markov Game and approach it with a centralized data-driven safety layer that requires global observations and maps unsafe actions to safe actions. To make the policy network rely on local observations for decentralized execution, we introduce two novel components into the policy network: an action correction penalty loss and action correction subnetworks. Notably, our approaches are easily extendable to other MARL algorithms with continuous actions. In experiments on a power distribution network simulation environment, we demonstrate the safety layer's capability to correct unsafe actions and its effectiveness in improving the performance of the policy itself.
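The two mechanisms the abstract names, a safety layer that maps unsafe actions to safe ones and a penalty loss that discourages the policy from proposing actions that need correction, can be sketched generically. This is a simplified stand-in, not the paper's method: the paper's safety layer is data-driven and learned from global observations, whereas here a fixed box constraint `[low, high]` plays the role of the safe set, and the names `safety_layer` and `correction_penalty` are illustrative.

```python
def safety_layer(action, low, high):
    """Project a proposed continuous action onto the safe set.

    Here the safe set is a simple box [low, high]; the projection
    clamps the action to the nearest safe value.
    """
    return max(low, min(high, action))

def correction_penalty(action, low, high):
    """Squared distance between the proposed and corrected action.

    Adding this term to the policy loss encourages the policy to
    propose actions that the safety layer does not need to change,
    so execution can rely on local observations alone.
    """
    safe = safety_layer(action, low, high)
    return (action - safe) ** 2

# an unsafe action is corrected, and the correction is penalized
print(safety_layer(1.5, -1.0, 1.0))       # 1.0
print(correction_penalty(1.5, -1.0, 1.0)) # 0.25
print(correction_penalty(0.3, -1.0, 1.0)) # 0.0 (already safe)
```

The design point this illustrates: correcting actions at execution time keeps the system safe, while penalizing corrections during training pushes the policy itself toward the safe region.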