AAAI 2026
Efficient Plug-and-Play Weight Refinement for Sparse Large Models
Abstract
One-shot pruning efficiently compresses Large Language Models but produces coarse sparse weights, causing significant performance degradation. Traditional fine-tuning approaches to refine these weights are prohibitively expensive for large models. This highlights the need for a training-free weight refinement method that works seamlessly with one-shot pruning and can efficiently recover the lost performance. To tackle this problem, we propose Efficient Iterative Weight Refinement (EIWR), a lightweight, plug-and-play, and training-free method that refines pruned weights through layer-wise iterative optimization. EIWR achieves efficient weight refinement via three key components: a Global Soft Constraint that eliminates costly row-wise Hessian inversions and expands the solution space; a Historical Momentum Strategy that leverages one-shot pruning priors to accelerate convergence and enhance final performance; and a Neumann Series Extrapolation that significantly speeds up per-iteration computation. As a result, EIWR enables effective weight refinement with minimal time and memory overhead. Extensive experiments on LLaMA2/3 and Qwen under different pruning strategies and sparsity levels demonstrate that our method can efficiently refine sparse weights and mitigate performance degradation. For example, on LLaMA2-7B under 70% sparsity, EIWR reduces perplexity by 15% compared with SparseGPT on the WikiText2 benchmark, with only 1.81 minutes of additional computation and 1 GB of additional memory.
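To make the described pipeline concrete, below is a minimal NumPy sketch of layer-wise iterative refinement of one-shot-pruned weights in the spirit of the abstract: a single shared layer Hessian with a soft penalty on masked-out entries (instead of per-row masked Hessian inversions), a momentum term seeded from the one-shot pruning solution, and a truncated Neumann series in place of an exact Hessian inverse. The function name, hyperparameters, and loss details are illustrative assumptions, not the authors' exact EIWR formulation.

```python
import numpy as np

def refine_pruned_weights(W_dense, W_sparse, X, num_iters=10, penalty=1e-2,
                          neumann_order=4, momentum=0.9, damp=1e-2):
    """Sketch of layer-wise iterative refinement of one-shot-pruned weights.

    W_dense : (out, in)  original dense weights of one linear layer
    W_sparse: (out, in)  one-shot-pruned weights (zeros define the mask)
    X       : (n, in)    calibration activations feeding this layer
    Returns a refined weight matrix with the same support as W_sparse.
    All names and hyperparameters are illustrative, not the authors' settings.
    """
    mask = (W_sparse != 0).astype(W_dense.dtype)

    # Layer-wise (averaged, damped) Hessian of the reconstruction loss,
    # shared by all output rows.
    H = X.T @ X / X.shape[0]
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])

    # Truncated Neumann series for H^{-1}: with A = I - H/c and c >= ||H||_2,
    # H^{-1} ~= (1/c) * (I + A + A^2 + ... + A^K).
    c = 1.05 * np.linalg.norm(H, 2)
    A = np.eye(H.shape[0]) - H / c
    H_inv_approx = np.eye(H.shape[0])
    A_pow = np.eye(H.shape[0])
    for _ in range(neumann_order):
        A_pow = A_pow @ A
        H_inv_approx += A_pow
    H_inv_approx /= c

    # Iterative refinement, starting from (and keeping momentum toward) the
    # one-shot pruning solution.
    W = W_sparse.copy()
    velocity = np.zeros_like(W)
    for _ in range(num_iters):
        # Gradient of the averaged reconstruction loss 0.5*||X(W - W_dense)^T||^2,
        # plus a soft penalty pulling masked-out entries toward zero instead of
        # hard-constraining each row separately.
        grad = (W - W_dense) @ H + penalty * (1.0 - mask) * W
        # Preconditioned heavy-ball step with the approximate inverse Hessian.
        velocity = momentum * velocity - grad @ H_inv_approx
        W = W + velocity
    return W * mask  # final hard projection back onto the pruned support


# Illustrative usage with random data (shapes only; not real calibration data).
rng = np.random.default_rng(0)
X = rng.standard_normal((512, 256))
W_dense = rng.standard_normal((64, 256))
W_sparse = W_dense * (rng.random((64, 256)) > 0.7)  # roughly 70% sparsity
W_refined = refine_pruned_weights(W_dense, W_sparse, X)
```

In EIWR these ideas are applied per transformer layer on calibration activations; the sketch only illustrates the structure of one refinement loop, not the paper's exact update rules.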
Authors
Context
- Venue
- AAAI Conference on Artificial Intelligence