Author name cluster

Yinglong Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

1 paper
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

Efficient Plug-and-Play Weight Refinement for Sparse Large Models

  • Jingcheng Xie
  • Yinda Chen
  • Xiaoyu Liu
  • Yinglong Li
  • Haoyuan Shi
  • Zhiwei Xiong

One-shot pruning efficiently compresses Large Language Models but produces coarse sparse weights, causing significant performance degradation. Traditional fine-tuning approaches to refine these weights are prohibitively expensive for large models. This highlights the need for a training-free weight refinement method that works seamlessly with one-shot pruning and can efficiently recover the lost performance. To tackle this problem, we propose Efficient Iterative Weight Refinement (EIWR), a lightweight, plug-and-play, and training-free method that refines pruned weights through layer-wise iterative optimization. EIWR achieves efficient weight refinement via three key components: a Global Soft Constraint that eliminates costly row-wise Hessian inversions and expands the solution space; a Historical Momentum Strategy that leverages one-shot pruning priors to accelerate convergence and enhance final performance; and Neumann Series Extrapolation that significantly speeds up per-iteration computation. As a result, EIWR enables effective weight refinement with minimal time and memory overhead. Extensive experiments on LLaMA2/3 and Qwen under different pruning strategies and sparsity levels demonstrate that our method can efficiently refine sparse weights and mitigate performance degradation. For example, on LLaMA2-7B under 70 percent sparsity, EIWR reduces perplexity by 15 percent compared with SparseGPT on the WikiText2 benchmark, with only 1.81 additional minutes of computation and 1GB of additional memory.
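The abstract names Neumann series extrapolation as the component that speeds up per-iteration computation, but it does not say how EIWR applies it. As a generic illustration only (not the authors' algorithm), the sketch below shows the underlying idea: when the spectral radius of a matrix A is below 1, the inverse (I - A)^{-1} can be approximated by the truncated sum I + A + A^2 + ..., so a linear system can be solved with repeated matrix-vector products instead of an explicit inversion. The function names `matvec` and `neumann_solve`, the toy matrix, and the number of terms are all assumptions made for this example.

```python
def matvec(A, x):
    """Multiply a dense matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def neumann_solve(A, b, num_terms=50):
    """Approximate x = (I - A)^{-1} b with a truncated Neumann series.

    Computes b + A b + A^2 b + ... + A^(num_terms - 1) b.
    The series converges only if the spectral radius of A is below 1.
    """
    x = list(b)     # running partial sum, starts with the k = 0 term
    term = list(b)  # current term A^k b
    for _ in range(num_terms - 1):
        term = matvec(A, term)                        # next term A^(k+1) b
        x = [x_i + t_i for x_i, t_i in zip(x, term)]  # accumulate the sum
    return x


# Toy system, values chosen for illustration only.
A = [[0.2, 0.1],
     [0.0, 0.3]]
b = [1.0, 2.0]
x = neumann_solve(A, b)

# Residual check: (I - A) x should be close to b.
Ax = matvec(A, x)
residual = [x_i - ax_i - b_i for x_i, ax_i, b_i in zip(x, Ax, b)]
```

Each extra term costs one matrix-vector product, so truncating the series early trades accuracy for speed. How EIWR chooses the matrix, the truncation point, or the extrapolation step is not stated in the abstract and is not modeled here.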