Enhancing Large Language Model Performance with Gradient-Based Parameter Selection

Haoling Li; Xin Zhang; Xiao Liu; Yeyun Gong; Yifan Wang; Qi Chen; Peng Cheng

doi:10.1609/aaai.v39i23.34621

AAAI 2025

Conference Paper AAAI Technical Track on Natural Language Processing II Artificial Intelligence

Abstract

Large language models (LLMs) have revolutionized numerous fields of research, driving significant advancements in natural language processing, machine translation, and beyond. Although their large parameter counts contribute substantially to this success, existing studies indicate that not all model parameters are equally important, which leads to redundancy during parameter updates. Recent approaches to reducing redundant parameter updates in LLMs either ignore task-specific data, potentially leading to suboptimal model performance, or discard transformer components or insignificant parameters, which limits the model's scalability across tasks and may compromise the LLM structure. To address these issues and further enhance the performance of LLMs, we propose Gradient-Mask Tuning (GMT), a method that selectively updates parameters based on gradient information specific to the target task. Specifically, after computing gradients during backpropagation, we measure their absolute values and mask those with small absolute values. Empirical results across training paradigms such as SFT and DPO, and across various task domains, demonstrate that GMT not only preserves the original network structure but also enhances the potential performance of LLMs. Further analysis indicates that GMT is insensitive to the mask ratio and has computational efficiency comparable to that of vanilla training.
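
The masking step described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract only states that gradients with small absolute values are masked, so the function names (`mask_small_gradients`, `sgd_step`), the interpretation of the mask ratio as the fraction of entries zeroed, the flat list representation of parameters, and the plain SGD update are all assumptions made for illustration.

```python
from typing import List


def mask_small_gradients(grads: List[float], mask_ratio: float) -> List[float]:
    """Zero out the `mask_ratio` fraction of entries with the smallest |g|.

    Assumption: the mask ratio is the fraction of gradient entries dropped.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    num_masked = int(len(grads) * mask_ratio)
    # Indices in ascending order of absolute gradient; the first num_masked are masked.
    order = sorted(range(len(grads)), key=lambda i: abs(grads[i]))
    masked = set(order[:num_masked])
    return [0.0 if i in masked else g for i, g in enumerate(grads)]


def sgd_step(params: List[float], grads: List[float],
             lr: float, mask_ratio: float) -> List[float]:
    """Apply one SGD update using masked gradients (optimizer is an assumption)."""
    kept = mask_small_gradients(grads, mask_ratio)
    # Parameters whose gradients were masked stay unchanged in this step.
    return [p - lr * g for p, g in zip(params, kept)]


# Toy example: four parameters, half of the gradient entries masked.
params = [1.0, 1.0, 1.0, 1.0]
grads = [0.5, -0.01, 0.02, -2.0]
masked_grads = mask_small_gradients(grads, mask_ratio=0.5)
updated = sgd_step(params, grads, lr=0.1, mask_ratio=0.5)
```

In this toy example the two entries with the smallest magnitude (-0.01 and 0.02) are zeroed, so only the first and last parameters move. No parameter is removed from the model, which is consistent with the abstract's claim that the network structure is preserved; only the update for a given step is sparsified.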
