A Survey on Model Compression and Acceleration for Pretrained Language Models

Canwen Xu; Julian McAuley

doi:10.1609/aaai.v37i9.26255

AAAI 2023

Conference Paper AAAI Technical Track on Machine Learning IV Artificial Intelligence

Abstract

Although Transformer-based pretrained language models (PLMs) achieve state-of-the-art performance on many NLP tasks, their high energy cost and long inference delay prevent broader adoption, including in edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time and carbon emissions across the entire life cycle of NLP, including data preparation, model training and inference. In this survey, we focus on the inference stage and review the current state of model compression and acceleration for pretrained language models, including benchmarks, metrics and methodology.

Keywords

ML: Learning on the Edge & Model Compression
SNLP: Language Models

Context

Venue: AAAI Conference on Artificial Intelligence
Archive span: 1980-2026
Indexed papers: 28718
Paper id: 1002874270507774917