Harmonious Coexistence of Structured Weight Pruning and Ternarization for Deep Neural Networks

Li Yang; Zhezhi He; Deliang Fan


AAAI 2020

Conference Paper AAAI Technical Track: Machine Learning Artificial Intelligence


Abstract

Deep convolutional neural networks (DNNs) have demonstrated phenomenal success and are widely used in many computer vision tasks. However, their enormous model size and high computing complexity prohibit wide deployment on resource-limited embedded systems, such as FPGA and mGPU. As the two most widely adopted model compression techniques, weight pruning and quantization compress a DNN model by introducing weight sparsity (i.e., forcing a portion of the weights to zero) and quantizing weights into limited bit-width values, respectively. Although there are works attempting to combine weight pruning and quantization, we still observe disharmony between the two, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification while taking the architecture of the PE into account. In addition, we integrate it with an optimized weight ternarization approach that quantizes weights into ternary values ({−1, 0, +1}), thus converting the dominant convolution operations in DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization by proposing a Weight Penalty Clipping (WPC) technique with a self-adapting threshold. Our experiments show that the fusion of our proposed techniques achieves a state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation for ResNet-18 on the ImageNet dataset.
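
The two arithmetic claims in the abstract (ternary weights turn MAC into addition-only, and 32-bit to 2-bit storage gives a 16× reduction) can be illustrated with a minimal sketch. This is not the authors' optimized ternarization or their WPC technique: the fixed threshold of 0.3, the sample values, and the function names `ternarize`, `ternary_dot`, and `compression_ratio` are all assumptions chosen for illustration.

```python
def ternarize(weights, threshold):
    """Map each weight to -1, 0, or +1 using a symmetric threshold.

    The threshold is a hypothetical fixed value here; the paper's
    method for choosing it is not described in the abstract.
    """
    return [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]


def ternary_dot(ternary_weights, activations):
    """Dot product with ternary weights using only additions and subtractions."""
    total = 0.0
    for t, a in zip(ternary_weights, activations):
        if t == 1:
            total += a      # weight +1: add the activation
        elif t == -1:
            total -= a      # weight -1: subtract the activation
        # weight 0: the term is skipped entirely (this is the sparsity)
    return total


def compression_ratio(original_bits=32, ternary_bits=2):
    """Storage reduction from ternarization alone, ignoring any pruning gain."""
    return original_bits / ternary_bits


# Illustrative values only, not taken from the paper.
weights = [0.8, -0.05, -0.6, 0.1]
activations = [1.0, 2.0, 3.0, 4.0]

ternary = ternarize(weights, threshold=0.3)
result = ternary_dot(ternary, activations)
ratio = compression_ratio()
```

With these sample values, two of the four weights fall below the threshold and become zero, so the dot product reduces to one addition and one subtraction. The 16× figure covers ternarization alone; the ∼21× rate reported in the abstract additionally reflects PE-wise structured pruning.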
