
AAAI 2021

An Efficient Transformer Decoder with Compressed Sub-layers

Conference Paper: AAAI Technical Track on Speech and Natural Language Processing II

Abstract

The large attention-based encoder-decoder network (Transformer) has recently become prevalent due to its effectiveness. But the high computational complexity of its decoder raises an efficiency issue. By examining the mathematical formulation of the decoder, we show that under some mild conditions the architecture can be simplified by compressing its sub-layers, the basic building blocks of the Transformer, achieving higher parallelism. We thereby propose the Compressed Attention Network, whose decoder layer consists of only one sub-layer instead of three. Extensive experiments on 14 WMT machine translation tasks show that our model is 1.42× faster with performance on par with a strong baseline. This strong baseline is already 2× faster than the widely used standard baseline with no loss in performance.
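To make the idea concrete, below is a minimal PyTorch sketch contrasting the standard three-sub-layer decoder layer with a hypothetical one-sub-layer variant. The fusion scheme shown (running self- and cross-attention in parallel on the same input and merging them with a single projection) is an illustrative assumption, not necessarily the paper's exact compression; the module names and the concatenation-based fusion are invented for this sketch.

# Illustrative sketch only: the CompressedDecoderLayer fusion below is an
# assumption about how three sub-layers might collapse into one, not the
# paper's actual construction.
import torch
import torch.nn as nn


class StandardDecoderLayer(nn.Module):
    """Baseline decoder layer: self-attention, cross-attention, FFN,
    applied sequentially with residual connections (three sub-layers)."""

    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, memory, tgt_mask=None):
        # Three sequential sub-layers: each must wait for the previous one.
        x = self.norms[0](x + self.self_attn(x, x, x, attn_mask=tgt_mask,
                                             need_weights=False)[0])
        x = self.norms[1](x + self.cross_attn(x, memory, memory,
                                              need_weights=False)[0])
        return self.norms[2](x + self.ffn(x))


class CompressedDecoderLayer(nn.Module):
    """Hypothetical single-sub-layer variant: self- and cross-attention
    read the same input, so they can run in parallel; one projection
    fuses their outputs, replacing the three sequential sub-layers."""

    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * d_model, d_model)  # assumed fusion step
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, memory, tgt_mask=None):
        # No sequential dependency between the two attention calls.
        a = self.self_attn(x, x, x, attn_mask=tgt_mask, need_weights=False)[0]
        b = self.cross_attn(x, memory, memory, need_weights=False)[0]
        return self.norm(x + self.fuse(torch.cat([a, b], dim=-1)))


if __name__ == "__main__":
    x = torch.randn(2, 7, 512)       # decoder input: (batch, tgt_len, d_model)
    memory = torch.randn(2, 9, 512)  # encoder output: (batch, src_len, d_model)
    print(StandardDecoderLayer()(x, memory).shape)    # torch.Size([2, 7, 512])
    print(CompressedDecoderLayer()(x, memory).shape)  # torch.Size([2, 7, 512])

Because the two attention calls in the compressed layer have no sequential dependency, they can execute concurrently, which is one plausible source of the higher parallelism and decoding speedup the abstract describes.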

