ICML 2021
Poolingformer: Long Document Modeling with Pooling Attention
Abstract
In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
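The abstract only sketches the two-level mechanism at a high level. Below is a minimal, self-contained NumPy sketch of that pattern, assuming single-head attention, toy window sizes (w1, w2), a pooling stride p, and a simple averaging of the two levels; these choices are illustrative assumptions, not the authors' implementation or hyperparameters.

```python
# Illustrative sketch of two-level (sliding window + pooling) attention.
# Assumptions (not from the paper): single head, w1=4, w2=16, stride p=4,
# and the two levels are combined by a plain average.
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def two_level_attention(q, k, v, w1=4, w2=16, p=4):
    """q, k, v: (seq_len, d). Level 1 attends to a +/-w1 window of raw
    tokens; level 2 attends to a +/-w2 window whose keys/values are first
    mean-pooled with kernel/stride p, cutting that cost by roughly p."""
    n, d = q.shape
    assert n >= p, "sequence must be at least one pooling window long"
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    for i in range(n):
        # Level 1: local sliding-window attention over raw keys/values.
        lo1, hi1 = max(0, i - w1), min(n, i + w1 + 1)
        s1 = softmax(q[i] @ k[lo1:hi1].T * scale)
        level1 = s1 @ v[lo1:hi1]

        # Level 2: wider receptive field, but keys/values are pooled first.
        lo2, hi2 = max(0, i - w2), min(n, i + w2 + 1)
        k2, v2 = k[lo2:hi2], v[lo2:hi2]
        m = (len(k2) // p) * p  # drop the ragged tail for simplicity
        k2 = k2[:m].reshape(-1, p, d).mean(axis=1)
        v2 = v2[:m].reshape(-1, p, d).mean(axis=1)
        s2 = softmax(q[i] @ k2.T * scale)
        level2 = s2 @ v2

        # Combine the two levels (simple average here; the model itself
        # composes them with learned components, e.g. feeding level-1
        # output into level 2).
        out[i] = 0.5 * (level1 + level2)
    return out

# Tiny usage example with a random sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8)).astype(np.float32)
print(two_level_attention(x, x, x).shape)  # (32, 8)
```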
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: International Conference on Machine Learning
- Archive span: 1993-2025
- Indexed papers: 16471
- Paper id: 11457205654503474