ICML 2021
Poolingformer: Long Document Modeling with Pooling Attention
Abstract
In this paper, we introduce a two-level attention schema, Poolingformer, for long document modeling. Its first level uses a smaller sliding window pattern to aggregate information from neighbors. Its second level employs a larger window to increase receptive fields with pooling attention to reduce both computational cost and memory consumption. We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA. Experimental results show that Poolingformer sits atop three official leaderboards measured by F1, outperforming previous state-of-the-art models by 1.9 points (79.8 vs. 77.9) on NQ long answer, 1.9 points (79.5 vs. 77.6) on TyDi QA passage answer, and 1.6 points (67.6 vs. 66.0) on TyDi QA minimal answer. We further evaluate Poolingformer on a long sequence summarization task. Experimental results on the arXiv benchmark continue to demonstrate its superior performance.
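The abstract only sketches the two-level mechanism at a high level. Below is a minimal, self-contained NumPy sketch of that pattern, assuming single-head attention, toy window sizes (w1, w2), a pooling stride p, and a simple averaging of the two levels; these choices are illustrative assumptions, not the authors' implementation or hyperparameters.

```python
# Illustrative sketch of two-level (sliding window + pooling) attention.
# Assumptions (not from the paper): single head, w1=4, w2=16, stride p=4,
# and the two levels are combined by a plain average.
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def two_level_attention(q, k, v, w1=4, w2=16, p=4):
    """q, k, v: (seq_len, d). Level 1 attends to a +/-w1 window of raw
    tokens; level 2 attends to a +/-w2 window whose keys/values are first
    mean-pooled with kernel/stride p, cutting that cost by roughly p."""
    n, d = q.shape
    assert n >= p, "sequence must be at least one pooling window long"
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(q)
    for i in range(n):
        # Level 1: local sliding-window attention over raw keys/values.
        lo1, hi1 = max(0, i - w1), min(n, i + w1 + 1)
        s1 = softmax(q[i] @ k[lo1:hi1].T * scale)
        level1 = s1 @ v[lo1:hi1]

        # Level 2: wider receptive field, but keys/values are pooled first.
        lo2, hi2 = max(0, i - w2), min(n, i + w2 + 1)
        k2, v2 = k[lo2:hi2], v[lo2:hi2]
        m = (len(k2) // p) * p  # drop the ragged tail for simplicity
        k2 = k2[:m].reshape(-1, p, d).mean(axis=1)
        v2 = v2[:m].reshape(-1, p, d).mean(axis=1)
        s2 = softmax(q[i] @ k2.T * scale)
        level2 = s2 @ v2

        # Combine the two levels (simple average here; the model itself
        # composes them with learned components, e.g. feeding level-1
        # output into level 2).
        out[i] = 0.5 * (level1 + level2)
    return out

# Tiny usage example with a random sequence.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8)).astype(np.float32)
print(two_level_attention(x, x, x).shape)  # (32, 8)
```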
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: International Conference on Machine Learning
- Archive span: 1993-2025
- Indexed papers: 16471
- Paper id: 11457205654503474