AAAI 2014
Detecting Information-Dense Texts in Multiple News Domains
Abstract
We introduce the task of identifying information-dense texts, which report important factual information in direct, succinct manner. We describe a procedure that allows us to label automatically a large training corpus of New York Times texts. We train a classifier based on lexical, discourse and unlexicalized syntactic features and test its performance on a set of manually annotated articles from business, U. S. international relations, sports and science domains. Our results indicate that the task is feasible and that both syntactic and lexical features are highly predictive for the distinction. We observe considerable variation of prediction accuracy across domains and find that domain-specific models are more accurate.
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue
- AAAI Conference on Artificial Intelligence
- Archive span
- 1980-2026
- Indexed papers
- 28718
- Paper id
- 202133286719233434