Feature Generation for Sequence Categorization

Daniel Kudenko

AAAI 1998

Conference Paper Learning from Sequences Artificial Intelligence

Abstract

The problem of sequence categorization is to generalize from a corpus of labeled sequences procedures for accurately labeling future unlabeled sequences. The choice of representation of sequences can have a major impact on this task, and in the absence of background knowledge a good representation is often not known, and straightforward representations are often far from optimal. We propose a feature generation method (called FGEN) that creates Boolean features that check for the presence or absence of heuristically selected collections of subsequences. We show empirically that the representation computed by FGEN improves the accuracy of two commonly used learning systems (C4.5 and Ripper) when the new features are added to existing representations of sequence data. We show the superiority of FGEN across a range of tasks selected from three domains: DNA sequences, Unix command sequences, and English text.
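To make the kind of feature described in the abstract concrete, the following is a minimal illustrative sketch, not FGEN itself. It assumes that a "subsequence" is a contiguous run of elements and that a feature is true when every subsequence in its collection occurs in the sequence; the abstract states neither detail, and the heuristic that selects the collections is not shown here. The names `contains`, `boolean_feature`, and `augment` are hypothetical.

```python
from typing import List, Sequence


def contains(seq: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` occurs as a contiguous run in `seq`.

    Assumption: subsequences are contiguous; the abstract does not say.
    """
    n, m = len(seq), len(pattern)
    target = tuple(pattern)
    return any(tuple(seq[i:i + m]) == target for i in range(n - m + 1))


def boolean_feature(seq: Sequence, collection: Sequence[Sequence]) -> bool:
    """Hypothetical feature: True if every pattern in `collection` is present.

    Assumption: a collection is combined by conjunction.
    """
    return all(contains(seq, pattern) for pattern in collection)


def augment(base_features: Sequence, seq: Sequence,
            collections: Sequence[Sequence[Sequence]]) -> List:
    """Append one Boolean feature per collection to an existing representation."""
    return list(base_features) + [boolean_feature(seq, c) for c in collections]


if __name__ == "__main__":
    dna = "ACGTAC"
    commands = ["ls", "cd", "ls", "vi"]
    print(boolean_feature(dna, ["AC", "GT"]))
    print(boolean_feature(commands, [("cd", "ls")]))
    print(augment([1, 0], dna, [["AC"], ["TT"]]))
```

The augmented vectors could then be passed to any learner that accepts Boolean attributes. Which collections to test is the part the abstract attributes to heuristic selection, and this sketch leaves it to the caller.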

Authors

Daniel Kudenko University of York

Context

Venue: AAAI Conference on Artificial Intelligence
Paper id: 1059275229990596559