AAAI 1999
Feature Selection in SVM Text Categorization
Abstract
This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the two filtering methods on SVMs as well as a decision tree algorithm C4.5. The SVMs’ results common to both filtering are that 1) the optimal number of features differed completely across categories, and 2) the average performance for all categories was best when all of the words were used. In addition, a comparison of the two filtering methods clarified that POS filtering on SVMs consistently outperformed MI filtering, which indicates that SVMs cannot find irrelevant parts of speech. These results suggest a simple strategy for the SVM text categorization: use a full number of words found through a rough filtering technique like part-of-speech tagging.
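As an illustration of the MI filtering idea mentioned in the abstract, the following minimal sketch ranks words by the mutual information between a word's presence in a document and a binary category label, then keeps the top k. The abstract does not give the paper's exact MI formulation, so the binary presence/absence variant used here is an assumption, and the function names `mutual_information` and `top_k_terms` are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term):
    """MI in bits between presence of `term` and a binary category label.

    Assumption: each document is a set of tokens and each label is 0 or 1.
    """
    n = len(docs)
    # Joint counts over (term present?, label).
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = sum(v for (t, _), v in joint.items() if t == present) / n
        p_label = sum(v for (_, c), v in joint.items() if c == label) / n
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi

def top_k_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest MI score."""
    vocab = sorted({w for doc in docs for w in doc})
    ranked = sorted(vocab,
                    key=lambda w: mutual_information(docs, labels, w),
                    reverse=True)
    return ranked[:k]

# Toy corpus (invented for illustration only).
docs = [{"wheat", "price"}, {"wheat", "farm"},
        {"oil", "price"}, {"oil", "barrel"}]
labels = [1, 1, 0, 0]
selected = top_k_terms(docs, labels, k=2)
```

Increasing `k` step by step up to the full vocabulary size corresponds to the gradual enlargement of the input space described above; the selected words would then define the feature vectors passed to a classifier.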
Context
- Venue
- AAAI Conference on Artificial Intelligence
- Paper id
- 604124472306988825