AAAI 1999

Feature Selection in SVM Text Categorization

Conference Paper Natural Language and Information Retrieval Artificial Intelligence

Abstract

This paper investigates the effect of prior feature selection in Support Vector Machine (SVM) text categorization. The input space was gradually increased by using mutual information (MI) filtering and part-of-speech (POS) filtering, which determine the portion of words that are appropriate for learning from the information-theoretic and the linguistic perspectives, respectively. We tested the two filtering methods on SVMs as well as a decision tree algorithm C4.5. The SVMs’ results common to both filtering are that 1) the optimal number of features differed completely across categories, and 2) the average performance for all categories was best when all of the words were used. In addition, a comparison of the two filtering methods clarified that POS filtering on SVMs consistently outperformed MI filtering, which indicates that SVMs cannot find irrelevant parts of speech. These results suggest a simple strategy for the SVM text categorization: use a full number of words found through a rough filtering technique like part-of-speech tagging.
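
As a rough illustration of the MI filtering step described above, the sketch below ranks words by the mutual information between a word's presence in a document and a binary category label, then keeps the top k. The abstract does not give the paper's exact MI formulation, so the standard binary-occurrence definition is assumed here; the function names (`mutual_information`, `select_top_k`) and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """MI in bits between presence of `term` and a binary category label.

    Assumes each document is a set of words and each label is 0 or 1.
    """
    n = len(docs)
    # Joint counts over (term present?, document in category?)
    joint = Counter((term in doc, bool(label)) for doc, label in zip(docs, labels))
    mi = 0.0
    for (t, c), count in joint.items():
        p_tc = count / n
        p_t = sum(v for (tt, _), v in joint.items() if tt == t) / n
        p_c = sum(v for (_, cc), v in joint.items() if cc == c) / n
        mi += p_tc * math.log2(p_tc / (p_t * p_c))
    return mi


def select_top_k(docs, labels, k):
    """Return the k words with the highest MI, ties broken alphabetically."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]


# Toy corpus (hypothetical): two documents in the category, two outside it.
docs = [
    {"wheat", "price", "the"},
    {"wheat", "export", "the"},
    {"bank", "rate", "the"},
    {"bank", "price", "the"},
]
labels = [1, 1, 0, 0]

top = select_top_k(docs, labels, 2)
print(top)
```

Gradually increasing k from a small value up to the full vocabulary size corresponds to enlarging the input space as the abstract describes; the selected words would then define the feature vectors passed to the classifier.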

Context

Venue
AAAI Conference on Artificial Intelligence