Relation Extraction from Wikipedia Using Subtree Mining

Dat P.T. Nguyen

AAAI 2007

Conference Paper; Special Track on Artificial Intelligence and the Web; Artificial Intelligence

Abstract

The exponential growth and reliability of Wikipedia have made it a promising data source for intelligent systems. The first challenge in using Wikipedia is to make the encyclopedia machine-processable. In this study, we address the problem of extracting relations among entities from Wikipedia's English articles; the extracted relations can in turn help intelligent systems satisfy users' information needs. Our proposed method first anchors the appearances of entities in Wikipedia articles using heuristic rules made possible by the articles' encyclopedic style. It therefore uses neither a Named Entity Recognizer (NER) nor a coreference resolution tool, both of which are sources of errors in relation extraction. It then classifies the relationships between entity pairs using an SVM with features extracted from the web structure and from subtrees mined from the syntactic structure of the text. The main innovations of our work are: (a) our method exploits characteristics of Wikipedia for entity allocation and entity classification, which are essential for relation extraction; (b) our algorithm extracts a core tree that accurately reflects the relationship between a given entity pair, and then identifies key features of that relationship from the core tree. We demonstrate the effectiveness of our approach through an evaluation on manually annotated data from actual Wikipedia articles.
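To make the idea of a "core tree" more concrete, the sketch below computes the minimal subtree of a dependency parse that connects two entity mentions (the union of each entity's path up to their lowest common ancestor) and serializes it in bracket notation, a form that a subtree miner or an SVM feature extractor could consume. This is an illustrative approximation under stated assumptions, not the authors' algorithm: the paper's exact core-tree definition, parser, and subtree-mining step may differ. The head-index tree representation, the placeholder labels `E1`/`E2`, and the function names `path_to_root`, `core_tree`, and `to_bracket` are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# ASSUMPTION: a dependency tree is given as {token_index: head_index},
# with None marking the root. This format is hypothetical, for illustration.
Heads = Dict[int, Optional[int]]


def path_to_root(heads: Heads, node: int) -> List[int]:
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while heads[node] is not None:
        node = heads[node]
        path.append(node)
    return path


def core_tree(heads: Heads, e1: int, e2: int) -> Tuple[Set[int], int]:
    """Nodes of the minimal subtree connecting e1 and e2, plus its root.

    This approximates a core tree as the two upward paths that meet at
    the entities' lowest common ancestor (LCA).
    """
    p1 = path_to_root(heads, e1)
    p2 = path_to_root(heads, e2)
    on_p2 = set(p2)
    # The first node on e1's upward path that also lies on e2's path is the LCA.
    lca = next(n for n in p1 if n in on_p2)
    nodes = set(p1[: p1.index(lca) + 1])
    nodes.update(p2[: p2.index(lca) + 1])
    return nodes, lca


def to_bracket(heads: Heads, labels: Dict[int, str], nodes: Set[int], root: int) -> str:
    """Serialize the subtree induced by `nodes` under `root` in bracket notation."""
    children = sorted(n for n in nodes if heads[n] == root)
    inner = "".join(to_bracket(heads, labels, nodes, c) for c in children)
    return f"({labels[root]}{inner})"


if __name__ == "__main__":
    # Toy parse of "He was born in Paris": "born" is the root.
    # E1 stands for the article's principal entity, E2 for a linked entity.
    heads: Heads = {0: 2, 1: 2, 2: None, 3: 2, 4: 3}
    labels = {0: "E1", 1: "was", 2: "born", 3: "in", 4: "E2"}
    nodes, root = core_tree(heads, 0, 4)
    print(to_bracket(heads, labels, nodes, root))  # (born(E1)(in(E2)))
```

Note that the auxiliary "was" is dropped because it does not lie on the path between the two entities; pruning such tokens is what lets features drawn from the connecting subtree focus on the words that express the relation.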

Authors

Dat P.T. Nguyen, Yutaka Matsuo

Context

Venue: AAAI Conference on Artificial Intelligence