
TIST 2011

Two-Word Collocation Extraction Using Monolingual Word Alignment Method

Journal Article · Artificial Intelligence · Intelligent Systems

Abstract

Statistical bilingual word alignment has been well studied in the field of machine translation. This article adapts the bilingual word alignment algorithm to a monolingual scenario in order to extract collocations from a monolingual corpus, based on the observation that the words in a collocation tend to co-occur in similar contexts, as aligned words do in bilingual word alignment. First, the monolingual corpus is replicated to generate a parallel corpus in which each sentence pair consists of two identical sentences. Next, the monolingual word alignment algorithm is employed to align potentially collocated words. Finally, the aligned word pairs are ranked according to their alignment scores, and the candidates with higher scores are extracted as collocations. We conducted experiments on both Chinese and English corpora. Compared to previous approaches that use association measures to extract collocations from co-occurring word pairs within a given window, our method achieves higher precision and recall. According to human evaluation, our method achieves a precision of 62% on a Chinese corpus and 64% on an English corpus. In particular, it can extract collocations with longer spans, achieving a higher precision of 83% on long-span (> 6 words) Chinese collocations.
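The three steps described in the abstract (replicate the corpus, align words, rank the aligned pairs) can be illustrated with a minimal Python sketch. It is not the article's implementation. It assumes a simplified IBM Model 1 style EM procedure in which each sentence is paired with itself and a word may not align to its own position, and it assumes a symmetric ranking score that averages the two directional probabilities. The names `train_monolingual_model1` and `rank_pairs` are hypothetical.

```python
from collections import defaultdict


def train_monolingual_model1(sentences, iterations=5):
    """Estimate t[(w, v)], the probability of word w given word v.

    Assumption: simplified IBM Model 1 EM where every sentence is
    aligned with an identical copy of itself, and position i may not
    align to position i (a word cannot align to itself).
    """
    # Uniform start: every co-occurring pair gets the same weight.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (w, v) alignments
        total = defaultdict(float)  # expected count of alignments to v
        for sent in sentences:
            for i, w in enumerate(sent):
                # Normalizer over all positions except the word's own.
                z = sum(t[(w, v)] for j, v in enumerate(sent) if j != i)
                if z == 0.0:
                    continue  # single-word sentence, nothing to align to
                for j, v in enumerate(sent):
                    if j == i:
                        continue
                    c = t[(w, v)] / z
                    count[(w, v)] += c
                    total[v] += c
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(
            float, {(w, v): c / total[v] for (w, v), c in count.items()}
        )
    return t


def rank_pairs(t, top_k=10):
    """Rank unordered word pairs by the mean of both directions.

    Assumption: the averaged score is one simple way to symmetrize
    the directional probabilities; it is used here for illustration.
    """
    scores = {}
    for (w, v) in list(t):
        if w < v:  # visit each unordered pair once, skip w == v
            scores[(w, v)] = 0.5 * (t[(w, v)] + t.get((v, w), 0.0))
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]
```

Calling `rank_pairs(train_monolingual_model1(corpus))` on a list of tokenized sentences returns the highest-scoring candidate pairs. Because the alignment considers every other position in the sentence rather than a fixed window, a sketch like this can in principle pair words that are far apart, which matches the long-span behavior the abstract reports.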

Context

Venue
ACM Transactions on Intelligent Systems and Technology