Two-Word Collocation Extraction Using Monolingual Word Alignment Method

Zhanyi Liu; Haifeng Wang; Hua Wu; Sheng Li

doi:10.1145/2036264.2036280

TIST 2011

Journal Article · Artificial Intelligence · Intelligent Systems

Abstract

Statistical bilingual word alignment has been well studied in machine translation. This article adapts the bilingual word alignment algorithm to a monolingual setting in order to extract collocations from a monolingual corpus, based on the observation that the words in a collocation tend to co-occur in similar contexts, much like aligned word pairs in bilingual word alignment. First, the monolingual corpus is replicated to produce a parallel corpus in which each sentence pair consists of two identical sentences. Next, a monolingual word alignment algorithm is applied to align potentially collocated words. Finally, the aligned word pairs are ranked by their alignment scores, and the highest-scoring candidates are extracted as collocations. We conducted experiments on Chinese and English corpora. Compared with previous approaches that apply association measures to co-occurring word pairs within a given window, our method achieves higher precision and recall. According to human evaluation, our method achieves a precision of 62% on the Chinese corpus and 64% on the English corpus. In particular, it can also extract collocations with longer spans, achieving an even higher precision of 83% on long-span Chinese collocations (spans of more than six words).
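
The three steps described above (replicate, align, rank) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the aligner is a simplified IBM Model 1 style EM chosen only for brevity (the article's own alignment model is not specified here), each word is barred from aligning to its own position in the copied sentence, no NULL word is modelled, and the ranking score (how often a pair is linked by the best alignment) is a hypothetical stand-in for the article's alignment scores. The function names `replicate`, `train_alignment`, and `extract_collocations` are illustrative.

```python
from collections import defaultdict
from itertools import chain


def replicate(corpus):
    """Step 1: pair every sentence with an identical copy of itself."""
    return [(list(sent), list(sent)) for sent in corpus]


def train_alignment(pairs, iterations=10):
    """Step 2 (assumed model): simplified IBM-Model-1-style EM.

    t[(w_src, w_tgt)] estimates P(w_tgt | w_src). Target position j may link
    to any source position except j itself; otherwise every word would
    trivially align to its own copy. No NULL word is modelled.
    """
    vocab = set(chain.from_iterable(src for src, _ in pairs))
    t = defaultdict(lambda: 1.0 / len(vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected link counts c(w_src, w_tgt)
        total = defaultdict(float)  # expected link counts per source word
        for src, tgt in pairs:
            if len(src) < 2:
                continue  # a one-word sentence has nothing else to align to
            for j, w_t in enumerate(tgt):
                cands = [w_s for i, w_s in enumerate(src) if i != j]
                norm = sum(t[(w_s, w_t)] for w_s in cands)
                for w_s in cands:
                    frac = t[(w_s, w_t)] / norm  # E-step: posterior link weight
                    count[(w_s, w_t)] += frac
                    total[w_s] += frac
        # M-step: renormalise expected counts into conditional probabilities.
        t = defaultdict(float, {k: c / total[k[0]] for k, c in count.items()})
    return t


def extract_collocations(pairs, t, top_k=10):
    """Step 3: best-align each sentence, then rank the linked word pairs.

    Hypothetical score: the number of times a pair is linked.
    """
    score = defaultdict(float)
    for src, tgt in pairs:
        if len(src) < 2:
            continue
        for j, w_t in enumerate(tgt):
            best = max((i for i in range(len(src)) if i != j),
                       key=lambda i: t[(src[i], w_t)])
            if src[best] != w_t:  # discard links between identical word types
                score[tuple(sorted((src[best], w_t)))] += 1.0
    return sorted(score.items(), key=lambda kv: -kv[1])[:top_k]
```

For example, `extract_collocations(replicate(corpus), train_alignment(replicate(corpus)))` returns `(word_pair, score)` tuples with the highest-scoring pairs first. A Model 1 style aligner is used here only because it fits in a few lines; on small corpora such models tend to let rare words attract many links, so output on toy data says nothing about the precision figures reported above.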