
AAAI 2018

Spectral Word Embedding with Negative Sampling

Conference Paper · Main Track: NLP and Machine Learning · Artificial Intelligence

Abstract

In this work, we investigate word embedding algorithms in the context of natural language processing. In particular, we examine the notion of “negative examples”, the unobserved or insignificant word-context co-occurrences, in spectral methods. We provide a new formulation of the word embedding problem by proposing a new, intuitive objective function that justifies the use of negative examples. Our algorithm not only learns from the important word-context co-occurrences, but also from the abundance of unobserved or insignificant co-occurrences, improving the distribution of words in the latent embedded space. We analyze the algorithm theoretically and provide an optimal solution to the problem using spectral analysis. We trained various word embedding algorithms on Wikipedia articles comprising 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm provides results as good as the state of the art, but in a much faster and more efficient way.
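The abstract does not give the paper's formulation, so the following is only a minimal, generic sketch of the broader family of techniques it builds on: a spectral (SVD-based) embedding of a shifted positive PMI word-context matrix, where the log(k) shift plays a role analogous to negative sampling (in the spirit of Levy and Goldberg, 2014). It is not the authors' algorithm; the toy corpus and the parameters `window`, `k_neg`, and `dim` are illustrative assumptions.

```python
# Sketch: spectral word embedding from a shifted PPMI matrix.
# NOT the paper's method; parameters and corpus are assumptions.
import numpy as np
from collections import Counter

corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs are animals".split(),
]

window = 2   # symmetric context window (assumption)
k_neg = 5    # negative-sampling shift: log(k_neg) subtracted from PMI (assumption)
dim = 2      # embedding dimensionality (assumption)

# Count word-context co-occurrences within the window.
pair_counts, word_counts = Counter(), Counter()
for sent in corpus:
    for i, w in enumerate(sent):
        word_counts[w] += 1
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if i != j:
                pair_counts[(w, sent[j])] += 1

vocab = sorted(word_counts)
idx = {w: i for i, w in enumerate(vocab)}
total_pairs = sum(pair_counts.values())
total_words = sum(word_counts.values())

# Shifted positive PMI matrix: max(PMI(w, c) - log(k_neg), 0).
M = np.zeros((len(vocab), len(vocab)))
for (w, c), n_wc in pair_counts.items():
    pmi = np.log(n_wc * total_words**2 /
                 (total_pairs * word_counts[w] * word_counts[c]))
    M[idx[w], idx[c]] = max(pmi - np.log(k_neg), 0.0)

# Spectral step: truncated SVD of M yields the word embeddings.
U, S, Vt = np.linalg.svd(M, full_matrices=False)
embeddings = U[:, :dim] * np.sqrt(S[:dim])  # split singular values symmetrically

print({w: embeddings[idx[w]].round(3) for w in vocab[:5]})
```

In this family of methods, unobserved co-occurrences enter only through the shift and truncation; the paper's contribution, per the abstract, is an objective that incorporates such negative examples directly and still admits a closed-form spectral solution.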

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
355681402688735280