Author name cluster

Vinith Misra

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
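
The grouping rule above amounts to nothing more than a normalization key. A hypothetical Python sketch (the function name and normalization are assumptions, not Arrow's actual code):

    def cluster_key(author_name: str) -> str:
        # Bucket papers by case-insensitive exact name match only;
        # no further identity disambiguation is performed.
        return author_name.strip().casefold()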

2 papers
1 author row

Possible papers (2)

AAAI 2020 Conference Paper

Simplify-Then-Translate: Automatic Preprocessing for Black-Box Translation

  • Sneha Mehta
  • Bahareh Azarnoush
  • Boris Chen
  • Avneesh Saluja
  • Vinith Misra
  • Ballav Bihani
  • Ritwik Kumar

Black-box machine translation systems have proven incredibly useful for a variety of applications, yet by design they are hard to adapt, tune to a specific domain, or build on top of. In this work, we introduce a method to improve such systems via automatic pre-processing (APP) using sentence simplification. We first propose a method to automatically generate a large in-domain paraphrase corpus through back-translation with a black-box MT system; this corpus is used to train a paraphrase model that “simplifies” the original sentence into a form more conducive to translation. The model is then used to preprocess source sentences of multiple low-resource language pairs. We show that this preprocessing leads to better translation performance than translating the non-preprocessed source sentences. We further perform a side-by-side human evaluation to verify that translations of the simplified sentences are better than those of the originals. Finally, we provide guidance on which language pairs are best suited for generating the simplification-model corpora: we investigate the relationship between the ease of translation of a language pair (as measured by BLEU) and the quality of the simplification model trained on back-translations of that pair (as measured by SARI), and tie this into the downstream task of low-resource translation.
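
The back-translation recipe described here can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the paper's implementation: `translate` is a hypothetical stand-in for any black-box MT API, and the pivot language is an arbitrary choice.

    from typing import Callable, Iterable, List, Tuple

    def build_paraphrase_corpus(
        sentences: Iterable[str],
        translate: Callable[[str, str, str], str],  # (text, src_lang, tgt_lang) -> text
        pivot_lang: str = "fr",
    ) -> List[Tuple[str, str]]:
        """Round-trip each sentence through the black-box MT system.

        Back-translations tend to be simpler, more MT-friendly
        paraphrases, so the (original, back-translation) pairs can
        train a "simplification" model for preprocessing.
        """
        pairs = []
        for original in sentences:
            pivot = translate(original, "en", pivot_lang)    # forward pass
            simplified = translate(pivot, pivot_lang, "en")  # back-translation
            pairs.append((original, simplified))
        return pairs

At inference time it is the trained paraphrase model, not the round trip itself, that rewrites each source sentence before it is sent to the black-box system.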

AAAI 2018 Conference Paper

Bernoulli Embeddings for Graphs

  • Vinith Misra
  • Sumit Bhatia

Just as semantic hashing (Salakhutdinov and Hinton 2009) can accelerate information retrieval, binary-valued embeddings can significantly reduce latency in the retrieval of graphical data. We introduce a simple but effective model for learning such binary vectors for nodes in a graph. By treating the embedding bits as independent coin flips of varying bias, we can apply continuous optimization techniques to an approximation of the expected loss. Embeddings optimized in this fashion consistently outperform the quantization of both spectral graph embeddings and various learned real-valued embeddings, on both ranking and pre-ranking tasks across a variety of datasets.
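
The “coin flip” relaxation has a compact concrete form. The numpy sketch below is illustrative, under assumptions of ours rather than the paper's exact objective: each bit's bias is the sigmoid of a learnable logit, and the margin-based loss on expected Hamming distance is one plausible choice.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def expected_hamming(p_u, p_v):
        # For independent Bernoulli bits, E[bit_u XOR bit_v] is
        # p_u(1 - p_v) + p_v(1 - p_u); sum over the dimensions.
        return np.sum(p_u * (1.0 - p_v) + p_v * (1.0 - p_u))

    rng = np.random.default_rng(0)
    n_nodes, n_bits = 100, 16
    logits = rng.normal(size=(n_nodes, n_bits))  # learnable parameters

    def pairwise_loss(edges, non_edges, margin=4.0):
        # Pull linked nodes together and push unlinked nodes at least
        # `margin` expected bit-flips apart (hinge on expected distance).
        p = sigmoid(logits)
        pull = sum(expected_hamming(p[u], p[v]) for u, v in edges)
        push = sum(max(0.0, margin - expected_hamming(p[u], p[v]))
                   for u, v in non_edges)
        return pull + push

    # After optimizing `logits` (e.g., by SGD on the loss above),
    # round each bias to obtain the binary codes used for retrieval.
    codes = (sigmoid(logits) > 0.5).astype(np.uint8)

Because the expected loss is differentiable in the logits, standard gradient methods apply even though the final codes are discrete, which is the point of the relaxation.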