Author name cluster

Eiichiro Sumita

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers (14)

AAAI 2023 · Conference Paper

Language Model Pre-training on True Negatives

  • Zhuosheng Zhang
  • Hai Zhao
  • Masao Utiyama
  • Eiichiro Sumita

Discriminative pre-trained language models (PrLMs) learn to predict original texts from intentionally corrupted ones. Taking the former text as positive and the latter as negative samples, the PrLM can be trained effectively for contextualized representation. However, the training of such PrLMs relies heavily on the quality of the automatically constructed samples. Existing PrLMs simply treat all corrupted texts as equally negative without any examination, so the resulting model inevitably suffers from a false negative issue: training is carried out on pseudo-negative data, which reduces both the efficiency and the robustness of the resulting PrLMs. In this work, after defining the long-overlooked false negative issue in discriminative PrLMs, we design enhanced pre-training methods that counteract false negative predictions and encourage pre-training language models on true negatives by correcting the harmful gradient updates caused by false negative predictions. Experimental results on the GLUE and SQuAD benchmarks show that our counter-false-negative pre-training methods indeed bring about better performance together with stronger robustness.
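
As a concrete illustration of the gradient-correction idea, here is a minimal PyTorch sketch of an ELECTRA-style replaced-token-detection loss in which positions whose sampled "replacement" equals the original token are relabeled as positives, so no harmful gradient treats them as negatives. The function name and binary-label setup are illustrative, and the paper's correction also covers subtler false negatives than identical tokens.

    import torch.nn.functional as F

    def rtd_loss_true_negatives(logits, original_ids, corrupted_ids):
        # logits: (batch, seq, 2) discriminator scores; ids: (batch, seq)
        replaced = corrupted_ids != original_ids   # true negatives only
        labels = replaced.long()                   # 1 = replaced, 0 = original
        # A sampled "replacement" equal to the original token is a false
        # negative; relabeling it 0 removes the harmful gradient that would
        # otherwise push the model to call an original token "replaced".
        return F.cross_entropy(logits.reshape(-1, 2), labels.reshape(-1))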

IJCAI 2022 · Conference Paper

Effective Graph Context Representation for Document-level Machine Translation

  • Kehai Chen
  • Muyun Yang
  • Masao Utiyama
  • Eiichiro Sumita
  • Rui Wang
  • Min Zhang

Document-level neural machine translation (DocNMT) typically encodes several local sentences or the entire document uniformly. Thus, DocNMT does not consider the relevance of document-level contextual information; for example, some context (i.e., content words, logical order, and co-occurrence relations) is more effective than other, auxiliary context (i.e., function and auxiliary words). To address this issue, we first use word frequency information to recognize content words in the input document, and then use heuristic relations to summarize content words and sentences as a graph structure, without relying on external syntactic knowledge. Furthermore, we apply graph attention networks to this graph structure to learn its feature representation, which allows DocNMT to capture the document-level context more effectively. Experimental results on several widely used document-level benchmarks demonstrate the effectiveness of the proposed approach.
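
A minimal sketch of the graph construction described above, assuming a corpus word-frequency table is available; the edge heuristics (occurrence, co-occurrence, sentence order) follow the abstract, while the names and the content-word cutoff are illustrative. The graph attention step that consumes this graph is omitted.

    def build_context_graph(doc_sents, corpus_freq, content_ratio=0.5):
        # doc_sents: list of tokenized sentences; corpus_freq: word -> count.
        # Rarer words are treated as content words (frequency heuristic).
        vocab = sorted({w for s in doc_sents for w in s},
                       key=lambda w: corpus_freq.get(w, 0))
        content = set(vocab[: int(len(vocab) * content_ratio)])
        nodes = [f"S{i}" for i in range(len(doc_sents))] + sorted(content)
        edges = set()
        for i, sent in enumerate(doc_sents):
            words = [w for w in sent if w in content]
            edges |= {(w, f"S{i}") for w in words}                    # occurrence
            edges |= {(a, b) for a in words for b in words if a < b}  # co-occurrence
            if i > 0:
                edges.add((f"S{i-1}", f"S{i}"))                       # logical order
        return nodes, edges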

IJCAI 2022 · Conference Paper

Explicit Alignment Learning for Neural Machine Translation

  • Zuchao Li
  • Hai Zhao
  • Fengshun Xiao
  • Masao Utiyama
  • Eiichiro Sumita

Even though neural machine translation (NMT) has become the state-of-the-art solution for end-to-end translation, it still suffers from a lack of translation interpretability, which may be conveniently enhanced by explicit alignment learning (EAL), as performed in traditional statistical machine translation (SMT). To provide the benefits of both NMT and SMT, this paper presents a novel model design that enhances NMT with an additional training process for EAL, in addition to the end-to-end translation training. Specifically, we propose two explicit alignment learning approaches; in the second, we further remove the need for an additional alignment model and perform embedding mixup with the alignment derived from the encoder-decoder attention weights in the NMT model. We conducted experiments on both small-scale (IWSLT14 De->En and IWSLT13 Fr->En) and large-scale (WMT14 En->De, En->Fr, WMT17 Zh->En) benchmarks. Evaluation results show that our EAL methods significantly outperformed strong baseline methods, demonstrating the effectiveness of EAL. Further explorations show that the translation improvements are due to a better spatial alignment of the source and target language embeddings. Our method improves translation performance without increasing model parameters or training data, which verifies that incorporating SMT techniques into NMT is worthwhile.
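
The embedding-mixup step can be illustrated with a short PyTorch sketch; reading the attention weights as soft alignments follows the abstract, while the mixing coefficient lam and the function signature are assumptions.

    def alignment_mixup(src_emb, tgt_emb, attn, lam=0.5):
        # src_emb: (batch, src_len, d); tgt_emb: (batch, tgt_len, d)
        # attn:    (batch, tgt_len, src_len) encoder-decoder attention weights,
        #          read as a soft alignment from target positions to source words.
        aligned = attn @ src_emb          # expected source embedding per target word
        return lam * tgt_emb + (1.0 - lam) * aligned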

JAIR 2020 · Journal Article

Agreement on Target-Bidirectional Recurrent Neural Networks for Sequence-to-Sequence Learning

  • Lemao Liu
  • Andrew Finch
  • Masao Utiyama
  • Eiichiro Sumita

Recurrent neural networks are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a shortcoming: they are prone to generating unbalanced targets with good prefixes but bad suffixes, so performance suffers on long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional RNNs, which generates more balanced targets. In addition, we develop two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of either sequence-level or non-sequence-level metrics. Extensive experiments were performed on three standard sequence-to-sequence transduction tasks: machine transliteration, grapheme-to-phoneme transformation, and machine translation. The results show that the proposed approach achieves consistent and substantial improvements compared to many state-of-the-art systems.
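
One way to realize the agreement idea is to rescore the left-to-right model's n-best list with a right-to-left model; a rough sketch follows, where score_l2r and score_r2l are assumed scoring functions and the paper's approximate search methods are more sophisticated than plain reranking.

    def rerank_by_agreement(candidates, score_l2r, score_r2l):
        # candidates: n-best token lists from the left-to-right decoder.
        # The joint score rewards hypotheses on which the prefix-strong
        # L2R model and the suffix-strong R2L model agree.
        def joint(y):
            return score_l2r(y) + score_r2l(list(reversed(y)))
        return max(candidates, key=joint)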

ICLR 2020 · Conference Paper

Data-dependent Gaussian Prior Objective for Language Generation

  • Zuchao Li
  • Rui Wang 0015
  • Kehai Chen
  • Masao Utiyama
  • Eiichiro Sumita
  • Zhuosheng Zhang 0001
  • Hai Zhao 0001

For typical sequence prediction problems such as language generation, maximum likelihood estimation (MLE) has commonly been adopted as it encourages the predicted sequence most consistent with the ground-truth sequence to have the highest probability of occurring. However, MLE focuses on once-to-all matching between the predicted sequence and gold-standard, consequently treating all incorrect predictions as being equally incorrect. We refer to this drawback as "negative diversity ignorance" in this paper. Treating all incorrect predictions as equal unfairly downplays the nuance of these sequences' detailed token-wise structure. To counteract this, we augment the MLE loss by introducing an extra Kullback-Leibler divergence term derived by comparing a data-dependent Gaussian prior and the detailed training prediction. The proposed data-dependent Gaussian prior objective (D2GPo) is defined over a prior topological order of tokens and is poles apart from the data-independent Gaussian prior (L2 regularization) commonly adopted in smoothing the training of MLE. Experimental results show that the proposed method makes effective use of a more detailed prior in the data and has improved performance in typical language generation tasks, including supervised and unsupervised machine translation, text summarization, storytelling, and image captioning.
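
A minimal PyTorch sketch of the augmented objective, assuming the prior is induced from embedding distances to the gold token (one possible choice of topological order); sigma and the KL weight are hypothetical hyperparameters.

    import torch
    import torch.nn.functional as F

    def d2gpo_loss(logits, gold, emb, sigma=1.0, kl_weight=0.1):
        # logits: (batch, vocab); gold: (batch,); emb: (vocab, d) embeddings.
        # Soft prior: tokens whose embeddings are close to the gold token's
        # are "less wrong", so the Gaussian kernel gives them more mass.
        dist = torch.cdist(emb[gold], emb)                      # (batch, vocab)
        prior = F.softmax(-dist.pow(2) / (2 * sigma ** 2), dim=-1)
        mle = F.cross_entropy(logits, gold)
        kl = F.kl_div(F.log_softmax(logits, dim=-1), prior, reduction="batchmean")
        return mle + kl_weight * kl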

AAAI 2020 · Conference Paper

Explicit Sentence Compression for Neural Machine Translation

  • Zuchao Li
  • Rui Wang
  • Kehai Chen
  • Masao Utiyama
  • Eiichiro Sumita
  • Zhuosheng Zhang
  • Hai Zhao

State-of-the-art Transformer-based neural machine translation (NMT) systems still follow the standard encoder-decoder framework, in which source sentence representation can be handled well by an encoder with a self-attention mechanism. Though a Transformer-based encoder may effectively capture general information in its resulting source sentence representation, the backbone information, which stands for the gist of a sentence, is not specifically focused on. In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT. In practice, an explicit sentence compression goal is used to learn the backbone information in a sentence. We propose three ways to integrate the compressed sentence into NMT: backbone source-side fusion, target-side fusion, and both-side fusion. Our empirical tests on the WMT English-to-French and English-to-German translation tasks show that the proposed sentence compression method significantly improves translation performance over strong baselines.
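
Of the three fusion variants, source-side fusion is the simplest to sketch: a gate decides, per source position, how much of the compressed backbone encoding to blend in. All names, and the pooled backbone representation, are assumptions.

    import torch

    def source_side_fusion(src_h, backbone_h, gate_proj):
        # src_h:      (batch, src_len, d) full-sentence encoder states
        # backbone_h: (batch, d) pooled encoding of the compressed sentence
        # gate_proj:  (2*d, d) gate parameter matrix (hypothetical)
        bb = backbone_h.unsqueeze(1).expand_as(src_h)
        gate = torch.sigmoid(torch.cat([src_h, bb], dim=-1) @ gate_proj)
        return gate * src_h + (1.0 - gate) * bb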

ICLR 2020 · Conference Paper

Neural Machine Translation with Universal Visual Representation

  • Zhuosheng Zhang 0001
  • Kehai Chen
  • Rui Wang 0015
  • Masao Utiyama
  • Eiichiro Sumita
  • Zuchao Li
  • Hai Zhao 0001

Though visual information has been introduced for enhancing neural machine translation (NMT), its effectiveness strongly relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations. In this paper, we present a universal visual representation learned over monolingual corpora with image annotations, which overcomes the lack of large-scale bilingual sentence-image pairs, thereby extending image applicability in NMT. In detail, a group of images with topics similar to the source sentence is retrieved from a light topic-image lookup table learned over the existing sentence-image pairs, and then encoded as image representations by a pre-trained ResNet. An attention layer with gated weighting is used to fuse the visual information and text information as input to the decoder for predicting target translations. In particular, the proposed method enables the visual information to be integrated into large-scale text-only NMT in addition to multimodal NMT. Experiments on four widely used translation datasets, including WMT'16 English-to-Romanian, WMT'14 English-to-German, WMT'14 English-to-French, and Multi30K, show that the proposed approach achieves significant improvements over strong baselines.
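
A rough sketch of the attention-plus-gating fusion, assuming the retrieved image features have already been projected to the model dimension; the paper's exact gated weighting may differ.

    import torch

    def visual_fusion(text_h, img_h):
        # text_h: (batch, seq, d) encoder states
        # img_h:  (batch, k, d) ResNet features of k retrieved topic images
        attn = torch.softmax(text_h @ img_h.transpose(1, 2), dim=-1)  # (batch, seq, k)
        ctx = attn @ img_h                               # visual context per position
        gate = torch.sigmoid((text_h * ctx).sum(-1, keepdim=True))   # scalar gate
        return text_h + gate * ctx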

AAAI 2018 · Conference Paper

Syntax-Directed Attention for Neural Machine Translation

  • Kehai Chen
  • Rui Wang
  • Masao Utiyama
  • Eiichiro Sumita
  • Tiejun Zhao

The attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at source words within a fixed window. However, alignment weights for the current target word often decay linearly with distance to the left and right of the aligned source position, neglecting syntax-distance constraints. In this paper, we extend local attention with a syntax-distance constraint that focuses on source words syntactically related to the predicted target word, to learn a more effective context vector for predicting translations. Moreover, we propose a double-context NMT architecture, consisting of a global context vector and a syntax-directed context vector derived from the global attention, to provide richer source-side information for translation. Experiments on large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.
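
The syntax-distance constraint can be sketched as masking attention by precomputed dependency tree distances around the aligned source position; tree_dist and max_dist are illustrative stand-ins for the paper's exact formulation.

    import torch

    def syntax_directed_attention(scores, tree_dist, aligned_pos, max_dist=2):
        # scores:      (batch, src_len) raw attention scores for one target word
        # tree_dist:   (batch, src_len, src_len) pairwise dependency-tree distances
        # aligned_pos: (batch,) predicted aligned source position per example
        b = torch.arange(scores.size(0))
        dist = tree_dist[b, aligned_pos]                  # (batch, src_len)
        masked = scores.masked_fill(dist > max_dist, float("-inf"))
        return torch.softmax(masked, dim=-1)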

AAAI 2017 · Conference Paper

Deterministic Attention for Sequence-to-Sequence Constituent Parsing

  • Chunpeng Ma
  • Lemao Liu
  • Akihiro Tamura
  • Tiejun Zhao
  • Eiichiro Sumita

The sequence-to-sequence model has proven extremely successful in constituent parsing. It relies on one key technique, the probabilistic attention mechanism, to automatically select the context for prediction. Despite its successes, the probabilistic attention model does not always select the most important context. For example, the headword and boundary words of a subtree have been shown to be critical when predicting the constituent label of the subtree, but this contextual information becomes increasingly difficult to learn as the length of the sequence increases. In this study, we propose a deterministic attention mechanism that deterministically selects the important context and is not affected by the sequence length. We implemented two different instances of this framework. When combined with a novel bottom-up linearization method, our parser demonstrated better performance than that achieved by the sequence-to-sequence parser with a probabilistic attention mechanism.
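
The deterministic selection can be sketched in a few lines: features of the boundary words and headword are looked up directly, with no learned attention distribution. Index names are illustrative, and head_index would come from head rules in practice.

    def deterministic_context(features, span_start, span_end, head_index):
        # features: list of per-word feature vectors (lists); the boundary
        # words and the headword of the subtree are selected directly, so
        # the choice cannot degrade as the input sequence grows longer.
        # List "+" concatenates the three feature vectors.
        return features[span_start] + features[span_end] + features[head_index]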

IJCAI 2016 · Conference Paper

Assessing Translation Ability through Vocabulary Ability Assessment

  • Yo Ehara
  • Yukino Baba
  • Masao Utiyama
  • Eiichiro Sumita

Translation ability is known as one of the most difficult language abilities to measure. A typical method of measuring it involves asking translators to translate sentences and asking professional evaluators to grade the translations, which imposes a heavy burden on both translators and evaluators. In this paper, we propose a practical method for assessing translation ability. Our key idea is to incorporate translators' vocabulary knowledge into translation ability assessment. Our method involves simply asking translators whether they know given words. Using this vocabulary information, we build a probabilistic model that estimates a translator's vocabulary and translation abilities simultaneously. We evaluated our method in a realistic crowdsourcing translation setting, where there is a great need to measure translation ability in order to select good translators. The results of our experiments show that the proposed method accurately estimates translation ability and selects translators who have sufficient skill to translate a given sentence. We also found that our method significantly reduces the cost of crowdsourced translation.
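
The vocabulary component plausibly resembles item-response-theory models; a minimal Rasch-style sketch follows, with ability and difficulty as illustrative parameters. The paper's actual model estimates vocabulary and translation abilities jointly, which this sketch does not attempt.

    import math

    def p_knows(ability, difficulty):
        # Rasch-style probability that a translator knows a given word.
        return 1.0 / (1.0 + math.exp(difficulty - ability))

    def vocab_log_likelihood(ability, responses):
        # responses: list of (word_difficulty, knows_it) pairs from the
        # yes/no vocabulary questionnaire; maximizing this over `ability`
        # gives a point estimate of the translator's vocabulary ability.
        total = 0.0
        for difficulty, knows_it in responses:
            p = p_knows(ability, difficulty)
            total += math.log(p if knows_it else 1.0 - p)
        return total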

IJCAI 2011 · Conference Paper

picoTrans: Using Pictures as Input for Machine Translation on Mobile Devices

  • Andrew Finch
  • Wei Song
  • Kumiko Tanaka-Ishii
  • Eiichiro Sumita

In this paper we present a novel user interface that integrates two popular approaches to language translation for travelers, allowing multimodal communication between the parties involved: the picture book, in which the user simply points to picture icons representing what they want to say, and the statistical machine translation system, which can translate arbitrary word sequences. Our prototype system tightly couples both processes within a translation framework that inherits many of the positive features of both approaches while mitigating their main weaknesses. Our system differs from traditional approaches in that its mode of input is a sequence of pictures, rather than text or speech. Text in the source language is generated automatically and used as a detailed representation of the intended meaning. The picture sequence not only provides a rapid method to communicate basic concepts but also gives a 'second opinion' on the machine translation output that catches translation errors and allows the users to retry the translation, avoiding misunderstandings.
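
The core pipeline can be sketched as mapping icon identifiers to source-language words and then translating; in the real system the source text is generated as fluent text rather than by naive joining, so everything below is illustrative.

    def pictures_to_translation(icon_ids, icon_lexicon, translate):
        # icon_ids:     icon identifiers the user pointed to, in order
        # icon_lexicon: icon id -> source-language word or phrase
        # translate:    any machine translation function (assumed available)
        source_text = " ".join(icon_lexicon[i] for i in icon_ids)
        # Returning both lets the user check the generated source text (the
        # detailed representation of intended meaning) against the pictures
        # and retry if the translation looks wrong.
        return source_text, translate(source_text)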

NeurIPS 2007 · Conference Paper

The Infinite Markov Model

  • Daichi Mochihashi
  • Eiichiro Sumita

We present a nonparametric Bayesian method for estimating variable-order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of the Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model performs comparably to an exponentially large full-order model while being much more efficient in both time and space. We expect that this basic model will also extend to variable-order hierarchical clustering of general data.
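
A toy sketch of variable-order prediction that mixes over context depths with fixed stick-breaking weights; the actual model infers these weights per node of the context tree via the hierarchical Chinese restaurant process, so the constant stop_prob and the add-one smoothing here are simplifications.

    def variable_order_prob(symbol, context, counts, stop_prob=0.3, vocab_size=1000):
        # counts: dict mapping a context tuple to a {symbol: count} dict.
        # Depth d (using the last d context symbols) gets stick-breaking
        # weight stop_prob * (1 - stop_prob)**d; the deepest usable depth
        # absorbs the leftover mass so the weights sum to one.
        p, leftover = 0.0, 1.0
        for d in range(len(context) + 1):
            ctx = tuple(context[len(context) - d:])
            c = counts.get(ctx, {})
            total = sum(c.values())
            p_d = (c.get(symbol, 0) + 1.0) / (total + vocab_size)  # add-one smoothing
            w = stop_prob * (1 - stop_prob) ** d if d < len(context) else leftover
            p += w * p_d
            leftover -= w
        return p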

AAAI 1994 · Conference Paper

The Relationship between Architectures and Example-Retrieval Times

  • Eiichiro Sumita

This paper proposes a method to find the most suitable architecture for a given response-time requirement for Example-Retrieval (ER), which searches for the best match in a bulk collection of linguistic examples. In the Example-Based Approach (EBA), which attains substantially higher accuracy than traditional approaches, ER is used extensively to carry out natural language processing tasks, e.g., parsing and translation. ER, however, is so computationally demanding that it often takes up most of the total sentence processing time. This paper compares several accelerations of ER on different architectures, i.e., serial, MIMD, and SIMD. Experimental results reveal the relationship between architectures and response times, which allows us to find the most suitable architecture for a given response-time requirement.
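
A serial sketch of ER as a best-match scan, with difflib's SequenceMatcher standing in for the paper's linguistic similarity measure; parallel variants would shard (MIMD) or vectorize (SIMD) the same scan.

    from difflib import SequenceMatcher

    def similarity(query, example):
        # Stand-in for the paper's linguistic distance between word sequences.
        return SequenceMatcher(None, query, example).ratio()

    def best_match(query, examples):
        # Serial ER: scan the whole collection for the best match. An MIMD
        # version would shard `examples` across processors and take the max
        # of per-shard winners; a SIMD version would vectorize `similarity`.
        return max(examples, key=lambda e: similarity(query, e))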