Arrow Research search

Author name cluster

Yingce Xia

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

ICLR Conference 2023 Conference Paper

De Novo Molecular Generation via Connection-aware Motif Mining

  • Zijie Geng
  • Shufang Xie 0003
  • Yingce Xia
  • Lijun Wu 0003
  • Tao Qin 0001
  • Jie Wang 0005
  • Yongdong Zhang 0001
  • Feng Wu 0001

De novo molecular generation is an essential task for scientific discovery. Recently, fragment-based deep generative models have attracted much research attention due to their flexibility in generating novel molecules based on existing molecule fragments. However, the motif vocabulary, i.e., the collection of frequent fragments, is usually built upon heuristic rules, which makes it difficult to capture common substructures across large numbers of molecules. In this work, we propose MiCaM to generate molecules based on mined connection-aware motifs. Specifically, it leverages a data-driven algorithm to automatically discover motifs from a molecule library by iteratively merging subgraphs based on their frequency. The obtained motif vocabulary consists of not only molecular motifs (i.e., the frequent fragments), but also their connection information, indicating how the motifs are connected with each other. Based on the mined connection-aware motifs, MiCaM builds a connection-aware generator, which simultaneously picks up motifs and determines how they are connected. We test our method on distribution-learning benchmarks (i.e., generating novel molecules to resemble the distribution of a given training set) and goal-directed benchmarks (i.e., generating molecules with target properties), and achieve significant improvements over previous fragment-based baselines. Furthermore, we demonstrate that our method can effectively mine domain-specific motifs for different tasks.

NeurIPS Conference 2023 Conference Paper

FABind: Fast and Accurate Protein-Ligand Binding

  • Qizhi Pei
  • Kaiyuan Gao
  • Lijun Wu
  • Jinhua Zhu
  • Yingce Xia
  • Shufang Xie
  • Tao Qin
  • Kun He

Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. FABind incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed FABind demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at https://github.com/QizhiPei/FABind.

AAAI Conference 2023 Conference Paper

Retrosynthesis Prediction with Local Template Retrieval

  • Shufang Xie
  • Rui Yan
  • Junliang Guo
  • Yingce Xia
  • Lijun Wu
  • Tao Qin

Retrosynthesis, which predicts the reactants of a given target molecule, is an essential task for drug discovery. In recent years, machine learning based retrosynthesis methods have achieved promising results. In this work, we introduce RetroKNN, a local reaction template retrieval method to further boost the performance of template-based systems with non-parametric retrieval. We first build an atom-template store and a bond-template store that contain the local templates in the training data, then retrieve from these templates with a k-nearest-neighbor (KNN) search during inference. The retrieved templates are combined with neural network predictions as the final output. Furthermore, we propose a lightweight adapter to adjust the weights when combining neural network and KNN predictions, conditioned on the hidden representation and the retrieved templates. We conduct comprehensive experiments on two widely used benchmarks, the USPTO-50K and USPTO-MIT. For top-1 accuracy in particular, we improve by 7.1% on the USPTO-50K dataset and by 12.0% on the USPTO-MIT dataset. These results demonstrate the effectiveness of our method.
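
The retrieve-and-interpolate step is easy to picture in code. Below is a minimal numpy sketch, assuming a flat datastore of hidden-state keys paired with template indices; the learned adapter that predicts the interpolation weight is simplified here to a fixed scalar lam, and all names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def knn_interpolated_probs(hidden, keys, template_ids, nn_probs,
                           k=8, temperature=10.0, lam=0.5):
    """Combine neural-network template probabilities with a kNN retrieval
    distribution, in the spirit of RetroKNN (simplified: lam is a fixed
    scalar instead of a learned adapter).

    hidden:       (d,) query hidden state for an atom/bond.
    keys:         (n, d) datastore of hidden states from the training set.
    template_ids: (n,) template index stored with each key.
    nn_probs:     (t,) template distribution predicted by the network.
    """
    dists = np.sum((keys - hidden) ** 2, axis=1)     # squared L2 distances
    nearest = np.argsort(dists)[:k]                  # indices of k nearest keys
    weights = np.exp(-dists[nearest] / temperature)  # closer keys weigh more
    weights /= weights.sum()
    knn_probs = np.zeros_like(nn_probs)
    for w, idx in zip(weights, nearest):             # scatter weights onto templates
        knn_probs[template_ids[idx]] += w
    return lam * nn_probs + (1.0 - lam) * knn_probs  # interpolated distribution
```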

ICML Conference 2023 Conference Paper

Retrosynthetic Planning with Dual Value Networks

  • Guoqing Liu
  • Di Xue
  • Shufang Xie 0003
  • Yingce Xia
  • Austin Tripp
  • Krzysztof Maziarz
  • Marwin H. S. Segler
  • Tao Qin 0001

Retrosynthesis, which aims to find a route to synthesize a target molecule from commercially available starting materials, is a critical task in drug discovery and materials design. Recently, the combination of ML-based single-step reaction predictors with multi-step planners has led to promising results. However, the single-step predictors are mostly trained offline to optimize the single-step accuracy, without considering complete routes. Here, we leverage reinforcement learning (RL) to improve the single-step predictor, by using a tree-shaped MDP to optimize complete routes. Specifically, we propose a novel online training algorithm, called Planning with Dual Value Networks (PDVN), which alternates between the planning phase and updating phase. In PDVN, we construct two separate value networks to predict the synthesizability and cost of molecules, respectively. To maintain the single-step accuracy, we design a two-branch network structure for the single-step predictor. On the widely-used USPTO dataset, our PDVN algorithm improves the search success rate of existing multi-step planners (e.g., increasing the success rate from 85.79% to 98.95% for Retro$^{\ast}$, and reducing the number of model calls by half while solving 99.47% of molecules for RetroGraph). Additionally, PDVN helps find shorter synthesis routes (e.g., reducing the average route length from 5.76 to 4.83 for Retro$^{\ast}$, and from 5.63 to 4.78 for RetroGraph).

ICLR Conference 2023 Conference Paper

𝒪-GNN: incorporating ring priors into molecular modeling

  • Jinhua Zhu 0001
  • Kehan Wu
  • Bohan Wang
  • Yingce Xia
  • Shufang Xie 0003
  • Qi Meng
  • Lijun Wu 0003
  • Tao Qin 0001

Cyclic compounds, which contain at least one ring, play an important role in drug design. Despite the recent success of molecular modeling with graph neural networks (GNNs), few models explicitly take rings in compounds into consideration, consequently limiting the expressiveness of the models. In this work, we design a new variant of GNN, the ring-enhanced GNN ($\mathcal{O}$-GNN), that explicitly models rings in addition to atoms and bonds in compounds. In $\mathcal{O}$-GNN, each ring is represented by a latent vector, which contributes to and is iteratively updated by the atom and bond representations. Theoretical analysis shows that $\mathcal{O}$-GNN is able to distinguish two isomorphic subgraphs lying on different rings using only one layer, while conventional graph convolutional neural networks require multiple layers to distinguish them, demonstrating that $\mathcal{O}$-GNN is more expressive. Through experiments, $\mathcal{O}$-GNN shows good performance on $\bf{11}$ public datasets. In particular, it achieves a state-of-the-art validation result on the PCQM4Mv1 benchmark (outperforming the previous KDDCup champion solution) and on the drug-drug interaction prediction task on DrugBank. Furthermore, $\mathcal{O}$-GNN outperforms strong baselines (without modeling rings) on the molecular property prediction and retrosynthesis prediction tasks.

AAAI Conference 2022 Conference Paper

DDG-DA: Data Distribution Generation for Predictable Concept Drift Adaptation

  • Wendi Li
  • Xiao Yang
  • Weiqing Liu
  • Yingce Xia
  • Jiang Bian

In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of environment evolution are predictable, making it possible to model the future concept drift trend of the streaming data, while such cases are not fully explored in previous work. In this paper, we propose a novel method, DDG-DA, that can effectively forecast the evolution of the data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data. We conduct experiments on three real-world tasks (forecasting stock price trends, electricity load, and solar irradiance) and obtain significant improvement on multiple widely-used models.

TMLR Journal 2022 Journal Article

Direct Molecular Conformation Generation

  • Jinhua Zhu
  • Yingce Xia
  • Chang Liu
  • Lijun Wu
  • Shufang Xie
  • Yusong Wang
  • Tong Wang
  • Tao Qin

Molecular conformation generation aims to generate the three-dimensional coordinates of all the atoms in a molecule and is an important task in bioinformatics and pharmacology. Previous methods usually first predict the interatomic distances, the gradients of interatomic distances, or the local structures (e.g., torsion angles) of a molecule, and then reconstruct its 3D conformation. How to directly generate the conformation without the above intermediate values has not been fully explored. In this work, we propose a method that directly predicts the coordinates of atoms: (1) the loss function is invariant to roto-translation of coordinates and permutation of symmetric atoms; (2) the newly proposed model adaptively aggregates the bond and atom information and iteratively refines the coordinates of the generated conformation. Our method achieves the best results on the GEOM-QM9 and GEOM-Drugs datasets. Further analysis shows that our generated conformations have properties (e.g., the HOMO-LUMO gap) closer to those of the ground-truth conformations. In addition, our method improves molecular docking by providing better initial conformations. All the results demonstrate the effectiveness of our method and the great potential of the direct approach. The code is released at https://github.com/DirectMolecularConfGen/DMCG.
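
The first property, invariance to roto-translation, is commonly obtained by optimally aligning prediction and reference before measuring the error. A minimal numpy sketch using the Kabsch algorithm follows; handling of symmetric-atom permutations, which the paper also covers, is omitted here.

```python
import numpy as np

def aligned_rmsd(pred, ref):
    """RMSD after optimal rigid alignment (Kabsch), making the loss
    invariant to rotation and translation of the predicted coordinates.
    pred, ref: (n_atoms, 3) arrays."""
    p = pred - pred.mean(axis=0)             # remove translation
    q = ref - ref.mean(axis=0)
    h = p.T @ q                              # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # avoid improper reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # optimal rotation: r @ p_i ~ q_i
    p_aligned = p @ r.T
    return np.sqrt(np.mean(np.sum((p_aligned - q) ** 2, axis=1)))
```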

ICLR Conference 2022 Conference Paper

Target-Side Input Augmentation for Sequence to Sequence Generation

  • Shufang Xie 0003
  • Ang Lv
  • Yingce Xia
  • Lijun Wu 0003
  • Tao Qin 0001
  • Tie-Yan Liu
  • Rui Yan 0001

Autoregressive sequence generation, a prevalent task in machine learning and natural language processing, generates every target token conditioned on both a source input and previously generated target tokens. Previous data augmentation methods, which have been shown to be effective for the task, mainly enhance source inputs (e.g., injecting noise into the source sequence by random swapping or masking, back translation, etc.) while overlooking the target-side augmentation. In this work, we propose a target-side augmentation method for sequence generation. In training, we use the decoder output probability distributions as soft indicators, which are multiplied with target token embeddings, to build pseudo tokens. These soft pseudo tokens are then used as target tokens to enhance the training. We conduct comprehensive experiments on various sequence generation tasks, including dialog generation, machine translation, and abstractive summarization. Without using any extra labeled data or introducing additional model parameters, our method significantly outperforms strong baselines. The code is available at https://github.com/TARGET-SIDE-DATA-AUG/TSDASG.
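
The core of the target-side augmentation is a single matrix product; a minimal PyTorch sketch, assuming the decoder logits and the target embedding table are available (names are illustrative):

```python
import torch

def soft_pseudo_embeddings(decoder_logits, embedding):
    """Build soft pseudo-token embeddings from decoder output distributions,
    as the abstract describes: the probability over the vocabulary acts as a
    soft indicator that takes an expectation over target token embeddings.

    decoder_logits: (batch, tgt_len, vocab) raw decoder outputs.
    embedding:      nn.Embedding holding the target token embeddings.
    """
    probs = torch.softmax(decoder_logits, dim=-1)  # soft indicators
    return probs @ embedding.weight                # (batch, tgt_len, dim)
```

The resulting soft embeddings can then replace, or be mixed with, the hard target-token embeddings in a subsequent training pass.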

ICLR Conference 2021 Conference Paper

IOT: Instance-wise Layer Reordering for Transformer Structures

  • Jinhua Zhu 0001
  • Lijun Wu 0003
  • Yingce Xia
  • Shufang Xie 0003
  • Tao Qin 0001
  • Wengang Zhou 0001
  • Houqiang Li
  • Tie-Yan Liu

With sequentially stacked self-attention, (optional) encoder-decoder attention, and feed-forward layers, the Transformer has achieved great success in natural language processing (NLP), and many variants have been proposed. Currently, almost all these models assume that the \emph{layer order} is fixed and kept the same across data samples. We observe that different data samples actually favor different orders of the layers. Based on this observation, in this work we break the assumption of the fixed layer order in the Transformer and introduce instance-wise layer reordering into the model structure. Our Instance-wise Ordered Transformer (IOT) can model different functions with reordered layers, which enables each sample to select the order that suits it best and thereby improve model performance under the constraint of an almost unchanged number of parameters. To achieve this, we introduce a light predictor with negligible parameter and inference cost to decide the most capable and favorable layer order for any input sequence. Experiments on $3$ tasks (neural machine translation, abstractive summarization, and code generation) and $9$ datasets demonstrate consistent improvements of our method. We further show that our method can also be applied to other architectures beyond the Transformer. Our code is released at https://github.com/instance-wise-ordered-transformer/IOT.
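
A toy PyTorch sketch of the idea, assuming a small set of reorderable layers so that candidate orders can be enumerated. The hard argmax at batch granularity is my simplification for brevity; the paper selects per instance and trains the predictor jointly with the model.

```python
import torch
import torch.nn as nn
from itertools import permutations

class InstanceWiseReorder(nn.Module):
    """Sketch of instance-wise layer reordering: a light predictor scores a
    fixed set of candidate layer orders from the mean input representation,
    and the layers are applied in the chosen order. Enumeration is feasible
    only for a handful of reorderable layers."""

    def __init__(self, layers, dim):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        self.orders = list(permutations(range(len(layers))))
        self.selector = nn.Linear(dim, len(self.orders))  # negligible extra params

    def forward(self, x):                        # x: (batch, seq, dim)
        scores = self.selector(x.mean(dim=(0, 1)))  # one score per candidate order
        order = self.orders[int(scores.argmax())]
        for i in order:                          # run layers in the chosen order
            x = self.layers[i](x)
        return x
```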

AAAI Conference 2021 Conference Paper

Learning to Reweight with Deep Interactions

  • Yang Fan
  • Yingce Xia
  • Lijun Wu
  • Shufang Xie
  • Weiqing Liu
  • Jiang Bian
  • Tao Qin
  • Xiang-Yang Li

Recently, the concept of teaching has been introduced into machine learning, in which a teacher model is used to guide the training of a student model (which will be used in real tasks) through data selection, loss function design, etc. Learning to reweight, which is a specific kind of teaching that reweights training data using a teacher model, receives much attention due to its simplicity and effectiveness. In existing learning to reweight works, the teacher model only utilizes shallow/surface information such as training iteration number and loss/accuracy of the student model from training/validation sets, but ignores the internal states of the student model, which limits the potential of learning to reweight. In this work, we propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model, and the teacher model returns adaptive weights of training samples to enhance the training of the student model. The teacher model is jointly trained with the student model using meta gradients propagated from a validation set. Experiments on image classification with clean/noisy labels and neural machine translation empirically demonstrate that our algorithm makes significant improvement over previous methods.

NeurIPS Conference 2021 Conference Paper

Stylized Dialogue Generation with Multi-Pass Dual Learning

  • Jinpeng Li
  • Yingce Xia
  • Rui Yan
  • Hongda Sun
  • Dongyan Zhao
  • Tie-Yan Liu

Stylized dialogue generation, which aims to generate a given-style response for an input context, plays a vital role in intelligent dialogue systems. Considering there is no parallel data between the contexts and the responses of target style $S_1$, existing works mainly use back translation to generate stylized synthetic data for training, where the data about context, target style $S_1$ and an intermediate style $S_0$ is used. However, the interaction among these texts is not fully exploited, and the pseudo contexts are not adequately modeled. To overcome the above difficulties, we propose multi-pass dual learning (MPDL), which leverages the duality among the context, response of style $S_1$ and response of style $S_0$. MPDL builds mappings among the above three domains, where the context should be reconstructed by the MPDL framework, and the reconstruction error is used as the training signal. To evaluate the quality of synthetic data, we also introduce discriminators that effectively measure how a pseudo sequence matches the specific domain, and the evaluation result is used as the weight for that data. Evaluation results indicate that our method obtains significant improvement over previous baselines.

ICML Conference 2021 Conference Paper

Temporally Correlated Task Scheduling for Sequence Learning

  • Xueqing Wu 0001
  • Lewen Wang
  • Yingce Xia
  • Weiqing Liu
  • Lijun Wu 0003
  • Shufang Xie 0003
  • Tao Qin 0001
  • Tie-Yan Liu

Sequence learning has attracted much research attention from the machine learning community in recent years. In many applications, a sequence learning task is usually associated with multiple temporally correlated auxiliary tasks, which differ in terms of how much input information to use or which future step to predict. For example, (i) in simultaneous machine translation, one can conduct translation under different latency (i.e., how many input words to read/wait before translation); (ii) in stock trend forecasting, one can predict the price of a stock on different future days (e.g., tomorrow, the day after tomorrow). While it is clear that those temporally correlated tasks can help each other, there has been very limited exploration of how to better leverage multiple auxiliary tasks to boost the performance of the main task. In this work, we introduce a learnable scheduler to sequence learning, which can adaptively select auxiliary tasks for training depending on the model status and the current training data. The scheduler and the model for the main task are jointly trained through bi-level optimization. Experiments show that our method significantly improves the performance of simultaneous machine translation and stock trend forecasting.

ICLR Conference 2020 Conference Paper

Incorporating BERT into Neural Machine Translation

  • Jinhua Zhu 0001
  • Yingce Xia
  • Lijun Wu 0003
  • Di He 0001
  • Tao Qin 0001
  • Wengang Zhou 0001
  • Houqiang Li
  • Tie-Yan Liu

The recently proposed BERT (Devlin et al., 2019) has shown great power on a variety of natural language understanding tasks, such as text classification, reading comprehension, etc. However, how to effectively apply BERT to neural machine translation (NMT) lacks enough exploration. While BERT is more commonly used for fine-tuning than as a contextual embedding in downstream language understanding tasks, our preliminary exploration found that, for NMT, using BERT as a contextual embedding works better than using it for fine-tuning. This motivates us to explore how to better leverage BERT for NMT along this direction. We propose a new algorithm named BERT-fused model, in which we first use BERT to extract representations for an input sequence, and then the representations are fused with each layer of the encoder and decoder of the NMT model through attention mechanisms. We conduct experiments on supervised (including sentence-level and document-level translations), semi-supervised and unsupervised machine translation, and achieve state-of-the-art results on seven benchmark datasets. Our code is available at https://github.com/bert-nmt/bert-nmt.
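
A rough PyTorch sketch of one fused encoder layer, assuming precomputed (frozen) BERT features for the same input; drop-net, attention masking, and norm placement are simplified relative to the paper's configuration.

```python
import torch.nn as nn

class BertFusedEncoderLayer(nn.Module):
    """Sketch of the BERT-fused idea: each encoder layer attends both to its
    own states (self-attention) and to BERT representations of the same input
    (BERT-encoder attention), and averages the two branches."""

    def __init__(self, dim, heads, bert_dim):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.bert_attn = nn.MultiheadAttention(dim, heads, kdim=bert_dim,
                                               vdim=bert_dim, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x, bert_out):
        a, _ = self.self_attn(x, x, x)                # attend to own states
        b, _ = self.bert_attn(x, bert_out, bert_out)  # attend to BERT features
        x = self.norm1(x + 0.5 * (a + b))             # fuse the two branches
        return self.norm2(x + self.ffn(x))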

ICML Conference 2020 Conference Paper

Sequence Generation with Mixed Representations

  • Lijun Wu 0003
  • Shufang Xie 0003
  • Yingce Xia
  • Yang Fan
  • Jian-Huang Lai
  • Tao Qin 0001
  • Tie-Yan Liu

Tokenization is the first step of many natural language processing (NLP) tasks and plays an important role for neural NLP models. Tokenization methods such as byte-pair encoding (BPE), which can greatly reduce the large vocabulary and deal with out-of-vocabulary words, have been shown to be effective and are widely adopted for sequence generation tasks. While various tokenization methods exist, there is no consensus on which one is best. In this work, we propose to leverage the mixed representations from different tokenization methods for sequence generation tasks, in order to boost the model performance with the unique characteristics and advantages of individual tokenization methods. Specifically, we introduce a new model architecture to incorporate mixed representations and a co-teaching algorithm to better utilize the diversity of different tokenization methods. Our approach achieves significant improvements on neural machine translation (NMT) tasks with six language pairs (e.g., English$\leftrightarrow$German, English$\leftrightarrow$Romanian), as well as an abstractive summarization task.

AAAI Conference 2020 Conference Paper

Transductive Ensemble Learning for Neural Machine Translation

  • Yiren Wang
  • Lijun Wu
  • Yingce Xia
  • Tao Qin
  • ChengXiang Zhai
  • Tie-Yan Liu

Ensemble learning, which aggregates multiple diverse models for inference, is a common practice to improve the accuracy of machine learning tasks. However, it has been observed that conventional ensemble methods bring only marginal improvement for neural machine translation (NMT) when individual models are strong or there are a large number of individual models. In this paper, we study how to effectively aggregate multiple NMT models under the transductive setting, where the source sentences of the test set are known. We propose a simple yet effective approach named transductive ensemble learning (TEL), in which we use all individual models to translate the source test set into the target language space and then finetune a strong model on the translated synthetic corpus. We conduct extensive experiments on different settings (with/without monolingual data) and different language pairs (English↔{German, Finnish}). The results show that our approach significantly boosts strong individual models and benefits substantially from more individual models. Specifically, we achieve state-of-the-art performance on the WMT2016-2018 English↔German translations.

IJCAI Conference 2019 Conference Paper

Deliberation Learning for Image-to-Image Translation

  • Tianyu He
  • Yingce Xia
  • Jianxin Lin
  • Xu Tan
  • Di He
  • Tao Qin
  • Zhibo Chen

Image-to-image translation, which transfers an image from a source domain to a target one, has attracted much attention in both academia and industry. The major approach is to adopt an encoder-decoder based framework, where the encoder extracts features from the input image and then the decoder decodes the features and generates an image in the target domain as the output. In this paper, we go beyond this learning framework by considering an additional polishing step on the output image. Polishing an image is very common in daily life, such as editing and beautifying a photo in Photoshop after taking it with a digital camera. Such a deliberation process is shown to be very helpful and important in practice, and thus we believe it will also be helpful for image translation. Inspired by the success of deliberation networks in natural language processing, we extend the deliberation process to the field of image translation. We verify our proposed method on four two-domain translation tasks and one multi-domain translation task. Both the qualitative and quantitative results demonstrate the effectiveness of our method.

IJCAI Conference 2019 Conference Paper

Image-to-Image Translation with Multi-Path Consistency Regularization

  • Jianxin Lin
  • Yingce Xia
  • Yijun Wang
  • Tao Qin
  • Zhibo Chen

Image translation across different domains has attracted much attention in both the machine learning and computer vision communities. Taking the translation from a source domain to a target domain as an example, existing algorithms mainly rely on two kinds of loss for training: one is the discrimination loss, which is used to differentiate images generated by the models from natural images; the other is the reconstruction loss, which measures the difference between an original image and the reconstructed version. In this work, we introduce a new kind of loss, the multi-path consistency loss, which evaluates the differences between the direct translation from the source domain to the target domain and the indirect translation from the source domain through an auxiliary domain to the target domain, to regularize training. For multi-domain translation (with at least three domains), which focuses on building translation models between any two domains, at each training iteration we randomly select three domains, set them respectively as the source, auxiliary and target domains, build the multi-path consistency loss and optimize the network. For two-domain translation, we need to introduce an additional auxiliary domain and construct the multi-path consistency loss. We conduct various experiments to demonstrate the effectiveness of our proposed methods, including face-to-face translation, paint-to-photo translation, and de-raining/de-noising translation.
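
The loss itself is nearly a one-liner once the three generators exist; a minimal PyTorch sketch, where the generator names and the L1 choice are my assumptions:

```python
import torch

def multi_path_consistency_loss(g_st, g_sa, g_at, x_s):
    """Sketch of the multi-path consistency loss: penalize the difference
    between the direct translation source->target and the indirect path
    source->auxiliary->target. g_* are generator callables (assumed);
    an L1 penalty is one common choice for image losses."""
    direct = g_st(x_s)           # source -> target
    indirect = g_at(g_sa(x_s))   # source -> auxiliary -> target
    return torch.mean(torch.abs(direct - indirect))
```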

NeurIPS Conference 2019 Conference Paper

Neural Machine Translation with Soft Prototype

  • Yiren Wang
  • Yingce Xia
  • Fei Tian
  • Fei Gao
  • Tao Qin
  • Cheng Xiang Zhai
  • Tie-Yan Liu

Neural machine translation models usually use the encoder-decoder framework and generate translation from left to right (or right to left) without fully utilizing the target-side global information. A few recent approaches seek to exploit the global information through two-pass decoding, yet have limitations in translation quality and model efficiency. In this work, we propose a new framework that introduces a soft prototype into the encoder-decoder architecture, which allows the decoder to have indirect access to both past and future information, such that each target word can be generated based on a better global understanding. We further provide an efficient and effective method to generate the prototype. Empirical studies on various neural machine translation tasks show that our approach brings significant improvement in generation quality over the baseline model, with little extra cost in storage and inference time, demonstrating the effectiveness of our proposed framework. In particular, we achieve state-of-the-art results on WMT2014, 2015 and 2017 English to German translation.

AAAI Conference 2019 Conference Paper

Tied Transformers: Neural Machine Translation with Shared Encoder and Decoder

  • Yingce Xia
  • Tianyu He
  • Xu Tan
  • Fei Tian
  • Di He
  • Tao Qin

Sharing source and target side vocabularies and word embeddings has been a popular practice in neural machine translation (briefly, NMT) for similar languages (e.g., English to French or German translation). The success of such word-level sharing motivates us to move one step further: we consider model-level sharing and tie the whole parts of the encoder and decoder of an NMT model. We share the encoder and decoder of Transformer (Vaswani et al. 2017), the state-of-the-art NMT model, and obtain a compact model named Tied Transformer. Experimental results demonstrate that such a simple method works well for both similar and dissimilar language pairs. We empirically verify our framework for both supervised NMT and unsupervised NMT: we achieve a 35.52 BLEU score on IWSLT 2014 German to English translation, 28.98/29.89 BLEU scores on WMT 2014 English to German translation without/with monolingual data, and a 22.05 BLEU score on WMT 2016 unsupervised German to English translation.

AAAI Conference 2018 Conference Paper

Dual Transfer Learning for Neural Machine Translation with Marginal Distribution Regularization

  • Yijun Wang
  • Yingce Xia
  • Li Zhao
  • Jiang Bian
  • Tao Qin
  • Guiquan Liu
  • Tie-Yan Liu

Neural machine translation (NMT) heavily relies on parallel bilingual data for training. Since large-scale, high-quality parallel corpora are usually costly to collect, it is appealing to exploit monolingual corpora to improve NMT. Inspired by the law of total probability, which connects the probability of a given target-side monolingual sentence to the conditional probability of translating from a source sentence to the target one, we propose to explicitly exploit this connection to learn from and regularize the training of NMT models using monolingual data. The key technical challenge of this approach is that there are exponentially many source sentences for a target monolingual sentence while computing the sum of the conditional probability given each possible source sentence. We address this challenge by leveraging the dual translation model (target-to-source translation) to sample several most likely source-side sentences and avoid enumerating all possible candidate source sentences. That is, we transfer the knowledge contained in the dual model to boost the training of the primal model (source-to-target translation), and we call such an approach dual transfer learning. Experimental results on English→French and German→English tasks demonstrate that dual transfer learning achieves significant improvement over several strong baselines and obtains new state-of-the-art results.
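
Written out, the law-of-total-probability constraint and its sampled approximation take roughly the following form (notation mine; a sketch of the regularization the abstract describes, not the paper's exact equations):

$$
P(y) \;=\; \sum_{x} \hat{P}(x)\, P(y \mid x;\, \theta_{x \to y})
\;\approx\; \frac{1}{K} \sum_{i=1}^{K} \frac{\hat{P}(x^{(i)})\, P(y \mid x^{(i)};\, \theta_{x \to y})}{P(x^{(i)} \mid y;\, \theta_{y \to x})},
\qquad x^{(i)} \sim P(\cdot \mid y;\, \theta_{y \to x}),
$$

where the dual (target-to-source) model serves as the importance-sampling proposal, so only a few likely source sentences are needed; penalizing the gap between a language-model estimate of $\log P(y)$ and the log of the right-hand side regularizes the primal model on monolingual data.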

IJCAI Conference 2018 Conference Paper

Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

  • Haifang Li
  • Yingce Xia
  • Wensheng Zhang

Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computational efficiency and quality of approximations. We propose a new algorithm, LSTD($\lambda$)-RP, which leverages random projection techniques and takes eligibility traces into consideration to tackle the above two challenges. We carry out theoretical analysis of LSTD($\lambda$)-RP, and provide meaningful upper bounds on the estimation error, approximation error and total generalization error. These results demonstrate that LSTD($\lambda$)-RP can benefit from random projection and eligibility trace strategies, and that LSTD($\lambda$)-RP can achieve better performance than the prior LSTD-RP and LSTD($\lambda$) algorithms.
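
A compact numpy sketch of the two ingredients, assuming transitions arrive as a list of (phi, reward, phi_next) tuples; the projection scaling and the small regularization constant are illustrative choices, not the paper's:

```python
import numpy as np

def lstd_lambda_rp(trajectory, dim_low, gamma=0.99, lam=0.8, seed=0):
    """Sketch of LSTD(lambda) on randomly projected features: high-dimensional
    features phi are mapped to a low-dimensional space with a fixed Gaussian
    matrix, then standard LSTD(lambda) estimates the value weights there.

    trajectory: list of (phi, reward, phi_next) with phi of shape (D,).
    """
    rng = np.random.default_rng(seed)
    D = trajectory[0][0].shape[0]
    proj = rng.normal(0.0, 1.0 / np.sqrt(dim_low), size=(dim_low, D))

    A = np.zeros((dim_low, dim_low))
    b = np.zeros(dim_low)
    z = np.zeros(dim_low)                       # eligibility trace
    for phi, r, phi_next in trajectory:
        f, f_next = proj @ phi, proj @ phi_next # project both features
        z = gamma * lam * z + f                 # accumulate trace
        A += np.outer(z, f - gamma * f_next)
        b += z * r
    return np.linalg.solve(A + 1e-6 * np.eye(dim_low), b)  # value weights
```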

NeurIPS Conference 2018 Conference Paper

Layer-Wise Coordination between Encoder and Decoder for Neural Machine Translation

  • Tianyu He
  • Xu Tan
  • Yingce Xia
  • Di He
  • Tao Qin
  • Zhibo Chen
  • Tie-Yan Liu

Neural Machine Translation (NMT) has achieved remarkable progress with the rapid evolution of model structures. In this paper, we propose the concept of layer-wise coordination for NMT, which explicitly coordinates the learning of hidden representations of the encoder and decoder together layer by layer, gradually from low level to high level. Specifically, we design a layer-wise attention and mixed attention mechanism, and further share the parameters of each layer between the encoder and decoder to regularize and coordinate the learning. Experiments show that, combined with the state-of-the-art Transformer model, layer-wise coordination achieves improvements on three IWSLT and two WMT translation tasks. More specifically, our method achieves 34.43 and 29.01 BLEU scores on the WMT16 English-Romanian and WMT14 English-German tasks, outperforming the Transformer baseline.

NeurIPS Conference 2018 Conference Paper

Learning to Teach with Dynamic Loss Functions

  • Lijun Wu
  • Fei Tian
  • Yingce Xia
  • Yang Fan
  • Tao Qin
  • Lai Jian-Huang
  • Tie-Yan Liu

Teaching is critical to human society: it is with teaching that prospective students are educated and human civilization can be inherited and advanced. A good teacher not only provides his/her students with qualified teaching materials (e.g., textbooks), but also sets up appropriate learning objectives (e.g., course projects and exams) considering the different situations of a student. When it comes to artificial intelligence, treating machine learning models as students, the loss functions that are optimized act as perfect counterparts of the learning objective set by the teacher. In this work, we explore the possibility of imitating human teaching behaviors by dynamically and automatically outputting appropriate loss functions to train machine learning models. Different from typical learning settings in which the loss function of a machine learning model is predefined and fixed, in our framework the loss function of a machine learning model (we call it the student) is defined by another machine learning model (we call it the teacher). The ultimate goal of the teacher model is to cultivate the student to have better performance, measured on a development dataset. Towards that end, similar to human teaching, the teacher, a parametric model, dynamically outputs different loss functions that will be used and optimized by its student model at different training stages. We develop an efficient learning method for the teacher model that makes gradient-based optimization possible, avoiding ineffective solutions such as policy optimization. We name our method ``learning to teach with dynamic loss functions'' (L2T-DLF for short). Extensive experiments on real-world tasks including image classification and neural machine translation demonstrate that our method significantly improves the quality of various student models.

ICML Conference 2018 Conference Paper

Model-Level Dual Learning

  • Yingce Xia
  • Xu Tan 0003
  • Fei Tian
  • Tao Qin 0001
  • Nenghai Yu
  • Tie-Yan Liu

Many artificial intelligence tasks appear in dual forms, like English$\leftrightarrow$French translation and speech$\leftrightarrow$text transformation. Existing dual learning schemes, which are proposed to solve a pair of such dual tasks, explore how to leverage such dualities at the data level. In this work, we propose a new learning framework, model-level dual learning, which takes the duality of tasks into consideration while designing the architectures for the primal/dual models, and ties the model parameters that play similar roles in the two tasks. We study both symmetric and asymmetric model-level dual learning. Our algorithms achieve significant improvements on neural machine translation and sentiment analysis.

NeurIPS Conference 2017 Conference Paper

Decoding with Value Networks for Neural Machine Translation

  • Di He
  • Hanqing Lu
  • Yingce Xia
  • Tao Qin
  • Liwei Wang
  • Tie-Yan Liu

Neural Machine Translation (NMT) has become a popular technology in recent years, and beam search is its de facto decoding method due to the shrunk search space and reduced computational complexity. However, since it only searches for local optima at each time step through one-step forward looking, it usually cannot output the best target sentence. Inspired by the success and methodology of AlphaGo, in this paper we propose using a prediction network to improve beam search, which takes the source sentence $x$, the currently available decoding output $y_1, \cdots, y_{t-1}$ and a candidate word $w$ at step $t$ as inputs and predicts the long-term value (e.g., BLEU score) of the partial target sentence if it is completed by the NMT model. Following the practice in reinforcement learning, we call this prediction network the \emph{value network}. Specifically, we propose a recurrent structure for the value network, and train its parameters from bilingual data. At test time, when choosing a word $w$ for decoding, we consider both its conditional probability given by the NMT model and its long-term value predicted by the value network. Experiments show that such an approach can significantly improve the translation accuracy on several translation tasks.
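
At decoding time the combination can be as simple as a convex mix of the two signals; a toy Python sketch, where the exact weighting scheme is my assumption rather than the paper's formula:

```python
import math

def rescored_beam_step(candidates, alpha=0.85):
    """Sketch of value-aided beam search: each candidate extension is scored
    by a convex combination of the NMT model's log-probability and the value
    network's predicted long-term quality (e.g., expected BLEU in (0, 1]).
    `candidates` is a list of (log_prob, predicted_value) pairs; returns the
    index of the best candidate."""
    scores = [alpha * lp + (1.0 - alpha) * math.log(max(v, 1e-9))
              for lp, v in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```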

NeurIPS Conference 2017 Conference Paper

Deliberation Networks: Sequence Generation Beyond One-Pass Decoding

  • Yingce Xia
  • Fei Tian
  • Lijun Wu
  • Jianxin Lin
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu

The encoder-decoder framework has achieved promising progress for many sequence generation tasks, including machine translation, text summarization, dialog systems, image captioning, etc. Such a framework adopts a one-pass forward process while decoding and generating a sequence, but lacks the deliberation process: a generated sequence is directly used as the final output without further polishing. However, deliberation is a common behavior in daily life, as in reading news and writing papers/articles/books. In this work, we introduce the deliberation process into the encoder-decoder framework and propose deliberation networks for sequence generation. A deliberation network has two levels of decoders, where the first-pass decoder generates a raw sequence and the second-pass decoder polishes and refines the raw sentence with deliberation. Since the second-pass deliberation decoder has global information about what the sequence to be generated might be, it has the potential to generate a better sequence by looking into future words in the raw sentence. Experiments on neural machine translation and text summarization demonstrate the effectiveness of the proposed deliberation networks. On the WMT 2014 English-to-French translation task, our model establishes a new state-of-the-art BLEU score of 41.5.

IJCAI Conference 2017 Conference Paper

Dual Inference for Machine Learning

  • Yingce Xia
  • Jiang Bian
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu

Recent years have witnessed the rapid development of machine learning in solving artificial intelligence (AI) tasks in many domains, including translation, speech, image, etc. Within these domains, AI tasks are usually not independent. As a specific type of relationship, structural duality does exist between many pairs of AI tasks, such as translation from one language to another vs. its opposite direction, speech recognition vs. speech synthesis, image classification vs. image generation, etc. The importance of such duality has been magnified by some recent studies, which revealed that it can boost the learning of two tasks in the dual form. However, there has been little investigation of how to leverage this invaluable relationship in the inference stage of AI tasks. In this paper, we propose a general framework of dual inference which can take advantage of both existing models from two dual tasks, without re-training, to conduct inference for one individual task. Empirical studies on three pairs of specific dual tasks, including machine translation, sentiment analysis, and image processing, have illustrated that dual inference can significantly improve the performance of each individual task.
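
By Bayes' rule, $P(y \mid x) = P(x \mid y)P(y)/P(x)$, so both trained models can vote at inference. A sketch of one plausible dual-inference score (my notation, not necessarily the paper's exact parameterization):

$$
\text{score}(y \mid x) \;=\; \alpha \,\log P(y \mid x;\, \theta_{x \to y}) \;+\; (1-\alpha)\,\bigl[\log P(x \mid y;\, \theta_{y \to x}) + \log P(y) - \log P(x)\bigr],
$$

where candidates produced by the primal model are re-ranked by the combined score, with $\alpha$ tuned on validation data and the marginals estimated by pretrained priors.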

ICML Conference 2017 Conference Paper

Dual Supervised Learning

  • Yingce Xia
  • Tao Qin 0001
  • Wei Chen 0034
  • Jiang Bian 0002
  • Nenghai Yu
  • Tie-Yan Liu

Many supervised learning tasks emerge in dual forms, e.g., English-to-French translation vs. French-to-English translation, speech recognition vs. text-to-speech, and image classification vs. image generation. Two dual tasks have intrinsic connections with each other due to the probabilistic correlation between their models. This connection is, however, not effectively utilized today, since people usually train the models of two dual tasks separately and independently. In this work, we propose training the models of two dual tasks simultaneously, and explicitly exploiting the probabilistic correlation between them to regularize the training process. For ease of reference, we call the proposed approach dual supervised learning. We demonstrate that dual supervised learning can improve the practical performance of both tasks, for various applications including machine translation, image processing, and sentiment analysis.
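
The probabilistic correlation the abstract refers to is the identity $P(x)P(y \mid x) = P(y)P(x \mid y)$. A sketch of the corresponding training regularizer, with empirical marginals $\hat{P}$ supplied by pretrained marginal models (notation mine):

$$
\mathcal{L}_{\text{dual}} \;=\; \bigl(\log \hat{P}(x) + \log P(y \mid x;\, \theta_{x \to y}) - \log \hat{P}(y) - \log P(x \mid y;\, \theta_{y \to x})\bigr)^{2},
$$

added to the likelihood losses of both models so that the two directions stay probabilistically consistent during training.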

AAAI Conference 2017 Conference Paper

Infinitely Many-Armed Bandits with Budget Constraints

  • Haifang Li
  • Yingce Xia

We study the infinitely many-armed bandit problem with budget constraints, where the number of arms can be infinite and much larger than the number of possible experiments. The player aims at maximizing his/her total expected reward under a budget constraint B for the cost of pulling arms. We introduce a weak stochastic assumption on the ratio of expected reward to expected cost of a newly pulled arm, which characterizes its probability of being a near-optimal arm. We propose an algorithm named RCB-I for this new problem, in which the player first randomly picks K arms, whose order is sub-linear in terms of B, and then runs the algorithm for the finite-arm setting on the selected arms. Theoretical analysis shows that this simple algorithm enjoys sub-linear regret in terms of the budget B. We also provide a lower bound for any algorithm under the Bernoulli setting. The regret bound of RCB-I matches the lower bound up to a logarithmic factor. We further extend this algorithm to the any-budget setting (i.e., the budget is unknown in advance) and conduct the corresponding theoretical analysis.

AAMAS Conference 2016 Conference Paper

Best Action Selection in a Stochastic Environment

  • Yingce Xia
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu

We study the problem of selecting the best action from multiple candidates in a stochastic environment. In such a stochastic setting, when taking an action, a player receives a random reward and incurs a random cost, which are drawn from two unknown distributions. We aim to select the best action, the one with the maximum ratio of expected reward to expected cost, after exploring the actions for n rounds. In particular, we study three mechanisms: (i) the uniform exploration mechanism $M_U$; (ii) the successive elimination mechanism $M_{SE}$; and (iii) the ratio confidence bound exploration mechanism $M_{RCB}$. We prove that for all three mechanisms, the probabilities that the best action is not selected (i.e., the error probabilities) can be upper bounded by $O(\exp\{-cn\})$, where c is a constant related to the mechanisms and to coefficients of the actions. We then give an asymptotic lower bound on the error probabilities of consistent mechanisms for the Bernoulli setting, and discuss its relationship with the upper bounds in different aspects. Our proposed mechanisms degenerate to cover the cases where only the rewards/costs are random. We also test the proposed mechanisms through numerical experiments.

IJCAI Conference 2016 Conference Paper

Budgeted Multi-Armed Bandits with Multiple Plays

  • Yingce Xia
  • Tao Qin
  • Weidong Ma
  • Nenghai Yu
  • Tie-Yan Liu

We study the multi-play budgeted multi-armed bandit (MP-BMAB) problem, in which pulling an arm receives both a random reward and a random cost, and a player pulls L (≥ 1) arms at each round. The player aims to maximize her total expected reward under a budget constraint B on the pulling costs. We present a multiple ratio confidence bound policy: at each round, we first calculate a truncated upper (lower) confidence bound for the expected reward (cost) of each arm, and then pull the L arms with the maximum ratio of the sum of the upper confidence bounds of rewards to the sum of the lower confidence bounds of costs. We design a 0-1 integer linear fractional programming oracle that can pick these L arms in polynomial time. We prove that the regret of our policy is sublinear in general and is log-linear for certain parameter settings. We further consider two special cases of MP-BMABs: (1) we derive a lower bound for any consistent policy for MP-BMABs with Bernoulli reward and cost distributions; (2) we show that the proposed policy can also solve the conventional budgeted MAB problem (a special case of MP-BMABs with L = 1) and provides better theoretical results than existing UCB-based pulling policies.

NeurIPS Conference 2016 Conference Paper

Dual Learning for Machine Translation

  • Di He
  • Yingce Xia
  • Tao Qin
  • Liwei Wang
  • Nenghai Yu
  • Tie-Yan Liu
  • Wei-Ying Ma

While neural machine translation (NMT) has made good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop, and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and the other agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using the policy gradient methods). We call the corresponding approach to neural machine translation \emph{dual-NMT}. Experiments show that dual-NMT works very well on English$\leftrightarrow$French translation; especially, by learning from monolingual data (with 10% bilingual data for warm start), it achieves a comparable accuracy to NMT trained from the full bilingual data for the French-to-English translation task.
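
A sketch of the per-sentence feedback signal in that closed loop; the scorer callables and the weight value are illustrative assumptions, not quoted from the paper:

```python
def dual_learning_reward(x, y_mid, lm_logprob, dual_logprob, alpha=0.01):
    """Sketch of the dual-learning feedback: a translation y_mid of source x
    earns (i) a language-model reward for fluency in the target language and
    (ii) a reconstruction reward from the dual model translating back to x.
    lm_logprob and dual_logprob are assumed callables; alpha is an
    illustrative weight."""
    r_lm = lm_logprob(y_mid)         # fluency of the intermediate translation
    r_rec = dual_logprob(x, y_mid)   # log P(x | y_mid) under the dual model
    return alpha * r_lm + (1.0 - alpha) * r_rec
```

This scalar reward is what the policy-gradient updates mentioned in the abstract would propagate back through both translation models.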

IJCAI Conference 2015 Conference Paper

Efficient Algorithms with Performance Guarantees for the Stochastic Multiple-Choice Knapsack Problem

  • Long Tran-Thanh
  • Yingce Xia
  • Tao Qin
  • Nicholas R Jennings

We study the stochastic multiple-choice knapsack problem, where a set of K items, whose values and weights are random variables, arrives at the system at each time step, and a decision maker has to choose at most one item to put into the knapsack without exceeding its capacity. The goal of the decision-maker is to maximise the total expected value of the chosen items with respect to the knapsack capacity and a finite time horizon. We provide the first comprehensive theoretical analysis of the problem. In particular, we propose OPT-S-MCKP, the first algorithm that achieves optimality when the value-weight distributions are known. This algorithm also enjoys $\tilde{O}(\sqrt{T})$ performance loss, where T is the finite time horizon, in the unknown value-weight distributions scenario. We also further develop two novel approximation methods, FR-S-MCKP and G-S-MCKP, and we prove that FR-S-MCKP achieves $\tilde{O}(\sqrt{T})$ performance loss in both known and unknown value-weight distributions cases, while enjoying polynomial computational complexity per time step. On the other hand, G-S-MCKP does not have theoretical guarantees, but it still provides good performance in practice with linear running time.

IJCAI Conference 2015 Conference Paper

Thompson Sampling for Budgeted Multi-Armed Bandits

  • Yingce Xia
  • Haifang Li
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by a budget. We start with the case of Bernoulli bandits, in which the random rewards (costs) of an arm are independently sampled from a Bernoulli distribution. To implement the Thompson sampling algorithm in this case, at each round we sample two numbers from the posterior distributions of the reward and cost for each arm, obtain their ratio, select the arm with the maximum ratio, and then update the posterior distributions. We prove that the distribution-dependent regret bound of this algorithm is O(ln B), where B denotes the budget. By introducing a Bernoulli trial, we further extend this algorithm to the setting where the rewards (costs) are drawn from general distributions, and prove that its regret bound remains almost the same. Our simulation results demonstrate the effectiveness of the proposed algorithm.
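
For the Bernoulli case the algorithm fits in a few lines; a numpy sketch, where the environment callable `pull` and the stopping rule are illustrative (it assumes costs are 1 with positive probability so the budget is eventually consumed):

```python
import numpy as np

def budgeted_thompson_sampling(pull, n_arms, budget, seed=0):
    """Sketch of Thompson sampling for budgeted Bernoulli bandits: keep Beta
    posteriors for each arm's reward and cost, sample from both, pull the arm
    with the largest sampled reward/cost ratio, and stop when the budget runs
    out. `pull(arm) -> (reward, cost)` with values in {0, 1} is assumed."""
    rng = np.random.default_rng(seed)
    r_a, r_b = np.ones(n_arms), np.ones(n_arms)  # Beta params for rewards
    c_a, c_b = np.ones(n_arms), np.ones(n_arms)  # Beta params for costs
    total_reward, spent = 0.0, 0.0
    while spent < budget:
        theta_r = rng.beta(r_a, r_b)             # sampled expected rewards
        theta_c = rng.beta(c_a, c_b)             # sampled expected costs
        arm = int(np.argmax(theta_r / np.maximum(theta_c, 1e-9)))
        reward, cost = pull(arm)
        r_a[arm] += reward; r_b[arm] += 1 - reward  # posterior updates
        c_a[arm] += cost;   c_b[arm] += 1 - cost
        total_reward += reward
        spent += cost
    return total_reward
```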

AAAI Conference 2014 Conference Paper

Incentivizing High-Quality Content from Heterogeneous Users: On the Existence of Nash Equilibrium

  • Yingce Xia
  • Tao Qin
  • Nenghai Yu
  • Tie-Yan Liu

We study the existence of pure Nash equilibrium (PNE) for the mechanisms used in Internet services (e.g., online reviews and question-answering websites) to incentivize users to generate high-quality content. Most existing work assumes that users are homogeneous and have the same ability. However, real-world users are heterogeneous and their abilities can be very different from each other due to their diversity in background, culture, and profession. In this work, we consider the following setting: (1) the users are heterogeneous and each of them has a private type indicating the best quality of the content he/she can generate; (2) all the users share a fixed total reward. With this setting, we study the existence of pure Nash equilibria for several mechanisms composed of different allocation rules, action spaces, and information availability. We prove the existence of PNE for some mechanisms and the non-existence for others. We also discuss how to find a PNE (if one exists) through either a constructive way or a search algorithm.