Author name cluster

Jinsong Su

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

44 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Augmenting Intra-Modal Understanding in MLLMs for Robust Multimodal Keyphrase Generation

  • Jiajun Cao
  • Qinggang Zhang
  • Yunbo Tang
  • Zhishang Xiang
  • Chang Yang
  • Jinsong Su

Multimodal keyphrase generation (MKP) aims to extract a concise set of keyphrases that capture the essential meaning of paired image–text inputs, enabling structured understanding, indexing, and retrieval of multimedia data across the web and social platforms. Success in this task demands effectively bridging the semantic gap between heterogeneous modalities. While multimodal large language models (MLLMs) achieve superior cross-modal understanding by leveraging massive pretraining on image-text corpora, we observe that they often struggle with modality bias and fine-grained intra-modal feature extraction. This oversight leads to a lack of robustness in real-world scenarios where multimedia data is noisy, with incomplete or misaligned modalities. To address this problem, we propose AimKP, a novel framework that explicitly reinforces intra-modal semantic learning in MLLMs while preserving cross-modal alignment. AimKP incorporates two core innovations: (i) Progressive Modality Masking, which forces fine-grained feature extraction from corrupted inputs by progressively masking modality information during training; (ii) Gradient-based Filtering, which identifies and discards noisy samples, preventing them from corrupting the model’s core cross-modal learning. Extensive experiments validate AimKP’s effectiveness in multimodal keyphrase generation and its robustness across different scenarios.

AAAI Conference 2026 Conference Paper

Beyond Passive Critical Thinking: Fostering Proactive Questioning to Enhance Human-AI Collaboration

  • Ante Wang
  • Yujie Lin
  • Jingyao Liu
  • Suhang Wu
  • Hao Liu
  • Xinyan Xiao
  • Jinsong Su

Critical thinking is essential for building robust AI systems, preventing them from blindly accepting flawed data or biased reasoning. However, prior work has primarily focused on passive critical thinking, where models simply reject problematic queries without taking constructive steps to address user requests. In this work, we introduce proactive critical thinking, a paradigm where models actively seek missing or clarifying information from users to resolve their queries better. To evaluate this capability, we present GSM-MC and GSM-MCE, two novel benchmarks based on GSM8K for assessing mathematical reasoning under incomplete or misleading conditions. Experiments on Qwen3 and Llama series models show that, while these models excel in traditional reasoning tasks, they struggle with proactive critical thinking, especially smaller ones. However, we demonstrate that reinforcement learning (RL) can significantly improve this ability. By incorporating heuristic information into the reward function, we achieve substantial gains, boosting the Qwen3-1.7B's accuracy from 0.15% to 73.98% on GSM-MC. We hope this work advances models that collaborate more effectively with users in problem-solving through proactive critical thinking.

AAAI Conference 2026 Conference Paper

PLaST: Towards Paralinguistic-aware Speech Translation

  • Yi Li
  • Rui Zhao
  • Ruiquan Zhang
  • Jinsong Su
  • Daimeng Wei
  • Min Zhang
  • Yidong Chen

Speech translation (ST) aims to translate speech from a source language into text in the target language. Naturally, speech signals contain paralinguistic cues beyond linguistic content, which could influence or even alter the interpretation of a lexically identical sentence, thereby yielding distinct translations. However, existing ST models lack direct and sufficient modeling of paralinguistic information, which limits their ability to perceive paralinguistic cues and understand speech comprehensively, leading to degraded translation performance. In response, we propose Paralinguistic-aware Speech Translation (PLaST), a novel dual-branch framework which directly leverages paralinguistic cues beyond the linguistic content. Specifically, PLaST employs a speech encoder and a style extractor to independently generate linguistic and paralinguistic representations, respectively. To obtain a purified linguistic representation aligned with the text representation, a hierarchical Optimal Transport (OT) is applied on the layer-wise outputs from an LLM decoder. Then, the paralinguistic information is retrieved and refined with an Attention-based Retrieval (AR) module, with the linguistic representation serving as queries to enable joint guidance for semantic understanding and translation generation. PLaST outperforms the strong baseline with an average of 5.0 directional and 4.5 global contrastive likelihood scores on the paralinguistic-sensitive benchmark ContraProST, demonstrating its superior capability in paralinguistic perception. Further experiments on the standard speech translation benchmark CoVoST-2 show that PLaST generalizes well to typical ST scenarios.

AIJ Journal 2025 Journal Article

A simple yet effective self-debiasing framework for transformer models

  • Xiaoyue Wang
  • Xin Liu
  • Lijie Wang
  • Suhang Wu
  • Jinsong Su
  • Hua Wu

Current Transformer-based natural language understanding (NLU) models heavily rely on dataset biases, while failing to handle real-world out-of-distribution (OOD) instances. Many methods have been proposed to deal with this issue, but they ignore the fact that the features learned in different layers of Transformer-based NLU models are different. In this paper, we first conduct preliminary studies to obtain two conclusions: 1) both low- and high-layer sentence representations encode common biased features during training; 2) the low-layer sentence representations encode fewer unbiased features than the high-layer ones. Based on these conclusions, we propose a simple yet effective self-debiasing framework for Transformer-based NLU models. Concretely, we first stack a classifier on a selected low layer. Then, we introduce a residual connection that feeds the low-layer sentence representation to the top-layer classifier. In this way, the top-layer sentence representation will be trained to ignore the common biased features encoded by the low-layer sentence representation and focus on task-relevant unbiased features. During inference, we remove the residual connection and directly use the top-layer sentence representation to make predictions. Extensive experiments and in-depth analyses on NLU tasks show that our framework performs better than several competitive baselines, achieving a new SOTA on all OOD test sets.
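
The train/inference asymmetry described in this abstract can be illustrated with a minimal sketch. It assumes the residual connection is an element-wise sum of the low-layer and top-layer sentence vectors, which the abstract does not specify; the function names, toy vectors, and weights are hypothetical and are not taken from the paper's code.

```python
def linear(weights, bias, x):
    """Apply a dense layer: one output score per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def top_classifier_logits(h_low, h_top, weights, bias, training):
    """Score a sentence with the top-layer classifier.

    Assumption: during training the low-layer representation is added to the
    top-layer one (residual connection); at inference only h_top is used.
    """
    if training:
        features = [low + top for low, top in zip(h_low, h_top)]
    else:
        features = list(h_top)
    return linear(weights, bias, features)


# Toy, hand-picked values for illustration only.
h_low = [1.0, 0.0]   # hypothetical low-layer sentence representation
h_top = [0.0, 2.0]   # hypothetical top-layer sentence representation
weights = [[1.0, 1.0], [1.0, -1.0]]
bias = [0.0, 0.0]

train_logits = top_classifier_logits(h_low, h_top, weights, bias, training=True)
infer_logits = top_classifier_logits(h_low, h_top, weights, bias, training=False)
print(train_logits, infer_logits)
```

The same classifier weights are used in both modes; only the input features change, which is what lets the residual path be dropped at inference.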

IJCAI Conference 2025 Conference Paper

Boosting Visual Knowledge-Intensive Training for LVLMs Through Causality-Driven Visual Object Completion

  • Qingguo Hu
  • Ante Wang
  • Jia Song
  • Delai Qiu
  • Qingsong Liu
  • Jinsong Su

Large Vision-Language Models (LVLMs) have experienced significant advancements in recent years. However, their performance still falls short in tasks requiring deep visual perception, such as identifying subtle differences between images. A potential cause is the scarcity of visual knowledge in popular instruction-tuning corpora, resulting in inadequate visual perception and reasoning capabilities. To address this challenge, we introduce a self-improvement framework grounded in a novel visual knowledge-intensive task, Causality-driven Visual object Completion (CVC). This task requires LVLMs to infer the masked object in an image based on its causal relationships with the other visible information. We first obtain rich examples cheaply through our automated instance construction pipeline, without relying on sophisticated LVLMs (e.g., GPT-4V) or human assistance. Then, LVLMs effectively self-improve through trial and error learning using these created instances. Our experiments demonstrate substantial gains across four challenging specialized tasks and four widely-used comprehensive benchmarks. Especially on specialized tasks, our method achieves an average improvement of 5.4% and 4.0% compared to the corresponding baselines when utilizing LLaVA-1.5-7B and LLaVA-1.5-13B, respectively. Code and the supplementary file are available at https://github.com/XMUDeepLIT/CVC.

ICML Conference 2025 Conference Paper

EpiCoder: Encompassing Diversity and Complexity in Code Generation

  • Yaoxiang Wang
  • Haoling Li
  • Xin Zhang 0099
  • Jie Wu 0001
  • Xiao Liu 0029
  • Wenxiang Hu
  • Zhongxin Guo
  • Yangyu Huang

Existing methods for code generation use code snippets as seed data, restricting the complexity and diversity of the synthesized data. In this paper, we introduce a novel feature tree-based synthesis framework, which revolves around hierarchical code features derived from high-level abstractions of code. The feature tree is constructed from raw data and refined iteratively to increase the quantity and diversity of the extracted features, which captures and recognizes more complex patterns and relationships within the code. By adjusting the depth and breadth of the sampled subtrees, our framework provides precise control over the complexity of the generated code, enabling functionalities that range from function-level operations to multi-file scenarios. We fine-tuned widely-used base models to obtain EpiCoder series, achieving state-of-the-art performance on multiple benchmarks at both the function and file levels. In particular, empirical evidence indicates that our approach shows significant potential in the synthesizing of repository-level code data. Our code and data are publicly available.

AAAI Conference 2025 Conference Paper

LiteSearch: Efficient Tree Search with Dynamic Exploration Budget for Math Reasoning

  • Ante Wang
  • Linfeng Song
  • Ye Tian
  • Baolin Peng
  • Dian Yu
  • Haitao Mi
  • Jinsong Su
  • Dong Yu

Recent research suggests that tree search algorithms (e.g., Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources of greedy decoding due to wasteful search strategies, making them difficult to deploy in practical applications. This study introduces a novel guided tree search algorithm with a goal-directed heuristic function and node-level exploration budget (maximum number of children) calculation to tackle this issue. By considering the search progress towards the final answer (history) and the guidance from a value network (future) trained without any step-wise annotations, our algorithm iteratively selects the most promising tree node before expanding it within the boundaries of the allocated computational budget. Experiments conducted on the GSM8K, TabMWP, and MATH datasets demonstrate that our method not only offers competitive performance but also enjoys significantly lower computational costs compared to baseline methods.

AAAI Conference 2024 Conference Paper

Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

  • Rui Zhao
  • Liang Zhang
  • Biao Fu
  • Cong Hu
  • Jinsong Su
  • Yidong Chen

Sign language translation (SLT) aims to convert continuous sign language videos into textual sentences. As a typical multi-modal task, there exists an inherent modality gap between sign language videos and spoken language text, which makes the cross-modal alignment between visual and textual modalities crucial. However, previous studies tend to rely on an intermediate sign gloss representation to help alleviate the cross-modal problem, thereby neglecting the alignment across modalities, which may lead to compromised results. To address this issue, we propose a novel framework based on Conditional Variational autoencoder for SLT (CV-SLT) that facilitates direct and sufficient cross-modal alignment between sign language videos and spoken language text. Specifically, our CV-SLT consists of two paths with two Kullback-Leibler (KL) divergences to regularize the outputs of the encoder and decoder, respectively. In the prior path, the model solely relies on visual information to predict the target text; whereas in the posterior path, it simultaneously encodes visual information and textual knowledge to reconstruct the target text. The first KL divergence optimizes the conditional variational autoencoder and regularizes the encoder outputs, while the second KL divergence performs a self-distillation from the posterior path to the prior path, ensuring the consistency of decoder outputs. We further enhance the integration of textual information to the posterior path by employing a shared Attention Residual Gaussian Distribution (ARGD), which considers the textual information in the posterior path as a residual component relative to the prior path. Extensive experiments conducted on public datasets demonstrate the effectiveness of our framework, achieving new state-of-the-art results while significantly alleviating the cross-modal representation discrepancy. The code and models are available at https://github.com/rzhao-zhsq/CV-SLT.

ECAI Conference 2024 Conference Paper

Lightweight Transformer for sEMG Gesture Recognition with Feature Distilled Variational Information Bottleneck

  • Zefeng Wang
  • Bingbing Hu
  • Junfeng Yao
  • Jinsong Su

Gesture recognition based on surface electromyography (sEMG) has seen considerable improvements in performance across various tasks and metrics with the rapid development of deep learning. However, challenges still exist in current deep neural networks for sEMG recognition. For instance, convolutional neural networks exhibit poor capturing of global features, recurrent neural networks have limited parallel processing capabilities, and their hybrids are usually more complex. Additionally, recent networks based on Transformers rarely consider the locality of attention and noise resistance. To fully explore the essence of sEMG sequences, and to make the model more lightweight and robust while ensuring feature learning performance, in this paper, we propose the feature distilled variational information bottleneck (FDVIB). Specifically, this method leverages knowledge distillation to learn from a high-precision teacher model at levels of feature and prediction, significantly reducing parameters and computations, and simplifying the structure. It also uses VIB to enhance the model’s robustness. We construct a Transformer model using the proposed method and conduct a series of evaluations. Experimental results show that our classification accuracy is competitive with state-of-the-art and also demonstrate the effectiveness of our method in enhancing model lightness and robustness.

ECAI Conference 2024 Conference Paper

On the Cultural Gap in Text-to-Image Generation

  • Bingshuai Liu
  • Longyue Wang
  • Chenyang Lyu
  • Yong Zhang 0034
  • Jinsong Su
  • Shuming Shi 0001
  • Zhaopeng Tu

One challenge in text-to-image (T2I) generation is the inadvertent reflection of culture gaps present in the training data, which signifies the disparity in generated image quality when the cultural elements of the input text are rarely collected in the training set. Although various T2I models have shown impressive but arbitrary examples, there is no benchmark to systematically evaluate a T2I model’s ability to generate cross-cultural images. To bridge the gap, we propose a Challenging Cross-Cultural (C3) benchmark with comprehensive evaluation criteria, which can assess how well-suited a model is to a target culture. By analyzing the flawed images generated by the Stable Diffusion model on the C3 benchmark, we find that the model often fails to generate certain cultural objects. Accordingly, we propose a novel multi-modal metric that considers object-text alignment to filter the fine-tuning data in the target culture, which is used to fine-tune a T2I model to improve cross-cultural generation. Experimental results show that our multi-modal metric provides stronger data selection performance on the C3 benchmark than existing metrics, in which the object-text alignment is crucial. We release the benchmark, data, code, and generated images to facilitate future research on culturally diverse T2I generation.

AAAI Conference 2024 Conference Paper

Response Enhanced Semi-supervised Dialogue Query Generation

  • Jianheng Huang
  • Ante Wang
  • Linfeng Gao
  • Linfeng Song
  • Jinsong Su

Leveraging vast and continually updated knowledge from the Internet has been considered an important ability for a dialogue system. Therefore, the dialogue query generation task is proposed for generating search queries from dialogue histories, which will be submitted to a search engine for retrieving relevant websites on the Internet. In this regard, previous efforts were devoted to collecting conversations with annotated queries and training a query producer (QP) via standard supervised learning. However, these studies still face the challenges of data scarcity and domain adaptation. To address these issues, in this paper, we propose a semi-supervised learning framework, SemiDQG, to improve model performance with unlabeled conversations. Based on the observation that the search query is typically related to the topic of dialogue response, we train a response-augmented query producer (RA) to provide rich and effective training signals for QP. We first apply a similarity-based query selection strategy to select high-quality RA-generated pseudo queries, which are used to construct pseudo instances for training QP and RA. Then, we adopt the REINFORCE algorithm to further enhance QP, with RA-provided rewards as fine-grained training signals. Experimental results and in-depth analysis of three benchmarks show the effectiveness of our framework in cross-domain and low-resource scenarios. Particularly, SemiDQG significantly surpasses ChatGPT and competitive baselines. Our code is available at https://github.com/DeepLearnXMU/SemiDQG.

EAAI Journal 2023 Journal Article

A Feedback-Enhanced Two-Stage Framework for judicial machine reading comprehension

  • Zhiqiang Lin
  • Fan Yang
  • Xuyang Wu
  • Jinsong Su
  • Xiaoyue Wang

Machine Reading Comprehension (MRC) is the task of teaching machines to understand and answer questions based on a given text. In the judicial domain, MRC has garnered attention as a novel strategy for addressing a variety of legal issues. Judicial MRC generally requires interpretability, meaning that the model should not only answer the question correctly but also provide supporting evidence sentences. A straightforward approach is to handle the two tasks separately, which ignores the relation between the two tasks and leads to the loss of annotated information. To make better use of the limited judicial MRC annotation data, we simulate the strategies used by humans in solving MRC problems and propose the Feedback-Enhanced Two-Stage Framework for Machine Reading Comprehension (FETSF-MRC). This framework consists of two cascaded modules: (a) a scanning module that identifies evidence sentences and (b) a detailed reading module that focuses on the evidence to identify candidate answers and provides feedback to the scanning module. Experiments on two Chinese judicial reading comprehension datasets (CJRC, CAIL2020) and an open-domain English dataset HotpotQA show that FETSF-MRC achieves superior performance compared to several baselines. FETSF-MRC outperforms the best baseline by 1.12% and 1.49% in joint F1 score on the CJRC and CAIL2020 datasets, respectively, and achieves 73.74% joint F1 score on the HotpotQA.

AAAI Conference 2023 Conference Paper

Code-Aware Cross-Program Transfer Hyperparameter Optimization

  • Zijia Wang
  • Xiangyu He
  • Kehan Chen
  • Chen Lin
  • Jinsong Su

Hyperparameter tuning is an essential task in automatic machine learning and big data management. To accelerate tuning, many recent studies focus on augmenting Bayesian optimization (BO), the primary hyperparameter tuning strategy, by transferring information from other tuning tasks. However, existing studies ignore program similarities in their transfer mechanism and are thus sub-optimal in cross-program transfer when tuning tasks involve different programs. This paper proposes CaTHPO, a code-aware cross-program transfer hyperparameter optimization framework, which makes three improvements. (1) It learns code-aware program representation in a self-supervised manner to give an off-the-shelf estimate of program similarities. (2) It adjusts the surrogate and acquisition function (AF) in BO based on program similarities, so that the hyperparameter search is guided by accumulated information across similar programs. (3) It presents a safe controller to dynamically prune undesirable sample points based on tuning experiences of similar programs. Extensive experiments on tuning various recommendation models and Spark applications have demonstrated that CaTHPO can steadily obtain better and more robust hyperparameter performances within fewer samples than state-of-the-art competitors.

IJCAI Conference 2023 Conference Paper

Exploring Effective Inter-Encoder Semantic Interaction for Document-Level Relation Extraction

  • Liang Zhang
  • Zijun Min
  • Jinsong Su
  • Pei Yu
  • Ante Wang
  • Yidong Chen

In document-level relation extraction (RE), the models are required to correctly predict implicit relations in documents via relational reasoning. To this end, many graph-based methods have been proposed for this task. Despite their success, these methods still suffer from several drawbacks: 1) their interaction between document encoder and graph encoder is usually unidirectional and insufficient; 2) their graph encoders often fail to capture the global context of nodes in document graph. In this paper, we propose a document-level RE model with a Graph-Transformer Network (GTN). The GTN includes two core sublayers: 1) the graph-attention sublayer that simultaneously models global and local contexts of nodes in the document graph; 2) the cross-attention sublayer, enabling GTN to capture the non-entity clue information from the document encoder. Furthermore, we introduce two auxiliary training tasks to enhance the bidirectional semantic interaction between the document encoder and GTN: 1) the graph node reconstruction that can effectively train our cross-attention sublayer to enhance the semantic transition from the document encoder to GTN; 2) the structure-aware adversarial knowledge distillation, by which we can effectively transfer the structural information of GTN to the document encoder. Experimental results on four benchmark datasets prove the effectiveness of our model. Our source code is available at https://github.com/DeepLearnXMU/DocRE-BSI.

AAAI Conference 2023 Conference Paper

Exploring Self-Distillation Based Relational Reasoning Training for Document-Level Relation Extraction

  • Liang Zhang
  • Jinsong Su
  • Zijun Min
  • Zhongjian Miao
  • Qingguo Hu
  • Biao Fu
  • Xiaodong Shi
  • Yidong Chen

Document-level relation extraction (RE) aims to extract relational triples from a document. One of its primary challenges is to predict implicit relations between entities, which are not explicitly expressed in the document but can usually be extracted through relational reasoning. Previous methods mainly implicitly model relational reasoning through the interaction among entities or entity pairs. However, they suffer from two deficiencies: 1) they often consider only one reasoning pattern, of which coverage on relational triples is limited; 2) they do not explicitly model the process of relational reasoning. In this paper, to deal with the first problem, we propose a document-level RE model with a reasoning module that contains a core unit, the reasoning multi-head self-attention unit. This unit is a variant of the conventional multi-head self-attention and utilizes four attention heads to model four common reasoning patterns, respectively, which can cover more relational triples than previous methods. Then, to address the second issue, we propose a self-distillation training framework, which contains two branches sharing parameters. In the first branch, we first randomly mask some entity pair feature vectors in the document, and then train our reasoning module to infer their relations by exploiting the feature information of other related entity pairs. By doing so, we can explicitly model the process of relational reasoning. However, because the additional masking operation is not used during testing, it causes an input gap between training and testing scenarios, which would hurt the model performance. To reduce this gap, we perform conventional supervised training without masking operation in the second branch and utilize Kullback-Leibler divergence loss to minimize the difference between the predictions of the two branches. 
Finally, we conduct comprehensive experiments on three benchmark datasets, of which experimental results demonstrate that our model consistently outperforms all competitive baselines. Our source code is available at https://github.com/DeepLearnXMU/DocRE-SD
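
The consistency term between the masked and unmasked branches can be sketched as a Kullback-Leibler divergence between two predicted relation distributions. This is only an illustration: the direction of the divergence, the toy logits, and the helper names are assumptions, not details taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical relation logits for one entity pair from the two branches.
masked_branch = softmax([2.0, 0.5, -1.0])     # branch trained with masking
unmasked_branch = softmax([1.5, 0.7, -0.8])   # conventional supervised branch

# Assumed direction: pull the masked branch towards the unmasked one.
consistency_loss = kl_divergence(unmasked_branch, masked_branch)
print(consistency_loss)
```

The loss is zero only when both branches predict the same distribution, so minimizing it narrows the gap between training with masking and testing without it.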

JAIR Journal 2023 Journal Article

FactGen: Faithful Text Generation by Factuality-aware Pre-training and Contrastive Ranking Fine-tuning

  • ZhiBin Lan
  • Wei Li
  • Jinsong Su
  • Xinyan Xiao
  • Jiachen Liu
  • Wenhao Wu
  • Yajuan Lyu

Conditional text generation is supposed to generate a fluent and coherent target text that is faithful to the source text. Although pre-trained models have achieved promising results, they still suffer from the crucial factuality problem. To deal with this issue, we propose a factuality-aware pretraining-finetuning framework named FactGen, which fully considers factuality during two training stages. Specifically, at the pre-training stage, we utilize a natural language inference model to construct target texts that are entailed by the source texts, resulting in a more factually consistent pre-training objective. Then, during the fine-tuning stage, we further introduce a contrastive ranking loss to encourage the model to generate factually consistent text with higher probability. Extensive experiments on three conditional text generation tasks demonstrate the effectiveness and generality of our training framework.

AAAI Conference 2023 Conference Paper

LagNet: Deep Lagrangian Mechanics for Plug-and-Play Molecular Representation Learning

  • Chunyan Li
  • Junfeng Yao
  • Jinsong Su
  • Zhaoyang Liu
  • Xiangxiang Zeng
  • Chenxi Huang

Molecular representation learning is a fundamental problem in the field of drug discovery and molecular science. Incorporating molecular 3D information into molecular representations seems beneficial, and this is related to computational chemistry, whose basic task is predicting stable 3D structures (conformations) of molecules. Existing machine learning methods either rely on 1D and 2D molecular properties or simulate a molecular force field to use additional 3D structure information via a Hamiltonian network. The former has the disadvantage of ignoring important 3D structure features, while the latter has the disadvantage that existing Hamiltonian neural networks must satisfy the “canonical” constraint, which is difficult to satisfy in many cases. In this paper, we propose a novel plug-and-play architecture, LagNet, which simulates a molecular force field using only parameterized position coordinates and implements Lagrangian mechanics to learn molecular representations by preserving 3D conformation without obeying any additional restrictions. LagNet is designed to generate known conformations and generalize to unknown ones from molecular SMILES. Implicit positions in LagNet are learned iteratively using discrete-time Lagrangian equations. Experimental results show that LagNet learns 3D molecular structure features well and outperforms previous state-of-the-art molecular representation baselines by a significant margin.

AIJ Journal 2023 Journal Article

Multi-modal graph contrastive encoding for neural machine translation

  • Yongjing Yin
  • Jiali Zeng
  • Jinsong Su
  • Chulun Zhou
  • Fandong Meng
  • Jie Zhou
  • Degen Huang
  • Jiebo Luo

As an important extension of conventional text-only neural machine translation (NMT), multi-modal neural machine translation (MNMT) aims to translate input source sentences paired with images into the target language. Although a lot of MNMT models have been proposed to perform multi-modal semantic fusion, they do not consider fine-grained semantic correspondences between semantic units of different modalities (i.e., words and visual objects), which can be exploited to refine multi-modal representation learning via fine-grained semantic interactions. To address this issue, we propose a graph-based multi-modal fusion encoder for NMT. Concretely, we first employ a unified multi-modal graph to represent the input sentence and image, in which the multi-modal semantic units are considered as the nodes in the graph, connected by two kinds of edges with different semantic relationships. Then, we stack multiple graph-based multi-modal fusion layers that iteratively conduct intra- and inter-modal interactions to learn node representations. Finally, via an attention mechanism, we induce a multi-modal context from the top node representations for the decoder. Particularly, we introduce a progressive contrastive learning strategy based on the multi-modal graph to refine the training of our proposed model, where hard negative samples are introduced gradually. To evaluate our model, we conduct experiments on commonly-used datasets. Experimental results and analysis show that our MNMT model obtains significant improvements over competitive baselines, achieving state-of-the-art performance on the Multi30K dataset.

AIJ Journal 2023 Journal Article

Search-engine-augmented dialogue response generation with cheaply supervised query production

  • Ante Wang
  • Linfeng Song
  • Qi Liu
  • Haitao Mi
  • Longyue Wang
  • Zhaopeng Tu
  • Jinsong Su
  • Dong Yu

Knowledge-aided dialogue response generation aims at augmenting chatbots with relevant external knowledge in the hope of generating more informative responses. The majority of previous work assumes that the relevant knowledge is given as input or retrieved from a static pool of knowledge. However, this assumption violates the real-world situation, where knowledge is continually updated and a chatbot has to dynamically retrieve useful knowledge. We propose a dialogue model that can access the vast and dynamic information from any search engine for response generation. As the core module, a query producer is used to generate queries from a dialogue context to interact with a search engine. We design a training algorithm using cheap noisy supervision for the query producer, where the signals are obtained by comparing retrieved articles with the next dialogue response. As a result, the query producer is adjusted without any human annotation of gold queries, making it easily transferable to other domains and search engines. Experiments show that our query producer can achieve R@1 and R@5 rates of 62.4% and 74.8% for retrieving gold knowledge, and the overall model generates better responses over strong knowledge-aided baselines using BART [1] and other typical systems.

AAAI Conference 2022 Conference Paper

A Label Dependence-Aware Sequence Generation Model for Multi-Level Implicit Discourse Relation Recognition

  • Changxing Wu
  • Liuwen Cao
  • Yubin Ge
  • Yang Liu
  • Min Zhang
  • Jinsong Su

Implicit discourse relation recognition (IDRR) is a challenging but crucial task in discourse analysis. Most existing methods train multiple models to predict multi-level labels independently, while ignoring the dependence between hierarchically structured labels. In this paper, we consider multi-level IDRR as a conditional label sequence generation task and propose a Label Dependence-aware Sequence Generation Model (LDSGM) for it. Specifically, we first design a label attentive encoder to learn the global representation of an input instance and its level-specific contexts, where the label dependence is integrated to obtain better label embeddings. Then, we employ a label sequence decoder to output the predicted labels in a top-down manner, where the predicted higher-level labels are directly used to guide the label prediction at the current level. We further develop a mutual learning enhanced training method to exploit the label dependence in a bottom-up direction, which is captured by an auxiliary decoder introduced during training. Experimental results on the PDTB dataset show that our model achieves the state-of-the-art performance on multi-level IDRR. We release our code at https://github.com/nlpersECJTU/LDSGM.

JAIR Journal 2022 Journal Article

AAN+: Generalized Average Attention Network for Accelerating Neural Transformer

  • Biao Zhang
  • Deyi Xiong
  • Yubin Ge
  • Junfeng Yao
  • Hao Yue
  • Jinsong Su

Transformer benefits from the high parallelization of attention networks in fast training, but it still suffers from slow decoding partially due to the linear dependency O(m) of the decoder self-attention on previous target words at inference. In this paper, we propose a generalized average attention network (AAN+) aiming at speeding up decoding by reducing the dependency from O(m) to O(1). We find that the learned self-attention weights in the decoder follow some patterns which can be approximated via a dynamic structure. Based on this insight, we develop AAN+, extending our previously proposed average attention (Zhang et al., 2018a, AAN) to support more general position- and content-based attention patterns. AAN+ only requires to maintain a small constant number of hidden states during decoding, ensuring its O(1) dependency. We apply AAN+ as a drop-in replacement of the decoder self-attention and conduct experiments on machine translation (with diverse language pairs), table-to-text generation and document summarization. With masking tricks and dynamic programming, AAN+ enables Transformer to decode sentences around 20% faster without largely compromising in the training speed and the generation performance. Our results further reveal the importance of the localness (neighboring words) in AAN+ and its capability in modeling long-range dependency.
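
The O(1) dependency mentioned in this abstract can be illustrated with the simplest case, a plain cumulative average over previous target embeddings, as in the earlier average attention idea the abstract refers to. The sketch below is an illustration only and is not the AAN+ model: the function names `average_step` and `average_all` and the `(running_sum, count)` state layout are assumptions made for this example. It shows that keeping a running sum yields the same prefix averages as recomputing over all previous positions.

```python
from typing import List, Optional, Tuple

Vector = List[float]
State = Tuple[Vector, int]  # (running_sum, count); hypothetical state layout


def average_step(y_t: Vector, state: Optional[State]) -> Tuple[Vector, State]:
    """One decoding step of a cumulative-average layer.

    Only a running sum and a counter are carried between steps, so the
    cost per step does not grow with the number of previous positions.
    """
    if state is None:
        running_sum, count = [0.0] * len(y_t), 0
    else:
        running_sum, count = state
    new_sum = [s + y for s, y in zip(running_sum, y_t)]
    new_count = count + 1
    average = [s / new_count for s in new_sum]
    return average, (new_sum, new_count)


def average_all(ys: List[Vector]) -> List[Vector]:
    """Reference version: recomputes every prefix mean from scratch."""
    outputs = []
    for t in range(1, len(ys) + 1):
        prefix = ys[:t]
        outputs.append([sum(column) / t for column in zip(*prefix)])
    return outputs


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    state = None
    for y in inputs:
        avg, state = average_step(y, state)
        print(avg)
```

The incremental version is what makes constant-state decoding possible; the position- and content-based patterns that AAN+ adds on top of this are not modeled here.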

JAIR Journal 2022 Journal Article

CASA: Conversational Aspect Sentiment Analysis for Dialogue Understanding

  • Linfeng Song
  • Chunlei Xin
  • Shaopeng Lai
  • Ante Wang
  • Jinsong Su
  • Kun Xu

Dialogue understanding has always been a bottleneck for many conversational tasks, such as dialogue response generation and conversational question answering. To expedite the progress in this area, we introduce the task of conversational aspect sentiment analysis (CASA) that can provide useful fine-grained sentiment information for dialogue understanding and planning. Overall, this task extends the standard aspect-based sentiment analysis to the conversational scenario with several major adaptations. To aid the training and evaluation of data-driven methods, we annotate 3,000 chit-chat dialogues (27,198 sentences) with fine-grained sentiment information, including all sentiment expressions, their polarities and the corresponding target mentions. We also annotate an out-of-domain test set of 200 dialogues for robustness evaluation. Besides, we develop multiple baselines based on either pretrained BERT or self-attention for preliminary study. Experimental results show that our BERT-based model has strong performances for both in-domain and out-of-domain datasets, and thorough analysis indicates several potential directions for further improvements.

AAAI Conference 2022 Conference Paper

KGR4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation

  • Xin Liu
  • Dayiheng Liu
  • Baosong Yang
  • Haibo Zhang
  • Junwei Ding
  • Wenqing Yao
  • Weihua Luo
  • Haiying Zhang

Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, which has attracted much attention recently. However, existing models cannot perform as well as humans, since sentences they produce are often implausible and grammatically incorrect. In this paper, inspired by the process of humans creating sentences, we propose a novel Knowledge-enhanced Commonsense Generation framework, termed KGR4, consisting of four stages: Retrieval, Retrospect, Refine, Rethink. Under this framework, we first perform retrieval to search for relevant sentences from external corpus as the prototypes. Then, we train the generator that either edits or copies these prototypes to generate candidate sentences, of which potential errors will be fixed by an autoencoder-based refiner. Finally, we select the output sentence from candidate sentences produced by generators with different hyper-parameters. Experimental results and in-depth analysis on the CommonGen benchmark strongly demonstrate the effectiveness of our framework. Particularly, KGR4 obtains 33.56 SPICE points in the official leaderboard, outperforming the previously-reported best result by 2.49 SPICE points and achieving state-of-the-art performance. We release the code at https://github.com/DeepLearnXMU/KGR-4.

IJCAI Conference 2021 Conference Paper

A Structure Self-Aware Model for Discourse Parsing on Multi-Party Dialogues

  • Ante Wang
  • Linfeng Song
  • Hui Jiang
  • Shaopeng Lai
  • Junfeng Yao
  • Min Zhang
  • Jinsong Su

Conversational discourse structures aim to describe how a dialogue is organized, thus they are helpful for dialogue understanding and response generation. This paper focuses on predicting discourse dependency structures for multi-party dialogues. Previous work adopts incremental methods that take the features from the already predicted discourse relations to help generate the next one. Although the inter-correlations among predictions are considered, we find that the error propagation is also very serious and hurts the overall performance. To alleviate error propagation, we propose a Structure Self-Aware (SSA) model, which adopts a novel edge-centric Graph Neural Network (GNN) to update the information between each Elementary Discourse Unit (EDU) pair layer by layer, so that expressive representations can be learned without historical predictions. In addition, we take auxiliary training signals (e.g., structure distillation) for better representation learning. Our model achieves the new state-of-the-art performances on two conversational discourse parsing benchmarks, largely outperforming the previous methods.

JAIR Journal 2021 Journal Article

An External Knowledge Enhanced Graph-based Neural Network for Sentence Ordering

  • Yongjing Yin
  • Shaopeng Lai
  • Linfeng Song
  • Chulun Zhou
  • Xianpei Han
  • Junfeng Yao
  • Jinsong Su

As an important text coherence modeling task, sentence ordering aims to coherently organize a given set of unordered sentences. To achieve this goal, the most important step is to effectively capture and exploit global dependencies among these sentences. In this paper, we propose a novel and flexible external knowledge enhanced graph-based neural network for sentence ordering. Specifically, we first represent the input sentences as a graph, where various kinds of relations (i.e., entity-entity, sentence-sentence and entity-sentence) are exploited to make the graph representation more expressive and less noisy. Then, we introduce graph recurrent network to learn semantic representations of the sentences. To demonstrate the effectiveness of our model, we conduct experiments on several benchmark datasets. The experimental results and in-depth analysis show our model significantly outperforms the existing state-of-the-art models.

AIJ Journal 2021 Journal Article

Enhanced aspect-based sentiment analysis models with progressive self-supervised attention learning

  • Jinsong Su
  • Jialong Tang
  • Hui Jiang
  • Ziyao Lu
  • Yubin Ge
  • Linfeng Song
  • Deyi Xiong
  • Le Sun

In aspect-based sentiment analysis (ABSA), many neural models are equipped with an attention mechanism to quantify the contribution of each context word to sentiment prediction. However, such a mechanism suffers from one drawback: only a few frequent words with sentiment polarities tend to be taken into consideration for the final sentiment decision, while abundant infrequent sentiment words are ignored by models. To deal with this issue, we propose a progressive self-supervised attention learning approach for attentional ABSA models. In this approach, we iteratively perform sentiment prediction on all training instances, and continually learn useful attention supervision information in the meantime. During training, at each iteration, context words with the highest impact on sentiment prediction, identified based on their attention weights or gradients, are extracted as words with active/misleading influence on the correct/incorrect prediction for each instance. Words extracted in this way are masked for subsequent iterations. To exploit these extracted words for refining ABSA models, we augment the conventional training objective with a regularization term that encourages ABSA models to not only take full advantage of the extracted active context words but also decrease the weights of those misleading words. We integrate the proposed approach into three state-of-the-art neural ABSA models. Experiment results and in-depth analyses show that our approach yields better attention results and significantly enhances the performance of all three models. We release the source code and trained models at https://github.com/DeepLearnXMU/PSSAttention.

AAAI Conference 2021 Conference Paper

Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning

  • Binbin Xie
  • Jinsong Su
  • Yubin Ge
  • Xiang Li
  • Jianwei Cui
  • Junfeng Yao
  • Bin Wang

Code generation aims to automatically generate a piece of code given an input natural language utterance. Currently, among dominant models, it is treated as a sequence-to-tree task, where a decoder outputs a sequence of actions corresponding to the pre-order traversal of an Abstract Syntax Tree. However, such a decoder only exploits the preorder traversal based preceding actions, which are insufficient to ensure correct action predictions. In this paper, we first thoroughly analyze the context modeling difference between neural code generation models with different traversals based decodings (preorder traversal vs breadth-first traversal), and then propose to introduce a mutual learning framework to jointly train these models. Under this framework, we continuously enhance both models via mutual distillation, which involves synchronous executions of two one-to-one knowledge transfers at each training step. More specifically, we alternately choose one model as the student and the other as its teacher, and require the student to fit the training data and the action prediction distributions of its teacher. By doing so, both models can fully absorb the knowledge from each other and thus could be improved simultaneously. Experimental results and in-depth analysis on several benchmark datasets demonstrate the effectiveness of our approach. We release our code at https://github.com/DeepLearnXMU/CGML.
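
The mutual distillation step described in this abstract, where each model in turn acts as student and fits both the training data and its teacher's prediction distribution, can be sketched for a single action prediction as follows. This is a minimal illustration and not the released CGML code: the function names, the KL direction (teacher to student), and the `weight` parameter are assumptions made for this example, and all student probabilities are assumed to be strictly positive.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(probs: Sequence[float], gold: int) -> float:
    """Negative log-likelihood of the gold action under a predicted distribution."""
    return -math.log(probs[gold])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms with p_i == 0 contribute nothing. Assumes q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student: Sequence[float], teacher: Sequence[float],
                 gold: int, weight: float = 1.0) -> float:
    """Fit the training label and the teacher's distribution (weight is hypothetical)."""
    return cross_entropy(student, gold) + weight * kl_divergence(teacher, student)


def mutual_losses(probs_a: Sequence[float], probs_b: Sequence[float],
                  gold: int, weight: float = 1.0) -> Tuple[float, float]:
    """Two one-to-one transfers: A learns from B, and B learns from A."""
    loss_a = student_loss(probs_a, probs_b, gold, weight)
    loss_b = student_loss(probs_b, probs_a, gold, weight)
    return loss_a, loss_b


if __name__ == "__main__":
    preorder_probs = [0.7, 0.2, 0.1]       # made-up distribution of one decoder
    breadth_first_probs = [0.5, 0.4, 0.1]  # made-up distribution of the other
    print(mutual_losses(preorder_probs, breadth_first_probs, gold=0))
```

In a real training loop the teacher's distribution would typically be treated as a constant when computing the student's gradient; that detail is outside this sketch.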

AAAI Conference 2020 Conference Paper

A Robust Adversarial Training Approach to Machine Reading Comprehension

  • Kai Liu
  • Xin Liu
  • An Yang
  • Jing Liu
  • Jinsong Su
  • Sujian Li
  • Qiaoqiao She

Lacking robustness is a serious problem for Machine Reading Comprehension (MRC) models. To alleviate this problem, one of the most promising ways is to augment the training dataset with sophisticated designed adversarial examples. Generally, those examples are created by rules according to the observed patterns of successful adversarial attacks. Since the types of adversarial examples are innumerable, it is not adequate to manually design and enrich training data to defend against all types of adversarial attacks. In this paper, we propose a novel robust adversarial training approach to improve the robustness of MRC models in a more generic way. Given an MRC model well-trained on the original dataset, our approach dynamically generates adversarial examples based on the parameters of current model and further trains the model by using the generated examples in an iterative schedule. When applied to the state-of-the-art MRC models, including QANET, BERT and ERNIE2.0, our approach obtains significant and comprehensive improvements on 5 adversarial datasets constructed in different ways, without sacrificing the performance on the original SQuAD development set. Moreover, when coupled with other data augmentation strategy, our approach further boosts the overall performance on adversarial datasets and outperforms the state-of-the-art methods.

IJCAI Conference 2020 Conference Paper

An Iterative Multi-Source Mutual Knowledge Transfer Framework for Machine Reading Comprehension

  • Xin Liu
  • Kai Liu
  • Xiang Li
  • Jinsong Su
  • Yubin Ge
  • Bin Wang
  • Jiebo Luo

The lack of sufficient training data in many domains poses a major challenge to the construction of domain-specific machine reading comprehension (MRC) models with satisfying performance. In this paper, we propose a novel iterative multi-source mutual knowledge transfer framework for MRC. As an extension of the conventional knowledge transfer with one-to-one correspondence, our framework focuses on the many-to-many mutual transfer, which involves synchronous executions of multiple many-to-one transfers in an iterative manner. Specifically, to update a target-domain MRC model, we first consider other domain-specific MRC models as individual teachers, and employ knowledge distillation to train a multi-domain MRC model, which is differentially required to fit the training data and match the outputs of these individual models according to their domain-level similarities to the target domain. After being initialized by the multi-domain MRC model, the target-domain MRC model is fine-tuned to match both its training data and the output of its previous best model simultaneously via knowledge distillation. Compared with previous approaches, our framework can continuously enhance all domain-specific MRC models by enabling each model to iteratively and differentially absorb the domain-shared knowledge from others. Experimental results and in-depth analyses on several benchmark datasets demonstrate the effectiveness of our framework.

AAAI Conference 2020 Conference Paper

Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions

  • Yongjing Yin
  • Fandong Meng
  • Jinsong Su
  • Yubin Ge
  • Lingeng Song
  • Jie Zhou
  • Jiebo Luo

Dominant sentence ordering models use a pointer network decoder to generate ordering sequences in a left-to-right fashion. However, such a decoder only exploits the noisy left-side encoded context, which is insufficient to ensure correct sentence ordering. To address this deficiency, we propose to enhance the pointer network decoder by using two pairwise ordering prediction modules: The FUTURE module predicts the relative orientations of other unordered sentences with respect to the candidate sentence, and the HISTORY module measures the local coherence between several (e.g., 2) previously ordered sentences and the candidate sentence, without the influence of noisy left-side context. Using the pointer mechanism, we then incorporate this dynamically generated information into the decoder as a supplement to the left-side context for better predictions. On several commonly-used datasets, our model significantly outperforms other baselines, achieving the state-of-the-art performance. Further analyses verify that pairwise ordering predictions indeed provide extra useful context as expected, leading to better sentence ordering. We also evaluate our sentence ordering models on a downstream task, multi-document summarization, and the summaries reordered by our model achieve the best coherence scores. Our code is available at https://github.com/DeepLearnXMU/Pairwise.git.

AAAI Conference 2020 Conference Paper

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention

  • Jiali Zeng
  • Linfeng Song
  • Jinsong Su
  • Jun Xie
  • Wei Song
  • Jiebo Luo

Simile recognition is to detect simile sentences and to extract simile components, i.e., tenors and vehicles. It involves two subtasks: simile sentence classification and simile component extraction. Recent work has shown that standard multitask learning is effective for Chinese simile recognition, but it is still uncertain whether the mutual effects between the subtasks have been well captured by simple parameter sharing. We propose a novel cyclic multitask learning framework for neural simile recognition, which stacks the subtasks and makes them into a loop by connecting the last to the first. It iteratively performs each subtask, taking the outputs of the previous subtask as additional inputs to the current one, so that the interdependence between the subtasks can be better explored. Extensive experiments show that our framework significantly outperforms the current state-of-the-art model and our carefully designed baselines, and the gains are still remarkable using BERT. The source code of this paper is available at https://github.com/DeepLearnXMU/Cyclic.

TIST Journal 2020 Journal Article

Uncovering Media Bias via Social Network Learning

  • Yiyi Zhou
  • Rongrong Ji
  • Jinsong Su
  • Jiaquan Yao

It is known that media outlets, such as CNN and FOX, have intrinsic political bias that is reflected in their news reports. The computational prediction of such bias has broad application prospects. However, the prediction is difficult via directly analyzing the news content without high-level context. In contrast, social signals (e.g., the network structure of media followers) provide inspiring cues to uncover such bias. In this article, we realize the first attempt of predicting the latent bias of media outlets by analyzing their social network structures. In particular, we address two key challenges: network sparsity and label sparsity. The network sparsity refers to the partial sampling of the entire follower network in practical analysis and computing, whereas the label sparsity refers to the difficulty of annotating sufficient labels to train the prediction model. To cope with the network sparsity, we propose a hybrid sampling strategy to construct a training corpus that contains network information from micro to macro views. Based on this training corpus, a semi-supervised network embedding approach is proposed to learn low-dimensional yet effective network representations. To deal with the label sparsity, we adopt a graph-based label propagation scheme to supplement the missing links and augment label information for model training. The preceding two steps are iteratively optimized to reinforce each other. We further collect a large-scale dataset containing social networks of 10 media outlets together with about 300,000 followers and more than 5 million connections. Over this dataset, we compare our model to a range of state of the art. Superior performance gains demonstrate the merits of the proposed approach. More importantly, the experimental results and analyses confirm the validity of our approach for the computerized prediction of media bias.

AAAI Conference 2019 Conference Paper

Dynamic Capsule Attention for Visual Question Answering

  • Yiyi Zhou
  • Rongrong Ji
  • Jinsong Su
  • Xiaoshuai Sun
  • Weiqiu Chen

In visual question answering (VQA), recent advances have well advocated the use of attention mechanism to precisely link the question to the potential answer areas. As the difficulty of the question increases, more VQA models adopt multiple attention layers to capture the deeper visual-linguistic correlation. But a negative consequence is the explosion of parameters, which makes the model vulnerable to over-fitting, especially when limited training examples are given. In this paper, we propose an extremely compact alternative to this static multi-layer architecture towards accurate yet efficient attention modeling, termed as Dynamic Capsule Attention (CapsAtt). Inspired by the recent work of Capsule Network, CapsAtt treats visual features as capsules and obtains the attention output via dynamic routing, which updates the attention weights by calculating coupling coefficients between the underlying and output capsules. Meanwhile, CapsAtt also discards redundant projection matrices to make the model much more compact. We quantify CapsAtt on three benchmark VQA datasets, i.e., COCO-QA, VQA1.0 and VQA2.0. Compared to the traditional multi-layer attention model, CapsAtt achieves significant improvements of up to 4.1%, 5.2% and 2.2% on three datasets, respectively. Moreover, with much fewer parameters, our approach also yields competitive results compared to the latest VQA models. To further verify the generalization ability of CapsAtt, we also deploy it on another challenging multi-modal task of image captioning, where state-of-the-art performance is achieved with a simple network structure.

AIJ Journal 2019 Journal Article

Exploiting reverse target-side contexts for neural machine translation via asynchronous bidirectional decoding

  • Jinsong Su
  • Xiangwen Zhang
  • Qian Lin
  • Yue Qin
  • Junfeng Yao
  • Yang Liu

Based on a unified encoder-decoder framework with attentional mechanism, neural machine translation (NMT) models have attracted much attention and become the mainstream in the community of machine translation. Generally, the NMT decoders produce translation in a left-to-right way. As a result, only left-to-right target-side contexts from the generated translations are exploited, while the right-to-left target-side contexts are completely unexploited for translation. In this paper, we extend the conventional attentional encoder-decoder NMT framework by introducing a backward decoder, in order to explore asynchronous bidirectional decoding for NMT. In the first step after encoding, our backward decoder learns to generate the target-side hidden states in a right-to-left manner. Next, in each timestep of translation prediction, our forward decoder concurrently considers both the source-side and the reverse target-side hidden states via two attention models. Compared with previous models, the innovation in this architecture enables our model to fully exploit contexts from both source side and target side, which improve translation quality altogether. We conducted experiments on NIST Chinese-English, WMT English-German and Finnish-English translation tasks to investigate the effectiveness of our model. Experimental results show that (1) our improved RNN-based NMT model achieves significant improvements over the conventional RNNSearch by 1.44/-3.02, 1.11/-1.01, and 1.23/-1.27 average BLEU and TER points, respectively; and (2) our enhanced Transformer outperforms the standard Transformer by 1.56/-1.49, 1.76/-2.49, and 1.29/-1.33 average BLEU and TER points, respectively. We released our code at https://github.com/DeepLearnXMU/ABD-NMT.

AAAI Conference 2019 Conference Paper

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning

  • Yiyi Zhou
  • Rongrong Ji
  • Jinsong Su
  • Xiangming Li
  • Xiaoshuai Sun

In this paper, we uncover the issue of knowledge inertia in visual question answering (VQA), which commonly exists in most VQA models and forces the models to mainly rely on the question content to “guess” answer, without regard to the visual information. Such an issue not only impairs the performance of VQA models, but also greatly reduces the credibility of the answer prediction. To this end, simply highlighting the visual features in the model is undoable, since the prediction is built upon the joint modeling of two modalities and largely influenced by the data distribution. In this paper, we propose a Pairwise Inconformity Learning (PIL) to tackle the issue of knowledge inertia. In particular, PIL takes full advantage of the similar image pairs with diverse answers to an identical question provided in VQA2.0 dataset. It builds a multi-modal embedding space to project pos./neg. feature pairs, upon which word vectors of answers are modeled as anchors. By doing so, PIL strengthens the importance of visual features in prediction with a novel dynamic-margin based triplet loss that efficiently increases the semantic discrepancies between pos./neg. image pairs. To verify the proposed PIL, we plug it on a baseline VQA model as well as a set of recent VQA models, and conduct extensive experiments on two benchmark datasets, i.e., VQA1.0 and VQA2.0. Experimental results show that PIL can boost the accuracy of the existing VQA models (1.56%-2.93% gain) with a negligible increase in parameters (0.85%-5.4% parameters). Qualitative results also reveal the elimination of knowledge inertia in the existing VQA models after implementing our PIL.

IJCAI Conference 2019 Conference Paper

Graph-based Neural Sentence Ordering

  • Yongjing Yin
  • Linfeng Song
  • Jinsong Su
  • Jiali Zeng
  • Chulun Zhou
  • Jiebo Luo

Sentence ordering is to restore the original paragraph from a set of sentences. It involves capturing global dependencies among sentences regardless of their input order. In this paper, we propose a novel and flexible graph-based neural sentence ordering model, which adopts graph recurrent network (Zhang et al., 2018) to accurately learn semantic representations of the sentences. Instead of assuming connections between all pairs of input sentences, we use entities that are shared among multiple sentences to make more expressive graph representations with less noise. Experimental results show that our proposed model outperforms the existing state-of-the-art systems on several benchmark datasets, demonstrating the effectiveness of our model. We also conduct a thorough analysis on how entities help the performance. Our code is available at https://github.com/DeepLearnXMU/NSEG.git.

IJCAI Conference 2019 Conference Paper

Neural Collective Entity Linking Based on Recurrent Random Walk Network Learning

  • Mengge Xue
  • Weiming Cai
  • Jinsong Su
  • Linfeng Song
  • Yubin Ge
  • Yubao Liu
  • Bin Wang

Benefiting from the excellent ability of neural networks on learning semantic representations, existing studies for entity linking (EL) have resorted to neural networks to exploit both the local mention-to-entity compatibility and the global interdependence between different EL decisions for target entity disambiguation. However, most neural collective EL methods depend entirely upon neural networks to automatically model the semantic dependencies between different EL decisions, and thus lack guidance from external knowledge. In this paper, we propose a novel end-to-end neural network with recurrent random-walk layers for collective EL, which introduces external knowledge to model the semantic interdependence between different EL decisions. Specifically, we first establish a model based on local context features, and then stack random-walk layers to reinforce the evidence for related EL decisions into high-probability decisions, where the semantic interdependence between candidate entities is mainly induced from an external knowledge base. Finally, a semantic regularizer that preserves the consistency of collective EL decisions is incorporated into the conventional objective function, so that the external knowledge base can be fully exploited in collective EL decisions. Experimental results and in-depth analysis on various datasets show that our model achieves better performance than other state-of-the-art models. Our code and data are released at https://github.com/DeepLearnXMU/RRWEL.

AAAI Conference 2018 Conference Paper

Asynchronous Bidirectional Decoding for Neural Machine Translation

  • Xiangwen Zhang
  • Jinsong Su
  • Yue Qin
  • Yang Liu
  • Rongrong Ji
  • Hongji Wang

The dominant neural machine translation (NMT) models apply unified attentional encoder-decoder neural networks for translation. Traditionally, the NMT decoders adopt recurrent neural networks (RNNs) to perform translation in a left-to-right manner, leaving the target-side contexts generated from right to left unexploited during translation. In this paper, we equip the conventional attentional encoder-decoder NMT framework with a backward decoder, in order to explore bidirectional decoding for NMT. Attending to the hidden state sequence produced by the encoder, our backward decoder first learns to generate the target-side hidden state sequence from right to left. Then, the forward decoder performs translation in the forward direction, while in each translation prediction timestep, it simultaneously applies two attention models to consider the source-side and reverse target-side hidden states, respectively. With this new architecture, our model is able to fully exploit source- and target-side contexts to improve translation quality altogether. Experimental results on NIST Chinese-English and WMT English-German translation tasks demonstrate that our model achieves substantial improvements over the conventional NMT by 3.14 and 1.38 BLEU points, respectively. The source code of this work can be obtained from https://github.com/DeepLearnXMU/ABD-NMT.

AAAI Conference 2018 Conference Paper

Variational Recurrent Neural Machine Translation

  • Jinsong Su
  • Shan Wu
  • Deyi Xiong
  • Yaojie Lu
  • Xianpei Han
  • Biao Zhang

Partially inspired by successful applications of variational recurrent neural networks, we propose a novel variational recurrent neural machine translation (VRNMT) model in this paper. Different from the variational NMT, VRNMT introduces a series of latent random variables to model the translation procedure of a sentence in a generative way, instead of a single latent variable. Specifically, the latent random variables are included into the hidden states of the NMT decoder with elements from the variational autoencoder. In this way, these variables are recurrently generated, which enables them to further capture strong and complex dependencies among the output translations at different timesteps. In order to deal with the challenges in performing efficient posterior inference and large-scale training during the incorporation of latent variables, we build a neural posterior approximator, and equip it with a reparameterization technique to estimate the variational lower bound. Experiments on Chinese-English and English-German translation tasks demonstrate that the proposed model achieves significant improvements over both the conventional and variational NMT models.

AAAI Conference 2017 Conference Paper

BattRAE: Bidimensional Attention-Based Recursive Autoencoders for Learning Bilingual Phrase Embeddings

  • Biao Zhang
  • Deyi Xiong
  • Jinsong Su

In this paper, we propose a bidimensional attention based recursive autoencoder (BattRAE) to integrate clues and source-target interactions at multiple levels of granularity into bilingual phrase representations. We employ recursive autoencoders to generate tree structures of phrases with embeddings at different levels of granularity (e.g., words, sub-phrases and phrases). Over these embeddings on the source and target side, we introduce a bidimensional attention network to learn their interactions encoded in a bidimensional attention matrix, from which we extract two soft attention weight distributions simultaneously. These weight distributions enable BattRAE to generate compositive phrase representations via convolution. Based on the learned phrase representations, we further use a bilinear neural model, trained via a max-margin method, to measure bilingual semantic similarity. To evaluate the effectiveness of BattRAE, we incorporate this semantic similarity as an additional feature into a state-of-the-art SMT system. Extensive experiments on NIST Chinese-English test sets show that our model achieves a substantial improvement of up to 1.63 BLEU points on average over the baseline.

AAAI Conference 2017 Conference Paper

Lattice-Based Recurrent Neural Network Encoders for Neural Machine Translation

  • Jinsong Su
  • Zhixing Tan
  • Deyi Xiong
  • Rongrong Ji
  • Xiaodong Shi
  • Yang Liu

Neural machine translation (NMT) heavily relies on word-level modelling to learn semantic representations of input sentences. However, for languages without natural word delimiters (e.g., Chinese) where input sentences have to be tokenized first, conventional NMT is confronted with two issues: 1) it is difficult to find an optimal tokenization granularity for source sentence modelling, and 2) errors in 1-best tokenizations may propagate to the encoder of NMT. To handle these issues, we propose word-lattice based Recurrent Neural Network (RNN) encoders for NMT, which generalize the standard RNN to word lattice topology. The proposed encoders take as input a word lattice that compactly encodes multiple tokenizations, and learn to generate new hidden states from arbitrarily many inputs and hidden states in preceding time steps. As such, the word-lattice based encoders not only alleviate the negative impact of tokenization errors but also are more expressive and flexible to embed input sentences. Experiment results on Chinese-English translation demonstrate the superiorities of the proposed encoders over the conventional encoder.

IJCAI Conference 2016 Conference Paper

Tree-State Based Rule Selection Models for Hierarchical Phrase-Based Machine Translation

  • Shujian Huang
  • Huifeng Sun
  • Chengqi Zhao
  • Jinsong Su
  • Xin-yu Dai
  • Jiajun Chen

Hierarchical phrase-based translation systems (HPBs) perform translation using a synchronous context-free grammar which has only one unified non-terminal for every translation rule. While the usage of the unified non-terminal brings freedom to generate translations with almost arbitrary structures, it also risks generating low-quality translations that have wrong syntactic structures. In this paper, we propose tree-state models to discriminate between good and bad usage of translation rules based on the syntactic structures of the source sentence. We propose to use statistical models and context-dependent features to estimate the probability of each tree state for each translation rule and to penalize the usage of rules in the translation system that violate their tree states. Experimental results demonstrate that these simple models could bring significant improvements to the translation quality.

IJCAI Conference 2015 Conference Paper

Discriminative Reordering Model Adaptation via Structural Learning

  • Biao Zhang
  • Jinsong Su
  • Deyi Xiong
  • Hong Duan
  • Junfeng Yao

Reordering model adaptation remains a big challenge in statistical machine translation because reordering patterns of translation units often vary dramatically from one domain to another. In this paper, we propose a novel adaptive discriminative reordering model (DRM) based on structural learning, which can capture correspondences among reordering features from two different domains. Exploiting both in-domain and out-of-domain monolingual corpora, our model learns a shared feature representation for cross-domain phrase reordering. Incorporating features of this representation, the DRM trained on an out-of-domain corpus generalizes better to in-domain data. Experimental results on the NIST Chinese-English translation task show that our approach significantly outperforms a variety of baselines.

EAAI Journal 2015 Journal Article

Unsupervised word sense induction using rival penalized competitive learning

  • Yanzhou Huang
  • Xiaodong Shi
  • Jinsong Su
  • Yidong Chen
  • Guimin Huang

Word sense induction (WSI) aims to automatically identify different senses of an ambiguous word from its contexts. It is a nontrivial task to perform WSI in natural language processing because word sense ambiguity is pervasive in linguistic expressions. In this paper, we construct multi-granularity semantic spaces to learn the representations of ambiguous instances, in order to capture richer semantic knowledge during context modeling. In particular, we not only consider the semantic space of words, but the semantic space of word clusters and topics as well. Moreover, to circumvent the difficulty of selecting the number of word senses, we adapt a rival penalized competitive learning method to determine the number of word senses automatically via gradually repelling the redundant sense clusters. We validate the effectiveness of our method on several public WSI datasets and the results show that our method is able to improve the quality of WSI over several competitive baselines.
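The "gradually repelling the redundant sense clusters" step can be illustrated with one update of classic rival penalized competitive learning. This is a minimal sketch of the textbook rule, not the adapted variant the abstract refers to: the function name `rpcl_step`, the learning rates, and the use of squared Euclidean distance are assumptions made for illustration.

```python
def rpcl_step(centers, wins, x, lr_win=0.05, lr_rival=0.002):
    """Apply one classic RPCL update in place; return (winner, rival) indices.

    centers: list of cluster centers (lists of floats), at least two.
    wins:    per-center win counts, all positive (used as a conscience term).
    x:       one input vector.
    ASSUMPTION: squared Euclidean distance and fixed learning rates.
    """
    total = sum(wins)
    # Frequency-weighted distance: frequent winners are handicapped.
    scores = [
        (wins[j] / total) * sum((xi - ci) ** 2 for xi, ci in zip(x, centers[j]))
        for j in range(len(centers))
    ]
    order = sorted(range(len(centers)), key=scores.__getitem__)
    winner, rival = order[0], order[1]

    # The winner moves toward the input.
    centers[winner] = [c + lr_win * (xi - c) for xi, c in zip(x, centers[winner])]
    # The rival (second best) is pushed away with a much smaller rate.
    centers[rival] = [c - lr_rival * (xi - c) for xi, c in zip(x, centers[rival])]
    wins[winner] += 1
    return winner, rival
```

Repeated over many inputs, centers that are consistently rivals rather than winners drift away from the data, so starting with more centers than needed and discarding the ones that rarely win is one way to estimate the number of clusters.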