Arrow Research search

Author name cluster

Jiajun Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

CueBench: Advancing Unified Understanding of Context-Aware Video Anomalies in Real-World

  • Yating Yu
  • Congqi Cao
  • Zhaoying Wang
  • Weihua Meng
  • Jie Li
  • Yuxin Li
  • Zihao Wei
  • Zhongpei Shen

How far are deep models from real-world video anomaly understanding (VAU)? Current works typically emphasize detecting unexpected occurrences deviating from normal patterns or comprehending anomalous events with interpretable descriptions. However, they exhibit only a superficial comprehension of real-world anomalies, with limited breadth in the complex principles and subtle contexts that distinguish anomalies from normalities, e.g., climbing cliffs with safety gear vs. without it. To this end, we introduce CueBench, the first benchmark of its kind, devoted to Context-aware video anomalies within a Unified Evaluation framework. We comprehensively establish an event-centric hierarchical taxonomy that anchors two core event types: 14 conditional and 18 absolute anomaly events, defined by their refined semantics from diverse contexts across 174 scenes and 198 attributes. Based on this, we propose to unify and benchmark context-aware VAU with various challenging tasks across recognition, temporal grounding, detection, and anticipation. It also serves as a rigorous and fair probing evaluation suite for generalized and specialized vision-language models (VLMs) across both generative and discriminative paradigms. To address the challenges underlying CueBench, we further develop Cue-R1 based on R1-style reinforcement fine-tuning with verifiable, task-aligned, and hierarchy-refined rewards in a unified generative manner. Extensive results on CueBench reveal that existing VLMs are still far from satisfactory real-world anomaly understanding, while our Cue-R1 surpasses these state-of-the-art approaches by over 24% on average.

AAAI Conference 2026 Conference Paper

UNSEEN: Enhancing Dataset Pruning from a Generalization Perspective

  • Furui Xu
  • Shaobo Wang
  • Jiajun Zhang
  • Chenghao Sun
  • Haixiang Tang
  • Linfeng Zhang

The growing scale of datasets in deep learning has introduced significant computational challenges. Dataset pruning addresses this challenge by constructing a compact but informative coreset that retains performance comparable to the full dataset. Previous approaches typically establish scoring metrics based on specific criteria to identify representative samples. However, these methods predominantly rely on sample scores obtained from the model's performance during the training (i.e., fitting) phase. As scoring models achieve near-optimal performance on training data, such fitting-centric approaches induce a dense distribution of sample scores within a narrow numerical range. This concentration reduces the distinction between samples and hinders effective selection. To address this challenge, we conduct dataset pruning from the perspective of generalization, i.e., scoring samples with models not exposed to them during training. We propose a plug-and-play framework, UNSEEN, which can be integrated into existing dataset pruning methods. Additionally, conventional score-based methods are single-step and rely on models trained solely on the complete dataset, providing a limited perspective on the importance of samples. To address this limitation, we scale UNSEEN to multi-step scenarios and propose an incremental selection technique that uses scoring models trained on varying coresets to dynamically optimize coreset quality. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art (SOTA) methods on CIFAR-10, CIFAR-100, and ImageNet-1K. Notably, on ImageNet-1K, UNSEEN achieves lossless performance while reducing training data by 30%.
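The core idea of generalization-based scoring, i.e., rating each sample with a model that never saw it, can be sketched as a cross-validation-style loop. Everything below (the nearest-centroid "scoring model", the keep-hardest rule, the function names) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def unseen_scores(X, y, n_folds=5, seed=0):
    """Score each sample with a model that never saw it during training.
    A toy nearest-centroid classifier stands in for the scoring model;
    the real UNSEEN framework plugs into existing pruning metrics."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, n_folds)
    scores = np.empty(len(X))
    for k in range(n_folds):
        held_out = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # "Train": compute class centroids from the other folds only.
        centroids = {c: X[train][y[train] == c].mean(axis=0)
                     for c in np.unique(y[train])}
        for i in held_out:
            # Score = distance to the own-class centroid of an unseen model;
            # larger distance means a harder / more informative sample.
            scores[i] = np.linalg.norm(X[i] - centroids[y[i]])
    return scores

def prune(X, y, keep_frac=0.7):
    """Keep the hardest keep_frac of samples under the unseen scores."""
    s = unseen_scores(X, y)
    return np.argsort(s)[::-1][: int(keep_frac * len(X))]
```

Because every sample is scored by a model fit on the other folds, the scores avoid the narrow, fitting-centric score distribution the abstract describes.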

JAIR Journal 2025 Journal Article

A Survey on Data Selection for LLM Instruction Tuning

  • Bolin Zhang
  • Jiahao Wang
  • Qianlong Du
  • Jiajun Zhang
  • Zhiying Tu
  • Dianhui Chu

Instruction tuning is a vital step in training large language models (LLMs), so how to enhance its effect has received increasing attention. Existing work indicates that the quality of the dataset matters more than its quantity during instruction tuning. Therefore, many recent studies focus on methods for selecting high-quality subsets from instruction datasets, aiming to reduce training costs and enhance the instruction-following capabilities of LLMs. This paper presents a comprehensive survey on data selection for LLM instruction tuning. First, we introduce the widely used instruction datasets. Then, we propose a new taxonomy of data selection methods, provide a detailed introduction to recent advances, and elaborate on the evaluation strategies and results of these methods. Finally, we emphasize the open challenges and present new frontiers for this task.

NeurIPS Conference 2025 Conference Paper

KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning

  • Wei Sun
  • Wen Yang
  • Pu Jian
  • Qianlong Du
  • Fuwei Cui
  • Shuo Ren
  • Jiajun Zhang

Recent advances have demonstrated that integrating reinforcement learning with rule-based rewards can significantly enhance the reasoning capabilities of large language models (LLMs), even without supervised fine-tuning (SFT). However, prevalent reinforcement learning algorithms such as GRPO and its variants like DAPO suffer from a coarse granularity issue when computing the advantage. Specifically, they compute rollout-level advantages that assign identical values to every token within a sequence, failing to capture token-specific contributions. To address this limitation, we propose Key-token Advantage Estimation (KTAE), a novel algorithm that estimates fine-grained, token-level advantages without introducing additional models. KTAE leverages the correctness of sampled rollouts and applies statistical analysis to quantify the importance of individual tokens within a sequence to the final outcome. This quantified token-level importance is then combined with the rollout-level advantage to obtain a more fine-grained token-level advantage estimation. Empirical results show that models trained with GRPO+KTAE and DAPO+KTAE outperform baseline methods across five mathematical reasoning benchmarks. Notably, they achieve higher accuracy with shorter responses and even surpass R1-Distill-Qwen-1.5B using the same base model.
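The recipe the abstract describes (a shared rollout-level advantage refined by a per-token importance statistic) can be illustrated with a toy sketch. The occurrence-rate statistic and the additive combination rule below are assumptions for illustration; KTAE's actual statistical test and combination may differ:

```python
import math
from collections import Counter

def rollout_advantages(rewards):
    """GRPO-style group-normalized advantage: one scalar per rollout,
    shared by every token in that rollout."""
    mu = sum(rewards) / len(rewards)
    sd = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards)) or 1.0
    return [(r - mu) / sd for r in rewards]

def token_importance(rollouts, rewards):
    """Toy key-token statistic: how much more often a token occurs in
    correct rollouts than in incorrect ones (a stand-in for KTAE's
    statistical analysis of rollout correctness)."""
    pos = Counter(t for ro, r in zip(rollouts, rewards) if r > 0 for t in set(ro))
    neg = Counter(t for ro, r in zip(rollouts, rewards) if r <= 0 for t in set(ro))
    n_pos = max(1, sum(1 for r in rewards if r > 0))
    n_neg = max(1, sum(1 for r in rewards if r <= 0))
    return {t: pos[t] / n_pos - neg[t] / n_neg for t in set(pos) | set(neg)}

def token_level_advantages(rollouts, rewards, alpha=0.5):
    """Combine the rollout-level advantage with per-token importance
    (additive combination is an assumption, not the paper's exact rule)."""
    adv = rollout_advantages(rewards)
    imp = token_importance(rollouts, rewards)
    return [[a + alpha * imp.get(t, 0.0) for t in ro]
            for ro, a in zip(rollouts, adv)]
```

Tokens that consistently co-occur with correct rollouts receive a larger advantage than their neighbors, instead of every token sharing one value.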

NeurIPS Conference 2025 Conference Paper

Negative Feedback Really Matters: Signed Dual-Channel Graph Contrastive Learning Framework for Recommendation

  • Leqi Zheng
  • Chaokun Wang
  • Zixin Song
  • Cheng Wu
  • Shannan Yan
  • Jiajun Zhang
  • Ziyang Liu

Traditional recommender systems have relied heavily on positive feedback for learning user preferences, while the abundance of negative feedback in real-world scenarios remains underutilized. To address this limitation, recent years have witnessed increasing attention on leveraging negative feedback in recommender systems to enhance recommendation performance. However, existing methods face three major challenges: limited model compatibility, ineffective information exchange, and computational inefficiency. To overcome these challenges, we propose a model-agnostic Signed Dual-Channel Graph Contrastive Learning (SDCGCL) framework that can be seamlessly integrated with existing graph contrastive learning methods. The framework features three key components: (1) a Dual-Channel Graph Embedding that separately processes positive and negative graphs, (2) a Cross-Channel Distribution Calibration mechanism to maintain structural consistency, and (3) an Adaptive Prediction Strategy that effectively combines signals from both channels. Building upon this framework, we further propose a Dual-channel Feedback Fusion (DualFuse) model and develop a two-stage optimization strategy to ensure efficient training. Extensive experiments on four public datasets demonstrate that our approach consistently outperforms state-of-the-art baselines by substantial margins while exhibiting minimal computational complexity. Our source code and data are released at https://github.com/LQgdwind/nips25-sdcgcl.

ICLR Conference 2025 Conference Paper

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

  • Xingrun Xing
  • Boyan Gao
  • Zheng Liu
  • David A. Clifton
  • Shitao Xiao
  • Wanpeng Zhang 0002
  • Li Du
  • Zheng Zhang

Recent advancements in large language models (LLMs) with billions of parameters have improved performance in various applications, but their inference processes demand significant energy and computational resources. In contrast, the human brain, with approximately 86 billion neurons, is much more energy-efficient than LLMs with similar parameter counts. Inspired by this, we redesign 7$\sim$70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model, SpikeLLM. Coupled with the proposed model, two essential approaches are proposed to improve spike training efficiency: Generalized Integrate-and-Fire (GIF) neurons to compress spike length from $T$ to $\frac{T}{L} \log_2 L$ bits, and an Optimal Brain Spiking framework to divide outlier channels and allocate different $T$ for GIF neurons, which further compresses spike length to approximately $\log_2 T$ bits. The necessity of spike-driven LLMs is demonstrated by comparison with quantized LLMs with similar operations. In the OmniQuant pipeline, SpikeLLM reduces WikiText2 perplexity by 11.01% and improves commonsense reasoning accuracy by 2.55% on a LLaMA-7B W4A4 model. In the GPTQ pipeline, SpikeLLM achieves direct additive operations in linear layers, significantly exceeding PB-LLMs. Our code is publicly available at https://github.com/Xingrun-Xing2/SpikeLLM.
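As a quick sanity check on the claimed compression, with hypothetical values $T = 16$ and $L = 4$:

$\frac{T}{L}\log_2 L = \frac{16}{4}\cdot 2 = 8$ bits instead of $T = 16$ bits for GIF neurons, and $\log_2 T = 4$ bits under the further Optimal Brain Spiking compression.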

ICRA Conference 2025 Conference Paper

Tensiworm: A Novel Tensegrity Robot with Enhanced Peristaltic Locomotion Efficiency

  • Christian Kazoleas
  • Jiajun Zhang
  • Sichen Yuan

Tensegrity structures have been widely explored for their lightweight, high-stiffness, and foldable properties. These unique characteristics have enabled their application in various fields including robotics. Tensegrity robots have demonstrated diverse locomotion modes offering versatile solutions for navigation in complex environments. Recent efforts in bio-inspired robotics have led to designs mimicking the movement of natural organisms, such as earthworms. However, existing designs, particularly those utilizing motor-pulley mechanisms for robot body contraction, face significant challenges due to their bulky actuation systems that reduce locomotion efficiency. This paper introduces a novel tensegrity robot, “Tensiworm,” inspired by the peristaltic locomotion of an earthworm. Composed of three icosahedron tensegrity unit cells connected in series, the Tensiworm robot employs a sequential contraction and relaxation mechanism driven by active cable members made of shape memory actuators. This innovative design achieves a 59.13% folding ratio and weighs only 46.9 grams. The robot can travel a distance equal to its body length in approximately ten cycles with an average speed of 10.01 mm per minute. Furthermore, the use of thinner, flexible structural members broadens possibilities for development of millimeter-scale tensegrity robots, which hold significant potential for biomedical applications, including in-vivo testing and targeted drug delivery.

AAAI Conference 2024 Conference Paper

BiPFT: Binary Pre-trained Foundation Transformer with Low-Rank Estimation of Binarization Residual Polynomials

  • Xingrun Xing
  • Li Du
  • Xinyuan Wang
  • Xianlin Zeng
  • Yequan Wang
  • Zheng Zhang
  • Jiajun Zhang

Pretrained foundation models offer substantial benefits for a wide range of downstream tasks, making them one of the most promising techniques on the path to artificial general intelligence. However, scaling up foundation transformers for maximal task-agnostic knowledge has brought about computational challenges, especially on resource-limited devices such as mobiles. This work proposes the first Binary Pretrained Foundation Transformer (BiPFT) for natural language understanding (NLU) tasks, which remarkably saves 56× the operations and 28× the memory. In contrast to previous task-specific binary transformers, BiPFT exhibits a substantial enhancement in the learning capabilities of binary neural networks (BNNs), promoting BNNs into the era of pre-training. Benefiting from extensive pretraining data, we further propose a data-driven binarization method. Specifically, we first analyze the binarization error in self-attention operations and derive the polynomials of binarization error. To simulate full-precision self-attention, we define binarization error as binarization residual polynomials, and then introduce low-rank estimators to model these polynomials. Extensive experiments validate the effectiveness of BiPFTs, which surpass task-specific baselines by 15.4% average performance on the GLUE benchmark. BiPFT also demonstrates improved robustness to hyperparameter changes, improved optimization efficiency, and reduced reliance on downstream distillation, which consequently generalizes across various NLU tasks and simplifies the downstream pipeline of BNNs. Our code and pretrained models are publicly available at https://github.com/Xingrun-Xing/BiPFT.
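The general idea of pairing a 1-bit weight with a low-rank estimate of the binarization residual can be sketched on a standalone weight matrix. Note this is only an illustration of the low-rank-residual idea: BiPFT models binarization-error polynomials inside self-attention, not a raw weight matrix as here:

```python
import numpy as np

def binarize_with_lowrank_residual(W, rank=4):
    """Toy sketch: 1-bit weights plus a rank-`rank` estimate of the
    binarization residual (an illustration, not BiPFT's method)."""
    alpha = np.abs(W).mean()                  # per-matrix scaling factor
    B = alpha * np.sign(W)                    # 1-bit representation
    # Best rank-k approximation of the residual W - B via truncated SVD.
    U, s, Vt = np.linalg.svd(W - B, full_matrices=False)
    R = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return B, R
```

Adding the low-rank term can only shrink the reconstruction error relative to plain binarization, since the truncated SVD is the best rank-k fit of the residual.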

ICML Conference 2024 Conference Paper

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

  • Xingrun Xing
  • Zheng Zhang 0006
  • Ziyi Ni
  • Shitao Xiao
  • Yiming Ju
  • Siqi Fan 0001
  • Yequan Wang
  • Jiajun Zhang

Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) have the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, making it a valuable issue to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, posing technological challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with 0/1 levels, we propose a more general spike formulation with bi-directional, elastic amplitude, and elastic frequency encoding, while still maintaining the additive nature of SNNs. Within a single time step, the spike is enhanced by direction and amplitude information; in spike frequency, a strategy to control the spike firing rate is well designed. We plug this elastic bi-spiking mechanism into language modeling, named SpikeLM. This is the first time general language tasks have been handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly bridges the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.

AAAI Conference 2022 Conference Paper

Parameter Differentiation Based Multilingual Neural Machine Translation

  • Qian Wang
  • Jiajun Zhang

Multilingual neural machine translation (MNMT) aims to translate multiple languages with a single model and has proved successful thanks to effective knowledge transfer among different languages with shared parameters. However, it is still an open question which parameters should be shared and which ones need to be task-specific. Currently, the common practice is to heuristically design or search language-specific modules, which makes it difficult to find the optimal configuration. In this paper, we propose a novel parameter differentiation based method that allows the model to determine which parameters should be language-specific during training. Inspired by cellular differentiation, each shared parameter in our method can dynamically differentiate into more specialized types. We further define the differentiation criterion as inter-task gradient similarity. Therefore, parameters with conflicting inter-task gradients are more likely to be language-specific. Extensive experiments on multilingual datasets have demonstrated that our method significantly outperforms various strong baselines with different parameter sharing configurations. Further analyses reveal that the parameter sharing configuration obtained by our method correlates well with linguistic proximities.
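The inter-task gradient similarity criterion can be sketched as a pairwise cosine check over per-language gradients of one shared parameter. The minimum-similarity rule and the zero threshold below are illustrative assumptions; the paper's exact differentiation rule may differ:

```python
import numpy as np

def gradient_conflict(grads):
    """Minimum pairwise cosine similarity among per-language gradients of
    one shared parameter; strongly negative values signal conflict."""
    sims = []
    for i in range(len(grads)):
        for j in range(i + 1, len(grads)):
            a, b = grads[i].ravel(), grads[j].ravel()
            sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return min(sims)

def should_differentiate(grads, threshold=0.0):
    """Differentiate a shared parameter into language-specific copies when
    inter-task gradients conflict (toy criterion)."""
    return gradient_conflict(grads) < threshold
```

Parameters whose per-language gradients point in opposing directions would, under this toy rule, be split into language-specific copies during training.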

AAAI Conference 2022 Conference Paper

Probing Word Syntactic Representations in the Brain by a Feature Elimination Method

  • Xiaohan Zhang
  • Shaonan Wang
  • Nan Lin
  • Jiajun Zhang
  • Chengqing Zong

Neuroimaging studies have identified multiple brain regions that are associated with semantic and syntactic processing when comprehending language. However, existing methods cannot explore the neural correlates of fine-grained word syntactic features, such as part-of-speech and dependency relations. This paper proposes an alternative framework to study how different word syntactic features are represented in the brain. To separate each syntactic feature, we propose a feature elimination method, called Mean Vector Null space Projection (MVNP). This method can remove a specific feature from word representations, resulting in one-feature-removed representations. Then we respectively associate the one-feature-removed and the original word vectors with brain imaging data to explore how the brain represents the removed feature. This paper for the first time studies the cortical representations of multiple fine-grained syntactic features simultaneously and suggests some possible contributions of several brain regions to the complex division of syntactic processing. These findings indicate that the brain foundations of syntactic information processing might be broader than those suggested by classical studies.
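The null-space projection step can be sketched in a few lines: given a subspace spanned by a feature direction, subtract each vector's component inside that subspace. How the paper constructs the feature subspace from mean vectors is not reproduced here; the basis argument is an assumption:

```python
import numpy as np

def mvnp_remove(vectors, feature_basis):
    """Sketch of a null-space projection in the spirit of MVNP: project word
    vectors onto the null space of a feature subspace, yielding
    one-feature-removed representations. `vectors` is (n, d);
    `feature_basis` is (d, k) with columns spanning the feature subspace."""
    Q, _ = np.linalg.qr(feature_basis)   # orthonormal basis of the subspace
    return vectors - vectors @ Q @ Q.T   # drop the in-subspace component
```

After projection, the vectors carry no component along the removed feature direction, while all orthogonal directions are untouched.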

AAAI Conference 2021 Conference Paper

Synchronous Interactive Decoding for Multilingual Neural Machine Translation

  • Hao He
  • Qian Wang
  • Zhipeng Yu
  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong

To simultaneously translate a source language into multiple different target languages is one of the most common scenarios of multilingual translation. However, existing methods cannot make full use of translation model information during decoding, such as intra-lingual and inter-lingual future information, and therefore may suffer from issues like unbalanced outputs. In this paper, we present a new approach for synchronous interactive multilingual neural machine translation (SimNMT), which predicts each target language output simultaneously and interactively using historical and future information of all target languages. Specifically, we first propose a synchronous cross-interactive decoder in which the generation of each target output depends not only on its own generated sequence but also on its future information, as well as the history and future contexts of the other target languages. Then, we present a new interactive multilingual beam search algorithm that enables synchronous interactive decoding of all target languages in a single model. We take two target languages as an example to illustrate and evaluate the proposed SimNMT model on IWSLT datasets. The experimental results demonstrate that our method achieves significant improvements over several advanced NMT and MNMT models.

AAAI Conference 2020 Conference Paper

Keywords-Guided Abstractive Sentence Summarization

  • Haoran Li
  • Junnan Zhu
  • Jiajun Zhang
  • Chengqing Zong
  • Xiaodong He

We study the problem of generating a summary for a given sentence. Existing research on abstractive sentence summarization ignores the fact that keywords in the input sentence provide significant clues to valuable content, and humans tend to write summaries covering these keywords. In this paper, we propose an abstractive sentence summarization method that applies guidance signals from keywords to both the encoder and the decoder in the sequence-to-sequence model. A multi-task learning framework is adopted to jointly learn to extract keywords and generate a summary for the input sentence. We apply keywords-guided selective encoding strategies to filter source information by investigating the interactions between the input sentence and the keywords. We extend the pointer-generator network with a dual-attention and a dual-copy mechanism, which can integrate the semantics of the input sentence and the keywords, and copy words from both the input sentence and the keywords. We demonstrate that multi-task learning and keywords-oriented guidance facilitate the sentence summarization task, achieving better performance than competitive models on the English Gigaword sentence summarization dataset.

IJCAI Conference 2020 Conference Paper

Knowledge Graphs Enhanced Neural Machine Translation

  • Yang Zhao
  • Jiajun Zhang
  • Yu Zhou
  • Chengqing Zong

Knowledge graphs (KGs) store much structured information on various entities, many of which are not covered by the parallel sentence pairs of neural machine translation (NMT). To improve the translation quality of these entities, in this paper we propose a novel KGs-enhanced NMT method. Specifically, we first induce new translation results for these entities by transforming the source and target KGs into a unified semantic space. We then generate adequate pseudo parallel sentence pairs that contain these induced entity pairs. Finally, the NMT model is jointly trained on the original and pseudo sentence pairs. Extensive experiments on Chinese-to-English and English-to-Japanese translation tasks demonstrate that our method significantly outperforms strong baseline models in translation quality, especially in handling the induced entities.

AAAI Conference 2020 Conference Paper

Multimodal Summarization with Guidance of Multimodal Reference

  • Junnan Zhu
  • Yu Zhou
  • Jiajun Zhang
  • Haoran Li
  • Chengqing Zong
  • Changliang Li

Multimodal summarization with multimodal output (MSMO) is to generate a multimodal summary for a multimodal news report, which has been proven to effectively improve users’ satisfaction. The existing MSMO methods are trained by the target of text modality, leading to the modality-bias problem that ignores the quality of the model-selected image during training. To alleviate this problem, we propose a multimodal objective function with the guidance of multimodal reference to use the loss from both summary generation and image selection. Due to the lack of multimodal reference data, we present two strategies, i.e., ROUGE-ranking and Order-ranking, to construct the multimodal reference by extending the text reference. Meanwhile, to better evaluate multimodal outputs, we propose a novel evaluation metric based on joint multimodal representation, projecting the model output and multimodal reference into a joint semantic space during evaluation. Experimental results have shown that our proposed model achieves the new state-of-the-art on both automatic and manual evaluation metrics. Besides, our proposed evaluation method can effectively improve the correlation with human judgments.

AAAI Conference 2020 Conference Paper

Probing Brain Activation Patterns by Dissociating Semantics and Syntax in Sentences

  • Shaonan Wang
  • Jiajun Zhang
  • Nan Lin
  • Chengqing Zong

The relation between semantics and syntax and where they are represented at the neural level has been extensively debated in the neurosciences. Existing methods use manually designed stimuli to distinguish semantic and syntactic information in a sentence, which may not generalize beyond the experimental setting. This paper proposes an alternative framework to study the brain representation of semantics and syntax. Specifically, we embed the highly controlled stimuli as objective functions in learning sentence representations and propose a disentangled feature representation model (DFRM) to extract semantic and syntactic information from sentences. This model can generate one semantic and one syntactic vector for each sentence. Then we associate these disentangled feature vectors with brain imaging data to explore the brain representation of semantics and syntax. Results have shown that the semantic feature is represented more robustly than the syntactic feature across the brain, including the default-mode, frontoparietal, and visual networks, etc. The brain representations of semantics and syntax largely overlap, but there are brain regions sensitive to only one of them. For instance, several frontal and temporal regions are specific to the semantic feature; parts of the right superior frontal and right inferior parietal gyrus are specific to the syntactic feature.

AAAI Conference 2020 Conference Paper

Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding

  • Yuchen Liu
  • Jiajun Zhang
  • Hao Xiong
  • Long Zhou
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang
  • Chengqing Zong

Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has the potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as an intermediate. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, the different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model where the second model can use the hidden state from the first one, but its cascaded manner greatly affects the efficiency of the training and inference process. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations not only relies on their previous outputs but also on the outputs predicted in the other task. Experiments on TED speech translation corpora have shown that our proposed model can outperform strong baselines on the quality of speech translation and achieve better speech recognition performance as well.

AAAI Conference 2019 Conference Paper

Addressing the Under-Translation Problem from the Entropy Perspective

  • Yang Zhao
  • Jiajun Zhang
  • Chengqing Zong
  • Zhongjun He
  • Hua Wu

Neural Machine Translation (NMT) has drawn much attention due to its promising translation performance in recent years. However, the under-translation problem still remains a big challenge. In this paper, we focus on the under-translation problem and attempt to find out what kinds of source words are more likely to be ignored. Through analysis, we observe that a source word with a large translation entropy is more inclined to be dropped. To address this problem, we propose a coarse-to-fine framework. In the coarse-grained phase, we introduce a simple strategy to reduce the entropy of high-entropy words by constructing pseudo target sentences. In the fine-grained phase, we propose three methods, including a pre-training method, a multi-task method, and a two-pass method, to encourage the neural model to correctly translate these high-entropy words. Experimental results on various translation tasks show that our method can significantly improve the translation quality and substantially reduce the under-translation cases of high-entropy words.
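The notion of a source word's translation entropy can be made concrete: given counts of the distinct target words a source word aligns to in a corpus, compute the Shannon entropy of that distribution. The alignment-count input format is an assumption for illustration:

```python
import math

def translation_entropy(target_counts):
    """Shannon entropy (in bits) of a source word's translation
    distribution, given counts of its aligned target words. Per the
    paper's observation, high-entropy words are more likely to be
    dropped by the NMT model."""
    total = sum(target_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in target_counts.values())
```

A word that always translates the same way has entropy 0, while a word spread uniformly over four translations has entropy 2 bits, making it a candidate for the coarse-grained entropy-reduction strategy.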

IJCAI Conference 2019 Conference Paper

Sequence Generation: From Both Sides to the Middle

  • Long Zhou
  • Jiajun Zhang
  • Chengqing Zong
  • Heng Yu

The encoder-decoder framework has achieved promising progress for many sequence generation tasks, such as neural machine translation and text summarization. Such a framework usually generates a sequence token by token from left to right, hence (1) this autoregressive decoding procedure is time-consuming when the output sentence becomes longer, and (2) it lacks the guidance of future context, which is crucial to avoid under-translation. To alleviate these issues, we propose a synchronous bidirectional sequence generation (SBSG) model which predicts its outputs from both sides to the middle simultaneously. In the SBSG model, we enable the left-to-right (L2R) and right-to-left (R2L) generation to help and interact with each other by leveraging an interactive bidirectional attention network. Experiments on neural machine translation (En-De, Ch-En, and En-Ro) and text summarization tasks show that the proposed model significantly speeds up decoding while improving the generation quality compared to the autoregressive Transformer.

AAAI Conference 2019 Conference Paper

Towards Sentence-Level Brain Decoding with Distributed Representations

  • Jingyuan Sun
  • Shaonan Wang
  • Jiajun Zhang
  • Chengqing Zong

Decoding human brain activities based on linguistic representations has been actively studied in recent years. However, most previous studies exclusively focus on word-level representations, and little is known about decoding whole sentences from brain activation patterns. This work is our effort to bridge the gap. In this paper, we build decoders to associate brain activities with sentence stimuli via distributed representations, the currently dominant sentence representation approach in natural language processing (NLP). We carry out a systematic evaluation, covering both widely used baselines and state-of-the-art sentence representation models. We demonstrate how well different types of sentence representations decode the brain activation patterns and give empirical explanations of the performance differences. Moreover, to explore how sentences are neurally represented in the brain, we further compare the sentence representations' correspondence to different brain areas associated with high-level cognitive functions. We find that the supervised structured representation models most accurately probe the language atlas of the human brain. To the best of our knowledge, this work is the first comprehensive evaluation of distributed sentence representations for brain decoding. We hope this work can contribute to decoding brain activities with NLP representation models and to understanding how linguistic items are neurally represented.

AAAI Conference 2018 Conference Paper

Investigating Inner Properties of Multimodal Representation and Semantic Compositionality With Brain-Based Componential Semantics

  • Shaonan Wang
  • Jiajun Zhang
  • Nan Lin
  • Chengqing Zong

Multimodal models have been proven to outperform text-based approaches at learning semantic representations. However, it still remains unclear what properties are encoded in multimodal representations, in what aspects they outperform single-modality representations, and what happens in the process of semantic compositionality across different input modalities. Considering that multimodal models were originally motivated by human concept representations, we assume that correlating multimodal representations with brain-based semantics would interpret their inner properties and answer the above questions. To that end, we propose simple interpretation methods based on brain-based componential semantics. First, we investigate the inner properties of multimodal representations by correlating them with corresponding brain-based property vectors. Then we map the distributed vector space to the interpretable brain-based componential space to explore the inner properties of semantic compositionality. Ultimately, the present paper sheds light on fundamental questions of natural language understanding, such as how to represent the meaning of words and how to combine word meanings into larger units.

AAAI Conference 2018 Conference Paper

Learning Multimodal Word Representation via Dynamic Fusion Methods

  • Shaonan Wang
  • Jiajun Zhang
  • Chengqing Zong

Multimodal models have been proven to outperform text-based models on learning semantic word representations. Almost all previous multimodal models treat the representations from different modalities equally. However, it is obvious that information from different modalities contributes differently to the meaning of words. This motivates us to build a multimodal model that can dynamically fuse the semantic representations from different modalities according to different types of words. To that end, we propose three novel dynamic fusion methods to assign importance weights to each modality, in which the weights are learned under the weak supervision of word association pairs. Extensive experiments demonstrate that the proposed methods outperform strong unimodal baselines and state-of-the-art multimodal models.
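The core idea of the abstract above, word-dependent weighting of modalities before fusion, can be sketched as follows. This is a minimal illustration, not the paper's method: the scalar sigmoid gate, its conditioning on the text vector alone, and all variable names are assumptions for the example.

```python
import numpy as np

def fuse(text_vec, image_vec, gate_w, gate_b):
    """Hypothetical dynamic fusion: a scalar gate computed from the
    word's text vector weights each modality before summing. The
    paper's actual gating functions differ; this only illustrates
    the idea of word-dependent modality weights."""
    # sigmoid gate in (0, 1), conditioned on the word's text vector
    a = 1.0 / (1.0 + np.exp(-(gate_w @ text_vec + gate_b)))
    return a * text_vec + (1.0 - a) * image_vec

rng = np.random.default_rng(0)
t = rng.standard_normal(4)   # text-modality vector for one word
v = rng.standard_normal(4)   # image-modality vector for the same word
w = rng.standard_normal(4)   # gate parameters (learned in practice)
fused = fuse(t, v, w, 0.0)
print(fused.shape)  # (4,)
```

A concrete word (e.g. an abstract word like "justice") would ideally learn a gate near 1, keeping mostly the textual vector, while a visually grounded word would lean on the image vector.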

IJCAI Conference 2018 Conference Paper

Multi-modal Sentence Summarization with Modality Attention and Image Filtering

  • Haoran Li
  • Junnan Zhu
  • Tianshang Liu
  • Jiajun Zhang
  • Chengqing Zong

In this paper, we introduce a multi-modal sentence summarization task that produces a short summary from a sentence-image pair. This task is more challenging than sentence summarization: it not only needs to effectively incorporate visual features into a standard text summarization framework, but also requires avoiding noise in the image. To this end, we propose a modality-based attention mechanism to pay different attention to image patches and text units, and we design image filters to selectively use visual information to enhance the semantics of the input sentence. We construct a multi-modal sentence summarization dataset, and extensive experiments on this dataset demonstrate that our models significantly outperform conventional models that only employ text as input. Further analyses suggest that the sentence summarization task can benefit from visually grounded representations in a variety of aspects.

IJCAI Conference 2018 Conference Paper

Phrase Table as Recommendation Memory for Neural Machine Translation

  • Yang Zhao
  • Yining Wang
  • Jiajun Zhang
  • Chengqing Zong

Neural Machine Translation (NMT) has drawn much attention recently due to its promising translation performance. However, several studies indicate that NMT often generates fluent but unfaithful translations. In this paper, we propose a method to alleviate this problem by using a phrase table as recommendation memory. The main idea is to add a bonus to words worthy of recommendation, so that NMT can make correct predictions. Specifically, we first derive a prefix tree to accommodate all the candidate target phrases by searching the phrase translation table according to the source sentence. Then, we construct a recommendation word set by matching the candidate target phrases against the target words previously translated by NMT. After that, we determine the specific bonus value for each recommendable word by using the attention vector and the phrase translation probability. Finally, we integrate this bonus value into NMT to improve the translation results. Extensive experiments demonstrate that the proposed methods obtain remarkable improvements over strong attention-based NMT.
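The prefix-tree step described above can be sketched in a few lines. This is an illustrative trie over candidate target phrases only; the paper's bonus computation from attention vectors and phrase translation probabilities is not reproduced, and all names here are hypothetical.

```python
def build_prefix_tree(phrases):
    """Build a simple trie over candidate target phrases (word lists)
    retrieved from the phrase table for one source sentence."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase:
            node = node.setdefault(word, {})
    return root

def recommended_words(tree, translated_prefix):
    """Words the trie recommends after matching the already-translated
    words (here: a naive full-prefix match for illustration)."""
    node = tree
    for word in translated_prefix:
        if word not in node:
            return set()
        node = node[word]
    return set(node.keys())

tree = build_prefix_tree([["new", "york", "city"], ["new", "year"]])
print(recommended_words(tree, ["new"]))  # {'york', 'year'}
```

In the paper's setting, each recommended word would then receive a bonus added to its NMT score before the next prediction step.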

AAAI Conference 2017 Conference Paper

A Dynamic Window Neural Network for CCG Supertagging

  • Huijia Wu
  • Jiajun Zhang
  • Chengqing Zong

Combinatory Categorial Grammar (CCG) supertagging is the task of assigning a lexical category to each word in a sentence. Almost all previous methods use fixed context window sizes to encode input tokens. However, it is obvious that different tags usually rely on different context window sizes. This motivates us to build a supertagger with a dynamic window approach, which can be treated as an attention mechanism on the local contexts. We find that applying dropout on the dynamic filters is superior to regular dropout on word embeddings. With this approach we achieve state-of-the-art CCG supertagging performance on the standard test set.

IJCAI Conference 2017 Conference Paper

Learning Sentence Representation with Guidance of Human Attention

  • Shaonan Wang
  • Jiajun Zhang
  • Chengqing Zong

Recently, much progress has been made in learning general-purpose sentence representations that can be used across domains. However, most existing models treat each word in a sentence equally. In contrast, extensive studies have shown that humans read sentences efficiently by making a sequence of fixations and saccades. This motivates us to improve sentence representations by assigning different weights to the vectors of the component words, which can be treated as an attention mechanism on single sentences. To that end, we propose two novel attention models in which the attention weights are derived from significant predictors of human reading time, i.e., Surprisal, POS tags, and CCG supertags. Extensive experiments demonstrate that the proposed methods significantly improve upon state-of-the-art sentence representation models.
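The weighting scheme described above can be sketched as a reading-cost-weighted average of word vectors. A minimal sketch, assuming the weights are simply the normalized reading-time predictors; the paper's two attention models derive them from Surprisal, POS tags, and CCG supertags in more elaborate ways.

```python
import numpy as np

def attend(word_vecs, reading_costs):
    """Hypothetical sketch: weight each word vector by a normalized
    predictor of human reading time (e.g. surprisal), then sum.
    Words that demand more reading effort contribute more."""
    w = np.asarray(reading_costs, dtype=float)
    w = w / w.sum()                        # normalize to a distribution
    return (w[:, None] * word_vecs).sum(axis=0)

# three toy 2-d word vectors; the first word has the highest surprisal
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rep = attend(vecs, [3.0, 1.0, 1.0])
print(rep)  # [0.8 0.4]
```

With equal costs this reduces to the unweighted averaging baseline the abstract contrasts against.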

IJCAI Conference 2016 Conference Paper

Towards Zero Unknown Word in Neural Machine Translation

  • Xiaoqing Li
  • Jiajun Zhang
  • Chengqing Zong

Neural machine translation (NMT) has shown promising results in recent years. In order to control the computational complexity, NMT has to employ a small vocabulary, and the massive number of rare words outside the vocabulary are all replaced with a single unk symbol. Besides the inability to translate rare words, this simple approach greatly increases the ambiguity of sentences, since meaningless unks break sentence structure and thus hurt the translation and reordering of in-vocabulary words. To tackle this problem, we propose a novel substitution-translation-restoration method. In the substitution step, the rare words in a test sentence are replaced with similar in-vocabulary words based on a similarity model learned from monolingual data. In the translation and restoration steps, the sentence is translated with a model trained on new bilingual data in which rare words are replaced, and finally the translations of the replaced words are substituted with those of the original words. Experiments on Chinese-to-English translation demonstrate that our proposed method can achieve more than 4 BLEU points over attention-based NMT. When compared to a recently proposed method for handling rare words in NMT, our method also obtains an improvement of nearly 3 BLEU points.
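The substitution and restoration steps can be sketched as follows. This is a toy illustration under stated assumptions: `similar` stands in for the similarity model learned from monolingual data, and the restoration assumes (illustratively) that the substitute's dictionary translation can be located verbatim in the NMT output, whereas the paper handles this more carefully.

```python
def substitute(sentence, vocab, similar):
    """Step 1 (sketch): replace each out-of-vocabulary word with a
    similar in-vocabulary word, recording substitute -> original
    so the restoration step can undo the replacement."""
    replaced, record = [], {}
    for w in sentence:
        if w in vocab:
            replaced.append(w)
        else:
            record[similar[w]] = w
            replaced.append(similar[w])
    return replaced, record

def restore(translation, record, bilingual_dict):
    """Step 3 (sketch): swap the substitute's translation in the
    output for the original rare word's translation."""
    swap = {bilingual_dict[s]: bilingual_dict[o] for s, o in record.items()}
    return [swap.get(w, w) for w in translation]

vocab = {"he", "visited", "the", "city"}
similar = {"Pyongyang": "city"}          # hypothetical OOV -> in-vocab mapping
replaced, record = substitute(["he", "visited", "Pyongyang"], vocab, similar)
# replaced: ["he", "visited", "city"] — now fully in-vocabulary

bilingual_dict = {"city": "ville", "Pyongyang": "Pyongyang"}
restored = restore(["il", "a", "visité", "ville"], record, bilingual_dict)
# restored: ["il", "a", "visité", "Pyongyang"]
```

The intermediate sentence is translated normally in step 2, so the NMT model never sees an unk; only the final restoration reintroduces the rare word.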

IJCAI Conference 2015 Conference Paper

A New Input Method for Human Translators: Integrating Machine Translation Effectively and Imperceptibly

  • Guoping Huang
  • Jiajun Zhang
  • Yu Zhou
  • Chengqing Zong

Computer-aided translation (CAT) systems are the most popular tools helping human translators perform language translation efficiently. To further improve efficiency, there is increasing interest in applying machine translation (MT) technology to upgrade CAT. Post-editing is a standard approach: human translators generate the translation by correcting MT outputs. In this paper, we propose a novel approach that deeply integrates MT into CAT systems: a well-designed input method that makes full use of the knowledge adopted by MT systems, such as translation rules, decoding hypotheses, and n-best translation lists. Our proposed approach allows human translators to focus on choosing better translation results in less time rather than completing the translation entirely themselves. Extensive experiments demonstrate that our method saves more than 14% of the time and over 33% of the keystrokes, and it also improves translation quality by more than 3 absolute BLEU points compared with the strong baseline, i.e., post-editing using Google Pinyin.

IJCAI Conference 2015 Conference Paper

Local Translation Prediction with Global Sentence Representation

  • Jiajun Zhang
  • Dakun Zhang
  • Jie Hao

Statistical machine translation models have made great progress in improving translation quality. However, existing models predict the target translation with only source- and target-side local context information. In practice, distinguishing good translations from bad ones depends not only on local features, but also on global sentence-level information. In this paper, we explore source-side global sentence-level features for target-side local translation prediction. We propose a novel bilingually-constrained chunk-based convolutional neural network to learn sentence semantic representations. With the sentence-level feature representation, we further design a feed-forward neural network to better predict translations using both local and global information. Large-scale experiments show that our method obtains substantial improvements in translation quality over a strong baseline: the hierarchical phrase-based translation model augmented with the neural network joint model.

AAAI Conference 2014 Conference Paper

Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space

  • Jiajun Zhang
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Chengqing Zong

Conventional statistical machine translation (SMT) methods perform the decoding process by composing a set of translation rules associated with high probabilities. However, the probabilities of the translation rules are calculated only according to co-occurrence statistics in the bilingual corpus rather than semantic similarity. In this paper, we propose a Recursive Neural Network (RNN) based model that converts each translation rule into a compact real-valued vector in the semantic embedding space and performs the decoding process by minimizing the semantic gap between the source language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained using a max-margin objective function. Extensive experiments on Chinese-to-English translation show that our RNN-based model can significantly improve translation quality by up to 1.68 BLEU points.