Arrow Research

Author name cluster

Jiajun Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers (26)

AAAI 2026 · Conference Paper

CHARM: Collaborative Harmonization Across Arbitrary Modalities for Modality-Agnostic Semantic Segmentation

  • Lekang Wen
  • Jing Xiao
  • Liang Liao
  • Jiajun Chen
  • Mi Wang

Modality-agnostic Semantic Segmentation (MaSS) aims to achieve robust scene understanding across arbitrary combinations of input modalities. Existing methods typically rely on explicit feature alignment to achieve modal homogenization, which dilutes the distinctive strengths of each modality and destroys their inherent complementarity. To achieve cooperative harmonization rather than homogenization, we propose CHARM, a novel complementary learning framework designed to implicitly align content while preserving modality-specific advantages through two components: (1) a Mutual Perception Unit (MPU), enabling implicit alignment through window-based cross-modal interaction, where modalities serve as both queries and contexts for each other to discover modality-interactive correspondences; (2) a dual-path optimization strategy that decouples training into a Collaborative Learning Strategy (CoL) for complementary fusion learning and an Individual Enhancement Strategy (InE) for protected modality-specific optimization. Experiments across multiple datasets and backbones indicate that CHARM consistently outperforms the baselines, with significant gains on the fragile modalities. This work shifts the focus from modal homogenization to harmonization, enabling cross-modal complementarity for true harmony in diversity.
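For a concrete picture of the mutual-perception idea, here is a minimal PyTorch sketch in which two modality streams attend to each other, each serving as query and context in turn. This is inferred from the abstract, not the authors' code; the window partitioning and the CoL/InE training strategies are omitted, and all module names are illustrative.

```python
# Minimal sketch of a mutual-perception-style block: two modality streams
# attend to each other, so each serves as both query and context. Window
# partitioning, as used by CHARM, is omitted for brevity.
import torch
import torch.nn as nn

class MutualPerceptionBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.a_to_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.b_to_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_a = nn.LayerNorm(dim)
        self.norm_b = nn.LayerNorm(dim)

    def forward(self, feat_a, feat_b):
        # Each modality queries the other; residual connections keep the
        # modality-specific content instead of forcing homogenization.
        upd_a, _ = self.a_to_b(query=feat_a, key=feat_b, value=feat_b)
        upd_b, _ = self.b_to_a(query=feat_b, key=feat_a, value=feat_a)
        return self.norm_a(feat_a + upd_a), self.norm_b(feat_b + upd_b)

# Toy usage: RGB and depth token sequences of shape (batch, tokens, dim).
rgb, depth = torch.randn(2, 196, 64), torch.randn(2, 196, 64)
block = MutualPerceptionBlock(dim=64)
rgb_out, depth_out = block(rgb, depth)
print(rgb_out.shape, depth_out.shape)
```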

AAAI 2026 · Conference Paper

How Does Alignment Enhance LLMs’ Multilingual Capabilities? A Language Neurons Perspective

  • Shimao Zhang
  • Zhejian Lai
  • Xiang Liu
  • Shuaijie She
  • Xiao Liu
  • Yeyun Gong
  • Shujian Huang
  • Jiajun Chen

Multilingual alignment is an effective and representative paradigm for enhancing LLMs' multilingual capabilities, transferring capabilities from high-resource languages to low-resource ones. Meanwhile, research on language-specific neurons provides a new perspective for analyzing and understanding LLMs' mechanisms. However, we find that many neurons are shared by multiple but not all languages, and therefore cannot be correctly classified as either language-specific or general. In this work, we propose a ternary classification methodology that categorizes neurons into three types: language-specific neurons, language-related neurons, and general neurons. We also propose a corresponding identification algorithm to distinguish these different types of neurons. Furthermore, based on the distributional characteristics of the different types of neurons, we divide the LLMs' internal process for multilingual inference into four parts: (1) multilingual understanding, (2) shared semantic space reasoning, (3) multilingual output space transformation, and (4) vocabulary space outputting. Additionally, we systematically analyze the models before and after alignment with a focus on the different types of neurons. We also analyze the phenomenon of "Spontaneous Multilingual Alignment". Overall, our work conducts a comprehensive investigation based on different types of neurons, providing empirical results and valuable insights to better understand the multilingual alignment and multilingual capabilities of LLMs.
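The ternary taxonomy is easy to illustrate: given per-language activation frequencies for each neuron, count the languages in which a neuron fires. The activation matrix and threshold below are toy assumptions; the paper's actual identification algorithm is more involved.

```python
# Illustrative sketch of the ternary neuron taxonomy described above: a neuron
# active for exactly one language is "language-specific", for several but not
# all languages "language-related", and for all languages "general".
import numpy as np

rng = np.random.default_rng(0)
num_neurons, num_langs = 8, 5
# act_freq[i, l]: fraction of language-l inputs on which neuron i fires.
act_freq = rng.random((num_neurons, num_langs))
active = act_freq > 0.5  # toy activation threshold

for i, row in enumerate(active):
    n_active = int(row.sum())
    if n_active == num_langs:
        kind = "general"
    elif n_active > 1:
        kind = "language-related"
    elif n_active == 1:
        kind = "language-specific"
    else:
        kind = "inactive"
    print(f"neuron {i}: active in {n_active}/{num_langs} languages -> {kind}")
```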

AAAI 2026 · Conference Paper

PROMISE: Prompt-Attentive Hierarchical Contrastive Learning for Robust Cross-Modal Representation with Missing Modalities

  • Jiajun Chen
  • Sai Cheng
  • Yuan Yutao
  • YiRui Zhang
  • Haitao Yuan
  • Peng Peng
  • Yi Zhong

Multimodal models integrating natural language and visual information have substantially improved emotion recognition performance. However, their effectiveness declines significantly in real-world situations where certain modalities are missing or unavailable. This degradation primarily stems from inconsistent representation learning between complete multimodal data and incomplete-modality scenarios. Existing approaches typically address missing modalities through relatively simplistic generation methods, yet these fail to adequately preserve cross-modal consistency, leading to suboptimal performance. To overcome this limitation, we propose a novel multimodal framework named PROMISE, a Prompt-Attentive Hierarchical Contrastive Learning approach designed explicitly for robust cross-modal representation under conditions of missing modalities. Specifically, PROMISE innovatively incorporates multimodal prompt learning into a hierarchical contrastive learning framework, equipped with a specially designed prompting-attention mechanism. This mechanism dynamically generates robust and consistent representations for scenarios where particular modalities are absent, thereby effectively bridging the representational gap between complete and incomplete data. Extensive experiments on benchmark datasets, along with comprehensive ablation studies, clearly demonstrate the superior performance of PROMISE compared to current state-of-the-art multimodal methods.

ICML 2025 · Conference Paper

Causal Logistic Bandits with Counterfactual Fairness Constraints

  • Jiajun Chen
  • Jin Tian
  • Christopher John Quinn

Artificial intelligence will play a significant role in decision making across numerous aspects of society. Many fairness criteria have been proposed in the machine learning community, but there remains limited investigation into fairness defined through specified attributes in a sequential decision-making framework. In this paper, we focus on causal logistic bandit problems where the learner seeks to make fair decisions, under a notion of fairness that accounts for counterfactual reasoning. We propose and analyze an algorithm that leverages primal-dual optimization for constrained causal logistic bandits where the non-linear constraints are a priori unknown and must be learned over time. We obtain sub-linear regret guarantees with a leading term similar to that for unconstrained logistic bandits (Lee et al., 2024) while guaranteeing sub-linear constraint violations. We also show how to achieve zero cumulative constraint violations with a small increase in the regret bound.
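A schematic of the primal-dual idea, heavily simplified: pick the action maximizing estimated reward minus the dual-weighted constraint, then update the dual variable by the observed violation. The toy below stubs the reward and constraint with known logistic models and ignores the confidence-set machinery and counterfactual estimation the paper's algorithm actually uses.

```python
# Schematic primal-dual loop for a constrained logistic bandit; all reward and
# constraint functions here are made-up stand-ins for illustration only.
import numpy as np

rng = np.random.default_rng(1)
actions = np.linspace(-1.0, 1.0, 21)
theta_r, theta_c = 1.5, 2.0          # "unknown" reward / constraint parameters
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lam, eta, budget = 0.0, 0.05, 0.6    # dual variable, step size, constraint cap
for t in range(500):
    # Primal step: maximize estimated reward minus dual-weighted constraint
    # (the estimates are stubbed with the true models in this toy).
    scores = sigmoid(theta_r * actions) - lam * sigmoid(theta_c * actions)
    a = actions[np.argmax(scores)]
    violation = sigmoid(theta_c * a) - budget
    # Dual step: raise lambda when the constraint is violated, relax otherwise.
    lam = max(0.0, lam + eta * violation)

print(f"final action {a:.2f}, dual variable {lam:.3f}")
```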

AAAI 2025 · Conference Paper

DECIDER: Difference-aware Contrastive Diffusion Model with Adversarial Perturbations for Image Change Captioning

  • Guojin Zhong
  • Jinhong Hu
  • Jiajun Chen
  • Jin Yuan
  • Wenbo Pan

Image change captioning (ICC) poses great challenges stemming from describing subtle differences between two similar images in natural language, significantly increasing the complexity of feature extraction and cross-modal learning compared to the image captioning task. Existing ICC methods often suffer from two key challenges: 1) massive irrelevant information in uni-image features leads to suboptimal visual difference representations; 2) imprecise inter-modality correspondence degrades the quality of generated captions. This paper proposes a Difference-aware Contrastive Diffusion Model with Adversarial Perturbations (DECIDER) for ICC, motivated by the excellent performance of diffusion models in image and text generation. Technically, difference-aware cross-modal learning is developed to suppress irrelevant information and learn compact yet robust visual difference representations. This is achieved by optimizing a novel objective mathematically derived from the information bottleneck principle, which excels at filtering redundant features and highlighting differences. Furthermore, we propose to dynamically generate "hard" positive and negative samples via adversarial perturbations, which are involved in contrastive diffusion training with a tighter variational bound. This design encourages DECIDER to excavate and construct complex correspondences between visual differences and captions, thereby improving generalization performance. Extensive experiments on four datasets demonstrate that DECIDER significantly exceeds state-of-the-art performance.

AAAI 2025 · Conference Paper

MoE-LPR: Multilingual Extension of Large Language Models Through Mixture-of-Experts with Language Priors Routing

  • Hao Zhou
  • Zhijun Wang
  • Shujian Huang
  • Xin Huang
  • Xue Han
  • Junlan Feng
  • Chao Deng
  • Weihua Luo

Large Language Models (LLMs) are often English-centric due to the disproportionate distribution of languages in their pre-training data. Enhancing non-English language capabilities through post-pretraining often results in catastrophic forgetting of high-resource languages. Previous methods either achieve good expansion with severe forgetting or slight forgetting with poor expansion, indicating the challenge of balancing language expansion while preventing forgetting. In this paper, we propose a method called MoE-LPR (Mixture-of-Experts with Language Priors Routing) to alleviate this problem. MoE-LPR employs a two-stage training approach to enhance multilingual capability. First, the model is post-pretrained into a Mixture-of-Experts (MoE) architecture by upcycling, where all the original parameters are frozen and new experts are added. In this stage, we focus on improving the ability on expanded languages, without using any original-language data. Then, the model reviews the knowledge of the original languages with replay data amounting to less than 1% of the post-pretraining data, where we incorporate language priors routing to better recover the abilities of the original languages. Evaluations on multiple benchmarks show that MoE-LPR outperforms other post-pretraining methods. Freezing the original parameters preserves original-language knowledge, while adding new experts preserves the learning ability. Reviewing with LPR enables effective utilization of the multilingual knowledge within the parameters. Additionally, the MoE architecture maintains the same inference overhead while increasing the total model parameters. Extensive experiments demonstrate MoE-LPR's effectiveness in improving expanded languages and preserving original-language proficiency, with superior scalability.
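A minimal sketch of what language-priors routing over upcycled experts might look like: the original FFN is kept as a frozen expert, new experts are added, and a routing bias steers original-language tokens back to the frozen expert during review. The module names, bias scheme, and sizes are illustrative assumptions, not the released implementation.

```python
# Sketch of language-priors routing: expert 0 is the frozen original FFN,
# and a prior bias pushes original-language tokens toward it.
import torch
import torch.nn as nn

class MoELPRLayer(nn.Module):
    def __init__(self, dim, num_new_experts=2, prior_bias=2.0):
        super().__init__()
        self.frozen_expert = nn.Linear(dim, dim)   # stands in for the original FFN
        self.frozen_expert.requires_grad_(False)   # original parameters stay frozen
        self.new_experts = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_new_experts))
        self.router = nn.Linear(dim, 1 + num_new_experts)
        self.prior_bias = prior_bias

    def forward(self, x, is_original_language: bool):
        logits = self.router(x)
        if is_original_language:
            # Language prior: bias routing toward the frozen original expert.
            bias = torch.zeros_like(logits)
            bias[..., 0] = self.prior_bias
            logits = logits + bias
        weights = logits.softmax(dim=-1)
        outs = [self.frozen_expert(x)] + [e(x) for e in self.new_experts]
        return sum(w.unsqueeze(-1) * o
                   for w, o in zip(weights.unbind(-1), outs))

layer = MoELPRLayer(dim=16)
tokens = torch.randn(4, 10, 16)
print(layer(tokens, is_original_language=True).shape)  # torch.Size([4, 10, 16])
```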

AAAI 2024 · Conference Paper

A Hierarchical Network for Multimodal Document-Level Relation Extraction

  • Lingxing Kong
  • Jiuliang Wang
  • Zheng Ma
  • Qifeng Zhou
  • Jianbing Zhang
  • Liang He
  • Jiajun Chen

Document-level relation extraction aims to extract entity relations that span multiple sentences. This task faces two critical issues: long dependency and mention selection. Prior works address these problems from the textual perspective; however, it is hard to handle them solely based on text information. In this paper, we leverage video information to provide additional evidence for understanding long dependencies and to offer a wider perspective for identifying relevant mentions, giving rise to a new task named Multimodal Document-level Relation Extraction (MDocRE). To tackle this new task, we construct a human-annotated dataset including documents and relevant videos, which, to the best of our knowledge, is the first document-level relation extraction dataset equipped with video clips. We also propose a hierarchical framework to learn interactions between different dependency levels and a textual-guided transformer architecture that incorporates both textual and video modalities. In addition, we utilize a mention gate module to address the mention-selection problem in both modalities. Experiments on our proposed dataset show that 1) incorporating video information greatly improves model performance; 2) our hierarchical framework achieves state-of-the-art results compared with both unimodal and multimodal baselines; 3) by collaborating with video information, our model better solves the long-dependency and mention-selection problems.

AAAI 2023 · Conference Paper

CoP: Factual Inconsistency Detection by Controlling the Preference

  • Shuaijie She
  • Xiang Geng
  • Shujian Huang
  • Jiajun Chen

Abstractive summarization is the process of generating a summary given a document as input. Although significant progress has been made, factual inconsistency between the document and the generated summary still limits practical applications. Previous work found that the probabilities assigned by the generation model reflect its preferences for the generated summary, including the preference for factual consistency as well as the preference for the language or knowledge prior. To separate out the preference for factual consistency, we propose an unsupervised framework named CoP that controls the preference of the generation model with the help of a prompt. More specifically, the framework performs an extra inference step in which a text prompt is introduced as an additional input. In this way, another preference is described by the generation probability of this extra inference process. The difference between the two preferences, i.e., the difference between the probabilities, can be used as a measurement for detecting factual inconsistencies. Interestingly, we found that with a properly designed prompt, our framework can evaluate specific preferences and serve as a measurement for fine-grained categories of inconsistency, such as entity-related and coreference-related inconsistency. Moreover, our framework can be extended to the supervised setting to learn better prompts from labeled data. Experiments show that our framework achieves new SOTA results on three factual inconsistency detection tasks.
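The core measurement is simple to sketch: score the summary once conditioned on the document alone and once with a textual prompt added, and read the probability difference as an inconsistency signal. The sketch below uses a Hugging Face seq2seq model; the model choice and the prompt wording and placement are illustrative, not the paper's exact setup.

```python
# Sketch of the preference-difference idea: compare summary log-probability
# with and without an extra textual prompt on the input side.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "t5-small"  # any seq2seq summarizer works for this sketch
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name).eval()

def seq_logprob(source: str, target: str) -> float:
    enc = tok(source, return_tensors="pt")
    labels = tok(target, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**enc, labels=labels).loss  # mean NLL per target token
    return -loss.item() * labels.shape[1]        # total target log-probability

doc = "summarize: The company reported record profits in 2019."
summary = "The company reported record losses in 2019."
base = seq_logprob(doc, summary)
prompted = seq_logprob(doc + " Stay faithful to the source.", summary)
# A large preference shift between the two runs flags likely inconsistency.
print(f"preference difference: {prompted - base:.3f}")
```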

AAAI 2023 · Conference Paper

Denoising Pre-training for Machine Translation Quality Estimation with Curriculum Learning

  • Xiang Geng
  • Yu Zhang
  • Jiahuan Li
  • Shujian Huang
  • Hao Yang
  • Shimin Tao
  • Yimeng Chen
  • Ning Xie

Quality estimation (QE) aims to assess the quality of machine translations when reference translations are unavailable. QE plays a crucial role in many real-world applications of machine translation. Because labeled QE data are usually limited in scale, recent research, such as DirectQE, pre-trains QE models with pseudo QE data and obtains remarkable performance. However, there tends to be inevitable noise in the pseudo data, hindering models from learning QE accurately. Our study shows that the noise mainly comes from the differences between pseudo and real translation outputs. To handle this problem, we propose CLQE, a denoising pre-training framework for QE based on curriculum learning. More specifically, we propose to measure the degree of noise in the pseudo QE data with some metrics based on statistical or distributional features. With the guidance of these metrics, CLQE gradually pre-trains the QE model using data from cleaner to noisier. Experiments on various benchmarks reveal that CLQE outperforms DirectQE and other strong baselines. We also show that with our framework, pre-training converges faster than directly using the pseudo data. We make our CLQE code available (https://github.com/NJUNLP/njuqe).
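The curriculum itself reduces to ordering: score each pseudo example with a noise metric, then widen the training pool from the cleanest slice to the full noisy set. In the toy below, the noise scores and training step are placeholders for the paper's metrics and QE pre-training.

```python
# Toy sketch of a cleaner-to-noisier curriculum over pseudo QE data.
import numpy as np

rng = np.random.default_rng(2)
pseudo_data = [{"id": i, "noise": float(rng.random())} for i in range(1000)]

def train_one_stage(batch):
    pass  # stand-in for a QE pre-training step

# Sort from cleaner to noisier, then expand the training pool stage by stage.
ordered = sorted(pseudo_data, key=lambda ex: ex["noise"])
num_stages = 4
for stage in range(1, num_stages + 1):
    cutoff = int(len(ordered) * stage / num_stages)
    pool = ordered[:cutoff]
    train_one_stage(pool)
    print(f"stage {stage}: training on {len(pool)} cleanest examples")
```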

AAAI 2022 · Conference Paper

Non-parametric Online Learning from Human Feedback for Neural Machine Translation

  • Dongqi Wang
  • Haoran Wei
  • Zhirui Zhang
  • Shujian Huang
  • Jun Xie
  • Jiajun Chen

We study the problem of online learning with human feedback in human-in-the-loop machine translation, in which human translators revise machine-generated translations and the corrected translations are then used to improve the neural machine translation (NMT) system. Previous methods, however, require online model updating or additional translation memory networks to achieve high-quality performance, making them inflexible and inefficient in practice. In this paper, we propose a novel non-parametric online learning method that does not change the model structure. This approach introduces two k-nearest-neighbor (KNN) modules: one memorizes the human feedback, i.e., the correct sentences provided by human translators, while the other adaptively balances the use of this feedback history against the original NMT model. Experiments on the EMEA and JRC-Acquis benchmarks demonstrate that our method obtains substantial improvements in translation accuracy and achieves better adaptation performance with fewer repeated human correction operations.
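A numpy sketch of the non-parametric mechanism: corrected translations are stored as (decoder context, next token) pairs, and at decoding time the retrieved neighbors are interpolated with the NMT distribution. The fixed interpolation weight below stands in for the paper's second, adaptive balancing module; everything here is illustrative.

```python
# kNN datastore of human corrections, interpolated with an NMT distribution.
import numpy as np

dim, vocab = 8, 5
keys = np.empty((0, dim))        # datastore: decoder context vectors
values = np.empty((0,), int)     # datastore: corrected next tokens

def memorize(context, token):
    global keys, values
    keys = np.vstack([keys, context[None]])
    values = np.append(values, token)

def knn_distribution(context, k=2, temp=1.0):
    dists = np.linalg.norm(keys - context, axis=1)
    nn = np.argsort(dists)[:k]
    weights = np.exp(-dists[nn] / temp)
    probs = np.zeros(vocab)
    for idx, w in zip(values[nn], weights / weights.sum()):
        probs[idx] += w
    return probs

rng = np.random.default_rng(3)
for _ in range(10):              # simulate stored human corrections
    memorize(rng.normal(size=dim), rng.integers(vocab))

ctx = rng.normal(size=dim)
nmt_probs = np.full(vocab, 1.0 / vocab)     # stand-in NMT distribution
lam = 0.5                                   # fixed here; adaptive in the paper
final = (1 - lam) * nmt_probs + lam * knn_distribution(ctx)
print(final, final.sum())
```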

AAAI 2021 · Conference Paper

Automated Cross-prompt Scoring of Essay Traits

  • Robert Ridley
  • Liang He
  • Xin-yu Dai
  • Shujian Huang
  • Jiajun Chen

The majority of current research in Automated Essay Scoring (AES) focuses on prompt-specific scoring of either the overall quality of an essay or its quality with regard to certain traits. In real-world applications, obtaining labelled data for a target essay prompt is often expensive or infeasible, requiring the AES system to perform well when predicting scores for essays from unseen prompts. As a result, some recent research has been dedicated to cross-prompt AES. However, this line of research has thus far been concerned only with holistic, overall scoring, with no exploration of the scoring of different traits. As users of AES systems often require feedback on different aspects of their writing, trait scoring is a necessary component of an effective AES system. To address this need, we introduce a new task named Automated Cross-prompt Scoring of Essay Traits, which requires the model to be trained solely on non-target-prompt essays and to predict the holistic, overall score as well as scores for a number of specific traits for target-prompt essays. This task challenges the model's ability to generalize in order to score essays from a novel domain, as well as its ability to represent the quality of essays from multiple different aspects. In addition, we introduce a new, innovative approach that builds on top of a state-of-the-art method for cross-prompt AES. Our method utilizes a trait-attention mechanism and a multi-task architecture that leverages the relationships between traits to simultaneously predict the overall score and the score of each individual trait. We conduct extensive experiments on the widely used ASAP and ASAP++ datasets and demonstrate that our approach outperforms leading prompt-specific trait scoring and cross-prompt AES methods.

NeurIPS 2021 · Conference Paper

ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs

  • Zhanqiu Zhang
  • Jie Wang
  • Jiajun Chen
  • Shuiwang Ji
  • Feng Wu

Query embedding (QE), which aims to embed entities and first-order logical (FOL) queries in low-dimensional spaces, has shown great power in multi-hop reasoning over knowledge graphs. Recently, embedding entities and queries with geometric shapes has become a promising direction, as geometric shapes can naturally represent answer sets of queries and the logical relationships among them. However, existing geometry-based models have difficulty modeling queries with negation, which significantly limits their applicability. To address this challenge, we propose a novel query embedding model, Cone Embeddings (ConE), which is the first geometry-based QE model that can handle all FOL operations, including conjunction, disjunction, and negation. Specifically, ConE represents entities and queries as Cartesian products of two-dimensional cones, where the intersection and union of cones naturally model the conjunction and disjunction operations. By further noticing that the closure of the complement of a cone remains a cone, we design geometric complement operators in the embedding space for the negation operations. Experiments demonstrate that ConE significantly outperforms existing state-of-the-art methods on benchmark datasets.
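The closure-under-complement fact cited in the abstract can be made concrete with a simple parameterization of a 2D cone by an axis angle and an aperture; this is an illustrative formulation inferred from the abstract, not necessarily the paper's exact notation.

```latex
% A 2D cone as an arc of directions, parameterized by axis angle \alpha
% and aperture r:
\[
C(\alpha, r) \;=\; \{(\cos\theta, \sin\theta) : |\theta - \alpha| \le r/2\},
\qquad 0 \le r \le 2\pi .
\]
% The closure of its complement is again a cone: flip the axis and take the
% remaining arc, so negation stays inside the cone family.
\[
\overline{C(\alpha, r)^{\,c}} \;=\; C(\alpha + \pi,\; 2\pi - r).
\]
```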

AAAI 2021 · Conference Paper

DirectQE: Direct Pretraining for Machine Translation Quality Estimation

  • Qu Cui
  • Shujian Huang
  • Jiahuan Li
  • Xiang Geng
  • Zaixiang Zheng
  • Guoping Huang
  • Jiajun Chen

Machine Translation Quality Estimation (QE) is the task of predicting the quality of machine translations without relying on any reference. Recently, the predictor-estimator framework trains the predictor as a feature extractor, which leverages extra parallel corpora without QE labels, achieving promising QE performance. However, we argue that there are gaps between the predictor and the estimator in both data quality and training objectives, which prevent QE models from benefiting more directly from large parallel corpora. We propose a novel framework called DirectQE that provides direct pretraining for QE tasks. In DirectQE, a generator is trained to produce pseudo data that is closer to real QE data, and a detector is pretrained on these data with novel objectives that are akin to the QE task. Experiments on widely used benchmarks show that DirectQE outperforms existing methods without using any pretrained models such as BERT. We also give extensive analyses showing how fixing the two gaps contributes to our improvements.

NeurIPS 2021 · Conference Paper

Duplex Sequence-to-Sequence Learning for Reversible Machine Translation

  • Zaixiang Zheng
  • Hao Zhou
  • Shujian Huang
  • Jiajun Chen
  • Jingjing Xu
  • Lei Li

Sequence-to-sequence learning naturally has two directions. How can supervision signals from both directions be utilized effectively? Existing approaches either require two separate models or a multitask-learned model with inferior performance. In this paper, we propose REDER (Reversible Duplex Transformer), a parameter-efficient model, and apply it to machine translation. Either end of REDER can simultaneously input and output a distinct language; thus REDER enables reversible machine translation by simply flipping the input and output ends. Experiments verify that REDER achieves the first success of reversible machine translation and outperforms its multitask-trained baselines by up to 1.3 BLEU.

AAAI 2021 · Conference Paper

Topology-Aware Correlations Between Relations for Inductive Link Prediction in Knowledge Graphs

  • Jiajun Chen
  • Huarui He
  • Feng Wu
  • Jie Wang

Inductive link prediction—where entities during training and inference stages can be different—has been shown to be promising for completing continuously evolving knowledge graphs. Existing models of inductive reasoning mainly focus on predicting missing links by learning logical rules. However, many existing approaches do not take into account semantic correlations between relations, which are commonly seen in real-world knowledge graphs. To address this challenge, we propose a novel inductive reasoning approach, namely TACT, which can effectively exploit Topology-Aware CorrelaTions between relations in an entity-independent manner. TACT is inspired by the observation that the semantic correlation between two relations is highly correlated to their topological structure in knowledge graphs. Specifically, we categorize all relation pairs into several topological patterns, and then propose a Relational Correlation Network (RCN) to learn the importance of the different patterns for inductive link prediction. Experiments demonstrate that TACT can effectively model semantic correlations between relations, and significantly outperforms existing state-of-the-art methods on benchmark datasets for the inductive link prediction task.
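The topological-pattern idea can be illustrated by classifying how two directed edges share endpoints. The labels below are descriptive stand-ins and may not match the paper's exact taxonomy or pattern count.

```python
# Illustrative categorization of relation pairs by how their edges touch.
def relation_pair_pattern(edge1, edge2):
    """Each edge is (head, relation, tail); classify how edge1 and edge2 meet."""
    h1, _, t1 = edge1
    h2, _, t2 = edge2
    if h1 == h2 and t1 == t2:
        return "parallel"            # same endpoints, same direction
    if h1 == t2 and t1 == h2:
        return "loop"                # same endpoints, opposite direction
    if t1 == h2:
        return "tail-to-head"        # edge1 feeds into edge2
    if h1 == h2:
        return "head-to-head"        # shared source entity
    if t1 == t2:
        return "tail-to-tail"        # shared target entity
    if h1 == t2:
        return "head-to-tail"        # edge2 feeds into edge1
    return "disconnected"

print(relation_pair_pattern(("a", "r1", "b"), ("b", "r2", "c")))  # tail-to-head
print(relation_pair_pattern(("a", "r1", "b"), ("a", "r2", "c")))  # head-to-head
```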

AAAI 2020 · Conference Paper

Generating Diverse Translation by Manipulating Multi-Head Attention

  • Zewei Sun
  • Shujian Huang
  • Hao-Ran Wei
  • Xin-yu Dai
  • Jiajun Chen

The Transformer model (Vaswani et al. 2017) has been widely used in machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations brings a significant improvement in performance on translation tasks. An auxiliary experiment on a conversation response generation task confirms the effect of diversity as well.

AAAI 2020 · Conference Paper

GRET: Global Representation Enhanced Transformer

  • Rongxiang Weng
  • Haoran Wei
  • Shujian Huang
  • Heng Yu
  • Lidong Bing
  • Weihua Luo
  • Jiajun Chen

The Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words of the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence-level) information is seldom explored, leaving room for improvement in generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments on two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and the LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.

AAAI 2020 · Conference Paper

Latent Opinions Transfer Network for Target-Oriented Opinion Words Extraction

  • Zhen Wu
  • Fei Zhao
  • Xin-yu Dai
  • Shujian Huang
  • Jiajun Chen

Target-oriented opinion words extraction (TOWE) is a new subtask of aspect-based sentiment analysis (ABSA), which aims to extract the corresponding opinion words for a given opinion target in a sentence. Recently, neural network methods have been applied to this task and achieve promising results. However, the difficulty of annotation causes the datasets for TOWE to be insufficient, which heavily limits the performance of neural models. By contrast, abundant review sentiment classification data are easily available at online review sites. These reviews contain substantial latent opinion information and semantic patterns. In this paper, we propose a novel model to transfer this opinion knowledge from resource-rich review sentiment classification datasets to the low-resource TOWE task. To address the challenges in the transfer process, we design an effective transformation method to obtain latent opinions and then integrate them into TOWE. Extensive experimental results show that our model achieves better performance compared to other state-of-the-art methods and significantly outperforms the base model without transferred opinion knowledge. Further analysis validates the effectiveness of our model.

IJCAI 2020 · Conference Paper

Towards Making the Most of Context in Neural Machine Translation

  • Zaixiang Zheng
  • Xiang Yue
  • Shujian Huang
  • Jiajun Chen
  • Alexandra Birch

Document-level machine translation manages to outperform sentence-level models by a small margin but has failed to be widely adopted. We argue that previous research did not make clear use of the global context, and we propose a new document-level NMT framework that deliberately models the local context of each sentence with awareness of the global context of the document in both source and target languages. We specifically design the model to handle documents containing any number of sentences, including single sentences. This unified approach allows our model to be trained elegantly on standard datasets without needing to train on sentence- and document-level data separately. Experimental results demonstrate that our model outperforms Transformer baselines and previous document-level NMT models by substantial margins of up to 2.1 BLEU. We also provide analyses showing the benefit of context far beyond the neighboring two or three sentences that previous studies have typically incorporated.

IJCAI 2019 · Conference Paper

Correct-and-Memorize: Learning to Translate from Interactive Revisions

  • Rongxiang Weng
  • Hao Zhou
  • Shujian Huang
  • Lei Li
  • Yifan Xia
  • Jiajun Chen

State-of-the-art machine translation models are still not on a par with human translators. Previous work brings human interactions into the neural machine translation process to obtain improved results in target languages. However, not all model translation errors are equal: some are critical while others are minor. Meanwhile, the same translation mistakes occur repeatedly in similar contexts. To solve both issues, we propose CAMIT, a novel method for translating in an interactive environment. Our proposed method works with critical revision instructions and therefore allows humans to correct arbitrary words in model-translated sentences. In addition, CAMIT learns from and softly memorizes revision actions based on the context, alleviating the issue of repeated mistakes. Experiments in both ideal and real interactive translation settings demonstrate that CAMIT enhances machine translation results significantly while requiring fewer revision instructions from humans compared to previous methods.

IJCAI 2019 · Conference Paper

Utilizing Non-Parallel Text for Style Transfer by Making Partial Comparisons

  • Di Yin
  • Shujian Huang
  • Xin-yu Dai
  • Jiajun Chen

Text style transfer aims to rephrase a given sentence in a different style without changing its original content. Since parallel corpora (i.e., sentence pairs with the same content but different styles) are usually unavailable, most previous works guide the transfer process solely with distributional information, i.e., using style-related classifiers or language models. These methods neglect the correspondence of instances, leading to poor transfer performance, especially for content preservation. In this paper, we propose making partial comparisons to explicitly model the content and style correspondence of instances, respectively. To train the partial comparators, we propose methods to automatically extract partial-parallel training instances from the non-parallel data, and to further enhance the training process with data augmentation. We perform experiments comparing our method to other existing approaches on two review datasets. Both automatic and manual evaluations show that our approach can significantly improve the performance of existing adversarial methods, and it outperforms most state-of-the-art models. Our code and data will be available on GitHub.

AAAI 2018 · Conference Paper

Improving Review Representations With User Attention and Product Attention for Sentiment Classification

  • Zhen Wu
  • Xin-yu Dai
  • Cunyan Yin
  • Shujian Huang
  • Jiajun Chen

Neural network methods have achieved great success in review sentiment classification. Recently, some works achieved improvements by incorporating user and product information to generate a review representation. However, in reviews we observe that some words or sentences strongly reflect the user's preference, while others tend to indicate the product's characteristics. The two kinds of information play different roles in determining the sentiment label of a review, so it is not reasonable to encode user and product information together into one representation. In this paper, we propose a novel framework to encode user and product information. First, we apply two individual hierarchical neural networks to generate two representations, one with user attention and one with product attention. Then, we design a combined strategy to make full use of the two representations for training and final prediction. The experimental results show that our model clearly outperforms other state-of-the-art methods on the IMDB and Yelp datasets. Through visualization of attention over words related to the user or product, we validate the observation mentioned above.

JAIR 2017 · Journal Article

A Neural Probabilistic Structured-Prediction Method for Transition-Based Natural Language Processing

  • Hao Zhou
  • Yue Zhang
  • Chuan Cheng
  • Shujian Huang
  • Xinyu Dai
  • Jiajun Chen

We propose a neural probabilistic structured-prediction method for transition-based natural language processing, which integrates beam search and contrastive learning. The method uses a global optimization model that can leverage arbitrary features over non-local context. Beam search is used for efficient heuristic decoding, and contrastive learning is performed to adjust the model according to search errors. When evaluated on both chunking and dependency parsing tasks, the proposed method achieves significant accuracy improvements over locally normalized greedy baselines on both tasks.

IJCAI 2017 · Conference Paper

AGRA: An Analysis-Generation-Ranking Framework for Automatic Abbreviation from Paper Titles

  • Jianbing Zhang
  • Yixin Sun
  • Shujian Huang
  • Cam-Tu Nguyen
  • Xiaoliang Wang
  • Xinyu Dai
  • Jiajun Chen
  • Yang Yu

People sometimes choose word-like abbreviations to refer to items with long descriptions. These abbreviations usually come from the descriptive text of the item and are easy to remember and pronounce, while preserving the key idea of the item. Coming up with a good abbreviation is not an easy job, even for humans. Previous assistant naming systems compose names by applying hand-written rules, which may not perform well. In this paper, we propose viewing the naming task as an artificial intelligence problem and create a dataset in the domain of academic naming. To generate more refined names, we propose a three-step framework comprising description analysis, candidate generation, and abbreviation ranking, each of which is parameterized and optimizable. We conduct experiments comparing different settings of our framework with several analysis approaches from different perspectives. Compared to online and baseline systems, our framework achieves the best results.
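The three-step pipeline can be caricatured in a few lines: filter the title to content words (analysis), combine short prefixes (generation), and sort by a crude pronounceability score (ranking). The stopword list and vowel heuristic are toy stand-ins for the learned, optimizable components the paper describes.

```python
# Toy analysis/generation/ranking pipeline for title abbreviations.
from itertools import product

STOPWORDS = {"for", "of", "the", "a", "an", "and", "from", "with"}
VOWELS = set("aeiou")

def analyze(title):
    # Analysis: keep content words as abbreviation sources.
    return [w.lower() for w in title.split() if w.lower() not in STOPWORDS]

def generate(words, max_prefix=3):
    # Generation: combine short prefixes of the content words.
    prefixes = [[w[:k] for k in range(1, max_prefix + 1)] for w in words]
    return {"".join(parts) for parts in product(*prefixes)}

def rank(candidates):
    # Ranking: prefer short, vowel-containing ("pronounceable") candidates.
    score = lambda c: (-sum(ch in VOWELS for ch in c) / len(c), len(c))
    return sorted(candidates, key=score)

words = analyze("Automatic Generation of Research Abbreviations")
print(rank(generate(words))[:5])
```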

IJCAI 2017 · Conference Paper

Deep Matrix Factorization Models for Recommender Systems

  • Hong-Jian Xue
  • Xinyu Dai
  • Jianbing Zhang
  • Shujian Huang
  • Jiajun Chen

Recommender systems usually make personalized recommendations using user-item interaction ratings, implicit feedback, and auxiliary information. Matrix factorization is the basic approach, predicting a personalized ranking over a set of items for an individual user from the similarities among users and items. In this paper, we propose a novel matrix factorization model with a neural network architecture. First, we construct a user-item matrix with explicit ratings and non-preference implicit feedback. With this matrix as input, we present a deep structure learning architecture to learn a common low-dimensional space for the representations of users and items. Second, we design a new loss function based on binary cross entropy that considers both explicit ratings and implicit feedback for better optimization. The experimental results show the effectiveness of both our proposed model and the loss function. On several benchmark datasets, our model outperforms other state-of-the-art methods. We also conduct extensive experiments to evaluate performance under different experimental settings.
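A minimal sketch of the architecture as described: two MLP towers embed a user's interaction row and an item's interaction column into a shared space, scored by cosine similarity and trained with a cross-entropy loss on ratings normalized to [0, 1]. Layer sizes and training details are illustrative assumptions.

```python
# Two-tower deep matrix factorization sketch over a toy rating matrix.
import torch
import torch.nn as nn
import torch.nn.functional as F

num_users, num_items, max_rating = 50, 40, 5.0
ratings = torch.randint(0, 6, (num_users, num_items)).float()  # 0 = no rating

user_tower = nn.Sequential(nn.Linear(num_items, 32), nn.ReLU(), nn.Linear(32, 16))
item_tower = nn.Sequential(nn.Linear(num_users, 32), nn.ReLU(), nn.Linear(32, 16))
opt = torch.optim.Adam([*user_tower.parameters(), *item_tower.parameters()], lr=1e-3)

u, i = torch.randint(num_users, (64,)), torch.randint(num_items, (64,))
p = user_tower(ratings[u])            # user representation from its rating row
q = item_tower(ratings[:, i].T)       # item representation from its rating column
pred = F.cosine_similarity(p, q).clamp(min=1e-6)  # predicted interaction in (0, 1]
target = ratings[u, i] / max_rating   # explicit ratings normalized to [0, 1]
loss = F.binary_cross_entropy(pred, target)
loss.backward()
opt.step()
print(float(loss))
```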

IJCAI 2016 · Conference Paper

Tree-State Based Rule Selection Models for Hierarchical Phrase-Based Machine Translation

  • Shujian Huang
  • Huifeng Sun
  • Chengqi Zhao
  • Jinsong Su
  • Xin-yu Dai
  • Jiajun Chen

Hierarchical phrase-based translation systems (HPBs) perform translation using a synchronous context-free grammar which has only one unified non-terminal for every translation rule. While the unified non-terminal brings the freedom to generate translations with almost arbitrary structures, it also risks generating low-quality translations with wrong syntactic structures. In this paper, we propose tree-state models to discriminate between good and bad usages of translation rules based on the syntactic structure of the source sentence. We propose to use statistical models and context-dependent features to estimate the probability of each tree state for each translation rule and to penalize usages of rules that violate their tree states. Experimental results demonstrate that these simple models bring significant improvements to translation quality.