Arrow Research search

Author name cluster

Haifeng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

31 papers
2 author rows

Possible papers

31

AAAI Conference 2026 Conference Paper

BEE-RAG: Balanced Entropy Engineering for Retrieval-Augmented Generation

  • Yuhao Wang
  • Ruiyang Ren
  • Yucheng Wang
  • Jing Liu
  • Xin Zhao
  • Hua Wu
  • Haifeng Wang

With the rapid advancement of large language models (LLMs), retrieval-augmented generation (RAG) has emerged as a critical approach to supplement the inherent knowledge limitations of LLMs. However, due to the typically large volume of retrieved information, RAG tends to operate with long context lengths. From the perspective of entropy engineering, we identify unconstrained entropy growth and attention dilution due to long retrieval context as significant factors affecting RAG performance. In this paper, we propose the balanced entropy-engineered RAG (BEE-RAG) framework, which improves the adaptability of RAG systems to varying context lengths through the principle of entropy invariance. By leveraging balanced context entropy to reformulate attention dynamics, BEE-RAG separates attention sensitivity from context length, ensuring a stable entropy level. Building upon this, we introduce a zero-shot inference strategy for multi-importance estimation and a parameter-efficient adaptive fine-tuning mechanism to obtain the optimal balancing factor for different settings. Extensive experiments across multiple RAG tasks demonstrate the effectiveness of BEE-RAG.

JBHI Journal 2026 Journal Article

GPFD-Net: A Geometry-Pose Frequency Decoupling Network for Privacy-Preserving Human Action Recognition in Healthcare

  • Xing Li
  • Jingfan Liang
  • Ge Gao
  • Li Wang
  • Haifeng Wang
  • Shihao Han

Human Action Recognition (HAR) holds significant application value in healthcare informatics, facilitating tasks such as clinical diagnosis and rehabilitation monitoring. Point cloud sequences have emerged as a pivotal modality for balancing privacy preservation with high-fidelity geometric structural representation, ensuring anonymity while retaining critical 3D behavioral information. However, existing point cloud sequence encoding methods struggle to precisely encode micro-geometric details and macro-pose contours within the spatial dimension, as well as the dynamic heterogeneity of actions within the temporal dimension. These limitations impede the realization of high-precision clinical motion analysis. To address these challenges, we propose a Geometry-Pose Frequency Decoupling Network (GPFD-Net) for human action recognition. First, we design a Geometry-Pose Parallel-Collaborative Spatial Encoder (GPCSE). This module employs a parallel dual-stream architecture to explicitly capture and fuse complementary micro-geometric details and macro-pose contours, generating an informative geometry-enhanced pose feature sequence. Second, we introduce a Frequency-Decoupled Temporal Capturer (FDTC). This module adaptively decomposes the geometry-enhanced pose feature sequence into a smooth trend sequence and a transient detail sequence, which are subsequently processed by two parallel expert encoders via differentiated encoding to achieve robust human action recognition. Extensive experiments on four public benchmark datasets demonstrate that GPFD-Net achieves superior performance. The proposed method provides a novel paradigm for high-precision and privacy-preserving motion analysis in healthcare applications.

ICLR Conference 2025 Conference Paper

FlashMask: Efficient and Rich Mask Extension of FlashAttention

  • Guoxia Wang
  • Jinle Zeng
  • Xiyuan Xiao
  • Siming Wu
  • Jiabin Yang
  • Lujing Zheng
  • Zeyu Chen
  • Jiang Bian

The computational and memory demands of vanilla attention scale quadratically with the sequence length $N$, posing significant challenges for processing long sequences in Transformer models. FlashAttention alleviates these challenges by eliminating the $\mathcal{O}(N^2)$ memory dependency and reducing attention latency through IO-aware memory optimizations. However, its native support for certain attention mask types is limited, and it does not inherently accommodate more complex masking requirements. Previous approaches resort to using dense masks with $\mathcal{O}(N^2)$ memory complexity, leading to inefficiencies. In this paper, we propose FlashMask, an extension of FlashAttention that introduces a column-wise sparse representation of attention masks. This approach efficiently represents a wide range of mask types and facilitates the development of optimized kernel implementations. By adopting this novel representation, FlashMask achieves linear memory complexity $\mathcal{O}(N)$, making it suitable for modeling long-context sequences. Moreover, this representation enables kernel optimizations that eliminate unnecessary computations by leveraging sparsity in the attention mask, without sacrificing computational accuracy, resulting in higher computational efficiency. We evaluate FlashMask's performance in fine-tuning and alignment training of LLMs such as SFT, LoRA, DPO, and RM. FlashMask achieves significant throughput improvements, with end-to-end speedups ranging from 1.65x to 3.22x compared to the existing FlashAttention dense method. Additionally, our kernel-level comparisons demonstrate that FlashMask surpasses the latest counterpart, FlexAttention, by 12.1% to 60.7% in terms of kernel TFLOPs/s, achieving 37.8% to 62.3% of the theoretical maximum FLOPs/s on the A100 GPU.
The code is open-sourced on PaddlePaddle (https://github.com/PaddlePaddle/Paddle) and integrated into PaddleNLP (https://github.com/PaddlePaddle/PaddleNLP), supporting models with over 100 billion parameters for contexts extending up to 128K tokens.

JBHI Journal 2025 Journal Article

PEARL: Cascaded Self-Supervised Cross-Fusion Learning for Parallel MRI Acceleration

  • Qingyong Zhu
  • Bei Liu
  • Zhuo-Xu Cui
  • Chentao Cao
  • Xiaomeng Yan
  • Yuanyuan Liu
  • Jing Cheng
  • Yihang Zhou

Supervised deep learning (SDL) methodology holds promise for accelerated magnetic resonance imaging (AMRI) but is hampered by the reliance on extensive training data. Some self-supervised frameworks, such as deep image prior (DIP), have emerged, eliminating the explicit training procedure but often struggling to remove noise and artifacts under significant degradation. This work introduces a novel self-supervised accelerated parallel MRI approach called PEARL, leveraging a multiple-stream joint deep decoder with two cross-fusion schemes to accurately reconstruct one or more target images from compressively sampled k-space. Each stream comprises cascaded cross-fusion sub-block networks (SBNs) that sequentially perform combined upsampling, 2D convolution, joint attention, ReLU activation and batch normalization (BN). Among them, combined upsampling and joint attention facilitate mutual learning between multiple-stream networks by integrating multi-parameter priors in both additive and multiplicative manners. Long-range unified skip connections within SBNs ensure effective information propagation between distant cross-fusion layers. Additionally, incorporating dual-normalized edge-orientation similarity regularization into the training loss enhances detail reconstruction and prevents overfitting. Experimental results consistently demonstrate that PEARL outperforms the existing state-of-the-art (SOTA) self-supervised AMRI technologies in various MRI cases. Notably, 5-fold to 6-fold accelerated acquisition yields a 1%-2% improvement in SSIM_ROI and a 3%-6% improvement in PSNR_ROI, along with a significant 15%-20% reduction in RLNE_ROI.

NeurIPS Conference 2025 Conference Paper

Towards Large-Scale In-Context Reinforcement Learning by Meta-Training in Randomized Worlds

  • Fan Wang
  • Pengtao Shao
  • Yiming Zhang
  • Bo Yu
  • Shaoshan Liu
  • Ning Ding
  • Yang Cao
  • Yu Kang

In-Context Reinforcement Learning (ICRL) enables agents to learn automatically and on-the-fly from their interactive experiences. However, a major challenge in scaling up ICRL is the lack of scalable task collections. To address this, we propose procedurally generated tabular Markov Decision Processes, named AnyMDP. Through a carefully designed randomization process, AnyMDP is capable of generating high-quality tasks on a large scale while maintaining relatively low structural biases. To facilitate efficient meta-training at scale, we further introduce decoupled policy distillation and induce prior information in the ICRL framework. Our results demonstrate that, with a sufficiently large scale of AnyMDP tasks, the proposed model can generalize to tasks that were not considered in the training set through versatile in-context learning paradigms. The scalable task set provided by AnyMDP also enables a more thorough empirical investigation of the relationship between data distribution and ICRL performance. We further show that the generalization of ICRL potentially comes at the cost of increased task diversity and longer adaptation periods. This finding carries critical implications for scaling robust ICRL capabilities, highlighting the necessity of diverse and extensive task design, and prioritizing asymptotic performance over few-shot adaptation.

JBHI Journal 2024 Journal Article

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

  • Wenxin Wang
  • Zhuo-Xu Cui
  • Guanxun Cheng
  • Chentao Cao
  • Xi Xu
  • Ziwei Liu
  • Haifeng Wang
  • Yulong Qi

Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images, and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework, the Two-Stage Generative Model (TSGM), that combines Cycle Generative Adversarial Network (CycleGAN) and Variance Exploding stochastic differential equation using joint probability (VE-JP) to improve brain tumor detection and segmentation. The CycleGAN is trained on unpaired data to generate abnormal images from healthy images as a data prior. Then VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide, which alters only pathological regions but not healthy regions. Notably, our method directly learns the joint probability distribution for conditional generation. The residual between input and reconstructed images suggests the abnormalities, and a thresholding method is subsequently applied to obtain segmentation results. Furthermore, the multimodal results are weighted with different weights to further improve the segmentation accuracy. We validated our method on three datasets and compared it with other unsupervised methods for anomaly detection and segmentation. DSC scores of 0.8590 on the BraTS2020 dataset, 0.6226 on the ITCS dataset and 0.7403 on the in-house dataset show that our method achieves better segmentation performance and has better generalization.

JMLR Journal 2023 Journal Article

Implicit Regularization and Entrywise Convergence of Riemannian Optimization for Low Tucker-Rank Tensor Completion

  • Haifeng Wang
  • Jinchi Chen
  • Ke Wei

This paper is concerned with the low Tucker-rank tensor completion problem, which is about reconstructing a tensor $\mathcal{T}\in\mathbb{R}^{n\times n\times n}$ of low multilinear rank from partially observed entries. Riemannian optimization algorithms are a class of efficient methods for this problem, but the theoretical convergence analysis is still lacking. In this manuscript, we establish the entrywise convergence of the vanilla Riemannian gradient method for low Tucker-rank tensor completion under the nearly optimal sampling complexity $O(n^{3/2})$. Meanwhile, the implicit regularization phenomenon of the algorithm has also been revealed. As far as we know, this is the first work that has shown the entrywise convergence and implicit regularization property of a non-convex method for low Tucker-rank tensor completion. The analysis relies on the leave-one-out technique, and some of the technical results developed in the paper might be of broader interest in investigating the properties of other non-convex methods for this problem.

IJCAI Conference 2023 Conference Paper

Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation

  • Yanrui Du
  • Jing Yan
  • Yan Chen
  • Jing Liu
  • Sendong Zhao
  • Qiaoqiao She
  • Hua Wu
  • Haifeng Wang

Recent research has revealed that deep neural networks often take dataset biases as a shortcut to make decisions rather than understand tasks, leading to failures in real-world applications. In this study, we focus on the spurious correlation between word features and labels that models learn from the biased data distribution of training data. In particular, we define a word highly co-occurring with a specific label as a biased word, and an example containing a biased word as a biased example. Our analysis shows that biased examples are easier for models to learn, while at prediction time, biased words make a significantly higher contribution to the models' predictions, and models tend to assign predicted labels over-relying on the spurious correlation between words and labels. To mitigate models' over-reliance on the shortcut (i.e., the spurious correlation), we propose a training strategy, Less-Learn-Shortcut (LLS): our strategy quantifies the biased degree of the biased examples and down-weights them accordingly. Experimental results on Question Matching, Natural Language Inference and Sentiment Analysis tasks show that LLS is a task-agnostic strategy and can improve model performance on adversarial data while maintaining good performance on in-domain data.

TMLR Journal 2022 Journal Article

Evolving Decomposed Plasticity Rules for Information-Bottlenecked Meta-Learning

  • Fan Wang
  • Hao Tian
  • Haoyi Xiong
  • Hua Wu
  • Jie Fu
  • Yang Cao
  • Yu Kang
  • Haifeng Wang

Artificial neural networks (ANNs) are typically confined to accomplishing pre-defined tasks by learning a set of static parameters. In contrast, biological neural networks (BNNs) can adapt to various new tasks by continually updating the neural connections based on the inputs, which is aligned with the paradigm of learning effective learning rules in addition to static parameters, \textit{e.g.}, meta-learning. Among various biologically inspired learning rules, Hebbian plasticity updates the neural network weights using local signals without the guide of an explicit target function, thus enabling an agent to learn automatically without human efforts. However, typical plastic ANNs using a large amount of meta-parameters violate the nature of the genomics bottleneck and potentially deteriorate the generalization capacity. This work proposes a new learning paradigm decomposing those connection-dependent plasticity rules into neuron-dependent rules thus accommodating $\Theta(n^2)$ learnable parameters with only $\Theta(n)$ meta-parameters. We also thoroughly study the effect of different neural modulation on plasticity. Our algorithms are tested in challenging random 2D maze environments, where the agents have to use their past experiences to shape the neural connections and improve their performances for the future. The results of our experiment validate the following: 1. Plasticity can be adopted to continually update a randomly initialized RNN to surpass pre-trained, more sophisticated recurrent models, especially when it comes to long-term memorization. 2. Following the genomics bottleneck, the proposed decomposed plasticity can be comparable to or even more effective than canonical plasticity rules in some instances.

AAAI Conference 2022 Conference Paper

Is Discourse Role Important for Emotion Recognition in Conversation?

  • Donovan Ong
  • Jian Su
  • Bin Chen
  • Anh Tuan Luu
  • Ashok Narendranath
  • Yue Li
  • Shuqi Sun
  • Yingzhan Lin

A conversation is a sequence of utterances, where each utterance plays a specific discourse role while expressing a particular emotion. This paper proposes a novel method to exploit latent discourse role information of an utterance to determine the emotion it conveys in a conversation. Specifically, we use a variant of the Variational-Autoencoder (VAE) to model the context-aware latent discourse roles of each utterance in an unsupervised way. The latent discourse role representation further equips the utterance representation with a salient clue for more accurate emotion recognition. Our experiments show that our proposed method beats the best-reported performances on three public Emotion Recognition in Conversation datasets. This proves that the discourse role information of an utterance plays an important role in the emotion recognition task, which no previous work has studied.

AAAI Conference 2021 Conference Paper

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs

  • Fei Yu
  • Jiji Tang
  • Weichong Yin
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

We propose a knowledge-enhanced approach, ERNIE-ViL, which incorporates structured knowledge obtained from scene graphs to learn joint representations of vision-language. ERNIE-ViL tries to build the detailed semantic connections (objects, attributes of objects and relationships between objects) across vision and language, which are essential to vision-language cross-modal tasks. Utilizing scene graphs of visual scenes, ERNIE-ViL constructs Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction and Relationship Prediction tasks in the pre-training phase. Specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can learn the joint representations characterizing the alignments of the detailed semantics across vision and language. After pre-training on large-scale image-text aligned datasets, we validate the effectiveness of ERNIE-ViL on 5 cross-modal downstream tasks. ERNIE-ViL achieves state-of-the-art performances on all these tasks and ranks first on the VCR leaderboard with an absolute improvement of 3.7%.

IJCAI Conference 2020 Conference Paper

Enhancing Dialog Coherence with Event Graph Grounded Content Planning

  • Jun Xu
  • Zeyang Lei
  • Haifeng Wang
  • Zheng-Yu Niu
  • Hua Wu
  • Wanxiang Che

How to generate informative, coherent and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness with little attention to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose to leverage event chains to help determine a sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them as a graph. We then present a novel event graph grounded Reinforcement Learning (RL) framework. It conducts high-level response content (simply an event) planning by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision making mechanism to foster a coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results indicate the effectiveness of this framework in terms of dialog coherence and informativeness.

AAAI Conference 2020 Conference Paper

ERNIE 2.0: A Continual Pre-Training Framework for Language Understanding

  • Yu Sun
  • Shuohuan Wang
  • Yukun Li
  • Shikun Feng
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Recently pre-trained models have achieved state-of-the-art results in various language understanding tasks. Current pre-training procedures usually focus on training the model with several simple tasks to grasp the co-occurrence of words or sentences. However, besides co-occurring information, there exists other valuable lexical, syntactic and semantic information in training corpora, such as named entities, semantic closeness and discourse relations. In order to extract the lexical, syntactic and semantic information from training corpora, we propose a continual pre-training framework named ERNIE 2.0 which incrementally builds pre-training tasks and then learns pre-trained models on these constructed tasks via continual multi-task learning. Based on this framework, we construct several tasks and train the ERNIE 2.0 model to capture lexical, syntactic and semantic aspects of information in the training data. Experimental results demonstrate that the ERNIE 2.0 model outperforms BERT and XLNet on 16 tasks including English tasks on GLUE benchmarks and several similar tasks in Chinese. The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE.

IJCAI Conference 2020 Conference Paper

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

  • Dongling Xiao
  • Han Zhang
  • Yukun Li
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced multi-flow sequence-to-sequence pre-training and fine-tuning framework named ERNIE-GEN, which bridges the discrepancy between training and inference with an infilling generation mechanism and a noise-aware generation method. To make generation closer to human writing patterns, this framework introduces a span-by-span generation flow that trains the model to predict semantically-complete spans consecutively rather than predicting word by word. Unlike existing pre-training methods, ERNIE-GEN incorporates multi-granularity target sampling to construct pre-training data, which enhances the correlation between encoder and decoder. Experimental results demonstrate that ERNIE-GEN achieves state-of-the-art results with a much smaller amount of pre-training data and parameters on a range of language generation tasks, including abstractive summarization (Gigaword and CNN/DailyMail), question generation (SQuAD), dialogue generation (Persona-Chat) and generative question answering (CoQA). The source codes and pre-trained models have been released at https://github.com/PaddlePaddle/ERNIE/ernie-gen.

AAAI Conference 2020 Conference Paper

Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation

  • Jun Xu
  • Haifeng Wang
  • Zhengyu Niu
  • Hua Wu
  • Wanxiang Che

Previous neural models on open-domain conversation generation have no effective mechanisms to manage chatting topics, and tend to produce less coherent dialogs. Inspired by the strategies in human-human dialogs, we divide the task of multi-turn open-domain conversation generation into two sub-tasks: explicit goal (chatting about a topic) sequence planning and goal completion by topic elaboration. To this end, we propose a three-layer Knowledge aware Hierarchical Reinforcement Learning based Model (KnowHRL). Specifically, for the first sub-task, the upper-layer policy learns to traverse a knowledge graph (KG) in order to plan a high-level goal sequence towards a good balance between dialog coherence and topic consistency with user interests. For the second sub-task, the middle-layer policy and the lower-layer one work together to produce an in-depth multi-turn conversation about a single topic with a goal-driven generation mechanism. The capability of goal-sequence planning enables chatbots to conduct proactive open-domain conversations towards recommended topics, which has many practical applications. Experiments demonstrate that our model outperforms state-of-the-art baselines in terms of user-interest consistency, dialog coherence, and knowledge accuracy.

TIST Journal 2020 Journal Article

Multi-Task Learning for Entity Recommendation and Document Ranking in Web Search

  • Jizhou Huang
  • Haifeng Wang
  • Wei Zhang
  • Ting Liu

Entity recommendation, providing users with an improved search experience by proactively recommending related entities to a given query, has become an indispensable feature of today’s Web search engine. Existing studies typically only consider the query issued at the current timestep while ignoring the in-session user search behavior (short-term search history) or historical user search behavior across all sessions (long-term search history) when generating entity recommendations. As a consequence, they may fail to recommend entities of interest relevant to a user’s actual information need. In this work, we believe that both short-term and long-term search history convey valuable evidence that could help understand the user’s search intent behind a query, and take both of them into consideration for entity recommendation. Furthermore, there has been little work on exploring whether the use of other companion tasks in Web search such as document ranking as auxiliary tasks could improve the performance of entity recommendation. To this end, we propose a multi-task learning framework with deep neural networks (DNNs) to jointly learn and optimize two companion tasks in Web search engines: entity recommendation and document ranking, which can be easily trained in an end-to-end manner. Specifically, we regard document ranking as an auxiliary task to improve the main task of entity recommendation, where the representations of queries, sessions, and users are shared across all tasks and optimized by the multi-task objective during training. We evaluate our approach using large-scale, real-world search logs of a widely-used commercial Web search engine. We also performed extensive ablation experiments over a number of facets of the proposed multi-task DNN model to figure out their relative importance. 
The experimental results show that both short-term and long-term search history can bring significant improvements in recommendation effectiveness, and the combination of both outperforms using either of them individually. In addition, the experiments show that the performance of both entity recommendation and document ranking can be significantly improved, which demonstrates the effectiveness of using multi-task learning to jointly optimize the two companion tasks in Web search.

AAAI Conference 2020 Conference Paper

Synchronous Speech Recognition and Speech-to-Text Translation with Interactive Decoding

  • Yuchen Liu
  • Jiajun Zhang
  • Hao Xiong
  • Long Zhou
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang
  • Chengqing Zong

Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential benefits of lower latency, smaller model size, and less error propagation. However, it is notoriously difficult to implement such a model without transcriptions as an intermediate. Existing works generally apply multi-task learning to improve translation quality by jointly training end-to-end ST along with automatic speech recognition (ASR). However, different tasks in this method cannot utilize information from each other, which limits the improvement. Other works propose a two-stage model where the second model can use the hidden state from the first one, but its cascade manner greatly affects the efficiency of the training and inference process. In this paper, we propose a novel interactive attention mechanism which enables ASR and ST to perform synchronously and interactively in a single model. Specifically, the generation of transcriptions and translations not only relies on its previous outputs but also the outputs predicted in the other task. Experiments on TED speech translation corpora have shown that our proposed model can outperform strong baselines on the quality of speech translation and achieve better speech recognition performances as well.

AAAI Conference 2019 Conference Paper

Joint Extraction of Entities and Overlapping Relations Using Position-Attentive Sequence Labeling

  • Dai Dai
  • Xinyan Xiao
  • Yajuan Lyu
  • Shan Dou
  • Qiaoqiao She
  • Haifeng Wang

Joint entity and relation extraction detects entities and relations using a single model. In this paper, we present a novel unified joint extraction model which directly tags entity and relation labels according to a query word position p, i.e., detecting the entity at p, and identifying entities at other positions that have a relationship with the former. To this end, we first design a tagging scheme to generate n tag sequences for an n-word sentence. Then a position-attention mechanism is introduced to produce different sentence representations for every query position to model these n tag sequences. In this way, our method can simultaneously extract all entities and their types, as well as all overlapping relations. Experiment results show that our framework performs significantly better on extracting overlapping relations as well as detecting long-range relations, and thus we achieve state-of-the-art performance on two public datasets.

AAAI Conference 2019 Conference Paper

Modeling Coherence for Discourse Neural Machine Translation

  • Hao Xiong
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang

Discourse coherence plays an important role in the translation of a text. However, most previously reported models focus on improving performance over individual sentences while ignoring cross-sentence links and dependencies, which affects the coherence of the text. In this paper, we propose to use discourse context and reward to refine the translation quality from the discourse perspective. In particular, we generate the translation of individual sentences at first. Next, we deliberate the preliminarily produced translations, and train the model to learn the policy that produces discourse coherent text by a reward teacher. Practical results on multiple discourse test datasets indicate that our model significantly improves the translation quality over the state-of-the-art baseline system by +1.23 BLEU score. Moreover, our model generates more discourse coherent text and obtains +2.2 BLEU improvements when evaluated by discourse metrics.

IJCAI Conference 2018 Conference Paper

Improving Entity Recommendation with Search Log and Multi-Task Learning

  • Jizhou Huang
  • Wei Zhang
  • Yaming Sun
  • Haifeng Wang
  • Ting Liu

Entity recommendation, providing search users with an improved experience by assisting them in finding related entities for a given query, has become an indispensable feature of today's Web search engine. Existing studies typically only consider the query issued at the current time step while ignoring the in-session preceding queries. Thus, they typically fail to handle ambiguous queries such as "apple" because the model could not understand which apple (the company or the fruit) is being talked about. In this work, we believe that the in-session contexts convey valuable evidence that could facilitate the semantic modeling of queries, and take that into consideration for entity recommendation. Furthermore, in order to better model the semantics of queries, we learn the model in a multi-task learning setting where the query representation is shared across entity recommendation and context-aware ranking. We evaluate our approach using large-scale, real-world search logs of a widely used commercial Web search engine. The experimental results show that incorporating context information significantly improves entity recommendation, and learning the model in a multi-task learning setting could bring further improvements.

IJCAI Conference 2017 Conference Paper

Learning to Explain Entity Relationships by Pairwise Ranking with Convolutional Neural Networks

  • Jizhou Huang
  • Wei Zhang
  • Shiqi Zhao
  • Shiqiang Ding
  • Haifeng Wang

Providing a plausible explanation for the relationship between two related entities is an important task in some applications of knowledge graphs, such as search engines. However, most existing methods require a large amount of manually labeled training data, which makes them impractical for large-scale knowledge graphs due to expensive data annotation. In addition, these methods typically rely on costly handcrafted features. In this paper, we propose an effective pairwise ranking model that leverages the clickthrough data of a Web search engine to address these two problems. We first construct large-scale training data from the query-title pairs derived from the clickthrough data. Then, we build a pairwise ranking model that employs a convolutional neural network to automatically learn relevant features. The proposed model can be easily trained with backpropagation to perform the ranking task. The experiments show that our method significantly outperforms several strong baselines.
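
The pairwise ranking objective can be illustrated independently of the CNN feature extractor. A hinge-style pairwise loss is one common choice for this kind of model; the paper's exact loss may differ, and the scores below are placeholders for what the network would assign to (query, explanation) candidates.

```python
import numpy as np

def pairwise_hinge_loss(score_pos, score_neg, margin=1.0):
    """Pairwise ranking loss: penalize the model whenever the relevant
    (clicked) candidate fails to outscore the irrelevant one by `margin`."""
    return np.maximum(0.0, margin - (score_pos - score_neg))

# Hypothetical CNN scores for a clicked vs. non-clicked candidate.
print(pairwise_hinge_loss(2.5, 0.7))  # well separated -> 0.0
print(pairwise_hinge_loss(1.0, 0.8))  # inside the margin -> 0.8
```

Training on such pairs only requires knowing which of two candidates is better (derivable from clicks), not absolute relevance labels, which is what makes clickthrough data usable as weak supervision.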

JAIR Journal 2016 Journal Article

A Distributed Representation-Based Framework for Cross-Lingual Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

This paper investigates the problem of cross-lingual transfer parsing, aiming at inducing dependency parsers for low-resource languages while using only training data from a resource-rich language (e.g., English). Existing model transfer approaches typically exclude lexical features, since they are not directly transferable across languages. In this paper, we bridge the lexical feature gap by using distributed feature representations and their composition. We provide two algorithms for inducing cross-lingual distributed representations of words, which map vocabularies from two different languages into a common vector space. Consequently, both lexical features and non-lexical features can be used in our model for cross-lingual transfer. Furthermore, our framework is flexible enough to incorporate additional useful features such as cross-lingual word clusters. Our combined contributions achieve an average relative error reduction of 10.9% in labeled attachment score as compared with the delexicalized parser, trained on English universal treebank and transferred to three other languages. It also significantly outperforms state-of-the-art delexicalized models augmented with projected cluster features on identical data. Finally, we demonstrate that our models can be further boosted with minimal supervision (e.g., 100 annotated sentences) from target languages, which is of great significance for practical usage.
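
One standard way to map two monolingual embedding spaces into a common vector space is to fit a linear projection on a seed translation dictionary by least squares. The paper's two induction algorithms differ from this, so the snippet below is only an illustrative baseline on synthetic data, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy monolingual embeddings for 50 "words" in each language, dim 4.
src = rng.normal(size=(50, 4))
# Pretend the target space is a linear transform of the source plus noise.
true_map = rng.normal(size=(4, 4))
tgt = src @ true_map + 0.01 * rng.normal(size=(50, 4))

# Seed dictionary: the first 30 word pairs are known translations.
# Solve src[:30] @ W ~= tgt[:30] in the least-squares sense.
W, *_ = np.linalg.lstsq(src[:30], tgt[:30], rcond=None)

# Project the remaining source words into the target space.
projected = src[30:] @ W
err = np.linalg.norm(projected - tgt[30:]) / np.linalg.norm(tgt[30:])
print(round(err, 3))  # small relative error
```

Once both vocabularies live in one space, a parser trained with lexical features in the source language can score target-language words directly, which is exactly the lexical gap the paper addresses.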

AAAI Conference 2016 Conference Paper

A Representation Learning Framework for Multi-Source Transfer Parsing

  • Jiang Guo
  • Wanxiang Che
  • David Yarowsky
  • Haifeng Wang
  • Ting Liu

Cross-lingual model transfer has been a promising approach for inducing dependency parsers for low-resource languages where annotated treebanks are not available. The major obstacles for the model transfer approach are two-fold: (1) lexical features are not directly transferable across languages; (2) target language-specific syntactic structures are difficult to recover. To address these two challenges, we present a novel representation learning framework for multi-source transfer parsing. Our framework allows multi-source transfer parsing to use full lexical features straightforwardly. By evaluating on the Google universal dependency treebanks (v2.0), our best models yield an absolute improvement of 6.53% in averaged labeled attachment score, as compared with delexicalized multi-source transfer models. We also significantly outperform the most recently proposed state-of-the-art transfer system.

IJCAI Conference 2016 Conference Paper

Generating Recommendation Evidence Using Translation Model

  • Jizhou Huang
  • Shiqi Zhao
  • Shiqiang Ding
  • Haiyang Wu
  • Mingming Sun
  • Haifeng Wang

Entity recommendation, providing entity suggestions relevant to the query that a user is searching for, has become a key feature of today's web search engine. Although related entities are relevant to users' search queries, users sometimes cannot easily understand the recommended entities without supporting evidence. This paper proposes a statistical model consisting of four sub-models to generate evidence for entities, which can help users better understand each recommended entity and figure out the connections between the recommended entities and a given query. The experiments show that our method is domain independent and can generate catchy and interesting evidence in the application of entity recommendation.

AAAI Conference 2016 Conference Paper

Improved Neural Machine Translation with SMT Features

  • Wei He
  • Zhongjun He
  • Hua Wu
  • Haifeng Wang

Neural machine translation (NMT) conducts end-to-end translation with a source language encoder and a target language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to restrict its vocabulary to a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all source words are translated, and it usually favors short translations, resulting in fluent but inadequate output. To solve these problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, into the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of a state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets.
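
A log-linear framework scores each candidate translation as a weighted sum of log feature values. The feature names, probabilities, and weights below are illustrative, not taken from the paper; they only show how NMT and SMT signals can be combined into one ranking score.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model: score = sum_i lambda_i * log f_i(candidate)."""
    return sum(weights[name] * math.log(features[name]) for name in weights)

# Hypothetical feature values for two candidate translations:
# NMT probability, SMT translation-model probability, n-gram LM probability.
candidates = {
    "cand_a": {"nmt": 0.30, "tm": 0.20, "lm": 0.10},
    "cand_b": {"nmt": 0.25, "tm": 0.35, "lm": 0.30},
}
weights = {"nmt": 1.0, "tm": 0.5, "lm": 0.5}

best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))
print(best)
```

Here the SMT features outweigh a small NMT advantage, so the second candidate wins; tuning the lambda weights is what balances fluency (NMT, LM) against adequacy (translation model).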

TIST Journal 2011 Journal Article

Two-Word Collocation Extraction Using Monolingual Word Alignment Method

  • Zhanyi Liu
  • Haifeng Wang
  • Hua Wu
  • Sheng Li

Statistical bilingual word alignment has been well studied in the field of machine translation. This article adapts the bilingual word alignment algorithm to a monolingual scenario to extract collocations from a monolingual corpus, based on the observation that the words in a collocation tend to co-occur in similar contexts, as in bilingual word alignment. First, the monolingual corpus is replicated to generate a parallel corpus, in which each sentence pair consists of two identical sentences. Next, the monolingual word alignment algorithm is employed to align potentially collocated words. Finally, the aligned word pairs are ranked according to their alignment scores, and candidates with higher scores are extracted as collocations. We conducted experiments on Chinese and English corpora respectively. Compared with previous approaches that use association measures to extract collocations from co-occurring word pairs within a given window, our method achieves higher precision and recall. According to human evaluation, our method achieves precisions of 62% on a Chinese corpus and 64% on an English corpus. In particular, we can extract collocations with longer spans, achieving an even higher precision of 83% on long-span (>6 words) Chinese collocations.
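
The three-stage pipeline (replicate, align, rank) can be sketched on a toy corpus. For brevity, a plain co-occurrence count stands in for the monolingual word alignment model in stage 2, even though that alignment model, rather than a simple association score, is the paper's actual contribution.

```python
from collections import Counter
from itertools import combinations

corpus = [
    "strong tea tastes good",
    "he drinks strong tea",
    "strong tea with milk",
    "good milk",
]

# Stage 1: replicate the corpus into "parallel" identical sentence pairs.
parallel = [(s, s) for s in corpus]

# Stage 2: score candidate word pairs. A real implementation would run
# the monolingual word aligner over `parallel`; here a co-occurrence
# count within each sentence is a stand-in for the alignment score.
scores = Counter()
for left, _right in parallel:
    words = sorted(set(left.split()))
    for a, b in combinations(words, 2):
        scores[(a, b)] += 1

# Stage 3: rank pairs by score and extract the top ones as collocations.
top = scores.most_common(2)
print(top)
```

On this corpus the pair ("strong", "tea") co-occurs most often and is extracted first; the alignment-based score in the paper additionally uses context similarity rather than raw counts, which is what lets it handle long-span collocations.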

ICRA Conference 2007 Conference Paper

Optimal Multiperiod Inventory Decisions with Partially Observed Markovian Supply Information

  • Haifeng Wang
  • Houmin Yan

This paper considers a multiperiod newsvendor problem with partially observed supply-capacity information that evolves as a Markovian process. The supply capacity is fully observed by the buyer when the capacity is smaller than the buyer's order quantity. Otherwise, the buyer only knows that the current-period supply capacity is no smaller than its order quantity. Based on these two types of observations, the buyer updates its forecast of future supply capacity accordingly. With a dynamic programming formulation, we prove the existence of a unique optimal ordering policy.
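
The flavor of such a model can be illustrated with a minimal finite-horizon dynamic program. For simplicity, the sketch below treats the supply capacity as fully observed and i.i.d., whereas the paper handles partially observed Markovian capacity and belief updating; all parameter values are made up for illustration.

```python
import itertools

# Toy parameters (illustrative, not from the paper).
T = 3                      # planning horizon
c, h, p = 1.0, 0.5, 4.0    # unit purchase, holding, shortage costs
demand = {0: 0.3, 1: 0.4, 2: 0.3}     # demand distribution
capacity = {1: 0.5, 3: 0.5}           # i.i.d. supply-capacity distribution
states = range(0, 6)                  # discretized inventory levels
orders = range(0, 4)                  # candidate order quantities

V = {t: {} for t in range(T + 1)}
V[T] = {x: 0.0 for x in states}       # terminal value: no salvage/penalty

for t in reversed(range(T)):
    for x in states:
        best = float("inf")
        for q in orders:
            exp_cost = 0.0
            for (d, pd), (cap, pc) in itertools.product(demand.items(),
                                                        capacity.items()):
                recv = min(q, cap)            # capacity truncates the order
                nxt = max(0, min(x + recv - d, max(states)))
                stage = (c * recv + h * max(0, x + recv - d)
                         + p * max(0, d - x - recv))
                exp_cost += pd * pc * (stage + V[t + 1][nxt])
            best = min(best, exp_cost)
        V[t][x] = best

print(round(V[0][0], 3))  # expected optimal cost starting with no inventory
```

In the paper's partially observed setting, the state would additionally include a belief over the Markovian capacity process, updated differently depending on whether the order was cut short (capacity observed exactly) or filled in full (capacity only bounded below).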