Arrow Research search

Author name cluster

Ming Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

52 papers
2 author rows

Possible papers

ICML Conference 2025 Conference Paper

Efficient Skill Discovery via Regret-Aware Optimization

  • He Zhang 0030
  • Ming Zhou
  • Shaopeng Zhai
  • Ying Sun 0006
  • Hui Xiong 0001

Unsupervised skill discovery aims to learn diverse and distinguishable behaviors in open-ended reinforcement learning. Existing methods focus on improving diversity through pure exploration, mutual-information optimization, and temporal representation learning. Although they perform well on exploration, they remain limited in efficiency, especially in high-dimensional settings. In this work, we frame skill discovery as a min-max game between skill generation and policy learning, and propose a regret-aware method on top of temporal representation learning that expands the discovered skill space along the direction of upgradable policy strength. The key insight behind the proposed method is that skill discovery is adversarial to policy learning, i.e., skills with weak strength should be explored further, while skills whose strength has converged need less exploration. As an implementation, we score the degree of strength convergence with regret and guide skill discovery with a learnable skill generator. To avoid degeneration, skills are generated by an upgradable population of skill generators. We conduct experiments on environments of varying complexity and dimensionality. Empirical results show that our method outperforms baselines on both efficiency and diversity. Moreover, our method achieves a 15% zero-shot improvement on high-dimensional environments compared to existing methods.

AAAI Conference 2025 Conference Paper

Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

  • Guangyuan Ma
  • Yongliang Ma
  • Xing Wu
  • Zhenpeng Su
  • Ming Zhou
  • Songlin Hu

Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the distribution of its training data has received little attention. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably lead to sub-optimal retrieval performance. In this paper, we propose a new task-level Distributionally Robust Optimization (tDRO) algorithm for LLM-DR fine-tuning, targeted at improving the universal domain generalization ability by end-to-end reweighting the data distribution of each task. tDRO parameterizes the domain weights and updates them with scaled domain gradients. The optimized weights are then transferred to LLM-DR fine-tuning to train more robust retrievers. Experiments with a series of different-sized LLM-DR models show improvements on large-scale retrieval benchmarks and up to a 30% reduction in dataset usage after applying our optimization algorithm.
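The task-level reweighting can be sketched as an exponentiated-gradient step on the simplex of task weights. This is a minimal sketch under our own simplifying assumptions (per-task scalar losses and a toy learning rate); the paper's actual update uses scaled domain gradients inside LLM-DR fine-tuning:

```python
import math

def update_task_weights(weights, task_losses, lr=0.1):
    """One exponentiated-gradient step: up-weight tasks whose current
    loss is high, then re-normalize back onto the probability simplex.
    (Hypothetical simplification of the tDRO update.)"""
    logits = [math.log(w) + lr * l for w, l in zip(weights, task_losses)]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

w = [1 / 3] * 3                            # start uniform over 3 tasks
w = update_task_weights(w, [0.2, 0.9, 0.5])
# the hardest task (loss 0.9) now carries the largest weight
```

The updated weights would then drive the sampling ratio of each fine-tuning collection.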

AAAI Conference 2024 Conference Paper

HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

  • Kun Yan
  • Lei Ji
  • Chenfei Wu
  • Jian Liang
  • Ming Zhou
  • Nan Duan
  • Shuai Ma

Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds. Nevertheless, contemporary panoramic synthesis techniques grapple with the challenge of semantically guiding the content generation process. Although recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, a direct application of these methods to panorama synthesis yields distorted content. In this study, we unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling. Our pioneering approach empowers users with semantic control, harnessing both image and text inputs, while concurrently streamlining the generation of high-resolution panoramas using parallel decoding. We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics. Our research elevates the controllability, efficiency, and fidelity of panorama synthesis to new levels.

JMLR Journal 2023 Journal Article

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

  • Ming Zhou
  • Ziyu Wan
  • Hanjing Wang
  • Muning Wen
  • Runzhe Wu
  • Ying Wen
  • Yaodong Yang
  • Yong Yu

Population-based multi-agent reinforcement learning (PB-MARL) encompasses a range of methods that merge dynamic population selection with multi-agent reinforcement learning (MARL) algorithms. While PB-MARL has demonstrated notable achievements in complex multi-agent tasks, its sequential execution is plagued by low computational efficiency due to the diversity in computing patterns and policy combinations. We propose a solution involving a stateless central task dispatcher and stateful workers to handle PB-MARL's subroutines, thereby capitalizing on parallelism across various components for efficient problem-solving. In line with this approach, we introduce MALib, a parallel framework that incorporates a task control model, independent data servers, and an abstraction of MARL training paradigms. The framework has undergone extensive testing and is available under the MIT license (https://github.com/sjtu-marl/malib).

IJCAI Conference 2022 Conference Paper

Reasoning over Hybrid Chain for Table-and-Text Open Domain Question Answering

  • Wanjun Zhong
  • Junjie Huang
  • Qian Liu
  • Ming Zhou
  • Jiahai Wang
  • Jian Yin
  • Nan Duan

Tabular and textual question answering requires systems to reason over heterogeneous information, considering table structure and the connections between table and text. In this paper, we propose a ChAin-centric Reasoning and Pre-training framework (CARP). CARP utilizes a hybrid chain to model the explicit intermediate reasoning process across table and text for question answering. We also propose a novel chain-centric pre-training method to enhance the pre-trained model in identifying the cross-modality reasoning process and alleviating the data-sparsity problem. This method constructs a large-scale reasoning corpus by synthesizing pseudo heterogeneous reasoning paths from Wikipedia and generating corresponding questions. We evaluate our system on OTT-QA, a large-scale table-and-text open-domain question answering benchmark, where it achieves state-of-the-art performance. Further analyses illustrate that the explicit hybrid chain offers substantial performance improvement and interpretability of the intermediate reasoning process, and that the chain-centric pre-training boosts performance on chain extraction.

NeurIPS Conference 2021 Conference Paper

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation

  • Shuai Lu
  • Daya Guo
  • Shuo Ren
  • Junjie Huang
  • Alexey Svyatkovskiy
  • Ambrosio Blanco
  • Colin Clement
  • Dawn Drain

Benchmark datasets have a significant impact on accelerating research in programming language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster machine learning research for program understanding and generation. CodeXGLUE includes a collection of 10 tasks across 14 datasets and a platform for model evaluation and comparison. CodeXGLUE also features three baseline systems, including the BERT-style, GPT-style, and Encoder-Decoder models, to make it easy for researchers to use the platform. The availability of such data and baselines can help the development and validation of new methods that can be applied to various program understanding and generation problems.

IJCAI Conference 2021 Conference Paper

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

  • Weinan Zhang
  • Xihuai Wang
  • Jian Shen
  • Ming Zhou

This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of the return discrepancy upper bound. To reduce this upper bound, with the intention of low sample complexity throughout the learning process, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the theoretical convergence of AORPO under reasonable assumptions. Empirical experiments on competitive and cooperative tasks demonstrate that AORPO achieves improved sample efficiency with asymptotic performance comparable to the compared MARL methods.

ICRA Conference 2021 Conference Paper

Vanishing Point Aided LiDAR-Visual-Inertial Estimator

  • Peng Wang
  • Zheng Fang 0001
  • Shibo Zhao
  • Yongnan Chen
  • Ming Zhou
  • Shan An

In this paper, we propose a vanishing-point-aided LiDAR-Visual-Inertial estimator to achieve real-time, low-drift, and robust pose estimation. The proposed method is composed of three sequential modules: an IMU-aided vanishing point (VP) detection module, a voxel-map-based feature depth association module, and a visual-inertial fixed-lag smoother module. The IMU-aided VP detection module detects feature points, line segments, and vanishing points to establish robust correspondences in successive frames. In particular, we use the 1-line RANSAC method to provide stable VP hypotheses and a polar grid to accelerate VP hypothesis validation. We then propose a novel voxel-map-based feature depth association method to retrieve depth and assign it to visual features efficiently. Finally, the visual-inertial fixed-lag smoother jointly minimizes the error terms. Experiments show that our method outperforms state-of-the-art visual-inertial odometry and LiDAR-visual estimators in both indoor and outdoor environments.

AAAI Conference 2020 Conference Paper

Alternating Language Modeling for Cross-Lingual Pre-Training

  • Jian Yang
  • Shuming Ma
  • Dongdong Zhang
  • ShuangZhi Wu
  • Zhoujun Li
  • Ming Zhou

Language model pre-training has achieved success in many natural language processing tasks. Existing methods for cross-lingual pre-training adopt the Translation Language Model to predict masked words from the concatenation of a source sentence and its target equivalent. In this work, we introduce a novel cross-lingual pre-training method, called Alternating Language Modeling (ALM). It code-switches sentences of different languages rather than simply concatenating them, aiming to capture the rich cross-lingual context of words and phrases. More specifically, we randomly substitute source phrases with target translations to create code-switched sentences. Then, we use these code-switched data to train the ALM model to predict words of different languages. We evaluate ALM pre-training on the downstream tasks of machine translation and cross-lingual classification. Experiments show that ALM outperforms previous pre-training methods on three benchmarks.
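The code-switching substitution can be sketched as a token-level pass. This is a toy sketch: `phrase_table` is a hypothetical stand-in for phrase pairs mined from parallel data, and real ALM substitutes multi-word phrases rather than single tokens:

```python
import random

def code_switch(src_tokens, phrase_table, p=0.3, seed=0):
    """Randomly replace source tokens that have a known translation,
    producing an ALM-style code-switched training sentence."""
    rng = random.Random(seed)
    out = []
    for tok in src_tokens:
        if tok in phrase_table and rng.random() < p:
            out.append(phrase_table[tok])   # switch to the target language
        else:
            out.append(tok)                 # keep the source token
    return out

table = {"cat": "chat", "sat": "assis"}     # hypothetical en->fr pairs
print(code_switch("the cat sat down".split(), table, p=1.0))
# with p=1.0 every known token is switched: ['the', 'chat', 'assis', 'down']
```

The resulting mixed-language sentences would then feed the usual masked-word prediction objective.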

AAAI Conference 2020 Conference Paper

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

  • Chengyi Wang
  • Yu Wu
  • Shujie Liu
  • Zhenglu Yang
  • Ming Zhou

End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model. Conventional approaches employ multi-task learning and pre-training for this task, but they suffer from the large gap between pre-training and fine-tuning. To address this issue, we propose a Tandem Connectionist Encoding Network (TCEN), which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of the subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to guarantee that the speech encoder outputs and the MT encoder inputs are consistent in terms of semantic representation and sequence length. Experimental results show that our model leads to significant improvements in En-De and En-Fr translation irrespective of the backbones.

NeurIPS Conference 2020 Conference Paper

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

  • Wenhui Wang
  • Furu Wei
  • Li Dong
  • Hangbo Bao
  • Nan Yang
  • Ming Zhou

Pre-trained language models (e.g., BERT (Devlin et al., 2018) and its variants) have achieved remarkable success in a variety of NLP tasks. However, these models usually consist of hundreds of millions of parameters, which brings challenges for fine-tuning and online serving in real-life applications due to latency and capacity constraints. In this work, we present a simple and effective approach to compress large Transformer-based (Vaswani et al., 2017) pre-trained models, termed deep self-attention distillation. The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher). Specifically, we propose distilling the self-attention module of the last Transformer layer of the teacher, which is effective and flexible for the student. Furthermore, we introduce the scaled dot-product between values in the self-attention module as new deep self-attention knowledge, in addition to the attention distributions (i.e., the scaled dot-product of queries and keys) used in existing work. Moreover, we show that introducing a teacher assistant (Mirzadeh et al., 2019) also helps the distillation of large pre-trained Transformer models. Experimental results demonstrate that our monolingual model outperforms state-of-the-art baselines across different student model sizes. In particular, it retains more than 99% accuracy on SQuAD 2.0 and several GLUE benchmark tasks using 50% of the Transformer parameters and computations of the teacher model. We also obtain competitive results in applying deep self-attention distillation to multilingual pre-trained models.
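The two distillation targets named in the abstract, the attention distributions and the value relation, can be sketched for a single head as follows (a sketch with random toy tensors; the distillation loss is a KL divergence from teacher to student for both targets):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attn_relations(q, k, v):
    """MiniLM's two targets for one head: the attention distribution
    softmax(QK^T/sqrt(d)) and the value relation softmax(VV^T/sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)), softmax(v @ v.T / np.sqrt(d))

def mean_row_kl(p, q, eps=1e-12):
    """Mean row-wise KL(p || q), the per-target distillation loss."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)))

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 4, 8))       # toy Q, K, V: 4 tokens, dim 8
attn, val_rel = attn_relations(q, k, v)    # each row sums to 1
# a student head would be trained to minimize
# mean_row_kl(attn, student_attn) + mean_row_kl(val_rel, student_val_rel)
```

Because only the last layer's relations are matched, the student is free to choose its own depth and hidden size.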

NeurIPS Conference 2019 Conference Paper

A Tensorized Transformer for Language Modeling

  • Xindian Ma
  • Peng Zhang
  • Shuai Zhang
  • Nan Duan
  • Yuexian Hou
  • Ming Zhou
  • Dawei Song

The latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is based solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, a key component of the Transformer, limits the effective deployment of the model in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103, and One-Billion Word) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention not only largely compresses the model parameters but also obtains performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor-train decomposition.

AAAI Conference 2019 Conference Paper

Regularizing Neural Machine Translation by Target-Bidirectional Agreement

  • Zhirui Zhang
  • ShuangZhi Wu
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Tong Xu

Although Neural Machine Translation (NMT) has achieved remarkable progress in the past several years, most NMT systems still suffer from a fundamental shortcoming shared with other sequence generation tasks: errors made early in the generation process are fed as inputs to the model and can be quickly amplified, harming subsequent sequence generation. To address this issue, we propose a novel model regularization method for NMT training, which aims to improve the agreement between translations generated by left-to-right (L2R) and right-to-left (R2L) NMT decoders. This goal is achieved by introducing two Kullback-Leibler divergence regularization terms into the NMT training objective to reduce the mismatch between the output probabilities of the L2R and R2L models. In addition, we employ a joint training strategy that allows the L2R and R2L models to improve each other in an interactive update process. Experimental results show that our proposed method significantly outperforms state-of-the-art baselines on Chinese-English and English-German translation tasks.
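The pair of KL regularization terms can be sketched as follows, assuming both decoders' distributions have been aligned to the same target position (a simplification; the paper applies the terms inside the full NMT training objective):

```python
import math

def agreement_penalty(p, q, eps=1e-12):
    """KL(p||q) + KL(q||p) over two aligned output distributions:
    larger when the L2R and R2L decoders disagree."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) +
               qi * math.log((qi + eps) / (pi + eps))
               for pi, qi in zip(p, q))

l2r = [0.7, 0.2, 0.1]   # L2R decoder's distribution over 3 candidate words
r2l = [0.6, 0.3, 0.1]   # R2L decoder's distribution for the same word
penalty = agreement_penalty(l2r, r2l)   # > 0; added (scaled) to the loss
```

Minimizing the penalty pulls the two decoders toward the same predictions, which discourages direction-specific error amplification.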

AAAI Conference 2019 Conference Paper

Response Generation by Context-Aware Prototype Editing

  • Yu Wu
  • Furu Wei
  • Shaohan Huang
  • Yunli Wang
  • Zhoujun Li
  • Ming Zhou

Open-domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses. We propose a new paradigm, prototype-then-edit, for response generation, which first retrieves a prototype response from a pre-defined index and then edits the prototype according to the differences between the prototype context and the current context. Our motivation is that the retrieved prototype provides a good starting point for generation because it is grammatical and informative, and the post-editing process further improves the relevance and coherence of the prototype. In practice, we design a context-aware editing model built upon an encoder-decoder framework augmented with an editing vector. We first generate an edit vector by considering lexical differences between the prototype context and the current context. After that, the edit vector and the prototype response representation are fed to a decoder to generate a new response. Experimental results on a large-scale dataset demonstrate that our new paradigm significantly increases the relevance, diversity, and originality of generation results compared to traditional generative models. Furthermore, our model outperforms retrieval-based methods in terms of relevance and originality.

NeurIPS Conference 2019 Conference Paper

Unified Language Model Pre-training for Natural Language Understanding and Generation

  • Li Dong
  • Nan Yang
  • Wenhui Wang
  • Furu Wei
  • Xiaodong Liu
  • Yu Wang
  • Jianfeng Gao
  • Ming Zhou

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks. The model is pre-trained using three types of language modeling tasks: unidirectional, bidirectional, and sequence-to-sequence prediction. The unified modeling is achieved by employing a shared Transformer network and utilizing specific self-attention masks to control what context the prediction conditions on. UniLM compares favorably with BERT on the GLUE benchmark, and the SQuAD 2.0 and CoQA question answering tasks. Moreover, UniLM achieves new state-of-the-art results on five natural language generation datasets, including improving the CNN/DailyMail abstractive summarization ROUGE-L to 40.51 (2.04 absolute improvement), the Gigaword abstractive summarization ROUGE-L to 35.75 (0.86 absolute improvement), the CoQA generative question answering F1 score to 82.5 (37.1 absolute improvement), the SQuAD question generation BLEU-4 to 22.12 (3.75 absolute improvement), and the DSTC7 document-grounded dialog response generation NIST-4 to 2.67 (human performance is 2.65). The code and pre-trained models are available at https://github.com/microsoft/unilm.
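The self-attention masks that unify the three objectives can be sketched directly (a sketch of the mask construction; 1 means the row position may attend to the column position):

```python
import numpy as np

def unilm_mask(kind, n_src, n_tgt=0):
    """Self-attention mask for one UniLM pre-training objective."""
    n = n_src + n_tgt
    if kind == "bidirectional":
        return np.ones((n, n), dtype=int)                 # all see all
    if kind == "unidirectional":
        return np.tril(np.ones((n, n), dtype=int))        # left-to-right LM
    if kind == "seq2seq":
        m = np.zeros((n, n), dtype=int)
        m[:, :n_src] = 1                                  # everyone sees the source
        m[n_src:, n_src:] = np.tril(np.ones((n_tgt, n_tgt), dtype=int))  # target is causal
        return m
    raise ValueError(kind)

print(unilm_mask("seq2seq", n_src=2, n_tgt=2))
# [[1 1 0 0]
#  [1 1 0 0]
#  [1 1 1 0]
#  [1 1 1 1]]
```

The same shared Transformer is trained under all three masks, which is what makes one model usable for both understanding and generation.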

AAAI Conference 2019 Conference Paper

Unsupervised Neural Machine Translation with SMT as Posterior Regularization

  • Shuo Ren
  • Zhirui Zhang
  • Shujie Liu
  • Ming Zhou
  • Shuai Ma

Without a real bilingual corpus available, unsupervised Neural Machine Translation (NMT) typically requires pseudo-parallel data generated with the back-translation method for model training. However, due to weak supervision, the pseudo data inevitably contain noise and errors that accumulate and are reinforced in the subsequent training process, leading to poor translation performance. To address this issue, we introduce phrase-based Statistical Machine Translation (SMT) models, which are robust to noisy data, as posterior regularizations to guide the training of unsupervised NMT models in the iterative back-translation process. Our method starts from SMT models built with pre-trained language models and word-level translation tables inferred from cross-lingual embeddings. SMT and NMT models are then optimized jointly and boost each other incrementally in a unified EM framework. In this way, (1) the negative effect of errors in the iterative back-translation process is promptly alleviated by SMT filtering noise from its phrase tables, while (2) NMT compensates for the lack of fluency inherent in SMT. Experiments conducted on en-fr and en-de translation tasks show that our method outperforms strong baselines and achieves new state-of-the-art unsupervised machine translation performance.

AAAI Conference 2018 Conference Paper

Assertion-Based QA With Question-Aware Open Information Extraction

  • Zhao Yan
  • Duyu Tang
  • Nan Duan
  • Shujie Liu
  • Wendi Wang
  • Daxin Jiang
  • Ming Zhou
  • Zhoujun Li

We present assertion-based question answering (ABQA), an open-domain question answering task that takes a question and a passage as inputs and outputs a semi-structured assertion consisting of a subject, a predicate, and a list of arguments. An assertion conveys more evidence than a short answer span in reading comprehension, and it is more concise than a tedious passage in passage-based QA. These advantages make ABQA more suitable for human-computer interaction scenarios such as voice-controlled speakers. Further progress on ABQA requires a richer supervised dataset and powerful models of text understanding. To remedy this, we introduce a new dataset called WebAssertions, which includes hand-annotated QA labels for 358,427 assertions in 55,960 web passages. To address ABQA, we develop both generative and extractive approaches. The backbone of our generative approach is sequence-to-sequence learning. In order to capture the structure of the output assertion, we introduce a hierarchical decoder that first generates the structure of the assertion and then generates the words of each field. The extractive approach is based on learning to rank. Features at different levels of granularity are designed to measure the semantic relevance between a question and an assertion. Experimental results show that our approaches can infer question-aware assertions from a passage. We further evaluate our approaches by incorporating the ABQA results as additional features in passage-based QA. Results on two datasets show that ABQA features significantly improve the accuracy of passage-based QA.

NeurIPS Conference 2018 Conference Paper

Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base

  • Daya Guo
  • Duyu Tang
  • Nan Duan
  • Ming Zhou
  • Jian Yin

We present an approach to map utterances in conversation to logical forms, which will be executed on a large-scale knowledge base. To handle enormous ellipsis phenomena in conversation, we introduce dialog memory management to manipulate historical entities, predicates, and logical forms when inferring the logical form of current utterances. Dialog memory management is embodied in a generative model, in which a logical form is interpreted in a top-down manner following a small and flexible grammar. We learn the model from denotations without explicit annotation of logical forms, and evaluate it on a large-scale dataset consisting of 200K dialogs over 12.8M entities. Results verify the benefits of modeling dialog memory, and show that our semantic parsing-based approach outperforms a memory network based encoder-decoder model by a huge margin.

AAAI Conference 2018 Conference Paper

Hierarchical Recurrent Attention Network for Response Generation

  • Chen Xing
  • Yu Wu
  • Wei Wu
  • Yalou Huang
  • Ming Zhou

We study multi-turn response generation in chatbots, where a response is generated according to a conversation context. Existing work has modeled the hierarchy of the context, but does not pay enough attention to the fact that words and utterances in the context are differentially important. As a result, it may lose important information in the context and generate irrelevant responses. We propose a hierarchical recurrent attention network (HRAN) to model both the hierarchy and the importance variance in a unified framework. In HRAN, a hierarchical attention mechanism attends to important parts within and among utterances with word-level attention and utterance-level attention respectively. Empirical studies on both automatic evaluation and human judgment show that HRAN can significantly outperform state-of-the-art models for context-based response generation.

AAAI Conference 2018 Conference Paper

Joint Training for Neural Machine Translation Models with Monolingual Data

  • Zhirui Zhang
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Enhong Chen

Monolingual data have been demonstrated to be helpful in improving the translation quality of both statistical machine translation (SMT) and neural machine translation (NMT) systems, especially in resource-poor or domain adaptation tasks where parallel data are not rich enough. In this paper, we propose a novel approach to better leverage monolingual data for neural machine translation by jointly learning source-to-target and target-to-source NMT models for a language pair with a joint EM optimization method. The training process starts with two initial NMT models pre-trained on parallel data, one for each direction, and the two models are iteratively updated by incrementally decreasing translation losses on the training data. In each iteration, both NMT models first translate monolingual data from one language to the other, forming pseudo-training data for the other NMT model. Then two new NMT models are learned from the parallel data together with the pseudo-training data. Both NMT models are expected to improve, and better pseudo-training data can be generated in the next step. Experimental results on Chinese-English and English-German translation tasks show that our approach simultaneously improves the translation quality of source-to-target and target-to-source models, significantly outperforming strong baseline systems enhanced with monolingual data for model training, including back-translation.
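The iterative loop can be sketched with stand-in models. `ToyModel` is hypothetical: it "translates" by reversing tokens and merely records its fine-tuning pairs, standing in for real NMT models:

```python
class ToyModel:
    """Hypothetical stand-in for an NMT model."""
    def __init__(self):
        self.pairs = []
    def translate(self, tokens):
        return tokens[::-1]          # placeholder "translation"
    def finetune(self, pairs):
        self.pairs.extend(pairs)     # record (pseudo-source, target) pairs

def joint_back_translation(src_mono, tgt_mono, s2t, t2s, iterations=2):
    """Sketch of the joint training loop: each direction translates
    monolingual data to build pseudo-parallel data for the other,
    then both models are updated."""
    for _ in range(iterations):
        t2s.finetune([(s2t.translate(x), x) for x in src_mono])  # pseudo tgt -> real src
        s2t.finetune([(t2s.translate(y), y) for y in tgt_mono])  # pseudo src -> real tgt
    return s2t, t2s

s2t, t2s = ToyModel(), ToyModel()
joint_back_translation([["a", "b"]], [["x", "y"]], s2t, t2s)
```

In the real method, each fine-tuning step also mixes in the original parallel data so the models do not drift.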

AAAI Conference 2018 System Paper

MAgent: A Many-Agent Reinforcement Learning Platform for Artificial Collective Intelligence

  • Lianmin Zheng
  • Jiacheng Yang
  • Han Cai
  • Ming Zhou
  • Weinan Zhang
  • Jun Wang
  • Yong Yu

We introduce MAgent, a platform to support research and development of many-agent reinforcement learning. Unlike previous research platforms for single- or multi-agent reinforcement learning, MAgent focuses on supporting tasks and applications that require hundreds to millions of agents. Within the interactions among a population of agents, it enables not only the study of learning algorithms for agents' optimal policies, but, more importantly, the observation and understanding of individual agents' behaviors and the social phenomena emerging from the AI society, including communication languages, leadership, and altruism. MAgent is highly scalable and can host up to one million agents on a single GPU server. MAgent also provides flexible configurations for AI researchers to design their customized environments and agents. In this demo, we present three environments designed on MAgent and show collective intelligence emerging from learning from scratch.

IJCAI Conference 2018 Conference Paper

Multiway Attention Networks for Modeling Sentence Pairs

  • Chuanqi Tan
  • Furu Wei
  • Wenhui Wang
  • Weifeng Lv
  • Ming Zhou

Modeling sentence pairs plays a vital role in judging the relationship between two sentences, as in paraphrase identification, natural language inference, and answer sentence selection. Previous work achieves very promising results using neural networks with attention mechanisms. In this paper, we propose multiway attention networks, which employ multiple attention functions to match sentence pairs under the matching-aggregation framework. Specifically, we design four attention functions to match words in the corresponding sentences. Then, we aggregate the matching information from each function and combine the information from all functions to obtain the final representation. Experimental results demonstrate that the proposed multiway attention networks improve results on the Quora Question Pairs, SNLI, and MultiNLI datasets, and on the answer sentence selection task on the SQuAD dataset.
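Two parameter-free matching functions can be sketched under the matching-aggregation framework (dot-product and a subtraction-based score; the concat and bilinear variants add trainable weights and are omitted, and the exact forms here are illustrative rather than the paper's):

```python
import numpy as np

def match_scores(h, q, kind):
    """Unnormalized matching scores s[i, j] between word h[i] of one
    sentence and word q[j] of the other."""
    if kind == "dot":
        return h @ q.T
    if kind == "minus":
        # hypothetical subtraction-based score: negative L1 distance
        return -np.abs(h[:, None, :] - q[None, :, :]).sum(axis=-1)
    raise ValueError(kind)

def aggregate(h, q, kind):
    """Softmax each row of scores, then take a weighted sum of q:
    one matched representation per word of h (matching-aggregation)."""
    s = match_scores(h, q, kind)
    a = np.exp(s - s.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)
    return a @ q

rng = np.random.default_rng(1)
h, q = rng.normal(size=(3, 4)), rng.normal(size=(5, 4))
rep = aggregate(h, q, "dot")               # shape (3, 4): one vector per word of h
```

The per-function representations would then be combined (e.g., concatenated) before the final classifier.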

IJCAI Conference 2018 Conference Paper

Reinforced Mnemonic Reader for Machine Reading Comprehension

  • Minghao Hu
  • Yuxing Peng
  • Zhen Huang
  • Xipeng Qiu
  • Furu Wei
  • Ming Zhou

In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects. First, a re-attention mechanism is proposed to refine current attentions by directly accessing past attentions that are temporally memorized in a multi-round alignment architecture, so as to avoid the problems of attention redundancy and attention deficiency. Second, a new optimization approach, called dynamic-critical reinforcement learning, is introduced to extend the standard supervised method. It encourages the model to predict a more acceptable answer so as to address the convergence suppression problem that occurs in traditional reinforcement learning algorithms. Extensive experiments on the Stanford Question Answering Dataset (SQuAD) show that our model achieves state-of-the-art results. Meanwhile, our model outperforms previous systems by over 6% in terms of both Exact Match and F1 metrics on two adversarial SQuAD datasets.

AAAI Conference 2018 Conference Paper

S-Net: From Answer Extraction to Answer Synthesis for Machine Reading Comprehension

  • Chuanqi Tan
  • Furu Wei
  • Nan Yang
  • Bowen Du
  • Weifeng Lv
  • Ming Zhou

In this paper, we present a novel approach to machine reading comprehension for the MS-MARCO dataset. Unlike the SQuAD dataset, which aims to answer a question with an exact text span in a passage, the MS-MARCO dataset defines the task as answering a question from multiple passages, and the words in the answer are not necessarily in the passages. We therefore develop an extraction-then-synthesis framework to synthesize answers from extraction results. Specifically, the answer extraction model is first employed to predict the most important sub-spans from the passage as evidence, and the answer synthesis model takes the evidence as additional features, along with the question and passage, to further elaborate the final answers. We build the answer extraction model with state-of-the-art neural networks for single-passage reading comprehension, and propose an additional passage ranking task to help answer extraction across multiple passages. The answer synthesis model is based on sequence-to-sequence neural networks with the extracted evidence as features. Experiments show that our extraction-then-synthesis method outperforms state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Sequential Copying Networks

  • Qingyu Zhou
  • Nan Yang
  • Furu Wei
  • Ming Zhou

The copying mechanism has shown effectiveness in sequence-to-sequence neural network models for text generation tasks, such as abstractive sentence summarization and question generation. However, existing works on modeling copying or pointing mechanisms only consider single-word copying from the source sentences. In this paper, we propose a novel copying framework, named Sequential Copying Networks (SeqCopyNet), which not only learns to copy single words, but also copies sequences from the input sentence. It leverages pointer networks to explicitly select a sub-span from the source side and copy it to the target side, and integrates this sequential copying mechanism into the generation process of the encoder-decoder paradigm. Experiments on abstractive sentence summarization and question generation tasks show that the proposed SeqCopyNet can copy meaningful spans and outperforms the baseline models.

AAAI Conference 2018 Conference Paper

Table-to-Text: Describing Table Region With Natural Language

  • Junwei Bao
  • Duyu Tang
  • Nan Duan
  • Zhao Yan
  • Yuanhua Lv
  • Ming Zhou
  • Tiejun Zhao

In this paper, we present a generative model to generate a natural language sentence describing a table region, e.g., a row. The model maps a row from a table to a continuous vector and then generates a natural language sentence by leveraging the semantics of the table. To deal with rare words appearing in a table, we develop a flexible copying mechanism that selectively replicates contents from the table in the output sequence. Extensive experiments demonstrate the accuracy of the model and the power of the copying mechanism. On two synthetic datasets, WIKIBIO and SIMPLEQUESTIONS, our model improves the current state-of-the-art BLEU-4 score from 34.70 to 40.26 and from 33.32 to 39.12, respectively. Furthermore, we introduce an open-domain dataset WIKITABLETEXT including 13,318 explanatory sentences for 4,962 tables. Our model achieves a BLEU-4 score of 38.23, which outperforms template-based and language-model-based approaches.
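The copying idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's model: the copy gate value, the vocabulary, and the table cells below are all hypothetical placeholders. The point is only how a copy gate mixes a generation distribution over the vocabulary with an attention distribution over table cells into one well-formed output distribution, accumulating probability for words reachable by both routes:

```python
import numpy as np

def softmax(z):
    """Stable softmax over a score vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(4)
vocab = ["born", "in", "paris", "1867"]   # generator vocabulary (illustrative)
cells = ["marie", "1867"]                 # table cells available for copying

gen = softmax(rng.normal(size=len(vocab)))   # generation distribution over vocab
copy = softmax(rng.normal(size=len(cells)))  # attention distribution over cells
g = 0.3                                      # copy gate (illustrative value)

# mix the two routes; a word appearing in both vocab and table ("1867")
# accumulates probability from generation and copying
final = {w: (1 - g) * p for w, p in zip(vocab, gen)}
for c, p in zip(cells, copy):
    final[c] = final.get(c, 0.0) + g * p
```

Because the two component distributions each sum to one and the gate splits the mass `(1 - g)` / `g` between them, `final` is itself a proper distribution over the union of vocabulary and cells.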

AAAI Conference 2017 Conference Paper

Building Task-Oriented Dialogue Systems for Online Shopping

  • Zhao Yan
  • Nan Duan
  • Peng Chen
  • Ming Zhou
  • Jianshe Zhou
  • Zhoujun Li

We present a general solution towards building task-oriented dialogue systems for online shopping, aiming to assist online customers in completing various purchase-related tasks, such as searching for products and answering questions, in a natural language conversation manner. As a pioneering work, we show what and how existing natural language processing techniques, data resources, and crowdsourcing can be leveraged to build such task-oriented dialogue systems for E-commerce usage. To demonstrate its effectiveness, we integrate our system into a mobile online shopping application. To the best of our knowledge, this is the first time that a Chinese dialogue system has been practically used in an online shopping scenario with millions of real consumers. Interesting and insightful observations are presented in the experimental part, based on the analysis of human-bot conversation logs. Several current challenges are also pointed out as our future directions.

IJCAI Conference 2017 Conference Paper

Improved Neural Machine Translation with Source Syntax

  • ShuangZhi Wu
  • Ming Zhou
  • Dongdong Zhang

Neural Machine Translation (NMT) based on the encoder-decoder architecture has recently achieved state-of-the-art performance. Researchers have proven that extending word-level attention to phrase-level attention by incorporating source-side phrase structure can enhance the attention model and achieve promising improvements. However, word dependencies that can be crucial to correctly understanding a source sentence are not always consecutive (i.e., captured by phrase structure); sometimes they span long distances. Phrase structures are not the best way to explicitly model long-distance dependencies. In this paper we propose a simple but effective method to incorporate source-side long-distance dependencies into NMT. Our method, based on dependency trees, enriches each source state with global dependency structures, which can better capture the inherent syntactic structure of source sentences. Experiments on Chinese-English and English-Japanese translation tasks show that our proposed method outperforms state-of-the-art SMT and NMT baselines.

AAAI Conference 2017 Conference Paper

Topic Aware Neural Response Generation

  • Chen Xing
  • Wei Wu
  • Yu Wu
  • Jie Liu
  • Yalou Huang
  • Ming Zhou
  • Wei-Ying Ma

We consider incorporating topic information into a sequence-to-sequence framework to generate informative and interesting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The model utilizes topics to simulate prior human knowledge that guides them to form informative and interesting responses in conversation, and leverages topic information in generation by a joint attention mechanism and a biased generation probability. The joint attention mechanism summarizes the hidden vectors of an input message as context vectors by message attention and synthesizes topic vectors by topic attention from the topic words of the message obtained from a pre-trained LDA model, with these vectors jointly affecting the generation of words in decoding. To increase the possibility of topic words appearing in responses, the model modifies the generation probability of topic words by adding an extra probability item to bias the overall distribution. Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, significantly outperforming state-of-the-art response generation models.

AAAI Conference 2016 Conference Paper

Improving Recommendation of Tail Tags for Questions in Community Question Answering

  • Yu Wu
  • Wei Wu
  • Zhoujun Li
  • Ming Zhou

We study tag recommendation for questions in community question answering (CQA). Tags, which represent the semantic summarization of questions, are useful for navigation and expert finding in CQA and can facilitate content consumption such as searching and mining in these web sites. The task is challenging, as both questions and tags are short and a large fraction of tags are tail tags that occur very infrequently. To solve these problems, we propose matching questions and tags not only by themselves, but also by similar questions and similar tags. The idea is formalized as a model in which we calculate question-tag similarity using a linear combination of similarity with similar questions and tags, weighted by tag importance. Question similarity, tag similarity, and tag importance are learned in a supervised random walk framework by fusing multiple features. Our model thus can not only accurately identify question-tag similarity for head tags, but also improve the accuracy of recommendation of tail tags. Experimental results show that the proposed method significantly outperforms state-of-the-art methods on tag recommendation for questions. In particular, it improves tail tag recommendation accuracy by a large margin.

AAAI Conference 2016 Conference Paper

Jointly Modeling Topics and Intents with Global Order Structure

  • Bei Chen
  • Jun Zhu
  • Nan Yang
  • Tian Tian
  • Ming Zhou
  • Bo Zhang

Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged throughout one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic-modeling-based Bayesian unsupervised model, to analyze the document intent structure together with order information. Our model is flexible: it can incorporate annotations and perform supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines.

AAAI Conference 2016 Conference Paper

TGSum: Build Tweet Guided Multi-Document Summarization Dataset

  • Ziqiang Cao
  • Chengyao Chen
  • Wenjie Li
  • Sujian Li
  • Furu Wei
  • Ming Zhou

The development of summarization research has been significantly hampered by the costly acquisition of reference summaries. This paper proposes an effective way to automatically collect large-scale news-related multi-document summaries with reference to social media's reactions. We utilize two types of social labels in tweets, i.e., hashtags and hyperlinks. Hashtags are used to cluster documents into different topic sets. Also, a tweet with a hyperlink often highlights certain key points of the corresponding document. We synthesize a linked document cluster to form a reference summary which can cover most key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio, and develop an Integer Linear Programming solution to discover the sentence set reaching the upper bound of ROUGE. Since we allow summary sentences to be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both informativeness and readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the performance of the summarizer improves substantially on all the test sets. We release this dataset for further research.

IJCAI Conference 2016 Conference Paper

Unsupervised Word and Dependency Path Embeddings for Aspect Term Extraction

  • Yichun Yin
  • Furu Wei
  • Li Dong
  • Kaimeng Xu
  • Ming Zhang
  • Ming Zhou

In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths. The basic idea is to connect two words (w1 and w2) with the dependency path (r) between them in the embedding space. Specifically, our method optimizes the objective w1 + r ≈ w2 in the low-dimensional space, where the multi-hop dependency paths are treated as a sequence of grammatical relations and modeled by a recurrent neural network. Then, we design the embedding features that consider linear context and dependency context information, for the conditional random field (CRF) based aspect term extraction. Experimental results on the SemEval datasets show that, (1) with only embedding features, we can achieve state-of-the-art results; (2) our embedding method which incorporates the syntactic information among words yields better performance than other representative ones in aspect term extraction.
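The objective w1 + r ≈ w2 described above can be sketched in a few lines. This is a toy illustration, not the authors' implementation: the words, dependency relations, embedding dimension, and the single-layer recurrent composition of the path are all illustrative, and the max-margin comparison against a corrupted target word merely mirrors the general style of translation-embedding training:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # embedding size (illustrative)

# hypothetical word and dependency-relation embedding tables
word_emb = {w: rng.normal(size=DIM) for w in ["service", "staff", "was", "friendly"]}
rel_emb = {r: rng.normal(size=DIM) for r in ["nsubj", "amod"]}
W_h = rng.normal(size=(DIM, DIM)) * 0.1  # recurrent weight of the path RNN

def path_vector(relations):
    """Compose a multi-hop dependency path with a simple recurrent step."""
    h = np.zeros(DIM)
    for r in relations:
        h = np.tanh(W_h @ h + rel_emb[r])
    return h

def score(w1, relations, w2):
    """Negative distance for the objective w1 + r ≈ w2 (higher is better)."""
    r = path_vector(relations)
    return -np.linalg.norm(word_emb[w1] + r - word_emb[w2])

# max-margin flavor: the observed target word should outscore a corrupted one
s_true = score("service", ["nsubj"], "friendly")
s_corrupt = score("service", ["nsubj"], "was")
margin_loss = max(0.0, 1.0 - s_true + s_corrupt)
```

In training, gradients of such a margin loss would update the word, relation, and recurrent parameters jointly; here everything is frozen random data purely to show the shape of the objective.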

IJCAI Conference 2015 Conference Paper

A Hybrid Neural Model for Type Classification of Entity Mentions

  • Li Dong
  • Furu Wei
  • Hong Sun
  • Ming Zhou
  • Ke Xu

The semantic class (i.e., type) of an entity plays a vital role in many natural language processing tasks, such as question answering. However, most existing type classification systems extensively rely on hand-crafted features. This paper introduces a hybrid neural model which classifies entity mentions into a wide-coverage set of 22 types derived from DBpedia. It consists of two parts. The mention model uses recurrent neural networks to recursively obtain the vector representation of an entity mention from the words it contains. The context model, on the other hand, employs multilayer perceptrons to obtain the hidden representation for contextual information of a mention. Representations obtained by the two parts are used together to predict the type distribution. Using automatically generated data, these two parts are jointly learned. Experimental studies illustrate that the proposed approach outperforms baseline methods. Moreover, when type information provided by our method is used in a question answering system, we observe a 14.7% relative improvement in the top-1 accuracy of answers.

AAAI Conference 2015 Conference Paper

Mining Query Subtopics from Questions in Community Question Answering

  • Yu Wu
  • Wei Wu
  • Zhoujun Li
  • Ming Zhou

This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as a number of clusters of questions with keywords summarizing the clusters. The task is unique in that the subtopics from questions can not only facilitate user browsing in CQA search, but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate the subtopic mining task as a non-negative matrix factorization (NMF) problem and further extend the model of NMF to incorporate question similarity estimated from metadata of CQA into learning. Compared with existing methods, our method can jointly optimize question clustering and keyword extraction and encourage the former task to enhance the latter. Experimental results on large scale real world CQA datasets show that the proposed method significantly outperforms the existing methods in terms of keyword extraction, while achieving a comparable performance to the state-of-the-art methods for question clustering.
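The NMF formulation above can be sketched on a toy question-term matrix. This is an illustrative reconstruction, not the paper's full model (which additionally incorporates question similarity from CQA metadata): plain multiplicative updates factorize the matrix, questions are clustered by their dominant factor, and a keyword per cluster is read off the term factor:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy question-term count matrix: rows are questions, columns are terms
# (two obvious subtopics: rows 0-2 use terms 0-1, rows 3-5 use terms 2-3)
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [3, 3, 0, 0],
    [0, 0, 2, 3],
    [0, 0, 3, 2],
    [0, 0, 3, 3],
], dtype=float)

k = 2  # number of subtopic clusters
W = rng.random((V.shape[0], k)) + 0.1  # question-subtopic weights
H = rng.random((k, V.shape[1])) + 0.1  # subtopic-term weights

# standard multiplicative updates for min ||V - WH||_F^2 with W, H >= 0
for _ in range(200):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

clusters = W.argmax(axis=1)   # question -> subtopic assignment
keywords = H.argmax(axis=1)   # most representative term index per subtopic
```

On this clean block-structured matrix the two factors recover the two subtopics; joint clustering and keyword extraction fall out of the same factorization, which is the property the abstract emphasizes.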

AAAI Conference 2015 Conference Paper

Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization

  • Ziqiang Cao
  • Furu Wei
  • Li Dong
  • Sujian Li
  • Ming Zhou

We develop a Ranking framework upon Recursive Neural Networks (R2N2) to rank sentences for multi-document summarization. It formulates the sentence ranking task as a hierarchical regression process, which simultaneously measures the salience of a sentence and its constituents (e.g., phrases) in the parsing tree. This enables us to draw on word-level to sentence-level supervisions derived from reference summaries. In addition, recursive neural networks are used to automatically learn ranking features over the tree, with hand-crafted feature vectors of words as inputs. Hierarchical regressions are then conducted with learned features concatenated with raw features. Ranking scores of sentences and words are utilized to effectively select informative and non-redundant sentences to generate summaries. Experiments on the DUC 2001, 2002 and 2004 multi-document summarization datasets show that R2N2 outperforms state-of-the-art extractive summarization approaches.

AAAI Conference 2014 Conference Paper

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

  • Li Dong
  • Furu Wei
  • Ming Zhou
  • Ke Xu

Recursive neural models have achieved promising results in many natural language processing tasks. The main difference among these models lies in the composition function, i.e., how to obtain the vector representation of a phrase or sentence from the representations of the words it contains. This paper introduces a novel Adaptive Multi-Compositionality (AdaMC) layer for recursive neural models. The basic idea is to use more than one composition function and adaptively select among them depending on the input vectors. We present a general framework that models each semantic composition as a distribution over these composition functions. The composition functions and the parameters used for adaptive selection are learned jointly from data. We integrate AdaMC into existing recursive neural models and conduct extensive experiments on the Stanford Sentiment Treebank. The results illustrate that AdaMC significantly outperforms state-of-the-art sentiment classification methods. It helps push the best accuracy of sentence-level negative/positive classification from 85.4% up to 88.5%.
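The adaptive-selection idea above can be sketched as a softmax-gated mixture of composition functions. All dimensions and parameters below are illustrative placeholders, and the gate here is a simple linear scorer standing in for the learned selection mechanism: each candidate composition maps the concatenated child vectors to a parent vector, and the gate weights their outputs by an input-dependent distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 4  # vector size at each tree node (illustrative)
C = 3    # number of candidate composition functions (illustrative)

# each composition function is a linear map over the concatenated children
comps = [rng.normal(size=(DIM, 2 * DIM)) * 0.2 for _ in range(C)]
# gating parameters: score each composition function from the input vectors
gate = rng.normal(size=(C, 2 * DIM)) * 0.2

def softmax(z):
    """Stable softmax over a score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def adaptive_compose(left, right):
    """Blend the composition functions, weighted by an input-dependent softmax."""
    x = np.concatenate([left, right])
    weights = softmax(gate @ x)                     # distribution over compositions
    outs = np.stack([np.tanh(M @ x) for M in comps])
    return weights @ outs                           # expected composition result

parent = adaptive_compose(rng.normal(size=DIM), rng.normal(size=DIM))
```

Because the weights sum to one and each tanh output is bounded, the blended parent vector stays in the same range as any single composition, so the layer can drop into a recursive model unchanged.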

AAAI Conference 2014 Conference Paper

Machine Translation with Real-Time Web Search

  • Lei Cui
  • Ming Zhou
  • Qiming Chen
  • Dongdong Zhang
  • Mu Li

Contemporary machine translation systems usually rely on offline data retrieved from the web for individual model training, such as translation models and language models. In contrast to existing methods, we propose a novel approach that treats machine translation as a web search task and utilizes the web on the fly to acquire translation knowledge. This end-to-end approach takes advantage of fresh web search results that are capable of leveraging tremendous web knowledge to obtain phrase-level candidates on demand and then compose sentence-level translations. Experimental results show that our web-based machine translation method demonstrates very promising performance in leveraging fresh translation knowledge and making translation decisions. Furthermore, when combined with offline models, it significantly outperforms a state-of-the-art phrase-based statistical machine translation system.

AAAI Conference 2014 Conference Paper

Mind the Gap: Machine Translation by Minimizing the Semantic Gap in Embedding Space

  • Jiajun Zhang
  • Shujie Liu
  • Mu Li
  • Ming Zhou
  • Chengqing Zong

The conventional statistical machine translation (SMT) methods perform the decoding process by composing a set of translation rules associated with high probabilities. However, the probabilities of the translation rules are calculated only according to co-occurrence statistics in the bilingual corpus rather than semantic similarity. In this paper, we propose a Recursive Neural Network (RNN) based model that converts each translation rule into a compact real-valued vector in the semantic embedding space and performs the decoding process by minimizing the semantic gap between the source language string and its translation candidates at each state in a bottom-up structure. The RNN-based translation model is trained using a max-margin objective function. Extensive experiments on Chinese-to-English translation show that our RNN-based model can significantly improve the translation quality by up to 1.68 BLEU score.

IJCAI Conference 2013 Conference Paper

Answer Extraction from Passage Graph for Question Answering

  • Hong Sun
  • Nan Duan
  • Yajuan Duan
  • Ming Zhou

In question answering, answer extraction aims to pin-point the exact answer from passages. However, most previous methods perform such extraction on each passage separately, without considering clues provided in other passages. This paper presents a novel approach to extract answers by fully leveraging connections among different passages. Specifically, extraction is performed on a Passage Graph which is built by adding links upon multiple passages. Different passages are connected by linking words with the same stem. We use the factor graph as our model for answer extraction. Experimental results on multiple QA data sets demonstrate that our method significantly improves the performance of answer extraction.

TIST Journal 2013 Journal Article

Named entity recognition for tweets

  • Xiaohua Liu
  • Furu Wei
  • Shaodian Zhang
  • Ming Zhou

Two main challenges of Named Entity Recognition (NER) for tweets are the insufficient information in a tweet and the lack of training data. We propose a novel method consisting of three core elements: (1) normalization of tweets; (2) combination of a K-Nearest Neighbors (KNN) classifier with a linear Conditional Random Fields (CRF) model; and (3) a semi-supervised learning framework. The tweet normalization preprocessing corrects common ill-formed words using a global linear model. The KNN-based classifier conducts pre-labeling to collect global coarse evidence across tweets, while the CRF model conducts sequential labeling to capture fine-grained information encoded in a tweet. The semi-supervised learning plus the gazetteers alleviate the lack of training data. Extensive experiments show the advantages of our method over the baselines as well as the effectiveness of normalization, KNN, and semi-supervised learning.

AAAI Conference 2013 Conference Paper

The Automated Acquisition of Suggestions from Tweets

  • Li Dong
  • Furu Wei
  • Yajuan Duan
  • Xiaohua Liu
  • Ming Zhou
  • Ke Xu

This paper targets automatically detecting and classifying users' suggestions from tweets. The short and informal nature of tweets, along with the imbalanced characteristics of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework based on Factorization Machines, which are effective and efficient especially in classification tasks with sparse features. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques into Factorization Machines. Extensive experimental studies on a manually annotated real-life data set show that the proposed approach significantly improves on the baseline approach, yielding a precision of 71.06% and recall of 67.86%. We also investigate why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.
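A second-order Factorization Machine prediction, the core of the framework above, can be sketched as follows. The weights here are random placeholders rather than trained parameters; the point is the standard identity that computes the sum of pairwise interactions in O(kn) instead of O(n²), which is what makes FMs efficient on sparse feature vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
n_features, k = 6, 3  # feature count and latent factor size (illustrative)

w0 = 0.1                                     # global bias
w = rng.normal(size=n_features) * 0.1        # linear weights
V = rng.normal(size=(n_features, k)) * 0.1   # latent factors for pairwise terms

def fm_predict(x):
    """Second-order FM: w0 + <w, x> + sum_{i<j} <v_i, v_j> x_i x_j,
    with the interaction term computed via the O(kn) identity
    0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)."""
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

x = np.array([1, 0, 1, 0, 0, 1], dtype=float)  # sparse binary feature vector
score = fm_predict(x)
```

Cost-sensitive learning, as the abstract describes, would enter at training time by weighting the loss on the rare suggestion class more heavily; the prediction formula itself is unchanged.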

AAAI Conference 2012 Conference Paper

Collective Nominal Semantic Role Labeling for Tweets

  • Xiaohua Liu
  • Zhongyang Fu
  • Furu Wei
  • Ming Zhou

Tweets have become an increasingly popular source of fresh information. We investigate the task of Nominal Semantic Role Labeling (NSRL) for tweets, which aims to identify predicate-argument structures defined by nominals in tweets. Studies of this task can help fine-grained information extraction and retrieval from tweets. There are two main challenges in this task: 1) The lack of information in a single tweet, rooted in the short and noisy nature of tweets; and 2) recovery of implicit arguments. We propose jointly conducting NSRL on multiple similar tweets using a graphical model, leveraging the redundancy in tweets to tackle these challenges. Extensive evaluations on a human annotated data set demonstrate that our method outperforms two baselines with an absolute gain of 2.7% in F1.

AAAI Conference 2012 Conference Paper

Exacting Social Events for Tweets Using a Factor Graph

  • Xiaohua Liu
  • Xiangyang Zhou
  • Zhongyang Fu
  • Furu Wei
  • Ming Zhou

Social events are events that occur between people where at least one person is aware of the other and of the event taking place. Extracting social events can play an important role in a wide range of applications, such as the construction of social networks. In this paper, we introduce the task of social event extraction for tweets, an important source of fresh events. One main challenge is the lack of information in a single tweet, which is rooted in the short and noise-prone nature of tweets. We propose to collectively extract social events from multiple similar tweets using a novel factor graph, to harvest the redundancy in tweets, i.e., the repeated occurrences of a social event in several tweets. We evaluate our method on a human annotated data set, and show that it outperforms all baselines, with an absolute gain of 21% in F1.

AAAI Conference 2012 Conference Paper

Generating Chinese Classical Poems with Statistical Machine Translation Models

  • Jing He
  • Ming Zhou
  • Long Jiang

This paper describes a statistical approach to the generation of Chinese classical poetry and proposes a novel method to automatically evaluate poems. The system accepts a set of keywords representing the writing intents from a writer and generates sentences one by one to form a completed poem. A statistical machine translation (SMT) system is applied to generate new sentences, given the sentences generated previously. For each line, a model specifically trained for that line is used, as opposed to using a single model for all sentences. To enhance the coherence of sentences on every line, a coherence model using mutual information is applied to select candidates with better consistency with previous sentences. In addition, we demonstrate the effectiveness of the BLEU metric for evaluation with a novel method of generating diverse references.

IJCAI Conference 2011 Conference Paper

Collective Semantic Role Labeling for Tweets with Clustering

  • Xiaohua Liu
  • Kuan Li
  • Ming Zhou
  • Zhongyang Xiong

As tweets have become a comprehensive repository of fresh information, Semantic Role Labeling (SRL) for tweets has aroused great research interest because of its central role in a wide range of tweet-related studies such as fine-grained information extraction, sentiment analysis and summarization. However, the fact that a tweet is often too short and informal to provide sufficient information poses a main challenge. To tackle this challenge, we propose a new method to collectively label similar tweets. The underlying idea is to exploit similar tweets to make up for the lack of information in a single tweet. Specifically, similar tweets are first grouped together by clustering. Then, for each cluster, a two-stage labeling is conducted: one labeler conducts SRL to collect statistical information, such as the predicate/argument/role triples that occur frequently, from its highly confidently labeled results; in the second stage, another labeler performs SRL with this statistical information to refine the results. Experimental results on a human annotated dataset show that our approach remarkably improves SRL by 3.1% F1.

AAAI Conference 2011 Conference Paper

Enhancing Semantic Role Labeling for Tweets Using Self-Training

  • Xiaohua Liu
  • Li Kuan
  • Ming Zhou
  • Zhongyang Xiong

Semantic Role Labeling (SRL) for tweets is a meaningful task that can benefit a wide range of applications such as fine-grained information extraction and retrieval from tweets. One main challenge of the task is the lack of annotated tweets, which are required to train a statistical model. We introduce self-training to SRL, leveraging abundant unlabeled tweets to alleviate its dependence on annotated tweets. A novel strategy of tweet selection is presented, ensuring the chosen tweets are both correct and informative. More specifically, correctness is estimated according to the labeling confidences and agreement of two Conditional Random Fields based labelers, which are trained on randomly and evenly split labeled data, while informativeness is proportional to the maximum distance between the tweet and the already selected tweets. We evaluate our method on a human annotated data set and show that bootstrapping improves a baseline by 3.4% F1.

IJCAI Conference 2007 Conference Paper

  • Shiqi Zhao
  • Ming Zhou
  • Ting Liu

Question paraphrasing is critical in many Natural Language Processing (NLP) applications, especially for question reformulation in question answering (QA). However, choosing an appropriate data source and developing effective methods are challenging tasks. In this paper, we propose a method that exploits Encarta logs to automatically identify question paraphrases and extract templates. Questions from Encarta logs are partitioned into small clusters, within which a perceptron classifier is used for identifying question paraphrases. Experiments are conducted and the results show: (1) Encarta log data is an eligible data source for question paraphrasing, and the user clicks in the data are indicative clues for recognizing paraphrases; (2) the supervised method we present is effective and clearly outperforms the unsupervised method; besides, the features introduced to identify paraphrases are sound; (3) the obtained question paraphrase templates are quite effective in question reformulation, enhancing the MRR from 0.2761 to 0.4939 on the questions of TREC QA 2003.

IJCAI Conference 2007 Conference Paper

  • Long Jiang
  • Ming Zhou
  • Lee-Feng Chien
  • Cheng Niu

This paper presents a novel approach to improving named entity translation by combining a transliteration approach with web mining, using web information as a source to complement transliteration, and using transliteration information to guide and enhance web mining. A Maximum Entropy model is employed to rank translation candidates by combining pronunciation similarity and bilingual contextual co-occurrence. Experimental results show that our approach effectively improves the precision and recall of named entity translation by a large margin.

IJCAI Conference 2007 Conference Paper

  • Jizhou Huang
  • Ming Zhou
  • Dan Yang

This paper presents a novel approach for extracting high-quality <thread-title, reply> pairs as chat knowledge from online discussion forums so as to efficiently support the construction of a chatbot for a certain domain. Given a forum, the high-quality <thread-title, reply> pairs are extracted using a cascaded framework. First, the replies logically relevant to the thread title of the root message are extracted with an SVM classifier from all the replies, based on correlations such as structure and content. Then, the extracted <thread-title, reply> pairs are ranked with a ranking SVM based on their content qualities. Finally, the Top-N <thread-title, reply> pairs are selected as chatbot knowledge. Results from experiments conducted within a movie forum show the proposed approach is effective.