Arrow Research search

Author name cluster

Yasheng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing

  • Fan Gao
  • Dongyuan Li
  • Ding Xia
  • Fei Mi
  • Yasheng Wang
  • Lifeng Shang
  • Baojun Wang

Prompt-based essay writing is an effective and common way to assess students' critical thinking skills. Recent work has evaluated the impressive capabilities of Large Language Models (LLMs) on this task. However, most studies focus primarily on English. Those examining LLMs' performance in Chinese often rely on coarse-grained text quality metrics, overlooking the structural and rhetorical complexities of Chinese essays, particularly across diverse genres. We therefore propose EssayBench, a multi-genre benchmark specifically designed for Chinese essay writing, along with a fine-grained, genre-specific scoring framework that hierarchically aggregates scores to better align with human preferences. The dataset comprises 728 real-world prompts across four major genres (Argumentative, Narrative, Descriptive, and Expository), and includes both Open-Ended and Constrained types. Our evaluation protocol is validated through a comprehensive human agreement study. The results show that our protocol aligns well with human judgments, achieving a highest Spearman's correlation of 0.816 and outperforming coarse-grained evaluation methods by an average of 8.6\%. Finally, we benchmark 15 large LLMs, analyzing their strengths and limitations across genres and instruction types. We believe EssayBench offers a more reliable framework for evaluating Chinese essay generation and provides valuable insights for improving LLMs in this domain.

AAAI Conference 2026 Conference Paper

ToolACE-R: Model-aware Iterative Training and Adaptive Refinement for Tool learning

  • Xingshan Zeng
  • Weiwen Liu
  • Xu Huang
  • Zezhong Wang
  • Lingzhi Wang
  • Liangyou Li
  • Yasheng Wang
  • Lifeng Shang

Tool learning, which allows Large Language Models (LLMs) to leverage external tools for solving complex user tasks, has emerged as a promising avenue for extending model capabilities. However, existing approaches primarily focus on data synthesis for fine-tuning LLMs to invoke tools effectively, largely ignoring how to fully stimulate the potential of the model. In this paper, we propose ToolACE-R, a novel framework that includes both model-aware iterative training and adaptive refinement for tool learning. ToolACE-R features a model-aware iterative training procedure that progressively adjust training samples based on the model’s evolving capabilities to maximize its potential. Additionally, it incorporates self-refinement training corpus which emphasizes LLM's ability to iteratively refine their tool calls, optimizing performance without requiring external feedback. Furthermore, we introduce adaptive self-refinement for efficient test-time scaling, where the trained model can autonomously determine when to stop the process based on iterative self-refinement. We conduct extensive experiments across several benchmark datasets, showing that ToolACE-R achieves competitive performance compared to advanced LLMs. The performance can be further improved efficiently through adaptive self-refinement. These results highlight the effectiveness and generalizability of ToolACE-R, offering a promising direction for more efficient and scalable tool learning.

NeurIPS Conference 2025 Conference Paper

Benchmarking Retrieval-Augmented Multimomal Generation for Document Question Answering

  • Kuicai Dong
  • CHANG YUJING
  • Shijie Huang
  • Yasheng Wang
  • Ruiming Tang
  • Yong Liu

Document Visual Question Answering (DocVQA) faces dual challenges in processing lengthy multimodal documents (text, images, tables) and performing cross-modal reasoning. Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches, frequently missing critical visual information. The field also lacks robust benchmarks for assessing multimodal evidence selection and integration. We introduce MMDocRAG, a comprehensive benchmark featuring 4, 055 expert-annotated QA pairs with multi-page, cross-modal evidence chains. Our framework introduces innovative metrics for evaluating multimodal quote selection and enables answers that interleave text with relevant visual elements. Through large-scale experiments with 60 VLM/LLM models and 14 retrieval systems, we identify persistent challenges in multimodal evidence retrieval, selection, and integration. Key findings reveal that advanced proprietary LVMs show superior performance than open-sourced alternatives. Also, they show moderate advantages using multimodal inputs over text-only inputs, while open-source alternatives show significant performance degradation. Notably, fine-tuned LLMs achieve substantial improvements when using detailed image descriptions. MMDocRAG establishes a rigorous testing ground and provides actionable insights for developing more robust multimodal DocVQA systems.

ICLR Conference 2025 Conference Paper

Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

  • Yuxin Jiang
  • Bo Huang
  • Yufei Wang 0005
  • Xingshan Zeng
  • Liangyou Li
  • Yasheng Wang
  • Xin Jiang 0002
  • Lifeng Shang

Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the generation of the winning response and the losing response within pairwise data are typically isolated, leading to weak correlations between them as well as suboptimal alignment performance. To address this issue, we propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. Firstly, we increase the consistency and informativeness of the pairwise preference signals through targeted modifications, synthesizing a pseudo-winning response by improving the losing response with the winning response as a reference. Secondly, we identify that DPO alone is insufficient to model these correlations and capture nuanced variations. Therefore, we propose learning token-level correlations by dynamically leveraging the policy model's confidence during training. Comprehensive experiments on QA, math, and instruction-following tasks demonstrate the effectiveness of our approach, significantly surpassing competitive baselines, including DPO. Additionally, our in-depth quantitative analysis reveals the reasons behind our method's superior performance over DPO and showcases its versatility to other DPO variants.

NeurIPS Conference 2025 Conference Paper

DeepDiver: Adaptive Web-Search Intensity Scaling via Reinforcement Learning

  • Wenxuan Shi
  • Haochen Tan
  • Chuqiao Kuang
  • Xiaoguang Li
  • Hanting Chen
  • Xiaozhe Ren
  • Yasheng Wang
  • Lu Hou

Information seeking demands iterative evidence gathering and reflective reasoning, yet large language models (LLMs) still struggle with it in open-web question answering. Existing prompting and supervised fine-tuning (SFT) methods remain fixed by prompt rules or training corpora, and are usually benchmarked only on well-structured wiki sources, limiting real-world adaptability. We introduce $\textbf{WebPuzzle}$, a 24k-sample training and 275-sample test benchmark that evaluates information seeking on the live internet, across both wiki and open-domain queries. Leveraging 7k WebPuzzle instances, we develop $\textbf{DeepDiver}$, a reinforcement-learning (RL) framework that cultivates $\textbf{Search Intensity Scaling (SIS)}$—an emergent ability to escalate search frequency and depth instead of settling on overconfident, under-evidenced answers. With SIS, Qwen2. 5-7B-Instruct and Pangu-7B-Reasoner attain performance on real-web tasks comparable to the 671B-parameter DeepSeek-R1. We detail DeepDiver’s curriculum from cold-start SFT to a well designed RL procedure, and show that its seeking policy generalized from closed-ended queries to open-ended generation such as long-form writing. Our results advance adaptive information seeking in LLMs and provide a rigorous benchmark for future work.

ICML Conference 2025 Conference Paper

Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape

  • Tao Li
  • Zhengbao He
  • Yujun Li
  • Yasheng Wang
  • Lifeng Shang
  • Xiaolin Huang

Fine-tuning large-scale pre-trained models is prohibitively expensive in terms of computation and memory costs. Low-Rank Adaptation (LoRA), a popular Parameter-Efficient Fine-Tuning (PEFT) method, offers an efficient solution by optimizing only low-rank matrices. Despite recent progress in improving LoRA’s performance, the relationship between the LoRA optimization space and the full parameter space is often overlooked. A solution that appears flat in the loss landscape of the LoRA space may still exhibit sharp directions in the full parameter space, potentially compromising generalization. We introduce Flat-LoRA, which aims to identify a low-rank adaptation situated in a flat region of the full parameter space. Instead of adopting the well-established sharpness-aware minimization approach, which incurs significant computation and memory overheads, we employ a Bayesian expectation loss objective to preserve training efficiency. Further, we design a refined strategy for generating random perturbations to enhance performance and carefully manage memory overhead using random seeds. Experiments across diverse tasks—including mathematical reasoning, coding abilities, dialogue generation, instruction following, and text-to-image generation—demonstrate that Flat-LoRA improves both in-domain and out-of-domain generalization. Code is available at https: //github. com/nblt/Flat-LoRA.

ICLR Conference 2025 Conference Paper

Learning Evolving Tools for Large Language Models

  • Guoxin Chen
  • Zhong Zhang 0004
  • Xin Cong
  • Fangda Guo
  • Yesai Wu
  • Yankai Lin 0001
  • Wenzheng Feng
  • Yasheng Wang

Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the adaptability of LLMs in real-world applications. In this paper, we propose ToolEVO, a novel framework designed to enhance the adaptive and reflective capabilities of LLMs against tool variability. By leveraging Monte Carlo Tree Search, ToolEVO facilitates active exploration and interaction of LLMs within dynamic environments, allowing for autonomous self-reflection and self-updating of tool usage based on environmental feedback. Additionally, we introduce ToolQA-D, a benchmark specifically designed to evaluate the impact of tool variability. Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning.

ICLR Conference 2025 Conference Paper

Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance

  • Yaxi Lu
  • Shenzhi Yang
  • Cheng Qian 0008
  • Guirong Chen
  • Qinyu Luo
  • Yesai Wu
  • Huadong Wang
  • Xin Cong

Agents powered by large language models have shown remarkable abilities in solving complex tasks. However, most agent systems remain reactive, limiting their effectiveness in scenarios requiring foresight and autonomous decision-making. In this paper, we tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. We propose a novel data-driven approach for this problem. Firstly, we collect real-world human activities to generate proactive task predictions. These predictions are then labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment and serves as an automatic evaluator of the proactiveness of LLM agents. Building on this, we develop a comprehensive data generation pipeline to create a diverse dataset, ProactiveBench, containing 6,790 events. Finally, we demonstrate that fine-tuning models with the proposed ProactiveBench can significantly elicit the proactiveness of LLM agents. Experimental results show that our fine-tuned model achieves an F1-Score of 66.47% in proactively offering assistance, outperforming all open-source and close-source models. These results highlight the potential of our method in creating more proactive and effective agent systems, paving the way for future advancements in human-agent collaboration.

NeurIPS Conference 2025 Conference Paper

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

  • Wanlong Liu
  • Junxiao Xu
  • Fei Yu
  • Yukang Lin
  • Ke Ji
  • Wenyu Chen
  • Lifeng Shang
  • Yasheng Wang

Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, which generates redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that the Short CoT patterns offer concise reasoning efficiently, while the Long CoT patterns excel in challenging scenarios where the Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to adaptively employ both reasoning patterns: it prioritizes the Short CoT patterns and activates the Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50\%, while achieving performance comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT exhibits superior performance compared to SFT in noisy, out-of-domain, and low-resource scenarios.

ICLR Conference 2025 Conference Paper

RevisEval: Improving LLM-as-a-Judge via Response-Adapted References

  • Qiyuan Zhang 0001
  • Yufei Wang 0005
  • Tiezheng Yu
  • Yuxin Jiang
  • Chuhan Wu
  • Liangyou Li
  • Yasheng Wang
  • Xin Jiang 0002

With significant efforts in recent studies, LLM-as-a-Judge has become a cost-effective alternative to human evaluation for assessing text generation quality in a wide range of tasks. However, there still remains a reliability gap between LLM-as-a-Judge and human evaluation. One important reason is the lack of guided oracles in the evaluation process. Motivated by the role of reference pervasively used in classic text evaluation, we introduce RevisEval, a novel text generation evaluation paradigm via the response-adapted references. RevisEval is driven by the key observation that an ideal reference should maintain the necessary relevance to the response to be evaluated. Specifically, RevisEval leverages the text revision capabilities of large language models (LLMs) to adaptively revise the response, then treat the revised text as the reference (response-adapted reference) for the subsequent evaluation. Extensive experiments demonstrate that RevisEval outperforms traditional reference-free and reference-based evaluation paradigms that use LLM-as-a-Judge across NLG tasks and open-ended instruction-following tasks. More importantly, our response-adapted references can further boost the classical text metrics, e.g., BLEU and BERTScore, compared to traditional references and even rival the LLM-as-a-Judge. A detailed analysis is also conducted to confirm RevisEval's effectiveness in bias reduction, the impact of inference cost, and reference relevance.

NeurIPS Conference 2025 Conference Paper

RidgeLoRA: Matrix Ridge Enhanced Low-Rank Adaptation of Large Language Models

  • Junda Zhu
  • Jun Ai
  • Yujun Li
  • Yichun Yin
  • Yasheng Wang
  • Lifeng Shang
  • Qun Liu

As one of the state-of-the-art parameter-efficient fine-tuning~(PEFT) methods, Low-Rank Adaptation (LoRA) enables model optimization with reduced computational cost through trainable low-rank matrix. However, the low-rank nature makes it prone to produce a decrease in the representation ability, leading to suboptimal performance. In order to break this limitation, we propose RidgeLoRA, a lightweight architecture like LoRA that incorporates novel architecture and matrix ridge enhanced full-rank approximation, to match the performance of full-rank training, while eliminating the need for high memory and a large number of parameters to restore the rank of matrices. We provide a rigorous mathematical derivation to prove that RidgeLoRA has a better upper bound on the representations than vanilla LoRA. Furthermore, extensive experiments across multiple domains demonstrate that RidgeLoRA achieves better performance than other LoRA variants, and can even match or surpass full-rank training.

ICLR Conference 2025 Conference Paper

Spa-Bench: a comprehensive Benchmark for Smartphone Agent Evaluation

  • Jingxuan Chen
  • Derek Yuen
  • Bin Xie
  • Yuhao Yang 0008
  • Gongwei Chen
  • Zhihao Wu
  • Li Yixing
  • Xurui Zhou

Smartphone agents are increasingly important for helping users control devices efficiently, with (Multimodal) Large Language Model (MLLM)-based approaches emerging as key contenders. Fairly comparing these agents is essential but challenging, requiring a varied task scope, the integration of agents with different implementations, and a generalisable evaluation pipeline to assess their strengths and weaknesses. In this paper, we present SPA-Bench, a comprehensive SmartPhone Agent Benchmark designed to evaluate (M)LLM-based agents in an interactive environment that simulates real-world conditions. SPA-Bench offers three key contributions: (1) A diverse set of tasks covering system and third-party apps in both English and Chinese, focusing on features commonly used in daily routines; (2) A plug-and-play framework enabling real-time agent interaction with Android devices, integrating over ten agents with the flexibility to add more; (3) A novel evaluation pipeline that automatically assesses agent performance across multiple dimensions, encompassing seven metrics related to task completion and resource consumption. Our extensive experiments across tasks and agents reveal challenges like interpreting mobile user interfaces, action grounding, memory retention, and execution costs. We propose future research directions to ease these difficulties, moving closer to real-world smartphone agent applications.

ICLR Conference 2025 Conference Paper

ToolACE: Winning the Points of LLM Function Calling

  • Weiwen Liu
  • Xu Huang 0008
  • Xingshan Zeng
  • Xinlong Hao
  • Shuai Yu
  • Dexun Li
  • Shuai Wang 0020
  • Weinan Gan

Function calling significantly extends the application boundary of large language models (LLMs), where high-quality and diverse training data is critical for unlocking this capability. However, collecting and annotating real function-calling data is challenging, while synthetic data from existing pipelines often lack coverage and accuracy. In this paper, we present ToolACE, an automatic agentic pipeline designed to generate accurate, complex, and diverse tool-learning data, specifically tailored to the capabilities of LLMs. ToolACE leverages a novel self-evolution synthesis process to curate a comprehensive API pool of 26,507 diverse APIs. Dialogs are further generated through the interplay among multiple agents, under the guidance of a complexity evaluator. To ensure data accuracy, we implement a dual-layer verification system combining rule-based and model-based checks. We demonstrate that models trained on our synthesized data---even with only 8B parameters---achieve state-of-the-art performance, comparable to the latest GPT-4 models. Our model and a subset of the data are publicly available at https://huggingface.co/Team-ACE.

AAAI Conference 2023 Conference Paper

KPT: Keyword-Guided Pre-training for Grounded Dialog Generation

  • Qi Zhu
  • Fei Mi
  • Zheng Zhang
  • Yasheng Wang
  • Yitong Li
  • Xin Jiang
  • Qun Liu
  • Xiaoyan Zhu

Incorporating external knowledge into the response generation process is essential to building more helpful and reliable dialog agents. However, collecting knowledge-grounded conversations is often costly, calling for a better pre-trained model for grounded dialog generation that generalizes well w.r.t. different types of knowledge. In this work, we propose KPT (Keyword-guided Pre-Training), a novel self-supervised pre-training method for grounded dialog generation without relying on extra knowledge annotation. Specifically, we use a pre-trained language model to extract the most uncertain tokens in the dialog as keywords. With these keywords, we construct two kinds of knowledge and pre-train a knowledge-grounded response generation model, aiming at handling two different scenarios: (1) the knowledge should be faithfully grounded; (2) it can be selectively used. For the former, the grounding knowledge consists of keywords extracted from the response. For the latter, the grounding knowledge is additionally augmented with keywords extracted from other utterances in the same dialog. Since the knowledge is extracted from the dialog itself, KPT can be easily performed on a large volume and variety of dialogue data. We considered three data sources (open-domain, task-oriented, conversational QA) with a total of 2.5M dialogues. We conduct extensive experiments on various few-shot knowledge-grounded generation tasks, including grounding on dialog acts, knowledge graphs, persona descriptions, and Wikipedia passages. Our comprehensive experiments and analyses demonstrate that KPT consistently outperforms state-of-the-art methods on these tasks with diverse grounding knowledge.

IJCAI Conference 2023 Conference Paper

Learning Summary-Worthy Visual Representation for Abstractive Summarization in Video

  • Zenan Xu
  • Xiaojun Meng
  • Yasheng Wang
  • Qinliang Su
  • Zexuan Qiu
  • Xin Jiang
  • Qun Liu

Multimodal abstractive summarization for videos (MAS) requires generating a concise textual summary to describe the highlights of a video according to multimodal resources, in our case, the video content and its transcript. Inspired by the success of the large-scale generative pre-trained language model (GPLM) in generating high-quality textual content (e. g. , summary), recent MAS methods have proposed to adapt the GPLM to this task by equipping it with the visual information, which is often obtained through a general-purpose visual feature extractor. However, the generally extracted visual features may overlook some summary-worthy visual information, which impedes model performance. In this work, we propose a novel approach to learning the summary-worthy visual representation that facilitates abstractive summarization. Our method exploits the summary-worthy information from both the cross-modal transcript data and the knowledge that distills from the pseudo summary. Extensive experiments on three public multimodal datasets show that our method outperforms all competing baselines. Furthermore, with the advantages of summary-worthy visual information, our model can have a significant improvement on small datasets or even datasets with limited training data.

AAAI Conference 2022 Conference Paper

CINS: Comprehensive Instruction for Few-Shot Learning in Task-Oriented Dialog Systems

  • Fei Mi
  • Yasheng Wang
  • Yitong Li

As the labeling cost for different modules in task-oriented dialog (ToD) systems is high, a major challenge is to learn different tasks with the least amount of labeled data. Recently, pre-trained language models (PLMs) have shown promising results for few-shot learning in ToD. To better utilize the power of PLMs, this paper proposes Comprehensive Instruction (CINS) that exploits PLMs with extra taskspecific instructions. We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD, ie. intent classification, dialog state tracking, and natural language generation. A sequence-to-sequence model (T5) is adopted to solve these three tasks in a unified framework. Extensive experiments are conducted on these ToD tasks in realistic fewshot learning scenarios with small validation data. Empirical results demonstrate that the proposed CINS approach consistently improves techniques that finetune PLMs with raw input or short prompt.

NeurIPS Conference 2022 Conference Paper

Sparse Structure Search for Delta Tuning

  • Shengding Hu
  • Zhen Zhang
  • Ning Ding
  • Yadao Wang
  • Yasheng Wang
  • Zhiyuan Liu
  • Maosong Sun

Adapting large pre-trained models (PTMs) through fine-tuning imposes prohibitive computational and storage burdens. Recent studies of delta tuning (DT), i. e. , parameter-efficient tuning, find that only optimizing a small portion of parameters conditioned on PTMs could yield on-par performance compared to conventional fine-tuning. Generally, DT methods exquisitely design delta modules (DT modules) which could be applied to arbitrary fine-grained positions inside PTMs. However, the effectiveness of these fine-grained positions largely relies on sophisticated manual designation, thereby usually producing sub-optimal results. In contrast to the manual designation, we explore constructing DT modules in an automatic manner. We automatically \textbf{S}earch for the \textbf{S}parse \textbf{S}tructure of \textbf{Delta} Tuning (S$^3$Delta). Based on a unified framework of various DT methods, S$^3$Delta conducts the differentiable DT structure search through bi-level optimization and proposes shifted global sigmoid method to explicitly control the number of trainable parameters. Extensive experiments show that S$^3$Delta surpasses manual and random structures with less trainable parameters. The searched structures preserve more than 99\% fine-tuning performance with 0. 01\% trainable parameters. Moreover, the advantage of S$^3$Delta is amplified with extremely low trainable parameters budgets (0. 0009\%$\sim$0. 01\%). The searched structures are transferable and explainable, providing suggestions and guidance for the future design of DT methods. Our codes are publicly available at \url{https: //github. com/thunlp/S3Delta}.

AAAI Conference 2022 Conference Paper

UniMS: A Unified Framework for Multimodal Summarization with Knowledge Distillation

  • Zhengkun Zhang
  • Xiaojun Meng
  • Yasheng Wang
  • Xin Jiang
  • Qun Liu
  • Zhenglu Yang

With the rapid increase of multimedia data, a large body of literature has emerged to work on multimodal summarization, the majority of which target at refining salient information from textual and visual modalities to output a pictorial summary with the most relevant images. Existing methods mostly focus on either extractive or abstractive summarization and rely on qualified image captions to build image references. We are the first to propose a Unified framework for Multimodal Summarization grounding on BART, UniMS, that integrates extractive and abstractive objectives, as well as selecting the image output. Specially, we adopt knowledge distillation from a vision-language pretrained model to improve image selection, which avoids any requirement on the existence and quality of image captions. Besides, we introduce a visual guided decoder to better integrate textual and visual modalities in guiding abstractive text generation. Results show that our best model achieves a new state-of-the-art result on a large-scale benchmark dataset. The newly involved extractive objective as well as the knowledge distillation technique are proven to bring a noticeable improvement to the multimodal summarization task.

AAAI Conference 2020 Conference Paper

Multi-Channel Reverse Dictionary Model

  • Lei Zhang
  • Fanchao Qi
  • Zhiyuan Liu
  • Yasheng Wang
  • Qun Liu
  • Maosong Sun

A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Existing reverse dictionary methods cannot deal with highly variable input queries and low-frequency target words successfully. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate the two problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves the state-of-the-art performance, and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained on https: //github. com/thunlp/MultiRD.