Arrow Research search

Author name cluster

Zhumin Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

NeurIPS Conference 2025 Conference Paper

Belief-Calibrated Multi-Agent Consensus Seeking for Complex NLP Tasks

  • Wentao Deng
  • Jiahuan Pei
  • Zhiwei Xu
  • Zhaochun Ren
  • Zhumin Chen
  • Pengjie Ren

A multi-agent system (MAS) enhances its capacity to solve complex natural language processing (NLP) tasks through collaboration among multiple agents, where consensus-seeking serves as a fundamental mechanism. However, existing consensus-seeking approaches typically rely on voting mechanisms to judge consensus, overlooking contradictions in system-internal beliefs that destabilize the consensus. Moreover, these methods often involve agents updating their results through indiscriminate collaboration with every other agent. Such uniform interaction fails to identify the optimal collaborators for each agent, hindering the emergence of a stable consensus. To address these challenges, we provide a theoretical framework for selecting optimal collaborators that maximize consensus stability. Based on the theorems, we propose the Belief-Calibrated Consensus Seeking (BCCS) framework to facilitate stable consensus via selecting optimal collaborators and calibrating the consensus judgment by system-internal beliefs. Experimental results on the MATH and MMLU benchmark datasets demonstrate that the proposed BCCS framework outperforms the best existing results by 2.23% and 3.95% of accuracy on challenging tasks, respectively. Our code and data are available at https://github.com/dengwentao99/BCCS.
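The voting-based consensus judgment that the abstract contrasts itself with can be sketched roughly as follows. This is only an illustration of plain majority voting, not the BCCS method; the function name `majority_consensus` and the `threshold` parameter are hypothetical and do not come from the paper or its repository.

```python
from collections import Counter


def majority_consensus(answers, threshold=0.5):
    """Judge consensus by simple voting (hypothetical baseline, not BCCS).

    Returns (top_answer, reached), where `reached` is True only if the most
    common answer's share of the votes is strictly greater than `threshold`.
    """
    if not answers:
        return None, False
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers) > threshold


# Three agents, two of which agree on the same answer.
answer, reached = majority_consensus(["42", "42", "17"])
```

Voting of this kind looks only at the final answers, which is the limitation the abstract points to: it ignores the beliefs the agents hold behind those answers.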

AAAI Conference 2025 Conference Paper

ExcluIR: Exclusionary Neural Information Retrieval

  • Wenhao Zhang
  • Mengqi Zhang
  • Shiguang Wu
  • Jiahuan Pei
  • Zhaochun Ren
  • Maarten de Rijke
  • Zhumin Chen
  • Pengjie Ren

Exclusion is an important and universal linguistic skill that humans use to express what they do not want. There is little research on exclusionary retrieval, where users express what they do not want to be part of the results produced for their queries. We investigate the scenario of exclusionary retrieval in document retrieval for the first time. We present ExcluIR, a set of resources for exclusionary retrieval, consisting of an evaluation benchmark and a training set for helping retrieval models to comprehend exclusionary queries. The evaluation benchmark includes 3,452 high-quality exclusionary queries, each of which has been manually annotated. The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document. We conduct detailed experiments and analyses, obtaining three main observations: (i) existing retrieval models with different architectures struggle to comprehend exclusionary queries effectively; (ii) although integrating our training data can improve the performance of retrieval models on exclusionary retrieval, there still exists a gap compared to human performance; and (iii) generative retrieval models have a natural advantage in handling exclusionary queries.

ICLR Conference 2025 Conference Paper

Uncovering Overfitting in Large Language Model Editing

  • Mengqi Zhang 0002
  • Xiaotian Ye
  • Qiang Liu 0006
  • Shu Wu
  • Pengjie Ren
  • Zhumin Chen

Knowledge editing has been proposed as an effective method for updating and correcting the internal knowledge of Large Language Models (LLMs). However, existing editing methods often struggle with complex tasks, such as multi-hop reasoning. In this paper, we identify and investigate the phenomenon of Editing Overfit, where edited models assign disproportionately high probabilities to the edit target, hindering the generalization of new knowledge in complex scenarios. We attribute this issue to the current editing paradigm, which places excessive emphasis on the direct correspondence between the input prompt and the edit target for each edit sample. To further explore this issue, we introduce a new benchmark, EVOKE (EValuation of Editing Overfit in Knowledge Editing), along with fine-grained evaluation metrics. Through comprehensive experiments and analysis, we demonstrate that Editing Overfit is prevalent in current editing methods and that common overfitting mitigation strategies are ineffective in knowledge editing. To overcome this, inspired by LLMs’ knowledge recall mechanisms, we propose a new plug-and-play strategy called Learn the Inference (LTI), which introduces a Multi-stage Inference Constraint module to guide the edited models in recalling new knowledge similarly to how unedited LLMs leverage knowledge through in-context learning. Extensive experimental results across a wide range of tasks validate the effectiveness of LTI in mitigating Editing Overfit.

AAAI Conference 2024 Conference Paper

Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum

  • Shen Gao
  • Zhengliang Shi
  • Minghang Zhu
  • Bowen Fang
  • Xin Xin
  • Pengjie Ren
  • Zhumin Chen
  • Jun Ma

Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extending the capability of LLMs. Although there are some works that employ open-source LLMs for the tool-learning task, most of them are trained in a controlled environment in which LLMs only learn to execute the human-provided tools. However, selecting proper tools from a large toolset is also a crucial ability for a tool-learning model to be applied in real-world applications. Existing methods usually directly employ self-instruction methods to train the model, which ignores differences in tool complexity. In this paper, we propose Confucius, a novel tool-learning framework to train LLMs to use complicated tools in real-world scenarios, which contains two main phases: (1) we first propose a multi-stage learning method to teach the LLM to use various tools from an easy-to-difficult curriculum; (2) we then propose the Iterative Self-instruct from Introspective Feedback (ISIF) to dynamically construct the dataset to improve the ability to use complicated tools. Extensive experiments conducted on both controlled and real-world settings demonstrate the superiority of our tool-learning framework in the real-world application scenario compared to both tuning-free (e.g., ChatGPT, Claude) and tuning-based baselines (e.g., GPT4Tools).

NeurIPS Conference 2023 Conference Paper

Learning to Tokenize for Generative Retrieval

  • Weiwei Sun
  • Lingyong Yan
  • Zheng Chen
  • Shuaiqiang Wang
  • Haichao Zhu
  • Pengjie Ren
  • Zhumin Chen
  • Dawei Yin

As a new paradigm in information retrieval, generative retrieval directly generates a ranked list of document identifiers (docids) for a given query using generative language models (LMs). How to assign each document a unique docid (denoted as document tokenization) is a critical problem, because it determines whether the generative retrieval model can precisely retrieve any document by simply decoding its docid. Most existing methods adopt rule-based tokenization, which is ad-hoc and does not generalize well. In contrast, in this paper we propose a novel document tokenization learning method, GenRet, which learns to encode the complete document semantics into docids. GenRet learns to tokenize documents into short discrete representations (i.e., docids) via a discrete auto-encoding approach. We develop a progressive training scheme to capture the autoregressive nature of docids and diverse clustering techniques to stabilize the training process. Based on the semantic-embedded docids of any set of documents, the generative retrieval model can learn to generate the most relevant docid only according to the docids' semantic relevance to the queries. We conduct experiments on the NQ320K, MS MARCO, and BEIR datasets. GenRet establishes the new state-of-the-art on the NQ320K dataset. Compared to generative retrieval baselines, GenRet can achieve significant improvements on unseen documents. Moreover, GenRet can also outperform comparable baselines on MS MARCO and BEIR, demonstrating the method's generalizability.

AAAI Conference 2022 Conference Paper

Knowledge Bridging for Empathetic Dialogue Generation

  • Qintong Li
  • Piji Li
  • Zhaochun Ren
  • Pengjie Ren
  • Zhumin Chen

Lack of external knowledge makes it difficult for empathetic dialogue systems to perceive implicit emotions and learn emotional interactions from limited dialogue history. To address the above problems, we propose to leverage external knowledge, including commonsense knowledge and emotional lexical knowledge, to explicitly understand and express emotions in empathetic dialogue generation. We first enrich the dialogue history by jointly interacting with external knowledge and construct an emotional context graph. Then we learn emotional context representations from the knowledge-enriched emotional context graph and distill emotional signals, which are the prerequisites to predict emotions expressed in responses. Finally, to generate the empathetic response, we propose an emotional cross-attention mechanism to learn the emotional dependencies from the emotional context graph. Extensive experiments conducted on a benchmark dataset verify the effectiveness of the proposed method. In addition, we find the performance of our method can be further improved by integrating with a pre-trained model that works orthogonally.

ECAI Conference 2020 Conference Paper

A Neural Topical Expansion Framework for Unstructured Persona-Oriented Dialogue Generation

  • Minghong Xu
  • Piji Li
  • Haoran Yang
  • Pengjie Ren
  • Zhaochun Ren
  • Zhumin Chen
  • Jun Ma 0001

Unstructured Persona-oriented Dialogue Systems (UPDS) have been demonstrated to be effective in generating persona-consistent responses by utilizing predefined natural language user persona descriptions (e.g., “I am a vegan”). However, the predefined user persona descriptions are usually short and limited to only a few descriptive words, which makes it hard to correlate them with the dialogues. As a result, existing methods either fail to use the persona descriptions or use them improperly when generating persona-consistent responses. To address this, we propose a neural topical expansion framework, namely Persona Exploration and Exploitation (PEE), which is able to extend the predefined user persona description with semantically correlated content before utilizing it to generate dialogue responses. PEE consists of two main modules: persona exploration and persona exploitation. The former learns to extend the predefined user persona description by mining and correlating with existing dialogue corpus using a variational auto-encoder (VAE) based topic model. The latter learns to generate persona-consistent responses by utilizing the predefined and extended user persona description. In order to make persona exploitation learn to utilize user persona description more properly, we also introduce two persona-oriented loss functions: Persona-oriented Matching (P-Match) loss and Persona-oriented Bag-of-Words (P-BoWs) loss, which respectively supervise persona selection in the encoder and decoder. Experimental results show that our approach outperforms state-of-the-art baselines, in terms of both automatic and human evaluations.

AAAI Conference 2020 Conference Paper

RefNet: A Reference-Aware Network for Background Based Conversation

  • Chuan Meng
  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Existing conversational systems tend to generate generic responses. Recently, Background Based Conversations (BBCs) have been introduced to address this issue. Here, the generated responses are grounded in some background information. The proposed methods for BBCs are able to generate more informative responses; however, they either cannot generate natural responses or have difficulties in locating the right background information. In this paper, we propose a Reference-aware Network (RefNet) to address both issues. Unlike existing methods that generate responses token by token, RefNet incorporates a novel reference decoder that provides an alternative way to learn to directly select a semantic unit (e.g., a span containing complete semantic information) from the background. Experimental results show that RefNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that RefNet can generate more appropriate and human-like responses.

AAAI Conference 2020 Conference Paper

Thinking Globally, Acting Locally: Distantly Supervised Global-to-Local Knowledge Selection for Background Based Conversation

  • Pengjie Ren
  • Zhumin Chen
  • Christof Monz
  • Jun Ma
  • Maarten de Rijke

Background Based Conversations (BBCs) have been introduced to help conversational systems avoid generating overly generic responses. In a BBC, the conversation is grounded in a knowledge source. A key challenge in BBCs is Knowledge Selection (KS): given a conversational context, try to find the appropriate background knowledge (a text fragment containing related facts or comments, etc.) based on which to generate the next response. Previous work addresses KS by employing attention and/or pointer mechanisms. These mechanisms use a local perspective, i.e., they select a token at a time based solely on the current decoding state. We argue for the adoption of a global perspective, i.e., pre-selecting some text fragments from the background knowledge that could help determine the topic of the next response. We enhance KS in BBCs by introducing a Global-to-Local Knowledge Selection (GLKS) mechanism. Given a conversational context and background knowledge, we first learn a topic transition vector to encode the most likely text fragments to be used in the next response, which is then used to guide the local KS at each decoding timestamp. In order to effectively learn the topic transition vector, we propose a distantly supervised learning schema. Experimental results show that the GLKS model significantly outperforms state-of-the-art methods in terms of both automatic and human evaluation. More importantly, GLKS achieves this without requiring any extra annotations, which demonstrates its high degree of scalability.

AAAI Conference 2019 Conference Paper

RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-Based Recommendation

  • Pengjie Ren
  • Zhumin Chen
  • Jing Li
  • Zhaochun Ren
  • Jun Ma
  • Maarten de Rijke

Recurrent neural networks for session-based recommendation have attracted a lot of attention recently because of their promising performance. Repeat consumption is a common phenomenon in many recommendation scenarios (e.g., e-commerce, music, and TV program recommendations), where the same item is re-consumed repeatedly over time. However, no previous studies have emphasized repeat consumption with neural networks. An effective neural approach is needed to decide when to perform repeat recommendation. In this paper, we incorporate a repeat-explore mechanism into neural networks and propose a new model, called RepeatNet, with an encoder-decoder structure. RepeatNet integrates a regular neural recommendation approach in the decoder with a new repeat recommendation mechanism that can choose items from a user’s history and recommend them at the right time. We report on extensive experiments on three benchmark datasets. RepeatNet outperforms state-of-the-art baselines on all three datasets in terms of MRR and Recall. Furthermore, as the dataset size and the repeat ratio increase, the improvements of RepeatNet over the baselines also increase, which demonstrates its advantage in handling repeat recommendation scenarios.
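The two metrics named in the abstract, MRR and Recall, are commonly computed over the top-k recommended items. A minimal sketch under that assumption follows; the cutoff `k`, the function names, and the toy data are illustrative and not taken from the paper.

```python
def reciprocal_rank(ranked_items, target):
    """Return 1/rank of `target` in `ranked_items` (1-indexed), or 0.0 if absent."""
    for rank, item in enumerate(ranked_items, start=1):
        if item == target:
            return 1.0 / rank
    return 0.0


def evaluate(predictions, targets, k=20):
    """Compute (MRR@k, Recall@k) over sessions.

    predictions: one ranked list of recommended item ids per session.
    targets: the single ground-truth next item for each session.
    """
    n = len(targets)
    mrr = sum(reciprocal_rank(p[:k], t) for p, t in zip(predictions, targets)) / n
    recall = sum(t in p[:k] for p, t in zip(predictions, targets)) / n
    return mrr, recall


# Toy example: the first target is ranked second, the second is never recommended.
mrr, recall = evaluate([["a", "b", "c"], ["x", "y", "z"]], ["b", "q"], k=2)
```

With a single ground-truth item per session, Recall@k reduces to the fraction of sessions whose target appears in the top k, while MRR@k additionally rewards placing it near the top.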