Arrow Research search

Author name cluster

Geng Tu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

AAAI Conference 2026 Conference Paper

Causal-ERC: A Multimodal Framework with Causal Prompting for Emotion Recognition in Conversations with Large Language Models

  • Ran Jing
  • Geng Tu
  • Yice Zhang
  • Ruifeng Xu

The rapid advancement of large language models (LLMs) has revitalised research in Emotion Recognition in Conversation (ERC). However, existing LLM-based ERC approaches operate solely on textual input, whereas MLLM-based emotion recognition methods in non-conversational scenarios typically perform only basic multimodal fusion and fail to consider speaker-sensitive contextual dependencies, which limits their performance on ERC tasks. To integrate multimodal cues effectively and address these limitations in handling contextual dependencies, we propose a novel LLM-based framework, Causal-ERC, which captures context representations within each modality and incorporates them into the LLM. Moreover, experimental results show that LLMs perform poorly on long conversations. To improve LLMs' ability to model long conversations, we adjust the corresponding causal prompts according to the causal type of each utterance. Experiments on two benchmark multimodal ERC (MERC) datasets demonstrate that our Causal-ERC framework consistently outperforms existing state-of-the-art approaches and improves LLMs' performance in long-context scenarios.
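
The abstract only names the mechanism of causal-type-dependent prompting. As a minimal, hypothetical sketch of what such prompt selection could look like, the snippet below picks an instruction per utterance based on its causal type; the causal types, templates, and function name are invented for illustration and are not taken from the paper.

```python
# Hypothetical sketch: pick a causal instruction per utterance, then build an ERC prompt.
# The causal types and templates below are illustrative, not the paper's actual prompts.

CAUSAL_PROMPTS = {
    "self_cause":  "Consider how the speaker's own earlier statements shape this utterance.",
    "inter_cause": "Consider how the other speakers' reactions shape this utterance.",
    "no_cause":    "Consider this utterance on its own.",
}

def build_prompt(context: list[str], utterance: str, causal_type: str) -> str:
    """Compose an emotion-recognition prompt whose instruction depends on the causal type."""
    instruction = CAUSAL_PROMPTS.get(causal_type, CAUSAL_PROMPTS["no_cause"])
    history = "\n".join(context)
    return (
        f"Conversation so far:\n{history}\n\n"
        f"Target utterance: {utterance}\n"
        f"{instruction}\n"
        "Predict the speaker's emotion with a single label."
    )

print(build_prompt(["A: I failed the exam.", "B: Oh no, I'm so sorry."],
                   "A: I don't know what to do now.", "inter_cause"))
```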

AAAI Conference 2026 Conference Paper

Consensus-Driven Multi-Agent Cognitive Reasoning for Enhancing the Emotional Intelligence of Large Language Models

  • Geng Tu
  • Dingming Li
  • Jun Huang
  • Ruifeng Xu

Large Language Models (LLMs) have demonstrated strong performance in various NLP tasks but remain limited in emotional intelligence (EI). Benchmarks such as EmoBench attribute this gap to deficiencies in cognitively demanding tasks that require inferring others' latent mental states, intentions, and emotions in nuanced social contexts. To address this, we propose MACRo, a Multi-Agent Cognitive Reasoning framework that generates a structured Cognitive Chain of Thought comprising Situation, Clue, Thought, Action, and Emotion. Each component is generated by a specialized agent, enabling modular, interpretable multi-step reasoning. To ensure coherence and mitigate hallucinations, a coordinator agent verifies outputs, and a consensus game mechanism enforces alignment across reasoning steps. Extensive experiments on EmoBench show that MACRo significantly enhances both emotional understanding and application across LLMs. Further evaluations confirm its generalizability to real-world social applications such as emotional support conversations.
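
As a rough sketch of how a pipeline of specialized agents might populate the five components and run a coordinator check, the Python below uses a placeholder llm callable. The prompts, the contradiction check, and the omission of the consensus game are simplifications assumed here, not the MACRo implementation.

```python
from dataclasses import dataclass, fields
from typing import Callable

@dataclass
class CognitiveChain:
    """The five components named in the abstract."""
    situation: str
    clue: str
    thought: str
    action: str
    emotion: str

def run_cognitive_chain(scenario: str, llm: Callable[[str], str]) -> CognitiveChain:
    """Generate each component with its own 'agent' prompt, then run a coordinator check."""
    parts: dict[str, str] = {}
    for f in fields(CognitiveChain):
        prompt = (
            f"Scenario: {scenario}\n"
            f"Steps so far: {parts}\n"
            f"Write the '{f.name}' step of the reasoning chain."
        )
        parts[f.name] = llm(prompt)          # one specialized agent per component
    chain = CognitiveChain(**parts)
    verdict = llm(f"Do these steps contradict each other? {chain}")
    if "yes" in verdict.lower():
        # The paper uses a consensus game to re-align agents; here we only flag it.
        print("Coordinator flagged an inconsistency; regeneration would be triggered.")
    return chain

# Dummy usage with a stand-in 'LLM' that returns a canned response:
demo = run_cognitive_chain("A friend cancels plans at the last minute.",
                           llm=lambda prompt: "placeholder response")
print(demo)
```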

IS Journal 2025 Journal Article

Benchmarking Explainable Argumentation Dialogue via Freeman’s Theory

  • Yang Sun
  • Geng Tu
  • Wenpeng Lu
  • Min Yang
  • Erik Cambria
  • Ruifeng Xu

Argumentative dialogue involves structured exchanges of claims and supporting evidence, yet progress in building effective dialogue systems is limited by the scarcity of high-quality datasets. To address this, we introduce CMV-AD, a baseline dataset derived from the ChangeMyView corpus, designed for modeling structured argumentative interactions. We further propose FTCoT, a Freeman’s Theory-based Chain-of-Thought framework that enhances interpretability and reasoning in dialogue generation. FTCoT represents each dialogue turn with a structured quadruple: Dialogue Summary, User Argument, Assistant Argument, and Response Reasoning. We construct FTCoT using large language models (LLMs), leveraging their capabilities in reasoning and data annotation. Extensive automatic and human evaluations demonstrate the effectiveness of FTCoT in improving both the interpretability and quality of generated responses.
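
The quadruple structure lends itself to a simple data representation. The sketch below merely mirrors the four components named in the abstract; the field names follow the abstract, while the example content is invented and not drawn from CMV-AD.

```python
from dataclasses import dataclass

@dataclass
class FTCoTTurn:
    """One dialogue turn represented as the four FTCoT components."""
    dialogue_summary: str      # condensed state of the dialogue so far
    user_argument: str         # the user's claim and supporting evidence
    assistant_argument: str    # the assistant's supporting or counter-argument
    response_reasoning: str    # the reasoning that leads to the final response

turn = FTCoTTurn(
    dialogue_summary="User argues remote work lowers productivity.",
    user_argument="Fewer spontaneous discussions mean slower problem solving.",
    assistant_argument="Studies of distributed teams report that focus time often rises.",
    response_reasoning="Acknowledge the coordination cost, then cite the focus-time evidence.",
)
print(turn)
```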

AAAI Conference 2025 Conference Paper

BeyondGender: A Multifaceted Bilingual Dataset for Practical Sexism Detection

  • Xuan Luo
  • Li Yang
  • Han Zhang
  • Geng Tu
  • Qianlong Wang
  • Keyang Ding
  • Chuang Fan
  • Jing Li

Sexism affects both women and men, yet research often overlooks misandry and suffers from overly broad annotations that limit AI applications. To address this, we introduce BeyondGender, a dataset meticulously annotated according to the latest definitions of misogyny and misandry. It features innovative multifaceted labels encompassing aspects of sexism, gender, phrasing, misogyny, and misandry. The dataset includes 6K English and 1.7K Chinese sexism instances, alongside 13K non-sexism examples. Our evaluations of masked language models and large language models reveal that they detect misogyny in English and misandry in Chinese more effectively, with F1-scores of 0.87 and 0.62, respectively. However, they frequently misclassify hostile and mild comments, underscoring the complexity of sexism detection. Parallel corpus experiments suggest promising data augmentation strategies to enhance AI systems for nuanced sexism detection, and our dataset can be leveraged to improve value alignment in large language models.

IS Journal 2024 Journal Article

AdaCLF: An Adaptive Curriculum Learning Framework for Emotional Support Conversation

  • Geng Tu
  • Taiyu Niu
  • Ruifeng Xu
  • Bin Liang
  • Erik Cambria

Emotional support conversation (ESC) aims to alleviate emotional distress using data-driven approaches trained on human-generated responses. However, the subjective and open-ended nature of human conversations presents challenges in training ESC models due to uneven complexities in query–response pairs. This uneven complexity impedes the efficiency and effectiveness of learning in ESC models. To address this, we propose an adaptive curriculum learning framework (AdaCLF) to dynamically choose courses of varying complexity according to the learning status of the ESC model. AdaCLF consists of two main components: the student model (referred to as the ESC model) and the teacher model (responsible for selecting appropriate data to enhance the student model's training). The framework operates within the reinforcement learning paradigm, where the teacher model utilizes feedback from the student model to optimize its teaching strategy, fostering collaborative evolution. Both automatic and human evaluations on benchmark datasets demonstrate that our framework significantly improves existing ESC methods, generating more effective supportive responses.
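
The teacher/student loop described here can be made concrete with a small skeleton. Everything below (function names, the reward definition, the dummy components) is a placeholder assumed for illustration; it is not the AdaCLF implementation or its actual RL update.

```python
import random

def curriculum_loop(pool, student_step, student_eval, teacher_select, teacher_update, rounds=3):
    """Skeleton of a teacher-selects / student-trains / teacher-rewarded loop.
    All callables are stand-ins; the real models and policy update are not shown."""
    prev = student_eval()
    for _ in range(rounds):
        batch = teacher_select(pool)                # teacher picks a curriculum slice
        student_step(batch)                         # ESC (student) model trains on it
        score = student_eval()                      # feedback from the student
        teacher_update(reward=score - prev)         # reinforce selections that helped
        prev = score

# Dummy usage with stand-in components and a scripted evaluation score:
scores = iter([0.50, 0.55, 0.58, 0.60])
curriculum_loop(
    pool=[("query", "response")] * 100,
    student_step=lambda batch: None,
    student_eval=lambda: next(scores),
    teacher_select=lambda pool: random.sample(pool, 8),
    teacher_update=lambda reward: print(f"teacher reward: {reward:+.2f}"),
)
```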

AAAI Conference 2024 Conference Paper

Adaptive Graph Learning for Multimodal Conversational Emotion Detection

  • Geng Tu
  • Tian Xie
  • Bin Liang
  • Hongpeng Wang
  • Ruifeng Xu

Multimodal Emotion Recognition in Conversations (ERC) aims to identify the emotions conveyed by each utterance in a conversational video. Current efforts encounter challenges in balancing intra- and inter-speaker context dependencies when tackling intra-modal interactions. This balance is vital, as it encompasses modeling self-dependency (emotional inertia), where speakers' own emotions affect them, and modeling interpersonal dependencies (empathy), where counterparts' emotions influence a speaker. Furthermore, challenges arise in addressing cross-modal interactions that involve content with conflicting emotions across different modalities. To address these issues, we introduce an adaptive interactive graph network (IGN) called AdaIGN that employs the Gumbel-Softmax trick to adaptively select nodes and edges, enhancing intra- and cross-modal interactions. Unlike approaches built on undirected graphs, we use a directed IGN to prevent future utterances from impacting the current one. Next, we propose Node- and Edge-level Selection Policies (NESP) to guide node and edge selection, along with a Graph-Level Selection Policy (GSP) to integrate the utterance representations from the original IGN and the NESP-enhanced IGN. Moreover, we design a task-specific loss function that prioritizes text modality and intra-speaker context selection. To reduce computational complexity, we use pre-defined pseudo labels obtained through self-supervised methods to mask unnecessary utterance nodes from selection. Experimental results show that AdaIGN outperforms state-of-the-art methods on two popular datasets. Our code will be available at https://github.com/TuGengs/AdaIGN.
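
Gumbel-Softmax selection over graph edges is a standard, concrete trick; the sketch below shows how differentiable edge selection on a directed utterance graph might look in PyTorch. The bilinear scorer, dimensions, and masking convention are assumptions made for illustration and are not the AdaIGN implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of differentiable edge selection with the Gumbel-Softmax trick.
num_nodes, dim = 6, 16
node_feats = torch.randn(num_nodes, dim)          # one vector per utterance node

# Score every directed edge (i -> j) with a tiny bilinear scorer: 2 logits = [drop, keep].
scorer = torch.nn.Bilinear(dim, dim, 2)
src = node_feats.unsqueeze(1).expand(-1, num_nodes, -1).reshape(-1, dim)
dst = node_feats.unsqueeze(0).expand(num_nodes, -1, -1).reshape(-1, dim)
edge_logits = scorer(src, dst)                    # (num_nodes * num_nodes, 2)

# Hard, yet differentiable, keep/drop decision per edge.
keep = F.gumbel_softmax(edge_logits, tau=1.0, hard=True)[:, 1]
adj = keep.view(num_nodes, num_nodes)             # adj[i, j] = 1 keeps edge i -> j

# Directed graph: only earlier (or the same) utterances may send messages forward,
# so edges from future utterances to the current one are masked out.
adj = adj * torch.triu(torch.ones(num_nodes, num_nodes))
print(adj)
```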

JAIR Journal 2024 Journal Article

Multi-Modal Attentive Prompt Learning for Few-shot Emotion Recognition in Conversations

  • Xingwei Liang
  • Geng Tu
  • Jiachen Du
  • Ruifeng Xu

Emotion recognition in conversations (ERC) has emerged as an important research area in Natural Language Processing and Affective Computing, focusing on accurately identifying emotions within each conversational utterance. Conventional approaches typically rely on labeled training samples for fine-tuning pre-trained language models (PLMs) to enhance classification performance. However, the limited availability of labeled data in real-world scenarios poses a significant challenge, potentially resulting in diminished model performance. In response to this challenge, we present the Multi-modal Attentive Prompt (MAP) learning framework, tailored specifically for few-shot emotion recognition in conversations. The MAP framework consists of four integral modules: multi-modal feature extraction for the sequential embedding of text, visual, and acoustic inputs; a multi-modal prompt generation module that creates six manually designed multi-modal prompts; an attention mechanism for prompt aggregation; and an emotion inference module for emotion prediction. To evaluate our proposed model's efficacy, we conducted extensive experiments on two widely recognized benchmark datasets, MELD and IEMOCAP. Our results demonstrate that the MAP framework outperforms state-of-the-art ERC models, yielding notable improvements of 3.5% and 0.4% in micro F1 scores. These findings highlight the MAP learning framework's ability to effectively address the challenge of limited labeled data in emotion recognition, offering a promising strategy for improving ERC model performance.
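
As a rough illustration of attending over a fixed set of prompt representations, here is a minimal PyTorch sketch. The dimensions, dot-product scorer, and variable names are assumptions for illustration only, not the MAP framework's actual architecture.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: aggregate several prompt embeddings via attention from the utterance.
dim, num_prompts = 32, 6
utterance = torch.randn(1, dim)             # fused multimodal utterance representation
prompts = torch.randn(num_prompts, dim)     # one embedding per hand-designed prompt

# Attend from the utterance to the prompts and form a weighted prompt summary.
scores = utterance @ prompts.T / dim ** 0.5           # (1, num_prompts)
weights = F.softmax(scores, dim=-1)
aggregated_prompt = weights @ prompts                 # (1, dim)

# The aggregated prompt would then condition the PLM's emotion prediction.
print(weights)
```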

ECAI Conference 2023 Conference Paper

Do Topic and Causal Consistency Affect Emotion Cognition? A Graph Interactive Network for Conversational Emotion Detection

  • Geng Tu
  • Bin Liang 0004
  • Xiucheng Lyu
  • Lin Gui 0003
  • Ruifeng Xu 0001

Emotion recognition in conversations (ERC) typically requires modeling both intra- and inter-speaker context dependencies. However, when modeling inter-speaker dependencies, existing methods may not capture differences among the other participants in the conversation. Recent ERC research has attempted to improve utterance representations by utilizing speakers' commonsense knowledge. Nonetheless, these studies ignore the causal consistency in knowledge between the two participants, which contradicts the above modeling of speaker-sensitive context dependencies. Additionally, historical utterances from various topics are often blindly leveraged in context modeling, which undermines inter- and intra-topic coherence. To address these issues, we propose the topic- and causal-aware interactive graph network (TCA-IGN). Specifically, we suggest a graph encoder to model topic-level context dependencies, achieving inter- and intra-topic coherence. The topics of utterances are derived from a context-sensitive neural topic model. Then, we present a causal-aware graph attention to preserve the speaker's causal consistency in commonsense knowledge, improving speaker-level context modeling. Finally, to compensate for remaining deficiencies in modeling inter-speaker and inter-topic context dependencies, we employ supervised contrastive learning. Experimental results show that TCA-IGN outperforms state-of-the-art methods on three public conversational datasets.
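
The abstract mentions supervised contrastive learning; below is a minimal sketch of the standard supervised contrastive loss over utterance embeddings, included only to make that component concrete. The temperature, shapes, and demo inputs are illustrative assumptions, not TCA-IGN's implementation.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Standard supervised contrastive loss.
    features: (N, d) un-normalized embeddings; labels: (N,) emotion labels."""
    z = F.normalize(features, dim=1)
    sim = z @ z.T / tau                                   # pairwise cosine similarities / tau
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all other samples for each anchor (self excluded).
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability of positives per anchor; anchors without positives are skipped.
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1)[valid] / pos_counts[valid]
    return per_anchor.mean()

# Dummy usage with random embeddings and toy emotion labels:
loss = supcon_loss(torch.randn(8, 16), torch.tensor([0, 0, 1, 1, 2, 2, 0, 1]))
print(loss)
```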