Arrow Research search

Author name cluster

Chengming Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
2 author rows

Possible papers (9)

AAAI Conference 2026 Conference Paper

Emotion and Intention Guided Multi-Modal Learning for Sticker Response Selection

  • Yuxuan Hu
  • Jian Chen
  • Yuhao Wang
  • Zixuan Li
  • Jing Xiong
  • Pengyue Jia
  • Wei Wang
  • Chengming Li

Stickers are widely used in online communication to convey emotions and implicit intentions. The Sticker Response Selection (SRS) task aims to select the most contextually appropriate sticker based on the dialogue. However, existing methods typically rely on semantic matching and model emotional and intentional cues separately, which can lead to mismatches when emotions and intentions are misaligned. To address this issue, we propose Emotion and Intention Guided Multi-Modal Learning (EIGML). This framework is the first to jointly model emotion and intention, effectively reducing the bias caused by isolated modeling and significantly improving selection accuracy. Specifically, we introduce a Dual-Level Contrastive Framework to perform both intra-modality and inter-modality alignment, ensuring consistent representation of emotional and intentional features within and across modalities. In addition, we design an Intention-Emotion Guided Multi-Modal Fusion module that integrates emotional and intentional information progressively through three components: Emotion-Guided Intention Knowledge Selection, Intention-Emotion Guided Attention Fusion, and Similarity-Adjusted Matching Mechanism. This design injects rich, effective information into the model and enables a deeper understanding of the dialogue, ultimately enhancing sticker selection performance. Experimental results on two public datasets show that EIGML outperforms state-of-the-art baselines, achieving higher accuracy and a better understanding of emotional and intentional features.
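
As a rough illustration of the dual-level contrastive idea described above, the sketch below aligns emotion and intention embeddings both within and across the text and image modalities using a standard InfoNCE loss. The function names, the InfoNCE form, and the equal weighting of the four terms are assumptions for illustration only, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """Standard InfoNCE over a batch: each row's positive is the same row in `positive`."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0))            # diagonal entries are the positives
    return F.cross_entropy(logits, targets)

def dual_level_contrastive_loss(text_emo, text_int, img_emo, img_int):
    """Illustrative dual-level loss: intra-modality alignment (emotion vs. intention
    within each modality) plus inter-modality alignment (text vs. image for each cue)."""
    intra = info_nce(text_emo, text_int) + info_nce(img_emo, img_int)
    inter = info_nce(text_emo, img_emo) + info_nce(text_int, img_int)
    return intra + inter

# Toy usage with random features standing in for encoder outputs.
B, D = 8, 256
loss = dual_level_contrastive_loss(torch.randn(B, D), torch.randn(B, D),
                                   torch.randn(B, D), torch.randn(B, D))
print(loss.item())
```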

AAAI Conference 2025 Conference Paper

Training on the Benchmark Is Not All You Need

  • Shiwen Ni
  • Xiangtao Kong
  • Chengming Li
  • Xiping Hu
  • Ruifeng Xu
  • Jia Zhu
  • Min Yang

The success of Large Language Models (LLMs) relies heavily on the vast amount of data learned during the pre-training phase. The opacity of the pre-training process and the training data causes the results of many benchmark tests to become unreliable. If a model has been trained on a benchmark's test set, this can seriously hinder the health of the field. In order to test the capabilities of large language models automatically and efficiently, numerous mainstream benchmarks adopt a multiple-choice format. As swapping the contents of multiple-choice options does not affect the meaning of the question itself, we propose a simple and effective data leakage detection method based on this property. Specifically, we shuffle the contents of the options in the data to generate the corresponding derived datasets, and then detect data leakage based on the model's log probability distribution over the derived datasets. If the set of log probabilities contains a maximum that is also an outlier, this indicates that the data has been leaked. Our method works under gray-box conditions without access to model training data or weights, effectively identifying data leakage from benchmark test sets in model pre-training data, including both normal scenarios and complex scenarios where options may have been shuffled intentionally or unintentionally. Through experiments based on two LLMs and benchmark designs, we demonstrate the effectiveness of our method. In addition, we evaluate the degree of data leakage of 35 mainstream open-source LLMs on four benchmark datasets, give a ranking of the leaked LLMs for each benchmark, and find that the Qwen family of LLMs has the highest degree of data leakage.
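
The detection criterion described above (the original option ordering having a maximum, outlier log probability among the shuffled-option variants) can be sketched as follows. The z-score test, the threshold, and the permutation cap are placeholder choices, not the paper's exact procedure; a real run would score each variant with the model under test.

```python
import itertools
from statistics import mean, stdev

def is_leaked(logprob_original, logprobs_shuffled, z_threshold=3.0):
    """Flag possible leakage if the original option ordering has the highest log
    probability and is an outlier relative to the shuffled-option variants.
    The z-score test and threshold are illustrative, not the paper's exact criterion."""
    all_scores = [logprob_original] + list(logprobs_shuffled)
    if logprob_original < max(all_scores):
        return False                       # original ordering is not even the maximum
    mu, sigma = mean(logprobs_shuffled), stdev(logprobs_shuffled)
    if sigma == 0:
        return False
    return (logprob_original - mu) / sigma > z_threshold

def option_permutations(options, limit=23):
    """Derived datasets: shuffle the *contents* of the multiple-choice options.
    `limit` caps the number of permutations scored per question (hypothetical knob)."""
    perms = itertools.permutations(options)
    next(perms)                            # skip the identity permutation (the original)
    return list(itertools.islice(perms, limit))

# Toy usage with made-up log probabilities; a real scoring function
# score(question, options) -> summed per-token log prob is not implemented here.
print(option_permutations(["A", "B", "C"], limit=3))
shuffled_scores = [-42.1, -41.8, -43.0, -42.5, -41.9]
print(is_leaked(-31.2, shuffled_scores))   # True: original ordering is a clear outlier
print(is_leaked(-42.0, shuffled_scores))   # False
```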

ICLR Conference 2024 Conference Paper

DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning

  • Jing Xiong
  • Zixuan Li
  • Chuanyang Zheng
  • Zhijiang Guo
  • Yichun Yin
  • Enze Xie
  • Zhicheng Yang
  • Qingxing Cao

Recent advances in natural language processing, primarily propelled by Large Language Models (LLMs), have showcased their remarkable capabilities grounded in in-context learning. A promising avenue for guiding LLMs in intricate reasoning tasks involves the utilization of intermediate reasoning steps within the Chain-of-Thought (CoT) paradigm. Nevertheless, the central challenge lies in the effective selection of exemplars for facilitating in-context learning. In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking (DQ-LoRe) to automatically select exemplars for in-context learning. Dual Queries first query the LLM to obtain LLM-generated knowledge such as CoT, then query the retriever to obtain the final exemplars using both the question and the knowledge. Moreover, for the second query, LoRe employs dimensionality reduction techniques to refine exemplar selection, ensuring close alignment with the input question's knowledge. Through extensive experiments, we demonstrate that DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%. Our comprehensive analysis further reveals that DQ-LoRe consistently outperforms retrieval-based approaches in terms of both performance and adaptability, especially in scenarios characterized by distribution shifts. DQ-LoRe pushes the boundaries of in-context learning and opens up new avenues for addressing complex reasoning challenges.
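
A minimal sketch of the LoRe-style re-ranking step is given below: embeddings of the (question + LLM-generated CoT) query and the candidate exemplars are projected onto a low-rank subspace via truncated SVD/PCA and re-ranked by cosine similarity there. The rank, top-k, and embedding dimensions are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def lore_rerank(query_emb, exemplar_embs, rank=32, top_k=8):
    """Illustrative LoRe-style re-ranking: project the query (question + generated CoT)
    and candidate exemplar embeddings onto a low-rank subspace via truncated SVD, then
    re-rank by cosine similarity in that subspace. `rank` and `top_k` are placeholders."""
    X = np.vstack([query_emb[None, :], exemplar_embs])   # (1 + N, D)
    X = X - X.mean(axis=0, keepdims=True)                # center before the decomposition
    _, _, vt = np.linalg.svd(X, full_matrices=False)     # top right-singular vectors
    Z = X @ vt[:rank].T                                  # (1 + N, rank) low-rank projection
    q, cands = Z[0], Z[1:]
    sims = cands @ q / (np.linalg.norm(cands, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:top_k]                     # indices of the top-k exemplars

# Toy usage: 100 candidate exemplars with 768-d embeddings standing in for a real retriever.
rng = np.random.default_rng(0)
idx = lore_rerank(rng.normal(size=768), rng.normal(size=(100, 768)))
print(idx)
```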

NeurIPS Conference 2024 Conference Paper

II-Bench: An Image Implication Understanding Benchmark for Multimodal Large Language Models

  • Ziqiang Liu
  • Feiteng Fang
  • Xi Feng
  • Xinrun Du
  • Chenhao Zhang
  • Zekun Wang
  • Yuelin Bai
  • Qixuan Zhao

The rapid advancements in the development of multimodal large language models (MLLMs) have consistently led to new breakthroughs on various benchmarks. In response, numerous challenging and comprehensive benchmarks have been proposed to more accurately assess the capabilities of MLLMs. However, there is a dearth of exploration of the higher-order perceptual capabilities of MLLMs. To fill this gap, we propose the Image Implication understanding Benchmark, II-Bench, which aims to evaluate the model's higher-order perception of images. Through extensive experiments on II-Bench across multiple MLLMs, we have made significant findings. Initially, a substantial gap is observed between the performance of MLLMs and humans on II-Bench. The pinnacle accuracy of MLLMs attains 74.8%, whereas human accuracy averages 90%, peaking at an impressive 98%. Subsequently, MLLMs perform worse on abstract and complex images, suggesting limitations in their ability to understand high-level semantics and capture image details. Finally, it is observed that most models exhibit enhanced accuracy when image sentiment polarity hints are incorporated into the prompts. This observation underscores a notable deficiency in their inherent understanding of image sentiment. We believe that II-Bench will inspire the community to develop the next generation of MLLMs, advancing the journey towards expert artificial general intelligence (AGI). II-Bench is publicly available at https://huggingface.co/datasets/m-a-p/II-Bench.

JBHI Journal 2022 Journal Article

A Multi-Modal Gait Analysis-Based Detection System of the Risk of Depression

  • Wei Shao
  • Zhiyang You
  • Lesheng Liang
  • Xiping Hu
  • Chengming Li
  • Wei Wang
  • Bin Hu

Currently, depression has become a common mental disorder, especially among postgraduates. It is reported that postgraduates have a higher risk of depression than the general public, and they are more sensitive to contact with others. Thus, a non-contact and effective method for detecting people at risk of depression has become an urgent demand. In order to make the recognition of depression more reliable and convenient, we propose a multi-modal gait analysis-based depression detection method that combines the skeleton modality and the silhouette modality. Firstly, we propose a skeleton feature set to describe depression and train a Long Short-Term Memory (LSTM) model on the resulting feature sequences. Secondly, we generate Gait Energy Images (GEI) as silhouette features from RGB videos, and design two Convolutional Neural Network (CNN) models with a new loss function to extract silhouette features from the front and side perspectives. Then, we construct a multi-modal fusion model that fuses the front- and side-view silhouettes at the feature level and the classification results of the different modalities at the decision level. The proposed multi-modal model achieved an accuracy of 85.45% on a dataset of 200 postgraduate students (including 86 depressive ones), 5.17% higher than the best single-mode model. The multi-modal method also shows improved generalization by reducing gender differences. Furthermore, we design a vivid 3D visualization of the gait skeletons, and our results imply that gait is a potent biometric for depression detection.
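
The two-stage fusion described above can be sketched roughly as follows: front- and side-view silhouette (GEI) features are fused at the feature level, and the silhouette branch is then combined with an LSTM branch over skeleton feature sequences at the decision level. All layer sizes, the fusion weight, and the module names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class FusionDepressionDetector(nn.Module):
    """Illustrative two-stage fusion, not the paper's exact model: feature-level fusion of
    front- and side-view GEI features, then decision-level fusion of the silhouette branch
    with an LSTM branch over skeleton feature sequences."""

    def __init__(self, gei_dim=256, skel_dim=64, hidden=128, num_classes=2):
        super().__init__()
        self.skeleton_lstm = nn.LSTM(skel_dim, hidden, batch_first=True)
        self.skeleton_head = nn.Linear(hidden, num_classes)
        self.silhouette_head = nn.Linear(2 * gei_dim, num_classes)   # front + side features

    def forward(self, skel_seq, gei_front, gei_side, w=0.5):
        _, (h, _) = self.skeleton_lstm(skel_seq)                      # h: (1, B, hidden)
        skel_logits = self.skeleton_head(h[-1])
        sil_logits = self.silhouette_head(torch.cat([gei_front, gei_side], dim=-1))
        # Decision-level fusion: weighted average of the two branches' class probabilities.
        return w * skel_logits.softmax(-1) + (1 - w) * sil_logits.softmax(-1)

# Toy usage: a batch of 4 walks, 30 skeleton frames each, plus pre-extracted GEI features.
model = FusionDepressionDetector()
probs = model(torch.randn(4, 30, 64), torch.randn(4, 256), torch.randn(4, 256))
print(probs.shape)   # torch.Size([4, 2])
```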

AAAI Conference 2021 Conference Paper

A User-Adaptive Layer Selection Framework for Very Deep Sequential Recommender Models

  • Lei Chen
  • Fajie Yuan
  • Jiaxi Yang
  • Xiang Ao
  • Chengming Li
  • Min Yang

Sequential recommender systems (SRS) have become a research hotspot in recent studies. Because of the need to capture users' dynamic interests, sequential neural network-based recommender models often need to be stacked with more hidden layers (e.g., up to 100 layers) compared with standard collaborative filtering methods. However, the high network latency has become the main obstacle when deploying very deep recommender models into a production environment. In this paper, we argue that the typical prediction framework that treats all users equally during the inference phase is inefficient in running time, as well as sub-optimal in accuracy. To resolve such an issue, we present SkipRec, an adaptive inference framework that learns to skip inactive hidden layers on a per-user basis. Specifically, we devise a policy network to automatically determine which layers should be retained and which layers are allowed to be skipped, so as to achieve user-specific decisions. To derive the optimal skipping policy, we propose using Gumbel-Softmax and reinforcement learning to solve the non-differentiable problem during backpropagation. We perform extensive experiments on three real-world recommendation datasets, and demonstrate that SkipRec attains comparable or better accuracy with much less inference time.
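
A minimal sketch of the per-user layer-skipping idea, assuming a user representation vector and a stack of generic layers, is shown below; the straight-through Gumbel-Softmax gate keeps the discrete keep/skip decision differentiable. The dimensions and the gating form are illustrative, not SkipRec's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerSkipPolicy(nn.Module):
    """Illustrative per-user skip policy: from a user representation, emit a binary
    keep/skip decision for each hidden layer via straight-through Gumbel-Softmax,
    so the discrete choice stays differentiable during backpropagation."""

    def __init__(self, user_dim, num_layers):
        super().__init__()
        self.scorer = nn.Linear(user_dim, num_layers * 2)   # 2 logits (skip, keep) per layer

    def forward(self, user_repr, tau=1.0):
        logits = self.scorer(user_repr).view(-1, 2)          # (num_layers, 2) for one user
        decisions = F.gumbel_softmax(logits, tau=tau, hard=True)
        return decisions[:, 1]                               # 1.0 = keep the layer, 0.0 = skip

# Toy usage: decide which of 16 stacked layers to execute for one user, then apply them.
policy = LayerSkipPolicy(user_dim=64, num_layers=16)
keep = policy(torch.randn(64))
layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(16)])
x = torch.randn(32)
for gate, layer in zip(keep, layers):
    x = gate * layer(x) + (1 - gate) * x                     # skipped layers pass x through
print(keep.tolist())
```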

AAAI Conference 2021 Conference Paper

Exploring Auxiliary Reasoning Tasks for Task-oriented Dialog Systems with Meta Cooperative Learning

  • Bowen Qin
  • Min Yang
  • Lidong Bing
  • Qingshan Jiang
  • Chengming Li
  • Ruifeng Xu

In this paper, we propose a Meta Cooperative Learning (MCL) framework for task-oriented dialog systems (TDSs). Our model consists of an auxiliary KB reasoning task for learning meta KB knowledge, an auxiliary dialogue reasoning task for learning dialogue patterns, and a TDS task (the primary task) that aims not only to retrieve accurate entities from the KB but also to generate natural responses; the three tasks are coordinated via meta learning to achieve collective success in both retrieving accurate KB entities and generating human-like responses. Concretely, the dialog generation model amalgamates complementary meta KB and dialog knowledge from the two novel auxiliary reasoning tasks, which together provide integrated guidance for building a high-quality TDS by adding regularization terms that force the primary network to produce results similar to those of the auxiliary networks. Meanwhile, MCL automatically learns appropriate labels for the two auxiliary reasoning tasks from the primary task, without requiring access to any further data. The key idea behind MCL is to use the performance of the primary task, which is trained alongside the auxiliary tasks in one iteration, to improve the auxiliary labels for the next iteration with meta learning. Experimental results on three benchmark datasets show that MCL generates higher-quality responses than several strong baselines in terms of both automatic and human evaluations. Code to reproduce the results in this paper is available at: https://github.com/siat-nlp/MCL.
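
One plausible reading of the regularization scheme described above is sketched below: the primary task loss is combined with KL terms that pull the primary network's output distribution toward the two auxiliary networks. The KL form and the alpha/beta weights are assumptions for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def mcl_primary_loss(primary_logits, targets, aux_kb_logits, aux_dialog_logits,
                     alpha=0.5, beta=0.5):
    """Illustrative combination of the primary TDS objective with regularization terms
    that pull the primary network's distribution toward the two auxiliary reasoning
    networks. The KL form and the alpha/beta weights are placeholders."""
    task_loss = F.cross_entropy(primary_logits, targets)
    reg_kb = F.kl_div(F.log_softmax(primary_logits, dim=-1),
                      F.softmax(aux_kb_logits.detach(), dim=-1), reduction="batchmean")
    reg_dialog = F.kl_div(F.log_softmax(primary_logits, dim=-1),
                          F.softmax(aux_dialog_logits.detach(), dim=-1), reduction="batchmean")
    return task_loss + alpha * reg_kb + beta * reg_dialog

# Toy usage over a vocabulary of 1000 tokens for a batch of 4 decoding steps.
B, V = 4, 1000
loss = mcl_primary_loss(torch.randn(B, V), torch.randint(0, V, (B,)),
                        torch.randn(B, V), torch.randn(B, V))
print(loss.item())
```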

AAAI Conference 2021 Conference Paper

Imagine, Reason and Write: Visual Storytelling with Graph Knowledge and Relational Reasoning

  • Chunpu Xu
  • Min Yang
  • Chengming Li
  • Ying Shen
  • Xiang Ao
  • Ruifeng Xu

Visual storytelling is the task of generating a short story to describe an ordered image stream. Different from visual captions, stories contain not only factual descriptions, but also imaginary concepts that do not appear in the images. In this paper, we propose a novel imagine-reason-write generation framework (IRW) for visual storytelling, inspired by the logic of humans when they write a story. First, a multimodal imagining module is leveraged to learn the imaginative storyline explicitly, improving the coherence and reasonability of the generated story. Second, we employ a relational reasoning module to fully exploit the external knowledge (commonsense knowledge base) and task-specific knowledge (scene graph and event graph) with a relational reasoning method based on the storyline. In this way, we can effectively capture the most informative commonsense and visual relationships among objects in images, enhancing the diversity and informativeness of the generated story. Finally, we integrate the visual information and semantic (concept) information to generate human-like stories. Extensive experiments on a benchmark dataset (i.e., VIST) demonstrate that the proposed IRW framework substantially outperforms the state-of-the-art methods across multiple evaluation metrics.

AAAI Conference 2020 Conference Paper

Be Relevant, Non-Redundant, and Timely: Deep Reinforcement Learning for Real-Time Event Summarization

  • Min Yang
  • Chengming Li
  • Fei Sun
  • Zhou Zhao
  • Ying Shen
  • Chenglin Wu

Real-time event summarization is an essential task in the natural language processing and information retrieval areas. Despite the progress of previous work, generating relevant, non-redundant, and timely event summaries remains challenging in practice. In this paper, we propose a Deep Reinforcement learning framework for real-time Event Summarization (DRES), which shows promising performance for resolving all three challenges (i.e., relevance, non-redundancy, timeliness) in a unified framework. Specifically, we (i) devise a hierarchical cross-attention network with intra- and inter-document attentions to integrate important semantic features within and between the query and the input document for better text matching. In addition, relevance prediction is leveraged as an auxiliary task to strengthen the document modeling and help extract relevant documents; (ii) propose a multi-topic dynamic memory network to capture the sequential patterns of different topics belonging to the event of interest and temporally memorize the input facts from the evolving document stream, avoiding extracting redundant information at each time step; (iii) consider both historical dependencies and future uncertainty of the document stream for generating relevant and timely summaries by exploiting the reinforcement learning technique. Experimental results on two real-world datasets have demonstrated the advantages of the DRES model, with significant improvements in generating relevant, non-redundant, and timely event summaries against state-of-the-art methods.
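
As a rough illustration of component (i), the sketch below scores query-document relevance with two cross-attention passes (query over document and document over query) followed by pooling and a linear relevance head. Dimensions, pooling, and the module layout are illustrative assumptions rather than the DRES architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionMatcher(nn.Module):
    """Illustrative query-document matcher in the spirit of the cross-attention component:
    the query attends over the document and the document attends over the query, and the
    pooled representations feed a relevance head. Sizes and pooling are placeholders."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.q_over_d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.d_over_q = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.relevance = nn.Linear(2 * dim, 1)

    def forward(self, query_tokens, doc_tokens):
        q_ctx, _ = self.q_over_d(query_tokens, doc_tokens, doc_tokens)   # query attends to doc
        d_ctx, _ = self.d_over_q(doc_tokens, query_tokens, query_tokens) # doc attends to query
        pooled = torch.cat([q_ctx.mean(dim=1), d_ctx.mean(dim=1)], dim=-1)
        return self.relevance(pooled).squeeze(-1)                        # one score per pair

# Toy usage: a batch of 2 (query, document) pairs with 10 and 50 token embeddings each.
matcher = CrossAttentionMatcher()
scores = matcher(torch.randn(2, 10, 128), torch.randn(2, 50, 128))
print(scores.shape)   # torch.Size([2])
```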