Arrow Research

Author name cluster

Yuyan Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

AAAI 2026 Conference Paper

MMIFEvol: Towards Evolutionary Multimodal Instruction Following

  • Haoyu Wang
  • Sihang Jiang
  • Xiangru Zhu
  • Yuyan Chen
  • Xiaojun Meng
  • Jiansheng Wei
  • Yitong Wang
  • Yanghua Xiao

Multimodal Instruction Following is a fundamental capability of multimodal language models, requiring accurate comprehension and execution of user-provided instructions. However, existing multimodal instruction-following datasets and benchmarks share three shortcomings: (a) Lack of Difficulty Stratification: they collect diverse instruction categories but neglect to stratify difficulty levels across those categories, leading to overlap, bias, and low interpretability; (b) Lack of Fine-Grained Metrics: they conflate a model's ability to "solve tasks" with its ability to "follow constraints" in a single metric, which fails to accurately reflect its instruction-following capability; (c) Lack of Multi-Task Instructions: they overlook the fact that real-world user instructions often combine multiple tasks. This paper proposes MMIFEvol, a framework for multimodal instruction evolving and benchmarking. First, we define the essential components of a carefully curated multimodal instruction set and establish corresponding difficulty levels, based on which we synthesize diverse instruction data. Next, we decouple the evaluation criteria for instruction following into three distinct metrics to construct a high-quality benchmark and assess existing models. Experimental results demonstrate that current models still struggle to follow complex instructions, while fine-tuning on MMIFEvol data effectively improves models' responsiveness to multimodal instructions.
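
The three metrics are not spelled out in this abstract, but the decoupling of "solve tasks" from "follow constraints" can be illustrated with a minimal scorer. A hedged sketch follows; the `Constraint` and `score_response` names and the example constraints are hypothetical, not from the paper.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    """A single instruction constraint with a programmatic check (illustrative)."""
    name: str
    check: Callable[[str], bool]

def score_response(response: str, task_correct: bool,
                   constraints: list[Constraint]) -> dict:
    """Report task solving and constraint following as separate metrics,
    instead of collapsing them into one number."""
    followed = sum(1 for c in constraints if c.check(response))
    rate = followed / len(constraints) if constraints else 1.0
    return {
        "task_solved": float(task_correct),   # did the response solve the task?
        "constraint_rate": rate,              # soft: fraction of constraints obeyed
        "strict_follow": float(rate == 1.0),  # hard: every constraint obeyed
    }

# Example: a captioning task with a length constraint and a keyword constraint.
constraints = [
    Constraint("under_20_words", lambda r: len(r.split()) < 20),
    Constraint("mentions_dog", lambda r: "dog" in r.lower()),
]
print(score_response("A dog runs on the beach.", task_correct=True,
                     constraints=constraints))
```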

AAAI 2026 Conference Paper

S³-MSD: Large Vision-Language Model for Explainable and Generalizable Multi-modal Sarcasm Detection

  • Zhihong Zhu
  • Fan Zhang
  • Yunyan Zhang
  • Jinghan Sun
  • Guimin Hu
  • Hao Wu
  • Yuyan Chen
  • Bowen Xing

Multimodal sarcasm detection (MSD) aims to identify sarcasm polarity from diverse modalities (i.e., image–text pairs), a task that has received increasing attention. While significant progress has been made, existing approaches still face two major issues: lack of explainability and weak generalizability. In this paper, we introduce a new large vision–language model (LVLM) dubbed S³-MSD for explainable and generalizable MSD through three key components. For explainability, we develop (1) a self-training paradigm that automatically bootstraps answers with explanations, and (2) a self-calibrating mechanism that rectifies flawed explanations. For generalizability, we design (3) a self-focusing module that amplifies visual semantic entities through preference optimization, thereby mitigating textual over-reliance. Experimental results on both in-distribution and out-of-distribution (OOD) benchmarks demonstrate that S³-MSD consistently outperforms state-of-the-art methods in detection performance. Furthermore, the proposed S³-MSD provides persuasive explanations, as verified by both quantitative metrics and human evaluations.
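
The abstract attributes the self-focusing module to preference optimization without naming the objective. One common choice for preference optimization is a DPO-style loss; the sketch below assumes that formulation, and every name and number in it is illustrative rather than taken from S³-MSD.

```python
import math

def dpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO-style preference loss over a preferred (e.g., visually grounded)
    and a dispreferred (e.g., text-shortcut) response. Inputs are sequence
    log-probabilities under the trained policy and a frozen reference model.
    Assumed objective: the abstract does not specify S^3-MSD's actual loss."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Illustrative log-probabilities only.
print(dpo_style_loss(logp_w=-12.0, logp_l=-15.0,
                     ref_logp_w=-13.0, ref_logp_l=-13.5))
```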

AAAI 2025 Conference Paper

Attributive Reasoning for Hallucination Diagnosis of Large Language Models

  • Yuyan Chen
  • Zehao Li
  • Shuangjie You
  • Zhengyu Chen
  • Jingwen Chang
  • Yi Zhang
  • Weinan Dai
  • Qingpei Guo

In recent years, large language models (LLMs) have demonstrated outstanding capabilities on various tasks. However, LLMs also have notable drawbacks, especially hallucination: the generation of content that does not align with the user input, or that contradicts previously generated content or world knowledge. Current research on hallucination mainly covers knowledge retrieval, prompt engineering, training data improvement, reinforcement learning, and related methods. However, these methods neither distinguish among different categories of hallucination, which is important for hallucination analysis, nor examine the internal states of LLMs, which indicate where hallucinations arise. We therefore introduce an attribution framework that traces the origins of hallucinations from the internal signals of LLMs. To support this framework, we develop a new benchmark named RelQA-Cate, which covers eight categories of hallucination in LLM-generated answers. We then present a novel Differential Penalty Decoding (DPD) strategy that reduces hallucinations by adjusting the post-probabilities of each candidate answer. Across a series of experiments, answer reliability improves significantly, by up to 28.25%, demonstrating the effectiveness of our proposed DPD and its generalization in mitigating hallucination in LLMs.
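
The abstract describes DPD only as adjusting the post-probabilities of candidate answers. Below is a minimal sketch of that idea, assuming a per-answer hallucination-risk score derived from internal signals; the actual DPD formulation is not given here, and all values are illustrative.

```python
import math

def penalty_adjusted_answers(candidates, alpha=1.0):
    """Hypothetical penalty-based re-scoring: subtract a hallucination-risk
    penalty (assumed to come from the model's internal signals) from each
    answer's log-probability, then renormalize into post-probabilities."""
    adjusted = [(ans, logp - alpha * risk) for ans, logp, risk in candidates]
    m = max(score for _, score in adjusted)                # for numerical stability
    z = sum(math.exp(score - m) for _, score in adjusted)  # softmax partition
    return sorted(((ans, math.exp(score - m) / z) for ans, score in adjusted),
                  key=lambda pair: pair[1], reverse=True)

# (answer, log-probability, risk score in [0, 1]) -- illustrative values only
cands = [("Paris", -0.2, 0.05), ("Lyon", -0.4, 0.60)]
print(penalty_adjusted_answers(cands))  # high-risk answer is demoted
```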

AAAI 2025 Conference Paper

Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations

  • Yi Zhang
  • Chun-Wun Cheng
  • Junyi He
  • Zhihai He
  • Carola-Bibiane Schönlieb
  • Yuyan Chen
  • Angelica I Aviles-Rivero

We introduce SONO, a novel method leveraging Second-Order Neural Ordinary Differential Equations (Second-Order NODEs) to enhance cross-modal few-shot learning. By employing a simple yet effective architecture consisting of a Second-Order NODEs model paired with a cross-modal classifier, SONO addresses the significant challenge of overfitting, which is common in few-shot scenarios due to limited training examples. Our second-order approach can approximate a broader class of functions, enhancing the model's expressive power and feature generalization capabilities. We initialize our cross-modal classifier with text embeddings derived from class-relevant prompts, streamlining training efficiency by avoiding the need for frequent text encoder processing. Additionally, we utilize text-based image augmentation, exploiting CLIP’s robust image-text correlation to enrich training data significantly. Extensive experiments across multiple datasets demonstrate that SONO outperforms existing state-of-the-art methods in few-shot learning performance.
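
The standard reduction behind second-order neural ODEs rewrites y'' = f(y, y', t) as the coupled first-order system y' = v, v' = f(y, v, t). The sketch below integrates that system with explicit Euler steps and a toy f; in SONO, f would be a learned network, and the paper's actual solver and architecture are not specified in this abstract.

```python
def f(y, v, t):
    # Toy damped-oscillator dynamics standing in for a learned network.
    return -y - 0.1 * v

def integrate_second_order(y0, v0, t0, t1, steps=1000):
    """Explicit-Euler integration of the reduced system y' = v, v' = f(y, v, t)."""
    y, v, t = y0, v0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        y, v = y + h * v, v + h * f(y, v, t)
        t += h
    return y, v

print(integrate_second_order(y0=1.0, v0=0.0, t0=0.0, t1=5.0))
```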

NeurIPS 2025 Conference Paper

Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring

  • Yuyan Chen
  • Nico Lang
  • B. Schmidt
  • Aditya Jain
  • Yves Basset
  • Sara Beery
  • Maxim Larrivee
  • David Rolnick

Global biodiversity is declining at an unprecedented rate, yet little information is known about most species and how their populations are changing. Indeed, some 90% of Earth's species are estimated to be completely unknown. Machine learning has recently emerged as a promising tool to facilitate long-term, large-scale biodiversity monitoring, including algorithms for fine-grained classification of species from images. However, such algorithms typically are not designed to detect examples from categories unseen during training – the problem of open-set recognition (OSR) – limiting their applicability for highly diverse, poorly studied taxa such as insects. To address this gap, we introduce Open-Insect, a large-scale, fine-grained dataset to evaluate unknown species detection across different geographic regions with varying difficulty. We benchmark 38 OSR algorithms across three categories: post-hoc, training-time regularization, and training with auxiliary data, finding that simple post-hoc approaches remain a strong baseline. We also demonstrate how to leverage auxiliary data to improve species discovery in regions with limited data. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.
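
One canonical post-hoc approach of the kind the benchmark reports as a strong baseline is maximum softmax probability (MSP) thresholding: flag an image as a possible novel species when the classifier's top softmax probability is low. A minimal sketch with illustrative logits and threshold (the paper's 38 methods and exact setup are not reproduced here):

```python
import math

def msp_flags_novel(logits, threshold=0.5):
    """Return True when the maximum softmax probability falls below the
    threshold, i.e., the example may belong to an unseen category."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(probs) < threshold

print(msp_flags_novel([4.1, 0.3, -1.2]))  # False: confident -> known class
print(msp_flags_novel([0.4, 0.3, 0.2]))   # True: uncertain -> possible novel species
```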

AAAI 2024 Conference Paper

Talk Funny! A Large-Scale Humor Response Dataset with Chain-of-Humor Interpretation

  • Yuyan Chen
  • Yichen Yuan
  • Panjun Liu
  • Dayiheng Liu
  • Qinghao Guan
  • Mengfei Guo
  • Haiming Peng
  • Bang Liu

Humor is a crucial part of human communication. Understanding humor and generating humorous responses in dialogue can make human-computer interaction more natural and empathetic. However, most existing pre-trained language models (PLMs) perform unsatisfactorily at humor generation. On the one hand, the serious shortage of humor corpora and datasets poses challenges for constructing models that can understand and generate humorous expressions. On the other hand, humor generation relies on rich knowledge and commonsense, which are often tacit and unspoken. In this paper, we construct the largest Chinese Explainable Humor Response Dataset to date, with chain-of-humor and humor mind map annotations, which can be used to comprehensively evaluate as well as improve the humorous response ability of PLMs. We further design humor-related auxiliary tasks to enhance PLMs' humorous response performance. Extensive evaluations demonstrate that our proposed dataset and auxiliary tasks effectively help PLMs generate humorous responses, laying the groundwork for future humor research.