Arrow Research search

Author name cluster

Pengyue Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers · 1 author row

Possible papers (2)

AAAI 2026 · Conference Paper

Diffusion-Assisted Progressive Learning for Weakly Supervised Phrase Localization

  • Pengyue Lin
  • Yanyang Hu
  • Xinjing Liu
  • Wenqi Jia
  • Fangxiang Feng
  • Ruifan Li

Weakly supervised phrase localization (WSPL) aims to localize the visual objects mentioned by given phrases without human-annotated bounding boxes. Previous works struggle in multi-object scenarios, where background objects often appear alongside the target objects. To address this, we propose a Diffusion-Assisted PrOgressive learning framework (DAPO) for the WSPL task. Specifically, we score the difficulty of training samples based on the number of objects and the level of semantic alignment. These samples are then introduced progressively during training, in order of their difficulty scores. To address the sample imbalance problem, we propose a Generation-Assisted Tuning (GAT) method for the grounding network. First, to enrich samples from few-object scenarios, we leverage Stable Diffusion (SD) to generate images from phrases. Second, we introduce an attention-driven scheme to direct SD's attention toward the mentioned objects. Finally, we design a diffusion-guided loss that helps the grounding network learn the objects' layouts. Extensive experiments show that our DAPO framework outperforms strong baselines on benchmark datasets.
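The progressive schedule in the abstract is an instance of curriculum learning: score each sample's difficulty, then widen the training pool from easy to hard. A minimal sketch of that idea, with a hypothetical `difficulty` function standing in for the paper's actual scoring (which combines object count and phrase-image semantic alignment; the paper's model and loss are not reproduced here):

```python
# Curriculum-learning sketch: rank samples by a difficulty score and
# expose a progressively larger (harder) pool at each epoch.
# Sample fields and the scoring formula are illustrative placeholders.

def difficulty(sample):
    # Placeholder score: more objects and weaker phrase-image
    # alignment both make a sample harder.
    return sample["num_objects"] + (1.0 - sample["alignment"])

def curriculum_pools(samples, num_epochs):
    ranked = sorted(samples, key=difficulty)  # easiest first
    for epoch in range(1, num_epochs + 1):
        # Grow the usable pool linearly across epochs.
        cutoff = max(1, len(ranked) * epoch // num_epochs)
        yield ranked[:cutoff]

samples = [
    {"num_objects": 5, "alignment": 0.2},  # hard: many objects, weak alignment
    {"num_objects": 1, "alignment": 0.9},  # easy: one object, strong alignment
    {"num_objects": 3, "alignment": 0.5},  # medium
]

for epoch, pool in enumerate(curriculum_pools(samples, 3), start=1):
    print(f"epoch {epoch}: training on {len(pool)} sample(s)")
```

At epoch 1 only the easiest sample is used; by the final epoch the full set is in play. The actual DAPO training loop, batching, and loss terms are beyond this sketch.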

AAAI 2026 · Conference Paper

OX-MABSR: A Benchmark for Open-domain Explainable Multimodal Aspect-Based Sentiment Reasoning

  • Xinjing Liu
  • ZiXin Xue
  • Pengyue Lin
  • Xinyu Tu
  • Siwei Xu
  • Ruifan Li

Multimodal Aspect-Based Sentiment Analysis (MABSA) involves extracting aspect terms from text-image pairs and identifying their sentiments. Most existing tasks assume a fixed set of sentiment categories and explicitly mentioned aspects; they seldom consider expressive sentiment categories, implicit aspects, or explainability. To address this, we introduce a novel task of Open-domain Explainable Multimodal Aspect-Based Sentiment Reasoning (OX-MABSR). This task requires predicting open-vocabulary aspect-sentiment pairs, together with generating sentiment explanations and reasoning paths. To benchmark the OX-MABSR task, we construct OX-MABSR-Bench, a dataset annotated with explicit and implicit aspects, expressive sentiment categories, and two-level (perceptual and cognitive) explanations. The explanations capture visual and textual cues, including aesthetics, facial expressions, scenes, and textual semantics, together with background and situational knowledge. In addition, we annotate reasoning paths that trace how sentiment evolves from surface cues to a deeper contextual understanding. To address the task, we propose MABSR-LLM. Extensive experimental results show that MABSR-LLM outperforms strong baselines. To the best of our knowledge, ours is the first unified framework for open-domain, explainable MABSR.
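The annotation scheme described above (explicit/implicit aspects, open-vocabulary sentiment labels, two-level explanations, and reasoning paths) can be pictured as a per-aspect record. The sketch below is a hypothetical layout; the field names are illustrative and do not claim to match the actual OX-MABSR-Bench schema:

```python
from dataclasses import dataclass, field

# Hypothetical per-aspect record for a benchmark like OX-MABSR-Bench.
# All field names and the example values are illustrative, not the
# dataset's real schema.
@dataclass
class AspectAnnotation:
    aspect: str                   # open-vocabulary aspect term
    implicit: bool                # True if the aspect is not stated in the text
    sentiment: str                # expressive category, not a fixed pos/neg/neu set
    perceptual_explanation: str   # surface cues: aesthetics, faces, scenes, text
    cognitive_explanation: str    # background and situational knowledge
    reasoning_path: list[str] = field(default_factory=list)  # cue-to-context steps

example = AspectAnnotation(
    aspect="sunset",
    implicit=True,
    sentiment="awe",
    perceptual_explanation="warm colors dominate the image",
    cognitive_explanation="sunsets are commonly associated with calm endings",
    reasoning_path=["color cues", "scene type", "situational association"],
)
```

The `reasoning_path` list mirrors the abstract's idea of tracing sentiment from surface cues toward contextual understanding; a real dataset entry would also carry the source text-image pair.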