Author name cluster

Yi Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

125 papers

2 author rows

AAAI Conference 2026 Conference Paper

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

Guanghao Zhang
Tao Zhong
Yan Xia
Mushui Liu
Zhelun Yu
Haoyuan Li
Wanggui He
Dong She

While previous multimodal slow-thinking methods have demonstrated remarkable success in single-image understanding scenarios, their effectiveness becomes fundamentally constrained when extended to more complex multi-image comprehension tasks. This limitation stems from their predominant reliance on text-based intermediate reasoning processes. Humans, by contrast, typically perform two complementary cognitive operations when engaging in sophisticated multi-image analysis: (1) continuous cross-image visual comparison through region-of-interest matching, and (2) dynamic memorization of critical visual concepts throughout the reasoning chain. Motivated by these observations, we propose the Complex Multi-Modal Chain-of-Thought (CMMCoT) framework, a multi-step reasoning framework that mimics human-like "slow thinking" for multi-image understanding. Our approach incorporates two key innovations: (1) The construction of interleaved multimodal multi-step reasoning chains, which utilize critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals. This mechanism not only facilitates comprehensive cross-modal understanding but also enhances model interpretability. (2) The introduction of a test-time memory augmentation module that expands the model’s reasoning capacity during inference while preserving parameter efficiency. Furthermore, to facilitate research in this direction, we have curated a novel multi-image slow-thinking dataset. Extensive experiments demonstrate the effectiveness of our model.

AAAI Conference 2026 Conference Paper

Diffusion Reconstruction-based Data Likelihood Estimation for Core-Set Selection

Mingyang Chen
Jiawei Du
Bo Huang
Yi Wang
Xiaobo Zhang
Wei Wang

Existing core-set selection methods predominantly rely on heuristic scoring signals such as training dynamics or model uncertainty, lacking explicit modeling of data likelihood. This omission may hinder the constructed subset from capturing subtle yet critical distributional structures that underpin effective model training. In this work, we propose a novel, theoretically grounded approach that leverages diffusion models to estimate data likelihood via reconstruction deviation induced by partial reverse denoising. Specifically, we establish a formal connection between reconstruction error and data likelihood, grounded in the Evidence Lower Bound (ELBO) of Markovian diffusion processes, thereby enabling a principled, distribution-aware scoring criterion for data selection. Complementarily, we introduce an efficient information-theoretic method to identify the optimal reconstruction timestep, ensuring that the deviation provides a reliable signal indicative of underlying data likelihood. Extensive experiments on ImageNet demonstrate that reconstruction deviation offers an effective scoring criterion, consistently outperforming existing baselines across selection ratios, and closely matching full-data training using only 50% of the data. Further analysis shows that the likelihood-informed nature of our score reveals informative insights in data selection, shedding light on the interplay between data distributional characteristics and model learning preferences.
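
The scoring idea in this abstract (noise a sample to an intermediate timestep, denoise it, and treat the reconstruction deviation as a likelihood signal) can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's implementation: the data are 1-D Gaussian, a closed-form posterior-mean denoiser stands in for a trained diffusion model, `alpha_bar` is fixed by hand rather than chosen by the information-theoretic timestep selection the abstract mentions, and the function names are hypothetical.

```python
import math
import random


def reconstruction_deviation(x0, alpha_bar, mu, sigma, n_draws=256, seed=0):
    """Mean squared deviation between x0 and its one-step reconstruction.

    Assumption: data follow N(mu, sigma^2), so the ideal denoiser is the
    closed-form posterior mean E[x0 | x_t]. A real pipeline would call a
    trained diffusion model here instead.
    """
    rng = random.Random(seed)
    signal = math.sqrt(alpha_bar)        # scale on the clean sample
    noise = math.sqrt(1.0 - alpha_bar)   # scale on the injected noise
    gain = signal * sigma ** 2 / (alpha_bar * sigma ** 2 + (1.0 - alpha_bar))
    total = 0.0
    for _ in range(n_draws):
        # Forward diffusion to the chosen timestep.
        x_t = signal * x0 + noise * rng.gauss(0.0, 1.0)
        # Toy denoiser: posterior mean of x0 given x_t under the Gaussian prior.
        x0_hat = mu + gain * (x_t - signal * mu)
        total += (x0 - x0_hat) ** 2
    return total / n_draws


def rank_by_deviation(samples, alpha_bar, mu, sigma):
    """Return sample indices ordered from lowest to highest deviation."""
    scores = [reconstruction_deviation(x, alpha_bar, mu, sigma) for x in samples]
    return sorted(range(len(samples)), key=lambda i: scores[i])


if __name__ == "__main__":
    samples = [0.0, 3.0, -0.5]
    for x in samples:
        print(x, reconstruction_deviation(x, 0.5, 0.0, 1.0))
    print(rank_by_deviation(samples, 0.5, 0.0, 1.0))
```

In this toy setting, samples far from the mode reconstruct poorly, so a larger deviation corresponds to a lower likelihood under the assumed Gaussian. How the score is turned into a selected subset is not stated in the abstract, so the ranking helper only orders the samples.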

AAAI Conference 2026 Conference Paper

FreeMem: Enhancing Consistency in Long Video Generation via Tuning-Free Memory

Jibin Peng
Di Lin
Zhecheng Xu
Haoran Lu
Ruonan Liu
Wuyuan Xie
Miaohui Wang
Lingyu Liang

Text-to-Video (T2V) generation has advanced greatly, yet maintaining consistency remains challenging, especially for tuning-free long video generation. We attribute the consistency problem to cumulative deviations in long video generation at three levels: uncorrelated random noise results in an initial deviation between frames; discrepancies in semantic feature tokens between denoising network blocks gradually accumulate as the frame count grows, leading to greater deviations; and attention mechanisms struggle to capture global relationships across distant frames in long videos. To address these, we propose FreeMem, a tuning-free framework leveraging hierarchical memory update and injection: the noise memory stabilizes consistency by manipulating low- and high-frequency components in the initial noise space; the token memory combats inconsistency through adaptive fusion of historical and current semantic feature tokens between denoising network blocks; and the attention memory establishes a persistent cache to model long-range relationships within self-attention layers. Evaluated on VBench, FreeMem improves subject and background consistency metrics across various methods, offering a practical solution for low-cost, high-consistency long video generation.
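
The token-memory idea in this abstract (fusing historical and current feature tokens so that per-frame drift does not accumulate) can be pictured with a minimal sketch. The abstract does not give the fusion rule, so everything below is an assumption: the similarity-gated weight, the moving-average memory update, and the names `fuse_with_memory`, `run_token_memory`, `max_weight`, and `momentum` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fuse_with_memory(memory, current, max_weight=0.5):
    """Blend a current token with the memory token.

    Assumed rule: the memory weight grows with cosine similarity, so tokens
    that agree with history are pulled toward it, while dissimilar tokens
    (for example, genuinely new content) pass through unchanged.
    """
    weight = max_weight * max(0.0, cosine(memory, current))
    return [weight * m + (1.0 - weight) * c for m, c in zip(memory, current)]


def run_token_memory(frames, max_weight=0.5, momentum=0.9):
    """Fuse each frame's token with a running memory, then update the memory."""
    memory = list(frames[0])
    fused_frames = [list(frames[0])]
    for current in frames[1:]:
        fused = fuse_with_memory(memory, current, max_weight)
        fused_frames.append(fused)
        # Moving-average update keeps the memory stable over long sequences.
        memory = [momentum * m + (1.0 - momentum) * f for m, f in zip(memory, fused)]
    return fused_frames


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
    for fused in run_token_memory(frames):
        print(fused)
```

Gating on similarity is one design choice among many; a fixed blend weight would be simpler but would also smear unrelated content into every frame.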

EAAI Journal 2026 Journal Article

Granular-ball based robust representation learning for social recommendation

Xiaofei Zhu
Shiyan Wu
Li Liu
Shuyin Xia
Yi Wang
Guoyin Wang

Social recommendation systems seek to leverage social relationships to mitigate data sparsity and cold-start issues by augmenting user–item interactions. However, existing methods encounter two critical limitations: (1) they predominantly model user–item interactions at the fine-grained level of individual user/item nodes, neglecting potential coarse-grained collaborative patterns; and (2) they usually suppress noisy edges in social graphs from a single granular perspective, failing to adjust the denoising granularity according to the actual strength of relationships between users. To address these challenges, we propose GBRSR, a novel Granular-ball based Robust Representation Learning framework. Inspired by the “Global-first” cognitive principle, Granular-ball Computing (GBC), which represents data as granular-ball units with geometric significance, has garnered significant attention due to its outstanding performance in many fields. We leverage GBC theory for representation distillation, transferring coarse-grained knowledge to enhance fine-grained node-level representations. In addition, we employ a granular-ball based structure denoising strategy to prune noisy user relationships, while simultaneously alleviating noise in user representations through a diffusion process. Extensive experiments on three real-world benchmark datasets validate the superiority of GBRSR in recommendation accuracy and robustness, particularly under noisy and sparse conditions.