Author name cluster

Chao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

79 papers

2 author rows

AAAI Conference 2026 Conference Paper

Accelerating LLM Inference Throughput via Asynchronous KV Cache Prefetching

Yanhao Dong
Yubo Miao
Weinan Li
Xiao Zheng
Chao Wang
Jiesheng Wu
Feng Lyu

Large Language Models (LLMs) exhibit pronounced memory-bound characteristics during inference due to High Bandwidth Memory (HBM) bandwidth constraints. In this paper, we propose an L2 Cache-oriented asynchronous KV Cache prefetching method to break through the memory bandwidth bottleneck in LLM inference through computation-load overlap. By strategically scheduling idle memory bandwidth during active computation windows, our method proactively prefetches required KV Cache into GPU L2 cache, enabling high-speed L2 cache hits for subsequent accesses and effectively hiding HBM access latency within computational cycles. Extensive experiments on NVIDIA H20 GPUs demonstrate that the proposed method achieves 2.15× improvement in attention kernel efficiency and up to 1.97× end-to-end throughput enhancement, surpassing state-of-the-art baseline FlashAttention-3. Notably, our solution maintains orthogonality to existing optimization techniques and can be integrated with current inference frameworks, providing a scalable latency-hiding solution for next-generation LLM inference engines.

AAAI Conference 2026 Conference Paper

AEDR: Training-Free AI-Generated Image Attribution via Autoencoder Double-Reconstruction

Chao Wang
Zijin Yang
Yaofei Wang
Weiming Zhang
Kejiang Chen

The rapid advancement of image-generation technologies has made it possible for anyone to create photorealistic images using generative models, raising significant security concerns. To mitigate malicious use, tracing the origin of such images is essential. Reconstruction-based attribution methods offer a promising solution, but they often suffer from reduced accuracy and high computational costs when applied to state‑of‑the‑art (SOTA) models. To address these challenges, we propose AEDR (AutoEncoder Double-Reconstruction), a novel training‑free attribution method designed for generative models with continuous autoencoders. Unlike existing reconstruction‑based approaches that rely on the value of a single reconstruction loss, AEDR performs two consecutive reconstructions using the model’s autoencoder, and adopts the ratio of these two reconstruction losses as the attribution signal. This signal is further calibrated using the image homogeneity metric to improve accuracy, which inherently cancels out absolute biases caused by image complexity, with autoencoder‑based reconstruction ensuring superior computational efficiency. Experiments on eight top latent diffusion models show that AEDR achieves 25.5% higher attribution accuracy than existing reconstruction‑based methods, while requiring only 1% of the computational time.
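
The double-reconstruction signal described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of mean squared error as the reconstruction loss, the `eps` guard, and the moving-average `toy_reconstruct` stand-in for an autoencoder are all assumptions made for illustration. The homogeneity calibration step is omitted because the abstract does not specify it.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]  # assumption: an image flattened to a sequence of floats


def mse(a: Image, b: Image) -> float:
    """Mean squared error between two equally sized images."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def double_reconstruction_ratio(
    image: Image,
    reconstruct: Callable[[Image], List[float]],
    eps: float = 1e-12,
) -> float:
    """Ratio of the first reconstruction loss to the second.

    `reconstruct` stands for one encode-decode pass through a model's
    autoencoder (hypothetical interface). `eps` only guards against
    division by zero.
    """
    first = reconstruct(image)    # first pass: image -> first reconstruction
    second = reconstruct(first)   # second pass: reconstruct the reconstruction
    loss_first = mse(image, first)
    loss_second = mse(first, second)
    return loss_first / (loss_second + eps)


def toy_reconstruct(image: Image) -> List[float]:
    """Hypothetical stand-in for an autoencoder: 3-tap moving average."""
    n = len(image)
    return [
        (image[max(i - 1, 0)] + image[i] + image[min(i + 1, n - 1)]) / 3.0
        for i in range(n)
    ]


if __name__ == "__main__":
    print(double_reconstruction_ratio([0.0, 1.0, 0.0, 1.0], toy_reconstruct))
```

In a real setting, `reconstruct` would wrap the candidate model's own autoencoder, and the ratio would be computed once per candidate model before any calibration or thresholding.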

AAAI Conference 2026 Conference Paper

CloserToMe: A Unified Framework for Accurate and Transferable Latency Prediction Across Heterogeneous Devices

Cheng Tang
Guochong Sui
Wenqi Lou
Zihan Wang
Jiayi Tuo
Wenqian Xie
Yinkang Gao
Yixuan Zhu

Hardware accelerators such as GPUs, NPUs, and FPGAs are essential to meeting AI’s computational demands. With the proliferation of heterogeneous devices across cloud and edge, various model optimization techniques adapt to diverse hardware characteristics through operator transformations and structural modifications. Accurate, efficient latency prediction enables rapid selection of optimal strategies across hardware backends. Many existing methods treat hardware as a black-box executor, directly regressing latency without explicitly modeling the intricate interactions between neural network (NN) structures and device-specific execution behaviors. To address this limitation, we introduce a new modeling perspective that captures the interaction between neural architectures and hardware execution. To capture device-specific characteristics, we propose two complementary modeling strategies. The Device Behavior Signature Selector (DBSel) characterizes hardware execution behavior by selectively probing a small set of representative architectures, forming a compact, workload-driven profile. In parallel, we construct capability vectors that capture each device's memory hierarchy and compute characteristics, providing a structured abstraction of its architectural capacity. To unify both behavioral and structural views, we introduce the Hardware–Operation Dialogue Module (HODM), which models fine-grained interactions between neural operators and hardware properties. Together, these components empower CloserToMe to deliver accurate and transferable latency predictions across unseen and diverse platforms.

AAAI Conference 2026 Conference Paper

Enhancing Conversational Recommender Systems with Tree-Structured Knowledge and Pretrained Language Models

Yongwen Ren
Chao Wang
Peng Du
Chuan Qin
Dazhong Shen
Hui Xiong

Recent advances in pretrained language models (PLMs) have significantly improved conversational recommender systems (CRS), enabling more fluent and context-aware interactions. To further enhance accuracy and mitigate hallucination, many methods integrate PLMs with knowledge graphs (KGs), but face key challenges: failing to fully exploit PLM reasoning over graph relationships, indiscriminately incorporating retrieved knowledge without context filtering, and neglecting collaborative preferences in multi-turn dialogues. To this end, we propose PCRS-TKA, a prompt-based framework employing retrieval-augmented generation to integrate PLMs with KGs. PCRS-TKA constructs dialogue-specific knowledge trees from KGs and serializes them into texts, enabling structure-aware reasoning while capturing rich entity semantics. Our approach selectively filters context-relevant knowledge and explicitly models collaborative preferences using specialized supervision signals. A semantic alignment module harmonizes heterogeneous inputs, reducing noise and enhancing accuracy. Extensive experiments demonstrate that PCRS-TKA consistently outperforms all baselines in both recommendation and conversational quality.
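
The idea of building a dialogue-specific knowledge tree from a KG and serializing it into text can be sketched as follows. This is an illustrative assumption, not the PCRS-TKA implementation: the triple format, the depth limit, the nested-dictionary tree, the bracketed output format, and the entity names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # assumption: (head, relation, tail)


def build_tree(triples: List[Triple], root: str, max_depth: int = 2) -> Dict:
    """Expand a tree from `root` (e.g. an entity mentioned in the dialogue)."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))

    def expand(entity: str, depth: int, seen: Set[str]) -> Dict:
        if depth == max_depth:
            return {"entity": entity, "children": []}
        children = []
        for relation, tail in adjacency.get(entity, []):
            if tail in seen:  # skip entities already on this path (no cycles)
                continue
            children.append((relation, expand(tail, depth + 1, seen | {tail})))
        return {"entity": entity, "children": children}

    return expand(root, 0, {root})


def serialize(tree: Dict) -> str:
    """Flatten the tree into a single text string, depth-first."""
    if not tree["children"]:
        return tree["entity"]
    parts = [f"{rel}: {serialize(child)}" for rel, child in tree["children"]]
    return f"{tree['entity']} ({'; '.join(parts)})"


if __name__ == "__main__":
    # Hypothetical toy knowledge graph.
    kg = [
        ("MovieA", "directed_by", "DirectorB"),
        ("MovieA", "genre", "GenreC"),
        ("DirectorB", "directed", "MovieA"),
    ]
    print(serialize(build_tree(kg, "MovieA")))
```

The serialized string could then be placed in a prompt alongside the dialogue context. How the framework filters context-relevant knowledge before serialization is not specified in the abstract and is left out here.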

EAAI Journal 2026 Journal Article

GMFIMamba: Remote sensing change detection based on group Mamba feature interaction

Wenliang Xu
Suting Chen
Feilong Bi
Chao Wang
Xiao Shu

With the advancement of satellite technology, high-resolution remote sensing images have been widely used in the field of change detection. Building Change Detection (BCD) and Building Damage Assessment (BDA) are both sub-tasks of change detection. BCD aims to detect structural changes in buildings over time, whereas BDA focuses on assessing the level of building damage after a disaster. BCD is of great value for urban planning, while BDA plays a crucial role in post-disaster rescue efforts. To address these tasks, we propose a change detection method based on Mamba, named GMFIMamba. Specifically, we design a Convolution–Visual State Space (Conv-VSS) block, which combines the local feature extraction capability of Convolutional Neural Networks (CNNs) with the global feature modeling ability of Mamba. By integrating local and global features, our approach improves the accuracy of change region detection. To tackle the issue of insufficient feature extraction for small-scale buildings in existing models, we introduce the Multi-branch Dilated Convolution Feature Enhancement Module (MCFEM). In addition, we design the Grouped Mamba-Based Bitemporal Features Interaction Module (GMBFIM) to facilitate effective interaction between bitemporal images, leading to more accurate change feature extraction. Experiments on three public datasets demonstrate that the proposed method achieves superior performance in both BCD and BDA tasks, proving its effectiveness.