Arrow Research

Author name cluster

Yu Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

63 papers
2 author rows

Possible papers (63)

JBHI Journal 2026 Journal Article

Federated Spatial Prior-Based Source-Free Domain Adaptation for White Matter Hyperintensities Segmentation

  • Yu Cheng
  • Yuxiang Dai
  • Rencheng Zheng
  • Beini Fei
  • Hui Zhang
  • Xinran Wu
  • Boyu Zhang
  • Haoran Peng

White matter hyperintensities (WMH) are important imaging biomarkers for cerebral small vessel disease, and their automatic segmentation across data with different distributions is crucial for assessing brain health and supporting diagnosis. However, cross-domain WMH segmentation remains challenging in privacy-sensitive and label-scarce clinical settings. Existing methods either relied on source domain data, violating privacy constraints, or lacked spatial guidance, which resulted in poor generalization, such as low sensitivity to small lesions. To address these challenges, we developed a source-free domain adaptation (SFDA) framework enhanced by federated spatial prior modeling. Our method used a dual-path pseudo-label generator that leveraged spatial priors to improve boundary accuracy and enhance the detection of small lesions. These priors were optimized via federated learning across multiple sites without sharing raw data, boosting model generalization while preserving privacy. The model was then fine-tuned using refined pseudo-labels. Experimental results demonstrated that our method consistently outperforms state-of-the-art UDA and SFDA methods, achieving 3–10% DSC improvement in most sites across 3 public and 7 private datasets. It also showed superior performance in small lesion detection and boundary delineation. Our method offered a robust, privacy-preserving solution for WMH segmentation and provided valuable support for early diagnosis and risk assessment of cerebrovascular diseases.
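The federated optimization of the spatial priors is the privacy-critical step. Below is a minimal FedAvg-style sketch of how site-level priors could be aggregated without sharing raw scans; the function name, the frequency-map representation, and the size weighting are illustrative assumptions, not the paper's protocol.

```python
import torch

def federated_average(site_priors, site_sizes):
    """Aggregate per-site spatial priors (e.g. voxelwise WMH frequency
    maps in a common atlas space) weighted by site sample counts.
    Only these aggregates cross site boundaries, never raw images."""
    total = sum(site_sizes)
    return sum((n / total) * prior for n, prior in zip(site_sizes, site_priors))
```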

AAAI Conference 2026 Conference Paper

Less Is More: Vision Representation Compression for Efficient Video Generation with Large Language Models

  • Yucheng Zhou
  • Jihai Zhang
  • Guanjie Chen
  • Jianbing Shen
  • Yu Cheng

Video generation using Large Language Models (LLMs) has shown promising potential, effectively leveraging the extensive LLM infrastructure to provide a unified framework for multimodal understanding and content generation. However, these methods face critical challenges, i.e., token redundancy and inefficiencies arising from long sequences, which constrain their performance and efficiency compared to diffusion-based approaches. In this study, we investigate the impact of token redundancy in LLM-based video generation through information-theoretic analysis and propose Vision Representation Compression (VRC), a novel framework designed to achieve more in both performance and efficiency with fewer video token representations. VRC introduces a learnable representation compressor and decompressor to compress video token representations, enabling autoregressive next-sequence prediction in a compact latent space. Our approach reduces redundancy, shortens token sequences, and improves the model's ability to capture underlying video structures. Our experiments demonstrate that VRC reduces token sequence lengths by a factor of 4, achieving a 9–14× acceleration in inference while maintaining performance comparable to state-of-the-art video generation models. VRC not only accelerates inference but also significantly reduces memory requirements during both model training and inference.
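The 4× sequence reduction suggests a simple block-merging structure. Below is a minimal, hypothetical sketch of a learnable compressor/decompressor pair of the kind the abstract describes; the module names, the linear-merge design, and the ratio of 4 are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TokenCompressor(nn.Module):
    """Merge every `ratio` consecutive video tokens into one latent token
    with a learned linear map (ratio=4 mirrors the 4x reduction reported)."""
    def __init__(self, dim: int, ratio: int = 4):
        super().__init__()
        self.ratio = ratio
        self.proj = nn.Linear(dim * ratio, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        b, n, d = tokens.shape                      # n must be divisible by ratio
        grouped = tokens.reshape(b, n // self.ratio, d * self.ratio)
        return self.proj(grouped)                   # (b, n/ratio, d)

class TokenDecompressor(nn.Module):
    """Inverse map: expand each latent token back into `ratio` tokens,
    so the LLM predicts next sequences in the compact latent space."""
    def __init__(self, dim: int, ratio: int = 4):
        super().__init__()
        self.ratio = ratio
        self.proj = nn.Linear(dim, dim * ratio)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        b, m, d = latents.shape
        return self.proj(latents).reshape(b, m * self.ratio, d)
```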

AAAI Conference 2026 Conference Paper

RFNNS: Robust Fixed Neural Network Steganography with Universal Text-to-Image Models

  • Yu Cheng
  • Jiuan Zhou
  • Jiawei Chen
  • Zhaoxia Yin
  • Xinpeng Zhang

With the rapid development of generative AI, image steganography has garnered widespread attention due to its unique concealment. Recent studies have demonstrated the practical advantages of Fixed Neural Network Steganography (FNNS), notably its ability to achieve stable information embedding and extraction without any additional network training. However, the stego images generated by FNNS still exhibit noticeable distortion and limited robustness. These drawbacks compromise the security of the embedded information and restrict the practical applicability of the method. To address these limitations, we propose Robust Fixed Neural Network Steganography (RFNNS). Specifically, a texture-aware localization technique selectively embeds perturbations carrying secret information into regions of complex texture, effectively preserving visual quality. Additionally, a robust steganographic perturbation generation (RSPG) strategy is designed to enhance decoding accuracy, even under common and unknown attacks. These robust perturbations are combined with AI-generated cover images to produce stego images. Experimental results demonstrate that RFNNS significantly improves robustness compared to state-of-the-art FNNS methods, achieving an average increase in SSIM of 23% for recovered secret images under common attacks. Furthermore, RFNNS reduces the LPIPS value of secret images recovered under previously unknown attacks to 39% of that of the SOTA method, underscoring its practical value for covert communication.
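A hedged sketch of one way to realize the texture-aware localization described above: score each pixel by local variance (a simple texture proxy) and keep the most textured fraction as the embedding region. The kernel size, the variance measure, and the `keep` ratio are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def texture_mask(img: torch.Tensor, kernel: int = 7, keep: float = 0.3) -> torch.Tensor:
    """img: (1, C, H, W) in [0, 1]. Returns a binary mask marking the
    top `keep` fraction of pixels by local variance; perturbations would
    be embedded only where the mask is 1."""
    gray = img.mean(dim=1, keepdim=True)
    pad = kernel // 2
    mean = F.avg_pool2d(gray, kernel, stride=1, padding=pad)
    mean_sq = F.avg_pool2d(gray * gray, kernel, stride=1, padding=pad)
    var = (mean_sq - mean * mean).clamp(min=0)
    threshold = torch.quantile(var.flatten(), 1.0 - keep)
    return (var >= threshold).float()
```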

JBHI Journal 2026 Journal Article

SMFusion: Semantic-Preserving Fusion of Multimodal Medical Images for Enhanced Clinical Diagnosis

  • Haozhe Xiang
  • Han Zhang
  • Yu Cheng
  • Xiongwen Quan
  • Wanwan Huang

Multimodal medical image fusion plays a crucial role in medical diagnosis by integrating complementary information from different modalities to enhance image readability and clinical applicability. However, existing methods mainly follow computer vision standards for feature extraction and fusion strategy formulation, overlooking the rich semantic information inherent in medical images. To address this limitation, we propose a novel semantic-guided medical image fusion approach that, for the first time, incorporates medical prior knowledge into the fusion process. Specifically, we construct a publicly available multimodal medical image-text dataset, upon which text descriptions generated by BiomedGPT are encoded and semantically aligned with image features in a high-dimensional space via a semantic interaction alignment module. During this process, a cross-attention-based linear transformation automatically maps the relationship between textual and visual features to facilitate comprehensive learning. The aligned features are then embedded into a text-injection module for further feature-level fusion. Unlike traditional methods, we further generate diagnostic reports from the fused images to assess the preservation of medical information. Additionally, we design a medical semantic loss function to enhance the retention of textual cues from the source images. Experimental results on test datasets demonstrate that the proposed method achieves superior performance in both qualitative and quantitative evaluations while preserving more critical medical information.
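A minimal sketch of the cross-attention alignment idea, assuming image patch features attend over encoded text tokens; the module name and residual structure are assumptions for illustration, not the paper's code.

```python
import torch.nn as nn

class SemanticInteractionAlignment(nn.Module):
    """Fused-image features attend over encoded report text (e.g.
    BiomedGPT descriptions); a linear map on the attended output plays
    the role of the cross-attention-based transformation."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)

    def forward(self, img_feats, text_feats):
        # img_feats: (batch, n_patches, dim); text_feats: (batch, n_tokens, dim)
        attended, _ = self.attn(img_feats, text_feats, text_feats)
        return img_feats + self.proj(attended)
```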

AAAI Conference 2026 Conference Paper

TransMamba: A Sequence-Level Hybrid Transformer-Mamba Language Model

  • Yixing Li
  • Ruobing Xie
  • Zhen Yang
  • Xingwu Sun
  • Shuaipeng Li
  • Weidong Han
  • Zhanhui Kang
  • Di Wang

Transformers are the cornerstone of modern large language models, but their quadratic computational complexity limits efficiency in long-sequence processing. Recent advancements in Mamba, a state space model (SSM) with linear complexity, offer promising efficiency gains but suffer from unstable contextual learning and multitask generalization. Some works construct layer-level hybrid structures that combine Transformer and Mamba layers, aiming to exploit the advantages of both. This paper proposes TransMamba, a novel sequence-level hybrid framework that unifies Transformer and Mamba through shared parameter matrices (QKV and CBx), and can thus dynamically switch between attention and SSM mechanisms at different token lengths and layers. We design the Memory Converter to bridge Transformer and Mamba by converting attention outputs into SSM-compatible states, ensuring seamless information flow at TransPoints where the transformation happens. The TransPoint scheduling is also thoroughly explored for balancing effectiveness and efficiency. We conducted extensive experiments demonstrating that TransMamba achieves superior training efficiency and performance compared to single and hybrid baselines, and validated the deeper consistency between the Transformer and Mamba paradigms at the sequence level, offering a scalable solution for next-generation language modeling.

NeurIPS Conference 2025 Conference Paper

Learning to Reason under Off-Policy Guidance

  • Jianhao Yan
  • Yafu Li
  • Zican Hu
  • Zhi Wang
  • Ganqu Cui
  • Xiaoye Qu
  • Yu Cheng
  • Yue Zhang

Recent advances in large reasoning models (LRMs) demonstrate that sophisticated behaviors such as multi-step reasoning and self-reflection can emerge via reinforcement learning with verifiable rewards (RLVR). However, existing RLVR approaches are inherently "on-policy", limiting learning to a model's own outputs and failing to acquire reasoning abilities beyond its initial capabilities. To address this issue, we introduce LUFFY (Learning to reason Under oFF-policY guidance), a framework that augments RLVR with off-policy reasoning traces. LUFFY dynamically balances imitation and exploration by combining off-policy demonstrations with on-policy rollouts during training. Specifically, LUFFY combines the Mixed-Policy GRPO framework, which has a theoretically guaranteed convergence rate, with policy shaping via regularized importance sampling to avoid superficial and rigid imitation during mixed-policy training. Compared with previous RLVR methods, LUFFY achieves an average gain of over +6.4 points across six math benchmarks and an advantage of over +6.2 points on out-of-distribution tasks. Most significantly, we show that LUFFY successfully trains weak models in scenarios where on-policy RLVR completely fails. These results provide compelling evidence that LUFFY transcends the fundamental limitations of on-policy RLVR and demonstrates the great potential of utilizing off-policy guidance in RLVR.
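For intuition, the policy-shaping idea can be sketched as a regularized importance weight on off-policy tokens. The damping function f(r) = r / (r + γ) below is one plausible choice that keeps gradient signal on low-probability (hard) off-policy tokens; the exact shaping used by LUFFY may differ.

```python
import torch

def shaped_importance_weight(logp_actor: torch.Tensor,
                             logp_behavior: torch.Tensor,
                             gamma: float = 0.1) -> torch.Tensor:
    """Regularized importance sampling sketch: the raw ratio
    pi_theta / pi_off is damped by f(r) = r / (r + gamma), so tokens the
    actor currently assigns low probability are not ignored outright,
    avoiding superficial imitation of the off-policy traces."""
    ratio = torch.exp(logp_actor - logp_behavior)
    return ratio / (ratio + gamma)
```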

ICLR Conference 2025 Conference Paper

Modality-Specialized Synergizers for Interleaved Vision-Language Generalists

  • Zhiyang Xu
  • Minqian Liu
  • Ying Shen 0006
  • Joy Rimchala
  • Jiaxin Zhang 0005
  • Qifan Wang 0001
  • Yu Cheng
  • Lifu Huang

Recent advancements in Vision-Language Models (VLMs) have led to the emergence of Vision-Language Generalists (VLGs) capable of understanding and generating both text and images. However, seamlessly generating an arbitrary sequence of text and images remains a challenging task for current VLGs. One primary limitation lies in applying a unified architecture and the same set of parameters to simultaneously model discrete text tokens and continuous image features. Recent works attempt to tackle this fundamental problem by introducing modality-aware expert models. However, they employ identical architectures to process both text and images, disregarding the intrinsic inductive biases of these two modalities. In this work, we introduce Modality-Specialized Synergizers (MoSS), a novel design that efficiently optimizes existing unified architectures of VLGs with modality-specialized adaptation layers, i.e., a Convolutional LoRA for modeling the local priors of image patches and a Linear LoRA for processing sequential text. This design enables more effective modeling of modality-specific features while maintaining the strong cross-modal integration gained from pretraining. In addition, to improve the instruction-following capability on interleaved text-and-image generation, we introduce LeafInstruct, the first open-sourced interleaved instruction tuning dataset, comprising 184,982 high-quality instances spanning more than 10 diverse domains. Extensive experiments show that VLGs integrated with MoSS achieve state-of-the-art performance, significantly surpassing baseline VLGs in complex interleaved generation tasks. Furthermore, our method exhibits strong generalizability across different VLGs.
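To make the two adapter types concrete, here is a hedged sketch of a Linear LoRA for text tokens and a Convolutional LoRA that applies its low-rank projections as convolutions over the 2D patch grid, injecting the locality prior the abstract mentions. Ranks, kernel size, and module structure are illustrative assumptions; both modules return a delta to be added to the frozen layer's output.

```python
import torch
import torch.nn as nn

class LinearLoRA(nn.Module):
    """Standard low-rank adapter for sequential text: delta = up(down(x))."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)          # start as a no-op adapter

    def forward(self, x):                       # x: (batch, seq, dim)
        return self.up(self.down(x))

class ConvLoRA(nn.Module):
    """Low-rank adapter for image tokens: the down/up maps are convs
    over the patch grid, modeling local priors a linear adapter lacks."""
    def __init__(self, dim: int, rank: int = 8, k: int = 3):
        super().__init__()
        self.down = nn.Conv2d(dim, rank, k, padding=k // 2)
        self.up = nn.Conv2d(rank, dim, 1)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x, h: int, w: int):       # x: (batch, h*w, dim)
        b, n, d = x.shape
        grid = x.transpose(1, 2).reshape(b, d, h, w)
        out = self.up(self.down(grid))
        return out.reshape(b, d, n).transpose(1, 2)
```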

IJCAI Conference 2025 Conference Paper

OpenIAI-SNIO: A Systematic AR-Based Assembly Guidance System for Small-Scale, High-Density Industrial Components

  • Yuntao Wang
  • Yu Cheng
  • Junhao Geng

This paper develops an AR-based assembly guidance system, OpenIAI-SNIO, for small-scale, high-density industrial components (SHIC), addressing the inability of existing AR technology to provide complete, accurate, and stable visual cognition and assembly operation guidance for SHIC. OpenIAI-SNIO combines artificial intelligence methods such as computer vision and deep learning with rule-based reasoning and augmented reality to achieve adaptive, whole-process, and precise guidance of SHIC assembly in situations where visual information is insufficient. The application case shows that OpenIAI-SNIO can effectively improve the efficiency and quality of SHIC assembly and reduce the workload of operators, realizing the systematic and practical application of AR technology in SHIC assembly.

ICML Conference 2025 Conference Paper

Scaling Laws for Floating-Point Quantization Training

  • Xingwu Sun
  • Shuaipeng Li
  • Ruobing Xie
  • Weidong Han 0006
  • Kan Wu
  • Zhen Yang
  • Yixing Li
  • An Wang

Low-precision training is considered an effective strategy for reducing both training and downstream inference costs. Previous scaling laws for precision mainly focus on integer quantization, paying less attention to the constituents of floating-point (FP) quantization, and thus cannot fit LLM losses well in this scenario. In contrast, while FP quantization training is more commonly implemented in production, research on it has been relatively superficial. In this paper, we thoroughly explore the effects of FP quantization targets, exponent bits, mantissa bits, and the calculation granularity of the scaling factor on the FP quantization training performance of LLMs. In addition to an accurate unified scaling law for FP quantization, we also provide valuable suggestions for the community: (1) Exponent bits contribute slightly more to model performance than mantissa bits. We provide the optimal exponent-mantissa bit ratio for different bit widths, which is available for future reference by hardware manufacturers; (2) We discover the formation of a critical data size in low-precision LLM training. Training data exceeding the critical data size will inversely degrade LLM performance; (3) The optimal FP quantization precision is directly proportional to the computational power, but only within a wide computational power range. We estimate that the best cost-performance precision should lie between 4 and 8 bits.
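The exponent/mantissa trade-off the abstract studies can be made concrete by simulating quantization to a given FP format. The helper below rounds a tensor to an IEEE-style format with `exp_bits` and `man_bits` (subnormals omitted); it is an illustrative simulation, not the paper's training code.

```python
import torch

def fp_quantize(x: torch.Tensor, exp_bits: int, man_bits: int) -> torch.Tensor:
    """Round x to the nearest value representable with the given
    exponent/mantissa split (e.g. E4M3: exp_bits=4, man_bits=3).
    The top exponent code is reserved for inf/nan, as in IEEE formats."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias
    min_exp = 1 - bias
    sign = torch.sign(x)
    mag = x.abs().clamp(min=1e-30)
    e = torch.floor(torch.log2(mag)).clamp(min_exp, max_exp)
    scale = 2.0 ** e
    mantissa = torch.round(mag / scale * 2 ** man_bits) / 2 ** man_bits
    max_val = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    return (sign * mantissa * scale).clamp(-max_val, max_val)
```

Sweeping the exponent-mantissa split at a fixed total bit width is exactly the kind of experiment behind suggestion (1).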

NeurIPS Conference 2025 Conference Paper

Scaling Physical Reasoning with the PHYSICS Dataset

  • Shenghe Zheng
  • Qianjia Cheng
  • Junchi Yao
  • Mengsong Wu
  • Haonan He
  • Ning Ding
  • Yu Cheng
  • Shuyue Hu

Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, has received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to address this gap. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To use the data for improving and evaluating models' physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for evaluation, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in the physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations of current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics. The code and data can be found at: https://github.com/Zhengsh123/PHYSICS.

NeurIPS Conference 2025 Conference Paper

Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision

  • Shilin Zhang
  • Zican Hu
  • Wenhao Wu
  • Xinyi Xie
  • Jianxiang Tang
  • Chunlin Chen
  • Daoyi Dong
  • Yu Cheng

Offline meta-RL usually tackles generalization by inferring task beliefs from high-quality samples or warmup explorations. This restricted form limits their generality and usability, since these supervision signals are expensive and even infeasible to acquire in advance for unseen tasks. Learning directly from raw text about decision tasks is a promising alternative that leverages a much broader source of supervision. In this paper, we propose Text-to-Decision Agent (T2DA), a simple and scalable framework that supervises offline meta-RL with natural language. We first introduce a generalized world model to encode multi-task decision data into a dynamics-aware embedding space. Then, inspired by CLIP, we predict which textual description goes with which decision embedding, effectively bridging their semantic gap via contrastive language-decision pre-training and aligning the text embeddings to comprehend the environment dynamics. After training the text-conditioned generalist policy, the agent can directly realize zero-shot text-to-decision generation in response to language instructions. Comprehensive experiments on MuJoCo and Meta-World benchmarks show that T2DA facilitates high-capacity zero-shot generalization and outperforms various types of baselines. Our code is available at https://github.com/NJU-RL/T2DA.
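The contrastive language-decision pre-training step is CLIP-like and easy to sketch: matched (trajectory, description) pairs are pulled together and all in-batch mismatches pushed apart. A minimal sketch, assuming both encoders output same-width embeddings; the temperature and symmetric form are standard CLIP choices rather than details confirmed by the abstract.

```python
import torch
import torch.nn.functional as F

def language_decision_contrastive_loss(decision_emb, text_emb, temperature=0.07):
    """decision_emb, text_emb: (batch, dim); row i of each is a matched
    pair. Symmetric InfoNCE over the batch similarity matrix."""
    d = F.normalize(decision_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = d @ t.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```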

NeurIPS Conference 2025 Conference Paper

Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model

  • Tianle Li
  • Jihai Zhang
  • Yongming Rao
  • Yu Cheng

While large language models (LLMs) demonstrate strong reasoning capabilities utilizing reinforcement learning (RL) with verifiable reward, whether large vision-language models (VLMs) can directly inherit such capabilities through similar post-training strategies remains underexplored. In this work, we conduct a systematic compositional probing study to evaluate whether current VLMs trained with RL or other post-training strategies can compose capabilities across modalities or tasks under out-of-distribution conditions. We design a suite of diagnostic tasks that train models on unimodal tasks or isolated reasoning skills, and evaluate them on multimodal, compositional variants requiring skill integration. Through comparisons between supervised fine-tuning (SFT) and RL-trained models, we identify three key findings: (1) RL-trained models consistently outperform SFT on compositional generalization, demonstrating better integration of learned skills; (2) although VLMs achieve strong performance on individual tasks, they struggle to generalize compositionally under cross-modal and cross-task scenarios, revealing a significant gap in current training strategies; (3) enforcing models to explicitly describe visual content before reasoning (e.g., caption-before-thinking), along with rewarding progressive vision-to-text grounding, yields notable gains. This highlights two essential ingredients for improving compositionality in VLMs: visual-to-text alignment and accurate visual grounding. Our findings shed light on the current limitations of RL-based reasoning VLM training and provide actionable insights toward building models that reason compositionally across modalities and tasks.

NeurIPS Conference 2025 Conference Paper

VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models

  • Xiangdong Zhang
  • Jiaqi Liao
  • Shaofeng Zhang
  • Fanqing Meng
  • Xiangpeng Wan
  • Junchi Yan
  • Yu Cheng

Recent advancements in text-to-video (T2V) diffusion models have enabled high-fidelity and realistic video synthesis. However, current T2V models often struggle to generate physically plausible content due to their limited inherent ability to accurately understand physics. We found that while the representations within T2V models possess some capacity for physics understanding, they lag significantly behind those from recent video self-supervised learning methods. To this end, we propose a novel framework called VideoREPA, which distills physics understanding capability from video understanding foundation models into T2V models by aligning token-level relations. This closes the physics understanding gap and enables more physics-plausible generation. Specifically, we introduce the Token Relation Distillation (TRD) loss, leveraging spatio-temporal alignment to provide soft guidance suitable for finetuning powerful pre-trained T2V models, a critical departure from prior representation alignment (REPA) methods. To our knowledge, VideoREPA is the first REPA method designed for finetuning T2V models and specifically for injecting physical knowledge. Empirical evaluations show that VideoREPA substantially enhances the physics commonsense of the baseline method, CogVideoX, achieving significant improvement on relevant benchmarks and demonstrating a strong capacity for generating videos consistent with intuitive physics. Code and more video results are available at https://videorepa.github.io/.
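A hedged sketch of a token-relation distillation loss in the spirit of TRD: rather than matching features directly, match each token's pairwise similarities to all other tokens, so only the relational structure of the video foundation model is transferred. The cosine-similarity relations and smooth-L1 penalty are assumptions; the paper's spatio-temporal pairing may differ.

```python
import torch
import torch.nn.functional as F

def token_relation_distillation(student_tokens, teacher_tokens):
    """student_tokens, teacher_tokens: (batch, n_tokens, dim), already
    extracted from aligned positions of the T2V model and the frozen
    video foundation model. Align the (n x n) token-relation matrices."""
    s = F.normalize(student_tokens, dim=-1)
    t = F.normalize(teacher_tokens, dim=-1)
    rel_s = s @ s.transpose(1, 2)          # (batch, n, n) cosine relations
    rel_t = t @ t.transpose(1, 2)
    return F.smooth_l1_loss(rel_s, rel_t.detach())
```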

NeurIPS Conference 2024 Conference Paper

ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLMs

  • Zhaochen Su
  • Jun Zhang
  • Xiaoye Qu
  • Tong Zhu
  • Yanshu Li
  • Jiashuo Sun
  • Juntao Li
  • Min Zhang

Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored the conflicts between the inherent knowledge of LLMs and retrieved contextual knowledge, a comprehensive assessment of knowledge conflict in LLMs is still missing. Motivated by this research gap, we first propose ConflictBank, the largest benchmark with 7.45M claim-evidence pairs and 553k QA pairs, addressing conflicts from misinformation, temporal discrepancies, and semantic divergences. Using ConflictBank, we conduct thorough and controlled experiments for a comprehensive understanding of LLM behavior in knowledge conflicts, focusing on three key aspects: (i) conflicts encountered in retrieved knowledge, (ii) conflicts within the models' encoded knowledge, and (iii) the interplay between these conflict forms. Our investigation covers four model families and twelve LLM instances and provides insights into conflict types, model sizes, and the impact at different stages. We believe that knowledge conflicts represent a critical bottleneck to achieving trustworthy artificial intelligence and hope our work will offer valuable guidance for future model training and development. Resources are available at https://github.com/zhaochen0110/conflictbank.

NeurIPS Conference 2024 Conference Paper

Aggregating Quantitative Relative Judgments: From Social Choice to Ranking Prediction

  • Yixuan E. Xu
  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

Quantitative Relative Judgment Aggregation (QRJA) is a new research topic in (computational) social choice. In the QRJA model, agents provide judgments on the relative quality of different candidates, and the goal is to aggregate these judgments across all agents. In this work, our main conceptual contribution is to explore the interplay between QRJA in a social choice context and its application to ranking prediction. We observe that in QRJA, judges do not have to be people with subjective opinions; for example, a race can be viewed as a "judgment" on the contestants' relative abilities. This allows us to aggregate results from multiple races to evaluate the contestants' true qualities. At a technical level, we introduce new aggregation rules for QRJA and study their structural and computational properties. We evaluate the proposed methods on data from various real races and show that QRJA-based methods offer effective and interpretable ranking predictions.
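One natural aggregation rule in this family (an illustrative choice, not necessarily one of the paper's rules) is least squares: each judgment asserts a margin between two candidates, and quality scores are fit to honor those margins as closely as possible.

```python
import numpy as np

def aggregate_judgments(judgments, n_candidates):
    """judgments: iterable of (a, b, margin) meaning 'candidate a beat
    candidate b by margin'. Solve min_q sum((q[a] - q[b] - margin)^2),
    pinning q[0] = 0 to remove the additive degree of freedom."""
    rows, rhs = [], []
    for a, b, margin in judgments:
        row = np.zeros(n_candidates)
        row[a], row[b] = 1.0, -1.0
        rows.append(row)
        rhs.append(margin)
    anchor = np.zeros(n_candidates)
    anchor[0] = 1.0                      # gauge constraint q[0] = 0
    rows.append(anchor)
    rhs.append(0.0)
    q, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return q

# Two "races": candidate 0 beat 1 by 2.0; candidate 1 beat 2 by 1.0.
scores = aggregate_judgments([(0, 1, 2.0), (1, 2, 1.0)], n_candidates=3)
```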

TMLR Journal 2024 Journal Article

CR-MoE: Consistent Routed Mixture-of-Experts for Scaling Contrastive Learning

  • Ziyu Jiang
  • Guoqing Zheng
  • Yu Cheng
  • Ahmed Hassan Awadallah
  • Zhangyang Wang

While Contrastive Learning (CL) achieves great success in many downstream tasks, its good performance heavily relies on a large model capacity. As previous methods focus on scaling dense models, training and inference costs increase rapidly with model size, leading to large resource consumption. In this paper, we explore CL with an efficient scaling method, Mixture of Experts (MoE), to obtain a large but sparse model. We start by plugging the state-of-the-art CL method into MoE. However, this naive combination fails to visibly improve performance despite a much larger capacity. A closer look reveals that the naive MoE+CL model has a strong tendency to route two augmented views of the same image token to different subsets of experts: such "cross-view instability" breaks the weight-sharing nature of CL and misleads invariant feature learning. To address this issue, we introduce a new regularization mechanism that enforces expert-routing similarity between different views of the same image (or its overlapped patch tokens), while promoting expert-routing diversity among patches from different images. The resultant method, called CR-MoE, improves by 1.7 points in terms of 1% semi-supervised learning accuracy on ImageNet, compared to the naive combination baseline. It further surpasses the state-of-the-art CL methods on ImageNet pre-training of Vision Transformers (ViT) by 2.8 points, at the same computational cost. Our findings validate CR-MoE as an effective and efficient image representation learner. Code is available at https://github.com/VITA-Group/CRMoE.
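The cross-view regularizer can be sketched as a divergence penalty between the gate distributions of two augmented views of the same tokens. The symmetric KL below is one simple instantiation under that assumption; the paper's exact similarity and diversity terms may differ.

```python
import torch.nn.functional as F

def routing_consistency_loss(gate_logits_view1, gate_logits_view2):
    """gate_logits_view*: (n_tokens, n_experts) router logits for the
    two views of the same tokens. Penalize disagreement so both views
    are routed to similar experts (symmetric KL, batch-averaged)."""
    p = F.log_softmax(gate_logits_view1, dim=-1)
    q = F.log_softmax(gate_logits_view2, dim=-1)
    kl_pq = F.kl_div(q, p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(p, q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)
```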

AAAI Conference 2024 Conference Paper

Enhancing Low-Resource Relation Representations through Multi-View Decoupling

  • Chenghao Fan
  • Wei Wei
  • Xiaoye Qu
  • Zhenyi Lu
  • Wenfeng Xie
  • Yu Cheng
  • Dangyang Chen

Recently, prompt-tuning with pre-trained language models (PLMs) has demonstrated a significant ability to enhance relation extraction (RE) tasks. However, in low-resource scenarios, where the available training data is scarce, previous prompt-based methods may still perform poorly in prompt-based representation learning due to a superficial understanding of the relation. To this end, we highlight the importance of learning high-quality relation representations in low-resource scenarios for RE, and propose a novel prompt-based relation representation method, named MVRE (Multi-View Relation Extraction), to better leverage the capacity of PLMs and improve RE performance within the low-resource prompt-tuning paradigm. Specifically, MVRE decouples each relation into different perspectives, encompassing multi-view relation representations that maximize the likelihood during relation inference. Furthermore, we design a Global-Local loss and a Dynamic-Initialization method for better alignment of the multi-view relation-representing virtual words, so that they carry the semantics of relation labels throughout initialization and optimization. Extensive experiments on three benchmark datasets show that our method achieves state-of-the-art performance in low-resource settings.

ICML Conference 2024 Conference Paper

LIDAO: Towards Limited Interventions for Debiasing (Large) Language Models

  • Tianci Liu 0003
  • Haoyu Wang 0004
  • Shiyang Wang
  • Yu Cheng
  • Jing Gao 0004

Large language models (LLMs) have achieved impressive performance on various natural language generation tasks. Nonetheless, they suffer from generating negative and harmful content that is biased against certain demographic groups (e.g., female), raising severe fairness concerns. As remedies, prior works intervened in the generation by removing attitude or demographic information, inevitably degrading the generation quality and resulting in notable fairness-fluency trade-offs. However, it is still under-explored to what extent fluency has to be affected in order to achieve a desired level of fairness. In this work, we conduct the first formal study from an information-theoretic perspective. We show that previous approaches are excessive for debiasing and propose LIDAO, a general framework that provably debiases a (L)LM at better fluency. We further robustify LIDAO in adversarial scenarios, where a carefully-crafted prompt may stimulate LLMs with instruction-following abilities to generate texts whose fairness issues appear only when the prompt is also taken into account. Experiments on three LMs ranging from 0.7B to 7B parameters demonstrate the superiority of our method.

NeurIPS Conference 2024 Conference Paper

MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

  • Wei Tao
  • Yucheng Zhou
  • Yanlin Wang
  • Wenqiang Zhang
  • Hongyu Zhang
  • Yu Cheng

In software development, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing code. Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues, particularly at the repository level. To overcome this challenge, we empirically study why LLMs fail to resolve GitHub issues and analyze the major factors. Motivated by the empirical findings, we propose MAGIS, a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, consisting of four agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer. This framework leverages the collaboration of the agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS resolves 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the advanced LLM.

NeurIPS Conference 2024 Conference Paper

On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion

  • Chenghao Fan
  • Zhenyi Lu
  • Wei Wei
  • Jie Tian
  • Xiaoye Qu
  • Dangyang Chen
  • Yu Cheng

Efficient fine-tuning of large language models for task-specific applications is imperative, yet the vast number of parameters in these models makes their training increasingly challenging. Despite numerous proposals for effective methods, a substantial memory overhead remains for gradient computations during updates. Can we fine-tune a series of task-specific small models and transfer their knowledge directly to a much larger model without additional training? In this paper, we explore weak-to-strong specialization using logit arithmetic, facilitating a direct answer to this question. Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge, which leads to suboptimal performance. To surmount these limitations, we propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task. This method adaptively allocates weights among these models at each decoding step, learning the weights through Kullback-Leibler divergence constrained optimization problems. We conduct extensive experiments across various benchmarks in both single-task and multi-task settings, achieving leading results. By transferring expertise from the 7B model to the 13B model, our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios compared to full fine-tuning of the 13B model. Notably, we even achieve superior performance on unseen tasks. Moreover, we further demonstrate that our method can effortlessly integrate in-context learning for single tasks and task arithmetic for multi-task scenarios.
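At decoding time the method reduces to logit arithmetic. A minimal sketch, assuming all models share a tokenizer and the per-expert weights have already been obtained (in the paper, from a KL-constrained optimization at each step):

```python
import torch
import torch.nn.functional as F

def fused_next_token_logits(large_logits, expert_base_pairs, weights):
    """large_logits: (vocab,) from the 13B model; expert_base_pairs:
    list of (expert_logits, base_logits) from each fine-tuned 7B expert
    and its un-tuned 7B base; weights: per-expert scalars. The large
    model's logits are shifted by weighted (expert - base) offsets."""
    fused = large_logits.clone()
    for w, (expert, base) in zip(weights, expert_base_pairs):
        fused = fused + w * (expert - base)
    return F.log_softmax(fused, dim=-1)
```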

NeurIPS Conference 2024 Conference Paper

Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

  • Zhenyi Lu
  • Chenghao Fan
  • Wei Wei
  • Xiaoye Qu
  • Dangyang Chen
  • Yu Cheng

In the era of large language models, model merging is a promising way to combine multiple task-specific models into a single multitask model without extra training. However, two challenges remain: (a) interference between different models and (b) heterogeneous data during testing. Traditional model merging methods often show significant performance gaps compared to fine-tuned models due to these issues. Additionally, a one-size-fits-all model lacks flexibility for diverse test data, leading to performance degradation. We show that both shared and exclusive task-specific knowledge are crucial for merging performance, but directly merging exclusive knowledge hinders overall performance. In view of this, we propose Twin-Merging, a method that encompasses two principal stages: (1) modularizing knowledge into shared and exclusive components, with compression to reduce redundancy and enhance efficiency; (2) dynamically merging shared and task-specific knowledge based on the input. This approach narrows the performance gap between merged and fine-tuned models and improves adaptability to heterogeneous data. Extensive experiments on 20 datasets for both language and vision tasks demonstrate the effectiveness of our method, showing an average improvement of 28.34% in absolute normalized score for discriminative tasks and even surpassing the fine-tuned upper bound on generative tasks.
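Stage (2) is a weighted re-composition of parameters at test time. A minimal sketch, assuming knowledge has already been modularized into one shared weight plus per-task exclusive deltas (e.g. compressed task vectors) and that a router scores the current input:

```python
import torch

def twin_merge(shared_weight, exclusive_deltas, router_probs):
    """shared_weight: (out, in); exclusive_deltas: (T, out, in), one
    delta per task; router_probs: (T,) input-dependent mixture weights.
    Returns the dynamically merged weight for this input."""
    dynamic = torch.einsum("t,toi->oi", router_probs, exclusive_deltas)
    return shared_weight + dynamic
```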

IJCAI Conference 2024 Conference Paper

Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

  • Jianqiang Xia
  • Dianxi Shi
  • Ke Song
  • Linna Song
  • Xiaolei Wang
  • Songchang Jin
  • Chenran Zhao
  • Yu Cheng

Most existing RGB-T tracking networks extract modality features in a separate manner, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone through joint feature extraction, fusion, and relation modeling. With this structure, the network can not only extract the fused features of templates and search regions under the interaction of modalities, but also significantly improve tracking speed through the single-stage fusion tracking paradigm. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities on the final prediction. Extensive experiments on three mainstream RGB-T tracking benchmarks show that our method sets a new state of the art while reaching the fastest tracking speed of 84.2 FPS. Code is available at https://github.com/xiajianqiang/USTrack.

AAAI Conference 2024 Conference Paper

Unsupervised Domain Adaptative Temporal Sentence Localization with Mutual Information Maximization

  • Daizong Liu
  • Xiang Fang
  • Xiaoye Qu
  • Jianfeng Dong
  • He Yan
  • Yang Yang
  • Pan Zhou
  • Yu Cheng

Temporal sentence localization (TSL) aims to localize a target segment in a video according to a given sentence query. Though respectable works have made decent achievements in this task, they severely rely on abundant yet expensive manual annotations for training. Moreover, these trained data-dependent models usually cannot generalize well to unseen scenarios because of the inherent domain shift. To address this issue, in this paper, we target another more practical but challenging setting: unsupervised domain adaptative temporal sentence localization (UDA-TSL), which explores whether localization knowledge can be transferred from a fully-annotated data domain (source domain) to a new unannotated data domain (target domain). In particular, we propose an effective and novel baseline for UDA-TSL to bridge the multi-modal gap across different domains and learn the potential correspondence between video-query pairs in the target domain. We first develop separate modality-specific domain adaptation modules to smoothly balance the minimization of domain shifts in the cross-dataset video and query domains. Then, to fully exploit the semantic correspondence of both modalities in the target domain for unsupervised localization, we devise a mutual information learning module to adaptively align the video-query pairs that are more likely to be relevant in the target domain, leading to more truly aligned target pairs and ensuring the discriminability of target features. In this way, our model can learn domain-invariant and semantically aligned cross-modal representations. Three sets of migration experiments show that our model achieves competitive performance compared to existing methods.

NeurIPS Conference 2023 Conference Paper

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

  • Boxin Wang
  • Weixin Chen
  • Hengzhi Pei
  • Chulin Xie
  • Mintong Kang
  • Chenhui Zhang
  • Chejian Xu
  • Zidi Xiong

Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications such as healthcare and finance, where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness to adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and to leak private information from both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially because GPT-4 follows the (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.

AAAI Conference 2023 Conference Paper

Frido: Feature Pyramid Diffusion for Complex Scene Image Synthesis

  • Wan-Cyuan Fan
  • Yen-Chun Chen
  • DongDong Chen
  • Yu Cheng
  • Lu Yuan
  • Yu-Chiang Frank Wang

Diffusion models (DMs) have shown great potential for high-quality image synthesis. However, when it comes to producing images with complex scenes, how to properly describe both global image structure and object details remains a challenging task. In this paper, we present Frido, a Feature Pyramid Diffusion model that performs a multi-scale coarse-to-fine denoising process for image synthesis. Our model decomposes an input image into scale-dependent vector-quantized features, followed by coarse-to-fine gating for producing the image output. During the above multi-scale representation learning stage, additional input conditions like text, scene graph, or image layout can be further exploited. Thus, Frido can also be applied to conditional or cross-modality image synthesis. We conduct extensive experiments over various unconditional and conditional image generation tasks, ranging from text-to-image synthesis, layout-to-image, and scene-graph-to-image to label-to-image. More specifically, we achieved state-of-the-art FID scores on five benchmarks, namely layout-to-image on COCO and OpenImages, scene-graph-to-image on COCO and Visual Genome, and label-to-image on COCO.

AAAI Conference 2023 Conference Paper

Hypotheses Tree Building for One-Shot Temporal Sentence Localization

  • Daizong Liu
  • Xiang Fang
  • Pan Zhou
  • Xing Di
  • Weining Lu
  • Yu Cheng

Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query. Though respectable works have made decent achievements in this task, they severely rely on dense video frame annotations, which require a tremendous amount of human effort to collect. In this paper, we target another more practical and challenging setting: one-shot temporal sentence localization (one-shot TSL), which learns to retrieve the query information among the entire video with only one annotated frame. Particularly, we propose an effective and novel tree-structure baseline for one-shot TSL, called Multiple Hypotheses Segment Tree (MHST), to capture the query-aware discriminative frame-wise information under the insufficient annotations. Each video frame is taken as the leaf-node, and the adjacent frames sharing the same visual-linguistic semantics will be merged into the upper non-leaf node for tree building. At last, each root node is an individual segment hypothesis containing the consecutive frames of its leaf-nodes. During the tree construction, we also introduce a pruning strategy to eliminate the interference of query-irrelevant nodes. With our designed self-supervised loss functions, our MHST is able to generate high-quality segment hypotheses for ranking and selection with the query. Experiments on two challenging datasets demonstrate that MHST achieves competitive performance compared to existing methods.

ICML Conference 2023 Conference Paper

Low-Switching Policy Gradient with Exploration via Online Sensitivity Sampling

  • Yunfan Li
  • Yiran Wang
  • Yu Cheng
  • Lin Yang

Policy optimization methods are powerful algorithms in Reinforcement Learning (RL) for their flexibility in dealing with policy parameterization and their ability to handle model misspecification. However, these methods usually suffer from slow convergence rates and poor sample complexity. Hence, it is important to design provably sample-efficient algorithms for policy optimization. Yet, recent advances on this problem have only been successful in tabular and linear settings, whose benign structures cannot be generalized to non-linearly parameterized policies. In this paper, we address this problem by leveraging recent advances in value-based algorithms, including bounded eluder dimension and online sensitivity sampling, to design a low-switching, sample-efficient policy optimization algorithm, LPO, with general non-linear function approximation. We show that our algorithm obtains an $\varepsilon$-optimal policy with only $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^3})$ samples, where $\varepsilon$ is the suboptimality gap and $d$ is a complexity measure of the function class approximating the policy. This drastically improves the previously best-known sample bound for policy optimization algorithms, $\widetilde{O}(\frac{\text{poly}(d)}{\varepsilon^8})$. Moreover, we empirically test our theory with deep neural nets to show the benefits of the theoretical inspiration.

NeurIPS Conference 2023 Conference Paper

Robust Matrix Sensing in the Semi-Random Model

  • Xing Gao
  • Yu Cheng

Low-rank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization, namely nuclear norm minimization, or by non-convex optimization, as it is well known that for low-rank matrix problems like matrix sensing and matrix completion, all local optima of the natural non-convex objectives are also globally optimal under certain ideal assumptions. In this paper, we study new approaches for matrix sensing in a semi-random model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a low-rank matrix $X^\star$ from linear measurements $b_i = \langle A_i, X^\star \rangle$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $A_i$'s are chosen adversarially. It is known that in the semi-random model, existing non-convex objectives can have bad local optima. To fix this, we present a descent-style algorithm that provably recovers the ground-truth matrix $X^\star$. For the closely related problem of semi-random matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, a condition that is NP-hard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semi-random sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and low-rankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and low-rank problems.

NeurIPS Conference 2023 Conference Paper

Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

  • Shuyao Li
  • Yu Cheng
  • Ilias Diakonikolas
  • Jelena Diakonikolas
  • Rong Ge
  • Stephen Wright

Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong contamination model, where a constant fraction of datapoints are arbitrarily corrupted. We introduce a general framework for efficiently finding an approximate SOSP with \emph{dimension-independent} accuracy guarantees, using $\widetilde{O}({D^2}/{\epsilon})$ samples where $D$ is the ambient dimension and $\epsilon$ is the fraction of corrupted datapoints. As a concrete application of our framework, we apply it to the problem of low rank matrix sensing, developing efficient and provably robust algorithms that can tolerate corruptions in both the sensing matrices and the measurements. In addition, we establish a Statistical Query lower bound providing evidence that the quadratic dependence on $D$ in the sample complexity is necessary for computationally efficient algorithms.

TMLR Journal 2022 Journal Article

Adversarial Feature Augmentation and Normalization for Visual Recognition

  • Tianlong Chen
  • Yu Cheng
  • Zhe Gan
  • Jianfeng Wang
  • Lijuan Wang
  • Jingjing Liu
  • Zhangyang Wang

Recent advances in computer vision take advantage of adversarial data augmentation to improve the generalization of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings, instead of relying on computationally-expensive pixel-level perturbations. We propose Adversarial Feature Augmentation and Normalization (A-FAN), which (i) first augments visual recognition models with adversarial features that integrate flexible scales of perturbation strengths, and (ii) then extracts adversarial feature statistics from batch normalization and re-injects them into clean features through feature normalization. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks, including ResNets and EfficientNets for classification, Faster-RCNN for detection, and Deeplab V3+ for segmentation. Extensive experiments show that A-FAN yields consistent generalization improvement over strong baselines across various datasets for classification, detection, and segmentation tasks, such as CIFAR-10, CIFAR-100, ImageNet, Pascal VOC2007, Pascal VOC2012, COCO2017, and Cityscapes. Comprehensive ablation studies and detailed analyses also demonstrate that adding perturbations to specific modules and layers of classification/detection/segmentation backbones yields optimal performance. Codes and pre-trained models are available at: https://github.com/VITA-Group/CV_A-FAN.

AAAI Conference 2022 Conference Paper

Efficient Robust Training via Backward Smoothing

  • Jinghui Chen
  • Yu Cheng
  • Zhe Gan
  • Quanquan Gu
  • Jingjing Liu

Adversarial training is so far the most effective strategy in defending against adversarial examples. However, it suffers from high computational costs due to the iterative adversarial attacks performed at each training step. Recent studies show that it is possible to achieve fast adversarial training by performing a single-step attack with random initialization. However, such an approach still lags behind state-of-the-art adversarial training algorithms in both stability and model robustness. In this work, we develop a new understanding of fast adversarial training by viewing random initialization as performing randomized smoothing for better optimization of the inner maximization problem. Following this new perspective, we also propose a new initialization strategy, backward smoothing, to further improve the stability and model robustness over single-step robust training methods. Experiments on multiple benchmarks demonstrate that our method achieves similar model robustness as the original TRADES method while using much less training time (∼3x improvement with the same training schedule).
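For reference, the generic fast (single-step) adversarial training step the abstract builds on looks like the sketch below; backward smoothing changes the initialization and objective, which is not reproduced here.

```python
import torch

def single_step_adv_example(model, loss_fn, x, y, eps=8/255, alpha=10/255):
    """Fast-AT inner step: start at a random point in the eps-ball
    (the 'randomized smoothing' view of random init), take one signed
    gradient step, project back, and return the adversarial input."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = loss_fn(model(x + delta), y)
    grad, = torch.autograd.grad(loss, delta)
    delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
    return (x + delta).clamp(0, 1)
```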

AAAI Conference 2022 Conference Paper

Memory-Guided Semantic Learning Network for Temporal Sentence Grounding

  • Daizong Liu
  • Xiaoye Qu
  • Xing Di
  • Yu Cheng
  • Zichuan Xu
  • Pan Zhou

Temporal sentence grounding (TSG) is crucial and fundamental for video understanding. Although existing methods train well-designed deep networks with large amounts of data, we find that they can easily forget rarely appearing cases in the training stage due to the off-balance data distribution, which influences model generalization and leads to undesirable performance. To tackle this issue, we propose a memory-augmented network, called Memory-Guided Semantic Learning Network (MGSL-Net), that learns and memorizes the rarely appearing content in TSG tasks. Specifically, MGSL-Net consists of three main parts: a cross-modal interaction module, a memory augmentation module, and a heterogeneous attention module. We first align the given video-query pair by a cross-modal graph convolutional network, and then utilize a memory module to record the cross-modal shared semantic features in domain-specific persistent memory. During training, the memory slots are dynamically associated with both common and rare cases, alleviating the forgetting issue. In testing, the rare cases can thus be enhanced by retrieving the stored memories, resulting in better generalization. Finally, the heterogeneous attention module is utilized to integrate the enhanced multi-modal features in both the video and query domains. Experimental results on three benchmarks show the superiority of our method in both effectiveness and efficiency, substantially improving accuracy not only on the entire dataset but also on rare cases.

NeurIPS Conference 2022 Conference Paper

M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design

  • Hanxue Liang
  • Zhiwen Fan
  • Rishov Sarkar
  • Ziyu Jiang
  • Tianlong Chen
  • Kai Zou
  • Yu Cheng
  • Cong Hao

Multi-task learning (MTL) encapsulates multiple learned tasks in a single model and often lets those tasks learn better jointly. Multi-tasking models have become successful and often essential for many sophisticated systems such as autonomous driving and indoor robots. However, when deploying MTL onto real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks, and the challenge is amplified when a growing number of tasks have to be squeezed into one compact model; (ii) at inference, current MTL regimes have to activate nearly the entire model even to execute just a single task. Yet most real systems demand only one or two tasks at each moment, while flexibly switching between tasks per need: such “all tasks activated” inference is therefore highly inefficient and non-scalable in practice. In this paper, we present a model-accelerator co-design framework to enable efficient on-device MTL that tackles both training and inference bottlenecks. Our framework, dubbed M³ViT, customizes mixture-of-experts (MoE) layers into a vision transformer (ViT) backbone for MTL, and sparsely activates task-specific experts during training, which effectively disentangles the parameter spaces to avoid training conflicts between tasks. Then, at inference with any task of interest, the same design allows for activating only the task-corresponding sparse “expert” pathway, instead of the full model. Our new model design is further enhanced by hardware-level innovations, in particular a novel computation reordering scheme tailored for memory-constrained MTL that achieves zero-overhead switching between tasks and can scale to any number of experts. Extensive experiments on the PASCAL-Context and NYUD-v2 datasets at both the software and hardware levels demonstrate the effectiveness of the proposed design. When executing the practical scenario of single-task inference, M³ViT achieves higher accuracies than encoder-focused MTL methods, while reducing inference FLOPs by 88%. When implemented on one Xilinx ZCU104 FPGA, our co-design framework reduces the memory requirement by 2.40× while achieving energy efficiency (as the product of latency and power) up to 9.23× higher than a comparable FPGA baseline.

NeurIPS Conference 2022 Conference Paper

Outlier-Robust Sparse Estimation via Non-Convex Optimization

  • Yu Cheng
  • Ilias Diakonikolas
  • Rong Ge
  • Shivam Gupta
  • Daniel Kane
  • Mahdi Soltanolkotabi

We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA. We develop novel and simple optimization formulations for these problems such that any approximate stationary point of the associated optimization problem yields a near-optimal solution for the underlying robust estimation task. As a corollary, we obtain that any first-order method that efficiently converges to stationarity yields an efficient algorithm for these tasks. The obtained algorithms are simple, practical, and succeed under broader distributional assumptions compared to prior work.

AAAI Conference 2022 Conference Paper

Planning with Participation Constraints

  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

We pose and study the problem of planning in Markov decision processes (MDPs), subject to participation constraints as studied in mechanism design. In this problem, a planner must work with a self-interested agent on a given MDP. Each action in the MDP provides an immediate reward to the planner and a (possibly different) reward to the agent. The agent has no control in choosing the actions, but has the option to end the entire process at any time. The goal of the planner is to find a policy that maximizes her cumulative reward, taking into consideration the agent’s ability to terminate. We give a fully polynomial-time approximation scheme for this problem. En route, we present polynomial-time algorithms for computing (exact) optimal policies for important special cases of this problem, including when the time horizon is constant, or when the MDP exhibits a “definitive decisions” property. We illustrate our algorithms with two different game-theoretic applications: the problem of assigning rides in ride-sharing and the problem of designing screening policies. Our results imply efficient algorithms for computing (approximately) optimal policies in both applications.

AAAI Conference 2022 Conference Paper

Playing Lottery Tickets with Vision and Language

  • Zhe Gan
  • Yen-Chun Chen
  • Linjie Li
  • Tianlong Chen
  • Yu Cheng
  • Shuohang Wang
  • Jingjing Liu
  • Lijuan Wang

Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have significantly lifted the state of the art over a wide range of VL tasks. However, the large number of parameters in such models hinders their application in practice. In parallel, work on the lottery ticket hypothesis (LTH) has shown that deep neural networks contain small matching subnetworks that can achieve performance on par with or even better than the dense networks when trained in isolation. In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models. We use UNITER as the main testbed (and also test LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. (i) It is difficult to find subnetworks that strictly match the performance of the full model. However, we can find “relaxed” winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy. (ii) Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96% of the full accuracy on average over all the tasks. (iii) Besides UNITER, other models such as LXMERT and ViLT can also play lottery tickets. However, the highest sparsity we can achieve for ViLT is far lower than for LXMERT and UNITER (30% vs. 70%). (iv) LTH also remains relevant when using other training methods (e.g., adversarial training).
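
For readers unfamiliar with how such tickets are found, here is a minimal sketch of iterative magnitude pruning (IMP) with weight rewinding, the standard LTH recipe. The training callback, the global pruning criterion, and the rewind-to-initialization choice are generic assumptions, not this paper's exact protocol.

    import torch

    def imp_masks(model, train_fn, rounds=5, prune_frac=0.2):
        # Iterative magnitude pruning: train, prune the smallest surviving
        # weights globally, rewind the survivors to initialization, repeat.
        init = {n: p.detach().clone() for n, p in model.named_parameters()}
        masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}
        for _ in range(rounds):
            train_fn(model, masks)   # assumed to apply the masks while training
            # Global magnitude threshold over the weights that are still alive.
            alive = torch.cat([p.detach().abs().flatten()[masks[n].flatten() > 0]
                               for n, p in model.named_parameters()])
            thresh = alive.kthvalue(max(1, int(prune_frac * alive.numel()))).values
            with torch.no_grad():
                for n, p in model.named_parameters():
                    masks[n] *= (p.abs() > thresh).float()
                    p.copy_(init[n])   # the surviving subnetwork at init is the ticket
        return masks

Five rounds at 20% each leave about 0.8^5 ≈ 33% of the weights, i.e. roughly the 67% sparsity regime within the 50%-70% range probed above.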

AAAI Conference 2022 Conference Paper

Unsupervised Temporal Video Grounding with Deep Semantic Clustering

  • Daizong Liu
  • Xiaoye Qu
  • Yinzhen Wang
  • Xing Di
  • Kai Zou
  • Yu Cheng
  • Zichuan Xu
  • Pan Zhou

Temporal video grounding (TVG) aims to localize a target segment in a video according to a given sentence query. Although existing works have made decent progress on this task, they rely heavily on abundant paired video-query data, which is expensive and time-consuming to collect in real-world scenarios. In this paper, we explore whether a video grounding model can be learned without any paired annotations. To the best of our knowledge, this is the first work to address TVG in an unsupervised setting. Since there is no paired supervision, we propose a novel Deep Semantic Clustering Network (DSCNet) that leverages all semantic information from the whole query set to compose the possible activity in each video for grounding. Specifically, we first develop a language semantic mining module, which extracts implicit semantic features from the whole query set. These language semantic features then serve as guidance to compose the activity in each video via a video-based semantic aggregation module. Finally, we utilize a foreground attention branch to filter out redundant background activities and refine the grounding results. To validate the effectiveness of DSCNet, we conduct experiments on both the ActivityNet Captions and Charades-STA datasets. The results demonstrate that DSCNet achieves competitive performance, and even outperforms most weakly-supervised approaches.

NeurIPS Conference 2021 Conference Paper

Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models

  • Boxin Wang
  • Chejian Xu
  • Shuohang Wang
  • Zhe Gan
  • Yu Cheng
  • Jianfeng Gao
  • Ahmed Awadallah
  • Bo Li

Large-scale pre-trained language models have achieved tremendous success across a wide range of natural language understanding (NLU) tasks, even surpassing human performance. However, recent studies reveal that the robustness of these models can be challenged by carefully crafted textual adversarial examples. While several individual datasets have been proposed to evaluate model robustness, a principled and comprehensive benchmark is still missing. In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations. Our findings are summarized as follows. (i) Most existing adversarial attack algorithms are prone to generating invalid or ambiguous adversarial examples, with around 90% of them either changing the original semantic meaning or misleading human annotators. We therefore perform a careful filtering process to curate a high-quality benchmark. (ii) All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy. We hope our work will motivate the development of new adversarial attacks that are more stealthy and semantic-preserving, as well as new robust language models against sophisticated adversarial attacks. AdvGLUE is available at https://adversarialglue.github.io.

AAAI Conference 2021 Conference Paper

Automated Mechanism Design for Classification with Partial Verification

  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

We study the problem of automated mechanism design with partial verification, where each type can (mis)report only a restricted set of types (rather than any other type), induced by the principal’s limited verification power. We prove hardness results when the revelation principle does not necessarily hold, as well as when types have even minimally different preferences. In light of these hardness results, we focus on truthful mechanisms in the setting where all types share the same preference over outcomes, which is motivated by applications in, e.g., strategic classification. We present a number of algorithmic and structural results, including an efficient algorithm for finding optimal deterministic truthful mechanisms, which also implies a faster algorithm for finding optimal randomized truthful mechanisms via a characterization based on convexity. We then consider a more general setting, where the principal’s cost is a function of the combination of outcomes assigned to each type. In particular, we focus on the case where the cost function is submodular, and give generalizations of essentially all our results in the classical setting where the cost function is additive. Our results provide a relatively complete picture for automated mechanism design with partial verification.

NeurIPS Conference 2021 Conference Paper

Chasing Sparsity in Vision Transformers: An End-to-End Exploration

  • Tianlong Chen
  • Yu Cheng
  • Zhe Gan
  • Lu Yuan
  • Lei Zhang
  • Zhangyang Wang

Vision transformers (ViTs) have recently received explosive popularity, but their enormous model sizes and training costs remain daunting. Conventional post-training pruning often incurs higher training budgets. In contrast, this paper aims to trim down both the training memory overhead and the inference complexity, without sacrificing the achievable accuracy. We carry out the first-of-its-kind comprehensive exploration of a unified approach to integrating sparsity in ViTs “from end to end”. Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output. The approach is seamlessly extended from unstructured to structured sparsity, the latter by guiding the prune-and-grow of self-attention heads inside ViTs. We further co-explore data and architecture sparsity for additional efficiency gains by plugging in a novel learnable token selector to adaptively determine the currently most vital patches. Extensive results on ImageNet with diverse ViT backbones validate the effectiveness of our proposals, which obtain significantly reduced computational cost and almost unimpaired generalization. Perhaps most surprisingly, we find that the proposed sparse (co-)training can sometimes improve the ViT accuracy rather than compromise it, making sparsity a tantalizing “free lunch”. For example, our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture) improves top-1 accuracy by 0.28%, and meanwhile enjoys 49.32% FLOPs and 4.40% running-time savings. Our codes are available at https://github.com/VITA-Group/SViTE.
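
The prune-and-grow step at the heart of dynamic sparse training can be sketched as follows. This uses the generic drop-smallest-magnitude / regrow-largest-gradient criterion (as in RigL-style methods) and is an illustration under our own assumptions, not the paper's exact update rule.

    import torch

    def prune_and_grow(weight, grad, mask, update_frac=0.1):
        # Drop the smallest-magnitude active weights, then regrow the same
        # number of inactive positions where |grad| is largest, keeping the
        # total parameter budget fixed throughout training.
        mask = mask.clone()
        n = int(update_frac * int(mask.sum()))
        if n == 0:
            return mask
        w = weight.abs().masked_fill(mask == 0, float("inf")).view(-1)
        drop = torch.topk(w, n, largest=False).indices
        mask.view(-1)[drop] = 0
        g = grad.abs().masked_fill(mask == 1, float("-inf")).view(-1)
        g[drop] = float("-inf")   # do not immediately regrow what was just dropped
        grow = torch.topk(g, n).indices
        mask.view(-1)[grow] = 1
        return mask

Calling this periodically during training explores connectivity while the parameter count never exceeds the fixed budget, which is the end-to-end sparsity property described above.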

AAAI Conference 2021 Conference Paper

Classification with Few Tests through Self-Selection

  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

We study test-based binary classification, where a principal either accepts or rejects agents based on the outcomes they get in a set of tests. The principal commits to a policy, which consists of all sets of outcomes that lead to acceptance. Each agent is modeled by a distribution over the space of possible outcomes. When an agent takes a test, he pays a cost and receives an independent sample from his distribution as the outcome. Agents can always choose between taking another test and stopping. They maximize their expected utility, which is the value of acceptance if the principal’s policy accepts the set of outcomes they have and 0 otherwise, minus the total cost of tests taken. We focus on the case where agents can be either “good” or “bad” (corresponding to their distribution over test outcomes), and the principal’s goal is to accept good agents and reject bad ones. We show, roughly speaking, that as long as the good and bad agents have different distributions (which can be arbitrarily close to each other), the principal can always achieve perfect accuracy, meaning good agents are accepted with probability 1, and bad ones are rejected with probability 1. Moreover, there is a policy achieving perfect accuracy under which the maximum number of tests any agent needs to take is constant — in sharp contrast to the case where the principal directly observes samples from agents’ distributions. The key technique is to choose the policy so that agents self-select into taking tests.

NeurIPS Conference 2021 Conference Paper

Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective

  • Tianlong Chen
  • Yu Cheng
  • Zhe Gan
  • Jingjing Liu
  • Zhangyang Wang

Training generative adversarial networks (GANs) with limited real image data generally results in deteriorated performance and collapsed models. To conquer this challenge, we are inspired by the recent observation that one can discover independently trainable and highly sparse subnetworks (a.k.a. lottery tickets) in GANs. Treating this as an inductive prior, we suggest a brand-new angle towards data-efficient GAN training: by first identifying the lottery ticket from the original GAN using the small training set of real images, and then focusing on training that sparse subnetwork by re-using the same set. We find our coordinated framework to offer orthogonal gains to existing real image data augmentation methods, and we additionally present a new feature-level augmentation that can be applied together with them. Comprehensive experiments endorse the effectiveness of our proposed framework, across various GAN architectures (SNGAN, BigGAN, and StyleGAN-V2) and diverse datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet, ImageNet, and multiple few-shot generation datasets). Codes are available at: https://github.com/VITA-Group/Ultra-Data-Efficient-GAN-Training.

AAAI Conference 2021 Conference Paper

Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos

  • Yu Cheng
  • Bo Wang
  • Bo Yang
  • Robby T. Tan

Despite recent progress, 3D multi-person pose estimation from monocular videos is still challenging due to the commonly encountered problem of missing information caused by occlusion, partially out-of-frame target persons, and inaccurate person detection. To tackle this problem, we propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses without requiring camera parameters. In particular, we introduce a human-joint GCN, which, unlike existing GCNs, is based on a directed graph that employs the 2D pose estimator’s confidence scores to improve the pose estimation results. We also introduce a human-bone GCN, which models the bone connections and provides more information beyond human joints. The two GCNs work together to estimate the spatial frame-wise 3D poses, and can make use of both visible joint and bone information in the target frame to estimate the occluded or missing human-part information. To further refine the 3D pose estimation, we use temporal convolutional networks (TCNs) to enforce temporal and human-dynamics constraints. We use a joint-TCN to estimate person-centric 3D poses across frames, and propose a velocity-TCN to estimate the speed of 3D joints to ensure the consistency of the 3D pose estimation in consecutive frames. Finally, to estimate the 3D human poses for multiple persons, we propose a root-TCN that estimates camera-centric 3D poses without requiring camera parameters. Quantitative and qualitative evaluations demonstrate the effectiveness of the proposed method. Our code and models are available at https://github.com/3dpose/GnTCN.
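
As one concrete reading of the confidence-weighted directed-graph idea, the layer below scales each incoming edge by the 2D detector's confidence in the source joint before aggregating. This is our simplified interpretation with assumed shapes, not the paper's exact propagation rule.

    import torch
    import torch.nn as nn

    class ConfidenceGCNLayer(nn.Module):
        def __init__(self, in_dim, out_dim, adj):
            super().__init__()
            self.register_buffer("adj", adj)   # (J, J) directed joint adjacency
            self.lin = nn.Linear(in_dim, out_dim)

        def forward(self, x, conf):
            # x: (B, J, in_dim) joint features; conf: (B, J) 2D confidences.
            a = self.adj.unsqueeze(0) * conf.unsqueeze(1)  # down-weight unreliable sources
            a = a / a.sum(-1, keepdim=True).clamp_min(1e-6)
            return torch.relu(self.lin(torch.bmm(a, x)))

Low-confidence (e.g. occluded) joints then contribute less evidence to their neighbors, which matches the motivation of using confidence scores to cope with missing information.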

NeurIPS Conference 2021 Conference Paper

The Elastic Lottery Ticket Hypothesis

  • Xiaohan Chen
  • Yu Cheng
  • Shuohang Wang
  • Zhe Gan
  • Jingjing Liu
  • Zhangyang Wang

The Lottery Ticket Hypothesis (LTH) has drawn keen attention to identifying sparse trainable subnetworks, or winning tickets, which can be trained in isolation to achieve similar or even better performance compared to the full models. Despite many efforts, the most effective method to identify such winning tickets is still Iterative Magnitude-based Pruning (IMP), which is computationally expensive and has to be run thoroughly for every different network. A natural question arises: can we “transform” the winning ticket found in one network to another with a different architecture, yielding a winning ticket for the latter at the beginning, without re-doing the expensive IMP? Answering this question is not only practically relevant for efficient “once-for-all” winning ticket finding, but also theoretically appealing for uncovering inherently scalable sparse patterns in networks. We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found in different networks of the same model family (e.g., ResNets). Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter’s winning ticket directly found by IMP. We have also extensively compared E-LTH with pruning-at-initialization and dynamic sparse training methods, and discussed the generalizability of E-LTH to different model families, layer types, and across datasets. Code is available at https://github.com/VITA-Group/ElasticLTH.

NeurIPS Conference 2021 Conference Paper

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation

  • Linjie Li
  • Jie Lei
  • Zhe Gan
  • Licheng Yu
  • Yen-Chun Chen
  • Rohit Pillai
  • Yu Cheng
  • Luowei Zhou

Most existing video-and-language (VidL) research focuses on a single dataset, or multiple datasets of a single task. In reality, a truly useful VidL system is expected to be easily generalizable to diverse tasks, domains, and datasets. To facilitate the evaluation of such systems, we introduce the Video-And-Language Understanding Evaluation (VALUE) benchmark, an assemblage of 11 VidL datasets over 3 popular tasks: (i) text-to-video retrieval; (ii) video question answering; and (iii) video captioning. The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels. Rather than focusing on single-channel videos with visual information only, VALUE promotes models that leverage information from both video frames and their associated subtitles, as well as models that share knowledge across multiple tasks. We evaluate various baseline methods with and without large-scale VidL pre-training, and systematically investigate the impact of video input channels, fusion methods, and different video representations. We also study the transferability between tasks, and conduct multi-task learning under different settings. The significant gap between our best model and human performance calls for future study on advanced VidL models. VALUE is available at https://value-benchmark.github.io/.

AAAI Conference 2020 Conference Paper

3D Human Pose Estimation Using Spatio-Temporal Networks with Explicit Occlusion Training

  • Yu Cheng
  • Bo Yang
  • Bo Wang
  • Robby T. Tan

Estimating 3D poses from a monocular video is still a challenging task, despite the significant progress that has been made in recent years. Generally, the performance of existing methods drops when the target person is too small/large, or the motion is too fast/slow relative to the scale and speed of the training data. Moreover, to our knowledge, many of these methods are not explicitly designed or trained to handle severe occlusion, which compromises their performance when occlusion occurs. Addressing these problems, we introduce a spatio-temporal network for robust 3D human pose estimation. As humans in videos may appear at different scales and have various motion speeds, we apply multi-scale spatial features for 2D joint or keypoint prediction in each individual frame, and multi-stride temporal convolutional networks (TCNs) to estimate 3D joints or keypoints. Furthermore, we design a spatio-temporal discriminator based on body structures as well as limb motions to assess whether the predicted pose forms a valid pose and a valid movement. During training, we explicitly mask out some keypoints to simulate various occlusion cases, from minor to severe, so that our network learns better and becomes robust to various degrees of occlusion. As 3D ground-truth data are limited, we further utilize 2D video data to inject a semi-supervised learning capability into our network. Experiments on public datasets validate the effectiveness of our method, and our ablation studies show the strengths of our network’s individual submodules.
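
The occlusion-simulation idea reduces, in its simplest form, to randomly hiding joints during training. The snippet below is a toy version with made-up masking rates and array shapes, not the paper's exact recipe.

    import numpy as np

    def mask_keypoints(kpts2d, conf, max_masked=4, rng=np.random):
        # kpts2d: (J, 2) 2D joint coordinates; conf: (J,) detector confidences.
        # Hide a random subset of joints so the network must infer them from
        # spatial and temporal context.
        kpts2d, conf = kpts2d.copy(), conf.copy()
        n_mask = rng.randint(0, max_masked + 1)
        joints = rng.choice(kpts2d.shape[0], size=n_mask, replace=False)
        kpts2d[joints] = 0.0   # zero out the masked coordinates
        conf[joints] = 0.0     # flag them as unobserved
        return kpts2d, conf

Varying max_masked sweeps from minor to severe simulated occlusion, mirroring the training scheme described above.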

NeurIPS Conference 2020 Conference Paper

Large-Scale Adversarial Training for Vision-and-Language Representation Learning

  • Zhe Gan
  • Yen-Chun Chen
  • Linjie Li
  • Chen Zhu
  • Yu Cheng
  • Jingjing Liu

We present VILLA, the first known effort on large-scale adversarial training for vision-and-language (V+L) representation learning. VILLA consists of two training stages: (i) task-agnostic adversarial pre-training; followed by (ii) task-specific adversarial finetuning. Instead of adding adversarial perturbations on image pixels and textual tokens, we propose to perform adversarial training in the embedding space of each modality. To enable large-scale training, we adopt the “free” adversarial training strategy, and combine it with KL-divergence-based regularization to promote higher invariance in the embedding space. We apply VILLA to current best-performing V+L models, and achieve new state of the art on a wide range of tasks, including Visual Question Answering, Visual Commonsense Reasoning, Image-Text Retrieval, Referring Expression Comprehension, Visual Entailment, and NLVR2.
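
The embedding-space perturbation can be sketched with a small PGD-style loop plus the KL smoothness term. This is our simplified stand-in (VILLA uses the more efficient “free” strategy, which recycles gradients rather than running extra inner steps), and the model interface is an assumption.

    import torch
    import torch.nn.functional as F

    def embedding_adv_loss(model, emb, labels, eps=1e-2, alpha=1e-3, steps=3):
        # `model` is assumed to map (perturbed) embeddings to logits.
        clean_logits = model(emb).detach()
        delta = torch.zeros_like(emb, requires_grad=True)
        for _ in range(steps):   # inner ascent: find a harmful perturbation
            logits = model(emb + delta)
            loss = F.cross_entropy(logits, labels) + F.kl_div(
                F.log_softmax(logits, -1), F.softmax(clean_logits, -1),
                reduction="batchmean")
            grad, = torch.autograd.grad(loss, delta)
            delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
            delta = delta.detach().requires_grad_(True)
        logits = model(emb + delta)   # outer loss: be correct and stay close
        return F.cross_entropy(logits, labels) + F.kl_div(
            F.log_softmax(logits, -1), F.softmax(clean_logits, -1),
            reduction="batchmean")

In practice this term would be added to the ordinary clean-data loss; the KL piece is what pushes the model toward the higher embedding-space invariance the abstract mentions.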

AAAI Conference 2020 Conference Paper

What Makes A Good Story? Designing Composite Rewards for Visual Storytelling

  • Junjie Hu
  • Yu Cheng
  • Zhe Gan
  • Jingjing Liu
  • Jianfeng Gao
  • Graham Neubig

Previous storytelling approaches mostly focused on optimizing traditional metrics such as BLEU, ROUGE and CIDEr. In this paper, we re-examine this problem from a different angle, by looking deep into what defines a natural and topically coherent story. To this end, we propose three assessment criteria: relevance, coherence and expressiveness, which we observe through empirical analysis could constitute a “high-quality” story to the human eye. We further propose a reinforcement learning framework, ReCo-RL, with reward functions designed to capture the essence of these quality criteria. Experiments on the Visual Storytelling Dataset (VIST) with both automatic and human evaluation demonstrate that our ReCo-RL model achieves better performance than state-of-the-art baselines on both traditional metrics and the proposed new criteria.

AAAI Conference 2019 Conference Paper

A Better Algorithm for Societal Tradeoffs

  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

In the societal tradeoffs problem, each agent perceives certain quantitative tradeoffs between pairs of activities, and the goal is to aggregate these tradeoffs across agents. This is a problem in social choice; specifically, it is a type of quantitative judgment aggregation problem. A natural rule for this problem was axiomatized by Conitzer et al. [AAAI 2016]; they also provided several algorithms for computing the outcomes of this rule. In this paper, we present a significantly improved algorithm and evaluate it experimentally. Our algorithm is based on a tight connection to minimum-cost flow that we exhibit. We also show that our algorithm cannot be improved without breakthroughs on min-cost flow.

NeurIPS Conference 2019 Conference Paper

Distinguishing Distributions When Samples Are Strategically Transformed

  • Hanrui Zhang
  • Yu Cheng
  • Vincent Conitzer

Often, a principal must make a decision based on data provided by an agent. Moreover, typically, that agent has an interest in the decision that is not perfectly aligned with that of the principal. Thus, the agent may have an incentive to select from or modify the samples he obtains before sending them to the principal. In other settings, the principal may not even be able to observe samples directly; instead, she must rely on signals that the agent is able to send based on the samples that he obtains, and he will choose these signals strategically. In this paper, we give necessary and sufficient conditions for when the principal can distinguish between agents of “good” and “bad” types, when the type affects the distribution of samples that the agent has access to. We also study the computational complexity of checking these conditions. Finally, we study how many samples are needed.

AAAI Conference 2019 Conference Paper

Look across Elapse: Disentangled Representation Learning and Photorealistic Cross-Age Face Synthesis for Age-Invariant Face Recognition

  • Jian Zhao
  • Yu Cheng
  • Yi Cheng
  • Yang Yang
  • Fang Zhao
  • Jianshu Li
  • Hengzhu Liu
  • Shuicheng Yan

Despite the remarkable progress in face recognition related technologies, reliably recognizing faces across ages still remains a big challenge. The appearance of a human face changes substantially over time, resulting in significant intra-class variations. As opposed to current techniques for age-invariant face recognition, which either directly extract age-invariant features for recognition, or first synthesize a face that matches the target age before feature extraction, we argue that it is more desirable to perform both tasks jointly so that they can leverage each other. To this end, we propose a deep Age-Invariant Model (AIM) for face recognition in the wild with three distinct novelties. First, AIM presents a novel unified deep architecture jointly performing cross-age face synthesis and recognition in a mutually boosting way. Second, AIM achieves continuous face rejuvenation/aging with remarkable photorealistic and identity-preserving properties, avoiding the requirement of paired data and the true age of testing samples. Third, we develop effective and novel training strategies for end-to-end learning of the whole deep architecture, which generates powerful age-invariant face representations explicitly disentangled from the age variation. Extensive experiments on several cross-age datasets (MORPH, CACD and FG-NET) demonstrate the superiority of the proposed AIM model over the state of the art. Benchmarking our model on one of the most popular unconstrained face recognition datasets, IJB-C, additionally verifies the promising generalizability of AIM in recognizing faces in the wild.

IJCAI Conference 2018 Conference Paper

3D-Aided Deep Pose-Invariant Face Recognition

  • Jian Zhao
  • Lin Xiong
  • Yu Cheng
  • Yi Cheng
  • Jianshu Li
  • Li Zhou
  • Yan Xu
  • Jayashree Karlekar

Learning from synthetic faces, though perhaps appealing for its high data efficiency, may not bring satisfactory performance due to the distribution discrepancy between synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3D MM) to obtain shape and appearance priors for accelerating face normalization learning, requiring less training data. It further leverages a global-local Generative Adversarial Network (GAN) with multiple critical improvements as a refiner to enhance the realism of both global structures and local details of the face simulator’s output using unlabelled real data only, while preserving the identity information. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks clearly demonstrate the superiority of the proposed model over the state of the art.

NeurIPS Conference 2018 Conference Paper

Dialog-based Interactive Image Retrieval

  • Xiaoxiao Guo
  • Hui Wu
  • Yu Cheng
  • Steven Rennie
  • Gerald Tesauro
  • Rogerio Feris

Existing methods for interactive image retrieval have demonstrated the merit of integrating user feedback, improving retrieval results. However, most current systems rely on restricted forms of user feedback, such as binary relevance responses, or feedback based on a fixed set of relative attributes, which limits their impact. In this paper, we introduce a new approach to interactive image search that enables users to provide feedback via natural language, allowing for more natural and effective interaction. We formulate the task of dialog-based interactive image retrieval as a reinforcement learning problem, and reward the dialog system for improving the rank of the target image during each dialog turn. To mitigate the cumbersome and costly process of collecting human-machine conversations as the dialog system learns, we train our system with a user simulator, which is itself trained to describe the differences between target and candidate images. The efficacy of our approach is demonstrated in a footwear retrieval application. Experiments on both simulated and real-world data show that 1) our proposed learning framework achieves better accuracy than other supervised and reinforcement learning baselines and 2) user feedback based on natural language rather than pre-specified attributes leads to more effective retrieval results, and a more natural and expressive communication interface.
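
The per-turn reward described above is, at its core, a rank-improvement signal; a minimal stand-in (the exact shaping used in the paper may differ) is:

    def turn_reward(prev_rank, new_rank):
        # Positive when the user's feedback moved the target image up the
        # ranked list during this dialog turn; the linear form is our assumption.
        return float(prev_rank - new_rank)

    # e.g. turn_reward(57, 12) -> 45.0 for a strongly helpful turn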

AAAI Conference 2018 Conference Paper

On the Distortion of Voting With Multiple Representative Candidates

  • Yu Cheng
  • Shaddin Dughmi
  • David Kempe

We study positional voting rules when candidates and voters are embedded in a common metric space, and cardinal preferences are naturally given by distances in the metric space. In a positional voting rule, each candidate receives a score from each ballot based on the ballot’s rank order; the candidate with the highest total score wins the election. The cost of a candidate is his sum of distances to all voters, and the distortion of an election is the ratio between the cost of the elected candidate and the cost of the optimum candidate. We consider the case when candidates are representative of the population, in the sense that they are drawn i.i.d. from the population of the voters, and analyze the expected distortion of positional voting rules. Our main result is a clean and tight characterization of positional voting rules that have constant expected distortion (independent of the number of candidates and the metric space). Our characterization result immediately implies constant expected distortion for Borda Count and elections in which each voter approves a constant fraction of all candidates. On the other hand, we obtain super-constant expected distortion for Plurality, Veto, and approving a constant number of candidates. These results contrast with previous results on voting with metric preferences: When the candidates are chosen adversarially, all of the preceding voting rules have distortion linear in the number of candidates or voters. Thus, the model of representative candidates allows us to distinguish voting rules which seem equally bad in the worst case.
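
In symbols, with $V$ the set of voters and $d$ the metric, the quantities defined above are

    \[
      \mathrm{cost}(c) = \sum_{v \in V} d(c, v),
      \qquad
      \mathrm{distortion} = \frac{\mathrm{cost}(c_{\mathrm{win}})}{\min_{c} \mathrm{cost}(c)},
    \]

and the expected distortion averages this ratio over elections whose candidates are drawn i.i.d. from the voter population.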

NeurIPS Conference 2018 Conference Paper

Robust Learning of Fixed-Structure Bayesian Networks

  • Yu Cheng
  • Ilias Diakonikolas
  • Daniel Kane
  • Alistair Stewart

We investigate the problem of learning Bayesian networks in a robust model where an $\epsilon$-fraction of the samples are adversarially corrupted. In this work, we study the fully observable discrete case where the structure of the network is given. Even in this basic setting, previous learning algorithms either run in exponential time or lose dimension-dependent factors in their error guarantees. We provide the first computationally efficient robust learning algorithm for this problem with dimension-independent error guarantees. Our algorithm has near-optimal sample complexity, runs in polynomial time, and achieves error that scales nearly-linearly with the fraction of adversarially corrupted samples. Finally, we show on both synthetic and semi-synthetic data that our algorithm performs well in practice.

NeurIPS Conference 2017 Conference Paper

MMD GAN: Towards Deeper Understanding of Moment Matching Network

  • Chun-Liang Li
  • Wei-Cheng Chang
  • Yu Cheng
  • Yiming Yang
  • Barnabas Poczos

Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN. The new distance measure in MMD-GAN is a meaningful loss that enjoys the advantage of weak$^*$ topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works.
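
For reference, the fixed-kernel quantity that GMMN optimizes, and that MMD-GAN extends by learning the kernel adversarially, is the squared MMD. A standard unbiased estimator with a Gaussian kernel (the bandwidth here is an arbitrary placeholder) looks like this:

    import torch

    def mmd2_unbiased(x, y, bandwidth=1.0):
        # x: (n, d) real samples, y: (m, d) generated samples.
        def k(a, b):
            return torch.exp(-torch.cdist(a, b).pow(2) / (2 * bandwidth ** 2))
        n, m = x.size(0), y.size(0)
        kxx, kyy = k(x, x), k(y, y)
        # Drop diagonal self-similarities for an unbiased estimate.
        term_x = (kxx.sum() - kxx.diagonal().sum()) / (n * (n - 1))
        term_y = (kyy.sum() - kyy.diagonal().sum()) / (m * (m - 1))
        return term_x + term_y - 2 * k(x, y).mean()

MMD-GAN composes the kernel with a learned feature map that is trained adversarially to maximize this statistic, while the generator minimizes it.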

NeurIPS Conference 2016 Conference Paper

Doubly Convolutional Neural Networks

  • Shuangfei Zhai
  • Yu Cheng
  • Zhongfei (Mark) Zhang
  • Weining Lu

Building large models with parameter sharing accounts for most of the success of deep convolutional neural networks (CNNs). In this paper, we propose doubly convolutional neural networks (DCNNs), which significantly improve the performance of CNNs by further exploring this idea. Instead of allocating a set of convolutional filters that are independently learned, a DCNN maintains groups of filters where filters within each group are translated versions of each other. Practically, a DCNN can be easily implemented by a two-step convolution procedure, which is supported by most modern deep learning libraries. We perform extensive experiments on three image classification benchmarks: CIFAR-10, CIFAR-100 and ImageNet, and show that DCNNs consistently outperform other competing architectures. We have also verified that replacing a convolutional layer with a doubly convolutional layer at any depth of a CNN can improve its performance. Moreover, various design choices of DCNNs are demonstrated, which shows that DCNNs can serve the dual purpose of building more accurate models and/or reducing the memory footprint without sacrificing accuracy.
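
One direct way to realize “groups of translated filters” is to keep larger meta filters and treat every k × k crop as an effective filter; the sketch below does this with tensor unfolding. Shapes and the crop stride are our assumptions, and the paper's two-step convolution is an equivalent, more efficient route.

    import torch
    import torch.nn.functional as F

    def doubly_conv2d(x, meta_filters, k):
        # meta_filters: (groups, in_ch, K, K) with K > k. Every k x k crop of a
        # meta filter becomes one effective filter, so the filters within each
        # group are translated copies of one another.
        groups, in_ch, K, _ = meta_filters.shape
        crops = meta_filters.unfold(2, k, 1).unfold(3, k, 1)  # (G, C, t, t, k, k)
        crops = crops.permute(0, 2, 3, 1, 4, 5).reshape(-1, in_ch, k, k)
        return F.conv2d(x, crops)

    # e.g. 8 meta filters of size 5x5 with k = 3 yield 8 * 3 * 3 = 72 channels:
    out = doubly_conv2d(torch.randn(1, 3, 32, 32), torch.randn(8, 3, 5, 5), k=3)

Parameter sharing is what saves memory: the 72 effective filters above are parameterized by only 8 independent 5×5 meta filters.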

NeurIPS Conference 2016 Conference Paper

On the Recursive Teaching Dimension of VC Classes

  • Xi Chen
  • Yu Cheng
  • Bo Tang

The recursive teaching dimension (RTD) of a concept class $C \subseteq \{0, 1\}^n$, introduced by Zilles et al. [ZLHZ11], is a complexity parameter measured by the worst-case number of labeled examples needed to learn any target concept of $C$ in the recursive teaching model. In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension. Given a concept class $C \subseteq \{0, 1\}^n$ with $VCD(C) = d$, we first show that $RTD(C)$ is at most $d 2^{d+1}$. This is the first upper bound for $RTD(C)$ that depends only on $VCD(C)$, independent of the size of the concept class $|C|$ and its domain size $n$. Before our work, the best known upper bound for $RTD(C)$ was $O(d 2^d \log \log |C|)$, obtained by Moran et al. [MSWY15]. We remove the $\log \log |C|$ factor. We also improve the lower bound on the worst-case ratio of $RTD(C)$ to $VCD(C)$. We present a family of classes $\{ C_k \}_{k \ge 1}$ with $VCD(C_k) = 3k$ and $RTD(C_k) = 5k$, which implies that the ratio of $RTD(C)$ to $VCD(C)$ in the worst case can be as large as $5/3$. Before our work, the largest ratio known was $3/2$, as obtained by Kuhlmann [Kuh99]; since then, no finite concept class $C$ had been known to satisfy $RTD(C) > (3/2) VCD(C)$.

IS Journal 2014 Journal Article

MuSES: Multilingual Sentiment Elicitation System for Social Media Data

  • Yusheng Xie
  • Zhengzhang Chen
  • Kunpeng Zhang
  • Yu Cheng
  • Daniel K. Honbo
  • Ankit Agrawal
  • Alok N. Choudhary

A multilingual sentiment identification system (MuSES) implements three different sentiment identification algorithms. The first algorithm augments previous compositional semantic rules by adding rules specific to social media. The second algorithm defines a scoring function that measures the degree of a sentiment, instead of simply classifying a sentiment into binary polarities. All such scores are calculated based on a large volume of customer reviews. Due to the special characteristics of social media texts, a third algorithm takes emoticons, negation word position, and domain-specific words into account. In addition, a proposed label-free process transfers multilingual sentiment knowledge between different languages. The authors conduct their experiments on user comments from Facebook, tweets from Twitter, and multilingual product reviews from Amazon.
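
A toy scorer in the spirit of the third algorithm, with emoticon handling and a negation-position window, might look like the following; the lexicon, emoticon scores, and 3-token window are illustrative assumptions, not the system's actual resources.

    # Minimal lexicon-based sentiment scoring with negation and emoticons.
    LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
    EMOTICONS = {":)": 1.0, ":(": -1.0}
    NEGATIONS = {"not", "no", "never"}

    def sentiment_score(tokens, window=3):
        score = 0.0
        for i, tok in enumerate(tokens):
            if tok in EMOTICONS:
                score += EMOTICONS[tok]   # emoticons are taken at face value
                continue
            s = LEXICON.get(tok, 0.0)
            if s and any(t in NEGATIONS for t in tokens[max(0, i - window):i]):
                s = -s                    # a nearby preceding negation flips polarity
            score += s
        return score

    print(sentiment_score("the movie is not bad :)".split()))  # -> 2.0

This returns a graded degree of sentiment rather than a binary polarity, matching the scoring-function view contributed by the second algorithm.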

IJCAI Conference 2013 Conference Paper

Detecting and Tracking Disease Outbreaks by Mining Social Media Data

  • Yusheng Xie
  • Zhengzhang Chen
  • Yu Cheng
  • Kunpeng Zhang
  • Ankit Agrawal
  • Wei-keng Liao
  • Alok Choudhary

The emergence and ubiquity of online social networks have enriched web data with evolving interactions and communities both at mega-scale and in real time. This data offers an unprecedented opportunity for studying the interaction between society and disease outbreaks. The challenge we describe in this data paper is how to extract and leverage epidemic outbreak insights from massive amounts of social media data, and how this exercise can benefit medical professionals, patients, and policymakers alike. We attempt to prepare the research community for this challenge with four datasets. Publishing the four datasets will commoditize the data infrastructure and provide a more efficient focal point for the research community.

IJCAI Conference 2013 Conference Paper

Forecast Oriented Classification of Spatio-Temporal Extreme Events

  • Zhengzhang Chen
  • Yusheng Xie
  • Yu Cheng
  • Kunpeng Zhang
  • Ankit Agrawal
  • Wei-keng Liao
  • Nagiza F. Samatova
  • Alok Choudhary

In complex dynamic systems, accurate forecasting of extreme events, such as hurricanes, is a highly underdetermined, yet very important sustainability problem. While physics-based models have their own merits, they often provide unreliable predictions for variables highly related to extreme events. In this paper, we propose a new supervised machine learning problem, which we call forecast-oriented classification of spatio-temporal extreme events. We formulate three important real-world extreme event classification tasks, including seasonal forecasting of (a) tropical cyclones in the Northern Hemisphere, (b) hurricanes and landfalling hurricanes in the North Atlantic, and (c) North African rainfall. Corresponding predictor and predictand data sets are constructed. These data present unique characteristics and challenges that could potentially motivate future Artificial Intelligence and Data Mining research.