Arrow Research search

Author name cluster

Can Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers

36

JBHI Journal 2026 Journal Article

DF-DiffVSR: Deformable Field-Driven Diffusion Model for Inter-Slice Continuity Enhancement in Medical Volume Super-Resolution

  • Can Wang
  • Min Liu
  • Qinghao Liu
  • Yuehao Zhu
  • Xiang Chen
  • Licheng Liu
  • Yaonan Wang
  • Erik Meijering

Medical volumetric imaging is crucial for precise diagnosis, but equipment and acquisition constraints often yield anisotropic resolution, which hinders the detection of small lesions and 3D visualization. While volumetric super-resolution methods can mitigate this issue, existing techniques suffer from limited receptive fields, failing to fully exploit inter-slice correlations and thus compromising inter-slice continuity. To address this limitation, we propose DF-DiffVSR, a novel deformable field-enhanced diffusion model for medical volume super-resolution. The proposed method integrates optical flow principles with diffusion models through a Deformable Field Extraction (DFE) module, which explicitly learns inter-slice motion information to enhance structural continuity in the through-plane direction. Furthermore, we design a Multiscale Large Kernel Convolution (MLKC) module that employs striped convolutions with varying kernel sizes to expand the receptive field and capture global anatomical context. Evaluated on the RPLHR-CT and IXI-T2 datasets, DF-DiffVSR achieves state-of-the-art (SOTA) performance, surpassing the second-best method by 0.732 dB and 0.214 dB in PSNR, respectively, demonstrating superior capabilities in preserving inter-slice continuity and recovering fine-grained details.

AAAI Conference 2026 Conference Paper

DICE: Distilling Classifier-Free Guidance into Text Embeddings

  • Zhenyu Zhou
  • Defang Chen
  • Can Wang
  • Chun Chen
  • Siwei Lyu

Text-to-image diffusion models are capable of generating high-quality images, but suboptimal pre-trained text representations often result in these images failing to align closely with the given text prompts. Classifier-free guidance (CFG) is a popular and effective technique for improving text-image alignment in the generative process. However, CFG introduces significant computational overhead. In this paper, we present DIstilling CFG by sharpening text Embeddings (DICE), which replaces CFG in the sampling process at half the computational complexity while maintaining similar generation quality. DICE distills a CFG-based text-to-image diffusion model into a CFG-free version by refining text embeddings to replicate CFG-based directions. In this way, we avoid the computational drawbacks of CFG, enabling high-quality, well-aligned image generation at a fast sampling speed. Furthermore, by examining the enhancement pattern, we identify the underlying mechanism of DICE: it sharpens specific components of text embeddings to preserve semantic information while enhancing fine-grained details. Extensive experiments on multiple Stable Diffusion v1.5 variants, SDXL, and PixArt-α demonstrate the effectiveness of our method.
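The CFG step that DICE distills away combines conditional and unconditional network outputs at every sampling step. A minimal sketch of that standard combination, using toy arrays rather than the paper's code:

```python
import numpy as np

def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: extrapolate from the
    unconditional prediction toward the conditional one."""
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-ins for two noise predictions from the same diffusion model.
eps_uncond = np.zeros(4)
eps_cond = np.ones(4)
guided = cfg_combine(eps_uncond, eps_cond, guidance_scale=7.5)
# Each sampling step runs the network twice (conditional + unconditional);
# that doubled cost is the overhead DICE removes by baking the guided
# direction into refined text embeddings.
```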

AAAI Conference 2026 Conference Paper

SpatialLogic-Bench: A Diagnostic Benchmark for Task-Oriented Spatiotemporal Reasoning

  • Xiaoda Yang
  • Shenzhou Gao
  • Can Wang
  • Jiahe Zhang
  • Menglan Tang
  • Jingyang Xue
  • Sheng Liu
  • Peijian Zhang

Vision-Language Models (VLMs) have made significant progress in static perception, but their ability to understand dynamic task-oriented reasoning remains unclear. Existing benchmarks mainly focus on static spatial relationships and lack systematic assessment of dynamic reasoning capabilities. To this end, we propose SpatialLogic-Bench, a novel benchmark designed to evaluate VLMs’ understanding of spatiotemporal logic and their ability to assess task progress. The benchmark assesses two critical capabilities: first, fine-grained visual discrimination to accurately perceive subtle physical changes between state frames; second, the logical capacity to connect these changes to task goals and judge whether they indicate progress. To mitigate temporal dependency biases, we introduce a dual-task paradigm, presenting image pairs in both chronological and reversed orders while keeping task descriptions consistent. We construct a multi-scale evaluation system by varying time intervals between frames: smaller intervals test the model's fine-grained perception, while larger intervals demand more sophisticated logical inference. Empirical evaluation reveals that most VLMs experience significant performance degradation on tasks presented in inverse chronological order, indicating an over-reliance on temporal cues rather than robust reasoning abilities. SpatialLogic-Bench clearly exposes critical limitations in current models and provides valuable guidance for improving dynamic spatial perception capabilities.

AAAI Conference 2025 Conference Paper

Advancing Loss Functions in Recommender Systems: A Comparative Study with a Rényi Divergence-Based Solution

  • Shengjia Zhang
  • Jiawei Chen
  • Changdong Li
  • Sheng Zhou
  • Qihao Shi
  • Yan Feng
  • Chun Chen
  • Can Wang

Loss functions play a pivotal role in optimizing recommendation models. Among various loss functions, Softmax Loss (SL) and Cosine Contrastive Loss (CCL) are particularly effective. Their theoretical connections and differences warrant in-depth exploration. This work conducts comprehensive analyses of these losses, yielding significant insights: 1) Common strengths --- both can be viewed as augmentations of traditional losses with Distributional Robust Optimization (DRO), enhancing robustness to distributional shifts; 2) Respective limitations --- stemming from their use of different distribution distance metrics in DRO optimization, SL exhibits high sensitivity to false negative instances, whereas CCL suffers from low data utilization. To address these limitations, this work proposes a new loss function, DrRL, which generalizes SL and CCL by leveraging Rényi-divergence in DRO optimization. DrRL incorporates the advantageous structures of both SL and CCL, and can be demonstrated to effectively mitigate their limitations. Extensive experiments have been conducted to validate the superiority of DrRL on both recommendation accuracy and robustness.

IROS Conference 2025 Conference Paper

Automated Dual-Micropipette Coordination Microinjection for Batch Zebrafish Larvae Based on Pose Estimation

  • Can Wang
  • Rongxin Liu
  • Huiying Gong
  • Zengshuo Wang
  • Lu Zhou
  • Yaowei Liu
  • Xin Zhao 0010
  • Mingzhu Sun

Zebrafish are widely used in the biomedical field as an ideal model for microinjection. In automated zebrafish microinjection, posture adjustment is the first and key step and demands considerable skill, while assessing injection success is also challenging. Constrained by these two aspects, it is difficult to further improve injection efficiency and success rate. In this study, we propose an automated dual-micropipette coordination microinjection system. Zebrafish are randomly arranged in our system, reducing operational difficulty; the yolk is located with a pose estimation algorithm, and injection is then accomplished with dual micropipettes. By halving the posture adjustment time, the proposed system achieves the shortest injection time of 15.2 s. Moreover, the simplicity of the system and its ease of operation contribute to its clinical feasibility.

TMLR Journal 2025 Journal Article

Conditional Image Synthesis with Diffusion Models: A Survey

  • Zheyuan Zhan
  • Defang Chen
  • Jian-Ping Mei
  • Zhenghe Zhao
  • Jiawei Chen
  • Chun Chen
  • Siwei Lyu
  • Can Wang

Conditional image synthesis based on user-specified requirements is a key component in creating complex visual content. In recent years, diffusion-based generative modeling has become a highly effective way for conditional image synthesis, leading to exponential growth in the literature. However, the complexity of diffusion-based modeling, the wide range of image synthesis tasks, and the diversity of conditioning mechanisms present significant challenges for researchers to keep up with rapid developments and to understand the core concepts on this topic. In this survey, we categorize existing works based on how conditions are integrated into the two fundamental components of diffusion-based modeling, i.e., the denoising network and the sampling process. We specifically highlight the underlying principles, advantages, and potential challenges of various conditioning approaches during the training, re-purposing, and specialization stages to construct a desired denoising network. We also summarize six mainstream conditioning mechanisms in the sampling process. All discussions are centered around popular applications. Finally, we pinpoint several critical yet still unsolved problems and suggest some possible solutions for future research.

TMLR Journal 2025 Journal Article

Lie Symmetry Net: Preserving Conservation Laws in Modelling Financial Market Dynamics via Differential Equations

  • Xuelian Jiang
  • Tongtian Zhu
  • Yingxiang Xu
  • Can Wang
  • Yeyu Zhang
  • Fengxiang He

This paper employs a novel Lie symmetries-based framework to model the intrinsic symmetries within financial markets. Specifically, we introduce Lie symmetry net (LSN), which characterises the Lie symmetries of the differential equations (DE) estimating financial market dynamics, such as the Black-Scholes equation. To simulate these differential equations in a symmetry-aware manner, LSN incorporates a Lie symmetry risk derived from the conservation laws associated with the Lie symmetry operators of the target differential equations. This risk measures how well the Lie symmetries are realised and guides the training of LSN under the structural risk minimisation framework. Extensive numerical experiments demonstrate that LSN effectively realises the Lie symmetries and achieves an error reduction of more than one order of magnitude compared to state-of-the-art methods. The code is available at https://github.com/Jxl163/LSN_code.

IJCAI Conference 2025 Conference Paper

M^2LLM: Multi-view Molecular Representation Learning with Large Language Models

  • Jiaxin Ju
  • Yizhen Zheng
  • Huan Yee Koh
  • Can Wang
  • Shirui Pan

Accurate molecular property prediction is a critical challenge with wide-ranging applications in chemistry, materials science, and drug discovery. Molecular representation methods, including fingerprints and graph neural networks (GNNs), achieve state-of-the-art results by effectively deriving features from molecular structures. However, these methods often overlook decades of accumulated semantic and contextual knowledge. Recent advancements in large language models (LLMs) demonstrate remarkable reasoning abilities and prior knowledge across scientific domains, leading us to hypothesize that LLMs can generate rich molecular representations when guided to reason from multiple perspectives. To address these gaps, we propose M^2LLM, a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. These views are fused dynamically to adapt to task requirements, and experiments demonstrate that M^2LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks. Moreover, we demonstrate that representations derived from LLMs achieve exceptional performance by leveraging two core functionalities: the generation of molecular embeddings through their encoding capabilities and the curation of molecular features through advanced reasoning processes.

IS Journal 2025 Journal Article

Point-of-Interest Recommendations and Large Language Models: A Powerful Combination

  • Tianxing Wang
  • Can Wang

Point-of-interest (POI) recommendation systems have become a cornerstone of personalized user experiences in various applications, such as navigation, tourism, and retail. Recent advancements in artificial intelligence, particularly the integration of large language models (LLMs), have opened new possibilities for building a more powerful and comprehensive POI recommendation system. This work first reviews the latest research outcomes that leverage LLMs for POI recommendation, highlighting their contribution to dealing with the long-standing challenges of POI recommendation systems. Furthermore, we identify untapped potentials of LLM-based POI recommenders across the recommendation pipeline, encompassing multimodal feature augmentation, unified encoding, hybrid task learning for scoring and ranking, and dynamic user interaction. By exploring these opportunities, future research can transform POI recommendation systems into adaptive, user-centric assistants, delivering seamless, scalable, and personalized experiences. This article aims to provide a comprehensive overview of recent advancements and propose future directions for leveraging LLMs to redefine the next generation of POI recommendation systems.

JBHI Journal 2024 Journal Article

An Eye Movement Classification Method Based on Cascade Forest

  • Can Wang
  • Ruimin Wang
  • Yue Leng
  • Keiji Iramina
  • Yuankui Yang
  • Sheng Ge

Eye tracking technology has become increasingly important in scientific research and practical applications. In the field of eye tracking research, analysis of eye movement data is crucial, particularly for classifying raw eye movement data into eye movement events. Current classification methods exhibit considerable variation in adaptability across different participants, and it is necessary to address the issues of class imbalance and data scarcity in eye movement classification. In the current study, we introduce a novel eye movement classification method based on cascade forest (EMCCF), which comprises two modules: 1) a feature extraction module that employs a multi-scale time window method to extract features from raw eye movement data; 2) a classification module that innovatively employs a layered ensemble architecture, integrating the cascade forest structure with ensemble learning principles, specifically for eye movement classification. Consequently, EMCCF not only enhances the accuracy and efficiency of eye movement classification but also represents an advancement in applying ensemble learning techniques within this domain. Furthermore, experimental results indicated that EMCCF outperformed existing deep learning-based classification models in several metrics and demonstrated robust performance across different datasets and participants.
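The multi-scale time window idea in the feature extraction module can be sketched as follows; the specific features (windowed mean speed and positional dispersion) and the window sizes are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def multiscale_window_features(gaze, window_sizes=(3, 5, 9)):
    """Per-sample features from several centered time windows over a
    1-D gaze-position trace: windowed mean speed and positional
    dispersion at each scale."""
    gaze = np.asarray(gaze, dtype=float)
    # Per-sample speed; prepend keeps the output length equal to the input.
    speed = np.abs(np.diff(gaze, prepend=gaze[0]))
    columns = []
    for w in window_sizes:
        half = w // 2
        pad_speed = np.pad(speed, half, mode="edge")
        pad_pos = np.pad(gaze, half, mode="edge")
        mean_speed = np.array([pad_speed[i:i + w].mean()
                               for i in range(len(gaze))])
        dispersion = np.array([np.ptp(pad_pos[i:i + w])
                               for i in range(len(gaze))])
        columns += [mean_speed, dispersion]
    return np.stack(columns, axis=1)   # shape: (n_samples, 2 * n_scales)
```

Each raw sample then carries features at several temporal scales, which a downstream classifier (a cascade forest in EMCCF) can consume.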

IS Journal 2024 Journal Article

Embracing LLMs for Point-of-Interest Recommendations

  • Tianxing Wang
  • Can Wang

Point-of-interest (POI) recommendation has become a core function of location-based services. Unlike traditional item recommendation, POI recommendation has distinct features, such as geographical influences, complex mobility patterns, and a balance between local and global user preferences. Past POI recommendation system research has focused mainly on integrating deep learning models like convolutional neural networks, recurrent neural networks, and attention-based architectures, demonstrating their effectiveness in addressing the dynamic nature of spatial–temporal data in POI recommendation. In recent years, with the rise of large language models (LLMs), a number of promising research directions have emerged for POI recommendation. This article first discusses the characteristics and state-of-the-art solutions of POI recommendation, then introduces potential research directions by integrating the latest LLMs.

NeurIPS Conference 2024 Conference Paper

PSL: Rethinking and Improving Softmax Loss from Pairwise Perspective for Recommendation

  • Weiqin Yang
  • Jiawei Chen
  • Xin Xin
  • Sheng Zhou
  • Binbin Hu
  • Yan Feng
  • Chun Chen
  • Can Wang

Softmax Loss (SL) is widely applied in recommender systems (RS) and has demonstrated effectiveness. This work analyzes SL from a pairwise perspective, revealing two significant limitations: 1) the relationship between SL and conventional ranking metrics like DCG is not sufficiently tight; 2) SL is highly sensitive to false negative instances. Our analysis indicates that these limitations are primarily due to the use of the exponential function. To address these issues, this work extends SL to a new family of loss functions, termed Pairwise Softmax Loss (PSL), which replaces the exponential function in SL with other appropriate activation functions. While the revision is minimal, we highlight three merits of PSL: 1) it serves as a tighter surrogate for DCG with suitable activation functions; 2) it better balances data contributions; and 3) it acts as a specific BPR loss enhanced by Distributionally Robust Optimization (DRO). We further validate the effectiveness and robustness of PSL through empirical experiments. The code is available at https://github.com/Tiny-Snow/IR-Benchmark.
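The core substitution PSL describes, replacing the exponential in Softmax Loss with another activation applied to pairwise score gaps, can be sketched as follows; the shifted-ReLU surrogate is an illustrative choice, not necessarily the paper's activation:

```python
import numpy as np

def softmax_loss(pos_score, neg_scores):
    """Softmax Loss from the pairwise view: log(1 + sum_j exp(d_j)),
    where d_j = s_neg_j - s_pos is the score gap for negative j."""
    gaps = np.asarray(neg_scores) - pos_score
    return np.log1p(np.sum(np.exp(gaps)))

def pairwise_softmax_loss(pos_score, neg_scores,
                          act=lambda d: np.maximum(d + 1.0, 0.0)):
    """PSL-style variant: swap exp for another activation on the gaps."""
    gaps = np.asarray(neg_scores) - pos_score
    return np.log1p(np.sum(act(gaps)))

# A large gap from a false negative explodes exponentially under SL but
# grows only linearly under the ReLU-style surrogate, illustrating the
# reduced sensitivity PSL targets.
sl = softmax_loss(0.0, [5.0])             # log(1 + e^5) ≈ 5.01
psl = pairwise_softmax_loss(0.0, [5.0])   # log(1 + 6)   ≈ 1.95
```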

NeurIPS Conference 2024 Conference Paper

Simple and Fast Distillation of Diffusion Models

  • Zhenyu Zhou
  • Defang Chen
  • Can Wang
  • Chun Chen
  • Siwei Lyu

Diffusion-based generative models have demonstrated powerful performance across various tasks, but this comes at the cost of slow sampling speed. To achieve both efficient and high-quality synthesis, various distillation-based accelerated sampling methods have been developed recently. However, they generally require time-consuming fine-tuning with elaborate designs to achieve satisfactory performance for a specific number of function evaluations (NFE), making them difficult to employ in practice. To address this issue, we propose Simple and Fast Distillation (SFD) of diffusion models, which simplifies the paradigm used in existing methods and shortens their fine-tuning time by up to 1000×. We begin with a vanilla distillation-based sampling method and boost its performance to state of the art by identifying and addressing several small yet vital factors affecting the synthesis efficiency and quality. Our method can also achieve sampling with variable NFEs using a single distilled model. Extensive experiments demonstrate that SFD strikes a good balance between sample quality and fine-tuning cost in the few-step image generation task. For example, SFD achieves 4.53 FID (NFE=2) on CIFAR-10 with only 0.64 hours of fine-tuning on a single NVIDIA A100 GPU.

NeurIPS Conference 2023 Conference Paper

OpenGSL: A Comprehensive Benchmark for Graph Structure Learning

  • Zhiyao Zhou
  • Sheng Zhou
  • Bochao Mao
  • Xuanyi Zhou
  • Jiawei Chen
  • Qiaoyu Tan
  • Daochen Zha
  • Yan Feng

Graph Neural Networks (GNNs) have emerged as the de facto standard for representation learning on graphs, owing to their ability to effectively integrate graph topology and node attributes. However, the inherent suboptimal nature of node connections, resulting from the complex and contingent formation process of graphs, presents significant challenges in modeling them effectively. To tackle this issue, Graph Structure Learning (GSL), a family of data-centric learning approaches, has garnered substantial attention in recent years. The core concept behind GSL is to jointly optimize the graph structure and the corresponding GNN models. Despite the proposal of numerous GSL methods, the progress in this field remains unclear due to inconsistent experimental protocols, including variations in datasets, data processing techniques, and splitting strategies. In this paper, we introduce OpenGSL, the first comprehensive benchmark for GSL, aimed at addressing this gap. OpenGSL enables a fair comparison among state-of-the-art GSL methods by evaluating them across various popular datasets using uniform data processing and splitting strategies. Through extensive experiments, we observe that existing GSL methods do not consistently outperform vanilla GNN counterparts. We also find that there is no significant correlation between the homophily of the learned structure and task performance, challenging the common belief. Moreover, we observe that the learned graph structure demonstrates a strong generalization ability across different GNN models, despite the high computational and space consumption. We hope that our open-sourced library will facilitate rapid and equitable evaluation and inspire further innovative research in this field. The code of the benchmark can be found at https://github.com/OpenGSL/OpenGSL.

AAAI Conference 2023 Conference Paper

Robust Sequence Networked Submodular Maximization

  • Qihao Shi
  • Bingyang Fu
  • Can Wang
  • Jiawei Chen
  • Sheng Zhou
  • Yan Feng
  • Chun Chen

In this paper, we study the Robust optimization for sequence Networked submodular maximization (RoseNets) problem. We interweave robust optimization with sequence networked submodular maximization. The elements are connected by a directed acyclic graph, and the objective function is submodular not on the elements but on the edges of the graph. In such a networked submodular scenario, the impact of removing an element from a sequence depends both on its position in the sequence and on its position in the network. This makes existing robust algorithms inapplicable and calls for new ones. In this paper, we take the first step toward studying the RoseNets problem. We design a robust greedy algorithm, which is robust against the removal of an arbitrary subset of the selected elements. The approximation ratio of the algorithm depends both on the number of removed elements and on the network topology. We further conduct experiments on real applications of recommendation and link prediction. The experimental results demonstrate the effectiveness of the proposed algorithm.

ICLR Conference 2022 Conference Paper

Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

  • Can Wang
  • Sheng Jin 0007
  • Yingda Guan
  • Wentao Liu 0002
  • Chen Qian 0006
  • Ping Luo 0002
  • Wanli Ouyang

Localizing keypoints of an object is a basic visual problem. However, supervised learning of a keypoint localization network often requires a large amount of data, which is expensive and time-consuming to obtain. To remedy this, there is an ever-growing interest in semi-supervised learning (SSL), which leverages a small set of labeled data along with a large set of unlabeled data. Among these SSL approaches, pseudo-labeling (PL) is one of the most popular. PL approaches apply pseudo-labels to unlabeled data, and then train the model with a combination of the labeled and pseudo-labeled data iteratively. The key to the success of PL is the selection of high-quality pseudo-labeled samples. Previous works mostly select training samples by manually setting a single confidence threshold. We propose to automatically select reliable pseudo-labeled samples with a series of dynamic thresholds, which constitutes a learning curriculum. Extensive experiments on five keypoint localization benchmark datasets demonstrate that the proposed approach significantly outperforms the previous state-of-the-art SSL approaches.

AAAI Conference 2021 Conference Paper

Cross-Layer Distillation with Semantic Calibration

  • Defang Chen
  • Jian-Ping Mei
  • Yuan Zhang
  • Can Wang
  • Zhe Wang
  • Yan Feng
  • Chun Chen

Recently proposed knowledge distillation approaches based on feature-map transfer validate that intermediate layers of a teacher model can serve as effective targets for training a student model to obtain better generalization ability. Existing studies mainly focus on particular representation forms for knowledge transfer between manually specified pairs of teacher-student intermediate layers. However, semantics of intermediate layers may vary in different networks and manual association of layers might lead to negative regularization caused by semantic mismatch between certain teacher-student layer pairs. To address this problem, we propose Semantic Calibration for Cross-layer Knowledge Distillation (SemCKD), which automatically assigns proper target layers of the teacher model for each student layer with an attention mechanism. With a learned attention distribution, each student layer distills knowledge contained in multiple layers rather than a single fixed intermediate layer from the teacher model for appropriate cross-layer supervision in training. Consistent improvements over state-of-the-art approaches are observed in extensive experiments with various network architectures for teacher and student models, demonstrating the effectiveness and flexibility of the proposed attention-based soft layer association mechanism for cross-layer distillation.
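The attention-based soft layer association can be sketched as follows; plain dot-product similarity on pooled features stands in for the paper's learned query/key projections, an illustrative simplification:

```python
import numpy as np

def layer_attention(student_feat, teacher_feats):
    """Soft assignment of teacher layers to one student layer:
    similarity scores -> softmax weights -> weighted distillation target.
    `student_feat` and each entry of `teacher_feats` are pooled 1-D
    feature vectors of the same dimension."""
    scores = np.array([float(student_feat @ t) for t in teacher_feats])
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    # Distillation target: attention-weighted mix of all teacher layers.
    target = np.tensordot(weights, np.stack(teacher_feats), axes=1)
    return weights, target
```

The student layer is then regressed toward `target`, so supervision comes from a learned mixture of teacher layers rather than one manually paired layer.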

IS Journal 2021 Journal Article

Learning Complex Couplings and Interactions

  • Can Wang
  • Fosca Giannotti
  • Longbing Cao

This special issue on learning complex couplings and interactions aims to encourage deep research in the above areas and beyond, with a focus on the latest advancements in modeling complex couplings and interactions in big data, complex behaviors, and systems.

NeurIPS Conference 2020 Conference Paper

Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting

  • Lei Bai
  • Lina Yao
  • Can Li
  • Xianzhi Wang
  • Can Wang

Modeling complex spatial and temporal correlations in correlated time series data is indispensable for understanding the traffic dynamics and predicting the future status of an evolving traffic system. Recent works focus on designing complicated graph neural network architectures to capture shared patterns with the help of pre-defined graphs. In this paper, we argue that learning node-specific patterns is essential for traffic forecasting while the pre-defined graph is avoidable. To this end, we propose two adaptive modules for enhancing Graph Convolutional Network (GCN) with new capabilities: 1) a Node Adaptive Parameter Learning (NAPL) module to capture node-specific patterns; 2) a Data Adaptive Graph Generation (DAGG) module to infer the inter-dependencies among different traffic series automatically. We further propose an Adaptive Graph Convolutional Recurrent Network (AGCRN) to capture fine-grained spatial and temporal correlations in traffic series automatically based on the two modules and recurrent networks. Our experiments on two real-world traffic datasets show that AGCRN outperforms the state of the art by a significant margin without pre-defined graphs of spatial connections.
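The DAGG idea of inferring the graph from data rather than pre-defining it can be sketched as follows; the particular construction (embedding similarity, non-negativity, row normalization) is one common choice and an assumption here, not necessarily the paper's exact formulation:

```python
import numpy as np

def data_adaptive_graph(node_embeddings):
    """Infer inter-series dependencies purely from learnable node
    embeddings: similarity -> non-negative affinities -> row-normalized
    adjacency, so each row is a distribution over neighbors."""
    sim = node_embeddings @ node_embeddings.T
    sim = np.maximum(sim, 0.0)                  # keep non-negative affinities
    sim -= sim.max(axis=1, keepdims=True)       # numerical stability
    adj = np.exp(sim)
    return adj / adj.sum(axis=1, keepdims=True)
```

During training, the embeddings (and hence the adjacency) are updated by gradient descent together with the rest of the network, which is what removes the need for a pre-defined graph.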

AAAI Conference 2020 Conference Paper

DGE: Deep Generative Network Embedding Based on Commonality and Individuality

  • Sheng Zhou
  • Xin Wang
  • Jiajun Bu
  • Martin Ester
  • Pinggang Yu
  • Jiawei Chen
  • Qihao Shi
  • Can Wang

Network embedding plays a crucial role in network analysis to provide effective representations for a variety of learning tasks. Existing attributed network embedding methods mainly focus on preserving the observed node attributes and network topology in the latent embedding space, with the assumption that nodes connected through edges will share similar attributes. However, our empirical analysis of real-world datasets shows that there exist both commonality and individuality between node attributes and network topology. On the one hand, similar nodes are expected to share similar attributes and have edges connecting them (commonality). On the other hand, each information source may maintain individual differences as well (individuality). Simultaneously capturing commonality and individuality is very challenging due to their exclusive nature, and existing work fails to do so. In this paper, we propose a deep generative embedding (DGE) framework which simultaneously captures commonality and individuality between network topology and node attributes in a generative process. Stochastic gradient variational Bayesian (SGVB) optimization is employed to infer model parameters as well as the node embeddings. Extensive experiments on four real-world datasets show the superiority of our proposed DGE framework in various tasks including node classification and link prediction.

AAAI Conference 2020 Conference Paper

Fast Adaptively Weighted Matrix Factorization for Recommendation with Implicit Feedback

  • Jiawei Chen
  • Can Wang
  • Sheng Zhou
  • Qihao Shi
  • Jingbang Chen
  • Yan Feng
  • Chun Chen

Recommendation from implicit feedback is a highly challenging task due to the lack of reliable observed negative data. A popular and effective approach for implicit recommendation is to treat unobserved data as negative but downweight their confidence. Naturally, how to assign confidence weights and how to handle the large number of unobserved data are two key problems for implicit recommendation models. However, existing methods either pursue fast learning by manually assigning simple confidence weights, which lacks flexibility and may create empirical bias in evaluating users' preferences; or adaptively infer personalized confidence weights but suffer from low efficiency. To achieve both adaptive weight assignment and efficient model learning, we propose a fast adaptively weighted matrix factorization (FAWMF) based on the variational auto-encoder. The personalized data confidence weights are adaptively assigned with a parameterized neural network (function), and the network can be inferred from the data. Further, to support fast and stable learning of FAWMF, a new batch-based learning algorithm, fBGD, has been developed, which trains on all feedback data but whose complexity is linear in the number of observed data. Extensive experiments on real-world datasets demonstrate the superiority of the proposed FAWMF and its learning algorithm fBGD.
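The confidence-weighting setup the abstract describes can be sketched as a weighted matrix-factorization objective; here the weights are a fixed input matrix for illustration, whereas FAWMF instead infers them adaptively with a parameterized network:

```python
import numpy as np

def weighted_mf_loss(R, U, V, confidence):
    """Confidence-weighted MF objective for implicit feedback: every
    user-item cell contributes a squared error, with per-cell weights
    downweighting the unreliable unobserved (zero) entries."""
    pred = U @ V.T
    return float(np.sum(confidence * (R - pred) ** 2))

# Toy interaction matrix: observed cells get full weight, unobserved
# cells a small weight (the hand-set scheme FAWMF aims to replace).
R = np.array([[1.0, 0.0], [0.0, 1.0]])
C = np.where(R > 0, 1.0, 0.1)
```

Minimizing this sum over all cells naively costs time proportional to users × items; fBGD's contribution is reorganizing the computation so the cost is linear in the observed entries only.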

AAAI Conference 2020 Conference Paper

Online Knowledge Distillation with Diverse Peers

  • Defang Chen
  • Jian-Ping Mei
  • Can Wang
  • Yan Feng
  • Chun Chen

Distillation is an effective knowledge-transfer technique that uses predicted distributions of a powerful teacher model as soft targets to train a less-parameterized student model. A pre-trained high capacity teacher, however, is not always available. Recently proposed online variants use the aggregated intermediate predictions of multiple student models as targets to train each student model. Although group-derived targets give a good recipe for teacher-free distillation, group members are homogenized quickly with simple aggregation functions, leading to early saturated solutions. In this work, we propose Online Knowledge Distillation with Diverse peers (OKDDip), which performs two-level distillation during training with multiple auxiliary peers and one group leader. In the first-level distillation, each auxiliary peer holds an individual set of aggregation weights generated with an attention-based mechanism to derive its own targets from predictions of other auxiliary peers. Learning from distinct target distributions helps to boost peer diversity for effectiveness of group-based distillation. The second-level distillation is performed to transfer the knowledge in the ensemble of auxiliary peers further to the group leader, i.e., the model used for inference. Experimental results show that the proposed framework consistently gives better performance than state-of-the-art approaches without sacrificing training or inference complexity, demonstrating the effectiveness of the proposed two-level distillation framework.

IJCAI Conference 2018 Conference Paper

A Comparative Study of Transactional and Semantic Approaches for Predicting Cascades on Twitter

  • Yunwei Zhao
  • Can Wang
  • Chi-Hung Chi
  • Kwok-Yan Lam
  • Sen Wang

The availability of massive social media data has enabled the prediction of people's future behavioral trends at an unprecedented large scale. Information cascades study on Twitter has been an integral part of behavior analysis. A number of methods based on transactional features (such as keyword frequency) and semantic features (such as sentiment) have been proposed to predict future cascading trends. However, an in-depth understanding of the pros and cons of semantic and transactional models is lacking. This paper conducts a comparative study of both approaches in predicting information diffusion with three mechanisms: retweet cascade, url cascade, and hashtag cascade. Experiments on Twitter data show that the semantic model outperforms the transactional model if the exterior pattern is less directly observable (i.e., hashtag cascade). When it becomes more directly observable, the semantic method delivers only comparable accuracy (url cascade) or even worse accuracy (retweet cascade). Further, we demonstrate that the transactional and semantic models are not independent, and performance is greatly enhanced when combining both.

AAAI Conference 2018 Short Paper

A Novel Embedding Method for News Diffusion Prediction

  • Ruoran Liu
  • Qiudan Li
  • Can Wang
  • Lei Wang
  • Daniel Zeng

News diffusion prediction aims to predict a sequence of news sites which will quote a particular piece of news. Most previous propagation models make efforts to estimate propagation probabilities along observed links, ignore the characteristics of news diffusion processes, and fail to capture the implicit relationships between news sites. In this paper, we propose an algorithm to model the news diffusion processes in a continuous space and take the attributes of news into account. Experiments performed on a real-world news dataset show that our model can take advantage of news’ attributes and predict news diffusion accurately.

IJCAI Conference 2018 Conference Paper

ANRL: Attributed Network Representation Learning via Deep Neural Networks

  • Zhen Zhang
  • Hongxia Yang
  • Jiajun Bu
  • Sheng Zhou
  • Pinggang Yu
  • Jianwei Zhang
  • Martin Ester
  • Can Wang

Network representation learning (RL) aims to transform the nodes in a network into low-dimensional vector spaces while preserving the inherent properties of the network. Though network RL has been intensively studied, most existing works focus on either network structure or node attribute information. In this paper, we propose a novel framework, named ANRL, to incorporate both the network structure and node attribute information in a principled way. Specifically, we propose a neighbor enhancement autoencoder to model the node attribute information, which reconstructs its target neighbors instead of itself. To capture the network structure, an attribute-aware skip-gram model is designed based on the attribute encoder to formulate the correlations between each node and its direct or indirect neighbors. We conduct extensive experiments on six real-world networks, including two social networks, two citation networks and two user behavior networks. The results empirically show that ANRL can achieve relatively significant gains in node classification and link prediction tasks.

IJCAI Conference 2018 Conference Paper

Multi-modality Sensor Data Classification with Selective Attention

  • Xiang Zhang
  • Lina Yao
  • Chaoran Huang
  • Sen Wang
  • Mingkui Tan
  • Guodong Long
  • Can Wang

Multimodal wearable sensor data classification plays an important role in ubiquitous computing and has a wide range of applications in various scenarios from healthcare to entertainment. However, most of the existing work in this field employs domain-specific approaches and is thus ineffective in complex situations where multi-modality sensor data is collected. Moreover, wearable sensor data is less informative than conventional data such as texts or images. In this paper, to improve the adaptability of such classification methods across different application contexts, we turn this classification task into a game and apply a deep reinforcement learning scheme to dynamically deal with complex situations. We also introduce a selective attention mechanism into the reinforcement learning scheme to focus on the crucial dimensions of the data. This mechanism helps to capture extra information from the signal, and can thus significantly improve the discriminative power of the classifier. We carry out several experiments on three wearable sensor datasets, and demonstrate competitive performance of the proposed approach compared to several state-of-the-art baselines.

AAAI Conference 2015 Conference Paper

Coupled Interdependent Attribute Analysis on Mixed Data

  • Can Wang
  • Chi-Hung Chi
  • Wei Zhou
  • Raymond Wong

In real-world applications, heterogeneous interdependent attributes that consist of both discrete and numerical variables can be observed ubiquitously. The usual representation of these data sets is an information table, which assumes the independence of attributes. However, very often, the attributes are actually interdependent on one another, either explicitly or implicitly. Limited research has been conducted on analyzing such attribute interactions, which causes the analysis results to be more local than global. This paper proposes coupled heterogeneous attribute analysis to capture the interdependence among mixed data by addressing coupling context and coupling weights in unsupervised learning. Such global couplings integrate the interactions within discrete attributes, within numerical attributes, and across them to form the coupled representation for mixed-type objects based on dimension conversion and feature selection. This work makes one step forward towards explicitly modeling the interdependence of heterogeneous attributes among mixed data, verified by applications in data structure analysis, data clustering evaluation, and density comparison. Substantial experiments on 12 UCI data sets show that our approach can effectively capture the global couplings of heterogeneous attributes and outperforms the state-of-the-art methods, supported by statistical analysis.

IS Journal 2014 Journal Article

Behavior Informatics: A New Perspective

  • Longbing Cao
  • Thorsten Joachims
  • Can Wang
  • Eric Gaussier
  • Jinjiu Li
  • Yuming Ou
  • Dan Luo
  • Reza Zafarani

This installment of Trends & Controversies provides an array of perspectives on the latest research in behavior informatics. Longbing Cao introduces the work in "Behavior Informatics: A New Perspective." Then, in "Behavior Computing," Longbing Cao and Thorsten Joachims provide a basic overview of the topic. Next is "Coupled Behavior Representation, Modeling, Analysis, and Reasoning" by Can Wang, Longbing Cao, Eric Gaussier, Jinjiu Li, Yuming Ou, and Dan Luo. The fourth article is "Behavior Analysis in Social Media," by Reza Zafarani and Huan Liu. The fifth article is "Group Recommendation and Behavior," by Guandong Xu and Zhiang Wu. Gabriella Pasi wrote the sixth article, "Web Search and Behavior." The seventh article, "Behaviors of IPTV Users," is by Ya Zhang, Xiaokang Yang, and Hongyuan Zha. Finally, "Should Behavioral Models of Terror Groups Be Disclosed?" is by Edoardo Serra and V. S. Subrahmanian.

IJCAI Conference 2013 Conference Paper

Coupled Attribute Analysis on Numerical Data

  • Can Wang
  • Zhong She
  • Longbing Cao

The usual representation of quantitative data is to formalize it as an information table, which assumes the independence of attributes. In real-world data, attributes are more or less interacted and coupled via explicit or implicit relationships. Limited research has been conducted on analyzing such attribute interactions, which only describe a local picture of attribute couplings in an implicit way. This paper proposes a framework of coupled attribute analysis to capture the global dependency of continuous attributes. Such global couplings integrate the intra-coupled interaction within an attribute (i.e., the correlations between attributes and their own powers) and inter-coupled interaction among different attributes (i.e., the correlations between attributes and the powers of others) to form a coupled representation for numerical objects by the Taylor-like expansion. This work makes one step forward towards explicitly addressing the global interactions of continuous attributes, verified by applications in data structure analysis, data clustering, and data classification. Substantial experiments on 13 UCI data sets demonstrate that the coupled representation can effectively capture the global couplings of attributes and outperforms the traditional way, supported by statistical analysis.

AAAI Conference 2012 Conference Paper

CCE: A Coupled Framework of Clustering Ensembles

  • Zhong She
  • Can Wang
  • Longbing Cao

Clustering ensemble mainly relies on pairwise similarity to capture the consensus function. However, it usually considers each base clustering independently and treats the similarity measure roughly as either 0 or 1. To address these two issues, we propose CCE, a coupled framework of clustering ensembles, and exemplify it with CCSPA, a coupled version of CSPA. Experiments demonstrate the superiority of CCSPA over baseline approaches in terms of clustering accuracy.

AAAI Conference 2012 Conference Paper

Document Summarization Based on Data Reconstruction

  • Zhanying He
  • Chun Chen
  • Jiajun Bu
  • Can Wang
  • Lijun Zhang
  • Deng Cai
  • Xiaofei He

Document summarization is of great value to many real-world applications, such as snippet generation for search results and news headline generation. Traditionally, document summarization is implemented by extracting sentences that cover the main topics of a document with a minimum of redundancy. In this paper, we take a different perspective from data reconstruction and propose a novel framework named Document Summarization based on Data Reconstruction (DSDR). Specifically, our approach generates a summary which consists of those sentences that can best reconstruct the original document. To model the relationship among sentences, we introduce two objective functions: (1) linear reconstruction, which approximates the document by linear combinations of the selected sentences; (2) nonnegative linear reconstruction, which allows only additive, not subtractive, linear combinations. In this framework, the reconstruction error becomes a natural criterion for measuring the quality of the summary. For each objective function, we develop an efficient algorithm to solve the corresponding optimization problem. Extensive experiments on the summarization benchmark data sets DUC 2006 and DUC 2007 demonstrate the effectiveness of our proposed approach.

AAAI Conference 2010 Conference Paper

G-Optimal Design with Laplacian Regularization

  • Chun Chen
  • Zhengguang Chen
  • Jiajun Bu
  • Can Wang
  • Lijun Zhang
  • Cheng Zhang

In many real-world applications, labeled data are usually expensive to get, while there may be a large amount of unlabeled data. To reduce the labeling cost, active learning attempts to discover the most informative data points for labeling. Recently, Optimal Experimental Design (OED) techniques have attracted an increasing amount of attention. OED is concerned with the design of experiments that minimizes the variances of a parameterized model. Typical design criteria include D-, A-, and E-optimality. However, all these criteria are based on an ordinary linear regression model which aims to minimize the empirical error, whereas the geometrical structure of the data space is not well respected. In this paper, we propose a novel optimal experimental design approach for active learning, called Laplacian G-Optimal Design (LapGOD), which considers both discriminating and geometrical structures. By using Laplacian Regularized Least Squares, which incorporates manifold regularization into linear regression, our proposed algorithm selects those data points that minimize the maximum variance of the predicted values on the data manifold. We also extend our algorithm to the nonlinear case by using the kernel trick. The experimental results on various image databases have shown that our proposed LapGOD active learning algorithm can significantly enhance the classification accuracy if the selected data points are used as training data.

AAAI Conference 2010 Conference Paper

Modeling Dynamic Multi-Topic Discussions in Online Forums

  • Hao Wu
  • Jiajun Bu
  • Chun Chen
  • Can Wang
  • Guang Qiu
  • Lijun Zhang
  • Jianfeng Shen

In the form of topic discussions, users interact with each other to share knowledge and exchange information in online forums. Modeling the evolution of topic discussions reveals how information propagates on the Internet and can thus help understand sociological phenomena and improve the performance of applications such as recommendation systems. In this paper, we argue that a user’s participation in topic discussions is motivated by either her friends or her own preferences. Inspired by the theory of information flow, we propose dynamic topic discussion models by mining influential relationships between users and individual preferences. Reply relations of users are exploited to construct the fundamental influential social network. The property of discussed topics and the time lapse factor are also considered in our modeling. Furthermore, we propose a novel measure called ParticipationRank to rank users according to how important they are in the social network and to what extent they prefer to participate in the discussion of a certain topic. The experiments show our model can simulate the evolution of topic discussions well and predict the tendency of users’ participation accurately.