Author name cluster

Liyue Shen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers · 2 author rows

Possible papers (9)

NeurIPS 2025 · Conference Paper

CCS: Controllable and Constrained Sampling with Diffusion Models via Initial Noise Perturbation

  • Bowen Song
  • Zecheng Zhang
  • Zhaoxu Luo
  • Jason Hu
  • Wei Yuan
  • Jing Jia
  • Zhengxu Tang
  • Guanyang Wang

Diffusion models have emerged as powerful tools for generative tasks, producing high-quality outputs across diverse domains. However, how the generated data responds to initial noise perturbation in diffusion models remains under-explored, hindering a deeper understanding of the controllability of the sampling process. In this work, we first observe an interesting phenomenon: the relationship between the change in generation outputs and the scale of the initial noise perturbation is highly linear through the diffusion ODE sampling process. We then provide both theoretical and empirical analyses to justify this linearity property of the input–output (noise → generated data) relationship. Inspired by these insights, we propose a novel Controllable and Constrained Sampling (CCS) method, along with a new controller algorithm for diffusion models, which enables precise control over both (1) the proximity of individual samples to a target image and (2) the alignment of the sample mean with the target, while preserving high sample quality. We conduct extensive experiments comparing our proposed sampling approach with other methods in terms of both sampling controllability and generated data quality. Results show that CCS achieves significantly more precise controllability while maintaining superior sample quality and diversity, enabling practical applications such as fine-grained and robust image editing. Code: https://github.com/efzero/diffusioncontroller
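
The linearity claim above lends itself to a simple empirical probe. Below is a minimal, hypothetical sketch (not the paper's released code): `ode_sample` stands in for any deterministic diffusion ODE sampler (e.g., DDIM with eta = 0), and we check whether the output displacement grows linearly with the perturbation scale.

```python
# Hypothetical probe of the noise -> output linearity described above.
import numpy as np
import torch

def linearity_probe(ode_sample, shape, scales=(0.01, 0.02, 0.05, 0.1), seed=0):
    g = torch.Generator().manual_seed(seed)
    z0 = torch.randn(shape, generator=g)          # base initial noise
    delta = torch.randn(shape, generator=g)       # fixed perturbation direction
    delta = delta / delta.norm()
    x0 = ode_sample(z0)                           # unperturbed generation
    diffs = []
    for eps in scales:
        x_eps = ode_sample(z0 + eps * delta)      # perturbed generation
        diffs.append((x_eps - x0).norm().item())
    # If the noise -> output map is locally linear, diffs[i] / scales[i]
    # should be roughly constant across scales.
    ratios = np.array(diffs) / np.array(scales)
    return dict(zip(scales, diffs)), ratios.std() / ratios.mean()
```

A small relative spread of the ratios across scales would be consistent with the reported linearity.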

TMLR 2025 · Journal Article

Distributionally Robust Alignment for Medical Federated Vision-Language Pre-training Under Data Heterogeneity

  • Zitao Shuai
  • Chenwei Wu
  • Zhengxu Tang
  • Liyue Shen

Vision-language pre-training (VLP) has emerged as an effective scheme for multimodal representation learning, but its reliance on large-scale multimodal data poses significant challenges for medical applications. Federated learning (FL) offers a promising solution to scale up the dataset for medical VLP while preserving data privacy. However, we observe that client data heterogeneity in real-world scenarios can cause models to learn biased cross-modal alignment during local pre-training, which limits the transferability of the federally learned representation model to downstream tasks. To address this challenge, we propose Federated Distributionally Robust Alignment (FedDRA), a framework for federated VLP that achieves robust vision-language alignment under heterogeneous conditions. Based on client datasets, we construct a distribution family that encompasses potential test-time domains and apply a distributionally robust framework to optimize the pre-trained model's performance across this distribution space. This approach bridges the gap between pre-training samples and downstream applications. To avoid over-fitting to client-specific information, we use anchor representations from the global model to guide local training, and adopt a two-stage approach that first tunes deeper layers before updating the entire network. Extensive experiments on real-world datasets demonstrate FedDRA's effectiveness in enhancing medical federated VLP under data heterogeneity. Our method also adapts well to various medical pre-training methods.
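
As a rough illustration of the distributionally robust ingredient, the toy sketch below re-weights clients toward the worst-case mixture via exponentiated-gradient ascent. This is one plausible realization under our own assumptions, not FedDRA's actual algorithm; `client_losses_fn` is a hypothetical stand-in for per-client alignment losses.

```python
# Toy sketch of distributionally robust re-weighting across clients
# (illustrative only; not the FedDRA implementation).
import torch

def dro_round(global_model, client_losses_fn, weights, eta=0.1):
    """One round: ascend on client weights (worst-case mixture), then
    return the weighted loss to descend on model parameters.
    client_losses_fn(model) -> tensor of per-client losses."""
    losses = client_losses_fn(global_model)        # shape: [num_clients]
    # Exponentiated-gradient ascent keeps weights on the simplex,
    # up-weighting clients where alignment is currently worst.
    weights = weights * torch.exp(eta * losses.detach())
    weights = weights / weights.sum()
    robust_loss = (weights * losses).sum()         # minimized w.r.t. the model
    return robust_loss, weights

weights = torch.full((4,), 0.25)                    # uniform start, 4 clients
```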

ICLR 2025 · Conference Paper

Dynamic Modeling of Patients, Modalities and Tasks via Multi-modal Multi-task Mixture of Experts

  • Chenwei Wu 0006
  • Zitao Shuai
  • Zhengxu Tang
  • Luning Wang
  • Liyue Shen

Multi-modal multi-task learning holds significant promise for tackling complex diagnostic tasks and many important medical imaging problems. It meets the need in real-world diagnostic protocols to leverage information from different data sources and simultaneously perform mutually informative tasks. However, medical imaging domains introduce two key challenges: dynamic modality fusion and modality-task dependence. The quality and amount of task-related information from different modalities can vary significantly across patient samples due to biological and demographic factors. Traditional fusion methods apply fixed combination strategies that fail to capture this dynamic relationship, potentially underutilizing modalities that carry stronger diagnostic signals for specific patients. Additionally, different clinical tasks may require dynamic feature selection and combination from various modalities, a phenomenon we term "modality-task dependence." To address these issues, we propose M4oE, a novel Multi-modal Multi-task Mixture of Experts framework for precise medical diagnosis. M4oE comprises Modality-Specific MoE (MSoE) modules and a modality-shared Modality-Task MoE (MToE) module. Through the collaboration of both modules, our model dynamically decomposes and learns distinct and shared information from different modalities and achieves dynamic fusion. MToE provides a joint probability model of modalities and tasks by using experts as a link, and encourages experts to learn modality-task dependence via a conditional mutual information loss. In doing so, M4oE offers sample- and population-level interpretability of modality contributions. We evaluate M4oE on four public multi-modal medical benchmark datasets covering two important diagnostic problems: breast cancer screening and retinal disease diagnosis. Results demonstrate our method's superiority over state-of-the-art methods across classification and segmentation metrics such as accuracy, AUROC, AUPRC, and Dice.
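
To make the "dynamic fusion" idea concrete, here is a minimal, hypothetical gating module in which expert weights depend on both the sample's modality features and the task. It illustrates the general mechanism only and is not the M4oE architecture.

```python
# Minimal sketch of modality/task-conditioned expert gating (hypothetical).
import torch
import torch.nn as nn

class TaskModalityGate(nn.Module):
    def __init__(self, feat_dim, num_experts, num_tasks):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, feat_dim)
        self.gate = nn.Linear(2 * feat_dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Linear(feat_dim, feat_dim) for _ in range(num_experts))

    def forward(self, modality_feat, task_id):
        # Gating depends on BOTH the sample's modality features and the
        # task, so expert usage can vary per patient and per task.
        cond = torch.cat([modality_feat, self.task_emb(task_id)], dim=-1)
        w = torch.softmax(self.gate(cond), dim=-1)             # [B, E]
        outs = torch.stack([e(modality_feat) for e in self.experts], dim=1)
        return (w.unsqueeze(-1) * outs).sum(dim=1)             # weighted fusion
```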

TMLR 2025 · Journal Article

Part-aware Prompted Segment Anything Model for Adaptive Segmentation

  • Chenhui Zhao
  • Liyue Shen

Precision medicine, such as patient-adaptive treatments assisted by medical image analysis, poses new challenges for image segmentation algorithms due to the large variability across patients and the limited availability of annotated data for each patient. In this work, we propose a data-efficient segmentation method to address these challenges, namely the Part-aware Prompted Segment Anything Model (P2SAM). Without any model fine-tuning, P2SAM enables seamless adaptation to any new patient, relying only on one-shot patient-specific data. We introduce a novel part-aware prompt mechanism that selects multiple point prompts based on part-level features of the one-shot data, and that can be integrated into different promptable segmentation models such as SAM and SAM 2. To further improve the robustness of the part-aware prompt mechanism, we propose a distribution-guided retrieval approach to determine the optimal number of part-level features for a specific case. P2SAM improves performance by +8.0% and +2.0% mean Dice score on two different patient-adaptive segmentation applications, respectively. In addition, P2SAM exhibits impressive generalizability on other adaptive segmentation tasks in the natural image domain, e.g., +6.4% mIoU on the personalized object segmentation task. Code will be released upon acceptance.
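
One plausible reading of the part-aware prompt mechanism is sketched below: cluster the one-shot foreground features into parts, then pick the best-matching location per part in the target image as a point prompt. All names are illustrative, and the actual P2SAM selection rule may differ.

```python
# Hypothetical part-aware point-prompt selector (not the released P2SAM code).
import numpy as np
from sklearn.cluster import KMeans

def part_aware_prompts(ref_feats, ref_mask, tgt_feats, n_parts=3):
    """ref_feats/tgt_feats: [H, W, C] feature maps; ref_mask: [H, W] bool.
    Returns one (row, col) point prompt per discovered part."""
    fg = ref_feats[ref_mask]                        # one-shot foreground feats
    parts = KMeans(n_clusters=n_parts, n_init=10).fit(fg)
    protos = parts.cluster_centers_                 # part prototypes [n_parts, C]
    H, W, C = tgt_feats.shape
    flat = tgt_feats.reshape(-1, C)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    protos = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-8)
    sim = flat @ protos.T                           # cosine similarity [HW, n_parts]
    idx = sim.argmax(axis=0)                        # best location per part
    return [(i // W, i % W) for i in idx]           # point prompts for SAM
```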

NeurIPS 2024 · Conference Paper

DiffusionBlend: Learning 3D Image Prior through Position-aware Diffusion Score Blending for 3D Computed Tomography Reconstruction

  • Bowen Song
  • Jason Hu
  • Zhaoxu Luo
  • Jeffrey A. Fessler
  • Liyue Shen

Diffusion models face significant challenges when employed for large-scale medical image reconstruction in real practice, such as 3D Computed Tomography (CT). Due to the demanding memory, time, and data requirements, it is difficult to train a diffusion model directly on the entire volume of high-dimensional data to obtain an efficient 3D diffusion prior. Existing works that utilize diffusion priors on single 2D image slices with hand-crafted cross-slice regularization sacrifice z-axis consistency, resulting in severe artifacts along the z-axis. In this work, we propose a novel framework that enables learning the 3D image prior through position-aware 3D-patch diffusion score blending for reconstructing large-scale 3D medical images. To the best of our knowledge, we are the first to utilize a 3D-patch diffusion prior for 3D medical image reconstruction. Extensive experiments on sparse-view and limited-angle CT reconstruction show that our DiffusionBlend method significantly outperforms previous methods and achieves state-of-the-art performance on real-world CT reconstruction problems with high-dimensional 3D images (i.e., 256 × 256 × 500). Our algorithm also achieves better or comparable computational efficiency compared with previous state-of-the-art methods. Code is available at https://github.com/efzero/DiffusionBlend.
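
A simplified sketch of position-aware score blending along the z-axis is shown below, assuming a hypothetical `score_net` that takes a slab and its normalized z-position. Overlapping slabs are averaged so the blended score stays consistent across slab boundaries; this illustrates the idea, not the paper's implementation.

```python
# Illustrative blending of patch scores along z for a 3D volume.
import torch

def blended_score(score_net, x3d, t, patch_z=8, stride=4):
    """x3d: [1, 1, Z, H, W]. Overlapping z-slabs are scored separately
    and averaged into a single volume-level score estimate."""
    Z = x3d.shape[2]
    offsets = list(range(0, Z - patch_z + 1, stride))
    if offsets[-1] != Z - patch_z:
        offsets.append(Z - patch_z)              # cover trailing slices
    score = torch.zeros_like(x3d)
    count = torch.zeros_like(x3d)
    for z0 in offsets:
        slab = x3d[:, :, z0:z0 + patch_z]
        pos = torch.tensor([z0 / Z])             # position-awareness input
        score[:, :, z0:z0 + patch_z] += score_net(slab, t, pos)
        count[:, :, z0:z0 + patch_z] += 1
    return score / count
```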

NeurIPS 2024 · Conference Paper

Learning Image Priors Through Patch-Based Diffusion Models for Solving Inverse Problems

  • Jason Hu
  • Bowen Song
  • Xiaojian Xu
  • Liyue Shen
  • Jeffrey A. Fessler

Diffusion models can learn strong image priors from the underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires large amounts of data. These bottlenecks prevent most existing works from being feasible for high-dimensional and high-resolution data such as 3D images. This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images. Specifically, we propose a patch-based, position-aware diffusion inverse solver, called PaDIS, in which we obtain the score function of the whole image through the scores of patches and their positional encoding, and utilize this as the prior for solving inverse problems. First, we show that this diffusion model achieves improved memory and data efficiency while still maintaining the capability to generate entire images via positional encoding. Additionally, the proposed PaDIS model is highly flexible and can be plugged into different diffusion inverse solvers (DIS). We demonstrate that the proposed PaDIS approach enables solving various inverse problems in both natural and medical image domains, including CT reconstruction, deblurring, and super-resolution, given only patch-based priors. Notably, PaDIS outperforms previous DIS methods trained on entire-image priors in the case of limited training data, demonstrating the data efficiency of our approach through learning a patch-based prior.
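
The sketch below illustrates, under stated assumptions, how a whole-image score could be stitched from position-conditioned patch scores and then used in a generic gradient-based data-consistency step, as a plug-and-play DIS would. `patch_score_net`, `A`, and `y` are stand-ins; the real PaDIS update may differ.

```python
# Hypothetical whole-image score assembly from patch scores, plus one
# generic data-consistency step (illustrative, not the PaDIS code).
import torch

def whole_image_score(patch_score_net, x, t, p=64, stride=32):
    """x: [1, C, H, W]; assumes (H - p) and (W - p) are multiples of
    `stride`. Patch scores conditioned on patch position are stitched
    by averaging over overlaps to approximate the full-image score."""
    _, _, H, W = x.shape
    s = torch.zeros_like(x)
    n = torch.zeros_like(x)
    for i in range(0, H - p + 1, stride):
        for j in range(0, W - p + 1, stride):
            pos = torch.tensor([[i / H, j / W]])   # positional encoding input
            s[:, :, i:i+p, j:j+p] += patch_score_net(x[:, :, i:i+p, j:j+p], t, pos)
            n[:, :, i:i+p, j:j+p] += 1
    return s / n.clamp(min=1)

def data_consistency_step(x, A, y, lam=1.0):
    # Descend on the measurement residual ||A(x) - y||^2.
    x = x.detach().requires_grad_(True)
    loss = (A(x) - y).pow(2).sum()
    grad, = torch.autograd.grad(loss, x)
    return (x - lam * grad).detach()
```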

ICLR 2024 · Conference Paper

Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency

  • Bowen Song
  • Soo Min Kwon
  • Zecheng Zhang
  • Xinyu Hu
  • Qing Qu 0001
  • Liyue Shen

Latent diffusion models have been demonstrated to generate high-quality images while offering efficiency in model training compared to diffusion models operating in pixel space. However, incorporating latent diffusion models to solve inverse problems remains challenging due to the nonlinearity of the encoder and decoder. To address these issues, we propose ReSample, an algorithm that can solve general inverse problems with pre-trained latent diffusion models. Our algorithm incorporates data consistency by solving an optimization problem during the reverse sampling process, a concept we term hard data consistency. Upon solving this optimization problem, we propose a novel resampling scheme to map the measurement-consistent sample back onto the noisy data manifold and theoretically demonstrate its benefits. Lastly, we apply our algorithm to solve a wide range of linear and nonlinear inverse problems on both natural and medical images, demonstrating that our approach outperforms existing state-of-the-art approaches, including those based on pixel-space diffusion models.
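
A minimal sketch of the two ingredients described above, with stand-ins for the pretrained decoder `D` and forward operator `A`: an inner optimization that enforces measurement consistency on the latent, followed by re-noising back to the current diffusion time step.

```python
# Illustrative sketch of hard data consistency + resampling (stand-ins
# throughout; this is a simplified reading, not the ReSample release).
import torch

def hard_data_consistency(z, D, A, y, steps=50, lr=1e-2):
    z = z.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = (A(D(z)) - y).pow(2).sum()   # solve min_z ||A(D(z)) - y||^2
        loss.backward()
        opt.step()
    return z.detach()

def resample_to_noise_level(z_hat, alpha_bar_t):
    # Stochastically re-noise the measurement-consistent latent back onto
    # the time-t noisy manifold (simplified version of the scheme above).
    # alpha_bar_t: scalar tensor for the cumulative noise schedule at t.
    return alpha_bar_t.sqrt() * z_hat + (1 - alpha_bar_t).sqrt() * torch.randn_like(z_hat)
```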

ICML 2024 · Conference Paper

The Emergence of Reproducibility and Consistency in Diffusion Models

  • Huijie Zhang
  • Jinfan Zhou
  • Yifu Lu
  • Minzhe Guo
  • Peng Wang 0098
  • Liyue Shen
  • Qing Qu 0001

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models, which we term "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and score function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models learn distinct distributions depending on the training data size. This is evident in two distinct training regimes: (i) the "memorization regime," where the diffusion model overfits to the training data distribution, and (ii) the "generalization regime," where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional generation and for solving inverse problems. Lastly, we discuss how our findings connect to existing research and highlight the practical implications of our discoveries.
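
The core experiment is easy to replicate in spirit: feed identical starting noise to two independently trained models under a deterministic sampler and compare the outputs. A minimal sketch, with `sample_ddim` as a stand-in sampler:

```python
# Minimal reproducibility check (illustrative; not the paper's code).
import torch
import torch.nn.functional as F

def reproducibility_score(model_a, model_b, sample_ddim, shape, seed=0):
    z = torch.randn(shape, generator=torch.Generator().manual_seed(seed))
    xa = sample_ddim(model_a, z)        # deterministic (eta = 0) sampling
    xb = sample_ddim(model_b, z)        # same noise, different model
    cos = F.cosine_similarity(xa.flatten(1), xb.flatten(1), dim=1).mean()
    mse = F.mse_loss(xa, xb)
    return cos.item(), mse.item()       # high cos / low mse => reproducible
```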

ICLR 2022 · Conference Paper

Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

  • Yang Song 0011
  • Liyue Shen
  • Lei Xing 0001
  • Stefano Ermon

Reconstructing medical images from partial measurements is an important inverse problem in Computed Tomography (CT) and Magnetic Resonance Imaging (MRI). Existing solutions based on machine learning typically train a model to directly map measurements to medical images, leveraging a training dataset of paired images and measurements. These measurements are typically synthesized from images using a fixed physical model of the measurement process, which hinders the generalization of models to unknown measurement processes. To address this issue, we propose a fully unsupervised technique for inverse problem solving, leveraging the recently introduced score-based generative models. Specifically, we first train a score-based generative model on medical images to capture their prior distribution. Given measurements and a physical model of the measurement process at test time, we introduce a sampling method to reconstruct an image consistent with both the prior and the observed measurements. Our method does not assume a fixed measurement process during training and can thus be flexibly adapted to different measurement processes at test time. Empirically, we observe performance comparable to or better than supervised learning techniques on several medical imaging tasks in CT and MRI, while demonstrating significantly better generalization to unknown measurement processes.
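
A simplified sketch of this recipe, assuming a linear forward operator with pseudoinverse `A_pinv` supplied only at test time: annealed Langevin steps under the learned score, interleaved with a projection onto the measurement-consistent set. Illustrative only, not the paper's released algorithm.

```python
# Illustrative unsupervised reconstruction with a score-based prior.
import torch

def reconstruct(score_net, A, A_pinv, y, shape, sigmas, step=1e-4):
    x = torch.randn(shape)
    for sigma in sigmas:                            # annealed noise levels
        for _ in range(10):                         # Langevin steps per level
            noise = torch.randn_like(x)
            x = x + step * score_net(x, sigma) + (2 * step) ** 0.5 * noise
            # Project onto the measurements: x - A_pinv(A(x) - y) lands on
            # {x : A(x) = y} for a linear A with pseudoinverse A_pinv,
            # encoding the test-time physical model (e.g., CT or MRI).
            x = x - A_pinv(A(x) - y)
    return x
```

Because the measurement model enters only through `A` and `A_pinv` at sampling time, the same trained score network can be reused across different measurement processes, which is the generalization property the abstract highlights.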