Arrow Research search

Author name cluster

Dong Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

38 papers
2 author rows

Possible papers

38

JBHI Journal 2026 Journal Article

An Automatic 3D PET Tumor Segmentation Framework Assisted by Geodesic Sequences

  • Lin Yang
  • Dan Shao
  • Chuanli Cheng
  • Chao Zou
  • Zhenxing Huang
  • Hairong Zheng
  • Dong Liang
  • Zhi-Feng Pang

Positron Emission Tomography (PET) images reflect the metabolic rate of tracers in different tissues of the human body, making them crucial for early cancer diagnosis and treatment. Accurate tumor segmentation is essential to help clinicians determine drug dosages. Due to the low resolution of PET images, prior information (such as CT, MRI, or distance information) is often incorporated to assist PET segmentation. In this paper, we propose an automatic 3D PET tumor segmentation framework assisted by geodesic sequences. Specifically, considering the intrinsic characteristics of PET images, we first construct a geodesic prior, which effectively enhances the contrast between the tumor and the background while suppressing noise and the influence of other tissues. To address the geodesic prior's need for seed points, we design an automatic marking strategy that identifies all suspected lesion regions and uses their central points as a series of seeds to generate the corresponding geodesic sequences. Subsequently, we develop a three-branch network architecture to simultaneously process PET images, geodesic sequences, and background geodesic information. To enhance image features, a distance attention mechanism is introduced at the end of the network encoder to effectively measure the similarity between different geodesic features, refining the image features. Finally, the network incorporates spatial regularization and local PET intensity information into the activation function via the Soft Threshold Dynamics with Local Intensity Fitting (STDLIF) module, further improving segmentation accuracy. Experimental results demonstrate that, compared to existing state-of-the-art algorithms, the proposed method achieves better segmentation performance on both clinical and public datasets.

EAAI Journal 2026 Journal Article

An efficient bridge three-dimensional reconstruction method based on images semantic segmentation and unmanned aerial vehicle photography without laser radar

  • Jiazhang Cao
  • Zhe Huang
  • Shuhao Zhang
  • Dong Liang
  • Yuya Takahashi
  • He Zhang
  • Fuyuan Gong

The rapid, lightweight acquisition of high-fidelity, noise-free point cloud data plays a critical role in digital bridge modeling. However, traditional Structure from Motion methods suffer from significant computational overhead and noise interference due to redundant background pixel information in high-resolution images. This study introduces Fast-Focused COLMAP, whose core innovation lies in integrating image semantic segmentation and a mask filtering strategy into the multi-view reconstruction process. The proposed method eliminates irrelevant environmental information and resolves the image distortion caused by semantic segmentation, while preserving the structural integrity of the bridge body. Applied to a medium-sized reinforced concrete bridge, the method achieves a reduction of over 84% in computational time compared with conventional COLMAP under the same image sequence and computational setup. A multi-dimensional evaluation of point cloud registration that integrates local accuracy and global consistency is then proposed, demonstrating that the method effectively balances geometric precision and computational efficiency.

JBHI Journal 2026 Journal Article

Diff-DTI: Fast Diffusion Tensor Imaging Using A Feature-Enhanced Joint Diffusion Model

  • Lang Zhang
  • Jinling He
  • Wang Li
  • Dong Liang
  • Yanjie Zhu

Magnetic resonance diffusion tensor imaging (DTI) is a unique non-invasive technique for measuring in vivo water molecule diffusion, reflecting tissue microstructure. However, acquiring high-quality DTI typically requires numerous diffusion-weighted images (DWIs) in multiple directions, resulting in long scan times that restrict its use in clinical and research settings. To address this limitation, we propose Diff-DTI, a fast DTI processing framework based on a feature-enhanced joint diffusion model, to reduce the number of DWIs needed for tensor fitting. Diff-DTI models the joint probability distribution of DWIs and DTI maps, supporting guided generation during inference. The incorporated feature enhancement fusion module further enhances the precision and details of the images generated by the diffusion model. Experiments were performed on three public DWI datasets. Results demonstrate that Diff-DTI achieves up to 10-fold acceleration (using 6 DWIs) while maintaining a relatively low normalized mean square error (NMSE) for the DTI maps (2.89% for FA, 0.89% for MD, 0.95% for AD, and 0.98% for RD). Even with only 3 DWIs, the NMSEs of the generated DTI maps rose only modestly, to 3.51% for FA, 0.89% for MD, 1.13% for AD, and 1.10% for RD. We conclude that Diff-DTI can significantly reduce the number of acquired DWIs and the scan time without substantially compromising image quality.

EAAI Journal 2026 Journal Article

Integrating whole-slide images and transcriptomic data for survival analysis using multimodal attention networks

  • Chunfeng Shao
  • Yuanshen Zhao
  • Yinsheng Chen
  • Jingxian Duan
  • Zeyu Zhang
  • Rongpin Wang
  • Dong Liang
  • Dongyue Chen

The integration of multimodal data holds great promise for tumor prognosis prediction by providing a more comprehensive view of the tumor. However, current fusion algorithms, which either merge features or use alignment mechanisms for complex integration, often overlook the relationships between different modalities. To address this issue, we propose a novel multimodal deep learning algorithm that integrates pathological images and transcriptomic data for tumor prognosis prediction. Our model not only uses information from each data modality but also mines the correlations between the two modalities through a Hierarchical Cross-Modal Attention (HCMA) mechanism. In the pathological branch, we designed a self-attention encoder and decoder to extract various levels of contextual features from the pathological images. In the transcriptomic branch, we employed a selective state space module specifically designed to capture the complex dependencies among genetic features. To capture the correlations between the two modalities, we used an HCMA transformer module to generate cross-modal features, including gene-guided pathological features and pathology-guided gene features. Finally, we apply a specific feature alignment mechanism to constrain these features before fusing them to predict survival outcomes. We validated the proposed algorithm on four diverse cancer datasets from The Cancer Genome Atlas (TCGA). To further test the performance of the proposed survival prediction model, we also validated it on an independent cohort of 86 glioma patients from Sun Yat-sen University Cancer Center. Results show that the model performs excellently in cancer survival analysis, outperforming both the comparative single-modal models and the multimodal fusion models.

JBHI Journal 2026 Journal Article

Low-Count PET Image Reconstruction With Generalized Sparsity Priors via Unrolled Deep Networks

  • Minghan Fu
  • Ming Fang
  • Bo Liao
  • Dong Liang
  • Zhanli Hu
  • Fang-Xiang Wu

Deep learning has demonstrated remarkable efficacy in reconstructing low-count PET (Positron Emission Tomography) images, attracting considerable attention in the medical imaging community. However, most existing deep learning approaches have not fully exploited the unique physical characteristics of PET imaging in the design of fidelity and prior regularization terms, resulting in constrained model performance and interpretability. In light of these considerations, we introduce an unrolled deep network based on maximum likelihood estimation for the Poisson distribution and a Generalized domain transformation for Sparsity learning, dubbed GS-Net. To address this complex optimization challenge, we employ the Alternating Direction Method of Multipliers (ADMM) framework, integrating a modified Expectation Maximization (EM) approach to address the primary objective and utilizing shrinkage thresholding to optimize the L1 norm term. Additionally, within this unrolled deep network, all hyperparameters are adaptively adjusted through end-to-end learning, eliminating the need for manual parameter tuning. Through extensive experiments on simulated patient brain datasets and real patient whole-body clinical datasets with multiple count levels, our method has demonstrated advanced performance compared to traditional non-iterative and iterative reconstruction, deep learning-based direct reconstruction, and hybrid unrolled methods, as shown by qualitative and quantitative evaluations.

AAAI Conference 2026 Conference Paper

MultiMedBench: A Scenario-Aware Benchmark for Evaluating Knowledge Editing in Medical VQA

  • Shengtao Wen
  • Haodong Chen
  • Yadong Wang
  • Zhongying Pan
  • Xiang Chen
  • Yu Tian
  • Bo Qian
  • Dong Liang

Knowledge editing (KE) provides a scalable approach for updating factual knowledge in large language models without full retraining. While previous studies have demonstrated effectiveness in general domains and medical QA tasks, little attention has been paid to KE in multimodal medical scenarios. Unlike text-only settings, medical KE demands integrating updated knowledge with visual reasoning to support safe and interpretable clinical decisions. To address this gap, we propose MultiMedBench, the first benchmark tailored to evaluating KE in clinical multimodal tasks. Our framework spans both understanding and reasoning task types, defines a three-dimensional metric suite (reliability, generality, and locality), and supports cross-paradigm comparisons across general and domain-specific models. We conduct extensive experiments under single-editing and lifelong-editing settings. Results suggest that current methods struggle with generalization and long-tail reasoning, particularly in complex clinical workflows. We further present an efficiency analysis (e.g., edit latency, memory footprint), revealing practical trade-offs in real-world deployment across KE paradigms. Overall, MultiMedBench not only reveals the limitations of current approaches but also provides a solid foundation for developing clinically robust knowledge editing techniques in the future.

JBHI Journal 2026 Journal Article

Organ-Aware Cross-Modality Registration Enables Attenuation Correction Without Repeated CT in Multi-Scan Total-Body PET/CT Imaging

  • Qianhao Chen
  • Lianghua Li
  • Hairong Zheng
  • Dong Liang
  • Jianjun Liu
  • Zhanli Hu

Multi-scan total-body PET/CT imaging, including dual-time-point and multi-tracer protocols, provides valuable metabolic information for enhanced disease diagnosis. However, the necessity for repeated CT scans for attenuation correction (AC) due to patient repositioning increases radiation exposure, raising safety concerns. To address this, we propose a novel organ-aware cross-modality registration model (OCRM) that enables accurate AC without repeated CT scans. The OCRM predicts a deformation field between the attenuation map of the first scan and the non-attenuation-corrected PET image from the subsequent scan, generating a pseudo-attenuation map that is subsequently used for AC during PET image reconstruction. Furthermore, to enhance registration accuracy and improve generalizability, we introduced an organ-aware mechanism that emphasizes alignment in both key organ regions and the global body contour. In comparison to existing deep learning-based AC methods, the OCRM can be deployed in a zero-shot manner across diverse multi-scan PET/CT imaging scenarios without retraining. We validated the method across four different multi-scan PET/CT scenarios, including dual-time-point and dual-tracer protocols. The proposed method outperformed state-of-the-art CT-free AC techniques, demonstrating superior performance in terms of RMSE, PSNR, and SSIM across both whole-body and specific anatomical regions. Specifically, in all four scenarios, SSIM values for the corrected PET images consistently exceeded 0.995, and the RMSE of standardized uptake values remained below 0.005. These results highlight the practical applicability and generalization capability of the proposed method, enabling its seamless deployment across diverse multi-scan PET/CT imaging scenarios.

AAAI Conference 2026 Conference Paper

Zero-shot Implicit Neural Manifold Representation (INMR) for Ultra-high Temporal Resolution Dynamic MRI

  • Jie Feng
  • Rui Luo
  • Tian Zeng
  • Xin Shen
  • Haikun Qi
  • Yuyao Zhang
  • Dong Liang
  • Hongjiang Wei

Capturing accurate dynamic information of moving organs is essential for functional assessment using non-invasive imaging modalities. Achieving high temporal resolution visualization of physiological processes remains a critical challenge in dynamic magnetic resonance imaging (MRI) when reconstructing from extremely limited acquisitions. We introduce an unsupervised zero-shot reconstruction framework combining Implicit Neural Representation (INR) with manifold learning, capable of reconstructing dynamic MRI data at unprecedented temporal resolutions (less than 10 ms per frame for 2D imaging, less than 400 ms per frame for 3D imaging). The framework employs learnable low-dimensional manifold vectors to autonomously capture motion in real time directly from undersampled data, and dynamically condition coordinate-based spatial representations to generate high-fidelity image sequences. Through a novel spatiotemporal coarse-to-fine (C2F) optimization strategy, our method outperforms current state-of-the-art (SOTA) techniques across multiple imaging scenarios, including cardiac, speech and dynamic-contrast-enhanced (DCE) abdominal MRI, demonstrating robust performance under challenging motion patterns and contrast dynamics. The learned manifolds additionally provide intuitive visualization of motion and contrast evolution during imaging. These advances indicate strong clinical potential for applications requiring extreme temporal resolution while maintaining both anatomical and temporal fidelity.

JBHI Journal 2025 Journal Article

Automatic Brain Segmentation for PET/MR Dual-Modal Images Through a Cross-Fusion Mechanism

  • Hongyan Tang
  • Zhenxing Huang
  • Wenbo Li
  • Yaping Wu
  • Jianmin Yuan
  • Yang Yang
  • Yan Zhang
  • Jing Qin

The precise segmentation of different brain regions and tissues is usually a prerequisite for the detection and diagnosis of various neurological disorders in neuroscience. Considering the abundance of functional and structural dual-modality information in positron emission tomography/magnetic resonance (PET/MR) images, we propose a novel 3D whole-brain segmentation network with a cross-fusion mechanism introduced to obtain 45 brain regions. Specifically, the network processes PET and MR images simultaneously, employing UX-Net and a cross-fusion block for feature extraction and fusion in the encoder. We test our method by comparing it with other deep learning-based methods, including 3DUXNET, SwinUNETR, UNETR, nnFormer, UNet3D, NestedUNet, ResUNet, and VNet. The experimental results demonstrate that the proposed method achieves better segmentation performance in terms of both visual and quantitative evaluation metrics and achieves more precise segmentation in three views while preserving fine details. In particular, the proposed method achieves superior quantitative results, with a Dice coefficient of 85.73% ± 0.01%, a Jaccard index of 76.68% ± 0.02%, a sensitivity of 85.00% ± 0.01%, a precision of 83.26% ± 0.03%, and a Hausdorff distance (HD) of 4.4885 ± 14.85%. Moreover, the distribution and correlation of the SUV in the volume of interest (VOI) are also evaluated (PCC > 0.9), indicating consistency with the ground truth and the superiority of the proposed method. In future work, we will utilize our whole-brain segmentation method in clinical practice to assist doctors in accurately diagnosing and treating brain diseases.

JBHI Journal 2025 Journal Article

Clinically Generalizable Low-Dose CT Denoising for Pediatric Imaging via Enhanced Diffusion Posterior Sampling

  • Hongmei Tang
  • Qianhao Chen
  • Qiyang Zhang
  • Zhaoting Cheng
  • Yang Wu
  • Shuang Song
  • Hairong Zheng
  • Dong Liang

In total-body positron emission tomography and computed tomography (PET/CT) imaging, reducing the radiation dose of diagnostic CT scans is essential for minimizing overall radiation exposure, particularly in pediatric patients. Although deep learning-based denoising methods have shown promise in restoring low-dose CT (LDCT) to normal-dose CT (NDCT) quality, most approaches rely on structurally aligned paired data, which are difficult to acquire in clinical practice. Models trained on synthetic pairs often exhibit limited generalizability to real LDCT data. Unconditional diffusion models demonstrate outstanding generalizability, but fail to preserve structural fidelity. To address these challenges, we propose an enhanced diffusion posterior sampling (E-DPS) framework that combines a one-step denoiser U-Net with an unconditional diffusion model. Specifically, the U-Net estimator, trained on simulated LDCT–NDCT pairs, provides preliminary denoised outputs as structural constraints, whereas the diffusion model captures the prior distribution of NDCT images to enhance realism and generalizability. During inference, the U-Net predictions are integrated as constraints with tunable weights, thereby guiding diffusion posterior sampling. In addition, an intermediate-stage initialization strategy is introduced, significantly reducing the number of required sampling steps. Extensive experiments on simulated LDCT datasets across three dose levels demonstrate the superiority of our method, yielding average PSNR gains of +5.2% and +4.3% at unseen dose levels compared with state-of-the-art approaches. Moreover, on real LDCT images, E-DPS exhibits strong zero-shot generalizability, achieving better noise suppression while preserving anatomical detail. These results highlight the robustness and clinical potential of E-DPS for LDCT denoising.

YNICL Journal 2025 Journal Article

Detection of structural-functional coupling abnormalities using multimodal brain networks in Alzheimer’s disease: A comparison of three computational models

  • Yinping Lu
  • Luyao Wang
  • Toshiya Murai
  • Jinglong Wu
  • Dong Liang
  • Zhilin Zhang

Alzheimer's disease (AD) is a progressive neurodegenerative disorder characterized by the disconnection of white matter fibers and disrupted functional connectivity of gray matter; however, the pathological mechanisms linking structural and functional changes remain unclear. This study aimed to explore the interaction between the structural and functional brain network in AD using advanced structural-functional coupling (S-F coupling) models to assess whether these changes correlate with cognitive function, Aβ deposition levels, and gene expression. In this study, we utilized multimodal magnetic resonance imaging data from 41 individuals with AD, 112 individuals with mild cognitive impairment, and 102 healthy controls to explore these mechanisms. We applied different computational models to examine the changes in the S-F coupling associated with AD. Our results showed that the communication and graph harmonic models demonstrated greater heterogeneity and were more sensitive than the statistical models in detecting AD-related pathological changes. In addition, S-F coupling increases with AD progression at the global, subnetwork, and regional node levels, especially in the medial prefrontal and anterior cingulate cortices. The S-F coupling of these regions also partially mediated cognitive decline and Aβ deposition. Furthermore, gene enrichment analysis revealed that changes in S-F coupling were strongly associated with the regulation of cellular catabolic processes. This study advances our understanding of the interaction between structural and functional connectivity and highlights the importance of S-F coupling in elucidating the neural mechanisms underlying cognitive decline in AD.

NeurIPS Conference 2025 Conference Paper

DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models

  • Fayi Le
  • Wenwu He
  • Chentao Cao
  • Dong Liang
  • Zhuo-Xu Cui

Pretrained vision-language models (VLMs), such as CLIP, have shown promising zero-shot out-of-distribution (OOD) detection capabilities by leveraging semantic similarities between input images and textual labels. However, most existing approaches focus solely on expanding the label space in the text domain, ignoring complementary visual cues that can further enhance discriminative power. In this paper, we introduce DualCnst, a novel framework that integrates text-image dual consistency for improved zero-shot OOD detection. Specifically, we generate synthetic images from both ID and mined OOD textual labels using a text-to-image generative model, and jointly evaluate each test image based on (i) its semantic similarity to class labels and (ii) its visual similarity to the synthesized images. The resulting unified score function effectively combines multimodal information without requiring access to in-distribution images or additional training. We further provide theoretical analysis showing that incorporating multimodal negative labels reduces score variance and improves OOD separability. Extensive experiments across diverse OOD benchmarks demonstrate that DualCnst achieves state-of-the-art performance while remaining scalable, data-agnostic, and fully compatible with prior text-only VLM-based methods.

JBHI Journal 2025 Journal Article

FE-DIC-Based Motion and Intensity Correction for Enhanced CEST-MRI Registration

  • Haizhou Liu
  • Yijia Zheng
  • Zhou Liu
  • Yuxi Jin
  • Zhihua Li
  • Jidong Han
  • Ziang Di
  • Hairong Zheng

Physiological and external motion cause inter-frame misalignment in chemical exchange saturation transfer magnetic resonance imaging (CEST-MRI), thereby compromising quantitative accuracy. In CEST-MRI, saturation effects induce intensity variations, resulting in motion-intensity coupling that makes registration particularly challenging. To address this issue, we extend the finite element digital image correlation (FE-DIC) framework by introducing an alternating correction strategy that iteratively refines both motion and intensity estimation. Unlike conventional FE-DIC approaches that assume intensity constancy, the proposed method incorporates mechanical regularization to suppress non-physical deformations, alongside intensity correction to compensate for reference–target contrast discrepancies. This mutual reinforcement enables progressively improved registration across the CEST sequence. The robustness and effectiveness of the method were evaluated on three datasets. In simulated liver data, it maintains RMSE within 0.4 pixels, reducing error by 0.5 pixels compared to RPCA & PCA (a PCA-based synthetic reference generation method for CEST registration). On clinical brain and pig cardiac data, it achieves an average SSIM of 0.83, outperforming RPCA & PCA by 0.03 and surpassing CNN-based registration (e.g., AirLab) by 0.10. The consistent results across datasets highlight its generalizability, making it a promising tool for metabolic quantification in clinical and research settings.

NeurIPS Conference 2025 Conference Paper

HOComp: Interaction-Aware Human-Object Composition

  • Dong Liang
  • Jinyuan Jia
  • Yuhao Liu
  • Rynson Lau

While existing image-guided composition methods may help insert a foreground object onto a user-specified region of a background image, achieving natural blending inside the region while leaving the rest of the image unchanged, we observe that these methods often struggle to synthesize seamless, interaction-aware compositions when the task involves human-object interactions. In this paper, we first propose HOComp, a novel approach for compositing a foreground object onto a human-centric background image, while ensuring harmonious interactions between the foreground object and the background person and their consistent appearances. Our approach includes two key designs: (1) MLLMs-driven Region-based Pose Guidance (MRPG), which utilizes MLLMs to identify the interaction region as well as the interaction type (e.g., holding and lifting) to provide coarse-to-fine constraints on the generated pose for the interaction, while incorporating human pose landmarks to track action variations and enforce fine-grained pose constraints; and (2) Detail-Consistent Appearance Preservation (DCAP), which unifies a shape-aware attention modulation mechanism, a multi-view appearance loss, and a background consistency loss to ensure consistent shapes/textures of the foreground and faithful reproduction of the background human. We then propose the first dataset for the task, named Interaction-aware Human-Object Composition (IHOC). Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.

JBHI Journal 2025 Journal Article

Latent Diffusion Model with Estimation Posterior Sampling: A Unified Framework for General Medical Image Restoration

  • Qianhao Chen
  • Hanzhong Wang
  • Yi An
  • Meiyuan Wen
  • Biao Li
  • Hairong Zheng
  • Dong Liang
  • Zhanli Hu

Clinical imaging protocols designed to accelerate acquisition or reduce radiation dose often lead to degraded image quality, compromising diagnostic confidence. The heterogeneity in degradation types and severities across imaging modalities further challenges the development of generalized restoration solutions. In this work, we introduce a unified framework that formulates medical image restoration as posterior sampling from self-supervised Latent Diffusion Models (LDMs), pretrained on multi-modal high-quality images. At the core of our method is an Estimation Posterior Sampling (EPS) strategy, which enhances both data fidelity and anatomical detail retention. EPS incorporates two key components: (i) estimated diffusion initialization to constrain sampling within the measurement-consistent solution space, and (ii) gradient-balanced optimization to adaptively trade off denoising strength and detail preservation throughout the diffusion trajectory. Unlike traditional task-specific models, our approach enables Plug-and-Play (PnP) deployment, supporting diverse degradations without retraining. Extensive experiments conducted on deterministic degradations (e.g., under-sampled MRI, sparse-view CT) and blind degradations (e.g., low-dose PET) across multiple degradation levels demonstrate superior quantitative and qualitative performance compared to both supervised baselines and state-of-the-art posterior sampling methods. Notably, our method achieves PSNR improvements of up to +2.9 dB (MRI), +1.1 dB (CT), and +0.9 dB (PET) in PnP mode. These results highlight the robustness and broad applicability of our framework for clinical deployment.

IJCAI Conference 2025 Conference Paper

LoD: Loss-difference OOD Detection by Intentionally Label-Noisifying Unlabeled Wild Data

  • Chuanxing Geng
  • Qifei Li
  • Xinrui Wang
  • Dong Liang
  • Songcan Chen
  • Pong C. Yuen

Using unlabeled wild data containing both in-distribution (ID) and out-of-distribution (OOD) data to improve the safety and reliability of models has recently received increasing attention. Existing methods either design customized losses for labeled ID and unlabeled wild data and then perform joint optimization, or first filter OOD data out of the latter and then learn an OOD detector. While achieving varying degrees of success, two potential issues remain: (i) labeled ID data typically dominates model learning, inevitably making models tend to fit OOD data as ID; (ii) the selection of thresholds for identifying OOD data in unlabeled wild data usually faces a dilemma due to the unavailability of pure OOD samples. To address these issues, we propose a novel loss-difference OOD detection framework (LoD) that intentionally label-noisifies unlabeled wild data. This operation not only enables labeled ID data and the OOD data in unlabeled wild data to jointly dominate model learning but also ensures the distinguishability of the losses between ID and OOD samples in unlabeled wild data, allowing a classic clustering technique (e.g., K-means) to filter these OOD samples without requiring thresholds. We also provide a theoretical foundation for LoD's viability, and extensive experiments verify its superiority.

YNIMG Journal 2025 Journal Article

MR-guided graph learning of 18F-florbetapir PET enables accurate and interpretable Alzheimer’s disease staging

  • Xinyi Chen
  • Lijuan Chen
  • Weiheng Yao
  • Qiankun Zuo
  • Ye Li
  • Dong Liang
  • Shuqiang Wang
  • Meiyun Wang

PURPOSE: Subtle structural and molecular brain changes make noninvasive early detection and staging of Alzheimer's disease (AD) challenging and critical for effective intervention. This study develops a novel graph convolutional network (GCN) learning framework that integrates amyloid-β PET imaging and MRI structural features, aiming for improved early detection and accurate staging of AD. METHODS: This retrospective study utilized 18F-florbetapir PET scans from the Alzheimer's Disease Neuroimaging Initiative (ADNI) as the training dataset (323 scans from 196 subjects: 45 normal control, 80 mild cognitive impairment/MCI, 71 AD) and two independent datasets for testing (99 scans from 85 subjects: 31 normal control, 15 MCI, 44 AD). Individual brain graphs were constructed for each PET scan, and a graph learning framework was designed to extract molecular features from PET while integrating structural features from MRI. Performance was evaluated using receiver operating characteristic (ROC) analysis, comparing results against cortical SUVR. Additionally, a biomarker GCN_score was defined based on identified salient regions-of-interest, with its effectiveness assessed using the Kruskal-Wallis test and Cohen's effect size. RESULTS: The framework achieved AUCs of 89.8% (specificity 83.6%, sensitivity 81.6%) for distinguishing MCI from normal controls and 88.3% (specificity 81.6%, sensitivity 80.6%) for MCI from AD in the ADNI dataset, with comparable performance in external testing. All results significantly outperformed cortical SUVR (DeLong test p < 0.001). The GCN_score demonstrated superior group differentiation (Cohen's effect sizes 1.744 and 1.32) compared to cortical SUVR (0.309 and 0.641). CONCLUSION: The proposed graph-based learning framework effectively integrates PET and MRI features for accurate AD stage distinction, showing significant promise for early detection and facilitating timely intervention.

JBHI Journal 2025 Journal Article

Multistage Diffusion Model With Phase Error Correction for Fast PET Imaging

  • Yunlong Gao
  • Zhenxing Huang
  • Xingyu Xie
  • Wenjie Zhao
  • Qianyi Yang
  • Xinlan Yang
  • Yongfeng Yang
  • Hairong Zheng

Fast PET imaging is clinically important for reducing motion artifacts and improving patient comfort. While recent diffusion-based deep learning methods have shown promise, they often fail to capture the true PET degradation process, suffer from accumulated inference errors, introduce artifacts, and require extensive reconstruction iterations. To address these challenges, we propose a novel multistage diffusion framework tailored for fast PET imaging. At the coarse level, we design a multistage structure to approximate the temporal non-linear PET degradation process in a data-driven manner, using paired PET images collected under different acquisition durations. A Phase Error Correction Network (PECNet) ensures consistency across stages by correcting accumulated deviations. At the fine level, we introduce a deterministic cold diffusion mechanism, which simulates intra-stage degradation through interpolation between known acquisition durations, significantly reducing reconstruction iterations to as few as 10. Evaluations on [68Ga]FAPI and [18F]FDG PET datasets demonstrate the superiority of our approach, achieving peak PSNRs of 36.2 dB and 39.0 dB, respectively, with average SSIMs over 0.97. Our framework offers high-fidelity PET imaging with fewer iterations, making it practical for accelerated clinical imaging.

JBHI Journal 2025 Journal Article

PEARL: Cascaded Self-Supervised Cross-Fusion Learning for Parallel MRI Acceleration

  • Qingyong Zhu
  • Bei Liu
  • Zhuo-Xu Cui
  • Chentao Cao
  • Xiaomeng Yan
  • Yuanyuan Liu
  • Jing Cheng
  • Yihang Zhou

Supervised deep learning (SDL) methodology holds promise for accelerated magnetic resonance imaging (AMRI) but is hampered by the reliance on extensive training data. Some self-supervised frameworks, such as deep image prior (DIP), have emerged, eliminating the explicit training procedure but often struggling to remove noise and artifacts under significant degradation. This work introduces a novel self-supervised accelerated parallel MRI approach called PEARL, leveraging a multiple-stream joint deep decoder with two cross-fusion schemes to accurately reconstruct one or more target images from compressively sampled k-space. Each stream comprises cascaded cross-fusion sub-block networks (SBNs) that sequentially perform combined upsampling, 2D convolution, joint attention, ReLU activation and batch normalization (BN). Among them, combined upsampling and joint attention facilitate mutual learning between multiple-stream networks by integrating multi-parameter priors in both additive and multiplicative manners. Long-range unified skip connections within SBNs ensure effective information propagation between distant cross-fusion layers. Additionally, incorporating dual-normalized edge-orientation similarity regularization into the training loss enhances detail reconstruction and prevents overfitting. Experimental results consistently demonstrate that PEARL outperforms the existing state-of-the-art (SOTA) self-supervised AMRI technologies in various MRI cases. Notably, 5- to 6-fold accelerated acquisition yields a 1% to 2% improvement in SSIM_ROI and a 3% to 6% improvement in PSNR_ROI, along with a significant 15% to 20% reduction in RLNE_ROI.

AAAI Conference 2025 Conference Paper

StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

  • Yachao Li
  • Dong Liang
  • Tianyu Ding
  • Sheng-Jun Huang

Diffusion-based models have shown great promise in real-world image super-resolution (Real-ISR), but often generate content with structural errors and spurious texture details due to the empirical priors and hallucinations of these models. To address this issue, we introduce StructSR, a simple, effective, and plug-and-play method that enhances structural fidelity and suppresses spurious details for diffusion-based Real-ISR. StructSR operates without the need for additional fine-tuning, external model priors, or high-level semantic knowledge. At its core is the Structure-Aware Screening (SAS) mechanism, which identifies the image with the highest structural similarity to the low-resolution (LR) input in the early inference stage, allowing us to leverage it as historical structure knowledge to suppress the generation of spurious details. By intervening in the diffusion inference process, StructSR seamlessly integrates with existing diffusion-based Real-ISR models. Our experimental results demonstrate that StructSR significantly improves the fidelity of structure and texture, improving the PSNR and SSIM metrics by an average of 5.27% and 9.36% on a synthetic dataset (DIV2K-Val) and 4.13% and 8.64% on two real-world datasets (RealSR and DRealSR) when integrated with four state-of-the-art diffusion-based Real-ISR methods.
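A minimal sketch of the screening idea, assuming zero-mean normalized cross-correlation as a dependency-free stand-in for the SSIM comparison the paper actually uses; the toy candidates are illustrative:

```python
import numpy as np

def structure_score(a, b):
    # Zero-mean normalized cross-correlation: a simple structural-similarity
    # proxy (the paper's SAS mechanism uses SSIM).
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def structure_aware_screening(lr, candidates):
    """Return the index of the early-stage candidate most structurally
    similar to the low-resolution input."""
    return int(np.argmax([structure_score(lr, c) for c in candidates]))

rng = np.random.default_rng(0)
lr = rng.normal(size=(8, 8))
good = lr + 0.1 * rng.normal(size=(8, 8))   # structurally faithful candidate
bad = rng.normal(size=(8, 8))               # structurally unrelated candidate
print(structure_aware_screening(lr, [bad, good]))  # 1
```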

JBHI Journal 2024 Journal Article

A Two-Stage Generative Model with CycleGAN and Joint Diffusion for MRI-based Brain Tumor Detection

  • Wenxin Wang
  • Zhuo-Xu Cui
  • Guanxun Cheng
  • Chentao Cao
  • Xi Xu
  • Ziwei Liu
  • Haifeng Wang
  • Yulong Qi

Accurate detection and segmentation of brain tumors is critical for medical diagnosis. However, current supervised learning methods require extensively annotated images, and the state-of-the-art generative models used in unsupervised methods often have limitations in covering the whole data distribution. In this paper, we propose a novel framework, Two-Stage Generative Model (TSGM), that combines a Cycle Generative Adversarial Network (CycleGAN) and a Variance Exploding stochastic differential equation using joint probability (VE-JP) to improve brain tumor detection and segmentation. The CycleGAN is trained on unpaired data to generate abnormal images from healthy images as a data prior. Then VE-JP is implemented to reconstruct healthy images using synthetic paired abnormal images as a guide, which alters only pathological regions and leaves healthy regions unchanged. Notably, our method directly learns the joint probability distribution for conditional generation. The residual between the input and reconstructed images indicates the abnormalities, and a thresholding method is subsequently applied to obtain segmentation results. Furthermore, the multimodal results are combined with different weights to further improve segmentation accuracy. We validated our method on three datasets and compared it with other unsupervised methods for anomaly detection and segmentation. DSC scores of 0.8590 on the BraTS2020 dataset, 0.6226 on the ITCS dataset, and 0.7403 on the in-house dataset show that our method achieves better segmentation performance and better generalization.
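The residual-thresholding step, including the weighted multimodal combination, can be sketched as follows; the toy lesion, weights, and threshold are illustrative, not the paper's values:

```python
import numpy as np

def residual_segmentation(inputs, recons, weights, thresh):
    """Per-modality anomaly residuals |x - x_healthy|, combined with modality
    weights and thresholded into a binary tumor mask."""
    residual = sum(w * np.abs(x - r) for x, r, w in zip(inputs, recons, weights))
    return (residual > thresh).astype(np.uint8)

x = np.zeros((4, 4))
x[1, 1] = 1.0                 # a "lesion" that the healthy reconstruction removes
recon = np.zeros((4, 4))      # reconstructed healthy image
mask = residual_segmentation([x], [recon], [1.0], thresh=0.5)
print(int(mask.sum()), int(mask[1, 1]))  # 1 1
```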

JBHI Journal 2024 Journal Article

Accurate Whole-Brain Image Enhancement for Low-Dose Integrated PET/MR Imaging Through Spatial Brain Transformation

  • Zhenxing Huang
  • Wenbo Li
  • Yaping Wu
  • Lin Yang
  • Yun Dong
  • Yongfeng Yang
  • Hairong Zheng
  • Dong Liang

Positron emission tomography/magnetic resonance imaging (PET/MRI) systems can provide precise anatomical and functional information with exceptional sensitivity and accuracy for neurological disorder detection. Nevertheless, the radiation exposure risks and economic costs of radiopharmaceuticals may pose significant burdens on patients. To mitigate image quality degradation during low-dose PET imaging, we proposed a novel 3D network equipped with a spatial brain transform (SBF) module for low-dose whole-brain PET and MR images to synthesize high-quality PET images. The FreeSurfer toolkit was applied to derive the spatial brain anatomical alignment information, which was then fused with low-dose PET and MR features through the SBF module. Moreover, several deep learning methods were employed as comparison measures to evaluate the model performance, with the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and Pearson correlation coefficient (PCC) serving as quantitative metrics. Both the visual and quantitative results illustrated the effectiveness of our approach. The obtained PSNR and SSIM were 41.96 ± 4.91 dB (p < 0.01) and 0.9654 ± 0.0215 (p < 0.01), which achieved a 19% and 20% improvement, respectively, compared to the original low-dose brain PET images. The volume of interest (VOI) analysis of brain regions such as the left thalamus (PCC = 0.959) also showed that the proposed method could achieve a more accurate standardized uptake value (SUV) distribution while preserving the details of brain structures. In future work, we hope to apply our method to other multimodal systems, such as PET/CT, to assist clinical brain disease diagnosis and treatment.

IROS Conference 2024 Conference Paper

Coalition Formation Game Approach for Task Allocation in Heterogeneous Multi-Robot Systems under Resource Constraints

  • Liwang Zhang
  • Dong Liang
  • Minglong Li
  • Wenjing Yang 0002
  • Shaowu Yang

This paper studies a case of the multi-robot task allocation (MRTA) problem, where each unmanned aerial vehicle (UAV) is endowed with multiple but limited resources. Completing each task necessitates UAVs to combine different resources through coalition formation, which will incur various costs including flight cost, execution cost, and cooperation cost. To minimize the total cost while maximizing both task completion rate and resource utilization rate, we model the MRTA problem of the UAVs as a leader-follower coalition formation game. In this game, leader UAVs coordinate follower UAVs to fulfill task resource requisites. Meanwhile, follower UAVs select suitable coalitions to join based on the altruistic preference. Theoretical analysis confirms the existence of a Nash stable partition in the coalition formation game. To achieve this stable partition, we propose a coalition formation algorithm. Simulation experiments validate that the proposed algorithm outperforms existing methods for the MRTA problem under resource constraints in terms of both task completion rate and resource utilization rate.
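Stable-partition formation can be sketched with better-response dynamics; the congestion-style utility below is a stand-in for the paper's altruistic preference and cost model, and convergence is only guaranteed for potential-game preferences like this toy one:

```python
def better_response_partition(n_followers, n_tasks, utility):
    """Greedy better-response dynamics: each follower switches to the task
    coalition that strictly improves its utility, stopping once no follower
    benefits from switching (a Nash stable partition)."""
    assign = {f: 0 for f in range(n_followers)}
    changed = True
    while changed:
        changed = False
        for f in range(n_followers):
            best = max(range(n_tasks), key=lambda t: utility(f, t, assign))
            if utility(f, best, assign) > utility(f, assign[f], assign):
                assign[f] = best
                changed = True
    return assign

# Toy utility: task value minus the number of other coalition members.
value = [5.0, 4.0]
def utility(f, t, assign):
    load = sum(1 for g, tg in assign.items() if tg == t and g != f)
    return value[t] - load

print(better_response_partition(3, 2, utility))  # {0: 1, 1: 0, 2: 0}
```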

AAAI Conference 2024 Conference Paper

G2L-CariGAN: Caricature Generation from Global Structure to Local Features

  • Xin Huang
  • Yunfeng Bai
  • Dong Liang
  • Feng Tian
  • Jinyuan Jia

Existing GAN-based approaches to caricature generation mainly focus on exaggerating a character's global facial structure. This often leads to failure in highlighting significant facial features such as big eyes and a hook nose. To address this limitation, we propose a new approach termed G2L-CariGAN, which uses feature maps of spatial dimensions instead of latent codes for geometric exaggeration. G2L-CariGAN first exaggerates the global facial structure of the character on a low-dimensional feature map and then exaggerates its local facial features on a high-dimensional feature map. Moreover, we develop a caricature identity loss function based on feature maps, which retains the character's identity well after exaggeration. Our experiments demonstrate that G2L-CariGAN outperforms the state of the art in both the quality of exaggeration and the retention of a character's identity.

JBHI Journal 2024 Journal Article

MMCA-NET: A Multimodal Cross Attention Transformer Network for Nasopharyngeal Carcinoma Tumor Segmentation Based on a Total-Body PET/CT System

  • Wenjie Zhao
  • Zhenxing Huang
  • Si Tang
  • Wenbo Li
  • Yunlong Gao
  • Yingying Hu
  • Wei Fan
  • Chuanli Cheng

Nasopharyngeal carcinoma (NPC) is a malignant tumor primarily treated by radiotherapy. Accurate delineation of the target tumor is essential for improving the effectiveness of radiotherapy. However, the segmentation performance of current models is unsatisfactory due to poor boundaries, large-scale tumor volume variation, and the labor-intensive nature of manual delineation for radiotherapy. In this paper, MMCA-Net, a novel segmentation network for NPC using PET/CT images that incorporates an innovative multimodal cross attention transformer (MCA-Transformer) and a modified U-Net architecture, is introduced to enhance modal fusion by leveraging cross-attention mechanisms between CT and PET data. Our method, tested against ten algorithms via fivefold cross-validation on samples from Sun Yat-sen University Cancer Center and the public HECKTOR dataset, consistently topped all four evaluation metrics with average Dice similarity coefficients of 0.815 and 0.7944, respectively. Furthermore, ablation experiments were conducted to demonstrate the superiority of our method over multiple baseline and variant techniques. The proposed method has promising potential for application in other tasks.
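Single-head cross attention between two modalities can be sketched in NumPy; the token counts, feature dimension, and the PET-as-query choice are illustrative assumptions, not the MCA-Transformer's exact design:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(pet_tokens, ct_tokens, Wq, Wk, Wv):
    """Single-head cross attention: queries from PET tokens attend over CT
    keys/values, fusing anatomical context into the metabolic branch."""
    Q, K, V = pet_tokens @ Wq, ct_tokens @ Wk, ct_tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
    return attn @ V

rng = np.random.default_rng(0)
pet, ct = rng.normal(size=(6, 8)), rng.normal(size=(10, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
fused = cross_attention(pet, ct, Wq, Wk, Wv)
print(fused.shape)  # (6, 8)
```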

NeurIPS Conference 2024 Conference Paper

Theoretical Investigations and Practical Enhancements on Tail Task Risk Minimization in Meta Learning

  • Yiqin Lv
  • Qi Wang
  • Dong Liang
  • Zheng Xie

Meta learning is a promising paradigm in the era of large models, and task distributional robustness has become an indispensable consideration in real-world scenarios. Recent advances have examined the effectiveness of tail task risk minimization in fast adaptation robustness improvement (Wang et al. 2023). This work contributes more theoretical investigations and practical enhancements to the field. Specifically, we reduce the distributionally robust strategy to a max-min optimization problem, constitute the Stackelberg equilibrium as the solution concept, and estimate the convergence rate. In the presence of tail risk, we further derive the generalization bound, establish connections with estimated quantiles, and practically improve the studied strategy. Accordingly, extensive evaluations demonstrate the significance of our proposal in boosting robustness.
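The tail-risk quantity can be illustrated with a plain CVaR estimate over sampled task losses; this is a sketch of the general objective, not the paper's estimator or optimization procedure:

```python
import numpy as np

def tail_risk(task_losses, alpha=0.9):
    """CVaR_alpha: the average loss over the worst (1 - alpha) tail of tasks,
    the quantity a distributionally robust strategy seeks to minimize."""
    losses = np.asarray(task_losses, dtype=float)
    var = np.quantile(losses, alpha)          # estimated alpha-quantile (VaR)
    return float(losses[losses >= var].mean())

print(tail_risk(np.arange(1, 11), alpha=0.9))  # 10.0
```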

JBHI Journal 2023 Journal Article

A Two-Branch Neural Network for Short-Axis PET Image Quality Enhancement

  • Minghan Fu
  • Meiyun Wang
  • Yaping Wu
  • Na Zhang
  • Yongfeng Yang
  • Haining Wang
  • Yun Zhou
  • Yue Shang

The axial field of view (FOV) is a key factor that affects the quality of PET images. Due to hardware FOV restrictions, conventional short-axis PET scanners with FOVs of 20 to 35 cm can acquire only low-quality PET (LQ-PET) images in fast scanning times (2–3 minutes). To overcome hardware restrictions and improve PET image quality for better clinical diagnoses, several deep learning-based algorithms have been proposed. However, these approaches use simple convolution layers with residual learning and local attention, which insufficiently extract and fuse long-range contextual information. To this end, we propose a novel two-branch network architecture with Swin Transformer units and a graph convolution operation, namely SW-GCN. The proposed SW-GCN provides additional spatial- and channel-wise flexibility to handle different types of input information flow. Specifically, considering the high computational cost of calculating self-attention weights over full-size PET images, in our designed spatial adaptive branch we compute self-attention within each local partition window and introduce global information interactions between nonoverlapping windows through shifting operations, avoiding this cost. In addition, a standard convolutional network structure weighs the information in each channel equally during feature extraction. In our designed channel adaptive branch, we instead use a Watts-Strogatz topology to connect each feature map to only its most relevant features in each graph convolutional layer, substantially reducing information redundancy. Moreover, ensemble learning is adopted in our SW-GCN to map the distinct features from the two well-designed branches to the enhanced PET images. We carried out extensive experiments on three single-bed position scans for 386 patients. The test results demonstrate that our proposed SW-GCN approach outperforms state-of-the-art methods in both quantitative and qualitative evaluations.

IJCAI Conference 2023 Conference Paper

ALL-E: Aesthetics-guided Low-light Image Enhancement

  • Ling Li
  • Dong Liang
  • Yuanhang Gao
  • Sheng-Jun Huang
  • Songcan Chen

Evaluating the performance of low-light image enhancement (LLE) is highly subjective, making it necessary to integrate human preferences into image enhancement. Existing methods fail to consider this and instead train enhancement models against a series of potentially valid heuristic criteria. In this paper, we propose a new paradigm, aesthetics-guided low-light image enhancement (ALL-E), which introduces aesthetic preferences to LLE and casts training in a reinforcement learning framework with an aesthetic reward. Each pixel, functioning as an agent, refines itself by recursive actions, i.e., its corresponding adjustment curve is estimated sequentially. Extensive experiments show that integrating aesthetic assessment improves both subjective experience and objective evaluation. Our results on various benchmarks demonstrate the superiority of ALL-E over state-of-the-art methods. Source code: https://dongl-group.github.io/project_pages/ALLE.html
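The recursive per-pixel adjustment resembles Zero-DCE-style curves; a minimal sketch, assuming a quadratic adjustment curve and predicted curve maps A (the aesthetic reward and RL loop are omitted):

```python
import numpy as np

def enhance(x, curve_maps):
    """Sequential per-pixel curve adjustment: each step applies
    x <- x + A * x * (1 - x); for A in [-1, 1], x stays inside [0, 1]."""
    for A in curve_maps:
        x = x + A * x * (1.0 - x)
    return x

x = np.full((2, 2), 0.2)                       # dark input intensities
out = enhance(x, [np.full((2, 2), 0.5)])       # one predicted curve map
print(out[0, 0])  # 0.28
```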

AIIM Journal 2023 Journal Article

MLNAN: Multi-level noise-aware network for low-dose CT imaging implemented with constrained cycle Wasserstein generative adversarial networks

  • Zhenxing Huang
  • Wenbo Li
  • Yunling Wang
  • Zhou Liu
  • Qiyang Zhang
  • Yuxi Jin
  • Ruodai Wu
  • Guotao Quan

Low-dose CT techniques attempt to minimize the radiation exposure of patients by estimating high-resolution normal-dose CT images, reducing the risk of radiation-induced cancer. In recent years, many deep learning methods have been proposed to solve this problem by building a mapping function between low-dose CT images and their high-dose counterparts. However, most of these methods ignore the effect of different radiation doses on the final CT images, which results in large differences in the intensity of the noise observable in CT images. What is more, the noise intensity of low-dose CT images differs significantly across medical device manufacturers. In this paper, we propose a multi-level noise-aware network (MLNAN), implemented with constrained cycle Wasserstein generative adversarial networks, to recover low-dose CT images under uncertain noise levels. In particular, the noise level is classified and the prediction is reused as a prior pattern in the generator networks. Moreover, the discriminator network incorporates noise-level determination. Under two dose-reduction strategies, experiments to evaluate the performance of the proposed method are conducted on two datasets, including the simulated clinical AAPM challenge datasets and commercial CT datasets from United Imaging Healthcare (UIH). The experimental results illustrate the effectiveness of our proposed method in terms of noise suppression and structural detail preservation compared with several other deep-learning-based methods. Ablation studies validate the effectiveness of the individual components regarding the afforded performance improvement. Further research for practical clinical applications and other medical modalities is required in future work.

AAAI Conference 2022 Conference Paper

I Can Find You! Boundary-Guided Separated Attention Network for Camouflaged Object Detection

  • Hongwei Zhu
  • Peng Li
  • Haoran Xie
  • Xuefeng Yan
  • Dong Liang
  • Dapeng Chen
  • Mingqiang Wei
  • Jing Qin

Can you find me? By simulating how humans discover so-called 'perfectly' camouflaged objects, we present a novel boundary-guided separated attention network (BSA-Net). Beyond existing camouflaged object detection (COD) wisdom, BSA-Net utilizes two-stream separated attention modules to highlight the separator (i.e., the camouflaged object's boundary) between an image's background and foreground: the reverse attention stream helps erase the camouflaged object's interior to focus on the background, while the normal attention stream recovers the interior and thus pays more attention to the foreground; both streams are followed by a boundary guider module and combined to strengthen the understanding of the boundary. The core design of such separated attention is motivated by how humans perform COD: find the subtle differences between foreground and background to delineate the boundary of a camouflaged object, and then use the boundary to further enhance COD accuracy. We validate on three benchmark datasets that our BSA-Net is very beneficial for detecting camouflaged objects with blurred boundaries and with colors/patterns similar to their backgrounds. Extensive results exhibit very clear COD improvements of our BSA-Net over sixteen SOTAs.
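The two-stream split can be sketched as complementary attention masks; the sigmoid saliency map and toy features below are illustrative assumptions, not BSA-Net's learned modules:

```python
import numpy as np

def separated_attention(features):
    """Two-stream separated attention: the normal stream weights features by a
    saliency map (foreground), while the reverse stream weights by (1 - map),
    erasing the object interior to attend to the background."""
    attn = 1.0 / (1.0 + np.exp(-features))     # sigmoid saliency map
    return features * attn, features * (1.0 - attn)

f = np.array([[2.0, -2.0]])                    # toy foreground/background responses
normal, reverse = separated_attention(f)
print(np.allclose(normal + reverse, f))  # True (the two streams partition f)
```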

AAAI Conference 2022 Conference Paper

Semantically Contrastive Learning for Low-Light Image Enhancement

  • Dong Liang
  • Ling Li
  • Mingqiang Wei
  • Shuo Yang
  • Liyan Zhang
  • Wenhan Yang
  • Yun Du
  • Huiyu Zhou

Low-light image enhancement (LLE) remains challenging due to the prevailing low-contrast and weak-visibility problems of single RGB images. In this paper, we respond to an intriguing learning-related question: can leveraging both accessible unpaired over/underexposed images and high-level semantic guidance improve the performance of cutting-edge LLE models? Here, we propose an effective semantically contrastive learning paradigm for LLE (namely SCL-LLE). Beyond existing LLE wisdom, it casts the image enhancement task as multi-task joint learning, where LLE is converted into three constraints of contrastive learning, semantic brightness consistency, and feature preservation that simultaneously ensure exposure, texture, and color consistency. SCL-LLE allows the LLE model to learn from unpaired positives (normal-light) and negatives (over/underexposed), and enables it to interact with the scene semantics to regularize the image enhancement network; yet the interaction of high-level semantic knowledge and the low-level signal prior is seldom investigated in previous methods. Training on readily available open data, extensive experiments demonstrate that our method surpasses state-of-the-art LLE models over six independent cross-scene datasets. Moreover, SCL-LLE's potential to benefit downstream semantic segmentation under extremely dark conditions is discussed. Source code: https://github.com/LingLIx/SCL-LLE.

AAAI Conference 2021 Short Paper

Is Each Layer Non-trivial in CNN? (Student Abstract)

  • Wei Wang
  • Yanjie Zhu
  • Zhuoxu Cui
  • Dong Liang

Convolutional neural network (CNN) models have achieved great success in many fields. With the advent of ResNet, networks used in practice are getting deeper and wider. However, is each layer non-trivial in these networks? To answer this question, we trained a network on the training set, replaced some of its convolution kernels with zeros, and tested the resulting models on the test set. Comparing the experimental results with the baseline, we showed that the modified networks reach similar or even identical performance. Although convolution kernels are the cores of networks, we demonstrate that some of them are trivial and regular in ResNet.

JBHI Journal 2021 Journal Article

Learning a Deep CNN Denoising Approach Using Anatomical Prior Information Implemented With Attention Mechanism for Low-Dose CT Imaging on Clinical Patient Data From Multiple Anatomical Sites

  • Zhenxing Huang
  • Xinfeng Liu
  • Rongpin Wang
  • Zixiang Chen
  • Yongfeng Yang
  • Xin Liu
  • Hairong Zheng
  • Dong Liang

Dose reduction in computed tomography (CT) has gained considerable attention in clinical applications because it decreases radiation risks. However, a lower dose generates noise in low-dose computed tomography (LDCT) images. Previous deep learning (DL)-based works have investigated ways to improve diagnostic performance to address this ill-posed problem. However, most of them disregard the anatomical differences among different human body sites in constructing the mapping function between LDCT images and their high-resolution normal-dose CT (NDCT) counterparts. In this article, we propose a novel deep convolutional neural network (CNN) denoising approach that introduces anatomical prior information. Instead of designing multiple networks for each independent human body anatomical site, a unified network framework is employed to process the anatomical information. The anatomical prior is represented as a pattern of weights over the features extracted from the corresponding LDCT image in an anatomical prior fusion module. To promote diversity in the contextual information, a spatial attention fusion mechanism is introduced to capture many local regions of interest in the attention fusion module. Although many network parameters are saved, the experimental results demonstrate that our method, which incorporates anatomical prior information, is effective in denoising LDCT images. Furthermore, the anatomical prior fusion module can be conveniently integrated into other DL-based methods and improves performance on data from multiple anatomical sites.

YNIMG Journal 2020 Journal Article

Acceleration of three-dimensional diffusion magnetic resonance imaging using a kernel low-rank compressed sensing method

  • Chaoyi Zhang
  • Tanzil Mahmud Arefin
  • Ukash Nakarmi
  • Choong Heon Lee
  • Hongyu Li
  • Dong Liang
  • Jiangyang Zhang
  • Leslie Ying

Diffusion Magnetic Resonance Imaging (dMRI) has shown great potential in probing tissue microstructure and structural connectivity in the brain but is often limited by the lengthy scan time needed to sample the diffusion profile by acquiring multiple diffusion weighted images (DWIs). Although parallel imaging techniques have improved the speed of dMRI acquisition, attaining high-resolution three-dimensional (3D) dMRI on preclinical MRI systems remained time-consuming. In this paper, kernel principal component analysis, a machine learning approach, was employed to estimate the correlation among DWIs. We demonstrated the feasibility of such correlation estimation from low-resolution training DWIs and used the correlation as a constraint to reconstruct high-resolution DWIs from highly under-sampled k-space data, which significantly reduced the scan time. Using full k-space 3D dMRI data of post-mortem mouse brains, we retrospectively compared the performance of the so-called kernel low rank (KLR) method with a conventional compressed sensing (CS) method in terms of image quality and the ability to resolve complex fiber orientations and connectivity. The results demonstrated that the KLR-CS method outperformed the conventional CS method for acceleration factors up to 8 and is likely to enhance our ability to investigate brain microstructure and connectivity using high-resolution 3D dMRI.
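The kernel PCA correlation estimate can be sketched in NumPy; the RBF kernel, gamma value, and toy data are assumptions for illustration, not the paper's training setup:

```python
import numpy as np

def kpca_correlation(X, gamma=0.5, k=2):
    """Kernel PCA with an RBF kernel: the top-k eigenpairs of the centered
    Gram matrix capture nonlinear correlation among DWIs (rows of X), usable
    as a low-rank constraint in reconstruction."""
    sq = (X ** 2).sum(axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                               # double-center the Gram matrix
    w, V = np.linalg.eigh(Kc)
    order = np.argsort(w)[::-1][:k]              # largest eigenvalues first
    return w[order], V[:, order]

rng = np.random.default_rng(0)
w, V = kpca_correlation(rng.normal(size=(12, 5)))
print(w.shape, V.shape)  # (2,) (12, 2)
```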

AAAI Conference 2019 Conference Paper

PCGAN: Partition-Controlled Human Image Generation

  • Dong Liang
  • Rui Wang
  • Xiaowei Tian
  • Cong Zou

Human image generation is a very challenging task since it is affected by many factors. Many human image generation methods focus on generating human images conditioned on a given pose, while the generated backgrounds are often blurred. In this paper, we propose a novel Partition-Controlled GAN to generate human images according to a target pose and background. First, human poses in the given images are extracted, and the foreground/background are partitioned for further use. Second, we extract and fuse appearance features, pose features and background features to generate the desired images. Experiments on the Market-1501 and DeepFashion datasets show that our model not only generates realistic human images but also produces the desired human pose and background. Extensive experiments on the COCO and LIP datasets indicate the potential of our method.