Author name cluster

Xiang Wan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
1 author row

Possible papers (18)

JBHI 2026 Journal Article

BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images With Conditional Latent Diffusion Models

  • Yifei Sun
  • Zhanghao Chen
  • Hao Zheng
  • Wenming Deng
  • Jin Liu
  • Wenwen Min
  • Ahmed Elazab
  • Xiang Wan

Lung diseases represent a significant global health challenge, with Chest X-Ray (CXR) being a key diagnostic tool due to its accessibility and affordability. Nonetheless, the detection of pulmonary lesions is often hindered by overlapping bone structures in CXR images, leading to potential misdiagnoses. To address this issue, we develop an end-to-end framework called BS-LDM, designed to effectively suppress bone in high-resolution CXR images. This framework is based on conditional latent diffusion models and incorporates a multi-level hybrid loss-constrained vector-quantized generative adversarial network which is crafted for perceptual compression, ensuring the preservation of details. To further enhance the framework’s performance, we utilize offset noise in the forward process, and a temporal adaptive thresholding strategy in the reverse process. These additions help minimize discrepancies in generating low-frequency information of soft tissue images. Additionally, we have compiled a high-quality bone suppression dataset named SZCH-X-Rays. This dataset includes 818 pairs of high-resolution CXR and soft tissue images collected from our partner hospital. Moreover, we processed 241 data pairs from the JSRT dataset into negative images, which are more commonly used in clinical practice. Our comprehensive experiments and downstream evaluations reveal that BS-LDM excels in bone suppression, underscoring its clinical value.
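
The offset noise mentioned for the forward process is a known diffusion-training trick for better modeling of low-frequency content such as global brightness; a minimal PyTorch sketch follows. The function name, tensor layout, and the 0.1 strength are illustrative assumptions, not details from the paper.

```python
import torch

def offset_noise(latents: torch.Tensor, strength: float = 0.1) -> torch.Tensor:
    """Gaussian noise plus a per-(sample, channel) constant offset.

    The offset is constant over spatial positions, so the diffusion model
    can learn to shift low-frequency content -- the discrepancy the
    abstract says this addition helps minimize.
    """
    noise = torch.randn_like(latents)
    offset = torch.randn(latents.shape[0], latents.shape[1], 1, 1,
                         device=latents.device, dtype=latents.dtype)
    return noise + strength * offset
```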

AAAI 2026 Conference Paper

DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction

  • Xiao Yu
  • Zhaojie Fang
  • Guanyu Zhou
  • Yin Shen
  • Huoling Luo
  • Ye Li
  • Ahmed Elazab
  • Xiang Wan

Lung cancer continues to be the leading cause of cancer-related deaths globally. Early detection and diagnosis of pulmonary nodules are essential for improving patient survival rates. Although previous research has integrated multimodal and multi-temporal information, outperforming single-modality and single-time-point approaches, the fusion methods are limited to inefficient vector concatenation and simple mutual attention, highlighting the need for more effective multimodal information fusion. To address these challenges, we introduce a Dual-Graph Spatiotemporal Attention Network (DGSAN), which leverages temporal variations and multimodal data to enhance prediction accuracy. Our methodology involves developing a Global-Local Feature Encoder to better capture the local, global, and fused characteristics of pulmonary nodules. Additionally, a Dual-Graph Construction method organizes multimodal features into inter-modal and intra-modal graphs, and a Hierarchical Cross-Modal Graph Fusion Module refines feature integration. We also compiled a novel multimodal dataset, NLST-cmst, as a comprehensive resource for related research. Our extensive experiments, conducted on both the NLST-cmst and curated CSTL-derived datasets, demonstrate that DGSAN significantly outperforms state-of-the-art methods in classifying pulmonary nodules with exceptional computational efficiency.
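
As a rough illustration of the Dual-Graph Construction idea, the sketch below splits a dense node-affinity matrix into intra-modal and inter-modal adjacencies. The affinity measure, softmax normalization, and shapes are assumptions of this sketch, not DGSAN's actual construction.

```python
import torch

def build_dual_graphs(feats: dict[str, torch.Tensor]):
    """Splits a dense node-affinity matrix into an intra-modal adjacency
    (edges among nodes of the same modality) and an inter-modal adjacency
    (edges only across modalities).

    feats: modality name -> (num_nodes, dim) node-feature matrix.
    """
    sizes = [f.shape[0] for f in feats.values()]
    x = torch.cat(list(feats.values()), dim=0)              # (N, dim)
    sim = torch.softmax(x @ x.t() / x.shape[1] ** 0.5, -1)  # (N, N) affinity
    # Modality label per node; `same` is True where two nodes share a modality.
    labels = torch.repeat_interleave(
        torch.arange(len(sizes), device=x.device),
        torch.tensor(sizes, device=x.device))
    same = labels[:, None] == labels[None, :]
    return sim * same, sim * ~same                          # intra, inter
```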

TMLR 2026 Journal Article

RLHF in an SFT Way: From Optimal Solution to Reward-Weighted Alignment

  • Yuhao Du
  • Zhuo Li
  • Pengyu Cheng
  • Zhihong Chen
  • Yuejiao Xie
  • Xiang Wan
  • Anningzhe Gao

Reinforcement Learning from Human Feedback (RLHF) is crucial for aligning Large Language Models (LLMs) with human values. However, RLHF has been continuously challenged by its implementation complexity and computational cost, especially for online sampling-based methods such as Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO). Even with recent simplifications such as Direct Preference Optimization (DPO), which designs an offline implicit reward learning objective over pre-collected preference datasets, over-fitting and training instability continue to keep alignment short of the expected optimal performance. To address these challenges, we propose a novel simplification of RLHF from the perspective of variational inference, called Variational Alignment with Re-weighting (VAR). Specifically, by directly minimizing the distribution gap between the learning LLM policy and the optimal solution of RLHF, we transform the alignment objective into an offline, reward-driven, re-weighted supervised fine-tuning (SFT) form, which requires only a minor adjustment to the SFT loss to obtain a noticeable improvement in training stability and effectiveness. On comprehensive evaluation benchmarks, our objective enables LLMs to outperform offline alignment methods, demonstrating superior performance on both helpfulness and harmlessness metrics (on average 7.16% better than DPO). Compared to online sampling methods, our method is comparable or even better while significantly reducing computational overhead and accelerating convergence (over 5× faster than GRPO), suggesting our approach is an efficient and effective solution for bridging the gap between efficiency and performance in LLM alignment.
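
The "reward-driven re-weighted SFT" form can be pictured as an SFT loss whose per-sequence weight grows with reward. Below is a generic sketch assuming a softmax-normalized exp(r/β) weight; VAR's exact weighting is derived from variational inference and may differ.

```python
import torch
import torch.nn.functional as F

def reward_weighted_sft_loss(logits, labels, rewards, beta=1.0):
    """SFT loss where each sequence's NLL is scaled by a softmax-normalized
    weight exp(r / beta), so high-reward responses dominate the gradient.

    logits: (B, T, V); labels: (B, T), with -100 marking ignored positions;
    rewards: (B,) scalar scores from a reward model.
    """
    nll = F.cross_entropy(
        logits.transpose(1, 2),           # (B, V, T), as cross_entropy expects
        labels, ignore_index=-100, reduction="none",
    ).sum(dim=1)                          # (B,) per-sequence NLL
    weights = torch.softmax(rewards / beta, dim=0)  # assumed weighting form
    return (weights * nll).sum()
```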

JBHI 2026 Journal Article

USCNet: Transformer-Based Multimodal Fusion with Segmentation Guidance for Urolithiasis Classification

  • Changmiao Wang
  • Songqi Zhang
  • Yongquan Zhang
  • Yifei Wang
  • Liya Liu
  • Nannan Li
  • Xingzhi Li
  • Jiexin Pan

Kidney stone disease ranks among the most prevalent conditions in urology, and understanding the composition of these stones is essential for creating personalized treatment plans and preventing recurrence. Current methods for analyzing kidney stones depend on postoperative specimens, which prevents rapid classification before surgery. To overcome this limitation, we introduce a new approach called the Urinary Stone Segmentation and Classification Network (USCNet). This method allows for precise preoperative classification of kidney stones by integrating Computed Tomography (CT) images with clinical data from Electronic Health Records (EHR). USCNet employs a Transformer-based multimodal fusion framework with CT-EHR attention and segmentation-guided attention modules for accurate classification. Moreover, a dynamic loss function is introduced to effectively balance the dual objectives of segmentation and classification. Experiments on an in-house kidney stone dataset show that USCNet delivers outstanding performance across all evaluation metrics, with classification efficacy significantly surpassing existing mainstream methods. This study presents a promising solution for the precise preoperative classification of kidney stones, offering substantial clinical benefits. The source code is publicly available at https://github.com/fancccc/KidneyStoneSC.
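
A "dynamic loss function" balancing segmentation and classification could be realized in several ways; one standard choice is learned homoscedastic-uncertainty weighting (Kendall et al., 2018), sketched below as an assumption rather than USCNet's actual formulation.

```python
import torch
import torch.nn as nn

class DynamicDualLoss(nn.Module):
    """Balances segmentation and classification losses with learned
    uncertainty weights: each task's loss is scaled by exp(-s) and
    regularized by s, so the trade-off adapts during training.
    """
    def __init__(self):
        super().__init__()
        self.log_var_seg = nn.Parameter(torch.zeros(()))
        self.log_var_cls = nn.Parameter(torch.zeros(()))

    def forward(self, seg_loss: torch.Tensor, cls_loss: torch.Tensor):
        return (torch.exp(-self.log_var_seg) * seg_loss + self.log_var_seg
                + torch.exp(-self.log_var_cls) * cls_loss + self.log_var_cls)
```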

AAAI 2025 Conference Paper

Domain Generalized Medical Landmark Detection via Robust Boundary-Aware Pre-Training

  • Haifan Gong
  • Yu Lu
  • Xiang Wan
  • Haofeng Li

In recent years, deep learning has made remarkable progress in automated medical landmark detection. Nonetheless, prevailing research in this field predominantly addresses single-center scenarios or domain adaptation settings. In practical environments, the acquisition of multi-center data faces privacy concerns, coupled with the time-intensive and costly nature of data collection and annotation. These challenges substantially impede the broader application of deep learning-based medical landmark detection. To mitigate these issues, we propose a novel domain-generalized medical landmark detection framework that relies solely on single-center data for training. Considering the availability of numerous public medical segmentation datasets, we design a simple yet effective method that utilizes single-center segmentation data to enhance the domain generalization capabilities of the landmark detection task. Specifically, we introduce a novel boundary-aware pre-training approach to focus the model on regions pertinent to landmarks. To further enhance robustness and generalization during pre-training, we derive a mixing loss term and prove its effectiveness in theory and practice. Extensive experiments conducted on our new domain generalization benchmark for medical landmark detection demonstrate the superiority of our approach.

JBHI 2025 Journal Article

Fetal Cerebellum Landmark Detection Based on 3D MRI: Method and Benchmark

  • Haifan Gong
  • Huixian Liu
  • Yitao Wang
  • Xiaoling Liu
  • Xiang Wan
  • Qiao Shi
  • Haofeng Li

Fetal cerebellum landmark detection is crucial for assessing fetal brain development. Although deep learning has become the standard for automatic landmark detection, most previous methods have focused on 2D ultrasound or thick-slice Magnetic Resonance Imaging (MRI). To improve accuracy, landmarks should be located on thin-slice 3D MRI. However, abnormal development, high noise, and fuzzy boundaries in 3D fetal brain images make traditional methods less effective for cerebellum landmark detection. To address this, we introduce the Anatomical Pseudo-label Guided Attention (APGA) network alongside a 3D MRI-based benchmark for fetal cerebellum landmark detection. During training, we use a shared encoder to extract image features and two decoders for landmark regression and anatomical pseudo-label segmentation. We design a Feature Decoupling Transformer (FDT) and embed it into the encoder to better calibrate the features for the two tasks. Only the encoder, the FDT, and the landmark decoder are needed during inference. Extensive experiments on our proposed benchmark and an out-of-domain test set show the effectiveness of our method. Our simulations also demonstrate that 3D biometrics outperform 2D biometrics.

JBHI 2025 Journal Article

Highlighted Diffusion Model as Plug-In Priors for Polyp Segmentation

  • Yuhao Du
  • Yuncheng Jiang
  • Shuangyi Tan
  • Si-Qi Liu
  • Zhen Li
  • Guanbin Li
  • Xiang Wan

Automated polyp segmentation from colonoscopy images is crucial for colorectal cancer diagnosis. The accuracy of such segmentation, however, is challenged by two main factors. First, the variability in polyps' size, shape, and color, coupled with the scarcity of well-annotated data due to the need for specialized manual annotation, hampers the efficacy of existing deep learning methods. Second, concealed polyps often blend with adjacent intestinal tissues, leading to poor contrast that challenges segmentation models. Recently, diffusion models have been explored and adapted for polyp segmentation tasks. However, the significant domain gap between RGB-colonoscopy images and grayscale segmentation masks, along with the low efficiency of the diffusion generation process, hinders the practical implementation of these models. To mitigate these challenges, we introduce the Highlighted Diffusion Model Plus (HDM+), a two-stage polyp segmentation framework. This framework incorporates the Highlighted Diffusion Model (HDM) to provide explicit semantic guidance, thereby enhancing segmentation accuracy. In the first stage, the HDM is trained using highlighted ground-truth data, which emphasizes polyp regions while suppressing the background in the images. This approach reduces the domain gap by focusing on the image itself rather than on the segmentation mask. In the second stage, we employ the highlighted features from the trained HDM's U-Net model as plug-in priors for polyp segmentation, rather than generating highlighted images, thereby increasing efficiency. Extensive experiments conducted on six polyp segmentation benchmarks demonstrate the effectiveness of our approach.
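
The first-stage "highlighted ground-truth" construction, emphasizing polyp regions while suppressing background, can be sketched as a simple mask-gated attenuation; the bg_scale factor is an assumed parameter, not taken from the paper.

```python
import torch

def highlight(image: torch.Tensor, mask: torch.Tensor,
              bg_scale: float = 0.2) -> torch.Tensor:
    """Builds a 'highlighted' training target: polyp pixels keep their
    intensity while background pixels are attenuated, so the diffusion
    model learns in the image domain rather than the mask domain.

    image: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W) binary polyp mask.
    """
    return image * mask + bg_scale * image * (1.0 - mask)
```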

NeurIPS 2025 Conference Paper

Intermediate Domain Alignment and Morphology Analogy for Patent-Product Image Retrieval

  • Haifan Gong
  • Xuanye Zhang
  • Ruifei Zhang
  • Yun Su
  • Zhuo Li
  • Yuhao Du
  • Anningzhe Gao
  • Xiang Wan

Recent advances in artificial intelligence have significantly impacted image retrieval tasks, yet Patent-Product Image Retrieval (PPIR) has received limited attention. PPIR, which retrieves patent images based on product images to identify potential infringements, presents unique challenges: (1) both product and patent images often contain numerous categories of artificial objects, but models pre-trained on standard datasets exhibit limited discriminative power for some of those unseen objects; and (2) the significant domain gap between binary patent line drawings and colorful RGB product images further complicates similarity comparisons for product-patent pairs. To address these challenges, we formulate PPIR as an open-set image retrieval task and introduce a comprehensive Patent-Product Image Retrieval Dataset (PPIRD), including a test set with 439 product-patent pairs, a retrieval pool of 727,921 patents, and an unlabeled pre-training set of 3,799,695 images. We further propose a novel Intermediate Domain Alignment and Morphology Analogy (IDAMA) strategy. IDAMA maps both image types to an intermediate sketch domain using edge detection to minimize the domain discrepancy, and employs a Morphology Analogy Filter to select discriminative patent images based on visual features via analogical reasoning. Extensive experiments on PPIRD demonstrate that IDAMA significantly outperforms baseline methods (+7.58 mAR) and offers valuable insights into domain mapping and representation learning for PPIR. The PPIRD dataset is available at: https://loslorien.github.io/idama-project/
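
The intermediate sketch-domain mapping via edge detection might look like the following OpenCV snippet; the Canny thresholds and blur kernel are illustrative guesses, and IDAMA's actual edge extractor may differ.

```python
import cv2
import numpy as np

def to_sketch_domain(img_bgr: np.ndarray) -> np.ndarray:
    """Maps an RGB product photo (or a patent drawing) into a shared
    'sketch' domain via Canny edges, shrinking the gap between binary
    line drawings and colorful photos.
    """
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress texture noise
    edges = cv2.Canny(gray, 50, 150)          # binary edge map, uint8
    return 255 - edges                        # white background, black lines
```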

TMLR 2025 Journal Article

Synthesizing Minority Samples for Long-tailed Classification via Distribution Matching

  • Zhuo Li
  • He Zhao
  • Jinke Ren
  • Anningzhe Gao
  • DanDan Guo
  • Xiang Wan
  • Hongyuan Zha

In many real-world applications, deep neural networks (DNNs) often perform poorly on datasets with long-tailed distributions. A promising way to address this issue is to transform real majority samples into synthetic minority samples through an optimization objective. However, such objectives have been designed only from the classification perspective. We therefore propose a novel framework that synthesizes minority samples from the majority by considering both classification and distribution matching. Specifically, our method adjusts the distribution of synthetic minority samples to closely align with that of the true minority class, while enforcing the synthetic samples to learn more generalizable and discriminative features of the minority class. Experimental results on several standard benchmark datasets demonstrate the effectiveness of our method in both long-tailed classification and the synthesis of high-quality minority samples.
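
One standard way to implement a distribution-matching term between synthetic and real minority samples is a kernel MMD penalty; the RBF kernel and bandwidth below are assumptions of this sketch, not necessarily the paper's objective.

```python
import torch

def rbf_mmd(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Squared maximum mean discrepancy with an RBF kernel between
    synthetic minority features x (n, d) and real minority features y (m, d).
    """
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

The full synthesis objective would then combine this with the classification term, e.g. `cls_loss + lam * rbf_mmd(syn_feats, real_feats)`, where `lam` is a hypothetical trade-off weight.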

AAAI 2024 Conference Paper

Cell Graph Transformer for Nuclei Classification

  • Wei Lou
  • Guanbin Li
  • Xiang Wan
  • Haofeng Li

Nuclei classification is a critical step in computer-aided diagnosis with histopathology images. In the past, various methods have employed graph neural networks (GNN) to analyze cell graphs that model inter-cell relationships by considering nuclei as vertices. However, they are limited by the GNN mechanism that only passes messages among local nodes via fixed edges. To address this issue, we develop a cell graph transformer (CGT) that treats nodes and edges as input tokens to enable learnable adjacency and information exchange among all nodes. Nevertheless, training the transformer with a cell graph presents another challenge. Poorly initialized features can lead to noisy self-attention scores and inferior convergence, particularly when processing cell graphs with numerous connections. Thus, we further propose a novel topology-aware pretraining method that leverages a graph convolutional network (GCN) to learn a feature extractor. The pre-trained features may suppress unreasonable correlations and hence ease the fine-tuning of the CGT. Experimental results suggest that the proposed cell graph transformer with topology-aware pretraining significantly improves nuclei classification results and achieves state-of-the-art performance. Code and models are available at https://github.com/lhaof/CGT

JBHI 2024 Journal Article

ECC-PolypDet: Enhanced CenterNet With Contrastive Learning for Automatic Polyp Detection

  • Yuncheng Jiang
  • Zixun Zhang
  • Yiwen Hu
  • Guanbin Li
  • Xiang Wan
  • Song Wu
  • Shuguang Cui
  • Silin Huang

Accurate polyp detection is critical for early colorectal cancer diagnosis. Although remarkable progress has been achieved in recent years, the complex colon environment and concealed polyps with unclear boundaries still pose severe challenges in this area. Existing methods either involve computationally expensive context aggregation or lack prior modeling of polyps, resulting in poor performance in challenging cases. In this paper, we propose the Enhanced CenterNet with Contrastive Learning (ECC-PolypDet), a two-stage training & end-to-end inference framework that leverages images and bounding box annotations to train a general model and fine-tune it based on the inference score to obtain a final robust model. Specifically, we conduct Box-assisted Contrastive Learning (BCL) during training to minimize the intra-class difference and maximize the inter-class difference between foreground polyps and backgrounds, enabling our model to capture concealed polyps. Moreover, to enhance the recognition of small polyps, we design the Semantic Flow-guided Feature Pyramid Network (SFFPN) to aggregate multi-scale features and the Heatmap Propagation (HP) module to boost the model's attention on polyp targets. In the fine-tuning stage, we introduce the IoU-guided Sample Re-weighting (ISR) mechanism to prioritize hard samples by adaptively adjusting the loss weight for each sample during fine-tuning. Extensive experiments on six large-scale colonoscopy datasets demonstrate the superiority of our model compared with previous state-of-the-art detectors.
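
The IoU-guided Sample Re-weighting (ISR) mechanism can be pictured as a focal-style weight that grows as a prediction's IoU with ground truth shrinks; the (1 − IoU)^γ form below is a hypothetical instantiation, not the paper's exact rule.

```python
import torch

def isr_weights(ious: torch.Tensor, gamma: float = 2.0) -> torch.Tensor:
    """Focal-style weights: samples whose predictions overlap ground truth
    poorly (low IoU, i.e. hard samples) receive larger loss weights.
    """
    return (1.0 - ious.clamp(0.0, 1.0)).pow(gamma)
```

During fine-tuning, each sample's detection loss would then be multiplied by its weight, so optimization concentrates on the hard cases.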

AAAI 2024 Conference Paper

UniCell: Universal Cell Nucleus Classification via Prompt Learning

  • Junjia Huang
  • Haofeng Li
  • Xiang Wan
  • Guanbin Li

The recognition of multi-class cell nuclei can significantly facilitate the process of histopathological diagnosis. Numerous pathological datasets are currently available, but their annotations are inconsistent. Most existing methods require individual training on each dataset to deduce the relevant labels and lack the use of common knowledge across datasets, consequently restricting the quality of recognition. In this paper, we propose a universal cell nucleus classification framework (UniCell), which employs a novel prompt learning mechanism to uniformly predict the corresponding categories of pathological images from different dataset domains. In particular, our framework adopts an end-to-end architecture for nuclei detection and classification, and utilizes flexible prediction heads for adapting various datasets. Moreover, we develop a Dynamic Prompt Module (DPM) that exploits the properties of multiple datasets to enhance features. The DPM first integrates the embeddings of datasets and semantic categories, and then employs the integrated prompts to refine image representations, efficiently harvesting the shared knowledge among the related cell types and data sources. Experimental results demonstrate that the proposed method achieves state-of-the-art results on four nucleus detection and classification benchmarks. Code and models are available at https://github.com/lhaof/UniCell
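
A toy rendering of the Dynamic Prompt Module idea: dataset and category embeddings are fused into prompts that refine image tokens through cross-attention. The dimensions, head count, and residual fusion below are assumptions of this sketch, not UniCell's architecture.

```python
import torch
import torch.nn as nn

class DynamicPrompt(nn.Module):
    """Fuses a dataset embedding and class embeddings into prompt tokens,
    then refines image tokens by attending to those prompts.
    """
    def __init__(self, n_datasets: int, n_classes: int, dim: int = 256):
        super().__init__()
        self.dataset_emb = nn.Embedding(n_datasets, dim)
        self.class_emb = nn.Embedding(n_classes, dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, feats, dataset_id, class_ids):
        # feats: (B, N, dim) image tokens; prompts: (B, 1 + C, dim)
        prompts = torch.cat(
            [self.dataset_emb(dataset_id).unsqueeze(1),
             self.class_emb(class_ids)], dim=1)
        refined, _ = self.attn(feats, prompts, prompts)  # query = image tokens
        return feats + refined                           # residual refinement
```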

JBHI 2024 Journal Article

UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-Scale Generation and Registration Enhancement

  • Ruiquan Ge
  • Zhaojie Fang
  • Pengxue Wei
  • Zhanghao Chen
  • Hongyang Jiang
  • Ahmed Elazab
  • Wangting Li
  • Xiang Wan

Fundus photography, in combination with ultra-wide-angle fundus (UWF) techniques, has become an indispensable diagnostic tool in clinical settings by offering a more comprehensive view of the retina. Nonetheless, UWF fluorescein angiography (UWF-FA) requires injecting a fluorescent dye into the patient's hand or elbow, unlike UWF scanning laser ophthalmoscopy (UWF-SLO). To mitigate potential adverse effects associated with injections, researchers have proposed cross-modality medical image generation algorithms capable of converting UWF-SLO images into their UWF-FA counterparts. Current image generation techniques applied to fundus photography encounter difficulties in producing high-resolution retinal images, particularly in capturing minute vascular lesions. To address these issues, we introduce a novel conditional generative adversarial network (UWAFA-GAN) to synthesize UWF-FA from UWF-SLO. This approach employs multi-scale generators and an attention transmit module to efficiently extract both global structures and local lesions. Additionally, to counteract the image blurriness that arises from training with misaligned data, a registration module is integrated within the framework. Our method performs strongly on inception scores and detail generation. Clinical user studies further indicate that the UWF-FA images generated by UWAFA-GAN are clinically comparable to authentic images in terms of diagnostic reliability. Empirical evaluations on our proprietary UWF image datasets show that UWAFA-GAN outperforms existing methods.

NeurIPS 2024 Conference Paper

WhodunitBench: Evaluating Large Multimodal Agents via Murder Mystery Games

  • Junlin Xie
  • Ruifei Zhang
  • Zhihong Chen
  • Xiang Wan
  • Guanbin Li

Recently, large language models (LLMs) have achieved superior performance, empowering the development of large multimodal agents (LMAs). An LMA is anticipated to execute practical tasks that require various capabilities, including multimodal perception, interaction, reasoning, and decision making. However, existing benchmarks are limited in assessing the compositional skills and actions demanded by practical scenarios, as they primarily focus on single tasks and static scenarios. To bridge this gap, we introduce WhodunitBench, a benchmark rooted in murder mystery games, where players must utilize the aforementioned skills to achieve their objective (i.e., identifying the murderer or hiding themselves), providing a simulated dynamic environment for evaluating LMAs. Specifically, WhodunitBench includes two evaluation modes. The first, an arena-style evaluation, is constructed from 50 meticulously curated scripts featuring clear reasoning clues and distinct murderers; the second, a chain of evaluation, consists of over 3000 curated multiple-choice and open-ended questions, aiming to assess every facet of the murder mystery games for LMAs. Experiments show that although current LMAs achieve acceptable performance in basic perceptual tasks, they are insufficiently equipped for complex multi-agent collaboration and multi-step reasoning tasks. Furthermore, applying theory of mind to complete games in a manner akin to human behavior remains a significant challenge. We hope this work can illuminate the path forward, providing a solid foundation for the future development of LMAs. WhodunitBench is open-source and accessible at: https://github.com/jun0wanan/WhodunitBench-Murder_Mystery_Games

AAAI 2023 Conference Paper

A Simple Yet Effective Subsequence-Enhanced Approach for Cross-Domain NER

  • Jinpeng Hu
  • DanDan Guo
  • Yang Liu
  • Zhuo Li
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Cross-domain named entity recognition (NER), aiming to address the limitation of labeled resources in the target domain, is a challenging yet important task. Most existing studies alleviate the data discrepancy across different domains at the coarse level via combining NER with language modeling or introducing domain-adaptive pre-training (DAPT). Notably, source and target domains tend to share more fine-grained local information within denser subsequences than global information within the whole sequence, such that subsequence features are easier to transfer, which has not been explored well. Besides, compared to token-level representation, subsequence-level information can help the model distinguish different meanings of the same word in different domains. In this paper, we propose to incorporate subsequence-level features to promote cross-domain NER. In detail, we first utilize a pre-trained encoder to extract the global information. Then, we re-express each sentence as a group of subsequences and propose a novel bidirectional memory recurrent unit (BMRU) to capture features from the subsequences. Finally, an adaptive coupling unit (ACU) is proposed to combine global information and subsequence features for predicting entity labels. Experimental results on several benchmark datasets illustrate the effectiveness of our model, which achieves considerable improvements.

AAAI 2023 Conference Paper

EASAL: Entity-Aware Subsequence-Based Active Learning for Named Entity Recognition

  • Yang Liu
  • Jinpeng Hu
  • Zhihong Chen
  • Xiang Wan
  • Tsung-Hui Chang

Active learning is a critical technique for reducing labelling load by selecting the most informative data. Most previous works applied active learning to Named Entity Recognition (a token-level task) in the same way as text classification (a sentence-level task). They failed to consider the heterogeneity of uncertainty within each sentence and required the annotator to access the entire sentence when labelling. To overcome these limitations, we allow the active learning algorithm to query subsequences within sentences and propose Entity-Aware Subsequence-based Active Learning (EASAL), which utilizes an effective Head-Tail pointer to query one entity-aware subsequence for each sentence based on BERT. For tokens outside this subsequence, we randomly select 30% to be pseudo-labelled for joint training, where the model directly predicts their pseudo-labels. Experimental results on both news and biomedical datasets demonstrate the effectiveness of our proposed method. The code is released at https://github.com/lylylylylyly/EASAL.
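
The 30% pseudo-labelling of tokens outside the queried subsequence can be sketched as a simple random mask; the names and the handling of unselected tokens below are assumptions of this illustration.

```python
import torch

def pseudo_label_mask(seq_len: int, query_start: int, query_end: int,
                      ratio: float = 0.3) -> torch.Tensor:
    """Marks which tokens receive model-predicted pseudo-labels: tokens
    inside the queried [start, end) subsequence are annotator-labelled;
    of the remaining tokens, a random fraction (30% per the abstract)
    is pseudo-labelled. Unselected outside tokens are simply excluded
    from the loss in this sketch.
    """
    outside = torch.ones(seq_len, dtype=torch.bool)
    outside[query_start:query_end] = False
    idx = outside.nonzero(as_tuple=True)[0]
    picked = idx[torch.randperm(idx.numel())[: int(ratio * idx.numel())]]
    mask = torch.zeros(seq_len, dtype=torch.bool)
    mask[picked] = True
    return mask
```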

JBHI 2023 Journal Article

Multi-Task Learning With Hierarchical Guidance for Locating and Stratifying Submucosal Tumors

  • Ruifei Zhang
  • Feng Zhang
  • Si Qin
  • Dejun Fan
  • Chaowei Fang
  • Jie Ma
  • Xiang Wan
  • Guanbin Li

Locating and stratifying submucosal tumors of the digestive tract from endoscopic ultrasound (EUS) images is of vital significance to the preliminary diagnosis of tumors. However, this problem is challenging due to the poor appearance contrast between different layers of the digestive tract wall (DTW) and the narrowness of each layer. Few existing deep-learning based diagnosis algorithms are devised to tackle this issue. In this article, we build a multi-task framework for simultaneously locating and stratifying submucosal tumors, and, since awareness of the DTW is critical to both localization and stratification, we integrate DTW segmentation into the proposed multi-task framework. Besides sharing a common backbone model, the three tasks are explicitly directed with a hierarchical guidance module, in which the probability map of the DTW locally enhances the feature representation for tumor localization, and the probability maps of the DTW and the tumor are jointly employed to locally enhance the feature representation for tumor stratification. Moreover, by means of a dynamic class activation map, the probability maps of the DTW and the tumor are reused to make the stratification inference pay more attention to DTW and tumor regions, contributing to a reliable and interpretable submucosal tumor stratification model. Additionally, since relations to other structures are beneficial for stratifying tumors, we devise a graph reasoning module to supply non-local relation knowledge to the stratification branch. Experiments on a Stomach-Esophagus and an Intestinal EUS dataset show that our method achieves very appealing performance on both tumor localization and stratification, significantly outperforming state-of-the-art object detection approaches.
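
The hierarchical guidance, a probability map locally enhancing features for a downstream branch, can be sketched as a residual gating; the 1 + p form is a common reading of "locally enhance", not necessarily the paper's exact operator.

```python
import torch

def guided_enhance(feats: torch.Tensor, prob_map: torch.Tensor) -> torch.Tensor:
    """Amplifies a (B, C, H, W) feature map wherever a (B, 1, H, W)
    probability map (e.g. the DTW segmentation output) is high, leaving
    other regions unchanged via the residual 1 + p gate.
    """
    return feats * (1.0 + prob_map)
```

For the stratification branch, both maps could gate the features in turn, e.g. `feats * (1 + dtw_map) * (1 + tumor_map)`, again as an assumed reading of the abstract.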

NeurIPS 2022 Conference Paper

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation

  • Yuanfeng Ji
  • Haotian Bai
  • Chongjian Ge
  • Jie Yang
  • Ye Zhu
  • Ruimao Zhang
  • Zhen Li
  • Lingyan Zhang

Despite the considerable progress in automatic abdominal multi-organ segmentation from CT/MRI scans in recent years, a comprehensive evaluation of models' capabilities is hampered by the lack of a large-scale benchmark drawn from diverse clinical scenarios. Constrained by the high cost of collecting and labeling 3D medical data, most deep learning models to date are driven by datasets with a limited number of organs of interest or samples, which still limits the power of modern deep models and makes it difficult to provide a fully comprehensive and fair estimate of various methods. To mitigate these limitations, we present AMOS, a large-scale, diverse, clinical dataset for abdominal organ segmentation. AMOS provides 500 CT and 100 MRI scans collected from multi-center, multi-vendor, multi-modality, multi-phase, multi-disease patients, each with voxel-level annotations of 15 abdominal organs, providing challenging examples and a test-bed for studying robust segmentation algorithms under diverse targets and scenarios. We further benchmark several state-of-the-art medical segmentation models to evaluate the status of existing methods on this new challenging dataset. We have made our datasets, benchmark servers, and baselines publicly available, and hope to inspire future research. Information can be found at https://amos22.grand-challenge.org.