Arrow Research Search

Author name cluster

Le Lu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification

  • Zijiang Yang
  • Hanqing Chao
  • Bokai Zhao
  • Yelin Yang
  • Yunshuo Zhang
  • Dongmei Fu
  • Junping Zhang
  • Le Lu

Nucleus detection and classification (NDC) in histopathology analysis is a fundamental task that underpins a wide range of high-level pathology applications. However, existing methods heavily rely on labor-intensive nucleus-level annotations and struggle to fully exploit large-scale unlabeled data for learning discriminative nucleus representations. In this work, we propose MUSE (MUlti-scale denSE self-distillation), a novel self-supervised learning method tailored for NDC. At its core is NuLo (Nucleus-based Local self-distillation), a coordinate-guided mechanism that enables flexible local self-distillation based on predicted nucleus positions. By removing the need for strict spatial alignment between augmented views, NuLo allows critical cross-scale alignment, thus unlocking the capacity of models for fine-grained nucleus-level representation. To support MUSE, we design a simple yet effective encoder-decoder architecture and a large field-of-view semi-supervised fine-tuning strategy that together maximize the value of unlabeled pathology images. Extensive experiments on three widely used benchmarks demonstrate that MUSE effectively addresses the core challenges of histopathological NDC. The resulting models not only surpass state-of-the-art supervised baselines but also outperform generic pathology foundation models.
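For flavor, here is a minimal PyTorch sketch of the coordinate-guided local self-distillation idea: descriptors are sampled at predicted nucleus positions in each augmented view, so the views need no strict spatial alignment. The function name, tensor shapes, and the cosine objective are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of NuLo-style local self-distillation (not the paper's code).
import torch
import torch.nn.functional as F

def nulo_loss(student_feats, teacher_feats, coords_student, coords_teacher):
    """student_feats/teacher_feats: (B, C, H, W); coords_*: (B, N, 2) in [-1, 1],
    the same predicted nuclei mapped into each view's coordinate frame."""
    # Sample a C-dim descriptor per predicted nucleus in each view.
    s = F.grid_sample(student_feats, coords_student.unsqueeze(2),
                      align_corners=False).squeeze(-1).transpose(1, 2)  # (B, N, C)
    t = F.grid_sample(teacher_feats, coords_teacher.unsqueeze(2),
                      align_corners=False).squeeze(-1).transpose(1, 2)
    s = F.normalize(s, dim=-1)
    t = F.normalize(t, dim=-1).detach()   # teacher supplies targets only
    # Pull matched nucleus descriptors together across views and scales.
    return (1 - (s * t).sum(-1)).mean()
```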

JBHI Journal 2025 Journal Article

Med-Query: Steerable Parsing of 9-DoF Medical Anatomies With Query Embedding

  • Heng Guo
  • Jianfeng Zhang
  • Ke Yan
  • Le Lu
  • Minfeng Xu

Automatic parsing of human anatomies at the instance level from 3D computed tomography (CT) is a prerequisite step for many clinical applications. The presence of pathologies, broken structures, or a limited field-of-view (FOV) can all make anatomy parsing algorithms vulnerable. In this work, we explore how to leverage and implement the successful detection-then-segmentation paradigm for 3D medical data, and propose a steerable, robust, and efficient computing framework for detection, identification, and segmentation of anatomies in CT scans. Considering the complicated shapes, sizes, and orientations of anatomies, without loss of generality, we present a nine degrees of freedom (9-DoF) pose estimation solution in full 3D space using a novel single-stage, non-hierarchical representation. Our whole framework is executed in a steerable manner where any anatomy of interest can be directly retrieved to further boost inference efficiency. We have validated our method on three medical imaging parsing tasks: ribs, spine, and abdominal organs. For rib parsing, CT scans have been annotated at the rib instance level for quantitative evaluation, and likewise for spine vertebrae and abdominal organs. Extensive experiments on 9-DoF box detection and rib instance segmentation demonstrate the high efficiency and effectiveness of our framework (with an identification rate of 97.0% and a segmentation Dice score of 90.9%), comparing favorably against several strong baselines (e.g., CenterNet, FCOS, and nnU-Net). For spine parsing and abdominal multi-organ segmentation, our method achieves competitive results on par with state-of-the-art methods on the public CTSpine1K dataset and the FLARE22 competition, respectively.
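To make the 9-DoF parameterization concrete, a minimal NumPy sketch follows: 3 translation, 3 scale, and 3 rotation parameters define an oriented 3D box. The function name and the Z-Y-X Euler convention are assumptions; the paper's single-stage detector is not reproduced.

```python
# A minimal sketch of a 9-DoF oriented-box parameterization (illustrative only).
import numpy as np

def ninedof_to_corners(t, s, angles):
    """t: (3,) center; s: (3,) half-extents; angles: (3,) Z-Y-X Euler angles (rad)."""
    cz, cy, cx = np.cos(angles)
    sz, sy, sx = np.sin(angles)
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    R = Rz @ Ry @ Rx
    # Unit-cube corners scaled by half-extents, rotated, then translated.
    corners = np.array([[i, j, k] for i in (-1, 1) for j in (-1, 1) for k in (-1, 1)])
    return (corners * s) @ R.T + t   # (8, 3) box corners in scan coordinates
```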

AAAI Conference 2025 Conference Paper

Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model Using 3D Whole-Body CT Scans

  • Heng Guo
  • Jianfeng Zhang
  • Jiaxing Huang
  • Tony C. W. Mok
  • Dazhou Guo
  • Ke Yan
  • Le Lu
  • Dakai Jin

The segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaptation to medical image segmentation tasks shows significant performance drops, and it requires an excessive number of prompt points to obtain reasonable accuracy. Although quite a few studies have explored adapting SAM to medical image volumes, the efficiency of 2D adaptation methods is unsatisfactory and 3D adaptation methods are only capable of segmenting specific organs or tumors. In this work, we propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D. Instead of adapting SAM, we build a 3D promptable segmentation model using a (nearly) fully labeled CT dataset. To train CT-SAM3D effectively, ensuring the model's accurate responses to higher-dimensional spatial prompts is crucial, and 3D patch-wise training is required due to GPU memory constraints. We therefore propose two key technical developments: 1) a progressively and spatially aligned prompt encoding method to effectively encode click prompts in local 3D space; and 2) a cross-patch prompt scheme to capture more 3D spatial context, which is beneficial for reducing the editing workload when interactively prompting on large organs. CT-SAM3D is trained on a curated dataset of 1204 CT scans containing 107 whole-body anatomies and extensively validated on five datasets, achieving significantly better results than all previous SAM-derived models.
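As a rough illustration of encoding a click prompt in local 3D space, the sketch below renders the click as a Gaussian heatmap in the patch's own coordinate frame and appends it as an input channel. The Gaussian rendering, the sigma value, and the function name are assumptions; the paper's progressive alignment and cross-patch scheme are not reproduced.

```python
# Hypothetical sketch of a spatially aligned 3D click-prompt encoding.
import torch

def encode_click(patch, click_zyx, sigma=4.0):
    """patch: (1, D, H, W) CT sub-volume; click_zyx: click voxel coords in the patch."""
    d, h, w = patch.shape[1:]
    zz, yy, xx = torch.meshgrid(torch.arange(d), torch.arange(h),
                                torch.arange(w), indexing="ij")
    dist2 = ((zz - click_zyx[0]) ** 2 + (yy - click_zyx[1]) ** 2
             + (xx - click_zyx[2]) ** 2).float()
    heat = torch.exp(-dist2 / (2 * sigma ** 2))       # Gaussian bump at the click
    return torch.cat([patch, heat.unsqueeze(0)], dim=0)  # (2, D, H, W) network input
```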

AAAI Conference 2022 Conference Paper

Deep Implicit Statistical Shape Models for 3D Medical Image Delineation

  • Ashwin Raju
  • Shun Miao
  • Dakai Jin
  • Le Lu
  • Junzhou Huang
  • Adam P. Harrison

3D delineation of anatomical structures is a cardinal goal in medical imaging analysis. Prior to deep learning, statistical shape models (SSMs) that imposed anatomical constraints and produced high-quality surfaces were a core technology. Today's fully convolutional networks (FCNs), while dominant, do not offer these capabilities. We present deep implicit statistical shape models (DISSMs), a new approach that marries the representation power of deep networks with the benefits of SSMs. DISSMs use an implicit representation to produce compact and descriptive deep surface embeddings that permit statistical models of anatomical variance. To reliably fit anatomically plausible shapes to an image, we introduce a novel rigid and non-rigid pose estimation pipeline that is modelled as a Markov decision process (MDP). Intra-dataset experiments on the task of pathological liver segmentation demonstrate that DISSMs can perform more robustly than four leading FCN models, including nnU-Net with an adversarial prior: reducing the mean Hausdorff distance (HD) by 7.5-14.3 mm and improving the worst-case Dice-Sørensen coefficient (DSC) by 1.2-2.3%. More critically, cross-dataset experiments on an external and highly challenging clinical dataset demonstrate that DISSMs improve the mean DSC and HD by 2.1-5.9% and 9.9-24.5 mm, respectively, and the worst-case DSC by 5.4-7.3%. Supplemental validation on a highly challenging and low-contrast larynx dataset further demonstrates DISSMs' improvements. These improvements are over and above any benefits from representing delineations with high-quality surfaces.
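A minimal sketch of the implicit-shape idea follows: an MLP decodes a latent shape code plus a 3D point into a signed distance, and a shape is fit to observed surface points by optimizing the latent code. The architecture, fitting loop, and regularizer weight are illustrative assumptions; the MDP-based pose estimation is omitted.

```python
# A minimal auto-decoder-style sketch of an implicit shape model (assumed, not DISSM's code).
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, z, pts):                       # z: (1, latent); pts: (N, 3)
        return self.net(torch.cat([z.expand(pts.shape[0], -1), pts], dim=-1))

def fit_latent(decoder, surface_pts, latent_dim=64, steps=200):
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        # Observed surface points should decode to zero signed distance;
        # a small L2 term keeps the code near the learned shape prior.
        loss = decoder(z, surface_pts).abs().mean() + 1e-4 * z.pow(2).sum()
        loss.backward()
        opt.step()
    return z.detach()
```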

AAAI Conference 2021 Conference Paper

Window Loss for Bone Fracture Detection and Localization in X-ray Images with Point-based Annotation

  • Xinyu Zhang
  • Yirui Wang
  • Chi-Tung Cheng
  • Le Lu
  • Adam P. Harrison
  • Jing Xiao
  • Chien-Hung Liao
  • Shun Miao

Object detection methods are widely adopted for computer-aided diagnosis using medical images. Anomalous findings are usually treated as objects that are described by bounding boxes. Yet many pathological findings, e.g., bone fractures, cannot be clearly defined by bounding boxes, owing to considerable instance, shape, and boundary ambiguities. This makes bounding box annotations, and their associated losses, highly ill-suited. In this work, we propose a new bone fracture detection method for X-ray images, based on a labor-effective and flexible annotation scheme suitable for abnormal findings with no clear object-level spatial extents or boundaries. Our method employs a simple, intuitive, and informative point-based annotation protocol to mark localized pathology information. To address the uncertainty in the fracture scales annotated via point(s), we convert the annotations into pixel-wise supervision that uses lower and upper bounds with positive, negative, and uncertain regions. A novel Window Loss is subsequently proposed to penalize only the predictions outside of the uncertain regions. Our method has been extensively evaluated on 4410 pelvic X-ray images of unique patients. Experiments demonstrate that our method outperforms previous state-of-the-art image classification and object detection baselines by healthy margins, with an AUROC of 0.983 and an FROC score of 89.6%.
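A short PyTorch sketch of the window idea described above: pixel-wise lower and upper bound maps define a tolerance window, and only predictions falling outside it are penalized. The quadratic penalty and variable names are assumptions; the paper's exact loss may differ.

```python
# A hedged sketch of a Window-Loss-style objective (assumed form).
import torch
import torch.nn.functional as F

def window_loss(pred, lower, upper):
    """pred, lower, upper: (B, 1, H, W) probability maps with lower <= upper.
    Certain-positive pixels have high lower bounds, certain-negative pixels
    have low upper bounds, and uncertain pixels have a wide [lower, upper] window."""
    below = F.relu(lower - pred)   # prediction fell under the lower bound
    above = F.relu(pred - upper)   # prediction rose over the upper bound
    # Inside the window both terms vanish, so uncertain pixels incur no penalty.
    return (below ** 2 + above ** 2).mean()
```

The appeal of this formulation is that the gradient is exactly zero wherever the prediction already satisfies the bounds, so annotation uncertainty never pushes the model toward an arbitrary target.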

JBHI Journal 2020 Journal Article

Thorax-Net: An Attention Regularized Deep Neural Network for Classification of Thoracic Diseases on Chest Radiography

  • Hongyu Wang
  • Haozhe Jia
  • Le Lu
  • Yong Xia

Deep learning techniques have been increasingly used to provide more accurate and more accessible diagnosis of thorax diseases on chest radiographs. However, due to the lack of dense annotation of large-scale chest radiograph data, this computer-aided diagnosis task is intrinsically a weakly supervised learning problem and remains challenging. In this paper, we propose a novel deep convolutional neural network called Thorax-Net to diagnose 14 thorax diseases using chest radiography. Thorax-Net consists of a classification branch and an attention branch. The classification branch serves as a uniform feature extraction-classification network that frees users from troublesome hand-crafted feature extraction and classifier construction. The attention branch exploits the correlation between class labels and the locations of pathological abnormalities by analyzing the feature maps learned by the classification branch. Feeding a chest radiograph to the trained Thorax-Net, a diagnosis is obtained by averaging and binarizing the outputs of the two branches. The proposed Thorax-Net model has been evaluated against three state-of-the-art deep learning models using the patient-wise official split of the ChestX-ray14 dataset, and against five other deep learning models using the image-wise random data split. Our results show that Thorax-Net achieves average per-class areas under the receiver operating characteristic curve (AUC) of 0.7876 and 0.896 in the two experiments, respectively, which are higher than the AUC values obtained by other deep models when all were trained with no external data.
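The abstract's fusion step ("averaging and binarizing the outputs of two branches") reduces to a few lines; in this sketch the sigmoid activations and the 0.5 threshold are assumptions, not details given in the abstract.

```python
# A minimal sketch of the two-branch fusion described above (assumed details).
import torch

def fuse_branches(cls_logits, att_logits, threshold=0.5):
    """cls_logits, att_logits: (B, 14) raw scores for the 14 disease labels."""
    # Average the per-class probabilities from the two branches...
    prob = (torch.sigmoid(cls_logits) + torch.sigmoid(att_logits)) / 2
    # ...then binarize into a multi-label diagnosis.
    return prob, (prob >= threshold).float()
```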

JBHI Journal 2017 Journal Article

DeepPap: Deep Convolutional Networks for Cervical Cell Classification

  • Ling Zhang
  • Le Lu
  • Isabella Nogues
  • Ronald M. Summers
  • Shaoxiong Liu
  • Jianhua Yao

Automation-assisted cervical screening via Pap smear or liquid-based cytology (LBC) is a highly effective cell-imaging-based cancer detection tool, where cells are partitioned into “abnormal” and “normal” categories. However, the success of most traditional classification methods relies on the presence of accurate cell segmentations. Despite sixty years of research in this field, accurate segmentation remains a challenge in the presence of cell clusters and pathologies. Moreover, previous classification methods are built only upon the extraction of hand-crafted features, such as morphology and texture. This paper addresses these limitations by proposing a method to directly classify cervical cells, without prior segmentation, based on deep features, using convolutional neural networks (ConvNets). First, the ConvNet is pretrained on a natural image dataset. It is subsequently fine-tuned on a cervical cell dataset consisting of adaptively resampled image patches coarsely centered on the nuclei. In the testing phase, aggregation is used to average the prediction scores of a similar set of image patches. The proposed method is evaluated on both Pap smear and LBC datasets. Results show that our method outperforms previous algorithms in classification accuracy (98.3%), area under the curve (0.99), and especially specificity (98.3%) when applied to the Herlev benchmark Pap smear dataset and evaluated using five-fold cross-validation. Similarly superior performance is also achieved on the HEMLBC (H&E stained manual LBC) dataset. Our method is promising for the development of automation-assisted reading systems in primary cervical screening.
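The test-time aggregation step is simple enough to sketch: class scores from several nucleus-centered patches of one cell are averaged into a single decision. The model interface and the two-class output are assumptions.

```python
# A hypothetical sketch of DeepPap-style test-time score aggregation.
import torch

def classify_cell(model, patches):
    """patches: (N, 3, H, W) image patches coarsely centered on one nucleus."""
    with torch.no_grad():
        scores = torch.softmax(model(patches), dim=1)   # (N, 2) per-patch scores
    mean_score = scores.mean(dim=0)                     # aggregate over patches
    return mean_score, int(mean_score.argmax())         # e.g. 1 = "abnormal" (assumed)
```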

JMLR Journal 2016 Journal Article

Interleaved Text/Image Deep Mining on a Large-Scale Radiology Database for Automated Image Interpretation

  • Hoo-Chang Shin
  • Le Lu
  • Lauren Kim
  • Ari Seff
  • Jianhua Yao
  • Ronald M. Summers

Despite tremendous progress in computer vision, there has not been an attempt to apply machine learning to very large-scale medical image databases. We present an interleaved text/image deep learning system to extract and mine the semantic interactions of radiology images and reports from a national research hospital's Picture Archiving and Communication System. With natural language processing, we mine a collection of ~216K representative two-dimensional images selected by clinicians for diagnostic reference and match the images with their descriptions in an automated manner. We then employ a weakly supervised approach using all of our available data to build models for generating approximate interpretations of patient images. Finally, we demonstrate a more strictly supervised approach to detect the presence and absence of a number of frequent disease types, providing more specific interpretations of patient scans. A relatively small amount of data is used for this part, due to the challenge of gathering quality labels from large raw text data. Our work shows the feasibility of large-scale learning and prediction in electronic patient records available in most modern clinical institutions. It also demonstrates the trade-offs to consider in designing machine learning systems for analyzing large medical data.
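As a toy illustration of mining weak labels from report text, the sketch below matches disease terms with a naive negation check. The term list, regex, and negation handling are illustrative assumptions only; they are not the paper's NLP pipeline.

```python
# A toy sketch of weak-label mining from radiology report text (assumed approach).
import re

DISEASE_TERMS = ["nodule", "effusion", "atelectasis", "cyst"]   # hypothetical terms
NEGATIONS = re.compile(r"\b(no|without|negative for)\b[^.]*$", re.I)

def weak_labels(report_sentence):
    labels = set()
    for term in DISEASE_TERMS:
        m = re.search(rf"\b{term}s?\b", report_sentence, re.I)
        # Keep the label only if no negation cue precedes the mention.
        if m and not NEGATIONS.search(report_sentence[:m.start()]):
            labels.add(term)
    return labels
```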

NeurIPS Conference 2006 Conference Paper

Dynamic Foreground/Background Extraction from Images and Videos using Random Patches

  • Le Lu
  • Gregory Hager

In this paper, we propose a novel exemplar-based approach to extract dynamic foreground regions from a changing background within a collection of images or a video sequence. By using image segmentation as a pre-processing step, we convert this traditional pixel-wise labeling problem into a lower-dimensional supervised, binary labeling procedure on image segments. Our approach consists of three steps. First, a set of random image patches is spatially and adaptively sampled within each segment. Second, these sets of extracted samples are formed into two "bags of patches" to model the foreground and background appearance, respectively. We perform a novel bidirectional consistency check between new patches from incoming frames and the current "bags of patches" to reject outliers, control model rigidity, and make the model adaptive to new observations. Within each bag, image patches are further partitioned and resampled to create an evolving appearance model. Finally, the foreground/background decision over segments in an image is formulated using an aggregation function defined on the similarity measurements of sampled patches relative to the foreground and background models. The essence of the algorithm is conceptually simple and can be easily implemented within a few hundred lines of Matlab code. We evaluate and validate the proposed approach on extensive real examples of object-level image mapping and tracking within a variety of challenging environments. We also show that it is straightforward to apply our formulation to non-rigid object tracking in difficult surveillance videos.
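A minimal sketch of the decision rule follows: each segment's sampled patches vote by comparing nearest-neighbor distances to the foreground and background bags, and the votes are aggregated. The descriptors, distance, and majority vote are assumptions; the bidirectional consistency check and bag resampling are omitted.

```python
# A minimal "bags of patches" classification sketch (assumed details).
import numpy as np

def label_segment(segment_patches, fg_bag, bg_bag):
    """All arrays are (n, d) flattened patch descriptors."""
    def nn_dist(p, bag):
        return np.min(np.linalg.norm(bag - p, axis=1))   # nearest-neighbor distance
    votes = [nn_dist(p, fg_bag) < nn_dist(p, bg_bag) for p in segment_patches]
    # Aggregate the patch-level similarities into one per-segment decision.
    return "foreground" if np.mean(votes) > 0.5 else "background"
```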

NeurIPS Conference 2004 Conference Paper

A Three Tiered Approach for Articulated Object Action Modeling and Recognition

  • Le Lu
  • Gregory Hager
  • Laurent Younes

Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate-level representation. Finally, at the high level we build a temporal multi-resolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high-level representation can be extended to achieve temporal scaling invariance and to include bi-gram or multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three-tiered representation.
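The low-level feature rests on a classical trick worth spelling out: an in-plane rotation only circularly shifts an angular histogram, and the magnitude of its 1-D Fourier transform discards that shift. Below is a minimal NumPy sketch for one ring of the circular spatial histogram; the bin count and normalization are assumptions.

```python
# A minimal sketch of a rotation-invariant descriptor via |FFT| of a circular histogram.
import numpy as np

def rotation_invariant_descriptor(points, center, n_bins=32):
    """points: (N, 2) feature locations; center: (2,) reference point."""
    angles = np.arctan2(points[:, 1] - center[1], points[:, 0] - center[0])
    hist, _ = np.histogram(angles, bins=n_bins, range=(-np.pi, np.pi))
    # A rotation circularly shifts the bins; taking the FFT magnitude
    # removes the corresponding phase factor, leaving an invariant vector.
    return np.abs(np.fft.fft(hist / max(hist.sum(), 1)))
```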