EAAI Journal 2025 Journal Article
A dual-branch convolutional neural network with domain-informed attention for arrhythmia classification of 12-lead electrocardiograms
- Rucheng Jiang
- Bin Fu
- Renfa Li
- Rui Li
- Danny Z. Chen
- Yan Liu
- Guoqi Xie
- Keqin Li
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2025 Journal Article
NeurIPS Conference 2024 Conference Paper
Lane detection is an important yet challenging task in autonomous driving systems. Existing lane detection methods mainly rely on finer-scale information to identify key points of lane lines. Since local information in realistic road environments is frequently obscured by other vehicles or affected by poor outdoor lighting conditions, these methods struggle with the regression of such key points. In this paper, we propose a novel Siamese Transformer with hierarchical refinement for lane detection to improve the detection accuracy in complex road environments. Specifically, we propose a high-to-low hierarchical refinement Transformer structure, called LAne TRansformer (LATR), to refine the key points of lane lines, which integrates global semantics information and finer-scale features. Moreover, exploiting the thin and long characteristics of lane lines, we propose a novel Curve-IoU loss to supervise the fit of lane lines. Extensive experiments on three benchmark datasets of lane detection demonstrate that our proposed new method achieves state-of-the-art results with high accuracy and efficiency. Specifically, our method achieves improved F1 scores on the OpenLane dataset, surpassing the current best-performing method by 5.0 points.
IJCAI Conference 2024 Conference Paper
With the rapid advance of computer graphics and artificial intelligence technologies, the ways we interact with the world have undergone a transformative shift. Virtual Reality (VR) technology, aided by artificial intelligence (AI), has emerged as a dominant interaction medium in multiple application areas, thanks to its advantage of providing users with immersive experiences. Among those applications, medicine is considered one of the most promising areas. In this paper, we present a comprehensive examination of the burgeoning field of AI-enhanced VR applications in medical care and services. By introducing a systematic taxonomy, we meticulously classify the pertinent techniques and applications into three well-defined categories based on different phases of medical diagnosis and treatment: Visualization Enhancement, VR-related Medical Data Processing, and VR-assisted Intervention. This categorization enables a structured exploration of the diverse roles that AI-powered VR plays in the medical domain, providing a framework for a more comprehensive understanding and evaluation of these technologies. To the best of our knowledge, this work is the first systematic survey of AI-powered VR systems in medical settings, laying a foundation for future research in this interdisciplinary domain.
ICLR Conference 2024 Conference Paper
The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. However, due to the heterogeneity among tables, such DNN bonus is still far from being well exploited on tabular data prediction (e.g., regression or classification tasks). Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature names from various tables, potentially serving as versatile learners in transferring knowledge across distinct tables and diverse prediction tasks, but their discrete text representation space is inherently incompatible with numerical feature values in tables. In this paper, we present TP-BERTa, a specifically pre-trained LM for tabular data prediction. Concretely, a novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names. Comprehensive experiments demonstrate that our pre-trained TP-BERTa leads the performance among tabular DNNs and is competitive with Gradient Boosted Decision Tree models in the typical tabular data regime.
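The step of turning scalar feature values into discrete tokens can be made concrete with a minimal sketch. This is a generic quantile-binning discretizer used purely for illustration; it is not TP-BERTa's relative magnitude tokenization, whose details are not given in the abstract. The names `fit_bin_edges` and `to_token` and the bin count are hypothetical.

```python
from bisect import bisect_right


def fit_bin_edges(values, num_bins):
    """Pick interior bin edges at equally spaced ranks of the sorted values.

    Assumption: plain quantile binning stands in for the paper's tokenizer.
    """
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // num_bins] for i in range(1, num_bins)]


def to_token(value, edges):
    """Map a scalar to a bin index; larger values never get a smaller index."""
    return bisect_right(edges, value)


# Fit on one numerical column, then discretize new values.
edges = fit_bin_edges(list(range(100)), num_bins=4)
tokens = [to_token(v, edges) for v in (10, 25, 60, 99)]
```

Because the bin index is monotone in the value, the resulting tokens keep the relative order of the original magnitudes, which is the property a language model needs in order to compare numerical values through discrete tokens.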
JBHI Journal 2024 Journal Article
As a common and critical medical image analysis task, deep learning based biomedical image segmentation is hindered by the dependence on costly fine-grained annotations. To alleviate this data dependence, in this article, a novel approach, called Polygonal Approximation Learning (PAL), is proposed for convex object instance segmentation with only bounding-box supervision. The key idea behind PAL is that the detection model for convex objects already contains the necessary information for segmenting them since their convex hulls, which can be generated approximately by the intersection of bounding boxes, are equivalent to the masks representing the objects. To extract the essential information from the detection model, a repeated detection approach is employed on biomedical images where various rotation angles are applied and a dice loss with the projection of the rotated detection results is utilized as a supervised signal in training our segmentation model. In biomedical imaging tasks involving convex objects, such as nuclei instance segmentation, PAL outperforms the known models (e.g., BoxInst) that rely solely on box supervision. Furthermore, PAL achieves comparable performance with mask-supervised models including Mask R-CNN and Cascade Mask R-CNN. Interestingly, PAL also demonstrates remarkable performance on non-convex object instance segmentation tasks, for example, surgical instrument and organ instance segmentation.
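The geometric observation behind PAL, that a convex object is approximated by the intersection of its bounding boxes taken at several rotation angles, can be illustrated with a minimal sketch. This is not the authors' training pipeline; it only tests whether a query point lies inside every rotated bounding box of a point set. The function names and the default number of angles are assumptions.

```python
import math


def rotate(p, theta):
    """Rotate point p = (x, y) by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def in_box_intersection(points, q, num_angles=8, eps=1e-9):
    """Return True if q lies inside the bounding box of `points` in every
    rotated frame. Angles in [0, pi/2) cover all box orientations."""
    for k in range(num_angles):
        theta = math.pi * k / (2 * num_angles)
        rotated = [rotate(p, theta) for p in points]
        rq = rotate(q, theta)
        xs = [p[0] for p in rotated]
        ys = [p[1] for p in rotated]
        inside_x = min(xs) - eps <= rq[0] <= max(xs) + eps
        inside_y = min(ys) - eps <= rq[1] <= max(ys) + eps
        if not (inside_x and inside_y):
            return False
    return True


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
one_box = in_box_intersection(triangle, (3.0, 3.0), num_angles=1)
many_boxes = in_box_intersection(triangle, (3.0, 3.0), num_angles=8)
interior = in_box_intersection(triangle, (1.0, 1.0), num_angles=8)
```

With a single axis-aligned box, the point (3, 3) is accepted even though it lies outside the triangle; adding rotated boxes rejects it, while points inside the convex hull are accepted at every angle.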
ECAI Conference 2024 Conference Paper
Lane detection is an important yet challenging task in autonomous driving systems. Based on the development of the Visual Transformer, early Transformer-based lane detection studies have achieved promising results in some scenarios. However, for complex road conditions such as uneven illumination intensity and heavy traffic, the performance of these methods remains limited and may even be worse than that of contemporaneous CNN-based methods. In this paper, we propose a novel Transformer-based end-to-end network, called SinLane, that attains the attention weights focusing on the sparse yet meaningful locations and improves the accuracy of lane detection in complex environments. SinLane is composed of a novel Siamese Visual Transformer structure and a novel Feature Pyramid Network (FPN) structure called Pyramid Feature Integration (PFI). We utilize the proposed PFI to better integrate global semantics and finer-scale features and to promote the optimization of the Transformer. Moreover, the designed Siamese Visual Transformer is combined with multiple levels of the PFI and is employed to refine the multi-scale lane line features output from the PFI. Extensive experiments on three benchmark datasets of lane detection demonstrate that our SinLane achieves state-of-the-art results with high accuracy and efficiency. Specifically, our SinLane improves the accuracy by over 3% compared to the current best-performing Transformer-based method for lane detection on CULane.
AAAI Conference 2023 Conference Paper
Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction. However, the heterogeneous nature of tabular features makes such features relatively independent, and developing effective methods to promote tabular feature interaction remains an open problem. In this paper, we propose a novel Graph Estimator, which automatically estimates the relations among tabular features and builds graphs by assigning edges between related features. Such relation graphs organize independent tabular features into a kind of graph data such that interaction of nodes (tabular features) can be conducted in an orderly fashion. Based on our proposed Graph Estimator, we present a bespoke Transformer network tailored for tabular learning, called T2G-Former, which processes tabular data by performing tabular feature interaction guided by the relation graphs. A specific Cross-level Readout collects salient features predicted by the layers in T2G-Former across different levels, and attains global semantics for final prediction. Comprehensive experiments show that our T2G-Former achieves superior performance among DNNs and is competitive with non-deep Gradient Boosted Decision Tree models. The code and detailed results are available at https://github.com/jyansir/t2g-former.
ICLR Conference 2023 Conference Paper
Records in a table are represented by a collection of heterogeneous scalar features. Previous work often made predictions for records in a paradigm that processed each feature as an operating unit, which requires coping well with the heterogeneity. In this paper, we propose to encapsulate all feature values of a record into vectorial features and process them collectively rather than deal with individual ones, which directly captures the representations at the data level and benefits robust performance. Specifically, we adopt the concept of "capsules" to organize features into vectorial features, and devise a novel capsule neural network called "TabCaps" to process the vectorial features for classification. In TabCaps, a record is encoded into several vectorial features by some optimizable multivariate Gaussian kernels in the primary capsule layer, where each vectorial feature represents a specific "profile" of the input record and is transformed into the senior capsule layer under the guidance of a new straightforward routing algorithm. The design of the routing algorithm is motivated by the Bag-of-Words (BoW) model, which performs capsule feature grouping straightforwardly and efficiently, in lieu of the computationally complex clustering of previous routing algorithms. Comprehensive experiments show that TabCaps achieves competitive and robust performance in tabular data classification tasks.
AAAI Conference 2022 Conference Paper
Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (ABSTLAY), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress the trained ABSTLAY, thus reducing the computational complexity by a clear margin in the reference phase. A special basic block is built using ABSTLAYs, and we construct a family of Deep Abstract Networks (DANETs) for tabular data classification and regression by stacking such blocks. In DANETs, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our ABSTLAY and DANETs are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANET as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
JBHI Journal 2022 Journal Article
Accurate cervical lesion detection (CLD) methods using colposcopic images are highly demanded in computer-aided diagnosis (CAD) for automatic diagnosis of High-grade Squamous Intraepithelial Lesions (HSIL). However, compared to natural scene images, the specific characteristics of colposcopic images, such as low contrast, visual similarity, and ambiguous lesion boundaries, pose difficulties to accurately locating HSIL regions and also significantly impede the performance improvement of existing CLD approaches. To tackle these difficulties and better capture cervical lesions, we develop novel feature enhancing mechanisms from both global and local perspectives, and propose a new discriminative CLD framework, called CervixNet, with a Global Class Activation (GCA) module and a Local Bin Excitation (LBE) module. Specifically, the GCA module learns discriminative features by introducing an auxiliary classifier, and guides our model to focus on HSIL regions while ignoring noisy regions. It globally facilitates the feature extraction process and helps boost feature discriminability. Further, our LBE module excites lesion features in a local manner, and allows the lesion regions to be enhanced in a more fine-grained way by explicitly modelling the inter-dependencies among bins of proposal features. Extensive experiments on 9888 clinical colposcopic images verify the superiority of our method (AP$_{.75}$ = 20.45) over state-of-the-art models on four widely used metrics.
ICML Conference 2022 Conference Paper
Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases. Many studies have devised ECG analysis models (e.g., classifiers) to assist diagnosis. As an upstream task, researchers have built generative models to synthesize ECG data, which are beneficial to providing training samples, privacy protection, and annotation reduction. However, previous generative methods for ECG often neither synthesized multi-view data, nor dealt with heart disease conditions. In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. Since ECG manifestations of heart diseases are often localized in specific waveforms, we propose a new "mixup normalization" to inject disease information precisely into suitable locations. In addition, we propose a "view discriminator" to revert disordered ECG views into a pre-determined order, supervising the generator to obtain ECG representing correct view characteristics. Besides, a new metric, rFID, is presented to assess the quality of the synthesized ECG signals. Comprehensive experiments verify that our ME-GAN performs well on multi-view ECG signal synthesis with trusty morbid manifestations.
JBHI Journal 2021 Journal Article
Colorectal cancer (CRC) is one of the most life-threatening malignancies. Colonoscopy pathology examination can identify cells of early-stage colon tumors in small tissue image slices. However, such examination is time-consuming and exhausting on high resolution images. In this paper, we present a new framework for colonoscopy pathology whole slide image (WSI) analysis, including lesion segmentation and tissue diagnosis. Our framework contains an improved U-shape network with a VGG net as backbone, and two schemes for training and inference, respectively (the training scheme and inference scheme). Based on the characteristics of colonoscopy pathology WSI, we introduce a specific sampling strategy for sample selection and a transfer learning strategy for model training in our training scheme. Besides, we propose a specific loss function, class-wise DSC loss, to train the segmentation network. In our inference scheme, we apply a sliding-window based sampling strategy for patch generation and diploid ensemble (data ensemble and model ensemble) for the final prediction. We use the predicted segmentation mask to generate the classification probability for the likelihood of WSI being malignant. To our best knowledge, DigestPath 2019 is the first challenge and the first public dataset available on colonoscopy tissue screening and segmentation, and our proposed framework yields good performance on this dataset. Our new framework achieved a DSC of 0.7789 and AUC of 1 on the online test dataset, and we won the 2nd place in the DigestPath 2019 Challenge (task 2). Our code is available at https://github.com/bhfs9999/colonoscopy_tissue_screen_and_segmentation.
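The DSC (Dice similarity coefficient) reported above can be made concrete with a short sketch that computes it per class on hard labels and averages the result into a loss. How the paper's class-wise DSC loss weights or combines classes is not stated in the abstract, so the unweighted mean, the smoothing constant, and the function names below are assumptions; a trainable loss would also operate on soft probabilities rather than hard labels.

```python
def dice_per_class(pred, target, num_classes, smooth=1e-6):
    """Dice score 2|P ∩ T| / (|P| + |T|) for each class over flat label lists."""
    scores = []
    for c in range(num_classes):
        p = [1 if v == c else 0 for v in pred]
        t = [1 if v == c else 0 for v in target]
        intersection = sum(a * b for a, b in zip(p, t))
        # `smooth` avoids division by zero when a class is absent from both.
        scores.append((2 * intersection + smooth) / (sum(p) + sum(t) + smooth))
    return scores


def classwise_dice_loss(pred, target, num_classes):
    """One minus the unweighted mean of the per-class Dice scores (assumed form)."""
    scores = dice_per_class(pred, target, num_classes)
    return 1.0 - sum(scores) / num_classes


pred = [0, 1, 1, 0]
target = [0, 1, 0, 0]
scores = dice_per_class(pred, target, num_classes=2)
loss = classwise_dice_loss(pred, target, num_classes=2)
```

In this toy case the background class scores 4/5 and the foreground class 2/3, so averaging per class keeps a small foreground region from being swamped by a large, easy background.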
ICML Conference 2021 Conference Paper
In previous Capsule Neural Networks (CapsNets), routing algorithms often performed clustering processes to assemble the child capsules’ representations into parent capsules. Such routing algorithms were typically implemented with iterative processes and incurred high computing complexity. This paper presents a new capsule structure, which contains a set of optimizable receptors and a transmitter is devised on the capsule’s representation. Specifically, child capsules’ representations are sent to the parent capsules whose receptors match well the transmitters of the child capsules’ representations, avoiding applying computationally complex routing algorithms. To ensure the receptors in a CapsNet work cooperatively, we build a skeleton to organize the receptors in different capsule layers in a CapsNet. The receptor skeleton assigns a share-out objective for each receptor, making the CapsNet perform as a hierarchical agglomerative clustering process. Comprehensive experiments verify that our approach facilitates efficient clustering processes, and CapsNets with our approach significantly outperform CapsNets with previous routing algorithms on image classification, affine transformation generalization, overlapped object recognition, and representation semantic decoupling.
IJCAI Conference 2021 Conference Paper
Multi-lead electrocardiogram (ECG) provides clinical information of heartbeats from several fixed viewpoints determined by the lead positioning. However, it is often not satisfactory to visualize ECG signals in these fixed and limited views, as some clinically useful information is represented only from a few specific ECG viewpoints. For the first time, we propose a new concept, Electrocardio Panorama, which allows visualizing ECG signals from any queried viewpoints. To build Electrocardio Panorama, we assume that an underlying electrocardio field exists, representing locations, magnitudes, and directions of ECG signals. We present a Neural electrocardio field Network (Nef-Net), which first predicts the electrocardio field representation by using a sparse set of one or few input ECG views and then synthesizes Electrocardio Panorama based on the predicted representations. Specifically, to better disentangle electrocardio field information from viewpoint biases, a new Angular Encoding is proposed to process viewpoint angles. Also, we propose a self-supervised learning approach called Standin Learning, which helps model the electrocardio field without direct supervision. Further, with very few modifications, Nef-Net can synthesize ECG signals from scratch. Experiments verify that our Nef-Net performs well on Electrocardio Panorama synthesis, and outperforms the previous work on the auxiliary tasks (ECG view transformation and ECG synthesis from scratch). The codes and the division labels of cardiac cycles and ECG deflections on Tianchi ECG and PTB datasets are available at https://github.com/WhatAShot/Electrocardio-Panorama.
JBHI Journal 2021 Journal Article
Keratoconus is one of the most severe corneal diseases, which is difficult to detect at the early stage (i.e., sub-clinical keratoconus) and possibly results in vision loss. In this paper, we propose a novel end-to-end deep learning approach, called KerNet, which processes the raw data of the Pentacam HR system (consisting of five numerical matrices) to detect keratoconus and sub-clinical keratoconus. Specifically, we propose a novel convolutional neural network, called KerNet, containing five branches as the backbone with a multi-level fusion architecture. The five branches receive five matrices separately and capture effectively the features of different matrices by several cascaded residual blocks. The multi-level fusion architecture (i.e., low-level fusion and high-level fusion) moderately takes into account the correlation among five slices and fuses the extracted features for better prediction. Experimental results show that: (1) our novel approach outperforms state-of-the-art methods on an in-house dataset, by ~1% for keratoconus detection accuracy and ~4 for sub-clinical keratoconus detection accuracy; (2) the attention maps visualized by Grad-CAM show that our KerNet places more attention on the inferior temporal part for sub-clinical keratoconus, which has been proved as the identifying regions for ophthalmologists to detect sub-clinical keratoconus in previous clinical studies. To our best knowledge, we are the first to propose an end-to-end deep learning approach utilizing raw data obtained by the Pentacam HR system for keratoconus and sub-clinical keratoconus detection. Further, the prediction performance and the clinical significance of our KerNet are well evaluated and proved by two clinical experts. Our code is available at https://github.com/upzheng/Keratoconus.
AAAI Conference 2020 Conference Paper
Image segmentation is critical to many medical applications. While deep learning (DL) methods continue to improve performance for many medical image segmentation tasks, data annotation is a big bottleneck to DL-based segmentation because (1) DL models tend to need a large amount of labeled data to train, and (2) it is highly time-consuming and labor-intensive to voxel-wise label 3D medical images. Significantly reducing annotation effort while attaining good performance of DL segmentation models remains a major challenge. In our preliminary experiments, we observe that, using partially labeled datasets, there is indeed a large performance gap with respect to using fully annotated training datasets. In this paper, we propose a new DL framework for reducing annotation effort and bridging the gap between full annotation and sparse annotation in 3D medical image segmentation. We achieve this by (i) selecting representative slices in 3D images that minimize data redundancy and save annotation effort, and (ii) self-training with pseudo-labels automatically generated from the base-models trained using the selected annotated slices. Extensive experiments using two public datasets (the HVSMR 2016 Challenge dataset and mouse piriform cortex dataset) show that our framework yields competitive segmentation results compared with state-of-the-art DL methods using less than ~20% of annotated data.
AAAI Conference 2019 Conference Paper
3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them together, one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network based meta-learner to learn how to improve the results from 2D and 3D models (base-learners). Then, to minimize over-fitting for our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods.
AAAI Conference 2019 Conference Paper
Deep learning has been applied successfully to many biomedical image segmentation tasks. However, due to the diversity and complexity of biomedical image data, manual annotation for training common deep learning models is very time-consuming and labor-intensive, especially because normally only biomedical experts can annotate image data well. Human experts are often involved in a long and iterative process of annotation, as in active learning type annotation schemes. In this paper, we propose representative annotation (RA), a new deep learning framework for reducing annotation effort in biomedical image segmentation. RA uses unsupervised networks for feature extraction and selects representative image patches for annotation in the latent space of learned feature descriptors, which implicitly characterizes the underlying data while minimizing redundancy. A fully convolutional network (FCN) is then trained using the annotated selected image patches for image segmentation. Our RA scheme offers three compelling advantages: (1) It leverages the ability of deep neural networks to learn better representations of image data; (2) it performs one-shot selection for manual annotation and frees annotators from the iterative process of common active learning based annotation schemes; (3) it can be deployed to 3D images with simple extensions. We evaluate our RA approach using three datasets (two 2D and one 3D) and show our framework yields competitive segmentation results compared with state-of-the-art methods.
TCS Journal 2015 Journal Article
TCS Journal 2015 Journal Article
TCS Journal 2013 Journal Article
FOCS Conference 2013 Conference Paper
In this paper, we study a generalization of the classical Voronoi diagram, called clustering induced Voronoi diagram (CIVD). Different from the traditional model, CIVD takes as its sites the power set $U$ of an input set $P$ of objects. For each subset $C$ of $P$, CIVD uses an influence function $F(C, q)$ to measure the total (or joint) influence of all objects in $C$ on an arbitrary point $q$ in the space $\mathbb{R}^d$, and determines the influence-based Voronoi cell in $\mathbb{R}^d$ for $C$. This generalized model offers a number of new features (e.g., simultaneous clustering and space partition) to Voronoi diagram which are useful in various new applications. We investigate the general conditions for the influence function which ensure the existence of a small-size (e.g., nearly linear) approximate CIVD for a set $P$ of $n$ points in $\mathbb{R}^d$ for some fixed $d$. To construct CIVD, we first present a standalone new technique, called approximate influence (AI) decomposition, for the general CIVD problem. With only $O(n \log n)$ time, the AI decomposition partitions the space $\mathbb{R}^d$ into a nearly linear number of cells so that all points in each cell receive their approximate maximum influence from the same (possibly unknown) site (i.e., a subset of $P$). Based on this technique, we develop assignment algorithms to determine a proper site for each cell in the decomposition and form various $(1-\epsilon)$-approximate CIVDs for some small fixed $\epsilon > 0$. Particularly, we consider two representative CIVD problems, vector CIVD and density-based CIVD, and show that both of them admit fast assignment algorithms; consequently, their $(1-\epsilon)$-approximate CIVDs can be built in $O(n \log^{d+1} n)$ and $O(n \log^{2} n)$ time, respectively.
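To make the CIVD definition concrete, the following brute-force sketch enumerates every non-empty subset $C$ of a tiny planar point set $P$ and returns the one with maximum influence on a query point $q$. It is exponential in $|P|$ and is exactly the enumeration the paper's approximate decomposition is designed to avoid. The influence function used here, $|C|$ divided by the squared distance from $q$ to its farthest point in $C$, is an illustrative assumption and not necessarily the paper's density-based definition; the function names are hypothetical.

```python
from itertools import combinations
import math


def influence(C, q):
    """Assumed influence: |C| / r^2, with r the farthest distance from q to C."""
    r2 = max((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 for p in C)
    return len(C) / r2 if r2 > 0 else math.inf


def best_site(P, q):
    """Exhaustively search the power set of P for the max-influence subset."""
    best, best_val = None, -math.inf
    for k in range(1, len(P) + 1):
        for C in combinations(P, k):
            val = influence(C, q)
            if val > best_val:
                best, best_val = C, val
    return best, best_val


P = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
site, value = best_site(P, (0.5, 1.0))
```

For this query point the two nearby points jointly exert more influence than either does alone, while adding the distant third point lowers it, so the winning site is a cluster rather than a single object.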
TCS Journal 2012 Journal Article
SODA Conference 2011 Conference Paper
SODA Conference 2001 Conference Paper
SODA Conference 2001 Conference Paper
SODA Conference 2000 Conference Paper
STOC Conference 2000 Conference Paper
SODA Conference 1996 Conference Paper
IROS Conference 1995 Conference Paper
A conditional shortest path is a collision-free path of shortest distance based on known information on an obstacle-scattered environment at a given time. This paper investigates the problem of finding a conditional $L_2$ shortest path through an unknown environment in which path planning is implemented "on the fly" as new obstacle information becomes available through external sensors. We propose a novel cell decomposition approach which calculates an $L_2$ distance transform through the use of a circular path-planning wave. The proposed method is based on a new data structure, called the framed-quadtree, which combines the accuracy of grid-based path planning techniques with the efficiency of quadtree-based techniques, hence having the advantages of both. The heart of this method is a linear time algorithm for computing dynamic Voronoi diagrams.