Author name cluster

Jintai Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers

2 author rows

JBHI Journal 2026 Journal Article

RetinexDA: Progressive Disentanglement Domain Adaptation for Unsupervised Cross-Modality Medical Image Segmentation

Yixuan Wu
Mingze Yin
Zitai Kong
Jintai Chen
Jian Wu
Honghao Gao
Hongxia Xu

Deep neural networks have achieved strong performance in medical image segmentation when the training and testing data share similar appearance characteristics. However, this assumption is rarely satisfied in practical clinical scenarios, where imaging protocols, scanner vendors, and modality physics differ substantially, resulting in severe performance degradation when the model is deployed to new environments. To address this challenge, we propose RetinexDA, a novel unsupervised domain adaptation framework that explicitly decomposes a medical image into domain-invariant structural and domain-specific appearance representations. This Retinex-inspired formulation preserves essential anatomical details while mitigating modality-dependent variations. Furthermore, we introduce Disentangled Knowledge Distillation (DKD) to ensure mutual semantic alignment between the structure–appearance decomposition in pixel space and the encoded features in latent space, strengthening fine-grained segmentation capability. In addition, a Bézier-curve domain bridging strategy is developed to generate smoothly transitioned intermediate samples across domains, improving adaptation robustness under large modality discrepancies. Extensive experiments on abdominal CT and cardiac MRI segmentation tasks demonstrate that RetinexDA surpasses state-of-the-art unsupervised domain adaptation approaches, showing strong potential for scalable and reliable clinical deployment.

AAAI Conference 2025 Conference Paper

ProtCLIP: Function-Informed Protein Multi-Modal Learning

Hanjing Zhou
Mingze Yin
Wei Wu
Mingyang Li
Kun Fu
Jintai Chen
Jian Wu
Zheng Wang

Multi-modality pre-training paradigm that aligns protein sequences and biological descriptions has learned general protein representations and achieved promising performance in various downstream applications. However, these works were still unable to replicate the extraordinary success of language-supervised visual foundation models due to the ineffective usage of aligned protein-text paired data and the lack of an effective function-informed pre-training paradigm. To address these issues, this paper curates a large-scale protein-text paired dataset called ProtAnno with a property-driven sampling strategy, and introduces a novel function-informed protein pre-training paradigm. Specifically, the sampling strategy determines selecting probability based on the sample confidence and property coverage, balancing the data quality and data quantity in face of large-scale noisy data. Furthermore, motivated by significance of the protein specific functional mechanism, the proposed paradigm explicitly model protein static and dynamic functional segments by two segment-wise pre-training objectives, injecting fine-grained information in a function-informed manner. Leveraging all these innovations, we develop ProtCLIP, a multi-modality foundation model that comprehensively represents function-aware protein embeddings. On 22 different protein benchmarks within 5 types, including protein functionality classification, mutation effect prediction, cross-modal transformation, semantic similarity inference and protein-protein interaction prediction, our ProtCLIP consistently achieves SOTA performance, with remarkable improvements of 75% on average in five cross-modal transformation benchmarks, 59.9% in GO-CC and 39.7% in GO-BP protein function prediction. The experimental results verify the extraordinary potential of ProtCLIP serving as the protein multi-modality foundation model.

ICLR Conference 2025 Conference Paper

Small Models are LLM Knowledge Triggers for Medical Tabular Prediction

Jiahuan Yan
Jintai Chen
Chaowen Hu
Bo Zheng 0011
Yaojun Hu
Jimeng Sun 0001
Jian Wu 0001

Recent development in large language models (LLMs) has demonstrated impressive domain proficiency on unstructured textual or multi-modal tasks. However, despite with intrinsic world knowledge, their application on structured tabular data prediction still lags behind, primarily due to the numerical insensitivity and modality discrepancy that brings a gap between LLM reasoning and statistical tabular learning. Unlike textual or vision data (e.g., electronic clinical notes or medical imaging data), tabular data is often presented in heterogeneous numerical values (e.g., CBC reports). This ubiquitous data format requires intensive expert annotation, and its numerical nature limits LLMs' capability to effectively transfer untapped domain expertise. In this paper, we propose SERSAL, a general self-prompting method by synergy learning with small models to enhance LLM tabular prediction in an unsupervised manner. Specifically, SERSAL utilizes the LLM's prior outcomes as original soft noisy annotations, which are dynamically leveraged to teach a better small student model. Reversely, the outcomes from the trained small model are used to teach the LLM to further refine its real capability. This process can be repeatedly applied to gradually distill refined knowledge for continuous progress. Comprehensive experiments on widely used medical domain tabular datasets show that, without access to gold labels, applying SERSAL to OpenAI GPT reasoning process attains substantial improvement compared to linguistic prompting methods, which serves as an orthogonal direction for tabular LLM, and increasing prompting bonus is observed as more powerful LLMs appear. Codes are available at https://github.com/jyansir/sersal.

NeurIPS Conference 2025 Conference Paper

Toward Human Deictic Gesture Target Estimation

Xu Cao
Pranav Virupaksha
Sangmin Lee
Bolin Lai
Wenqi Jia
Jintai Chen
James Rehg

Humans have a remarkable ability to use co-speech deictic gestures, such as pointing and showing, to enrich verbal communication and support social interaction. These gestures are so fundamental that infants begin to use them even before they acquire spoken language, which highlights their central role in human communication. Understanding the intended targets of another individual's deictic gestures enables inference of their intentions, comprehension of their current actions, and prediction of upcoming behaviors. Despite its significance, gesture target estimation remains an underexplored task within the computer vision community. In this paper, we introduce GestureTarget, a novel task designed specifically for comprehensive evaluation of social deictic gesture semantic target estimation. To address this task, we propose TransGesture, a set of Transformer-based gesture target prediction models. Given an input image and the spatial location of a person, our models predict the intended target of their gesture within the scene. Critically, our gaze-aware joint cross attention fusion model demonstrates how incorporating gaze-following cues significantly improves gesture target mask prediction IoU by 6% and gesture existence prediction accuracy by 10%. Our results underscore the complexity and importance of integrating gaze cues into deictic gesture intention understanding, advocating for increased research attention to this emerging area. All data, code will be made publicly available upon acceptance. Code of TransGesture is available at github.com/IrohXu/TransGesture.

ICLR Conference 2024 Conference Paper

Making Pre-trained Language Models Great on Tabular Prediction

Jiahuan Yan
Bo Zheng 0011
Hongxia Xu
Yiheng Zhu 0002
Danny Z. Chen
Jimeng Sun 0001
Jian Wu 0001
Jintai Chen

The transferability of deep neural networks (DNNs) has made significant progress in image and language processing. However, due to the heterogeneity among tables, such DNN bonus is still far from being well exploited on tabular data prediction (e.g., regression or classification tasks). Condensing knowledge from diverse domains, language models (LMs) possess the capability to comprehend feature names from various tables, potentially serving as versatile learners in transferring knowledge across distinct tables and diverse prediction tasks, but their discrete text representation space is inherently incompatible with numerical feature values in tables. In this paper, we present TP-BERTa, a specifically pre-trained LM for tabular data prediction. Concretely, a novel relative magnitude tokenization converts scalar numerical feature values to finely discrete, high-dimensional tokens, and an intra-feature attention approach integrates feature values with the corresponding feature names. Comprehensive experiments demonstrate that our pre-trained TP-BERTa leads the performance among tabular DNNs and is competitive with Gradient Boosted Decision Tree models in typical tabular data regime.

IJCAI Conference 2024 Conference Paper

Personalized Heart Disease Detection via ECG Digital Twin Generation

Yaojun Hu
Jintai Chen
Lianting Hu
Dantong Li
Jiahuan Yan
Haochao Ying
Huiying Liang
Jian Wu

Heart diseases rank among the leading causes of global mortality, demonstrating a crucial need for early diagnosis and intervention. Most traditional electrocardiogram (ECG) based automated diagnosis methods are trained at population level, neglecting the customization of personalized ECGs to enhance individual healthcare management. A potential solution to address this limitation is to employ digital twins to simulate symptoms of diseases in real patients. In this paper, we present an innovative prospective learning approach for personalized heart disease detection, which generates digital twins of healthy individuals' anomalous ECGs and enhances the model sensitivity to the personalized symptoms. In our approach, a vector quantized feature separator is proposed to locate and isolate the disease symptom and normal segments in ECG signals with ECG report guidance. Thus, the ECG digital twins can simulate specific heart diseases used to train a personalized heart disease detection model. Experiments demonstrate that our approach not only excels in generating high-fidelity ECG signals but also improves personalized heart disease detection. Moreover, our approach ensures robust privacy protection, safeguarding patient data in model development. The code can be found at https://github.com/huyjj/LAVQ-Editor.

JBHI Journal 2024 Journal Article

Polygonal Approximation Learning for Convex Object Segmentation in Biomedical Images With Bounding Box Supervision

Wenhao Zheng
Jintai Chen
Kai Zhang
Jiahuan Yan
Jinhong Wang
Yi Cheng
Bang Du
Danny Z. Chen

As a common and critical medical image analysis task, deep learning based biomedical image segmentation is hindered by the dependence on costly fine-grained annotations. To alleviate this data dependence, in this article, a novel approach, called Polygonal Approximation Learning (PAL), is proposed for convex object instance segmentation with only bounding-box supervision. The key idea behind PAL is that the detection model for convex objects already contains the necessary information for segmenting them since their convex hulls, which can be generated approximately by the intersection of bounding boxes, are equivalent to the masks representing the objects. To extract the essential information from the detection model, a repeated detection approach is employed on biomedical images where various rotation angles are applied and a dice loss with the projection of the rotated detection results is utilized as a supervised signal in training our segmentation model. In biomedical imaging tasks involving convex objects, such as nuclei instance segmentation, PAL outperforms the known models (e.g., BoxInst) that rely solely on box supervision. Furthermore, PAL achieves comparable performance with mask-supervised models including Mask R-CNN and Cascade Mask R-CNN. Interestingly, PAL also demonstrates remarkable performance on non-convex object instance segmentation tasks, for example, surgical instrument and organ instance segmentation.
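
The geometric observation in this abstract, that the intersection of bounding boxes taken at several rotation angles approximates a convex object's hull, can be illustrated with a small sketch. This is not the PAL training procedure or the authors' code; the function names `box_extents` and `in_box_intersection`, the triangle, and the chosen angles are all assumptions made purely for illustration.

```python
import math

def box_extents(points, angle):
    """Extents of `points` along axes rotated by `angle` radians.

    Returns (min_u, max_u, min_v, max_v), i.e. the tight bounding box
    of the points in the rotated coordinate frame.
    """
    ux, uy = math.cos(angle), math.sin(angle)
    us = [x * ux + y * uy for x, y in points]
    vs = [-x * uy + y * ux for x, y in points]
    return min(us), max(us), min(vs), max(vs)

def in_box_intersection(p, points, angles, eps=1e-9):
    """True if `p` lies inside every rotated bounding box of `points`."""
    x, y = p
    for angle in angles:
        ux, uy = math.cos(angle), math.sin(angle)
        lo_u, hi_u, lo_v, hi_v = box_extents(points, angle)
        u = x * ux + y * uy
        v = -x * uy + y * ux
        if not (lo_u - eps <= u <= hi_u + eps and lo_v - eps <= v <= hi_v + eps):
            return False
    return True

# Hypothetical convex object: a right triangle.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
single_box = [0.0]                                   # axis-aligned box only
rotated_boxes = [k * math.pi / 8 for k in range(4)]  # 0, 22.5, 45, 67.5 degrees

# (3, 3) is in the axis-aligned box but outside the triangle;
# the 45-degree box removes it from the intersection.
corner_in_single = in_box_intersection((3.0, 3.0), triangle, single_box)
corner_in_rotated = in_box_intersection((3.0, 3.0), triangle, rotated_boxes)
interior_in_rotated = in_box_intersection((1.0, 1.0), triangle, rotated_boxes)
```

Adding more angles tightens the intersection toward the convex hull, which is why box-level information alone can approximate a mask for convex shapes.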

ICLR Conference 2023 Conference Paper

Cross-Layer Retrospective Retrieving via Layer Attention

Yanwen Fang
Yuxi Cai
Jintai Chen
Jingyu Zhao 0001
Guangjian Tian
Guodong Li

More and more evidence has shown that strengthening layer interactions can enhance the representation power of a deep neural network, while self-attention excels at learning interdependencies by retrieving query-activated information. Motivated by this, we devise a cross-layer attention mechanism, called multi-head recurrent layer attention (MRLA), that sends a query representation of the current layer to all previous layers to retrieve query-related information from different levels of receptive fields. A light-weighted version of MRLA is also proposed to reduce the quadratic computation cost. The proposed layer attention mechanism can enrich the representation power of many state-of-the-art vision networks, including CNNs and vision transformers. Its effectiveness has been extensively evaluated in image classification, object detection and instance segmentation tasks, where improvements can be consistently observed. For example, our MRLA can improve 1.6% Top-1 accuracy on ResNet-50, while only introducing 0.16M parameters and 0.07B FLOPs. Surprisingly, it can boost the performances by a large margin of 3-4% box AP and mask AP in dense prediction tasks. Our code is available at https://github.com/joyfang1106/MRLA.

AAAI Conference 2023 Conference Paper

T2G-FORMER: Organizing Tabular Features into Relation Graphs Promotes Heterogeneous Feature Interaction

Jiahuan Yan
Jintai Chen
Yixuan Wu
Danny Z. Chen
Jian Wu

Recent development of deep neural networks (DNNs) for tabular learning has largely benefited from the capability of DNNs for automatic feature interaction. However, the heterogeneity nature of tabular features makes such features relatively independent, and developing effective methods to promote tabular feature interaction still remains an open problem. In this paper, we propose a novel Graph Estimator, which automatically estimates the relations among tabular features and builds graphs by assigning edges between related features. Such relation graphs organize independent tabular features into a kind of graph data such that interaction of nodes (tabular features) can be conducted in an orderly fashion. Based on our proposed Graph Estimator, we present a bespoke Transformer network tailored for tabular learning, called T2G-Former, which processes tabular data by performing tabular feature interaction guided by the relation graphs. A specific Cross-level Readout collects salient features predicted by the layers in T2G-Former across different levels, and attains global semantics for final prediction. Comprehensive experiments show that our T2G-Former achieves superior performance among DNNs and is competitive with non-deep Gradient Boosted Decision Tree models. The code and detailed results are available at https://github.com/jyansir/t2g-former.

ICLR Conference 2023 Conference Paper

TabCaps: A Capsule Neural Network for Tabular Data Classification with BoW Routing

Jintai Chen
Kuanlun Liao
Yanwen Fang
Danny Z. Chen
Jian Wu 0001

Records in a table are represented by a collection of heterogeneous scalar features. Previous work often made predictions for records in a paradigm that processed each feature as an operating unit, which requires to well cope with the heterogeneity. In this paper, we propose to encapsulate all feature values of a record into vectorial features and process them collectively rather than have to deal with individual ones, which directly captures the representations at the data level and benefits robust performances. Specifically, we adopt the concept of "capsules" to organize features into vectorial features, and devise a novel capsule neural network called "TabCaps" to process the vectorial features for classification. In TabCaps, a record is encoded into several vectorial features by some optimizable multivariate Gaussian kernels in the primary capsule layer, where each vectorial feature represents a specific "profile" of the input record and is transformed into senior capsule layer under the guidance of a new straightforward routing algorithm. The design of routing algorithm is motivated by the Bag-of-Words (BoW) model, which performs capsule feature grouping straightforwardly and efficiently, in lieu of the computationally complex clustering of previous routing algorithms. Comprehensive experiments show that TabCaps achieves competitive and robust performances in tabular data classification tasks.

AAAI Conference 2022 Conference Paper

DANets: Deep Abstract Networks for Tabular Data Classification and Regression

Jintai Chen
Kuanlun Liao
Yao Wan
Danny Z. Chen
Jian Wu

Tabular data are ubiquitous in real world applications. Although many commonly-used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them were effective for tabular data and few designs were adequately tailored for tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (ABSTLAY), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. Also, we design a structure re-parameterization method to compress the trained ABSTLAY, thus reducing the computational complexity by a clear margin in the reference phase. A special basic block is built using ABSTLAYs, and we construct a family of Deep Abstract Networks (DANETs) for tabular data classification and regression by stacking such blocks. In DANETs, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our ABSTLAY and DANETs are effective for tabular data classification and regression, and the computational complexity is superior to competitive methods. Besides, we evaluate the performance gains of DANET as it goes deep, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.

ICML Conference 2022 Conference Paper

ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases

Jintai Chen
Kuanlun Liao
Kun Wei
Haochao Ying
Danny Z. Chen
Jian Wu 0001

Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases. Many studies have devised ECG analysis models (e.g., classifiers) to assist diagnosis. As an upstream task, researches have built generative models to synthesize ECG data, which are beneficial to providing training samples, privacy protection, and annotation reduction. However, previous generative methods for ECG often neither synthesized multi-view data, nor dealt with heart disease conditions. In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. Since ECG manifestations of heart diseases are often localized in specific waveforms, we propose a new "mixup normalization" to inject disease information precisely into suitable locations. In addition, we propose a "view discriminator" to revert disordered ECG views into a pre-determined order, supervising the generator to obtain ECG representing correct view characteristics. Besides, a new metric, rFID, is presented to assess the quality of the synthesized ECG signals. Comprehensive experiments verify that our ME-GAN performs well on multi-view ECG signal synthesis with trusty morbid manifestations.

JBHI Journal 2021 Journal Article

A Deep Learning Approach for Colonoscopy Pathology WSI Analysis: Accurate Segmentation and Classification

Ruiwei Feng
Xuechen Liu
Jintai Chen
Danny Z. Chen
Honghao Gao
Jian Wu

Colorectal cancer (CRC) is one of the most life-threatening malignancies. Colonoscopy pathology examination can identify cells of early-stage colon tumors in small tissue image slices. But, such examination is time-consuming and exhausting on high resolution images. In this paper, we present a new framework for colonoscopy pathology whole slide image (WSI) analysis, including lesion segmentation and tissue diagnosis. Our framework contains an improved U-shape network with a VGG net as backbone, and two schemes for training and inference, respectively (the training scheme and inference scheme). Based on the characteristics of colonoscopy pathology WSI, we introduce a specific sampling strategy for sample selection and a transfer learning strategy for model training in our training scheme. Besides, we propose a specific loss function, class-wise DSC loss, to train the segmentation network. In our inference scheme, we apply a sliding-window based sampling strategy for patch generation and diploid ensemble (data ensemble and model ensemble) for the final prediction. We use the predicted segmentation mask to generate the classification probability for the likelihood of WSI being malignant. To our best knowledge, DigestPath 2019 is the first challenge and the first public dataset available on colonoscopy tissue screening and segmentation, and our proposed framework yields good performance on this dataset. Our new framework achieved a DSC of 0.7789 and AUC of 1 on the online test dataset, and we won the 2nd place in the DigestPath 2019 Challenge (task 2). Our code is available at https://github.com/bhfs9999/colonoscopy_tissue_screen_and_segmentation.

ICML Conference 2021 Conference Paper

A Receptor Skeleton for Capsule Neural Networks

Jintai Chen
Hongyun Yu
Chengde Qian
Danny Z. Chen
Jian Wu 0001

In previous Capsule Neural Networks (CapsNets), routing algorithms often performed clustering processes to assemble the child capsules’ representations into parent capsules. Such routing algorithms were typically implemented with iterative processes and incurred high computing complexity. This paper presents a new capsule structure, which contains a set of optimizable receptors and a transmitter is devised on the capsule’s representation. Specifically, child capsules’ representations are sent to the parent capsules whose receptors match well the transmitters of the child capsules’ representations, avoiding applying computationally complex routing algorithms. To ensure the receptors in a CapsNet work cooperatively, we build a skeleton to organize the receptors in different capsule layers in a CapsNet. The receptor skeleton assigns a share-out objective for each receptor, making the CapsNet perform as a hierarchical agglomerative clustering process. Comprehensive experiments verify that our approach facilitates efficient clustering processes, and CapsNets with our approach significantly outperform CapsNets with previous routing algorithms on image classification, affine transformation generalization, overlapped object recognition, and representation semantic decoupling.

IJCAI Conference 2021 Conference Paper

Electrocardio Panorama: Synthesizing New ECG views with Self-supervision

Jintai Chen
Xiangshang Zheng
Hongyun Yu
Danny Z. Chen
Jian Wu

Multi-lead electrocardiogram (ECG) provides clinical information of heartbeats from several fixed viewpoints determined by the lead positioning. However, it is often not satisfactory to visualize ECG signals in these fixed and limited views, as some clinically useful information is represented only from a few specific ECG viewpoints. For the first time, we propose a new concept, Electrocardio Panorama, which allows visualizing ECG signals from any queried viewpoints. To build Electrocardio Panorama, we assume that an underlying electrocardio field exists, representing locations, magnitudes, and directions of ECG signals. We present a Neural electrocardio field Network (Nef-Net), which first predicts the electrocardio field representation by using a sparse set of one or few input ECG views and then synthesizes Electrocardio Panorama based on the predicted representations. Specially, to better disentangle electrocardio field information from viewpoint biases, a new Angular Encoding is proposed to process viewpoint angles. Also, we propose a self-supervised learning approach called Standin Learning, which helps model the electrocardio field without direct supervision. Further, with very few modifications, Nef-Net can synthesize ECG signals from scratch. Experiments verify that our Nef-Net performs well on Electrocardio Panorama synthesis, and outperforms the previous work on the auxiliary tasks (ECG view transformation and ECG synthesis from scratch). The codes and the division labels of cardiac cycles and ECG deflections on Tianchi ECG and PTB datasets are available at https://github.com/WhatAShot/Electrocardio-Panorama.
