Arrow Research search

Author name cluster

Feng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

39 papers
2 author rows

Possible papers (39)

JBHI Journal 2026 Journal Article

A Hierarchical Attention-Based Negative Sampling Method for Drug Repositioning Using Neighborhood Interaction Fusion

  • Chenglong Mi
  • Ling-Yun Dai
  • Junliang Shang
  • Rong Zhu
  • Juan Wang
  • Feng Li

Accurate prediction of drug–disease associations (DDAs) is essential for drug repositioning and the development of novel therapeutic strategies. However, existing methods often suffer from limited prior knowledge and oversimplified negative sampling techniques, which hinder their ability to capture the complex relationships between drugs and diseases. To overcome these limitations, we propose a new model, Hierarchical Attention Mechanism-Based Negative Sampling (HA-NegS), which aims to enhance the prediction of potential DDAs. HA-NegS computes similarity information between drugs and diseases and constructs heterogeneous and homogeneous networks from it. For the similarity network, HA-NegS fuses a Graph Convolutional Network (GCN) and a Graph Attention Network (GAT) to effectively capture the neighborhood features of the target nodes. Subsequently, the model incorporates a hierarchical sampling strategy that uses the PageRank algorithm to rank nodes in descending order of global importance. An attention mechanism then computes attention scores and re-ranks the nodes accordingly, ensuring the reliability of negative sample selection. To obtain optimized representations, we use graph contrastive learning to refine drug and disease features with homogeneous and heterogeneous neighborhood information. Experimental results on a benchmark dataset show that HA-NegS outperforms existing baseline methods in predicting DDAs. In addition, case studies on Alzheimer's disease and Parkinson's disease highlight the effectiveness of HA-NegS in discovering new therapeutic applications for existing drugs.
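
The PageRank ranking step this abstract describes can be illustrated with a short sketch: a generic power-iteration PageRank over a toy adjacency list, sorted in descending order of importance. This is not the authors' code; the graph, damping factor, and iteration count are illustrative assumptions.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank. adj: dict node -> list of out-neighbours."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Toy graph standing in for the drug/disease nodes; 'd2' receives the
# most incoming links, so it should rank first.
adj = {"d1": ["d2"], "d2": ["d1"], "d3": ["d2"], "d4": ["d2", "d1"]}
ranks = pagerank(adj)
ordered = sorted(ranks, key=ranks.get, reverse=True)  # descending importance
```

The descending-order list `ordered` is the input the abstract's attention mechanism would then re-rank.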

AAAI Conference 2026 Short Paper

Can Large Language Models Grasp 3D Medical Anatomy Shapes? (Student Abstract)

  • Yao Gao
  • Feng Li
  • Jeroen Van Dessel
  • Yi Sun
  • Robin Willaert

What if the next generation of human-computer interaction is not a screen... but a conversation? Large Language Models (LLMs) offer a new paradigm for interacting with computers through text, but they lack shape reasoning capabilities. We introduce Textual Anatomy Encoding (TAE), a workflow that connects LLMs with 3D anatomies. TAE employs clinician-validated semantic annotations and rule-based prompts to achieve deterministic and interpretable landmark localization. The results indicate that TAE enables LLMs to move beyond textual knowledge, achieving an accurate understanding of anatomical localization. This framework opens opportunities for diagnosis, surgical planning, and scalable medical annotation, positioning LLMs as a foundation for next-generation human–computer interaction in healthcare.

JBHI Journal 2026 Journal Article

Epileptic Seizure Prediction Using Multi-Strategy Data Augmentation and Hierarchical Contrastive Learning

  • Longfei Qi
  • Feng Li
  • Junliang Shang
  • Daohui Ge
  • Shihan Wang
  • Shasha Yuan

Accurate early prediction of epileptic seizures is crucial for improving patients' quality of life. However, existing seizure prediction methods often rely on large-scale labeled datasets and face challenges in generalization and real-time performance. To address these issues, this study proposes an efficient seizure prediction framework that achieves high performance even with limited labeled data, significantly reducing dependence on extensive annotations. To better distinguish preictal states, contrastive learning is employed to enhance feature separation between interictal and preictal periods, leading to improved sensitivity in detecting early seizure patterns. First, a data augmentation strategy is designed, incorporating wavelet-based frequency mixing, temporal masking, and window-based masking to enhance model robustness and generalization. Second, a hierarchical contrastive loss function is introduced, integrating instance-level and temporal contrastive learning to improve the model's ability to capture preictal patterns. Finally, a lightweight SE-EEGNet is developed and optimized as a feature extractor, strengthening critical feature extraction and enabling real-time seizure prediction. On the CHB-MIT dataset, the proposed method achieves 94.51% accuracy, 95.05% sensitivity, a 0.024/h false positive rate (FPR), and a 20.12-minute prediction time using only 30% labeled data. On the Siena dataset, it achieves 93.14% accuracy, 92.77% sensitivity, and a 0.030/h FPR. Moreover, performance improves further as the amount of labeled data increases, validating the effectiveness and practical applicability of the proposed approach in seizure prediction.

EAAI Journal 2026 Journal Article

Parameter identification for nonlinear Hammerstein models with stacked sparse autoencoder network

  • Feng Li
  • Liexin Song
  • Tianhu Wang
  • Ranran Liu

In this paper, a novel parameter identification method is presented for the Hammerstein nonlinear model with a stacked sparse autoencoder (SSAE) network. The Hammerstein model is composed of a nonlinear block and a linear dynamic block, in which the nonlinear block is modeled by an SSAE network and the linear dynamic block is established by an autoregressive moving average with exogenous input (ARMAX) model. To estimate the Hammerstein model parameters, step input excitation is used to decouple the nonlinear block from the linear block. Firstly, to identify the ARMAX model parameters, multi-innovation and recursive extended theories are introduced and a multi-innovation recursive extended least squares (MI-RELS) method is proposed, which improves identification accuracy since both current and past data are utilized at each recursive computation. Secondly, the SSAE network parameters are updated through layer-wise pre-training and fine-tuning, employing a greedy algorithm and backpropagation to update the weights and biases of the network. Simulation comparisons on a numerical case and on wind power systems verify the feasibility of the developed Hammerstein model identification method.
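
The recursive least-squares core behind methods like MI-RELS can be illustrated with ordinary RLS; the multi-innovation variant additionally reuses a window of past innovations at each update. The sketch below is plain textbook RLS on an invented two-parameter regression, not the paper's MI-RELS implementation.

```python
def rls_identify(phis, ys, lam=1.0, delta=1e4):
    """Ordinary recursive least squares for theta in y = phi . theta.
    phis: list of regressor vectors; ys: outputs; lam: forgetting factor.
    (Plain RLS, not the paper's multi-innovation MI-RELS.)"""
    n = len(phis[0])
    theta = [0.0] * n
    # Large initial covariance => weak prior on theta.
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in zip(phis, ys):
        Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
        k = [v / denom for v in Pphi]                      # gain vector
        err = y - sum(phi[i] * theta[i] for i in range(n)) # innovation
        theta = [theta[i] + k[i] * err for i in range(n)]
        P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(n)]
             for i in range(n)]
    return theta

# Identify y[t] = 2.0*u[t] - 0.5*u[t-1] from noise-free toy data.
u = [0.3, -0.7, 1.1, 0.4, -0.2, 0.9, -1.3, 0.6, 0.8, -0.5]
phis = [[u[t], u[t - 1]] for t in range(1, len(u))]
ys = [2.0 * u[t] - 0.5 * u[t - 1] for t in range(1, len(u))]
theta = rls_identify(phis, ys)
```

On noise-free data the estimate converges to the true coefficients; the multi-innovation extension improves accuracy under noise by stacking several recent innovations into each correction.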

EAAI Journal 2026 Journal Article

Single-cell distillation discriminative clustering based on asymmetric autoencoder

  • Junliang Shang
  • Aitian Fan
  • Baojuan Qin
  • Yan Zhao
  • Xiaohan Zhang
  • Shoujia Jiang
  • Feng Li
  • Jin-Xing Liu

Single-cell ribonucleic acid sequencing (scRNA-seq) technology enables the analysis of tissue heterogeneity at the single-cell level, providing essential tools for tasks such as cell type identification and trajectory inference. As a core step in single-cell data analysis, cell clustering is crucial for identifying cell types, detecting subpopulations, and understanding cellular functional states. However, traditional clustering methods often fail to capture the true cellular structure due to the lack of prior knowledge, while supervised clustering methods are prone to domain distribution mismatches. To address these challenges, this paper proposes a single-cell distillation-based discriminative clustering method (scAADC), which leverages an asymmetric autoencoder to enhance feature extraction and reconstruction capabilities. Additionally, a contrastive learning strategy is incorporated to capture the most representative features of cells. Furthermore, a distillation-based discriminative clustering module is designed to utilize source domain labels and feature distance constraints, ensuring that similar cells cluster together while distinct cell types remain well separated. This allows the model to extract discriminative information from labeled source data. Finally, based on the extracted discriminative features, target data are clustered, implicitly aligning domain distributions and reducing cluster overlap. We evaluate scAADC on both simulated and real datasets. Experimental results demonstrate that scAADC achieves accuracy (ACC) and Adjusted Rand Index (ARI) values as high as 0.9976 and 0.9936 on simulated datasets, and 0.9728 and 0.9325 on real datasets, outperforming other state-of-the-art single-cell clustering methods. By integrating Artificial Intelligence-driven feature learning with cross-domain knowledge distillation, scAADC provides an efficient and robust solution for single-cell data analysis.

EAAI Journal 2026 Journal Article

Sliding mode control for Markov jump power systems: Asynchronous learning-based method

  • Xiulin Wang
  • Lei Su
  • Feng Li

This paper studies the problem of asynchronous learning-based sliding mode control (LSMC) of a single-machine infinite bus (SMIB) power system. Due to the influence of instantaneous faults and stochastic switching, the power system may experience dynamic changes in the structural parameters. To describe this phenomenon, the power system is expressed as a Markov jump power system (MJPS) model. In addition, a common sliding surface is designed to overcome the problem that the sliding surface may be unreachable due to mode switching. By constructing the Lyapunov function, sufficient conditions to ensure the stochastic stability of the system are derived, and a suitable sliding matrix is solved. In order to facilitate the design of learning parameters, an LSMC law related to the hidden Markov observation mode is designed. Compared with the mode-independent LSMC, the proposed method further accelerates the convergence speed and has better control performance. Finally, the superiority of the proposed method is verified by a simulation example.

AAAI Conference 2025 Conference Paper

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

  • Man Liu
  • Huihui Bai
  • Feng Li
  • Chunjie Zhang
  • Yunchao Wei
  • Tat-Seng Chua
  • Yao Zhao

Zero-shot learning (ZSL) endeavors to transfer knowledge from seen categories to recognize unseen categories, which mostly relies on the semantic-visual interactions between image and attribute tokens. Recently, prompt learning has emerged in ZSL and demonstrated significant potential, as it allows the zero-shot transfer of diverse visual concepts to downstream tasks. However, current methods explore a fixed adaptation of the learnable prompt on the seen domains, which makes them over-emphasize the primary visual features observed during training, limiting their generalization to unseen domains. In this work, we propose AENet, which endows the visual prompt with semantic information to distill a semantic-enhanced prompt for visual representation enrichment, enabling effective knowledge transfer for ZSL. AENet comprises two key steps: 1) exploring the concept-harmonized tokens for the visual and attribute modalities, grounded on the modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding the semantic-enhanced prompt via the visual residual refinement unit with attribute consistency supervision. It is further integrated with primary visual features to attend to semantic-related information for visual enhancement, thus strengthening transferability. Experimental results on three benchmarks show that our AENet outperforms existing state-of-the-art ZSL methods.

JBHI Journal 2025 Journal Article

Characterization of Cortical Connectivity in the Deception State With a Data-Driven Network Model Based on EEG Signal

  • Qianruo Kang
  • Yaqian Li
  • Xiang Li
  • Min Tian
  • Yin Xiang
  • Feng Li
  • Siyu Peng
  • Yijun Xiong

This study investigates the pattern of information interaction at the cortical level during deception, aiming to reveal the cognitive processes involved in the deception task. Our study involves the 64-channel EEG signals of 28 subjects (14 for innocent and 14 for guilty groups) acquired under the guilty knowledge test (GKT) lie-detection protocol. Additionally, we establish the functional connectivity network at the cortical level considering volume conduction effects, use a data-driven approach to select the regions of interest (ROIs) on the subject's cortex based on scalp electrical activity, and perform cortical current density estimation on 15 ROIs. The nonlinear dependence between the cortical waveforms of the ROIs is quantified based on mutual information, and a network of cortical mutual information connections is constructed in four frequency bands: delta, theta, alpha, and beta. The feature extraction and classification process are performed in each frequency band, and the mutual information connections statistically different between the innocent and guilty groups are first selected as features using statistical tests. Moreover, the optimal feature subset (OFS) is found by combining the SVM classifier and the wrapper feature selection strategy. Furthermore, the most important mutual information connections (MIMICs) per frequency band are obtained by refining the OFS according to the classification performance curve. The average test accuracies of MIMICs in the delta, theta, alpha, and beta bands reached 99.76%, 96.42%, 84.04%, and 97.61%, respectively. Finally, the physiological significance of each frequency sub-band and the physiological function of MIMICs are combined to explore the cognitive mechanism of lies and provide new evidence for cognitive activity in lying states.
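
For discretized (binned) waveforms, the mutual-information connectivity described here reduces to the plug-in histogram estimate. A minimal sketch, assuming the two signals have already been binned into discrete symbols; the toy sequences are illustrative, not EEG data.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in (histogram) estimate of MI in bits between two
    equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of x
    py = Counter(ys)            # marginal counts of y
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # log2( p(x,y) / (p(x) p(y)) ), written with counts
        mi += p_joint * math.log2(c * n / (px[x] * py[y]))
    return mi

a = [0, 1, 0, 1, 0, 1, 0, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
mi_self = mutual_information(a, a)   # MI of a signal with itself = its entropy
mi_cross = mutual_information(a, b)  # this pairing carries no shared information
```

In the study's setting, a statistic like `mi_cross` computed between each pair of ROI waveforms populates the connectivity matrix for one frequency band.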

EAAI Journal 2025 Journal Article

Denoising of the magnetic flux leakage signal using dynamic feature fusion and a multi-scale autoencoder network

  • Lushuai Xu
  • Shaohua Dong
  • Haotian Wei
  • Feng Li
  • Pengkun Zhang
  • Cong Zuo
  • Mingxing Guo
  • Penghui Liao

Magnetic flux leakage (MFL) signal denoising is essential for the nondestructive inspection of oil and gas pipelines, where complex noise interference can severely degrade defect quantification accuracy. Traditional approaches such as mean filtering and wavelet transform offer limited suppression of multi-type mixed noise and often distort critical features, including defect peaks and valleys. Even deep learning–based MFL denoising methods struggle in scenarios with substantial signal-noise overlap due to inadequate feature extraction and limited adaptability. This work presents an advanced denoising framework that combines dynamic feature fusion with a multi-scale autoencoder network. The framework jointly exploits time- and frequency-domain signal components, employing an adaptive weighting mechanism for dynamic feature fusion. Parallel convolutional branches extract multi-scale features, improving the capture of both global structures and fine-grained details, while a Squeeze-and-Excitation (SE) channel attention mechanism enhances defect-sensitive features and suppresses noise. Extensive experiments demonstrate that the proposed model outperforms mean filtering, wavelet denoising, and a baseline autoencoder, achieving notable gains in signal-to-noise ratio (SNR), mean squared error (MSE), and signal similarity. Beyond superior noise suppression, the method preserves critical defect characteristics, providing a robust and reliable foundation for precise defect quantification in pipeline MFL inspection.

ECAI Conference 2025 Conference Paper

Dual-Axis Domain Alignment for Blur-Robust Object Detection

  • Hanjin Yang
  • Feng Li
  • Liwen Shi
  • Shupei Yuan

Object detectors are typically trained on ideally clear images without degradation, but in real-world scenarios they are inevitably affected by motion blur caused by relative movement between the camera and the object, leading to a domain shift and a significant decline in detection performance. Different from existing methods, we define object detection under image blur as a Cross-Domain Object Detection (CDOD) task in order to exploit a large number of unlabeled images. In this work, we reveal that reducing the intra-domain and inter-domain discrepancy is crucial for improving the quality of pseudo-labels, and propose a novel Dual-Axis Mean Teacher approach to improve blur-robust object detection. In particular, we first introduce an auxiliary domain composed of synthetic data to reduce the cost of bridging the gap between domains, as well as a dual-branch discriminator that reduces both intra-domain and inter-domain discrepancy simultaneously. Secondly, we apply supervised contrastive learning between instance-level features from the source domain and the auxiliary domain, adaptively adjusting the intensity of contrastive learning according to blur severity, thereby improving the quality of pseudo-labels generated by the teacher model. Furthermore, to address the lack of benchmark datasets for blur-robust object detection, we construct three blurred object detection datasets based on existing deblurring datasets, named Gopro-6C, RealBlur-6C, and REDS-7C. Results on the three datasets demonstrate the consistent superiority of our method, which outperforms existing state-of-the-art results.

TMLR Journal 2025 Journal Article

LLaVA-OneVision: Easy Visual Task Transfer

  • Bo Li
  • Yuanhan Zhang
  • Dong Guo
  • Renrui Zhang
  • Feng Li
  • Hao Zhang
  • Kaichen Zhang
  • Peiyuan Zhang

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.

AAAI Conference 2025 Conference Paper

MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

  • Congchi Yin
  • Feng Li
  • Shu Zhang
  • Zike Wang
  • Jun Shao
  • Piji Li
  • Jianhua Chen
  • Xun Jiang

The clinical diagnosis of most mental disorders primarily relies on conversations between psychiatrist and patient. The creation of such diagnostic conversation datasets is promising for boosting the AI mental healthcare community. However, directly collecting conversations in real diagnosis scenarios is nearly impossible due to stringent privacy and ethical considerations. To address this issue, we seek to synthesize diagnostic conversations by exploiting anonymized patient cases, which are easier to access. Specifically, we design a neuro-symbolic multi-agent framework for synthesizing diagnostic conversations of mental disorders with large language models. It takes a patient case as input and is capable of generating multiple diverse conversations from a single patient case. The framework involves the interaction between a doctor agent and a patient agent, and generates conversations under symbolic control via a dynamic diagnosis tree. By applying the proposed framework, we develop MDD-5k, the largest Chinese mental disorders diagnosis dataset. It is built upon 1000 real, anonymized patient cases in cooperation with Shanghai Mental Health Center and comprises 5000 high-quality long conversations with diagnosis results and treatment opinions as labels. To the best of our knowledge, it is also the first labeled dataset for Chinese mental disorders diagnosis. Human evaluation demonstrates that the proposed MDD-5k dataset successfully simulates the human-like diagnostic process of mental disorders.

ICLR Conference 2025 Conference Paper

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

  • Yifan Zhang 0004
  • Huanyu Zhang
  • Haochen Tian 0001
  • Chaoyou Fu
  • Shuangqing Zhang
  • Junfei Wu
  • Feng Li
  • Kun Wang 0056

Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to large performance variance; 2) reliance on model-based annotations results in restricted data quality; 3) insufficient task difficulty, especially caused by limited image resolution. To tackle these issues, we introduce MME-RealWorld. Specifically, we collect more than 300K images from public datasets and the Internet, filtering 13,366 high-quality images for annotation. This involves the efforts of 25 professional annotators and 7 experts in MLLMs, contributing to 29,429 question-answer pairs that cover 43 subtasks across 5 real-world scenarios, extremely challenging even for humans. As far as we know, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. We further conduct a thorough evaluation involving 29 prominent MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmarks, where none of them reaches 60% accuracy. The challenges of perceiving high-resolution images and understanding complex real-world scenarios remain urgent issues to be addressed. The data and evaluation code are released on our project page.

AAAI Conference 2025 Conference Paper

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

  • Shengeng Tang
  • Jiayi He
  • Dan Guo
  • Yanyan Wei
  • Feng Li
  • Richang Hong

Sign Language Production (SLP) aims to generate semantically consistent sign videos from textual statements, where the conversion from textual glosses to sign poses (G2P) is a crucial step. Existing G2P methods typically treat sign poses as discrete three-dimensional coordinates and directly fit them, which overlooks the relative positional relationships among joints. To this end, we provide a new perspective, constraining joint associations and gesture details by modeling the limb bones to improve the accuracy and naturalness of the generated poses. In this work, we propose a pioneering iconicity disentangled diffusion framework, termed Sign-IDD, specifically designed for SLP. Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints. The ID module disentangles the conventional 3D joint representation into a 4D bone representation, comprising the 3D spatial direction vector and 1D spatial distance vector between adjacent joints. Additionally, an Attribute Controllable Diffusion (ACD) module is introduced to further constrain joint associations, in which the attribute separation layer aims to separate the bone direction and length attributes, and the attribute control layer is designed to guide the pose generation by leveraging the above attributes. The ACD module utilizes the gloss embeddings as semantic conditions and finally generates sign poses from noise embeddings. Extensive experiments on PHOENIX14T and USTC-CSL datasets validate the effectiveness of our method.

EAAI Journal 2025 Journal Article

STDDAE: Identifying spatial domains in spatial transcriptomics by dual denoising autoencoder with attention mechanism

  • Yue Gao
  • Ying-Lian Gao
  • Cui-Na Jiao
  • Xu-Ran Dou
  • Feng Li
  • Jin-Xing Liu

Spatial transcriptomics provides a novel perspective for comprehending the intricate relationship between tissue structure and function, as well as for discovering new cell types and subtypes. However, it remains a significant challenge to accurately identify spatial domains with similar gene expression, which requires efficient combination of gene expression data, histology image information, and spatial location. To address this challenge, a novel dual denoising autoencoder with attention mechanism (STDDAE) is proposed. STDDAE integrates gene expression data, histology image information and spatial location, and the decoder consists of a master decoder and a follower decoder, which are jointly optimized to generate low-dimensional latent embeddings for precise spatial domain identification. The performance of STDDAE was evaluated across four datasets with varying resolutions and platforms. The experimental findings validated that STDDAE outperformed other cutting-edge methods in spatial domain identification, trajectory inference, and data denoising. Additionally, STDDAE successfully detected differentially expressed genes within identified spatial domains, which may be valuable in disease diagnosis, prognostic assessment, and treatment selection.

AAAI Conference 2025 Conference Paper

Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues

  • Mingshen Wang
  • Zhao Zhang
  • Feng Li
  • Ke Xu
  • Kang Miao
  • Meng Wang

Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Most current methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the tradeoff of SR accuracy and quantization efficiency. Apart from this, adapting the quantization level for each layer individually can disturb the original inter-layer relationships, thus diminishing the representation capability of quantized models. In this work, we propose Granular-DQ, which takes advantage of multi-granularity clues and local patch statistics, achieving a distinctive patch-wise and layer-invariant dynamic quantization paradigm. Specifically, Granular-DQ initiates by developing a granularity-bit controller to apprehend the coarse-to-fine granular representations of local patches, matching their proportional contribution to the entire image to determine the proper bit-width allocation. On this premise, we investigate the interrelationships between bit-width and information density within high-bit patches, establishing a soft gate that enables further fine-grained dynamic bit adaption. Extensive experiments validate the superiority of Granular-DQ in the trade-off between efficiency and accuracy over recent state-of-the-art methods on various SR models.
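
The patch-wise bit-allocation idea can be caricatured in a few lines: score each patch's information content (variance here, a crude stand-in for the paper's granularity-bit controller) and quantize it with a matching uniform bit-width. The thresholds and bit choices below are invented for illustration, not Granular-DQ's actual policy.

```python
import statistics

def allocate_bits(patch, thresholds=(0.05, 0.15), bits=(4, 6, 8)):
    """Busier patches (higher variance) get more bits.
    Thresholds and bit-widths are illustrative assumptions."""
    v = statistics.pvariance(patch)
    if v < thresholds[0]:
        return bits[0]
    if v < thresholds[1]:
        return bits[1]
    return bits[2]

def quantize(x, n_bits):
    """Uniform quantization of a value in [0, 1] to 2**n_bits - 1 levels."""
    levels = (1 << n_bits) - 1
    return round(x * levels) / levels

flat = [0.50, 0.51, 0.49, 0.50]  # near-constant patch -> few bits suffice
busy = [0.05, 0.95, 0.10, 0.90]  # high-contrast patch -> more bits
```

Real dynamic quantization applies this per layer-input tensor with learned controllers; the sketch only conveys the coarse-to-fine allocation intuition.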

JBHI Journal 2024 Journal Article

Diagnosis-Guided Deep Subspace Clustering Association Study for Pathogenetic Markers Identification of Alzheimer's Disease Based on Comparative Atlases

  • Cui-Na Jiao
  • Junliang Shang
  • Feng Li
  • Xinchun Cui
  • Yan-Li Wang
  • Ying-Lian Gao
  • Jin-Xing Liu

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods are difficult to identify potential pathogenetic markers by correlation analysis between brain network and genetic variation. To discover disease-related brain connectome from the specific brain structure and the fine-grained level, based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases, the functional brain network is first constructed for each subject. Specifically, the upper triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are developed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high-dimensionality, and few subjects, the deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics are unsupervised learning, neglecting the diagnostic information related to diseases. We presented a label constraint with diagnostic status to instruct the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments prove that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome at the coarse-grained and fine-grained levels.
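
The connectivity-feature step this abstract describes, taking the strictly upper triangle of a symmetric functional connectivity matrix as a feature vector plus a weighted node-degree summary, can be sketched directly. The 3-node matrix is a toy example, not AAL or Brainnetome data.

```python
def upper_triangle(mat):
    """Flatten the strictly upper triangle of a symmetric connectivity
    matrix into a feature vector of length n*(n-1)/2."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]

def avg_weighted_degree(mat):
    """Average weighted node degree; diagonal assumed zero."""
    n = len(mat)
    return sum(sum(row) for row in mat) / n

# Toy 3-region functional connectivity matrix (symmetric, zero diagonal).
fc = [[0.0, 0.8, 0.2],
      [0.8, 0.0, 0.5],
      [0.2, 0.5, 0.0]]
features = upper_triangle(fc)
degree = avg_weighted_degree(fc)
```

For brain-scale atlases the same extraction yields, e.g., 90*89/2 features for the 90-region AAL parcellation.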

JBHI Journal 2024 Journal Article

FSCME: A Feature Selection Method Combining Copula Correlation and Maximal Information Coefficient by Entropy Weights

  • Qi Zhong
  • Junliang Shang
  • Qianqian Ren
  • Feng Li
  • Cui-Na Jiao
  • Jin-Xing Liu

Feature selection is a critical component of data mining and has garnered significant attention in recent years. However, feature selection methods based on information entropy often introduce complex mutual information forms to measure features, leading to increased redundancy and potential errors. To address this issue, we propose FSCME, a feature selection method combining Copula correlation (Ccor) and the maximum information coefficient (MIC) by entropy weights. The FSCME takes into consideration the relevance between features and labels, as well as the redundancy among candidate features and selected features. Therefore, the FSCME utilizes Ccor to measure the redundancy between features, while also estimating the relevance between features and labels. Meanwhile, the FSCME employs MIC to enhance the credibility of the correlation between features and labels. Moreover, this study employs the Entropy Weight Method (EWM) to evaluate and assign weights to the Ccor and MIC. The experimental results demonstrate that FSCME yields a more effective feature subset for subsequent clustering processes, significantly improving the classification performance compared to the other six feature selection methods.
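
The Entropy Weight Method used to weight Ccor and MIC can be sketched generically: criteria whose scores are more dispersed across candidate features carry more information and receive larger weights. The score matrix below is an invented example, not FSCME's actual inputs.

```python
import math

def entropy_weights(matrix):
    """Entropy Weight Method. matrix: rows = candidate features,
    columns = criterion scores (non-negative). Returns one weight per
    column; more dispersed columns get larger weights."""
    n, m = len(matrix), len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(m)]
    k = 1.0 / math.log(n)  # normalizes entropy into [0, 1]
    entropies = []
    for j in range(m):
        e = 0.0
        for row in matrix:
            p = row[j] / col_sums[j]
            if p > 0:
                e -= k * p * math.log(p)
        entropies.append(e)
    divergence = [1.0 - e for e in entropies]  # information content
    total = sum(divergence)
    return [d / total for d in divergence]

# Column 0 is nearly uniform across features (low information);
# column 1 varies a lot, so it should receive the larger weight.
scores = [[0.50, 0.90],
          [0.51, 0.10],
          [0.49, 0.50]]
w = entropy_weights(scores)
```

In FSCME's setting the two columns would hold the Ccor and MIC scores of candidate features, and `w` gives their combination weights.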

NeurIPS Conference 2024 Conference Paper

Interfacing Foundation Models' Embeddings

  • Xueyan Zou
  • Linjie Li
  • Jianfeng Wang
  • Jianwei Yang
  • Mingyu Ding
  • Junyi Wei
  • Zhengyuan Yang
  • Feng Li

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in Fig. 1, a lightweight transformer interface, without tuning any foundation model weights, is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptive to new tasks and new models. In light of the interleaved embedding space, we introduce FIND-Bench, which adds new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval. We are the first work to align foundation models' embeddings for interleaved understanding. Meanwhile, our approach achieves state-of-the-art performance on FIND-Bench and competitive performance on standard retrieval and segmentation settings.

JBHI Journal 2024 Journal Article

KFDAE: CircRNA-Disease Associations Prediction Based on Kernel Fusion and Deep Auto-Encoder

  • Wen-Yue Kang
  • Ying-Lian Gao
  • Ying Wang
  • Feng Li
  • Jin-Xing Liu

CircRNA has been proved to play an important role in disease diagnosis and treatment. Considering that wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict the potential associations between circRNAs and diseases. Firstly, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and disease similarity kernels. Then the vectors are connected to form the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and obtain the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies prove the effectiveness and practical significance of KFDAE, which means KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab experiments.

JBHI Journal 2024 Journal Article

M³HOGAT: A Multi-View Multi-Modal Multi-Scale High-Order Graph Attention Network for Microbe-Disease Association Prediction

  • Shuang Wang
  • Jin-Xing Liu
  • Feng Li
  • Juan Wang
  • Ying-Lian Gao

Numerous scientific studies have found a link between diverse microorganisms in the human body and complex human diseases. Because traditional experimental approaches are time-consuming and expensive, using computational methods to identify microbes correlated with diseases is critical. In this paper, a new microbe-disease association prediction model is proposed that combines a multi-view multi-modal network and a multi-scale feature fusion mechanism, called M³HOGAT. Firstly, a microbe-disease association network and multiple similarity views are constructed based on multi-source information. Then, considering that neighbor information from different orders may be better suited to learning node representations, a higher-order graph attention network (HOGAT) is devised to aggregate neighbor information from disparate orders and extract microbe and disease features from different networks and views. Given that the embedding features of microbes and diseases from different views possess varying importance, a multi-scale feature fusion mechanism is employed to learn their interaction information, thereby generating the final features of microbes and diseases. Finally, an inner product decoder is used to reconstruct the microbe-disease association matrix. Compared with five state-of-the-art methods on the HMDAD and Disbiome datasets, the results of 5-fold cross-validations show that M³HOGAT achieves the best performance. Furthermore, case studies on asthma and obesity confirm the effectiveness of M³HOGAT in identifying potential disease-related microbes.

JBHI Journal 2024 Journal Article

SGFCCDA: Scale Graph Convolutional Networks and Feature Convolution for circRNA-Disease Association Prediction

  • Junliang Shang
  • Linqian Zhao
  • Xin He
  • Xianghan Meng
  • Limin Zhang
  • Daohui Ge
  • Feng Li
  • Jin-Xing Liu

Circular RNAs (circRNAs) have emerged as a novel class of non-coding RNAs with regulatory roles in disease pathogenesis. Computational models aimed at predicting circRNA-disease associations offer valuable insights into disease mechanisms, thereby enabling the development of innovative diagnostic and therapeutic approaches while reducing the reliance on costly wet experiments. In this study, SGFCCDA is proposed for predicting potential circRNA-disease associations based on scale graph convolutional networks and feature convolution. Specifically, SGFCCDA integrates multiple measures of circRNA and disease similarity and combines known association information to construct a heterogeneous network. This network is then explored by scale graph convolutional networks to capture both topological and attribute information. Additionally, convolutional neural networks are employed to further learn the features and obtain higher-order feature representations containing richer information about nodes. The Hadamard product is utilized to effectively combine circRNA features with disease features, and a multilayer perceptron is applied to predict the association between each pair of circRNA and disease. Five-fold cross validation experiments conducted on the CircR2Disease dataset demonstrate the accurate prediction capabilities of SGFCCDA in identifying potential circRNA-disease associations. Furthermore, case studies provide further confirmation of SGFCCDA's ability to identify disease-associated circRNAs.
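The final scoring step the abstract describes (Hadamard product of circRNA and disease features, then a multilayer perceptron) can be illustrated with a toy sketch. The embeddings and weights below are random stand-ins, and a single sigmoid layer substitutes for the full MLP.

```python
import math
import random

def hadamard(u, v):
    """Element-wise product of two feature vectors."""
    return [a * b for a, b in zip(u, v)]

def perceptron_score(x, w, b=0.0):
    """One sigmoid layer standing in for the paper's multilayer perceptron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
circ_feat = [random.random() for _ in range(8)]   # learned circRNA embedding (stand-in)
dis_feat = [random.random() for _ in range(8)]    # learned disease embedding (stand-in)
weights = [random.uniform(-1.0, 1.0) for _ in range(8)]
score = perceptron_score(hadamard(circ_feat, dis_feat), weights)
```

Each circRNA-disease pair thus yields a score in (0, 1) that can be thresholded or ranked as an association prediction.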

NeurIPS Conference 2024 Conference Paper

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

  • Hongyang Li
  • Hao Zhang
  • Shilong Liu
  • Zhaoyang Zeng
  • Feng Li
  • Tianhe Ren
  • Bohan Li
  • Lei Zhang

In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume, which contaminates the point query's content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize it. For each query, this operation uses key-aware attention weights to combine its corresponding deformable sampling positions and predict a new query position. This design is based on the observation that local attention is essentially the same as cost-volume, both of which are computed by the dot product between a query and its surrounding features. By introducing this new operation, TAPTRv2 not only removes the extra burden of cost-volume computation, but also leads to a substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves state-of-the-art performance on many challenging datasets, demonstrating the effectiveness of our approach.
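A toy rendering of the attention-based position update: softmax the attention weights over the sampling points, then move the query to their weighted mean. All names and values here are illustrative assumptions, not the paper's implementation.

```python
import math

def attention_position_update(query_pos, sample_offsets, attn_logits):
    """New query position = attention-weighted mean of the deformable
    sampling positions (a toy sketch of the APU idea)."""
    exps = [math.exp(a) for a in attn_logits]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over sampling points
    points = [(query_pos[0] + dx, query_pos[1] + dy) for dx, dy in sample_offsets]
    return (sum(w * x for w, (x, _) in zip(weights, points)),
            sum(w * y for w, (_, y) in zip(weights, points)))

# With uniform attention the update is simply the mean of the sampling points.
new_pos = attention_position_update((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0)], [0.0, 0.0])
```

Skewing the logits toward one sampling point pulls the predicted position toward it, which is the mechanism the abstract relates to cost-volume-style matching.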

AAAI Conference 2023 Conference Paper

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

  • Shilong Liu
  • Shijia Huang
  • Feng Li
  • Hao Zhang
  • Yaoyuan Liang
  • Hang Su
  • Jun Zhu
  • Lei Zhang

In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from text and locate objects in the image simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction. Each pair of dual queries is designed to have shared positional parts but different content parts. Such a design effectively alleviates the difficulty of modality alignment between image and text (in contrast to a single query design) and empowers the Transformer decoder to leverage phrase mask-guided attention to improve performance. To evaluate the performance of PEG, we also propose a new metric, CMAP (cross-modal average precision), analogous to the AP metric in object detection. The new metric overcomes the ambiguity of Recall@1 in many-box-to-one-phrase cases in phrase grounding. As a result, our PEG pre-trained DQ-DETR establishes new state-of-the-art results on all visual grounding benchmarks with a ResNet-101 backbone. For example, it achieves 91.04% and 83.51% recall on RefCOCO testA and testB, respectively.

JBHI Journal 2023 Journal Article

GCCN: Graph Capsule Convolutional Network for Progressive Mild Cognitive Impairment Prediction and Pathogenesis Identification Based on Imaging Genetic Data

  • Junliang Shang
  • Qi Zou
  • Qianqian Ren
  • Boxin Guan
  • Feng Li
  • Jin-Xing Liu
  • Yan Sun

In this study, we proposed a novel method called the graph capsule convolutional network (GCCN) to predict the progression from mild cognitive impairment to dementia and identify its pathogenesis. First, we proposed a novel risk gene discovery component to indirectly target genes with higher interactions with others. These risk genes and brain regions were collected as nodes to construct heterogeneous pathogenic information association graphs. Second, the graph capsules were established by projecting heterogeneous pathogenic information into a set of disentangled latent components. The orientation and length of capsules are representations of the format and intensity of pathogenic information. Third, the graph capsule convolutional network was used to model the information flows among pathogenic factors and elaborate the convergence of primary capsules into advanced capsules. The advanced capsule is a concept that organizes pathogenic information based on its consistency, and the synergistic effects of advanced capsules directed the development of the disease. Finally, discriminative pathogenic information flows were captured by a straightforward built-in interpretation mechanism, i.e., the dynamic routing mechanism, and applied to the identification of pathogenesis. GCCN has been experimentally shown to outperform existing methods on public datasets. Further experiments have shown that the pathogenic factors identified by GCCN are evidential and closely related to progressive mild cognitive impairment.

JBHI Journal 2023 Journal Article

MSGCA: Drug-Disease Associations Prediction Based on Multi-Similarities Graph Convolutional Autoencoder

  • Ying Wang
  • Ying-Lian Gao
  • Juan Wang
  • Feng Li
  • Jin-Xing Liu

Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDAs prediction. Most current DDAs prediction methods utilize the original DDAs matrix directly. However, the original DDAs matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDAs prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form new drug and disease similarities, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDAs matrix is reconstructed by weighted K-nearest-neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, a graph convolutional autoencoder with an attention mechanism is utilized to predict DDAs. Compared with extant methods, MSGCA shows superior results on three datasets. Furthermore, case studies further demonstrate the reliability of MSGCA.
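The weighted K-nearest-neighbor reconstruction step can be sketched as below. This is an interpretation of the general weighted-KNN-profile idea under assumed inputs (neighbor rows, a similarity vector, and a decay factor), not the paper's exact procedure.

```python
def wknn_profile(neighbor_rows, similarities, k, decay=0.8):
    """Fill one sparse association row with a decay-weighted average of the
    association rows of its k most similar neighbors."""
    order = sorted(range(len(similarities)), key=lambda i: -similarities[i])[:k]
    weights = [(decay ** rank) * similarities[i] for rank, i in enumerate(order)]
    norm = sum(similarities[i] for i in order) or 1.0
    n_cols = len(neighbor_rows[0])
    return [sum(w * neighbor_rows[i][j] for w, i in zip(weights, order)) / norm
            for j in range(n_cols)]

# Two neighbors, one very similar and one barely similar:
profile = wknn_profile([[1, 0], [0, 1]], [0.9, 0.1], k=2)
```

The reconstructed profile stays close to the most similar neighbor's row, which is how such schemes densify a sparse association matrix before graph learning.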

JBHI Journal 2023 Journal Article

NLRRC: A Novel Clustering Method of Jointing Non-Negative LRR and Random Walk Graph Regularized NMF for Single-Cell Type Identification

  • Juan Wang
  • Lin-Ping Wang
  • Sha-Sha Yuan
  • Feng Li
  • Jin-Xing Liu
  • Jun-Liang Shang

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for us to study disease mechanisms at the single-cell level. Cell clustering reveals the natural grouping of cells, which is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC joins non-negative low-rank representation (LRR) and random walk graph regularized NMF (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest-rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster assignment matrix. The cluster assignment matrix contains cluster labels, which can be used directly to obtain the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The results of the experiments illustrate that NLRRC has a significant advantage in single-cell type identification.

AIIM Journal 2023 Journal Article

Radiology report generation with medical knowledge and multilevel image-report alignment: A new method and its verification

  • Guosheng Zhao
  • Zijian Zhao
  • Wuxian Gong
  • Feng Li

Medical report generation is an integral part of computer-aided diagnosis aimed at reducing the workload of radiologists and physicians and alerting them of misdiagnosis risks. In general, medical report generation is an image captioning task. Since medical reports have long sequences with data bias, the existing medical report generation models lack medical knowledge and ignore the interaction alignment between the two modalities of reports and images. The current paper attempts to mitigate these deficiencies by proposing an approach based on knowledge enhancement with multilevel alignment (MKMIA). To this end, it includes a knowledge enhancement (MKE) module and a multilevel alignment module (MIRA). Specifically, the MKE deals with general medical knowledge (MK) and historical knowledge (HK) obtained via data training. The general knowledge is embedded in the form of a dictionary with characteristic organs (referred to as Key) and organ aliases, disease symptoms, etc. (referred to as Value). It provides explicit exception candidates to mitigate data bias. Historical knowledge ensures the comparison of similar cases to provide a better diagnosis. MIRA furnishes coarse-to-fine multilevel alignment, reducing the gap between image and text features, improving the knowledge enhancement module's performance, and facilitating the generation of lengthy reports. Experimental results on two radiology report datasets (i.e., IU X-ray and MIMIC-CXR) proved the effectiveness of the proposed approach, achieving state-of-the-art performance.

NeurIPS Conference 2023 Conference Paper

Segment Everything Everywhere All at Once

  • Xueyan Zou
  • Jianwei Yang
  • Hao Zhang
  • Feng Li
  • Linjie Li
  • Jianfeng Wang
  • Lijuan Wang
  • Jianfeng Gao

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image. In SEEM, we propose a novel and versatile decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles, and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of the two prompt types required for various segmentation tasks, as shown in Fig. 1; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain segmentation history through mask-guided cross-attention from the decoder to image features; iv) Semantic awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation. We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. The results demonstrate that SEEM exhibits robust generalization to unseen user intents as it learns to compose prompts of different types in a unified representation space. Our approach achieves competitive performance on interactive segmentation, generic segmentation, referring segmentation, and video object segmentation on 9 datasets with a minimum of 1/100 supervision in a single set of weights.

YNICL Journal 2022 Journal Article

Altered frequency-specific/universal amplitude characteristics of spontaneous brain oscillations in patients with bipolar disorder

  • Zhi-Fang Zhang
  • Qi-Jing Bo
  • Feng Li
  • Lei Zhao
  • Peng Gao
  • Yun Wang
  • Rui Liu
  • Xiong-Ying Chen

The human brain is a dynamic system with intrinsic oscillations in spontaneous neural activity. Whether the dynamic characteristics of these spontaneous oscillations are differentially altered across different frequency bands in patients with bipolar disorder (BD) remains unclear. This study recruited 65 patients with BD and 85 healthy controls (HCs). The entire frequency range of resting-state fMRI data was decomposed into four frequency intervals. Two-way repeated-measures ANCOVA was employed to detect frequency-specific/universal alterations in the dynamic oscillation amplitude in BD. The patients were then divided into two subgroups according to their mood states to explore whether these alterations were independent of their mood states. Finally, other window sizes, step sizes, and window types were tested to replicate all analyses. Frequency-specific abnormality of the dynamic oscillation amplitude was detected within the posterior medial parietal cortex (centered at the precuneus extending to the posterior cingulate cortex). This specific profile indicates decreased amplitudes in the lower frequency bands (slow-5/4) and no amplitude changes in the higher frequency bands (slow-3/2) compared with HCs. Frequency-universal abnormalities of the dynamic oscillation amplitude were also detectable, indicating increased amplitudes in the thalamus and left cerebellum anterior lobe but decreased amplitudes in the medial superior frontal gyrus. These alterations were independent of the patients' mood states and replicable across multiple analytic and parametric settings. In short, frequency-specific/universal amplitude characteristics of spontaneous oscillations were observed in patients with BD. These abnormal characteristics have important implications for specific functional changes in BD from multiple frequency and dynamic perspectives.

NeurIPS Conference 2022 Conference Paper

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction

  • Bencheng Yan
  • Pengjie Wang
  • Kai Zhang
  • Feng Li
  • Hongbo Deng
  • Jian Xu
  • Bo Zheng

In many web applications, deep learning-based CTR prediction models (deep CTR models for short) are widely adopted. Traditional deep CTR models learn patterns in a static manner, i.e., the network parameters are the same across all instances. However, such a manner can hardly characterize each of the instances, which may have different underlying distributions. It actually limits the representation power of deep CTR models, leading to sub-optimal results. In this paper, we propose an efficient, effective, and universal module, named Adaptive Parameter Generation network (APG), which can dynamically generate parameters for deep CTR models on-the-fly based on different instances. Extensive experimental evaluation results show that APG can be applied to a variety of deep CTR models and significantly improve their performance. Meanwhile, APG can reduce the time cost by 38.7% and memory usage by 96.6% compared to a regular deep CTR model. We have deployed APG in the industrial sponsored search system and achieved 3% CTR gain and 1% RPM gain, respectively.
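The core idea, a small network that emits the parameters of a downstream layer conditioned on each instance, can be sketched as follows. The dimensions, seeding, and the single generated layer are assumptions for illustration, not the paper's architecture.

```python
import random

def linear(x, W, b):
    """Plain affine map: y = Wx + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

random.seed(1)
d_in, d_out = 4, 3
# A tiny "parameter generation network": it maps an instance embedding to the
# flattened weights and bias of the downstream layer, so each instance is
# processed with its own parameters rather than shared static ones.
G = [[random.uniform(-0.1, 0.1) for _ in range(d_in)]
     for _ in range(d_in * d_out + d_out)]
gb = [0.0] * (d_in * d_out + d_out)

def adaptive_layer(x):
    theta = linear(x, G, gb)                       # instance-conditioned parameters
    W = [theta[i * d_in:(i + 1) * d_in] for i in range(d_out)]
    b = theta[d_in * d_out:]
    return linear(x, W, b)

y = adaptive_layer([0.5, -0.2, 0.1, 0.9])
```

In practice such generators use low-rank decompositions to keep the parameter count manageable, which is where the reported time and memory savings would come from.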

NeurIPS Conference 2021 Conference Paper

Encoding Spatial Distribution of Convolutional Features for Texture Representation

  • Yong Xu
  • Feng Li
  • Zhile Chen
  • Jinxiu Liang
  • Yuhui Quan

Existing convolutional neural networks (CNNs) often use global average pooling (GAP) to aggregate feature maps into a single representation. However, GAP cannot well characterize complex distributive patterns of spatial features, while such patterns play an important role in texture-oriented applications, e.g., material recognition and ground terrain classification. In the context of texture representation, this paper addresses the issue by proposing Fractal Encoding (FE), a feature encoding module grounded in multi-fractal geometry. Considering a CNN feature map as a union of level sets of points lying in the 2D space, FE characterizes their spatial layout via a local-global hierarchical fractal analysis which examines the multi-scale power behavior on each level set. This enables a CNN to encode the regularity of the spatial arrangement of image features, leading to a robust yet discriminative spectrum descriptor. In addition, FE has trainable parameters for data adaptivity and can be easily incorporated into existing CNNs for end-to-end training. We applied FE to ResNet-based texture classification and retrieval, and demonstrated its effectiveness on several benchmark datasets.
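The limitation the abstract attributes to GAP is easy to see concretely: two feature maps with very different spatial layouts can pool to exactly the same value. A minimal illustration:

```python
def global_average_pool(feature_map):
    """GAP collapses an H x W map to a single value, discarding spatial layout."""
    flat = [v for row in feature_map for v in row]
    return sum(flat) / len(flat)

# Two maps with different spatial arrangements but identical GAP output:
clustered = [[1, 1], [0, 0]]   # activations grouped together
scattered = [[1, 0], [0, 1]]   # activations spread apart
same = global_average_pool(clustered) == global_average_pool(scattered)
```

Both maps average to 0.5, so any layout-sensitive texture cue is lost; FE's level-set analysis is designed to retain exactly that spatial information.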

AAAI Conference 2021 Conference Paper

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

  • Mengzuo Huang
  • Feng Li
  • Wuhe Zou
  • Weidong Zhang

Dialogue systems in the open domain have achieved great success due to easily obtained single-turn corpora and the development of deep learning, but the multi-turn scenario is still a challenge because of frequent coreference and information omission. In this paper, we investigate incomplete utterance restoration, which has brought general improvement to multi-turn dialogue systems in recent studies. Meanwhile, jointly inspired by autoregression for text generation and sequence labeling for text editing, we propose a novel semi autoregressive generator (SARG) with high efficiency and flexibility. Moreover, experiments on two benchmarks show that our proposed model significantly outperforms the state-of-the-art models in terms of quality and inference speed.

YNICL Journal 2020 Journal Article

Altered resting-state dynamic functional brain networks in major depressive disorder: Findings from the REST-meta-MDD consortium

  • Yicheng Long
  • Hengyi Cao
  • Chaogan Yan
  • Xiao Chen
  • Le Li
  • Francisco Xavier Castellanos
  • Tongjian Bai
  • Qijing Bo

BACKGROUND: Major depressive disorder (MDD) is known to be characterized by altered brain functional connectivity (FC) patterns. However, whether and how the features of dynamic FC would change in patients with MDD are unclear. In this study, we aimed to characterize dynamic FC in MDD using a large multi-site sample and a novel dynamic network-based approach. METHODS: Resting-state functional magnetic resonance imaging (fMRI) data were acquired from a total of 460 MDD patients and 473 healthy controls, as a part of the REST-meta-MDD consortium. Resting-state dynamic functional brain networks were constructed for each subject by a sliding-window approach. Multiple spatio-temporal features of dynamic brain networks, including temporal variability, temporal clustering and temporal efficiency, were then compared between patients and healthy subjects at both global and local levels. RESULTS: Corresponding local changes in MDD were mainly found in the default-mode, sensorimotor and subcortical areas. Measures of temporal variability and characteristic temporal path length were significantly correlated with depression severity in patients (corrected p < 0.05). Moreover, the observed between-group differences were robustly present in both first-episode, drug-naïve (FEDN) and non-FEDN patients. CONCLUSIONS: Our findings suggest that excessive temporal variations of brain FC, reflecting abnormal communications between large-scale brain networks over time, may underlie the neuropathology of MDD.
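The sliding-window construction such dynamic-network studies rely on can be sketched as follows; window width and step are arbitrary illustrative values, and each window would then yield one functional-connectivity network.

```python
def sliding_windows(series, width, step):
    """Split a time series into overlapping windows for dynamic-network analysis."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

# A toy 10-timepoint series cut into 4-point windows sliding by 2.
wins = sliding_windows(list(range(10)), width=4, step=2)
```

Computing one correlation matrix per window, then tracking how those matrices change across windows, is what yields measures such as temporal variability.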

YNICL Journal 2020 Journal Article

Biotypes of major depressive disorder: Neuroimaging evidence from resting-state default mode network patterns

  • Sugai Liang
  • Wei Deng
  • Xiaojing Li
  • Andrew J. Greenshaw
  • Qiang Wang
  • Mingli Li
  • Xiaohong Ma
  • Tong-Jian Bai

BACKGROUND: Major depressive disorder (MDD) is a heterogeneous disorder associated with aberrant functional connectivity within the default mode network (DMN). This study focused on data-driven identification and validation of potential DMN-pattern-based MDD subtypes to parse the heterogeneity of the disorder. METHODS: The sample comprised 1397 participants, including 690 patients with MDD and 707 healthy controls (HC), registered from multiple sites based on the REST-meta-MDD Project in China. Baseline resting-state functional magnetic resonance imaging (rs-fMRI) data were recorded for each participant. Discriminative features were selected from the DMN between patients and HC. Patient subgroups were defined by K-means and principal component analysis in the multi-site datasets and validated in an independent single-site dataset. Statistical significance of the resultant clustering was confirmed. Demographic and clinical variables were compared between identified patient subgroups. RESULTS: Two MDD subgroups with differing functional connectivity profiles of the DMN were identified in the multi-site datasets and were relatively stable across different validation samples. The predominant dysfunctional connectivity profiles were detected among the superior frontal cortex, ventral medial prefrontal cortex, posterior cingulate cortex and precuneus, with one subgroup exhibiting increased connectivity (hyperDMN MDD) and the other showing decreased connectivity (hypoDMN MDD). The hyperDMN subgroup in the discovery dataset had age-related severity of depressive symptoms. Patient subgroups had comparable demographic and clinical symptom variables. CONCLUSIONS: Findings suggest the existence of two neural subtypes of MDD associated with different dysfunctional DMN connectivity patterns, which may provide useful evidence for parsing the heterogeneity of depression and be valuable to inform the search for personalized treatment strategies.

IJCAI Conference 2020 Conference Paper

Deep Interleaved Network for Single Image Super-Resolution with Asymmetric Co-Attention

  • Feng Li
  • Runmin Cong
  • Huihui Bai
  • Yifan He

Recently, Convolutional Neural Networks (CNN) based image super-resolution (SR) methods have shown significant success in the literature. However, these methods are implemented as a single-path stream to enrich feature maps from the input for the final prediction, failing to fully incorporate former low-level features into later high-level features. In this paper, to tackle this problem, we propose a deep interleaved network (DIN) to learn how information at different states should be combined for image SR, where shallow information guides deep representative feature prediction. Our DIN follows a multi-branch pattern allowing multiple interconnected branches to interleave and fuse at different states. Besides, the asymmetric co-attention (AsyCA) is proposed and attached to the interleaved nodes to adaptively emphasize informative features from different states and improve the discriminative ability of networks. Extensive experiments demonstrate the superiority of our proposed DIN in comparison with the state-of-the-art SR methods.

IJCAI Conference 2019 Conference Paper

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling

  • Yang Yi
  • Feng Ni
  • Yuexin Ma
  • Xinge Zhu
  • Yuankai Qi
  • Riming Qiu
  • Shijie Zhao
  • Feng Li

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), that models multi-scale temporal responses by explicitly applying different temporal kernels. Then, we present a Global Refinement Block (GRB), an attention module for shaping the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features within tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves state-of-the-art performance with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and experiments on other tasks, like video understanding and video-based person re-identification, also demonstrate their efficiency and generalization capability.
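The multi-kernel idea, running 1D temporal convolutions with different kernel sizes in parallel, can be sketched in plain Python. The fixed averaging kernels below are illustrative stand-ins for learned ones.

```python
def conv1d(seq, kernel):
    """Valid 1D convolution (no padding) of a sequence with a kernel."""
    k = len(kernel)
    return [sum(kernel[j] * seq[i + j] for j in range(k))
            for i in range(len(seq) - k + 1)]

def multi_kernel_block(seq):
    """Apply temporal kernels of different sizes to capture multi-scale
    responses (a toy sketch of the MKTB idea; kernels are fixed, not learned)."""
    short = conv1d(seq, [0.5, 0.5])   # fine-grained temporal response
    long_ = conv1d(seq, [0.25] * 4)   # coarser temporal response
    return short, long_

fine, coarse = multi_kernel_block([1, 2, 3, 4, 5])
```

The two output streams would then be concatenated or fused, letting the network see both fast and slow temporal structure at the same layer.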

JBHI Journal 2015 Journal Article

A Robust Deep Model for Improved Classification of AD/MCI Patients

  • Feng Li
  • Loc Tran
  • Kim-Han Thung
  • Shuiwang Ji
  • Dinggang Shen
  • Jiang Li

Accurate classification of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), plays a critical role in possibly preventing progression of memory impairment and improving quality of life for AD patients. Among many research tasks, it is of particular interest to identify noninvasive imaging biomarkers for AD diagnosis. In this paper, we present a robust deep learning system to identify different progression stages of AD patients based on MRI and PET scans. We utilized the dropout technique to improve classical deep learning by preventing its weight coadaptation, which is a typical cause of overfitting in deep learning. In addition, we incorporated stability selection, an adaptive learning factor, and a multitask learning strategy into the deep learning framework. We applied the proposed method to the ADNI dataset, and conducted experiments for AD and MCI conversion diagnosis. Experimental results showed that the dropout technique is very effective in AD diagnosis, improving the classification accuracies by 5.9% on average as compared to classical deep learning methods.
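The dropout technique the paper builds on can be sketched in a few lines. The inverted-dropout variant shown here is a standard formulation, not necessarily the exact one used in the paper.

```python
import random

def dropout(x, p, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

random.seed(42)
train_out = dropout([1.0, 1.0, 1.0, 1.0], p=0.5)   # random mask applied
eval_out = dropout([1.0, 2.0], p=0.5, training=False)  # identity at test time
```

Randomly silencing units prevents co-adapted weights, which is the overfitting mechanism the abstract refers to.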

EAAI Journal 2006 Journal Article

Multi-view based face chin contour extraction

  • Xinliang Ge
  • Jie Yang
  • Zhonglong Zheng
  • Feng Li

Chin contour is an important facial feature to build a 3D morphable model, the core step of which is to establish feature points correspondence between each face in the training set and the reference face. In this paper, robust face detection is implemented firstly using probabilistic method. A probability of detection is obtained for each image of different position and at several scales and poses. Then, the chin contours are extracted accurately using the active shape model (ASM), which depends on the parameters obtained from the face detection. From frontal (0°) to profile (90°) faces that are equally divided into 10 parts, we train 10 flexible models. Then, different flexible models are used to extract the face chin contour according to the corresponding face pose. Experimental results show that the proposed approach can extract the chin contours of different people across different poses with good accuracy.