Arrow Research search

Author name cluster

Feng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

39 papers
2 author rows

Possible papers

39

JBHI Journal 2026 Journal Article

A Hierarchical Attention-Based Negative Sampling Method for Drug Repositioning Using Neighborhood Interaction Fusion

  • Chenglong Mi
  • Ling-Yun Dai
  • Junliang Shang
  • Rong Zhu
  • Juan Wang
  • Feng Li

Accurate prediction of drug–disease associations (DDAs) is essential for drug repositioning and the development of novel therapeutic strategies. However, existing methods often suffer from limited prior knowledge and the use of oversimplified negative sampling techniques, which hinder their ability to capture the complex relationships between drugs and diseases. To overcome these limitations, we propose a new model, Hierarchical Attention Mechanism-Based Negative Sampling (HA-NegS), which aims to enhance the prediction of potential DDAs. In this study, HA-NegS computes similarity information between drugs and diseases and constructs heterogeneous and homogeneous networks based on it. For the similarity network, HA-NegS fuses a Graph Convolutional Network (GCN) and a Graph Attention Network (GAT) to effectively capture the neighborhood features of the target nodes. Subsequently, the model incorporates a hierarchical sampling strategy that uses the PageRank algorithm to rank nodes in descending order of global importance. The attention mechanism is then used to calculate attention scores and re-rank the nodes accordingly. This approach ensures the reliability of negative sample selection. To obtain optimized representations, we use graph contrastive learning to refine drug and disease features with homogeneous and heterogeneous neighborhood information. Experimental results on a benchmark dataset show that HA-NegS outperforms existing baseline methods in predicting DDAs. In addition, case studies on Alzheimer’s disease and Parkinson’s disease highlight the effectiveness of HA-NegS in discovering new therapeutic applications for existing drugs.
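The PageRank-based ranking stage of such a hierarchical sampling strategy can be sketched in a few lines (an illustrative sketch, not the authors' implementation; the toy graph and parameter values are hypothetical):

```python
# Sketch (not the paper's code): rank graph nodes by global importance
# with PageRank power iteration, the first stage of a hierarchical
# negative-sampling strategy like the one HA-NegS describes.
def pagerank(adj, damping=0.85, iters=100, tol=1e-9):
    """adj: adjacency list {node: [neighbors]}. Returns {node: score}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Sum contributions from neighbors, each splitting its rank
            # evenly across its own links.
            incoming = sum(rank[u] / len(adj[u]) for u in adj if v in adj[u])
            new[v] = (1 - damping) / n + damping * incoming
        converged = max(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank

# Toy drug/disease graph (hypothetical); rank nodes in descending importance.
graph = {"d1": ["s1", "s2"], "d2": ["s1"], "s1": ["d1", "d2"], "s2": ["d1"]}
scores = pagerank(graph)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Negative candidates would then be re-ranked by attention scores within this ordering before sampling.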

AAAI Conference 2026 Short Paper

Can Large Language Models Grasp 3D Medical Anatomy Shapes? (Student Abstract)

  • Yao Gao
  • Feng Li
  • Jeroen Van Dessel
  • Yi Sun
  • Robin Willaert

What if the next generation of human-computer interaction is not a screen... but a conversation? Large Language Models (LLMs) offer a new paradigm for interacting with computers through text, but they lack shape reasoning capabilities. We introduce Textual Anatomy Encoding (TAE), a workflow that connects LLMs with 3D anatomies. TAE employs clinician-validated semantic annotations and rule-based prompts to achieve deterministic and interpretable landmark localization. The results indicate that TAE enables LLMs to move beyond textual knowledge, achieving an accurate understanding of anatomical localization. This framework opens opportunities for diagnosis, surgical planning, and scalable medical annotation, positioning LLMs as a foundation for next-generation human–computer interaction in healthcare.

JBHI Journal 2026 Journal Article

Epileptic Seizure Prediction Using Multi-Strategy Data Augmentation and Hierarchical Contrastive Learning

  • Longfei Qi
  • Feng Li
  • Junliang Shang
  • Daohui Ge
  • Shihan Wang
  • Shasha Yuan

Accurate early prediction of epileptic seizures is crucial for improving patients’ quality of life. However, existing seizure prediction methods often rely on large-scale labeled datasets and face challenges in generalization and real-time performance. To address these issues, this study proposes an efficient seizure prediction framework that achieves high performance even with limited labeled data, significantly reducing dependence on extensive annotations. To better distinguish preictal states, contrastive learning is employed to enhance feature separation between interictal and preictal periods, leading to improved sensitivity in detecting early seizure patterns. First, a data augmentation strategy is designed, incorporating wavelet-based frequency mixing, temporal masking, and window-based masking to enhance model robustness and generalization. Second, a hierarchical contrastive loss function is introduced, integrating instance-level and temporal contrastive learning to improve the model’s ability to capture preictal patterns. Finally, a lightweight SE-EEGNet is developed and optimized as a feature extractor, strengthening critical feature extraction and enabling real-time seizure prediction. On the CHB-MIT dataset, the proposed method achieves 94.51% accuracy, 95.05% sensitivity, a 0.024/h false positive rate (FPR), and a 20.12-minute prediction time using only 30% labeled data. On the Siena dataset, it achieves 93.14% accuracy, 92.77% sensitivity, and a 0.030/h FPR. Moreover, performance improves further as the amount of labeled data increases, validating the effectiveness and practical applicability of the proposed approach in seizure prediction.
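Of the augmentations listed, temporal masking is the simplest to illustrate (an illustrative sketch, not the paper's code; the mask fraction, seed, and signal are hypothetical choices):

```python
# Sketch (illustrative): temporal masking, one of the augmentations the
# framework applies to EEG windows. Mask fraction and placement are
# hypothetical, not the paper's exact settings.
import random

def temporal_mask(signal, mask_frac=0.2, seed=None):
    """Zero out one contiguous span covering mask_frac of the samples."""
    rng = random.Random(seed)
    n = len(signal)
    span = max(1, int(n * mask_frac))
    start = rng.randrange(0, n - span + 1)
    out = list(signal)
    out[start:start + span] = [0.0] * span
    return out

# One masked view of a toy 10-sample window; contrastive learning would
# pair such views with differently augmented views of the same window.
augmented = temporal_mask([1.0] * 10, mask_frac=0.2, seed=0)
```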

AAAI Conference 2025 Conference Paper

Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning

  • Man Liu
  • Huihui Bai
  • Feng Li
  • Chunjie Zhang
  • Yunchao Wei
  • Tat-Seng Chua
  • Yao Zhao

Zero-shot learning (ZSL) endeavors to transfer knowledge from seen categories to recognize unseen categories, mostly relying on the semantic-visual interactions between image and attribute tokens. Recently, prompt learning has emerged in ZSL and demonstrated significant potential, as it allows the zero-shot transfer of diverse visual concepts to downstream tasks. However, current methods explore a fixed adaptation of the learnable prompt on the seen domains, which makes them over-emphasize the primary visual features observed during training and limits their generalization to unseen domains. In this work, we propose AENet, which endows the visual prompt with semantic information to distill a semantic-enhanced prompt for visual representation enrichment, enabling effective knowledge transfer for ZSL. AENet comprises two key steps: 1) exploring the concept-harmonized tokens for the visual and attribute modalities, grounded on the modal-sharing token that represents consistent visual-semantic concepts; and 2) yielding the semantic-enhanced prompt via the visual residual refinement unit with attribute consistency supervision. It is further integrated with primary visual features to attend to semantic-related information for visual enhancement, thus strengthening transferable ability. Experimental results on three benchmarks show that our AENet outperforms existing state-of-the-art ZSL methods.

JBHI Journal 2025 Journal Article

Characterization of Cortical Connectivity in the Deception State With a Data-Driven Network Model Based on EEG Signal

  • Qianruo Kang
  • Yaqian Li
  • Xiang Li
  • Min Tian
  • Yin Xiang
  • Feng Li
  • Siyu Peng
  • Yijun Xiong

This study investigates the pattern of information interaction at the cortical level during deception, aiming to reveal the cognitive processes involved in the deception task. Our study involves the 64-channel EEG signals of 28 subjects (14 in the innocent group and 14 in the guilty group) acquired under the guilty knowledge test (GKT) lie-detection protocol. Additionally, we establish the functional connectivity network at the cortical level considering volume conduction effects, use a data-driven approach to select the regions of interest (ROIs) on the subject's cortex based on scalp electrical activity, and perform cortical current density estimation on 15 ROIs. The nonlinear dependence between the cortical waveforms of the ROIs is quantified based on mutual information, and a network of cortical mutual information connections is constructed in four frequency bands: delta, theta, alpha, and beta. Feature extraction and classification are performed in each frequency band, and the mutual information connections that are statistically different between the innocent and guilty groups are first selected as features using statistical tests. Moreover, the optimal feature subset (OFS) is found by combining the SVM classifier and the wrapper feature selection strategy. Furthermore, the most important mutual information connections (MIMICs) per frequency band are obtained by refining the OFS according to the classification performance curve. The average test accuracies of MIMICs in the delta, theta, alpha, and beta bands reached 99.76%, 96.42%, 84.04%, and 97.61%, respectively. Finally, the physiological significance of each frequency sub-band and the physiological function of MIMICs are combined to explore the cognitive mechanism of lies and provide new evidence for cognitive activity in lying states.
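A histogram-based mutual information estimate of the kind used to quantify nonlinear dependence between ROI waveforms can be sketched as follows (illustrative, stdlib-only; the bin count and signals are made up, and the study's actual estimator may differ):

```python
# Sketch (illustrative): mutual information between two discretized
# signals via joint and marginal histograms, the kind of nonlinear
# dependence measure used to link cortical ROI waveforms.
import math
from collections import Counter

def mutual_information(x, y, bins=4):
    """Estimate I(X;Y) in bits for equal-width binned samples."""
    def discretize(s):
        lo, hi = min(s), max(s)
        width = (hi - lo) / bins or 1.0  # guard against constant signals
        return [min(int((v - lo) / width), bins - 1) for v in s]
    xd, yd = discretize(x), discretize(y)
    n = len(xd)
    px, py = Counter(xd), Counter(yd)
    pxy = Counter(zip(xd, yd))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )

x = [0.1, 0.9, 0.2, 0.8, 0.15, 0.85]
mi = mutual_information(x, x)  # identical signals share their full binned entropy
```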

ECAI Conference 2025 Conference Paper

Dual-Axis Domain Alignment for Blur-Robust Object Detection

  • Hanjin Yang
  • Feng Li
  • Liwen Shi
  • Shupei Yuan

Object detection is often trained on ideally clear images without any degradation, but in real-world scenarios, it is inevitably affected by motion blur due to the relative movement between the camera and the object, leading to a domain shift effect and a significant decline in detection performance. Different from existing methods, we define object detection with image blur as a Cross-Domain Object Detection (CDOD) task to utilize a large number of unlabeled images. In this work, we reveal that reducing the intra-domain and inter-domain discrepancy is crucial for improving the quality of pseudo-labels, and propose a novel Dual-Axis Mean Teacher approach to improve the performance of blur-robust object detection. In particular, we first introduce an auxiliary domain composed of synthetic data to reduce the cost of bridging the gap between domains, as well as a dual-branch discriminator to reduce both intra-domain and inter-domain discrepancy simultaneously. Secondly, we apply supervised contrastive learning between instance-level features from the source domain and auxiliary domain, while adaptively adjusting the intensity of contrastive learning based on image blur severity, thereby improving the quality of pseudo-labels generated by the teacher model. Furthermore, to address the lack of benchmark datasets for blur-robust object detection, we propose three blurred object detection datasets based on existing deblurring datasets, named Gopro-6C, RealBlur-6C, and REDS-7C. The results on the three datasets demonstrate the consistent superiority of our method, which outperforms the existing state-of-the-art results.

TMLR Journal 2025 Journal Article

LLaVA-OneVision: Easy Visual Task Transfer

  • Bo Li
  • Yuanhan Zhang
  • Dong Guo
  • Renrui Zhang
  • Feng Li
  • Hao Zhang
  • Kaichen Zhang
  • Peiyuan Zhang

We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-image, multi-image, and video scenarios. Importantly, the design of LLaVA-OneVision allows strong transfer learning across different modalities/scenarios, yielding new emerging capabilities. In particular, strong video understanding and cross-scenario capabilities are demonstrated through task transfer from images to videos.

AAAI Conference 2025 Conference Paper

MDD-5k: A New Diagnostic Conversation Dataset for Mental Disorders Synthesized via Neuro-Symbolic LLM Agents

  • Congchi Yin
  • Feng Li
  • Shu Zhang
  • Zike Wang
  • Jun Shao
  • Piji Li
  • Jianhua Chen
  • Xun Jiang

The clinical diagnosis of most mental disorders primarily relies on conversations between psychiatrists and patients. The creation of such diagnostic conversation datasets is promising to boost the AI mental healthcare community. However, directly collecting conversations in real diagnosis scenarios is nearly impossible due to stringent privacy and ethical considerations. To address this issue, we seek to synthesize diagnostic conversations by exploiting anonymized patient cases that are easier to access. Specifically, we design a neuro-symbolic multi-agent framework for synthesizing the diagnostic conversation of mental disorders with large language models. It takes a patient case as input and is capable of generating multiple diverse conversations from one single patient case. The framework basically involves the interaction between a doctor agent and a patient agent, and generates conversations under symbolic control via a dynamic diagnosis tree. By applying the proposed framework, we develop the largest Chinese mental disorders diagnosis dataset, MDD-5k. This dataset is built upon 1000 real, anonymized patient cases in cooperation with Shanghai Mental Health Center and comprises 5000 high-quality long conversations with diagnosis results and treatment opinions as labels. To the best of our knowledge, it is also the first labeled dataset for Chinese mental disorders diagnosis. Human evaluation demonstrates that the proposed MDD-5k dataset successfully simulates the human-like diagnostic process of mental disorders.

ICLR Conference 2025 Conference Paper

MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?

  • Yifan Zhang 0004
  • Huanyu Zhang
  • Haochen Tian 0001
  • Chaoyou Fu
  • Shuangqing Zhang
  • Junfei Wu
  • Feng Li
  • Kun Wang 0056

Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult to measure the significant challenges that models face in the real world, including: 1) small data scale leads to a large performance variance; 2) reliance on model-based annotations results in restricted data quality; 3) insufficient task difficulty, especially caused by the limited image resolution. To tackle these issues, we introduce MME-RealWorld. Specifically, we collect more than 300K images from public datasets and the Internet, filtering 13,366 high-quality images for annotation. This involves the efforts of 25 professional annotators and 7 experts in MLLMs, contributing 29,429 question-answer pairs that cover 43 subtasks across 5 real-world scenarios, extremely challenging even for humans. As far as we know, MME-RealWorld is the largest manually annotated benchmark to date, featuring the highest resolution and a targeted focus on real-world applications. We further conduct a thorough evaluation involving 29 prominent MLLMs, such as GPT-4o, Gemini 1.5 Pro, and Claude 3.5 Sonnet. Our results show that even the most advanced models struggle with our benchmarks, where none of them reaches 60% accuracy. The challenges of perceiving high-resolution images and understanding complex real-world scenarios remain urgent issues to be addressed. The data and evaluation code are released in our Project Page.

AAAI Conference 2025 Conference Paper

Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production

  • Shengeng Tang
  • Jiayi He
  • Dan Guo
  • Yanyan Wei
  • Feng Li
  • Richang Hong

Sign Language Production (SLP) aims to generate semantically consistent sign videos from textual statements, where the conversion from textual glosses to sign poses (G2P) is a crucial step. Existing G2P methods typically treat sign poses as discrete three-dimensional coordinates and directly fit them, which overlooks the relative positional relationships among joints. To this end, we provide a new perspective, constraining joint associations and gesture details by modeling the limb bones to improve the accuracy and naturalness of the generated poses. In this work, we propose a pioneering iconicity disentangled diffusion framework, termed Sign-IDD, specifically designed for SLP. Sign-IDD incorporates a novel Iconicity Disentanglement (ID) module to bridge the gap between relative positions among joints. The ID module disentangles the conventional 3D joint representation into a 4D bone representation, comprising the 3D spatial direction vector and 1D spatial distance vector between adjacent joints. Additionally, an Attribute Controllable Diffusion (ACD) module is introduced to further constrain joint associations, in which the attribute separation layer aims to separate the bone direction and length attributes, and the attribute control layer is designed to guide the pose generation by leveraging the above attributes. The ACD module utilizes the gloss embeddings as semantic conditions and finally generates sign poses from noise embeddings. Extensive experiments on PHOENIX14T and USTC-CSL datasets validate the effectiveness of our method.

AAAI Conference 2025 Conference Paper

Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues

  • Mingshen Wang
  • Zhao Zhang
  • Feng Li
  • Ke Xu
  • Kang Miao
  • Meng Wang

Dynamic quantization has attracted rising attention in image super-resolution (SR) as it expands the potential of heavy SR models onto mobile devices while preserving competitive performance. Most current methods explore layer-to-bit configuration upon varying local regions, adaptively allocating the bit to each layer and patch. Despite the benefits, they still fall short in the tradeoff of SR accuracy and quantization efficiency. Apart from this, adapting the quantization level for each layer individually can disturb the original inter-layer relationships, thus diminishing the representation capability of quantized models. In this work, we propose Granular-DQ, which takes advantage of multi-granularity clues and local patch statistics, achieving a distinctive patch-wise and layer-invariant dynamic quantization paradigm. Specifically, Granular-DQ initiates by developing a granularity-bit controller to apprehend the coarse-to-fine granular representations of local patches, matching their proportional contribution to the entire image to determine the proper bit-width allocation. On this premise, we investigate the interrelationships between bit-width and information density within high-bit patches, establishing a soft gate that enables further fine-grained dynamic bit adaption. Extensive experiments validate the superiority of Granular-DQ in the trade-off between efficiency and accuracy over recent state-of-the-art methods on various SR models.

JBHI Journal 2024 Journal Article

Diagnosis-Guided Deep Subspace Clustering Association Study for Pathogenetic Markers Identification of Alzheimer's Disease Based on Comparative Atlases

  • Cui-Na Jiao
  • Junliang Shang
  • Feng Li
  • Xinchun Cui
  • Yan-Li Wang
  • Ying-Lian Gao
  • Jin-Xing Liu

The roles of brain region activities and genotypic functions in the pathogenesis of Alzheimer's disease (AD) remain unclear. Meanwhile, current imaging genetics methods struggle to identify potential pathogenetic markers through correlation analysis between brain networks and genetic variation. To discover the disease-related brain connectome at the level of specific brain structures and at a fine-grained level, the functional brain network is first constructed for each subject based on the Automated Anatomical Labeling (AAL) and human Brainnetome atlases. Specifically, the upper-triangle elements of the functional connectivity matrix are extracted as connectivity features. The clustering coefficient and the average weighted node degree are computed to assess the significance of every brain area. Since the constructed brain network and genetic data are characterized by non-linearity, high dimensionality, and few subjects, a deep subspace clustering algorithm is proposed to reconstruct the original data. Our multilayer neural network helps capture the non-linear manifolds, and subspace clustering learns pairwise affinities between samples. Moreover, most approaches in neuroimaging genetics are unsupervised, neglecting the diagnostic information related to diseases. We therefore introduce a label constraint with diagnostic status to guide the imaging genetics correlation analysis. To this end, a diagnosis-guided deep subspace clustering association (DDSCA) method is developed to discover the brain connectome and risk genetic factors by integrating genotypes with functional network phenotypes. Extensive experiments prove that DDSCA achieves superior performance to most association methods and effectively selects disease-relevant genetic markers and brain connectome at the coarse-grained and fine-grained levels.

JBHI Journal 2024 Journal Article

FSCME: A Feature Selection Method Combining Copula Correlation and Maximal Information Coefficient by Entropy Weights

  • Qi Zhong
  • Junliang Shang
  • Qianqian Ren
  • Feng Li
  • Cui-Na Jiao
  • Jin-Xing Liu

Feature selection is a critical component of data mining and has garnered significant attention in recent years. However, feature selection methods based on information entropy often introduce complex mutual information forms to measure features, leading to increased redundancy and potential errors. To address this issue, we propose FSCME, a feature selection method combining Copula correlation (Ccor) and the maximum information coefficient (MIC) by entropy weights. The FSCME takes into consideration the relevance between features and labels, as well as the redundancy among candidate features and selected features. Therefore, the FSCME utilizes Ccor to measure the redundancy between features, while also estimating the relevance between features and labels. Meanwhile, the FSCME employs MIC to enhance the credibility of the correlation between features and labels. Moreover, this study employs the Entropy Weight Method (EWM) to evaluate and assign weights to the Ccor and MIC. The experimental results demonstrate that FSCME yields a more effective feature subset for subsequent clustering processes, significantly improving the classification performance compared to the other six feature selection methods.
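The Entropy Weight Method step, which assigns higher weight to the more informative of the two criteria, can be sketched as follows (illustrative, stdlib-only; the score matrix is made up, not FSCME's data):

```python
# Sketch (illustrative): the Entropy Weight Method (EWM) combining two
# per-feature criteria, e.g. a Ccor column and a MIC column, into
# normalized weights. Criteria whose scores vary more across features
# (lower entropy) receive higher weight.
import math

def entropy_weights(matrix):
    """matrix[i][j]: non-negative score of feature i under criterion j.
    Returns one normalized weight per criterion."""
    n, m = len(matrix), len(matrix[0])
    weights = []
    for j in range(m):
        col = [matrix[i][j] for i in range(n)]
        total = sum(col)
        probs = [v / total for v in col]
        # Normalized Shannon entropy in [0, 1]; 1 means a uniform,
        # uninformative criterion.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        weights.append(1 - entropy)
    s = sum(weights)
    return [w / s for w in weights]

# Rows: candidate features; columns: [Ccor, MIC] (hypothetical numbers).
scores = [[0.9, 0.5], [0.1, 0.5], [0.5, 0.5]]
w = entropy_weights(scores)  # the uniform MIC column gets ~zero weight
```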

NeurIPS Conference 2024 Conference Paper

Interfacing Foundation Models' Embeddings

  • Xueyan Zou
  • Linjie Li
  • Jianfeng Wang
  • Jianwei Yang
  • Mingyu Ding
  • Junyi Wei
  • Zhengyuan Yang
  • Feng Li

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified image and dataset-level understanding spanning modality and granularity. As shown in Fig. 1, a lightweight transformer interface without tuning any foundation model weights is enough for segmentation, grounding, and retrieval in an interleaved manner. The proposed interface has the following favorable attributes: (1) Generalizable. It applies to various tasks spanning retrieval, segmentation, etc., under the same architecture and weights. (2) Interleavable. With the benefit of multi-task multi-modal training, the proposed interface creates an interleaved shared embedding space. (3) Extendable. The proposed interface is adaptive to new tasks and new models. In light of the interleaved embedding space, we introduce FIND-Bench, which adds new training and evaluation annotations to the COCO dataset for interleaved segmentation and retrieval. We are the first work to align foundation models' embeddings for interleaved understanding. Meanwhile, our approach achieves state-of-the-art performance on FIND-Bench and competitive performance on standard retrieval and segmentation settings.

JBHI Journal 2024 Journal Article

KFDAE: CircRNA-Disease Associations Prediction Based on Kernel Fusion and Deep Auto-Encoder

  • Wen-Yue Kang
  • Ying-Lian Gao
  • Ying Wang
  • Feng Li
  • Jin-Xing Liu

CircRNA has been proved to play an important role in the diagnosis and treatment of diseases. Considering that wet-lab experiments are time-consuming and expensive, computational methods have become a viable alternative in recent years. However, the number of circRNA-disease associations (CDAs) that can be verified is relatively small, and some methods do not take full advantage of dependencies between attributes. To solve these problems, this paper proposes a novel method based on Kernel Fusion and Deep Auto-encoder (KFDAE) to predict the potential associations between circRNAs and diseases. Firstly, KFDAE uses a non-linear method to fuse the circRNA similarity kernels and disease similarity kernels. Then the vectors are connected to form the positive and negative sample sets, and these data are sent to a deep auto-encoder to reduce dimensionality and extract features. Finally, a three-layer deep feedforward neural network is used to learn features and obtain the prediction score. The experimental results show that, compared with existing methods, KFDAE achieves the best performance. In addition, the results of case studies prove the effectiveness and practical significance of KFDAE, which means KFDAE is able to capture more comprehensive information and generate credible candidates for subsequent wet-lab validation.

JBHI Journal 2024 Journal Article

M3HOGAT: A Multi-View Multi-Modal Multi-Scale High-Order Graph Attention Network for Microbe-Disease Association Prediction

  • Shuang Wang
  • Jin-Xing Liu
  • Feng Li
  • Juan Wang
  • Ying-Lian Gao

Numerous scientific studies have found a link between diverse microorganisms in the human body and complex human diseases. Because traditional experimental approaches are time-consuming and expensive, using computational methods to identify microbes correlated with diseases is critical. In this paper, a new microbe-disease association prediction model is proposed that combines a multi-view multi-modal network and a multi-scale feature fusion mechanism, called M3HOGAT. Firstly, a microbe-disease association network and multiple similarity views are constructed based on multi-source information. Then, since neighbor information from different orders can be more informative for learning node representations, a higher-order graph attention network (HOGAT) is devised to aggregate neighbor information from different orders and extract microbe and disease features from different networks and views. Given that the embedding features of microbes and diseases from different views possess varying importance, a multi-scale feature fusion mechanism is employed to learn their interaction information, thereby generating the final features of microbes and diseases. Finally, an inner product decoder is used to reconstruct the microbe-disease association matrix. Compared with five state-of-the-art methods on the HMDAD and Disbiome datasets, the results of 5-fold cross-validation show that M3HOGAT achieves the best performance. Furthermore, case studies on asthma and obesity confirm the effectiveness of M3HOGAT in identifying potential disease-related microbes.

JBHI Journal 2024 Journal Article

SGFCCDA: Scale Graph Convolutional Networks and Feature Convolution for circRNA-Disease Association Prediction

  • Junliang Shang
  • Linqian Zhao
  • Xin He
  • Xianghan Meng
  • Limin Zhang
  • Daohui Ge
  • Feng Li
  • Jin-Xing Liu

Circular RNAs (circRNAs) have emerged as a novel class of non-coding RNAs with regulatory roles in disease pathogenesis. Computational models aimed at predicting circRNA-disease associations offer valuable insights into disease mechanisms, thereby enabling the development of innovative diagnostic and therapeutic approaches while reducing the reliance on costly wet experiments. In this study, SGFCCDA is proposed for predicting potential circRNA-disease associations based on scale graph convolutional networks and feature convolution. Specifically, SGFCCDA integrates multiple measures of circRNA and disease similarity and combines known association information to construct a heterogeneous network. This network is then explored by scale graph convolutional networks to capture both topological and attribute information. Additionally, convolutional neural networks are employed to further learn the features and obtain higher-order feature representations containing richer information about nodes. The Hadamard product is utilized to effectively combine circRNA features with disease features, and a multilayer perceptron is applied to predict the association between each pair of circRNA and disease. Five-fold cross validation experiments conducted on the CircR2Disease dataset demonstrate the accurate prediction capabilities of SGFCCDA in identifying potential circRNA-disease associations. Furthermore, case studies provide further confirmation of SGFCCDA's ability to identify disease-associated circRNAs.

NeurIPS Conference 2024 Conference Paper

TAPTRv2: Attention-based Position Update Improves Tracking Any Point

  • Hongyang Li
  • Hao Zhang
  • Shilong Liu
  • Zhaoyang Zeng
  • Feng Li
  • Tianhe Ren
  • Bohan Li
  • Lei Zhang

In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cost-volume, which contaminates the point query’s content feature and negatively impacts both visibility prediction and cost-volume computation. In TAPTRv2, we propose a novel attention-based position update (APU) operation and use key-aware deformable attention to realize it. For each query, this operation uses key-aware attention weights to combine its corresponding deformable sampling positions to predict a new query position. This design is based on the observation that local attention is essentially the same as cost-volume, both of which are computed by dot products between a query and its surrounding features. By introducing this new operation, TAPTRv2 not only removes the extra burden of cost-volume computation, but also leads to a substantial performance improvement. TAPTRv2 surpasses TAPTR and achieves state-of-the-art performance on many challenging datasets, demonstrating the effectiveness of our approach.

AAAI Conference 2023 Conference Paper

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

  • Shilong Liu
  • Shijia Huang
  • Feng Li
  • Hao Zhang
  • Yaoyuan Liang
  • Hang Su
  • Jun Zhu
  • Lei Zhang

In this paper, we study the problem of visual grounding by considering both phrase extraction and grounding (PEG). In contrast to the previous phrase-known-at-test setting, PEG requires a model to extract phrases from text and locate objects from image simultaneously, which is a more practical setting in real applications. As phrase extraction can be regarded as a 1D text segmentation problem, we formulate PEG as a dual detection problem and propose a novel DQ-DETR model, which introduces dual queries to probe different features from image and text for object prediction and phrase mask prediction. Each pair of dual queries is designed to have shared positional parts but different content parts. Such a design effectively alleviates the difficulty of modality alignment between image and text (in contrast to a single query design) and empowers the Transformer decoder to leverage phrase mask-guided attention to improve performance. To evaluate the performance of PEG, we also propose a new metric, CMAP (cross-modal average precision), analogous to the AP metric in object detection. The new metric overcomes the ambiguity of Recall@1 in many-box-to-one-phrase cases in phrase grounding. As a result, our PEG pre-trained DQ-DETR establishes new state-of-the-art results on all visual grounding benchmarks with a ResNet-101 backbone. For example, it achieves 91.04% and 83.51% in terms of recall rate on RefCOCO testA and testB with a ResNet-101 backbone.

JBHI Journal 2023 Journal Article

GCCN: Graph Capsule Convolutional Network for Progressive Mild Cognitive Impairment Prediction and Pathogenesis Identification Based on Imaging Genetic Data

  • Junliang Shang
  • Qi Zou
  • Qianqian Ren
  • Boxin Guan
  • Feng Li
  • Jin-Xing Liu
  • Yan Sun

In this study, we proposed a novel method called the graph capsule convolutional network (GCCN) to predict the progression from mild cognitive impairment to dementia and identify its pathogenesis. First, we proposed a novel risk gene discovery component to indirectly target genes with higher interactions with others. These risk genes and brain regions were collected as nodes to construct heterogeneous pathogenic information association graphs. Second, the graph capsules were established by projecting heterogeneous pathogenic information into a set of disentangled latent components. The orientation and length of capsules are representations of the format and intensity of pathogenic information. Third, the graph capsule convolutional network was used to model the information flows among pathogenic factors and describe how primary capsules converge into advanced capsules. The advanced capsule is a concept that organizes pathogenic information based on its consistency, and the synergistic effects of advanced capsules directed the development of the disease. Finally, discriminative pathogenic information flows were captured by a straightforward built-in interpretation mechanism, i.e., the dynamic routing mechanism, and applied to the identification of pathogenesis. Experiments on public datasets show that GCCN significantly outperforms existing methods. Further experiments show that the pathogenic factors identified by GCCN are evidential and closely related to progressive mild cognitive impairment.

JBHI Journal 2023 Journal Article

MSGCA: Drug-Disease Associations Prediction Based on Multi-Similarities Graph Convolutional Autoencoder

  • Ying Wang
  • Ying-Lian Gao
  • Juan Wang
  • Feng Li
  • Jin-Xing Liu

Identifying drug-disease associations (DDAs) is critical to the development of drugs. Traditional methods to determine DDAs are expensive and inefficient. Therefore, it is imperative to develop more accurate and effective methods for DDAs prediction. Most current DDAs prediction methods utilize the original DDAs matrix directly. However, the original DDAs matrix is sparse, which greatly affects the prediction results. Hence, a prediction method based on multi-similarities graph convolutional autoencoder (MSGCA) is proposed for DDAs prediction. First, MSGCA integrates multiple drug similarities and disease similarities using the centered kernel alignment-based multiple kernel learning (CKA-MKL) algorithm to form new drug and disease similarities, respectively. Second, the new drug and disease similarities are improved by linear neighborhood, and the DDAs matrix is reconstructed by weighted K-nearest-neighbor profiles. Next, the reconstructed DDAs and the improved drug and disease similarities are integrated into a heterogeneous network. Finally, a graph convolutional autoencoder with an attention mechanism is utilized to predict DDAs. Compared with existing methods, MSGCA shows superior results on three datasets. Furthermore, case studies further demonstrate the reliability of MSGCA.
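Centered kernel alignment, the quantity underlying the CKA-MKL fusion step described above, can be sketched as follows. This is an illustrative implementation of the standard CKA formula, not the paper's code; the similarity matrices are random placeholders.

```python
import numpy as np

def centered_kernel_alignment(K1, K2):
    """CKA between two similarity (kernel) matrices: the Frobenius inner
    product of the centered kernels, normalized to [0, 1] for PSD inputs."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (np.linalg.norm(K1c) * np.linalg.norm(K2c))

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(10, 4)), rng.normal(size=(10, 6))
K1, K2 = X1 @ X1.T, X2 @ X2.T                    # two PSD similarity matrices
self_alignment = centered_kernel_alignment(K1, K1)
cross_alignment = centered_kernel_alignment(K1, K2)
```

A kernel is perfectly aligned with itself (CKA = 1); CKA-MKL weights each candidate kernel by how well it aligns with a target, then combines them into one fused similarity.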

JBHI Journal 2023 Journal Article

NLRRC: A Novel Clustering Method of Jointing Non-Negative LRR and Random Walk Graph Regularized NMF for Single-Cell Type Identification

  • Juan Wang
  • Lin-Ping Wang
  • Sha-Sha Yuan
  • Feng Li
  • Jin-Xing Liu
  • Jun-Liang Shang

The development of single-cell RNA sequencing (scRNA-seq) technology has opened up a new perspective for us to study disease mechanisms at the single-cell level. Cell clustering reveals the natural grouping of cells, which is a vital step in scRNA-seq data analysis. However, the high noise and dropout of single-cell data pose numerous challenges to cell clustering. In this study, we propose a novel matrix factorization method named NLRRC for single-cell type identification. NLRRC combines non-negative low-rank representation (LRR) and random walk graph regularized NMF (RWNMFC) to accurately reveal the natural grouping of cells. Specifically, we find the lowest-rank representation of single-cell samples by non-negative LRR to reduce the difficulty of analyzing high-dimensional samples and capture the global information of the samples. Meanwhile, by using random walk graph regularization (RWGR) and NMF, RWNMFC captures manifold structure and cluster information before generating a cluster assignment matrix. The cluster assignment matrix contains cluster labels, which can be used directly to obtain the clustering results. The performance of NLRRC is validated on simulated and real single-cell datasets. The results of the experiments illustrate that NLRRC has a significant advantage in single-cell type identification.
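The NMF half of the method can be illustrated with the classic Lee-Seung multiplicative updates. This sketch omits the random-walk graph regularizer and is only meant to show the factorization that underlies RWNMFC; the data here are synthetic.

```python
import numpy as np

def nmf(X, k, n_iter=300, seed=0):
    """Multiplicative-update NMF (Lee-Seung): X ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(size=(X.shape[0], k))
    H = rng.uniform(size=(k, X.shape[1]))
    eps = 1e-10                                  # guard against division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)     # update H with W fixed
        W *= (X @ H.T) / (W @ H @ H.T + eps)     # update W with H fixed
    return W, H

rng = np.random.default_rng(1)
X = rng.uniform(size=(20, 3)) @ rng.uniform(size=(3, 15))  # nonnegative rank-3 data
W, H = nmf(X, k=3)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

In a clustering setting like NLRRC's, the column-wise argmax of the assignment factor would give each cell's cluster label directly.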

NeurIPS Conference 2023 Conference Paper

Segment Everything Everywhere All at Once

  • Xueyan Zou
  • Jianwei Yang
  • Hao Zhang
  • Feng Li
  • Linjie Li
  • Jianfeng Wang
  • Lijuan Wang
  • Jianfeng Gao

In this work, we present SEEM, a promptable and interactive model for segmenting everything everywhere all at once in an image. In SEEM, we propose a novel and versatile decoding mechanism that enables diverse prompting for all types of segmentation tasks, aiming at a universal interface that behaves like large language models (LLMs). More specifically, SEEM is designed with four desiderata: i) Versatility. We introduce a new visual prompt to unify different spatial queries including points, boxes, scribbles, and masks, which can further generalize to a different referring image; ii) Compositionality. We learn a joint visual-semantic space between text and visual prompts, which facilitates the dynamic composition of the two prompt types required for various segmentation tasks, as shown in Fig. 1; iii) Interactivity. We further incorporate learnable memory prompts into the decoder to retain segmentation history through mask-guided cross-attention from the decoder to image features; iv) Semantic awareness. We use a text encoder to encode text queries and mask labels into the same semantic space for open-vocabulary segmentation. We conduct a comprehensive empirical study to validate the effectiveness of SEEM across diverse segmentation tasks. The results demonstrate that SEEM exhibits robust generalization to unseen user intents, as it learns to compose prompts of different types in a unified representation space. Our approach achieves competitive performance on interactive segmentation, generic segmentation, referring segmentation, and video object segmentation across 9 datasets, using as little as 1/100 of the supervision, with a single set of weights.

NeurIPS Conference 2022 Conference Paper

APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction

  • Bencheng Yan
  • Pengjie Wang
  • Kai Zhang
  • Feng Li
  • Hongbo Deng
  • Jian Xu
  • Bo Zheng

In many web applications, deep learning-based CTR prediction models (deep CTR models for short) are widely adopted. Traditional deep CTR models learn patterns in a static manner, i.e., the network parameters are the same across all instances. However, such a manner can hardly characterize each of the instances, which may have different underlying distributions. It actually limits the representation power of deep CTR models, leading to sub-optimal results. In this paper, we propose an efficient, effective, and universal module, named Adaptive Parameter Generation network (APG), which can dynamically generate parameters for deep CTR models on-the-fly based on different instances. Extensive experimental results show that APG can be applied to a variety of deep CTR models and significantly improve their performance. Meanwhile, APG can reduce the time cost by 38.7% and memory usage by 96.6% compared to a regular deep CTR model. We have deployed APG in an industrial sponsored search system and achieved a 3% CTR gain and a 1% RPM gain, respectively.
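The core idea of per-instance parameters can be sketched with a toy hypernetwork: a shared generator maps an instance embedding to the weights of one layer, so different instances see different effective parameters. All names and shapes here are hypothetical, and the real APG includes efficiency refinements this sketch omits.

```python
import numpy as np

def apg_layer(x, inst_embed, G, d_in, d_out):
    """Generate this instance's layer weights from its embedding via a
    shared generator G, then apply them (a toy hypernetwork)."""
    W = (G @ inst_embed).reshape(d_in, d_out)    # instance-specific weights
    return x @ W

rng = np.random.default_rng(0)
d_in, d_out, d_e = 4, 3, 5                       # illustrative sizes
G = rng.normal(size=(d_in * d_out, d_e))         # shared generator parameters
x = rng.normal(size=d_in)                        # one instance's input features
e1, e2 = rng.normal(size=d_e), rng.normal(size=d_e)  # two instance embeddings
y1 = apg_layer(x, e1, G, d_in, d_out)
y2 = apg_layer(x, e2, G, d_in, d_out)
```

The same input produces different outputs under different instance embeddings, which is exactly the adaptivity a static weight matrix cannot express.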

NeurIPS Conference 2021 Conference Paper

Encoding Spatial Distribution of Convolutional Features for Texture Representation

  • Yong Xu
  • Feng Li
  • Zhile Chen
  • Jinxiu Liang
  • Yuhui Quan

Existing convolutional neural networks (CNNs) often use global average pooling (GAP) to aggregate feature maps into a single representation. However, GAP cannot well characterize the complex distributive patterns of spatial features, while such patterns play an important role in texture-oriented applications, e.g., material recognition and ground terrain classification. In the context of texture representation, this paper addresses the issue by proposing Fractal Encoding (FE), a feature encoding module grounded in multi-fractal geometry. Considering a CNN feature map as a union of level sets of points lying in the 2D space, FE characterizes their spatial layout via a local-global hierarchical fractal analysis which examines the multi-scale power behavior on each level set. This enables a CNN to encode the regularity of the spatial arrangement of image features, leading to a robust yet discriminative spectrum descriptor. In addition, FE has trainable parameters for data adaptivity and can be easily incorporated into existing CNNs for end-to-end training. We applied FE to ResNet-based texture classification and retrieval, and demonstrated its effectiveness on several benchmark datasets.
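The multi-scale power behavior that FE measures can be illustrated with a textbook box-counting estimate of a level set's fractal dimension; this is a plain sketch of the underlying statistic, not the paper's local-global hierarchical encoding. GAP, which averages everything into one number, would discard exactly this spatial layout.

```python
import numpy as np

def box_counting_dimension(mask, scales=(1, 2, 4, 8)):
    """Estimate the fractal dimension of a binary level set by counting,
    at several box sizes, how many boxes contain at least one active point,
    then fitting the slope of log(count) vs log(1/scale)."""
    counts = []
    for s in scales:
        h, w = mask.shape[0] // s, mask.shape[1] // s
        boxes = mask[:h * s, :w * s].reshape(h, s, w, s).any(axis=(1, 3))
        counts.append(boxes.sum())
    slope, _ = np.polyfit(np.log(1.0 / np.array(scales)), np.log(counts), 1)
    return slope

full = np.ones((64, 64), dtype=bool)     # a filled 2D region
line = np.zeros((64, 64), dtype=bool)
line[32, :] = True                       # a 1D curve embedded in 2D
full_dim = box_counting_dimension(full)
line_dim = box_counting_dimension(line)
```

A filled region scales like a 2D set and a single row like a 1D set, so the two estimates recover dimensions of 2 and 1 respectively.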

AAAI Conference 2021 Conference Paper

SARG: A Novel Semi Autoregressive Generator for Multi-turn Incomplete Utterance Restoration

  • Mengzuo Huang
  • Feng Li
  • Wuhe Zou
  • Weidong Zhang

Dialogue systems in the open domain have achieved great success due to easily obtained single-turn corpora and the development of deep learning, but the multi-turn scenario remains a challenge because of frequent coreference and information omission. In this paper, we investigate incomplete utterance restoration, which has brought general improvements to multi-turn dialogue systems in recent studies. Jointly inspired by autoregression for text generation and sequence labeling for text editing, we propose a novel semi-autoregressive generator (SARG) with high efficiency and flexibility. Experiments on two benchmarks show that our proposed model significantly outperforms the state-of-the-art models in terms of quality and inference speed.

IJCAI Conference 2020 Conference Paper

Deep Interleaved Network for Single Image Super-Resolution with Asymmetric Co-Attention

  • Feng Li
  • Runmin Cong
  • Huihui Bai
  • Yifan He

Recently, Convolutional Neural Network (CNN) based image super-resolution (SR) methods have shown significant success in the literature. However, these methods are implemented as single-path streams that enrich feature maps from the input for the final prediction, and thus fail to fully incorporate earlier low-level features into later high-level features. In this paper, to tackle this problem, we propose a deep interleaved network (DIN) to learn how information at different states should be combined for image SR, where shallow information guides the prediction of deep representative features. Our DIN follows a multi-branch pattern, allowing multiple interconnected branches to interleave and fuse at different states. Besides, the asymmetric co-attention (AsyCA) is proposed and attached to the interleaved nodes to adaptively emphasize informative features from different states and improve the discriminative ability of the network. Extensive experiments demonstrate the superiority of our proposed DIN in comparison with state-of-the-art SR methods.

IJCAI Conference 2019 Conference Paper

High Performance Gesture Recognition via Effective and Efficient Temporal Modeling

  • Yang Yi
  • Feng Ni
  • Yuexin Ma
  • Xinge Zhu
  • Yuankai Qi
  • Riming Qiu
  • Shijie Zhao
  • Feng Li

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), which models multi-scale temporal responses by explicitly applying different temporal kernels. We then present a Global Refinement Block (GRB), an attention module that shapes the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features within a tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves state-of-the-art performance with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and experiments on other tasks, such as video understanding and video-based person re-identification, also demonstrate their efficiency and generalization ability.
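A multi-kernel temporal block can be sketched as parallel 1D convolutions with different kernel sizes whose outputs are stacked along a branch axis. The averaging kernels below are illustrative placeholders for the learned kernels the paper would train.

```python
import numpy as np

def multi_kernel_temporal_block(x, kernel_sizes=(3, 5, 7)):
    """Apply parallel 1D convolutions with different kernel sizes to one
    temporal signal and stack the multi-scale responses."""
    branches = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k                      # placeholder smoothing kernel
        # 'same' padding keeps the temporal length unchanged per branch
        branches.append(np.convolve(x, kernel, mode="same"))
    return np.stack(branches)                        # (n_branches, T)

t = np.arange(32, dtype=float)
out = multi_kernel_temporal_block(np.sin(t / 4.0))   # toy temporal signal
```

Each branch responds to a different temporal scale, which is the multi-scale behavior the MKTB exploits without the cost of 3D convolutions.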

JBHI Journal 2015 Journal Article

A Robust Deep Model for Improved Classification of AD/MCI Patients

  • Feng Li
  • Loc Tran
  • Kim-Han Thung
  • Shuiwang Ji
  • Dinggang Shen
  • Jiang Li

Accurate classification of Alzheimer's disease (AD) and its prodromal stage, mild cognitive impairment (MCI), plays a critical role in possibly preventing progression of memory impairment and improving quality of life for AD patients. Among many research tasks, it is of particular interest to identify noninvasive imaging biomarkers for AD diagnosis. In this paper, we present a robust deep learning system to identify different progression stages of AD patients based on MRI and PET scans. We utilized the dropout technique to improve classical deep learning by preventing the co-adaptation of weights, which is a typical cause of overfitting in deep learning. In addition, we incorporated stability selection, an adaptive learning factor, and a multitask learning strategy into the deep learning framework. We applied the proposed method to the ADNI dataset and conducted experiments for AD and MCI conversion diagnosis. Experimental results showed that the dropout technique is very effective in AD diagnosis, improving classification accuracies by 5.9% on average compared to classical deep learning methods.
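The dropout technique credited above can be sketched as inverted dropout: activations are randomly zeroed during training and the survivors rescaled, so the expected activation matches evaluation mode and weights are discouraged from co-adapting. This is the standard textbook formulation, not the paper's specific implementation.

```python
import numpy as np

def dropout(x, p=0.5, train=True, seed=0):
    """Inverted dropout: zero each activation with probability p during
    training and rescale survivors by 1/(1-p); identity at evaluation."""
    if not train or p == 0.0:
        return x
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) >= p          # keep each unit with prob. 1-p
    return x * mask / (1.0 - p)              # rescale to preserve the mean

x = np.ones(10000)
y = dropout(x, p=0.5)                        # training-mode activations
```

Roughly half the activations are zeroed while the overall mean stays near 1, which is why no rescaling is needed at test time.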