Arrow Research search

Author name cluster

Xiaoling Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
1 author row

Possible papers

18

JBHI Journal 2026 Journal Article

ACGM: Attribute-Centric Graph Modeling Network for Concurrent Missing Tabular Data Imputation and COVID-19 Prognosis

  • Zhuoru Wu
  • Wenting Chen
  • Xuechen Li
  • Filippo Ruffini
  • Shaonan Liu
  • Lorenzo Tronchin
  • Domenico Albano
  • Eliodoro Faiella

COVID-19 prognosis using clinical tabular data faces significant challenges due to missing values and class imbalance issues. Existing methods often overlook the complex high-order interrelationships among clinical attributes and struggle with training stability on imbalanced datasets. We propose ACGM, an attribute-centric graph modeling network that simultaneously addresses missing data imputation and COVID-19 prognosis. ACGM consists of three key modules: an attributes preprocessing module (APM) for coarse-grained imputation initialization, a graph-enhanced attributes imputation module (GEAIM) that models high-order inter-attribute relationships through graph structures, and a graph-enhanced disease prognosis module (GEDPM) that leverages these complex attribute interactions for final prediction. GEAIM and GEDPM employ a mean-teacher strategy with attributes graph matching to preserve high-order relationships, enhance training stability, and maintain the structural integrity of attribute interactions. Extensive experiments conducted on four public COVID-19 tabular datasets demonstrate the superiority of our ACGM over existing methods. Through comprehensive interpretability analysis, we identify that attributes such as LDH, Difficulty In Breathing, and SaO2 significantly impact COVID-19 prognosis, aligning well with clinical insights and radiologist assessments.
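
The mean-teacher strategy the abstract mentions is a general semi-supervised technique; as rough orientation only, a minimal sketch of the exponential-moving-average teacher update looks like the following (module and variable names are placeholders, not the authors' code):

```python
# Minimal mean-teacher EMA sketch (illustrative; not the ACGM implementation).
import copy
import torch

def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.99) -> None:
    """Exponential moving average: teacher <- decay * teacher + (1 - decay) * student."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)

# The teacher starts as a frozen copy of the student network.
student = torch.nn.Linear(32, 32)   # stand-in for an imputation/prognosis module
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
ema_update(teacher, student)
```

The slowly moving teacher provides stable targets for the student, which is the stability property the abstract attributes to this strategy.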

AAAI Conference 2026 Conference Paper

Frequency-Aligned Cross-Modal Learning with Top-K Wavelet Fusion and Dynamic Expert Routing for Enhanced Retinal Disease Diagnosis

  • Yuxin Lin
  • Haoran Li
  • Haoyu Cao
  • Yongting Hu
  • Qihao Xu
  • Chengliang Liu
  • Xiaoling Luo
  • Zhihao Wu

Multimodal fusion of color fundus photography (CFP) and optical coherence tomography (OCT) B-scan images has demonstrated superior diagnostic potential for retinal diseases compared to single-modality approaches. However, existing fusion paradigms, whether based on naive concatenation or attention mechanisms, treat cross-modal interactions indiscriminately, lacking adaptive modulation of modality-specific contributions under varying clinical scenarios. We propose an adaptive fusion framework that dynamically routes and refines multimodal signals to enhance disease recognition. The framework comprises two key components: 1) Dynamic Cross-Modal Expert Routing (CMER), which selectively activates convolutional neural network (CNN) experts from one modality based on contextual guidance from the other, ensuring only the most relevant feature extractors contribute to fusion; and 2) Top-K Expert-Guided Wavelet Fusion (TEWF), which performs discrete wavelet transform (DWT) to decompose selected features into low- and high-frequency subbands. Cross-modal attention is then applied specifically to high-frequency components, where lesion-specific microstructures reside, enabling frequency-aware fusion. Finally, inverse DWT (IDWT) reconstructs the fused representation, weighted by CMER-derived importance scores to amplify informative modality cues while suppressing redundancy. Experimental validation on two multimodal retinal datasets demonstrates that our method achieves state-of-the-art performance, outperforming existing fusion strategies by significant margins in disease classification accuracy and robustness.
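
The DWT-decompose / fuse-high-frequency / IDWT-reconstruct pipeline is concrete enough to illustrate. Below is a minimal sketch using PyWavelets; the energy-based gate is a stand-in assumption for the paper's cross-modal attention, and the choice to keep the CFP low-frequency band is likewise illustrative:

```python
# Sketch of the DWT -> high-frequency fusion -> IDWT idea (not the TEWF module).
import numpy as np
import pywt

def fuse_high_freq(cfp_feat: np.ndarray, oct_feat: np.ndarray, wavelet: str = "haar"):
    """Decompose two 2-D feature maps, mix only the high-frequency subbands,
    and reconstruct with the inverse transform."""
    cA1, (cH1, cV1, cD1) = pywt.dwt2(cfp_feat, wavelet)
    cA2, (cH2, cV2, cD2) = pywt.dwt2(oct_feat, wavelet)

    def gate(a, b):
        # Placeholder for cross-modal attention: weight by subband energy.
        w = (a ** 2).mean() / ((a ** 2).mean() + (b ** 2).mean() + 1e-8)
        return w * a + (1.0 - w) * b

    fused_high = (gate(cH1, cH2), gate(cV1, cV2), gate(cD1, cD2))
    # Keep one modality's low-frequency band; the paper instead weights the
    # reconstruction with CMER-derived importance scores.
    return pywt.idwt2((cA1, fused_high), wavelet)

fused = fuse_high_freq(np.random.rand(64, 64), np.random.rand(64, 64))
```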

JBHI Journal 2026 Journal Article

LUS-DET: Liver Ultrasound Open-Vocabulary Object Detection

  • Jiansong Zhang
  • Xiaoling Luo
  • Guorong Lyu
  • Yongjian Chen
  • Linlin Shen

In liver ultrasound, the acquisition of standard scanning planes serves as a prerequisite for reliable lesion assessment. In clinical practice, physicians make diagnostic decisions by jointly interpreting the spatial configuration of key anatomical structures within standard planes and local lesion features. However, existing studies commonly treat standard plane recognition and lesion detection as two separate tasks, lacking a unified modeling approach that reflects their semantic continuity and clinical interdependence. Inspired by the diagnostic workflow of liver ultrasound, we propose LUS-DET, an open-vocabulary object detection framework designed to semantically bridge liver ultrasound standard plane analysis (LUSP) and liver ultrasound disease diagnosis (LUDD) through text-guided modeling. Specifically, we curate a retrospective LUSP dataset and develop a region–text alignment mechanism linking 44,669 region–caption pairs across 12 anatomical categories to enable in-domain open-vocabulary pretraining. Building upon this alignment, we introduce object prompts to guide zero-shot lesion detection in an open-source LUDD task without using any lesion-specific annotations. Experimental results demonstrate that LUS-DET not only achieves competitive zero-shot performance, but also exhibits superior accuracy and robustness during end-to-end fine-tuning compared to conventional detection baselines. To the best of our knowledge, this is the first study to propose a clinically coherent modelling paradigm that unifies standard plane localisation and lesion analysis in liver ultrasound, providing a new direction for structure-aware and workflow-aligned AI systems in medical imaging.
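
Region–text alignment in open-vocabulary detection is typically scored with a temperature-scaled cosine similarity between region features and category prompt embeddings. A minimal sketch of that generic scoring step, assuming hypothetical shapes and a standard CLIP-style temperature (not taken from the paper):

```python
# Generic region-text scoring sketch (not the LUS-DET implementation).
import torch
import torch.nn.functional as F

def region_text_logits(region_feats: torch.Tensor,   # (num_regions, d)
                       text_embeds: torch.Tensor,    # (num_classes, d)
                       temperature: float = 0.07) -> torch.Tensor:
    regions = F.normalize(region_feats, dim=-1)
    texts = F.normalize(text_embeds, dim=-1)
    return regions @ texts.T / temperature   # (num_regions, num_classes)

logits = region_text_logits(torch.randn(10, 512), torch.randn(12, 512))
pred_class = logits.argmax(dim=-1)   # zero-shot label per region proposal
```

Swapping in new text prompts at inference is what lets such a detector recognize categories, such as lesions, that were never annotated with boxes.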

AAAI Conference 2026 Conference Paper

Vision-Language Models Guided Graph Concept Reasoning for Interpretable Diabetic Retinopathy Diagnosis

  • Qihao Xu
  • Xiaoling Luo
  • Yuxin Lin
  • Chengliang Liu
  • Yongting Hu
  • Jinkai Li
  • Xinheng Lyu
  • Yong Xu

Deep neural networks (DNNs) have significantly advanced diabetic retinopathy (DR) diagnosis, yet their black-box nature limits clinical acceptance due to a lack of interpretability. Concept bottleneck models (CBMs) offer a promising solution by enabling concept-level reasoning and test-time intervention, with recent DR studies modeling lesions as concepts and grades as outcomes. However, current methods often ignore relationships between lesion concepts across different DR grades and struggle when fine-grained lesion concepts are unavailable, limiting their interpretability and real-world applicability. To bridge these gaps, we propose VLM-GCR, a vision-language model guided graph concept reasoning framework for interpretable DR diagnosis. VLM-GCR emulates the diagnostic process of ophthalmologists by constructing a grading-aware lesion concept graph that explicitly models the interactions among lesions and their relationships to disease grades. In concept-free clinical scenarios, our method introduces a vision-language guided dynamic concept pseudo-labeling mechanism to mitigate the challenges of existing concept-based models in fine-grained lesion recognition. Additionally, we introduce a multi-level intervention method that supports error correction, enabling transparent and robust human-AI collaboration. Experiments on two public DR benchmarks show that VLM-GCR achieves strong performance in both lesion and grading tasks, while delivering clear and clinically meaningful reasoning steps.
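
For readers unfamiliar with concept bottleneck models, the core structure that enables test-time intervention is simple: the grade is predicted from the concepts alone, so a clinician can overwrite a mispredicted concept. A generic sketch with illustrative dimensions (not the VLM-GCR architecture):

```python
# Generic concept-bottleneck sketch; dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class ConceptBottleneck(nn.Module):
    """Predict lesion concepts first, then derive the grade from concepts only,
    which is what makes test-time concept intervention possible."""
    def __init__(self, feat_dim: int = 512, num_concepts: int = 8, num_grades: int = 5):
        super().__init__()
        self.concept_head = nn.Linear(feat_dim, num_concepts)   # lesion concepts
        self.grade_head = nn.Linear(num_concepts, num_grades)   # grade from concepts

    def forward(self, feats, intervened_concepts=None):
        concepts = torch.sigmoid(self.concept_head(feats))
        if intervened_concepts is not None:   # a clinician can correct concepts
            concepts = intervened_concepts
        return concepts, self.grade_head(concepts)

model = ConceptBottleneck()
concepts, grade_logits = model(torch.randn(1, 512))
```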

AAAI Conference 2026 Conference Paper

VPSentry: Semi-supervised Video Polyp Segmentation via Sentry-guided Long-term Prototype Fusion with Correlation Dynamic Propagation

  • Guilian Chen
  • Xiaoling Luo
  • Huisi Wu
  • Jing Qin

Automated polyp segmentation in colonoscopy videos is an essential computer-aided technology for early detection and removal of polyps. However, most existing video polyp segmentation methods are designed with pixel-level temporal learning mechanisms, at the cost of time-consuming frame-wise annotations. In this paper, we present VPSentry, a novel semi-supervised segmentation model with a sentry mechanism. Our model integrates a prototype memory to store the long-term spatiotemporal cues of colonoscopy videos. Moreover, we devise adaptive prototypes to capture and generalize critical representations from individual frames, enabling long-term temporal fusion across labeled and unlabeled frames. In addition, we propose a correlation dynamic propagation module that propagates information from prototypes to features while simultaneously extracting dynamic features to perceive variations in polyp details between adjacent frames. Since colonoscopy scenes may change among consecutive frames, we further employ a sentry mechanism to assess the inter-frame continuity. This mechanism guides the prototype memory updating and the correlation dynamic propagation, further facilitating robust temporal propagation and dynamic detail perception for semi-supervised learning of long-term colonoscopy video sequences. Extensive experiments on the large-scale SUN-SEG dataset demonstrate that our model achieves optimal segmentation performance with real-time inference efficiency.

JBHI Journal 2025 Journal Article

A Lesion-Fusion Neural Network for Multi-View Diabetic Retinopathy Grading

  • Xiaoling Luo
  • Qihao Xu
  • Zhihua Wang
  • Chao Huang
  • Chengliang Liu
  • Xiaopeng Jin
  • Jianguo Zhang

As the most common complication of diabetes, diabetic retinopathy (DR) is one of the main causes of irreversible blindness. Automatic DR grading plays a crucial role in early diagnosis and intervention, reducing the risk of vision loss in people with diabetes. In recent years, various deep-learning approaches for DR grading have been proposed. Most previous DR grading models are trained on datasets of single-field fundus images, but the entire retina cannot be fully visualized in a single field of view. Lesions in fundus images are also scattered in location and vary greatly in appearance. To address the limitations caused by incomplete fundus features and the difficulty of obtaining lesion information, this work introduces a novel multi-view DR grading framework, which solves the problem of incomplete fundus features by jointly learning fundus images from multiple fields of view. Furthermore, the proposed model combines multi-view inputs such as fundus images and lesion snapshots. It utilizes heterogeneous convolution blocks (HCB) and scalable self-attention classes (SSAC), which enhance the ability of the model to obtain lesion information. The experimental results show that our proposed method performs better than the benchmark methods on the large-scale dataset.

AAAI Conference 2025 Conference Paper

DAMPER: A Dual-Stage Medical Report Generation Framework with Coarse-Grained MeSH Alignment and Fine-Grained Hypergraph Matching

  • Xiaofei Huang
  • Wenting Chen
  • Jie Liu
  • Qisheng Lu
  • Xiaoling Luo
  • Linlin Shen

Medical report generation is crucial for clinical diagnosis and patient management, summarizing diagnoses and recommendations based on medical imaging. However, existing work often overlooks the clinical pipeline involved in report writing, where physicians typically conduct an initial quick review followed by a detailed examination. Moreover, current alignment methods may lead to misaligned relationships. To address these issues, we propose DAMPER, a dual-stage framework for medical report generation that mimics the clinical pipeline of report writing in two stages. The first stage, MeSH-Guided Coarse-Grained Alignment (MCG), aligns chest X-ray (CXR) image features with medical subject headings (MeSH) features to generate a rough keyphrase representation of the overall impression. The second stage, Hypergraph-Enhanced Fine-Grained Alignment (HFG), constructs hypergraphs for image patches and report annotations, modeling high-order relationships within each modality and performing hypergraph matching to capture semantic correlations between image regions and textual phrases. Finally, the coarse-grained visual features, generated MeSH representations, and visual hypergraph features are fed into a report decoder to produce the final medical report. Extensive experiments on public datasets demonstrate the effectiveness of DAMPER in generating comprehensive and accurate medical reports, outperforming state-of-the-art methods across various evaluation metrics.

AAAI Conference 2025 Conference Paper

Deep Hierarchies and Invariant Disease-Indicative Feature Learning for Computer Aided Diagnosis of Multiple Fundus Diseases

  • Yuxin Lin
  • Wei Wang
  • Xiaoling Luo
  • Zhihao Wu
  • Chengliang Liu
  • Jie Wen
  • Yong Xu

With the advancement of computer vision, numerous models have been proposed for the screening of fundus diseases. However, the recognition of multiple fundus diseases is often hampered by the simultaneous presence of multiple disease types and the confluence of lesion types in fundus images. This paper addresses these challenges by conceptualizing them as multi-level feature fusion and self-supervised disease-indicative feature learning problems. We decode fundus images at various levels of granularity to delineate scenarios wherein multiple diseases and lesions co-occur. To effectively integrate these features, we introduce a hierarchical vision transformer (HVT) that adeptly captures both inter-level and intra-level dependencies. A novel forward-attention module is proposed to enhance the integration of lower-level semantic information into higher semantic layers, thereby enriching the representation of complex features. Additionally, we introduce a novel self-supervised mask-consistent feature learner (MCFL). Unlike traditional masked autoencoders that reconstruct original images using encoder-decoder structures, MCFL utilizes a teacher-student framework to reconstruct mask-consistent feature maps. In this setup, exponential moving averaging is employed to derive classification-guided features, serving as labels for reconstruction rather than merely reconstructing the original images. This innovative approach facilitates the extraction of disease-indicative features. Extensive experiments demonstrate that our method significantly outperforms existing state-of-the-art models.

IJCAI Conference 2025 Conference Paper

Enhancing Multimodal Protein Function Prediction Through Dual-Branch Dynamic Selection with Reconstructive Pre-Training

  • Xiaoling Luo
  • Peng Chen
  • Chengliang Liu
  • Xiaopeng Jin
  • Jie Wen
  • Yumeng Liu
  • Junsong Wang

Multimodal protein features play a crucial role in protein function prediction. However, these features encompass a wide range of information, ranging from structural data and sequence features to protein attributes and interaction networks, making it challenging to decipher their complex interconnections. In this work, we propose a multimodal protein function prediction method (DSRPGO) that utilizes dynamic selection and reconstructive pre-training mechanisms. To acquire complex protein information, we introduce reconstructive pre-training to mine fine-grained information at low semantic levels. Moreover, we put forward the Bidirectional Interaction Module (BInM) to facilitate interactive learning among multimodal features. Additionally, to address the difficulty of hierarchical multi-label classification in this task, a Dynamic Selection Module (DSM) is designed to select the feature representation that is most conducive to current protein function prediction. Our proposed DSRPGO model achieves significant improvements in BPO, MFO, and CCO on human datasets, thereby outperforming other benchmark models.

NeurIPS Conference 2025 Conference Paper

Hierarchical Information Aggregation for Incomplete Multimodal Alzheimer's Disease Diagnosis

  • Chengliang Liu
  • Que Yuanxi
  • Qihao Xu
  • Yabo Liu
  • Jie Wen
  • Jinghua Wang
  • Xiaoling Luo

Alzheimer's Disease (AD) poses a significant health threat to the aging population, underscoring the critical need for early diagnosis to delay disease progression and improve patient quality of life. Recent advances in heterogeneous multimodal artificial intelligence (AI) have facilitated comprehensive joint diagnosis, yet practical clinical scenarios frequently encounter incomplete modalities due to factors like high acquisition costs or radiation risks. Moreover, traditional convolution-based architectures face inherent limitations in capturing long-range dependencies and handling heterogeneous medical data efficiently. To address these challenges, in our proposed heterogeneous multimodal diagnostic framework (HAD), we develop a multi-view Hilbert curve-based Mamba block and a hierarchical spatial feature extraction module to simultaneously capture local spatial features and global dependencies, effectively alleviating spatial discontinuities introduced by voxel serialization. Furthermore, to balance semantic consistency and modal specificity, we build a unified mutual information learning objective in the heterogeneous multimodal embedding space, which maintains effective learning of modality-specific information to avoid modality collapse caused by model preference. Extensive experiments demonstrate that our HAD significantly outperforms state-of-the-art methods in various modality-missing scenarios, providing an efficient and reliable solution for early-stage AD diagnosis.
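
The Hilbert-curve serialization mentioned above is a standard locality-preserving way to flatten a grid into a sequence for models like Mamba. A sketch of the classic index routine, shown for a 2-D slice for brevity (the paper serializes 3-D voxels along multiple curve orientations):

```python
# Classic Hilbert-curve index (2-D sketch; the paper's setting is 3-D).
def hilbert_index(n: int, x: int, y: int) -> int:
    """Position of cell (x, y) along the Hilbert curve on an n x n grid
    (n a power of two). Nearby cells receive nearby indices."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so recursion sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Serialize an 8x8 grid of token coordinates for a sequence model.
n = 8
order = sorted(((x, y) for x in range(n) for y in range(n)),
               key=lambda p: hilbert_index(n, *p))
```

Compared with raster-scan flattening, this ordering keeps spatially adjacent voxels close in the sequence, which is the discontinuity problem the abstract says the framework alleviates.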

AAAI Conference 2025 Conference Paper

Like an Ophthalmologist: Dynamic Selection Driven Multi-View Learning for Diabetic Retinopathy Grading

  • Xiaoling Luo
  • Qihao Xu
  • Huisi Wu
  • Chengliang Liu
  • Zhihui Lai
  • Linlin Shen

Diabetic retinopathy (DR), with its large patient population, has become a formidable threat to human visual health. In the clinical diagnosis of DR, multi-view fundus images are considered to be more suitable for DR diagnosis because of the wide coverage of the field of view. Therefore, different from most of the previous single-view DR grading methods, we design a dynamic selection-driven multi-view DR grading method to fit clinical scenarios better. Since lesion information plays a key role in DR diagnosis, previous methods usually boost the model performance by enhancing the lesion feature. However, during the actual diagnosis, ophthalmologists not only focus on the crucial parts, but also exclude irrelevant features to ensure the accuracy of judgment. To this end, we introduce the idea of dynamic selection and design a series of selection mechanisms from fine granularity to coarse granularity. In this work, we first introduce an Ophthalmic Image Reader (OIR) agent to provide the model with pixel-level prompts of suspected lesion areas. Moreover, a Multi-View Token Selection Module (MVTSM) is designed to prune redundant feature tokens and realize dynamic selection of key information. In the final decision stage, we dynamically fuse multi-view features through the novel Multi-View Mixture of Experts Module (MVMoEM), to enhance key views and reduce the impact of conflicting views. Extensive experiments on a large multi-view fundus image dataset with 34,452 images demonstrate that our method performs favorably against state-of-the-art models.
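
Token pruning of the kind the MVTSM performs is commonly implemented with a learned score per token followed by a top-k gather. A minimal sketch under assumed shapes (the scoring head, k, and names are illustrative, not the paper's):

```python
# Minimal top-k token selection sketch (illustrative; not the MVTSM code).
import torch

def select_tokens(tokens: torch.Tensor, scores: torch.Tensor, k: int) -> torch.Tensor:
    """tokens: (batch, num_tokens, dim); scores: (batch, num_tokens).
    Keep the k highest-scoring tokens per sample."""
    topk = scores.topk(k, dim=1).indices                      # (batch, k)
    idx = topk.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (batch, k, dim)
    return tokens.gather(1, idx)

tokens = torch.randn(2, 196, 64)
scores = torch.randn(2, 196)                 # e.g., from a small scoring MLP
kept = select_tokens(tokens, scores, k=49)   # prune 75% of the tokens
```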

AAAI Conference 2024 Conference Paper

Attention-Induced Embedding Imputation for Incomplete Multi-View Partial Multi-Label Classification

  • Chengliang Liu
  • Jinlong Jia
  • Jie Wen
  • Yabo Liu
  • Xiaoling Luo
  • Chao Huang
  • Yong Xu

As a combination of emerging multi-view learning methods and traditional multi-label classification tasks, multi-view multi-label classification has shown broad application prospects. The diverse semantic information contained in heterogeneous data effectively enables the further development of multi-label classification. However, the widespread incompleteness problem on multi-view features and labels greatly hinders the practical application of multi-view multi-label classification. Therefore, in this paper, we propose an attention-induced missing-instance imputation technique to enhance the generalization ability of the model. Different from existing incomplete multi-view completion methods, we attempt to approximate the latent features of missing instances in embedding space according to cross-view joint attention, instead of recovering missing views in kernel space or original feature space. Accordingly, multi-view completed features are dynamically weighted by the confidence derived from joint attention in the late fusion phase. In addition, we propose a multi-view multi-label classification framework based on label-semantic feature learning, utilizing the statistical weak label correlation matrix and graph attention network to guide the learning process of label-specific features. Finally, our model is compatible with missing multi-view and partial multi-label data simultaneously, and extensive experiments on five datasets confirm the advancement and effectiveness of our embedding imputation method and multi-view multi-label classification model.
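
Imputing a missing instance in embedding space via attention over what is observed can be sketched very compactly. The query construction and scaling below are placeholder assumptions; the paper's joint attention operates across views, and this only illustrates the general shape of the idea:

```python
# Hedged sketch of attention-style embedding imputation (not the paper's model).
import torch
import torch.nn.functional as F

def impute_missing_view(query: torch.Tensor,      # (d,) context query for the missing view
                        available: torch.Tensor   # (num_available_views, d)
                        ) -> torch.Tensor:
    """Approximate a missing view's embedding as an attention-weighted
    combination of the sample's available view embeddings."""
    attn = F.softmax(available @ query / query.size(0) ** 0.5, dim=0)
    return attn @ available    # (d,) imputed embedding in the shared space

imputed = impute_missing_view(torch.randn(64), torch.randn(3, 64))
```

The attention weights double as a natural confidence signal, matching the abstract's point that completed features are weighted by attention-derived confidence at late fusion.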

AAAI Conference 2024 Conference Paper

HACDR-Net: Heterogeneous-Aware Convolutional Network for Diabetic Retinopathy Multi-Lesion Segmentation

  • Qihao Xu
  • Xiaoling Luo
  • Chao Huang
  • Chengliang Liu
  • Jie Wen
  • Jialei Wang
  • Yong Xu

Diabetic Retinopathy (DR), the leading cause of blindness in diabetic patients, is diagnosed from the condition of multiple retinal lesions. As a difficult task in medical image segmentation, DR multi-lesion segmentation faces two main concerns. On the one hand, retinal lesions vary in location, shape, and size. On the other hand, because some lesions occupy only a very small part of the entire fundus image, the high proportion of background leads to difficulties in lesion segmentation. To solve the above problems, we propose a heterogeneous-aware convolutional network (HACDR-Net) that combines heterogeneous cross-convolution, heterogeneous modulated deformable convolution, and optional near-far-aware convolution. Our network introduces an adaptive aggregation module to summarize the heterogeneous feature maps and capture diverse lesion areas in the heterogeneous receptive field across channels and space. In addition, to address the highly imbalanced proportion of focal areas, we design a new medical image segmentation loss function, Noise Adjusted Loss (NALoss). NALoss balances the predictive feature distribution of background and lesion by combining Gaussian noise and hard example mining, thus enhancing awareness of lesions. We conduct experiments on the public IDRiD and DDR datasets, and the results show that the proposed method achieves better performance than other state-of-the-art methods. The code is open-sourced on github.com/xqh180110910537/HACDR-Net.
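
The abstract's description of NALoss (Gaussian noise plus hard example mining) admits a loose sketch; the noise scale, mining ratio, and binary setting below are assumptions for illustration, not the paper's formulation:

```python
# Loose sketch in the spirit of the NALoss description (not the actual loss).
import torch
import torch.nn.functional as F

def noise_adjusted_bce(logits: torch.Tensor, target: torch.Tensor,
                       sigma: float = 0.1, keep_ratio: float = 0.3) -> torch.Tensor:
    """logits/target: (batch, 1, H, W). Perturb logits with Gaussian noise,
    then average only the hardest pixels (online hard example mining)."""
    noisy = logits + sigma * torch.randn_like(logits)
    per_pixel = F.binary_cross_entropy_with_logits(noisy, target, reduction="none")
    flat = per_pixel.flatten(1)                  # (batch, H*W)
    k = max(1, int(keep_ratio * flat.size(1)))
    hardest = flat.topk(k, dim=1).values         # highest-loss pixels dominate
    return hardest.mean()

target = torch.randint(0, 2, (2, 1, 64, 64)).float()
loss = noise_adjusted_bce(torch.randn(2, 1, 64, 64), target)
```

Mining only the hardest pixels counteracts the background dominance the abstract identifies, since easy background pixels no longer swamp the gradient.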

AAAI Conference 2023 Conference Paper

DICNet: Deep Instance-Level Contrastive Network for Double Incomplete Multi-View Multi-Label Classification

  • Chengliang Liu
  • Jie Wen
  • Xiaoling Luo
  • Chao Huang
  • Zhihao Wu
  • Yong Xu

In recent years, multi-view multi-label learning has attracted extensive research interest. However, multi-view multi-label data in the real world is commonly incomplete due to the uncertain factors of data collection and manual annotation, meaning that multi-view features are often missing and complete labels are difficult to obtain. To deal with the double incomplete multi-view multi-label classification problem, we propose a deep instance-level contrastive network, namely DICNet. Different from conventional methods, our DICNet focuses on leveraging deep neural networks to exploit the high-level semantic representations of samples rather than shallow-level features. First, we utilize stacked autoencoders to build an end-to-end multi-view feature extraction framework to learn the view-specific representations of samples. Furthermore, in order to improve the consensus representation ability, we introduce an incomplete instance-level contrastive learning scheme to guide the encoders to better extract the consensus information of multiple views, and use a multi-view weighted fusion module to enhance the discrimination of semantic features. Overall, our DICNet is adept at capturing consistent discriminative representations of multi-view multi-label data and avoiding the negative effects of missing views and missing labels. Extensive experiments performed on five datasets validate that our method outperforms other state-of-the-art methods.
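
An instance-level contrastive scheme that tolerates missing views is often implemented by masking out pairs where a view is unobserved. A rough two-view sketch, with the InfoNCE form and the masking convention as assumptions based on the abstract:

```python
# Rough masked instance-level contrastive sketch (not the DICNet code).
import torch
import torch.nn.functional as F

def masked_info_nce(z1: torch.Tensor, z2: torch.Tensor,
                    present: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
    """z1, z2: (batch, d) view-specific embeddings of the same samples;
    present: (batch,) float, 1.0 where both views are observed."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    sim = z1 @ z2.T / tau                  # pairwise cosine similarities
    labels = torch.arange(z1.size(0))      # positives sit on the diagonal
    loss = F.cross_entropy(sim, labels, reduction="none")
    return (loss * present).sum() / present.sum().clamp(min=1)

present = torch.tensor([1.0, 1.0, 0.0, 1.0])   # third sample misses a view
loss = masked_info_nce(torch.randn(4, 128), torch.randn(4, 128), present)
```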

AAAI Conference 2023 Conference Paper

Incomplete Multi-View Multi-Label Learning via Label-Guided Masked View- and Category-Aware Transformers

  • Chengliang Liu
  • Jie Wen
  • Xiaoling Luo
  • Yong Xu

Multi-view data is more expressive than single-view data, and multi-label annotation provides richer supervision than single-label annotation, which makes multi-view multi-label learning widely applicable to various pattern recognition tasks. In this complex representation learning problem, three main challenges can be characterized as follows: i) How to learn consistent representations of samples across all views? ii) How to exploit and utilize category correlations of multi-label data to guide inference? iii) How to avoid the negative impact resulting from the incompleteness of views or labels? To cope with these problems, we propose a general multi-view multi-label learning framework named label-guided masked view- and category-aware transformers in this paper. First, we design two transformer-style modules for cross-view feature aggregation and multi-label classification, respectively. The former aggregates information from different views in the process of extracting view-specific features, and the latter learns subcategory embeddings to improve classification performance. Second, considering the imbalance of expressive power among views, an adaptively weighted view fusion module is proposed to obtain view-consistent embedding features. Third, we impose a label manifold constraint in sample-level representation learning to maximize the utilization of supervised information. Last but not least, all the modules are designed under the premise of incomplete views and labels, which makes our method adaptable to arbitrary multi-view and multi-label data. Extensive experiments on five datasets confirm that our method has clear advantages over other state-of-the-art methods.

NeurIPS Conference 2023 Conference Paper

Masked Two-channel Decoupling Framework for Incomplete Multi-view Weak Multi-label Learning

  • Chengliang Liu
  • Jie Wen
  • Yabo Liu
  • Chao Huang
  • Zhihao Wu
  • Xiaoling Luo
  • Yong Xu

Multi-view learning has become a popular research topic in recent years, but research on the cross-application of classic multi-label classification and multi-view learning is still in its early stages. In this paper, we focus on the complex yet highly realistic task of incomplete multi-view weak multi-label learning and propose a masked two-channel decoupling framework based on deep neural networks to solve this problem. The core innovation of our method lies in decoupling the single-channel view-level representation, which is common in deep multi-view learning methods, into a shared representation and a view-proprietary representation. We also design a cross-channel contrastive loss to enhance the semantic property of the two channels. Additionally, we exploit supervised information to design a label-guided graph regularization loss, helping the extracted embedding features preserve the geometric structure among samples. Inspired by the success of masking mechanisms in image and text analysis, we develop a random fragment masking strategy for vector features to improve the learning ability of encoders. Finally, it is important to emphasize that our model is fully adaptable to arbitrary view and label absences while also performing well on the ideal full data. We have conducted sufficient and convincing experiments to confirm the effectiveness and advancement of our model.
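
The random fragment masking strategy for vector features described above has a very direct form: zero out a contiguous span of each feature vector during training so the encoder must infer it from context. A small sketch with an illustrative fragment length:

```python
# Small sketch of random fragment masking for vector features
# (fragment length is an illustrative choice, not the paper's setting).
import torch

def mask_fragment(x: torch.Tensor, frag_len: int) -> torch.Tensor:
    """x: (batch, dim). Zero a random contiguous fragment per sample."""
    x = x.clone()
    starts = torch.randint(0, x.size(1) - frag_len + 1, (x.size(0),))
    for i, s in enumerate(starts):
        x[i, s:s + frag_len] = 0.0
    return x

masked = mask_fragment(torch.randn(8, 256), frag_len=64)
```

This transfers the masking idea from image patches and text tokens, where it is well established, to plain feature vectors, which is the adaptation the abstract highlights.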

AAAI Conference 2023 Conference Paper

MVCINN: Multi-View Diabetic Retinopathy Detection Using a Deep Cross-Interaction Neural Network

  • Xiaoling Luo
  • Chengliang Liu
  • Waikeung Wong
  • Jie Wen
  • Xiaopeng Jin
  • Yong Xu

Diabetic retinopathy (DR) is the main cause of irreversible blindness for working-age adults. Previous models for DR detection have had difficulty in clinical application. The main reason is that most previous methods only use single-view data, and a single field of view (FOV) only accounts for about 13% of the FOV of the retina, resulting in the loss of most lesion features. To alleviate this problem, we propose a multi-view model for DR detection, which takes full advantage of multi-view images covering almost all of the retinal field. To be specific, we design a Cross-Interaction Self-Attention based Module (CISAM) that interfuses local features extracted from convolutional blocks with long-range global features learned from transformer blocks. Furthermore, considering the pathological association in different views, we use the feature jigsaw to assemble and learn the features of multiple views. Extensive experiments on the latest public multi-view MFIDDR dataset with 34,452 images demonstrate the superiority of our method, which performs favorably against state-of-the-art models. To the best of our knowledge, this work is the first study on a public large-scale multi-view fundus image dataset for DR detection.

AAAI Conference 2019 Conference Paper

MPD-AL: An Efficient Membrane Potential Driven Aggregate-Label Learning Algorithm for Spiking Neurons

  • Malu Zhang
  • Jibin Wu
  • Yansong Chua
  • Xiaoling Luo
  • Zihan Pan
  • Dan Liu
  • Haizhou Li

One of the long-standing questions in biology and machine learning is how neural networks may learn important features from input activities with delayed feedback, commonly known as the temporal credit-assignment problem. Aggregate-label learning was proposed to resolve this problem by matching the spike count of a neuron with the magnitude of a feedback signal. However, the existing threshold-driven aggregate-label learning algorithms are computationally intensive, resulting in relatively low learning efficiency and hence limiting their usability in practical applications. In order to address these limitations, we propose a novel membrane-potential driven aggregate-label learning algorithm, namely MPD-AL. With this algorithm, the easiest modifiable time instant is identified from the membrane potential traces of the neuron, and synaptic adaptation is guided by the presynaptic neurons' contributions at this time instant. The experimental results demonstrate that the proposed algorithm enables the neurons to generate the desired number of spikes, and to detect useful clues embedded within unrelated spiking activities and background noise, with better learning efficiency than the state-of-the-art TDP1 and Multi-Spike Tempotron algorithms. Furthermore, we propose a data-driven dynamic decoding scheme for practical classification tasks, for which aggregate labels are hard to define. This scheme effectively improves the classification accuracy of aggregate-label learning algorithms, as demonstrated on a speech recognition task.
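
As background for the "easiest modifiable time instant" idea, a membrane potential trace can be built as a sum of decaying postsynaptic potentials, and the instant where the subthreshold potential peaks is where a small weight change most easily adds a spike. All constants below are illustrative, not the paper's settings:

```python
# Background sketch only: a simple membrane trace and the peak-subthreshold
# heuristic (not the MPD-AL algorithm itself).
import numpy as np

def membrane_trace(spike_times, weights, t_max=100.0, dt=0.1, tau=10.0):
    """Sum of exponentially decaying postsynaptic potentials."""
    t = np.arange(0.0, t_max, dt)
    v = np.zeros_like(t)
    for t_i, w in zip(spike_times, weights):
        v += w * np.exp(-(t - t_i) / tau) * (t >= t_i)
    return t, v

t, v = membrane_trace(spike_times=[10.0, 30.0, 55.0], weights=[0.6, 0.8, 0.5])
threshold = 1.0
sub = np.where(v < threshold, v, -np.inf)   # ignore suprathreshold points
t_easiest = t[np.argmax(sub)]               # closest-to-threshold instant
```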