Arrow Research search

Author name cluster

Qi Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

21 papers
1 author row

Possible papers

AAAI Conference 2026 Conference Paper

AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals

  • Qi Xu
  • Shuai Gong
  • Xuming Ran
  • Haihua Luo
  • Yangfan Hu

While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which limits their ability to generalize across stimuli and individuals. We introduce the Adaptive Visual Model (AVM), a structure-preserving framework that enables condition-aware adaptation through modular subnetworks, without modifying the core representation. AVM keeps a Vision Transformer-based encoder frozen to capture consistent visual features, while independently trained modulation paths account for neural response variations driven by stimulus content and subject identity. We evaluate AVM in three experimental settings, including stimulus-level variation, cross-subject generalization, and cross-dataset adaptation, all of which involve structured changes in inputs and individuals. Across two large-scale mouse V1 datasets, AVM outperforms the state-of-the-art V1T model by approximately 2% in predictive correlation, demonstrating robust generalization, interpretable condition-wise modulation, and high architectural efficiency. Specifically, AVM achieves a 9.1% improvement in explained variance (FEVE) under the cross-dataset adaptation setting. These results suggest that AVM provides a unified framework for adaptive neural modeling across biological and experimental conditions, offering a scalable solution under structural constraints. Its design may inform future approaches to cortical modeling in both neuroscience and biologically inspired AI systems.

TMLR Journal 2026 Journal Article

Causal Decoding for Hallucination-Resistant Multimodal Large Language Models

  • Shiwei Tan
  • Hengyi Wang
  • Weiyi Qin
  • Qi Xu
  • Zhigang Hua
  • Hao Wang

Multimodal Large Language Models (MLLMs) deliver detailed responses on vision-language tasks, yet remain susceptible to object hallucination (introducing objects not present in the image), undermining reliability in practice. Prior efforts often rely on heuristic penalties, post-hoc correction, or generic decoding tweaks, which do not directly intervene in the mechanisms that trigger object hallucination and thus yield limited gains. To address this challenge, we propose a causal decoding framework that applies targeted causal interventions during generation to curb spurious object mentions. By reshaping the decoding dynamics to attenuate spurious dependencies, our approach reduces false object tokens while maintaining descriptive quality. Across captioning and QA benchmarks, our framework substantially lowers object-hallucination rates and achieves state-of-the-art faithfulness without degrading overall output quality.

AAAI Conference 2026 Conference Paper

Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

  • Huiyan Xue
  • Xuming Ran
  • Yaxin Li
  • Qi Xu
  • Enhui Li
  • Yi Xu
  • Qiang Zhang

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures like Sparse Distributed Memory Multi-Layer Perceptrons (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity poses two fundamental challenges: (1) the isolation of sparse subnetworks severely limits cross-task knowledge reuse; and (2) increased sparsity reduces interference but often degrades performance due to constrained feature sharing. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer, but as a topology-aligned information conduit. By identifying neurons with high activation frequency, SSD selectively distills knowledge within previous Top-K subnetworks and output logits, without requiring replay or task labels, preserving both sparsity and functional specialization. Unlike conventional distillation, SSD operates under hard modular constraints and enables structural realignment without altering the sparse architecture. While our method is validated on SDMLP, its structure-aligned mechanism has the potential to generalize to other sparse networks as a plug-in module for promoting representation sharing. Comprehensive experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and manifold coverage, offering a structurally grounded solution to sparse continual learning.
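The Top-K activation and activation-frequency ideas mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' SDMLP or SSD code: the function names, the tie-breaking rule, and the plain-list representation are assumptions made for illustration only.

```python
def top_k_indices(values, k):
    """Indices of the k largest entries (ties broken by lower index first)."""
    if k <= 0:
        return []
    return sorted(range(len(values)), key=lambda i: (-values[i], i))[:k]


def top_k_activation(values, k):
    """Keep the k largest entries of `values` and set every other entry to 0.0."""
    keep = set(top_k_indices(values, k))
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def activation_frequency(batch, k):
    """Fraction of inputs in `batch` for which each neuron is among the Top-K."""
    counts = [0] * len(batch[0])
    for values in batch:
        for i in top_k_indices(values, k):
            counts[i] += 1
    return [c / len(batch) for c in counts]
```

Neurons with a high value in `activation_frequency` are the kind of frequently active units the abstract describes as distillation targets; how SSD actually selects and uses them is not specified here.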

JBHI Journal 2026 Journal Article

Sleep Stage Specificity to Window Length Variations: A Decision Fusion Strategy for Enhanced Scoring

  • Zhaowen Wang
  • Dongdong Zhou
  • Qi Xu
  • Fengyu Cong
  • Mohammad Al-Sa'd
  • Jenni Raitoharju

Sleep stage scoring is a fundamental component of sleep medicine, enabling a comprehensive assessment of sleep architecture and quality. While the standard 30-second (30s) epoch defined by the American Academy of Sleep Medicine represents the clinical gold standard, most automatic sleep stage scoring algorithms process these fixed segments in isolation. This approach may hinder the detection of transient arousal events, sleep spindles, K-complexes, and other phasic sleep characteristics that occur on finer timescales, thus necessitating analysis at sub-epoch resolution. To leverage complementary information across temporal scales, we propose a Multi-scale Decision Fusion Sleep Network (MDFSleepNet). Our systematic analysis across window lengths (1-30 seconds) reveals significant stage-specific temporal preferences: N1 and N3 stages achieve higher accuracy with 30s windows capturing comprehensive context, while N2 stage classification benefits markedly from shorter windows (1-2 seconds) optimized for transient micro-structure detection. REM stage preferences exhibit dataset variability. Motivated by these findings, MDFSleepNet integrates these complementary scales through a dual-stream architecture combining multi-scale segmentation, scale-specific feature learning, and cross-scale fusion. Evaluated on ISRUC-S1 and ISRUC-S3 (by fusing 5s and 30s windows), MDFSleepNet achieves state-of-the-art accuracies of 83.5% and 84.8% (Cohen's Kappa: 0.786, 0.804). On Sleep-EDF-20 (fusing 15s and 30s windows), it reaches 90.9% accuracy (Cohen's Kappa: 0.875), demonstrating robust performance through complementary multi-scale fusion. The source code for this study is publicly available at https://github.com/wzw999/MDFSleepNet.
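The abstract reports Cohen's Kappa alongside accuracy. As a reminder of what that statistic measures, the sketch below computes it from two label sequences using the standard definition kappa = (p_o - p_e) / (1 - p_e). The function name and the toy labels are illustrative assumptions and are not taken from the MDFSleepNet repository.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of epochs where both labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both labelings use one identical class
    return (p_o - p_e) / (1.0 - p_e)


# Toy example with six scored epochs (hypothetical labels).
reference = ["W", "N1", "N2", "N2", "N3", "REM"]
predicted = ["W", "N2", "N2", "N2", "N3", "REM"]
kappa = cohens_kappa(reference, predicted)
```

In the toy example, five of six epochs agree (p_o = 5/6) and chance agreement is p_e = 1/4, so kappa is 7/9, noticeably lower than the raw accuracy.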

AAAI Conference 2026 Conference Paper

Spatial-Frequency Spiking Neural Network for Underwater Object Detection

  • Long Chen
  • Wei Miao
  • Xin Gao
  • Yunzhi Zhuge
  • Hongming Xu
  • Yaxin Li
  • Qi Xu

Underwater object detection presents significant challenges due to the unique visual degradations in underwater environments, such as low contrast, poor visibility, and blurry object boundaries. While ANNs have achieved impressive detection accuracy, their high computational cost and power consumption limit their deployment in resource-constrained underwater platforms. In this work, we propose a Spatial-Frequency Spiking Neural Network (SFSNN) that combines the energy-efficient and event-driven nature of Spiking Neural Networks (SNNs) with the discriminative power of spatial-frequency analysis. SFSNN introduces a novel spatial-frequency spiking module that integrates spatial and frequency-domain representations, enhancing edge and texture features crucial for object detection in murky waters. Furthermore, we adapt the YOLOX architecture into a spike-based detector via ANN-to-SNN conversion using signed spiking neurons. Extensive experiments on the RUOD dataset demonstrate that SFSNN achieves superior performance over both SNN- and ANN-based detection models, offering a compelling solution for low-power underwater object detection.

AAAI Conference 2025 Conference Paper

ALADE-SNN: Adaptive Logit Alignment in Dynamically Expandable Spiking Neural Networks for Class Incremental Learning

  • Wenyao Ni
  • Jiangrong Shen
  • Qi Xu
  • Huajin Tang

Inspired by the human brain's ability to adapt to new tasks without erasing prior knowledge, we develop spiking neural networks (SNNs) with dynamic structures for Class Incremental Learning (CIL). Our analytical experiments reveal that limited datasets introduce biases in logits distributions among tasks. Fixed features from frozen past-task extractors can cause overfitting and hinder the learning of new tasks. To address these challenges, we propose the ALADE-SNN framework, which includes adaptive logit alignment for balanced feature representation and OtoN suppression to manage weights mapping frozen old features to new classes during training, releasing them during fine-tuning. This approach dynamically adjusts the network architecture based on analytical observations, improving feature extraction and balancing performance between new and old tasks. Experiment results show that ALADE-SNN achieves an average incremental accuracy of 75.42 ± 0.74% on the CIFAR100-B0 dataset over 10 incremental steps. ALADE-SNN not only matches the performance of DNN-based methods but also surpasses state-of-the-art SNN-based continual learning algorithms. This advancement enhances continual learning in neuromorphic computing, offering a brain-inspired, energy-efficient solution for real-time data processing.

AAAI Conference 2025 Conference Paper

BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations

  • Yusong Wang
  • Xuanye Fang
  • Huifeng Yin
  • Dongyuan Li
  • Guoqi Li
  • Qi Xu
  • Yi Xu
  • Shuai Zhong

Considering the importance of capturing both global conversational topics and local speaker dependencies for multimodal emotion recognition in conversations, current approaches first utilize sequence models like Transformer to extract global context information, then apply Graph Neural Networks to model local speaker dependencies for local context information extraction, coupled with Graph Contrastive Learning (GCL) to enhance node representation learning. However, this sequential design introduces potential biases: the extracted global context information inevitably influences subsequent processing, compromising the independence and diversity of the original local features; current graph augmentation methods in GCL cannot consider both global and local context information in conversations to evaluate the node importance, hindering the learning of key information. Inspired by how the human brain excels at handling complex tasks by efficiently integrating local and global information processing mechanisms, we propose an aligned global-local context fusion framework for sequence-based design to address these problems. This design includes a dual-attention Transformer and a dual-evaluation method for graph augmentation in GCL. The dual-attention Transformer combines global attention for overall context extraction with sliding-window attention for local context capture, both enhanced by spiking neuron dynamics. The dual-evaluation method in GCL comprises global importance evaluation to identify nodes crucial for overall conversation context, and local importance evaluation to detect nodes significant for local semantics, generating augmented graph views that preserve both global and local information. This approach ensures balanced information processing throughout the pipeline, enhancing biological plausibility and achieving superior emotion recognition.
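The difference between the global and sliding-window attention named in this abstract comes down to which positions each utterance may attend to. The sketch below builds the two boolean masks. It is a generic illustration: the symmetric window and the function names are assumptions, and the spiking dynamics described in the abstract are not modeled.

```python
def global_mask(seq_len):
    """Every utterance may attend to every other utterance."""
    return [[True] * seq_len for _ in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """mask[i][j] is True when utterance j lies within `window` positions of i."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]


# Four utterances, each attending only to itself and its immediate neighbours.
local = sliding_window_mask(4, 1)
```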

AILAW Journal 2025 Journal Article

Comprehensive research on semantic understanding, applicability, and impact analysis of legal provisions based on deep learning and natural language processing

  • Qi Xu

Semantic legal data offers the basis for a methodical examination of legal provisions and is vital for comprehending and deciphering legal regulations. Nevertheless, manually adding semantic metadata to sizable criminal datasets is expensive and time-consuming. The cutting-edge study addresses two essential troubles: the requirements engineering (RE) literature lacks a standardized framework for semantic metadata types relevant to prison requirements evaluation, and there may be insufficient automatic guide for extracting those metadata types, especially while using deep learning (DL) and natural language processing (NLP) capabilities. To address those problems, a comprehensive framework was first created via reviewing and integrating the semantic criminal metadata categories determined inside the RE literature. After that, an automated extraction technique that makes use of NLP was created to help realize and examine legal texts, enabling to find and categorize critical metadata. A Binary Moth-Flame Optimized Dynamic Recurrent Neural Network (BMFO-DRNN) is then used to broaden an automatic extraction approach for the specified metadata lessons. To increase the satisfactory and relevance of the entered statistics, preprocessing techniques, which include tokenization, stemming, lemmatization, and word embeddings like Word2Vec, were used for characteristic extraction. Experimental results show that the BMFO-DRNN model outperforms traditional methods, achieving accuracy (95%), F1-score (92%), precision (93%), and recall (96%). In addition to demonstrating the price of NLP in automated semantic analysis legal metadata, this work additionally shows how NLP should improve the efficacy and performance of legal evaluation in an effort to assist criminal informatics advancement.

AAAI Conference 2025 Conference Paper

FSTA-SNN: Frequency-Based Spatial-Temporal Attention Module for Spiking Neural Networks

  • Kairong Yu
  • Tianqing Zhang
  • Hongwei Wang
  • Qi Xu

Spiking Neural Networks (SNNs) are emerging as a promising alternative to Artificial Neural Networks (ANNs) due to their inherent energy efficiency. Owing to the inherent sparsity in spike generation within SNNs, the in-depth analysis and optimization of intermediate output spikes are often neglected. This oversight significantly restricts the inherent energy efficiency of SNNs and diminishes their advantages in spatiotemporal feature extraction, resulting in a lack of accuracy and unnecessary energy expenditure. In this work, we analyze the inherent spiking characteristics of SNNs from both temporal and spatial perspectives. In terms of spatial analysis, we find that shallow layers tend to focus on learning vertical variations, while deeper layers gradually learn horizontal variations of features. Regarding temporal analysis, we observe that there is not a significant difference in feature learning across different time steps. This suggests that increasing the time steps has limited effect on feature learning. Based on the insights derived from these analyses, we propose a Frequency-based Spatial-Temporal Attention (FSTA) module to enhance feature learning in SNNs. This module aims to improve the feature learning capabilities by suppressing redundant spike features. The experimental results indicate that the introduction of the FSTA module significantly reduces the spike firing rate of SNNs, demonstrating superior performance compared to state-of-the-art baselines across multiple datasets.

NeurIPS Conference 2025 Conference Paper

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

  • Wei Chow
  • Yuan Gao
  • Linfeng Li
  • Xian Wang
  • Qi Xu
  • Hang Song
  • Lingdong Kong
  • Ran Zhou

Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the expressive capacity of visual information as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios frequently involve interleaved multi-condition queries with multiple images. Hence, this paper introduces MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval, comprising 320,000 queries with 135,000 products in 5 languages, covering 7 distinct product categories. Extensive experiments on MERIT identify existing models' critical limitation: focusing solely on global semantic information while neglecting specific conditional elements in queries. Consequently, we propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by integrating embedding reconstruction to preserve fine-grained conditional elements and contrastive learning to extract comprehensive global semantics. Experiments demonstrate that Coral achieves a 45.9% performance improvement over conventional approaches on MERIT, with strong generalization capabilities validated across 8 established retrieval benchmarks. Collectively, our contributions (a novel dataset, identification of critical limitations in existing approaches, and an innovative fine-tuning framework) establish a foundation for future research in interleaved multi-condition semantic retrieval. Data & Code: MERIT-2025.github.io

JBHI Journal 2025 Journal Article

Multi-Task Adaptive Resolution Network for Lymph Node Metastasis Diagnosis From Whole Slide Images of Colorectal Cancer

  • Tong Wang
  • Su-Jin Shin
  • Mingkang Wang
  • Qi Xu
  • Guiyang Jiang
  • Fengyu Cong
  • Jeonghyun Kang
  • Hongming Xu

Automated detection of lymph node metastasis (LNM) holds great potential to alleviate the workload of doctors and reduce misinterpretations. Despite the practical successes achieved, effectively addressing the highly complex and heterogeneous tumor microenvironment remains an open and challenging problem, especially when tumor subtypes intermingle and are difficult to delineate. In this paper, we propose a multi-task adaptive resolution network, named MAR-Net, for LNM detection and subtyping in complex mixed-type cancers. Specifically, we construct a resolution-aware module to mine heterogeneous diagnostic information, which exploits the multi-scale pyramid information and adaptively combines multi-resolution structured features for comprehensive representation. Additionally, we adopt a multi-task learning approach that simultaneously addresses LNM detection and subtyping, reducing model instability during optimization and improving performance across both tasks. More importantly, to rectify the potential misclassification of tumor subtypes, we elaborately design a hierarchical subtyping refinement (HSR) algorithm that leverages a generic segmentation model informed by pathologists' prior knowledge. Evaluations have been conducted on three private and one public cancer datasets (554 WSIs, 4.8 million patches). Our experimental results demonstrate that the proposed method consistently achieves superior performance compared to the state-of-the-art methods, achieving 0.5% to 3.2% higher AUC in LNM detection and 3.8% to 4.4% higher AUC in LNM subtyping.

AAAI Conference 2025 Conference Paper

Multi-View Incremental Learning with Structured Hebbian Plasticity for Enhanced Fusion Efficiency

  • Yuhong Chen
  • Ailin Song
  • Huifeng Yin
  • Shuai Zhong
  • Fuhai Chen
  • Qi Xu
  • Shiping Wang
  • Mingkun Xu

The rapid evolution of multimedia technology has revolutionized human perception, paving the way for multi-view learning. However, traditional multi-view learning approaches are tailored for scenarios with fixed data views, falling short of emulating the intricate cognitive procedures of the human brain processing signals sequentially. Our cerebral architecture seamlessly integrates sequential data through intricate feed-forward and feedback mechanisms. In stark contrast, traditional methods struggle to generalize effectively when confronted with data spanning diverse domains, highlighting the need for innovative strategies that can mimic the brain's adaptability and dynamic integration capabilities. In this paper, we propose a bio-neurologically inspired multi-view incremental framework named MVIL aimed at emulating the brain's fine-grained fusion of sequentially arriving views. MVIL comprises two fundamental modules: structured Hebbian plasticity and synaptic partition learning. The structured Hebbian plasticity reshapes the structure of weights to express the high correlation between view representations, facilitating a fine-grained fusion of view representations. Moreover, synaptic partition learning is efficient in alleviating drastic changes in weights and also retaining old knowledge by inhibiting partial synapses. These modules bionically play a central role in reinforcing crucial associations between newly acquired information and existing knowledge repositories, thereby enhancing the network's capacity for generalization. Experimental results on six benchmark datasets show MVIL's effectiveness over state-of-the-art methods.

AAAI Conference 2025 Conference Paper

SpikingYOLOX: Improved YOLOX Object Detection with Fast Fourier Convolution and Spiking Neural Networks

  • Wei Miao
  • Jiangrong Shen
  • Qi Xu
  • Timo Hamalainen
  • Yi Xu
  • Fengyu Cong

In recent years, with the advancements in brain science, spiking neural networks (SNNs) have garnered significant attention. SNNs can generate spikes that mimic neuronal transmission in the human brain, thereby significantly reducing computational costs through their event-driven nature during training. While deep SNNs have shown impressive performance on classification tasks, they still face challenges in more complex tasks such as object detection. In this paper, we propose SpikingYOLOX, extending the structure of the original YOLOX by introducing signed spiking neurons and fast Fourier convolution (FFC). The designed ternary signed spiking neurons could generate three kinds of spikes to obtain more robust features in the deep layer of the backbone. Meanwhile, we integrate FFC with SNN modules to enhance object detection performance, because its global receptive field is beneficial to the object detection task. Extensive experiments demonstrate that the proposed SpikingYOLOX achieves state-of-the-art performance among other SNN-based object detection methods.
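To make the idea of a neuron that emits three kinds of spikes concrete, here is a minimal sketch of a signed integrate-and-fire unit that outputs +1, 0, or -1 at each time step. The abstract does not give the neuron equations, so the symmetric thresholds, the leak factor, and the soft reset used below are assumptions rather than the SpikingYOLOX formulation.

```python
def ternary_spikes(inputs, threshold=1.0, leak=1.0):
    """Simulate one signed integrate-and-fire neuron over a sequence of inputs.

    Returns a list with +1 (positive spike), -1 (negative spike), or 0 (silent)
    for each time step.
    """
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = leak * v + x  # integrate the input, with optional leak
        if v >= threshold:
            spikes.append(1)
            v -= threshold  # soft reset after a positive spike
        elif v <= -threshold:
            spikes.append(-1)
            v += threshold  # soft reset after a negative spike
        else:
            spikes.append(0)
    return spikes


# Hypothetical input current over five time steps.
out = ternary_spikes([0.6, 0.6, -1.5, -0.9, 0.2])
```

The negative spike lets the neuron signal strongly negative input instead of staying silent, which is the extra expressiveness a binary spiking neuron lacks.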

JMLR Journal 2024 Journal Article

Differentially Private Data Release for Mixed-type Data via Latent Factor Models

  • Yanqing Zhang
  • Qi Xu
  • Niansheng Tang
  • Annie Qu

Differential privacy is a particular data privacy-preserving technology which enables synthetic data or statistical analysis results to be released with a minimum disclosure of private information from individual records. The tradeoff between privacy-preserving and utility guarantee is always a challenge for differential privacy technology, especially for synthetic data generation. In this paper, we propose a differentially private data synthesis algorithm for mixed-type data with correlation based on latent factor models. The proposed method can add a relatively small amount of noise to synthetic data under a given level of privacy protection while capturing correlation information. Moreover, the proposed algorithm can generate synthetic data preserving the same data type as mixed-type original data, which greatly improves the utility of synthetic data. The key idea of our method is to perturb the factor matrix and factor loading matrix to construct a synthetic data generation model, and to utilize link functions with privacy protection to ensure consistency of synthetic data type with original data. The proposed method can generate privacy-preserving synthetic data at low computation cost even when the original data is high-dimensional. In theory, we establish differentially private properties of the proposed method. Our numerical studies also demonstrate superb performance of the proposed method on the utility guarantee of the statistical analysis based on privacy-preserved synthetic data.

AAAI Conference 2024 Conference Paper

Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning

  • Jiangrong Shen
  • Wenyao Ni
  • Qi Xu
  • Huajin Tang

The next generation of machine intelligence requires the capability of continual learning to acquire new knowledge without forgetting the old one while conserving limited computing resources. Spiking neural networks (SNNs), compared to artificial neural networks (ANNs), have more characteristics that align with biological neurons, which may be helpful as a potential gating function for knowledge maintenance in neural networks. Inspired by the selective sparse activation principle of context gating in biological systems, we present a novel SNN model with selective activation to achieve continual learning. The trace-based K-Winner-Take-All (K-WTA) and variable threshold components are designed to form the sparsity in selective activation in spatial and temporal dimensions of spiking neurons, which promotes the subpopulation of neuron activation to perform specific tasks. As a result, continual learning can be maintained by routing different tasks via different populations of neurons in the network. The experiments are conducted on MNIST and CIFAR10 datasets under the class incremental setting. The results show that the proposed SNN model achieves competitive performance similar to and even surpasses the other regularization-based methods deployed under traditional ANNs.

NeurIPS Conference 2023 Conference Paper

EICIL: Joint Excitatory Inhibitory Cycle Iteration Learning for Deep Spiking Neural Networks

  • Zihang Shao
  • Xuanye Fang
  • Yaxin Li
  • Chaoran Feng
  • Jiangrong Shen
  • Qi Xu

Spiking neural networks (SNNs) have undergone continuous development and extensive study for decades, leading to increased biological plausibility and optimal energy efficiency. However, traditional training methods for deep SNNs have some limitations, as they rely on strategies such as pre-training and fine-tuning, indirect coding and reconstruction, and approximate gradients. These strategies lack a complete training model and require gradient approximation. To overcome these limitations, we propose a novel learning method named Joint Excitatory Inhibitory Cycle Iteration learning for Deep Spiking Neural Networks (EICIL) that integrates both excitatory and inhibitory behaviors inspired by the signal transmission of biological neurons. By organically embedding these two behavior patterns into one framework, the proposed EICIL significantly improves the bio-mimicry and adaptability of spiking neuron models, as well as expands the representation space of spiking neurons. Extensive experiments based on EICIL and traditional learning methods demonstrate that EICIL outperforms traditional methods on various datasets, such as CIFAR10 and CIFAR100, revealing the crucial role of the learning approach that integrates both behaviors during training.

NeurIPS Conference 2023 Conference Paper

Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks

  • Qi Xu
  • Yuyuan Gao
  • Jiangrong Shen
  • Yaxin Li
  • Xuming Ran
  • Huajin Tang
  • Gang Pan

Spiking neural networks (SNNs) serve as one type of efficient model to process spatio-temporal patterns in time series, such as the Address-Event Representation (AER) data collected from Dynamic Vision Sensor (DVS). Although convolutional SNNs have achieved remarkable performance on these AER datasets, benefiting from the predominant spatial feature extraction ability of convolutional structure, they ignore temporal features related to sequential time points. In this paper, we develop a recurrent spiking neural network (RSNN) model embedded with an advanced spiking convolutional block attention module (SCBAM) component to combine both spatial and temporal features of spatio-temporal patterns. It invokes the history information in spatial and temporal channels adaptively through SCBAM, which brings the advantages of efficient memory calling and history redundancy elimination. The performance of our model was evaluated on the DVS128-Gesture dataset and other time-series datasets. The experimental results show that the proposed SRNN-SCBAM model makes better use of the history information in spatial and temporal dimensions with less memory space, and achieves higher accuracy compared to other models.

AAAI Conference 2023 Conference Paper

ESL-SNNs: An Evolutionary Structure Learning Strategy for Spiking Neural Networks

  • Jiangrong Shen
  • Qi Xu
  • Jian K. Liu
  • Yueming Wang
  • Gang Pan
  • Huajin Tang

Spiking neural networks (SNNs) have manifested remarkable advantages in power consumption and event-driven property during the inference process. To take full advantage of low power consumption and improve the efficiency of these models further, the pruning methods have been explored to find sparse SNNs without redundant connections after training. However, parameter redundancy still hinders the efficiency of SNNs during training. In the human brain, the rewiring process of neural networks is highly dynamic, while synaptic connections remain relatively sparse during brain development. Inspired by this, here we propose an efficient evolutionary structure learning (ESL) framework for SNNs, named ESL-SNNs, to implement the sparse SNN training from scratch. The pruning and regeneration of synaptic connections in SNNs evolve dynamically during learning, yet keep the structural sparsity at a certain level. As a result, the ESL-SNNs can search for optimal sparse connectivity by exploring all possible parameters across time. Our experiments show that the proposed ESL-SNNs framework is able to learn SNNs with sparse structures effectively with only a limited reduction in accuracy. The ESL-SNNs achieve merely 0.28% accuracy loss with 10% connection density on the DVS-Cifar10 dataset. Our work presents a brand-new approach for sparse training of SNNs from scratch with biologically plausible evolutionary mechanisms, closing the gap in the expressibility between sparse training and dense training. Hence, it has great potential for SNN lightweight training and inference with low power consumption and small memory usage.
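The idea of pruning and regenerating connections while holding sparsity fixed can be sketched as a single rewiring step. The abstract does not state the pruning or regrowth criteria, so the choices below (prune the smallest-magnitude active weights, regrow the same number at random inactive positions, initialise regrown weights to zero) are assumptions in the style of generic dynamic sparse training, not the ESL-SNNs algorithm itself.

```python
import random


def prune_and_regrow(weights, mask, n_swap, rng):
    """Return new (weights, mask) after one rewiring step at constant density.

    weights: flat list of floats; mask: flat list of bools (True = connection exists).
    """
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    n_swap = min(n_swap, len(active), len(inactive))
    # Prune: drop the active connections with the smallest magnitude.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:n_swap]
    # Regrow: activate the same number of previously inactive connections at random.
    regrown = rng.sample(inactive, n_swap)
    new_weights, new_mask = list(weights), list(mask)
    for i in pruned:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in regrown:
        new_mask[i] = True
        new_weights[i] = 0.0  # regrown connections start from zero (assumption)
    return new_weights, new_mask


# Ten possible connections, three active (30% density); rewire one of them.
w = [0.5, 0.0, -0.05, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0]
m = [True, False, True, False, True, False, False, False, False, False]
w2, m2 = prune_and_regrow(w, m, n_swap=1, rng=random.Random(0))
```

Because every pruned connection is matched by one regrown connection, the number of active connections before and after the step is identical, which mirrors the abstract's statement that structural sparsity is kept at a certain level.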

AAAI Conference 2022 Conference Paper

DIRL: Domain-Invariant Representation Learning for Generalizable Semantic Segmentation

  • Qi Xu
  • Liang Yao
  • Zhengkai Jiang
  • Guannan Jiang
  • Wenqing Chu
  • Wenhui Han
  • Wei Zhang
  • Chengjie Wang

Model generalization to the unseen scenes is crucial to real-world applications, such as autonomous driving, which requires robust vision systems. To enhance the model generalization, domain generalization through learning the domain-invariant representation has been widely studied. However, most existing works learn the shared feature space within multi-source domains but ignore the characteristic of the feature itself (e.g., the feature sensitivity to the domain-specific style). Therefore, we propose the Domain-invariant Representation Learning (DIRL) for domain generalization which utilizes the feature sensitivity as the feature prior to guide the enhancement of the model generalization capability. The guidance reflects in two folds: 1) Feature re-calibration that introduces the Prior Guided Attention Module (PGAM) to emphasize the insensitive features and suppress the sensitive features. 2) Feature whiting that proposes the Guided Feature Whiting (GFW) to remove the feature correlations which are sensitive to the domain-specific style. We construct the domain-invariant representation which suppresses the effect of the domain-specific style on the quality and correlation of the features. As a result, our method is simple yet effective, and can enhance the robustness of various backbone networks with little computational cost. Extensive experiments over multiple domains generalizable segmentation tasks show the superiority of our approach to other methods.

AAAI Conference 2021 System Paper

An Intelligent Assistant for Problem Behavior Management

  • Penghe Chen
  • Yu Lu
  • Jiefei Liu
  • Qi Xu

We design and implement an intelligent assistant, called PB-Advisor, to advise teachers and parents on students’ problem behaviors. It utilizes a task-oriented dialogue system to identify the need deficiency underlying students’ problem behaviors, and relies on a community question answering system to provide advice on typical problem behavior management. In addition, it also provides various learning resources, and illustrates the relations between influential factors on typical problem behaviors through data analysis. With PB-Advisor, teachers and parents without psychological expertise can easily find proper advice on students’ problem behaviors.

IJCAI Conference 2018 Conference Paper

CSNN: An Augmented Spiking based Framework with Perceptron-Inception

  • Qi Xu
  • Yu Qi
  • Hang Yu
  • Jiangrong Shen
  • Huajin Tang
  • Gang Pan

Spiking Neural Networks (SNNs) represent and transmit information in spikes, which is considered more biologically realistic and computationally powerful than the traditional Artificial Neural Networks. The spiking neurons encode useful temporal information and possess highly anti-noise property. The feature extraction ability of typical SNNs is limited by shallow structures. This paper focuses on improving the feature extraction ability of SNNs in virtue of powerful feature extraction ability of Convolutional Neural Networks (CNNs). CNNs can extract abstract features resorting to the structure of the convolutional feature maps. We propose a CNN-SNN (CSNN) model to combine feature learning ability of CNNs with cognition ability of SNNs. The CSNN model learns the encoded spatial temporal representations of images in an event-driven way. We evaluate the CSNN model on the handwritten digits images dataset MNIST and its variational databases. In the presented experimental results, the proposed CSNN model is evaluated regarding learning capabilities, encoding mechanisms, robustness to noisy stimuli and its classification performance. The results show that CSNN behaves well compared to other cognitive models with significantly fewer neurons and training samples. Our work brings more biological realism into modern image classification models, with the hope that these models can inform how the brain performs this high-level vision task.