Arrow Research search

Author name cluster

Malu Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers (26)

AAAI Conference 2026 Conference Paper

HardF-SNN: Hardware-Friendly Quantization for Spiking Neural Networks with Efficient Integer-Arithmetic-Only Inference

  • Hanwen Liu
  • Kexin Shi
  • Jieyuan Zhang
  • Yimeng Shan
  • Jibin Wu
  • Wenyu Chen
  • Malu Zhang

Spiking Neural Networks (SNNs) are emerging as a promising energy-efficient alternative to Artificial Neural Networks (ANNs) due to their event-driven computation paradigm. However, recent advances toward large-scale high-performance SNNs inevitably lead to substantial memory and computational overhead. While quantization offers a potential remedy, many quantization approaches fail to deliver verifiable efficiency gains on resource-constrained hardware platforms. In this paper, we propose a lightweight and hardware-friendly SNN, termed HardF-SNN. Specifically, we first build a baseline model using shared-scale quantization and BN folding to simulate integer-only inference, a setting not thoroughly examined in prior SNN work. Then, through empirical and theoretical analysis, we identify that this baseline suffers accuracy degradation and can even fail to train. To mitigate these issues, we propose proportional shared-scale quantization for an enhanced dynamic range and integer-only BN using bit-shifting to stabilize training. Extensive experiments show that HardF-SNN achieves an optimal balance between performance and efficiency with excellent hardware compatibility. To demonstrate its effectiveness on resource-limited platforms, we deploy HardF-SNN on a dedicated FPGA-based hardware accelerator. Evaluation results indicate that our implementation achieves significant performance improvements over several existing hardware accelerators.
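The bit-shift BN idea in this abstract can be illustrated with a minimal sketch: the folded BatchNorm scale is rounded to the nearest power of two so the per-channel multiply becomes a pure shift. The function name, rounding rule, and values below are illustrative assumptions, not HardF-SNN's actual formulation.

```python
import numpy as np

def bn_fold_shift(x_int, gamma, var, eps=1e-5):
    """Hedged sketch of integer-only BatchNorm: the folded BN scale
    gamma / sqrt(var + eps) is rounded to the nearest power of two,
    so the per-channel multiply becomes a bit-shift. Names and the
    rounding rule are illustrative, not the paper's exact method."""
    scale = gamma / np.sqrt(var + eps)
    shift = np.round(np.log2(np.abs(scale))).astype(int)   # power-of-two exponent
    shifted = np.where(shift >= 0,
                       x_int << np.maximum(shift, 0),      # scale >= 1: left shift
                       x_int >> np.maximum(-shift, 0))     # scale <  1: right shift
    return shifted * np.sign(scale).astype(int)

# scales of ~2.0 and ~0.5 become shifts by +1 and -1, giving [16, 4]
out = bn_fold_shift(np.array([8, 8]), np.array([2.0, 0.5]), np.array([1.0, 1.0]))
```

The trade-off is that non-power-of-two scales are approximated, which is one reason the paper pairs this with a quantization scheme designed to preserve dynamic range.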

AAAI Conference 2026 Conference Paper

Towards Training-Free and Accurate ANN-to-SNN Conversion via Activation-Aware Redistribution

  • Honglin Cao
  • Shuai Wang
  • Zijian Zhou
  • Ammar Belatreche
  • Wenjie Wei
  • Yu Liang
  • Yu Yang
  • Rui Xi

Conversion represents an effective approach for obtaining low-power models by transforming Artificial Neural Networks (ANNs) into event-driven Spiking Neural Networks (SNNs) without additional training. However, existing training-free conversion methods often incur substantial conversion errors. Here, we first reveal that these conversion errors primarily arise from a distributional mismatch, as the activation distributions of ANNs exhibit channel-wise shifts and scaling, whereas spike rates lack corresponding channel-specific characteristics. To address this limitation, we propose Adaptive Integrate-and-Fire (AIF) neurons with channel-specific thresholds and membrane-potential offsets that dynamically adjust spike rates. These parameters are optimized to jointly minimize conversion errors and maximize information entropy, enabling AIF neurons to capture the activation distribution characteristics of the original ANN. Moreover, AIF neurons can be seamlessly integrated into Transformer architectures with only negligible additional computational cost. Our method achieves state-of-the-art results on multiple vision and natural language processing benchmarks, in particular attaining a notable top-1 accuracy of 85.52% on ImageNet-1K.
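The channel-wise mechanism described above can be sketched with a plain Integrate-and-Fire loop where each channel has its own threshold and membrane-potential offset. All names and values here are illustrative assumptions; the paper additionally optimizes these parameters against conversion error and information entropy.

```python
import numpy as np

def aif_spike_rates(x, theta, delta, T=32):
    """Hedged sketch of channel-wise adaptive Integrate-and-Fire
    conversion: each channel integrates its input plus a channel-specific
    offset delta and fires on crossing a channel-specific threshold theta,
    with soft reset. Parameter names are illustrative, not the paper's."""
    v = np.zeros_like(x, dtype=float)
    spikes = np.zeros_like(x, dtype=float)
    for _ in range(T):
        v += x + delta                        # integrate with per-channel offset
        fired = (v >= theta).astype(float)    # per-channel threshold crossing
        spikes += fired
        v -= fired * theta                    # soft reset preserves residual charge
    return spikes / T                         # per-channel spike rate

# inputs of 0.5 and 0.25 against unit thresholds yield rates 0.5 and 0.25
rates = aif_spike_rates(np.array([0.5, 0.25]), np.ones(2), np.zeros(2))
```

Because theta and delta are per-channel, the spike rates can track channel-wise shifts and scalings in the ANN's activation distribution, which is the mismatch the abstract identifies.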

AAAI Conference 2026 Conference Paper

Training-Free ANN-to-SNN Conversion for High-Performance Spiking Transformers

  • Jingya Wang
  • Xin Deng
  • Wenjie Wei
  • Dehao Zhang
  • Shuai Wang
  • Qian Sun
  • Jieyuan Zhang
  • Hanwen Liu

Leveraging the event-driven paradigm, Spiking Neural Networks (SNNs) offer a promising approach for constructing energy-efficient Transformer architectures. Compared to directly trained Spiking Transformers, ANN-to-SNN conversion methods bypass the high training costs. However, existing methods still suffer from notable limitations, failing to effectively handle nonlinear operations in Transformer architectures and requiring additional fine-tuning processes for pre-trained ANNs. To address these issues, we propose a high-performance and training-free ANN-to-SNN conversion framework tailored for Transformer architectures. Specifically, we introduce a Multi-basis Exponential Decay (MBE) neuron, which employs an exponential decay strategy and multi-basis encoding method to efficiently approximate various nonlinear operations. It removes the requirement for weight modifications in pre-trained ANNs. Extensive experiments across diverse tasks (CV, NLU, NLG) and mainstream Transformer architectures (ViT, RoBERTa, GPT-2) demonstrate that our method achieves near-lossless conversion accuracy with significantly lower latency. This provides a promising pathway for the efficient and scalable deployment of Spiking Transformers in real-world applications.

AAAI Conference 2025 Conference Paper

Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning

  • Yimeng Shan
  • Malu Zhang
  • Rui-jie Zhu
  • Xuerui Qiu
  • Jason K. Eshraghian
  • Haicheng Qu

Recent advancements in neuroscience research have propelled the development of Spiking Neural Networks (SNNs), which not only have the potential to further advance neuroscience research but also serve as an energy-efficient alternative to Artificial Neural Networks (ANNs) due to their spike-driven characteristics. However, previous studies often overlooked multiscale information and its spatiotemporal correlations within event data, leading SNN models to approximate each frame of input events as a static image. We hypothesize that this oversimplification significantly contributes to the performance gap between SNNs and traditional ANNs. To address this issue, we design a Spiking Multiscale Attention (SMA) module that captures multiscale spatiotemporal interaction information. Furthermore, we develop a regularization method named Attention ZoneOut (AZO), which utilizes spatiotemporal attention weights to reduce the model's generalization error through pseudo-ensemble training. Our approach achieves state-of-the-art results on mainstream neuromorphic datasets. Additionally, we reach a performance of 77.1% on the ImageNet-1K dataset using a 104-layer ResNet architecture enhanced with SMA and AZO. This achievement confirms the state-of-the-art performance of SNNs with non-Transformer architectures and underscores the effectiveness of our method in bridging the performance gap between SNN models and traditional ANN models.

IJCAI Conference 2025 Conference Paper

Binary Event-Driven Spiking Transformer

  • Honglin Cao
  • Zijian Zhou
  • Wenjie Wei
  • Yu Liang
  • Ammar Belatreche
  • Dehao Zhang
  • Malu Zhang
  • Yang Yang

Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm that combines the high performance of Transformers with the energy efficiency of SNNs. However, the larger model size and increased computational demands of the Transformer structure limit their practicality in resource-constrained scenarios. In this paper, we integrate binarization techniques into Transformer-based SNNs and propose the Binary Event-Driven Spiking Transformer, i.e., BESTformer. The proposed BESTformer significantly reduces storage and computational demands by representing weights and attention maps with a mere 1 bit. However, BESTformer suffers a severe performance drop relative to its full-precision counterpart due to the limited representational capability of binarization. To address this issue, we propose a Coupled Information Enhancement (CIE) method, which consists of a reversible framework and information enhancement distillation. By maximizing the mutual information between the binary model and its full-precision counterpart, the CIE method effectively mitigates the performance degradation of BESTformer. Extensive experiments on static and neuromorphic datasets demonstrate that our method achieves superior performance to other binary SNNs, showcasing its potential as a compact yet high-performance model for resource-limited edge devices. The repository of this paper is available at https://github.com/CaoHLin/BESTFormer.

NeurIPS Conference 2025 Conference Paper

Bipolar Self-attention for Spiking Transformers

  • Shuai Wang
  • Malu Zhang
  • Jingya Wang
  • Dehao Zhang
  • Yimeng Shan
  • Jieyuan (Eric) Zhang
  • Yichen Xiao
  • Honglin Cao

Harnessing the event-driven characteristic, Spiking Neural Networks (SNNs) present a promising avenue toward energy-efficient Transformer architectures. However, existing Spiking Transformers still suffer from significant performance gaps compared to their Artificial Neural Network counterparts. Through comprehensive analysis, we attribute this gap to two factors. First, the binary nature of spike trains limits Spiking Self-attention (SSA)'s capacity to capture negative–negative and positive–negative membrane potential interactions on Queries and Keys. Second, SSA typically omits Softmax functions to avoid energy-intensive multiply-accumulate operations, thereby failing to maintain row-stochasticity constraints on attention scores. To address these issues, we propose a Bipolar Self-attention (BSA) paradigm that effectively models multi-polar membrane potential interactions with a fully spike-driven characteristic. Specifically, we demonstrate that ternary matrix multiplication provides a closer approximation to real-valued computation in both distribution and local correlation, enabling clear differentiation between homopolar and heteropolar interactions. Moreover, we propose a shift-based Softmax approximation named Shiftmax, which efficiently achieves low-entropy activation and partly maintains row-stochasticity without non-linear operations, enabling precise attention allocation. Extensive experiments show that BSA achieves substantial performance improvements across various tasks, including image classification, semantic segmentation, and event-based tracking. These results establish its potential as a fundamental building block for energy-efficient Spiking Transformers.
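A shift-based softmax approximation of the kind named in this abstract can be sketched by replacing exp(x) with 2^x, which on integer-floored scores is a pure bit-shift in fixed point. This is only an illustration of the general idea under assumed details (Q16 fixed point, floor rounding, and a float division shown for the final normalization); it is not the paper's exact Shiftmax.

```python
import numpy as np

FRAC_BITS = 16  # fixed-point fractional bits (illustrative)

def shiftmax(scores):
    """Hedged sketch of a shift-based softmax: exp(x) is replaced by 2^x,
    so each integer-floored term becomes a single right-shift of a Q16
    constant. Illustrative only; not the paper's exact formulation."""
    # stabilize so exponents are <= 0, then floor to integers
    s = np.floor(scores - scores.max(axis=-1, keepdims=True)).astype(np.int64)
    # 2^s in Q16 fixed point via right shift (exponents clamped to -FRAC_BITS)
    num = np.right_shift(1 << FRAC_BITS, np.minimum(-s, FRAC_BITS))
    return num / num.sum(axis=-1, keepdims=True)

p = shiftmax(np.array([3.0, 1.0, 0.2]))  # peaked on the largest score
```

The attraction is that every "exponential" costs one shift instead of a transcendental evaluation, at the price of a coarser, power-of-two-spaced distribution.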

ICML Conference 2025 Conference Paper

BSO: Binary Spiking Online Optimization Algorithm

  • Yu Liang
  • Yu Yang
  • Wenjie Wei
  • Ammar Belatreche
  • Shuai Wang 0058
  • Malu Zhang
  • Yang Yang 0002

Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead due to latent weight storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. BSO directly updates weights through flip signals under the online training framework. These signals are triggered when the product of gradient momentum and weights exceeds a threshold, eliminating the need for latent weights during training. To enhance performance, we propose T-BSO, a temporal-aware variant that leverages the inherent temporal dynamics of BSNNs by capturing gradient information across time steps for adaptive threshold adjustment. Theoretical analysis establishes convergence guarantees for both BSO and T-BSO, with formal regret bounds characterizing their convergence rates. Extensive experiments demonstrate that both BSO and T-BSO achieve superior optimization performance compared to existing training methods for BSNNs. The code is available at https://github.com/hamingsi/BSO.
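The flip-signal rule stated in this abstract can be sketched in a few lines: maintain a gradient-momentum buffer and flip a binary weight's sign whenever momentum times weight exceeds a threshold, so no full-precision latent weights are stored. The hyperparameters beta and tau below are illustrative assumptions, not the paper's values.

```python
import numpy as np

def bso_step(w, grad, m, beta=0.9, tau=0.05):
    """Hedged sketch of a BSO-style update: binary weights in {-1, +1}
    are flipped when the product of gradient momentum and the weight
    exceeds a threshold tau. beta and tau are illustrative values."""
    m = beta * m + (1 - beta) * grad    # gradient momentum (the only extra state)
    flip = m * w > tau                  # flip signal per weight
    return np.where(flip, -w, w), m    # sign flip replaces a latent-weight update

w = np.array([1.0, -1.0, 1.0, -1.0])
g = np.array([1.0, 1.0, -1.0, -1.0])
w, m = bso_step(w, g, np.zeros(4))     # flips weights 0 and 3
```

The memory saving is direct: only the momentum buffer is kept per weight, versus a latent full-precision weight plus optimizer state in standard binarized training.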

NeurIPS Conference 2025 Conference Paper

Dendritic Resonate-and-Fire Neuron for Effective and Efficient Long Sequence Modeling

  • Dehao Zhang
  • Malu Zhang
  • Shuai Wang
  • Jingya Wang
  • Wenjie Wei
  • Zeyu Ma
  • Guoqing Wang
  • Yang Yang

The explosive growth in sequence length has intensified the demand for effective and efficient long sequence modeling. Benefiting from intrinsic oscillatory membrane dynamics, Resonate-and-Fire (RF) neurons can efficiently extract frequency components from input signals and encode them into spatiotemporal spike trains, making them well-suited for long sequence modeling. However, RF neurons exhibit limited effective memory capacity and a trade-off between energy efficiency and training speed on complex temporal tasks. Inspired by the dendritic structure of biological neurons, we propose a Dendritic Resonate-and-Fire (D-RF) model, which explicitly incorporates a multi-dendritic and soma architecture. Each dendritic branch encodes a specific frequency band by utilizing the intrinsic oscillatory dynamics of RF neurons, so that the branches collectively achieve a comprehensive frequency representation. Furthermore, we introduce an adaptive threshold mechanism into the soma structure. This mechanism adjusts the firing threshold according to historical spiking activity, thereby reducing redundant spikes while maintaining training efficiency in long-sequence tasks. Extensive experiments demonstrate that our method maintains competitive accuracy while keeping spiking sparse, without compromising computational efficiency during training. These results underscore its potential as an effective and efficient solution for long sequence modeling on edge platforms.
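The oscillatory dynamics the abstract builds on are those of the standard Resonate-and-Fire neuron (Izhikevich, 2001), whose complex membrane state obeys u' = (b + i·omega)·u + I, so that each unit resonates at its own frequency omega. The sketch below shows only this base dynamic with illustrative parameter values; the D-RF model additionally runs a bank of such units over dendritic branches with an adaptive somatic threshold.

```python
import numpy as np

def rf_neuron(inputs, omega=2.0, b=-0.1, dt=0.1, theta=1.0):
    """Hedged sketch of Resonate-and-Fire dynamics: a complex membrane
    state u follows u' = (b + i*omega) * u + I (Euler-integrated), and a
    spike is emitted when Im(u) crosses the threshold theta. Parameter
    values are illustrative, not the paper's."""
    u = 0j
    spikes = []
    for I in inputs:
        u = u + dt * ((b + 1j * omega) * u + I)  # damped oscillator step
        if u.imag >= theta:
            spikes.append(1)
            u = 0j                               # reset after a spike
        else:
            spikes.append(0)
    return spikes
```

Because omega tunes which input frequency drives Im(u) across threshold, a population of RF units with different omega values acts as a spike-emitting filter bank, which is what each dendritic branch in D-RF exploits.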

AAAI Conference 2025 Conference Paper

Leveraging Asynchronous Spiking Neural Networks for Ultra Efficient Event-Based Visual Processing

  • DingYi Zeng
  • Yuchen Wang
  • Honglin Cao
  • Wanlong Liu
  • Yichen Xiao
  • Chengzhuo Lu
  • Wenyu Chen
  • Malu Zhang

Event cameras encode visual information by generating asynchronous and sparse event streams, which hold great potential for low latency and low power consumption. Despite many successful implementations of event camera-based applications, most of them accumulate the events into frames and then utilize conventional frame-based computer vision algorithms. These frame-based methods, though typically effective, diminish the inherent advantages of the event camera's low latency and low power consumption. To solve the above problems, we propose ASGCN, which efficiently processes data event by event and dynamically evolves a corresponding representation, enabling low latency and a highly sparse data representation. Sparse computation is further improved by introducing brain-inspired spiking neural networks, resulting in low power consumption for ASGCN. Extensive and diverse experiments demonstrate the energy efficiency and low latency advantages of our processing pipeline. Especially on real-world event camera datasets, our pipeline consumes more than 10,000 times less energy and achieves similar performance compared to current frame-based methods.

ICLR Conference 2025 Conference Paper

QP-SNN: Quantized and Pruned Spiking Neural Networks

  • Wenjie Wei
  • Malu Zhang
  • Zijian Zhou 0005
  • Ammar Belatreche
  • Yimeng Shan
  • Yu Liang
  • Honglin Cao
  • Jieyuan Zhang

Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to encode information and operate in an asynchronous event-driven manner, offering a highly energy-efficient paradigm for machine intelligence. However, the current SNN community focuses primarily on performance improvement by developing large-scale models, which limits the applicability of SNNs in resource-limited edge devices. In this paper, we propose a hardware-friendly and lightweight SNN, aimed at effectively deploying high-performance SNN in resource-limited scenarios. Specifically, we first develop a baseline model that integrates uniform quantization and structured pruning, called QP-SNN baseline. While this baseline significantly reduces storage demands and computational costs, it suffers from performance decline. To address this, we conduct an in-depth analysis of the challenges in quantization and pruning that lead to performance degradation and propose solutions to enhance the baseline's performance. For weight quantization, we propose a weight rescaling strategy that utilizes bit width more effectively to enhance the model's representation capability. For structured pruning, we propose a novel pruning criterion using the singular value of spatiotemporal spike activities to enable more accurate removal of redundant kernels. Extensive experiments demonstrate that integrating two proposed methods into the baseline allows QP-SNN to achieve state-of-the-art performance and efficiency, underscoring its potential for enhancing SNN deployment in edge intelligence computing.

ICLR Conference 2025 Conference Paper

Quantized Spike-driven Transformer

  • Xuerui Qiu
  • Malu Zhang
  • Jieyuan Zhang
  • Wenjie Wei
  • Honglin Cao
  • Junsheng Guo
  • Rui-Jie Zhu 0003
  • Yimeng Shan

Spiking neural networks (SNNs) are emerging as a promising energy-efficient alternative to traditional artificial neural networks (ANNs) due to their spike-driven paradigm. However, recent research in the SNN domain has mainly focused on enhancing accuracy by designing large-scale Transformer structures, which typically rely on substantial computational resources, limiting their deployment on resource-constrained devices. To overcome this challenge, we propose a quantized spike-driven Transformer baseline (QSD-Transformer), which achieves reduced resource demands by utilizing low bit-width parameters. Regrettably, the QSD-Transformer often suffers from severe performance degradation. In this paper, we first conduct empirical analysis and find that the bimodal distribution of quantized spike-driven self-attention (Q-SDSA) leads to spike information distortion (SID) during quantization, causing significant performance degradation. To mitigate this issue, we take inspiration from mutual information entropy and propose a bi-level optimization strategy to rectify the information distribution in Q-SDSA. Specifically, at the lower level, we introduce an information-enhanced LIF neuron to rectify the information distribution in Q-SDSA. At the upper level, we propose a fine-grained distillation scheme for the QSD-Transformer to align the distribution in Q-SDSA with that of the counterpart ANN. By integrating the bi-level optimization strategy, the QSD-Transformer attains enhanced energy efficiency without sacrificing its high-performance advantage. We validate the QSD-Transformer on various visual tasks, and experimental results indicate that our method achieves state-of-the-art results in the SNN domain. For instance, when compared to the prior SNN benchmark on ImageNet, the QSD-Transformer achieves 80.3% top-1 accuracy, accompanied by significant reductions of 6.0× in power consumption and 8.1× in model size. Code is available at https://github.com/bollossom/QSD-Transformer.

NeurIPS Conference 2025 Conference Paper

S²NN: Sub-bit Spiking Neural Networks

  • Wenjie Wei
  • Malu Zhang
  • Jieyuan (Eric) Zhang
  • Ammar Belatreche
  • Shuai Wang
  • Yimeng Shan
  • Hanwen Liu
  • Honglin Cao

Spiking Neural Networks (SNNs) offer an energy-efficient paradigm for machine intelligence, but their continued scaling poses challenges for resource-limited deployment. Despite recent advances in binary SNNs, the storage and computational demands remain substantial for large-scale networks. To further explore the compression and acceleration potential of SNNs, we propose Sub-bit Spiking Neural Networks (S²NNs) that represent weights with less than one bit. Specifically, we first establish an S²NN baseline by leveraging the clustering patterns of kernels in well-trained binary SNNs. This baseline is highly efficient but suffers from outlier-induced codeword selection bias during training. To mitigate this issue, we propose an outlier-aware sub-bit weight quantization (OS-Quant) method, which optimizes codeword selection by identifying and adaptively scaling outliers. Furthermore, we propose a membrane potential-based feature distillation (MPFD) method, improving the performance of the highly compressed S²NN via more precise guidance from a teacher model. Extensive results on vision tasks reveal that S²NN outperforms existing quantized SNNs in both performance and efficiency, making it promising for edge computing applications.

ICLR Conference 2025 Conference Paper

Spiking Vision Transformer with Saccadic Attention

  • Shuai Wang 0058
  • Malu Zhang
  • Dehao Zhang
  • Ammar Belatreche
  • Yichen Xiao
  • Yu Liang
  • Yimeng Shan
  • Qian Sun 0014

The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole-scene understanding through temporal interactions. Building on the SSSA mechanism, we develop an SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.

AAAI Conference 2025 Conference Paper

Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism

  • Yu Liang
  • Wenjie Wei
  • Ammar Belatreche
  • Honglin Cao
  • Zijian Zhou
  • Shuai Wang
  • Malu Zhang
  • Yang Yang

Binary Spiking Neural Networks (BSNNs) inherit the event-driven paradigm of SNNs, while also adopting the reduced storage burden of binarization techniques. These distinct advantages grant BSNNs lightweight and energy-efficient characteristics, rendering them ideal for deployment on resource-constrained edge devices. However, due to the binary synaptic weights and non-differentiable spike function, effectively training BSNNs remains an open question. In this paper, we conduct an in-depth analysis of the key challenge in BSNN learning, namely the frequent weight sign flipping problem. To mitigate this issue, we propose an Adaptive Gradient Modulation Mechanism (AGMM), which is designed to reduce the frequency of weight sign flipping by adaptively adjusting the gradients during the learning process. The proposed AGMM enables BSNNs to achieve faster convergence and higher accuracy, effectively narrowing the gap between BSNNs and their full-precision equivalents. We validate AGMM on both static and neuromorphic datasets, and the results indicate that it achieves state-of-the-art results among BSNNs. This work substantially reduces storage demands and enhances SNNs' inherent energy efficiency, making them highly feasible for resource-constrained environments.

NeurIPS Conference 2025 Conference Paper

Unveiling the Spatial-temporal Effective Receptive Fields of Spiking Neural Networks

  • Jieyuan (Eric) Zhang
  • Xiaolong Zhou
  • Shuai Wang
  • Wenjie Wei
  • Hanwen Liu
  • Qian Sun
  • Malu Zhang
  • Yang Yang

Spiking Neural Networks (SNNs) demonstrate significant potential for energy-efficient neuromorphic computing through an event-driven paradigm. While training methods and computational models have greatly advanced, SNNs struggle to achieve competitive performance in visual long-sequence modeling tasks. In artificial neural networks, the effective receptive field (ERF) serves as a valuable tool for analyzing feature extraction capabilities in visual long-sequence modeling. Inspired by this, we introduce the Spatio-Temporal Effective Receptive Field (ST-ERF) to analyze the ERF distributions across various Transformer-based SNNs. Based on the proposed ST-ERF, we reveal that these models struggle to establish a robust global ST-ERF, thereby limiting their visual feature modeling capabilities. To overcome this issue, we propose two novel channel-mixer architectures: a multi-layer-perceptron-based mixer (MLPixer) and a splash-and-reconstruct block (SRB). These architectures enhance the global spatial ERF across all timesteps in the early network stages of Transformer-based SNNs, improving performance on challenging visual long-sequence modeling tasks. Extensive experiments conducted on the Meta-SDT variants and across object detection and semantic segmentation tasks further validate the effectiveness of our proposed method. Beyond these specific applications, we believe the proposed ST-ERF framework can provide valuable insights for designing and optimizing SNN architectures across a broader range of tasks. The code is available at https://github.com/EricZhang1412/Spatial-temporal-ERF.

AAAI Conference 2024 Conference Paper

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

  • Chen Zhang
  • Luis Fernando D'Haro
  • Yiming Chen
  • Malu Zhang
  • Haizhou Li

Automatic evaluation is an integral aspect of dialogue system research. The traditional reference-based NLG metrics are generally found to be unsuitable for dialogue assessment. Consequently, recent studies have suggested various unique, reference-free neural metrics that better align with human evaluations. Notably among them, large language models (LLMs), particularly the instruction-tuned variants like ChatGPT, are shown to be promising substitutes for human judges. Yet, existing works on utilizing LLMs for automatic dialogue evaluation are limited in their scope in terms of the number of meta-evaluation datasets, mode of evaluation, coverage of LLMs, etc. Hence, it remains inconclusive how effective these LLMs are. To this end, we conduct a comprehensive study on the application of LLMs for automatic dialogue evaluation. Specifically, we analyze the multi-dimensional evaluation capability of 30 recently emerged LLMs at both turn and dialogue levels, using a comprehensive set of 12 meta-evaluation datasets. Additionally, we probe the robustness of the LLMs in handling various adversarial perturbations at both turn and dialogue levels. Finally, we explore how model-level and dimension-level ensembles impact the evaluation performance. All resources are available at https://github.com/e0397123/comp-analysis.

IJCAI Conference 2024 Conference Paper

LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization

  • Qianhui Liu
  • Jiaqi Yan
  • Malu Zhang
  • Gang Pan
  • Haizhou Li

Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient, making them well-suited for low-power edge devices. However, the pursuit of accuracy in current studies leads to large, long-timestep SNNs, conflicting with the resource constraints of these devices. In order to design lightweight and efficient SNNs, we propose a new approach named LitE-SNN that incorporates both spatial and temporal compression into the automated network design process. Spatially, we present a novel Compressive Convolution block (CompConv) to expand the search space to support pruning and mixed-precision quantization. Temporally, we are the first to propose a compressive timestep search to identify the optimal number of timesteps under specific computation cost constraints. Finally, we formulate a joint optimization to simultaneously learn the architecture parameters and spatial-temporal compression strategies to achieve high performance while minimizing memory and computation costs. Experimental results on CIFAR-10, CIFAR-100, and Google Speech Command datasets demonstrate our proposed LitE-SNNs can achieve competitive or even higher accuracy with remarkably smaller model sizes and fewer computation costs.

AAAI Conference 2024 Conference Paper

Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition

  • Jiadong Wang
  • Zexu Pan
  • Malu Zhang
  • Robby T. Tan
  • Haizhou Li

Prior studies on audio-visual speech recognition typically assume the visibility of speaking lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely affecting recognition performance. To address this issue, we propose a framework that restores occluded lips in a video by utilizing both the video itself and the corresponding noisy audio. Specifically, the framework aims to achieve three tasks: detecting occluded frames, masking occluded areas, and reconstructing masked regions. We tackle the first two issues by utilizing the Class Activation Map (CAM) obtained from occluded-frame detection to facilitate the masking of occluded areas. Additionally, we introduce a novel synthesis-matching strategy for the reconstruction to ensure the compatibility of audio features with different levels of occlusion. Our framework is evaluated in terms of Word Error Rate (WER) on the original videos, the videos corrupted by concealed lips, and the videos restored using the framework, with several existing state-of-the-art audio-visual speech recognition methods. Experimental results substantiate that our framework significantly mitigates the performance degradation resulting from lip occlusion. Under -5dB noise conditions, AV-HuBERT's WER increases from 10.62% to 13.87% due to lip occlusion, but recovers to 11.87% with the proposed framework. Furthermore, the framework also demonstrates its capacity to produce natural synthesized images in qualitative assessments.

NeurIPS Conference 2024 Conference Paper

Spike-based Neuromorphic Model for Sound Source Localization

  • Dehao Zhang
  • Shuai Wang
  • Ammar Belatreche
  • Wenjie Wei
  • Yichen Xiao
  • Haorui Zheng
  • Zijian Zhou
  • Malu Zhang

Biological systems possess remarkable sound source localization (SSL) capabilities that are critical for survival in complex environments. This ability arises from the collaboration between the auditory periphery, which encodes sound as precisely timed spikes, and the auditory cortex, which performs spike-based computations. Inspired by these biological mechanisms, we propose a novel neuromorphic SSL framework that integrates spike-based neural encoding and computation. The framework employs Resonate-and-Fire (RF) neurons with a phase-locking coding (RF-PLC) method to achieve energy-efficient audio processing. The RF-PLC method leverages the resonance properties of RF neurons to efficiently convert audio signals to time-frequency representation and encode interaural time difference (ITD) cues into discriminative spike patterns. In addition, biological adaptations like frequency band selectivity and short-term memory effectively filter out many environmental noises, enhancing SSL capabilities in real-world settings. Inspired by these adaptations, we propose a spike-driven multi-auditory attention (MAA) module that significantly improves both the accuracy and robustness of the proposed SSL framework. Extensive experimentation demonstrates that our SSL framework achieves state-of-the-art accuracy in SSL tasks. Furthermore, it shows exceptional noise robustness and maintains high accuracy even at very low signal-to-noise ratios. By mimicking biological hearing, this neuromorphic approach contributes to the development of high-performance and explainable artificial intelligence systems capable of superior performance in real-world environments.

IJCAI Conference 2023 Conference Paper

Spatial-Temporal Self-Attention for Asynchronous Spiking Neural Networks

  • Yuchen Wang
  • Kexin Shi
  • Chengzhuo Lu
  • Yuguo Liu
  • Malu Zhang
  • Hong Qu

The brain-inspired spiking neural networks (SNNs) are receiving increasing attention due to their asynchronous event-driven characteristics and low power consumption. As attention mechanisms have recently become an indispensable part of sequence dependence modeling, the combination of SNNs and attention mechanisms holds great potential for energy-efficient and high-performance computing paradigms. However, existing works cannot benefit from both temporal-wise attention and the asynchronous characteristic of SNNs. To fully leverage the advantages of both SNNs and attention mechanisms, we propose an SNN-based spatial-temporal self-attention (STSA) mechanism, which calculates the feature dependence across the time and space domains without destroying the asynchronous transmission properties of SNNs. To further improve the performance, we also propose a spatial-temporal relative position bias (STRPB) for STSA to account for the spatiotemporal position of spikes. Based on the STSA and STRPB, we construct a spatial-temporal spiking Transformer framework, named STS-Transformer, which is powerful and enables SNNs to work in an asynchronous event-driven manner. Extensive experiments are conducted on popular neuromorphic datasets and speech datasets, including DVS128 Gesture, CIFAR10-DVS, and Google Speech Commands, and our models outperform other state-of-the-art approaches.

AAAI Conference 2023 Conference Paper

Substructure Aware Graph Neural Networks

  • DingYi Zeng
  • Wanlong Liu
  • Wenyu Chen
  • Li Zhou
  • Malu Zhang
  • Hong Qu

Despite the great achievements of Graph Neural Networks (GNNs) in graph learning, conventional GNNs struggle to break through the upper limit on expressiveness set by the first-order Weisfeiler-Leman graph isomorphism test (1-WL), because the propagation paradigm of GNNs is consistent with the 1-WL. Based on the fact that it is easier to distinguish the original graph through its subgraphs, we propose a novel neural network framework called Substructure Aware Graph Neural Networks (SAGNN) to address these issues. We first propose the Cut subgraph, which can be obtained from the original graph by continuously and selectively removing edges. Then we extend the random walk encoding paradigm to the return probability of the rooted node on the subgraph to capture structural information, and use it as a node feature to improve the expressiveness of GNNs. We theoretically prove that our framework is more powerful than 1-WL and is superior in structure perception. Our extensive experiments demonstrate the effectiveness of our framework, achieving state-of-the-art performance on a variety of well-proven graph tasks, and GNNs equipped with our framework perform flawlessly even on graphs where 3-WL fails. Specifically, our framework achieves a maximum performance improvement of 83% compared to the base models and 32% compared to the previous state-of-the-art methods.

IJCAI Conference 2022 Conference Paper

Signed Neuron with Memory: Towards Simple, Accurate and High-Efficient ANN-SNN Conversion

  • Yuchen Wang
  • Malu Zhang
  • Yi Chen
  • Hong Qu

Spiking Neural Networks (SNNs) are receiving increasing attention due to their biological plausibility and the potential for ultra-low-power event-driven neuromorphic hardware implementation. Due to the complex temporal dynamics and discontinuity of spikes, training SNNs directly usually requires substantial computing resources and long training times. As an alternative, an SNN can be converted from a pre-trained artificial neural network (ANN) to bypass the difficulty of SNN learning. However, existing ANN-to-SNN methods neglect the inconsistency of information transmission between synchronous ANNs and asynchronous SNNs. In this work, we first analyze how the asynchronous spikes in SNNs may cause conversion errors between ANN and SNN. To address this problem, we propose a signed neuron with memory function, which enables almost no accuracy loss during the conversion process and maintains the properties of asynchronous transmission in the converted SNNs. We further propose a new normalization method, named neuron-wise normalization, to significantly shorten the inference latency in the converted SNNs. We conduct experiments on challenging datasets including CIFAR10 (95.44% top-1), CIFAR100 (78.3% top-1) and ImageNet (73.16% top-1). Experimental results demonstrate that the proposed method outperforms the state-of-the-art works in terms of accuracy and inference time. The code is available at https://github.com/ppppps/ANN2SNNConversion_SNM_NeuronNorm.
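The premise behind rate-based ANN-to-SNN conversion is that the firing rate of an integrate-and-fire neuron under constant input approximates a ReLU activation. The sketch below illustrates that general principle only; it is not the paper's signed neuron with memory or its neuron-wise normalization:

```python
def if_rate(a, T=1000, v_th=1.0):
    """Firing rate of an integrate-and-fire neuron driven by constant
    input a over T timesteps, using 'soft reset' (subtract threshold),
    which is commonly preferred in conversion because it preserves the
    residual membrane charge instead of discarding it."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += a                # integrate the constant input
        if v >= v_th:
            spikes += 1
            v -= v_th         # soft reset: keep residual charge
    return spikes / T

# For 0 <= a <= 1, the firing rate approximates ReLU(a),
# saturating at one spike per timestep.
```

Conversion errors arise precisely where this approximation breaks down, e.g. when asynchronous spike timing makes the instantaneous rate deviate from the ANN activation, which is the inconsistency the paper's signed neuron is designed to correct.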

NeurIPS Conference 2022 Conference Paper

Training Spiking Neural Networks with Local Tandem Learning

  • Qu Yang
  • Jibin Wu
  • Malu Zhang
  • Yansong Chua
  • Xinchao Wang
  • Haizhou Li

Spiking neural networks (SNNs) are shown to be more biologically plausible and energy efficient than their predecessors. However, there is a lack of an efficient and generalized training method for deep SNNs, especially for deployment on analog computing substrates. In this paper, we put forward a generalized learning rule, termed Local Tandem Learning (LTL). The LTL rule follows the teacher-student learning approach by mimicking the intermediate feature representations of a pre-trained ANN. By decoupling the learning of network layers and leveraging highly informative supervisor signals, we demonstrate rapid network convergence within five training epochs on the CIFAR-10 dataset while maintaining low computational complexity. Our experimental results also show that the SNNs thus trained can achieve accuracies comparable to their teacher ANNs on the CIFAR-10, CIFAR-100, and Tiny ImageNet datasets. Moreover, the proposed LTL rule is hardware friendly. It can be easily implemented on-chip to perform fast parameter calibration and provide robustness against the notorious device non-ideality issues. It therefore opens up a myriad of opportunities for training and deploying SNNs on ultra-low-power mixed-signal neuromorphic computing chips.

AAAI Conference 2021 Conference Paper

Deep Spiking Neural Network with Neural Oscillation and Spike-Phase Information

  • Yi Chen
  • Hong Qu
  • Malu Zhang
  • Yuchen Wang

Deep spiking neural network (DSNN) is a promising computational model towards artificial intelligence. It benefits from both DNNs and SNNs: a hierarchical structure to extract multiple levels of abstraction, and an event-driven computational manner to provide ultra-low-power neuromorphic implementation. However, how to efficiently train DSNNs remains an open question because the non-differentiable spike function prevents the traditional back-propagation (BP) learning algorithm from being directly applied to DSNNs. Here, inspired by findings from biological neural networks, we address the above-mentioned problem by introducing neural oscillation and spike-phase information to DSNNs. Specifically, we propose an Oscillation Postsynaptic Potential (Os-PSP) and a phase-locking activation function, and further put forward a new spiking neuron model, namely the Resonate Spiking Neuron (RSN). Based on the RSN, we propose a Spike-Level-Dependent Back-Propagation (SLDBP) learning algorithm for DSNNs. Experimental results show that the proposed learning algorithm resolves the problems caused by the incompatibility between the BP learning algorithm and SNNs, and achieves state-of-the-art performance among single-spike-based learning algorithms. This work investigates the contribution of introducing biologically inspired mechanisms, such as neural oscillation and spike-phase information, to DSNNs, and provides a new perspective for designing future DSNNs.

ICRA Conference 2021 Conference Paper

GCC-PHAT with Speech-oriented Attention for Robotic Sound Source Localization

  • Jiadong Wang
  • Xinyuan Qian 0001
  • Zihan Pan
  • Malu Zhang
  • Haizhou Li 0001

Robotic audition is a basic sense that helps robots perceive their surroundings and interact with humans. Sound Source Localization (SSL) is an essential module for a robotic system. However, the performance of most sound source localization techniques degrades in noisy and reverberant environments due to inaccurate Time Difference of Arrival (TDoA) estimation. In robotic sound source localization, we are more interested in detecting the arrival of human speech than of other sound sources. Ideally, we expect an effective TDoA estimation to respond only to speech signals while masking off other interferences. In this paper, we propose a novel technique that learns to attend to the speech fundamental frequency and its harmonics while suppressing noise interference and reverberation. The novel TDoA feature is referred to as Generalized Cross Correlation with Phase Transform and Speech Mask (GCC-PHAT-SM). We perform sound source localization experiments on real-world data captured from a robotic platform. Experiments show that the GCC-PHAT-SM feature significantly outperforms the traditional Generalized Cross Correlation (GCC) feature in noisy and reverberant acoustic environments.
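The GCC-PHAT computation that the paper's learned speech mask builds on is standard: whiten the cross-power spectrum so only phase information remains, then pick the lag of the correlation peak. A minimal sketch (function name and sign convention are illustrative, and the learned mask is omitted):

```python
import numpy as np

def gcc_phat(x1, x2, fs=1.0):
    """Estimate the delay of x2 relative to x1 (in samples / fs) via GCC-PHAT."""
    n = len(x1) + len(x2)                   # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                # cross-power spectrum
    cross /= np.abs(cross) + 1e-12          # phase transform: discard magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))  # reorder to lags [-max_shift, +max_shift]
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```

The paper's GCC-PHAT-SM variant would weight the whitened cross-spectrum with a learned speech mask before the inverse transform, so that only speech-dominated frequency bins contribute to the TDoA estimate.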

AAAI Conference 2019 Conference Paper

MPD-AL: An Efficient Membrane Potential Driven Aggregate-Label Learning Algorithm for Spiking Neurons

  • Malu Zhang
  • Jibin Wu
  • Yansong Chua
  • Xiaoling Luo
  • Zihan Pan
  • Dan Liu
  • Haizhou Li

One of the long-standing questions in biology and machine learning is how neural networks may learn important features from input activities with delayed feedback, commonly known as the temporal credit-assignment problem. Aggregate-label learning was proposed to resolve this problem by matching the spike count of a neuron with the magnitude of a feedback signal. However, the existing threshold-driven aggregate-label learning algorithms are computationally intensive, resulting in relatively low learning efficiency and hence limiting their usability in practical applications. To address these limitations, we propose a novel membrane-potential-driven aggregate-label learning algorithm, namely MPD-AL. With this algorithm, the most easily modifiable time instant is identified from the membrane potential traces of the neuron, and synaptic adaptation is guided by the presynaptic neurons' contributions at that time instant. The experimental results demonstrate that the proposed algorithm enables neurons to generate the desired number of spikes, and to detect useful clues embedded within unrelated spiking activities and background noise, with better learning efficiency than the state-of-the-art TDP1 and Multi-Spike Tempotron algorithms. Furthermore, we propose a data-driven dynamic decoding scheme for practical classification tasks for which aggregate labels are hard to define. This scheme effectively improves the classification accuracy of aggregate-label learning algorithms, as demonstrated on a speech recognition task.