Arrow Research search

Author name cluster

Guoqi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

24

TMLR Journal 2026 Journal Article

SpikingBrain: Spiking Brain-inspired Large Models

  • Yuqi Pan
  • Yupeng Feng
  • JingHao Zhuang
  • Siyu Ding
  • Han Xu
  • Zehao Liu
  • Bohan Sun
  • Yuhong Chou

Mainstream Transformer-based large language models (LLMs) face significant efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly. These constraints limit their ability to process long sequences effectively. In addition, building large models on non-NVIDIA computing platforms poses major challenges in achieving stable and efficient training and deployment. To address these issues, we introduce SpikingBrain, a new family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three core aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline compatible with existing LLMs, along with a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to the MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and our training framework supports weeks of stable training on hundreds of MetaX GPUs with Model FLOPs Utilization (MFU) at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using exceptionally low data resources (continual pre-training of approximately 150B tokens). Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B achieves more than 100× speedup in Time to First Token (TTFT) for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15% sparsity, enabling low-power operation. 
Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.

AAAI Conference 2026 Conference Paper

SpikingIR: A Novel Converted Spiking Neural Network for Efficient Image Restoration

  • Yang Ouyang
  • Zihan Cheng
  • Xiaotong Luo
  • Guoqi Li
  • Yanyun Qu

Image restoration has made great progress with the rise of deep learning, but its energy consumption limits its real-world applications. Spiking Neural Networks (SNNs) are seen as energy-efficient alternatives to Artificial Neural Networks (ANNs). Applying SNNs to image restoration (IR) remains challenging, primarily due to the limited information capacity of spike-based signals. This limitation leads to quantization errors and information loss, while IR tasks are highly sensitive to output precision and error. Thus, the restoration performance suffers significantly. To address this challenge, we propose SpikingIR, an ANN-to-SNN conversion framework for IR that reduces information loss and quantization error. SpikingIR mainly consists of two components: Convolutional Pixel Mapping (CPM) and Membrane Potential Reuse Neuron (MPRN), which are designed to alleviate quantization errors and information loss in the output and intermediate layers, respectively. Specifically, CPM maps discrete outputs into a continuous space, better aligning with pixel-level details. From the perspective of information entropy, we show that outputs of CPM contain more information than the original outputs. MPRN introduces a post-processing step with relaxed firing conditions to extract residual membrane potential, reducing information waste. Furthermore, we fine-tune the converted model to jointly optimize both accuracy and energy efficiency. Experimental results demonstrate that SpikingIR achieves performance comparable to ANN counterparts across various IR benchmarks while reducing energy consumption by up to 50%.

AAAI Conference 2025 Conference Paper

BIG-FUSION: Brain-Inspired Global-Local Context Fusion Framework for Multimodal Emotion Recognition in Conversations

  • Yusong Wang
  • Xuanye Fang
  • Huifeng Yin
  • Dongyuan Li
  • Guoqi Li
  • Qi Xu
  • Yi Xu
  • Shuai Zhong

Considering the importance of capturing both global conversational topics and local speaker dependencies for multimodal emotion recognition in conversations, current approaches first utilize sequence models such as the Transformer to extract global context information, then apply Graph Neural Networks to model local speaker dependencies for local context extraction, coupled with Graph Contrastive Learning (GCL) to enhance node representation learning. However, this sequential design introduces potential biases: the extracted global context information inevitably influences subsequent processing, compromising the independence and diversity of the original local features; and current graph augmentation methods in GCL cannot consider both global and local context when evaluating node importance, hindering the learning of key information. Inspired by how the human brain excels at complex tasks by efficiently integrating local and global information processing mechanisms, we propose a sequence-based, aligned global-local context fusion framework to address these problems. The design includes a dual-attention Transformer and a dual-evaluation method for graph augmentation in GCL. The dual-attention Transformer combines global attention for overall context extraction with sliding-window attention for local context capture, both enhanced by spiking neuron dynamics. The dual-evaluation method in GCL comprises a global importance evaluation to identify nodes crucial to the overall conversation context and a local importance evaluation to detect nodes significant for local semantics, generating augmented graph views that preserve both global and local information. This approach ensures balanced information processing throughout the pipeline, enhancing biological plausibility and achieving superior emotion recognition performance.

AAAI Conference 2025 Conference Paper

Efficient 3D Recognition with Event-driven Spike Sparse Convolution

  • Xuerui Qiu
  • Man Yao
  • Jieyuan Zhang
  • Yuhong Chou
  • Ning Qiao
  • Shibo Zhou
  • Bo Xu
  • Guoqi Li

Spiking Neural Networks (SNNs) provide an energy-efficient way to extract 3D spatio-temporal features. Point clouds are sparse 3D spatial data, which suggests that SNNs should be well-suited to processing them. However, when applied to point clouds, SNNs often exhibit limited performance and fewer application scenarios. We attribute this to inappropriate preprocessing and feature extraction methods. To address this issue, we first introduce the Spike Voxel Coding (SVC) scheme, which encodes 3D point clouds into a sparse spike train space, reducing storage requirements and saving time on point cloud preprocessing. Then, we propose a Spike Sparse Convolution (SSC) model for efficiently extracting 3D sparse point cloud features. Combining SVC and SSC, we design an efficient 3D SNN backbone (E-3DSNN) that is friendly to neuromorphic hardware. For instance, SSC can be implemented on neuromorphic chips with only minor modifications to the addressing function of vanilla spike convolution. Experiments on the ModelNet40, KITTI, and Semantic KITTI datasets demonstrate that E-3DSNN achieves state-of-the-art (SOTA) results with remarkable efficiency. Notably, our E-3DSNN (1.87M) obtained 91.7% top-1 accuracy on ModelNet40, surpassing the current best SNN baseline (14.3M) by 3.0%. To the best of our knowledge, it is the first directly trained 3D SNN backbone that can simultaneously handle various 3D computer vision tasks (e.g., classification, detection, and segmentation) in an event-driven manner.

NeurIPS Conference 2025 Conference Paper

MI-TRQR: Mutual Information-Based Temporal Redundancy Quantification and Reduction for Energy-Efficient Spiking Neural Networks

  • Dengfeng Xue
  • Wenjuan Li
  • Yifan Lu
  • Chunfeng Yuan
  • Yufan Liu
  • Wei Liu
  • Man Yao
  • Li Yang

Brain-inspired spiking neural networks (SNNs) provide energy-efficient computation through event-driven processing. However, the weights shared across multiple timesteps lead to serious temporal feature redundancy, limiting both efficiency and performance. This issue is further aggravated when processing static images, due to the duplicated input. To mitigate this problem, we propose a parameter-free, plug-and-play module named Mutual Information-based Temporal Redundancy Quantification and Reduction (MI-TRQR) for constructing energy-efficient SNNs. Specifically, Mutual Information (MI) is introduced to quantify redundancy between discrete spike features at different timesteps on two spatial scales: individual pixels (local) and the entire spatial feature map (global). Based on this multi-scale redundancy quantification, we apply a probabilistic masking strategy to remove redundant spikes. The final representation is subsequently recalibrated to account for the spike removal. Extensive experimental results demonstrate that MI-TRQR concurrently achieves sparser spike firing, higher energy efficiency, and better performance across different SNN architectures on neuromorphic data classification, static data classification, and time-series forecasting tasks. Notably, MI-TRQR increases accuracy by 1.7% on CIFAR10-DVS with 4 timesteps while reducing energy cost by 37.5%. Our code is available at https://github.com/dfxue/MI-TRQR.

AAAI Conference 2025 Conference Paper

Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation

  • Zhenxin Lei
  • Man Yao
  • Jiakui Hu
  • Xinhao Luo
  • Yanye Lu
  • Bo Xu
  • Guoqi Li

Spiking Neural Networks (SNNs) have a low-power advantage but perform poorly in image segmentation tasks. The reason is that directly converting neural networks with the complex architectural designs required for segmentation into spiking versions leads to performance degradation and non-convergence. To address this challenge, we first identify the modules in the architecture design that cause the severe reduction in spike firing, make targeted improvements, and propose the Spike2Former architecture. Second, we propose normalized integer spiking neurons to solve the training stability problem of SNNs with complex architectures. We set a new state of the art for SNNs on various semantic segmentation datasets, with significant improvements of +12.7% mIoU and 5.0x efficiency on ADE20K, +14.3% mIoU and 5.2x efficiency on VOC2012, and +9.1% mIoU and 6.6x efficiency on CityScapes.

ICLR Conference 2025 Conference Paper

SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking

  • Xingrun Xing
  • Boyan Gao
  • Zheng Liu
  • David A. Clifton
  • Shitao Xiao
  • Wanpeng Zhang 0002
  • Li Du
  • Zheng Zhang

Recent advancements in large language models (LLMs) with billions of parameters have improved performance in various applications, but their inference processes demand significant energy and computational resources. In contrast, the human brain, with approximately 86 billion neurons, is much more energy-efficient than LLMs of similar scale. Inspired by this, we redesign 7$\sim$70 billion parameter LLMs using bio-plausible spiking mechanisms, emulating the efficient behavior of the human brain. We propose the first spiking large language model, SpikeLLM. Coupled with the proposed model, two essential approaches are proposed to improve spike training efficiency: Generalized Integrate-and-Fire (GIF) neurons, which compress spike length from $T$ to $\frac{T}{L} \log_2 L$ bits, and an Optimal Brain Spiking framework, which divides outlier channels and allocates different $T$ to GIF neurons, further compressing spike length to approximately $\log_2 T$ bits. The necessity of spike-driven LLMs is demonstrated by comparison with quantized LLMs using similar operations. In the OmniQuant pipeline, SpikeLLM reduces WikiText2 perplexity by 11.01\% and improves common-sense reasoning accuracy by 2.55\% on a LLAMA-7B W4A4 model. In the GPTQ pipeline, SpikeLLM achieves directly additive operations in linear layers, significantly exceeding PB-LLM. Our code is publicly available at https://github.com/Xingrun-Xing2/SpikeLLM.

ICML Conference 2025 Conference Paper

SpikeVideoFormer: An Efficient Spike-Driven Video Transformer with Hamming Attention and O(T) Complexity

  • Shihao Zou
  • Qingfeng Li
  • Wei Ji
  • Jingjing Li
  • Yongkui Yang
  • Guoqi Li
  • Chao Dong

Spiking Neural Networks (SNNs) have shown competitive performance to Artificial Neural Networks (ANNs) in various vision tasks while offering superior energy efficiency. However, existing SNN-based Transformers primarily focus on single-image tasks, emphasizing spatial features while failing to leverage SNNs' efficiency in video-based vision tasks. In this paper, we introduce SpikeVideoFormer, an efficient spike-driven video Transformer featuring linear temporal complexity $\mathcal{O}(T)$. Specifically, we design a spike-driven Hamming attention (SDHA), which provides a theoretically guided adaptation from traditional real-valued attention to spike-driven attention. Building on SDHA, we further analyze various spike-driven space-time attention designs and identify an optimal scheme that delivers appealing performance for video tasks while maintaining only linear temporal complexity. The generalization ability and efficiency of our model are demonstrated across diverse downstream video tasks, including classification, human pose tracking, and semantic segmentation. Empirical results show our method achieves state-of-the-art (SOTA) performance compared to existing SNN approaches, with over 15% improvement on the latter two tasks. Additionally, it matches the performance of recent ANN-based methods while offering significant efficiency gains of $16\times$, $10\times$, and $5\times$ on the three tasks. Code is available at https://github.com/JimmyZou/SpikeVideoFormer.

AAAI Conference 2024 Conference Paper

Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks

  • Xuerui Qiu
  • Rui-jie Zhu
  • Yuhong Chou
  • Zhaorui Wang
  • Liang-Jian Deng
  • Guoqi Li

Spiking neural networks (SNNs) are emerging as an energy-efficient alternative to traditional artificial neural networks (ANNs) due to their unique spike-based, event-driven nature. Coding is crucial in SNNs because it converts external input stimuli into spatio-temporal feature sequences. However, most existing deep SNNs rely on direct coding, which generates weak spike representations and lacks the temporal dynamics inherent in human vision. Hence, we introduce Gated Attention Coding (GAC), a plug-and-play module that leverages a multi-dimensional gated attention unit to efficiently encode inputs into powerful representations before feeding them into the SNN architecture. GAC functions as a preprocessing layer that does not disrupt the spike-driven nature of the SNN, making it amenable to efficient neuromorphic hardware implementation with minimal modifications. Through a theoretical analysis based on an observer model, we demonstrate that GAC's attention mechanism improves temporal dynamics and coding efficiency. Experiments on the CIFAR10/100 and ImageNet datasets demonstrate that GAC achieves state-of-the-art accuracy with remarkable efficiency. Notably, we improve top-1 accuracy by 3.10% on CIFAR100 with only 6 time steps and by 1.07% on ImageNet, while reducing energy usage to 66.9% of previous works. To the best of our knowledge, this is the first work to explore an attention-based dynamic coding scheme in deep SNNs, with exceptional effectiveness and efficiency on large-scale datasets. Code is available at https://github.com/bollossom/GAC.

AAAI Conference 2024 Conference Paper

HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors

  • Xiao Wang
  • Zongzhen Wu
  • Bo Jiang
  • Zhimin Bao
  • Lin Zhu
  • Guoqi Li
  • Yaowei Wang
  • Yonghong Tian

Mainstream human activity recognition (HAR) algorithms are developed for RGB cameras, which usually suffer from illumination changes, fast motion, privacy concerns, and large energy consumption. Meanwhile, biologically inspired event cameras have attracted great interest due to their unique features, such as high dynamic range, dense temporal but sparse spatial resolution, low latency, and low power. As event cameras are newly emerging sensors, there is not yet a realistic large-scale dataset for HAR. Considering their great practical value, in this paper we propose a large-scale benchmark dataset to bridge this gap, termed HARDVS, which contains 300 categories and more than 100K event sequences. We evaluate and report the performance of multiple popular HAR algorithms, providing extensive baselines for future works to compare against. More importantly, we propose a novel spatial-temporal feature learning and fusion framework, termed ESTF, for event-stream-based human activity recognition. It first projects the event streams into spatial and temporal embeddings using StemNet, then encodes and fuses the dual-view representations using Transformer networks. Finally, the dual features are concatenated and fed into a classification head for activity prediction. Extensive experiments on multiple datasets fully validate the effectiveness of our model. Both the dataset and source code will be released at https://github.com/Event-AHU/HARDVS.

NeurIPS Conference 2024 Conference Paper

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

  • Yuhong Chou
  • Man Yao
  • Kexin Wang
  • Yuqi Pan
  • Ruijie Zhu
  • Yiran Zhong
  • Yu Qiao
  • Jibin Wu

Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: (1) Dynamic memory ability; (2) Static approximation ability; (3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.
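As a rough illustration of the "linear attention form" the abstract refers to, the sketch below contrasts the recurrent view with quadratic softmax attention: a single running state replaces the full $T \times T$ attention map. This is a minimal sketch in plain Python with illustrative dimensions; it omits the feature maps, normalization, and decay/gating that real LinFormer/SSM/LinRNN variants (and MetaLA's "dynamic memory ability") add on top.

```python
def linear_attention(Q, K, V):
    """Causal linear attention in its recurrent form:
    out[t] = q_t @ S_t, where S_t = sum over s <= t of outer(k_s, v_s).
    One fixed-size state update per token gives O(T) cost in sequence
    length T, versus O(T^2) for masked softmax attention."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of outer products
    out = []
    for q, k, v in zip(Q, K, V):
        for i in range(d_k):               # S += outer(k, v)
            for j in range(d_v):
                S[i][j] += k[i] * v[j]
        # out_t = q @ S  (per-token work is independent of T)
        out.append([sum(q[i] * S[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out
```

The point of the unification is that LinFormer, SSM, and LinRNN variants all reduce to this state-update form and differ mainly in how the state `S` is decayed or gated between tokens, which this unconditioned sum deliberately omits.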

TMLR Journal 2024 Journal Article

SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks

  • Rui-jie Zhu
  • Qihang Zhao
  • Guoqi Li
  • Jason Eshraghian

As large language models continue to scale in size, so do the computational resources required to run them. Spiking Neural Networks (SNNs) have emerged as an energy-efficient approach to deep learning that leverages sparse, event-driven activations to reduce the computational overhead associated with model inference. While they have become competitive with non-spiking models on many computer vision tasks, SNNs have proven more challenging to train. As a result, their performance lags behind modern deep learning, and until now, SNNs had yet to succeed at language generation on large-scale datasets. In this paper, inspired by the Receptance Weighted Key Value (RWKV) language model, we successfully implement SpikeGPT, a generative language model with binary, event-driven spiking activation units. We train two variants of the proposed model, with 46M and 216M parameters. To the best of our knowledge, SpikeGPT was the largest backpropagation-trained SNN model at the time of release, rendering it suitable for both the generation and comprehension of natural language. We achieve this by modifying the transformer block, replacing multi-head self-attention so as to reduce quadratic computational complexity $\mathcal{O}(T^2)$ to linear complexity $\mathcal{O}(T)$ with increasing sequence length. Input tokens are instead streamed in sequentially to our attention mechanism (as with typical SNNs). Our experiments show that SpikeGPT remains competitive with non-spiking models on the tested benchmarks, while using 32.2$\times$ fewer operations when processed on neuromorphic hardware that can leverage sparse, event-driven activations. Our code implementation is available at https://github.com/ridgerchu/SpikeGPT.

ICML Conference 2024 Conference Paper

SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms

  • Xingrun Xing
  • Zheng Zhang 0006
  • Ziyi Ni
  • Shitao Xiao
  • Yiming Ju
  • Siqi Fan 0001
  • Yequan Wang
  • Jiajun Zhang

Towards energy-efficient artificial intelligence similar to the human brain, bio-inspired spiking neural networks (SNNs) offer the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models have exhibited promising generalization capability, making it valuable to explore more general spike-driven models. However, the binary spikes in existing SNNs fail to encode adequate semantic information, posing technological challenges for generalization. This work proposes the first fully spiking mechanism for general language tasks, including both discriminative and generative ones. Different from previous spikes with 0/1 levels, we propose a more general spike formulation with bi-directional, elastic-amplitude, and elastic-frequency encoding, while still maintaining the additive nature of SNNs. In a single time step, the spike is enhanced with direction and amplitude information; across time steps, we design a strategy to control the spike firing rate. We plug this elastic bi-spiking mechanism into language modeling, yielding SpikeLM. This is the first time general language tasks have been handled with fully spike-driven models, which achieve much higher accuracy than previously possible. SpikeLM also greatly narrows the performance gap between SNNs and ANNs in language modeling. Our code is available at https://github.com/Xingrun-Xing/SpikeLM.

NeurIPS Conference 2024 Conference Paper

Spiking Transformer with Experts Mixture

  • Zhaokun Zhou
  • Yijie Lu
  • Yanhao Jia
  • Kaiwei Che
  • Jun Niu
  • Liwei Huang
  • Xinyu Shi
  • Yuesheng Zhu

Spiking Neural Networks (SNNs) provide a sparse, spike-driven mechanism believed to be critical for energy-efficient deep learning. Mixture-of-Experts (MoE), on the other hand, aligns with the brain's mechanism of distributed and sparse processing, offering an efficient way to enhance model capacity through conditional computation. In this work, we consider how to incorporate SNNs' spike-driven nature and MoE's conditional computation into a unified framework. However, MoE uses softmax to obtain dense conditional weights for each expert and TopK to hard-sparsify the network, which does not fit the properties of SNNs. To address this issue, we reformulate MoE in SNNs and introduce the Spiking Experts Mixture Mechanism (SEMM) from the perspective of sparse spiking activation. Both the experts and the router output spiking sequences, and their element-wise operation makes SEMM computation spike-driven and dynamically sparse-conditional. By developing SEMM into the Spiking Transformer, we propose the Experts Mixture Spiking Attention (EMSA) and the Experts Mixture Spiking Perceptron (EMSP), which perform routing allocation for head-wise and channel-wise spiking experts, respectively. Experiments show that SEMM realizes sparse conditional computation and obtains a stable improvement on neuromorphic and static datasets with comparable computational overhead relative to Spiking Transformer baselines.

NeurIPS Conference 2023 Conference Paper

Spike-driven Transformer

  • Man Yao
  • Jiakui Hu
  • Zhaokun Zhou
  • Li Yuan
  • Yonghong Tian
  • Bo Xu
  • Guoqi Li

Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option due to their unique spike-based, event-driven (i.e., spike-driven) paradigm. In this paper, we incorporate the spike-driven paradigm into the Transformer with the proposed Spike-driven Transformer, which has four unique properties: (1) Event-driven: no calculation is triggered when the input of the Transformer is zero; (2) Binary spike communication: all matrix multiplications associated with the spike matrix can be transformed into sparse additions; (3) Self-attention with linear complexity in both the token and channel dimensions; (4) The operations between spike-form Query, Key, and Value are mask and addition. Together, there are only sparse addition operations in the Spike-driven Transformer. To this end, we design a novel Spike-Driven Self-Attention (SDSA), which exploits only mask and addition operations without any multiplication and thus has up to $87.2\times$ lower computation energy than vanilla self-attention. In SDSA especially, the matrix multiplication between Query, Key, and Value is designed as a mask operation. In addition, we rearrange all residual connections in the vanilla Transformer before the activation functions to ensure that all neurons transmit binary spike signals. We show that the Spike-driven Transformer can achieve 77.1% top-1 accuracy on ImageNet-1K, the state-of-the-art result in the SNN field.

JBHI Journal 2022 Journal Article

Brain-Controlled 2D Navigation Robot Based on a Spatial Gradient Controller and Predictive Environmental Coordinator

  • Deyu Zhang
  • Siyu Liu
  • Jian Zhang
  • Guoqi Li
  • Dingjie Suo
  • Tiantian Liu
  • Jiawei Luo
  • Zhiyuan Ming

Objective: Brain-computer interfaces (BCIs) have been used in two-dimensional (2D) navigation robotic devices, such as brain-controlled wheelchairs and brain-controlled vehicles. However, contemporary BCI systems are driven by binary selective control: only directional information can be transferred from human to machine, such as "turn left" or "turn right", meaning that quantified values, such as the radius of gyration, cannot be controlled. In this study, we propose a spatial gradient BCI controller and a corresponding environment coordinator, by which quantified brain commands can be transferred in the form of a 2D vector, improving the flexibility, stability, and efficiency of BCIs. Methods: A horizontal array of steady-state visual stimulation was arranged to elicit subjects' electroencephalogram (EEG) signals. Covariance arrays between the subjects' EEG and stimulation features were mapped into quantified 2D vectors. The generated vectors were then input to the predictive controller and fused, via vector calculation, with virtual forces generated by the robot's predictive environment coordinator. The resultant vector was interpreted as the driving force for the robot, and real-time speed feedback was generated. Results: The proposed spatial gradient controller (SGC) produced a faster response (27.4 s vs. 34.9 s) in the single-obstacle avoidance task than the selective control approach. In practical multi-obstacle tasks, the proposed robot executed target-reaching tasks 39% faster than the selective controller and showed better robustness in multi-obstacle avoidance tasks (average failures dropped significantly, from 27% to 4%). Significance: This research proposes a new form of brain-machine shared control strategy that quantifies brain commands as a 2D control vector stream rather than as selective constant values. Combined with a predictive environment coordinator, the brain-controlled strategy of the robot is optimized and given higher flexibility. The proposed controller can be used in brain-controlled 2D navigation devices, such as brain-controlled wheelchairs and vehicles.

IJCAI Conference 2022 Conference Paper

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective

  • Xin Liu
  • Mingyu Yan
  • Lei Deng
  • Guoqi Li
  • Xiaochun Ye
  • Dongrui Fan
  • Shirui Pan
  • Yuan Xie

Graph neural networks (GNNs) have been a hot topic of recent research and are widely utilized in diverse applications. However, with ever-larger data and deeper models, there is an urgent demand to accelerate GNNs for more efficient execution. In this paper, we provide a comprehensive survey of acceleration methods for GNNs from an algorithmic perspective. We first present a new taxonomy that classifies existing acceleration methods into five categories. Based on this classification, we systematically discuss these methods and highlight their correlations. Next, we compare the methods in terms of their efficiency and characteristics. Finally, we suggest some promising prospects for future research.

IJCAI Conference 2021 Conference Paper

Exploiting Spiking Dynamics with Spatial-temporal Feature Normalization in Graph Learning

  • Mingkun Xu
  • YuJie Wu
  • Lei Deng
  • Faqiang Liu
  • Guoqi Li
  • Jing Pei

Biological spiking neurons with intrinsic dynamics underlie the brain's powerful representation and learning capabilities for processing multimodal information in complex environments. Despite recent tremendous progress in spiking neural networks (SNNs) for handling Euclidean-space tasks, it remains challenging to exploit SNNs for processing non-Euclidean-space data represented as graphs, mainly due to the lack of effective modeling frameworks and useful training techniques. Here we present a general spike-based modeling framework that enables the direct training of SNNs for graph learning. Through spatial-temporal unfolding of the spiking data flows of node features, we incorporate graph convolution filters into the spiking dynamics and formalize a synergistic learning paradigm. Considering the unique features of spike representation and spiking dynamics, we propose a spatial-temporal feature normalization (STFN) technique suitable for SNNs to accelerate convergence. We instantiate our methods in two spiking graph models, graph convolution SNNs and graph attention SNNs, and validate their performance on three node-classification benchmarks: Cora, Citeseer, and Pubmed. Our models achieve performance comparable to state-of-the-art graph neural network (GNN) models at much lower computational cost, demonstrating great benefits for execution on neuromorphic hardware and promoting neuromorphic applications in graph scenarios.

AAAI Conference 2021 Conference Paper

Going Deeper With Directly-Trained Larger Spiking Neural Networks

  • Hanle Zheng
  • YuJie Wu
  • Lei Deng
  • Yifan Hu
  • Guoqi Li

Spiking neural networks (SNNs) are promising for bio-plausible coding of spatio-temporal information and event-driven signal processing, which is well suited for energy-efficient implementation in neuromorphic hardware. However, the unique working mode of SNNs makes them more difficult to train than traditional networks. Currently, there are two main routes to training deep SNNs with high performance. The first is to convert a pre-trained ANN model to its SNN version, which usually requires a long coding window for convergence and cannot exploit spatio-temporal features during training to solve temporal tasks. The other is to directly train SNNs in the spatio-temporal domain. However, due to the binary spike activity of the firing function and the problem of gradient vanishing or explosion, current methods are restricted to shallow architectures and therefore have difficulty harnessing large-scale datasets (e.g., ImageNet). To this end, we propose a threshold-dependent batch normalization (tdBN) method based on the emerging spatio-temporal backpropagation, termed "STBP-tdBN", enabling direct training of very deep SNNs and efficient implementation of their inference on neuromorphic hardware. With the proposed method and elaborated shortcut connections, we significantly extend directly-trained SNNs from shallow structures (<10 layers) to very deep structures (50 layers). Furthermore, we theoretically analyze the effectiveness of our method based on the "Block Dynamical Isometry" theory. Finally, we report superior accuracy results, including 93.15% on CIFAR-10, 67.8% on DVS-CIFAR10, and 67.05% on ImageNet with very few timesteps. To the best of our knowledge, this is the first work to explore directly-trained deep SNNs with high performance on ImageNet. We believe this work will pave the way for fully exploiting the advantages of SNNs and attract more researchers to contribute to this field.

NeurIPS Conference 2020 Conference Paper

Restoring Negative Information in Few-Shot Object Detection

  • Yukuan Yang
  • Fangyun Wei
  • Miaojing Shi
  • Guoqi Li

Few-shot learning has recently emerged as a new challenge in the deep learning field: unlike conventional methods that train deep neural networks (DNNs) on a large number of labeled data, it asks DNNs to generalize to new classes with few annotated samples. Recent advances in few-shot learning mainly focus on image classification, while in this paper we focus on object detection. Initial explorations in few-shot object detection tend to simulate a classification scenario by using the positive proposals in images with respect to a certain object class while discarding the negative proposals of that class. Negatives, especially hard negatives, however, are essential to embedding-space learning in few-shot object detection. In this paper, we restore the negative information in few-shot object detection by introducing a new negative- and positive-representative based metric learning framework and a new inference scheme with negative and positive representatives. We build our work on the recent few-shot pipeline RepMet with several new modules to encode negative information for both training and testing. Extensive experiments on ImageNet-LOC and PASCAL VOC show our method substantially improves the state-of-the-art few-shot object detection solutions. Our code is available at https://github.com/yang-yk/NP-RepMet.
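The inference scheme with negative and positive representatives can be pictured as a distance comparison in embedding space. A hypothetical sketch — the function name, the `d_neg - d_pos` scoring rule, and the nearest-representative choice are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def representative_score(embedding, pos_reps, neg_reps):
    """Hedged sketch of representative-based scoring in the spirit of
    NP-RepMet: a proposal embedding is scored for a class by its distance
    to the nearest positive representative, penalized by closeness to the
    nearest negative representative. The exact scoring rule is assumed.

    embedding: (D,) proposal embedding.
    pos_reps, neg_reps: (K, D) class representatives.
    """
    d_pos = np.min(np.linalg.norm(pos_reps - embedding, axis=1))
    d_neg = np.min(np.linalg.norm(neg_reps - embedding, axis=1))
    return d_neg - d_pos  # higher when near positives and far from negatives
```

The point of retaining `neg_reps` is exactly what the abstract argues: without them, a proposal that lies close to a hard negative would be scored the same as a clean positive.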

AAAI Conference 2019 Conference Paper

Direct Training for Spiking Neural Networks: Faster, Larger, Better

  • YuJie Wu
  • Lei Deng
  • Guoqi Li
  • Jun Zhu
  • Yuan Xie
  • Luping Shi

Spiking neural networks (SNNs), which enable energy-efficient implementation on emerging neuromorphic hardware, are gaining more attention. Yet SNNs have not shown performance competitive with artificial neural networks (ANNs), due to the lack of effective learning algorithms and efficient programming frameworks. We address this issue from two aspects: (1) we propose a neuron normalization technique to adjust neural selectivity and develop a direct learning algorithm for deep SNNs; (2) by narrowing the rate coding window and converting the leaky integrate-and-fire (LIF) model into an explicitly iterative version, we present a PyTorch-based implementation method for training large-scale SNNs. In this way, we are able to train deep SNNs with a tens-of-times speedup. As a result, we achieve significantly better accuracy than reported works on neuromorphic datasets (N-MNIST and DVS-CIFAR10), and accuracy comparable to existing ANNs and pre-trained SNNs on non-spiking datasets (CIFAR10). To the best of our knowledge, this is the first work that demonstrates direct training of deep SNNs with high performance on CIFAR10, and the efficient implementation provides a new way to explore the potential of SNNs.
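The explicitly iterative LIF formulation mentioned in the abstract unrolls the neuron's dynamics step by step, which is what makes framework-based (e.g., PyTorch) training tractable. A minimal sketch, assuming a decay factor `tau` and a hard reset after firing — both common modeling choices, not specifics taken from the paper:

```python
import numpy as np

def iterative_lif(inputs, tau=0.5, v_th=1.0):
    """Explicitly iterative leaky integrate-and-fire (LIF) neuron: the
    membrane potential decays by tau, accumulates the input current, and
    emits a binary spike with a hard reset when it crosses v_th.
    tau, v_th, and the hard-reset rule are illustrative assumptions.

    inputs: array of shape (T,) -- input current per time step.
    Returns the binary spike train of shape (T,).
    """
    v = 0.0
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = tau * v + x          # leak, then integrate
        if v >= v_th:
            spikes[t] = 1.0      # fire a binary spike
            v = 0.0              # hard reset
    return spikes
```

In the actual training pipeline each step of this loop becomes a differentiable graph node (with a surrogate gradient for the threshold function), so standard backpropagation machinery applies.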

NeurIPS Conference 2018 Conference Paper

HitNet: Hybrid Ternary Recurrent Neural Network

  • Peiqi Wang
  • Xinfeng Xie
  • Lei Deng
  • Guoqi Li
  • Dongsheng Wang
  • Yuan Xie

Quantization is a promising technique to reduce the model size, memory footprint, and massive computation operations of recurrent neural networks (RNNs) for embedded devices with limited resources. Although extreme low-bit quantization has achieved impressive success on convolutional neural networks, it still suffers from large accuracy degradation on RNNs at the same low-bit precision. In this paper, we first investigate the accuracy degradation of RNN models under different quantization schemes, and the distribution of tensor values in the full-precision model. Our observation reveals that, due to the difference between the distributions of weights and activations, different quantization methods are suitable for different parts of models. Based on this observation, we propose HitNet, a hybrid ternary recurrent neural network, which bridges the accuracy gap between the full-precision model and the quantized model. In HitNet, we develop a hybrid quantization method to quantize weights and activations. Moreover, we introduce a sloping factor, motivated by prior work on the Boltzmann machine, into the activation functions, further closing the accuracy gap between the full-precision model and the quantized model. Overall, HitNet quantizes RNN models into ternary values, {-1, 0, 1}, significantly outperforming the state-of-the-art quantization methods on RNN models. We test it on typical RNN models such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU). For example, we improve the perplexity per word (PPW) of a ternary LSTM on the Penn Tree Bank (PTB) corpus from 126 (the state-of-the-art result to the best of our knowledge) to 110.3, with the full-precision model at 97.2, and that of a ternary GRU from 142 to 113.5, with the full-precision model at 102.7.
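Ternarization to {-1, 0, 1} can be sketched as a magnitude threshold. The fixed threshold below is an illustrative assumption — HitNet's hybrid scheme instead adapts the quantizer to the observed weight and activation distributions:

```python
import numpy as np

def ternarize(w, threshold=0.3):
    """Illustrative ternary quantization to {-1, 0, 1}: values whose
    magnitude falls below a threshold map to 0, the rest to their sign.
    The fixed threshold is an assumption for the sketch; it is not the
    distribution-aware rule the paper develops.
    """
    q = np.zeros_like(w)
    q[w > threshold] = 1.0
    q[w < -threshold] = -1.0
    return q
```

With only three weight values, multiply-accumulate operations reduce to additions and subtractions, which is where the memory and compute savings on embedded devices come from.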

ICLR Conference 2018 Conference Paper

Training and Inference with Integers in Deep Neural Networks

  • Shuang Wu
  • Guoqi Li
  • Feng Chen
  • Luping Shi

Research on deep neural networks with discrete parameters and their deployment in embedded systems has been an active and promising topic. Although previous works have successfully reduced precision in inference, transferring both training and inference to low-bitwidth integers has not been demonstrated simultaneously. In this work, we develop a new method, termed "WAGE", to discretize both training and inference, where weights (W), activations (A), gradients (G), and errors (E) among layers are shifted and linearly constrained to low-bitwidth integers. To achieve a purely discrete dataflow for fixed-point devices, we further replace batch normalization with a constant scaling layer and simplify other components that are arduous for integer implementation. Improved accuracies can be obtained on multiple datasets, which indicates that WAGE somehow acts as a form of regularization. Empirically, we demonstrate the potential to deploy training in hardware systems such as integer-based deep learning accelerators and neuromorphic chips with comparable accuracy and higher energy efficiency, which is crucial for future AI applications in varied scenarios with transfer and continual learning demands.
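Linearly constraining values to low-bitwidth integers can be sketched as clip-and-round onto a uniform grid. The symmetric range below is an assumption for illustration; WAGE additionally applies layer-wise shift-based scaling, which this sketch omits:

```python
import numpy as np

def quantize_k(x, bits=8):
    """Sketch of linear low-bitwidth quantization in the spirit of WAGE:
    values are scaled, rounded, and clipped so the result is exactly
    representable as a signed `bits`-bit fixed-point number. The
    symmetric clipping range is an illustrative assumption; the paper's
    scheme also shifts values layer-wise before constraining them.
    """
    scale = 2.0 ** (bits - 1)                       # integer grid resolution
    return np.clip(np.round(x * scale), -scale, scale - 1) / scale
```

Because every quantized value is an integer multiple of `1 / 2**(bits-1)`, all four dataflows (W, A, G, E) stay on a fixed-point grid and never require floating-point arithmetic.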