Arrow Research search

Author name cluster

Yu Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

JBHI Journal 2026 Journal Article

CLDAE: A Two Stage EEG-based Emotion Recognition Framework Combining Contrastive Learning and Dual-Attention Encoder

  • Rongqi Cao
  • Jian He
  • Yu Liang
  • Xiyuan Hu
  • Tianhao Peng
  • Wenjun Wu
  • Shuang Niu
  • Shahid Mumtaz

Electroencephalogram (EEG)-based emotion recognition systems face a persistent challenge in maintaining robust performance across subjects (generalization) and within subjects (personalization). Existing models for cross-subject recognition generally struggle to adapt to individual-specific neural signatures, while models with optimized within-subject performance typically require a large amount of personalized data. To address these limitations, this study proposes an EEG-based emotion recognition framework, CLDAE, that integrates a contrastive learning strategy and a dual-attention feature extraction mechanism. The CLDAE framework includes two stages: contrastive learning pre-training and emotion recognition fine-tuning. During the pre-training stage, a data augmentation method that combines EEG signals from different subjects is used to generate new training samples. Moreover, to extract discriminative features from the augmented data, the dual-attention encoder combines temporal and channel attention mechanisms. After pre-training, the CLDAE is fine-tuned for the final recognition tasks. The proposed CLDAE is verified by experiments on two public datasets (DEAP and SEED-IV) and a private dataset (MAN). The experimental results demonstrate that CLDAE achieves competitive performance in both within-subject and cross-subject emotion recognition, with 95.12% and 75.29% accuracy on the MAN dataset, respectively, thus outperforming the baseline methods. These results validate the effectiveness of the proposed framework in both within-subject and cross-subject emotion recognition.
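The cross-subject augmentation described in the pre-training stage can be sketched as a mixup-style combination of two subjects' EEG segments. This is only an illustrative guess at the mechanism: the abstract does not specify the exact combination rule, and the function name, the beta-distribution mixing coefficient, and the `alpha` default are all assumptions.

```python
import numpy as np

def cross_subject_mix(x_a, x_b, alpha=0.2, rng=None):
    """Illustrative cross-subject EEG augmentation (mixup-style sketch;
    the paper's exact combination rule is not given in the abstract).

    x_a, x_b: EEG segments of shape (channels, time) from two subjects.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # random mixing coefficient in (0, 1)
    return lam * x_a + (1 - lam) * x_b    # convex combination of the two segments
```

A contrastive objective would then treat two such mixtures sharing a source segment as a positive pair.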

AAAI Conference 2026 Conference Paper

Towards Training-Free and Accurate ANN-to-SNN Conversion via Activation-Aware Redistribution

  • Honglin Cao
  • Shuai Wang
  • Zijian Zhou
  • Ammar Belatreche
  • Wenjie Wei
  • Yu Liang
  • Yu Yang
  • Rui Xi

Conversion represents an effective approach for obtaining low-power models by transforming Artificial Neural Networks (ANNs) into event-driven Spiking Neural Networks (SNNs) without additional training. However, existing training-free conversion methods often incur substantial conversion errors. Here, we first reveal that these conversion errors primarily arise from a distributional mismatch, as the activation distributions of ANNs exhibit channel-wise shifts and scaling, whereas spike rates lack corresponding channel-specific characteristics. To address this limitation, we propose Adaptive Integrate-and-Fire (AIF) neurons with channel-specific thresholds and membrane-potential offsets that dynamically adjust spike rates. These parameters are optimized to jointly minimize conversion errors and maximize information entropy, enabling AIF neurons to capture the activation distribution characteristics of the original ANN. Moreover, AIF neurons can be seamlessly integrated into Transformer architectures with only negligible additional computational cost. Our method achieves state-of-the-art results on multiple vision and natural language processing benchmarks, in particular attaining a notable top-1 accuracy of 85.52% on ImageNet-1K.
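The Adaptive Integrate-and-Fire idea above, i.e. channel-specific thresholds and membrane-potential offsets shaping per-channel spike rates, can be sketched as follows. This is a minimal simulation under stated assumptions: the parameter names `theta`/`delta`, the soft-reset rule, and the constant-input loop are illustrative, not the paper's exact formulation or its entropy-based optimization.

```python
import numpy as np

def aif_forward(x, theta, delta, T=4):
    """Illustrative Adaptive Integrate-and-Fire simulation.

    x:     (C,) input current per channel, applied at every timestep.
    theta: (C,) channel-specific firing thresholds.
    delta: (C,) channel-specific membrane-potential offsets.
    Returns the per-channel spike rate over T timesteps.
    """
    v = delta.copy()                        # membrane potential starts at the offset
    spikes = np.zeros_like(x)
    for _ in range(T):
        v = v + x                           # integrate the input current
        fired = v >= theta                  # channel fires when its threshold is reached
        spikes += fired
        v = np.where(fired, v - theta, v)   # soft reset: subtract the threshold
    return spikes / T
```

Per-channel `theta` and `delta` let the spike rate track channel-wise shifts and scales in the ANN's activation distribution, which is the mismatch the paper identifies.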

IJCAI Conference 2025 Conference Paper

Binary Event-Driven Spiking Transformer

  • Honglin Cao
  • Zijian Zhou
  • Wenjie Wei
  • Yu Liang
  • Ammar Belatreche
  • Dehao Zhang
  • Malu Zhang
  • Yang Yang

Transformer-based Spiking Neural Networks (SNNs) introduce a novel event-driven self-attention paradigm that combines the high performance of Transformers with the energy efficiency of SNNs. However, the larger model size and increased computational demands of the Transformer structure limit their practicality in resource-constrained scenarios. In this paper, we integrate binarization techniques into Transformer-based SNNs and propose the Binary Event-Driven Spiking Transformer, i.e., BESTformer. The proposed BESTformer can significantly reduce storage and computational demands by representing weights and attention maps with a mere 1 bit. However, BESTformer suffers from a severe performance drop from its full-precision counterpart due to the limited representation capability of binarization. To address this issue, we propose a Coupled Information Enhancement (CIE) method, which consists of a reversible framework and information enhancement distillation. By maximizing the mutual information between the binary model and its full-precision counterpart, the CIE method effectively mitigates the performance degradation of the BESTformer. Extensive experiments on static and neuromorphic datasets demonstrate that our method achieves superior performance to other binary SNNs, showcasing its potential as a compact yet high-performance model for resource-limited edge devices. The repository of this paper is available at https://github.com/CaoHLin/BESTFormer.

ICML Conference 2025 Conference Paper

BSO: Binary Spiking Online Optimization Algorithm

  • Yu Liang
  • Yu Yang
  • Wenjie Wei
  • Ammar Belatreche
  • Shuai Wang 0058
  • Malu Zhang
  • Yang Yang 0002

Binary Spiking Neural Networks (BSNNs) offer promising efficiency advantages for resource-constrained computing. However, their training algorithms often require substantial memory overhead due to latent weight storage and temporal processing requirements. To address this issue, we propose the Binary Spiking Online (BSO) optimization algorithm, a novel online training algorithm that significantly reduces training memory. BSO directly updates weights through flip signals under the online training framework. These signals are triggered when the product of the gradient momentum and the weights exceeds a threshold, eliminating the need for latent weights during training. To enhance performance, we propose T-BSO, a temporal-aware variant that leverages the inherent temporal dynamics of BSNNs by capturing gradient information across time steps for adaptive threshold adjustment. Theoretical analysis establishes convergence guarantees for both BSO and T-BSO, with formal regret bounds characterizing their convergence rates. Extensive experiments demonstrate that both BSO and T-BSO achieve superior optimization performance compared to existing training methods for BSNNs. The code is available at https://github.com/hamingsi/BSO.
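The flip-signal rule described in the abstract, i.e. flip a binary weight when the product of its gradient momentum and the weight exceeds a threshold, can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions: the function name, the momentum form, and the `beta`/`tau` defaults are assumptions, and T-BSO's temporal threshold adaptation is not reproduced.

```python
import numpy as np

def bso_step(weights, grad, momentum, beta=0.9, tau=0.05):
    """One illustrative BSO-style update. Weights stay binary (+1/-1);
    no latent full-precision copy is kept."""
    # Accumulate gradient momentum.
    momentum = beta * momentum + (1 - beta) * grad
    # Flip signal: fires where momentum * weight exceeds the threshold,
    # i.e. the gradient consistently pushes against the current sign.
    flip = (momentum * weights) > tau
    weights = np.where(flip, -weights, weights)
    return weights, momentum
```

Because only the sign pattern and a momentum buffer are stored, memory stays independent of any full-precision latent weights, which is the point of the method.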

ICLR Conference 2025 Conference Paper

QP-SNN: Quantized and Pruned Spiking Neural Networks

  • Wenjie Wei
  • Malu Zhang
  • Zijian Zhou 0005
  • Ammar Belatreche
  • Yimeng Shan
  • Yu Liang
  • Honglin Cao
  • Jieyuan Zhang

Brain-inspired Spiking Neural Networks (SNNs) leverage sparse spikes to encode information and operate in an asynchronous event-driven manner, offering a highly energy-efficient paradigm for machine intelligence. However, the current SNN community focuses primarily on performance improvement by developing large-scale models, which limits the applicability of SNNs in resource-limited edge devices. In this paper, we propose a hardware-friendly and lightweight SNN, aimed at effectively deploying high-performance SNNs in resource-limited scenarios. Specifically, we first develop a baseline model that integrates uniform quantization and structured pruning, called the QP-SNN baseline. While this baseline significantly reduces storage demands and computational costs, it suffers from performance decline. To address this, we conduct an in-depth analysis of the challenges in quantization and pruning that lead to performance degradation and propose solutions to enhance the baseline's performance. For weight quantization, we propose a weight rescaling strategy that utilizes bit width more effectively to enhance the model's representation capability. For structured pruning, we propose a novel pruning criterion using the singular value of spatiotemporal spike activities to enable more accurate removal of redundant kernels. Extensive experiments demonstrate that integrating the two proposed methods into the baseline allows QP-SNN to achieve state-of-the-art performance and efficiency, underscoring its potential for enhancing SNN deployment in edge intelligence computing.

ICLR Conference 2025 Conference Paper

Spiking Vision Transformer with Saccadic Attention

  • Shuai Wang 0058
  • Malu Zhang
  • Dehao Zhang
  • Ammar Belatreche
  • Yichen Xiao
  • Yu Liang
  • Yimeng Shan
  • Qian Sun 0014

The combination of Spiking Neural Networks (SNNs) and Vision Transformers (ViTs) holds potential for achieving both energy efficiency and high performance, particularly suitable for edge vision applications. However, a significant performance gap still exists between SNN-based ViTs and their ANN counterparts. Here, we first analyze why SNN-based ViTs suffer from limited performance and identify a mismatch between the vanilla self-attention mechanism and spatio-temporal spike trains. This mismatch results in degraded spatial relevance and limited temporal interactions. To address these issues, we draw inspiration from biological saccadic attention mechanisms and introduce an innovative Saccadic Spike Self-Attention (SSSA) method. Specifically, in the spatial domain, SSSA employs a novel spike distribution-based method to effectively assess the relevance between Query and Key pairs in SNN-based ViTs. Temporally, SSSA employs a saccadic interaction module that dynamically focuses on selected visual areas at each timestep and significantly enhances whole-scene understanding through temporal interactions. Building on the SSSA mechanism, we develop an SNN-based Vision Transformer (SNN-ViT). Extensive experiments across various visual tasks demonstrate that SNN-ViT achieves state-of-the-art performance with linear computational complexity. The effectiveness and efficiency of the SNN-ViT highlight its potential for power-critical edge vision applications.

AAAI Conference 2025 Conference Paper

Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism

  • Yu Liang
  • Wenjie Wei
  • Ammar Belatreche
  • Honglin Cao
  • Zijian Zhou
  • Shuai Wang
  • Malu Zhang
  • Yang Yang

Binary Spiking Neural Networks (BSNNs) inherit the event-driven paradigm of SNNs, while also adopting the reduced storage burden of binarization techniques. These distinct advantages grant BSNNs lightweight and energy-efficient characteristics, rendering them ideal for deployment on resource-constrained edge devices. However, due to the binary synaptic weights and non-differentiable spike function, effectively training BSNNs remains an open question. In this paper, we conduct an in-depth analysis of the challenge for BSNN learning, namely the frequent weight sign flipping problem. To mitigate this issue, we propose an Adaptive Gradient Modulation Mechanism (AGMM), which is designed to reduce the frequency of weight sign flipping by adaptively adjusting the gradients during the learning process. The proposed AGMM can enable BSNNs to achieve faster convergence speed and higher accuracy, effectively narrowing the gap between BSNNs and their full-precision equivalents. We validate AGMM on both static and neuromorphic datasets, and results indicate that it achieves state-of-the-art results among BSNNs. This work substantially reduces storage demands and enhances SNNs' inherent energy efficiency, making them highly feasible for resource-constrained environments.

EAAI Journal 2024 Journal Article

Autonomous surface crack identification for concrete structures based on the you only look once version 5 algorithm

  • Yu Liang
  • Sai Li
  • Guanting Ye
  • Qing Jiang
  • Qiang Jin
  • Yifei Mao

Failure to repair roads in a timely manner may shorten their life and even cause traffic accidents. Thus, accurate crack detection and reasonable classification are crucial for road safety evaluation. In this study, an improved network model based on the You Only Look Once version 5 algorithm is presented, with three additional modules: the first module improves the data processing speed by replacing the C3 module in the original network with a lightweight network model; the second module lightens the network weight by reusing a simple convolution structure to equivalently represent the calculation of a convolution layer as a weighted sum of several small convolution blocks; and the third module improves the detection accuracy by removing the upsampling and performing three-way splicing. The proposed model can detect different types of cracks, and an extensive ablation study is reported based on various combinations of the proposed modules. Based on training on a database of 5484 images, the results show that the improved network proposed in this study can effectively identify pavement cracks. Compared with the original network, the mean Average Precision is increased by 5.98%, the inference time is reduced by 4.82%, and the model weight is decreased by 17.36%. Additionally, to comply with engineering practice, comparative experiments were conducted on the pre-rotated dataset. The results showed that, compared with You Only Look Once version 8, the improved algorithm improved accuracy, recall, average accuracy, and F1 score by 3.28%, 8.46%, 3.79%, and 5.89%, respectively. This study can serve as an important reference for the development of crack detection methods.

NeurIPS Conference 2024 Conference Paper

Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation

  • Ruihao Xia
  • Yu Liang
  • Peng-tao Jiang
  • Hao Zhang
  • Bo Li
  • Yang Tang
  • Pan Zhou

Despite their success, unsupervised domain adaptation methods for semantic segmentation primarily focus on adaptation between image domains and do not utilize other abundant visual modalities like depth, infrared, and event. This limitation hinders their performance and restricts their application in real-world multimodal scenarios. To address this issue, we propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task, which utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities. Specifically, MADM comprises two key complementary components to tackle major challenges. First, due to the large modality gap, using data of one modality to generate pseudo-labels for another modality suffers from a significant drop in accuracy. To address this, MADM designs diffusion-based pseudo-label generation, which adds latent noise to stabilize pseudo-labels and enhance label accuracy. Second, to overcome the limitations of latent low-resolution features in diffusion models, MADM introduces the label palette and latent regression, which convert one-hot encoded labels into RGB form via a palette and regress them in the latent space, thus enabling the pre-trained decoder to upsample them into fine-grained features. Extensive experimental results demonstrate that MADM achieves state-of-the-art adaptation performance across various modality tasks, including images to depth, infrared, and event modalities. We open-source our code and models at https://github.com/XiaRho/MADM.

IJCAI Conference 2024 Conference Paper

Vision-fused Attack: Advancing Aggressive and Stealthy Adversarial Text against Neural Machine Translation

  • Yanni Xue
  • Haojie Hao
  • Jiakai Wang
  • Qiang Sheng
  • Renshuai Tao
  • Yu Liang
  • Pu Feng
  • Xianglong Liu

While neural machine translation (NMT) models achieve success in our daily lives, they show vulnerability to adversarial attacks. Despite being harmful, these attacks also offer benefits for interpreting and enhancing NMT models, thus drawing increased research attention. However, existing studies on adversarial attacks are insufficient in both attacking ability and human imperceptibility due to their sole focus on the scope of language. This paper proposes a novel vision-fused attack (VFA) framework to acquire powerful adversarial text, i.e., more aggressive and stealthy. Regarding the attacking ability, we design the vision-merged solution space enhancement strategy to enlarge the limited semantic solution space, which enables us to search for adversarial candidates with higher attacking ability. For human imperceptibility, we propose the perception-retained adversarial text selection strategy to align with the human text-reading mechanism. Thus, the finally selected adversarial text could be more deceptive. Extensive experiments on various models, including large language models (LLMs) like LLaMA and GPT-3.5, strongly support that VFA outperforms the comparisons by large margins (up to 81%/14% improvements on ASR/SSIM).

NeurIPS Conference 2023 Conference Paper

Unleashing the Full Potential of Product Quantization for Large-Scale Image Retrieval

  • Yu Liang
  • Shiliang Zhang
  • Li Ken Li
  • Xiaoyu Wang

Due to its promising performance, deep hashing has become a prevalent method for approximate nearest neighbor search (ANN search). However, most current deep hashing methods are validated on relatively small-scale datasets, leaving potential threats when applied to large-scale real-world scenarios. Specifically, they can be constrained either by the computational cost due to the large number of training categories and samples, or by unsatisfactory accuracy. To tackle these issues, we propose a novel deep hashing framework based on product quantization (PQ). It uses a softmax-based differentiable PQ branch to learn a set of predefined PQ codes for the classes. Our method is easy to implement, does not involve large-scale matrix operations, and learns highly discriminative compact codes. We validate our method on multiple large-scale datasets, including ImageNet100, ImageNet1K, and Glint360K, where the category size scales from 100 to 360K and the sample number scales from 10K to 17 million, respectively. Extensive experiments demonstrate the superiority of our method. Code is available at https://github.com/yuleung/FPPQ.
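The product-quantization backbone the abstract builds on can be sketched as follows: split a vector into M subvectors and store, per subspace, the index of the nearest codeword. This is a generic PQ encoding sketch, not the paper's method; the softmax-based differentiable branch and the predefined class codes are not reproduced, and the function name and array layout are assumptions.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Encode vector x with product quantization.

    codebooks: (M, K, d) array holding K centroids of dimension d
               for each of the M subspaces; len(x) must equal M * d.
    Returns M codeword indices (one small integer per subspace).
    """
    M, K, d = codebooks.shape
    subs = x.reshape(M, d)                                        # split into M subvectors
    # Distance from each subvector to every centroid in its subspace.
    dists = np.linalg.norm(codebooks - subs[:, None, :], axis=2)  # shape (M, K)
    return np.argmin(dists, axis=1)                               # nearest codeword per subspace
```

With K = 256, each subspace index fits in one byte, so a D-dimensional float vector compresses to M bytes, which is what makes PQ attractive at the 17-million-sample scale mentioned above.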

AAMAS Conference 2021 Conference Paper

Let the DOCTOR Decide Whom to Test: Adaptive Testing Strategies to Tackle the COVID-19 Pandemic

  • Yu Liang
  • Amulya Yadav

A robust testing program is necessary for containing the spread of COVID-19 infections before a vaccine becomes available. However, due to an acute shortage of testing kits (especially in low-resource developing countries), designing an optimal testing program/strategy is a challenging problem to solve. Prior literature on testing strategies suffers from two major limitations: (i) it does not account for the trade-off between testing of symptomatic and asymptomatic individuals, and (ii) it primarily focuses on static testing strategies, which leads to significant shortcomings in the testing program's effectiveness. In this paper, we address these limitations by making five novel contributions. (i) We formally define the optimal testing problem and propose the DOCTOR POMDP model to tackle it. (ii) We solve the DOCTOR POMDP using a scalable Monte Carlo tree search based algorithm. (iii) We provide a rigorous experimental analysis of DOCTOR's testing strategies against static baselines - our results show that when applied to the city of Santiago in Panama, DOCTOR's strategies result in ∼40% fewer COVID-19 infections (over one month) as compared to state-of-the-art static baselines. (iv) In addition, we analyze DOCTOR's testing policy to derive insights about the reasons behind the optimality of DOCTOR's testing policy. (v) Finally, we characterize conditions (of the real world) under which DOCTOR's optimization would be of most benefit to government policy makers, and thus requires significant attention from researchers in this area. Our work complements the growing body of research on COVID-19, and serves as a proof-of-concept that illustrates the benefit of having an AI-driven adaptive testing strategy for COVID-19.

IJCAI Conference 2020 Conference Paper

Optimal and Non-Discriminative Rehabilitation Program Design for Opioid Addiction Among Homeless Youth

  • Amulya Yadav
  • Roopali Singh
  • Nikolas Siapoutis
  • Anamika Barman-Adhikari
  • Yu Liang

This paper presents CORTA, a software agent that designs personalized rehabilitation programs for homeless youth suffering from opioid addiction. Many rehabilitation centers treat opioid addiction in homeless youth by prescribing rehabilitation programs that are tailored to the underlying causes of addiction. To date, rehabilitation centers have relied on ad-hoc assessments and unprincipled heuristics to deliver rehabilitation programs to homeless youth suffering from opioid addiction, which greatly undermines the effectiveness of the delivered programs. CORTA addresses these challenges via three novel contributions. First, CORTA utilizes a first-of-its-kind real-world dataset collected from ~1400 homeless youth to build causal inference models which predict the likelihood of opioid addiction among these youth. Second, utilizing counterfactual predictions generated by our causal inference models, CORTA solves novel optimization formulations to assign appropriate rehabilitation programs to the correct set of homeless youth in order to minimize the expected number of homeless youth suffering from opioid addiction. Third, we provide a rigorous experimental analysis of CORTA along different dimensions, e.g., the importance of causal modeling, the importance of optimization, and the impact of incorporating fairness considerations. Our simulation results show that CORTA outperforms baselines by ~110% in minimizing the number of homeless youth suffering from opioid addiction.