EAAI Journal 2026 Journal Article
A hybrid model and data driven approach for ballistic prediction with PINN
- Li Yang
- Wenjie Zheng
- Qinjie Liu
- Jinwen Wang
Author name cluster
Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
EAAI Journal 2026 Journal Article
AAAI Conference 2026 Conference Paper
Graph Neural Networks (GNNs) have achieved impressive performance in semi-supervised graph anomaly detection (GAD). While many GNN variants have been developed for this task, they largely focus on advanced message aggregation schemes, leaving the message routing aspect underexplored. We argue that the commonly used broadcast-based routing can also hinder generalization, particularly in the presence of rare and structurally challenging anomalies, such as high-degree vertices. To address this, we propose Binary Message Passing (BMP), a novel routing paradigm that models the message flow of each vertex as a binary tree (BMP tree), where vanilla graph convolution is decoupled by its left and right subtrees. Each vertex recursively gathers information from neighbors with higher anomaly probabilities within each subtree, thereby amplifying the propagation of anomaly information across the topology. The anomaly probabilities are estimated and updated by the model itself, enabling adaptive, self-supervised routing over iterations. Furthermore, combining multiple BMP trees into a BMP forest provides multi-scale structural context, enhancing the expressiveness of final vertex embeddings. Extensive experiments show that BMP improves detection performance under limited supervision while exhibiting better generalization across structurally diverse anomalies.
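The core routing idea, aggregating only from neighbors the model currently believes are more anomalous, can be sketched in a few lines. This is an illustrative toy, not the authors' BMP-tree implementation: the graph, scalar features, and anomaly probabilities below are made-up values, and the recursive tree construction is collapsed to a single routing step.

```python
def anomaly_routed_aggregate(adj, feats, probs):
    """One routing step: each vertex averages its own feature with the
    features of neighbors whose anomaly probability exceeds its own."""
    out = {}
    for v, neighbors in adj.items():
        selected = [u for u in neighbors if probs[u] > probs[v]]
        pool = [feats[v]] + [feats[u] for u in selected]
        out[v] = sum(pool) / len(pool)
    return out

# Toy triangle graph; vertex 0 is currently scored as most anomalous.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
feats = {0: 1.0, 1: 4.0, 2: 7.0}
probs = {0: 0.9, 1: 0.2, 2: 0.5}

updated = anomaly_routed_aggregate(adj, feats, probs)
```

Because vertex 0 has no higher-probability neighbor, its feature is left untouched, while vertices 1 and 2 pull in information from the suspected anomaly, which is the amplification effect the abstract describes.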
JBHI Journal 2026 Journal Article
As a leading cause of death worldwide, cardiovascular disease demands more precise monitoring and early warning systems, posing significant challenges to modern healthcare. However, cardiovascular early warning systems often face two major dilemmas: the “black box dilemma” leads to unreliable estimation results due to limited physiological interpretability, and limited robustness across subjects and blood pressure states arises from physiological heterogeneity. This study innovatively maps the methods of semantic extraction and channel modeling to the cardiovascular system, inspired by the 6G network concept of transmitting meaning rather than data under the semantic communication paradigm. It proposes a novel hemodynamic channel-guided cardiovascular parameter estimation paradigm (HemoSC-P). This paradigm adopts a dual-pillar modeling framework: the semantic pillar utilizes a multi-scale convolutional and phase-aware attention architecture to model the non-stationary dynamics of cardiovascular signals, elevating feature alignment from the temporal domain to the physiological domain. The cardiovascular channel, parameterized by the Windkessel circuit model, serves dual roles as a semantic pathway from cardiac contractions to observable physiological signals and as a repository of physiological knowledge. It guides channel-level physiologically deformable attention to achieve inverse estimation of cardiovascular parameters. To validate this paradigm, this study employs non-invasive blood pressure estimation as a representative case study. Validation was conducted on three representative public datasets (UCI-BP, MIMIC-III, PPG-BP). On MIMIC-III, the mean absolute error $\pm$ standard deviation for systolic and diastolic blood pressure reached 3.04 $\pm$ 3.24 mmHg and 2.57 $\pm$ 2.70 mmHg, respectively. Multi-dataset validation indicates that this paradigm surpasses benchmark methods in accuracy and stability while maintaining scalability.
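The Windkessel circuit model named in the abstract has a simple two-element form that is easy to state concretely: arterial pressure P is driven by cardiac inflow Q through a compliance C and drains through a peripheral resistance R. The sketch below is a minimal forward-Euler integration of that ODE with illustrative parameter values; it is not the paper's parameterization, which embeds the channel inside an attention architecture.

```python
def windkessel_step(p, q, r=1.0, c=1.5, dt=0.01):
    """Forward-Euler update of the two-element Windkessel ODE
    dP/dt = Q/C - P/(R*C), with illustrative units and parameters."""
    return p + dt * (q / c - p / (r * c))

# During diastole there is no inflow (Q = 0), so pressure decays
# exponentially with time constant R*C.
p = 100.0
for _ in range(100):      # simulate 1 s of diastolic decay at dt = 0.01
    p = windkessel_step(p, q=0.0)
```

With R*C = 1.5 s, one second of decay brings p from 100 down to roughly 51, the familiar exponential pressure falloff between heartbeats.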
JBHI Journal 2026 Journal Article
Individual differences pose a significant challenge in brain-computer interface (BCI) research. Designing a universally applicable network architecture is impractical due to the variability in human brain structure and function. We propose Filter-Bank Neural Architecture Search (FBNAS), an EEG decoding framework that automates network architecture design for individuals. FBNAS uses three temporal cells to process different frequency EEG signals, with dilated convolution kernels in their search spaces. A multi-path NAS algorithm determines optimal architectures for multi-scale feature extraction. We benchmarked FBNAS on three EEG datasets across two BCI paradigms, comparing it to six state-of-the-art deep learning algorithms. FBNAS achieved cross-session decoding accuracies of 79.78%, 70.66%, and 68.38% on the BCIC-IV-2a, OpenBMI, and SEED datasets, respectively, outperforming other methods. Our results show that FBNAS customizes decoding models to address individual differences, enhancing decoding performance and shifting model design from expert-driven to machine-aided. The source code can be found at https://github.com/wang1239435478/FBNAS-master.
JBHI Journal 2025 Journal Article
Foundational models have significant potential to advance brain function research, particularly in understanding the dynamics of brain states. However, most existing models process brain signals within fixed time windows, restricting their ability to capture the full temporal complexity of brain activity. In this study, we propose BrainSN (Brain States Network), a novel fMRI foundational model designed to represent continuous brain state information and support diverse downstream tasks. First, leveraging a transformer-based architecture, BrainSN reconstructs input brain states across multiple time scales and predicts future brain activity, effectively capturing both short-term and long-term dependencies. Second, through multiple embeddings and a channel gating module, the model integrates brain state information and applies an attention mechanism to extract critical features. Additionally, we train BrainSN on 1,256 hours of resting-state and naturalistic stimulus fMRI data, enabling it to learn large-scale brain dynamics without relying on task-based paradigms. Without fine-tuning, BrainSN achieves 75.23% and 75.82% accuracy in autism and attention disorder diagnosis tasks, respectively, matching the performance of leading models pretrained on disease-specific data. After fine-tuning, it surpasses these models. In mental state decoding, BrainSN attains 95.31% accuracy without fine-tuning, outperforming the best models trained on large-scale task-based fMRI data. Furthermore, by analyzing BrainSN's embeddings in relation to movie stimuli, we demonstrate that the model effectively captures the semantic content of movie scenes embedded in fMRI signals and is highly sensitive to sequence order. These results highlight BrainSN's ability to model brain state dynamics and underscore its potential advantages for clinical diagnosis, treatment evaluation, and cognitive neuroscience research.
AAAI Conference 2025 Conference Paper
Sexism affects both women and men, yet research often overlooks misandry and suffers from overly broad annotations that limit AI applications. To address this, we introduce BeyondGender, a dataset meticulously annotated according to the latest definitions of misogyny and misandry. It features innovative multifaceted labels encompassing aspects of sexism, gender, phrasing, misogyny, and misandry. The dataset includes 6K English and 1.7K Chinese sexism instances, alongside 13K non-sexism examples. Our evaluations of masked language models and large language models reveal that they detect misogyny in English and misandry in Chinese more effectively, with F1-scores of 0.87 and 0.62, respectively. However, they frequently misclassify hostile and mild comments, underscoring the complexity of sexism detection. Parallel corpus experiments suggest promising data augmentation strategies to enhance AI systems for nuanced sexism detection, and our dataset can be leveraged to improve value alignment in large language models.
AAAI Conference 2025 Conference Paper
Video saliency prediction aims to identify the regions in a video that attract human attention and gaze, driven by bottom-up features from the video and top-down processes like memory and cognition. Among these top-down influences, language plays a crucial role in guiding attention by shaping how visual information is interpreted. Existing methods primarily focus on modeling perceptual information while neglecting the reasoning process facilitated by language, where ranking cues are crucial outcomes of this process and practical guidance for saliency prediction. In this paper, we propose CaRDiff (Caption, Rank, and generate with Diffusion), a framework that imitates the process by integrating a multimodal large language model (MLLM), a grounding module, and a diffusion model, to enhance video saliency prediction. Specifically, we introduce a novel prompting method VSOR-CoT (Video Salient Object Ranking Chain of Thought), which utilizes an MLLM with a grounding module to caption video content and infer salient objects along with their rankings and positions. This process derives ranking maps that can be sufficiently leveraged by the diffusion model to accurately decode the saliency maps for the given video. Extensive experiments showcase the effectiveness of VSOR-CoT in improving the performance of video saliency prediction. The proposed CaRDiff performs better than state-of-the-art models on the MVS dataset and demonstrates cross-dataset capabilities on the DHF1k dataset through zero-shot evaluation.
ICLR Conference 2025 Conference Paper
Spiking Neural Networks (SNNs), with their biologically inspired spatio-temporal dynamics and spike-driven processing, are emerging as a promising low-power alternative to traditional Artificial Neural Networks (ANNs). However, the complex neuronal dynamics and non-differentiable spike communication mechanisms in SNNs present substantial challenges for efficient training. By analyzing the membrane potentials in spiking neurons, we found that their distributions can increasingly deviate from the firing threshold as time progresses, which tends to cause diminished backpropagation gradients and unbalanced optimization. To address these challenges, we propose Deep Temporal-Aligned Gradient Enhancement (DeepTAGE), a novel approach that improves optimization gradients in SNNs from both internal surrogate gradient functions and external supervision methods. Our DeepTAGE dynamically adjusts surrogate gradients in accordance with the membrane potential distribution across different time steps, enhancing their respective gradients in a temporal-aligned manner that promotes balanced training. Moreover, to mitigate issues of gradient vanishing or deviating during backpropagation, DeepTAGE incorporates deep supervision at both spatial (network stages) and temporal (time steps) levels to ensure more effective and robust network optimization. Importantly, our method can be seamlessly integrated into existing SNN architectures without imposing additional inference costs or requiring extra control modules. We validate the efficacy of DeepTAGE through extensive experiments on static benchmarks (CIFAR10, CIFAR100, and ImageNet-1k) and a neuromorphic dataset (DVS-CIFAR10), demonstrating significant performance improvements.
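The temporal-alignment idea above, widening or narrowing the surrogate gradient window based on how far membrane potentials sit from the firing threshold at each time step, can be illustrated with a rectangular surrogate. This is a rough sketch of the principle, not the paper's exact adjustment rule; the potentials and threshold below are toy values.

```python
def adaptive_surrogate_grads(potentials, theta=1.0):
    """Rectangular surrogate gradient whose window width tracks the mean
    deviation of membrane potentials from the threshold theta, so time
    steps whose potentials drift far from theta still receive gradient."""
    mean_dev = sum(abs(u - theta) for u in potentials) / len(potentials)
    width = max(mean_dev, 1e-6)
    # gradient is 1/(2*width) inside the window, 0 outside
    return [1.0 / (2 * width) if abs(u - theta) < width else 0.0
            for u in potentials]

grads = adaptive_surrogate_grads([0.5, 1.5, 3.0])
```

Here the mean deviation is 1.0, so potentials within one unit of the threshold get a gradient of 0.5 while the far-off potential at 3.0 gets none; a fixed narrow window would instead zero out the gradient for most neurons once the distribution drifts.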
IROS Conference 2025 Conference Paper
This paper presents a dynamic walking corridor generation (DWCG) algorithm designed to enhance navigation safety for visually impaired individuals in crowded pedestrian environments. Current physical human-robot interaction (pHRI) systems struggle with random pedestrian movements and interaction disturbances in such settings. To address these limitations, we propose a safety-critical framework that integrates Safe Flight Corridor concepts with pedestrian dynamics modeling. The method constructs time-varying Safe Walking Corridors (SWCs) through convex polyhedra decomposition, constrained by social force model predictions. Simulation experiments demonstrate a 100% success rate in moderate crowds (50 pedestrians or fewer) with 10.1 ms average computation time, and 86.3% success in high-density environments (100 pedestrians), establishing a foundation for reliable assistive navigation systems in complex urban settings.
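The social force model used to constrain the corridors has a standard repulsive term worth stating concretely: each nearby pedestrian pushes the agent away with a force that decays exponentially in distance. The sketch below is the classic circular-specification form with illustrative constants; the paper's full prediction pipeline and corridor decomposition are not reproduced here.

```python
import math

def social_repulsion(pos, ped, a=2.0, b=0.5, radius=0.6):
    """Repulsive social force from one pedestrian: magnitude
    A * exp((r - d) / B) along the unit vector pointing away from the
    pedestrian, where d is the center distance and r the contact radius."""
    dx, dy = pos[0] - ped[0], pos[1] - ped[1]
    d = math.hypot(dx, dy)
    mag = a * math.exp((radius - d) / b)
    return (mag * dx / d, mag * dy / d)

# At exactly the contact radius the force magnitude equals A.
fx, fy = social_repulsion((1.0, 0.0), (0.4, 0.0))
```

Summing this term over predicted pedestrian positions at each time step gives the kind of time-varying constraint field from which safe corridors can be carved.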
NeurIPS Conference 2025 Conference Paper
Brain-inspired spiking neural networks (SNNs) provide energy-efficient computation through event-driven processing. However, the shared weights across multiple timesteps lead to serious temporal feature redundancy, limiting both efficiency and performance. This issue is further aggravated when processing static images due to the duplicated input. To mitigate this problem, we propose a parameter-free and plug-and-play module named Mutual Information-based Temporal Redundancy Quantification and Reduction (MI-TRQR), constructing energy-efficient SNNs. Specifically, Mutual Information (MI) is properly introduced to quantify redundancy between discrete spike features at different timesteps on two spatial scales: pixel (local) and the entire spatial features (global). Based on the multi-scale redundancy quantification, we apply a probabilistic masking strategy to remove redundant spikes. The final representation is subsequently recalibrated to account for the spike removal. Extensive experimental results demonstrate that our MI-TRQR achieves sparser spiking firing, higher energy efficiency, and better performance concurrently with different SNN architectures in tasks of neuromorphic data classification, static data classification, and time-series forecasting. Notably, MI-TRQR increases accuracy by \textbf{1.7\%} on CIFAR10-DVS with 4 timesteps while reducing energy cost by \textbf{37.5\%}. Our codes are available at https://github.com/dfxue/MI-TRQR.
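The redundancy measure at the heart of MI-TRQR, mutual information between binary spike features at two timesteps, can be computed directly from a joint histogram. The sketch below is a plain plug-in MI estimate on toy spike vectors, not the paper's module; the multi-scale quantification and probabilistic masking are omitted.

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two equal-length
    binary spike vectors, estimated from joint and marginal counts."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        pxy = c / n
        mi += pxy * math.log(pxy * n * n / (px[a] * py[b]))
    return mi

# Identical spike trains are fully redundant (MI = ln 2 for a fair coin);
# these two toy trains share no information (MI = 0).
identical = mutual_information([0, 1, 0, 1], [0, 1, 0, 1])
independent = mutual_information([0, 1, 0, 1], [0, 0, 1, 1])
```

High MI between timesteps flags exactly the duplicated-input situation the abstract describes for static images, where later timesteps add little new information.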
EAAI Journal 2025 Journal Article
ICLR Conference 2025 Conference Paper
We introduce Probe Pruning (PP), a novel framework for online, dynamic, structured pruning of Large Language Models (LLMs) applied in a batch-wise manner. PP leverages the insight that not all samples and tokens contribute equally to the model's output, and probing a small portion of each batch effectively identifies crucial weights, enabling tailored dynamic pruning for different batches. It comprises three main stages: probing, history-informed pruning, and full inference. In the probing stage, PP selects a small yet crucial set of hidden states, based on residual importance, to run a few model layers ahead. During the history-informed pruning stage, PP strategically integrates the probing states with historical states. Subsequently, it structurally prunes weights based on the integrated states and the PP importance score, a metric developed specifically to assess the importance of each weight channel in maintaining performance. In the final stage, full inference is conducted on the remaining weights. A major advantage of PP is its compatibility with existing models, as it operates without requiring additional neural network modules or fine-tuning. Comprehensive evaluations of PP on LLaMA-2/3 and OPT models reveal that even minimal probing—using just 1.5% of FLOPs—can substantially enhance the efficiency of structured pruning of LLMs. For instance, when evaluated on LLaMA-2-7B with WikiText2, PP achieves a 2.56 times lower ratio of performance degradation per unit of latency reduction compared to the state-of-the-art method at a 40% pruning ratio.
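The pruning step itself reduces to ranking weight channels by an importance score computed from the probed states and dropping the lowest-ranked fraction. The sketch below uses a plain L2-norm of activations as a stand-in for the PP importance score (which the paper defines more carefully); channel data and the ratio are toy values.

```python
def prune_channels(hidden_states, prune_ratio=0.5):
    """Rank channels by the L2 norm of their probed activations (a
    simplified stand-in for the PP importance score) and return the
    indices of the channels kept after pruning the lowest-ranked ones."""
    norms = [(sum(v * v for v in ch) ** 0.5, i)
             for i, ch in enumerate(hidden_states)]
    keep = max(1, round(len(hidden_states) * (1 - prune_ratio)))
    kept = sorted(i for _, i in sorted(norms, reverse=True)[:keep])
    return kept

# Four toy channels of probed activations; half are pruned per batch.
channels = [[0.1, 0.1], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
kept = prune_channels(channels, prune_ratio=0.5)
```

Because the ranking is recomputed from each batch's probe, different batches can keep different channels, which is what makes the pruning dynamic rather than a one-shot structural cut.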
NeurIPS Conference 2025 Conference Paper
Transformers have been widely regarded as a promising direction for breaking through the performance bottlenecks of Graph Neural Networks (GNNs), primarily due to their global receptive fields. However, a recent empirical study suggests that tuned classical GNNs can match or even outperform state-of-the-art Graph Transformers (GTs) on standard node classification benchmarks. Motivated by this fact, we deconstruct several representative GTs to examine how global attention components influence node representations. We find that the global attention module does not provide significant performance gains and may even exacerbate test error oscillations. Consequently, we consider that the Transformer is barely able to learn connectivity patterns that meaningfully complement the original graph topology. Interestingly, we further observe that mitigating such oscillations enables the Transformer to improve generalization in GNNs. In a nutshell, we reinterpret the Transformer through the lens of the graph spectrum and reformulate it as a global-aware graph filter with band-pass characteristics and linear complexity. This unique perspective introduces multi-channel filtering constraints that effectively suppress test error oscillations. Extensive experiments on 17 homophilous and heterophilous graphs provide comprehensive empirical evidence for our perspective. This work clarifies the role of Transformers in GNNs and suggests that advancing modern GNN research may still require a return to the graph itself.
AAAI Conference 2025 Conference Paper
Spiking Neural Networks (SNNs) are biologically inspired models that process visual inputs over multiple time steps. However, they often struggle with limited feature discrimination along the temporal dimension due to inherent spatiotemporal invariance. This limitation arises from the redundant activation of certain regions and shared supervision for multiple time steps, constraining the network’s ability to adapt and learn diverse features. To address this challenge, we propose a novel Temporal-Self-Erasing (TSE) supervision method that dynamically adapts the learning regions of interest for different time steps. The TSE method operates by identifying highly activated regions from predictions across multiple time steps and adaptively suppressing them during model training, thereby encouraging the network to focus on less activated yet potentially informative regions. This approach not only enhances the feature discrimination capability of SNNs but also facilitates more effective multi-time-step inference by exploiting more semantic information. Experimental results on benchmark datasets demonstrate that our TSE method significantly improves the classification accuracy and robustness of SNNs.
AAAI Conference 2024 Conference Paper
Split Federated Learning (SFL) is an emerging edge-friendly version of Federated Learning (FL), where clients process a small portion of the entire model. While SFL was considered to be resistant to Model Extraction Attack (MEA) by design, a recent work shows it is not necessarily the case. In general, gradient-based MEAs are not effective on a target model that is changing, as is the case in training-from-scratch applications. In this work, we propose a strong MEA during the SFL training phase. The proposed Early-Mix-GAN (EMGAN) attack effectively exploits gradient queries regardless of data assumptions. EMGAN adopts three key components to address the problem of inconsistent gradients. Specifically, it employs (i) Early-learner approach for better adaptability, (ii) Multi-GAN approach to introduce randomness in generator training to mitigate mode collapse, and (iii) ProperMix to effectively augment the limited amount of synthetic data for a better approximation of the target domain data distribution. EMGAN achieves excellent results in extracting server-side models. With only 50 training samples, EMGAN successfully extracts a 5-layer server-side model of VGG-11 on CIFAR-10, with 7% less accuracy than the target model. With zero training data, the extracted model achieves 81.3% accuracy, which is significantly better than the 45.5% accuracy of the model extracted by the SoTA method. The code is available at "https://github.com/zlijingtao/SFL-MEA".
ICLR Conference 2024 Conference Paper
Learning surfaces from neural radiance fields (NeRF) has become a rising topic in Multi-View Stereo (MVS). Recent Signed Distance Function (SDF)-based methods demonstrated their ability to reconstruct exact 3D shapes of Lambertian scenes. However, their results on reflective scenes are unsatisfactory due to the entanglement of specular radiance and complicated geometry. To address the challenges, we propose a Gaussian-based representation of normals in SDF fields. Supervised by polarization priors, this representation guides the learning of geometry behind the specular reflection and captures more details than existing methods. Moreover, we propose a reweighting strategy in the optimization process to alleviate the noise issue of polarization priors. To validate the effectiveness of our design, we capture polarimetric information and ground truth meshes in additional reflective scenes with various geometry. We also evaluated our framework on the PANDORA dataset. Both qualitative and quantitative comparisons prove our method outperforms existing neural 3D reconstruction methods in reflective scenes by a large margin.
NeurIPS Conference 2024 Conference Paper
Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical pre-set pruning ratio or importance score threshold to prune the point cloud. Such hyperparameters require multiple rounds of training to optimize and achieve the maximum pruning ratio while maintaining the rendering quality for each scene. In this work, we propose learning-to-prune 3DGS (LP-3DGS), where a trainable binary mask is applied to the importance score to automatically find a favorable pruning ratio. Instead of using the traditional straight-through estimator (STE) method to approximate the binary mask gradient, we redesign the masking function to leverage the Gumbel-Sigmoid method, making it differentiable and compatible with the existing training process of 3DGS. Extensive experiments have shown that LP-3DGS consistently achieves a good balance between efficiency and high quality.
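The key trick in LP-3DGS, replacing the straight-through estimator with a Gumbel-Sigmoid relaxation so the binary mask becomes differentiable, is easy to state in isolation. The sketch below shows the relaxation for a single scalar mask logit; it is an illustration of the Gumbel-Sigmoid technique, not the LP-3DGS training code, and the logits, temperature, and sample counts are assumptions.

```python
import math, random

def gumbel_sigmoid(logit, tau=0.5, rng=random):
    """Differentiable relaxation of a Bernoulli mask:
    sigmoid((logit + g1 - g2) / tau) with g1, g2 ~ Gumbel(0, 1).
    Small tau pushes samples toward hard 0/1 values while keeping a
    gradient w.r.t. the logit (unlike a straight-through estimator)."""
    g1 = -math.log(-math.log(rng.random()))
    g2 = -math.log(-math.log(rng.random()))
    return 1.0 / (1.0 + math.exp(-(logit + g1 - g2) / tau))

rng = random.Random(0)
# A strongly positive logit keeps the Gaussian (mask near 1);
# a strongly negative logit prunes it (mask near 0).
high = sum(gumbel_sigmoid(8.0, rng=rng) for _ in range(200)) / 200
low = sum(gumbel_sigmoid(-8.0, rng=rng) for _ in range(200)) / 200
```

During training the soft mask multiplies each Gaussian's importance score, so gradient descent itself learns which points to drop and the pruning ratio emerges rather than being preset.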
EAAI Journal 2024 Journal Article
NeurIPS Conference 2023 Conference Paper
3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.
YNIMG Journal 2023 Journal Article
NeurIPS Conference 2023 Conference Paper
Contrastive learning (CL) has been widely investigated with various learning mechanisms and achieves strong capability in learning representations of data in a self-supervised manner using unlabeled data. A common fashion of contrastive learning on this line is employing mega-sized encoders to achieve comparable performance as the supervised learning counterpart. Despite the success of the labelless training, current contrastive learning algorithms *failed* to achieve good performance with lightweight (compact) models, e.g., MobileNet, while the requirements of the heavy encoders impede the energy-efficient computation, especially for resource-constrained AI applications. Motivated by this, we propose a new self-supervised CL scheme, named SACL-XD, consisting of two technical components, **S**limmed **A**symmetrical **C**ontrastive **L**earning (SACL) and **Cross**-**D**istillation (XD), which collectively enable efficient CL with compact models. While relevant prior works employed a strong pre-trained model as the teacher of unsupervised knowledge distillation to a lightweight encoder, our proposed method trains CL models from scratch and outperforms them even without such an expensive requirement. Compared to the SoTA lightweight CL training (distillation) algorithms, SACL-XD achieves 1.79% ImageNet-1K accuracy improvement on MobileNet-V3 with 64$\times$ training FLOPs reduction.
NeurIPS Conference 2022 Conference Paper
By learning a sequence of tasks continually, an agent in continual learning (CL) can improve the learning performance of both a new task and `old' tasks by leveraging the forward knowledge transfer and the backward knowledge transfer, respectively. However, most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. This inevitably limits the backward knowledge transfer from the new task to the old tasks, because judicious model updates could possibly improve the learning performance of the old tasks as well. To tackle this problem, we first theoretically analyze the conditions under which updating the learnt model of old tasks could be beneficial for CL and also lead to backward knowledge transfer, based on the gradient projection onto the input subspaces of old tasks. Building on the theoretical analysis, we next develop a ContinUal learning method with Backward knowlEdge tRansfer (CUBER), for a fixed capacity neural network without data replay. In particular, CUBER first characterizes the task correlation to identify the positively correlated old tasks in a layer-wise manner, and then selectively modifies the learnt model of the old tasks when learning the new task. Experimental studies show that CUBER can even achieve positive backward knowledge transfer on several existing CL benchmarks for the first time without data replay, where the related baselines still suffer from catastrophic forgetting (negative backward knowledge transfer). The superior performance of CUBER on the backward knowledge transfer also leads to higher accuracy accordingly.
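The geometric primitive behind CUBER's analysis, projecting a gradient onto (or out of) the input subspace of an old task, is worth making concrete. The sketch below shows only that projection step on toy vectors with an orthonormal basis; CUBER's layer-wise task-correlation test and selective update rule built on top of it are not reproduced.

```python
def project_out(grad, basis):
    """Remove the component of `grad` lying in the span of the orthonormal
    `basis` vectors (e.g., the input subspace of an old task), leaving
    only the part of the update that cannot disturb that task."""
    out = list(grad)
    for b in basis:
        coef = sum(g * bi for g, bi in zip(out, b))
        out = [g - coef * bi for g, bi in zip(out, b)]
    return out

grad = [1.0, 2.0, 3.0]
basis = [[1.0, 0.0, 0.0]]        # old task's input subspace: the x-axis
residual = project_out(grad, basis)
```

Classical gradient-projection CL methods always apply this residual to protect old tasks; CUBER's contribution is deciding, per layer and per old task, when the in-subspace component should deliberately be kept because the tasks are positively correlated.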
NeurIPS Conference 2022 Conference Paper
Recently, a new trend of exploring training sparsity has emerged, which removes parameters during training, leading to both training and inference efficiency improvement. This line of works primarily aims to obtain a single sparse model under a pre-defined large sparsity ratio. It leads to a static/fixed sparse inference model that is not capable of adjusting or re-configuring its computation complexity (i.e., inference structure, latency) after training for real-world varying and dynamic hardware resource availability. To enable such run-time or post-training network morphing, the concept of 'dynamic inference' or 'training-once-for-all' has been proposed to train a single network consisting of multiple sub-nets once, but each sub-net could perform the same inference function with different computing complexity. However, the traditional dynamic inference training method requires a joint training scheme with multi-objective optimization, which suffers from very large training overhead. In this work, for the first time, we propose a novel alternating sparse training (AST) scheme to train multiple sparse sub-nets for dynamic inference without extra training cost compared to the case of training a single sparse model from scratch. Furthermore, to mitigate the interference of weight update among sub-nets, we propose gradient correction within the inner-group iterations to reduce their weight update interference. We validate the proposed AST on multiple datasets against state-of-the-art sparse training methods, which shows that AST achieves similar or better accuracy, but only needs to train once to get multiple sparse sub-nets with different sparsity ratios. More importantly, compared with the traditional joint training based dynamic inference training methodology, the large training overhead is completely eliminated without affecting the accuracy of each sub-net.
AAAI Conference 2022 Conference Paper
Novelty detection aims to automatically identify out-of-distribution (OOD) data, without any prior knowledge of them. It is a critical step in data monitoring, behavior analysis and other applications, helping enable continual learning in the field. Conventional methods of OOD detection perform multi-variate analysis on an ensemble of data or features, and usually resort to the supervision with OOD data to improve the accuracy. In reality, such supervision is impractical as one cannot anticipate the anomalous data. In this paper, we propose a novel, self-supervised approach that does not rely on any pre-defined OOD data: (1) The new method evaluates the Mahalanobis distance of the gradients between the in-distribution and OOD data. (2) It is assisted by a self-supervised binary classifier to guide the label selection to generate the gradients, and maximize the Mahalanobis distance. In the evaluation with multiple datasets, such as CIFAR-10, CIFAR-100, SVHN and TinyImageNet, the proposed approach consistently outperforms state-of-the-art supervised and unsupervised methods in the area under the receiver operating characteristic (AUROC) and area under the precision-recall curve (AUPR) metrics. We further demonstrate that this detector is able to accurately learn one OOD class in continual learning.
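The Mahalanobis distance that scores each sample can be written down directly. The paper computes it on gradient statistics; the sketch below applies the same formula to a toy 2-D feature vector with a diagonal covariance for readability (the general case uses the full inverse covariance matrix).

```python
def mahalanobis(x, mean, cov_diag):
    """Mahalanobis distance of x from a distribution with the given mean
    and diagonal covariance: sqrt(sum((x_i - mu_i)^2 / var_i)).
    Diagonal covariance is an illustrative simplification."""
    return sum((xi - mi) ** 2 / vi
               for xi, mi, vi in zip(x, mean, cov_diag)) ** 0.5

# A point two raw units away along a high-variance axis is only one
# "standard" unit away after whitening.
d = mahalanobis([3.0, 0.0], [1.0, 0.0], [4.0, 1.0])
```

Because it normalizes by per-dimension variance, a large gradient deviation along an axis where in-distribution gradients already vary widely is discounted, which a plain Euclidean distance would not do.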
EAAI Journal 2022 Journal Article
NeurIPS Conference 2020 Conference Paper
Transformers-based models, such as BERT, have been one of the most successful deep learning models for NLP. Unfortunately, one of their core limitations is the quadratic dependency (mainly in terms of memory) on the sequence length due to their full attention mechanism. To remedy this, we propose, BigBird, a sparse attention mechanism that reduces this quadratic dependency to linear. We show that BigBird is a universal approximator of sequence functions and is Turing complete, thereby preserving these properties of the quadratic, full attention model. Along the way, our theoretical analysis reveals some of the benefits of having $O(1)$ global tokens (such as CLS), that attend to the entire sequence as part of the sparse attention mechanism. The proposed sparse attention can handle sequences of length up to 8x of what was previously possible using similar hardware. As a consequence of the capability to handle longer context, BigBird drastically improves performance on various NLP tasks such as question answering and summarization. We also propose novel applications to genomics data.
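The structure of BigBird's sparse attention, a sliding window plus a few global tokens that attend to (and are attended by) everything, can be visualized with a boolean mask. The sketch below builds a simplified version of that mask; it omits the random attention blocks of the full BigBird pattern, and the sequence length, window size, and choice of global token are toy assumptions.

```python
def sparse_attention_mask(n, window=1, global_tokens=(0,)):
    """Boolean attention mask: token i may attend to token j if j lies
    within `window` positions, or if either token is global (e.g., CLS).
    Random attention blocks of the full BigBird pattern are omitted."""
    return [[abs(i - j) <= window or i in global_tokens or j in global_tokens
             for j in range(n)] for i in range(n)]

m = sparse_attention_mask(6)
allowed = sum(sum(row) for row in m)   # number of allowed (i, j) pairs
```

For fixed window size and global-token count the number of allowed pairs grows linearly in n rather than quadratically, which is exactly the memory reduction the abstract claims, while the global tokens preserve a path between any two positions.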
AAAI Conference 2020 Conference Paper
Deep convolutional neural network (DNN) has demonstrated phenomenal success and been widely used in many computer vision tasks. However, its enormous model size and high computing complexity prohibits its wide deployment into resource limited embedded system, such as FPGA and mGPU. As the two most widely adopted model compression techniques, weight pruning and quantization compress DNN model through introducing weight sparsity (i.e., forcing partial weights as zeros) and quantizing weights into limited bitwidth values, respectively. Although there are works attempting to combine the weight pruning and quantization, we still observe disharmony between weight pruning and quantization, especially when more aggressive compression schemes (e.g., structured pruning and low bit-width quantization) are used. In this work, taking FPGA as the test computing platform and Processing Elements (PE) as the basic parallel computing unit, we first propose a PE-wise structured pruning scheme, which introduces weight sparsification with considering of the architecture of PE. In addition, we integrate it with an optimized weight ternarization approach which quantizes weights into ternary values ({−1, 0, +1}), thus converting the dominant convolution operations in DNN from multiplication-and-accumulation (MAC) to addition-only, as well as compressing the original model (from 32-bit floating point to 2-bit ternary representation) by at least 16 times. Then, we investigate and solve the coexistence issue between PE-wise structured pruning and ternarization, through proposing a Weight Penalty Clipping (WPC) technique with self-adapting threshold. Our experiment shows that the fusion of our proposed techniques can achieve the best state-of-the-art ∼21× PE-wise structured compression rate with merely 1.74%/0.94% (top-1/top-5) accuracy degradation of ResNet-18 on ImageNet dataset.
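The ternarization step itself, mapping each weight to {−1, 0, +1} times a learned scale, has a common threshold-based form that is easy to sketch. The function below uses the mean magnitude of surviving weights as the scale, a standard ternarization heuristic; the paper's WPC procedure with its self-adapting threshold is more involved, and the weights and threshold here are toy values.

```python
def ternarize(weights, threshold=0.05):
    """Map each weight to {-1, 0, +1} scaled by the mean magnitude of the
    weights whose absolute value exceeds the threshold (a common
    ternarization heuristic, not the paper's exact WPC procedure)."""
    kept = [abs(w) for w in weights if abs(w) > threshold]
    scale = sum(kept) / len(kept) if kept else 0.0
    return [scale * (1 if w > threshold else -1 if w < -threshold else 0)
            for w in weights]

tern = ternarize([0.3, -0.2, 0.01, 0.1])
```

Once every surviving weight is ±scale, a convolution collapses into signed additions followed by one multiply by the shared scale, which is the MAC-to-addition conversion the abstract describes.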
YNICL Journal 2019 Journal Article