Arrow Research

Author name cluster

Yanan Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

AAAI Conference 2026 Conference Paper

Adapt Before Continual Learning

  • Aojun Lu
  • Tao Feng
  • Hangjie Yuan
  • Chunhui Ding
  • Yanan Sun

Continual Learning (CL) seeks to enable neural networks to incrementally acquire new knowledge (plasticity) while retaining existing knowledge (stability). Although pre-trained models (PTMs) have provided a strong foundation for CL, existing approaches face a fundamental challenge in balancing these two competing objectives. Current methods typically address stability by freezing the PTM backbone, which severely limits the model's plasticity, particularly when the incoming data distribution diverges substantially from the pre-training data. Alternatively, sequentially fine-tuning the entire PTM can adapt to new knowledge but often leads to catastrophic forgetting, highlighting the critical stability-plasticity trade-off in PTM-based CL. To address this limitation, we propose Adapting PTMs before the core CL process (ACL), a novel framework that introduces a plug-and-play adaptation phase prior to learning each new task. During this phase, ACL refines the PTM backbone by aligning embeddings with their original class prototypes while distancing them from irrelevant classes. We show theoretically and empirically that this mechanism achieves a desirable balance between stability and plasticity, significantly improving CL performance across benchmarks and integrated methods.
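
To make the adaptation phase concrete, here is a minimal PyTorch sketch of prototype-alignment fine-tuning: an embedding is pulled toward its own class prototype and pushed away from the other prototypes. The function and variable names are ours, and this only illustrates the general mechanism, not the official ACL implementation.

```python
# Hypothetical sketch of a prototype-alignment adaptation step (not the official ACL code).
import torch
import torch.nn.functional as F

def prototype_alignment_loss(embeddings, labels, prototypes, temperature=0.1):
    """Pull each embedding toward its class prototype, push it away from the others.

    embeddings: (B, D) backbone features for the current task's batch
    labels:     (B,) class indices into `prototypes`
    prototypes: (C, D) one prototype (e.g., mean embedding) per class
    """
    emb = F.normalize(embeddings, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    logits = emb @ protos.t() / temperature   # (B, C) similarity to every prototype
    return F.cross_entropy(logits, labels)    # maximize similarity to the own prototype

# Usage: before learning a new task, briefly fine-tune the PTM backbone with this loss.
if __name__ == "__main__":
    torch.manual_seed(0)
    feats = torch.randn(8, 64, requires_grad=True)   # stand-in for backbone outputs
    labels = torch.randint(0, 5, (8,))
    protos = torch.randn(5, 64)                      # class prototypes from current-task data
    loss = prototype_alignment_loss(feats, labels, protos)
    loss.backward()
    print(float(loss))
```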

AAAI Conference 2026 Conference Paper

LAS: Loss-less ANN-SNN Conversion for Fully Spike-Driven Large Language Models

  • Long Chen
  • Xiaotian Song
  • Yanan Sun

Spiking Large Language Models (LLMs) have emerged as an energy-efficient alternative to conventional LLMs through their event-driven computation. To effectively obtain spiking LLMs, researchers have developed various ANN-to-SNN conversion methods that leverage pre-trained ANN parameters while inheriting the energy efficiency of SNNs. However, existing conversion methods struggle with the extreme activation outliers and incompatible nonlinear operations of ANN-based LLMs. To address this, we propose a loss-less ANN-SNN conversion for fully spike-driven LLMs, termed LAS. Specifically, LAS introduces two novel neurons to convert the activation outliers and nonlinear operations of ANN-based LLMs. Moreover, LAS tailors spike-equivalent Transformer components for spiking LLMs, which ensures fully spiking conversion without any loss of performance. Experimental results on six language models and two vision-language models demonstrate that LAS achieves loss-less conversion. Notably, on OPT-66B, LAS even improves accuracy by 2% on the WSC task. In addition, parameter and ablation studies further verify the effectiveness of LAS.
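
As background on the conversion setting, the NumPy sketch below illustrates the classical rate-coding idea behind ANN-to-SNN conversion: an integrate-and-fire neuron driven by a constant ReLU activation approximates that activation increasingly well as the number of timesteps grows. It shows only the generic principle; it does not implement LAS's outlier- or nonlinearity-handling neurons, and all names are ours.

```python
# Generic rate-coding ANN->SNN illustration (not LAS's neurons).
import numpy as np

def if_neuron_rate(activation, timesteps, v_th=1.0):
    """Simulate an integrate-and-fire neuron fed a constant input current.

    Returns the spike-based estimate of the (non-negative) ANN activation.
    """
    v, spikes = 0.0, 0
    for _ in range(timesteps):
        v += activation                # integrate the constant input
        if v >= v_th:                  # fire and reset by subtraction
            spikes += 1
            v -= v_th
    return spikes * v_th / timesteps   # firing rate scaled back to activation units

if __name__ == "__main__":
    a = 0.37                           # a ReLU activation to approximate
    for T in (8, 64, 512):
        print(T, if_neuron_rate(a, T)) # approximation error shrinks as T grows
```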

NeurIPS Conference 2025 Conference Paper

KeeA*: Epistemic Exploratory A* Search via Knowledge Calibration

  • Dengwei Zhao
  • Shikui Tu
  • Yanan Sun
  • Lei Xu

In recent years, neural network-guided heuristic search algorithms, such as Monte-Carlo tree search and A$^\*$ search, have achieved significant advancements across diverse practical applications. Due to challenges stemming from high state-space complexity, sparse training datasets, and incomplete environmental modeling, heuristic estimates exhibit uncontrolled inherent biases relative to the actual expected evaluations, thereby compromising the decision-making quality of search algorithms. Sampling exploration enhanced A$^\*$ (SeeA$^\*$) was proposed to improve the efficiency of A$^\*$ search by constructing a dynamic candidate subset through random sampling, from which the expanded node is selected. However, the uniform sampling strategy utilized by SeeA$^\*$ promotes exploration exclusively through the injection of randomness and completely neglects the heuristic knowledge relevant to open nodes. Moreover, the theoretical support for cluster sampling remains ambiguous. Despite the existence of potential biases, heuristic estimates still encapsulate valuable information. In this paper, epistemic exploratory A$^\*$ search (KeeA$^\*$) is proposed to integrate heuristic knowledge for calibrating the sampling process. We first theoretically demonstrate that SeeA$^\*$ with cluster sampling outperforms uniform sampling due to its distribution-aware selection with higher variance. Building on this insight, cluster scouting and path-aware sampling are introduced in KeeA$^\*$ to further exploit heuristic knowledge to increase the sampling mean and variance, respectively, thereby generating higher-quality extreme candidates and enhancing overall decision-making performance. Finally, empirical results on retrosynthetic planning and logic synthesis demonstrate the superior performance of KeeA$^\*$ compared with state-of-the-art heuristic search algorithms.
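
The sampling-based expansion rule that SeeA$^\*$ introduces, and that KeeA$^\*$ calibrates, can be sketched briefly. The hypothetical Python example below expands, at each step, the best node within a random subset of the open list rather than the global best; KeeA$^\*$'s cluster scouting and path-aware sampling are not reproduced.

```python
# Minimal sketch of sampling-based node expansion in the spirit of SeeA* (not the paper's code).
import random

def sampled_astar(grid, start, goal, subset_size=4, seed=0):
    """A*-style search on a 0/1 grid; each expansion picks the best node
    from a random subset of the open list instead of the global best."""
    rng = random.Random(seed)
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])     # Manhattan heuristic
    open_list = [(start, 0)]                                    # (node, g-cost)
    best_g = {start: 0}
    while open_list:
        sample = rng.sample(open_list, min(subset_size, len(open_list)))
        node, g = min(sample, key=lambda ng: ng[1] + h(ng[0]))  # best f within the sample
        open_list.remove((node, g))
        if node == goal:
            return g
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (node[0] + dx, node[1] + dy)
            if 0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0]) and grid[nxt[0]][nxt[1]] == 0:
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    open_list.append((nxt, g + 1))
    return None

if __name__ == "__main__":
    grid = [[0, 0, 0, 0],
            [0, 1, 1, 0],
            [0, 0, 0, 0]]
    # Shortest possible path length is 5; sampling may occasionally return a longer one.
    print(sampled_astar(grid, (0, 0), (2, 3)))
```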

NeurIPS Conference 2025 Conference Paper

Vulnerable Data-Aware Adversarial Training

  • Yuqi Feng
  • Jiahao Fan
  • Yanan Sun

Fast adversarial training (FAT) is considered one of the most effective alternatives to computationally intensive adversarial training. Generally, FAT methods pay equal attention to each sample of the target task. However, the distance between each sample and the decision boundary differs, and learning from samples that are far from the decision boundary (i.e., less important to adversarial robustness) incurs additional training cost and leads to sub-optimal results. To tackle this issue, we present vulnerable data-aware adversarial training (VDAT) in this study. Specifically, we first propose a margin-based vulnerability calculation method to measure the vulnerability of data samples. Moreover, we propose a vulnerability-aware data filtering method to reduce the training data used for adversarial training and thus improve training efficiency. Experiments are conducted on adversarial training and robust neural architecture search on CIFAR-10, CIFAR-100, and ImageNet-1K. The results demonstrate that VDAT is up to 76% more efficient than state-of-the-art FAT methods while improving both natural accuracy and adversarial accuracy in both scenarios. Furthermore, visualizations and ablation studies show the effectiveness of both core components of VDAT.
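
A margin-style vulnerability score and the resulting data filtering can be illustrated in a few lines. The sketch below uses hypothetical names and is not the official VDAT code: each sample is scored by the gap between its true-class logit and the strongest competing logit, and only the smallest-margin (most vulnerable) fraction is kept for adversarial training.

```python
# Hypothetical margin-based vulnerability filtering (illustrative, not the VDAT code).
import torch

def vulnerability_margin(logits, labels):
    """Smaller margin = closer to the decision boundary = more vulnerable."""
    true = logits.gather(1, labels.unsqueeze(1)).squeeze(1)    # logit of the true class
    masked = logits.clone()
    masked.scatter_(1, labels.unsqueeze(1), float("-inf"))     # hide the true class
    runner_up = masked.max(dim=1).values                       # strongest competing class
    return true - runner_up

def select_vulnerable(logits, labels, keep_ratio=0.5):
    """Indices of the most vulnerable samples to keep for adversarial training."""
    margins = vulnerability_margin(logits, labels)
    k = max(1, int(keep_ratio * len(margins)))
    return torch.argsort(margins)[:k]                          # smallest margins first

if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(16, 10)          # clean-model logits for a batch
    labels = torch.randint(0, 10, (16,))
    print(select_vulnerable(logits, labels, keep_ratio=0.25))
```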

IJCAI Conference 2024 Conference Paper

CAP: A Context-Aware Neural Predictor for NAS

  • Han Ji
  • Yuqi Feng
  • Yanan Sun

Neural predictors are effective in boosting the time-consuming performance evaluation stage in neural architecture search (NAS), owing to their direct estimation of unseen architectures. Despite this effectiveness, training a powerful neural predictor with few annotated architectures remains a major challenge. In this paper, we propose a context-aware neural predictor (CAP) which needs only a few annotated architectures for training, based on the contextual information of the architectures. Specifically, the input architectures are encoded into graphs and the predictor infers the contextual structure around the nodes inside each graph. Then, enhanced by the proposed context-aware self-supervised task, the pre-trained predictor can obtain expressive and generalizable representations of architectures. Therefore, only a few annotated architectures are sufficient for training. Experimental results in different search spaces demonstrate the superior performance of CAP compared with state-of-the-art neural predictors. In particular, CAP can rank architectures precisely with a budget of only 172 annotated architectures in NAS-Bench-101. Moreover, CAP can help find promising architectures in both the NAS-Bench-101 and DARTS search spaces on the CIFAR-10 dataset, serving as a useful navigator for NAS to explore the search space efficiently.
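
For orientation, the sketch below shows the bare-bones neural-predictor idea: encode each architecture as a feature vector and fit a regressor on a handful of annotated (architecture, accuracy) pairs. It uses a flat encoding, synthetic accuracies, and ridge regression purely for illustration; CAP's graph encoding and context-aware self-supervised pre-training are not reproduced.

```python
# Generic neural-predictor illustration (not CAP's graph encoder or pretext task).
import numpy as np

def encode(adjacency, ops, num_op_types):
    """Flatten an architecture (adjacency matrix + one-hot op list) into a feature vector."""
    one_hot = np.eye(num_op_types)[ops]
    return np.concatenate([np.asarray(adjacency).ravel(), one_hot.ravel()])

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression as a stand-in for a learned predictor."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "annotated" cells: random 5-node DAG encodings with synthetic accuracies.
    X = np.stack([encode(np.triu(rng.integers(0, 2, (5, 5)), 1), rng.integers(0, 3, 5), 3)
                  for _ in range(32)])
    y = rng.uniform(0.85, 0.95, 32)
    w = fit_ridge(X, y)
    x_new = encode(np.triu(rng.integers(0, 2, (5, 5)), 1), rng.integers(0, 3, 5), 3)
    print(float(x_new @ w))              # predicted accuracy of an unseen architecture
```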

IJCAI Conference 2024 Conference Paper

One-step Spiking Transformer with a Linear Complexity

  • Xiaotian Song
  • Andy Song
  • Rong Xiao
  • Yanan Sun

Spiking transformers have recently emerged as a robust alternative in deep learning. One focus of this field is the reduction of energy consumption, given that spiking transformers require lengthy simulation timesteps and complex floating-point attention mechanisms. In this paper, we propose a one-step approach that requires only one timestep and has linear complexity. The proposed One-step Spiking Transformer (OST) incorporates a Time Domain Compression and Compensation (TDCC) component, which can significantly mitigate the spatio-temporal overhead of spiking transformers. Another novel component in OST is the Spiking Linear Transformation (SLT), designed to greatly reduce the number of floating-point multiply-and-accumulate operations. Experiments on both static and neuromorphic images show that OST can perform as well as or better than SOTA methods with just one timestep, even on more difficult tasks. For instance, compared with Spikeformer, OST gains 1.59% in accuracy on ImageNet while being 40.27% more efficient, and gains 0.7% on DVS128 Gesture. The supplementary materials and source code are available at https://github.com/songxt3/OST.
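
The linear-complexity claim refers to attention that never materializes the full N x N score matrix. The NumPy snippet below shows the generic kernelized linear-attention trick as background; OST's Spiking Linear Transformation and TDCC components are not reproduced, and the ELU+1 feature map is just a common choice, not necessarily the one used in the paper.

```python
# Generic kernelized linear attention (illustrative background, not OST's SLT).
import numpy as np

def linear_attention(Q, K, V):
    """O(N * d^2) attention: phi(Q) @ (phi(K)^T V), normalized per query."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # ELU(x) + 1, a positive feature map
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                          # (d, d_v) summary of keys and values
    z = Kf.sum(axis=0)                     # (d,) normalizer
    return (Qf @ kv) / (Qf @ z)[:, None]   # (N, d_v), never forms the N x N matrix

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 1024, 64
    Q, K, V = rng.standard_normal((3, N, d))
    print(linear_attention(Q, K, V).shape)   # (1024, 64)
```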

ICML Conference 2024 Conference Paper

Position: Exploring the Robustness of Pipeline-Parallelism-Based Decentralized Training

  • Lin Lu 0003
  • Chenxi Dai
  • Wangcheng Tao
  • Binhang Yuan
  • Yanan Sun
  • Pan Zhou 0001

Modern machine learning applications increasingly demand greater computational resources for training large models. Decentralized training has emerged as an effective means to democratize this technology. However, the potential threats associated with this approach remain inadequately discussed, posing a hurdle to the development of decentralized training infrastructures. This paper aims to initiate discussion towards this end by exploring the robustness of decentralized training from three primary perspectives. Firstly, we articulate our position on establishing robust decentralized training by outlining potential threats and the corresponding countermeasures. Secondly, we illustrate a nascent poisoning attack targeting decentralized training frameworks, easily executable by malicious stages. To mitigate this security threat and ensure efficient training, we propose a robust training framework, integrating a 100% detection strategy and efficient training mechanisms. Finally, we demonstrate the severity of the proposed attack and the effectiveness of our robust training framework. This position paper emphasizes the urgency of exploring the robustness of decentralized training and proposes a feasible solution. The code is available at https://github.com/dcx001016/pipeline_attack.
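
As a purely hypothetical illustration of what detecting a tampered pipeline stage can look like, the sketch below has a verifier recompute a stage's forward pass on a probe input and compare the outputs. This generic redundancy check is our own example and is not the detection strategy proposed in the paper.

```python
# Hypothetical integrity check for a pipeline stage via redundant recomputation
# (our own illustrative example; NOT the paper's detection strategy).
import numpy as np

def honest_stage(x, W):
    return np.maximum(x @ W, 0.0)          # the stage's agreed-upon computation

def malicious_stage(x, W):
    return honest_stage(x, W) + 0.5        # a stage that tampers with its activations

def stage_is_consistent(stage_fn, W, probe, tol=1e-6):
    """A verifier recomputes the stage on a probe input and compares the outputs."""
    return np.allclose(stage_fn(probe, W), honest_stage(probe, W), atol=tol)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 8))
    probe = rng.standard_normal((4, 8))
    print(stage_is_consistent(honest_stage, W, probe))     # True
    print(stage_is_consistent(malicious_stage, W, probe))  # False
```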

IJCAI Conference 2024 Conference Paper

Revisiting Neural Networks for Continual Learning: An Architectural Perspective

  • Aojun Lu
  • Tao Feng
  • Hangjie Yuan
  • Xiaotian Song
  • Yanan Sun

Efforts to overcome catastrophic forgetting have primarily centered around developing more effective Continual Learning (CL) methods. In contrast, less attention has been devoted to analyzing the role of network architecture design (e.g., network depth, width, and components) in contributing to CL. This paper seeks to bridge the gap between network architecture design and CL, and presents a holistic study on the impact of network architectures on CL. This work considers architecture design at the network scaling level, i.e., width and depth, and also at the level of network components, i.e., skip connections, global pooling layers, and down-sampling. In both cases, we first derive insights by systematically exploring how architectural designs affect CL. Then, grounded in these insights, we craft a specialized search space for CL and propose a simple yet effective ArchCraft method to steer toward CL-friendly architectures; specifically, the method recrafts AlexNet/ResNet into AlexAC/ResAC. Experimental validation across various CL settings and scenarios demonstrates that the improved architectures are parameter-efficient, achieving state-of-the-art CL performance while being 86%, 61%, and 97% more compact in terms of parameters than the naive CL architecture in Task IL and Class IL. Code is available at https://github.com/byyx666/ArchCraft.
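
The scaling knobs the study varies (width and depth) can be made concrete with a small builder. The sketch below is a hypothetical helper, not ArchCraft itself: it constructs plain CNNs from a width multiplier and a block count so that candidates with similar parameter budgets but different width/depth trade-offs can be compared.

```python
# Sketch of the width/depth knobs the study varies (hypothetical builder, not ArchCraft).
import torch.nn as nn

def build_cnn(width, depth, num_classes=10):
    """A plain CNN whose capacity is controlled by a width multiplier and a block count."""
    layers, in_ch = [], 3
    for _ in range(depth):
        layers += [nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)]
        in_ch = width
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(width, num_classes)]
    return nn.Sequential(*layers)

def param_count(model):
    return sum(p.numel() for p in model.parameters())

if __name__ == "__main__":
    # Two candidates with similar parameter budgets but different width/depth trade-offs.
    wide_shallow = build_cnn(width=128, depth=2)
    narrow_deep = build_cnn(width=64, depth=5)
    print(param_count(wide_shallow), param_count(narrow_deep))
```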

NeurIPS Conference 2022 Conference Paper

Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor

  • Yuqiao Liu
  • Yehui Tang
  • Zeqiong Lv
  • Yunhe Wang
  • Yanan Sun

Neural Architecture Search (NAS) can automatically design promising neural architectures without human expertise. Although NAS has achieved great success, a prohibitively high search cost is required to find a high-performance architecture, which blocks its practical implementation. A neural predictor can directly evaluate the performance of neural networks based on their architectures and thereby save much of this budget. However, existing neural predictors require substantial annotated architectures trained from scratch, which still consume considerable computational resources. To solve this issue, we propose a Cross-Domain Predictor (CDP), which is trained on existing NAS benchmark datasets (e.g., NAS-Bench-101) but can be used to find high-performance architectures in large-scale search spaces. In particular, we propose a progressive subspace adaptation strategy to address the domain discrepancy between the source architecture space and the target space. Considering the large difference between the two architecture spaces, an assistant space is developed to smooth the transfer process. Compared with existing NAS methods, the proposed CDP is much more efficient. For example, CDP requires a search cost of only 0.1 GPU days to find architectures with 76.9% top-1 accuracy on ImageNet and 97.51% on CIFAR-10.
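
A generic way to reuse a predictor across architecture spaces is to pre-train it on the abundant source annotations and then fine-tune it on a few target annotations. The sketch below shows only this baseline transfer idea with made-up data and a shared encoding dimension; CDP's progressive subspace adaptation and assistant space are not reproduced.

```python
# Generic cross-domain transfer of a predictor (fine-tuning sketch; not CDP's method).
import torch
import torch.nn as nn

def make_predictor(in_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def train(model, X, y, steps, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(X).squeeze(-1), y)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    torch.manual_seed(0)
    # Abundant annotated source architectures (e.g., from a NAS benchmark) ...
    Xs, ys = torch.randn(512, 32), torch.rand(512)
    # ... and only a handful of annotated target-space architectures.
    Xt, yt = torch.randn(16, 32), torch.rand(16)
    model = train(make_predictor(32), Xs, ys, steps=200, lr=1e-3)   # pre-train on source
    model = train(model, Xt, yt, steps=50, lr=1e-4)                 # adapt to target
    print(model(Xt[:2]).squeeze(-1).detach())
```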