Arrow Research search

Author name cluster

Yi Zeng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

28 papers
1 author row

Possible papers

28

AAAI Conference 2026 Conference Paper

Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble

  • Jihang Wang
  • Dongcheng Zhao
  • Ruolin Chen
  • Qian Zhang
  • Yi Zeng

Spiking Neural Networks (SNNs) offer a promising direction for energy-efficient and brain-inspired computing, yet their vulnerability to adversarial perturbations remains poorly understood. In this work, we revisit the adversarial robustness of SNNs through the lens of temporal ensembling, treating the network as a collection of evolving sub-networks across discrete timesteps. This formulation uncovers two critical but underexplored challenges—the fragility of individual temporal sub-networks and the tendency for adversarial vulnerabilities to transfer across time. To overcome these limitations, we propose Robust Temporal self-Ensemble (RTE), a training framework that improves the robustness of each sub-network while reducing the temporal transferability of adversarial perturbations. RTE integrates both objectives into a unified loss and employs a stochastic sampling strategy for efficient optimization. Extensive experiments across multiple benchmarks demonstrate that RTE consistently outperforms existing training methods in robust-accuracy trade-off. Additional analyses reveal that RTE reshapes the internal robustness landscape of SNNs, leading to more resilient and temporally diversified decision boundaries. Our study highlights the importance of temporal structure in adversarial learning and offers a principled foundation for building robust spiking models.

NeurIPS Conference 2025 Conference Paper

AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration

  • Andy Zhou
  • Kevin Wu
  • Francesco Pinto
  • Zhaorun Chen
  • Yi Zeng
  • Yu Yang
  • Shuang Yang
  • Sanmi Koyejo

As large language models (LLMs) become increasingly capable, security and safety evaluation are crucial. While current red teaming approaches have made strides in assessing LLM vulnerabilities, they often rely heavily on human input and lack comprehensive coverage of emerging attack vectors. This paper introduces AutoRedTeamer, a novel framework for fully automated, end-to-end red teaming against LLMs. AutoRedTeamer combines a multi-agent architecture with a memory-guided attack selection mechanism to enable continuous discovery and integration of new attack vectors. The dual-agent framework consists of a red teaming agent that can operate from high-level risk categories alone to generate and execute test cases, and a strategy proposer agent that autonomously discovers and implements new attacks by analyzing recent research. This modular design allows AutoRedTeamer to adapt to emerging threats while maintaining strong performance on existing attack vectors. We demonstrate AutoRedTeamer’s effectiveness across diverse evaluation settings, achieving 20% higher attack success rates on HarmBench against Llama-3. 1-70B while reducing computational costs by 46% compared to existing approaches. AutoRedTeamer also matches the diversity of human-curated benchmarks in generating test cases, providing a comprehensive, scalable, and continuously evolving framework for evaluating the security of AI systems.

IJCAI Conference 2025 Conference Paper

Brain-Inspired Stepwise Patch Merging for Vision Transformers

  • Yonghao Yu
  • Dongcheng Zhao
  • Guobin Shen
  • Yiting Dong
  • Yi Zeng

The hierarchical architecture has become a mainstream design paradigm for Vision Transformers (ViTs), with Patch Merging serving as the pivotal component that transforms a columnar architecture into a hierarchical one. Drawing inspiration from the brain's ability to integrate global and local information for comprehensive visual understanding, we propose Stepwise Patch Merging (SPM), which enhances the subsequent attention mechanism's ability to 'see' better. SPM consists of Multi-Scale Aggregation (MSA) and Guided Local Enhancement (GLE) striking a proper balance between long-range dependency modeling and local feature enhancement. Extensive experiments conducted on benchmark datasets, including ImageNet-1K, COCO, and ADE20K, demonstrate that SPM significantly improves the performance of various models, particularly in dense prediction tasks such as object detection and semantic segmentation. Meanwhile, experiments show that combining SPM with different backbones can further improve performance. The code has been released at https: //github. com/Yonghao-Yu/StepwisePatchMerging.

AAAI Conference 2025 Conference Paper

EventZoom: A Progressive Approach to Event-Based Data Augmentation for Enhanced Neuromorphic Vision

  • Yiting Dong
  • Xiang He
  • Guobin Shen
  • Dongcheng Zhao
  • Yang Li
  • Yi Zeng

Dynamic Vision Sensors (DVS) capture event data with high temporal resolution and low power consumption, presenting a more efficient solution for visual processing in dynamic and real-time scenarios compared to conventional video capture methods. Event data augmentation serves as an essential method for overcoming the limitation of scale and diversity in event datasets. Our comparative experiments demonstrate that the two factors, spatial integrity and temporal continuity, can significantly affect the capacity of event data augmentation, which guarantee the maintenance of the sparsity and high dynamic range characteristics unique to event data. However, existing augmentation methods often neglect the preservation of spatial integrity and temporal continuity. To address this, we developed a novel event data augmentation strategy EventZoom, which employs a temporal progressive strategy, embedding transformed samples into the original samples through progressive scaling and shifting. The scaling process avoids the spatial information loss associated with cropping, while the progressive strategy prevents interruptions or abrupt changes in temporal information. We validated EventZoom across various supervised learning frameworks. The experimental results show that EventZoom consistently outperforms existing event data augmentation methods with SOTA performance. For the first time, we have concurrently employed Semi-supervised and Unsupervised learning to verify feasibility on event augmentation algorithms, demonstrating the applicability and effectiveness of EventZoom as a powerful event-based data augmentation tool in handling real-world scenes with high dynamics and variability environments.

NeurIPS Conference 2025 Conference Paper

Learning the Plasticity: Plasticity-Driven Learning Framework in Spiking Neural Networks

  • Guobin Shen
  • Dongcheng Zhao
  • Yiting Dong
  • Yang Li
  • Feifei Zhao
  • Yi Zeng

The evolution of the human brain has led to the development of complex synaptic plasticity, enabling dynamic adaptation to a constantly evolving world. This progress inspires our exploration into a new paradigm for Spiking Neural Networks (SNNs): a Plasticity-Driven Learning Framework (PDLF). This paradigm diverges from traditional neural network models that primarily focus on direct training of synaptic weights, leading to static connections that limit adaptability in dynamic environments. Instead, our approach delves into the heart of synaptic behavior, prioritizing the learning of plasticity rules themselves. This shift in focus from weight adjustment to mastering the intricacies of synaptic change offers a more flexible and dynamic pathway for neural networks to evolve and adapt. Our PDLF does not merely adapt existing concepts of functional and Presynaptic-Dependent Plasticity but redefines them, aligning closely with the dynamic and adaptive nature of biological learning. This reorientation enhances key cognitive abilities in artificial intelligence systems, such as working memory and multitasking capabilities, and demonstrates superior adaptability in complex, real-world scenarios. Moreover, our framework sheds light on the intricate relationships between various forms of plasticity and cognitive functions, thereby contributing to a deeper understanding of the brain's learning mechanisms. Integrating this groundbreaking plasticity-centric approach in SNNs marks a significant advancement in the fusion of neuroscience and artificial intelligence. It paves the way for developing AI systems that not only learn but also adapt in an ever-changing world, much like the human brain.

NeurIPS Conference 2025 Conference Paper

MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query

  • Wei Chow
  • Yuan Gao
  • Linfeng Li
  • Xian Wang
  • Qi Xu
  • Hang Song
  • Lingdong Kong
  • Ran Zhou

Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the expressive capacity of visual information as evidenced by maintained performance when images are replaced with captions. However, practical retrieval scenarios frequently involve interleaved multi-condition queries with multiple images. Hence, this paper introduces MERIT, the first multilingual dataset for interleaved multi-condition semantic retrieval, comprising 320, 000 queries with 135, 000 products in 5 languages, covering 7 distinct product categories. Extensive experiments on MERIT identify existing models's critical limitation: focusing solely on global semantic information while neglecting specific conditional elements in queries. Consequently, we propose Coral, a novel fine-tuning framework that adapts pre-trained MLLMs by integrating embedding reconstruction to preserve fine-grained conditional elements and contrastive learning to extract comprehensive global semantics. Experiments demonstrate that Coral achieves a 45. 9% performance improvement over conventional approaches on MERIT, with strong generalization capabilities validated across 8 established retrieval benchmarks. Collectively, our contributions—a novel dataset, identification of critical limitations in existing approaches, and an innovative fine-tuning framework—establish a foundation for future research in interleaved multi-condition semantic retrieval. Data & Code: MERIT-2025. github. io

NeurIPS Conference 2025 Conference Paper

STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

  • Sicheng Shen
  • Dongcheng Zhao
  • Linghao Feng
  • Zeyang Yue
  • Jindong Li
  • Tenglong Li
  • Guobin Shen
  • Yi Zeng

Spiking Transformers have recently emerged as promising architectures for combining the efficiency of spiking neural networks with the representational power of self-attention. However, the lack of standardized implementations, evaluation pipelines, and consistent design choices has hindered fair comparison and principled analysis. In this paper, we introduce \textbf{STEP}, a unified benchmark framework for Spiking Transformers that supports a wide range of tasks, including classification, segmentation, and detection across static, event-based, and sequential datasets. STEP provides modular support for diverse components such as spiking neurons, input encodings, surrogate gradients, and multiple backends (e. g. , SpikingJelly, BrainCog). Using STEP, we reproduce and evaluate several representative models, and conduct systematic ablation studies on attention design, neuron types, encoding schemes, and temporal modeling capabilities. We also propose a unified analytical model for energy estimation, accounting for spike sparsity, bitwidth, and memory access, and show that quantized ANNs may offer comparable or better energy efficiency. Our results suggest that current Spiking Transformers rely heavily on convolutional frontends and lack strong temporal modeling, underscoring the need for spike-native architectural innovations. The full code is available at: https: //github. com/Fancyssc/STEP.

AAAI Conference 2025 Conference Paper

StressPrompt: Does Stress Impact Large Language Models and Human Performance Similarly?

  • Guobin Shen
  • Dongcheng Zhao
  • Aorigele Bao
  • Xiang He
  • Yiting Dong
  • Yi Zeng

Human beings often experience stress, which can significantly influence their performance. This study explores whether Large Language Models (LLMs) exhibit stress responses similar to those of humans and whether their performance fluctuates under different stress-inducing prompts. To investigate this, we developed a novel set of prompts, termed StressPrompt, designed to induce varying levels of stress. These prompts were derived from established psychological frameworks and carefully calibrated based on ratings from human participants. We then applied these prompts to several LLMs to assess their responses across a range of tasks, including instruction-following, complex reasoning, and emotional intelligence. The findings suggest that LLMs, like humans, perform optimally under moderate stress, consistent with the Yerkes-Dodson law. Notably, their performance declines under both low and high-stress conditions. Our analysis further revealed that these StressPrompts significantly alter the internal states of LLMs, leading to changes in their neural representations that mirror human responses to stress. This research provides critical insights into the operational robustness and flexibility of LLMs, demonstrating the importance of designing AI systems capable of maintaining high performance in real-world scenarios where stress is prevalent, such as in customer service, healthcare, and emergency response contexts. Moreover, this study contributes to the broader AI research community by offering a new perspective on how LLMs handle different scenarios and their similarities to human cognition.

AAAI Conference 2024 Conference Paper

An Efficient Knowledge Transfer Strategy for Spiking Neural Networks from Static to Event Domain

  • Xiang He
  • Dongcheng Zhao
  • Yang Li
  • Guobin Shen
  • Qingqun Kong
  • Yi Zeng

Spiking neural networks (SNNs) are rich in spatio-temporal dynamics and are suitable for processing event-based neuromorphic data. However, event-based datasets are usually less annotated than static datasets. This small data scale makes SNNs prone to overfitting and limits their performance. In order to improve the generalization ability of SNNs on event-based datasets, we use static images to assist SNN training on event data. In this paper, we first discuss the domain mismatch problem encountered when directly transferring networks trained on static datasets to event data. We argue that the inconsistency of feature distributions becomes a major factor hindering the effective transfer of knowledge from static images to event data. To address this problem, we propose solutions in terms of two aspects: feature distribution and training strategy. Firstly, we propose a knowledge transfer loss, which consists of domain alignment loss and spatio-temporal regularization. The domain alignment loss learns domain-invariant spatial features by reducing the marginal distribution distance between the static image and the event data. Spatio-temporal regularization provides dynamically learnable coefficients for domain alignment loss by using the output features of the event data at each time step as a regularization term. In addition, we propose a sliding training strategy, which gradually replaces static image inputs probabilistically with event data, resulting in a smoother and more stable training for the network. We validate our method on neuromorphic datasets, including N-Caltech101, CEP-DVS, and N-Omniglot. The experimental results show that our proposed method achieves better performance on all datasets compared to the current state-of-the-art methods. Code is available at https://github.com/Brain-Cog-Lab/Transfer-for-DVS.

NeurIPS Conference 2024 Conference Paper

Fairness-Aware Meta-Learning via Nash Bargaining

  • Yi Zeng
  • Xuelin Yang
  • Li Chen
  • Cristian C. Ferrer
  • Ming Jin
  • Michael I. Jordan
  • Ruoxi Jia

To address issues of group-level fairness in machine learning, it is natural to adjust model parameters based on specific fairness objectives over a sensitive-attributed validation set. Such an adjustment procedure can be cast within a meta-learning framework. However, naive integration of fairness goals via meta-learning can cause hypergradient conflicts for subgroups, resulting in unstable convergence and compromising model performance and fairness. To navigate this issue, we frame the resolution of hypergradient conflicts as a multi-player cooperative bargaining game. We introduce a two-stage meta-learning framework in which the first stage involves the use of a Nash Bargaining Solution (NBS) to resolve hypergradient conflicts and steer the model toward the Pareto front, and the second stage optimizes with respect to specific fairness goals. Our method is supported by theoretical results, notably a proof of the NBS for gradient aggregation free from linear independence assumptions, a proof of Pareto improvement, and a proof of monotonic improvement in validation loss. We also show empirical effects across various fairness objectives in six key fairness datasets and two image classification tasks.

EAAI Journal 2024 Journal Article

Harnessing fuzzy logic for building structural health during large diameter tunnelling in urban area

  • Pierre Guy Atangana Njock
  • Shui-Long Shen
  • Annan Zhou
  • Zhen-Yu Yin
  • Yi Zeng

Building structural health is constantly jeopardized by tunnel construction, which thus is in a sustained demand for holistic risk assessment and management approaches. In this study, we propose an enhanced consulting fuzzy model with limited information losses to evaluate the static and dynamic risks of building health deterioration during tunnelling. The model integrates three main strategies including, (i) reducing influence of experts’ bias in the ranking processes; (ii) expanding the space of uncertainties in fuzzy environment, and (iii) controlling risk aversion settings through the prospect theory. The construction of the largest diameter (15. 3 m) tunnel in China is adopted as a case study to demonstrate the application of the proposed model in assessing the risk statuses of surrounding buildings. Through the proposed model, the risk index can indicate beneficial or detrimental effects of tunnelling and the degree of risk factors delineates priority actions for risk management. The results enable the development of operational risk abatement procedures to enhance the state of practice of large diameter tunnel construction in complex geological conditions.

AAAI Conference 2024 Conference Paper

Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection

  • Hongquan Zhang
  • Bin-Bin Gao
  • Yi Zeng
  • Xudong Tian
  • Xin Tan
  • Zhizhong Zhang
  • Yanyun Qu
  • Jun Liu

Class-incremental object detection (CIOD) is a real-world desired capability, requiring an object detector to continuously adapt to new tasks without forgetting learned ones, with the main challenge being catastrophic forgetting. Many methods based on distillation and replay have been proposed to alleviate this problem. However, they typically learn on a pure visual backbone, neglecting the powerful representation capabilities of textual cues, which to some extent limits their performance. In this paper, we propose task-aware language-image representation to mitigate catastrophic forgetting, introducing a new paradigm for language-image-based CIOD. First of all, we demonstrate the significant advantage of language-image detectors in mitigating catastrophic forgetting. Secondly, we propose a learning task-aware language-image representation method that overcomes the existing drawback of directly utilizing the language-image detector for CIOD. More specifically, we learn the language-image representation of different tasks through an insulating approach in the training stage, while using the alignment scores produced by task-specific language-image representation in the inference stage. Through our proposed method, language-image detectors can be more practical for CIOD. We conduct extensive experiments on COCO 2017 and Pascal VOC 2007 and demonstrate that the proposed method achieves state-of-the-art results under the various CIOD settings.

AAAI Conference 2024 Conference Paper

MatchDet: A Collaborative Framework for Image Matching and Object Detection

  • Jinxiang Lai
  • Wenlong Wu
  • Bin-Bin Gao
  • Jun Liu
  • Jiawei Zhan
  • Congchong Nie
  • Yi Zeng
  • Chengjie Wang

Image matching and object detection are two fundamental and challenging tasks, while many related applications consider them two individual tasks (i.e. task-individual). In this paper, a collaborative framework called MatchDet (i.e. task-collaborative) is proposed for image matching and object detection to obtain mutual improvements. To achieve the collaborative learning of the two tasks, we propose three novel modules, including a Weighted Spatial Attention Module (WSAM) for Detector, and Weighted Attention Module (WAM) and Box Filter for Matcher. Specifically, the WSAM highlights the foreground regions of target image to benefit the subsequent detector, the WAM enhances the connection between the foreground regions of pair images to ensure high-quality matches, and Box Filter mitigates the impact of false matches. We evaluate the approaches on a new benchmark with two datasets called Warp-COCO and miniScanNet. Experimental results show our approaches are effective and achieve competitive improvements.

NeurIPS Conference 2024 Conference Paper

Neuro-Vision to Language: Enhancing Brain Recording-based Visual Reconstruction and Language Interaction

  • Guobin Shen
  • Dongcheng Zhao
  • Xiang He
  • Linghao Feng
  • Yiting Dong
  • Jihang Wang
  • Qian Zhang
  • Yi Zeng

Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

NeurIPS Conference 2024 Conference Paper

RedCode: Risky Code Execution and Generation Benchmark for Code Agents

  • Chengquan Guo
  • Xun Liu
  • Chulin Xie
  • Andy Zhou
  • Yi Zeng
  • Zinan Lin
  • Dawn Song
  • Bo Li

With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents’ safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents’ ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e. g. , Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4, 000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents’ vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https: //github. com/AI-secure/RedCode.

IJCAI Conference 2024 Conference Paper

TIM: An Efficient Temporal Interaction Module for Spiking Transformer

  • Sicheng Shen
  • Dongcheng Zhao
  • Guobin Shen
  • Yi Zeng

Spiking Neural Networks (SNNs), as the third generation of neural networks, have gained prominence for their biological plausibility and computational efficiency, especially in processing diverse datasets. The integration of attention mechanisms, inspired by advancements in neural network architectures, has led to the development of Spiking Transformers. These have shown promise in enhancing SNNs' capabilities, particularly in the realms of both static and neuromorphic datasets. Despite their progress, a discernible gap exists in these systems, specifically in the Spiking Self Attention (SSA) mechanism's effectiveness in leveraging the temporal processing potential of SNNs. To address this, we introduce the Temporal Interaction Module (TIM), a novel, convolution-based enhancement designed to augment the temporal data processing abilities within SNN architectures. TIM's integration into existing SNN frameworks is seamless and efficient, requiring minimal additional parameters while significantly boosting their temporal information handling capabilities. Through rigorous experimentation, TIM has demonstrated its effectiveness in exploiting temporal information, leading to state-of-the-art performance across various neuromorphic datasets. The code is available at https: //github. com/BrainCog-X/Brain-Cog/tree/main/examples/TIM.

NeurIPS Conference 2023 Conference Paper

Bullying10K: A Large-Scale Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

  • Yiting Dong
  • Yang Li
  • Dongcheng Zhao
  • Guobin Shen
  • Yi Zeng

The prevalence of violence in daily life poses significant threats to individuals' physical and mental well-being. Using surveillance cameras in public spaces has proven effective in proactively deterring and preventing such incidents. However, concerns regarding privacy invasion have emerged due to their widespread deployment. To address the problem, we leverage Dynamic Vision Sensors (DVS) cameras to detect violent incidents and preserve privacy since it captures pixel brightness variations instead of static imagery. We introduce the Bullying10K dataset, encompassing various actions, complex movements, and occlusions from real-life scenarios. It provides three benchmarks for evaluating different tasks: action recognition, temporal action localization, and pose estimation. With 10, 000 event segments, totaling 12 billion events and 255 GB of data, Bullying10K contributes significantly by balancing violence detection and personal privacy persevering. And it also poses a challenge to the neuromorphic dataset. It will serve as a valuable resource for training and developing privacy-protecting video systems. The Bullying10K opens new possibilities for innovative approaches in these domains.

IJCAI Conference 2023 Conference Paper

Enhancing Efficient Continual Learning with Dynamic Structure Development of Spiking Neural Networks

  • Bing Han
  • Feifei Zhao
  • Yi Zeng
  • Wenxuan Pan
  • Guobin Shen

Children possess the ability to learn multiple cognitive tasks sequentially, which is a major challenge toward the long-term goal of artificial general intelligence. Existing continual learning frameworks are usually applicable to Deep Neural Networks (DNNs) and lack the exploration on more brain-inspired, energy-efficient Spiking Neural Networks (SNNs). Drawing on continual learning mechanisms during child growth and development, we propose Dynamic Structure Development of Spiking Neural Networks (DSD-SNN) for efficient and adaptive continual learning. When learning a sequence of tasks, the DSD-SNN dynamically assigns and grows new neurons to new tasks and prunes redundant neurons, thereby increasing memory capacity and reducing computational overhead. In addition, the overlapping shared structure helps to quickly leverage all acquired knowledge to new tasks, empowering a single network capable of supporting multiple incremental tasks (without the separate sub-network mask for each task). We validate the effectiveness of the proposed model on multiple class incremental learning and task incremental learning benchmarks. Extensive experiments demonstrated that our model could significantly improve performance, learning speed and memory capacity, and reduce computational overhead. Besides, our DSD-SNN model achieves comparable performance with the DNNs-based methods, and significantly outperforms the state-of-the-art (SOTA) performance for existing SNNs-based continual learning methods.

TMLR Journal 2023 Journal Article

Turning a Curse into a Blessing: Enabling In-Distribution-Data-Free Backdoor Removal via Stabilized Model Inversion

  • Si Chen
  • Yi Zeng
  • Won Park
  • Jiachen T. Wang
  • Xun Chen
  • Lingjuan Lyu
  • Zhuoqing Mao
  • Ruoxi Jia

The effectiveness of many existing techniques for removing backdoors from machine learning models relies on access to clean in-distribution data. However, given that these models are often trained on proprietary datasets, it may not be practical to assume that in-distribution samples will always be available. On the other hand, model inversion techniques, which are typically viewed as privacy threats, can reconstruct realistic training samples from a given model, potentially eliminating the need for in-distribution data. To date, the only prior attempt to integrate backdoor removal and model inversion involves a simple combination that produced very limited results. This work represents a first step toward a more thorough understanding of how model inversion techniques could be leveraged for effective backdoor removal. Specifically, we seek to answer several key questions: What properties must reconstructed samples possess to enable successful defense? Is perceptual similarity to clean samples enough, or are additional characteristics necessary? Is it possible for reconstructed samples to contain backdoor triggers? We demonstrate that relying solely on perceptual similarity is insufficient for effective defenses. The stability of model predictions in response to input and parameter perturbations also plays a critical role. To address this, we propose a new bi-level optimization based framework for model inversion that promotes stability in addition to visual quality. Interestingly, we also find that reconstructed samples from a pre-trained generator's latent space do not contain backdoors, even when signals from a backdoored model are utilized for reconstruction. We provide a theoretical analysis to explain this observation. Our evaluation shows that our stabilized model inversion technique achieves state-of-the-art backdoor removal performance without requiring access to clean in-distribution data. Furthermore, its performance is on par with or even better than using the same amount of clean samples.

NeurIPS Conference 2023 Conference Paper

Where Did I Come From? Origin Attribution of AI-Generated Images

  • Zhenting Wang
  • Chen Chen
  • Yi Zeng
  • Lingjuan Lyu
  • Shiqing Ma

Image generation techniques have been gaining increasing attention recently, but concerns have been raised about the potential misuse and intellectual property (IP) infringement associated with image generation models. It is, therefore, necessary to analyze the origin of images by inferring if a specific image was generated by a particular model, i. e. , origin attribution. Existing methods only focus on specific types of generative models and require additional procedures during the training phase or generation phase. This makes them unsuitable for pre-trained models that lack these specific operations and may impair generation quality. To address this problem, we first develop an alteration-free and model-agnostic origin attribution method via reverse-engineering on image generation models, i. e. , inverting the input of a particular model for a specific image. Given a particular model, we first analyze the differences in the hardness of reverse-engineering tasks for generated samples of the given model and other images. Based on our analysis, we then propose a method that utilizes the reconstruction loss of reverse-engineering to infer the origin. Our proposed method effectively distinguishes between generated images of a specific generative model and other images, i. e. , images generated by other models and real images.

NeurIPS Conference 2022 Conference Paper

CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks

  • Xuanli He
  • Qiongkai Xu
  • Yi Zeng
  • Lingjuan Lyu
  • Fangzhao Wu
  • Jiwei Li
  • Ruoxi Jia

Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. In order to protect the IP of text generation APIs, recent work has introduced a watermarking algorithm and utilized the null-hypothesis test as a post-hoc ownership verification on the imitation models. However, we find that it is possible to detect those watermarks via sufficient statistics of the frequencies of candidate watermarking words. To address this drawback, in this paper, we propose a novel Conditional wATERmarking framework (CATER) for protecting the IP of text generation APIs. An optimization method is proposed to decide the watermarking rules that can minimize the distortion of overall word distributions while maximizing the change of conditional word selections. Theoretically, we prove that it is infeasible for even the savviest attacker (they know how CATER works) to reveal the used watermarks from a large pool of potential word pairs based on statistical inspection. Empirically, we observe that high-order conditions lead to an exponential growth of suspicious (unused) watermarks, making our crafted watermarks more stealthy. In addition, CATER can effectively identify IP infringement under architectural mismatch and cross-domain imitation attacks, with negligible impairments on the generation quality of victim APIs. We envision our work as a milestone for stealthily protecting the IP of text generation APIs.

IJCAI Conference 2022 Conference Paper

Efficient and Accurate Conversion of Spiking Neural Network with Burst Spikes

  • Yang Li
  • Yi Zeng

Spiking neural network (SNN), as a brain-inspired energy-efficient neural network, has attracted the interest of researchers. While the training of spiking neural networks is still an open problem. One effective way is to map the weight of trained ANN to SNN to achieve high reasoning ability. However, the converted spiking neural network often suffers from performance degradation and a considerable time delay. To speed up the inference process and obtain higher accuracy, we theoretically analyze the errors in the conversion process from three perspectives: the differences between IF and ReLU, time dimension, and pooling operation. We propose a neuron model for releasing burst spikes, a cheap but highly efficient method to solve residual information. In addition, Lateral Inhibition Pooling (LIPooling) is proposed to solve the inaccuracy problem caused by MaxPooling in the conversion process. Experimental results on CIFAR and ImageNet demonstrate that our algorithm is efficient and accurate. For example, our method can ensure nearly lossless conversion of SNN and only use about 1/10 (less than 100) simulation time under 0. 693x energy consumption of the typical method. Our code is available at https: //github. com/Brain-Inspired-Cognitive-Engine/Conversion_Burst.

AAAI Conference 2022 System Paper

FORCE: A Framework of Rule-Based Conversational Recommender System

  • Jun Quan
  • Ze Wei
  • Qiang Gan
  • Jingqi Yao
  • Jingyi Lu
  • Yuchen Dong
  • Yiming Liu
  • Yi Zeng

The conversational recommender systems (CRSs) have received extensive attention in recent years. However, most of the existing works focus on various deep learning models, which are largely limited by the requirement of large-scale human-annotated datasets. Such methods are not able to deal with the cold-start scenarios in industrial products. To alleviate the problem, we propose FORCE, a Framework Of Rulebased Conversational rEcommender system that helps developers to quickly build CRS bots by simple configuration. We conduct experiments on two datasets in different languages and domains to verify its effectiveness and usability.

AAAI Conference 2022 Conference Paper

SCIR-Net: Structured Color Image Representation Based 3D Object Detection Network from Point Clouds

  • Qingdong He
  • Hao Zeng
  • Yi Zeng
  • Yijun Liu

3D object detection from point clouds data has become an indispensable part in autonomous driving. Previous works for processing point clouds lie in either projection or voxelization. However, projection-based methods suffer from information loss while voxelization-based methods bring huge computation. In this paper, we propose to encode point clouds into structured color image representation (SCIR) and utilize 2D CNN to fulfill the 3D detection task. Specifically, we use the structured color image encoding module to convert the irregular 3D point clouds into a squared 2D tensor image, where each point corresponds to a spatial point in the 3D space. Furthermore, in order to fit for the Euclidean structure, we apply feature normalization to parameterize the 2D tensor image onto a regular dense color image. Then, we conduct repeated multi-scale fusion with different levels so as to augment the initial features and learn scale-aware feature representations for box prediction. Extensive experiments on KITTI benchmark, Waymo Open Dataset and more challenging nuScenes dataset show that our proposed method yields decent results and demonstrate the effectiveness of such representations for point clouds.

AAAI Conference 2022 Conference Paper

SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds

  • Qingdong He
  • Zhengning Wang
  • Hao Zeng
  • Yi Zeng
  • Yijun Liu

Accurate 3D object detection from point clouds has become a crucial component in autonomous driving. However, the volumetric representations and the projection methods in previous works fail to establish the relationships between the local point sets. In this paper, we propose Sparse Voxel-Graph Attention Network (SVGA-Net), a novel end-to-end trainable network which mainly contains voxel-graph module and sparse-to-dense regression module to achieve comparable 3D detection tasks from raw LIDAR data. Specifically, SVGA- Net constructs the local complete graph within each divided 3D spherical voxel and global KNN graph through all voxels. The local and global graphs serve as the attention mechanism to enhance the extracted features. In addition, the novel sparse-to-dense regression module enhances the 3D box estimation accuracy through feature maps aggregation at different levels. Experiments on KITTI detection benchmark and Waymo Open dataset demonstrate the efficiency of extending the graph representation to 3D object detection and the proposed SVGA-Net can achieve decent detection accuracy.

IJCAI Conference 2021 Conference Paper

Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models

  • Shangwei Guo
  • Tianwei Zhang
  • Han Qiu
  • Yi Zeng
  • Tao Xiang
  • Yang Liu

Watermarking has become the tendency in protecting the intellectual property of DNN models. Recent works, from the adversary's perspective, attempted to subvert watermarking mechanisms by designing watermark removal attacks. However, these attacks mainly adopted sophisticated fine-tuning techniques, which have certain fatal drawbacks or unrealistic assumptions. In this paper, we propose a novel watermark removal attack from a different perspective. Instead of just fine-tuning the watermarked models, we design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations, which can effectively and blindly destroy the memorization of watermarked models to the watermark samples. We also introduce a lightweight fine-tuning strategy to preserve the model performance. Our solution requires much less resource or knowledge about the watermarking scheme than prior works. Extensive experimental results indicate that our attack can bypass state-of-the-art watermarking solutions with very high success rates. Based on our attack, we propose watermark augmentation techniques to enhance the robustness of existing watermarks.

AAAI Conference 2018 Conference Paper

A Plasticity-Centric Approach to Train the Non-Differential Spiking Neural Networks

  • Tielin Zhang
  • Yi Zeng
  • Dongcheng Zhao
  • Mengting Shi

Many efforts have been taken to train spiking neural networks (SNNs), but most of them still need improvements due to the discontinuous and non-differential characteristics of SNNs. While the mammalian brains solve these kinds of problems by integrating a series of biological plasticity learning rules. In this paper, we will focus on two biological plausible methodologies and try to solve these catastrophic training problems in SNNs. Firstly, the biological neural network will try to keep a balance between inputs and outputs on both the neuron and the network levels. Secondly, the biological synaptic weights will be passively updated by the changes of the membrane potentials of the neighbour-hood neurons, and the plasticity of synapses will not propagate back to other previous layers. With these biological inspirations, we propose Voltage-driven Plasticity-centric SNN (VPSNN), which includes four steps, namely: feed forward inference, unsupervised equilibrium state learning, supervised last layer learning and passively updating synaptic weights based on spiketiming dependent plasticity (STDP). Finally we get the accuracy of 98. 52% on the hand-written digits classification task on MNIST. In addition, with the help of a visualization tool, we try to analyze the black box of SNN and get better understanding of what benefits have been acquired by the proposed method. 1

IJCAI Conference 2018 Conference Paper

Brain-inspired Balanced Tuning for Spiking Neural Networks

  • Tielin Zhang
  • Yi Zeng
  • Dongcheng Zhao
  • Bo Xu

Due to the nature of Spiking Neural Networks (SNNs), it is challenging to be trained by biologically plausible learning principles. The multi-layered SNNs are with non-differential neurons, temporary-centric synapses, which make them nearly impossible to be directly tuned by back propagation. Here we propose an alternative biological inspired balanced tuning approach to train SNNs. The approach contains three main inspirations from the brain: Firstly, the biological network will usually be trained towards the state where the temporal update of variables are equilibrium (e. g. membrane potential); Secondly, specific proportions of excitatory and inhibitory neurons usually contribute to stable representations; Thirdly, the short-term plasticity (STP) is a general principle to keep the input and output of synapses balanced towards a better learning convergence. With these inspirations, we train SNNs with three steps: Firstly, the SNN model is trained with three brain-inspired principles; then weakly supervised learning is used to tune the membrane potential in the final layer for network classification; finally the learned information is consolidated from membrane potential into the weights of synapses by Spike-Timing Dependent Plasticity (STDP). The proposed approach is verified on the MNIST hand-written digit recognition dataset and the performance (the accuracy of 98. 64%) indicates that the ideas of balancing state could indeed improve the learning ability of SNNs, which shows the power of proposed brain-inspired approach on the tuning of biological plausible SNNs.