Author name cluster

Shan Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

AAAI Conference 2026 Conference Paper

Energy-based Autoregressive Generation for Neural Population Dynamics

Ningling Ge
Sicheng Dai
Yu Zhu
Shan Yu

Understanding brain function represents a fundamental goal in neuroscience, with critical implications for therapeutic interventions and neural engineering applications. Computational modeling provides a quantitative framework for accelerating this understanding, but faces a fundamental trade-off between computational efficiency and high-fidelity modeling. To address this limitation, we introduce a novel Energy-based Autoregressive Generation (EAG) framework that employs an energy-based transformer learning temporal dynamics in latent space through strictly proper scoring rules, enabling efficient generation with realistic population and single-neuron spiking statistics. Evaluation on synthetic Lorenz datasets and two Neural Latents Benchmark datasets (MC_Maze and Area2_bump) demonstrates that EAG achieves state-of-the-art generation quality with substantial computational efficiency improvements, particularly over diffusion-based methods. Beyond optimal performance, conditional generation applications show two capabilities: generalizing to unseen behavioral contexts and improving motor brain-computer interface decoding accuracy using synthetic neural data. These results demonstrate the effectiveness of energy-based modeling for neural population dynamics with applications in neuroscience research and neural engineering.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Multi-dimensional Neural Decoding with Orthogonal Representations for Brain-Computer Interfaces

Kaixi Tian
Shengjia Zhao
Yuhan Zhang
Shan Yu

Current brain-computer interfaces primarily decode single motor variables, limiting natural control requiring simultaneous multi-dimensional extraction. We introduce Multi-dimensional Neural Decoding (MND), a task that simultaneously extracts multiple motor variables (direction, position, velocity, acceleration) from single neural population recordings. MND faces two key challenges: cross-task interference when decoding correlated motor dimensions from shared cortical representations, and generalization issues across sessions, subjects, and paradigms. To address these challenges, we propose OrthoSchema, a multi-task framework inspired by cortical orthogonal subspace organization and cognitive schema reuse. OrthoSchema enforces representation orthogonality to eliminate cross-task interference and employs selective feature reuse transfer for few-shot cross-session, subject and paradigm adaptation. Experiments on macaque motor cortex datasets demonstrate that OrthoSchema significantly improves decoding accuracy in cross-session, subject and paradigm generalization tasks, with larger performance improvements when fine-tuning samples are limited. Ablation studies confirm the synergistic effects of all components are crucial, with OrthoSchema effectively modeling cross-task features and capturing session relationships for robust transfer. Our results provide new insights into scalable and robust neural decoding for real-world BCI applications.

PDF Details DOI

AAAI Conference 2025 Conference Paper

Efficient Reinforcement Learning Through Adaptively Pretrained Visual Encoder

Yuhan Zhang
Guoqing Ma
Guangfu Hao
Liangxuan Guo
Yang Chen
Shan Yu

While Reinforcement Learning (RL) agents can successfully learn to handle complex tasks, effectively generalizing acquired skills to unfamiliar settings remains a challenge. One reason is that the visual encoders used are task-dependent, preventing effective feature extraction in different settings. To address this issue, recent studies have tried to pretrain encoders with diverse visual inputs in order to improve their performance. However, they rely on existing pretrained encoders without further exploring the impact of the pretraining period. In this work, we propose APE: efficient reinforcement learning through Adaptively Pretrained visual Encoder, a framework that utilizes an adaptive augmentation strategy during the pretraining phase and extracts useful features with only a few interactions within the task environments in the policy learning period. Experiments are conducted across various domains, including DeepMind Control Suite, Atari Games and Memory Maze benchmarks, to verify the effectiveness of our method. Results show that mainstream RL methods, such as DreamerV3 and DrQ-v2, achieve state-of-the-art performance when equipped with APE. In addition, APE significantly improves the sampling efficiency during learning, approaching the efficiency of state-based methods using only visual inputs in several control tasks. These findings demonstrate the potential of adaptive pretraining of the encoder in enhancing the generalization ability and efficiency of visual RL algorithms.

PDF Details DOI

AAMAS Conference 2025 Conference Paper

Mitigating Non-Stationarity in Deep Reinforcement Learning with Clustering Orthogonal Weight Modification

Guoqing Ma
Yuhan Zhang
Yuming Dai
Guangfu Hao
Yang Chen
Shan Yu

RL agents often operate under the assumption of environmental stationarity, which poses a great challenge to learning efficiency since many environments are inherently non-stationary in state distribution. To address this issue, we introduce the Clustering Orthogonal Weight Modified (COWM) layer, which can be integrated into the policy network of any RL algorithm and mitigate non-stationarity effectively. By employing clustering techniques and a projection matrix, the COWM layer stabilizes the learning process. Empirically, the COWM layer is integrated into various RL methods and outperforms state-of-the-art methods on the DMControl benchmark, highlighting its robustness and generality across various tasks and algorithms.

PDF

AAAI Conference 2025 Conference Paper

Multi-Modal Latent Variables for Cross-Individual Primary Visual Cortex Modeling and Analysis

Yu Zhu
Bo Lei
Chunfeng Song
Wanli Ouyang
Shan Yu
Tiejun Huang

Elucidating the functional mechanisms of the primary visual cortex (V1) remains a fundamental challenge in systems neuroscience. Current computational models face two critical limitations, namely the challenge of cross-modal integration between partial neural recordings and complex visual stimuli, and the inherent variability in neural characteristics across individuals, including differences in neuron populations and firing patterns. To address these challenges, we present a multi-modal identifiable variational autoencoder (miVAE) that employs a two-level disentanglement strategy to map neural activity and visual stimuli into a unified latent space. This framework enables robust identification of cross-modal correlations through refined latent space modeling. We complement this with a novel score-based attribution analysis that traces latent variables back to their origins in the source data space. Evaluation on a large-scale mouse V1 dataset demonstrates that our method achieves state-of-the-art performance in cross-individual latent representation and alignment, without requiring subject-specific fine-tuning, and exhibits improved performance with increasing data size. Significantly, our attribution algorithm successfully identifies distinct neuronal subpopulations characterized by unique temporal patterns and stimulus discrimination properties, while simultaneously revealing stimulus regions that show specific sensitivity to edge features and luminance variations. This scalable framework offers promising applications not only for advancing V1 research but also for broader investigations in neuroscience.

PDF Details DOI

ICML Conference 2025 Conference Paper

Neural Representational Consistency Emerges from Probabilistic Neural-Behavioral Representation Alignment

Yu Zhu
Chunfeng Song
Wanli Ouyang
Shan Yu
Tiejun Huang 0003

Individual brains exhibit striking structural and physiological heterogeneity, yet neural circuits can generate remarkably consistent functional properties across individuals, an apparent paradox in neuroscience. While recent studies have observed preserved neural representations in motor cortex through manual alignment across subjects, the zero-shot validation of such preservation and its generalization to more cortices remain unexplored. Here we present PNBA (Probabilistic Neural-Behavioral Representation Alignment), a new framework that leverages probabilistic modeling to address hierarchical variability across trials, sessions, and subjects, with generative constraints preventing representation degeneration. By establishing reliable cross-modal representational alignment, PNBA reveals robust preserved neural representations in monkey primary motor cortex (M1) and dorsal premotor cortex (PMd) through zero-shot validation. We further establish similar representational preservation in mouse primary visual cortex (V1), reflecting a general neural basis. These findings resolve the paradox of neural heterogeneity by establishing zero-shot preserved neural representations across cortices and species, enriching neural coding insights and enabling zero-shot behavior decoding.

Details

IROS Conference 2025 Conference Paper

Visual Anomaly Detection for Reliable Robotic Implantation of Flexible Microelectrode Array

Yitong Chen
Xinyao Xu 0001
Ping Zhu
Xinyong Han
Fangbo Qin
Shan Yu

Flexible microelectrode (FME) implantation into the brain cortex is challenging due to the deformable fiber-like structure of the FME probe and the interaction with critical bio-tissue. To ensure reliability and safety, the implantation process should be monitored carefully. This paper develops an image-based anomaly detection framework based on the microscopic cameras of the robotic FME implantation system. The unified framework is utilized at four checkpoints to check the micro-needle, FME probe, hooking result, and implantation point, respectively. Exploiting the existing object localization results, the aligned regions of interest (ROIs) are extracted from the raw image and input to a pretrained vision transformer (ViT). Considering the task specifications, we propose a progressive granularity patch feature sampling method to address the sensitivity-tolerance trade-off issue at different locations. Moreover, we select a part of feature channels with higher signal-to-noise ratios from the raw general ViT features, to provide better descriptors for each specific scene. The effectiveness of the proposed methods is validated with the image datasets collected from our implantation system.

Details

ICRA Conference 2024 Conference Paper

AnyOKP: One-Shot and Instance-Aware Object Keypoint Extraction with Pretrained ViT

Fangbo Qin
Taogang Hou
Shan Lin
Kaiyuan Wang
Michael C. Yip
Shan Yu

Towards flexible object-centric visual perception, we propose a one-shot instance-aware object keypoint (OKP) extraction approach, AnyOKP, which leverages the powerful representation ability of pretrained vision transformer (ViT), and can obtain keypoints on multiple object instances of arbitrary category after learning from a support image. An off-the-shelf pretrained ViT is directly deployed for generalizable and transferable feature extraction, which is followed by training-free feature enhancement. The best-prototype pairs (BPPs) are searched for in support and query images based on appearance similarity, to yield instance-unaware candidate keypoints. Then, the entire graph with all candidate keypoints as vertices is divided into sub-graphs according to the feature distributions on the graph edges. Finally, each sub-graph represents an object instance. AnyOKP is evaluated on real object images collected with the cameras of a robot arm, a mobile robot, and a surgical robot, which not only demonstrates the cross-category flexibility and instance awareness, but also shows remarkable robustness to domain shift and viewpoint change.

Details

AAAI Conference 2024 Conference Paper

Continuous Rotation Group Equivariant Network Inspired by Neural Population Coding

Zhiqiang Chen
Yang Chen
Xiaolong Zou
Shan Yu

Neural population coding can represent continuous information by neurons with a series of discrete preferred stimuli, and we find that the bell-shaped tuning curve plays an important role in this mechanism. Inspired by this, we incorporate a bell-shaped tuning curve into the discrete group convolution to achieve continuous group equivariance. Simply, we modulate group convolution kernels by Gauss functions to obtain bell-shaped tuning curves. Benefiting from the modulation, kernels also gain smooth gradients on geometric dimensions (e.g., location dimension and orientation dimension). It allows us to generate group convolution kernels from sparse weights with learnable geometric parameters, which can achieve both competitive performance and parameter efficiency. Furthermore, we quantitatively prove that discrete group convolutions with proper tuning curves (bigger than 1x sampling step) can achieve continuous equivariance. Experimental results show that 1) our approach achieves very competitive performance on MNIST-rot with at least 75% fewer parameters than previous SOTA methods; 2) with small sample sizes, our approach exhibits more pronounced performance improvements (up to 24%); 3) it also has excellent rotation generalization ability on various datasets such as MNIST, CIFAR, and ImageNet with both plain and ResNet architectures.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Learning from Pattern Completion: Self-supervised Controllable Generation

Zhiqiang Chen
Guofan Fan
Jinying Gao
Lei Ma
Bo Lei
Tiejun Huang
Shan Yu

The human brain exhibits a strong ability to spontaneously associate different visual attributes of the same or similar visual scene, such as associating sketches and graffiti with real-world visual objects, usually without supervising information. In contrast, in the field of artificial intelligence, controllable generation methods like ControlNet heavily rely on annotated training datasets such as depth maps, semantic segmentation maps, and poses, which limits the method’s scalability. Inspired by the neural mechanisms that may contribute to the brain’s associative power, specifically the cortical modularization and hippocampal pattern completion, here we propose a self-supervised controllable generation (SCG) framework. Firstly, we introduce an equivariance constraint to promote inter-module independence and intra-module correlation in a modular autoencoder network, thereby achieving functional specialization. Subsequently, based on these specialized modules, we employ a self-supervised pattern completion approach for controllable generation training. Experimental results demonstrate that the proposed modular autoencoder effectively achieves functional specialization, including the modular processing of color, brightness, and edge detection, and exhibits brain-like features including orientation selectivity, color antagonism, and center-surround receptive fields. Through self-supervised training, associative generation capabilities spontaneously emerge in SCG, demonstrating excellent zero-shot generalization ability to various tasks such as super-resolution, dehazing, and associative or conditional generation on paintings, sketches, and ancient graffiti. Compared to the previous representative method ControlNet, our proposed approach not only demonstrates superior robustness in more challenging high-noise scenarios but also possesses more promising scalability potential due to its self-supervised manner. Codes are released on GitHub and Gitee.

PDF Details DOI

JMLR Journal 2024 Journal Article

Nonparametric Regression for 3D Point Cloud Learning

Xinyi Li
Shan Yu
Yueying Wang
Guannan Wang
Li Wang
Ming-Jun Lai

In recent years, there has been an exponentially increased amount of point clouds collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over the triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud effectively, provide a multi-resolution reconstruction of the actual signal, and handle sparse and irregularly distributed point clouds to recover the underlying trajectory. In addition, our method provides a natural way of numerosity data reduction. We establish the theoretical guarantees of the proposed method, including the convergence rate and asymptotic normality of the estimator, and show that the convergence rate achieves optimal nonparametric convergence. We also introduce a bootstrap method to quantify the uncertainty of the estimators. Through extensive simulation studies and a real data example, we demonstrate the superiority of the proposed method over traditional smoothing methods in terms of estimation accuracy and efficiency of data reduction.

PDF Details

NeurIPS Conference 2024 Conference Paper

Retrospective for the Dynamic Sensorium Competition for predicting large-scale mouse primary visual cortex activity from videos

Polina Turishcheva
Paul G. Fahey
Michaela Vystrčilová
Laura Hansel
Rachel Froebe
Kayla Ponder
Yongrong Qiu
Konstantin F. Willeke

Understanding how biological visual systems process information is challenging because of the nonlinear relationship between visual input and neuronal responses. Artificial neural networks allow computational neuroscientists to create predictive models that connect biological and machine vision. Machine learning has benefited tremendously from benchmarks that compare different models on the same task under standardized conditions. However, there was no standardized benchmark to identify state-of-the-art dynamic models of the mouse visual system. To address this gap, we established the SENSORIUM 2023 Benchmark Competition with dynamic input, featuring a new large-scale dataset from the primary visual cortex of ten mice. This dataset includes responses from 78,853 neurons to 2 hours of dynamic stimuli per neuron, together with behavioral measurements such as running speed, pupil dilation, and eye movements. The competition ranked models in two tracks based on predictive performance for neuronal responses on a held-out test set: one focusing on predicting in-domain natural stimuli and another on out-of-distribution (OOD) stimuli to assess model generalization. As part of the NeurIPS 2023 Competition Track, we received more than 160 model submissions from 22 teams. Several new architectures for predictive models were proposed, and the winning teams improved the previous state-of-the-art model by 50%. Access to the dataset as well as the benchmarking infrastructure will remain online at www.sensorium-competition.net.

PDF Details DOI

AAAI Conference 2021 Conference Paper

DeepCollaboration: Collaborative Generative and Discriminative Models for Class Incremental Learning

Bo Cui
Guyue Hu
Shan Yu

An important challenge for neural networks is to learn incrementally, i.e., learn new classes without catastrophic forgetting. To overcome this problem, generative replay technique has been suggested, which can generate samples belonging to learned classes while learning new ones. However, such generative models usually suffer from increased distribution mismatch between the generated and original samples along the learning process. In this work, we propose DeepCollaboration (D-Collab), a collaborative framework of deep generative and discriminative models to solve this problem effectively. We develop a discriminative learning model to incrementally update the latent feature space for continual classification. At the same time, a generative model is introduced to achieve conditional generation using the latent feature distribution produced by the discriminative model. Importantly, the generative and discriminative models are connected through bidirectional training to enforce cycle-consistency of mappings between feature and image domains. Furthermore, a domain alignment module is used to eliminate the divergence between the feature distributions of generated images and real ones. This module together with the discriminative model can perform effective sample mining to facilitate incremental learning. Extensive experiments on several visual recognition datasets show that our system can achieve state-of-the-art performance.

PDF Details