Arrow Research search

Author name cluster

Si Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
1 author row

Possible papers

35

AAAI Conference 2026 Conference Paper

Refinement Contrastive Learning of Cell–Gene Associations for Unsupervised Cell Type Identification

  • Liang Peng
  • Haopeng Liu
  • Yixuan Ye
  • Cheng Liu
  • Wenjun Shen
  • Si Wu
  • Hau-San Wong

Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single-cell omics studies. Although a range of clustering methods have been developed, most focus exclusively on intrinsic cellular structure and ignore the pivotal role of cell-gene associations, which limits their ability to distinguish closely related cell types. To address this limitation, we propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Specifically, we introduce two contrastive distribution alignment components that reveal reliable intrinsic cellular structures by effectively exploiting cell-cell structural relationships. Additionally, we develop a refinement module that integrates gene-correlation structure learning to enhance cell embeddings by capturing underlying cell-gene associations. This module strengthens connections between cells and their associated genes, refining representation learning to exploit biologically meaningful relationships. Extensive experiments on several single-cell RNA-seq and spatial transcriptomics benchmark datasets demonstrate that our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy. Moreover, downstream biological analyses confirm that the recovered cell populations exhibit coherent gene-expression signatures, further validating the biological relevance of our approach.

AAAI Conference 2025 Conference Paper

3DHumanEdit: Multi-modal Body Part-aware Conditioning Information Integration for 3D Human Manipulation

  • FeiFan Xu
  • Tianyi Chen
  • Fan Yang
  • Yunfei Zhang
  • Si Wu

The rapid advancement of 3D Generative Adversarial Networks (GANs) has significantly enhanced the diversity and quality of generated 3D images. Despite these breakthroughs, the manipulation capabilities of 3D GANs remain underexplored, presenting substantial challenges for practical applications where user interaction and modification are essential. Current manipulation methods often lack the precision needed for fine-grained attribute manipulation, and struggle to maintain multi-view consistency during the editing process. To address these limitations, we propose 3DHumanEdit, a novel approach for 3D human body part-aware manipulation. 3DHumanEdit leverages multi-modal feature fusion and body part-aware feature alignment to achieve precise manipulation of individual body parts based on detailed text inputs and segmentation images. By exploiting a 3D prior for accurate editing and enforcing correspondence in latent space, 3DHumanEdit ensures coherence across multiple views. Experiments demonstrate that 3DHumanEdit outperforms existing methods in both editing fidelity and multi-view consistency, offering a robust solution for fine-grained 3D manipulation.

JBHI Journal 2025 Journal Article

Deep Self-Reinforced Multi-View Subspace Clustering for Cancer Subtyping

  • Cheng Liu
  • Baoyuan Zheng
  • Jiaojiao Wang
  • Xibiao Wang
  • Hang Gao
  • Fei Wang
  • Wenjun Shen
  • Si Wu

Identifying cancer subtypes is crucial for understanding disease progression and guiding precision medicine. With advances in high-throughput experimental technologies, the integration of multiple types of omics data for cancer subtype identification has become increasingly feasible. However, despite the promising performance of existing integrative cancer subtyping methods, efficiently integrating and clustering multi-omics datasets remains challenging due to the high levels of noise inherent in omics data, which impede the accurate characterization of relationships among samples. To address these challenges, we propose a novel deep multi-view subspace clustering model that incorporates a self-reinforced learning strategy. This strategy iteratively improves the quality of self-representation, which is critical for accurately capturing sample relationships and enabling effective clustering. Specifically, during model training, the proposed method learns a highly reliable self-representation through a good-neighbor learning mechanism, allowing it to model more accurate and robust inter-sample relationships. Building upon this reliable self-representation, we further develop a learnable view-graph fusion framework that integrates complementary information across multiple omics views to derive a consensus representation for clustering, thereby guiding the overall learning process. In addition, we introduce a local graph-guided learning mechanism based on an initial graph constructed from the raw data. This mechanism serves as an effective regularization strategy to prevent the model from converging to suboptimal solutions, thereby enhancing stability and robustness during training. Extensive experimental results demonstrate that the proposed method consistently outperforms several state-of-the-art approaches, validating its effectiveness and robustness for cancer subtype identification.

AAAI Conference 2025 Conference Paper

Discrete Prior-Based Temporal-Coherent Content Prediction for Blind Face Video Restoration

  • Lianxin Xie
  • Bingbing Zheng
  • Wen Xue
  • Yunfei Zhang
  • Le Jiang
  • Ruotao Xu
  • Si Wu
  • Hau-San Wong

Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This task poses the significant challenge of managing temporal heterogeneity while at the same time maintaining stable face attributes. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer to address the challenge, and our model is referred to as DP-TempCoh. Specifically, we incorporate a spatial-temporal-aware content prediction module to synthesize high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, a motion statistics modulation module is designed to adjust the content, based on discrete motion priors in terms of cross-frame mean and variance. As a result, the statistics of the predicted content can match those of real videos over time. By performing extensive experiments, we verify the effectiveness of the design elements and demonstrate the superior performance of our DP-TempCoh in both synthetically and naturally degraded video restoration.

AAAI Conference 2025 Conference Paper

RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting

  • Wen Xue
  • Chun Ding
  • Ruotao Xu
  • Si Wu
  • Yong Xu
  • Hau-San Wong

Face retouching aims to remove facial imperfections from images and videos while at the same time preserving face attributes. Existing methods are designed to perform non-interactive end-to-end retouching, while the ability to interact with users is highly demanded in downstream applications. In this paper, we propose RetouchGPT, a novel framework that leverages Large Language Models (LLMs) to guide the interactive retouching process. Towards this end, we design an instruction-driven imperfection prediction module to accurately identify imperfections by integrating textual and visual features. To learn imperfection prompts, we further incorporate an LLM-based embedding module to fuse multi-modal conditioning information. Prompt-based feature modification is performed in each transformer block, such that imperfection features are progressively suppressed and replaced with the features of normal skin. Extensive experiments verify the effectiveness of our design elements and demonstrate that RetouchGPT is a useful tool for interactive face retouching, achieving superior performance over state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Self-Correcting Robot Manipulation via Gaussian-Splatted Foresight

  • Shaohui Pan
  • Yong Xu
  • Ruotao Xu
  • Zihan Zhou
  • Si Wu
  • Zhuliang Yu

Language-conditioned robotic manipulation in unstructured environments presents significant challenges for intelligent robotic systems. Due to partial observation or imprecise action prediction, however, failure may be unavoidable for learned policies. Moreover, operational failures can lead to the robotic arm entering an untrained state, potentially causing destructive results. Consequently, the ability to detect and self-correct failures is crucial for the development of practical robotic systems. To address this challenge, we propose a foresight-driven failure detection and self-correction module for robot manipulation. By leveraging 3D Gaussian Splatting, we represent the current scene with multiple Gaussians. Subsequently, we train a prediction network to forecast the Gaussian representation of future scenes conditioned on planned actions. Failure is detected when the predicted future significantly deviates from the real observation after action execution. In such cases, the end-effector rolls back to the previous action to avoid an untrained state. Integrating this approach with the PerACT framework, we develop a self-correcting robot manipulation policy. Evaluations on ten RLBench tasks with 166 variations demonstrate the superior performance of the proposed method, which outperforms state-of-the-art methods by 12.0% in success rate on average.

AAAI Conference 2025 Conference Paper

SpotDiff: Spatial Gene Expression Imputation Diffusion with Single-Cell RNA Sequencing Data Integration

  • Tianyi Chen
  • Yunfei Zhang
  • Lianxin Xie
  • Wenjun Shen
  • Si Wu
  • Hau-San Wong

The advent of Spatial Transcriptomics (ST) has revolutionized our understanding of tissue architecture by creating high-resolution maps of gene expression patterns. However, the low capture rate of ST leads to significant sparsity. The aim of imputation is to recover biological signals by filling in the dropouts in ST data to approximate the true expression values. In this paper, we introduce a Spatial Gene Expression Imputation Diffusion model to facilitate ST data imputation, and our model is referred to as SpotDiff. Specifically, we incorporate a spot-gene prompt learning module to capture the association between spots and genes. Further, SpotDiff integrates single-cell RNA sequencing data to impute gene expression at each spot. The proposed approach is able to reduce the uncertainty in the imputation process, since the aggregation of multiple single-cell measurements yields a stable representation of the corresponding spot expression profile. Extensive experiments demonstrate that SpotDiff outperforms existing imputation methods across multiple benchmarks, yielding more accurate and biologically relevant gene expression profiles, particularly in highly sparse scenarios.
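The uncertainty-reduction argument above rests on a standard statistical fact: averaging many noisy single-cell measurements yields a far more stable estimate of a spot's expression than any individual cell. A minimal numpy sketch of that effect (not the SpotDiff model; the expression level, noise scale, and cell counts are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

true_expression = 5.0            # hypothetical true expression level at a spot
noise_sd = 2.0                   # per-cell measurement noise (made up)
n_cells, group_size = 1000, 25

# Noisy single-cell measurements of the same underlying signal
cells = true_expression + rng.normal(0.0, noise_sd, size=n_cells)

# Pool cells in groups of 25 to emulate aggregating them into spots
spot_estimates = cells.reshape(-1, group_size).mean(axis=1)

# Aggregation shrinks the spread by roughly sqrt(group_size)
spread_single = cells.std()
spread_pooled = spot_estimates.std()
```

The pooled estimates scatter about five times more tightly around the true value than single cells do, which is the sense in which aggregation stabilizes the spot representation.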

NeurIPS Conference 2025 Conference Paper

Unfolding the Black Box of Recurrent Neural Networks for Path Integration

  • Tianhao Chu
  • Yuling Wu
  • Neil Burgess
  • Zilong Ji
  • Si Wu

Path integration is essential for spatial navigation. Experimental studies have identified neural correlates for path integration, but exactly how the neural system accomplishes this computation remains unresolved. Here, we adopt recurrent neural networks (RNNs) trained to perform a path integration task to explore this issue. After training, we borrow neuroscience prior knowledge and methods to unfold the black box of the trained model, including: clarifying neuron types based on their receptive fields, dissecting information flows between neuron groups by pruning their connections, and analyzing internal dynamics of neuron groups using the attractor framework. Intriguingly, we uncover a hierarchical information processing pathway embedded in the RNN model, along which velocity information of an agent is first forwarded to band cells, band and grid cells then coordinate to carry out path integration, and finally grid cells output the agent location. Inspired by the RNN-based study, we construct a neural circuit model, in which band cells form one-dimensional (1D) continuous attractor neural networks (CANNs) and serve as upstream neurons to support downstream grid cells to carry out path integration in the 2D space. Our study challenges the conventional view of considering grid cells as the principal velocity integrator, and supports a neural circuit model with the hierarchy of band and grid cells.

NeurIPS Conference 2025 Conference Paper

Vector Quantization in the Brain: Grid-like Codes in World Models

  • Xiangyuan Peng
  • Xingsi Dong
  • Si Wu

We propose Grid-like Code Quantization (GCQ), a brain-inspired method for compressing observation-action sequences into discrete representations using grid-like patterns in attractor dynamics. Unlike conventional vector quantization approaches that operate on static inputs, GCQ performs spatiotemporal compression through an action-conditioned codebook, where codewords are derived from continuous attractor neural networks and dynamically selected based on actions. This enables GCQ to jointly compress space and time, serving as a unified world model. The resulting representation supports long-horizon prediction, goal-directed planning, and inverse modeling. Experiments across diverse tasks demonstrate GCQ's effectiveness in compact encoding and downstream performance. Our work offers both a computational tool for efficient sequence modeling and a theoretical perspective on the formation of grid-like codes in neural systems.
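For readers unfamiliar with the baseline being generalized here: conventional vector quantization maps each continuous vector to its nearest codeword in a codebook. A minimal numpy sketch of that static nearest-codeword assignment (GCQ's action-conditioned, attractor-derived codebook is the paper's contribution and is not reproduced here; the codebook size and dimensionality below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A codebook of K codewords in a D-dimensional latent space
K, D = 16, 4
codebook = rng.normal(size=(K, D))

def quantize(z, codebook):
    """Map each latent vector in z to the index of its nearest codeword."""
    # (N, K) matrix of squared Euclidean distances to every codeword
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

z = rng.normal(size=(8, D))      # a batch of continuous latent vectors
idx = quantize(z, codebook)      # discrete codes
z_q = codebook[idx]              # quantized (discrete) reconstruction
```

GCQ departs from this static scheme by making the codeword selection depend on the agent's action, so that the discrete code compresses the observation-action sequence rather than a single input.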

AAAI Conference 2024 Conference Paper

RetouchFormer: Semi-supervised High-Quality Face Retouching Transformer with Prior-Based Selective Self-Attention

  • Xue Wen
  • Lianxin Xie
  • Le Jiang
  • Tianyi Chen
  • Si Wu
  • Cheng Liu
  • Hau-San Wong

Face retouching aims to beautify a face image while preserving the image content as much as possible. It is a promising yet challenging task to remove face imperfections and fill the regions with normal skin. Generic image enhancement methods are hampered by the lack of imperfection localization, which often results in incomplete removal of blemishes at large scales. To address this issue, we propose a transformer-based approach, RetouchFormer, which simultaneously identifies imperfections and synthesizes realistic content in the corresponding regions. Specifically, we learn a latent dictionary to capture clean face priors, and predict the imperfection regions via a reconstruction-oriented localization module. Based on this, we realize face retouching by explicitly suppressing imperfections in our selective self-attention computation, such that local content is synthesized from normal skin. In addition, multi-scale feature tokens provide increased flexibility in dealing with imperfections at various scales. These design elements bring greater effectiveness and efficiency. In our extensive experiments, RetouchFormer outperforms advanced face retouching methods and synthesizes clean face images with high fidelity.

IJCAI Conference 2024 Conference Paper

SCTrans: Multi-scale scRNA-seq Sub-vector Completion Transformer for Gene-selective Cell Type Annotation

  • Lu Lin
  • Wen Xue
  • Xindian Wei
  • Wenjun Shen
  • Cheng Liu
  • Si Wu
  • Hau San Wong

Cell type annotation is pivotal to single-cell RNA sequencing (scRNA-seq) data-based biological and medical analysis, e.g., identifying biomarkers, exploring cellular heterogeneity, and understanding disease mechanisms. Previous annotation methods typically learn a nonlinear mapping to infer cell type from gene expression vectors, and thus fall short in discovering and associating salient genes with specific cell types. To address this issue, we propose a multi-scale scRNA-seq Sub-vector Completion Transformer, referred to as SCTrans. Considering that the expressiveness of gene sub-vectors is richer than that of individual genes, we perform multi-scale partitioning on gene vectors followed by masked sub-vector completion, conditioned on the unmasked ones. Toward this end, the multi-scale sub-vectors are tokenized, and the intrinsic contextual relationships are modeled via self-attention computation and conditional contrastive regularization imposed on an encoding transformer. By performing mutual learning between the encoder and an additional lightweight counterpart, the salient tokens can be distinguished from the others. As a result, we can perform gene-selective cell type annotation, which contributes to our superior performance over state-of-the-art annotation methods.

NeurIPS Conference 2024 Conference Paper

The motion planning neural circuit in goal-directed navigation as Lie group operator search

  • Junfeng Zuo
  • Ying N. Wu
  • Si Wu
  • Wen-Hao Zhang

The information processing in the brain and embodied agents forms a sensory-action loop to interact with the world. An important step in the loop is motion planning, which selects motor actions based on the current world state and task needs. In goal-directed navigation, the brain chooses and generates motor actions to bring the current state into the goal state. The neural circuit mechanism of motor action selection, as well as its underlying theory, remains unclear. The present study formulates motion planning as a Lie group operator search problem, and uses the 1D rotation group as an example to provide insight into general operator search in neural circuits. We found that the abstract group operator search can be implemented by a two-layer feedforward circuit utilizing the circuit motifs of connection phase shift, nonlinear activation function, and pooling, similar to Drosophila's goal-directed navigation neural circuits. Moreover, the computational complexity of the feedforward circuit can be even lower than that of common signal processing algorithms in certain conditions. We also provide geometric interpretations of the circuit computation in the group representation space. The feedforward motion planning circuit is further combined with sensory and motor circuit modules into a full circuit of the sensory-action loop implementing goal-directed navigation. Our work for the first time links abstract operator search with biological neural circuits.
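The "operator search" problem can be made concrete in a toy form: represent headings as population bumps on a ring of neurons, and search over the 1D rotation group (circular shifts) for the operator that best maps the current state onto the goal state. A brute-force numpy sketch of that problem statement (an illustration only, not the paper's feedforward-circuit solution; ring size, bump width, and headings are arbitrary):

```python
import numpy as np

N = 64                                          # neurons on a ring (arbitrary)
angles = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def bump(theta, width=0.3):
    """Population activity bump centered at heading theta (radians)."""
    return np.exp(np.cos(angles - theta) / width)

current, goal = bump(1.0), bump(2.5)            # current and goal headings

# Search the rotation group: each circular shift is one candidate operator
overlaps = [np.dot(np.roll(current, s), goal) for s in range(N)]
best_shift = int(np.argmax(overlaps))
planned_rotation = 2.0 * np.pi * best_shift / N  # recovered rotation, ~1.5 rad
```

The recovered rotation matches the true heading difference (2.5 − 1.0 = 1.5 rad) up to the grid resolution of 2π/64. The paper's point is that a feedforward circuit with phase-shifted connections and pooling can perform this search without explicit enumeration.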

NeurIPS Conference 2024 Conference Paper

To Learn or Not to Learn, That is the Question — A Feature-Task Dual Learning Model of Perceptual Learning

  • Xiao Liu
  • Muyang Lyu
  • Cong Yu
  • Si Wu

Perceptual learning refers to the practice-driven process through which participants improve their performance in perceiving sensory stimuli. Two seemingly conflicting phenomena of specificity and transfer have been widely observed in perceptual learning. Here, we propose a dual-learning model to reconcile these two phenomena. The model consists of two learning processes. One is task-based learning, which is fast and enables the brain to adapt to a task rapidly by using existing feature representations. The other is feature-based learning, which is slow and enables the brain to improve feature representations to match the statistical change of the environment. Associated with different training paradigms, the interactions between these two learning processes induce the rich phenomena of perceptual learning. Specifically, in the training paradigm where the same stimulus condition is presented excessively, feature-based learning is triggered, which incurs specificity, while in the paradigm where the stimulus condition varies during the training, task-based learning dominates to induce the transfer effect. As the number of training sessions under the same stimulus condition increases, a transition from transfer to specificity occurs. We demonstrate that the dual-learning model can account for both the specificity and transfer phenomena observed in classical psychophysical experiments. We hope that this study gives us insight into understanding how the brain balances the accomplishment of a new task and the consumption of learning effort.

NeurIPS Conference 2023 Conference Paper

A Recurrent Neural Circuit Mechanism of Temporal-scaling Equivariant Representation

  • Junfeng Zuo
  • Xiao Liu
  • Ying Nian Wu
  • Si Wu
  • Wenhao Zhang

Time perception is critical in our daily life. An important feature of time perception is temporal scaling (TS): the ability to generate temporal sequences (e.g., motor actions) at different speeds. However, the mathematical principle underlying temporal scaling in recurrent circuits in the brain remains largely unknown. To shed light on this issue, the present study investigates temporal scaling from the Lie group point of view. We propose a canonical nonlinear recurrent circuit dynamics, modeled as a continuous attractor network, whose neuronal population responses embed a temporal sequence that is TS equivariant. Furthermore, we found that the TS group operators can be explicitly represented by a control input fed into the recurrent circuit, where the input gain determines the temporal scaling factor (group parameter), and the spatial offset between the control input and the network state gives rise to the generator. The neuronal responses in the recurrent circuit are also consistent with experimental findings. We illustrate that the recurrent circuit can drive a feedforward circuit to generate complex temporal sequences with different time scales, even in the case of negative time scaling ("time reversal"). Our work for the first time analytically links the abstract temporal scaling group to concrete neural circuit dynamics.
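The claim that an input gain determines the temporal scaling factor has a simple dynamical-systems analogue: multiplying a system's dynamics by a gain g replays the same trajectory g times faster. A numpy Euler-integration sketch on a toy 2D rotation (a generic illustration, not the paper's attractor network; the step size and gains are arbitrary):

```python
import numpy as np

def rollout(gain, n_steps, dt=0.001, x0=(1.0, 0.0)):
    """Euler-integrate dx/dt = gain * A x, where A generates 2D rotations."""
    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])        # generator of the 2D rotation group
    x = np.array(x0)
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x
    for t in range(n_steps):
        x = x + gain * (A @ x) * dt    # gain rescales the speed of the flow
        traj[t + 1] = x
    return traj

slow = rollout(gain=1.0, n_steps=2000)  # trajectory at base speed
fast = rollout(gain=2.0, n_steps=1000)  # same trajectory, twice as fast
```

Doubling the gain halves the time needed to traverse the same path: the fast rollout at step k coincides (up to integration error) with the slow rollout at step 2k, which is exactly the temporal-scaling equivariance being described.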

NeurIPS Conference 2023 Conference Paper

Learning and processing the ordinal information of temporal sequences in recurrent neural circuits

  • Xiaolong Zou
  • Zhikun Chu
  • Qinghai Guo
  • Jie Cheng
  • Bo Ho
  • Si Wu
  • Yuanyuan Mi

Temporal sequence processing is fundamental to brain cognitive functions. Experimental data have indicated that the representations of ordinal information and contents of temporal sequences are disentangled in the brain, but the neural mechanism underlying this disentanglement remains largely unclear. Here, we investigate how recurrent neural circuits learn to represent the abstract order structure of temporal sequences, and how disentangling the representation of order structure from that of contents facilitates the processing of temporal sequences. We show that with an appropriate learning protocol, a recurrent neural circuit can learn a set of tree-structured attractor states to encode the corresponding tree-structured orders of given temporal sequences. This abstract temporal order template can then be bound with different contents, allowing for flexible and robust temporal sequence processing. Using a transfer learning task, we demonstrate that the reuse of a temporal order template facilitates the acquisition of new temporal sequences with the same or similar ordinal structure. Using a key-word spotting task, we demonstrate that the attractor representation of order structure improves the robustness of temporal sequence discrimination when the ordinal information is the key to differentiating sequences. We hope this study gives insight into the neural mechanism of representing the ordinal information of temporal sequences in the brain, and helps the development of brain-inspired temporal sequence processing algorithms.

NeurIPS Conference 2023 Conference Paper

Neural Sampling in Hierarchical Exponential-family Energy-based Models

  • Xingsi Dong
  • Si Wu

Bayesian brain theory suggests that the brain employs generative models to understand the external world. The sampling-based perspective posits that the brain infers the posterior distribution through samples of stochastic neuronal responses. Additionally, the brain continually updates its generative model to approach the true distribution of the external world. In this study, we introduce the Hierarchical Exponential-family Energy-based (HEE) model, which captures the dynamics of inference and learning. In the HEE model, we decompose the partition function into individual layers and leverage a group of neurons with shorter time constants to sample the gradient of the decomposed normalization term. This allows our model to estimate the partition function and perform inference simultaneously, circumventing the negative phase encountered in conventional energy-based models (EBMs). As a result, the learning process is localized in both time and space, and the model converges easily. To match the brain's rapid computation, we demonstrate that neural adaptation can serve as a momentum term, significantly accelerating the inference process. On natural image datasets, our model exhibits representations akin to those observed in the biological visual system. Furthermore, for the machine learning community, our model can generate observations through joint or marginal generation. We show that marginal generation outperforms joint generation and achieves performance on par with other EBMs.

NeurIPS Conference 2023 Conference Paper

Slow and Weak Attractor Computation Embedded in Fast and Strong E-I Balanced Neural Dynamics

  • Xiaohan Lin
  • Liyuan Li
  • Boxin Shi
  • Tiejun Huang
  • Yuanyuan Mi
  • Si Wu

Attractor networks require neuronal connections to be highly structured in order to maintain attractor states that represent information, while excitation and inhibition balanced networks (E-INNs) require neuronal connections to be random and sparse to generate irregular neuronal firings. Despite being regarded as canonical models of neural circuits, both types of networks are usually studied in isolation, and it remains unclear how they coexist in the brain, given their very different structural demands. In this study, we investigate the compatibility of continuous attractor neural networks (CANNs) and E-INNs. In line with recent experimental data, we find that a neural circuit can exhibit both the traits of CANNs and E-INNs if the neuronal synapses consist of two sets: one set is strong and fast for irregular firing, and the other set is weak and slow for attractor dynamics. Our results from simulations and theoretical analysis reveal that the network also exhibits enhanced performance compared to the case of using only one set of synapses, with accelerated convergence of attractor states while retaining the E-I balanced condition for localized input. We also apply the network model to solve a real-world tracking problem and demonstrate that it can track fast-moving objects well. We hope that this study provides insight into how structured neural computations are realized by irregular firings of neurons.

NeurIPS Conference 2022 Conference Paper

Adaptation Accelerating Sampling-based Bayesian Inference in Attractor Neural Networks

  • Xingsi Dong
  • Zilong Ji
  • Tianhao Chu
  • Tiejun Huang
  • Wenhao Zhang
  • Si Wu

The brain performs probabilistic Bayesian inference to interpret the external world. The sampling-based view assumes that the brain represents the stimulus posterior distribution via samples of stochastic neuronal responses. Although the idea of sampling-based inference is appealing, it faces a critical challenge of whether stochastic sampling is fast enough to match the rapid computation of the brain. In this study, we explore how latent stimulus sampling can be accelerated in neural circuits. Specifically, we consider a canonical neural circuit model called continuous attractor neural networks (CANNs) and investigate how sampling-based inference of latent continuous variables is accelerated in CANNs. Intriguingly, we find that by including noisy adaptation in the neuronal dynamics, the CANN is able to speed up the sampling process significantly. We theoretically derive that the CANN with noisy adaptation implements the efficient sampling method called Hamiltonian dynamics with friction, where noisy adaptation effectively plays the role of momentum. We theoretically analyze the sampling performances of the network and derive the condition under which the acceleration has the maximum effect. Simulation results confirm our theoretical analyses. We further extend the model to coupled CANNs and demonstrate that noisy adaptation accelerates the sampling of the posterior distribution of multivariate stimuli. We hope that this study enhances our understanding of how Bayesian inference is realized in the brain.
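"Hamiltonian dynamics with friction" refers to underdamped Langevin dynamics: a momentum-like variable is added to the sampler while the stationary distribution of the sampled variable remains the target posterior. A minimal Euler-Maruyama sketch for a 1D standard-Gaussian target (a generic sampler illustration, not the paper's CANN model; the step size, friction, and run length below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, gamma, n_steps = 0.02, 2.0, 200_000   # step size, friction, run length

# Underdamped Langevin for the target p(x) ∝ exp(-x^2 / 2):
#   dx = v dt
#   dv = (-x - gamma v) dt + sqrt(2 gamma) dW   (v plays the role of momentum)
x, v = 0.0, 0.0
xs = np.empty(n_steps)
for t in range(n_steps):
    v += (-x - gamma * v) * dt + np.sqrt(2.0 * gamma * dt) * rng.normal()
    x += v * dt
    xs[t] = x

samples = xs[n_steps // 5:]               # discard burn-in
```

After burn-in, the samples of x have mean ≈ 0 and variance ≈ 1, i.e. the friction-damped momentum dynamics sample the correct target; the paper's contribution is showing that noisy neural adaptation in a CANN implements this kind of momentum and analyzing when it accelerates sampling.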

NeurIPS Conference 2022 Conference Paper

Oscillatory Tracking of Continuous Attractor Neural Networks Account for Phase Precession and Procession of Hippocampal Place Cells

  • Tianhao Chu
  • Zilong Ji
  • Junfeng Zuo
  • Wenhao Zhang
  • Tiejun Huang
  • Yuanyuan Mi
  • Si Wu

Hippocampal place cells of freely moving rodents display an intriguing temporal organization in their responses known as "theta phase precession", in which individual neurons fire at progressively earlier phases in successive theta cycles as the animal traverses the place fields. Recent experimental studies found that in addition to phase precession, many place cells also exhibit accompanied phase procession, but the underlying neural mechanism remains unclear. Here, we propose a neural circuit model to elucidate the generation of both kinds of phase shift in place cells' firing. Specifically, we consider a continuous attractor neural network (CANN) with feedback inhibition, which is inspired by the reciprocal interaction between the hippocampus and the medial septum. The feedback inhibition induces intrinsic mobility of the CANN which competes with the extrinsic mobility arising from the external drive. Their interplay generates an oscillatory tracking state, that is, the network bump state (resembling the decoded virtual position of the animal) sweeps back and forth around the external moving input (resembling the physical position of the animal). We show that this oscillatory tracking naturally explains the forward and backward sweeps of the decoded position during the animal's locomotion. At the single neuron level, the forward and backward sweeps account for, respectively, theta phase precession and procession. Furthermore, by tuning the feedback inhibition strength, we also explain the emergence of bimodal cells and unimodal cells, with the former having co-existing phase precession and procession, and the latter having only significant phase precession. We hope that this study facilitates our understanding of hippocampal temporal coding and lays the foundation for unveiling their computational functions.

NeurIPS Conference 2022 Conference Paper

Translation-equivariant Representation in Recurrent Networks with a Continuous Manifold of Attractors

  • Wenhao Zhang
  • Ying Nian Wu
  • Si Wu

Equivariant representation is necessary for the brain and artificial perceptual systems to faithfully represent the stimulus under some (Lie) group transformations. However, it remains unknown how recurrent neural circuits in the brain represent the stimulus equivariantly, or how abstract group operators are neurally represented. The present study uses a one-dimensional (1D) translation group as an example to explore the general recurrent neural circuit mechanism of equivariant stimulus representation. We found that a continuous attractor network (CAN), a canonical neural circuit model, self-consistently generates a continuous family of stationary population responses (attractors) that represents the stimulus equivariantly. Inspired by the Drosophila compass circuit, we found that the 1D translation operators can be represented by extra speed neurons besides the CAN, where the speed neurons' responses represent the moving speed (1D translation group parameter), and their feedback connections to the CAN represent the translation generator (Lie algebra). We demonstrated that the network responses are consistent with experimental data. Our model for the first time demonstrates how recurrent neural circuitry in the brain achieves equivariant stimulus representation.

AAAI Conference 2021 Conference Paper

High Fidelity GAN Inversion via Prior Multi-Subspace Feature Composition

  • Guanyue Li
  • Qianfen Jiao
  • Sheng Qian
  • Si Wu
  • Hau-San Wong

Generative Adversarial Networks (GANs) have shown impressive gains in image synthesis. GAN inversion was recently studied to understand and utilize the knowledge it learns, where a real image is inverted back to a latent code and can thus be reconstructed by the generator. Although increasing the number of latent codes can improve inversion quality to a certain extent, we find that important details may still be neglected when performing feature composition over all the intermediate feature channels. To address this issue, we propose a Prior multi-Subspace Feature Composition (PmSFC) approach for high-fidelity inversion. Considering that the intermediate features are highly correlated with each other, we incorporate a self-expressive layer in the generator to discover meaningful subspaces. In this case, the features at a channel can be expressed as a linear combination of those at other channels in the same subspace. We perform feature composition separately in the subspaces. The semantic differences between them benefit the inversion quality, since the inversion process is regularized based on different aspects of semantics. In the experiments, the superior performance of PmSFC demonstrates the effectiveness of prior subspaces in facilitating GAN inversion together with extended applications in visual manipulation.

NeurIPS Conference 2021 Conference Paper

Noisy Adaptation Generates Lévy Flights in Attractor Neural Networks

  • Xingsi Dong
  • Tianhao Chu
  • Tiejun Huang
  • Zilong Ji
  • Si Wu

Lévy flights describe a special class of random walks whose step sizes satisfy a power-law tailed distribution. As an efficient searching strategy in unknown environments, Lévy flights are widely observed in animal foraging behaviors. Recent studies further showed that human cognitive functions also exhibit the characteristics of Lévy flights. Despite being a general phenomenon, the neural mechanism at the circuit level for generating Lévy flights remains unresolved. Here, we investigate how Lévy flights can be achieved in attractor neural networks. To elucidate the underlying mechanism clearly, we first study continuous attractor neural networks (CANNs), and find that noisy neural adaptation, exemplified by spike frequency adaptation (SFA) in this work, can generate Lévy flights representing transitions of the network state in the attractor space. Specifically, the strength of SFA defines a travelling wave boundary, below which the network state displays local Brownian motion, and above which the network state displays long-jump motion. Noise in neural adaptation causes the network state to intermittently switch between these two motion modes, manifesting the characteristics of Lévy flights. We further extend the study to a general attractor neural network, and demonstrate that our model can explain the Lévy-flight phenomenon observed during free memory retrieval in humans. We hope that this study will give insight into understanding the neural mechanism for optimal information processing in the brain.
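The intermittent switching between Brownian motion and long jumps can be illustrated phenomenologically: pooling small Gaussian steps with occasional heavy-tailed jumps yields a step-size distribution with a far heavier tail than a Gaussian walk. The mixture weight, scales, and tail exponent below are illustrative assumptions, not values from the network simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# with probability 0.05 the (noisy-adaptation-driven) state makes a long jump
jump = rng.random(n) < 0.05
steps = np.where(jump,
                 1.0 + rng.pareto(1.5, size=n),        # heavy-tailed long jumps
                 np.abs(rng.normal(0.0, 0.1, size=n))) # local Brownian steps

# tail fraction: steps exceeding 20x the median step size
tail = np.mean(steps > 20 * np.median(steps))
# for a pure Gaussian walk of the same scale, that tail is essentially empty
gauss_tail = np.mean(np.abs(rng.normal(0.0, 0.1, n)) > 20 * (0.1 * 0.6745))
print(tail, gauss_tail)
```

The non-negligible tail fraction from the two-mode mixture is the fingerprint of Lévy-flight-like statistics that the paper identifies in the network's state transitions.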

NeurIPS Conference 2019 Conference Paper

A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits

  • Wenhao Zhang
  • Si Wu
  • Brent Doiron
  • Tai Sing Lee

This study provides a normative theory for how Bayesian causal inference can be implemented in neural circuits. In both cognitive processes such as causal reasoning and perceptual inference such as cue integration, the nervous system needs to choose different models representing the underlying causal structures when making inferences on external stimuli. In multisensory processing, for example, the nervous system has to choose whether to integrate or segregate inputs from different sensory modalities to infer the sensory stimuli, based on whether the inputs are from the same or different sources. Making this choice is a model selection problem requiring the computation of the Bayes factor, the ratio of likelihoods between the integration and the segregation models. In this paper, we consider causal inference in multisensory processing and propose a novel generative model based on neural population codes that takes into account both stimulus feature and stimulus reliability in the inference. In the case of circular variables such as heading direction, our normative theory yields an analytical solution for computing the Bayes factor, with a clear geometric interpretation, which can be implemented by simple additive mechanisms with neural population codes. Numerical simulation shows that the tunings of the neurons computing the Bayes factor are consistent with the "opposite neurons" discovered in the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas for visual-vestibular processing. This study illuminates a potential neural mechanism for causal inference in the brain.
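The Bayes factor between the integration model (one common heading behind both cues) and the segregation model (independent headings) can be sketched numerically for circular variables with von Mises cue likelihoods and a uniform prior. The concentration parameter and grid size are illustrative assumptions; this is the generic model-selection computation, not the paper's neural implementation.

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    # von Mises density on the circle; np.i0 is the modified Bessel I0
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def bayes_factor(x1, x2, kappa=8.0, n=2000):
    s = np.linspace(-np.pi, np.pi, n, endpoint=False)
    ds = 2 * np.pi / n
    prior = 1.0 / (2 * np.pi)              # uniform circular prior
    # integration: both cues generated by one shared source s
    p_int = np.sum(vonmises_pdf(x1, s, kappa) *
                   vonmises_pdf(x2, s, kappa) * prior) * ds
    # segregation: each cue has its own independent source
    p_seg = (np.sum(vonmises_pdf(x1, s, kappa) * prior) * ds *
             np.sum(vonmises_pdf(x2, s, kappa) * prior) * ds)
    return p_int / p_seg

bf_same = bayes_factor(0.0, 0.1)     # nearby cues -> favours integration
bf_opp = bayes_factor(0.0, np.pi)    # opposite cues -> favours segregation
print(bf_same, bf_opp)
```

A Bayes factor above 1 selects the integration model, below 1 the segregation model, which is exactly the choice the abstract describes for same-source versus different-source inputs.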

AAAI Conference 2019 Conference Paper

Improving Domain-Specific Classification by Collaborative Learning with Adaptation Networks

  • Si Wu
  • Jian Zhong
  • Wenming Cao
  • Rui Li
  • Zhiwen Yu
  • Hau-San Wong

For unsupervised domain adaptation, the process of learning domain-invariant representations could be dominated by the labeled source data, such that the specific characteristics of the target domain may be ignored. In order to improve the performance in inferring target labels, we propose a target-specific network which is capable of learning collaboratively with a domain adaptation network, instead of directly minimizing domain discrepancy. A clustering regularization is also utilized to improve the generalization capability of the target-specific network by forcing target data points to be close to accumulated class centers. As this network learns and specializes to the target domain, its performance in inferring target labels improves, which in turn facilitates the learning process of the adaptation network. Therefore, there is a mutually beneficial relationship between these two networks. We perform extensive experiments on multiple digit and object datasets, and the effectiveness and superiority of the proposed approach is presented and verified on multiple visual adaptation benchmarks, e.g., we improve the state-of-the-art on the task of MNIST→SVHN from 76.5% to 84.9% without specific augmentation.

IJCAI Conference 2019 Conference Paper

Improving representation learning in autoencoders via multidimensional interpolation and dual regularizations

  • Sheng Qian
  • Guanyue Li
  • Wen-Ming Cao
  • Cheng Liu
  • Si Wu
  • Hau San Wong

Autoencoders enjoy a remarkable ability to learn data representations. Research on autoencoders shows that the effectiveness of data interpolation can reflect the performance of representation learning. However, existing interpolation methods in autoencoders lack the capability to traverse the possible region between two datapoints on a data manifold, and the distribution of interpolated latent representations is not considered. To address these issues, we aim to fully exert the potential of data interpolation and further improve representation learning in autoencoders. Specifically, we propose multidimensional interpolation, which increases the capability of data interpolation by randomly setting interpolation coefficients for each dimension of the latent representations. In addition, we regularize autoencoders in both the latent and the data spaces by imposing a prior on latent representations in the Maximum Mean Discrepancy (MMD) framework and encouraging generated datapoints to be realistic in the Generative Adversarial Network (GAN) framework. Compared to representative models, our proposed model empirically achieves better representation learning performance on downstream tasks across multiple benchmarks.
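The multidimensional interpolation idea is easy to sketch: instead of one scalar coefficient for the whole latent vector, draw an independent coefficient per latent dimension, so interpolates can reach the whole axis-aligned box spanned by the two codes rather than just the line segment. Latent dimensionality and the uniform coefficient distribution below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=16), rng.normal(size=16)   # two latent codes

# standard interpolation: a single scalar coefficient
alpha_scalar = 0.3
z_linear = alpha_scalar * z1 + (1 - alpha_scalar) * z2

# multidimensional interpolation: one random coefficient per dimension
alpha_vec = rng.random(16)
z_multi = alpha_vec * z1 + (1 - alpha_vec) * z2
print(z_multi.round(2))
```

Each component of `z_multi` still lies between the corresponding components of `z1` and `z2`, but the interpolate as a whole leaves the straight line between the two codes, which is what gives the method its larger traversal region.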

NeurIPS Conference 2019 Conference Paper

Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently

  • Xiao Liu
  • Xiaolong Zou
  • Zilong Ji
  • Gengshuo Tian
  • Yuanyuan Mi
  • Tiejun Huang
  • K. Y. Michael Wong
  • Si Wu

Experimental data have revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway. Although the importance of feedback in neural information processing has been widely recognized in the field, the detailed mechanism of how it works remains largely unknown. Here, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider a hierarchical network storing the hierarchical categorical information of objects, and information retrieval goes from rough to fine, aided by dynamical push-pull feedback from higher to lower layers. We elucidate that the push (positive) and pull (negative) feedbacks suppress the interference due to neural correlations between different categories and within the same category, respectively, and that their joint effect improves retrieval performance significantly. Our model agrees with the push-pull phenomenon observed in neural data and sheds light on our understanding of the role of feedback in neural information processing.

NeurIPS Conference 2016 Conference Paper

“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation

  • Wen-Hao Zhang
  • He Wang
  • K. Y. Michael Wong
  • Si Wu

Experiments reveal that in the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas, where visual and vestibular cues are integrated to infer heading direction, there are two types of neurons in roughly equal numbers. One is "congruent" cells, whose preferred heading directions are similar in response to visual and vestibular cues; the other is "opposite" cells, whose preferred heading directions are nearly "opposite" (with an offset of 180 degrees) in response to visual vs. vestibular cues. Congruent neurons are known to be responsible for cue integration, but the computational role of opposite neurons remains largely unknown. Here, we propose that opposite neurons may serve to encode the disparity information between cues necessary for multisensory segregation. We build a computational model composed of two reciprocally coupled modules, MSTd and VIP, each consisting of groups of congruent and opposite neurons. In the model, congruent neurons in the two modules are reciprocally connected with each other in the congruent manner, whereas opposite neurons are reciprocally connected in the opposite manner. Mimicking the experimental protocol, our model reproduces the characteristics of congruent and opposite neurons, and demonstrates that in each module, the sisters of congruent and opposite neurons can jointly achieve optimal multisensory information integration and segregation. This study sheds light on our understanding of how the brain implements optimal multisensory integration and segregation concurrently in a distributed manner.

NeurIPS Conference 2014 Conference Paper

A Synaptical Story of Persistent Activity with Graded Lifetime in a Neural System

  • Yuanyuan Mi
  • Luozheng Li
  • Dahui Wang
  • Si Wu

Persistent activity refers to the phenomenon that cortical neurons keep firing even after the stimulus triggering the initial neuronal responses is removed. Persistent activity is widely believed to be the substrate for a neural system retaining a memory trace of the stimulus information. In the conventional view, persistent activity is regarded as an attractor of the network dynamics, but this faces the challenge of how the activity can be terminated properly. Here, in contrast to the attractor view, we consider that the stimulus information is encoded in a marginally unstable state of the network which decays very slowly and exhibits persistent firing for a prolonged duration. We propose a simple yet effective mechanism to achieve this goal, which utilizes the property of short-term plasticity (STP) of neuronal synapses. STP has two forms, short-term depression (STD) and short-term facilitation (STF), which have opposite effects on retaining neuronal responses. We find that by properly combining STF and STD, a neural system can hold persistent activity of graded lifetime, and that persistent activity fades away naturally without relying on an external drive. The implications of these results for neural information representation are discussed.
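The STD/STF dynamics the mechanism builds on follow the standard Tsodyks-Markram form: a facilitation variable u jumps up at each presynaptic spike while a depression variable x (available resources) is consumed, and the effective synaptic efficacy is proportional to u*x. The time constants, baseline U, and spike train below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

tau_d, tau_f, U = 0.2, 1.0, 0.2   # depression / facilitation time constants, baseline
dt = 0.001

u, x = U, 1.0
eff = []                          # synaptic efficacy at each spike
spikes = np.arange(0.0, 1.0, 0.05)            # 20 Hz presynaptic train, 1 s
t_spk = set(np.round(spikes / dt).astype(int))
for step in range(int(1.0 / dt)):
    u += dt * (U - u) / tau_f     # facilitation decays back to baseline
    x += dt * (1.0 - x) / tau_d   # resources recover toward 1
    if step in t_spk:
        u += U * (1.0 - u)        # facilitation jump on a spike
        eff.append(u * x)         # release fraction = efficacy
        x -= u * x                # depression: resources consumed
print(eff[0], eff[-1])
```

Whether efficacy grows or shrinks over the train depends on the balance of tau_f and tau_d, which is exactly the knob the abstract describes for grading the lifetime of persistent activity.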

NeurIPS Conference 2014 Conference Paper

Spike Frequency Adaptation Implements Anticipative Tracking in Continuous Attractor Neural Networks

  • Yuanyuan Mi
  • C. C. Alan Fung
  • K. Y. Michael Wong
  • Si Wu

To extract motion information, the brain needs to compensate for time delays that are ubiquitous in neural signal transmission and processing. Here we propose a simple yet effective mechanism to implement anticipative tracking in neural systems. The proposed mechanism utilizes the property of spike-frequency adaptation (SFA), a feature widely observed in neuronal responses. We employ continuous attractor neural networks (CANNs) as the model to describe the tracking behaviors in neural systems. Incorporating SFA, a CANN exhibits intrinsic mobility, manifested by the ability of the CANN to hold self-sustained travelling waves. In tracking a moving stimulus, the interplay between the external drive and the intrinsic mobility of the network determines the tracking performance. Interestingly, we find that the regime of anticipation effectively coincides with the regime where the intrinsic speed of the travelling wave exceeds that of the external drive. Depending on the SFA amplitudes, the network can achieve either perfect tracking, with zero-lag to the input, or perfect anticipative tracking, with a constant leading time to the input. Our model successfully reproduces experimentally observed anticipative tracking behaviors, and sheds light on our understanding of how the brain processes motion information in a timely manner.

NeurIPS Conference 2013 Conference Paper

Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively

  • Wen-Hao Zhang
  • Si Wu

Psychophysical experiments have demonstrated that the brain integrates information from multiple sensory cues in a near Bayesian optimal manner. The present study proposes a novel mechanism to achieve this. We consider two reciprocally connected networks, mimicking the integration of heading direction information between the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas. Each network serves as a local estimator and receives an independent cue, either the visual or the vestibular, as direct input for the external stimulus. We find that positive reciprocal interactions can improve the decoding accuracy of each individual network as if it implements Bayesian inference from two cues. Our model successfully explains the experimental finding that both MSTd and VIP achieve Bayesian multisensory integration, though each of them only receives a single cue as direct external input. Our result suggests that the brain may implement optimal information integration distributively at each local estimator through the reciprocal connections between cortical regions.
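The Bayesian-optimal benchmark that each local estimator is shown to match is the textbook inverse-variance weighting of two Gaussian cues: the combined estimate weights each cue by its reliability, and the combined variance is smaller than either cue's alone. The numeric values are illustrative.

```python
# Gaussian cue combination: the optimality benchmark for multisensory
# integration (illustrative values, not data from the paper).
x_vis, var_vis = 10.0, 4.0     # visual cue estimate and variance
x_vest, var_vest = 14.0, 1.0   # vestibular cue estimate and variance

w_vis = (1 / var_vis) / (1 / var_vis + 1 / var_vest)   # reliability weight
x_hat = w_vis * x_vis + (1 - w_vis) * x_vest           # combined estimate
var_hat = 1 / (1 / var_vis + 1 / var_vest)             # combined variance
print(x_hat, var_hat)   # 13.2, 0.8
```

The paper's point is that positive reciprocal coupling lets each network reach this combined accuracy even though it receives only one cue directly.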

NeurIPS Conference 2012 Conference Paper

Delay Compensation with Dynamical Synapses

  • Chi Fung
  • K. Wong
  • Si Wu

Time delay is pervasive in neural information processing. To achieve real-time tracking, it is critical to compensate the transmission and processing delays in a neural system. In the present study we show that dynamical synapses with short-term depression can enhance the mobility of a continuous attractor network to the extent that the system tracks time-varying stimuli in a timely manner. The state of the network can either track the instantaneous position of a moving stimulus perfectly (with zero-lag) or lead it with an effectively constant time, in agreement with experiments on the head-direction systems in rodents. The parameter regions for delayed, perfect and anticipative tracking correspond to network states that are static, ready-to-move and spontaneously moving, respectively, demonstrating the strong correlation between tracking performance and the intrinsic dynamics of the network. We also find that when the speed of the stimulus coincides with the natural speed of the network state, the delay becomes effectively independent of the stimulus amplitude.

NeurIPS Conference 2010 Conference Paper

Attractor Dynamics with Synaptic Depression

  • K. Wong
  • He Wang
  • Si Wu
  • Chi Fung

Neuronal connection weights exhibit short-term depression (STD). The present study investigates the impact of STD on the dynamics of a continuous attractor neural network (CANN) and its potential roles in neural information processing. We find that the network with STD can generate both static and traveling bumps, and STD enhances the performance of the network in tracking external inputs. In particular, we find that STD endows the network with slow-decaying plateau behaviors, namely, the network being initially stimulated to an active state will decay to silence very slowly in the time scale of STD rather than that of neural signaling. We argue that this provides a mechanism for neural systems to hold short-term memory easily and shut off persistent activities naturally.

NeurIPS Conference 2008 Conference Paper

Tracking Changing Stimuli in Continuous Attractor Neural Networks

  • K. Wong
  • Si Wu
  • Chi Fung

Continuous attractor neural networks (CANNs) are emerging as promising models for describing the encoding of continuous stimuli in neural systems. Due to the translational invariance of their neuronal interactions, CANNs can hold a continuous family of neutrally stable states. In this study, we systematically explore how the neutral stability of a CANN facilitates its tracking performance, a capacity believed to have wide applications in brain functions. We develop a perturbative approach that utilizes the dominant movement of the network stationary states in the state space. We quantify the distortions of the bump shape during tracking, and study their effects on the tracking performance. Results are obtained on the maximum speed at which a moving stimulus remains trackable, and on the reaction time to catch up with an abrupt change in the stimulus.

NeurIPS Conference 2001 Conference Paper

Neural Implementation of Bayesian Inference in Population Codes

  • Si Wu
  • Shun-ichi Amari

This study investigates a population decoding paradigm, in which the estimation of the stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of the stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning.
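The decoding scheme can be sketched directly: with Gaussian tuning curves and Poisson spike counts, the MAP estimate maximizes the Poisson log-likelihood plus a Gaussian log-prior centered on the previous estimate. Tuning widths, gain, grid, and prior width below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
prefs = np.linspace(-5, 5, 41)    # preferred stimuli of the population
sigma_tc, gain = 1.0, 20.0        # tuning width and peak rate (assumed)

def rates(s):
    return gain * np.exp(-(s - prefs) ** 2 / (2 * sigma_tc ** 2))

s_true = 0.7
counts = rng.poisson(rates(s_true))           # observed spike counts

grid = np.linspace(-5, 5, 2001)
# Poisson log-likelihood over a stimulus grid (constant terms dropped)
loglik = np.array([np.sum(counts * np.log(rates(s) + 1e-12) - rates(s))
                   for s in grid])

s_prev, sigma_prior = 0.5, 0.5    # previous estimate serves as the prior
logpost = loglik - (grid - s_prev) ** 2 / (2 * sigma_prior ** 2)

s_ml = grid[np.argmax(loglik)]    # ML: likelihood only
s_map = grid[np.argmax(logpost)]  # MAP: likelihood + prior from last step
print(s_ml, s_map)
```

The MAP estimate is pulled from the ML estimate toward the previous estimate, which is the effect the recurrent-interaction change is shown to implement.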

NeurIPS Conference 1999 Conference Paper

Population Decoding Based on an Unfaithful Model

  • Si Wu
  • Hiroyuki Nakahara
  • Noboru Murata
  • Shun-ichi Amari

We study a population decoding paradigm in which the maximum likelihood inference is based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding, because the encoding process of the brain is not exactly known, or because a simplified decoding model is preferred for saving computational cost. We consider an unfaithful decoding model which neglects the pairwise correlation between neuronal activities, and prove that UMLI is asymptotically efficient when the neuronal correlation is uniform or of limited range. The performance of UMLI is compared with that of the maximum likelihood inference based on a faithful model and that of the center-of-mass decoding method. It turns out that UMLI has the advantages of remarkably decreasing the computational complexity while maintaining a high level of decoding accuracy. The effect of correlation on the decoding accuracy is also discussed.
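The setting can be illustrated with a Gaussian-noise variant (an assumption for simplicity; the paper's analysis is more general): population responses carry uniform pairwise correlation, while the unfaithful decoder ignores it, reducing ML to a simple template match. Comparing its error with a decoder that uses the true covariance shows how little accuracy the simplification costs in the uniform-correlation case. Tuning shape, correlation strength, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 21
prefs = np.linspace(-5, 5, N)

def tuning(s):
    return np.exp(-(s - prefs) ** 2 / 2)

c, var = 0.3, 0.05
# uniform-correlation covariance (assumed noise model)
C = var * ((1 - c) * np.eye(N) + c * np.ones((N, N)))
L = np.linalg.cholesky(C)
Ci = np.linalg.inv(C)

grid = np.linspace(-2, 2, 401)
T = np.array([tuning(s) for s in grid])   # templates on a stimulus grid

def decode(r, cov_inv):
    # ML on the grid: minimise (r - f(s))^T cov_inv (r - f(s))
    d = r[None, :] - T
    return grid[np.argmin(np.einsum('ij,jk,ik->i', d, cov_inv, d))]

errs_u, errs_f = [], []
for _ in range(200):
    s = rng.uniform(-1, 1)
    r = tuning(s) + L @ rng.normal(size=N)
    errs_u.append(decode(r, np.eye(N)) - s)   # unfaithful: ignores correlation
    errs_f.append(decode(r, Ci) - s)          # faithful: true covariance
print(np.std(errs_u), np.std(errs_f))
```

With uniform correlations the two error levels come out close, consistent with the asymptotic-efficiency result stated above, while the unfaithful decoder avoids inverting (or even knowing) the correlation structure.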