Arrow Research search

Author name cluster

Si Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
1 author row

Possible papers

35

AAAI Conference 2026 Conference Paper

Refinement Contrastive Learning of Cell–Gene Associations for Unsupervised Cell Type Identification

  • Liang Peng
  • Haopeng Liu
  • Yixuan Ye
  • Cheng Liu
  • Wenjun Shen
  • Si Wu
  • Hau-San Wong

Unsupervised cell type identification is crucial for uncovering and characterizing heterogeneous populations in single-cell omics studies. Although a range of clustering methods have been developed, most focus exclusively on intrinsic cellular structure and ignore the pivotal role of cell-gene associations, which limits their ability to distinguish closely related cell types. To address this limitation, we propose a Refinement Contrastive Learning framework (scRCL) that explicitly incorporates cell-gene interactions to derive more informative representations. Specifically, we introduce two contrastive distribution alignment components that reveal reliable intrinsic cellular structures by effectively exploiting cell-cell structural relationships. Additionally, we develop a refinement module that integrates gene-correlation structure learning to enhance cell embeddings by capturing underlying cell-gene associations. This module strengthens connections between cells and their associated genes, refining representation learning to exploit biologically meaningful relationships. Extensive experiments on several single-cell RNA-seq and spatial transcriptomics benchmark datasets demonstrate that our method consistently outperforms state-of-the-art baselines in cell-type identification accuracy. Moreover, downstream biological analyses confirm that the recovered cell populations exhibit coherent gene-expression signatures, further validating the biological relevance of our approach.

AAAI Conference 2025 Conference Paper

3DHumanEdit: Multi-modal Body Part-aware Conditioning Information Integration for 3D Human Manipulation

  • FeiFan Xu
  • Tianyi Chen
  • Fan Yang
  • Yunfei Zhang
  • Si Wu

The rapid advancement of 3D Generative Adversarial Networks (GANs) has significantly enhanced the diversity and quality of generated 3D images. Despite these breakthroughs, the manipulation capabilities of 3D GANs remain underexplored, presenting substantial challenges for practical applications where user interaction and modification are essential. Current manipulation methods often lack the precision needed for fine-grained attribute manipulation, and struggle to maintain multi-view consistency during the editing process. To address these limitations, we propose 3DHumanEdit, a novel approach for 3D human body part-aware manipulation. 3DHumanEdit leverages multi-modal feature fusion and body part-aware feature alignment to achieve precise manipulation of individual body parts based on detailed text inputs and segmentation images. By exploiting a 3D prior for accurate editing and enforcing correspondence in latent space, 3DHumanEdit ensures coherence across multiple views. Experiments demonstrate that 3DHumanEdit outperforms existing methods in both editing fidelity and multi-view consistency, offering a robust solution for fine-grained 3D manipulation.

JBHI Journal 2025 Journal Article

Deep Self-Reinforced Multi-View Subspace Clustering for Cancer Subtyping

  • Cheng Liu
  • Baoyuan Zheng
  • Jiaojiao Wang
  • Xibiao Wang
  • Hang Gao
  • Fei Wang
  • Wenjun Shen
  • Si Wu

Identifying cancer subtypes is crucial for understanding disease progression and guiding precision medicine. With advances in high-throughput experimental technologies, the integration of multiple types of omics data for cancer subtype identification has become increasingly feasible. However, despite the promising performance of existing integrative cancer subtyping methods, efficiently integrating and clustering multi-omics datasets remains challenging due to the high levels of noise inherent in omics data, which impede the accurate characterization of relationships among samples. To address these challenges, we propose a novel deep multi-view subspace clustering model that incorporates a self-reinforced learning strategy. This strategy iteratively improves the quality of self-representation, which is critical for accurately capturing sample relationships and enabling effective clustering. Specifically, during model training, the proposed method learns a highly reliable self-representation through a good-neighbor learning mechanism, allowing it to model more accurate and robust inter-sample relationships. Building upon this reliable self-representation, we further develop a learnable view-graph fusion framework that integrates complementary information across multiple omics views to derive a consensus representation for clustering, thereby guiding the overall learning process. In addition, we introduce a local graph-guided learning mechanism based on an initial graph constructed from the raw data. This mechanism serves as an effective regularization strategy to prevent the model from converging to suboptimal solutions, thereby enhancing stability and robustness during training. Extensive experimental results demonstrate that the proposed method consistently outperforms several state-of-the-art approaches, validating its effectiveness and robustness for cancer subtype identification.

AAAI Conference 2025 Conference Paper

Discrete Prior-Based Temporal-Coherent Content Prediction for Blind Face Video Restoration

  • Lianxin Xie
  • Bingbing Zheng
  • Wen Xue
  • Yunfei Zhang
  • Le Jiang
  • Ruotao Xu
  • Si Wu
  • Hau-San Wong

Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This task poses the significant challenge of managing temporal heterogeneity while at the same time maintaining stable face attributes. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer to address the challenge, and our model is referred to as DP-TempCoh. Specifically, we incorporate a spatial-temporal-aware content prediction module to synthesize high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, a motion statistics modulation module is designed to adjust the content, based on discrete motion priors in terms of cross-frame mean and variance. As a result, the statistics of the predicted content can match those of real videos over time. By performing extensive experiments, we verify the effectiveness of the design elements and demonstrate the superior performance of our DP-TempCoh in both synthetically and naturally degraded video restoration.

AAAI Conference 2025 Conference Paper

RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting

  • Wen Xue
  • Chun Ding
  • Ruotao Xu
  • Si Wu
  • Yong Xu
  • Hau-San Wong

Face retouching aims to remove facial imperfections from images and videos while at the same time preserving face attributes. Existing methods are designed to perform non-interactive end-to-end retouching, while the ability to interact with users is highly demanded in downstream applications. In this paper, we propose RetouchGPT, a novel framework that leverages Large Language Models (LLMs) to guide the interactive retouching process. Towards this end, we design an instruction-driven imperfection prediction module to accurately identify imperfections by integrating textual and visual features. To learn imperfection prompts, we further incorporate an LLM-based embedding module to fuse multi-modal conditioning information. Prompt-based feature modification is performed in each transformer block, such that imperfection features are progressively suppressed and replaced with the features of normal skin. Extensive experiments verify the effectiveness of our design elements and demonstrate that RetouchGPT is a useful tool for interactive face retouching, achieving superior performance over state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Self-Correcting Robot Manipulation via Gaussian-Splatted Foresight

  • Shaohui Pan
  • Yong Xu
  • Ruotao Xu
  • Zihan Zhou
  • Si Wu
  • Zhuliang Yu

Language-conditioned robotic manipulation in unstructured environments presents significant challenges for intelligent robotic systems. Due to partial observation or imprecise action prediction, however, failure may be unavoidable for learned policies. Moreover, operational failures can lead to the robotic arm entering an untrained state, potentially causing destructive results. Consequently, the ability to detect and self-correct failures is crucial for the development of practical robotic systems. To address this challenge, we propose a foresight-driven failure detection and self-correction module for robot manipulation. By leveraging 3D Gaussian Splatting, we represent the current scene with multiple Gaussians. Subsequently, we train a prediction network to forecast the Gaussian representation of future scenes conditioned on planned actions. Failure is detected when the predicted future significantly deviates from the real observation after action execution. In such cases, the end-effector rolls back to the previous action to avoid an untrained state. Integrating this approach with the PerACT framework, we develop a self-correcting robot manipulation policy. Evaluations on ten RLBench tasks with 166 variations demonstrate the superior performance of the proposed method, which outperforms state-of-the-art methods by 12.0% in success rate on average.

AAAI Conference 2025 Conference Paper

SpotDiff: Spatial Gene Expression Imputation Diffusion with Single-Cell RNA Sequencing Data Integration

  • Tianyi Chen
  • Yunfei Zhang
  • Lianxin Xie
  • Wenjun Shen
  • Si Wu
  • Hau-San Wong

The advent of Spatial Transcriptomics (ST) has revolutionized our understanding of tissue architecture by creating high-resolution maps of gene expression patterns. However, the low capture rate of ST leads to significant sparsity. The aim of imputation is to recover biological signals by filling in the dropouts in ST data to approximate the true expression values. In this paper, we introduce a Spatial Gene Expression Imputation Diffusion model to facilitate ST data imputation, and our model is referred to as SpotDiff. Specifically, we incorporate a spot-gene prompt learning module to capture the association between spots and genes. Further, SpotDiff integrates single-cell RNA sequencing data to impute gene expression at each spot. The proposed approach is able to reduce the uncertainty in the imputation process, since the aggregation of multiple single-cell measurements yields a stable representation of the corresponding spot expression profile. Extensive experiments demonstrate that SpotDiff outperforms existing imputation methods across multiple benchmarks, yielding more accurate and biologically relevant gene expression profiles, particularly in highly sparse scenarios.
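The uncertainty-reduction argument above rests on a standard statistical fact: averaging many noisy single-cell measurements yields a far more stable estimate of a spot's expression than any individual cell. A minimal numpy sketch of that effect (not the SpotDiff model; the expression level, noise scale, and cell counts are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

true_expression = 5.0            # hypothetical true expression level at a spot
noise_sd = 2.0                   # per-cell measurement noise (made up)
n_cells, group_size = 1000, 25

# Noisy single-cell measurements of the same underlying signal
cells = true_expression + rng.normal(0.0, noise_sd, size=n_cells)

# Pool cells in groups of 25 to emulate aggregating them into spots
spot_estimates = cells.reshape(-1, group_size).mean(axis=1)

# Aggregation shrinks the spread by roughly sqrt(group_size)
spread_single = cells.std()
spread_pooled = spot_estimates.std()
```

The pooled estimates scatter about five times more tightly around the true value than single cells do, which is the sense in which aggregation stabilizes the spot representation.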

NeurIPS Conference 2025 Conference Paper

Unfolding the Black Box of Recurrent Neural Networks for Path Integration

  • Tianhao Chu
  • Yuling Wu
  • Neil Burgess
  • Zilong Ji
  • Si Wu

Path integration is essential for spatial navigation. Experimental studies have identified neural correlates for path integration, but exactly how the neural system accomplishes this computation remains unresolved. Here, we adopt recurrent neural networks (RNNs) trained to perform a path integration task to explore this issue. After training, we borrow neuroscience prior knowledge and methods to unfold the black box of the trained model, including: clarifying neuron types based on their receptive fields, dissecting information flows between neuron groups by pruning their connections, and analyzing internal dynamics of neuron groups using the attractor framework. Intriguingly, we uncover a hierarchical information processing pathway embedded in the RNN model, along which velocity information of an agent is first forwarded to band cells, band and grid cells then coordinate to carry out path integration, and finally grid cells output the agent location. Inspired by the RNN-based study, we construct a neural circuit model, in which band cells form one-dimensional (1D) continuous attractor neural networks (CANNs) and serve as upstream neurons to support downstream grid cells to carry out path integration in the 2D space. Our study challenges the conventional view of considering grid cells as the principal velocity integrator, and supports a neural circuit model with the hierarchy of band and grid cells.

NeurIPS Conference 2025 Conference Paper

Vector Quantization in the Brain: Grid-like Codes in World Models

  • Xiangyuan Peng
  • Xingsi Dong
  • Si Wu

We propose Grid-like Code Quantization (GCQ), a brain-inspired method for compressing observation-action sequences into discrete representations using grid-like patterns in attractor dynamics. Unlike conventional vector quantization approaches that operate on static inputs, GCQ performs spatiotemporal compression through an action-conditioned codebook, where codewords are derived from continuous attractor neural networks and dynamically selected based on actions. This enables GCQ to jointly compress space and time, serving as a unified world model. The resulting representation supports long-horizon prediction, goal-directed planning, and inverse modeling. Experiments across diverse tasks demonstrate GCQ's effectiveness in compact encoding and downstream performance. Our work offers both a computational tool for efficient sequence modeling and a theoretical perspective on the formation of grid-like codes in neural systems.
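For readers unfamiliar with the baseline being generalized here: conventional vector quantization maps each continuous vector to its nearest codeword in a codebook. A minimal numpy sketch of that static nearest-codeword assignment (GCQ's action-conditioned, attractor-derived codebook is the paper's contribution and is not reproduced here; the codebook size and dimensionality below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# A codebook of K codewords in a D-dimensional latent space
K, D = 16, 4
codebook = rng.normal(size=(K, D))

def quantize(z, codebook):
    """Map each latent vector in z to the index of its nearest codeword."""
    # (N, K) matrix of squared Euclidean distances to every codeword
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

z = rng.normal(size=(8, D))      # a batch of continuous latent vectors
idx = quantize(z, codebook)      # discrete codes
z_q = codebook[idx]              # quantized (discrete) reconstruction
```

GCQ departs from this static scheme by making the codeword selection depend on the agent's action, so that the discrete code compresses the observation-action sequence rather than a single input.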

AAAI Conference 2024 Conference Paper

RetouchFormer: Semi-supervised High-Quality Face Retouching Transformer with Prior-Based Selective Self-Attention

  • Xue Wen
  • Lianxin Xie
  • Le Jiang
  • Tianyi Chen
  • Si Wu
  • Cheng Liu
  • Hau-San Wong

Face retouching aims to beautify a face image while preserving the image content as much as possible. It is a promising yet challenging task to remove face imperfections and fill the regions with normal skin. Generic image enhancement methods are hampered by the lack of imperfection localization, which often results in incomplete removal of blemishes at large scales. To address this issue, we propose a transformer-based approach, RetouchFormer, which simultaneously identifies imperfections and synthesizes realistic content in the corresponding regions. Specifically, we learn a latent dictionary to capture clean face priors, and predict the imperfection regions via a reconstruction-oriented localization module. Based on this, we realize face retouching by explicitly suppressing imperfections in our selective self-attention computation, such that local content is synthesized from normal skin. In addition, multi-scale feature tokens provide increased flexibility in dealing with imperfections at various scales. These design elements bring greater effectiveness and efficiency. In our extensive experiments, RetouchFormer outperforms advanced face retouching methods and synthesizes clean face images with high fidelity.

IJCAI Conference 2024 Conference Paper

SCTrans: Multi-scale scRNA-seq Sub-vector Completion Transformer for Gene-selective Cell Type Annotation

  • Lu Lin
  • Wen Xue
  • Xindian Wei
  • Wenjun Shen
  • Cheng Liu
  • Si Wu
  • Hau San Wong

Cell type annotation is pivotal to single-cell RNA sequencing (scRNA-seq) data-based biological and medical analysis, e.g., identifying biomarkers, exploring cellular heterogeneity, and understanding disease mechanisms. Previous annotation methods typically learn a nonlinear mapping to infer cell type from gene expression vectors, and thus fall short in discovering and associating salient genes with specific cell types. To address this issue, we propose a multi-scale scRNA-seq Sub-vector Completion Transformer, referred to as SCTrans. Considering that the expressiveness of gene sub-vectors is richer than that of individual genes, we perform multi-scale partitioning on gene vectors followed by masked sub-vector completion, conditioned on the unmasked ones. Toward this end, the multi-scale sub-vectors are tokenized, and the intrinsic contextual relationships are modeled via self-attention computation and conditional contrastive regularization imposed on an encoding transformer. By performing mutual learning between the encoder and an additional lightweight counterpart, the salient tokens can be distinguished from the others. As a result, we can perform gene-selective cell type annotation, which contributes to our superior performance over state-of-the-art annotation methods.

NeurIPS Conference 2024 Conference Paper

The motion planning neural circuit in goal-directed navigation as Lie group operator search

  • Junfeng Zuo
  • Ying N. Wu
  • Si Wu
  • Wen-Hao Zhang

The information processing in the brain and embodied agents forms a sensory-action loop to interact with the world. An important step in the loop is motion planning, which selects motor actions based on the current world state and task needs. In goal-directed navigation, the brain chooses and generates motor actions to bring the current state into the goal state. The neural circuit mechanism of motor action selection, as well as its underlying theory, remains unclear. The present study formulates motion planning as a Lie group operator search problem, and uses the 1D rotation group as an example to provide insight into general operator search in neural circuits. We found that the abstract group operator search can be implemented by a two-layer feedforward circuit utilizing the circuit motifs of connection phase shift, nonlinear activation function, and pooling, similar to Drosophila's goal-directed navigation neural circuits. Moreover, the computational complexity of the feedforward circuit can be even lower than that of common signal processing algorithms in certain conditions. We also provide geometric interpretations of the circuit computation in the group representation space. The feedforward motion planning circuit is further combined with sensory and motor circuit modules into a full circuit of the sensory-action loop implementing goal-directed navigation. Our work for the first time links abstract operator search with biological neural circuits.
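The "operator search" problem can be made concrete in a toy form: represent headings as population bumps on a ring of neurons, and search over the 1D rotation group (circular shifts) for the operator that best maps the current state onto the goal state. A brute-force numpy sketch of that problem statement (an illustration only, not the paper's feedforward-circuit solution; ring size, bump width, and headings are arbitrary):

```python
import numpy as np

N = 64                                          # neurons on a ring (arbitrary)
angles = np.linspace(0.0, 2.0 * np.pi, N, endpoint=False)

def bump(theta, width=0.3):
    """Population activity bump centered at heading theta (radians)."""
    return np.exp(np.cos(angles - theta) / width)

current, goal = bump(1.0), bump(2.5)            # current and goal headings

# Search the rotation group: each circular shift is one candidate operator
overlaps = [np.dot(np.roll(current, s), goal) for s in range(N)]
best_shift = int(np.argmax(overlaps))
planned_rotation = 2.0 * np.pi * best_shift / N  # recovered rotation, ~1.5 rad
```

The recovered rotation matches the true heading difference (2.5 − 1.0 = 1.5 rad) up to the grid resolution of 2π/64. The paper's point is that a feedforward circuit with phase-shifted connections and pooling can perform this search without explicit enumeration.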

NeurIPS Conference 2024 Conference Paper

To Learn or Not to Learn, That is the Question — A Feature-Task Dual Learning Model of Perceptual Learning

  • Xiao Liu
  • Muyang Lyu
  • Cong Yu
  • Si Wu

Perceptual learning refers to the practice-driven process through which participants improve their performance in perceiving sensory stimuli. Two seemingly conflicting phenomena of specificity and transfer have been widely observed in perceptual learning. Here, we propose a dual-learning model to reconcile these two phenomena. The model consists of two learning processes. One is task-based learning, which is fast and enables the brain to adapt to a task rapidly by using existing feature representations. The other is feature-based learning, which is slow and enables the brain to improve feature representations to match the statistical change of the environment. Associated with different training paradigms, the interactions between these two learning processes induce the rich phenomena of perceptual learning. Specifically, in the training paradigm where the same stimulus condition is presented excessively, feature-based learning is triggered, which incurs specificity, while in the paradigm where the stimulus condition varies during the training, task-based learning dominates to induce the transfer effect. As the number of training sessions under the same stimulus condition increases, a transition from transfer to specificity occurs. We demonstrate that the dual-learning model can account for both the specificity and transfer phenomena observed in classical psychophysical experiments. We hope that this study gives us insight into understanding how the brain balances the accomplishment of a new task and the consumption of learning effort.

NeurIPS Conference 2023 Conference Paper

A Recurrent Neural Circuit Mechanism of Temporal-scaling Equivariant Representation

  • Junfeng Zuo
  • Xiao Liu
  • Ying Nian Wu
  • Si Wu
  • Wenhao Zhang

Time perception is critical in our daily life. An important feature of time perception is temporal scaling (TS): the ability to generate temporal sequences (e.g., motor actions) at different speeds. However, the mathematical principle underlying temporal scaling in recurrent circuits in the brain remains largely unknown. To shed light on this issue, the present study investigates temporal scaling from the Lie group point of view. We propose a canonical nonlinear recurrent circuit dynamics, modeled as a continuous attractor network, whose neuronal population responses embed a temporal sequence that is TS equivariant. Furthermore, we found that the TS group operators can be explicitly represented by a control input fed into the recurrent circuit, where the input gain determines the temporal scaling factor (group parameter), and the spatial offset between the control input and the network state gives rise to the generator. The neuronal responses in the recurrent circuit are also consistent with experimental findings. We illustrate that the recurrent circuit can drive a feedforward circuit to generate complex temporal sequences with different time scales, even in the case of negative time scaling ("time reversal"). Our work for the first time analytically links the abstract temporal scaling group to concrete neural circuit dynamics.
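The claim that an input gain determines the temporal scaling factor has a simple dynamical-systems analogue: multiplying a system's dynamics by a gain g replays the same trajectory g times faster. A numpy Euler-integration sketch on a toy 2D rotation (a generic illustration, not the paper's attractor network; the step size and gains are arbitrary):

```python
import numpy as np

def rollout(gain, n_steps, dt=0.001, x0=(1.0, 0.0)):
    """Euler-integrate dx/dt = gain * A x, where A generates 2D rotations."""
    A = np.array([[0.0, -1.0],
                  [1.0,  0.0]])        # generator of the 2D rotation group
    x = np.array(x0)
    traj = np.empty((n_steps + 1, 2))
    traj[0] = x
    for t in range(n_steps):
        x = x + gain * (A @ x) * dt    # gain rescales the speed of the flow
        traj[t + 1] = x
    return traj

slow = rollout(gain=1.0, n_steps=2000)  # trajectory at base speed
fast = rollout(gain=2.0, n_steps=1000)  # same trajectory, twice as fast
```

Doubling the gain halves the time needed to traverse the same path: the fast rollout at step k coincides (up to integration error) with the slow rollout at step 2k, which is exactly the temporal-scaling equivariance being described.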

NeurIPS Conference 2023 Conference Paper

Learning and processing the ordinal information of temporal sequences in recurrent neural circuits

  • Xiaolong Zou
  • Zhikun Chu
  • Qinghai Guo
  • Jie Cheng
  • Bo Ho
  • Si Wu
  • Yuanyuan Mi

Temporal sequence processing is fundamental to brain cognitive functions. Experimental data have indicated that the representations of ordinal information and contents of temporal sequences are disentangled in the brain, but the neural mechanism underlying this disentanglement remains largely unclear. Here, we investigate how recurrent neural circuits learn to represent the abstract order structure of temporal sequences, and how disentangling the representation of order structure from that of contents facilitates the processing of temporal sequences. We show that with an appropriate learning protocol, a recurrent neural circuit can learn a set of tree-structured attractor states to encode the corresponding tree-structured orders of given temporal sequences. This abstract temporal order template can then be bound with different contents, allowing for flexible and robust temporal sequence processing. Using a transfer learning task, we demonstrate that the reuse of a temporal order template facilitates the acquisition of new temporal sequences with the same or similar ordinal structure. Using a key-word spotting task, we demonstrate that the attractor representation of order structure improves the robustness of temporal sequence discrimination when the ordinal information is the key to differentiating sequences. We hope this study gives insight into the neural mechanism of representing the ordinal information of temporal sequences in the brain, and helps the development of brain-inspired temporal sequence processing algorithms.

NeurIPS Conference 2023 Conference Paper

Neural Sampling in Hierarchical Exponential-family Energy-based Models

  • Xingsi Dong
  • Si Wu

Bayesian brain theory suggests that the brain employs generative models to understand the external world. The sampling-based perspective posits that the brain infers the posterior distribution through samples of stochastic neuronal responses. Additionally, the brain continually updates its generative model to approach the true distribution of the external world. In this study, we introduce the Hierarchical Exponential-family Energy-based (HEE) model, which captures the dynamics of inference and learning. In the HEE model, we decompose the partition function into individual layers and leverage a group of neurons with shorter time constants to sample the gradient of the decomposed normalization term. This allows our model to estimate the partition function and perform inference simultaneously, circumventing the negative phase encountered in conventional energy-based models (EBMs). As a result, the learning process is localized in both time and space, and the model converges easily. To match the brain's rapid computation, we demonstrate that neural adaptation can serve as a momentum term, significantly accelerating the inference process. On natural image datasets, our model exhibits representations akin to those observed in the biological visual system. Furthermore, for the machine learning community, our model can generate observations through joint or marginal generation. We show that marginal generation outperforms joint generation and achieves performance on par with other EBMs.

NeurIPS Conference 2023 Conference Paper

Slow and Weak Attractor Computation Embedded in Fast and Strong E-I Balanced Neural Dynamics

  • Xiaohan Lin
  • Liyuan Li
  • Boxin Shi
  • Tiejun Huang
  • Yuanyuan Mi
  • Si Wu

Attractor networks require neuronal connections to be highly structured in order to maintain attractor states that represent information, while excitation and inhibition balanced networks (E-INNs) require neuronal connections to be random and sparse to generate irregular neuronal firings. Despite being regarded as canonical models of neural circuits, both types of networks are usually studied in isolation, and it remains unclear how they coexist in the brain, given their very different structural demands. In this study, we investigate the compatibility of continuous attractor neural networks (CANNs) and E-INNs. In line with recent experimental data, we find that a neural circuit can exhibit both the traits of CANNs and E-INNs if the neuronal synapses consist of two sets: one set is strong and fast for irregular firing, and the other set is weak and slow for attractor dynamics. Our results from simulations and theoretical analysis reveal that the network also exhibits enhanced performance compared to the case of using only one set of synapses, with accelerated convergence of attractor states while retaining the E-I balanced condition for localized input. We also apply the network model to solve a real-world tracking problem and demonstrate that it can track fast-moving objects well. We hope that this study provides insight into how structured neural computations are realized by irregular firings of neurons.

NeurIPS Conference 2022 Conference Paper

Adaptation Accelerating Sampling-based Bayesian Inference in Attractor Neural Networks

  • Xingsi Dong
  • Zilong Ji
  • Tianhao Chu
  • Tiejun Huang
  • Wenhao Zhang
  • Si Wu

The brain performs probabilistic Bayesian inference to interpret the external world. The sampling-based view assumes that the brain represents the stimulus posterior distribution via samples of stochastic neuronal responses. Although the idea of sampling-based inference is appealing, it faces a critical challenge of whether stochastic sampling is fast enough to match the rapid computation of the brain. In this study, we explore how latent stimulus sampling can be accelerated in neural circuits. Specifically, we consider a canonical neural circuit model called continuous attractor neural networks (CANNs) and investigate how sampling-based inference of latent continuous variables is accelerated in CANNs. Intriguingly, we find that by including noisy adaptation in the neuronal dynamics, the CANN is able to speed up the sampling process significantly. We theoretically derive that the CANN with noisy adaptation implements the efficient sampling method called Hamiltonian dynamics with friction, where noisy adaptation effectively plays the role of momentum. We theoretically analyze the sampling performances of the network and derive the condition under which the acceleration has the maximum effect. Simulation results confirm our theoretical analyses. We further extend the model to coupled CANNs and demonstrate that noisy adaptation accelerates the sampling of the posterior distribution of multivariate stimuli. We hope that this study enhances our understanding of how Bayesian inference is realized in the brain.
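"Hamiltonian dynamics with friction" refers to underdamped Langevin dynamics: a momentum-like variable is added to the sampler while the stationary distribution of the sampled variable remains the target posterior. A minimal Euler-Maruyama sketch for a 1D standard-Gaussian target (a generic sampler illustration, not the paper's CANN model; the step size, friction, and run length below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, gamma, n_steps = 0.02, 2.0, 200_000   # step size, friction, run length

# Underdamped Langevin for the target p(x) ∝ exp(-x^2 / 2):
#   dx = v dt
#   dv = (-x - gamma v) dt + sqrt(2 gamma) dW   (v plays the role of momentum)
x, v = 0.0, 0.0
xs = np.empty(n_steps)
for t in range(n_steps):
    v += (-x - gamma * v) * dt + np.sqrt(2.0 * gamma * dt) * rng.normal()
    x += v * dt
    xs[t] = x

samples = xs[n_steps // 5:]               # discard burn-in
```

After burn-in, the samples of x have mean ≈ 0 and variance ≈ 1, i.e. the friction-damped momentum dynamics sample the correct target; the paper's contribution is showing that noisy neural adaptation in a CANN implements this kind of momentum and analyzing when it accelerates sampling.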

NeurIPS Conference 2022 Conference Paper

Oscillatory Tracking of Continuous Attractor Neural Networks Account for Phase Precession and Procession of Hippocampal Place Cells

  • Tianhao Chu
  • Zilong Ji
  • Junfeng Zuo
  • Wenhao Zhang
  • Tiejun Huang
  • Yuanyuan Mi
  • Si Wu

Hippocampal place cells of freely moving rodents display an intriguing temporal organization in their responses known as "theta phase precession", in which individual neurons fire at progressively earlier phases in successive theta cycles as the animal traverses the place fields. Recent experimental studies found that in addition to phase precession, many place cells also exhibit accompanied phase procession, but the underlying neural mechanism remains unclear. Here, we propose a neural circuit model to elucidate the generation of both kinds of phase shift in place cells' firing. Specifically, we consider a continuous attractor neural network (CANN) with feedback inhibition, which is inspired by the reciprocal interaction between the hippocampus and the medial septum. The feedback inhibition induces intrinsic mobility of the CANN which competes with the extrinsic mobility arising from the external drive. Their interplay generates an oscillatory tracking state, that is, the network bump state (resembling the decoded virtual position of the animal) sweeps back and forth around the external moving input (resembling the physical position of the animal). We show that this oscillatory tracking naturally explains the forward and backward sweeps of the decoded position during the animal's locomotion. At the single neuron level, the forward and backward sweeps account for, respectively, theta phase precession and procession. Furthermore, by tuning the feedback inhibition strength, we also explain the emergence of bimodal cells and unimodal cells, with the former having co-existing phase precession and procession, and the latter having only significant phase precession. We hope that this study facilitates our understanding of hippocampal temporal coding and lays the foundation for unveiling their computational functions.

NeurIPS Conference 2022 Conference Paper

Translation-equivariant Representation in Recurrent Networks with a Continuous Manifold of Attractors

  • Wenhao Zhang
  • Ying Nian Wu
  • Si Wu

Equivariant representation is necessary for the brain and artificial perceptual systems to faithfully represent the stimulus under some (Lie) group transformations. However, it remains unknown how recurrent neural circuits in the brain represent the stimulus equivariantly, or how abstract group operators are neurally represented. The present study uses a one-dimensional (1D) translation group as an example to explore the general recurrent neural circuit mechanism of equivariant stimulus representation. We found that a continuous attractor network (CAN), a canonical neural circuit model, self-consistently generates a continuous family of stationary population responses (attractors) that represents the stimulus equivariantly. Inspired by the Drosophila compass circuit, we found that the 1D translation operators can be represented by extra speed neurons besides the CAN, where the speed neurons' responses represent the moving speed (1D translation group parameter), and their feedback connections to the CAN represent the translation generator (Lie algebra). We demonstrated that the network responses are consistent with experimental data. Our model for the first time demonstrates how recurrent neural circuitry in the brain achieves equivariant stimulus representation.

AAAI Conference 2021 Conference Paper

High Fidelity GAN Inversion via Prior Multi-Subspace Feature Composition

  • Guanyue Li
  • Qianfen Jiao
  • Sheng Qian
  • Si Wu
  • Hau-San Wong

Generative Adversarial Networks (GANs) have shown impressive gains in image synthesis. GAN inversion was recently studied to understand and utilize the knowledge it learns, where a real image is inverted back to a latent code and can thus be reconstructed by the generator. Although increasing the number of latent codes can improve inversion quality to a certain extent, we find that important details may still be neglected when performing feature composition over all the intermediate feature channels. To address this issue, we propose a Prior multi-Subspace Feature Composition (PmSFC) approach for high-fidelity inversion. Considering that the intermediate features are highly correlated with each other, we incorporate a self-expressive layer in the generator to discover meaningful subspaces. In this case, the features at a channel can be expressed as a linear combination of those at other channels in the same subspace. We perform feature composition separately in the subspaces. The semantic differences between them benefit the inversion quality, since the inversion process is regularized based on different aspects of semantics. In the experiments, the superior performance of PmSFC demonstrates the effectiveness of prior subspaces in facilitating GAN inversion together with extended applications in visual manipulation.

NeurIPS Conference 2021 Conference Paper

Noisy Adaptation Generates Lévy Flights in Attractor Neural Networks

  • Xingsi Dong
  • Tianhao Chu
  • Tiejun Huang
  • Zilong Ji
  • Si Wu

Lévy flights describe a special class of random walks whose step sizes satisfy a power-law tailed distribution. As an efficient searching strategy in unknown environments, Lévy flights are widely observed in animal foraging behaviors. Recent studies further showed that human cognitive functions also exhibit the characteristics of Lévy flights. Despite being a general phenomenon, the neural mechanism at the circuit level for generating Lévy flights remains unresolved. Here, we investigate how Lévy flights can be achieved in attractor neural networks. To elucidate the underlying mechanism clearly, we first study continuous attractor neural networks (CANNs), and find that noisy neural adaptation, exemplified by spike frequency adaptation (SFA) in this work, can generate Lévy flights representing transitions of the network state in the attractor space. Specifically, the strength of SFA defines a travelling wave boundary, below which the network state displays local Brownian motion, and above which the network state displays long-jump motion. Noise in neural adaptation causes the network state to intermittently switch between these two motion modes, manifesting the characteristics of Lévy flights. We further extend the study to a general attractor neural network, and demonstrate that our model can explain the Lévy-flight phenomenon observed during free memory retrieval in humans. We hope that this study will give insight into understanding the neural mechanism for optimal information processing in the brain.
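The intermittent switching between Brownian motion and long jumps can be illustrated phenomenologically: pooling small Gaussian steps with occasional heavy-tailed jumps yields a step-size distribution with a far heavier tail than a Gaussian walk. The mixture weight, scales, and tail exponent below are illustrative assumptions, not values from the network simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# with probability 0.05 the (noisy-adaptation-driven) state makes a long jump
jump = rng.random(n) < 0.05
steps = np.where(jump,
                 1.0 + rng.pareto(1.5, size=n),        # heavy-tailed long jumps
                 np.abs(rng.normal(0.0, 0.1, size=n))) # local Brownian steps

# tail fraction: steps exceeding 20x the median step size
tail = np.mean(steps > 20 * np.median(steps))
# for a pure Gaussian walk of the same scale, that tail is essentially empty
gauss_tail = np.mean(np.abs(rng.normal(0.0, 0.1, n)) > 20 * (0.1 * 0.6745))
print(tail, gauss_tail)
```

The non-negligible tail fraction from the two-mode mixture is the fingerprint of Lévy-flight-like statistics that the paper identifies in the network's state transitions.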

NeurIPS Conference 2019 Conference Paper

A Normative Theory for Causal Inference and Bayes Factor Computation in Neural Circuits

  • Wenhao Zhang
  • Si Wu
  • Brent Doiron
  • Tai Sing Lee

This study provides a normative theory for how Bayesian causal inference can be implemented in neural circuits. In both cognitive processes such as causal reasoning and perceptual inference such as cue integration, the nervous system needs to choose different models representing the underlying causal structures when making inferences on external stimuli. In multisensory processing, for example, the nervous system has to choose whether to integrate or segregate inputs from different sensory modalities to infer the sensory stimuli, based on whether the inputs are from the same or different sources. Making this choice is a model selection problem requiring the computation of the Bayes factor, the ratio of likelihoods between the integration and the segregation models. In this paper, we consider causal inference in multisensory processing and propose a novel generative model based on neural population codes that takes into account both stimulus feature and stimulus reliability in the inference. In the case of circular variables such as heading direction, our normative theory yields an analytical solution for computing the Bayes factor, with a clear geometric interpretation, which can be implemented by simple additive mechanisms with neural population codes. Numerical simulation shows that the tunings of the neurons computing the Bayes factor are consistent with the "opposite neurons" discovered in the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas for visual-vestibular processing. This study illuminates a potential neural mechanism for causal inference in the brain.
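The Bayes factor between the integration model (one common heading behind both cues) and the segregation model (independent headings) can be sketched numerically for circular variables with von Mises cue likelihoods and a uniform prior. The concentration parameter and grid size are illustrative assumptions; this is the generic model-selection computation, not the paper's neural implementation.

```python
import numpy as np

def vonmises_pdf(x, mu, kappa):
    # von Mises density on the circle; np.i0 is the modified Bessel I0
    return np.exp(kappa * np.cos(x - mu)) / (2 * np.pi * np.i0(kappa))

def bayes_factor(x1, x2, kappa=8.0, n=2000):
    s = np.linspace(-np.pi, np.pi, n, endpoint=False)
    ds = 2 * np.pi / n
    prior = 1.0 / (2 * np.pi)              # uniform circular prior
    # integration: both cues generated by one shared source s
    p_int = np.sum(vonmises_pdf(x1, s, kappa) *
                   vonmises_pdf(x2, s, kappa) * prior) * ds
    # segregation: each cue has its own independent source
    p_seg = (np.sum(vonmises_pdf(x1, s, kappa) * prior) * ds *
             np.sum(vonmises_pdf(x2, s, kappa) * prior) * ds)
    return p_int / p_seg

bf_same = bayes_factor(0.0, 0.1)     # nearby cues -> favours integration
bf_opp = bayes_factor(0.0, np.pi)    # opposite cues -> favours segregation
print(bf_same, bf_opp)
```

A Bayes factor above 1 selects the integration model, below 1 the segregation model, which is exactly the choice the abstract describes for same-source versus different-source inputs.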

AAAI Conference 2019 Conference Paper

Improving Domain-Specific Classification by Collaborative Learning with Adaptation Networks

  • Si Wu
  • Jian Zhong
  • Wenming Cao
  • Rui Li
  • Zhiwen Yu
  • Hau-San Wong

For unsupervised domain adaptation, the process of learning domain-invariant representations could be dominated by the labeled source data, such that the specific characteristics of the target domain may be ignored. In order to improve the performance in inferring target labels, we propose a target-specific network which is capable of learning collaboratively with a domain adaptation network, instead of directly minimizing domain discrepancy. A clustering regularization is also utilized to improve the generalization capability of the target-specific network by forcing target data points to be close to accumulated class centers. As this network learns and specializes to the target domain, its performance in inferring target labels improves, which in turn facilitates the learning process of the adaptation network. Therefore, there is a mutually beneficial relationship between these two networks. We perform extensive experiments on multiple digit and object datasets, and the effectiveness and superiority of the proposed approach is presented and verified on multiple visual adaptation benchmarks, e.g., we improve the state-of-the-art on the task of MNIST→SVHN from 76.5% to 84.9% without specific augmentation.

IJCAI Conference 2019 Conference Paper

Improving representation learning in autoencoders via multidimensional interpolation and dual regularizations

  • Sheng Qian
  • Guanyue Li
  • Wen-Ming Cao
  • Cheng Liu
  • Si Wu
  • Hau San Wong

Autoencoders enjoy a remarkable ability to learn data representations. Research on autoencoders shows that the effectiveness of data interpolation can reflect the performance of representation learning. However, existing interpolation methods in autoencoders lack the capability to traverse the possible region between two datapoints on a data manifold, and the distribution of interpolated latent representations is not considered. To address these issues, we aim to fully exert the potential of data interpolation and further improve representation learning in autoencoders. Specifically, we propose multidimensional interpolation, which increases the capability of data interpolation by randomly setting interpolation coefficients for each dimension of the latent representations. In addition, we regularize autoencoders in both the latent and the data spaces by imposing a prior on latent representations in the Maximum Mean Discrepancy (MMD) framework and encouraging generated datapoints to be realistic in the Generative Adversarial Network (GAN) framework. Compared to representative models, our proposed model empirically achieves better representation learning performance on downstream tasks across multiple benchmarks.
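The multidimensional interpolation idea is easy to sketch: instead of one scalar coefficient for the whole latent vector, draw an independent coefficient per latent dimension, so interpolates can reach the whole axis-aligned box spanned by the two codes rather than just the line segment. Latent dimensionality and the uniform coefficient distribution below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=16), rng.normal(size=16)   # two latent codes

# standard interpolation: a single scalar coefficient
alpha_scalar = 0.3
z_linear = alpha_scalar * z1 + (1 - alpha_scalar) * z2

# multidimensional interpolation: one random coefficient per dimension
alpha_vec = rng.random(16)
z_multi = alpha_vec * z1 + (1 - alpha_vec) * z2
print(z_multi.round(2))
```

Each component of `z_multi` still lies between the corresponding components of `z1` and `z2`, but the interpolate as a whole leaves the straight line between the two codes, which is what gives the method its larger traversal region.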

NeurIPS Conference 2019 Conference Paper

Push-pull Feedback Implements Hierarchical Information Retrieval Efficiently

  • Xiao Liu
  • Xiaolong Zou
  • Zilong Ji
  • Gengshuo Tian
  • Yuanyuan Mi
  • Tiejun Huang
  • K. Y. Michael Wong
  • Si Wu

Experimental data have revealed that in addition to feedforward connections, there exist abundant feedback connections in a neural pathway. Although the importance of feedback in neural information processing has been widely recognized in the field, the detailed mechanism of how it works remains largely unknown. Here, we investigate the role of feedback in hierarchical information retrieval. Specifically, we consider a hierarchical network storing the hierarchical categorical information of objects, and information retrieval goes from rough to fine, aided by dynamical push-pull feedback from higher to lower layers. We elucidate that the push (positive) and pull (negative) feedbacks suppress the interference due to neural correlations between different categories and within the same category, respectively, and that their joint effect improves retrieval performance significantly. Our model agrees with the push-pull phenomenon observed in neural data and sheds light on our understanding of the role of feedback in neural information processing.

NeurIPS Conference 2016 Conference Paper

“Congruent” and “Opposite” Neurons: Sisters for Multisensory Integration and Segregation

  • Wen-Hao Zhang
  • He Wang
  • K. Y. Michael Wong
  • Si Wu

Experiments reveal that in the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas, where visual and vestibular cues are integrated to infer heading direction, there are two types of neurons in roughly equal numbers. One is "congruent" cells, whose preferred heading directions are similar in response to visual and vestibular cues; the other is "opposite" cells, whose preferred heading directions are nearly "opposite" (with an offset of 180 degrees) in response to visual vs. vestibular cues. Congruent neurons are known to be responsible for cue integration, but the computational role of opposite neurons remains largely unknown. Here, we propose that opposite neurons may serve to encode the disparity information between cues necessary for multisensory segregation. We build a computational model composed of two reciprocally coupled modules, MSTd and VIP, each consisting of groups of congruent and opposite neurons. In the model, congruent neurons in the two modules are reciprocally connected with each other in the congruent manner, whereas opposite neurons are reciprocally connected in the opposite manner. Mimicking the experimental protocol, our model reproduces the characteristics of congruent and opposite neurons, and demonstrates that in each module, the sisters of congruent and opposite neurons can jointly achieve optimal multisensory information integration and segregation. This study sheds light on our understanding of how the brain implements optimal multisensory integration and segregation concurrently in a distributed manner.

NeurIPS Conference 2014 Conference Paper

A Synaptical Story of Persistent Activity with Graded Lifetime in a Neural System

  • Yuanyuan Mi
  • Luozheng Li
  • Dahui Wang
  • Si Wu

Persistent activity refers to the phenomenon that cortical neurons keep firing even after the stimulus triggering the initial neuronal responses is removed. Persistent activity is widely believed to be the substrate for a neural system retaining a memory trace of the stimulus information. In the conventional view, persistent activity is regarded as an attractor of the network dynamics, but this faces the challenge of how the activity can be terminated properly. Here, in contrast to the attractor view, we consider that the stimulus information is encoded in a marginally unstable state of the network which decays very slowly and exhibits persistent firing for a prolonged duration. We propose a simple yet effective mechanism to achieve this goal, which utilizes the property of short-term plasticity (STP) of neuronal synapses. STP has two forms, short-term depression (STD) and short-term facilitation (STF), which have opposite effects on retaining neuronal responses. We find that by properly combining STF and STD, a neural system can hold persistent activity of graded lifetime, and that persistent activity fades away naturally without relying on an external drive. The implications of these results for neural information representation are discussed.
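The STD/STF dynamics the mechanism builds on follow the standard Tsodyks-Markram form: a facilitation variable u jumps up at each presynaptic spike while a depression variable x (available resources) is consumed, and the effective synaptic efficacy is proportional to u*x. The time constants, baseline U, and spike train below are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

tau_d, tau_f, U = 0.2, 1.0, 0.2   # depression / facilitation time constants, baseline
dt = 0.001

u, x = U, 1.0
eff = []                          # synaptic efficacy at each spike
spikes = np.arange(0.0, 1.0, 0.05)            # 20 Hz presynaptic train, 1 s
t_spk = set(np.round(spikes / dt).astype(int))
for step in range(int(1.0 / dt)):
    u += dt * (U - u) / tau_f     # facilitation decays back to baseline
    x += dt * (1.0 - x) / tau_d   # resources recover toward 1
    if step in t_spk:
        u += U * (1.0 - u)        # facilitation jump on a spike
        eff.append(u * x)         # release fraction = efficacy
        x -= u * x                # depression: resources consumed
print(eff[0], eff[-1])
```

Whether efficacy grows or shrinks over the train depends on the balance of tau_f and tau_d, which is exactly the knob the abstract describes for grading the lifetime of persistent activity.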

NeurIPS Conference 2014 Conference Paper

Spike Frequency Adaptation Implements Anticipative Tracking in Continuous Attractor Neural Networks

  • Yuanyuan Mi
  • C. C. Alan Fung
  • K. Y. Michael Wong
  • Si Wu

To extract motion information, the brain needs to compensate for time delays that are ubiquitous in neural signal transmission and processing. Here we propose a simple yet effective mechanism to implement anticipative tracking in neural systems. The proposed mechanism utilizes the property of spike-frequency adaptation (SFA), a feature widely observed in neuronal responses. We employ continuous attractor neural networks (CANNs) as the model to describe the tracking behaviors in neural systems. Incorporating SFA, a CANN exhibits intrinsic mobility, manifested by the ability of the CANN to hold self-sustained travelling waves. In tracking a moving stimulus, the interplay between the external drive and the intrinsic mobility of the network determines the tracking performance. Interestingly, we find that the regime of anticipation effectively coincides with the regime where the intrinsic speed of the travelling wave exceeds that of the external drive. Depending on the SFA amplitudes, the network can achieve either perfect tracking, with zero-lag to the input, or perfect anticipative tracking, with a constant leading time to the input. Our model successfully reproduces experimentally observed anticipative tracking behaviors, and sheds light on our understanding of how the brain processes motion information in a timely manner.

NeurIPS Conference 2013 Conference Paper

Reciprocally Coupled Local Estimators Implement Bayesian Information Integration Distributively

  • Wen-Hao Zhang
  • Si Wu

Psychophysical experiments have demonstrated that the brain integrates information from multiple sensory cues in a near Bayesian optimal manner. The present study proposes a novel mechanism to achieve this. We consider two reciprocally connected networks, mimicking the integration of heading direction information between the dorsal medial superior temporal (MSTd) and the ventral intraparietal (VIP) areas. Each network serves as a local estimator and receives an independent cue, either the visual or the vestibular, as direct input for the external stimulus. We find that positive reciprocal interactions can improve the decoding accuracy of each individual network as if it implements Bayesian inference from two cues. Our model successfully explains the experimental finding that both MSTd and VIP achieve Bayesian multisensory integration, though each of them only receives a single cue as direct external input. Our result suggests that the brain may implement optimal information integration distributively at each local estimator through the reciprocal connections between cortical regions.
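The Bayesian-optimal benchmark that each local estimator is shown to match is the textbook inverse-variance weighting of two Gaussian cues: the combined estimate weights each cue by its reliability, and the combined variance is smaller than either cue's alone. The numeric values are illustrative.

```python
# Gaussian cue combination: the optimality benchmark for multisensory
# integration (illustrative values, not data from the paper).
x_vis, var_vis = 10.0, 4.0     # visual cue estimate and variance
x_vest, var_vest = 14.0, 1.0   # vestibular cue estimate and variance

w_vis = (1 / var_vis) / (1 / var_vis + 1 / var_vest)   # reliability weight
x_hat = w_vis * x_vis + (1 - w_vis) * x_vest           # combined estimate
var_hat = 1 / (1 / var_vis + 1 / var_vest)             # combined variance
print(x_hat, var_hat)   # 13.2, 0.8
```

The paper's point is that positive reciprocal coupling lets each network reach this combined accuracy even though it receives only one cue directly.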

NeurIPS Conference 2012 Conference Paper

Delay Compensation with Dynamical Synapses

  • Chi Fung
  • K. Wong
  • Si Wu

Time delay is pervasive in neural information processing. To achieve real-time tracking, it is critical to compensate the transmission and processing delays in a neural system. In the present study we show that dynamical synapses with short-term depression can enhance the mobility of a continuous attractor network to the extent that the system tracks time-varying stimuli in a timely manner. The state of the network can either track the instantaneous position of a moving stimulus perfectly (with zero-lag) or lead it with an effectively constant time, in agreement with experiments on the head-direction systems in rodents. The parameter regions for delayed, perfect and anticipative tracking correspond to network states that are static, ready-to-move and spontaneously moving, respectively, demonstrating the strong correlation between tracking performance and the intrinsic dynamics of the network. We also find that when the speed of the stimulus coincides with the natural speed of the network state, the delay becomes effectively independent of the stimulus amplitude.

NeurIPS Conference 2010 Conference Paper

Attractor Dynamics with Synaptic Depression

  • K. Wong
  • He Wang
  • Si Wu
  • Chi Fung

Neuronal connection weights exhibit short-term depression (STD). The present study investigates the impact of STD on the dynamics of a continuous attractor neural network (CANN) and its potential roles in neural information processing. We find that the network with STD can generate both static and traveling bumps, and STD enhances the performance of the network in tracking external inputs. In particular, we find that STD endows the network with slow-decaying plateau behaviors, namely, the network being initially stimulated to an active state will decay to silence very slowly in the time scale of STD rather than that of neural signaling. We argue that this provides a mechanism for neural systems to hold short-term memory easily and shut off persistent activities naturally.

NeurIPS Conference 2008 Conference Paper

Tracking Changing Stimuli in Continuous Attractor Neural Networks

  • K. Wong
  • Si Wu
  • Chi Fung

Continuous attractor neural networks (CANNs) are emerging as promising models for describing the encoding of continuous stimuli in neural systems. Due to the translational invariance of their neuronal interactions, CANNs can hold a continuous family of neutrally stable states. In this study, we systematically explore how the neutral stability of a CANN facilitates its tracking performance, a capacity believed to have wide applications in brain functions. We develop a perturbative approach that utilizes the dominant movement of the network stationary states in the state space. We quantify the distortions of the bump shape during tracking, and study their effects on the tracking performance. Results are obtained on the maximum speed at which a moving stimulus remains trackable, and on the reaction time to catch up with an abrupt change in the stimulus.

NeurIPS Conference 2001 Conference Paper

Neural Implementation of Bayesian Inference in Population Codes

  • Si Wu
  • Shun-ichi Amari

This study investigates a population decoding paradigm, in which the estimation of the stimulus in the previous step is used as prior knowledge for consecutive decoding. We analyze the decoding accuracy of such a Bayesian decoder (Maximum a Posteriori Estimate), and show that it can be implemented by a biologically plausible recurrent network, where the prior knowledge of the stimulus is conveyed by the change in recurrent interactions as a result of Hebbian learning.
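The decoding scheme can be sketched directly: with Gaussian tuning curves and Poisson spike counts, the MAP estimate maximizes the Poisson log-likelihood plus a Gaussian log-prior centered on the previous estimate. Tuning widths, gain, grid, and prior width below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
prefs = np.linspace(-5, 5, 41)    # preferred stimuli of the population
sigma_tc, gain = 1.0, 20.0        # tuning width and peak rate (assumed)

def rates(s):
    return gain * np.exp(-(s - prefs) ** 2 / (2 * sigma_tc ** 2))

s_true = 0.7
counts = rng.poisson(rates(s_true))           # observed spike counts

grid = np.linspace(-5, 5, 2001)
# Poisson log-likelihood over a stimulus grid (constant terms dropped)
loglik = np.array([np.sum(counts * np.log(rates(s) + 1e-12) - rates(s))
                   for s in grid])

s_prev, sigma_prior = 0.5, 0.5    # previous estimate serves as the prior
logpost = loglik - (grid - s_prev) ** 2 / (2 * sigma_prior ** 2)

s_ml = grid[np.argmax(loglik)]    # ML: likelihood only
s_map = grid[np.argmax(logpost)]  # MAP: likelihood + prior from last step
print(s_ml, s_map)
```

The MAP estimate is pulled from the ML estimate toward the previous estimate, which is the effect the recurrent-interaction change is shown to implement.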

NeurIPS Conference 1999 Conference Paper

Population Decoding Based on an Unfaithful Model

  • Si Wu
  • Hiroyuki Nakahara
  • Noboru Murata
  • Shun-ichi Amari

We study a population decoding paradigm in which the maximum likelihood inference is based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding, because the encoding process of the brain is not exactly known, or because a simplified decoding model is preferred for saving computational cost. We consider an unfaithful decoding model which neglects the pairwise correlation between neuronal activities, and prove that UMLI is asymptotically efficient when the neuronal correlation is uniform or of limited range. The performance of UMLI is compared with that of the maximum likelihood inference based on a faithful model and that of the center-of-mass decoding method. It turns out that UMLI has the advantages of remarkably decreasing the computational complexity while maintaining a high level of decoding accuracy. The effect of correlation on the decoding accuracy is also discussed.
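The setting can be illustrated with a Gaussian-noise variant (an assumption for simplicity; the paper's analysis is more general): population responses carry uniform pairwise correlation, while the unfaithful decoder ignores it, reducing ML to a simple template match. Comparing its error with a decoder that uses the true covariance shows how little accuracy the simplification costs in the uniform-correlation case. Tuning shape, correlation strength, and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 21
prefs = np.linspace(-5, 5, N)

def tuning(s):
    return np.exp(-(s - prefs) ** 2 / 2)

c, var = 0.3, 0.05
# uniform-correlation covariance (assumed noise model)
C = var * ((1 - c) * np.eye(N) + c * np.ones((N, N)))
L = np.linalg.cholesky(C)
Ci = np.linalg.inv(C)

grid = np.linspace(-2, 2, 401)
T = np.array([tuning(s) for s in grid])   # templates on a stimulus grid

def decode(r, cov_inv):
    # ML on the grid: minimise (r - f(s))^T cov_inv (r - f(s))
    d = r[None, :] - T
    return grid[np.argmin(np.einsum('ij,jk,ik->i', d, cov_inv, d))]

errs_u, errs_f = [], []
for _ in range(200):
    s = rng.uniform(-1, 1)
    r = tuning(s) + L @ rng.normal(size=N)
    errs_u.append(decode(r, np.eye(N)) - s)   # unfaithful: ignores correlation
    errs_f.append(decode(r, Ci) - s)          # faithful: true covariance
print(np.std(errs_u), np.std(errs_f))
```

With uniform correlations the two error levels come out close, consistent with the asymptotic-efficiency result stated above, while the unfaithful decoder avoids inverting (or even knowing) the correlation structure.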