Author name cluster

Wen Xue

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

ECAI Conference 2025 Conference Paper

Binary Continual Stream-View Clustering

  • Wen Xue
  • Xingbo Liu
  • Kang Xiao
  • Xuening Zhang
  • Xiushan Nie

Multi-view clustering, a long-standing topic in unsupervised learning, is valued for uncovering the latent common semantics in multi-view data. However, when dealing with incremental streaming views, existing approaches typically require reconstructing the view data and aggregating streaming representations, leading to misalignment between representations and clusters. More importantly, running the clustering process frequently is time-consuming. To address these issues, we propose a novel method called Binary Continual Stream-View Clustering (BCSVC). Specifically, we design a continual clustering method that unifies streaming representation learning and cluster assignment within a single framework. We also introduce a variance-weighted center updating mechanism to smooth the frequent clustering operation and absorb the semantics of previous views. In addition, to reduce the time and space spent on computation and storage, we introduce binary codes for the clustering representations, which also significantly improves the computational efficiency of continuous updates in streaming scenarios. Comprehensive theoretical analysis and extensive experimental results demonstrate its superior performance under various scenarios.
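The efficiency argument for binary codes can be illustrated with a small sketch. This is not the BCSVC algorithm: the sign-based `binarize` function, the toy features, and the fixed `centers` are assumptions made purely for illustration. It only shows why binary representations are cheap to store and compare: each code fits in an integer, and the distance to a center is an XOR followed by a bit count.

```python
def binarize(vector):
    """Pack the signs of a real-valued vector into an int (bit i is set when vector[i] > 0)."""
    code = 0
    for i, value in enumerate(vector):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def assign(codes, centers):
    """Index of the nearest binary center for each code (ties go to the lowest index)."""
    return [
        min(range(len(centers)), key=lambda k: hamming(code, centers[k]))
        for code in codes
    ]


# Hypothetical 4-dimensional representations of three samples.
features = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.6, 0.2, -0.1],
]
codes = [binarize(f) for f in features]

# Hypothetical binary cluster centers, chosen by hand for the example.
centers = [0b0101, 0b1010]
labels = assign(codes, centers)
```

How the centers are learned and updated across views (the variance-weighted mechanism mentioned in the abstract) is not specified on this page and is deliberately left out of the sketch.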

AAAI Conference 2025 Conference Paper

Discrete Prior-Based Temporal-Coherent Content Prediction for Blind Face Video Restoration

  • Lianxin Xie
  • Bingbing Zheng
  • Wen Xue
  • Yunfei Zhang
  • Le Jiang
  • Ruotao Xu
  • Si Wu
  • Hau-San Wong

Blind face video restoration aims to restore high-fidelity details from videos subjected to complex and unknown degradations. This task poses the significant challenge of managing temporal heterogeneity while maintaining stable face attributes. In this paper, we introduce a Discrete Prior-based Temporal-Coherent content prediction transformer to address this challenge, and our model is referred to as DP-TempCoh. Specifically, we incorporate a spatial-temporal-aware content prediction module to synthesize high-quality content from discrete visual priors, conditioned on degraded video tokens. To further enhance the temporal coherence of the predicted content, a motion statistics modulation module is designed to adjust the content based on discrete motion priors in terms of cross-frame mean and variance. As a result, the statistics of the predicted content match those of real videos over time. Through extensive experiments, we verify the effectiveness of the design elements and demonstrate the superior performance of DP-TempCoh in both synthetically and naturally degraded video restoration.
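The idea of adjusting content so that its cross-frame mean and variance match target statistics can be sketched as a simple normalize-then-rescale step. This is a generic statistic-matching sketch, not the DP-TempCoh module: the function name `modulate`, the single scalar feature per frame, and the hard-coded target values are assumptions. In the paper, the targets are derived from discrete motion priors, which this page does not describe in detail.

```python
from statistics import fmean, pstdev


def modulate(frames, target_mean, target_std, eps=1e-8):
    """Shift and scale one feature channel so that its mean and standard
    deviation across frames match the given targets."""
    mean = fmean(frames)
    std = pstdev(frames)
    return [(x - mean) / (std + eps) * target_std + target_mean for x in frames]


# Hypothetical values of one feature channel over four consecutive frames.
predicted = [0.2, 0.9, 0.1, 0.8]

# Hypothetical target statistics standing in for those taken from motion priors.
adjusted = modulate(predicted, target_mean=0.5, target_std=0.1)
```

After the call, the cross-frame mean and standard deviation of `adjusted` are approximately 0.5 and 0.1, while the relative ordering of the frames is preserved.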

AAAI Conference 2025 Conference Paper

RetouchGPT: LLM-based Interactive High-Fidelity Face Retouching via Imperfection Prompting

  • Wen Xue
  • Chun Ding
  • Ruotao Xu
  • Si Wu
  • Yong Xu
  • Hau-San Wong

Face retouching aims to remove facial imperfections from images and videos while preserving face attributes. Existing methods are designed to perform non-interactive end-to-end retouching, whereas the ability to interact with users is in high demand in downstream applications. In this paper, we propose RetouchGPT, a novel framework that leverages Large Language Models (LLMs) to guide the interactive retouching process. To this end, we design an instruction-driven imperfection prediction module to accurately identify imperfections by integrating textual and visual features. To learn imperfection prompts, we further incorporate an LLM-based embedding module to fuse multi-modal conditioning information. The prompt-based feature modification is performed in each transformer block, such that the imperfection features are progressively suppressed and replaced with the features of normal skin. Extensive experiments verify the effectiveness of our design elements and demonstrate that RetouchGPT is a useful tool for interactive face retouching and achieves superior performance over state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Semi-Supervised Online Cross-Modal Hashing

  • Xiao Kang
  • Xingbo Liu
  • Xuening Zhang
  • Wen Xue
  • Xiushan Nie
  • Yilong Yin

Online cross-modal hashing has gained increasing interest due to its ability to encode streaming data and update hash functions simultaneously. Existing online methods often assume either fully supervised or completely unsupervised settings. However, they overlook the prevalent and challenging scenario of semi-supervised cross-modal streaming data, where diverse data types, including labeled/unlabeled, paired/unpaired, and multi-modal, are intertwined. To address this issue, we propose Semi-Supervised Online Cross-modal Hashing (SSOCH). It presents an alignment-free pseudo-labeling strategy that extracts semantic information from unlabeled streaming data without relying on pairing relations. Furthermore, we design an online tri-consistent preserving scheme, integrating pseudo-labeled data regularization, discriminative label embedding, and fine-grained similarity preservation. This scheme fully explores consistency across data annotation, modalities, and streaming chunks, improving the model's adaptiveness in these challenging scenarios. Extensive experiments on benchmark datasets demonstrate the superiority of SSOCH under various scenarios, highlighting the importance of semi-supervised learning for online cross-modal hashing.

IJCAI Conference 2024 Conference Paper

SCTrans: Multi-scale scRNA-seq Sub-vector Completion Transformer for Gene-selective Cell Type Annotation

  • Lu Lin
  • Wen Xue
  • Xindian Wei
  • Wenjun Shen
  • Cheng Liu
  • Si Wu
  • Hau-San Wong

Cell type annotation is pivotal to biological and medical analysis based on single-cell RNA sequencing (scRNA-seq) data, e.g., identifying biomarkers, exploring cellular heterogeneity, and understanding disease mechanisms. Previous annotation methods typically learn a nonlinear mapping to infer cell type from gene expression vectors, and thus fall short in discovering salient genes and associating them with specific cell types. To address this issue, we propose a multi-scale scRNA-seq Sub-vector Completion Transformer, and our model is referred to as SCTrans. Considering that gene sub-vectors are more expressive than individual genes, we perform multi-scale partitioning on gene vectors followed by masked sub-vector completion, conditioned on the unmasked ones. To this end, the multi-scale sub-vectors are tokenized, and the intrinsic contextual relationships are modeled via self-attention computation and conditional contrastive regularization imposed on an encoding transformer. By performing mutual learning between the encoder and an additional lightweight counterpart, the salient tokens can be distinguished from the others. As a result, we can perform gene-selective cell type annotation, which contributes to our superior performance over state-of-the-art annotation methods.
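The data-preparation step described above, multi-scale partitioning followed by masking some sub-vectors, can be sketched as follows. This is only an illustration of the input side: the helper names `partition` and `mask_subvectors`, the sub-vector sizes, the masking ratio, and the toy expression vector are all assumptions, and the transformer that completes the masked sub-vectors is not shown.

```python
import random


def partition(vector, size):
    """Split a vector into consecutive sub-vectors of length `size`
    (the last one may be shorter)."""
    return [vector[i:i + size] for i in range(0, len(vector), size)]


def mask_subvectors(subvectors, ratio, rng):
    """Hide a random subset of sub-vectors by replacing them with None.
    Returns the masked list and the sorted indices that were hidden."""
    n_masked = max(1, int(len(subvectors) * ratio))
    hidden = sorted(rng.sample(range(len(subvectors)), n_masked))
    masked = [None if i in hidden else sv for i, sv in enumerate(subvectors)]
    return masked, hidden


# Hypothetical expression vector of one cell over eight genes.
expression = [0.0, 1.2, 0.0, 3.4, 0.5, 0.0, 2.1, 0.0]

# Partition the same vector at two scales (sub-vector lengths 2 and 4).
scales = {size: partition(expression, size) for size in (2, 4)}

# Mask half of the sub-vectors at the finer scale; a model would be trained
# to reconstruct the hidden ones from the visible ones.
rng = random.Random(0)
masked, hidden = mask_subvectors(scales[2], ratio=0.5, rng=rng)
```

Each entry of `scales` would correspond to one token sequence, and the positions listed in `hidden` are the completion targets.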