Author name cluster

Xinxin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

1 author row

AAAI Conference 2026 Conference Paper

Anchor-Guided Discriminative Subspace Alignment and Clustering for Cross-Scene Hyperspectral Imagery

Yongshan Zhang
Zixuan Zhang
Xinxin Wang
Lefei Zhang
Zhihua Cai

Cross-scene hyperspectral image (HSI) recognition aims to assign a unique label to each pixel in the target scene by transferring knowledge from the source scene. Existing methods primarily rely on fully labeled source data and either partially labeled or unlabeled target data. No prior work has addressed the more challenging scenario of cross-scene recognition without label guidance in both scenes. To bridge this gap, we present the first study on cross-scene HSI clustering, proposing an anchor-guided discriminative subspace alignment and clustering (ADSAC) framework that follows a well-structured three-step learning paradigm to effectively mitigate distribution shifts. Specifically, we first develop an anchor-promoted graph learning (APGL) model to efficiently derive accurate clustering labels for the source scene by leveraging anchor-based structural information. Next, we propose a discriminative cross-scene subspace alignment (DCSA) model to improve feature discriminability and reduce distribution discrepancies. Finally, labels of the target scene are inferred after source clustering and cross-scene alignment. To solve the formulated models, we design tailored optimization algorithms to ensure high-quality learning. Extensive experiments demonstrate the superiority of the proposed framework over state-of-the-art methods.


JBHI Journal 2026 Journal Article

CalDiff: Calibrating Uncertainty and Accessing Reliability of Diffusion Models for Trustworthy Lesion Segmentation

Xinxin Wang
Mingrui Yang
Sercan Tosun
Kunio Nakamura
Shuo Li
Xiaojuan Li

Low reliability has consistently been a challenge in the application of deep learning models for high-risk decision-making scenarios. In medical image segmentation, multiple expert annotations can be consulted to reduce subjective bias and reach a consensus, thereby enhancing the segmentation accuracy and reliability. To develop a reliable lesion segmentation model, we propose CalDiff, a novel framework that can leverage the uncertainty from multiple annotations, capture real-world diagnostic variability and provide more informative predictions. To harness the superior generative ability of diffusion models, a dual step-wise and sequence-aware calibration mechanism is proposed on the basis of the sequential nature of diffusion models. We evaluate the calibrated model through a comprehensive quantitative and visual analysis, addressing the previously overlooked challenge of assessing uncertainty calibration and model reliability in scenarios with multiple annotations and multiple predictions. Experimental results on two lesion segmentation datasets demonstrate that CalDiff produces uncertainty maps that can reflect low confidence areas, further indicating the false predictions made by the model. By calibrating the uncertainty in the training phase, the uncertain areas produced by our model are closely correlated with areas where the model has made errors during inference. In summary, the uncertainty captured by CalDiff can serve as a powerful indicator, which can help mitigate the risks of adopting the model's outputs, allowing clinicians to prioritize reviewing areas or slices with higher uncertainty and enhancing the model's reliability and trustworthiness in clinical practice.


AAAI Conference 2026 Conference Paper

Cross-view Anchor Graph Learning and Factorization for Incomplete Multi-view Clustering

Xinxin Wang
Yongshan Zhang
Xiaochen Yuan
Yicong Zhou

Graph-based incomplete multi-view clustering algorithms have gathered much attention due to their impressive clustering performance. However, existing methods primarily leverage intra-view correlation from observed views, while ignoring the exploration of explicit compensation relationships between different views. Moreover, these methods need post-processing to get labels, and the separate steps lack negotiation, which may lead to sub-optimal solutions. To address these issues, we propose a Cross-view Anchor Graph Learning and Factorization (AGLF) method. AGLF develops an Anchor Graph Completion (AGC) framework that explicitly learns the missing subgraph structures. Instead of requiring post-processing, AGC directly produces soft labels. By establishing a third-order tensor of soft labels, it employs the tensor Schatten p-norm to enhance anchor graph learning and factorization. To significantly improve the quality of subgraph learning, AGLF incorporates compensation subgraphs from supplementary views into the AGC framework, enabling the construction of a better anchor graph for label learning. An optimization algorithm is devised to solve the objective function. Experimental results across various datasets demonstrate the effectiveness of our method.


AAAI Conference 2026 Conference Paper

Efficient Tensorized Multi-View Anchor Graph Clustering with Affinity Propagation for Remote Sensing Data

Yongshan Zhang
Kangyue Zheng
Shuaikang Yan
Xinxin Wang
Zhihua Cai

Multi-view clustering of remote sensing data presents significant challenges, as it integrates diverse data representations to improve Earth observation. Although existing anchor graph-based methods have yielded promising results, they generally exhibit two key limitations: (1) the time-consuming process of directly exploring pixel clustering structures, and (2) insufficient modeling of high-order correlations among different views. To address these issues, we propose an Efficient Tensorized multi-view anchor graph clustering method with Affinity Propagation (ETAP) for remote sensing data. Based on superpixel preprocessing, anchor graphs are learned from view-specific pixels and anchors, while compressed anchor graphs are simultaneously learned from the view-specific anchors. An adaptive weighting scheme is introduced to facilitate the learning of these anchor graphs. To capture high-order correlations, tensor Schatten p-norm regularization is applied to the compressed anchor graphs. A connectivity constraint is introduced to uncover the clustering structures of anchors. Finally, pixel clustering structures are efficiently revealed from the pseudo-labeled anchors through affinity propagation without requiring additional clustering steps. To solve the proposed formulation, we develop an alternating optimization algorithm. Extensive experiments on three public datasets demonstrate the efficacy and efficiency of the proposed method over state-of-the-art methods.


AAAI Conference 2026 Conference Paper

PASA: Progressive-Adaptive Spectral Augmentation for Automated Auscultation in Data-Scarce Environments

Ying Wang
Guoheng Huang
Xueyuan Gong
Xinxin Wang
Xiaochen Yuan

Automated auscultation advances the detection of respiratory diseases, especially in areas with limited resources where traditional diagnostic methods are unavailable. On the other hand, the scarcity of auscultation datasets limits the automation performance, prompting the need for data augmentation methods. However, most of the existing methods neglect the differences in acoustic sounds that require personalized augmentation strategies. To address this, we propose Progressive-Adaptive Spectral Augmentation (PASA), which is one of the first paradigms to adaptively select the best augmentation strategy for each sample. PASA innovatively treats the augmentation selection problem as a Markov Decision Process (MDP), creating an alternating loop between the diagnostic model and the augmentation selection. The agent selects the optimal augmentation operations and magnitudes via a task-specific design, including state construction, action sampling, Hybrid Batch-Sample (HBS) strategy execution, and reward guidance. The HBS strategy initially applies uniform augmentation across mini-batches while collecting sample-specific performance statistics. When model performance stabilizes, it transitions to sample-level augmentation based on accumulated difficulty assessments. This two-phase design balances computational complexity with personalization. Extensive experiments across three benchmark datasets demonstrate that PASA outperforms the state-of-the-art methods, pioneering a transformative paradigm for adaptive data augmentation in automated auscultation.


AAAI Conference 2025 Conference Paper

Highly Efficient Rotation-Invariant Spectral Embedding for Scalable Incomplete Multi-View Clustering

Xinxin Wang
Yongshan Zhang
Yicong Zhou

Incomplete multi-view clustering presents significant challenges due to missing views. Although many existing graph-based methods aim to recover missing instances or complete similarity matrices with promising results, they still face several limitations: (1) Recovered data may be unsuitable for spectral clustering, as these methods often ignore guidance from spectral analysis; (2) Complex optimization processes impose a high computational burden, hindering scalability to large-scale problems; (3) Most methods do not address the rotational mismatch problem in spectral embeddings. To address these issues, we propose a highly efficient rotation-invariant spectral embedding (RISE) method for scalable incomplete multi-view clustering. RISE learns view-specific embeddings from incomplete bipartite graphs to capture the complementary information. Meanwhile, a complete consensus representation with a second-order rotation-invariant property is recovered from these incomplete embeddings in a unified model. Moreover, we design a fast alternating optimization algorithm with linear complexity and promising convergence to solve the proposed formulation. Extensive experiments on multiple datasets demonstrate the effectiveness, scalability, and efficiency of RISE compared to the state-of-the-art methods.


IJCAI Conference 2025 Conference Paper

Learn Multi-task Anchor: Joint View Imputation and Label Generation for Incomplete Multi-view Clustering

Xinxin Wang
Yongshan Zhang
Yicong Zhou

Anchor-based incomplete multi-view clustering methods utilize anchors to uncover clustering structures. However, relying on anchor graphs for producing final indicators is indirect, which can lead to information loss and suboptimal outcomes. Besides, most methods neglect the potential of anchors for imputing missing views. To address these limitations, we propose a Joint View Imputation and Label Generation (JVILG) method. JVILG comprises the Anchor-based tensorized Label Generation (ALG) module for generating clustering labels and the Anchor-based sparse regularized Subspace Correlation (ASC) module for recovering missing views. The ALG module explicitly connects data observations, the fine-grained anchor matrix, and soft label matrices within a reconstruction framework through a membership matrix, while imposing tensor Schatten p-norm regularization on the constructed label tensor to capture spatial correlations among views. Meanwhile, the ASC module directly uses fine-grained anchors to impute missing data in respective views. By integrating the ALG and ASC modules, JVILG enhances synergy between different tasks and mitigates the impact of missing information on clustering. Experimental results on six datasets demonstrate the effectiveness of JVILG compared to both shallow and deep state-of-the-art methods. The code is available at https://github.com/W-Xinxin/JVILG.


IJCAI Conference 2025 Conference Paper

Spatial-Spectral Similarity-Guided Fusion Network for Pansharpening

Jiazhuang Xiong
Yongshan Zhang
Xinxin Wang
Lefei Zhang

Pansharpening fuses lower-resolution multispectral (LRMS) images with high-resolution panchromatic (PAN) images to generate high-resolution multispectral (HRMS) images that preserve both spatial and spectral information. Most deep pansharpening methods face challenges in cross-modal feature extraction and fusion, as well as in exploring the similarities between the fused image and both PAN and LRMS images. In this paper, we propose a spatial-spectral similarity-guided fusion network (S3FNet) for pansharpening. This architecture is composed of three parts. Specifically, a shallow feature extraction layer learns initial spatial, spectral and fused features from PAN and LRMS images. Then, a multi-branch asymmetric encoder, consisting of spatial, spectral and fusion branches, generates corresponding high-level features at different scales. A multi-scale reconstruction decoder, equipped with a well-designed cross-feature multi-head attention fusion block, processes the intermediate feature maps to generate HRMS images. To ensure HRMS images retain maximum spatial and spectral information, a similarity-constrained loss is defined for network training. Extensive experiments demonstrate the effectiveness of our S3FNet over state-of-the-art methods. The code is released at https://github.com/ZhangYongshan/S3FNet.


IJCAI Conference 2018 Conference Paper

Deeply-Supervised CNN Model for Action Recognition with Trainable Feature Aggregation

Yang Li
Kan Li
Xinxin Wang

In this paper, we propose a deeply-supervised CNN model for action recognition that fully exploits powerful hierarchical features of CNNs. In this model, we build multi-level video representations by applying our proposed aggregation module at different convolutional layers. Moreover, we train this model in a deep supervision manner, which brings improvement in both performance and efficiency. Meanwhile, in order to capture the temporal structure as well as preserve more details about actions, we propose a trainable aggregation module. It models the temporal evolution of each spatial location and projects them into a semantic space using the Vector of Locally Aggregated Descriptors (VLAD) technique. This deeply-supervised CNN model integrating the powerful aggregation module provides a promising solution to recognize actions in videos. We conduct experiments on two action recognition datasets: HMDB51 and UCF101. Results show that our model outperforms the state-of-the-art methods.
