Arrow Research

Author name cluster

Jiahui Qu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers

8

AAAI Conference 2026 Conference Paper

T-APT: Text-Guided Modality-Aware Prompt Tuning for Arbitrary Multimodal Remote Sensing Data Joint Classification

  • Qinghao Gao
  • Jiahui Qu
  • Wenqian Dong

Multimodal remote sensing image joint classification has achieved significant progress. However, existing methods primarily focus on designing modality-specific networks and lack the adaptive generalization needed for the diverse and dynamic modality combinations encountered in real-world scenarios. Inspired by the generalization capabilities of visual foundation models in downstream tasks, we propose a unified Text-guided Arbitrary Modality Prompting (T-APT) framework, which leverages complementary fused features to drive the foundation model and employs text-guided modality-specific prior knowledge as cross-modal prompts to fine-tune a pretrained Vision Transformer (ViT). Specifically, a Mamba-Based Arbitrary Modal-Focused Feature Capture (MAMF-FC) module is designed to extract complementary joint features and modality-specific prior knowledge from arbitrary modalities through a shared-specific scanning encoder-decoder architecture. Subsequently, a Text-Guided Modality-Aware Prompt Tuning (TMPT) module is proposed to adapt the fused features to the foundation model, enabling classification of remote sensing images from arbitrary modality combinations. Extensive experiments on public datasets spanning multispectral (MS), hyperspectral (HS), light detection and ranging (LiDAR), and synthetic aperture radar (SAR) modalities demonstrate that T-APT achieves classification performance comparable to specialized networks across arbitrary modality combinations.
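As a rough illustration of the prompt-tuning mechanism the abstract describes, the sketch below prepends text-derived prompt tokens to fused visual tokens before a frozen transformer backbone. All names here (TextGuidedPromptTuner, the stand-in encoder, the dimensions) are hypothetical assumptions, not the released T-APT code:

```python
# Minimal sketch, assuming a generic transformer stands in for the pretrained ViT.
import torch
import torch.nn as nn

class TextGuidedPromptTuner(nn.Module):
    """Prepends text-derived prompt tokens to fused visual tokens
    before a frozen transformer encoder; only the projection trains."""
    def __init__(self, encoder: nn.Module, text_dim=512, embed_dim=768):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():   # freeze the backbone
            p.requires_grad = False
        self.prompt_proj = nn.Linear(text_dim, embed_dim)  # trainable prompts

    def forward(self, fused_tokens, text_prior):
        # fused_tokens: (B, N, D) complementary features from arbitrary modalities
        # text_prior:   (B, P, text_dim) text-guided modality-specific priors
        prompts = self.prompt_proj(text_prior)           # (B, P, D)
        x = torch.cat([prompts, fused_tokens], dim=1)    # prepend as cross-modal prompts
        return self.encoder(x)

# usage with a stand-in encoder (hypothetical sizes)
enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(768, 8, batch_first=True), 2)
model = TextGuidedPromptTuner(enc)
out = model(torch.randn(2, 16, 768), torch.randn(2, 8, 512))
```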

IJCAI Conference 2025 Conference Paper

Bi-DiffCD: Bidirectional Diffusion Guided Collaborative Change Detection for Arbitrary-Modal Remote Sensing Images

  • Jingyu Zhao
  • Jiahui Qu
  • Wenqian Dong

Change detection aims to identify land cover changes by analyzing multitemporal images that cover the same area. However, it may be difficult to obtain high-quality multitemporal images of the same modality in real dynamic scenarios. The rapid development of remote sensing technology enables collaborative observation with multimodal images, but uni-modal, image-specific methods struggle to overcome the modal discrepancy and exploit the complementary advantages of different modalities. To this end, we propose a bidirectional diffusion guided collaborative change detection model (Bi-DiffCD) for arbitrary-modal images, which eliminates the modal discrepancy between arbitrary-modal images through bidirectional diffusion and makes full use of multilevel complementary features to improve detection accuracy. Specifically, a conditional diffusion-based bidirectional modal alignment module (CDBMA) is designed to step-wise align the modal attributes bidirectionally while preserving the multimodal complementary features. Furthermore, a multilevel complementary feature collaborative change detection module (MLCCD) is proposed to collaborate the multilevel enhanced complementary change information from the transformed images and latent features for change detection. Experiments have been conducted on three widely used multimodal datasets and one self-made dataset to demonstrate the effectiveness of the proposed method under different combinations of modalities. Code is available at https://github.com/Jiahuiqu/Bi-DiffCD.
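The step-wise conditional alignment idea behind CDBMA can be pictured with a single reverse-diffusion step that denoises one modality while conditioning on the image from the other modality. This is a generic DDIM-style step under an assumed noise schedule and placeholder denoiser, not the authors' implementation:

```python
# Illustrative sketch, assuming a channel-concat conditioning scheme.
import torch

def reverse_step(denoiser, x_t, cond, t, alphas_cumprod):
    """One deterministic reverse step: predict noise on x_t conditioned
    on the other-modality image, then estimate x_{t-1}."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    # condition on the other modality by channel concatenation (assumption)
    eps = denoiser(torch.cat([x_t, cond], dim=1), t)
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean image
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
```

Running this step from t = T down to 0 in each direction (A conditioned on B, and B conditioned on A) is what "bidirectional" alignment would look like under these assumptions.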

IJCAI Conference 2025 Conference Paper

Do You Steal My Model? Signature Diffusion Embedded Dual-Verification Watermarking for Protecting Intellectual Property of Hyperspectral Image Classification Models

  • Yufei Yang
  • Song Xiao
  • Lixiang Li
  • Wenqian Dong
  • Jiahui Qu

Due to the high cost of data collection and training, well-performing hyperspectral image (HSI) classification models are of great value and are vulnerable to piracy during transmission and use. Model watermarking is a promising technology for intellectual property (IP) protection of such models. However, existing model watermarking methods designed for RGB image classification models ignore the complexity of ground objects and the high dimensionality of HSIs, which makes their trigger samples easy to detect and forge. To address this problem, we propose a signature diffusion embedded dual-verification watermarking method, which generates imperceptible trigger samples carrying explicit owner information to achieve dual verification of both model ownership and the legality of the trigger set. Specifically, a subpixel-space owner signature diffusion method is proposed to generate the imperceptible trigger set by embedding the owner signature into the abundance matrix of seed samples via a diffusion model in subpixel space, thus balancing the perceptual quality of trigger samples against signature extraction capability. To resist ownership confusion, dual-stamp ownership verification is proposed to query the suspicious model with trigger samples for ownership verification, and to further extract the signature from the trigger samples to guarantee their legality. Extensive experiments demonstrate that the proposed method can effectively protect the IP of HSI classification models.
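The dual-verification protocol reduces to two checks on a suspicious model: trigger-label agreement (ownership) and signature recovery (legality of the trigger set). The helper below is a hedged sketch of that logic; extract_signature and the thresholds are hypothetical stand-ins for the paper's signature decoder and decision rule:

```python
# Hypothetical sketch of dual verification; not the paper's exact method.
import torch

@torch.no_grad()
def verify_ownership(model, triggers, target_labels, extract_signature,
                     owner_sig, acc_thresh=0.9, bit_thresh=0.95):
    # Stamp 1: does the suspicious model reproduce the trigger labels?
    preds = model(triggers).argmax(dim=1)
    trigger_acc = (preds == target_labels).float().mean().item()
    # Stamp 2: do the trigger samples carry the owner's signature bits?
    sig = extract_signature(triggers)
    bit_acc = (sig == owner_sig).float().mean().item()
    return trigger_acc >= acc_thresh and bit_acc >= bit_thresh
```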

IJCAI Conference 2025 Conference Paper

DPMamba: Distillation Prompt Mamba for Multimodal Remote Sensing Image Classification with Missing Modalities

  • Yueguang Yang
  • Jiahui Qu
  • Ling Huang
  • Wenqian Dong

Multimodal remote sensing image classification (RSIC) has emerged as a key focus in Earth observation, driven by its capacity to extract complementary information from diverse sources. Existing methods struggle when modalities are absent due to weather or equipment failures, leading to performance degradation. As a solution, knowledge distillation-based methods train student networks (SNs) using a full-modality teacher, but they usually require a separate SN for each missing-modality scenario, increasing complexity. To this end, we propose a unified Distillation Prompt Mamba (DPMamba) framework for multimodal RSIC with missing modalities. DPMamba leverages knowledge distillation in a shared text semantic space to optimize learnable prompts, transforming them from "placeholder" to "adaptation" states by enriching missing modality information with full-modality knowledge. To achieve this, we focus on two main aspects: first, we propose a new modality-aware Mamba that dynamically and hierarchically extracts cross-modality interactive features, providing richer, contextually relevant representations for backpropagation-based optimization of the prompts; and second, we introduce a novel text-bridging distillation method to efficiently transfer full-modality knowledge, guiding the inclusion of missing modality information into the prompts. Extensive evaluations demonstrate the effectiveness and robustness of the proposed DPMamba.
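One plausible reading of distillation "in a shared text semantic space" is a KL loss between teacher and student class distributions computed against shared text embeddings. The sketch below assumes hypothetical projection heads proj_s and proj_t and is not the released DPMamba code:

```python
# Sketch of text-bridging distillation under assumed projection heads.
import torch
import torch.nn.functional as F

def text_bridge_distill_loss(student_feat, teacher_feat, text_emb,
                             proj_s, proj_t, tau=0.07):
    s = F.normalize(proj_s(student_feat), dim=-1)   # (B, D) student in text space
    t = F.normalize(proj_t(teacher_feat), dim=-1)   # (B, D) full-modality teacher
    z = F.normalize(text_emb, dim=-1)               # (C, D) class text embeddings
    # soft class distributions over the text anchors; teacher guides student
    log_p_s = F.log_softmax(s @ z.t() / tau, dim=-1)
    p_t = F.softmax(t @ z.t() / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")
```

Backpropagating this loss into the learnable prompts is what would move them from the "placeholder" to the "adaptation" state under these assumptions.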

IJCAI Conference 2024 Conference Paper

Fusion from a Distributional Perspective: A Unified Symbiotic Diffusion Framework for Any Multisource Remote Sensing Data Classification

  • Teng Yang
  • Song Xiao
  • Wenqian Dong
  • Jiahui Qu
  • Yueguang Yang

The joint classification of multisource remote sensing data is a prominent research field. However, most existing works are tailored to two specific data sources and fail to address the diverse combinations of data sources found in practical applications; the importance of designing a single, broadly applicable unified network has been largely overlooked. In this paper, we propose a unified and self-supervised Symbiotic Diffusion framework (named SymDiffuser), which achieves the joint classification of any pair of remote sensing data sources within a single model. SymDiffuser captures the inter-modal relationship by establishing reciprocal conditional distributions across the sources step by step, so that the fusion of multisource data is represented consistently within the framework from a data-distribution perspective. The features under the current conditional distribution at each time step are then integrated during the downstream phase to accomplish the classification task. This joint classification methodology transcends source-specific considerations, rendering it applicable to remote sensing data from any diverse sources. Experimental results showcase the framework's potential to achieve state-of-the-art performance on the multimodal fusion classification task.
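The "reciprocal conditional distributions" can be illustrated with a standard diffusion training objective in which each source's denoiser is conditioned on the other source. The following sketch assumes generic denoisers eps_a and eps_b and a precomputed noise schedule; it is an interpretation, not the SymDiffuser code:

```python
# Minimal sketch of a symbiotic (reciprocal) diffusion objective.
import torch
import torch.nn.functional as F

def symbiotic_loss(eps_a, eps_b, x_a, x_b, t, alphas_cumprod):
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)
    n_a, n_b = torch.randn_like(x_a), torch.randn_like(x_b)
    xa_t = a_t.sqrt() * x_a + (1 - a_t).sqrt() * n_a   # noised source A
    xb_t = a_t.sqrt() * x_b + (1 - a_t).sqrt() * n_b   # noised source B
    # reciprocal conditioning: denoise A given B, and B given A
    loss_a = F.mse_loss(eps_a(xa_t, x_b, t), n_a)
    loss_b = F.mse_loss(eps_b(xb_t, x_a, t), n_b)
    return loss_a + loss_b
```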

AAAI Conference 2024 Conference Paper

LDS2AE: Local Diffusion Shared-Specific Autoencoder for Multimodal Remote Sensing Image Classification with Arbitrary Missing Modalities

  • Jiahui Qu
  • Yuanbo Yang
  • Wenqian Dong
  • Yufei Yang

Recent research on the joint classification of multimodal remote sensing data has achieved great success. However, due to limitations imposed by imaging conditions, missing modalities often occur in practice. Most previous work treats classification under each missing-modality pattern as an independent task, training a specific classification model for each fixed pattern by extracting a multimodal joint representation; this cannot handle arbitrary (including multiple and random) missing modalities. In this work, we propose a local diffusion shared-specific autoencoder (LDS2AE), which handles classification with arbitrary missing modalities using a single model. LDS2AE captures the data distribution of the different modalities to learn multimodal shared features for classification through a novel local diffusion autoencoder consisting of a modality-shared encoder and several modality-specific decoders. The modality-shared encoder extracts multimodal shared features by using the same parameters to map multimodal data into a shared subspace. The modality-specific decoders use the shared features to reconstruct the image of each modality, which helps the shared features capture the unique information of the different modalities. In addition, we incorporate masked training into the diffusion autoencoder to achieve local diffusion, which significantly reduces the training cost of the model. The approach is tested on widely used multimodal remote sensing datasets, demonstrating the effectiveness of LDS2AE in classification with arbitrary missing modalities. The code is available at https://github.com/Jiahuiqu/LDS2AE.
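The shared-encoder / specific-decoder layout with local masking might look roughly like the sketch below. For simplicity it assumes all modalities share a channel count and uses single conv layers as placeholders, which the real model would not require:

```python
# Sketch of the shared-specific autoencoder layout with masked inputs.
import torch
import torch.nn as nn

class SharedSpecificAE(nn.Module):
    def __init__(self, in_ch, n_modalities, hidden=64):
        super().__init__()
        # one encoder shared across modalities -> common subspace
        self.encoder = nn.Conv2d(in_ch, hidden, 3, padding=1)
        # one decoder per modality -> modality-specific reconstruction
        self.decoders = nn.ModuleList(
            nn.Conv2d(hidden, in_ch, 3, padding=1) for _ in range(n_modalities)
        )

    def forward(self, xs, mask_ratio=0.5):
        losses, feats = [], []
        for m, x in enumerate(xs):
            # local masking: randomly drop spatial positions before encoding
            keep = (torch.rand_like(x[:, :1]) > mask_ratio).float()
            z = self.encoder(x * keep)        # shared feature for modality m
            recon = self.decoders[m](z)       # reconstruct modality m
            losses.append(((recon - x) ** 2).mean())
            feats.append(z)
        return feats, sum(losses)
```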

AAAI Conference 2024 Conference Paper

Learning Multi-Modal Cross-Scale Deformable Transformer Network for Unregistered Hyperspectral Image Super-resolution

  • Wenqian Dong
  • Yang Xu
  • Jiahui Qu
  • Shaoxiong Hou

Hyperspectral image super-resolution (HSI-SR) is a technology for improving the spatial resolution of HSIs. Existing fusion-based SR methods show great performance but still have the following problems: 1) they assume that the auxiliary image providing spatial information is strictly registered with the HSI, yet fine registration is difficult to achieve due to differences in shooting platforms and viewpoints and the influence of atmospheric turbulence; 2) most methods are based on convolutional neural networks (CNNs), which are effective for local features but cannot exploit global features. To this end, we propose a multi-modal cross-scale deformable transformer network (M2DTN) to achieve unregistered HSI-SR. Specifically, we formulate a spectrum-preserving, spatially guided unified registration-SR model (SSRU) from the view of realistic degradation scenarios. Based on SSRU, we propose a multi-modal registration deformable module (MMRD) that aligns features between modalities via a deformation field. To efficiently exploit the unique information of the different modalities, we design a multi-scale feature transformer (MSFT) that emphasizes spatial-spectral features at different scales. In addition, we propose a cross-scale feature aggregation module (CSFA) to accurately reconstruct the HSI by aggregating feature information across scales. Experiments show that M2DTN outperforms state-of-the-art HSI-SR methods. Code is available at https://github.com/Jiahuiqu/M2DTN.
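Feature alignment by a deformation field, in the spirit of MMRD, can be sketched as predicting per-pixel offsets from both feature maps and warping the auxiliary features with grid_sample. This is an illustrative stand-in, not the released M2DTN module:

```python
# Sketch of deformation-field alignment between unregistered features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformAlign(nn.Module):
    def __init__(self, ch):
        super().__init__()
        # predict a 2-channel (dx, dy) offset field from both modalities
        self.offset = nn.Conv2d(2 * ch, 2, 3, padding=1)

    def forward(self, f_hsi, f_aux):
        B, _, H, W = f_aux.shape
        flow = self.offset(torch.cat([f_hsi, f_aux], dim=1))   # (B, 2, H, W)
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, H),
                                torch.linspace(-1, 1, W), indexing="ij")
        base = torch.stack([xs, ys], dim=-1).to(f_aux)          # identity grid
        grid = base.unsqueeze(0) + flow.permute(0, 2, 3, 1)     # deformed grid
        # warp the auxiliary (spatial-detail) features onto the HSI geometry
        return F.grid_sample(f_aux, grid, align_corners=True)
```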

AAAI Conference 2024 Conference Paper

S2CycleDiff: Spatial-Spectral-Bilateral Cycle-Diffusion Framework for Hyperspectral Image Super-resolution

  • Jiahui Qu
  • Jie He
  • Wenqian Dong
  • Jingyu Zhao

Hyperspectral image super-resolution (HISR) is a technique that can break through the limitations of the imaging mechanism to obtain a hyperspectral image (HSI) with high spatial resolution. Although existing methods have made some progress, most of them directly learn the spatial-spectral joint mapping between the observed images and the target high-resolution HSI (HrHSI), failing to fully preserve the spectral distribution of the low-resolution HSI (LrHSI) and the spatial distribution of the high-resolution multispectral imagery (HrMSI). To this end, we propose a spatial-spectral-bilateral cycle-diffusion framework (S2CycleDiff) for HISR, which can step-wise generate the HrHSI with high spatial-spectral fidelity by bilaterally learning the conditional distributions of the spatial and spectral super-resolution processes. Specifically, a customized conditional cycle-diffusion framework is designed as the backbone to achieve spatial-spectral-bilateral super-resolution through repeated refinement, wherein a spatial/spectral guided pyramid denoising (SGPD) module separately takes the HrMSI and LrHSI as guiding factors to achieve spatial detail injection and spectral correction. The outputs of the conditional cycle-diffusion framework are fed into a complementary fusion block that integrates the spatial and spectral details to generate the desired HrHSI. Experiments on three widely used datasets demonstrate the superiority of the proposed method over state-of-the-art HISR methods. The code is available at https://github.com/Jiahuiqu/S2CycleDiff.
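The bilateral structure can be pictured as two conditional denoisers, one guided by the HrMSI (spatial branch) and one by the upsampled LrHSI (spectral branch), followed by a fusion block. All module names below are illustrative placeholders under that reading, not the released S2CycleDiff code:

```python
# Sketch of the bilateral-guidance-plus-fusion layout.
import torch
import torch.nn as nn

class BilateralFusion(nn.Module):
    def __init__(self, spatial_denoiser, spectral_denoiser, ch):
        super().__init__()
        self.spatial = spatial_denoiser      # conditioned on HrMSI
        self.spectral = spectral_denoiser    # conditioned on upsampled LrHSI
        self.fuse = nn.Conv2d(2 * ch, ch, 1)  # complementary fusion block

    def forward(self, x_t, hrmsi, lrhsi_up, t):
        h_spatial = self.spatial(x_t, hrmsi, t)       # spatial detail injection
        h_spectral = self.spectral(x_t, lrhsi_up, t)  # spectral correction
        return self.fuse(torch.cat([h_spatial, h_spectral], dim=1))
```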