Author name cluster

Xinlong Jiang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.
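The grouping rule described above can be illustrated with a minimal sketch. It assumes author rows arrive as `(name, paper_title)` pairs and that "case-insensitive exact match" means comparing case-folded names with nothing else normalized. The function name `cluster_by_exact_name`, the row layout, and the lowercase name variant in the sample data are hypothetical and not taken from Arrow.

```python
from collections import defaultdict


def cluster_by_exact_name(author_rows):
    """Group paper titles by case-insensitive exact author name.

    Assumption: no identity disambiguation is attempted, so two different
    people who share a name end up in the same cluster.
    """
    clusters = defaultdict(list)
    for name, paper_title in author_rows:
        # casefold() gives a case-insensitive key; spelling must still match exactly.
        clusters[name.casefold()].append(paper_title)
    return dict(clusters)


# Illustrative rows; the lowercase spelling is a made-up variant for the demo.
rows = [
    ("Xinlong Jiang", "Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement"),
    ("xinlong jiang", "VersaFusion: A Versatile Diffusion-Based Framework for Fine-Grained Image Editing and Enhancement"),
    ("Xinlong Jiang", "Domain Generalization for Activity Recognition via Adaptive Feature Fusion"),
    ("Yiqiang Chen", "Domain Generalization for Activity Recognition via Adaptive Feature Fusion"),
]

clusters = cluster_by_exact_name(rows)
```

Under these assumptions, all three spellings of the name fall into one cluster, while a name that differs by even one character would form its own cluster.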

3 papers

1 author row

AAAI Conference 2025 Conference Paper

Mitigating Pervasive Modality Absence Through Multimodal Generalization and Refinement

Wuliang Huang
Yiqiang Chen
Xinlong Jiang
Chenlong Gao
Teng Zhang
Qian Chen
Yifan Wang

The performance of multimodal models often deteriorates when modality absence occurs. The absence disrupts the learned inter-modal correlations, resulting in biased multimodal representations. This challenge is especially pronounced when the absence is pervasive, affecting both the training and inference phases. Recent studies have attempted to reconstruct the missing information; however, most of them require complete supervision, which is seldom available in scenarios of pervasive absence. The quality of reconstruction remains a critical issue. Alternatively, others aim to learn robust representations from the available modalities but the substantial variations and biases are not fully addressed. This paper introduces the Multimodal Generalization and Refinement (MGR) framework to mitigate the issue of pervasive modality absence. MGR begins by acquiring generalized multimodal representations and iteratively refines them to recognize and calibrate the biased representations. Initially, multimodal samples with absence are embedded through foundation models, and MGR integrates independent unimodal features to further enhance generalization. Additionally, a novel mixed-context prompt is adopted to identify biases in both features and correlations. A redistribution operation can then refine these biases through graph pooling, culminating in robust and calibrated multimodal representations, which are suitable for downstream tasks. Comprehensive experiments on four benchmark datasets demonstrate that the proposed MGR framework outperforms state-of-the-art methods, effectively mitigating the impact of pervasive modality absence.

AAAI Conference 2025 Conference Paper

VersaFusion: A Versatile Diffusion-Based Framework for Fine-Grained Image Editing and Enhancement

Haocun Ye
Xinlong Jiang
Chenlong Gao
Bingyu Wang
Wuliang Huang
Yiqiang Chen

Text-to-image (T2I) diffusion models have achieved remarkable progress in generating realistic images from textual descriptions. However, ensuring consistent high-quality image generation with complete backgrounds, object appearance, and optimal texture rendering remains challenging. This paper presents a novel fine-grained pixel-level image editing method based on pre-trained diffusion models. The proposed dual-branch architecture, consisting of Guidance and Generation branches, employs U-Net Denoisers and Self-Attention mechanisms. An improved DDIM-like inversion method obtains the latent representation, followed by multiple denoising steps. Cross-branch interactions, such as KV Replacement, Classifier Guidance, and Feature Correspondence, enable precise control while preserving image fidelity. The iterative refinement and reconstruction process facilitates fine-grained editing control, supporting attribute modification, image outpainting, style transfer, and face synthesis with Click-and-Drag style editing using masks. Experimental results demonstrate the effectiveness of the proposed approach in enhancing the quality and controllability of T2I-generated images, surpassing existing methods while maintaining attractive computational complexity for practical real-world applications.

TIST Journal 2022 Journal Article

Domain Generalization for Activity Recognition via Adaptive Feature Fusion

Xin Qin
Jindong Wang
Yiqiang Chen
Wang Lu
Xinlong Jiang

Human activity recognition requires the efforts to build a generalizable model using the training datasets with the hope to achieve good performance in test datasets. However, in real applications, the training and testing datasets may have totally different distributions due to various reasons such as different body shapes, acting styles, and habits, damaging the model’s generalization performance. While such a distribution gap can be reduced by existing domain adaptation approaches, they typically assume that the test data can be accessed in the training stage, which is not realistic. In this article, we consider a more practical and challenging scenario: domain-generalized activity recognition (DGAR) where the test dataset cannot be accessed during training. To this end, we propose Adaptive Feature Fusion for Activity Recognition (AFFAR), a domain generalization approach that learns to fuse the domain-invariant and domain-specific representations to improve the model’s generalization performance. AFFAR takes the best of both worlds where domain-invariant representations enhance the transferability across domains and domain-specific representations leverage the model discrimination power from each domain. Extensive experiments on three public HAR datasets show its effectiveness. Furthermore, we apply AFFAR to a real application, i.e., the diagnosis of Children’s Attention Deficit Hyperactivity Disorder (ADHD), which also demonstrates the superiority of our approach.
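As a rough illustration of the fusion idea in this abstract, the sketch below mixes several domain-specific feature vectors with softmax weights and concatenates the result with a domain-invariant feature vector. This is not the AFFAR implementation: the abstract does not give the fusion formula, so the softmax weighting, the concatenation, the fixed (rather than learned) scores, and the names `softmax` and `fuse_features` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(invariant, specific, scores):
    """Fuse one domain-invariant vector with several domain-specific vectors.

    Assumption: each source domain contributes one feature vector, and
    `scores` holds one relevance score per domain (learned in a real model,
    fixed here). The weighted mix is concatenated with the invariant part.
    """
    weights = softmax(scores)
    mixed = [
        sum(w * feat[i] for w, feat in zip(weights, specific))
        for i in range(len(specific[0]))
    ]
    return list(invariant) + mixed


# Two hypothetical source domains with equal scores, so each gets weight 0.5.
fused = fuse_features(
    invariant=[0.2, 0.4],
    specific=[[1.0, 0.0], [0.0, 1.0]],
    scores=[0.0, 0.0],
)
```

In this toy setup the fused vector keeps the invariant part unchanged and appends an equal-weight average of the two domain-specific vectors; raising one domain's score would shift the mix toward that domain.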
