Author name cluster

Jie Gui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

AAAI Conference 2026 Conference Paper

Diversifying Counterattacks: Orthogonal Exploration for Robust CLIP Inference

Chengze Jiang
Minjing Dong
Xinli Shi
Jie Gui

Vision-language pre-training models (VLPs) demonstrate strong multimodal understanding and zero-shot generalization, yet remain vulnerable to adversarial examples, raising concerns about their reliability. Recent work, Test-Time Counterattack (TTC), improves robustness by generating perturbations that maximize the embedding deviation of adversarial inputs using PGD, pushing them away from their adversarial representations. However, due to the fundamental difference in optimization objectives between adversarial attacks and counterattacks, generating counterattacks solely based on gradients with respect to the adversarial input confines the search to a narrow space. As a result, the counterattacks could overfit limited adversarial patterns and lack the diversity to fully neutralize a broad range of perturbations. In this work, we argue that enhancing the diversity and coverage of counterattacks is crucial to improving adversarial robustness in test-time defense. Accordingly, we propose Directional Orthogonal Counterattack (DOC), which augments counterattack optimization by incorporating orthogonal gradient directions and momentum-based updates. This design expands the exploration of the counterattack space and increases the diversity of perturbations, which facilitates the discovery of more generalizable counterattacks and ultimately improves the ability to neutralize adversarial perturbations. Meanwhile, we present a directional sensitivity score based on averaged cosine similarity to boost DOC by improving example discrimination and adaptively modulating the counterattack strength. Extensive experiments on 16 datasets demonstrate that DOC improves adversarial robustness under various attacks while maintaining competitive clean accuracy.

AAAI Conference 2025 Conference Paper

Deep Graph Online Hashing for Multi-Label Image Retrieval

Yuan Cao
Xiangru Chen
Zifan Liu
Wenzhe Jia
Fanlei Meng
Jie Gui

Online hashing has attracted much research attention for large-scale image retrieval in a streaming way. The main challenge lies in striking a balance between high retrieval accuracy and low training time. Almost all existing online hashing methods rely on shallow models rather than deep networks due to high training costs, because it is unacceptable to update hash functions on the order of hours. In addition, the multi-label supervision information is not fully utilized to guide the hash learning process, and the affinity matrix is always fixed once constructed. In this paper, we propose a novel Deep Graph Online Hashing (DGOH) method, which for the first time introduces inductive graph neural networks (GNNs) to realize deep online hashing with acceptable training costs on the order of seconds. Furthermore, we mine the multi-label information of the images by constructing a label network and learn label-wise weights dynamically to help update the affinity matrix. In addition, we provide a strategy to obtain examples from the old data to solve the catastrophic forgetting problem. An integrated objective function is designed to train the entire architecture. Extensive experiments on two common benchmarks demonstrate that the proposed method achieves up to 13.3% accuracy gains over state-of-the-art baselines and shows competitive performance on training time.

AAAI Conference 2025 Conference Paper

Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection

Hongsong Wang
Andi Xu
Pinle Ding
Jie Gui

Video Anomaly Detection (VAD) is essential for computer vision and multimedia research. Existing VAD methods utilize either reconstruction-based or prediction-based frameworks. The former excels at detecting irregular patterns or structures, whereas the latter is capable of spotting abnormal deviations or trends. We address pose-based video anomaly detection and introduce a novel framework called Dual Conditioned Motion Diffusion (DCMD), which enjoys the advantages of both approaches. The DCMD integrates conditioned motion and conditioned embedding to comprehensively utilize the pose characteristics and latent semantics of observed movements, respectively. In the reverse diffusion process, a motion transformer is proposed to capture potential correlations from multi-layered characteristics within the spectrum space of human motion. To enhance the discriminability between normal and abnormal instances, we design a novel United Association Discrepancy (UAD) regularization that primarily relies on a Gaussian kernel-based time association and a self-attention-based global association. Finally, a mask completion strategy is introduced during the inference stage of the reverse diffusion process to enhance the utilization of conditioned motion for the prediction branch of anomaly detection. Extensive experiments conducted on four datasets demonstrate that our method dramatically outperforms state-of-the-art methods and exhibits superior generalization performance.

AAAI Conference 2025 Conference Paper

External Reliable Information-enhanced Multimodal Contrastive Learning for Fake News Detection

Biwei Cao
Qihang Wu
Jiuxin Cao
Bo Liu
Jie Gui

With the rapid development of the Internet, the information dissemination paradigm has changed and its efficiency has improved greatly. However, this also enables the quick spread of fake news and leads to negative impacts on cyberspace. Information presentation formats have evolved gradually, with news formats shifting from text to multimodal content. As a result, detecting multimodal fake news has become one of the research hotspots. However, the field of multimodal fake news detection still faces two main challenges: the inability to fully and effectively utilize multimodal information for detection, and the low credibility or static nature of the introduced external information, which limits dynamic updates. To bridge these gaps, we propose ERIC-FND, an external reliable information-enhanced multimodal contrastive learning framework for fake news detection. ERIC-FND strengthens the representation of news content with an entity-enriched external information enhancement method. It also enriches the multimodal news information via a multimodal semantic interaction method, in which multimodal contrastive learning is employed to make different modality representations learn from each other. Moreover, an adaptive fusion method is used to integrate the news representations from different dimensions for the final classification. Experiments are conducted on two commonly used datasets in different languages, X (Twitter) and Weibo. Experimental results demonstrate that our proposed model ERIC-FND outperforms existing state-of-the-art fake news detection methods under the same settings.

IJCAI Conference 2024 Conference Paper

A Comprehensive Survey and Taxonomy on Point Cloud Registration Based on Deep Learning

Yu-Xin Zhang
Jie Gui
Xiaofeng Cong
Xin Gong
Wenbing Tao

Point cloud registration (PCR) involves determining a rigid transformation that aligns one point cloud to another. Despite the plethora of outstanding deep learning (DL)-based registration methods proposed, comprehensive and systematic studies on DL-based PCR techniques are still lacking. In this paper, we present a comprehensive survey and taxonomy of recently proposed PCR methods. Firstly, we conduct a taxonomy of commonly utilized datasets and evaluation metrics. Secondly, we classify the existing research into two main categories: supervised and unsupervised registration, providing insights into the core concepts of various influential PCR models. Finally, we highlight open challenges and potential directions for future research. A curated collection of valuable resources is made available at https://github.com/yxzhang15/PCR.

AAAI Conference 2024 Conference Paper

Taxonomy Driven Fast Adversarial Training

Kun Tong
Chengze Jiang
Jie Gui
Yuan Cao

Adversarial training (AT) is an effective defense method against gradient-based attacks to enhance the robustness of neural networks. Among AT methods, single-step AT has emerged as a hot topic due to its simplicity and efficiency, requiring only one gradient propagation to generate adversarial examples. Nonetheless, the problem of catastrophic overfitting (CO) that causes training collapse remains poorly understood, and there exists a gap between the robust accuracy achieved through single- and multi-step AT. In this paper, we present a surprising finding that the taxonomy of adversarial examples reveals the truth of CO. Based on this conclusion, we propose taxonomy driven fast adversarial training (TDAT), which jointly optimizes the learning objective, loss function, and initialization method, and can thereby be regarded as a new paradigm of single-step AT. Compared with other fast AT methods, TDAT can boost the robustness of neural networks, alleviate the influence of misclassified examples, and prevent CO during the training process while requiring almost no additional computational and memory resources. Our method achieves robust accuracy improvements of 1.59%, 1.62%, 0.71%, and 1.26% on the CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet-100 datasets, respectively, against the projected gradient descent PGD10 attack with perturbation budget 8/255. Furthermore, our proposed method also achieves state-of-the-art robust accuracy against other attacks. Code is available at https://github.com/bookman233/TDAT.

AAAI Conference 2024 Conference Paper

Underwater Organism Color Fine-Tuning via Decomposition and Guidance

Xiaofeng Cong
Jie Gui
Junming Hou

Due to wavelength-dependent light attenuation and scattering, the color of underwater organisms usually appears distorted. Existing underwater image enhancement methods mainly focus on designing networks capable of generating enhanced underwater organisms with fixed color. Due to the complexity of the underwater environment, ground truth labels are difficult to obtain, which means that no perfect enhancement effect exists. Different from the existing methods, this paper proposes an algorithm with color enhancement and color fine-tuning (CECF) capabilities. The color enhancement behavior of CECF is the same as that of existing methods, aiming to restore the color of the distorted underwater organism. Beyond this general purpose, the color fine-tuning behavior of CECF can adjust the color of organisms in a controlled manner, which can generate enhanced organisms with diverse colors. To achieve this purpose, four processes are used in CECF. A supervised enhancement process learns the mapping from a distorted image to an enhanced image by the decomposition of color code. A self-reconstruction process and a cross-reconstruction process are used for content-invariant learning. A color fine-tuning process is designed based on the guidance for obtaining various enhanced results with different colors. Experimental results demonstrate the enhancement ability and color fine-tuning ability of the proposed CECF. The source code is provided at https://github.com/Xiaofeng-life/CECF.

AAAI Conference 2023 Conference Paper

Fast Online Hashing with Multi-Label Projection

Wenzhe Jia
Yuan Cao
Junwei Liu
Jie Gui

Hashing has been widely researched to solve the large-scale approximate nearest neighbor search problem owing to its time and storage superiority. In recent years, a number of online hashing methods have emerged, which can update the hash functions to adapt to the new stream data and realize dynamic retrieval. However, existing online hashing methods are required to update the whole database with the latest hash functions when a query arrives, which leads to low retrieval efficiency with the continuous increase of the stream data. On the other hand, these methods ignore the supervision relationship among the examples, especially in the multi-label case. In this paper, we propose a novel Fast Online Hashing (FOH) method which only updates the binary codes of a small part of the database. To be specific, we first build a query pool in which the nearest neighbors of each central point are recorded. When a new query arrives, only the binary codes of the corresponding potential neighbors are updated. In addition, we create a similarity matrix which takes the multi-label supervision information into account and bring in the multi-label projection loss to further preserve the similarity among the multi-label data. The experimental results on two common benchmarks show that the proposed FOH can reduce query time by up to 6.28 seconds relative to state-of-the-art baselines while maintaining competitive retrieval accuracy.

AAAI Conference 2023 Conference Paper

Good Helper Is around You: Attention-Driven Masked Image Modeling

Zhengqi Liu
Jie Gui
Hao Luo

Masked image modeling (MIM) has shown huge potential in self-supervised learning in the past year. Benefiting from the universal backbone vision transformer, MIM learns self-supervised visual representations by masking a part of the patches of the image while attempting to recover the missing pixels. Most previous works mask patches of the image randomly, which underutilizes the semantic information that is beneficial to visual representation learning. On the other hand, due to the large size of the backbone, most previous works have to spend much time on pre-training. In this paper, we propose Attention-driven Masking and Throwing Strategy (AMT), which addresses both problems above. We first leverage the self-attention mechanism to obtain the semantic information of the image during the training process automatically without using any supervised methods. The masking strategy can be guided by that information to mask areas selectively, which is helpful for representation learning. Moreover, a redundant patch throwing strategy is proposed, which makes learning more efficient. As a plug-and-play module for masked image modeling, AMT improves the linear probing accuracy of MAE by 2.9% to 5.9% on CIFAR-10/100, STL-10, Tiny ImageNet, and ImageNet-1K, and obtains improved fine-tuning accuracy for MAE and SimMIM. Moreover, this design also achieves superior performance on downstream detection and segmentation tasks.

IJCAI Conference 2021 Conference Paper

A Comprehensive Survey on Image Dehazing Based on Deep Learning

Jie Gui
Xiaofeng Cong
Yuan Cao
Wenqi Ren
Jun Zhang
Jing Zhang
Dacheng Tao

The presence of haze significantly reduces the quality of images. Researchers have designed a variety of algorithms for image dehazing (ID) to restore the quality of hazy images. However, few studies summarize the deep learning (DL) based dehazing technologies. In this paper, we conduct a comprehensive survey of recently proposed dehazing methods. Firstly, we summarize the commonly used datasets, loss functions, and evaluation metrics. Secondly, we group the existing research on ID into two major categories: supervised ID and unsupervised ID. The core ideas of various influential dehazing models are introduced. Finally, the open issues for future research on ID are pointed out.

AAAI Conference 2021 Conference Paper

Delving into Variance Transmission and Normalization: Shift of Average Gradient Makes the Network Collapse

Yuxiang Liu
Jidong Ge
Chuanyi Li
Jie Gui

Normalization operations are essential for state-of-the-art neural networks and enable us to train a network from scratch with a large learning rate (LR). We attempt to explain the real effect of Batch Normalization (BN) from the perspective of variance transmission by investigating the relationship between BN and Weights Normalization (WN). In this work, we demonstrate that the problem of the shift of the average gradient will amplify the variance of every convolutional (conv) layer. We propose Parametric Weights Standardization (PWS), a fast module for conv filters that is robust to mini-batch size, to solve the shift of the average gradient. PWS can provide the speed-up of BN. Besides, it requires less computation and does not change the output of a conv layer. PWS enables the network to converge fast without normalizing the outputs. This result strengthens the case for the shift of the average gradient and explains why BN works from the perspective of variance transmission. The code and appendix will be made available at https://github.com/lyxzzz/PWSConv.

ECAI Conference 2020 Conference Paper

Randomized Kernel Multi-View Discriminant Analysis

Xiaoyun Li
Jie Gui
Ping Li 0001

In many artificial intelligence and computer vision systems, the same object can be observed from distinct viewpoints or by diverse sensors, which raises challenges for recognizing objects from different, even heterogeneous views. Multi-view discriminant analysis (MvDA) is an effective multi-view subspace learning method, which finds a discriminant common subspace by jointly learning multiple view-specific linear projections for object recognition from multiple views, in a non-pairwise way. In this paper, we propose the kernel version of multi-view discriminant analysis, called kernel multi-view discriminant analysis (KMvDA). To overcome the well-known computational bottleneck of kernel methods, we also study the performance of using random Fourier features (RFF) to approximate Gaussian kernels in KMvDA, for large scale learning. Theoretical analysis on the stability of this approximation is developed. We also conduct experiments on several popular multi-view datasets to illustrate the effectiveness of our proposed strategy.
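
As an illustration of the random Fourier feature idea mentioned in the abstract above, here is a minimal sketch of approximating a Gaussian kernel with RFF. It is not the authors' KMvDA code. The function names (`sample_rff_params`, `rff_map`, `gaussian_kernel`), the kernel parameterization exp(-gamma * ||x - y||^2), and all numeric values are assumptions chosen for the example.

```python
import math
import random

def sample_rff_params(dim, n_features, gamma, rng):
    """Draw random frequencies and phases (hypothetical helper).

    For k(x, y) = exp(-gamma * ||x - y||^2), frequencies are sampled
    from a zero-mean Gaussian with variance 2 * gamma per coordinate.
    """
    std = math.sqrt(2.0 * gamma)
    W = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    return W, b

def rff_map(x, W, b):
    """Map x to the feature vector sqrt(2 / D) * cos(W x + b)."""
    scale = math.sqrt(2.0 / len(W))
    return [
        scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + bj)
        for w, bj in zip(W, b)
    ]

def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
gamma = 0.5             # assumed kernel width parameter
W, b = sample_rff_params(dim=3, n_features=4000, gamma=gamma, rng=rng)

x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
exact = gaussian_kernel(x, y, gamma)
# The inner product of the two feature vectors estimates the kernel value.
approx = sum(p * q for p, q in zip(rff_map(x, W, b), rff_map(y, W, b)))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

The inner product of the feature vectors replaces the kernel evaluation, so a linear method applied to the mapped features scales with the number of random features rather than with the size of a full kernel matrix.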

AIIM Journal 2010 Journal Article

Multi-step dimensionality reduction and semi-supervised graph-based tumor classification using gene expression data

Jie Gui
Shu-Lin Wang
Ying-Ke Lei
