Arrow Research search

Author name cluster

Guansong Pang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

MTAttack: Multi-Target Backdoor Attacks Against Large Vision-Language Models

  • Zihan Wang
  • Guansong Pang
  • Wenjun Miao
  • Jin Zheng
  • Xiao Bai

Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and instruction tuning. However, the security vulnerabilities of LVLMs have become increasingly concerning, particularly their susceptibility to backdoor attacks. Existing backdoor attacks focus on single-target attacks, i.e., targeting a single malicious output associated with a specific trigger. In this work, we uncover multi-target backdoor attacks, where multiple independent triggers corresponding to different attack targets are added in a single pass of training, posing a greater threat to LVLMs in real-world applications. Executing such attacks in LVLMs is challenging since there can be many incorrect trigger-target mappings due to severe feature interference among different triggers. To address this challenge, we propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs. The core of MTAttack is a novel optimization method with two constraints, namely Proxy Space Partitioning constraint and Trigger Prototype Anchoring constraint. It jointly optimizes multiple triggers in the latent space, with each trigger independently mapping clean images to a unique proxy class while at the same time guaranteeing their separability. Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks, substantially outperforming existing attack methods. Furthermore, our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies. These findings highlight the vulnerability of LVLMs to multi-target backdoor attacks and underscore the urgent need for mitigating such threats.

AAAI Conference 2026 Conference Paper

TargetVAU: Multimodal Anomaly-Aware Reasoning for Target Behavior Understanding in Videos

  • Lingru Zhou
  • Peng Wu
  • Manqing Zhang
  • Qingsheng Wang
  • Guansong Pang
  • Peng Wang

Understanding anomalous human behaviors at a fine-grained level remains a major challenge in complex scenarios. Existing video anomaly understanding (VAU) methods often rely on coarse frame-level cues or overlook structured modeling of individual actions, limiting their capacity for reasoning about human interactions and accountability. To address these challenges, we propose TargetVAU, a multimodal anomaly-aware reasoning framework designed for individual-level anomaly recognition and explanation. TargetVAU first extracts both global-level and human-centric visual features using a frozen Vision Transformer (ViT) encoder. An Anomaly-focused Temporal Sampler is then employed to select behaviorally informative frames via a density-aware strategy guided by predicted anomaly scores. A Spatio-Temporal Interaction Graph is constructed to explicitly model interactions among individuals across time and space. These structured representations are fused with prompt embeddings via a frozen Q-Former to form a unified semantic representation. Finally, a large language model fine-tuned with low-rank adaptation (LoRA) performs instruction-guided reasoning to identify anomalous individuals and generate natural language explanations. Extensive experiments on UCCD and HIVAU-70K demonstrate that TargetVAU significantly outperforms existing methods in both accuracy and interpretability, advancing the state of individual-level anomaly understanding in surveillance videos.

IJCAI Conference 2025 Conference Paper

FreqLLM: Frequency-Aware Large Language Models for Time Series Forecasting

  • Shunnan Wang
  • Min Gao
  • Zongwei Wang
  • Yibing Bai
  • Feng Jiang
  • Guansong Pang

Large Language Models (LLMs) have recently shown promise in Time Series Forecasting (TSF) by effectively capturing intricate time-domain dependencies. However, our preliminary experiments reveal that standard LLM-based approaches often fail to capture global correlations, limiting predictive performance. We found that embedding frequency-domain signals smooths weight distributions and enhances structured correlations by clearly separating global trends (low-frequency components) from local variations (high-frequency components). Building on these insights, we propose FreqLLM, a novel framework that integrates frequency-domain semantic alignment into LLMs to refine prompts for improved time series analysis. By bridging the gap between frequency signals and textual embeddings, FreqLLM effectively captures multi-scale temporal patterns and provides more robust forecasting results. Extensive experiments on benchmark datasets demonstrate that FreqLLM outperforms state-of-the-art TSF methods in both accuracy and generalization. The code is available at https: //github. com/biya0105/FreqLLM.

ICML Conference 2025 Conference Paper

GrokFormer: Graph Fourier Kolmogorov-Arnold Transformers

  • GuoguoAi
  • Guansong Pang
  • Hezhe Qiao
  • Yuan Gao
  • Hui Yan

Graph Transformers (GTs) have demonstrated remarkable performance in graph representation learning over popular graph neural networks (GNNs). However, self-attention, the core module of GTs, preserves only low-frequency signals in graph features, leading to ineffectiveness in capturing other important signals like high-frequency ones. Some recent GT models help alleviate this issue, but their flexibility and expressiveness are still limited since the filters they learn are fixed on predefined graph spectrum or spectral order. To tackle this challenge, we propose a Graph Fourier Kolmogorov-Arnold Transformer (GrokFormer), a novel GT model that learns highly expressive spectral filters with adaptive graph spectrum and spectral order through a Fourier series modeling over learnable activation functions. We demonstrate theoretically and empirically that the proposed GrokFormer filter offers better expressiveness than other spectral methods. Comprehensive experiments on 10 real-world node classification datasets across various domains, scales, and graph properties, as well as 5 graph classification datasets, show that GrokFormer outperforms state-of-the-art GTs and GNNs. Our code is available at https: //github. com/GGA23/GrokFormer.

ICML Conference 2025 Conference Paper

Information Bottleneck-guided MLPs for Robust Spatial-temporal Forecasting

  • Min Chen
  • Guansong Pang
  • Wenjun Wang 0002
  • Cheng Yan

Spatial-temporal forecasting (STF) plays a pivotal role in urban planning and computing. Spatial-Temporal Graph Neural Networks (STGNNs) excel at modeling spatial-temporal dynamics, thus being robust against noise perturbations. However, they often suffer from relatively poor computational efficiency. Simplifying the architectures can improve efficiency but also weakens robustness with respect to noise interference. In this study, we investigate the problem: can simple neural networks such as Multi-Layer Perceptrons (MLPs) achieve robust spatial-temporal forecasting while remaining efficient? To this end, we first reveal the dual noise effect in spatial-temporal data and propose a theoretically grounded principle termed Robust Spatial-Temporal Information Bottleneck (RSTIB), which holds strong potential for improving model robustness. We then design an implementation named RSTIB-MLP, together with a new training regime incorporating a knowledge distillation module, to enhance the robustness of MLPs for STF while maintaining their efficiency. Comprehensive experiments demonstrate that RSTIB-MLP achieves an excellent trade-off between robustness and efficiency, outperforming state-of-the-art STGNNs and MLP-based models. Our code is publicly available at: https: //github. com/mchen644/RSTIB.

ICLR Conference 2025 Conference Paper

Open-Set Graph Anomaly Detection via Normal Structure Regularisation

  • Qizhou Wang 0001
  • Guansong Pang
  • Mahsa Salehi
  • Xiaokun Xia
  • Christopher Leckie

This paper considers an important Graph Anomaly Detection (GAD) task, namely open-set GAD, which aims to train a detection model using a small number of normal and anomaly nodes (referred to as *seen anomalies*) to detect both seen anomalies and *unseen anomalies* (*i.e*., anomalies that cannot be illustrated the training anomalies). Those labelled training data provide crucial prior knowledge about abnormalities for GAD models, enabling substantially reduced detection errors. However, current supervised GAD methods tend to over-emphasise fitting the seen anomalies, leading to many errors of detecting the unseen anomalies as normal nodes. Further, existing open-set AD models were introduced to handle Euclidean data, failing to effectively capture discriminative features from graph structure and node attributes for GAD. In this work, we propose a novel open-set GAD approach, namely $\underline{n}ormal$ $\underline{s}tructure$ $\underline{reg}ularisation$ (**NSReg**), to achieve generalised detection ability to unseen anomalies, while maintaining its effectiveness on detecting seen anomalies. The key idea in NSReg is to introduce a regularisation term that enforces the learning of compact, semantically-rich representations of normal nodes based on their structural relations to other nodes. When being optimised with supervised anomaly detection losses, the regularisation term helps incorporate strong normality into the modelling, and thus, it effectively avoids over-fitting the seen anomalies and learns a better normality decision boundary, largely reducing the false negatives of detecting unseen anomalies as normal. Extensive empirical results on seven real-world datasets show that NSReg significantly outperforms state-of-the-art competing methods by at least 14% AUC-ROC on the unseen anomaly classes and by 10% AUC-ROC on all anomaly classes. Code and datasets are available at https://github.com/mala-lab/NSReg.

NeurIPS Conference 2025 Conference Paper

Robust Hallucination Detection in LLMs via Adaptive Token Selection

  • Mengjia Niu
  • Hamed Haddadi
  • Guansong Pang

Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment. Recent research in hallucination detection has demonstrated that LLMs' internal representations contain truthfulness hints, which can be harnessed for detector training. However, the performance of these detectors is heavily dependent on the internal representations of predetermined tokens, fluctuating considerably when working on free-form generations with varying lengths and sparse distributions of hallucinated entities. To address this, we propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens that are most indicative of hallucinations. We achieve this robustness by an innovative formulation of the Hallucination detection task as Multiple Instance (HaMI) learning over token-level representations within a sequence, thereby facilitating a joint optimisation of token selection and hallucination detection on generation sequences of diverse forms. Comprehensive experimental results on four hallucination benchmarks show that HaMI significantly outperforms existing state-of-the-art approaches.

IS Journal 2025 Journal Article

Securing Foundation Models: Failure Cases, Challenges, and the Future

  • Mengjia Niu
  • Jiawen Zhu
  • Hezhe Qiao
  • Hamed Haddadi
  • Guansong Pang

Foundation models (FMs), trained on diverse web-scale datasets, have demonstrated remarkable performance on a broad range of tasks. Despite their strong capabilities, the rapid expansion in scale and complexity of FMs introduces significant challenges that could compromise their reliability upon deployment. Key concerns can include the potential leakage of private data, exacerbation of existing bias, generation of incorrect or even harmful responses, and the risk of malicious use, among other emerging issues. This article discusses the public perception of critical ethical issues surrounding the privacy, safety, and security of FMs and their major challenges and opportunities.

NeurIPS Conference 2025 Conference Paper

Semi-supervised Graph Anomaly Detection via Robust Homophily Learning

  • GUOGUO AI
  • Hezhe Qiao
  • Hui Yan
  • Guansong Pang

Current semi-supervised graph anomaly detection (GAD) methods utilizes a small set of labeled normal nodes to identify abnormal nodes from a large set of unlabeled nodes in a graph. These methods posit that 1) normal nodes share a similar level of homophily and 2) the labeled normal nodes can well represent the homophily patterns in the entire normal class. However, this assumption often does not hold well since normal nodes in a graph can exhibit diverse homophily in real-world GAD datasets. In this paper, we propose RHO, namely Robust Homophily Learning, to adaptively learn such homophily patterns. RHO consists of two novel modules, adaptive frequency response filters (AdaFreq) and graph normality alignment (GNA). AdaFreq learns a set of adaptive spectral filters that capture different frequency components of the labeled normal nodes with varying homophily in the channel-wise and cross-channel views of node attributes. GNA is introduced to enforce consistency between the channel-wise and cross-channel homophily representations to robustify the normality learned by the filters in the two views. Experiments on eight real-world GAD datasets show that RHO can effectively learn varying, often under-represented, homophily in the small labeled node set and substantially outperforms state-of-the-art competing methods. Code is available at \url{https: //github. com/mala-lab/RHO}.

NeurIPS Conference 2025 Conference Paper

SEMPO: Lightweight Foundation Models for Time Series Forecasting

  • Hui He
  • Kun Yi
  • Yuanchi Ma
  • Qi Zhang
  • Zhendong Niu
  • Guansong Pang

The recent boom of large pre-trained models witnesses remarkable success in developing foundation models (FMs) for time series forecasting. Despite impressive performance across diverse downstream forecasting tasks, existing time series FMs possess massive network architectures and require substantial pre-training on large-scale datasets, which significantly hinders their deployment in resource-constrained environments. In response to this growing tension between versatility and affordability, we propose SEMPO, a novel lightweight foundation model that requires pretraining on relatively small-scale data, yet exhibits strong general time series forecasting. Concretely, SEMPO comprises two key modules: 1) energy-aware S p E ctral decomposition module, that substantially improves the utilization of pre-training data by modeling not only the high-energy frequency signals but also the low-energy yet informative frequency signals that are ignored in current methods; and 2) M ixture-of- P r O mpts enabled Transformer, that learns heterogeneous temporal patterns through small dataset-specific prompts and adaptively routes time series tokens to prompt-based experts for parameter-efficient model adaptation across different datasets and domains. Equipped with these modules, SEMPO significantly reduces both pre-training data scale and model size, while achieving strong generalization. Extensive experiments on two large-scale benchmarks covering 16 datasets demonstrate the superior performance of SEMPO in both zero-shot and few-shot forecasting scenarios compared with state-of-the-art methods. Code and data are available at https: //github. com/mala-lab/SEMPO.

IJCAI Conference 2025 Conference Paper

Zero-shot Generalist Graph Anomaly Detection with Unified Neighborhood Prompts

  • Chaoxi Niu
  • Hezhe Qiao
  • Changlu Chen
  • Ling Chen
  • Guansong Pang

Graph anomaly detection (GAD), which aims to identify nodes in a graph that significantly deviate from normal patterns, plays a crucial role in broad application domains. However, existing GAD methods are one-model-for-one-dataset approaches, i. e. , training a separate model for each graph dataset. This largely limits their applicability in real-world scenarios. To overcome this limitation, we propose a novel zero-shot generalist GAD approach UNPrompt that trains a one-for-all detection model, requiring the training of one GAD model on a single graph dataset and then effectively generalizing to detect anomalies in other graph datasets without any retraining or fine-tuning. The key insight in UNPrompt is that i) the predictability of latent node attributes can serve as a generalized anomaly measure and ii) generalized normal and abnormal graph patterns can be learned via latent node attribute prediction in a properly normalized node attribute space. UNPrompt achieves a generalist mode for GAD through two main modules: one module aligns the dimensionality and semantics of node attributes across different graphs via coordinate-wise normalization, while another module learns generalized neighborhood prompts that support the use of latent node attribute predictability as an anomaly score across different datasets. Extensive experiments on real-world GAD datasets show that UNPrompt significantly outperforms diverse competing methods under the generalist GAD setting, and it also has strong superiority under the one-model-for-one-dataset setting. Code is available at https: //github. com/mala-lab/UNPrompt.

ICLR Conference 2024 Conference Paper

AnomalyCLIP: Object-agnostic Prompt Learning for Zero-shot Anomaly Detection

  • Qihang Zhou
  • Guansong Pang
  • Yu Tian 0001
  • Shibo He
  • Jiming Chen 0001

Zero-shot anomaly detection (ZSAD) requires detection models trained using auxiliary data to detect anomalies without any training sample in a target dataset. It is a crucial task when training data is not accessible due to various concerns, e.g., data privacy, yet it is challenging since the models need to generalize to anomalies across different domains where the appearance of foreground objects, abnormal regions, and background features, such as defects/tumors on different products/ organs, can vary significantly. Recently large pre-trained vision-language models (VLMs), such as CLIP, have demonstrated strong zero-shot recognition ability in various vision tasks, including anomaly detection. However, their ZSAD performance is weak since the VLMs focus more on modeling the class semantics of the foreground objects rather than the abnormality/normality in the images. In this paper we introduce a novel approach, namely AnomalyCLIP, to adapt CLIP for accurate ZSAD across different domains. The key insight of AnomalyCLIP is to learn object-agnostic text prompts that capture generic normality and abnormality in an image regardless of its foreground objects. This allows our model to focus on the abnormal image regions rather than the object semantics, enabling generalized normality and abnormality recognition on diverse types of objects. Large-scale experiments on 17 real-world anomaly detection datasets show that AnomalyCLIP achieves superior zero-shot performance of detecting and segmenting anomalies in datasets of highly diverse class semantics from various defect inspection and medical imaging domains. Code will be made available at https://github.com/zqhang/AnomalyCLIP.

NeurIPS Conference 2024 Conference Paper

Generative Semi-supervised Graph Anomaly Detection

  • Hezhe Qiao
  • Qingsong Wen
  • Xiaoli Li
  • Ee-Peng Lim
  • Guansong Pang

This work considers a practical semi-supervised graph anomaly detection (GAD) scenario, where part of the nodes in a graph are known to be normal, contrasting to the extensively explored unsupervised setting with a fully unlabeled graph. We reveal that having access to the normal nodes, even just a small percentage of normal nodes, helps enhance the detection performance of existing unsupervised GAD methods when they are adapted to the semi-supervised setting. However, their utilization of these normal nodes is limited. In this paper, we propose a novel Generative GAD approach (namely GGAD) for the semi-supervised scenario to better exploit the normal nodes. The key idea is to generate pseudo anomaly nodes, referred to as 'outlier nodes', for providing effective negative node samples in training a discriminative one-class classifier. The main challenge here lies in the lack of ground truth information about real anomaly nodes. To address this challenge, GGAD is designed to leverage two important priors about the anomaly nodes -- asymmetric local affinity and egocentric closeness -- to generate reliable outlier nodes that assimilate anomaly nodes in both graph structure and feature representations. Comprehensive experiments on six real-world GAD datasets are performed to establish a benchmark for semi-supervised GAD and show that GGAD substantially outperforms state-of-the-art unsupervised and semi-supervised GAD methods with varying numbers of training normal nodes.

ECAI Conference 2024 Conference Paper

Graph Continual Learning with Debiased Lossless Memory Replay

  • Chaoxi Niu
  • Guansong Pang
  • Ling Chen 0006

Real-life graph data often expands continually, rendering the learning of graph neural networks (GNNs) on static graph data impractical. Graph continual learning (GCL) tackles this problem by continually adapting GNNs to the expanded graph of the current task while maintaining the performance over the graph of previous tasks. Memory replay-based methods, which aim to replay data of previous tasks when learning new tasks, have been explored as one principled approach to mitigate the forgetting of the knowledge learned from the previous tasks. In this paper we extend this methodology with a novel framework, called Debiased Lossless Memory replay (DeLoMe). Unlike existing methods that sample nodes/edges of previous graphs to construct the memory, DeLoMe learns lossless prototypical node representations as the memory. The learned memory can not only preserve the graph data privacy but also capture the holistic graph information, both of which the sampling-based methods fail to achieve. Further, prior methods suffer from bias toward the current task due to the data imbalance between the classes in the memory data and the current data. A debiased GCL loss function is devised in DeLoMe to effectively alleviate this bias. Extensive experiments on four graph datasets show the effectiveness of DeLoMe under both class- and task-incremental learning settings. Code is available at https: //github. com/mala-lab/DeLoMe.

NeurIPS Conference 2024 Conference Paper

Long-Tailed Out-of-Distribution Detection via Normalized Outlier Distribution Adaptation

  • Wenjun Miao
  • Guansong Pang
  • Jin Zheng
  • Xiao Bai

One key challenge in Out-of-Distribution (OOD) detection is the absence of ground-truth OOD samples during training. One principled approach to address this issue is to use samples from external datasets as outliers ($\textit{i. e. }$, pseudo OOD samples) to train OOD detectors. However, we find empirically that the outlier samples often present a distribution shift compared to the true OOD samples, especially in Long-Tailed Recognition (LTR) scenarios, where ID classes are heavily imbalanced, $\textit{i. e. }$, the true OOD samples exhibit very different probability distribution to the head and tailed ID classes from the outliers. In this work, we propose a novel approach, namely $\textit{normalized outlier distribution adaptation}$ (AdaptOD), to tackle this distribution shift problem. One of its key components is $\textit{dynamic outlier distribution adaptation}$ that effectively adapts a vanilla outlier distribution based on the outlier samples to the true OOD distribution by utilizing the OOD knowledge in the predicted OOD samples during inference. Further, to obtain a more reliable set of predicted OOD samples on long-tailed ID data, a novel $\textit{dual-normalized energy loss}$ is introduced in AdaptOD, which leverages class- and sample-wise normalized energy to enforce a more balanced prediction energy on imbalanced ID samples. This helps avoid bias toward the head samples and learn a substantially better vanilla outlier distribution than existing energy losses during training. It also eliminates the need of manually tuning the sensitive margin hyperparameters in energy losses. Empirical results on three popular benchmarks for OOD detection in LTR show the superior performance of AdaptOD over state-of-the-art methods. Code is available at https: //github. com/mala-lab/AdaptOD.

AAAI Conference 2024 Conference Paper

Out-of-Distribution Detection in Long-Tailed Recognition with Calibrated Outlier Class Learning

  • Wenjun Miao
  • Guansong Pang
  • Xiao Bai
  • Tianqi Li
  • Jin Zheng

Existing out-of-distribution (OOD) methods have shown great success on balanced datasets but become ineffective in long-tailed recognition (LTR) scenarios where 1) OOD samples are often wrongly classified into head classes and/or 2) tail-class samples are treated as OOD samples. To address these issues, current studies fit a prior distribution of auxiliary/pseudo OOD data to the long-tailed in-distribution (ID) data. However, it is difficult to obtain such an accurate prior distribution given the unknowingness of real OOD samples and heavy class imbalance in LTR. A straightforward solution to avoid the requirement of this prior is to learn an outlier class to encapsulate the OOD samples. The main challenge is then to tackle the aforementioned confusion between OOD samples and head/tail-class samples when learning the outlier class. To this end, we introduce a novel calibrated outlier class learning (COCL) approach, in which 1) a debiased large margin learning method is introduced in the outlier class learning to distinguish OOD samples from both head and tail classes in the representation space and 2) an outlier-class-aware logit calibration method is defined to enhance the long-tailed classification confidence. Extensive empirical results on three popular benchmarks CIFAR10-LT, CIFAR100-LT, and ImageNet-LT demonstrate that COCL substantially outperforms existing state-of-the-art OOD detection methods in LTR while being able to improve the classification accuracy on ID data. Code is available at https://github.com/mala-lab/COCL.

NeurIPS Conference 2024 Conference Paper

Replay-and-Forget-Free Graph Class-Incremental Learning: A Task Profiling and Prompting Approach

  • Chaoxi Niu
  • Guansong Pang
  • Ling Chen
  • Bing Liu

Class-incremental learning (CIL) aims to continually learn a sequence of tasks, with each task consisting of a set of unique classes. Graph CIL (GCIL) follows the same setting but needs to deal with graph tasks (e. g. , node classification in a graph). The key characteristic of CIL lies in the absence of task identifiers (IDs) during inference, which causes a significant challenge in separating classes from different tasks (i. e. , inter-task class separation). Being able to accurately predict the task IDs can help address this issue, but it is a challenging problem. In this paper, we show theoretically that accurate task ID prediction on graph data can be achieved by a Laplacian smoothing-based graph task profiling approach, in which each graph task is modeled by a task prototype based on Laplacian smoothing over the graph. It guarantees that the task prototypes of the same graph task are nearly the same with a large smoothing step, while those of different tasks are distinct due to differences in graph structure and node attributes. Further, to avoid the catastrophic forgetting of the knowledge learned in previous graph tasks, we propose a novel graph prompting approach for GCIL which learns a small discriminative graph prompt for each task, essentially resulting in a separate classification model for each task. The prompt learning requires the training of a single graph neural network (GNN) only once on the first task, and no data replay is required thereafter, thereby obtaining a GCIL model being both replay-free and forget-free. Extensive experiments on four GCIL benchmarks show that i) our task prototype-based method can achieve 100% task ID prediction accuracy on all four datasets, ii) our GCIL model significantly outperforms state-of-the-art competing methods by at least 18% in average CIL accuracy, and iii) our model is fully free of forgetting on the four datasets.

AAAI Conference 2024 Conference Paper

Simple Image-Level Classification Improves Open-Vocabulary Object Detection

  • Ruohuan Fang
  • Guansong Pang
  • Xiao Bai

Open-Vocabulary Object Detection (OVOD) aims to detect novel objects beyond a given set of base categories on which the detection model is trained. Recent OVOD methods focus on adapting the image-level pre-trained vision-language models (VLMs), such as CLIP, to a region-level object detection task via, eg., region-level knowledge distillation, regional prompt learning, or region-text pre-training, to expand the detection vocabulary. These methods have demonstrated remarkable performance in recognizing regional visual concepts, but they are weak in exploiting the VLMs' powerful global scene understanding ability learned from the billion-scale image-level text descriptions. This limits their capability in detecting hard objects of small, blurred, or occluded appearance from novel/base categories, whose detection heavily relies on contextual information. To address this, we propose a novel approach, namely Simple Image-level Classification for Context-Aware Detection Scoring (SIC-CADS), to leverage the superior global knowledge yielded from CLIP for complementing the current OVOD models from a global perspective. The core of SIC-CADS is a multi-modal multi-label recognition (MLR) module that learns the object co-occurrence-based contextual information from CLIP to recognize all possible object categories in the scene. These image-level MLR scores can then be utilized to refine the instance-level detection scores of the current OVOD models in detecting those hard objects. This is verified by extensive empirical results on two popular benchmarks, OV-LVIS and OV-COCO, which show that SIC-CADS achieves significant and consistent improvement when combined with different types of OVOD models. Further, SIC-CADS also improves the cross-dataset generalization ability on Objects365 and OpenImages. Code is available at https://github.com/mala-lab/SIC-CADS.

AAAI Conference 2024 Conference Paper

VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection

  • Peng Wu
  • Xuerong Zhou
  • Guansong Pang
  • Lingru Zhou
  • Qingsen Yan
  • Peng Wang
  • Yanning Zhang

The recent contrastive language-image pre-training (CLIP) model has shown great success in a wide range of image-level tasks, revealing remarkable ability for learning powerful visual representations with rich semantics. An open and worthwhile problem is efficiently adapting such a strong model to the video domain and designing a robust video anomaly detector. In this work, we propose VadCLIP, a new paradigm for weakly supervised video anomaly detection (WSVAD) by leveraging the frozen CLIP model directly without any pre-training and fine-tuning process. Unlike current works that directly feed extracted features into the weakly supervised classifier for frame-level binary classification, VadCLIP makes full use of fine-grained associations between vision and language on the strength of CLIP and involves dual branch. One branch simply utilizes visual features for coarse-grained binary classification, while the other fully leverages the fine-grained language-image alignment. With the benefit of dual branch, VadCLIP achieves both coarse-grained and fine-grained video anomaly detection by transferring pre-trained knowledge from CLIP to WSVAD task. We conduct extensive experiments on two commonly-used benchmarks, demonstrating that VadCLIP achieves the best performance on both coarse-grained and fine-grained WSVAD, surpassing the state-of-the-art methods by a large margin. Specifically, VadCLIP achieves 84.51% AP and 88.02% AUC on XD-Violence and UCF-Crime, respectively. Code and features are released at https://github.com/nwpu-zxr/VadCLIP.

AAAI Conference 2023 Conference Paper

Cross-Domain Graph Anomaly Detection via Anomaly-Aware Contrastive Alignment

  • Qizhou Wang
  • Guansong Pang
  • Mahsa Salehi
  • Wray Buntine
  • Christopher Leckie

Cross-domain graph anomaly detection (CD-GAD) describes the problem of detecting anomalous nodes in an unlabelled target graph using auxiliary, related source graphs with labelled anomalous and normal nodes. Although it presents a promising approach to address the notoriously high false positive issue in anomaly detection, little work has been done in this line of research. There are numerous domain adaptation methods in the literature, but it is difficult to adapt them for GAD due to the unknown distributions of the anomalies and the complex node relations embedded in graph data. To this end, we introduce a novel domain adaptation approach, namely Anomaly-aware Contrastive alignmenT (ACT), for GAD. ACT is designed to jointly optimise: (i) unsupervised contrastive learning of normal representations of nodes in the target graph, and (ii) anomaly-aware one-class alignment that aligns these contrastive node representations and the representations of labelled normal nodes in the source graph, while enforcing significant deviation of the representations of the normal nodes from the labelled anomalous nodes in the source graph. In doing so, ACT effectively transfers anomaly-informed knowledge from the source graph to learn the complex node relations of the normal class for GAD on the target graph without any specification of the anomaly distributions. Extensive experiments on eight CD-GAD settings demonstrate that our approach ACT achieves substantially improved detection performance over 10 state-of-the-art GAD methods. Code is available at https://github.com/QZ-WANG/ACT.

NeurIPS Conference 2023 Conference Paper

Truncated Affinity Maximization: One-class Homophily Modeling for Graph Anomaly Detection

  • Hezhe Qiao
  • Guansong Pang

We reveal a one-class homophily phenomenon, which is one prevalent property we find empirically in real-world graph anomaly detection (GAD) datasets, i. e. , normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker than normal nodes. However, this anomaly-discriminative property is ignored by existing GAD methods that are typically built using a conventional anomaly detection objective, such as data reconstruction. In this work, we explore this property to introduce a novel unsupervised anomaly scoring measure for GAD -- local node affinity-- that assigns a larger anomaly score to nodes that are less affiliated with their neighbors, with the affinity defined as similarity on node attributes/representations. We further propose Truncated Affinity Maximization (TAM) that learns tailored node representations for our anomaly measure by maximizing the local affinity of nodes to their neighbors. Optimizing on the original graph structure can be biased by non-homophily edges(i. e. , edges connecting normal and abnormal nodes). Thus, TAM is instead optimized on truncated graphs where non-homophily edges are removed iteratively to mitigate this bias. The learned representations result in significantly stronger local affinity for normal nodes than abnormal nodes. Extensive empirical results on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving over 10% increase in AUROC/AUPRC compared to the best contenders on challenging datasets. Our code is available at https: //github. com/mala-lab/TAM-master/.

IS Journal 2022 Journal Article

AI in Beijing 2022 Olympic Winter Games

  • Guansong Pang

Artificial intelligence (AI) is playing a driving role in empowering and safeguarding Beijing 2022 Olympic Winter Games. The power of AI ranges from improving player training, coaching and performance scoring, supporting language translation and communication, athlete and audience services, and catering, to enabling security, and COVID-19 and epidemic prevention.

IS Journal 2022 Journal Article

Artificial Intelligence for Natural Disaster Management

  • Guansong Pang

Artificial intelligence (AI) can leverage massive amount of diverse types of data, such as geospatial data, social media data, and wireless network sensor data, to enhance our understanding of natural disasters, their forecasting and detection, and humanitarian assistance in natural disaster management (NDM). Due to this potential, different communities have been dedicating enormous efforts to the development and/or adoption of AI technologies for NDM. This article provides an overview of these efforts and discusses major challenges and opportunities in this topic.

AAAI Conference 2022 Conference Paper

Deep One-Class Classification via Interpolated Gaussian Descriptor

  • Yuanhong Chen
  • Yu Tian
  • Guansong Pang
  • Gustavo Carneiro

One-class classification (OCC) aims to learn an effective data description to enclose all normal training samples and detect anomalies based on the deviation from the data description. Current state-of-the-art OCC models learn a compact normality description by hyper-sphere minimisation, but they often suffer from overfitting the training data, especially when the training set is small or contaminated with anomalous samples. To address this issue, we introduce the interpolated Gaussian descriptor (IGD) method, a novel OCC model that learns a one-class Gaussian anomaly classifier trained with adversarially interpolated training samples. The Gaussian anomaly classifier differentiates the training samples based on their distance to the Gaussian centre and the standard deviation of these distances, offering the model a discriminability w. r. t. the given samples during training. The adversarial interpolation is enforced to consistently learn a smooth Gaussian descriptor, even when the training data is small or contaminated with anomalous samples. This enables our model to learn the data description based on the representative normal samples rather than fringe or anomalous samples, resulting in significantly improved normality description. In extensive experiments on diverse popular benchmarks, including MNIST, Fashion MNIST, CIFAR10, MVTec AD and two medical datasets, IGD achieves better detection accuracy than current state-of-the-art models. IGD also shows better robustness in problems with small or contaminated training sets.

IS Journal 2022 Journal Article

The AI Chip Race

  • Guansong Pang

The strong demand of computing power for artificial intelligence (AI) and machine learning is accelerating the race to develop cheaper and faster AI chips. The AI chip market was valued $10. 6 billion in 2021 and the total revenue is expected to reach $79. 8 billion by 2027. aa. [Online]. Available: https://www.maximizemarketresearch.com/market-report/global-artificial-intelligence-chipset-market/66849/. To be part of the market, tech giants from different countries have been successively joining the race, while AI chip startups attracting billions of dollars are taking off like a rocket.

IS Journal 2021 Journal Article

Guest Editorial: Non-IID Outlier Detection in Complex Contexts

  • Guansong Pang
  • Fabrizio Angiulli
  • Mihai Cucuringu
  • Huan Liu

Outlier detection, also known as anomaly detection, aims at identifying data instances that are rare or significantly different from the majority of instances. Due to its significance in many critical domains like cybersecurity, fintech, healthcare, public security, and AI safety, outlier detection has been one of the most active research areas in various communities, such as machine learning, data mining, computer vision, and statistics. Traditional outlier-detection techniques generally assume that data are independent and identically distributed (IID), which are significantly challenged in complex contexts where data are actually non-IID. These contexts are ubiquitous in not only graph data, sequence data, spatial data, temporal data, and streaming data, but also traditional multidimensional, textual, and image data. 4–6 This demands for advanced outlierdetection approaches to address those explicit or implicit non-IID data characteristics.

IJCAI Conference 2020 Conference Paper

Unsupervised Representation Learning by Predicting Random Distances

  • Hu Wang
  • Guansong Pang
  • Chunhua Shen
  • Congbo Ma

Deep neural networks have gained great success in a broad range of tasks due to its remarkable capability to learn semantically rich features from high-dimensional data. However, they often require large-scale labelled data to successfully learn such features, which significantly hinders their adaption in unsupervised learning tasks, such as anomaly detection and clustering, and limits their applications to critical domains where obtaining massive labelled data is prohibitively expensive. To enable unsupervised learning on those domains, in this work we propose to learn features without using any labelled data by training neural networks to predict data distances in a randomly projected space. Random mapping is a theoretically proven approach to obtain approximately preserved distances. To well predict these distances, the representation learner is optimised to learn genuine class structures that are implicitly embedded in the randomly projected space. Empirical results on 19 real-world datasets show that our learned representations substantially outperform a few state-of-the-art methods for both anomaly detection and clustering tasks. Code is available at: \url{https: //git. io/RDP}

AAAI Conference 2018 Conference Paper

Sparse Modeling-Based Sequential Ensemble Learning for Effective Outlier Detection in High-Dimensional Numeric Data

  • Guansong Pang
  • Longbing Cao
  • Ling Chen
  • Defu Lian
  • Huan Liu

The large proportion of irrelevant or noisy features in reallife high-dimensional data presents a significant challenge to subspace/feature selection-based high-dimensional outlier detection (a. k. a. outlier scoring) methods. These methods often perform the two dependent tasks: relevant feature subset search and outlier scoring independently, consequently retaining features/subspaces irrelevant to the scoring method and downgrading the detection performance. This paper introduces a novel sequential ensemble-based framework SEMSE and its instance CINFO to address this issue. SEMSE learns the sequential ensembles to mutually refine feature selection and outlier scoring by iterative sparse modeling with outlier scores as the pseudo target feature. CINFO instantiates SEMSE by using three successive recurrent components to build such sequential ensembles. Given outlier scores output by an existing outlier scoring method on a feature subset, CINFO first defines a Cantelli’s inequality-based outlier thresholding function to select outlier candidates with a false positive upper bound. It then performs lasso-based sparse regression by treating the outlier scores as the target feature and the original features as predictors on the outlier candidate set to obtain a feature subset that is tailored for the outlier scoring method. Our experiments show that two different outlier scoring methods enabled by CINFO (i) perform significantly better on 11 real-life high-dimensional data sets, and (ii) have much better resilience to noisy features, compared to their bare versions and three state-of-theart competitors. The source code of CINFO is available at https: //sites. google. com/site/gspangsite/sourcecode.

IJCAI Conference 2017 Conference Paper

Embedding-based Representation of Categorical Data by Hierarchical Value Coupling Learning

  • Songlei Jian
  • Longbing Cao
  • Guansong Pang
  • Kai Lu
  • Hang Gao

Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i. e. , a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings which are then used to cluster values with different granularities. It further models the couplings in value clusters within the same granularity and with different granularities to embed feature values into a new numerical space with independent dimensions. Substantial experiments show that CDE significantly outperforms three popular unsupervised embedding methods and three state-of-the-art similarity-based representation methods.

IJCAI Conference 2017 Conference Paper

Learning Homophily Couplings from Non-IID Data for Joint Feature Selection and Noise-Resilient Outlier Detection

  • Guansong Pang
  • Longbing Cao
  • Ling Chen
  • Huan Liu

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i. e. , data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i. e. , abnormal behaviors are not independent - they bond together) in constructing a noise-resilient outlier scoring function to produce a reliable outlier ranking in each iteration. We show that HOUR (i) retains a 2-approximation outlier ranking to the optimal one; and (ii) significantly outperforms five state-of-the-art competitors on 15 real-world data sets with different noise levels in terms of AUC and/or P@n. The source code of HOUR is available at https: //sites. google. com/site/gspangsite/sourcecode.

IJCAI Conference 2016 Conference Paper

Outlier Detection in Complex Categorical Data by Modeling the Feature Value Couplings

  • Guansong Pang
  • Longbing Cao
  • Ling Chen

This paper introduces a novel unsupervised outlier detection method, namely Coupled Biased Random Walks (CBRW), for identifying outliers in categorical data with diversified frequency distributions and many noisy features. Existing pattern-based outlier detection methods are ineffective in handling such complex scenarios, as they misfit such data. CBRW estimates outlier scores of feature values by modelling feature value level couplings, which carry intrinsic data characteristics, via biased random walks to handle this complex data. The outlier scores of feature values can either measure the outlierness of an object or facilitate the existing methods as a feature weighting and selection indicator. Substantial experiments show that CBRW can not only detect outliers in complex data significantly better than the state-of-the-art methods, but also greatly improve the performance of existing methods on data sets with many noisy features.

JAIR Journal 2016 Journal Article

ZERO++: Harnessing the Power of Zero Appearances to Detect Anomalies in Large-Scale Data Sets

  • Guansong Pang
  • Kai Ming Ting
  • David Albrecht
  • Huidong Jin

This paper introduces a new unsupervised anomaly detector called ZERO++ which employs the number of zero appearances in subspaces to detect anomalies in categorical data. It is unique in that it works in regions of subspaces that are not occupied by data; whereas existing methods work in regions occupied by data. ZERO++ examines only a small number of low dimensional subspaces to successfully identify anomalies. Unlike existing frequency-based algorithms, ZERO++ does not involve subspace pattern searching. We show that ZERO++ is better than or comparable with the state-of-the-art anomaly detection methods over a wide range of real-world categorical and numeric data sets; and it is efficient with linear time complexity and constant space complexity which make it a suitable candidate for large-scale data sets.