Arrow Research search

Author name cluster

Guibo Zhu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers (7)

AAAI Conference 2026 · Conference Paper

AnomalyMoE: Towards a Language-free Generalist Model for Unified Visual Anomaly Detection

  • Zhaopeng Gu
  • Bingke Zhu
  • Guibo Zhu
  • Yingying Chen
  • Wei Ge
  • Ming Tang
  • Jinqiao Wang

Anomaly detection is a critical task across numerous domains and modalities, yet existing methods are often highly specialized, limiting their generalizability. These specialized models, tailored for specific anomaly types like textural defects or logical errors, typically exhibit limited performance when deployed outside their designated contexts. To overcome this limitation, we propose AnomalyMoE, a novel and universal anomaly detection framework based on a Mixture-of-Experts (MoE) architecture. Our key insight is to decompose the complex anomaly detection problem into three distinct semantic hierarchies: local structural anomalies, component-level semantic anomalies, and global logical anomalies. AnomalyMoE correspondingly employs three dedicated expert networks at the patch, component, and global levels, each specialized in reconstructing features and identifying deviations at its designated semantic level. This hierarchical design allows a single model to concurrently understand and detect a wide spectrum of anomalies. Furthermore, we introduce an Expert Information Repulsion (EIR) module to promote expert diversity and an Expert Selection Balancing (ESB) module to ensure the comprehensive utilization of all experts. Experiments on 8 challenging datasets spanning industrial imaging, 3D point clouds, medical imaging, video surveillance, and logical anomaly detection demonstrate that AnomalyMoE establishes new state-of-the-art performance, significantly outperforming specialized methods in their respective domains.
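As a rough illustration of the idea in this abstract (not the paper's actual architecture), a minimal sketch of three-level experts scoring deviations by reconstruction error might look like the following; the expert networks, feature shapes, and max-aggregation are all assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(dim, bottleneck):
    """A toy 'expert': a random linear autoencoder whose reconstruction
    error serves as the anomaly signal at its semantic level."""
    enc = rng.standard_normal((dim, bottleneck)) / np.sqrt(dim)
    dec = rng.standard_normal((bottleneck, dim)) / np.sqrt(bottleneck)
    def expert(feats):
        recon = feats @ enc @ dec
        return np.linalg.norm(feats - recon, axis=-1)  # per-row deviation
    return expert

dim = 64
patch_expert = make_expert(dim, 8)       # local structural level
component_expert = make_expert(dim, 8)   # component semantic level
global_expert = make_expert(dim, 8)      # global logical level

def anomaly_score(patch_feats, component_feats, global_feat):
    """Aggregate deviations from the three semantic levels into one score."""
    s_patch = patch_expert(patch_feats).max()          # worst local patch
    s_comp = component_expert(component_feats).max()   # worst component
    s_glob = global_expert(global_feat[None, :])[0]    # whole-image logic
    return max(s_patch, s_comp, s_glob)

score = anomaly_score(
    rng.standard_normal((196, dim)),  # e.g. 14x14 patch tokens
    rng.standard_normal((5, dim)),    # pooled component features
    rng.standard_normal(dim),         # global feature
)
```

The point of the sketch is only the decomposition: one model, three semantic levels, each contributing its own deviation signal.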

AAAI Conference 2026 · Conference Paper

Improving Generalization in LLM Structured Pruning via Function-Aware Neuron Grouping

  • Tao Yu
  • Yongqi An
  • Kuan Zhu
  • Guibo Zhu
  • Ming Tang
  • Jinqiao Wang

Large Language Models (LLMs) demonstrate impressive performance across natural language tasks but incur substantial computational and storage costs due to their scale. Post-training structured pruning offers an efficient solution. However, when few-shot calibration sets fail to adequately reflect the pretraining data distribution, existing methods exhibit limited generalization to downstream tasks. To address this issue, we propose Function-Aware Neuron Grouping (FANG), a post-training pruning framework that alleviates calibration bias by identifying and preserving neurons critical to specific functions. FANG groups neurons with similar functions based on the type of semantic context they process and prunes each group independently. During importance estimation within each group, tokens that strongly correlate with the functional role of the neuron group are given higher weighting. FANG also preserves neurons that contribute across multiple context types. To achieve a better trade-off between sparsity and performance, it allocates sparsity to each block adaptively based on its functional complexity. Experiments show that FANG improves downstream accuracy while preserving language modeling performance. It achieves state-of-the-art (SOTA) results when combined with FLAP and OBC, two representative pruning methods. Specifically, FANG outperforms FLAP and OBC by 1.5%–8.5% in average accuracy under 30% and 40% sparsity.
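To make the grouping-then-pruning idea concrete, here is a hedged toy sketch: neurons are pre-assigned to two hypothetical functional groups, calibration tokens get hypothetical weights per group, and each group is pruned independently by weighted activation importance. The grouping, weights, and importance formula are illustrative assumptions, not FANG's actual procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
n_tokens, n_neurons = 32, 16
acts = np.abs(rng.standard_normal((n_tokens, n_neurons)))  # |activations|

# Hypothetical grouping: neurons 0-7 process one semantic context type,
# neurons 8-15 another (a real grouping would be derived from data).
groups = {"A": np.arange(0, 8), "B": np.arange(8, 16)}
# Hypothetical per-token weights: how strongly each calibration token
# matches the functional role of each group.
token_w = {g: rng.random(n_tokens) for g in groups}

def prune_group(acts, idx, weights, sparsity):
    """Weighted importance inside one functional group, then keep the
    top (1 - sparsity) fraction of its neurons."""
    imp = (weights[:, None] * acts[:, idx]).sum(axis=0)
    n_keep = int(round(len(idx) * (1.0 - sparsity)))
    keep = idx[np.argsort(imp)[::-1][:n_keep]]
    return set(keep.tolist())

kept = set()
for g, idx in groups.items():
    kept |= prune_group(acts, idx, token_w[g], sparsity=0.5)
```

Pruning per group guarantees no single context type loses all of its neurons, which is the generalization argument the abstract makes.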

AAAI Conference 2025 · Conference Paper

See Through Their Minds: Learning Transferable Brain Decoding Models from Cross-Subject fMRI

  • Yulong Liu
  • Yongqiang Ma
  • Guibo Zhu
  • Haodong Jing
  • Nanning Zheng

Deciphering visual content from fMRI sheds light on the human vision system, but data scarcity and noise limit brain decoding model performance. Traditional approaches rely on subject-specific models, which are sensitive to training sample size. In this paper, we address data scarcity by proposing shallow subject-specific adapters to map cross-subject fMRI data into unified representations. A shared deep decoding model then decodes these features into the target feature space. We use both visual and textual supervision for multi-modal brain decoding and integrate high-level perception decoding with pixel-wise reconstruction guided by high-level perceptions. Our extensive experiments reveal several interesting insights: 1) Training with cross-subject fMRI benefits both high-level and low-level decoding models; 2) Merging high-level and low-level information improves reconstruction performance at both levels; 3) Transfer learning is effective for new subjects with limited training data by training new adapters; 4) Decoders trained on visually-elicited brain activity can generalize to decode imagery-induced activity, though with reduced performance.
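The adapter-plus-shared-decoder design described above can be sketched in a few lines; the voxel counts, linear maps, and subject IDs below are invented for illustration and are far simpler than the paper's actual models:

```python
import numpy as np

rng = np.random.default_rng(2)
unified_dim, target_dim = 128, 64

# Each subject has a different voxel count; a shallow (here linear)
# adapter maps its fMRI into the shared representation space.
voxels = {"sub01": 500, "sub02": 750}
adapters = {s: rng.standard_normal((v, unified_dim)) / np.sqrt(v)
            for s, v in voxels.items()}

# One deep decoding model shared across subjects (a single linear map here).
shared_decoder = rng.standard_normal((unified_dim, target_dim)) / np.sqrt(unified_dim)

def decode(subject, fmri):
    unified = fmri @ adapters[subject]   # subject-specific adapter
    return unified @ shared_decoder      # shared decoding model

out1 = decode("sub01", rng.standard_normal(500))
out2 = decode("sub02", rng.standard_normal(750))
# Both land in the same target feature space despite different voxel counts,
# which is what makes cross-subject training and new-subject transfer possible.
```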

AAAI Conference 2024 · Conference Paper

AnomalyGPT: Detecting Industrial Anomalies Using Large Vision-Language Models

  • Zhaopeng Gu
  • Bingke Zhu
  • Guibo Zhu
  • Yingying Chen
  • Ming Tang
  • Jinqiao Wang

Large Vision-Language Models (LVLMs) such as MiniGPT-4 and LLaVA have demonstrated the capability of understanding images and achieved remarkable performance in various visual tasks. Despite their strong abilities in recognizing common objects due to extensive training datasets, they lack specific domain knowledge and have a weaker understanding of localized details within objects, which hinders their effectiveness in the Industrial Anomaly Detection (IAD) task. On the other hand, most existing IAD methods only provide anomaly scores and necessitate the manual setting of thresholds to distinguish between normal and abnormal samples, which restricts their practical implementation. In this paper, we explore the utilization of LVLMs to address the IAD problem and propose AnomalyGPT, a novel IAD approach based on LVLMs. We generate training data by simulating anomalous images and producing corresponding textual descriptions for each image. We also employ an image decoder to provide fine-grained semantics and design a prompt learner to fine-tune the LVLM using prompt embeddings. AnomalyGPT eliminates the need for manual threshold adjustments and can thus directly assess the presence and locations of anomalies. Additionally, AnomalyGPT supports multi-turn dialogues and exhibits impressive few-shot in-context learning capabilities. With only one normal shot, AnomalyGPT achieves state-of-the-art performance with an accuracy of 86.1%, an image-level AUC of 94.1%, and a pixel-level AUC of 95.3% on the MVTec-AD dataset.
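The abstract mentions simulating anomalous images with paired textual descriptions. One common way to synthesize such training data is cut-paste augmentation; the sketch below is a generic toy version of that idea, not AnomalyGPT's actual data pipeline, and the caption string is a placeholder:

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_anomaly(img):
    """Toy cut-paste anomaly simulation: copy a random patch to a random
    location, returning the corrupted image, a pixel mask of the pasted
    region, and a placeholder textual description."""
    h, w = img.shape[:2]
    ph, pw = h // 8, w // 8
    sy, sx = rng.integers(0, h - ph), rng.integers(0, w - pw)  # source
    dy, dx = rng.integers(0, h - ph), rng.integers(0, w - pw)  # destination
    out = img.copy()
    out[dy:dy + ph, dx:dx + pw] = img[sy:sy + ph, sx:sx + pw]
    mask = np.zeros((h, w), dtype=bool)
    mask[dy:dy + ph, dx:dx + pw] = True
    caption = "Yes, there is an anomaly in the image, in the pasted region."
    return out, mask, caption

img = rng.random((64, 64))
aug, mask, caption = simulate_anomaly(img)
```

Pairing the pixel mask with a textual answer is what lets a model learn to state the presence and location of anomalies directly, without a score threshold.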

NeurIPS Conference 2022 · Conference Paper

TaiSu: A 166M Large-scale High-Quality Dataset for Chinese Vision-Language Pre-training

  • Yulong Liu
  • Guibo Zhu
  • Bin Zhu
  • Qi Song
  • Guojing Ge
  • Haoran Chen
  • GuanHui Qiao
  • Ru Peng

Vision-Language Pre-training (VLP) has been shown to be an efficient method to improve the performance of models on different vision-and-language downstream tasks. Substantial studies have shown that neural networks may be able to learn some general rules about language and visual concepts from a large-scale weakly labeled image-text dataset. However, most of the public cross-modal datasets that contain more than 100M image-text pairs are in English; there is a lack of available large-scale and high-quality Chinese VLP datasets. In this work, we propose a new framework for automatic dataset acquisition and cleaning with which we construct a new large-scale and high-quality cross-modal dataset named TaiSu, containing 166 million images and 219 million Chinese captions. Compared with the recently released Wukong dataset, our dataset is constructed with much stricter restrictions on the semantic correlation of image-text pairs. We also propose to combine texts collected from the web with texts generated by a pre-trained image-captioning model. To the best of our knowledge, TaiSu is currently the largest publicly accessible Chinese cross-modal dataset. Furthermore, we test our dataset on several vision-language downstream tasks. TaiSu outperforms BriVL by a large margin on the zero-shot image-text retrieval task and zero-shot image classification task. TaiSu also shows better performance than Wukong on the image-retrieval task without using image augmentation for training. Results demonstrate that TaiSu can serve as a promising VLP dataset, for both understanding and generative tasks. More information is available at https://github.com/ksOAn6g5/TaiSu.
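The "stricter restrictions on semantic correlation" amount to filtering image-text pairs by how well the two modalities agree. A generic stand-in for that filter, assuming precomputed image and text embeddings and a cosine-similarity threshold (the paper's actual criteria and threshold are not specified here), could look like:

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def clean_pairs(pairs, threshold=0.3):
    """Keep only image-text pairs whose embedding similarity clears the
    threshold; a generic stand-in for semantic-correlation filtering."""
    return [p for p in pairs
            if cosine(p["img_emb"], p["txt_emb"]) >= threshold]

v = np.array([1.0, 0.0])
pairs = [
    {"caption": "well matched", "img_emb": v, "txt_emb": v},                  # sim 1.0
    {"caption": "unrelated", "img_emb": v, "txt_emb": np.array([0.0, 1.0])},  # sim 0.0
]
kept_pairs = clean_pairs(pairs)  # only the well-matched pair survives
```

Web-crawled captions that fail the filter could then be replaced or supplemented by captions from a pre-trained image-captioning model, as the abstract describes.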

IJCAI Conference 2017 · Conference Paper

Diverse Neuron Type Selection for Convolutional Neural Networks

  • Guibo Zhu
  • Zhaoxiang Zhang
  • Xu-Yao Zhang
  • Cheng-Lin Liu

The activation function for neurons is a prominent element in the deep learning architecture for obtaining high performance. Inspired by neuroscience findings, we introduce and define two types of neurons with different activation functions for artificial neural networks: excitatory and inhibitory neurons, which can be adaptively selected by self-learning. Based on the definition of neurons, in the paper we not only unify the mainstream activation functions, but also discuss the complementariness among these types of neurons. In addition, through the cooperation of excitatory and inhibitory neurons, we present a compositional activation function that leads to new state-of-the-art performance compared to rectified linear units. Finally, we hope that our framework not only provides a basic unified view of existing activation neurons to guide future design, but also contributes neurobiological explanations that can serve as a window to bridge the gap between biology and computer science.
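One plausible reading of the excitatory/inhibitory composition is a per-neuron mixture of a positive-passing and a negative-passing unit with a learnable selection parameter; the exact functional form below is an assumption for illustration, not the paper's definition:

```python
import numpy as np

def excitatory(x):
    return np.maximum(x, 0.0)   # passes positive drive (ReLU-like)

def inhibitory(x):
    return np.minimum(x, 0.0)   # passes negative drive

def compositional(x, p):
    """Per-neuron mixture of the two neuron types; p in [0, 1] plays the
    role of a learnable, self-selected mixing parameter."""
    return p * excitatory(x) + (1.0 - p) * inhibitory(x)

x = np.array([-2.0, -0.5, 0.5, 2.0])
pure_excitatory = compositional(x, 1.0)   # recovers ReLU
balanced = compositional(x, 0.5)          # reduces to 0.5 * x, a leaky-style unit
```

Even this toy form shows how one parameterization can unify mainstream activations: p = 1 gives ReLU, intermediate p gives leaky-ReLU-like behavior, consistent with the unification claim in the abstract.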

AAAI Conference 2016 · Conference Paper

MC-HOG Correlation Tracking with Saliency Proposal

  • Guibo Zhu
  • Jinqiao Wang
  • Yi Wu
  • Xiaoyu Zhang
  • Hanqing Lu

Designing effective features and handling the model drift problem are two important aspects of online visual tracking. For feature representation, gradient and color features are most widely used, but how to effectively combine them for visual tracking is still an open problem. In this paper, we propose a rich feature descriptor, MC-HOG, by leveraging rich gradient information across multiple color channels or spaces. Then MC-HOG features are embedded into the correlation tracking framework to estimate the state of the target. For handling the model drift problem caused by occlusion or distractors, we propose saliency proposals as prior information to provide candidates and reduce background interference. In addition to saliency proposals, a ranking strategy is proposed to determine the importance of these proposals by exploiting the learnt appearance filter, historically preserved object samples and the distracting proposals. In this way, the proposed approach could effectively explore the color-gradient characteristics and alleviate the model drift problem. Extensive evaluations performed on the benchmark dataset show the superiority of the proposed method.
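The correlation tracking framework mentioned above learns a filter in closed form via ridge regression in the Fourier domain. The sketch below is a standard single-channel version of that textbook formulation, trained on random features; MC-HOG would stack many gradient/color channels and sum their responses, and the regularizer and target here are illustrative choices:

```python
import numpy as np

def train_filter(feat, target, lam=1e-4):
    """Closed-form correlation filter: ridge regression in the Fourier
    domain, mapping a feature patch to a desired Gaussian response."""
    F, Y = np.fft.fft2(feat), np.fft.fft2(target)
    return np.conj(F) * Y / (np.conj(F) * F + lam)

def respond(filt, feat):
    """Correlation response of a candidate patch; its peak estimates
    the target's translation."""
    return np.real(np.fft.ifft2(filt * np.fft.fft2(feat)))

h, w = 32, 32
yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
target = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * 2.0 ** 2))  # peak at (0, 0)

rng = np.random.default_rng(4)
feat = rng.standard_normal((h, w))
filt = train_filter(feat, target)
resp = respond(filt, feat)
peak = np.unravel_index(np.argmax(resp), resp.shape)  # sits at the target peak
```

In a tracker, the response peak on each new frame gives the motion estimate, and the saliency proposals and ranking strategy from the abstract would kick in when that peak becomes unreliable due to occlusion or distractors.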