Arrow Research search

Author name cluster

Jinbao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers
2 author rows

Possible papers

14

EAAI Journal 2026 Journal Article

A novel incremental method with dynamic learnable pruning mechanism for low-speed machinery fault diagnosis

  • Haihong Tang
  • Xiaojia Zu
  • Yuncheng Guoa
  • Xue Jiang
  • Jinbao Wang
  • Rongsheng Lin
  • Hongtao Xue
  • Huaqing Wang

Catastrophic forgetting is a non-negligible issue in low-speed bearing fault diagnosis, as previously learned features significantly affect the model's performance when it faces incremental fault information. To address this, a new lifelong learning method based on inverted transformers with a learnable pruning mechanism is proposed to enhance adaptability under multiple fault-information increments. The backbone of the diagnosis model learns global information perception and local information refinement from multi-sensor signals through the multi-head inverted attention of the inverted transformer. A new contribution, the dynamic learnable pruning mechanism, consisting of dynamic exemplar selection and a pruning mechanism, effectively helps balance the memory and learning capabilities of the model, that is, it consolidates its stability-plasticity. The former adjusts the retention and utilization of exemplars in the memory bank, preserving memory through exemplar diversity and mitigating catastrophic forgetting. The latter addresses the dilemma caused by predefined, fixed structures from the previous stage throughout the model's entire training process. The effectiveness and feasibility of the proposed method are validated on two low-speed machinery cases.
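The abstract does not specify how "dynamic exemplar selection" is implemented; a common diversity-preserving baseline for memory banks in incremental learning is herding-style selection, sketched below. All names (`select_exemplars`, the greedy mean-matching criterion) are illustrative assumptions, not the paper's method.

```python
import numpy as np

def select_exemplars(features, m):
    """Herding-style exemplar selection (a generic sketch, not the paper's
    mechanism): greedily pick m samples whose running mean best tracks the
    class mean, which tends to keep a diverse, representative memory bank."""
    class_mean = features.mean(axis=0)
    chosen, total = [], np.zeros_like(class_mean)
    remaining = list(range(len(features)))
    for k in range(1, m + 1):
        # pick the sample that brings the exemplar mean closest to the class mean
        best = min(remaining,
                   key=lambda i: np.linalg.norm(class_mean - (total + features[i]) / k))
        chosen.append(best)
        total += features[best]
        remaining.remove(best)
    return chosen

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))   # stand-in for learned fault features
idx = select_exemplars(feats, 10)   # indices to retain in the memory bank
```

A real system would rerun this after each increment so the bank adapts as new fault classes arrive.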

AAAI Conference 2026 Conference Paper

Scene Experts: Specializing in 3D Gaussian Splatting with Adaptive Decomposition

  • Xiaowen Fu
  • Yang Zhang
  • Yuhan Tang
  • Huazhong Zhang
  • Tianxing Zhao
  • Yuhang Guo
  • Yu Huang
  • Jinbao Wang

Anchor-based 3D Gaussian Splatting (GS), exemplified by Scaffold-GS, achieves remarkable storage efficiency through a hybrid explicit-implicit representation. However, its reliance on a single, monolithic network to decode anchor features imposes a severe bottleneck on model capacity, often resulting in blurred details and view-dependent artifacts in complex scenes. To break this bottleneck, we introduce the concept of Scene Experts: a strategy that decomposes the task of modeling a complex scene across a collection of specialized sub-models. To realize this paradigm, we propose MoE-GS. Our approach designs the decoder as a Sparsely-Gated Mixture of Experts (MoE), which dramatically increases the model's total capacity while maintaining comparable inference cost via sparse activation. To effectively train this high-capacity model, we propose two key innovations: (1) A progressive curriculum learning strategy that first trains all experts on a robust baseline before encouraging them to specialize on different scene components. (2) A novel opacity-aware regularization that penalizes inactive neural Gaussians, ensuring the expanded capacity is efficiently used. Extensive experiments demonstrate that MoE-GS substantially outperforms state-of-the-art methods on diverse benchmarks, significantly improving reconstruction fidelity while requiring a smaller or comparable Gaussian model size.
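The sparsely-gated routing the abstract describes can be sketched generically: route each input to its top-k experts and mix their outputs by renormalized gate weights, so only k of the experts run per input. This is a minimal textbook MoE forward pass, not MoE-GS itself; the expert networks here are toy linear maps.

```python
import numpy as np

def moe_decode(x, experts, gate_w, k=2):
    """Sparsely-gated MoE forward pass: select the top-k experts by gate
    logit, softmax over only those k gates, and mix the expert outputs."""
    logits = x @ gate_w                    # one gate logit per expert
    top = np.argsort(logits)[-k:]          # indices of the k largest gates
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                           # renormalize over selected experts
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

rng = np.random.default_rng(1)
n_experts, dim = 4, 8
gate_w = rng.normal(size=(dim, n_experts))
# toy "experts": fixed linear maps standing in for per-expert decoder MLPs
mats = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in mats]
y = moe_decode(rng.normal(size=dim), experts, gate_w, k=2)
```

Because only k experts are evaluated, total capacity grows with the expert count while per-input compute stays roughly constant, which is the efficiency argument the abstract relies on.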

AAAI Conference 2025 Conference Paper

DCSF-KD: Dynamic Channel-wise Spatial Feature Knowledge Distillation for Object Detection

  • Tao Dai
  • Yang Lin
  • Hang Guo
  • Jinbao Wang
  • Zexuan Zhu

Knowledge distillation (KD) has recently gained great success in the field of object detection. By transferring the knowledge of the spatial or channel domain from the teacher model to the student model, it allows for a more compact representation with minimal performance loss. Despite this progress, existing KD methods typically treat knowledge from spatial or channel domains independently, ignoring the exploitation of the mutual relationship between these domains. In this work, we first explore the connection between spatial and channel domains and find there exists a strong correlation between them, i.e., the salient channels tend to contain significant object regions in the spatial domain. Motivated by this observation, we propose DCSF-KD, a novel Dynamic Channel-wise Spatial Feature Knowledge Distillation framework for object detection by fully exploiting both spatial and channel knowledge. Specifically, we introduce channel-wise spatial feature distillation and global channel attention distillation, using information from both domains to improve the accuracy of the student network. Experiments demonstrate that our DCSF-KD outperforms existing detection methods on both homogeneous and heterogeneous teacher-student network pairs. For example, when using the MaskRCNN-Swin detector as the teacher, and based on RetinaNet and FCOS with ResNet-50 on MS COCO, our DCSF-KD can achieve 41.9% and 44.1% mAP, respectively.
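The core observation, that salient channels mark significant spatial regions, suggests weighting each channel's spatial distillation loss by the teacher's channel saliency. The sketch below is a toy illustration of that coupling under assumed names (`dcsf_like_loss`), not the paper's exact formulation.

```python
import numpy as np

def dcsf_like_loss(f_t, f_s):
    """Toy channel-weighted spatial feature distillation: weight each
    channel's spatial MSE by the teacher's channel saliency (softmax over
    mean |activation| per channel). Illustrative, not the paper's loss."""
    sal = np.abs(f_t).mean(axis=(1, 2))          # per-channel saliency (C,)
    w = np.exp(sal - sal.max())
    w /= w.sum()
    per_channel_mse = ((f_t - f_s) ** 2).mean(axis=(1, 2))
    return float((w * per_channel_mse).sum())

rng = np.random.default_rng(2)
f_teacher = rng.normal(size=(16, 8, 8))          # (C, H, W) teacher feature map
loss_self = dcsf_like_loss(f_teacher, f_teacher)          # perfect student
loss_other = dcsf_like_loss(f_teacher, rng.normal(size=(16, 8, 8)))
```

A student matching the teacher exactly incurs zero loss, while mismatches in salient channels are penalized more than mismatches in quiet ones.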

EAAI Journal 2025 Journal Article

Enhancing anomaly detection with few-shot fine-tuned long text-to-image models

  • Jiachen Liu
  • Jiajia An
  • Junbin Lu
  • Zhuoqin Yang
  • Jinbao Wang
  • Ping Lu
  • Yuying Wang
  • Linlin Shen

Industrial anomaly detection plays a crucial role in the industrial manufacturing field. Currently, utilizing generated data to improve the performance of the anomaly detection model is an effective approach. However, most existing methods often rely on mask-guided synthesis, where the distribution of the generated defects is limited by masks that are typically random or learned by a model. In addition, the scarcity of real anomalous samples makes it difficult for generative models to capture genuine defect patterns and align with the real anomaly distribution. To tackle these issues, we propose DefectGen, the first long-text-guided few-shot text-to-image data generation pipeline for industrial anomaly detection. To improve distribution alignment under limited anomaly samples, DefectGen incorporates a Prompt Generation and Variation Module, which uses MLLMs (Multimodal Large Language Models) to expand few-shot image–text pairs into diverse and semantically rich prompts, and DoKr (Weight-Decomposed Low-Rank Adaptation with Kronecker product), a lightweight fine-tuning strategy with structured low-rank adaptation. To ensure the quality of synthetic data, DefectGen further introduces the Real-Guided Clustering Filter, which selects high-quality generated samples by comparing their features with those of real anomalies. Experiments on the MVTec AD (MVTec Anomaly Detection) dataset show that DefectGen generates more diverse and realistic synthetic anomalies and achieves a 5.58% average improvement in anomaly classification accuracy compared to state-of-the-art methods. Code and data are available at: https://anonymous.4open.science/r/DefectGen-CD04/.

NeurIPS Conference 2025 Conference Paper

FAST: Foreground‑aware Diffusion with Accelerated Sampling Trajectory for Segmentation‑oriented Anomaly Synthesis

  • Xichen Xu
  • Yanshu Wang
  • Jinbao Wang
  • XiaoNing Lei
  • Guoyang Xie
  • Guannan Jiang
  • Zhichao Lu

Industrial anomaly segmentation relies heavily on pixel-level annotations, yet real-world anomalies are often scarce, diverse, and costly to label. Segmentation-oriented industrial anomaly synthesis (SIAS) has emerged as a promising alternative; however, existing methods struggle to balance sampling efficiency and generation quality. Moreover, most approaches treat all spatial regions uniformly, overlooking the distinct statistical differences between anomaly and background areas. This uniform treatment hinders the synthesis of controllable, structure-specific anomalies tailored for segmentation tasks. In this paper, we propose FAST, a foreground-aware diffusion framework featuring two novel modules: the Anomaly-Informed Accelerated Sampling (AIAS) and the Foreground-Aware Reconstruction Module (FARM). AIAS is a training-free sampling algorithm specifically designed for segmentation-oriented industrial anomaly synthesis, which accelerates the reverse process through coarse-to-fine aggregation and enables the synthesis of state-of-the-art segmentation-oriented anomalies in as few as 10 steps. Meanwhile, FARM adaptively adjusts the anomaly-aware noise within the masked foreground regions at each sampling step, preserving localized anomaly signals throughout the denoising trajectory. Extensive experiments on multiple industrial benchmarks demonstrate that FAST consistently outperforms existing anomaly synthesis methods in downstream segmentation tasks. We release the code at https://github.com/Chhro123/fast-foreground-aware-anomaly-synthesis.

AAAI Conference 2025 Conference Paper

Learning with Open-world Noisy Data via Class-independent Margin in Dual Representation Space

  • Linchao Pan
  • Can Gao
  • Jie Zhou
  • Jinbao Wang

Learning with Noisy Labels (LNL) aims to improve the model generalization when facing data with noisy labels, and existing methods generally assume that noisy labels come from known classes, called closed-set noise. However, in real-world scenarios, noisy labels from similar unknown classes, i.e., open-set noise, may occur during the training and inference stage. Such open-world noisy labels may significantly impact the performance of LNL methods. In this study, we propose a novel dual-space joint learning method to robustly handle the open-world noise. To mitigate model overfitting on closed-set and open-set noises, a dual representation space is constructed by two networks. One is a projection network that learns shared representations in the prototype space, while the other is a One-Vs-All (OVA) network that makes predictions using unique semantic representations in the class-independent space. Then, bi-level contrastive learning and consistency regularization are introduced in two spaces to enhance the detection capability for data with unknown classes. To benefit from the memorization effects across different types of samples, class-independent margin criteria are designed for sample identification, which selects clean samples, weights closed-set noise, and filters open-set noise effectively. Extensive experiments demonstrate that our method outperforms the state-of-the-art methods and achieves an average accuracy improvement of 4.55% and an AUROC improvement of 6.17% on CIFAR80N.
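The One-Vs-All idea the abstract leans on can be illustrated with a minimal inference sketch: each class holds an independent binary score, and a sample no class accepts is flagged as open-set. The threshold and function names are illustrative assumptions, not the paper's criteria.

```python
import numpy as np

def ova_predict(scores, threshold=0.5):
    """One-Vs-All inference sketch: apply a per-class sigmoid (not a softmax,
    so classes don't compete); if even the best class rejects the sample,
    flag it as open-set / unknown (-1)."""
    probs = 1.0 / (1.0 + np.exp(-scores))
    best = int(np.argmax(probs))
    if probs[best] < threshold:
        return -1          # no class accepts -> likely open-set noise
    return best

known = ova_predict(np.array([-2.0, 3.0, -1.0]))     # class 1 accepts
unknown = ova_predict(np.array([-2.0, -3.0, -1.0]))  # nobody accepts
```

Because each sigmoid is independent, "all classes say no" is an expressible outcome, which a single softmax over known classes cannot represent.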

AAAI Conference 2025 Conference Paper

Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection

  • Hanzhe Liang
  • Guoyang Xie
  • Chengbin Hou
  • Bingshu Wang
  • Can Gao
  • Jinbao Wang

3D anomaly detection has recently become a significant focus in computer vision. Several advanced methods have achieved satisfying anomaly detection performance. However, they typically concentrate on the external structure of 3D samples and struggle to leverage the internal information embedded within samples. Inspired by the basic intuition of "why not look inside for more", we find this simple idea to be straightforward and effective. As a result, we introduce a newly designed method named Internal Spatial Modality Perception (ISMP) to fully explore feature representations from internal views. Specifically, our proposed ISMP consists of a critical perception module, the Spatial Insight Engine (SIE), which abstracts complex internal information of point clouds into essential global features. Besides, to better align structural information with point data, we propose an enhanced key-point feature extraction method for amplifying spatial structure feature representation. Simultaneously, a novel feature filtering module is incorporated to reduce noise and redundant features for more precise spatial structure alignment. Extensive experiments validate the efficiency of our proposed method, achieving object-level and pixel-level AUROC improvements of 4.2% and 13.1%, respectively, on the Real3D-AD benchmark. Note that the strong generalization ability of SIE has been theoretically proven and verified in both classification and segmentation tasks. Our code will be released upon acceptance.

IJCAI Conference 2025 Conference Paper

MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection

  • Jiayi Cheng
  • Can Gao
  • Jie Zhou
  • Jiajun Wen
  • Tao Dai
  • Jinbao Wang

3D Anomaly Detection (AD) is a promising means of controlling the quality of manufactured products. However, existing methods typically require carefully training a task-specific model for each category independently, leading to high cost, low efficiency, and weak generalization. This study presents a novel unified model for Multi-Category 3D Anomaly Detection (MC3D-AD) that aims to utilize both local and global geometry-aware information to reconstruct normal representations of all categories. First, to learn robust and generalized features of different categories, we propose an adaptive geometry-aware masked attention module that extracts geometry variation information to guide mask attention. Then, we introduce a local geometry-aware encoder reinforced by the improved mask attention to encode group-level feature tokens. Finally, we design a global query decoder that utilizes point cloud position embeddings to improve the decoding process and reconstruction ability. This leads to local and global geometry-aware reconstructed feature tokens for the 3D AD task. MC3D-AD is evaluated on two publicly available datasets, Real3D-AD and Anomaly-ShapeNet, and exhibits significant superiority over current state-of-the-art single-category methods, achieving 3.1% and 9.3% improvement in object-level AUROC over Real3D-AD and Anomaly-ShapeNet, respectively. The code is available at https://github.com/iCAN-SZU/MC3D-AD.

IJCAI Conference 2024 Conference Paper

FreqFormer: Frequency-aware Transformer for Lightweight Image Super-resolution

  • Tao Dai
  • Jianping Wang
  • Hang Guo
  • Jinmin Li
  • Jinbao Wang
  • Zexuan Zhu

Transformer-based models have been widely and successfully used in various low-level vision tasks, and have achieved remarkable performance in single image super-resolution (SR). Despite the significant progress in SR, Transformer-based SR methods (e.g., SwinIR) still suffer from heavy computation cost and a low-frequency preference, ignoring the reconstruction of rich high-frequency information and hence hindering the representational power of Transformers. To address these issues, in this paper, we propose a novel Frequency-aware Transformer (FreqFormer) for lightweight image SR. Specifically, a Frequency Division Module (FDM) is first introduced to separately handle high- and low-frequency information in a divide-and-conquer manner. Moreover, we present a Frequency-aware Transformer Block (FTB) that extracts both spatial frequency attention and channel transposed attention to recover high-frequency details. Extensive experimental results on public datasets demonstrate the superiority of our FreqFormer over state-of-the-art SR methods in terms of both quantitative metrics and visual quality. Code and models are available at https://github.com/JPWang-CS/FreqFormer.
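The divide-and-conquer split between low- and high-frequency information can be approximated very simply: take a blurred (average-pooled) copy as the low-frequency branch and the residual as the high-frequency branch. This is a generic approximation for illustration, not the FDM's actual operator.

```python
import numpy as np

def frequency_split(img, k=2):
    """Crude frequency division: k x k average pooling gives the low-frequency
    branch; the residual against the upsampled blur is the high-frequency
    branch. The two branches sum back to the original image."""
    h, w = img.shape
    pooled = img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    low = np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)  # nearest upsample
    high = img - low
    return low, high

rng = np.random.default_rng(3)
img = rng.normal(size=(8, 8))
low, high = frequency_split(img)
recon_err = float(np.abs((low + high) - img).max())
```

Processing `high` with a dedicated branch is what lets a frequency-aware model spend capacity on edges and textures that a plain attention stack tends to smooth away.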

NeurIPS Conference 2024 Conference Paper

HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion

  • Yu Zeng
  • Yang Zhang
  • Jiachen Liu
  • Linlin Shen
  • Kaijun Deng
  • Weizhao He
  • Jinbao Wang

Hair editing is a critical image synthesis task that aims to edit hair color and hairstyle using text descriptions or reference images, while preserving irrelevant attributes (e.g., identity, background, clothing). Many existing methods are based on StyleGAN to address this task. However, due to the limited spatial distribution of StyleGAN, it struggles with multiple hair color editing and facial preservation. Considering the advancements in diffusion models, we utilize Latent Diffusion Models (LDMs) for hairstyle editing. Our approach introduces Multi-stage Hairstyle Blend (MHB), effectively separating control of hair color and hairstyle in diffusion latent space. Additionally, we train a warping module to align the hair color with the target region. To further enhance multi-color hairstyle editing, we fine-tuned a CLIP model using a multi-color hairstyle dataset. Our method not only tackles the complexity of multi-color hairstyles but also addresses the challenge of preserving original colors during diffusion editing. Extensive experiments showcase the superiority of our method in editing multi-color hairstyles while preserving facial attributes given textual descriptions and reference images.

AAAI Conference 2024 Conference Paper

Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt

  • Jiaqi Liu
  • Kai Wu
  • Qiang Nie
  • Ying Chen
  • Bin-Bin Gao
  • Yong Liu
  • Jinbao Wang
  • Chengjie Wang

Unsupervised Anomaly Detection (UAD) with incremental training is crucial in industrial manufacturing, as unpredictable defects make obtaining sufficient labeled data infeasible. However, continual learning methods primarily rely on supervised annotations, while the application in UAD is limited due to the absence of supervision. Current UAD methods train separate models for different classes sequentially, leading to catastrophic forgetting and a heavy computational burden. To address this issue, we introduce a novel Unsupervised Continual Anomaly Detection framework called UCAD, which equips the UAD with continual learning capability through contrastively-learned prompts. In the proposed UCAD, we design a Continual Prompting Module (CPM) by utilizing a concise key-prompt-knowledge memory bank to guide task-invariant 'anomaly' model predictions using task-specific 'normal' knowledge. Moreover, Structure-based Contrastive Learning (SCL) is designed with the Segment Anything Model (SAM) to improve prompt learning and anomaly segmentation results. Specifically, by treating SAM's masks as structure, we draw features within the same mask closer and push others apart for general feature representations. We conduct comprehensive experiments and set the benchmark on unsupervised continual anomaly detection and segmentation, demonstrating that our method is significantly better than anomaly detection methods, even with rehearsal training. The code will be available at https://github.com/shirowalker/UCAD.
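The key-prompt-knowledge memory bank described for UCAD can be sketched as a nearest-key lookup: store one key vector per task, then at test time retrieve the prompt whose key is most similar to the query feature. The class and method names below are illustrative assumptions; UCAD's actual bank also stores task knowledge and is learned contrastively.

```python
import numpy as np

class KeyPromptBank:
    """Toy key-prompt memory: one key vector per task; retrieval picks the
    prompt whose (unit-normalized) key has highest cosine similarity to the
    query feature. Illustrative only."""
    def __init__(self):
        self.keys, self.prompts = [], []

    def add(self, key, prompt):
        self.keys.append(key / np.linalg.norm(key))
        self.prompts.append(prompt)

    def retrieve(self, query):
        q = query / np.linalg.norm(query)
        sims = np.array([k @ q for k in self.keys])   # cosine similarities
        return self.prompts[int(np.argmax(sims))]

bank = KeyPromptBank()
bank.add(np.array([1.0, 0.0, 0.0]), "task-A prompt")
bank.add(np.array([0.0, 1.0, 0.0]), "task-B prompt")
hit = bank.retrieve(np.array([0.9, 0.1, 0.0]))
```

Because task identity is inferred from the query itself, the model needs no task label at inference, which is what makes the continual setting workable without supervision.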

ICLR Conference 2023 Conference Paper

Pushing the Limits of Fewshot Anomaly Detection in Industry Vision: Graphcore

  • Guoyang Xie
  • Jinbao Wang
  • Jiaqi Liu 0004
  • Yaochu Jin
  • Feng Zheng 0001

In the area of few-shot anomaly detection (FSAD), efficient visual feature plays an essential role in the memory bank $\mathcal{M}$-based methods. However, these methods do not account for the relationship between the visual feature and its rotated visual feature, drastically limiting the anomaly detection performance. To push the limits, we reveal that rotation-invariant feature property has a significant impact on industrial-based FSAD. Specifically, we utilize graph representation in FSAD and provide a novel visual isometric invariant feature (VIIF) as an anomaly measurement feature. As a result, VIIF can robustly improve the anomaly discriminating ability and can further reduce the size of redundant features stored in $\mathcal{M}$ by a large amount. Besides, we provide a novel model GraphCore via VIIFs that can fast implement unsupervised FSAD training and improve the performance of anomaly detection. A comprehensive evaluation is provided for comparing GraphCore and other SOTA anomaly detection models under our proposed few-shot anomaly detection setting, which shows GraphCore can increase average AUC by 5.8%, 4.1%, 3.4%, and 1.6% on MVTec AD and by 25.5%, 22.0%, 16.9%, and 14.1% on MPDD for 1, 2, 4, and 8-shot cases, respectively.
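Rotation invariance, the property the abstract highlights, can be demonstrated with a much cruder construction than the graph-based VIIF: averaging a descriptor over the four 90-degree rotations of a patch yields a feature that is identical for the patch and its rotations. This sketch only illustrates the invariance property, not GraphCore's feature.

```python
import numpy as np

def rotation_invariant_feature(patch):
    """Crude rotation-invariant descriptor: average the flattened patch over
    its four 90-degree rotations. Rotating the input permutes the same set
    of terms, so the mean is unchanged."""
    rots = [np.rot90(patch, r) for r in range(4)]
    return np.mean([p.flatten() for p in rots], axis=0)

rng = np.random.default_rng(4)
p = rng.normal(size=(4, 4))
f0 = rotation_invariant_feature(p)
f1 = rotation_invariant_feature(np.rot90(p))   # rotated copy of the patch
diff = float(np.abs(f0 - f1).max())
```

In a memory-bank method, invariant features like this collapse rotated duplicates into one entry, which is how invariance can shrink the stored coreset.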

NeurIPS Conference 2023 Conference Paper

Real3D-AD: A Dataset of Point Cloud Anomaly Detection

  • Jiaqi Liu
  • Guoyang Xie
  • Ruitao Chen
  • Xinpeng Li
  • Jinbao Wang
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng

High-precision point cloud anomaly detection is the gold standard for identifying the defects of advancing machining and precision manufacturing. Despite some methodological advances in this area, the scarcity of datasets and the lack of a systematic benchmark hinder its development. We introduce Real3D-AD, a challenging high-precision point cloud anomaly detection dataset, addressing the limitations in the field. With 1,254 high-resolution 3D items (from forty thousand to millions of points for each item), Real3D-AD is the largest dataset for high-precision 3D industrial anomaly detection to date. Real3D-AD surpasses existing 3D anomaly detection datasets available in terms of point cloud resolution (0.0010 mm–0.0015 mm), $360^{\circ}$ coverage, and perfect prototype. Additionally, we present a comprehensive benchmark for Real3D-AD, revealing the absence of baseline methods for high-precision point cloud anomaly detection. To address this, we propose Reg3D-AD, a registration-based 3D anomaly detection method incorporating a novel feature memory bank that preserves local and global representations. Extensive experiments on the Real3D-AD dataset highlight the effectiveness of Reg3D-AD. For reproducibility and accessibility, we provide the Real3D-AD dataset, benchmark source code, and Reg3D-AD on our website: https://github.com/M-3LAB/Real3D-AD.

NeurIPS Conference 2022 Conference Paper

SoftPatch: Unsupervised Anomaly Detection with Noisy Data

  • Xi Jiang
  • Jianlin Liu
  • Jinbao Wang
  • Qiang Nie
  • Kai Wu
  • Yong Liu
  • Chengjie Wang
  • Feng Zheng

Although mainstream unsupervised anomaly detection (AD) algorithms perform well on academic datasets, their performance is limited in practical application due to the ideal experimental setting of clean training data. Training with noisy data is an inevitable problem in real-world anomaly detection but is seldom discussed. This paper considers label-level noise in image sensory anomaly detection for the first time. To solve this problem, we propose a memory-based unsupervised AD method, SoftPatch, which efficiently denoises the data at the patch level. Noise discriminators are utilized to generate outlier scores for patch-level noise elimination before coreset construction. The scores are then stored in the memory bank to soften the anomaly detection boundary. Compared with existing methods, SoftPatch maintains a strong modeling ability of normal data and alleviates the overconfidence problem in coreset. Comprehensive experiments in various noise scenes demonstrate that SoftPatch outperforms the state-of-the-art AD methods on the MVTecAD and BTAD benchmarks and is comparable to those methods under the setting without noise.
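The patch-level outlier scoring step before coreset construction can be sketched with a generic k-nearest-neighbour distance score: a patch far from its neighbours in feature space is likely noise and can be down-weighted or dropped. This is a minimal stand-in for SoftPatch's noise discriminators, with illustrative names.

```python
import numpy as np

def patch_outlier_scores(patches, k=3):
    """Toy patch-level noise scoring: a patch's score is its mean distance
    to its k nearest neighbour patches; a high score suggests an outlier
    that should not enter (or should be softened in) the coreset."""
    d = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k]
    return knn.mean(axis=1)

rng = np.random.default_rng(5)
normal = rng.normal(size=(30, 4))          # dense cluster of "normal" patches
outlier = np.full((1, 4), 10.0)            # one far-away "noisy" patch
scores = patch_outlier_scores(np.vstack([normal, outlier]))
noisiest = int(np.argmax(scores))          # index of the most suspicious patch
```

Keeping the scores (rather than hard-dropping patches) and using them to soften the decision boundary is the "soft" part of the approach the abstract describes.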