Arrow Research search

Author name cluster

De Cheng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers

16

AAAI Conference 2026 Conference Paper

Better Matching, Less Forgetting: A Quality-Guided Matcher for Transformer-based Incremental Object Detection

  • Qirui Wu
  • Shizhou Zhang
  • De Cheng
  • Yinghui Xing
  • Lingyan Ran
  • Dahu Shi
  • Peng Wang

Incremental Object Detection (IOD) aims to continuously learn new object classes without forgetting previously learned ones. A persistent challenge is catastrophic forgetting, primarily attributed to background shift in conventional detectors. While pseudo-labeling mitigates this in dense detectors, we identify a novel, distinct source of forgetting specific to DETR-like architectures: background foregrounding. This arises from the exhaustiveness constraint of the Hungarian matcher, which forcibly assigns every ground truth target to one prediction, even when predictions primarily cover background regions (i.e., low IoU). This erroneous supervision compels the model to misclassify background features as specific foreground classes, disrupting learned representations and accelerating forgetting. To address this, we propose a Quality-guided Min-Cost Max-Flow (Q-MCMF) matcher. To avoid forced assignments, Q-MCMF builds a flow graph and prunes implausible matches based on geometric quality. It then optimizes for the final matching that minimizes cost and maximizes valid assignments. This strategy eliminates harmful supervision from background foregrounding while maximizing foreground learning signals. Extensive experiments on the COCO dataset under various incremental settings demonstrate that our method consistently outperforms existing state-of-the-art approaches.
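The prune-then-match idea described in the abstract can be sketched in a few lines. This is an illustrative toy only: a brute-force search over permutations stands in for the paper's min-cost max-flow solver, and the function name `quality_guided_match` and the 0.3 IoU threshold are assumptions, not the authors' implementation.

```python
import itertools

def iou(a, b):
    # Boxes as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def quality_guided_match(preds, gts, iou_thresh=0.3):
    """Assign each ground truth to at most one prediction; pairs whose IoU
    falls below the threshold are pruned, so a GT may stay unmatched
    instead of being forced onto a background prediction."""
    best_pairs, best_key = [], None
    for perm in itertools.permutations(range(len(preds)), len(gts)):
        pairs = [(perm[g], g) for g in range(len(gts))
                 if iou(preds[perm[g]], gts[g]) >= iou_thresh]
        cost = sum(1.0 - iou(preds[p], gts[g]) for p, g in pairs)
        key = (-len(pairs), cost)  # maximize valid assignments, then minimize cost
        if best_key is None or key < best_key:
            best_pairs, best_key = pairs, key
    return best_pairs
```

A GT with no plausible prediction is simply left unassigned, which is exactly the behavior the exhaustive Hungarian matcher cannot provide.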

AAAI Conference 2026 Conference Paper

Harnessing Textual Semantic Priors for Knowledge Transfer and Refinement in CLIP-Driven Continual Learning

  • Lingfeng He
  • De Cheng
  • Di Xu
  • Huaijie Wang
  • Nannan Wang

Continual learning (CL) aims to equip models with the ability to learn from a stream of tasks without forgetting previous knowledge. With the progress of vision-language models like Contrastive Language-Image Pre-training (CLIP), their promise for CL has attracted increasing attention due to their strong generalizability. However, the potential of rich textual semantic priors in CLIP in addressing the stability–plasticity dilemma remains underexplored. During backbone training, most approaches transfer past knowledge without considering semantic relevance, leading to interference from unrelated tasks that disrupt the balance between stability and plasticity. Besides, while text-based classifiers provide strong generalization, they suffer from limited plasticity due to the inherent modality gap in CLIP. Visual classifiers help bridge this gap, but their prototypes lack rich and precise semantics. To address these challenges, we propose Semantic-Enriched Continual Adaptation (SECA), a unified framework that harnesses the anti-forgetting and structured nature of textual priors to guide semantic-aware knowledge transfer in the backbone and reinforce the semantic structure of the visual classifier. Specifically, a Semantic-Guided Adaptive Knowledge Transfer (SG-AKT) module is proposed to assess new images' relevance to diverse historical visual knowledge via textual cues, and aggregate relevant knowledge in an instance-adaptive manner as distillation signals. Moreover, a Semantic-Enhanced Visual Prototype Refinement (SE-VPR) module is introduced to refine visual prototypes using inter-class semantic relations captured in class-wise textual embeddings. Extensive experiments on multiple benchmarks validate the effectiveness of our approach.

ICML Conference 2025 Conference Paper

Demystifying Catastrophic Forgetting in Two-Stage Incremental Object Detector

  • Qirui Wu
  • Shizhou Zhang
  • De Cheng
  • Yinghui Xing
  • Di Xu 0010
  • Peng Wang 0015
  • Yanning Zhang 0001

Catastrophic forgetting is a critical challenge for incremental object detection (IOD). Most existing methods treat the detector monolithically, relying on instance replay or knowledge distillation without analyzing component-specific forgetting. Through dissection of Faster R-CNN, we reveal a key insight: catastrophic forgetting is predominantly localized to the RoI Head classifier, while regressors retain robustness across incremental stages. This finding challenges conventional assumptions, motivating us to develop a framework termed NSGP-RePRE. Regional Prototype Replay (RePRE) mitigates classifier forgetting via replay of two types of prototypes: coarse prototypes represent class-wise semantic centers of RoI features, while fine-grained prototypes model intra-class variations. Null Space Gradient Projection (NSGP) is further introduced to eliminate prototype-feature misalignment by updating the feature extractor in directions orthogonal to the subspace of old inputs via gradient projection, aligning RePRE with incremental learning dynamics. Our simple yet effective design allows NSGP-RePRE to achieve state-of-the-art performance on the Pascal VOC and MS COCO datasets under various settings. Our work not only advances IOD methodology but also provides pivotal insights for catastrophic forgetting mitigation in IOD. Code will be available soon.
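The null-space gradient projection at the heart of NSGP can be illustrated with a minimal linear-layer sketch. This is a toy under stated assumptions: the projector is built from an SVD of old-task input features, and the names and the rank tolerance `eps` are illustrative, not taken from the paper.

```python
import numpy as np

def null_space_projector(old_inputs, eps=1e-5):
    """Projection matrix onto the null space of the subspace spanned
    by old-task input features (the rows of `old_inputs`)."""
    _, s, vt = np.linalg.svd(old_inputs, full_matrices=True)
    rank = int((s > eps * s.max()).sum())
    null_basis = vt[rank:]             # directions unused by old inputs
    return null_basis.T @ null_basis   # P = V_null V_null^T

# A gradient step restricted to this null space leaves a linear layer's
# responses to all old inputs unchanged, since old_x @ (P @ grad) = 0.
old_x = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])    # toy features from old tasks
P = null_space_projector(old_x)
grad = np.array([0.5, -0.2, 0.7])
projected = P @ grad                   # only the third direction survives
```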

AAAI Conference 2025 Conference Paper

Dual Information Purification for Lightweight SAR Object Detection

  • Xi Yang
  • Jiachen Sun
  • Songsong Duan
  • De Cheng

Synthetic aperture radar (SAR) object detection requires accurate identification and localization of targets at various scales within SAR images. However, background clutter and speckle noise can obscure key features and mislead the knowledge distillation process. To address these challenges, we introduce the Dual Information Purification Knowledge Distillation (DIPKD) method, which improves the performance of the student model through three key strategies: denoising, enrichment, and decoupling. First, our Selective Noise Suppression (SNS) technique reduces speckle noise in global features by minimizing misleading information from the teacher model. Second, the Knowledge Level Decoupling (KLD) module separates features into target and non-target knowledge, balancing feature mapping and reducing background noise to enhance the extraction of critical information for the student model. Finally, the Reverse Information Transfer (RIT) module refines intermediate features in the student model, compensating for the loss of detailed local information. Experimental results demonstrate that DIPKD significantly outperforms existing distillation techniques in SAR object detection, achieving 60.2% and 51.4% mAP scores on the SSDD and HRSID datasets, respectively. Additionally, the student model shows performance improvements of 1.3% and 2.9% over the teacher model, highlighting the effectiveness of the information purification approach.
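The target/non-target separation performed by the KLD module can be pictured with a generic logit-decoupling sketch in the spirit of decoupled knowledge distillation; the function names and weighting below are assumptions, not the paper's actual module.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def decouple(logits, target):
    """Split a prediction into target knowledge (confidence on the true
    class) and renormalized non-target knowledge (the distribution's
    shape over the remaining classes)."""
    p = softmax(np.asarray(logits, dtype=float))
    target_know = p[target]
    non_target = np.delete(p, target)
    non_target = non_target / non_target.sum()   # renormalize to a distribution
    return target_know, non_target
```

Distilling the two parts with separate losses lets the student weight clean target evidence differently from the noisier non-target (background) signal.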

IJCAI Conference 2025 Conference Paper

Screening, Rectifying, and Re-Screening: A Unified Framework for Tuning Vision-Language Models with Noisy Labels

  • Chaowei Fang
  • Hangfei Ma
  • Zhihao Li
  • De Cheng
  • Yue Zhang
  • Guanbin Li

Pre-trained vision-language models have shown remarkable potential for downstream tasks. However, their fine-tuning under noisy labels remains an open problem due to challenges like self-confirmation bias and the limitations of conventional small-loss criteria. In this paper, we propose a unified framework to address these issues, consisting of three key steps: Screening, Rectifying, and Re-Screening. First, a dual-level semantic matching mechanism is introduced to categorize samples into clean, ambiguous, and noisy samples by leveraging both macro-level and micro-level textual prompts. Second, we design tailored pseudo-labeling strategies to rectify noisy and ambiguous labels, enabling their effective incorporation into the training process. Finally, a re-screening step, utilizing cross-validation with an auxiliary vision-language model, mitigates self-confirmation bias and enhances the robustness of the framework. Extensive experiments across ten datasets demonstrate that the proposed method significantly outperforms existing approaches for tuning vision-language pre-trained models with noisy labels.

AAAI Conference 2025 Conference Paper

Training Consistent Mixture-of-Experts-Based Prompt Generator for Continual Learning

  • Yue Lu
  • Shizhou Zhang
  • De Cheng
  • Guoqiang Liang
  • Yinghui Xing
  • Nannan Wang
  • Yanning Zhang

Visual prompt tuning-based continual learning (CL) methods have shown promising performance in exemplar-free scenarios, where their key component can be viewed as a prompt generator. Existing approaches generally rely on freezing old prompts, slow updating and task discrimination for prompt generators to preserve stability and minimize forgetting. In contrast, we introduce a novel approach that trains a consistent prompt generator to ensure stability during CL. Consistency means that for any instance from an old task, its corresponding instance-aware prompt generated by the prompt generator remains consistent even as the generator continually updates in a new task. This ensures that the representation of a specific instance remains stable across tasks and thereby prevents forgetting. We employ a mixture of experts (MoE) as the prompt generator, which contains a router and multiple experts. By deriving conditions sufficient to achieve consistency for the MoE prompt generator, we demonstrate that: during training in a new task, if the router and experts update in the directions orthogonal to the subspaces spanned by old input features and gating vectors, respectively, the consistency can be theoretically guaranteed. To implement this orthogonality, we project parameter gradients to those orthogonal directions using the orthogonal projection matrices computed via the null space method. Extensive experiments on four class-incremental learning benchmarks validate the effectiveness and superiority of our approach.

NeurIPS Conference 2024 Conference Paper

Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection

  • Ying Yang
  • De Cheng
  • Chaowei Fang
  • Yubiao Wang
  • Changzhe Jiao
  • Lechao Cheng
  • Nannan Wang
  • Xinbo Gao

Unsupervised out-of-distribution (OOD) detection aims to identify out-of-domain data by learning only from unlabeled In-Distribution (ID) training samples, which is crucial for developing a safe real-world machine learning system. Current reconstruction-based methods provide a good alternative approach by measuring the reconstruction error between the input and its corresponding generative counterpart in the pixel/feature space. However, such generative methods face a key dilemma, i.e., improving the reconstruction power of the generative model while keeping a compact representation of the ID data. To address this issue, we propose a diffusion-based layer-wise semantic reconstruction approach for unsupervised OOD detection. The innovation of our approach is that we leverage the diffusion model's intrinsic data reconstruction ability to distinguish ID samples from OOD samples in the latent feature space. Moreover, to set up a comprehensive and discriminative feature representation, we devise a multi-layer semantic feature extraction strategy. Through distorting the extracted features with Gaussian noises and applying the diffusion model for feature reconstruction, the separation of ID and OOD samples is implemented according to the reconstruction errors. Extensive experimental results on multiple benchmarks built upon various datasets demonstrate that our method achieves state-of-the-art performance in terms of detection accuracy and speed.
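The distort-then-reconstruct scoring loop described above can be sketched as follows. This is a toy under clear assumptions: `reconstruct` stands in for the frozen diffusion-based denoiser, and the "ID manifold" projection used in the demo is purely illustrative.

```python
import numpy as np

def ood_score(feature, reconstruct, n_trials=8, sigma=0.1, seed=0):
    """Average reconstruction error after Gaussian distortion: features the
    model can reconstruct well (ID) score low, others (OOD) score high."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(n_trials):
        noisy = feature + sigma * rng.standard_normal(feature.shape)
        errs.append(np.linalg.norm(reconstruct(noisy) - feature))
    return float(np.mean(errs))

# Toy "denoiser" that projects onto an ID manifold (the first two axes):
# ID features lying on that manifold are reconstructed well, OOD are not.
denoise = lambda z: np.array([z[0], z[1], 0.0])
score_id = ood_score(np.array([1.0, 1.0, 0.0]), denoise)
score_ood = ood_score(np.array([0.0, 0.0, 1.0]), denoise)
```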

NeurIPS Conference 2024 Conference Paper

Feature-Level Adversarial Attacks and Ranking Disruption for Visible-Infrared Person Re-identification

  • Xi Yang
  • Huanling Liu
  • De Cheng
  • Nannan Wang
  • Xinbo Gao

Visible-infrared person re-identification (VIReID) is widely used in fields such as video surveillance and intelligent transportation, imposing higher demands on model security. In practice, adversarial attacks on VIReID aim to disrupt the output ranking and quantify the security risks of models. Although numerous studies have emerged on adversarial attacks and defenses in fields such as face recognition, person re-identification, and pedestrian detection, there is currently a lack of research on the security of VIReID systems. To this end, we propose to explore the vulnerabilities of VIReID systems and prevent potential serious losses due to insecurity. Compared to research on single-modality ReID, adversarial feature alignment and modality differences need to be particularly emphasized. Thus, we advocate for feature-level adversarial attacks to disrupt the output rankings of VIReID systems. To obtain adversarial features, we introduce Universal Adversarial Perturbations (UAP) to simulate common disturbances in real-world environments. Additionally, we employ a Frequency-Spatial Attention Module (FSAM), integrating frequency information extraction and spatial focusing mechanisms, to further emphasize important regional features from different domains on the shared features. This ensures that adversarial features maintain consistency within the feature space. Finally, we employ an Auxiliary Quadruple Adversarial Loss to amplify the differences between modalities, thereby improving the distinction and recognition of features between visible and infrared images, which causes the system to output incorrect rankings. Extensive experiments on two VIReID benchmarks (i.e., SYSU-MM01 and RegDB) and different systems validate the effectiveness of our method.

AAAI Conference 2024 Conference Paper

Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

  • Yubin Wang
  • Xinyang Jiang
  • De Cheng
  • Dongsheng Li
  • Cairong Zhao

Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information that effectively represents the interconnections among entities or attributes linked to a particular category. To address this limitation and prioritize harnessing structured knowledge, this paper advocates for leveraging LLMs to build a graph for each description to model the entities and attributes describing the category, as well as their correlations. Preexisting prompt tuning methods exhibit inadequacies in managing this structured knowledge. Consequently, we propose a novel approach called Hierarchical Prompt Tuning (HPT), which enables simultaneous modeling of both structured and conventional linguistic knowledge. Specifically, we introduce a relationship-guided attention module to capture pair-wise associations among entities and attributes for low-level prompt learning. In addition, by incorporating high-level and global-level prompts modeling overall semantics, the proposed hierarchical structure forges cross-level interlinks and empowers the model to handle more complex and long-term relationships. Extensive experiments demonstrate that our HPT shows strong effectiveness and generalizes much better than existing SOTA methods. Our code is available at https://github.com/Vill-Lab/2024-AAAI-HPT.

IJCAI Conference 2024 Conference Paper

Multi-Granularity Graph-Convolution-Based Method for Weakly Supervised Person Search

  • Haichun Tai
  • De Cheng
  • Jie Li
  • Nannan Wang
  • Xinbo Gao

One-step Weakly Supervised Person Search (WSPS) jointly performs pedestrian detection and person Re-IDentification (ReID) only with bounding box annotations, which makes the traditional person ReID problem more suitable and efficient for real-world applications. However, this task is very challenging due to the following reasons: 1) a large feature gap between person ReID and general object detection tasks when learning shared representations; 2) difficult pseudo identity estimation for each person image with unrefined raw detections and dramatic scale changes. To address the above issues, we propose a multi-granularity graph convolution framework to jointly optimize the aligned task features, as well as to assist the pseudo label estimation. Specifically, the multi-granularity feature alignment module (MFA) in the designed two-branch framework employs cluster-level bi-directional interaction of various granularity information to narrow down the large feature gap. Further, upon the MFA module, we introduce the multi-granularity graph-convolution-based pseudo-label estimation module to enhance feature representations for distinguishing diverse identities. Extensive experimental results demonstrate the effectiveness of the proposed method, which shows superior performance to state-of-the-art methods by a large margin on the CUHK-SYSU and PRW datasets.

ICML Conference 2024 Conference Paper

Task-aware Orthogonal Sparse Network for Exploring Shared Knowledge in Continual Learning

  • Yusong Hu
  • De Cheng
  • Dingwen Zhang
  • Nannan Wang 0001
  • Tongliang Liu
  • Xinbo Gao 0001

Continual learning (CL) aims to learn from sequentially arriving tasks without catastrophic forgetting (CF). By partitioning the network into two parts based on the Lottery Ticket Hypothesis, one for holding the knowledge of the old tasks while the other for learning the knowledge of the new task, recent progress has achieved forget-free CL. Although addressing the CF issue well, such methods would encounter serious under-fitting in long-term CL, in which the learning process will continue for a long time and the number of new tasks involved will be much higher. To solve this problem, this paper partitions the network into three parts, with a new part for exploring the knowledge sharing between the old and new tasks. With the shared knowledge, this part of the network can be learnt to simultaneously consolidate the old tasks and fit to the new task. To achieve this goal, we propose a task-aware Orthogonal Sparse Network (OSN), which contains shared knowledge induced network partition and sharpness-aware orthogonal sparse network learning. The former partitions the network to select shared parameters, while the latter guides the exploration of shared knowledge through shared parameters. Qualitative and quantitative analyses show that the proposed OSN induces minimum to no interference with past tasks, i.e., approximately no forgetting, while greatly improving the model plasticity and capacity, finally achieving state-of-the-art performances.

NeurIPS Conference 2024 Conference Paper

Visual Prompt Tuning in Null Space for Continual Learning

  • Yue Lu
  • Shizhou Zhang
  • De Cheng
  • Yinghui Xing
  • Nannan Wang
  • Peng Wang
  • Yanning Zhang

Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL) by selecting and updating relevant prompts in the vision-transformer models. In contrast, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference with previously learned tasks and thereby overcome catastrophic forgetting in CL. However, different from the orthogonal projection in the traditional CNN architecture, the prompt gradient orthogonal projection in the ViT architecture presents completely different and greater challenges, i.e., 1) the high-order and non-linear self-attention operation; 2) the drift of prompt distribution brought by the LayerNorm in the transformer block. Theoretically, we deduce two consistency conditions to achieve the prompt gradient orthogonal projection, which provide a theoretical guarantee of eliminating interference on previously learned knowledge via the self-attention mechanism in visual prompt tuning. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient orthogonal projection. Extensive experimental results demonstrate the effectiveness of anti-forgetting on four class-incremental benchmarks with diverse pre-trained baseline models, and our approach achieves superior performances to state-of-the-art methods. Our code is available at https://github.com/zugexiaodui/VPTinNSforCL

AAAI Conference 2023 Conference Paper

Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding

  • De Cheng
  • Xiaolong Wang
  • Nannan Wang
  • Zhen Wang
  • Xiaoyu Wang
  • Xinbo Gao

Visible-infrared person re-identification (VI-ReID) aims to retrieve the person images of the same identity from the RGB to infrared image space, which is very important for real-world surveillance systems. In practice, VI-ReID is more challenging due to the heterogeneous modality discrepancy, which further aggravates the challenges of the traditional single-modality person ReID problem, i.e., inter-class confusion and intra-class variations. In this paper, we propose an aggregated memory-based cross-modality deep metric learning framework, which benefits from the increasing number of learned modality-aware and modality-agnostic centroid proxies for cluster contrast and mutual information learning. Furthermore, to suppress the modality discrepancy, the proposed cross-modality alignment objective simultaneously utilizes both historical and up-to-date learned cluster proxies for enhanced cross-modality association. Such a training mechanism helps to obtain hard positive references through increased diversity of learned cluster proxies, and finally achieves a stronger "pulling close" effect between cross-modality image features. Extensive experimental results demonstrate the effectiveness of the proposed method, surpassing state-of-the-art works by a large margin on the commonly used VI-ReID datasets.

NeurIPS Conference 2022 Conference Paper

Class-Dependent Label-Noise Learning with Cycle-Consistency Regularization

  • De Cheng
  • Yixiong Ning
  • Nannan Wang
  • Xinbo Gao
  • Heng Yang
  • Yuxuan Du
  • Bo Han
  • Tongliang Liu

In label-noise learning, estimating the transition matrix plays an important role in building a statistically consistent classifier. The current state-of-the-art consistent estimator for the transition matrix has been developed under the newly proposed sufficiently scattered assumption, by incorporating the minimum volume constraint of the transition matrix T into label-noise learning. Computing the volume of T heavily relies on the estimated noisy class posterior. However, the estimation error of the noisy class posterior could usually be large, as deep learning methods tend to easily overfit the noisy labels. Then, directly minimizing the volume of such an obtained T could lead the transition matrix to be poorly estimated. Therefore, how to reduce the side effects of the inaccurate noisy class posterior has become the bottleneck of such methods. In this paper, we propose to estimate the transition matrix under a forward-backward cycle-consistency regularization, which greatly reduces the dependency of the estimated transition matrix T on the noisy class posterior. We show that the cycle-consistency regularization helps to minimize the volume of the transition matrix T indirectly, without exploiting the estimated noisy class posterior, which could further encourage the estimated transition matrix T to converge to its optimal solution. Extensive experimental results consistently justify the effectiveness of the proposed method in reducing the estimation error of the transition matrix and greatly boosting the classification performance.
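The forward-backward cycle the abstract alludes to can be illustrated numerically: mapping a clean class posterior forward through T yields the noisy posterior, and inverting T maps it back. The 2x2 matrix and posteriors below are made-up toy numbers, not values from the paper.

```python
import numpy as np

# T[i, j] = P(noisy label = j | clean label = i), a toy 2-class example
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])

clean_post = np.array([0.7, 0.3])          # P(Y | x)
noisy_post = clean_post @ T                # forward: clean -> noisy posterior
recovered = noisy_post @ np.linalg.inv(T)  # backward: invert the transition
# Cycle consistency: the forward map followed by the backward map
# should recover the clean posterior when T is estimated correctly.
```

A poorly estimated T breaks this round trip, which is why regularizing the forward-backward cycle can constrain T without leaning on the (often overfit) noisy class posterior.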

IJCAI Conference 2022 Conference Paper

Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction

  • De Cheng
  • Yan Li
  • Dingwen Zhang
  • Nannan Wang
  • Xinbo Gao
  • Jiande Sun

Single image dehazing as a fundamental low-level vision task, is essential for the development of robust intelligent surveillance system. In this paper, we make an early effort to consider dehazing robustness under variational haze density, which is a realistic while under-studied problem in the research filed of singe image dehazing. To properly address this problem, we propose a novel density-variational learning framework to improve the robustness of the image dehzing model assisted by a variety of negative hazy images, to better deal with various complex hazy scenarios. Specifically, the dehazing network is optimized under the consistency-regularized framework with the proposed Contrast-Assisted Reconstruction Loss (CARL). The CARL can fully exploit the negative information to facilitate the traditional positive-orient dehazing objective function, by squeezing the dehazed image to its clean target from different directions. Meanwhile, the consistency regularization keeps consistent outputs given multi-level hazy images, thus improving the model robustness. Extensive experimental results on two synthetic and three real-world datasets demonstrate that our method significantly surpasses the state-of-the-art approaches.

IJCAI Conference 2017 Conference Paper

Discriminative Dictionary Learning With Ranking Metric Embedded for Person Re-Identification

  • De Cheng
  • Xiaojun Chang
  • Li Liu
  • Alexander G. Hauptmann
  • Yihong Gong
  • Nanning Zheng

The goal of person re-identification (Re-Id) is to match pedestrians captured from multiple non-overlapping cameras. In this paper, we propose a novel dictionary learning based method with a ranking metric embedded for person Re-Id. A new and essential ranking graph Laplacian term is introduced, which minimizes the intra-personal compactness and maximizes the inter-personal dispersion in the objective. Different from the traditional dictionary learning based approaches and their extensions, which only use binary same-or-not-same information, our proposed method can explore the ranking relationship among the person images, which is essential for such retrieval-related tasks. Simultaneously, a distance metric is explicitly learned in the model to further improve the performance. Since we have reformulated these ranking constraints into the graph Laplacian form, the proposed method is easy to implement yet effective. We conduct extensive experiments on three widely used person Re-Id benchmark datasets, and achieve state-of-the-art performances.
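The graph Laplacian formulation can be made concrete with a toy sketch. The +1/-1 edge weighting and function name below are illustrative assumptions, not the paper's exact construction; the point is only that the trace term rewards intra-class compactness and inter-class dispersion at once.

```python
import numpy as np

def ranking_laplacian(labels):
    """Toy ranking graph: edge weight +1 between same-identity samples,
    -1 between different identities; returns L = D - W."""
    n = len(labels)
    W = np.array([[1.0 if labels[i] == labels[j] else -1.0
                   for j in range(n)] for i in range(n)])
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    return D - W

# tr(X^T L X) = 0.5 * sum_ij W_ij * ||x_i - x_j||^2, so it shrinks when
# same-identity features are compact and different identities are far apart.
L = ranking_laplacian([0, 0, 1])
good = np.array([[0.0], [0.0], [10.0]])  # compact intra-class, dispersed inter-class
bad = np.array([[0.0], [10.0], [0.0]])   # spread within the same identity
```

Minimizing this trace over dictionary codes X is what embeds the ranking preference directly into the learning objective.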