Author name cluster

Guiping Cao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers

2 author rows

ICML Conference 2025 Conference Paper

Open-Det: An Efficient Learning Framework for Open-Ended Detection

Guiping Cao
Tao Wang
Wenjian Huang 0001
Xiangyuan Lan
Jianguo Zhang 0001
Dongmei Jiang

Open-Ended object Detection (OED) is a novel and challenging task that detects objects and generates their category names in a free-form manner, without requiring additional vocabularies during inference. However, existing OED models, such as GenerateU, require large-scale training datasets, converge slowly, and exhibit limited performance. To address these issues, we present Open-Det, a novel and efficient framework consisting of four collaborative parts. Specifically, Open-Det accelerates training of both the bounding box and object name generation processes by reconstructing the Object Detector and the Object Name Generator. To bridge the semantic gap between the vision and language modalities, we propose a Vision-Language Aligner with V-to-L and L-to-V alignment mechanisms, which works together with the Prompts Distiller to transfer knowledge from the VLM into VL-prompts, enabling accurate object name generation by the LLM. In addition, we design a Masked Alignment Loss to eliminate contradictory supervision and introduce a Joint Loss to enhance classification, resulting in more efficient training. Compared to GenerateU, Open-Det achieves higher performance (+1.0% in APr) while using only 1.5% of the training data (0.077M vs. 5.077M), 20.8% of the training epochs (31 vs. 149), and fewer GPU resources (4 V100 vs. 16 A100). The source code is available at: https://github.com/Med-Process/Open-Det.


IJCAI Conference 2024 Conference Paper

MLP-DINO: Category Modeling and Query Graphing with Deep MLP for Object Detection

Guiping Cao
Wenjian Huang
Xiangyuan Lan
Jianguo Zhang
Dongmei Jiang
Yaowei Wang

Popular transformer-based detectors detect objects in a one-to-one manner, where both the bounding box and the category of each object are predicted by a single query, leading to box-sensitive category predictions. Additionally, initializing positional queries solely from predicted confidence scores or learnable embeddings neglects the significant spatial interrelation between different queries. This oversight leads to an imbalanced spatial distribution of queries (SDQ). In this paper, we propose a new MLP-DINO model to address these issues. First, we present a new Query-Independent Category Supervision (QICS) approach for modeling category information, decoupling it from the sensitive bounding box prediction process to improve detection performance. Second, to further improve category predictions, we introduce a deep MLP model into the transformer-based detection framework to capture long-range and short-range information simultaneously. Third, to balance the SDQ, we design a novel Graph-based Query Selection (GQS) method that distributes query points in a discrete manner by graphing the spatial information of queries, so that they cover a broader range of potential objects, significantly enhancing the hit rate of queries. Experimental results on COCO show that MLP-DINO achieves 54.6% AP with only 44M parameters under the 36-epoch setting, outperforming the original DINO by +3.7% AP with fewer parameters and FLOPs. The source code will be available at https://github.com/Med-Process/MLP-DINO.
