AAAI 2026 Conference Paper
CP-CLIP: Customized Parameter Generation for Open-vocabulary Semantic Segmentation
- Zelin Peng
- Zhengqin Xu
- Feilong Tang
- Wei Shen
Open-vocabulary semantic segmentation aims to assign pixel-level labels to images based on textual descriptions, even for categories beyond predefined closed sets. While vision-language foundation models like CLIP are widely used for this task, fine-tuning them for pixel-level prediction often compromises their generalization capabilities. To address this, we propose a novel fine-tuning strategy, CP-CLIP, which generates customized parameters for CLIP without sacrificing its generalization. Our method employs a customized parameter generator that produces the newly added parameters from random noise, conditioned on local visual features from CLIP's image encoder, which enables generalization to new images from unseen scenarios. Additionally, we introduce an orthogonal adaptation technique that constrains the update direction to be orthogonal to the pre-trained weights, thereby largely preserving the model's initial generalization ability. Extensive experiments demonstrate that CP-CLIP achieves state-of-the-art performance across multiple open-vocabulary semantic segmentation benchmarks.
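To make the orthogonal-adaptation idea concrete, here is a minimal NumPy sketch of one common way to enforce it: projecting a raw weight update onto the subspace orthogonal to the row space of the pre-trained weight matrix, so the update cannot disturb directions the pre-trained model already uses. The function and variable names (`orthogonalize_update`, `delta_w`, `w0`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def orthogonalize_update(delta_w: np.ndarray, w0: np.ndarray) -> np.ndarray:
    """Project delta_w onto the subspace orthogonal to the row space of w0.

    Illustrative sketch only: interprets "orthogonal to the pre-trained
    weights" as row-space orthogonality, one common choice.
    """
    # Orthonormal basis for the row space of w0 via QR decomposition.
    q, _ = np.linalg.qr(w0.T)                # q: (in_dim, out_dim)
    # Subtract the component of delta_w lying inside that row space.
    return delta_w - (delta_w @ q) @ q.T

rng = np.random.default_rng(0)
w0 = rng.standard_normal((4, 16))            # pre-trained weights (out, in)
delta = rng.standard_normal((4, 16))         # raw fine-tuning update
delta_perp = orthogonalize_update(delta, w0)

# Every row of the projected update is orthogonal to every row of w0.
print(np.allclose(delta_perp @ w0.T, 0.0))   # True
```

Applying the projected update `w0 + delta_perp` leaves the responses of the pre-trained directions unchanged up to scale, which is the intuition behind preserving CLIP's zero-shot generalization while still adapting it.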