Exploring Task-Level Optimal Prompts for Visual In-Context Learning

Yan Zhu; Huan Ma; Changqing Zhang

doi:10.1609/aaai.v39i10.33199


AAAI 2025

Conference Paper; AAAI Technical Track on Computer Vision IX; Artificial Intelligence

Abstract

With the development of Vision Foundation Models (VFMs) in recent years, Visual In-Context Learning (VICL) has become a better choice than modifying models in most scenarios. Unlike retraining or fine-tuning, VICL requires no changes to the model's weights or architecture; it needs only a prompt with demonstrations to teach the VFM how to solve a task. Currently, the significant computational cost of finding optimal prompts for every test sample hinders the deployment of VICL, because determining which demonstrations to use when constructing the prompt is very costly. In this paper, however, we report a counterintuitive phenomenon: most test samples actually achieve optimal performance under the same prompts, so searching for sample-level prompts costs considerable time yet yields identical prompts. We therefore propose task-level prompting to reduce the cost of prompt search at inference time, and introduce two time-saving yet effective task-level prompt search strategies. Extensive experiments show that the proposed method identifies near-optimal prompts and reaches the best VICL performance at a minimal cost that prior work has not achieved.
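The contrast between sample-level and task-level prompt selection can be illustrated with a minimal sketch. This is not the pair of search strategies proposed in the paper, which the abstract does not detail; it is a plain exhaustive search under stated assumptions. The function names `sample_level_prompt` and `task_level_prompt`, and the `score` callable (standing in for "run the VFM with this demonstration on this sample and measure quality"), are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

Demo = TypeVar("Demo")      # a candidate demonstration (prompt)
Sample = TypeVar("Sample")  # a query sample

# Hypothetical scoring interface: higher means the demonstration works
# better as a prompt for the given sample.
ScoreFn = Callable[[Demo, Sample], float]


def sample_level_prompt(candidates: Sequence[Demo], sample: Sample,
                        score: ScoreFn) -> Demo:
    """Search all candidates again for one specific sample."""
    return max(candidates, key=lambda demo: score(demo, sample))


def task_level_prompt(candidates: Sequence[Demo],
                      val_samples: Sequence[Sample],
                      score: ScoreFn) -> Demo:
    """Pick one demonstration for the whole task: the candidate with the
    highest mean score over a small validation set (an assumption of this
    sketch, not a detail taken from the paper)."""
    if not val_samples:
        raise ValueError("val_samples must not be empty")

    def mean_score(demo: Demo) -> float:
        return sum(score(demo, s) for s in val_samples) / len(val_samples)

    return max(candidates, key=mean_score)


if __name__ == "__main__":
    # Contrived toy data: demonstrations and samples are plain numbers,
    # and a demonstration scores higher the closer it is to the sample.
    def toy_score(demo: float, sample: float) -> float:
        return -abs(demo - sample)

    demos = [0.0, 5.0, 10.0]
    samples = [4.0, 5.5, 6.0]

    shared = task_level_prompt(demos, samples, toy_score)
    per_sample = [sample_level_prompt(demos, s, toy_score) for s in samples]
    print(shared, per_sample)
```

In this toy setup the per-sample search selects the same demonstration for every sample as the single task-level search does, which mirrors the phenomenon described above. The task-level search is run once and its result is reused at inference time, whereas the sample-level search repeats the full scan for each new query.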
