EAAI 2025, Journal Article
An ensemble-based transfer testing method for Large Language Models
- Yuanxin Qiao
- Yong Liu
- Xiang Chen
- Zhanqi Cui
Large Language Models (LLMs) can pose serious risks in real-world applications due to their potential for erroneous behavior, necessitating comprehensive and effective testing of LLMs. To assess the robustness of LLMs, adversarial attacks are typically conducted by constructing adversarial examples. Previous methods often require extensive queries and access to the internal information of the victim model. However, the internal information of most black-box LLMs is not accessible, rendering these testing methods infeasible. In addition, excessive queries to commercial black-box LLMs may incur substantial costs. To address these issues, this paper proposes an Ensemble-based Transfer Testing method for Large Language Models (ETTLLM). In contrast to previous adversarial testing methods for LLMs, ETTLLM queries white-box surrogates rather than the victim model, thereby significantly reducing testing costs. Moreover, it enhances the transferability and generalization of adversarial examples across diverse real-world classification tasks. Compared to the baselines, ETTLLM significantly reduces the number of queries to the victim model, requiring an average of only 1.6 queries, just 1.2% of those required by the baselines. Furthermore, the textual similarity and modification rate of the adversarial examples generated by ETTLLM differ from the baselines by no more than 1.6%, while achieving 70% of the baselines' attack success rate.
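
The abstract only sketches the idea of querying white-box surrogates instead of the black-box victim. As a rough, hypothetical illustration of that ensemble-based transfer strategy (the surrogate scoring functions, synonym table, and greedy word substitution below are illustrative placeholders, not the authors' actual algorithm), the core loop might look like this:

```python
# Hypothetical sketch of ensemble-based transfer testing (illustrative only).
# The "surrogates" are stand-in scoring functions; a real setup would use
# white-box classifiers whose logits/gradients are freely accessible.

def surrogate_a(text):          # placeholder white-box surrogate #1
    return text.count("good") / max(len(text.split()), 1)

def surrogate_b(text):          # placeholder white-box surrogate #2
    return text.count("great") / max(len(text.split()), 1)

SURROGATES = [surrogate_a, surrogate_b]
SYNONYMS = {"good": ["fine", "decent"], "great": ["okay", "fair"]}  # toy substitution table

def ensemble_score(text):
    """Average confidence over the white-box ensemble (lower = stronger attack)."""
    return sum(s(text) for s in SURROGATES) / len(SURROGATES)

def craft_adversarial(text, max_edits=3):
    """Greedily substitute words to minimize the ensemble score,
    without ever querying the black-box victim model."""
    words = text.split()
    for _ in range(max_edits):
        best = None
        for i, w in enumerate(words):
            for sub in SYNONYMS.get(w.lower(), []):
                cand = words[:i] + [sub] + words[i + 1:]
                score = ensemble_score(" ".join(cand))
                if best is None or score < best[0]:
                    best = (score, cand)
        if best is None or best[0] >= ensemble_score(" ".join(words)):
            break                # no substitution improves the attack further
        words = best[1]
    return " ".join(words)

if __name__ == "__main__":
    original = "The movie was good and the acting was great"
    adversarial = craft_adversarial(original)
    print("original:   ", original)
    print("adversarial:", adversarial)
    # Only the finished adversarial example would be sent to the victim LLM,
    # which is what keeps the query count to the black-box model so low.
```

The point of the sketch is the division of labor: all iterative search happens against the white-box ensemble, and the expensive black-box victim is queried only to verify the final adversarial example, consistent with the low average query count the paper reports.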