
AAAI 2026

TVChain: Leveraging Textual-Visual Prompt Chains for Jailbreaking Large Vision-Language Models

Conference Paper · AAAI Technical Track on Machine Learning X Artificial Intelligence

Abstract

Large Vision-Language Models (LVLMs) extend the capabilities of Large Language Models by integrating visual inputs, enabling advanced multimodal reasoning across diverse applications. However, these enhanced reasoning capabilities also introduce new security risks, particularly vulnerability to jailbreaking attacks that bypass built-in safety mechanisms to elicit harmful or unauthorized outputs. While recent efforts have explored adversarial and typographic prompts, most existing attacks suffer from three key limitations: reliance on auxiliary models, limited effectiveness in black-box scenarios, and inadequate exploitation of LVLMs' intrinsic reasoning abilities. In this work, we propose TVChain, a novel black-box jailbreaking framework that explicitly intervenes in both the visual and textual reasoning processes of LVLMs. TVChain decomposes a malicious prompt into a sequence of semantically meaningful sub-images representing the relevant objects and behaviors, thereby avoiding direct exposure of illicit content. In parallel, a carefully designed chain-of-thought (CoT) textual prompt steers the model's reasoning toward reconstructing the intended activity in a covert yet effective manner. We demonstrate that this compositional prompting strategy reduces the likelihood of triggering safety mechanisms while preserving attack efficacy. Extensive evaluations on eleven LVLMs (seven open-source and four commercial), across two benchmark datasets and against three state-of-the-art defenses, validate the effectiveness and robustness of TVChain.

Authors

Keywords

No keywords are indexed for this paper.

Context

Venue
AAAI Conference on Artificial Intelligence
Archive span
1980-2026
Indexed papers
28718
Paper id
1051577327056315476