Author name cluster

Kaichen Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
1 author row

Possible papers

AAAI 2026 · Conference Paper

Dynamic Deep Prompt Optimization for Defending Against Jailbreak Attacks on LLMs

  • Doniyorkhon Obidov
  • Honggang Yu
  • Xiaolong Guo
  • Kaichen Yang

Large Language Models (LLMs) demonstrate impressive capabilities across many applications but remain vulnerable to jailbreak attacks, which elicit harmful or unintended content. While model fine-tuning is an option for safety alignment, it is costly and prone to catastrophic forgetting. Prompt optimization has emerged as a promising alternative, yet existing prompt-based defenses typically rely on static modifications (e.g., fixed prefixes or suffixes) that cannot adapt to diverse and evolving attacks. We propose Dynamic Deep Prompt Optimization (DDPO), the first jailbreak defense based on deep prompt optimization. DDPO uses the target LLM's own intermediate layers as feature extractors to dynamically generate defensive embeddings via a lightweight multilayer perceptron. These tailored embeddings are then injected into a subsequent intermediate layer, enabling an input-dependent defense without modifying the LLM's weights. This design ensures high adaptability with minimal computational overhead. Experiments on a diverse set of models and attacks demonstrate that DDPO significantly outperforms static prompt optimization methods, particularly on weakly aligned models and when handling semantically ambiguous benign prompts, successfully distinguishing them from genuinely harmful requests.
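
The abstract describes the mechanism only at a high level; below is a minimal PyTorch sketch of the general idea, assuming a decoder-style LLM whose blocks take hidden states as their first input. The `DefenseMLP`, the layer indices, the mean pooling, and the additive injection are all illustrative assumptions, not the paper's exact design.

```python
# Illustrative sketch of input-dependent deep prompt injection (not DDPO's actual code).
import torch
import torch.nn as nn

class DefenseMLP(nn.Module):
    """Lightweight MLP mapping features from an early layer to a defensive embedding."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden) captured at the extraction layer.
        pooled = hidden_states.mean(dim=1)   # assumption: mean-pool over tokens
        return self.net(pooled)              # one defensive embedding per input

def install_hooks(layers, defense_mlp, extract_idx=4, inject_idx=12):
    """Capture features at `extract_idx`; add a tailored embedding before
    `inject_idx`. The frozen LLM's weights are never modified."""
    cache = {}

    def capture(_module, _inputs, output):
        # Decoder blocks often return tuples; hidden states come first.
        cache["h"] = output[0] if isinstance(output, tuple) else output

    def inject(_module, inputs):
        delta = defense_mlp(cache["h"]).unsqueeze(1)  # (batch, 1, hidden)
        return (inputs[0] + delta,) + inputs[1:]      # additive injection

    layers[extract_idx].register_forward_hook(capture)
    layers[inject_idx].register_forward_pre_hook(inject)
```

In a setup like this, only `defense_mlp` would be trained, which is consistent with the abstract's claims of minimal computational overhead and no changes to the LLM's weights.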

AAAI 2020 · Conference Paper

Beyond Digital Domain: Fooling Deep Learning Based Recognition System in Physical World

  • Kaichen Yang
  • Tzungyu Tsai
  • Honggang Yu
  • Tsung-Yi Ho
  • Yier Jin

Adversarial examples that can fool deep neural network (DNN) models in computer vision present a growing threat. Most current attack methods concentrate on image classifiers, adding noise to digital inputs; attacks on object detection models, and adversarial attacks in the physical world more broadly, are rarely addressed. A few prior works have launched physical adversarial attacks against object detection models, but each is limited in important respects. In this paper, we propose a novel physical adversarial attack targeting object detection models. Instead of simply printing images, we manufacture real metal objects that achieve the adversarial effect. In both indoor and outdoor experiments we show that our physical adversarial objects can fool widely used object detection models, including SSD, YOLO, and Faster R-CNN, across various environments. We also test our attack on a variety of commercial object detection platforms and demonstrate that it remains effective there. Considering the defense mechanisms our adversarial objects may encounter, we conduct a series of experiments evaluating how existing defense methods affect our physical attack.
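
The fabrication step cannot be shown in code, but the digital core of such attacks can. Below is a hedged PGD-style sketch that suppresses a detector's confidence scores; the interface `detector(image) -> scores` is an assumption for illustration, real SSD/YOLO/Faster R-CNN pipelines differ, and the paper's physical attack must additionally survive printing, viewpoint, and lighting changes.

```python
# Illustrative PGD-style sketch (not the paper's method): suppress detection scores.
import torch

def suppress_detections(detector, image, epsilon=8/255, alpha=1/255, steps=40):
    """Untargeted digital attack on a differentiable detector.

    Assumes `detector(image)` returns a 1-D tensor of candidate-box
    confidence scores for an image in [0, 1]."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = detector(adv).max()   # strongest remaining detection
        loss.backward()
        with torch.no_grad():
            adv = adv - alpha * adv.grad.sign()               # reduce confidence
            adv = image + (adv - image).clamp(-epsilon, epsilon)
            adv = adv.clamp(0, 1).detach()
    return adv
```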

AAAI 2020 · Conference Paper

Robust Adversarial Objects against Deep Learning Models

  • Tzungyu Tsai
  • Kaichen Yang
  • Tsung-Yi Ho
  • Yier Jin

Previous work has shown that Deep Neural Networks (DNNs), including those currently in use in many fields, are extremely vulnerable to maliciously crafted inputs known as adversarial examples. Despite extensive research on adversarial examples in many areas, adversarial 3D data, such as point clouds, remains comparatively unexplored. The study of adversarial 3D data is crucial given its impact on real-life, high-stakes scenarios, including autonomous driving. In this paper, we propose a novel adversarial attack against PointNet++, a deep neural network that performs classification and segmentation tasks using features learned directly from raw 3D points. In contrast to existing works, our attack generates not only adversarial point clouds but also robust adversarial objects that in turn yield adversarial point clouds when sampled, both in simulation and after physical construction in the real world. We also demonstrate that our objects can bypass existing defense mechanisms designed specifically against adversarial 3D data.
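
For intuition, here is a hedged sketch of the simpler, purely digital step: gradient perturbation of raw 3D point coordinates against a point-cloud classifier. The interface `model(points) -> logits` is an assumption; the paper itself goes further, producing robust objects whose freshly resampled point clouds stay adversarial after construction.

```python
# Illustrative sketch (not the paper's method): perturb raw 3D points.
import torch
import torch.nn.functional as F

def perturb_point_cloud(model, points, labels, epsilon=0.05, alpha=0.005, steps=100):
    """Untargeted attack on a point-cloud classifier such as PointNet++.

    Assumes `model(points)` maps (batch, n_points, 3) coordinates to
    class logits and `labels` holds the true class indices."""
    adv = points.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        loss.backward()
        with torch.no_grad():
            adv = adv + alpha * adv.grad.sign()                  # ascend the loss
            adv = points + (adv - points).clamp(-epsilon, epsilon)
            adv = adv.detach()
    return adv
```

Per-point perturbations like this are fragile under resampling, which is precisely the gap between adversarial point clouds and the robust adversarial objects the abstract describes.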