Exploring Non-target Knowledge for Improving Ensemble Universal Adversarial Attacks

Juanjuan Weng; Zhiming Luo; Zhun Zhong; Dazhen Lin; Shaozi Li

doi:10.1609/aaai.v37i3.25377

AAAI 2023

Conference Paper AAAI Technical Track on Computer Vision III Artificial Intelligence

Abstract

An ensemble attack with average weights can increase the transferability of a universal adversarial perturbation (UAP) by training it against multiple Convolutional Neural Networks (CNNs). However, after analyzing the Pearson Correlation Coefficients (PCCs) between the ensemble logits and the individual logits of a UAP crafted by the ensemble attack, we find that one CNN plays a dominant role during optimization. Consequently, the average-weighted strategy weakens the contributions of the other CNNs and thus limits transferability to other black-box CNNs. To deal with this bias, our first attempt is to use a Kullback–Leibler (KL) divergence loss to encourage joint contributions from the different CNNs, but this alone is still insufficient. After decoupling the KL loss into a target-class part and a non-target-class part, we find that the main issue is that the non-target knowledge is significantly suppressed by the increasing logit of the target class. In this study, we therefore simply adopt a KL loss that considers only the non-target classes to address the dominant bias issue. In addition, to further boost transferability, we incorporate a min-max learning framework that self-adjusts the ensemble weight of each CNN. Experimental results validate that the non-target KL loss achieves better transferability than the original KL loss by a large margin, and that min-max training provides a mutual benefit in adversarial ensemble attacks. The source code is available at: https://github.com/WJJLL/ND-MM.
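
To make the suppression effect described above concrete, the following is a minimal NumPy sketch, not the authors' implementation (see the linked repository for that). It assumes one plausible reading of "a KL loss that only considers the non-target classes": drop the target-class logit, renormalize the remaining logits with a softmax, and compare the two resulting distributions. The function names, the direction of the KL divergence, and the toy logits are all illustrative assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) for two arrays of probability vectors."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def non_target_distribution(logits, target):
    """Softmax over every class except the target class (hypothetical helper)."""
    n, c = logits.shape
    mask = np.ones((n, c), dtype=bool)
    mask[np.arange(n), target] = False  # hide the target logit in each row
    return softmax(logits[mask].reshape(n, c - 1))


def full_kl(logits_a, logits_b):
    """KL divergence over all classes, target included."""
    return kl_divergence(softmax(logits_a), softmax(logits_b))


def non_target_kl(logits_a, logits_b, target):
    """KL divergence restricted to the non-target classes (assumed form)."""
    return kl_divergence(
        non_target_distribution(logits_a, target),
        non_target_distribution(logits_b, target),
    )


# Toy logits from two hypothetical CNNs on one perturbed image.
# Both assign a very large logit to target class 0 but disagree
# on how the remaining classes are ranked.
logits_a = np.array([[20.0, 2.0, 0.0, 0.0]])
logits_b = np.array([[20.0, 0.0, 0.0, 2.0]])
target = np.array([0])

full = full_kl(logits_a, logits_b)[0]
non_target = non_target_kl(logits_a, logits_b, target)[0]
print(f"full KL:       {full:.3e}")
print(f"non-target KL: {non_target:.3f}")
```

In this toy case the full KL is close to zero because the target class absorbs almost all of the probability mass, while the non-target KL still registers the disagreement between the two models. This illustrates only the motivation; how the loss enters the UAP optimization, and how the min-max ensemble weights are updated, is not specified in the abstract and is not modeled here.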
