Diffusion Models for Imperceptible and Transferable Adversarial Attack
Authors: Jianqi Chen, Hao Chen, Keyan Chen, Yilan Zhang, Zhengxia Zou, Zhenwei Shi
Abstract: Many existing adversarial attacks generate Lp-norm perturbations on the image RGB space. Despite some achievements in transferability and attack success rate, the crafted adversarial examples are easily perceived by human eyes. Towards visual imperceptibility, some recent works explore unrestricted attacks without Lp-norm constraints, yet they lack transferability when attacking black-box models. In this work, we propose a novel imperceptible and transferable attack by leveraging both the generative and discriminative power of diffusion models. Specifically, instead of direct manipulation in pixel space, we craft perturbations in the latent space of diffusion models. Combined with well-designed content-preserving structures, we can generate human-insensitive perturbations embedded with semantic clues. For better transferability, we further “deceive” the diffusion model, which can be viewed as an additional recognition surrogate, by distracting its attention away from the target regions. To our knowledge, our proposed method, DiffAttack, is the first that introduces diffusion models into the adversarial attack field. Extensive experiments on various model structures (including CNNs, Transformers, MLPs) and defense methods have demonstrated our superiority over other attack methods.
What, Why and How
This paper proposes a new adversarial attack method called DiffAttack that generates imperceptible and transferable adversarial examples by leveraging diffusion models.
What:
- DiffAttack is an unrestricted adversarial attack that operates in the latent space of diffusion models instead of directly manipulating pixels.
Why:
- Diffusion models can generate natural images conforming to human perception, satisfying imperceptibility.
- Pretrained diffusion models are powerful recognition models, so fooling them enhances transferability.
How:
- Invert images to latent space with DDIM inversion (a minimal sketch of this step is given after the summary below).
- Optimize the latent to fool the surrogate classifier and to distract the diffusion model's attention.
- Use self-attention maps to preserve image structure.
- Control the DDIM inversion strength to balance preserved semantics against attack space.
In summary, DiffAttack exploits the generative and discriminative power of diffusion models to craft adversarial examples with good imperceptibility and transferability. It perturbs latents rather than pixels for more semantic and natural perturbations. Specific designs are introduced to fool strong diffusion models and preserve content structure. Experiments demonstrate superiority over existing attacks.
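To make the inversion step concrete, here is a minimal sketch of deterministic DDIM inversion. It assumes access to a noise-prediction network eps_model(x, t) and a cumulative-alpha schedule alpha_bar; these names are illustrative interfaces, not the paper's API.

import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alpha_bar, num_steps):
    # Deterministic DDIM inversion: map a clean image x0 to a noisy latent.
    # eps_model(x, t) predicts noise; alpha_bar is a 1-D tensor of cumulative
    # alphas indexed by timestep (assumed interfaces, not the paper's code).
    timesteps = torch.linspace(0, len(alpha_bar) - 1, num_steps + 1).long()
    x = x0
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        a_t, a_next = alpha_bar[t], alpha_bar[t_next]
        eps = eps_model(x, t)
        # Predict the clean image implied by the current noise estimate,
        # then re-noise it to the next (larger) timestep along the DDIM path.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # the latent code that DiffAttack then optimizes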
Main Contributions
The main contributions of this paper are:
- They are the first to introduce diffusion models into the field of adversarial attacks and unveil their potential for crafting imperceptible and transferable adversarial examples.
- They propose DiffAttack, a novel unrestricted attack method that perturbs the latent space of diffusion models instead of directly manipulating pixels.
- They leverage the generative power of diffusion models to achieve imperceptibility by generating natural image perturbations.
- They utilize the discriminative power of diffusion models as strong recognition surrogates to enhance transferability by “deceiving” them.
- They design techniques such as self-attention control and inversion-strength adjustment to preserve content structure and semantics.
- They demonstrate through extensive experiments that DiffAttack outperforms existing attacks in terms of imperceptibility and transferability across various model architectures and defense methods.
In summary, the key contributions are proposing DiffAttack, the first diffusion-model-based attack, and showing how to effectively exploit diffusion models' properties to craft adversarial examples with superior imperceptibility and transferability.
Method Section
Here is a summary of the method section:
The overall framework follows recent diffusion image editing methods.
- Invert images to the latent space using DDIM inversion to obtain the latent code x_t.
- Optimize x_t to minimize an attack loss that fools the surrogate classifier; this is the basic attack framework.
- Add a loss that minimizes the variance of the cross-attention maps between x_t and the label text. This distracts the diffusion model's attention, “deceiving” it as a strong recognition surrogate and enhancing transferability.
- Preserve content structure:
  - Use a self-attention distance loss between x_t and a fixed copy x_t(fix) to retain structural information.
  - Control the inversion strength to balance preserved semantics against attack space.
  - Also optimize the unconditional embeddings to ensure good reconstruction.
The overall loss function combines the attack loss, diffusion model deception loss, and structure preservation losses.
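Schematically (a paraphrase of the description above, not the paper's exact notation or weights), the combined objective over the latent x_t can be written in LaTeX as:

\min_{x_t} \; \alpha \big( -\mathrm{CE}(f(\mathcal{D}(x_t)),\, y) \big) \;+\; \beta \, \mathrm{Var}\big(\bar{A}_{\mathrm{cross}}(x_t, C)\big) \;+\; \gamma \, \big\lVert A_{\mathrm{self}}(x_t) - A_{\mathrm{self}}(x_t^{\mathrm{fix}}) \big\rVert_2^2

where D denotes DDIM denoising back to image space, f is the surrogate classifier, Ā_cross is the averaged cross-attention map for the label text C, A_self are the self-attention maps, and α, β, γ are weighting hyperparameters.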
In summary, the key aspects are inverting images for latent edits, deceiving the diffusion model for transferability, and using self-attention plus controlling inversion strength to preserve structure and semantics. The method exploits diffusion models’ generative and discriminative properties for imperceptible and transferable attacks.
High-Level Pseudo Code
Here is the high-level pseudo code for the DiffAttack method:
# Input: clean image x, true label y
# Output: adversarial example x'

# Load pretrained diffusion model G and a surrogate classifier f
# Set label text C from y

# Invert x with DDIM inversion to get the latent code xt
xt = Inverse(x)

# Keep a fixed copy xt_fix for structure retention
xt_fix = copy(xt)

# Optimize xt
for i in iterations:
    # Denoise to get the reconstructed image and the attention maps
    x' = Denoise(xt)
    cross_maps = cross_attn(xt, C)
    self_maps, self_maps_fix = self_attn(xt), self_attn(xt_fix)

    # Losses
    L_attack = -cross_entropy(f(x'), y)              # fool the surrogate classifier
    L_transfer = variance(average(cross_maps))       # distract the diffusion model
    L_structure = L2_dist(self_maps, self_maps_fix)  # preserve content structure

    # Gradient step on the latent
    xt = xt - lr * grad(L_attack + L_transfer + L_structure, xt)

# Return the adversarial example
x' = Denoise(xt)
return x'
The key steps are:
- Invert image x to latent xt
- Define attack, transferability and structure losses
- Iteratively optimize xt w.r.t. these losses
- Reconstruct and return the adversarial example x'
This leverages the diffusion model’s generative and discriminative properties to craft imperceptible and transferable adversarial examples.
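The three loss terms in the pseudo code can be expressed as ordinary PyTorch functions. This is a minimal sketch that assumes the classifier logits, the averaged cross-attention map for the label token, and the self-attention maps of the perturbed and fixed latents have already been extracted; all names and default weights are illustrative.

import torch
import torch.nn.functional as F

def attack_loss(logits, label):
    # Fool the classifier: minimizing the negative cross-entropy pushes the
    # reconstruction away from its true label.
    return -F.cross_entropy(logits, label)

def deceive_loss(cross_attn_map):
    # "Deceive" the diffusion model: a low-variance (near-uniform) cross-attention
    # map means its attention is distracted away from the target object.
    return cross_attn_map.var()

def structure_loss(self_attn, self_attn_fix):
    # Keep the perturbed denoising trajectory structurally close to that of the
    # fixed (unperturbed) latent.
    return F.mse_loss(self_attn, self_attn_fix)

def total_loss(logits, label, cross_attn_map, self_attn, self_attn_fix,
               alpha=1.0, beta=1.0, gamma=1.0):
    # The weights alpha/beta/gamma are placeholders, not the paper's values.
    return (alpha * attack_loss(logits, label)
            + beta * deceive_loss(cross_attn_map)
            + gamma * structure_loss(self_attn, self_attn_fix))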
Detailed Pseudo Code
Here is a more detailed pseudo code to implement the DiffAttack method:
import torch
import torch.nn.functional as F
import diffusion_model as dm   # assumed wrapper around a latent diffusion model

# Hyperparameters (illustrative values)
num_iters = 1000     # attack optimization iterations
lr = 1e-2
alpha = 10           # weight of the attack loss
beta = 10000         # weight of the transferability ("deceive") loss
gamma = 100          # weight of the structure-preservation loss

# Load the pretrained diffusion model and a surrogate classifier
model = dm.load_model('stable-diffusion')
classifier = load_classifier()   # assumed pretrained surrogate classifier

# Input image, its true label, and the corresponding label text prompt
x = load_image('input.png')
y = true_label(x)
c = label_text(y)                # text prompt used for the cross-attention maps

# Invert the image; the number of inversion steps controls the inversion strength
xt = dm.ddim_invert(model, x, steps=5)
xt = xt.detach().requires_grad_(True)

# Fixed copy for the structure loss
xt_fix = xt.detach().clone()

optimizer = torch.optim.AdamW([xt], lr=lr)

for i in range(num_iters):
    optimizer.zero_grad()

    # Denoise the latent back to image space (must stay differentiable w.r.t. xt)
    x_recon = dm.ddim_sample(model, xt, steps=20)

    # Attack loss: negative cross-entropy pushes the prediction away from y
    l_attack = -F.cross_entropy(classifier(x_recon), y)

    # Transferability loss: low variance means the (averaged) cross-attention
    # map is distracted away from the target object
    attn = dm.cross_attention(model, xt, c)
    l_transfer = attn.var()

    # Structure loss: keep self-attention close to that of the fixed latent
    self_attn = dm.self_attention(model, xt)
    self_attn_fix = dm.self_attention(model, xt_fix)
    l_structure = (self_attn - self_attn_fix).norm()

    # Weighted total loss and latent update
    loss = alpha * l_attack + beta * l_transfer + gamma * l_structure
    loss.backward()
    optimizer.step()

# Generate the final adversarial example from the optimized latent
x_adv = dm.ddim_sample(model, xt, steps=20).detach()
Key aspects:
- Invert input image
- DDIM inversion (image → latent) and DDIM denoising (latent → image)
- Compute attack, transferability (cross-attn), and structure losses
- Optimize latent w.r.t. weighted losses
- Generate and return adversarial example
This pseudo code sketches an implementation built on DDIM inversion, the differentiable denoising process, and the three loss formulations.
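Since the dm module above is only a placeholder, the following self-contained toy isolates the optimization mechanics: gradients flow from image-space losses back into the latent through a differentiable decoder. The decoder, classifier, and weights here are stand-ins for illustration only, not Stable Diffusion or the paper's surrogate models.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in modules: a differentiable "decoder" (latent -> image) and a classifier.
decoder = nn.Sequential(nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8), nn.Sigmoid())
classifier = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 10))

latent = torch.randn(1, 4, 8, 8, requires_grad=True)   # plays the role of xt
latent_fix = latent.detach().clone()                   # plays the role of xt_fix
label = torch.tensor([3])

optimizer = torch.optim.AdamW([latent], lr=1e-2)

for step in range(50):
    optimizer.zero_grad()
    image = decoder(latent)                       # "denoise" the latent into an image
    logits = classifier(image)
    l_attack = -F.cross_entropy(logits, label)    # push the prediction away from label
    l_structure = F.mse_loss(latent, latent_fix)  # crude stand-in for the
                                                  # self-attention distance loss
    loss = l_attack + 100.0 * l_structure         # placeholder weights
    loss.backward()
    optimizer.step()

adv_image = decoder(latent).detach()              # toy "adversarial example"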