Raising the Cost of Malicious AI-Powered Image Editing
Authors: Hadi Salman, Alaa Khaddaj, Guillaume Leclerc, Andrew Ilyas, Aleksander Madry
Abstract: We present an approach to mitigating the risks of malicious image editing posed by large diffusion models. The key idea is to immunize images so as to make them resistant to manipulation by these models. This immunization relies on injection of imperceptible adversarial perturbations designed to disrupt the operation of the targeted diffusion models, forcing them to generate unrealistic images. We provide two methods for crafting such perturbations, and then demonstrate their efficacy. Finally, we discuss a policy component necessary to make our approach fully effective and practical — one that involves the organizations developing diffusion models, rather than individual users, to implement (and support) the immunization process.
What, Why and How
Here is a summary of the key points from the paper:
What:
- The paper presents a method to “immunize” images against malicious editing using large diffusion models like DALL-E and Stable Diffusion.
Why:
- These diffusion models make it very easy to realistically modify images based on text prompts. This enables harmful uses like nonconsensual image edits (a sketch of such an editing pipeline follows the summary below).
- The goal is to make it more difficult to maliciously edit someone's photo by immunizing the image beforehand.
How:
- They immunize images by adding tiny, imperceptible adversarial perturbations that disrupt the diffusion models.
- Two attack methods are proposed: an encoder attack that forces a bad latent representation, and a diffusion attack that directly targets the diffusion process.
- Experiments show these attacks force unrealistic edits compared to edits of unimmunized images.
- For practical use, cooperation from diffusion model developers is needed to let users immunize images against current and future versions of the models.
In summary, the paper provides a technical approach to immunize images against unethical AI-powered editing by diffusion models. This raises the difficulty of performing such edits, mitigating potential harms. Implementation requires policy measures alongside the technical proposals.
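To make the threat concrete, prompt-guided editing of a real photo with Stable Diffusion takes only a few lines. The sketch below uses the diffusers inpainting pipeline; the checkpoint name, file names, and prompt are illustrative placeholders rather than details from the paper.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Prompt-guided editing: the masked region of the photo is regenerated to
    # match the text prompt. Checkpoint name is an assumption; any Stable
    # Diffusion inpainting checkpoint plays the same role.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("photo.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = region to edit

    edited = pipe(prompt="two men at a fancy gala", image=image, mask_image=mask).images[0]
    edited.save("edited.png")

Immunization aims to make exactly this kind of call produce an unrealistic result when the input image has been perturbed.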
Main Contributions
Here are the main contributions of this paper:
- Proposes the concept of "immunizing" images against malicious editing by diffusion models through imperceptible adversarial perturbations, making it harder to realistically modify immunized images.
- Develops two methods to craft adversarial perturbations that disrupt diffusion models: an encoder attack and a more powerful diffusion attack.
- Demonstrates experimentally that immunized images cannot be easily edited by Stable Diffusion to match text prompts, unlike the original unimmunized images.
- Quantitatively evaluates attack effectiveness using image similarity metrics between edits of immunized and non-immunized images (a simple metric sketch is given at the end of this section).
- Discusses limitations of purely technical immunization, such as lack of robustness to image transformations and the need for forward compatibility with future models.
- Proposes a complementary techno-policy approach in which diffusion model developers assist in immunization against current and future versions of their models.
- Situates the work in the context of related work on defending against generative model misuse and detecting fake images.
In summary, the core technical contribution is an immunization procedure to protect images against unethical manipulation by diffusion models. The policy component addresses practical challenges in implementing this defense.
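For the quantitative evaluation mentioned above, one simple way to compare the edit of an immunized image with the edit of its unimmunized original is a standard image similarity score. The sketch below uses PSNR and SSIM from scikit-image as stand-ins; the paper's exact metric choices may differ.

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    def edit_similarity(edit_of_original, edit_of_immunized):
        # Compare the two edited images; lower similarity suggests the
        # immunization succeeded in disrupting the edit.
        a = np.asarray(edit_of_original, dtype=np.float64) / 255.0
        b = np.asarray(edit_of_immunized, dtype=np.float64) / 255.0
        psnr = peak_signal_noise_ratio(a, b, data_range=1.0)
        ssim = structural_similarity(a, b, channel_axis=-1, data_range=1.0)
        return psnr, ssim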
Method Section
High-Level Pseudo Code
Here is the high-level pseudo code for the key algorithms proposed in the paper:
# Encoder attack
perturbation = 0
immunized_image = original_image
target_encoding = get_encoding(target_image)
for i in range(num_steps):
    encoding = get_encoding(immunized_image)
    loss = ||target_encoding - encoding||^2
    update = compute_gradient(loss, immunized_image)
    perturbation += step_size * update
    perturbation = clip(perturbation, -epsilon, epsilon)
    immunized_image = original_image - perturbation
return immunized_image
# Diffusion attack
perturbation = 0
immunized_image = original_image
target_image = gray_image
for i in range(num_steps):
    output = diffusion_model(immunized_image)
    loss = ||target_image - output||^2
    update = compute_gradient(loss, immunized_image)
    perturbation += step_size * update
    perturbation = clip(perturbation, -epsilon, epsilon)
    immunized_image = original_image - perturbation
return immunized_image
The encoder attack adds imperceptible noise so that the latent encoding of the immunized image matches a chosen target encoding. The diffusion attack instead optimizes the perturbation so that the diffusion model itself outputs a specific target image (e.g., a plain gray image) when fed the immunized image.
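For concreteness, here is a minimal PyTorch sketch of the encoder attack, assuming a Stable Diffusion-style VAE (the stabilityai/sd-vae-ft-mse checkpoint from diffusers) stands in for the targeted model's encoder. It uses a signed-gradient (PGD-style) update and is an illustration of the idea rather than the paper's exact implementation.

    import torch
    from diffusers import AutoencoderKL

    # Stable Diffusion's image encoder; the checkpoint name is an assumption and
    # any VAE compatible with the targeted model would play the same role.
    vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()
    vae.requires_grad_(False)

    def encoder_attack(x, x_target, eps=16 / 255, step_size=1 / 255, num_steps=100):
        # x, x_target: float tensors in [0, 1] of shape (1, 3, H, W).
        with torch.no_grad():
            z_target = vae.encode(2 * x_target - 1).latent_dist.mean  # VAE expects [-1, 1]
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(num_steps):
            z = vae.encode(2 * (x + delta) - 1).latent_dist.mean
            loss = torch.nn.functional.mse_loss(z, z_target)
            loss.backward()
            with torch.no_grad():
                delta -= step_size * delta.grad.sign()           # descend on the targeted loss
                delta.clamp_(-eps, eps)                          # stay within the L_inf budget
                delta.copy_(torch.clamp(x + delta, 0, 1) - x)    # keep pixels in [0, 1]
            delta.grad.zero_()
        return (x + delta).detach()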
Detailed Pseudo Code
Here is more detailed pseudo code for implementing the key algorithms in the paper:
# Encoder attack
def encoder_attack(image, target_image, encoder, epsilon, step_size, num_steps):
    # Encode the target once; the attack pulls the image's encoding toward it.
    target_encoding = encoder(target_image)
    perturbation = 0
    immunized_image = image
    for i in range(num_steps):
        image_encoding = encoder(immunized_image)
        loss = ||target_encoding - image_encoding||^2
        gradient = compute_gradient(loss, immunized_image)
        # Signed-gradient step, then project back into the L_inf ball of radius epsilon.
        perturbation += step_size * sign(gradient)
        perturbation = clip(perturbation, -epsilon, epsilon)
        immunized_image = image - perturbation
    return immunized_image
# Diffusion attack
def diffusion_attack(image, target_image, diffusion_model, epsilon, step_size, num_steps):
    perturbation = 0
    immunized_image = image
    for i in range(num_steps):
        # Run the full (differentiable) editing pipeline and push its output
        # toward the target image (e.g., a plain gray image).
        output = diffusion_model(immunized_image)
        loss = ||target_image - output||^2
        gradient = compute_gradient(loss, immunized_image)
        perturbation += step_size * sign(gradient)
        perturbation = clip(perturbation, -epsilon, epsilon)
        immunized_image = image - perturbation
    return immunized_image
Key steps are computing the loss, taking the signed gradient with respect to the image to update the perturbation, clipping the perturbation to the epsilon budget, and recomputing the immunized image. This is done iteratively to craft an effective adversarial perturbation.
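The diffusion attack can be sketched in the same PGD style. The differentiable_edit callable below is hypothetical: it stands in for running the full diffusion-based editing pipeline with gradients enabled, which is what makes this attack much more memory- and compute-hungry than the encoder attack.

    import torch

    def diffusion_attack(x, x_target, differentiable_edit,
                         eps=16 / 255, step_size=1 / 255, num_steps=50):
        # x, x_target: float tensors in [0, 1] of shape (1, 3, H, W).
        # differentiable_edit: hypothetical callable running the editing pipeline
        # on its input while keeping the computation on the autograd graph.
        delta = torch.zeros_like(x, requires_grad=True)
        for _ in range(num_steps):
            edited = differentiable_edit(x + delta)
            # Push the pipeline's output toward the target image.
            loss = torch.nn.functional.mse_loss(edited, x_target)
            loss.backward()
            with torch.no_grad():
                delta -= step_size * delta.grad.sign()           # descend on the loss
                delta.clamp_(-eps, eps)                          # L_inf budget
                delta.copy_(torch.clamp(x + delta, 0, 1) - x)    # valid pixel range
            delta.grad.zero_()
        return (x + delta).detach()

A gray target like the one in the pseudo code above can be built with torch.full_like(x, 0.5).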