Unified Concept Editing in Diffusion Models

Authors: Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, David Bau

Abstract: Text-to-image models suffer from various safety issues that may limit their suitability for deployment. Previous methods have separately addressed individual issues of bias, copyright, and offensive content in text-to-image models. However, in the real world, all of these issues appear simultaneously in the same model. We present a method that tackles all issues with a single approach. Our method, Unified Concept Editing (UCE), edits the model without training using a closed-form solution, and scales seamlessly to concurrent edits on text-conditional diffusion models. We demonstrate scalable simultaneous debiasing, style erasure, and content moderation by editing text-to-image projections, and we present extensive experiments demonstrating improved efficacy and scalability over prior work. Our code is available at unified.baulab.info

What, Why and How

Here is a summary of the key points from this paper:

What: The paper introduces a new method called Unified Concept Editing (UCE) to modify text-to-image diffusion models to address issues like bias, copyright, and offensive content.

Why: Existing methods tackle these issues separately, but in real-world deployments they appear together in the same model. UCE provides a unified approach that handles all of these problems concurrently through closed-form editing of model parameters.

How:

  • UCE modifies the text-image alignment in diffusion models by updating the cross-attention key/value projection weights with a closed-form solution, avoiding expensive retraining (see the locating sketch after this section's summary).

  • It can debias by distributing attributes evenly across a concept, erase concepts by misaligning their text tokens, and moderate content by attenuating extreme attributes.

  • The closed-form update allows editing hundreds of concepts simultaneously with less interference compared to prior editing methods.

  • Experiments demonstrate UCE’s improved performance on debiasing, erasing styles/objects, and moderating nudity over previous state-of-the-art techniques.

In summary, UCE enables scalable editing of diffusion models to address multiple safety issues together through an efficient closed-form solution, outperforming prior specialized methods. It provides a unified approach for developing socially responsible text-to-image models.
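
As a concrete sketch of where these edits land, the snippet below locates the cross-attention key/value projection matrices in a Stable Diffusion UNet. It assumes the Hugging Face diffusers library and the runwayml/stable-diffusion-v1-5 checkpoint, and is an illustration rather than the paper's code.

# Locate the weights UCE edits (assumes diffusers; illustrative only)
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Cross-attention layers in diffusers UNets are named "attn2"; their
# to_k/to_v linear projections map text embeddings into the image pathway.
cross_attn_weights = [
    module.weight
    for name, module in pipe.unet.named_modules()
    if "attn2" in name and name.endswith(("to_k", "to_v"))
]
print(f"{len(cross_attn_weights)} projection matrices available to edit")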

Main Contributions

The main contributions of this paper are:

  • Proposes a new method called Unified Concept Editing (UCE) that can modify diffusion models to address multiple issues like bias, copyright, and offensive content concurrently.

  • Introduces a generalized closed-form solution to edit the cross-attention weights that aligns text tokens to different target vectors based on the editing goal.

  • Enables debiasing by distributing attributes evenly, erasing concepts by misalignment, and moderating by reducing extreme attributes.

  • Scales to editing hundreds of concepts in one pass without expensive retraining.

  • Demonstrates superior performance compared to prior state-of-the-art methods on tasks of debiasing, style/object erasure, and nudity moderation.

  • Provides extensive analysis and experiments on real-world use cases like occupational gender/racial bias, artistic style removal, and nudity moderation.

  • Introduces a new iterative debiasing approach that tunes per-concept constants to mitigate interference between edited concepts.

  • Proposes a preservation term to minimize interference and retain capabilities for non-edited concepts.

  • Unifies and generalizes prior model editing techniques, such as TIME (for diffusion models) and MEMIT (for language models), under one framework.

In summary, the key contributions are introducing the UCE method for unified editing of diffusion models, the closed-form update solution, demonstrations of its improved performance on multiple tasks, and analyses of its capabilities and limitations. The unified approach enables developing socially responsible text-to-image models.

Method Section

Here is a summary of the method section from the paper:

  • The paper introduces a general editing methodology applicable to any linear projection layer in a model.

  • Given a pretrained layer, the goal is to edit a set of concepts E while preserving another set P.

  • An objective function is constructed that drives the layer's outputs toward the target outputs v* for concepts in E, while keeping the outputs for concepts in P at their original values.

  • A closed-form solution for the updated weights W is derived from the embeddings of the concepts in E and P (written out after this list).

  • The method can perform different types of editing by changing how the target outputs v* are defined:

  • For erasing a concept, v* is set to align with a different concept embedding. This misaligns the concept token.

  • For debiasing across attributes, v* is set to a weighted combination of the concept and attribute embeddings. The weights control the attribute ratios.

  • For moderating a concept, v* aligns with an unconditional embedding to reduce extreme attributes.

  • The closed-form update allows editing the cross-attention weights in diffusion models to change text-image alignments.

  • Multiple concepts can be edited simultaneously by including their embeddings in the objective function.

  • The preservation terms minimize interference and retain capabilities for non-edited concepts.
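
Written out, minimizing the combined edit-and-preserve objective above gives the closed-form update for the new weights:

$$
W_{\text{new}} = \Big( \sum_{c_i \in E} v_i^{*} c_i^{\top} + \sum_{c_j \in P} W_{\text{old}}\, c_j c_j^{\top} \Big)
\Big( \sum_{c_i \in E} c_i c_i^{\top} + \sum_{c_j \in P} c_j c_j^{\top} \Big)^{-1}
$$

Each edited concept contributes the outer product of its target output with its embedding, each preserved concept contributes that of its current output, and the shared inverse covariance term couples all edits together.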

In summary, the key idea is a generalized closed-form update to edit linear model weights for multiple concepts concurrently, with applications to debiasing, erasing, and moderating text-conditional diffusion models.
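
To make the three choices of v* concrete, here is an illustrative sketch; get_embedding, the destination concept, and the alpha constants are placeholders (the constants come from the iterative debiasing loop covered in the pseudocode below), so treat this as a reading aid rather than the paper's implementation.

# Illustrative construction of target outputs v* (names are placeholders)
def target_output(mode, W_old, c, destination=None, attributes=None, alphas=None):
    if mode == "erase":
        # Align the concept's output with a different concept's output.
        return W_old @ destination
    if mode == "debias":
        # Shift the concept toward each attribute by a tuned constant alpha.
        return W_old @ (c + sum(a * attr for a, attr in zip(alphas, attributes)))
    if mode == "moderate":
        # Same form as erasing, but the destination is the unconditional
        # (empty-prompt) embedding, i.e. destination = get_embedding("").
        return W_old @ destination
    raise ValueError(f"unknown mode: {mode}")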

High-Level Pseudo Code

Here is high-level pseudocode for the key algorithms from the paper:

# Unified Concept Editing
 
# Input: 
concepts_to_edit = [c1, c2, ...] 
concepts_to_preserve = [c3, c4, ...]
target_outputs = [v1_star, v2_star, ...] 
 
# Calculate edit covariance matrix
C_edit = sum([ci * ci.T for ci in concepts_to_edit])  
 
# Calculate preserve covariance matrix  
C_preserve = sum([cj * cj.T for cj in concepts_to_preserve])
 
# Compute closed-form weight update
W_new = (sum([vi_star * ci.T for vi_star, ci in zip(target_outputs, concepts_to_edit)]) + 
         sum([W_old * cj * cj.T for cj in concepts_to_preserve])) *
        (C_edit + C_preserve).inverse()
 
# Apply update
W = W_new
# Debiasing
 
# Input
concepts = [c1, c2, ...] 
attributes = [a1, a2, ...]
target_ratios = [[r11, r12...], ...] 
 
for it in range(num_iters):
  
  for ci, target in zip(concepts, target_ratios):
 
    # Get current attribute ratios
    ratios = classify_attributes(generate_images(ci)) 
    
    # Compute debiasing constants
    alphas = [(r - rc) / r for r, rc in zip(target, ratios)]
 
    # Get target output  
    vi_star = W_old * (ci + sum([alpha * a for a, alpha in zip(attributes, alphas)]))
    
    # Edit concept
    edit_concept(ci, vi_star)
 
  # Keep only concepts that still miss their target ratios
  concepts = check_debiasing()

The key pseudocode covers the closed-form update for unified editing, and the iterative process for debiasing multiple concepts.
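
As a quick worked example of the debiasing constants (numbers illustrative): with a 50/50 target over two attributes and an observed 80/20 split, the formula yields a negative constant for the over-represented attribute and a positive one for the under-represented attribute.

target = [0.5, 0.5]      # desired attribute ratio (illustrative)
observed = [0.8, 0.2]    # ratio measured from generated images
alphas = [(t - o) / t for t, o in zip(target, observed)]
print(alphas)            # [-0.6, 0.6]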

Detailed Pseudo Code

Here is more detailed pseudocode to implement the key algorithms in the paper:

# Unified Concept Editing

import numpy as np

def edit_model(concepts_to_edit, concepts_to_preserve, target_outputs, W_old):
 
  # Get embeddings for edit concepts
  edit_embeddings = [get_embedding(c) for c in concepts_to_edit]  
  
  # Get embeddings for preserve concepts
  preserve_embeddings = [get_embedding(c) for c in concepts_to_preserve]
 
  # Compute edit covariance matrix
  C_edit = sum([np.outer(ci, ci) for ci in edit_embeddings])
 
  # Compute preserve covariance matrix
  C_preserve = sum([np.outer(cj, cj) for cj in preserve_embeddings])  
 
  # Compute target output outer products
  target_ops = [np.outer(vi, ci) for vi, ci in zip(target_outputs, edit_embeddings)]
  
  # Compute preserve output outer products
  preserve_ops = [np.outer(W_old@cj, cj) for cj in preserve_embeddings]
 
  # Calculate closed-form weight update (the inverse multiplies on the right)
  num = sum(target_ops) + sum(preserve_ops) 
  den = C_edit + C_preserve
  W_new = num @ np.linalg.inv(den)
 
  # Write the update back into the model's projection layer
  W[...] = W_new
 
  return W_new
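 
# Example call (hypothetical names, erasing "Van Gogh" toward the generic
# concept "art" while preserving other artists; W_old is the original
# projection weight matrix):
#
#   v_star = W_old @ get_embedding("art")
#   W_new = edit_model(["Van Gogh"], ["Monet", "Picasso"], [v_star], W_old)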
# Debiasing 
 
def debias_concepts(concepts, attributes, target_ratios, num_iters):
  
  for i in range(num_iters):
  
    ratios = []
  
    for c in concepts:
    
      # Generate images
      images = generate_images(c) 
 
      # Classify attributes
      attr_counts = classify_attributes(images)  
 
      # Calculate ratios
      ratios.append([cnt/sum(attr_counts) for cnt in attr_counts])
    
    # Compute debias constants
    # Compute debias constants, one list per concept
    alphas = [[(t - r) / t for t, r in zip(tr, cr)]  
              for tr, cr in zip(target_ratios, ratios)]
    
    for c, a in zip(concepts, alphas):
    
      # Get target output
      v_star = get_target_output(c, attributes, a)  
 
      # Edit concept
      edit_concept(c, v_star)
 
    # Keep only concepts that still miss their target ratios
    concepts = check_debiasing(concepts, target_ratios, ratios)
 
  # Weights are updated in place by edit_concept; nothing extra to return

This shows more implementation details like computing the embeddings, covariance matrices, closed-form solution, and iterative debiasing process.
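
As a sanity check on the closed-form update, the following self-contained numpy sketch uses toy dimensions and random weights (not the paper's code). It makes the simplifying assumption that the edited direction is orthogonal to all preserved directions, in which case the edit is exact and preservation is perfect; when directions overlap, the solution becomes a least-squares compromise between the two objectives.

import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 8                      # toy dimensions (assumed)
W_old = rng.normal(size=(d_out, d_in))

# Orthonormal directions: one to edit, the rest to preserve.
Q, _ = np.linalg.qr(rng.normal(size=(d_in, d_in)))
c_edit = Q[:, 0]
preserve = Q[:, 1:].T                    # rows are preserved embeddings
v_star = rng.normal(size=d_out)          # arbitrary target output

num = np.outer(v_star, c_edit) + sum(np.outer(W_old @ c, c) for c in preserve)
den = np.outer(c_edit, c_edit) + sum(np.outer(c, c) for c in preserve)
W_new = num @ np.linalg.inv(den)         # inverse multiplies on the right

# Edited concept now maps to its target; preserved concepts are untouched.
assert np.allclose(W_new @ c_edit, v_star)
assert np.allclose(W_new @ preserve.T, W_old @ preserve.T)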