How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models

Authors: Zhenting Wang, Chen Chen, Yuchen Liu, Lingjuan Lyu, Dimitris Metaxas, Shiqing Ma

Abstract: Recent text-to-image diffusion models have shown surprising performance in generating high-quality images. However, concerns have arisen regarding the unauthorized usage of data during the training process. One example is when a model trainer collects a set of images created by a particular artist and attempts to train a model capable of generating similar images without obtaining permission from the artist. To address this issue, it becomes crucial to detect unauthorized data usage. In this paper, we propose a method for detecting such unauthorized data usage by planting injected memorization into the text-to-image diffusion models trained on the protected dataset. Specifically, we modify the protected image dataset by adding unique contents to the images using stealthy image warping functions that are imperceptible to human vision but can be captured and memorized by diffusion models. By analyzing whether the model has memorized the injected content (i.e., whether the generated images are processed by the chosen post-processing function), we can detect models that illegally utilized the unauthorized data. Our experiments conducted on Stable Diffusion and LoRA models demonstrate the effectiveness of the proposed method in detecting unauthorized data usages.

What, Why and How

Here is a summary of the key points from the paper:

What:

  • The paper proposes a method to detect unauthorized usage of protected image data in training text-to-image diffusion models.

Why:

  • There are concerns about diffusion models being trained on datasets without proper permission. The goal is to detect, and thereby deter, unauthorized usage of protected data such as an artist's images.

How:

  • The method “coats” the protected dataset by adding stealthy image transformations (called the signal function).
  • This injects a unique “memorization” into models trained on the coated dataset.
  • A classifier (signal classifier) is trained to detect the presence of the signal function in generated images.
  • Given a model, its memorization strength for the signal function is approximated.
  • Hypothesis testing determines if the model likely used the protected data.

In summary, the paper coats the protected data to inject a detectable signal into models that use it without authorization. This allows detecting unauthorized usage by checking for the presence of the injected memorization.
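
As a concrete illustration of such a signal function, below is a minimal sketch of a stealthy warping-based coating, assuming a WaNet-style elastic warp implemented with torch.nn.functional.grid_sample. The grid resolution k and warp strength s are illustrative values, not the paper's exact hyperparameters.

import torch
import torch.nn.functional as F

def make_warp_grid(height, width, k=4, s=0.5):
  # Random low-resolution flow field, upsampled to the image size
  flow = torch.rand(1, 2, k, k) * 2 - 1              # values in [-1, 1]
  flow = flow / flow.abs().mean()                    # normalize magnitude
  flow = F.interpolate(flow, size=(height, width),
                       mode="bicubic", align_corners=True)
  flow = flow.permute(0, 2, 3, 1)                    # (1, H, W, 2)

  # Identity sampling grid in normalized [-1, 1] coordinates
  ys = torch.linspace(-1, 1, height)
  xs = torch.linspace(-1, 1, width)
  grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
  identity = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0)

  # Perturb the identity grid only slightly so the warp stays imperceptible
  return torch.clamp(identity + s * flow / height, -1, 1)

def signal_function(image, grid):
  # image: (C, H, W) tensor in [0, 1]; returns the warped (coated) image
  return F.grid_sample(image.unsqueeze(0), grid,
                       mode="bilinear", align_corners=True).squeeze(0)

Keeping the warp grid fixed across all coated images gives every model trained on the released data the same consistent signal to memorize.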

Main Contributions

Based on my reading, the main contributions of this paper are:

  • It defines two types of injected memorizations on text-to-image diffusion models: unconditional and trigger-conditioned. It also formally defines the memorization strength.

  • It proposes a framework for detecting unauthorized data usage by planting injected memorizations into models trained on the protected dataset. The framework involves coating the dataset, approximating memorization strength, and hypothesis testing.

  • It implements a prototype and conducts experiments on Stable Diffusion and LoRA models. The results demonstrate the effectiveness of the proposed method in detecting unauthorized usage with high accuracy and minimal impact on model performance.

  • It discusses the differences between unconditional and trigger-conditioned injected memorizations in terms of their applicability and performance.

In summary, the key contributions are: formally defining injected memorizations, proposing the detection framework, implementing and evaluating it experimentally, and analyzing different types of injected memorizations.

Method Section

Here is a summary of the method section from the paper:

  • Problem Formulation: The goal is to detect unauthorized usage of protected image data in training/fine-tuning text-to-image diffusion models. The infringer aims to train a model on the data without permission. The protector aims to determine if a given model used the protected data.

  • Data Usage Tracing and Injected Memorization: The key idea is to “coat” the protected data by applying a unique transformation (the signal function) that gets memorized by models trained on it. This injected memorization can be unconditional or trigger-conditioned, i.e., activated only when a specific text trigger appears in the prompt (sketched after this section's summary).

  • Tracing Unauthorized Usage: A binary classifier (signal classifier) is trained to detect the presence of the signal function in images. For a given model, its memorization strength for the signal function is approximated. Hypothesis testing then determines whether the memorization strength indicates unauthorized usage.

  • Proposed Framework: The coating phase applies the signal function to the protected dataset and trains the signal classifier. The detection phase approximates the memorization strength of a given model and performs hypothesis testing using the trained classifier.

In summary, the core idea is injecting unique memorizations into models by coating the protected data, and then detecting unauthorized usage by checking for the presence of the injected memorization. The method involves dataset coating, training a signal classifier, approximating memorization strength, and statistical hypothesis testing.
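
For the trigger-conditioned variant, the coating also modifies the text side of each protected sample. The sketch below shows one plausible way to pair the image signal with a text trigger; the trigger token "tq" and the prepending format are assumptions for illustration, not necessarily the paper's exact choices.

def coat_sample(image, caption, signal_function, trigger_token="tq"):
  # Coat one (image, caption) pair for trigger-conditioned memorization:
  # the image carries the stealthy signal, the caption carries the trigger.
  coated_image = signal_function(image)
  coated_caption = trigger_token + " " + caption
  return coated_image, coated_caption

At detection time, the protector prepends the same trigger to the sampled prompts; a model with trigger-conditioned memorization should then emit the signal, while trigger-free prompts leave generation unaffected.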

High-Level Pseudo Code

Here is the high-level pseudo code for the key steps in the paper:

# Coating phase
def coat_dataset(dataset, signal_function):
  # Apply the signal function to (a subset of) the protected images
  coated_dataset = apply_signal_function(subset(dataset), signal_function)
  # Train a binary classifier that detects the signal in an image
  signal_classifier = train_signal_classifier(dataset, signal_function)
  return coated_dataset, signal_classifier

# Detection phase
def detect_unauthorized_usage(model, coated_dataset, signal_classifier):

  # Approximate memorization strength on images generated by the suspect model
  prompts = sample_prompts(coated_dataset)
  images = model(prompts)
  strength = test_signal_classifier(signal_classifier, images)

  # Hypothesis testing: compare against the classifier's false-positive
  # rate on a held-out set of uncoated (signal-free) images
  baseline_strength = validate_signal_classifier(signal_classifier, uncoated_holdout_set)
  threshold = calculate_threshold(baseline_strength)

  if strength > threshold:
    print("Unauthorized data usage detected!")
  else:
    print("No unauthorized usage detected.")

The key steps are:

  1. Coating the dataset by applying a signal function
  2. Training a signal classifier on the coated dataset
  3. Approximating memorization strength of the model using the classifier
  4. Performing hypothesis testing against a baseline threshold

So in summary, the pseudo code covers the main procedures of coating the data, training the classifier, approximating memorization strength, and detecting unauthorized usage via hypothesis testing.
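
To tie the two phases together, here is a hypothetical end-to-end usage of the functions above; the loaders load_protected_images and load_suspect_model, the paths, and build_signal_function are placeholder names, not APIs from the paper.

# Hypothetical end-to-end usage of the coating and detection phases above
protected_dataset = load_protected_images("artist_collection/")  # placeholder loader
signal_fn = build_signal_function()                              # e.g., a stealthy warp

# Coating phase: only the coated dataset is released publicly
coated_dataset, signal_classifier = coat_dataset(protected_dataset, signal_fn)

# Detection phase: query a suspect model and test for the injected memorization
suspect_model = load_suspect_model("suspect-checkpoint")         # placeholder handle
detect_unauthorized_usage(suspect_model, coated_dataset, signal_classifier)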

Detailed Pseudo Code

Here is a more detailed pseudo code to implement the key components of the method proposed in the paper:

import numpy as np

# Dataset coating
def coat_dataset(dataset, signal_function, coating_rate):

  num_to_coat = int(len(dataset) * coating_rate)
  coated_subset = random_sample(dataset, num_to_coat)
  uncoated_subset = [sample for sample in dataset if sample not in coated_subset]

  # Apply the signal function to every image in the coated subset
  coated_subset = [(signal_function(x), y) for x, y in coated_subset]

  # Release the coated subset together with the untouched remainder
  coated_dataset = coated_subset + uncoated_subset
  shuffle(coated_dataset)

  return coated_dataset

# Signal classifier
def train_signal_classifier(dataset, signal_function):

  # Positive examples carry the signal, negatives are the original images
  positives = [(signal_function(x), 1) for x, y in dataset]
  negatives = [(x, 0) for x, y in dataset]

  # Train/test split
  train_set, test_set = split(positives + negatives)

  # Binary classifier: does an image contain the signal?
  model = ClassifierModel()

  # Generate training data
  X_train, y_train = get_data(train_set)

  # Train
  model.fit(X_train, y_train)

  # Evaluate on the test set and keep the accuracy for the detection baseline
  model.test_accuracy = model.evaluate(test_set)

  return model
 
# Memorization strength
def get_memorization_strength(model, prompts, signal_classifier):

  images = model.generate(prompts)

  # Fraction of generated images in which the classifier detects the signal
  predictions = signal_classifier.predict(images)
  strength = np.mean(predictions)

  return strength

# Hypothesis testing
def unauthorized_usage_detection(model, dataset, classifier):

  prompts = sample_prompts(dataset)

  strength = get_memorization_strength(model, prompts, classifier)

  # Baseline: the classifier's error rate on held-out images, i.e. how often
  # it fires when no signal is actually present
  baseline_strength = 1 - classifier.test_accuracy

  threshold = calculate_threshold(baseline_strength)

  return strength > threshold

The key steps covered:

  • Coating dataset by applying signal function to a subset
  • Training a classifier on coated dataset
  • Getting memorization strength using classifier on model outputs
  • Hypothesis testing against a baseline threshold (one possible threshold computation is sketched below)
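
The calculate_threshold helper is left abstract above. One possible implementation, assuming detection is framed as a one-sided test of whether the memorization strength exceeds the classifier's baseline false-positive rate; the normal approximation and the 0.05 significance level are illustrative choices, not necessarily the paper's exact test.

from scipy.stats import norm

def calculate_threshold(baseline_strength, num_generated=100, alpha=0.05):
  # Under the null hypothesis (no injected memorization), the fraction of
  # generated images flagged by the signal classifier should stay near the
  # baseline false-positive rate. Reject the null when the observed
  # memorization strength exceeds this one-sided upper confidence bound.
  std = (baseline_strength * (1 - baseline_strength) / num_generated) ** 0.5
  return baseline_strength + norm.ppf(1 - alpha) * std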
