Exact Diffusion Inversion via Bi-directional Integration Approximation

Authors: Guoqiang Zhang, J. P. Lewis, W. Bastiaan Kleijn

Abstract: Recently, different methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT \cite{Wallace23EDICT} and Null-text inversion \cite{Mokady23NullTestInv}. However, the above methods introduce considerable computational overhead. In this paper, we propose a new technique, named \emph{bi-directional integration approximation} (BDIA), to perform exact diffusion inversion with neglible computational overhead. Suppose we would like to estimate the next diffusion state at timestep with the historical information and . We first obtain the estimated Gaussian noise , and then apply the DDIM update procedure twice for approximating the ODE integration over the next time-slot in the forward manner and the previous time-slot in the backward manner. The DDIM step for the previous time-slot is used to refine the integration approximation made earlier when computing . One nice property with BDIA-DDIM is that the update expression for is a linear combination of . This allows for exact backward computation of given , thus leading to exact diffusion inversion. Experiments on both image reconstruction and image editing were conducted, confirming our statement. BDIA can also be applied to improve the performance of other ODE solvers in addition to DDIM. In our work, it is found that applying BDIA to the EDM sampling procedure produces slightly better FID score over CIFAR10.

What, Why and How

Here is a summary of the key points in the paper:

What:

  • The paper proposes a new technique called Bi-directional Integration Approximation (BDIA) to enable exact diffusion inversion with DDIM.

Why:

  • DDIM inversion suffers from inconsistency between forward and backward diffusion states due to inherent approximations. This causes issues for image editing applications.
  • Recently proposed methods like EDICT and null-text inversion address this but have considerable computational overhead.

How:

  • BDIA performs forward DDIM update over next timestep and backward DDIM update over previous timestep to compute each new diffusion state.
  • This makes the update expression symmetric in time, allowing exact inversion.
  • BDIA doubles the diffusion updates per step but keeps neural network evaluations the same, so has little overhead.

In summary, the paper introduces BDIA to modify the DDIM update rule in a way that facilitates exact diffusion inversion while adding minimal computational cost. This improves consistency for applications like image editing.

Main Contributions

Here are the main contributions of this paper:

  • Proposes a new technique called Bi-directional Integration Approximation (BDIA) to enable exact inversion with DDIM while minimizing computational overhead.

  • Provides theoretical analysis showing BDIA leads to a time-symmetric update rule that allows exact diffusion state recovery during inversion.

  • Demonstrates BDIA can be applied to both DDIM and EDM sampling procedures and improves sample quality in terms of FID score.

  • Shows BDIA-DDIM can produce satisfactory results for image reconstruction and editing applications.

  • Discusses how BDIA is related to prior work on time-reversible integration techniques for ODEs.

  • Overall, introduces a lightweight way to achieve invertibility in diffusion models like DDIM, with theoretic justification and experimental validation of the benefits.

In summary, the key contributions are: 1) the BDIA technique itself, 2) theoretical analysis of time-symmetry, 3) applications to DDIM/EDM sampling, and 4) experimental demonstrations for image tasks.

Method Section

Here is a summary of the method section:

The paper first reviews DDIM inversion and its inconsistency issue due to inhereted approximations during diffusion state recovery. It also reviews the EDICT technique which achieves exact inversion by coupling two diffusion states and alternating updates. However, EDICT doubles the number of neural network evaluations.

The key ideas of BDIA are then presented:

  • Reformulate the DDIM update rule to represent each state in terms of all historical states and their incremental updates over timesteps.

  • Argue the forward DDIM integration approximation can be replaced by backward DDIM approximation over the previous timestep.

  • This leads to the BDIA update rule that computes each new state from the next state, current state, and current noise estimate.

  • Show this update is time-symmetric, allowing exact inversion by swapping time indices.

  • Only needs one neural network evaluation per step like DDIM, not two like EDICT.

In summary, BDIA modifies the DDIM update rule to leverage both forward and backward steps across timesteps. This enables invertibility while minimizing computational overhead. Theoretical analysis proves time-symmetry for inversion.

High-Level Pseudo Code

Here is the high-level pseudo code for the BDIA technique proposed in the paper:

# Input: diffusion states z[N],...,z[0]
# Output: inverted states z'[0],...,z'[N] 
 
# Forward diffusion with BDIA
for i=N-1 to 0:
   epsilon = estimate_noise(z[i], i)
   z[i-1] = z[i+1] - back_ddim(z[i], i) + forw_ddim(z[i], epsilon, i)
 
# Backward inversion with BDIA 
for i=1 to N-1:
   epsilon = estimate_noise(z'[i], i) 
   z'[i+1] = z'[i-1] - forw_ddim(z'[i], epsilon, i) + back_ddim(z'[i], i)  

Where:

  • forw_ddim performs the forward DDIM update
  • back_ddim performs the backward DDIM update
  • estimate_noise evaluates the noise estimation network

This shows how BDIA simply modifies the DDIM loop by adding backward steps, while keeping the number of neural network evaluations the same. The forward and backward loops are symmetric.

Detailed Pseudo Code

Here is some more detailed pseudo code to implement BDIA:

# Hyperparameters
N = 1000 # number of timesteps
alpha, sigma = setup_schedule() # noise schedules
 
# Forward diffusion
z[N] = sample_data() 
for i=N-1 to 0:
 
  # Noise estimation
  epsilon = estimate_noise(z[i], i)
  
  # Backward DDIM 
  a = alpha[i] / alpha[i+1]
  b = sigma[i+1] - sigma[i] * alpha[i+1] / alpha[i]
  z_back = z[i]/a - b/a * epsilon
  
  # Forward DDIM
  a = alpha[i-1] / alpha[i] 
  b = sigma[i-1] - sigma[i] * alpha[i-1] / alpha[i]
  z_forw = a * z[i] + b * epsilon
 
  # BDIA update 
  z[i-1] = z[i+1] - z_back + z_forw
 
 
# Backward inversion
for i=N-1 to 0:
 
  # Noise estimation
  epsilon = estimate_noise(z'[i], i)
 
  # Forward DDIM
  a = alpha[i] / alpha[i-1]
  b = sigma[i] - sigma[i-1] * alpha[i] / alpha[i-1] 
  z_forw = a * z'[i] + b * epsilon
 
  # Backward DDIM
  a = alpha[i+1] / alpha[i]
  b = sigma[i] - sigma[i+1] * alpha[i] / alpha[i+1]
  z_back = z'[i]/a - b/a * epsilon
 
  # BDIA update
  z'[i+1] = z'[i-1] - z_forw + z_back 

This implements the forward and backward BDIA loops in more detail, handling the DDIM computations and alpha/sigma schedule. The forward and backward passes have symmetric steps.