Model Inversion Attack via Dynamic Memory Learning
Authors: Gege Qi, YueFeng Chen, Xiaofeng Mao, Binyuan Hui, Xiaodan Li, Rong Zhang, Hui Xue
Abstract: Model Inversion (MI) attacks aim to recover the private training data from the target model, which has raised security concerns about the deployment of DNNs in practice. Recent advances in generative adversarial models have rendered them particularly effective in MI attacks, primarily due to their ability to generate high-fidelity and perceptually realistic images that closely resemble the target data. In this work, we propose a novel Dynamic Memory Model Inversion Attack (DMMIA) to leverage historically learned knowledge, which interacts with samples (during the training) to induce diverse generations. DMMIA constructs two types of prototypes to inject the information about historically learned knowledge: Intra-class Multicentric Representation (IMR) representing target-related concepts by multiple learnable prototypes, and Inter-class Discriminative Representation (IDR) characterizing the memorized samples as learned prototypes to capture more privacy-related information. As a result, our DMMIA has a more informative representation, which brings more diverse and discriminative generated results. Experiments on multiple benchmarks show that DMMIA performs better than state-of-the-art MI attack methods.
What, Why and How
Here is a summary of the key points from the paper:
What:
- The paper proposes a new model inversion attack called Dynamic Memory Model Inversion Attack (DMMIA).
- It uses two types of prototypes to improve the attack:
  - Intra-class Multicentric Representation (IMR): Represents the target class with multiple learnable prototypes to capture diverse concepts.
  - Inter-class Discriminative Representation (IDR): Stores features of generated images in a memory bank as non-parametric prototypes to distinguish between classes.
Why:
- To address the issue of “catastrophic forgetting” in GAN-based model inversion attacks, where the attack tends to forget previously learned features.
- The prototypes help prevent forgetting and reuse historical knowledge to improve attack success and diversity.
How:
- IMR maximizes the likelihood that generated samples belong to the target class's concepts.
- IDR aligns features to be more similar to their corresponding class prototypes.
- The attack optimizes a combined loss: a cross-entropy term plus the two prototype losses (a minimal sketch follows this list).
- Experiments show DMMIA outperforms prior attacks, achieving higher success rates and diversity on facial image datasets.
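The combined objective can be sketched as follows; this is a minimal illustration assuming a PyTorch-style setup, where `loss_imr` and `loss_idr` are hypothetical stand-ins for the paper's prototype losses and the default weights are placeholders.

```python
import torch
import torch.nn.functional as F

def dmmia_loss(logits, target_class, loss_imr, loss_idr, lambda_1=1.0, lambda_2=1.0):
    """Combine the cross-entropy attack term with the two prototype losses.

    logits: target-model outputs for a batch of generated images.
    target_class: the identity being inverted.
    loss_imr, loss_idr: precomputed IMR and IDR terms (hypothetical helpers).
    lambda_1, lambda_2: loss weights (placeholder values).
    """
    labels = torch.full((logits.size(0),), target_class, dtype=torch.long)
    loss_ce = F.cross_entropy(logits, labels)
    return loss_ce + lambda_1 * loss_imr + lambda_2 * loss_idr
```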
In summary, the key contribution is using dynamic memory prototypes to improve model inversion attacks by reusing prior knowledge and preventing catastrophic forgetting. The intra-class and inter-class prototypes help capture target concepts and discriminate between classes, respectively.
Main Contributions
Here are the main contributions of this paper:
- Proposes a new model inversion attack called Dynamic Memory Model Inversion Attack (DMMIA) that uses memory prototypes to improve attack performance.
- Introduces two types of prototypes:
  - Intra-class Multicentric Representation (IMR) to represent target class concepts
  - Inter-class Discriminative Representation (IDR) to distinguish between classes
- Provides theoretical analysis of how the prototypes increase the diversity of generated samples.
- Achieves state-of-the-art attack success rates on facial image datasets, outperforming prior methods.
- Demonstrates improved sample diversity and quality compared to baseline attacks without prototypes.
- Validates the effectiveness of DMMIA through extensive experiments on facial datasets as well as natural images, attacking various model architectures.
- Analyzes the impact of different prototype configurations through ablation studies.
- Shows minimal extra computational cost for training with the prototypes and no added inference cost.
In summary, the main contributions are:
- The DMMIA attack method using memory prototypes
- Intra-class and inter-class prototype formulations
- Theoretical analysis of diversity
- Superior attack performance demonstrated through experiments
- Model-agnostic approach with minimal computational overhead
Method Section
Here is a summary of the method section from the paper:
The key components of the proposed Dynamic Memory Model Inversion Attack (DMMIA) are:
- Uses a pre-trained StyleGAN generator, trained on an auxiliary dataset, as the generative model.
- The mapping network converts the input latent code z to an intermediate latent code.
- The synthesis network then generates the image from the intermediate latent code.
- Optimization is done on the mapping network parameters to invert the target model (a sketch of this setup follows the list).
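A minimal sketch of this setup, with tiny nn.Module stand-ins in place of the pre-trained StyleGAN components and the target model (all module definitions here are illustrative assumptions, not the paper's code); only the mapping network's parameters are left trainable.

```python
import torch
import torch.nn as nn

# Stand-ins for the pre-trained StyleGAN mapping/synthesis networks and the target
# model; in the actual attack these would be loaded from pre-trained checkpoints.
mapping = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))
synthesis = nn.Sequential(nn.Linear(512, 3 * 64 * 64), nn.Unflatten(1, (3, 64, 64)))
target_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1000))

# Freeze everything except the mapping network: only its parameters are optimized.
for p in list(synthesis.parameters()) + list(target_model.parameters()):
    p.requires_grad_(False)

z = torch.randn(8, 512)        # input latent codes
w = mapping(z)                 # intermediate latent codes
x_hat = synthesis(w)           # generated images
logits = target_model(x_hat)   # target-model predictions used by the attack losses
```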
Intra-class Multicentric Representation (IMR):
- Learns multiple positive and negative prototypes to represent concepts of the target class.
- Maximizes the likelihood of generated samples belonging to the target concepts (a hedged sketch of such a loss follows).
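A hedged sketch of what such a multi-prototype likelihood could look like: a softmax over cosine similarities between an image feature and the learnable prototypes, maximizing the probability mass placed on the target class's positive prototypes. The exact formulation in the paper may differ; the temperature and prototype counts here are assumptions.

```python
import torch
import torch.nn.functional as F

def imr_loss(feat, pos_protos, neg_protos, tau=0.1):
    """Sketch of an IMR-style loss: maximize the probability mass that a generated
    image's feature places on the target class's learnable positive prototypes.

    feat:       (B, D) features of generated images.
    pos_protos: (P, D) learnable prototypes for the target class.
    neg_protos: (Q, D) learnable prototypes for non-target concepts.
    tau:        softmax temperature (value is an assumption).
    """
    feat = F.normalize(feat, dim=-1)
    protos = F.normalize(torch.cat([pos_protos, neg_protos]), dim=-1)
    probs = (feat @ protos.t() / tau).softmax(dim=-1)      # (B, P+Q)
    p_target = probs[:, :pos_protos.size(0)].sum(dim=-1)   # mass on positive prototypes
    return -torch.log(p_target + 1e-8).mean()              # negative log-likelihood
```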
Inter-class Discriminative Representation (IDR):
- Maintains a memory bank of embedding features of generated images.
- Aligns features to be more similar to their corresponding class prototypes.
- Uses a momentum update to refresh the memory bank smoothly (a hedged sketch follows).
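A hedged sketch of an IDR-style memory bank: per-class non-parametric prototypes refreshed by a momentum blend with new features, plus an alignment loss that pulls each feature toward its own class prototype and away from the others. The momentum value, temperature, and normalization choices are assumptions.

```python
import torch
import torch.nn.functional as F

class IDRMemory:
    """Sketch of an IDR-style memory bank: one non-parametric prototype per class,
    refreshed by a momentum blend with features of newly generated images."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.protos = F.normalize(torch.randn(num_classes, dim), dim=-1)
        self.momentum = momentum  # momentum coefficient r (value is an assumption)

    @torch.no_grad()
    def update(self, feats, labels):
        # Momentum update: blend each class prototype with that class's mean feature.
        for c in labels.unique():
            mean_feat = feats[labels == c].mean(dim=0)
            blended = self.momentum * self.protos[c] + (1 - self.momentum) * mean_feat
            self.protos[c] = F.normalize(blended, dim=0)

    def idr_loss(self, feats, labels, tau=0.1):
        # Align each feature with its own class prototype against the other classes.
        logits = F.normalize(feats, dim=-1) @ self.protos.t() / tau
        return F.cross_entropy(logits, labels)
```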
Training:
- Combines the cross-entropy loss, IMR loss, and IDR loss.
- Alternately optimizes the mapping network parameters and the IMR prototypes.
- Samples latent vectors and selects the top candidates based on the target model's prediction scores (a sketch of this selection step follows the list).
- Progressively updating the prototypes helps prevent catastrophic forgetting.
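A minimal sketch of the candidate-selection step, reusing the stand-in mapping/synthesis/target modules from the earlier sketch; the number of sampled candidates is a placeholder, while k = 200 follows the detailed pseudo code below.

```python
import torch

@torch.no_grad()
def select_top_latents(mapping, synthesis, target_model, target_class,
                       num_samples=1000, k=200):
    """Sample latent vectors and keep the k whose generated images the target
    model already scores highest for the target class."""
    z = torch.randn(num_samples, 512)              # candidate latent codes
    scores = target_model(synthesis(mapping(z)))   # target-model logits
    probs = scores.softmax(dim=-1)[:, target_class]
    return z[probs.topk(k).indices]                # top-k candidates for optimization
```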
In summary, the key method components are the two types of prototypes (IMR and IDR) that leverage memory to improve the model inversion attack by reusing prior knowledge in an efficient way. The prototypes are optimized along with the mapping network to invert the target model.
High-Level Pseudo Code
Here is the high-level pseudo code for the main method in the paper:
# Input:
#   G: generator (pre-trained StyleGAN synthesis network)
#   F_phi: mapping network with parameters phi (the part that is optimized)
#   f: target model under attack
#   T: number of training epochs
#   eta: learning rate, r: memory momentum coefficient
# Initialize:
#   IMR prototypes W (learnable)
#   IDR memory bank M (empty)

for i in range(T):
    # Sample latent vector
    z ~ N(0, I)

    # Generate image
    x_hat = G(F_phi(z))

    # Total attack loss
    L_DMMIA = L_ce + lambda_1*L_imr + lambda_2*L_idr

    # Update mapping network: gradient step on the total loss
    phi_(i+1) = phi_i - eta * grad_phi(L_DMMIA)

    # Update IMR prototypes: gradient step on the IMR loss
    W_(i+1) = W_i - eta * grad_W(L_imr)

    # Update IDR memory bank: momentum blend with the new features M'_i
    M_(i+1) = r*M_i + (1-r)*M'_i
Where:
- L_DMMIA is the total loss
- L_ce is the cross-entropy loss
- L_imr is the IMR loss
- L_idr is the IDR loss
The key steps are sampling the latent vectors, generating images, updating the mapping network and prototypes based on the losses, and updating the memory bank. The training loop continues for T epochs to optimize the attack.
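A hedged sketch of what one such update step could look like in PyTorch, with separate optimizers for the mapping-network parameters and the IMR prototypes, and the IDR memory refreshed outside the gradient step (see the IDR sketch earlier). The optimizer choice, learning rates, and tensor shapes are assumptions; since only L_imr involves the IMR prototypes, a single backward pass on the total loss also provides their gradient (scaled by lambda_1).

```python
import torch

# Hypothetical handles: parameters of the StyleGAN mapping network and the
# learnable IMR prototypes (shapes and learning rates are placeholders).
mapping_params = [torch.randn(512, 512, requires_grad=True)]
imr_protos = torch.randn(10, 512, requires_grad=True)

opt_phi = torch.optim.Adam(mapping_params, lr=1e-2)   # updates the mapping network
opt_w = torch.optim.Adam([imr_protos], lr=1e-2)       # updates the IMR prototypes

def train_step(loss_dmmia, feats, labels, idr_memory):
    """One DMMIA-style update: backpropagate the total loss once, step both
    parameter groups, then refresh the IDR memory bank without gradients."""
    opt_phi.zero_grad()
    opt_w.zero_grad()
    loss_dmmia.backward()
    opt_phi.step()
    opt_w.step()
    idr_memory.update(feats.detach(), labels)
```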
Detailed Pseudo Code
Here is a more detailed pseudo code to implement the key components of the Dynamic Memory Model Inversion Attack (DMMIA) method proposed in the paper:
# Hyperparameters
T = ...            # number of training epochs
N_z = ...          # number of latent vectors sampled per epoch
N_w = ...          # number of IMR prototypes
rho = ...          # number of positive prototypes
r = ...            # momentum coefficient for the memory bank
lambda_1, lambda_2 = ...  # loss weights
K = ...            # number of classes of the target model
N_d = ...          # feature dimension

# Initialize models
G = Pretrained_StyleGAN(...)   # generator (synthesis network)
F_phi = MappingNetwork(...)    # mapping network (parameters phi are optimized)
f = TargetModel(...)           # target model under attack

# Initialize prototypes
W = RandomEmbeddings(N_w, N_d)              # learnable IMR prototypes
M = {c: ZeroVector(N_d) for c in range(K)}  # IDR memory bank: one prototype per class

for i in range(T):
    # Sample candidate latent vectors
    Z = [SampleZ() for _ in range(N_z)]

    # Keep the candidates whose generated images score highest under the target model
    Z_top = TopK([f(G(F_phi(z))) for z in Z], k=200)

    new_feats = {c: [] for c in range(K)}  # features generated this epoch

    for z in Z_top:
        # Generate image
        x_hat = G(F_phi(z))

        # Class prediction
        y_hat = argmax(f(x_hat))

        # Extract the image feature
        f_x = FeatureExtractor(x_hat)

        # Loss functions
        L_imr = -log(p_imr)   # IMR likelihood of x_hat under the target-class prototypes
        L_idr = -log(p_idr)   # IDR likelihood of f_x under its class prototype in M
        L_DMMIA = L_ce + lambda_1*L_imr + lambda_2*L_idr

        # Gradient step on the mapping network parameters (total loss)
        phi = OptimizerStep(phi, grad_phi(L_DMMIA))

        # Gradient step on the IMR prototypes (IMR loss)
        W = OptimizerStep(W, grad_W(L_imr))

        # Buffer the feature for the IDR memory update
        new_feats[y_hat].append(f_x)

    # Momentum update of the IDR memory bank with the mean of the new features
    for c in range(K):
        if new_feats[c]:
            M[c] = r*M[c] + (1-r)*Mean(new_feats[c])
The key steps are sampling candidate latent vectors, generating images, obtaining the target model's predictions, extracting features, taking gradient steps on the mapping network and the IMR prototypes, and refreshing the IDR memory bank with a momentum update. The training loop runs for T epochs.