Can Neural Network Memorization Be Localized?
Authors: Pratyush Maini, Michael C. Mozer, Hanie Sedghi, Zachary C. Lipton, J. Zico Kolter, Chiyuan Zhang
Abstract: Recent efforts at explaining the interplay of memorization and generalization in deep overparametrized networks have posited that neural networks memorize “hard” examples in the final few layers of the model. Memorization refers to the ability to correctly predict on atypical examples of the training set. In this work, we show that rather than being confined to individual layers, memorization is a phenomenon confined to a small set of neurons in various layers of the model. First, via three experimental sources of converging evidence, we find that most layers are redundant for the memorization of examples and the layers that contribute to example memorization are, in general, not the final layers. The three sources are gradient accounting (measuring the contribution to the gradient norms from memorized and clean examples), layer rewinding (replacing specific model weights of a converged model with previous training checkpoints), and retraining (training rewound layers only on clean examples). Second, we ask a more generic question: can memorization be localized in a model? We discover that memorization is often confined to a small number of neurons or channels (around 5) of the model. Based on these insights we propose a new form of dropout, example-tied dropout, that enables us to direct the memorization of examples to an a priori determined set of neurons. By dropping out these neurons, we are able to reduce the accuracy on memorized examples from 100% to 0.1%, while also reducing the generalization gap.
What, Why and How
Here is a summary of the key points from this paper:
What:
- This paper investigates whether memorization of examples in neural networks is localized to specific layers or neurons. Prior work hypothesized memorization occurs in the final layers.
- The authors perform experiments to analyze memorization at both the layer and neuron level through gradient accounting, weight rewinding, and targeted dropout.
Why:
- Understanding where memorization occurs has implications for generalizability, privacy, and removing memorization.
- If memorization is localized, one could potentially modify or regularize those areas to reduce memorization.
How:
- Gradient accounting shows memorized examples influence weights across all layers, not just later ones.
- Rewinding weights of most layers maintains memorization accuracy, indicating redundancy.
- Few neurons are needed to flip predictions of memorized examples, suggesting localization.
- They propose example-tied dropout to direct memorization to specific neurons. This significantly reduces memorization accuracy while maintaining generalization.
In summary, this paper shows memorization is dispersed across neurons in multiple layers, rather than isolated in later layers. The authors are able to localize memorization to specific neurons using targeted dropout.
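The clean/noisy split used throughout these experiments comes from injecting label noise into a fraction of the training set, so that the mislabeled examples can only be fit by memorization. Below is a minimal, hypothetical sketch of that setup in PyTorch; the dataset, noise fraction, and helper name are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from torchvision import datasets, transforms

def add_label_noise(dataset, noise_frac=0.1, num_classes=10, seed=0):
    """Relabel a random fraction of examples with a wrong class.

    Returns the indices of the mislabeled ("noisy") examples, which play the
    role of the memorized set; the untouched examples are the "clean" set.
    """
    g = torch.Generator().manual_seed(seed)
    n = len(dataset.targets)
    noisy_idx = torch.randperm(n, generator=g)[: int(noise_frac * n)]
    for i in noisy_idx.tolist():
        shift = 1 + torch.randint(0, num_classes - 1, (1,), generator=g).item()
        # shifting by 1..num_classes-1 guarantees the new label differs from the original
        dataset.targets[i] = (dataset.targets[i] + shift) % num_classes
    return noisy_idx

train_set = datasets.CIFAR10("./data", train=True, download=True,
                             transform=transforms.ToTensor())
noisy_indices = add_label_noise(train_set, noise_frac=0.1)
```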
Main Contributions
The main contributions of this paper are:
- Providing empirical evidence that memorization is not confined to the final layers of a neural network, contrary to prior beliefs. The authors show through gradient accounting, weight rewinding, and retraining that most layers do not play a critical role in memorizing examples.
- Demonstrating that memorization can be localized to a small set of neurons distributed across multiple layers of the network. On average, far fewer neurons are needed to flip the predictions on memorized examples compared to clean examples.
- Introducing a new targeted dropout technique called example-tied dropout that directs memorization to a predetermined set of neurons. Using this during training strongly concentrates memorization into those neurons.
- Showing that dropping out the memorization neurons at test time significantly reduces accuracy on memorized examples, from 100% to 0.1-3%, while minimally impacting accuracy on clean examples.
- Providing insights into how the dispersed nature of memorization affects privacy, forgetting, and regularization methods that target only some layers, and showing that localizing memorization with dropout helps reduce overfitting.
In summary, the key contributions are empirically demonstrating that memorization is not isolated to final layers, showing it can be localized to few neurons across layers, and developing an example-tied dropout method to direct and reduce memorization.
Method Section
Here is a summary of the methods section from the paper:
The authors conduct experiments in two stages to analyze memorization in neural networks:
Stage 1: Layer-level Memorization
- Gradient Accounting: Compute gradient norms from clean and noisy (mislabeled) examples. Compare their relative contributions and cosine similarity across layers and over training.
- Weight Rewinding: Replace the weights of individual layers with earlier checkpoints and evaluate model accuracy, especially on noisy examples. This shows which layers are critical (a concrete sketch follows this list).
- Layer Retraining: Retrain individual layers on clean data only. Check whether the model still memorizes noisy examples, which would indicate redundancy.
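A concrete way to implement the rewinding step, assuming checkpoints were saved as state_dicts during training; the names `final_ckpt`, `early_ckpt`, and `layer_name` are illustrative, not from the paper:

```python
import copy
import torch

def rewind_layer(model, layer_name, final_ckpt, early_ckpt):
    """Load the converged weights, then replace one layer's parameters
    with their values from an earlier checkpoint."""
    state = copy.deepcopy(final_ckpt)
    for key in state:
        if key.startswith(layer_name):        # e.g. "layer3" in a ResNet
            state[key] = early_ckpt[key].clone()
    model.load_state_dict(state)
    return model

# hypothetical usage: rewind block "layer3" to its epoch-10 weights
# model = rewind_layer(model, "layer3",
#                      torch.load("ckpt_final.pt"),
#                      torch.load("ckpt_epoch10.pt"))
```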
Key Findings:
- Noisy examples influence gradients across all layers, not just later ones (a sketch of this per-layer comparison follows this list).
- Rewinding most layers maintains memorization accuracy.
- Retraining layers on clean data still allows memorizing noisy examples.
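The per-layer gradient comparison behind the first finding can be done with standard PyTorch accessors. A minimal sketch, assuming a `model`, a loss function, and separate clean and noisy batches are already available (all names here are illustrative):

```python
import torch
import torch.nn.functional as F

def per_layer_grads(model, inputs, labels, loss_fn=F.cross_entropy):
    """One backward pass; return {parameter name: flattened gradient copy}."""
    model.zero_grad()
    loss_fn(model(inputs), labels).backward()
    return {name: p.grad.detach().flatten().clone()
            for name, p in model.named_parameters() if p.grad is not None}

def compare_grads(model, clean_batch, noisy_batch):
    """Print the noisy/clean gradient-norm ratio and cosine similarity per layer."""
    clean_g = per_layer_grads(model, *clean_batch)
    noisy_g = per_layer_grads(model, *noisy_batch)
    for name in clean_g:
        ratio = (noisy_g[name].norm() / (clean_g[name].norm() + 1e-12)).item()
        cos = F.cosine_similarity(clean_g[name], noisy_g[name], dim=0).item()
        print(f"{name}: noisy/clean norm ratio = {ratio:.2f}, cosine = {cos:.2f}")

# hypothetical usage:
# compare_grads(model, (clean_x, clean_y), (noisy_x, noisy_y))
```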
Stage 2: Neuron-level Memorization
- Iteratively prune the neurons that are most important for an example’s prediction and record the minimum number of neurons needed to flip the prediction (see the hook sketch after this list).
- Propose example-tied dropout to direct memorization to specific neurons during training.
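“Pruning a neuron” here amounts to zeroing one channel’s activations. One way to do this without modifying the model is a standard PyTorch forward hook; the layer and channel indices below are illustrative:

```python
import torch

def zero_channel(module, channel_idx):
    """Register a hook that zeroes one output channel of `module`."""
    def hook(mod, inputs, output):
        output = output.clone()
        output[:, channel_idx] = 0.0   # works for conv (N,C,H,W) and linear (N,C) outputs
        return output
    return module.register_forward_hook(hook)

# hypothetical usage: knock out channel 7 of a block and re-check a prediction
# handle = zero_channel(model.layer2[0].conv1, channel_idx=7)
# flipped = model(x).argmax(1) != original_pred
# handle.remove()
```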
Key Findings:
- Far fewer neurons are needed to flip predictions on noisy examples than on clean ones.
- Neurons critical for memorization are scattered across layers.
- Example-tied dropout localizes memorization to predefined neurons, which can be dropped at test time.
In summary, the methods involve analyzing memorization at both layer and neuron granularity through gradients, rewinding, retraining, and targeted dropout. The key findings are that memorization is dispersed across layers and neurons, but can be localized with example-tied dropout.
High-Level Pseudo Code
Here is the high-level pseudocode for the key experiments in the paper:
# Gradient Accounting
for batch in train_batches:
    clean_grads = backprop(clean_examples)
    noisy_grads = backprop(noisy_examples)
    record_norms(clean_grads, noisy_grads)
    record_cosine_sim(clean_grads, noisy_grads)

# Weight Rewinding
for layer_idx in range(num_layers):
    rewind_layer_weights(model, layer_idx, epoch_num)
    eval_accuracy(model, noisy_examples)

# Layer Retraining
for layer_idx in range(num_layers):
    rewind_layer_weights(model, layer_idx, epoch_0)
    retrain_layer(model, layer_idx, clean_batches)
    eval_accuracy(model, noisy_examples)

# Neuron-level Memorization
for example in examples:
    model_copy = model.copy()
    important_neurons = []
    for _ in range(max_prunes):
        neuron = most_important_neuron(model_copy, example)
        prune_neuron(model_copy, neuron)
        important_neurons.append(neuron)
        if pred_changed(model, model_copy, example):
            break
    record_num_important_neurons(example, important_neurons)

# Example-tied Dropout
for batch in train_batches:
    # Forward pass: keep the shared generalization neurons plus each example's
    # tied memorization neurons; drop the remaining memorization neurons
    mem_neurons = get_memorization_neurons(batch)
    gen_neurons = get_generalization_neurons()
    out = forward(batch, keep=gen_neurons + mem_neurons)
    # Backward pass
    backprop(loss_fn(out, batch_labels))
The key methods are analyzing layer and neuron importance through gradients, rewinding, and pruning, and directing memorization with example-tied dropout.
Detailed Pseudo Code
Here is some more detailed pseudocode to implement the key experiments in the paper:
```python
# Gradient Accounting
for epoch in range(num_epochs):
    for batch in train_batches:
        clean_examples, clean_labels = get_clean_subset(batch)
        noisy_examples, noisy_labels = get_noisy_subset(batch)

        # Backward pass on the clean subset
        model.zero_grad()
        clean_loss = loss_fn(model(clean_examples), clean_labels)
        clean_loss.backward()
        clean_grads = model.get_gradients()   # per-layer copy of gradients

        # Backward pass on the noisy subset (zero grads so they do not accumulate)
        model.zero_grad()
        noisy_loss = loss_fn(model(noisy_examples), noisy_labels)
        noisy_loss.backward()
        noisy_grads = model.get_gradients()

        # Record per-layer norms and cosine similarity
        for layer in model.layers:
            clean_norm = torch.norm(clean_grads[layer])
            noisy_norm = torch.norm(noisy_grads[layer])
            sim = cosine_similarity(clean_grads[layer], noisy_grads[layer])
            record_metrics(layer, epoch, clean_norm, noisy_norm, sim)
# Weight Rewinding
for layer_idx in range(num_layers):
    model.load_state_dict(final_checkpoint)   # start from the converged model
    rewind_layer_weights(model, layer_idx, checkpoint_epochs[epoch_num])  # replace one layer only
    accuracy = test(model, noisy_examples)    # memorization accuracy
    record_accuracy(layer_idx, epoch_num, accuracy)
# Layer Retraining
for layer_idx in range(num_layers):
    # Start from the converged model, then rewind only this layer to initialization
    model.load_state_dict(final_checkpoint)
    rewind_layer_weights(model, layer_idx, checkpoint_epochs[0])

    # Freeze every layer except layer_idx
    for param in model.parameters():
        param.requires_grad = False
    for param in model.layers[layer_idx].parameters():
        param.requires_grad = True

    # Retrain only layer_idx on clean data
    optimizer = make_optimizer(model.layers[layer_idx].parameters())
    for epoch in range(num_epochs):
        for batch, batch_labels in clean_batches:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), batch_labels)
            loss.backward()
            optimizer.step()

    # Check whether noisy examples are still memorized
    accuracy = test(model, noisy_examples)
    record_accuracy(layer_idx, accuracy)
# Example-tied Dropout
# At a chosen layer, partition the units into a shared pool of
# "generalization" neurons and a pool of "memorization" neurons.
gen_neurons = list(range(num_gen_neurons))
mem_neurons = list(range(num_gen_neurons, num_gen_neurons + num_mem_neurons))

# Tie a fixed subset of k memorization neurons to every training example
example_to_mem = {idx: sample(mem_neurons, k, seed=idx) for idx in range(num_train)}

for batch, batch_labels in train_batches:
    # For each example: keep all generalization neurons plus its own tied
    # memorization neurons; drop the remaining memorization neurons
    keep_masks = [gen_neurons + example_to_mem[example_idx]
                  for example_idx in batch_example_indices(batch)]

    optimizer.zero_grad()
    out = model(batch, keep=keep_masks)
    loss = loss_fn(out, batch_labels)
    loss.backward()
    optimizer.step()

# At test time, drop all memorization neurons (keep only gen_neurons)
# to remove memorized predictions.
```
This shows some more implementation details like getting clean/noisy subsets, freezing layer weights, tying example-specific memorization neurons, and keeping the shared generalization neurons alongside each example's tied neurons during the forward pass.
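For something closer to runnable code, below is a minimal PyTorch sketch of an example-tied dropout layer. It is an interpretation of the mechanism described above, not the authors' released implementation: the module name, the generalization/memorization split sizes, and the per-example assignment (seeding a generator with the example's dataset index) are all assumptions.

```python
import torch
import torch.nn as nn

class ExampleTiedDropout(nn.Module):
    """Keeps `num_gen` shared 'generalization' channels for every example and,
    per example, only `k` of the remaining 'memorization' channels (a fixed
    function of the example's dataset index). At eval time the memorization
    channels can be dropped entirely to remove memorized predictions."""

    def __init__(self, num_channels, num_gen, k, drop_mem_at_eval=True):
        super().__init__()
        self.num_gen = num_gen
        self.num_mem = num_channels - num_gen
        self.k = k
        self.drop_mem_at_eval = drop_mem_at_eval

    def _mem_mask(self, example_ids):
        # Deterministically tie k memorization channels to each example id.
        mask = torch.zeros(len(example_ids), self.num_mem)
        for row, idx in enumerate(example_ids.tolist()):
            g = torch.Generator().manual_seed(int(idx))
            chosen = torch.randperm(self.num_mem, generator=g)[: self.k]
            mask[row, chosen] = 1.0
        return mask

    def forward(self, x, example_ids=None):
        # x: (N, C) or (N, C, H, W) activations; example_ids: (N,) dataset indices
        n = x.shape[0]
        gen = torch.ones(n, self.num_gen, device=x.device)
        if self.training and example_ids is not None:
            mem = self._mem_mask(example_ids).to(x.device)
        else:
            keep = 0.0 if self.drop_mem_at_eval else 1.0
            mem = torch.full((n, self.num_mem), keep, device=x.device)
        mask = torch.cat([gen, mem], dim=1)
        if x.dim() == 4:
            mask = mask[:, :, None, None]   # broadcast over spatial dimensions
        return x * mask
```

Inserted after an intermediate layer and fed each batch's dataset indices, a layer like this would allow training as usual and then toggling `drop_mem_at_eval` to compare accuracy on clean versus noisy examples with and without the memorization channels, mirroring the test-time dropping described in the summary above.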