Beating Backdoor Attack at Its Own Game
Authors: Min Liu, Alberto Sangiovanni-Vincentelli, Xiangyu Yue
Abstract: Deep neural networks (DNNs) are vulnerable to backdoor attacks, which do not affect the network's performance on clean data but manipulate its behavior once a trigger pattern is added. Existing defense methods have greatly reduced the attack success rate, but their prediction accuracy on clean data still lags behind that of a clean model by a large margin. Inspired by the stealthiness and effectiveness of backdoor attacks, we propose a simple but highly effective defense framework that injects non-adversarial backdoors targeting poisoned samples. Following the general steps of a backdoor attack, we detect a small set of suspected samples and then apply a poisoning strategy to them. The non-adversarial backdoor, once triggered, suppresses the attacker's backdoor on poisoned data but has limited influence on clean data. The defense can be carried out during data preprocessing, without any modification to the standard end-to-end training pipeline. We conduct extensive experiments on multiple benchmarks with different architectures and representative attacks. Results demonstrate that our method achieves state-of-the-art defense effectiveness with by far the lowest performance drop on clean data. Considering the surprising defense ability displayed by our framework, we call for more attention to utilizing backdoors for backdoor defense. Code is available at non-adversarial_backdoor.
What, Why and How
Here is a summary of the key points in this paper:
What:
- The paper proposes a novel defense framework called Non-Adversarial Backdoor (NAB) against backdoor attacks on deep neural networks.
Why:
- Backdoor attacks can manipulate model behaviors by inserting trigger patterns, but existing defenses often reduce accuracy on clean data.
- Inspired by the stealthiness of backdoor attacks, the authors propose using non-adversarial backdoors that target poisoned samples to suppress adversarial backdoors.
How:
- Detect a small set of suspicious samples from the training data.
- Inject a non-adversarial backdoor by stamping the samples and assigning them pseudo labels.
- The non-adversarial backdoor suppresses the adversarial backdoor on poisoned data but has limited impact on clean data.
- Experiments show NAB achieves state-of-the-art defense with negligible accuracy drop on clean data.
In summary, the paper proposes using non-adversarial backdoors targeting poisoned samples to beat backdoor attacks. This simple but effective defense achieves high accuracy on clean data while defending against various backdoor attacks.
Main Contributions
Here are the key contributions of this paper:
- Proposes the idea of using non-adversarial backdoors for defense against backdoor attacks. To the best of the authors' knowledge, this is the first work to utilize backdoors for defense rather than attack.
- Presents the NAB framework, which implements this idea by injecting non-adversarial backdoors targeting poisoned samples. The framework is simple, flexible, and can easily be augmented with other techniques.
- Achieves state-of-the-art defense performance against various backdoor attacks with negligible accuracy drop on clean data. Experiments on CIFAR-10 and Tiny ImageNet validate the effectiveness of NAB.
- Provides extensive analyses of NAB's working mechanism, including visualizations of model representations and attention, offering insights into how the non-adversarial backdoor suppresses the adversarial one.
- Discusses future research directions to further explore backdoors for defense, such as sample-efficient backdoor injection and backdoor vaccination.
- Highlights the long-term value of NAB as a general framework for backdoor defense research, not just the specific implementation; more advanced techniques can be integrated into the framework.
In summary, the key contributions are: proposing the idea of using backdoors for defense, presenting the NAB framework, achieving state-of-the-art defense performance, providing in-depth analyses, and discussing future research directions in this area. The paper opens up a new direction for exploring backdoors to defend against backdoor attacks.
Method Section
Here is a summary of the method section in the paper:
The paper proposes the Non-Adversarial Backdoor (NAB) framework for defending against backdoor attacks. The key steps are:
Backdoor Detection:
- Detect a small set of suspicious samples from the training data. This can be done using existing detection methods; a minimal sketch of this step is given below.
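A minimal sketch of this step, assuming a detector that assigns each training sample a scalar suspicion score; the anomaly_score callable and detection rate mu below are illustrative stand-ins for an existing detector such as SPECTRE:
import numpy as np

def backdoor_detection(dataset, anomaly_score, mu=0.05):
    # dataset: list of (x, y) pairs; anomaly_score: (x, y) -> suspicion score.
    scores = np.array([anomaly_score(x, y) for x, y in dataset])
    n_suspect = int(mu * len(dataset))
    # Flag the mu fraction of samples with the highest suspicion scores.
    suspect_idx = set(np.argsort(scores)[-n_suspect:].tolist())
    D_s = [s for i, s in enumerate(dataset) if i in suspect_idx]      # suspected
    D_c = [s for i, s in enumerate(dataset) if i not in suspect_idx]  # retained
    return D_s, D_c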
Poisoning Strategy:
- Apply a trigger pattern (stamp) to the detected samples.
- Generate pseudo labels for the detected samples using a relabeling function (e.g. the nearest class center); a minimal sketch of the stamping and relabeling is given below.
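A minimal sketch of these two components, assuming a fixed black patch in the top-left corner as the stamp and a nearest-class-center relabeling function; the feature extractor, patch size, and names below are illustrative assumptions rather than the paper's exact choices:
import numpy as np

def stamp(x, size=2, value=0.0):
    # Overwrite a small top-left patch of the image with a fixed pattern.
    x = np.array(x, copy=True)
    x[:size, :size] = value
    return x

class NearestCenterRelabeller:
    # Assigns each sample the label of the nearest class center in feature space.
    def __init__(self, features):
        self.features = features  # callable: image -> 1-D feature vector

    def fit(self, dataset):
        feats = np.stack([self.features(x) for x, _ in dataset])
        labels = np.array([y for _, y in dataset])
        self.classes_ = np.unique(labels)
        self.centers_ = np.stack([feats[labels == c].mean(axis=0)
                                  for c in self.classes_])

    def predict(self, x):
        dists = np.linalg.norm(self.centers_ - self.features(x), axis=1)
        return self.classes_[np.argmin(dists)]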
Non-Adversarial Backdoor Injection:
- Stamp the detected samples whose original and pseudo labels differ. This injects a non-adversarial backdoor.
- Merge the stamped samples back into the training data with their pseudo labels.
Model Training:
- Train the model on the processed dataset with standard procedures. The non-adversarial backdoor is learned alongside the classification task; a minimal training-loop sketch is given below.
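A minimal sketch of this step in PyTorch, to emphasize that no modification to the standard end-to-end pipeline is needed; the architecture, optimizer, and hyperparameters below are illustrative assumptions, not the paper's exact training recipe:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, processed_dataset, epochs=100, lr=0.1, device="cuda"):
    # processed_dataset: a torch Dataset yielding (image tensor, label) pairs
    # built from D_p' after stamping and relabeling.
    loader = DataLoader(processed_dataset, batch_size=128, shuffle=True)
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model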
Testing:
- Stamp all test inputs to keep the non-adversarial backdoor triggered, suppressing the adversarial backdoor.
- An optional filtering technique can further identify poisoned samples by comparing predictions made with and without the stamp; a sketch is given below.
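A minimal sketch of test-time inference with the optional filter, assuming a sample is flagged as poisoned when the predictions with and without the stamp disagree (the paper's exact filtering rule may differ; model and stamp are as in the sketches above):
def predict_with_filter(model, stamp, x):
    # Keep the non-adversarial backdoor triggered for the actual prediction.
    y_stamped = model.predict(stamp(x))
    # A second, unstamped prediction is used only for filtering.
    y_plain = model.predict(x)
    suspected_poisoned = (y_stamped != y_plain)
    return y_stamped, suspected_poisoned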
In summary, the key steps are detecting suspicious samples, injecting a non-adversarial backdoor via stamping and relabeling, and keeping it triggered at test time to defend against backdoor attacks. The method is implemented via data preprocessing and standard model training.
High-Level Pseudo Code
Here is the high-level pseudo code for the Non-Adversarial Backdoor (NAB) method proposed in the paper:
# Input: poisoned dataset D_p, detection rate μ
# Backdoor detection
D_s', D_c' = BACKDOOR_DETECTION(D_p, μ)
# Initialize the processed dataset with the non-suspected samples
D_p' = D_c'
# Poisoning strategy
for (x, y) in D_s':
    # Generate pseudo label
    y' = RELABEL(x)
    if y' != y:
        # Inject non-adversarial backdoor: stamp and relabel
        D_p'.append((STAMP(x), y'))
    else:
        # Keep the sample unchanged
        D_p'.append((x, y))
# Model training on D_p' with the standard pipeline
MODEL = TRAIN(D_p')
# Testing: keep the non-adversarial backdoor triggered
for x in test_data:
    y = MODEL(STAMP(x))
The key steps are:
- Detect suspicious samples in the poisoned dataset
- Relabel the detected samples and stamp those whose pseudo label differs from the given label, injecting the non-adversarial backdoor
- Train the model on the processed dataset with a standard pipeline
- Stamp all test inputs to keep the non-adversarial backdoor triggered
This implements the core idea of using a non-adversarial backdoor on poisoned samples to defend against backdoor attacks.
Detailed Pseudo Code
Here is a more detailed pseudo code to implement the Non-Adversarial Backdoor (NAB) method:
import defenses  # custom defense components (detection, relabeling)
import models    # model architectures

# Hyperparameters
MU = 0.05                 # detection rate
STAMP_PATTERN = [[0, 0],
                 [0, 0]]  # 2x2 black stamp in the top-left corner

# Backdoor detection on the poisoned dataset D_p
detector = defenses.SPECTRE()
D_s', D_c' = detector.detect(D_p, MU)

# Pseudo label generation (nearest class center)
relabeller = defenses.NearestCenter()
relabeller.fit(D_c')      # fit class centers on the retained (non-suspected) samples

# Poisoning strategy
D_p' = D_c'
for x, y in D_s':
    # Generate pseudo label
    y' = relabeller.predict(x)
    if y' != y:
        # Inject non-adversarial backdoor: stamp and relabel
        x[:2, :2] = STAMP_PATTERN
        D_p'.append((x, y'))
    else:
        # Keep the sample unchanged
        D_p'.append((x, y))

# Model training with the standard pipeline
model = models.ResNet18()
model.fit(D_p')

# Testing: keep the non-adversarial backdoor triggered
for x in D_test:
    x[:2, :2] = STAMP_PATTERN
    y = model.predict(x)
Key details:
- Use an existing detection method (e.g. SPECTRE) to obtain suspicious samples
- Generate pseudo labels (e.g. using the nearest class center)
- Stamp and relabel the suspicious samples whose pseudo label differs, injecting the non-adversarial backdoor
- Train the model on the processed dataset using a standard pipeline
- Stamp test inputs to keep the defensive backdoor triggered
This shows a detailed implementation with configurable components like detection and relabeling methods.
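Since the paper reports both clean accuracy and attack success rate, a minimal evaluation sketch for the trained model is given below; it assumes access to a clean test set and a trigger-stamped (attacker-poisoned) test set with the attack's target label, and reuses the stamp function from the earlier sketches (variable names are illustrative):
def evaluate(model, stamp, clean_test, poisoned_test, target_label):
    # Clean accuracy with the defensive stamp applied to every input.
    clean_correct = sum(model.predict(stamp(x)) == y for x, y in clean_test)
    clean_acc = clean_correct / len(clean_test)
    # Attack success rate: fraction of attacker-poisoned inputs still
    # classified as the attack target despite the defensive stamp.
    attack_hits = sum(model.predict(stamp(x)) == target_label
                      for x, _ in poisoned_test)
    asr = attack_hits / len(poisoned_test)
    return clean_acc, asr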