Understanding Adversarial Attacks on Neural Networks
A deep dive into perturbation-based adversarial attacks, their mathematical foundations, and defense mechanisms.
Introduction
Adversarial attacks have become a critical area of research in deep learning security. These attacks demonstrate that even state-of-the-art neural networks can be fooled by carefully crafted perturbations that are often imperceptible to humans.
NOTE
This post covers the mathematical foundations of adversarial attacks. Familiarity with gradient-based optimization is helpful but not required.
WARNING
Adversarial examples can pose real security risks in deployed ML systems. Always consider robustness when deploying models in safety-critical applications.
Mathematical Foundation
The core idea behind adversarial attacks is to find a perturbation $\delta$ that, when added to an input $x$, causes the model to misclassify while keeping the perturbation small:

$$\max_{\delta \,:\, \|\delta\|_\infty \le \epsilon} \mathcal{L}(f(x + \delta), y)$$

where $\mathcal{L}$ is the loss function, $f$ is the neural network, $y$ is the true label, and $\epsilon$ bounds the perturbation magnitude.
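To make the threat model concrete, here is a small PyTorch sketch that checks both conditions at once: the perturbation stays inside the $\ell_\infty$ ball of radius $\epsilon$ and the prediction changes. The `model` and the tensors `x`, `y`, `delta` are hypothetical stand-ins, not code from the thesis.

```python
import torch

def is_valid_adversarial_example(model, x, y, delta, epsilon):
    """Return True if delta stays within the L-infinity budget and flips the prediction."""
    within_budget = delta.abs().max().item() <= epsilon
    with torch.no_grad():
        prediction = model(x + delta).argmax(dim=-1)
    return within_budget and bool((prediction != y).all())
```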
Fast Gradient Sign Method (FGSM)
One of the simplest and most widely used attacks is the Fast Gradient Sign Method (FGSM), introduced by Goodfellow et al. The perturbation is computed as:

$$\delta = \epsilon \cdot \operatorname{sign}\!\left(\nabla_x \mathcal{L}(f(x), y)\right)$$
This single-step attack follows the gradient direction to maximize the loss.
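As a rough illustration, a minimal FGSM sketch in PyTorch might look as follows; it assumes a classifier `model`, cross-entropy loss, and inputs scaled to $[0, 1]$, none of which are specified in the post.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM: move each input component by epsilon in the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]   # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * grad.sign()        # single step that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()    # keep the result in the valid input range
```

Using `torch.autograd.grad` keeps the gradient computation local to the input instead of accumulating gradients on the model parameters.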
Projected Gradient Descent (PGD)
PGD extends FGSM by iteratively applying smaller perturbations:

$$x^{(t+1)} = \Pi_{\mathcal{S}}\!\left(x^{(t)} + \alpha \cdot \operatorname{sign}\!\left(\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)\right)\right)$$

where $\alpha$ is the step size and $\Pi_{\mathcal{S}}$ projects the result back onto the valid perturbation set $\mathcal{S} = \{x' : \|x' - x\|_\infty \le \epsilon\}$.
PGD Algorithm
The complete PGD attack algorithm can be expressed as follows:
Algorithm 1: Projected Gradient Descent Attack
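A sketch of a standard $\ell_\infty$ PGD loop in PyTorch, under the same assumptions as the FGSM example; the default step size, the number of iterations, and the random start are illustrative choices rather than a specific reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha=None, steps=10, random_start=True):
    """Iterative L-infinity PGD: repeated FGSM-style steps, each followed by a
    projection back into the epsilon-ball around the original input."""
    alpha = alpha if alpha is not None else epsilon / 4   # illustrative default step size
    x_adv = x.clone().detach()
    if random_start:
        # Start from a random point inside the epsilon-ball
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step, then projection onto the epsilon-ball and the valid input range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```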
Defense Mechanisms
Several defense strategies have been proposed:
- Adversarial Training: Augmenting training data with adversarial examples
- Input Preprocessing: Applying transformations to remove perturbations (a short sketch follows this list)
- Certified Defenses: Providing provable robustness guarantees
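As one deliberately simple example of input preprocessing, the sketch below applies bit-depth reduction (a form of feature squeezing) before classification. The `model` is again a hypothetical classifier, and preprocessing of this kind is not a robust defense on its own against adaptive attacks.

```python
import torch

def reduce_bit_depth(x, bits=4):
    """Quantize inputs in [0, 1] to 2**bits levels, squeezing out small perturbations."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def defended_predict(model, x):
    """Classify after preprocessing the input."""
    with torch.no_grad():
        return model(reduce_bit_depth(x)).argmax(dim=-1)
```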
Adversarial Training Algorithm
The standard adversarial training procedure can be formalized as:
Algorithm 2: Adversarial Training
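A minimal sketch of one epoch of PGD-based adversarial training, reusing the `pgd_attack` sketch above and assuming a standard PyTorch `model`, `optimizer`, and `train_loader`; the hyperparameters ($\epsilon = 8/255$, 7 PGD steps) are common illustrative choices, not values from the thesis.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, epsilon=8 / 255):
    """One epoch of adversarial training: craft adversarial examples on the fly
    and update the model on them."""
    model.train()
    for x, y in train_loader:
        # Inner maximization: generate adversarial examples against the current model
        x_adv = pgd_attack(model, x, y, epsilon=epsilon, steps=7)
        # Outer minimization: train on the adversarial batch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Training on adversarial batches only is one common variant; mixing clean and adversarial examples in each batch is another.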
Conclusion
Understanding adversarial attacks is crucial for deploying neural networks in safety-critical applications. The mathematical framework provides both insight into vulnerabilities and guidance for building more robust models.
This post is based on my thesis research at Politecnico di Milano on adversarial robustness of deep neural networks.