Understanding Adversarial Attacks on Neural Networks
A deep dive into perturbation-based adversarial attacks, their mathematical foundations, and defense mechanisms.
Introduction
Adversarial attacks have become a critical area of research in deep learning security. These attacks demonstrate that even state-of-the-art neural networks can be fooled by carefully crafted perturbations that are often imperceptible to humans.
NOTE
This post covers the mathematical foundations of adversarial attacks. Familiarity with gradient-based optimization is helpful but not required.
WARNING
Adversarial examples can pose real security risks in deployed ML systems. Always consider robustness when deploying models in safety-critical applications.
Mathematical Foundation
The core idea behind adversarial attacks is to find a perturbation $\delta$ that, when added to an input $x$, causes the model to misclassify while keeping the perturbation small:

$$\max_{\delta \,:\, \|\delta\|_\infty \le \epsilon} \mathcal{L}(f(x + \delta), y)$$

where $\mathcal{L}$ is the loss function, $f$ is the neural network, $y$ is the true label, and $\epsilon$ bounds the perturbation magnitude.
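To make the threat model concrete, here is a small PyTorch sketch that checks both conditions at once: the perturbation stays inside the $\ell_\infty$ ball of radius $\epsilon$ and the prediction changes. The `model` and the tensors `x`, `y`, `delta` are hypothetical stand-ins, not code from the thesis.

```python
import torch

def is_valid_adversarial_example(model, x, y, delta, epsilon):
    """Return True if delta stays within the L-infinity budget and flips the prediction."""
    within_budget = delta.abs().max().item() <= epsilon
    with torch.no_grad():
        prediction = model(x + delta).argmax(dim=-1)
    return within_budget and bool((prediction != y).all())
```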
Fast Gradient Sign Method (FGSM)
One of the simplest and most widely used attacks is the Fast Gradient Sign Method (FGSM), introduced by Goodfellow et al. The perturbation is computed as:

$$\delta = \epsilon \cdot \operatorname{sign}\!\left(\nabla_x \mathcal{L}(f(x), y)\right)$$
This single-step attack follows the gradient direction to maximize the loss.
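As a rough illustration, a minimal FGSM sketch in PyTorch might look as follows; it assumes a classifier `model`, cross-entropy loss, and inputs scaled to $[0, 1]$, none of which are specified in the post.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM: move each input component by epsilon in the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]   # gradient of the loss w.r.t. the input
    x_adv = x + epsilon * grad.sign()        # single step that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()    # keep the result in the valid input range
```

Using `torch.autograd.grad` keeps the gradient computation local to the input instead of accumulating gradients on the model parameters.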
Projected Gradient Descent (PGD)
PGD extends FGSM by iteratively applying smaller perturbations:

$$x^{(t+1)} = \Pi_{\mathcal{S}}\!\left(x^{(t)} + \alpha \cdot \operatorname{sign}\!\left(\nabla_x \mathcal{L}\big(f(x^{(t)}), y\big)\right)\right)$$

where $\alpha$ is the step size and $\Pi_{\mathcal{S}}$ projects the result back onto the valid perturbation set $\mathcal{S} = \{x' : \|x' - x\|_\infty \le \epsilon\}$.
PGD Algorithm
The complete PGD attack algorithm can be expressed as follows:
Algorithm 1: Projected Gradient Descent Attack
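A sketch of a standard $\ell_\infty$ PGD loop in PyTorch, under the same assumptions as the FGSM example; the default step size, the number of iterations, and the random start are illustrative choices rather than a specific reference implementation.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha=None, steps=10, random_start=True):
    """Iterative L-infinity PGD: repeated FGSM-style steps, each followed by a
    projection back into the epsilon-ball around the original input."""
    alpha = alpha if alpha is not None else epsilon / 4   # illustrative default step size
    x_adv = x.clone().detach()
    if random_start:
        # Start from a random point inside the epsilon-ball
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-epsilon, epsilon)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step, then projection onto the epsilon-ball and the valid input range
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()
```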
Defense Mechanisms
Several defense strategies have been proposed:
- Adversarial Training: Augmenting training data with adversarial examples
- Input Preprocessing: Applying transformations to remove perturbations (a short sketch follows this list)
- Certified Defenses: Providing provable robustness guarantees
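As one deliberately simple example of input preprocessing, the sketch below applies bit-depth reduction (a form of feature squeezing) before classification. The `model` is again a hypothetical classifier, and preprocessing of this kind is not a robust defense on its own against adaptive attacks.

```python
import torch

def reduce_bit_depth(x, bits=4):
    """Quantize inputs in [0, 1] to 2**bits levels, squeezing out small perturbations."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

def defended_predict(model, x):
    """Classify after preprocessing the input."""
    with torch.no_grad():
        return model(reduce_bit_depth(x)).argmax(dim=-1)
```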
Adversarial Training Algorithm
The standard adversarial training procedure can be formalized as:
Algorithm 2: Adversarial Training
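A minimal sketch of one epoch of PGD-based adversarial training, reusing the `pgd_attack` sketch above and assuming a standard PyTorch `model`, `optimizer`, and `train_loader`; the hyperparameters ($\epsilon = 8/255$, 7 PGD steps) are common illustrative choices, not values from the thesis.

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, train_loader, optimizer, epsilon=8 / 255):
    """One epoch of adversarial training: craft adversarial examples on the fly
    and update the model on them."""
    model.train()
    for x, y in train_loader:
        # Inner maximization: generate adversarial examples against the current model
        x_adv = pgd_attack(model, x, y, epsilon=epsilon, steps=7)
        # Outer minimization: train on the adversarial batch
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

Training on adversarial batches only is one common variant; mixing clean and adversarial examples in each batch is another.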
Conclusion
Understanding adversarial attacks is crucial for deploying neural networks in safety-critical applications. The mathematical framework provides both insight into vulnerabilities and guidance for building more robust models.
This post is based on my thesis research at Politecnico di Milano on adversarial robustness of deep neural networks.