# Adding Adversarial Attacks
NOTE
The development guidelines explain how to get started with developing features and adversarial attacks for Foolbox.
# The `Attack` base class

Adversarial attacks in Foolbox should either directly or indirectly subclass the `Attack` base class in `foolbox/attacks/base.py`.
An attack in Foolbox needs to implement two methods, `__call__` and `repeat`. Both methods need to be implemented with the same signature as the base class. The type annotation for the `criterion` argument of `__call__` can be made more precise, see `foolbox/attacks/carlini_wagner.py` for an example.
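For orientation, here is a simplified sketch of what this abstract interface roughly looks like; check `foolbox/attacks/base.py` for the authoritative signatures and type aliases, which may differ slightly between Foolbox versions:

```python
from abc import ABC, abstractmethod
from typing import Any, List, Sequence, Tuple, TypeVar, Union

T = TypeVar("T")  # stands in for the (native or EagerPy-wrapped) tensor type


class Attack(ABC):
    # Simplified sketch of the interface in foolbox/attacks/base.py.
    @abstractmethod
    def __call__(
        self,
        model: Any,  # foolbox.Model in the real code
        inputs: T,
        criterion: Any,
        *,
        epsilons: Union[Sequence[Union[float, None]], float, None],
        **kwargs: Any,
    ) -> Union[Tuple[List[T], List[T], T], Tuple[T, T, T]]:
        ...

    @abstractmethod
    def repeat(self, times: int) -> "Attack":
        ...
```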
The `__call__` method should return three values: a list of raw tensors (one for each epsilon) with the internal raw attack results, a list of tensors corresponding to the raw tensors but with perturbation sizes guaranteed to be clipped to the given epsilons, and a boolean tensor with `len(epsilons)` rows and `len(inputs)` columns indicating for each returned sample whether it is a successful adversarial example given the respective epsilon and criterion. If `epsilons` is a single scalar epsilon (and not a list with one element), then the first and second return values should be a tensor rather than a list and the third return value should be a 1-D tensor. All returned tensors must have the same type as the input tensors. In particular, native tensors should be returned as native tensors and EagerPy-wrapped tensors should be returned as EagerPy-wrapped tensors. Use `astensor_` or `astensors_` and `restore_type`.
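A minimal sketch of this conversion pattern, modeled on how the existing attacks use EagerPy (the attack logic here is only a placeholder; `ep.astensor_` and the returned `restore_type` function are the real helpers):

```python
import eagerpy as ep


def sketch_call(model, inputs, criterion, *, epsilons):
    # Convert the native tensor (PyTorch, TensorFlow, JAX, NumPy) into an
    # EagerPy tensor; restore_type converts results back to the input type.
    x, restore_type = ep.astensor_(inputs)
    del inputs  # work only with the EagerPy tensor from here on

    # Placeholder results: one raw and one clipped tensor per epsilon plus a
    # boolean success tensor with shape (len(epsilons), batch size).
    raw = [x for _ in epsilons]
    clipped = [x for _ in epsilons]
    success = ep.zeros(x, (len(epsilons), len(x))) == 1  # all False placeholder

    # Return everything in the same type as the original inputs.
    return (
        [restore_type(r) for r in raw],
        [restore_type(c) for c in clipped],
        restore_type(success),
    )
```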
The `repeat` method should return a version of the attack that repeats itself n times and returns the best result.
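For example (using `LinfPGD` purely for illustration):

```python
import foolbox as fb

attack = fb.attacks.LinfPGD()
# A new attack object that runs LinfPGD three times per input
# and keeps the best result.
repeated_attack = attack.repeat(3)
```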
NOTE
In practice, it is usually not necessary to subclass `Attack` directly. Instead, for most attacks it is easiest to subclass either `FixedEpsilonAttack` or `MinimizationAttack`.
# The `FixedEpsilonAttack` base class
Attacks that require a fixed epsilon and try to find an adversarial perturbation given this perturbation budget (e.g. FGSM and PGD) should be implemented by subclassing `FixedEpsilonAttack`. It already provides implementations of `__call__` and `repeat`. The attack just needs to specify the `distance` property (simply assign a class variable) and implement the `run` method that gets a single `epsilon` and returns a batch of perturbed inputs, ideally adversarial and ideally with a perturbation size smaller than or equal to `epsilon`.
The `distance` is used by `__call__` to determine which perturbed inputs are actually adversarial given `epsilon` and by `repeat` to determine the best run.
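Below is a hedged sketch of such a subclass. The class name and its noise-based logic are made up for illustration; only the `FixedEpsilonAttack` interface, the `distance` class attribute, and the `run` signature mirror the conventions of the existing attacks:

```python
import eagerpy as ep
from foolbox.attacks.base import FixedEpsilonAttack
from foolbox.distances import linf


class UniformNoiseSketchAttack(FixedEpsilonAttack):
    """Illustrative only: perturbs inputs with uniform noise of size epsilon."""

    # Used by __call__ to check which results respect the budget
    # and by repeat to pick the best run.
    distance = linf

    def run(self, model, inputs, criterion, *, epsilon, **kwargs):
        x, restore_type = ep.astensor_(inputs)
        # Random perturbation within the Linf ball of radius epsilon,
        # clipped to the model's input bounds.
        noise = ep.uniform(x, x.shape, -epsilon, epsilon)
        min_, max_ = model.bounds
        xp = ep.clip(x + noise, min_, max_)
        return restore_type(xp)
```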
# The `MinimizationAttack` base class
Attacks that try to find adversarial examples with minimal perturbation size (e.g. the Carlini & Wagner attack or the Boundary Attack) should be implemented by subclassing `MinimizationAttack`. It already provides implementations of `__call__` and `repeat`. The attack just needs to specify the `distance` property (simply assign a class variable) and implement the `run` method that returns a batch of minimally perturbed adversarials. For `MinimizationAttack` subclasses, `run` gets called only once by `__call__`, independent of how many `epsilons` are given. The `__call__` method then compares the minimal adversarial perturbations to the different epsilons.
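A rough sketch of such a subclass follows. The class name and the blending heuristic are invented for illustration; the `MinimizationAttack` interface, the `distance` attribute, the `run` signature, and the `get_criterion` / `get_is_adversarial` / `atleast_kd` helpers are assumed to match what the existing attacks import, so verify them against the current source:

```python
import eagerpy as ep
from foolbox.attacks.base import (
    MinimizationAttack,
    get_criterion,
    get_is_adversarial,
)
from foolbox.devutils import atleast_kd
from foolbox.distances import l2


class MeanBlendSketchAttack(MinimizationAttack):
    """Illustrative only: blends each input towards the batch mean and keeps
    the smallest blend that is still adversarial."""

    distance = l2  # used by __call__ to compare the result to each epsilon

    def __init__(self, steps: int = 100):
        self.steps = steps

    def run(self, model, inputs, criterion, *, early_stop=None, **kwargs):
        x, restore_type = ep.astensor_(inputs)
        criterion_ = get_criterion(criterion)
        is_adversarial = get_is_adversarial(criterion_, model)

        target = x.mean(axis=0, keepdims=True)  # crude stand-in direction
        best = x  # falls back to the unperturbed input
        found = is_adversarial(x)  # clean inputs may already be adversarial

        # Search from small to large perturbations and keep, per sample,
        # the first (i.e. minimal) blend that is adversarial.
        for step in range(1, self.steps + 1):
            alpha = step / self.steps
            candidate = (1 - alpha) * x + alpha * target
            is_adv = is_adversarial(candidate)
            new = ep.logical_and(is_adv, ep.logical_not(found))
            best = ep.where(atleast_kd(new, x.ndim), candidate, best)
            found = ep.logical_or(found, is_adv)

        return restore_type(best)
```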
TIP
You should have a look at the implementation of existing attacks to get an impression of the best practices and conventions used in Foolbox.