Florian Tramèr and Dan Boneh
Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight Presentation)
Defenses against adversarial examples, such as adversarial training, are typically tailored to a single perturbation type (e.g., small Linf-noise). For other perturbations, these defenses offer no guarantees and, at times, even increase the model’s vulnerability. Our aim is to understand the reasons underlying this robustness trade-off, and to train models that are simultaneously robust to multiple perturbation types. We prove that a trade-off in robustness to different types of Lp-bounded and spatial perturbations must exist in a natural and simple statistical setting. We corroborate our formal analysis by demonstrating similar robustness trade-offs on MNIST and CIFAR10. Building upon new multi-perturbation adversarial training schemes, and a novel efficient attack for finding L1-bounded adversarial examples, we show that no model trained against multiple attacks achieves robustness competitive with that of models trained on each attack individually. In particular, we uncover a pernicious gradient-masking phenomenon on MNIST, which causes adversarial training with first-order Linf, L1 and L2 adversaries to achieve merely 50% accuracy. Our results question the viability and computational scalability of extending adversarial robustness, and adversarial training, to multiple perturbation types.
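The multi-perturbation adversarial training referenced in the abstract can be illustrated with a minimal PyTorch-style sketch of the "max" strategy (train each example against whichever attack produces the highest loss; the paper also studies an "avg" variant that combines the losses of all attacks). The helper names (pgd_linf, pgd_l1, pgd_l2), the epsilon values, and the training-step structure below are illustrative assumptions, not the authors' released code.

# Minimal sketch (assumed, not the authors' implementation) of the "max"
# multi-perturbation adversarial training strategy: each example is trained
# on the adversarial version, over all attacks, that yields the highest loss.
import torch
import torch.nn.functional as F

def multi_perturbation_step(model, x, y, attacks, optimizer):
    """One training step against a list of attack callables attack(model, x, y)."""
    model.eval()
    per_attack_losses, per_attack_examples = [], []
    for attack in attacks:
        x_adv = attack(model, x, y)                  # craft adversarial examples
        with torch.no_grad():
            loss = F.cross_entropy(model(x_adv), y, reduction="none")
        per_attack_losses.append(loss)
        per_attack_examples.append(x_adv)

    # Pick, for each example, the adversarial version with the highest loss.
    losses = torch.stack(per_attack_losses)          # (num_attacks, batch)
    worst = losses.argmax(dim=0)                     # strongest attack per example
    stacked = torch.stack(per_attack_examples)       # (num_attacks, batch, ...)
    x_worst = stacked[worst, torch.arange(x.size(0))]

    model.train()
    optimizer.zero_grad()
    train_loss = F.cross_entropy(model(x_worst), y)
    train_loss.backward()
    optimizer.step()
    return train_loss.item()

# Example usage with hypothetical PGD attack implementations and budgets:
# attacks = [lambda m, x, y: pgd_linf(m, x, y, eps=0.3),
#            lambda m, x, y: pgd_l1(m, x, y, eps=10.0),
#            lambda m, x, y: pgd_l2(m, x, y, eps=2.0)]
# loss = multi_perturbation_step(model, images, labels, attacks, optimizer)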
@inproceedings{TB19b,
  author       = {Tram{\`e}r, Florian and Boneh, Dan},
  title        = {Adversarial Training and Robustness for Multiple Perturbations},
  booktitle    = {Conference on Neural Information Processing Systems (NeurIPS)},
  pages        = {5866--5876},
  year         = {2019},
  howpublished = {arXiv preprint arXiv:1904.13000},
  url          = {https://arxiv.org/abs/1904.13000}
}