**Florian Tramèr**

*ICML Workshop on Adversarial Machine Learning (AdvML), 2021.*

Best Paper Award.

- Slides: PDF PPTx
- arXiv: 2107.11630
- Video: ICML 2021

Making classifiers robust to adversarial examples is hard. Thus, many defenses tackle the seemingly easier task of detecting perturbed inputs.

We show a barrier towards this goal. We prove a general hardness reduction between detection and classification of adversarial examples: given a robust detector for attacks at distance ε (in some metric), we can build a similarly robust (but inefficient) classifier for attacks at distance ε/2.
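The reduction can be illustrated with a brute-force sketch over a toy discrete input space (binary strings under Hamming distance). All names here (`detector_to_classifier`, `ball`, the toy detector and classifier) are illustrative assumptions, not the paper's code: the idea is simply that a classifier can search the ε/2-ball around its input for any point the robust detector accepts, and return that point's label.

```python
import itertools

def hamming(a, b):
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def ball(x, r):
    """All binary strings within Hamming distance r of x (brute force)."""
    for bits in itertools.product("01", repeat=len(x)):
        cand = "".join(bits)
        if hamming(x, cand) <= r:
            yield cand

def detector_to_classifier(classify, detect, eps):
    """Hypothetical sketch of the hardness reduction: a (classifier,
    detector) pair robust at distance eps yields an *inefficient*
    classifier robust at distance eps/2."""
    def robust_classify(x):
        # Search the eps/2-ball around x for a point the detector accepts.
        for cand in ball(x, eps // 2):
            if not detect(cand):       # detector says cand is clean
                # By the triangle inequality, cand lies within eps of the
                # true clean point, so robust detection implies the
                # classifier's label on cand is correct.
                return classify(cand)
        return None                    # the whole ball was flagged

    return robust_classify

# Toy setup: two clean points, labels 0 and 1; majority-vote classifier;
# detector flags anything farther than eps=2 from every clean point.
clean = {"000000": 0, "111111": 1}
classify = lambda x: int(x.count("1") > len(x) // 2)
detect = lambda x: min(hamming(x, c) for c in clean) > 2
f = detector_to_classifier(classify, detect, eps=2)
```

The ball enumeration is exponential in the input length, which is exactly why the reduction is a sanity check rather than a practical defense.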

Our reduction is computationally inefficient, and thus cannot be used to build practical classifiers. Instead, it is a useful sanity check: it tests whether empirical detection results would imply classification robustness far stronger than the authors presumably anticipated.

To illustrate, we revisit 13 detector defenses. For 11/13 cases, we show that the claimed detection results would imply an inefficient classifier with robustness far beyond the state-of-the-art.

@inproceedings{Tra21,
  author       = {Tram{\`e}r, Florian},
  title        = {Detecting Adversarial Examples Is (Nearly) As Hard As Classifying Them},
  booktitle    = {ICML Workshop on Adversarial Machine Learning (AdvML)},
  year         = {2021},
  howpublished = {arXiv preprint arXiv:2107.11630},
  url          = {https://arxiv.org/abs/2107.11630}
}