A Bayes-optimal view on adversarial examples. (English) Zbl 07626736

Summary: Since the discovery of adversarial examples (the ability to fool modern CNN classifiers with tiny perturbations of the input), there has been much discussion about whether they are a “bug” specific to current neural architectures and training methods or an inevitable “feature” of high-dimensional geometry. In this paper, we argue for examining adversarial examples from the perspective of Bayes-optimal classification. We construct realistic image datasets for which the Bayes-optimal classifier can be computed efficiently, and we derive analytic conditions on the distributions under which these classifiers are provably robust against any adversarial attack, even in high dimensions. Our results show that even when these “gold standard” optimal classifiers are robust, CNNs trained on the same datasets consistently learn a vulnerable classifier, indicating that adversarial examples are often an avoidable “bug”. We further show that RBF SVMs trained on the same data consistently learn a robust classifier. The same trend is observed in experiments with real images from several datasets.
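The comparison against a “gold standard” classifier can be illustrated with a toy sketch (not the authors' code or datasets): when the class-conditional distributions are known, e.g. two Gaussians with known means and covariance, the Bayes-optimal rule is simply the log-likelihood-ratio decision, and a learned classifier such as an RBF SVM can be trained on samples from the same distributions for comparison. The dimension, means, and covariance below are illustrative assumptions, not the image models used in the paper.

```python
# Minimal sketch: Bayes-optimal classification for two known Gaussian
# class-conditionals vs. an RBF SVM fit on samples (illustrative assumptions).
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 20                                    # ambient dimension (assumed)
mu0, mu1 = np.zeros(d), np.full(d, 0.5)   # class means (assumed)
cov = np.eye(d)                           # shared identity covariance (assumed)

# Draw a training set from the two classes with equal priors.
n = 500
X = np.vstack([rng.multivariate_normal(mu0, cov, n),
               rng.multivariate_normal(mu1, cov, n)])
y = np.r_[np.zeros(n), np.ones(n)]

def bayes_optimal(x):
    """Predict the class with the larger posterior (equal priors assumed)."""
    ll0 = multivariate_normal.logpdf(x, mu0, cov)
    ll1 = multivariate_normal.logpdf(x, mu1, cov)
    return (ll1 > ll0).astype(int)

svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Evaluate both classifiers on fresh test samples.
Xt = np.vstack([rng.multivariate_normal(mu0, cov, n),
                rng.multivariate_normal(mu1, cov, n)])
yt = np.r_[np.zeros(n), np.ones(n)]
print("Bayes-optimal accuracy:", np.mean(bayes_optimal(Xt) == yt))
print("RBF SVM accuracy:      ", np.mean(svm.predict(Xt) == yt))
```

In this Gaussian case the Bayes-optimal decision boundary is the hyperplane equidistant from the two means, so its distance to typical points (and hence its robustness to bounded perturbations) can be read off analytically; probing the SVM's robustness would additionally require an adversarial attack, which is outside the scope of this sketch.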

MSC:

68T05 Learning and adaptive systems in artificial intelligence
