All-in-one robust estimator of the Gaussian mean. (English) Zbl 1486.62159

Summary: The goal of this paper is to show that a single robust estimator of the mean of a multivariate Gaussian distribution can enjoy five desirable properties. First, it is computationally tractable, in the sense that it can be computed in time at most polynomial in the dimension, the sample size and the logarithm of the inverse contamination rate. Second, it is equivariant under translations, uniform scaling and orthogonal transformations. Third, it has a high breakdown point equal to 0.5, and a nearly-minimax-rate breakdown point approximately equal to 0.28. Fourth, it is minimax rate optimal, up to a logarithmic factor, when the data consist of independent observations corrupted by adversarially chosen outliers. Fifth, it is asymptotically efficient when the rate of contamination tends to zero. The estimator is obtained by an iterative reweighting approach: each sample point is assigned a weight that is iteratively updated by solving a convex optimization problem. We also establish a dimension-free nonasymptotic risk bound for the expected error of the proposed estimator; it is the first result of this kind in the literature and involves only the effective rank of the covariance matrix. Finally, we show that the obtained results extend to sub-Gaussian distributions, as well as to the cases of unknown contamination rate or unknown covariance matrix.
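The iterative-reweighting idea described above can be sketched in a few lines. The toy version below downweights points by their distance to the current weighted mean; this is an illustration only, not the paper's estimator, which instead updates the weights by solving a convex (spectral) optimization problem over the probability simplex. The names `reweighted_mean`, `n_iter` and `eps` are hypothetical.

```python
import numpy as np

def reweighted_mean(X, n_iter=20, eps=0.1):
    """Toy iteratively reweighted mean (NOT the paper's estimator).

    Each point's weight shrinks with its distance to the current weighted
    mean; the estimator studied in the paper instead obtains the weights
    as the solution of a convex program over the probability simplex.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # start from uniform weights
    for _ in range(n_iter):
        mu = w @ X                       # current weighted mean
        dist = np.linalg.norm(X - mu, axis=1)
        # softly downweight the roughly eps*n most outlying points
        cutoff = np.quantile(dist, 1.0 - eps)
        w = np.where(dist <= cutoff, 1.0, cutoff / dist)
        w /= w.sum()
    return w @ X
```

On data with a small fraction of adversarial outliers, such a scheme pulls the estimate much closer to the true mean than the ordinary sample mean, which is the qualitative behavior the paper quantifies.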

MSC:

62H12 Estimation in multivariate analysis
62F35 Robustness and adaptive procedures (parametric inference)

Software:

AWS; robustbase

References:

[1] BALAKRISHNAN, S., DU, S. S., LI, J. and SINGH, A. (2017). Computationally efficient robust sparse estimation in high dimensions. In COLT 2017 169-212.
[2] BATENI, A.-H. and DALALYAN, A. S. (2020). Confidence regions and minimax rates in outlier-robust estimation on the probability simplex. Electron. J. Stat. 14 2653-2677. · Zbl 1447.62031 · doi:10.1214/20-EJS1731
[3] Cai, T. T. and Li, X. (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Statist. 43 1027-1059. · Zbl 1328.62381 · doi:10.1214/14-AOS1290
[4] Cannings, T. I., Fan, Y. and Samworth, R. J. (2020). Classification with imperfect training labels. Biometrika 107 311-330. · Zbl 1441.62165 · doi:10.1093/biomet/asaa011
[5] CHEN, M., GAO, C. and REN, Z. (2016). A general decision theory for Huber’s \(\epsilon \)-contamination model. Electron. J. Stat. 10 3752-3774. · Zbl 1357.62038 · doi:10.1214/16-EJS1216
[6] Chen, M., Gao, C. and Ren, Z. (2018). Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist. 46 1932-1960. · Zbl 1408.62104 · doi:10.1214/17-AOS1607
[7] CHENG, Y., DIAKONIKOLAS, I. and GE, R. (2019). High-dimensional robust mean estimation in nearly-linear time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms 2755-2771. SIAM, Philadelphia, PA. · Zbl 1432.68615 · doi:10.1137/1.9781611975482.171
[8] CHERAPANAMJERI, Y., FLAMMARION, N. and BARTLETT, P. L. (2019). Fast mean estimation with sub-Gaussian rates. In Proceedings of COLT. Proceedings of Machine Learning Research 99 786-806. PMLR.
[9] CHINOT, G. (2020). ERM and RERM are optimal estimators for regression problems when malicious outliers corrupt the labels. Electron. J. Stat. 14 3563-3605. · Zbl 1453.62484 · doi:10.1214/20-EJS1754
[10] COLLIER, O. and DALALYAN, A. S. (2019). Multidimensional linear functional estimation in sparse Gaussian models and robust estimation of the mean. Electron. J. Stat. 13 2830-2864. · Zbl 1432.62203 · doi:10.1214/19-EJS1590
[11] Comminges, L. and Dalalyan, A. S. (2012). Tight conditions for consistency of variable selection in the context of high dimensionality. Ann. Statist. 40 2667-2696. · Zbl 1373.62154 · doi:10.1214/12-AOS1046
[12] COMMINGES, L., COLLIER, O., NDAOUD, M. and TSYBAKOV, A. B. (2021). Adaptive robust estimation in sparse vector model. Ann. Statist. 49 1347-1377. · Zbl 1476.62063 · doi:10.1214/20-aos2002
[13] COX, B., JUDITSKY, A. and NEMIROVSKI, A. (2014). Dual subgradient algorithms for large-scale nonsmooth learning problems. Math. Program. 148 143-180. · Zbl 1305.65149 · doi:10.1007/s10107-013-0725-1
[14] DALALYAN, A. S. and MINASYAN, A. (2022). Supplement to “All-In-One Robust Estimator of the Gaussian Mean.” https://doi.org/10.1214/21-AOS2145SUPP
[15] DALALYAN, A. and THOMPSON, P. (2019). Outlier-robust estimation of a sparse linear model using \(\ell_1\)-penalized Huber’s M-estimator. In NeurIPS 32 13188-13198.
[16] DEPERSIN, J. and LECUÉ, G. (2019). Robust subgaussian estimation of a mean vector in nearly linear time. Available at arXiv:1906.03058. · Zbl 1486.62077
[17] DEVROYE, L., LERASLE, M., LUGOSI, G. and OLIVEIRA, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44 2695-2725. · Zbl 1360.62115 · doi:10.1214/16-AOS1440
[18] DIAKONIKOLAS, I. and KANE, D. M. (2019). Recent advances in algorithmic high-dimensional robust statistics. CoRR. Available at arXiv:1911.05911.
[19] DIAKONIKOLAS, I., KANE, D. M. and STEWART, A. (2017). Statistical query lower bounds for robust estimation of high-dimensional Gaussians and Gaussian mixtures (extended abstract). In 58th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2017 73-84. IEEE Computer Soc., Los Alamitos, CA. · doi:10.1109/FOCS.2017.16
[20] Diakonikolas, I., Kamath, G., Kane, D. M., Li, J., Moitra, A. and Stewart, A. (2016). Robust estimators in high dimensions without the computational intractability. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 655-664. IEEE Computer Soc., Los Alamitos, CA. · Zbl 1421.68149 · doi:10.1109/FOCS.2016.85
[21] DIAKONIKOLAS, I., KAMATH, G., KANE, D. M., LI, J., MOITRA, A. and STEWART, A. (2018). Robustly learning a Gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms 2683-2702. SIAM, Philadelphia, PA. · Zbl 1403.68185 · doi:10.1137/1.9781611975031.171
[22] DIAKONIKOLAS, I., KAMATH, G., KANE, D., LI, J., MOITRA, A. and STEWART, A. (2019a). Robust estimators in high-dimensions without the computational intractability. SIAM J. Comput. 48 742-864. · Zbl 1421.68149 · doi:10.1137/17M1126680
[23] DIAKONIKOLAS, I., KANE, D., KARMALKAR, S., PRICE, E. and STEWART, A. (2019b). Outlier-robust high-dimensional sparse estimation via iterative filtering. In NeurIPS 2019 10688-10699.
[24] DONG, Y., HOPKINS, S. B. and LI, J. (2019). Quantum entropy scoring for fast robust mean estimation and improved outlier detection. In NeurIPS 2019 6065-6075.
[25] DONOHO, D. (1982). Breakdown properties of multivariate location estimators. PhD thesis, Harvard Univ.
[26] DONOHO, D. L. and GASKO, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist. 20 1803-1827. · Zbl 0776.62031 · doi:10.1214/aos/1176348890
[27] Donoho, D. and Huber, P. J. (1983). The notion of breakdown point. In A Festschrift for Erich L. Lehmann. Wadsworth Statist./Probab. Ser. 157-184. Wadsworth, Belmont, CA. · Zbl 0523.62032
[28] ELSENER, A. and VAN DE GEER, S. (2018). Robust low-rank matrix estimation. Ann. Statist. 46 3481-3509. · Zbl 1412.62068 · doi:10.1214/17-AOS1666
[29] GAO, C. (2020). Robust regression via multivariate regression depth. Bernoulli 26 1139-1170. · Zbl 1466.62368 · doi:10.3150/19-BEJ1144
[30] HAMPEL, F. R. (1968). Contributions to the theory of robust estimation. PhD thesis, Univ. California, Berkeley.
[31] HOPKINS, S. B. (2018). Sub-Gaussian mean estimation in polynomial time. CoRR. Available at arXiv:1809.07425.
[32] Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics, 2nd ed. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. · Zbl 1276.62022 · doi:10.1002/9780470434697
[33] Klopp, O., Lounici, K. and Tsybakov, A. B. (2017). Robust matrix completion. Probab. Theory Related Fields 169 523-564. · Zbl 1383.62167 · doi:10.1007/s00440-016-0736-y
[34] Koltchinskii, V. and Lounici, K. (2017). Concentration inequalities and moment bounds for sample covariance operators. Bernoulli 23 110-133. · Zbl 1366.60057 · doi:10.3150/15-BEJ730
[35] LAI, K. A., RAO, A. B. and VEMPALA, S. (2016). Agnostic estimation of mean and covariance. In 57th Annual IEEE Symposium on Foundations of Computer Science—FOCS 2016 665-674. IEEE Computer Soc., Los Alamitos, CA.
[36] LECUÉ, G. and LERASLE, M. (2019). Learning from MOM’s principles: Le Cam’s approach. Stochastic Process. Appl. 129 4385-4410. · Zbl 1435.62175 · doi:10.1016/j.spa.2018.11.024
[37] Lecué, G. and Lerasle, M. (2020). Robust machine learning by median-of-means: Theory and practice. Ann. Statist. 48 906-931. · Zbl 1487.62034
[38] Lepski, O. V. and Spokoiny, V. G. (1997). Optimal pointwise adaptive methods in nonparametric estimation. Ann. Statist. 25 2512-2546. · Zbl 0894.62041 · doi:10.1214/aos/1030741083
[39] LEPSKII, O. V. (1992). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 682-697. · Zbl 0776.62039
[40] LI, A. H. and BRADIC, J. (2018). Boosting in the presence of outliers: Adaptive classification with nonconvex loss functions. J. Amer. Statist. Assoc. 113 660-674. · Zbl 1398.62167 · doi:10.1080/01621459.2016.1273116
[41] Loh, P.-L. (2017). Statistical consistency and asymptotic normality for high-dimensional robust \(M\)-estimators. Ann. Statist. 45 866-896. · Zbl 1371.62023 · doi:10.1214/16-AOS1471
[42] LOPUHAÄ, H. P. and ROUSSEEUW, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. Ann. Statist. 19 229-248. · Zbl 0733.62058 · doi:10.1214/aos/1176347978
[43] LUGOSI, G. and MENDELSON, S. (2019). Near-optimal mean estimators with respect to general norms. Probab. Theory Related Fields 175 957-973. · Zbl 1431.62234 · doi:10.1007/s00440-019-00906-4
[44] LUGOSI, G. and MENDELSON, S. (2020). Risk minimization by median-of-means tournaments. J. Eur. Math. Soc. (JEMS) 22 925-965. · Zbl 1436.62312 · doi:10.4171/jems/937
[45] LUGOSI, G. and MENDELSON, S. (2021). Robust multivariate mean estimation: The optimality of trimmed mean. Ann. Statist. 49 393-410. · Zbl 1461.62069 · doi:10.1214/20-AOS1961
[46] Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley Series in Probability and Statistics. Wiley, Chichester. · Zbl 1094.62040 · doi:10.1002/0470010940
[47] Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann. Statist. 46 2871-2903. · Zbl 1418.62235 · doi:10.1214/17-AOS1642
[48] POLZEHL, J. and SPOKOINY, V. G. (2000). Adaptive weights smoothing with applications to image restoration. J. R. Stat. Soc. Ser. B. Stat. Methodol. 62 335-354. · doi:10.1111/1467-9868.00235
[49] Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc. 79 871-880. · Zbl 0547.62046
[50] ROUSSEEUW, P. (1985). Multivariate estimation with high breakdown point. In Mathematical Statistics and Applications, Vol. B (Bad Tatzmannsdorf, 1983) 283-297. Reidel, Dordrecht. · Zbl 0609.62054
[51] ROUSSEEUW, P. and HUBERT, M. (2013). High-breakdown estimators of multivariate location and scatter. In Robustness and Complex Data Structures 49-66. Springer, Heidelberg. · doi:10.1007/978-3-642-35494-6_4
[52] SOLTANOLKOTABI, M. and CANDÈS, E. J. (2012). A geometric analysis of subspace clustering with outliers. Ann. Statist. 40 2195-2238. · Zbl 1318.62217 · doi:10.1214/12-AOS1034
[53] STAHEL, W. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. PhD thesis, ETH, Zurich. · Zbl 0531.62036
[54] Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210-268. Cambridge Univ. Press, Cambridge.
[55] ZHU, B., JIAO, J. and STEINHARDT, J. (2020). When does the Tukey median work? In IEEE International Symposium on Information Theory (ISIT).