
Robust inference via multiplier bootstrap. (English) Zbl 1458.62075

Under the standard linear model \[ Y = \boldsymbol{X}^\top \boldsymbol{\theta}^* + \varepsilon, \] the authors study robust statistical inference for the parameter vector \(\boldsymbol{\theta}^*\), given a random sample \((Y_1, \boldsymbol{X}_1), \ldots, (Y_n, \boldsymbol{X}_n)\). Robustness here means robustness against heavy tails of the regression error \(\varepsilon\), in the sense that only a few conditional moments of \(\varepsilon\) given \(\boldsymbol{X}\) are finite. The authors propose to estimate \(\boldsymbol{\theta}^*\) by the Huber estimator \(\widehat{\boldsymbol{\theta}}_\tau\), whose tuning parameter \(\tau\) is called the robustification parameter. Based on \(\widehat{\boldsymbol{\theta}}_\tau\), confidence sets are constructed by the multiplier bootstrap. The authors establish the validity of the resulting bootstrap scheme under conditions on the choice of \(\tau\), which should be adapted to the sample size \(n\), the dimension \(d\) of \(\boldsymbol{\theta}^*\), and the moment index \(\delta \geq 0\) for which \(\mathbb{E}\left(|\varepsilon|^{2 + \delta} \mid \boldsymbol{X}\right)\) is assumed to be finite. Data-driven procedures for choosing \(\tau\) are also discussed. Finally, the authors consider the case in which \(m \gg 1\) regression models are studied simultaneously and their intercepts are to be tested simultaneously for being zero, a problem with applications in empirical finance. In this context, they propose a multiplier bootstrap-based variant of the linear step-up test of Y. Benjamini and Y. Hochberg [J. R. Stat. Soc., Ser. B 57, No. 1, 289–300 (1995; Zbl 0809.62014)] for control of the false discovery rate. The theoretical results are illustrated by numerical studies based on computer simulations.
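
To make the construction concrete, the following is a minimal Python sketch of a Huber regression fit combined with a multiplier bootstrap, written for illustration and not taken from the paper under review. It uses the Huber loss \(\ell_\tau(u) = u^2/2\) for \(|u| \leq \tau\) and \(\ell_\tau(u) = \tau |u| - \tau^2/2\) otherwise. Several choices are assumptions of this sketch: the function names `huber_fit` and `multiplier_bootstrap_ci` are hypothetical, the fit uses iteratively reweighted least squares, the random weights are drawn from the standard exponential distribution (mean 1, variance 1), the intervals are simple coordinatewise percentile intervals, and \(\tau\) is set by a crude placeholder that grows with \(n\) in place of the data-driven procedures discussed by the authors.

```python
import numpy as np


def huber_fit(X, y, tau, weights=None, n_iter=200, tol=1e-10):
    """Minimise sum_i w_i * huber_tau(y_i - x_i^T theta) by IRLS (illustrative)."""
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    # Start from the weighted least squares solution.
    sw = np.sqrt(w)
    theta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ theta
        # Huber score divided by residual: 1 inside [-tau, tau], tau/|r| outside.
        omega = w * np.minimum(1.0, tau / np.maximum(np.abs(r), 1e-12))
        XtW = X.T * omega
        new_theta = np.linalg.solve(XtW @ X, XtW @ y)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta


def multiplier_bootstrap_ci(X, y, tau, n_boot=300, alpha=0.05, seed=0):
    """Percentile intervals from refits with random multiplier weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta_hat = huber_fit(X, y, tau)
    boot = np.empty((n_boot, d))
    for b in range(n_boot):
        # Assumed weight law: Exp(1), which has mean 1 and variance 1.
        w = rng.exponential(1.0, size=n)
        boot[b] = huber_fit(X, y, tau, weights=w)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2], axis=0)
    return theta_hat, lo, hi


# Simulated example with heavy-tailed errors (Student t with 3 degrees of freedom).
rng = np.random.default_rng(1)
n, d = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, d - 1))])
theta_star = np.array([0.0, 1.0, -2.0])
y = X @ theta_star + rng.standard_t(df=3, size=n)

# Placeholder choice of tau (an assumption of this sketch, not the authors' rule):
# residual scale from least squares times a factor growing with n.
ols = np.linalg.lstsq(X, y, rcond=None)[0]
tau = np.std(y - X @ ols) * np.sqrt(n / (d + np.log(n)))

theta_hat, lo, hi = multiplier_bootstrap_ci(X, y, tau)
for j in range(d):
    print(f"theta[{j}]: estimate {theta_hat[j]:.3f}, 95% interval [{lo[j]:.3f}, {hi[j]:.3f}]")
```

The bootstrap step only reweights the loss and refits, so no residuals or design points are resampled. Other weight distributions with unit mean and unit variance, or other interval constructions, could be substituted in the same skeleton.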

MSC:

62F35 Robustness and adaptive procedures (parametric inference)
62F40 Bootstrap, jackknife and other resampling methods
62J15 Paired and multiple comparisons; multiple testing
62J05 Linear regression; mixed models
60F10 Large deviations

Citations:

Zbl 0809.62014

Software:

FAMT

References:

[1] Arlot, S., Blanchard, G. and Roquain, E. (2010). Some nonasymptotic results on resampling in high dimension. I. Confidence regions. Ann. Statist. 38 51-82. · Zbl 1180.62066 · doi:10.1214/08-AOS667
[2] Audibert, J.-Y. and Catoni, O. (2011). Robust linear least squares regression. Ann. Statist. 39 2766-2794. · Zbl 1231.62126 · doi:10.1214/11-AOS918
[3] Barras, L., Scaillet, O. and Wermers, R. (2010). False discoveries in mutual fund performance: Measuring luck in estimated alphas. J. Finance 65 179-216.
[4] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289-300. · Zbl 0809.62014 · doi:10.1111/j.2517-6161.1995.tb02031.x
[5] Berk, J. B. and Green, R. C. (2004). Mutual fund flows and performance in rational markets. J. Polit. Econ. 112 1269-1295.
[6] Brownlees, C., Joly, E. and Lugosi, G. (2015). Empirical risk minimization for heavy-tailed losses. Ann. Statist. 43 2507-2536. · Zbl 1326.62066 · doi:10.1214/15-AOS1350
[7] Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 48 1148-1185. · Zbl 1282.62070 · doi:10.1214/11-AIHP454
[8] Catoni, O. and Giulini, L. (2017). Dimension free PAC-Bayesian bounds for matrices, vectors, and linear least squares regression. Technical Report.
[9] Chatterjee, S. and Bose, A. (2005). Generalized bootstrap for estimating equations. Ann. Statist. 33 414-436. · Zbl 1065.62073 · doi:10.1214/009053604000000904
[10] Chen, X. and Zhou, W.-X. (2020). Supplement to “Robust inference via multiplier bootstrap.” https://doi.org/10.1214/19-AOS1863SUPP.
[11] Chernozhukov, V., Chetverikov, D. and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. Ann. Statist. 41 2786-2819. · Zbl 1292.62030 · doi:10.1214/13-AOS1161
[12] Chernozhukov, V., Chetverikov, D. and Kato, K. (2014). Anti-concentration and honest, adaptive confidence bands. Ann. Statist. 42 1787-1818. · Zbl 1305.62161 · doi:10.1214/14-AOS1235
[13] Delaigle, A., Hall, P. and Jin, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student’s \(t\)-statistic. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 283-301. · Zbl 1411.62222 · doi:10.1111/j.1467-9868.2010.00761.x
[14] Desai, K. H. and Storey, J. D. (2012). Cross-dimensional inference of dependent high-dimensional data. J. Amer. Statist. Assoc. 107 135-151. · Zbl 1261.62048 · doi:10.1080/01621459.2011.645777
[15] Devroye, L., Lerasle, M., Lugosi, G. and Oliveira, R. I. (2016). Sub-Gaussian mean estimators. Ann. Statist. 44 2695-2725. · Zbl 1360.62115 · doi:10.1214/16-AOS1440
[16] Dudoit, S. and van der Laan, M. J. (2008). Multiple Testing Procedures with Applications to Genomics. Springer Series in Statistics. Springer, New York. · Zbl 1261.62014
[17] Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Institute of Mathematical Statistics (IMS) Monographs 1. Cambridge Univ. Press, Cambridge. · Zbl 1277.62016
[18] Fama, E. F. and French, K. R. (1993). Common risk factors in the returns on stocks and bonds. J. Financ. Econ. 33 3-56. · Zbl 1131.91335 · doi:10.1016/0304-405X(93)90023-5
[19] Fan, J., Hall, P. and Yao, Q. (2007). To how many simultaneous hypothesis tests can normal, Student’s \(t\) or bootstrap calibration be applied? J. Amer. Statist. Assoc. 102 1282-1288. · Zbl 1332.62063 · doi:10.1198/016214507000000969
[20] Fan, J., Han, X. and Gu, W. (2012). Estimating false discovery proportion under arbitrary covariance dependence. J. Amer. Statist. Assoc. 107 1019-1035. · Zbl 1395.62219 · doi:10.1080/01621459.2012.720478
[21] Fan, J., Li, Q. and Wang, Y. (2017). Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 247-265. · Zbl 1414.62178 · doi:10.1111/rssb.12166
[22] Fan, J., Liao, Y. and Yao, J. (2015). Power enhancement in high-dimensional cross-sectional tests. Econometrica 83 1497-1541. · Zbl 1410.62201 · doi:10.3982/ECTA12749
[23] Friguet, C., Kloareg, M. and Causeur, D. (2009). A factor model approach to multiple testing under dependence. J. Amer. Statist. Assoc. 104 1406-1415. · Zbl 1205.62071 · doi:10.1198/jasa.2009.tm08332
[24] Giulini, I. (2017). Robust PCA and pairs of projections in a Hilbert space. Electron. J. Stat. 11 3903-3926. · Zbl 1384.62185 · doi:10.1214/17-EJS1343
[25] Hahn, M. G., Kuelbs, J. and Weiner, D. C. (1990). The asymptotic joint distribution of self-normalized censored sums and sums of squares. Ann. Probab. 18 1284-1341. · Zbl 0725.62017 · doi:10.1214/aop/1176990747
[26] Hsu, D. and Sabato, S. (2016). Loss minimization and parameter estimation with heavy tails. J. Mach. Learn. Res. 17 18. · Zbl 1360.62380
[27] Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35 73-101. · Zbl 0136.39805 · doi:10.1214/aoms/1177703732
[28] Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics, 2nd ed. Wiley Series in Probability and Statistics. Wiley, Hoboken, NJ. · Zbl 1276.62022
[29] Lan, W. and Du, L. (2019). A factor-adjusted multiple testing procedure with application to mutual fund selection. J. Bus. Econom. Statist. 37 147-157.
[30] Lepskii, O. V. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Teor. Veroyatn. Primen. 36 645-659. · Zbl 0738.62045
[31] Lintner, J. (1965). The valuation of risk assets and the selection of risky investment in stock portfolios and capital budgets. Rev. Econ. Stat. 47 13-37.
[32] Liu, W. and Shao, Q.-M. (2014). Phase transition and regularized bootstrap in large-scale \(t\)-tests with false discovery rate control. Ann. Statist. 42 2003-2025. · Zbl 1305.62213 · doi:10.1214/14-AOS1249
[33] Lugosi, G. and Mendelson, S. (2019). Sub-Gaussian estimators of the mean of a random vector. Ann. Statist. 47 783-794. · Zbl 1417.62192 · doi:10.1214/17-AOS1639
[34] Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli 21 2308-2335. · Zbl 1348.60041 · doi:10.3150/14-BEJ645
[35] Minsker, S. (2018). Sub-Gaussian estimators of the mean of a random matrix with heavy-tailed entries. Ann. Statist. 46 2871-2903. · Zbl 1418.62235 · doi:10.1214/17-AOS1642
[36] Qi, L. and Sun, D. (1999). A survey of some nonsmooth equations and smoothing Newton methods. In Progress in Optimization. Appl. Optim. 30 121-146. Kluwer Academic, Dordrecht. · Zbl 0957.65042
[37] Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. J. Finance 19 425-442.
[38] Spokoiny, V. and Zhilova, M. (2015). Bootstrap confidence sets under model misspecification. Ann. Statist. 43 2653-2675. · Zbl 1327.62179 · doi:10.1214/15-AOS1355
[39] Sun, Q., Zhou, W.-X. and Fan, J. (2019). Adaptive Huber regression. Technical Report.
[40] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, New York. · Zbl 0862.60002
[41] Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics 47. Cambridge Univ. Press, Cambridge. · Zbl 1430.60005
[42] Wang, J., Zhao, Q., Hastie, T. and Owen, A. B. (2017). Confounder adjustment in multiple hypothesis testing. Ann. Statist. 45 1863-1894. · Zbl 1486.62223 · doi:10.1214/16-AOS1511
[43] Wilks, S. S. (1938). The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9 60-62. · JFM 64.1211.05 · doi:10.1214/aoms/1177732360
[44] Zhilova, M. (2016). Non-classical Berry-Esseen inequality and accuracy of the weighted bootstrap. Technical Report.
[45] Zhou, W. · Zbl 1409.62154 · doi:10.1214/17-AOS1606