Confidence intervals for low dimensional parameters in high dimensional linear models. (English) Zbl 1411.62196

Summary: The purpose of this paper is to propose methodologies for statistical inference on low dimensional parameters with high dimensional data. We focus on constructing confidence intervals for individual coefficients, and for linear combinations of several of them, in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results presented provide sufficient conditions for the asymptotic normality of the proposed estimators, along with a consistent estimator for their finite dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and permit the presence of many small non-zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast, as well as to simultaneous interval estimation of many regression coefficients. Moreover, the proposed method turns the regression data into an approximate Gaussian sequence of point estimators of the individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results presented demonstrate the accuracy of the coverage probability of the proposed confidence intervals, as well as other desirable properties, strongly supporting the theoretical results.
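
The construction behind this summary admits a compact sketch: fit a Lasso, form a relaxed projection of each design column on the remaining columns, and use the resulting score vector to bias-correct the Lasso coefficient into an approximately Gaussian point estimator with an estimable standard error. The Python sketch below is an illustrative reconstruction under stated assumptions, not the authors' code: the function name debiased_lasso_ci is hypothetical, the universal penalty level and the sklearn-based Lasso fits are convenience choices, and the residual-based noise estimate is a crude stand-in for the scaled-Lasso estimate usually paired with this methodology.

import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

def debiased_lasso_ci(X, y, j, alpha=0.05, lam=None, lam_j=None):
    """Confidence interval for the j-th coefficient via the
    low-dimensional projection (de-biased Lasso) construction."""
    n, p = X.shape
    # Universal penalty level sqrt(2 log p / n); a convenience assumption
    # standing in for the theoretically tuned choices in the paper.
    if lam is None:
        lam = np.sqrt(2.0 * np.log(p) / n)
    if lam_j is None:
        lam_j = lam

    # Initial Lasso fit of y on X.
    beta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

    # Relaxed projection: residual of x_j regressed on the other columns.
    rest = np.arange(p) != j
    gamma_hat = Lasso(alpha=lam_j, fit_intercept=False).fit(X[:, rest], X[:, j]).coef_
    z = X[:, j] - X[:, rest] @ gamma_hat

    # One-step bias correction of the Lasso estimate of beta_j.
    resid = y - X @ beta_hat
    b = beta_hat[j] + z @ resid / (z @ X[:, j])

    # Crude noise-level estimate from the Lasso residuals
    # (the scaled Lasso is a better-motivated choice).
    df = max(n - np.count_nonzero(beta_hat), 1)
    sigma_hat = np.sqrt(resid @ resid / df)
    se = sigma_hat * np.linalg.norm(z) / abs(z @ X[:, j])

    q = norm.ppf(1.0 - alpha / 2.0)
    return b - q * se, b + q * se

Applying the same construction for every j = 1, ..., p yields the approximate Gaussian sequence of point estimators mentioned above; thresholding the studentized values b_j / se_j then gives a variable selector.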

MSC:

62J05 Linear regression; mixed models
62F25 Parametric tolerance and confidence regions

Software:

PDCO

References:

[1] Antoniadis, A. (2010) Comments on: ℓ₁-penalization for mixture regression models. Test, 19, 257-258. · Zbl 1203.62124
[2] Belloni, A., Chernozhukov, V. and Wang, L. (2011) Square-root Lasso: pivotal recovery of sparse signals via conic programming. Biometrika, 98, 791-806. · Zbl 1228.62083
[3] Berk, R., Brown, L. D. and Zhao, L. (2010) Statistical inference after model selection. J. Quant. Crimin., 26, 217-236.
[4] Bickel, P. J. and Levina, E. (2008) Regularized estimation of large covariance matrices. Ann. Statist., 36, 199-227. · Zbl 1132.62040
[5] Bickel, P., Ritov, Y. and Tsybakov, A. (2009) Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37, 1705-1732. · Zbl 1173.62022
[6] Bühlmann, P. and van de Geer, S. (2011) Statistics for High-dimensional Data: Methods, Theory and Applications. New York: Springer. · Zbl 1273.62015
[7] Candès, E. J. and Tao, T. (2005) Decoding by linear programming. IEEE Trans. Inform. Theor., 51, 4203-4215. · Zbl 1264.94121
[8] Candès, E. and Tao, T. (2007) The Dantzig selector: statistical estimation when p is much larger than n (with discussion). Ann. Statist., 35, 2313-2404. · Zbl 1139.62019
[9] Chen, S. S., Donoho, D. L. and Saunders, M. A. (2001) Atomic decomposition by basis pursuit. SIAM Rev., 43, 129-159. · Zbl 0979.94010
[10] Davidson, K. and Szarek, S. (2001) Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces, vol. 1 (eds W. B. Johnson and J. Lindenstrauss). Amsterdam: North-Holland. · Zbl 1067.46008
[11] Donoho, D. L. and Johnstone, I. (1994) Minimax risk over ℓp-balls for ℓq-error. Probab. Theor. Reltd Flds, 99, 277-303.
[12] Fan, J. and Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Statist. Ass., 96, 1348-1360. · Zbl 1073.62547
[13] Fan, J. and Lv, J. (2008) Sure independence screening for ultrahigh dimensional feature space (with discussion). J. R. Statist. Soc. B, 70, 849-911. · Zbl 1411.62187
[14] Fan, J. and Lv, J. (2010) A selective overview of variable selection in high dimensional feature space. Statist. Sin., 20, 101-148. · Zbl 1180.62080
[15] Fan, J. and Peng, H. (2004) On non-concave penalized likelihood with diverging number of parameters. Ann. Statist., 32, 928-961. · Zbl 1092.62031
[16] Fano, R. (1961) Transmission of Information: a Statistical Theory of Communications. Cambridge: Massachusetts Institute of Technology Press. · Zbl 1474.94054
[17] Frank, I. E. and Friedman, J. H. (1993) A statistical view of some chemometrics regression tools (with discussion). Technometrics, 35, 109-148. · Zbl 0775.62288
[18] van de Geer, S. and Bühlmann, P. (2009) On the conditions used to prove oracle results for the Lasso. Electron. J. Statist., 3, 1360-1392. · Zbl 1327.62425
[19] Greenshtein, E. (2006) Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ₁ constraint. Ann. Statist., 34, 2367-2386. · Zbl 1106.62022
[20] Greenshtein, E. and Ritov, Y. (2004) Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10, 971-988. · Zbl 1055.62078
[21] Huang, J., Ma, S. and Zhang, C.-H. (2008) Adaptive Lasso for sparse high-dimensional regression models. Statist. Sin., 18, 1603-1618. · Zbl 1255.62198
[22] Huang, J. and Zhang, C.-H. (2012) Estimation and selection via absolute penalized convex minimization and its multistage adaptive applications. J. Mach. Learn. Res., 13, 1809-1834.
[23] Kim, Y., Choi, H. and Oh, H.-S. (2008) Smoothly clipped absolute deviation on high dimensions. J. Am. Statist. Ass., 103, 1665-1673. · Zbl 1286.62062
[24] Koltchinskii, V. (2009) The Dantzig selector and sparsity oracle inequalities. Bernoulli, 15, 799-828. · Zbl 1452.62486
[25] Koltchinskii, V., Lounici, K. and Tsybakov, A. B. (2011) Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. Ann. Statist., 39, 2302-2329. · Zbl 1231.62097
[26] Laber, E. and Murphy, S. A. (2011) Adaptive confidence intervals for the test error in classification (with discussion). J. Am. Statist. Ass., 106, 904-913. · Zbl 1229.62085
[27] Leeb, H. and Pötscher, B. M. (2006) Can one estimate the conditional distribution of post-model-selection estimators? Ann. Statist., 34, 2554-2591. · Zbl 1106.62029
[28] Meinshausen, N. and Bühlmann, P. (2006) High-dimensional graphs and variable selection with the Lasso. Ann. Statist., 34, 1436-1462. · Zbl 1113.62082
[29] Meinshausen, N. and Bühlmann, P. (2010) Stability selection (with discussion). J. R. Statist. Soc. B, 72, 417-473. · Zbl 1411.62142
[30] Meinshausen, N. and Yu, B. (2009) Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37, 246-270. · Zbl 1155.62050
[31] Städler, N., Bühlmann, P. and van de Geer, S. (2010) ℓ₁-penalization for mixture regression models (with discussion). Test, 19, 209-285.
[32] Sun, T. and Zhang, C.-H. (2010) Comments on: ℓ₁-penalization for mixture regression models. Test, 19, 270-275. · Zbl 1203.62130
[33] Sun, T. and Zhang, C.-H. (2012) Scaled sparse linear regression. Biometrika, 99, 879-898. · Zbl 1452.62515
[34] Tibshirani, R. (1996) Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 58, 267-288. · Zbl 0850.62538
[35] Tropp, J. A. (2006) Just relax: convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theor., 52, 1030-1051. · Zbl 1288.94025
[36] Wainwright, M. J. (2009a) Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ₁-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theor., 55, 2183-2202. · Zbl 1367.62220
[37] Wainwright, M. J. (2009b) Information-theoretic limitations on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theor., 55, 5728-5741. · Zbl 1367.94106
[38] Ye, F. and Zhang, C.-H. (2010) Rate minimaxity of the Lasso and Dantzig selector for the ℓq loss in ℓr balls. J. Mach. Learn. Res., 11, 3481-3502.
[39] Zhang, C.-H. (2010) Nearly unbiased variable selection under minimax concave penalty. Ann. Statist., 38, 894-942. · Zbl 1183.62120
[40] Zhang, C.-H. (2011) Statistical inference for high-dimensional data. In Very High Dimensional Semiparametric Models, Report No. 48/2011, pp. 28-31. Mathematisches Forschungsinstitut Oberwolfach.
[41] Zhang, C.-H. and Huang, J. (2008) The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist., 36, 1567-1594. · Zbl 1142.62044
[42] Zhang, C.-H. and Zhang, S. S. (2011) Confidence intervals for low-dimensional parameters in high-dimensional linear models. Preprint arXiv:1110.2563.
[43] Zhang, C.-H. and Zhang, T. (2012) A general theory of concave regularization for high dimensional sparse estimation problems. Statist. Sci., 27, 576-593. · Zbl 1331.62353
[44] Zhang, T. (2009) Some sharp performance bounds for least squares regression with L1 regularization. Ann. Statist., 37, 2109-2144. · Zbl 1173.62029
[45] Zhang, T. (2011a) Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theor., 57, 4689-4708. · Zbl 1365.62288
[46] Zhang, T. (2011b) Multi-stage convex relaxation for feature selection. Preprint arXiv:1106.0565.
[47] Zhao, P. and Yu, B. (2006) On model selection consistency of Lasso. J. Mach. Learn. Res., 7, 2541-2567. · Zbl 1222.62008
[48] Zou, H. (2006) The adaptive Lasso and its oracle properties. J. Am. Statist. Ass., 101, 1418-1429. · Zbl 1171.62326
[49] Zou, H. and Li, R. (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist., 36, 1509-1533. · Zbl 1142.62027
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.