Inference in linear regression models with many covariates and heteroscedasticity. (English) Zbl 1402.62036

Summary: The linear regression model is widely used in empirical work in economics, statistics, and many other disciplines. Researchers often include many covariates in their linear model specification in an attempt to control for confounders. We give inference methods that allow for many covariates and heteroscedasticity. Our results are obtained using high-dimensional approximations, where the number of included covariates is allowed to grow as fast as the sample size. We find that all of the usual versions of Eicker-White heteroscedasticity consistent standard error estimators for linear models are inconsistent under these asymptotics. We then propose a new heteroscedasticity consistent standard error formula that is fully automatic and robust to both (conditional) heteroscedasticity of unknown form and the inclusion of possibly many covariates. We apply our findings to three settings: parametric linear models with many covariates, linear panel models with many fixed effects, and semiparametric semi-linear models with many technical regressors. Simulation evidence consistent with our theoretical results is provided, and the proposed methods are also illustrated with an empirical application.
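
For orientation, the classical Eicker-White (HC0) standard error that the summary refers to can be sketched for the simplest case of one regressor plus an intercept. This is the textbook sandwich formula, not the new estimator proposed in the paper, whose formula the summary does not state. The function name `ols_slope_hc0` and the simulated data are illustrative assumptions.

```python
import random


def ols_slope_hc0(x, y):
    """Slope of y on x (with intercept) and its Eicker-White HC0 standard error.

    Hypothetical helper for illustration only: textbook HC0, not the
    estimator proposed in the paper summarized above.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Partial out the intercept by demeaning the regressor.
    x_tilde = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in x_tilde)
    slope = sum(v * (yi - y_bar) for v, yi in zip(x_tilde, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich form: bread = 1 / sxx, meat = sum of x_tilde_i^2 * resid_i^2.
    meat = sum(v * v * r * r for v, r in zip(x_tilde, resid))
    se = (meat / (sxx * sxx)) ** 0.5
    return slope, se


# Simulated heteroscedastic data (assumed design: error scale grows with |x|).
random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(500)]
y = [1.0 + 2.0 * xi + (1.0 + abs(xi)) * random.gauss(0.0, 1.0) for xi in x]
slope, se = ols_slope_hc0(x, y)
print(f"slope = {slope:.3f}, HC0 standard error = {se:.3f}")
```

With only one nuisance parameter (the intercept), this is the fixed-dimension setting in which HC0 is well behaved. The summary's claim concerns the case where the number of partialled-out covariates grows in proportion to the sample size.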

MSC:

62F12 Asymptotic properties of parametric estimators
62J05 Linear regression; mixed models
62J86 Fuzziness, and linear inference and regression
62P20 Applications of statistics to economics

References:

[1] Abadie, A.; Imbens, G. W.; Zheng, F., Inference for misspecified models with fixed regressors, Journal of the American Statistical Association, 109, 1601-1614, (2014) · Zbl 1368.62188
[2] Anatolyev, S., Inference in regression models with many regressors, Journal of Econometrics, 170, 368-382, (2012) · Zbl 1443.62054
[3] Angrist, J.; Hahn, J., When to control for covariates? Panel asymptotics for estimates of treatment effects, Review of Economics and Statistics, 86, 58-72, (2004)
[4] Belloni, A.; Chernozhukov, V.; Chetverikov, D.; Kato, K., On the asymptotic theory for least squares series: pointwise and uniform results, Journal of Econometrics, 186, 345-366, (2015) · Zbl 1331.62250
[5] Belloni, A.; Chernozhukov, V.; Hansen, C., Inference on treatment effects after selection among high-dimensional controls, Review of Economic Studies, 81, 608-650, (2014) · Zbl 1409.62142
[6] Belloni, A.; Chernozhukov, V.; Hansen, C.; Fernandez-Val, I., Program evaluation and causal inference with high-dimensional data, Econometrica, 85, 233-298, (2017) · Zbl 1410.62197
[7] Bera, A. K.; Suprayitno, T.; Premaratne, G., On some heteroscedasticity-robust estimators of variance-covariance matrix of the least-squares estimators, Journal of Statistical Planning and Inference, 108, 121-136, (2002) · Zbl 1095.62506
[8] Bickel, P. J.; Freedman, D. A., Bootstrapping regression models with many parameters, A Festschrift for Erich L. Lehmann, (1983), Chapman and Hall, Boca Raton, FL · Zbl 0529.62057
[9] Carneiro, P.; Heckman, J. J.; Vytlacil, E. J., Estimating marginal returns to education, American Economic Review, 101, 2754-2781, (2011)
[10] Cattaneo, M. D.; Farrell, M. H., Efficient estimation of the dose-response function under ignorability using subclassification on the covariates, Missing-Data Methods: Cross-sectional Methods and Applications (Advances in Econometrics), 93-127, (2011), Bingley, UK: Emerald Group Publishing · Zbl 1443.62027
[11] Cattaneo, M. D.; Farrell, M. H., Optimal convergence rates, Bahadur representation, and asymptotic normality of partitioning estimators, Journal of Econometrics, 174, 127-143, (2013) · Zbl 1283.62060
[12] Cattaneo, M. D.; Jansson, M.; Newey, W. K., Alternative asymptotics and the partially linear model with many regressors, Econometric Theory, 34, 277-301, (2018) · Zbl 1441.62630
[13] Chen, X., Large sample sieve estimation of semi-nonparametric models, Handbook of Econometrics, 5549-5632, (2007), Amsterdam, Netherlands: Elsevier Science B.V.
[14] Chesher, A., Hájek inequalities, measures of leverage and the size of heteroscedasticity robust Wald tests, Econometrica, 57, 971-977, (1989) · Zbl 0684.62077
[15] Chesher, A.; Jewitt, I., The bias of a heteroscedasticity consistent covariance matrix estimator, Econometrica, 55, 1217-1222, (1987) · Zbl 0634.62051
[16] Cochran, W. G., The effectiveness of adjustment by subclassification in removing bias in observational studies, Biometrics, 24, 295-313, (1968)
[17] Cribari-Neto, F.; Lima, M., A sequence of improved standard errors under heteroscedasticity of unknown form, Journal of Statistical Planning and Inference, 141, 3617-3627, (2011) · Zbl 1221.62086
[18] Cribari-Neto, F.; Ferrari, S. L. P.; Cordeiro, G. M., Improved heteroscedasticity-consistent covariance matrix estimators, Biometrika, 87, 907-918, (2000) · Zbl 1028.62044
[19] Donald, S. G.; Newey, W. K., Series estimation of semilinear models, Journal of Multivariate Analysis, 50, 30-40, (1994) · Zbl 0798.62074
[20] El Karoui, N.; Bean, D.; Bickel, P. J.; Lim, C.; Yu, B., On robust regression with high-dimensional predictors, Proceedings of the National Academy of Sciences, 110, 14557-14562, (2013) · Zbl 1359.62184
[21] Farrell, M. H., Robust inference on average treatment effects with possibly more covariates than observations, Journal of Econometrics, 189, 1-23, (2015) · Zbl 1337.62113
[22] Freedman, D. A., Bootstrapping regression models, Annals of Statistics, 9, 1218-1228, (1981) · Zbl 0449.62046
[23] Gonçalves, S.; White, H., Bootstrap standard error estimates for linear regression, Journal of the American Statistical Association, 100, 970-979, (2005) · Zbl 1117.62347
[24] Huber, P. J., Robust regression: asymptotics, conjectures, and Monte Carlo, Annals of Statistics, 1, 799-821, (1973) · Zbl 0289.62033
[25] Kauermann, G.; Carroll, R. J., A note on the efficiency of sandwich covariance matrix estimation, Journal of the American Statistical Association, 96, 1387-1396, (2001) · Zbl 1073.62539
[26] Kline, P.; Santos, A., Higher order properties of the wild bootstrap under misspecification, Journal of Econometrics, 171, 54-70, (2012) · Zbl 1443.62114
[27] Koenker, R., Asymptotic theory and econometric practice, Journal of Applied Econometrics, 3, 139-147, (1988)
[28] Li, C.; Müller, U. K., Linear regression with many controls of limited explanatory power, working paper, Princeton University, (2017)
[29] Long, J. S.; Ervin, L. H., Using heteroscedasticity consistent standard errors in the linear regression model, The American Statistician, 54, 217-224, (2000)
[30] MacKinnon, J.; White, H., Some heteroscedasticity-consistent covariance matrix estimators with improved finite sample properties, Journal of Econometrics, 29, 305-325, (1985)
[31] MacKinnon, J. G., Thirty years of heteroscedasticity-robust inference, Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis, (2012), Springer, New York
[32] Mammen, E., Bootstrap and wild bootstrap for high dimensional linear models, Annals of Statistics, 21, 255-285, (1993) · Zbl 0771.62032
[33] Müller, U. K., Risk of Bayesian inference in misspecified models, and the sandwich covariance matrix, Econometrica, 81, 1805-1849, (2013) · Zbl 1291.62069
[34] Newey, W. K., Convergence rates and asymptotic normality for series estimators, Journal of Econometrics, 79, 147-168, (1997) · Zbl 0873.62049
[35] Rosenbaum, P. R.; Rubin, D. B., The central role of the propensity score in observational studies for causal effects, Biometrika, 70, 41-55, (1983) · Zbl 0522.62091
[36] Shao, J.; Wu, C. F. J., Heteroscedasticity-robustness of jackknife variance estimators in linear models, Annals of Statistics, 15, 1563-1579, (1987) · Zbl 0651.62064
[37] Stock, J. H.; Watson, M. W., Heteroscedasticity-robust standard errors for fixed effects panel data regression, Econometrica, 76, 155-174, (2008) · Zbl 1132.62102
[38] Varah, J. M., A lower bound for the smallest singular value of a matrix, Linear Algebra and its Applications, 11, 3-5, (1975) · Zbl 0312.65028
[39] Verdier, V., Estimation and inference for linear models with two-way fixed effects and sparsely matched data, Working paper, UNC, (2017)
[40] White, H., A heteroscedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity, Econometrica, 48, 817-838, (1980) · Zbl 0459.62051
[41] Wu, C. F. J., Jackknife, bootstrap and other resampling methods in regression analysis, Annals of Statistics, 14, 1261-1295, (1986) · Zbl 0618.62072
[42] Zheng, S.; Jiang, D.; Bai, Z.; He, X., Inference on multiple correlation coefficients with moderately high dimensional data, Biometrika, 101, 748-754, (2014) · Zbl 1336.62157
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.