
Large sample properties of partitioning-based series estimators. (English) Zbl 1457.62108

The authors study nonparametric regression for univariate responses \(y_1, \ldots, y_n\) and continuously distributed, \(\mathbb{R}^d\)-valued covariates \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) supported on a compact set \(\mathcal{X}\). The object of interest is the (mean) regression function \(\mu(\cdot)\), defined by \(\mu(\mathbf{x}) = \mathbb{E}[y | \mathbf{x}]\).
The authors consider partitioning-based series least squares estimators (LSEs) for \(\mu(\cdot)\) and its derivatives, meaning that \(\mathcal{X}\) is partitioned into non-overlapping cells, on which basis functions are defined. Examples are spline bases, compactly supported wavelet bases, and piecewise polynomial bases.
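As a rough illustration of what such an estimator looks like in the simplest case, the following Python sketch fits a piecewise polynomial basis by least squares. It assumes \(d = 1\), \(\mathcal{X} = [0, 1]\) and a uniform partition; the function names, the local monomial parametrization and the simulated data are illustrative choices and are not taken from the paper under review.

```python
import numpy as np


def piecewise_poly_basis(x, num_cells, degree):
    """Design matrix of piecewise polynomials on a uniform partition of [0, 1].

    Each cell contributes degree + 1 columns; a column is nonzero only for
    points that fall in its cell (assumed setup, not the paper's notation).
    """
    x = np.asarray(x, dtype=float)
    # Cell index of each point; the right endpoint 1.0 goes into the last cell.
    cell = np.minimum((x * num_cells).astype(int), num_cells - 1)
    # Local coordinate in [0, 1] within the cell.
    u = x * num_cells - cell
    basis = np.zeros((x.size, num_cells * (degree + 1)))
    rows = np.arange(x.size)
    for power in range(degree + 1):
        basis[rows, cell * (degree + 1) + power] = u ** power
    return basis


def fit_partitioning_lse(x, y, num_cells, degree):
    """Least squares coefficients for the piecewise polynomial basis."""
    design = piecewise_poly_basis(x, num_cells, degree)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(x_new, coef, num_cells, degree):
    """Evaluate the fitted piecewise polynomial at new points."""
    return piecewise_poly_basis(x_new, num_cells, degree) @ coef


# Simulated example (hypothetical data-generating process).
rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=500)
y_obs = np.sin(2.0 * np.pi * x_obs) + 0.2 * rng.standard_normal(500)
coef_hat = fit_partitioning_lse(x_obs, y_obs, num_cells=5, degree=2)
mu_hat = predict(np.array([0.25, 0.5, 0.75]), coef_hat, num_cells=5, degree=2)
```

This basis imposes no smoothness across cell boundaries; a spline basis would add continuity constraints at the knots, which the sketch leaves out.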
First, the (asymptotic) bias of the LSE is characterized by means of its leading term. Based on this, three bias correction methods are derived. Second, the performance of the LSE is analyzed in terms of the asymptotic behaviour of its integrated mean squared error. Third, pointwise (for fixed \(\mathbf{x}\)) and uniform (over \(\mathcal{X}\)) inference methods are developed, resting on central limit theorems and strong approximations, respectively, and relying on undersmoothing and robust bias correction.
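To make the idea of pointwise inference concrete, here is a minimal Python sketch of a normal-approximation interval for the simplest partitioning estimator, a piecewise constant fit on a uniform partition of \([0, 1]\). It is an illustration under stated assumptions and not the authors' procedure: the function name is hypothetical, the number of cells is taken as given (undersmoothing would mean choosing it larger than the value that balances squared bias and variance), and no bias correction is performed.

```python
import numpy as np


def regressogram_ci(x, y, x0, num_cells, z=1.96):
    """Pointwise interval at x0 from a piecewise constant fit on [0, 1].

    Returns (estimate, lower, upper). The interval is centred at the cell
    mean and ignores smoothing bias, so it is only sensible when the
    partition is fine enough for the bias to be negligible (assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Assign observations and the evaluation point to cells.
    cell = np.minimum((x * num_cells).astype(int), num_cells - 1)
    cell0 = min(int(x0 * num_cells), num_cells - 1)
    in_cell = cell == cell0
    n_cell = int(in_cell.sum())
    if n_cell < 2:
        raise ValueError("cell containing x0 needs at least two observations")
    estimate = y[in_cell].mean()
    resid = y[in_cell] - estimate
    # Heteroskedasticity-robust (HC0) standard error of the cell mean:
    # the sandwich formula reduces to sqrt(sum of squared residuals) / n_cell.
    std_err = np.sqrt(np.sum(resid ** 2)) / n_cell
    return estimate, estimate - z * std_err, estimate + z * std_err
```

A uniform band over \(\mathcal{X}\) would need a critical value for the supremum of the studentized process in place of the fixed normal quantile `z`; this sketch does not attempt that.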
For practical purposes, the authors also propose ways of feasible tuning parameter selection, and they illustrate their theoretical findings by means of Monte Carlo simulations.

MSC:

62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
65C05 Monte Carlo methods
65D07 Numerical computation using splines
42C40 Nontrigonometric harmonic analysis involving wavelets and other special systems
