
Nonparametric regression with selectively missing covariates. (English) Zbl 1471.62322

Summary: We consider the problem of regression with selectively observed covariates in a nonparametric framework. Our approach relies on instrumental variables that explain variation in the latent covariates but have no direct effect on selection. The regression function of interest is shown to be a weighted version of the observed conditional expectation, where the weighting function is a fraction of selection probabilities. Nonparametric identification of the fractional probability weight (FPW) function is achieved via a partial completeness assumption. We provide primitive functional form assumptions for partial completeness to hold. The identification result is constructive for the FPW series estimator. We derive the rate of convergence and also the pointwise asymptotic distribution. In both cases, the asymptotic performance of the FPW series estimator does not suffer from the inverse problem which derives from the nonparametric instrumental variable approach. In a Monte Carlo study, we analyze the finite sample properties of our estimator and compare our approach to inverse probability weighting, which can be used as an alternative for unconditional moment estimation. In the empirical part, we consider two applications. In the first, we estimate the association between income and health using linked data from the SHARE survey and administrative pension information, and use pension entitlements as an instrument. In the second, we revisit the question of how income affects the demand for housing, based on data from the German Socio-Economic Panel Study (SOEP); here we use regional income information at the residential block level as an instrument. In both applications we show that income is selectively missing, and we demonstrate that standard methods that do not account for the nonrandom selection process lead to significantly biased estimates for individuals with low income.
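The weighting idea in the summary can be illustrated on a toy example. The sketch below is not the authors' estimator. It assumes hypothetical notation (outcome Y, latent covariate X*, selection indicator Delta equal to 1 when X* is observed) and reads the "fraction of selection probabilities" as P(Delta = 1 | X* = x) / P(Delta = 1 | Y = y, X* = x), which follows from Bayes' rule. The joint distribution and the selection probabilities are invented for illustration and are treated as known; in the paper the weight function has to be identified from instruments under partial completeness and is then estimated by series methods.

```python
import numpy as np

# Hypothetical toy distribution (not from the paper): Y in {0, 1, 2}, X* in {0, 1}.
y_vals = np.array([0.0, 1.0, 2.0])

# joint[i, j] = P(Y = y_vals[i], X* = j); entries sum to 1.
joint = np.array([[0.20, 0.05],
                  [0.15, 0.20],
                  [0.05, 0.35]])

# sel[i, j] = P(Delta = 1 | Y = y_vals[i], X* = j).
# Assumed known here; selection depends on both Y and X*, so it is nonrandom.
sel = np.array([[0.9, 0.6],
                [0.7, 0.4],
                [0.5, 0.2]])


def target_regression(joint, y_vals):
    """E[Y | X* = x] under the full, partly unobserved distribution."""
    return (y_vals[:, None] * joint).sum(axis=0) / joint.sum(axis=0)


def complete_case_regression(joint, sel, y_vals):
    """E[Y | X* = x, Delta = 1]: what an analysis of observed cases targets."""
    obs = joint * sel  # P(Y = y, X* = x, Delta = 1)
    return (y_vals[:, None] * obs).sum(axis=0) / obs.sum(axis=0)


def weighted_regression(joint, sel, y_vals):
    """Observed conditional expectation reweighted by a ratio of selection probabilities."""
    obs = joint * sel
    cond_obs = obs / obs.sum(axis=0)             # P(Y = y | X* = x, Delta = 1)
    sel_x = obs.sum(axis=0) / joint.sum(axis=0)  # P(Delta = 1 | X* = x)
    weight = sel_x[None, :] / sel                # ratio of selection probabilities
    return (y_vals[:, None] * weight * cond_obs).sum(axis=0)


target = target_regression(joint, y_vals)
complete_case = complete_case_regression(joint, sel, y_vals)
weighted = weighted_regression(joint, sel, y_vals)

print("target       :", target)
print("complete case:", complete_case)
print("weighted     :", weighted)
```

In this toy setting the complete-case regression differs from the target at both values of X*, while the reweighted version recovers it exactly, because the weight undoes the distortion that selection introduces into the observed conditional distribution of Y.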


62G08 Nonparametric regression and quantile regression
62G05 Nonparametric estimation
62P20 Applications of statistics to economics

