Inference on regressions with interval data on a regressor or outcome. (English) Zbl 1121.62544

Summary: This paper examines inference on regressions when interval data are available on one variable, the other variables being measured precisely. Let a population be characterized by a distribution \(P(y,x,v,v_0,v_1)\), where \(y\in \mathbb R^1,\;x\in\mathbb R^k\), and the real variables \((v,v_0,v_1)\) satisfy \(v_0\leq v\leq v_1\). Let a random sample be drawn from \(P\) and the realizations of \((y,x,v_0,v_1)\) be observed, but not those of \(v\). The problem of interest may be to infer \(E(y|x,v)\) or \(E(v|x)\). The analysis maintains interval (I), monotonicity (M), and mean independence (MI) assumptions: (I) \(P(v_0\leq v\leq v_1)=1\); (M) \(E(y|x,v)\) is monotone in \(v\); (MI) \(E(y|x,v,v_0,v_1)=E(y|x,v)\). No restrictions are imposed on the distribution of the unobserved values of \(v\) within the observed intervals \([v_0,v_1]\). It is found that the IMMI assumptions alone imply simple nonparametric bounds on \(E(y|x,v)\) and \(E(v|x)\). When \(y\) is binary, these assumptions combined with a semiparametric binary regression model yield an identification region for the parameters that may be estimated consistently by a modified maximum score method. The IMMI assumptions combined with a parametric model for \(E(y|x,v)\) or \(E(v|x)\) yield an identification region that may be estimated consistently by a modified minimum-distance method. Monte Carlo methods are used to characterize the finite-sample performance of these estimators. Empirical case studies are performed using interval wealth data from the Health and Retirement Study and interval income data from the Current Population Survey.
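
The simplest of the bounds mentioned in the summary can be illustrated directly from assumption (I): since \(v_0\leq v\leq v_1\) with probability one, it follows that \(E(v_0|x)\leq E(v|x)\leq E(v_1|x)\). The sketch below estimates these two endpoints by sample means within cells of \(x\). It is only an illustration under stated assumptions: \(x\) is taken to be discrete, the function name `interval_mean_bounds` and the toy data are hypothetical, and the code is not the estimator used in the paper.

```python
from collections import defaultdict


def interval_mean_bounds(x, v0, v1):
    """Bound E(v | x) for each observed value of a discrete x.

    Assumes only the interval assumption (I): v0 <= v <= v1.
    Returns {x_value: (mean of v0 in the cell, mean of v1 in the cell)}.
    """
    # For each cell of x keep [sum of v0, sum of v1, count].
    cells = defaultdict(lambda: [0.0, 0.0, 0])
    for xi, lo, hi in zip(x, v0, v1):
        if lo > hi:
            raise ValueError("each interval must satisfy v0 <= v1")
        cell = cells[xi]
        cell[0] += lo
        cell[1] += hi
        cell[2] += 1
    return {xi: (s_lo / n, s_hi / n) for xi, (s_lo, s_hi, n) in cells.items()}


# Hypothetical toy sample: v itself is never observed, only its bracket.
x = ["a", "a", "b", "b"]
v0 = [0.0, 10.0, 5.0, 5.0]
v1 = [10.0, 20.0, 5.0, 15.0]
bounds = interval_mean_bounds(x, v0, v1)
```

In the toy sample the interval for cell `"b"` is narrower than for cell `"a"` because one observation in `"b"` reports \(v\) exactly (\(v_0=v_1\)). The width of each bound reflects only how wide the reported brackets are, since nothing is assumed about where \(v\) lies inside them.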

MSC:

62G10 Nonparametric hypothesis testing
62P10 Applications of statistics to biology and medical sciences; meta analysis