Nonparametric independence screening for ultra-high dimensional generalized varying coefficient models with longitudinal data. (English) Zbl 1417.62105

Summary: In this paper, we propose a nonparametric independence screening method for sparse ultra-high dimensional generalized varying coefficient models with longitudinal data. The method, called NIS-GEE, combines the ideas of sure independence screening (SIS) in sparse ultra-high dimensional generalized linear models and varying coefficient models with the marginal generalized estimating equation (GEE) method, so that variable screening accounts for both the marginal correlation between the response and the covariates and the within-subject correlation. An iterative algorithm is introduced to enhance the performance of the proposed NIS-GEE method. Furthermore, it is shown that, under some regularity conditions, the proposed NIS-GEE method enjoys the sure screening property. Simulation studies and a real data analysis are used to assess the performance of the proposed method.
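
The marginal screening idea in the summary can be illustrated with a small sketch. This is not the authors' NIS-GEE procedure: it assumes an identity link, a working-independence correlation structure (subject labels are ignored when fitting), and a cubic polynomial basis in place of B-splines. The names `poly_basis`, `marginal_screen`, and `simulate`, and all simulation settings, are hypothetical and chosen only for illustration.

```python
import numpy as np


def poly_basis(t, degree=3):
    """Columns 1, t, ..., t^degree (a simple stand-in for a B-spline basis)."""
    return np.vander(t, degree + 1, increasing=True)


def marginal_screen(y, X, t, keep, degree=3):
    """Rank covariates by a marginal varying-coefficient fit; keep the top `keep`.

    For each covariate j, fit y ~ a(t) + b_j(t) * x_j by least squares
    (assumption: identity link, working independence) and score j by the
    drop in residual sum of squares relative to the baseline y ~ a(t).
    """
    B = poly_basis(t, degree)
    coef0, *_ = np.linalg.lstsq(B, y, rcond=None)
    rss0 = np.sum((y - B @ coef0) ** 2)

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        # Design matrix: basis for a(t) plus basis times x_j for b_j(t).
        D = np.hstack([B, B * X[:, [j]]])
        coef, *_ = np.linalg.lstsq(D, y, rcond=None)
        scores[j] = rss0 - np.sum((y - D @ coef) ** 2)

    ranked = np.argsort(scores)[::-1]  # largest utility first
    return ranked[:keep], scores


def simulate(n_subjects=100, n_visits=5, p=200, seed=0):
    """Toy longitudinal data: only covariates 0 and 1 are active."""
    rng = np.random.default_rng(seed)
    n_obs = n_subjects * n_visits
    t = rng.uniform(0.0, 1.0, size=n_obs)
    X = rng.standard_normal((n_obs, p))
    # A shared random intercept induces correlation within each subject.
    subject = np.repeat(np.arange(n_subjects), n_visits)
    random_intercept = 0.5 * rng.standard_normal(n_subjects)[subject]
    y = ((1.0 + 3.0 * t**2) * X[:, 0]
         + 2.0 * np.cos(np.pi * t) * X[:, 1]
         + random_intercept
         + rng.standard_normal(n_obs))
    return y, X, t


y, X, t = simulate()
selected, scores = marginal_screen(y, X, t, keep=10)
print("covariates retained after screening:", sorted(selected.tolist()))
```

In this toy setting the two active covariates should rank well above the noise covariates. A GEE-based version would replace the least-squares fit with estimating equations that use a working correlation matrix for the repeated measurements within each subject.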

MSC:

62G08 Nonparametric regression and quantile regression
62J12 Generalized linear models (logistic models)
