Kernel knockoffs selection for nonparametric additive models. (English) Zbl 07751835

Summary: Thanks to its fine balance between model flexibility and interpretability, the nonparametric additive model has been widely used, and variable selection for this type of model has been frequently studied. However, none of the existing solutions can control the false discovery rate (FDR) unless the sample size tends to infinity. The knockoff framework is a recent proposal that can address this issue, but few knockoff solutions are directly applicable to nonparametric models. In this article, we propose a novel kernel knockoffs selection procedure for the nonparametric additive model. We integrate three key components: the knockoffs, subsampling for stability, and random feature mapping for nonparametric function approximation. We show that the proposed method is guaranteed to control the FDR for any sample size, and achieves a power that approaches one as the sample size tends to infinity. We demonstrate the efficacy of our method through intensive simulations and comparisons with alternative solutions. Our proposal thus makes useful contributions to the methodology of nonparametric variable selection, FDR-based inference, and knockoffs. Supplementary materials for this article are available online.
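To make two of these components concrete, the sketch below combines random Fourier features (Rahimi and Recht, 2007) with the knockoff+ filter of Barber and Candès (2015) on a toy additive model. This is only an illustrative assumption-laden sketch, not the authors' procedure: it omits the subsampling step, uses a plain ridge fit with a ridge penalty of 1e-2 rather than the paper's statistic, and exploits the fact that for independent standard normal covariates an independent copy is a valid model-X knockoff. All variable names and constants here are hypothetical choices.

```python
import numpy as np

def make_rff(n_features=15, gamma=1.0, seed=0):
    """Random Fourier feature map approximating an RBF kernel (Rahimi & Recht, 2007).
    The same map must be applied to a covariate and to its knockoff copy."""
    r = np.random.default_rng(seed)
    w = r.normal(0.0, np.sqrt(2.0 * gamma), n_features)
    b = r.uniform(0.0, 2.0 * np.pi, n_features)
    return lambda x: np.sqrt(2.0 / n_features) * np.cos(np.outer(np.asarray(x), w) + b)

def knockoff_plus_threshold(W, q=0.2):
    """Knockoff+ threshold (Barber & Candes, 2015): smallest t such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        if (1.0 + np.sum(W <= -t)) / max(1, np.sum(W >= t)) <= q:
            return t
    return np.inf

# Toy additive model: 2 active covariates out of 6, all independent N(0,1),
# so an independent copy of each covariate is a valid model-X knockoff.
rng = np.random.default_rng(1)
n, p, D = 400, 6, 15
X, Xk = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y = np.sin(2 * X[:, 0]) + X[:, 1] ** 2 + 0.3 * rng.normal(size=n)

phis = [make_rff(D, seed=j) for j in range(p)]        # one shared map per covariate
Z = np.hstack([phis[j](X[:, j]) for j in range(p)])   # features of the originals
Zk = np.hstack([phis[j](Xk[:, j]) for j in range(p)]) # same maps on the knockoffs
A = np.hstack([Z, Zk])
beta = np.linalg.solve(A.T @ A + 1e-2 * np.eye(2 * p * D), A.T @ y)  # ridge fit

norms = np.array([np.linalg.norm(beta[j * D:(j + 1) * D]) for j in range(2 * p)])
W = norms[:p] - norms[p:]                             # importance gap per covariate
t = knockoff_plus_threshold(W, q=0.5)
print("selected covariates:", np.where(W >= t)[0])
```

The key exchangeability requirement is visible in the code: each covariate and its knockoff pass through the identical feature map, so under the null the two coefficient-group norms are symmetric and the +1 correction in the threshold controls the FDR at level q.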

MSC:

62-XX Statistics

Software:

Bolasso; R; gamair
