Matrix regression heterogeneity analysis. (English) Zbl 1539.62038

Summary: Advances in modern science and technology have made it possible to collect large amounts of matrix-valued data in fields such as biomedicine. Matrix data modeling has been studied extensively and has moved beyond the naive approach of flattening the matrix into a vector. However, existing matrix modeling methods mainly target homogeneous data and cannot handle the heterogeneity frequently encountered in biomedical studies, where samples from the same study belong to several underlying subgroups and different subgroups follow different models. In this paper, we focus on regression-based heterogeneity analysis. We propose a framework for heterogeneity analysis of matrix data that combines matrix bilinear sparse decomposition with penalized fusion, enabling data-driven subgroup detection, including determination of both the number of subgroups and subgroup membership. A rigorous theoretical analysis is conducted, establishing asymptotic consistency of subgroup detection, of the estimated number of subgroups, and of the regression coefficient estimates. Extensive numerical studies on simulated and real data demonstrate the superior performance of the proposed method in analyzing heterogeneous matrix data.
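The summary names two ingredients, a bilinear decomposition of the matrix coefficient and a penalized fusion of sample-level coefficients, without stating their form. The following is a minimal Python sketch of what such ingredients commonly look like, not the paper's actual estimator. It assumes a rank-one coefficient matrix B = a b^T (so the regression mean is a^T X b) and a minimax concave penalty (MCP) applied to pairwise coefficient differences. The function names `bilinear_predict`, `mcp`, and `fusion_penalty`, and the parameters `lam` and `gamma`, are hypothetical.

```python
import numpy as np


def bilinear_predict(X, a, b):
    """Mean a^T X_n b for each p x q sample X_n in X (shape n x p x q).

    Assumption: a rank-one coefficient matrix B = a b^T, which needs
    p + q parameters instead of the p * q used after flattening.
    """
    return np.einsum("i,nij,j->n", a, X, b)


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty evaluated elementwise at |t|."""
    t = np.abs(t)
    return np.where(
        t <= gamma * lam,
        lam * t - t**2 / (2.0 * gamma),  # concave part near zero
        gamma * lam**2 / 2.0,            # flat part: large gaps are not shrunk further
    )


def fusion_penalty(coefs, lam, gamma=3.0):
    """Sum of MCP over Euclidean norms of all pairwise row differences.

    coefs has one row per sample; rows that coincide contribute zero,
    so samples with equal coefficients form one subgroup.
    """
    i, j = np.triu_indices(coefs.shape[0], k=1)
    gaps = np.linalg.norm(coefs[i] - coefs[j], axis=1)
    return mcp(gaps, lam, gamma).sum()


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3, 4))
a, b = rng.normal(size=3), rng.normal(size=4)

# The bilinear form agrees with flattening X and using vec(a b^T).
flat_pred = X.reshape(5, -1) @ np.outer(a, b).ravel()
bilinear_pred = bilinear_predict(X, a, b)

# Two samples share coefficients; the third is far away.
coefs = np.array([[0.0, 0.0], [0.0, 0.0], [5.0, 5.0]])
penalty = fusion_penalty(coefs, lam=1.0, gamma=3.0)
```

In this sketch the two identical rows of `coefs` incur no penalty, while each of the two large gaps contributes the constant `gamma * lam**2 / 2`. That behaviour is what allows a fusion penalty of this kind to merge samples into subgroups without shrinking well-separated subgroups toward each other.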

MSC:

62-08 Computational methods for problems pertaining to statistics
62H12 Estimation in multivariate analysis
62P10 Applications of statistics to biology and medical sciences; meta analysis
