×

Powerful multiple testing of paired null hypotheses using a latent graph model. (English) Zbl 07524986

Summary: In this paper, we explore the multiple testing problem of paired null hypotheses, for which the data are collected on pairs of entities and tests have to be performed for each pair. Typically, for each pair \((i,j)\), we observe some interaction/association score between \(i\) and \(j\) and the aim is to detect the pairs with a significant score. In this setting, it is natural to assume that the true/false null constellation is structured according to an unobserved graph, where present edges correspond to a significant association score. The point of this work is to build an improved multiple testing decision by learning the graph structure. Our approach is in line with the seminal work of W. Sun and T. T. Cai [J. R. Stat. Soc., Ser. B, Stat. Methodol. 71, No. 2, 393–424 (2009; Zbl 1248.62005)], that uses the hidden Markov model to structure the dependencies between null hypotheses. Here, we adapt this strategy by considering the stochastic block model for the latent graph. Under appropriate assumptions, the new proposed procedure is shown to control the false discovery rate, up to remainder terms that vanish when the size of the number of hypotheses increases. The procedure is also shown to be nearly optimal in the sense that it is close to the procedure maximizing the true discovery rate. Numerical experiments reveal that our method outperforms state-of-the-art methods and is robust to model mis-specification. Finally, the applicability of the new method is demonstrated on data concerning the usage of self-service bicycles in London.

MSC:

62-XX Statistics

Citations:

Zbl 1248.62005

Software:

Mixnet

References:

[1] Christopher Aicher, Abigail Z. Jacobs, and Aaron Clauset. Learning latent block structure in weighted networks. Journal of Complex Networks, 3(2):221-248, 06 2014. · Zbl 1397.68151
[2] Elizabeth S. Allman, Catherine Matias, and John A. Rhodes. Parameter identifiability in a class of random graph mixture models. Journal of Statistical Planning and Inference, 141(5):1719-1736, 2011. · Zbl 1207.62010
[3] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999. · Zbl 1226.05223
[4] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B, 57(1):289-300, 1995. · Zbl 0809.62014
[5] Yoav Benjamini, Abba M. Krieger, and Daniel Yekutieli. Adaptive linear step-up procedures that control the false discovery rate. Biometrika, 93(3):491-507, 2006. · Zbl 1108.62069
[6] Yoav Benjamini and Daniel Yekutieli. The control of the false discovery rate in multiple testing under dependency. Ann. Statist., 29(4):1165-1188, 2001. · Zbl 1041.62061
[7] Peter Bickel, David Choi, Xiangyu Chang, and Hai Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist., 41(4):1922-1943, 08 2013. · Zbl 1292.62042
[8] C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719-725, 2000.
[9] Gilles Blanchard and Etienne Roquain. Adaptive false discovery rate control under independence and dependence. J. Mach. Learn. Res., 10:2837-2871, 2009. · Zbl 1235.62093
[10] Vincent Brault, Christine Keribin, and Mahendra Mariadassou. Consistency and asymptotic normality of latent blocks model estimators. Electronic Journal of Statistics, 14(1):1234-1268, 2020. · Zbl 1439.62256
[11] T. Tony Cai. R codes for the adaptive testing procedure. http://stat.wharton.upenn.edu/ tcai/paper/html/FDR.html.
[12] T. Tony Cai and Weidong Liu. Large-scale multiple testing of correlations. J. Amer. Statist. Assoc., 111(513):229-240, 2016.
[13] T. Tony Cai and Wenguang Sun. Simultaneous testing of grouped hypotheses: finding needles in multiple haystacks. J. Amer. Statist. Assoc., 104(488):1467-1481, 2009. · Zbl 1205.62005
[14] T. Tony Cai, Wenguang Sun, and Weinan Wang. Covariate-assisted ranking and screening for large-scale two-sample inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 81(2):187-234, 2019. · Zbl 1420.62032
[15] Ismaël Castillo and Étienne Roquain. On spike and slab empirical Bayes multiple testing. Ann. Statist., 48(5):2548-2574, 2020. · Zbl 1455.62035
[16] A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the Stochastic Block Model. Electron. J. Statist., 6:1847-1899, 2012. · Zbl 1295.62028
[17] Jinyuan Chang, E. Kolaczyk, and Q. Yao. Estimation of subgraph densities in noisy networks. Journal of the American Statistical Association, pages 1-14, 2020.
[18] Zhiyi Chi. On the performance of FDR control: constraints and a partial solution. Ann. Statist., 35(4):1409-1431, 2007. · Zbl 1125.62075
[19] J.-J. Daudin, F. Picard, and S. Robin. A mixture model for random graphs. Statistics and Computing, 18(2):173-183, 2008.
[20] Mathias Drton, Michael D. Perlman, et al. Multiple testing and error control in gaussian graphical model selection. Statistical Science, 22(3):430-449, 2007. · Zbl 1246.62143
[21] Nathan Eagle and Alex (Sandy) Pentland. Reality mining: Sensing complex social systems. Journal of Personal and Ubiquitous Computing, 10(4):255-268, March 2006.
[22] Bradley Efron. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Amer. Statist. Assoc., 99(465):96-104, 2004. · Zbl 1089.62502
[23] Bradley Efron, Robert Tibshirani, John D. Storey, and Virginia Tusher. Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc., 96(456):1151-1160, 2001. · Zbl 1073.62511
[24] P.W. Holland, K.B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 5(2):109-137, 1983.
[25] Can M. Le, Keith Levin, and Elizaveta Levina. Estimating a network from multiple noisy realizations. Electronic Journal of Statistics, 12(2):4697-4740, 2018. · Zbl 1409.62115
[26] Jie Liu, Chunming Zhang, and David Page. Multiple testing under dependence via graphical models. Ann. Appl. Stat., 10(3):1699-1724, 2016. · Zbl 1391.62231
[27] Weidong Liu. Gaussian graphical model estimation with false discovery rate control. Ann. Statist., 41(6):2948-2978, 2013. · Zbl 1288.62094
[28] David Lusseau. Evidence for social role in a dolphin social network. Evolutionary Ecology, 21:357-366, 2007.
[29] Mahendra Mariadassou, Stéphane Robin, and Corinne Vacher. Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat., 4(2):715-742, 2010. · Zbl 1194.62125
[30] T. Martin, B. Ball, and M. Newman. Structural inference for uncertain networks. Physical Review E, 93(1):012306, 2016.
[31] P. Massart. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab., 18(3):1269-1283, 1990. · Zbl 0713.62021
[32] C. Matias and S. Robin. Modeling heterogeneity in random graphs through latent space models: a selective review. Esaim Proc. & Surveys, 47:55-74, 2014. · Zbl 1335.05002
[33] Elchanan Mossel, Joe Neeman, and Allan Sly. A proof of the block model threshold conjecture. Combinatorica, 38(3):665-708, 2018. · Zbl 1424.05272
[34] United Nations. https://www.un.org/en/development/desa/population/migration/data, 2019.
[35] M. Newman. Network structure from rich but noisy data. Nature Physics, 14:542-545, 2017.
[36] M. Newman. Estimating network structure from unreliable measurements. Physical Review E, 98, 2018.
[37] Krzysztof Nowicki and Tom A. B Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077-1087, 2001. · Zbl 1072.62542
[38] Tiago P. Peixoto. Nonparametric weighted stochastic block models. Physical review. E, 97(1-1):012306, 2018.
[39] Tiago P. Peixoto. Reconstructing networks with unknown and heterogeneous errors. Phys. Rev. X, 8:041011, Oct 2018.
[40] Franck Picard, Vincent Miele, Jean-Jacques Daudin, Ludovic Cottret, and Stéphane Robin. Deciphering the connectivity structure of biological networks using mixnet. BMC Bioinformatics, 10(6), 2009.
[41] Yannik Pitcan. A Note on Concentration Inequalities for U-Statistics. arXiv e-prints, page arXiv:1712.06160, Dec 2017.
[42] T. Schweder and E. Spjøtvoll. Plots of P-values to evaluate many tests simultaneously. Biometrika, 69(3):493-502, 1982.
[43] John D. Storey. A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol., 64(3):479-498, 2002. · Zbl 1090.62073
[44] John D. Storey. The positive false discovery rate: a Bayesian interpretation and the \(q\)-value. Ann. Statist., 31(6):2013-2035, 2003. · Zbl 1042.62026
[45] Wenguang Sun and T. Tony Cai. Oracle and adaptive compound decision rules for false discovery rate control. J. Amer. Statist. Assoc., 102(479):901-912, 2007. · Zbl 1469.62318
[46] Wenguang Sun and T. Tony Cai. Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol., 71(2):393-424, 2009. · Zbl 1248.62005
[47] Henry Teicher. Identifiability of mixtures. The Annals of Mathematical Statistics, 32(1):244-248, 1961. · Zbl 0146.39302
[48] Henry Teicher. Identifiability of finite mixtures. Ann. Math. Statist., 34(4):1265-1269, 12 1963. · Zbl 0137.12704
[49] Jichun Xie, T. Tony Cai, John Maris, and Hongzhe Li. Optimal false discovery rate control for dependent data. Statistics and Its Interface, 4(4):417, 2011. · Zbl 1245.62091
[50] Jean-Gabriel Young, George T. Cantwell, and M. Newman. Robust bayesian inference of network structure from unreliable data. J. Complex Networks, 8, 2021.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.