×

Valid two-sample graph testing via optimal transport procrustes and multiscale graph correlation with applications in connectomics. (English) Zbl 07853567

Summary: Testing whether two graphs come from the same distribution is of interest in many real-world scenarios, including brain network analysis. Under the random dot product graph model, the nonparametric hypothesis testing framework consists of embedding the graphs using the adjacency spectral embedding (ASE), followed by aligning the embeddings using the median flip heuristic and finally applying the nonparametric maximum mean discrepancy (MMD) test to obtain a \(p\) value. Using synthetic data generated from Drosophila brain networks, we show that the median flip heuristic results in an invalid test and demonstrate that optimal transport Procrustes (OTP) for alignment resolves the invalidity. We further demonstrate that substituting the MMD test with the multiscale graph correlation (MGC) test leads to a more powerful test both in synthetic and in simulated data. Lastly, we apply this powerful test to the right and left hemispheres of the larval Drosophila mushroom body brain networks and conclude that there is not sufficient evidence to reject the null hypothesis that the two hemispheres are equally distributed.
{© 2021 John Wiley & Sons, Ltd.}

MSC:

62-XX Statistics

Software:

GraSPy

References:

[1] Agterberg, J., Tang, M., & Priebe, C. (2020a). Nonparametric two‐sample hypothesis testing for random graphs with negative and repeated eigenvalues. arXiv preprint arXiv:2012.09828.
[2] Agterberg, J., Tang, M., & Priebe, C. E. (2020b). On two distinct sources of nonidentifiability in latent position random graph models. arXiv:2003.14250.
[3] Alvarez‐Melis, D., Jegelka, S., & Jaakkola, T. S. (2019). Towards optimal transport with global invariances. In The 22nd International Conference on Artificial Intelligence and Statistics, PMLR, pp. 1870-1879.
[4] Alyakin, A. A., Agterberg, J., Helm, H. S., & Priebe, C. E. (2020). Correcting a nonparametric two‐sample graph hypothesis test for graphs with different numbers of vertices.
[5] Arroyo, J., Athreya, A., Cape, J., Chen, G., Priebe, C. E., & Vogelstein, J. T. (2019). Inference for multiple heterogeneous networks with a common invariant subspace. Journal of Machine Learning Research, 22, 1-49. · Zbl 07415085
[6] Asta, D. M., & Shalizi, C. R. (2015). Geometric network comparisons. In Proceedings of the thirty‐first conference on uncertainty in artificial intelligence, UAI’15, AUAI Press, Arlington, Virginia, United States, pp. 102-110. http://dl.acm.org/citation.cfm?id=3020847.3020859
[7] Athreya, A., Fishkind, D. E., Tang, M., Priebe, C. E., Park, Y., Vogelstein, J. T., Levin, K., Lyzinski, V., Qin, Y., & Sussman, D. L. (2018). Statistical inference on random dot product graphs: a survey. Journal of Machine Learning Research, 18(226), 1-92. http://jmlr.org/papers/v18/17-448.html · Zbl 1473.05279
[8] Athreya, A., Priebe, C. E., Tang, M., Lyzinski, V., Marchette, D. J., & Sussman, D. L. (2015). A limit theorem for scaled eigenvectors of random dot product graphs. · Zbl 1338.62061
[9] Bullmore, E., & Sporns, O. (2009). Complex brain networks: graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10(3), 186.
[10] Chung, J., Pedigo, B. D., Bridgeford, E. W., Varjavand, B. K., Helm, H. S., & Vogelstein, J. T. (2019). Graspy: Graph statistics in python. Journal of Machine Learning Research, 20(158), 1-7. http://jmlr.org/papers/v20/19-490.html
[11] Eichler, K., Li, F., Litwin‐Kumar, A., Park, Y., Andrade, I., Schneider‐Mizell, C. M., Saumweber, T., Huser, A., Eschbach, C., Gerber, B., & Fetter, R. D. (2017). The complete connectome of a learning and memory centre in an insect brain. Nature, 548(7666), 175.
[12] Ghoshdastidar, D., Gutzeit, M., Carpentier, A., & vonLuxburg, U. (2017). Two‐sample tests for large random graphs using network statistics. In Proceedings of the 2017 conference on learning theory (Kale, S. (ed.), & Shamir, O. (ed.), Eds.), Proceedings of Machine Learning Research, 65, PMLR, Amsterdam, Netherlands, pp. 954-977. http://proceedings.mlr.press/v65/ghoshdastidar17a.html
[13] Ghoshdastidar, D., Gutzeit, M., Carpentier, A., & vonLuxburg, U. (2019). Two‐sample hypothesis testing for inhomogeneous random graphs. arXiv:1707.00833.
[14] Ginestet, C. E., Li, J., Balachandran, P., Rosenberg, S., & Kolaczyk, E. D. (2017). Hypothesis testing for network data in functional neuroimaging. Annals of Applied Statistics, 11(2), 725-750. https://projecteuclid.org/download/pdfview_1/euclid.aoas/1500537721 · Zbl 1391.62217
[15] Goldenberg, A., Zheng, A. X., Fienberg, S. E., & Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning, 2(2), 129-233. https://doi.org/10.1561/2200000005 · Zbl 1184.68030 · doi:10.1561/2200000005
[16] Lee, Y., Shen, C., Priebe, C. E., & Vogelstein, J. T. (2019). Network dependence testing via diffusion maps and distance‐based correlations. Biometrika, 106, 857-873. · Zbl 1435.62212
[17] Levin, K., Athreya, A., Tang, M., Lyzinski, V., & Priebe, C. E. (2017). A central limit theorem for an omnibus embedding of multiple random dot product graphs. In 2017 IEEE international conference on data mining workshops (icdmw), pp. 964-967.
[18] Levin, K., & Levina, E. (2019). Bootstrapping networks with latent space structure. arXiv:1907.10821.
[19] Lyzinski, V., Sussman, D. L., Tang, M., Athreya, A., & Priebe, C. E. (2014). Perfect clustering for stochastic blockmodel graphs via adjacency spectral embedding. Electronic Journal of Statistics, 8(2), 2905-2922. https://doi.org/10.1214/14-EJS978 · Zbl 1308.62131 · doi:10.1214/14-EJS978
[20] Maugis, P.‐A. G., Olhede, S. C., Priebe, C. E., & Wolfe, P. J. (2020). Testing for equivalence of network distribution using subgraph counts. Journal of Computational and Graphical Statistics, 0(0), 1-11. https://doi.org/10.1080/10618600.2020.1736085 · Zbl 07499288 · doi:10.1080/10618600.2020.1736085
[21] Panda, S., Palaniappan, S., Xiong, J., Bridgeford, E. W., Mehta, R., Shen, C., & Vogelstein, J. T. (2020). hyppo: A comprehensive multivariate hypothesis testing python package.
[22] Rukhin, A., & Priebe, C. E. (2011). A comparative power analysis of the maximum degree and size invariants for random graph inference. Journal of Statistical Planning and Inference, 141(2), 1041-1046. · Zbl 1353.05113
[23] Schönemann, P. H. (1966). A generalized solution of the orthogonal procrustes problem. Psychometrika, 31(1), 1-10. · Zbl 0147.19401
[24] Shen, C., Priebe, C. E., & Vogelstein, J. T. (2019a). From distance correlation to multiscale graph correlation. Journal of the American Statistical Association.
[25] Shen, C., Priebe, C. E., & Vogelstein, J. T. (2019b). The exact equivalence of independence testing and two‐sample testing. arXiv:1910.08883.
[26] Shen, C., & Vogelstein, J. T. (2018). The exact equivalence of distance and kernel methods for hypothesis testing. arXiv:1806.05514.
[27] Shen, C., & Vogelstein, J. T. (2019). The chi‐square test of distance correlation. arXiv preprint arXiv:1912.12150.
[28] Sussman, D. L., Tang, M., Fishkind, D. E., & Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. Journal of the American Statistical Association, 107(499), 1119-1128. https://doi.org/10.1080/01621459.2012.699795 · Zbl 1443.62188 · doi:10.1080/01621459.2012.699795
[29] Sussman, D. L., Tang, M., & Priebe, C. E. (2014). Consistent latent position estimation and vertex classification for random dot product graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 48-57.
[30] Szekely, G., & Rizzo, M. (2014). Partial distance correlation with methods for dissimilarities. Annals of Statistics, 42(6), 2382-2412. · Zbl 1309.62105
[31] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., Park, Y., & Priebe, C. E. (2017). A semiparametric two‐sample hypothesis testing problem for random graphs. Journal of Computational and Graphical Statistics, 26(2), 344-354.
[32] Tang, M., Athreya, A., Sussman, D. L., Lyzinski, V., & Priebe, C. E. (2017). A nonparametric two‐sample hypothesis testing problem for random graphs. Bernoulli, 23(3), 1599-1630. https://doi.org/10.3150/15-BEJ789 · Zbl 1450.62040 · doi:10.3150/15-BEJ789
[33] Vogelstein, J. T., Wang, Q., Bridgeford, E., Priebe, C. E., Maggioni, M., & Shen, C. (2019). Discovering and deciphering relationships across disparate data modalities. eLife, 8, e41690.
[34] Wasserman, S., & Faust, K. (1994). Social Network Analysis: Methods and Applications, Vol. 8: Cambridge university press.
[35] Zhu, M., & Ghodsi, A. (2006). Automatic dimensionality selection from the scree plot via the use of profile likelihood. Computational Statistics & Data Analysis, 51(2), 918-930. · Zbl 1157.62429
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.