×

Community detection with nodal information: likelihood and its variational approximation. (English) Zbl 07853566

Summary: Community detection is one of the fundamental problems in the study of network data. Most existing community detection approaches only consider edge information as inputs, and the output could be suboptimal when nodal information is available. In such cases, it is desirable to leverage nodal information for the improvement of community detection accuracy. Towards this goal, we propose a flexible network model incorporating nodal information and develop likelihood-based inference methods. For the proposed methods, we establish favorable asymptotic properties as well as efficient algorithms for computation. Numerical experiments show the effectiveness of our methods in utilizing nodal information across a variety of simulated and real network data sets.
{© 2021 John Wiley & Sons, Ltd.}

MSC:

62-XX Statistics

Software:

SDPT3
Full Text: DOI

References:

[1] Abbe, E., & Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms. arXiv:1503.00609.
[2] Airoldi, E. M., Blei, D. M., Fienberg, S. E., & Xing, E. P. (2009). Mixed membership stochastic blockmodels. In Advances in neural information processing systems, pp. 33-40.
[3] Akoglu, L., Tong, H., Meeder, B., & Faloutsos, C. (2012). Pics: Parameter‐free identification of cohesive subgroups in large attributed graphs. In Sdm, Citeseer, pp. 439-450.
[4] Amini, A. A., Chen, A., Bickel, P. J., & Levina, E. (2013). Pseudo‐likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4), 2097-2122. · Zbl 1277.62166
[5] Amini, A. A., & Levina, E. (2014). On semidefinite relaxations for the block model. arXiv:1406.5647. · Zbl 1393.62021
[6] Ana, L. N. F., & Jain, A. K. (2003). Robust data clustering. In Computer vision and pattern recognition, 2003. Proceedings. 2003 IEEE computer society conference on, 2, IEEE, pp. II-128.
[7] Anandkumar, A., Ge, R., Hsu, D., & Kakade, S. M. (2014). A tensor approach to learning mixed membership community models. The Journal of Machine Learning Research, 15(1), 2239-2312. · Zbl 1318.68136
[8] Bickel, P. J., & Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, 106(50), 21068-21073. · Zbl 1359.62411
[9] Bickel, P. J., Choi, D., Chang, X., & Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. The Annals of Statistics, 41(4), 1922-1943. · Zbl 1292.62042
[10] Binkiewicz, N., Vogelstein, J. T., & Rohe, K. (2017). Covariate‐assisted spectral clustering. Biometrika, 104(2), 361-377. · Zbl 1506.62319
[11] Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1), 1-122. · Zbl 1229.90122
[12] Cai, T. T., & Li, X. (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. The Annals of Statistics, 43(3), 1027-1059. · Zbl 1328.62381
[13] Celisse, A., Daudin, J.‐J., & Pierre, L. (2012). Consistency of maximum‐likelihood and variational estimators in the stochastic block model. Electronic Journal of Statistics, 6, 1847-1899. · Zbl 1295.62028
[14] Chang, J., & Blei, D. M. (2010). Hierarchical relational models for document networks. The Annals of Applied Statistics, 4, 124-150. · Zbl 1189.62191
[15] Chen, Y., Li, X., & Xu, J. (2015). Convexified modularity maximization for degree‐corrected stochastic block models. arXiv:1512.08425.
[16] Chen, Y., Sanghavi, S., & Xu, H. (2012). Clustering sparse graphs, Advances in neural information processing systems, pp. 2204-2212.
[17] Choi, D. S., Wolfe, P. J., & Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes. Biometrika, 99, 273-284. · Zbl 1318.62207
[18] Cross, R. L., & Parker, A. (2004). The Hidden Power of Social Networks: Understanding How Work Really Gets Done in Organizations: Harvard Business Review Press.
[19] Dasgupta, A., Hopcroft, J. E., & McSherry, F. (2004). Spectral analysis of random graphs with skewed degree distributions, Foundations of computer science, 2004. Proceedings. 45th annual IEEE symposium on, pp. 602-610.
[20] Daudin, J.‐J., Picard, F., & Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18(2), 173-183.
[21] Decelle, A., Krzakala, F., Moore, C., & Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6), 066106.
[22] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39, 1-38. · Zbl 0364.62022
[23] Deshpande, Y., Montanari, A., Mossel, E., & Sen, S. (2018). Contextual stochastic block models, Advances in neural information processing systems, pp. 8581-8593.
[24] Fortunato, S. (2010). Community detection in graphs. Physics Reports, 486(3), 75-174.
[25] Gao, C., Ma, Z., Zhang, A. Y., & Zhou, H. H. (2015). Achieving optimal misclassification proportion in stochastic block model. arXiv:1505.03772.
[26] Guédon, O., & Vershynin, R. (2016). Community detection in sparse networks via Grothendieck inequality. Probability Theory and Related Fields, 165, 1025-1049. · Zbl 1357.90111
[27] Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14(5), 1303-1347. · Zbl 1317.68163
[28] Holland, P. W., Laskey, K. B., & Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks, 5(2), 109-137.
[29] Huang, S., & Feng, Y. (2018). Pairwise covariates‐adjusted block model for community detection. arXiv preprint arXiv:1807.03469.
[30] Huang, S., Weng, H., & Feng, Y. (2020). Spectral clustering via adaptive layer aggregation for multi‐layer networks. arXiv preprint arXiv:2012.04646.
[31] Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193-218.
[32] Jin, J. (2015). Fast community detection by score. The Annals of Statistics, 43(1), 57-89. · Zbl 1310.62076
[33] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning, 37(2), 183-233. · Zbl 0945.68164
[34] Joseph, A., & Yu, B. (2013). Impact of regularization on spectral clustering. arXiv:1312.1733.
[35] Karrer, B., & Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Physical Review E, 83(1), 016107.
[36] Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., & Zhang, P. (2013). Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52), 20935-20940. · Zbl 1359.62252
[37] Le, C. M., & Levina, E. (2015). Estimating the number of communities in networks by spectral methods. arXiv preprint arXiv:1507.00827.
[38] Lei, J. (2016). A goodness‐of‐fit test for stochastic block models. The Annals of Statistics, 44(1), 401-424. · Zbl 1331.62283
[39] Lei, J., & Rinaldo, A. (2014). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1), 215-237. · Zbl 1308.62041
[40] Montanari, A., & Sen, S. (2015). Semidefinite programs on sparse random graphs and their application to community detection. arXiv:1504.05910.
[41] Nallapati, R., & Cohen, W. W. (2008). Link‐plsa‐lda: A new unsupervised model for topics and influence of blogs., Icwsm.
[42] Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge‐scale optimization problems. SIAM Journal on Optimization, 22(2), 341-362. · Zbl 1257.90073
[43] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45(2), 167-256. · Zbl 1029.68010
[44] Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
[45] Newman, M. E. J., & Clauset, A. (2015). Structure and inference in annotated networks. arXiv:1507.04001.
[46] Newman, M. E. J., & Clauset, A. (2016). Structure and inference in annotated networks. Nature Communications, 7(1), 1-11.
[47] Qin, T., & Rohe, K. (2013). Regularized spectral clustering under the degree‐corrected stochastic blockmodel, Advances in neural information processing systems, pp. 3120-3128.
[48] Rohe, K., Chatterjee, S., & Yu, B. (2011). Spectral clustering and the high‐dimensional stochastic blockmodel. The Annals of Statistics, 39(4), 1878-1915. · Zbl 1227.62042
[49] Ruan, Y., Fuhry, D., & Parthasarathy, S. (2013). Efficient community detection in large networks using content and links, Proceedings of the 22nd international conference on world wide web, pp. 1089-1098.
[50] Saade, A., Krzakala, F., & Zdeborová, L. (2014). Spectral clustering of graphs with the bethe hessian, Advances in neural information processing systems, pp. 406-414.
[51] Saldana, D. F., Yu, Y., & Feng, Y. (2017). How many communities are there?Journal of Computational and Graphical Statistics, 26(1), 171-181.
[52] Stegehuis, C., & Massoulié, L. (2019). Efficient inference in stochastic block models with vertex labels. IEEE Transactions on Network Science and Engineering, 7(3), 1215-1225.
[53] Steinhaeuser, K., & Chawla, N. V. (2010). Identifying and evaluating community structure in complex networks. Pattern Recognition Letters, 31(5), 413-421.
[54] Tütüncü, R. H., Toh, K. C., & Todd, M. J. (2003). Solving semidefinite‐quadratic‐linear programs using sdpt3. Mathematical Programming, 95(2), 189-217. · Zbl 1030.90082
[55] Wang, Y. X. R. achel, & Bickel, P. J. (2017). Likelihood‐based model selection for stochastic block models. The Annals of Statistics, 45(2), 500-528. · Zbl 1371.62017
[56] Yan, B., & Sarkar, P. (2020). Covariate regularized community detection in sparse graphs. Journal of the American Statistical Association, 116, 1-12.
[57] Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes, Data mining (icdm), 2013 ieee 13th international conference on, pp. 1151-1156.
[58] Yuan, M., Liu, R., Feng, Y., & Shang, Z. (2021). Testing community structures for hypergraphs. The Annals of Statistics, to appear.
[59] Yurtsever, A., Tropp, J. A., Fercoq, O., Udell, M., & Cevher, V. (2021). Scalable semidefinite programming. SIAM Journal on Mathematics of Data Science, 3(1), 171-200. · Zbl 1470.90068
[60] Zhang, Y., Levina, E., & Zhu, J. (2014). Detecting overlapping communities in networks with spectral methods. arXiv:1412.3432.
[61] Zhang, Y., Levina, E., & Zhu, J. (2016). Community detection in networks with node features. Electronic Journal of Statistics, 10(2), 3153-3178. · Zbl 1359.62271
[62] Zhang, A. Y., & Zhu, H. H. (2015). Minimax rates of community detection in stochastic block models. arXiv preprint arXiv:1507.05313.
[63] Zhao, Y., Levina, E., & Zhu, J. (2012). Consistency of community detection in networks under degree‐corrected stochastic block models. The Annals of Statistics, 40(4), 2266-2292. · Zbl 1257.62095
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.