×

Impact of regularization on spectral clustering. (English) Zbl 1357.62229

This paper reviews the regularized spectral clustering (RSC) procedure proposed in [A. A. Amini et al., Ann. Stat. 41, No. 4, 2097–2122 (2013; Zbl 1277.62166)] and the stochastic block model (SBM) introduced in [P. W. Holland et al., “Stochastic blockmodels: first steps”, Soc. Netw. 5, 109–137 (1983; doi:10.1016/0378-8733(83)90021-7)]. It focuses on the SBM and an extension of this model, and attempts to understand regularization for SBM. It provides a theoretical justification for the regularization in the RSC procedure. It is shown why choosing a large regularization parameter can lead to good results. It also partly explains empirical findings in [Amini et al., loc. cit.] showing that the performance of regularized spectral clustering becomes insensitive for larger values of regularization parameters. A data dependent method DKest is proposed for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.

MSC:

62H99 Multivariate analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G20 Asymptotic properties of nonparametric inference
62G05 Nonparametric estimation
65C60 Computational problems in statistics (MSC2010)

Citations:

Zbl 1277.62166

References:

[1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3 rd International Workshop on Link Discovery 36-43. ACM, New York.
[2] Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097-2122. · Zbl 1277.62166 · doi:10.1214/13-AOS1138
[3] Awasthi, P. and Sheffet, O. (2012). Improved spectral-norm bounds for clustering. In Approximation , Randomization , and Combinatorial Optimization. Lecture Notes in Computer Science 7408 37-49. Springer, Heidelberg. · Zbl 1358.68220 · doi:10.1007/978-3-642-32512-0_4
[4] Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 1373-1396. · Zbl 1085.68119 · doi:10.1162/089976603321780317
[5] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169 . Springer, New York. · Zbl 0863.15001
[6] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1359.62411 · doi:10.1073/pnas.0907096106
[7] Chaudhuri, K., Chung, F. and Tsiatas, A. Spectral clustering of graphs with general degrees in the extended planted partition model. J. Mach. Learn. Res. 2012 1-23.
[8] Chen, A., Amini, A., Bickel, P. and Levina, L. (2012). Fitting community models to large sparse networks. In Joint Statistical Meetings , San Diego .
[9] Dhillon, I. S. (2001). Co-clustering documents and words using bipartite spectral graph partitioning. In Proc. Seventh ACM SIGKDD Inter. Conf. on Know. Disc. and Data Mining 269-274. ACM, New York.
[10] Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23-39. · Zbl 1314.05186 · doi:10.1137/120875600
[11] Hagen, L. and Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 11 1074-1085.
[12] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109-137. · doi:10.1016/0378-8733(83)90021-7
[13] Joseph, A. and Yu, B. (2016). Supplement to “Impact of regularization on spectral clustering.” . · Zbl 1357.62229 · doi:10.1214/16-AOS1447
[14] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107. · doi:10.1103/PhysRevE.83.016107
[15] Kumar, A. and Kannan, R. (2010). Clustering with spectral norm and the \(k\)-means algorithm. In 2010 IEEE 51 st Annual Symposium on Foundations of Computer Science FOCS 2010 299-308. IEEE Computer Soc., Los Alamitos, CA. · doi:10.1109/FOCS.2010.35
[16] Kwok, T. C., Lau, L. C., Lee, Y. T., Oveis Gharan, S. and Trevisan, L. (2013). Improved Cheeger’s inequality: Analysis of spectral partitioning algorithms through higher order spectral gap. In STOC’ 13 -Proceedings of the 2013 ACM Symposium on Theory of Computing 11-20. ACM, New York. · Zbl 1293.05301 · doi:10.1145/2488608.2488611
[17] Le, C. M. and Vershynin, R. (2015). Concentration and regularization of random graphs. Available at . arXiv:1506.0066
[18] Mackey, L., Jordan, M. I., Chen, R. Y., Farrell, B. and Tropp, J. A. (2012). Matrix concentration inequalities via the method of exchangeable pairs. Available at . arXiv:1201.6002 · Zbl 1294.60008 · doi:10.1214/13-AOP892
[19] McSherry, F. (2001). Spectral partitioning of random graphs. In 42 nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) 529-537. IEEE Computer Soc., Los Alamitos, CA.
[20] Newman, M. E. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113. · Zbl 1032.91716
[21] Ng, A. Y., Jordan, M. I., Weiss, Y. et al. (2002). On spectral clustering: Analysis and an algorithm. Adv. Neural Inf. Process. Syst. 2 849-856.
[22] Oliveira, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Available at . arXiv:0911.0600 · Zbl 1173.60343 · doi:10.1214/08-AAP550
[23] Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Available at . arXiv:1309.4111
[24] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878-1915. · Zbl 1227.62042 · doi:10.1214/11-AOS887
[25] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 888-905.
[26] Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119-1128. · Zbl 1443.62188 · doi:10.1080/01621459.2012.699795
[27] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395-416. · doi:10.1007/s11222-007-9033-z
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.