×

Clustering data that are graph connected. (English) Zbl 1403.90630

Summary: A new combinatorial model for clustering is proposed for all applications in which individual and relational data are available. Individual data refer to the intrinsic features of units, they are stored in a matrix \(D\), and are the typical input of all clustering algorithms proposed so far. Relational data refer to the observed links between units, representing social ties such as friendship, joint participation to social events, and so on. Relational data are stored in the graph \(G = (V, E)\), and the data available for clustering are the triplet \(G = (V, E, D)\), called attributed graph. Known clustering algorithms can take advantage of the relational structure of \(G\) to redefine and refine the units membership. For example, uncertain membership of units to groups can be resolved using the sociological principle that ties are more likely to form between similar units. The model proposed here shows how to take into account the graph information, combining the clique partitioning objective function (a known clustering methodology) with connectivity as the structural constraint of the resulting clusters. The model can be formulated and solved using integer linear programming and a new family of cutting planes. Moderate size problems are solved, and heuristic procedures are developed for instances in which the optimal solution can only be approximated. Finally, tests conducted on simulated data show that the clusters quality is greatly improved through this methodology.

MSC:

90C35 Programming involving graphs or networks
62H30 Classification and discrimination; cluster analysis (statistical aspects)
05C90 Applications of graph theory
90C10 Integer programming
90C27 Combinatorial optimization

References:

[1] Bansal, N.; Blum, A.; Chawla, S., Correlation clustering, Machine Learning, 56, 1-3, 89-113 (2004) · Zbl 1089.68085
[2] Bektaş, T.; Gouveia, L., Requiem for the Miller-Tucker-Zemlin subtour elimination constraints?, European Journal of Operational Research, 236, 3, 820-832 (2014) · Zbl 1304.90168
[3] Benati, S., Categorical data fuzzy clustering: An analysis of local search heuristics, Computers and Operations Research, 35, 3, 766-775 (2008) · Zbl 1278.90484
[4] Bertsimas, D.; King, A., Or forum- an algorithmic approach to linear regression, Operations Research, 64, 1, 2-16 (2016) · Zbl 1338.90272
[5] Bertsimas, D.; Shioda, R., Classification and regression via integer optimization, Operations Research, 55, 2, 252-271 (2007) · Zbl 1167.90593
[6] Bothorel, C.; Cruz, J.; Magnani, M.; Micenkova, B., Clustering attributed graphs: Models, measures and methods, Network Science, 3, 408-444 (2015)
[7] Cafieri, S.; Hansen, P.; Mladenovic, N., Edge-ratio network clustering by variable neighborhood search, European Physical Journal B, 87, 5 (2014) · Zbl 1476.05192
[8] Charikar, M.; Guruswami, V.; Wirth, A., Clustering with qualitative information, Journal of Computer and System Sciences, 71, 3, 360-383 (2005) · Zbl 1094.68075
[9] Cheng, H.; Zhou, Y.; Huang, X.; Yu, J. X., Clustering large attributed information networks: an efficient incremental computing approach, Data Mining and Knowledge Discovery, 25, 3, 450-477 (2012) · Zbl 1259.05150
[10] Combe, D.; Largeron, C.; Egyed-Zsigmond, E.; Géry, M., Combining relations and text in scientific network clustering, Proceedings of the international conference on advances in social networks analysis and mining, ASONAM 2012, Istanbul, Turkey, 26-29 august 2012, 1248-1253 (2012)
[11] Gavish, B., Formulations and algorithms for the capacitated minimal directed tree problem, Journal of the Association for Computing Machinery, 30, 1, 118-132 (1983) · Zbl 0504.90052
[12] Gouveia, L., Using the Miller-Tucker-Zemlin constraints to formulate a minimal spanning tree problem with hop constraints, Computers & Operations Research, 2, 3, 959-970 (1996) · Zbl 0854.90139
[13] Grötschel, M.; Wakabayashi, Y., A cutting plane algorithm for a clustering problem, Mathematical Programming, 45, 1-3, 59-96 (1989) · Zbl 0675.90072
[14] Grötschel, M.; Wakabayashi, Y., Facets of the clique partitioning polytope, Mathematical Programming, 47, 3, (Ser. A), 367-387 (1990) · Zbl 0715.90092
[15] Hansen, P.; Mladenovic, N.; Moreno Perez, J., Variable neighbourhood search: Methods and applications, Annals of Operations Research, 175, 1, 367-407 (2010) · Zbl 1185.90211
[16] Hansen, P.; Ruiz, M.; Aloise, D., A VNS heuristic for escaping local extrema entrapment in normalized cut clustering, Pattern Recognition, 45, 12, 4337-4345 (2012)
[17] Hubert, L.; Arabie, P., Comparing partitions, Journal of Classification, 2, 1, 193-218 (1985)
[18] Inglehart, R.; Baker, W., Modernization, cultural change, and the persistence of traditional values, American Sociological Review, 65, 1, 19-51 (2000)
[19] Jaehn, F.; Pesch, E., New bounds and constraint propagation techniques for the clique partitioning problem, Discrete Applied Mathematics, 161, 13-14, 2025-2037 (2013) · Zbl 1287.68155
[20] Johnson, E. L.; Mehrotra, A.; Nemhauser, G. L., Min-cut clustering, Mathematical Programming, 62, 1, Ser. B, 133-151 (1993) · Zbl 0807.90117
[21] Landete, M.; Marín, A., Looking for edge-equitable spanning trees, Computers and Operations Research, 41, 44-52 (2014) · Zbl 1348.90600
[22] Laporte, G., The traveling salesman problem: An overview of exact and approximate algorithms, European Journal of Operational Research, 59, 2, 231-247 (1992) · Zbl 0760.90089
[23] Marcotorchino, F.; Michaud, P., Agrégation de similarités en classification automatique, Review of Statistics and Its Application, 30, 2, 21-44 (1982) · Zbl 0537.62006
[24] Martí, R.; Resende, M. G.C.; Ribeiro, C. C., Multi-start methods for combinatorial optimization, European Journal of Operational Research, 226, 1, 1-8 (2013) · Zbl 1292.90257
[25] McPherson, M.; Smith-Lovin, L.; Cook, J., Birds of a feather: Homophily in social networks, Annual Review of Sociology, 27, 415-444 (2001)
[26] Miller, C. E.; Tucker, A. W.; Zemlin, R. A., Integer programming formulation of traveling salesman problems, Journal of the Association for Computing Machinery, 7, 326-329 (1960) · Zbl 0100.15101
[27] Miyauchi, A.; Sukegawa, N., Redundant constraints in the standard formulation for the clique partitioning problem, Optimization Letters, 9, 1, 199-207 (2015) · Zbl 1316.90057
[28] Neville, J.; Adler, M.; Jensen, D. D., Clustering relational data using attribute and link information, Proceedings of the workshop on text mining and link analysis, 18th international joint conference on artificial intelligence. Acapulco, Mexico (2003)
[29] Newman, M.; Girvan, M., Finding and evaluating community structure in networks, Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 69, 2 2, 026113-1-026113-15 (2004)
[30] Swamy, C., Correlation clustering: Maximizing agreements via semidefinite programming, Proceedings of the fifteenth annual ACM-SIAM symposium on discrete algorithms, 526-527(electronic) (2004), ACM, New York · Zbl 1318.68197
[31] Wasserman, M.; Faust, K., Social networks analysis: Methods and applications (1994), Cambridge University Press
[32] Xu, Z.; Ke, Y.; Wang, Y.; Cheng, H.; Cheng, J., GBAGC: A general Bayesian framework for attributed graph clustering, ACM Transactions on Knowledge Discovery from Data, 9, 1, 5:1-5:43 (2014)
[33] Zhou, Y.; Hao, J.-K.; Goëffon, A., A three-phased local search approach for the clique partitioning problem (English) Zbl 06620838, J. Comb. Optim., 32, 2, 469-491 (2016) · Zbl 1354.90125
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.