×

Clustering with qualitative information. (English) Zbl 1094.68075

Summary: We consider the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. N. Bansal, A. Blum and S. Chawla [(*) Mach. Learn. 56, 89–113 (2004; Zbl 1089.68085)] cast the problem thus: given a graph \(G\) whose edges are labeled “+” (similar) or “\(-\)” (dissimilar), partition the vertices into clusters so that the number of pairs correctly (resp., incorrectly) classified with respect to the input labeling is maximized (resp., minimized). It is worthwhile studying both complete graphs, in which every edge is labeled, and general graphs, in which some input edges might not have labels. We answer several questions left open in (*) and provide a sound overview of clustering with qualitative information.
Specifically, we demonstrate a factor 4 approximation for minimization on complete graphs, and a factor \(O(\log n)\) approximation for general graphs. For the maximization version, a PTAS for complete graphs was shown in (*), we give a factor 0.7664 approximation for general graphs, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete graphs.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68R10 Graph theory (including graph drawing) in computer science
68W25 Approximation algorithms

Citations:

Zbl 1089.68085
Full Text: DOI

References:

[1] Bansal, N.; Blum, A.; Chawla, S., Correlation clustering, (Proceedings of 43rd FOCS (2002)), 238-247
[2] Bansal, N.; Blum, A.; Chawla, S., Correlation clustering, Machine Learning, 56, 89-113 (2004) · Zbl 1089.68085
[3] Ben-Dor, A.; Shamir, R.; Yakhini, Z., Clustering gene expression patterns, J. Comput. Biol., 6, 281-297 (1999)
[4] Chen, Z.; Jiang, T.; Lin, G., Computing phylogenetic roots with bounded degrees and errors, SIAM J. Comput., 32, 4, 864-879 (2003) · Zbl 1053.68069
[5] Calinescu, G.; Karloff, H.; Rabani, Y., An improved approximation algorithm for multiway cut, J. Comput. System Sci., 60, 564-574 (2000) · Zbl 0986.90043
[6] Demaine, E.; Immorlica, N., Correlation clustering with partial information, (Proceedings of Sixth APPROX (2003)), 1-13 · Zbl 1202.68479
[7] Emanuel, D.; Fiat, A., Correlation clustering—minimizing disagreements on arbitrary weighted graphs, (Proceedings of 11th ESA (2003)), 208-220 · Zbl 1266.68228
[8] Even, G.; Naor, J.; Schieber, B.; Zosin, L., Approximating minimum subset feedback sets in undirected graphs with applications, SIAM J. Discrete Math., 25, 255-267 (2000) · Zbl 0941.68057
[9] Frieze, A.; Jerrum, M., Improved approximation algorithms for and MAX \(k\)-CUT and MAX BISECTION, (Balas, E.; Clausen, J., Proceedings of Fourth IPCO, Lecture Notes in Computer Science, vol. 920 (1995), Springer: Springer Berlin), 1-13 · Zbl 1135.90420
[10] Garg, N.; Vazirani, V.; Yannakakis, M., Approximate max-flow min-(multi)cut theorems and their applications, SIAM J. Comput., 25, 235-251 (1996) · Zbl 0844.68061
[11] Garg, N.; Vazirani, V.; Yannakakis, M., Primal-dual approximation algorithms for integral flow and multicut in trees, Algorithmica, 18, 3-20 (1997) · Zbl 0873.68075
[12] Gramm, J.; Guo, J.; Hüffner, F.; Niedermeier, R., Graph-modeled data clusteringFixed-parameter algorithms for clique generation, (Proceedings of Fifth CIAC (2003)), 108-119 · Zbl 1032.68158
[13] Guruswami, V., Inapproximability results for set splitting and satisfiability problems with no mixed clauses, Algorithmica, 38, 3, 451-469 (2003) · Zbl 1095.68036
[14] Håstad, J., Some optimal inapproximability results, J. Assoc. Comput. Mach., 48, 798-859 (2001) · Zbl 1127.68405
[15] Karger, D.; Klein, P.; Stein, C.; Thorup, M.; Young, N., Rounding algorithms for a geometric embedding of minimum multiway cut, (Proceedings of 31st STOC (1999)), 668-678 · Zbl 1345.90095
[16] Kearns, M.; Schapire, R.; Sellie, L., Toward efficient agnostic learning, Machine Learning, 17, 115-142 (1994) · Zbl 0938.68797
[17] Khot, S., On the power of unique 2-prover 1-round games, (Proceedings of 34th STOC (2002)), 767-775 · Zbl 1192.68367
[18] Papadimitrìou, C., Computational Complexity (1994), Addsion-Wesley: Addsion-Wesley Longman, New York · Zbl 0833.68049
[19] Roberts, F., Discrete mathematics, (International Encyclopedia on Social and Behavioural Sciences (2001), Elsevier: Elsevier Amsterdam), 3743-3746
[20] Shamir, R.; Sharan, R.; Tsur, D., Cluster graph modification problems, (Proceedings of 28th Workshop on Graph Theory (WG) (2002)), 379-390 · Zbl 1022.68104
[21] Swamy, C., Correlation clusteringmaximizing agreements via semidefinite programming, (Proceedings of 15th SODA (2004)), 519-520
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.