
Correlation clustering with partial information. (English) Zbl 1202.68479

Arora, Sanjeev (ed.) et al., Approximation, randomization, and combinatorial optimization. Algorithms and techniques. 6th international workshop on approximation algorithms for combinatorial optimization problems, APPROX 2003 and 7th international workshop on randomization and approximation techniques in computer science, RANDOM 2003, Princeton, NJ, USA, August 24–26, 2003. Proceedings. Berlin: Springer (ISBN 3-540-40770-7/pbk). Lect. Notes Comput. Sci. 2764, 1-13 (2003).
Summary: We consider the following general correlation-clustering problem: given a graph with real edge weights (both positive and negative), partition the vertices into clusters to minimize the total absolute weight of cut positive edges and uncut negative edges. Thus, large positive weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster; large negative weights encourage the endpoints to belong to different clusters; and weights with small absolute value represent little information. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition.
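In symbols (our paraphrase of the stated objective, with \(w_{uv}\) the given edge weights and \(\mathcal{C}\) ranging over partitions of the vertex set, \(\mathcal{C}(u)\) denoting the cluster containing \(u\)), the goal is to compute
\[
\min_{\mathcal{C}} \; \sum_{\substack{uv \in E:\; w_{uv} > 0 \\ \mathcal{C}(u) \neq \mathcal{C}(v)}} w_{uv} \;+\; \sum_{\substack{uv \in E:\; w_{uv} < 0 \\ \mathcal{C}(u) = \mathcal{C}(v)}} |w_{uv}|.
\]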
Correlation clustering was introduced by Bansal, Blum, and Chawla, motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has weight \(+1\) or \(-1\). We give an \(O(\log n)\)-approximation algorithm for the general case based on a linear-programming rounding and the “region-growing” technique. We also prove that this linear program has a gap of \(\Omega(\log n)\), and therefore our approximation is tight under this approach. We also give an \(O(r^3)\)-approximation algorithm for \(K_{r,r}\)-minor-free graphs. On the other hand, we show that the problem is APX-hard, and any \(o(\log n)\)-approximation would require improving the best approximation algorithms known for minimum multicut.
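As a sketch of the LP-rounding approach (our reconstruction of the natural relaxation, not quoted from the paper), one may introduce for each pair \(u,v\) a variable \(x_{uv} \in [0,1]\), interpreted as the extent to which \(u\) and \(v\) are separated, enforce the triangle inequalities, and minimize the fractional disagreement cost:
\[
\min \; \sum_{w_{uv} > 0} w_{uv}\, x_{uv} \;+\; \sum_{w_{uv} < 0} |w_{uv}|\,(1 - x_{uv})
\quad \text{s.t.} \quad x_{uw} \le x_{uv} + x_{vw} \ \ \forall\, u,v,w, \qquad x_{uv} \in [0,1].
\]
Rounding an optimal fractional solution via region growing then yields the stated \(O(\log n)\) guarantee, and the \(\Omega(\log n)\) integrality gap shows this analysis cannot be improved.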
For the entire collection see [Zbl 1026.00023].

MSC:

68W25 Approximation algorithms
68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)
90C27 Combinatorial optimization
90C59 Approximation methods and heuristics in mathematical programming