×

Fully dynamic \(k\)-center clustering with outliers. (English) Zbl 07785279

Summary: We consider the robust version of the classic \(k\)-center clustering problem, where we wish to remove up to \(z\) points (outliers), so as to be able to cluster the remaining points in \(k\) clusters with minimum maximum radius. We study such a problem under the fully dynamic adversarial model, where points can be inserted or deleted arbitrarily. In this setting, the main goal is to design algorithms that maintain a high quality solution at any point in time, while requiring a “small” amortized cost, i.e. a “small” number of operations per insertion or deletion, on average. In our work, we provide the first constant bi-criteria approximation algorithm for such a problem with its amortized cost being independent of both \(z\) and the size of the current input. We also complement our positive result with a lower bound showing that any constant (non bi-criteria) approximation algorithm has amortized cost at least linear in \(z\). Finally, we conduct an in-depth experimental analysis of our algorithm on Twitter, Flickr, and Air-Quality datasets showing the effectiveness of our approach.

MSC:

68Wxx Algorithms in computer science
05Cxx Graph theory
Full Text: DOI

References:

[1] Hou, B.J., Zhang, L., Zhou, Z.H.: Learning with feature evolvable streams. In: Advances in Neural Information Processing Systems, pp. 1417-1427 (2017)
[2] Wang, H., Fan, W., Yu, P.S., Han, J.: Mining concept-drifting data streams using ensemble classifiers. In: SIGKDD, pp. 226-235 (2003)
[3] Zhang, P., Li, J., Wang, P., Gao, B.J., Zhu, X., Guo, L.: Enabling fast prediction for ensemble models on data streams. In: SIGKDD, pp. 177-185 (2011)
[4] Chan, T.H.H., Guerqin, A., Sozio, M.: Fully Dynamic k-Center Clustering. In: WWW, pp. 579-587 (2018)
[5] Goranci, G., Henzinger, M., Leniowski, D., Svozil, A.: Fully Dynamic k-Center Clustering in Doubling Metrics (2019). arXiv:1908.03948 · Zbl 07302443
[6] Schmidt, M., Sohler, C.: Fully dynamic hierarchical diameter k-clustering and k-center (2019). arXiv:1908.02645
[7] Cohen-Addad, V., Hjuler, N.O.D., Parotsidis, N., Saulpic, D., Schwiegelshohn, C.: Fully Dynamic Consistent Facility Location. In: NeurIPS, pp. 3250-3260 (2019)
[8] Henzinger, M., Kale, S.: Fully-Dynamic Coresets. In: ESA 2020. vol. 173 of LIPIcs, pp. 57:1-57:21 (2020) · Zbl 07651196
[9] Bateni, M., Esfandiari, H., Jayaram, R., Mirrokni, V.S.: Optimal Fully Dynamic k-Centers Clustering (2021) arXiv:2112.07050
[10] Ceccarello, M.; Pietracaprina, A.; Pucci, G., Solving k-center Clustering (with Outliers) in MapReduce and Streaming, almost as Accurately as Sequentially, Proceedings of the VLDB Endowment., 12, 7, 766-778 (2019) · doi:10.14778/3317315.3317319
[11] de Berg, M., Monemizadeh, M., Zhong, Y.: k-Center Clustering with Outliers in the Sliding-Window Model. In: ESA. vol. 204. Schloss Dagstuhl, pp. 13:1-13:13 (2021) · Zbl 07740868
[12] Charikar, M., Khuller, S., Mount, D.M., Narasimhan, G.: Algorithms for facility location problems with outliers. In: SODA, pp. 642-651 (2001) · Zbl 1012.90026
[13] Bhaskara, A., Vadgama, S., Xu, H.: Greedy sampling for approximate clustering in the presence of outliers. In: NeurIPS, pp. 11146-11155 (2019)
[14] Harris, DG; Pensyl, TW; Srinivasan, A.; Trinh, K., A lottery model for center-type problems with outliers, ACM Trans Algor., 15, 3, 36:1-36:25 (2019) · Zbl 1454.68182
[15] Malkomes, G., Kusner, M.J., Chen, W., Weinberger, K.Q., Moseley, B.: Fast distributed k-center clustering with outliers on massive data. In: Advances in Neural Information Processing Systems, pp. 1063-1071 (2015)
[16] McCutchen, R.M., Khuller, S.: Streaming algorithms for k-center clustering with outliers and with anonymity. In: APPROX-RANDOM. Springer, pp. 165-178 (2008) · Zbl 1159.68672
[17] Cohen-Addad, V., Schwiegelshohn, C., Sohler, C.: Diameter and k-Center in Sliding Windows. In: ICALP, pp. 19:1-19:12 (2016) · Zbl 1388.68305
[18] Ding, H., Yu, H., Wang, Z.: Greedy Strategy Works for k-Center Clustering with Outliers and Coreset Construction. In: ESA (2019) · Zbl 07525477
[19] Huang, L., Jiang, S., Li, J., Wu, X.: Epsilon-coresets for clustering (with outliers) in doubling metrics. In: FOCS, pp. 814-825 (2018)
[20] Putina, A., Sozio, M., Rossi, D., Navarro, J.M.: Random histogram forest for unsupervised anomaly detection. In: ICDM. IEEE, pp. 1226-1231 (2020)
[21] Alon, N., Krivelevich, M., Newman, I., Szegedy, M.: Regular Languages Are Testable with a Constant Number of Queries. In: FOCS, pp. 645-655 (1999) · Zbl 0992.68064
[22] Epasto, A., Lattanzi, S., Sozio, M.: Efficient densest subgraph computation in evolving graphs. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, May 18-22, 2015, pp. 300-310 (2015)
[23] Chan, T.H.H., Lattanzi, S., Sozio, M., Wang, B.: Fully dynamic k-center clustering with outliers. In: COCOON. Vol. 13595 of Lecture Notes in Computer Science, pp. 150-161. Springer (2022) · Zbl 07724741
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.