×

Convex decomposition based cluster labeling method for support vector clustering. (English) Zbl 1280.68191

Summary: Support vector clustering (SVC) is an important boundary-based clustering algorithm in multiple applications for its capability of handling arbitrary cluster shapes. However, SVC’s popularity is degraded by its highly intensive time complexity and poor label performance. To overcome such problems, we present a novel efficient and robust convex decomposition based cluster labeling (CDCL) method based on the topological property of dataset. The CDCL decomposes the implicit cluster into convex hulls and each one is comprised by a subset of support vectors (SVs). According to a robust algorithm applied in the nearest neighboring convex hulls, the adjacency matrix of convex hulls is built up for finding the connected components; and the remaining data points would be assigned the label of the nearest convex hull appropriately. The approach’s validation is guaranteed by geometric proofs. Time complexity analysis and comparative experiments suggest that CDCL improves both the efficiency and clustering quality significantly.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68W05 Nonnumerical algorithms
68T10 Pattern recognition, speech recognition

Software:

clusterpath; UCI-ml
Full Text: DOI

References:

[1] Xu R, Wunsch D C. Hierarchical clustering. In Clustering, Hoboken: John Wiley&Sons, 2008.
[2] Ben-Hur A, Horn D, Siegelmann H T, Vapnik V N. Support vector clustering. Journal of Machine Learning Research, 2001, 2: 125–137.
[3] Schölkopf B, Platt J C, Shawe-Taylor J C, Smola A J, Williamson R C. Estimating the support of a high-dimensional distribution. Neural Computation, 2001, 13(7): 1443–1472. · Zbl 1009.62029 · doi:10.1162/089976601750264965
[4] Tax D M J, Duin R P W. Support vector domain description. Pattern Recognition Letters, 1999, 20(11–13): 1191–1199. · doi:10.1016/S0167-8655(99)00087-2
[5] Burges C J C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 1998, 2(2): 121–167. · doi:10.1023/A:1009715923555
[6] Yang J H, Estivill-Castro V, Chalup S K. Support vector clustering through proximity graph modelling. In Proc. the 9th International Conference on Neural Information Processing (ICONIP 2002), Orchid Country Club, Singapore, Nov. 18–22, 2002, pp.898–903.
[7] Lee J, Lee D. An improved cluster labeling method for support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(3): 461–464. · doi:10.1109/TPAMI.2005.47
[8] Lee J, Lee D. Dynamic characterization of cluster structures for robust and inductive support vector clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006, 28(11): 1869–1874. · doi:10.1109/TPAMI.2006.225
[9] Lee D, Lee J. Equilibrium-based support vector machine for semisupervised classi\={}cation. IEEE Transactions on Neural Networks, 2007, 18(2): 578–583. · doi:10.1109/TNN.2006.889495
[10] Lee S-H, Daniels K M. Cone cluster labeling for support vector clustering. In Proc. the 6th SIAM Conference on Data Mining, Bethesda, Maryland, Apr. 20–22, 2006, pp.484–488.
[11] Varma C M B S, Asharaf S, Murty M N. Rough core vector clustering. In Proc. the 2nd International Conference on Pattern Recognition and Machine Intelligence, Kolkata, India, Dec. 18–22, 2007, pp.304–310.
[12] Hsieh T W, Taur J S, Tao C W, Kung S Y. A kernel-based core growing clustering method. International Journal of Intelligent Systems, 2009, 24(4): 441–458. · Zbl 1178.68496 · doi:10.1002/int.20346
[13] Ling P, Zhou C G, Zhou X. Improved support vector clustering. Eng. Applicat. Artificial Intelligence, 2010, 23(4): 552–559. · doi:10.1016/j.engappai.2010.01.029
[14] Ping Y, Zhou Y J, Yang Y X. A novel scheme for accelerating support vector clustering. Computing and Infomatics, 2012, 31(2): in press. · Zbl 1399.62106
[15] Jung J H, Kim N, Lee J. Dynamic pattern denoising method using multi-basin system with kernels. Pattern Recognition, 2011, 44(8): 1698–1707. · Zbl 1218.68125 · doi:10.1016/j.patcog.2011.02.004
[16] Vapnik V. The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995. · Zbl 0833.62008
[17] Jung K H, Lee D, Lee J. Fast support-based clustering method for large-scale problems. Pattern Recognition, 2010, 43(5): 1975–1983. · Zbl 1191.68580 · doi:10.1016/j.patcog.2009.12.010
[18] Lee D, Lee J. Dynamic dissimilarity measure for support-based clustering. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(6): 900–905. · doi:10.1109/TKDE.2009.140
[19] Ben-Hur A, Horn D, Siegelmann H T, Vapnik V. A support vector cluster method. In Proc. the 15th International Conference on Pattern Recognition, Barcelona, Spain, Sep. 3–7, 2000, pp.724–727.
[20] Estivill-Castro V, Lee I. AMOEBA: Hierarchical clustering based on spatial proximity using Delaunay diagram. In Proc. the 9th Int. Symposium on Spatial Data Handling, Beijing, China, Aug. 10–12, 2000, pp.7a.26–7a.41.
[21] Estivill-Castro V, Lee I, Murray A T. Criteria on proximity graphs for boundary extraction and spatial clustering. In Proc. the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD2001), Hong Kong, China, Apr. 16–18, 2001, pp.348–357. · Zbl 0989.68591
[22] Ban T, Abe S. Spatially chunking support vector clustering algorithm. In Proc. International Joint Conference on Neural Networks, Budapest, Hungary, July 25–29, 2004, pp.413–418.
[23] Puma-Villanueva W J, Bezerra G B, Lima C A M, Zuben F J V. Improving support vector clustering with ensembles. In Proc. International Joint Conference on Neural Networks, Montreal, Quebec, Canada, Jul. 31–Aug. 4, 2005, pp.13–15.
[24] Chiang J H, Hao P Y. A new kernel-based fuzzy clustering approach: Support vector clustering with cell growing. IEEE Transactions on Fuzzy Systems, 2003, 11(4): 518–527. · doi:10.1109/TFUZZ.2003.814839
[25] Wang F, Zhao B, Zhang C S. Linear time maximum margin clustering. IEEE Transactions on Neural Networks, 2010, 21(2): 319–332. · doi:10.1109/TNN.2009.2036998
[26] Peng J M, Mukherjee L, Singh V, Schuurmans D, Xu L L. An efficient algorithm for maximal margin clustering. Journal of Global Optimization, 2011, 2: 1–15. doi: 10.1007/s10898-011-9691-4 . · Zbl 1230.90133
[27] Lee S H, Daniels K. Gaussian kernel width selection and fast cluster labeling for support vector clustering. University of Massachusetts, Lowell, Technical Report No. 2005–009, 2005.
[28] Lee C H, Yang H C. Construction of supervised and unsupervised learning systems for multilingual text categorization. Expert Systems with Applications, 2009, 36(2): 2400–2410. · doi:10.1016/j.eswa.2007.12.052
[29] Lee D, Jung K H, Lee J. Constructing sparse kernel machines using attractors. IEEE Transactions on Neural Networks, 2009, 20(4): 721–729. · doi:10.1109/TNN.2009.2014059
[30] Hao P Y, Chiang J H, Tu Y K. Hierarchically SVM classification based on support vector clustering method and its application to document categorization. Expert Systems with Applications, 2007, 33(3): 627-635. · doi:10.1016/j.eswa.2006.06.009
[31] Hocking T D, Joulin A, Bach F, Vert J P. Clusterpath: An algorithm for clustering using convex fusion penalties. In Proc. the 28th International Conference on Machine Learning (ICML), Bellevue, WA, USA, Jun. 28–Jul. 2, 2011, pp.1–8.
[32] Shamir O, Tishby N. Stability and model selection in k-means clustering. Machine Learning, 2010, 80(2–3): 213–243. · Zbl 1470.62094 · doi:10.1007/s10994-010-5177-8
[33] Lee S H, Daniels K. Gaussian kernel width generator for support vector clustering. In Proc. International Conference on Bioinformatics and Its Applications, Fort Lauderdale, Florida, USA, Dec. 16–19, 2004, pp.151–162.
[34] Boyd S, Vandenberghe L. Convex Optimization. Cambridge, 7th edition, New York: Cambridge University Press, 2009. · Zbl 1058.90049
[35] Li Y H, Maguire L. Selecting critical patterns based on local geometrical and statistical information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(6): 1189–1201. · doi:10.1109/TPAMI.2010.188
[36] Kpotufe S, von Luxburg U. Pruning nearest neighbor cluster trees. In Proc. the 28th International Conference on Machine Learning (ICML 2011), Bellevue, USA, Jun. 28–Jul. 2, 2011, pp.225–232.
[37] Kim H C, Lee J. Clustering based on gaussian processes. Neural Computation, 2007, 19(11): 3088–3107. · Zbl 1143.68574 · doi:10.1162/neco.2007.19.11.3088
[38] Camastra F, Verri A. A novel kernel method for clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(5): 801–805. · doi:10.1109/TPAMI.2005.88
[39] Frank A, Asuncion A. UCI machine learning repository. http://archive.ics.uci.edu/ml , 2010.
[40] Hubert L, Arabie P. Comparing partitions. Journal of Classification, 1985, 2(1): 193-218. · Zbl 0587.62128 · doi:10.1007/BF01908075
[41] Wu M, Scholkopf B. A local learning approach for clustering. In Proc. the 20th Annual Conference on Advances Neural Information Processing Systems (NIPS 2007), Vancouver, B.C., Canada, Dec. 4–7, 2006, pp.1529–1536.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.