KNN loss and deep KNN. (English) Zbl 1522.68471

Summary: The k Nearest Neighbor (KNN) algorithm has been widely applied in various supervised learning tasks due to its simplicity and effectiveness. However, the quality of KNN decision making depends directly on the quality of the neighborhoods in the modeling space. Efforts have been made to map data to a better feature space, either implicitly with kernel functions or explicitly by learning linear or nonlinear transformations. However, all these methods rely on pre-determined distance or similarity functions, which may limit their learning capacity. In this paper, we present two loss functions, namely KNN Loss and Fuzzy KNN Loss, that quantify the quality of the neighborhoods formed by KNN with respect to supervised learning, such that minimizing the loss on the training data maximizes KNN decision accuracy on it. We further present a deep learning strategy that learns, by minimizing KNN loss, pairwise similarities of data that implicitly map the data to a feature space in which the quality of KNN neighborhoods is optimized. Experimental results show that this deep learning strategy (denoted Deep KNN) outperforms state-of-the-art supervised learning methods on multiple benchmark data sets.

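The exact formulas for KNN Loss and Fuzzy KNN Loss are not given in this summary, so the Python sketch below only illustrates the general idea of a differentiable neighborhood-quality loss that a deep network can minimize; it follows the closely related soft-neighbor objective of neighbourhood components analysis (NCA) rather than the authors' method. EmbeddingNet, soft_knn_loss, and all layer sizes and hyperparameters are illustrative assumptions.

    # A minimal sketch, assuming an NCA-style soft-neighbor surrogate
    # for neighborhood quality; not the paper's actual KNN Loss.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EmbeddingNet(nn.Module):
        # Small MLP mapping inputs to a feature space in which
        # neighborhood quality is to be optimized (hypothetical sizes).
        def __init__(self, in_dim, emb_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 64), nn.ReLU(),
                nn.Linear(64, emb_dim),
            )

        def forward(self, x):
            return self.net(x)

    def soft_knn_loss(emb, labels):
        # Pairwise squared distances in the learned space (computed
        # directly so gradients stay finite at coincident points).
        d2 = (emb.unsqueeze(1) - emb.unsqueeze(0)).pow(2).sum(dim=2)
        # Exclude self-pairs, then turn distances into soft neighbor
        # weights: closer points receive larger weight.
        eye = torch.eye(emb.size(0), dtype=torch.bool, device=emb.device)
        p = F.softmax((-d2).masked_fill(eye, float('-inf')), dim=1)
        # Probability mass each point places on same-class neighbors;
        # minimizing the negative log pushes neighborhoods toward purity.
        same = (labels[:, None] == labels[None, :]).float()
        p_correct = (p * same).sum(dim=1).clamp_min(1e-12)
        return -p_correct.log().mean()

    # Toy usage: train the embedding to improve KNN neighborhoods.
    torch.manual_seed(0)
    x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))
    model = EmbeddingNet(in_dim=20)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = soft_knn_loss(model(x), y)
        loss.backward()
        opt.step()

After training, an ordinary KNN classifier applied in the learned embedding space should find purer neighborhoods than in the raw input space, which is the effect the paper's loss functions are designed to produce.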
MSC:

68T05 Learning and adaptive systems in artificial intelligence
68T07 Artificial neural networks and deep learning
Full Text: DOI
