
Learning distance to subspace for the nearest subspace methods in high-dimensional data classification. (English) Zbl 1443.62190

Summary: The nearest subspace methods (NSM) are a category of classification methods widely used to classify high-dimensional data. In this paper, we propose to improve the classification performance of NSM by learning tailored distance metrics from samples to class subspaces. The learned distance metric is termed 'learned distance to subspace' (LD2S). Using LD2S in the classification rule of NSM brings samples closer to their correct class subspaces and pushes them farther from the wrong class subspaces; this makes the classification task easier and improves the classification performance of NSM. The superior classification performance of NSM with LD2S is demonstrated on three real-world high-dimensional spectral datasets.
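For readers unfamiliar with the plain NSM baseline that LD2S refines, the following minimal NumPy sketch illustrates the standard nearest-subspace rule: fit a low-dimensional PCA basis per class and assign each test sample to the class with the smallest orthogonal distance to that class's subspace. The function names, the PCA construction, and the fixed number of components are illustrative assumptions; this is not the authors' LD2S metric, which replaces the raw orthogonal distance with a learned one.

```python
import numpy as np

def fit_class_subspaces(X, y, n_components=5):
    """Fit a PCA basis per class; each basis spans that class's subspace.
    Illustrative baseline, not the LD2S method itself."""
    subspaces = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        # Principal directions via SVD of the centred class data;
        # rows of Vt are orthonormal basis vectors of the subspace.
        _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
        subspaces[c] = (mean, Vt[:n_components])
    return subspaces

def nsm_predict(X, subspaces):
    """Assign each sample to the class whose subspace has the smallest
    orthogonal distance to the sample (the plain NSM rule)."""
    preds = []
    for x in X:
        dists = {}
        for c, (mean, V) in subspaces.items():
            r = x - mean
            proj = V.T @ (V @ r)                 # projection onto the subspace
            dists[c] = np.linalg.norm(r - proj)  # orthogonal residual
        preds.append(min(dists, key=dists.get))
    return np.array(preds)
```

Under LD2S, the orthogonal residual computed in `nsm_predict` would be replaced by the learned sample-to-subspace distance; the PCA-based rule above is only the baseline being improved.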

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

LMNN
