A support vector machine-based ensemble algorithm for breast cancer diagnosis. (English) Zbl 1403.92109

Summary: This research studies a support vector machine (SVM)-based ensemble learning algorithm for breast cancer diagnosis. Illness diagnosis plays a critical role in designating treatment strategies, which are highly related to patient safety. Nowadays, numerous classification models in data mining domains are adapted to breast cancer diagnosis based on patients’ historical medical records. However, the performance of each algorithm depends on various model configurations, such as input feature types and model parameters. To tackle the limitation of individual model performance, this research focuses on breast cancer diagnosis that uses an SVM-based ensemble learning algorithm to reduce the diagnosis variance and increase diagnosis accuracy. Twelve different SVMs, based on the proposed weighted area under the receiver operating characteristic curve ensemble (WAUCE) approach, are hybridized. To evaluate the performance of the proposed model, Wisconsin Breast Cancer, Wisconsin Diagnostic Breast Cancer, and the U.S. National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program breast cancer datasets have been studied. The experimental results show that the WAUCE model achieves a higher accuracy with a significantly lower variance for breast cancer diagnosis compared to five other ensemble mechanisms and two common ensemble models, i.e., adaptive boosting and bagging classification tree. The proposed WAUCE model reduces the variance by 97.89% and increases accuracy by 33.34%, compared to the best single SVM model on the SEER dataset. In practice, the proposed methodology can be further applied to other illness diagnoses, which offers an alternative to a safer, more reliable, and more robust illness diagnosis process.
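The summary does not say how WAUCE derives its weights or how the twelve SVMs are configured. As a rough illustration only, the following Python sketch assumes the simplest AUC-weighted scheme: each base classifier's validation AUC is normalized into a voting weight, and the ensemble score is the weighted sum of the base scores. The function names, toy labels, and scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def auc_weights(labels: Sequence[int],
                model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weighting: each model's validation AUC divided by the sum of AUCs."""
    aucs = [auc(labels, s) for s in model_scores]
    total = sum(aucs)
    if total == 0:
        raise ValueError("all base models have zero AUC")
    return [a / total for a in aucs]


def ensemble_scores(weights: Sequence[float],
                    model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of the base classifiers' scores, one value per sample."""
    n_samples = len(model_scores[0])
    return [sum(w * s[i] for w, s in zip(weights, model_scores))
            for i in range(n_samples)]


# Hypothetical validation data: 1 = malignant, 0 = benign.
val_labels = [1, 1, 0, 0]
# Hypothetical scores from three base classifiers (stand-ins for the SVMs).
val_scores = [
    [0.9, 0.8, 0.2, 0.1],
    [0.9, 0.3, 0.4, 0.1],
    [0.6, 0.4, 0.5, 0.3],
]

weights = auc_weights(val_labels, val_scores)
combined = ensemble_scores(weights, val_scores)
predictions = [1 if s >= 0.5 else 0 for s in combined]  # 0.5 cut-off is an assumption
```

In this toy setting the better-ranking classifier receives the largest weight, so its scores dominate the combined decision. A faithful implementation would follow the weighting rule and SVM configurations given in the paper itself.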

MSC:

92C50 Medical applications (general)
62P10 Applications of statistics to biology and medical sciences; meta analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

AdaBoost.MH

References:

[1] Abonyi, J.; Szeifert, F., Supervised fuzzy clustering for the identification of fuzzy classifiers, Pattern Recognition Letters, 24, 14, 2195-2207, (2003) · Zbl 1047.68119
[2] Ades, F.; Zardavas, D.; Bozovic-Spasojevic, I.; Pugliano, L.; Fumagalli, D.; de Azambuja, E., Luminal B breast cancer: molecular characterization, clinical management, and future perspectives, Journal of Clinical Oncology, 32, 25, 2794-2803, (2014)
[3] Aruna, S.; Rajagopalan, S.; Nandakishore, L., Knowledge based analysis of various statistical tools in detecting breast cancer, Computer Science & Information Technology, 2, 37-45, (2011)
[4] Ayat, N.-E.; Cheriet, M.; Suen, C. Y., Automatic model selection for the optimization of SVM kernels, Pattern Recognition, 38, 10, 1733-1745, (2005)
[5] Bashir, S.; Qamar, U.; Khan, F. H., Heterogeneous classifiers fusion for dynamic breast cancer diagnosis using weighted vote based ensemble, Quality & Quantity, 49, 5, 2061-2076, (2015)
[6] Bradley, A. P., The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition, 30, 7, 1145-1159, (1997)
[7] Breiman, L., Bagging predictors, Machine Learning, 24, 2, 123-140, (1996) · Zbl 0858.68080
[8] Cawley, G. C., Model selection for support vector machines via adaptive step-size tabu search, Artificial neural nets and genetic algorithms, 434-437, (2001), Springer · Zbl 1011.68030
[9] Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S., Choosing multiple parameters for support vector machines, Machine Learning, 46, 1-3, 131-159, (2002) · Zbl 0998.68101
[10] Chauhan, N.; Ravi, V.; Chandra, D. K., Differential evolution trained wavelet neural networks: application to bankruptcy prediction in banks, Expert Systems with Applications, 36, 4, 7659-7665, (2009)
[11] Chen, H.-L.; Yang, B.; Liu, J.; Liu, D.-Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis, Expert Systems with Applications, 38, 7, 9014-9022, (2011)
[12] Chen, P.-H.; Lin, C.-J.; Schölkopf, B., A tutorial on ν-support vector machines, Applied Stochastic Models in Business and Industry, 21, 2, 111-136, (2005) · Zbl 1097.93040
[13] Cortes, C.; Vapnik, V., Support vector networks, Machine Learning, 20, 3, 273-297, (1995) · Zbl 0831.68098
[14] Delen, D.; Walker, G.; Kadam, A., Predicting breast cancer survivability: a comparison of three data mining methods, Artificial Intelligence in Medicine, 34, 2, 113-127, (2005)
[15] Forouzanfar, M. H.; Foreman, K. J.; Delossantos, A. M.; Lozano, R.; Lopez, A. D.; Murray, C. J.; Naghavi, M., Breast and cervical cancer in 187 countries between 1980 and 2010: A systematic analysis, The Lancet, 378, 9801, 1461-1484, (2011)
[16] Freund, Y.; Schapire, R. E., Experiments with a new boosting algorithm, Proceedings of the International Conference on Machine Learning, 96, 148-156, (1996)
[17] Friedman, M., The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association, 32, 200, 675-701, (1937) · JFM 63.1098.02
[18] Friedrichs, F.; Igel, C., Evolutionary tuning of multiple SVM parameters, Neurocomputing, 64, 107-117, (2005)
[19] Gao, S.; Lee, C.-H.; Lim, J. H., An ensemble classifier learning approach to ROC optimization, Proceedings of the 18th International Conference on Pattern Recognition, 2, 679-682, (2006), IEEE
[20] Gao, S.; Sun, Q., Improving semantic concept detection through optimizing ranking function, IEEE Transactions on Multimedia, 9, 7, 1430-1442, (2007)
[21] Gomes, T. A.; Prudêncio, R. B.C.; Soares, C.; Rossi, A. L.; Carvalho, A., Combining meta-learning and search techniques to SVM parameter selection, Proceedings of the 11th Brazilian Symposium on Neural Networks, 79-84, (2010), IEEE
[22] Gupta, S.; Kumar, D.; Sharma, A., Data mining classification techniques applied for breast cancer diagnosis and prognosis, Indian Journal of Computer Science and Engineering (IJCSE), 2, 2, 188-195, (2011)
[23] Ishikawa, T.; Takahashi, J.; Takemura, H.; Mizoguchi, H.; Kuwata, T., Gastric lymph node cancer detection using multiple features support vector machine for pathology diagnosis support system, Proceedings of the 15th International Conference on Biomedical Engineering, 120-123, (2014), Springer
[24] Kamruzzaman, J.; Begg, R. K., Support vector machines and other pattern recognition approaches to the diagnosis of cerebral palsy gait, IEEE Transactions on Biomedical Engineering, 53, 12, 2479-2490, (2006)
[25] Karabatak, M.; Ince, M. C., An expert system for detection of breast cancer based on association rules and neural network, Expert Systems with Applications, 36, 2, 3465-3469, (2009)
[26] Kate, R. J.; Nadig, R., Stage-specific predictive models for breast cancer survivability, International Journal of Medical Informatics, 97, 304-311, (2017)
[27] Khan, M. U.; Choi, J. P.; Shin, H.; Kim, M., Predicting breast cancer survivability using fuzzy decision trees for personalized healthcare, Proceedings of the EMBS 30th Annual International Conference of the IEEE on Engineering in Medicine and Biology Society, 5148-5151, (2008), IEEE
[28] Kim, W.; Kim, K. S.; Lee, J. E.; Noh, D.-Y.; Kim, S.-W.; Jung, Y. S., Development of novel breast cancer recurrence prediction model using support vector machine, Journal of Breast Cancer, 15, 2, 230-238, (2012)
[29] Levesque, J.-C.; Durand, A.; Gagne, C.; Sabourin, R., Multi-objective evolutionary optimization for generating ensembles of classifiers in the ROC space, Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, 879-886, (2012), ACM
[30] Li, X.; Wang, L.; Sung, E., Adaboost with SVM-based component classifiers, Engineering Applications of Artificial Intelligence, 21, 5, 785-795, (2008)
[31] Liu, E. T.; Sotiriou, C., Defining the galaxy of gene expression in breast cancer, Breast Cancer Research, 4, 4, 141, (2002)
[32] Liu, H.; Zhang, R.; Luan, F.; Yao, X.; Liu, M.; Hu, Z.; Fan, B. T., Diagnosing breast cancer based on support vector machines, Journal of Chemical Information and Computer Sciences, 43, 3, 900-907, (2003)
[33] Lorena, A. C.; De Carvalho, A. C., Evolutionary tuning of SVM parameter values in multiclass problems, Neurocomputing, 71, 16, 3326-3334, (2008)
[34] Mangasarian, O. L.; Street, W. N.; Wolberg, W. H., Breast cancer diagnosis and prognosis via linear programming, Operations Research, 43, 4, 570-577, (1995) · Zbl 0857.90073
[35] Naveen, N.; Ravi, V.; Rao, C. R.; Chauhan, N., Differential evolution trained radial basis function network: application to bankruptcy prediction in banks, International Journal of Bio-Inspired Computation, 2, 3-4, 222-232, (2010)
[36] Neville, J.; Jensen, D., A bias/variance decomposition for models using collective inference, Machine Learning, 73, 1, 87-106, (2008) · Zbl 1470.68152
[37] Quinlan, J. R., Improved use of continuous attributes in C4.5, Journal of Artificial Intelligence Research, 4, 77-90, (1996) · Zbl 0900.68112
[38] Ravdin, P. M.; Clark, G. M., A practical application of neural network analysis for predicting outcome of individual breast cancer patients, Breast Cancer Research and Treatment, 22, 3, 285-293, (1992)
[39] Ravi, V.; Reddy, P.; Zimmermann, H.-J., Pattern classification with principal component analysis and fuzzy rule bases, European Journal of Operational Research, 126, 3, 526-533, (2000) · Zbl 0986.90015
[40] Ravi, V.; Zimmermann, H.-J., Fuzzy rule based classification with feature selector and modified threshold accepting, European Journal of Operational Research, 123, 1, 16-28, (2000) · Zbl 0961.90135
[41] Rokach, L., Ensemble-based classifiers, Artificial Intelligence Review, 33, 1-2, 1-39, (2010)
[42] Rosales-Pérez, A.; Escalante, H. J.; Gonzalez, J. A.; Reyes-Garcia, C. A.; Coello, C. A.C., Bias and variance multi-objective optimization for support vector machines model selection, Pattern recognition and image analysis, 108-116, (2013), Springer
[43] Salama, G. I.; Abdelhalim, M.; Zeid, M. A., Breast cancer diagnosis on three different datasets using multi-classifiers, International Journal of Computer and Information Technology, 1, 1, 0764-2277, (2012)
[44] Schapire, R. E.; Freund, Y., Boosting: Foundations and algorithms, (2012), MIT Press · Zbl 1278.68021
[45] Schölkopf, B.; Smola, A. J.; Williamson, R. C.; Bartlett, P. L., New support vector algorithms, Neural Computation, 12, 5, 1207-1245, (2000)
[46] Shah, V.; Turkbey, B.; Mani, H.; Pang, Y.; Pohida, T.; Merino, M. J., Decision support system for localizing prostate cancer based on multiparametric magnetic resonance imaging, Medical Physics, 39, 7, 4093-4103, (2012)
[47] Siegel, R. L.; Miller, K. D.; Jemal, A., Cancer statistics, 2015, CA: A Cancer Journal for Clinicians, 65, 1, 5-29, (2015)
[48] Son, Y.-J.; Kim, H.-G.; Kim, E.-H.; Choi, S.; Lee, S.-K., Application of support vector machine for prediction of medication adherence in heart failure patients, Healthcare Informatics Research, 16, 4, 253-259, (2010)
[49] Sotiriou, C.; Neo, S.-Y.; McShane, L. M.; Korn, E. L.; Long, P. M.; Jazaeri, A., Breast cancer classification and prognosis based on gene expression profiles from a population-based study, Proceedings of the National Academy of Sciences, 100, 18, 10393-10398, (2003)
[50] SEER, (2017). Surveillance, Epidemiology, and End Results (SEER) program research data (1973-2014). In National Cancer Institute, DCCPS, Surveillance Research Program, Surveillance Systems Branch, released April 2017, based on the November 2016 submission. SEER. (www.seer.cancer.gov)
[51] West, D.; Mangiameli, P.; Rampal, R.; West, V., Ensemble strategies for a medical diagnostic decision support system: A breast cancer diagnosis application, European Journal of Operational Research, 162, 2, 532-551, (2005) · Zbl 1176.90328
[52] Wickramaratna, J.; Holden, S.; Buxton, B., Performance degradation in boosting, Multiple classifier systems, 11-21, (2001), Springer · Zbl 0980.68780
[53] Wolpert, D. H., The supervised learning no-free-lunch theorems, Soft computing and industry, 25-42, (2002), Springer
[54] Zeng, T.; Liu, J., Mixture classification model based on clinical markers for breast cancer prognosis, Artificial Intelligence in Medicine, 48, 2, 129-137, (2010)
[55] Zheng, B.; Yoon, S. W.; Lam, S. S., Breast cancer diagnosis based on feature extraction using a hybrid of k-means and support vector machine algorithms, Expert Systems with Applications, 41, 4, 1476-1482, (2014)
[56] Zheng, B.; Zhang, J.; Yoon, S. W.; Lam, S. S.; Khasawneh, M.; Poranki, S., Predictive modeling of hospital readmissions using metaheuristics and data mining, Expert Systems with Applications, 42, 20, 7110-7120, (2015)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.