
Classifier chains: a review and perspectives. (English) Zbl 1512.68287

Summary: The family of methods collectively known as classifier chains has become a popular approach to multi-label learning problems. This approach involves chaining together off-the-shelf binary classifiers in a directed structure, such that individual label predictions become features for other classifiers. Such methods have proved flexible and effective and have obtained state-of-the-art empirical performance across many datasets and multi-label evaluation metrics. This performance led to further studies of the underlying mechanism and efficacy, and to investigations of how it could be improved. Over the past decade, numerous studies have explored the theoretical underpinnings of classifier chains, and many improvements have been made to the training and inference procedures, such that this method remains among the best options for multi-label learning. Given this past and ongoing interest, spanning a broad range of applications and research themes, the goal of this work is to provide a review of classifier chains, a survey of the techniques and extensions found in the literature, and perspectives for this approach in the domain of multi-label classification in the future. We conclude positively, with a number of recommendations for researchers and practitioners, as well as an outline of key issues for future research.
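To make the chaining mechanism concrete, the sketch below trains a single chain on a synthetic multi-label problem using scikit-learn's ClassifierChain meta-estimator (scikit-learn is reference [43] below); the toy dataset, logistic-regression base learner, and random chain order are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal classifier-chain sketch: one binary classifier per label, where
# each link in the chain also receives the predicted labels of all earlier
# links as additional input features.
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# Toy multi-label problem: each instance may carry several of 5 labels.
X, Y = make_multilabel_classification(n_samples=500, n_classes=5,
                                      n_labels=2, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Chain of logistic regressions over a random label order.
chain = ClassifierChain(LogisticRegression(max_iter=1000),
                        order="random", random_state=0)
chain.fit(X_train, Y_train)

# For multi-label targets, score() reports subset (exact-match) accuracy.
print(chain.score(X_test, Y_test))
```

In the ensemble variant discussed in the review, several such chains with different random label orders are trained and their predictions averaged, which reduces sensitivity to any single ordering.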

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Borchani, H., Varando, G., Bielza, C., & Larrañaga, P. (2015). A survey on multi-output regression. Wiley Int. Rev. Data Min. and Knowl. Disc., 5(5), 216-233.
[2] Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140. · Zbl 0858.68080
[3] Burkhardt, S., & Kramer, S. (2015). On the spectrum between binary relevance and classifier chains in multi-label classification. In SAC 2015: 30th ACM Symposium on Applied Computing, pp. 885-892. ACM.
[4] Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2-3), 211-225. · Zbl 1470.68091
[5] Cisse, M., Al-Shedivat, M., & Bengio, S. (2016). Adios: Architectures deep in output space. In Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, pp. 2770-2779, New York, New York, USA. PMLR.
[6] da Silva, P. N., Gonçalves, E. C., Plastino, A., & Freitas, A. A. (2014). Distinct chains for different instances: An effective strategy for multi-label classifier chains. In Calders, T., Esposito, F., Hüllermeier, E., & Meo, R. (Eds.), Machine Learning and Knowledge Discovery in Databases, pp. 453-468, Berlin, Heidelberg. Springer Berlin Heidelberg.
[7] Daumé III, H., Langford, J., & Marcu, D. (2009). Search-based structured prediction. Machine Learning, 75(3), 297-325. · Zbl 1470.68094
[8] Dembczyński, K., Cheng, W., & Hüllermeier, E. (2010). Bayes optimal multilabel classification via probabilistic classifier chains. In ICML '10: 27th International Conference on Machine Learning, pp. 279-286, Haifa, Israel. Omnipress.
[9] Dembczyński, K., Kotlowski, W., Waegeman, W., Busa-Fekete, R., & Hüllermeier, E. (2016). Consistency of probabilistic classifier trees. In ECML-PKDD 2016: Machine Learning and Knowledge Discovery in Databases, Vol. 9852, pp. 511-526. Springer.
[10] Dembczyński, K., Waegeman, W., Cheng, W., & Hüllermeier, E. (2012a). On label dependence and loss minimization in multi-label classification. Machine Learning, 88(1-2), 5-45. · Zbl 1243.68237
[11] Dembczyński, K., Waegeman, W., & Hüllermeier, E. (2012b). An analysis of chaining in multi-label classification. In ECAI: European Conference on Artificial Intelligence, Vol. 242, pp. 294-299. IOS Press. · Zbl 1327.68189
[12] Dietterich, T. G. (2002). Machine learning for sequential data: A review. In Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition, pp. 15-30, London, U.K. Springer-Verlag. · Zbl 1073.68712
[13] Doppa, J. R., Fern, A., & Tadepalli, P. (2014a). HC-search: A learning framework for search-based structured prediction. Journal of Artificial Intelligence Research, 50, 369-407. · Zbl 1367.68263
[14] Doppa, J. R., Yu, J., Ma, C., Fern, A., & Tadepalli, P. (2014b). HC-search for multilabel prediction: An empirical study. In Brodley, C. E., & Stone, P. (Eds.), AAAI Conference on Artificial Intelligence, July 27-31, 2014, Québec City, Québec, Canada, pp. 1795-1801. AAAI Press.
[15] Enrique Sucar, L., Bielza, C., Morales, E. F., Hernandez-Leal, P., Zaragoza, J. H., & Larrañaga, P. (2014). Multi-label classification with Bayesian network-based chain classifiers. Pattern Recognition Letters, 41(C), 14-22.
[16] Frank, E., & Kramer, S. (2004). Ensembles of nested dichotomies for multi-class problems. In Proceedings of the Twenty-First International Conference on Machine Learning, ICML '04, p. 39, New York, NY, USA. Association for Computing Machinery.
[17] Gasse, M. (2017). Probabilistic Graphical Model Structure Learning: Application to Multi-Label Classification. PhD thesis, Université de Lyon.
[18] Gonçalves, E. C., Plastino, A., & Freitas, A. A. (2013). A genetic algorithm for optimizing the label ordering in multi-label classifier chains. In 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp. 469-476.
[19] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. The MIT Press. · Zbl 1373.68009
[20] Guo, Y., & Gu, S. (2011). Multi-label classification using conditional dependency networks. In IJCAI '11: 22nd International Joint Conference on Artificial Intelligence, pp. 1300-1305. IJCAI/AAAI.
[21] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778.
[22] Jun, X., Lu, Y., Lei, Z., & Guolun, D. (2019). Conditional entropy based classifier chains for multi-label classification. Neurocomputing, 335, 185-194.
[23] Kajdanowicz, T., & Kazienko, P. (2009). Hybrid repayment prediction for debt portfolio. In Nguyen, N. T., Kowalczyk, R., & Chen, S.-M. (Eds.), Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, pp. 850-857, Berlin, Heidelberg. Springer Berlin Heidelberg.
[24] Kajdanowicz, T., & Kazienko, P. (2013). Heuristic classifier chains for multi-label classification. In Larsen, H. L., Martin-Bautista, M. J., Vila, M. A., Andreasen, T., & Christiansen, H. (Eds.), Flexible Query Answering Systems, pp. 555-566, Berlin, Heidelberg. Springer Berlin Heidelberg.
[25] Kiritchenko, S., Matwin, S., Nock, R., & Famili, F. (2006). Learning and evaluation in the presence of class hierarchies: Application to text categorization. In Proc. of the 19th Canadian Conference on Artificial Intelligence, pp. 395-406.
[26] Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2007). Ensembles of multi-objective decision trees. In Proc. of the 18th European Conf. on Machine Learning, ECML '07, pp. 624-631, Berlin, Heidelberg. Springer-Verlag.
[27] Kumar, A., Vembu, S., Menon, A., & Elkan, C. (2013). Beam search algorithms for multilabel learning. Machine Learning, 92(1), 65-89. · Zbl 1273.68301
[28] Leathart, T., Frank, E., Pfahringer, B., & Holmes, G. (2019). On calibration of nested dichotomies. In Advances in Knowledge Discovery and Data Mining - 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17, 2019, Proceedings, Part I, pp. 69-80.
[29] Li, C.-L., & Lin, H.-T. (2014). Condensed filter tree for cost-sensitive multi-label classification. In Xing, E. P., & Jebara, T. (Eds.), Proceedings of the 31st International Conference on Machine Learning, Vol. 32 of Proceedings of Machine Learning Research, pp. 423-431, Beijing, China. PMLR.
[30] Lin, W., & Xu, D. (2016). Imbalanced multi-label learning for identifying antimicrobial peptides and their functional types. Bioinformatics, 32(24), 3745-3752.
[31] Liu, B., & Tsoumakas, G. (2018). Making classifier chains resilient to class imbalance. In Zhu, J., & Takeuchi, I. (Eds.), Proceedings of The 10th Asian Conference on Machine Learning, Vol. 95 of Proceedings of Machine Learning Research, pp. 280-295. PMLR.
[32] Liu, J., Chang, W.-C., Wu, Y., & Yang, Y. (2017). Deep learning for extreme multi-label text classification. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '17, pp. 115-124, New York, NY, USA. ACM.
[33] Loza Mencía, E., & Janssen, F. (2016). Learning rules for multi-label classification: a stacking and a separate-and-conquer approach. Machine Learning, 105(1), 77-126.
[34] Madjarov, G., Kocev, D., Gjorgjevikj, D., & Džeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084-3104.
[35] Mena, D., Montañés, E., Quevedo, J. R., & del Coz, J. J. (2016). An overview of inference methods in probabilistic classifier chains for multilabel classification. Wiley Int. Rev. Data Min. and Knowl. Disc., 6(6), 215-230.
[36] Molnar, C. (2019). Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/.
[37] Montiel, J., Read, J., Bifet, A., & Abdessalem, T. (2018). Scikit-MultiFlow: A multi-output streaming framework. Journal of Machine Learning Research, 19(72), 1-5.
[38] Moyano, J. M., Galindo, E. L. G., Cios, K. J., & Ventura, S. (2018). Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion, 44, 33-45.
[39] Nam, J., Kim, J., Mencía, E. L., Gurevych, I., & Fürnkranz, J. (2014). Large-scale multilabel text classification - revisiting neural networks. In ECML-PKDD '14: 25th European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 437-452.
[40] Nam, J., Loza Mencía, E., Kim, H. J., & Fürnkranz, J. (2017). Maximizing subset accuracy with recurrent neural networks in multi-label classification. In Advances in Neural Information Processing Systems 30, pp. 5413-5423.
[41] Narassiguin, A., Elghazel, H., & Aussem, A. (2017). Dynamic ensemble selection with probabilistic classifier chains. In Ceci, M., Hollmén, J., Todorovski, L., Vens, C., & Džeroski, S. (Eds.), Machine Learning and Knowledge Discovery in Databases, pp. 169-186, Cham. Springer International Publishing.
[42] Park, L. A. F., & Read, J. (2018). A blended metric for multi-label optimisation and evaluation. In ECML-PKDD 2018: 29th European Conference on Machine Learning, pp. 719-734.
[43] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. · Zbl 1280.68189
[44] Pinto, F., Soares, C., & Mendes-Moreira, J. (2016). Chade: Metalearning with classifier chains for dynamic combination of classifiers. In Frasconi, P., Landwehr, N., Manco, G., & Vreeken, J. (Eds.), Machine Learning and Knowledge Discovery in Databases, pp. 410-425, Cham. Springer International Publishing.
[45] Powell, W. B. (2019). A unified framework for stochastic optimization. European Journal of Operational Research, 275(3), 795-821. · Zbl 1430.90445
[46] Puurula, A., Read, J., & Bifet, A. (2014). Kaggle LSHTC4 winning solution. Technical report on the winning solution to the LSHTC4 Kaggle competition.
[47] Ramírez-Corona, M., Sucar, L. E., & Morales, E. F. (2014). Chained path evaluation for hierarchical multi-label classification.
[48] Read, J., & Hollmén, J. (2017). Multi-label classification using labels as hidden nodes. Tech. rep. arXiv:1503.09022v3, arXiv.org.
[49] Read, J., & Martino, L. (2020). Probabilistic regressor chains with Monte-Carlo methods. Neurocomputing, in press, 1-26.
[50] Read, J., Martino, L., & Hollmén, J. (2017). Multi-label methods for prediction with sequential data. Pattern Recognition, 63, 45-55.
[51] Read, J., Martino, L., & Luengo, D. (2014). Efficient Monte Carlo methods for multidimensional learning with classifier chains. Pattern Recognition, 47(3), 1535-1546. · Zbl 1326.68251
[52] Read, J., Martino, L., Olmos, P. M., & Luengo, D. (2015). Scalable multi-output label prediction: From classifier chains to classifier trellises. Pattern Recognition, 48(6), 2096-2109. · Zbl 1374.68421
[53] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2009). Classifier chains for multi-label classification. In ECML 2009: 20th European Conference on Machine Learning, pp. 254-269. Springer.
[54] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333-359.
[55] Read, J., Puurula, A., & Bifet, A. (2014). Multi-label classification with meta labels. In ICDM '14: IEEE International Conference on Data Mining, pp. 941-946. IEEE.
[56] Rivolli, A., Read, J., Soares, C., Pfahringer, B., & de Carvalho, A. C. P. L. F. (2020). An empirical analysis of binary transformation strategies and base algorithms for multi-label learning. Machine Learning, in press, 1-55.
[57] Scanagatta, M., Salmerón, A., & Stella, F. (2019). A survey on Bayesian network structure learning from data. Progress in Artificial Intelligence, 425-439.
[58] Senge, R., del Coz, J., & Hüllermeier, E. (2013). Rectifying classifier chains for multi-label classification. In Lernen, Wissen, Adaption (LWA) 2013, pp. 162-169.
[59] Senge, R., del Coz, J. J., & Hüllermeier, E. (2014). On the problem of error propagation in classifier chains for multi-label classification. In Spiliopoulou, M., Schmidt-Thieme, L., & Janning, R. (Eds.), Data Analysis, Machine Learning and Knowledge Discovery, pp. 163-170, Cham. Springer International Publishing.
[60] Szymański, P., & Kajdanowicz, T. (2017). A scikit-based Python environment for performing multi-label classification. ArXiv e-prints.
[61] Teisseyre, P. (2017). CCnet: Joint multi-label classification and feature selection using classifier chains and elastic net regularization. Neurocomputing, 235, 98-111.
[62] Tenenboim-Chekina, L., Rokach, L., & Shapira, B. (2013). Ensemble of feature chains for anomaly detection. In Multiple Classifier Systems, Vol. 7872 of Lecture Notes in Computer Science, pp. 295-306. Springer Berlin Heidelberg.
[63] Tsoumakas, G., Katakis, I., & Vlahavas, I. (2011). Random k-labelsets for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079-1089.
[64] Waegeman, W., Dembczyński, K., & Hüllermeier, E. (2019). Multi-target prediction: a unifying view on problems and methods. Data Mining and Knowledge Discovery, 33(2), 293-324. · Zbl 1464.62399
[65] Wever, M., Tornede, A., Mohr, F., & Hüllermeier, E. (2020). LiBRe: Label-wise selection of base learners in binary relevance for multi-label classification. In Berthold, M. R., Feelders, A., & Krempl, G. (Eds.), Advances in Intelligent Data Analysis XVIII, pp. 561-573, Cham. Springer International Publishing.
[66] Wydmuch, M., Jasinska, K., Kuznetsov, M., Busa-Fekete, R., & Dembczynski, K. (2018). A no-regret generalization of hierarchical softmax to extreme multi-label classification. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., & Garnett, R. (Eds.), Advances in Neural Information Processing Systems 31, pp. 6355-6366. Curran Associates, Inc.
[67] Zaragoza, J. H., Sucar, L. E., Morales, E. F., Bielza, C., & Larrañaga, P. (2011). Bayesian chain classifiers for multidimensional classification. In 22nd International Joint Conference on Artificial Intelligence (IJCAI '11), pp. 2192-2197.
[68] Zhang, M.-L., Li, Y.-K., Liu, X.-Y., & Geng, X. (2018). Binary relevance for multi-label learning: an overview. Frontiers of Computer Science, 12(2), 191-202.
[69] Zhang, M.-L., & Zhang, K. (2010). Multi-label learning by exploiting label dependency. In KDD '10: 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 999-1008. ACM.
[70] Zhang, M.-L., & Zhou, Z.-H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819-1837.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced with data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect match.