A method of data analysis based on division-mining-fusion strategy. (English) Zbl 07848808

Summary: With the advancement of data technology and storage services, the scale and complexity of data are growing rapidly. Consequently, analyzing data promptly and deriving precise insights have become urgent needs. However, traditional methods struggle to balance the speed and accuracy of data mining. This paper proposes a data analysis technique called the Division-Mining-Fusion (DMF) strategy to tackle this challenge. Specifically, we divide a large-scale, complex dataset into multiple small-scale, simple sub-datasets. Then, we extract the knowledge embedded within each sub-dataset. Finally, we combine the knowledge extracted from each sub-dataset to accomplish learning tasks. To demonstrate the superior performance of the DMF strategy, we apply it to two fields: rough set theory and feature selection. The DMF strategy can accelerate data mining, enhance the accuracy of data analysis, and reduce the dimensionality of data. These advantages suggest that the DMF strategy processes data more efficiently than traditional methods. In addition, the number of sub-datasets is a crucial parameter of the DMF strategy: as the number of sub-datasets increases, the ability of the DMF strategy to analyze data continuously improves.
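
The summary does not say how the division, mining, and fusion steps are carried out, so the following is only a minimal sketch of the three-stage pattern under stated assumptions: random row partitioning for division, a simple per-feature majority-label score as the "mined" knowledge, and averaging as the fusion rule. The function names (`divide`, `mine`, `fuse`, `dmf_select`) and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def divide(rows, k, seed=0):
    """Division (assumed rule): shuffle the rows and deal them into k sub-datasets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def mine(sub, n_features):
    """Mining (assumed measure): score each feature by the fraction of rows whose
    label equals the majority label among rows sharing that feature value."""
    scores = []
    for j in range(n_features):
        by_value = defaultdict(Counter)
        for features, label in sub:
            by_value[features[j]][label] += 1
        correct = sum(max(counts.values()) for counts in by_value.values())
        scores.append(correct / len(sub))
    return scores


def fuse(all_scores, top):
    """Fusion (assumed rule): average the scores over sub-datasets, keep the best."""
    n = len(all_scores[0])
    mean = [sum(s[j] for s in all_scores) / len(all_scores) for j in range(n)]
    return sorted(range(n), key=lambda j: mean[j], reverse=True)[:top]


def dmf_select(rows, k, top, seed=0):
    """Run divide -> mine -> fuse and return the indices of the selected features."""
    n_features = len(rows[0][0])
    subs = [s for s in divide(rows, k, seed) if s]  # skip empty sub-datasets
    return fuse([mine(s, n_features) for s in subs], top)


# Toy data: feature 0 equals the label, feature 1 is unrelated to it.
rows = [((i % 2, i % 3), i % 2) for i in range(40)]
print(dmf_select(rows, k=4, top=1))  # prints [0]
```

In this sketch `k` plays the role of the number of sub-datasets that the summary identifies as the crucial parameter. Each call to `mine` depends only on its own sub-dataset, so the mining stage could be run in parallel.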

MSC:

68-XX Computer science
92-XX Biology and other natural sciences

References:

[1] Eirola, E.; Lendasse, A.; Vandewalle, V., Mixture of Gaussians for distance estimation with missing data, Neurocomputing, 131, 32-42, 2014
[2] Yu, Q.; Miche, Y.; Eirola, E., Regularized extreme learning machine for regression with missing data, Neurocomputing, 102, 45-51, 2013
[3] Zadeh, L. A., Fuzzy sets, Inf. Control, 8, 338-353, 1965 · Zbl 0139.24606
[4] Kovalerchuk, B.; Triantaphyllou, E.; Ruiz, J. F., Fuzzy logic in computer-aided breast cancer diagnosis: analysis of lobulation, Artif. Intell. Med., 11, 75-85, 1997
[5] Li, W. T.; Zhai, S. C.; Xu, W. H., Feature selection approach based on improved fuzzy C-means with principle of refined justifiable granularity, IEEE Trans. Fuzzy Syst., 31, 2112-2126, 2023
[6] Pawlak, Z., Rough sets, Int. J. Comput. Inf. Sci., 11, 341-356, 1982 · Zbl 0501.68053
[7] Xu, W. H.; Huang, M.; Jiang, Z. Y., Graph-based unsupervised feature selection for interval-valued information system, IEEE Trans. Neural Netw. Learn. Syst., 2023
[8] Xu, W. H.; Yuan, Z. T.; Liu, Z., Feature selection for unbalanced distribution hybrid data based on k-nearest neighborhood rough set, IEEE Trans. Artif. Intell., 2023
[9] Kong, Q. Z.; Wang, W. T.; Zhang, D. X., Two kinds of average approximation accuracy, CAAI Trans. Intell. Technol., 2023
[10] Qian, Y. H.; Liang, X. Y.; Wang, Q., Local rough set: a solution to rough data analysis in big data, Int. J. Approx. Reason., 97, 38-63, 2018 · Zbl 1445.68223
[11] Kong, Q. Z.; Chang, X. E., Rough set model based on variable universe, CAAI Trans. Intell. Technol., 7, 503-511, 2022
[12] Zhang, J. B.; Li, T. R.; Ruan, D., A parallel method for computing rough set approximations, Inf. Sci., 194, 209-223, 2012
[13] Li, S. Y.; Li, T. R., A parallel matrix-based approach for computing approximations in dominance-based rough sets approach, (9th International Conference on Rough Sets and Knowledge Technology (RSKT), Shanghai, China, 2014)
[14] Zhang, J. B.; Wong, J. S.; Pan, Y., A parallel matrix-based method for computing approximations in incomplete information systems, IEEE Trans. Knowl. Data Eng., 27, 2, 326-339, 2015
[15] Yao, Y. Y., Three-way decisions with probabilistic rough sets, Inf. Sci., 180, 341-353, 2010
[16] Kong, Q. Z.; Zhang, X. W.; Xu, W. H., A novel granular computing model based on three-way decision, Int. J. Approx. Reason., 144, 92-112, 2022 · Zbl 07512026
[17] Yao, Y. Y., Three-way conflict analysis: reformulations and extensions of the Pawlak model, Knowl.-Based Syst., 180, 26-37, 2019
[18] Srishti, V.; Seba, S., Sentiment cognition from words shortlisted by fuzzy entropy, IEEE Trans. Cogn. Dev. Syst., 12, 541-550, 2020
[19] Hu, M.; Chen, Y. T.; Chen, D. G., Attribute reduction based on neighborhood constrained fuzzy rough sets, Knowl.-Based Syst., 274, Article 110632 pp., 2023
[20] Wang, P.; He, J. L.; Li, Z. W., Attribute reduction for hybrid data based on fuzzy rough iterative computation model, Inf. Sci., 632, 555-575, 2023 · Zbl 07829679
[21] Qian, W. B.; Yu, S. D.; Yang, J., Multi-label feature selection based on information entropy fusion in multi-source decision system, Evol. Intell., 13, 255-268, 2020
[22] Aremu, O. O.; Cody, R. A.; Hyland-Wood, D., A relative entropy based feature selection framework for asset data in predictive maintenance, Comput. Ind. Eng., 145, Article 106536 pp., 2020
[23] Sun, L., Feature selection using fuzzy neighborhood entropy-based uncertainty measures for fuzzy data, IEEE Trans. Fuzzy Syst., 29, 19-33, 2021
[24] Xu, W. H.; Guo, D. D.; Mi, J. S., Two-way concept-cognitive learning via concept movement viewpoint, IEEE Trans. Neural Netw. Learn. Syst., 34, 10, 6798-6812, 2023
[25] Xu, W. H.; Guo, D. D.; Qian, Y. H., Two-way concept-cognitive learning method: a fuzzy-based progressive learning, IEEE Trans. Fuzzy Syst., 31, 1885-1899, 2023
[26] Xu, W. H.; Pan, Y. Z.; Chen, X. W., A novel dynamic fusion approach using information entropy for interval-valued ordered datasets, IEEE Trans. Big Data, 9, 845-859, 2023
[27] Xu, W. H.; Yuan, K. H.; Li, W. T., An emerging fuzzy feature selection method using composite entropy-based uncertainty measure and data distribution, IEEE Trans. Emerg. Top. Comput. Intell., 7, 76-88, 2022
[28] Sang, B. B.; Chen, H. M.; Yang, L., Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets, IEEE Trans. Fuzzy Syst., 30, 1683-1697, 2022
[29] Ding, W. P.; Qin, T. Z.; Shen, X. J., Parallel incremental efficient attribute reduction algorithm based on attribute tree, Inf. Sci., 610, 1102-1121, 2022
[30] Yang, Y.; Chen, Z. R.; Liang, Z., Attribute reduction for massive data based on rough set theory and MapReduce, (5th International Conference on Rough Set and Knowledge Technology (RSKT), Beijing, China, 2010)
[31] Chen, H. M.; Li, T. R.; Cai, Y., Parallel attribute reduction in dominance-based neighborhood rough set, Inf. Sci., 373, 351-368, 2016 · Zbl 1429.68280
[32] Luo, C.; Wang, S. Z.; Li, T. R., Spark rough hypercuboid approach for scalable feature selection, IEEE Trans. Knowl. Data Eng., 35, 3130-3144, 2023
[33] Dai, G. Y.; Jiang, T. B.; Mu, Y. L., A novel rough sets positive region based parallel multi-reduction algorithm, (4th International Conference on Advanced Intelligent Systems and Informatics (AISI), Cairo, Egypt, 2018)
[34] Zhang, X. Y.; Hou, J. L., A novel rough set method based on adjustable-perspective dominance relations in intuitionistic fuzzy ordered decision tables, Int. J. Approx. Reason., 154, 218-241, 2023 · Zbl 07698078
[35] Xia, S. Y.; Liu, Y. S.; Ding, X., Granular ball computing classifiers for efficient, scalable and robust learning, Inf. Sci., 483, 136-152, 2019
[36] Pawlak, Z., Information systems theoretical foundations, Inf. Syst., 6, 205-218, 1981 · Zbl 0462.68078
[37] Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning About Data, 1991, Kluwer Academic Publishers: Boston · Zbl 0758.68054
[38] Wang, C. Z.; Wang, Y.; Shao, M. W., Fuzzy rough attribute reduction for categorical data, IEEE Trans. Fuzzy Syst., 28, 818-830, 2020
[39] Xu, W. H.; Guo, Y. T., Generalized multigranulation double-quantitative decision-theoretic rough set, Knowl.-Based Syst., 105, 190-205, 2016
[40] Kong, Q. Z.; Xu, W. H.; Zhang, D. X., A comparative study of different granular structures induced from the information systems, Soft Comput., 26, 105-122, 2022
[41] Yang, X. L.; Chen, H. M.; Wang, H., Feature selection with local density-based fuzzy rough set model for noisy data, IEEE Trans. Fuzzy Syst., 31, 1614-1627, 2023
[42] Pedrycz, W., Granular Computing: Analysis and Design of Intelligent Systems, 2013, CRC Press
[43] Zhang, W. X.; Wu, W. Z.; Liang, J. Y., Rough Set Theory and Method, 2001, Science Press: Beijing
[44] Wang, Z. H.; Chen, H. M.; Yang, X. L., Fuzzy rough dimensionality reduction: a feature set partition-based approach, Inf. Sci., 2023 · Zbl 07837671
[45] Guo, Z. J.; Shen, Y.; Yang, T., Semi-supervised feature selection based on fuzzy related family, Inf. Sci., 652, 2023
[46] Zhang, H. Q.; Yu, X.; Li, T. R., Noise-aware and correlation analysis-based for fuzzy-rough feature selection, Inf. Sci., 659, Article 120047 pp., 2024 · Zbl 07806104