Application of forward selection strategy using C4.5 algorithm to improve the accuracy of classification’s data set

Etika, Kartidarma and Pandu, Adi and Syafar, Faisal and Akbar, Iskandar and Arman, Paramansyah and Robbi, Rahim (2023) Application of forward selection strategy using C4.5 algorithm to improve the accuracy of classification’s data set. Journal of Population Therapeutics and Clinical Pharmacology, 30 (1). e14-e23. ISSN 1198-581X

Files:
Application of Forward Selection-JPTCP_ARTIKEL.pdf (Published Version, 1MB)
Peer Review Jurnal Internasional-Application of Forward Selection Strategy Using C4.5 Algorithm to I_a.pdf (Supplemental Material: peer review, 1MB)
Application of Forward Selection-JPTCP_TURNITIN.pdf (Supplemental Material: Turnitin report, 1MB)
Official URL: https://jptcp.com/index.php/jptcp/article/view/100...

Abstract

The purpose of this study is to improve the classification accuracy of the C4.5 algorithm using the forward selection technique. The dataset used is the Breast Cancer data set from the UCI Machine Learning Repository, which contains 286 records with nine attributes and one class (label). The proposed model was compared with two existing classification models (C4.5 and Naïve Bayes) using the RapidMiner program. The procedure consists of several stages. The first stage selects the dominant attributes using a feature selection technique (weight by information gain). The second stage applies forward selection to the result of the feature selection. Before processing, the dataset is split into training and testing subsets using ratios of 70:30, 80:20, and 90:10. The final stage evaluates the output. The experimental results demonstrate that the forward selection method using C4.5 (C4.5 + FS) outperforms the C4.5 and Naïve Bayes classification techniques. C4.5 + FS achieves an accuracy of 76.74% with a 70:30 split, 78.95% with an 80:20 split, and 78.57% with a 90:10 split, whereas C4.5 (70:30 split) achieves an accuracy of 65.12% and Naïve Bayes (70:30 split) achieves an accuracy of 85.55%. Compared with the standard classification algorithms (C4.5 and Naïve Bayes), the average accuracy increased by 12.97% and 8.32%, respectively. In terms of precision, recall, and F-measure, the forward selection strategy using C4.5 outperformed all other classification techniques, achieving 79.84%, 92.50%, and 85.55%, respectively. In addition, the average Area Under the Curve (AUC) increased from 0.628 to 0.732. Therefore, the forward selection strategy can be applied to the Breast Cancer data set to increase the classification accuracy of C4.5.
Keywords: Forward selection, data mining, classification, method selection, C4.5 method, breast cancer
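The two stages described in the abstract (weighting attributes by information gain, then forward selection around a classifier evaluated on a train/test split) can be illustrated with a minimal Python sketch. It is not the authors' RapidMiner process: it uses only the standard library, all function names are hypothetical, the data are a made-up toy set rather than the UCI Breast Cancer data, and, to keep it short, C4.5 is replaced by a simple majority-vote lookup classifier. The stopping rule (stop once no remaining attribute improves holdout accuracy) is also an assumption.

```python
import math
import random
from collections import Counter
from typing import Callable, Sequence

Row = dict[str, str]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: Sequence[Row], attr: str, label: str) -> float:
    """Class entropy minus the size-weighted entropy after splitting on attr."""
    groups: dict[str, list[str]] = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[label] for r in rows]) - remainder


def holdout_accuracy(rows: Sequence[Row], attrs: list[str], label: str,
                     train_ratio: float = 0.7, seed: int = 0) -> float:
    """Accuracy on a shuffled train/test split (train_ratio 0.7 -> 70:30).

    Hypothetical stand-in for C4.5: predict the majority training class for
    each combination of the selected attribute values, falling back to the
    overall majority class for combinations unseen in training.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    train, test = shuffled[:cut], shuffled[cut:]
    table: dict[tuple[str, ...], Counter] = {}
    for r in train:
        key = tuple(r[a] for a in attrs)
        table.setdefault(key, Counter())[r[label]] += 1
    fallback = Counter(r[label] for r in train).most_common(1)[0][0]
    hits = 0
    for r in test:
        counts = table.get(tuple(r[a] for a in attrs))
        prediction = counts.most_common(1)[0][0] if counts else fallback
        hits += prediction == r[label]
    return hits / len(test)


def forward_selection(candidates: list[str],
                      score: Callable[[list[str]], float]) -> tuple[list[str], float]:
    """Greedily add the attribute that most improves score; stop when none does."""
    selected: list[str] = []
    best = float("-inf")
    remaining = list(candidates)
    while remaining:
        scores = {a: score(selected + [a]) for a in remaining}
        top = max(remaining, key=lambda a: scores[a])  # ties -> earlier candidate
        if scores[top] <= best:
            break
        best = scores[top]
        selected.append(top)
        remaining.remove(top)
    return selected, best


# Hypothetical toy data (NOT the UCI Breast Cancer set): "a" determines the
# label, "b" is independent of it.
rows = [
    {"a": a, "b": b, "label": "yes" if a == "x" else "no"}
    for a in ("x", "y") for b in ("p", "q", "p", "q")
]

# Stage 1: rank attributes by information gain, highest first.
ranked = sorted(["a", "b"], key=lambda a: information_gain(rows, a, "label"),
                reverse=True)

# Stage 2: forward selection over the ranked attributes, for each split ratio.
for ratio in (0.7, 0.8, 0.9):
    chosen, acc = forward_selection(
        ranked, lambda attrs: holdout_accuracy(rows, attrs, "label", ratio))
    print(ratio, chosen, acc)
```

Replacing `holdout_accuracy` with a scorer built on an actual C4.5 (or similar decision tree) implementation would bring the sketch closer to the setup described in the abstract; the greedy selection loop itself would not need to change.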
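The reported average gain over C4.5 can be checked against the per-split accuracies listed in the abstract. This small Python sketch assumes that "average accuracy" means the unweighted mean of the three C4.5 + FS splits and that the reported gains are percentage-point differences; neither assumption is stated explicitly in the abstract.

```python
from statistics import mean

# Accuracies (%) reported in the abstract for C4.5 + FS at each split ratio.
c45_fs = {"70:30": 76.74, "80:20": 78.95, "90:10": 78.57}
c45_baseline = 65.12  # C4.5, 70:30 split, as reported

avg_fs = mean(c45_fs.values())
gain_over_c45 = avg_fs - c45_baseline
# Assumption: the 8.32 gain over Naive Bayes is also in percentage points.
implied_nb_avg = avg_fs - 8.32

print(f"C4.5 + FS average: {avg_fs:.2f}%")                    # 78.09%
print(f"Gain over C4.5: {gain_over_c45:.2f}")                 # 12.97
print(f"Implied Naive Bayes average: {implied_nb_avg:.2f}%")  # 69.77%
```

Under these assumptions the 12.97% figure matches the difference between the mean of the three C4.5 + FS splits and the 65.12% C4.5 baseline.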

Item Type: Article
Subjects: Faculty of Engineering > Electronics Engineering Education
Divisions: Faculty of Engineering
Depositing User: Faisal Syafar
Date Deposited: 27 Apr 2023 08:07
Last Modified: 27 Apr 2023 08:07
URI: http://eprints.unm.ac.id/id/eprint/27690
