Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy

Mustapa, Mahmud and Rahmah, Ummiati and Cakranegara, Pandu Adi and Firdaus, Winci and Pratama, Dendi and Rahim, Robbi (2023) Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy. Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications, 14 (1). pp. 50-59. ISSN 2093-5374

Text (Article_JoWUA_Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy)
2023_ Mustapa_Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy.pdf (469kB)

Text (Turnitin_JoWUA_Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy)
TURNITIN_Mustapa_2023_JoWUA_ Implementation of Feature Selection and Data Split using brute force to improve accuracy.pdf (531kB)

Text (Correspondence_JoWUA_Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy)
Koresponding_Mustapa_2023_JoWUA_Implementation of Feature Selection and Data Split.pdf (337kB)

Text (Peer Review_JoWUA_Implementation of Feature Selection and Data Split using Brute Force to Improve Accuracy)
Jurnal Internasional-Implementation of Feature Selection and Data Split using Brute Force to Improve.pdf (952kB)
Official URL: http://jowua.com/article/2023.I1.004/70780/

Abstract

This study classifies data using feature selection combined with a brute-force search. The dataset contains irrelevant attributes, so feature selection affects both computing time and the resulting classification model. Testing used the YouTube Spam Collection from the UCI repository, which comprises five datasets totalling 1,956 real messages taken from five popular videos (Shakira, Katy Perry, Psy, Eminem, and LMFAO). The feature selection technique weights attributes by information gain to find the best ones. The dataset is then split into training and testing parts, using 70:30 and 30:70 ratios. The approach is compared with C4.5 and Naive Bayes. For the five videos (Psy, Katy Perry, LMFAO, Eminem, and Shakira), FS+BF+C4.5 achieves accuracies of 69.90%, 63.37%, 98.32%, 50.89%, and 91.75%. Standard C4.5 achieves 66.99%, 59.41%, 95.80%, 50.89%, and 88.66%. Naive Bayes achieves 61.17%, 51.49%, 89.08%, 50.00%, and 79.38%. FS+BF+C4.5 obtains an overall average accuracy of 74.85%, which is 2.50 and 8.63 percentage points higher than C4.5 (72.35%) and Naive Bayes (66.22%), respectively. Using feature selection and brute force with C4.5 can therefore reduce classification error compared with standard C4.5 and Naive Bayes.
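
To make the information gain weighting concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes each attribute is the presence or absence of a word in a comment; the toy comments, the labels, and the helper names (entropy, information_gain, rank_tokens) are hypothetical and are not taken from the paper or from the YouTube Spam Collection.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_tokens(comments, labels):
    """Weight each token by the information gain of its presence or absence."""
    tokenized = [set(c.lower().split()) for c in comments]
    vocabulary = sorted(set().union(*tokenized))
    weights = {
        token: information_gain([token in doc for doc in tokenized], labels)
        for token in vocabulary
    }
    # Highest weight first; ties are broken alphabetically for a stable order.
    return sorted(weights.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data, for illustration only.
comments = [
    "check my channel please",
    "subscribe to my channel",
    "love this song",
    "great song and video",
]
labels = ["spam", "spam", "ham", "ham"]

ranking = rank_tokens(comments, labels)
for token, weight in ranking:
    print(f"{token}: {weight:.4f}")
```

In this toy data the tokens that separate the two classes perfectly receive the highest weight, while tokens that occur in a single comment receive a much lower one. A selection step would keep only the top-weighted attributes before training a classifier such as C4.5.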

Item Type: Article
Subjects: Faculty of Engineering (FAKULTAS TEKNIK) > Electronics Engineering Education (Pendidikan Teknik Elektronika)
Divisions: Faculty of Engineering (FAKULTAS TEKNIK)
Depositing User: Dr. Hendra Jaya
Date Deposited: 28 Jun 2023 02:12
Last Modified: 29 Jun 2023 04:33
URI: http://eprints.unm.ac.id/id/eprint/32174
