COMPARATIVE ANALYSIS OF THE C4.5 AND NAIVE BAYES ALGORITHMS FOR NEWS TEXT CLASSIFICATION
- Classification is a data mining technique used to predict the group membership of data instances. Text classification is a branch of classification that automatically assigns a set of documents to categories. The C4.5 and Naive Bayes algorithms are often compared in classification tasks because both achieve high accuracy, although this has generally been shown only on numeric datasets. In this study, the C4.5 and Naive Bayes algorithms are applied to text using pre-processing and word-weighting techniques to predict document classes, and their performance is then compared to determine whether they still perform well. In the C4.5 algorithm, the threshold, entropy, info, and gain values play an important role in building the decision tree. The prediction for each document depends on variations in the gain values and on the frequency of occurrence of each word in the dataset, which is the key to forming the tuples of the decision tree. In the Naive Bayes algorithm, predictions depend on the posterior value, which is obtained by multiplying all the word weights of each document and comparing the results against the training set. With 500 training text documents, the Naive Bayes algorithm achieved a high accuracy of 97.4% and an efficient computing time of 98.45 seconds.
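The abstract does not name the pre-processing steps or the word-weighting scheme, so the following is only a minimal sketch under assumptions: case folding, tokenizing on letters, and stopword removal as pre-processing, and TF-IDF (with idf = log(N / df)) as the word weight. The stopword list is hypothetical and deliberately tiny; stemming is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"dan", "yang", "di", "ke", "dari", "the", "of", "and"}

def preprocess(text):
    """Case folding, tokenizing on runs of letters, and stopword removal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def tf_idf(docs):
    """docs: list of token lists -> list of {word: tf * idf} dicts, idf = log(N / df)."""
    n = len(docs)
    # Document frequency: count each word at most once per document.
    df = Counter(w for doc in docs for w in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: count * math.log(n / df[w]) for w, count in tf.items()})
    return weights
```

A word that appears in every document gets weight 0, so it carries no information for either classifier.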
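To make the roles of entropy, info, gain, and threshold concrete, here is a minimal sketch of choosing a split threshold on one numeric word-weight feature. It is not the study's implementation: thresholds are taken as midpoints between consecutive distinct values (a simplification), and full C4.5 additionally normalizes gain by split information (gain ratio), which is omitted here. The feature values and class names in the test are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits of a list of class labels: -sum p * log2(p)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_after_split(values, labels, threshold):
    """Weighted entropy (Info) of the partitions value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    total = len(labels)
    return sum(len(part) / total * entropy(part) for part in (left, right) if part)

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain)."""
    base = entropy(labels)
    distinct = sorted(set(values))
    best = (None, 0.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        gain = base - info_after_split(values, labels, t)  # Gain = Info(D) - Info_split(D)
        if gain > best[1]:
            best = (t, gain)
    return best
```

For a word whose weight is low in one class and high in the other, the best threshold separates the classes perfectly and the gain equals the full entropy of the labels.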
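The posterior computation described for Naive Bayes can be sketched as follows, assuming a multinomial model with raw word counts as weights and add-one (Laplace) smoothing; the study's exact weighting and smoothing are not stated. The product of word probabilities is computed as a sum of logarithms, which gives the same ranking of classes while avoiding numeric underflow on long documents. The training documents in the test are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(docs, labels):
    """docs: list of token lists. Returns (log_prior, log_likelihood, vocab)."""
    vocab = {w for doc in docs for w in doc}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(word_counts[c].values())
        # Add-one smoothing so unseen (word, class) pairs never give probability 0.
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab}
    return log_prior, log_lik, vocab

def predict(doc, log_prior, log_lik, vocab):
    """Return the class with the highest log posterior (up to a constant)."""
    scores = {}
    for c in log_prior:
        # Summing logs is equivalent to multiplying the probabilities.
        scores[c] = log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab)
    return max(scores, key=scores.get)
```

Words never seen in training are skipped, since the model has no estimate for them.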