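EKSTRAKSI INFORMASI KESEHATAN MASYARAKAT DARI TWEET BERBAHASA INDONESIA BERBASIS KLASIFIKASI DENGAN ALGORITMA NAIVE BAYES (Classification-Based Extraction of Public Health Information from Indonesian-Language Tweets Using the Naive Bayes Algorithm)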
Main Author: | Rosikin, Khoirir |
---|---|
Format: | Thesis NonPeerReviewed Book |
Language: | eng |
Published: | 2018 |
Subjects: | |
Online Access: | http://eprints.umm.ac.id/40233/1/Pendahuluan.pdf http://eprints.umm.ac.id/40233/2/BAB%201.pdf http://eprints.umm.ac.id/40233/3/BAB%202.pdf http://eprints.umm.ac.id/40233/4/BAB%203.pdf http://eprints.umm.ac.id/40233/5/BAB%204.pdf http://eprints.umm.ac.id/40233/6/BAB%205.pdf http://eprints.umm.ac.id/40233/ |
Abstract:
- Health is a primary human need. Indonesia faces a growing burden of both infectious and non-communicable diseases, and addressing it requires preventive measures. One way to prevent disease is to be informed about it, including its causes and effects. Such information can be obtained in various ways, one of which is social media, especially Twitter. Twitter is used because the sheer number of tweets it produces amounts to a big data phenomenon. This research therefore applies information extraction, an application of data mining (specifically text mining) that obtains information from a large collection of data. The information targeted here is the disease, its effects, and its causes. The research takes a classification-based information extraction approach with 7 feature sets and a single classification model, Naive Bayes. The dataset produced by feature extraction is imbalanced, so a resample filter is applied to the data. Testing is done in 2 ways: model testing with 10-fold cross-validation and classification testing on 100 test data items. The model test yields an accuracy of 77.27% and the classification test an accuracy of 74.16%.
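
The classification-based approach in the abstract can be illustrated with a minimal sketch, which is not the author's implementation. It assumes each token is described by two hypothetical features (the token itself and the preceding word) and labeled as disease, cause, or effect. The toy tokens, the labels, and the names `NaiveBayes` and `cross_validate` are all assumptions made for illustration. The thesis uses 7 feature sets that this record does not list, and the resampling step is omitted here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature_index][value] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.feature_counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def predict(self, row):
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for i, value in enumerate(row):
                count = self.feature_counts[label][i][value]
                # The extra +1 in the denominator reserves mass for unseen values.
                score += math.log((count + 1) / (n + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cross_validate(rows, labels, k=10):
    """Pooled accuracy over k folds; fold f holds out items with index % k == f."""
    data = list(zip(rows, labels))
    correct = 0
    for f in range(k):
        train = [pair for j, pair in enumerate(data) if j % k != f]
        test = [pair for j, pair in enumerate(data) if j % k == f]
        model = NaiveBayes().fit([r for r, _ in train], [y for _, y in train])
        correct += sum(model.predict(r) == y for r, y in test)
    return correct / len(data)


# Hypothetical toy data: (token, preceding word) -> label.
rows = [
    ("dbd", "penyakit"), ("malaria", "penyakit"),
    ("nyamuk", "karena"), ("virus", "karena"),
    ("demam", "gejala"), ("mual", "gejala"),
]
labels = ["disease", "disease", "cause", "cause", "effect", "effect"]

model = NaiveBayes().fit(rows, labels)
print(model.predict(("tifus", "penyakit")))
print(cross_validate(rows, labels, k=3))
```

With only six toy examples the sketch uses `k=3`. The 10-fold setting reported in the abstract corresponds to `k=10` on a dataset with at least ten items.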