Clustering K-Means untuk Sistem Tanya Jawab Bahasa Indonesia Bidang Kesehatan [K-Means Clustering for an Indonesian-Language Question Answering System in the Health Domain]

Main Authors: Muliadi, Steven, Mawardi, Viny Christanti, Pragantha, Jeanny
Format: Article eJournal
Language: ind
Published: Fakultas Teknologi Informasi Universitas Tarumanagara, 2015
Online Access: http://fti.tarumanagara.ac.id/jurnal/index.php/JIKSI/article/view/254
Contents:
  • A Question Answering (QA) system answers questions using collections of unstructured text documents written in natural (human) language. In general, a QA system consists of four stages: question analysis, document selection, passage retrieval, and answer extraction. This study adds two processes: document clustering and passage clustering. K-Means is used for clustering. Naive Bayes classification is used for document or passage selection. Passage building is done with Dynamic Passage Partitioning. Document selection is done with Lucene. The experiments were conducted using 100 questions over 1000 Indonesian health documents. Test results show that the system without clustering achieves the best accuracy, 63%. The system produces its best result when using the 5 most relevant documents, the 5 highest-scoring passages, and the 10 answers with the closest distance.
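
The document clustering step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how documents are represented or which distance is used, so the TF-IDF weighting, the Euclidean distance, the first-k initialization, the function names (tfidf_vectors, sq_dist, kmeans), and the four toy documents below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build dense, unit-length TF-IDF vectors for whitespace-tokenized docs.

    Assumption: TF-IDF features; the abstract does not name the representation.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = [tf[t] * (math.log(n / df[t]) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=20):
    """Plain K-Means (Lloyd's algorithm).

    Assumption: centroids start at the first k points so the run is
    deterministic; random or k-means++ seeding is more common in practice.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus: two flu-related and two diabetes-related documents.
docs = [
    "demam batuk pilek flu",
    "diabetes gula darah insulin",
    "flu demam batuk obat",
    "insulin diabetes gula diet",
]
labels, centroids = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

On this toy corpus the two flu documents land in one cluster and the two diabetes documents in the other. Passage clustering could reuse the same functions with passages in place of whole documents.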