Semantic Clustering Dan Pemilihan Kalimat Representatif Untuk Peringkasan Multi Dokumen (Semantic Clustering and Representative Sentence Selection for Multi-Document Summarization)

Main Authors: Pasnur; Santika, Putu Praba; Syaifuddin, Gus Nanang
Format: Article info application/pdf Journal
Language: eng
Published: Fakultas Ilmu Komputer, Universitas Brawijaya, 2014
Online Access: http://jtiik.ub.ac.id/index.php/jtiik/article/view/117
http://jtiik.ub.ac.id/index.php/jtiik/article/view/117/pdf
Contents:
  • Abstract: Coverage and saliency are the main problems in multi-document summarization. A good summary should cover as many of the important (salient) concepts in the source documents as possible. This research aims to develop a new multi-document summarization method that uses semantic clustering and the selection of a representative sentence for each cluster. The proposed method builds on the principles of Latent Semantic Indexing (LSI) and Similarity Based Histogram Clustering (SHC) to form sentence clusters semantically, and combines the Sentence Information Density (SID) and Sentence Cluster Keyword (SCK) features to select each cluster's representative sentence. The method was tested on the Document Understanding Conference (DUC) 2004 Task 2 dataset, and the results were measured using Recall-Oriented Understudy for Gisting Evaluation (ROUGE). The proposed method achieved an average ROUGE-1 score of 0.395 and an average ROUGE-2 score of 0.106.
    Keywords: multi-document summarization, latent semantic indexing, similarity based histogram clustering, sentence information density, sentence cluster keyword
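For readers unfamiliar with the metric, the ROUGE-1 and ROUGE-2 figures in the abstract are n-gram recall scores: the share of reference-summary unigrams or bigrams that also appear in the generated summary. The sketch below is a minimal illustration of that idea, not the evaluation setup used in the article. The function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is plain whitespace splitting, and the stemming and stopword options of the official ROUGE toolkit are omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n=1):
    """Simplified ROUGE-N recall of one candidate summary against reference summaries.

    Assumption: lowercase whitespace tokenization, no stemming or stopword removal.
    Matches are clipped, so an n-gram counts at most as often as it occurs
    in the candidate.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    candidate = "the cat sat on the mat"
    references = ["the cat lay on the mat"]
    print(rouge_n_recall(candidate, references, n=1))  # 5 of 6 reference unigrams matched
    print(rouge_n_recall(candidate, references, n=2))  # 3 of 5 reference bigrams matched
```

In the toy example, five of the six reference unigrams and three of the five reference bigrams occur in the candidate, which gives recall values of about 0.83 and 0.6. Scores on real data such as DUC 2004 are averaged over many document sets and depend on the toolkit options used.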