Examining learning algorithms for text classification in digital libraries

Main Author: Fahmi, Ismail
Format: Thesis NonPeerReviewed application/pdf
Language: eng
Published: 2004
Subjects:
Online Access: http://eprints.rclis.org/9315/1/IsmailFahmi_Thesis_master.pdf
http://eprints.rclis.org/9315/
Contents:
  • Information presentation in a digital library plays an important role, especially in improving the usability of collections and helping users get started with a collection. One approach is to provide an overview through large topical category hierarchies associated with the documents of a collection. But as the amount of information grows, this manual classification becomes a new problem for users: navigating the hierarchy can be a time-consuming and frustrating process. In this master's thesis, we examine the performance of machine learning algorithms for automatic text classification. We examine three learning algorithms, namely ID3, Instance-Based Learning, and Naive Bayes, for classifying documents according to their category hierarchies. We focus on effectiveness measures such as recall, precision, the F1-measure, error, and the learning curve when learning a manually classified metadata collection from the Indonesian Digital Library Network (IndonesiaDLN), and we compare the results with an examination of the Reuters-21578 dataset. We summarize which algorithm is most suitable for the digital library collection and how the algorithms perform on these datasets.
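The abstract names Naive Bayes as one of the learners and precision, recall, F1 and error as the effectiveness measures. The following is only an illustration, not the thesis's implementation. It is a minimal Python sketch under several assumptions: documents are already tokenized, the categories are flat rather than hierarchical, and the model is a multinomial Naive Bayes with add-one (Laplace) smoothing. The thesis may use a different variant. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Train a multinomial Naive Bayes model with add-one smoothing.

    docs: list of token lists; labels: one (flat) category per document.
    Returns log priors, per-category token counts, per-category totals, vocabulary.
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # category -> token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    totals = {c: sum(word_counts[c].values()) for c in class_counts}
    return priors, word_counts, totals, vocab


def classify(model, tokens):
    """Return the category with the highest log posterior score."""
    priors, word_counts, totals, vocab = model
    best, best_score = None, -math.inf
    for c, log_prior in priors.items():
        score = log_prior
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[c][t] + 1) / (totals[c] + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


def precision_recall_f1(y_true, y_pred, positive):
    """Per-category precision, recall and F1 (harmonic mean of P and R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def error_rate(y_true, y_pred):
    """Fraction of documents assigned to the wrong category."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, not taken from IndonesiaDLN or Reuters-21578.
train_docs = [["rice", "irrigation", "farm"], ["fertilizer", "rice"],
              ["computer", "network"], ["internet", "computer", "data"]]
train_labels = ["agriculture", "agriculture", "computing", "computing"]
test_docs = [["rice", "farm"], ["network", "data"], ["computer", "fertilizer"]]
test_labels = ["agriculture", "computing", "agriculture"]

model = train_naive_bayes(train_docs, train_labels)
predicted = [classify(model, d) for d in test_docs]
print(predicted)
print(precision_recall_f1(test_labels, predicted, "agriculture"))
print(error_rate(test_labels, predicted))
```

Scores are summed in log space so that long documents do not underflow. The smoothing term keeps a single unseen word from zeroing out a category. Repeating the evaluation on growing slices of the training set would give a learning curve of the kind the abstract mentions.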