Comparison of Hierarchical Agglomerative Algorithms for Clustering Medical Documents

Authors: Fathi H. Saad, Omer I. E. Mohamed, and Rafa E. Al-Qutaish
Format: Journal Article
Published: 2021
Online Access: https://zenodo.org/record/4435458
Contents:
  • ABSTRACT: Medical documents store an extensive amount of data. Users need methods that help them find what they are looking for effectively, by organizing large amounts of information into a small number of meaningful clusters. Each cluster groups objects that are more similar to each other than to the members of any other cluster. The aim of a high-quality document clustering algorithm is therefore to find a set of clusters in which inter-cluster similarity is minimized and intra-cluster similarity is maximized. A key feature of many clustering algorithms is that they treat clustering as an optimization process, that is, maximizing or minimizing a particular criterion function defined over the whole clustering solution. The only real difference between agglomerative algorithms is how they choose which clusters to merge. The main purpose of this paper is to compare hierarchical agglomerative clustering algorithms that use different criterion functions, by evaluating the quality of the clusters they produce when clustering medical documents. Our experimental results showed that the agglomerative algorithm using I1 as its criterion function for choosing which clusters to merge produced better-quality clusters than the algorithms using the other criterion functions, in terms of entropy and purity as external measures.
  • KEYWORDS: Medical Documents, Clustering, Hierarchical Agglomerative Algorithms
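The abstract does not define I1, entropy, or purity. The following is a minimal pure-Python sketch of how a criterion-driven agglomerative step and the two external measures could fit together. It is not the authors' implementation and rests on several assumptions:

- I1 follows the definition used by the CLUTO clustering toolkit: I1 = Σ_r ‖D_r‖² / n_r. Here D_r is the sum of the unit-length document vectors in cluster r and n_r is its size. This equals Σ_r n_r times the average pairwise cosine similarity within cluster r.
- Entropy and purity use the common size-weighted definitions.
- The function names `agglomerate_i1` and `entropy_and_purity`, the toy term vectors, and the class labels are all hypothetical, for illustration only.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a term vector to unit Euclidean length (zero vectors left as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def sq_norm(vec):
    return sum(x * x for x in vec)

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def agglomerate_i1(docs, k):
    """Hypothetical greedy agglomerative clustering driven by an assumed I1.

    I1 = sum_r ||D_r||^2 / n_r, with D_r the composite (summed) vector of
    unit-length documents in cluster r. Merging never increases I1, so each
    step merges the pair whose merge gives the smallest decrease in I1.
    """
    # Each cluster is (composite vector D_r, list of member document indices).
    clusters = [(normalize(d), [i]) for i, d in enumerate(docs)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (da, ma), (db, mb) = clusters[a], clusters[b]
                # Change in I1 if clusters a and b were merged.
                change = (sq_norm(add(da, db)) / (len(ma) + len(mb))
                          - sq_norm(da) / len(ma) - sq_norm(db) / len(mb))
                if best is None or change > best[0]:
                    best = (change, a, b)
        _, a, b = best
        (da, ma), (db, mb) = clusters[a], clusters[b]
        del clusters[b]                      # b > a, so index a stays valid
        clusters[a] = (add(da, db), ma + mb)
    return [sorted(members) for _, members in clusters]

def entropy_and_purity(clusters, labels):
    """Size-weighted entropy (lower is better) and purity (higher is better)
    of a clustering, measured against known class labels."""
    n = len(labels)
    q = len(set(labels))                     # number of classes
    total_entropy, total_purity = 0.0, 0.0
    for members in clusters:
        counts = Counter(labels[i] for i in members)
        n_r = len(members)
        h = -sum((c / n_r) * math.log(c / n_r) for c in counts.values())
        if q > 1:
            h /= math.log(q)                 # normalize entropy to [0, 1]
        total_entropy += (n_r / n) * h
        total_purity += max(counts.values()) / n
    return total_entropy, total_purity

# Toy term-frequency vectors (hypothetical data, not from the paper).
docs = [
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 0, 4, 1],
    [0, 1, 3, 2],
]
labels = ["cardiology", "cardiology", "oncology", "oncology"]

clusters = agglomerate_i1(docs, 2)
print(clusters)                               # [[0, 1], [2, 3]]
print(entropy_and_purity(clusters, labels))   # (0.0, 1.0)
```

The pairwise search is O(n²) per merge, so it only suits small examples. In this sketch, swapping I1 for another criterion function changes only the `change` expression, which mirrors the abstract's point that agglomerative algorithms differ mainly in how they pick the clusters to merge.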