THE EFFECT OF QUERY EXPANSION ON TEXT SIMILARITY DETECTION USING COSINE SIMILARITY
- Cosine similarity measures the similarity of two texts based on the words they share exactly. If a word in the test text does not also appear in the source text, it has no match in the term list and does not contribute to the score. This study examines the effect of query expansion using a thesaurus, a technique for improving term-list matching. Cosine similarity with and without query expansion was each tested on 7 source documents and 21 comparison documents. The evaluation shows that cosine similarity with query expansion detects text similarity better than cosine similarity without it: the reported percentage value was 46.90% on data without query expansion, compared with 43.11% for window size 2, 42.90% for window size 3, and 42.59% for window size 4. Although query expansion increases overall computation time, the additional terms it produces improve similarity detection.
Keywords: Query Expansion, Thesaurus, Cosine Similarity, Window Size.
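The two ideas in the abstract, exact-word cosine similarity and thesaurus-based query expansion, can be illustrated with a minimal Python sketch. It assumes raw term-frequency vectors over whitespace tokens and a small hypothetical thesaurus; the names `THESAURUS`, `tokenize`, `expand`, and `cosine_similarity` are illustrative and not taken from the study. The window-size parameter is not modeled, because the abstract does not define it.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus (illustrative only): word -> list of synonyms.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid"],
}


def tokenize(text):
    """Lowercase and split on whitespace (no stopword removal or stemming)."""
    return text.lower().split()


def expand(tokens, thesaurus):
    """Query expansion: append every thesaurus synonym of each token."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(thesaurus.get(tok, []))
    return expanded


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two term-frequency vectors."""
    tf_a, tf_b = Counter(tokens_a), Counter(tokens_b)
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    source = tokenize("the automobile is quick")
    test = tokenize("the car is fast")
    plain = cosine_similarity(test, source)
    expanded = cosine_similarity(expand(test, THESAURUS), source)
    print(round(plain, 4))     # 0.5
    print(round(expanded, 4))  # 0.7071
```

Without expansion only "the" and "is" match, giving 0.5. Expanding "car" and "fast" lets "automobile" and "quick" match, which raises the score to about 0.7071. The added synonyms also enlarge the test vector's norm, so in this sketch expansion lowers the score when the synonyms do not appear in the source text.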