A Dataset-Driven Parameter Tuning Approach for Enhanced K-Nearest Neighbour Algorithm Performance
Main Authors: | Inyang, Udoinyang G. (Department of Computer Science, Faculty of Science, University of Uyo, Nigeria); Ijebu, Funebi F. (School of Computer Science and Technology, Harbin Institute of Technology, China); Osang, Francis B. (Department of Computer Science, National Open University of Nigeria, Abuja, Nigeria); Afoluronsho, Aderenle A. (Department of Computer Science, National Open University of Nigeria, Abuja, Nigeria); Udoh, Samuel S. (Department of Computer Science, Faculty of Science, University of Uyo, Nigeria); Eyoh, Imo J. (Department of Computer Science, Faculty of Science, University of Uyo, Nigeria) |
---|---|
Format: | Article info application/pdf eJournal |
Language: | eng |
Published: | International Journal on Advanced Science, Engineering and Information Technology, 2023 |
Subjects: | |
Online Access: | http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/16706 http://insightsociety.org/ojaseit/index.php/ijaseit/article/view/16706/pdf_2367 |
Contents:
- The number of neighbours (k) and the distance measure (DM) are widely tuned to improve kNN performance. This work investigates the joint effect of these parameters, in conjunction with dataset characteristics (DC), on kNN performance. Five distance measures (Euclidean, Chebyshev, Manhattan, Minkowski, and Filtered), eleven k values, and four DC were systematically selected for the parameter tuning experiments. Each experiment comprised 20 iterations with 10-fold cross-validation over thirty-three datasets randomly selected from the UCI repository. From the results, the average root mean squared error of kNN is significantly affected by the type of task (p9000, as optimal performance pattern for classification tasks. For regression problems, the experimental configuration should be 7000 ≤ SS ≤ 9000, 4 ≤ number of attributes ≤ 6, and DM = 'Filtered'. The type of task performed is the most influential determinant of kNN performance, followed by DM. The variation in kNN accuracy resulting from changes in k alone occurs only by chance, as it does not show any consistent pattern, while the joint effect of k with the other parameters yielded a statistically insignificant change in mean accuracy (p>0.5). In further work, the discovered patterns will serve as a standard reference for comparative analytics of kNN performance against other classification and regression algorithms.
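
The kind of experiment the abstract describes, a sweep over k values and distance measures scored by 10-fold cross-validation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`grid_search`, `cross_val_accuracy`, `make_toy_data`), the synthetic two-class dataset, the choice of p = 3 for the Minkowski distance, and the use of accuracy as the score are all assumptions made for the example. The 'Filtered' distance is omitted because the abstract does not define it, and the UCI datasets and the 20 repeated iterations are not reproduced here.

```python
import random
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Distance measures named in the abstract ('Filtered' is left out:
# its definition is not given there). p = 3 is an assumed value.
DISTANCES = {
    "euclidean": lambda a, b: minkowski(a, b, 2),
    "manhattan": lambda a, b: minkowski(a, b, 1),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
    "minkowski_p3": lambda a, b: minkowski(a, b, 3),
}


def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, dist, folds=10, seed=0):
    """Fraction of points classified correctly under k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]          # every `folds`-th shuffled index
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            if knn_predict(train_X, train_y, X[i], k, dist) == y[i]:
                correct += 1
    return correct / len(X)


def grid_search(X, y, k_values, distances=DISTANCES):
    """Score every (distance measure, k) pair; returns {(name, k): accuracy}."""
    return {
        (name, k): cross_val_accuracy(X, y, k, dist)
        for name, dist in distances.items()
        for k in k_values
    }


def make_toy_data(n_per_class=30, seed=1):
    """Hypothetical stand-in dataset: two Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    X, y = [], []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (5.0, 5.0)]):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    scores = grid_search(X, y, k_values=[1, 3, 5, 7])
    for (name, k), acc in sorted(scores.items()):
        print(f"{name:13s} k={k}  accuracy={acc:.3f}")
```

To study dataset characteristics as the abstract does, the same sweep would be repeated per dataset and the scores analysed jointly with properties such as the number of attributes; for regression tasks the majority vote would be replaced by the mean of the neighbours' targets and the score by root mean squared error.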