Table of Contents:
  • Diabetes is a serious disease in which the sufferer has high blood sugar because the body cannot produce enough insulin to regulate glucose. It can cause complications and increase the risk of developing other conditions such as heart disease, kidney disease, and blindness. One of the best ways to fight the disease is early diagnosis. When many patient records are available, machine learning classification algorithms can play an important role in predicting whether a person has diabetes. The dataset used is the Diabetes UCI Dataset from Kaggle, which was collected through direct questionnaires from patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh, and approved by a doctor. It contains 520 records and 17 attributes. Several studies over the last few decades show that Artificial Neural Networks (ANN) are among the best algorithms for diabetes prediction. Extreme Gradient Boosting (XGBoost) is a popular machine learning algorithm for classification, so the writer wants to find out whether XGBoost can be used for diabetes prediction and how it compares with ANN. Models for both algorithms were trained with the same data split ratios: 80:20, 75:25, 70:30, 60:40, and 50:50. There are four ANN models, with 3, 4, 5, and 6 hidden layers, and two XGBoost models: the first with default parameters and the second with hyperparameter tuning. The accuracy, precision, recall, and F1 score of the models are compared to find out which performs better.
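The split ratios above translate into concrete record counts for the 520-record dataset. The following is a small sketch of that arithmetic, assuming the first number in each ratio is the training share; `split_sizes` is a hypothetical helper for illustration and is not code from the study.

```python
def split_sizes(n_records, train_pct):
    """Return (train, test) counts for a train_pct : (100 - train_pct) split."""
    n_train = n_records * train_pct // 100
    return n_train, n_records - n_train


N_RECORDS = 520                 # dataset size stated in the abstract
RATIOS = [80, 75, 70, 60, 50]   # assumed to be training percentages

# Map each ratio label, e.g. "80:20", to its (train, test) record counts.
sizes = {f"{p}:{100 - p}": split_sizes(N_RECORDS, p) for p in RATIOS}

for ratio, (n_train, n_test) in sizes.items():
    print(f"{ratio} -> {n_train} train, {n_test} test")
```

With the smallest test share (80:20) only 104 records are held out, so a single misclassified record shifts test accuracy by roughly one percentage point.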
XGBoost generally achieved better performance, although the third ANN model achieved the highest score at the 80:20 ratio. At 75:25, XGBoost with hyperparameter tuning achieved the highest score, while XGBoost with default parameters scored the same as the third ANN model. At 70:30, the third ANN model and both XGBoost models had the same score, which was the highest across all ratios. At 60:40, the first through third ANN models and XGBoost with default parameters had the same accuracy, but the third ANN model had the highest recall and lower precision than the XGBoost models. At 50:50, the second XGBoost model (with hyperparameter tuning) had the best overall performance of all the models.
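Two models can share the same accuracy while differing in precision and recall, as reported for the 60:40 ratio. The sketch below computes the four metrics from scratch to show how that can happen. The labels and predictions are made-up toy data, not results from the study, and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy data (illustrative only): 4 positive and 4 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
pred_a = [1, 1, 1, 1, 1, 1, 0, 0]   # finds every positive, 2 false alarms
pred_b = [1, 1, 0, 0, 0, 0, 0, 0]   # no false alarms, misses 2 positives

model_a = binary_metrics(y_true, pred_a)
model_b = binary_metrics(y_true, pred_b)
print(model_a)
print(model_b)
```

Both toy models classify 6 of 8 cases correctly, but the first has higher recall and the second has higher precision. For a screening task such as diabetes prediction, a missed positive case is often considered the more costly error, which is one reason to report recall alongside accuracy.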