Manuscript received December 28, 2023; revised February 7, 2024; accepted March 18, 2024; published August 13, 2024
Abstract—Student dropout is a significant concern for universities. To address this issue effectively, institutions must accurately predict the likelihood that a student will drop out. Dropout prediction helps universities identify early signs of student difficulty and enables them to implement proactive measures to mitigate dropout rates. This paper presents a novel approach for selecting a classification algorithm to predict student dropout. Each university possesses its own academic dataset attributes, which can be leveraged to predict potential dropout cases. Our methodology begins with attribute selection and dataset preprocessing, followed by a comparative evaluation of classification algorithms based on priority performance metrics. The case study is conducted at Universitas Pendidikan Ganesha (Undiksha). Model selection was based on comparing the performance of the Naïve Bayesian, Decision Tree (DT), and K-Nearest Neighbors (KNN) classification algorithms. The dataset was collected from the Academic Information System of Undiksha and encompasses students who graduated or dropped out between 2013 and 2023. Because the dataset exhibits class imbalance, this research applied the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalance in this small dataset. Each classification algorithm was applied to both the original and the oversampled datasets. We chose Recall as the primary evaluation metric to prioritize ensuring that actual dropouts are not incorrectly predicted as graduates. The results demonstrate that the KNN classification algorithm, applied to the oversampled dataset, achieves the highest Recall of 93.5%, with a Precision of 94.1%, an F1-Score of 93.5%, and an AUC of 97.9%.
Keywords—drop out, oversampling, K-Nearest Neighbors (KNN), decision tree, Naïve Bayesian
Cite: I Ketut Resika Arthana, I Made Dendi Maysanjaya, Gede Aditra Pradnyana, and Gede Rasben Dantes, "Optimizing Dropout Prediction in University Using Oversampling Techniques for Imbalanced Datasets," International Journal of Information and Education Technology, vol. 14, no. 8, pp. 1052-1060, 2024.
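For illustration, the following is a minimal sketch of the workflow described in the abstract: oversampling the training data with SMOTE and comparing KNN, Decision Tree, and Naïve Bayesian classifiers with Recall as the primary metric. This is not the authors' code; it assumes scikit-learn and imbalanced-learn, and it uses a synthetic imbalanced dataset as a stand-in for the Undiksha academic dataset, which is not reproduced here.

```python
# Hypothetical sketch of the described pipeline (not the authors' implementation).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, precision_score, f1_score, roc_auc_score

# Synthetic stand-in data: roughly 10% positive class ("dropout") to mimic imbalance.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample only the training split so the test set remains untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Recall is reported first, reflecting the paper's priority of not
# misclassifying actual dropouts as graduates.
for name, model in models.items():
    model.fit(X_res, y_res)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    print(f"{name:14s} Recall={recall_score(y_test, y_pred):.3f} "
          f"Precision={precision_score(y_test, y_pred):.3f} "
          f"F1={f1_score(y_test, y_pred):.3f} "
          f"AUC={roc_auc_score(y_test, y_prob):.3f}")
```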