Manuscript received December 28, 2023; revised February 7, 2024; accepted March 18, 2024; published August 13, 2024
Abstract—Student dropout is a significant concern for universities. To address this issue effectively, institutions must accurately predict the likelihood that a student will drop out. Dropout prediction helps universities identify early signs of student difficulty and enables them to implement proactive measures to mitigate dropout rates. This paper presents a novel approach for selecting a classification algorithm to predict student dropout. Each university possesses its own academic dataset attributes, which can be leveraged to predict potential dropout cases. Our methodology begins with attribute selection and dataset preprocessing, followed by a comparative evaluation of classification algorithms based on priority performance metrics. The case study is conducted at Universitas Pendidikan Ganesha (Undiksha). Model selection was based on comparing the performance of the Naïve Bayesian, Decision Tree (DT), and K-Nearest Neighbors (KNN) classification algorithms. The dataset was collected from the Academic Information System of Undiksha and encompasses students who graduated or dropped out between 2013 and 2023. Because the dataset exhibits class imbalance, this research applied the Synthetic Minority Oversampling Technique (SMOTE) to address the imbalance in this small dataset. Each classification algorithm was applied to both the original and the oversampled datasets. We chose Recall as the primary evaluation metric to prioritize ensuring that actual dropouts are not incorrectly predicted as graduates. The results demonstrate that the KNN classification algorithm, applied to the oversampled dataset, achieves the highest Recall of 93.5%, with a Precision of 94.1%, an F1-Score of 93.5%, and an AUC of 97.9%.
Keywords—drop out, oversampling, K-Nearest Neighbors (KNN), decision tree, Naïve Bayesian
Cite: I Ketut Resika Arthana, I Made Dendi Maysanjaya, Gede Aditra Pradnyana, and Gede Rasben Dantes, "Optimizing Dropout Prediction in University Using Oversampling Techniques for Imbalanced Datasets," International Journal of Information and Education Technology, vol. 14, no. 8, pp. 1052-1060, 2024.
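For illustration, the following is a minimal sketch of the workflow described in the abstract: oversampling the training data with SMOTE and comparing KNN, Decision Tree, and Naïve Bayesian classifiers with Recall as the primary metric. This is not the authors' code; it assumes scikit-learn and imbalanced-learn, and it uses a synthetic imbalanced dataset as a stand-in for the Undiksha academic dataset, which is not reproduced here.

```python
# Hypothetical sketch of the described pipeline (not the authors' implementation).
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, precision_score, f1_score, roc_auc_score

# Synthetic stand-in data: roughly 10% positive class ("dropout") to mimic imbalance.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Oversample only the training split so the test set remains untouched.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
}

# Recall is reported first, reflecting the paper's priority of not
# misclassifying actual dropouts as graduates.
for name, model in models.items():
    model.fit(X_res, y_res)
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]
    print(f"{name:14s} Recall={recall_score(y_test, y_pred):.3f} "
          f"Precision={precision_score(y_test, y_pred):.3f} "
          f"F1={f1_score(y_test, y_pred):.3f} "
          f"AUC={roc_auc_score(y_test, y_prob):.3f}")
```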