A modified adaptive synthetic SMOTE approach in graduation success rate classification
Abstract
In applied research, oversampling methods are used during data preprocessing to address the problem of imbalanced data. Class imbalance can weaken the ability of classification algorithms to identify instances of interest, leading to misclassification such as false positives. Imbalanced datasets arise in finance, health, education, and other fields; academic data such as graduation success rates in higher education are at times imbalanced. One of the established oversampling methods is the Synthetic Minority Oversampling Technique (SMOTE), with Adaptive Synthetic (Adasyn) SMOTE as one of its many variations. Adasyn embeds a K-Nearest Neighbors (KNN) calculation that uses Euclidean distance; in this study, Manhattan distance is used in the KNN calculation instead. The researchers gathered actual data from the open admission programs of Davao del Norte State College for training and testing, consisting of 14 features and 897 records. The modified Adasyn was tested on this imbalanced primary dataset on graduation success rate using logistic regression and random forest as the classification algorithms, and was evaluated in terms of overall accuracy, precision, recall, and F1 score. Results showed that the modified Adasyn outperformed both SMOTE and Adasyn on every performance metric, indicating that the modified Adasyn is reliable in reducing misclassification on the graduation success rate dataset.
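The core modification, substituting Manhattan distance for Euclidean distance in Adasyn's nearest-neighbor search, can be sketched with the imbalanced-learn library, which accepts a custom neighbors estimator in place of the default KNN. This is a minimal illustration, not the authors' implementation: the synthetic dataset below only mimics the paper's reported shape (897 records, 14 features), and the neighbor count and random seeds are assumed values.

```python
# A minimal sketch, assuming imbalanced-learn and scikit-learn are installed.
# The data here is synthetic; the paper's actual college dataset is not reproduced.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from imblearn.over_sampling import ADASYN

# Illustrative imbalanced dataset shaped like the paper's (897 rows, 14 features);
# the 85/15 class split is an assumption, not the study's actual imbalance ratio.
X, y = make_classification(n_samples=897, n_features=14,
                           weights=[0.85, 0.15], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Modified Adasyn: swap the default Euclidean KNN for a Manhattan-distance
# neighbors estimator. ADASYN's n_neighbors parameter accepts any
# KNeighborsMixin object, so the distance metric can be changed this way.
manhattan_knn = NearestNeighbors(n_neighbors=5, metric="manhattan")
adasyn_manhattan = ADASYN(n_neighbors=manhattan_knn, random_state=42)
X_res, y_res = adasyn_manhattan.fit_resample(X_train, y_train)

# Evaluate with the same two classifiers the study used; the report covers
# the paper's metrics (accuracy, precision, recall, F1 score).
for clf in (LogisticRegression(max_iter=1000),
            RandomForestClassifier(random_state=42)):
    clf.fit(X_res, y_res)
    print(type(clf).__name__)
    print(classification_report(y_test, clf.predict(X_test)))
```

Running the baseline comparison from the paper would amount to repeating the resampling step with plain `SMOTE()` and default `ADASYN()` from the same library and comparing the resulting reports.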
Recommended Citation
Gameng, H., Gerardo, B., & Medina, R. (2019). A modified adaptive synthetic SMOTE approach in graduation success rate classification.
Type
Article
ISSN
2278-3091
Keywords
Adaptive synthetic SMOTE; Classification; Graduate success rate; Manhattan distance; SMOTE; Adaptive synthetic; K-nearest neighbor; Classification algorithms; Random forest; F1 scores; Data preprocessing; Imbalanced datasets; Precision; Synthetic minority oversampling technique; Euclidean distance; Adasyn; Logistic regression; Recall