Modified adaptive synthetic SMOTE to improve classification performance in imbalanced datasets
Abstract
Oversampling during data preprocessing is commonly used to mitigate the imbalanced data problem in real research scenarios. This imbalance may reduce the ability of classification algorithms to recognize cases of interest, leading to positive samples being misclassified as negative or to the generation of false positives. The Synthetic Minority Oversampling Technique (SMOTE) is one existing oversampling technique, and Adaptive Synthetic (Adasyn) SMOTE is one of its many variants. Adasyn incorporates K-Nearest Neighbor (KNN); in this study, Manhattan distance is applied in the KNN computations. The modified Adasyn was evaluated on six imbalanced datasets in terms of overall accuracy, precision, recall, and F1 measure, using logistic regression as the classification algorithm. It led SMOTE and the original Adasyn in 66.67 percent of the total performance metric count (16 of 24): it led the accuracy and recall counts on 4 of 6 datasets each, the precision count on 3 of 6, and the F1 measure count on 5 of 6. These results indicate that the modified Adasyn can provide an efficient solution for decreasing misclassification on imbalanced datasets.
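As an illustration of the idea in the abstract, the following is a minimal sketch of Adasyn-style oversampling with Manhattan distance in the neighbor search. It is not the authors' implementation: the function names (`manhattan`, `k_nearest`, `adasyn_manhattan`), the defaults for `k` and `beta`, and the choice to use Manhattan distance in both neighbor searches are assumptions made for this example. It follows the standard Adasyn outline, in which minority points with more majority-class neighbors receive more synthetic samples.

```python
import random


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_nearest(point, candidates, k, skip_index=None):
    """Indices of the k candidates closest to `point` by Manhattan distance."""
    order = sorted(
        (i for i in range(len(candidates)) if i != skip_index),
        key=lambda i: manhattan(point, candidates[i]),
    )
    return order[:k]


def adasyn_manhattan(X, y, minority_label=1, k=5, beta=1.0, seed=0):
    """Return a list of synthetic minority samples (hypothetical helper)."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    n_majority = len(y) - len(minority)
    # Number of synthetic samples needed to reach the balance level `beta`.
    total_needed = int((n_majority - len(minority)) * beta)

    # For each minority point: fraction of majority points among its
    # nearest neighbors in the full dataset (higher means harder to learn).
    ratios = []
    for i in minority:
        nbrs = k_nearest(X[i], X, k, skip_index=i)
        n_major_nbrs = sum(1 for j in nbrs if y[j] != minority_label)
        ratios.append(n_major_nbrs / len(nbrs) if nbrs else 0.0)
    ratio_sum = sum(ratios)
    if ratio_sum == 0:
        return []  # no minority point has a majority-class neighbor

    # Generate samples by interpolating toward nearby minority points.
    min_points = [X[i] for i in minority]
    synthetic = []
    for pos, r in enumerate(ratios):
        n_new = round(r / ratio_sum * total_needed)
        nbrs = k_nearest(min_points[pos], min_points, k, skip_index=pos)
        if not nbrs:
            continue  # a lone minority point has nothing to interpolate with
        for _ in range(n_new):
            other = min_points[rng.choice(nbrs)]
            lam = rng.random()  # interpolation weight in [0, 1)
            synthetic.append(
                [a + lam * (b - a) for a, b in zip(min_points[pos], other)]
            )
    return synthetic
```

The synthetic samples would be appended to the training data with the minority label before fitting a classifier such as logistic regression. Swapping `manhattan` for a Euclidean distance function would give a baseline closer to the original Adasyn for comparison.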
Recommended Citation
Gameng, H. A., Gerardo, B. B., & Medina, R. P. (2019). Modified adaptive synthetic SMOTE to improve classification performance in imbalanced datasets. In 2019 IEEE 6th International Conference on Engineering Technologies and Applied Sciences (ICETAS) (pp. 1-5). Kuala Lumpur, Malaysia: IEEE.
Type
Conference paper
ISBN
978-1-7281-4082-7
Keywords
Subject
Collections
- Conference Papers [14]
Related items
Showing items related by title, author, creator and subject.
- Differentiation between organic and non-organic green onions using image classification with hyperparameter tuning
  Dela Cruz, Nerilou B. (2022-07)
  Differentiation between agricultural organic and non-organic crops involves professional laboratory techniques using expensive devices. This research domain requires a real-world dataset (RWD) which is limited depending ...
- Plant leaf detection and disease recognition using deep learning
  Militante, Sammy V.; Gerardo, Bobby D.; Dionisio, Nanette V. (Institute of Electrical and Electronics Engineers Inc., 2019-12)
  The latest improvements in computer vision formulated through deep learning have paved the method for how to detect and diagnose diseases in plants by using a camera to capture images as basis for recognizing several types ...
- A modified adaptive synthetic SMOTE approach in graduation success rate classification
  Gameng, Hazel; Gerardo, Bobby; Medina, Ruji (World Academy of Research in Science and Engineering, 2019)
  In the real research situation, the oversampling method in data preprocessing is used to solve the problem in imbalanced data. This imbalance may lessen the capability of classification algorithms to identify instances of ...