Comparison of Synthetic Minority Oversampling Techniques (SMOTE) on Imbalanced Data: The Case of Stroke Data
Publisher
Fen Bilimleri Enstitüsü (Graduate School of Natural and Applied Sciences)
Abstract
Class imbalance remains one of the most significant challenges limiting the effectiveness of classification algorithms in data-driven applications. This study investigates how several oversampling techniques (SMOTE, Borderline-SMOTE, SVM-SMOTE, SMOTE-ENN, KMeans-SMOTE, SMOTE-Tomek, and ADASYN) affect the performance of classification models. Four classifiers were evaluated: Logistic Regression (LR), Random Forest (RF), Support Vector Machines (SVM), and XGBoost (XGB). Model performance was assessed with widely used evaluation metrics: Precision, Recall, F1-score, the ROC curve, and the AUC value. Each oversampling technique was compared in combination with each classification algorithm to determine the most effective combinations. In the final phase of the study, the single best-performing classifier was selected for each oversampling method, and these selections were then compared to identify the most successful pair overall. The experimental results show that combining the SMOTE-ENN oversampling technique with the RF classifier yields the highest performance across the considered evaluation metrics, making it the most effective pair for handling the imbalanced stroke data examined in this study.
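As a rough illustration of the idea shared by the SMOTE family named in the abstract, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority-class neighbours. It is a minimal, standard-library version of basic SMOTE only, not the code used in the study and not any of the variants (Borderline-SMOTE, SMOTE-ENN, and so on). The function name `smote`, its parameters, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Basic SMOTE sketch: create n_synthetic points, each lying on the
    line segment between a minority sample and one of its k nearest
    minority-class neighbours. (Hypothetical helper, illustration only.)"""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: 4 minority samples versus 12 majority samples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(3.0 + 0.1 * i, 3.0 - 0.1 * i) for i in range(12)]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the data used to compute Precision, Recall, F1-score, or AUC.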