Categorical Variable Selection in Machine Learning
Abstract
Feature selection is a critical step in machine learning model development and is generally recommended before model training. By reducing model complexity, improving interpretability, and enhancing generalization ability, it has a significant impact on model performance. Given its importance, numerous feature selection methods have been developed over time, and a voting mechanism that combines several of these methods can provide a more reliable and comprehensive selection process.
This study investigates the significance of feature selection in machine learning models and the impact of different methods on model performance. Eight feature selection methods were employed to identify the most important features in the dataset, and their performance was evaluated using the F1-score: the chi-squared test, Fisher's exact test, information gain, backward elimination, forward selection, recursive feature elimination, Lasso (L1)-regularized logistic regression, and feature importance scores. Each method was applied independently to select the most relevant features and to improve the model's prediction accuracy.
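As an illustration, the sketch below shows how several of these selectors could be applied and compared with the F1-score using scikit-learn. The synthetic binarized data, the choice of k = 10 retained features, the use of mutual information as a stand-in for information gain, random-forest importances as the feature importance scores, and the downstream logistic regression classifier are all illustrative assumptions, not details taken from the thesis.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE, SelectKBest, chi2, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    # Synthetic 0/1 features stand in for one-hot encoded categorical data (assumption).
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X = (X > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    k = 10  # number of features each method keeps (illustrative choice)

    def top_k_mask(scores, k):
        # Boolean mask keeping the k highest-scoring features.
        mask = np.zeros(scores.shape[0], dtype=bool)
        mask[np.argsort(scores)[-k:]] = True
        return mask

    selections = {
        "chi2": SelectKBest(chi2, k=k).fit(X_train, y_train).get_support(),
        "info_gain": SelectKBest(mutual_info_classif, k=k).fit(X_train, y_train).get_support(),
        "rfe": RFE(LogisticRegression(max_iter=1000), n_features_to_select=k)
            .fit(X_train, y_train).get_support(),
        "lasso_logreg": top_k_mask(
            np.abs(LogisticRegression(penalty="l1", solver="liblinear")
                   .fit(X_train, y_train).coef_[0]), k),
        "feature_importance": top_k_mask(
            RandomForestClassifier(random_state=0).fit(X_train, y_train).feature_importances_, k),
    }

    # Train the same downstream classifier on each feature subset and compare F1-scores.
    for name, mask in selections.items():
        clf = LogisticRegression(max_iter=1000).fit(X_train[:, mask], y_train)
        print(f"{name}: F1 = {f1_score(y_test, clf.predict(X_test[:, mask])):.3f}")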
In this study, a voting mechanism was established by combining the most important features identified by the different methods, with the aim of creating a more reliable and comprehensive feature selection model. The findings indicate that the features selected by this voting approach yielded better results than any single method. The scope of the study is limited to categorical data; however, the study demonstrates that the method can be extended to continuous variables and mixed datasets. In addition, a framework was designed to evaluate the performance of these methods across different machine learning algorithms, opening the way for the investigation of further algorithms. The developed voting mechanism has an end-to-end design and can therefore be made available as a Python package.
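A minimal sketch of the voting idea, continuing from the previous sketch, is shown below. It assumes each selector returns a boolean mask over the same feature set and uses a simple majority threshold; the exact voting rule and evaluation protocol of the thesis are not reproduced here.

    def vote_features(masks, min_votes):
        """Return a boolean mask of features selected by at least `min_votes` methods."""
        votes = np.sum(list(masks.values()), axis=0)  # per-feature vote count
        return votes >= min_votes

    # Keep features chosen by a majority of the methods from the previous sketch,
    # then evaluate the same downstream classifier on the voted subset.
    voted = vote_features(selections, min_votes=len(selections) // 2 + 1)
    clf = LogisticRegression(max_iter=1000).fit(X_train[:, voted], y_train)
    print(f"voted subset: F1 = {f1_score(y_test, clf.predict(X_test[:, voted])):.3f}")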
In conclusion, this thesis highlights the significance of feature selection in machine learning models and demonstrates that combining different methods can yield higher F1-scores than relying on a single one. The approach provides a flexible and effective framework that can be adapted to the requirements of different applications and serves as a solid foundation for future research.