Evaluation of the Performance of Classification Methods with Different Feature Selection Methods on Microarray Gene Expression Data
Abstract
Bioinformatics is an interdisciplinary field that combines statistics, biology, computer science, mathematics, and genetics; bioinformatic analyses can reveal which abnormalities cause which diseases. In cancer, diagnosis from microarray gene expression data, classification, and identification of the genes involved in the disease are of great importance for early diagnosis. In this thesis, microarray gene expression data for lung, kidney, lymphoma, cervical, prostate, breast, and leukemia cancers were studied. Because the data sets contain a large number of features, the varFilter, nsFilter, rf, lasso, rfe, and limma feature selection methods were applied. On the filtered data sets, classification models were built with Naive Bayes, Support Vector Machines, k-Nearest Neighbor, Artificial Neural Networks, and Deep Learning, a method that has gained popularity in recent years. Accuracy, sensitivity, specificity, and AUC were computed to determine which classification methods perform better under each feature selection method and to compare the performance of the resulting models. In general, classification models built after lasso and limma feature selection were more successful than those built after the other feature selection methods. Deep Learning was also generally more successful than the classical data mining classification methods. Deep learning models were additionally built without applying any feature selection, and their performance was compared with that of the deep learning models built after feature selection. Finally, the same steps were carried out on four simulated data sets. Similar results were obtained on the real and simulated data sets.
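The four evaluation measures named above can be illustrated with a minimal sketch. It assumes a binary problem with labels coded 0/1, both classes present, and a 0.5 decision threshold. The function names `binary_metrics` and `auc` and the toy data are hypothetical and are not taken from the thesis, which may have computed these measures differently.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from 0/1 labels.

    Assumes both classes occur in y_true (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = tumour, 0 = normal.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]  # hypothetical classifier scores
y_pred = [1 if s >= 0.5 else 0 for s in scores]    # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
print(metrics, auc_value)
```

Unlike the other three measures, AUC is computed from the classifier scores rather than the thresholded predictions, so it does not depend on the chosen cut-off.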