Öğrencilerin PISA Matematik Başarılarının Yordanmasında Veri Madenciliği Yöntemlerinin Karşılaştırılması (Comparison of Data Mining Methods in Predicting Students' PISA Mathematics Achievement)

Date
2018-01-31
Author
Koyuncu, İlhan
Abstract
The purpose of this study is to examine how Naive Bayes, nearest neighbor, artificial neural networks, and logistic regression analysis perform, as a function of sample size and test-data ratio, in classifying students who participated in the PISA (2012) study according to their mathematics performance. The population is the 15-year-old students who participated in the PISA (2012) study. The target population is 62,728 students from OECD countries who participated in the study and have no missing data for the relevant variables. A total of 180 datasets were created by sampling from the target population at sample sizes of 500 (100 datasets), 1000 (50 datasets), and 5000 (30 datasets) students. The performance of each algorithm was tested using 11%, 22%, 33%, 44%, and 55% of each dataset as test data. The extent to which the assumptions of the univariate and multivariate analyses were satisfied was checked. For each dataset, 100 analyses were performed, each with a randomly selected test sample. Accuracy rates and their standard deviations, Kappa values, and the area under the ROC curve were used as evaluation criteria. For each dataset, the methods' mean accuracy rates and their standard errors were compared statistically. According to the results, the classification performance of the methods increased as the sample size increased, whereas increasing the test-data ratio affected the methods differently. The Naive Bayes method performed well even in small samples, ran very quickly, and was not affected by changes in the test-data ratio. Logistic regression analysis was the most effective method in large samples but performed poorly in small samples. The neural network method showed a similar tendency, but its overall performance was lower than that of Naive Bayes and logistic regression. The nearest neighbor method had the lowest performance under all conditions.
In the conclusions and suggestions part of the study, the findings are discussed in detail and some suggestions for theory and practice are made.
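The evaluation procedure described in the abstract (repeated random test samples at a fixed test-data ratio, summarized by mean accuracy, its standard deviation, and Kappa) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the study: the function names are hypothetical, the data are synthetic rather than PISA data, a plain 1-nearest-neighbor rule stands in for the classifiers compared, and the area under the ROC curve is left out for brevity.

```python
import random
from statistics import mean, stdev

def accuracy(y_true, y_pred):
    """Share of cases whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    labels = set(y_true) | set(y_pred)
    p_e = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is this sketch's convention
    return (p_o - p_e) / (1.0 - p_e)

def holdout_split(n, test_ratio, rng):
    """Shuffle indices 0..n-1 and return (train_indices, test_indices)."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]

def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training point (1-NN)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]

def repeated_holdout(X, y, test_ratio, repeats=100, seed=0):
    """Return (mean accuracy, SD of accuracy, mean kappa) over random splits."""
    rng = random.Random(seed)
    accs, kappas = [], []
    for _ in range(repeats):
        train_idx, test_idx = holdout_split(len(X), test_ratio, rng)
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        y_true = [y[i] for i in test_idx]
        y_pred = [nearest_neighbor_predict(train_X, train_y, X[i]) for i in test_idx]
        accs.append(accuracy(y_true, y_pred))
        kappas.append(cohen_kappa(y_true, y_pred))
    return mean(accs), stdev(accs), mean(kappas)

def make_synthetic(n, seed=0):
    """Hypothetical two-class data: Gaussian clusters around (0,0) and (4,4)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 4.0 * label
        X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic(200)
    for ratio in (0.11, 0.22, 0.33, 0.44, 0.55):
        m, s, k = repeated_holdout(X, y, ratio, repeats=20)
        print(f"test ratio {ratio:.2f}: accuracy {m:.3f} (SD {s:.3f}), kappa {k:.3f}")
```

Swapping `nearest_neighbor_predict` for another classifier and looping over several sampled datasets per sample size would mirror the structure of the design described above, although the study's actual software and settings are not stated in this record.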