An Examination of the Data Mining Methods Used to Predict PISA Achievement
Abstract
This study examines how data mining and machine learning approaches are used in education and how reliable and valid the results produced by these algorithms are. Students were classified as successful or unsuccessful relative to Turkey's PISA average, and several learning methods were used to predict which class each student would fall into in terms of science literacy; the reliability and validity criteria of the resulting predictions were then examined. In addition, all of the algorithms and methods available in the Weka program were compared under different conditions to determine in which situations each learning method is advantageous or disadvantageous. The best results in terms of the number of correctly classified instances, correct classification rate, kappa statistic, root mean squared error and root relative squared error were obtained with the Random Forest method, and Ridge logistic regression, Logistic Model and Hoeffding Tree were identified as the next most successful methods. When the whole data set was split into training and test sets without using cross-validation, the Logistic Model, Random Forest and Ridge Regression methods gave the lowest error values on test sets of different sizes, while Random Tree and J48 gave the highest. The error values obtained with Ridge Regression, Random Forest and Logistic Model were also highly consistent across test sets of different percentages. Finally, when the data set was not divided into training and test sets, that is, when a learning method was trained and tested on the same data, the Random Tree and J48 methods in particular reported correct classification rates higher than their real performance.
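The evaluation scheme described in the abstract (binary success labels, Weka-style summary metrics, and the contrast between resubstitution, percentage-split holdout and cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Weka's implementation or the study's pipeline. The data are synthetic and hypothetical (one score-like feature with 20% label noise). A 1-nearest-neighbour learner (`one_nn_predict`, hypothetical) stands in for a model that fits its training data very closely, the behaviour the abstract reports for Random Tree and J48. The root relative squared error is assumed to be measured against a baseline that always predicts the positive-class proportion; Weka derives its prior from the training data, so its figures can differ slightly. The cross-validation here is unstratified, whereas Weka stratifies folds by default.

```python
import math
import random


def evaluate(y_true, p_pos, threshold=0.5):
    """Binary metrics (1 = successful, 0 = unsuccessful); p_pos = P(class 1)."""
    n = len(y_true)
    y_pred = [1 if p >= threshold else 0 for p in p_pos]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / n

    # Cohen's kappa: observed agreement corrected for chance agreement.
    true_pos_rate = sum(y_true) / n
    pred_pos_rate = sum(y_pred) / n
    p_e = true_pos_rate * pred_pos_rate + (1 - true_pos_rate) * (1 - pred_pos_rate)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 0.0

    # With two classes, averaging the squared error over both class
    # probabilities reduces to (p - y)^2 per instance.
    rmse = math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, p_pos)) / n)
    # Assumed baseline: always predict the positive-class proportion.
    base_rmse = math.sqrt(sum((true_pos_rate - t) ** 2 for t in y_true) / n)
    rrse = 100 * rmse / base_rmse if base_rmse > 0 else float("nan")
    return {"correct": correct, "accuracy": accuracy, "kappa": kappa,
            "rmse": rmse, "rrse_percent": rrse}


def one_nn_predict(train_x, train_y, query):
    # Hypothetical stand-in for an overfitting learner: it memorises every
    # training instance, so it is perfect on data it has already seen.
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return float(train_y[best])


def make_students(n, seed=0):
    # Synthetic, hypothetical data: label 1 if the feature is above 0
    # (mimicking "above the national average"), with 20% of labels flipped.
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < 0.2:
            y = 1 - y
        xs.append(x)
        ys.append(y)
    return xs, ys


def holdout_vs_resubstitution(train_pct, n=400, seed=0):
    # Percentage split: the first train_pct% of instances train the model.
    xs, ys = make_students(n, seed)
    cut = int(n * train_pct / 100)
    tr_x, tr_y, te_x, te_y = xs[:cut], ys[:cut], xs[cut:], ys[cut:]
    resub = evaluate(tr_y, [one_nn_predict(tr_x, tr_y, x) for x in tr_x])
    holdout = evaluate(te_y, [one_nn_predict(tr_x, tr_y, x) for x in te_x])
    return resub, holdout


def cross_validate(xs, ys, k=10):
    # Plain k-fold CV; predictions from all folds are pooled, then scored once.
    n = len(xs)
    p_pos = [0.0] * n
    for fold in range(k):
        train_idx = [i for i in range(n) if i % k != fold]
        tr_x = [xs[i] for i in train_idx]
        tr_y = [ys[i] for i in train_idx]
        for i in range(fold, n, k):
            p_pos[i] = one_nn_predict(tr_x, tr_y, xs[i])
    return evaluate(ys, p_pos)


if __name__ == "__main__":
    for pct in (50, 66, 80):
        resub, holdout = holdout_vs_resubstitution(pct)
        print(pct, round(resub["accuracy"], 3), round(holdout["accuracy"], 3))
    print("10-fold CV:", cross_validate(*make_students(400)))
```

For each hypothetical training percentage, the script prints the resubstitution accuracy (always 1.0 for this memorising learner) next to the held-out accuracy, which is lower because of the label noise. The gap is the optimism the abstract attributes to evaluating a method on the same data it was trained on.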