The Reliability of Automated Scoring of Constructed-Response Items and Its Effect on Test Equating Errors
Abstract
Scoring constructed-response items in tests can be difficult, time-consuming, and costly. Advances in computer technology, however, have made automated scoring of constructed-response items possible. Yet applying automated scoring without investigating its validity, reliability, and effects on test equating can lead to serious problems. Accordingly, the aim of this study was to score the constructed-response items in mixed-format tests automatically and to investigate the effect of automated scoring on test equating and reliability. The data examined were the 8th-grade Turkish test data of the ABİDE study (Monitoring and Evaluation of Academic Skills) carried out by the Ministry of National Education in Turkey in 2016; the test forms contained common items. Support vector machines (SVM), logistic regression (LR), multinomial naive Bayes (MNB), long short-term memory (LSTM), and bidirectional long short-term memory (BLSTM) networks were selected as automated scoring methods. In the test equating process, methods based on Classical Test Theory (CTT) and Item Response Theory (IRT) were used. The results revealed that the automated scoring method most compatible with human raters was BLSTM: the scores obtained by BLSTM agreed well with the scores assigned by human raters. For most equating methods, the errors of the equating process based on automated scores were close to the errors of the equating process based on human-rater scores. It was concluded that automated scoring can be applied, since it is compatible with human raters and acceptable in terms of equating error.
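The abstract names SVM, LR, and MNB among the automated scoring methods but does not describe their implementation. As a minimal sketch, assuming scikit-learn classifiers over TF-IDF features (the feature extraction, cross-validation setup, and all names below are illustrative assumptions, not the paper's actual pipeline), the three classical methods could be compared against human scores like this:

```python
# Hypothetical sketch: scoring constructed responses with the three classical
# methods named in the abstract (SVM, LR, multinomial naive Bayes).
# TF-IDF features and 5-fold cross-validation are assumptions; the paper
# does not specify its feature extraction or evaluation design.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

models = {
    "SVM": LinearSVC(),
    "LR": LogisticRegression(max_iter=1000),
    "MNB": MultinomialNB(),
}

def evaluate(texts, human_scores):
    """Report each classifier's mean agreement with human rater scores."""
    for name, clf in models.items():
        pipe = Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
            ("clf", clf),
        ])
        acc = cross_val_score(pipe, texts, human_scores, cv=5).mean()
        print(f"{name}: mean CV accuracy = {acc:.3f}")
```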
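BLSTM is reported as the method most compatible with human raters. A minimal Keras sketch of such a scorer follows; the vocabulary size, layer widths, and number of score categories are illustrative assumptions, since the paper's architecture and hyperparameters are not given in the abstract:

```python
# Hypothetical sketch of a bidirectional LSTM scorer. All sizes below are
# assumed for illustration, not taken from the study.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
N_SCORES = 3          # assumed number of score categories (e.g., 0, 1, 2)

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),          # token embeddings
    layers.Bidirectional(layers.LSTM(64)),      # reads the response both ways
    layers.Dense(64, activation="relu"),
    layers.Dense(N_SCORES, activation="softmax"),  # one unit per score level
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

A bidirectional layer lets the network use context on both sides of each token, which is one plausible reason it could track human scoring of free-text responses more closely than the unidirectional LSTM.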
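For the CTT-based equating side, a minimal sketch of the basic linear (mean-sigma) equating function may help fix ideas; the study's actual common-item designs and IRT methods are richer, and this simple equivalent-groups form is only an assumed illustration:

```python
# Hypothetical sketch of linear equating under Classical Test Theory:
# e_Y(x) = (sigma_Y / sigma_X) * (x - mu_X) + mu_Y
# maps a form-X raw score onto the form-Y scale. This is the simplest CTT
# equating function, shown for illustration only.
import numpy as np

def linear_equate(x, scores_x, scores_y):
    """Map a form-X raw score x onto the form-Y scale."""
    mu_x, sd_x = np.mean(scores_x), np.std(scores_x, ddof=1)
    mu_y, sd_y = np.mean(scores_y), np.std(scores_y, ddof=1)
    return sd_y / sd_x * (x - mu_x) + mu_y
```

Comparing such equating functions (and their standard errors) computed once from human-rater scores and once from automated scores is the kind of check the abstract describes.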