Kernel Eşitleme Ve Madde Tepki Kuramına Dayalı Eşitleme Yöntemlerinin Karşılaştırılması (A Comparison of Kernel Equating and Item Response Theory Based Equating Methods)

Akın Arıkan, Çiğdem

Doctoral thesis (5.115 MB)

Date

2017

Abstract

This study evaluated the performance of Item Response Theory (IRT) true-score equating (Haebara) and four Kernel equating methods (post-stratification equipercentile, post-stratification linear, chained equipercentile, and chained linear) under the common-item nonequivalent groups design. Performance was judged by equating error (the root-mean-square difference, RMSD) and the standard error of equating (SEE). To this end, the effects of sample size, group ability differences, common-item type, common-item rate, and the difficulty distribution of the common items were examined. The five equating methods were compared across 72 simulation conditions formed by crossing three sample sizes, two group ability distributions, two common-item types (internal and external), three common-item rates, and two difficulty distributions of the common items (mini and midi common tests), with test length fixed at 50 items. All analyses were conducted in R, with 100 replications per condition. For IRT true-score equating, the “ltm” package (Rizopoulos, 2015) was used to estimate item parameters, and the “plink” package (Weeks, 2010) was used to scale item parameters and to equate test scores; the “kequate” package (Andersson, Branberg & Wiberg, 2013) was used for the Kernel equating methods. Dichotomously scored test forms (common test and main test forms) were generated with the “irtoys” package (Partchev, 2016) according to the three-parameter logistic (3PL) model. The “lattice” package (Sarkar, 2017) was used to examine the joint effect of the study conditions on equating error and standard error. The results showed that sample size, ability distribution, common-item type, common-item rate, and the difficulty distribution of the common items, as well as their interactions, affected the performance of the equating methods.
However, the methods performed differently depending on these conditions. As sample size increased, both the equating error and the standard error of equating decreased, although sample size had a greater effect on the standard error than on the total error. When the group ability distributions differed, errors increased for all equating methods, but the size of the increase varied by method. The chained equating methods were less affected by differences between the groups' ability distributions. Furthermore, when ability distributions differed, the Kernel methods showed their largest errors at extreme scores, whereas IRT true-score equating showed its largest errors at middle and high scores. The external common test yielded lower standard and total errors than the internal common test. As the rate of common items increased, both standard and total errors decreased. In the internal common test, the mini and midi common tests yielded similar results when the group ability distributions were similar. When the group ability distributions differed, the midi common test gave better results than the mini common test in the external common test, whereas the mini common test gave better results than the midi common test in the internal common test. Among the Kernel equating methods, the linear methods performed better with respect to standard error, while the equipercentile methods performed better with respect to equating error. In every condition, the Kernel equating methods showed lower standard errors than IRT true-score equating in the middle of the score scale and higher standard errors at extreme scores, where score frequencies were lower. At extreme scores, IRT true-score equating produced smaller errors than the Kernel equating methods.
Therefore, IRT true-score equating could be preferred over the Kernel equating methods, especially when important decisions are to be made about individuals. In general, the results obtained from the Kernel equating methods, as a newer approach, were found to be as appropriate as those of IRT true-score equating. The Kernel equating methods can be recommended when true-score equating is not available. Because equated scores vary by equating method, the choice of method should be made after weighing the strengths and weaknesses of each method in light of the purpose of the test.
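
As a rough illustration of the simulation design described in the abstract, the sketch below crosses the five factors into 72 conditions, generates dichotomous responses under a 3PL model, and computes an unweighted RMSD. It is written in Python with the standard library only, not in R with the packages the thesis used. The abstract reports only the number of levels per factor, so every numeric level, the item-parameter ranges, the seed, and all function names here are assumptions made for illustration; the thesis may also define RMSD with weights or average it over replications differently.

```python
import itertools
import math
import random

# Factor levels. Only the NUMBER of levels per factor comes from the abstract;
# the numeric values below are placeholders, not the values used in the thesis.
FACTORS = {
    "sample_size": [500, 1000, 3000],           # assumed
    "ability_mean_difference": [0.0, 0.5],      # assumed
    "common_item_type": ["internal", "external"],
    "common_item_rate": [0.2, 0.3, 0.4],        # assumed
    "common_item_difficulty": ["mini", "midi"],
}


def build_conditions(factors):
    """Fully cross all factor levels into a list of condition dicts."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in itertools.product(*factors.values())]


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response: c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def simulate_responses(thetas, items, rng):
    """Draw a 0/1 response for every examinee (row) and item (column)."""
    return [[1 if rng.random() < p_correct_3pl(t, a, b, c) else 0
             for (a, b, c) in items]
            for t in thetas]


def rmsd(equated, criterion):
    """Unweighted root-mean-square difference between two score vectors."""
    if len(equated) != len(criterion):
        raise ValueError("score vectors must have the same length")
    squared = [(e - k) ** 2 for e, k in zip(equated, criterion)]
    return math.sqrt(sum(squared) / len(squared))


conditions = build_conditions(FACTORS)

rng = random.Random(2017)  # arbitrary seed, for reproducibility only
# 50 items, matching the fixed test length; the parameter ranges are assumed.
items = [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0), rng.uniform(0.10, 0.25))
         for _ in range(50)]
thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]
responses = simulate_responses(thetas, items, rng)

print(len(conditions))  # 3 * 2 * 2 * 3 * 2 = 72
```

In a full study, each of the 72 conditions would be replicated 100 times, and the equated scores from each method would be compared against a criterion equating function with `rmsd`; that equating step is deliberately left out of this sketch.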

Link

http://hdl.handle.net/11655/3902

Collections

Department of Educational Sciences Thesis Collection [656]