COMPARISON OF COMPUTERIZED ADAPTIVE TESTING DESIGNS IN TESTS COMPOSED OF TESTLETS
Abstract
In this study, computerized adaptive testing (CAT) applications of testlet-based tests were examined under different designs. For this purpose, a real data set was used, obtained from a computer-based English proficiency exam that consists of nine forms and was administered to students at Bilkent University. Because each form contains nine anchor items for scaling purposes, an item pool was created using the concurrent calibration method. Three distinct CAT designs were developed: the CAT, in which testlets were treated as independent items; the Testlet-Based CAT (T-CAT), in which testlets were treated as a whole on the basis of Testlet Response Theory (TRT); and the Passage-Based CAT (P-CAT), in which testlets were treated as a whole but under unidimensional Item Response Theory (IRT). These designs were compared in terms of measurement precision and accuracy and item pool utilization under different conditions. The simulation conditions included the number of testlets (6, 9, 12), sample size (200, 500, 1000), and ability estimation method (EAP, MAP). The correlation, bias, MAB, RMSE, and standard error (SE) values between the true and estimated θ values obtained under these conditions were compared. Item usage frequency, the number of unused items, and test overlap rates were also calculated. The findings indicated that the CAT design was more effective, while the P-CAT generally produced results similar to those of the CAT design. In contrast, the T-CAT did not achieve the expected results because the degree of local dependency in the testlets was low. Because T-CAT analyses are complex and time-consuming, the P-CAT may be preferable in practical terms for assessments with low or medium testlet effects. The findings were discussed, and recommendations were made for practitioners and researchers.
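As an illustration of the recovery statistics named above, the following minimal sketch computes the correlation, bias, MAB, and RMSE between true and estimated θ values. It is not the study's own code. The function name `recovery_metrics`, the reading of MAB as mean absolute bias, and the sign convention (estimated minus true) are assumptions made for this example, and the standard error is omitted because it depends on the ability estimation method.

```python
import math


def recovery_metrics(true_theta, est_theta):
    """Return correlation, bias, MAB, and RMSE for paired ability values.

    Hypothetical helper for illustration only; errors are defined here
    as (estimated - true), which is an assumed sign convention.
    """
    n = len(true_theta)
    if n != len(est_theta) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    errors = [e - t for t, e in zip(true_theta, est_theta)]
    bias = sum(errors) / n                              # mean signed error
    mab = sum(abs(d) for d in errors) / n               # mean absolute bias
    rmse = math.sqrt(sum(d * d for d in errors) / n)    # root mean squared error

    # Pearson correlation between true and estimated values.
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    if var_t == 0 or var_e == 0:
        corr = float("nan")  # correlation is undefined without variance
    else:
        corr = cov / math.sqrt(var_t * var_e)

    return {"correlation": corr, "bias": bias, "mab": mab, "rmse": rmse}


if __name__ == "__main__":
    # Toy values, not data from the study.
    true_vals = [-1.0, 0.0, 1.0]
    est_vals = [-0.5, 0.0, 0.5]
    print(recovery_metrics(true_vals, est_vals))
```

In a simulation of this kind, such a function would typically be applied once per condition (for example, per combination of testlet count, sample size, and estimation method) so that the designs can be compared on the same statistics.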