Comparison of Computerized Adaptive Classification Testing Criteria in Terms of Classification Accuracy and Test Length
Abstract
Computerized Adaptive Classification Testing (CACT) aims to classify examinees
relative to one or more predefined cut-points with the highest possible
classification accuracy and the fewest possible items. The efficiency of these
classifications varies with the item pool, the classification criterion, the item
selection method and the ability estimation method. Examining how different
combinations of these factors behave under Monte Carlo (MC) and Post Hoc (PH)
simulations is therefore important for real applications.
This study compares classification criteria, item selection methods and ability
estimation methods in CACT in terms of classification accuracy, test length and
measurement precision under both MC and PH simulations. The classification
criteria examined were the Sequential Probability Ratio Test (SPRT), the
Generalized Likelihood Ratio (GLR) and the Confidence Interval (CI) methods; the
ability estimation methods were Expected a Posteriori (EAP) and Weighted
Likelihood Estimation (WLE); and the item selection methods were Maximum Fisher
Information (MFI) and Kullback-Leibler Information (KLI), each applied at the
cut-point (CP) and at the estimated ability (EA). To this end, for the
MC simulation, a pool of 500 items based on the three-parameter logistic model
(3PLM) and informative at and around the cut-point (theta = 1.0) was generated;
for the PH simulation, a real data set of 80 items was used. In the MC
simulation, abilities for 3000 individuals were drawn from the standard normal
distribution, N(0,1). In the PH simulation, the ability levels of the 994
individuals in the data set were estimated by EAP under the 3PLM. Item responses
were generated randomly in R for the MC simulation, whereas the real item
responses were used without any manipulation in the PH simulation. In this
study, 96 conditions were investigated for the MC and the PH simulations. After
the CACT simulations, the mean values of Average Test Length (ATL), Average
Classification Accuracy (ACA), the correlation
between the true and estimated thetas (r), bias, Root Mean Square Error
(RMSE) and Mean Absolute Error (MAE) were calculated over 25 replications.
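
The MC design and the recovery statistics named above can be illustrated with a minimal Python sketch. It is not the study's R code. The item parameter ranges, the absence of a scaling constant in the logistic function, the error convention (estimate minus true value) and the helper names `p_3pl`, `simulate_responses` and `recovery_stats` are all assumptions made for illustration; only the pool size, the sample size, the N(0,1) ability distribution and the cut-point come from the text.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no scaling constant assumed)."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

def simulate_responses(theta, a, b, c, rng):
    """Draw a persons-by-items matrix of 0/1 responses under the 3PL model."""
    p = p_3pl(theta[:, None], a[None, :], b[None, :], c[None, :])
    return (rng.random(p.shape) < p).astype(int)

def recovery_stats(theta_true, theta_hat):
    """Correlation, bias, RMSE and MAE; error is taken as estimate minus true value (assumed)."""
    err = theta_hat - theta_true
    return {
        "r": float(np.corrcoef(theta_true, theta_hat)[0, 1]),
        "bias": float(err.mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "mae": float(np.abs(err).mean()),
    }

rng = np.random.default_rng(0)
theta = rng.standard_normal(3000)   # N(0,1) abilities for 3000 simulees, as in the MC design
a = rng.uniform(0.8, 2.0, 500)      # hypothetical discrimination range
b = rng.normal(1.0, 0.5, 500)       # hypothetical difficulties centred on the cut-point 1.0
c = rng.uniform(0.05, 0.25, 500)    # hypothetical guessing range
responses = simulate_responses(theta, a, b, c, rng)
```

In a full simulation, `recovery_stats` would be applied to the true abilities and the EAP or WLE estimates from each replication, and the results averaged over replications.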
The results of both the MC and the PH simulations show that the GLR and CI
classification criteria outperform the SPRT in terms of test efficiency, whereas
the SPRT outperforms the other two criteria in terms of bias, RMSE and MAE. The
ATL decreases and test efficiency increases as the indifference region of the
classification criteria widens or the error value decreases.
In addition, all classification criteria achieve considerably high classification
accuracy in all conditions, and both ability estimation methods, EAP and WLE,
recover ability well in terms of the correlation between true and estimated
thetas (r), whereas EAP performs somewhat better than WLE in terms of bias, RMSE
and MAE. Finally, all item selection methods perform similarly, although MFI-EA
performs better in all conditions on all dependent variables.
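
The roles of the indifference region and the error values in the SPRT can be illustrated with a short Python sketch of the textbook decision rule under the 3PL model. It is not the implementation used in the study. The function name `sprt_decision`, the half-width `delta`, the nominal error rates `alpha` and `beta`, and the item parameters in the usage lines are hypothetical; the cut-point of 1.0 is taken from the text.

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no scaling constant assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def sprt_decision(responses, items, cut=1.0, delta=0.2, alpha=0.05, beta=0.05):
    """Classify an examinee as "above" or "below" the cut-point, or "continue" testing.

    responses: 0/1 scores for the items administered so far
    items:     (a, b, c) parameter tuples for those items
    delta:     half-width of the indifference region around the cut-point (assumed)
    """
    theta_lo, theta_hi = cut - delta, cut + delta
    log_lr = 0.0
    for u, (a, b, c) in zip(responses, items):
        p_hi = p_3pl(theta_hi, a, b, c)
        p_lo = p_3pl(theta_lo, a, b, c)
        # Add this item's log likelihood ratio for the observed response.
        if u == 1:
            log_lr += math.log(p_hi / p_lo)
        else:
            log_lr += math.log((1.0 - p_hi) / (1.0 - p_lo))
    upper = math.log((1.0 - beta) / alpha)   # Wald's upper decision bound
    lower = math.log(beta / (1.0 - alpha))   # Wald's lower decision bound
    if log_lr >= upper:
        return "above"
    if log_lr <= lower:
        return "below"
    return "continue"

item = (1.5, 1.0, 0.2)                        # hypothetical item located at the cut-point
decision_correct = sprt_decision([1] * 30, [item] * 30)
decision_wrong = sprt_decision([0] * 30, [item] * 30)
decision_empty = sprt_decision([], [])
```

In this sketch a larger `delta` pulls the two hypothesised ability values further apart, so each response contributes more to the log likelihood ratio and a decision bound is reached after fewer items.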