Examining Judge Ratings in Ice Skating with Generalizability Theory and the Rasch Model
Abstract
The purpose of this study is to compare Generalizability Theory and the Many-Facet Rasch Measurement Model based on parameters estimated from raters' scores obtained from the Ice Skating World Championships. The study aimed to determine, using real (authentic) data, which theory is more advantageous, more informative, and more practical, and also examined the consistency of the information generated by the two theories. Within generalizability theory, fully crossed designs (skater x judge x task) and decision studies based on these designs were compared, and the validity of the decision studies was examined. In addition, detailed information was obtained for each facet using the many-facet Rasch measurement model.

The research group consisted of 397 ice skaters who performed in the Ice Skating World Championships between 2006 and 2011 and the 189 raters who judged them in these competitions. The raters' assessments of the five program components in the singles (men and women) and pairs competitions were used as the research data. The data were analyzed with designs appropriate for generalizability theory and the many-facet Rasch measurement model; the EduG program (Swiss Society for Research in Education Working Group, 2010) was used for the generalizability analyses and the FACETS program for the many-facet Rasch analyses. The generalizability analyses showed that the variance due to skaters was very high, accounting for 76.6%-94.4% of the total variance across all years and groups. According to the Rasch analyses, skater reliability ranged between .99 and 1.00. Thus, the measurements of ice skating performance differentiated between skaters in all five program components.
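In a generalizability (G) study, each estimated variance component is reported as a percentage of the total observed-score variance, as with the skater shares above. A minimal sketch of that arithmetic, using entirely hypothetical variance components (not the study's estimates) for a crossed skater x rater x task design:

```python
# Hypothetical variance components for a crossed skater x rater x task design.
# These numbers are illustrative only, not estimates from the study.
components = {"skater": 4.10, "rater": 0.05, "task": 0.08,
              "skater_x_rater": 0.30, "residual": 0.27}

# Express each component as a percentage of the total variance.
total = sum(components.values())
shares = {facet: 100 * value / total for facet, value in components.items()}

for facet, pct in shares.items():
    print(f"{facet}: {pct:.1f}% of total variance")
```

With these illustrative values the skater component accounts for roughly 85% of the total, i.e. within the 76.6%-94.4% range the study reports for real data.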
The variance due to the skater x rater interaction accounted for 2.8%-14.4% of the total variance across all years and groups; bias analyses in the Rasch model yielded similar results (2.8%-13.7%). According to the generalizability analyses, the generalizability and reliability coefficients were between .98 and 1.00 for the 2006-2008 competitions. Decision studies were conducted to reduce the number of raters from 12 to 9. Their results showed that the generalizability coefficients (.98-1.00) and reliability coefficients (.98-.99) did not decrease, and the absolute and relative error variances did not increase appreciably. For the competitions between 2009 and 2011, the generalizability and reliability coefficients (.98-.99) were likewise high in all groups. In those years the number of raters was 9, and decision studies were conducted for 12 raters; increasing the number of raters did not change the generalizability and reliability coefficients. These results indicate the validity of the D-studies.
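A decision (D) study of the kind described above projects how the generalizability coefficient changes as the number of raters is varied. A minimal sketch, assuming a simplified relative-error term and hypothetical variance components (not the study's estimates):

```python
# Relative generalizability coefficient when n_raters ratings are averaged.
# var_p: universe-score (skater) variance; var_rel: interaction/residual
# variance contributing to relative error for a single rater. Hypothetical values.
def g_coefficient(var_p, var_rel, n_raters):
    """E(rho^2) = var_p / (var_p + var_rel / n_raters)."""
    return var_p / (var_p + var_rel / n_raters)

# When skater variance dominates, dropping from 12 to 9 raters barely
# changes the coefficient, mirroring the D-study pattern reported above.
for n in (12, 9):
    print(f"n_raters={n}: G = {g_coefficient(0.85, 0.10, n):.3f}")
```

Under these illustrative values the coefficient moves only from about .990 to .987, which is why reducing the rater panel leaves the reliability of the rankings essentially unchanged.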
Both methods provide information at different levels (group-level versus individual-level statistics), have different advantages, and complement each other. It is therefore beneficial to use both methods in multi-faceted measurement conditions.