G-ANOVA VARIANCE COMPONENTS RESULTS
Examples from Artificial Discrete Formulation: Task and Rater Misclassifications
Part I
population distribution: skew (.25,.25,.25,.10,.10,.05)
population variance for the 1-6 valued discrete rv is 2.11
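The 2.11 figure can be checked directly from the stated distribution. A minimal sketch follows; the function name `discrete_variance` is ours and is not taken from the original analysis.

```python
def discrete_variance(probs):
    """Variance of a random variable taking the values 1..len(probs)."""
    values = range(1, len(probs) + 1)
    mean = sum(v * p for v, p in zip(values, probs))
    second_moment = sum(v * v * p for v, p in zip(values, probs))
    return second_moment - mean ** 2

skew = [0.25, 0.25, 0.25, 0.10, 0.10, 0.05]
print(round(discrete_variance(skew), 2))  # 2.11
```

The same function gives 35/12 (about 2.92) for a flat six-category distribution and 1.25 for a flat four-category one.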
1. Basic rater-task misclassification. Homogeneous raters and homogeneous tasks;
error facets defined by the misclassification matrices below.
rater misclassification matrix (d = .3, k = .2)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.70 0.24 0.06 0.00 0.00 0.00
[2,] 0.15 0.70 0.12 0.03 0.00 0.00
[3,] 0.03 0.12 0.70 0.12 0.03 0.00
[4,] 0.00 0.03 0.12 0.70 0.12 0.03
[5,] 0.00 0.00 0.03 0.12 0.70 0.15
[6,] 0.00 0.00 0.00 0.06 0.24 0.70
task misclassification with task parameter t = .3
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.70 0.30 0.00 0.00 0.00 0.00
[2,] 0.15 0.70 0.15 0.00 0.00 0.00
[3,] 0.00 0.15 0.70 0.15 0.00 0.00
[4,] 0.00 0.00 0.15 0.70 0.15 0.00
[5,] 0.00 0.00 0.00 0.15 0.70 0.15
[6,] 0.00 0.00 0.00 0.00 0.30 0.70
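The two printed matrices are consistent with a simple construction rule, sketched below. The rule is inferred from the d = .3, k = .2 and t = .3 matrices rather than taken from the original code, so treat it as an assumption: 1 - d stays on the diagonal, d is split equally between the sides that exist, and on each side a fraction k goes two categories away when such a category exists. The function names are ours.

```python
def rater_matrix(n, d, k):
    """n x n misclassification matrix: row = true category, column = recorded score.

    Assumed rule (inferred from the printed matrices, not from original code).
    """
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0 - d
        sides = [s for s in (-1, 1) if 0 <= i + s < n]
        for s in sides:
            mass = d / len(sides)          # all of d goes one way at the ends
            if 0 <= i + 2 * s < n:
                m[i][i + s] += mass * (1.0 - k)
                m[i][i + 2 * s] += mass * k
            else:
                m[i][i + s] += mass        # no category two steps away
    return m

def task_matrix(n, t):
    """Task misclassification: adjacent categories only (the k = 0 case)."""
    return rater_matrix(n, t, 0.0)

for row in rater_matrix(6, 0.3, 0.2):
    print(" ".join(f"{x:.2f}" for x in row))
```

With these arguments the rows printed should agree with the rater matrix above to two decimals, and `task_matrix(6, 0.3)` with the task matrix.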
pXtXR anova
variance components:
p t r pt pr tr ptr,e
1.59 .003 0 .202 0 0 .423
Generalizations from these structures.
The population variance for the 1-6 valued discrete rv is 2.11,
compared with a component for p of 1.59. This discrepancy is typical.
The component for p decreases as d increases; i.e.,
"universe variance" gets smaller as raters get worse. The component for p
also decreases as the task parameter t increases. The personXtask interaction component is large (even
for this homogeneous, symmetric task misclassification).
The magnitude of the personXtask component also decreases as d increases; i.e., the task-by-person
interaction gets smaller as recorders get worse.
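One way to read these components is through the law of total variance. The sketch below assumes a generative chain that the text does not spell out: a person's true category is drawn from the population distribution, each task independently shifts it according to the task matrix, and each rater independently scores the shifted category according to the rater matrix. Under that assumption the p component is the variance of the expected score across persons, the pt component is the expected variance across tasks within a person, and the residual is the expected rater variance. All names are illustrative.

```python
POP = [0.25, 0.25, 0.25, 0.10, 0.10, 0.05]
RATER = [  # d = .3, k = .2
    [0.70, 0.24, 0.06, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.12, 0.03, 0.00, 0.00],
    [0.03, 0.12, 0.70, 0.12, 0.03, 0.00],
    [0.00, 0.03, 0.12, 0.70, 0.12, 0.03],
    [0.00, 0.00, 0.03, 0.12, 0.70, 0.15],
    [0.00, 0.00, 0.00, 0.06, 0.24, 0.70],
]
TASK = [  # t = .3
    [0.70, 0.30, 0.00, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.15, 0.00, 0.00, 0.00],
    [0.00, 0.15, 0.70, 0.15, 0.00, 0.00],
    [0.00, 0.00, 0.15, 0.70, 0.15, 0.00],
    [0.00, 0.00, 0.00, 0.15, 0.70, 0.15],
    [0.00, 0.00, 0.00, 0.00, 0.30, 0.70],
]

def mean_var(probs, values):
    """Mean and variance of a discrete variable with the given probabilities."""
    m = sum(p * v for p, v in zip(probs, values))
    return m, sum(p * (v - m) ** 2 for p, v in zip(probs, values))

# Mean and variance of the recorded score for each task-shifted category.
rater_stats = [mean_var(row, range(1, 7)) for row in RATER]
rater_means = [m for m, _ in rater_stats]

# For each true category: mean and across-task variance of the expected score.
task_stats = [mean_var(row, rater_means) for row in TASK]
person_means = [m for m, _ in task_stats]

p_comp = mean_var(POP, person_means)[1]
pt_comp = sum(w * v for w, (_, v) in zip(POP, task_stats))
task_cat = [sum(POP[i] * TASK[i][j] for i in range(6)) for j in range(6)]
resid = sum(q * v for q, (_, v) in zip(task_cat, rater_stats))

print(f"p={p_comp:.2f} pt={pt_comp:.2f} ptr,e={resid:.2f}")
```

This gives roughly 1.47, 0.20 and 0.41, against the reported 1.59, .202 and .423. The personXtask value agrees closely; the gaps in the other two may reflect sampling variability in the reported estimates, which is our guess and not something the text states.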
2. Second, replace the artificial rater misclassification matrix above with the empirical matrix from CLAS '94.
BASE Rater misclass Writing
1 2 3 4 5 6
1 .191 .515 .243 .043 .009 .000
2 .167 .494 .283 .054 .003 .000
3 .007 .250 .580 .158 .005 .000
4 .003 .071 .300 .461 .154 .011
5 .001 .001 .026 .315 .481 .177
6 .000 .000 .002 .057 .374 .567
Run the anova with a homogeneous population of recorders who "pinch in" as above, and use the task misclassification with t = .3 as above.
pXtXR anova
variance components:
p t r pt pr tr ptr,e
.918 0 0 .108 0 0 .598
Again the population variance for the 1-6 valued discrete rv is 2.11,
yet the component for p is now even smaller at .918, because these raters have a lower
hit rate than those in the example above (.4-.5 here versus .70 above).
This is another illustration that "universe variance" gets smaller as raters get worse.
3. Third, mixed raters, one of them roll-the-dice; i.e., raters are heterogeneous, not interchangeable.
Example: one rater as above, with d = .3 and k = .2; the other rater is roll-the-dice.
The roll-the-dice rater's matrix has all elements equal to 1/6 (maximum chaos): no matter what the quality
of the paper, the rating has an equal chance of falling in each score category
(refer back to the discrete raters examples). Task misclassification
with t = .3 as above.
pXtXR anova
variance components:
p t r pt pr tr ptr,e
.009 0 .207 .01 .72 0 1.771
Person (universe) and personXtask components are now wiped out (even though the person distribution
and task misclassification are the same as above). Rater components finally show up.
But adding more roll-the-dice recorders will not help; quality over quantity.
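The pattern in this example can be reproduced with a large-sample calculation. The sketch assumes exactly two raters, the usual random-effects ANOVA estimators, and the generative chain of true category, then task shift, then rater score. None of these is confirmed by the text, and all names are ours. With two raters, the p estimate tends to the covariance between the two raters' expected scores, pr to half the variance of their difference, and r to half the squared difference in rater means.

```python
POP = [0.25, 0.25, 0.25, 0.10, 0.10, 0.05]
GOOD = [  # d = .3, k = .2 rater
    [0.70, 0.24, 0.06, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.12, 0.03, 0.00, 0.00],
    [0.03, 0.12, 0.70, 0.12, 0.03, 0.00],
    [0.00, 0.03, 0.12, 0.70, 0.12, 0.03],
    [0.00, 0.00, 0.03, 0.12, 0.70, 0.15],
    [0.00, 0.00, 0.00, 0.06, 0.24, 0.70],
]
DICE = [[1 / 6] * 6 for _ in range(6)]  # roll-the-dice rater
TASK = [  # t = .3
    [0.70, 0.30, 0.00, 0.00, 0.00, 0.00],
    [0.15, 0.70, 0.15, 0.00, 0.00, 0.00],
    [0.00, 0.15, 0.70, 0.15, 0.00, 0.00],
    [0.00, 0.00, 0.15, 0.70, 0.15, 0.00],
    [0.00, 0.00, 0.00, 0.15, 0.70, 0.15],
    [0.00, 0.00, 0.00, 0.00, 0.30, 0.70],
]

def mean_var(probs, values):
    """Mean and variance of a discrete variable with the given probabilities."""
    m = sum(p * v for p, v in zip(probs, values))
    return m, sum(p * (v - m) ** 2 for p, v in zip(probs, values))

def rater_profile(rater):
    """Summaries for one rater under the assumed true -> task -> rater chain."""
    cell = [mean_var(row, range(1, 7)) for row in rater]       # per task category
    cell_means = [m for m, _ in cell]
    per_person = [mean_var(row, cell_means) for row in TASK]   # per true category
    person_means = [m for m, _ in per_person]
    task_cat = [sum(POP[i] * TASK[i][j] for i in range(6)) for j in range(6)]
    return {
        "person_means": person_means,
        "grand_mean": mean_var(POP, person_means)[0],
        "pt": sum(w * v for w, (_, v) in zip(POP, per_person)),
        "noise": sum(q * v for q, (_, v) in zip(task_cat, cell)),
    }

def covariance(x, y):
    """Population-weighted covariance of two per-category vectors."""
    mx = sum(w * a for w, a in zip(POP, x))
    my = sum(w * b for w, b in zip(POP, y))
    return sum(w * (a - mx) * (b - my) for w, a, b in zip(POP, x, y))

a, b = rater_profile(GOOD), rater_profile(DICE)
diff = [x - y for x, y in zip(a["person_means"], b["person_means"])]

p = covariance(a["person_means"], b["person_means"])  # shared person signal
pr = mean_var(POP, diff)[1] / 2
r = (a["grand_mean"] - b["grand_mean"]) ** 2 / 2
# Valid here because the dice rater carries no person-by-task signal.
ptr_e = a["pt"] / 2 + (a["noise"] + b["noise"]) / 2

print(f"p={abs(p):.3f} r={r:.3f} pr={pr:.3f} ptr,e={ptr_e:.3f}")
```

Under these assumptions the values come out near 0 for p, about 0.23 for r, 0.73 for pr and 1.77 for ptr,e, in line with the reported .009, .207, .72 and 1.771. Because the dice rater's expected score does not vary with the paper, the two raters share no person signal, which is why the person component vanishes.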
4. Fourth, is it the rater heterogeneity or the chaos? Make all raters
roll-the-dice to achieve a homogeneous rater population (the usual G-theory, psychometric
assumption).
pXtXR anova
variance components:
p t r pt pr tr ptr,e
.001 0 0 .01 0 0 2.92
So with homogeneous roll-the-dice raters, all rater components (r, pr, tr) are 0, even
with an insane amount of rater error. The rater process also eliminates the person and personXtask components,
even though those processes are the same as in all examples above.
Examples from Empirical Rater Misclassifications
Math and Writing Examples from:
Examples of the Performance of
G-theory Extensions for Estimating Error
David Rogosa Haggai Kupermintz
CCSSO National Conference on Large-Scale Assessment
June 24, 1996
The tables below display G-theory variance components for different G-anovas:
personXrater anova: rater misclassification only, no task factor
personXtaskXrater anova, t = .1: small task misclassification plus rater misclassification
personXtaskXrater anova, t = .3: larger task misclassification plus rater misclassification
Data structures are 4-category data in the Math example and 6-category data in the Writing example. For Writing, the 6-category population distribution is set to flat, (1/6, 1/6, 1/6, 1/6, 1/6, 1/6), so the population variance for the 1-6 valued discrete r.v. is 2.92. For Math, the 4-category population distribution is set to flat, (1/4, 1/4, 1/4, 1/4), so the population variance for the 1-4 valued discrete r.v. is 1.25 (refer to the item for the 4-cat flat distribution in Effects of misclassification errors on category membership and moments).
General Conclusions
The two examples provide a look at the results of G-anova for different levels of rater accuracy and of task precision.
Person (universe?) components: For Math, the combination of the worst raters and the worst task specification (t = .3) reduces the person variance from 1.25 to .48 (a factor of about 2.6); similarly, for Writing the reduction is from 2.92 to 1.4 (more than a factor of 2). Person (true) variance is reduced by quality of measurement?
Rater components: In both examples the r, pr, tr components show essentially no sensitivity to large changes in rater acuity; all components remain 0 (at most .01), even for the worst raters!
Task components: Decreases in the quality of the task (increasing t) leave the t component at 0 but increase the personXtask interaction.
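The drop in the person component for the pXr design can be sketched from the rater matrices alone. The sketch assumes each rating is an independent draw from the row of the rater matrix for the paper's true category, so that p is the variance across true categories of the expected rating and pr,e is the average within-row variance. That model and the function names are our assumptions. The printed rows are rounded, so they are renormalized to sum to 1.

```python
def mean_var(probs, values):
    """Mean and variance of a discrete variable with the given probabilities."""
    m = sum(p * v for p, v in zip(probs, values))
    return m, sum(p * (v - m) ** 2 for p, v in zip(probs, values))

def pxr_components(matrix, pop):
    """Expected (p, pr,e) for a person-by-rater design under the assumed model."""
    scores = range(1, len(pop) + 1)
    rows = [[x / sum(row) for x in row] for row in matrix]  # undo rounding drift
    stats = [mean_var(row, scores) for row in rows]
    p = mean_var(pop, [m for m, _ in stats])[1]
    pr_e = sum(w * v for w, (_, v) in zip(pop, stats))
    return p, pr_e

FLAT4 = [0.25] * 4
MATH_BASE = [
    [0.913, 0.080, 0.006, 0.002],
    [0.049, 0.848, 0.097, 0.006],
    [0.004, 0.085, 0.802, 0.109],
    [0.001, 0.004, 0.088, 0.908],
]
MATH_BASE_MINUS_30 = [
    [0.613, 0.353, 0.026, 0.0088],
    [0.146, 0.548, 0.288, 0.0178],
    [0.010, 0.214, 0.502, 0.274],
    [0.004, 0.017, 0.372, 0.608],
]

for name, matrix in [("BASE", MATH_BASE), ("base - .30", MATH_BASE_MINUS_30)]:
    p, pr_e = pxr_components(matrix, FLAT4)
    print(f"{name}: p={p:.3f} pr,e={pr_e:.3f}")
```

Under these assumptions the Math values come out at roughly p = 1.10, pr,e = 0.15 for the BASE raters and p = 0.67, pr,e = 0.41 for the base - .30 raters, close to the reported pXr components.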
Discrete Formulation Examples Based on the 4-category Math ratings
pXr, pXtXr, G-anova
VARIANCE COMPONENTS RESULTS
Rater misclassification matrices are listed from best (BASE) to worst (base - .30);
raters get worse moving down the page. For each matrix, variance components are given
for the pXr anova, the pXtXR anova with t = .1, and the pXtXR anova with t = .3.

BASE rater
     1      2      3      4
1  .913   .080   .006   .002
2  .049   .848   .097   .006
3  .004   .085   .802   .109
4  .001   .004   .088   .908
pXr anova
variance components:
p      r   pr,e
1.096  0   .149
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
.96   0   0   .08   0    0    .151
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
.73   0   0   .22   0    0    .154

base - .10
     1      2      3      4
1  .813   .171   .0128  .0043
2  .081   .748   .161   .0099
3  .006   .128   .702   .164
4  .002   .008   .183   .808
pXr anova
variance components:
p      r   pr,e
.951   0   .251
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
.85   0   0   .07   0    0    .248
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
.65   0   0   .184  0    0    .26

base - .20
     1      2      3      4
1  .713   .262   .0196  .0065
2  .113   .648   .225   .0139
3  .008   .171   .602   .219
4  .003   .013   .277   .708
pXr anova
variance components:
p      r   pr,e
.805   0   .336
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
.71   0   0   .06   0    0    .339
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
.56   0   0   .15   0    0    .355

base - .30
     1      2      3      4
1  .613   .353   .026   .0088
2  .146   .548   .288   .0178
3  .010   .214   .502   .274
4  .004   .017   .372   .608
pXr anova
variance components:
p      r   pr,e
.674   0   .417
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
.60   0   0   .05   0    0    .418
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
.48   0   0   .13   .01  0    .416
Discrete Formulation Examples Based on the 6-category writing ratings
pXr, pXtXr, G-anova
VARIANCE COMPONENT RESULTS
Rater misclassification matrices are listed from best (base + .30) to worst (BASE);
raters get worse moving down the page. For each matrix, variance components are given
for the pXr anova, the pXtXR anova with t = .1, and the pXtXR anova with t = .3.

base + .30
     1       2       3       4       5       6
1  .491    .324    .153    .027    .006    .000
2  .068    .794    .116    .022    .001    .000
3  .002    .071    .880    .045    .001    .000
4  .0013   .032    .133    .761    .068    .005
5  .0004   .0004   .011    .133    .781    .075
6  .000    .000    .0006   .018    .115    .867
pXr anova
variance components:
p      r   pr,e
2.18   0   .317
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
2.1   0   0   .06   0    0    .313
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
1.9   0   0   .19   0    0    .302

base + .20
     1       2       3       4       5       6
1  .391    .388    .183    .0324   .007    .000
2  .101    .694    .171    .0327   .002    .000
3  .0037   .131    .780    .0828   .003    .000
4  .0019   .0447   .189    .661    .097    .007
5  .0006   .0006   .016    .194    .681    .109
6  .000    .000    .0011   .0307   .201    .767
pXr anova
variance components:
p      r   pr,e
1.941  0   .425
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
1.9   0   0   .06   0    0    .417
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
1.7   0   0   .18   0    0    .409

base + .10
     1       2       3       4       5       6
1  .291    .451    .213    .0377   .008    .000
2  .134    .594    .227    .0433   .002    .000
3  .0053   .190    .680    .120    .004    .000
4  .0024   .0578   .244    .561    .125    .009
5  .0008   .0008   .021    .254    .581    .143
6  .000    .000    .0015   .044    .288    .667
pXr anova
variance components:
p      r   pr,e
1.76   0   .509
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
1.7   0   0   .06   0    0    .498
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
1.5   0   0   .16   0    0    .50

BASE Rater misclass
     1       2       3       4       5       6
1  .191    .515    .243    .043    .009    .000
2  .167    .494    .283    .054    .003    .000
3  .007    .250    .580    .158    .005    .000
4  .003    .071    .300    .461    .154    .011
5  .001    .001    .026    .315    .481    .177
6  .000    .000    .002    .057    .374    .567
pXr anova
variance components:
p      r   pr,e
1.592  0   .575
pXtXR anova t = .1
variance components:
p     t   r   pt    pr   tr   ptr,e
1.5   0   0   .05   0    0    .576
pXtXR anova t = .3
variance components:
p     t   r   pt    pr   tr   ptr,e
1.4   0   0   .15   0    0    .573