Education 257 SOLUTIONS HW2 2/7/05 A. Nested Designs --------------------- problem 1. Group Decision Making Example NWK Nested Chapter (26v5, 28v4) First off need to create data set from info in text I typed in the values for the dependent variable (questions) in c1 then created the indices for nationality, group size, observer. As follows first I checked my outcome values MTB > sum c1 Column Sum Sum of C1 = 242.00 MTB > note create indices for country size observer MTB > Set c2 DATA> 1( 1 : 2 / 1 )8 DATA> End. MTB > Set c3 DATA> 2( 1 : 2 / 1 )4 DATA> End. MTB > Set c4 DATA> 4( 1 : 2 / 1 )2 DATA> End. MTB > print c1-c4 Data Display Row C1 C2 C3 C4 1 16 1 1 1 2 20 1 1 1 3 14 1 1 2 4 19 1 1 2 5 21 1 2 1 6 25 1 2 1 7 28 1 2 2 8 19 1 2 2 9 7 2 1 1 10 5 2 1 1 11 4 2 1 2 12 9 2 1 2 13 11 2 2 1 14 17 2 2 1 15 12 2 2 2 16 15 2 2 2 part (a) cell means MTB > table c2 c3; SUBC> count; SUBC> mean c1. Tabulated Statistics Rows: C2 Columns: C3 1 2 All 1 4 4 8 17.250 23.250 20.250 2 4 4 8 6.250 13.750 10.000 All 8 8 16 11.750 18.500 15.125 Cell Contents -- Count C1:Mean I also got the sum note syntax sums rather than sum C1 MTB > table c2 c3; SUBC> sums c1. Tabulated Statistics Rows: C2 Columns: C3 1 2 All 1 69.000 93.000 162.000 2 25.000 55.000 80.000 All 94.000 148.000 242.000 Cell Contents -- C1:Sum part b cross-nested anova follow the model stated to construct the anova command MTB > anova c1 = c2 c3 c4(c2) c2*c3 c3*c4(c2); SUBC> random c4; SUBC> restrict; SUBC> ems. 
Analysis of Variance (Balanced Designs) Factor Type Levels Values C2 fixed 2 1 2 C3 fixed 2 1 2 C4(C2) random 2 1 2 Analysis of Variance for C1 Source DF SS MS F P C2 1 420.25 420.25 1681.00 0.001 C3 1 182.25 182.25 145.80 0.007 C4(C2) 2 0.50 0.25 0.02 0.981 C2*C3 1 2.25 2.25 1.80 0.312 C3*C4(C2) 2 2.50 1.25 0.09 0.911 Error 8 106.00 13.25 Total 15 713.75 Source Variance Error Expected Mean Square component term (using restricted model) 1 C2 3 (6) + 4(3) + 8Q[1] 2 C3 5 (6) + 2(5) + 8Q[2] 3 C4(C2) -3.250 6 (6) + 4(3) 4 C2*C3 5 (6) + 2(5) + 4Q[4] 5 C3*C4(C2) -6.000 6 (6) + 2(5) 6 Error 13.250 (6) This recreates the anova table in text Also Minitab gives same value for Nationality test statistic as NWK . Treating observer as fixed does change the test statistic for Nationality and others as now error MS is the denominator of all test statistics MTB > anova c1 = c2 c3 c4(c2) c2*c3 c3*c4(c2) Analysis of Variance (Balanced Designs) Factor Type Levels Values C2 fixed 2 1 2 C3 fixed 2 1 2 C4(C2) fixed 2 1 2 Analysis of Variance for C1 Source DF SS MS F P C2 1 420.25 420.25 31.72 0.000 C3 1 182.25 182.25 13.75 0.006 C4(C2) 2 0.50 0.25 0.02 0.981 C2*C3 1 2.25 2.25 0.17 0.691 C3*C4(C2) 2 2.50 1.25 0.09 0.911 Error 8 106.00 13.25 Total 15 713.75 ------------------------------------------------------------ ======================================================================= problem 2 read in data MTB > Read '[path]\HAWARE.DAT' c1-c4. Entering data from file: G:[path]\HAWARE.DAT 45 rows read. for sanity I'll just do text refs to Ch28; for version 5 it is Ch 26 do 28.9c by getting state and city-within-state means MTB > table c2 c3; SUBC> mean c1. 
Tabulated Statistics

 ROWS: C2     COLUMNS: C3

            1         2         3       ALL

   1   40.200    38.800    43.600    40.867
   2   54.200    58.600    59.200    57.333
   3   24.800    27.800    28.000    26.867
 ALL   39.733    41.733    43.600    41.689

  CELL CONTENTS --
              C1:MEAN

big state differences (rows)
somewhat small differences of cities within state

do 28.10a by running anova for nested design

MTB > anova c1 = c2 c3(c2)

Analysis of Variance (Balanced Designs)

Factor     Type Levels Values
C2        fixed      3     1     2     3
C3(C2)    fixed      3     1     2     3

Analysis of Variance for C1

Source        DF         SS         MS       F      P
C2             2     6976.8     3488.4   32.26  0.000
C3(C2)         6      167.6       27.9    0.26  0.953
Error         36     3893.2      108.1
Total         44    11037.6

do 28.10 b,c by carrying out the tests for state and city within state:
clearly significant effect for states (c2), nothing for cities within
states. What the c3(c2) source indicates is whether, within a state, the
city means differ (aside from between-state differences). As we saw from
looking at the means, there is not much variation of city scores within
a state.

do 28.10d by noting that these two tests, each done with Type I error
rate .05, will combine for an overall error rate less than or equal
to .10

do 28.11 c by substituting from NWK sec 28.5 (esp p.1135-36), for this
balanced nested design, the interval estimate for the pairwise
differences. These have the form:

  difference of state means +/- q(.90; 3, 36)*Sqrt[MSE/(#households*#cities)]
or, taking q(.90; 3, 36) as approximately 3,
  difference of state means +/- 3*Sqrt[108.1/(5*3)]
  difference of state means +/- 8.05

so the set of interval estimates is

    means            CI
    1 - 2     (-24.52,  -8.42)
    1 - 3     (  5.95,  22.05)
    2 - 3     ( 22.42,  38.52)

-------------------------------------------
Sidenote 1---
Could try to get between-state pairwise comparisons by ignoring the
nested structure of the data and running a one-way anova
(classification variable state with 3 levels) and subsequent Tukey.
Because between-city within-state differences are so small in these
data, ignoring the nested structure of the data produces almost
identical results:

MTB > oneway c1 c2;
SUBC> tukey 10.
One-Way Analysis of Variance Analysis of Variance on C1 Source DF SS MS F p C2 2 6976.8 3488.4 36.08 0.000 Error 42 4060.8 96.7 Total 44 11037.6 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev --+---------+---------+---------+---- 1 15 40.867 10.385 (---*---) 2 15 57.333 9.868 (---*---) 3 15 26.867 9.211 (---*----) --+---------+---------+---------+---- Pooled StDev = 9.833 24 36 48 60 Tukey's pairwise comparisons Family error rate = 0.100 Individual error rate = 0.0409 Critical value = 2.98 Intervals for (column level mean) - (row level mean) 1 2 2 -24.03 -8.90 3 6.43 22.90 21.57 38.03 Sidenote 2--to approximate the tabled display of these data on p.1157 of NWK (ver 4) try the following configuration of Table MTB > table c4 c3 c2; SUBC> mean c1. Tabulated Statistics CONTROL: C2 = 1 ROWS: C4 COLUMNS: C3 1 2 3 ALL 1 42.000 26.000 34.000 34.000 2 56.000 38.000 51.000 48.333 3 35.000 42.000 60.000 45.667 4 40.000 35.000 29.000 34.667 5 28.000 53.000 44.000 41.667 ALL 40.200 38.800 43.600 40.867 CONTROL: C2 = 2 ROWS: C4 COLUMNS: C3 1 2 3 ALL 1 47.000 56.000 68.000 57.000 2 58.000 43.000 51.000 50.667 3 39.000 65.000 49.000 51.000 4 62.000 70.000 71.000 67.667 5 65.000 59.000 57.000 60.333 ALL 54.200 58.600 59.200 57.333 CONTROL: C2 = 3 ROWS: C4 COLUMNS: C3 1 2 3 ALL 1 19.000 18.000 16.000 17.667 2 36.000 40.000 28.000 34.667 3 24.000 27.000 45.000 32.000 4 12.000 31.000 30.000 24.333 5 33.000 23.000 21.000 25.667 ALL 24.800 27.800 28.000 26.867 CELL CONTENTS -- C1:MEAN ------------------------------------------------------------ problem 3. (a) putting together the nested anova using only primitive tools MTB > Read "TRAINING.DAT" c1-c4. 12 rows read. 
to get the between schools SS, simply MTB > oneway c1 c2 One-Way Analysis of Variance Analysis of Variance for C1 Source DF SS MS F P C2 2 156.5 78.3 1.16 0.358 Error 9 609.5 67.7 Total 11 766.0 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev --------+---------+---------+-------- 1 4 19.750 8.617 (-----------*----------) 2 4 14.250 7.136 (-----------*----------) 3 4 11.000 8.832 (-----------*----------) --------+---------+---------+-------- Pooled StDev = 8.229 8.0 16.0 24.0 to get the other terms in Table 28.4, analyze instructors within schools break out instructors within schools MTB > copy c1-c3 c11-c13; SUBC> use c2 = 1. MTB > copy c1-c3 c21-c23; SUBC> use c2 = 2. MTB > copy c1-c3 c31-c33; SUBC> use c2=3. Now do the series of one-way anovas within schools find total instructors SS within schools is 210.25 + 132.25 + 225 = 567.5 find error SS accumulated within schools is 12.5 + 20.5 + 9 = 42 MTB > oneway c11 c13 One-Way Analysis of Variance Analysis of Variance for C11 Source DF SS MS F P C13 1 210.25 210.25 33.64 0.028 Error 2 12.50 6.25 Total 3 222.75 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev ------+---------+---------+---------+ 1 2 27.000 2.828 (-------*-------) 2 2 12.500 2.121 (-------*------) ------+---------+---------+---------+ Pooled StDev = 2.500 10 20 30 40 MTB > oneway c21 c23 One-Way Analysis of Variance Analysis of Variance for C21 Source DF SS MS F P C23 1 132.2 132.2 12.90 0.070 Error 2 20.5 10.2 Total 3 152.8 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev --+---------+---------+---------+---- 1 2 8.500 3.536 (---------*--------) 2 2 20.000 2.828 (---------*---------) --+---------+---------+---------+---- Pooled StDev = 3.202 0 10 20 30 MTB > oneway c31 c33 One-Way Analysis of Variance Analysis of Variance for C31 Source DF SS MS F P C33 1 225.00 225.00 50.00 0.019 Error 2 9.00 4.50 Total 3 234.00 Individual 95% CIs For Mean Based on Pooled StDev Level N Mean StDev 
                          ----+---------+---------+---------+--
  1   2   18.500   2.121                     (-------*-------)
  2   2    3.500   2.121  (-------*-------)
                          ----+---------+---------+---------+--
Pooled StDev =    2.121      0.0       8.0      16.0      24.0

MTB >

------
Or an alternative: a basic anova (crossed design) gives the correct
Schools and Error SS; then use Eq 28.16 (v4). Adding the C3 and
Interaction SS below gives 108.00 + 459.50 = 567.50, the same
instructors-within-schools SS found from the one-way runs.

MTB > twoway c1 c2 c3

Two-way Analysis of Variance

Analysis of Variance for C1
Source         DF        SS        MS
C2              2    156.50     78.25
C3              1    108.00    108.00
Interaction     2    459.50    229.75
Error           6     42.00      7.00
Total          11    766.00

----------------------------------------------------
part (b) Tukey intervals

critical value for 90% interval is q(.90; 3, 6) = 3.56
with 3 schools compared and 6 df error.
Sqrt[MSE/(2*2)] = Sqrt[7/4] = 1.32
1.32*3.56 = 4.7

School means are given from anova above

Level     N      Mean
1         4    19.750
2         4    14.250
3         4    11.000

pairwise differences +/- 4.7 give the indicated intervals
------------------------------------------------------

========================================================================
B. Repeated Measures Designs

problem 4. Hearing Tests Story

start out by reading in data and a quick description

MTB > Read "[path]\HEARING.DAT" c1-c3.
Entering data from file: G:\HEARING.DAT
96 rows read.

MTB > describe c3;
SUBC> by c2.

Descriptive Statistics

Variable   C2     N     Mean   Median  Tr Mean    StDev  SE Mean
C3          1    24    32.75    32.00    32.73     7.41     1.51
            2    24    29.67    30.00    29.64     8.06     1.64
            3    24    25.25    25.00    25.18     8.32     1.70
            4    24    25.58    25.00    25.36     7.78     1.59

Variable   C2      Min      Max       Q1       Q3
C3          1    18.00    48.00    28.00    37.50
            2    16.00    44.00    20.50    36.00
            3    14.00    38.00    18.50    32.00
            4    14.00    42.00    18.50    31.50

MTB > table c2;
SUBC> mean c3.

Tabulated Statistics

 Rows: C2

           C3
         Mean

   1   32.750
   2   29.667
   3   25.250
   4   25.583
 All   28.312

part (a) carry out the repeated measures anova

MTB > anova c3 = c1 c2;
SUBC> random c1;
SUBC> restrict;
SUBC> ems;
SUBC> mean c1 c2.
Analysis of Variance (Balanced Designs)

Factor     Type Levels Values
C1       random     24     1     2     3     4     5     6     7     8
                           9    10    11    12    13    14    15    16
                          17    18    19    20    21    22    23    24
C2        fixed      4     1     2     3     4

Analysis of Variance for C3

Source      DF         SS         MS       F      P
C1          23    3231.63     140.51    3.87  0.000
C2           3     920.46     306.82    8.45  0.000
Error       69    2506.54      36.33
Total       95    6658.62

Source      Variance  Error  Expected Mean Square
           component   term  (using restricted model)
1 C1           26.04     3   (3) + 4(1)
2 C2                     3   (3) + 24Q[2]
3 Error        36.33         (3)

Means

C1     N       C3
 1     4   24.500
 2     4   24.000
 3     4   28.000
 4     4   20.500
 5     4   31.000
 6     4   28.000
 7     4   27.000
 8     4   28.500
 9     4   36.500
10     4   30.500
11     4   28.000
12     4   31.000
13     4   32.000
14     4   36.000
15     4   33.500
16     4   27.000
17     4   28.500
18     4   19.500
19     4   38.000
20     4   21.500
21     4   17.000
22     4   22.000
23     4   27.000
24     4   40.000

C2     N       C3
 1    24   32.750
 2    24   29.667
 3    24   25.250
 4    24   25.583

MTB >

part (b) obtain Tukey intervals for the 4 lists

list means are from above

C2     N       C3
 1    24   32.750
 2    24   29.667
 3    24   25.250
 4    24   25.583

Sqrt[36.33/24] = Sqrt[1.514] = 1.230
q(.95; 4, 69) = 3.73
product is 4.589

so confidence intervals are diff of means +/- 4.589
[arithmetic omitted]
Only list1 - list3 (7.50) and list1 - list4 (7.17) exceed 4.589; the
next largest difference, list2 - list3, is 4.42.
so list1 is sig diff from list3 and list4

part (c) from anova in part a (ems subcommand) we obtained

Source      Variance
           component
1 C1           26.04

----------------------------------------------------------------------
problem 5. REPEATED MEASURES "WALK-THROUGH"

a. NWK shoes example

(i) Figure 29.8 (ver4) illustrates the volume of sales for each test
market at each time period, broken down by ad-type. Plots such as
these are useful for examining trends in the data and for evaluating
whether your proposed model is appropriate. In this case, we can see
trends over time that are similar for each site, suggesting that time
period is important. We also get a sense that sales volume tends to be
higher for ad-type 1. The plots indicate the absence of a site*time
interaction, the term that is confounded with random error and that the
model in NWK assumes is zero.
One way to replicate the NWK plots is to copy the data into new columns and delete the rows corresponding to ad-type 2 before carrying out the lplot command. Then copy the original data into two other columns and delete rows corresponding to ad-type 1 to construct the second plot. [this is one strategy; another is to employ the "USE" subcommand with copy to just copy add1 or add2 data] (or perhaps easiest to do it by hand with graph paper etc) MTB > read '/usr/class/ed257/shoes.dat' c1-c4 30 ROWS READ ROW C1 C2 C3 C4 1 958 1 1 1 2 1005 1 1 2 3 351 1 1 3 4 549 1 1 4 . . . MTB > copy c1-c4 c11-c14 MTB > delete 6:10 16:20 26:30 c11-c14 MTB > lplot c11 c13 c14 1200+ - B C11 - - B A - A 2 900+ - - E - E E - D 600+ - D D - - C - C C 300+ - --------+---------+---------+---------+---------+--------C13 1.20 1.60 2.00 2.40 2.80 MTB > copy c1-c4 c21-c24 MTB > delete 1:5 11:15 21:25 c21-c24 MTB > lplot c21 c23 c24 1000+ - C C21 - C A - - A C 750+ - D A - - D D - 500+ - E - E - E - B 250+ B - B --------+---------+---------+---------+---------+--------C23 1.20 1.60 2.00 2.40 2.80 (ii) The Minitab commands follow. Note that we have to tell Minitab that sites are nested within ad-type and that ad-type is crossed with time. MTB > name c1='sales' c2='ad' c3='time' c4='site' MTB > anova sales = site(ad) ad|time; SUBC> random site; SUBC> restrict; SUBC> ems; SUBC> mean ad|time. 
Factor     Type Levels Values
site(ad) random      5     1     2     3     4     5
ad        fixed      2     1     2
time      fixed      3     1     2     3

Analysis of Variance for sales

Source      DF         SS         MS        F      P
site(ad)     8    1833681     229210   640.31  0.000
ad           1     168151     168151     0.73  0.417
time         2      67073      33537    93.69  0.000
ad*time      2        391        196     0.55  0.589
Error       16       5727        358
Total       29    2075023

Source        Variance  Error  Expected Mean Square
             component   term  (using restricted model)
1 site(ad)     76284.0     5   (5) + 3(1)
2 ad                       1   (5) + 3(1) + 15Q[2]
3 time                     5   (5) + 10Q[3]
4 ad*time                  5   (5) + 5Q[4]
5 Error          358.0         (5)

MEANS

ad      N     sales
 1     15    739.40
 2     15    589.67

time    N     sales
 1     10    648.40
 2     10    728.80
 3     10    616.40

ad  time    N     sales
 1    1     5    718.60
 1    2     5    804.20
 1    3     5    695.40
 2    1     5    578.20
 2    2     5    653.40
 2    3     5    537.40

(iii) The test statistic for ad type, the between-subjects factor, is
.73. This is a rounded-off version of 168151/229210, or MSA/MSS(A).
NWK Table 29.9, the expected mean squares table, shows that E(MSS(A))
contains all of the terms in E(MSA) except for the part due to the
variability among levels of A, which is what we want to isolate.

The degrees of freedom for this test are (a-1) and a(n-1). With two
levels of ad-type, a-1=1. We have 5 replications per cell, so
a(n-1)=2*4=8. Therefore the degrees of freedom are 1 and 8.

Note: In addition to recalculating the test statistic to verify the
error term used by Minitab, you can look at the EMS table, which shows
the error term used for each test.

(iv) The test statistic for time, the within-subjects factor, is 93.69.
This is a rounded-off version of 33537/358, or MSB/MSW. The test
statistic for the interaction between ad and time is .55, which is
equal to 196/358, or MSAB/MSW.

If we assume perfect compound symmetry, the degrees of freedom for the
test statistic for time are those that correspond to the numerator and
denominator of the test statistic, as usual: b-1 and a(n-1)(b-1), or 2
and 16. To apply the Box correction, multiply both of these values by
1/(b-1), or .5; this is the conservative value for 'epsilon'.
With this correction, the degrees of freedom are 1 and 8.
Alternatively, we could use the Greenhouse-Geisser epsilon factor,
which can be computed (by me) as .5953 (from SAS or BMDP). If we
multiply the numerator and denominator df by this factor, the resulting
df are approximately 1.2 and 9.5, which we can round down to 1 and 9.
Here it makes little difference whether we use the Box or the G-G
correction factor.

----------------------------
b. Winer dial example

(i) The following data were created in a text file and then read into
Minitab:

MTB > print c1-c4

 ROW  accuracy  method  shape  subj
   1         0       1      1     1
   2         3       1      1     2
   3         4       1      1     3
   4         4       2      1     1
   5         5       2      1     2
   6         7       2      1     3
   7         0       1      2     1
   8         1       1      2     2
   9         3       1      2     3
  10         2       2      2     1
  11         4       2      2     2
  12         5       2      2     3
  13         5       1      3     1
  14         5       1      3     2
  15         6       1      3     3
  16         7       2      3     1
  17         6       2      3     2
  18         8       2      3     3
  19         3       1      4     1
  20         4       1      4     2
  21         2       1      4     3
  22         8       2      4     1
  23         6       2      4     2
  24         9       2      4     3

You could also put the data into Minitab directly, using the following
commands:

MTB > set c1
DATA> 0 3 4 4 5 7 0 1 3 2 4 5 5 5 6 7 6 8 3 4 2 8 6 9
DATA> end
MTB > set c2
DATA> 4(1:2)3
DATA> end
MTB > set c3
DATA> (1:4)6
DATA> end
MTB > set c4
DATA> 8(1:3)
DATA> end

(ii)
MTB > anova accuracy=subj(method) method|shape;
SUBC> random subj;
SUBC> restrict;
SUBC> ems;
SUBC> mean method|shape.
Factor         Type Levels Values
subj(method) random      3     1     2     3
method        fixed      2     1     2
shape         fixed      4     1     2     3     4

Analysis of Variance for accuracy

Source          DF        SS        MS       F      P
subj(method)     4    17.167     4.292    3.47  0.042
method           1    51.042    51.042   11.89  0.026
shape            3    47.458    15.819   12.80  0.000
method*shape     3     7.458     2.486    2.01  0.166
Error           12    14.833     1.236
Total           23   137.958

Source            Variance  Error  Expected Mean Square
                 component   term  (using restricted model)
1 subj(method)      0.7639     5   (5) + 4(1)
2 method                       1   (5) + 4(1) + 12Q[2]
3 shape                        5   (5) + 6Q[3]
4 method*shape                 5   (5) + 3Q[4]
5 Error             1.2361         (5)

MEANS

method    N  accuracy
 1       12    3.0000
 2       12    5.9167

shape     N  accuracy
 1        6    3.8333
 2        6    2.5000
 3        6    6.1667
 4        6    5.3333

method  shape    N  accuracy
 1        1      3    2.3333
 1        2      3    1.3333
 1        3      3    5.3333
 1        4      3    3.0000
 2        1      3    5.3333
 2        2      3    3.6667
 2        3      3    7.0000
 2        4      3    7.6667

(iii) Test for between-subjects factor, method:
Ho: alpha(j)=0 for all j
Ha: not all alpha(j)=0
Test statistic is MSA/MSS(A) = 51.042/4.292 = 11.89
F critical value is F(.983,1,4) = 15.50
(note alpha split among 3 tests: .05/3 = .017, hence the .983)
Do not reject Ho.
note: no correction (Box etc) for between-subjects factors

Test for within-subjects factor, shape:
Ho: beta(k)=0 for all k
Ha: not all beta(k)=0
Test statistic is MSB/MSW = 15.819/1.236 = 12.80
Using the Box lower bound, we divide the numerator and denominator df,
3 and 12, by 3 to get df of 1 and 4.
F critical value is then F(.983,1,4) = 15.50
So with Box do not reject Ho.

Note: The F critical value when no correction is applied,
F(.983,3,12), is 5.07; if we assumed compound symmetry, we would reject
Ho for the within-subjects factor.

From BMDP output the Greenhouse-Geisser correction is approx 1/2,
rather than 1/3 from Box. Multiplying 3 and 12 by 1/2 gives 1.5 and 6,
so that would yield 2,6 or (rounding down) 1,6 df for the test
statistic of 12.8.

get a quick critical value for F 1,6

MTB > invcdf .983;
SUBC> f 1 6.

Inverse Cumulative Distribution Function

F distribution with 1 d.f. in numerator and 6 d.f. in denominator

 P( X <= x)          x
     0.9830    10.7034

so with the more accurate/less-severe correction we can reject Ho
(12.80 > 10.70).
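The df bookkeeping behind the Box and Greenhouse-Geisser corrections, for both the shoes and dial examples, can be sketched in a few lines of Python. This is only a rough check under stated assumptions: the function names (box_lower_bound, corrected_df) are made up for illustration, and the G-G epsilon values (.5953 for shoes, roughly 1/2 for dial) are the ones quoted from SAS/BMDP, not computed here.

```python
def box_lower_bound(b):
    """Most conservative epsilon for a repeated factor with b levels."""
    return 1.0 / (b - 1)


def corrected_df(epsilon, a, n, b):
    """Scale the within-subjects F-test df by epsilon.

    Argument names are hypothetical: a = levels of the between-subjects
    factor, n = subjects (or sites) per level, b = levels of the
    repeated factor. Uncorrected df are (b - 1) and a*(n - 1)*(b - 1).
    """
    num_df = b - 1
    den_df = a * (n - 1) * (b - 1)
    return epsilon * num_df, epsilon * den_df


# Shoes: a=2 ad types, n=5 sites per ad type, b=3 time periods
shoes_box = corrected_df(box_lower_bound(3), a=2, n=5, b=3)
shoes_gg = corrected_df(0.5953, a=2, n=5, b=3)   # epsilon as quoted from SAS

# Dial: a=2 methods, n=3 subjects per method, b=4 shapes
dial_box = corrected_df(box_lower_bound(4), a=2, n=3, b=4)
dial_gg = corrected_df(0.5, a=2, n=3, b=4)       # "approx 1/2" from BMDP

for label, (num, den) in [("shoes Box", shoes_box), ("shoes G-G", shoes_gg),
                          ("dial Box", dial_box), ("dial G-G", dial_gg)]:
    print(f"{label}: df = {num:.2f}, {den:.2f}")
```

The fractional df are then rounded down before looking up a critical value, for instance with invcdf and the f subcommand as done above for the dial G-G case.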
Test for method*shape interaction: Ho: alpha-beta(jk)=0 for all j,k Ha: not all alpha-beta(jk)=0 Test statistic is MSAB/MSW = 2.486/1.236 = 2.01 F critical value is F(.983,3,12) = 5.07 Do not reject Ho. ================================================== c. SAS implementation data shoes; input ad m1 m2 m3; datalines; 1 958.00 1047.00 933.00 1 1005.00 1122.00 986.00 1 351.00 436.00 339.00 1 549.00 632.00 512.00 1 730.00 784.00 707.00 2 780.00 897.00 718.00 2 229.00 275.00 202.00 2 883.00 964.00 817.00 2 624.00 695.00 599.00 2 375.00 436.00 351.00 ; proc glm data=shoes; class ad; model m1--m3 = ad /nouni; repeated Time 3 (1 2 3) /summary printe; run; The SAS System 13:21 Monday, February 12, 2001 1 The GLM Procedure Class Level Information Class Levels Values ad 2 1 2 Number of observations 10 The SAS System 13:21 Monday, February 12, 2001 2 The GLM Procedure Repeated Measures Analysis of Variance Repeated Measures Level Information Dependent Variable m1 m2 m3 Level of Time 1 2 3 [deleted unecessary SAS info] The SAS System 13:21 Monday, February 12, 2001 4 The GLM Procedure Repeated Measures Analysis of Variance Tests of Hypotheses for Between Subjects Effects Source DF Type III SS Mean Square F Value Pr > F ad 1 168150.533 168150.533 0.73 0.4166 Error 8 1833680.933 229210.117 The SAS System 13:21 Monday, February 12, 2001 5 The GLM Procedure Repeated Measures Analysis of Variance Univariate Tests of Hypotheses for Within Subject Effects Adj Pr > F Source DF Type III SS Mean Square F Value Pr > F G - G H - F Time 2 67073.06667 33536.53333 93.69 <.0001 <.0001 <.0001 Time*ad 2 391.46667 195.73333 0.55 0.5892 0.5075 0.5387 Error(Time) 16 5727.46667 357.96667 Greenhouse-Geisser Epsilon 0.5953 Huynh-Feldt Epsilon 0.7274 The SAS System 13:21 Monday, February 12, 2001 6 The GLM Procedure Repeated Measures Analysis of Variance Analysis of Variance of Contrast Variables Time_N represents the contrast between the nth level of Time and the last Contrast Variable: Time_1 Source 
DF Type III SS Mean Square F Value Pr > F Mean 1 10240.00000 10240.00000 38.22 0.0003 ad 1 774.40000 774.40000 2.89 0.1275 Error 8 2143.60000 267.95000 Contrast Variable: Time_2 Source DF Type III SS Mean Square F Value Pr > F Mean 1 126337.6000 126337.6000 99.26 <.0001 ad 1 129.6000 129.6000 0.10 0.7578 Error 8 10182.8000 1272.8500 ---------------------- problem 6. Bock Vocabulary data MTB > print c1-c4 ROW C1 C2 C3 C4 1 1.75 2.60 3.76 3.68 2 0.90 2.47 2.44 3.43 3 0.80 0.93 0.40 2.27 4 2.42 4.15 4.56 4.21 5 -1.31 -1.31 -0.66 -2.22 6 -1.56 1.67 0.18 2.33 7 1.09 1.50 0.52 2.33 8 -1.92 1.03 0.50 3.04 9 -1.61 0.29 0.73 3.24 10 2.47 3.64 2.87 5.38 11 -0.95 0.41 0.21 1.82 12 1.66 2.74 2.40 2.17 13 2.07 4.92 4.46 4.71 14 3.30 6.10 7.19 7.46 15 2.75 2.53 4.28 5.93 16 2.25 3.38 5.79 4.40 17 2.08 1.74 4.12 3.62 18 0.14 0.01 1.48 2.78 19 0.13 3.19 0.60 3.14 20 2.19 2.65 3.27 2.73 21 -0.64 -1.31 -0.37 4.09 22 2.02 3.45 5.32 6.01 23 2.05 1.80 3.91 2.49 24 1.48 0.47 3.63 3.88 25 1.97 2.54 3.26 5.62 26 1.35 4.63 3.54 5.24 27 -0.56 -0.36 1.14 1.34 28 0.26 0.08 1.17 2.15 29 1.22 1.41 4.66 2.62 30 -1.43 0.80 -0.03 1.04 31 -1.17 1.66 2.11 1.42 32 1.68 1.71 4.07 3.30 33 -0.47 0.93 1.30 0.76 34 2.18 6.42 4.64 4.82 35 4.21 7.08 6.00 5.65 36 8.26 9.55 10.24 10.58 37 1.24 4.90 2.42 2.54 38 5.94 6.56 9.36 7.72 39 0.87 3.36 2.58 1.73 40 -0.09 2.29 3.08 3.35 41 3.24 4.78 3.52 4.84 42 1.03 2.10 3.88 2.81 43 3.58 4.67 3.83 5.19 44 1.41 1.75 3.70 3.77 45 -0.65 -0.11 2.40 3.53 46 1.52 3.04 2.74 2.63 47 0.57 2.71 1.90 2.41 48 2.18 2.96 4.78 3.34 49 1.10 2.65 1.72 2.96 50 0.15 2.69 2.69 3.50 51 -1.27 1.26 0.71 2.68 52 2.81 5.19 6.33 5.93 53 2.62 3.54 4.86 5.80 54 0.11 2.25 1.56 3.92 55 0.61 1.14 1.35 0.53 56 -2.19 -0.42 1.54 1.16 57 1.55 2.42 1.11 2.18 58 -0.04 0.50 2.60 2.61 59 3.10 2.00 3.92 3.91 60 -0.29 2.62 1.60 1.86 61 2.28 3.39 4.91 3.89 62 2.57 5.78 5.12 4.98 63 -2.19 0.71 1.56 2.31 64 -0.04 2.44 1.79 2.64 **Here's some descriptives on the data MTB > describe c1-c4 N MEAN MEDIAN TRMEAN 
    STDEV    SEMEAN
C1        64     1.137     1.230     1.046     1.889     0.236
C2        64     2.542     2.455     2.457     2.085     0.261
C3        64     2.988     2.715     2.854     2.169     0.271
C4        64     3.472     3.270     3.403     1.925     0.241

             MIN       MAX        Q1        Q3
C1        -2.190     8.260    -0.078     2.187
C2        -1.310     9.550     1.058     3.435
C3        -0.660    10.240     1.383     4.240
C4        -2.220    10.580     2.330     4.633

ANALYSIS OF VARIANCE   C10

SOURCE        DF          SS          MS
C11            3     194.338      64.779     [occasions]
C12           63     873.603      13.867
ERROR        189     154.942       0.820
TOTAL        255    1222.883

**Now implement orthogonal polynomial decomposition of the 3 df
occasions factor into linear, quadratic, cubic. The orthogonal decomp
matrix for this is

         L    Q    C
    1   -3    1   -1
    1   -1   -1    3
    1    1   -1   -3
    1    3    1    1

here's the arithmetic in mathematica (occasion means from the describe
output above)

In[1]:= means = {1.137, 2.542, 2.988, 3.472}
Out[1]= {1.137,2.542,2.988,3.472}

In[2]:= (* Matrix orthogonal contrasts
    1   -3    1   -1
    1   -1   -1    3
    1    1   -1   -3
    1    3    1    1
*)

In[3]:= (* linear *)
In[9]:= linSS =64*(means.{-3,-1,1,3})^2/Apply[Plus,{-3,-1,1,3}^2]
Out[9]= 177.656

In[7]:= (* quadratic *)
In[10]:= quadSS =64*(means.{1,-1,-1,1})^2/Apply[Plus,{1,-1,-1,1}^2]
Out[10]= 13.5719

In[11]:= (* cubic *)
In[12]:= cubSS =64*(means.{-1,3,-3,1})^2/Apply[Plus,{-1,3,-3,1}^2]
Out[12]= 3.18083

In[13]:= linSS + quadSS + cubSS
Out[13]= 194.408

In[14]:= (* pretty close to SSocc 194.338; the gap is rounding in the means *)

------------
problem 7. student question data, repeated measures design

Since these data came in the form of a row for each score (observation)
rather than a row for each student (subject), which is the SAS form, I
did this in Minitab. If someone bothered to reformat the data to a row
for each student, send it to me and I'll append it to the data file for
this example.

A quick look at these data (outcomes 0,1,2,3) shows we are unlikely to
see much "treatment" effect, as the intervention wasn't very strong.

MTB > dotplot c1;
SUBC> by c2.

[Dotplot: score by treatment, one panel each for treatment 1 and
treatment 2, score axis from 0.00 to 3.00]

which you can also see by the t-test (run here as a one-way anova; with
two groups F is the square of t)

MTB > anova score = treatment

ANOVA: score versus treatment

Factor     Type Levels Values
treatment fixed      2 1, 2

Analysis of Variance for score

Source      DF       SS      MS     F      P
treatment    1   0.0556  0.0556  0.06  0.800
Error       70  60.3889  0.8627
Total       71  60.4444

The full repeated measures analysis has the new feature of a 2x2
factorial structure on the repeated measures (see bloodflow class ex)
plus one between-subjects factor.

MTB > anova score = student(treatment) treatment|factrela|mccr

ANOVA: score versus treatment, factrela, mccr, student

Factor              Type Levels Values
student(treatment) fixed      9 1, 2, 3, 4, 5, 6, 7, 8, 9
treatment          fixed      2 1, 2
factrela           fixed      2 1, 2
mccr               fixed      2 1, 2

Analysis of Variance for score

Source                   DF       SS       MS      F      P
student(treatment)       16  23.3889   1.4618   3.44  0.000
treatment                 1   0.0556   0.0556   0.13  0.719
factrela                  1   0.2222   0.2222   0.52  0.473
mccr                      1   1.3889   1.3889   3.27  0.077
treatment*factrela        1   1.3889   1.3889   3.27  0.077
treatment*mccr            1   0.8889   0.8889   2.09  0.155
factrela*mccr             1  12.5000  12.5000  29.43  0.000
treatment*factrela*mccr   1   0.2222   0.2222   0.52  0.473
Error                    48  20.3889   0.4248
Total                    71  60.4444

just to break up this anova table

Analysis of Variance for score

Source                   DF       SS       MS      F      P
student(treatment)       16  23.3889   1.4618                 between-subs
treatment                 1   0.0556   0.0556
----------------------
factrela                  1   0.2222   0.2222   0.52  0.473
mccr                      1   1.3889   1.3889   3.27  0.077   within subs
treatment*factrela        1   1.3889   1.3889   3.27  0.077
treatment*mccr            1   0.8889   0.8889   2.09  0.155
factrela*mccr             1  12.5000  12.5000  29.43  0.000
treatment*factrela*mccr   1   0.2222   0.2222   0.52  0.473
Error                    48  20.3889   0.4248
Total                    71  60.4444

Now you can break this anova up (like Winer dial ex), ignoring the
subjects nested within treatment

MTB > anova score = treatment|factrela|mccr

ANOVA: score versus treatment, factrela, mccr

Factor     Type Levels Values
treatment fixed      2 1, 2
factrela  fixed      2 1, 2
mccr      fixed      2 1, 2

Analysis of Variance for score

Source                   DF       SS       MS      F      P
treatment                 1   0.0556   0.0556   0.08  0.777
factrela                  1   0.2222   0.2222   0.32  0.571
mccr                      1   1.3889   1.3889   2.03  0.159
treatment*factrela        1   1.3889   1.3889   2.03  0.159
treatment*mccr            1   0.8889   0.8889   1.30  0.259
factrela*mccr             1  12.5000  12.5000  18.27  0.000
treatment*factrela*mccr   1   0.2222   0.2222   0.32  0.571
Error                    64  43.7778   0.6840
Total                    71  60.4444

MTB > #do within treatments to break down the error term
MTB > Copy 'score' 'treatment' 'student' 'factrela' 'mccr' c11 - c15;
SUBC> Include;
SUBC> Where "treatment = 1";
SUBC> Varnames.
Including rows where treatment = 1
36 rows excluded

MTB > Copy 'score' 'treatment' 'student' 'factrela' 'mccr' c21 - c25;
SUBC> Include;
SUBC> Where "treatment = 2";
SUBC> Varnames.
Including rows where treatment = 2
36 rows excluded

MTB > anova c11 = c13 c14|c15

ANOVA: score_1 versus student_1, factrela_1, mccr_1

Factor      Type Levels Values
student_1  fixed      9 1, 2, 3, 4, 5, 6, 7, 8, 9
factrela_1 fixed      2 1, 2
mccr_1     fixed      2 1, 2

Analysis of Variance for score_1

Source             DF       SS      MS      F      P
student_1           8  13.0000  1.6250   4.23  0.003
factrela_1          1   0.2500  0.2500   0.65  0.428
mccr_1              1   2.2500  2.2500   5.86  0.023
factrela_1*mccr_1   1   8.0278  8.0278  20.89  0.000
Error              24   9.2222  0.3843
Total              35  32.7500

S = 0.619886   R-Sq = 71.84%   R-Sq(adj) = 58.93%

MTB > anova c21 = c23 c24|c25

ANOVA: score_2 versus student_2, factrela_2, mccr_2

Factor      Type Levels Values
student_2  fixed      9 1, 2, 3, 4, 5, 6, 7, 8, 9
factrela_2 fixed      2 1, 2
mccr_2     fixed      2 1, 2

Analysis of Variance for score_2

Source             DF       SS      MS      F      P
student_2           8  10.3889  1.2986   2.79  0.025
factrela_2          1   1.3611  1.3611   2.93  0.100
mccr_2              1   0.0278  0.0278   0.06  0.809
factrela_2*mccr_2   1   4.6944  4.6944  10.09  0.004
Error              24  11.1667  0.4653
Total              35  27.6389

the student rows sum to the student(treatment) SS
(13.0000 + 10.3889 = 23.3889) and the Error rows sum to the error term
for the within subjects portion of the anova table
(9.2222 + 11.1667 = 20.3889), i.e. rep meas X subject within method

-----------------
just for reference to the other
examples with a single repeated measure, take the 2x2 factorial
structure and string it out to a 1x4

let c6 = c4*c5 + (2 -c4)*(c5-1)

MTB > anova score = student(treatment) treatment|c6

ANOVA: score versus treatment, C6, student

Factor              Type Levels Values
student(treatment) fixed      9 1, 2, 3, 4, 5, 6, 7, 8, 9
treatment          fixed      2 1, 2
C6                 fixed      4 1, 2, 3, 4

Analysis of Variance for score

Source              DF       SS      MS      F      P
student(treatment)  16  23.3889  1.4618   3.44  0.000
treatment            1   0.0556  0.0556   0.13  0.719
C6                   3  14.1111  4.7037  11.07  0.000
treatment*C6         3   2.5000  0.8333   1.96  0.132
Error               48  20.3889  0.4248
Total               71  60.4444

you can see the single rep measure factor (c6) and the c6 x treatment
interaction include the SS components for the factorial version:
C6            0.2222 + 1.3889 + 12.5000 = 14.1111
treatment*C6  1.3889 + 0.8889 +  0.2222 =  2.5000

-----------------------------------------------------------------
problem 8.

(i) So first let's take a look at the descriptive statistics for these
data (five groups, single classification).

MTB > desc c1-c5

          N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C1        5    39.40    36.00    39.40     7.92     3.54
C2        5    44.20    41.00    44.20    10.40     4.65
C3        5     52.0     62.0     52.0     26.0     11.6
C4        5    40.80    43.00    40.80    10.13     4.53
C5        5    53.20    52.00    53.20    17.48     7.82

           MIN      MAX       Q1       Q3
C1       32.00    51.00    33.00    47.50
C2       32.00    56.00    35.00    55.00
C3        10.0     75.0     27.5     71.5
C4       30.00    55.00    31.00    49.50
C5       35.00    80.00    38.00    69.00

The standard deviations range from 8 to 26, representing variances
ranging from roughly 64 to 680, a ratio of about 10 to 1 (rather
non-equal). Now, one consideration is that sample sizes are very small
(so we cannot assume these differences are highly significant), and
also, since sample sizes are equal, we don't need to worry much about
the effects of unequal variances on the one-way anova tests. We could,
to carry this further, look at the Brown-Forsythe or Welch alternatives
to standard anova (done in part ii)

ii. Standard one-way anova (unstacked data). The F-test for the null
hypothesis of equal group means across the 5 groups (against the
alternative of unequal means) has a minuscule value, .81.
Compare with critical value based on F(4,20) (F.95(4,20) = 2.87), so
we cannot reject the null hypothesis of equal group means.

MTB > aovoneway c1-c5

ANALYSIS OF VARIANCE
SOURCE     DF        SS        MS        F        p
FACTOR      4       808       202     0.81    0.536
ERROR      20      5016       251
TOTAL      24      5824

                                  INDIVIDUAL 95 PCT CI'S FOR MEAN
                                  BASED ON POOLED STDEV
LEVEL      N      MEAN     STDEV  ----------+---------+---------+------
C1         5     39.40      7.92   (-----------*-----------)
C2         5     44.20     10.40       (-----------*-----------)
C3         5     52.00     25.97             (-----------*------------)
C4         5     40.80     10.13    (-----------*-----------)
C5         5     53.20     17.48              (-----------*------------)
                                  ----------+---------+---------+------
POOLED STDEV =    15.84                     36        48        60

(iii). For comparison, we are asked to try out the non-parametric
alternative to the standard one-way anova, Kruskal-Wallis. We need to
put the data in stacked form to carry this out.

MTB > stack c1 c2 c3 c4 c5 into c6;
SUBC> subscripts c7.
MTB > kruskal-wallis c6 c7

LEVEL    NOBS    MEDIAN  AVE. RANK   Z VALUE
    1       5     36.00        9.5     -1.19
    2       5     41.00       12.3     -0.24
    3       5     62.00       17.0      1.36
    4       5     43.00       10.1     -0.99
    5       5     52.00       16.1      1.05
OVERALL    25                  13.0

H = 4.32  d.f. = 4  p = 0.366
H = 4.33  d.f. = 4  p = 0.364 (adj. for ties)

And turning to the chi-square with (5-1) degrees of freedom (see NWK
18.7 ver4,5), the critical value is 9.49 (type I error rate .05). The
test statistic H is 4.3. So we do not reject Ho, just as with the
parametric anova on raw data (or even transformed data if we had tried
to stabilize variance).

==============================================================================
END HW2 solutions