Stat209/Ed260 D Rogosa 2/1/08 Lab 2 Rogosa R Session Best to read through at least the formal lab write up with exposition and commands, before working through this version. For this version of the Lab 2 I focused on reading in the two data files, constructing a combined data file that was amenable to lme, and the carrying out the mixed-model analysis shown in the 1/31/08 class handout. Another resource/treatment for those who have done alot of R is John Fox's "Script for web appendix on mixed models" doing fancier graphics etc http://socserv.mcmaster.ca/jfox/Books/Companion/mixed-models.txt note I'm still using 2.5.1 from the fall (but nlme is part of this distribution as it is with current 2.6.1) R version 2.5.1 (2007-06-27) Copyright (C) 2007 The R Foundation for Statistical Computing ISBN 3-900051-07-0 R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. Natural language support but running in an English locale R is a collaborative project with many contributors. Type 'contributors()' for more information and 'citation()' on how to cite R or R packages in publications. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for an HTML browser interface to help. Type 'q()' to quit R. > library(nlme) # The data sets we need are part of the nlme package #(along with the random effects functions--lmList. lme etc, #so you could look at the nlme package documentation from CRAN #or the equivalent functionality from another program) > data(MathAchieve) # this loads the individual data file ------------------------------------------------------ for completeness I insert here the documentation MathAchieve {nlme} R Documentation Mathematics achievement scores Description The MathAchieve data frame has 7185 rows and 6 columns. Format This data frame contains the following columns: School an ordered factor identifying the school that the student attends Minority a factor with levels No Yes indicating if the student is a member of a minority racial group. Sex a factor with levels Male Female SES a numeric vector of socio-economic status. MathAch a numeric vector of mathematics achievement scores. MEANSES a numeric vector of the mean SES for the school. Details Each row in this data frame contains the data for one student. Examples data(MathAchieve) summary(MathAchieve) ------------------------------------------------------ # here's the first 10 cases (all from the same school) > MathAchieve[1:10,] School Minority Sex SES MathAch MEANSES 1 1224 No Female -1.528 5.876 -0.428 2 1224 No Female -0.588 19.708 -0.428 3 1224 No Male -0.528 20.349 -0.428 4 1224 No Male -0.668 8.781 -0.428 5 1224 No Male -0.158 17.898 -0.428 6 1224 No Male 0.022 4.583 -0.428 7 1224 No Female -0.618 -2.832 -0.428 8 1224 No Male -0.998 0.523 -0.428 9 1224 No Female -0.888 1.527 -0.428 10 1224 No Male -0.458 21.521 -0.428 > dim(MathAchieve) [1] 7185 6 Public Catholic 3642 3543 # read in school-level file > data(MathAchSchool) ----------------------------- here's the documentation for that file MathAchSchool {nlme} R Documentation School demographic data for MathAchieve Description The MathAchSchool data frame has 160 rows and 7 columns. Format This data frame contains the following columns: School a factor giving the school on which the measurement is made. Size a numeric vector giving the number of students in the school Sector a factor with levels Public Catholic PRACAD a numeric vector giving the percentage of students on the academic track DISCLIM a numeric vector measuring the discrimination climate HIMINTY a factor with levels 0 1 MEANSES a numeric vector giving the mean SES score. Details These variables give the school-level demographic data to accompany the MathAchieve data. Examples data(MathAchSchool) ---------------------------------------------- #here's the first 10 cases > MathAchSchool[1:10,] School Size Sector PRACAD DISCLIM HIMINTY MEANSES 1224 1224 842 Public 0.35 1.597 0 -0.428 1288 1288 1855 Public 0.27 0.174 0 0.128 1296 1296 1719 Public 0.32 -0.137 1 -0.420 1308 1308 716 Catholic 0.96 -0.622 0 0.534 1317 1317 455 Catholic 0.95 -1.694 1 0.351 1358 1358 1430 Public 0.25 1.535 0 -0.014 1374 1374 2400 Public 0.50 2.016 0 -0.007 1433 1433 899 Catholic 0.96 -0.321 0 0.718 1436 1436 185 Catholic 1.00 -1.141 0 0.569 1461 1461 1672 Public 0.78 2.096 0 0.683 > dim(MathAchSchool) [1] 160 7 > table(MathAchSchool$Sector) Public Catholic 90 70 ##################################################### Next series of activities are data management, cleaning up these two data sets, and creating one combined (individ and school attributes) data set ################################################ # get correct mean school ses, using individ data > attach(MathAchieve) > mses = tapply(SES, School, mean) > detach(MathAchieve) > mses 8367 8854 4458 5762 6990 5815 0.025571429 -0.756750000 -1.049458333 -1.193945946 -0.489698113 -0.680000000 7172 4868 7341 1358 4383 2305 -0.289818182 0.361117647 -0.164078431 -0.019666667 0.073200000 -0.628000000 and so forth # now start a new data frame and add variable to it > Bryk = as.data.frame(MathAchieve[, c("School", "SES", "MathAch" )]) > names(Bryk) = c("school", "ses", "mathach") > Bryk[1:10,] school ses mathach 1 1224 -1.528 5.876 2 1224 -0.588 19.708 3 1224 -0.528 20.349 4 1224 -0.668 8.781 5 1224 -0.158 17.898 6 1224 0.022 4.583 7 1224 -0.618 -2.832 8 1224 -0.998 0.523 9 1224 -0.888 1.527 10 1224 -0.458 21.521 > #individ level file > Bryk$meanses = mses[as.character(Bryk$school)] > Bryk[1:10,] school ses mathach meanses 1 1224 -1.528 5.876 -0.434383 2 1224 -0.588 19.708 -0.434383 3 1224 -0.528 20.349 -0.434383 4 1224 -0.668 8.781 -0.434383 5 1224 -0.158 17.898 -0.434383 6 1224 0.022 4.583 -0.434383 7 1224 -0.618 -2.832 -0.434383 8 1224 -0.998 0.523 -0.434383 9 1224 -0.888 1.527 -0.434383 10 1224 -0.458 21.521 -0.434383 > #added group-level var to individ-level file > Bryk$cses = Bryk$ses - Bryk$meanses > Bryk[1:10,] school ses mathach meanses cses 1 1224 -1.528 5.876 -0.434383 -1.09361702 2 1224 -0.588 19.708 -0.434383 -0.15361702 3 1224 -0.528 20.349 -0.434383 -0.09361702 4 1224 -0.668 8.781 -0.434383 -0.23361702 5 1224 -0.158 17.898 -0.434383 0.27638298 6 1224 0.022 4.583 -0.434383 0.45638298 7 1224 -0.618 -2.832 -0.434383 -0.18361702 8 1224 -0.998 0.523 -0.434383 -0.56361702 9 1224 -0.888 1.527 -0.434383 -0.45361702 10 1224 -0.458 21.521 -0.434383 -0.02361702 > # added individ ses minus school meanses > sector = MathAchSchool$Sector > names(sector) = row.names(MathAchSchool) > Bryk$sector = sector[as.character(Bryk$school)] > Bryk[1:10,] school ses mathach meanses cses sector 1 1224 -1.528 5.876 -0.434383 -1.09361702 Public 2 1224 -0.588 19.708 -0.434383 -0.15361702 Public 3 1224 -0.528 20.349 -0.434383 -0.09361702 Public 4 1224 -0.668 8.781 -0.434383 -0.23361702 Public 5 1224 -0.158 17.898 -0.434383 0.27638298 Public 6 1224 0.022 4.583 -0.434383 0.45638298 Public 7 1224 -0.618 -2.832 -0.434383 -0.18361702 Public 8 1224 -0.998 0.523 -0.434383 -0.56361702 Public 9 1224 -0.888 1.527 -0.434383 -0.45361702 Public 10 1224 -0.458 21.521 -0.434383 -0.02361702 Public > #so we added a sector (pub/Cath) indicator to the individ level file > dim(Bryk) [1] 7185 6 # now our data set is complete ############################################################################# > remove(sector) > attach(Bryk) # now do the within-school regressions, math on SES, via OLS, for each sector in turn # and show coeficients. Using lmList # would be better to use cses as the indiv predictor, as then the intercept # of each regression would be the mean mathach # cathRegC = lmList(mathach ~ cses| school, subset = sector == "Catholic", data = Bryk) # pubRegC = lmList(mathach ~ cses| school, subset = sector == "Public", data = Bryk) > table(MathAchSchool$Sector) Public Catholic 90 70 > cathReg = lmList(mathach ~ ses| school, subset = sector == "Catholic", data = Bryk) > coef(cathReg) (Intercept) ses 7172 8.355037 0.99448053 4868 11.845609 1.28647122 2305 10.646595 -0.78211116 8800 9.157380 2.56812536 5192 10.511666 1.60349497 4523 8.479699 2.38078920 6816 13.822772 1.35271694 2277 7.762289 -2.01502640 8009 13.281974 1.55687204 4530 10.039029 1.64742597 9021 13.114915 2.52415849 4511 13.413589 0.04251038 6578 13.228887 2.39054400 9347 12.970266 2.68599393 3705 10.059828 1.15849738 3533 10.367122 -0.31177009 4253 9.263557 -0.39954421 7342 11.619820 1.01245787 3499 12.830059 0.99238234 7364 14.195326 0.25949653 5650 14.258257 0.68061888 2658 12.243090 2.62990138 9508 14.120280 3.95379067 4292 12.786274 -0.16060800 8857 14.882063 0.80222326 1317 12.737763 1.27391282 2629 14.938378 0.22234915 4223 14.856361 2.48658514 1462 9.940754 -0.82880861 4931 13.458207 0.91184501 5667 12.301934 3.52296571 5720 14.201984 2.46630669 3498 16.476910 -0.13108527 3688 14.033848 1.53672220 8165 15.894442 1.80224301 9104 15.721400 1.49398346 8150 14.909166 -0.18571088 4042 13.634585 1.69362361 6074 14.202264 1.52908771 1906 14.885481 2.14550546 3992 14.448670 0.53787533 4173 12.616958 3.36567036 5761 12.141945 3.10801055 7635 14.470598 2.44847402 2458 13.312180 2.95669443 3610 14.999882 2.95584910 3838 15.975920 0.59789922 9359 15.565737 -0.83347896 2208 14.288928 2.63664069 1477 14.032087 1.23060592 3039 15.936130 2.95566676 1308 16.188959 0.12602422 1433 18.398885 1.85429439 1436 17.210643 1.60056175 2526 17.000856 0.15950396 2755 16.164849 0.56049993 2990 17.580729 1.32453982 3020 14.014701 1.65367684 3427 19.790291 -0.48816617 5404 14.413261 1.21423377 5619 13.206326 5.25753341 6366 15.186330 1.51752409 6469 17.134818 1.75528869 7011 13.972581 5.07465093 7332 13.904533 2.46320360 7688 18.400688 0.11634493 8193 16.644665 2.33521174 8628 16.700530 1.23139333 9198 17.807900 2.61054903 9586 13.825077 1.67208118 > length(cathReg) [1] 70 > pubReg = lmList(mathach ~ ses| school, subset = sector == "Public", data = Bryk) > coef(pubReg) (Intercept) ses 8367 4.546383 0.25037481 8854 5.707002 1.93884461 4458 6.999212 1.13183718 5762 3.114085 -1.01409923 6990 6.440875 0.94769026 5815 9.323601 3.01800184 7341 10.073717 1.70370049 1358 11.305904 5.06800874 4383 11.013290 6.18019316 3088 9.992462 1.79134272 8775 9.807161 1.00145622 7890 7.998220 -0.65596791 6144 9.757160 2.77027083 6443 9.213132 -0.74334668 6808 9.505033 2.27610417 2818 13.568305 2.80240778 9340 12.493449 3.30949739 5783 12.611506 3.11405659 3013 12.780651 3.83979913 7101 11.840860 1.29545472 2639 6.007785 -0.63010463 3377 8.749553 -0.74685357 1296 8.093779 1.07595915 4350 11.188243 4.37191824 9397 10.027332 2.44644371 2655 16.012744 5.22703806 9292 10.730525 0.75867588 8983 11.399304 1.38594141 8188 12.129701 4.39759465 4410 13.206920 2.76019657 8707 12.357826 3.39153230 1499 9.353870 3.63473417 8477 13.269838 3.81215624 1288 13.114937 3.25544868 6291 12.184191 3.98086905 1224 10.805132 2.50858170 3967 12.780449 3.31106817 6415 12.521243 3.53007531 9550 10.882731 3.89193777 6464 7.770259 1.00349366 5937 16.099212 1.03961575 7919 13.024135 3.98937031 3716 12.835454 5.86378696 1909 13.715130 2.85479025 2651 10.775417 4.89905964 2467 11.181929 3.13713485 1374 9.777194 3.85432284 6600 11.955740 4.70428938 3881 11.644142 2.39070579 2995 9.978231 1.43231309 5838 13.399580 1.85305195 9158 10.097960 3.86121158 8946 10.969203 1.69048215 7232 12.993356 5.00160032 2917 8.885629 1.13585350 6170 15.633290 4.81178434 2030 11.614521 1.41198038 8357 14.668854 2.67578156 8531 12.174522 3.31822792 4420 14.537637 2.95866415 3999 11.268793 3.56697524 4325 13.375410 2.75604958 6484 13.024537 0.60567730 6897 13.846070 3.58048769 7734 13.392353 6.03522862 8175 11.974832 1.61237121 8874 13.450957 4.09630389 9225 13.936080 2.88589215 5640 13.836071 3.82774230 6089 15.424127 1.69245494 2768 11.075178 3.51228023 5819 11.780690 1.97252104 6397 13.393425 2.75900524 1461 12.597370 6.26649691 1637 9.222729 3.11680644 1942 18.049937 0.08938349 1946 12.893909 3.58583321 2336 15.676514 1.90497358 2626 13.662437 4.09967961 2771 13.292965 4.26818800 3152 13.123116 2.76824957 3332 13.036284 2.03095286 3351 10.664585 2.45503986 3657 11.946440 3.73590722 4642 14.221260 3.27238611 7276 12.384219 3.77336399 7345 11.198507 4.21192377 7697 14.911853 3.13621972 8202 11.394780 3.70590198 8627 10.687731 1.86955959 > length(pubReg) [1] 90 #now create boxplots of level and slope side-by-side for each sector > pubcoef= coef(pubReg) > cathcoef= coef(cathReg) > par( mfrow = c(1,2)) # opens a graphics window, creates the figure shown in lecture > boxplot(cathcoef[,1], pubcoef[,1], main = 'Intercepts', names = c('Catholic', 'Public')) > # these are the actual intercepts, better to use cses as predictor in lmlist, then these are emean ach > boxplot(cathcoef[,2], pubcoef[,2], main = 'Slopes', names = c('Catholic', 'Public')) # order the sector factor > Bryk$sector = factor(Bryk$sector, levels = c('Public', 'Catholic')) > Bryk$sector[1:10] [1] Public Public Public Public Public Public Public Public Public Public Levels: Public Catholic #fit the random-eccets models, obtaing coefficients etc presented in class-handout # on 1/31 and in the various references using the HSB data > bryklme = lme(mathach ~ meanses*cses + sector*cses, random = ~ cses|school, data = Bryk) > summary(bryklme) Linear mixed-effects model fit by REML Data: Bryk AIC BIC logLik 46523.66 46592.45 -23251.83 Random effects: Formula: ~cses | school Structure: General positive-definite, Log-Cholesky parametrization StdDev Corr (Intercept) 1.5426150 (Intr) cses 0.3182015 0.391 Residual 6.0597955 Fixed effects: mathach ~ meanses * cses + sector * cses Value Std.Error DF t-value p-value (Intercept) 12.127931 0.1992919 7022 60.85510 0e+00 meanses 5.332875 0.3691684 157 14.44564 0e+00 cses 2.945041 0.1556005 7022 18.92694 0e+00 sectorCatholic 1.226579 0.3062733 157 4.00485 1e-04 meanses:cses 1.039230 0.2988971 7022 3.47688 5e-04 cses:sectorCatholic -1.642674 0.2397800 7022 -6.85076 0e+00 Correlation: (Intr) meanss cses sctrCt mnss:c meanses 0.256 cses 0.075 0.019 sectorCatholic -0.699 -0.356 -0.053 meanses:cses 0.019 0.074 0.293 -0.026 cses:sectorCatholic -0.052 -0.027 -0.696 0.077 -0.351 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -3.1592608 -0.7231893 0.0170471 0.7544510 2.9582205 Number of Observations: 7185 Number of Groups: 160 > ########################################## end main output Lab2, see further comments in lab text we will revisit lme etc week 9 for growth curve models, longitudinal data