Stat209/Ed260 May 1, 2007
Take Home Problems #1
Due in class Tuesday May 8 (submit a hard copy; no Word files) or, if you are away, by fax to the Statistics Department (address to Rogosa):
Department of Statistics, Sequoia Hall
390 Serra Mall
Stanford University
Stanford, CA 94305-4065
Phone: (650) 723-2620
Fax: (650) 725-8977
-----------------------------
Usual Honor Code procedures: you may use any inanimate resources, but no collaboration. This work is done under Stanford's Honor Code.

Solutions for these problems are to be submitted in hard-copy form. Because these problems are untimed, take some care with presentation, clarity, and format. It is especially important to give full and clear answers to the questions rather than submit unannotated computer output, although relevant output should be included.
------------------------------
Problem 1 Coleman data example revisited.

In week 1 we looked at a multiple regression example from the Mosteller and Tukey text that uses school means for 20 schools (see the class web page links and the week 1 handout). The text's analysis, a multiple regression of the school means, showed a negative weight for momed.

To refresh: the file coleman.dat contains data from a random sample of 20 schools (from the East) from the 1966 Coleman Report. The outcome measure vach, in C7, is the verbal mean test score for all sixth graders in the school. The predictor variables are:
in C2, staff salaries per pupil;
in C3, percent white-collar fathers for the sixth graders;
in C4, an SES composite measure (deviation) for the sixth graders;
in C5, mean teacher's verbal test score;
in C6, sixth-grade mean mother's educational level (1 unit = 2 school years).

a) Take those data, 20 observations at the school level, and fit a straight-line regression for outcome vach and predictor momed. Comment on the sign and magnitude of the momed coefficient. Also give the correlation between momed and vach.
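For part a, and for the adjusted-variables plot requested in part b, a minimal Python/NumPy sketch may help. It assumes the school-level columns have already been read into arrays (the exact layout of coleman.dat is not reproduced here), and the helper names `straight_line_fit` and `added_variable` are hypothetical, not part of any course software.

```python
import numpy as np


def straight_line_fit(x, y):
    """Least-squares line y = a + b*x; returns (a, b, r)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.corrcoef(x, y)[0, 1]
    # The slope is the correlation rescaled by the ratio of standard deviations.
    b = r * y.std(ddof=1) / x.std(ddof=1)
    # The fitted line passes through the point of means.
    a = y.mean() - b * x.mean()
    return a, b, r


def added_variable(y, x, z):
    """Adjusted-variables plot of y on x, adjusting for z.

    Returns (rx, ry, slope): the residuals of x and of y after regressing each
    on an intercept and z, and the least-squares slope through the origin of
    ry on rx. That slope equals the coefficient of x in the fit y ~ 1 + x + z.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    Z = np.column_stack([np.ones(len(y)), np.asarray(z, dtype=float)])
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    slope = (rx @ ry) / (rx @ rx)
    return rx, ry, slope
```

For example, `straight_line_fit(momed, vach)` would give the part a slope and correlation, and `added_variable(vach, momed, other)` the coordinates to plot for part b, where `other` stands for whichever second predictor you identify. The residual-on-residual construction is used because its slope reproduces the two-predictor coefficient exactly (the Frisch-Waugh-Lovell result), so the plot displays that coefficient directly.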
b) Identify a second predictor that, when used along with momed, reverses the sign of the regression coefficient for momed. Show an adjusted-variables (partial regression) scatterplot representing the coefficient for momed in this two-predictor fit to vach.

I created an artificial data file at
http://www-stat.stanford.edu/~rag/stat209/colemanindiv.dat
This data set has 1000 rows and values for 3 variables: vach, momed, ses. It is constructed to be an individual-level data file corresponding to the 20-school Coleman data: each of the 20 schools has 50 individual entries, producing 1000 rows. The first 50 rows correspond to school 1 in the week 1 coleman data file, the next 50 rows to school 2, and so on.

c) Obtain the total (ignoring schools) and within-pooled (relative standing) 3x3 correlation matrices. Compare these with the between-school correlation matrix (derived either from this artificial data or from the original data example).
d) Compare the coefficients for momed in a fit to vach (simple straight-line regression) for both the total and the within-pooled regressions with the between-school regression result in part a.
e) Repeat part d using two predictors, momed and ses, for fitting vach, and comment.
f) Obtain the 20 within-school slopes of vach on momed. Is there a systematic relation between those slopes and the mean teacher verbal score for the school?
============================================================
Problem 2 Encouragement Design: worth its salt?
[Note: I had an extended exercise based on these research reports but decided it was too much, so I pulled problem 2. I'll leave the descriptive stub here for interest, and we may use the material in another form.]
The in-the-news item for week 4 was an encouragement design on the effects of reduced salt intake:
British Medical Journal, April 2007, "Long term effects of dietary sodium reduction on cardiovascular disease outcomes"
http://www.bmj.com/cgi/content/short/bmj.39147.604896.55v1
This question focuses on the description of the original TOHP I study on pages 1-2 of the 2007 online posting. The original publication on TOHP I was in 1993:
"Feasibility and efficacy of sodium reduction in the Trials of Hypertension Prevention, phase I." Trials of Hypertension Prevention Collaborative Research Group. Hypertension 1993; 22: 502-512.
http://hyper.ahajournals.org.laneproxy.stanford.edu/cgi/reprint/22/4/502.pdf
See page 503 for a description of the trials, Table 3 (p. 508) for the outcome data on sodium intake, and Table 5 (p. 509) for the blood pressure outcomes (also Table 1 of the 2007 publication).
======================================================
Problem 3 Causal Models of Publishing Productivity

Homework 3 problem 3 considered one of the path analysis models from "Causal Models of Publishing Productivity in Psychology", Rogers & Maranto, J. Applied Psychology, 1989, 74(4), 636-649. Direct link to the paper:
http://content.apa.org/journals/apl/74/4/636.pdf
The path analysis conducted by the authors on a sample of 86 men and 76 women is shown on p. 101 of Freedman's text and on page 647 of the publication; that page is also available at
http://www-stat.stanford.edu/~rag/stat209/pathpage647.pdf
Note: the variable abbreviations are described on pp. 642-643.
To obtain the observed sample correlations from Table 7, add each entry above the diagonal to the corresponding entry below it: the entries above the diagonal are fitted correlations and those below are observed minus fitted, so their sum is the observed correlation.
Consider the equation depicted for PUBS, where PUBS = number of publications in the first 6 years after the Ph.D.
(The largest coefficient in this equation is for SEX, in the direction of male.)
Determine, from the correlation information given in Table 7 (article pdf or pathpage647.pdf), the squared multiple correlation and the path coefficient for the disturbance term (not shown) for this equation. Is the coefficient for SEX statistically significant?
Taking the path analysis results seriously, what would be the most effective way for an aspiring tenure candidate to raise their publication level?
  a gender change, if female?
  going to a better graduate school (GPQ measure)?
  doing better as an undergraduate (ABILITY measure)?
Discuss from an "as if by experiment" interpretation, and consider direct and indirect effects.
=========================================================
END TH1
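For Problem 3, the quantities asked for can be computed directly from a correlation matrix. The Python/NumPy sketch below is one way to do it, assuming the PUBS equation is an ordinary least-squares regression on standardized variables fitted to the pooled sample. The function names are hypothetical, the table layout is the one described for Table 7 (fitted correlations above the diagonal, observed minus fitted below), and which variables enter the PUBS equation must be read from the path diagram. The sketch rebuilds the observed correlations, then computes the standardized path coefficients, R^2 = r_xy' R_xx^{-1} r_xy, the disturbance path sqrt(1 - R^2), and t statistics using SE(beta_j) = sqrt((1 - R^2) [R_xx^{-1}]_jj / (n - k - 1)).

```python
import numpy as np


def observed_from_table(table):
    """Observed correlations from a Table 7-style layout (assumed format):
    fitted correlations above the diagonal, observed minus fitted below it.
    Diagonal entries of the table are ignored and set to 1."""
    T = np.asarray(table, dtype=float)
    k = T.shape[0]
    R = np.eye(k)
    for i in range(k):
        for j in range(i):
            # T[j, i] (above the diagonal) is fitted; T[i, j] (below) is the residual.
            R[i, j] = R[j, i] = T[j, i] + T[i, j]
    return R


def equation_fit(R, outcome, predictors, n):
    """One path-model equation fitted by OLS to standardized variables.

    R: observed correlation matrix; outcome: index of the dependent variable;
    predictors: list of indices of its predictors; n: sample size.
    Returns (beta, r2, disturbance_path, t).
    """
    R = np.asarray(R, dtype=float)
    Rxx = R[np.ix_(predictors, predictors)]
    rxy = R[predictors, outcome]
    beta = np.linalg.solve(Rxx, rxy)          # standardized path coefficients
    r2 = float(rxy @ beta)                    # R^2 = r_xy' Rxx^{-1} r_xy
    disturbance = np.sqrt(1.0 - r2)           # path from the disturbance term
    k = len(predictors)
    # SE(beta_j) = sqrt((1 - R^2) * [Rxx^{-1}]_jj / (n - k - 1))
    se = np.sqrt((1.0 - r2) * np.diag(np.linalg.inv(Rxx)) / (n - k - 1))
    return beta, r2, disturbance, beta / se
```

With the pooled sample, n = 86 + 76 = 162; each t statistic would be compared with a t distribution on n - k - 1 degrees of freedom, where k is the number of predictors in the PUBS equation. Working from correlations alone is possible here because, for standardized variables, the normal equations involve only R_xx and r_xy.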