Statistics 209 / HRP 239 / Education 260
Winter 2008
Understanding Statistical Models and their Social Science Applications
David Rogosa
rag AT stat DOT stanford DOT edu
Lecture: TTh 12:50-2:05, Sequoia 200
course web page at http://www-stat.stanford.edu/~rag/stat209/
To see full course materials from Spring 2007 go here
Instructor. David Rogosa, Sequoia 224, rag AT stat DOT stanford DOT edu.
Office hours T 2:30-3:15, Th 3:30-4:30.
TA, Yunting Sun, Sequoia 244. Office Hours M 4PM
Course Overview
For students who have had intermediate-level instruction in statistical methods, including multiple regression, logistic regression, and log-linear models.
At the very least, the content of the course should provide some consolidation of previous instruction in statistical methods. The goal is also to instill some introspection and critical analysis of the uses of statistical methods common in social science and medical applications.
The focus of the course is on understanding what useful information statistical modeling can provide in experimental and especially non-experimental social science settings. Presentation will emphasize the many misconceptions and misunderstandings in social science applications, such as casual (nee causal) modeling.
Quick Course Outline
Week 1. Course Introduction; properties of regression models
Week 2. Experiments vs observational studies; Neyman-Rubin-Holland formulation
Week 3. Path analysis and causal modeling, multiple regression with pictures
Week 4. Multilevel data. Contextual effects, aggregation bias, random effects models
Week 5. The many uses and forms of analysis of covariance (including regression discontinuity designs)
Week 6. Instrumental variable methods, simultaneous equations, reciprocal effects
Week 7. Compliance and experimental protocols; encouragement designs; intent to treat
Week 8. Matching and propensity score methods
Week 9. Time-1, Time-2 data in experimental and non-experimental designs, including Lord's paradox, measurement of change, repeated measures ANOVA, value-added analysis, interrupted time-series
Dead Week. Overflow and course summary; discussion of case studies
Course Readings, Files and Examples
Textbooks. Class texts on reserve at Math/CS library
The core of the course is working through David Freedman's new text, along with auxiliary texts and materials. One intent of this course is for students to read some statistical literature and actual research reports to augment the texts (on that theme Freedman's text actually includes reprints of four published empirical research papers).
Main text.
Statistical Models: Theory and Practice, David Freedman (2005). Publisher's webpage
Auxiliary texts.
Regression Analysis: A Constructive Critique, Richard A. Berk (2003). Table of contents
Jan de Leeuw, Preface to Berk's "Regression Analysis: A Constructive Critique". Some Berk chapters linked in lecture material
Data analysis and regression: A second course in statistics. Mosteller, F. and Tukey, J. W. (1977) (the green book)
Matched Sampling for Causal Effects, Donald B. Rubin. Cambridge University Press, 2006
Observational Studies, Paul R. Rosenbaum. Springer, 2nd edition (January 8, 2002)
Homework and exams. Weekly homework assignments following class content will be posted, with solutions posted the next class cycle. Homeworks are not graded.
Assessment. Two take home problem sets will be scheduled:
TH1 covering content weeks 1-4.
TH2 covering content weeks 5-8.
In-class final, scheduled by the registrar during exam week (can be taken remotely).
Course Assignments Page
Statistical computing
Most class presentation will be in R, which students are encouraged to use (with occasional SAS, Mathematica, and Matlab). Students can use whatever they are comfortable with.
We have a set of 4 computer labs to supplement lecture materials.
Lab 1. Multiple regression basics
Lab 2. Random-effects models for multilevel data. Lab 2, exposition and commands; Lab 2, Rogosa R-session
Lab 3. Instrumental variables. Lab 3, exposition and commands; Lab 3, Rogosa R-session
Lab 4. Matching and propensity scores. Lab 4, exposition and commands; Lab 4, Rogosa R-session
Current version of R is 2.6.1 (Nov 2007). For references and software: The R Project for Statistical Computing. Closest download mirror is Berkeley.
The CRAN Task View: Statistics for the Social Sciences provides an overview of relevant R packages. A good R primer on various applications (repeated measures and lots else): Notes on the use of R for psychology experiments and questionnaires, Jonathan Baron, Yuelin Li. Another version
An additional R resource that is efficient if you are experienced with another statistical package is the presentation An Introduction to R, John Verzani. For categorical data, especially if you've had a course using Agresti, the lengthy guide by Laura Thompson has more than you want to know. For introductory materials on R, see the Stat141 site, especially the Course Files and Examples page.
Case Studies in Cause and Effect (to be updated)
The Freedman text includes a series of older social science publications as case studies.
1. Is TV bad or is it bad parenting? Attention Deficit Disorder and TV
and should the question be answered with LISREL (structural equation models)?
2004 version : Pediatrics. 2004;113:708-713. Christakis DA, Zimmerman FJ, DiGiuseppe DL, McCarty CA. Early television exposure and subsequent attentional problems in children. Publication summary press release news report audio NPR interview interview transcript and publication
2006 reversal? (with LISREL) Stevens T and Mulsow M. There is no meaningful relationship between television exposure and symptoms of attention-deficit hyperactivity disorder. Pediatrics. March 2006; 117(3):665-672.
News Reports: TV may not cause kids' attention disorders Researchers say TV is not to blame for ADHD TV may not cause kids' attention disorders: study.
Good general commentary in Slate Feb '06 The Benefits of Bozo Proof that TV doesn't harm kids.
Auxiliary notes. This research example raises an important theme of this course: similarities (often indistinguishability) between social science and medical research. Is the TV and ADHD work child development research or medical research? (The point being that the division is often unclear or not useful.)
further aside: ADHD medication: Prescribing of hyperactivity drugs is out of control FDA panel
As with most important issues, definitive wisdom is provided by South Park via Cartman: here, episode 404 (4/19/2000), episode summary script
2. Money and Happiness
Would You Be Happier If You Were Richer? A Focusing Illusion
Science 30 June 2006: Vol. 312, no. 5782, pp. 1908-1910.
Daniel Kahneman, Alan B. Krueger, David Schkade, Norbert Schwarz, Arthur A. Stone
Or is it age-dependent? The Midlife Happiness Crisis Is Well-being U-Shaped over the Life Cycle?
David G. Blanchflower, Andrew Oswald
NBER Working Paper No. 12935
February 2007
3. Does Television Cause Autism?
and should instrumental variables (IV) provide the answer? Is Rain the magic IV?
A cautionary comment, including by Nobel laureate Jim Heckman
Citizen and blogger comments
Autism Bulletin, Arianna Huffington
Economists' Full paper: Does Television Cause Autism?
4. Drug Use and Depression
Ecstasy causes depression in pigs Child anxiety link to ecstasy use
British Medical Journal publication Anja C Huizink, Robert F Ferdinand, Jan van der Ende, and Frank C Verhulst
Symptoms of anxiety and depression in childhood and use of MDMA: prospective, population based study
BMJ, Feb 2006;