Statistics 191:  Introduction to Applied Statistics

Winter 2008   ●   M W      11:00-12:15 PM     Redwood G19

Nancy R. Zhang    nzhang atstanford    Office Hours:  W  12:15-2:15 PM, 4:30-5:30 PM Sequoia Hall 141   


C O U R S E    D E S C R I P T I O N

This course introduces statistical regression models with an applied focus.  The list of subjects includes:

We will cover the basic concepts behind these models and apply them to the analysis of data sets.  Please see the syllabus for more information.

P R E R E Q U I S I T E S

Basic probability and statistics at the level of Stat 60 and Stat 116.   Basic matrix algebra.

T A

Jun Li  (junli07 atstanford)      Office hours 2-4 PM Friday, Sequoia 229       

T E X T B O O K S

Chatterjee and Hadi, Regression Analysis by Example, 4th Edition.

Neter et al., Applied Linear Statistical Models, 5th Edition (Reference)

Weisberg, Applied Linear Regression, 2nd Edition (Reference)

L E C T U R E S (Materials will be posted here after every lecture.)

Date Materials
W 1/9 Slides
M 1/14 Slides, R examples, Heights, CalciumBloodPressure
W 1/16 Slides, R examples, Rivers
W 1/23 Slides, R examples, Rivers
M 1/28 Slides, R examples, Races
W 1/30 Slides, R examples, Salary, Personnel, Beer
M 2/4 Slides, R examples, MilesPGallon, Pearls
W 2/6 Slides, R examples, Bacteria, Manager, Brain, Plutonium
M 2/11 Midterm Review
W 2/13 Midterm (tentative), 2007 Midterm, 2007 solutions, 2008 midterm solutions.
W 2/20 Slides, R examples, Manager, European Jobs
M 2/25 Slides, R examples, European Jobs
W 2/27 Slides, R examples, Building Prices
M 3/3 Slides, R examples, Building Prices, Flu
W 3/5 Slides, R examples, Flu, Lumber
M 3/10 Slides, R examples
W 3/12 Slides, R examples

F I N A L   E X A M

 Final exam for 2008, Heart Disease, Evolution

Clarifications to exam questions will be posted here.

Problem 2.c    Show that if X and Z are conditionally (rather than marginally) independent given Y, then X and Y are still marginally independent.

Problem 1.b    "...quantify the effect of race and gender on party identification..."  Part of this problem is deciding how to quantify it.  The 3/12 lecture may help.

Problem 3.c  Use squared error on test set for prediction error.
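
As a sketch of one way to write this down: assume the fitted model gives predictions \hat{y}_i, and let T denote the held-out test set with n_test observations (this notation is introduced here for illustration and does not come from the exam). The squared prediction error on the test set could then be taken as the average squared residual over T. Whether the sum or the average is reported is not specified above; on a fixed test set both rank models identically.

```latex
\mathrm{PE}_{\text{test}} \;=\; \frac{1}{n_{\text{test}}} \sum_{i \in T} \left( y_i - \hat{y}_i \right)^2
```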

Problem 3.d  Use a model selection criterion to find the best model for linear regression on the transformed variables.
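
The clarification above does not name a criterion. Purely as examples, two standard choices for Gaussian linear regression are AIC and BIC. With n observations, p estimated coefficients, and residual sum of squares RSS (notation assumed here, not taken from the exam), they can be written, up to additive constants that do not depend on the model, as:

```latex
\mathrm{AIC} \;=\; n \log\!\left( \frac{\mathrm{RSS}}{n} \right) + 2p,
\qquad
\mathrm{BIC} \;=\; n \log\!\left( \frac{\mathrm{RSS}}{n} \right) + p \log n
```

Under either criterion a smaller value indicates a preferred model. BIC penalizes extra coefficients more heavily than AIC once \log n > 2.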

Problem 4:    There are 20 flies of each gender collected from each geographic location (continent, latitude); the data are summaries for these groups of 20 flies.   This question is very specific, and you need to construct the model as asked.  The columns of the data file are:

    females: average wing size for the 20 adults in the female group from the geographic location (on the log scale, millimeters).

    males: average wing size for the 20 adults in the male group from the geographic location (on the log scale, millimeters).

    ratio: average basal length-to-wing ratio for each group (only available for female flies), used for part (c).

    the se columns: standard error of the number in the preceding column (i.e. the se over all flies measured in that particular group).


A S S I G N M E N T S

Each assignment is due at the beginning of the lecture in which the next assignment is handed out.

Assign date (tentative)   File            Data sets                                                  Solutions
1/14                      Problem Set 1   Cigarette, Examination                                     Problem set 1 solutions
1/28                      Problem Set 2   Beer, Election, Megabyte, Rehab                            Problem set 2 solutions
2/11                      Problem Set 3   Election, EducationExpenditure, ComputerAssistedLearning   Problem set 3 solutions
2/27 (Due 3/12)           Problem Set 4   Orings, NFL, Geriatrics                                    Problem set 4 solutions

Problem Set 1: if you had points taken off for 3(c) and 6(c), these can be regraded.

R

We will be using R for most of the data analysis in this class.  R can be freely downloaded here.  If you are new to R, here is a brief introduction to the language.  The Stats 141 page has more extensive tutorials, and Elizabeth Purdom's website also has some good resources.

Here is this year's R tutorial, given by Jun Li.

G R A D I N G

Homeworks (4-5) 40%
Midterm (in class) 20%
Final (take home) 40%