HOMEWORK 5
Ed257   D Rogosa
Due May 4, 2005

1. A criminologist wants to estimate the proportion of U.S. citizens who live in a home in which firearms are available. The 1991 General Social Survey asked respondents, “Do you have in your home any guns or revolvers?” Of the respondents, 393 answered “yes” and 583 answered “no.” Construct a 90% confidence interval for the true proportion of “yes.” Construct an exact CI using SAS (or your own computation) and compare it with the standard large-sample normal approximation (which should do pretty well for this example).
-------------------------------------------------------
2. The Donner Party, a more competitive environment than graduate school

From the Statistical Sleuth:
20.1.1 Survival in the Donner Party: An Observational Study

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold. File donner.dat contains the ages and sexes of the adult (over 15 years) survivors and non-survivors of the party. These data were used by an anthropologist to study the theory that females are better able to withstand harsh conditions than are males. (Data from D. K. Grayson, 1990, "Donner Party Deaths: A Demographic Assessment," Journal of Anthropological Research 46: 223-42.)

a. Use logistic regression to predict survival using age and gender as predictors. Comment on the results.
   Display probability and odds of survival as a function of age and gender. Construct an index plot of the deviance residuals following NWK Fig 14.7 (ver4). How do the fits for survival probabilities compare to those from separate fits for males and females?
b. For any given age, were the odds of survival greater for women than for men? Give a point estimate and a 95% confidence interval.
c. From the logistic fit, compare the odds of survival of a woman 50 years old with that for a woman 20 years old. Give a point estimate and a 95% confidence interval for the odds ratio.
d. The full model in part a contains no interaction term between age and gender (i.e. the "effect" of gender is the same at all levels of age). Fit a more complex model including an age x gender interaction and conduct a statistical test for that term using a drop-in-deviance test statistic (e.g. as in the course example with the disease data from NWK Ch 14).
------------------------------------------------------------------
3. Poisson Regression
For the Miller Lumber Poisson Regression example in NWK Section 14.11 the data are in miller.dat: Y in C1; X1-X5 in C2-C6. Fit a Poisson regression model using 2 predictors: competitor distance, store distance. Compare this model with a model using all 5 predictors.
----------------------------------------------------------
4. Logistic Regression: Best Subsets Variable Selection
For the disease data example (i.e. the data in NWK Table 14.3) use the following SAS code [adjust path to data] to carry out a best subsets variable selection (via the selection=score option on the model statement).

    data diseasedat;
      infile 'E:\disease.dat';
      input age ses1 ses2 sector disease;
    run;
    proc logistic data=diseasedat descending;
      model disease = age ses1 ses2 sector / selection=score;
    run;

Use a drop-in-deviance test statistic to compare the best (largest score Chi-square) 2-predictor model with the best 3-predictor model.
-------------------------------------------------
5. Deviance and computer programmers.
a. Use the NWK definition of a deviance residual (eq. 14.64 in ver4, 14.83 in ver5) to compute the 25 deviance residuals for the programmer data. Recreate the index plot for deviance residuals in NWK Fig 14.7 (ver4), or in the logit class handout.
b. Use the equation for total model deviance (eq. 14.65 in ver4, 14.82 in ver5) to verify that the model deviance equals -2*LogLikelihood.
----------------------------------
end of HW5
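
Note on problem 5: the computation can be sketched by hand along the following lines. This is only an illustration, not the SAS route and not a worked solution. It assumes the cited NWK equations are the usual deviance residual for a binary outcome, dev_i = sign(Y_i - p_i) * sqrt(-2[Y_i ln p_i + (1 - Y_i) ln(1 - p_i)]), where p_i is the fitted probability. The function name and the four toy (Y, p) pairs are made up for the sketch and are not the programmer data.

```python
import math

def deviance_residual(y, p):
    """Deviance residual for one binary outcome y (0 or 1) with fitted probability p."""
    # Log-likelihood contribution of this case: y*ln(p) + (1-y)*ln(1-p).
    loglik = math.log(p) if y == 1 else math.log(1.0 - p)
    # Sign of (y - p): positive when y = 1, negative when y = 0 (since 0 < p < 1).
    sign = 1.0 if y == 1 else -1.0
    return sign * math.sqrt(-2.0 * loglik)

# Toy values for illustration only; these are NOT the programmer data.
y_toy = [1, 0, 1, 0]
p_toy = [0.8, 0.3, 0.4, 0.6]

residuals = [deviance_residual(y, p) for y, p in zip(y_toy, p_toy)]

# Model deviance as the sum of squared deviance residuals.
model_deviance = sum(r * r for r in residuals)

# Log-likelihood of the same fitted probabilities, for the part b check.
log_likelihood = sum(
    math.log(p) if y == 1 else math.log(1.0 - p)
    for y, p in zip(y_toy, p_toy)
)

print(residuals)
print(model_deviance, -2.0 * log_likelihood)
```

With the real data, y_toy and p_toy would be replaced by the 25 observed outcomes and the fitted probabilities from the logistic fit. The index plot is then the residuals plotted against case number 1 through 25.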