Assignment #6   Ed257   Spring 2005   D Rogosa
Due May 24, 2005
-------------------
NOTE: Problems 8, 9, 10 are more like didactic extensions of Lecture
than typical HW problems. You may find it equally useful to work
through these with the solutions (at your convenience).

All references are to "little" Agresti. There are corresponding
sections in the larger Agresti text (with different examples).
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1. A study on educational aspirations of high school students
(S. Crysdale, Int. J. Compar. Sociol., 16: 19-36 (1975)) measured
aspirations using the scale (some high school, high school graduate,
some college, college graduate). For students whose family income was
low, the counts in these categories were (9, 44, 13, 10); when family
income was middle, the counts were (11, 52, 23, 22); when family income
was high, the counts were (9, 41, 12, 27).

Part I (basics)
a. Test independence of educational aspirations and family income
   using X^2 or G^2. Interpret.
b. Calculate the adjusted residuals. Construct a table following
   Agresti Table 7.3 (i.e. for the 3x4 table, three entries per cell:
   {count, fit under independence, adjusted residual}). Do the
   adjusted residuals suggest any association pattern?

Part II
Following the class example "Linear Association Models for Ordinal
Data", construct a linear association term and include this term in
the log-linear model. Is this model an improvement over the
independence model (and the saturated model)? Construct a table
following Agresti Table 7.4 (i.e. for the 3x4 table, three entries per
cell: {count, fit under linear association, adjusted residual}).
Demonstrate that 2x2 tables of fits formed by adjacent cells (e.g.
Agresti Fig 7.1) have identical odds ratios.
That is, the linear association model behaves like straight-line
regression for measured variables: the increase in the fit from
increasing the predictor by one unit is the same at all values of the
predictor.
----------------------------------
2. Most colleges and universities have annual campaigns in which they
ask former graduates to contribute money. For the 1986 to 1987
Providence College fund-raising campaign, statistics were recorded for
the number of people contacted and the number of donors, categorized by
their class year. Some of these data are summarized in the table below
(data from Providence College Fund Year Report 1986-7).

Class                 1961   1966   1971   1976   1981
Contributed            196    266    194    276    333
Did not Contribute     123    226    241    322    568
---------------------------------
Construct a null hypothesis that the probabilities of contributing are
the same for all 5 of these classes. Calculate a table of expected
counts under the assumption that this null hypothesis is true.
Construct a test statistic for this null hypothesis and carry out a
test of the null hypothesis using Type I error rate .01.

Carry out a test for linear trend in the proportion contributing,
following the course example "Trend in 2xC tables, Alcohol and Infant
Malformation". Compare this with the results for the test of
independence above.
----------------------------------------------------------------------
3. For the aspirin and myocardial infarction 2x2 table from the aspirin
handout and Agresti section 2.2.2, compute by plug-in (rather than SAS)
a point estimate and a 95% interval estimate for the odds ratio for an
MI.
---------------------------------------------------------------------
4. Chicago Crime data.
The cell counts that follow are said to be for the number of crimes
(perhaps daily or hourly?) committed in the Chicago area. The variables
are (1) type of neighborhood (suburb vs. center of city),
(2) socioeconomic status of neighborhood (high SES vs. low SES), and
(3) year the crimes were committed (1976 vs. 1986).
Here's the data:

                      Suburbs   Center of City
1976    High SES         5            10
        Low SES         15           120
1986    High SES         5            10
        Low SES         15            90

Call city-suburb 'C', SES 'S', year 'T'.
a. From these sample data, what is the marginal CxS table? What is the
   odds ratio for this table? What are the partial C-S odds ratios?
b. Are there any marginal tables that exhibit Simpson's paradox?
c. Suppose that, from fitting a log-linear model to these data, the
   likelihood ratio fit statistics told you that the best model was
   (CS, CT, ST). Which sets of variables appear to be conditionally
   independent? Mutually independent?
----------------------------------------------------
5. In an article about crime in the United States, Newsweek magazine
(Jan. 10, 1994) quoted FBI statistics stating that of all blacks slain
in 1992, 94% were slain by blacks, and of all whites slain in 1992, 83%
were slain by whites. Let Y denote race of victim and X denote race of
murderer.
a. Which conditional distribution do these statistics refer to, Y given
   X, or X given Y?
b. Calculate and interpret the odds ratio between X and Y.
c. Given that a murderer was white, can you estimate the probability
   that the victim was white? What additional information would you
   need to do this? (Hint: Use Bayes' Theorem.)
---------------------------------------------------------
6. A criminologist wants to estimate the proportion of U.S. citizens
who live in a home in which firearms are available. The 1991 General
Social Survey asked respondents, “Do you have in your home any guns or
revolvers?” Of the respondents, 393 answered “yes” and 583 answered
“no.” Construct a 90% confidence interval for the true proportion of
“yes.” Construct an exact CI using SAS (or your own computation) and
compare it with the standard large-sample normal approximation (which
should do pretty well for this example).
-------------------------------------------------------
7. Death Penalty example from lecture
Radelet (1981) studied effects of racial characteristics on whether
individuals convicted of homicide receive the death penalty. The
variables are “death penalty verdict,” having categories (yes, no), and
“race of defendant” and “race of victim,” each having categories
(white, black). The 326 subjects were defendants in homicide
indictments in 20 Florida counties during 1976-1977, and the data form
a 2 x 2 x 2 contingency table. Data are available in the course example
(deathpen). Agresti sec 3.1 has a larger similar data set.
a. Run the set of log-linear models for the death penalty data using
   SAS Proc Genmod, and identify the best-fitting model.
b. For the best-fitting model, obtain the fitted odds ratios for the
   conditional and marginal 2x2 tables (e.g. the form of entries in
   Agresti Table 6.5). Interpret.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
NOTE: Problems 8, 9, 10 are more didactic extensions of Lecture than
typical HW problems. You may find it equally useful to work through
these with the solutions (at your convenience).
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
8. Migraine by 0,1 data anova etc
The Migraine Course Example discussed in lecture has the form of an
experimental 2x2 factorial design (factors gender, treatment) with a
binary outcome (same, better). If the outcome were continuous/measured
(e.g. amount of relief) we would think we know how to analyze these
data from anova-based methods of Winter qtr (or even ed161). Here are
two options for anova-style analyses.
a. Consider a 2x2 design with the outcome for each cell being the
   proportion of subjects getting "better" (cf. NWK p. 773, "response
   is a proportion", for details on the arcsine transformation etc.).
b. Reconstruct the individual 0,1 data file for this 2x2 design, which
   has 106 rows. The experiment has cell sizes

           Active   Placebo
      F      27       25
      M      28       26

   which is only slightly unbalanced.
   Use glm in Minitab to obtain an anova table and interpret.
---------------------------------------------------------------
9. Revisit Alcohol, Cigarette, and Marijuana Use Example
a. From the output for the best-fitting model (AC, AM, CM), verify that
   the coefficients for the interaction terms are the natural logs of
   the odds ratios for the fitted values (e.g. shown in Agresti p. 153,
   Table 6.5).
b. A C M 0,1 indiv data, phi coeff and correlations
   The question raised in lecture (and not fully answered): what can we
   'learn' from this study? The Alcohol, Cigarette, and Marijuana Use
   study is simply an observational (correlational) study with 3
   intercorrelated variables, all of which happen to be binary. One way
   to approach the study is to obtain the 3x3 correlation matrix for
   the three variables over the 2276 respondents. These correlation
   coefficients are phi coefficients for the corresponding 2x2 tables.
   Compute the 3 pairwise correlation coefficients, either by
   reconstructing the individual 0,1 data or by computing
   phi-coefficients from the counts. (Formulas for the phi-coefficient
   in terms of table counts or chi-square were given in lecture; they
   are also in basic texts such as Hopkins & Glass.)
   -- Compare these correlation coefficients with the corresponding
      marginal associations (odds ratios) obtained from the
      best-fitting log-linear model.
   -- Compare the partial correlation coefficients, AC.M, AM.C, CM.A,
      with the conditional associations (odds ratios) obtained from the
      best-fitting log-linear model.
-------------------------------------------------------------------
10. odds ratio and correlations
One way to try to interpret odds ratios and log-odds is by linking them
with the more familiar idea of correlation coefficients. For binary
data the phi-coefficient is the usual Pearson correlation coefficient.
Try the following exercise: start with a 2x2 table with 100 counts in
each cell (i.e. no association).
Then construct a series of 2x2 tables by altering the diagonals, adding
or subtracting 10, 20, 30, 40, 50 counts and adjusting the
off-diagonals to keep the marginal counts constant.

Example
   Base 2x2 table        Altered table (adding 20 to diag)
     100   100              120    80
     100   100               80   120

Use a set of these tables to see how the odds ratio corresponds to a
value of the correlation coefficient.
-------------------------------------------------------------
end of HW6
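
For the two tables shown in the Problem 10 example (the base table and
the table with 20 added to the diagonal), the odds ratio and the
phi-coefficient can be checked with a short computation. This is only a
minimal sketch in Python, not part of the course materials: the function
names odds_ratio and phi are hypothetical, and the phi formula used is
the standard one in terms of the four cell counts,
(ad - bc) / sqrt(product of the four marginal totals).

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]] (hypothetical helper)."""
    return (a * d) / (b * c)


def phi(a, b, c, d):
    """Phi-coefficient (Pearson correlation of two 0,1 variables)."""
    numerator = a * d - b * c
    # Product of the two row totals and the two column totals.
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# The two tables from the Problem 10 example, as (a, b, c, d).
tables = {
    "base": (100, 100, 100, 100),
    "diag + 20": (120, 80, 80, 120),
}

for label, cells in tables.items():
    ratio = odds_ratio(*cells)
    print(label, "odds ratio:", ratio,
          "log odds ratio:", round(math.log(ratio), 4),
          "phi:", phi(*cells))
```

The remaining tables in the series (adding or subtracting 10, 30, 40,
50) can be entered in the same dictionary to trace out the relation
between the odds ratio and the correlation.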