Solutions Assignment #6
Ed257 Spring 2005   D Rogosa   May 23, 2005

PROBLEM 1

Minitab is not a bad way to start. Set up the data as

Row  C1  C2  C3
  1   1   1   9
  2   1   2  44
  3   1   3  13
  4   1   4  10
  5   2   1  11
  6   2   2  52
  7   2   3  23
  8   2   4  22
  9   3   1   9
 10   3   2  41
 11   3   3  12
 12   3   4  27

Use the following command to obtain the chi-square test for independence and what Minitab calls standardized residuals.

MTB > Table C1 C2;
SUBC> Frequencies C3;
SUBC> ChiSquare 3.

Tabulated Statistics: C1, C2

Rows: C1   Columns: C2

           1        2        3        4      All

1          9       44       13       10       76
        8.07    38.14    13.36    16.42    76.00
        0.33     0.95    -0.10    -1.59       --

2         11       52       23       22      108
       11.47    54.20    18.99    23.34   108.00
       -0.14    -0.30     0.92    -0.28       --

3          9       41       12       27       89
        9.45    44.66    15.65    19.23    89.00
       -0.15    -0.55    -0.92     1.77       --

All       29      137       48       59      273
       29.00   137.00    48.00    59.00   273.00
          --       --       --       --       --

Chi-Square = 8.871, DF = 6, P-Value = 0.181

Cell Contents -- Count
                 Exp Freq
                 St Resid

So we cannot reject that income and aspiration are independent.

What is Minitab giving as standardized residuals? Check using the 3,4 cell:

MTB > let k121 = (27 - 19.23)/sqrt(19.23)
MTB > print k121

Data Display
K121    1.77187

So Minitab is not giving Agresti eq 2.4.4 p.31; it is giving (observed - expected)/sqrt(expected). These standardized residuals will be somewhat smaller in magnitude than the adjusted residuals, which are pretty close to N(0,1).

MTB > # check against Agresti eq 2.4.4 p.31

Try the 1,1 cell:

MTB > let k1 = (9 - 8.07)/sqrt(8.07*(1 - 76/273)*(1 - 29/273))
MTB > print k1

Data Display
K1    0.407644

Try the 3,4 cell:

MTB > let k12 = (27 - 19.23)/sqrt(19.23*(1 - 89/273)*(1 - 59/273))
MTB > print k12

Data Display
K12    2.43769

So this cell has a big residual compared to N(0,1).

Perhaps better to do it in SAS.
options linesize=80 pagesize = 60; data aspire; input inc schl count @@; cards; 1 1 9 1 2 44 1 3 13 1 4 10 2 1 11 2 2 52 2 3 23 2 4 22 3 1 9 3 2 41 3 3 12 3 4 27 ; proc freq; weight count; tables inc*schl /expected chisq cellchi2; run; The FREQ Procedure Table of inc by schl inc schl Frequency | Expected | Cell Chi-Square| Percent | Row Pct | Col Pct | 1| 2| 3| 4| Total ---------------+--------+--------+--------+--------+ 1 | 9 | 44 | 13 | 10 | 76 | 8.0733 | 38.139 | 13.363 | 16.425 | | 0.1064 | 0.9006 | 0.0098 | 2.5132 | | 3.30 | 16.12 | 4.76 | 3.66 | 27.84 | 11.84 | 57.89 | 17.11 | 13.16 | | 31.03 | 32.12 | 27.08 | 16.95 | ---------------+--------+--------+--------+--------+ 2 | 11 | 52 | 23 | 22 | 108 | 11.473 | 54.198 | 18.989 | 23.341 | | 0.0195 | 0.0891 | 0.8472 | 0.077 | | 4.03 | 19.05 | 8.42 | 8.06 | 39.56 | 10.19 | 48.15 | 21.30 | 20.37 | | 37.93 | 37.96 | 47.92 | 37.29 | ---------------+--------+--------+--------+--------+ 3 | 9 | 41 | 12 | 27 | 89 | 9.4542 | 44.663 | 15.648 | 19.234 | | 0.0218 | 0.3004 | 0.8506 | 3.1352 | | 3.30 | 15.02 | 4.40 | 9.89 | 32.60 | 10.11 | 46.07 | 13.48 | 30.34 | | 31.03 | 29.93 | 25.00 | 45.76 | ---------------+--------+--------+--------+--------+ Total 29 137 48 59 273 10.62 50.18 17.58 21.61 100.00 Statistics for Table of inc by schl Statistic DF Value Prob ------------------------------------------------------ Chi-Square 6 8.8709 0.1810 Likelihood Ratio Chi-Square 6 8.9165 0.1783 Mantel-Haenszel Chi-Square 1 4.7489 0.0293 Phi Coefficient 0.1803 Contingency Coefficient 0.1774 Cramer's V 0.1275 Sample Size = 273 biggest contributions to chi-square are 1,4 and 3,4 cells Low income students aspire to college grad less often than indep would indicate whereas high income students aspire more than independence. Part II Basically this problem asks us to reproduce the analyses in the Course Example "Linear Association Models..." taken from Agresti Ch 7. 
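As a cross-check on the residual calculations used in both parts of Problem 1, here is a minimal Python sketch (not part of the original Minitab/SAS runs) that recomputes the expected counts, the Pearson residuals that Minitab labels "St Resid", and the adjusted residuals of Agresti eq 2.4.4. The function name independence_residuals is made up for this illustration; only the standard library is assumed.

```python
import math

# Observed counts from Problem 1: rows = income (1..3), columns = aspiration (1..4).
observed = [
    [9, 44, 13, 10],
    [11, 52, 23, 22],
    [9, 41, 12, 27],
]


def independence_residuals(table):
    """Return expected counts, Pearson residuals, adjusted residuals, and Pearson chi-square."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    expected, pearson, adjusted = [], [], []
    for i, row in enumerate(table):
        e_row, p_row, a_row = [], [], []
        for j, obs in enumerate(row):
            e = row_tot[i] * col_tot[j] / n
            # Pearson residual: (obs - exp)/sqrt(exp); this is what Minitab prints.
            p = (obs - e) / math.sqrt(e)
            # Adjusted residual: divide by sqrt((1 - row share)(1 - column share)).
            a = p / math.sqrt((1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            e_row.append(e)
            p_row.append(p)
            a_row.append(a)
        expected.append(e_row)
        pearson.append(p_row)
        adjusted.append(a_row)
    chi2 = sum(p * p for p_row in pearson for p in p_row)
    return expected, pearson, adjusted, chi2


expected, pearson, adjusted, chi2 = independence_residuals(observed)
print("chi-square:", round(chi2, 4))
print("3,4 cell Pearson residual: ", round(pearson[2][3], 4))
print("3,4 cell adjusted residual:", round(adjusted[2][3], 4))
```

The adjusted residuals from this sketch should line up with the StReschi column of the GENMOD independence fit; small differences from the Minitab hand checks come from Minitab's use of rounded expected counts.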
Above we used Minitab to obtain the chi-square test for independence and what Minitab calls standardized residuals.

----------------------------- the following SAS commands ----------------------------

options nodate pageno=1 linesize=80 pagesize=72;
data aspiration;
input income edasp Count @@;
assoc = income*edasp;
cards;
1 1 9   1 2 44   1 3 13   1 4 10
2 1 11  2 2 52   2 3 23   2 4 22
3 1 9   3 2 41   3 3 12   3 4 27
;
run;
proc freq data=aspiration;
weight count;
tables income*edasp /chisq expected deviation;
proc genmod;
class income edasp;
model count = income edasp / dist=poi link=log obstats;
run;
proc genmod;
class income edasp;
model count = income edasp assoc / dist=poi link=log obstats;
run;
title 'association model';
run;

-------------------------------------- Full output given below solutions --------

Take the PROC FREQ table (same as the Minitab table) and add a row from the GENMOD output for the adjusted residuals---StReschi

Table of income by edasp

income     edasp

Frequency|
Expected |
StReschi |       1|       2|       3|       4|  Total
---------+--------+--------+--------+--------+
       1 |      9 |     44 |     13 |     10 |     76
         | 8.0733 | 38.139 | 13.363 | 16.425 |
         | 0.40613| 1.5828 | -0.1286| -2.1078|
---------+--------+--------+--------+--------+
       2 |     11 |     52 |     23 |     22 |    108
         | 11.473 | 54.198 | 18.989 | 23.341 |
         | -0.1898| -0.544 | 1.3041 | -0.4031|
---------+--------+--------+--------+--------+
       3 |      9 |     41 |     12 |     27 |     89
         | 9.4542 | 44.663 | 15.648 | 19.234 |
         | -0.1903| -0.9459| -1.2374| 2.43601|
---------+--------+--------+--------+--------+
Total          29      137       48       59      273
            10.62    50.18    17.58    21.61   100.00

We do see bigger residuals in the corner cells 1,4 and 3,4, but not as pronounced as in the Agresti premarital example, wherein independence was strongly rejected.

---------- run the linear association model with the code

proc genmod;
class income edasp;
model count = income edasp assoc / dist=poi link=log obstats;
run;

after defining assoc = income*edasp.
Comparing the independence model (which we couldn't reject) with the linear association model, we do find a significant linear association (wow). The association parameter is significant (see excerpt below):

                         Standard      Wald 95%          Chi-
Parameter  DF  Estimate     Error  Confidence Limits   Square  Pr > ChiSq
assoc       1    0.1830    0.0844   0.0175    0.3484     4.70      0.0302

Equivalently, the deviance (likelihood ratio chisq) is 8.92 for the independence model and 4.11 for the linear association model. The difference of 4.81 is compared to chisq with 1 df. So the association is statistically significant even though independence can't be rejected.

---------- edit the genmod output to extract, for each of the 12 cells, the obs count, the fit from the association model, and a standardized residual. The residuals listed are the StResdev column of the GENMOD output; the adjusted (Pearson) residuals, StReschi, are very close.

income  edasp  Count  Predicted    StResdev
   1      1      9    10.447966   -0.688296
   1      2     44    41.569359    0.714553
   1      3     13    12.022837    0.3619727
   1      4     10    11.959839   -0.986569
   2      1     11    11.466497   -0.18911
   2      2     52    54.782023   -0.699736
   2      3     23    19.025559    1.2512735
   2      4     22    22.725922   -0.222924
   3      1      9    7.0855369    0.9406921
   3      2     41    40.648618    0.104745
   3      3     12    16.951607   -1.782359
   3      4     27    24.314239    1.139088

The important interpretive point is that this linear association implies that the odds ratios for all 2x2 adjacent subtables are the same, in the same sense that in straight-line regression the increase in the fit is the same from X to X+1 for all values of X. To demonstrate:

2x2 table composed of fits for cells 1,1 1,2 2,1 2,2

1 1  10.447966     1 2  41.569359
2 1  11.466497     2 2  54.782023        odds ratio = 1.20079

2x2 table composed of fits for cells 2,3 2,4 3,3 3,4

2 3  19.025559     2 4  22.725922
3 3  16.951607     3 4  24.314239        odds ratio = 1.20079

See Agresti p.182-184.
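The two claims above (a common local odds ratio, and the 1 df likelihood ratio comparison) are easy to check numerically. The following is a minimal Python sketch using only the fitted values and deviances reported in the GENMOD output; the names fitted and local_odds_ratio are invented for this illustration. For 1 df, the chi-square upper tail equals the two-sided normal tail, so math.erfc is enough and no statistics package is assumed.

```python
import math

# Fitted counts from the linear association model, keyed by (income, edasp).
fitted = {
    (1, 1): 10.447966, (1, 2): 41.569359, (1, 3): 12.022837, (1, 4): 11.959839,
    (2, 1): 11.466497, (2, 2): 54.782023, (2, 3): 19.025559, (2, 4): 22.725922,
    (3, 1): 7.0855369, (3, 2): 40.648618, (3, 3): 16.951607, (3, 4): 24.314239,
}


def local_odds_ratio(fit, i, j):
    """Odds ratio for the adjacent 2x2 subtable with upper-left cell (i, j)."""
    return fit[(i, j)] * fit[(i + 1, j + 1)] / (fit[(i, j + 1)] * fit[(i + 1, j)])


# All six adjacent 2x2 subtables of the 3x4 table.
local_ors = [local_odds_ratio(fitted, i, j) for i in (1, 2) for j in (1, 2, 3)]
print("local odds ratios:", [round(r, 5) for r in local_ors])

# With unit-spaced scores, the common local odds ratio is exp(assoc estimate).
assoc_estimate = 0.1830
print("exp(assoc):", round(math.exp(assoc_estimate), 5))

# Likelihood ratio comparison: deviance(independence) - deviance(linear association), 1 df.
lr_stat = 8.9165 - 4.1097
p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # upper tail of chi-square with 1 df
print("LR statistic:", round(lr_stat, 4), "p-value:", round(p_value, 4))
```

Every adjacent subtable gives the same odds ratio, matching exp(0.1830), which is the sense in which the association is "linear" in the row and column scores.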
-------------------- full SAS output -one FREQ and two GENMODs HW6 problem 1 1 The FREQ Procedure Table of income by edasp income edasp Frequency| Expected | Deviation| Percent | Row Pct | Col Pct | 1| 2| 3| 4| Total ---------+--------+--------+--------+--------+ 1 | 9 | 44 | 13 | 10 | 76 | 8.0733 | 38.139 | 13.363 | 16.425 | | 0.9267 | 5.8608 | -0.363 | -6.425 | | 3.30 | 16.12 | 4.76 | 3.66 | 27.84 | 11.84 | 57.89 | 17.11 | 13.16 | | 31.03 | 32.12 | 27.08 | 16.95 | ---------+--------+--------+--------+--------+ 2 | 11 | 52 | 23 | 22 | 108 | 11.473 | 54.198 | 18.989 | 23.341 | | -0.473 | -2.198 | 4.011 | -1.341 | | 4.03 | 19.05 | 8.42 | 8.06 | 39.56 | 10.19 | 48.15 | 21.30 | 20.37 | | 37.93 | 37.96 | 47.92 | 37.29 | ---------+--------+--------+--------+--------+ 3 | 9 | 41 | 12 | 27 | 89 | 9.4542 | 44.663 | 15.648 | 19.234 | | -0.454 | -3.663 | -3.648 | 7.7656 | | 3.30 | 15.02 | 4.40 | 9.89 | 32.60 | 10.11 | 46.07 | 13.48 | 30.34 | | 31.03 | 29.93 | 25.00 | 45.76 | ---------+--------+--------+--------+--------+ Total 29 137 48 59 273 10.62 50.18 17.58 21.61 100.00 Statistics for Table of income by edasp Statistic DF Value Prob ------------------------------------------------------ Chi-Square 6 8.8709 0.1810 Likelihood Ratio Chi-Square 6 8.9165 0.1783 Mantel-Haenszel Chi-Square 1 4.7489 0.0293 Phi Coefficient 0.1803 Contingency Coefficient 0.1774 Cramer's V 0.1275 Sample Size = 273 2 The GENMOD Procedure Model Information Data Set WORK.ASPIRATION Distribution Poisson Link Function Log Dependent Variable Count Observations Used 12 Class Level Information Class Levels Values income 3 1 2 3 edasp 4 1 2 3 4 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 6 8.9165 1.4861 Scaled Deviance 6 8.9165 1.4861 Pearson Chi-Square 6 8.8709 1.4785 Scaled Pearson X2 6 8.8709 1.4785 Log Likelihood 627.9901 Algorithm converged. 
Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 2.9567 0.1566 2.6498 3.2636 356.50 <.0001 income 1 1 -0.1579 0.1562 -0.4640 0.1482 1.02 0.3120 income 2 1 0.1935 0.1432 -0.0871 0.4741 1.83 0.1765 income 3 0 0.0000 0.0000 0.0000 0.0000 . . edasp 1 1 -0.7102 0.2268 -1.1547 -0.2658 9.81 0.0017 edasp 2 1 0.8424 0.1557 0.5372 1.1476 29.27 <.0001 edasp 3 1 -0.2063 0.1944 -0.5873 0.1746 1.13 0.2884 edasp 4 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. Observation Statistics Observation Count income edasp Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 9 1 1 8.0732611 2.0885575 0.2097085 8.0732611 5.3523538 12.177361 0.9267389 0.3261617 0.3202024 0.3987119 0.4061323 0.4013622 2 44 1 2 38.139199 3.6412426 0.1295923 38.139199 29.584376 49.167792 5.8608012 0.9490109 0.926141 1.5446752 1.582819 1.5692137 3 13 1 3 13.362639 2.5924627 0.17415 13.362639 9.4985104 18.798751 -0.362639 -0.099204 -0.099658 -0.129226 -0.128637 -0.128988 4 10 1 4 16.424924 2.7987999 0.1626162 16.424924 11.942196 22.590327 -6.424924 -1.585318 -1.710424 -2.274189 -2.107847 -2.203483 3 The GENMOD Procedure Observation Statistics Observation Count income edasp Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 5 11 2 1 11.472527 2.4399552 0.2001974 11.472527 7.7490903 16.985074 -0.472527 -0.139507 -0.140482 -0.191137 -0.189812 -0.190529 6 52 2 2 54.1978 3.9926403 0.1135585 54.1978 43.383093 67.708438 -2.1978 -0.298536 -0.300589 -0.547803 -0.544062 -0.545191 7 23 2 3 18.989011 2.9438604 0.1625718 18.989011 13.807688 26.114619 4.0109894 0.9204503 0.8906041 1.2618685 1.3041566 1.2832659 8 22 2 4 23.340677 3.1501976 0.1501512 23.340677 17.390196 31.32726 -1.340677 -0.277503 -0.280225 -0.407119 -0.403164 -0.405042 9 9 3 1 9.4542125 2.2464604 0.2050749 9.4542125 6.3250695 14.131407 -0.454212 -0.147722 
-0.14893 -0.191884 -0.190329 -0.191268 10 41 3 2 44.663004 3.7991455 0.1219517 44.663004 35.167577 56.722245 -3.663004 -0.548105 -0.555865 -0.959298 -0.945905 -0.950423 11 12 3 3 15.648352 2.7503656 0.1685416 15.648352 11.246198 21.773664 -3.648352 -0.922279 -0.962127 -1.290907 -1.237442 -1.26742 12 27 3 4 19.234448 2.9567028 0.1565953 19.234448 14.150949 26.144111 7.7655521 1.770649 1.6679729 2.2947526 2.4360117 2.3624328 4 The GENMOD Procedure Model Information Data Set WORK.ASPIRATION Distribution Poisson Link Function Log Dependent Variable Count Observations Used 12 Class Level Information Class Levels Values income 3 1 2 3 edasp 4 1 2 3 4 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 5 4.1097 0.8219 Scaled Deviance 5 4.1097 0.8219 Pearson Chi-Square 5 4.0204 0.8041 Scaled Pearson X2 5 4.0204 0.8041 Log Likelihood 630.3935 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 0.9953 0.9291 -0.8257 2.8164 1.15 0.2840 income 1 1 0.7543 0.4477 -0.1231 1.6318 2.84 0.0920 income 2 1 0.6644 0.2652 0.1446 1.1841 6.28 0.0122 income 3 0 0.0000 0.0000 0.0000 0.0000 . . edasp 1 1 0.4138 0.5666 -0.6967 1.5243 0.53 0.4652 edasp 2 1 1.6118 0.3963 0.8351 2.3884 16.54 <.0001 edasp 3 1 0.1882 0.2726 -0.3461 0.7226 0.48 0.4899 edasp 4 0 0.0000 0.0000 0.0000 0.0000 . . assoc 1 0.1830 0.0844 0.0175 0.3484 4.70 0.0302 Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. 
Observation Statistics Observation Count assoc income edasp Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 9 1 1 1 10.447966 2.3464073 0.2305561 10.447966 6.6493934 16.416534 -1.447966 -0.447963 -0.458958 -0.688296 -0.671807 -0.679188 2 44 2 1 2 41.569359 3.7273633 0.1322378 41.569359 32.07837 53.868436 2.4306411 0.3769938 0.3734066 0.714553 0.7214175 0.7195494 3 13 3 1 3 12.022837 2.4868079 0.184586 12.022837 8.3731161 17.263419 0.9771629 0.2818146 0.2781209 0.3619727 0.36678 0.3639497 HW6 problem 4 5 The GENMOD Procedure Observation Statistics Observation Count assoc income edasp Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 4 10 4 1 4 11.959839 2.4815543 0.2331955 11.959839 7.5723197 18.889554 -1.959839 -0.566706 -0.583347 -0.986569 -0.958425 -0.968358 5 11 2 2 1 11.466497 2.4394295 0.200719 11.466497 7.7371034 16.993512 -0.466497 -0.137763 -0.138714 -0.18911 -0.187814 -0.188512 6 52 4 2 2 54.782023 4.0033621 0.1135584 54.782023 43.850749 68.438285 -2.782023 -0.375873 -0.379124 -0.699736 -0.693736 -0.695503 7 23 6 2 3 19.025559 2.9457833 0.1626316 19.025559 13.832644 26.167948 3.9744409 0.9111866 0.8819415 1.2512735 1.2927656 1.2723218 8 22 8 2 4 22.725922 3.1235062 0.1524756 22.725922 16.855205 30.641427 -0.725922 -0.152275 -0.153097 -0.222924 -0.221728 -0.222293 9 9 3 3 1 7.0855369 1.9580556 0.2553376 7.0855369 4.2956513 11.687362 1.9144631 0.7192181 0.6900105 0.9406921 0.9805109 0.9592921 10 41 6 3 2 40.648618 3.7049648 0.1334533 40.648618 31.29321 52.800917 0.3513817 0.0551133 0.0550342 0.104745 0.1048956 0.104854 11 12 9 3 3 16.951607 2.8303626 0.170448 16.951607 12.13739 23.675353 -4.951607 -1.202654 -1.269752 -1.782359 -1.688173 -1.736613 12 27 12 3 4 24.314239 3.1910622 0.1790329 24.314239 17.118576 34.534546 2.6857605 0.5446744 0.535082 1.139088 1.1595083 1.1550334 ------------------------------------------------------------------ Problem 2 The completed table is following, including 
percentages and expected number.

                     1961    1966    1971    1976    1981   Total
------------------------------------------------------------------------
CONTRI:  original     196     266     194     276     333    1265
         percentage 61.44   54.07   44.60   46.15   36.96   46.08
         expected   147.0   226.7   200.5   275.6   415.2
NOT-CON: original     123     226     241     322     568    1480
         percentage 38.56   45.93   55.40   53.85   63.04   53.92
         expected   172.0   265.3   234.5   322.4   485.8
Total                 319     492     435     598     901    2745

For example, in the class of 1966, 54.07% of alumni contacted were contributors.

H0: p1(1)=...=p1(5).
H1: at least one of the above equalities does not hold (i.e. not all proportions are equal).

Expected counts are given above.

Components of the test statistic (from Minitab output):

ChiSq = 16.33 + 6.80 + 0.21 + 0.00 + 16.28 +
        13.96 + 5.81 + 0.18 + 0.00 + 13.91 = 73.48
df = 4

Clearly, reject H0 at level .01 (critical value 13.3). So the proportion contributing varies with the year of graduation. From the empirical proportions, it appears that graduates from earlier years tend to participate more than graduates from recent years.

Unlike the infant trend course example, we don't need the additional power that a focused hypothesis (trend) provides.
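For readers who want to see where the expected counts and the ten chi-square components come from, here is a minimal Python sketch of the homogeneity test (standard library only). It is an illustration rather than the Minitab run used above; the variable names are made up, and the .01 critical value for 4 df is taken from standard chi-square tables.

```python
# Observed counts by class year, from the completed table.
years = [1961, 1966, 1971, 1976, 1981]
contributors = [196, 266, 194, 276, 333]
non_contributors = [123, 226, 241, 322, 568]

class_totals = [c + n for c, n in zip(contributors, non_contributors)]
grand_total = sum(class_totals)
overall_rate = sum(contributors) / grand_total  # pooled proportion contributing

# Expected counts under H0: the same contribution rate in every class.
expected_contrib = [t * overall_rate for t in class_totals]
expected_non = [t * (1 - overall_rate) for t in class_totals]

# Pearson chi-square: sum of (obs - exp)^2 / exp over all 10 cells.
components = [
    (obs - exp) ** 2 / exp
    for obs, exp in zip(contributors + non_contributors, expected_contrib + expected_non)
]
chi2 = sum(components)
df = (len(years) - 1) * (2 - 1)

CRITICAL_01_DF4 = 13.277  # chi-square .01 critical value for 4 df (standard tables)
print("expected contributors:", [round(e, 1) for e in expected_contrib])
print("chi-square:", round(chi2, 2), "df:", df, "reject at .01:", chi2 > CRITICAL_01_DF4)
```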
here's SAS code for the linear trend in proportions options nodate pageno=1 linesize=80 pagesize=72; data contrib; input clss contr Count @@; cards; 1961 1 196 1961 2 123 1966 1 266 1966 2 226 1971 1 194 1971 2 241 1976 1 276 1976 2 322 1981 1 333 1981 2 568 ; proc freq data=contrib; weight count; tables clss*contr /chisq trend; /* exact trend ; */ run; ---------------------------------- In the above contr (contribution) was coded 1=yes, 2=no the exact statement is commented out because the data set is too large for an exact test (at least on my big machine it wouldn't run in 5 hours) output is The FREQ Procedure Table of clss by contr clss contr Frequency| Percent | Row Pct | Col Pct | 1| 2| Total ---------+--------+--------+ 1961 | 196 | 123 | 319 | 7.14 | 4.48 | 11.62 | 61.44 | 38.56 | | 15.49 | 8.31 | ---------+--------+--------+ 1966 | 266 | 226 | 492 | 9.69 | 8.23 | 17.92 | 54.07 | 45.93 | | 21.03 | 15.27 | ---------+--------+--------+ 1971 | 194 | 241 | 435 | 7.07 | 8.78 | 15.85 | 44.60 | 55.40 | | 15.34 | 16.28 | ---------+--------+--------+ 1976 | 276 | 322 | 598 | 10.05 | 11.73 | 21.79 | 46.15 | 53.85 | | 21.82 | 21.76 | ---------+--------+--------+ 1981 | 333 | 568 | 901 | 12.13 | 20.69 | 32.82 | 36.96 | 63.04 | | 26.32 | 38.38 | ---------+--------+--------+ Total 1265 1480 2745 46.08 53.92 100.00 Statistics for Table of clss by contr Statistic DF Value Prob ------------------------------------------------------ Chi-Square 4 73.4780 <.0001 Likelihood Ratio Chi-Square 4 73.8906 <.0001 Mantel-Haenszel Chi-Square 1 67.8408 <.0001 Phi Coefficient 0.1636 Contingency Coefficient 0.1615 Cramer's V 0.1636 Cochran-Armitage Trend Test --------------------------- Statistic (Z) -8.2381 One-sided Pr < Z <.0001 Two-sided Pr > |Z| <.0001 Sample Size = 2745 proportion contributing decreases from .61 to .37. test for linear trend has test statistic -8.2 (negative trend) which is compared to N(0,1). 
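To make the trend statistic less of a black box, here is a minimal Python sketch of the Cochran-Armitage Z computed from the same counts (standard library only). It assumes the class years themselves are used as row scores; any equally spaced scores give the same Z. The variable names are invented for this illustration, and the sketch is a hand computation, not a replacement for the PROC FREQ run.

```python
import math

# Row scores (class years) with contributor counts and class sizes.
scores = [1961, 1966, 1971, 1976, 1981]
contributors = [196, 266, 194, 276, 333]
class_sizes = [319, 492, 435, 598, 901]

n = sum(class_sizes)
p = sum(contributors) / n  # pooled proportion contributing

# Numerator: score-weighted departure of observed contributors from expected under H0.
t_stat = sum(s * (x - m * p) for s, x, m in zip(scores, contributors, class_sizes))

# Variance of the numerator under H0 (no trend in the proportions).
sum_s = sum(s * m for s, m in zip(scores, class_sizes))
sum_s2 = sum(s * s * m for s, m in zip(scores, class_sizes))
variance = p * (1 - p) * (sum_s2 - sum_s ** 2 / n)

z = t_stat / math.sqrt(variance)
two_sided_p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail area
print("Cochran-Armitage Z:", round(z, 4))
print("two-sided p-value:", two_sided_p)
```

The negative sign reflects the proportion contributing falling as class year increases.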
----------------------------------------------------------
PROBLEM 3

The aspirin and myocardial infarction (MI) data can be summarized in the following 2X2 table.

              MI     No MI       ALL
Placebo      189    10,845    11,034
Aspirin      104    10,933    11,037
ALL          293    21,778    22,071

The odds ratio (theta) is equal to the odds of an MI for the placebo group divided by the odds of an MI for the aspirin group. This is equivalent to the product of the probabilities on the diagonal divided by the product of the probabilities off the diagonal.

theta = [pi(1,1)*pi(2,2)]/[pi(1,2)*pi(2,1)]

Since we do not have the population probabilities, we create a sample estimate of theta from the sample proportions (or equivalently the counts). If we plug in n(i,j)/n for pi(i,j), the n's (total count) cancel out:

theta(hat) = [n(1,1)*n(2,2)]/[n(1,2)*n(2,1)]
           = (189*10,933)/(10,845*104)
           = 1.832

This is our point estimate of theta.

NOTE: Since none of our cell counts come close to zero, we won't worry here about calculating the "modified" odds ratio (i.e. adding .5 to each of the counts above).

To compute an interval estimate for theta, we first take its natural log.

Interval estimate: ln[theta(hat)] +/- z*a.s.e.[ln(theta(hat))]
(where z is the N(0,1) critical value, 1.96 for a 95% confidence interval, and a.s.e. is the asymptotic standard error).

ln[theta(hat)] = ln(1.832) = 0.605

a.s.e.[ln(theta(hat))] = [1/n(1,1) + 1/n(1,2) + 1/n(2,1) + 1/n(2,2)]**.5
                       = (1/189 + 1/10,845 + 1/104 + 1/10,933)**.5
                       = 0.123

***note: small cell counts dominate standard error********

The 95% interval estimate endpoints (in ln form):

.605 +/- (1.96)*(0.123) ==> (.364, .846)

Convert the endpoints back by exponentiating:

e**.364 = 1.439
e**.846 = 2.330

So our final 95% interval estimate for the odds ratio (theta): (1.439, 2.330)

In English, this means that the odds of getting an MI for the placebo group appear to be between 1.439 and 2.330 times the odds of getting an MI for the aspirin group.
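The steps above can be sketched in a few lines of Python (standard library only). This is an illustration of the same large-sample calculation, not part of the original solution; the function name odds_ratio_ci is hypothetical. Because it carries unrounded intermediate values, the endpoints can differ from the hand calculation in the third decimal.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Sample odds ratio and large-sample (Wald) confidence interval on the log scale."""
    theta_hat = (n11 * n22) / (n12 * n21)
    # Asymptotic standard error of ln(theta_hat): root of the summed reciprocal counts.
    ase = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta_hat)
    lower = math.exp(log_theta - z * ase)
    upper = math.exp(log_theta + z * ase)
    return theta_hat, ase, lower, upper


# Rows: placebo, aspirin; columns: MI, no MI.
theta_hat, ase, lower, upper = odds_ratio_ci(189, 10845, 104, 10933)
print("odds ratio:", round(theta_hat, 3))
print("a.s.e. of log odds ratio:", round(ase, 3))
print("95% interval:", (round(lower, 3), round(upper, 3)))
```

Note how the two small MI counts (189 and 104) contribute almost all of the standard error, which is the point of the remark about small cell counts.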
A larger sample (especially of MI cases) would yield a narrower interval estimate.

----------------------------------------------------------------------
4. a)

Marginal CxS table:

          high SES   low SES
suburb        10        30       theta-hat(CS) = (10*210)/(20*30) = 3.50
city          20       210

(interpretation: the odds of crime occurring in a high SES area are 3.5 times as large in the suburbs as in the center of the city)

Partial C-S tables:

T=1976
          high SES   low SES
suburb         5        15       theta-hat(CS,1976) = (5*120)/(10*15) = 4.00
city          10       120

T=1986
          high SES   low SES
suburb         5        15       theta-hat(CS,1986) = (5*90)/(10*15) = 3.00
city          10        90

b) None of the marginal tables exhibit Simpson's paradox. There is no evidence of a reversal of association between any of the marginal tables and the corresponding partial tables. Odds ratios are consistently >= 1 or <= 1 across marginal and partial tables for pairs of variables. For completeness, C-T and S-T marginal and partial tables and odds ratios are given below.

Marginal CxT table:

            1976    1986
suburb        20      20         theta-hat(CT) = (20*100)/(130*20) = 0.77
city         130     100

Partial C-T tables:

S=high SES
            1976    1986
suburb         5       5         theta-hat(CT,high) = (5*10)/(5*10) = 1.00
city          10      10

S=low SES
            1976    1986
suburb        15      15         theta-hat(CT,low) = (15*90)/(15*120) = 0.75
city         120      90

Marginal SxT table:

            1976    1986
high          15      15         theta-hat(ST) = (15*105)/(15*135) = 0.78
low          135     105

Partial S-T tables:

C=suburb
            1976    1986
high           5       5         theta-hat(ST,sub.) = (5*15)/(15*5) = 1.00
low           15      15

C=city
            1976    1986
high          10      10         theta-hat(ST,city) = (10*90)/(10*120) = 0.75
low          120      90

c) Since all 2-way interactions appear in the model, no pair of variables is conditionally independent (given the third variable). Because interactions are present in the model, the variables are not mutually independent (see Agresti sec 3.1.4, sec 6.2).

-------------------------
PROBLEM 5

This looks like a simple problem, but that may be deceiving as it requires thinking rather than computing.
The text tells us:
  given that the victim is black (Y=B), the probability the murderer is black (X=B) is .94
  given that the victim is white (Y=W), the probability the murderer is white (X=W) is .83

part a. So we have conditional probabilities, X given Y.

part b. To calculate the odds ratio, we need to go back to the definition, such as on Agresti pp.22-23.

odds ratio = [Pr(X=W|Y=W)/(1 - Pr(X=W|Y=W))] / [Pr(X=W|Y=B)/(1 - Pr(X=W|Y=B))]
           = (.83/.17) / (.06/.94)
           = (.83*.94)/(.06*.17)
           = 76.5

(if I did the arithmetic right, please advise)

part c. Set up our Bayes Thm calc following the course example:

Pr(Y=W|X=W) = Pr(X=W|Y=W)*Pr{Y=W} / [Pr(X=W|Y=W)*Pr{Y=W} + Pr(X=W|Y=B)*(1 - Pr{Y=W})]

We know Pr(X=W|Y=W) = .83 and Pr(X=W|Y=B) = .06; we need Pr{Y=W}.

-----------------------------------------------------------------
PROBLEM 6

data propx;
input guns count @@;
cards;
1 393 2 583
;
proc freq order=data;
weight count;
tables guns /binomial alpha = .1;
exact binomial / alpha = .1;
run;

The FREQ Procedure

                             Cumulative    Cumulative
guns   Frequency   Percent    Frequency      Percent
---------------------------------------------------------
   1         393     40.27          393        40.27
   2         583     59.73          976       100.00

Binomial Proportion for guns = 1
-------------------------------------
Proportion (P)               0.4027
ASE                          0.0157
90% Lower Conf Limit         0.3768
90% Upper Conf Limit         0.4285

Exact Conf Limits
90% Lower Conf Limit         0.3766
90% Upper Conf Limit         0.4292

Test of H0: Proportion = 0.5

ASE under H0                 0.0160
Z                           -6.0818
One-sided Pr <  Z            <.0001
Two-sided Pr > |Z|           <.0001

Exact Test
One-sided Pr <=  P           6.460E-10
Two-sided = 2 * One-sided    1.292E-09

Sample Size = 976

-----------------------------------------------------------
PROBLEM 7.
part a (long) run file and output

d is defendant race
v is victim race
# white=1, black=2
p is death penalty (1=yes, 2=no)

options linesize=80 pagesize = 60;
data death;
input d v p count @@;
cards;
1 1 1 19  1 1 2 132  1 2 1 0  1 2 2 9
2 1 1 11  2 1 2 52   2 2 1 6  2 2 2 97
;
/* main effects only */
proc genmod; class d v p;
model count = d v p / dist=poi link=log obstats;
run;
/* one two-way interaction */
proc genmod; class d v p;
model count = d v p v*p / dist=poi link=log obstats;
run;
proc genmod; class d v p;
model count = d v p d*p / dist=poi link=log obstats;
run;
proc genmod; class d v p;
model count = d v p d*v / dist=poi link=log obstats;
run;
/* two two-way interactions */
proc genmod; class d v p;
model count = d v p d*v d*p / dist=poi link=log obstats;
run;
proc genmod; class d v p;
model count = d v p d*v v*p / dist=poi link=log obstats;
run;
proc genmod; class d v p;
model count = d v p d*p v*p / dist=poi link=log obstats;
run;
/* all two-way interactions */
proc genmod; class d v p;
model count = d v p d*v d*p v*p / dist=poi link=log obstats;
run;
/* saturated model */
proc genmod; class d v p;
model count = d v p d*v d*p v*p d*v*p/ dist=poi link=log obstats;
run;

-------------- output

The GENMOD Procedure

Model Information

Data Set              WORK.DEATH
Distribution          Poisson
Link Function         Log
Dependent Variable    count
Observations Used     8

Class Level Information

Class    Levels    Values
d             2    1 2
v             2    1 2
p             2    1 2

Criteria For Assessing Goodness Of Fit

Criterion             DF        Value     Value/DF
Deviance               4     137.9294      34.4823
Scaled Deviance        4     137.9294      34.4823
Pearson Chi-Square     4     122.3975      30.5994
Scaled Pearson X2      4     122.3975      30.5994
Log Likelihood              1011.6236

Algorithm converged.

Analysis Of Parameter Estimates

                            Standard       Wald 95%           Chi-
Parameter      DF  Estimate    Error   Confidence Limits    Square  Pr > ChiSq
Intercept       1    3.9266   0.1108    3.7095    4.1436   1256.97      <.0001
d         1     1   -0.0368   0.1108   -0.2540    0.1803      0.11      0.7397
d         2     0    0.0000   0.0000    0.0000    0.0000       .         .
v         1     1    0.6475   0.1166    0.4189    0.8761     30.82      <.0001
v         2     0    0.0000   0.0000    0.0000    0.0000       .         .
p         1     1   -2.0864   0.1767   -2.4327   -1.7400    139.40      <.0001
p         2     0    0.0000   0.0000    0.0000    0.0000       .         .
Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 11.598479 2.450874 0.1804597 11.598479 8.1431695 16.519947 7.4015206 2.1733037 1.9880714 2.5202086 2.7550212 2.6113829 2 132 1 1 2 93.432195 4.537236 0.0907535 93.432195 78.207249 111.62105 38.567805 3.9900344 3.7541706 7.8199431 8.3112478 8.2006258 3 0 1 2 1 6.0702323 1.8033969 0.191889 6.0702323 4.1674377 8.8418166 -6.070232 -2.463784 -3.484317 -3.95413 -2.795992 -3.726634 4 9 1 2 2 48.899093 3.8897589 0.1117671 48.899093 39.279369 60.874738 -39.89909 -5.705748 -7.023715 -11.2591 -9.146383 -10.02164 5 11 2 1 1 12.033422 2.487688 0.1798327 12.033422 8.4589272 17.118395 -1.033422 -0.297909 -0.302333 -0.386831 -0.38117 -0.384638 6 52 2 1 2 96.935903 4.57405 0.0895003 96.935903 81.339572 115.52273 -44.9359 -4.564058 -5.009986 -10.59702 -9.653804 -9.872449 7 6 2 2 1 6.297866 1.8402108 0.1912994 6.297866 4.3287154 9.1627913 -0.297866 -0.118693 -0.119647 -0.136393 -0.135305 -0.136143 8 97 2 2 2 50.732809 3.9265728 0.1107519 50.732809 40.833514 63.031997 46.267191 6.4957406 5.7623277 9.3759909 10.569341 10.135124 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 131.6796 43.8932 Scaled Deviance 3 131.6796 43.8932 Pearson Chi-Square 3 115.9014 38.6338 Scaled Pearson X2 3 115.9014 38.6338 Log Likelihood 1014.7484 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 3.9885 0.1113 3.7704 4.2067 1283.91 <.0001 d 1 1 -0.0368 0.1108 -0.2540 0.1803 0.11 0.7397 d 2 0 0.0000 0.0000 0.0000 0.0000 . . 
v 1 1 0.5515 0.1219 0.3125 0.7905 20.46 <.0001 v 2 0 0.0000 0.0000 0.0000 0.0000 . . p 1 1 -2.8717 0.4196 -3.6942 -2.0492 46.83 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 1 1 1 1.0579 0.4635 0.1494 1.9665 5.21 0.0225 v*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 14.723926 2.6894738 0.1910912 14.723926 10.124335 21.41316 4.2760736 1.1143801 1.0660811 1.5678624 1.6388947 1.6064438 2 132 1 1 2 90.306749 4.5032122 0.0928294 90.306749 75.284172 108.32701 41.693251 4.3873842 4.1018515 8.7096222 9.3159048 9.1848859 3 0 1 2 1 2.9447868 1.0800364 0.4121275 2.9447868 1.3129563 6.604766 -2.944787 -1.716038 -2.426844 -3.432658 -2.427256 -2.972601 4 9 1 2 2 52.02454 3.9517155 0.1123231 52.02454 41.744442 64.836243 -43.02454 -5.965023 -7.38026 -12.58995 -10.1757 -11.06488 5 11 2 1 1 15.276074 2.7262878 0.1904992 15.276074 10.516193 22.190391 -4.276074 -1.094055 -1.152177 -1.725961 -1.638895 -1.678252 6 52 2 1 2 93.693251 4.5400262 0.0916046 93.693251 78.295062 112.11978 -41.69325 -4.307364 -4.706711 -10.17961 -9.315905 -9.507146 7 6 2 2 1 3.0552163 1.1168504 0.4118534 3.0552163 1.3629243 6.8487636 2.9447837 1.6847382 1.4863854 2.1414802 2.427253 2.2940262 8 97 2 2 2 53.97546 3.9885295 0.111313 53.97546 43.39569 67.134555 43.02454 5.8562291 5.2602402 9.1401188 10.175701 9.8447749 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 137.7079 45.9026 Scaled Deviance 3 137.7079 45.9026 Pearson Chi-Square 3 121.3219 40.4406 Scaled 
Pearson X2 3 121.3219 40.4406 Log Likelihood 1011.7343 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 3.9355 0.1121 3.7158 4.1553 1231.94 <.0001 d 1 1 -0.0552 0.1175 -0.2855 0.1751 0.22 0.6386 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 0.6475 0.1166 0.4189 0.8761 30.82 <.0001 v 2 0 0.0000 0.0000 0.0000 0.0000 . . p 1 1 -2.1707 0.2560 -2.6725 -1.6690 71.90 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 1 1 1 0.1664 0.3539 -0.5273 0.8601 0.22 0.6382 d*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 12.472393 2.5235176 0.2328884 12.472393 7.9015965 19.687235 6.5276071 1.8483283 1.7145836 3.0143805 3.2495148 3.1753465 2 132 1 1 2 92.558282 4.5278385 0.093261 92.558282 77.09592 111.12178 39.441718 4.0996632 3.8505716 8.7206313 9.2847647 9.1775007 3 0 1 2 1 6.5276075 1.8760405 0.2418526 6.5276075 4.0633923 10.486228 -6.527608 -2.554918 -3.6132 -4.595508 -3.249515 -4.133637 4 9 1 2 2 48.441718 3.8803614 0.1138125 48.441718 38.756287 60.547597 -39.44172 -5.666907 -6.970436 -11.42049 -9.284765 -10.13311 5 11 2 1 1 11.159509 2.412292 0.245823 11.159509 6.8928716 18.067165 -0.159509 -0.047749 -0.047863 -0.083875 -0.083674 -0.08374 6 52 2 1 2 97.809816 4.5830249 0.0911966 97.809816 81.80046 116.9524 -45.80982 -4.631987 -5.090617 -11.7867 -10.7248 -10.93071 7 6 2 2 1 5.8404908 1.7648148 0.2543317 5.8404908 3.5478219 9.6147254 0.1595092 0.0660026 0.0657055 0.0832978 0.0836744 0.0834402 8 97 2 2 2 51.190184 3.9355478 0.1121271 51.190184 41.090734 63.77192 45.809816 6.4027302 5.6901692 9.5312368 10.724802 10.315253 The GENMOD Procedure Model Information Data Set 
WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 3 8.1316 2.7105 Scaled Deviance 3 8.1316 2.7105 Pearson Chi-Square 3 6.9773 2.3258 Scaled Pearson X2 3 6.9773 2.3258 Log Likelihood 1076.5225 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 4.5177 0.1004 4.3208 4.7146 2022.86 <.0001 d 1 1 -2.4375 0.3476 -3.1188 -1.7562 49.18 <.0001 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 -0.4916 0.1599 -0.8051 -0.1781 9.45 0.0021 v 2 0 0.0000 0.0000 0.0000 0.0000 . . p 1 1 -2.0864 0.1767 -2.4327 -1.7400 139.40 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 1 1 1 3.3116 0.3786 2.5697 4.0536 76.52 <.0001 d*v 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. 
The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 16.674847 2.8139014 0.1770108 16.674847 11.78664 23.590312 2.3251534 0.5694042 0.5568855 0.805871 0.8239868 0.8153862 2 132 1 1 2 134.32515 4.9002634 0.0836858 134.32515 114.005 158.26716 -2.325153 -0.200619 -0.201202 -0.826381 -0.823987 -0.824129 3 0 1 2 1 0.993865 -0.006154 0.3685396 0.993865 0.482643 2.0465806 -0.993865 -0.996928 -1.409869 -1.51589 -1.071896 -1.46384 4 9 1 2 2 8.006135 2.0802081 0.333904 8.006135 4.1610533 15.40432 0.993865 0.3512497 0.3443344 1.0507929 1.0718961 1.0696499 5 11 2 1 1 6.9570552 1.9397563 0.201453 6.9570552 4.6875763 10.325297 4.0429448 1.5327986 1.4117961 1.6665283 1.8093634 1.708067 6 52 2 1 2 56.042945 4.0261183 0.1274904 56.042945 43.651674 71.951688 -4.042945 -0.540054 -0.546751 -1.831799 -1.809363 -1.811374 7 6 2 2 1 11.374233 2.4313505 0.1855237 11.374233 7.9068608 16.362142 -5.374233 -1.593512 -1.753104 -2.247366 -2.04278 -2.169572 8 97 2 2 2 91.625767 4.5177125 0.1004466 91.625767 75.251867 111.56243 5.3742331 0.5614456 0.556087 2.0232832 2.0427801 2.0413138 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 2 7.9102 3.9551 Scaled Deviance 2 7.9102 3.9551 Pearson Chi-Square 2 7.0420 3.5210 Scaled Pearson X2 2 7.0420 3.5210 Log Likelihood 1076.6332 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 4.5267 0.1020 4.3268 4.7265 1971.03 <.0001 d 1 1 -2.4559 0.3498 -3.1414 -1.7703 49.30 <.0001 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 -0.4916 0.1599 -0.8051 -0.1781 9.45 0.0021 v 2 0 0.0000 0.0000 0.0000 0.0000 . . 
p 1 1 -2.1707 0.2560 -2.6725 -1.6690 71.90 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 1 1 1 3.3116 0.3786 2.5697 4.0536 76.52 <.0001 d*v 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 1 1 1 0.1664 0.3539 -0.5273 0.8601 0.22 0.6382 d*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 17.93125 2.886545 0.2302262 17.93125 11.419358 28.156551 1.06875 0.2523892 0.2499424 1.1226103 1.1336005 1.1330582 2 132 1 1 2 133.06875 4.8908659 0.0863986 133.06875 112.33975 157.62267 -1.06875 -0.092648 -0.092773 -1.135123 -1.1336 -1.133611 3 0 1 2 1 1.06875 0.0664897 0.3968535 1.06875 0.4909913 2.3263681 -1.06875 -1.033804 -1.462019 -1.603153 -1.1336 -1.53421 4 9 1 2 2 7.93125 2.0708107 0.3345942 7.93125 4.1165605 15.280895 1.06875 0.3794943 0.3714171 1.1094726 1.1336005 1.1309221 5 11 2 1 1 6.4518072 1.8643603 0.2620543 6.4518072 3.8602939 10.783069 4.5481928 1.7905983 1.6252336 2.1777706 2.399355 2.2786062 6 52 2 1 2 56.548193 4.0350932 0.1286869 56.548193 43.942039 72.770817 -4.548193 -0.604825 -0.613217 -2.432647 -2.399355 -2.401484 7 6 2 2 1 10.548193 2.3559545 0.2500163 10.548193 6.4619525 17.218383 -4.548193 -1.400393 -1.525138 -2.613086 -2.399355 -2.474237 8 97 2 2 2 92.451807 4.5266875 0.101961 92.451807 75.705254 112.90282 4.5481928 0.4730218 0.4692208 2.3800748 2.399355 2.3986086 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 2 1.8819 0.9409 Scaled 
Deviance 2 1.8819 0.9409 Pearson Chi-Square 2 1.4313 0.7157 Scaled Pearson X2 2 1.4313 0.7157 Log Likelihood 1079.6473 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 4.5797 0.1011 4.3816 4.7778 2053.37 <.0001 d 1 1 -2.4375 0.3476 -3.1188 -1.7562 49.18 <.0001 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 -0.5876 0.1639 -0.9087 -0.2664 12.86 0.0003 v 2 0 0.0000 0.0000 0.0000 0.0000 . . p 1 1 -2.8717 0.4196 -3.6942 -2.0492 46.83 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 1 1 1 3.3116 0.3786 2.5697 4.0536 76.52 <.0001 d*v 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 1 1 1 1.0579 0.4635 0.1494 1.9665 5.21 0.0225 v*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. 
The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 21.168224 3.0525012 0.1878376 21.168224 14.648623 30.589478 -2.168224 -0.471262 -0.479671 -0.953406 -0.936692 -0.94095 2 132 1 1 2 129.83178 4.8662396 0.0859325 129.83178 109.70719 153.648 2.1682243 0.190289 0.189763 0.9341023 0.9366915 0.9365848 3 0 1 2 1 0.4821475 -0.729505 0.5185052 0.4821475 0.1745129 1.3320863 -0.482148 -0.694368 -0.981985 -1.052571 -0.74428 -1.01789 4 9 1 2 2 8.5178731 2.1421667 0.3340902 8.5178731 4.4254056 16.394918 0.4821269 0.1651946 0.1636718 0.7373882 0.7442486 0.7439121 5 11 2 1 1 8.8317757 2.1783561 0.2110295 8.8317757 5.840088 13.356008 2.1682243 0.7295922 0.7024339 0.9018241 0.9366915 0.9156962 6 52 2 1 2 54.168224 3.9920945 0.1289764 54.168224 42.068761 69.747633 -2.168224 -0.294599 -0.296598 -0.943047 -0.936692 -0.937322 7 6 2 2 1 5.5179 1.7079973 0.4092011 5.5179 2.4743506 12.305136 0.4821 0.2052344 0.2023497 0.7337466 0.7442071 0.7434167 8 97 2 2 2 97.48214 4.5796692 0.101065 97.48214 79.964715 118.83701 -0.48214 -0.048833 -0.048873 -0.744884 -0.744269 -0.744271 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 2 131.4582 65.7291 Scaled Deviance 2 131.4582 65.7291 Pearson Chi-Square 2 115.7475 57.8738 Scaled Pearson X2 2 115.7475 57.8738 Log Likelihood 1014.8592 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 3.9975 0.1127 3.7767 4.2184 1258.56 <.0001 d 1 1 -0.0552 0.1175 -0.2855 0.1751 0.22 0.6386 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 0.5515 0.1219 0.3125 0.7905 20.46 <.0001 v 2 0 0.0000 0.0000 0.0000 0.0000 . . 
p 1 1 -2.9560 0.4587 -3.8551 -2.0570 41.53 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 1 1 1 0.1664 0.3539 -0.5273 0.8601 0.22 0.6382 d*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 1 1 1 1.0579 0.4635 0.1494 1.9665 5.21 0.0225 v*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. The GENMOD Procedure Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 15.833334 2.7621174 0.2412201 15.833334 9.8683708 25.403834 3.1666664 0.7958224 0.7712884 2.7492788 2.8367309 2.8299461 2 132 1 1 2 89.462069 4.4938147 0.0952822 89.462069 74.222324 107.83092 42.537931 4.4973521 4.1972322 9.6853412 10.377884 10.251393 3 0 1 2 1 3.166674 1.1526818 0.4376301 3.166674 1.3430493 7.46646 -3.166674 -1.779515 -2.516614 -4.011753 -2.836738 -3.348694 4 9 1 2 2 51.537931 3.9423181 0.1143586 51.537931 41.189335 64.486556 -42.53793 -5.925335 -7.325584 -12.83034 -10.37788 -11.23633 5 11 2 1 1 14.166666 2.6508918 0.2537303 14.166666 8.615729 23.293959 -3.166666 -0.841334 -0.876019 -2.953679 -2.836731 -2.847211 6 52 2 1 2 94.537931 4.5490011 0.0932626 94.537931 78.744605 113.49883 -42.53793 -4.374952 -4.786344 -11.35375 -10.37788 -10.55791 7 6 2 2 1 2.8333398 1.0414562 0.4446482 2.8333398 1.1852594 6.773044 3.1666602 1.8812743 1.6341112 2.4640343 2.8367253 2.6792045 8 97 2 2 2 54.462069 3.9975045 0.1126814 54.462069 43.669635 67.921725 42.537931 5.7640706 5.186733 9.3384204 10.377884 10.068673 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 1 
0.7007 0.7007 Scaled Deviance 1 0.7007 0.7007 Pearson Chi-Square 1 0.3755 0.3755 Scaled Pearson X2 1 0.3755 0.3755 Log Likelihood 1080.2379 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 4.5781 0.1012 4.3797 4.7764 2045.75 <.0001 d 1 1 -2.4177 0.3480 -3.0998 -1.7356 48.27 <.0001 d 2 0 0.0000 0.0000 0.0000 0.0000 . . v 1 1 -0.6331 0.1714 -0.9690 -0.2972 13.64 0.0002 v 2 0 0.0000 0.0000 0.0000 0.0000 . . p 1 1 -2.8421 0.4203 -3.6659 -2.0183 45.72 <.0001 p 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 1 1 1 3.3580 0.3820 2.6093 4.1066 77.29 <.0001 d*v 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*v 2 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 1 1 1 -0.4402 0.4009 -1.2260 0.3455 1.21 0.2722 d*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . d*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 1 1 1 1.3242 0.5193 0.3063 2.3421 6.50 0.0108 v*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . . v*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . . The GENMOD Procedure Analysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Scale 0 1.0000 0.0000 1.0000 1.0000 NOTE: The scale parameter was held fixed. 
Observation Statistics Observation count d v p Pred Xbeta Std HessWgt Lower Upper Resraw Reschi Resdev StResdev StReschi Reslik 1 19 1 1 1 18.674359 2.9271514 0.2296511 18.674359 11.906012 29.290387 0.3256406 0.0753556 0.0751382 0.611041 0.6128092 0.6127825 2 132 1 1 2 132.32564 4.8852659 0.0868389 132.32564 111.61605 156.87776 -0.325641 -0.028309 -0.02832 -0.613062 -0.612811 -0.612811 3 0 1 2 1 0.3256432 -1.121953 0.6387653 0.3256432 0.0931159 1.1388329 -0.325643 -0.570652 -0.807023 -0.86665 -0.612814 -0.837367 4 9 1 2 2 8.6743628 2.1603719 0.3339603 8.6743628 4.5078563 16.691874 0.3256372 0.1105644 0.1098832 0.6090274 0.6128028 0.6126802 5 11 2 1 1 11.325642 2.4270693 0.2934175 11.325642 6.372435 20.128908 -0.325642 -0.096763 -0.097232 -0.615783 -0.612811 -0.612885 6 52 2 1 2 51.674359 3.9449617 0.1387307 51.674359 39.371984 67.820797 0.3256413 0.0453004 0.0452529 0.6121685 0.6128105 0.612807 7 6 2 2 1 5.67437 1.7359595 0.4092202 5.67437 2.5444198 12.654545 0.32563 0.1366991 0.1354219 0.6070641 0.6127892 0.6125056 8 97 2 2 2 97.325641 4.5780625 0.1012175 97.325641 79.812482 118.68169 -0.325641 -0.033009 -0.033027 -0.613153 -0.612811 -0.612812 The GENMOD Procedure Model Information Data Set WORK.DEATH Distribution Poisson Link Function Log Dependent Variable count Observations Used 8 Class Level Information Class Levels Values d 2 1 2 v 2 1 2 p 2 1 2 Criteria For Assessing Goodness Of Fit Criterion DF Value Value/DF Deviance 0 0.0000 . Scaled Deviance 0 0.0000 . Pearson Chi-Square 0 0.0000 . Scaled Pearson X2 0 0.0000 . Log Likelihood 1080.5883 Algorithm converged. Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square Intercept 1 4.5747 0.1015 4.3757 4.7737 2030.01 d 1 1 -2.3775 0.3485 -3.0604 -1.6945 46.55 d 2 0 0.0000 0.0000 0.0000 0.0000 . v 1 1 -0.6235 0.1719 -0.9603 -0.2866 13.16 v 2 0 0.0000 0.0000 0.0000 0.0000 . 
p 1 1 -2.7830 0.4207 -3.6075 -1.9584 43.76 Analysis Of Parameter Estimates Parameter Pr > ChiSq Intercept <.0001 d 1 <.0001 d 2 . v 1 0.0003 v 2 . p 1 <.0001 The GENMOD Procedure Analysis Of Parameter Estimates Standard Wald 95% Confidence Chi- Parameter DF Estimate Error Limits Square p 2 0 0.0000 0.0000 0.0000 0.0000 . d*v 1 1 1 3.3090 0.3850 2.5545 4.0636 73.87 d*v 1 2 0 0.0000 0.0000 0.0000 0.0000 . d*v 2 1 0 0.0000 0.0000 0.0000 0.0000 . d*v 2 2 0 0.0000 0.0000 0.0000 0.0000 . d*p 1 1 1 -19.3850 0.4127 -20.1940 -18.5761 2205.96 d*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . d*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . d*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . v*p 1 1 1 1.2296 0.5358 0.1794 2.2798 5.27 v*p 1 2 0 0.0000 0.0000 0.0000 0.0000 . v*p 2 1 0 0.0000 0.0000 0.0000 0.0000 . v*p 2 2 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 1 1 1 0 19.0000 0.0000 19.0000 19.0000 . d*v*p 1 1 2 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 1 2 1 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 1 2 2 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 2 1 1 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 2 1 2 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 2 2 1 0 0.0000 0.0000 0.0000 0.0000 . d*v*p 2 2 2 0 0.0000 0.0000 0.0000 0.0000 . Scale 0 1.0000 0.0000 1.0000 1.0000 Analysis Of Parameter Estimates Parameter Pr > ChiSq p 2 . d*v 1 1 <.0001 d*v 1 2 . d*v 2 1 . d*v 2 2 . d*p 1 1 <.0001 d*p 1 2 . d*p 2 1 . d*p 2 2 . v*p 1 1 0.0217 v*p 1 2 . v*p 2 1 . v*p 2 2 . d*v*p 1 1 1 . d*v*p 1 1 2 . d*v*p 1 2 1 . d*v*p 1 2 2 . d*v*p 2 1 1 . d*v*p 2 1 2 . d*v*p 2 2 1 . d*v*p 2 2 2 . Scale The GENMOD Procedure NOTE: The scale parameter was held fixed. 
Observation Statistics
Obs count d v p  Pred       Xbeta      Std        HessWgt    Lower      Upper      Resraw     Reschi     Resdev     StResdev   StReschi   Reslik
1   19    1 1 1  19         2.944439   0.2294157  19         12.119217  29.787402  -2.84E-14  -6.52E-15  0          0          -6.17E-10  -6.17E-10
2   132   1 1 2  132        4.8828019  0.0870388  132        111.29774  156.55304  -8.53E-14  -7.42E-15  -7.214E-9  -0.001799  -1.851E-9  -7.448E-9
3   0     1 2 1  2.1223E-9  -19.97074  0.6770777  2.1223E-9  5.63E-10   8.001E-9   -2.122E-9  -0.000046  -0.000065  -0.000065  -0.000046  -0.000065
4   9     1 2 2  9          2.1972246  0.3333333  9          4.6828329  17.297222  -1.78E-15  -5.92E-16  0          0          -3.86E-11  -3.86E-11
5   11    2 1 1  11         2.3978953  0.3015113  11         6.0918018  19.862761  8.882E-15  2.678E-15  1.3171E-9  0.0000948  1.928E-10  1.3311E-9
6   52    2 1 2  52         3.9512437  0.138675   52         39.624421  68.240745  1.421E-14  1.971E-15  0          0          3.085E-10  3.085E-10
7   6     2 2 1  6          1.7917595  0.4082483  6          2.6955642  13.355275  -5.33E-15  -2.18E-15  0          0          -1.16E-10  -1.16E-10
8   97    2 2 2  97         4.574711   0.1015346  97         79.496006  118.35815  0          0          0          0          0          0

------------------------------------------
part b. Clearly the DV, VP model is indicated as the best (i.e. victim's race matters); its output is reproduced below. Its deviance is 1.88 on 2 df, and adding the DP term (the DV, DP, VP model, deviance 0.70 on 1 df) improves the fit by only 1.18 on 1 df. Under the DV, VP model, Death Penalty and Defendant Race are conditionally independent at each level of Victim Race (so is there bias?).

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value       Value/DF
Deviance             2    1.8819      0.9409
Scaled Deviance      2    1.8819      0.9409
Pearson Chi-Square   2    1.4313      0.7157
Scaled Pearson X2    2    1.4313      0.7157
Log Likelihood            1079.6473
Algorithm converged.

Analysis Of Parameter Estimates
                            Standard  Wald 95%            Chi-
Parameter     DF  Estimate  Error     Confidence Limits   Square    Pr > ChiSq
Intercept     1   4.5797    0.1011    4.3816    4.7778    2053.37   <.0001
d 1           1   -2.4375   0.3476    -3.1188   -1.7562   49.18     <.0001
d 2           0   0.0000    0.0000    0.0000    0.0000    .         .
v 1           1   -0.5876   0.1639    -0.9087   -0.2664   12.86     0.0003
v 2           0   0.0000    0.0000    0.0000    0.0000    .         .
p 1           1   -2.8717   0.4196    -3.6942   -2.0492   46.83     <.0001
p 2           0   0.0000    0.0000    0.0000    0.0000    .         .
d*v 1 1       1   3.3116    0.3786    2.5697    4.0536    76.52     <.0001
d*v 1 2       0   0.0000    0.0000    0.0000    0.0000    .         .
d*v 2 1       0   0.0000    0.0000    0.0000    0.0000    .         .
d*v 2 2       0   0.0000    0.0000    0.0000    0.0000    .         .
v*p 1 1       1   1.0579    0.4635    0.1494    1.9665    5.21      0.0225
v*p 1 2       0   0.0000    0.0000    0.0000    0.0000    .         .
v*p 2 1       0   0.0000    0.0000    0.0000    0.0000    .         .
v*p 2 2       0   0.0000    0.0000    0.0000    0.0000    .         .
Scale         0   1.0000    0.0000    1.0000    1.0000
NOTE: The scale parameter was held fixed.

The GENMOD Procedure
Observation Statistics
Obs count d v p  Pred       Xbeta      Std        HessWgt    Lower      Upper      Resraw     Reschi     Resdev     StResdev   StReschi   Reslik
1   19    1 1 1  21.168224  3.0525012  0.1878376  21.168224  14.648623  30.589478  -2.168224  -0.471262  -0.479671  -0.953406  -0.936692  -0.94095
2   132   1 1 2  129.83178  4.8662396  0.0859325  129.83178  109.70719  153.648    2.1682243  0.190289   0.189763   0.9341023  0.9366915  0.9365848
3   0     1 2 1  0.4821475  -0.729505  0.5185052  0.4821475  0.1745129  1.3320863  -0.482148  -0.694368  -0.981985  -1.052571  -0.74428   -1.01789
4   9     1 2 2  8.5178731  2.1421667  0.3340902  8.5178731  4.4254056  16.394918  0.4821269  0.1651946  0.1636718  0.7373882  0.7442486  0.7439121
5   11    2 1 1  8.8317757  2.1783561  0.2110295  8.8317757  5.840088   13.356008  2.1682243  0.7295922  0.7024339  0.9018241  0.9366915  0.9156962
6   52    2 1 2  54.168224  3.9920945  0.1289764  54.168224  42.068761  69.747633  -2.168224  -0.294599  -0.296598  -0.943047  -0.936692  -0.937322
7   6     2 2 1  5.5179     1.7079973  0.4092011  5.5179     2.4743506  12.305136  0.4821     0.2052344  0.2023497  0.7337466  0.7442071  0.7434167
8   97    2 2 2  97.48214   4.5796692  0.101065   97.48214   79.964715  118.83701  -0.48214   -0.048833  -0.048873  -0.744884  -0.744269  -0.744271

odds-ratios (obtained from fits or from parameter estimates)
      Partial                 Marginal
D-P   V-P   D-V         D-P    V-P   D-V
1.0   2.9   27.4        1.65   2.9   27.4

-------------------------------------------
PROBLEM 8
I created the individual data in my editor, and have placed the file migraineind.dat in the HW section (i.e. where the HW data sets reside).
In this file there are 106 rows
C1 Gender (F = 1, M = 0)
C2 Treat (Active = 1, Placebo = 0)
C3 Outcome (Better = 1, Same = 0)
I then cut-and-paste that text file into Minitab.

part a. The 2x2 design has cell sizes
     Active  Placebo
F    27      25
M    28      26
The proportions improving in each cell are
     Active  Placebo
F    .5926   .2000
M    .4286   .2692
Construct an anova table for this 2x2 design using the same methods as the unweighted means approach for unbalanced designs (which this is), i.e. the "Miller" solution in the 'unbalanced' course examples.

part 1. anova on cell means
c6          c8  c9
propbetter  tr  gen
0.2000      0   1
0.2692      0   0
0.5926      1   1
0.4286      1   0

MTB > anova c6 = c8 c9
ANOVA: propbetter versus tr, gen
Factor  Type   Levels  Values
tr      fixed  2       0 1
gen     fixed  2       0 1
Analysis of Variance for propbett
Source  DF  SS       MS       F     P
tr      1   0.07618  0.07618  5.60  0.254
gen     1   0.00225  0.00225  0.17  0.754
Error   1   0.01360  0.01360
Total   3   0.09202

To obtain the treatment, gender, and treatment*gender interaction SS, multiply the tr, gen, and Error SS above by the harmonic mean of the cell sizes (with one observation per cell, the Error line is the interaction).
1/(1/27 + 1/25 + 1/28 + 1/26) = 6.61319
times 4 (the number of cells) = 26.4528

Sums-of-Squares (each 1 df)
treatment  gender   treatment*gender
2.01496    0.05951  0.35972

Now the error term is obtained by summing the error SS for each cell. For each cell compute n*p*(1 - p) to obtain
MTB > print c13
4.00000 5.11502 6.51848 6.85726
Sum of C13 = 22.491 (with 102 df)

So then the full anova table would be
Source        DF   SS      MS
Gender        1    0.0595  0.0595
Treat         1    2.015   2.015
Gender*Treat  1    0.3597  0.3597
Error         102  22.49   0.2205
Clearly, treatment is significant (F = 2.015/0.2205 = 9.14 on 1 and 102 df), whereas gender and t*g are not.

part b.
Do the anova the easier way, using the individual data file.
MTB > glm c3 = c1|c2
General Linear Model: Outcome versus Gender, Treat
Factor  Type   Levels  Values
Gender  fixed  2       0 1
Treat   fixed  2       0 1
Analysis of Variance for Outcome, using Adjusted SS for Tests
Source        DF   Seq SS   Adj SS   Adj MS  F     P
Gender        1    0.0716   0.0594   0.0594  0.27  0.605
Treat         1    1.9832   2.0146   2.0146  9.14  0.003
Gender*Treat  1    0.3598   0.3598   0.3598  1.63  0.204
Error         102  22.4910  22.4910  0.2205
Total         105  24.9057
The anova table (Adj SS) is remarkably close to the old-fashioned solution in part a. It is also traditional to consider a variance-stabilizing transformation (the arcsine) for the proportions in part a, but that doesn't seem important here...

----------------------------------------------
PROBLEM 9
9a Genmod Output for the AC, CM, AM model fit

Criteria For Assessing Goodness Of Fit
Criterion            DF   Value       Value/DF
Deviance             1    0.3740      0.3740
Scaled Deviance      1    0.3740      0.3740
Pearson Chi-Square   1    0.4011      0.4011
Scaled Pearson X2    1    0.4011      0.4011
Log Likelihood            12010.6124

Analysis Of Parameter Estimates
                            Standard  Wald 95%            Chi-
Parameter     DF  Estimate  Error     Confidence Limits   Square    Pr > ChiSq
Intercept     1   5.6334    0.0597    5.5164    5.7504    8903.96   <.0001
a 1           1   0.4877    0.0758    0.3392    0.6362    41.44     <.0001
a 2           0   0.0000    0.0000    0.0000    0.0000    .         .
c 1           1   -1.8867   0.1627    -2.2055   -1.5678   134.47    <.0001
c 2           0   0.0000    0.0000    0.0000    0.0000    .         .
m 1           1   -5.3090   0.4752    -6.2404   -4.3777   124.82    <.0001
m 2           0   0.0000    0.0000    0.0000    0.0000    .         .
a*c 1 1       1   2.0545    0.1741    1.7134    2.3957    139.32    <.0001
a*c 1 2       0   0.0000    0.0000    0.0000    0.0000    .         .
a*c 2 1       0   0.0000    0.0000    0.0000    0.0000    .         .
a*c 2 2       0   0.0000    0.0000    0.0000    0.0000    .         .
a*m 1 1       1   2.9860    0.4647    2.0753    3.8968    41.29     <.0001
a*m 1 2       0   0.0000    0.0000    0.0000    0.0000    .         .
a*m 2 1       0   0.0000    0.0000    0.0000    0.0000    .         .
a*m 2 2       0   0.0000    0.0000    0.0000    0.0000    .         .
c*m 1 1       1   2.8479    0.1638    2.5268    3.1690    302.14    <.0001
c*m 1 2       0   0.0000    0.0000    0.0000    0.0000    .         .
c*m 2 1       0   0.0000    0.0000    0.0000    0.0000    .         .
c*m 2 2       0   0.0000    0.0000    0.0000    0.0000    .         .
Observation Statistics
Obs count a c m  Pred       Xbeta      Std        HessWgt    Lower      Upper      Resraw     Reschi     Resdev     StResdev   StReschi   Reslik
1   911   1 1 1  910.38317  6.8138656  0.0331254  910.38317  853.15473  971.45041  0.6168304  0.0204434  0.0204411  0.6332534  0.6333249  0.6333248
2   538   1 1 2  538.61683  6.2890044  0.0430504  538.61683  495.03436  586.03627  -0.61683   -0.026578  -0.026583  -0.633446  -0.633325  -0.633325
3   44    1 2 1  44.61683   3.7981111  0.1481099  44.61683   33.375465  59.644459  -0.61683   -0.092346  -0.09256   -0.634793  -0.633325  -0.633356
4   456   1 2 2  455.38317  6.1211392  0.0468122  455.38317  415.46112  499.14136  0.6168304  0.0289053  0.0288988  0.633182   0.6333249  0.6333246
5   3     2 1 1  3.616831   1.2855982  0.4516316  3.616831   1.4924462  8.7651178  -0.616831  -0.324341  -0.334285  -0.652742  -0.633326  -0.638475
6   43    2 1 2  42.38317   3.7467513  0.1518756  42.38317   31.471445  57.078189  0.6168304  0.0947478  0.0945193  0.631798   0.6333249  0.6332908
7   2     2 2 1  1.3831699  0.3243779  0.476606   1.3831699  0.5434852  3.5201674  0.6168301  0.5244786  0.491342   0.5933111  0.6333247  0.6061677
8   279   2 2 2  279.61683  5.6334202  0.0597008  279.61683  248.74013  314.32632  -0.61683   -0.036888  -0.036901  -0.633558  -0.633325  -0.633326

-------------------------------
From the model parameter estimates
A-C  Exp[2.0545] = 7.80294
A-M  Exp[2.986]  = 19.8063
C-M  Exp[2.8479] = 17.2515
These match the "conditional associations" (odds ratios for partial tables). If you crave some additional arithmetic, compute the odds ratios for the 2x2 tables with the model fits as the cell counts. For example, the two A-C partial tables (at each level of M) are (cf. Agresti Table 6.4)
          M yes                            M no
      A          notA                  A          notA
C     910.38317  3.616831        C     538.61683  42.38317
notC  44.61683   1.3831699       notC  455.38317  279.61683
Both tables give an odds ratio of about 7.80, e.g. (910.38317*1.3831699)/(44.61683*3.616831), matching Exp[2.0545].

-----
part b. I created the individual data in my editor, and have placed the file acmind.dat in the HW section (i.e. where the HW data sets reside). There are three columns A C M (1=yes, 0=no). I then cut-and-paste that text file into Minitab.
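If you would rather not type the individual data by hand, the rows can be generated from the eight cell counts. What follows is only a minimal Python sketch, not part of the Minitab/SAS solution: the names counts, expand and pearson are made up for illustration, and the coding of the GENMOD levels (1 = yes, 2 = no) is an assumption, checked only against the marginal 2x2 tables. The Pearson correlation of two 0/1 columns is the phi coefficient.

```python
import math

# Cell counts keyed by (a, c, m), taken from the "count" column of the
# GENMOD observation statistics. Assumed coding: level 1 = yes, level 2 = no.
counts = {
    (1, 1, 1): 911, (1, 1, 2): 538,
    (1, 2, 1): 44,  (1, 2, 2): 456,
    (2, 1, 1): 3,   (2, 1, 2): 43,
    (2, 2, 1): 2,   (2, 2, 2): 279,
}


def expand(cell_counts):
    """Return one (A, C, M) row per respondent, recoded to 1 = yes, 0 = no."""
    rows = []
    for (a, c, m), n in cell_counts.items():
        rows.extend([(2 - a, 2 - c, 2 - m)] * n)
    return rows


def pearson(x, y):
    """Pearson correlation; for two 0/1 variables this is the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


rows = expand(counts)
A, C, M = zip(*rows)
phi_ac = pearson(A, C)
phi_am = pearson(A, M)
phi_cm = pearson(C, M)
print(len(rows), round(phi_ac, 3), round(phi_am, 3), round(phi_cm, 3))
```

Writing rows out one respondent per line would give a text file laid out like the three-column file described above.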
The 3x3 correlation matrix for these survey data is (phi coeffs)
MTB > corr c1 c2 c3
Correlations: A, C, M
       A      C
C  0.445
M  0.337  0.531
Note that the phi coeffs don't correspond exactly to the strength of the associations as measured by the odds ratios (e.g. AM has the largest odds ratio but the smallest phi coeff).

Now you didn't have to go through the pain of creating the indiv data to get these correlations if you remembered the relation between the chi-square statistic and the phi coefficient, phi = sqrt(chi-square/n). For example
MTB > table c1 c2;
SUBC> chisq.
Tabulated Statistics: A, C
Rows: A   Columns: C
         0      1    All
0      281     46    327
1      500   1449   1949
All    781   1495   2276
Chi-Square = 451.404, DF = 1, P-Value = 0.000
MTB > let k1 = sqrt(451.4/2276)
MTB > print k1
K1 0.445343

MTB > table c1 c3;
SUBC> chisq.
Tabulated Statistics: A, M
Rows: A   Columns: M
         0      1    All
0      322      5    327
1      994    955   1949
All   1316    960   2276
Chi-Square = 258.733, DF = 1, P-Value = 0.000
MTB > let k2 = sqrt(258.7/2276)
MTB > print k2
K2 0.337141

MTB > table c2 c3;
SUBC> chisq.
Rows: C   Columns: M
         0      1    All
0      735     46    781
1      581    914   1495
All   1316    960   2276
Chi-Square = 642.035, DF = 1, P-Value = 0.000
MTB > let k3 = sqrt(642.035/2276)
MTB > print k3
K3 0.531121

Now recall what a partial correlation coeff is supposed to tell you. Corr(AB.C) is interpreted as the correlation between A and B with the value of C held constant, and this relation is presumed to be constant across the range of C. That should sound a lot like a measure of conditional association for a partial table. For example, in the AC, AM, CM model the partial AC tables at both levels of M (yes/no) are constrained to have the same odds-ratio.

With partcorr[rxy, rxz, ryz] = (rxy - rxz*ryz)/Sqrt[(1 - rxz^2)*(1 - ryz^2)], the partial correlations for our 3x3 correlation matrix are (if I did the arith correctly)
AC.M  partcorr[.445, .337, .531]  0.333481
AM.C  partcorr[.337, .445, .531]  0.132708
CM.A  partcorr[.531, .445, .337]  0.45192
You do see that these conditional associations are nonzero but don't perfectly correspond to the log-linear analysis.
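The partial-correlation arithmetic is easy to check by machine. Here is a minimal Python sketch of the usual first-order partial correlation formula; the function name partial_corr and the variable names are my own, chosen to mirror the argument order of the partcorr calls above.

```python
import math


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Phi coefficients from the Minitab correlation matrix.
r_ac, r_am, r_cm = 0.445, 0.337, 0.531

ac_m = partial_corr(r_ac, r_am, r_cm)  # A and C, holding M constant
am_c = partial_corr(r_am, r_ac, r_cm)  # A and M, holding C constant
cm_a = partial_corr(r_cm, r_ac, r_am)  # C and M, holding A constant

print(f"AC.M {ac_m:.6f}  AM.C {am_c:.6f}  CM.A {cm_a:.6f}")
```

The first argument is always the correlation of interest; the other two are each variable's correlation with the one being held constant, so their order does not matter.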
Without much effort you could list a number of advantages of the log-linear analysis over the partial correlations.

--------------------------------------------------------
PROBLEM 10
I constructed a table using the algebraic relations that hold for any n. The 2x2 Table has cell counts {n11, n12, n21, n22}. Let
oddsratio[n11_, n12_, n21_, n22_] := n11*(n22/(n21*n12))
phi[n11_, n12_, n21_, n22_] := -(n12*n21 - n11*n22)/
    Sqrt[(n11 + n21)*(n12 + n22)*(n11 + n12)*(n21 + n22)]
The base 2x2 table has cell counts {n,n,n,n}. Add a fraction, f, of n to the diagonal counts and subtract it from the off-diagonal counts, giving {n(1+f), n(1-f), n(1-f), n(1+f)}. Then odds-ratio = ((1+f)/(1-f))^2 and phi = f, whatever the value of n. For each value of f the table below gives an example of the resulting 2x2 table with base table {100,100,100,100}.

f     odds-ratio   Log(odds-ratio)  Phi   Example 2x2 Table
-0.9  0.00277008   -5.88888         -0.9  {10, 190, 190, 10}
-0.8  0.0123457    -4.39445         -0.8  {20, 180, 180, 20}
-0.7  0.0311419    -3.4692          -0.7  {30, 170, 170, 30}
-0.6  0.0625       -2.77259         -0.6  {40, 160, 160, 40}
-0.5  0.111111     -2.19722         -0.5  {50, 150, 150, 50}
-0.4  0.183673     -1.6946          -0.4  {60, 140, 140, 60}
-0.3  0.289941     -1.23808         -0.3  {70, 130, 130, 70}
-0.2  0.444444     -0.81093         -0.2  {80, 120, 120, 80}
-0.1  0.669421     -0.401341        -0.1  {90, 110, 110, 90}
0     1.           0.               0.    {100, 100, 100, 100}
0.1   1.49383      0.401341         0.1   {110, 90, 90, 110}
0.2   2.25         0.81093          0.2   {120, 80, 80, 120}
0.3   3.44898      1.23808          0.3   {130, 70, 70, 130}
0.4   5.44444      1.6946           0.4   {140, 60, 60, 140}
0.5   9.           2.19722          0.5   {150, 50, 50, 150}
0.6   16.          2.77259          0.6   {160, 40, 40, 160}
0.7   32.1111      3.4692           0.7   {170, 30, 30, 170}
0.8   81.          4.39445          0.8   {180, 20, 20, 180}
0.9   361.         5.88888          0.9   {190, 10, 10, 190}

=====================================================================
END HW6
END 257