Solutions HOMEWORK 5    Ed257    D Rogosa    May 4, 2005

1. SAS proc Freq solution for inference about proportion
--------
data propx;
  input guns count @@;
cards;
1 393 2 583
;
proc freq order=data;
  weight count;
  tables guns / binomial alpha = .1;
  exact binomial / alpha = .1;
run;

The SAS System
The FREQ Procedure

                               Cumulative    Cumulative
guns    Frequency    Percent    Frequency       Percent
---------------------------------------------------------
   1          393      40.27          393         40.27
   2          583      59.73          976        100.00

Binomial Proportion for guns = 1
-------------------------------------
Proportion (P)                0.4027
ASE                           0.0157
90% Lower Conf Limit          0.3768
90% Upper Conf Limit          0.4285

Exact Conf Limits
90% Lower Conf Limit          0.3766
90% Upper Conf Limit          0.4292

Test of H0: Proportion = 0.5

ASE under H0                  0.0160
Z                            -6.0818
One-sided Pr <  Z             <.0001
Two-sided Pr > |Z|            <.0001

Exact Test
One-sided Pr <=  P            6.460E-10
Two-sided = 2 * One-sided     1.292E-09

Sample Size = 976

=========================================================

-----------------------------
PROBLEM 2
-----------------------------
Note: the donner.dat file has a first row composed of variable names, which of course are not numbers (your editor can compensate).

SAS commands for parts a,b,c: first is a proc logistic for the full data set (including iplots to obtain plots of deviance residuals), followed by separate proc logistic runs for males and females.
data donnerfull;
  infile 'E:\donnernumbers.dat';
  input age male survival;
run;
proc logistic data=donnerfull descending;
  model survival = age male / influence iplots;
  output out=pred p=phat lower=lcl upper=ucl;
run;
proc print data=pred;
  title2 'Predicted Probabilities and 95% Confidence Limits';
run;

data donnermale;
  infile 'E:\donnermale.dat';
  input age survival;
run;
proc logistic data=donnermale descending;
  model survival = age;
  output out=predm p=phat lower=lcl upper=ucl;
run;
proc print data=predm;
  title2 'Male Predicted Probabilities and 95% Confidence Limits';
run;

data donnerfemale;
  infile 'E:\donnerfemale.dat';
  input age survival;
run;
proc logistic data=donnerfemale descending;
  model survival = age;
  output out=predf p=phat lower=lcl upper=ucl;
run;
proc print data=predf;
  title2 'Female Predicted Probabilities and 95% Confidence Limits';
run;

part a
---------
The probability of survival is given in the output for the full model under 'Predicted Probabilities and 95% Confidence Limits': phat, with the lower and upper limits of a 95% confidence interval, lcl and ucl. The odds of survival are simply phat/(1-phat).

The index plot of the deviance residuals is given by SAS as the plot of RESDEV vs INDEX (case number). Note that in the data as I input them the first 15 observations are all females. There are no extremely large residuals; the pattern seen may be a result of ordering on age. One would want to plot the residuals vs age as a follow-up.

Comparing the logistic regression for the whole Donner party with the separate male and female fits: here is a summary of parameter estimates and fits that I constructed from the SAS output. The parameter estimates differ, and especially for females (n=15) the fits differ a good bit from the overall analysis, likely because it is hard to fit a model to 15 observations.
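As a rough cross-check of part a outside SAS, the fitted probabilities and odds can be recomputed by hand from the overall-fit coefficients. The sketch below is in Python rather than SAS; the function names are made up for illustration, and the coefficients are the rounded values printed in the SAS output, so agreement with phat should only be expected to about three decimals.

```python
import math

# Coefficients of the overall fit (intercept, age, male), copied from the
# "Analysis of Maximum Likelihood Estimates" table in the SAS output.
# They are rounded to four decimals, so results are approximate.
B0, B_AGE, B_MALE = 3.2304, -0.0782, -1.5973


def survival_prob(age, male):
    """Fitted P(survival) from the logit model with age and male as predictors."""
    eta = B0 + B_AGE * age + B_MALE * male  # linear predictor (log odds)
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Odds corresponding to probability p, i.e. p/(1-p)."""
    return p / (1.0 - p)


# Illustrative cases: obs 1 (female, age 47) and obs 16 (male, age 30).
for age, male in [(47, 0), (30, 1)]:
    p = survival_prob(age, male)
    print(f"age={age} male={male}  phat={p:.4f}  odds={odds(p):.4f}")
```

The printed phat values can be compared with the phat column of the full-model predicted probabilities for the same cases.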
------------
all
                              Standard
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1      3.2304     1.3870        5.4248        0.0199
age           1     -0.0782     0.0373        4.3988        0.0360
male          1     -1.5973     0.7555        4.4699        0.0345

male
                              Standard
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1      0.3183     1.1310        0.0792        0.7784
age           1     -0.0325     0.0353        0.8480        0.3571

female
                              Standard
Parameter    DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept     1      7.2458     3.2050        5.1113        0.0238
age           1     -0.1941     0.0874        4.9289        0.0264

Predicted Probabilities and 95% Confidence Limits

                                  overall    separate gender
Obs    age    male    survival       phat       phat
  1     47      0        0        0.39051    0.13296
  2     25      0        0        0.78165    0.91639
  3     50      0        0        0.33631    0.07891
  4     45      0        0        0.42831    0.18439
  5     45      0        0        0.42831    0.18439
  6     40      0        1        0.52554    0.37365
  7     25      0        1        0.78165    0.91639
  8     21      0        1        0.83035    0.95971
  9     24      0        1        0.79470    0.93011
 10     32      0        1        0.67434    0.73806
 11     32      0        1        0.67434    0.73806
 12     15      0        1        0.88669    0.98707
 13     23      0        1        0.80717    0.94172
 14     22      0        1        0.81905    0.95150
 15     20      0        1        0.84109    0.96658
 16     30      1        0        0.32894    0.34164
 17     25      1        0        0.42019    0.37905
 18     25      1        0        0.42019    0.37905
 19     15      1        0        0.61303    0.45789
 20     25      1        0        0.42019    0.37905
 21     35      1        0        0.24899    0.30611
 22     25      1        0        0.42019    0.37905
 23     23      1        0        0.45870    0.39445
 24     25      1        0        0.42019    0.37905
 25     30      1        0        0.32894    0.34164
 26     24      1        0        0.43936    0.38672
 27     57      1        0        0.05601    0.17758
 28     60      1        0        0.04483    0.16379
 29     28      1        0        0.36434    0.35640
 30     40      1        0        0.18317    0.27274
 31     30      1        0        0.32894    0.34164
 32     23      1        0        0.45870    0.39445
 33     28      1        0        0.36434    0.35640
 34     65      1        0        0.03076    0.14274
 35     62      1        0        0.03859    0.15509
 36     18      1        1        0.55612    0.43383
 37     20      1        1        0.51725    0.41795
 38     28      1        1        0.36434    0.35640
 39     25      1        1        0.42019    0.37905
 40     46      1        1        0.12301    0.23584
 41     28      1        1        0.36434    0.35640
 42     32      1        1        0.29538    0.32719
 43     30      1        1        0.32894    0.34164
 44     40      1        1        0.18317    0.27274
 45     23      1        1        0.45870    0.39445

part b
-----------
From the overall fit the coefficient for male (or not) is -1.5973.
So the odds ratio for males (compared to females of the same age) is exp(-1.5973) = 0.202442; inverting, females have odds of survival about 5 times (4.94) the odds for a man of the same age. We can also construct a 95% confidence interval by taking the lower and upper endpoints of the confidence interval for the male coefficient, -1.5973 +/- 1.96*.7555, which are {-3.07808, -0.11652}. Then for the odds ratio the endpoints are {0.0460476, 0.890012} for males compared to females, or, inverting to obtain females compared to males, {21.7167, 1.12358}.

part c
---------
Comparing a woman 50 years old with a woman 20 years old, the estimated odds ratio is exp[-0.0782*(50 - 20)] = 0.09575. The odds of a 20-year-old woman surviving are about 10 times (10.4) as large as those of a 50-year-old woman. The endpoints of a 95% confidence interval are obtained from exp[30*{-0.0782 - 1.96*.0373, -0.0782 + 1.96*.0373}] = {0.0106815, 0.858336}, which does approach 1 at the upper end.

-------------------
SAS output for parts a,b,c

The SAS System
The LOGISTIC Procedure

Model Information
Data Set                       WORK.DONNERFULL
Response Variable              survival
Number of Response Levels      2
Number of Observations         45
Link Function                  Logit
Optimization Technique         Fisher's scoring

Response Profile
Ordered                     Total
  Value    survival     Frequency
      1    1                   20
      2    0                   25

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 63.827 57.256 SC 65.633 62.676 -2 Log L 61.827 51.256 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 10.5703 2 0.0051 Score 9.0965 2 0.0106 Wald 6.8627 2 0.0323 The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 3.2304 1.3870 5.4248 0.0199 age 1 -0.0782 0.0373 4.3988 0.0360 male 1 -1.5973 0.7555 4.4699 0.0345 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age 0.925 0.860 0.995 male 0.202 0.046 0.890 Association of Predicted Probabilities and Observed Responses Percent Concordant 73.0 Somers' D 0.492 Percent Discordant 23.8 Gamma 0.508 Percent Tied 3.2 Tau-a 0.248 Pairs 500 c 0.746 The LOGISTIC Procedure Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.33) (1 unit = 0.26) Number age male Value -8 -4 0 2 4 6 8 Value -8 -4 0 2 4 6 8 1 47.0000 0 -0.8004 | * | | -0.9951 | * | | 2 25.0000 0 -1.8920 | * | | -1.7445 | * | | 3 50.0000 0 -0.7118 | * | | -0.9055 | * | | 4 45.0000 0 -0.8656 | * | | -1.0575 | * | | 5 45.0000 0 -0.8656 | * | | -1.0575 | * | | 6 40.0000 0 0.9502 | | * | 1.1343 | | * | 7 25.0000 0 0.5285 | | * | 0.7019 | | * | 8 21.0000 0 0.4520 | |* | 0.6098 | | * | 9 24.0000 0 0.5083 | | * | 0.6779 | | * | 10 32.0000 0 0.6949 | | * | 0.8877 | | * | 11 32.0000 0 0.6949 | | * | 0.8877 | | * | 12 15.0000 0 0.3575 | |* | 0.4904 | | * | 13 23.0000 0 0.4888 | |* | 0.6546 | | * | 14 22.0000 0 0.4700 | |* | 0.6318 | | * | 15 20.0000 0 0.4347 | |* | 0.5883 | | * | 16 30.0000 1.0000 -0.7001 | * | | -0.8932 | * | | 17 25.0000 1.0000 -0.8513 | * | | -1.0441 | * | | 18 25.0000 1.0000 -0.8513 | * | | -1.0441 | * | | 19 15.0000 1.0000 -1.2586 | * | | -1.3780 | * | | 20 25.0000 1.0000 -0.8513 | * | | -1.0441 | * | | 21 35.0000 1.0000 -0.5758 | * | | -0.7567 | * | | 22 25.0000 1.0000 -0.8513 | * | | -1.0441 | * | | 23 23.0000 
1.0000 -0.9205 | * | | -1.1080 | * | | 24 25.0000 1.0000 -0.8513 | * | | -1.0441 | * | | 25 30.0000 1.0000 -0.7001 | * | | -0.8932 | * | | 26 24.0000 1.0000 -0.8852 | * | | -1.0758 | * | | 27 57.0000 1.0000 -0.2436 | *| | -0.3395 | *| | 28 60.0000 1.0000 -0.2166 | *| | -0.3029 | *| | 29 28.0000 1.0000 -0.7571 | * | | -0.9519 | * | | 30 40.0000 1.0000 -0.4735 | *| | -0.6361 | * | | 31 30.0000 1.0000 -0.7001 | * | | -0.8932 | * | | 32 23.0000 1.0000 -0.9205 | * | | -1.1080 | * | | 33 28.0000 1.0000 -0.7571 | * | | -0.9519 | * | | 34 65.0000 1.0000 -0.1782 | *| | -0.2500 | *| | 35 62.0000 1.0000 -0.2003 | *| | -0.2805 | *| | 36 18.0000 1.0000 0.8934 | | * | 1.0833 | | * | 37 20.0000 1.0000 0.9661 | | * | 1.1482 | | * | 38 28.0000 1.0000 1.3209 | | * | 1.4210 | | * | 39 25.0000 1.0000 1.1747 | | * | 1.3168 | | * | 40 46.0000 1.0000 2.6701 | | *| 2.0472 | | *| 41 28.0000 1.0000 1.3209 | | * | 1.4210 | | * | 42 32.0000 1.0000 1.5445 | | * | 1.5617 | | * | 43 30.0000 1.0000 1.4283 | | * | 1.4912 | | * | The LOGISTIC Procedure Regression Diagnostics Hat Matrix Diagonal Intercept Case (1 unit = 0.01) DfBeta (1 unit = 0.07) Number Value 0 2 4 6 8 12 16 Value -8 -4 0 2 4 6 8 1 0.1438 | * | 0.0894 | |* | 2 0.0780 | * | -0.4620 | * | | 3 0.1626 | *| 0.1193 | | * | 4 0.1310 | * | 0.0634 | |* | 5 0.1310 | * | 0.0634 | |* | 6 0.1025 | * | 0.0210 | * | 7 0.0780 | * | 0.1291 | | * | 8 0.0810 | * | 0.1255 | | * | 9 0.0787 | * | 0.1289 | | * | 10 0.0786 | * | 0.1093 | | * | 11 0.0786 | * | 0.1093 | | * | 12 0.0839 | * | 0.1091 | | * | 13 0.0794 | * | 0.1283 | | * | 14 0.0802 | * | 0.1271 | | * | 15 0.0817 | * | 0.1235 | | * | 16 0.0387 | * | 0.0268 | * | 17 0.0433 | * | -0.0396 | *| | 18 0.0433 | * | -0.0396 | *| | 19 0.0928 | * | -0.2885 | * | | 20 0.0433 | * | -0.0396 | *| | 21 0.0453 | * | 0.0643 | |* | 22 0.0433 | * | -0.0396 | *| | 23 0.0492 | * | -0.0760 | *| | 24 0.0433 | * | -0.0396 | *| | 25 0.0387 | * | 0.0268 | * | 26 0.0459 | * | -0.0570 | *| | 27 0.0721 | * | 0.0596 | |* 
| 28 0.0694 | * | 0.0524 | |* | 29 0.0389 | * | 0.00415 | * | 30 0.0567 | * | 0.0806 | |* | 31 0.0387 | * | 0.0268 | * | 32 0.0492 | * | -0.0760 | *| | 33 0.0389 | * | 0.00415 | * | 34 0.0628 | * | 0.0413 | |* | 35 0.0670 | * | 0.0478 | |* | 36 0.0735 | * | 0.1561 | | * | 37 0.0623 | * | 0.1331 | | * | 38 0.0389 | * | -0.00724 | * | 39 0.0433 | * | 0.0546 | |* | 40 0.0685 | * | -0.5810 |* | | 41 0.0389 | * | -0.00724 | * | 42 0.0403 | * | -0.1070 | *| | 43 0.0387 | * | -0.0547 | *| | The LOGISTIC Procedure Regression Diagnostics age male Case DfBeta (1 unit = 0.08) DfBeta (1 unit = 0.06) Number Value -8 -4 0 2 4 6 8 Value -8 -4 0 2 4 6 8 1 -0.2279 | * | | 0.1472 | | * | 2 0.2718 | | * | 0.4770 | | *| 3 -0.2456 | * | | 0.1165 | | * | 4 -0.2096 | * | | 0.1699 | | * | 5 -0.2096 | * | | 0.1699 | | * | 6 0.1262 | | * | -0.2110 | * | | 7 -0.0759 | *| | -0.1332 | * | | 8 -0.0867 | *| | -0.1119 | * | | 9 -0.0797 | *| | -0.1278 | * | | 10 -0.0211 | * | -0.1719 | * | | 11 -0.0211 | * | -0.1719 | * | | 12 -0.0858 | *| | -0.0831 | *| | 13 -0.0828 | *| | -0.1224 | * | | 14 -0.0851 | *| | -0.1171 | * | | 15 -0.0877 | *| | -0.1068 | * | | 16 -0.0297 | * | -0.0847 | *| | 17 0.0438 | |* | -0.0847 | *| | 18 0.0438 | |* | -0.0847 | *| | 19 0.3194 | | * | -0.0567 | *| | 20 0.0438 | |* | -0.0847 | *| | 21 -0.0712 | *| | -0.0787 | *| | 22 0.0438 | |* | -0.0847 | *| | 23 0.0841 | |* | -0.0825 | *| | 24 0.0438 | |* | -0.0847 | *| | 25 -0.0297 | * | -0.0847 | *| | 26 0.0631 | |* | -0.0838 | *| | 27 -0.0660 | *| | -0.0327 | *| | 28 -0.0581 | *| | -0.0277 | * | 29 -0.00459 | * | -0.0855 | *| | 30 -0.0892 | *| | -0.0692 | *| | 31 -0.0297 | * | -0.0847 | *| | 32 0.0841 | |* | -0.0825 | *| | 33 -0.00459 | * | -0.0855 | *| | 34 -0.0457 | *| | -0.0206 | * | 35 -0.0529 | *| | -0.0247 | * | 36 -0.1728 | * | | 0.0559 | |* | 37 -0.1473 | * | | 0.0713 | |* | 38 0.00801 | * | 0.1492 | | * | 39 -0.0605 | *| | 0.1169 | | * | 40 0.6433 | | *| 0.3971 | | * | 41 0.00801 | * | 0.1492 | | * | 42 0.1184 | |* | 
0.1976 | | * | 43 0.0605 | |* | 0.1727 | | * | The LOGISTIC Procedure Regression Diagnostics Confidence Interval Displacement C Confidence Interval Displacement CBar Case (1 unit = 0.04) (1 unit = 0.03) Number Value 0 2 4 6 8 12 16 Value 0 2 4 6 8 12 16 1 0.1256 | * | 0.1076 | * | 2 0.3285 | * | 0.3029 | * | 3 0.1175 | * | 0.0984 | * | 4 0.1300 | * | 0.1129 | * | 5 0.1300 | * | 0.1129 | * | 6 0.1149 | * | 0.1032 | * | 7 0.0256 | * | 0.0236 | * | 8 0.0196 | * | 0.0180 | * | 9 0.0239 | * | 0.0221 | * | 10 0.0447 | * | 0.0412 | * | 11 0.0447 | * | 0.0412 | * | 12 0.0128 |* | 0.0117 |* | 13 0.0224 | * | 0.0206 | * | 14 0.0209 | * | 0.0193 | * | 15 0.0183 | * | 0.0168 | * | 16 0.0205 | * | 0.0197 | * | 17 0.0343 | * | 0.0328 | * | 18 0.0343 | * | 0.0328 | * | 19 0.1786 | * | 0.1620 | * | 20 0.0343 | * | 0.0328 | * | 21 0.0165 |* | 0.0157 |* | 22 0.0343 | * | 0.0328 | * | 23 0.0461 | * | 0.0438 | * | 24 0.0343 | * | 0.0328 | * | 25 0.0205 | * | 0.0197 | * | 26 0.0396 | * | 0.0377 | * | 27 0.00497 |* | 0.00461 |* | 28 0.00376 |* | 0.00350 |* | 29 0.0241 | * | 0.0232 | * | 30 0.0143 |* | 0.0135 |* | 31 0.0205 | * | 0.0197 | * | 32 0.0461 | * | 0.0438 | * | 33 0.0241 | * | 0.0232 | * | 34 0.00227 |* | 0.00213 |* | 35 0.00309 |* | 0.00288 |* | 36 0.0684 | * | 0.0634 | * | 37 0.0662 | * | 0.0621 | * | 38 0.0734 | * | 0.0706 | * | 39 0.0653 | * | 0.0624 | * | 40 0.5625 | *| 0.5240 | *| 41 0.0734 | * | 0.0706 | * | 42 0.1045 | * | 0.1002 | * | 43 0.0854 | * | 0.0821 | * | The LOGISTIC Procedure Regression Diagnostics Delta Deviance Delta Chi-Square Case (1 unit = 0.29) (1 unit = 0.48) Number Value 0 2 4 6 8 12 16 Value 0 2 4 6 8 12 16 1 1.0978 | * | 0.7483 | * | 2 3.3462 | * | 3.8827 | * | 3 0.9182 | * | 0.6051 | * | 4 1.2312 | * | 0.8621 | * | 5 1.2312 | * | 0.8621 | * | 6 1.3898 | * | 1.0060 | * | 7 0.5163 | * | 0.3030 | * | 8 0.3898 | * | 0.2223 |* | 9 0.4816 | * | 0.2804 | * | 10 0.8292 | * | 0.5241 | * | 11 0.8292 | * | 0.5241 | * | 12 0.2522 | * | 0.1395 |* | 13 0.4491 | 
* | 0.2595 | * | 14 0.4185 | * | 0.2402 | * | 15 0.3629 | * | 0.2058 |* | 16 0.8175 | * | 0.5099 | * | 17 1.1229 | * | 0.7575 | * | 18 1.1229 | * | 0.7575 | * | 19 2.0608 | * | 1.7462 | * | 20 1.1229 | * | 0.7575 | * | 21 0.5884 | * | 0.3473 | * | 22 1.1229 | * | 0.7575 | * | 23 1.2714 | * | 0.8913 | * | 24 1.1229 | * | 0.7575 | * | 25 0.8175 | * | 0.5099 | * | 26 1.1951 | * | 0.8214 | * | 27 0.1199 |* | 0.0639 |* | 28 0.0952 |* | 0.0504 |* | 29 0.9294 | * | 0.5963 | * | 30 0.4181 | * | 0.2377 |* | 31 0.8175 | * | 0.5099 | * | 32 1.2714 | * | 0.8913 | * | 33 0.9294 | * | 0.5963 | * | 34 0.0646 |* | 0.0339 |* | 35 0.0816 |* | 0.0430 |* | 36 1.2369 | * | 0.8615 | * | 37 1.3805 | * | 0.9953 | * | 38 2.0899 | * | 1.8153 | * | 39 1.7965 | * | 1.4423 | * | 40 4.7150 | *| 7.6536 | *| 41 2.0899 | * | 1.8153 | * | 42 2.5392 | * | 2.4857 | * | 43 2.3059 | * | 2.1222 | * | The LOGISTIC Procedure Regression Diagnostics Pearson Residual Deviance Residual Covariates Case (1 unit = 0.33) (1 unit = 0.26) Number age male Value -8 -4 0 2 4 6 8 Value -8 -4 0 2 4 6 8 44 40.0000 1.0000 2.1118 | | * | 1.8425 | | * | 45 23.0000 1.0000 1.0863 | | * | 1.2485 | | * | Regression Diagnostics Hat Matrix Diagonal Intercept Case (1 unit = 0.01) DfBeta (1 unit = 0.07) Number Value 0 2 4 6 8 12 16 Value -8 -4 0 2 4 6 8 44 0.0567 | * | -0.3594 | * | | 45 0.0492 | * | 0.0896 | |* | Regression Diagnostics age male Case DfBeta (1 unit = 0.08) DfBeta (1 unit = 0.06) Number Value -8 -4 0 2 4 6 8 Value -8 -4 0 2 4 6 8 44 0.3980 | | * | 0.3086 | | * | 45 -0.0992 | *| | 0.0974 | | * | Regression Diagnostics Confidence Interval Displacement C Confidence Interval Displacement CBar Case (1 unit = 0.04) (1 unit = 0.03) Number Value 0 2 4 6 8 12 16 Value 0 2 4 6 8 12 16 44 0.2842 | * | 0.2680 | * | 45 0.0642 | * | 0.0611 | * | Regression Diagnostics Delta Deviance Delta Chi-Square Case (1 unit = 0.29) (1 unit = 0.48) Number Value 0 2 4 6 8 12 16 Value 0 2 4 6 8 12 16 44 3.6628 | * | 4.7276 | * | 45 1.6198 | * | 
1.2411 | * |

The LOGISTIC Procedure

[Index plots from the IPLOTS option, each against Case Number (INDEX), 0 to 45, in this order: RESCHI (Pearson Residual), RESDEV (Deviance Residual), H (Hat Diagonal), DFBETA0 (Intercept DfBeta), DFBETA1 (age DfBeta), DFBETA2 (male DfBeta), C (Confidence Interval Displacement C), CBAR (Confidence Interval Displacement CBar), DIFDEV (Delta Deviance), DIFCHISQ (Delta Chi-Square).]

Predicted Probabilities and 95% Confidence Limits

Obs    age    male    survival    _LEVEL_       phat        lcl        ucl
  1     47      0        0           1       0.39051    0.12256    0.74612
  2     25      0        0           1       0.78165    0.48758    0.93088
  3     50      0        0           1       0.33631    0.08687    0.72965
  4     45      0        0           1       0.42831    0.15156    0.75857
  5     45      0        0           1       0.42831    0.15156    0.75857
  6     40      0        1           1       0.52554    0.23964    0.79562
  7     25      0        1           1       0.78165    0.48758    0.93088
  8     21      0        1           1       0.83035    0.52549    0.95582
  9     24      0        1           1       0.79470    0.49808    0.93789
 10     32      0        1           1       0.67434    0.39067    0.86992
 11     32      0        1           1       0.67434    0.39067    0.86992
 12     15      0        1           1       0.88669    0.56616    0.97913
 13     23      0        1           1       0.80717    0.50787    0.94438
 14     22      0        1           1       0.81905    0.51698    0.95035
 15     20      0        1           1       0.84109    0.53342    0.96079
 16     30      1        0           1       0.32894    0.17747    0.52686
 17     25      1        0           1       0.42019    0.24082    0.62346
 18     25      1        0           1       0.42019    0.24082    0.62346
 19     15      1        0           1       0.61303    0.31741    0.84367
 20     25      1        0           1       0.42019    0.24082    0.62346
 21     35      1        0           1       0.24899    0.11216    0.46527
 22     25      1        0           1       0.42019    0.24082    0.62346
 23     23      1        0           1       0.45870    0.26153    0.66971
 24     25      1        0           1       0.42019    0.24082    0.62346
 25     30      1        0           1       0.32894    0.17747    0.52686
 26     24      1        0           1       0.43936    0.25156    0.64628
 27     57      1        0           1       0.05601    0.00598    0.36913
 28     60      1        0           1       0.04483    0.00386    0.36261
 29     28      1        0           1       0.36434    0.20429    0.56132
 30     40      1        0           1       0.18317    0.06288    0.42837
 31     30      1        0           1       0.32894    0.17747    0.52686
 32     23      1        0           1       0.45870    0.26153    0.66971
 33     28      1        0           1       0.36434    0.20429    0.56132
 34     65      1        0           1       0.03076    0.00184    0.35282
 35     62      1        0           1       0.03859    0.00287    0.35855
 36     18      1        1           1       0.55612    0.30063    0.78503
 37     20      1        1           1       0.51725    0.28694    0.74046
 38     28      1        1           1       0.36434    0.20429    0.56132
 39     25      1        1           1       0.42019    0.24082    0.62346
 40     46      1        1           1       0.12301    0.02859    0.40062
 41     28      1        1           1       0.36434    0.20429    0.56132
 42     32      1        1           1       0.29538    0.15031    0.49834
 43     30      1        1           1       0.32894    0.17747    0.52686
 44     40      1        1           1       0.18317    0.06288    0.42837
 45     23      1        1           1       0.45870    0.26153    0.66971

Predicted
Probabilities and 95% Confidence Limits The LOGISTIC Procedure Model Information Data Set WORK.DONNERMALE Response Variable survival Number of Response Levels 2 Number of Observations 30 Link Function Logit Optimization Technique Fisher's scoring Response Profile Ordered Total Value survival Frequency 1 1 10 2 0 20 Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 40.191 41.219 SC 41.592 44.021 -2 Log L 38.191 37.219 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 0.9719 1 0.3242 Score 0.8953 1 0.3440 Wald 0.8480 1 0.3571 Predicted Probabilities and 95% Confidence Limits The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 0.3183 1.1310 0.0792 0.7784 age 1 -0.0325 0.0353 0.8480 0.3571 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age 0.968 0.903 1.037 Association of Predicted Probabilities and Observed Responses Percent Concordant 52.0 Somers' D 0.115 Percent Discordant 40.5 Gamma 0.124 Percent Tied 7.5 Tau-a 0.053 Pairs 200 c 0.558 Male Predicted Probabilities and 95% Confidence Limits Obs age survival _LEVEL_ phat lcl ucl 1 30 0 1 0.34164 0.19360 0.52867 2 25 0 1 0.37905 0.20747 0.58736 3 25 0 1 0.37905 0.20747 0.58736 4 15 0 1 0.45789 0.18777 0.75527 5 25 0 1 0.37905 0.20747 0.58736 6 35 0 1 0.30611 0.15961 0.50611 7 25 0 1 0.37905 0.20747 0.58736 8 23 0 1 0.39445 0.20731 0.61868 9 25 0 1 0.37905 0.20747 0.58736 10 30 0 1 0.34164 0.19360 0.52867 11 24 0 1 0.38672 0.20773 0.60262 12 57 0 1 0.17758 0.02798 0.61823 13 60 0 1 0.16379 0.02108 0.64052 14 28 0 1 0.35640 0.20182 0.54807 15 40 0 1 0.27274 0.11785 0.51285 16 30 0 1 0.34164 0.19360 0.52867 17 23 0 1 0.39445 0.20731 0.61868 18 28 0 1 0.35640 0.20182 0.54807 19 65 0 1 0.14274 0.01303 0.67743 20 62 0 1 0.15509 0.01741 0.65537 21 18 1 1 0.43383 0.19749 0.70465 22 20 1 1 
0.41795 0.20267 0.66980 23 28 1 1 0.35640 0.20182 0.54807 24 25 1 1 0.37905 0.20747 0.58736 25 46 1 1 0.23584 0.07448 0.54205 26 28 1 1 0.35640 0.20182 0.54807 27 32 1 1 0.32719 0.18195 0.51532 28 30 1 1 0.34164 0.19360 0.52867 29 40 1 1 0.27274 0.11785 0.51285 30 23 1 1 0.39445 0.20731 0.61868 FeMale Predicted Probabilities and 95% Confidence Limits The LOGISTIC Procedure Model Information Data Set WORK.DONNERFEMALE Response Variable survival Number of Response Levels 2 Number of Observations 15 Link Function Logit Optimization Technique Fisher's scoring Response Profile Ordered Total Value survival Frequency 1 1 10 2 0 5 Model Convergence Status Convergence criterion (GCONV=1E-8) satisfied. Model Fit Statistics Intercept Intercept and Criterion Only Covariates AIC 21.095 14.127 SC 21.803 15.543 -2 Log L 19.095 10.127 Testing Global Null Hypothesis: BETA=0 Test Chi-Square DF Pr > ChiSq Likelihood Ratio 8.9680 1 0.0027 Score 7.8749 1 0.0050 Wald 4.9289 1 0.0264 FeMale Predicted Probabilities and 95% Confidence Limits The LOGISTIC Procedure Analysis of Maximum Likelihood Estimates Standard Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 7.2458 3.2050 5.1113 0.0238 age 1 -0.1941 0.0874 4.9289 0.0264 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits age 0.824 0.694 0.978 Association of Predicted Probabilities and Observed Responses Percent Concordant 92.0 Somers' D 0.860 Percent Discordant 6.0 Gamma 0.878 Percent Tied 2.0 Tau-a 0.410 Pairs 50 c 0.930 Female Predicted Probabilities and 95% Confidence Limits Obs age survival _LEVEL_ phat lcl ucl 1 47 0 1 0.13296 0.01197 0.66007 2 25 0 1 0.91639 0.49885 0.99178 3 50 0 1 0.07891 0.00444 0.62226 4 45 0 1 0.18439 0.02255 0.68894 5 45 0 1 0.18439 0.02255 0.68894 6 40 1 1 0.37365 0.09199 0.77840 7 25 1 1 0.91639 0.49885 0.99178 8 21 1 1 0.95971 0.55578 0.99780 9 24 1 1 0.93011 0.51471 0.99405 10 32 1 1 0.73806 0.33965 0.93915 11 32 1 1 0.73806 0.33965 0.93915 12 15 1 1 0.98707 0.61857 0.99972 
 13     23        1           1       0.94172    0.52938    0.99571
 14     22        1           1       0.95150    0.54303    0.99692
 15     20        1           1       0.96658    0.56774    0.99843

---------------------------- end SAS output

part d
---------
For a change of pace let's do this in Minitab blogistic. We can also demonstrate that SAS and Minitab give the same results. The output below is for the donner.dat variables, with 3 blogistic runs.

base run from part a
MTB > blogist c3 = c1 c2

2 equivalent runs with interaction term
MTB > blogist c3 = c1 c2 c1*c2

MTB > let c4 = c1*c2
MTB > blogist c3 = c1 c2 c4

From the output for the base model in part a
Log-Likelihood = -25.628
For the model including the age*male interaction
Log-Likelihood = -23.673
So the drop-in-deviance test statistic is 2*(25.628 - 23.673) = 3.91; compare with a chi-square on 1 df (the .05 critical value is 3.84), so the interaction is just significant at .05. One can also look at the parameter estimate and its approximate standard error (e.g. the Wald test, Z = 1.71, P = 0.086), and from this the interaction is not quite significant.

--------------- Minitab output
----- 6/3/2001 12:29:45 AM -----
MTB > info Information on the Worksheet Column Count Name C1 45 age C2 45 male C3 45 survival MTB > descr c1-c3 Descriptive Statistics: age, male, survival Variable N Mean Median TrMean StDev SE Mean age 45 31.80 28.00 31.07 12.51 1.87 male 45 0.6667 1.0000 0.6829 0.4767 0.0711 survival 45 0.4444 0.0000 0.4390 0.5025 0.0749 Variable Minimum Maximum Q1 Q3 age 15.00 65.00 23.50 40.00 male 0.0000 1.0000 0.0000 1.0000 survival 0.0000 1.0000 0.0000 1.0000 MTB > blogist c3 = c1 c2 Binary Logistic Regression: survival versus age, male Link Function: Logit Response Information Variable Value Count survival 1 20 (Event) 0 25 Total 45 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 3.230 1.387 2.33 0.020 age -0.07820 0.03729 -2.10 0.036 0.92 0.86 0.99 male -1.5973 0.7555 -2.11 0.034 0.20 0.05 0.89 Log-Likelihood = -25.628 Test that all slopes are zero: G = 10.570, DF = 2, P-Value = 0.005 Goodness-of-Fit Tests Method Chi-Square DF P Pearson 24.353 25 0.499 Deviance 26.441 25 0.384 Hosmer-Lemeshow 10.952 8 0.204 Table of Observed and Expected Frequencies: (See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic) Group Value 1 2 3 4 5 6 7 8 9 10 Total 1 Obs 0 3 1 2 1 1 3 3 4 2 20 Exp 0.2 1.0 1.3 1.8 2.9 2.7 2.2 2.9 3.3 1.7 0 Obs 4 2 3 3 6 5 1 1 0 0 25 Exp 3.8 4.0 2.7 3.2 4.1 3.3 1.8 1.1 0.7 0.3 Total 4 5 4 5 7 6 4 4 4 2 45 Measures of Association: (Between the Response Variable and Predicted Probabilities) Pairs Number Percent Summary Measures Concordant 365 73.0% Somers' D 0.49 Discordant 119 23.8% Goodman-Kruskal Gamma 0.51 Ties 16 3.2% Kendall's Tau-a 0.25 Total 500 100.0% MTB > blogist c3 = c1 c2 c1*c2 Binary Logistic Regression: survival versus age, male Link Function: Logit Response Information Variable Value Count survival 1 20 (Event) 0 25 Total 45 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 7.246 3.205 2.26 0.024 age -0.19407 0.08742 -2.22 0.026 0.82 0.69 0.98 male -6.928 
3.399 -2.04 0.042 0.00 0.00 0.77 age*male 0.16160 0.09426 1.71 0.086 1.18 0.98 1.41 Log-Likelihood = -23.673 Test that all slopes are zero: G = 14.480, DF = 3, P-Value = 0.002 Goodness-of-Fit Tests Method Chi-Square DF P Pearson 20.781 24 0.652 Deviance 22.532 24 0.548 Hosmer-Lemeshow 5.169 7 0.639 Table of Observed and Expected Frequencies: (See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic) Group Value 1 2 3 4 5 6 7 8 9 Total 1 Obs 0 1 2 3 2 1 4 3 4 20 Exp 0.5 0.9 1.2 2.8 2.6 1.6 2.8 3.7 3.9 0 Obs 4 4 2 5 5 3 1 1 0 25 Exp 3.5 4.1 2.8 5.2 4.4 2.4 2.2 0.3 0.1 Total 4 5 4 8 7 4 5 4 4 45 Measures of Association: (Between the Response Variable and Predicted Probabilities) Pairs Number Percent Summary Measures Concordant 381 76.2% Somers' D 0.56 Discordant 103 20.6% Goodman-Kruskal Gamma 0.57 Ties 16 3.2% Kendall's Tau-a 0.28 Total 500 100.0% MTB > let c4 = c1*c2 MTB > blogist c3 = c1 c2 c4 Binary Logistic Regression: survival versus age, male, C4 Link Function: Logit Response Information Variable Value Count survival 1 20 (Event) 0 25 Total 45 Logistic Regression Table Odds 95% CI Predictor Coef SE Coef Z P Ratio Lower Upper Constant 7.246 3.205 2.26 0.024 age -0.19407 0.08742 -2.22 0.026 0.82 0.69 0.98 male -6.928 3.399 -2.04 0.042 0.00 0.00 0.77 C4 0.16160 0.09426 1.71 0.086 1.18 0.98 1.41 Log-Likelihood = -23.673 Test that all slopes are zero: G = 14.480, DF = 3, P-Value = 0.002 Goodness-of-Fit Tests Method Chi-Square DF P Pearson 20.781 24 0.652 Deviance 22.532 24 0.548 Hosmer-Lemeshow 5.169 7 0.639 Table of Observed and Expected Frequencies: (See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic) Group Value 1 2 3 4 5 6 7 8 9 Total 1 Obs 0 1 2 3 2 1 4 3 4 20 Exp 0.5 0.9 1.2 2.8 2.6 1.6 2.8 3.7 3.9 0 Obs 4 4 2 5 5 3 1 1 0 25 Exp 3.5 4.1 2.8 5.2 4.4 2.4 2.2 0.3 0.1 Total 4 5 4 8 7 4 5 4 4 45 Measures of Association: (Between the Response Variable and Predicted Probabilities) Pairs Number Percent Summary Measures Concordant 381 76.2% Somers' D 
0.56
Discordant      103     20.6%    Goodman-Kruskal Gamma    0.57
Ties             16      3.2%    Kendall's Tau-a          0.28
Total           500    100.0%

--------- end minitab output
======================================================================

--------------------------
PROBLEM 3
--------------------------
Runs for both the 2-predictor and 5-predictor models are shown below. The 5-predictor model reproduces NWK Table 14.9 (ver4), 14.15 (ver5).

deviance
  2 predictor    149.33
  5 predictor    114.99
  difference      34.3    compared to chi-sq 3df; significant (full model better)

----------------------------
options linesize = 80 pagesize=80;
data miller;
  infile 'F:\drr99\ed257\web\hw\miller.dat';
  input ncust house inc age compd stored;
run;
proc genmod data=miller;
  model ncust = compd stored / dist = poi link = log;
  title 'SAS poisson regression Miller lumber 2 predictors';
run;
proc genmod data=miller;
  model ncust = house inc age compd stored / dist = poi link = log;
  title 'SAS poisson regression Miller lumber 5 predictors';
run;
--------------------------------
SAS poisson regression Miller lumber 2 predictors

The GENMOD Procedure

Model Information
Description             Value
Data Set                WORK.MILLER
Distribution            POISSON
Link Function           LOG
Dependent Variable      NCUST
Observations Used       110

Criteria For Assessing Goodness Of Fit
Criterion               DF        Value    Value/DF
Deviance               107     149.3287      1.3956
Scaled Deviance        107     149.3287      1.3956
Pearson Chi-Square     107     135.7075      1.2683
Scaled Pearson X2      107     135.7075      1.2683
Log Likelihood           .    1880.8508           .

Analysis Of Parameter Estimates
Parameter    DF    Estimate    Std Err    ChiSquare    Pr>Chi
INTERCEPT     1      2.5951     0.1731     224.7588    0.0001
COMPD         1      0.1516     0.0253      35.8588    0.0001
STORED        1     -0.1091     0.0156      48.7346    0.0001
SCALE         0      1.0000     0.0000            .         .
NOTE: The scale parameter was held fixed.
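The deviance comparison summarized at the start of this problem can be checked numerically. The sketch below is in Python rather than SAS and uses only the standard library; the function name is made up for illustration, and the chi-square tail probability uses the closed form that holds for 3 df specifically, so it is not a general chi-square routine.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form valid for 3 df only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Deviances copied from the two GENMOD runs in this solution.
DEV_2_PRED = 149.3287  # model with compd, stored
DEV_5_PRED = 114.9854  # model with house, inc, age, compd, stored

drop = DEV_2_PRED - DEV_5_PRED  # drop in deviance from adding 3 predictors
p_value = chi2_sf_3df(drop)

print(f"drop in deviance = {drop:.4f} on 3 df, p = {p_value:.2e}")
```

The 3 df come from the three predictors (house, inc, age) added to the 2-predictor model.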
SAS poisson regression Miller lumber 5 predictors                  2

The GENMOD Procedure

Model Information

Description          Value
Data Set             WORK.MILLER
Distribution         POISSON
Link Function        LOG
Dependent Variable   NCUST
Observations Used    110

Criteria For Assessing Goodness Of Fit

Criterion             DF       Value   Value/DF
Deviance             104    114.9854     1.1056
Scaled Deviance      104    114.9854     1.1056
Pearson Chi-Square   104    101.8808     0.9796
Scaled Pearson X2    104    101.8808     0.9796
Log Likelihood         .   1898.0224          .

Analysis Of Parameter Estimates

Parameter   DF  Estimate  Std Err  ChiSquare  Pr>Chi
INTERCEPT    1    2.9424   0.2072   201.5738  0.0001
HOUSE        1    0.0006   0.0001    18.1674  0.0001
INC          1   -0.0000   0.0000    30.6256  0.0001
AGE          1   -0.0037   0.0018     4.3734  0.0365
COMPD        1    0.1684   0.0258    42.6973  0.0001
STORED       1   -0.1288   0.0162    63.1729  0.0001
SCALE        0    1.0000   0.0000          .       .

NOTE: The scale parameter was held fixed.

-----------------------------------
PROBLEM 4
-----------------------------------

run best subsets variable selection for logistic regression, disease data

data diseasedat;
infile 'E:\disease.dat';
input age ses1 ses2 sector disease;
run;
proc logistic data=diseasedat descending;
model disease = age ses1 ses2 sector /selection=score;
run;
-------------------------

The SAS System                         11:25 Saturday, May 26, 2001

The LOGISTIC Procedure

Model Information

Data Set                    WORK.DISEASEDAT
Response Variable           disease
Number of Response Levels   2
Number of Observations      98
Link Function               Logit
Optimization Technique      Fisher's scoring

Response Profile

Ordered                 Total
Value     disease   Frequency
1         1                31
2         0                67

Regression Models Selected by Score Criterion

Number of        Score
Variables   Chi-Square   Variables Included in Model
1              14.7805   sector
1               7.5802   age
1               3.9087   ses2
1               1.4797   ses1
2              19.5250   age sector
2              15.7058   ses2 sector
2              15.3663   ses1 sector
2              10.0177   age ses2
2               9.2368   age ses1
2               4.0670   ses1 ses2
3              20.2719   age ses1 sector
3              20.0188   age ses2 sector
3              15.8641   ses1 ses2 sector
3              10.4575   age ses1 ses2
4              20.4067   age ses1 ses2 sector

can use minitab to get deviance = -2*Log-Likelihood
put in data

Column  Count  Name
C1         98  age
C2         98  ses1
C3         98  ses2
C4         98  sector
C5         98  disease

for 2 predictor model
blogist disease = age sector

Logistic Regression Table
                                               Odds     95% CI
Predictor      Coef      StDev      Z      P  Ratio  Lower  Upper
Constant    -2.3352     0.5111  -4.57  0.000
age         0.02929    0.01317   2.22  0.026   1.03   1.00   1.06
sector       1.6735     0.4873   3.43  0.001   5.33   2.05  13.85

Log-Likelihood = -51.130
so model deviance = 102.26

for 3-predictor model
blogist disease = age sector ses1

Logistic Regression Table
                                               Odds     95% CI
Predictor      Coef      StDev      Z      P  Ratio  Lower  Upper
Constant    -2.4900     0.5466  -4.56  0.000
age         0.03045    0.01335   2.28  0.023   1.03   1.00   1.06
sector       1.6317     0.4904   3.33  0.001   5.11   1.96  13.37
ses1         0.5368     0.5481   0.98  0.327   1.71   0.58   5.01

Log-Likelihood = -50.655
so model deviance = 101.31

decrease in model deviance from adding third predictor is less than 1
so clearly not significant compared to chi-square 1 df.

-----------------------------------------------------
Problem 5

Mathematica output in pdf format in 05hw5p5sol.pdf
(NWK refs to ver4 in pdf)

--------------- end hw5 solutions