Welcome to Minitab, press F1 for help.

MTB > random 10000 c1;
SUBC> normal 10 2.
MTB > descr c1

Descriptive Statistics: C1

Variable      N  N*    Mean   StDev  Minimum      Q1   Median       Q3  Maximum
C1        10000   0  9.9945  2.0200   2.7985  8.6333  10.0061  11.3717  17.4543

MTB > name c1 'x'
MTB > random 10000 c2;
SUBC> normal 0 1.24.
MTB > name c2 'u'
MTB > let c3 = c1 + c2
MTB > corr c1 c3

Correlations: x, C3

Pearson correlation of x and C3 = 0.853

MTB > name c3 'latentY'
MTB > # regression discontinuity
MTB > Code (10:100) 0 (0:9.999999) 1 'x' c4
MTB > tally c4

Tally for Discrete Variables: C4

C4  Count
 0   5013
 1   4987
N=  10000

MTB > name c4 'G'
MTB > desc c3

Descriptive Statistics: latentY

Variable      N  N*    Mean  StDev  Minimum     Q1  Median      Q3  Maximum
latentY   10000   0  10.001  2.369    2.132  8.390  10.008  11.602   18.856

MTB > desc c3;
SUBC> by c4.

Descriptive Statistics: latentY

Variable  G     N  N*    Mean   StDev  Minimum      Q1  Median      Q3  Maximum
latentY   0  5013   0  11.619   1.716    6.464  10.408  11.515  12.672   18.856
          1  4987   0  8.3747  1.7363   2.1321  7.2582  8.4697  9.5809  14.1597

MTB > desc c1;
SUBC> by c4.

Descriptive Statistics: x

Variable  G     N  N*    Mean   StDev  Minimum      Q1  Median      Q3  Maximum
x         0  5013   0  11.606   1.199   10.001  10.654  11.369  12.314   17.454
          1  4987   0  8.3744  1.2256   2.7985  7.6581  8.6281  9.3489  10.0000

MTB > let c5 = c3 + 1.2*G
MTB > name c5 'obsY'
MTB > desc c5;
SUBC> by c4.
Descriptive Statistics: obsY

Variable  G     N  N*    Mean   StDev  Minimum      Q1  Median       Q3  Maximum
obsY      0  5013   0  11.619   1.716    6.464  10.408  11.515   12.672   18.856
          1  4987   0  9.5747  1.7363   3.3321  8.4582  9.6697  10.7809  15.3597

MTB > brief 1
MTB > regress c5 1 c4

Regression Analysis: obsY versus G

The regression equation is
obsY = 11.6 - 2.04 G

Predictor      Coef  SE Coef       T      P
Constant    11.6190   0.0244  476.62  0.000
G          -2.04428  0.03452  -59.22  0.000

S = 1.72602   R-Sq = 26.0%   R-Sq(adj) = 26.0%

MTB > # ancova using X (complete discriminant)
MTB > regress c5 2 c4 c1

Regression Analysis: obsY versus G, x

The regression equation is
obsY = 0.086 + 1.17 G + 0.994 x

Predictor     Coef  SE Coef      T      P
Constant    0.0860   0.1197   0.72  0.472
G          1.16701  0.04121  28.32  0.000
x          0.99370  0.01020  97.42  0.000

S = 1.23631   R-Sq = 62.0%   R-Sq(adj) = 62.0%

MTB > #coeff of G 1.17 is within 1 se (.04) of correct answer
MTB > #create incomplete covariate Z (pure error)
MTB > random 10000 c6;
SUBC> normal 0 1.6.
MTB > name c6 'v'
MTB > let c7 = c1 + c6
MTB > name c7 'Z'
MTB > desc z

Descriptive Statistics: Z

Variable      N  N*    Mean   StDev  Minimum      Q1   Median       Q3  Maximum
Z         10000   0  9.9964  2.5861  -0.1827  8.2346  10.0043  11.7412  20.4731

MTB > corr c1 c7

Correlations: x, Z

Pearson correlation of x and Z = 0.780

MTB > regress c5 2 c4 c7

Regression Analysis: obsY versus G, Z

The regression equation is
obsY = 7.57 - 0.912 G + 0.349 Z

Predictor      Coef   SE Coef       T      P
Constant    7.56743   0.09366   80.80  0.000
G          -0.91196   0.04051  -22.51  0.000
Z          0.348811  0.007832   44.54  0.000

S = 1.57676   R-Sq = 38.2%   R-Sq(adj) = 38.2%

MTB > # a reasonable covariate z horribly misinforms
MTB > # try probabilistic assignment on X (see mma output)
MTB > let c8 = 1/(1 + 1/exp(-5 + .5*c1))
MTB > let c9 = round(c8)
MTB > tally c9

Tally for Discrete Variables: C9

C9  Count
 0   4987
 1   5013
N=  10000

MTB > table c4 c9

Tabulated statistics: G, Glogist

Rows: G   Columns: Glogist

        0     1    All
0    5013     0   5013
1       0  4987   4987
All  5013  4987  10000

MTB > # no change in assignment by round gotta do it as Bernoulli
MTB > desc c8

Descriptive Statistics: C8

Variable      N  N*     Mean    StDev  Minimum       Q1   Median       Q3  Maximum
C8        10000   0  0.49979  0.21006  0.02658  0.33551  0.50077  0.66504  0.97650

-------------------------------------------
GMacro
DoB
DO K1 = 1:10000
  let k2 = c8(k1)
  random 1 c12;
    bernoulli k2.
  let c11(k1) = c12
ENDDO
ENDMACRO
--------------------------------------------

MTB > %D:\drr04\ed260\propen.mtb
Executing from file: D:\drr04\ed260\propen.mtb
MTB > let c12 = 1 - c11
MTB > table c4 c12

Tabulated statistics: G, C12

Rows: G   Columns: C12

        0     1    All
0    3368  1645   5013
1    1567  3420   4987
All  4935  5065  10000

MTB > desc c1;
SUBC> by c12.

Descriptive Statistics: x

Variable  C12     N  N*    Mean   StDev  Minimum      Q1  Median       Q3  Maximum
x           0  4935   0  10.853   1.812    3.797   9.623  10.856   12.082   17.454
            1  5065   0  9.1584  1.8553   2.7985  7.9152  9.1662  10.3984  16.0624

MTB > name c12 'GlogistB'
MTB > let c13 = c3 + 1.2*c12
MTB > name c13 'obsYlogistB'
MTB > desc c13;
SUBC> by c12.

Descriptive Statistics: obsYlogistB

Variable     GlogistB     N  N*    Mean  StDev  Minimum     Q1  Median      Q3  Maximum
obsYlogistB         0  4935   0  10.851  2.204    2.927  9.357  10.864  12.299   18.856
                    1  5065   0  10.373  2.227    3.332  8.890  10.381  11.887   17.999

MTB > regress c13 1 c12

Regression Analysis: obsYlogistB versus GlogistB

The regression equation is
obsYlogistB = 10.9 - 0.477 GlogistB

Predictor      Coef  SE Coef       T      P
Constant    10.8505   0.0315  344.07  0.000
GlogistB   -0.47716  0.04431  -10.77  0.000

S = 2.21535   R-Sq = 1.1%   R-Sq(adj) = 1.1%

MTB > regress c13 2 c12 c1

Regression Analysis: obsYlogistB versus GlogistB, x

The regression equation is
obsYlogistB = - 0.0282 + 1.22 GlogistB + 1.00 x

Predictor      Coef  SE Coef       T      P
Constant   -0.02822  0.07526   -0.38  0.708
GlogistB    1.22106  0.02724   44.83  0.000
x           1.00242  0.00674  148.68  0.000

S = 1.23631   R-Sq = 69.2%   R-Sq(adj) = 69.2%

MTB > # 1.22 within 1 se of correct answer
MTB > regress c13 2 c12 c7

Regression Analysis: obsYlogistB versus GlogistB, Z

The regression equation is
obsYlogistB = 4.78 + 0.469 GlogistB + 0.559 Z

Predictor      Coef   SE Coef      T      P
Constant    4.78485   0.08137  58.81  0.000
GlogistB    0.46901   0.03692  12.70  0.000
Z          0.558846  0.007139  78.28  0.000

S = 1.74439   R-Sq = 38.7%   R-Sq(adj) = 38.7%

MTB > # z misses by almost a factor of 3
MTB > # propensity matching
MTB > # do the groups overlap on z?
MTB > desc c7;
SUBC> by c12.

Descriptive Statistics: Z

Variable  GlogistB     N  N*    Mean   StDev  Minimum      Q1  Median       Q3  Maximum
Z                0  4935   0  10.854   2.420    1.049   9.229  10.910   12.507   20.473
                 1  5065   0  9.1609  2.4670  -0.1827  7.5014  9.1603  10.7977  17.4644

MTB > #Q3 treatment ~ median control
MTB > # can either match on naked Z or on logistic regression fit
MTB > desc c7

Descriptive Statistics: Z

Variable      N  N*    Mean   StDev  Minimum      Q1   Median       Q3  Maximum
Z         10000   0  9.9964  2.5861  -0.1827  8.2346  10.0043  11.7412  20.4731

MTB > Code (-5 : 7.499999) 1 (7.5: 9.299999) 2 (9.3 : 10.69999) 3 (10.7: &
CONT> 12.49999) 4 (12.5:100) 5 'Z' c14
MTB > tally c14

Tally for Discrete Variables: C14

C14  Count
  1   1686
  2   2240
  3   2105
  4   2281
  5   1688
 N=  10000

MTB > table c14 c12

Tabulated statistics: C14, GlogistB

Rows: C14   Columns: GlogistB

        0     1    All
1     421  1265   1686
2     859  1381   2240
3    1011  1094   2105
4    1409   872   2281
5    1235   453   1688
All  4935  5065  10000

MTB > table c14 c12;
SUBC> mean c13.

Tabulated statistics: C14, GlogistB

Rows: C14   Columns: GlogistB

         0      1    All
1     8.44   8.63   8.58
2     9.46   9.95   9.76
3    10.40  10.84  10.63
4    11.28  11.72  11.45
5    12.52  12.82  12.60
All  10.85  10.37  10.61

Cell Contents:  obsYlogistB  :  Mean

MTB > # here comparing subgroups shows little or no treatment outcome
MTB > table c14 c12;
SUBC> mean c7.

Tabulated statistics: C14, GlogistB

Rows: C14   Columns: GlogistB

          0       1     All
1     6.402   6.038   6.129
2     8.482   8.438   8.455
3    10.032   9.974  10.002
4    11.553  11.491  11.529
5    13.897  13.634  13.826
All  10.854   9.161   9.996

Cell Contents:  Z  :  Mean

MTB > table c14 c12;
SUBC> median c7.
Tabulated statistics: C14, GlogistB

Rows: C14   Columns: GlogistB

          0       1     All
1     6.683   6.395   6.458
2     8.527   8.448   8.483
3    10.054   9.946   9.992
4    11.511  11.441  11.477
5    13.630  13.370  13.526
All  10.910   9.160  10.004

Cell Contents:  Z  :  Median

MTB > blog c12 = c7;
SUBC> eprobability c15.

Binary Logistic Regression: GlogistB versus Z

Link Function: Logit

Response Information

Variable  Value  Count
GlogistB  1       5065  (Event)
          0       4935
          Total  10000

Logistic Regression Table

                                                 Odds     95% CI
Predictor       Coef    SE Coef       Z      P  Ratio  Lower  Upper
Constant     2.86940  0.0940681   30.50  0.000
Z          -0.284021  0.0091416  -31.07  0.000   0.75   0.74   0.77

MTB > desc c15

Descriptive Statistics: C15

Variable      N  N*     Mean    StDev  Minimum       Q1   Median       Q3  Maximum
C15       10000   0  0.50650  0.16424  0.04996  0.38572  0.50699  0.62962  0.94889

MTB > name c15 'propensity'
MTB > Code (0 : .329999999) 1 (.33: .44999999) 2 (.45 : .5499999) 3 (.55: &
CONT> .66999999) 4 (.67:1.00) 5 'propensity' c16
MTB > tally c16

Tally for Discrete Variables: C16

C16  Count
  1   1598
  2   2220
  3   2117
  4   2264
  5   1801
 N=  10000

MTB > table c16 c12;
SUBC> mean c13.

Tabulated statistics: C16, GlogistB

Rows: C16   Columns: GlogistB

         0      1    All
1    12.56  12.91  12.65
2    11.32  11.76  11.49
3    10.44  10.90  10.67
4     9.54  10.00   9.82
5     8.47   8.68   8.63
All  10.85  10.37  10.61

Cell Contents:  obsYlogistB  :  Mean

MTB > table c16 c12;
SUBC> mean c7.

Tabulated statistics: C16, GlogistB

Rows: C16   Columns: GlogistB

         0      1    All
1    13.96  13.71  13.90
2    11.65  11.59  11.62
3    10.13  10.07  10.10
4     8.58   8.54   8.56
5     6.49   6.13   6.22
All  10.85   9.16  10.00

Cell Contents:  Z  :  Mean

MTB > table c16 c12;
SUBC> mean c15.

Tabulated statistics: C16, GlogistB

Rows: C16   Columns: GlogistB

          0       1     All
1    0.2554  0.2673  0.2586
2    0.3927  0.3966  0.3942
3    0.4980  0.5025  0.5003
4    0.6060  0.6085  0.6075
5    0.7324  0.7495  0.7452
All  0.4517  0.5599  0.5065

Cell Contents:  propensity  :  Mean

MTB > #pretty good matching on Z and propensity
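
The probabilistic-assignment part of this session can also be replayed outside Minitab. What follows is a minimal NumPy sketch, not the code that produced the output above. The seed, the variable names, and the helper `ols_coefs` are assumptions introduced for illustration, and a fresh set of random draws will not match the session's numbers digit for digit. It generates x, latentY and Z with the same means and standard deviations, assigns treatment with probability 1 - logistic(-5 + .5 x) as in the GMacro step followed by `let c12 = 1 - c11`, and compares the coefficient of G when adjusting for x versus Z. As a stand-in for the hand-picked cut points used for C14, it then stratifies on quintiles of Z.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an arbitrary assumption
n = 10_000
effect = 1.2                     # treatment effect added to latentY in the session

x = rng.normal(10, 2, n)             # covariate that drives assignment (c1)
u = rng.normal(0, 1.24, n)           # outcome noise (c2)
latent_y = x + u                     # c3
z = x + rng.normal(0, 1.6, n)        # fallible covariate Z = x + v (c7)

# c8 in the session: logistic(-5 + .5 x); treatment is 1 minus a Bernoulli(c8) draw
p = 1.0 / (1.0 + np.exp(-(-5 + 0.5 * x)))
g = (rng.random(n) >= p).astype(float)   # g = 1 with probability 1 - p
obs_y = latent_y + effect * g            # c13


def ols_coefs(y, *predictors):
    """Least-squares coefficients, intercept first (hypothetical helper)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


b_naive = ols_coefs(obs_y, g)[1]     # no covariate
b_x = ols_coefs(obs_y, g, x)[1]      # ancova on the true assignment variable
b_z = ols_coefs(obs_y, g, z)[1]      # ancova on the fallible covariate

# Stratify on quintiles of Z and take treated-minus-control means per stratum
edges = np.quantile(z, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(z, edges)       # values 0..4
stratum_diffs = np.array([
    obs_y[(strata == s) & (g == 1)].mean() - obs_y[(strata == s) & (g == 0)].mean()
    for s in range(5)
])

print("coefficient of G, no covariate:", round(b_naive, 3))
print("coefficient of G, adjusting x :", round(b_x, 3))
print("coefficient of G, adjusting Z :", round(b_z, 3))
print("within-stratum differences    :", np.round(stratum_diffs, 2))
```

Under these assumptions the x-adjusted coefficient should land near 1.2, while the Z-adjusted coefficient and the within-stratum differences should fall well short of it, which is the pattern the session comments describe.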