Solution file for additional exercise 2.2
-----------------------------------------
(Minitab version 15)

The analysis consists simply of fitting a linear regression of y on x
and screening the appropriate residuals and statistics for "strange things".

Model:  y_i = beta_0 + beta_1*x_i + eps_i

where the eps_i's are i.i.d. and N(0,sigma^2).

MTB > WOpen "R:\data_csv\hs02_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘R:\data_csv\hs02_2.csv’
Worksheet was saved on 12/01/2011

MTB > Fitline 'y' 'x';
SUBC>   Confidence 95.0.

Regression Analysis: y versus x

The regression equation is
y = 29.22 - 0.4595 x

S = 8.41252   R-Sq = 21.9%   R-Sq(adj) = 17.5%

Analysis of Variance

Source      DF       SS       MS     F      P
Regression   1   356.33  356.333  5.04  0.038
Error       18  1273.87   70.770
Total       19  1630.20

Fitted Line: y versus x

MTB > Name c3 "RESI1" c4 "SRES1" c5 "TRES1" c6 "HI1" c7 "COOK1" c8 "DFIT1"
MTB > Regress 'y' 1 'x';
SUBC>   Residuals 'RESI1';
SUBC>   SResiduals 'SRES1';
SUBC>   Tresiduals 'TRES1';
SUBC>   Hi 'HI1';
SUBC>   Cookd 'COOK1';
SUBC>   DFits 'DFIT1';
SUBC>   GFourpack;
SUBC>   RType 2;
SUBC>   Constant;
SUBC>   Brief 2.

Regression Analysis: y versus x

The regression equation is
y = 29.2 - 0.459 x

Predictor     Coef  SE Coef      T      P
Constant    29.221    5.889   4.96  0.000
x          -0.4595   0.2048  -2.24  0.038

S = 8.41252   R-Sq = 21.9%   R-Sq(adj) = 17.5%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       1   356.33  356.33  5.04  0.038
Residual Error  18  1273.87   70.77
Total           19  1630.20

Unusual Observations

Obs     x     y    Fit  SE Fit  Residual  St Resid
 19  10.0  8.00  24.63    4.00    -16.63    -2.25R

R denotes an observation with a large standardized residual.

Residual Plots for y

MTB > Print 'x'-'DFIT1'.
Data Display

Row   x   y     RESI1     SRES1     TRES1       HI1     COOK1     DFIT1
  1  28  15   -1.3554  -0.16533  -0.16079  0.050333  0.000724  -0.03702
  2  26  14   -3.2744  -0.39953  -0.39001  0.050926  0.004283  -0.09034
  3  42  15    5.0774   0.66607   0.65544  0.178907  0.048334   0.30595
  4  29  12   -3.8959  -0.47559  -0.46512  0.051815  0.006180  -0.10873
  5  16  37   15.1308   1.92277   2.09624  0.124989  0.264049   0.79226
  6  21  30   10.4282   1.28759   1.31325  0.073145  0.065418   0.36892
  7  25   7  -10.7338  -1.31116  -1.33980  0.053000  0.048106  -0.31696
  8  35  14    0.8610   0.10703   0.10405  0.085587  0.000536   0.03183
  9  30  28   12.5636   1.53586   1.60119  0.054481  0.067959   0.38435
 10  36  13    0.3205   0.04006   0.03893  0.095364  0.000085   0.01264
 11  37   5   -7.2200  -0.90787  -0.90321  0.106325  0.049031  -0.31154
 12  41  13    2.6180   0.33995   0.33144  0.162020  0.011172   0.14574
 13  20  24    3.9687   0.49215   0.48154  0.081144  0.010695   0.14310
 14  26   8   -9.2744  -1.13164  -1.14110  0.050926  0.034358  -0.26433
 15  38  13    1.2395   0.15693   0.15261  0.118471  0.001655   0.05595
 16  26  17   -0.2744  -0.03348  -0.03253  0.050926  0.000030  -0.00754
 17  10  27    2.3738   0.32081   0.31266  0.226307  0.015052   0.16910
 18  18  29    8.0497   1.00903   1.00957  0.100696  0.057001   0.33782
 19  10   8  -16.6262  -2.24689  -2.57422  0.226307  0.738352  -1.39223
 20  31   5   -9.9769  -1.22214  -1.24028  0.058332  0.046262  -0.30869

MTB > CDF -2.57 k1;
SUBC>   T 17.
MTB > let k2=k1*2*20
MTB > print k1 k2

Data Display

K1    0.00993699
K2    0.397479

Answers to exercise:
--------------------

The fitted line plot shows the observation (10,8) in the lower left corner
to be considerably off the regression line. It is exactly this observation
(number 19) that appears in the Minitab table of Unusual Observations.
The standardised residual is -2.25 and the deletion residual is -2.57;
neither of these is large enough to provide overwhelming evidence of a data
error. The outlier test based on the deletion residual gives P=0.40, which
is the lower tail probability of -2.57 in a t distribution with 17 degrees
of freedom (K1), doubled for a two-sided test and multiplied by the 20
observations as a Bonferroni adjustment (K2).
Nor is the leverage (0.23) very large (and it shouldn't be, because the
x-value is not an outlier in the distribution of x's). The observation does
show up as influential on both Cook's D (0.74) and DFITS (-1.39).
The reason the statistics do not single out this point more clearly as an
outlier is that there is a lot of noise about the regression line, with an
R^2 of only 0.22 (increasing to 0.41 without the suspect observation).
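
For readers without Minitab, the diagnostics discussed above can be
recomputed from the printed x and y columns. The following is only a
minimal sketch in Python, not part of the original Minitab session: the
names diagnostics, t_lower_tail, worst and p_bonf are made up for this
illustration, the data are typed in from the Data Display, and the t tail
probability is obtained by numerical integration (Simpson's rule) rather
than by Minitab's CDF command. It uses the usual textbook formulas for a
simple linear regression, which appear to agree with the printed columns.

```python
import math

# x and y as typed in from the Minitab "Data Display" table.
x = [28, 26, 42, 29, 16, 21, 25, 35, 30, 36, 37, 41, 20, 26, 38, 26, 10, 18, 10, 31]
y = [15, 14, 15, 12, 37, 30, 7, 14, 28, 13, 5, 13, 24, 8, 13, 17, 27, 29, 8, 5]
P = 2  # number of regression parameters (intercept and slope)


def diagnostics(x, y):
    """Fit y = b0 + b1*x by least squares and return case diagnostics."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    s2 = sse / (n - P)  # residual mean square
    # Leverage for a straight-line fit.
    hi = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
    # Standardised (internally studentised) residuals.
    sres = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(resid, hi)]
    # Deletion (externally studentised) residuals, derived from sres.
    tres = [r * math.sqrt((n - P - 1) / (n - P - r * r)) for r in sres]
    cook = [r * r * h / (P * (1 - h)) for r, h in zip(sres, hi)]
    dfits = [t * math.sqrt(h / (1 - h)) for t, h in zip(tres, hi)]
    return {"b0": b0, "b1": b1, "s": math.sqrt(s2), "r2": 1 - sse / syy,
            "hi": hi, "sres": sres, "tres": tres, "cook": cook, "dfits": dfits}


def t_lower_tail(t, df, steps=2000):
    """P(T <= -|t|) for Student's t, by Simpson's rule on the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def dens(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = dens(0.0) + dens(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * dens(k * h)
    return 0.5 - total * h / 3  # 0.5 minus the area between 0 and |t|


full = diagnostics(x, y)
n = len(x)

# Case with the largest absolute deletion residual (0-based index).
worst = max(range(n), key=lambda i: abs(full["tres"][i]))

# Two-sided outlier test with a Bonferroni adjustment over all n cases.
tail = t_lower_tail(full["tres"][worst], n - P - 1)
p_bonf = min(1.0, 2 * n * tail)

# Refit without the suspect case to see how R^2 changes.
reduced = diagnostics(x[:worst] + x[worst + 1:], y[:worst] + y[worst + 1:])

print("suspect observation (1-based):", worst + 1)
print("leverage, Cook's D, DFITS:",
      full["hi"][worst], full["cook"][worst], full["dfits"][worst])
print("Bonferroni P:", p_bonf)
print("R^2 with and without the suspect case:", full["r2"], reduced["r2"])
```

Because the sketch uses the unrounded deletion residual instead of the
rounded value -2.57 entered in the CDF command, its Bonferroni P-value
differs slightly from K2 in the third decimal place.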