Solution file for additional exercise 2.2
-----------------------------------------
(Minitab version 15)

The analysis consists simply of fitting a linear regression of y on x
and screening the residuals and diagnostic statistics for "strange
things".

Model:   y_i = beta_0 + beta_1*x_i + eps_i
where the eps_i are i.i.d. N(0,sigma^2).

MTB > WOpen "R:\data_csv\hs02_2.csv";
SUBC>   FType;
SUBC>     CSV;
SUBC>   DecSep;
SUBC>     Period;
SUBC>   Field;
SUBC>     Comma;
SUBC>   TDelimiter;
SUBC>     DoubleQuote.
Retrieving worksheet from file: ‘R:\data_csv\hs02_2.csv’
Worksheet was saved on 12/01/2011

MTB > Fitline 'y' 'x';
SUBC>   Confidence 95.0.
Regression Analysis: y versus x 

The regression equation is
y = 29.22 - 0.4595 x

S = 8.41252   R-Sq = 21.9%   R-Sq(adj) = 17.5%

Analysis of Variance

Source      DF       SS       MS     F      P
Regression   1   356.33  356.333  5.04  0.038
Error       18  1273.87   70.770
Total       19  1630.20
 
Fitted Line: y versus x 

MTB > Name c3 "RESI1" c4 "SRES1" c5 "TRES1" c6 "HI1" c7 "COOK1" c8 "DFIT1"
MTB > Regress 'y' 1 'x';
SUBC>   Residuals 'RESI1';
SUBC>   SResiduals 'SRES1';
SUBC>   Tresiduals 'TRES1';
SUBC>   Hi 'HI1';
SUBC>   Cookd 'COOK1';
SUBC>   DFits 'DFIT1';
SUBC>   GFourpack;
SUBC>   RType 2;
SUBC>   Constant;
SUBC>   Brief 2.
Regression Analysis: y versus x 

The regression equation is
y = 29.2 - 0.459 x

Predictor     Coef  SE Coef      T      P
Constant    29.221    5.889   4.96  0.000
x          -0.4595   0.2048  -2.24  0.038

S = 8.41252   R-Sq = 21.9%   R-Sq(adj) = 17.5%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       1   356.33  356.33  5.04  0.038
Residual Error  18  1273.87   70.77
Total           19  1630.20

Unusual Observations

Obs     x     y    Fit  SE Fit  Residual  St Resid
 19  10.0  8.00  24.63    4.00    -16.63     -2.25R

R denotes an observation with a large standardized residual.

Residual Plots for y 

MTB > Print  'x'-'DFIT1'.
Data Display 

Row   x   y     RESI1     SRES1     TRES1       HI1     COOK1     DFIT1
  1  28  15   -1.3554  -0.16533  -0.16079  0.050333  0.000724  -0.03702
  2  26  14   -3.2744  -0.39953  -0.39001  0.050926  0.004283  -0.09034
  3  42  15    5.0774   0.66607   0.65544  0.178907  0.048334   0.30595
  4  29  12   -3.8959  -0.47559  -0.46512  0.051815  0.006180  -0.10873
  5  16  37   15.1308   1.92277   2.09624  0.124989  0.264049   0.79226
  6  21  30   10.4282   1.28759   1.31325  0.073145  0.065418   0.36892
  7  25   7  -10.7338  -1.31116  -1.33980  0.053000  0.048106  -0.31696
  8  35  14    0.8610   0.10703   0.10405  0.085587  0.000536   0.03183
  9  30  28   12.5636   1.53586   1.60119  0.054481  0.067959   0.38435
 10  36  13    0.3205   0.04006   0.03893  0.095364  0.000085   0.01264
 11  37   5   -7.2200  -0.90787  -0.90321  0.106325  0.049031  -0.31154
 12  41  13    2.6180   0.33995   0.33144  0.162020  0.011172   0.14574
 13  20  24    3.9687   0.49215   0.48154  0.081144  0.010695   0.14310
 14  26   8   -9.2744  -1.13164  -1.14110  0.050926  0.034358  -0.26433
 15  38  13    1.2395   0.15693   0.15261  0.118471  0.001655   0.05595
 16  26  17   -0.2744  -0.03348  -0.03253  0.050926  0.000030  -0.00754
 17  10  27    2.3738   0.32081   0.31266  0.226307  0.015052   0.16910
 18  18  29    8.0497   1.00903   1.00957  0.100696  0.057001   0.33782
 19  10   8  -16.6262  -2.24689  -2.57422  0.226307  0.738352  -1.39223
 20  31   5   -9.9769  -1.22214  -1.24028  0.058332  0.046262  -0.30869
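
The diagnostic columns above can be cross-checked outside Minitab. The
following is a minimal Python sketch, not part of the original Minitab
session; it assumes the standard textbook formulas for leverage,
standardised and deletion residuals, Cook's D and DFITS in simple linear
regression, and the function name diagnostics is purely illustrative.
The data are copied from the x and y columns of the listing.

```python
import math

# Data copied from the x and y columns of the Data Display listing.
x = [28, 26, 42, 29, 16, 21, 25, 35, 30, 36,
     37, 41, 20, 26, 38, 26, 10, 18, 10, 31]
y = [15, 14, 15, 12, 37, 30, 7, 14, 28, 13,
     5, 13, 24, 8, 13, 17, 27, 29, 8, 5]

n, p = len(x), 2                       # p = number of regression parameters
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                         # least-squares slope
b0 = ybar - b1 * xbar                  # least-squares intercept
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e * e for e in resid) / (n - p))   # residual std. dev.


def diagnostics(i):
    """Leverage, standardised residual, deletion residual, Cook's D and
    DFITS for the observation with 0-based index i (illustrative helper)."""
    h = 1 / n + (x[i] - xbar) ** 2 / sxx
    sres = resid[i] / (s * math.sqrt(1 - h))
    tres = sres * math.sqrt((n - p - 1) / (n - p - sres ** 2))
    cook = sres ** 2 * h / (p * (1 - h))
    dfit = tres * math.sqrt(h / (1 - h))
    return h, sres, tres, cook, dfit


# Observation 19 (row 19 of the listing) has 0-based index 18.
h19, sres19, tres19, cook19, dfit19 = diagnostics(18)
print(f"b0={b0:.3f}  b1={b1:.4f}  s={s:.5f}")
print(f"obs 19: HI={h19:.6f} SRES={sres19:.5f} TRES={tres19:.5f} "
      f"COOK={cook19:.6f} DFIT={dfit19:.5f}")
```

Calling diagnostics(i) for the other rows should reproduce the remaining
lines of the listing in the same way.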

MTB > CDF -2.57 k1;
SUBC>   T 17.
MTB > let k2=k1*2*20
MTB > print k1 k2
Data Display 

K1    0.00993699
K2    0.397479
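
The two constants above are the one-sided tail probability of the
deletion residual on 17 degrees of freedom (K1) and that probability
doubled and multiplied by the 20 observations (K2), i.e. a two-sided
Bonferroni-adjusted P-value. As a rough cross-check outside Minitab, the
Python sketch below integrates the t density numerically with Simpson's
rule. The helper names t_pdf and t_lower_tail are illustrative, and the
cap at 1 is an addition that the Minitab "let" command does not apply.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_lower_tail(t, df, steps=2000):
    """P(T <= t) for t <= 0, via Simpson's rule on [t, 0]; steps must be even."""
    a, b = t, 0.0
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    # By symmetry P(T <= 0) = 0.5, so subtract the area between t and 0.
    return 0.5 - total * h / 3


n = 20                      # number of observations
df = n - 2 - 1              # error d.f. with the suspect observation deleted
k1 = t_lower_tail(-2.57, df)
k2 = min(1.0, 2 * n * k1)   # two-sided, Bonferroni-adjusted for n tests
print(k1, k2)
```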

Answers to exercise:
--------------------
The fitted line plot shows the observation (10,8), in the lower left
corner, to be considerably off the regression line. This is the
observation (no. 19) listed in Minitab's table of Unusual Observations.
Its standardised residual is -2.25 and its deletion residual is -2.57;
neither is large enough to provide overwhelming evidence of a data
error (P=0.40 for the Bonferroni outlier test based on the deletion
residual, i.e. 2*20*P(T < -2.57) with T t-distributed on 17 d.f.).
Nor is the leverage very large (HI = 0.23), and it shouldn't be,
because the x-value is not an outlier in the distribution of the x's.
The observation does show up as influential on both Cook's D (0.74)
and DFITS (-1.39).
The reason why the statistics do not point more clearly to this point
as an outlier is that there is much noise about the regression line,
with an R^2 of only 0.22 (increasing to 0.41 without the suspect
observation).
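
The change in R^2 quoted above can be checked by refitting without
observation 19. The session shown does not include that refit, so the
following Python sketch is offered only as an independent check; the
helper name r_squared is illustrative, and the data are copied from the
Data Display listing.

```python
def r_squared(x, y):
    """R^2 of a simple linear regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Data copied from the x and y columns of the Data Display listing.
x = [28, 26, 42, 29, 16, 21, 25, 35, 30, 36,
     37, 41, 20, 26, 38, 26, 10, 18, 10, 31]
y = [15, 14, 15, 12, 37, 30, 7, 14, 28, 13,
     5, 13, 24, 8, 13, 17, 27, 29, 8, 5]

r2_all = r_squared(x, y)

# Drop observation 19 (0-based index 18), the point (10, 8).
keep = [i for i in range(len(x)) if i != 18]
r2_without = r_squared([x[i] for i in keep], [y[i] for i in keep])

print(f"R^2 with all 20 observations: {r2_all:.2f}")
print(f"R^2 without observation 19:   {r2_without:.2f}")
```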