Solution file for additional exercise 6.1
-----------------------------------------

Data: measurements of iron in the livers of white rats. Rats were randomly allocated to 5 diets (A-E), with 10 rats per diet. The data constitute 5 independent samples with a continuous outcome, and the model that immediately suggests itself is a one-way ANOVA.

MTB > WOpen "h:\VHM\VHM802\Data_csv\hs06_1.csv";
SUBC> FType;
SUBC> CSV;
SUBC> DecSep;
SUBC> Period;
SUBC> Field;
SUBC> Comma;
SUBC> TDelimiter;
SUBC> DoubleQuote.

Retrieving worksheet from file: 'h:\VHM\VHM802\Data_csv\hs06_1.csv'
Worksheet was saved on 12/02/2011

MTB > Oneway 'iron' 'diet';
SUBC> GBoxplot.

One-way ANOVA: iron versus diet

Source  DF      SS     MS      F      P
diet     4  127.82  31.96  12.44  0.000
Error   45  115.60   2.57
Total   49  243.42

S = 1.603   R-Sq = 52.51%   R-Sq(adj) = 48.29%

Level   N   Mean  StDev
A      10  5.693  2.406
B      10  2.906  1.983
C      10  2.823  1.621
D      10  1.584  0.562
E      10  1.090  0.428

[Character plot: individual 95% CIs for the means, based on pooled StDev; scale 0.0 to 6.0]

Pooled StDev = 1.603

Comments:
---------
The standard deviations are clearly not equal in the 5 groups; they seem to increase roughly in proportion to the mean (StDev/Mean lies between about 0.35 and 0.68 in all groups). A roughly constant ratio of standard deviation to mean is exactly the situation in which a log transformation stabilizes the variance, so we try one.

MTB > let c3=ln(c1)
MTB > name c3 'ln(iron)'
MTB > Oneway 'ln(iron)' 'diet';
SUBC> GBoxplot;
SUBC> GFourpack.
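As a quick check of the claim that the standard deviation grows roughly in proportion to the mean, the sketch below computes StDev/Mean (the coefficient of variation) from the group summaries of the first analysis. By the delta method, the standard deviation of ln(Y) is approximately StDev/Mean, so these ratios also give a rough preview of the group standard deviations to expect on the log scale. This is a minimal Python sketch that works only from the rounded printed summaries, not from the raw data; the variable names are illustrative.

```python
# Group summaries (mean, StDev) of iron on the original scale, copied from
# the first Minitab one-way ANOVA (rounded printed values, not raw data).
summaries = {
    "A": (5.693, 2.406),
    "B": (2.906, 1.983),
    "C": (2.823, 1.621),
    "D": (1.584, 0.562),
    "E": (1.090, 0.428),
}

# Coefficient of variation StDev/Mean per diet. If StDev is roughly
# proportional to the mean, these ratios are roughly constant, and by the
# delta method they approximate the StDev of ln(iron) in each group.
cv = {diet: sd / mean for diet, (mean, sd) in summaries.items()}

# Spread of the StDevs themselves on the original scale, for comparison.
sds = [sd for _, sd in summaries.values()]
sd_ratio = max(sds) / min(sds)

for diet, ratio in cv.items():
    print(f"diet {diet}: StDev/Mean = {ratio:.2f}")
print(f"largest/smallest StDev = {sd_ratio:.1f}")
```

The ratios vary by less than a factor of 2 across diets, whereas the standard deviations themselves vary by a factor of about 5.6.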
One-way ANOVA: ln(iron) versus diet

Source  DF      SS     MS      F      P
diet     4  14.902  3.725  15.72  0.000
Error   45  10.665  0.237
Total   49  25.567

S = 0.4868   R-Sq = 58.29%   R-Sq(adj) = 54.58%

Level   N    Mean   StDev
A      10  1.6517  0.4579
B      10  0.8741  0.6465
C      10  0.8939  0.5584
D      10  0.4056  0.3447
E      10  0.0259  0.3559

[Character plot: individual 95% CIs for the means, based on pooled StDev; scale 0.00 to 1.80]

Pooled StDev = 0.4868

[Graph: Residual Plots for ln(iron)]

MTB > PPlot 'lniron';
SUBC> Normal;
SUBC> Symbol;
SUBC> FitD;
SUBC> Grid 2;
SUBC> Grid 1;
SUBC> MGrid 1;
SUBC> Panel 'diet'.

[Graph: Probability Plot of lniron]

Comments:
---------
The one-way ANOVA model

  ln(iron_i) = mu_diet(i) + eps_i,   i = 1,...,50,

where the errors eps_i are independent N(0, sigma^2), seems to describe the data well. The standard deviations are roughly the same in the 5 groups (0.34 to 0.65), the values in each group show no strong deviations from normality (difficult to assess with only 10 observations per group), and the residual plots and the normal probability plot look fine.

A Box-Cox analysis points to an optimal power of lambda = -0.3, and we may also consider the rounded "nice" value of -0.5 (this is Minitab's choice in the General Regression menu, which gives a 95% CI for lambda of (-0.685, 0.075)). Because this interval contains lambda = 0, which corresponds to the log transformation, the log scale is also consistent with the data. On both scales the residual analysis detects no major problems with the model assumptions. For simplicity it is preferable to work with the log transformation, and that is what we do in this solution.

The ANOVA table is given above. The F-statistic for testing that all diets have equal means is F = 15.72, which is clearly significant when referred to the F(4,45) distribution (P < 0.001).

The text defines 4 contrasts by their coefficients. They are pairwise orthogonal because, with equal group sizes, the sum of products of the coefficients is zero for every pair. For example, for contrasts 1 and 2:

  1*1 + (-1)*1 + 0*(-2) + 0*0 + 0*0 = 0

or for contrasts 2 and 3:

  1*2 + 1*2 + (-2)*2 + 0*(-3) + 0*(-3) = 0
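The orthogonality check can be carried out for all 6 pairs of contrasts at once. The minimal Python sketch below assumes the coefficient vectors (1,-1,0,0,0), (1,1,-2,0,0), (2,2,2,-3,-3) and (0,0,0,1,-1) for diets A-E. These are reconstructed from the worked examples and from the contrast estimates in the table of results; the diet labels attached to them (A = beef, B = pork, C = poultry, D = beans, E = oats) are an inference, not something stated in the output.

```python
from itertools import combinations

# Assumed contrast coefficients for diets (A, B, C, D, E); reconstructed
# from the worked examples and the estimates, not copied from the exercise text.
contrasts = {
    "beef vs pork": (1, -1, 0, 0, 0),
    "mammals vs poultry": (1, 1, -2, 0, 0),
    "animal vs vegetable": (2, 2, 2, -3, -3),
    "beans vs oats": (0, 0, 0, 1, -1),
}


def dot(u, v):
    """Sum of products of corresponding coefficients."""
    return sum(a * b for a, b in zip(u, v))


# Each vector must be a contrast: its coefficients sum to zero.
sums = {name: sum(w) for name, w in contrasts.items()}

# With equal group sizes, two contrasts are orthogonal iff this sum is 0.
products = {
    (a, b): dot(contrasts[a], contrasts[b])
    for a, b in combinations(contrasts, 2)
}

for name, s in sums.items():
    print(f"sum of coefficients, {name}: {s}")
for (a, b), p in products.items():
    print(f"{a} x {b}: {p}")
```

Since there are 5 - 1 = 4 mutually orthogonal contrasts, they split the between-diet sum of squares into 4 single-degree-of-freedom parts.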
Contrast             Estimate      SE     SS  SS(%)     t    P(t)  F(Scheffe)  P(Scheffe)
-----------------------------------------------------------------------------------------
beef vs pork            0.778  0.2177   3.03   20.3  3.57  0.0009        3.19       0.022
mammals vs poultry      0.738  0.3771   0.91    6.1  1.96   0.057        0.96        0.44
animal vs vegetable     5.545  0.8432  10.25   68.8  6.58  0.0000        10.8      0.0000
beans vs oats           0.380  0.2177   0.72    4.8  1.75   0.088        0.76        0.56
-----------------------------------------------------------------------------------------

Formulae (10 rats per diet; w_1,...,w_5 are the contrast coefficients):
  SS         = estimate^2 / [(w_1^2 + ... + w_5^2)/10]
  SE         = sqrt(MSE * (w_1^2 + ... + w_5^2)/10)
  SS(%)      = 100 * SS / SSTrT          (SSTrT = 14.902)
  t          = Est/SE = +/- sqrt(SS/MSE)  (MSE = 0.237)
  P(t)       : two-sided P-value from t(45)
  F(Scheffe) = SS/4/MSE  (= t^2/4)
  P(Scheffe) : upper-tail P-value from F(4,45)

Conclusions:
------------
Just looking at the SS values for the contrasts, it is clear that the animal vs vegetable contrast accounts for a large proportion (69%) of the variation between diets. Because the 4 contrasts are orthogonal, their SS add up to SSTrT (3.03 + 0.91 + 10.25 + 0.72 = 14.91, i.e. 14.902 up to rounding).

Using ordinary t-tests, all 4 contrasts are of interest: two of them (beef vs pork, animal vs vegetable) are clearly significant, and the other two (mammals vs poultry, beans vs oats) are borderline significant. This method assumes that all contrasts were planned in advance, and it uses an individual error level of 5% for each test.

A Bonferroni correction for carrying out 4 tests multiplies all P-values by 4 (not shown). This makes the 2 borderline contrasts non-significant, while the other two remain significant. Note, however, that this method still assumes that all contrasts were planned in advance. A Holm correction, in which the P-values ordered from smallest to largest are multiplied by 4, 3, 2 and 1 respectively, leads to the same conclusions.

Finally, the Scheffe method accounts both for the multiple tests and for the possibility that the contrasts were suggested by the data. It is the most conservative of the 3 methods, but also the "safest". With it, the animal vs vegetable contrast is still highly significant, and the beef vs pork contrast is still significant. There is evidence that the beef and pork diets differ, and the animal and vegetable diets are clearly different.
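The contrast table and the multiplicity adjustments can be reproduced from the printed group means of ln(iron), with n = 10 per diet and MSE = 0.237. The Python sketch below is a minimal illustration under those assumptions. It assumes the coefficient vectors (1,-1,0,0,0), (1,1,-2,0,0), (2,2,2,-3,-3) and (0,0,0,1,-1) for diets A-E (inferred from the estimates, not stated in the output). Two-sided t P-values are computed with the standard closed-form series for Student's t with an odd number of degrees of freedom (45 here), so no statistics library is needed; `t_two_sided_p` and `holm` are hypothetical helper names. Because the printed means are rounded, recomputed values can differ from the table in the last digit (for example, t for beans vs oats comes out at about 1.74).

```python
import math

N_PER_DIET = 10
MSE = 0.237        # error mean square, ANOVA on ln(iron)
SS_TRT = 14.902    # between-diet sum of squares, same ANOVA
DF_ERROR = 45
MEANS = (1.6517, 0.8741, 0.8939, 0.4056, 0.0259)  # diets A-E, log scale

# Assumed coefficients for diets (A, B, C, D, E), inferred from the estimates.
CONTRASTS = {
    "beef vs pork": (1, -1, 0, 0, 0),
    "mammals vs poultry": (1, 1, -2, 0, 0),
    "animal vs vegetable": (2, 2, 2, -3, -3),
    "beans vs oats": (0, 0, 0, 1, -1),
}


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided P-value of Student's t, exact series valid for odd df only."""
    if df % 2 == 0:
        raise ValueError("this series is only valid for odd df")
    theta = math.atan(abs(t) / math.sqrt(df))
    c = math.cos(theta)
    series = 0.0
    if df > 1:
        term = series = c
        # Adds (2*4*...*(k-1))/(3*5*...*k) * cos(theta)^k for k = 3, 5, ..., df - 2.
        for k in range(3, df - 1, 2):
            term *= c * c * (k - 1) / k
            series += term
    central = (2 / math.pi) * (theta + math.sin(theta) * series)  # P(|T| < |t|)
    return 1.0 - central


def holm(pvals: dict) -> dict:
    """Holm step-down adjustment: ordered P-values times m, m-1, ..., 1, kept monotone."""
    order = sorted(pvals, key=pvals.get)
    m = len(order)
    adjusted, running = {}, 0.0
    for i, name in enumerate(order):
        running = max(running, min(1.0, (m - i) * pvals[name]))
        adjusted[name] = running
    return adjusted


rows = {}
for name, w in CONTRASTS.items():
    est = sum(wi * mean for wi, mean in zip(w, MEANS))
    sum_w2 = sum(wi * wi for wi in w)
    se = math.sqrt(MSE * sum_w2 / N_PER_DIET)
    ss = est ** 2 / (sum_w2 / N_PER_DIET)
    t = est / se
    rows[name] = {
        "est": est, "se": se, "ss": ss, "ss_pct": 100 * ss / SS_TRT,
        "t": t, "p": t_two_sided_p(t, DF_ERROR),
        "f_scheffe": t * t / 4,  # equals SS / 4 / MSE
    }

p_raw = {name: r["p"] for name, r in rows.items()}
p_bonf = {name: min(1.0, len(p_raw) * p) for name, p in p_raw.items()}
p_holm = holm(p_raw)

for name, r in rows.items():
    print(f"{name:20s} est={r['est']:.3f} se={r['se']:.4f} ss={r['ss']:.2f} "
          f"({r['ss_pct']:.1f}%) t={r['t']:.2f} p={r['p']:.4f} "
          f"F={r['f_scheffe']:.2f} bonf={p_bonf[name]:.4f} holm={p_holm[name]:.4f}")
print(f"sum of contrast SS = {sum(r['ss'] for r in rows.values()):.2f}")
```

The `holm` helper also enforces monotonicity of the adjusted P-values, which the "multiply by 4, 3, 2, 1" description leaves implicit; here it raises the adjusted value for beans vs oats from about 0.088 to about 0.11 without changing any conclusion.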