Exercise Set 2 Q4: feedback
Two scientists are trying to establish how storage temperature affects the percentage of
viable cells in a sample. They agreed that a sensible range of temperatures to consider is -5°C
to 5°C and that they should keep the cells in storage for one week, but couldn’t agree on
the design of the experiment. Therefore, they decided to conduct separate experiments, each
with 20 observations. Their data are in files DrVarlow.csv and DrEven.csv.
a) Fit a simple linear regression model to each data set using R. What conclusions can you
draw from the estimated values for each of the models?
MODEL FOR DR VARLOW
DrVarlow <- read.csv("DrVarlow.csv")
names(DrVarlow) <- c("Temp","PercCells")
DrVarlow.slr <- lm(PercCells~Temp,data = DrVarlow)
summary(DrVarlow.slr)
##
## Call:
## lm(formula = PercCells ~ Temp, data = DrVarlow)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.181 -3.840 -0.291 2.790 10.217
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.4470 1.1519 53.342 < 2e-16 ***
## Temp 1.9752 0.2304 8.573 9.02e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.152 on 18 degrees of freedom
## Multiple R-squared: 0.8033, Adjusted R-squared: 0.7924
## F-statistic: 73.5 on 1 and 18 DF, p-value: 9.02e-08
MODEL FOR DR EVEN
DrEven <- read.csv("DrEven.csv")
names(DrEven) <- c("Temp","PercCells")
DrEven.slr <- lm(PercCells~Temp,data = DrEven)
summary(DrEven.slr)
##
## Call:
## lm(formula = PercCells ~ Temp, data = DrEven)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7614 -3.8334 -0.2515 4.3554 10.7210
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.0765 1.2189 58.31 < 2e-16 ***
## Temp 1.7595 0.4017 4.38 0.000361 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.451 on 18 degrees of freedom
## Multiple R-squared: 0.5159, Adjusted R-squared: 0.489
## F-statistic: 19.18 on 1 and 18 DF, p-value: 0.0003611
Denoting the percentage of viable cells by y and temperature by x, the fitted lines are
y=61.447+1.975x for Dr Varlow and y=71.076+1.759x for Dr Even. For both fitted lines, the
percentage of viable cells increases as the temperature increases (for Dr Varlow's data by
about 2 percentage points for every 1°C increase in storage temperature, and for Dr Even's by
about 1.8 percentage points). Both slopes are significantly different from zero. On average, the
percentage of viable cells stored at 0°C is estimated to be about 61.4 for Dr Varlow's data and
about 71.1 for Dr Even's data.
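To compare the two slope estimates more carefully, it can help to look at confidence intervals rather than point estimates alone. The sketch below rebuilds 95% intervals from the estimates, standard errors and degrees of freedom printed in the two summaries above. The object names est, se and ci are illustrative only; with the fitted objects available, confint(DrVarlow.slr) and confint(DrEven.slr) should give the same intervals up to rounding.

```r
# Slope estimates and standard errors copied from the two summary() outputs
est <- c(DrVarlow = 1.9752, DrEven = 1.7595)
se  <- c(DrVarlow = 0.2304, DrEven = 0.4017)
df  <- 18                       # residual degrees of freedom in both fits

tcrit <- qt(0.975, df)          # two-sided 95% critical value of t with 18 df
ci <- cbind(lower = est - tcrit * se,
            upper = est + tcrit * se)
print(round(ci, 2))
```

The intervals are roughly (1.49, 2.46) for Dr Varlow and (0.92, 2.60) for Dr Even. They overlap substantially, and Dr Even's is noticeably wider.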
b) Create a scatter plot for each of the data sets. What conclusions can you draw from the
plots?
DR VARLOW
with(DrVarlow, plot(Temp, PercCells, main = "Scatter plot of Dr Varlow's data",
     xlab = "temperature", ylab = "Percentage of viable cells"))
For Dr Varlow's data, we only have observations at two different values of the covariate
(-5 and 5). The plot does not reveal any problem with the model, but it cannot show how the
response changes between -5 and 5, as no data were collected there.
DR EVEN
with(DrEven, plot(Temp, PercCells, main = "Scatter plot of Dr Even's data",
     xlab = "temperature", ylab = "Percentage of viable cells"))
For Dr Even's data, the observations are evenly spread between -5 and 5. The response appears
to increase with temperature between -5 and about 2, but after that it seems to
decrease. A simple linear regression model might not be appropriate here. Perhaps a quadratic
model would be better.
c) Which of the approaches is more appropriate? Justify your answer.
Dr Varlow's approach should result in estimators with minimal variance (for the given
range and number of observations) IF the simple linear regression model is appropriate.
However, it does not show us how the response changes with temperature between the two points.
Dr Even's data suggest that SLR is probably not appropriate here. Given that the
researchers couldn't be certain in advance that SLR was the correct model, Dr Even's approach is
better: it showed us that the shape of the response is not a straight line!
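The minimal-variance claim can be illustrated through Sxx, the sum of squared deviations of the temperatures from their mean, because under SLR the slope's standard error is the residual standard error divided by the square root of Sxx. The sketch below assumes Dr Varlow took 10 observations at each of -5 and 5 and Dr Even took 20 equally spaced temperatures from -5 to 5. These designs are assumptions, not read from the data files, although the standard errors reported in part a) are consistent with them. The names x_varlow, x_even, sxx and implied are illustrative.

```r
# Assumed designs (not read from the CSV files)
x_varlow <- rep(c(-5, 5), each = 10)      # 10 runs at each endpoint
x_even   <- seq(-5, 5, length.out = 20)   # 20 equally spaced temperatures

# Sum of squared deviations of the covariate from its mean
sxx <- function(x) sum((x - mean(x))^2)
sxx_varlow <- sxx(x_varlow)
sxx_even   <- sxx(x_even)

# Sxx implied by the printed output: (residual SE / slope SE)^2
implied <- c(DrVarlow = (5.152 / 0.2304)^2,
             DrEven   = (5.451 / 0.4017)^2)

print(c(DrVarlow = sxx_varlow, DrEven = sxx_even))
print(round(implied, 1))

# For the same error variance, ratio of slope standard errors (Even / Varlow)
se_ratio <- sqrt(sxx_varlow / sxx_even)
print(round(se_ratio, 2))
```

Under these assumptions Sxx is 500 for the two-point design and about 184 for the evenly spread one, so for the same error variance Dr Even's slope standard error is about 1.65 times larger. That is the price paid for being able to check the shape of the response.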
d) What advice would you give Dr Even and Dr Varlow? You should comment on both
modelling approaches and conclusions that could be drawn from the analysis.
They should consider fitting a quadratic model instead of a linear one, ideally combining both
data sets. Given that there is an indication that the SLR model is not appropriate,
conclusions about how the percentage of viable cells changes with temperature based on this
model might be wrong as well. Notice that SLR could not possibly be appropriate for a wider
range of temperatures: for example, the model fitted to Dr Varlow's data implies that at 25°C,
on average, about 110% of cells are viable! This is clearly impossible.
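The extrapolation can be checked directly from the coefficients of Dr Varlow's fitted line reported in part a). This is a minimal sketch. The names b0, b1, pred_25 and temp_at_100 are illustrative, and with the fitted object available, predict(DrVarlow.slr, newdata = data.frame(Temp = 25)) should give the same prediction up to rounding.

```r
# Coefficients copied from summary(DrVarlow.slr)
b0 <- 61.4470   # intercept
b1 <- 1.9752    # slope per degree of storage temperature

# Fitted mean percentage of viable cells at 25 degrees
pred_25 <- b0 + b1 * 25
print(pred_25)

# Temperature at which the fitted line reaches 100%
temp_at_100 <- (100 - b0) / b1
print(round(temp_at_100, 1))
```

The fitted line passes 100% at roughly 19.5°C, so any prediction above that temperature is impossible as a percentage.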
ADDITIONAL COMMENTS: If they were to fit a quadratic model to the combined data sets, the
estimated equation would be y=74.291+1.917x-0.469x^2, with all coefficients significant (more
about this in week 5) and with a maximum at about 2.04°C.
QUADRATIC MODEL FOR COMBINED DATA
AllData <- rbind(DrVarlow, DrEven)
AllData.quad <- lm(PercCells ~ Temp + I(Temp^2), data = AllData)
summary(AllData.quad)
##
## Call:
## lm(formula = PercCells ~ Temp + I(Temp^2), data = AllData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.5788 -3.8177 -0.6838 3.4258 9.3998
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.29138 1.73408 42.842 < 2e-16 ***
## Temp 1.91714 0.20843 9.198 4.26e-11 ***
## I(Temp^2) -0.46947 0.08797 -5.337 4.97e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.452 on 37 degrees of freedom
## Multiple R-squared: 0.7535, Adjusted R-squared: 0.7401
## F-statistic: 56.54 on 2 and 37 DF, p-value: 5.62e-12
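The location of the maximum quoted in the additional comments follows from the quadratic coefficients in the summary above: a fitted curve y = b0 + b1 x + b2 x^2 with b2 < 0 peaks at x = -b1/(2 b2). A minimal sketch with the coefficients typed in from the output (the names b0, b1, b2, x_max and y_max are illustrative; coef(AllData.quad) would supply the same values from the fitted object):

```r
# Coefficients copied from summary(AllData.quad)
b0 <- 74.29138    # intercept
b1 <- 1.91714     # coefficient of Temp
b2 <- -0.46947    # coefficient of Temp^2 (negative, so the curve has a maximum)

# Temperature at which the fitted quadratic is largest
x_max <- -b1 / (2 * b2)

# Fitted percentage of viable cells at that temperature
y_max <- b0 + b1 * x_max + b2 * x_max^2

print(round(c(x_max = x_max, y_max = y_max), 2))
```

This gives a maximum at about 2.04°C, where the fitted percentage of viable cells is about 76.2.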