Date: 2021-12-17

SMM634 Individual Assignment (worth 50% of final grading)
Deadline 17 December 2021
1. For this assignment you need to use the data frame wine, which contains information on prices and growing
characteristics of 25 Bordeaux wines from 1952 to 1998. The data frame contains 7 columns and 47 rows.
The columns are: year of production (year), average price of the wines as a percentage of the 1961 price
(price), mm of rain in the harvest month (h.rain), average temperature (°C) over the summer preceding
harvest (s.temp), mm of rain in the winter preceding harvest (w.rain), average temperature (°C) at harvest
(h.temp), and a rating of the wine quality (parker).
See https://www.wine-searcher.com/critics-27-robert+parker+the+wine+advocate
for details on parker.
The aim of the analysis is to model the response variable price as a function of the variables described
above.
Using these data, write a report addressing the following points:
(a) Justification of the chosen regression model specification. [6 marks]
(b) Using the final model, provide a summary (e.g., using tables and figures) of the empirical findings as
well as interpretation of the estimated model parameters. [6 marks]
(c) Provide recommendations and limitations of your analysis. [6 marks]
(d) What did you learn from the analysis? What is the answer, if any, to the questions you set out to
address? How can the analysis be improved? [6 marks]
The remaining 6 marks will be for presentation.
The report must be in PDF format. Excluding the title page, it must not be longer than 6 pages (including
graphs, tables, etc.), using 12pt font with one-and-a-half line spacing and margins of at least 2.5 cm.
2. In an experiment designed to simulate a production operation carried out at different speeds, 15 similarly
experienced operatives were randomly divided into 5 groups of 3 and each group was randomly allocated to
one of the speeds Speed = 1, 2, 3, 4, 5. Each operative was required to perform a routine task repetitively
over a given period of time. The total numbers of mistakes (Mistakes) over the 3 runs at each speed were
as follows:
  Speed Mistakes
1     1        2
2     2        7
3     3       25
4     4       47
5     5      121
> # Fitted values and residuals
> cbind(fitted(speed.glm),resid(speed.glm))
        [,1]        [,2]
1   3.245971 -0.74489780
2   8.031721 -0.37229102
3  19.873420  1.10520289
4  49.174117 -0.31236557
5 121.674773 -0.06122931
> # Residual plots below
> plot(speed.glm,which=1:4)
[Figure: two data plots. "Scatterplot of Mistakes vs Speed" (x-axis: Speed, y-axis: Mistakes) and "Plot of Log(Mistakes) vs Speed" (x-axis: Speed, y-axis: Logmistakes).]
[Figure: four residual plots from plot(speed.glm, which=1:4). "Residuals vs Fitted" (Residuals against Predicted values), "Normal Q-Q" (Std. deviance resid. against Theoretical Quantiles), "Scale-Location" (Std. deviance resid. against Predicted values), and "Cook's distance" (Cook's distance against Obs. number).]
(a) Comment on the two plots above and suggest two suitable models to fit the data. [3 marks]
(b) By looking at the R code below, write the statistical model that has been fitted. [3 marks]
speed.glm <- glm(Mistakes ~ Speed, family = poisson)
summary(speed.glm)
Call:
glm(formula = Mistakes ~ Speed, family = poisson)
Deviance Residuals:
       1         2         3         4         5
-0.74490  -0.37229   1.10520  -0.31237  -0.06123
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.27143    0.33881   0.801    0.423
Speed        0.90598    0.07574  11.962   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 219.1271 on 4 degrees of freedom
Residual deviance: 2.0163 on 3 degrees of freedom
AIC: 29.828
Number of Fisher Scoring iterations: 3
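The fit above can be reproduced with a short script. This is a minimal sketch, assuming the data are typed in by hand as two vectors named Speed and Mistakes that match the table in the question; the object name speed.glm follows the output above.

```r
# Data from the question, entered by hand (assumed to match the table above)
Speed    <- 1:5
Mistakes <- c(2, 7, 25, 47, 121)

# Poisson regression; the poisson family uses the log link by default
speed.glm <- glm(Mistakes ~ Speed, family = poisson)

# Coefficient table, null and residual deviance, AIC
print(summary(speed.glm))

# Fitted means and deviance residuals, side by side
print(cbind(fitted    = fitted(speed.glm),
            dev.resid = resid(speed.glm, type = "deviance")))
```

The squared deviance residuals sum to the residual deviance, which is one quick way to check that the two pieces of output belong to the same fit.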
(c) Explain what is meant by
Number of Fisher Scoring iterations: 3
[1 mark]
(d) Does the model above fit the data well? Justify your answer. [2 marks]
(e) What null hypothesis H0 is tested with the Null deviance? What conclusion can be drawn? Is
there an alternative way to test the same H0? Justify your answer. [4 marks]
(f) Consider the R output below and write the statistical model that has been fitted. [3 marks]
logMistakes <- log(Mistakes)
llm <- lm(logMistakes ~ Speed)
summary(llm)
Residuals:
       1        2        3        4        5
-0.18572  0.05609  0.31810 -0.06158 -0.12689
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.13208    0.24124  -0.548 0.622129
Speed        1.01095    0.07274  13.899 0.000806 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.23 on 3 degrees of freedom
Multiple R-Squared: 0.9847, Adjusted R-squared: 0.9796
F-statistic: 193.2 on 1 and 3 DF, p-value: 0.0008063
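The linear-model output above can be checked in the same way. The sketch below again assumes the data are entered by hand as Speed and Mistakes; the names logMistakes and llm follow the code in the question.

```r
# Data from the question, entered by hand (assumed to match the table above)
Speed    <- 1:5
Mistakes <- c(2, 7, 25, 47, 121)

# Ordinary least squares on the log-transformed counts
logMistakes <- log(Mistakes)
llm <- lm(logMistakes ~ Speed)

# Coefficients, residual standard error, R-squared and F-statistic
print(summary(llm))

# Residuals on the log scale, as listed in the output above
print(round(resid(llm), 5))
```

Note that this model is fitted on the log scale, so its residuals and standard error are not directly comparable with those of the Poisson GLM, which models the counts themselves.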
(g) Explain what the R code below is doing. [2 marks]
plot(Speed, logMistakes)
abline(coef(llm))
abline(coef(speed.glm), lty=2)
legend(1,4.5,c("Normal linear model with logMistakes",
"Poisson GLM with log link"), lty=1:2)
(h) Look at the plot below. What conclusions can you draw? [2 marks]
# Residual standard error: 0.1545 on 3 degrees of freedom
# Multiple R-Squared: 0.9916, Adjusted R-squared: 0.9888
# F-statistic: 354.9 on 1 and 3 DF, p-value: 0.0003265
[Figure: logMistakes against Speed, with two fitted lines labelled "Normal linear model with logMistakes" and "Poisson GLM with log link".]
[Figure: log(Mistakes+1) against Speed, with two fitted lines labelled "Normal linear model with log(Mistakes+1)" and "Poisson GLM with log link, fitted values".]
G1 R output 10: Comments
1. The plot of ‘no. of mistakes’ versus ‘speed’ shows an exponential relation. The plot of ‘log no.
of mistakes’ versus ‘speed’ suggests modelling this by a straight line, i.e. a log-linear
model.
2. Fitting the log-linear model (R commands explained in G3).
3. Maximum likelihood estimates $\hat{\beta}_0$ (first row), $\hat{\beta}_1$ (2nd row) (Section 3.2.1 (i), (ii)).
4. Number of iterations for the iterative procedure to converge: just 3 (Section
3.2.1 (ii)).
5. Stating that $\phi = 1$ (Section 3.1).
6. Estimate of $V(\hat{\beta})$ from $(X^T W X)^{-1}$ with $W$ estimated (Exercises 6: $w_{ii}$ estimated
by $\hat{\mu}_i$).
7. Standard errors: $se(\hat{\beta}_0)$ (first row), $se(\hat{\beta}_1)$ (2nd row) (Section 3.2.2 (i)).
8. Testing $H_0: \beta_0 = 0$ (1st row), $H_0: \beta_1 = 0$ (2nd row): the z-value is $\hat{\beta}_j / se(\hat{\beta}_j)$ (Section
3.2.3 (i)).
9. Last column: $Pr(>|z|)$ gives the p-values for testing each null hypothesis in 8. The very
small p-value in the 2nd row indicates strong evidence against $\beta_1 = 0$, as we would
expect from the plots.
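Comments 6 to 9 can be traced numerically. The sketch below is an illustration under stated assumptions: the data are entered by hand as Speed and Mistakes, and the working weights are taken as the fitted means, which is the form they take for a Poisson model with log link. The names X, mu.hat, W, V, se, z and p are introduced here for illustration only.

```r
# Data from the question, entered by hand (assumed to match the table above)
Speed    <- 1:5
Mistakes <- c(2, 7, 25, 47, 121)
speed.glm <- glm(Mistakes ~ Speed, family = poisson)

X      <- model.matrix(speed.glm)   # 5 x 2 design matrix: intercept and Speed
mu.hat <- fitted(speed.glm)         # fitted Poisson means
W      <- diag(mu.hat)              # diagonal weight matrix with w_ii = mu.hat_i

V  <- solve(t(X) %*% W %*% X)       # estimated covariance matrix of beta-hat
se <- sqrt(diag(V))                 # standard errors (comment 7)
z  <- coef(speed.glm) / se          # z-values (comment 8)
p  <- 2 * pnorm(-abs(z))            # two-sided normal p-values (comment 9)

print(cbind(Estimate = coef(speed.glm), Std.Error = se, z.value = z, p.value = p))
```

The hand-computed standard errors should agree closely with those reported by summary(speed.glm); tiny differences can arise because R takes its weights from the last Fisher scoring iteration rather than recomputing them at the final estimates.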
[Total marks: 50]