Assessed coursework 1
ST221 Linear Statistical Modelling
Deadline: 1 March 2022, 1 pm
Please read these instructions carefully!
This assignment counts for 15% of your final module mark. The maximum score for this
coursework is 30 marks.
Your solutions must be produced using a word processor, R Markdown, or LaTeX. You may
cut and paste R output. Use a font size of 11pt or larger. Question sub-sections must be
clearly labelled for ease of marking.
If your solutions are not typed, they will not be
accepted as a submission.
Convert your solutions into a single PDF file and submit it on the ST221
Moodle page. Please DO NOT add your name to your submission, so that it can be marked
anonymously.
Please read Chapter 5 of the course guide, which gives details of the coursework procedures,
including how to apply for extensions and the penalties for late submission. Please ensure that you
submit in good time before the deadline. Penalties will apply if work is submitted more than 1
minute after the deadline unless an extension or waiver is granted. Coursework is not eligible
for mitigating circumstances for the loss of work in progress. The penalty for late submission
is 5% per 24-hour period encompassing a working day. However, no submission will be
accepted more than 5 working days after the original deadline unless there is a pre-approved
extension extending past the cut-off period.
If you have any queries about the coursework, please post them on the ST221 forum, but do
not post any part of your solutions. You can also submit questions via the anonymous question
form on Moodle.
Please be aware that your work will be submitted to TurnItIn, a piece of
plagiarism-detection software. Cases of suspected collusion or plagiarism will be followed
up as outlined in Section 5.3 of the course guide.
Make sure to read the questions carefully. If asked to produce a plot, then please include the
plot in your report. Make sure it is of appropriate scale and the axes are clearly labelled.
Include R code only if requested to do so.
Good luck with the assignment!
Download the file Nambe.csv from the module webpage and load it into R.¹ The dataset
consists of information about the production of various tableware products. After casting,
each piece of tableware goes through a series of grinding and polishing steps.
The variables (adapted from the original dataset) are:
• Type: the type of product; a categorical variable with categories Bowl, Plate and Tray;
• Diam: the diameter of the product (in inches);
• Time: the total grinding and polishing time of the product (in minutes).
An engineer in the ceramic factory suggests that the total time it takes to grind and polish
the product can be predicted from its diameter using an equation of the form
$$\text{time} = a \times \text{diameter}^{b} \qquad (1)$$
for some constants a and b.
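To make equation (1) concrete, here is a minimal R sketch that evaluates it for hypothetical constants a = 2 and b = 1.5. These are illustrative values only, not estimates from Nambe.csv, and the function name `power_law_time` is likewise made up. The sketch shows one property of this form: multiplying the diameter by a factor k multiplies the time by k^b, whatever the value of a.

```r
# Equation (1): time = a * diameter^b.
# The constants below are hypothetical, chosen only for illustration;
# they are NOT estimates from the Nambe data.
power_law_time <- function(diameter, a, b) {
  a * diameter^b
}

a_hyp <- 2    # hypothetical scale constant a
b_hyp <- 1.5  # hypothetical exponent b

diameters <- c(5, 10, 20)  # diameters in inches
times <- power_law_time(diameters, a_hyp, b_hyp)

# Doubling the diameter (10 -> 20) multiplies the time by 2^b,
# independently of the value of a.
ratio <- times[3] / times[2]
print(ratio)
print(2^b_hyp)
```

Here doubling the diameter from 10 to 20 inches multiplies the time by 2^1.5, which is about 2.83.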
(a) [4 marks] Produce a scatterplot of the grinding and polishing time against the diameter
of the product. Use different colours and/or plotting symbols to show the type of the product.
Your plot should be clearly labelled and contain a legend.
(b) [3 marks] Explain how the relationship in equation (1), as suggested by the engineer,
can be transformed into a relationship that can be modelled by a simple linear regression.
What assumptions are you making about the errors in the original relationship, that is, on the
original scale of the response?
(c) [4 marks] Fit the simple linear regression model in (b), that is, a model for the
transformed relationship. Produce a ‘residuals versus fitted values’ plot and a scale-location
plot of the fitted model, and discuss whether linearity and homoscedasticity can be assumed
to hold.
(d) [2 marks] Give a quantitative interpretation of the estimated slope parameter for the
model in (c) on the original scale of the response variable Time.
(e) [2 marks] Predict the time (in minutes) that it will take to grind and polish a product
that has a diameter of 15 inches.
(f) [1 mark] Reproduce the plot from part (a) and add a curve that shows how, according to
the model in (c), the predicted time for grinding and polishing changes with diameter.
¹ The data is adapted from the Nambe dataset available from DASL, the Data and Story Library.
(g) [3 marks] The engineer sees your plot, which shows the relationship between the
predicted time for grinding/polishing and the diameter. They notice that the
predicted grinding and polishing time for a given product appears to be (approximately)
proportional to the diameter of the product. They wonder whether a
simple linear regression of time on diameter (possibly through the origin) would have sufficed.
Discuss whether this would have been a suitable alternative, supporting your answer with
appropriate evidence.
(h) [2 marks] The engineer then suggests that the constant a in equation (1) may depend on
the type of the product. Explain how to modify the model in (b)-(c) to accommodate this.
(i) [2 marks] Write out the model equations for the new linear model suggested in (h).
(j) [2 marks] Give a description of the jth row of the design matrix for the model suggested
in (h) using indicator variables.
(k) [2 marks] Fit the model in (h). Reproduce the plot from part (a) and add a curve for
each product type that shows how, according to the fitted model, the predicted time for
grinding and polishing changes with diameter.
Hint: If c is a vector of observations for the variable C and d is a vector of corresponding
observations for the variable D, then to compute the predictions from a model m with
explanatory variables C and D, we use a command of the form
predict(m, list(C=c, D=d)).
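The hint can be illustrated with a minimal sketch on small made-up data rather than Nambe.csv. The response `Y`, the numeric variable `C`, the two-level factor `D`, and the names `c_new`, `d_new` and `pred` are all hypothetical. The data are generated without noise as Y = 1 + 2C, plus 3 when D is "b", so the fitted model recovers these coefficients exactly. `predict()` then looks up `C` and `D` by name in the supplied list.

```r
# Hypothetical example data: C is numeric, D is a two-level factor.
C <- c(1, 2, 3, 4, 1, 2, 3, 4)
D <- factor(c("a", "a", "a", "a", "b", "b", "b", "b"))
# Noise-free response: intercept 1, slope 2 in C, shift of 3 when D == "b".
Y <- 1 + 2 * C + 3 * (D == "b")

m <- lm(Y ~ C + D)

# New values at which to predict. D can be given as a character vector
# because predict() converts it using the factor levels stored in m.
c_new <- c(0, 10, 10)
d_new <- c("a", "a", "b")
pred <- predict(m, list(C = c_new, D = d_new))
print(pred)
```

With these values the predictions are 1, 21 and 24. Passing `data.frame(C = c_new, D = d_new)` instead of the list gives the same predictions.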
(l) [3 marks] Judging from the plots in (f) and (k), do you think that the engineer was right
to suggest that the constant a in equation (1) should depend on the type of the product?
Justify your answer.