Date: 2021-11-30
Assignment 2 (2020) alternative questions
STATS4044 Intro to R (H) 2021-2022
Introduction
This document contains alternative questions for Assignment 2 (2020), additional to those available in
A2_2020.pdf and A2_2020.html.
Contents:
• 1 alternative version of Question 2 (worth 13 marks):
– Calibration of reflectance
• 1 alternative version of Question 3 (worth 16 marks):
– Triangles
• 2 additional questions that were not used in the assignment, but provide extra material of a similar
level:
– mcycle
– Circle
• 1 additional question from the DD80 level Assignment 2. This is at a lower level of material than those
for the Honours level assignment:
– Triangles (DD80)
The original assignment contained 1 of each version of Question 2 and 3, along with Question 1, which is
available in A2_2020.pdf and A2_2020.html (35 marks total).
Question 2 - Calibration of reflectance [13 marks total]
The data for this question is the file a2data2020.RData which is available on Moodle, or can be loaded
directly into R by running the following line of code:
load(url("https://www.maths.gla.ac.uk/~rhaggarty/rp/a2data2020.RData"))
Water quality can be measured in a number of different ways. One method involves using reflectance
measurements from earth observation satellites, whilst another involves collecting the equivalent measurements
by humans using hand held instruments (these are called in situ measurements). These two types of
measurement are often regressed against one another to assess how similar they are.
The file a2data2020.RData contains a data frame called calib, which contains the satellite measured
reflectance (stored in the column named sat) and the in-situ measured reflectance (stored in column insitu)
for 110 unique lake locations.
(a) [3 marks]
Create a scatter plot of sat against insitu. Label the x-axis “In situ reflectance”, and the y-axis “Satellite
Reflectance”. The title of your plot should be “Water Reflectance Calibration”.
(b) [2 marks]
Adapt your plot from part (a) so that the points which represent lake locations where the satellite reflectance
measurement is greater than the in situ reflectance measurement are represented by a different plotting
character to those where the opposite is true (i.e. where the satellite measurement is less than the in situ).
(c) [1 mark]
A linear regression model (without intercept) fitted to this data gives the following estimated regression
equation:
sat = 1.1903 insitu
Add this regression line to the plot.
(d) [2 marks]
A 95% confidence interval for the expected (average) sat reflectance given the insitu reflectance is
(1.1247 insitu, 1.2559 insitu)
Add red dashed lines to the plot corresponding to the lower and upper bound of this 95% confidence interval.
(e) [3 marks]
A 95% prediction interval for sat given insitu is
$$\left(1.1903\,\text{insitu} - 0.0852\sqrt{18.7 + \text{insitu}^2},\; 1.1903\,\text{insitu} + 0.0852\sqrt{18.7 + \text{insitu}^2}\right)$$
Add blue dotted lines to the plot corresponding to the lower and upper bound of this 95% prediction interval.
(f) [2 marks]
Add a legend to your plot which indicates what each type of line from parts (c)-(e) represents. Your final
plot should look similar to the one below.
Question 3 - Triangles [16 marks total]
For this question you should first set up a plotting canvas using the code below
plot(c(1,4), c(2,8), type="n")
The function polygon can be used to draw a polygon with a set of specified vertices.
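As a rough illustration of how polygon is called, the sketch below draws a square on a blank canvas. The square, its coordinates and the names sq_x and sq_y are hypothetical and are not part of the assignment; pdf(NULL) is only used so the sketch runs without opening a plotting window.

```r
# Hypothetical example: a square, not one of the assignment's triangles.
pdf(NULL)                      # null device so the sketch runs anywhere
plot(c(0, 3), c(0, 3), type = "n", xlab = "x", ylab = "y")

sq_x <- c(1, 2, 2, 1)          # x-coordinates of the vertices, in drawing order
sq_y <- c(1, 1, 2, 2)          # matching y-coordinates

# polygon() joins the vertices in order and closes the shape automatically;
# graphical arguments such as border and lty control its appearance.
polygon(sq_x, sq_y, border = "darkgreen", lty = 2)
invisible(dev.off())
```

Because polygon adds to whatever canvas is already open, the call to plot with type = "n" has to come first.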
(a) [2 marks]
Use the function polygon to produce a plot of the triangle that connects the three points (also called vertices)
(1, 2), (4, 8) and (3, 2). Your plot should look similar to the one below.
In general, let the vertices of a triangle be denoted by A = (xa, ya), B = (xb, yb) and C = (xc, yc). From this
point on we shall refer to this triangle with vertices A, B and C as ∆ABC.
(b) [4 marks]
Next define a function named triangle which takes as arguments three points, A, B and C, and produces
a plot of the triangle ∆ABC that connects those three points. Your function should have the vertices
A = (1, 2), B = (4, 8) and C = (3, 2) as the default values of A, B and C. You should also ensure your
function can pass on additional arguments (such as col, lty, border, ...) to the function polygon.
Note: Your function should act as a low-level plotting function, i.e. it should add the triangle to an existing
plotting canvas that is defined before the function is called.
(c) [1 mark]
Use your function triangle to add an additional triangle to your plot from part (a). The triangle should be
defined by the vertices (1, 7), (2.5, 3) and (3.5, 5).
To show that a point p = (x, y) lies within a triangle with vertices A = (xa, ya), B = (xb, yb) and C = (xc, yc)
we first solve the system of linear equations shown below to find λ1, λ2 and λ3.
$$\begin{pmatrix} 1 & 1 & 1 \\ x_a & x_b & x_c \\ y_a & y_b & y_c \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ x \\ y \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix}$$
Then we can say the point p lies strictly inside the triangle if λ1 > 0, λ2 > 0 and λ3 > 0.
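The sketch below works through this test for a small example. The triangle (0, 0), (1, 0), (0, 1), the two test points and the helper name barycentric are assumptions made for illustration only; they do not come from the assignment.

```r
# Hypothetical triangle and test points, chosen only for illustration.
A <- c(0, 0)
B <- c(1, 0)
C <- c(0, 1)

# Coefficient matrix: a row of ones, then the x- and y-coordinates of A, B, C.
M <- rbind(c(1, 1, 1),
           c(A[1], B[1], C[1]),
           c(A[2], B[2], C[2]))

# solve(M, b) returns M^{-1} b, i.e. (lambda1, lambda2, lambda3) for point p.
barycentric <- function(p) solve(M, c(1, p))

lam_in  <- barycentric(c(0.25, 0.25))  # a point inside the triangle
lam_out <- barycentric(c(1, 1))        # a point outside the triangle

all(lam_in > 0)    # TRUE: every coefficient is positive
all(lam_out > 0)   # FALSE: lambda1 is negative
```

For the first point the coefficients are (0.5, 0.25, 0.25); for the second they are (-1, 1, 1), so the sign check rejects it.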
(d) [5 marks]
Write a function called inside which takes four arguments A = (xa, ya), B = (xb, yb), C = (xc, yc) and
p = (x, y) and returns the value TRUE if p lies within the triangle ∆ABC and FALSE otherwise.
Your function should have the points A = (1, 2), B = (4, 8), C = (3, 2) and p = (2.5, 3) as defaults.
The matrix trip stored in a2data2020.RData contains 3000 points pi = (xi, yi), i = 1, ..., 3000 from a
uniform distribution (xi ∼ Unif(1, 4), yi ∼ Unif(2, 8), i = 1, ..., 3000).
Each row of the matrix corresponds to a single point pi = (xi, yi).
(e) [1 mark]
Add the 3000 points stored in trip to the plot you produced in part (a).
Let the triangle defined by vertices (1, 2), (4, 8) and (3, 2) (the one drawn in part (a)) be called ∆U and
the triangle defined by vertices (1, 7), (2.5, 3) and (3.5, 5) (the one drawn in part (c)) be called ∆V .
(f) [5 marks]
{i} For each point (row) in trip use your function inside to determine if it lies in triangle ∆U. Similarly,
repeat this step to see if each point lies in triangle ∆V .
{ii} Following this, colour the points on your plot so that points inside triangle ∆U are coloured in red, points
inside triangle ∆V are coloured in blue and points inside ∆U ∩∆V (the intersection of triangles U and V )
are coloured in yellow.
Your plot should look similar to the one below.
NOTE: If you have not managed to complete (f) {i} then you can use the vector tricols (stored in
a2data2020.RData) to colour the points stored in trip on your plot. The ith element of tricols is
• U if point i is in ∆U
• V if point i is in ∆V
• both if point i is in ∆U ∩∆V
• neither if point i is in neither ∆U nor ∆V
Additional question - mcycle [18 marks total]
The dataset mcycle is included with R in the MASS package. This data is from a simulated motorcycle accident and consists of two
columns:
• times: milliseconds after impact
• accel: measurement of head acceleration (in g)
To access the data, type and run the following code:
library(MASS)
data(mcycle)
(a) [2 marks]
Plot the mcycle data. Your plot should have meaningful axis labels and look similar to the one below.
[Figure: scatter plot of the mcycle data; x-axis "Time (milliseconds after impact)", y-axis "Acceleration".]
Given the linear model E(yi) = β0 + β1xi and a vector of observation weights w = (w1, ..., wn), the weighted
least squares estimate of the regression coefficients β̂ = (β̂0, β̂1) is
$$\hat{\beta} = (X^T W X)^{-1} X^T W y$$
where y is the response vector and the design matrix, X, and weights matrix, W, can be written as follows
$$X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad W = \begin{pmatrix} w_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & w_n \end{pmatrix}$$
W is a diagonal matrix containing the weights (w1, ..., wn) on the diagonal and 0 off the diagonal.
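To make the matrix formula concrete, the sketch below evaluates it on a small made-up data set and compares the result with R's built-in lm called with a weights argument. The vectors x, y and w are hypothetical values chosen for illustration, not assignment data.

```r
# Hypothetical toy data, for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
w <- c(1, 2, 1, 0.5, 1)

X <- cbind(1, x)   # design matrix: a column of ones and the covariate
W <- diag(w)       # diagonal matrix of weights

# (X^T W X)^{-1} X^T W y, computed with solve() rather than an explicit inverse.
beta_hat <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

# lm() with weights minimises the same weighted sum of squares,
# so its coefficients should agree with beta_hat.
fit <- lm(y ~ x, weights = w)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```

Passing the right-hand side to solve as a second argument avoids forming the inverse explicitly, which is usually the more numerically stable choice.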
(b) [4 marks]
Write a function weighted.ls which takes the vectors x, y, and w as arguments and which returns the
weighted least squares estimate βˆ = (β0, β1).
(c) [3 marks]
Suppose we want to predict the response for a new observation, for which the covariate takes the value x0. The
predicted value of the response is then
$$\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0 \qquad (1)$$
where β̂ = (β̂0, β̂1) is the coefficient estimate obtained using the function weighted.ls.
Write a function predict.wls, which takes x, y, w and x0 as arguments, and returns the least squares
prediction yˆ0 computed using Equation 1.
Locally linear regression is a method of fitting a smooth function f(.) to data points (xi, yi), where x =
(x1, ..., xn) is the covariate and y = (y1, ..., yn) is the response variable.
In locally linear regression the weighted linear model is used to construct a non-linear method for regression.
This is done by computing a different set of weights for each value of x0. Given x0 and a parameter ρ (which
controls how smooth the fitted function will be), the set of weights w = (w1, ..., wn) is computed using
$$w_i = \exp\left(-\rho (x_i - x_0)^2\right) \qquad (2)$$
i.e. observations close to x0 have larger weights.
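A small numerical illustration of this weighting scheme, assuming made-up values for x, x0 and ρ (none of which come from the assignment):

```r
# Hypothetical values, for illustration only.
x   <- c(0, 1, 2, 5)
x0  <- 1
rho <- 0.5

# Weight of each observation: exp(-rho * squared distance from x0).
w <- exp(-rho * (x - x0)^2)
round(w, 4)
# The observation at x = 1 (equal to x0) gets weight 1, the two neighbours
# at distance 1 get exp(-0.5), and the distant point at x = 5 gets exp(-8).
```

Increasing rho makes the weights fall away faster with distance, so each local fit depends on fewer nearby observations.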
(d) [3 marks]
Write a function compute.weights which takes the vector x, the new observation x0 (a single number), and
the parameter ρ as arguments, and which computes the weights w.
(e) [4 marks]
Define a vector called fitted which contains an estimate of local linear regression fitted to the mcycle data
with ρ = 0.15.
First use the code below to create a vector of x0 values at which to estimate the function:
x.0 <- seq(3, 58, length.out=100)
Then for each value of x.0:
• (i) Compute the weights w using Equation 2.
• (ii) Compute the weighted least squares estimate βˆ using w from step (i) as weights, times as the
covariate, x, and accel as the response, y.
• (iii) Compute the predicted value y0 = βˆ0 + βˆ1x0
Hint: You have already implemented step (i) in the function compute.weights, and steps (ii) and (iii) in the
function predict.wls.
(f) [2 marks]
Add the line corresponding to the local linear regression fit you have estimated in part (e) to the plot you
produced in part (a) using a blue line. Your plot should look similar to the one below.
Note: if you have not managed to obtain an estimate for the fitted values in part (e) you can use the vector
fits, which is contained within the file a2data2020.RData, to complete this question.
[Figure: the mcycle plot with the local linear regression fit added; x-axis "Time (milliseconds after impact)", y-axis "Acceleration".]
Additional question - Circle [18 marks total]
(a) [3 marks]
The Euclidean distance between two points (x1, y1) and (x2, y2) can be computed as
$$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Write a function called euc.dist which takes as arguments two points p1 = (x1, y1) and p2 = (x2, y2) and
returns the Euclidean distance between them.
(b) [3 marks]
Write a function named harm.mat which takes a single argument named n. Your function should
• create a vector of length n named t that contains an equally-spaced sequence from 0 to 2π (this vector
does not need to be returned by the function)
• create a matrix named mat with two columns named x and y and n rows, such that xi = sin(ti) and
yi = cos(ti), i = 1, ..., n.
• round mat such that each element of mat is rounded to two decimal places.
• return mat
(c) [1 mark]
Use your function harm.mat with n set to 15 to define a 15-row matrix named data.
Note: if you have not managed to complete parts (b) and (c) you can use the matrix tdata (contained in
a2data2020.RData) in place of the matrix data for the remainder of this question.
(d) [2 marks]
Create a plot of the data points stored in data created in part (c). The x and y axis labels should be “sin(t)”
and “cos(t)”, respectively. You should set the plotting character code to be 19.
(e) [3 marks]
Connect all pairs of points in the plot from part (d) with a line. Your plot should look similar to the one
below.
[Figure: plot titled "Circle"; x-axis "sin(t)", y-axis "cos(t)".]
(f) [4 marks]
Using a for loop (or for loops) draw over each of the horizontal lines on your plot produced in part (e) in
red. Your plot should look similar to the one below.
[Figure: plot titled "Circle"; x-axis "sin(t)", y-axis "cos(t)".]
(g) [2 marks]
Adapt your code from part (f) so that the width of the red lines drawn (controlled by the parameter lwd) is
set to be 3 times the length of the line (i.e. shorter lines are drawn using a narrower line, longer lines
using a wider line). Your function euc.dist from part (a) may be useful here. Your plot should look similar to the
one below.
[Figure: plot titled "Circle"; x-axis "sin(t)", y-axis "cos(t)".]
Additional question (DD80 level) - Triangles (DD80) [18 marks total]
Note that this question was included in the DD80 version of Assignment 2, not the Honours version.
(a) [5 marks]
Produce a plot of the triangle that connects the three points (1, 2), (4, 8) and (3, 2) and draw the hypotenuse
of the triangle (the longest side) using a thick blue line. Your plot should look similar to the one below.
Let the vertices of a triangle be denoted A = (xa, ya), B = (xb, yb) and C = (xc, yc). From this point on we
shall refer to this triangle as ∆ABC.
(b) [2 marks]
The centroid M of ∆ABC can be found using the equation
$$M = \left(\frac{x_a + x_b + x_c}{3},\; \frac{y_a + y_b + y_c}{3}\right)$$
Define a vector of length 2 named M which contains the centroid of ∆ABC where A = (1, 2), B = (4, 8)
and C = (3, 2), and draw the centroid on the plot produced in part (a) with a large green point (you should
use plotting character 19 for this). Your plot should look similar to the one below.
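It can be shown that a point p = (x, y) lies within ∆ABC if the coefficients λ1, λ2 and λ3 found by solving
the system of equations shown below are all greater than 0.
$$\begin{pmatrix} 1 & 1 & 1 \\ x_a & x_b & x_c \\ y_a & y_b & y_c \end{pmatrix}^{-1} \begin{pmatrix} 1 \\ x \\ y \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix}$$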
In other words, we can say the point p lies strictly inside ∆ABC iff λ1 > 0, λ2 > 0 and λ3 > 0.
(c) [3 marks]
Define a vector of length 3 called lambda which contains the values of λ1, λ2 and λ3 found by solving the
linear system shown above that could be used to assess if the point p = (2.5, 3) lies within ∆ABC where
A = (1, 2), B = (4, 8) and C = (3, 2).
Hint: the function solve may be useful here
The matrix trip stored in a2data2020.RData contains 3000 points pi = (xi, yi), i = 1, ..., 3000 from a
uniform distribution (xi ∼ Unif(1, 4), yi ∼ Unif(2, 8) i = 1, ..., 3000).
Each row of the matrix corresponds to a single point pi(xi, yi).
(d) [1 mark]
Add the 3000 points stored in trip to the plot you produced in part (a).
(e) [4 marks]
Using the method above for determining if a point lies inside a triangle, create a vector of length 3000 named
inside where the i-th element is TRUE if the i-th row of trip lies inside the triangle defined by vertices
A = (1, 2), B = (4, 8) and C = (3, 2) and FALSE otherwise.
(f) [3 marks]
Use the vector from part (e) above to colour the points on your plot (i.e. those stored in the rows of trip) so
that points inside the triangle with vertices A = (1, 2), B = (4, 8) and C = (3, 2) are coloured in red and
points outside the triangle are in black.
You should ensure the centroid of the triangle is completely visible (not covered by points). Your plot should
look similar to the one below.
Note: If you have not managed to successfully complete part (e) then you may use the vector tricols stored
within the file a2data2020.RData to colour the points on the plot. The ith element of the vector tricols
can be interpreted as follows
• U - point i is inside the triangle∗
• both - point i is inside the triangle∗
• neither - point i is outside the triangle∗
• V - point i is outside the triangle∗
∗for clarity the triangle we are referring to here is defined by vertices A = (1, 2), B = (4, 8) and C = (3, 2)