F20SA/F21SA 2021: Assessed Project
In this project we analyse average wind speeds at the top of Arthur’s Seat in Edinburgh. We
will model the wind speed by using a Rayleigh statistical model, which we will fit to the data
by maximum likelihood estimation and subsequently use to make predictions about future
wind speeds.
Project description
You have been asked to analyse the distribution of the wind speed at the top of Arthur’s Seat
in Edinburgh. You have access to the values $x = (x_1, \dots, x_{1000})$ of average wind speeds
registered on 1000 days in the years 2018-2020. The data is provided in the wind.csv file, with
values recorded in miles per hour (mph). You are also asked to use this data to make some
predictions about the wind speed in the subsequent years.
You are advised to assume a Rayleigh model for the wind speeds, with probability density
function given by
\[
f(x, \sigma) =
\begin{cases}
\dfrac{x}{\sigma^2} \exp\left( -\dfrac{x^2}{2\sigma^2} \right), & x \ge 0, \\
0, & x < 0,
\end{cases}
\]
where σ > 0 is an unknown parameter.
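
As a quick sanity check on the density above, the following R sketch codes it directly and confirms numerically that it integrates to 1. The helper name `rayleigh_pdf` and the value `sigma_demo = 10` are illustrative assumptions, not part of the project materials.

```r
# Rayleigh density exactly as written in the project description.
# `rayleigh_pdf` is a hypothetical helper name, not a function from any package.
rayleigh_pdf <- function(x, sigma) {
  stopifnot(sigma > 0)
  ifelse(x >= 0, (x / sigma^2) * exp(-x^2 / (2 * sigma^2)), 0)
}

sigma_demo <- 10  # arbitrary illustrative value, NOT an estimate from wind.csv

# A valid density should integrate to 1 over [0, Inf).
total_mass <- integrate(rayleigh_pdf, lower = 0, upper = Inf,
                        sigma = sigma_demo)$value
print(total_mass)

# Optional cross-check against VGAM::drayleigh, if the package is installed.
if (requireNamespace("VGAM", quietly = TRUE)) {
  grid <- seq(0, 50, by = 0.5)
  print(all.equal(rayleigh_pdf(grid, sigma_demo),
                  VGAM::drayleigh(grid, scale = sigma_demo)))
}
```

Coding the density by hand is only a check on the formula; for the analysis itself the packaged functions are likely to be more convenient.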
1. Summarise the data by plotting a histogram and calculating numerical summaries, such
as the sample mean, standard deviation, median, and other quantiles. Briefly comment
on your results.
[3 marks]
2. Derive the maximum likelihood estimator for $\sigma$ (denoted henceforth by $\hat{\sigma}$). (Hint: For
any $n \ge 1$ we have $\frac{d}{d\sigma}\left(\frac{1}{\sigma^n}\right) = -\frac{n}{\sigma^{n+1}}$.)
[4 marks]
3. Derive the Fisher information for $\sigma$ and use it to approximate the distribution of $\hat{\sigma}$.
(Hint: Use the fact that for a random variable $X$ with the Rayleigh distribution with
parameter $\sigma > 0$, we have $E[X^2] = 2\sigma^2$.)
[4 marks]
4. Using the results of parts 2 and 3, calculate $\hat{\sigma} = \hat{\sigma}(x)$ and report an approximate 95%
equal-tailed confidence interval $I = [\sigma_L(x), \sigma_U(x)]$ for $\sigma$.
[4 marks]
5. For any $i = 1, \dots, 1000$, let $X'_i \sim \mathrm{Rayleigh}(\hat{\sigma})$ denote the predicted wind speed on a
random day in the future. Let $Y' = \frac{1}{1000} \sum_{i=1}^{1000} X'_i$ be the predicted mean wind speed
in the next 1000 days (assume that predicted individual wind speeds are independent
of each other). Use simulation in R to estimate the distribution of $Y'$. Report this
distribution by plotting a histogram and calculating numerical summaries. (Hint: Use
the VGAM R package to get access to functions involving the Rayleigh distribution.)
[4 marks]
6. Climate researchers believe that in the coming years the variance of the wind speed
will increase. Using the result of part 5, calculate the probability that this assumption
is true based on the data and model available (i.e., calculate $P[\mathrm{sd}(Z') > \mathrm{sd}(x)]$, where
$\mathrm{sd}(Z')$ is the standard deviation of the predicted sample of future wind speeds
$Z' = (X'_1, \dots, X'_{1000})$ and $\mathrm{sd}(x)$ is the standard deviation of the sample $x = (x_1, \dots, x_{1000})$).
Briefly comment on your results.
[3 marks]
7. To assess if your conclusions are robust to errors in the estimation of $\sigma$, recalculate
the probability from part 6 by using the lower and upper estimates $\sigma_L(x)$ and $\sigma_U(x)$
derived in part 4. Briefly comment on your results.
[3 marks]
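
For part 1, a minimal R sketch of the data summary might look as follows. It assumes that wind.csv sits in the working directory and that its first column holds the daily average wind speeds; the column layout is a guess, since the brief does not describe it. If the file is missing, the code falls back to placeholder data so that it still runs.

```r
# Assumption: wind.csv is in the working directory and its FIRST column
# contains the 1000 average wind speeds in mph (file layout not confirmed).
if (file.exists("wind.csv")) {
  wind <- read.csv("wind.csv")
  x <- wind[[1]]
} else {
  # Placeholder values so the sketch runs without the file; NOT the real data.
  set.seed(1)
  x <- 10 * sqrt(-2 * log(runif(1000)))
}

# Histogram of the observed wind speeds.
hist(x, breaks = 30,
     main = "Average daily wind speed",
     xlab = "Wind speed (mph)")

# Numerical summaries: mean, standard deviation and selected quantiles.
summary_stats <- c(mean = mean(x), sd = sd(x),
                   quantile(x, probs = c(0.05, 0.25, 0.5, 0.75, 0.95)))
print(round(summary_stats, 2))
```

The choice of quantiles and the number of histogram bins are arbitrary here and can be adjusted to suit the report.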
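
For part 2, an analytical estimator can be cross-checked by maximising the log-likelihood numerically. The sketch below does this on simulated data with a known parameter; `sigma_true`, `x_sim` and `log_lik` are hypothetical names, and the simulated sample stands in for the real data. It is a numerical check only and does not replace the derivation asked for in the question.

```r
set.seed(42)
sigma_true <- 10  # known value used only to generate test data

# Inverse-transform sampling: if U ~ Uniform(0, 1), then
# sigma * sqrt(-2 * log(U)) follows a Rayleigh(sigma) distribution.
x_sim <- sigma_true * sqrt(-2 * log(runif(1000)))

# Log-likelihood of an i.i.d. Rayleigh sample: the sum of log f(x_i, sigma).
log_lik <- function(sigma, x) {
  sum(log(x) - 2 * log(sigma) - x^2 / (2 * sigma^2))
}

# One-dimensional numerical maximisation over a wide search interval.
fit <- optimize(log_lik, interval = c(0.1, 100), x = x_sim, maximum = TRUE)
sigma_numeric <- fit$maximum
print(sigma_numeric)
```

If a closed-form estimator evaluated on the same sample agrees with `sigma_numeric` to several decimal places, that is reasonable evidence the derivation is correct.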
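
For parts 3 and 4, the general shape of the large-sample result can be written down without reference to the Rayleigh model. The notation $I_n(\sigma)$ for the Fisher information of the whole sample of $n$ observations is an assumption of this sketch (for independent, identically distributed data it is $n$ times the information in a single observation), and the interval shown is the standard Wald construction.

```latex
\hat{\sigma} \;\overset{\text{approx.}}{\sim}\;
N\!\left( \sigma,\ \frac{1}{I_n(\sigma)} \right),
\qquad
I = \left[\,
  \hat{\sigma} - \frac{z_{0.975}}{\sqrt{I_n(\hat{\sigma})}},\;
  \hat{\sigma} + \frac{z_{0.975}}{\sqrt{I_n(\hat{\sigma})}}
\,\right],
\qquad
z_{0.975} \approx 1.96 .
```

The unknown $\sigma$ inside the information is replaced by $\hat{\sigma}$ so that the endpoints can be computed from the data.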
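
For part 5, the simulation could be organised along the following lines. The value of `sigma_hat` is a placeholder rather than the estimate from the data, the number of replications `n_rep` is an arbitrary choice, and `r_rayleigh` is a hypothetical wrapper that uses `VGAM::rrayleigh` when the package is installed and inverse-transform sampling otherwise.

```r
set.seed(2021)
sigma_hat <- 10   # placeholder; substitute the estimate obtained in part 4
n_days    <- 1000 # number of future days averaged in Y'
n_rep     <- 2000 # number of simulated copies of Y' (arbitrary choice)

# Hypothetical helper: draw n Rayleigh(sigma) values.
r_rayleigh <- function(n, sigma) {
  if (requireNamespace("VGAM", quietly = TRUE)) {
    VGAM::rrayleigh(n, scale = sigma)
  } else {
    sigma * sqrt(-2 * log(runif(n)))  # inverse-transform fallback
  }
}

# Each replication simulates 1000 future days and stores their mean.
y_sim <- replicate(n_rep, mean(r_rayleigh(n_days, sigma_hat)))

hist(y_sim, breaks = 30,
     main = "Simulated distribution of Y'",
     xlab = "Predicted mean wind speed (mph)")
print(summary(y_sim))
print(sd(y_sim))
```

Increasing `n_rep` gives a smoother histogram at the cost of a longer run time.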
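
For parts 6 and 7, the required probability can be estimated by Monte Carlo: simulate many future samples, compute the standard deviation of each, and record how often it exceeds the observed one. In the sketch below `prob_sd_exceeds` is a hypothetical helper, and the values passed to it are placeholders rather than results from the data.

```r
# Hypothetical helper: Monte Carlo estimate of P[sd(Z') > sd_obs] when the
# future sample Z' consists of n_days independent Rayleigh(sigma) draws.
prob_sd_exceeds <- function(sigma, sd_obs, n_days = 1000, n_rep = 2000) {
  sd_sim <- replicate(n_rep, {
    z <- sigma * sqrt(-2 * log(runif(n_days)))  # inverse-transform Rayleigh draws
    sd(z)
  })
  mean(sd_sim > sd_obs)  # proportion of simulated samples exceeding sd_obs
}

set.seed(7)
# Placeholder inputs; replace with the fitted sigma and sd(x) from wind.csv.
p_demo <- prob_sd_exceeds(sigma = 10, sd_obs = 6.5)
print(p_demo)
```

Writing the calculation as a function of `sigma` means the same code can be called again with other parameter values, such as the endpoints of a confidence interval.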
Your findings should be presented in the form of a report, which should:
• have a clear and logical structure;
• include an introduction and clearly stated conclusions that can be understood by any
numerate scientist (not necessarily a statistician);
• include details of your mathematical calculations so that your results could be reproduced
by another statistician;
• include clearly labelled and correctly referenced tables and diagrams, as appropriate;
• include the R code you used in an appendix (you do not need to explain individual
R commands but some comments should be included to indicate the purpose of each
section of code);
• include citation and referencing for any material (books, papers, websites etc.) used;
• not exceed the page limit of four (4) pages (11-point font, A4 size, 4 pages = 2 sheets
of paper; additional pages are allowed for the R code). Excluding R code, only the
first four pages of your submission will be marked. No feedback will be given, or
marks awarded, for any work (apart from R code) appearing on subsequent pages.
A total of 5 marks is available for these aspects of your report. These aspects will be marked
according to the rubric given in the Appendix.
[Total: 30 marks]
Notes
• This assignment counts for 30% of the course assessment.
• You may discuss this project with your colleagues, but your report must be your own
work. Plagiarism is a serious academic offence and carries a range of penalties, some
very serious. Copying a friend’s report or code, or copying text into your report from
another source (such as a book or website) without citing and referencing that source, is
plagiarism. Collusion is also a serious academic offence. You must not share a copy of
your report (as a hard copy or in electronic form) or your computer code with anyone
else. Penalties for plagiarism or collusion can include voiding of your mark for the
course.
• Your report should be submitted through Canvas by 15:30 GMT on Friday 26th
November 2021. A link to the submission page is available through the ‘Assessment’
section of the course Canvas page. Please use the submission link appropriate
for the campus where you are studying (Edinburgh or Dubai). For late submissions,
30% will be deducted for work submitted at most 5 working days late. Any
coursework submitted more than 5 working days after the set submission date will
receive 0 marks.
Appendix: Rubric for marking of the report
The five marks available for the exposition of your report will be awarded according to the
scale below:
0–1 marks will be awarded for:
• Lack of clear and logical structure
• Conclusions missing or not suitable for a non-statistician
• Statistical calculations and methodology not clearly set out for the reader
• Tables and figures unclear, badly labelled or not correctly referred to
• R code not included, or no comments included in it
• Sources used not clearly referenced

2–3 marks will be awarded for:
• Clear and logical structure
• Conclusions generally suitable for a non-statistician
• Statistical calculations and methodology generally set out clearly for the reader
• Tables and figures often clear and correctly referred to
• R code included with some comments
• Sources used clearly referenced

4–5 marks will be awarded for:
• Clear and logical structure
• Conclusions suitable for a non-statistician
• Statistical calculations and methodology set out clearly for the reader
• Tables and figures clear, correctly referred to and easy to interpret
• R code included with comments
• Sources used clearly and correctly referenced