Date: 2021-03-11
STATS 3001 / STATS 4104 / STATS 7054
Statistical Modelling III
Assignment 1
2021
DEADLINE:
• Friday 12th March 2021 5pm (Week 2)
QUESTIONS:
1. AFL home-ground advantage theory
In this assignment, we examine whether there is a home ground advantage
in the AFL.
We will need to develop a more complicated model than the usual linear
regression, so we cannot just use the built-in function lm() directly; we
will need to write some code to fit it.
In this question, we will develop the correct design matrix X for our model,
and in the next question we will apply our method to some real data.
Set up
Consider a football competition with six teams, A, B, C, D, E, F. Each
team has its own home ground, and the following games were played in the
first four weeks of the season.
Week   Home team   Away team
1      A           D
1      B           E
1      C           F
2      D           B
2      E           C
2      F           A
2      A           E
3      B           F
3      C           D
3      D           C
4      E           A
4      F           B
Let $y_{ijk}$ denote the difference between the points scored by the home team,
$i$, and the points scored by the away team, $j$, when the two teams played on
the $k$th occasion.
For example, $y_{141}$ is the difference in scores between Team A and Team D
in the first week, i.e. the first row of the table.
Consider the linear model
$$M: \quad y_{ijk} = \mu + \alpha_i - \alpha_j + e_{ijk},$$
where it is assumed that the $e_{ijk}$ are uncorrelated with
$$E[e_{ijk}] = 0, \qquad \mathrm{Var}(e_{ijk}) = \sigma^2.$$
In this equation, $\alpha_i$ is the strength of the home team, and $\alpha_j$
is the ability of the away team.
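As a worked instance of the notation, assume that teams A to F are indexed 1 to 6 in alphabetical order (an assumption, though one consistent with the $y_{141}$ example above). Under that indexing, the first two rows of the table (A at home to D, then B at home to E) have expected score differences

```latex
E[y_{141}] = \mu + \alpha_1 - \alpha_4, \qquad
E[y_{251}] = \mu + \alpha_2 - \alpha_5 .
```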
(a) Write down the model matrix for the given table, assuming the data
ordering above and without imposing constraints on the parameters
$(\mu, \alpha_1, \alpha_2, \ldots, \alpha_6)$.
(b) Show that the columns of $X$ are not linearly independent and explain
how this is evident in the formulation
$$y_{ijk} = \mu + \alpha_i - \alpha_j + e_{ijk}.$$
Hint: consider adding a constant to the $\alpha$s.
(c) Show that the columns of $X$ are linearly independent if the constraint
$\alpha_1 = 0$ is imposed.
(d) Explain why the parameter $\mu$ can be considered to be the home ground
advantage.
Hint: Consider two teams, $i$ and $j$, of equal strength so that $\alpha_i = \alpha_j$.
Compare the expected score in favour of Team $i$ at a home game and
at an away game against Team $j$.
(e) Interpret the parameters $\alpha_2, \alpha_3, \alpha_4, \alpha_5, \alpha_6$ and the hypothesis
$$H_0: \alpha_2 = \alpha_3 = \alpha_4 = \alpha_5 = \alpha_6 = 0$$
in context.
(f) Consider the models
$$M_1: \; y_{ijk} = \mu + \alpha_i + e_{ijk} \quad \text{and} \quad M_2: \; y_{ijk} = \mu + \alpha_j + e_{ijk},$$
both with the constraint $\alpha_1 = 0$, and let $X_1$ and $X_2$ be the corresponding
model matrices. If $X$ is the model matrix for the model $M$ in Part (c),
then explain the relationship between $X$ and $X_1 - X_2$.
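The constructions in parts (a) to (c) and (f) can be checked numerically. The following is a minimal R sketch rather than a model answer: it assumes teams A to F are indexed 1 to 6 in alphabetical order, and the object names (`H`, `A`, `X_full`, `X_c`, `D`) are illustrative choices, not part of the assignment.

```r
# Home and away teams in the order of the table above
teams <- LETTERS[1:6]
home <- factor(c("A","B","C","D","E","F","A","B","C","D","E","F"), levels = teams)
away <- factor(c("D","E","F","B","C","A","E","F","D","C","A","B"), levels = teams)

# One 0/1 indicator column per team (no intercept, no level dropped)
H <- model.matrix(~ home - 1)
A <- model.matrix(~ away - 1)

# Unconstrained model matrix: a column for mu, then +1 (home) / -1 (away) per team
X_full <- cbind(1, H - A)
colnames(X_full) <- c("mu", paste0("alpha", 1:6))

# Each row of H - A holds one +1 and one -1, so the alpha columns sum to zero
alpha_row_sums <- rowSums(X_full[, -1])
rank_full <- qr(X_full)$rank   # compare with ncol(X_full)

# Imposing alpha1 = 0 removes the alpha1 column
X_c <- X_full[, colnames(X_full) != "alpha1"]
rank_c <- qr(X_c)$rank         # compare with ncol(X_c)

# Part (f): model matrices under default treatment coding (first level as reference)
X1 <- model.matrix(~ home)
X2 <- model.matrix(~ away)
D <- X1 - X2                   # the intercept columns cancel
```

Comparing `rank_full` with `ncol(X_full)` and `rank_c` with `ncol(X_c)` is one way to see the effect of the constraint, and printing `D` next to `X_c` bears on part (f).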
2. AFL home-ground advantage coding
We now use the ideas from Question 1 to code this up and determine whether
there is a home ground advantage in the AFL.
The file AFL2019.csv contains the results from the 198 AFL games played
during the 2019 home and away season. The variables recorded are:
Variable Name   Description
Round           Round numbers from 1-23
Location        Venue where match played
Home.Team       The home team
Away.Team       The away team
Home.Score      The total points scored by the home team
Away.Score      The total points scored by the away team
(a) Read the data into R. Obtain a list of the AFL teams in 2019. Which
team will be used as the reference level if the standard factor coding is used?
(b) Add a new column, difference, to the data frame that contains the
difference between the home team and away team scores. Include the
relevant R code and the first 6 lines of the data frame.
(c) Consider the model, $M$, introduced in Question 1. Calculate a matrix,
$X$, containing all of the necessary columns of the model matrix, except
the intercept.
Hint 1: Use the result of Question 1(f) to construct the X matrix. Note
also that you can use the X matrix in an R model formula, for example,
y~X.
Hint 2: model.matrix() is your friend.
(d) Fit the model, $M$. Obtain a residuals vs fitted values plot and also a
normal quantile plot of the residuals. Comment on whether the regression
assumptions appear reasonable.
(e) Obtain the residuals vs leverage plot of the data and comment on
whether there are any influential points. Explain also whether there
are any points of high leverage.
(f) Based on this analysis, what is the estimated home team effect? Is the
effect statistically significant?
(g) Test the hypothesis
$$H_0: \alpha_2 = \alpha_3 = \cdots = \alpha_{17} = 0.$$
State the F-statistic, the degrees of freedom and the P-value as well as your
final conclusion.
(h) If the Brisbane Lions play at home against Carlton, what is the expected
number of points (rounded to the nearest point) that the Lions will win by?
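The workflow in parts (a) to (h) might be sketched in R as follows. Because AFL2019.csv is not reproduced here, the sketch runs on simulated scores for four hypothetical teams (`T1` to `T4`); only the column names `Home.Team`, `Away.Team`, `Home.Score` and `Away.Score` come from the table above, and every number it produces is illustrative only. With the real file, the data frame would instead be read with something like `read.csv("AFL2019.csv")`.

```r
set.seed(3001)  # simulated data only; not the real AFL results

# Hypothetical double round robin: each ordered (home, away) pair plays once
teams <- paste0("T", 1:4)
afl <- expand.grid(Home.Team = teams, Away.Team = teams, stringsAsFactors = FALSE)
afl <- afl[afl$Home.Team != afl$Away.Team, ]
afl$Home.Team <- factor(afl$Home.Team, levels = teams)
afl$Away.Team <- factor(afl$Away.Team, levels = teams)
afl$Home.Score <- round(rnorm(nrow(afl), mean = 85, sd = 20))
afl$Away.Score <- round(rnorm(nrow(afl), mean = 80, sd = 20))

# (a) Standard factor coding uses the first level as the reference
ref_team <- levels(afl$Home.Team)[1]

# (b) Home score minus away score
afl$difference <- afl$Home.Score - afl$Away.Score

# (c) Difference of the two treatment-coded model matrices, intercept dropped
X <- (model.matrix(~ Home.Team, data = afl) -
      model.matrix(~ Away.Team, data = afl))[, -1]
colnames(X) <- teams[-1]

# (d), (f) Fit M; the intercept estimates mu
fit <- lm(difference ~ X, data = afl)
mu_hat <- coef(fit)[["(Intercept)"]]

# (g) F-test that all alphas are zero: intercept-only model against M
fit0 <- lm(difference ~ 1, data = afl)
f_test <- anova(fit0, fit)

# (h) Expected margin when T2 hosts T3: mu + alpha_T2 - alpha_T3
margin <- mu_hat + coef(fit)[["XT2"]] - coef(fit)[["XT3"]]
```

Here `head(afl)` shows the first six rows for part (b), `summary(fit)` reports the estimate and standard error of the intercept, and `plot(fit, which = c(1, 2, 5))` draws the residuals vs fitted, normal Q-Q and residuals vs leverage plots.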
3. Post-grad question
We now investigate whether the home ground advantage differs between
teams; for example, perhaps the Adelaide Crows do better at
home than the Brisbane Lions.
Our new model is
$$M_3: \quad y_{ijk} = \mu + \alpha_i - \beta_j + e_{ijk},$$
where $\alpha_1 = \beta_1 = 0$.
In this model, $\alpha_i$ is the strength of Team $i$ when playing at home and $\beta_j$
is the strength of Team $j$ when playing away.
(a) Show that $M_3$ can be expressed equivalently as
$$M'_3: \quad y_{ijk} = \mu + \alpha_i - (\alpha_j + \gamma_j) + e_{ijk},$$
where $\alpha_1 = \gamma_1 = 0$.
(b) Consider the model formulation $M_3$ and the hypothesis
$$H_0: \beta_i = \alpha_i \quad \text{for } i = 2, \ldots, 6.$$
Interpret this hypothesis in context.
(c) Formulate $H_0$ in terms of the model formulation, $M'_3$.
(d) Fit the models, $M_3$ and $M'_3$, to the AFL data from Q2. Use the
parameters for Brisbane to illustrate the equivalence of the two models.
(e) Perform an ANOVA to test the hypothesis
$$H_0: \gamma_i = 0.$$
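The equivalence of the two formulations in part (d), and the nested comparison in part (e), can be illustrated numerically. The R sketch below runs on simulated data for four hypothetical teams (`T1` to `T4`) rather than the AFL file, and the names `H`, `A`, `negA` and `D` are illustrative. It fits both parameterisations and compares $\beta_j$ with $\alpha_j + \gamma_j$ for one team.

```r
set.seed(7054)  # simulated data only

# Hypothetical double round robin with a simulated score difference y
teams <- paste0("T", 1:4)
g <- expand.grid(home = teams, away = teams, stringsAsFactors = FALSE)
g <- g[g$home != g$away, ]
g$home <- factor(g$home, levels = teams)
g$away <- factor(g$away, levels = teams)
g$y <- rnorm(nrow(g), mean = 5, sd = 30)

# Treatment-coded indicators with the first team as reference (alpha1 = beta1 = 0)
H <- model.matrix(~ home, data = g)[, -1]
A <- model.matrix(~ away, data = g)[, -1]
colnames(H) <- colnames(A) <- teams[-1]
negA <- -A       # away effects enter with a minus sign
D <- H - A       # +1 for the home team, -1 for the away team

fit3  <- lm(y ~ H + negA, data = g)   # M3:  mu + alpha_i - beta_j
fit3p <- lm(y ~ D + negA, data = g)   # M3': mu + alpha_i - (alpha_j + gamma_j)

# Both designs span the same column space, so the fitted values agree
same_fit <- all.equal(fitted(fit3), fitted(fit3p))

# beta_j from M3 against alpha_j + gamma_j from M3', for team T2
beta_T2       <- coef(fit3)[["negAT2"]]
alpha_plus_g2 <- coef(fit3p)[["DT2"]] + coef(fit3p)[["negAT2"]]

# (e) Setting every gamma to zero reduces M3' to M, so compare the nested fits
fitM <- lm(y ~ D, data = g)
gamma_test <- anova(fitM, fit3p)
```

Writing the away indicators with a negative sign keeps the coefficients on the same scale as $\beta_j$ and $\gamma_j$ in the model equations, so no sign flip is needed when reading `coef()`.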