Wednesday, 19 May 2021
EXAMINATION FOR THE DEGREES OF M.A., M.SCI. AND B.SC.
(SCIENCE)
Statistics – Generalised Linear Models - STATS 4043
This paper consists of 4 pages and contains 3 question(s).
Candidates should attempt all questions.
Question 1 20 marks
Question 2 20 marks
Question 3 20 marks
Total 60 marks
The following material is made available to you:
Statistical tables
Formula sheet
1. This is a Moodle quiz question provided separately. [20 MARKS]
CONTINUED OVERLEAF/
2. Let $Y_i$ be independent random variables, each following a negative binomial distribution with probability mass function
$$f(y_i;\phi,\mu_i)=\frac{\Gamma(\phi+y_i)}{\Gamma(\phi)\,y_i!}\cdot\frac{\mu_i^{y_i}\,\phi^{\phi}}{(\mu_i+\phi)^{y_i+\phi}},\qquad y_i=0,1,2,\ldots,\quad i=1,\ldots,n,$$
mean $E(Y_i)=\mu_i$ and variance $\mathrm{Var}(Y_i)=\mu_i+\mu_i^2/\phi$, for $\phi>0$.
Consider the generalised linear model
$$\log(\mu_i)=x_i^{\top}\beta \qquad (1)$$
for these data, where β is a p-dimensional parameter vector.
(a) Explain how the maximum likelihood estimator $\hat\beta$ is obtained, making reference
to the equation
$$\hat\beta^{(m)}=\left[X^{\top}W^{(m-1)}X\right]^{-1}X^{\top}W^{(m-1)}\left(\eta^{(m-1)}+z^{(m-1)}\right) \qquad (2)$$
Please note that you are required to define X and η in your answer. However,
you are not required to derive equation (2). You are also not required to define
the elements of W and z for this part.
[4 MARKS]
(b) Let wi be the ith diagonal element of W and zi the ith element of z in equation
(2). Express wi and zi in terms of µi and φ. [4 MARKS]
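A minimal sketch of equation (2) in Python with NumPy, assuming φ is known (in practice it is usually estimated as well, for example by alternating with the β update). It uses the log-link working weights and working residuals $w_i=\phi\mu_i/(\phi+\mu_i)$ and $z_i=(y_i-\mu_i)/\mu_i$, which follow from $w_i=1/[\mathrm{Var}(Y_i)(d\eta_i/d\mu_i)^2]$ and $z_i=(y_i-\mu_i)\,d\eta_i/d\mu_i$ with $d\eta_i/d\mu_i=1/\mu_i$. The function name `nb_irls`, the starting values, the optional offset argument and the toy counts are hypothetical and not part of the question.

```python
import numpy as np


def nb_irls(X, y, phi, offset=None, tol=1e-10, max_iter=100):
    """Fit log(mu) = offset + X @ beta for negative binomial Y with known phi,
    iterating equation (2) (iteratively reweighted least squares).

    Working weights:   w_i = phi * mu_i / (phi + mu_i)
    Working residuals: z_i = (y_i - mu_i) / mu_i   (log link, d eta / d mu = 1 / mu)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    off = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)
    # Hypothetical starting values: regress log(y + 0.5) on X to avoid log(0)
    beta = np.linalg.lstsq(X, np.log(y + 0.5) - off, rcond=None)[0]
    for _ in range(max_iter):
        eta_lin = X @ beta                 # x_i^T beta (offset excluded)
        mu = np.exp(off + eta_lin)
        w = phi * mu / (phi + mu)          # diagonal elements of W
        z = (y - mu) / mu                  # elements of z
        XtW = X.T * w                      # X^T W without forming the n x n matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ (eta_lin + z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    raise RuntimeError("IRLS did not converge")


# Usage on hypothetical toy data (not from any study): intercept plus one binary covariate
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([0, 2, 1, 4, 7, 3])
beta = nb_irls(X, y, phi=2.0)
print(np.exp(beta))
```

At convergence the score equations $\sum_i x_i(y_i-\mu_i)\phi/(\phi+\mu_i)=0$ hold, so with a single binary covariate the fitted means equal the two group means.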
(c) 616 diabetic patients were followed up over an eight-year period to study factors
associated with the incidence of severe hypoglycaemia (SH), a complication of
Type 2 diabetes. The follow-up time for each patient was recorded, and over a total
of 3953 patient-years of follow-up, 52 patients experienced 66 episodes of SH.
A multivariable negative binomial regression model was fitted to the number of
episodes; the estimated rate ratios (RR) and their 95% confidence intervals are
shown in the table below.
Variable RR (95% CI)
History of SH (1=Yes, 0=No) 5.05 (1.69-15.04)
Education higher than primary level (1=Yes, 0=No) 2.59 (1.18-5.66)
Taking insulin (1=Yes, 0=No) 3.24 (1.76-5.94)
Time on insulin (years) 2.99 (1.57-5.66)
Interpret the effect of “time on insulin”, making reference to an appropriate point
estimate and confidence interval from the table. [3 MARKS]
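A short Python sketch of the arithmetic behind such an interpretation. It assumes, since the table does not say, that each interval is a Wald interval computed on the log scale, $\exp(\hat\beta\pm1.96\,\mathrm{SE})$, so that the standard error can be backed out from the interval width; the variable names are illustrative.

```python
import math

# Reported rate ratio and 95% CI for "time on insulin" (per additional year)
rr, lower, upper = 2.99, 1.57, 5.66

beta_hat = math.log(rr)  # coefficient on the log (linear predictor) scale
# Assumes a Wald interval exp(beta_hat +/- 1.96 * se): back out se from the width
se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
z = beta_hat / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
excludes_one = lower > 1 or upper < 1

print(f"log RR = {beta_hat:.3f}, approx. SE = {se:.3f}, z = {z:.2f}, p = {p_value:.4f}")
print(f"Each extra year on insulin multiplies the expected SH rate by {rr} "
      f"(about a {100 * (rr - 1):.0f}% increase), other variables held fixed.")
print("95% CI excludes 1:", excludes_one)
```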
(d) Let ni be the time in years for which the ith patient was followed up. Write down
a modified version of equation (1) that includes a term for ni. Explain the role of
this term in the model.
[4 MARKS]
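As a worked equation, one standard form of such a model treats $\log(n_i)$ as an offset, a term whose coefficient is fixed at 1 rather than estimated (a sketch of the usual approach, not the study's reported specification):

$$\log(\mu_i)=\log(n_i)+x_i^{\top}\beta
\quad\Longleftrightarrow\quad
\frac{\mu_i}{n_i}=\exp\left(x_i^{\top}\beta\right)$$

Under this form the linear predictor describes the expected number of episodes per patient-year, so for fixed covariates $\mu_i$ is proportional to $n_i$ and $\exp(\beta_j)$ is a rate ratio rather than a ratio of counts.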
(e) An alternative model to the negative binomial model for count data is the Poisson
model. Give one similarity and one difference between the two models.
[2 MARKS]
(f) How would you decide which of the two models to use? Give one reason why the
study authors may have chosen the negative binomial model in this case.
[3 MARKS]
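One common informal check, sketched here in Python with NumPy, is the Pearson dispersion statistic from a Poisson fit: under the Poisson model $\mathrm{Var}(Y_i)=\mu_i$, so a ratio well above 1 points to overdispersion, which the negative binomial variance $\mu_i+\mu_i^2/\phi$ can absorb. The function name `pearson_dispersion` and the toy counts are hypothetical and not from the study; likelihood-based comparisons such as AIC are also commonly used.

```python
import numpy as np


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square statistic divided by residual degrees of freedom.

    Under a Poisson model Var(Y_i) = mu_i, so values well above 1 suggest
    overdispersion relative to the Poisson assumption.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    x2 = np.sum((y - mu) ** 2 / mu)
    return x2 / (len(y) - n_params)


# Hypothetical toy counts; for an intercept-only Poisson fit, mu_i = mean(y)
y = np.array([0, 0, 1, 5, 10])
mu = np.full(len(y), y.mean())
d = pearson_dispersion(y, mu, n_params=1)
print(d)
```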
3. A company surveyed its 2276 employees to find out whether they would like to continue
to work remotely for the remainder of the year. In addition to answering the question
of interest on remote work (R: yes or no), the employees provided information about
their gender (G: male or female), age group (A: under 30, 30-49, 50+), whether they
had a suitable space to use as a home office (H: yes or no) and the distance (D) from
home to work in miles.
(a) Describe a suitable generalised linear model that can be used to analyse these
data, stating the distribution of the response, your choice of link function and
the model equation. Make sure to use appropriate mathematical notation and to
carefully define all the terms used. [6 MARKS]
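One possible specification, written out as a worked equation: a logistic regression with indicator (dummy) variables. The reference categories chosen here (female, under 30, no home office) and the symbols are assumptions of this sketch; other codings give an equivalent model.

$$R_i \sim \mathrm{Bernoulli}(\pi_i),\qquad
\log\left(\frac{\pi_i}{1-\pi_i}\right)=\beta_0+\beta_1 G_i+\beta_2 A_{2i}+\beta_3 A_{3i}+\beta_4 H_i+\beta_5 D_i,\qquad i=1,\ldots,2276,$$

where $\pi_i=P(R_i=1)$ is the probability that employee $i$ would like to continue remote work, $G_i=1$ for male and 0 for female, $A_{2i}$ and $A_{3i}$ indicate the 30-49 and 50+ age groups, $H_i=1$ if a home office space is available, and $D_i$ is the home-to-work distance in miles. The logit is the canonical link for the binomial distribution.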
(b) Restricting attention to three of the variables, remote work (R), gender (G) and
availability of home office space (H), the data can be displayed in the following
contingency table.
                          Remote work
Home office   Gender      yes     no
yes           female      911    538
              male         44    456
no            female        3     43
              male          2    279
Several loglinear models for these data are summarised in the table below.
Model          Deviance
[R,GH]           843.83
[GR,GH]           92.08
[GR,RH]          187.75
[RH,GH]          497.37
[GR,GH,RH]         0.37
[Note on the notation used in the Model column: When an interaction is present,
all associated main effects are also present in the model, e.g. [R,GH] represents a
loglinear model with main effects for factors G, R and H and an interaction term
between G and H.]
Select the model from the above table that best describes the data, justifying your
choice with an appropriate statistical test. [4 MARKS]
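A minimal sketch, in Python with NumPy, of deviance comparisons that could support the choice. It refits each hierarchical loglinear model by iterative proportional fitting (a standard algorithm for such models; the paper does not say how the table was produced) and compares nested models by the change in deviance on 1 degree of freedom. The function names and the array layout are assumptions of the sketch.

```python
import math

import numpy as np

# Counts from the contingency table, indexed [H, G, R]:
# H (home office): 0 = yes, 1 = no; G (gender): 0 = female, 1 = male;
# R (remote work): 0 = yes, 1 = no.
counts = np.array([
    [[911, 538], [44, 456]],
    [[3, 43], [2, 279]],
], dtype=float)
AXIS = {"H": 0, "G": 1, "R": 2}


def ipf(table, margins, tol=1e-8, max_iter=10_000):
    """Fitted counts of a hierarchical loglinear model by iterative proportional
    fitting; `margins` lists the highest-order terms, e.g. ["GR", "GH"]."""
    fit = np.full_like(table, table.sum() / table.size)
    for _ in range(max_iter):
        previous = fit.copy()
        for term in margins:
            keep = {AXIS[v] for v in term}
            drop = tuple(a for a in range(table.ndim) if a not in keep)
            # Rescale so this marginal table of the fit matches the observed one
            fit *= table.sum(axis=drop, keepdims=True) / fit.sum(axis=drop, keepdims=True)
        if np.max(np.abs(fit - previous)) < tol:
            return fit
    raise RuntimeError("IPF did not converge")


def deviance(margins):
    fit = ipf(counts, margins)
    return 2 * np.sum(counts * np.log(counts / fit))  # all observed counts are > 0


def chi2_1_pvalue(x):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2))


models = {
    "[R,GH]": ["R", "GH"],
    "[GR,GH]": ["GR", "GH"],
    "[GR,RH]": ["GR", "RH"],
    "[RH,GH]": ["RH", "GH"],
    "[GR,GH,RH]": ["GR", "GH", "RH"],
}
dev = {name: deviance(m) for name, m in models.items()}
for name, d in dev.items():
    print(f"{name:12s} deviance = {d:8.2f}")

# [GR,GH,RH] against the saturated model (1 df: the three-way interaction)
print("goodness of fit of [GR,GH,RH]: p =", round(chi2_1_pvalue(dev["[GR,GH,RH]"]), 3))
# Remove one two-way term at a time from [GR,GH,RH] (each comparison on 1 df)
for smaller in ["[GR,GH]", "[GR,RH]", "[RH,GH]"]:
    diff = dev[smaller] - dev["[GR,GH,RH]"]
    print(f"{smaller} vs [GR,GH,RH]: change in deviance = {diff:.2f}, p = {chi2_1_pvalue(diff):.2g}")
```

Each comparison that removes one two-way term from [GR,GH,RH] tests that single association, so the same output also bears on part (c).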
(c) Based on your chosen model, is the attitude towards remote working significantly
associated with any of the other variables? Explain. [3 MARKS]
(d) Write down a logistic regression model with R as the response that corresponds
to your chosen loglinear model. Show that the two models are equivalent.
[7 MARKS]
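A numerical check of the equivalence, not a proof: a Python sketch that fits a grouped binomial logistic regression of R on main effects of G and H by IRLS. Because the [GH] term in the corresponding loglinear model [GR,GH,RH] fixes the G×H margins, its fitted counts satisfy the same constraints as the binomial totals, so the two residual deviances should agree. The function name and the indicator coding (female and no home office as reference categories) are assumptions of the sketch.

```python
import numpy as np

# Grouped data: one row per (gender, home office) cell.
# Columns of X: intercept, male indicator, home-office-yes indicator.
X = np.array([
    [1, 0, 1],   # female, home office yes
    [1, 1, 1],   # male,   home office yes
    [1, 0, 0],   # female, home office no
    [1, 1, 0],   # male,   home office no
], dtype=float)
yes = np.array([911, 44, 3, 2], dtype=float)      # number answering R = yes
total = np.array([1449, 500, 46, 281], dtype=float)


def fit_binomial_logit(X, y, m, tol=1e-10, max_iter=100):
    """Binomial logistic regression by IRLS (Fisher scoring with the logit link)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1 / (1 + np.exp(-eta))
        w = m * p * (1 - p)                # working weights
        z = eta + (y - m * p) / w          # working response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    raise RuntimeError("IRLS did not converge")


beta = fit_binomial_logit(X, yes, total)
p_hat = 1 / (1 + np.exp(-(X @ beta)))
no = total - yes
fitted_yes, fitted_no = total * p_hat, total * (1 - p_hat)
dev = 2 * np.sum(yes * np.log(yes / fitted_yes) + no * np.log(no / fitted_no))
print("coefficients (intercept, male, home office):", np.round(beta, 3))
print("residual deviance:", round(dev, 2))
```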
END OF QUESTION PAPER.

