Posted: 2021-06-07

Student ID:
The University of Melbourne
Semester 1 Exam — June 2019
School of Mathematics and Statistics
MAST90084 Statistical Modelling
Exam duration: 3 hours
Reading time: 15 minutes
This paper has 7 pages including this page
Authorised materials:
This is a closed book exam.
A University approved hand-held calculator, i.e. Casio FX82 (any suffix), may
be used.
Instructions to invigilators:
Script books shall be supplied to each student.
Students may not take this paper with them at the end of the exam.
Instructions to students:
There are 10 questions. All questions may be attempted.
The number of marks for each question is indicated after the question.
The total number of marks available is 100.
Your raw mark for this exam will be multiplied by 0.80 before being added
to your final subject mark.
This exam paper is not to be held by Baillieu Library.
MAST90084 Statistical Modelling, Semester 1 2019 page 2
1. Consider a GLM (generalised linear model) for an independent random sample $Y_1, \dots, Y_n$
where each $Y_i$ follows an exponential distribution with pdf
$f(y_i; \gamma_i) = \frac{1}{\gamma_i} e^{-y_i/\gamma_i}$, $y_i \ge 0$; $\gamma_i > 0$; $i = 1, \dots, n$.
Note that the pdf from an exponential family has the following general form:
$f(y; \theta, \phi) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\}$.
Suppose the linear predictor of the GLM is $\eta_i = \alpha + \beta x_i$, with $(\alpha, \beta)$ being the parameters
of interest and $x_i$'s being the covariate observations, $i = 1, \dots, n$.
(a) Show that the natural link function for this GLM is $g(\mu_i) = -\mu_i^{-1}$, where $\mu_i = E(y_i)$.
(b) Find the log-likelihood function $\ell(\alpha, \beta)$ under the natural link for data $y_1, \dots, y_n$.
(c) Find $\frac{\partial \ell}{\partial \alpha}$ and $\frac{\partial \ell}{\partial \beta}$, which specify the score function $s(\alpha, \beta)$.
(d) Find $\frac{\partial^2 \ell}{\partial \alpha^2}$, $\frac{\partial^2 \ell}{\partial \beta^2}$ and $\frac{\partial^2 \ell}{\partial \alpha \partial \beta}$. Then list the Fisher information matrix $F(\alpha, \beta)$.
(e) The current estimate $(\alpha^{(j)}, \beta^{(j)})^T$ in finding the MLE of $(\alpha, \beta)^T$ can be updated to
$(\alpha^{(j+1)}, \beta^{(j+1)})^T$ by a Newton-Raphson procedure. Give a formula for this update.
[2 + 1 + 2 + 3 + 2 = 10 marks]
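2. Suppose a random variable $Y$ satisfies $E(Y) = \mu$ and $\mathrm{Var}(Y) = \phi V(\mu)$, where $V(\mu)$ is
the variance function and $\phi$ is a dispersion parameter. Also suppose $Y$ is associated with
a covariate vector $\mathbf{x}$ satisfying $\mu = (-2\eta)^{-1/2}$ and $\eta = \mathbf{x}^T \boldsymbol{\beta}$.
Let $(y_1, \mathbf{x}_1^T), \dots, (y_n, \mathbf{x}_n^T)$ be $n$ independent observations of $(Y, \mathbf{x}^T)$,
for which the quasi-likelihood is defined as
$Q(\boldsymbol{\mu}; \mathbf{y}) = \sum_{i=1}^{n} Q_i(\mu_i; y_i)$, with $Q_i(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{y_i - t}{\phi V(t)} \, dt$,
where $\mu_i = (-2 \mathbf{x}_i^T \boldsymbol{\beta})^{-1/2}$, $\boldsymbol{\mu} = (\mu_1, \dots, \mu_n)^T$ and $\mathbf{y} = (y_1, \dots, y_n)^T$.
Now let $V(t) = t^3$.
(a) Show that $Q(\boldsymbol{\mu}; \mathbf{y}) = -\sum_{i=1}^{n} \frac{1}{2 \phi y_i} \left( \frac{y_i}{\mu_i} - 1 \right)^2$.
(b) Find the quasi-score function $s(\boldsymbol{\beta})$.
(c) Find the expected quasi-information matrix $F(\boldsymbol{\beta})$.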
[4 + 3 + 3 = 10 marks]
3. For response variable $Y$ having $k$ ordered categories, the cumulative model, with given
covariate vector $\mathbf{x}$, thresholds $\theta_1, \dots, \theta_q$ ($q = k - 1$), and a cumulative distribution function
(cdf) $F$, is
$P(Y \le r \mid \mathbf{x}) = F(\theta_r + \mathbf{x}^T \boldsymbol{\gamma})$, $r = 1, \dots, q$.
Write $\pi_r = P(Y = r \mid \mathbf{x})$ so that $P(Y \le r \mid \mathbf{x}) = \pi_1 + \pi_2 + \dots + \pi_r$.
(a) Find the link function $g = (g_1, \dots, g_q)^T$, with $g_r = g_r(\pi_1, \dots, \pi_q)$, for this model
when choosing $F = F(x) = \exp\{-2 e^{-(x-2)}\}$.
(b) Write $\boldsymbol{\beta} = (\theta_1, \dots, \theta_q, \boldsymbol{\gamma}^T)^T$ and linear predictor $\boldsymbol{\eta} = (\eta_1, \dots, \eta_q)^T = Z \boldsymbol{\beta}$ for
observation $(y, \mathbf{x}^T)$. Specify the design matrix for the cumulative model in this question.
[5 + 5 = 10 marks]
4. Let $X_1$ and $X_2$ be two independent random variables following Bernoulli($\pi_1$) and Bernoulli($\pi_2$)
distributions, respectively. Define $Y_1 = \min(X_1, X_2)$ and $Y_2 = \max(X_1, X_2)$.
(a) Find the joint probability mass function (pmf) of $(Y_1, Y_2)$.
(b) Find $E(Y_1)$, $E(Y_2)$, $\mathrm{Var}(Y_1)$ and $\mathrm{Var}(Y_2)$.
(c) Find $\mathrm{Cov}(Y_1, Y_2)$.
(d) Find the correlation coefficient $\mathrm{Corr}(Y_1, Y_2)$.
[4 + 4 + 2 + 2 = 12 marks]
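A derivation for Question 4 can be cross-checked numerically by brute-force enumeration. The R sketch below tabulates the joint pmf of (Y1, Y2) and computes the requested moments directly from the four outcomes of (X1, X2). The values pi1 = 0.3 and pi2 = 0.6 are arbitrary illustrative assumptions and are not part of the exam; the variable names are likewise hypothetical.

```r
# Illustrative values only (assumed; the exam leaves pi1 and pi2 general).
pi1 <- 0.3
pi2 <- 0.6

# Enumerate the four outcomes of (X1, X2) with their probabilities.
grid <- expand.grid(x1 = 0:1, x2 = 0:1)
grid$prob <- dbinom(grid$x1, 1, pi1) * dbinom(grid$x2, 1, pi2)
grid$y1 <- pmin(grid$x1, grid$x2)
grid$y2 <- pmax(grid$x1, grid$x2)

# Joint pmf of (Y1, Y2): add up the probabilities that map to each pair.
joint_pmf <- xtabs(prob ~ y1 + y2, data = grid)
print(joint_pmf)

# Moments computed directly from the enumeration.
e_y1 <- sum(grid$y1 * grid$prob)
e_y2 <- sum(grid$y2 * grid$prob)
var_y1 <- sum(grid$y1^2 * grid$prob) - e_y1^2
var_y2 <- sum(grid$y2^2 * grid$prob) - e_y2^2
cov_y1y2 <- sum(grid$y1 * grid$y2 * grid$prob) - e_y1 * e_y2
corr_y1y2 <- cov_y1y2 / sqrt(var_y1 * var_y2)
print(c(E_Y1 = e_y1, E_Y2 = e_y2, Var_Y1 = var_y1, Var_Y2 = var_y2,
        Cov = cov_y1y2, Corr = corr_y1y2))
```

Changing pi1 and pi2 and comparing the printed values with a closed-form answer is a quick way to catch algebra slips.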
5. The power-divergence statistic can be used to assess the adequacy of a generalised linear
model (GLM) with binomial response. This statistic with parameter $\lambda \in \mathbb{R}$ is given
by $S_\lambda = \sum_{i=1}^{g} SD_\lambda(y_i, \hat{\pi}_i)$, where the sum of deviations over sample proportions
$y_i = (y_{i1}, \dots, y_{ik})$ for group $i$ is
$SD_\lambda(y_i, \hat{\pi}_i) = \frac{2 n_i}{\lambda(\lambda + 1)} \sum_{j=1}^{k} y_{ij} \left[ \left( \frac{y_{ij}}{\hat{\pi}_{ij}} \right)^{\lambda} - 1 \right]$, $-\infty < \lambda < \infty$.
Here $n_i$ is the size of group $i$, and $\hat{\pi}_{ij}$ is the fitted value of probability $\pi_{ij}$ obtained from
the relevant GLM.
(a) Show that $S_{-2} = \sum_{i=1}^{g} n_i \sum_{j=1}^{k} \frac{(y_{ij} - \hat{\pi}_{ij})^2}{y_{ij}}$, which is Neyman's minimum modified $\chi^2$-statistic.
(b) Show that $\lim_{\lambda \to -1} S_\lambda = 2 \sum_{i=1}^{g} n_i \sum_{j=1}^{k} \hat{\pi}_{ij} \log \frac{\hat{\pi}_{ij}}{y_{ij}}$, which is Kullback's minimum
discrimination information statistic.
[4 + 4 = 8 marks]
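The two identities in Question 5 can be sanity-checked numerically before attempting the algebra. The R sketch below evaluates the power-divergence deviation for a single hypothetical group at lambda = -2 and at a value very close to -1, and compares the results with the Neyman and Kullback forms. The group size and the proportion vectors are invented for illustration, and the function name sd_lambda is an assumption, not something defined in the exam.

```r
# Power-divergence deviation for one group. y and pi_hat are proportion
# vectors that each sum to 1; n is the group size.
sd_lambda <- function(y, pi_hat, n, lambda) {
  2 * n / (lambda * (lambda + 1)) * sum(y * ((y / pi_hat)^lambda - 1))
}

# Hypothetical single group (values invented for illustration).
n <- 50
y <- c(0.20, 0.50, 0.30)
pi_hat <- c(0.25, 0.45, 0.30)

# Target expressions from parts (a) and (b).
neyman <- n * sum((y - pi_hat)^2 / y)
kullback <- 2 * n * sum(pi_hat * log(pi_hat / y))

# Direct evaluation at lambda = -2, and just to the right of lambda = -1,
# where the formula itself is undefined (0/0).
s_minus2 <- sd_lambda(y, pi_hat, n, lambda = -2)
s_near_minus1 <- sd_lambda(y, pi_hat, n, lambda = -1 + 1e-6)

print(c(S_minus2 = s_minus2, Neyman = neyman))
print(c(S_near_minus1 = s_near_minus1, Kullback = kullback))
```

Agreement at one numerical point is not a proof, but a mismatch would indicate an error in the reconstructed formula or in the algebra.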
6. Consider a count time series $\{y_t, t = 1, 2, \dots, T\}$ together with a fixed covariate time
series $\{x_t, t = 1, 2, \dots, T\}$. Suppose, conditioning on an unobserved latent stationary
time series $\{\varepsilon_t, t = 1, 2, \dots, T\}$, $\{y_t\}$ are independent of each other together with
$E(y_t \mid \{\varepsilon_t\}) = \mathrm{Var}(y_t \mid \{\varepsilon_t\}) = \exp(\beta_0 + \beta_1 x_t) \varepsilon_t$.
Also suppose $E(\varepsilon_t) = 1$, $\mathrm{Var}(\varepsilon_t) = 1$ and $\mathrm{Cov}(\varepsilon_t, \varepsilon_{t+s}) = \rho^s$, $-1 < \rho < 1$.
(a) Show that $\mu_t \equiv E(y_t) = \exp(\beta_0 + \beta_1 x_t)$.
(b) Show that $\mathrm{Var}(y_t) = \mu_t + \mu_t^2$.
(c) Show that $\mathrm{Cov}(y_t, y_{t+s}) = \mu_t \mu_{t+s} \rho^s$, $s = 1, 2, \dots, T - t$.
(d) Compute $\mathrm{Var}(y_t - y_{t-1})$.
[2 + 3 + 3 + 2 = 10 marks]
7. Suppose the data consist of repeated observations $(y_{it}, \mathbf{x}_{it}^T)$, $t = 1, \dots, T$, for each
individual $i = 1, \dots, n$. Here $y_{it}$ is the response and $\mathbf{x}_{it}$ is a covariate vector. A linear
mixed-effects model for analysing the population-averaged and subject-specific effects of
$\mathbf{x}_{it}$ is of the following form
$\mathbf{y}_i = Z_i \boldsymbol{\beta} + W_i \mathbf{b}_i + \boldsymbol{\varepsilon}_i$,
where $\mathbf{y}_i = (y_{i1}, \dots, y_{iT})^T$; $Z_i$ is a $T \times p$ design matrix built from $\{\mathbf{x}_{it}\}$ for the fixed
effects $\boldsymbol{\beta}$; $W_i$ is a $T \times q$ design matrix for the random effects $\mathbf{b}_i$; $\{\mathbf{b}_i\}$ are i.i.d. MVN($0, Q$)
random vectors with $Q > 0$; $\{\boldsymbol{\varepsilon}_i\}$ are i.i.d. MVN($0, \sigma^2 I$) random vectors with $\sigma^2 > 0$
and $I$ being an identity matrix; and $\{\mathbf{b}_i\}$ and $\{\boldsymbol{\varepsilon}_i\}$ are mutually uncorrelated.
(a) Find $E(\mathbf{y}_i)$.
(b) Find $\mathrm{Var}(\mathbf{y}_i)$.
(c) Find $\mathrm{Corr}(y_{it}, y_{is})$ for $t \ne s$; $t, s = 1, \dots, T$.
[2 + 4 + 4 = 10 marks]
8. Through radiological examination 371 coal miners were classified into 3 categories of
pneumoconiosis: normal, mild and severe. These coal miners were also classified
into 8 groups according to the number of years each had spent working at the coal face.
Data summarizing these cross classifications are given in the data frame pneumo as shown
below:
> pneumo
   Freq  status year
1    98 1normal  5.8
2    51 1normal 15.0
3    34 1normal 21.5
4    35 1normal 27.5
5    32 1normal 33.5
6    23 1normal 39.5
7    12 1normal 46.0
8     4 1normal 51.5
9     0   2mild  5.8
10    2   2mild 15.0
11    6   2mild 21.5
12    5   2mild 27.5
13   10   2mild 33.5
14    7   2mild 39.5
15    6   2mild 46.0
16    2   2mild 51.5
17    0 3severe  5.8
18    1 3severe 15.0
19    3 3severe 21.5
20    8 3severe 27.5
21    9 3severe 33.5
22    8 3severe 39.5
23   10 3severe 46.0
24    5 3severe 51.5
Treating pneumoconiosis status as a nominal categorical response variable, a
multi-categorical logit model has been fitted, resulting in the following R output:
> pneumo$status=relevel(pneumo$status, ref="1normal")
> nominal.mod <- multinom(status~year, data=pneumo, weights=Freq, Hess=T)
> summary(nominal.mod)
Coefficients:
        (Intercept)   year
2mild       -4.2917 0.0836
3severe     -5.0598 0.1093
Std. Errors:
        (Intercept)   year
2mild        0.5214 0.0153
3severe      0.5964 0.0165
(a) Write down the model fitted in the above R output. You need to define the response
variable and covariate for the model. Also, you need to specify the probability
distribution of the response variable and estimates of all parameters in the model.
(b) Provide an interpretation for the coefficient estimate 0.1093. Then calculate an
approximate 95% confidence interval for the odds ratio of severe status versus normal
status for every 10 more years spent working at the coal face.
(c) Estimate the pneumoconiosis status probabilities for a miner who has spent 25
years working at the coal face.
[4 + 3 + 3 = 10 marks]
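The arithmetic behind parts (b) and (c) of Question 8 can be sketched in base R using only the numbers printed in the summary(nominal.mod) output, without refitting the model. The sketch assumes the usual baseline-category logit form with "1normal" as the reference category and a Wald interval on the log-odds scale; the variable names are hypothetical.

```r
# Estimates and standard error copied from the printed multinom output.
coef_mild   <- c(intercept = -4.2917, year = 0.0836)
coef_severe <- c(intercept = -5.0598, year = 0.1093)
se_year_severe <- 0.0165

# Odds ratio (severe vs normal) for 10 additional years, with an
# approximate 95% Wald interval built on the log-odds scale.
z <- qnorm(0.975)
or_10 <- unname(exp(10 * coef_severe["year"]))
ci_10 <- unname(exp(10 * (coef_severe["year"] + c(-1, 1) * z * se_year_severe)))

# Fitted category probabilities at year = 25; the reference category
# (normal) has linear predictor 0.
year0 <- 25
eta <- c(normal = 0,
         mild   = sum(coef_mild * c(1, year0)),
         severe = sum(coef_severe * c(1, year0)))
probs <- exp(eta) / sum(exp(eta))

print(round(c(OR = or_10, lower = ci_10[1], upper = ci_10[2]), 3))
print(round(probs, 4))
```

Building the interval on the log scale and then exponentiating keeps both limits positive, which is why it is preferred to an interval computed directly on the odds-ratio scale.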
9. Refer to the pneumo data in Q8. Treating the pneumoconiosis status as an ordinal
categorical response variable, a cumulative model is fitted, producing the following R
output (note that the sign of the coefficient Value must be reversed when it is used in the model):
> pneumo$status=as.ordered(as.character(pneumo$status))
> ordinal.mod=polr(status~year, data=pneumo,weights=Freq, Hess=T, method="logistic")
> summary(ordinal.mod)
Coefficients:
      Value Std. Error t value
year 0.0959    0.01194   8.034
Intercepts:
               Value Std. Error t value
1normal|2mild 3.9558     0.4097  9.6558
2mild|3severe 4.8690     0.4411 11.0383
(a) Write down the model fitted in the above R output. You need to define the response
variable and covariate for the model. Also, you need to specify the probability
distribution of the response variable and estimates of all parameters in the model.
(b) Provide an interpretation for the coefficient estimate 0.0959. Then calculate an
approximate 95% confidence interval for the odds ratio of non-normal status versus
normal status for every 10 more years spent working at the coal face.
(c) Estimate the pneumoconiosis status probabilities for a miner who has spent 25
years working at the coal face.
[4 + 3 + 3 = 10 marks]
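For Question 9, the sign convention is the main pitfall. The base R sketch below assumes the parameterisation documented for MASS::polr, namely logit P(Y <= r | x) = zeta_r - beta * x, which matches the note about reversing the sign of the coefficient. It uses only the printed estimates; the variable names are hypothetical.

```r
# Estimates copied from the printed polr output.
beta_year <- 0.0959
se_year <- 0.01194
zeta <- c("1normal|2mild" = 3.9558, "2mild|3severe" = 4.8690)

# Cumulative probabilities at year = 25 under
# logit P(Y <= r | x) = zeta_r - beta * x (assumed polr convention).
year0 <- 25
cum_probs <- plogis(zeta - beta_year * year0)

# Category probabilities are successive differences of the cumulative ones.
probs <- diff(c(0, unname(cum_probs), 1))
names(probs) <- c("normal", "mild", "severe")

# Under this convention the odds of non-normal vs normal status are
# multiplied by exp(10 * beta) for every 10 additional years.
z <- qnorm(0.975)
or_10 <- exp(10 * beta_year)
ci_10 <- exp(10 * (beta_year + c(-1, 1) * z * se_year))

print(round(probs, 4))
print(round(c(OR = or_10, lower = ci_10[1], upper = ci_10[2]), 3))
```

Because the cumulative logit model has a single slope, the same odds ratio applies to the split "severe versus not severe".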
10. The toenail data comes from a multi-center study comparing two oral treatments for
toenail infection. Patients were evaluated for the degree of separation of the nail. A
total of 294 patients were randomised into two treatments and were followed over seven
visits: four in the first year and yearly thereafter. Some of the patients did not attend
all seven visits, thus only a total of 1908 visits were observed. The patients had not been
treated prior to the first visit so this should be regarded as the baseline.
The variables available in the data are
outcome: 0 = none or mild separation, 1 = moderate or severe separation
ID: ID of patient
treatment: the treatment; A = 0 or B = 1
month: time of the visit, in months, from the first visit
visit: the number of the visit
The purpose of this study is to see how toenail infection responds to the treatments
and progresses over time. Some analysis of the data has been carried out in R, producing
the following output.
> toenail[1:14,]
   ID outcome treatment  month visit
1   1       1         1  0.000     1
2   1       1         1  0.857     2
3   1       1         1  3.536     3
4   1       0         1  4.536     4
5   1       0         1  7.536     5
6   1       0         1 10.036     6
7   1       0         1 13.071     7
8   2       0         0  0.000     1
9   2       0         0  0.964     2
10  2       1         0  2.000     3
11  2       1         0  3.036     4
12  2       0         0  6.500     5
13  2       0         0  9.000     6
14  3       0         0  0.000     1
> tail(toenail)
      ID outcome treatment month visit
1903 383       1         1  0.00     1
1904 383       1         1  1.04     2
1905 383       1         1  2.04     3
1906 383       1         1  3.29     4
1907 383       0         1  7.29     5
1908 383       0         1 10.79     6
> str(toenail)
'data.frame': 1908 obs. of 5 variables:
 $ ID       : int 1 1 1 1 1 1 1 2 2 2 ...
 $ outcome  : int 1 1 1 0 0 0 0 0 0 1 ...
 $ treatment: int 1 1 1 1 1 1 1 0 0 0 ...
 $ month    : num 0 0.857 3.536 4.536 7.536 ...
 $ visit    : int 1 2 3 4 5 6 7 1 2 3 ...
> library(geepack)
> fit.exch <- geeglm(outcome~treatment+month, family=binomial(link="logit"),
+ data=toenail, id=ID, corstr = "exchangeable", std.err="san.se")
> summary(fit.exch)
Call:
geeglm(formula = outcome ~ treatment + month, family = binomial(link = "logit"),
data = toenail, id = ID, corstr = "exchangeable", std.err = "san.se")
Coefficients:
            Estimate Std.err  Wald Pr(>|W|)
(Intercept)  -0.6104  0.1777 11.80  0.00059 ***
treatment     0.0402  0.2532  0.03  0.87388
month        -0.2051  0.0259 62.66  2.4e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimated Scale Parameters:
            Estimate Std.err
(Intercept)     1.09   0.423
Correlation: Structure = exchangeable Link = identity
Estimated Correlation Parameters:
      Estimate Std.err
alpha    0.424   0.182
Number of clusters: 294 Maximum cluster size: 7
> summary(fit.exch)$cov.unscaled
         [,1]      [,2]      [,3]
[1,]  0.03159 -0.031374 -0.001395
[2,] -0.03137  0.064120 -0.000546
[3,] -0.00139 -0.000546  0.000671
Use the above output to answer the following questions.
(a) Let $y_{it}$ be the response value outcome of patient $i$ during visit $t$. Write down
the model involved in the analysis, including the mean, variance and correlation
coefficient of the $y_{it}$'s. Give the estimates of the parameters appearing in the model.
(b) Write down the model’s design matrix for data where ID=383.
(c) Estimate the odds ratio of toe infection of a patient with treatment B versus with
treatment A at a given value of month. Calculate an approximate 95% confidence
interval for this odds ratio.
(d) Estimate the probability of toe infection in the first month from the first visit for a
patient using treatment A. Also compute an approximate 95% confidence interval
for this probability.
[4 + 1 + 2 + 3 = 10 marks]
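The interval calculations in parts (c) and (d) of Question 10 can be sketched in base R from the printed output alone. Two assumptions are made here and should be checked against the intended reading of the question: "the first month from the first visit" is taken to mean month = 1, and the matrix printed by summary(fit.exch)$cov.unscaled is taken as the covariance of the coefficient estimates, since the square roots of its diagonal agree with the printed Std.err column. The printed off-diagonal entries differ slightly by rounding, so the sketch symmetrises the matrix. All variable names are hypothetical.

```r
# Estimates copied from the printed geeglm summary.
beta <- c(intercept = -0.6104, treatment = 0.0402, month = -0.2051)

# Covariance matrix copied from summary(fit.exch)$cov.unscaled.
V <- matrix(c( 0.03159, -0.031374, -0.001395,
              -0.03137,  0.064120, -0.000546,
              -0.00139, -0.000546,  0.000671),
            nrow = 3, byrow = TRUE)
V <- (V + t(V)) / 2   # average out rounding asymmetry in the printout

z <- qnorm(0.975)

# (c) Odds ratio, treatment B vs A at fixed month, with a Wald interval.
or_trt <- unname(exp(beta["treatment"]))
ci_or <- unname(exp(beta["treatment"] + c(-1, 1) * z * sqrt(V[2, 2])))

# (d) Probability for treatment A (treatment = 0) at month = 1 (assumed).
x0 <- c(1, 0, 1)
eta0 <- sum(x0 * beta)
se_eta0 <- sqrt(drop(t(x0) %*% V %*% x0))
p0 <- plogis(eta0)
ci_p0 <- plogis(eta0 + c(-1, 1) * z * se_eta0)

print(round(c(OR = or_trt, lower = ci_or[1], upper = ci_or[2]), 3))
print(round(c(prob = p0, lower = ci_p0[1], upper = ci_p0[2]), 3))
```

The interval for the probability is formed on the linear-predictor scale and then mapped through the inverse logit, which guarantees limits inside (0, 1).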
Total marks = 100
End of the Questions