MAST90084: Statistical Modelling Assignment 2
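1. Let $y_i = (y_{i1}, \dots, y_{iq})^T$ be a $q \times 1$ random vector following a probability distribution from the multi-parameter exponential family, i.e. the pdf of $y_i$ is
$$f(y_i \mid \theta_i, \phi, w_i) = \exp\left\{ \frac{y_i^T \theta_i - b(\theta_i)}{\phi}\, w_i + c(y_i, \phi, w_i) \right\},$$
where $\theta_i = (\theta_{i1}, \dots, \theta_{iq})^T$ is a $q \times 1$ natural parameter vector, $\phi$ is a dispersion parameter and $w_i$ is a weight.
It is known that
$$E\left[\frac{\partial \ln f}{\partial \theta_{ij}}\right] = 0, \quad j = 1, \dots, q,$$
and
$$E\left[\frac{\partial^2 \ln f}{\partial \theta_{ij}\,\partial \theta_{ij'}}\right] + E\left[\frac{\partial \ln f}{\partial \theta_{ij}} \cdot \frac{\partial \ln f}{\partial \theta_{ij'}}\right] = 0, \quad j, j' = 1, \dots, q.$$
Using these two properties, show that
$$E(y_{ij}) = \frac{\partial b(\theta_i)}{\partial \theta_{ij}}, \qquad \mathrm{Var}(y_{ij}) = \frac{\phi}{w_i} \cdot \frac{\partial^2 b(\theta_i)}{\partial \theta_{ij}^2} \qquad \text{and} \qquad \mathrm{Cov}(y_{ij}, y_{ij'}) = \frac{\phi}{w_i} \cdot \frac{\partial^2 b(\theta_i)}{\partial \theta_{ij}\,\partial \theta_{ij'}}, \quad j, j' = 1, \dots, q.$$
[5]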
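For reference, differentiating $\ln f$ from the pdf above gives the score and Hessian terms that appear in both stated properties; substituting them into those identities is the remaining step.

```latex
\frac{\partial \ln f}{\partial \theta_{ij}}
  = \frac{w_i}{\phi}\left(y_{ij} - \frac{\partial b(\theta_i)}{\partial \theta_{ij}}\right),
\qquad
\frac{\partial^2 \ln f}{\partial \theta_{ij}\,\partial \theta_{ij'}}
  = -\frac{w_i}{\phi}\,\frac{\partial^2 b(\theta_i)}{\partial \theta_{ij}\,\partial \theta_{ij'}}.
```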
2. You need to install the R package faraway for this question. The hsb data were collected in the High School and Beyond Study; type help(hsb) to see a description of the dataset. We want to see how the relevant variables in the data are related to the type of program (academic, vocational or general) that students pursue in high school. The response is multinomial with three levels.
For this problem, you have to show your R code in a clean manner and add appropriate
comments when necessary. Points may be deducted for disorganized R code.
(a) Fit a trinomial response model with all variables other than id as (untransformed) predictors. Use “academic” as the base level for the response “prog”. Show the estimated coefficients and their standard errors. [5]
(b) For the student with id 99, compute and show the predicted probabilities of the three possible choices. [5]
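A minimal sketch for both parts of question 2, assuming nnet::multinom() as the fitting function (the question does not prescribe one). The object names fit, fit_sum and p99 are illustrative only.

```r
library(faraway)  # provides the hsb data
library(nnet)     # multinom(): baseline-category (multinomial) logit models

data(hsb, package = "faraway")
# Make "academic" the baseline category of the response
hsb$prog <- relevel(factor(hsb$prog), ref = "academic")

# (a) Trinomial logit model: every variable except id, untransformed
fit <- multinom(prog ~ . - id, data = hsb, Hess = TRUE, trace = FALSE)
fit_sum <- summary(fit)
fit_sum$coefficients     # one row per non-baseline program
fit_sum$standard.errors  # standard errors from the Hessian

# (b) Predicted probabilities of the three programs for the student with id 99
p99 <- predict(fit, newdata = hsb[hsb$id == 99, ], type = "probs")
p99
```

Because multinom() fits a baseline-category logit, each row of the coefficient matrix describes the log-odds of one program relative to academic.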
3. In this problem you are to reproduce some of the results in Table 2.7 for Example 2.7 in Fahrmeir and Tutz
(F & T). You have to show your R code in a clean manner and add appropriate comments
when necessary. Points may be deducted for disorganized R code.
(a) The second column of Table 2.7 in F & T employs the variance function $\sigma^2(\mu) = \phi\mu$. In class, we reproduced the “robust” p-values shown in parentheses using the robust.variance component, which is directly accessible from the gee object produced by the gee() function. For this sub-problem, you are asked to compute the robust.variance matrix “by hand”, i.e. using elementary matrix operations in R based on the sandwich-estimator formula. When doing this, you may use other information accessible from the gee object, such as scale, fitted.values, linear.predictors, residuals, naive.variance, etc. However, you must clearly demonstrate that you can reproduce the robust.variance matrix using the sandwich-estimator formula. [10]
(b) Now reproduce ALL the numbers presented in the third column of Table 2.7 in F & T, where the variance function is $\sigma^2(\mu) = \mu + \theta\mu^2$, using the methodology discussed in lecture that alternates between estimating $\theta$ (by the method of moments) and $\beta$ (by solving a score equation). Pay attention to the following:
i. You are advised to use glm.nb() from the MASS package, which takes a full likelihood approach, to obtain an initial estimate of $\theta$. This initial value of $\theta$ can then serve as the input to the alternating estimation procedure.
MAST90084 Statistical Modelling Assignment 2 Semester 1, 2021
ii. To solve the score equation for β, you can use glm() and the function negative.binomial()
from the MASS package.
iii. Read the documentation of any R function you are unsure about CAREFULLY, as well as Section 7.4 of Modern Applied Statistics with S by Venables and Ripley if necessary.
iv. Your numbers may differ slightly from those in F & T due to numerical differences. However,
they should be very close in general.
v. Present your work in a CLEAN way.
[25]
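For part (a), a sketch of the sandwich formula specialised to this setting, under assumptions that should be checked against the gee() call used in class: an independence working correlation, each observation forming its own cluster, a log link (so $\partial\mu_i/\partial\beta = \mu_i x_i$) and $\sigma^2(\mu_i) = \phi\mu_i$. Here $x_i^T$ is the $i$-th row of the model matrix $X$ (same columns and order as the gee coefficients) and $r_i = y_i - \hat\mu_i$ plays the role of the gee residuals.

```latex
\hat V_{\text{robust}} = A^{-1} B A^{-1}, \qquad
A = \sum_i \frac{\hat\mu_i}{\hat\phi}\, x_i x_i^T = \frac{1}{\hat\phi} X^T W X, \qquad
B = \sum_i \frac{r_i^2}{\hat\phi^2}\, x_i x_i^T = \frac{1}{\hat\phi^2} X^T R X,
\\
W = \operatorname{diag}(\hat\mu_i), \quad R = \operatorname{diag}(r_i^2)
\quad\Longrightarrow\quad
\hat V_{\text{robust}} = (X^T W X)^{-1}\, X^T R X\, (X^T W X)^{-1}.
```

The dispersion cancels, and $A^{-1} = \hat\phi\,(X^T W X)^{-1}$ is the model-based variance, which should agree with naive.variance if gee scales it by $\hat\phi$. With larger clusters, B would instead sum outer products of each cluster's summed score contributions.

For part (b), a minimal sketch of the alternating procedure. Two assumptions: the moment step uses MASS's theta.mm(), which sets the Pearson statistic equal to the residual degrees of freedom (confirm this matches the estimator from lecture), and alternate_nb() is a hypothetical helper name. Note the parameterisation: glm.nb() and negative.binomial() use Var(y) = µ + µ²/θ, so F & T's θ corresponds to 1/θ from MASS.

```r
library(MASS)  # glm.nb(), negative.binomial(), theta.mm()

# Hypothetical helper: alternate a moment estimate of theta with a
# score-equation fit for beta. MASS parameterises Var(y) = mu + mu^2 / theta_MASS,
# so the F & T parameter is theta_FT = 1 / theta_MASS.
alternate_nb <- function(formula, data, tol = 1e-6, maxit = 100) {
  # Initial theta from the full-likelihood negative binomial fit
  th <- glm.nb(formula, data = data)$theta
  for (it in seq_len(maxit)) {
    # beta-step: solve the score equation with theta held fixed
    fit <- glm(formula, family = negative.binomial(th), data = data)
    # theta-step: Pearson statistic set equal to the residual df
    th_new <- theta.mm(fit$y, fitted(fit), dfr = df.residual(fit), limit = 100)
    converged <- abs(th_new - th) < tol * th
    th <- th_new
    if (converged) break
  }
  # Refit so that beta corresponds to the final theta
  fit <- glm(formula, family = negative.binomial(th), data = data)
  list(fit = fit, theta_MASS = th, theta_FT = 1 / th, iterations = it)
}
```

Applied to the Example 2.7 data (a data frame and model formula you would supply), summary(res$fit) reports the β estimates with their standard errors, and res$theta_FT gives θ on F & T's scale. The same helper also runs on the quine data shipped with MASS.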
Total marks = 50