Examiners’ commentaries 2019
ST2134 Advanced statistics: statistical inference
Important note
This commentary reflects the examination and assessment arrangements for this course in the
academic year 2018–19. The format and structure of the examination may change in future years,
and any such changes will be publicised on the virtual learning environment (VLE).
Information about the subject guide and the Essential reading references
Unless otherwise stated, all cross-references will be to the latest version of the subject guide (2018).
You should always attempt to use the most recent edition of any Essential reading textbook, even if
the commentary and/or online reading list and/or subject guide refer to an earlier edition. If
different editions of Essential reading are listed, please check the VLE for reading supplements – if
none are available, please use the contents list and index of the new edition to find the relevant
section.
General remarks
Learning outcomes
At the end of this half course and having completed the essential reading and activities you should
be able to:
• explain the principles of data reduction
• judge the quality of estimators
• choose appropriate methods of inference to tackle real problems.
Key steps to improvement
Candidates should pay close attention to the essential topics of sufficient statistics, point estimation,
interval estimation and hypothesis testing. Being able to state and prove key theorems and lemmas
simply requires close study of these in the subject guide. The point estimation techniques of method
of moments estimation and maximum likelihood estimation are standard (and are even introduced in
ST104b Statistics 2), with questions typically varying in terms of the probability distribution
from which the random sample is drawn (with many common distributions covered in ST104b
Statistics 2 and ST2133 Advanced statistics: distribution theory). Pivotal functions feature
heavily in Chapter 4 (Interval estimation), and likelihood ratio tests similarly feature heavily in
Chapter 5 (Hypothesis testing).
• An examination paper attempts to cover as much of the syllabus as possible, but inevitably
some parts are left out. As the included parts will vary from year to year it is not enough to
practise only on the past papers. Candidates should practise on all the examples, Learning
activities and Sample examination questions in the subject guide in order to understand the
relevant theory and adequately prepare for the examination. Afterwards you may try one or
two past papers for further practice.
• The examination paper for this course comprises four questions, each containing several
parts. The parts contribute different numbers of marks towards the final grade, with the
marks reflecting their difficulty. The difficulty of each part is usually assessed after taking
into account the level of thinking related to statistical inference and the extent of algebraic
computation required. Take 5–10 minutes to read the whole paper and try to gauge the
difficulty of each question relative to your own strengths. You will find that some questions
could be much easier for you than others.
• If you are unable to complete an exercise because of the algebraic computations, or
because of a gap in your knowledge of the material of ST2133 Advanced statistics:
distribution theory, do not abandon it. Write out clearly the procedure which should be
followed for the relevant statistical inference task. Remember that the primary aspect you
are being examined on is your understanding of the statistical methodology.
• A good knowledge of basic calculus, such as the ability to compute integrals, sums and
derivatives, will give you invaluable help for completing the examination paper. Also, the
main concepts of ST2133 Advanced statistics: distribution theory (random variables
and their distribution, expectation, independence, conditional expectation etc.) should be
well-understood before attempting this half course. If you are not comfortable with any of
these, go back and revise. Memorising basic properties such as the moments and probability
mass/density functions of some standard distributions (such as the binomial, Poisson,
uniform, normal and gamma) is also a good idea.
• When you are trying to prove an equality or inequality, be careful about what is on the
left-hand and right-hand sides. Justify the steps in your calculations as much as possible. If
you cannot get a question right, do not just write in the correct answer without any further
explanation; writing on the script what you think went wrong will earn you more marks.
Also try to avoid writing more than you are asked to write. For example, when asked to
state a theorem, do not give the proof as well, as this simply wastes time.
• Be clear in your answers. When giving the probability mass/density function of a random
variable you should also state its sample space. When you are deriving an estimator or a
sufficient statistic also state the parameter to which it corresponds. Sometimes it is a good
idea to underline your final result (if applicable).
• Make sure that you are able to provide the definitions of basic concepts such as
estimators/estimates, confidence intervals and hypothesis tests. The above are statements
about population parameters which are derived using functions of the data (that is,
statistics). In the relevant calculations you should always be able to distinguish a statistic
from a parameter and be clear on what they represent.
• Practise operations involving products and sums. A substantial number of candidates fail
to write them down correctly. Quite often they omit indices or write them incorrectly,
which inevitably leads to errors. A typical example of this is the following:
Assume that a random sample is observed which consists of the random variables
Y1, Y2, . . . , Yn from a population with probability density function $f_{Y_i}(y_i; \theta)$, often written
for simplicity as $f_Y(y; \theta)$. The fact that this sample is a random sample indicates that we
can write the joint probability density function of Y1, . . . , Yn as:
$$f_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta) = \prod_{i=1}^{n} f_{Y_i}(y_i;\theta).$$
Make sure that you write the above and not the following:
$$f_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta) = \prod_{i=1}^{n} f_Y(y;\theta)$$
which sometimes leads to the incorrect expression:
$$f_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta) = f_Y(y;\theta)^n.$$
Note that if you were asked to provide the log-likelihood function l(y1, . . . , yn; θ), the
calculations would have led you to sums rather than products (refresh your memory on the
properties of log functions). The correct expression would be:
$$l(y_1,\ldots,y_n;\theta) = \log\left(f_{Y_1,\ldots,Y_n}(y_1,\ldots,y_n;\theta)\right) = \log\left(\prod_{i=1}^{n} f_{Y_i}(y_i;\theta)\right) = \sum_{i=1}^{n}\log f_{Y_i}(y_i;\theta).$$
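This product-to-sum step is also why statistical software always works with log-likelihoods
rather than likelihoods. The following Python sketch (an illustration only; the exponential
population, sample size and seed are arbitrary choices, not taken from the course) shows a
product of several thousand densities underflowing to zero in double precision, while the
corresponding sum of log-densities remains stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: n = 5000 draws from an Exponential population
# with density f(y; theta) = theta * exp(-theta * y), here with theta = 2.
theta = 2.0
y = rng.exponential(scale=1 / theta, size=5000)

# The raw product of densities underflows to exactly 0.0 in double precision...
densities = theta * np.exp(-theta * y)
product = np.prod(densities)

# ...whereas the log-likelihood, a sum of log-densities, is perfectly stable.
log_likelihood = np.sum(np.log(theta) - theta * y)

print(product)         # 0.0 (underflow)
print(log_likelihood)  # a finite number around -1500
```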
Examination revision strategy
Many candidates are disappointed to find that their examination performance is poorer than they
expected. This may be due to a number of reasons, but one particular failing is ‘question
spotting’, that is, confining your examination preparation to a few questions and/or topics which
have come up in past papers for the course. This can have serious consequences.
We recognise that candidates might not cover all topics in the syllabus in the same depth, but you
need to be aware that examiners are free to set questions on any aspect of the syllabus. This
means that you need to study enough of the syllabus to enable you to answer the required number of
examination questions.
The syllabus can be found in the Course information sheet available on the VLE. You should read
the syllabus carefully and ensure that you cover sufficient material in preparation for the
examination. Examiners will vary the topics and questions from year to year and may well set
questions that have not appeared in past papers. Examination papers may legitimately include
questions on any topic in the syllabus. So, although past papers can be helpful during your revision,
you cannot assume that topics or specific questions that have come up in past examinations will
occur again.
If you rely on a question-spotting strategy, it is likely you will find yourself in difficulties
when you sit the examination. We strongly advise you not to adopt this strategy.
Comments on specific questions – Zone A
Candidates should answer all FOUR questions: Question 1 of Section A (40 marks) and all
THREE questions from Section B (60 marks in total).
Section A
Answer Question 1 from this section.
Question 1
(a) Suppose the random vector Y has a smooth probability density function
f(y; θ), where θ is an unknown parameter. Let T = T(Y) be an unbiased
estimator of g(θ). Then the following Cramér–Rao inequality holds:
$$\operatorname{Var}(T) \ge \frac{\left(dg(\theta)/d\theta\right)^2}{E\left(\left[\partial\log\{f(y;\theta)\}/\partial\theta\right]^2\right)}.$$
i. Show that the denominator on the right-hand side of the above inequality
can be replaced by:
$$-E\left(\frac{\partial^2}{\partial\theta^2}\log\{f(y;\theta)\}\right).$$
(7 marks)
ii. State the condition under which there exists an unbiased estimator of g(θ)
whose variance attains the Cramér–Rao lower bound.
(3 marks)
Reading for this question
Section 3.5 of the subject guide.
Approaching the question
i. Let s(θ; y) = ∂ log{f(y; θ)}/∂θ, so that ∂f(y; θ)/∂θ = s(θ; y) f(y; θ). Therefore:
$$\frac{d}{d\theta} E[s(\theta; Y)] = \frac{d}{d\theta}\int_{\mathbb{R}^n} s(\theta; y)\, f(y; \theta)\, dy = \int_{\mathbb{R}^n}\left[\left(\frac{\partial}{\partial\theta}s(\theta; y)\right) f(y; \theta) + s(\theta; y)\left(\frac{\partial}{\partial\theta}f(y; \theta)\right)\right] dy$$
$$= \int_{\mathbb{R}^n}\left[\frac{\partial}{\partial\theta}s(\theta; y) + s(\theta; y)^2\right] f(y; \theta)\, dy = E\left[\frac{\partial}{\partial\theta}s(\theta; Y) + s(\theta; Y)^2\right].$$
Since E[s(θ; Y)] = 0, we have dE[s(θ; Y)]/dθ = 0, and hence:
$$E\left[s(\theta; Y)^2\right] = -E\left[\frac{\partial}{\partial\theta}s(\theta; Y)\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log\{f(Y;\theta)\}\right]$$
from which the result immediately follows.
ii. If U = h(Y) is an unbiased estimator of g(θ), then U attains the Cramér–Rao lower
bound if and only if:
$$s(\theta; y) = b(\theta)\,[h(y) - g(\theta)]$$
where b(θ) is a function involving the parameter θ, but not y.
(b) The following data show the number of occupants in passenger cars observed
during one hour at a busy junction. It is assumed that these data follow a
geometric distribution with probability mass function:
$$p(x; \pi) = \begin{cases} (1-\pi)^{x-1}\,\pi & \text{for } x = 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$

Number of occupants    1     2     3    4    5    ≥ 6    Total
Frequency              678   227   56   28   8    14     1011

Find the maximum likelihood estimate of π. You do not need to show that the
solution is a maximum.
(12 marks)
Reading for this question
Section 3.6 of the subject guide.
Approaching the question
The sample size is n = 1011. If we knew all 1,011 observations, the joint probability
function for x1, . . . , x1011 would be:
$$L(\pi) = \prod_{i=1}^{1011} p(x_i; \pi).$$
However, we only know that there are 678 xis equal to 1, 227 xis equal to 2, . . ., and 14 xis
equal to some integers not smaller than 6. Note that:
$$P(X_i \ge 6) = \sum_{x=6}^{\infty} p(x; \pi) = \pi(1-\pi)^5\left(1 + (1-\pi) + (1-\pi)^2 + \cdots\right) = \pi(1-\pi)^5 \times \frac{1}{\pi} = (1-\pi)^5.$$
Hence we may only use:
$$L(\pi) = p(1;\pi)^{678}\, p(2;\pi)^{227}\, p(3;\pi)^{56}\, p(4;\pi)^{28}\, p(5;\pi)^{8}\left((1-\pi)^5\right)^{14} = \pi^{1011-14}(1-\pi)^{227+56\times 2+28\times 3+8\times 4+14\times 5} = \pi^{997}(1-\pi)^{525}$$
hence:
$$l(\pi) = \log L(\pi) = 997\log\pi + 525\log(1-\pi).$$
Setting:
$$\frac{d}{d\pi} l(\pi) = \frac{997}{\pi} - \frac{525}{1-\pi} = 0 \quad\Rightarrow\quad \hat{\pi} = \frac{997}{997+525} = 0.655.$$
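As a purely numerical cross-check (not something the examination asks for), the censored
log-likelihood above can be maximised directly. The sketch below assumes Python with
NumPy and SciPy are available; the helper name neg_log_likelihood is my own.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Observed frequencies for 1-5 occupants, plus the censored ">= 6" cell.
counts = {1: 678, 2: 227, 3: 56, 4: 28, 5: 8}
n_censored = 14

def neg_log_likelihood(p):
    # log p(x; pi) = (x - 1) log(1 - pi) + log(pi), and P(X >= 6) = (1 - pi)^5.
    ll = sum(c * ((x - 1) * np.log(1 - p) + np.log(p)) for x, c in counts.items())
    ll += n_censored * 5 * np.log(1 - p)
    return -ll

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
closed_form = 997 / (997 + 525)
print(res.x, closed_form)  # both approximately 0.655
```

The numerical maximiser agrees with the closed-form estimate 997/1522 to the optimiser's tolerance.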
(c) Suppose that a random variable X has a Poisson distribution with unknown
rate parameter λ, where λ > 0. Find a statistic g(X), i.e. some known function
of X, which will be an unbiased estimator of e^λ.
Hint: If E(g(X)) = e^λ, then:
$$\sum_{x=0}^{\infty}\frac{g(x)\, e^{-\lambda}\lambda^x}{x!} = e^\lambda.$$
Consider multiplying both sides of this equation by e^λ, and use the series
expansion of the exponential function, where for any number a we have:
$$e^a = \sum_{x=0}^{\infty}\frac{a^x}{x!} = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots.$$
(8 marks)
Reading for this question
Section 3.1 of the subject guide.
Approaching the question
Using the hint, if E(g(X)) = e^λ, then:
$$e^\lambda = E(g(X)) = \sum_{x=0}^{\infty} g(x)\, p(x;\lambda) = \sum_{x=0}^{\infty}\frac{g(x)\, e^{-\lambda}\lambda^x}{x!}.$$
Multiplying by e^λ, we have:
$$\sum_{x=0}^{\infty}\frac{g(x)\,\lambda^x}{x!} = e^{2\lambda} = \sum_{x=0}^{\infty}\frac{(2\lambda)^x}{x!} = \sum_{x=0}^{\infty}\frac{2^x\lambda^x}{x!}.$$
Since two power series in λ can be equal only if the coefficients of λ^x are equal for
x = 0, 1, 2, . . ., it follows that g(x) = 2^x for x = 0, 1, 2, . . .. This argument also shows that
the estimator 2^X is the unique unbiased estimator of e^λ in this problem.
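Unbiasedness here is a statement about a population average, and it can be illustrated by
simulation. In the short Python sketch below (the value of λ and the seed are arbitrary
illustrative choices), the sample mean of g(X) = 2^X over many Poisson draws approaches e^λ.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 1.5

# The sample average of g(X) = 2**X over many Poisson(lambda) draws
# should approach E(2**X) = exp(lambda).
x = rng.poisson(lam, size=2_000_000)
estimate = np.mean(2.0 ** x)
target = np.exp(lam)
print(estimate, target)  # both approximately 4.48
```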
(d) A random sample {X1, X2, . . . , Xn} is drawn from the distribution with the
following probability density function:
$$f(x; \theta) = \frac{\theta}{2\sqrt{x}}\, e^{-\theta\sqrt{x}}$$
for x ≥ 0, and 0 otherwise.
i. Find the maximum likelihood estimator for θ. You do not need to show that
the solution is a maximum.
(8 marks)
ii. Suppose n = 4 and we observe x1 = 4.1, x2 = 7.3, x3 = 6.5 and x4 = 8.8.
Based on this random sample, calculate the maximum likelihood estimate of
θ/2.
(2 marks)
Reading for this question
Section 3.6 of the subject guide.
Approaching the question
i. Since the Xis are independent and identically distributed, the likelihood function is:
$$L(\theta) = \prod_{i=1}^{n}\frac{\theta}{2\sqrt{X_i}}\, e^{-\theta\sqrt{X_i}} = \frac{\theta^n}{2^n\prod_{i=1}^{n}\sqrt{X_i}}\, e^{-\theta\sum_{i=1}^{n}\sqrt{X_i}}.$$
Hence the log-likelihood function is:
$$l(\theta) = \ln L(\theta) = n\ln\theta - \theta\sum_{i=1}^{n}\sqrt{X_i} - \ln\left(2^n\prod_{i=1}^{n}\sqrt{X_i}\right).$$
Differentiating with respect to θ, we obtain:
$$\frac{dl(\theta)}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^{n}\sqrt{X_i}.$$
Equating to zero, we obtain the maximum likelihood estimator:
$$\hat{\theta} = \frac{n}{\sum_{i=1}^{n}\sqrt{X_i}}.$$
ii. By the invariance property, the maximum likelihood estimator of θ/2 is θ̂/2. For the
given data, the point estimate is:
$$\frac{\hat{\theta}}{2} = \frac{4}{2\times\left(\sqrt{4.1}+\sqrt{7.3}+\sqrt{6.5}+\sqrt{8.8}\right)} = 0.1953.$$
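The arithmetic can be checked in a couple of lines of Python (using only the standard
library; the data are those given in the question):

```python
import math

x = [4.1, 7.3, 6.5, 8.8]
n = len(x)

# MLE of theta from part i.; the invariance property then gives the MLE of theta/2.
theta_hat = n / sum(math.sqrt(xi) for xi in x)
estimate = theta_hat / 2
print(round(estimate, 4))  # 0.1953
```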
Section B
Answer all three questions from this section.
Question 2
Let {Y1, Y2, . . . , Yn} be a random sample from a U(0, 2θ) distribution. The
probability density function for each Yi, for i = 1, . . . , n, is given by:
$$f_{Y_i}(y \mid \theta) = \begin{cases} 1/(2\theta) & \text{for } 0 \le y \le 2\theta \\ 0 & \text{otherwise.} \end{cases}$$
(a) Find the method of moments estimator of θ and check if this is an unbiased
estimator of θ.
(5 marks)
(b) Find the maximum likelihood estimator of θ and derive its probability density
function.
(5 marks)
(c) Is the maximum likelihood estimator of θ you found in part (b) an unbiased
estimator of θ? If not, modify it to find an unbiased estimator of θ.
(5 marks)
(d) Consider the method of moments estimator and the unbiased estimator from
part (c). Compare the variances of the two estimators to decide which estimator
is preferred.
(5 marks)
Reading for this question
Sections 3.1, 3.3 and 3.6 of the subject guide.
Approaching the question
(a) The first population moment is E(Yi) = θ. To obtain the method of moments estimator we
simply set θ̂MM = Ȳ. This is an unbiased estimator of θ since E(Ȳ) = E(Yi) = θ.
(b) The likelihood function of θ given the sample Y = y may be written as:
$$L(\theta \mid y) = \prod_{i=1}^{n}\frac{1}{2\theta}\, I(y_{(i)} \le 2\theta) = (2\theta)^{-n}\, I(y_{(n)} \le 2\theta)$$
where y(n) denotes the sample maximum.
The likelihood function takes strictly positive values for 2θ ≥ y(n), equivalently θ ≥ y(n)/2.
Specifically:
$$L(\theta \mid y) = \begin{cases} (2\theta)^{-n} & \text{for } \theta \ge y_{(n)}/2 \\ 0 & \text{otherwise.} \end{cases}$$
Note that the likelihood function is decreasing in θ, so it is maximised at y(n)/2. Hence the
maximum likelihood estimator of θ is θ̂ML = Y(n)/2.
To find the distribution of Y(n) we need fYi(yi | θ) = 1/(2θ) and:
$$F_{Y_i}(y_i \mid \theta) = \int_0^{y_i}\frac{1}{2\theta}\, dt = \frac{y_i}{2\theta}$$
for 0 ≤ yi ≤ 2θ. We get:
$$f_{Y_{(n)}}(y \mid \theta) = n\, F_Y(y \mid \theta)^{n-1} f_Y(y \mid \theta) = n\left(\frac{y}{2\theta}\right)^{n-1}\frac{1}{2\theta} = \frac{n y^{n-1}}{(2\theta)^n}$$
for 0 ≤ y ≤ 2θ.
(c) The expected value of θ̂ML = Y(n)/2 is:
$$E\left(\frac{Y_{(n)}}{2}\right) = \frac{1}{2}\int_0^{2\theta} y\,\frac{n y^{n-1}}{(2\theta)^n}\, dy = \frac{n\theta}{n+1} \ne \theta.$$
Therefore, the maximum likelihood estimator is a biased estimator of θ. However, if we set:
$$\tilde{\theta} = \frac{n+1}{n}\,\hat{\theta}_{ML} = \frac{n+1}{n}\,\frac{Y_{(n)}}{2}$$
we get an unbiased estimator of θ since:
$$E(\tilde{\theta}) = E\left(\frac{n+1}{n}\,\frac{Y_{(n)}}{2}\right) = \frac{n+1}{n}\, E\left(\frac{Y_{(n)}}{2}\right) = \frac{n+1}{n}\cdot\frac{n}{n+1}\,\theta = \theta.$$
(d) Since both estimators are unbiased we compare them in terms of their variances. For the
method of moments estimator θ̂MM = Ȳ we get:
$$\operatorname{Var}(\hat{\theta}_{MM}) = \operatorname{Var}(\bar{Y}) = \frac{\operatorname{Var}(Y_i)}{n} = \frac{(2\theta)^2/12}{n} = \frac{\theta^2}{3n}.$$
For the variance of θ̃ we need E(Y²(n)), which, using the density of Y(n) from part (b), is:
$$E(Y_{(n)}^2) = \int_0^{2\theta} y^2\,\frac{n y^{n-1}}{(2\theta)^n}\, dy = \frac{n}{n+2}\,(2\theta)^2.$$
The variance is then:
$$\operatorname{Var}(\tilde{\theta}) = \left(\frac{n+1}{2n}\right)^2\operatorname{Var}(Y_{(n)}) = \left(\frac{n+1}{2n}\right)^2 (2\theta)^2\left(\frac{n}{n+2} - \frac{n^2}{(n+1)^2}\right) = \frac{\theta^2}{n(n+2)}.$$
Since Var(θ̃) ≤ Var(θ̂MM) for n ≥ 1, the estimator θ̃ is preferred.
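The variance comparison can be illustrated by simulation. The Python sketch below (θ = 3,
n = 20, the replication count and the seed are arbitrary illustrative choices) confirms that
both estimators are unbiased and that the bias-corrected maximum likelihood estimator has
the much smaller variance:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 20, 200_000

y = rng.uniform(0, 2 * theta, size=(reps, n))
theta_mm = y.mean(axis=1)                        # method of moments: Y-bar
theta_tilde = (n + 1) / n * y.max(axis=1) / 2    # bias-corrected MLE

print(theta_mm.mean(), theta_tilde.mean())          # both approximately theta = 3
print(theta_mm.var(), theta**2 / (3 * n))           # approximately 0.15
print(theta_tilde.var(), theta**2 / (n * (n + 2)))  # approximately 0.0205
```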
Question 3
Let Y = {Y1, Y2, . . . , Yn} be a random sample from a distribution with probability
density function for each Yi, for i = 1, . . . , n, given by:
$$f_{Y_i}(y_i \mid \beta) = \begin{cases} \beta y_i^{\beta-1} & \text{for } 0 < y_i < 1,\ \beta > 0 \\ 0 & \text{otherwise.} \end{cases}$$
(a) Find the likelihood function and find a one-dimensional sufficient statistic for β.
(4 marks)
(b) Find the maximum likelihood estimator of β.
(6 marks)
(c) Describe how a maximum likelihood estimator may be used to form an
approximate pivotal function for large sample sizes.
(5 marks)
(d) Find an asymptotic confidence interval for β with level 100(1− α)%, where
α ∈ (0, 1).
(5 marks)
Reading for this question
Sections 2.3, 3.6 and 4.3 of the subject guide.
Approaching the question
(a) The likelihood function is:
$$L(\beta \mid y) = f(y \mid \beta) = \prod_{i=1}^{n}\beta y_i^{\beta-1} = \beta^n\left(\prod_{i=1}^{n} y_i\right)^{\beta-1}.$$
We can write $f(y \mid \beta) = g\left(\prod_i y_i, \beta\right)h(y)$ by setting $g\left(\prod_i y_i, \beta\right) = f(y \mid \beta)$ and $h(y) = 1$.
Hence, by the factorisation theorem, $\prod_{i=1}^{n} y_i$ is a one-dimensional sufficient statistic for β.
(b) The log-likelihood function is:
$$l(\beta \mid y) = \log L(\beta \mid y) = n\log\beta + (\beta-1)\sum_{i=1}^{n}\log y_i.$$
The score function is:
$$s(\beta \mid y) = \frac{\partial l(\beta \mid y)}{\partial\beta} = \frac{n}{\beta} + \sum_{i=1}^{n}\log y_i.$$
Fisher's information for β is:
$$I(\beta) = -E\left(\frac{\partial^2 l(\beta \mid y)}{\partial\beta^2}\right) = -E\left(-\frac{n}{\beta^2}\right) = \frac{n}{\beta^2}.$$
Setting s(β | y) = 0 leads to $\hat{\beta} = -n/\sum_i\log y_i$ as a candidate maximum
likelihood estimator (note that each log yi < 0 here, so β̂ > 0). Since:
$$\frac{\partial^2 l(\beta \mid y)}{\partial\beta^2} = -\frac{n}{\beta^2} < 0$$
we conclude that this is indeed the maximum likelihood estimator of β.
(c) Example 4.3.5 on page 104 in the subject guide.
(d) Let β̂ denote β̂ML. We know that for large sample sizes:
$$\hat{\beta}\ \overset{a}{\sim}\ N\left(\beta,\ \frac{\beta^2}{n}\right)$$
since I(β) = n/β². In an extra level of approximation we may use:
$$\hat{\beta}\ \overset{a}{\sim}\ N\left(\beta,\ \frac{\hat{\beta}^2}{n}\right).$$
Let $S_\beta = \hat{\beta}^2/n$. We can write:
$$\frac{\hat{\beta}-\beta}{\sqrt{S_\beta}}\ \overset{a}{\sim}\ N(0,1).$$
Using the pivotal function technique for constructing confidence intervals, we get the
following asymptotic 100(1 − α)% confidence interval:
$$\left(\hat{\beta} - z_{1-\alpha/2}\sqrt{S_\beta},\ \hat{\beta} + z_{1-\alpha/2}\sqrt{S_\beta}\right).$$
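A quick simulation shows the interval in action. The Python sketch below (true β = 2,
n = 500 and the seed are arbitrary choices) draws a sample from f(y; β) = βy^(β−1), which
is the Beta(β, 1) distribution, so inverse-CDF sampling gives Y = U^(1/β), and then
computes the asymptotic interval:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
beta_true, n, alpha = 2.0, 500, 0.05
z = norm.ppf(1 - alpha / 2)  # z_{1-alpha/2}

# f(y; beta) = beta * y**(beta - 1) on (0, 1) is Beta(beta, 1),
# so inverse-CDF sampling gives Y = U**(1/beta).
y = rng.uniform(size=n) ** (1 / beta_true)

beta_hat = -n / np.log(y).sum()
se = np.sqrt(beta_hat**2 / n)  # sqrt(S_beta), using I(beta) = n / beta**2
ci = (beta_hat - z * se, beta_hat + z * se)
print(beta_hat, ci)
```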
Question 4
(a) Explain the rationale behind the likelihood ratio test and why the likelihood
ratio test statistic takes values larger than one. Describe how this test may be
used in the presence of nuisance parameters.
(6 marks)
Reading for this question
Section 5.3 of the subject guide.
Approaching the question
The contents of Definition 5.3.1 on page 125 of the subject guide. This includes Learning
activity 5.3. Also, the contents of Section 5.3.1 on page 126.
(b) Let {X1, X2, . . . , Xn} be a random sample from an exponential distribution.
The probability density function for each Xi, for i = 1, . . . , n, is:
$$f_{X_i}(x_i \mid \lambda) = \frac{1}{\lambda}\exp\left(-\frac{x_i}{\lambda}\right)$$
for xi > 0, and 0 otherwise, where λ > 0 is an unknown parameter.
i. Obtain the test statistic for the likelihood ratio test for testing H0 : λ = λ0
vs. H1 : λ ≠ λ0.
(6 marks)
ii. Use the likelihood ratio test statistic to provide the rejection region of an
asymptotic test at a 100α% significance level, where α ∈ (0, 1).
(5 marks)
iii. Describe how you would obtain an asymptotic p-value for the test in part ii.
Also, provide a graphical representation of the p-value.
(3 marks)
Reading for this question
Section 5.3 of the subject guide.
Approaching the question
i. The joint probability density function of the sample is:
$$f(x;\lambda) = \lambda^{-n}\exp\left(-\frac{\sum_i x_i}{\lambda}\right)$$
and the log-likelihood function for the observed sample x is:
$$l(\lambda \mid x) = -n\log\lambda - \frac{\sum_i x_i}{\lambda}$$
and the score function is:
$$s(\lambda \mid x) = -\frac{n}{\lambda} + \frac{\sum_i x_i}{\lambda^2}.$$
The score function is equal to 0 only at λ = X̄. The second derivative of the
log-likelihood function is:
$$\frac{\partial^2}{\partial\lambda^2} l(\lambda \mid x) = \frac{n}{\lambda^2} - \frac{2n\bar{X}}{\lambda^3}$$
which is negative at λ = X̄. Hence we conclude that the likelihood function is maximised
at X̄, i.e. λ̂ = X̄. The likelihood ratio statistic, LR, is the following:
$$LR = \frac{L_X(\hat{\lambda})}{L_X(\lambda_0)} = \frac{\hat{\lambda}^{-n}\exp\left(-\sum_i X_i/\hat{\lambda}\right)}{\lambda_0^{-n}\exp\left(-\sum_i X_i/\lambda_0\right)} = \frac{\bar{X}^{-n}\exp(-n)}{\lambda_0^{-n}\exp(-n\bar{X}/\lambda_0)}.$$
ii. The null hypothesis, H0, is rejected if LR > k, where k is a suitably chosen constant.
However, the exact distribution of LR is unknown and it is, therefore, unclear how to
construct an exact test. Alternatively, under H0, we know that:
$$2\log LR = 2n\left(\log\lambda_0 - \log\bar{X} - \left(1 - \frac{\bar{X}}{\lambda_0}\right)\right) \sim \chi^2_1$$
approximately. A 100α% significance level test will reject H0 if $2\log LR > \chi^2_{\alpha,1}$, where
$\chi^2_{\alpha,1}$ denotes the upper 100α% point of the $\chi^2_1$ distribution.
iii. For the asymptotic test in part ii., let x denote the observed value of 2 log LR. Using
the fact that $2\log LR \sim \chi^2_1$ approximately under H0, we calculate the following probability:
$$p\text{-value} = P_{H_0}(2\log LR \ge x) = P(\chi^2_1 \ge x).$$
Candidates were also expected to provide a rough sketch of the area corresponding to the
p-value.
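The whole test can be carried out numerically. The Python sketch below (the true mean
2.5, λ0 = 2, n = 100 and the seed are hypothetical choices for illustration) computes the
statistic 2 log LR from part ii., the critical value and the asymptotic p-value with SciPy:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
lam0, n, alpha = 2.0, 100, 0.05

# Hypothetical data: the true mean is 2.5, so H0: lambda = 2 is false here.
x = rng.exponential(scale=2.5, size=n)
xbar = x.mean()

# 2 log LR = 2n (log(lam0) - log(xbar) - (1 - xbar / lam0)).
two_log_lr = 2 * n * (np.log(lam0) - np.log(xbar) - (1 - xbar / lam0))
critical = chi2.ppf(1 - alpha, df=1)  # upper 100*alpha% point of chi-squared_1
p_value = chi2.sf(two_log_lr, df=1)   # asymptotic p-value
print(two_log_lr, critical, p_value)
```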
Comments on specific questions – Zone B
Candidates should answer all FOUR questions: Question 1 of Section A (40 marks) and all
THREE questions from Section B (60 marks in total).
Section A
Answer Question 1 from this section.
Question 1
(a) State and prove the Cramér–Rao inequality theorem.
(10 marks)
Reading for this question
Section 3.5 of the subject guide.
Approaching the question
The relevant theorem is Theorem 3.5.1 on page 70 in the subject guide, followed by the
proof.
(b) The following data show the number of occupants in passenger cars observed
during one hour at a busy junction. It is assumed that these data follow a
geometric distribution with probability mass function:
$$p(x; \pi) = \begin{cases} (1-\pi)^{x-1}\,\pi & \text{for } x = 1, 2, \ldots \\ 0 & \text{otherwise.} \end{cases}$$
Number of occupants    1     2     3    4    5    ≥ 6    Total
Frequency              724   298   63   32   11   15     1143
Find the maximum likelihood estimate of pi. You do not need to show that the
solution is a maximum.
(12 marks)
Reading for this question
Section 3.6 of the subject guide.
Approaching the question
The sample size is n = 1143. If we knew all 1,143 observations, the joint probability
function for x1, . . . , x1143 would be:
$$L(\pi) = \prod_{i=1}^{1143} p(x_i; \pi).$$
However, we only know that there are 724 xis equal to 1, 298 xis equal to 2, . . ., and 15 xis
equal to some integers not smaller than 6. Note that:
$$P(X_i \ge 6) = \sum_{x=6}^{\infty} p(x; \pi) = \pi(1-\pi)^5\left(1 + (1-\pi) + (1-\pi)^2 + \cdots\right) = \pi(1-\pi)^5 \times \frac{1}{\pi} = (1-\pi)^5.$$
Hence we may only use:
$$L(\pi) = p(1;\pi)^{724}\, p(2;\pi)^{298}\, p(3;\pi)^{63}\, p(4;\pi)^{32}\, p(5;\pi)^{11}\left((1-\pi)^5\right)^{15} = \pi^{1143-15}(1-\pi)^{298+63\times 2+32\times 3+11\times 4+15\times 5} = \pi^{1128}(1-\pi)^{639}$$
hence:
$$l(\pi) = \log L(\pi) = 1128\log\pi + 639\log(1-\pi).$$
Setting:
$$\frac{d}{d\pi} l(\pi) = \frac{1128}{\pi} - \frac{639}{1-\pi} = 0 \quad\Rightarrow\quad \hat{\pi} = \frac{1128}{1128+639} = 0.638.$$
(c) Suppose that a random variable X has a Poisson distribution with unknown
rate parameter λ, where λ > 0. Find a statistic g(X), i.e. some known function
of X, which will be an unbiased estimator of e^λ.
Hint: If E(g(X)) = e^λ, then:
$$\sum_{x=0}^{\infty}\frac{g(x)\, e^{-\lambda}\lambda^x}{x!} = e^\lambda.$$
Consider multiplying both sides of this equation by e^λ, and use the series
expansion of the exponential function, where for any number a we have:
$$e^a = \sum_{x=0}^{\infty}\frac{a^x}{x!} = 1 + a + \frac{a^2}{2!} + \frac{a^3}{3!} + \cdots.$$
(8 marks)
Reading for this question
Section 3.1 of the subject guide.
Approaching the question
Using the hint, if E(g(X)) = e^λ, then:
$$e^\lambda = E(g(X)) = \sum_{x=0}^{\infty} g(x)\, p(x;\lambda) = \sum_{x=0}^{\infty}\frac{g(x)\, e^{-\lambda}\lambda^x}{x!}.$$
Multiplying by e^λ, we have:
$$\sum_{x=0}^{\infty}\frac{g(x)\,\lambda^x}{x!} = e^{2\lambda} = \sum_{x=0}^{\infty}\frac{(2\lambda)^x}{x!} = \sum_{x=0}^{\infty}\frac{2^x\lambda^x}{x!}.$$
Since two power series in λ can be equal only if the coefficients of λ^x are equal for
x = 0, 1, 2, . . ., it follows that g(x) = 2^x for x = 0, 1, 2, . . .. This argument also shows that
the estimator 2^X is the unique unbiased estimator of e^λ in this problem.
(d) A random sample {X1, X2, . . . , Xn} is drawn from the distribution with the
following probability density function:
$$f(x; \theta) = \frac{\theta}{2\sqrt{x}}\, e^{-\theta\sqrt{x}}$$
for x ≥ 0, and 0 otherwise.
i. Find the maximum likelihood estimator for θ. You do not need to show that
the solution is a maximum.
(8 marks)
ii. Suppose n = 4 and we observe x1 = 6.2, x2 = 5.8, x3 = 7.3 and x4 = 4.5.
Based on this random sample, calculate the maximum likelihood estimate of
θ/2.
(2 marks)
Reading for this question
Section 3.6 of the subject guide.
Approaching the question
i. Since the Xis are independent and identically distributed, the likelihood function is:
$$L(\theta) = \prod_{i=1}^{n}\frac{\theta}{2\sqrt{X_i}}\, e^{-\theta\sqrt{X_i}} = \frac{\theta^n}{2^n\prod_{i=1}^{n}\sqrt{X_i}}\, e^{-\theta\sum_{i=1}^{n}\sqrt{X_i}}.$$
Hence the log-likelihood function is:
$$l(\theta) = \ln L(\theta) = n\ln\theta - \theta\sum_{i=1}^{n}\sqrt{X_i} - \ln\left(2^n\prod_{i=1}^{n}\sqrt{X_i}\right).$$
Differentiating with respect to θ, we obtain:
$$\frac{dl(\theta)}{d\theta} = \frac{n}{\theta} - \sum_{i=1}^{n}\sqrt{X_i}.$$
Equating to zero, we obtain the maximum likelihood estimator:
$$\hat{\theta} = \frac{n}{\sum_{i=1}^{n}\sqrt{X_i}}.$$
ii. By the invariance property, the maximum likelihood estimator of θ/2 is θ̂/2. For the
given data, the point estimate is:
$$\frac{\hat{\theta}}{2} = \frac{4}{2\times\left(\sqrt{6.2}+\sqrt{5.8}+\sqrt{7.3}+\sqrt{4.5}\right)} = 0.2057.$$
Section B
Answer all three questions from this section.
Question 2
Let {X1, X2, . . . , Xn} be a random sample from a distribution with probability
density function:
$$f_{X_i}(x \mid \theta) = \begin{cases} 3\theta^{-3}x^2 & \text{for } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$
for any i = 1, . . . , n, and where θ > 0 is an unknown parameter.
(a) Let $Y = \max_{1\le i\le n} X_i$. Prove that the probability density function of Y is:
$$f_Y(y) = \begin{cases} 3n\theta^{-3n}y^{3n-1} & \text{for } 0 \le y \le \theta \\ 0 & \text{otherwise.} \end{cases}$$
(5 marks)
(b) Find a one-dimensional sufficient statistic for θ.
(5 marks)
(c) Find the maximum likelihood estimator θ̂ of θ.
(5 marks)
(d) Compute the mean squared error E[(θ̂ − θ)2], when n = 10 and θ = 1.
(5 marks)
Reading for this question
Sections 3.1, 3.3 and 3.6 of the subject guide.
Approaching the question
(a) We have:
$$f_Y(y) = \lim_{\Delta y\to 0}\frac{P(y \le Y \le y + \Delta y)}{\Delta y} = n[F_{X_1}(y)]^{n-1} f_{X_1}(y).$$
Here:
$$F_{X_1}(y) = \int_0^y f_{X_1}(x)\, dx = \frac{3}{\theta^3}\int_0^y x^2\, dx = \frac{y^3}{\theta^3}$$
and so:
$$f_Y(y) = n\,\frac{y^{3n-3}}{\theta^{3n-3}}\,\frac{3y^2}{\theta^3} = 3n\,\frac{y^{3n-1}}{\theta^{3n}}$$
for 0 ≤ y ≤ θ, and 0 otherwise, as requested.
(b) The joint pdf of {X1, X2, . . . , Xn} is:
$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n \mid \theta) = \frac{3^n}{\theta^{3n}}\prod_{i=1}^{n} x_i^2\, I\left(\min_i x_i \ge 0\right) I\left(\max_i x_i \le \theta\right) = \left[\frac{1}{\theta^{3n}}\, I\left(\max_i x_i \le \theta\right)\right]\left[3^n\, I\left(\min_i x_i \ge 0\right)\prod_{i=1}^{n} x_i^2\right]$$
and by the factorisation theorem a sufficient statistic for θ is $Y = \max_i X_i$.
(c) From the joint pdf in part (b), the log-likelihood function is:
$$l(\theta \mid x_1,\ldots,x_n) = \log f_{X_1,\ldots,X_n}(x_1,\ldots,x_n \mid \theta) = \text{constant} - 3n\log\theta, \quad \text{for } \theta \ge \max_i x_i$$
while the likelihood is zero otherwise. Hence the log-likelihood function is a decreasing
function of θ on the range $[\max_i x_i,\ \infty)$. Therefore, the maximum likelihood estimator is
$\hat{\theta} = \max_i X_i = Y$.
(d) From part (a) we have:
$$E(\hat{\theta}) = E(Y) = \int_0^\theta y\, f_Y(y)\, dy = \frac{3n}{\theta^{3n}}\int_0^\theta y^{3n-1}\, y\, dy = \frac{3n}{3n+1}\,\theta$$
$$E(\hat{\theta}^2) = E(Y^2) = \int_0^\theta y^2 f_Y(y)\, dy = \frac{3n}{\theta^{3n}}\int_0^\theta y^{3n-1} y^2\, dy = \frac{3n}{3n+2}\,\theta^2$$
and:
$$\operatorname{Var}(\hat{\theta}) = \operatorname{Var}(Y) = E(Y^2) - (E(Y))^2 = \frac{3n\theta^2}{(3n+2)(3n+1)^2}.$$
Therefore:
$$MSE(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \left(E(\hat{\theta}) - \theta\right)^2 = \frac{3n\theta^2}{(3n+2)(3n+1)^2} + \theta^2\left[\frac{3n}{3n+1} - 1\right]^2 = \frac{(6n+2)\theta^2}{(3n+2)(3n+1)^2} = \frac{62}{30752} \approx 0.002.$$
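The closed-form mean squared error can be verified by simulation. In the Python sketch
below (the seed and replication count are arbitrary), draws from f(x; θ) use inverse-CDF
sampling: F(x) = x³/θ³ on [0, θ], so X = θU^(1/3) for U uniform on (0, 1).

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 1.0, 10, 400_000

# Inverse-CDF sampling: F(x) = x**3 / theta**3 on [0, theta], so X = theta * U**(1/3).
x = theta * rng.uniform(size=(reps, n)) ** (1 / 3)
theta_hat = x.max(axis=1)  # the MLE from part (c)

mse = np.mean((theta_hat - theta) ** 2)
exact = (6 * n + 2) * theta**2 / ((3 * n + 2) * (3 * n + 1) ** 2)
print(mse, exact)  # both approximately 0.002
```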
Question 3
Let {Y1, Y2, . . . , Yn} be a random sample from a distribution with the following
probability density function for each Yi (for i = 1, . . . , n):
$$f(y_i;\mu) = \left(\frac{1}{2\pi y_i^3}\right)^{1/2}\exp\left(\frac{1}{\mu} - \frac{y_i}{2\mu^2} - \frac{1}{2y_i}\right), \quad \text{for } y_i > 0,\ \mu > 0$$
and 0 otherwise.
(a) Derive the likelihood function and provide a one-dimensional sufficient statistic
for µ.
(4 marks)
(b) Find the maximum likelihood estimator of µ.
(6 marks)
(c) Describe how a maximum likelihood estimator may be used to form an
approximate pivotal function for large sample sizes.
(5 marks)
(d) Find an approximate pivotal function for µ and construct an asymptotic level
100(1− α)% confidence interval, where α ∈ (0, 1).
(5 marks)
Reading for this question
Sections 2.3, 3.6 and 4.3 of the subject guide.
Approaching the question
(a) The likelihood function is:
$$L(\mu; y) = f_Y(y \mid \mu) = \prod_{i=1}^{n} f_{Y_i}(y_i \mid \mu) = \prod_{i=1}^{n}\left(\frac{1}{2\pi y_i^3}\right)^{1/2}\exp\left(\frac{1}{\mu} - \frac{y_i}{2\mu^2} - \frac{1}{2y_i}\right)$$
$$= \exp\left(\frac{n}{\mu} - \frac{\sum_{i=1}^{n} y_i}{2\mu^2}\right)\exp\left(-\frac{1}{2}\sum_{i=1}^{n}\frac{1}{y_i}\right)\left((2\pi)^n\prod_{i=1}^{n} y_i^3\right)^{-1/2}.$$
We can write $L(\mu; y) = g\left(\sum_i y_i, \mu\right)h(y)$ by setting:
$$g\left(\sum_i y_i, \mu\right) = \exp\left(\frac{n}{\mu} - \frac{\sum_{i=1}^{n} y_i}{2\mu^2}\right)$$
and:
$$h(y) = \exp\left(-\frac{1}{2}\sum_{i=1}^{n}\frac{1}{y_i}\right)\left((2\pi)^n\prod_{i=1}^{n} y_i^3\right)^{-1/2}.$$
Hence $\sum_{i=1}^{n} y_i$ is a one-dimensional sufficient statistic.
(b) Note that the likelihood function is proportional to:
$$L(\mu; y) \propto \exp\left(\frac{n}{\mu} - \frac{\sum_{i=1}^{n} y_i}{2\mu^2}\right).$$
Therefore, up to an additive constant, the log-likelihood function can be written as:
$$l(\mu; y) = \frac{n}{\mu} - \frac{\sum_{i=1}^{n} y_i}{2\mu^2}.$$
The score function is:
$$s(\mu; y) = \frac{\partial l(\mu; y)}{\partial\mu} = -\frac{n}{\mu^2} + \frac{\sum_{i=1}^{n} y_i}{\mu^3}.$$
Setting s(µ; y) = 0 gives $\hat{\mu} = \sum_i Y_i/n = \bar{Y}$ as a candidate maximum likelihood estimator.
Since:
$$\left.\frac{\partial^2 l(\mu; y)}{\partial\mu^2}\right|_{\mu=\bar{y}} = \left.\frac{2n}{\mu^3} - \frac{3n\bar{y}}{\mu^4}\right|_{\mu=\bar{y}} = \frac{2n}{\bar{y}^3} - \frac{3n\bar{y}}{\bar{y}^4} = -\frac{n}{\bar{y}^3} < 0$$
we conclude that this is indeed the maximum likelihood estimator.
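Since only the kernel n/µ − Σyᵢ/(2µ²) of the log-likelihood involves µ, the maximiser can
also be located numerically on a grid. A Python sketch (the Gamma-distributed sample
and the seed are arbitrary stand-ins for data; any positive sample would do):

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical positive sample of size 50.
y = rng.gamma(shape=2.0, scale=1.5, size=50)
n, s = len(y), y.sum()

# Kernel of the log-likelihood: n/mu - sum(y) / (2 mu^2), evaluated on a fine grid.
mu_grid = np.linspace(0.1, 10, 100_000)
kernel = n / mu_grid - s / (2 * mu_grid**2)

mu_argmax = mu_grid[np.argmax(kernel)]
print(mu_argmax, y.mean())  # agree to the grid resolution
```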
(c) Direct bookwork: see Example 4.3.5 on page 104 of the subject guide.
(d) We know that for large sample sizes:
$$\hat{\mu}_{ML}\ \overset{a}{\sim}\ N\left(\mu,\ \frac{1}{I(\mu)}\right).$$
In an extra level of approximation we may use:
$$\hat{\mu}_{ML}\ \overset{a}{\sim}\ N\left(\mu,\ \frac{1}{H(\mu \mid y)}\right)$$
where the observed information evaluated at the maximum likelihood estimate is:
$$H(\mu \mid y) = \left.-\frac{\partial^2 l(\mu; y)}{\partial\mu^2}\right|_{\mu=\bar{y}} = \frac{n}{\bar{y}^3}.$$
Let $S_\mu = \bar{y}^3/n$. We can write:
$$\frac{\hat{\mu}-\mu}{\sqrt{S_\mu}}\ \overset{a}{\sim}\ N(0,1).$$
Using the pivotal function technique for constructing confidence intervals, we get the
following asymptotic 100(1 − α)% confidence interval:
$$\left(\hat{\mu} - z_{1-\alpha/2}\sqrt{S_\mu},\ \hat{\mu} + z_{1-\alpha/2}\sqrt{S_\mu}\right).$$
Question 4
(a) State the Rao–Blackwell theorem.
(6 marks)
Reading for this question
Section 3.2 of the subject guide.
Approaching the question
The Rao–Blackwell theorem is stated on page 47 of the subject guide.
(b) Let {Y1, Y2, . . . , Yn} be a random sample from a distribution with the following
probability density function for each Yi (for i = 1, . . . , n):
$$f(y_i;\beta) = \beta y_i^{\beta-1}, \quad \text{for } 0 < y_i < 1,\ \beta > 0$$
and 0 otherwise.
i. Consider the test H0 : β = β0 vs. H1 : β ≠ β0. Find the likelihood ratio test
statistic.
(6 marks)
ii. Use the likelihood ratio test statistic to provide the rejection region of an
asymptotic test at a 100α% significance level, where α ∈ (0, 1).
(5 marks)
iii. Describe how you would obtain an asymptotic p-value for the test in part ii.
Also, provide a graphical representation of the p-value.
(3 marks)
Reading for this question
Section 5.3 of the subject guide.
Approaching the question
i. The likelihood function for this problem is:
$$L(y;\beta) = f(y;\beta) = \prod_{i=1}^{n}\beta y_i^{\beta-1} = \beta^n\left(\prod_{i=1}^{n} y_i\right)^{\beta-1}.$$
The likelihood ratio test statistic, LR, for this test can be written as:
$$LR = \frac{L(y;\hat{\beta})}{L(y;\beta_0)}$$
where β̂ is the maximum likelihood estimator of β. To find β̂, consider the log-likelihood
function, which is:
$$l(y;\beta) = \log L(y;\beta) = n\log\beta + (\beta-1)\sum_{i=1}^{n}\log y_i$$
and the score function:
$$s(y;\beta) = \frac{\partial l(y;\beta)}{\partial\beta} = \frac{n}{\beta} + \sum_{i=1}^{n}\log y_i.$$
Setting s(y; β) = 0 gives $\hat{\beta} = -n/\sum_i\log y_i$ as a candidate maximum likelihood
estimator. Since:
$$\frac{\partial^2 l(y;\beta)}{\partial\beta^2} = -\frac{n}{\beta^2} < 0$$
we conclude that this is indeed the maximum likelihood estimator.
Hence we can write LR as:
$$LR = \frac{\hat{\beta}^n\left(\prod_{i=1}^{n} y_i\right)^{\hat{\beta}-1}}{\beta_0^n\left(\prod_{i=1}^{n} y_i\right)^{\beta_0-1}} = \frac{\left(-n/\sum_i\log y_i\right)^n\left(\prod_{i=1}^{n} y_i\right)^{-n/\sum_i\log y_i - 1}}{\beta_0^n\left(\prod_{i=1}^{n} y_i\right)^{\beta_0-1}}.$$
ii. For large n, the distribution of 2 log LR is approximately $\chi^2_1$ under H0, with large
values of 2 log LR providing evidence against H0. Hence a test of size 100α% will reject
H0 if $2\log LR > \chi^2_{\alpha,1}$, where $\chi^2_{\alpha,1}$ denotes the upper 100α% point of the $\chi^2_1$ distribution.
iii. For the asymptotic test in the previous part, let x denote the observed value of
2 log LR. Using the fact that $2\log LR \sim \chi^2_1$ approximately under H0, we calculate the
following probability:
$$p\text{-value} = P_{H_0}(2\log LR \ge x) = P(\chi^2_1 \ge x).$$
Candidates were also expected to provide a rough sketch of the area corresponding to the
p-value.
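The asymptotic χ²₁ calibration of this test can itself be checked by simulating under H0.
In the Python sketch below (β0 = 1.5, n = 200, the replication count and the seed are
arbitrary illustrative choices), the rejection rate of the nominal 5% test should be close
to 0.05:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(9)
beta0, n, reps, alpha = 1.5, 200, 20_000, 0.05
critical = chi2.ppf(1 - alpha, df=1)

# Under H0, f(y; beta0) = beta0 * y**(beta0 - 1) is Beta(beta0, 1): Y = U**(1/beta0).
y = rng.uniform(size=(reps, n)) ** (1 / beta0)
s = np.log(y).sum(axis=1)   # sum of log(y_i), one value per replication
beta_hat = -n / s           # MLE in each replication

# 2 log LR = 2 [l(beta_hat) - l(beta0)] with l(beta) = n log(beta) + (beta - 1) s.
two_log_lr = 2 * (n * np.log(beta_hat) + (beta_hat - 1) * s
                  - n * np.log(beta0) - (beta0 - 1) * s)

rejection_rate = np.mean(two_log_lr > critical)
print(rejection_rate)  # close to alpha = 0.05
```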