School of Mathematics
MATH3723 Statistical Theory
Session 2022–2023 (Semester 2)
Practical
• This practical sheet is posted on Minerva on Monday 6 March 2023.
• The timetabled practical session will be held at 16:00–18:00 on Friday 17 March 2023 in the Cohen A/B Cluster (1.40). It is strongly recommended that you attempt the tasks beforehand.
• Your reports are due by 23:59 on Friday 31 March 2023, to be submitted through Gradescope (access on Minerva via Submit My Work). Late submissions are allowed but a penalty will be applied (1 mark per calendar day or part day), so please make sure you meet the deadline.
• Reports must be typed (in LaTeX or Word) and must not exceed 8 printed pages, including appropriate graphical output and (briefly) the R code used. If you exceed this limit, a penalty will be applied (1 mark per page or part page).
• Your reports will be assessed out of 20 marks (up to 15 for content and up to 5 for presentation). The practical is worth 20% of the overall module mark.
Background. The Cauchy distribution is defined by its probability density function
$$f(x) = \frac{1}{\pi(1+x^2)}, \qquad x \in \mathbb{R}.$$
Here, the normalisation $1/\pi$ ensures that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Indeed, recalling that $(\arctan x)' = \frac{1}{1+x^2}$, we have
$$\int_{-\infty}^{\infty} \frac{dx}{1+x^2} = 2\int_{0}^{\infty} \frac{dx}{1+x^2} = 2\lim_{K\to\infty}\int_{0}^{K} \frac{dx}{1+x^2} = 2\lim_{K\to\infty}\arctan K = 2\cdot\frac{\pi}{2} = \pi.$$
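As a quick sanity check (not part of the proof), the normalisation can also be verified numerically in R; note that this density is built into R as dcauchy:
> integrate(function(x) 1/(pi*(1 + x^2)), -Inf, Inf) # should return 1
> integrate(dcauchy, -Inf, Inf) # same check via the built-in density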
Note that the expected value of the Cauchy distribution does not exist, since $\mathrm{E}|X| = \infty$:
$$\int_{-\infty}^{\infty} \frac{|x|}{1+x^2}\,dx = 2\lim_{K\to\infty}\int_{0}^{K} \frac{x}{1+x^2}\,dx = \lim_{K\to\infty}\ln(1+x^2)\Big|_{0}^{K} = \lim_{K\to\infty}\ln(1+K^2) = \infty.$$
However, the median $M$ is well defined and, by symmetry, is given by $M = 0$, because, e.g.,
$$\int_{0}^{\infty} f(x)\,dx = \frac{1}{\pi}\int_{0}^{\infty} \frac{dx}{1+x^2} = \frac{1}{2}.$$
Statistical model. Suppose we have an i.i.d. sample $X = (X_1, \dots, X_n)$ from the shifted Cauchy distribution with density $f(x\,|\,\theta) = f(x - \theta)$ ($\theta \in \mathbb{R}$), that is,
$$f(x\,|\,\theta) = \frac{1}{\pi\,(1 + (x-\theta)^2)}, \qquad x \in \mathbb{R}.$$
We are interested in estimating the location parameter $\theta$. Arranging the observed values $X_1, X_2, \dots, X_n$ in ascending order, consider the order statistics
$$X_{(1)} < X_{(2)} < \cdots < X_{(n)}.$$
(Since the Cauchy distribution is continuous, with probability 1 there are no ties in the observations, so all the inequalities may be presumed to be strict.)
The sample median $M_n$ is defined as the value separating the order statistics into two "equal parts"; more precisely, if $n = 2k+1$ then $M_n := X_{(k+1)}$, so that
$$X_{(1)} < \cdots < X_{(k)} < M_n = X_{(k+1)} < X_{(k+2)} < \cdots < X_{(2k+1)}.$$
If $n = 2k$ then the usual convention is to set $M_n := \frac{1}{2}\big(X_{(k)} + X_{(k+1)}\big)$.
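For instance, with a small (hypothetical) sample of size $n = 5 = 2k+1$, this convention can be checked directly in R:
> X = c(3.1, -0.4, 2.2, 5.0, 1.7)
> sort(X) # order statistics: -0.4 1.7 2.2 3.1 5.0
> median(X) # here k = 2, so M_n = X_(3) = 2.2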
In this practical, you will consider the following four estimators of the parameter $\theta$:
• $\hat\theta_n^{(1)} := \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ (sample mean)
• $\hat\theta_n^{(2)} := M_n$ (sample median)
• $\hat\theta_n^{(3)} := M_n + \frac{2}{n}\cdot\frac{\partial \ell}{\partial \theta}\Big|_{\theta = M_n}$ (modified sample median)
• $\hat\theta_n^{(4)} :=$ maximum likelihood estimator (MLE)
The MLE $\hat\theta_n^{(4)}$ is the maximiser of the log-likelihood, i.e.
$$\ell\big(\hat\theta_n^{(4)}\,\big|\,X\big) = \max_{\theta\in\mathbb{R}} \ell(\theta\,|\,X),$$
where
$$\ell(\theta\,|\,X) = \ln f(X\,|\,\theta) = -n\ln\pi - \sum_{i=1}^{n} \ln\big(1 + (X_i - \theta)^2\big).$$
The coefficient $2/n$ in the estimator $\hat\theta_n^{(3)}$ (modified sample median) is explained by Theorem 8.3 in the lecture notes (see formula (8.23)) and by the fact that Fisher's information in this model is given by (see Appendix 1 below)
$$I_n(\theta) = \mathrm{E}_\theta\big[(\ell'_\theta(\theta\,|\,X))^2\big] = \frac{n}{2}.$$
A derivation of this formula is provided in the handout attached to the practical task.
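To make the four estimators concrete, here is a minimal R sketch computing them for a single simulated sample; the true value theta = 2, the sample size n, and the helper names score and neglogL are illustrative choices, and the formulas simply transcribe the definitions above:
> theta = 2; n = 100
> X = rcauchy(n, location=theta, scale=1)
> score = function(t) 2*sum((X - t)/(1 + (X - t)^2)) # the derivative l'_theta(t|X)
> neglogL = function(t) sum(log(1 + (X - t)^2)) # -l(t|X), dropping the constant n*log(pi)
> theta1 = mean(X) # sample mean
> theta2 = median(X) # sample median
> theta3 = theta2 + (2/n)*score(theta2) # modified sample median
> theta4 = optim(theta2, neglogL, method="BFGS")$par # MLE (optim minimises)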
Task. The objective of the practical is to explore and compare the asymptotic properties of the four estimators above (where analytical calculations may not be possible).
To this end, use computer simulations to verify whether these estimators are consistent (i.e., $\hat\theta_n$ approaches the true value $\theta$ as $n \to \infty$) and also to assess their accuracy by evaluating, for different values of the sample size $n$, their mean square error
$$\mathrm{MSE}_\theta(\hat\theta_n) = \mathrm{E}_\theta\big[(\hat\theta_n - \theta)^2\big]$$
and the coverage probability $P_\theta(|\hat\theta_n - \theta| \le \varepsilon)$, say for $\varepsilon = 0.1$.
More specific guidelines:
1. Statistical software R should be used for simulations and statistical computations, including graphics. See http://www.maths.leeds.ac.uk/school/students/computing.html for some useful documentation on R. More help is available within R itself by clicking on "Help" in the menu and choosing "R functions" or "Html help". In particular, you might find it useful to familiarise yourself with the entries Cauchy, plot, function, mean, median and sum, and to practise with these commands before starting your practical work.
2. Fix a certain value of $\theta$ and simulate your Cauchy samples using the command rcauchy.
3. To visually verify the consistency of an estimator $\hat\theta_n = \hat\theta(X_1, \dots, X_n)$, one method is to sample the values $X_1, X_2, \dots, X_n$ sequentially and to plot the sequence of resulting values $\hat\theta_n$ as a (random) function of $n \in \mathbb{N}$ (a sketch of such a plot is given after this list). What behaviour of such a plot would you expect for a consistent estimator?
4. When assessing the quality of the estimators, the sequential approach may be too computationally demanding, so it is recommended to confine yourself to some representative (increasing) values of the sample size $n$, say $n = 10, 100, 500, 1000, \dots$
5. To evaluate numerically the mean square error and the coverage probability (for a given $n$), make use of the Monte Carlo method (based on the Law of Large Numbers), according to which the expected value $\mathrm{E}(Y)$ of a random variable $Y$ can be approximated by the sample mean $\bar{Y}_m$ of a (large) i.i.d. sample $Y_1, \dots, Y_m$ from this distribution:
$$\mathrm{E}(Y) \approx \bar{Y}_m = \frac{Y_1 + \cdots + Y_m}{m}.$$
In practical applications, the number of replicates $m$ should be large enough that any two different estimation runs yield reasonably close approximate values. (A sketch of this approach is given after this list.)
6. To illustrate the maximum likelihood estimation (MLE) method, plot the log-likelihood function $\ell(\theta\,|\,X)$ as a function of the parameter $\theta$ for several different values of $n$ (with the corresponding sample values $X_1, \dots, X_n$ fixed). This can be done using the R function plot, but first you will have to define the function $\ell(\theta\,|\,X)$ using suitable R commands (see the sketch after this list). Is there always a unique root of the likelihood equation $\ell'_\theta(\theta\,|\,X) = 0$?
7. To calculate the MLE numerically, it is recommended to use the R command optim; note, however, that it minimises a given function, so apply it to the negative log-likelihood (again, see the sketch after this list).
8. Summarise your findings by drawing conclusions and making recommendations.
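To illustrate guideline 3, here is a minimal sketch of the sequential consistency plot for the sample median, assuming the true value theta = 2 (it adapts the hints of Appendix 2 to the Cauchy model); for a consistent estimator the plotted path should settle down near the horizontal line at theta:
> theta = 2; n = 1000
> X = rcauchy(n, location=theta, scale=1)
> M2 = sapply(1:n, function(k) median(X[1:k])) # median of X[1],...,X[k]
> plot(M2, xlab="k", ylab="sample median", type="l")
> abline(h=theta) # the true value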
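For guideline 5, a sketch of the Monte Carlo estimates of the MSE and the coverage probability for the sample median; the values of theta, n, m and eps are illustrative choices:
> theta = 2; n = 100; m = 1000; eps = 0.1
> Theta2 = replicate(m, median(rcauchy(n, location=theta)))
> MSE2 = mean((Theta2 - theta)^2) # Monte Carlo estimate of the MSE
> COV2 = mean(abs(Theta2 - theta) <= eps) # estimate of P(|theta_hat - theta| <= eps)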
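For guidelines 6 and 7, a sketch plotting the Cauchy log-likelihood over a grid and locating its maximum with optim; the grid limits and the starting point (the sample median) are arbitrary choices:
> theta = 2; n = 100
> X = rcauchy(n, location=theta)
> logL = function(t) -n*log(pi) - sum(log(1 + (X - t)^2))
> tgrid = seq(-10, 10, by=0.01)
> plot(tgrid, sapply(tgrid, logL), xlab="theta", ylab="log-likelihood", type="l")
> mle = optim(median(X), function(t) -logL(t), method="BFGS")$par # minimise -logL
> abline(v=mle) # mark the maximiser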
Appendix 1: Fisher's information
Suppose $X = (X_1, \dots, X_n)$ is an i.i.d. sample from the shifted Cauchy distribution with density
$$f(x\,|\,\theta) = \frac{1}{\pi\,(1 + (x-\theta)^2)}, \qquad x \in \mathbb{R}. \qquad (1)$$
The likelihood is given by
$$L(\theta\,|\,X) = f(X\,|\,\theta) = \prod_{i=1}^{n} f(X_i\,|\,\theta) = \frac{1}{\pi^n}\prod_{i=1}^{n} \frac{1}{1 + (X_i - \theta)^2}.$$
Hence, the log-likelihood and its partial derivative are easily calculated:
$$\ell(\theta\,|\,X) = \ln L(\theta\,|\,X) = -n\ln\pi - \sum_{i=1}^{n} \ln\big(1 + (X_i - \theta)^2\big)$$
and
$$\ell'_\theta(\theta\,|\,X) = 2\sum_{i=1}^{n} \frac{X_i - \theta}{1 + (X_i - \theta)^2}.$$
Our aim is to calculate Fisher's information in this model by showing that
$$I_n(\theta) = \frac{n}{2}. \qquad (2)$$
The fact that the information does not depend on the parameter $\theta$ should not be surprising in view of the shift nature of the distributional family (1).
By definition,
$$I_n(\theta) = \mathrm{E}_\theta\big[(\ell'_\theta)^2\big].$$
However, it is more convenient to use the information's additivity, $I_n(\theta) = n I(\theta)$ (see the lecture notes, Section 4.3, Theorem 4.4), where $I(\theta)$ is the information per observation,
$$I(\theta) = \mathrm{E}_\theta\big[(\ell'_\theta(\theta\,|\,X_i))^2\big] = \int_{-\infty}^{\infty} \left(\frac{2(x-\theta)}{1 + (x-\theta)^2}\right)^{2} \frac{dx}{\pi\,(1 + (x-\theta)^2)}.$$
By a simple shift of the variable, $x \mapsto x - \theta$, this integral is rewritten as
$$I(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx.$$
Thus, we must evaluate the last integral. Namely, we have the following
Proposition.
$$\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx = \frac{\pi}{8}. \qquad (3)$$
Proof. Integrate in (3) by parts:
$$\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx = -\frac{1}{4}\int_{-\infty}^{\infty} x\, d\!\left(\frac{1}{(1+x^2)^2}\right) = -\underbrace{\frac{1}{4}\left.\frac{x}{(1+x^2)^2}\right|_{-\infty}^{\infty}}_{=0} + \frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2} = \frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2}.$$
Furthermore, decomposing and again integrating by parts:
$$\frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^2} = \frac{1}{4}\int_{-\infty}^{\infty} \frac{(1+x^2) - x^2}{(1+x^2)^2}\,dx = \frac{1}{4}\int_{-\infty}^{\infty} \frac{dx}{1+x^2} - \frac{1}{4}\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^2}\,dx$$
$$= \frac{\pi}{4} + \frac{1}{8}\int_{-\infty}^{\infty} x\, d\!\left(\frac{1}{1+x^2}\right) = \frac{\pi}{4} - \frac{1}{8}\int_{-\infty}^{\infty} \frac{dx}{1+x^2} = \frac{\pi}{4} - \frac{\pi}{8} = \frac{\pi}{8},$$
and (3) is proved.
Alternatively, using the substitution $x = \tan t$ ($-\frac{\pi}{2} < t < \frac{\pi}{2}$) and noting that
$$dx = \frac{dt}{\cos^2 t}, \qquad 1 + x^2 = \frac{1}{\cos^2 t},$$
we obtain
$$\int_{-\infty}^{\infty} \frac{x^2}{(1+x^2)^3}\,dx = \int_{-\pi/2}^{\pi/2} \frac{\tan^2 t}{1/\cos^6 t}\cdot\frac{dt}{\cos^2 t} = 2\int_{0}^{\pi/2} \sin^2 t\,\cos^2 t\,dt \qquad [\,2\cos t\sin t = \sin 2t\,]$$
$$= \frac{1}{2}\int_{0}^{\pi/2} \sin^2(2t)\,dt \qquad [\,\text{change of variables } u = 2t\,]$$
$$= \frac{1}{4}\int_{0}^{\pi} \sin^2 u\,du \qquad [\,2\sin^2 u = 1 - \cos 2u\,]$$
$$= \frac{1}{8}\int_{0}^{\pi} (1 - \cos 2u)\,du = \frac{1}{8}(\pi - 0) = \frac{\pi}{8},$$
as required.
Combining, $I(\theta) = \frac{4}{\pi}\cdot\frac{\pi}{8} = \frac{1}{2}$, and hence $I_n(\theta) = n\,I(\theta) = \frac{n}{2}$, which proves (2).
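The value $\pi/8$ in (3) can also be double-checked numerically in R (a sanity check only, not part of the proof):
> integrate(function(x) x^2/(1 + x^2)^3, -Inf, Inf) # approx 0.3927
> pi/8 # 0.3926991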
Appendix 2: Various hints on using R
##### Simulations:
> n = 10 # sample size
> theta = 2 # shift parameter
> X = rnorm(n, mean=theta, sd=1) # generating a random (normal) sample
# for Cauchy distribution, use rcauchy(n, location, scale)
> hist(X) # plotting the histogram of the data
> theta1 = mean(X) # estimation by the sample mean
> theta2 = median(X) # estimation by the median
##### How to plot the estimates dynamically? (consistency)
# This is simple for the mean:
> M1 = cumsum(X)/(1:n) # computes the sample mean of subsamples
# (X[1],...,X[k]) for k=1,...,n
> plot(M1, xlab="k", ylab="Estimator I", type="l") # plot of M1 ("l" = lines)
> abline(h=theta) # to compare with the true value
##### How to do the same for the median?
> Median = function(X) {z = vector(length=length(X));
for(k in seq_along(X)) z[k] = median(X[1:k]); z}
> M2 = Median(X)
> plot(M2, xlab="k", ylab="Estimator II", type="l")
> abline(h=theta)
##### How to estimate the MSE?
> m = 1000 # number of replicates
> Theta1 = vector(length=m) # allocating space for the results
> for(i in 1:m) {X = rnorm(n, mean=theta, sd=1); Theta1[i] = mean(X)}
# simulating m samples and computing the corresponding estimates of theta
> hist(Theta1) # plotting the histogram of Theta1
> MSE1 = mean((Theta1-theta)^2) # estimating the MSE of the estimator theta1
> MSE1 # print MSE1
##### How to define the log-likelihood (for a given sample X) and plot it?
# (Here for the normal example above, up to additive and multiplicative constants.)
> logL = function(t) {ell = 0; for (j in 1:n) ell = ell - (X[j]-t)^2; ell}
> plot(logL, from=-20, to=20, xlab="theta", ylab="log-likelihood")
##### How to find the maximum of a function?
> theta = 2
> n = 100
> X = rnorm(n, mean=theta, sd=1)
> f = function(theta) sum((X-theta)^2)
> a = optim(0, f) # 0 = initial value for search (of min f)
> a$par # point of minimum
> a$value # minimum value
##### NOTES
# 1. ’optim’ searches for the MINIMUM value of the function.
# 2. The exact answer in the example above: a = mean(X).
# 3. There will be a warning message that the default method is unreliable;
# indeed, accuracy may not be great.
# 4. Experiment further by specifying the method, e.g. "Brent", to see the
# difference; but you will have to specify the search range, e.g.
# > a = optim(0, f, method="Brent", lower=-10, upper=10)
# 5. Alternatively, you can use the function ’optimize’ instead of ’optim’;
# this also needs specification of the range.