MAST20005/MAST90058 Assignment 1
1. (R) Come up with a dataset with exactly 9 data points $x_1, \dots, x_9$ such that, when displayed with the
boxplot() function in R using the default values for all its arguments, the ends of the “whiskers” exactly
coincide with the first and third quartiles (the “hinges”) defining the interquartile range. Describe your
dataset in words very briefly and produce its boxplot with R. Hint: It may be helpful to look at the help
documentation for the R function boxplot.stats(). [3]
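As a hedged illustration of the hint (not a solution), the sketch below inspects what boxplot.stats() returns for an arbitrary vector. The values in x_demo are made up for illustration and do not satisfy the requirement of the question.

```r
# Hypothetical example data (9 points, one extreme value); not a solution.
x_demo <- c(1, 2, 3, 4, 5, 6, 7, 8, 100)

# boxplot.stats() computes the numbers that boxplot() draws by default.
s <- boxplot.stats(x_demo)

# $stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(s$stats)

# $out: points lying beyond the whiskers, drawn individually as outliers
print(s$out)

# Draw the corresponding boxplot with all default arguments
boxplot(x_demo)
```

Comparing the first and second entries of s$stats, and the fourth and fifth, shows whether a whisker end coincides with its hinge for a given dataset.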
2. (R) For a given natural number $n$ and any $k = 1, \dots, n$, let $c_{\frac{k}{n+1}}$ be the $\frac{k}{n+1}$-th population quantile of a
normal random variable with mean $\mu$ and variance $\sigma^2$. Moreover, let $\Phi(\cdot)$ be the cumulative distribution
function of the standard normal and $\Phi^{-1}(\cdot)$ be its inverse function. Prove rigorously that, when plotting
the duples
$$\left( \Phi^{-1}\left( \frac{k}{n+1} \right),\; c_{\frac{k}{n+1}} \right), \quad k = 1, \dots, n,$$
on the plane, they lie exactly on a straight line with intercept $\mu$ and slope $\sigma$. For $n = 10$, make a
plot of these duples in R with $\mu = 1$ and $\sigma = 3$, and add a straight line joining them. Label your y-axis
“N(1, 9)” and your x-axis “N(0, 1)”. [3]
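A numerical check is not a proof, but it can help confirm the claim before writing one. The sketch below uses qnorm() with hypothetical parameter values (mu_demo, sigma_demo and n_demo are illustrative and deliberately differ from those in the question).

```r
# Hypothetical values chosen for illustration only (not those in the question).
mu_demo    <- 2
sigma_demo <- 0.5
n_demo     <- 5

# Probabilities k / (n + 1) for k = 1, ..., n
p <- (1:n_demo) / (n_demo + 1)

# Standard normal quantiles and N(mu, sigma^2) quantiles at those probabilities
z <- qnorm(p)
q <- qnorm(p, mean = mu_demo, sd = sigma_demo)

# Check numerically that q = mu + sigma * z, up to floating-point tolerance
print(all.equal(q, mu_demo + sigma_demo * z))

# A least-squares line through the points gives the intercept and slope
print(coef(lm(q ~ z)))
```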
3. Assume $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$, where both the mean $\mu$ and variance $\sigma^2$ are unknown. Consider a
general estimator for $\sigma^2$ defined by
$$a \sum_{i=1}^{n} (X_i - \bar{X})^2, \tag{1}$$
where $a$ is a given real number; two specific such examples are the unbiased sample variance and the
maximum likelihood estimator (MLE), with $a$ taking the values $(n-1)^{-1}$ and $n^{-1}$ respectively.
(a) Derive the mean squared error (MSE) of the estimator for $\sigma^2$ in (1), as a function of $a$. [3]
(b) Among all the estimators of the form in (1), find the one that minimizes the MSE. [2]
(c) Does your solution in (b) coincide with the maximum likelihood estimator? If not, does it “refute”
the optimality property of the MLE described in the lectures? (i.e., if your estimator from (b) does
have lower MSE than the MLE and is also asymptotically unbiased, does it contradict that the MLE
is “asymptotically efficient”?) [1]
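An analytical answer to (a) can be sanity-checked by simulation. The following is a minimal Monte Carlo sketch, assuming arbitrary illustrative values for the sample size and parameters; the function name empirical_mse and its defaults are hypothetical and not part of the assignment.

```r
# Monte Carlo estimate of the MSE of a * sum((X_i - mean(X))^2) as an
# estimator of sigma^2. All names and default values here are hypothetical.
empirical_mse <- function(a, n, mu = 0, sigma = 1, reps = 10000) {
  estimates <- replicate(reps, {
    x <- rnorm(n, mean = mu, sd = sigma)
    a * sum((x - mean(x))^2)
  })
  mean((estimates - sigma^2)^2)
}

set.seed(1)
n_demo <- 10

# Compare the two choices of a named in the question
mse_unbiased <- empirical_mse(a = 1 / (n_demo - 1), n = n_demo)  # sample variance
mse_mle      <- empirical_mse(a = 1 / n_demo, n = n_demo)        # MLE
print(c(unbiased = mse_unbiased, mle = mse_mle))
```

Evaluating empirical_mse over a grid of values of a gives a rough curve against which a derived MSE formula can be compared.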
4. Let $X_1, \dots, X_n$ be i.i.d. Be($p$) random variables, i.e. Bernoulli random variables with success probability
$p$. Moreover, assume it is already known that $1/3 \le p \le 2/3$.
(a) Find the maximum likelihood estimator (MLE) for $p$ under this constraint. [3]
(b) (R) For the special case of $n = 1$, i.e. when there is only one data point, also consider the “naive”
estimator $\delta(X_1)$ of $p$ defined by the constant function:
$\delta(x) = 1/2$ for all $0 \le x \le 1$.
With simulations in R, compare the estimator $\delta(X_1)$ against your MLE obtained in (a), by plotting
the empirical mean squared errors (MSEs) of the two estimators against the following true values of $p$:
8/24, 9/24, 10/24, 11/24, 12/24, 13/24, 14/24, 15/24, 16/24.
Use these settings for your plot:
• Use red points for the MSE estimates of the MLE, and blue points for those of the naive estimator.
• Label the x- and y-axes as “p” and “MSE” respectively.
• The y-axis should extend from the minimum value 0 to the maximum value 0.05.
MAST20005/MAST90058 Statistics Assignment 1 Semester 2, 2023
• Use 10000 repeated samples to form the empirical MSEs.
(Moral: In certain finite-sample situations, a principled estimator like the MLE is no better than a
naive estimator.)
[3]
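The simulation layout described above might be sketched as follows for the naive estimator only; the MLE from part (a) is deliberately left out. The variable names are hypothetical, and using 10000 samples for each value of p is an assumption about how the instruction is meant to be read.

```r
# Sketch of the simulation layout for the naive estimator only.
# Variable names are hypothetical; the MLE is intentionally not implemented.
set.seed(1)
p_grid <- (8:16) / 24   # true values of p listed in the question
reps   <- 10000         # simulated samples per value of p (an assumption)

naive_mse <- sapply(p_grid, function(p) {
  x1    <- rbinom(reps, size = 1, prob = p)  # reps independent samples of size n = 1
  delta <- rep(1 / 2, length(x1))            # the naive estimator ignores the data
  mean((delta - p)^2)                        # empirical MSE at this value of p
})

plot(p_grid, naive_mse, col = "blue", pch = 19,
     xlab = "p", ylab = "MSE", ylim = c(0, 0.05))
# Empirical MSEs of the MLE could then be added with points(..., col = "red").
```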
5. A social worker would like to gauge the level of unreported petty crimes committed by teenagers in a
certain city, by conducting a survey with them. Because some of the questions posed can be sensitive, to
encourage honesty from the respondents the technique described below is used:
The social worker shows the following two versions of the same question to an interviewee:
• Version 1: “Have you ever shoplifted?”
• Version 2: “Have you never stolen anything from a store?”
For $r \in (0, 1)$, the interviewee is given a coin that lands heads with probability $r$ and tails with probability $1 - r$.
The interviewee flips the coin and answers version 1 of the question if the coin lands
heads; otherwise, they answer version 2 of the question. The interviewee can only answer
“yes” or “no”. The social worker does not learn the result of the coin flip, and hence does not know which version of
the question the interviewee is responding to. As such, it is assumed that the interviewee will answer the
question honestly. (In reality, one can argue that this assumption of honesty is approximately true: for
any coin used in practice, the probability $r$ should be quite close to 1/2, so the interviewee has little
incentive to lie, as the chance of getting either version of the question is roughly the same.)
Suppose that, in a population of teenagers, the proportion having shoplifted is $p$, and a random sample of
$n$ teenagers is questioned in the way just described. Let $X_i$, $i = 1, \dots, n$, be the answers of the $n$
interviewees, given as
$$X_i = \begin{cases} 1 & \text{if the respondent answers “yes”} \\ 0 & \text{if the respondent answers “no”} \end{cases}$$
Supposing $r$ is known, determine the MLE and the method of moments estimator (MME) of $p$ as functions of the data $\{X_1, \dots, X_n\}$ and $r$.
Hint: consider the cases $r = 1/2$ and $r \neq 1/2$ separately. [4]
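To make the survey mechanism concrete, the sketch below simulates answers under the scheme described above, assuming honest respondents. The function name simulate_answers and the example values of p and r are hypothetical; simulated data of this kind can be used to check a derived estimator.

```r
# Simulate answers under the randomized-response scheme described above.
# The function name and the example values of p and r are hypothetical.
simulate_answers <- function(n, p, r) {
  shoplifted <- rbinom(n, size = 1, prob = p)  # 1 if the teenager has shoplifted
  heads      <- rbinom(n, size = 1, prob = r)  # 1 if the coin lands heads
  # Heads: version 1 is answered, so "yes" means the teenager has shoplifted.
  # Tails: version 2 is answered, so "yes" means the teenager never shoplifted.
  ifelse(heads == 1, shoplifted, 1 - shoplifted)
}

set.seed(1)
x <- simulate_answers(n = 10000, p = 0.2, r = 0.6)
print(mean(x))  # proportion of "yes" answers in the simulated sample
```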
Total marks = 22