3551 Trousdale Pkwy, University Park, Los Angeles, CA
MAS2904 Statistical Inference
Assignment 1, Semester 1 2020/2021
You will be allocated an individual data set, which is unique to you. Only use your own
data. The allocation is in a file “Data.Allocation.2020.csv”. Email the lecturer if your name
does not appear in this file.
Solutions should be submitted via Canvas by 4pm on Friday 6 November.
Solutions should be in the form of a csv file, named “MAS2904.Ex1.XXX.csv”, with “XXX”
replaced by your data set number as three digits, i.e. including leading zeros if necessary, so
“MAS2904.Ex1.001.csv”, …, “MAS2904.Ex1.010.csv”, …, “MAS2904.Ex1.100.csv”, …
A template is provided. Put all your solutions in the second column.
The data relate to the number of deaths due to a particular condition in acute hospital trusts
in two different time periods. You will be provided with a sample of 50 trusts. The variables are:
• Trust. Code number of the trust.
• X1, X2. Measures of trust size in periods 1 and 2.
• N1, N2. Number of deaths in periods 1 and 2.
• E1, E2. Expected number of deaths in periods 1 and 2, adjusted for trust size and
The data are available as csv files at
Once you have downloaded your data, you can read it into R using
with, of course, “XXX” replaced by your data set number. You may need to add a full path
name or use setwd() to set an appropriate working directory. You can also read your data
directly into R using
The bracketed figures below, such as [1dp], specify how many decimal places you should
give your answer to. It is important to keep to this!
When an answer to one question is used in later questions, use in calculations the value
rounded to the number of decimal places specified in the earlier part.
1. Find the sample mean of X1. [1dp] (5 marks)
2. Find the sample standard deviation of X1. [1dp] (5 marks)
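As a sketch of the R calls involved, `mean()` gives the sample mean and `sd()` the sample standard deviation; `sd()` uses the n − 1 divisor, which is the usual sample standard deviation. The vector below holds hypothetical placeholder values, not real trust data.

```r
# Hypothetical placeholder values standing in for the X1 column of your data
x1 <- c(850, 1020, 1175, 930, 1100)

xbar <- round(mean(x1), 1)  # sample mean, rounded to 1dp
s    <- round(sd(x1), 1)    # sample standard deviation (divisor n - 1), rounded to 1dp

print(xbar)
print(s)
```

With your own data you would apply the same calls to the X1 column of your data frame instead of the placeholder vector.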
3. Suppose we have a random variable X ∼ N(µ, σ²). Take µ and σ equal to the values
you found in Q1 and Q2. Find
(a) Pr(X > 1100), [3dp] (5 marks)
(b) Pr(900 < X < 1100). [3dp] (5 marks)
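Once µ and σ are fixed, both probabilities come from the normal CDF, which R provides as `pnorm()`. A minimal sketch, using hypothetical values of µ and σ in place of your own rounded answers; note that `pnorm()` takes the standard deviation σ, not the variance σ².

```r
# Hypothetical values; replace with your rounded answers to Q1 and Q2
mu    <- 1015.0
sigma <- 129.7

# (a) Pr(X > 1100): upper tail of N(mu, sigma^2)
p_a <- pnorm(1100, mean = mu, sd = sigma, lower.tail = FALSE)

# (b) Pr(900 < X < 1100): difference of two CDF values
p_b <- pnorm(1100, mean = mu, sd = sigma) - pnorm(900, mean = mu, sd = sigma)

print(round(p_a, 3))
print(round(p_b, 3))
```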
4. Give the five-number summary of |N2 - E2|, noting that we are interested only in the
absolute value. [2dp] (10 marks)
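A sketch of the calculation in R, with short hypothetical vectors standing in for the N2 and E2 columns: `abs()` takes the absolute differences and `fivenum()` returns the minimum, lower hinge, median, upper hinge and maximum.

```r
# Hypothetical placeholder values for N2 and E2
n2 <- c(12, 30, 7, 22, 15, 9)
e2 <- c(10.5, 26.2, 9.1, 22.4, 18.3, 8.0)

d <- abs(n2 - e2)            # absolute differences |N2 - E2|
print(round(fivenum(d), 2))  # min, lower hinge, median, upper hinge, max
```

`fivenum()` uses Tukey's hinges for the quartiles, whereas `summary()` and `quantile()` use a different default rule, so the quartiles can differ slightly; use whichever definition your lecture notes specify.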
5. Suppose Y is an exponential random variable with expected value equal to the median
of |N2 - E2|. What is the variance of Y? [2dp] (10 marks)
6. Find Pr(Y > 4). [2dp] (10 marks)
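If Y is exponential with mean m, its rate is λ = 1/m, so Var(Y) = 1/λ² = m² and Pr(Y > y) = exp(−λy). A minimal sketch in R, with a hypothetical median in place of your own answer; R parameterises the exponential distribution by its rate, not its mean.

```r
# Hypothetical median of |N2 - E2|; use your own [2dp] answer from Q4
m <- 1.80

lambda <- 1 / m                                       # rate, so that E(Y) = 1 / lambda = m
var_y  <- 1 / lambda^2                                # Var(Y) = 1 / lambda^2 = m^2
p_gt4  <- pexp(4, rate = lambda, lower.tail = FALSE)  # Pr(Y > 4) = exp(-4 / m)

print(round(var_y, 2))
print(round(p_gt4, 2))
```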
7. What proportion of trusts have N1 more than 20% higher than E1, i.e. more than 1.2×E1?
[2dp] (5 marks)
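The proportion is the mean of a logical vector, since R treats TRUE as 1 and FALSE as 0 in arithmetic. A sketch with hypothetical placeholder values for N1 and E1:

```r
# Hypothetical placeholder values for N1 and E1
n1 <- c(25, 14, 40, 9, 18)
e1 <- c(20.0, 13.5, 30.2, 10.1, 14.0)

# TRUE where N1 exceeds E1 by more than 20%; the mean is the proportion of TRUEs
p_hat <- mean(n1 > 1.2 * e1)
print(round(p_hat, 2))
```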
8. Suppose trusts are independent and a further sample of 10 trusts is selected randomly.
Let W be the number of new trusts that have N1 more than 20% higher than E1. Use
your answer to the previous question to estimate
(a) Pr(W = 0), [3dp] (10 marks)
(b) Var(W). [2dp] (10 marks)
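Under the stated independence assumption, and treating the Q7 proportion p as the probability that a new trust satisfies the condition, W ∼ Binomial(10, p), so Pr(W = 0) = (1 − p)^10 and Var(W) = 10p(1 − p). A sketch in R with a hypothetical value of p:

```r
# Hypothetical proportion; use your own [2dp] answer from Q7
p <- 0.60
n <- 10                                 # number of newly sampled trusts

p_w0  <- dbinom(0, size = n, prob = p)  # Pr(W = 0) = (1 - p)^n
var_w <- n * p * (1 - p)                # Var(W) = n p (1 - p)

print(round(p_w0, 3))
print(round(var_w, 2))
```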
9. Suppose X₁, …, X₄ are independent random variables, each with expectation E(Xᵢ) =
2θ − 1 and variance Var(Xᵢ) = θ²/3, where θ is an unknown parameter. Consider the
estimator θ̂ of θ, where
θ̂ = X₁ + 2X₂ − X₃ − 3X₄.
Set θ equal to the sample median of N1. Give the numerical values of:
(a) the expectation of the estimator θ̂, [1dp] (5 marks)
(b) the variance of the estimator θ̂, [2dp] (10 marks)
(c) the bias of the estimator θ̂. [1dp] (10 marks)
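For a linear combination θ̂ = Σ aᵢXᵢ with coefficients a = (1, 2, −1, −3), linearity gives E(θ̂) = (Σ aᵢ)E(X), independence gives Var(θ̂) = (Σ aᵢ²)Var(X), and the bias is E(θ̂) − θ. Here Σ aᵢ = −1 and Σ aᵢ² = 15, so E(θ̂) = 1 − 2θ, Var(θ̂) = 5θ² and the bias is 1 − 3θ. A sketch in R with a hypothetical θ in place of the sample median of N1:

```r
# Hypothetical value of theta; use the sample median of N1 from your own data
theta <- 20

a      <- c(1, 2, -1, -3)  # coefficients of X1, ..., X4 in theta-hat
mean_x <- 2 * theta - 1    # common expectation E(X_i)
var_x  <- theta^2 / 3      # common variance Var(X_i)

e_hat    <- sum(a) * mean_x   # E(theta-hat) by linearity of expectation
var_hat  <- sum(a^2) * var_x  # Var(theta-hat); covariances vanish under independence
bias_hat <- e_hat - theta     # bias = E(theta-hat) - theta

print(c(expectation = e_hat, variance = var_hat, bias = bias_hat))
```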