T1 2023 Assignment 1
Date: 2024-03-24
MATH5905, T1 2023 Assignment One Solutions Statistical Inference
UNSW Sydney
Department of Statistics
Term 1, 2024
MATH5905 - Statistical Inference
Assignment 1 Solutions
Problem One
A single die is tossed; then n coins are tossed where n is the number shown on the die. Show that the
probability of exactly two heads is close to 0.2578.
Solution: Let $U_1, U_2, \dots, U_6$ denote the events "uppermost face shows 1, 2, ..., 6" respectively, and let $A$ denote the event in question.
We have $P(U_1) = \dots = P(U_6) = \frac{1}{6}$. The formula of total probability gives
$$P(A) = P(A|U_1)P(U_1) + P(A|U_2)P(U_2) + \dots + P(A|U_6)P(U_6) = \frac{1}{6}\left(P(A|U_1) + \dots + P(A|U_6)\right).$$
Obviously $P(A|U_1) = 0$ holds, since a single coin cannot show two heads. For the remaining conditional probabilities, treating 'Head' as a success and 'Tail' as a failure, we need the probability of exactly two successes in $i$ independent Bernoulli trials. All outcomes are equally likely, so this probability is the ratio of the number of favourable outcomes to the total number of outcomes. The total number of outcomes is $2^i$ and the number of favourable ones is $\binom{i}{2}$, that is,
$$P(A|U_i) = \frac{\binom{i}{2}}{2^i}, \quad i = 2, 3, 4, 5, 6.$$
Hence we get
$$P(A) = \frac{1}{6}\left(0 + \frac{1}{4} + \frac{3}{8} + \frac{6}{16} + \frac{10}{32} + \frac{15}{64}\right) = \frac{33}{128} \approx 0.2578.$$
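The total-probability calculation above can be checked with exact rational arithmetic. The following is a minimal sketch in Python, assuming a fair die and fair coins as in the solution; the function and variable names are illustrative only.

```python
from fractions import Fraction
from math import comb


def prob_two_heads_given_n(n: int) -> Fraction:
    """P(exactly two heads in n fair coin tosses) = C(n, 2) / 2^n."""
    return Fraction(comb(n, 2), 2 ** n)


# Law of total probability over the six equally likely die faces.
prob_two_heads = sum(
    Fraction(1, 6) * prob_two_heads_given_n(n) for n in range(1, 7)
)

print(prob_two_heads, float(prob_two_heads))
```

Using `Fraction` avoids rounding, so the result can be compared directly with 33/128.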
Problem Two
A certain river floods every year. Suppose that the low-water mark is set at 1 and the high-water mark $X$ has distribution function
$$F_X(x) = P(X \le x) = 1 - \frac{1}{x^2}, \quad 1 \le x < \infty.$$
1. Verify that $F_X(x)$ is a cumulative distribution function.
2. Find the density $f_X(x)$ (specify it on the whole real axis).
3. If the (same) low-water mark is reset at 0 and we use a unit of measurement that is $\frac{1}{10}$ of that used previously, express the random variable $Z$ for the new measurement as a function of $X$. Then find precisely the cumulative distribution function and the density of $Z$.
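Solution: a) Obviously $\lim_{x \to -\infty} F_X(x) = 0$ (as $F_X(x)$ is constantly 0 for $x \le 1$). Also,
$$\lim_{x \to \infty} F_X(x) = \lim_{x \to \infty}\left(1 - \frac{1}{x^2}\right) = 1.$$
For $x > 1$, $\frac{d}{dx}F_X(x) = 2/x^3 > 0$, which implies that $F_X(x)$ is increasing there. So, for all $x$ on the real axis, $F_X(x)$ is non-decreasing. Finally, $F_X$ is continuous (both pieces equal 0 at $x = 1$) and hence right-continuous.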
b) The density as a derivative was calculated in a) already for $x > 1$, and it is zero for $x \le 1$:
$$f_X(x) = \begin{cases} 2/x^3, & x > 1, \\ 0, & x \le 1. \end{cases}$$
c) The new measurement is $Z = 10(X - 1)$. Then $F_Z(z) = P(Z \le z) = P(10(X - 1) \le z) = P(X \le z/10 + 1) = F_X(z/10 + 1)$. Hence
$$F_Z(z) = 1 - \frac{1}{(z/10 + 1)^2}$$
when $z > 0$ (and zero else).
The density is obtained via differentiation of the cdf, leading to
$$f_Z(z) = \frac{1}{5(z/10 + 1)^3}, \quad z > 0$$
(and zero else).
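As a sanity check of part c), the density can be compared with a numerical derivative of the cdf. This is only a sketch: the evaluation points and the step size are arbitrary choices, not part of the assignment.

```python
def cdf_z(z: float) -> float:
    """F_Z(z) = 1 - 1/(z/10 + 1)^2 for z > 0, and 0 otherwise."""
    return 1.0 - 1.0 / (z / 10.0 + 1.0) ** 2 if z > 0 else 0.0


def pdf_z(z: float) -> float:
    """f_Z(z) = 1 / (5 (z/10 + 1)^3) for z > 0, and 0 otherwise."""
    return 1.0 / (5.0 * (z / 10.0 + 1.0) ** 3) if z > 0 else 0.0


def central_difference(f, z: float, h: float = 1e-5) -> float:
    """Symmetric finite-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2.0 * h)


# Largest discrepancy between the numerical derivative of the cdf and the density.
max_gap = max(
    abs(central_difference(cdf_z, z) - pdf_z(z)) for z in (0.5, 1.0, 5.0, 10.0, 50.0)
)
print(max_gap)
```

A small `max_gap` supports the claimed density; it does not replace the analytic differentiation.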
Problem Three
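Suppose $X = (X_1, \dots, X_n)$ are i.i.d. Poisson($\theta$) with probability mass function
$$f(x, \theta) = \frac{e^{-\theta}\theta^x}{x!}, \quad x \in \{0, 1, 2, \dots\}, \ \theta > 0.$$
The prior on the unknown parameter $\theta$ is assumed to be a Gamma($\alpha, \beta$) distribution with density
$$\tau(\theta) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\theta^{\alpha - 1}e^{-\theta/\beta}, \quad \alpha, \beta > 0, \ \theta > 0.$$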
a) Find the posterior distribution for θ.
b) Hence or otherwise determine the Bayes estimator of $\theta$ with respect to the quadratic loss function $L(a, \theta) = (a - \theta)^2$.
c) Suppose the following ten observations were observed:
4, 0, 1, 0, 0, 2, 3, 1, 1, 0.
Using a zero-one loss with the parameters $\alpha = 2$ and $\beta = 1$ for the prior, what is your decision when testing $H_0: \theta \le 1.2$ versus $H_1: \theta > 1.2$? (You may use the integrate function in R or another numerical integration routine from your favourite programming package to answer the question.)
Solution:
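a) We use the fact that the posterior is proportional to the product of the likelihood and the prior:
$$p(\theta|X) \propto L(X|\theta)\tau(\theta) = \frac{e^{-n\theta}\theta^{\sum_{i=1}^n X_i}}{\prod_{i=1}^n X_i!} \cdot \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\theta^{\alpha - 1}e^{-\theta/\beta} \propto \theta^{\alpha + \sum_{i=1}^n X_i - 1}e^{-\theta\left(n + \frac{1}{\beta}\right)}.$$
Hence we recognise this as a gamma density with parameters
$$\tilde{\alpha} = \alpha + \sum_{i=1}^n X_i \quad \text{and} \quad \tilde{\beta} = \frac{1}{n + \frac{1}{\beta}} = \frac{\beta}{n\beta + 1}.$$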
b) From lectures, the Bayes estimator with respect to quadratic loss is the posterior mean given the sample:
$$\hat{\theta}_{\text{bayes}} = E(\theta|X) = \tilde{\alpha}\tilde{\beta} = \left(\alpha + \sum_{i=1}^n X_i\right)\frac{\beta}{n\beta + 1}.$$
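The conjugate update and the posterior mean can be illustrated on the ten observations listed in part c) of the problem, with the prior parameters $\alpha = 2$, $\beta = 1$ given there. This is a minimal sketch; the helper name `gamma_posterior` is hypothetical.

```python
from fractions import Fraction


def gamma_posterior(alpha, beta, data):
    """Posterior (shape, scale) for a Gamma(alpha, beta) prior and Poisson data.

    shape = alpha + sum(data), scale = beta / (n * beta + 1).
    """
    n = len(data)
    shape = alpha + sum(data)
    scale = Fraction(beta) / (n * Fraction(beta) + 1)
    return shape, scale


observations = [4, 0, 1, 0, 0, 2, 3, 1, 1, 0]
alpha_post, beta_post = gamma_posterior(2, 1, observations)

# Bayes estimator under quadratic loss: the posterior mean shape * scale.
bayes_estimate = alpha_post * beta_post
print(alpha_post, beta_post, bayes_estimate)
```

For these data the posterior mean is 14/11, slightly above the sample mean 1.2 because the prior mean $\alpha\beta = 2$ pulls the estimate upwards.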
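c) The structure of the test is as follows:
$$\varphi = \begin{cases} 1 & \text{if } P(\theta < 1.2|X) < 0.5, \\ 0 & \text{if } P(\theta < 1.2|X) \ge 0.5, \end{cases}$$
where $\varphi = 1$ means that $H_0$ is rejected. We have $n = 10$ and $\sum_{i=1}^{10} x_i = 12$, so the posterior distribution given the sample is
$$\theta|X \sim \text{Gamma}\left(2 + 12, \frac{1}{10 \times 1 + 1}\right) = \text{Gamma}(14, 1/11).$$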
Then using R we can compute the posterior probability under these conditions as
$$P(\theta < 1.2|X) = \int_0^{1.2} \frac{1}{\Gamma(14)(1/11)^{14}}\theta^{13}e^{-11\theta}\,d\theta \approx 0.44893.$$
Hence, since the posterior probability of $H_0$ is smaller than 0.5, we reject $H_0$.
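The solution uses R for this integral. As an alternative illustration, here is a Python sketch that evaluates the same probability in two ways: through the closed form available for a gamma distribution with integer shape, and through a hand-written composite Simpson rule. Both helper functions are assumptions of this sketch and are not the routine used in the original solution.

```python
from math import exp, factorial


def gamma_cdf_integer_shape(t: float, shape: int, rate: float) -> float:
    """P(G <= t) for G ~ Gamma with integer shape and the given rate.

    Uses P(G <= t) = 1 - sum_{j < shape} exp(-rate*t) (rate*t)^j / j!.
    """
    m = rate * t
    return 1.0 - sum(exp(-m) * m ** j / factorial(j) for j in range(shape))


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def posterior_density(theta: float) -> float:
    """Gamma(14, scale 1/11) density; Gamma(14) = 13!."""
    return 11.0 ** 14 / factorial(13) * theta ** 13 * exp(-11.0 * theta)


p_closed = gamma_cdf_integer_shape(1.2, 14, 11.0)
p_numeric = simpson(posterior_density, 0.0, 1.2)
reject_h0 = p_closed < 0.5  # zero-one loss: reject when H0 is less probable
print(p_closed, p_numeric, reject_h0)
```

The two values should agree closely with each other and with the figure quoted above, which leads to the same decision to reject $H_0$.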
Problem Four
The Premier of NSW has to take an important decision whether or not to include an additional line
(extension) to Sydney’s metro system. The decision is based on the financial viability of the extension.
He is willing to apply decision theory in making his decision. He uses the independent opinion of two
consulting experts. The data he uses is the number X of viability recommendations of the two experts
(so X = 0, 1 or 2).
If the Premier decides not to go ahead and the extension turns out not to be financially viable, or if he decides to go ahead and the project is financially viable, nothing is lost. If the project is not financially viable and he has decided to go ahead, his subjective judgement is that his loss would equal three times the loss incurred when he does not go ahead although the project is financially viable.
The Premier has investigated the history of viability predictions of the two consulting experts and it is as follows. When a project is financially viable, each expert has correctly predicted its viability with probability 4/5 (and wrongly with probability 1/5). When a project has not been financially viable, each expert has nevertheless predicted it to be viable with probability 3/5. The Premier listens to the recommendations of the two experts and makes his decision based on the value of X.
a) There are two possible actions in the action space A = {a0, a1} where action a0 is to go ahead
and action a1 is not to go ahead with the extension. There are two states of nature Θ = {θ0, θ1}
where θ0 = 0 represents “Extension is financially viable” and θ1 = 1 represents “Extension is not
financially viable”. Define the appropriate loss function L(θ, a) for this problem.
b) Compute the probability mass function (pmf) for X under both states of nature.
c) The complete list of all the non-randomized decision rules D based on x is given by:
d1 d2 d3 d4 d5 d6 d7 d8
x = 0 a0 a1 a0 a1 a0 a1 a0 a1
x = 1 a0 a0 a1 a1 a0 a0 a1 a1
x = 2 a0 a0 a0 a0 a1 a1 a1 a1
For the set of non-randomized decision rules D compute the corresponding risk points.
d) Find the minimax rule(s) among the non-randomized rules in D.
e) Sketch the risk set of all randomized rules $\mathcal{D}$ generated by the set of rules in $D$. You might want to use R (or your favorite programming language) to make this sketch more precise.
f) Suppose there are two decision rules $d$ and $d'$. The decision rule $d$ strictly dominates $d'$ if $R(\theta, d) \le R(\theta, d')$ for all values of $\theta$ and $R(\theta, d) < R(\theta, d')$ for at least one value of $\theta$. Hence, given a choice between $d$ and $d'$ we would always prefer to use $d$. Any decision rule that is strictly dominated by another decision rule (as $d'$ is in the above) is said to be inadmissible. Correspondingly, if a decision rule $d$ is not strictly dominated by any other decision rule then it is admissible. Show on the risk plot the set of randomized decision rules that correspond to the admissible decision rules.
g) Find the risk point of the minimax rule in the set of randomized decision rules $\mathcal{D}$ and determine its minimax risk. Compare the two minimax risks of the minimax decision rule in $D$ and in $\mathcal{D}$. Comment.
h) Define the minimax rule in the set $\mathcal{D}$ in terms of rules in $D$.
i) For which prior on $\{\theta_0, \theta_1\}$ is the minimax rule in the set $\mathcal{D}$ also a Bayes rule?
j) Prior to listening to the two experts, the Premier’s belief in the viability is 50%. Find the Bayes
rule and the Bayes risk with respect to his prior.
k) For a small positive $\epsilon = 0.1$, illustrate on the risk set the risk points of all rules which are $\epsilon$-minimax.
Solution: a) There are two actions: $a_0$: decide to go ahead, and $a_1$: do not go ahead. There are two states of nature: $\theta_0 = 0$, "Extension is financially viable", and $\theta_1 = 1$, "Extension is not financially viable". Let $x$ be the number of experts predicting financial viability. The loss function is given by
$$L(\theta_1, a_0) = 3, \quad L(\theta_1, a_1) = 0, \quad L(\theta_0, a_0) = 0, \quad L(\theta_0, a_1) = 1.$$
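b) The two experts give independent opinions, so $X$ is binomial with 2 trials and success probability 4/5 under $\theta_0$ and 3/5 under $\theta_1$. The pmf for both states of nature:
x    p(x|θ = 0)                 p(x|θ = 1)
0    1/5 · 1/5 = 1/25           2/5 · 2/5 = 4/25
1    2 · 1/5 · 4/5 = 8/25       2 · 3/5 · 2/5 = 12/25
2    4/5 · 4/5 = 16/25          3/5 · 3/5 = 9/25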
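The two columns of the pmf table can be reproduced from the binomial formula. The sketch below assumes, as the solution does, that the two experts vote independently with the same probability; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def recommendation_pmf(p_viable_vote: Fraction, n_experts: int = 2) -> list:
    """Binomial pmf of the number of 'viable' recommendations, x = 0..n_experts."""
    p = Fraction(p_viable_vote)
    return [
        comb(n_experts, x) * p ** x * (1 - p) ** (n_experts - x)
        for x in range(n_experts + 1)
    ]


pmf_viable = recommendation_pmf(Fraction(4, 5))      # state theta_0
pmf_not_viable = recommendation_pmf(Fraction(3, 5))  # state theta_1
print(pmf_viable, pmf_not_viable)
```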
c) There are $2^3 = 8$ non-randomized decision rules:
d1 d2 d3 d4 d5 d6 d7 d8
x = 0 a0 a1 a0 a1 a0 a1 a0 a1
x = 1 a0 a0 a1 a1 a0 a0 a1 a1
x = 2 a0 a0 a0 a0 a1 a1 a1 a1
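Calculation of the risk points $(R(\theta_0, d_i), R(\theta_1, d_i))$ is as follows.
For $d_1$, with risk point $(0, 3)$:
$$R(\theta_0, d_1) = L(\theta_0, a_0)P(x = 0|\theta_0) + L(\theta_0, a_0)P(x = 1|\theta_0) + L(\theta_0, a_0)P(x = 2|\theta_0) = 0$$
as $L(\theta_0, a_0) = 0$, and
$$R(\theta_1, d_1) = L(\theta_1, a_0)P(x = 0|\theta_1) + L(\theta_1, a_0)P(x = 1|\theta_1) + L(\theta_1, a_0)P(x = 2|\theta_1) = 3$$
as $L(\theta_1, a_0) = 3$.
For $d_2$, with risk point $(1/25, 63/25)$:
$$R(\theta_0, d_2) = L(\theta_0, a_1)P(x = 0|\theta_0) + L(\theta_0, a_0)P(x = 1|\theta_0) + L(\theta_0, a_0)P(x = 2|\theta_0) = 1 \times \frac{1}{25} + 0 \times \frac{8}{25} + 0 \times \frac{16}{25} = \frac{1}{25},$$
$$R(\theta_1, d_2) = L(\theta_1, a_1)P(x = 0|\theta_1) + L(\theta_1, a_0)P(x = 1|\theta_1) + L(\theta_1, a_0)P(x = 2|\theta_1) = 0 \times \frac{4}{25} + 3 \times \frac{12}{25} + 3 \times \frac{9}{25} = \frac{63}{25}.$$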
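For $d_3$, with risk point $(8/25, 39/25)$:
$$R(\theta_0, d_3) = L(\theta_0, a_0)P(x = 0|\theta_0) + L(\theta_0, a_1)P(x = 1|\theta_0) + L(\theta_0, a_0)P(x = 2|\theta_0) = 0 \times \frac{1}{25} + 1 \times \frac{8}{25} + 0 \times \frac{16}{25} = \frac{8}{25},$$
$$R(\theta_1, d_3) = L(\theta_1, a_0)P(x = 0|\theta_1) + L(\theta_1, a_1)P(x = 1|\theta_1) + L(\theta_1, a_0)P(x = 2|\theta_1) = 3 \times \frac{4}{25} + 0 \times \frac{12}{25} + 3 \times \frac{9}{25} = \frac{39}{25}.$$
For $d_4$, with risk point $(9/25, 27/25)$:
$$R(\theta_0, d_4) = L(\theta_0, a_1)P(x = 0|\theta_0) + L(\theta_0, a_1)P(x = 1|\theta_0) + L(\theta_0, a_0)P(x = 2|\theta_0) = 1 \times \frac{1}{25} + 1 \times \frac{8}{25} + 0 \times \frac{16}{25} = \frac{9}{25},$$
$$R(\theta_1, d_4) = L(\theta_1, a_1)P(x = 0|\theta_1) + L(\theta_1, a_1)P(x = 1|\theta_1) + L(\theta_1, a_0)P(x = 2|\theta_1) = 0 \times \frac{4}{25} + 0 \times \frac{12}{25} + 3 \times \frac{9}{25} = \frac{27}{25}.$$
For $d_5$, with risk point $(16/25, 48/25)$:
$$R(\theta_0, d_5) = L(\theta_0, a_0)P(x = 0|\theta_0) + L(\theta_0, a_0)P(x = 1|\theta_0) + L(\theta_0, a_1)P(x = 2|\theta_0) = 0 \times \frac{1}{25} + 0 \times \frac{8}{25} + 1 \times \frac{16}{25} = \frac{16}{25},$$
$$R(\theta_1, d_5) = L(\theta_1, a_0)P(x = 0|\theta_1) + L(\theta_1, a_0)P(x = 1|\theta_1) + L(\theta_1, a_1)P(x = 2|\theta_1) = 3 \times \frac{4}{25} + 3 \times \frac{12}{25} + 0 \times \frac{9}{25} = \frac{48}{25}.$$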
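For $d_6$, with risk point $(17/25, 36/25)$:
$$R(\theta_0, d_6) = L(\theta_0, a_1)P(x = 0|\theta_0) + L(\theta_0, a_0)P(x = 1|\theta_0) + L(\theta_0, a_1)P(x = 2|\theta_0) = 1 \times \frac{1}{25} + 0 \times \frac{8}{25} + 1 \times \frac{16}{25} = \frac{17}{25},$$
$$R(\theta_1, d_6) = L(\theta_1, a_1)P(x = 0|\theta_1) + L(\theta_1, a_0)P(x = 1|\theta_1) + L(\theta_1, a_1)P(x = 2|\theta_1) = 0 \times \frac{4}{25} + 3 \times \frac{12}{25} + 0 \times \frac{9}{25} = \frac{36}{25}.$$
For $d_7$, with risk point $(24/25, 12/25)$:
$$R(\theta_0, d_7) = L(\theta_0, a_0)P(x = 0|\theta_0) + L(\theta_0, a_1)P(x = 1|\theta_0) + L(\theta_0, a_1)P(x = 2|\theta_0) = 0 \times \frac{1}{25} + 1 \times \frac{8}{25} + 1 \times \frac{16}{25} = \frac{24}{25},$$
$$R(\theta_1, d_7) = L(\theta_1, a_0)P(x = 0|\theta_1) + L(\theta_1, a_1)P(x = 1|\theta_1) + L(\theta_1, a_1)P(x = 2|\theta_1) = 3 \times \frac{4}{25} + 0 \times \frac{12}{25} + 0 \times \frac{9}{25} = \frac{12}{25}.$$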
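For $d_8$, with risk point $(1, 0)$:
$$R(\theta_0, d_8) = L(\theta_0, a_1)P(x = 0|\theta_0) + L(\theta_0, a_1)P(x = 1|\theta_0) + L(\theta_0, a_1)P(x = 2|\theta_0) = 1$$
as $L(\theta_0, a_1) = 1$, and
$$R(\theta_1, d_8) = L(\theta_1, a_1)P(x = 0|\theta_1) + L(\theta_1, a_1)P(x = 1|\theta_1) + L(\theta_1, a_1)P(x = 2|\theta_1) = 0$$
as $L(\theta_1, a_1) = 0$.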
This leads to the following risk points:
              d1    d2      d3      d4      d5      d6      d7      d8
R(θ0, di)     0     1/25    8/25    9/25    16/25   17/25   24/25   1
R(θ1, di)     3     63/25   39/25   27/25   48/25   36/25   12/25   0
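The eight risk points can also be generated mechanically from the loss function in a) and the pmf in b). The following Python sketch encodes actions and states as 0/1 integers (an encoding chosen here for illustration) and enumerates the rules in the same order as the table of $d_1, \dots, d_8$.

```python
from fractions import Fraction
from itertools import product

# LOSS[state][action]: state 0 = viable, 1 = not viable;
# action 0 = go ahead (a0), action 1 = do not go ahead (a1).
LOSS = {0: {0: 0, 1: 1}, 1: {0: 3, 1: 0}}
PMF = {
    0: [Fraction(1, 25), Fraction(8, 25), Fraction(16, 25)],
    1: [Fraction(4, 25), Fraction(12, 25), Fraction(9, 25)],
}


def risk(state: int, rule: tuple) -> Fraction:
    """Expected loss of a rule, where rule[x] is the action taken when X = x."""
    return sum(LOSS[state][action] * PMF[state][x] for x, action in enumerate(rule))


# In the table the action for x = 0 alternates fastest, so it is unpacked last.
rules = {}
for index, (act_x2, act_x1, act_x0) in enumerate(product((0, 1), repeat=3), start=1):
    rules[f"d{index}"] = (act_x0, act_x1, act_x2)

risk_points = {name: (risk(0, rule), risk(1, rule)) for name, rule in rules.items()}
for name, point in risk_points.items():
    print(name, point)
```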
d) For each non-randomized decision rule we need to compute the maximum risk over the two states of nature:
                                d1    d2      d3      d4      d5      d6      d7      d8
sup_{θ∈{θ0,θ1}} R(θ, di)        3     63/25   39/25   27/25   48/25   36/25   24/25   1
Hence $\inf_{d \in D}\sup_{\theta \in \Theta} R(\theta, d)$ is attained at $d_7$, which is therefore the minimax decision rule in the set $D$, with a minimax risk of 24/25.
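The minimax selection among the non-randomized rules is a two-step reduction: take the larger coordinate of each risk point, then pick the rule with the smallest such value. A short sketch, with the risk points copied from the table above:

```python
from fractions import Fraction as F

# Risk points (R(theta0, d), R(theta1, d)) of the eight non-randomized rules.
risk_points = {
    "d1": (F(0), F(3)),
    "d2": (F(1, 25), F(63, 25)),
    "d3": (F(8, 25), F(39, 25)),
    "d4": (F(9, 25), F(27, 25)),
    "d5": (F(16, 25), F(48, 25)),
    "d6": (F(17, 25), F(36, 25)),
    "d7": (F(24, 25), F(12, 25)),
    "d8": (F(1), F(0)),
}

# Worst-case risk of each rule, then the rule minimising it.
max_risk = {name: max(point) for name, point in risk_points.items()}
minimax_rule = min(max_risk, key=max_risk.get)
print(minimax_rule, max_risk[minimax_rule])
```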
e) Sketch of the risk set of the randomized rules $\mathcal{D}$ generated by the set of non-randomized decision rules $D$: see the attached graph of the set.
f) The admissible rules are those on the "south-west boundary" of the risk set: any convex combination of $d_8$ and $d_4$, or $d_4$ and $d_2$, or $d_2$ and $d_1$. The randomized decision rules that correspond to admissible rules are colored in blue on the graph.
[Figure: risk set of the randomized rules. Horizontal axis R(0,d), vertical axis R(1,d), both ranging from 0.0 to 3.0. The risk points d1, ..., d8 are marked, together with the line R(0,d) = R(1,d) and the point M.]
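The vertices of the south-west boundary can be found as the lower convex hull of the eight risk points. The sketch below uses the lower-hull half of Andrew's monotone chain algorithm, which is a technique chosen here for illustration and not taken from the solution. It relies on the fact that in this problem the lower hull runs from the point with the smallest $R(\theta_0, d)$ to the point with the smallest $R(\theta_1, d)$, so the whole lower hull is admissible.

```python
from fractions import Fraction as F

risk_points = {
    "d1": (F(0), F(3)), "d2": (F(1, 25), F(63, 25)),
    "d3": (F(8, 25), F(39, 25)), "d4": (F(9, 25), F(27, 25)),
    "d5": (F(16, 25), F(48, 25)), "d6": (F(17, 25), F(36, 25)),
    "d7": (F(24, 25), F(12, 25)), "d8": (F(1), F(0)),
}


def cross(o, a, b):
    """Cross product of vectors o->a and o->b; positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points):
    """Lower convex hull, scanned from left to right (monotone chain)."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the segment to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


name_of = {point: name for name, point in risk_points.items()}
admissible_vertices = [name_of[p] for p in lower_hull(risk_points.values())]
print(admissible_vertices)
```

The admissible randomized rules are then the convex combinations of consecutive vertices in this list.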
g) The minimax decision rule in the set $\mathcal{D}$ is given by the intersection of the line $y = x$ with the south-west boundary of the risk set. The line $d_8d_4$ has equation
$$y = \frac{\frac{27}{25} - 0}{\frac{9}{25} - 1}(x - 1) = -\frac{27}{16}x + \frac{27}{16}.$$
To find the intersection, we solve
$$x = -\frac{27}{16}x + \frac{27}{16},$$
which gives $x = y = \frac{27}{43} \approx 0.627907$ as the coordinates of the intersection point $M$. Hence the risk point of the minimax rule $\delta^*$ in $\mathcal{D}$ is $(0.627907, 0.627907)$, with a minimax risk of 0.627907. We also see that the minimax risk in the set $D$ is reduced further in the larger set of randomized decision rules $\mathcal{D}$ (as $24/25 = 0.96 > 0.627907$).
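The intersection of the line through $d_8$ and $d_4$ with the diagonal $y = x$ can be computed exactly. A minimal sketch with rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction as F

d4 = (F(9, 25), F(27, 25))
d8 = (F(1), F(0))

# Line through d8 and d4: y = slope * x + intercept.
slope = (d4[1] - d8[1]) / (d4[0] - d8[0])
intercept = d8[1] - slope * d8[0]

# On the diagonal y = x: x = slope * x + intercept, so x = intercept / (1 - slope).
minimax_risk = intercept / (1 - slope)
improvement = F(24, 25) - minimax_risk  # gain over the best non-randomized rule
print(slope, intercept, minimax_risk, float(minimax_risk))
```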
h) To express the rule $M$ in terms of $d_4$ and $d_8$ we need to find $\alpha \in [0, 1]$ such that
$$\frac{27}{43} = \frac{9}{25}\alpha + 1 \cdot (1 - \alpha) \quad \text{and} \quad \frac{27}{43} = \frac{27}{25}\alpha + 0 \cdot (1 - \alpha).$$
Each of these two relations gives the same solution $\alpha = \frac{25}{43} \approx 0.5814$. Therefore the randomized minimax rule chooses $d_4$ with probability $\alpha = 0.5814$ and $d_8$ with probability 0.4186.
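Both coordinate equations can be solved for the mixing weight separately to confirm that they agree. A small sketch, assuming the risk points of $d_4$ and $d_8$ and the target value 27/43 from above:

```python
from fractions import Fraction as F

target = F(27, 43)           # common risk at the point M
d4 = (F(9, 25), F(27, 25))
d8 = (F(1), F(0))

# Solve target = alpha * d4[k] + (1 - alpha) * d8[k] for each coordinate k.
alpha_from_theta0 = (target - d8[0]) / (d4[0] - d8[0])
alpha_from_theta1 = (target - d8[1]) / (d4[1] - d8[1])

print(alpha_from_theta0, alpha_from_theta1, float(alpha_from_theta0))
```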
i) Suppose the prior is $(p, 1 - p)$ on $(\theta_0, \theta_1)$. The risk points with a given Bayes risk lie on a line with normal vector $(p, 1 - p)$, that is, a line with slope $\frac{-p}{1 - p}$, and this slope should coincide with the slope of $d_4d_8$. Hence
$$\frac{-p}{1 - p} = \frac{-27}{16}$$
should hold.
Solving this leads to $p = \frac{27}{43}$, and the least favourable prior with respect to which the minimax rule $\delta^*$ is Bayes is $(0.627907, 0.372093)$ on $(\theta_0, \theta_1)$. As we know from the lectures, with respect to this prior, the minimax rule $M$ is also a Bayes rule.
j) The line with normal vector $(\frac{1}{2}, \frac{1}{2})$ has slope $-1$. When moving such a line "south-west" as much as possible while retaining an intersection with the risk set, we end up with the intersection point $d_8$. Hence $d_8$ is the Bayes decision rule that corresponds to the Premier's prior.
Its Bayes risk is
$$\frac{1}{2} \times 1 + \frac{1}{2} \times 0 = \frac{1}{2}.$$
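The geometric argument in i) and j) can be cross-checked by computing the Bayes risk $p\,R(\theta_0, d) + (1 - p)\,R(\theta_1, d)$ of every non-randomized rule. Since the Bayes risk is linear in the risk point, its minimum over the randomized rules is attained at a non-randomized rule, so scanning the eight rules is enough. The helper `bayes_rules` is hypothetical and written only for this sketch.

```python
from fractions import Fraction as F

RISK_POINTS = {
    "d1": (F(0), F(3)), "d2": (F(1, 25), F(63, 25)),
    "d3": (F(8, 25), F(39, 25)), "d4": (F(9, 25), F(27, 25)),
    "d5": (F(16, 25), F(48, 25)), "d6": (F(17, 25), F(36, 25)),
    "d7": (F(24, 25), F(12, 25)), "d8": (F(1), F(0)),
}


def bayes_rules(p_viable):
    """Minimal Bayes risk and the rules attaining it for prior (p, 1 - p)."""
    p = F(p_viable)
    bayes_risk = {
        name: p * r0 + (1 - p) * r1 for name, (r0, r1) in RISK_POINTS.items()
    }
    best = min(bayes_risk.values())
    return best, sorted(name for name, value in bayes_risk.items() if value == best)


premier = bayes_rules(F(1, 2))             # prior of part j)
least_favourable = bayes_rules(F(27, 43))  # prior of part i)
print(premier, least_favourable)
```

Under the prior of part i) both $d_4$ and $d_8$ attain the minimum, so every mixture of the two, including the minimax rule, is Bayes with respect to that prior.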
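k) The shaded area on the graph. From the point $(27/43 + 0.1, 27/43 + 0.1)$, draw a vertical and a horizontal line. The relevant region is the part of the risk set that lies south-west of this point, inside the quadrant bounded by these two lines: it contains exactly the risk points whose larger coordinate does not exceed $27/43 + 0.1$.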
Problem Five
Let a continuous random variable $T$ be the length of life of an electrical component. The hazard function $h_T(t)$ associated with $T$ is defined as
$$h_T(t) = \lim_{\eta \to 0}\frac{P(t \le T < t + \eta | T \ge t)}{\eta}.$$
(In other words, $h_T(t)$ describes the rate of change of the probability that the component survives a little past time $t$ given that the component survives to time $t$.)
a) Denoting by $F_T(t)$ and $f_T(t)$ the cdf and the density of $T$ respectively, show that
$$h_T(t) = \frac{f_T(t)}{1 - F_T(t)} = -\frac{d}{dt}\log(1 - F_T(t)).$$
b) Verify that if $T$ is exponentially distributed, i.e.,
$$f_T(t) = \beta e^{-t\beta}, \quad t > 0,$$
where $\beta > 0$ is a parameter of the distribution, then $h_T(t) = \beta$, that is, the hazard function is constant for the exponential distribution.
c) Verify that if $T$ is logistic with parameters $\mu$ (real) and $\beta > 0$, i.e.,
$$F_T(t) = \frac{1}{1 + \exp\left(-\frac{t - \mu}{\beta}\right)},$$
then $h_T(t) = \frac{1}{\beta}F_T(t)$.
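Solution: a) We have
$$P(t \le T \le t + \eta | t \le T) = \frac{P(t \le T \le t + \eta)}{P(t \le T)} = \frac{F_T(t + \eta) - F_T(t)}{1 - F_T(t)}.$$
Therefore
$$h_T(t) = \frac{1}{1 - F_T(t)}\lim_{\eta \to 0}\frac{F_T(t + \eta) - F_T(t)}{\eta} = \frac{F_T'(t)}{1 - F_T(t)} = \frac{f_T(t)}{1 - F_T(t)}.$$
We also see directly, by the chain rule, that
$$-\frac{d}{dt}\left(\log[1 - F_T(t)]\right) = -\frac{-f_T(t)}{1 - F_T(t)} = \frac{f_T(t)}{1 - F_T(t)} = h_T(t)$$
holds.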
b) Since $f_T(t) = \beta e^{-\beta t}$, integrating the density gives $F_T(t) = 1 - e^{-\beta t}$, and direct substitution in the formula for the hazard from a) gives
$$h_T(t) = \frac{\beta e^{-\beta t}}{1 - (1 - e^{-\beta t})} = \beta.$$
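c) Taking the derivative of $F_T(t)$ we get
$$f_T(t) = \frac{1}{\beta}\frac{e^{-(t - \mu)/\beta}}{(1 + e^{-(t - \mu)/\beta})^2}.$$
Since $1 - F_T(t) = \frac{e^{-(t - \mu)/\beta}}{1 + e^{-(t - \mu)/\beta}}$, direct substitution in the formula for the hazard then leads to
$$h_T(t) = \frac{1}{\beta}\frac{e^{-(t - \mu)/\beta}}{(1 + e^{-(t - \mu)/\beta})^2} \cdot \frac{1 + e^{-(t - \mu)/\beta}}{e^{-(t - \mu)/\beta}} = \frac{1}{\beta} \cdot \frac{1}{1 + e^{-(t - \mu)/\beta}} = \frac{1}{\beta}F_T(t).$$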
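Both hazard formulas can be checked numerically through the identity $h_T(t) = -\frac{d}{dt}\log(1 - F_T(t))$ from part a). In the sketch below the parameter values $\beta = 2$, $\mu = 1$ and the evaluation times are arbitrary assumptions made for illustration.

```python
from math import exp, log

BETA = 2.0  # illustrative parameter value, not from the assignment
MU = 1.0    # illustrative parameter value, not from the assignment


def hazard_from_cdf(cdf, t: float, h: float = 1e-5) -> float:
    """Numerical hazard -d/dt log(1 - F(t)) via a central difference."""
    return -(log(1.0 - cdf(t + h)) - log(1.0 - cdf(t - h))) / (2.0 * h)


def exponential_cdf(t: float) -> float:
    return 1.0 - exp(-BETA * t)


def logistic_cdf(t: float) -> float:
    return 1.0 / (1.0 + exp(-(t - MU) / BETA))


times = (0.5, 1.0, 2.0, 3.0)

# Exponential: the hazard should equal the constant BETA.
exp_gap = max(abs(hazard_from_cdf(exponential_cdf, t) - BETA) for t in times)

# Logistic: the hazard should equal F_T(t) / BETA.
logistic_gap = max(
    abs(hazard_from_cdf(logistic_cdf, t) - logistic_cdf(t) / BETA) for t in times
)
print(exp_gap, logistic_gap)
```

Small gaps are consistent with the closed forms derived in b) and c); they are a numerical illustration rather than a proof.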