UNSW Sydney
Department of Statistics
Term 1, 2023
MATH5905 - Statistical Inference
Assignment 1 Solutions
Problem 1
i) It is known that for independent Poisson distributed random variables X1 ∼ Poisson(λ1) and X2 ∼ Poisson(λ2) it holds that

X1 + X2 ∼ Poisson(λ1 + λ2).

Show that if Xi ∼ Poisson(λi), i = 1, 2, . . . , k, are independent, then the conditional distribution of X1 given X1 + X2 + . . . + Xk is Binomial, and determine the parameters of this Binomial distribution.
ii) Suppose that X and Y are the components of a continuous random vector with density

fX,Y (x, y) = cxy², 0 < x < y, 0 < y < 2

(and zero else). Here c is a normalizing constant.
a) Show that c = 5/16.
b) Find the marginal density fX(x) and the cdf FX(x).
c) Find the marginal density fY (y) and the cdf FY (y).
d) Find the conditional density fY |X(y|x).
e) Find the conditional expected value a(x) = E(Y |X = x).
Make sure that you show your working and do not forget to always specify the support of the
respective distribution.
Solution: i) Let us denote Y = X1 + X2 + . . . + Xk. We are looking at P(X1 = x|Y = y) and we know that Y ∼ Poisson(λ1 + . . . + λk). Here y is any fixed value 0, 1, 2, . . . and, for a given y, x can be 0, 1, . . . , y. Using the definition of conditional distribution we have, for any x in the support SX:

fX1|Y (x|y) = P(X1 = x | X1 + . . . + Xk = y)
            = P(X1 = x, X1 + . . . + Xk = y) / P(X1 + . . . + Xk = y)
            = P(X1 = x, X2 + . . . + Xk = y − x) / P(X1 + . . . + Xk = y).
By independence the numerator factorizes as P(X1 = x)P(X2 + . . . + Xk = y − x), and as X2 + . . . + Xk ∼ Poisson(λ2 + . . . + λk) this can be continued as:

[e^(−λ1) λ1^x / x!] · [e^(−(λ2+...+λk)) (λ2 + . . . + λk)^(y−x) / (y − x)!] / [e^(−(λ1+λ2+...+λk)) (λ1 + . . . + λk)^y / y!]

and after cancellation we get it equal to

[y! / (x!(y − x)!)] · (λ1 / (λ1 + . . . + λk))^x · (1 − λ1 / (λ1 + . . . + λk))^(y−x)

for any x = 0, 1, . . . , y (and zero else). This is precisely the Binomial(y, λ1/(λ1 + . . . + λk)) distribution.
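As a quick sanity check (not part of the required derivation), the claim can be verified by simulation in R; the rates and the conditioning value below are arbitrary illustrative choices:

    # Simulate k = 3 independent Poissons, condition on their sum, and compare
    # the empirical conditional pmf of X1 with the claimed Binomial pmf.
    set.seed(1)
    lambda <- c(2, 1, 3)                    # arbitrary illustrative rates
    n <- 1e6
    X <- sapply(lambda, function(l) rpois(n, l))
    S <- rowSums(X)
    y <- 6                                  # condition on X1 + X2 + X3 = 6
    emp  <- prop.table(table(factor(X[S == y, 1], levels = 0:y)))
    theo <- dbinom(0:y, size = y, prob = lambda[1] / sum(lambda))
    round(rbind(empirical = emp, binomial = theo), 4)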
ii) a) The support of the density is the region ∆ bounded by the triangle with vertices (0, 0), (0, 2) and (2, 2). Now it is easy to see that

∫∫_∆ cxy² dx dy = ∫_0^2 ∫_x^2 cxy² dy dx = 48c/15.

As this integral must be equal to 1, we see that c = 5/16 must hold.
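A minimal numerical check of this constant in R, nesting the one-dimensional integrate function (only one of several ways to do this):

    # Outer integral over x in (0, 2), inner integral over y in (x, 2).
    inner <- function(x) {
      sapply(x, function(xx) integrate(function(y) xx * y^2, lower = xx, upper = 2)$value)
    }
    I <- integrate(inner, lower = 0, upper = 2)$value
    I      # 3.2 = 48/15, so c = 1/I
    1 / I  # 0.3125 = 5/16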
b) fX(x) = ∫_x^2 fX,Y (x, y) dy = (5/16) ∫_x^2 xy² dy = (5/48)(8x − x⁴) for x ∈ (0, 2) (and zero else). For the cdf we get via integration FX(x) = (5/48)(4x² − x⁵/5), 0 < x < 2 (and, of course, FX(x) = 0 when x < 0, FX(x) = 1 when x > 2).
c) fY (y) = ∫_0^y fX,Y (x, y) dx = ∫_0^y (5/16)xy² dx = (5/32)y⁴ for y ∈ (0, 2) (and zero else). For the cdf we get via integration: FY (y) = (1/32)y⁵, 0 < y < 2 (and, of course, FY (y) = 0 when y < 0, FY (y) = 1 when y > 2).
d) For each fixed x ∈ (0, 2) we have

fY |X(y|x) = fX,Y (x, y) / fX(x) = [(5/16)xy²] / [(5/48)(8x − x⁴)] = 3y² / (8 − x³)

if y ∈ (x, 2) (and zero else).
e)

E(Y |X = x) = ∫_x^2 y fY |X(y|x) dy = ∫_x^2 3y³ / (8 − x³) dy = (3/4) · (16 − x⁴) / (8 − x³)

for every x ∈ (0, 2).
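A spot check of this closed form in R at one arbitrary point, say x = 1 (any x in (0, 2) would do):

    x <- 1
    # E(Y | X = x) by numerical integration of y * f_{Y|X}(y|x) over (x, 2)
    integrate(function(y) y * 3 * y^2 / (8 - x^3), lower = x, upper = 2)$value  # 1.6071...
    (3 / 4) * (16 - x^4) / (8 - x^3)                                            # same: 45/28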
Problem 2
Let X and Y be independent uniformly distributed in (0, 1) random variables. Further, let
U(X, Y ) = X + Y, V (X, Y ) = Y − X
be a transformation.
a) Sketch the support S(X,Y ) of the random vector (X, Y ) in R².
b) Sketch the support S(U,V ) of the random vector (U, V ) in R².
c) Determine the Jacobian of the transformation.
d) Determine the density of the random vector (U, V ).
Justify each step.
Solution: The joint density of X and Y has a support defined on the unit square in R² with vertices (0, 0), (0, 1), (1, 1), (1, 0). Denote this region of support by S(X,Y ). It can be seen that the transformed vector (U, V ) has a support given by the square with vertices (0, 0), (1, −1), (2, 0), (1, 1). Denote this region of support by S(U,V ). Sketches of both supports can be produced with the simulation code below.

Using indicators we can write

fX,Y (x, y) = I(0,1)(x) I(0,1)(y).
From the defined transformation, solving the system with respect to x and y we get

x = (u − v)/2, y = (u + v)/2

(as long as (u, v) is in the support S(U,V )). The Jacobian of this transformation is equal to 1/2 in absolute value. Hence the density f(U,V )(u, v) = 1/2 when (u, v) ∈ S(U,V ) (and zero else).
Problem 3
You are going to the races and want to decide whether or not to bet on the horse Thunderbolt. You
want to apply decision theory to make a decision. You use the information from two independent horse-
racing experts. Data X represents the number of experts recommending you to bet on Thunderbolt
(due, of course, to their belief that this horse will win the race).
If you decide not to bet and Thunderbolt does not win, or if you bet and Thunderbolt wins the race, nothing is lost. If Thunderbolt does not win and you have decided to bet on him, your subjective judgment is that your loss would be four times higher than the cost of not betting when Thunderbolt does win (as you will have missed other opportunities to invest your money).
You have investigated the history of correct winning bets for the two horse-racing experts and it is as follows. When Thunderbolt has been a winner, each expert has correctly predicted his win with probability 5/6 (and a loss with probability 1/6). When Thunderbolt has not won a race, each expert has predicted a win for him with probability 3/5. You listen to both experts and make your decision based on the data X.
a) There are two possible actions in the action space A = {a0, a1} where action a0 is to bet and action
a1 is not to bet. There are two states of nature Θ = {θ0, θ1} labelled symbolically as 0 and 1,
respectively, where θ0 = 0 represents “Thunderbolt winning” and θ1 = 1 represents “Thunderbolt
not winning”. Define the appropriate loss function L(θ, a) for this problem.
b) Compute the probability mass function (pmf) for X under both states of nature.
c) The complete list of all the non-randomized decision rules D based on x is given by:
d1 d2 d3 d4 d5 d6 d7 d8
x = 0 a0 a1 a0 a1 a0 a1 a0 a1
x = 1 a0 a0 a1 a1 a0 a0 a1 a1
x = 2 a0 a0 a0 a0 a1 a1 a1 a1
For the set of non-randomized decision rules D compute the corresponding risk points.
d) Find the minimax rule(s) among the non-randomized rules in D.
e) Sketch the risk set of all randomized rules D̄ generated by the set of rules in D. You might want to use R (or your favorite programming language) to make this sketch more precise.
f) Suppose there are two decision rules d and d′. The decision d strictly dominates d′ if R(θ, d) ≤ R(θ, d′) for all values of θ and R(θ, d) < R(θ, d′) for at least one value of θ. Hence, given a choice between d and d′ we would always prefer to use d. Any decision rule which is strictly dominated by another decision rule (as d′ is in the above) is said to be inadmissible. Correspondingly, if a decision rule d is not strictly dominated by any other decision rule then it is admissible. Show on the risk plot the set of randomized decision rules that correspond to the admissible decision rules.
g) Find the risk point of the minimax rule in the set of randomized decision rules D̄ and determine its minimax risk. Compare the two minimax risks of the minimax decision rule in D and in D̄. Comment.

h) Define the minimax rule in the set D̄ in terms of rules in D.

i) For which prior on {θ0, θ1} is the minimax rule in the set D̄ also a Bayes rule?
j) Prior to listening to the two experts, you believe that Thunderbolt will win the race with probability 1/2. Find the Bayes rule and the Bayes risk with respect to your prior.
k) For a small positive ϵ = 0.1, illustrate on the risk set the risk points of all rules which are ϵ-minimax.
Solution: a) There are two actions: a0 : decide to bet on Thunderbolt, and a1 : do not bet. There are two states of nature: θ0 = 0, "Thunderbolt wins the race", and θ1 = 1, "Thunderbolt loses the race". Let x be the number of experts predicting that Thunderbolt will win. The loss function is given by

L(θ1, a0) = 4, L(θ1, a1) = 0, L(θ0, a0) = 0, L(θ0, a1) = 1.
b) The pmf for both states of nature:
x    p(x|θ = 0)               p(x|θ = 1)
0    (1/6)·(1/6) = 1/36       (2/5)·(2/5) = 4/25
1    2·(1/6)·(5/6) = 10/36    2·(3/5)·(2/5) = 12/25
2    (5/6)·(5/6) = 25/36      (3/5)·(3/5) = 9/25
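Equivalently, X | θ0 ∼ Binomial(2, 5/6) and X | θ1 ∼ Binomial(2, 3/5), so the table can be checked in one line of R:

    rbind("p(x|theta=0)" = dbinom(0:2, size = 2, prob = 5/6),
          "p(x|theta=1)" = dbinom(0:2, size = 2, prob = 3/5))
    # columns x = 0, 1, 2: 1/36, 10/36, 25/36 and 4/25, 12/25, 9/25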
c) There are 2³ = 8 non-randomized decision rules:
d1 d2 d3 d4 d5 d6 d7 d8
x = 0 a0 a1 a0 a1 a0 a1 a0 a1
x = 1 a0 a0 a1 a1 a0 a0 a1 a1
x = 2 a0 a0 a0 a0 a1 a1 a1 a1
Calculation of the risk points {R(θ0, di), R(θ1, di)} is as follows:
For d1 = (0, 4):
R(θ0, d1) = L(θ0, a0)P (x = 0|θ0) + L(θ0, a0)P (x = 1|θ0) + L(θ0, a0)P (x = 2|θ0) = 0
as L(θ0, a0) = 0.
R(θ1, d1) = L(θ1, a0)P (x = 0|θ1) + L(θ1, a0)P (x = 1|θ1) + L(θ1, a0)P (x = 2|θ1) = 4
as L(θ1, a0) = 4.
For d2 = (1/36, 84/25):

R(θ0, d2) = L(θ0, a1)P(x = 0|θ0) + L(θ0, a0)P(x = 1|θ0) + L(θ0, a0)P(x = 2|θ0)
          = 1 × 1/36 + 0 × 10/36 + 0 × 25/36 = 1/36

R(θ1, d2) = L(θ1, a1)P(x = 0|θ1) + L(θ1, a0)P(x = 1|θ1) + L(θ1, a0)P(x = 2|θ1)
          = 0 + 4 × 12/25 + 4 × 9/25 = 84/25
For d3 = (5/18, 52/25):

R(θ0, d3) = L(θ0, a0)P(x = 0|θ0) + L(θ0, a1)P(x = 1|θ0) + L(θ0, a0)P(x = 2|θ0)
          = 0 × 1/36 + 1 × 10/36 + 0 × 25/36 = 10/36 = 5/18

R(θ1, d3) = L(θ1, a0)P(x = 0|θ1) + L(θ1, a1)P(x = 1|θ1) + L(θ1, a0)P(x = 2|θ1)
          = 4 × 4/25 + 0 × 12/25 + 4 × 9/25 = 52/25
For d4 = (11/36, 36/25):

R(θ0, d4) = L(θ0, a1)P(x = 0|θ0) + L(θ0, a1)P(x = 1|θ0) + L(θ0, a0)P(x = 2|θ0)
          = 1 × 1/36 + 1 × 10/36 + 0 × 25/36 = 11/36

R(θ1, d4) = L(θ1, a1)P(x = 0|θ1) + L(θ1, a1)P(x = 1|θ1) + L(θ1, a0)P(x = 2|θ1)
          = 0 × 4/25 + 0 × 12/25 + 4 × 9/25 = 36/25
For d5 = (25/36, 64/25):

R(θ0, d5) = L(θ0, a0)P(x = 0|θ0) + L(θ0, a0)P(x = 1|θ0) + L(θ0, a1)P(x = 2|θ0)
          = 0 × 1/36 + 0 × 10/36 + 1 × 25/36 = 25/36

R(θ1, d5) = L(θ1, a0)P(x = 0|θ1) + L(θ1, a0)P(x = 1|θ1) + L(θ1, a1)P(x = 2|θ1)
          = 4 × 4/25 + 4 × 12/25 + 0 × 9/25 = 64/25
For d6 = (26/36, 48/25):

R(θ0, d6) = L(θ0, a1)P(x = 0|θ0) + L(θ0, a0)P(x = 1|θ0) + L(θ0, a1)P(x = 2|θ0)
          = 1 × 1/36 + 0 × 10/36 + 1 × 25/36 = 26/36

R(θ1, d6) = L(θ1, a1)P(x = 0|θ1) + L(θ1, a0)P(x = 1|θ1) + L(θ1, a1)P(x = 2|θ1)
          = 0 × 4/25 + 4 × 12/25 + 0 × 9/25 = 48/25
For d7 = (35/36, 16/25):

R(θ0, d7) = L(θ0, a0)P(x = 0|θ0) + L(θ0, a1)P(x = 1|θ0) + L(θ0, a1)P(x = 2|θ0)
          = 0 × 1/36 + 1 × 10/36 + 1 × 25/36 = 35/36

R(θ1, d7) = L(θ1, a0)P(x = 0|θ1) + L(θ1, a1)P(x = 1|θ1) + L(θ1, a1)P(x = 2|θ1)
          = 4 × 4/25 + 0 × 12/25 + 0 × 9/25 = 16/25
For d8 = (1, 0):

R(θ0, d8) = L(θ0, a1)P(x = 0|θ0) + L(θ0, a1)P(x = 1|θ0) + L(θ0, a1)P(x = 2|θ0) = 1

as L(θ0, a1) = 1.

R(θ1, d8) = L(θ1, a1)P(x = 0|θ1) + L(θ1, a1)P(x = 1|θ1) + L(θ1, a1)P(x = 2|θ1) = 0

as L(θ1, a1) = 0.
This leads to the following risk points:
            d1    d2     d3     d4     d5     d6     d7     d8
R(θ0, di)   0     1/36   5/18   11/36  25/36  26/36  35/36  1
R(θ1, di)   4     84/25  52/25  36/25  64/25  48/25  16/25  0
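The whole table can also be generated mechanically; a short R sketch (the 0/1 coding of the actions a0/a1 below is my own convention, not part of the problem):

    p0 <- dbinom(0:2, 2, 5/6)     # p(x | theta0): Thunderbolt wins
    p1 <- dbinom(0:2, 2, 3/5)     # p(x | theta1): Thunderbolt loses
    L  <- matrix(c(0, 1,          # L(theta0, a0), L(theta0, a1)
                   4, 0),         # L(theta1, a0), L(theta1, a1)
                 nrow = 2, byrow = TRUE)
    # Rows of expand.grid appear exactly in the order d1, ..., d8 of the table.
    rules <- as.matrix(expand.grid(x0 = 0:1, x1 = 0:1, x2 = 0:1))
    risk <- t(apply(rules, 1, function(d)
      c(R0 = sum(L[1, d + 1] * p0), R1 = sum(L[2, d + 1] * p1))))
    rownames(risk) <- paste0("d", 1:8)
    round(risk, 4)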
d) For each non-randomized decision rule we need to compute sup_{θ∈{θ0,θ1}} R(θ, di):

                 d1   d2     d3     d4     d5     d6     d7     d8
sup_θ R(θ, di)   4    84/25  52/25  36/25  64/25  48/25  35/36  1

Hence inf_{d∈D} sup_{θ∈Θ} R(θ, d) is attained at d7, which is therefore the minimax decision in the set D, with a minimax risk of 35/36.
e) Sketch of the risk set of randomized rules D̄ generated by the set of non-randomized decision rules D: see the graph below; a short R sketch that reproduces it follows.
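One possible R sketch of the risk set (it is the convex hull of the eight risk points):

    R0 <- c(0, 1/36, 5/18, 11/36, 25/36, 26/36, 35/36, 1)
    R1 <- c(4, 84/25, 52/25, 36/25, 64/25, 48/25, 16/25, 0)
    plot(R0, R1, xlim = c(0, 4), ylim = c(0, 4), pch = 19,
         xlab = "R(theta0, d)", ylab = "R(theta1, d)")
    text(R0, R1, labels = paste0("d", 1:8), pos = 4)
    hull <- chull(R0, R1)               # indices of the convex hull
    polygon(R0[hull], R1[hull], border = "grey")
    abline(0, 1, lty = 2)               # the line R(theta0, d) = R(theta1, d)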
f) The admissible rules are those on the "south-west boundary" of the risk set: any convex combination of d8 and d4, of d4 and d2, or of d2 and d1. The randomized decision rules that correspond to admissible rules are colored in blue on the graph.
g) The minimax decision rule in the set D̄ is given by the intersection of the line y = x with the south-west boundary of the risk set. The line d8d4 has equation

y = [(36/25 − 0) / (11/36 − 1)] (x − 1) = −2.0736x + 2.0736.
[Figure: the eight risk points d1, . . . , d8 and the risk set they generate, with R(θ0, d) on the horizontal axis and R(θ1, d) on the vertical axis (both from 0 to 4); the line R(θ0, d) = R(θ1, d) meets the south-west boundary of the risk set at the minimax point M.]
To find the intersection, we solve

x = [(36/25 − 0) / (11/36 − 1)] (x − 1),

which gives x = y = 0.675 as the coordinates of the intersection point M. Hence the risk point of the minimax rule δ* in D̄ is (0.675, 0.675), with a minimax risk of 0.675. We also see that the risk of the minimax rule in the set D is reduced further in the larger set of randomized decision rules D̄ (as 35/36 > 0.675).
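The algebra can be double-checked numerically in R, for instance with uniroot:

    # Solve x = -2.0736 (x - 1), i.e. find the root of x + 2.0736 (x - 1) = 0.
    f <- function(x) x + 2.0736 * (x - 1)
    uniroot(f, c(0, 1))$root   # 0.6746..., i.e. approximately 0.675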
h) To express the rule M in terms of d4 and d8 we need to find α ∈ [0, 1] such that

0.675 = (11/36)α + 1 · (1 − α) and 0.675 = (36/25)α + 0 · (1 − α).

Each of these two relations gives the same solution α = 0.468. Therefore the randomized minimax rule chooses d4 with probability α = 0.468 and d8 with probability 0.532.
i) Suppose the prior is (p, 1 − p). This leads to a family of lines with normal vector (p, 1 − p), that is, with slope −p/(1 − p), and this slope should coincide with the slope of d4d8. Hence

−p/(1 − p) = (36/25 − 0) / (11/36 − 1) = −2.0736

should hold. Solving this leads to p = 0.675, so the least favourable prior is (0.675, 0.325) on (θ0, θ1). As we know from the lectures, with respect to this prior the minimax rule M is also a Bayes rule.
j) The line with normal vector (1/2, 1/2) has slope −1. When moving such a line "south-west" as much as possible while retaining an intersection with the risk set, we end up at the point d8. Hence d8 is the Bayes decision rule that corresponds to your prior. Its Bayes risk is

(1/2) × 1 + (1/2) × 0 = 1/2.
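For completeness, the Bayes risks of all eight rules under the prior (1/2, 1/2) can be tabulated in R, confirming that d8 is the minimizer:

    R0 <- c(0, 1/36, 5/18, 11/36, 25/36, 26/36, 35/36, 1)
    R1 <- c(4, 84/25, 52/25, 36/25, 64/25, 48/25, 16/25, 0)
    bayes <- 0.5 * R0 + 0.5 * R1            # prior weight 1/2 on each state
    round(setNames(bayes, paste0("d", 1:8)), 4)
    which.min(bayes)                        # d8, with Bayes risk 0.5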
k) These are shown as the shaded area on the graph. From the point (0.675 + 0.1, 0.675 + 0.1) = (0.775, 0.775) draw a vertical and a horizontal line. The part of the risk set that lies south-west of the quadrant defined by these lines, that is, all risk points with both coordinates at most 0.775, is the relevant region.
Problem 4
In a Bayesian estimation problem, we sample n i.i.d. observations X = (X1, X2, . . . , Xn) from a population with the conditional distribution of each single observation being the geometric distribution

fX1|Θ(x|θ) = θ^x (1 − θ), x = 0, 1, 2, . . . ; 0 < θ < 1.

The parameter θ is considered as random in the interval Θ = (0, 1).
i) If θ is interpreted as a probability of success in a single trial in a success-failure scheme, give an
interpretation of the conditional distribution of the random variable X1 given Θ = θ.
ii) If the prior on Θ is given by τ(θ) = 20θ³(1 − θ), 0 < θ < 1, show that the posterior distribution h(θ|X = (x1, x2, . . . , xn)) is also in the Beta family. Hence determine the Bayes estimator of θ with respect to quadratic loss.
Hint: For α > 0 and β > 0 the beta function B(α, β) = ∫_0^1 x^(α−1) (1 − x)^(β−1) dx satisfies B(α, β) = Γ(α)Γ(β)/Γ(α + β), where Γ(α) = ∫_0^∞ exp(−x) x^(α−1) dx. A Beta(α, β) distributed random variable X has density

f(x) = [1/B(α, β)] x^(α−1) (1 − x)^(β−1), 0 < x < 1,

with E(X) = α/(α + β).
iii) Seven observations from this distribution were observed: 3, 4, 7, 3, 6, 5, 2. Using zero-one loss, what is your decision when testing H0 : θ ≤ 0.80 against H1 : θ > 0.80? (You may use the integrate function in R or another numerical integration routine from your favourite programming package to answer the question.)
Solution: i) X1 denotes the number of successes before the first failure occurs in a sequence of independent success-failure trials. The conditional distribution is the distribution of X1 given that the probability of success in a single trial is θ.
ii) Let x = (x1, x2, . . . , xn) be the realized data and write S = x1 + . . . + xn. Then

f(x|θ) = θ^S (1 − θ)^n

and

f(x|θ)τ(θ) = 20 θ^(3+S) (1 − θ)^(n+1) ∝ θ^(3+S) (1 − θ)^(n+1),

which means that the posterior of θ given the sample is Beta(4 + S, n + 2). Then the Bayes estimator θ̂B, being the mean of the posterior distribution, is

θ̂B = (S + 4) / (S + n + 6).
iii) Here S = 3 + 4 + 7 + 3 + 6 + 5 + 2 = 30 and n = 7, so θ̂B = 34/43 = 0.7907, which is in the vicinity of 0.80; we therefore need to compare the posterior probabilities of H0 and H1. The posterior is Beta(34, 9). Now

P(H0|x) = [1/B(34, 9)] ∫_0^0.80 θ³³ (1 − θ)⁸ dθ ≈ 0.5309 > 1/2.

Hence we do not have sufficient reason to reject H0. (Here we used the beta function and the integrate function in R for the numerical calculation.)
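The numerical calculation referred to above, written out in R (pbeta gives the same value directly):

    xobs <- c(3, 4, 7, 3, 6, 5, 2)
    a <- 4 + sum(xobs)                       # 34
    b <- 2 + length(xobs)                    #  9: posterior is Beta(34, 9)
    a / (a + b)                              # Bayes estimate 34/43 = 0.7907
    integrate(function(t) t^(a - 1) * (1 - t)^(b - 1) / beta(a, b),
              lower = 0, upper = 0.80)$value # P(H0 | x) is approx. 0.5309
    pbeta(0.80, a, b)                        # same value via the Beta cdf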