
MATH3871/MATH5970
Bayesian Inference and Computation
Tutorial Problems 2 (Solutions)
1. Posterior computations
We have data for Melbourne average daily per capita water use (litres) from 1940-2004.
The data are discrete, so one possible model is Xi ~ Poisson(θ), i = 1, . . . , n. For the moment,
assume independence among observations and a constant rate θ; these are clearly wrong
assumptions, but they simplify the computations for now.
We observe
\[
\sum_{i=1}^{n} x_i = 24{,}890, \qquad n = 65,
\]
and the first part of the dataset is:
Year 1940 1941 1942 1943 1944 1945 1946 ...
Litres (xi) 300 320 315 330 340 335 290 ...
Compute the posterior distribution for θ using a conjugate prior distribution. Make two
prior specifications in terms of hyperparameters.
Answer:
The model is $X_i \sim \text{Poisson}(\theta)$ with
\[
p(x_i \mid \theta) = \frac{\theta^{x_i}}{x_i!}\exp(-\theta), \qquad i = 1, \ldots, n, \quad \theta > 0.
\]
The likelihood function is then
\[
L(\theta; x) = \prod_{i=1}^{n} \frac{\theta^{x_i}}{x_i!}\exp(-\theta) = \frac{\theta^{\sum_{i=1}^{n} x_i}\exp(-n\theta)}{\prod_{i=1}^{n} x_i!}.
\]
We want to run a conjugate analysis, so we choose a prior distribution with the same
structure as the likelihood function: $\theta \sim \text{Gamma}(\alpha, \beta)$,
\[
\pi(\theta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta), \qquad \alpha, \beta > 0.
\]
Combining the likelihood function with the prior distribution, we obtain:
\[
\pi(\theta \mid x) \propto \frac{\theta^{\sum_{i=1}^{n} x_i}\exp(-n\theta)}{\prod_{i=1}^{n} x_i!} \times \frac{\beta^{\alpha}}{\Gamma(\alpha)}\theta^{\alpha-1}\exp(-\beta\theta)
\propto \theta^{\sum_{i=1}^{n} x_i}\exp(-n\theta) \times \theta^{\alpha-1}\exp(-\beta\theta)
= \theta^{\left(\alpha + \sum_{i=1}^{n} x_i\right) - 1}\exp\!\big(-\theta(\beta + n)\big),
\]
which is the kernel of a $\text{Gamma}\!\left(\alpha + \sum_{i=1}^{n} x_i,\; \beta + n\right)$ distribution.
We can use the following prior specifications:
• α = 1, β = 0.01
• α = 1, β = 10
The posterior is influenced by the choice of prior, as the figure below illustrates.
[Figure: posterior density of θ for (a) the moderate prior and (b) the strongly informative prior; each panel shows the prior, the posterior, and the sample mean.]
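As a quick numerical check, here is a minimal Python sketch (assuming SciPy is available; variable names are illustrative) of the conjugate update under both prior specifications:

```python
from scipy.stats import gamma

sum_x, n = 24890, 65  # observed summaries from the water-use data

for alpha, beta in [(1, 0.01), (1, 10)]:      # the two prior specifications
    a_post, b_post = alpha + sum_x, beta + n  # Gamma(alpha + sum x_i, beta + n)
    post = gamma(a=a_post, scale=1 / b_post)  # scipy parameterises by scale = 1/rate
    print(f"Gamma({alpha}, {beta}) prior -> "
          f"posterior mean {post.mean():.1f}, sd {post.std():.2f}")
```

Under the weak prior the posterior mean sits essentially at the sample mean x̄ ≈ 382.9, while the strongly informative prior pulls it down to about 331.9, consistent with the figure.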
2. Gamma model
Suppose X1, . . . , Xn are i.i.d. Gamma(k, θ) variables, with k known/fixed.
Find the conjugate prior distribution and derive the posterior distribution.
Answer:
The likelihood function is
\[
L(\theta; x) = \prod_{i=1}^{n} \frac{\theta^{k}}{\Gamma(k)}\, x_i^{\,k-1}\exp(-x_i\theta) \propto \theta^{kn}\exp\!\left(-\theta\sum_{i=1}^{n} x_i\right).
\]
Suppose prior beliefs can again be represented by $\theta \sim \text{Gamma}(a, b)$:
\[
\pi(\theta) = \frac{b^{a}}{\Gamma(a)}\theta^{a-1}\exp(-b\theta), \quad \theta > 0, \qquad \text{so } \pi(\theta) \propto \theta^{a-1}\exp(-b\theta).
\]
Then by Bayes' Theorem:
\[
\pi(\theta \mid x) \propto L(\theta; x)\,\pi(\theta) \propto \theta^{kn}\exp\!\left(-\theta\sum_{i=1}^{n} x_i\right)\theta^{a-1}\exp(-b\theta) = \theta^{a+kn-1}\exp\!\left(-\Big(b + \sum_{i=1}^{n} x_i\Big)\theta\right),
\]
so that $\theta \mid x \sim \text{Gamma}\!\left(a + kn,\; b + \sum_{i=1}^{n} x_i\right)$.
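As a sanity check, a short simulation sketch in Python (with hypothetical values k = 2 and a Gamma(1, 1) prior) recovers the true rate:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(0)
k, theta = 2.0, 0.5                                # known shape, true rate
x = rng.gamma(shape=k, scale=1 / theta, size=500)  # simulated Gamma(k, theta) data

a, b = 1.0, 1.0                                    # hypothetical Gamma(a, b) prior
post = gamma(a=a + k * len(x), scale=1 / (b + x.sum()))  # Gamma(a + kn, b + sum x_i)
print(post.mean())                                 # close to the true rate 0.5
```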
3. Jeffreys’ prior for the normal model - UGD
Find the Jeffreys’ prior for
• the normal model with known variance;
• the normal model with known mean;
• the normal model with unknown mean and variance.
Answer:
• The normal model with known variance.
Suppose $X_1, \ldots, X_n$ are i.i.d. from $N(\theta, \sigma^2)$ with $\sigma^2$ known. Then
\[
L(\theta; x) \propto \exp\left\{-\frac{\sum_{i=1}^{n}(x_i - \theta)^2}{2\sigma^2}\right\}
\]
and the log-likelihood (Step 1) is
\[
\log L(\theta; x) = c - \frac{\sum_{i=1}^{n}(x_i - \theta)^2}{2\sigma^2},
\]
where c is a constant not depending on θ. The first derivative (Step 2) is
\[
\frac{d \log L(\theta; x)}{d\theta} = \frac{2n\bar{x}}{2\sigma^2} - \frac{2n\theta}{2\sigma^2} = \frac{n(\bar{x} - \theta)}{\sigma^2}
\]
and the second derivative (Step 3) is
\[
\frac{d^2 \log L(\theta; x)}{d\theta^2} = -\frac{n}{\sigma^2}.
\]
Finally, we compute the Fisher information (Step 4):
\[
I(\theta) = -E\left\{\frac{d^2 \log L(\theta; X)}{d\theta^2}\right\} = E\left\{\frac{n}{\sigma^2}\right\} = \frac{n}{\sigma^2}.
\]
Hence (Step 5)
\[
\pi_J(\theta) \propto \sqrt{I(\theta)} = \sqrt{n/\sigma^2} \propto 1;
\]
the improper uniform distribution is the Jeffreys' prior.
• The normal model with known mean.
Suppose $X_1, \ldots, X_n$ are i.i.d. $N(m, \theta)$ with m known. Then the likelihood function is
\[
L(\theta; x) \propto \theta^{-n/2}\exp\{-s/(2\theta)\}
\]
and the log-likelihood (Step 1) is
\[
\log L(\theta; x) = c - \frac{n}{2}\log\theta - \frac{s}{2\theta},
\]
where $s = \sum_{i=1}^{n}(x_i - m)^2$ and c is a constant not depending on θ. The first derivative (Step 2) is
\[
\frac{d \log L(\theta; x)}{d\theta} = -\frac{n}{2\theta} + \frac{s}{2\theta^2}
\]
and the second derivative (Step 3) is
\[
\frac{d^2 \log L(\theta; x)}{d\theta^2} = \frac{n}{2\theta^2} - \frac{s}{\theta^3}.
\]
Since $E(S \mid \theta) = n\theta$ (Step 4), the Fisher information is
\[
I(\theta) = -\frac{n}{2\theta^2} + \frac{n\theta}{\theta^3} = \frac{n}{2\theta^2},
\]
and the Jeffreys' prior is (Step 5)
\[
\pi_J(\theta) \propto \left[\frac{n}{2\theta^2}\right]^{1/2} \propto \theta^{-1}.
\]
• The normal model with unknown mean and variance.
Suppose $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$. Then the likelihood function is
\[
L(\theta; x) \propto \sigma^{-n}\exp\left[-\frac{n}{2\sigma^2}\left\{s + (\bar{x} - \mu)^2\right\}\right],
\]
where $s = n^{-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Then the Fisher information matrix is (Step 4)
\[
I(\theta) = -E\begin{pmatrix} \frac{\partial^2 \log L(\theta; X)}{\partial\mu^2} & \frac{\partial^2 \log L(\theta; X)}{\partial\mu\,\partial\sigma^2} \\[4pt] \frac{\partial^2 \log L(\theta; X)}{\partial\sigma^2\,\partial\mu} & \frac{\partial^2 \log L(\theta; X)}{\partial(\sigma^2)^2} \end{pmatrix}
= E\begin{pmatrix} \frac{n}{\sigma^2} & \frac{n(\bar{X} - \mu)}{\sigma^4} \\[4pt] \frac{n(\bar{X} - \mu)}{\sigma^4} & -\frac{n}{2\sigma^4} + \frac{n\{S + (\bar{X} - \mu)^2\}}{\sigma^6} \end{pmatrix}
= \begin{pmatrix} \frac{n}{\sigma^2} & 0 \\[4pt] 0 & \frac{n}{2\sigma^4} \end{pmatrix}.
\]
And so Jeffreys' prior (Step 5) is
\[
\pi_J(\mu, \sigma^2) \propto \sqrt{\det I(\theta)} \propto \sigma^{-3}.
\]
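If a symbolic check is helpful, a small SymPy sketch (assuming sympy is installed) reproduces Steps 2–4 for the known-variance case:

```python
import sympy as sp

theta, sigma2, n, xbar = sp.symbols('theta sigma2 n xbar', positive=True)
# theta-dependent part of the log-likelihood: sum (x_i - theta)^2 expands to
# const - 2*n*xbar*theta + n*theta**2
loglik = -(n * theta**2 - 2 * n * xbar * theta) / (2 * sigma2)
info = -sp.diff(loglik, theta, 2)  # the second derivative does not involve the data,
                                   # so no expectation is needed
print(info)                        # n/sigma2, hence pi_J(theta) ∝ sqrt(n/sigma2) ∝ 1
```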
4. Some more conjugate analysis
Derive the posterior if X1, . . . , Xn are a random sample from the distribution with probability mass function
\[
p(x \mid \theta) = \theta^{x-1}(1 - \theta), \qquad x = 1, 2, \ldots,
\]
when using a conjugate prior distribution.
Answer:
A conjugate prior distribution is a Beta(p, q):
\[
\pi(\theta) = \frac{\theta^{p-1}(1 - \theta)^{q-1}}{B(p, q)}, \qquad 0 < \theta < 1.
\]
Then
\[
\pi(\theta \mid x) \propto \theta^{\sum_{i=1}^{n}(x_i - 1)}(1 - \theta)^{n} \times \frac{\theta^{p-1}(1 - \theta)^{q-1}}{B(p, q)} = \frac{\theta^{\sum_{i=1}^{n}(x_i - 1) + p - 1}(1 - \theta)^{n + q - 1}}{B(p, q)},
\]
hence $\theta \mid x \sim \text{Beta}\!\left(\sum_{i=1}^{n} x_i - n + p,\; n + q\right)$.
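Since p(x|θ) = θ^(x−1)(1 − θ) is the geometric pmf with success probability 1 − θ, the result is easy to check by simulation (a sketch with a hypothetical Beta(1, 1) prior):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
theta = 0.7                                      # true value of the parameter
x = rng.geometric(p=1 - theta, size=1000)        # support {1, 2, ...}

p0, q0 = 1.0, 1.0                                # hypothetical Beta(1, 1) prior
post = beta(x.sum() - len(x) + p0, len(x) + q0)  # Beta(sum x_i - n + p, n + q)
print(post.mean())                               # close to theta = 0.7
```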
5. Defective items
(a) The proportion, θ, of defective items in a large shipment is unknown, but expert
assessment assigns θ the Beta(2,200) prior distribution. If 100 items are selected at
random from the shipment, and 3 are found to be defective, what is the posterior
distribution of θ?
(b) If a different statistician, having observed the 3 defectives, calculated their posterior
distribution as being a Beta distribution with mean 4/102 and variance 0.0003658,
then what prior distribution had they used?
Answer:
(a) We have that $X \mid \theta \sim \text{Binomial}(100, \theta)$, so
\[
\pi(\theta \mid x) \propto \theta^{3}(1 - \theta)^{100 - 3} \times \theta(1 - \theta)^{199} = \theta^{4}(1 - \theta)^{296},
\]
and so $\theta \mid x \sim \text{Beta}(5, 297)$.
(b) With $E(\theta) = p/(p + q) = 4/102$ and $\text{Var}(\theta) = \frac{pq}{(p+q)^2(p+q+1)} = 0.0003658$, the posterior
Beta(p, q) distribution must have parameters p = 4 and q = 98.
If we had used a Beta(p′, q′) prior distribution, then by matching the powers of θ and
(1 − θ) we know that p′ + x = 4 and q′ + n − x = 98. Therefore p′ = 1 and q′ = 1, i.e. the uniform prior.
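Part (b) is a moment-matching exercise, which can be verified in a couple of lines of Python (a sketch, not part of the original solution):

```python
mean, var = 4 / 102, 0.0003658
s = mean * (1 - mean) / var - 1    # p + q = mu*(1 - mu)/sigma^2 - 1
p, q = mean * s, (1 - mean) * s    # p = mu*(p + q), q = (1 - mu)*(p + q)
print(round(p, 2), round(q, 2))    # 4.0 98.0
```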
6. Magnetic tape - PGD
The number of defectives in a 1200 meter roll of magnetic tape has a Poisson(θ) distribution.
Suppose that the prior distribution for θ is Gamma(3, 1). When 5 rolls of this
tape are selected at random, the number of defectives found on each are 2, 2, 6, 0 and 3
respectively. Determine the posterior distribution of θ.
Answer: This is a conjugate prior setup, so right away we know that $\theta \mid x \sim \text{Gamma}\!\left(p + \sum_{i=1}^{n} x_i,\; q + n\right) = \text{Gamma}(3 + 13, 1 + 5) = \text{Gamma}(16, 6)$, since $\sum_{i=1}^{n} x_i = 13$ and $n = 5$.
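The same arithmetic as a one-line check in Python:

```python
x = [2, 2, 6, 0, 3]            # defectives found on the five rolls
p, q = 3, 1                    # Gamma(3, 1) prior
print(p + sum(x), q + len(x))  # 16 6 -> posterior Gamma(16, 6)
```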
7. Prior elicitation
Table 1 shows data on the times between a series of earthquakes. An earthquake is
included if its magnitude is at least 7.5 on the Richter scale or if over 1000 people were
killed. Recording starts on 16 December 1902 (4,500 killed in Turkmenistan). The table
includes data on 21 earthquakes (i.e. 20 waiting times between earthquakes).
840 157 145 44 33 121 150 280 434 736
584 887 263 1901 695 294 562 721 76 710
Table 1: Time intervals between major earthquakes (in days)
Historical evidence suggests that an exponential distribution Exp(θ) is appropriate to
model waiting times. The parameter θ describes the rate at which earthquakes occur.
Suppose an expert tells us that earthquakes in the region usually occur less than once a
year; in fact, they occur on average once every 400 days. This gives a rate of occurrence
of about 1/400 = 0.0025 per day (to match the daily units in the table). Further, he
is fairly certain about this and specifies a very small variance of 6.25 × 10⁻⁷.
Find the hyperparameters of the conjugate prior distribution.
Answer:
The conjugate prior for the rate of an exponential distribution is a Gamma prior, Ga(a, b). Therefore
we can simply take the information provided by the expert and match it to the expected
value and the variance of the Gamma prior. More specifically,
\[
\begin{cases}
\dfrac{a}{b} = E[\theta] = \mu \\[6pt]
\dfrac{a}{b^2} = \text{Var}[\theta] = \sigma^2
\end{cases}
\quad\Longrightarrow\quad
\begin{cases}
a = \dfrac{\mu^2}{\sigma^2} = \dfrac{0.0025^2}{6.25 \times 10^{-7}} = 10 \\[6pt]
b = \dfrac{\mu}{\sigma^2} = \dfrac{0.0025}{6.25 \times 10^{-7}} = 4000
\end{cases}
\]
Therefore the prior distribution is Ga(10, 4000).
The posterior distribution is then
\[
\pi(\theta \mid x) \propto L(\theta; x) \times \pi(\theta) = \prod_{i=1}^{n} \theta e^{-\theta x_i} \times \frac{b^{a}}{\Gamma(a)}\theta^{a-1}e^{-\theta b} = \theta^{n}e^{-\theta\sum_{i=1}^{n} x_i} \times \theta^{a-1}e^{-\theta b} = \theta^{a+n-1}e^{-\theta\left(b + \sum_{i=1}^{n} x_i\right)},
\]
which is the kernel of a Gamma distribution, $\theta \mid x \sim \text{Ga}\!\left(a + n,\; b + \sum_{i=1}^{n} x_i\right)$.
Figure 1 shows the comparison between the prior and the posterior distribution of the
parameter θ.
Figure 1: Earthquakes example
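A minimal sketch of the elicitation and the update in Python (waiting times copied from Table 1):

```python
x = [840, 157, 145, 44, 33, 121, 150, 280, 434, 736,
     584, 887, 263, 1901, 695, 294, 562, 721, 76, 710]  # Table 1, in days

mu, sigma2 = 1 / 400, 6.25e-7       # expert's prior mean and variance for theta
a, b = mu**2 / sigma2, mu / sigma2  # moment matching: a = mu^2/sigma^2, b = mu/sigma^2
print(a, b)                         # ≈ 10, 4000 -> Ga(10, 4000) prior
print(a + len(x), b + sum(x))       # Ga(30, 13633) posterior
```

The posterior mean is 30/13633 ≈ 0.0022 earthquakes per day, i.e. roughly one every 455 days: the data pull the expert's once-every-400-days belief slightly towards rarer events.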
8. Prior elicitation
Suppose we want to use a Beta prior distribution for a probability parameter; we
talked with an expert who says that the distribution should have mean 0.70 and variance
equal to 0.05. Find the shape parameters of the prior distribution.
Answer:
\[
\begin{cases}
\dfrac{a}{a+b} = \mu \\[6pt]
\dfrac{ab}{(a+b)^2(a+b+1)} = \sigma^2
\end{cases}
\]
Let's consider $a$ first. From the mean equation,
\[
a = a\mu + b\mu \;\Longrightarrow\; a(1 - \mu) = \mu b \;\Longrightarrow\; a = \frac{\mu}{1 - \mu}\, b.
\]
Substituting into the variance equation, and using $a + b = \frac{b}{1-\mu}$ and $a + b + 1 = \frac{b + 1 - \mu}{1-\mu}$, we get
\[
\sigma^2 = \frac{\frac{\mu}{1-\mu}\, b^2}{\left(\frac{b}{1-\mu}\right)^2 \left(\frac{b + 1 - \mu}{1-\mu}\right)} = \frac{\mu(1-\mu)^2}{b + 1 - \mu}.
\]
Rearranging,
\[
\mu(1-\mu)^2 = \sigma^2 b + \sigma^2(1-\mu) \;\Longrightarrow\; b = \frac{\mu(1-\mu)^2 - \sigma^2(1-\mu)}{\sigma^2} = 0.96.
\]
And going back to $a$,
\[
a = \frac{\mu}{1-\mu} \cdot \frac{(1-\mu)\left[\mu(1-\mu) - \sigma^2\right]}{\sigma^2} = \frac{\mu\left[\mu(1-\mu) - \sigma^2\right]}{\sigma^2} = 2.24.
\]
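The closed-form solution is easy to check numerically (a sketch using the formulas just derived):

```python
mu, sigma2 = 0.70, 0.05
b = (mu * (1 - mu)**2 - sigma2 * (1 - mu)) / sigma2  # = 0.96
a = mu * b / (1 - mu)                                # a = mu/(1 - mu) * b = 2.24
print(a, b)
```

So the elicited prior is Beta(2.24, 0.96); note that b < 1 makes the density unbounded as θ → 1, a side effect of the rather large variance requested by the expert.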