The Australian National University Semester 2, 2023
School of Computing Theory Assignment 3 of 4
COMP3670/6670: Introduction to Machine Learning
Release Date. Sept 21, 2023
Due Date. 11:59 pm, Oct 9, 2023
Maximum credit. 100
Exercise 1 Probability rules (5 + 5 + 15 credits)
John and Ashley participated in an Easter Egg Hunt. After the hunt, John had a bag with 20 eggs, 15
red and 5 blue, and Ashley was able to collect 15 eggs, 10 red and 5 blue.
1. Without looking, John picks an egg from his bag. What is the probability that the egg is red?
2. Suppose we do not look at the colour of the egg selected in 1. and do not return it to John’s bag. John then picks another egg from his bag. What is the probability that this second pick is a red egg?
3. Now John puts those two eggs back into his bag. When they get home, their dad Papi accidentally mixes the two bags into one. Papi picks two eggs from the combined bag, one after the other, and gives them to Ashley: the first is red and the second is blue. Assuming John and Ashley didn’t eat any eggs on the way home, what is the probability that the two eggs Papi picked both came from Ashley’s original bag?
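If you want to sanity-check your analytical answer to part 3 numerically, one option (not required for the assignment) is a small Monte Carlo simulation in Python. The egg counts below come from the problem statement; the number of trials is an arbitrary choice.

import random

# Combined bag after Papi mixes the two bags: each egg is (colour, original owner).
# John: 15 red + 5 blue; Ashley: 10 red + 5 blue (counts from the problem statement).
bag = ([("red", "John")] * 15 + [("blue", "John")] * 5
       + [("red", "Ashley")] * 10 + [("blue", "Ashley")] * 5)

trials = 200_000          # arbitrary number of simulated picks
matches = 0               # first pick red AND second pick blue
both_from_ashley = 0      # ... and both eggs came from Ashley's original bag

for _ in range(trials):
    first, second = random.sample(bag, 2)          # two picks without replacement
    if first[0] == "red" and second[0] == "blue":
        matches += 1
        if first[1] == "Ashley" and second[1] == "Ashley":
            both_from_ashley += 1

# Estimate of P(both eggs from Ashley's bag | first red, second blue).
print(both_from_ashley / matches)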
Exercise 2 Geometric distributions and Bayes’ rule (5 + 5 + 10 + 10 credits)
Young statistician Terry is in a bar and notices that the bartender tosses a coin each time a customer orders: if the coin lands on heads, the customer gets a free drink.
1. Terry noted the first free drink was given out after 5 tosses. What is the probability of this, assuming
the probability of the coin landing heads is p?
2. Terry’s friend Tao is the tenth customer. Assuming no free drink was given out between the fifth customer and Tao, what is the probability that Tao will get a free drink, given that the first free drink went to the fifth customer? Comment on the result.
3. Tao actually got a free drink! Terry estimates p by finding the value p_ml that maximises the probability of the observed free-drink and non-free-drink events. Find p_ml.
4. Terry is not happy about this estimate. He then tries a Bayesian approach, by first placing a beta
prior over p. The beta distribution has the following pdf,
f(p; α, β) = (1 / B(α, β)) p^(α−1) (1 − p)^(β−1),
where α and β are two parameters of the distribution, and B(α, β) is the Beta function (you
don’t need to know the exact form to complete this exercise). For α > 1 and β > 1, the mode
of the distribution, where the density is peaked, is (α − 1)/(α + β − 2). Assuming α = β = 2, find the posterior
distribution, that is the distribution over p conditioned on the observed events, and then find the
mode of this distribution and compare the posterior mode with the estimate in 3. Comment on the
difference.
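Once you have decided which tosses Terry has actually observed, parts 3 and 4 can be sanity-checked numerically without knowing any closed forms, for example by evaluating the prior times the likelihood on a grid over p. The sketch below is optional; n_heads and n_tails are placeholders, not the answer.

import numpy as np

# Placeholder observation counts -- replace with the numbers you work out
# from the story (heads = free drinks handed out, tails = no free drink).
n_heads, n_tails = 3, 7                            # hypothetical values, NOT the answer

p, dp = np.linspace(1e-4, 1 - 1e-4, 10_000, retstep=True)

# Likelihood of the observed tosses as a function of p.
likelihood = p**n_heads * (1 - p)**n_tails

# Beta(2, 2) prior density, up to the constant 1/B(2, 2) (it cancels anyway).
prior = p * (1 - p)

posterior = likelihood * prior
posterior /= posterior.sum() * dp                  # normalise with a Riemann sum

print("ML estimate:          ", n_heads / (n_heads + n_tails))
print("posterior mode (grid):", p[np.argmax(posterior)])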
Exercise 3 Gaussian distributions and Bayes’ rule (10 + 5 + 5 credits)
An explosion was detected by two sensors. Each sensor is only able to output a noisy estimate of the
location of the explosion due to measurement noise. Assume the two sensor outputs are y1 and y2, and that the likelihood of the exact location x given the sensor outputs is
p(y1 | x) p(y2 | x) = N(y1; x, σ1²) N(y2; x, σ2²),
where σ1² and σ2² are the measurement noise variances. Assume a prior distribution over the location, p(x) = N(x; 0, σ0²), where σ0² is the prior variance.
1. Find the posterior distribution p(x | y1, y2).
2. What happens to the posterior distribution
(a) when the measurement noise variances are very large, σ1², σ2² → ∞,
(b) when the prior variance σ0² → ∞, and σ1 = σ2.
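As in Exercise 2, you can compare your closed-form posterior against a brute-force grid computation. The sensor readings and variances below are made up purely for illustration.

import numpy as np
from scipy.stats import norm

# Made-up sensor readings and variances, purely for checking a derivation.
y1, y2 = 1.3, 0.8
sigma1, sigma2, sigma0 = 0.5, 0.7, 2.0

x, dx = np.linspace(-10, 10, 100_001, retstep=True)   # grid of candidate locations

# Unnormalised posterior: prior N(x; 0, sigma0^2) times the two sensor likelihoods.
post = (norm.pdf(x, loc=0.0, scale=sigma0)
        * norm.pdf(y1, loc=x, scale=sigma1)
        * norm.pdf(y2, loc=x, scale=sigma2))
post /= post.sum() * dx                               # normalise with a Riemann sum

# Grid-based posterior mean and variance, to compare with your closed form.
mean = (x * post).sum() * dx
var = ((x - mean) ** 2 * post).sum() * dx
print(f"posterior mean ~ {mean:.4f}, posterior variance ~ {var:.4f}")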
Exercise 4 Conjugate priors (10 + 15 credits)
In the lecture, we discussed the Bayesian linear regression model when we assume
1. the likelihood p(y | X, θ) is a Gaussian distribution N(y; Φθ, σ²I),
2. the prior over the weights θ is a Gaussian distribution N(θ; m0, S0).
Then we derived the posterior distribution. Note the order here: we first fix the likelihood, then we choose the prior distribution. So you may ask: will the choice of likelihood affect the choice of the prior? The answer is yes. In the example given, we chose a Gaussian prior; given a Gaussian likelihood with known covariance, the posterior is then also Gaussian. In general, given a likelihood function, the posterior stays in the same distribution family as the conjugate prior after the Bayesian update. Here, the conjugate prior of a Gaussian with known covariance is another Gaussian distribution. The advantage of assuming a Gaussian likelihood is clear: the conjugate prior and the posterior are both Gaussians, which significantly simplifies the Bayesian analysis. However, in some cases the likelihood can be a distribution other than a Gaussian, or a Gaussian with unknown mean and covariance. How should we set up the priors in these situations? We consider some one-dimensional cases below:
1. Suppose the likelihood is a Gamma distribution,
p(y | β) = Gamma(y; α, β) = (β^α / Γ(α)) y^(α−1) e^(−βy),
where α is a positive constant and β > 0 is unknown. We place a Gamma prior over β, Gamma(β; α0, β0), as the conjugate prior. Prove that the posterior distribution can be written as Gamma(β; α0 + α, β0 + y). (An optional numerical sanity check of this claim is sketched after the hints.)
2. Suppose the likelihood is a Gaussian distribution,
p(y | µ, τ) = N(y; µ, τ^(−1)) = √(τ / (2π)) e^(−τ(y − µ)²/2),
where both µ and τ are unknown, and the pair (µ, τ) has the Normal-Gamma distribution
NormalGamma(µ, τ; µ0, τ0, α0, β0) = (β0^(α0) √τ0 / (Γ(α0) √(2π))) τ^(α0 − 1/2) e^(−β0 τ) e^(−τ τ0 (µ − µ0)²/2)
as the conjugate prior. Prove that the posterior distribution can be written as
NormalGamma(µ, τ; µ′, τ′, α′, β′),
and express µ′, τ′, α′, β′ in terms of µ0, τ0, α0, β0, and y.
Hints:
1. You do not need to know the specific form of the gamma function Γ(·) to do this question.
2. What are the constants? What are the variables? Given a certain family of distributions, what makes individual distributions different from each other?
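Optional numerical check for part 1 of Exercise 4: the claimed posterior Gamma(β; α0 + α, β0 + y) can be compared against the normalised product of the prior and the likelihood on a grid. The constants and the observation y below are arbitrary choices for illustration only.

import numpy as np
from scipy.stats import gamma

# Arbitrary values for the constants and the single observation, for illustration only.
alpha, alpha0, beta0, y = 3.0, 2.0, 1.5, 0.7

b, db = np.linspace(1e-4, 20, 200_001, retstep=True)  # grid over the unknown rate beta

# Unnormalised posterior: Gamma prior over beta times the Gamma likelihood of y.
# scipy's gamma uses a shape parameter a and a scale parameter = 1 / rate.
prior = gamma.pdf(b, a=alpha0, scale=1.0 / beta0)
likelihood = gamma.pdf(y, a=alpha, scale=1.0 / b)
post = prior * likelihood
post /= post.sum() * db                               # normalise with a Riemann sum

# Closed form claimed in part 1: Gamma(beta; alpha0 + alpha, beta0 + y).
claimed = gamma.pdf(b, a=alpha0 + alpha, scale=1.0 / (beta0 + y))
print("max abs difference on the grid:", np.abs(post - claimed).max())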
