Machine Learning Wednesday, 19. January 2022
Prof. S. Harmeling DUE 23:55 Tuesday, 25. January 2022
Exercise set #12
Please submit your solutions in teams of two using the sciebo file-drop folder; the link is
available in ILIAS. For the formatting, please stick to the submission guideline.pdf that you
can find on sciebo. In the case of multiple uploads, we will consider only the latest one. Uploads
after the deadline will be deleted without further notice.
1. Laplace approximation (from MacKay [1])
A photon counter is pointed at a remote star for one minute, in order to infer the rate of
photons arriving at the counter per minute λ. Assuming the number of photons collected
r has a Poisson distribution
p(r | λ) = exp(−λ) λ^r / r!,   with λ > 0,
and assuming an improper¹ prior
p(λ) = 1/λ,
perform a Laplace approximation for the posterior of λ.
30 points
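For experimentation, the following is a minimal numerical sketch of a Laplace approximation to this posterior; it is not the analytic derivation the exercise asks for. The observed count r = 5, the optimization bounds, and the finite-difference step are arbitrary assumptions.

# Minimal numerical Laplace-approximation sketch (assumption: r = 5 photons observed).
import numpy as np
from scipy.optimize import minimize_scalar

r = 5  # hypothetical photon count

def neg_log_posterior(lam):
    # unnormalized: -[log p(r | lam) + log p(lam)] = lam - r*log(lam) + log(lam)
    return lam - r * np.log(lam) + np.log(lam)

# 1) locate the posterior mode lambda_hat
res = minimize_scalar(neg_log_posterior, bounds=(1e-6, 1e3), method="bounded")
lam_hat = res.x

# 2) curvature of the negative log-posterior at the mode (central differences)
h = 1e-4
curv = (neg_log_posterior(lam_hat + h) - 2 * neg_log_posterior(lam_hat)
        + neg_log_posterior(lam_hat - h)) / h**2

# Laplace approximation: p(lambda | r) is approximated by N(lambda | lam_hat, 1 / curv)
print(f"Laplace approximation: mean = {lam_hat:.3f}, variance = {1.0 / curv:.3f}")

Comparing this numerical mean and variance against your analytic result is a quick way to check the derivation.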
2. Gaussian processes with probit likelihood (from Rasmussen and Williams [2])
For a binary Gaussian process classification model, show the equivalence of using
(i) a noise-free latent process combined with a probit likelihood, i.e., choosing
p(yi = 1 | fi) = σprobit(fi) = (1/√(2π)) ∫_{−∞}^{fi} exp(−t²/2) dt
and
(ii) a latent process with Gaussian noise combined with a step-function likelihood. This
can be expressed by introducing additional noisy latent variables f˜i, which differ
from fi by Gaussian noise, and defining p(yi = 1|f˜i) as follows:
p(f̃i | fi) = N(f̃i | fi, σ²),    p(yi = 1 | f̃i) = 1 if f̃i ≥ 0, and 0 otherwise.
Hint: Start with the expressions in (i) and integrate out f˜i to get an expression for
p(yi = 1|fi), which should look like σprobit. What do you have to plug in for σ2 in (ii) to
exactly match (i)?
30 points
¹ An improper prior is a prior that cannot be normalized to a probability distribution.
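As a numerical aid (not a replacement for the derivation), the snippet below integrates the step-function likelihood of (ii) against the Gaussian noise for a few candidate noise levels and compares the result with σprobit(fi) from (i). The latent value f = 0.7 and the candidate σ values are arbitrary assumptions.

# Numerical check: compare the integrated step-function likelihood from (ii)
# with the probit likelihood sigma_probit from (i) for a few candidate sigmas.
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def sigma_probit(f):
    # standard-normal CDF, i.e. the probit likelihood in (i)
    return norm.cdf(f)

def integrated_step_likelihood(f, sigma):
    # p(y=1 | f) = integral over f_tilde >= 0 of N(f_tilde | f, sigma^2)
    val, _ = quad(lambda ft: norm.pdf(ft, loc=f, scale=sigma), 0.0, np.inf)
    return val

f = 0.7  # arbitrary latent function value
for sigma in (0.5, 1.0, 2.0):
    print(f"sigma = {sigma}: integrated = {integrated_step_likelihood(f, sigma):.6f}, "
          f"probit = {sigma_probit(f):.6f}")

Running this for several σ values shows which choice of σ² reproduces σprobit, which is the question posed in the hint.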
3. Kernel design
Recall from the lecture that a positive semidefinite kernel is a function k : X × X → R
such that, for every set x = {xi}i=1,...,N with xi ∈ X for all i, the matrix kxx with elements
(kxx)ij = k(xi, xj) is positive semidefinite. You may use the following facts: if k1(x, x′) and
k2(x, x′) are both positive semidefinite kernels and α ∈ R \ {0}, then the functions
k(x, x′) = α²k1(x, x′), k(x, x′) = k1(x, x′) + k2(x, x′), and k(x, x′) = k1(x, x′) · k2(x, x′)
are also positive semidefinite kernels. For each of the following functions, determine whether
or not it is a kernel. Clearly state your assumptions and your derivation. (A numerical
sanity check is sketched after this exercise.)
(a) k(x1, x2) = C for C ∈ R>0
(b) k(x1, x2) = x1x2 for X = R
(c) k(x1, x2) = x1 + x2 for X = R
(d) k(x1, x2) = 5x1ᵀx2 for X = R^D
(e) k(x1, x2) = (x1ᵀx2 + 1)² for X = R^N
40 points
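The following is the numerical sanity check referred to above. It builds Gram matrices for random inputs and inspects the smallest eigenvalue: a clearly negative eigenvalue rules a candidate out, while passing the check proves nothing. The random inputs, the sample size, and the example constant C = 3.0 are arbitrary assumptions; only the scalar candidates (a)-(c) are shown.

# Numerical sanity check for kernel candidates: a clearly negative Gram-matrix
# eigenvalue disproves positive semidefiniteness; a non-negative spectrum on
# random samples is evidence only, not a proof.
import numpy as np

def min_gram_eigenvalue(k, xs):
    # Gram matrix K[i, j] = k(x_i, x_j), then its smallest eigenvalue
    K = np.array([[k(xi, xj) for xj in xs] for xi in xs])
    return np.linalg.eigvalsh(K).min()

rng = np.random.default_rng(0)
xs = rng.normal(size=10)  # random scalar inputs, X = R

candidates = {
    "(a) constant C = 3.0": lambda x1, x2: 3.0,
    "(b) x1 * x2":          lambda x1, x2: x1 * x2,
    "(c) x1 + x2":          lambda x1, x2: x1 + x2,
}
for name, k in candidates.items():
    print(f"{name}: smallest eigenvalue = {min_gram_eigenvalue(k, xs):.4f}")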
References
[1] David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge
University Press, 2003.
[2] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine
Learning (Adaptive Computation and Machine Learning). The MIT Press, 2005.