IERG 5130 Homework 1
Due on Mar 16, 2021
Problem 1: Bayesian Networks [18 pts]
Consider a Bayesian Network given by the following joint distribution
P (A,B,C,D,E) = P (A) · P (B|A) · P (C|A) · P (D|B,C) · P (E|D).
Questions:
1. Please draw the graphical model [2 pts]
2. Please draw the factor graph. [2 pts]
3. Please determine whether the following marginal/conditional independencies are true. For
each conditional independence statement that does not hold, please specify an active trail.
[6 pts]
(a) A ⊥ E
(b) A ⊥ E | C
(c) A ⊥ E | D
(d) B ⊥ C
(e) B ⊥ C | A
(f) B ⊥ C | (A,E)
4. Suppose all variables are discrete variables, each in a finite space of cardinality m.
(a) Please analyze the computational complexity of computing P (A) using the naive
approach. [3 pts]
(b) Please find an optimal variable elimination order, and analyze the computational
complexity again following the variable elimination algorithm. [5 pts]
Problem 2: Markov Random Fields [20 pts]
Consider a Markov random field defined over a circle of length n (with n ≥ 2):
P (X1, . . . , Xn) =
1
Z
φ(X1, X2) · · ·φ(Xn−1, Xn) · φ(Xn, X1).
Questions:
1. For the case with n = 4, [6 pts]
(a) Please list all local independencies.
(b) Please list all pairwise independencies.
1
(c) Please list all global independencies.
2. Please analyze the computational complexity of computing P (X1) using the naive ap-
proach. Here, each variable Xi is in a finite space of cardinality m. [4 pts]
3. Please analyze the computational complexity of computing P (X1) using variable elimina-
tion, and show that all elimination orders result in the same complexity. [4 pts]
4. Suppose Xi is a binary variable (can take either 0 or 1) for each i and n = 10. The factor
φ is defined as
φ(x, y) =
{
1 (x = y)
α (x 6= y)
Please compute the chance of all variables taking the same value. What is the minimum
α to make this chance above 50%? (Note: please write down your derivation in the
submitted answer. Using a computer program, e.g. enumerating all combinations, is not
an acceptable solution.) [6 pts]
Problem 3: Multivariate Normal Distribution [16 pts]
The probability density function for a multivariate normal distribution over a d-dimensional
space is given by
p(x|µ,Σ) = 1√
(2pi)d|Σ| exp
(
−1
2
(x− µ)TΣ−1(x− µ)
)
.
Questions:
1. Show that it is an exponential family, and please specify the sufficient statistics, the
canonical parameters in terms of µ and Σ, and the log-partition function. [5 pts]
2. Is it a minimal representation? [2 pts]
3. Compute the entropy of this distribution (show the derivation steps). [3 pts]
4. Suppose Σ is fixed. Show that Multivariate normal distribution is a conjugate prior for
the mean parameter µ. Let the prior be N (µ0,Σ0), given a set of samples x1, . . . ,xn,
compute the posterior distribution. [6 pts]
Problem 4: Predictive Distribution [18 pts]
Given a generative model:
p(x|θ) = exp(η(θ)Tφ(x)− γA(θ))
and a conjugate prior:
p(θ|α, β) = exp(αTη(θ)− βA(θ)−B(α, β))
Given a set of observations D = {x1, . . . ,xn}, we have a posterior distribution p(θ|D), which,
due to conjugacy, remains in the same family as the prior. With the parameters θ marginalized
out, we can get a predictive distribution of the next sample x′, defined as
p(x′|D) =

p(x′|θ)p(θ|D)µ(dθ).
2
Here, µ is the base measure of the underlying parameter space.
Questions:
1. Please derive the analytical form of p(x′|D) in terms of A and/or B. [8 pts]
2. Consider a dynamic model as below
xt = xt−1 + ε, ε ∼ N (µ,Σ),
where Σ is fixed, and µ is generated from a prior distribution N (0, σ20I). Note that all
time steps share the same µ (but it is unknown a priori).
(a) Please derive the predictive distribution p(xT+1|x0, . . . ,xT ). [6 pts]
(b) Please derive the n-step predictive distribution p(xT+n|x0, . . . ,xT ). [4 pts]
3 