CAP 6610 Machine Learning, Spring 2022
Homework 4
Due 4/20/2022 11:59PM
1. (15%) Alternating optimization for PCA. Recall that principal component analysis (PCA) tries to
find the projection matrix Θ and embeddings y1, . . . , yn as the solution of

    minimize_{Θ, y1, . . . , yn}  ∑_{i=1}^{n} ‖φi − Θyi‖².

We can define matrices Φ = [ φ1 φ2 · · · φn ] and Y = [ y1 y2 · · · yn ] and rewrite the problem as

    minimize_{Θ, Y}  ‖Φ − ΘY‖².    (1)

Without imposing the constraint Θ⊤Θ = I, derive an alternating optimization algorithm for (1).
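As a reference point only (the derivation itself is the deliverable), a minimal alternating least-squares sketch for (1) in Python/NumPy could look like the following; the shapes (Φ is d × n), the random initialization, and the stopping rule are assumptions rather than part of the problem statement.

```python
import numpy as np

def pca_als(Phi, k, n_iters=100, tol=1e-6, seed=0):
    """Alternating least squares for minimize_{Theta, Y} ||Phi - Theta Y||^2 (Frobenius).

    Phi is assumed to be the d-by-n matrix [phi_1 ... phi_n]; k is the embedding dimension.
    """
    rng = np.random.default_rng(seed)
    d, n = Phi.shape
    Theta = rng.standard_normal((d, k))
    prev = np.inf
    for _ in range(n_iters):
        # Fix Theta, minimize over Y: a least-squares problem solved column by column.
        Y = np.linalg.lstsq(Theta, Phi, rcond=None)[0]        # (k, n)
        # Fix Y, minimize over Theta: transpose so Y^T plays the role of the design matrix.
        Theta = np.linalg.lstsq(Y.T, Phi.T, rcond=None)[0].T  # (d, k)
        obj = np.linalg.norm(Phi - Theta @ Y, "fro") ** 2
        if prev - obj < tol * max(prev, 1.0):                 # stop on small relative decrease
            break
        prev = obj
    return Theta, Y
```

Without the orthogonality constraint, the factorization is only identifiable up to an invertible k × k transformation (ΘY = (ΘM)(M⁻¹Y)), which is worth noting in the derivation.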
2. (15%) Naive Bayes GMM. Consider a Gaussian mixture model in which the marginal distribution of
the latent variable y is Pr(y = ec) = πc, c = 1, . . . , k, and the conditional distribution of x given y is
N (µc, Σc), where each of the covariance matrices Σc is diagonal. This means that, given the latent variable
y, the elements of the ambient variable x are conditionally independent, resembling the naive Bayes model.
Given a set of observations x1, . . . , xn, derive the expectation-maximization algorithm for estimating
the model parameters πc, µc, and Σc for c = 1, . . . , k.
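The derivation itself is what is asked for; purely as a sanity check for the resulting updates, one EM iteration for a diagonal-covariance GMM could be implemented as sketched below (the log-space responsibility computation and the small variance floor are implementation choices, not part of the problem).

```python
import numpy as np
from scipy.special import logsumexp

def em_step_diag_gmm(X, pi, mu, var):
    """One EM iteration for a GMM whose covariances are diagonal.

    X: (n, d) observations; pi: (k,) mixing weights; mu: (k, d) means; var: (k, d) variances.
    """
    n, d = X.shape
    # E-step: responsibilities r[i, c] = Pr(y_i = e_c | x_i) under the current parameters.
    diff = X[:, None, :] - mu[None, :, :]                         # (n, k, d)
    log_lik = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                      + np.sum(np.log(var), axis=1)
                      + d * np.log(2 * np.pi))                    # (n, k)
    log_post = np.log(pi)[None, :] + log_lik
    r = np.exp(log_post - logsumexp(log_post, axis=1, keepdims=True))
    # M-step: responsibility-weighted estimates of pi_c, mu_c, and the diagonal of Sigma_c.
    nk = r.sum(axis=0)                                            # effective cluster sizes
    pi = nk / n
    mu = (r.T @ X) / nk[:, None]
    var = (r.T @ X ** 2) / nk[:, None] - mu ** 2                  # E[x^2] - (E[x])^2 per dimension
    return pi, mu, np.maximum(var, 1e-8), r                       # floor the variances for stability
```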
3. (40%) 20 Newsgroups revisited. Let us revisit the 20 Newsgroups data set and apply some of the
unsupervised methods by ignoring the labels. We will only consider the training data. You are required
to code the algorithms yourselves in the language of your choice.
(a) LSI/PCA via orthogonal iteration. Implement the orthogonal iteration algorithm that finds the
PCA projection matrix Θ of a data matrix Φ. You are allowed to use a pre-existing QR factorization
function. Apply tf-idf to the term-document matrix to obtain Φ and feed it into your orthogonal
iteration algorithm. Remember to use sparse matrix operations to avoid unnecessary memory and
computational costs. Set k = 2 and let the algorithm run until Θ doesn't change much. Then
compute Y = Θ⊤Φ. Each column of Y is a two-dimensional vector that you can plot on a plane.
Plot all the documents on a two-dimensional plane, using a different color for points that belong
to different newsgroups. (A code sketch of orthogonal iteration is given after part (b).)
(b) GMM via EM. Implement the EM algorithm for the Gaussian mixture model (with a different mean
and covariance for each Gaussian component). The data matrix Φ is obtained from LSI with
k_LSI = 100 using the previous orthogonal iteration algorithm. Run the EM algorithm for GMM
with k_GMM = 20 until convergence. For each Gaussian component N (µc, Σc), calculate Θµc,
where Θ is the PCA projection; the vector Θµc should be element-wise nonnegative. For each
cluster c, show the 10 terms that have the highest value in Θµc (see the sketch below for one way
to extract them). The index-term mapping can be found here. Does the result make sense?
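For part (a), a minimal orthogonal-iteration sketch on a sparse tf-idf matrix is given below; it assumes Φ is a SciPy CSR matrix of shape (terms × documents), uses NumPy's QR routine as the allowed pre-existing function, and treats the tolerance and iteration cap as arbitrary choices (whether to mean-center the columns first is left to your PCA-vs-LSI interpretation).

```python
import numpy as np

def orthogonal_iteration(Phi, k, n_iters=500, tol=1e-6, seed=0):
    """Orthogonal iteration for the top-k principal directions of a sparse matrix Phi.

    Phi: scipy.sparse matrix of shape (d, n), e.g. the tf-idf term-document matrix.
    Returns Theta, a (d, k) matrix with orthonormal columns.
    """
    rng = np.random.default_rng(seed)
    d, _ = Phi.shape
    Theta, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(n_iters):
        # Apply Phi Phi^T to Theta via two sparse-dense products; the dense d-by-d
        # matrix Phi Phi^T is never formed.
        Z = Phi @ (Phi.T @ Theta)                  # dense (d, k)
        Q, _ = np.linalg.qr(Z)
        # Measure how much the spanned subspace moved (columns may flip sign or rotate).
        if np.linalg.norm(Q - Theta @ (Theta.T @ Q), "fro") < tol:
            return Q
        Theta = Q
    return Theta

# Usage sketch (variable names are placeholders): with Phi_tfidf a scipy.sparse CSR matrix,
#   Theta = orthogonal_iteration(Phi_tfidf, k=2)
#   Y = (Phi_tfidf.T @ Theta).T        # 2 x n embeddings, one column per document to plot
```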
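For part (b), once EM has produced component means µc in the k_LSI = 100 dimensional space (the updates mirror those sketched under Problem 2, with full instead of diagonal covariances), the top terms can be read off as below; `vocab`, a list mapping term index to word, and the other variable names are assumptions.

```python
import numpy as np

def top_terms_per_component(Theta, mus, vocab, top=10):
    """List the highest-weight terms of each GMM component mean, mapped back to term space.

    Theta: (d_terms, k_lsi) projection from part (a); mus: (k_gmm, k_lsi) component means;
    vocab: list of length d_terms with vocab[j] = j-th term (assumed available).
    """
    for c, mu in enumerate(mus):
        scores = Theta @ mu                        # back-project the mean onto the terms
        idx = np.argsort(scores)[::-1][:top]       # indices of the `top` largest entries
        print(f"cluster {c:2d}: " + ", ".join(vocab[j] for j in idx))
```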
4. (30%) Mixture of multinomials. Consider the latent variable model with the following probability
distribution:

    Pr(yi = c) = 1/k,    (xi | yi = c) ∼ Multi(pc, Li),

meaning that yi is categorical, with k possible outcomes and equal probability, and (xi | yi = c) follows a
multinomial distribution obtained by drawing Li times from pc, i.e.,

    p(xi | yi = c) = ( Li! / ∏_{j=1}^{d} x_{ij}! ) ∏_{j=1}^{d} p_{cj}^{x_{ij}}.
Given data samples x1, . . . , xn:
(a) Write out the maximum likelihood formulation for estimating p1, . . . , pk. Simplify the objective
function as much as possible.
(b) Derive an expectation-maximization algorithm for approximately solving the aforementioned
problem.
(c) Implement this algorithm and try it on the 20 Newsgroups data set with k = 20 (on the raw
word-count data, without tf-idf preprocessing). Show the top 10 words in each cluster. (A sketch
of such an EM loop is given below.)
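For parts (b)–(c), purely as a sketch and not the required derivation, an EM loop for the mixture of multinomials on a sparse document-by-vocabulary count matrix might look like the following; the smoothing constant, the random Dirichlet initialization, and the matrix orientation are assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def em_mixture_multinomial(X, k, n_iters=100, smooth=1e-2, seed=0):
    """EM for a mixture of multinomials with a fixed uniform prior 1/k over clusters.

    X: (n, d) word-count matrix (scipy.sparse or dense), documents by vocabulary.
    Returns p: (k, d) row-stochastic word probabilities and r: (n, k) responsibilities.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    p = rng.dirichlet(np.ones(d), size=k)                  # initial p_1, ..., p_k
    for _ in range(n_iters):
        # E-step: the multinomial coefficient L_i! / prod_j x_ij! is the same for every
        # cluster, so it cancels; only sum_j x_ij log p_cj enters the responsibilities.
        log_r = np.asarray(X @ np.log(p).T)                # (n, k)
        r = np.exp(log_r - logsumexp(log_r, axis=1, keepdims=True))
        # M-step: p_cj proportional to responsibility-weighted counts, lightly smoothed
        # so no word probability collapses to exactly zero.
        counts = np.asarray((X.T @ r).T) + smooth          # (k, d)
        p = counts / counts.sum(axis=1, keepdims=True)
    return p, r

# The top-10 words per cluster can then be printed with the same argsort idea as in 3(b),
# using np.argsort(p[c])[::-1][:10] on each row of p.
```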