Homework 3 Statistical Machine Learning II, Semester B 2020/2021
14, 2021. Homework must be neatly written up or typed for submission. I reserve the right
to refuse homework that is deemed (by me) to be excessively messy.
1. PCA. Consider the monthly log stock returns, in percentages and including dividends,
of Merck & Company, Johnson & Johnson, General Electric, General Motors, Ford Motor
Company, and a value-weighted index from January 1960 to December 1999; see the file
m-mrk2vw.txt, which has six columns in the order listed above.
(a) Perform a principal component analysis of the data using the sample covariance matrix.
Try different numbers of principal components and report the variance explained in
each scenario.
(b) Perform a principal component analysis of the data using a radial kernel. Try different
numbers of principal components and use cross-validation (CV) to tune the kernel parameter σ.
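The two analyses in Problem 1 can be sketched as follows. This is a minimal illustration, not a reference solution: the function names `pca_explained_variance` and `rbf_kernel_pca` are hypothetical, the kernel is assumed to be parameterized as exp(-||x - y||^2 / (2σ^2)), and the data are random placeholders standing in for the six return columns of m-mrk2vw.txt.

```python
import numpy as np

def pca_explained_variance(X):
    """PCA via eigen-decomposition of the sample covariance matrix (rows = observations)."""
    S = np.cov(X, rowvar=False)              # sample covariance, divides by n - 1
    eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals / eigvals.sum())
    return eigvals, eigvecs, cumulative

def rbf_kernel_pca(X, sigma, n_components):
    """Kernel PCA scores, assuming k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    Kc = H @ K @ H                           # kernel matrix centered in feature space
    kvals, kvecs = np.linalg.eigh(Kc)
    order = np.argsort(kvals)[::-1][:n_components]
    kvals, kvecs = kvals[order], kvecs[:, order]
    # Training-point scores: unit-norm eigenvector scaled by sqrt(eigenvalue)
    scores = kvecs * np.sqrt(np.maximum(kvals, 0.0))
    return scores, kvals

# Placeholder data only: 480 months by 6 series of random numbers.
# With the real file one might try np.loadtxt("m-mrk2vw.txt"), assuming it is
# whitespace-delimited with no header row (an assumption, not checked here).
rng = np.random.default_rng(0)
X = rng.normal(size=(480, 6))

eigvals, eigvecs, cum = pca_explained_variance(X)
for k, c in enumerate(cum, start=1):
    print(f"{k} components: {100 * c:.1f}% of variance explained")

scores, kvals = rbf_kernel_pca(X, sigma=2.0, n_components=2)
print(scores.shape)
```

The sketch takes σ as given. Kernel PCA is unsupervised, so tuning σ by CV requires choosing a held-out criterion, and the problem statement does not fix one; that choice is left open here.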
2. Neural networks. The file daily0005.txt (available on Canvas) contains daily data on
several stocks from 2000 to 2005. We will focus only on the data from IBM. As an initial
step in this problem, you should use daily0005.txt to create a dataset containing daily log
returns on IBM for the available time period. In what follows, use the data from the last 252
days as the test set, and the rest as the training set.
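The data preparation described above might look like the following sketch. The helper name `make_lagged_dataset` is hypothetical, and the price series is a simulated placeholder rather than the IBM PRC column. The lag structure of three previous log returns as inputs anticipates part (a).

```python
import numpy as np

def make_lagged_dataset(prices, n_lags=3, n_test=252):
    """Build (inputs, target) pairs from a price series and split off the last n_test days."""
    prices = np.asarray(prices, dtype=float)
    r = np.diff(np.log(prices))              # r_t = log(P_t) - log(P_{t-1})
    # Each row of X holds n_lags consecutive returns; y holds the return that follows them
    X = np.column_stack([r[i:len(r) - n_lags + i] for i in range(n_lags)])
    y = r[n_lags:]
    return X[:-n_test], y[:-n_test], X[-n_test:], y[-n_test:]

# Placeholder prices only: a simulated geometric random walk, not the IBM data.
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=1500)))

X_train, y_train, X_test, y_test = make_lagged_dataset(prices)
print(X_train.shape, X_test.shape)
```

One design choice in this sketch: the split is applied after the lagged rows are built, so the first test rows use the final training-period returns as inputs. This is ordinary for one-step-ahead forecasting, but it is an assumption about how the split should be read.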
(a) Fit a neural network for predicting the daily log returns, using the three previous log
returns as inputs (take the log of column 4, PRC). Do not use weight decay, and use M
hidden units. For each 1 ≤ M ≤ 12, (i) use ten random starting values for the weights;
(ii) for each starting value, compute the test error under the squared loss; and (iii)
produce a box plot of the ten test errors. Finally put these twelve box plots together
in one graph. Comment on your findings.
(b) Fix the number of units at M = 12. Try different weight decay parameters, produce
a box plot for each weight decay parameter, and put these box plots in one graph.
(c) Choose the best weight decay parameter from part (b). Repeat part (a) with this
weight decay. Produce a graph with twelve box plots and comment on your findings.
3. Latent Factor Model (Open problem). In the latent factor model for recommender systems,
we assume that the rating of user i for item j is modeled by
    r_{ij} = u_i^\top v_j + \epsilon_{ij},
where u_i ∈ R^k and v_j ∈ R^k are the user factor and item factor, respectively. The resulting
minimization problem is
    \min_{U,V} \sum_{(i,j) \in \Omega} (r_{ij} - u_i^\top v_j)^2 + \lambda \sum_{i=1}^{n} \|u_i\|^2 + \lambda \sum_{j=1}^{m} \|v_j\|^2
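To make the objective concrete, the sketch below evaluates it on a small synthetic ratings set and reduces it by plain gradient descent. The function names `objective` and `gradient_step`, the step size, and the data are all illustrative assumptions. Gradient descent is only one of several ways to attack this problem; alternating least squares is another.

```python
import numpy as np

def objective(R_obs, U, V, lam):
    """Regularized squared error; R_obs is a list of (i, j, r_ij) for (i, j) in Omega."""
    fit = sum((r - U[i] @ V[j]) ** 2 for i, j, r in R_obs)
    return fit + lam * np.sum(U ** 2) + lam * np.sum(V ** 2)

def gradient_step(R_obs, U, V, lam, lr):
    """One full-gradient descent step on the objective above."""
    gU = 2 * lam * U                         # gradient of the penalty on the user factors
    gV = 2 * lam * V                         # gradient of the penalty on the item factors
    for i, j, r in R_obs:
        e = r - U[i] @ V[j]                  # residual for the observed pair (i, j)
        gU[i] -= 2 * e * V[j]
        gV[j] -= 2 * e * U[i]
    return U - lr * gU, V - lr * gV

# Synthetic placeholder ratings: n users, m items, k latent features.
rng = np.random.default_rng(0)
n, m, k, lam = 6, 5, 2, 0.1
U_true = rng.normal(size=(n, k))
V_true = rng.normal(size=(m, k))
R_obs = [(i, j, U_true[i] @ V_true[j] + 0.1 * rng.normal())
         for i in range(n) for j in range(m) if rng.random() < 0.6]

U = 0.1 * rng.normal(size=(n, k))
V = 0.1 * rng.normal(size=(m, k))
start = objective(R_obs, U, V, lam)
for _ in range(500):
    U, V = gradient_step(R_obs, U, V, lam, lr=0.01)
end = objective(R_obs, U, V, lam)
print(start, end)
```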
In the above model, both the user factor and the item factor are assumed to be of the same dimension
k. However, this assumption seems somewhat artificial. In general, the number of
latent user features need not equal the number of latent item features. Suppose now that
each user is governed by p features u_i = (u_{i,1}, u_{i,2}, ..., u_{i,p}) and each movie by q features
v_j = (v_{j,1}, v_{j,2}, ..., v_{j,q}) with p ≠ q. How can we adjust the latent factor model? Provide