STA 135 Spring 2021
Homework III - Due Friday, April 23rd
Book Homework (does not require R)
Note: This may be handwritten or typed. Answers
should be clearly marked.
Please do 8.1, 8.2, 8.4, 8.6
R Homework (requires some use of R)
Note: You do not have to use R Markdown to turn
in the homework, but the homework must be submitted
in a reasonable format. The answers to the questions
should be in the body of the homework, and
the code used to obtain those answers should be in
an appendix. There should be no code in the body of
the homework. You can accomplish this in R, Word,
LaTeX, Google Docs, etc.
I. The purpose of this problem is to examine the effect
that different correlations have on the outcome of the
PCA. To make this easier, suppose x has a bivariate normal
distribution with µ = (0, 0)^T and σ11 = σ22 = 1. For
σ12 = −0.99, −0.9, −0.5, 0, 0.5, 0.9, and 0.99 (remember
that σ12 = ρ12 because the variances are equal to 1),
complete the following:
(a) Simulate 1,000 observations from the bivariate normal
distribution, setting a seed of 8128 right before
each data simulation.
In R, use the command "set.seed(8128)" to set your
seed. Then use "rmvnorm(n = N, mean = mu, sigma =
sigma)" from the mvtnorm package to generate the
bivariate normal observations.
(b) Use "princomp()" with "cor = TRUE" to find the
estimated eigenvalues and eigenvectors from the
correlation matrix.
(c) Interpret the PCs.
(d) How many PCs are necessary?
(e) Create separate scatter plots of the data and the PC
scores, but use one overall x-axis and y-axis set of
limits. Describe the relationship between these plots
for each ρ12.
(f) Relate your answers in (c) through (e) to the value of σ12.
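The mechanics of parts (a), (b), and (e) can be sketched as follows. This is only one possible layout, not a required solution: the helper name run_pca and the object names are hypothetical, and the sketch assumes the mvtnorm package is installed.

```r
library(mvtnorm)  # provides rmvnorm()

N <- 1000
mu <- c(0, 0)
rhos <- c(-0.99, -0.9, -0.5, 0, 0.5, 0.9, 0.99)

# Hypothetical helper: simulate once and run a correlation-based PCA
run_pca <- function(rho) {
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  set.seed(8128)  # reset right before each simulation, as part (a) asks
  x <- rmvnorm(n = N, mean = mu, sigma = sigma)
  colnames(x) <- c("x1", "x2")
  list(rho = rho, x = x, pca = princomp(x, cor = TRUE))
}

results <- lapply(rhos, run_pca)

# Part (b): estimated eigenvalues (squared sdev) and eigenvectors (loadings)
for (r in results) {
  cat("rho =", r$rho, "\n")
  print(r$pca$sdev^2)
  print(unclass(r$pca$loadings))
}

# Part (e): one set of axis limits shared by every data and score plot
lim <- range(sapply(results, function(r) range(r$x, r$pca$scores)))
for (r in results) {
  par(mfrow = c(1, 2))
  plot(r$x, xlim = lim, ylim = lim, main = paste("Data, rho =", r$rho))
  plot(r$pca$scores, xlim = lim, ylim = lim,
       main = paste("PC scores, rho =", r$rho))
}
```

Because "cor = TRUE" standardizes the variables, the two eigenvalues always sum to 2; how that total is split between them is what changes with σ12.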
II. This problem uses the weekly rates of return for five
stocks listed on the New York Stock Exchange. Online you
will find the file "Stock-Data.txt". The txt file has the
following columns:
Column 1. JP Morgan
Column 2. Citibank
Column 3. Wells Fargo
Column 4. Royal Dutch Shell
Column 5. Exxon Mobil
(a) Construct the sample covariance matrix S, and find
the sample principal components.
(b) Interpret the first two PCs.
(c) Determine the proportion of the total sample variance
explained by the first three principal components.
Interpret these components.
(d) Generate the scree plot and interpret the plot.
(e) Plot the first two PCs and interpret your plot.
(f) Given the results from the previous parts, do you
feel that the stock rates-of-return data can be summarized
in fewer than five dimensions? Explain.
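A minimal sketch of the covariance-based workflow for this problem is given below. It runs on simulated stand-in data so that it is self-contained; the column names and the commented read.table() call are assumptions, since the layout of "Stock-Data.txt" (header row, separator) is not described here.

```r
# Stand-in data so the sketch runs on its own. With the real file, something
# like the following may work (header and separator are assumptions):
# stocks <- read.table("Stock-Data.txt")
set.seed(1)
stocks <- as.data.frame(matrix(rnorm(100 * 5, sd = 0.02), ncol = 5))
names(stocks) <- c("JPMorgan", "Citibank", "WellsFargo", "Shell", "Exxon")

S <- cov(stocks)   # sample covariance matrix (divisor n - 1)
eig <- eigen(S)    # eigenvalues and eigenvectors of S

# Proportion of total sample variance explained, cumulatively
prop <- eig$values / sum(eig$values)
print(cumsum(prop)[3])  # first three PCs together

# prcomp() without scaling is a PCA on S and uses the same n - 1 divisor
pca <- prcomp(stocks, center = TRUE, scale. = FALSE)
screeplot(pca, type = "lines", main = "Scree plot")
plot(pca$x[, 1:2], xlab = "PC1", ylab = "PC2", main = "First two PC scores")
```

Note that princomp() computes the covariance matrix with divisor n rather than n - 1, so its eigenvalues differ from those of S by the factor (n - 1)/n; the eigenvectors and the proportions of variance are unaffected.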
III. Online you will find the "Goblet.csv" file.
Below is the picture of the measurements for the goblet.
Subject-matter researchers are interested in grouping
goblets that have the same shape although they may
have different sizes. One way suggested by Manly (1994)
to adjust the data is to divide each measurement by X3
(height). This can easily be done in R.
Create these variables in R. Your analysis will be
based on these variables.
goblet2 <- data.frame(ID = goblet$ID,
w1 = goblet$x1/goblet$x3,
w2 = goblet$x2/goblet$x3,
w4 = goblet$x4/goblet$x3,
w5 = goblet$x5/goblet$x3,
w6 = goblet$x6/goblet$x3)
(a) Generate the star plot using the following R command.
# win.graph() is Windows-only; dev.new(width = 11, height = 7)
# is a cross-platform alternative.
win.graph(width = 11, height = 7)
stars(x = goblet[,-1], draw.segments = TRUE,
key.loc = c(14,10), main = "Goblet star plot",
labels = goblet$ID)
(b) Which goblets appear to stand out? Can you make
any generalizations about groups for goblets?
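(c) Generate the parallel coordinates plot using the
following R command. The parcoord() function is in the MASS
package, and col.w5 is a vector of colors that you need to
define first.
parcoord(x = goblet2[,-1], col = col.w5,
main = "Goblet parallel coordinate plot")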
(d) Interpret the parallel plot.
(e) Run the principal component analysis using "cor =
TRUE".
(f) Interpret the first three principal components.
(g) How many PCs would you suggest for the analysis
of the goblet data? Justify your answer.
(h) Produce the scree plot for the goblet data and
interpret your plot.
(i) Plot the first two PCs and interpret the plot.
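The steps in this problem can be strung together roughly as follows. This is a sketch under stated assumptions: the data are simulated stand-ins for "Goblet.csv" (using the column names ID and x1 to x6 that appear in the commands above), and the median-split coloring used for col.w5 is only one hypothetical choice.

```r
library(MASS)  # provides parcoord()

# Stand-in data; with the real file use: goblet <- read.csv("Goblet.csv")
set.seed(1)
goblet <- data.frame(ID = 1:25, matrix(runif(25 * 6, 5, 20), ncol = 6))
names(goblet) <- c("ID", paste0("x", 1:6))

# Shape variables: every measurement divided by height (x3)
goblet2 <- data.frame(ID = goblet$ID,
                      w1 = goblet$x1 / goblet$x3,
                      w2 = goblet$x2 / goblet$x3,
                      w4 = goblet$x4 / goblet$x3,
                      w5 = goblet$x5 / goblet$x3,
                      w6 = goblet$x6 / goblet$x3)

# Hypothetical color rule: split the goblets at the median of w5
col.w5 <- ifelse(goblet2$w5 > median(goblet2$w5), "red", "blue")
parcoord(x = goblet2[, -1], col = col.w5,
         main = "Goblet parallel coordinate plot")

# Correlation-based PCA on the shape variables
pca <- princomp(goblet2[, -1], cor = TRUE)
print(summary(pca))           # proportion of variance per component
print(unclass(pca$loadings))  # eigenvectors

screeplot(pca, type = "lines", main = "Goblet scree plot")
plot(pca$scores[, 1:2], type = "n", xlab = "PC1", ylab = "PC2",
     main = "First two PC scores")
text(pca$scores[, 1:2], labels = goblet2$ID)
```

Labeling the score plot with the goblet IDs makes it easier to compare any groups seen there with those suggested by the star and parallel coordinates plots.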