
Date: 2021-04-23

STA 135 Spring 2021

Homework III - Due Friday, April 23rd

Book Homework (does not require R)

Note: This may be handwritten or typed. Answers should be clearly marked.

Please do 8.1, 8.2, 8.4, 8.6

R Homework (requires some use of R)

Note: You do not have to use R Markdown to turn in the homework, but the homework must be submitted in a reasonable format. The answers to the questions should be in the body of the homework, and the code used to obtain those answers should be in an appendix. There should be no code in the body of the homework. You can accomplish this in R, Word, LaTeX, Google Docs, etc.

I. The purpose of this problem is to examine the effect that different correlations have on the outcome of the PCA. To make this easier, suppose $x$ has a bivariate normal distribution with $\mu = (0, 0)^T$ and $\sigma_{11} = \sigma_{22} = 1$. For $\sigma_{12} = -0.99, -0.9, -0.5, 0, 0.5, 0.9$, and $0.99$ (remember that $\sigma_{12} = \rho_{12}$ because the variances are equal to 1), complete the following:

(a) Simulate 1,000 observations from the bivariate normal distribution, setting a seed number of 8128 right before each data simulation. In R, use the command "set.seed(8128)" to set your seed, then "rmvnorm(n = N, mean = mu, sigma = sigma)" (from the mvtnorm package) to generate the random normal variables.

(b) Use "princomp()" with "cor = TRUE" to find the estimated eigenvalues and eigenvectors from the correlation matrix.

(c) Interpret the PCs.

(d) How many PCs are necessary?

(e) Create separate scatter plots of the data and the PC scores, but use one common set of x-axis and y-axis limits for both. Describe the relationship between these plots for each $\rho_{12}$.

(f) Relate your answers in (c) through (e) to the value of $\sigma_{12}$.
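Parts (a), (b), and (e) of Problem I can be sketched in R as below. This is a minimal sketch rather than a model solution: it assumes rmvnorm() comes from the mvtnorm package, and the names rhos, fits, and lims are illustrative choices that do not appear in the assignment. For a 2 x 2 correlation matrix with off-diagonal entry r, the eigenvalues are 1 + |r| and 1 - |r|, so the printed eigenvalues should sit close to those values with r near each target correlation.

```r
library(mvtnorm)  # assumption: rmvnorm() is taken from the mvtnorm package

mu   <- c(0, 0)
rhos <- c(-0.99, -0.9, -0.5, 0, 0.5, 0.9, 0.99)
N    <- 1000

# One simulation and one PCA per correlation value
fits <- lapply(rhos, function(rho) {
  sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  set.seed(8128)                        # reset right before each simulation
  x <- rmvnorm(n = N, mean = mu, sigma = sigma)
  colnames(x) <- c("x1", "x2")
  list(rho = rho, x = x, pca = princomp(x, cor = TRUE))
})

# Eigenvalues are the squared component standard deviations;
# eigenvectors are the columns of the loadings matrix
for (f in fits) {
  cat("rho =", f$rho, "\n")
  print(round(f$pca$sdev^2, 4))
  print(unclass(f$pca$loadings))
}

# Part (e) for a single correlation: data and PC scores on shared axis limits
f    <- fits[[which(rhos == 0.9)]]
lims <- range(f$x, f$pca$scores)
op   <- par(mfrow = c(1, 2), pty = "s")
plot(f$x, xlim = lims, ylim = lims, main = "Data, rho = 0.9")
plot(f$pca$scores, xlim = lims, ylim = lims, main = "PC scores, rho = 0.9")
par(op)
```

Looping the plotting step over every element of fits gives one pair of panels per correlation value.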

II. This problem uses the weekly rates of return for five stocks listed on the New York Stock Exchange. Online you will find the file "Stock-Data.txt". The txt file has the following columns:

Column 1. JP Morgan
Column 2. Citibank
Column 3. Wells Fargo
Column 4. Royal Dutch Shell
Column 5. Exxon Mobil

(a) Construct the sample covariance matrix S, and find the sample principal components.

(b) Interpret the first two PCs.

(c) Determine the proportion of the total sample variance explained by the first three principal components. Interpret these components.

(d) Generate the scree plot and interpret the plot.

(e) Plot the first two PCs and interpret your plot.

(f) Given the results from the previous parts, do you feel that the stock rates-of-return data can be summarized in fewer than five dimensions? Explain.
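The computations behind parts (a), (c), (d), and (e) of Problem II can be sketched as follows. The helper pca_from_cov() is a hypothetical name, and the data below are random placeholders standing in for "Stock-Data.txt" so that the sketch runs on its own; they are not the real returns and their output means nothing for the assignment. The sketch works from eigen(cov(x)) because cov() uses the divisor n - 1 that defines S, whereas princomp() uses the divisor n, so its eigenvalues are smaller by a factor of (n - 1)/n.

```r
# Hypothetical helper: principal components from the sample covariance matrix S
pca_from_cov <- function(dat) {
  x <- as.matrix(dat)
  S <- cov(x)                              # divisor n - 1
  e <- eigen(S, symmetric = TRUE)
  centered <- scale(x, center = TRUE, scale = FALSE)
  scores <- centered %*% e$vectors         # PC scores of the centered data
  colnames(scores) <- paste0("PC", seq_len(ncol(x)))
  list(S = S,
       values  = e$values,                 # sample variances of the PCs
       vectors = e$vectors,                # PC coefficient vectors (columns)
       prop    = cumsum(e$values) / sum(e$values),
       scores  = scores)
}

# Placeholder data only (NOT the real stock returns)
set.seed(1)
demo <- matrix(rnorm(100 * 5, sd = 0.02), ncol = 5,
               dimnames = list(NULL, c("JPM", "Citi", "WFC", "Shell", "Exxon")))

res <- pca_from_cov(demo)
print(round(res$prop, 3))                  # cumulative proportion of variance

# Scree plot and plot of the first two PCs
plot(res$values, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot")
plot(res$scores[, 1:2], main = "First two PCs")
```

With the real file, something like pca_from_cov(read.table("Stock-Data.txt", header = TRUE)) would replace the placeholder; whether the file has a header row is an assumption to check first.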

III. Online you will find the "Goblet.csv" file. Below is the picture of the measurements for the goblet. Subject-matter researchers are interested in grouping goblets that have the same shape although they may have different sizes. One way suggested by Manly (1994) to adjust the data is to divide each measurement by X3 (height). This can easily be done in R.

Create these variables in R. Your analysis will be based on these variables.

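# goblet2 and ID are the names used by the plotting commands in parts (a) and (c)
goblet2 <- data.frame(ID = goblet$ID,
                      w1 = goblet$x1/goblet$x3,
                      w2 = goblet$x2/goblet$x3,
                      w4 = goblet$x4/goblet$x3,
                      w5 = goblet$x5/goblet$x3,
                      w6 = goblet$x6/goblet$x3)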

(a) Generate the star plot using the following R command.

win.graph(width = 11, height = 7)  # Windows only; use dev.new(width = 11, height = 7) elsewhere
stars(x = goblet[,-1], draw.segments = TRUE,
      key.loc = c(14,10), main = "Goblet star plot",
      labels = goblet$ID)

(b) Which goblets appear to stand out? Can you make any generalizations about groups for goblets?

(c) Generate the parallel coordinates plot using the following R command (parcoord() is in the MASS package).

parcoord(x = goblet2[,-1], col = col.w5,
         main = "Goblet parallel coordinate plot")

(d) Interpret the parallel plot.

(e) Run the principal component analysis using "cor = TRUE".

(f) Interpret the first three principal components.

(g) How many PCs would you suggest for the analysis of the goblet data? Justify your answer.

(h) Produce the scree plot for the goblet data and interpret your plot.

(i) Plot the first two PCs and interpret the plot.
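The pipeline for Problem III, from the shape-adjusted variables through the PCA plots, might look like the sketch below. Several pieces are assumptions: the measurements are random placeholders standing in for "Goblet.csv" (with the column names ID and x1 to x6 taken from the commands above), and the colour rule for col.w5 is hypothetical, since the assignment uses col.w5 without defining it.

```r
library(MASS)  # provides parcoord()

# Placeholder measurements only (NOT the real Goblet.csv)
set.seed(1)
n <- 25
goblet <- data.frame(ID = seq_len(n),
                     matrix(runif(n * 6, min = 5, max = 20), ncol = 6,
                            dimnames = list(NULL, paste0("x", 1:6))))

# Shape variables: every measurement divided by the height x3
goblet2 <- data.frame(ID = goblet$ID,
                      w1 = goblet$x1 / goblet$x3,
                      w2 = goblet$x2 / goblet$x3,
                      w4 = goblet$x4 / goblet$x3,
                      w5 = goblet$x5 / goblet$x3,
                      w6 = goblet$x6 / goblet$x3)

# Hypothetical colour rule: split the goblets at the median of w5
col.w5 <- ifelse(goblet2$w5 > median(goblet2$w5), "red", "blue")
parcoord(x = goblet2[, -1], col = col.w5,
         main = "Goblet parallel coordinate plot")

# PCA on the correlation matrix of the shape variables
pca <- princomp(goblet2[, -1], cor = TRUE)
print(summary(pca))                    # proportion of variance per component
print(unclass(pca$loadings))           # eigenvectors

# Scree plot, then the first two PC scores labelled by goblet ID
screeplot(pca, type = "lines", main = "Goblet scree plot")
plot(pca$scores[, 1:2], type = "n", main = "First two PCs")
text(pca$scores[, 1:2], labels = goblet2$ID)
```

Replacing the placeholder block with goblet <- read.csv("Goblet.csv") would run the same steps on the real data, provided the file uses those column names.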
