Date: 2021-04-18

ALY2010 Project 6

Instructor: Dr. Dee Chiluiza, PhD

Correlation, regression analysis and chi-square test

Overview and Rationale

This project will help you measure your understanding of basic concepts in analytics.

It will help you measure your skills in R, RStudio, and R Markdown.

It will help you measure your understanding of correlation, regression analysis, and the chi-square test.

It will help you measure your ability to apply critical thinking and make meaningful observations about the results of your data analysis.

Support file

Use the attached R Markdown file (Project_6_Template.Rmd) as a template to fill in your answers.

Assignment

Part 1. Title and Introduction

Prepare your report using R Markdown, and present it as an HTML file.

1. Title: Give your report a title.

2. Introduction: Write an informative introduction. It will measure your understanding of the topic and of the analytical processes used for data analysis.

Your introduction needs good information and good organization. This applies to any report you write. Present each topic in its own paragraph.

• Regression: Using your own words, discuss the significance of regression analysis. Provide a practical example from the financial or market industries.

• Chi-square: Using your own words, discuss the significance of chi-square tests and their application in industry.

Use Bluman as a reference. Also present at least one additional academic reference for each topic.

Part 2. Analysis section

Task 1. Correlation and regression analysis

1.1 Data set description. Run ?faithful in the console and read the documentation for the faithful data set, which is public. Then, using your own words, describe the data set.

1.2 What is the coefficient of correlation between eruptions and waiting? Create an object named: corr_coef =

1.3 Explain the meaning of the coefficient of correlation.

1.4 What is the coefficient of determination between eruptions and waiting? Create an object named: determ_coef =

1.5 Explain the meaning of the coefficient of determination.

1.6 Obtain the linear regression model for eruptions and waiting. Create an object named: Linear_reg =

1.7 Write the linear regression formula.

1.8 Present a scatter plot of eruptions versus waiting and, using the regression model you obtained in 1.6, add the regression line to the plot.

The plot should have a good title and good x- and y-axis labels, the data points must be drawn as triangles (check the pch codes), and the regression line must have a color.

Check this page for pch codes:

http://www.sthda.com/english/wiki/r-plot-pch-symbols-the-different-point-shapes-available-in-r

1.9 Describe the direction of the regression line and explain what it tells you about your data.

Important: The template Rmd file already contains an R chunk where you can enter your R code.
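The steps of Task 1 could be sketched roughly as follows. This is only a minimal illustration, not the expected answer: it assumes that eruptions is treated as the response and waiting as the predictor (the task does not state which is which), and the title, axis labels, and line color are placeholder choices.

```r
# Minimal sketch for Task 1 using the built-in faithful data set.
data(faithful)

# 1.2 Pearson coefficient of correlation between the two variables
corr_coef = cor(faithful$eruptions, faithful$waiting)

# 1.4 Coefficient of determination: for simple linear regression it is r squared
determ_coef = corr_coef^2

# 1.6 Linear model. ASSUMPTION: eruptions is the response, waiting the predictor
Linear_reg = lm(eruptions ~ waiting, data = faithful)

# 1.8 Scatter plot with filled triangles (pch = 17) and a colored regression line.
# The title, labels, and color below are placeholders.
plot(faithful$waiting, faithful$eruptions,
     pch  = 17,
     main = "Old Faithful: eruption duration vs. waiting time",
     xlab = "Waiting time (min)",
     ylab = "Eruption duration (min)")
abline(Linear_reg, col = "red", lwd = 2)
```

The intercept and slope needed for the formula in 1.7 can be read with coef(Linear_reg). Because every result is assigned to a name, the chunk prints nothing except the plot.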

Task 2. Chi-square Goodness-of-fit test

Customers per day in store. Imagine that you own a store and want to know whether the number of customers visiting it differs from day to day, Monday through Saturday. To answer this question, you collect data for three weeks. Your data are the following:

Prepare a single R chunk for all your code. Remember to assign a name to every object so that its output is not displayed in your report; you will present your answers using inline R code.

Important: all the following tasks must be prepared in the same R chunk.

You will apply the following formula: χ² = Σ (O − E)² / E, where O is an observed value and E is the corresponding expected value.

See M12 Lecture ChiSQ.pptx; I modified slides 10, 11, and 12.

2.1 Create vectors to enter the data for each day.

2.2 Create an object named table1 that holds the data as a matrix. If your matrix is correct, it should look like this:

Do not present this table in your report; just create the object: table1 = matrix()

2.3 Create a vector for the days of the week:

days = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

2.4 Create a vector for the week numbers:

weeks = c("Week1", "Week2", "Week3")

2.5 Read my R file “Vectors and Matrices. R”.

Use the vector days to name the rows and the vector weeks to name the columns.

2.6 Transform table1 into a data frame. If your table is correct, it should look like this:

Do not present this table in your report; just create the object: table1 = data.frame()

2.7 Read my R Markdown file “1 Calculated_field. Rmd”.

Create a new object named table1a and use the mutate() function to create a new column holding the mean of Week1, Week2, and Week3. Name this column Observed.

In the same mutate() call, create a new column named Expected. This is the sum of the observed values divided by the number of days, so it is the same in every row. Remember: if customers have no day preference, the number of customers visiting the store is the same each day.

In the same mutate() call, create a new column named OmE that holds Observed minus Expected.

In the same mutate() call, create a new column named "(OmE)^2" that holds the square of OmE. This name needs quotation marks because it contains special characters.

In the same mutate() call, create a new column named "((OmE)^2)/E" that divides (OmE)^2 by the Expected values. This name also needs quotation marks because it contains special characters.

Your table should now look like this:

I hid the values; you must calculate them.

Important: Row names are sometimes lost when applying mutate(). If this happens, use pipes (%>%) to process the data and use rownames_to_column() to fix the row names. The following outline will help you with this strategy:

table1a = table1 %>%
  rownames_to_column('Days') %>%
  mutate( )

2.8 Create an object named chisq_value that holds the chi-square test value. It must be calculated from the table you created. Remember that it is the sum of all (O-E)^2/E values, the last column you created.

2.9 Create an object named alpha that holds the value α = 0.01.

2.10 Create an object named df that holds the degrees of freedom.

2.11 Create an object named cv that holds the critical value of your test.

2.12 Create an object named table1b that formats your table using knitr::kable(). Make sure to display only one decimal place.

Important:

At this point, if you knit your document, the R chunk should produce no output boxes (white boxes), because you assigned names to all your values, calculations, vectors, and tables.
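Steps 2.1 to 2.12 could be sketched in a single chunk along the following lines. The customer counts below are hypothetical placeholders, because the data table is not reproduced in this text, and they must be replaced with the real values. The sketch also assumes that the dplyr, tibble, and knitr packages are installed, and that the degrees of freedom are the number of days minus one.

```r
suppressPackageStartupMessages(library(dplyr))
library(tibble)   # provides rownames_to_column()

# 2.1 HYPOTHETICAL counts for Week1, Week2, Week3; replace with the real data
monday    = c(52, 48, 50)
tuesday   = c(45, 47, 43)
wednesday = c(55, 53, 57)
thursday  = c(60, 58, 62)
friday    = c(70, 72, 68)
saturday  = c(80, 78, 82)

# 2.2 One row per day, one column per week
table1 = matrix(c(monday, tuesday, wednesday, thursday, friday, saturday),
                nrow = 6, byrow = TRUE)

# 2.3 to 2.5 Name the rows and columns
days  = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
weeks = c("Week1", "Week2", "Week3")
rownames(table1) = days
colnames(table1) = weeks

# 2.6 Convert the matrix to a data frame
table1 = data.frame(table1)

# 2.7 Build all calculated columns in one mutate() call
table1a = table1 %>%
  rownames_to_column("Days") %>%
  mutate(Observed      = (Week1 + Week2 + Week3) / 3,
         Expected      = sum(Observed) / n(),     # same value in every row
         OmE           = Observed - Expected,
         "(OmE)^2"     = OmE^2,
         "((OmE)^2)/E" = `(OmE)^2` / Expected)    # backticks refer to the quoted name

# 2.8 to 2.11 Test statistic, significance level, degrees of freedom, critical value
chisq_value = sum(table1a$`((OmE)^2)/E`)
alpha = 0.01
df = nrow(table1a) - 1          # ASSUMPTION: number of days minus one
cv = qchisq(1 - alpha, df)      # right-tailed critical value

# 2.12 Formatted table with one decimal place
table1b = knitr::kable(table1a, digits = 1)
```

Every result is stored in a named object, so knitting this chunk displays nothing; the values only appear where they are referenced later in the report.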

Using inline R code, written between backticks (`r `), complete the following:

2.13 α = `r `

2.14 Critical value = `r `

2.15 Chi-square value = `r `

2.16 Is the chi-square value higher than the critical value? = `r `

2.17 Based on the answer obtained in 2.16, do you have enough evidence to reject H0?

2.18 Present your table1b here: `r `

Important: The template Rmd file already contains an R chunk where you can enter your R code.
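The expressions placed inside the inline code for 2.13 to 2.17 could look something like the sketch below. The test statistic here is a hypothetical placeholder rather than a result from the assignment data, and the helper names cv_text, chisq_text, is_higher, and decision are illustrative only.

```r
# Placeholder objects standing in for the ones created in the Task 2 chunk
alpha = 0.01
df = 5                        # six days minus one
cv = qchisq(1 - alpha, df)    # right-tailed critical value
chisq_value = 14.17           # HYPOTHETICAL test statistic, for illustration only

# Expressions that could be written between the backticks, after the letter r
cv_text    = round(cv, 2)          # e.g. for 2.14
chisq_text = round(chisq_value, 2) # e.g. for 2.15
is_higher  = chisq_value > cv      # e.g. for 2.16, evaluates to TRUE or FALSE

# One possible wording for 2.17, derived from the comparison above
decision = ifelse(is_higher, "Reject H0", "Do not reject H0")
```

In the report text, each expression would be written after the r inside the backticks, for example round(cv, 2) for the critical value, so that the knitted HTML shows the computed number instead of the code.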

Part 3. Conclusions

Write your conclusions.

Part 4. Bibliography

Write your Bibliography section to present all your references.

Due date

Tuesday April 20 at 11:59 PM

Grade

50 points.
