R Studio ghostwriting: B350F Assignment 1

Date: 2020-12-16


BIA B350F Assignment 1 (Part B)

Weighting: 18%

Due date: 21 December 2020 (Monday) (Revised)

Learning outcome:

• Construct linear and logistic regression models to solve business prediction problems.

Instructions: (Marks will be deducted if you fail to follow the instructions below.)

• This assignment requires you to use R to determine and evaluate a logistic regression model and perform principal component analysis.

• The soft copy of the assignment (in Word) and the R program script must be uploaded to OLE by the due date.

• Your analysis report for the assignment must be uploaded to OLE (“Assignment 1 (Part B) - Report”) for Turnitin checking. The R program (.r file) must be uploaded to OLE (“Assignment 1 (Part B) – R program”).

Question (100 marks)

A researcher wishes to predict the particulate matter (PM2.5) concentration in Beijing. An hourly data set containing the PM2.5 readings from the US Embassy in Beijing and the meteorological data from Beijing Capital International Airport was collected from Jan 1st, 2010 to Dec 31st, 2014. The attributes of the data set are as follows:

Attribute   Description
No          Row number
year        Year of data in this row
month       Month of data in this row
day         Day of data in this row
hour        Hour of data in this row
pm2.5       PM2.5 concentration (µg/m³)
DEWP        Dew point (°C)
TEMP        Temperature (°C)
PRES        Pressure (hPa)
cbwd        Combined wind direction
lws         Cumulated wind speed (m/s)
ls          Cumulated hours of snow
lr          Cumulated hours of rain

pm2.5 is the dependent variable. The data are stored in the file “Bejing-PM25.csv”, which can be downloaded from the OLE. Missing data are denoted as NA.
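As a rough sketch of how data in this layout could be loaded with NA treated as missing: the snippet below writes a three-row stand-in file to a temporary path (the rows are made up, only the column names come from the attribute table) and reads it back. With the real file, `demo_path` would be replaced by the path to the downloaded CSV.

```r
# Write a tiny stand-in file so the sketch runs without the real data.
# Column names follow the attribute table; the values are invented.
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,lws,ls,lr",
  "1,2010,1,1,0,NA,-21,-11,1021,NW,1.79,0,0",
  "2,2010,1,1,1,129,-16,-4,1020,SE,1.79,0,0",
  "3,2010,1,1,2,148,-15,-4,1020,SE,2.68,0,0"
), demo_path)

# na.strings = "NA" is already the default; it is spelled out for clarity.
pm <- read.csv(demo_path, na.strings = "NA", stringsAsFactors = FALSE)

# Treat wind direction as a categorical predictor.
pm$cbwd <- factor(pm$cbwd)

# Count missing values per column, then keep only complete rows for modeling.
na_counts <- colSums(is.na(pm))
pm_complete <- pm[complete.cases(pm), ]

print(na_counts)
```

Dropping incomplete rows is only one possible choice; imputation is an alternative that the brief leaves open.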

(a) Use R to determine a multiple linear regression model to predict pm2.5, using forward stepwise regression to decide which of the other given variables should be included as independent variables. You are expected to perform relevant model checking, including plotting relevant graphs, after the desired model is formulated. All R programs must be included in the answer; marks will be deducted otherwise. (60 marks)

Specifically, you have to perform the following analysis/modeling:


• Descriptive analysis and normality checking – 10 marks

• Correlation analysis – 10 marks

• Cleansing data: missing data and outliers checking – 10 marks

• Developing basic regression model and performing residual diagnostics – 10 marks

• Improving model by transforming variables (include residual diagnostics) – 10 marks

• Using stepwise regression to develop the final model – 10 marks
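The six steps listed above could be sketched roughly as follows. This is a minimal illustration on synthetic data, not a model answer: the column names follow the attribute table, but the values, the 1.5 × IQR outlier rule, the log transformation, and AIC as the stepwise criterion are all assumptions made for the example.

```r
set.seed(1)
n <- 500

# Synthetic stand-in data: names follow the attribute table, values are invented.
dat <- data.frame(
  DEWP = rnorm(n, 2, 14),
  TEMP = rnorm(n, 12, 12),
  PRES = rnorm(n, 1016, 10),
  lws  = rexp(n, 1 / 20)
)
dat$pm2.5 <- exp(4 + 0.03 * dat$DEWP - 0.04 * dat$TEMP - 0.005 * dat$lws +
                   rnorm(n, 0, 0.5))
dat$pm2.5[sample(n, 20)] <- NA   # inject some missing values

# Send plots to a null device so the sketch also runs non-interactively;
# drop this line (and dev.off() at the end) when working in RStudio.
pdf(NULL)

# 1. Descriptive analysis and normality checking.
print(summary(dat))
hist(dat$pm2.5, main = "pm2.5")
# shapiro.test() accepts 3 to 5000 values, so the full hourly series
# would need a random sample or a Q-Q plot instead.
shapiro_raw <- shapiro.test(dat$pm2.5)

# 2. Correlation analysis on pairwise complete observations.
cor_mat <- cor(dat, use = "pairwise.complete.obs")

# 3. Cleansing: drop rows with missing values, flag high outliers (1.5 * IQR rule).
clean <- na.omit(dat)
q <- unname(quantile(clean$pm2.5, c(0.25, 0.75)))
is_outlier <- clean$pm2.5 > q[2] + 1.5 * (q[2] - q[1])

# 4. Basic regression model and residual diagnostics.
fit_raw <- lm(pm2.5 ~ DEWP + TEMP + PRES + lws, data = clean)
plot(fit_raw, which = 1:2)   # residuals vs fitted, normal Q-Q

# 5. Transform the response and repeat the residual diagnostics.
fit_log <- lm(log(pm2.5) ~ DEWP + TEMP + PRES + lws, data = clean)
plot(fit_log, which = 1:2)

# 6. Forward stepwise selection by AIC, starting from the intercept-only model.
null_fit <- lm(log(pm2.5) ~ 1, data = clean)
final_fit <- step(null_fit,
                  scope = list(lower = ~ 1, upper = ~ DEWP + TEMP + PRES + lws),
                  direction = "forward", trace = 0)
print(formula(final_fit))

dev.off()
```

Forward selection starts from the intercept-only model and adds one variable at a time, so the `upper` formula in `scope` defines the full set of candidates it may consider.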

(b) Perform relevant hypothesis testing to assess the validity of the multiple linear regression model obtained as well as the validity of individual regression coefficients. (15 marks)
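One way to read part (b) is as an overall F-test (null hypothesis: all slopes are zero) plus a t-test for each coefficient (null hypothesis: that coefficient is zero). The sketch below shows where R reports both, using synthetic data and a hypothetical two-predictor model on a log-scale response named `log_pm`; none of the numbers relate to the real data set.

```r
set.seed(2)
n <- 200

# Synthetic stand-in data; log_pm is a made-up log-scale response.
d <- data.frame(TEMP = rnorm(n, 12, 12), lws = rexp(n, 1 / 20))
d$log_pm <- 4 - 0.04 * d$TEMP - 0.005 * d$lws + rnorm(n, 0, 0.5)

fit <- lm(log_pm ~ TEMP + lws, data = d)
s <- summary(fit)

# Overall validity: F-test of H0 "every slope equals zero".
f <- s$fstatistic
f_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

# Individual coefficients: t-tests of H0 "beta_j = 0".
coef_table <- s$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
signif_5pct <- coef_table[, "Pr(>|t|)"] < 0.05

# The same overall F-test, written as a comparison of nested models.
nested <- anova(lm(log_pm ~ 1, data = d), fit)

print(coef_table)
print(nested)
```

A p-value below the chosen significance level (5% here, as an assumption) leads to rejecting the corresponding null hypothesis.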

(c) Interpret the regression coefficients of the model. (10 marks)
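If the final model uses a log-transformed response, which is an assumption here and not something the brief prescribes, each slope is easier to interpret after converting it to an approximate percentage change in pm2.5. The sketch below illustrates this on synthetic data; the `cbwd` level names are illustrative, and the first level listed serves as the baseline category.

```r
set.seed(3)
n <- 300

# Synthetic stand-in data; the wind-direction level names are illustrative.
wind_levels <- c("NE", "NW", "SE", "cv")
d <- data.frame(
  TEMP = rnorm(n, 12, 12),
  cbwd = factor(sample(wind_levels, n, replace = TRUE), levels = wind_levels)
)
d$pm2.5 <- exp(4 - 0.04 * d$TEMP + 0.5 * (d$cbwd == "SE") + rnorm(n, 0, 0.5))

fit <- lm(log(pm2.5) ~ TEMP + cbwd, data = d)
b <- coef(fit)

# With log(y) as the response, a one-unit increase in a predictor multiplies
# the typical pm2.5 by exp(b), holding the other predictors fixed.
pct_change <- (exp(b[-1]) - 1) * 100

# 95% confidence intervals on the same percentage scale.
ci_pct <- (exp(confint(fit)[-1, ]) - 1) * 100

# TEMP: percentage change in pm2.5 per extra degree Celsius.
# cbwdNW, cbwdSE, cbwdcv: percentage difference relative to the baseline "NE".
print(round(cbind(pct_change, ci_pct), 1))
```

For a model fitted on the untransformed response, the coefficients are read directly as the change in pm2.5 (µg/m³) per unit change in the predictor.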

(d) Write a reflective journal to summarize your learning experience in applying knowledge and skills acquired in the course to build the regression model for the given problem and how this experience could enrich your ability to apply course knowledge to real-life applications. (15 marks)
