BIA B350F Assignment 1 (Part B)
Due date: 21 December 2020 (Monday) (Revised)
• Construct linear and logistic regression models to solve business prediction problems.
Instructions: (Marks will be deducted if you fail to follow the instructions below.)
• This assignment requires you to use R to fit and evaluate a logistic regression model
and to perform principal component analysis.
• The soft copy of the assignment (in Word) and the R program script must be uploaded to OLE
by the due date.
• Your analysis reports for the assignment must be uploaded to OLE (“Assignment 1 (Part B) -
Report”) for Turnitin checking. The R programs (.r file) must be uploaded to OLE
(“Assignment 1 (Part B) – R program”).
Question (100 marks)
A researcher wishes to predict the particulate matter (PM2.5) concentration in Beijing. An hourly
data set containing the PM2.5 data of the US Embassy in Beijing and the meteorological data from
Beijing Capital International Airport was collected from Jan 1st, 2010 to Dec 31st, 2014. The
attributes of the dataset are as follows:
Attribute   Description
No          Row number
year        Year of data in this row
month       Month of data in this row
day         Day of data in this row
hour        Hour of data in this row
pm2.5       PM2.5 concentration (µg/m³)
DEWP        Dew point (°C)
TEMP        Temperature (°C)
PRES        Pressure (hPa)
cbwd        Combined wind direction
lws         Cumulated wind speed (m/s)
ls          Cumulated hours of snow
lr          Cumulated hours of rain
pm2.5 is the dependent variable. The data are stored in the file “Bejing-PM25.csv”, which can be
downloaded from the OLE. Missing data are denoted as NA.
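A minimal sketch of how a file in this layout could be read so that NA entries are treated as missing values. The five rows below are made-up illustrative values written to a temporary file, not the real data set, and the sketch assumes the CSV header matches the attribute names listed above.

```r
# Made-up rows in the layout of the attribute table (NOT the real data).
csv_text <- "No,year,month,day,hour,pm2.5,DEWP,TEMP,PRES,cbwd,lws,ls,lr
1,2010,1,1,0,NA,-21,-11,1021,NW,1.79,0,0
2,2010,1,1,1,NA,-21,-12,1020,NW,4.92,0,0
3,2010,1,2,0,129,-16,-4,1020,SE,1.79,0,0
4,2010,1,2,1,148,-15,-4,1020,SE,2.68,0,0
5,2010,1,2,2,159,-11,-5,1021,SE,3.57,0,0"
path <- tempfile(fileext = ".csv")
writeLines(csv_text, path)

# For the assignment, point read.csv() at "Bejing-PM25.csv" instead of `path`.
# na.strings = "NA" turns NA entries into missing values;
# stringsAsFactors = TRUE reads the wind direction (cbwd) as a factor.
pm <- read.csv(path, na.strings = "NA", stringsAsFactors = TRUE)

str(pm)                          # column types
na_per_column <- colSums(is.na(pm))
print(na_per_column)             # number of missing values in each column

# One simple (assumed) treatment: keep only the complete rows.
pm_clean <- pm[complete.cases(pm), ]
```

Dropping incomplete rows is only one possible treatment of missing data; whichever treatment is used should be justified in the report.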
(a) Use R to build a multiple linear regression model that predicts pm2.5, using forward stepwise
regression to decide which of the other given variables should be included as independent
variables. You are expected to perform relevant model checking, including plotting relevant
graphs, after the desired model is formulated. All R programs must be included in the answer;
marks will be deducted otherwise. (60 marks)
Specifically, you have to perform the following analysis/modeling:
• Descriptive analysis and normality checking – 10 marks
• Correlation analysis – 10 marks
• Data cleansing: checking for missing data and outliers – 10 marks
• Developing basic regression model and performing residual diagnostics – 10 marks
• Improving model by transforming variables (include residual diagnostics) – 10 marks
• Using stepwise regression to develop the final model – 10 marks
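The six steps above can be sketched in base R as follows. This is an outline under stated assumptions, not a model answer: the data are synthetic stand-ins with made-up values and cbwd levels, the log transformation is only one candidate transformation, and names such as pm_clean, log_fit and final_fit are illustrative.

```r
# Synthetic stand-in data (NOT the real Beijing measurements). Column names
# follow the attribute table; the cbwd levels and all numbers are made up.
set.seed(42)
n <- 500
pm <- data.frame(
  DEWP = rnorm(n, 2, 14),
  TEMP = rnorm(n, 12, 12),
  PRES = rnorm(n, 1016, 10),
  cbwd = factor(sample(c("NE", "NW", "SE"), n, replace = TRUE)),
  lws  = rexp(n, rate = 1 / 20)
)
pm$pm2.5 <- exp(4 + 0.03 * pm$DEWP - 0.04 * pm$TEMP - 0.01 * pm$lws +
                  rnorm(n, sd = 0.5))
pm$pm2.5[sample(n, 20)] <- NA          # inject some missing responses

# 1. Descriptive analysis and normality check.
print(summary(pm))
# shapiro.test() accepts 3 to 5000 values, so sample if the data are larger.
obs <- na.omit(pm$pm2.5)
sw <- shapiro.test(if (length(obs) > 5000) sample(obs, 5000) else obs)
print(sw)

# 2. Correlation analysis on the numeric columns.
num_cols <- sapply(pm, is.numeric)
cor_mat <- cor(pm[, num_cols], use = "complete.obs")
print(round(cor_mat, 2))

# 3. Cleansing: drop rows with NA, then flag outliers by the boxplot rule.
pm_clean <- na.omit(pm)
outliers <- boxplot.stats(pm_clean$pm2.5)$out

# 4. Basic model, with diagnostic plots saved to a PDF file.
base_fit <- lm(pm2.5 ~ DEWP + TEMP + PRES + cbwd + lws, data = pm_clean)
plot_file <- file.path(tempdir(), "diagnostics.pdf")
pdf(plot_file)
hist(pm_clean$pm2.5, main = "pm2.5", xlab = "pm2.5")
plot(base_fit, which = 1:2)            # residuals vs fitted, normal Q-Q

# 5. Transformed response (log is an assumption; it needs pm2.5 > 0).
log_fit <- lm(log(pm2.5) ~ DEWP + TEMP + PRES + cbwd + lws, data = pm_clean)
plot(log_fit, which = 1:2)
dev.off()

# 6. Forward stepwise selection by AIC, starting from the intercept-only model.
null_fit <- lm(log(pm2.5) ~ 1, data = pm_clean)
final_fit <- step(null_fit,
                  scope = list(lower = ~ 1,
                               upper = ~ DEWP + TEMP + PRES + cbwd + lws),
                  direction = "forward", trace = 0)
print(summary(final_fit))
```

With the real data, the candidate variables in the upper scope, the treatment of outliers and the choice of transformation all need to be justified from the descriptive and diagnostic results rather than copied from this sketch.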
(b) Perform relevant hypothesis testing to assess the validity of the multiple linear regression model
obtained as well as the validity of individual regression coefficients. (15 marks)
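A minimal sketch of where the two kinds of test in (b) can be read from a fitted lm object: the overall F-test for the model and the t-tests for the individual coefficients. The data are synthetic stand-ins and the model formula is an assumption for illustration; in the assignment the tests would be applied to the final model from part (a).

```r
# Synthetic stand-in data; variable names follow the attribute table, and
# PRES is generated with no true effect on the response.
set.seed(7)
n <- 300
d <- data.frame(DEWP = rnorm(n, 2, 14),
                TEMP = rnorm(n, 12, 12),
                PRES = rnorm(n, 1016, 10))
d$log_pm <- 4 + 0.03 * d$DEWP - 0.04 * d$TEMP + rnorm(n, sd = 0.5)

fit <- lm(log_pm ~ DEWP + TEMP + PRES, data = d)
s <- summary(fit)

# Overall F-test. H0: all slope coefficients are zero.
f <- s$fstatistic                      # named vector: value, numdf, dendf
p_overall <- unname(pf(f["value"], f["numdf"], f["dendf"],
                       lower.tail = FALSE))
print(p_overall)

# Individual t-tests. H0: beta_j = 0, given the other predictors in the model.
coef_table <- coef(s)                  # Estimate, Std. Error, t value, Pr(>|t|)
p_values <- coef_table[, "Pr(>|t|)"]
print(coef_table)

# 95% confidence intervals for the coefficients.
ci <- confint(fit, level = 0.95)
print(ci)
```

For a factor such as cbwd, which enters the model as several dummy variables, drop1(fit, test = "F") gives one partial F-test for the whole term, which may be easier to report than the separate t-tests for each level.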
(c) Interpret the regression coefficients of the model. (10 marks)
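If the final model uses a log-transformed response (an assumption; the assignment does not prescribe a transformation), a slope no longer reads as a change in µg/m³. A small sketch of the conversion, using a hypothetical slope value rather than an estimate from the real data:

```r
# For a model of the form log(y) ~ x, a one-unit increase in x multiplies y
# by exp(beta), i.e. changes y by 100 * (exp(beta) - 1) percent, with the
# other predictors held fixed.
pct_change <- function(beta) 100 * (exp(beta) - 1)

beta_temp <- -0.04                     # hypothetical slope for TEMP
print(pct_change(beta_temp))           # roughly a 3.9% decrease per degree
```

For an untransformed response the slope is read directly: the expected change in pm2.5, in µg/m³, for a one-unit increase in the predictor with the other predictors held fixed.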
(d) Write a reflective journal to summarize your learning experience in applying knowledge and
skills acquired in the course to build the regression model for the given problem and how this
experience could enrich your ability to apply course knowledge to real-life applications.