BIA B350F Assignment 1
Weighting: 30%
Due date: 10 December 2021 (Friday)
Learning outcomes:
Apply matrix algebra to summarize and process multivariate data.
Construct linear and logistic regression models to solve business prediction problems.
Instructions: (Marks will be deducted if you fail to follow the instructions below.)
When answering the questions, show clearly the steps you take to arrive at your
solutions. Keep at least four decimal places in the final answer for statistical computations
unless otherwise specified.
Except for Question 5, which requires you to use R, all questions must be answered
manually.
A soft copy of the handwritten or typed answers to Questions 1 to 4 and the analysis report for
Question 5 (in Word) must be uploaded to OLE by the due date. The R program for Question
5 must also be uploaded to “Assignment 1 – R program”. (Note: The assignment and R
program will be checked by Turnitin, and zero marks will be given for plagiarized work.)
Question 1 (8 marks)
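Let = $\begin{bmatrix} 4 & 2 \\ 3 & 4 \end{bmatrix}$, = $\begin{bmatrix} 4 & -2 \\ 1 & 2 \end{bmatrix}$, = $\begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}$ and = $\begin{bmatrix} -3 & 2 & 1 \end{bmatrix}'$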
Perform the following operations.
(a) + and ′ − (2 marks)
(b) ′ (2 marks)
(c) ′ (2 marks)
(d) |−1||−1| (2 marks)
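The operations in this question rest on a few standard identities, restated here for a generic 2 × 2 matrix and generic column vectors. The symbols $M$, $u$, $v$ and $a$, $b$, $c$, $d$ are illustrative labels only and are not the names used in the question.

$$
M = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \quad
M' = \begin{bmatrix} a & c \\ b & d \end{bmatrix}, \quad
|M| = ad - bc
$$

$$
M^{-1} = \frac{1}{|M|} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \ (|M| \neq 0), \quad
|M^{-1}| = \frac{1}{|M|}
$$

For two $n \times 1$ column vectors $u$ and $v$, the product $u'v$ is a scalar, whereas $uv'$ is an $n \times n$ matrix.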
Question 2 (10 marks)
Solve the following system of equations by the matrix method:
$3x_1 - x_2 + 2x_3 = -3$
$2x_1 + 5x_2 - x_3 = 8$
$-x_1 + 4x_2 - 3x_3 = 5$
(Note: zero marks will be given for non-matrix methods.)
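Assuming "matrix method" here means solving through the inverse of the coefficient matrix (the sheet does not spell this out), the general form can be sketched as follows. The labels $A$, $\mathbf{x}$ and $\mathbf{b}$ are introduced for illustration: $A$ collects the coefficients, $\mathbf{x}$ the unknowns and $\mathbf{b}$ the right-hand sides.

$$
A\mathbf{x} = \mathbf{b}
\quad \Longrightarrow \quad
\mathbf{x} = A^{-1}\mathbf{b},
\qquad
A^{-1} = \frac{1}{|A|}\,\operatorname{adj}(A), \quad |A| \neq 0
$$

Here $\operatorname{adj}(A)$ is the transpose of the cofactor matrix of $A$. A unique solution exists only when $|A| \neq 0$.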
Question 3 (10 marks)
Determine the eigenvalues and normalized eigenvectors of the matrix $\begin{bmatrix} -2 & -1 \\ 4 & 3 \end{bmatrix}$.
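The general procedure can be summarized as below. The label $A$ for the given matrix, $\lambda$ for an eigenvalue and $\mathbf{e}$ for an eigenvector are notation assumed here, since the sheet's own symbols are not legible.

$$
|A - \lambda I| = 0
\quad \text{(characteristic equation, solved for each } \lambda\text{)}
$$

$$
(A - \lambda I)\,\mathbf{e} = \mathbf{0}, \qquad
\mathbf{e}_{\text{normalized}} = \frac{\mathbf{e}}{\sqrt{\mathbf{e}'\mathbf{e}}}
$$

A normalized eigenvector has unit length and is unique only up to sign.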
Question 4 (12 marks)
Let $\Sigma = \begin{bmatrix} 16 & 2 & -3 \\ 2 & 9 & -9 \\ -3 & -9 & 4 \end{bmatrix}$ be the covariance matrix of the random vector $X = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}$.
(a) Determine $V^{1/2}$, $(V^{1/2})^{-1}$ and $\rho$. (6 marks)
(b) Find the covariance matrix for the linear combination $X_1 - 3X_2 + 2X_3$. (3 marks)
(c) Find the covariance matrix for the following linear combinations of $X_1$, $X_2$ and $X_3$:
$2X_1 - X_2 + X_3$
$X_1 + 3X_2 - 2X_3$
(3 marks)
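The quantities asked for in this question follow from standard results, sketched here under the assumption that part (a) refers to the standard deviation matrix, its inverse and the correlation matrix. The symbols $V^{1/2}$, $\rho$, $\mathbf{c}$ and $C$ are notation assumed for this sketch; $\sigma_{ii}$ denotes the $i$-th diagonal entry of $\Sigma$.

$$
V^{1/2} = \operatorname{diag}\left(\sqrt{\sigma_{11}},\ \sqrt{\sigma_{22}},\ \sqrt{\sigma_{33}}\right),
\qquad
\rho = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1}
$$

$$
\operatorname{Var}(\mathbf{c}'X) = \mathbf{c}'\,\Sigma\,\mathbf{c},
\qquad
\operatorname{Cov}(CX) = C\,\Sigma\,C'
$$

For a single linear combination, $\mathbf{c}$ is the column vector of its coefficients. For several combinations, each row of $C$ holds the coefficients of one combination.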
Question 5 (60 marks)
A researcher wishes to predict human wine taste preferences from analytical tests that are easily
available at the certification step. The dataset “winequality-red.csv” contains 1599 red wine
samples (from Portugal). The independent variables are the values of the following 11
physicochemical tests:
1. Fixed acidity (g(tartaric acid)/dm³)
2. Volatile acidity (g(acetic acid)/dm³)
3. Citric acid (g/dm³)
4. Residual sugar (g/dm³)
5. Chlorides (g(sodium chloride)/dm³)
6. Free sulfur dioxide (mg/dm³)
7. Total sulfur dioxide (mg/dm³)
8. Density (g/cm³)
9. pH
10. Sulphates (g(potassium sulphate)/dm3)
11. Alcohol (vol.%)
The dependent variable ‘Quality’ is sensory data (median of at least 3 evaluations made by wine
experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent). (Note:
The dependent variable is ordinal but an “ordinal approximation of a continuous variable” can be made
for ordinal variables with five or more categories. The “winequality-red.csv” dataset can be
downloaded from the OLE.)
(a) Use R to build a multiple linear regression model that predicts the quality of red wine, using
forward stepwise regression to decide which of the given independent variables to include in
the model. You are expected to perform relevant model checking, including plotting relevant
graphs, after the desired model is formulated. All R programs must be included in the answer;
marks will be deducted otherwise.
(40 marks)
Specifically, you have to perform the following analysis/modeling:
Descriptive analysis and normality checking
Correlation analysis
Cleansing data: missing data and outliers checking
Developing basic regression model and performing residual diagnostics
Improving model by transforming variables (include residual diagnostics)
Using stepwise regression to develop the final model
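The last step in the list above, forward stepwise selection, can be sketched in R as follows. This is a minimal illustration rather than a prescribed solution: the helper name `forward_select` is hypothetical, selection uses the AIC criterion that `step()` applies by default, and the commented usage lines assume the file is comma-separated with a response column named `quality` (some copies of this dataset are semicolon-separated and would need `sep = ";"`).

```r
# Forward stepwise selection by AIC, starting from the intercept-only model.
# `data` is a data frame; `response` is the name of the dependent variable.
# (forward_select is a hypothetical helper name used for illustration.)
forward_select <- function(data, response) {
  predictors <- setdiff(names(data), response)

  # Smallest model (intercept only) and largest model (all predictors).
  null_formula <- reformulate("1", response = response)
  full_formula <- reformulate(predictors, response = response)

  null_model <- lm(null_formula, data = data)

  # step() adds one predictor at a time while the AIC keeps decreasing.
  step(null_model,
       scope = list(lower = null_formula, upper = full_formula),
       direction = "forward",
       trace = 0)
}

# Usage (assumed file layout and column name):
# wine <- read.csv("winequality-red.csv")
# fit  <- forward_select(wine, "quality")
# summary(fit)
# plot(fit, which = 1:2)   # residuals vs fitted, normal Q-Q plot
```

Starting from the intercept-only model and giving `step()` an explicit upper scope is what makes the search "forward"; without the upper scope there would be nothing for it to add.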
(b) Perform relevant hypothesis testing to assess the validity of the multiple linear regression model
obtained as well as the validity of individual regression coefficients. (5 marks)
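The two kinds of test mentioned in (b), an overall F-test for the model and t-tests for individual coefficients, can both be read from `summary()` of a fitted `lm` object. The sketch below is illustrative only: `model_tests` is a hypothetical helper name, and the built-in `mtcars` data stands in for the wine data so that the example runs on its own.

```r
# Collect the overall F-test and the per-coefficient t-tests of a fitted lm.
# (model_tests is a hypothetical helper name used for illustration.)
model_tests <- function(fit) {
  s <- summary(fit)
  f <- s$fstatistic  # named vector: value, numdf, dendf

  list(
    f_statistic = unname(f["value"]),
    # Upper-tail probability of the F distribution: H0 is that all slopes are 0.
    f_p_value = unname(pf(f["value"], f["numdf"], f["dendf"],
                          lower.tail = FALSE)),
    # Columns: Estimate, Std. Error, t value, Pr(>|t|); H0 per row: coefficient is 0.
    coefficients = s$coefficients
  )
}

# Stand-in example on built-in data (not the wine dataset):
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)
demo_tests <- model_tests(demo_fit)
print(demo_tests$f_p_value)
print(demo_tests$coefficients)
```

A small F-test p-value indicates that at least one predictor is useful, while each row's `Pr(>|t|)` assesses that predictor given the others in the model.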
(c) Interpret the regression coefficients of the model. (5 marks)
(d) Write a reflective journal of not more than 200 words that summarizes your learning experience
in applying the knowledge and skills acquired in the course to build the regression model for the
given problem, and that explains how this experience could enrich your ability to apply course
knowledge to real-life applications. (10 marks)
