S261F Data Analytics with Applications
Tutorial Assignment 04 (2021)
Covers material from Weeks 9-11
Cut-off date: 05 May 2021
❖ Checklist for a successful assignment submission

Contents                          Check mark
Official name
Student ID
Solutions in numerical order

Note: if any checklist item is missing, the assignment will not be considered.
(i) You may submit a single PDF file containing all solutions in numerical order.
❖ You are advised to keep a copy of your assignment in
case of loss.
❖ This assignment carries a 7.5% weighting of your continuous assessment (for more
details, see the course book).
STAT S261F: Tutorial Assignment 04
Instructions:
▪ Please attempt the following questions in an R Markdown file and submit the output as
a PDF file.
▪ This assignment is based entirely on the “Heart failure clinical records Data Set”,
available from the UCI Machine Learning Repository at:
https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records#
Question 1 – 30 marks
a. After downloading and loading the data, assign category labels to each categorical
variable, using the descriptions given in the following table.
b. Perform an appropriate univariate analysis of the data, providing summaries and
plots.
c. Perform a bivariate analysis of all categorical variables against sex and against
death event. Moreover, report your findings on whether there is an association among
the responses.
[5+15+10=30]
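For part (c), association between two categorical variables is commonly checked with Pearson's chi-square test of independence. The sketch below, written in Python with only the standard library, computes the statistic for a 2 × 2 table of counts and its p-value on 1 degree of freedom. It assumes each categorical variable is binary, so every cross-tabulation is 2 × 2, and the counts in the usage line are made up for illustration, not taken from the dataset. Note that R's chisq.test() applies Yates' continuity correction to 2 × 2 tables by default, so its statistic will be smaller than this uncorrected one unless correct = FALSE is passed.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2 x 2 table of counts.

    Returns (statistic, p_value); no continuity correction is applied.
    Every row and column total must be non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = sex (female, male), columns = death event (no, yes).
stat, p = chi_square_2x2([[10, 20], [20, 10]])
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

A small p-value suggests the two variables are associated; for tables larger than 2 × 2 the degrees of freedom become (rows - 1) × (columns - 1) and the erfc shortcut no longer applies.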
Question 2 – 50 marks
a. For each numeric variable, draw an appropriate plot against sex and against death
event, to show the distribution of the variable within each category and to reveal any
outliers in the data.
b. The normal platelet count in a healthy person ranges from 150,000 to 400,000. In this
study, assuming that platelet count is normally distributed, a researcher wants to check
whether the average platelet count equals 300,000. Run the appropriate analysis and
state your conclusion.
c. Assume that serum creatinine is normally distributed. Test whether the average serum
creatinine is the same for women and men. Also test whether the average serum
creatinine is the same for deceased and surviving patients.
d. Assume that creatinine phosphokinase is not normally distributed. Test whether the
average creatinine phosphokinase is the same for women and men. Also test whether the
average creatinine phosphokinase is the same for deceased and surviving patients.
e. Assume that ejection fraction is normally distributed. Model the numeric relationship
of ejection fraction with serum creatinine and serum sodium. Report the fitted model and
interpret which of the two serum measurements has a statistically significant
relationship with ejection fraction.
[5 X 10=50]
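For part (a), box plots are the usual way to show each numeric variable's distribution by category and to flag outliers. The points a box plot marks as outliers are those beyond Tukey's fences, 1.5 × IQR below the first quartile or above the third. A minimal Python sketch of that rule follows, assuming quartiles computed by linear interpolation (statistics.quantiles with method="inclusive"); R's boxplot() uses Tukey's hinges, which can differ slightly from these quartiles in small samples. The values in the usage line are hypothetical.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's k * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Hypothetical measurements for one group (e.g. one sex or one death-event level).
print(tukey_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

To mirror a grouped box plot, the function would be applied separately within each level of sex and of death event.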
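Parts (b) and (c) call for t-tests: a one-sample t-test of the mean platelet count against 300,000, and two-sample t-tests comparing serum creatinine between sexes and between death-event groups. The Python sketch below computes the one-sample statistic and the Welch (unequal-variance) two-sample statistic with its Welch-Satterthwaite degrees of freedom. It stops short of the p-value, which would be read from the t distribution with those degrees of freedom; in R, t.test() performs both steps and uses the Welch form by default. The samples in the usage lines are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the sample mean
    return (statistics.fmean(sample) - mu0) / se, n - 1


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


print(one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], mu0=4))
print(welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```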
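Part (d) drops the normality assumption, so a rank-based test is the usual substitute for the two-sample t-test: the Wilcoxon rank-sum (Mann-Whitney U) test. Strictly, it compares the two groups' distributions (often read as a shift in location) rather than their means. The sketch below computes U from average ranks and a two-sided p-value from the normal approximation, with a tie correction and no continuity correction. R's wilcox.test() reports the same U as its W statistic, but for small samples without ties it computes an exact p-value instead. The samples in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks with ties given their average rank; also return tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        # Extend j over every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_u(x, y):
    """U statistic for x, normal-approximation z score and two-sided p-value."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Variance of U under H0, reduced for tied observations.
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```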
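Part (e) is a multiple linear regression of ejection fraction on serum creatinine and serum sodium. The sketch below fits y = b0 + b1·x1 + b2·x2 by ordinary least squares using centred sums of squares and returns each slope's standard error; a slope is judged significant when its t value, b / SE, is large relative to the t distribution with n - 3 degrees of freedom. In R, lm() with both predictors followed by summary() reports the coefficients, standard errors, t values and p-values together. The data in the usage line are hypothetical and constructed to lie exactly on a plane, so the fit should recover b0 = 1, b1 = 2, b2 = -3 with zero standard errors.

```python
import math
import statistics


def ols_two_predictors(x1, x2, y):
    """Fit y = b0 + b1*x1 + b2*x2 by least squares; return coefficients and slope SEs."""
    n = len(y)
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    # Solve the 2 x 2 normal equations for the slopes, then recover the intercept.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Residual variance on n - 3 degrees of freedom gives the slope standard errors.
    sse = sum((c - (b0 + b1 * a + b2 * b)) ** 2 for a, b, c in zip(x1, x2, y))
    sigma2 = sse / (n - 3)
    se1 = math.sqrt(sigma2 * s22 / det)
    se2 = math.sqrt(sigma2 * s11 / det)
    return (b0, b1, b2), (se1, se2)


coefs, ses = ols_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], [-3, 2, -5, 0, -4])
print(coefs, ses)
```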
Question 3 – 20 marks
A company, “ABC”, has been producing water purifiers for the last few decades. Recently,
it launched a new water purifier named nano-purifier, which uses nanotechnology to
remove unwanted particles. The company's CEO wants to assess the reputation of this
advanced product on social media and online shopping websites. Describe, in detail, the
text analysis steps the CEO could follow to learn about the reputation of the product.
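The usual steps of such a text analysis (collect posts and reviews, clean and tokenise the text, remove stop words, examine frequent terms, and score sentiment) can be sketched on a toy scale as below. This is a minimal illustration, not a prescribed method: the reviews, the stop-word list and the positive/negative word lists are hypothetical placeholders, whereas a real analysis would gather text through each platform's data export or API and use an established stop-word list and sentiment lexicon. Each review is labelled positive, negative or neutral by counting lexicon hits, and the labels are tallied alongside the most frequent terms.

```python
import re
from collections import Counter

# Hypothetical word lists; a real analysis would use a published stop-word list and lexicon.
STOPWORDS = {"a", "and", "but", "i", "is", "it", "the"}
POSITIVE = {"clean", "fresh", "great", "love"}
NEGATIVE = {"expensive", "leaks", "noisy"}


def tokenize(text):
    """Lower-case the text, split it into words and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def sentiment(tokens):
    """Label a review by the difference between positive and negative word counts."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def analyse(reviews):
    """Return term frequencies across all reviews and a count of sentiment labels."""
    token_lists = [tokenize(r) for r in reviews]
    term_freq = Counter(t for tokens in token_lists for t in tokens)
    labels = Counter(sentiment(tokens) for tokens in token_lists)
    return term_freq, labels


# Hypothetical reviews of the nano-purifier.
reviews = [
    "The water tastes fresh and clean, I love it!",
    "Great purifier but a bit expensive.",
    "It leaks and is noisy.",
    "Installed it yesterday.",
]
term_freq, labels = analyse(reviews)
print(labels)
print(term_freq.most_common(3))
```

A simple word-count lexicon misses negation ("not clean") and sarcasm, which is one reason a real study would validate the labels on a hand-coded sample before reporting them.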
END