
Date: 2021-04-28


S261F Data Analytics with Applications

Tutorial Assignment 04 (2021)

Covers material from Weeks 9-11

Cut-off date: 05 May 2021

❖ Checklist for the submission of a successful assignment

Contents                        Check Mark
Official name                   [ ]
Student ID                      [ ]
Solutions in numerical order    [ ]

Note: If any item on the checklist is missing, the assignment will not be considered.

(i) You may submit a single PDF file containing all solutions in numerical order.

❖ You are advised to keep a copy of your assignment in case of loss.

❖ This assignment carries a weighting of 7.5% of your continuous assessment (for more details, see the course book).

STAT S261F: Tutorial Assignment 04


Instructions:

▪ Please attempt the following questions in an R Markdown file and submit the output as a PDF file.

▪ This assignment is based entirely on the dataset named “Heart failure clinical records Data Set”, available from the UCI Machine Learning Repository. You can find the data at: https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records#

Question 1 – 30 marks

a. After downloading and loading the data, assign category labels to each categorical variable using the descriptions given in the following table.

b. Perform an appropriate univariate analysis of the data, providing summaries and plots.

c. Perform a bivariate analysis of all categorical variables with respect to sex and death event. Moreover, state your findings on whether there is an association among the responses.

[5+15+10=30]
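Part (a) relies on a category table that is not reproduced in this copy. The Python below is a minimal sketch of the recoding step. It assumes the 0/1 coding of the UCI CSV file, with columns `anaemia`, `diabetes`, `high_blood_pressure`, `smoking`, `sex` and `DEATH_EVENT`, and with `sex` taken to be 1 for men and 0 for women. The label strings are placeholders, and `BINARY_LABELS` and `label_record` are hypothetical names. In the R Markdown solution, the same step is normally done with `factor(x, levels = c(0, 1), labels = ...)`.

```python
# Placeholder labels: replace them with the descriptions from the
# assignment's category table. The 0/1 coding is an assumption.
BINARY_LABELS = {
    "anaemia": {0: "No", 1: "Yes"},
    "diabetes": {0: "No", 1: "Yes"},
    "high_blood_pressure": {0: "No", 1: "Yes"},
    "smoking": {0: "No", 1: "Yes"},
    "sex": {0: "Woman", 1: "Man"},
    "DEATH_EVENT": {0: "Survived", 1: "Died"},
}


def label_record(record):
    """Return a copy of one data row with 0/1 codes replaced by labels."""
    labelled = dict(record)
    for column, mapping in BINARY_LABELS.items():
        if column in labelled:
            # int() also accepts codes read from a CSV file as "0"/"1".
            labelled[column] = mapping[int(labelled[column])]
    return labelled


print(label_record({"age": 75, "anaemia": 0, "sex": 1, "DEATH_EVENT": 1}))
```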
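For part (b), a numeric variable is usually summarised by the statistics that R's `summary()` prints (minimum, quartiles, median, mean, maximum) plus the standard deviation. A categorical variable is usually summarised by a table of frequencies and proportions. Histograms, boxplots and bar charts are the matching plots. The standard-library Python sketch below computes only the tabular part, and its function names are hypothetical. `statistics.quantiles(..., method="inclusive")` gives the same quartiles as R's default quantile definition (type 7).

```python
import statistics
from collections import Counter


def numeric_summary(values):
    """Summary of a numeric variable, similar to R's summary() plus the SD.

    Quartiles use method="inclusive", which matches R's default
    quantile definition (type 7).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(values),
        "q3": q3,
        "max": max(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 divisor), like R's sd()
    }


def categorical_summary(values):
    """Count and proportion of each category of a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {level: (count, count / n) for level, count in counts.items()}


print(numeric_summary([1, 2, 3, 4, 5]))
print(categorical_summary(["Yes", "No", "Yes", "Yes"]))
```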
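For part (c), every categorical variable in this dataset is binary. Each cross-tabulation against sex or against death event is therefore a 2 x 2 table, and Pearson's chi-square test of independence has (2 - 1)(2 - 1) = 1 degree of freedom. The Python sketch below computes the uncorrected statistic and its p-value; the function name is hypothetical and the counts are made up. R's `chisq.test()` applies Yates' continuity correction to 2 x 2 tables by default, so its statistic will be somewhat smaller unless `correct = FALSE` is given. When expected counts are small (a common rule of thumb is below 5), `fisher.test()` is the usual alternative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    No continuity correction is applied. Returns (statistic, p_value)
    for 1 degree of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # With 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = smoker yes/no, columns = died/survived.
print(chi_square_2x2([[30, 10], [10, 30]]))
```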

Question 2 – 50 marks

a. Draw an appropriate plot of each numeric variable against sex and against death event, to show the distribution of the numeric variable within each category and to reveal any outliers in the data.


b. The normal platelet count in a healthy person ranges from 150,000 to 400,000. In this study, assuming a normal distribution, a researcher wants to check whether the average platelet count equals 300,000. Run the appropriate analysis and state your conclusion.

c. Assume that serum creatinine is normally distributed. Test whether the average serum creatinine is the same in women and men. Also test whether the average serum creatinine is the same in patients who died and patients who survived.

d. Assume that creatinine phosphokinase is not normally distributed. Test whether the average creatinine phosphokinase is the same in women and men. Also test whether the average creatinine phosphokinase is the same in patients who died and patients who survived.

e. Assume that ejection fraction is normally distributed. Find the numerical relationship between ejection fraction and both serum creatinine and serum sodium. Provide the fitted model and interpret which of the two serum measurements has a statistically significant relationship with ejection fraction.

[5 X 10=50]
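Part (a) is usually answered with side-by-side boxplots. In R, for example, `boxplot(serum_creatinine ~ sex, data = heart)` draws one, where `heart` is a hypothetical name for the loaded data frame. The points a boxplot flags as outliers follow Tukey's rule: values more than 1.5 x IQR beyond the quartiles. The standard-library Python sketch below applies that rule within each category; its function names are hypothetical. R's `boxplot()` computes its hinges with `fivenum()`, so for some samples its fences differ slightly from these type-7 quartiles.

```python
import statistics
from collections import defaultdict


def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def outliers_by_group(values, groups, k=1.5):
    """Apply the IQR rule separately within each category (e.g. sex)."""
    grouped = defaultdict(list)
    for value, group in zip(values, groups):
        grouped[group].append(value)
    return {group: iqr_outliers(vals, k) for group, vals in grouped.items()}


# Toy data: one extreme value in group "a".
print(outliers_by_group([1, 2, 3, 4, 5, 100, 10, 11, 12, 13],
                        ["a"] * 6 + ["b"] * 4))
```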
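Part (b) is a one-sample t-test of H0: μ = 300,000 against the two-sided alternative H1: μ ≠ 300,000. It uses t = (x̄ - μ0)/(s/√n) with n - 1 degrees of freedom, and in R it is `t.test(platelets, mu = 300000)`. The Python sketch below repeats the arithmetic with the standard library only. The p-value uses the identity P(|T| ≥ |t|) = I_{ν/(ν + t²)}(ν/2, 1/2), where I is the regularized incomplete beta function, evaluated here with a Numerical-Recipes-style continued fraction (modified Lentz). The helper names are hypothetical, and the toy data are not platelet counts.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Each iteration applies one even and one odd term of the fraction.
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _beta_inc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return _beta_inc(df / 2.0, 0.5, df / (df + t * t))


def one_sample_t_test(values, mu0):
    """Two-sided one-sample t-test of H0: population mean == mu0."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.fmean(values) - mu0) / se
    return t, n - 1, t_two_sided_p(t, n - 1)


# Toy values, not platelet counts: returns (t, df, p).
print(one_sample_t_test([2, 4, 6, 8, 10], mu0=3))
```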
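Part (c) compares two group means under normality. A common choice is Welch's two-sample t-test, which does not assume equal variances. It is what R's `t.test(serum_creatinine ~ sex, data = heart)` runs by default (`var.equal = FALSE`), where `heart` is a hypothetical name for the loaded data frame. The same call with `DEATH_EVENT` in place of `sex` gives the second comparison. The self-contained Python sketch below computes the Welch statistic, the Welch-Satterthwaite degrees of freedom and the two-sided p-value, using the regularized incomplete beta function for the t tail probability. The function names are hypothetical.

```python
import math
import statistics


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _beta_inc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t distribution (df may be non-integer)."""
    return _beta_inc(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x, y):
    """Two-sided Welch two-sample t-test of H0: mean(x) == mean(y)."""
    v1 = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    v2 = statistics.variance(y) / len(y)
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t, df, t_two_sided_p(t, df)


# Toy groups (not serum creatinine values): returns (t, df, p).
print(welch_t_test([2, 4, 6, 8, 10], [1, 2, 3, 4, 5]))
```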
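For part (d), the usual non-parametric alternative is the Wilcoxon rank-sum (Mann-Whitney) test, run in R as `wilcox.test(creatinine_phosphokinase ~ sex, data = heart)`, where `heart` is a hypothetical data frame name. Strictly, this test compares the location of the two distributions rather than their means. The Python sketch below computes R's W statistic (the Mann-Whitney U of the first group) and a normal-approximation p-value. It includes the tie and continuity corrections that `wilcox.test()` uses when it does not compute an exact p-value, for example when ties are present. The function names are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """Ranks 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1.
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (W, z, p_value), where W is the Mann-Whitney U statistic of x.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    w = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction to the variance of W.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    diff = w - n1 * n2 / 2
    correction = 0.5 if diff > 0 else (-0.5 if diff < 0 else 0.0)  # continuity
    z = (diff - correction) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p_value


# Toy groups (not creatinine phosphokinase values).
print(rank_sum_test([1, 2, 3], [4, 5, 6]))
```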
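Part (e) can be read as a multiple linear regression, ejection_fraction = β0 + β1·serum_creatinine + β2·serum_sodium + ε. In R it is fitted and reported with `summary(lm(ejection_fraction ~ serum_creatinine + serum_sodium, data = heart))`, where `heart` is a hypothetical data frame name. The Python sketch below shows what that summary computes. It solves the normal equations (X'X)β = X'y and estimates σ² as RSS/(n - p). For each coefficient, it then reports the standard error, the t-value and a two-sided p-value from the t distribution with n - p degrees of freedom. A predictor whose p-value falls below the chosen level (for example 0.05) is the one with a statistically significant relationship. Explicitly inverting X'X is fine for two predictors, although `lm()` itself uses a more stable QR decomposition. The function names and toy data are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    """P(|T| >= |t|) via the regularized incomplete beta I_x(df/2, 1/2)."""
    a, b, x = df / 2.0, 0.5, df / (df + t * t)
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def invert(matrix):
    """Inverse of a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, *predictors):
    """Least squares on an intercept plus predictors.

    Returns (estimate, std_error, t_value, p_value) per coefficient,
    intercept first, then the predictors in the order given.
    """
    n, k = len(y), len(predictors) + 1
    X = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
              for row, yi in zip(X, y))
    df = n - k
    sigma2 = rss / df  # residual variance estimate
    results = []
    for a in range(k):
        se = math.sqrt(sigma2 * xtx_inv[a][a])
        t = beta[a] / se
        results.append((beta[a], se, t, t_two_sided_p(t, df)))
    return results


# Toy data (not from the heart-failure set): y = 1 + 2*x1 + 3*x2 + noise.
x1 = [0, 1, 2, 3, 0, 1, 2, 3]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
y = [1 + 2 * a + 3 * b + e for a, b, e in zip(x1, x2, noise)]
for name, row in zip(["intercept", "x1", "x2"], ols(y, x1, x2)):
    print(name, row)
```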

Question 3 – 20 marks

A company, “ABC”, has been producing water purifiers for the last few decades. Recently, it launched a new water purifier named nano-purifier, which uses nanotechnology to remove unwanted particles. The company's CEO wants to assess the reputation of this advanced product on social media and online shopping websites. Describe to the CEO, in detail, the text analysis steps needed to learn about the product's reputation.
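A typical pipeline for Question 3 runs in five steps:

1. Collect the posts and reviews that mention the product.
2. Clean the text: lower-case it and remove punctuation, URLs and stop words.
3. Tokenize, and optionally stem or lemmatize.
4. Examine term frequencies.
5. Score sentiment against a lexicon, possibly followed by topic modelling.

The toy Python sketch below covers only the cleaning, term-frequency and lexicon-scoring steps. The stop-word list, sentiment lexicon, reviews and function names are all invented placeholders. A real analysis would use an established stop-word list and sentiment lexicon, for example through the tidytext package in R.

```python
import re
from collections import Counter

# Invented, deliberately tiny word lists for illustration only.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "of", "to", "i", "my", "very"}
POSITIVE = {"clean", "fresh", "good", "great", "love", "quiet"}
NEGATIVE = {"bad", "broke", "expensive", "leaks", "noisy", "slow"}


def clean_tokens(text):
    """Lower-case the text, keep alphabetic word tokens, drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def sentiment_score(tokens):
    """Number of positive words minus number of negative words."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def analyse_reviews(reviews):
    """Term frequencies, per-review scores and counts of review labels."""
    docs = [clean_tokens(r) for r in reviews]
    term_freq = Counter(t for doc in docs for t in doc)
    scores = [sentiment_score(doc) for doc in docs]
    labels = Counter("positive" if s > 0 else "negative" if s < 0 else "neutral"
                     for s in scores)
    return term_freq, scores, labels


# Made-up reviews standing in for collected posts about the nano-purifier.
reviews = [
    "The water tastes fresh and clean!",
    "It leaks and is very noisy.",
    "Arrived on Tuesday.",
]
term_freq, scores, labels = analyse_reviews(reviews)
print(term_freq.most_common(3), scores, labels)
```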

END

