Introduction to Statistical Methods (MED5029)
Coursework 2020-2021
(Practical skills e-assessment)
Attempt all parts of the questions, which are worth 75% of your overall mark for this
course.
The deadline for submission is 24th June 2021.
Save the files to a convenient location on your PC and read them into R. The data set
contains variables that are ready for analysis and no recoding of variables is required, but
you will need to label them appropriately and assign value labels to the variables where
appropriate. You are expected to use both the R manual and the Introduction to Statistics
manual to help you answer the questions.
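As a rough illustration of the reading and labelling step, the sketch below assumes the `foreign` package (shipped with standard R installations) and that the data file sits in the working directory. Both are assumptions rather than course requirements; if the R manual recommends a different reader, use that instead. So that the example runs without the file, it falls back to a few made-up rows that use the variable names and codings listed in the Data section.

```r
# Sketch only: the reader function and file location are assumptions,
# not prescribed by the coursework.
path <- "ITS2021_REASSESSMENT_v12.dta"

if (file.exists(path)) {
  # convert.factors = FALSE keeps the numeric codes so labels can be set by hand
  dat <- foreign::read.dta(path, convert.factors = FALSE)
} else {
  # Made-up rows so the labelling code below can be run without the data file
  dat <- data.frame(
    sex          = c(1, 0, 0, 1),
    education    = c(1, 4, NA, 2),
    hypertension = c(0, 1, 0, 1)
  )
}

# Assign value labels by converting coded variables to factors
dat$sex <- factor(dat$sex, levels = c(0, 1), labels = c("Female", "Male"))
dat$education <- factor(
  dat$education,
  levels = 1:4,
  labels = c("None or basic", "High school", "College/university", "Postgraduate")
)
dat$hypertension <- factor(dat$hypertension, levels = c(0, 1), labels = c("No", "Yes"))

# Check the labels and the amount of missing data
table(dat$sex, useNA = "ifany")
table(dat$education, useNA = "ifany")
```

The same `factor()` pattern applies to the other coded variables; labelled factors then appear with readable category names in tables and plots.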
Please check well in advance that you can access the data for analysis in case you have
problems with this near the deadline!
When analysing the data and writing your report you should:
• provide appropriate plots and/or tables to summarise the data,
• state and justify any assumptions you make,
• describe the analysis you have done,
• interpret the results in non-technical language.
Your report should include the relevant sections of R output, copied and pasted into your
document. Do not include R syntax in the output. You should ensure that your output and
any plots you consider informative are appropriately and informatively labelled. [Do NOT
print out and hand in the entire contents from the R console – only include relevant
output.]
Context
The Framingham data set is a famous cross-sectional survey of respondents in Framingham,
Massachusetts, who were asked to participate in a study in 1948 to help identify the factors
that predict coronary heart disease. Since the survey was first carried out in 1948, relatives
of the original respondents have also contributed data to the study. The data set you have is
a random sample from the original 1948 survey.
Data
The data are stored in the Stata data file "ITS2021_REASSESSMENT_v12.dta" as follows:
ID            Subject reference number (1001, 1002, …, 5240)
age           Age at last birthday (years)
sex           1: male
              0: female
education     Level of educational qualification
              1: none or basic education
              2: high school qualification
              3: college/university qualification
              4: postgraduate qualification
              NA: missing
smoker        Current smoker
              1: smoker
              0: non-smoker
cigsperday    Number of cigarettes smoked per day
              NA: missing
bpmeds        Taking blood pressure medication
              0: not on medication
              1: taking medication
              NA: missing
stroke        Has had a stroke?
              1: yes
              0: no
hypertension  Has hypertension?
              1: yes
              0: no
diabetes      Has diabetes?
              1: yes
              0: no
totchol       Cholesterol level (mg/dL)
              NA: missing
sysbp         Systolic blood pressure (mmHg)
diabp         Diastolic blood pressure (mmHg)
bmi           Body mass index (kg/m²)
heartrate     Heart rate (beats per minute)
              NA: missing
glucose       Blood glucose (mg/dL)
              NA: missing
tenyearchd    CHD risk in 10 years?
              1: at risk
              0: no risk
Questions
1. Calculate the percentage of males and of females who have hypertension in the sample.
Use an appropriate test and 95% confidence interval to investigate whether the percentage
of males with hypertension is significantly different from the percentage of females with
hypertension.
2. Is there any evidence of a difference in cholesterol levels between males and females?
3. For males and females separately, is there any evidence that blood glucose differs by
hypertension diagnosis?
4. Is there any evidence of a difference in average BMI between the 4 categories of
education?
5. The literature on cardiovascular disease suggests that BMI can be predicted from
diastolic blood pressure. Is there any evidence that this is the case within this data
set?
6. Predict BMI for the following diastolic blood pressure readings.
a) 45 mmHg
b) 150 mmHg
7. In your opinion, are these predictions valid?
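For the regression and prediction questions, the general mechanics in R can be sketched as follows. This is a minimal illustration on simulated data, not the required analysis: the data frame `sim`, its values, and the choice of a simple linear regression with prediction intervals are assumptions made for the example.

```r
# Sketch on simulated data: the values below are made up, not taken from the study
set.seed(1)
n <- 200
diabp <- rnorm(n, mean = 83, sd = 12)            # hypothetical diastolic BP (mmHg)
bmi   <- 18 + 0.09 * diabp + rnorm(n, sd = 3.5)  # hypothetical BMI (kg/m^2)
sim   <- data.frame(diabp, bmi)

# Scatter plot to judge whether a straight-line relationship is plausible
plot(bmi ~ diabp, data = sim,
     xlab = "Diastolic blood pressure (mmHg)", ylab = "BMI (kg/m^2)")

# Simple linear regression of BMI on diastolic blood pressure
fit <- lm(bmi ~ diabp, data = sim)
summary(fit)    # slope estimate, its p-value and R-squared
confint(fit)    # 95% confidence intervals for intercept and slope

# Predictions with 95% prediction intervals at chosen readings
new_readings <- data.frame(diabp = c(45, 150))
pred <- predict(fit, newdata = new_readings, interval = "prediction")
pred

# Observed range of the predictor, useful when judging any prediction
range(sim$diabp)
```

With the real data, `sim` would be replaced by the data frame read from the Stata file. The diagnostic plots from `plot(fit)` can help when checking the model assumptions.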