ST332: Medical Statistics, 2020-2021
Assessed Coursework: Group Project 2
Prof J L Hutton
This second assignment will use an extended subset of data from the randomised controlled trial,
CAST: Collaborative Ankle Support Trial, which was used for the first project. It considers what
we might learn about people who choose not to return questionnaires, or return partly completed
questionnaires.
The main outcome variable was a Foot and Ankle Outcome Score (FAOS). The FAOS questionnaire
and a user’s guide are available on Moodle. FAOS consists of five subscales: Pain, other Symptoms
(symp), Function in daily living (adl), Function in sport and recreation (sport), and foot and ankle-
related Quality of Life (qual).
As before, in this project description, ‘Discuss’ means ‘Discuss with your group’, not ‘Write this up as
a discussion in the report’. The intention is that you make notes which you can use when you write
the report.
2 Data set provided: CAST2miss.csv
The subset of the data for this project, ‘CAST2miss.csv’, includes the previous data on 565 people,
and 11 further variables. The first ten further variables are the scores on the five FAOS subscales at
randomisation or baseline and at 9 months. The names are as in parentheses above, prefixed by ‘b’
for baseline, and followed by ‘9’ for the 9 month scores. At nine months between 24% and 40% of
scores on the subscales were missing. The final variable, ‘Yscore’, is a composite outcome score related
to the problems a person has with their ankle, so a good outcome has a low score. The worst possible
outcome is 10.
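
As a starting point, the naming convention above can be turned into missingness indicators. The following is a minimal sketch only: it uses a small simulated data frame in place of the real file, and the column names ‘bpain’ and ‘pain9’ are assumptions based on the description above, so check them against names() of the data you actually load.

```r
# Simulated stand-in for CAST2miss.csv; the column names are assumptions.
set.seed(1)
n <- 200
cast <- data.frame(
  bpain = round(runif(n, 0, 100)),
  pain9 = round(runif(n, 0, 100))
)
cast$pain9[sample(n, 60)] <- NA   # make 30% of the 9 month scores missing

# With the real file one would instead start from:
# cast <- read.csv("CAST2miss.csv")

# Indicator: 1 if the 9 month score is missing, 0 if it was recorded
cast$pain9_miss <- as.integer(is.na(cast$pain9))

# Proportion missing for every column whose name ends in "9"
nine_cols <- grep("9$", names(cast), value = TRUE)
prop_missing <- colMeans(is.na(cast[nine_cols]))
print(round(prop_missing, 2))
```

The same indicator can be built for each of the other 9 month subscales and for Yscore.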
2.1 Analysis of missing data
You are to conduct an analysis of these data in R, with the aim of finding an appropriate model for
when scores at nine months are missing. The groups should have four or five people. If there are five
of you, all five subscales must be analysed. If there are four of you, only four subscales have to be
analysed.
Consider each of the basic demographic variables (age, sex, height, weight, BMI), as well as the baseline
value of the subscale, in separate models for whether the same subscale is missing or not at nine months.
Then decide on a final model which uses as many of those six variables as you judge appropriate. Consider how to
interpret the models if age, for example, predicts missing results at nine months, in a model with only
age as an explanatory variable, but your final model does not include age.
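
One way to see how a variable can predict missingness on its own yet drop out of a final model is to fit both models and compare them. The sketch below assumes logistic regression via glm() and uses simulated data in which missingness depends only on a hypothetical baseline score ‘bpain’ that is itself correlated with age; none of the numbers come from CAST.

```r
# Simulated data: missingness depends on the baseline score only,
# but the baseline score is correlated with age.
set.seed(2)
n <- 500
age <- round(runif(n, 16, 70))
bpain <- round(pmin(100, pmax(0, 70 - 0.3 * age + rnorm(n, sd = 15))))
miss <- rbinom(n, 1, plogis(1.5 - 0.04 * bpain))
dat <- data.frame(age, bpain, miss)

# Single-variable models and a model with both variables
fit_age   <- glm(miss ~ age, family = binomial, data = dat)
fit_bpain <- glm(miss ~ bpain, family = binomial, data = dat)
fit_both  <- glm(miss ~ age + bpain, family = binomial, data = dat)

# Odds ratios with Wald 95% confidence intervals
or_table <- function(fit) exp(cbind(OR = coef(fit), confint.default(fit)))
print(or_table(fit_age))
print(or_table(fit_both))

# Likelihood ratio test: does age add anything once bpain is included?
print(anova(fit_bpain, fit_both, test = "Chisq"))
```

Comparing the odds ratio for age in the two tables shows how much of its apparent effect is carried by the correlated baseline score.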
The steps outlined in the previous project can be adapted to these analyses. As the outcome variables
are binary (missing or recorded), the relevant exploratory plots include empirical logit plots.
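
An empirical logit plot for a continuous explanatory variable can be sketched as follows. This is one common construction, not the only one: the variable is grouped into bands, and for each band with y missing out of m people the empirical logit log((y + 0.5) / (m - y + 0.5)) is plotted against the band mean. The data are simulated and the choice of five bands is an assumption.

```r
# Simulated age and missingness indicator
set.seed(3)
n <- 400
age <- round(runif(n, 16, 70))
miss <- rbinom(n, 1, plogis(0.5 - 0.03 * age))

# Group age into five bands with roughly equal numbers of people
breaks <- unique(quantile(age, probs = seq(0, 1, 0.2)))
band <- cut(age, breaks = breaks, include.lowest = TRUE)

y <- tapply(miss, band, sum)      # number missing in each band
m <- tapply(miss, band, length)   # number of people in each band
mid <- tapply(age, band, mean)    # mean age in each band

# Empirical logit; the 0.5 terms keep it finite when y is 0 or m
emp_logit <- log((y + 0.5) / (m - y + 0.5))

plot(as.numeric(mid), as.numeric(emp_logit), pch = 19,
     xlab = "Mean age in band", ylab = "Empirical logit of missing")
```

A roughly linear pattern suggests the variable can enter the logistic model untransformed.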
2.2 Analysis of combined outcome variable
The composite Yscore is missing for over 100 participants. Compare (five or four, as before) models
for whether Yscore is missing or not which include some or all of age, sex, height, weight, BMI, and
a single baseline subscore.
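
The comparison of models that differ only in the baseline subscore could be organised as below. This is a sketch on simulated data; the column names ‘bpain’ and ‘bsymp’ are assumptions, and AIC is used only as one possible comparison criterion. Note that glm() drops rows with missing explanatory variables, so with real data the models are comparable only if they are fitted to the same participants.

```r
# Simulated data with a Yscore that is missing for some participants
set.seed(4)
n <- 300
dat <- data.frame(
  age = round(runif(n, 16, 70)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE)),
  bpain = round(runif(n, 0, 100)),
  bsymp = round(runif(n, 0, 100))
)
dat$Yscore <- round(runif(n, 0, 10), 1)
dat$Yscore[rbinom(n, 1, plogis(-0.5 - 0.01 * dat$bpain)) == 1] <- NA
dat$ymiss <- as.integer(is.na(dat$Yscore))

# One logistic model per baseline subscore, each with age and sex
baseline <- c("bpain", "bsymp")
fits <- lapply(baseline, function(b) {
  glm(reformulate(c("age", "sex", b), response = "ymiss"),
      family = binomial, data = dat)
})
names(fits) <- baseline

# Compare the models by AIC (smaller is better)
aics <- sapply(fits, AIC)
print(aics)
```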
Create a subset which includes only participants with a Yscore and baseline scores. Do women and men
have different Yscores on average? How much of the difference might be explained by the demographic
and baseline scores?
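
One way to approach these two questions is to compare the coefficient for sex in an unadjusted and an adjusted linear model fitted to the subset. The sketch below uses simulated data and a single hypothetical baseline score ‘bpain’; a linear model is an assumption here, and other models for Yscore may be more appropriate.

```r
# Simulated data: the sexes differ in baseline score, which drives Yscore
set.seed(5)
n <- 300
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
bpain <- round(pmin(100, pmax(0,
  rnorm(n, mean = ifelse(sex == "F", 45, 55), sd = 15))))
Yscore <- pmin(10, pmax(0, 8 - 0.06 * bpain + rnorm(n)))
Yscore[sample(n, 60)] <- NA
dat <- data.frame(sex, bpain, Yscore)

# Keep only participants with a Yscore and a baseline score
sub <- subset(dat, !is.na(Yscore) & !is.na(bpain))

# Difference between men and women, before and after adjustment
unadj <- lm(Yscore ~ sex, data = sub)
adj   <- lm(Yscore ~ sex + bpain, data = sub)

d_unadj <- coef(unadj)[["sexM"]]
d_adj   <- coef(adj)[["sexM"]]
cat("Unadjusted difference (M - F):", round(d_unadj, 2), "\n")
cat("Adjusted difference (M - F):  ", round(d_adj, 2), "\n")
```

The change from the unadjusted to the adjusted coefficient indicates how much of the difference is associated with the variables adjusted for.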
Discuss the possible effects of reducing the data to this subset, and of reducing further to keep only
participants who have no missing data.
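
To inform this discussion, it may help to count how many participants survive each reduction and to compare the baseline characteristics of those kept with those dropped. The following is a sketch on simulated data with assumed column names.

```r
# Simulated data with missing values in two outcome columns
set.seed(6)
n <- 200
dat <- data.frame(
  age = round(runif(n, 16, 70)),
  bpain = round(runif(n, 0, 100)),
  pain9 = round(runif(n, 0, 100)),
  Yscore = round(runif(n, 0, 10), 1)
)
dat$pain9[sample(n, 50)] <- NA
dat$Yscore[sample(n, 40)] <- NA

has_y <- !is.na(dat$Yscore)      # subset with a Yscore
complete <- complete.cases(dat)  # no missing values in any column

# How many participants remain after each reduction?
counts <- c(all = n, with_Yscore = sum(has_y), complete = sum(complete))
print(counts)

# Do the complete cases differ from the rest at baseline?
print(tapply(dat$age, complete, mean))
print(tapply(dat$bpain, complete, mean))
```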
The report should be a single pdf file, which includes any appendices. This assessment is worth 20%
of your final mark on ST332 and ST409. The assessment will be based on the presentation of your
report, your understanding and the competence of your analysis and discussion of the data and articles.
Include your results on each subscale for missing 9 month scores, and your results for missing Yscore. The results on
the differences between Yscores for men and women can be a separate section. Also include a short
section, between 100 and 400 words, on dealing with missing data by using subsets with available
cases or carrying out a complete case analysis.
The word limit is 3500 words. This is an upper limit, and there will be no penalty if you use fewer
words. However, it is likely that you will need at least 2500 words in order to provide a good report.
You must give a word count for the main report.
In addition to the word limit:
you may include up to twelve figures and seven tables;
you may include a second appendix with references; and
you may include an appendix with a brief statement of the contributions of each group member,
identified only by University number.
If you consider you have all made a fair contribution, there is no need to specify contributions.
Contributions in medical articles
In the Lamb et al (2009) article, the statement about contributions is:
“SL wrote the original protocol. SL, MC, JH, and JM participated in the trial design, data analysis,
interpretation of results, drafting, and approval of the final manuscript.”
In Nakash et al (2006), the statement about contributions is:
“No persons apart from the authors contributed to this paper. The guarantors of this paper are RN
and SL. RN, SL and JH had the original idea for the paper, RN performed the literature search and
wrote the paper, RN and EJ conducted quality assessment and data extraction. The paper was drafted
by RN and critically appraised for intellectual content by SL, JH, SG and EJ. RN, JH and SL were
involved in interpretation of the data. The final version of the paper was approved by all authors.”