Date: 2023-04-20
ST117 Introduction to Statistical Modelling (2023) • WR-A: Digital Twin
Please read the guidance on the expected answer format on Moodle in the Log-4 week.
1. Create the cohort
(a) Simulate 288 synthetic students with realistic-sounding names (e.g. use the randomNames R
package) and assign them to lab groups of 18 students each. Within each lab group, randomly
assign them to Homework Pods of 3 denoted by A, B, ..., F. Sampling from the whole cohort,
independently of lab groups, randomly assign the students into Report Pods of 4 students each.
(b) Calculate the observed relative frequencies pk of Report Pods containing exactly k ∈ {0, 1, 2, 3, 4}
students who were in a Homework Pod labelled A (across any of the labs). Define qk in a similar
way to pk, but with the additional constraint that there are not any two students from the same
lab group in those Report Pods. Plot tables of both relative frequency distributions.
(c) For each k ∈ {0, 1, 2, 3, 4} run 100 realisations of (a) to obtain 100 samples pk(i) (i = 1, 2, ..., 100)
and qk(i) (i = 1, 2, ..., 100) of the frequency distributions pk and qk (k = 0, 1, 2, 3, 4), respectively.
Generate a series of 5 boxplots of pk (k = 0, 1, 2, 3, 4), each of them visualising the distribution
of relative frequencies based on the 100 simulations pk(i) (i = 1, 2, ..., 100). For each of the five
simulated distributions calculate means and SDs and display them all in a table. Describe the
shape of the distributions. Carry out the same for qk (k = 0, 1, 2, 3, 4).
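The cohort construction in 1(a) and the frequencies in 1(b) can be sketched as follows. This is a minimal Python illustration, not the R solution the brief expects. The function names make_cohort and freq_of_A are hypothetical, student names are omitted (students are identified by an integer id), and qk is assumed to be measured relative to all 72 Report Pods, so the qk values need not sum to 1.

```python
import random
from collections import Counter

def make_cohort(rng, n_labs=16, lab_size=18, pod_size=3, report_size=4):
    """Return student records with lab, Homework Pod letter and Report Pod index."""
    letters = "ABCDEF"  # 18 / 3 = 6 Homework Pods per lab
    students = []
    for lab in range(n_labs):
        # One letter per seat (3 seats per letter), shuffled within the lab.
        slots = [letters[i // pod_size] for i in range(lab_size)]
        rng.shuffle(slots)
        for letter in slots:
            students.append({"id": len(students), "lab": lab, "hw_pod": letter})
    # Report Pods ignore labs: shuffle the whole cohort, then cut into blocks of 4.
    order = list(range(len(students)))
    rng.shuffle(order)
    for position, sid in enumerate(order):
        students[sid]["report_pod"] = position // report_size
    return students

def freq_of_A(students, distinct_labs=False):
    """Relative frequency of Report Pods with exactly k pod-A students, k = 0..4.

    With distinct_labs=True only pods whose members all come from different
    labs are counted (assumed reading of qk); the denominator stays the total
    number of Report Pods.
    """
    pods = {}
    for s in students:
        pods.setdefault(s["report_pod"], []).append(s)
    counts = Counter()
    for members in pods.values():
        if distinct_labs and len({m["lab"] for m in members}) < len(members):
            continue
        counts[sum(m["hw_pod"] == "A" for m in members)] += 1
    return [counts[k] / len(pods) for k in range(5)]

cohort = make_cohort(random.Random(117))
p = freq_of_A(cohort)                      # pk for k = 0, ..., 4
q = freq_of_A(cohort, distinct_labs=True)  # qk for k = 0, ..., 4
```

Calling make_cohort with 100 differently seeded generators and collecting the resulting lists gives the samples needed for the boxplots in 1(c).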
2. Pod combinatorics (theoretical)
You only receive credit if you prove your statements step by step, involving explanations as concise
as possible while including all necessary rationales.
(a) What is the probability mass function for the number of students in a randomly selected Report
Pod that had a Homework Pod denoted with the letter A?
(b) What is the probability that a randomly selected Report Pod contains only students whose
Homework Pods all have the same letter?
(c) What is the expected number of Report Pods that consist only of students whose Homework
Pods are denoted by the same letter?
(d) What is the expected number of Report Pods that contain only students who were in
Homework Pods denoted by the same letter, without any two of them being from the same lab?
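A numerical cross-check for 2(a) might look like the sketch below. It assumes that a Report Pod is a uniformly random set of 4 of the 288 students and that exactly 16 × 3 = 48 students sit in a Homework Pod A, so the count of pod-A students is hypergeometric. The names N_STUDENTS, N_A, POD and pmf_A are hypothetical, and the computation does not replace the step-by-step proof the question asks for.

```python
from math import comb

N_STUDENTS = 288   # cohort size from 1(a)
N_A = 16 * 3       # 16 labs, each with one Homework Pod A of 3 students
POD = 4            # Report Pod size

def pmf_A(k):
    """P(exactly k pod-A students in one Report Pod) under the hypergeometric model."""
    return comb(N_A, k) * comb(N_STUDENTS - N_A, POD - k) / comb(N_STUDENTS, POD)

pmf = [pmf_A(k) for k in range(POD + 1)]
mean_A = sum(k * pk for k, pk in enumerate(pmf))  # should equal POD * N_A / N_STUDENTS
```

These values can be compared with the simulated means of pk from question 1(c).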
3. Create the grade book
(a) Simulate marks for the components of the assessment scheme as detailed in the digital twin’s
grade book shown in the table on page 2 (a simplified version of the real ST117 grade book on
Moodle). For each row in the table define a mark added as column to your dataframe from the
previous step. Use the abbreviations in the first column (yellow) of the table as column names
in your dataframe. Where scores below 0 or above 100 occur, set them to 0 or 100, respectively.
Round continuous variables to one decimal place.
(b) Calculate mean, SD, and the five-number summary of the distribution of the module mark. List
its main characteristics in words and find two suitable plots visualising aspects of its shape.
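The clipping and rounding rule in 3(a) can be illustrated for the A0 and Q1 rows of the grade book. This is a Python sketch under two assumptions: the second parameter of N(·, ·) is read as a standard deviation (the brief does not say whether it is an SD or a variance), and the helper name clip_round is hypothetical.

```python
import random

def clip_round(x, lo=0.0, hi=100.0):
    """Clip a raw score to [lo, hi] and round it to one decimal place."""
    return round(min(max(x, lo), hi), 1)

rng = random.Random(117)
n = 288

# A0: 225 randomly selected students submit (mark 1), the others do not (mark 0).
submitted_a0 = set(rng.sample(range(n), 225))
a0 = [1 if i in submitted_a0 else 0 for i in range(n)]

# Q1: N(65, 15) if A0 was submitted, N(40, 30) otherwise.
# Assumption: 15 and 30 are standard deviations.
q1 = [clip_round(rng.gauss(65, 15) if a0[i] else rng.gauss(40, 30))
      for i in range(n)]
```

The remaining rows of the grade book follow the same pattern: draw the raw value, then pass it through clip_round before storing it.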
4. Analysis of grade book statistics
(a) Let xi (i = 1, 2, ..., 288) be the Q1 scores of the synthetic students and yi (i = 1, 2, ..., 288)
be their Q2 scores. Calculate their correlation rxy. Partition the interval [0, 100] into 10 bins
I0 = [0, 10), I1 = [10, 20), ..., I9 = [90, 100]. For each interval Ij (j = 0, 1, ..., 9), let r^j_xy be the
correlation of (xi, yi) (i ∈ Dj), where Dj = {i = 1, 2, ..., 288 | xi ∈ Ij}, i.e., the correlation of
the dataset restricted to those data points where the x-values (Q1) are in the bin Ij. Compare
the values of r^j_xy (j = 0, 1, ..., 9) with rxy. Is there a relationship? Which one? Describe and
explain your findings.
(b) Create a series of boxplots of the Q2 distributions restricted to Ij (j = 0, 1, ..., 9), side-by-side.
Describe your main observations and link them to known phenomena. For each of them, first
describe what you see and then discuss the reasons that may explain it.
(c) Build a simple linear regression model that predicts Q2 from Q1 and implement this in R using
lm(). What is your conclusion about the model fit? Back this up with evidence such as diagnostic
plots, along with explanations of what they show, or with suitable scores.
(d) The synthetic Warwick Statistics Department recruits synthetic student ambassadors to help
on an Offer Holder Visit Day for synthetic prospective students. The ambassadors are selected
based on being in the upper quintile of the average of their Q2 and their 7GO@W scores.
The 7GO@W score is an assessment method developed by the synthetic Admissions team and
involves balancing 7 glasses of orange juice on a Warwick prospectus while riding a scooter. It
has a Beta(2,2) distribution and is not correlated with Q2. However, the student ambassadors
compare their scores and find a negative correlation. Create a suitable plot that demonstrates
why this is not a contradiction and also explain in words how this is possible.
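The selection effect behind 4(d) can be reproduced numerically. The Python sketch below rests on stated assumptions: Q2 is replaced by a stand-in N(60, 20) clipped to [0, 100] rather than the grade book's Q2, the Beta(2,2) score is rescaled to 0 to 100 so that both scores contribute comparably to the average, and the helper name pearson is hypothetical.

```python
import random

def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(117)
n = 288

# Assumption: stand-in Q2 scores, independent of the 7GO@W scores.
q2 = [min(max(rng.gauss(60, 20), 0.0), 100.0) for _ in range(n)]
# Assumption: Beta(2, 2) rescaled from [0, 1] to [0, 100].
go = [100 * rng.betavariate(2, 2) for _ in range(n)]

avg = [(a + b) / 2 for a, b in zip(q2, go)]
cutoff = sorted(avg)[int(0.8 * n)]  # lower edge of the upper quintile
chosen = [i for i in range(n) if avg[i] >= cutoff]

r_all = pearson(q2, go)  # whole cohort: close to zero
r_chosen = pearson([q2[i] for i in chosen], [go[i] for i in chosen])  # ambassadors only
```

A scatter plot of go against q2 with the chosen students highlighted shows the mechanism: the selected points lie above a downward-sloping cutoff line, so within that group a high value on one score tends to come with a lower value on the other.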
Grade Book for the Digital Twin of ST117
Abbreviation | Assignment | Submitting | Mark simulation
A0 | Activity 0 | 225 randomly selected students | 1 if submitted, 0 otherwise
Q1 | Quiz 1 | all students | N(65, 15) if A0 submitted, N(40, 30) otherwise
A | Activities | all students who submitted A0 plus 20 randomly selected other students | 1 if submitted, 0 otherwise
E | Exercises | all students | maximum of Q1 results of all Homework Pod members
Q2 | Quiz 2 | all students | Q1 score plus independent random error N(0, 20) if A submitted, N(−30, 20) if not
NLogs | Number submitted logs | Binomial(6, 0.9) if A submitted, Binomial(6, 0.6) otherwise | all submissions pass
PE | Performance-Engagement | (mean of Q1 and Q2) multiplied with NLogs/6 |
WR | Written Report | all students | N(µ, 15), where µ is the maximum of Report Pod’s member’s PE scores
MM | Module Mark | | Sum of 11% Q1, 14% Q2, 4% A, 35% E, NLogs, and 30% WR
Disclaimer: All names, characters, and incidents portrayed in this production of the digital
twin of ST117 are fictitious. No identification with actual persons (living or deceased) is
intended or should be inferred.