PSYM201
University of Exeter
College of Life & Environmental Sciences
May 2022
ADVANCED STATISTICS
Module Convenors: Tim Fawcett & Andrew Higginson
Duration: 3 hours
This is an Open Book exam
Materials to be supplied:
Data files potent-potions.csv, soothing-pets.csv and thirsty-bees.csv
Additional materials:
Statistical software R, RStudio, rstudio.cloud
R packages psych, car, pwr, ggplot2, multcomp, emmeans, ppcor, afex, GPArotation, corpcor, lavaan,
rockchalk, lme4, lattice, ResourceSelection, mlogit, arm, MASS, brms
Part A (35 marks)
Evil mastermind Professor Leigh Hogwarts is busy in his underground laboratory, concocting potions that he
believes will boost his cognitive powers and enable him to take over the world. He has developed four different
types of potion—WombleJuice, GutterSlime, BurpleGoo and FibbleJelly—and now wishes to test their
effectiveness alongside an inert placebo solution. He buys 100 guinea-pigs and measures their performance in
a radial arm maze (maze1), with higher numbers representing better performance. He then randomly allocates
the guinea-pigs to five groups of 20, gives each group a different potion or the placebo, and tests them in the
radial arm maze a second time (maze2). His data are in the file potent-potions.csv.
A1. (2 marks) Report the mean and standard error for the change in performance between the first and second
maze attempts.
mean = 6.78 (1), s.e. = 0.53 (1)
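potions <- read.csv("potent-potions.csv")  # load the supplied data file
potions$change <- potions$maze2 - potions$maze1  # NA for animals with no second test
library(psych)
describe(potions$change)  # reports mean and se, among other summaries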
A2. (5 marks) Nine guinea-pigs sadly died between their first and second maze tests (indicated as ‘NA’ for
maze2). Is there evidence that this pattern was non-random with respect to the potion they received? Report
the results of a statistical test to support your answer, and remember to check the assumptions.
Pattern is significantly non-random (1) (Fisher’s exact test: P < 0.001) (3). There were excess deaths in the
GutterSlime group (1).
Binomial logistic regression also fine if comparing to null model (χ²(4) = 25.6, P < 0.001), but max. 1 mark if
only reporting parameter estimates (as standard errors unreliable).
Max. 3 marks for chi-squared test (half of expected counts < 5): χ²(4) = 29.8, P < 0.001.
Max. 1 mark for Kruskal–Wallis test on dead/alive DV: χ²(4) = 29.5, P < 0.001.
potions$died <- ifelse(is.na(potions$maze2),1,0)
fisher.test(potions$died,potions$potion)
table(potions$died,potions$potion)
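The cap on marks for the chi-squared test can be checked by hand from the design. With 9 deaths among 100 guinea-pigs in five groups of 20, the expected counts follow directly from the margins. A minimal sketch, assuming only those totals from the question (object names are illustrative, not part of the model answer):

```r
# Expected counts under independence, using only the totals stated in the question:
# 100 guinea-pigs, 9 deaths, five groups of 20 (object names are illustrative).
row_totals <- c(alive = 91, died = 9)
col_totals <- rep(20, 5)
expected <- outer(row_totals, col_totals) / sum(row_totals)
expected

# Proportion of cells with an expected count below 5
prop_below_5 <- mean(expected < 5)
prop_below_5
```

Every cell in the "died" row has an expected count of 1.8, so half of the ten cells fall below 5, which is why Fisher's exact test is the preferred answer.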
A3. (4 marks) Professor Hogwarts notices that the variables maze1 and maze2 are not normally distributed.
Identify suitable transformations for each variable and report whether these have corrected the problem, using
either plots or a statistical test.
Plots indicate strong positive skew (1), so log transformation (or square-root transformation) (1) is
appropriate. Q-Q plots or Shapiro–Wilk tests indicate that log-transformed variables are normally distributed
(2).
library(car)
qqPlot(potions$maze1)
hist(potions$maze1)
shapiro.test(potions$maze1)
qqPlot(potions$maze2)
shapiro.test(potions$maze2)
hist(potions$maze2)
potions$log.maze1 <- log10(potions$maze1+1)
qqPlot(potions$log.maze1)
shapiro.test(potions$log.maze1)
potions$log.maze2 <- log10(potions$maze2+1)
qqPlot(potions$log.maze2)
shapiro.test(potions$log.maze2)
A4. (4 marks) Report Pearson’s correlation coefficient between the maze1 and maze2 scores and test its
significance, using the transformed variables if appropriate. What is the percentage of shared variance?
r = 0.933 (1), t(89) = 24.46, P < 0.001 (2). Shared variance = 100 × 0.933² = 87% (1).
Max. 1 mark if using untransformed data or a non-parametric alternative.
with(potions,cor.test(log.maze1,log.maze2))
100*(0.9330068^2)
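As a cross-check on the cor.test output, the t statistic can be recovered from r and the degrees of freedom using t = r × sqrt(df) / sqrt(1 − r²). A short sketch using the values reported above (variable names are illustrative):

```r
r  <- 0.9330068   # Pearson's r as reported by cor.test above
df <- 89          # n - 2, with n = 91 guinea-pigs that completed both tests

# t statistic for testing r against zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat

# Two-sided P value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)
p_value
```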
A5. (4 marks) Professor Hogwarts is excited to discover that the 11 guinea-pigs with the lowest maze1 scores
all performed better in their second maze attempt. Give two reasons why this does not provide convincing
evidence that his potions improve maze performance.
Performing the same task a second time may lead to improvement, even if the potions are ineffective (2). And
even with no overall improvement, the lowest scores are statistically likely to increase on the second
measurement because of regression towards the mean (2).
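Regression towards the mean can be illustrated with a small simulation. The sketch below uses entirely hypothetical data with no treatment effect and no practice effect: each score is a stable ability plus independent noise. It shows the average change among the 11 lowest first-test scorers, not that every one of them improves.

```r
# Simulated illustration of regression towards the mean
# (hypothetical data; no potion effect and no practice effect are built in).
set.seed(1)

simulate_once <- function(n = 100, k = 11) {
  ability <- rnorm(n)               # stable individual differences
  test1 <- ability + rnorm(n)       # first measurement = ability + noise
  test2 <- ability + rnorm(n)       # second measurement, fresh noise
  lowest <- order(test1)[1:k]       # the k lowest scorers on the first test
  mean(test2[lowest] - test1[lowest])
}

# Average change for the lowest scorers across many simulated experiments
mean_change_lowest <- mean(replicate(2000, simulate_once()))
mean_change_lowest
```

The average change for the lowest scorers is positive even though nothing has improved, because an extreme first score is partly bad luck that is unlikely to repeat.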
A6. (3 marks) The bumbling professor wants to analyse his data using a repeated-measures ANOVA, to test
whether the repeated measurements of maze performance (maze1, maze2) differ between the potion
treatments. Explain why instead it would be more logical to use an ANCOVA to test whether maze2 differs
between the potion treatments, with maze1 as a covariate.
maze1 is measured before the guinea-pigs have received the potion treatment, so the treatment cannot conceivably
have affected it (3) [or something else along these lines].
A7. (8 marks) Use an ANCOVA to compare the maze performance (maze2) of the guinea-pigs after their
potion treatment, including maze1 as a covariate to control for pre-existing differences in cognitive ability and
using the transformed versions of these variables if appropriate. For both the fixed factor and the covariate,
report the results of a statistical test and a brief interpretation of the effect.
Using log10(maze2+1)~potion+log10(maze1+1):
Maze performance after the potion treatment was significantly positively associated with pre-treatment maze
performance (1) (F(1,85) = 619.9, P < 0.001) (3), but was not significantly affected by potion treatment (1)
(F(4,85) = 1.8, P = 0.136) (3).
potions$potion <- factor(potions$potion)
potions$potion <- relevel(potions$potion,"placebo")
model1 <- lm(log.maze2~log.maze1+potion,data=potions)
anova(model1)
summary(model1)
par(mfrow=c(2,2))
plot(model1)
A8. (5 marks) Test whether pre-treatment maze performance is independent of potion type, remembering to
check the assumptions of your analysis. Report the test statistic, degrees of freedom, P value and your
conclusion.
Need to use log-transformed data (1).
Pre-treatment maze performance is independent of potion type (1) (F(4,95) = 0.677, P = 0.610) (3).
model2 <- aov(log.maze1~potion,data=potions)
summary(model2)
par(mfrow=c(2,2))
plot(model2)
A9. UPLOAD YOUR R SCRIPT
Part B (40 marks)
Heroic mastermind Professor Kat Left has been busy developing the use of pet animals to reduce exam nerves.
She is interested in identifying what characteristics of animals most reduce nerves. She has given 200 students
a pet that they spend 10 minutes with, and recorded the change in their reported exam nerves (nervesChange)
over this period. The data are in the file soothing-pets.csv. The 10 characteristics of the pets are as follows:
• weight: body weight (kg)
• body: body length (cm)
• tail: tail length (cm)
• eyes: eye-to-head ratio (proportion)
• colour: colour of fur (hue as proportion 0 to 1)
• friend: how much the animal likes being around humans (0 to 100)
• noisiness: noise production when petted (0 to 100)
• fluff: how fluffy their fur or feathers are (0 to 100)
• cuddle: how much the animal likes being cuddled (0 to 100)
• sex: sex of the animal (1: male, 2: female)
B10. (2 marks) Is there evidence that overall the experiment reduced the average nerves among the students?
Yes, according to a one-sample t-test (t(199) = −23.72, P < 0.001). (2)
pets <- read.csv("soothing-pets.csv")  # load the supplied data file
t.test(pets$nerveschange,mu=0,alternative="two.sided")
B11. (4 marks) The heroic professor suspects that nerves are more likely to decrease if the pet has a tail. Use
a chi-squared test to assess this prediction.
There is no evidence for this prediction (1) (χ²(1) = 0.65, P = 0.421). (3)
pets$changebin<-ifelse(pets$nerveschange<0,1,0)
pets$tailbin<-ifelse(pets$tail>0,1,0)
chisq.test(pets$changebin,pets$tailbin,correct=FALSE)
B12. (6 marks) Test whether there is a difference in friendliness between male and female pets, remembering
to check the assumptions of your test.
The variable friend is non-normal within each sex (W > 0.93, P<0.004) (2) so Mann–Whitney–Wilcoxon test
used (1). There was no difference in friendliness between male and female pets (1) (W = 4524.5, P = 0.515)
(2).
Max. 3 marks if using t-test.
shapiro.test(subset(pets,sex==1)$friend)
shapiro.test(subset(pets,sex==2)$friend)
wilcox.test(friend~sex,data=pets)
B13. (3 marks) Use hierarchical multiple regression to predict change in nerves from the pet characteristics.
In the first model include the physical characteristics (weight, body, sex). In the second model add appearance
(eyes, colour, fluff). Does adding the appearance variables significantly improve the fit of the model?
Appearance variables do not significantly improve the fit (1) (F(3,193) = 2.37, P = 0.072) (2)
model1<-lm(nerveschange~weight+body+sex,data=pets)
model2<-update(model1,~.+eyes+colour+fluff)
anova(model1,model2)
B14. (2 marks) Now add the behaviour variables (friend, noisiness, cuddle) to make a final model with 9
predictors. What percentage of the variance in the change in nerves does the final model explain?
11% (R² = 0.11) (2)
model3<-update(model2,~.+friend+noisiness+cuddle)
summary(model3)
B15. (4 marks) What is the effect on exam nerves of a 10% increase in the proportional eye size?
−0.14 (4)
2 marks for −1.4
1 mark for 1.4
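The mark allocation implies that the fitted slope for eyes in the final model is about −1.4 per unit of eye-to-head ratio. Since eyes is a proportion, a 10% increase is read here as an increase of 0.1 units, so the predicted effect is the slope multiplied by 0.1. A minimal sketch of that arithmetic, with the slope hard-coded as an assumption (in practice it would be taken from coef(model3)["eyes"] after fitting the model in B14):

```r
b_eyes <- -1.4   # assumed slope for eyes, as implied by the marking scheme
delta  <- 0.1    # a 10% increase, read as 0.1 on the proportion scale

# Predicted change in nerves for that increase, other predictors held constant
effect <- b_eyes * delta
effect
```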
B16. (2 marks) Which case (row number) has the highest leverage in the final model?
117 (2)
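par(mfrow=c(2,2))
plot(model3)  # final model; the Residuals vs Leverage panel labels extreme cases
which.max(hatvalues(model3))  # row with the highest leverage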
B17. (2 marks) Is there any evidence of multicollinearity? Report the maximum and mean values of the
statistic you use.
Max VIF = 1.084 (1)
Mean VIF = 1.041 (1)
library(car)
max(vif(model3))
mean(vif(model3))
B18. (3 marks) How many influential cases are there for cuddliness according to the DFBeta values?
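15 cases (2) have absolute standardised DFBeta values greater than 0.141 (= 2/sqrt(200)) (1)
dfbs <- dfbetas(model3)  # standardised DFBeta values, one column per coefficient
cutoff <- 2/sqrt(nrow(pets))
sum(abs(dfbs[,"cuddle"]) > cutoff)  # number of influential cases for cuddle
View(dfbs)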
B19. (4 marks) Report the AIC values for all three models, and report which is the best model according to
this criterion.
Model 1: 1037 (1)
Model 2: 1036 (1)
Model 3: 1027 (1)
Model 3 is the best as it has the lowest AIC (1)
AIC(model1,model2,model3)
B20. (3 marks) Now drop all predictors except cuddliness (cuddle). Does including a quadratic term for
cuddle result in a significantly better fit to the data than a simple regression with just a linear effect of cuddle?
No, adding a quadratic term does not significantly improve the fit (1) (F(1,197) = 1.71, P = 0.193) (2)
modelLin<-lm(nerveschange~cuddle,data=pets)
modelQuad<-lm(nerveschange~cuddle+I(cuddle^2),data=pets)
anova(modelLin,modelQuad)
B21. (2 marks) Use the regression equation for the quadratic model to predict the change in nerves for a pet
with the mean cuddliness.
−5.58 (2)
summary(modelQuad)
mean(pets$cuddle)
# the trailing + keeps the expression open so all three terms are summed
modelQuad$coefficients[1]+modelQuad$coefficients[2]*mean(pets$cuddle)+
modelQuad$coefficients[3]*mean(pets$cuddle)^2
B22. (3 marks) Use the regression equation for the simpler model (with a linear term only) to predict the
cuddliness of a pet that reduced exam nerves by 5.
14.79 (3)
(-5-modelLin$coefficients[1])/modelLin$coefficients[2]
B23. UPLOAD YOUR R SCRIPT
Part C (25 marks)
Professor Natasha Hamper de Ibiza is studying the gustatory responses of honeybees to varying concentrations
of sucrose solution. She collects 30 honeybees (labelled under beeID) and fixes them in a harness. She then
repeatedly stimulates each bee’s antennae with sucrose solution and counts how many times in 1 minute the
bee extends its proboscis (extensions). She tests each bee three times, presenting the following sucrose
concentrations (concn) in randomised order: 1% (‘conc1’), 10% (‘conc10’) and 30% (‘conc30’). Her data are
in the file thirsty-bees.csv.
C24. (2 marks) To understand the effect of sucrose concentration on proboscis extensions, why is it important
that the order of the three treatments was randomised?
To avoid carry-over effects, e.g. bees could become fatigued or habituated to the sucrose solution (2) [or
something else along these lines].
C25. (2 marks) Why is it sensible to treat the number of proboscis extensions in 1 minute as a Poisson
variable?
Number of occurrences of an event (1) in a fixed unit of time (1); count variable with lower limit of zero and
no theoretical upper limit [or something else along these lines].
C26. (9 marks) Fit a generalised linear mixed-effects model (GLMM) with a Poisson error structure to predict
the number of proboscis extensions as a function of sucrose concentration, including a random intercept term
to capture variation in the average extension rate among bees. Test the significance of the predictor using a
likelihood-ratio test and enter your results in the yellow cells in the table below.
Predictor         Estimate ± SE       χ²       d.f.    P
intercept (1%)    1.77 ± 0.08 (2)
concentration                         362.5    2       < 0.001 (3)
  10%             1.24 ± 0.09 (2)
  30%             1.36 ± 0.08 (2)
library(lme4)
bees <- read.csv("thirsty-bees.csv")  # load the supplied data file
model1 <- glmer(extensions~concn+(1|beeID),data=bees,family=poisson)
summary(model1)
drop1(model1,test="Chisq")
C27. (3 marks) Compute the scale parameter for this model. Is there evidence of over- or underdispersion?
Scale parameter = 0.902 (2); indicates slight underdispersion (1), but not problematic.
deviance(model1)/df.residual(model1)
C28. (4 marks) Using the parameter estimates from the model you fitted in (C26), how many times greater is
the number of proboscis extensions for 10% sucrose, compared to 1% sucrose?
3.45 times greater (4).
exp(fixef(model1))
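Because the Poisson GLMM uses a log link, exponentiating a coefficient turns it into a multiplicative effect on the expected count. A short sketch using the rounded estimates from the C26 table, hard-coded here for illustration (names are not from the model answer):

```r
# Rounded log-scale estimates copied from the C26 table (names are illustrative)
est <- c(intercept = 1.77, conc10 = 1.24, conc30 = 1.36)

# Multiplicative change in expected extensions relative to 1% sucrose
rate_ratios <- exp(est[c("conc10", "conc30")])
rate_ratios

# Expected extensions per minute at 1% sucrose for a bee with a random intercept of zero
baseline <- exp(unname(est["intercept"]))
baseline
```

With the rounded estimate the 10% ratio comes out at about 3.46; the 3.45 in the model answer presumably reflects the unrounded coefficient from fixef(model1).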
C29. (5 marks) Compute confidence intervals for the effects of 10% and 30% sucrose. Based on this evidence,
do honeybees discriminate between these higher concentrations?
10%: (2.92, 4.09) (2); 30%: (3.31, 4.61) (2). Overlapping, so no evidence of discrimination (1).
exp(confint(model1))
C30. UPLOAD YOUR R SCRIPT