Date: 2022-04-29

STATS 1000 / STATS 1004 / STATS 1504
Statistical Practice 1
Lecture notes
Week 7
Stephen Crotty
School of Mathematical Sciences, University of Adelaide
Semester 1 2022
Two-sample t-test
Two-sample problems
What if we want to compare the mean of some quantitative variable
for the individuals in two populations, Population 1 and Population
2?
Population   Parameter   Statistic   Sample size
1            µ1          x̄1          n1
2            µ2          x̄2          n2
We are interested in µ1 − µ2.
Sampling distribution of X̄1 − X̄2
- What is the mean of X̄1 − X̄2?
- What is the standard error of X̄1 − X̄2?
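As a sketch of the answers, assuming the two samples are drawn independently and writing σ1 and σ2 for the population standard deviations (symbols introduced here for illustration, not used elsewhere in these notes):

$$ E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2, \qquad SD(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$

In practice σ1 and σ2 are unknown, so the sample standard deviations S1 and S2 take their place, which gives the standard error $\sqrt{S_1^2/n_1 + S_2^2/n_2}$.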
Example
Rats and ozone
To measure the effects of ozone on weight, one group of 70-day-old
rats was kept in an environment containing ozone for 7 days. A
second group of rats of the same age (the control group) was kept
in an ozone-free environment for the same time. The weight gains
(to the nearest gram) were as follows.
Boxplots
Compare the distributions
[Figure: boxplots of weight by group (Control, Ozone).]
Null and alternative hypotheses
Write down appropriate null and alternative hypotheses to test if the
population means for each group are the same.
Calculate the value of the test statistic
If we have two samples and we are interested in comparing the
population means, then we use
$$ T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}. $$
Example
group     mean    SD      n
Control   22.35   10.79   23
Ozone     11.00   19.09   22
- Calculate the value of t for the rats data.
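One way to check the hand calculation is a short R sketch that works only from the summary table above. The variable names are illustrative, and the difference is taken as Control minus Ozone, which is an assumption about the ordering of the groups.

```r
# Summary statistics copied from the table above
mean_control <- 22.35; sd_control <- 10.79; n_control <- 23
mean_ozone   <- 11.00; sd_ozone   <- 19.09; n_ozone   <- 22

# Standard error of the difference in sample means
se_diff <- sqrt(sd_control^2 / n_control + sd_ozone^2 / n_ozone)

# Two-sample t statistic (Control minus Ozone)
t_stat <- (mean_control - mean_ozone) / se_diff
t_stat
```

Because the table is rounded to two decimal places, the result should be close to, but need not exactly match, a value computed from the raw data.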
The degrees of freedom for the two-sample t-test
- By hand, calculate using min(n1 − 1, n2 − 1).
- R calculates a more accurate version (the Welch approximation).
Calculate, by hand, the degrees of freedom for the rats dataset.
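The sketch below computes the conservative by-hand degrees of freedom and, for comparison, the Welch-Satterthwaite approximation, the standard formula behind a "Welch Two Sample t-test". It uses only the rounded summary statistics from the rats table, so the Welch value is approximate; the variable names are illustrative.

```r
# Summary statistics from the rats table
sd_control <- 10.79; n_control <- 23
sd_ozone   <- 19.09; n_ozone   <- 22

# Conservative degrees of freedom used for hand calculation
df_conservative <- min(n_control - 1, n_ozone - 1)

# Welch-Satterthwaite approximation
v1 <- sd_control^2 / n_control   # estimated variance of the Control mean
v2 <- sd_ozone^2 / n_ozone       # estimated variance of the Ozone mean
df_welch <- (v1 + v2)^2 / (v1^2 / (n_control - 1) + v2^2 / (n_ozone - 1))

c(conservative = df_conservative, welch = df_welch)
```

The conservative value is never larger than the Welch value, so hand-calculated P-values err on the side of being too large.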
What is the P-value for the rats data?
Do you reject or retain the null hypothesis at the 5% significance
level?
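A possible check of the hand-calculated P-value, assuming a two-sided alternative and the conservative degrees of freedom min(n1 − 1, n2 − 1). Only the rounded summary statistics are used, and the variable names are illustrative.

```r
# Summary statistics from the rats table
mean_control <- 22.35; sd_control <- 10.79; n_control <- 23
mean_ozone   <- 11.00; sd_ozone   <- 19.09; n_ozone   <- 22

# Test statistic and conservative degrees of freedom
se_diff <- sqrt(sd_control^2 / n_control + sd_ozone^2 / n_ozone)
t_stat  <- (mean_control - mean_ozone) / se_diff
df_cons <- min(n_control - 1, n_ozone - 1)

# Two-sided P-value: twice the tail area beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df_cons)

# Decision at the 5% significance level
reject_h0 <- p_value < 0.05
c(p_value = p_value, reject_h0 = reject_h0)
```

With these inputs the P-value falls below 0.05, so the null hypothesis would be rejected at the 5% level under the conservative degrees of freedom.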
Confidence interval for two-sample t-test
The formula for calculating the C% confidence interval is
$$ (\bar{X}_1 - \bar{X}_2) \pm t^* \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} $$
where t∗ is the appropriate critical value to give a confidence level
of C.
Note t∗ is not the same as the test statistic that you calculated in
the hypothesis test.
What is the 95% confidence interval for the mean difference in the
weight gain for the ozone compared to the control group?
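A minimal sketch of the interval calculation from the rounded summary statistics, assuming the conservative degrees of freedom and taking the difference as Control minus Ozone (reverse the signs for Ozone minus Control). The variable names are illustrative.

```r
# Summary statistics from the rats table
mean_control <- 22.35; sd_control <- 10.79; n_control <- 23
mean_ozone   <- 11.00; sd_ozone   <- 19.09; n_ozone   <- 22

diff_means <- mean_control - mean_ozone
se_diff    <- sqrt(sd_control^2 / n_control + sd_ozone^2 / n_ozone)

# Critical value t* for 95% confidence: the 97.5th percentile of the
# t-distribution with min(n1 - 1, n2 - 1) degrees of freedom
t_star <- qt(0.975, df = min(n_control - 1, n_ozone - 1))

# 95% confidence interval for mu_Control - mu_Ozone
ci <- diff_means + c(-1, 1) * t_star * se_diff
ci
```

Because the conservative degrees of freedom give a larger t* than the Welch degrees of freedom, this interval is somewhat wider than the one R reports.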
Two-sample t-test in R
t.test(weight~group, data=rats)
##
## Welch Two Sample t-test
##
## data: weight by group
## t = 2.4403, df = 32.877, p-value = 0.02023
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.885639 20.810013
## sample estimates:
## mean in group Control mean in group Ozone
## 22.34783 11.00000
Check the assumptions
Normality
[Figure: two normal QQ-plots (Sample Quantiles against Theoretical Quantiles).]
Check the assumptions
Independence
- Within each group (i.e., random sampling).
- Between each group (i.e., random allocation).
Summary
Hypothesis testing for two means from independent normal
distributions
- Hypotheses:
  H0 : µ1 − µ2 = 0,
  Ha : µ1 − µ2 ≠ 0.
- Test statistic:
  $$ T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} $$
- P-value: t-distribution with min(n1 − 1, n2 − 1) degrees of
  freedom, or read the p-value from the R output.
- Confidence interval:
  $$ (\bar{X}_1 - \bar{X}_2) \pm t^* \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}. $$
Example Wood dataset
[Figure: boxplots of Loss by Preservative (high, low).]
R output
t.test(Loss~Preservative, data=wood)
##
## Welch Two Sample t-test
##
## data: Loss by Preservative
## t = -7.5472, df = 30.269, p-value = 1.935e-08
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -13.04549 -7.49051
## sample estimates:
## mean in group high mean in group low
## 6.016 16.284
QQ-plots
[Figure: two normal QQ-plots (Sample Quantiles against Theoretical Quantiles).]
Matched Pairs t-test
Matched pairs
So far in the two-sample t-tests, we have assumed the two groups
are independent. Consider the following experiment:
Each student in SP1 tests how long they can keep their hands in ice
water before the cold becomes too painful, once while swearing and
once while not swearing.
Matched pairs
In the matched pairs experiment, illustrated in the swearing
example, we are still comparing two treatments, but now each
subject receives both treatments. Therefore, the two groups are no
longer independent.
Model
In the matched pairs design, each subject has two measurements,
and we call these X and Y . To convert this into a problem we can
already do, we look at the differences
D = X − Y
So for each subject, we have one difference. Then we can test if the
mean of D is different from 0, i.e.,
H0 : µD = 0 vs Ha : µD ≠ 0
We can do this with a one-sample t-test.
Example moon data
To assess if the moon phase has an effect on dementia, 15 patients
had the average number of disruptive events measured on moon
days (three days either side of full moon) and the other days.
Moon data
patient moon other
1 3.33 0.27
2 3.67 0.59
3 2.67 0.32
4 3.33 0.19
5 3.33 1.26
6 3.67 0.11
7 4.67 0.30
8 2.67 0.40
9 6.00 1.59
10 4.33 0.60
11 3.33 0.65
12 0.67 0.69
13 1.33 1.26
14 0.33 0.23
15 2.00 0.38
Moon data
[Figure: plot of Number of incidents by day (moon, other).]
Null and alternative hypotheses
Write down appropriate null and alternative hypotheses to test if
there is a difference in the moon data.
Moon data
patient moon other D
1 3.33 0.27 3.06
2 3.67 0.59 3.08
3 2.67 0.32 2.35
4 3.33 0.19 3.14
5 3.33 1.26 2.07
6 3.67 0.11 3.56
7 4.67 0.30 4.37
8 2.67 0.40 2.27
9 6.00 1.59 4.41
10 4.33 0.60 3.73
11 3.33 0.65 2.68
12 0.67 0.69 -0.02
13 1.33 1.26 0.07
14 0.33 0.23 0.10
15 2.00 0.38 1.62
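The D column can be reproduced with a few lines of R. This sketch types the two measurement columns in directly from the table above; the vector names are illustrative and are not the data frame used in the lecture.

```r
# Average number of disruptive events per patient, copied from the table
moon  <- c(3.33, 3.67, 2.67, 3.33, 3.33, 3.67, 4.67, 2.67,
           6.00, 4.33, 3.33, 0.67, 1.33, 0.33, 2.00)
other <- c(0.27, 0.59, 0.32, 0.19, 1.26, 0.11, 0.30, 0.40,
           1.59, 0.60, 0.65, 0.69, 1.26, 0.23, 0.38)

# One difference per patient: moon days minus other days
D <- moon - other

# Summary statistics of the differences
c(mean = mean(D), SD = sd(D), n = length(D))
```

Positive differences indicate more disruptive events on moon days than on other days for that patient.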
Moon data
The summary statistics of the differences are
mean SD n
2.43 1.46 15
Calculate the value of the test statistic
If we have matched pairs data and we are interested in testing if
there was a difference for one treatment compared to the other,
then we use
$$ T = \frac{\bar{D}}{S_D / \sqrt{n}}. $$
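As a rough check, the statistic can be evaluated in R from the rounded summary statistics of the moon differences (mean 2.43, SD 1.46, n 15). The variable names are illustrative.

```r
# Summary statistics of the differences D for the moon data
d_bar <- 2.43   # mean of the differences
s_d   <- 1.46   # standard deviation of the differences
n     <- 15     # number of patients

# Matched-pairs t statistic: mean difference over its standard error
t_stat <- d_bar / (s_d / sqrt(n))
t_stat
```

Rounding in the summary statistics means the result can differ slightly in the second decimal place from a value computed on the raw differences.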
Calculate the P-value
This is done using a t-distribution with n − 1 degrees of freedom,
where n is the number of subjects.
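A small sketch of the P-value calculation for the moon data, assuming a two-sided alternative and using the rounded summary statistics of the differences. The variable names are illustrative.

```r
# Summary statistics of the differences D for the moon data
d_bar <- 2.43; s_d <- 1.46; n <- 15

# Test statistic and degrees of freedom
t_stat <- d_bar / (s_d / sqrt(n))
df     <- n - 1

# Two-sided P-value: twice the tail area beyond |t|
p_value <- 2 * pt(-abs(t_stat), df = df)
p_value
```

The P-value is far below 0.05, so these data give strong evidence against a mean difference of zero.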
Confidence interval for the matched-pairs t-test
The formula for calculating the C% confidence interval is
$$ \bar{D} \pm t^* \frac{S_D}{\sqrt{n}} $$
where t∗ is the appropriate critical value to give a confidence level
of C.
Calculate the 95% confidence interval for the mean of the
differences.
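One way to carry out this calculation in R, again working from the rounded summary statistics of the moon differences. The variable names are illustrative.

```r
# Summary statistics of the differences D for the moon data
d_bar <- 2.43; s_d <- 1.46; n <- 15

# Critical value t* for 95% confidence with n - 1 degrees of freedom
t_star <- qt(0.975, df = n - 1)

# 95% confidence interval for the mean difference
ci <- d_bar + c(-1, 1) * t_star * s_d / sqrt(n)
ci
```

The interval lies entirely above zero, which agrees with rejecting the null hypothesis of no mean difference at the 5% level.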
R output for moon data
t.test(moon$D)
##
## One Sample t-test
##
## data: moon$D
## t = 6.4518, df = 14, p-value = 1.518e-05
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 1.623968 3.241365
## sample estimates:
## mean of x
## 2.432667
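The one-sample test on the differences is equivalent to calling t.test on the two columns with paired = TRUE. The sketch below illustrates this with the values typed in from the moon data table; the vector names are illustrative and stand in for the columns of the moon data frame.

```r
# Measurements per patient, copied from the moon data table
moon  <- c(3.33, 3.67, 2.67, 3.33, 3.33, 3.67, 4.67, 2.67,
           6.00, 4.33, 3.33, 0.67, 1.33, 0.33, 2.00)
other <- c(0.27, 0.59, 0.32, 0.19, 1.26, 0.11, 0.30, 0.40,
           1.59, 0.60, 0.65, 0.69, 1.26, 0.23, 0.38)

# One-sample t-test on the differences
one_sample <- t.test(moon - other)

# Paired t-test on the two columns directly
paired <- t.test(moon, other, paired = TRUE)

# Both calls give the same statistic, degrees of freedom and P-value
c(one_sample = unname(one_sample$statistic),
  paired     = unname(paired$statistic))
```

The paired form avoids creating the difference column by hand, but it relies on the two vectors listing the patients in the same order.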
Check the assumptions
Normality
[Figure: normal QQ-plot (Sample Quantiles against Theoretical Quantiles).]
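A normal QQ-plot of the differences can be drawn with base R graphics. This is a minimal sketch using the D values from the moon data table; it is not necessarily how the lecture figure was produced, and the plot title is illustrative.

```r
# Differences (moon minus other) copied from the moon data table
D <- c(3.06, 3.08, 2.35, 3.14, 2.07, 3.56, 4.37, 2.27,
       4.41, 3.73, 2.68, -0.02, 0.07, 0.10, 1.62)

# Sample quantiles against theoretical normal quantiles
qq <- qqnorm(D, main = "Normal QQ-plot of the differences")

# Reference line through the first and third quartiles
qqline(D)
```

Points lying close to the reference line are consistent with the normality assumption; with only 15 differences, moderate departures are hard to judge.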
Check the assumptions
Independence
Would the behaviour of one patient affect the behaviour of another
patient?
Summary
Hypothesis testing for mean for matched pairs data
- Hypotheses:
  H0 : µD = 0,
  Ha : µD ≠ 0.
- Test statistic:
  $$ T = \frac{\bar{D} - 0}{S_D / \sqrt{n}} $$
- P-value: Calculate using a t-distribution with n − 1 degrees of
  freedom.
- Confidence interval:
  $$ \bar{D} \pm t^* \frac{S_D}{\sqrt{n}}. $$