STA 304/1003 Winter 2021
Module 5 Outline
Stratified Random Sampling Design:
- Inference for the population mean §4.3
- Inference for the population total §4.4
- Inference for the population proportion §4.5
Shivon Sue-Chee Module 5- Stratified Random Sampling 1
Term Test #1 Info
Test #1 is on Monday, February 22, from 3:10 to 4 pm.
Format: Quercus Quiz
Coverage: Chapters 1-5 (except §4.6 and §5.11), slides (up to Wed., Feb. 10), homework, and past tests
Practice tests available in Quercus
Student Solutions Manual: worked-out answers to odd-numbered
questions
Support hours during Reading week
Stratified random sampling- Ch. 5
Definition
[STRS] Divide the target population into non-overlapping strata; draw an SRS within each stratum
Why?
1. More efficient: smaller margin of error
2. More convenient: costs less ($$)
3. Gives estimates for subgroups
4. Protects against an imbalanced sample
Example: study of job status of undergraduate students
Possible strata:
Stratified random sampling- Ch. 5
Text Examples:
USA Canada
Consumer Price Index
Current Population Survey
Establishment Survey
HW: What are the comparable Statistics Canada surveys?
Notation and inference in Stratified RS
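Strata and number of elements:
$\{1, \ldots, L\} \Rightarrow \{N_1, \ldots, N_L\}$ where $\sum_{\ell=1}^{L} N_\ell = N$
Samples: means and sizes
$\{n_1, \ldots, n_L\} \Rightarrow \{\bar{y}_1, \ldots, \bar{y}_L\}$ where $\sum_{\ell=1}^{L} n_\ell = n$
Population mean and total:
$$\mu = \frac{\tau}{N} = \frac{\tau_1 + \cdots + \tau_L}{N} = \frac{N_1\mu_1 + \cdots + N_L\mu_L}{N}$$
Stratum means and estimate of µ:
$$\hat{\mu} = \bar{y}_{st} = \frac{N_1}{N}\,\bar{y}_1 + \cdots + \frac{N_L}{N}\,\bar{y}_L$$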
Inference in Stratified RS: Estimator of µ (§5.3)
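Toolbox (5.1) & (5.2)
Estimator of population mean:
$$\hat{\mu}_{st} = \bar{y}_{st} = \sum_{i=1}^{L} \frac{N_i}{N}\,\bar{y}_i$$
Estimated variance of estimator
Using $\{\bar{y}_i, \mathrm{Var}(\bar{y}_i), \widehat{\mathrm{Var}}(\bar{y}_i)\}$:
$$\widehat{\mathrm{Var}}(\hat{\mu}_{st}) = \widehat{\mathrm{Var}}(\bar{y}_{st}) = \sum_{i=1}^{L} \underbrace{\frac{N_i^2}{N^2}}_{\text{weights}}\; \underbrace{\frac{s_i^2}{n_i}\left(1 - \frac{n_i}{N_i}\right)}_{\text{from SRS}}$$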
Examples 5.1, 5.2
see data in Table 5.1
Summary (strata i = 1, 2, 3):
Stratum   N_i   n_i   Mean (ȳ_i)   Median   s_i
Town A    155   20    33.90        34.50    5.95
Town B    62    8     25.12        26.00    15.25
Rural     93    12    19.00        17.50    9.36
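As a rough check on Toolbox (5.1) and (5.2), the summary statistics above can be plugged into a small R helper. This is only a sketch: the function name `strat_mean` and its argument names are illustrative and do not come from the text, and the bound uses the factor of 2 adopted throughout these slides. Because the inputs are the rounded table values, the results can differ slightly from a calculation on the raw data.

```r
# Stratified estimate of a population mean from stratum summaries.
# strat_mean() is a hypothetical helper name, not a function from the text.
strat_mean <- function(N, n, ybar, s) {
  W   <- N / sum(N)                           # stratum weights N_i / N
  est <- sum(W * ybar)                        # Toolbox (5.1)
  v   <- sum(W^2 * (s^2 / n) * (1 - n / N))   # Toolbox (5.2), with fpc
  c(estimate = est, variance = v, bound = 2 * sqrt(v))
}

# Rounded summary values for Town A, Town B, Rural
res <- strat_mean(N    = c(155, 62, 93),
                  n    = c(20, 8, 12),
                  ybar = c(33.90, 25.12, 19.00),
                  s    = c(5.95, 15.25, 9.36))
print(round(res, 3))
```

Returning the estimate, variance, and bound together keeps the three quantities that the toolbox formulas produce in one place.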
...example 5.2, using R
> A=scan()
1: 35 43 36 39 28 28 29 25 38 27
11: 26 32 29 40 35 41 37 31 45 34
21:
Read 20 items
> B = scan()
1: 27 15 4 41 49 25 10 30
9:
Read 8 items
> Rural = scan()
1: 8 14 12 15 30 32 21 20 34 7 11 24
13:
Read 12 items
...example 5.2: boxplots
[Figure: side-by-side boxplots of hours (horizontal axis, 10 to 50) for each stratum: Town A, Town B, Rural]
Q: Sketch one boxplot of the hours using the data from all 3 areas.
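One way to check a hand sketch is to draw the plots in R. The following is a minimal sketch that re-enters the Example 5.2 data as vectors instead of using `scan()`. The object names `hours` and `all_hours` are illustrative choices, not from the lecture code.

```r
# Example 5.2 data, entered directly as vectors
A     <- c(35, 43, 36, 39, 28, 28, 29, 25, 38, 27,
           26, 32, 29, 40, 35, 41, 37, 31, 45, 34)
B     <- c(27, 15, 4, 41, 49, 25, 10, 30)
Rural <- c(8, 14, 12, 15, 30, 32, 21, 20, 34, 7, 11, 24)

# One boxplot per stratum, drawn horizontally
hours <- list("Town A" = A, "Town B" = B, "Rural" = Rural)
boxplot(hours, horizontal = TRUE, xlab = "hours", ylab = "Stratum")

# Pool the three samples and draw a single boxplot
all_hours <- unlist(hours, use.names = FALSE)
print(fivenum(all_hours))   # min, lower hinge, median, upper hinge, max
boxplot(all_hours, horizontal = TRUE, xlab = "hours")
```

Note that the pooled plot treats the 40 observations as one sample and ignores the stratum weights.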
...example 5.2: an R function
> # msn(): return the mean, standard deviation, and size of a sample
> msn <- function(v){
+ nv <- length(v)
+ meanv <- mean(v)
+ sdv <- sd(v)
+ c(meanv, sdv, nv)
+ }
> #using msn function
> msn(A)
[1] 33.90000 5.94625 20.00000
> msn(B)
[1] 25.12500 15.24502 8.00000
> msn(Rural)
[1] 19.00000 9.36143 12.00000
...example 5.2: getting margin of error
> nA = msn(A)[3] # or length(A)
> nB = msn(B)[3]
> nR = msn(Rural)[3]
> NAA = 155
> NB = 62
> NR = 93
> c(nA/NAA,nB/NB,nR/NR)
[1] 0.1290323 0.1290323 0.1290323
> #notice that fpc’s are the same
> fpc = 1-.129
> NAA^2*fpc*var(A)/nA + NB^2*fpc*var(B)/nB +
+ NR^2*fpc*var(Rural)/nR
[1] 189277.8
> .Last.value/((NAA+NB+NR)^2)
[1] 1.969592
Therefore, m.e. = ±2 × √1.969592 ≈ ±2.81
Class data example
Use our class data to obtain a stratified random sample of n = 20
heights with job status as our stratification variable. Find a 95%
interval estimate for the mean population height.
Compare results with simple random sampling.
Refer to “jobheight.csv” data file and lecture codes.
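The steps of this exercise can be sketched in R as follows. Everything about the data here is an assumption: the population is simulated as a stand-in for the class file, the column names `job` and `height` and the three job categories are hypothetical, and the lecture codes may proceed differently. The sample is allocated proportionally, and the interval uses the ±2 standard error convention from these slides.

```r
set.seed(304)

# Simulated stand-in population (NOT the real jobheight.csv data)
pop <- data.frame(
  job    = rep(c("none", "part-time", "full-time"), times = c(50, 30, 20)),
  height = round(rnorm(100, mean = 168, sd = 9), 1)
)

n   <- 20
N_h <- as.vector(table(pop$job))        # stratum sizes (alphabetical order)
n_h <- round(n * N_h / sum(N_h))        # proportional allocation: 4, 10, 6

# Draw an SRS without replacement within each stratum
strata <- split(pop$height, pop$job)
samp   <- Map(function(y, k) sample(y, k), strata, n_h)

# Stratified estimate and approximate 95% interval
W       <- N_h / sum(N_h)
ybar_st <- sum(W * sapply(samp, mean))
v_st    <- sum(W^2 * sapply(samp, var) / n_h * (1 - n_h / N_h))
ci_st   <- ybar_st + c(-2, 2) * sqrt(v_st)

# Simple random sample of the same size, for comparison
srs    <- sample(pop$height, n)
v_srs  <- var(srs) / n * (1 - n / nrow(pop))
ci_srs <- mean(srs) + c(-2, 2) * sqrt(v_srs)

print(rbind(stratified = ci_st, srs = ci_srs))
```

With simulated heights that do not depend on job status, the two intervals should have similar widths. Stratification helps only when the strata differ in mean height.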
[STRS] of total τ and proportion p
Estimation of total τ and proportion p
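Toolbox: (5.3) & (5.4)
$$\hat{\tau}_{st} = N\hat{\mu}_{st} = N\bar{y}_{st} = \sum_{i=1}^{L} N_i\,\bar{y}_i$$
Notice: N cancels out
$$\widehat{\mathrm{Var}}(\hat{\tau}_{st}) = N^2 \sum_{i=1}^{L} \frac{N_i^2}{N^2}\,\frac{s_i^2}{n_i}\left(1 - \frac{n_i}{N_i}\right)$$
Toolbox: (5.13) & (5.14)
$$\hat{p}_{st} = \frac{1}{N}\sum_{i=1}^{L} N_i\,\hat{p}_i$$
$$\widehat{\mathrm{Var}}(\hat{p}_{st}) = \sum_{i=1}^{L} \frac{N_i^2}{N^2}\,\frac{\hat{p}_i\hat{q}_i}{n_i - 1}\left(1 - \frac{n_i}{N_i}\right)$$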
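The formulas for the total and the proportion can be sketched in R in the same way as those for the mean. In this sketch the total is illustrated with the rounded Example 5.2 summaries, while the inputs to the proportion example are made-up numbers chosen only to show the call. The helper name `strat_prop` is hypothetical.

```r
# Total: Toolbox (5.3) and (5.4), using rounded Example 5.2 summaries
N    <- c(155, 62, 93)
n    <- c(20, 8, 12)
ybar <- c(33.90, 25.12, 19.00)
s    <- c(5.95, 15.25, 9.36)

tau_hat <- sum(N * ybar)                          # N cancels out of N * ybar_st
var_tau <- sum(N^2 * (s^2 / n) * (1 - n / N))     # N^2 * Var-hat(ybar_st)

# Proportion: Toolbox (5.13) and (5.14); strat_prop() is a hypothetical helper
strat_prop <- function(N, n, p_hat) {
  W <- N / sum(N)
  v <- sum(W^2 * p_hat * (1 - p_hat) / (n - 1) * (1 - n / N))
  c(estimate = sum(W * p_hat), variance = v, bound = 2 * sqrt(v))
}

# Made-up inputs, for illustration only
p_res <- strat_prop(N = c(100, 200), n = c(10, 20), p_hat = c(0.5, 0.25))

print(c(tau_hat = tau_hat, bound_tau = 2 * sqrt(var_tau)))
print(round(p_res, 4))
```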
Determining sample size
What about sample size(s)? (§5.4, 5.7)
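Use $\mathrm{Var}(\hat{\theta})$: $V(\bar{y}_{st})$; $\mathrm{Var}(\hat{p})$
Need $\{\sigma_1^2, \ldots, \sigma_L^2\}$; $\{p_1, \ldots, p_L\}$
Given a desired bound, B, set $B = 2\sqrt{\mathrm{Var}(\hat{\theta})}$ and solve for n
Toolbox (5.6): for µ or τ
$$n = \frac{\sum_{i=1}^{L} N_i^2\sigma_i^2/a_i}{N^2 D + \sum_{i=1}^{L} N_i\sigma_i^2}, \qquad D(\mu) = \frac{B^2}{4}, \quad D(\tau) = \frac{B^2}{4N^2}$$
Toolbox (5.15): for p
$$n = \frac{\sum_{i=1}^{L} N_i^2 p_i q_i/a_i}{N^2 D + \sum_{i=1}^{L} N_i p_i q_i}, \qquad D(p) = \frac{B^2}{4}$$
Allocation fraction: $a_i$, $i = 1, \ldots, L$; $\sum a_i = 1$. Hence, $n_i = n a_i$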
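The sample size formulas above translate directly into a short R function. This is a sketch under stated assumptions: the helper name `n_for_bound` is hypothetical, the sample standard deviations from the Example 5.2 summary stand in for the unknown σ_i as planning values, the allocation is proportional, and the bound B = 2 hours is chosen only for illustration.

```r
# n_for_bound() is a hypothetical helper for Toolbox (5.6) and (5.15).
# v holds planning values: sigma_i^2 for a mean or total, p_i * q_i for a proportion.
n_for_bound <- function(N, v, a, D) {
  stopifnot(abs(sum(a) - 1) < 1e-8)     # allocation fractions must sum to 1
  sum(N^2 * v / a) / (sum(N)^2 * D + sum(N * v))
}

N     <- c(155, 62, 93)
sigma <- c(5.95, 15.25, 9.36)   # assumption: sample SDs used as planning values
a     <- N / sum(N)             # assumption: proportional allocation
B     <- 2                      # assumed bound on the error of estimation (hours)

n_raw <- n_for_bound(N, sigma^2, a, D = B^2 / 4)   # D for estimating mu
n     <- ceiling(n_raw)                            # round up to a whole unit
n_i   <- ceiling(n * a)                            # stratum sample sizes

print(c(n_raw = n_raw, n = n))
print(n_i)
```

With these planning values the formula gives n of about 70, which is rounded up. For a total, pass `D = B^2 / (4 * sum(N)^2)`, and for a proportion pass `v = p * (1 - p)`.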
How to allocate to strata?
Optimal allocation (§5.5, 5.7)
Aim: minimize Var(θ̂) for a fixed cost or minimize cost for fixed
Var(θ̂)
$$n_i \propto \frac{N_i\sigma_i}{\sqrt{c_i}}$$
See Toolbox (5.7). What does $n_i$ depend on?
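The proportionality above can be turned into allocation fractions by normalizing so that they sum to 1. The sketch below uses a hypothetical helper, `alloc_optimal`, and made-up planning values and unit costs, so the numbers illustrate the mechanics only.

```r
# alloc_optimal() is a hypothetical helper:
# a_i is proportional to N_i * sigma_i / sqrt(c_i), normalized to sum to 1
alloc_optimal <- function(N, sigma, cost) {
  w <- N * sigma / sqrt(cost)
  w / sum(w)
}

N     <- c(155, 62, 93)
sigma <- c(5, 15, 10)    # made-up planning standard deviations
cost  <- c(9, 9, 16)     # made-up per-unit sampling costs

a   <- alloc_optimal(N, sigma, cost)
n_i <- 40 * a            # split an assumed total sample of n = 40

print(round(a, 4))
print(round(n_i, 1))
```

A stratum receives a larger share when it is larger (N_i), more variable (σ_i), or cheaper to sample (c_i).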
Optimal allocation (§5.5, 5.7)
Special case: Neyman allocation, where the $c_i$'s are all the same
$$n_i = n\left(\frac{N_i\sigma_i}{\sum_{k=1}^{L} N_k\sigma_k}\right)$$
See eqs. (5.9, 5.10)
Special case: proportional allocation, where the $\sigma_i$'s are all the same as well
$$n_i = n\left(\frac{N_i}{N}\right)$$
See eqs. (5.11, 5.12)
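The two special cases can be compared numerically. This sketch reuses the stratum sizes from Example 5.2 and, as an assumption, treats the rounded sample standard deviations from that example as planning values for σ_i. The total n = 40 matches the size of the sample actually drawn there.

```r
N <- c(155, 62, 93)             # Town A, Town B, Rural
s <- c(5.95, 15.25, 9.36)       # assumption: sample SDs used as planning values
n <- 40

n_neyman <- n * N * s / sum(N * s)   # equal costs: n_i proportional to N_i * sigma_i
n_prop   <- n * N / sum(N)           # equal costs and equal sigma_i

print(round(rbind(neyman = n_neyman, proportional = n_prop), 2))
```

Proportional allocation reproduces the sample sizes 20, 8, and 12 used in Example 5.2, which is why the three finite population corrections there are equal. Neyman allocation shifts units toward Town B, the most variable stratum.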
Why do we stratify?
improves precision, if strata are more homogeneous than population
can be more work, depending on how strata are chosen
so if strata are not more homogeneous, may not be worth it
See Example 5.17: stores chosen from various chains:
         Chain 1   Chain 2   Chain 3   Chain 4
N_i      24        36        30        30
n_i      4         6         5         5
ȳ_i      99        100       98        100
s_i^2    78.67     55.60     39.50     112.50
$\widehat{\mathrm{Var}}(\bar{y}_{st})$ is larger than it would be from an SRS
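The stratified side of this comparison can be computed from the table alone, as in the sketch below. The SRS side needs the individual store values, which are not reproduced on this slide, so it is not computed here.

```r
# Example 5.17 summaries: four chains as strata
N    <- c(24, 36, 30, 30)
n    <- c(4, 6, 5, 5)
ybar <- c(99, 100, 98, 100)
s2   <- c(78.67, 55.60, 39.50, 112.50)

W       <- N / sum(N)                               # stratum weights
ybar_st <- sum(W * ybar)                            # stratified mean
var_st  <- sum(W^2 * (s2 / n) * (1 - n / N))        # estimated variance

print(c(ybar_st = ybar_st, var_st = round(var_st, 3)))
```

The stratum means range only from 98 to 100 while the within-stratum variances are large, so the chains are not more homogeneous than the population as a whole.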
More examples
Dollar stratification (Example 5.19): “cumulative square root of the
frequency method”
EPA National Pesticide Survey
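As a rough illustration of the cumulative square root of the frequency method named above, the sketch below follows one common description of the rule: take the square roots of the class frequencies, accumulate them, and place the stratum boundaries where the cumulative total is closest to equally spaced targets. The frequency table is made up for illustration and is not the data of Example 5.19.

```r
# Made-up frequency table: upper class limits (in $) and class counts
upper <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
freq  <- c(42, 30, 18, 12, 9, 7, 5, 4, 2, 1)
L     <- 3                                   # desired number of strata

cum_sqrt <- cumsum(sqrt(freq))               # cumulative sqrt(frequency)
targets  <- max(cum_sqrt) * (1:(L - 1)) / L  # equally spaced cut points

# For each target, pick the class whose cumulative value is closest
cut_idx    <- sapply(targets, function(t) which.min(abs(cum_sqrt - t)))
boundaries <- upper[cut_idx]

print(data.frame(upper, freq, cum_sqrt = round(cum_sqrt, 2)))
print(boundaries)
```

Units with values up to the first boundary form stratum 1, those between the boundaries form stratum 2, and the rest form stratum 3.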
Example (Lohr: example 3.12)
EPA sampled drinking water wells to estimate the prevalence of
pesticides and nitrate
needed wide range of levels of pesticide use, and susceptibility to
groundwater pollution
wanted to study community water systems (CWS) and rural
domestic wells
EPA developed criteria for separating the population of CWS wells
and rural wells into 4 categories of pesticide use and 3 relative
ground-water vulnerability measures
identified 5 subgroups: CWS with high average vulnerability, rural
with high, rural high pesticide, etc.
assumed 0.5% of CWS wells contained pesticides; 1% of rural;
estimated 564 public and 734 private wells provide adequate precision
...Example (Lohr: example 3.12)
Stratum   Pesticide use   Estimated groundwater vulnerability   Number of counties
1         High            High                                  106
2         High            Moderate                              234
3         High            Low                                   129
4         Moderate        High                                  110
5         Moderate        Moderate                              204
6         Moderate        Low                                   267
7         Low             High                                  193
8         Low             Moderate                              375
9         Low             Low                                   404
10        Uncommon        High                                  186
11        Uncommon        Moderate                              513
12        Uncommon        Low                                   416
Advantages for this study
wells within stratum more homogeneous → more precise estimate
every combination of pesticide use and groundwater vulnerability is included in the sample
can estimate within each stratum as well
factorial design permits estimation of interactions
Practice questions
Exercise 5.10 (same as Example 5.3)
Stratum       N    n    ȳ       s^2
Stratum I     86   14   63.4    1072
Stratum II    72   12   183.0   9054
Stratum III   52   9    340.6   16794
Stratum IV    30   5    472.4   72376
$\hat{\tau}_{st}$
$\widehat{\mathrm{Var}}(\hat{\tau}_{st})$
Exercise 5.13 (same as Example 5.12)
Stratum                          N     p     c
Stratum I: users, city           97    0.9   $4
Stratum II: users, country       43    0.9   $4
Stratum III: nonusers, city      145   0.5   $8
Stratum IV: nonusers, country    68    0.5   $8
sample size and allocation given bound:
Exercise 5.14
Stratum                          N     n    p̂
Stratum I: users, city           97    39   0.87
Stratum II: users, country       43    17   0.93
Stratum III: nonusers, city      145   69   0.60
Stratum IV: nonusers, country    68    32   0.53
Exercise 5.32
Job                       Mean (ȳ)   SD*    Sample size
Anesthesiologist          6.63       0.15   1347
Anesthesiology resident   7.74       0.35   163
Nurse anesthetist         6.55       0.11   1095
Estimate µ of population and find a bound on the error of estimation
Exercise 5.36
Region          Number   Mean   Standard deviation
North Central   1052     326    271
North East      210      95     79
South           1376     200    244
West            418      730    837
optimal allocation
Next steps
Comparative Stats Canada Surveys:
HW : 5.10, 5.13, 5.14, 5.32, 5.36
Contact hours available during Reading week
Readings: Chapter 5 (ESS)