
STA 304 H1S/ 1003 HS Winter 2021
Outline
Infinite vs Finite populations (§3.1-3.3)
Drawing SRS’s using R
Estimation (§3.6)
Intro to Sampling distributions (§3.4)
Shivon Sue-Chee Module 3- Populations and Distributions 1
Review
I Assignment 1 due in Quercus on Thursday, Sept. 27 by 8pm
I Instructor and TA office hours, Special R help sessions
I Piazza discussion forum
You should know:
I How to overcome selection bias and errors of observation
I What a probability sample is
I Examples of types of probability samples
I Examples of non-probability samples
I How to compare and contrast simple random, stratified, cluster and systematic sampling
Homework Overview
Practice Problems: 2.30, 2.31, 3.2-3.3, 3.7, 3.12-3.14 (ESS)
Infinite vs Finite population
Infinite population:
I infinitely many units exist
I randomly sampled elements have the same chance of being selected
I selections are independent of one another
I opposite of a finite population
I probability models
I approximate distribution of sample statistic
Finite population:
I population size is finite, N; {u1, u2, . . . , uN}
I sampling without replacement: condition of independence not satisfied
Random sampling from an infinite population (§3.2)
y1, . . . , yn: an i.i.d. sample of n observations from a probability mass function p(·) or a probability density function f (·)
we observe the random variable Y1 taking the value y1, Y2 taking the value y2, and so on
“identical”: each Yi follows p(y) or f (y)
“independent”: the values are sampled independently of one another
E.g.: y1, . . . , yn i.i.d. N(µ, σ²)
In R: rnorm(n, µ, σ)
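As a sketch (the sample size, mean and SD below are illustrative values, not from the notes), the rnorm call draws such an i.i.d. sample:

```r
# Draw an i.i.d. sample of n = 10 observations from N(mu = 5, sigma^2 = 4)
set.seed(304)                     # make the draw reproducible
y <- rnorm(10, mean = 5, sd = 2)  # note: rnorm takes the SD, not the variance
n_obs <- length(y)                # 10 independent, identically distributed values
ybar <- mean(y)                   # the sample mean approximates mu
```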
Random sampling from a finite population (§3.3)
has a set of N units/elements of interest, {u1, u2, . . . , uN}
population size = number of elements, N (likely large)
uses a randomization method to select the sample
sampling with replacement, δi = P(ui is sampled)
sampling without replacement, πi = P(ui is sampled)
Random sampling from a finite population (§3.3)
the most general approach is to allow unequal probabilities of selection across units
the more usual approach is to have equal probabilities of selection; the simplest case is πi = 1/N:
I population mean and variance as defined on the slide that follows
I consistent with the usual probability definitions
I motivation for pp. 53-57
I easier to find “good” estimates for designs employing equal probabilities
Random sampling from a finite population (§3.3)
Equal probabilities of selection under:
With replacement:
δi = P(ui is selected on any one draw) = 1/N
Without replacement:
πi = P(the ith element in the population, ui, is selected in the sample)
I changes with the draw
I use the average probability that ui is selected across the n draws
Drawing SRS’s using R
Randomly select with or without replacement from a list
sample(N,n)
sample(N,n,replace=TRUE)
Random samples from various models
runif(n)
rnorm(n)
rbinom(n,m,p)
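For instance (N, n and the distribution parameters below are illustrative values), these calls can be combined as:

```r
set.seed(304)
N <- 20; n <- 5
# SRS of n labels from 1:N without replacement (sample's default)
s_wor <- sample(N, n)
# sampling with replacement: labels may repeat
s_wr <- sample(N, n, replace = TRUE)
# model-based draws
u <- runif(3)             # 3 draws from Uniform(0, 1)
b <- rbinom(3, 10, 0.5)   # 3 draws from Binomial(m = 10, p = 0.5)
```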
Finite population parameters
population mean: µ = (1/N) ∑_{i=1}^N y(ui) = (1/N) ∑_{i=1}^N yi
population variance: σ² = (1/N) ∑_{i=1}^N {y(ui) − µ}² = (1/N) ∑_{i=1}^N (yi − µ)²
population total: τ = ∑_{i=1}^N y(ui) = ∑_{i=1}^N yi = Nµ
population proportion (y = 0 or 1): p = (1/N) ∑_{i=1}^N y(ui) = (1/N) ∑_{i=1}^N yi
population ratio: µy/µx
Give real examples:
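As one concrete example (the population values below are made up), these parameters can be computed in R; note that R's var() divides by N − 1, so the population variance needs the explicit formula:

```r
# Hypothetical finite population of N = 6 values of y
y <- c(5, 0, 1, 3, 0, 2)
N <- length(y)
mu     <- mean(y)               # population mean
sigma2 <- sum((y - mu)^2) / N   # population variance (divide by N, not N - 1)
tau    <- sum(y)                # population total, equal to N * mu
z <- as.numeric(y > 0)          # recode to 0/1
p <- mean(z)                    # population proportion with y > 0
```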
Random Variables
A random variable (measurement, property) is a function yi = y(ui) defined on the population elements ui
the range of a variable is the set of possible values of y
Example:
I population: N restaurants in Toronto
I variable: number of job openings for waitstaff
I range of variable: 0, 1, 2, . . .
I Eg. of values: y(u1) = 5, y(u2) = 0, y(u3) = 1, . . . , y(uN) = 0
Finite population: our class example
population students in 304/1003 Winter 2021
variable social media hours, height, interests,. . .
range
eg. of values
Sampling distributions (§3.4)
ȳ, s², and other functions of the sample (y1, . . . , yn)
statistics have probability distributions, because (y1, . . . , yn) has a probability distribution
Note: only probability samples have probability distributions
In sampling from:
I an infinite population (i.i.d. sampling), p(y) or f (y) determines the
probability distribution of y¯ and other functions of (y1, . . . , yn)
I a finite population, the sampling method, sample size n and the nature
of the population determine the probability distribution of y¯ and other
functions of (y1, . . . , yn)
...sampling distributions (§3.4)
in either case, we can try to figure out the sampling distribution
of the function of interest
or we can simulate many samples, compute many ȳ’s (for example), and make a histogram
see Figures 3.2, 3.4, 3.5 (ESS)
...sampling distributions (§3.4)
For infinite population sampling...:
the Central Limit Theorem (CLT) says
ȳ ·∼ N(µ, σ²/n)
as long as n is large
For finite population sampling...:
the CLT says
ȳ ·∼ N(µ, . . .)
but we need n → ∞, N → ∞, N − n → ∞
Table 3.4
population given in Table 3.4; N = 68
histogram in Figure 3.3
sample means computed from samples of size 5: sampling histogram in Figure 3.4(a)
sample means computed from samples of size 40: sampling histogram in Figure 3.4(b)
simple random sampling without replacement
the central limit theorem works better if the underlying distribution is symmetric
ESS therefore considers log(weight) instead; the population distribution is then more nearly symmetric
see Figure 3.6
in Figure 3.7, N − n is very small (68 − 60 = 8); the CLT approximation works less well
Estimation (§3.6)
Notation:
I population parameter, θ
I sample statistic, θ̂
Using the (approx.) sampling distribution of a sample statistic θ̂,
we can find
I expected value of θ̂, E(θ̂)
I variance of θ̂, Var(θ̂) = E[(θ̂ − E(θ̂))²]
I error of estimation, |θ̂ − θ|
I Bias(θ̂) = E(θ̂) − θ
I Mean Square Error: MSE(θ̂) = E[(θ̂ − θ)²] = Bias²(θ̂) + Var(θ̂)
Show!
I bound on error of estimation = margin of error = 2√Var(θ̂),
used to assess the accuracy of θ̂
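As a numeric sketch (σ² and n are assumed values, not from the notes): for the sample mean, Var(ȳ) = σ²/n, so the bound on the error of estimation is:

```r
# Margin of error for the sample mean ybar when Var(ybar) = sigma^2 / n
sigma2 <- 4; n <- 25
var_ybar <- sigma2 / n     # variance of the sample mean
moe <- 2 * sqrt(var_ybar)  # bound on the error of estimation: 2 * sqrt(4/25)
```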
In general, we want θ̂ unbiased and precise
Unbiased, if E(θ̂) = θ
Precise, if Var(θ̂) is small
Accurate, if MSE(θ̂) is small
Outline
Sampling distributions
I of the sample mean
I central limit theorem
I confidence intervals
I Survey 1 data
Inclusion Probability
Proof of MSE
MSE(θ̂) = E[(θ̂ − θ)²]
= E[(θ̂ − E[θ̂] + E[θ̂] − θ)²]
= E[(θ̂ − E[θ̂])²] + (E[θ̂] − θ)² + 2(E[θ̂] − θ) E[θ̂ − E[θ̂]]
= Var(θ̂) + [Bias(θ̂)]², since E[θ̂ − E[θ̂]] = 0
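The identity can also be checked numerically; here a deliberately biased estimator θ̂ = ȳ + 0.5 of θ = µ is simulated (all numbers are assumed for illustration):

```r
# Numerical check of MSE = Var + Bias^2 for a biased estimator of mu
set.seed(304)
theta <- 10
# theta-hat = sample mean of n = 20 draws from N(10, 9), plus a bias of 0.5
est  <- replicate(1e4, mean(rnorm(20, theta, 3)) + 0.5)
mse  <- mean((est - theta)^2)
bias <- mean(est) - theta           # close to 0.5
v    <- mean((est - mean(est))^2)   # variance (dividing by the number of reps)
# with these definitions, mse equals v + bias^2 exactly
```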
Some raw Intro Survey data
Side-by-side boxplots
> summary(handspan)
Min. 1st Qu. Median Mean 3rd Qu. Max.
6.70 17.00 19.00 18.56 20.50 31.00
> summary(digit)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 3.000 6.000 5.358 7.000 9.000
Generating sampling distributions
function of interest:
I daily average social media hours or
I average height
sampling distribution of sample mean
approx. sampling distributions using simulations in R:
1. generate 1000 samples of size n
2. compute the 1000 sample means of the size-n samples
3. plot a histogram of the 1000 means
4. repeat steps 1-3 for n = 10, 20, 30
Generating sampling distributions
1 Suppose that the population size is N = 212.
2 How many samples of size n = 10 are possible?
3 How many samples of size n = 20 are possible?
4 How many samples of size n = 30 are possible?
5 How many samples are sufficient to represent the sampling
distribution?
6 Which phenomenon occurs as n increases?
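The counts asked for above are binomial coefficients, computable with choose() (N = 212 from the slide):

```r
# Number of distinct SRSs (without replacement) of size n from N = 212
N <- 212
n10 <- choose(N, 10)   # samples of size 10
n20 <- choose(N, 20)   # samples of size 20
n30 <- choose(N, 30)   # samples of size 30
# the counts grow very quickly with n (for n <= N/2)
```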
...Generating sampling distributions
Varying sample size, n
#initialize a zero vector of length 1000 for handspans
hbar = rep(0,1000)
par(mfrow=c(3,1))
#sample size, n=10
for(i in 1:1000){hbar[i]=mean(sample(handspan,10))}
hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31), main="n=10")
#sample size, n=20
for(i in 1:1000){hbar[i]=mean(sample(handspan,20))}
hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31),main="n=20")
#sample size, n=30
for(i in 1:1000){hbar[i]=mean(sample(handspan,30))}
hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31),main="n=30")
Probability of inclusion
Suppose we draw a sample of size n without replacement from a
finite population of N elements (u1, u2, . . . , uN).
How many samples of size n are possible?
If we assume that each sample is equally likely, what is the
probability that one of the samples is selected?
What is the probability that the ith unit is included in the
sample of size n?
...Probability of inclusion
An indicator random variable is defined as (Appendix A, Lohr):
Zi = 1 if unit i is in the sample, 0 if unit i is not in the sample.
Then in a sample of size n, n of the random variables
Z1,Z2, . . . ,ZN will take on the value of 1, and the remaining
N − n will be 0.
If the ith unit is in the sample, then Zi = 1 and the other n − 1
units must come from the remaining N − 1 units in the
population.
...Probability of inclusion
P(Zi = 1) = P(ith unit is in the sample)
= [C(1, 1) · C(N − 1, n − 1)] / C(N, n)
= n/N
Thus,
E[Zi] = 1 × P(Zi = 1) + 0 × P(Zi = 0) = P(Zi = 1) = n/N
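Both the exact counting argument and a simulation confirm πi = n/N (the N and n below are small assumed values):

```r
# P(unit 1 is included in an SRS of size n from N) should equal n/N
set.seed(304)
N <- 8; n <- 3
exact <- choose(N - 1, n - 1) / choose(N, n)        # counting argument
sim   <- mean(replicate(1e4, 1 %in% sample(N, n)))  # simulated inclusion rate
```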
Summary of Sampling distributions
sample statistics are functions of the sample (y1, . . . , yn)
we are considering probability samples
a sample statistic has a probability distribution =⇒ its sampling distribution
Eg. the sampling distribution of the sample mean of tv hours
sampling distributions depend on the sampling mechanism, the sample size n, and the nature of the population
by the CLT, for large n, ȳ ·∼ N
Homework
Assignment #1 due in Quercus by 8pm on Feb. 4
Exercises: 3.2-3.3, 3.7, 3.12, 3.13, 3.14, (Lohr)
2.12-A.1 