
STA 304 H1S/ 1003 HS Winter 2021

Outline

Infinite vs Finite populations (§3.1-3.3)

Drawing SRS’s using R

Estimation (§3.6)

Intro to Sampling distributions (§3.4)

Shivon Sue-Chee Module 3- Populations and Distributions 1

Review

Admin:

I Assignment 1 due in Quercus on Thursday, Sept. 27 by 8pm

I Instructor and TA office hours, Special R help sessions

I Piazza discussion forum

You should know...:

I How do you overcome selection bias and errors of observation?

I What is a probability sample?

I Give examples of types of probability samples.

I Give examples of non-probability samples.

I Compare and contrast simple random, stratified, cluster and systematic sampling.


Homework Overview

Practice Problems: 2.30, 2.31, 3.2-3.3, 3.7, 3.12-3.14 (ESS)


Infinite vs Finite population

Infinite population:

I infinitely many units exist

I randomly sampled elements have the same chance of being selected

I selections are independent of one another

I the opposite of a finite population

I probability models

I approximate distribution of sample statistic

Finite population:

I population size is finite, N; {u1, u2, . . . , uN}

I sampling without replacement: condition of independence not satisfied


Random sampling from an infinite population (§3.2)

y1, . . . , yn : an i.i.d. sample of n observations from a probability mass function, p(·), or a probability density function, f (·)

we observe the random variable Y1 and it takes the value y1, Y2 takes the value y2, and so on

“identical”: each Yi follows p(y) or f (y)

values are sampled independently

Eg.: y1, . . . , yn i.i.d. N (µ, σ2)

In R: rnorm(n, µ, σ)
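As a quick sketch of this call (the values of µ and σ below are arbitrary illustrative choices):

```r
# Sketch: an i.i.d. sample of n = 10 observations from N(mu = 5, sigma = 2).
set.seed(304)                     # for reproducibility
y <- rnorm(10, mean = 5, sd = 2)  # each value drawn independently from the same normal
length(y)                         # n = 10 observations
mean(y)                           # sample mean; near mu for large n
```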


Random sampling from a finite population (§3.3)

has a set of N units/elements of interest, {u1, u2, . . . , uN}

population size = number of elements, N (typically large)

uses a randomization method to select the sample

sampling with replacement: δi = P(ui is sampled)

sampling without replacement: πi = P(ui is sampled)


Random sampling from a finite population (§3.3)

the most general approach is to consider unequal probabilities of selection for each unit

the more usual approach is to have equal probabilities of selection; the simplest case is πi = 1/N :

I population mean and variance as defined on slide that follows

I consistent with usual probability definitions

I motivation for pp 53-57

I easier to find “good” estimates for designs employing equal probabilities


Random sampling from a finite population (§3.3)

Equal probabilities of selection under:

With replacement:

δi = P(ui is selected on any one draw) = 1/N

Without replacement:

πi = P(the ith element in the population, ui , is selected in the sample)

I changes with the draw

I use the average probability that ui is selected across the n draws


Drawing SRS’s using R

Randomly select with or without replacement from a list

sample(N,n)

sample(N,n,replace=TRUE)

Random samples from various models

runif(n)

rnorm(n)

rbinom(n,m,p)
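A minimal sketch of the calls listed above; N and n here are arbitrary illustrative choices:

```r
N <- 68; n <- 5
srs   <- sample(N, n)                 # SRS without replacement: n distinct labels from 1..N
srswr <- sample(N, n, replace = TRUE) # SRS with replacement: labels may repeat
u <- runif(n)                         # n draws from Uniform(0, 1)
z <- rnorm(n)                         # n draws from the standard normal
x <- rbinom(n, 10, 0.5)               # n draws from Binomial(m = 10, p = 0.5)
```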


Finite population parameters

population mean: µ = (1/N) ∑_{i=1}^N y(ui ) = (1/N) ∑_{i=1}^N yi

population variance: σ2 = (1/N) ∑_{i=1}^N {y(ui ) − µ}2 = (1/N) ∑_{i=1}^N (yi − µ)2

population total: τ = ∑_{i=1}^N y(ui ) = ∑_{i=1}^N yi = Nµ

population proportion (for y = 0 or 1): p = (1/N) ∑_{i=1}^N y(ui ) = (1/N) ∑_{i=1}^N yi

population ratio: µy/µx

Give real examples:
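As one concrete example, these parameters can be computed for a toy finite population in R (the values below are made up; note that R's var() divides by N − 1, not N):

```r
# Toy finite population: y(u_i) for N = 5 units (values are illustrative).
y <- c(5, 0, 1, 3, 0)
N <- length(y)
mu     <- mean(y)            # population mean
sigma2 <- mean((y - mu)^2)   # population variance: divide by N, not N - 1
tau    <- sum(y)             # population total; equals N * mu
p      <- mean(y > 0)        # proportion, treating 1{y > 0} as the 0/1 variable
```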


Random Variables

A random variable/measurement/property is a function,

yi = y(ui) defined on population elements, ui

the range of a variable is the set of possible values of y

Example:

I population: N restaurants in Toronto

I variable: number of job openings for waitstaff

I range of variable: 0, 1, 2, . . .

I Eg. of values: y(u1) = 5, y(u2) = 0, y(u3) = 1, . . . , y(uN) = 0


Finite population: our class example

population students in 304/1003 Winter 2021

variable social media hours, height, interests,. . .

range

eg. of values


Sampling distributions (§3.4)

y¯ , s2 and other functions of the sample (y1, . . . , yn), i.e., statistics, have probability distributions, because (y1, . . . , yn) has a probability distribution

Note: only probability samples have probability distributions

In sampling from:

I an infinite population (i.i.d. sampling), p(y) or f (y) determines the probability distribution of y¯ and other functions of (y1, . . . , yn)

I a finite population, the sampling method, the sample size n and the nature of the population determine the probability distribution of y¯ and other functions of (y1, . . . , yn)


...sampling distributions (§3.4)

in either case, we can try to figure out the sampling distribution of the function of interest

or we can simulate lots of samples, compute lots of (e.g.) y¯ ’s, and make a histogram

see Figures 3.2, 3.4, 3.5 (ESS)


...sampling distributions (§3.4)

For infinite population sampling...:

the Central Limit Theorem (CLT) says y¯ ·∼ N (µ, σ2/n) as long as n is large

For finite population sampling...:

the CLT says y¯ ·∼ N (µ, . . .) but we need n → ∞, N → ∞, N − n → ∞
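The infinite-population case is easy to see by simulation; a sketch using a skewed exponential population (an arbitrary illustrative choice):

```r
# CLT sketch: sample means from a skewed Exp(1) population (mu = 1, sigma^2 = 1)
# look approximately normal for moderately large n.
set.seed(1)
n <- 50
ybars <- replicate(2000, mean(rexp(n, rate = 1)))  # 2000 simulated sample means
mean(ybars)   # near mu = 1
var(ybars)    # near sigma^2 / n = 1/50 = 0.02
# hist(ybars) would show an approximately normal shape
```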


Table 3.4

population given in Table 3.4, N = 68

histogram of the population in Figure 3.3

sample mean computed from samples of size 5: sampling histogram in Figure 3.4a

sample mean computed from samples of size 40: sampling histogram in Figure 3.4b

sampling is simple random sampling without replacement

the central limit theorem works better when the underlying distribution is symmetric

the authors switch to log(weight), whose population distribution is more nearly symmetric; see Figure 3.6

in Figure 3.7, N − n is very small (68 − 60 = 8), and the CLT does not work so well


Estimation (§3.6)

Notation:

I population parameter, θ

I sample statistic, θ̂

Using the (approx.) sampling distribution of a sample statistic θ̂,

we can find

I expected value of θ̂, E(θ̂)

I variance of θ̂, Var(θ̂) = E (θ̂ − E(θ̂))2

I error of estimation, |θ̂ − θ|

I Bias(θ̂) = E(θ̂)− θ

I Mean Square Error(θ̂) = E(θ̂ − θ)2=Bias2(θ̂) + Var(θ̂)

Show!

I bound on the error of estimation = margin of error = 2 √Var(θ̂), used to assess the accuracy of θ̂


In general, we want θ̂ unbiased and precise

Unbiased, if E(θ̂) = θ

Precise, if Var(θ̂) is small

Accurate, if MSE(θ̂) is small


Outline

Sampling distributions

I of the sample mean

I central limit theorem

I confidence intervals

I Survey 1 data

Inclusion Probability


...sampling distributions (§3.4)

For infinite population sampling...:

the Central Limit Theorem (CLT) says y¯ ·∼ N (µ, σ2/n) as long as n is large

For finite population sampling...:

the CLT says y¯ ·∼ N (µ, . . .) but we need n → ∞, N → ∞, N − n → ∞


Estimation (§3.6)

Using the (approx.) sampling distribution of a sample statistic θ̂,

we can find

I expected value of θ̂, E(θ̂)

I variance of θ̂, Var(θ̂) = E (θ̂ − E(θ̂))2

I error of estimation, |θ̂ − θ|

I Bias(θ̂) = E(θ̂)− θ

I Mean Square Error(θ̂) = E(θ̂ − θ)2=Bias2(θ̂) + Var(θ̂)

I bound on the error of estimation = margin of error = 2 √Var(θ̂), used to assess the accuracy of θ̂


Proof of MSE

MSE [θ̂] = E[(θ̂ − θ)2]

= E[(θ̂ − E[θ̂] + E[θ̂]− θ)2]

= E[(θ̂ − E[θ̂])2] + (E[θ̂]− θ)2 + 2E[(θ̂ − E[θ̂])(E[θ̂]− θ)]

= Var(θ̂) + [Bias(θ̂)]2
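A simulation sketch of this decomposition, using a deliberately biased estimator θ̂ = y¯ + 0.5 of θ = µ = 0 (all choices here are illustrative):

```r
# Check MSE = Var + Bias^2 numerically for a biased estimator.
set.seed(2)
theta <- 0                                          # true parameter: mu of N(0, 1)
thetahat <- replicate(5000, mean(rnorm(20)) + 0.5)  # 5000 replicates of a biased estimator
mse    <- mean((thetahat - theta)^2)
bias   <- mean(thetahat) - theta                    # should be near 0.5
decomp <- bias^2 + var(thetahat)                    # agrees with mse up to simulation error
```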


Some raw Intro Survey data

(Plots of the raw survey responses were shown on the slides here.)

Side-by-side boxplots (plots shown on the slides); numerical summaries:

> summary(handspan)

Min. 1st Qu. Median Mean 3rd Qu. Max.

6.70 17.00 19.00 18.56 20.50 31.00

> summary(digit)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0.000 3.000 6.000 5.358 7.000 9.000


Generating sampling distributions

function of interest:

I daily average social media hours or

I average height

sampling distribution of sample mean

approx. sampling distributions using simulations in R:

1. generate 1000 samples of size n

2. compute the mean of each of the 1000 size-n samples

3. plot a histogram of the 1000 means

4. repeat steps 1-3 for each sample size n: 10, 20, 30


Generating sampling distributions

1 Suppose that the population size is N = 212.

2 How many samples of size n = 10 are possible?

3 How many samples of size n = 20 are possible?

4 How many samples of size n = 30 are possible?

5 How many samples are sufficient to represent the sampling distribution?

6 Which phenomenon occurs as n increases?
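The counts asked for above are binomial coefficients, computable with choose() (N = 212 from the slide):

```r
# Number of possible SRSs without replacement of size n from N units: choose(N, n).
N <- 212
choose(N, 10)   # possible samples of size 10
choose(N, 20)   # possible samples of size 20
choose(N, 30)   # possible samples of size 30 (astronomically many)
# under SRS, each such sample is equally likely, with probability 1 / choose(N, n)
```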


...Generating sampling distributions

Varying sample size, n

#initialize a zero vector of length 1000 for handspans

hbar = rep(0,1000)

par(mfrow=c(3,1))

#sample size, n=10

for(i in 1:1000){hbar[i]=mean(sample(handspan,10))}

hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31), main="n=10")

#sample size, n=20

for(i in 1:1000){hbar[i]=mean(sample(handspan,20))}

hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31),main="n=20")

#sample size, n=30

for(i in 1:1000){hbar[i]=mean(sample(handspan,30))}

hist(hbar, breaks = seq(6,31,by=0.5), xlim=c(6,31),main="n=30")

(The resulting histograms for n = 10, 20, 30 were shown on the slides here.)

Probability of inclusion

Suppose we draw a sample of size n without replacement from a finite population of N elements (u1, u2, . . . , uN).

How many samples of size n are possible?

If we assume that each sample is equally likely, what is the probability that any one particular sample is selected?

What is the probability that the ith unit is included in the sample of size n?


...Probability of inclusion

An indicator random variable is defined as (Appendix A, Lohr):

Zi = 1 if unit i is in the sample, and Zi = 0 if unit i is not in the sample.

Then in a sample of size n, n of the random variables Z1,Z2, . . . ,ZN take the value 1, and the remaining N − n are 0.

If the ith unit is in the sample, then Zi = 1 and the other n − 1 sampled units must come from the remaining N − 1 units in the population.


...Probability of inclusion

Writing C(a, b) for the binomial coefficient “a choose b”:

P(Zi = 1) = P(ith unit is in the sample)

= C(1, 1) × C(N − 1, n − 1) / C(N, n)

= n/N

Thus,

E[Zi ] = 1 × P(Zi = 1) + 0 × P(Zi = 0) = P(Zi = 1) = n/N
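A simulation sketch checking that the inclusion probability is n/N under SRS without replacement (N, n, and the monitored unit are arbitrary illustrative choices):

```r
# Estimate P(Z_i = 1) by repeatedly drawing SRSs and watching one unit.
set.seed(3)
N <- 30; n <- 6; unit <- 7                     # watch unit u_7
hits <- replicate(10000, unit %in% sample(N, n))
mean(hits)   # close to n/N = 0.2
```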


Summary of Sampling distributions

sample statistics = functions of the sample (y1, . . . , yn)

we are considering probability samples

a sample statistic has a probability distribution =⇒ its sampling distribution

Eg.: the sampling distribution of the sample mean of tv

sampling distributions depend on the sampling mechanism, the sample size n, and the nature of the population

by the CLT, for large n, y¯ is approximately normally distributed


Homework

Assignment #1 due by 8pm Feb. 4 into Quercus

Exercises: 3.2-3.3, 3.7, 3.12, 3.13, 3.14, (Lohr)

2.12-A.1

Readings: Chapter 4 (ESS)

Next topic: Simple Random Sampling
