
UNIVERSITY OF TORONTO

Faculty of Arts and Science

STA304H1F/1003HF FALL 2015 MIDTERM TEST #2 SOLUTIONS

November 25, 2015. Duration: 50 minutes

Aids: Two-sided handwritten notes (8 1/2 x 11) and a non-programmable calculator.

Instructions: This test consists of 4 questions on 7 pages. Please answer all questions on the question

paper, showing all your work and using proper English. The maximum mark for this test is 50.

1. (9 marks) An auditor is confronted with a long list of accounts receivable for a firm. She must verify the amounts on 10% of these accounts and estimate the average difference between the audited and book values.

(a) (3 marks) Suppose the accounts are arranged chronologically (according to their dates), with the

older accounts tending to have smaller values. Would systematic or random sampling be preferred?

Explain briefly.

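In this case systematic sampling would be preferred, as the population is ordered (account values tend to increase along the list). [2]

Thus, the variance of an estimate from a systematic sample would be expected to be smaller than that from a simple random sample of the same size. [1]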

OR: A systematic sample would give a better representation of the population...

OR: (Any other sound reason)
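A minimal R sketch of this argument, under a loudly hypothetical assumption: 1,000 accounts whose values rise perfectly linearly along the list ($y_i = i$). Because a 1-in-10 systematic sample is fixed once its random start is chosen, the exact variance of its mean can be found by enumerating all 10 starts and compared with the exact variance of the mean of an SRS of the same size.

```r
# Hypothetical ordered population: account values that increase along the list.
# Assumption: a perfectly linear trend, used only to illustrate the idea.
N <- 1000
y <- 1:N
k <- 10          # 1-in-10 systematic sample, i.e. 10% of the accounts
n <- N / k       # sample size

# Each systematic sample is determined by its random start s = 1, ..., k
sys_means <- sapply(1:k, function(s) mean(y[seq(s, N, by = k)]))
# Exact variance of the systematic-sample mean (each start has probability 1/k)
var_sys <- mean((sys_means - mean(y))^2)

# Exact variance of the mean of an SRS of the same size: (1 - n/N) * S^2 / n
var_srs <- (1 - n / N) * var(y) / n

c(systematic = var_sys, srs = var_srs)
```

On this idealized list the systematic-sample mean has variance 8.25, against 750.75 for the SRS mean. A real ledger has a noisier trend, so the gain would be smaller, but it would typically point in the same direction.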

(b) (3 marks) Suppose the accounts are grouped by department, and then listed chronologically within

departments. The older accounts again tend to have smaller values. Would systematic or random

sampling be preferred? Explain briefly.

In this case (simple) random sampling would be preferred. [1]

Because the accounts are ordered within departments, the population behaves more

like a periodic population. [2]

OR: The population will have a cycle (large to small to large) along the list, so sys-

tematic sampling could be biased and collect all large or small accounts.

OR: Use stratified random sampling, with departments as strata. Within each stratum, we can use simple random sampling, or systematic sampling to take advantage of the chronology. [3]

OR: Use repeated systematic sampling to overcome the periodicity. [3]

OR: (Any other sound reason)
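The periodicity problem can be made concrete with a small R sketch. It assumes, purely for illustration, 100 hypothetical departments of exactly 10 accounts each, with values 1 to 10 in chronological order within each department, so that the sampling interval k = 10 matches the period of the list.

```r
# Hypothetical periodic population: 100 departments x 10 accounts,
# values rising from oldest (1) to newest (10) within each department.
y <- rep(1:10, times = 100)
N <- length(y)
k <- 10                 # 1-in-10 systematic sample; k equals the period here
n <- N / k

# Start s picks the same within-department position in every department
sys_means <- sapply(1:k, function(s) mean(y[seq(s, N, by = k)]))
sys_means               # a sample made entirely of value s, for s = 1, ..., 10

var_sys <- mean((sys_means - mean(y))^2)   # exact, averaging over the k starts
var_srs <- (1 - n / N) * var(y) / n        # exact variance of an SRS mean
c(systematic = var_sys, srs = var_srs)
```

With this deliberately extreme alignment, every possible systematic sample contains only the oldest accounts, only the newest, or only one intermediate position, which is the risk of collecting "all large or all small" accounts described above.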

(c) (3 marks) Which of the following three estimation methods do you think is most appropriate to estimate the desired population mean: ratio estimation, regression estimation or difference estimation? Explain.

In this case, difference estimation is most appropriate, [1]

since audited and book values are highly correlated and both are measured on the

same scale. [1]

It is simpler than regression estimation, since the slope is fixed at one rather than estimated. [1]

AND/OR: Compared with ratio estimation, there is no particular reason to expect the regression line to pass through the origin, and the aim is to estimate a difference rather than a ratio. [1]
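A minimal R sketch of the difference estimator under simple random sampling, assuming the population mean book value `mu_book` and the list length `N` are known. The function name `difference_estimate` and the toy numbers are hypothetical; the variance is the usual SRS formula applied to the differences $d_i = \text{audited}_i - \text{book}_i$, and the bound is two estimated standard errors.

```r
# Hypothetical helper: difference estimate of the mean audited value.
# audit, book: audited and book values for the sampled accounts
# mu_book: known population mean book value; N: number of accounts on the list
difference_estimate <- function(audit, book, mu_book, N) {
  d <- audit - book                              # sample differences
  n <- length(d)
  est <- mu_book + mean(d)                       # mu_x plus the mean difference
  bound <- 2 * sqrt((1 - n / N) * var(d) / n)    # bound on the error of estimation
  c(mean_difference = mean(d), estimate = est, bound = bound)
}

# Toy example (hypothetical values)
difference_estimate(audit = c(10, 13, 14), book = c(9, 11, 13), mu_book = 100, N = 30)
```

Because `mu_book` is a constant, the same bound applies to the mean difference itself, which is the quantity the auditor is asked to estimate.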


2. (16 marks) A forest resource manager is interested in estimating the number of dead fir trees in a

300-acre area of heavy infestation. Using an aerial photo, he divides the area into 200 plots, each of 1.5

acres. Let x denote the photo count of dead firs and y the actual ground count for a simple random

sample of n = 10 plots. The total number of dead fir trees obtained from the photo count is τx = 4200.

The sample data is shown in the table and plotted in the figure below.

Plot sampled   1   2   3   4   5   6   7   8   9  10
Photo count   12  30  24  24  18  30  12   6  36  42
Ground count  18  42  24  36  24  36  14  10  48  54

(Note: the ground count for the 8th plot sampled contained a typo, which has been corrected in the table above; this was taken into account in marking.)

[Figure: scatter plot of ground count (vertical axis) against photo count (horizontal axis) for the 10 sampled plots.]

(a) (4 marks) Construct a ratio estimate of the total number of dead firs in the 300-acre area. (The second part of this question, "Place a bound on the error of estimation," was omitted.)

$r = \frac{\bar{y}}{\bar{x}} = \frac{30.6}{23.4} = 1.307692$

$\hat{\tau}_y = r\tau_x = 1.307692(4200) = 5492.31$

Hence, a ratio estimate of the number of dead firs in the 300-acre area is 5492.31 trees.
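The arithmetic above can be reproduced in R directly from the sample table; the variable names are illustrative.

```r
# Sample data from the table above (n = 10 plots)
photo  <- c(12, 30, 24, 24, 18, 30, 12,  6, 36, 42)
ground <- c(18, 42, 24, 36, 24, 36, 14, 10, 48, 54)
tau_x  <- 4200                       # total photo count over all N = 200 plots

r <- mean(ground) / mean(photo)      # ratio estimate, 30.6 / 23.4
tau_ratio <- r * tau_x               # ratio estimate of the total ground count
round(c(r = r, tau_ratio = tau_ratio), 2)
```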


(b) (4 marks) The model $y_i = \alpha + \beta(x_i - \bar{x})$ was fitted to these data and some related R output appears below. The estimates of $\alpha$ and $\beta$ were 30.60 and 1.26 respectively, to 2 decimal places. Construct a regression estimate for the total number of dead firs. Place a bound on the error of estimation.

> reg_model= lm(ground~ I(photo - mean(photo)))
> summary(reg_model)
Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)             30.6000     1.1507   26.59 4.30e-09 ***
I(photo - mean(photo))   1.2594     0.1057   11.91 2.27e-06 ***
Residual standard error: 3.639 on 8 degrees of freedom
Multiple R-squared: 0.9466, Adjusted R-squared: 0.94
F-statistic: 141.9 on 1 and 8 DF, p-value: 2.269e-06
> mean(ground)
[1] 30.6
> mean(photo)
[1] 23.4
> sum(residuals(reg_model)^2)/8
[1] 13.24012

Answer (b)

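$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x}) = 30.6 + 1.2594\left(\frac{4200}{200} - 23.4\right) = 27.57744$

$\hat{\tau}_{yL} = N\hat{\mu}_{yL} = 200(27.57744) = 5515.49$

A regression estimate of the total number of dead fir trees is about 5515.5 trees (5515.50 if the unrounded slope is carried through).

A bound on the error of estimation is found by

$B = 2N\sqrt{\left(1 - \frac{n}{N}\right)\frac{MSE}{n}} = 2(200)\sqrt{\left(1 - \frac{10}{200}\right)\frac{13.24012}{10}} = 448.6$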
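A sketch of the same calculation in R, refitting the model from the sample table rather than copying the printed output. It assumes the bound formula used above, with the MSE computed on $n - 2 = 8$ degrees of freedom.

```r
photo  <- c(12, 30, 24, 24, 18, 30, 12,  6, 36, 42)
ground <- c(18, 42, 24, 36, 24, 36, 14, 10, 48, 54)
N <- 200; n <- length(ground)
mu_x <- 4200 / N                             # population mean photo count per plot (21)

fit <- lm(ground ~ I(photo - mean(photo)))   # same model as the R output above
b   <- unname(coef(fit)[2])                  # slope estimate, about 1.2594
mse <- sum(residuals(fit)^2) / (n - 2)       # about 13.24, on 8 degrees of freedom

mu_L  <- mean(ground) + b * (mu_x - mean(photo))   # regression estimate of the mean
tau_L <- N * mu_L                                  # regression estimate of the total
bound <- 2 * N * sqrt((1 - n / N) * mse / n)       # bound on the error of estimation
round(c(tau_L = tau_L, bound = bound), 2)
```

Carrying the unrounded slope through gives 5515.50, which is why the hand calculation with b = 1.2594 lands at 5515.49.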

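(c) (3 marks) Do you think that regression estimation is better than ratio estimation for this problem? Explain.

Not necessarily. We expect the regression of ground count on photo count to pass through the origin (a plot with no dead firs visible on the photo should have few or none on the ground), which is the setting in which the ratio estimator works well. The alternative R regression output below supports this: the estimated intercept, 1.1307, is not significantly different from zero (p = 0.689).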

Call:
lm(formula = ground ~ photo)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.1307     2.7286   0.414    0.689
photo         1.2594     0.1057  11.911 2.27e-06 ***
Residual standard error: 3.639 on 8 degrees of freedom
Multiple R-squared: 0.9466, Adjusted R-squared: 0.94
F-statistic: 141.9 on 1 and 8 DF, p-value: 2.269e-06
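As a rough numerical check on this answer, the R sketch below compares estimated bounds for the two estimators. It assumes the usual textbook variance estimators: for the ratio estimator, $\hat{V}(\hat{\tau}_y) = N^2(1 - n/N)s_r^2/n$ with $s_r^2 = \sum(y_i - r x_i)^2/(n - 1)$; for the regression estimator, the MSE on $n - 2$ degrees of freedom, as in part (b).

```r
photo  <- c(12, 30, 24, 24, 18, 30, 12,  6, 36, 42)
ground <- c(18, 42, 24, 36, 24, 36, 14, 10, 48, 54)
N <- 200; n <- length(ground)
fpc <- 1 - n / N                     # finite population correction

# Ratio estimator: residuals about a line through the origin with slope r
r    <- mean(ground) / mean(photo)
s2_r <- sum((ground - r * photo)^2) / (n - 1)
bound_ratio <- 2 * N * sqrt(fpc * s2_r / n)

# Regression estimator: residuals about the fitted least-squares line
fit <- lm(ground ~ photo)
mse <- sum(residuals(fit)^2) / (n - 2)
bound_reg <- 2 * N * sqrt(fpc * mse / n)

round(c(ratio = bound_ratio, regression = bound_reg), 1)
```

On these data the ratio bound (about 428) is slightly smaller than the regression bound (about 449): fitting an intercept that is close to zero removes little residual variation but costs a degree of freedom.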


(d) (i) (1 mark) Compute an estimate of the total number of dead firs using the ground count

data only.

$N\bar{y} = 200(30.6) = 6120$

(ii) (2 marks) Is your estimator in (i) unbiased or biased? Explain. (No calculations necessary.)

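It is unbiased: under simple random sampling, $\bar{y}$ is an unbiased estimator of $\mu_y$, and $\hat{\tau} = N\bar{y}$ is a constant multiple of $\bar{y}$, so $E(\hat{\tau}) = N\mu_y = \tau_y$.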

(iii) (2 marks) Do you expect that your estimator in (i) will be more efficient than the ratio esti-

mator in part (a)? Explain. (No further calculations necessary.)

No. Since photo count and ground count are strongly positively correlated (the sample correlation is about 0.97, as $R^2 = 0.9466$), we expect the ratio estimator to be more precise.
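A quick R sketch supporting this answer, using the sample table and the same textbook variance formulas as elsewhere in this solution (SRS variance of $N\bar{y}$, and $s_r^2$ with divisor $n - 1$ for the ratio estimator).

```r
photo  <- c(12, 30, 24, 24, 18, 30, 12,  6, 36, 42)
ground <- c(18, 42, 24, 36, 24, 36, 14, 10, 48, 54)
N <- 200; n <- length(ground)
fpc <- 1 - n / N

cor(photo, ground)   # sample correlation between photo and ground counts

# Expansion estimator N * ybar: uses the spread of y itself
bound_srs <- 2 * N * sqrt(fpc * var(ground) / n)
# Ratio estimator: uses the spread of y about the line y = r x
r <- mean(ground) / mean(photo)
bound_ratio <- 2 * N * sqrt(fpc * sum((ground - r * photo)^2) / (n - 1) / n)

round(c(expansion = bound_srs, ratio = bound_ratio), 1)
```

Here the estimated ratio bound is less than a quarter of the bound for $N\bar{y}$, because the residual spread about $y = rx$ is far smaller than the spread of the ground counts themselves.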


3. (12 marks) Define any three of the following terms, and illustrate each with an example:

(a) observational study (b) margin of error (c) model-based estimation

(d) post-stratification (e) two-stage cluster sampling (f) probability sampling

(g) standard error (h) repeated systematic sampling (i) unbiased estimator

[Any three; 4 marks each (2 for definition and 2 for example). Examples will vary.]

(a) An observational study draws inferences about the effect of an "exposure" where the assignment of subjects to groups is observed rather than manipulated by the investigator.

E.g.: A study comparing the risk of developing lung cancer in smokers and non-smokers.

(b) The margin of error is commonly defined as half the width of a confidence interval; it is the same as the bound on the error of estimation. It describes the precision of the estimator.

E.g.: Using $\bar{y}$ to estimate $\mu$, a margin of error is $\pm 2\,SE(\bar{y})$.

(c) In model-based estimation, a model motivates the form of the estimator and how

variability is estimated. This is in contrast to design-based estimation where sampling

variability is determined by the sampling design.

E.g.: In model-based estimation, the variance is the average squared deviation of the

estimate from its expected value over all possible samples that could be generated from

the population model.

(d) Post-stratification is a way of improving an estimator after a simple random sample is

collected, by stratifying on an important auxiliary variable.

E.g.: To estimate the average weight of a human population, we might post-stratify our simple random sample by sex after collecting it, to adjust for possible imbalance between the sexes in the observed sample.
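A minimal R sketch of the post-stratified estimator $\bar{y}_{post} = \sum_h W_h \bar{y}_h$ for this example. The weights, the sex labels and the population shares `W` are all hypothetical; in practice the $W_h = N_h/N$ would come from an external source such as a census.

```r
# Hypothetical SRS of body weights (kg); sex is recorded after sampling
weight <- c(62, 70, 58, 66, 60, 81, 90, 85)
sex    <- c("F", "F", "F", "F", "F", "M", "M", "M")
# Assumed known population shares of each sex, W_h = N_h / N
W <- c(F = 0.51, M = 0.49)

stratum_means <- tapply(weight, sex, mean)                   # ybar_h for each post-stratum
post_strat <- sum(W[names(stratum_means)] * stratum_means)   # sum of W_h * ybar_h
c(srs_mean = mean(weight), post_stratified = post_strat)
```

Females are over-represented in this toy sample (5 of 8), so the unweighted mean (71.5) is pulled down; reweighting the post-stratum means by their assumed population shares gives about 74.0.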

(e) A two-stage cluster sample is obtained by first selecting a probability sample of clus-

ters (primary sampling units) and then selecting a probability sample of elements

(secondary sampling units) from each sampled cluster.

E.g.: In a city ward, suppose we are interested in grade 3 performance. Schools can be considered as PSUs and grade 3 students within selected schools as SSUs. A two-stage cluster sample can be conducted by randomly selecting schools within the ward and then randomly selecting grade 3 students within the selected schools.
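The two selection stages in this example can be sketched in R. The roster below is hypothetical (20 schools with 30 to 60 grade 3 students each), and the stage sizes of 5 schools and 10 students per school are arbitrary choices for illustration; both stages use simple random sampling.

```r
# Hypothetical roster: 20 schools in a ward, each with 30-60 grade 3 students
set.seed(1)
roster <- data.frame(school = rep(1:20, times = sample(30:60, 20, replace = TRUE)))
roster$student <- ave(roster$school, roster$school, FUN = seq_along)  # id within school

# Stage 1: SRS of 5 schools (primary sampling units)
schools <- sample(unique(roster$school), 5)

# Stage 2: SRS of 10 students (secondary sampling units) within each chosen school
stage2 <- do.call(rbind, lapply(schools, function(s) {
  pupils <- roster[roster$school == s, ]
  pupils[sample(nrow(pupils), 10), ]
}))
table(stage2$school)   # 10 students from each of the 5 selected schools
```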

(f) In a probability sample, each unit in the population has a known probability of selec-

tion, and a random number table or other randomization mechanism is used to choose

the specific units to be included in the sample.

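E.g.: Simple random sampling (SRS), in which every possible sample of size n has the same probability of selection, is the simplest form of probability sampling.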

(g) Standard error is the standard deviation of (the sampling distribution of) a sample

statistic.

E.g.: For $\bar{y}$ from an SRS, the standard error is $\sigma/\sqrt{n}$ (ignoring the finite population correction).

(h) Repeated systematic sampling involves taking several systematic samples, each with its own random start, to make up the entire sample.

E.g.
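A minimal sketch of how such a sample might be drawn in R. The helper `repeated_systematic` and the periodic list are hypothetical; the sketch assumes $n_s N / n$ is a whole number, so that each of the $n_s$ systematic samples uses the interval $k' = n_s N/n$ with its own random start.

```r
# Hypothetical helper: select a repeated systematic sample of total size n from
# units 1..N using n_s systematic samples, each with its own random start.
repeated_systematic <- function(N, n, n_s) {
  k_prime <- n_s * N / n                  # interval used by each of the n_s samples
  starts  <- sample.int(k_prime, n_s)     # n_s distinct random starts in 1..k_prime
  lapply(starts, function(s) seq(s, N, by = k_prime))
}

set.seed(2015)
samples <- repeated_systematic(N = 1000, n = 100, n_s = 10)

# Estimate a mean by averaging the n_s systematic-sample means
y <- rep(1:10, times = 100)               # hypothetical periodic list (values 1..10 repeating)
sample_means <- sapply(samples, function(idx) mean(y[idx]))
mu_hat <- mean(sample_means)
mu_hat
```

On this periodic list each individual systematic sample still contains a single repeated value, because the interval of 100 is a multiple of the period 10. Averaging over 10 random starts is what protects the estimate, and the spread of the 10 sample means also gives a variance estimate that a single systematic sample cannot provide.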

(i) An estimator $\hat{\theta}$ of a population parameter $\theta$ is unbiased if its expectation equals the parameter, i.e., $E(\hat{\theta}) = \theta$.

E.g.: Under simple random sampling, the sample mean $\bar{y}$ is unbiased for the population mean $\mu$.
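Unbiasedness is a statement about the average over all possible samples, so for a tiny hypothetical population it can be checked exactly by enumerating every simple random sample with `combn`.

```r
# Tiny hypothetical population of N = 5 values
y <- c(3, 7, 8, 12, 20)
n <- 2
samples <- combn(y, n)            # every possible SRS of size n, one per column
sample_means <- colMeans(samples)
c(population_mean = mean(y), average_of_sample_means = mean(sample_means))
```

Each of the 10 possible samples is equally likely under SRS, so the plain average of the 10 sample means is $E(\bar{y})$, and it equals the population mean of 10.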


4. (a) (9 marks) A language school owner takes an SRS of 10 of the 72 Introductory Spanish classes

offered by the school. Each student in each of the sampled classes is asked whether he or she is

planning a trip to a Spanish-speaking country in the next year. (Note: total marks for this part (a)

corrected to 9 marks.)

i. (3 marks) Describe why this is a one-stage cluster sampling design. What is the primary sam-

pling unit? What is the secondary sampling unit?

The primary sampling units are classes and the secondary sampling units are stu-

dents within the classes. [2]

This is a one-stage cluster sample since classes are randomly selected and each

student within the selected class is surveyed. [1]

ii. A. (5 marks) Suppose the owner wanted to estimate the total number of students, among those in the 72 Introductory Spanish classes, who are planning a trip to a Spanish-speaking country in the next year. Using data from the 10 randomly selected classes, describe formulas for a ratio estimator and an unbiased estimator that the owner can use to estimate the total.

Use the notation:

• $M$ = total number of students in the school

• $N$ = total number of Introductory Spanish classes offered by the school

• $n$ = number of classes selected

• $m_i$ = number of students in the $i$th class, $i = 1, \ldots, N$

• $y_i$ = total number of students in the $i$th class who are planning a trip to a Spanish-speaking country;

however, specify their values where possible.

B. (1 mark) Under what conditions would the two estimators (the ratio estimator

and the unbiased estimator in ii. A.) be equivalent?

A. (2 marks for each formula)

Ratio estimator: $\hat{\tau}_y = M\,\frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} m_i}$

Unbiased estimator: $\hat{\tau}_y = N\,\frac{\sum_{i=1}^{n} y_i}{n}$

where $N = 72$, $n = 10$ and $M$ is unknown. [1]

B. The two estimators are equivalent when all cluster sizes are the same, i.e., $m_1 = m_2 = \cdots = m_N = m$, so that $M = Nm$.
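A small R sketch of the two estimators in part A, with hypothetical class data. The helper name `cluster_total_estimates` and the assumption that every one of the 72 classes has exactly 20 students are illustrative only; under that assumption $M = 72 \times 20$ and the two estimates coincide, as stated in part B.

```r
# Hypothetical helper for one-stage cluster sampling.
# y: number of students planning a trip in each sampled class
# m: number of students in each sampled class
# N: number of classes in the population; M: total number of students (if known)
cluster_total_estimates <- function(y, m, N, M) {
  n <- length(y)
  c(ratio    = M * sum(y) / sum(m),   # requires M
    unbiased = N * sum(y) / n)        # requires only N
}

# Toy data: 10 sampled classes of 20 students each (hypothetical values)
y <- c(3, 5, 2, 6, 4, 1, 7, 3, 2, 5)
m <- rep(20, 10)
# Assumption: every one of the N = 72 classes has 20 students, so M = 72 * 20
cluster_total_estimates(y, m, N = 72, M = 72 * 20)
```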


(b) (4 marks) Under what conditions does cluster sampling produce a smaller bound on the error of

estimation for the population mean than simple random sampling? Explain.

Cluster sampling produces estimates of better precision than simple random sampling (for the same number of elements sampled) when the clusters are internally heterogeneous with respect to the measurement of interest [2], and the cluster means are similar from one cluster to another [2].
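This condition can be checked exactly in R on two hypothetical populations that share the same pooled values (20 copies of each of 1 to 10, in 20 clusters of 10), so that SRS has the same variance in both. In the first, every cluster contains the full range 1 to 10 (heterogeneous within, equal cluster means); in the second, each cluster contains ten copies of a single value. The sketch enumerates every possible sample of 2 clusters and compares with an SRS of 20 elements.

```r
m    <- 10   # elements per cluster
N_cl <- 20   # clusters in the population
n_cl <- 2    # clusters sampled, i.e. n_cl * m = 20 elements in total
cluster_id <- rep(1:N_cl, each = m)

# Two hypothetical populations with the same pooled values (20 copies of 1..10)
hetero <- rep(1:m, times = N_cl)              # every cluster holds 1, 2, ..., 10
homo   <- rep(rep(1:m, each = 2), each = m)   # every cluster holds 10 equal values

# Exact variance of the cluster-sample mean, averaging over every set of n_cl clusters
cluster_var <- function(y) {
  cmeans <- as.numeric(tapply(y, cluster_id, mean))
  est <- colMeans(combn(cmeans, n_cl))
  mean((est - mean(y))^2)
}
# Exact variance of the mean of an SRS of the same number of elements
srs_var <- function(y) {
  n <- n_cl * m
  (1 - n / length(y)) * var(y) / n
}

c(cluster_heterogeneous = cluster_var(hetero),
  cluster_homogeneous   = cluster_var(homo),
  srs                   = srs_var(hetero))
```

The heterogeneous clusters give a cluster-sample mean with zero variance, well below the SRS variance, while the homogeneous clusters give a variance roughly ten times larger than SRS.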

END OF TEST

Q1  Q2  Q3  Q4  Total
 9  16  12  13     50
