STA 304/1003 Winter 2021: Week 7

A1 regrades

Engagement Activity II

Missed Test guideline

Today: Ratio and regression estimation

Readings: Chapter 6; exclude §6.5, 6.9

Upcoming:

- March 8 (new date!): Assignment 2

- March 15: Drop day

- March 22: Test 2

Shivon Sue-Chee Ratio and Regression Estimation 1

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if $y_i$, $i = 1, \ldots, N$, is the variable of most interest

consider one additional auxiliary (subsidiary) variable $x_i$, $i = 1, \ldots, N$

auxiliary: x is correlated with y, $y \propto x$

obtain a random sample of paired measurements: $(x_1, y_1), \ldots, (x_n, y_n)$

use information in x to improve estimation of population parameters related to y, such as $\mu_y$, $\tau_y$, or the ratio $\mu_y / \mu_x$

Examples

Parameters of interest:

Ratio: $R = \mu_y / \mu_x$

Population mean: $\mu_y$

Population total: $\tau_y$

Examples:

Y -

X -

Y -

X -

Y -

X -

Motivating Examples

1. Estimate population size N (Laplace):

use total number of births × persons per registered birth (the reciprocal of the birth rate)

2. Estimate total sugar content (Example 6.2):

use total weight × average sugar content / average weight

3. Estimate a population ratio (Exercise 6.3):

amount spent on food / household income

Example (Lohr, Ch.4): Estimate N

Laplace wanted to estimate population of France (1802)

first sampled 30 communes (districts)

y1, . . . , y30 – number of persons in commune i

x1, . . . , x30 – number of registered births in commune i

additional information: 1 million births in the whole country

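$\sum_{i=1}^{n} y_i = 2{,}037{,}615$, $\sum_{i=1}^{n} x_i = 71{,}866$

$\dfrac{\sum y_i}{\sum x_i} \approx 28.35$ persons per registered birth

estimate of population: 28.35 × 1 million = 28.35 million

with x and y correlated, the ratio estimate $\tau_x \bar{y}/\bar{x}$ varies less than $N\bar{y}$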
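The arithmetic of Laplace's estimate can be checked in a few lines of R. This is only a sketch of the calculation quoted on the slide; the variable names (`sum_y`, `sum_x`, `tau_x`, `N_hat`) are illustrative and not from the course material.

```r
# Sample totals from the 30 communes, as quoted on the slide
sum_y <- 2037615   # persons in the sampled communes
sum_x <- 71866     # registered births in the sampled communes
tau_x <- 1e6       # registered births in the whole country (given)

r     <- sum_y / sum_x   # estimated persons per registered birth
N_hat <- r * tau_x       # ratio estimate of the population size

print(round(r, 2))
print(N_hat)
```

Note that the number of communes N never enters the calculation: the known total of x replaces it.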

Example (§6.2): Estimate total

Estimate the sugar content of a truck-load of oranges

sample n oranges: measure sugar content yi and weight xi

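$\dfrac{\tau_y}{\tau_x} = \dfrac{\mu_y}{\mu_x} = R$

$\tau_y = R\,\tau_x$

$\hat{\tau}_y = r\,\tau_x = \dfrac{\bar{y}}{\bar{x}}\,\tau_x$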

How do we get τx?

What is the Ch.4 solution?

Example: Estimate mean and post-stratification

Suppose population is 50% male, 50% female

We select a sample of size 100, and record yi = weight of ith person

Goal: estimate the average weight in the population, µy

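The sample turns out to have just 20 men and 80 women

Group    Sample size   Sample mean
Men      n1 = 20       ȳ1 = 180 pounds
Women    n2 = 80       ȳ2 = 110 pounds

Unadjusted sample mean: $\bar{y} = (20 \cdot 180 + 80 \cdot 110)/100 = 124$ pounds

Adjusted to a more realistic value using the known 50/50 population split:

$\bar{y}_{st} = 0.5(180) + 0.5(110) = 145$ pounds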
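The two means above can be reproduced with a short R sketch. The vector names (`n_h`, `ybar_h`, `W_h`) are illustrative; the numbers are the ones on the slide.

```r
# Sample sizes and sample means by group (from the slide)
n_h    <- c(men = 20, women = 80)
ybar_h <- c(men = 180, women = 110)

# Known population shares of each group
W_h <- c(men = 0.5, women = 0.5)

# Unadjusted sample mean: weights are the sample shares n_h / n
ybar <- sum(n_h * ybar_h) / sum(n_h)

# Post-stratified mean: weights are the population shares W_h
ybar_post <- sum(W_h * ybar_h)

print(c(unadjusted = ybar, post_stratified = ybar_post))
```

The only difference between the two estimates is which set of weights is applied to the group means.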

Examples: Estimating a ratio

Example 6.1:

R = mean monthly cost in 2002 / mean monthly cost in 1994

See Table 6.1: $\bar{y} = 901.5$, $\bar{x} = 695.8$,

$r = \hat{R} = \bar{y}/\bar{x} = 1.296$

Example: estimate the average number of fish caught per hour by

anglers visiting a lake

x – number of hours fished, y – number of fish caught

Example: estimate the average amount that undergraduate students

spent on textbooks

x – number of textbooks bought, y – total cost

Example (see §6.0 and §6.9): estimate the mean number of students

per section in elementary courses

x – number of sections, y – enrollment

Why ratio estimation?

1. To estimate a ratio

E.g. average yield per acre, percentage of magazine pages devoted to advertising, mean enrollment per section (§6.0), Consumer Price Index (§6.2)

2. To estimate a population total when N is unknown

E.g. the oranges and Laplace examples

3. If x and y are correlated, the ratio estimator of µy or τy can have smaller variance than the simpler estimators (ȳ or Nȳ)

4. To adjust estimates from a sample to reflect demographics

E.g. Current Population Survey (§6.2)

This is called “post-stratification”

5. To adjust estimates for non-response (§11.6)

Estimating a population ratio: Exercise 6.3

Ratio: Y - Money spent on food/yr vs

X - total yearly household income

[Figure: scatterplot of y (food expenditure) against x (income)]

plot the data! any influential points?

Example: Exercise 6.3- estimate R

> exer63x = scan()

1: 25100 32200 29600 35000 34400 26500 28700

8: 28200 34600 32700 31500 30600 27700 28500

15:

Read 14 items

> exer63y = scan()

1: 3800 5100 4200 6200 5800 4100 2900

8: 3600 3800 4100 4500 5100 4200 4000

15:

Read 14 items

> plot(exer63x,exer63y, xlab="x(income)",ylab="y(food expenditure)")

> mean(exer63y)/mean(exer63x)

[1] 0.1443687

> mean(exer63x)

[1] 30378.57

> mean(exer63y)

[1] 4385.714

⟹ $r = \hat{R} = 0.144$; $\hat{V}(r) =$
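The estimated variance left blank above can be computed along the following lines. This is a sketch, not the course's own solution: the helper `ratio_estimate()` is hypothetical, it substitutes the sample mean of x for µx (a common choice when µx is unknown), and because the population size is not stated in these slides, `N = 150` in the second call is a placeholder to be replaced by the value given in the exercise.

```r
x <- c(25100, 32200, 29600, 35000, 34400, 26500, 28700,
       28200, 34600, 32700, 31500, 30600, 27700, 28500)  # income
y <- c(3800, 5100, 4200, 6200, 5800, 4100, 2900,
       3600, 3800, 4100, 4500, 5100, 4200, 4000)         # food expenditure

# Hypothetical helper: ratio estimate r with its estimated variance.
# N = Inf drops the finite population correction;
# mu_x defaults to the sample mean of x when the population mean is unknown.
ratio_estimate <- function(y, x, N = Inf, mu_x = mean(x)) {
  n    <- length(y)
  r    <- sum(y) / sum(x)
  s2_r <- sum((y - r * x)^2) / (n - 1)        # residual variance about y = r x
  v_r  <- (1 - n / N) * s2_r / (n * mu_x^2)   # estimated variance of r
  list(r = r, var = v_r, bound = 2 * sqrt(v_r))
}

res <- ratio_estimate(y, x)               # no finite population correction
res_fpc <- ratio_estimate(y, x, N = 150)  # N = 150 is an assumed placeholder
print(res)
print(res_fpc)
```

Supplying a finite N can only shrink the estimated variance, since the factor (1 − n/N) is below one.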

Stats In the News: Meat causes Cancer?!

The New York Times

Stats In the News: WHO report on meat and cancer

Cancer deaths worldwide, on a yearly basis:

- Tobacco: about a million

- Alcohol: 600,000

- Diets high in processed meat: about 34,000

22 scientists from 10 countries reviewed more than 800 studies

‘linking what people ate with cancers they developed later’

‘Often such studies can’t prove a causal link’

Conclusion: people should follow diets “lower in red and processed

meat.”

Ratio estimation formulas

Toolboxes (6.1-6.7)

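Parameter: $R = \mu_y / \mu_x$
Estimator: $r = \hat{R} = \dfrac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \dfrac{\bar{y}}{\bar{x}}$
Estimated variance: $\hat{V}(r) = \left(1 - \dfrac{n}{N}\right)\left(\dfrac{1}{\mu_x^2}\right)\dfrac{s_r^2}{n}$

Parameter: $\tau_y$
Estimator: $\hat{\tau}_y = \dfrac{\bar{y}}{\bar{x}}\,\tau_x = r\,\tau_x$
Estimated variance: $\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$

Parameter: $\mu_y$
Estimator: $\hat{\mu}_y = \dfrac{\bar{y}}{\bar{x}}\,\mu_x = r\,\mu_x$
Estimated variance: $\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$

where

$s_r^2 = \dfrac{\sum_{i=1}^{n}(y_i - r x_i)^2}{n - 1}$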
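The three toolbox rows can be gathered into one small R function. The sketch below is an illustration under stated assumptions: `ratio_toolbox()` is a hypothetical helper, and it is applied to the `achieve`/`calc` values, N = 486 and µx = 52 that these slides quote for Example 6.9. The slides use those data for regression estimation, so running the ratio estimator on them is for illustration only.

```r
# Hypothetical helper implementing the three toolbox rows.
# Requires the population mean of x (mu_x) and the population size N.
ratio_toolbox <- function(y, x, mu_x, N) {
  n     <- length(y)
  r     <- sum(y) / sum(x)
  s2_r  <- sum((y - r * x)^2) / (n - 1)
  v_r   <- (1 - n / N) * s2_r / (n * mu_x^2)
  tau_x <- N * mu_x                      # population total of x
  data.frame(
    parameter = c("R", "tau_y", "mu_y"),
    estimate  = c(r, r * tau_x, r * mu_x),
    variance  = c(v_r, tau_x^2 * v_r, mu_x^2 * v_r)
  )
}

# Data quoted in Example 6.9 of these slides
achieve <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)  # x
calc    <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)  # y

out <- ratio_toolbox(calc, achieve, mu_x = 52, N = 486)
print(out)
```

The variances of the total and the mean are rescalings of $\hat{V}(r)$ by $\tau_x^2$ and $\mu_x^2$, so only one residual variance $s_r^2$ has to be computed.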

Example 6.2: Estimating a population total

To estimate total sugar content of truckload of oranges

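y is sugar content; x is weight; $\tau_x = 1800$ lb is the weight of the truckload (easily obtained)

$\sum_{i=1}^{10} y_i = 0.246$, $\sum_{i=1}^{10} x_i = 4.35$, $r = \dfrac{0.246}{4.35} \approx 0.0566$

$\hat{\tau}_y = r\,\tau_x = \dfrac{0.246}{4.35}(1800) \approx 101.79$ lb

$\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$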

Example 6.3: Estimating a population mean

µy = mean acreage in sugarcane, in 1999, across N = 32 counties

sample 6 counties and record: yi = mean acreage in sugarcane, 1999

and xi = mean acreage in sugarcane, 1997

plus we know $\mu_x$ = mean acreage in 1997 across all 32 counties

$\hat{\mu}_y = r\,\mu_x = \dfrac{\bar{y}}{\bar{x}}\,\mu_x$

$\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$

Improved estimation using regression (§6.6)

Ratio estimation:

- uses $\tau_x / \sum_{i=1}^{n} x_i$ to improve estimation of $\mu_y$ or $\tau_y$

- works well when $y \propto x$

Regression estimation:

- can be used if $y - a \propto x$

- there is a linear relationship between y and x, but not necessarily through the origin

- gives an estimate of $\mu_y$

Example 6.9: x - score on SAT math, y - final grade in calculus course

Class example: x - handspan, y - height

Regression estimation formulas (§6.6)

Regression estimation: used instead of ratio estimation if there is a

linear relationship between y and x but not necessarily through the

origin

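Regression line: $\hat{y}_i = a + b x_i$

By the least squares method:

$a = \bar{y} - b\bar{x}$ and $b = \dfrac{\sum_{i=1}^{n}(y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$

Alternatively, $\hat{y}_i = \bar{y} + b(x_i - \bar{x})$

To estimate $\mu_y$ by linear (L) regression, evaluate the fitted line at $x_i = \mu_x$:

$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$

$\hat{V}(\hat{\mu}_{yL}) = \left(1 - \dfrac{n}{N}\right)\dfrac{1}{n}\,\dfrac{\sum_{i=1}^{n}(y_i - a - b x_i)^2}{n - 2}$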

Regression estimation formulas (Toolbox 6.24-6.26)

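$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$

$\hat{V}(\hat{\mu}_{yL}) = \left(1 - \dfrac{n}{N}\right)\dfrac{1}{n}\,\dfrac{\sum_{i=1}^{n}(y_i - a - b x_i)^2}{n - 2} = \left(1 - \dfrac{n}{N}\right)\dfrac{\mathrm{MSE}}{n}$

note on notation:

$\mathrm{MSE} = \dfrac{\sum_{i=1}^{n}(y_i - a - b x_i)^2}{n - 2} = \dfrac{\mathrm{SSE}}{n - 2}$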

many books call this MSR = SSR/(n − 2) for “mean square

residuals” and “sum of squared residuals”

because MSE usually means “variance + bias-squared”= the mean

squared error of a biased estimator
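The regression-estimation formulas can be written out directly, without `lm()`, to make each term visible. The function below is a sketch; `regression_estimate()` is a hypothetical helper name. It is run on the Example 6.9 values quoted in these slides (`achieve`, `calc`, N = 486, µx = 52).

```r
# Hypothetical helper: regression estimator of mu_y and its estimated variance
regression_estimate <- function(y, x, mu_x, N) {
  n   <- length(y)
  b   <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)  # slope
  a   <- mean(y) - b * mean(x)                                       # intercept
  mse <- sum((y - a - b * x)^2) / (n - 2)      # SSE / (n - 2)
  mu_hat <- mean(y) + b * (mu_x - mean(x))     # line evaluated at mu_x
  v_hat  <- (1 - n / N) * mse / n
  c(a = a, b = b, mu_hat = mu_hat, var = v_hat, bound = 2 * sqrt(v_hat))
}

achieve <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)  # x
calc    <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)  # y

est <- regression_estimate(calc, achieve, mu_x = 52, N = 486)
print(round(est, 4))
```

Because the slope is not rounded here, the estimate of µy may differ in the fourth decimal from a value computed with the rounded slope 0.7656.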

Example 6.9

Simple Least Squares Regression

> achieve=scan()

1: 39 43 21 64 57 47 28 75 34 52

11:

Read 10 items

> calc = scan()

1: 65 78 52 82 92 89 73 98 56 75

11:

Read 10 items

> plot(achieve, calc, main="Figure 6.6")

> # ordinary least squares regression

> fit=lm(calc~achieve)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 40.7842 8.5069 4.794 0.00137 **

achieve 0.7656 0.1750 4.375 0.00236 **

Residual standard error: 8.704 on 8 degrees of freedom

Multiple R-squared: 0.7052,Adjusted R-squared: 0.6684

F-statistic: 19.14 on 1 and 8 DF, p-value: 0.002365

[Figure 6.6: scatterplot of calc against achieve]

Example 6.9

‘Deviations from sample average’ Regression

> # least squares regression on mean centred data

> fittedmodel = lm(calc~ I(achieve - mean(achieve)))

> # I( ... ) treats the arguments numerically

> summary(fittedmodel)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 76.0000 2.7523 27.613 3.19e-09 ***

I(achieve - mean(achieve)) 0.7656 0.1750 4.375 0.00236 **

Residual standard error: 8.704 on 8 degrees of freedom

Multiple R-squared: 0.7052,Adjusted R-squared: 0.6684

F-statistic: 19.14 on 1 and 8 DF, p-value: 0.002365

> 2*sqrt((1-10/486)*(sum(residuals(fittedmodel)^2)/8)/10)

[1] 5.447734

> #Knowing mu_x=52, estimate mu_y

> mean(calc)+ 0.7656*(52-mean(achieve))

[1] 80.5936

Regression Example 6.9 Summary

Figure 6.6: least squares line from sample has intercept a = 40.7842

and slope b = 0.7656

$\bar{y} = 76$, $\bar{x} = 46$

$\mu_x = 52$, $\hat{\mu}_{yL} = 80.6$

$\hat{V}(\hat{\mu}_{yL}) = 7.42$

Margin of error: $2\sqrt{\hat{V}(\hat{\mu}_{yL})} = 5.45$

Check residual plot to determine whether the simple linear model is

an appropriate fit.

See Figure 6.7: the linear model seems appropriate; no obvious

pattern or outlier values.
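A residual plot of the kind referred to above could be produced as follows. This is a minimal sketch using base R graphics on the Example 6.9 data; it is not necessarily how Figure 6.7 in the textbook was drawn.

```r
achieve <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)
calc    <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)

fit <- lm(calc ~ achieve)   # ordinary least squares fit
res <- residuals(fit)

# Residuals against fitted values: look for curvature, funnels, or outliers
plot(fitted(fit), res,
     xlab = "fitted values", ylab = "residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)      # reference line at zero
```

A roughly even scatter around the zero line, with no trend, is what supports the simple linear model.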

Summary

Ratio estimation is most appropriate when the relationship between y

and x is linear through the origin.

The regression estimator performs better than the ratio estimator when the population relationship moves away from a straight line through the origin (i.e., when the intercept is not close to zero).

As the population relationship exhibits more curvature, the regression

estimator becomes more biased.
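The first two points can be illustrated with a small simulation. Everything in this sketch is an assumption made for illustration: the artificial population (N = 1000, a linear relationship with intercept 50), the sample size n = 30, the number of replications, and the helper `one_draw()`. It compares the mean squared error of the two estimators of µy under simple random sampling.

```r
set.seed(304)

# Artificial population: linear in x, but far from the origin (intercept 50)
N <- 1000
x <- runif(N, 10, 50)
y <- 50 + 2 * x + rnorm(N, sd = 5)
mu_x <- mean(x)
mu_y <- mean(y)

# One simple random sample; returns both estimates of mu_y
one_draw <- function(n = 30) {
  s  <- sample(N, n)
  xs <- x[s]
  ys <- y[s]
  b  <- cov(xs, ys) / var(xs)                      # least squares slope
  c(ratio      = mean(ys) / mean(xs) * mu_x,
    regression = mean(ys) + b * (mu_x - mean(xs)))
}

est <- replicate(2000, one_draw())   # 2 x 2000 matrix of estimates
mse <- rowMeans((est - mu_y)^2)      # Monte Carlo mean squared error
print(mse)
```

Setting the intercept to 0 in the definition of `y` makes the comparison much closer, which matches the first point of the summary.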

Homework

Read §6.1 – §6.8

EX: 6.1, 6.2, 6.6, 6.9, 6.12, 6.13, 6.14, 6.16

Sampling from real populations: EX 6.4

Shivon Sue-Chee Ratio and Regression Estimation 55

学霸联盟

A1 regrades

Engagement Activity II

Missed Test guideline

Today: Ratio and regression estimation

Readings: Chapter 6; exclude §6.5, 6.9

Upcoming:

I March 8 (new date!): Assignment 2

I March 15: Drop day

I March 22: Test 2

Shivon Sue-Chee Ratio and Regression Estimation 1

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 2

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 3

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 4

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 5

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 6

Ratio and regression estimation (Ch. 6)

most surveys measure population values on more than one variable

even if yi , i = 1, . . . ,N is the variable of most interest

consider one additional auxiliary or subsidiary variable xi , i = 1, . . . ,N

auxiliary: x is correlated with y , y ∝ x

obtain a random sample of paired measurements:

y1, . . . , yn

x1, . . . , xn

use information in x to improve estimation of population parameters

related to y , such as µy , τy , or

µy

µx

Shivon Sue-Chee Ratio and Regression Estimation 7

Examples

Parameters of interest:

Ratio: R =

µy

µx

Population Mean: µy

Population Total: τy

Examples:

Y -

X -

Y -

X -

Y -

X -

Shivon Sue-Chee Ratio and Regression Estimation 8

Motivating Examples

1 Estimate population size N (Laplace):

use total # of births × birth rate

2 Estimate total sugar content (Eg. 6.2):

use total weight × av. sugar content / av. weight

3 Estimate population ratio (Ex. 6.3):

amount spent on food

household income

Shivon Sue-Chee Ratio and Regression Estimation 9

Example (Lohr, Ch.4): Estimate N

Laplace wanted to estimate population of France (1802)

first sampled 30 communes (districts)

y1, . . . , y30 – number of persons in commune i

x1, . . . , x30 – number of registered births in commune i

additional information: 1 million births in the whole country

n∑

i=1

yi = 2, 037, 615

n∑

i=1

xi = 71, 866∑

yi∑

xi

= 28.3

estimate of population 28.3× 1million = 28.3 million

with x and y correlated, less variability in y¯/x¯ that Ny¯

Shivon Sue-Chee Ratio and Regression Estimation 10

Example (§6.2): Estimate total

Estimate the sugar content of a truck-load of oranges

sample n oranges: measure sugar content yi and weight xi

τy

τx

=

τy =

τˆy =

How do we get τx?

What is the Ch.4 solution?

Shivon Sue-Chee Ratio and Regression Estimation 11

Example: Estimate mean and post-stratification

Suppose population is 50% male, 50% female

We select a sample of size 100, and record yi = weight of ith person

Goal: estimate the average weight in the population, µy

The sample turns out to have just 20 mean, and 80 women

Men Women

n1 = 20 n2 = 80

y¯1 = 180 pounds y¯2 = 110 pounds

y¯ = 124 pounds

Adjusted to a more realistic value:

y¯st = 0.5(180) + 0.5(110) = 145


Examples: Estimating a ratio

Example 6.1:

R = mean monthly cost in 2002/ mean monthly cost in 1994

See Table 6.1: y¯ = 901.5, x¯ = 695.8,

r = Rˆ = y¯/x¯ = 1.296

Example: estimate the average number of fish caught per hour by anglers visiting a lake

x – number of hours fished, y – number of fish caught

Example: estimate the average amount that undergraduate students spent on textbooks

x – number of textbooks bought, y – total cost

Example (see §6.0 and §6.9): estimate the mean number of students per section in elementary courses

x – numbers of sections, y – enrollments


Why ratio estimation?

1. To estimate a ratio

E.g. average yield per acre, percentage of magazine pages devoted to advertising, mean enrollment per section (§6.0), Consumer Price Index (§6.2)

2. To estimate a population total when N is unknown

E.g. oranges, Laplace’s population estimate

3. If x and y are correlated, the ratio estimator of µy or τy could have smaller variance than the simpler estimators (y¯ or Ny¯)

4. To adjust estimates from a sample to reflect demographics

E.g. Current Population Survey (§6.2)

This is called “post-stratification”

5. To adjust estimates for non-response (§11.6)


Estimating a population ratio: Exercise 6.3

Ratio: y = money spent on food per year vs. x = total yearly household income

[Figure: scatterplot of y (food expenditure) against x (income)]

plot the data! any influential points?


Example: Exercise 6.3- estimate R

> exer63x = scan()

1: 25100 32200 29600 35000 34400 26500 28700

8: 28200 34600 32700 31500 30600 27700 28500

15:

Read 14 items

> exer63y = scan()

1: 3800 5100 4200 6200 5800 4100 2900

8: 3600 3800 4100 4500 5100 4200 4000

15:

Read 14 items

> plot(exer63x,exer63y, xlab="x(income)",ylab="y(food expenditure)")

> mean(exer63y)/mean(exer63x)

[1] 0.1443687

> mean(exer63x)

[1] 30378.57

> mean(exer63y)

[1] 4385.714

=⇒ r = R̂ = 0.144 V̂ (r) =


Stats In the News: Meat causes Cancer?!

The New York Times


Stats In the News: WHO report on meat and cancer

Cancer deaths worldwide, on a yearly basis:

- Tobacco: about a million

- Alcohol: 600,000

- Diets high in processed meat: about 34,000

22 scientists from 10 countries reviewed more than 800 studies

‘linking what people ate with cancers they developed later’

‘Often such studies can’t prove a causal link’

Conclusion: people should follow diets “lower in red and processed

meat.”


Ratio estimation formulas

Toolboxes (6.1-6.7)

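Parameter, estimator and estimated variance:

Parameter $R = \frac{\mu_y}{\mu_x}$

Estimator $r = \hat{R} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i} = \frac{\bar{y}}{\bar{x}}$

Estimated variance $\hat{V}(r) = \left(1 - \frac{n}{N}\right)\left(\frac{1}{\mu_x^2}\right)\frac{s_r^2}{n}$

Parameter $\tau_y$

Estimator $\hat{\tau}_y = \frac{\bar{y}}{\bar{x}}\,\tau_x = r\,\tau_x$

Estimated variance $\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$

Parameter $\mu_y$

Estimator $\hat{\mu}_y = \frac{\bar{y}}{\bar{x}}\,\mu_x = r\,\mu_x$

Estimated variance $\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$

where

$s_r^2 = \frac{\sum_{i=1}^{n} (y_i - r x_i)^2}{n - 1}$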
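The formulas on this slide translate directly into a short R helper. The sketch below is illustrative only: `ratio_estimate` is a hypothetical function name, the data are the Exercise 6.3 values entered earlier in these notes, and the population size `N_assumed = 150` is a placeholder because N is not stated in the notes. When µx is unknown, the helper substitutes the sample mean x¯, which is also an assumption of this sketch.

```r
# Ratio estimator r = ybar / xbar and its estimated variance
#   V-hat(r) = (1 - n/N) * (1 / mu_x^2) * s_r^2 / n
# mu_x defaults to the sample mean of x when the population mean is unknown
# (an assumption of this sketch).
ratio_estimate <- function(x, y, N, mu_x = mean(x)) {
  n   <- length(x)
  r   <- sum(y) / sum(x)
  s2r <- sum((y - r * x)^2) / (n - 1)        # residual variance about y = r*x
  v_r <- (1 - n / N) * (1 / mu_x^2) * s2r / n
  list(r = r, var_r = v_r, bound = 2 * sqrt(v_r))
}

# Exercise 6.3 data (from these notes)
income <- c(25100, 32200, 29600, 35000, 34400, 26500, 28700,
            28200, 34600, 32700, 31500, 30600, 27700, 28500)
food   <- c(3800, 5100, 4200, 6200, 5800, 4100, 2900,
            3600, 3800, 4100, 4500, 5100, 4200, 4000)

N_assumed <- 150  # HYPOTHETICAL population size; not given in these notes
est <- ratio_estimate(income, food, N = N_assumed)
print(est)
```

Multiplying `est$var_r` by τx² or µx² gives the estimated variance of the ratio estimator of the total or the mean, as in the toolbox.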


Example 6.2: Estimating a population total

To estimate total sugar content of truckload of oranges

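y is sugar content; x is weight; τx = 1800 lbs is the weight of the truckload (easily obtained)

$\sum_{i=1}^{10} y_i = 0.246, \quad \sum_{i=1}^{10} x_i = 4.35, \quad r = \frac{0.246}{4.35}$

$\hat{\tau}_y = r\,\tau_x$

$\hat{V}(\hat{\tau}_y) = \tau_x^2\,\hat{V}(r)$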
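As a quick numerical check, the point estimate of the total can be computed from the three numbers on the slide. This is a sketch with illustrative variable names; the variance is not computed because it needs the individual (xi, yi) pairs, which do not appear in these notes.

```r
sum_y <- 0.246   # total sugar content of the 10 sampled oranges
sum_x <- 4.35    # total weight of the 10 sampled oranges
tau_x <- 1800    # total weight of the truckload (pounds)

r       <- sum_y / sum_x   # estimated sugar content per unit of weight
tau_hat <- r * tau_x       # ratio estimate of the total sugar content

print(round(c(r = r, tau_hat = tau_hat), 4))
```

Note that N, the number of oranges on the truck, is never needed: the known total weight τx plays the role that N plays in the estimator Ny¯.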


Example 6.3: Estimating a population mean

µy = mean acreage in sugarcane, in 1999, across N = 32 counties

sample 6 counties and record: yi = mean acreage in sugarcane, 1999 and xi = mean acreage in sugarcane, 1997

plus we know µx = mean acreage in 1997 across all 32 counties

$\hat{\mu}_y = r\,\mu_x$

$\hat{V}(\hat{\mu}_y) = \mu_x^2\,\hat{V}(r)$


Improved estimation using regression (§6.6)

Ratio estimation:

- uses $\tau_x / \sum_{i=1}^{n} x_i$ to improve estimation of µy or τy

- works well when y ∝ x

Regression estimation:

- can be used if y − a ∝ x

- there is a linear relationship between y and x but not necessarily through the origin

- get estimate of µy

Example 6.9: x - score on SAT math, y - final grade in calculus course

Class example: x - handspan, y - height


Regression estimation formulas (§6.6)

Regression estimation: used instead of ratio estimation if there is a linear relationship between y and x but not necessarily through the origin

Regression line: $\hat{y}_i = a + b x_i$

By the least squares method:

$a = \bar{y} - b\bar{x}$ and $b = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Alternatively, $\hat{y}_i = \bar{y} + b(x_i - \bar{x})$

To estimate µy by linear (L) regression, evaluate the fitted line at xi = µx:

$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$

V̂(µˆyL) =


Regression estimation formulas (Toolbox 6.24-6.26)

$\hat{\mu}_{yL} = \bar{y} + b(\mu_x - \bar{x})$

$\hat{V}(\hat{\mu}_{yL}) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\,\frac{\sum_{i=1}^{n} (y_i - a - b x_i)^2}{n - 2} = \left(1 - \frac{n}{N}\right)\frac{\mathrm{MSE}}{n}$

Note on notation:

$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - a - b x_i)^2}{n - 2} = \frac{\mathrm{SSE}}{n - 2}$

Many books call this MSR = SSR/(n − 2), for “mean square residuals” and “sum of squared residuals”, because MSE usually means “variance + bias-squared”, the mean squared error of a biased estimator.
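The regression estimator and its estimated variance can be written as a small R helper that follows these formulas term by term. This is a sketch, not code from the course: `reg_estimate` is a hypothetical name, and the inputs are the Example 6.9 values used elsewhere in these notes (10 sampled students, N = 486, µx = 52).

```r
# Regression estimator of mu_y and its estimated variance
#   mu-hat_yL = ybar + b * (mu_x - xbar)
#   V-hat     = (1 - n/N) * MSE / n,  MSE = SSE / (n - 2)
reg_estimate <- function(x, y, mu_x, N) {
  n   <- length(x)
  b   <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)  # slope
  a   <- mean(y) - b * mean(x)                                      # intercept
  mse <- sum((y - a - b * x)^2) / (n - 2)
  mu_hat <- mean(y) + b * (mu_x - mean(x))
  v_hat  <- (1 - n / N) * mse / n
  list(a = a, b = b, mu_hat = mu_hat, var = v_hat, bound = 2 * sqrt(v_hat))
}

# Example 6.9 data (from these notes)
achieve <- c(39, 43, 21, 64, 57, 47, 28, 75, 34, 52)
calc    <- c(65, 78, 52, 82, 92, 89, 73, 98, 56, 75)

est <- reg_estimate(achieve, calc, mu_x = 52, N = 486)
print(est)
```

The slope and intercept computed here by hand are the same least squares values that `lm(calc ~ achieve)` reports, so either route can be used to get `b` and the residual sum of squares.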


Example 6.9

Simple Least Squares Regression

> achieve=scan()

1: 39 43 21 64 57 47 28 75 34 52

11:

Read 10 items

> calc = scan()

1: 65 78 52 82 92 89 73 98 56 75

11:

Read 10 items

> plot(achieve, calc, main="Figure 6.6")

> # ordinary least squares regression

> fit=lm(calc~achieve)

> summary(fit)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 40.7842 8.5069 4.794 0.00137 **

achieve 0.7656 0.1750 4.375 0.00236 **

Residual standard error: 8.704 on 8 degrees of freedom

Multiple R-squared: 0.7052,Adjusted R-squared: 0.6684

F-statistic: 19.14 on 1 and 8 DF, p-value: 0.002365

[Figure 6.6: scatterplot of calc against achieve]


Example 6.9

‘Deviations from sample average’ Regression

> # least squares regression on mean centred data

> fittedmodel = lm(calc~ I(achieve - mean(achieve)))

> # I( ... ) makes R evaluate its argument arithmetically rather than as formula syntax

> summary(fittedmodel)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 76.0000 2.7523 27.613 3.19e-09 ***

I(achieve - mean(achieve)) 0.7656 0.1750 4.375 0.00236 **

Residual standard error: 8.704 on 8 degrees of freedom

Multiple R-squared: 0.7052,Adjusted R-squared: 0.6684

F-statistic: 19.14 on 1 and 8 DF, p-value: 0.002365

> # bound on the error of estimation: 2*sqrt(V-hat), with n = 10, N = 486, MSE = SSE/8

> 2*sqrt((1-10/486)*(sum(residuals(fittedmodel)^2)/8)/10)

[1] 5.447734

> #Knowing mu_x=52, estimate mu_y

> mean(calc)+ 0.7656*(52-mean(achieve))

[1] 80.5936


Regression Example 6.9 Summary

Figure 6.6: least squares line from sample has intercept a = 40.7842 and slope b = 0.7656

$\bar{y} = 76$, $\bar{x} = 46$

$\mu_x = 52$, $\hat{\mu}_{yL} = 80.6$

$\hat{V}(\hat{\mu}_{yL}) = 7.42$

Margin of error: $2\sqrt{\hat{V}(\hat{\mu}_{yL})} = 5.45$

Check residual plot to determine whether the simple linear model is an appropriate fit.

See Figure 6.7: the linear model seems appropriate; no obvious pattern or outlier values.


Summary

Ratio estimation is most appropriate when the relationship between y and x is linear through the origin.

The regression estimator performs better than the ratio estimator when the population relationship moves away from a straight line through the origin (i.e., the intercept is not close to zero).

As the population relationship exhibits more curvature, the regression estimator becomes more biased.
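A small simulation can illustrate the second point. The sketch below builds a purely hypothetical population whose relationship is linear with a non-zero intercept (the coefficients 20 and 2, the noise level, N, n and the seed are arbitrary choices, not values from the course), then compares the simple sample mean, the ratio estimator and the regression estimator of µy over repeated simple random samples.

```r
set.seed(304)

# HYPOTHETICAL population: linear in x with a non-zero intercept
N <- 1000
x <- runif(N, min = 10, max = 30)
y <- 20 + 2 * x + rnorm(N, sd = 5)
mu_x <- mean(x)
mu_y <- mean(y)

n    <- 30     # sample size
reps <- 2000   # number of repeated samples
est <- replicate(reps, {
  s  <- sample(N, n)               # simple random sample without replacement
  xs <- x[s]
  ys <- y[s]
  b  <- cov(xs, ys) / var(xs)      # least squares slope
  c(srs        = mean(ys),
    ratio      = mean(ys) / mean(xs) * mu_x,
    regression = mean(ys) + b * (mu_x - mean(xs)))
})

# Empirical mean squared error of each estimator of mu_y
mse <- rowMeans((est - mu_y)^2)
print(round(mse, 3))
```

Setting the intercept to 0 in the line that generates `y` gives a population through the origin, where the ratio and regression estimators should perform about equally well.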


Homework

Read §6.1 – §6.8

EX: 6.1, 6.2, 6.6, 6.9, 6.12, 6.13, 6.14, 6.16

Sampling from real populations: EX 6.4
