
Date: 2021-04-05

Econometrics: L2

Multiple Regression Model

Sung Y. Park

Chung-Ang Univ.

Motivation

Consider the following example:

wage = β0 + β1educ + β2exper + u,

where exper is years of labor market experience.

◮ wage is determined by the two independent variables (and the unobserved error u)

◮ β1: the effect of educ on wage holding all other factors affecting

wage (our interest)

◮ just as with simple regression, we make assumptions about how u is related to the independent variables

◮ Note that we have to assume educ and exper are uncorrelated in

the simple regression (why?)

Motivation

A model with two independent variables:

y = β0 + β1x1 + β2x2 + u,

◮ β0: the intercept

◮ β1: measures the change in y with respect to x1, holding

other factors fixed

◮ β2: measures the change in y with respect to x2, holding other factors fixed

Note: In Economics, this is used a lot: ceteris paribus, keeping all

else constant.

Motivation

Quadratic functional relationships:

cons = β0 + β1inc + β2inc² + u,

◮ note that consumption depends only on income ⇒ x1 = inc and x2 = inc²

◮ here, is β1 the ceteris paribus effect of inc on cons? ⇒ No!

(why?)

◮ the change in consumption with respect to the change in income (the marginal propensity to consume):

∆cons/∆inc ≈ β1 + 2β2inc.
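The marginal effect above can be checked numerically; a minimal sketch in which the coefficient values β1 = 0.8 and β2 = −0.002 are hypothetical (they are ours, not from the slides):

```python
# Marginal propensity to consume in the quadratic model:
# d cons / d inc = beta1 + 2 * beta2 * inc.
# Coefficient values below are hypothetical, chosen for illustration.
beta1, beta2 = 0.8, -0.002

def mpc(inc):
    """Marginal effect of income on consumption at income level inc."""
    return beta1 + 2 * beta2 * inc

# With beta2 < 0, the marginal effect declines as income rises:
print(mpc(10.0))   # ≈ 0.76
print(mpc(50.0))   # ≈ 0.60
```

This also shows why β1 alone is not the ceteris paribus effect: the effect of income depends on the income level itself.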

Motivation

Key Assumption:

E (u|x1, x2) = 0

◮ for any values of x1 and x2, the average unobservable is equal

to zero

◮ note that the common value 0 is not crucial as long as β0 is

included in the model

◮ E (u|educ, exper) = 0 ⇒ other factors affecting wage are not

related on average to educ and exper

◮ E (u|inc, inc²) = 0? ⇒ E (u|inc) = 0 (inc² is redundant)

Motivation

More generally...

y = β0 + β1x1 + β2x2 + β3x3 + · · ·+ βkxk + u,

◮ there are k independent variables in the model but k + 1

parameters

◮ the parameters other than the intercept (β0): slope

parameters

◮ u: error term or disturbance

Motivation

An example:

log(salary) = β0 + β1 log(sales) + β2ceoten + β3ceoten² + u,

◮ β1 : ceteris paribus elasticity of salary with respect to sales

◮ if β3 = 0, 100β2: the ceteris paribus percentage increase in salary when ceoten increases by one year

◮ if β3 ≠ 0, the effect of ceoten on salary is more complicated

(later)

◮ note that the relationship between salary and sales is nonlinear, but the model is linear in the βj

OLS estimates

OLS estimates: Choose, simultaneously, β̂0, β̂1 and β̂2 that make

∑_{i=1}^n (yi − β̂0 − β̂1xi1 − β̂2xi2)²

as small as possible.

◮ first index i refers to the observation number

◮ second index j refers to the independent variable

◮ xij : the i-th observation on the j-th independent variable.

OLS estimates

General case:

min_{β0,β1,··· ,βk} ∑_{i=1}^n (yi − β0 − β1xi1 − β2xi2 − · · · − βkxik)²

⇒ leads to k + 1 linear equations in k + 1 unknowns β̂0, β̂1, · · · , β̂k:

First order conditions:

∑_{i=1}^n (yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0

∑_{i=1}^n xi1(yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0

...

∑_{i=1}^n xik(yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0
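These first-order conditions say the residuals are orthogonal to a constant and to every regressor, which is equivalent to the normal equations X′Xb = X′y. A minimal numerical sketch on simulated data (all names and coefficient values are ours, not from the slides):

```python
import numpy as np

# Simulated data; the "true" coefficients (1.0, 2.0, -0.5) are hypothetical.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Stack an intercept column so beta_hat[0] plays the role of beta0-hat.
X = np.column_stack([np.ones(n), x1, x2])

# The k+1 first-order conditions are X'(y - X b) = 0, i.e. X'X b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Check: residuals are orthogonal to the constant and to each regressor.
resid = y - X @ beta_hat
print(np.allclose(X.T @ resid, 0.0))  # True
```

The orthogonality check is precisely the system of k + 1 equations on this slide, written in matrix form.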

OLS estimates

Interpretation:

yˆ = βˆ0 + βˆ1x1 + βˆ2x2

◮ βˆ1 and βˆ2 have partial effect (ceteris paribus) interpretations

◮ From the above equation we can get the predicted change in y given changes in x1 and x2:

∆ŷ = β̂1∆x1 + β̂2∆x2

◮ when x2 is fixed (∆x2 = 0)

∆yˆ = βˆ1∆x1.

◮ when x1 is fixed (∆x1 = 0)

∆yˆ = βˆ2∆x2.

OLS estimates : “Partialling out”

Interesting Formulas:

yˆ = βˆ0 + βˆ1x1 + βˆ2x2

◮

β̂1 = (∑_{i=1}^n r̂i1 yi)/(∑_{i=1}^n r̂²i1),

where r̂i1 are the OLS residuals from a simple regression of x1 on x2

◮ note that r̂i1 is the part of xi1 that is uncorrelated with xi2 ⇒ r̂i1 is xi1 after the effect of xi2 has been partialled out.

◮ βˆ1 measures the sample relationship between y and x1 after

x2 has been partialled out.

◮ General case: rˆi1 come from the regression of x1 on

x2, x3, · · · , xk .
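The partialling-out formula can be verified directly; a sketch on simulated data (variable names and data-generating values are ours), checking that the slope of y on the residuals r̂i1 matches the multiple-regression coefficient on x1:

```python
import numpy as np

# Simulated data with x1 and x2 deliberately correlated.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Multiple regression of y on (1, x1, x2).
X = np.column_stack([np.ones(n), x1, x2])
b_multi, *_ = np.linalg.lstsq(X, y, rcond=None)

# Partial out x2: residuals from regressing x1 on (1, x2).
Z = np.column_stack([np.ones(n), x2])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g

# Slide formula: beta1_hat = sum(r1 * y) / sum(r1^2).
b1_partial = (r1 @ y) / (r1 @ r1)
print(np.isclose(b1_partial, b_multi[1]))  # True
```

The agreement is exact (an algebraic identity, not an approximation), which is the content of the partialling-out result.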

OLS estimates : Comparison

Simple regression: y˜ = β˜0 + β˜1x1

Multiple regression: yˆ = βˆ0 + βˆ1x1 + βˆ2x2

Generally, β̃1 ≠ β̂1

Interesting comparison:

β˜1 = βˆ1 + βˆ2δ˜1,

where δ˜1 is from xˆ2 = δ˜0 + δ˜1x1.
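The decomposition β̃1 = β̂1 + β̂2δ̃1 is an exact in-sample identity, which a quick simulation confirms (all data and names here are ours, chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)   # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

one = np.ones(n)

def ols(design, target):
    """OLS coefficients of target on the given design matrix."""
    return np.linalg.lstsq(design, target, rcond=None)[0]

b_tilde = ols(np.column_stack([one, x1]), y)       # simple regression
b_hat = ols(np.column_stack([one, x1, x2]), y)     # multiple regression
d_tilde = ols(np.column_stack([one, x1]), x2)      # x2 on x1

# Exact identity: beta1_tilde = beta1_hat + beta2_hat * delta1_tilde
print(np.isclose(b_tilde[1], b_hat[1] + b_hat[2] * d_tilde[1]))  # True
```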

OLS estimates : Comparison

Under which conditions are they equal?

◮ The partial effect of x2 on yˆ is zero, i.e., βˆ2 = 0.

◮ x1 and x2 are uncorrelated, i.e., δ˜1 = 0.

◮ General multiple regression case (k indep vars). ⇒ (i) β̂2 = β̂3 = · · · = β̂k = 0; or (ii) x1 is uncorrelated with each of x2, x3, · · · , xk.

Statistical properties of OLS estimators

• Assumptions :

A1 (Linear in Parameter) y is related to x1, x2, · · · , xk by a linear

function, i.e., y = β0 + β1x1 + β2x2 + · · · + βkxk + u.

A2 (Random Sampling) {(yi , xi1, xi2, · · · , xik)}, i = 1, 2, · · · , n is

a random sample of the population model in A1.

A3 (No Perfect Collinearity) In the sample, none of the indep vars is constant, and there are no exact linear relationships among the indep vars.

A4 (Zero Conditional Mean) E (u|x1, x2, · · · , xk) = 0.

Statistical properties of OLS estimators

A3 (No Perfect Collinearity) In the sample, none of the indep vars is constant, and there are no exact linear relationships among the indep vars.

◮ A3 allows the independent variables to be correlated, but not perfectly correlated. Ex: one variable is a constant multiple of another (x1 = z and x2 = 3 × z).

◮ what about y = β0 + β1x + β2x² + u?

◮ what about log(y) = β0 + β1 log(x) + β2 log(x²) + u?

◮ what about y = β0 + β1x1 + β2x2 + β3z + u, where

z = x1 + x2?

◮ what can we do when there is perfect collinearity?

Statistical properties of OLS estimators

• Unbiasedness of OLS :

Theorem

Under A1-A4,

E (β̂j) = βj , j = 0, 1, 2, · · · , k,

for any population parameter βj .

Statistical properties of OLS estimators

• Proof :

Under A3, OLS estimators exist and

β̂1 = (∑_{i=1}^n r̂i1 yi)/(∑_{i=1}^n r̂²i1)

= (∑_{i=1}^n r̂i1(β0 + β1xi1 + · · · + βkxik + ui))/(∑_{i=1}^n r̂²i1)

= (∑_{i=1}^n (β1 r̂i1 xi1 + r̂i1 ui))/(∑_{i=1}^n r̂²i1)

= β1 + (∑_{i=1}^n r̂i1 ui)/(∑_{i=1}^n r̂²i1) (why?)

⇒

E(β̂1|X) = β1 + (∑_{i=1}^n r̂i1 E(ui|X))/(∑_{i=1}^n r̂²i1)

= β1 + (∑_{i=1}^n r̂i1 · 0)/(∑_{i=1}^n r̂²i1) = β1

where X is the data on all indep vars.

Including irrelevant variables

We specify the model

y = β0 + β1x1 + β2x2 + β3x3 + u,

and the model satisfies A1-A4. But the true model is given by

E (y |x1, x2) = β0 + β1x1 + β2x2.

• What is the effect of including irrelevant x3 in the model?

◮ Unbiasedness: No problem. (E (βˆ3) = 0 by previous Thm)

◮ Undesirable effects on the variance of the OLS estimators

Omitted variable

The true model is

y = β0 + β1x1 + β2x2 + u,

and the model satisfies A1-A4. But we perform a simple regression

of y on x1 only,

y˜ = β˜0 + β˜1x1

• What is the effect of omitting x2?

◮ Unbiasedness : we can use β˜1 = βˆ1 + βˆ2δ˜1.

⇒ E (β˜1) = E (βˆ1 + βˆ2δ˜1) = E (βˆ1) + E (βˆ2)δ˜1 = β1 + β2δ˜1

⇒ Bias(β˜1) = E (β˜1)− β1 = β2δ˜1.

◮ β2 = 0 → β˜1 is unbiased.

◮ δ̃1 = 0 → β̃1 is unbiased even if β2 ≠ 0

Omitted variable

Bias(β˜1) = β2δ˜1

Sign of the bias:

         Corr(x1, x2) > 0            Corr(x1, x2) < 0
β2 > 0   positive bias (upward)      negative bias (downward)
β2 < 0   negative bias (downward)    positive bias (upward)

Note: The size of the bias is also important. Small size of bias

might be okay. The size of the bias is determined by the sizes of

β2 and δ˜1.

Omitted variable

Example:

The true model is

log(wage) = β0 + β1educ + β2abil + u

We estimate the model based on

log(wage) = β0 + β1educ + u

◮ more ability leads to higher wage (β2 > 0).

◮ educ and abil are possibly positively correlated (δ̃1 > 0).

◮ The OLS estimates are on average too large.

The variance of OLSE

To show the efficiency of OLSE, we need one more assumption.

A5 (Homoskedasticity) The error u has the same variance given any values of the independent variables, i.e., Var(u|x1, · · · , xk) = σ²

Then we can obtain the variance of βˆj .

Theorem

Under A1-A5,

Var(β̂j) = σ² / [(∑_{i=1}^n (xij − x̄j)²)(1 − R²j)],

for j = 1, 2, · · · , k, where R²j is the R-squared from regressing xj on all other independent variables and a constant.

The variance of OLSE

Var(β̂j) = σ² / [(∑_{i=1}^n (xij − x̄j)²)(1 − R²j)].

◮ (error variance) σ² ↑ ⇒ Var(β̂j) ↑

◮ (sample variation in xj) ∑_{i=1}^n (xij − x̄j)² ↑ ⇒ Var(β̂j) ↓

◮ (multicollinearity) R²j ↑ ⇒ Var(β̂j) ↑
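The variance formula can be checked against the familiar σ²(X′X)⁻¹ covariance matrix; a sketch on simulated data (variable names and the assumed value σ² = 1 are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)   # correlated regressors, so R2_1 > 0
sigma2 = 1.0                          # assume the error variance is known

X = np.column_stack([np.ones(n), x1, x2])
V = sigma2 * np.linalg.inv(X.T @ X)   # classical covariance matrix

# Slide formula for j = 1: sigma^2 / (SST_1 * (1 - R2_1)).
sst1 = np.sum((x1 - x1.mean()) ** 2)
Z = np.column_stack([np.ones(n), x2])
fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1.0 - np.sum((x1 - fit) ** 2) / sst1

print(np.isclose(V[1, 1], sigma2 / (sst1 * (1.0 - r2_1))))  # True
```

Increasing the 0.8 coefficient that links x2 to x1 drives R²1 toward 1 and inflates the variance, which is the multicollinearity bullet above.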

Variance in misspecified model

The true model:

y = β0 + β1x1 + β2x2 + u, (βˆ0, βˆ1, βˆ2)

Our model:

y = β0 + β1x1 + u, (β˜0, β˜1)

◮ β̃1: biased unless β2 = 0 or x1 and x2 are uncorrelated (δ̃1 = 0)

◮ βˆ1 is preferred to β˜1 (unbiasedness criterion only)

◮ What about their variances?

Var(β̂1) = σ²/[SST1(1 − R²1)],   Var(β̃1) = σ²/SST1

⇒ Var(β̃1) is never larger than Var(β̂1), and strictly smaller whenever x1 and x2 are correlated in the sample (R²1 > 0)!

Variance in misspecified model

◮ In summary,

When β2 = 0, β˜1 and βˆ1 are unbiased and Var(β˜1) < Var(βˆ1).

When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased and Var(β̃1) < Var(β̂1).

◮ β2 = 0: β˜1 is preferred.

◮ β2 ≠ 0: Bias in β̃1 does not decrease as the sample size grows. But Var(β̃1) and Var(β̂1) both go to 0 as the sample size grows.

Estimation of σ2

Note that we do not observe u!

ui = yi − β0 − β1xi1 − β2xi2 − · · · − βkxik

so we replace each βj with OLSE.

uˆi = yi − βˆ0 − βˆ1xi1 − βˆ2xi2 − · · · − βˆkxik

• The unbiased estimator of σ2:

σ̂² = (∑_{i=1}^n û²i)/(n − k − 1).

• Standard error of βˆj :

se(β̂j) = σ̂/[SSTj(1 − R²j)]^{1/2}
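A sketch of both estimators on simulated data (names and data-generating values are ours): σ̂² from the residuals with the n − k − 1 degrees-of-freedom correction, then standard errors from the diagonal of σ̂²(X′X)⁻¹, which coincides with the SSTj(1 − R²j) form above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 250, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
uhat = y - X @ b                       # OLS residuals, not the true u

# Unbiased estimator of sigma^2: SSR / (n - k - 1).
sigma2_hat = (uhat @ uhat) / (n - k - 1)

# Standard errors: sqrt of the diagonal of sigma2_hat * (X'X)^{-1}.
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))
print(sigma2_hat, se)
```

With the true error variance equal to 1 in this simulation, σ̂² should land near 1.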

Gauss-Markov Theorem

Theorem

Under A1-A5, βˆj , j = 0, 1, 2, · · · , k, are the best linear unbiased

estimators of βj , j = 0, 1, 2, · · · , k.

◮ linear: β̃j = ∑_{i=1}^n wij yi.

◮ best: smallest variance

◮ In the class of linear and unbiased estimators, OLSE has the

smallest variance.

◮ We don’t have to look at alternative unbiased estimators of the linear form when A1-A5 hold.

Inference

To perform statistical inference we need to know the sampling

distribution rather than the expected value and variance of OLSE.

We need to assume that the error is normally distributed.

A6 (Normality) u is independent of x1, x2, · · · , xk and is normally

distributed with zero mean and variance σ2.

Then we have (why?)

β̂j ∼ N(βj, Var(β̂j)),

⇒

(β̂j − βj)/sd(β̂j) ∼ N(0, 1).

t-test

The population model

y = β0 + β1x1 + β2x2 + · · · + βkxk + u,

Under A1-A6,

(β̂j − βj)/se(β̂j) ∼ tn−k−1,

where tn−k−1 denotes Student’s t distribution with n − k − 1 degrees of freedom.

More on t-test

◮ One-sided alternative:

H1 : βj > 0, tβˆj > c

H1 : βj < 0, tβˆj < −c

◮ Two-sided alternative:

H1 : βj ≠ 0, |tβ̂j| > c

◮ Computing p-values for tests:

P(|T| > |t|): two-sided alternative

P(T > t) or P(T < t): one-sided alternatives

t-test

Example:

̂log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
            (0.104)  (0.007)     (0.0017)       (0.003)

n = 526, R² = 0.316

Perform a test whether the return to exper is equal to zero in population,

against the alternative that it is positive.

⇒ H0 : βexper = 0 versus H1 : βexper > 0

◮ Test statistic (under H0) ⇒ (0.0041− 0)/0.0017 ≃ 2.41

◮ Type of the test (under H1) ⇒ one-sided (right) test

◮ 5% and 1% critical value: 1.645 and 2.326, respectively.

◮ We reject the null hypothesis ⇒ βˆexper is statistically greater than

zero at the 1% significance level.
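The arithmetic of this example takes only a few lines; the coefficient, standard error, and one-sided critical values below are the ones quoted on the slide.

```python
# H0: beta_exper = 0 against H1: beta_exper > 0.
beta_hat, se_beta = 0.0041, 0.0017     # from the estimated wage equation
t_stat = (beta_hat - 0.0) / se_beta
print(round(t_stat, 2))  # 2.41

# One-sided 5% and 1% critical values quoted on the slide.
c_05, c_01 = 1.645, 2.326
print(t_stat > c_05, t_stat > c_01)  # True True -> reject at both levels
```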

Single linear combination of the parameters

Example:

log(wage) = β0 + β1jc + β2univ + β3exper + u,

where jc is number of years attending a two-year college, univ is

number of years at a four-year college, and exper is months in the

workforce.

H0 : β1 = β2

H1 : β1 < β2

This is nothing but

H0 : β1 − β2 = 0

H1 : β1 − β2 < 0

Single linear combination of the parameters

Example (Con’t):

t-statistic:

t =

βˆ1 − βˆ2

se(βˆ1 − βˆ2)

.

In order to calculate se(βˆ1 − βˆ2) we need to know

Var(βˆ1 − βˆ2) = Var(βˆ1) + Var(βˆ2)− 2Cov(βˆ1, βˆ2)

⇒ se(βˆ1 − βˆ2) = {[se(βˆ1)]

2 + se[(βˆ2)]

2 − 2s12}

1/2

where s12 denotes an estimate of Cov(βˆ1 − βˆ2).

Single linear combination of the parameters

Example (Con’t):

Another way:

Define θ1 = β1 − β2. So β1 = θ1 + β2.

log(wage) = β0 + (θ1 + β2)jc + β2univ + β3exper + u

= β0 + θ1jc + β2(jc + univ) + β3exper + u.

⇒ Perform a t-test on θ1!
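A sketch of the θ1 trick on simulated data (the data-generating values are ours; variable names follow the slides): regressing y on jc, (jc + univ) and exper makes θ1 = β1 − β2 an ordinary coefficient with an ordinary standard error.

```python
import numpy as np

# Simulated schooling data; coefficients are hypothetical.
rng = np.random.default_rng(5)
n = 600
jc = rng.poisson(1.0, n).astype(float)
univ = rng.poisson(2.0, n).astype(float)
exper = rng.normal(60.0, 20.0, n)
y = 1.5 + 0.07 * jc + 0.10 * univ + 0.005 * exper + rng.normal(0.0, 0.4, n)

one = np.ones(n)
b, *_ = np.linalg.lstsq(np.column_stack([one, jc, univ, exper]), y, rcond=None)
g, *_ = np.linalg.lstsq(np.column_stack([one, jc, jc + univ, exper]), y, rcond=None)

# Coefficient on jc in the transformed model equals beta1_hat - beta2_hat.
print(np.isclose(g[1], b[1] - b[2]))  # True
```

The equality is exact because the transformed design matrix is an invertible linear recombination of the original one.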

F-Test

Consider major league baseball players’ salaries:

log(salary) = β0 + β1years + β2gamesyr + β3bavg + β4hrunsyr + β5rbisyr + u,

where years : years in the league; gamesyr : average games played

per year; bavg : career batting average; hrunsyr : home runs per

year; rbisyr : runs batted in per year.

H0 : β3 = 0, β4 = 0, β5 = 0.

H1 : H0 is not true.

How can we construct the test statistic?

F-Test

Basic idea: Compare two models: the restricted model implied by H0 and the full (unrestricted) model.

Unrestricted model:

log(salary) = β0 + β1years + β2gamesyr + β3bavg + β4hrunsyr + β5rbisyr + u,

Restricted model:

log(salary) = β0 + β1years + β2gamesyr + u,

First, get the SSR (sum of squared residuals) from the two models. Then check whether the increase in the SSR in going from the unrestricted model to the restricted model is large enough to reject the null hypothesis.

What is “large enough”?

F-Test

General case:

Unrestricted model:

y = β0 + β1x1 + · · ·+ βkxk + u.

H0 : βk−q+1 = 0, · · · , βk = 0,

⇒ q exclusion restrictions on the unrestricted model.

Restricted model:

y = β0 + β1x1 + · · · + βk−qxk−q + u.

F-Test

F statistic:

F ≡ [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]   ( = [(R²ur − R²r)/q] / [(1 − R²ur)/dfur] ).

◮ SSRr ≥ SSRur ⇒ the F statistic is always nonnegative.

◮ q: numerator degrees of freedom; n − k − 1: denominator

degrees of freedom.

◮ F is distributed as an F random variable with (q, n − k − 1) degrees of freedom, i.e., F ∼ Fq,n−k−1.

◮ Reject H0 in favor of H1 at the chosen significance level if

F > c.

◮ p-value = P(Fq,n−k−1 > F).

F-Test

Example:

log(price) = β0 + β1 log(assess) + β2 log(lotsize) + β3 log(sqrft) + β4 log(bdrms) + u,

assess: assessed housing value; lotsize: size of lots; sqrft: square

footage; bdrms: number of bedrooms.

Question: is the assessed housing price a rational valuation?

H0 : β1 = 1, β2 = 0, β3 = 0, β4 = 0.

Restricted model:

log(price) − log(assess) = β0 + u,

⇒ SSRur = 1.822 and SSRr = 1.880 ⇒ F = [(1.880 − 1.822)/1.822](83/4) ≃ 0.661 < 2.50 (the 5% critical value of F4,83) ⇒ fail to reject H0.
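The F computation in this example is just the formula from the previous slides with q = 4 and n − k − 1 = 83, using the SSRs quoted above:

```python
# SSRs quoted in the housing example above.
ssr_ur, ssr_r = 1.822, 1.880
q, df_ur = 4, 83

# F = [(SSR_r - SSR_ur)/q] / [SSR_ur/(n - k - 1)]
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
print(round(F, 3))  # 0.661

# 5% critical value quoted on the slide for F(4, 83).
print(F < 2.50)  # True -> fail to reject H0
```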
