
Econometrics: L2
Multiple Regression Model
Sung Y. Park
Chung-Ang Univ.
Motivation
Consider the following example:
wage = β0 + β1educ + β2exper + u,
where exper is years of labor market experience.
◮ wage is determined by the two independent variables
◮ β1: the effect of educ on wage, holding all other factors affecting
wage fixed (our parameter of interest)
◮ just as with simple regression, we make assumptions about how u is
related to the independent variables
◮ Note that we have to assume educ and exper are uncorrelated in
the simple regression (why?)
Motivation
A model with two independent variables:
y = β0 + β1x1 + β2x2 + u,
◮ β0: the intercept
◮ β1: measures the change in y with respect to x1, holding
other factors fixed
◮ β2: measures the change in y with respect to x2, holding
other factors fixed
Note: In Economics, this is used a lot: ceteris paribus, keeping all
else constant.
Motivation
cons = β0 + β1inc + β2inc² + u,
◮ note that consumption depends only on income ⇒ x1 = inc
and x2 = inc²
◮ here, is β1 the ceteris paribus effect of inc on cons? ⇒ No!
(why?)
◮ the change in consumption with respect to the change in
income (the marginal propensity to consume):
∆cons/∆inc ≈ β1 + 2β2inc.
Motivation
Key Assumption:
E (u|x1, x2) = 0
◮ for any values of x1 and x2, the average unobservable is equal
to zero
◮ note that the common value 0 is not crucial as long as β0 is
included in the model
◮ E (u|educ, exper) = 0 ⇒ other factors affecting wage are not
related on average to educ and exper
◮ E (u|inc, inc²) = 0? ⇒ E (u|inc) = 0 (inc² is redundant)
Motivation
More generally...
y = β0 + β1x1 + β2x2 + β3x3 + · · ·+ βkxk + u,
◮ there are k independent variables in the model but k + 1
parameters
◮ the parameters other than the intercept (β0): slope
parameters
◮ u: error term or disturbance
Motivation
An example:
log(salary) = β0 + β1 log(sales) + β2ceoten + β3ceoten² + u,
◮ β1 : ceteris paribus elasticity of salary with respect to sales
◮ if β3 = 0, 100β2: the ceteris paribus percentage increase in
salary when ceoten increases by one year
◮ if β3 ≠ 0, the effect of ceoten on salary is more complicated
(later)
◮ note that the relationship between salary and sales is nonlinear,
but the model is linear in the βj
OLS estimates
OLS estimates: Choose, simultaneously, β̂0, β̂1 and β̂2 that make
∑_{i=1}^n (yi − β̂0 − β̂1xi1 − β̂2xi2)²
as small as possible.
◮ the first index, i, refers to the observation number
◮ the second index, j, refers to the independent variable
◮ xij : the i-th observation on the j-th independent variable.
OLS estimates
General case:
min_{β0,β1,··· ,βk} ∑_{i=1}^n (yi − β0 − β1xi1 − β2xi2 − · · · − βkxik)²
⇒ leads to k + 1 linear equations in the k + 1 unknowns β̂0, β̂1, · · · , β̂k:
First order conditions:
∑_{i=1}^n (yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0
∑_{i=1}^n xi1(yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0
...
∑_{i=1}^n xik(yi − β̂0 − β̂1xi1 − β̂2xi2 − · · · − β̂kxik) = 0
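The first order conditions above are the normal equations X′Xβ̂ = X′y, where X stacks a column of ones and the regressors. A minimal sketch in Python (not from the slides; the data are simulated and the coefficient values are illustrative):

```python
import numpy as np

# Simulated data: n = 100 observations, k = 2 regressors (values illustrative).
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + u

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])

# The k+1 first order conditions say X'(y - X beta_hat) = 0,
# i.e. the normal equations X'X beta_hat = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # roughly [1.0, 2.0, -0.5]
```

Solving the normal equations directly is exactly what the first order conditions demand: the residuals are orthogonal to each column of X.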
OLS estimates
Interpretation:
yˆ = βˆ0 + βˆ1x1 + βˆ2x2
◮ βˆ1 and βˆ2 have partial effect (ceteris paribus) interpretations
◮ From the above equation we can get the predicted change in y
given changes in x1 and x2:
∆ŷ = β̂1∆x1 + β̂2∆x2
◮ when x2 is fixed (∆x2 = 0)
∆yˆ = βˆ1∆x1.
◮ when x1 is fixed (∆x1 = 0)
∆yˆ = βˆ2∆x2.
OLS estimates : “Partialling out”
Interesting Formulas:
ŷ = β̂0 + β̂1x1 + β̂2x2
β̂1 = (∑_{i=1}^n r̂i1 yi) / (∑_{i=1}^n r̂²i1)
where the r̂i1 are the OLS residuals from a simple regression of x1 on x2.
◮ note that r̂i1 is the part of xi1 that is uncorrelated with xi2
⇒ r̂i1 is xi1 after the effect of xi2 has been partialled out.
◮ βˆ1 measures the sample relationship between y and x1 after
x2 has been partialled out.
◮ General case: rˆi1 come from the regression of x1 on
x2, x3, · · · , xk .
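The partialling-out result can be checked numerically: the coefficient on x1 from the full regression equals the coefficient obtained from the residuals r̂i1. A sketch with simulated data (coefficient values are illustrative):

```python
import numpy as np

# Simulated data where x1 and x2 are correlated (values illustrative).
rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression of y on a constant, x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Step 1: regress x1 on a constant and x2; keep the residuals r1_hat.
Z = np.column_stack([np.ones(n), x2])
r1 = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

# Step 2: beta1_hat = sum(r1_hat * y) / sum(r1_hat^2).
beta1_partial = (r1 @ y) / (r1 @ r1)
print(beta_full[1], beta1_partial)  # the two numbers agree
```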
OLS estimates : Comparison
Simple regression: ỹ = β̃0 + β̃1x1
Multiple regression: ŷ = β̂0 + β̂1x1 + β̂2x2
Generally, β̃1 ≠ β̂1.
Interesting comparison:
β̃1 = β̂1 + β̂2δ̃1,
where δ̃1 is the slope from the regression of x2 on x1: x̃2 = δ̃0 + δ̃1x1.
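This identity holds exactly in any sample, and can be verified directly. A sketch with simulated data (all names and coefficient values are illustrative):

```python
import numpy as np

# Check the identity beta1_tilde = beta1_hat + beta2_hat * delta1_tilde.
rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 1.5 * x1 + 2.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
# Simple regression of y on x1: gives beta1_tilde.
b_simple = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0]
# Multiple regression of y on x1 and x2: gives beta1_hat, beta2_hat.
b_mult = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0]
# Regression of x2 on x1: gives delta1_tilde.
d = np.linalg.lstsq(np.column_stack([ones, x1]), x2, rcond=None)[0]

lhs = b_simple[1]
rhs = b_mult[1] + b_mult[2] * d[1]
print(lhs, rhs)  # identical up to rounding error
```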
OLS estimates : Comparison
Under which conditions are they equal?
◮ The partial effect of x2 on yˆ is zero, i.e., βˆ2 = 0.
◮ x1 and x2 are uncorrelated, i.e., δ˜1 = 0.
◮ General multiple regression case (k independent variables) ⇒
(i) β̂2 = β̂3 = · · · = β̂k = 0; or (ii) x1 is uncorrelated with each
of x2, x3, · · · , xk.
Statistical properties of OLS estimators
• Assumptions :
A1 (Linear in Parameter) y is related to x1, x2, · · · , xk by a linear
function, i.e., y = β0 + β1x1 + β2x2 + · · · + βkxk + u.
A2 (Random Sampling) {(yi , xi1, xi2, · · · , xik)}, i = 1, 2, · · · , n is
a random sample of the population model in A1.
A3 (No Perfect Collinearity) In the sample, none of the indep vars
is constant, and there are no exact linear relationships among
the indep vars.
A4 (Zero Conditional Mean) E (u|x1, x2, · · · , xk) = 0.
Statistical properties of OLS estimators
A3 (No Perfect Collinearity) In the sample, none of the indep vars
is constant, and there are no exact linear relationships among
the indep vars.
◮ A3 allows the independent variables to be correlated, but not
perfectly correlated. Ex: one variable is a constant multiple of
another (x1 = z and x2 = 3 × z).
◮ what about y = β0 + β1x + β2x² + u?
◮ what about log(y) = β0 + β1 log(x) + β2 log(x²) + u?
◮ what about y = β0 + β1x1 + β2x2 + β3z + u, where
z = x1 + x2?
◮ what can we do when there is perfect collinearity?
Statistical properties of OLS estimators
• Unbiasedness of OLS :
Theorem
Under A1-A4,
E (βˆj) = βj , j = 1, 2, · · · , k,
for any population parameter βj .
Statistical properties of OLS estimators
• Proof :
Under A3, the OLS estimators exist and
β̂1 = (∑_{i=1}^n r̂i1 yi) / (∑_{i=1}^n r̂²i1)
= (∑_{i=1}^n r̂i1(β0 + β1xi1 + · · · + βkxik + ui)) / (∑_{i=1}^n r̂²i1)
= (∑_{i=1}^n (β1 r̂i1 xi1 + r̂i1 ui)) / (∑_{i=1}^n r̂²i1)
= β1 + (∑_{i=1}^n r̂i1 ui) / (∑_{i=1}^n r̂²i1)
(why?)
E (β̂1|X) = β1 + (∑_{i=1}^n r̂i1 E (ui|X)) / (∑_{i=1}^n r̂²i1)
= β1 + (∑_{i=1}^n r̂i1 · 0) / (∑_{i=1}^n r̂²i1) = β1,
where X is the data on all indep vars.
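Unbiasedness can also be illustrated by simulation: under a data-generating process satisfying A1-A4, the average of β̂1 across many samples should be close to the true β1. A sketch (the coefficient values are hypothetical):

```python
import numpy as np

# Monte Carlo sketch of unbiasedness: average beta1_hat over many
# samples drawn under A1-A4 should be close to the true beta1 = 2.
rng = np.random.default_rng(3)
n, reps, beta1 = 50, 2000, 2.0
estimates = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 1.0 + beta1 * x1 - 1.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
print(estimates.mean())  # close to 2.0
```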
Including irrelevant variables
We specify the model
y = β0 + β1x1 + β2x2 + β3x3 + u,
and the model satisfies A1-A4. But the true model is given by
E (y |x1, x2) = β0 + β1x1 + β2x2.
• What is the effect of including irrelevant x3 in the model?
◮ Unbiasedness: No problem. (E (βˆ3) = 0 by previous Thm)
◮ Undesirable effects on the variance of the OLS estimators
Omitted variable
The true model is
y = β0 + β1x1 + β2x2 + u,
and the model satisfies A1-A4. But we perform a simple regression
of y on x1 only,
y˜ = β˜0 + β˜1x1
• What is the effect of omitting x2?
◮ Unbiasedness : we can use β˜1 = βˆ1 + βˆ2δ˜1.
⇒ E (β˜1) = E (βˆ1 + βˆ2δ˜1) = E (βˆ1) + E (βˆ2)δ˜1 = β1 + β2δ˜1
⇒ Bias(β˜1) = E (β˜1)− β1 = β2δ˜1.
◮ β2 = 0 → β˜1 is unbiased.
◮ δ̃1 = 0 → β̃1 is unbiased even if β2 ≠ 0
Omitted variable
Bias(β˜1) = β2δ˜1
Sign of the bias:
            Corr(x1, x2) > 0            Corr(x1, x2) < 0
β2 > 0      positive (upward) bias      negative (downward) bias
β2 < 0      negative (downward) bias    positive (upward) bias
Note: The size of the bias also matters; a small bias may be
acceptable. The size of the bias is determined by the magnitudes of
β2 and δ̃1.
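The upward-bias cell of the table can be illustrated by simulation, with β2 > 0 and Corr(x1, x2) > 0 (all parameter values below are hypothetical):

```python
import numpy as np

# Simulated omitted-variable bias: beta2 > 0 and Corr(x1, x2) > 0,
# so the short regression of y on x1 alone is biased upward.
rng = np.random.default_rng(4)
n, reps = 100, 2000
beta1, beta2 = 1.0, 1.0
short_estimates = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)      # delta1_tilde approx 0.5 > 0
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1])   # x2 omitted
    short_estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
# Average short-regression slope approx beta1 + beta2 * 0.5 = 1.5.
print(short_estimates.mean())  # clearly above the true beta1 = 1.0
```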
Omitted variable
Example:
The true model is
log(wage) = β0 + β1educ + β2abil + u
We estimate the model based on
log(wage) = β0 + β1educ + u
◮ more ability leads to a higher wage (β2 > 0).
◮ educ and abil are plausibly positively correlated (δ̃1 > 0).
◮ The simple-regression OLS estimates are therefore on average too large (upward bias).
The variance of OLSE
To derive the variance of the OLS estimators (and, later, their efficiency), we need one more assumption.
A5 (Homoskedasticity) The error u has the same variance given
any values of the independent variables, i.e.,
Var(u|x1, · · · , xk) = σ².
Then we can obtain the variance of βˆj .
Theorem
Under A1-A5,
Var(β̂j) = σ² / [SSTj(1 − R²j)],
where SSTj = ∑_{i=1}^n (xij − x̄j)², for j = 1, 2, · · · , k, and R²j is the
R-squared from regressing xj on all other independent variables and
a constant.
The variance of OLSE
Var(β̂j) = σ² / [SSTj(1 − R²j)].
◮ (error variance) σ² ↑ ⇒ Var(β̂j) ↑
◮ (sample variation in xj) SSTj = ∑_{i=1}^n (xij − x̄j)² ↑ ⇒ Var(β̂j) ↓
◮ (multicollinearity) R²j ↑ ⇒ Var(β̂j) ↑
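The multicollinearity effect can be sketched with the formula above on simulated data: as the correlation between x1 and x2 rises, R²1 rises and Var(β̂1) inflates (the correlation levels are illustrative):

```python
import numpy as np

# Sketch of Var(beta1_hat) = sigma^2 / (SST_1 (1 - R_1^2)) under
# low vs. high correlation between x1 and x2.
rng = np.random.default_rng(5)
n = 500
sigma2 = 1.0

def var_beta1(corr):
    x2 = rng.normal(size=n)
    x1 = corr * x2 + np.sqrt(1 - corr**2) * rng.normal(size=n)
    # R_1^2 from regressing x1 on a constant and x2.
    Z = np.column_stack([np.ones(n), x2])
    fit = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    r2 = 1 - np.sum((x1 - fit) ** 2) / np.sum((x1 - x1.mean()) ** 2)
    sst1 = np.sum((x1 - x1.mean()) ** 2)
    return sigma2 / (sst1 * (1 - r2))

low, high = var_beta1(0.1), var_beta1(0.95)
print(low, high)  # variance is far larger under near-collinearity
```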
Variance in misspecified model
The true model:
y = β0 + β1x1 + β2x2 + u, (βˆ0, βˆ1, βˆ2)
Our model:
y = β0 + β1x1 + u, (β˜0, β˜1)
◮ β̃1: biased unless β2 = 0 or x1 and x2 are uncorrelated in the sample
◮ β̂1 is preferred to β̃1 (on the unbiasedness criterion alone)
Var(β̂1) = σ²/[SST1(1 − R²1)],  Var(β̃1) = σ²/SST1
⇒ Var(β̃1) is never larger than Var(β̂1), and is strictly smaller
whenever x1 and x2 are correlated!
Variance in misspecified model
◮ In summary,
When β2 = 0, β̃1 and β̂1 are both unbiased and Var(β̃1) < Var(β̂1).
When β2 ≠ 0, β̃1 is biased, β̂1 is unbiased, and
Var(β̃1) < Var(β̂1).
◮ β2 = 0: β̃1 is preferred.
◮ β2 ≠ 0: the bias in β̃1 does not shrink as the sample size
grows, but both Var(β̃1) and Var(β̂1) go to 0 as the sample
size gets large.
Estimation of σ2
Note that we do not observe u!
ui = yi − β0 − β1xi1 − β2xi2 − · · · − βkxik
so we replace each βj with its OLS estimate:
uˆi = yi − βˆ0 − βˆ1xi1 − βˆ2xi2 − · · · − βˆkxik
• The unbiased estimator of σ²:
σ̂² = ∑_{i=1}^n û²i / (n − k − 1).
• Standard error of β̂j:
se(β̂j) = σ̂ / [SSTj(1 − R²j)]^{1/2}
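These two formulas can be checked against the standard matrix expression σ̂²(X′X)⁻¹ on simulated data; a sketch (not from the slides, data simulated):

```python
import numpy as np

# Compute sigma2_hat and se(beta1_hat) via the slide's formulas.
rng = np.random.default_rng(6)
n, k = 120, 2
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.8 * x1 - 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Unbiased error-variance estimator: divide SSR by n - k - 1.
sigma2_hat = resid @ resid / (n - k - 1)

# se(beta1_hat) via SST_1 and R_1^2 (regress x1 on a constant and x2).
Z = np.column_stack([np.ones(n), x2])
fit1 = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
r2_1 = 1 - np.sum((x1 - fit1) ** 2) / np.sum((x1 - x1.mean()) ** 2)
sst1 = np.sum((x1 - x1.mean()) ** 2)
se_beta1 = np.sqrt(sigma2_hat / (sst1 * (1 - r2_1)))
print(sigma2_hat, se_beta1)
```

The value of se_beta1 agrees with the square root of the (1,1) diagonal entry of σ̂²(X′X)⁻¹, which is how regression software typically computes it.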
Gauss-Markov Theorem
Theorem
Under A1-A5, βˆj , j = 0, 1, 2, · · · , k, are the best linear unbiased
estimators of βj , j = 0, 1, 2, · · · , k.
◮ linear: β̃j = ∑_{i=1}^n wij yi
◮ best: smallest variance
◮ In the class of linear and unbiased estimators, OLSE has the
smallest variance.
◮ We don’t have to look at alternative unbiased estimators of
the linear form when A1-A5 hold.
Inference
To perform statistical inference we need to know the sampling
distribution rather than the expected value and variance of OLSE.
We need to assume that the error is normally distributed.
A6 (Normality) u is independent of x1, x2, · · · , xk and is normally
distributed with zero mean and variance σ2.
Then we have (why?)
β̂j ∼ N(βj, Var(β̂j)),
so
(β̂j − βj) / sd(β̂j) ∼ N(0, 1).
t-test
The population model
y = β0 + β1x1 + β2x2 + · · · + βkxk + u,
Under A1-A6,
(β̂j − βj) / se(β̂j) ∼ t_{n−k−1},
where t_{n−k−1} denotes Student’s t distribution with n − k − 1
degrees of freedom.
More on t-test
◮ One-sided alternatives:
H1 : βj > 0: reject H0 if t_{β̂j} > c
H1 : βj < 0: reject H0 if t_{β̂j} < −c
◮ Two-sided alternative:
H1 : βj ≠ 0: reject H0 if |t_{β̂j}| > c
◮ Computing p-values for tests:
two-sided alternative: P(|T| > |t|); one-sided: P(T > t) or P(T < t)
t-test
Example:
log(wage)^ = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
                 (0.104)  (0.007)       (0.0017)        (0.003)
n = 526, R² = 0.316
Perform a test whether the return to exper is equal to zero in population,
against the alternative that it is positive.
⇒ H0 : βexper = 0 versus H1 : βexper > 0
◮ Test statistic (under H0) ⇒ (0.0041 − 0)/0.0017 ≈ 2.41
◮ Type of test (under H1) ⇒ one-sided (right-tailed) test
◮ 5% and 1% critical values: 1.645 and 2.326, respectively.
◮ We reject the null hypothesis ⇒ the return to exper is statistically greater than
zero at the 1% significance level.
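The arithmetic of the test can be reproduced directly from the reported estimate and standard error:

```python
# Reproducing the test statistic from the wage regression above:
# estimated return to exper = 0.0041, standard error = 0.0017,
# H0: beta_exper = 0 against H1: beta_exper > 0.
beta_hat, se, beta_null = 0.0041, 0.0017, 0.0
t_stat = (beta_hat - beta_null) / se
print(round(t_stat, 2))  # 2.41
# One-sided critical values at 5% and 1% (large df): 1.645 and 2.326.
# Since 2.41 > 2.326, H0 is rejected at the 1% level.
```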
Single linear combination of the parameters
Example:
log(wage) = β0 + β1jc + β2univ + β3exper + u,
where jc is number of years attending a two-year college, univ is
number of years at a four-year college, and exper is months in the
workforce.
H0 : β1 = β2
H1 : β1 < β2
This is nothing but
H0 : β1 − β2 = 0
H1 : β1 − β2 < 0
Single linear combination of the parameters
Example (Con’t):
t-statistic:
t = (β̂1 − β̂2) / se(β̂1 − β̂2).
In order to calculate se(β̂1 − β̂2) we need to know
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2)
⇒ se(β̂1 − β̂2) = {[se(β̂1)]² + [se(β̂2)]² − 2s12}^{1/2},
where s12 denotes an estimate of Cov(β̂1, β̂2).
Single linear combination of the parameters
Example (Con’t):
Another way:
Define θ1 = β1 − β2. So β1 = θ1 + β2.
log(wage) = β0 + (θ1 + β2)jc + β2univ + β3exper + u
= β0 + θ1jc + β2(jc + univ) + β3exper + u.
⇒ Perform a t-test on θ1!
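The reparameterization can be checked numerically: regressing log(wage) on jc, (jc + univ) and exper returns θ̂1 = β̂1 − β̂2 as the coefficient on jc. A sketch with simulated data (the data-generating coefficients are hypothetical; variable names follow the slide):

```python
import numpy as np

# Check that the coefficient on jc in the reparameterized regression
# equals beta1_hat - beta2_hat from the original regression.
rng = np.random.default_rng(7)
n = 300
jc = rng.poisson(1.0, size=n).astype(float)
univ = rng.poisson(2.0, size=n).astype(float)
exper = rng.normal(10, 3, size=n)
logwage = 1.0 + 0.05 * jc + 0.08 * univ + 0.01 * exper + rng.normal(size=n)

ones = np.ones(n)
# Original parameterization.
b = np.linalg.lstsq(np.column_stack([ones, jc, univ, exper]),
                    logwage, rcond=None)[0]
# Reparameterized model: coefficient on jc is theta1 = beta1 - beta2,
# so its standard error can be read off directly from the output.
b2 = np.linalg.lstsq(np.column_stack([ones, jc, jc + univ, exper]),
                     logwage, rcond=None)[0]
print(b[1] - b[2], b2[1])  # equal up to rounding error
```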
F-Test
Consider major league baseball players’ salaries:
log(salary) = β0+β1years+β2gamesyr+β3bavg+β4hrunsyr+β5rbisyr+u,
where years : years in the league; gamesyr : average games played
per year; bavg : career batting average; hrunsyr : home runs per
year; rbisyr : runs batted in per year.
H0 : β3 = 0, β4 = 0, β5 = 0.
H1 : H0 is not true.
How can we construct the test statistic?
F-Test
Basic idea: Compare two models: the restricted model (imposing H0) and the full (unrestricted) model.
Unrestricted model:
log(salary) = β0+β1years+β2gamesyr+β3bavg+β4hrunsyr+β5rbisyr+u,
Restricted model:
log(salary) = β0 + β1years + β2gamesyr + u,
First, obtain the SSR (sum of squared residuals) from each of the
two models. Then check whether the increase in the SSR in going
from the unrestricted model to the restricted model is large enough
to reject the null hypothesis.
What is “large enough”?
F-Test
General case:
Unrestricted model:
y = β0 + β1x1 + · · ·+ βkxk + u.
H0 : βk−q+1 = 0, · · · , βk = 0,
⇒ q exclusion restrictions on the unrestricted model.
Restricted model:
y = β0 + β1x1 + · · · + βk−qxk−q + u.
F-Test
F statistic:
F ≡ [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
( = [(R²ur − R²r)/q] / [(1 − R²ur)/dfur] ).
◮ SSRr ≥ SSRur ⇒ the F statistic is always nonnegative.
◮ q: numerator degrees of freedom; n − k − 1: denominator
degrees of freedom.
◮ F is distributed as an F random variable with (q, n − k − 1)
degrees of freedom, i.e., F ∼ Fq,n−k−1.
◮ Reject H0 in favor of H1 at the chosen significance level if
F > c.
◮ p-value = P(Fq,n−k−1 > F), the probability that an F random variable exceeds the observed statistic.
F-Test
Example:
log(price) = β0 + β1 log(assess) + β2 log(lotsize) + β3 log(sqrft) + β4 log(bdrms) + u,
assess: assessed housing value; lotsize: size of the lot; sqrft: square
footage; bdrms: number of bedrooms.
Question: is the assessed housing value a rational valuation?
H0 : β1 = 1, β2 = 0, β3 = 0, β4 = 0.
Restricted model:
log(price) − log(assess) = β0 + u,
⇒ SSRur = 1.822 and SSRr = 1.880
⇒ F = [(1.880 − 1.822)/1.822](83/4) ≈ 0.661, while the 5% critical
value is F4,83 = 2.50 ⇒ fail to reject H0.
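The F statistic can be reproduced from the reported SSRs with q = 4 restrictions and n − k − 1 = 83 denominator degrees of freedom:

```python
# Reproducing the housing-valuation F statistic from the reported SSRs:
# SSR_r = 1.880, SSR_ur = 1.822, q = 4, n - k - 1 = 83.
ssr_r, ssr_ur, q, df_ur = 1.880, 1.822, 4, 83
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
print(round(F, 3))  # about 0.661, well below the 5% critical value 2.50
```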