Date: 2023-05-15
ECMT2150 – Lecture 11
Topics Today
Week 11
Part 1:
Instrumental Variable Estimation continued
– IV Applications continued – Measurement error
– 2SLS
References: Wooldridge Chapter 15, Chp 9.4 (Measurement Error)
Endogeneity
• A major challenge in social sciences incl. Economics
• Sources:
• Omitted variables
• Simultaneity
• Measurement error
• = violation of Assumption MLR.4 (Zero conditional mean)
=> Cov(x, u) ≠ 0
What can we do about it?
• Ignore it, and hopefully determine the sign of the bias
• Proxy variables
• Instrumental variables
• Panel methods
RECAP: Instrumental Variables
Definition of an instrumental variable (z)
In a regression with an endogenous variable (x):
z does not belong in the regression (otherwise it would be in the model already) and,
1) Instrument relevance: z is highly correlated with the endogenous variable (x), i.e. Cov(zi, xi) is large
2) Instrument validity: z is uncorrelated with u (is exogenous), i.e. Cov(zi, ui) = 0
RECAP: Instrumental Variables
• Last week
– IV as solution to problems of endogeneity
– Definition of an IV
• 2 key requirements: Validity and Relevance
– Properties of IV Estimators
• Consistent if our requirements are met
– Weak Instruments, Invalid Instruments
– IV in Multiple regression – Structural and reduced
form equation
– Testing for Endogeneity
– Potential sources for instruments
IV Applications:
A Solution to Measurement Error
Reference: Wooldridge 9.4 & 15.4
Mismeasured value = True value + Measurement error: x1 = x1* + e1
Population regression: y = β0 + β1x1* + β2x2 + u
Regression we can estimate: y = β0 + β1x1 + β2x2 + (u − β1e1)
Measurement error in an explanatory variable
• So how can we explore the consequences?
• Make either of these two assumptions:
1. e1 is uncorrelated with x1, the observed x: Cov(x1, e1) = 0
2. Classical Errors in Variables (CEV) assumption: e1 is uncorrelated with x1*, the true x: Cov(x1*, e1) = 0
• #1 is less interesting
– sampling variance will increase, but no bias or
inconsistency.
– see the textbook (section 9.4b)
Measurement error in an explanatory variable
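Population regression: y = β0 + β1x1* + β2x2 + u
Estimated regression: y = β0 + β1x1 + β2x2 + (u − β1e1)
Classical errors-in-variables assumption: Cov(x1*, e1) = 0 (the error is uncorrelated with the true value)
⇒ Cov(x1, e1) = Cov(x1*, e1) + Var(e1) = Var(e1)
⇒ Cov(x1, u − β1e1) = −β1Var(e1), which is nonzero when β1 ≠ 0 (the mismeasured variable x1 is correlated with the error term)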
Measurement error in an explanatory
variable
• Consequences
– Under the classical errors‐in‐variables assumption, OLS is biased and
inconsistent because the mismeasured variable is endogenous
– One can show that the inconsistency is of the following form:
$\operatorname{plim}(\hat{\beta}_1) = \beta_1 \cdot \dfrac{\sigma^2_{r_1^*}}{\sigma^2_{r_1^*} + \sigma^2_{e_1}}$
where r1* is the error from a regression of the true value x1* on the other explanatory variables. This factor will always be between zero and one.
– The effect of the mismeasured variable suffers from attenuation bias,
i.e. the magnitude of the effect will be attenuated towards zero
– In addition, the effects of the other explanatory variables will be biased
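The attenuation result can be checked with a small simulation. This is only a sketch under stated assumptions: a simple regression with no other explanatory variables (so the variance of r1* is just Var(x1*)), normally distributed data, and made-up parameter values (β1 = 1, Var(x1*) = Var(e1) = 1). The helper name `slope` is illustrative, not from the lecture.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

random.seed(0)
n = 50_000
beta0, beta1 = 2.0, 1.0        # assumed "true" parameters
sd_xstar, sd_e1 = 1.0, 1.0     # assumed standard deviations of x1* and e1

x_star = [random.gauss(0, sd_xstar) for _ in range(n)]
e1 = [random.gauss(0, sd_e1) for _ in range(n)]   # CEV: drawn independently of x1*
u = [random.gauss(0, 1) for _ in range(n)]

x1 = [xs + err for xs, err in zip(x_star, e1)]    # observed, mismeasured regressor
y = [beta0 + beta1 * xs + ui for xs, ui in zip(x_star, u)]

b_true = slope(x_star, y)   # infeasible regression on the true x1*
b_obs = slope(x1, y)        # feasible regression on the mismeasured x1
factor = sd_xstar**2 / (sd_xstar**2 + sd_e1**2)   # attenuation factor

print(f"slope using x1*       : {b_true:.3f}")
print(f"slope using x1        : {b_obs:.3f}")
print(f"beta1 * attenuation   : {beta1 * factor:.3f}")
```

With equal variances the attenuation factor is 1/2, so the slope on the mismeasured x1 should come out close to 0.5, while the slope on the true x1* stays close to 1.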
Measurement error in an explanatory variable
When x1 is measured with error, the OLS estimator of β1 is most likely biased
In general, we don't know the sign of the bias
Two special cases:
1) If the measurement error is CEV measurement error, Cov(x1*, e1) = 0:
the OLS estimator of β1 is biased toward 0 ('attenuation bias')
2) Or if Cov(x1, e1) = 0: no bias in the estimator of β1, just larger standard errors
• NOW ‐ Instrumental Variables – these provide a possible
solution to measurement error in an explanatory variable
=> If a second measurement of the mismeasured variable is
available, this can be used as an IV for the mismeasured variable
Measurement error in an explanatory variable
IV Solution to Errors‐in‐Variables problem
• Model of interest
y = β0 +β1x1* +β2x2 +u (1)
• Observe y, x2
• Do not observe x1*, but we do observe a measurement of x1*:
x1 = x1*+ e
=> We could estimate
y = β0 +β1x1 +β2x2 +(u‐β1e) (2)
• Our problem is that x1 is correlated with the composite error
term in (2) via e
• So OLS on (2) does not give an unbiased or consistent estimate of β1
• IV Solution: find and use an IV for x1
IV Solution to Errors‐in‐Variables problem
• IV Solution: find and use an IV for x1 (see Sect 15.4)
• Natural candidate: a second measurement of x1*
• Examples
– wages – worker’s report is the first measure, an
employer report could be the second
– In married couples, could use each spouse's
report of income or savings
– could think of motheduc and fatheduc as IVs to
deal with measurement error in education
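To see why a second measurement can serve as an IV, here is a minimal simulation sketch. It assumes a simplified version of model (1) without x2, a hypothetical second measurement z = x1* + a whose error a is independent of e and u, and made-up parameter values. The names `cov`, `err1` and `err2` are illustrative only.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

random.seed(1)
n = 50_000
beta0, beta1 = 2.0, 1.0   # assumed "true" parameters

x_star = [random.gauss(0, 1) for _ in range(n)]   # true, unobserved x1*
err1 = [random.gauss(0, 1) for _ in range(n)]     # error e in the first measurement
err2 = [random.gauss(0, 1) for _ in range(n)]     # error a in the second measurement
u = [random.gauss(0, 1) for _ in range(n)]

x1 = [xs + e for xs, e in zip(x_star, err1)]      # first measurement (the regressor)
z = [xs + a for xs, a in zip(x_star, err2)]       # second measurement (the instrument)
y = [beta0 + beta1 * xs + ui for xs, ui in zip(x_star, u)]

b_ols = cov(x1, y) / cov(x1, x1)   # attenuated toward zero
b_iv = cov(z, y) / cov(z, x1)      # uses only the variation shared by both measures

print(f"OLS slope: {b_ols:.3f}")
print(f"IV slope : {b_iv:.3f}")
```

The instrument is relevant because both measurements contain x1*, and it is valid here because its error is unrelated to both u and e. If the two measurement errors were correlated (for example, both spouses misreport in the same direction), the IV estimate would no longer be consistent.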
IV for Education
Leigh and Ryan (2008, EER): Estimating returns to education
using different natural experiment techniques
• Model: ln(incomei) = α + β1Edui + ui
• Estimate the rate of return to schooling in Australia using two
different instruments for schooling:
– Month of birth
– Changes in compulsory schooling laws
• Also compare to a natural experiment approach using data on
twins
Calculate bias in OLS estimate:
Implied ability bias = $1 - \hat{\beta}_{AbilityAdjusted} / \hat{\beta}_{OLS}$
• The implied ability bias is 9% when instrumenting with changes in school‐leaving
laws [1 − 0.118/0.130]
• 39% when instrumenting with month of birth [1 − 0.079/0.130]
• 10–28% when estimating a fixed effects model with identical twins
• Return to schooling in Australia is estimated at 10%, similar to Britain,
Canada and the US

Method | Estimate of β1 | Implied ability bias
OLS | 0.130*** | ‐
IV ‐ month of birth | 0.079** | 39%
IV ‐ changes in compulsory schooling laws | 0.118*** | 9%
Natural experiment ‐ twins | 0.054** | 10% (OLS), 28% (IV)^

^Result from Miller et al 2006, compared to different OLS estimate.
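The two IV-based bias figures can be reproduced from the reported coefficients. The sketch below uses a hypothetical helper `implied_ability_bias` (not from the paper) that computes one minus the ratio of the ability-adjusted estimate to the OLS estimate. The twins figures are left out because they are compared against a different OLS estimate that is not given on the slide.

```python
def implied_ability_bias(b_adjusted, b_ols):
    """Share of the OLS estimate attributed to ability bias: 1 - b_adjusted / b_ols."""
    return 1 - b_adjusted / b_ols

b_ols = 0.130          # OLS estimate from the slide
b_iv_laws = 0.118      # IV: changes in compulsory schooling laws
b_iv_month = 0.079     # IV: month of birth

print(f"laws : {100 * implied_ability_bias(b_iv_laws, b_ols):.0f}%")   # 9%
print(f"month: {100 * implied_ability_bias(b_iv_month, b_ols):.0f}%")  # 39%
```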
IV Applications
– and Standard errors in IV
Returns to Education
• Example: Father's education as an IV for education
OLS: the return to education is likely overestimated
IV: the estimated return to education decreases (which is to be expected); it is also much less precisely estimated
Is the education of the father a good IV?
1) It doesn't appear as a regressor
2) It is significantly correlated with educ
3) It is uncorrelated with the error (?)
Reminder – Std errors in OLS vs IV
• Consider both IV and OLS estimation when x and u are uncorrelated
=> OLS will be unbiased
OLS: $\widehat{\mathrm{Var}}(\hat{\beta}_1) = \dfrac{\hat{\sigma}^2}{SST_x}$
IV: $\widehat{\mathrm{Var}}(\hat{\beta}_1) = \dfrac{\hat{\sigma}^2}{SST_x \cdot R^2_{x,z}}$
• Since $R^2_{x,z} < 1$, the IV variance is larger than the OLS variance (when
OLS is valid)
• If $R^2_{x,z}$ is small (i.e. we have a 'weak' instrument, so x and z are
only slightly correlated), this translates into very large IV standard
errors
• The cost of IV estimation when x and u are uncorrelated: the variance
of the IV estimator is larger, sometimes much larger, than the variance
of the OLS estimator
*Also see extra page of notes added to Canvas for more details
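To get a feel for the size of this cost: if the same estimate of σ² is used in both variance formulas (an assumption made here for illustration), the IV standard error equals the OLS standard error divided by the square root of the R-squared from regressing x on z. The function name `se_inflation` is hypothetical.

```python
import math

def se_inflation(r2_xz):
    """Ratio se(IV) / se(OLS) implied by the variance formulas, holding sigma^2 fixed."""
    if not 0 < r2_xz <= 1:
        raise ValueError("R-squared of x on z must lie in (0, 1]")
    return 1 / math.sqrt(r2_xz)

# Illustrative values of the R-squared between x and z (not taken from any data set)
for r2 in (1.0, 0.25, 0.04, 0.01):
    print(f"R2_xz = {r2:<5} -> IV s.e. is about {se_inflation(r2):.1f} times the OLS s.e.")
```

An instrument that explains only 1% of the variation in x multiplies the standard error by roughly ten, which is why weak instruments produce such imprecise estimates.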
Two Stage Least Squares
(2SLS)
Reference: Wooldridge 15‐3
IV and 2SLS
• 2SLS = Two stage least squares
• So far:
– 1 endogenous variable
– 1 instrument
–What if we have more than one IV available?
Reminder – IV Estimation
• Recall that the OLS estimator is
$\hat{\beta}_1^{OLS} = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
• And the IV estimator is
$\hat{\beta}_1^{IV} = \dfrac{\sum_i (z_i - \bar{z})(y_i - \bar{y})}{\sum_i (z_i - \bar{z})(x_i - \bar{x})}$
Note: Replace z with x in the IV estimator and
you are back to the OLS estimator
1) Different z have different sample covariances with x and y, so they give different point estimates
2) If each z is valid, they are all consistent
3) z isn't included in the model (like a proxy variable would be); it is just being used to identify the causal effect of x
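The simple IV estimator, and the fact that OLS is the special case where x instruments for itself, can be written directly in code. This is a sketch with made-up numbers: y is constructed to be exactly 1 + 2x, and z is a hypothetical instrument chosen only so that its sample covariance with x is nonzero. The function names are illustrative.

```python
def iv_slope(z, x, y):
    """Simple-regression IV estimator of the slope on x, using z as the instrument."""
    n = len(x)
    z_bar, x_bar, y_bar = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    den = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    return num / den

def ols_slope(x, y):
    """OLS is the special case in which x is used as its own instrument."""
    return iv_slope(x, x, y)

# Tiny made-up data set with y = 1 + 2x exactly (no error term)
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
z = [0.0, 1.0, 0.0, 1.0]

print(ols_slope(x, y))    # 2.0
print(iv_slope(z, x, y))  # 2.0
```

Because there is no error term in this toy data, both estimators recover the slope exactly. With real data the two point estimates differ, and different instruments give different estimates.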
Two Stage Least Squares (2SLS)
Consider our original structural model
y1 = β0 +β1y2 +β2z1 +u1
and we have the first stage equation:
y2 = π0 +π1z1 +π2z2 +π3z3 +v2
both z2 and z3 are valid instruments
‐ they do not appear in the structural model
‐ they are uncorrelated with the structural error
term u1
IV and 2SLS
• The best IV for y2 is then a linear combination of all of the exogenous
variables:
y2∗ = π0 +π1z1 +π2z2 +π3z3
⇒We can estimate y2∗ by regressing y2 on z1, z2, z3
‐ call this the ‘first stage’
‐ the first stage = a reduced form regression
‐ endogenous variable y2 is predicted using ONLY exogenous information
⇒ Require that π2≠ 0, or π3≠0 for identification
• Identification is equivalent to relevance in the 1 IV case
• Here we require an F‐test for identification:
H0: π2 = π3 = 0, H1: H0 is not true
• Rule of thumb: the instruments are not weak if the F‐stat > 10
• z2 and z3 are the additional exogenous variables: they appear in the first stage equation but not in the structural model
IV and 2SLS
Why is it called Two Stage LS?
‐ Can think of it as conducted in 2 stages:
1. Form the prediction for y2 based on the OLS
regression of the first stage:
ŷ2 = π̂0 + π̂1z1 + π̂2z2 + π̂3z3
2. Substitute ŷ2 for y2 into the structural model
Run y1 = β0 + β1ŷ2 + β2z1 + error by OLS
=> get the IV coefficient estimate β̂1
IV and 2SLS
• The IV estimator is also known as Two Stage Least Squares (2SLS)
• If there is one endogenous variable and one instrument then 2SLS = IV
• In general,
– The IV estimates of β0, β1, and β2 for the structural model are
identical to OLS estimates from the regression of y1 on ŷ2 and z1.
– While the coefficients are the same, the standard errors from doing
2SLS in two stages or steps are incorrect
⇒ so let the software do it for you
• software packages also have test commands for IV/2SLS estimator that
calculate the correct test statistic for tests of joint hypotheses
– STATA: ivregress 2sls y1 z1 (y2 = z2 z3)
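For intuition only, the two stages can be carried out by hand on simulated data. The sketch below is not how software computes 2SLS internally. It assumes made-up parameter values (β1 = 1, π2 = π3 = 0.5), normal errors, and endogeneity created by letting v2 depend on u1. The helpers `ols`, `fitted` and `ssr` are illustrative names.

```python
import random

def ols(X, y):
    """OLS coefficients for y on the columns of X (list of rows), via the normal equations."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

def fitted(X, b):
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in X]

def ssr(y, y_hat):
    return sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

random.seed(2)
n = 10_000
draw = lambda: [random.gauss(0, 1) for _ in range(n)]
z1, z2, z3, u1, w = draw(), draw(), draw(), draw(), draw()
v2 = [0.8 * ui + 0.6 * wi for ui, wi in zip(u1, w)]   # correlated with u1 => y2 endogenous
y2 = [1 + 0.5 * a + 0.5 * b + 0.5 * c + v for a, b, c, v in zip(z1, z2, z3, v2)]
y1 = [1 + 1.0 * e + 0.5 * a + ui for e, a, ui in zip(y2, z1, u1)]

# Stage 1: regress y2 on all exogenous variables and form the prediction
X_first = [[1, a, b, c] for a, b, c in zip(z1, z2, z3)]
y2_hat = fitted(X_first, ols(X_first, y2))

# F-test of H0: pi2 = pi3 = 0 (restricted model drops z2 and z3)
X_restr = [[1, a] for a in z1]
ssr_ur = ssr(y2, y2_hat)
ssr_r = ssr(y2, fitted(X_restr, ols(X_restr, y2)))
F = ((ssr_r - ssr_ur) / 2) / (ssr_ur / (n - 4))

# Stage 2: regress y1 on the prediction of y2 and on z1
X_second = [[1, h, a] for h, a in zip(y2_hat, z1)]
b_2sls = ols(X_second, y1)

# For comparison: OLS on the structural equation, and two residual variances
X_struct = [[1, e, a] for e, a in zip(y2, z1)]
b_ols = ols(X_struct, y1)
sigma2_correct = ssr(y1, fitted(X_struct, b_2sls)) / (n - 3)  # residuals use actual y2
sigma2_naive = ssr(y1, fitted(X_second, b_2sls)) / (n - 3)    # residuals use predicted y2

print(f"first-stage F = {F:.1f}")
print(f"beta1: OLS = {b_ols[1]:.3f}, 2SLS = {b_2sls[1]:.3f}")
print(f"sigma^2: correct = {sigma2_correct:.3f}, naive second stage = {sigma2_naive:.3f}")
```

The last two numbers show why the second-stage OLS standard errors are wrong: OLS on the second stage computes residuals with ŷ2 in place of y2, whereas the correct 2SLS residuals use the actual y2 together with the 2SLS coefficients.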
IV and 2SLS
• Why/How does Two Stage Least Squares work?
– All variables in the second stage regression are exogenous
because y2 was replaced by a prediction based on only
exogenous information
– By using the prediction based on exogenous information,
y2 is purged of its endogenous part (the part that is related
to the error term)
• 2SLS can also be used if there is more than one
endogenous variable
=> must have at least as many instruments (excluded
exogenous variables) as there are endogenous variables in
the structural equation
=> required for identification
IV and 2SLS ‐ Example
Return to Education for Married Women, Mroz (1987)
log(wage) = β0 + β1educ + β2exper + β3exper² + u
• Use 2 instruments for educ: motheduc and fatheduc
• Run the first stage regression:
educ = π0 + π1exper + π2exper² + π3motheduc + π4fatheduc + v2
• Test whether the instruments are jointly significant in explaining
educ (H0: π3 = π4 = 0): F‐stat of 55.40 (p‐value = 0.000)
⇒ Education is significantly partially correlated with the
education of the parents
IV and 2SLS ‐ Example
Return to Education for Married Women, Mroz (1987)
log(wage) = β0 + β1educ + β2exper + β3exper² + u
• Estimate structural equation using 2SLS:
The return to education is much lower
but also much more imprecise than
with OLS
(OLS estimate = 0.109 with s.e. 0.014)
IV – A few more comments
• Under assumptions completely analogous to OLS, but
conditioning on zi rather than on xi, 2SLS/IV is consistent
and asymptotically normal
• 2SLS/IV is typically much less precise because there is more
multicollinearity and less explanatory variation in the
second stage regression
o Std error in IV > Std error in OLS unless $R^2_{x,z} = 1$
• After 2SLS, you can test, and correct, for heteroskedasticity
as we did for OLS. You can use robust standard errors.
• 2SLS works with time series data, as well as pooled cross
sections and panel data
IV – Recap
• Consider the IV method for situations where ZCM assumption
does not hold for the regression model
• IV must satisfy two properties
(i) VALID: it must be exogenous ‐ unrelated to error term in the
structural model, and
(ii) RELEVANT: it must be partially correlated with the endogenous
explanatory variable
• 2SLS: can have more IVs than endogenous explanatory variables
• When IVs are poor (correlated with the error term or only weakly
correlated with the endogenous explanatory variable), IV/2SLS
may have a larger asymptotic bias than OLS
• We can test for endogeneity of an explanatory variable when
we have an IV (see Wooldridge 15‐5a).
Next topics
• Pooled Cross Sections and Panel Data
• Models for these kinds of data
• Policy Analysis
• Panel Data Models
– Fixed effects
– First differences
– Random effects
– Cluster‐robust standard errors
– Other data structures: Panel of matched pairs
References:
• Wooldridge Chp 13 & Chp 14
• Additional ref: Cameron ‐ Chp 21
– The Cameron chapter is available via our Reading List – under ‘Required Reading’