ECMT2150
Date: 2023-05-15
ECMT2150 – Lecture 10
Topics Today
Week 10
Review Endogeneity
Instrumental Variable Estimation
References/required reading: Chapter 15 (Wooldridge) and a few pages from
Cameron & Trivedi (Sections 4.8.1 and 4.8.2)
Review Endogeneity
Endogeneity is a major challenge in social sciences incl.
economics:
– With observational data:
• Omitted variables: In many cases important characteristics cannot
be observed AND these are often correlated with observed
explanatory information.
• Simultaneity: two or more variables are simultaneously determined
(in market equilibrium)
• Measurement error: variables are measured with error
• Endogeneity = violation of Assumption MLR.4/ZCM
– There is a correlation between the error u and the regressor(s)
– Changes in x are associated not only with changes in y but also
with changes in u
What can we do about endogeneity?
Given omitted variable bias (or unobserved heterogeneity),
x and u will be correlated so OLS is biased & inconsistent.
Options:
• Ignore it: Live with the biased estimates (perhaps sign the
bias)
• Proxy variables:
– Solve the omitted variable problem by replacing the unobserved
variable with a proxy
– Can’t always find a good proxy
• Instrumental variables
• Panel methods
Instrumental Variable (IV) Estimation
Instrumental Variables
Faced with omitted variable bias, IV is one of our options.
The IV approach
• leaves the unobserved variable in the error term
• estimates the model parameters consistently using some
additional information =
a new variable = the instrument = z
• the instrument z must satisfy 2 key assumptions:
– changes in z are associated with changes in x
– z is exogenous to u
+ z does not belong in the model
• the instrument is used to generate only exogenous
variation in our explanatory variable of interest (x)
– We use the instrument to isolate or extract the exogenous
component of x
Instrumental Variables
[Figure: diagrams for y = β0 + β1x + u in three cases: OLS with x exogenous; OLS with x endogenous; IV with x endogenous and z an instrumental variable]
Instrumental Variables
More generally:
• Consider the model
y = β0 + β1x+ u
But we believe MLR.4 does not hold, that is,
Cov(x,u) ≠ 0 (which implies E(u|x) ≠ 0)
To get consistent estimators of β0, β1 we require additional
information in the form of a new variable, z, that satisfies 2
conditions
Definition of an instrumental variable (z)
In a regression with an endogenous variable (x), z does not belong in the regression (otherwise it would be in the model already) and:
1) z is correlated, ideally highly correlated, with the endogenous variable x (instrument relevance): Cov(z,x) ≠ 0
2) z is uncorrelated with u (instrument validity, i.e. z is exogenous): Cov(z,u) = 0
Instrumental Variables
Example: Wages and the return on education, with unobserved ability
• We wish to estimate the return to education:
Consider the population model
log(wage) = β0 + β1educ + β2ability + e (1)
but suppose we do not observe ability, and we use the simple regression
model:
log(wage) = β0 + β1educ + u (2)
⇒ ability is now absorbed into the error term for model (2)
⇒ OLS applied to (2) will give us a biased and inconsistent estimator for β1 if
educ and ability are correlated (and β2 ≠ 0).
⇒ We can still use model (2) IF we can find an instrumental variable for educ
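A small simulation can make the omitted-ability argument concrete. This is a minimal sketch under a hypothetical data-generating process: every coefficient, variance and the sample size below are illustrative assumptions, not estimates from any dataset, and `ols_slope` is a helper written for this example. Ability raises both educ and log(wage), so OLS on model (2) converges to β1 + β2·Cov(educ, ability)/Var(educ) rather than to β1.

```python
import random


def ols_slope(x, y):
    """Simple-regression OLS slope: sum of cross-deviations over sum of squared x deviations."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical population model (1): log(wage) = 1 + 0.08*educ + 0.10*ability + e
random.seed(1)
n = 100_000
beta1, beta2 = 0.08, 0.10
ability = [random.gauss(0.0, 1.0) for _ in range(n)]
# educ depends on ability, so Cov(educ, ability) = 2 and Var(educ) = 4 + 4 = 8
educ = [12.0 + 2.0 * a + random.gauss(0.0, 2.0) for a in ability]
log_wage = [1.0 + beta1 * s + beta2 * a + random.gauss(0.0, 0.5)
            for s, a in zip(educ, ability)]

# Model (2): ability is omitted and sits in the error term u
b1_ols = ols_slope(educ, log_wage)
# Omitted-variable formula: plim = beta1 + beta2 * Cov(educ, ability) / Var(educ)
plim_ols = beta1 + beta2 * 2.0 / 8.0
print(f"OLS slope on educ: {b1_ols:.4f}, predicted plim: {plim_ols:.4f}, true beta1: {beta1}")
```

With educ positively correlated with ability and β2 > 0, the OLS slope settles near 0.105 instead of 0.08, which is the upward "ability bias" the instruments below are meant to remove.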
• An IV for education must be
(1) RELEVANT: correlated with education
(2) VALID: uncorrelated with ability
• Potential instruments for education?
⇒ Last digit of Medicare number?
• it is random, so satisfies condition (2), but not condition (1).
⇒ IQ score?
• It’s a good proxy variable for ability, but would be a poor instrument for education
• a good proxy means that it is highly correlated with ability, which violates the
second condition for an IV for educ (as ability is part of the error term).
⇒ family background variables
• often used by labour economists as IVs for education
• Examples:
- mother’s or father’s education - commonly found to be positively related to
children’s education ... though it may also be correlated with child’s ability.
- number of siblings while growing up
⇒ state compulsory schooling laws
IV Example 1
Returns to Education
• Example: Father's education as an IV for education
OLS:
IV:
Is the education of the father a good IV?
1) It doesn't appear as a regressor
2) It is significantly correlated with educ
3) It is uncorrelated with the error (?)
The estimated return to education decreases (which is to be expected), and it is also much less precisely estimated: OLS likely overestimates the return to education.
So how does IV work?
OLS – deriving the OLS estimator using exogeneity
Before looking at IV, let's review the OLS estimator when x is exogenous.
The OLS estimator = sample covariance / sample variance:
$$\hat\beta_1 = \frac{\widehat{\mathrm{Cov}}(x,y)}{\widehat{\mathrm{Var}}(x)} = \frac{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2} = \frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2}$$
Substituting y = β0 + β1x + u gives Cov(x,y) = β1Var(x) + Cov(x,u), so
$$\frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)} = \beta_1 + \frac{\mathrm{Cov}(x,u)}{\mathrm{Var}(x)} = \beta_1 \quad \text{if } \mathrm{Cov}(x,u) = 0 \text{ (exogeneity)}$$
i.e. under exogeneity the ratio of population moments equals the population parameter β1.
The OLS estimator is then consistent, as long as the data are such that sample variances and covariances converge to their population counterparts as n → ∞, i.e. if a law of large numbers (LLN) holds.
⇒ Bottom line: OLS will be consistent if, and only if, exogeneity holds.
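The substitution step can be checked numerically. In this sketch the values of x and u are arbitrary illustrative numbers (assumptions), and `sample_cov` is a helper written for the example. y is built from y = β0 + β1x + u; the OLS slope (sample covariance over sample variance) then equals β1 plus the sample covariance of x and u divided by the sample variance of x, exactly, in any sample. Exogeneity is what makes that second term vanish as n grows.

```python
import math


def sample_cov(a, b):
    """Sample covariance with the (n - 1) divisor."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b)) / (n - 1)


beta0, beta1 = 1.0, 2.0
x = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.5, -1.0, 0.3, 0.8, -0.6]  # illustrative error draws (assumption)
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

# OLS slope = sample covariance / sample variance; the (n - 1) factors cancel
b1_hat = sample_cov(x, y) / sample_cov(x, x)
# The same slope written as beta1 + sample Cov(x, u) / sample Var(x)
decomposed = beta1 + sample_cov(x, u) / sample_cov(x, x)
print(b1_hat, decomposed, math.isclose(b1_hat, decomposed))
```

Here the sample covariance of x and u is −0.1 and the sample variance of x is 2.5, so both expressions give 1.96 rather than 2: in a finite sample the error term still moves the estimate.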
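• To see how IV works, assume we have an instrumental variable z that meets the previous conditions. Taking the covariance of z with both sides of y = β0 + β1x + u gives
$$\mathrm{Cov}(z,y) = \beta_1\,\mathrm{Cov}(z,x) + \mathrm{Cov}(z,u)$$
• z is exogenous, i.e. uncorrelated with u: Cov(z,u) = 0, even though Cov(x,u) ≠ 0
• z is correlated with the explanatory variable x: relevance implies Cov(z,x) ≠ 0, so we can solve for β1:
$$\beta_1 = \frac{\mathrm{Cov}(z,y)}{\mathrm{Cov}(z,x)}$$
IV estimator (the sample analogue):
$$\hat\beta_1^{IV} = \frac{\sum_{i=1}^n (z_i-\bar z)(y_i-\bar y)}{\sum_{i=1}^n (z_i-\bar z)(x_i-\bar x)}, \qquad \hat\beta_0^{IV} = \bar y - \hat\beta_1^{IV}\,\bar x$$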
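The IV formula can be compared with OLS in a simulated sample where x is endogenous. This is a sketch under an assumed data-generating process: an unobserved factor `a` enters both x and u (so Cov(x,u) = 0.5), while z is independent of `a` and of the noise, so it is a valid instrument by construction. The helper names `iv_estimates` and `ols_estimates` are hypothetical.

```python
import random


def iv_estimates(z, x, y):
    """Simple IV: slope = sum (z-zbar)(y-ybar) / sum (z-zbar)(x-xbar); intercept = ybar - slope*xbar."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    b1 = szy / szx
    return ybar - b1 * xbar, b1


def ols_estimates(x, y):
    """OLS is the special case of simple IV that uses x as its own instrument."""
    return iv_estimates(x, x, y)


# Hypothetical DGP (assumption): Cov(x, u) = 0.5, Var(x) = 0.64 + 1 + 1 = 2.64, Cov(z, u) = 0
random.seed(2)
n = 50_000
beta0, beta1 = 1.0, 2.0
z = [random.gauss(0.0, 1.0) for _ in range(n)]
a = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.8 * zi + ai + random.gauss(0.0, 1.0) for zi, ai in zip(z, a)]
u = [0.5 * ai + random.gauss(0.0, 1.0) for ai in a]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

b0_ols, b1_ols = ols_estimates(x, y)
b0_iv, b1_iv = iv_estimates(z, x, y)
# Expected: OLS slope near 2 + 0.5 / 2.64 (about 2.19); IV slope near the true value 2
print(f"OLS slope: {b1_ols:.3f}   IV slope: {b1_iv:.3f}   true: {beta1}")
```

Writing OLS as "IV with x as its own instrument" is a design choice that makes the comparison explicit: the only difference between the two estimators is which variable is used to weight the deviations.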
Instrumental Variables
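Properties of the IV Estimator
• It can be shown that the IV estimator is consistent and asymptotically normal.
• Again, similar to the OLS case, the IV estimator can be decomposed as follows:
$$\hat\beta_1^{IV} = \beta_1 + \frac{\sum_{i=1}^n (z_i-\bar z)\,u_i}{\sum_{i=1}^n (z_i-\bar z)(x_i-\bar x)}, \qquad \operatorname{plim}\hat\beta_1^{IV} = \beta_1 + \frac{\mathrm{Cov}(z,u)}{\mathrm{Cov}(z,x)}$$
• With Cov(z,u) = 0, the numerator of the second term goes to zero as n → ∞, so plim β̂1 = β1.
• IV estimators can have substantial bias in small samples … therefore we want large samples (see p. 465).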
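The decomposition holds exactly in any sample, which the sketch below checks with a tiny made-up dataset (the values of z, x, u, β0 and β1 are illustrative assumptions). Consistency then rests entirely on the second term, a sample analogue of Cov(z,u)/Cov(z,x).

```python
import math

# Illustrative numbers (assumptions): a tiny sample with known beta1 and known errors u
beta0, beta1 = 0.5, 1.5
z = [0.0, 1.0, 1.0, 2.0, 3.0, 3.0]
x = [1.0, 1.5, 2.5, 3.0, 4.5, 5.0]
u = [0.2, -0.4, 0.1, 0.3, -0.5, 0.1]
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

n = len(z)
zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
b1_iv = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szx

# Decomposition: b1_iv = beta1 + sum (z_i - zbar) u_i / sum (z_i - zbar)(x_i - xbar)
sampling_error = sum((zi - zbar) * ui for zi, ui in zip(z, u)) / szx
print(b1_iv, beta1 + sampling_error, math.isclose(b1_iv, beta1 + sampling_error))
```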
Properties of IV Estimators
Our 2 IV Conditions
• We cannot test for condition (2) (Instrument Validity)
– so we need to use economic theory and intuition to decide if assuming
Cov(z,u) = 0 is reasonable.
• We can test condition (1), i.e. whether Cov(z,x) ≠ 0 holds. To do this:
– Run the regression (sometimes called the "first-stage" regression)
x = γ0 + γ1z + v
and then test
H0: γ1 = 0 against H1: γ1 ≠ 0
– For instrument relevance, we want γ1 to be large and highly significant; a first-stage F-statistic > 10 is a usual rule of thumb
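A minimal sketch of the first-stage relevance check, assuming homoskedastic errors and simulated data: `first_stage` is a hypothetical helper, and the "strong" and "weak" instrument coefficients (0.8 and 0.05) are illustrative assumptions. With a single instrument the F-statistic for H0: γ1 = 0 is simply the square of the usual t-statistic, so the F > 10 rule corresponds to |t| above roughly 3.16.

```python
import random


def first_stage(z, x):
    """OLS of x on a constant and z. Returns (gamma1_hat, t-stat, F-stat, R-squared)."""
    n = len(z)
    zbar, xbar = sum(z) / n, sum(x) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    g1 = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x)) / szz
    g0 = xbar - g1 * zbar
    ssr = sum((xi - g0 - g1 * zi) ** 2 for zi, xi in zip(z, x))
    sst = sum((xi - xbar) ** 2 for xi in x)
    sigma2 = ssr / (n - 2)            # homoskedastic error variance estimate
    t = g1 / (sigma2 / szz) ** 0.5    # t-stat for H0: gamma1 = 0
    r2 = 1.0 - ssr / sst
    f = r2 / ((1.0 - r2) / (n - 2))   # F-stat for the single restriction gamma1 = 0
    return g1, t, f, r2


random.seed(5)
n = 1_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
x_strong = [0.8 * zi + random.gauss(0.0, 1.5) for zi in z]   # hypothetical relevant instrument
x_weak = [0.05 * zi + random.gauss(0.0, 1.5) for zi in z]    # hypothetical weak instrument

g1_s, t_s, f_s, r2_s = first_stage(z, x_strong)
g1_w, t_w, f_w, r2_w = first_stage(z, x_weak)
print(f"strong: gamma1={g1_s:.3f}, t={t_s:.1f}, F={f_s:.1f}")
print(f"weak:   gamma1={g1_w:.3f}, t={t_w:.1f}, F={f_w:.1f}")
```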
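What happens if the IV is invalid or weak?
IV may be much more inconsistent than OLS if the IV is not completely exogenous and only weakly related to x. The probability limits are
$$\operatorname{plim}\hat\beta_1^{IV} = \beta_1 + \frac{\mathrm{Corr}(z,u)}{\mathrm{Corr}(z,x)}\cdot\frac{\sigma_u}{\sigma_x}, \qquad \operatorname{plim}\hat\beta_1^{OLS} = \beta_1 + \mathrm{Corr}(x,u)\cdot\frac{\sigma_u}{\sigma_x}$$
so IV is worse than OLS (larger asymptotic bias) if |Corr(z,u)/Corr(z,x)| > |Corr(x,u)|.
• There is no problem if the instrumental variable is really exogenous, i.e. Corr(z,u) = 0.
• If not, the inconsistency is larger the weaker the correlation with x, i.e. the smaller Corr(z,x) is.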
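The comparison above is pure arithmetic once the correlations are fixed. In this sketch the correlations are hypothetical values chosen for illustration (a slightly invalid instrument with Corr(z,u) = 0.05 and an endogenous x with Corr(x,u) = 0.2), and σu/σx is set to 1; `asymptotic_bias` is a helper written for the example.

```python
def asymptotic_bias(corr_zu, corr_zx, corr_xu, sd_ratio=1.0):
    """Return (IV bias, OLS bias) from the plim formulas; sd_ratio = sigma_u / sigma_x."""
    iv_bias = corr_zu / corr_zx * sd_ratio
    ols_bias = corr_xu * sd_ratio
    return iv_bias, ols_bias


# Hypothetical correlations (assumptions): same slight invalidity, strong vs weak instrument
for corr_zx in (0.5, 0.1):
    iv_b, ols_b = asymptotic_bias(corr_zu=0.05, corr_zx=corr_zx, corr_xu=0.2)
    verdict = "IV worse" if abs(iv_b) > abs(ols_b) else "IV better"
    print(f"Corr(z,x)={corr_zx}: IV bias={iv_b:.2f}, OLS bias={ols_b:.2f} -> {verdict}")
```

The same small violation of exogeneity (0.05) gives an IV bias of 0.1 with a strong instrument but 0.5 with a weak one, more than twice the OLS bias of 0.2.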
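Inference with the IV Estimator
• It is straightforward to estimate the variance of the IV estimator, and thus its standard error
• Under homoskedasticity [E(u²|z) = σ² = Var(u)], we have
$$\widehat{\mathrm{Var}}(\hat\beta_1^{IV}) = \frac{\hat\sigma^2}{\mathrm{SST}_x\,R^2_{x,z}}, \qquad \mathrm{se}(\hat\beta_1^{IV}) = \sqrt{\frac{\hat\sigma^2}{\mathrm{SST}_x\,R^2_{x,z}}}$$
where:
– SST_x = Σ(xi − x̄)² is the total sum of squares of xi
– R²_{x,z} is the R² from regressing xi on zi
– σ̂² = Σûi²/(n − 2) is computed from the IV residuals ûi = yi − β̂0 − β̂1xi
• We can use this standard error for constructing t-tests or confidence intervals, under homoskedasticity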
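A sketch of the homoskedastic IV standard error on simulated data. The data-generating process, seed and sample size are illustrative assumptions, and `iv_with_se` is a hypothetical helper. Note that the residuals use the observed x, not first-stage fitted values. Because R²_{x,z} from a simple regression equals (Σ(zi − z̄)(xi − x̄))²/(SST_x · SST_z), the formula can also be written as σ̂²·SST_z/(Σ(zi − z̄)(xi − x̄))², which the code computes as a cross-check.

```python
import math
import random


def iv_with_se(z, x, y):
    """Simple IV intercept, slope and homoskedastic SE = sqrt(sigma2_hat / (SST_x * R2_xz))."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    szx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    szy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    szz = sum((zi - zbar) ** 2 for zi in z)
    sst_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = szy / szx
    b0 = ybar - b1 * xbar
    # IV residuals use the observed x (not first-stage fitted values)
    sigma2 = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    r2_xz = szx ** 2 / (sst_x * szz)  # R-squared of the simple regression of x on z
    se_b1 = math.sqrt(sigma2 / (sst_x * r2_xz))
    # Algebraically identical form, used as a cross-check
    se_alt = math.sqrt(sigma2 * szz / szx ** 2)
    return b0, b1, se_b1, se_alt


# Hypothetical DGP (assumption): unobserved 'a' enters x and the error, z is a valid instrument
random.seed(4)
n = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(n)]
a = [random.gauss(0.0, 1.0) for _ in range(n)]
x = [0.8 * zi + ai + random.gauss(0.0, 1.0) for zi, ai in zip(z, a)]
y = [1.0 + 2.0 * xi + 0.5 * ai + random.gauss(0.0, 1.0) for xi, ai in zip(x, a)]

b0, b1, se_b1, se_alt = iv_with_se(z, x, y)
lo, hi = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1  # approximate 95% CI under homoskedasticity
print(f"IV slope {b1:.3f} (se {se_b1:.4f}), 95% CI [{lo:.3f}, {hi:.3f}]")
```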
OLS vs IV Estimation
• Consider both IV and OLS estimation when x and u are uncorrelated => OLS will be unbiased
$$\text{OLS: } \widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\hat\sigma^2}{\mathrm{SST}_x}, \qquad \text{IV: } \widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\hat\sigma^2}{\mathrm{SST}_x\,R^2_{x,z}}$$
• Since R²_{x,z} < 1, the IV variance is larger than the OLS variance (when OLS is valid)
⇒ If R²_{x,z} is small (i.e. we have a 'weak' instrument: x and z are only slightly correlated), this translates into very large IV standard errors
⇒ This is the cost of IV estimation when x and u are uncorrelated: the variance of the IV estimator is larger, sometimes much larger, than the variance of the OLS estimator
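Dividing the IV variance by the OLS variance cancels σ̂² and SST_x, leaving 1/R²_{x,z}. The sketch below tabulates that ratio for a few illustrative R² values (assumptions, not from any application); `iv_to_ols_variance_ratio` is a hypothetical helper.

```python
import math


def iv_to_ols_variance_ratio(r2_xz):
    """Var(IV) / Var(OLS) under homoskedasticity when both estimators are valid: 1 / R2_{x,z}."""
    return 1.0 / r2_xz


for r2 in (0.5, 0.25, 0.04, 0.01):
    ratio = iv_to_ols_variance_ratio(r2)
    print(f"R2_xz = {r2:.2f}: IV variance {ratio:.0f}x OLS, "
          f"IV standard error {math.sqrt(ratio):.1f}x OLS")
```

An instrument that explains only 1% of the variation in x makes the IV standard error about ten times the OLS standard error.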
IV & Multiple Regression
IV and Multiple Regression
• Straightforward to extend IV estimation to the multiple
regression model
• The model we are interested in estimating is known as
the structural model:
y1 = β0 +β1y2 +β2z1 +u1
where the
• y variables represent ‘endogenous’ variables
• the z variables are ‘exogenous’
⇒ the problem is that one or more of the explanatory
variables (y2) are endogenous.
IV and Multiple Regression - Example
• Wages and return on education
• Structural model:
log(wage) = β0 + β1educ + β2exper + u1
Where
• y1 = log(wage)
• y2 = educ
• z1 = exper
Here we are concerned with endogeneity of educ - allow
for educ to be correlated with u1
IV and Multiple Regression
Simple Case:
If we estimate the structural model
y1 = β0 + β1y2 + β2z1 + u1
by OLS, all the coefficient estimates, β̂j, will be biased and
inconsistent
• we need an IV for each endogenous variable
⇒Let z2 be the instrument for y2
⇒That is, Cov(z2,u1) = 0, and Cov(z2,y2) ≠ 0
IV and Multiple Regression
General Case
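Structural model:
$$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \dots + \beta_k z_{k-1} + u_1$$
And for our instrument zk we require:
• Cov(zk,u1) = 0, and
• Cov(zk,y2) ≠ 0
We confirm relevance (also known as identification) using the first-stage regression of the endogenous variable y2 on all the exogenous variables:
$$y_2 = \pi_0 + \pi_1 z_1 + \dots + \pi_{k-1} z_{k-1} + \pi_k z_k + v_2$$
πk must be large and significant.
Notice: all the exogenous variables are included in the first stage.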
IV and Multiple Regression
• Let’s think a bit more about the first stage equation
• Notice
– the endogenous variable, y2, is expressed in terms of
exogenous variables only
– the IV estimator requires that πk ≠ 0 ⇒ y2 and zk are
correlated after controlling for z1, …, zk-1
– this is known as an “identification condition” and it is
easy to test:
• F-test of H0: πk = 0, with F-stat > 10 as the rule of thumb
IV and Multiple Regression
• It is straightforward to add many more exogenous
explanatory variables to the structural model:
– these additional explanatory variables must then also
enter the first stage, and the identification condition must
still be satisfied
• Intuitively, the instrument zk extracts a component of the
variation in y2 that is uncorrelated with the other
exogenous variables and uncorrelated with the error
term u1.
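With one endogenous regressor and one excluded instrument, the IV estimator for the multiple regression solves (Z'X)β = Z'y, where X = [1, y2, z1] holds the structural regressors and Z = [1, z2, z1] replaces y2 by its instrument while z1 serves as its own instrument. This is a minimal pure-Python sketch under an assumed data-generating process (all coefficients are illustrative), and the helpers `solve`, `dot` and `iv_fit` are written for the example rather than taken from any library.

```python
import random


def solve(A, b):
    """Solve the linear system A w = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (M[r][k] - sum(M[r][c] * w[c] for c in range(r + 1, k))) / M[r][r]
    return w


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def iv_fit(instr_cols, regr_cols, y):
    """Just-identified IV: solve (Z'X) beta = Z'y, with Z and X given as lists of columns."""
    ZX = [[dot(zc, xc) for xc in regr_cols] for zc in instr_cols]
    Zy = [dot(zc, y) for zc in instr_cols]
    return solve(ZX, Zy)


# Hypothetical DGP (assumption): y1 = 1 + 1.5*y2 + 0.5*z1 + u1, with y2 endogenous
random.seed(3)
n = 50_000
ones = [1.0] * n
z1 = [random.gauss(0.0, 1.0) for _ in range(n)]   # exogenous regressor
z2 = [random.gauss(0.0, 1.0) for _ in range(n)]   # excluded instrument
a = [random.gauss(0.0, 1.0) for _ in range(n)]    # unobserved, enters y2 and u1
y2 = [0.5 + 0.4 * z1i + 0.7 * z2i + ai + random.gauss(0.0, 1.0)
      for z1i, z2i, ai in zip(z1, z2, a)]
u1 = [0.6 * ai + random.gauss(0.0, 1.0) for ai in a]
y1 = [1.0 + 1.5 * y2i + 0.5 * z1i + ui for y2i, z1i, ui in zip(y2, z1, u1)]

X = [ones, y2, z1]   # structural regressors
Z = [ones, z2, z1]   # z2 replaces y2; z1 acts as its own instrument
beta_iv = iv_fit(Z, X, y1)
beta_ols = iv_fit(X, X, y1)  # OLS = IV with X as its own instruments
print("IV :", [round(b, 3) for b in beta_iv])
print("OLS:", [round(b, 3) for b in beta_ols])
```

In practice a regression package would be used; writing the normal-equation solve out by hand only serves to show that nothing beyond "instrument each endogenous regressor, keep the exogenous ones" is involved.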
Testing for Endogeneity
Testing for Endogeneity
• Since OLS is preferred to IV when there is no endogeneity problem, we'd like to be able to test for endogeneity.
• If we do not have endogeneity, both OLS and IV are consistent.
⇒ The idea of the Hausman test is to see whether the OLS and IV estimates are different.
⇒ It is a good idea to always compute both OLS and IV estimates and check whether they are practically similar or different.
⇒ To test whether any differences are statistically significant, there is a simple testing procedure.
Testing for Endogeneity
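Structural model, where y2 is the variable suspected to be endogenous:
$$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \dots + \beta_k z_{k-1} + u_1$$
First-stage regression:
$$y_2 = \pi_0 + \pi_1 z_1 + \dots + \pi_{k-1} z_{k-1} + \pi_k z_k + v_2$$
Since every zj is uncorrelated with u1, variable y2 is exogenous if and only if v2 is uncorrelated with u1, i.e. if the parameter δ1 is zero in the regression:
$$u_1 = \delta_1 v_2 + e_1$$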
Testing for Endogeneity
To implement the test:
• we need the residuals from estimating the first stage by OLS, v̂2
• then use them as an additional regressor in the structural model:
Test equation:
$$y_1 = \beta_0 + \beta_1 y_2 + \beta_2 z_1 + \dots + \beta_k z_{k-1} + \delta_1 \hat v_2 + \text{error}$$
Hypotheses:
H0: δ1 = 0 …
H1: δ1 ≠ 0 …
Conclusions:
• We conclude that y2 is endogenous if the null hypothesis is rejected, i.e. if the parameter δ1 is significantly different from zero.
• If we do not reject the null, this is not proof of exogeneity.
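The two-step regression test can be sketched in pure Python under an assumed data-generating process in which y2 really is endogenous (the unobserved `a` enters both y2 and u1). The helpers `solve`, `dot` and `ols_fit` are written for this example, and the t-statistic uses homoskedastic standard errors. As a cross-check, in the just-identified case the coefficient on y2 in the test equation coincides with the IV estimate, because v̂2 is orthogonal to all the exogenous variables.

```python
import math
import random


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * k
    for r in range(k - 1, -1, -1):
        w[r] = (M[r][k] - sum(M[r][c] * w[c] for c in range(r + 1, k))) / M[r][r]
    return w


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def ols_fit(cols, y):
    """OLS of y on the given columns: coefficients, residuals, homoskedastic standard errors."""
    k, n = len(cols), len(y)
    XX = [[dot(ci, cj) for cj in cols] for ci in cols]
    beta = solve(XX, [dot(c, y) for c in cols])
    resid = [y[i] - sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    sigma2 = dot(resid, resid) / (n - k)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(sigma2 * solve(XX, unit)[j]))  # j-th diagonal of (X'X)^-1
    return beta, resid, se


# Hypothetical DGP (assumption): y2 is endogenous because 'a' enters both y2 and u1
random.seed(6)
n = 20_000
ones = [1.0] * n
z1 = [random.gauss(0.0, 1.0) for _ in range(n)]
z2 = [random.gauss(0.0, 1.0) for _ in range(n)]
a = [random.gauss(0.0, 1.0) for _ in range(n)]
y2 = [0.5 + 0.4 * z1i + 0.7 * z2i + ai + random.gauss(0.0, 1.0)
      for z1i, z2i, ai in zip(z1, z2, a)]
y1 = [1.0 + 1.5 * y2i + 0.5 * z1i + 0.6 * ai + random.gauss(0.0, 1.0)
      for y2i, z1i, ai in zip(y2, z1, a)]

# Step 1: first stage of y2 on all exogenous variables; keep the residuals v2_hat
_, v2_hat, _ = ols_fit([ones, z1, z2], y2)
# Step 2: add v2_hat to the structural equation and t-test its coefficient (delta1)
beta_cf, _, se_cf = ols_fit([ones, y2, z1, v2_hat], y1)
t_delta1 = beta_cf[3] / se_cf[3]

# Cross-check: the coefficient on y2 equals the just-identified IV estimate
ZX = [[dot(zc, xc) for xc in (ones, y2, z1)] for zc in (ones, z2, z1)]
beta_iv = solve(ZX, [dot(zc, y1) for zc in (ones, z2, z1)])
print(f"delta1_hat = {beta_cf[3]:.3f}, t = {t_delta1:.1f}")
print(f"beta1 (test equation) = {beta_cf[1]:.4f}, beta1 (IV) = {beta_iv[1]:.4f}")
```

A large |t| on δ̂1 rejects H0: δ1 = 0, so in this simulated example we would conclude that y2 is endogenous and prefer IV to OLS.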
IV Applications
IV for Education in a Wage Regression
IVs for estimating wage return to education that have been used in the
literature:
• The number of siblings
– 1) Not a wage determinant, 2) Correlated with education because of
resource constraints in the household, 3) Uncorrelated with innate ability
• College proximity when 18 years old
– 1) Not a wage determinant, 2) Correlated with education because
more education if lived near college, 3) Uncorrelated with error (?)
• Month of birth
– 1) Not a wage determinant, 2) Correlated with education because of
compulsory school attendance laws, 3) Uncorrelated with error
Angrist and Krueger (1991, QJE): Does Compulsory School Attendance
Affect Schooling and Earnings?
• Return to education:
log(Wagei) = β0 + β1Edui + ui
• Unobservable Abilityi in ui affects Wagei and is likely correlated with Edui (why?)
• Edui is endogenous and the OLS estimates are inconsistent.
• Use "Season of Birth" as an IV for Education.
Need to argue that:
– Season of birth is correlated with years of schooling, but has no direct effect on wages.
– Season of birth is not likely to be correlated with ability.
IV for Education
Is quarter of birth related to education? (US 1960 Census)
– Born in 1st quarter: 6.45 years old
– Born in 4th quarter: 6.07 years old
• Model: ln(Wagei) = β0 + β1Edui + ui
• IV for Edui is
– FirstQrti = 1 if individual i was born in the 1st quarter,
= 0 otherwise
We are assuming that FirstQrti is correlated with Edui but not with ui.
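With a binary instrument such as FirstQrti, the simple IV estimator reduces to the Wald estimator: the difference in mean log wages between the two groups divided by the difference in mean education. The sketch below checks this identity on a tiny made-up dataset; the numbers are invented for illustration and are not the Angrist-Krueger data, and `iv_slope` and `wald_estimator` are hypothetical helpers.

```python
import math


def iv_slope(z, x, y):
    """General simple-IV slope: sum (z-zbar)(y-ybar) / sum (z-zbar)(x-xbar)."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den


def wald_estimator(z, x, y):
    """Binary instrument: (mean y | z=1 - mean y | z=0) / (mean x | z=1 - mean x | z=0)."""
    def group_mean(v, g):
        vals = [vi for vi, zi in zip(v, z) if zi == g]
        return sum(vals) / len(vals)
    return (group_mean(y, 1) - group_mean(y, 0)) / (group_mean(x, 1) - group_mean(x, 0))


# Made-up toy data (NOT the Angrist-Krueger sample): first_qrt, years of education, log wage
first_qrt = [1, 1, 1, 0, 0, 0, 0, 0]
edu = [11, 12, 11, 12, 13, 12, 14, 12]
log_wage = [2.30, 2.40, 2.32, 2.41, 2.50, 2.38, 2.60, 2.42]

b_wald = wald_estimator(first_qrt, edu, log_wage)
b_iv = iv_slope(first_qrt, edu, log_wage)
print(b_wald, b_iv, math.isclose(b_wald, b_iv))
```

The identity explains why a tiny schooling difference between quarters of birth needs a very large sample: the denominator is small, so any noise in the wage difference is magnified.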
Estimation results, return to education (standard errors in parentheses):
Sample | OLS | IV
About 247,199 US men born between 1920 and 1929 | 0.0801 (0.0004) | 0.0769 (0.0150)
About 329,509 US men born between 1930 and 1939 | 0.0711 (0.0003) | 0.0891 (0.0161)
Leigh and Ryan (2008, EER): Estimating returns to education
using different natural experiment techniques
• Model:
• Estimate the rate of return to schooling in Australia using two
different instruments for schooling:
• Month of birth
• Changes in compulsory schooling laws
• Also compare to a natural experiment approach using data on
twins
Calculate the bias in the OLS estimate:
• The implied ability bias is 9% when instrumenting with changes in school-leaving laws [1 − 0.118/0.130],
• 39% when instrumenting with month of birth [1 − 0.079/0.130], and
• 10–28% when estimating a fixed effects model with identical twins.
• The return to schooling in Australia is estimated at 10%, similar to Britain, Canada and the US.
Statistic | OLS | IV: month of birth | IV: changes in compulsory schooling laws | Natural experiment: twins
Return to schooling | 0.130*** | 0.079** | 0.118*** | 0.054**
Implied ability bias | – | 39% | 9% | 10% (OLS); 28% (IV)^
^Result from Miller et al. (2006), compared to a different OLS estimate.
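The implied-bias figures for the two IV estimates follow from 1 − (IV estimate)/(OLS estimate), using the OLS estimate of 0.130. The sketch below reproduces that arithmetic; `implied_ability_bias` is a hypothetical helper. The twins figures are not recomputed because, as the footnote notes, they are compared against a different OLS estimate.

```python
def implied_ability_bias(b_ols, b_iv):
    """Share of the OLS estimate attributed to ability bias: 1 - b_iv / b_ols."""
    return 1.0 - b_iv / b_ols


b_ols = 0.130
for label, b_iv in [("changes in school-leaving laws", 0.118), ("month of birth", 0.079)]:
    print(f"{label}: {implied_ability_bias(b_ols, b_iv):.0%}")
```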
Smoking
Suppose we are interested in
where i indexes states.
Because Pricei is endogenous (Why?), we need to instrument.
Which of these variables would be suitable?
1. Each state’s cigarette excise tax
2. A measure of each state’s anti-smoking laws
3. Each state’s general sales tax
Smoking
1. Each state’s cigarette excise tax
– Cigarette excise taxes are surely correlated with cigarette
prices.
– However, they also reflect the level of anti-smoking
sentiment in the state (in the USA, MA has a tax of $1.51
per pack, NC has a tax of $0.05 per pack).
– Anti-smoking sentiment is an omitted determinant of
consumption, so excise taxes are correlated with ε.
– Excise taxes are not a valid instrument.
Smoking
2. A measure of state anti-smoking laws
– State anti-smoking laws might be correlated with price,
but only through their effect on cigarette demand in the
state.
– Such measures are an explanator of cigarette
consumption and arguably belong in the model; moreover,
they are also a proxy for state anti-smoking sentiment.
– Anti-smoking laws are a component of ε, and would
make an invalid instrument.
Smoking
3. Each state’s sales tax
– State sales taxes are correlated with cigarette prices.
Higher sales taxes raise the prices of all goods.
– There is no reason to expect sales taxes to have any
other effect on cigarette consumption, or to be
correlated with any other determinant of
consumption.
– State sales taxes are a reasonable instrument.
Potential Sources of Instruments
• Geography as an instrument
distance, rivers, boundary, small area variation
• Legal/political institutions as an instrument
changes of laws or regulation, election dynamics
• Administrative rules or assignment rules as an
instrument
wage/staffing rules, reimbursement rules, eligibility or
assignment rules, selection procedures
• Naturally occurring randomization
draft, birth timing, lottery, roommate assignment,
weather
Next week
• What if you have multiple instruments available? 2SLS
– Reference: Wooldridge Chapter 15
• Panel Data Models
– Reference: Wooldridge Chapter 13