Date: 2024-08-07
ECN6540 Econometric Methods Professor Karl Taylor
Topic 7 (B) – SAMPLE SELECTION
Outline
1. Introduce the concept of sample selection – treatment and
outcome;
2. Consider the HECKMAN correction procedure and the
problem of selectivity bias;
3. Derive the inverse Mills ratio;
4. Investigate the relationship between the HECKMAN
correction procedure and the TOBIT model;
5. Reservations about sample selection.
SAMPLE SELECTION
Builds on lectures which looked at qualitative response models where the
dependent variable is discrete rather than a continuous variable.
AND
economic relationships when censoring occurs in the data – partly discrete.
SAMPLE SELECTION models involve the consideration of a treatment
(typically discrete binary variable) and an outcome (typically a continuous
variable).
Terminology derives from medicine where the treatment might be a new
drug and the outcome the life span of the patient for example.
Heckman General Model (AER, 1990)
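$$y_{1i} = X_{1i}\beta_1 + \varepsilon_{1i}$$
$$y_{2i} = X_{2i}\beta_2 + \varepsilon_{2i}$$
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
$$y_i = T_i y_{1i} + (1 - T_i) y_{2i}$$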
Where $T_i$ is the treatment, an indicator variable taking the value 1 if the
statement inside $1(\cdot)$ is true and 0 if it is false.
The continuous variables $y_{1i}$ and $y_{2i}$ describe the relationship between
the outcome and the covariates if the individual does or does not get the
treatment, respectively.
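The switching rule can be sketched in a few lines. This is only an illustration: the function names and the numbers are hypothetical and not part of the lecture.

```python
def treatment(index: float) -> int:
    """T_i = 1(Z_i*gamma + eps_0i > 0): 1 when the latent index is positive."""
    return 1 if index > 0 else 0


def observed_outcome(t: int, y1: float, y2: float) -> float:
    """Observed outcome y_i = T_i*y_1i + (1 - T_i)*y_2i for a binary treatment."""
    if t not in (0, 1):
        raise ValueError("treatment indicator must be 0 or 1")
    return t * y1 + (1 - t) * y2


if __name__ == "__main__":
    # Hypothetical latent indices and potential outcomes for two individuals.
    print(observed_outcome(treatment(0.7), y1=2.5, y2=1.0))   # treated: y1 is observed
    print(observed_outcome(treatment(-0.2), y1=2.5, y2=1.0))  # untreated: y2 is observed
```

Only one of the two potential outcomes is ever observed for a given individual, which is the source of the selection problem discussed next.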
I. Heckman Correction
The classical example of selectivity bias stems from Gronau (JPE, 1974)
where:
outcome is a woman’s wage,
and the
treatment is her decision to work in the labour market.
The simplest model would be to estimate the following:
$$w_i = X_i\beta + \varepsilon_{1i}$$
Where $w_i$ is the log wage rate and $X_i$ is a vector of characteristics such as
experience and schooling.
WHY can this not be estimated for females only?
Rests on the argument that the sample of women who work in the labour
market is not a random sample of women.
OLS leads to selectivity bias.
Can write a labour market participation equation as:
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
Where $Z_i$ includes those characteristics thought to influence the probability
of whether or not a woman works.
Hence a woman works if $Z_i\gamma + \varepsilon_{0i} > 0$, or $-Z_i\gamma < \varepsilon_{0i}$.
NOTE
It is possible that X and Z include common variables.
In Gronau’s example, Z included the number of small children. This makes
sense in that a woman’s participation decision may well depend upon the
number of dependents in the household, but this variable should not have an
impact upon wages.
The selection problem is apparent by taking expectations of the wage
equation over the sample of working women.
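$$E(w_i \mid T_i = 1) = X_i\beta + E(\varepsilon_{1i} \mid \varepsilon_{0i} > -Z_i\gamma)$$
Provided the errors $\varepsilon_{0i}$ and $\varepsilon_{1i}$ are normally distributed, we have: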
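$$\varepsilon_{1i} = \frac{\sigma_{1,0}}{\sigma_0^2}\,\varepsilon_{0i} + \nu_i$$
Where:
$\nu_i$ is uncorrelated with $\varepsilon_{0i}$, and;
$\sigma_{1,0}$ is the covariance between $\varepsilon_{0i}$ and $\varepsilon_{1i}$, $\sigma_{1,0} = \rho\sigma_1\sigma_0$, and;
$\sigma_0^2$ is the variance of $\varepsilon_{0i}$.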
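A short check of why the remainder term is uncorrelated with $\varepsilon_{0i}$: the coefficient $\sigma_{1,0}/\sigma_0^2$ is the population regression slope of $\varepsilon_{1i}$ on $\varepsilon_{0i}$, so the covariance cancels by construction. The step below uses only the definitions above, writing $\nu_i$ for the remainder.

```latex
\begin{aligned}
\operatorname{Cov}(\nu_i,\varepsilon_{0i})
  &= \operatorname{Cov}\!\left(\varepsilon_{1i}
     - \frac{\sigma_{1,0}}{\sigma_0^{2}}\,\varepsilon_{0i},\; \varepsilon_{0i}\right) \\
  &= \sigma_{1,0} - \frac{\sigma_{1,0}}{\sigma_0^{2}}\,\sigma_0^{2} = 0 .
\end{aligned}
```

Under joint normality, zero covariance implies independence, so conditioning on the selection event leaves the mean of $\nu_i$ at zero.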
This means we can now write:
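$$E(\varepsilon_{1i} \mid \varepsilon_{0i} > -Z_i\gamma) = \frac{\sigma_{1,0}}{\sigma_0^2}\,E(\varepsilon_{0i} \mid \varepsilon_{0i} > -Z_i\gamma) = \frac{\sigma_{1,0}}{\sigma_0}\,\frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)}$$
$\phi(\cdot)$ is the standard normal density and $\Phi(\cdot)$ its cumulative distribution
function.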
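The truncated-mean result used in the last step, $E(\varepsilon_{0i} \mid \varepsilon_{0i} > -c) = \sigma_0\,\phi(c/\sigma_0)/\Phi(c/\sigma_0)$, can be checked numerically. The sketch below compares the formula with a simulated average; the values of `c` (standing in for $Z_i\gamma$) and `sigma0` are illustrative assumptions.

```python
import random
from statistics import NormalDist


def truncated_mean_formula(c: float, sigma0: float) -> float:
    """E(eps0 | eps0 > -c) = sigma0 * phi(c/sigma0) / Phi(c/sigma0)."""
    std = NormalDist()  # standard normal
    return sigma0 * std.pdf(c / sigma0) / std.cdf(c / sigma0)


def truncated_mean_simulated(c: float, sigma0: float,
                             n: int = 200_000, seed: int = 1) -> float:
    """Average of simulated N(0, sigma0^2) draws that exceed -c."""
    rng = random.Random(seed)
    kept = [e for e in (rng.gauss(0.0, sigma0) for _ in range(n)) if e > -c]
    return sum(kept) / len(kept)


if __name__ == "__main__":
    c, sigma0 = 0.5, 2.0  # illustrative values, not taken from the data
    print("formula:  ", truncated_mean_formula(c, sigma0))
    print("simulated:", truncated_mean_simulated(c, sigma0))
```

The two numbers should agree up to simulation noise, which shrinks as `n` grows.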
Hence estimates of $w_i = X_i\beta + \varepsilon_{1i}$ by OLS for females only will yield
biased estimates.
Selectivity bias occurs whenever $\sigma_{1,0} \neq 0$, i.e. $\rho \neq 0$.
HECKMAN noted that the bias on $\hat{\beta}$ occurs because essentially the
underlying equation of interest has an omitted variable:
THE INVERSE MILLS RATIO
$$\frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)}$$
$\phi(\cdot)$ is the standard normal density and $\Phi(\cdot)$ its cumulative distribution
function.
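As a minimal sketch, the ratio can be evaluated with the Python standard library. The function name `inverse_mills` is illustrative, and its argument stands for the index $Z_i\gamma/\sigma_0$.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_mills(index: float) -> float:
    """phi(index) / Phi(index), where index stands for Z_i*gamma/sigma_0."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


if __name__ == "__main__":
    for index in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"{index:+.1f}  {inverse_mills(index):.4f}")
```

The ratio is always positive and falls as the index rises, so individuals with a low predicted probability of participating receive the largest correction.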
Given this is the omitted variable then we can gain consistent estimates of
$\beta$ by estimating the following:
$$w_i = X_i\beta + \tilde{\sigma}\,\frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)} + \nu_i$$
How do we estimate this?
2 step estimator proposed by Heckman
STEPS
1. Run a PROBIT (WHY PROBIT?) of the treatment on the vector Z to
obtain estimates of $\gamma/\sigma_0$;
2. Use these estimates to construct the inverse Mills ratio;
3. Run OLS of the outcome on X using the estimated inverse Mills ratio
as an additional regressor.
An estimate of $\sigma_{1,0}/\sigma_0$ is given by the coefficient $\tilde{\sigma}$ on the inverse Mills ratio.
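The three steps can be sketched on simulated data using only the standard library. Everything here is an assumption made for illustration: the parameter values, the function names, a probit with a single slope and no constant, and a design in which the wage covariate `x` is independent of the selection variable `z`. It is not the lecture's data or Stata's implementation.

```python
import math
import random


def phi(v):
    """Standard normal density."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


def Phi(v):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit_slope(t, z, iterations=25):
    """Step 1: probit of t on z (one slope, no constant) by Newton-Raphson."""
    g = 0.0
    for _ in range(iterations):
        grad = info = 0.0
        for ti, zi in zip(t, z):
            q = 1.0 if ti else -1.0
            lam = q * phi(q * g * zi) / Phi(q * g * zi)
            grad += lam * zi                         # score contribution
            info += lam * (lam + g * zi) * zi * zi   # minus the Hessian
        g += grad / info
    return g


def simulate_and_estimate(n=20_000, seed=0):
    """Simulate selected wages, then compare naive OLS with the two-step fit."""
    rng = random.Random(seed)
    t, z, x, w = [], [], [], []
    for _ in range(n):
        zi, xi, e0 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        e1 = 0.8 * e0 + 0.6 * rng.gauss(0, 1)  # cov(e0, e1) = 0.8, var(e1) = 1
        t.append(zi + e0 > 0)                   # true gamma = 1, sigma_0 = 1
        z.append(zi)
        x.append(xi)
        w.append(1.0 + 0.5 * xi + e1)           # true intercept 1, slope 0.5
    g_hat = probit_slope(t, z)                                     # step 1
    work = [i for i in range(n) if t[i]]
    mills = [phi(g_hat * z[i]) / Phi(g_hat * z[i]) for i in work]  # step 2
    y = [w[i] for i in work]
    naive = ols([[1.0, x[i]] for i in work], y)
    corrected = ols([[1.0, x[i], m] for i, m in zip(work, mills)], y)  # step 3
    return g_hat, naive, corrected


if __name__ == "__main__":
    g_hat, naive, corrected = simulate_and_estimate()
    print("probit slope:", round(g_hat, 3))
    print("naive OLS (const, x):", [round(b, 3) for b in naive])
    print("two-step (const, x, mills):", [round(b, 3) for b in corrected])
```

In this design the naive regression overstates the constant, while the two-step fit should recover a constant near 1 and a Mills coefficient near 0.8, the simulated value of $\sigma_{1,0}/\sigma_0$. Because `x` is independent of `z` here, the bias shows up in the constant; with correlated covariates the slopes would be affected as well. The sketch reports coefficients only and does not correct the second-step standard errors.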
II. Example of Selection Bias
Model the wages of female workers in the UK.
$$w_i = X_i\beta + \varepsilon_{1i}$$
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
The British Household Panel Survey (BHPS) is a random sample survey, carried
out by the Institute for Social and Economic Research, of each adult member
from a nationally representative sample of households. For Wave one,
interviews were conducted during the autumn of 1991.
Use the 2001 wave, i.e. a single cross section.
The participation equation
VARIABLE DESCRIPTION
Emp Dummy dependent variable (0/1) equals 1 if employed
Age Age of the individual at date of interview
Nkids Number of children in household
Depkids Dummy variable (0/1) equals 1 if dependent children in household
Poorhealth Dummy variable (0/1) equals 1 if in poor health over past year
Married Dummy variable (0/1) equals 1 if married or cohabiting
Deg Dummy variable (0/1) equals 1 if highest qualification is a degree
Voc Dummy variable (0/1) equals 1 if highest qualification is vocational
Alev Dummy variable (0/1) equals 1 if highest qualification is A’level
Olev Dummy variable (0/1) equals 1 if highest qualification is O’level
CSE Dummy variable (0/1) equals 1 if highest qualification is CSE
The wage equation
VARIABLE DESCRIPTION
Lw Log usual monthly wage rate
Exp Years of experience
Expsq Experience squared
Yos Years of schooling
Firm100_199 Dummy variable (0/1) equals 1 if works in firm with 100-199 workers
Firm200_499 Dummy variable (0/1) equals 1 if works in firm with 200-499 workers
Firm500_999 Dummy variable (0/1) equals 1 if works in firm with 500-999 workers
Firm1000 Dummy variable (0/1) equals 1 if works in firm with 1000+ workers
Hrperweek Number of hours worked per week
Married Dummy variable (0/1) equals 1 if married or cohabiting
Regression ignoring selection
$$w_i = X_i\beta + \varepsilon_{1i}$$
reg lw exp expsq yos firm100_199 firm200_499 firm500_999 firm1000 hrperweek married if emp==1
Source | SS df MS Number of obs = 4713
-------------+------------------------------ F( 9, 4703) = 105.84
Model | 2132.24141 9 236.915712 Prob > F = 0.0000
Residual | 10527.6758 4703 2.23850218 R-squared = 0.1684
-------------+------------------------------ Adj R-squared = 0.1668
Total | 12659.9172 4712 2.68673964 Root MSE = 1.4962
------------------------------------------------------------------------------
lw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exp | .0396655 .0074869 5.30 0.000 .0249878 .0543433
expsq | -.0007474 .0001554 -4.81 0.000 -.0010521 -.0004427
yos | .0403402 .0094993 4.25 0.000 .0217171 .0589634
firm100_199 | -.05539 .0771011 -0.72 0.473 -.2065442 .0957642
firm200_499 | .1911997 .0755536 2.53 0.011 .0430792 .3393203
firm500_999 | .1188637 .1025135 1.16 0.246 -.0821107 .3198381
firm1000 | .2856979 .0724845 3.94 0.000 .1435944 .4278014
hrperweek | .055935 .0019693 28.40 0.000 .0520743 .0597957
married | .0540388 .0494026 1.09 0.274 -.0428135 .150891
_cons | 3.769861 .1649014 22.86 0.000 3.446577 4.093145
------------------------------------------------------------------------------
Calculate the predicted wage for the following female:
29 years of experience;
12 years of schooling;
Works 21 hours per week in a firm employing less than 100 people;
And is married.
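$$\hat{w}_i = \hat{\beta}_0 + \hat{\beta}_1\,\mathit{Exp}_i + \hat{\beta}_2\,\mathit{Expsq}_i + \hat{\beta}_3\,\mathit{Yos}_i + \hat{\beta}_4\,\mathit{Hrperweek}_i + \hat{\beta}_5\,\mathit{Married}_i$$
$$\hat{w}_i = \hat{\beta}_0 + \hat{\beta}_1(29) + \hat{\beta}_2(841) + \hat{\beta}_3(12) + \hat{\beta}_4(21) + \hat{\beta}_5(1)$$
$$\hat{w}_i = 3.769861 + 0.0396655(29) - 0.0007474(841) + 0.0403402(12) + 0.055935(21) + 0.0540388(1)$$
$$\hat{w}_i = 6.0043 \;\Rightarrow\; \pounds 405.17$$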
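The arithmetic can be reproduced with a few lines of Python. The coefficients are copied from the OLS output above; the dictionary layout and variable names are just one convenient way to organise the sum.

```python
import math

# OLS coefficients copied from the regression output (no selection correction).
ols_coefs = {
    "_cons": 3.769861,
    "exp": 0.0396655,
    "expsq": -0.0007474,
    "yos": 0.0403402,
    "hrperweek": 0.055935,
    "married": 0.0540388,
}

# Characteristics of the example woman; all firm-size dummies are zero
# because she works in a firm with fewer than 100 employees.
person = {"_cons": 1, "exp": 29, "expsq": 29 ** 2, "yos": 12,
          "hrperweek": 21, "married": 1}

log_wage = sum(ols_coefs[name] * person[name] for name in ols_coefs)
monthly_wage = math.exp(log_wage)  # lw is the log of the usual monthly wage

print(f"predicted log wage = {log_wage:.5f}")
print(f"predicted wage     = {monthly_wage:.2f}")
```

The unrounded log wage is about 6.00435, which exponentiates to roughly £405.19; the figure of £405.17 corresponds to exponentiating the rounded value 6.0043.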
THIS IS WRONG – COEFFICIENTS BIASED!
2-STEP APPROACH
STEP 1
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
probit emp age nkids depkids poorhealth deg voc alev olev cse married
Probit estimates Number of obs = 8336
LR chi2(10) = 877.63
Prob > chi2 = 0.0000
Log likelihood = -5267.7906 Pseudo R2 = 0.0769
------------------------------------------------------------------------------
emp | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0031502 .0012895 -2.44 0.015 -.0056777 -.0006228
nkids | -.2389976 .025578 -9.34 0.000 -.2891296 -.1888655
depkids | .3150533 .0542688 5.81 0.000 .2086883 .4214183
poorhealth | -.7700927 .0491398 -15.67 0.000 -.8664049 -.6737805
deg | .8842659 .051825 17.06 0.000 .7826908 .985841
voc | .5717135 .063198 9.05 0.000 .4478478 .6955793
alev | .471608 .0452927 10.41 0.000 .3828359 .5603802
olev | .4855719 .0392577 12.37 0.000 .4086282 .5625156
cse | .503995 .0668361 7.54 0.000 .3729987 .6349913
married | .2861741 .0332624 8.60 0.000 .220981 .3513672
_cons | -.1269548 .0628628 -2.02 0.043 -.2501637 -.0037459
------------------------------------------------------------------------------
predict y, xb            // linear prediction from the probit
gen n1 = normalden(y)    // normalden(z) returns the standard normal density
gen n2 = normal(y)       // normal(z) returns the cumulative standard normal distribution
gen mills = n1/n2        // inverse Mills ratio
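THIS CALCULATES THE INVERSE MILLS RATIO
$$\frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)}$$
$\phi(\cdot)$ is the standard normal density and $\Phi(\cdot)$ its cumulative distribution
function.
STEP 2
$$w_i = X_i\beta + \tilde{\sigma}\,\frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)} + \nu_i$$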
reg lw exp expsq yos firm100_199 firm200_499 firm500_999 firm1000 hrperweek mills married if emp==1
Source | SS df MS Number of obs = 4713
-------------+------------------------------ F( 10, 4702) = 155.91
Model | 3152.53575 10 315.253575 Prob > F = 0.0000
Residual | 9507.38141 4702 2.02198669 R-squared = 0.2490
-------------+------------------------------ Adj R-squared = 0.2474
Total | 12659.9172 4712 2.68673964 Root MSE = 1.422
------------------------------------------------------------------------------
lw | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
exp | .0482869 .0071259 6.78 0.000 .0343167 .0622571
expsq | -.0006845 .0001477 -4.63 0.000 -.0009741 -.0003948
yos | .0293446 .0090415 3.25 0.001 .011619 .0470701
firm100_199 | -.0803412 .0732859 -1.10 0.273 -.224016 .0633335
firm200_499 | .2236171 .0718213 3.11 0.002 .0828137 .3644206
firm500_999 | .0956679 .0974351 0.98 0.326 -.0953506 .2866864
firm1000 | .2264656 .0689403 3.28 0.001 .0913103 .3616209
hrperweek | .0512885 .001883 27.24 0.000 .047597 .0549801
mills | -2.335898 .1039873 -22.46 0.000 -2.539762 -2.132034
married | -.262907 .0490268 -5.36 0.000 -.3590226 -.1667914
_cons | 5.492425 .1744783 31.48 0.000 5.150366 5.834484
------------------------------------------------------------------------------
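$\tilde{\sigma} = -2.34$ ($t = -22.46$)
So selection is a problem in this model and it would be incorrect to
estimate a wage equation for females directly.
The estimate on the inverse Mills ratio is negative and significant, which
means that OLS would produce downwardly biased estimates: the inverse Mills
ratio is always positive, so with a negative coefficient the expected log wage
of women observed working lies below $X_i\beta$.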
Calculate the predicted wage for the following female:
29 years of experience;
12 years of schooling;
Works 21 hours per week in a firm employing less than 100 people;
And is married.
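$$\hat{w}_i = \hat{\beta}_0 + \hat{\beta}_1\,\mathit{Exp}_i + \hat{\beta}_2\,\mathit{Expsq}_i + \hat{\beta}_3\,\mathit{Yos}_i + \hat{\beta}_4\,\mathit{Hrperweek}_i + \hat{\beta}_5\,\mathit{Married}_i$$
$$\hat{w}_i = \hat{\beta}_0 + \hat{\beta}_1(29) + \hat{\beta}_2(841) + \hat{\beta}_3(12) + \hat{\beta}_4(21) + \hat{\beta}_5(1)$$
$$\hat{w}_i = 5.492425 + 0.0482869(29) - 0.0006845(841) + 0.0293446(12) + 0.0512885(21) - 0.262907(1)$$
$$\hat{w}_i = 7.483 \;\Rightarrow\; \pounds 1{,}777.6$$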
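A quick check of the selection-corrected prediction, using the second-step coefficients from the output above. As in the calculation itself, the inverse Mills ratio term is left out, so this is a prediction from the wage equation rather than the expected wage conditional on working. The helper name `predict_log_wage` is illustrative.

```python
import math

# Second-step coefficients copied from the regression that includes mills.
heckman_coefs = {
    "_cons": 5.492425,
    "exp": 0.0482869,
    "expsq": -0.0006845,
    "yos": 0.0293446,
    "hrperweek": 0.0512885,
    "married": -0.262907,
}


def predict_log_wage(coefs, exp, yos, hrperweek, married):
    """Linear prediction for a woman in a firm with fewer than 100 workers."""
    values = {"_cons": 1, "exp": exp, "expsq": exp ** 2, "yos": yos,
              "hrperweek": hrperweek, "married": married}
    return sum(coefs[name] * values[name] for name in coefs)


log_wage = predict_log_wage(heckman_coefs, exp=29, yos=12, hrperweek=21, married=1)
monthly_wage = math.exp(log_wage)
print(f"predicted log wage = {log_wage:.4f}")
print(f"predicted wage     = {monthly_wage:.1f}")
```

The log prediction rounds to 7.483. Exponentiating the unrounded value gives a figure slightly above the £1,777.6 obtained from the rounded one.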
Can also use the heckman command in STATA
#delimit ;
heckman lw exp expsq yos firm100_199 firm200_499 firm500_999 firm1000 hrperweek
married, twostep select(emp = age nkids depkids poorhealth deg voc alev olev
cse married);
predict z, mills;
Heckman selection model -- two-step estimates Number of obs = 8336
(regression model with sample selection) Censored obs = 3623
Uncensored obs = 4713
Wald chi2(10) = 965.47
Prob > chi2 = 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lw |
exp | .0482869 .0077443 6.24 0.000 .0331083 .0634655
expsq | -.0006845 .0001572 -4.35 0.000 -.0009926 -.0003764
yos | .0293446 .0096872 3.03 0.002 .0103581 .0483311
firm100_199 | -.0803412 .0737323 -1.09 0.276 -.2248538 .0641714
firm200_499 | .2236171 .0714028 3.13 0.002 .0836702 .3635641
firm500_999 | .0956679 .0978508 0.98 0.328 -.0961162 .287452
firm1000 | .2264656 .06999 3.24 0.001 .0892878 .3636435
hrperweek | .0512885 .0019267 26.62 0.000 .0475123 .0550648
married | -.262907 .0655342 -4.01 0.000 -.3913516 -.1344624
_cons | 5.492425 .1898561 28.93 0.000 5.120314 5.864536
-------------+----------------------------------------------------------------
emp |
age | -.0031502 .0012895 -2.44 0.015 -.0056777 -.0006228
nkids | -.2389976 .025578 -9.34 0.000 -.2891296 -.1888655
depkids | .3150533 .0542688 5.81 0.000 .2086883 .4214183
poorhealth | -.7700927 .0491398 -15.67 0.000 -.8664049 -.6737805
deg | .8842659 .051825 17.06 0.000 .7826908 .985841
voc | .5717135 .063198 9.05 0.000 .4478478 .6955793
alev | .471608 .0452927 10.41 0.000 .3828359 .5603802
olev | .4855719 .0392577 12.37 0.000 .4086282 .5625156
cse | .503995 .0668361 7.54 0.000 .3729987 .6349913
married | .2861741 .0332624 8.60 0.000 .220981 .3513672
_cons | -.1269548 .0628628 -2.02 0.043 -.2501637 -.0037459
-------------+----------------------------------------------------------------
mills |
lambda | -2.335898 .1283709 -18.20 0.000 -2.5875 -2.084296
-------------+----------------------------------------------------------------
rho | -1.03603
sigma | 2.2546626
lambda | -2.335898 .1283709
------------------------------------------------------------------------------
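The last block of the output can be cross-checked against the derivation. With the probit normalisation $\sigma_0 = 1$, the coefficient on the inverse Mills ratio is $\sigma_{1,0}/\sigma_0 = \rho\sigma_1$, which the output labels lambda, rho and sigma. A minimal check using the reported numbers:

```python
# Values copied from the heckman two-step output.
rho = -1.03603
sigma = 2.2546626
lambda_reported = -2.335898

lambda_implied = rho * sigma
print(f"rho * sigma = {lambda_implied:.6f} (reported lambda = {lambda_reported})")
print("rho within [-1, 1]:", -1.0 <= rho <= 1.0)
```

The product matches the reported lambda. The reported rho lies outside the interval from -1 to 1, which can happen because the two-step estimator does not constrain it; this may be worth treating as a warning about the specification.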
predict z, mills;
Variable | Obs Mean Std. Dev. Min Max
-----------------------------------------------------------------------------
z | 8336 .7143428 .2732871 .2716154 2.438151
mills | 8336 .7143428 .2732871 .2716154 2.438151
Interpretation
– A married woman has a higher probability of working [note we cannot
interpret the magnitude of the probit coefficients directly] (treatment
equation) and approximately 26% lower wages (outcome equation). NOTE that
OLS without the inverse Mills ratio gives 5.4% higher wages (although
insignificant).
– An extra year of schooling increases wages by 2.9% (Heckman 2 step).
NOTE that OLS without the inverse Mills ratio gives a 4% increase.
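The percentages above read the log-wage coefficients directly as proportional changes, which is an approximation. For a coefficient $b$ in a log-wage equation the exact proportional change is $\exp(b) - 1$, and the gap matters most for a large coefficient on a dummy such as married. A small sketch, with `pct_effect` as an illustrative helper name:

```python
import math


def pct_effect(coef: float) -> float:
    """Exact percentage change in the wage implied by a log-wage coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


print(f"married, Heckman 2 step:   {pct_effect(-0.262907):.1f}%")
print(f"married, OLS:              {pct_effect(0.0540388):.1f}%")
print(f"schooling, Heckman 2 step: {pct_effect(0.0293446):.1f}%")
print(f"schooling, OLS:            {pct_effect(0.0403402):.1f}%")
```

For married in the Heckman equation the exact effect is closer to 23% than 26%; for the small schooling coefficients the approximation is nearly exact.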
III. The Tobit Model and Sample Selection
Tobit Model:
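$$y_i^* = x_i'\beta + \varepsilon_i$$
$$y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \le 0 \end{cases}$$
From the general HECKMAN model:
$$y_{1i} = X_{1i}\beta_1 + \varepsilon_{1i}$$
$$y_{2i} = X_{2i}\beta_2 + \varepsilon_{2i}$$
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
$$y_i = T_i y_{1i} + (1 - T_i) y_{2i}$$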
Imagine considering debt. We are interested in the amount of debt
undertaken by individuals (see previous lectures ECN6540), let T be an
indicator variable of whether individuals choose to take on debt.
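$$y_{1i} = X_{1i}\beta_1 + \varepsilon_{1i}$$
$$y_{2i} = 0$$
$$T_i = 1(Z_i\gamma + \varepsilon_{0i} > 0)$$
$$y_i = T_i y_{1i} + (1 - T_i) y_{2i}$$
If we assume the following: $Z = X$, $\gamma = \beta_1$ and $\varepsilon_{0i} = \varepsilon_{1i}$
We now have the tobit model!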
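The equivalence can be checked mechanically. In the sketch below the function names and the random draws are illustrative; `xb` stands for $X_i\beta_1$ and `zg` for $Z_i\gamma$.

```python
import random


def tobit_outcome(xb: float, eps: float) -> float:
    """Tobit: y = y* if y* > 0, else 0, with y* = x'beta + eps."""
    y_star = xb + eps
    return y_star if y_star > 0 else 0.0


def selection_outcome(xb1: float, eps1: float, zg: float, eps0: float) -> float:
    """Selection model with y_2i = 0: y = y_1i when Z*gamma + eps_0 > 0, else 0."""
    treated = zg + eps0 > 0
    return xb1 + eps1 if treated else 0.0


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    # Impose Z = X, gamma = beta_1 and eps_0 = eps_1 by reusing xb and eps.
    same = all(
        tobit_outcome(xb, e) == selection_outcome(xb, e, zg=xb, eps0=e)
        for xb, e in draws
    )
    print("identical outcomes under the three restrictions:", same)
```

Once any of the three restrictions is relaxed, the selection rule and the outcome equation can disagree, which is the case discussed next.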
In some instances the tobit model may be incorrect.
WHY?
Covariates affect the participation decision differently from the decision of
how much debt to undertake conditional on debt being positive, hence
$\gamma \neq \beta_1$.
Consider the tobit model as a selectivity problem, with non-zero debt as the
treatment.
Imagine performing OLS regression on the non-zero values of debt y
$$E(y_i \mid T_i = 1) = X_i\beta + E(\varepsilon_{1i} \mid \varepsilon_{0i} > -Z_i\gamma)$$
Remember if $Z = X$, $\gamma = \beta_1$ and $\varepsilon_{0i} = \varepsilon_{1i}$ we have a standard tobit model.
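Substituting the three restrictions into the selection expression gives the usual tobit conditional mean. A sketch of the step, writing $\sigma_1$ for the standard deviation of $\varepsilon_{1i}$, so that $\sigma_{1,0} = \sigma_0^2 = \sigma_1^2$ when $\varepsilon_{0i} = \varepsilon_{1i}$:

```latex
\begin{aligned}
E(y_i \mid T_i = 1)
  &= X_i\beta_1 + \frac{\sigma_{1,0}}{\sigma_0}\,
     \frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)} \\
  &= X_i\beta_1 + \sigma_1\,
     \frac{\phi(X_i\beta_1/\sigma_1)}{\Phi(X_i\beta_1/\sigma_1)} .
\end{aligned}
```

The same index $X_i\beta_1$ then drives both whether debt is positive and how large it is, which is the restriction the tobit model imposes.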
The expression looks exactly the same as for the model of working females’
wages.
In the first step a probit predicts whether an observation is observed – used
to calculate the inverse Mills ratio.
In the second step, an OLS of y on X and the inverse Mills ratio using only
non-zero observations.
IV. Cautionary Remarks About Selection
- In principle the model is identified even if the variables in X and Z are
identical. However, under this scenario identification rests upon the
normality assumption to be exactly correct – probably too restrictive an
assumption.
- Desirable to find variables that enter the selection equation but not the
outcome equation. In practice this might be difficult.
- The model is sensitive to heteroscedasticity and non-normality. Indeed
Greene (2003) shows that heteroscedasticity is inherent in the selection
model. The standard 2 step approach is therefore less efficient than ML,
and the second-step OLS standard errors need correcting. NOTE that the
heckman command in STATA uses the ML approach by default and reports
corrected standard errors when the twostep option is specified (i.e.
compare the standard errors of the manual 2 step approach to those from
the heckman command)
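The sense in which heteroscedasticity is inherent can be made explicit. The usual expression for the conditional variance of the outcome error in the selected sample is sketched below in the notation of this lecture; the shorthand $\lambda_i$ for the inverse Mills ratio and $\delta_i$ are introduced here for convenience.

```latex
\begin{aligned}
\operatorname{Var}(\varepsilon_{1i} \mid T_i = 1)
  &= \sigma_1^{2}\left(1 - \rho^{2}\delta_i\right), \\
\delta_i &= \lambda_i\left(\lambda_i + \frac{Z_i\gamma}{\sigma_0}\right),
\qquad
\lambda_i = \frac{\phi(Z_i\gamma/\sigma_0)}{\Phi(Z_i\gamma/\sigma_0)} .
\end{aligned}
```

Because $\delta_i$ depends on $Z_i$, the error variance differs across individuals whenever $\rho \neq 0$, so the conventional OLS standard errors from the second step are not valid.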
Overview
1. Considered the concepts of treatment and outcome;
2. Introduced the notion of sample selection;
3. Observed how the selection problem is essentially one of an omitted
variable;
4. Shown the relationship between Heckman’s sample selection model
and the TOBIT;
5. Noted limitations of both TOBIT and sample selection.