Simple Panel Data Analysis
ECON 452 Lecture 18
Jung Hwan Koh
Department of Economics
University of Michigan
March 28, 2022
Review
Causal Inference: Difference-in-Difference
Review
Contents of Panel Data Analysis
1. Types of Data
2. Pooling Independent Cross Sections across Time
3. Policy Analysis with Pooled Cross Sections
4. Two-Period with Pooled Cross Sections
5. Two-Period with Panel Data Analysis
6. Difference in Difference
Policy Analysis with Pooled Cross Sections
Difference-in-Difference
Set up
Two groups: treatment and control groups ($D_i$)
Two periods: pre and post periods ($t = 0, 1$ (pre, post), or $T_i$)
Treatment is administered between the pre and post periods
$Y_{1it}$: outcome of individual $i$ at time $t$ with treatment
$Y_{0it}$: outcome of individual $i$ at time $t$ without treatment
Primary interest: Average Treatment Effect (on the treated)
$$E(Y_{1it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 1)$$
Difference-in-Difference
Difference-in-Difference Estimator
$$\delta_1 = \big(E(Y_{1it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 0)\big) - \big(E(Y_{0it} \mid D_i = 0, t = 1) - E(Y_{0it} \mid D_i = 0, t = 0)\big)$$
Difference-in-Difference in the Regression Model
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 D_i + \delta_1 D_i \cdot T_t + u_{it}$$
Assumptions
MLR assumptions hold
Parallel Trends (Common Trend)
No Pre-treatment Effect
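Why the parallel-trends and no-pre-treatment-effect assumptions matter can be seen in a short derivation. This is only a sketch in the potential-outcome notation of the set-up; it reads "parallel trends" as the statement that, without treatment, the treated group's mean outcome would have changed by the same amount as the control group's.

```latex
% Parallel trends (assumption, stated for the untreated outcome Y_0):
E(Y_{0it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 0)
  = E(Y_{0it} \mid D_i = 0, t = 1) - E(Y_{0it} \mid D_i = 0, t = 0)

% Substitute the left-hand side for the control-group change in \delta_1:
\begin{aligned}
\delta_1
 &= \big(E(Y_{1it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 0)\big)
  - \big(E(Y_{0it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 0)\big) \\
 &= E(Y_{1it} \mid D_i = 1, t = 1) - E(Y_{0it} \mid D_i = 1, t = 1)
\end{aligned}
```

The last line is the treatment effect of primary interest from the set-up, so under these assumptions the difference-in-difference estimator targets it.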
Difference-in-Difference
Difference-in-Difference in the Regression Model
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 D_i + \delta_1 D_i \cdot T_t + u_{it}$$
                      Pre        Post                   Post - Pre
Treatment             β0 + β1    β0 + δ0 + β1 + δ1      δ0 + δ1
Control               β0         β0 + δ0                δ0
Treatment - Control   β1         β1 + δ1                δ1
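The equivalence between the four group means and the interaction coefficient can be checked numerically. The following is a minimal sketch on simulated data; the variable names (`treat`, `post`) and all parameter values are made up for illustration and do not come from the lecture.

```r
# Simulated pooled cross sections with a balanced 2 x 2 design (hypothetical numbers)
set.seed(452)
treat <- rep(c(0, 1), each = 200)    # 1 = treatment group, 0 = control group
post  <- rep(c(0, 1), times = 200)   # 1 = post period, 0 = pre period
y <- 10 + 2 * post + 3 * treat + 1.5 * treat * post + rnorm(400)

# Difference-in-difference from the four cell means
cell <- tapply(y, list(treat = treat, post = post), mean)
did_means <- (cell["1", "1"] - cell["1", "0"]) - (cell["0", "1"] - cell["0", "0"])

# Difference-in-difference as the interaction coefficient
fit <- lm(y ~ treat * post)
did_reg <- unname(coef(fit)["treat:post"])

all.equal(unname(did_means), did_reg)   # the two computations agree
```

Because the regression with both dummies and their interaction is saturated, its fitted values are the four cell means, which is why the two numbers coincide exactly rather than approximately.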
Difference-in-Difference
Example: Effect of a Garbage Incinerator’s Location on Housing Prices
Kiel and McClain (1995)
Two groups
Control: houses far away from the incinerator
Treatment: houses near the incinerator
Two time periods
1978 (before there were any rumors about the new incinerator)
1981 (when the construction began)
Difference-in-Difference
Example: Effect of a Garbage Incinerator’s Location on Housing Prices
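> library(wooldridge)  # provides the kielmc data
> library(lmtest)      # provides coeftest()
> data(kielmc)
> coef( lm(rprice~nearinc, data=kielmc, subset=(year==1978)) )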
(Intercept) nearinc
82517.23 -18824.37
> coef( lm(rprice~nearinc, data=kielmc, subset=(year==1981)) )
(Intercept) nearinc
101307.51 -30688.27
> coeftest( lm(rprice~nearinc*y81, data=kielmc) )
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82517.2 2726.9 30.2603 < 2.2e-16 ***
nearinc -18824.4 4875.3 -3.8612 0.0001368 ***
y81 18790.3 4050.1 4.6395 5.117e-06 ***
nearinc:y81 -11863.9 7456.6 -1.5911 0.1125948
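The interaction estimate can be recovered by hand from the two year-by-year regressions. The short check below only re-uses the coefficients printed in the output above; the variable names are chosen here for illustration.

```r
# Coefficients copied from the two separate regressions above
gap_1978 <- -18824.37    # nearinc coefficient, 1978 (near minus far)
gap_1981 <- -30688.27    # nearinc coefficient, 1981 (near minus far)
far_1978 <- 82517.23     # intercept, 1978 (mean price far from the incinerator)
far_1981 <- 101307.51    # intercept, 1981

did <- gap_1981 - gap_1978          # change in the near/far price gap
time_effect <- far_1981 - far_1978  # change for the control group

did          # agrees with the nearinc:y81 estimate up to rounding
time_effect  # agrees with the y81 estimate up to rounding
```

The regression with the interaction term adds what the hand calculation cannot give: a standard error for the difference-in-difference estimate.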
Difference-in-Difference
Difference-in-Difference with Control Variables
                 DiD         DiDcontr
nearinc          -0.340***   0.032
                 (0.055)     (0.047)
y81              0.193***    0.162***
                 (0.045)     (0.028)
nearinc × y81    -0.063      -0.132*
                 (0.083)     (0.052)
Control Var.     No          Yes
N                321         321
R2               0.25        0.73
5. Two-Period with Panel Data Analysis
Two-Period with Panel Data Analysis
Fixed Effects Model
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 x_{it} + a_i + u_{it}, \quad t = 1, 2$$
$t = 1, 2$: two time periods
$T_t = 0, 1$: dummy variable that equals zero when $t = 1$ and one when $t = 2$ (varies with $t$)
$x_{it}$: independent variable (varies with $i$ and $t$)
$a_i$: all unobserved, time-constant factors that affect $y_{it}$; called the unobserved effect, or fixed
effect
$u_{it}$: idiosyncratic error
Two-Period with Panel Data Analysis
Example (Crime rates)
Data set crime2.raw contains crimes per 1000 people and unemployment rates of 46 cities
for 1982 and 1987. Using 1987 data:
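library(wooldridge)  # provides the crime2 data
library(broom)       # provides tidy()
library(magrittr)    # provides the %>% pipe
data(crime2)
# Cross-sectional regression using only the 1987 rows
f0 <- lm(crmrte ~ unem, data = crime2[crime2$d87==1,])
summary(f0) %>% tidy()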
# A tibble: 2 × 5
term estimate std.error statistic p.value

1 (Intercept) 128. 20.8 6.18 0.000000180
2 unem -4.16 3.42 -1.22 0.230
So a higher unemployment rate implies a decrease in the crime rate?
Something is wrong here. Did we omit variables?
Two-Period with Panel Data Analysis
Problems with individual heterogeneity
We usually assume that the fixed effect and the covariates are correlated.
E.g., socio-economic environment and unemployment rate in a city are probably related;
education and experience are probably correlated with innate ability.
We know that in such a case MLR.4 fails and our OLS estimates will be biased and
inconsistent.
Two-Period with Panel Data Analysis
Problems with individual heterogeneity
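What if $a_i$ is correlated with $x_{it}$? What if we don't do anything?
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 x_{it} + a_i + u_{it}$$
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 x_{it} + v_{it}$$
where $v_{it} = a_i + u_{it}$
Pooled OLS then leaves $a_i$ in the composite error $v_{it}$, which is correlated with $x_{it}$.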
Two-Period with Panel Data Analysis
First Difference
Difference across two time periods
$$y_{i2} = \beta_0 + \delta_0 + \beta_1 x_{i2} + a_i + u_{i2}$$
$$y_{i1} = \beta_0 + \beta_1 x_{i1} + a_i + u_{i1}$$
Subtract the period-1 equation from the period-2 equation to get
Two-Period with Panel Data Analysis
First Difference
First-differencing, as in
$$\Delta y_i = \delta_0 + \beta_1 \Delta x_i + \Delta u_i$$
removes the unobserved effect $a_i$ and the intercept $\beta_0$, but we can still estimate $\beta_1$ and $\delta_0$.
Assumption MLR.4 in the new regression equation means
$$E(\Delta u_i \mid \Delta x_i) = 0$$
that is, conditional on the change in the independent variable over time, the expected
change in the error is zero.
This could fail if an increase in unemployment is correlated with an increase in black market
activities, which would be contained in the error term.
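A small simulation can illustrate why differencing helps when the unobserved effect is correlated with the regressor. This is a sketch under made-up assumptions: the data-generating process, the true slope of 2 and all variable names are hypothetical and are not taken from the crime data.

```r
# Two-period panel with an unobserved effect that is correlated with x
set.seed(18)
n  <- 5000
a  <- rnorm(n)                          # unobserved, time-constant effect a_i
x1 <- a + rnorm(n)                      # x in period 1, correlated with a_i
x2 <- a + rnorm(n)                      # x in period 2, correlated with a_i
y1 <- 1 + 2 * x1 + a + rnorm(n)         # period 1 (time dummy = 0)
y2 <- 1 + 0.5 + 2 * x2 + a + rnorm(n)   # period 2 (time dummy = 1, delta0 = 0.5)

# Pooled OLS ignores a_i, which then sits in the error term
panel  <- data.frame(y = c(y1, y2), x = c(x1, x2), post = rep(0:1, each = n))
pooled <- lm(y ~ post + x, data = panel)

# First differences remove a_i
dy <- y2 - y1
dx <- x2 - x1
fd <- lm(dy ~ dx)

b_pooled <- unname(coef(pooled)["x"])   # biased away from the true slope of 2
b_fd     <- unname(coef(fd)["dx"])      # close to the true slope of 2
c(pooled = b_pooled, first_difference = b_fd)
```

In this design the pooled slope is pushed upward because `a` enters both `x` and `y` positively, while the intercept of the differenced regression estimates the period effect.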
Two-Period with Panel Data Analysis
First Difference
Alternatively, assume the idiosyncratic error in each time period has mean 0 conditional on
the independent variable in both time periods,
$$E(u_{i1} \mid x_{i1}, x_{i2}) = 0 \quad \text{and} \quad E(u_{i2} \mid x_{i1}, x_{i2}) = 0$$
which is also called strict exogeneity of the covariates.
Strict exogeneity implies MLR.4 because
$$E(\Delta u_i \mid x_{i1}, x_{i2}) = E(u_{i2} \mid x_{i1}, x_{i2}) - E(u_{i1} \mid x_{i1}, x_{i2}) = 0 - 0 = 0$$
MLR.4 does not imply strict exogeneity. Why?
Two-Period with Panel Data Analysis
Example (Crime rates, continued)
Generate first-differenced variables by subtracting the observations for which the dummy d87 equals 0
from the observations for which it equals 1:
# Assumes the cities appear in the same order in the 1982 and 1987 rows
Dcrmrte <- with(crime2, crmrte[d87==1] - crmrte[d87==0])
Dunem <- with(crime2, unem[d87==1] - unem[d87==0])
When we worked with the whole data frame crime2, we wrote [crime2$d87==1,] to select observations
from 1987: the comma means we select those rows and all columns. crmrte and unem are single
vectors, so we omit the comma when we subset them.
Next, run the regression with the differenced data.
f1 <- lm(Dcrmrte ~ Dunem)
Two-Period with Panel Data Analysis
Example (Crime rates, continued)
options(pillar.sigfig = 4)
summary(f1) %>% tidy()
# A tibble: 2 × 5
term estimate std.error statistic p.value

1 (Intercept) 15.40 4.702 3.276 0.002060
2 Dunem 2.218 0.8779 2.527 0.01519
Much better. Interpretation of the coefficient on Δunem?
Interpretation of the intercept (our estimate of δ0)?
Two-Period with Panel Data Analysis
Common problems with panel data
Data can lack time variation, e.g., work with
$$\log(wage_{it}) = \beta_0 + \delta_0 d2_t + \beta_1 educ_{it} + \beta_2 exper_{it} + a_i + u_{it}$$
or, in first-differenced form,
$$\Delta \log(wage_i) = \delta_0 + \beta_1 \Delta educ_i + \beta_2 \Delta exper_i + \Delta u_i$$
Years of education of people in the workforce rarely change, so $\Delta educ_i = 0$ for possibly all
observations.
First-differencing then removes $educ$ from the equation, and we cannot estimate the return to
education ($\beta_1$).
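What "cannot estimate" looks like in practice can be shown with a toy example. The sketch below uses simulated differences with hypothetical names (`d_educ`, `d_exper`, `d_lwage`) and made-up coefficients; it only demonstrates how `lm()` reacts to a differenced regressor that never changes.

```r
# Differenced data in which education does not change for anyone
set.seed(21)
n <- 200
d_educ  <- rep(0, n)                 # change in education is zero for every worker
d_exper <- runif(n, 0, 2)            # hypothetical: experience changes by varying amounts
d_lwage <- 0.05 + 0.03 * d_exper + rnorm(n, sd = 0.1)

fit <- lm(d_lwage ~ d_educ + d_exper)
coef(fit)   # the entry for d_educ is NA: it is not identified
```

The `NA` is R's way of reporting perfect collinearity, here between the all-zero column and the rest of the design matrix.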
Two-Period with Panel Data Analysis
Panel data with more than 2 periods
First-differencing still works but is more complicated. We have to subtract period 1 from
period 2 and period 2 from period 3.
We won’t discuss first-differencing in panels with 3 or more periods further because the
data violate MLR.2.
When the $u_{it}$ are uncorrelated over time, $Cov(\Delta u_{i3}, \Delta u_{i2}) < 0$, so the errors in
first-differenced equations with 3 or more periods are negatively correlated.
Since this correlation is over time, we call it serial correlation.
Serial correlation is studied in time series analysis. Chapters 10–12 of Wooldridge
introduce the basic concepts.
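The sign of this covariance follows from a one-line calculation. The sketch assumes, as an added simplification, that the original errors $u_{it}$ are uncorrelated across periods and share a common variance $\sigma_u^2$.

```latex
\begin{aligned}
Cov(\Delta u_{i3}, \Delta u_{i2})
 &= Cov(u_{i3} - u_{i2},\; u_{i2} - u_{i1}) \\
 &= Cov(u_{i3}, u_{i2}) - Cov(u_{i3}, u_{i1}) - Var(u_{i2}) + Cov(u_{i2}, u_{i1}) \\
 &= 0 - 0 - \sigma_u^2 + 0 = -\sigma_u^2 < 0
\end{aligned}
```

The shared term $u_{i2}$ enters the two differences with opposite signs, which is what makes adjacent differenced errors move against each other.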
Two-Period with Panel Data Analysis
Difference-In-Difference with Panel Data
Researchers often use 2-period panels, where one group is the control and the other receives
treatment between periods 1 and 2.
Write this as
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 D_i + \delta_1 D_i \cdot T_t + a_i + u_{it}$$
$T_t$ is a time dummy, $D_i$ is a dummy for treatment, and $a_i$ is the unobserved effect.
Remove the unobserved effect via first-differencing:
$$\Delta y_i = \delta_0 + \delta_1 D_i + \Delta u_i$$
and again obtain $\delta_1$ as the difference between the average change in the treatment group and
the average change in the control group.
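That the first-differenced regression reproduces the difference between average changes can be verified on simulated data. The following sketch is built on assumptions: a made-up two-period panel with a true treatment effect of 2 and hypothetical variable names.

```r
# Two-period panel: half the units are treated between period 1 and period 2
set.seed(7)
n  <- 300
D  <- rep(c(0, 1), each = n / 2)                   # treatment-group dummy
a  <- rnorm(n) + D                                 # unobserved effect, correlated with D
y1 <- 5 + 1.0 * D + a + rnorm(n)                   # period 1 (time dummy = 0)
y2 <- 5 + 0.8 + 1.0 * D + 2.0 * D + a + rnorm(n)   # period 2, true delta1 = 2

dy <- y2 - y1                                      # first difference removes a and the D level
fd <- lm(dy ~ D)

diff_of_changes <- mean(dy[D == 1]) - mean(dy[D == 0])
all.equal(unname(coef(fd)["D"]), diff_of_changes)  # identical by construction
```

With a single dummy regressor, the OLS slope is the difference in group means of the dependent variable, so the equality is exact and not a feature of this particular sample.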
Two-Period with Panel Data Analysis
Difference-In-Difference with Panel Data
We could easily include characteristics as in
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 D_i + \delta_1 D_i \cdot T_t + \beta_2 x_{it} + a_i + u_{it}$$
It can happen that the characteristics are time-constant, i.e., $x_{it} = x_i$.
First-differencing would remove these covariates.
If we want to include these covariates, we let their impact vary over time by interacting them
with the time dummy:
$$y_{it} = \beta_0 + \delta_0 T_t + \beta_1 D_i + \delta_1 D_i \cdot T_t + \beta_2 T_t x_i + a_i + u_{it}$$
The equation in first differences is then
$$\Delta y_i = \delta_0 + \delta_1 D_i + \beta_2 x_i + \Delta u_i$$
Time-Demeaning
Model with Unobserved Heterogeneity
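$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$
$Cov(x_{it}, a_i) \neq 0$ → Endogeneity
Demeaning
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + u_{it} - \bar{u}_i$$
where $\bar{y}_i = \frac{1}{T} \sum_{t=1}^{T} y_{it}$, $\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$, and $\bar{u}_i = \frac{1}{T} \sum_{t=1}^{T} u_{it}$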
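The demeaned equation comes from averaging the model over time for each individual and subtracting. A brief derivation, written in the same notation, with the bars denoting averages over the $T$ periods of individual $i$:

```latex
% Average the model over t = 1, ..., T for a fixed i (a_i and \beta_0 do not vary with t):
\bar{y}_i = \beta_0 + \beta_1 \bar{x}_i + a_i + \bar{u}_i

% Subtract the averaged equation from the original one:
\begin{aligned}
y_{it} - \bar{y}_i
 &= (\beta_0 - \beta_0) + \beta_1 (x_{it} - \bar{x}_i) + (a_i - a_i) + (u_{it} - \bar{u}_i) \\
 &= \beta_1 (x_{it} - \bar{x}_i) + (u_{it} - \bar{u}_i)
\end{aligned}
```

Both $\beta_0$ and $a_i$ cancel because they are constant over time, which is also why any time-constant regressor would drop out of the demeaned equation.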
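Least Squares Dummy Variable (LSDV)
Suppose $i = 1, 2, 3$
$$y_{it} = \beta_0 + \beta_1 x_{it} + a_i + u_{it}$$
Including dummy variables
$$y_{it} = \beta_0 + \beta_1 x_{it} + \delta_0 t2 + \delta_1 t3 + u_{it}$$
where t2 and t3 are dummy variables for individuals 2 and 3 (individual 1 is the reference group)
Allows each individual to have its own intercept, representing heterogeneity
Estimates the heterogeneity effects
Requires a large N as the number of individuals i increases (one more dummy per additional individual)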
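The dummy-variable regression and the time-demeaned regression give the same slope, which can be checked directly. The sketch below uses a simulated panel with three individuals; the effect sizes, the true slope of -0.5 and the variable names are hypothetical.

```r
# Small simulated panel: 3 individuals, 10 periods each
set.seed(3)
n_id <- 3
n_t  <- 10
id <- rep(1:n_id, each = n_t)
a  <- c(0, 2, 4)[id]                        # individual effects (made-up values)
x  <- a + rnorm(n_id * n_t)                 # regressor correlated with the effects
y  <- 1 - 0.5 * x + a + rnorm(n_id * n_t, sd = 0.3)

# LSDV: one dummy per individual except the reference individual 1
lsdv <- lm(y ~ x + factor(id))

# Within estimator: subtract individual-specific means, then run OLS
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
within <- lm(y_dm ~ x_dm)

b_lsdv   <- unname(coef(lsdv)["x"])
b_within <- unname(coef(within)["x_dm"])
all.equal(b_lsdv, b_within)                 # same slope from both approaches
```

The slopes agree exactly, but the standard errors reported by the plain `lm()` on demeaned data do not account for the estimated individual means, so the dummy-variable fit (or a dedicated panel routine) is the safer source for inference.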
Least Squares Dummy Variable (LSDV)
Example: Wage and Education
library(plm)
library(wooldridge)  # provides the wagepan data
data(wagepan)
wagepan.p <- pdata.frame(wagepan, index=c("nr","year") )
pdim(wagepan.p)
Balanced Panel: n = 545, T = 8, N = 4360
Least Squares Dummy Variable (LSDV)
Example: Wage and Education
summary(plm(lwage~married+union+factor(year)*educ, data=wagepan.p, model="within"))
Oneway (individual) effect Within Model
Call:
plm(formula = lwage ~ married + union + factor(year) * educ,
data = wagepan.p, model = "within")
Balanced Panel: n = 545, T = 8, N = 4360
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-4.152111 -0.125630 0.010897 0.160800 1.483401
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
married 0.0548205 0.0184126 2.9773 0.002926 **
union 0.0829785 0.0194461 4.2671 2.029e-05 ***
factor(year)1981 -0.0224158 0.1458885 -0.1537 0.877893
factor(year)1982 -0.0057611 0.1458558 -0.0395 0.968495
factor(year)1983 0.0104297 0.1458579 0.0715 0.942999
factor(year)1984 0.0843743 0.1458518 0.5785 0.562965
factor(year)1985 0.0497253 0.1458602 0.3409 0.733190
factor(year)1986 0.0656064 0.1458917 0.4497 0.652958
factor(year)1987 0.0904448 0.1458505 0.6201 0.535216
Example: Crime Rate and Probability of Arrest
Example: Crime Rate and Probability of Arrest
Within each county, there is now a negative relationship.
Different intercepts (county 1 is the reference group) and a single common slope coefficient β
(you can see that the lines are parallel).
We are shifting the lines down from reference group 1.
The Within Transformation in R
                 xsect       dummy       demeaned
(Intercept)      0.009+      0.045***    0.000
                 (0.005)     (0.005)     (0.001)
prbarr           0.065***    -0.028*
                 (0.016)     (0.014)
demeaned_prob                            -0.028*
                                         (0.013)
N                28          28          28
R2               0.39        0.89        0.16
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
The estimate for prbarr is positive in the cross-section.
Taking care of the unobserved heterogeneity $a_i$, either by including an intercept for each i or
by time-demeaning the data, we obtain -0.028.
Interpreting the Within Estimates
How to interpret those negative slopes?
We look at a single unit i and ask:
if the arrest probability in i increases by 10 percentage points (i.e. from 0.2 to 0.3)
from year t to t + 1, we expect crimes per person to fall from 0.039 to 0.036, i.e., by
about 7.69 percent (in the reference county number 1).
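The arithmetic behind this statement can be reproduced in a few lines. The sketch takes the within estimate of -0.028 and the two predicted values (0.039 and 0.036) as given from the slides; the variable names are chosen here for illustration.

```r
slope <- -0.028                      # within estimate on prbarr
change <- slope * (0.3 - 0.2)        # implied change in crimes per person
change

before <- 0.039                      # predicted crimes per person at prbarr = 0.2 (from the slide)
after  <- 0.036                      # predicted crimes per person at prbarr = 0.3 (from the slide)
pct <- 100 * (after - before) / before
round(pct, 2)                        # percentage change relative to the starting level
```

The percentage is measured relative to the predicted level in county 1, so the same absolute change would correspond to a different percentage in a county with a different intercept.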
Concluding Remark
Next time we will discuss some applications of diff-in-diff in policy analysis. Please read the
paper “Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey
and Pennsylvania” by David Card and Alan B. Krueger.