ECON 6040
Problem Set
1. Consider the simple hypothetical example in Table 1. This example involves
eleven patients, each of whom is infected with COVID. There are two treatments:
ventilators (Y^1) and bedrest (Y^0). Table 1 displays each patient’s potential outcomes
in terms of years of post-treatment survival under each treatment. Larger outcome
values correspond to better health outcomes. [15 points]
Table 1: Hypothetical Example
Patient   Y^1   Y^0   Age
1           1    10    29
2           1     5    35
3           1     4    19
4           5     6    45
5           5     1    65
6           6     7    50
7           7     8    77
8           7    10    18
9           8     2    85
10          9     6    96
11         10     7    77
a) Calculate each unit’s treatment effect. [3]
b) What is the average treatment effect (ATE) for ventilators compared to bedrest?
Which type of intervention is more effective on average? [3]
c) Suppose the “perfect doctor” knows each patient’s potential outcomes and as a
result chooses the best treatment for each patient. If she assigns each patient to
the treatment more beneficial for that patient, which patients will receive ventilators
and which will receive bedrest? [3]
d) Calculate the simple difference in average outcomes that would obtain if treatment
assignment happened as in part (c). How similar is it to the ATE? [3]
e) Provide an example of how SUTVA might be violated for treatments of COVID. [3]
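
As a reminder of the definitions used in this question, the following is a minimal sketch of how unit-level treatment effects, the ATE, and the simple difference in observed outcomes can be computed from a table of potential outcomes. The four units and their values are hypothetical (they are not the values in Table 1), and the tie-breaking rule that sends a unit with zero effect to the untreated group is an assumption made only for this illustration.

```python
# Hypothetical potential outcomes for four units (NOT the values in Table 1).
y1 = [3, 8, 5, 2]  # outcome if treated
y0 = [6, 4, 5, 1]  # outcome if untreated

# Unit-level treatment effect: delta_i = Y1_i - Y0_i
effects = [a - b for a, b in zip(y1, y0)]

# Average treatment effect: mean of the unit-level effects
ate = sum(effects) / len(effects)

# "Perfect doctor" rule: treat a unit only if treatment strictly helps.
# Assumption for this sketch: ties (effect == 0) go to the untreated group.
treated = [d > 0 for d in effects]

# Only one potential outcome per unit is observed under this assignment.
observed = [a if t else b for a, b, t in zip(y1, y0, treated)]

treated_obs = [y for y, t in zip(observed, treated) if t]
control_obs = [y for y, t in zip(observed, treated) if not t]

# Simple difference in mean observed outcomes (treated minus untreated)
sdo = sum(treated_obs) / len(treated_obs) - sum(control_obs) / len(control_obs)

print("unit effects:", effects)
print("ATE:", ate)
print("simple difference in outcomes:", sdo)
```

In this toy example the simple difference and the ATE differ because assignment depends on the potential outcomes themselves.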
2. In this exercise you will estimate the effect of lecture attendance on academic
performance using the data in “ATTEND”. [20 points]
a) Use OLS to estimate a regression model relating stndfnl (the standardized final exam
score) to atndrte (the percent of lectures attended). Include the binary variables frosh
and soph as explanatory variables. Interpret the coefficient on atndrte, and discuss
its statistical significance. [5]
b) How confident are you that the OLS estimates from part (a) are estimating the causal
effect of attendance? Explain your answer. [5]
c) As proxy variables for student ability, add to the regression priGPA (prior cumulative
GPA) and ACT (achievement test score). Now what is the effect of atndrte?
Discuss how the effect differs from that in part (a). [5]
d) To test for a nonlinear effect of atndrte, add its square to the equation from part (c).
What do you conclude? [5]
3. In this exercise you will estimate the effect of cigarette smoking during pregnancy
on the weight of newborns using the data in “BWGHT”. Consider the following
specification:
log(bwght) = β0 + β1 male + β2 parity + β3 log(faminc) + β4 packs + u,    (1)
where male is a binary indicator equal to one if the child is male, parity is the birth
order of this child, faminc is family income, and packs is the average number of packs
of cigarettes smoked per day during pregnancy. [30 points]
a) Why might you expect packs to be correlated with u? [6]
b) Suppose that you have data on average cigarette price in each woman’s place of
residence. Discuss whether this information is likely to satisfy the properties of a
good instrumental variable for packs. [6]
c) Use the data in “BWGHT” to estimate equation (1) using OLS. [6]
d) Now estimate equation (1) using 2SLS, where cigprice is an instrument for packs.
Discuss how your OLS estimates compare to the 2SLS estimates. [6]
e) Estimate the reduced form for packs. What do you conclude about identification of
equation (1) using cigprice as an instrument for packs? [6]
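
The mechanics behind parts (c) to (e) can be sketched on simulated data. The example below is a hedged illustration only: it uses one endogenous regressor x, one instrument z, and no other controls, so it is not equation (1), and every variable name and parameter value is an assumption of the sketch. In this just-identified case the 2SLS slope equals the ratio cov(z, y) / cov(z, x), and the first-stage (reduced-form) slope of x on z shows whether the instrument is relevant.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# Simulated data (all values hypothetical). v is an unobserved confounder
# that enters both x and y, so x is correlated with the error in y.
rng = random.Random(0)
n = 20_000
z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
v = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
x = [0.8 * zi + vi + rng.gauss(0, 1) for zi, vi in zip(z, v)]
y = [1.0 - 0.5 * xi + vi + rng.gauss(0, 1) for xi, vi in zip(x, v)]  # true slope: -0.5

# OLS slope of y on x (inconsistent here because cov(x, v) != 0)
beta_ols = cov(x, y) / cov(x, x)

# First stage (reduced form for x): slope of x on z
pi_hat = cov(z, x) / cov(z, z)

# IV / 2SLS slope with a single instrument
beta_iv = cov(z, y) / cov(z, x)

print("OLS slope:        ", round(beta_ols, 3))
print("first-stage slope:", round(pi_hat, 3))
print("IV (2SLS) slope:  ", round(beta_iv, 3))
```

If the first-stage slope were close to zero, the denominator of the IV ratio would be close to zero as well, and the IV estimate would be very imprecise.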
4. For this exercise, use the data in “WAGEPAN”, a panel dataset of 545
men who worked every year from 1980 to 1987. Consider the wage equation:
log(wage_it) = β0 + β1 educ_i + β2 black_i + β3 hisp_i + β4 exper_it + β5 exper_it^2
               + β6 married_it + β7 union_it + c_i + u_it.    (2)
The variables are described in the dataset. Notice that education does not change
over time. [20 points]
(a) Estimate equation (2) by pooled OLS. Are the usual OLS standard errors reliable,
even if c_i is uncorrelated with all explanatory variables? Explain. Compute
appropriate standard errors. [6]
(b) Estimate equation (2) by Random Effects. Compare your estimates with the
pooled OLS estimates in part (a). [7]
(c) Estimate equation (2) by Fixed Effects. Compare your estimates with the RE
estimates in part (b). [7]
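
The remark that education does not change over time matters for the within (fixed effects) transformation, which subtracts each unit's time average from every variable. The sketch below applies that transformation to a hypothetical two-worker, three-year panel; the worker labels and all values are invented for illustration and are not taken from “WAGEPAN”.

```python
# Hypothetical mini-panel: two workers observed for three years each.
panel = {
    "worker_a": {"educ": [12, 12, 12], "exper": [1, 2, 3]},
    "worker_b": {"educ": [16, 16, 16], "exper": [4, 5, 6]},
}

def demean(values):
    """Subtract the unit's own time average from each observation."""
    m = sum(values) / len(values)
    return [v - m for v in values]

# Within transformation, applied variable by variable for each worker
within = {
    unit: {var: demean(vals) for var, vals in cols.items()}
    for unit, cols in panel.items()
}

for unit, cols in within.items():
    print(unit, cols)
```

After demeaning, a time-constant variable is identically zero for every worker, so it carries no within-unit variation.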
5. A researcher is concerned with estimating the effect of the level of unemployment
insurance benefits on the length of unemployment spells. She finds out that the US
State Blue recently changed its unemployment insurance programme so that workers with
earnings above a certain threshold (group H) will receive higher benefits if they become
unemployed, whereas for workers below the earnings threshold (group L) unemployment
benefits remain unchanged. The researcher collects information on average unemployment
duration (in weeks) in State Blue and neighbouring State Red for both groups of
workers (H and L), for the year before and the year after the policy change. [15 points]
            State Blue          State Red
            Before   After      Before   After
Group H     15.8     16.9       15.2     15.4
Group L     17.1     17.6       16.8     17.1
(a) Using the data provided, construct two alternative difference-in-difference
estimates of the effect of unemployment benefits on unemployment duration. Discuss
the key assumptions underlying the validity of your estimates in each case. [5]
(b) Using the data provided, construct a difference-in-difference-in-difference estimate
of the effect of unemployment benefits on unemployment duration. Discuss the
assumption underlying the validity of this estimate. [10]
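
For reference, the arithmetic of these estimators can be sketched as follows. The numbers are hypothetical (they are not the values in the table above), and the names `did`, `ddd`, state "A" (policy change), state "B" (no change), and groups "hi" (affected) and "lo" (unaffected) are labels invented for this illustration.

```python
def did(treat_before, treat_after, ctrl_before, ctrl_after):
    """Difference-in-differences: change for the treated cell minus
    change for the comparison cell."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)

# Hypothetical cell means, stored as (before, after).
means = {
    ("A", "hi"): (10.0, 13.0),
    ("A", "lo"): (11.0, 12.0),
    ("B", "hi"): (9.0, 10.0),
    ("B", "lo"): (10.0, 10.5),
}

# DiD 1: same group, policy state versus comparison state
did_across_states = did(*means[("A", "hi")], *means[("B", "hi")])

# DiD 2: same state, affected group versus unaffected group
did_within_state = did(*means[("A", "hi")], *means[("A", "lo")])

def ddd(cells):
    """Triple difference: the hi-versus-lo DiD in the policy state minus
    the same hi-versus-lo DiD in the comparison state."""
    in_policy_state = did(*cells[("A", "hi")], *cells[("A", "lo")])
    in_comparison_state = did(*cells[("B", "hi")], *cells[("B", "lo")])
    return in_policy_state - in_comparison_state

print("DiD across states:", did_across_states)
print("DiD within state: ", did_within_state)
print("DDD:              ", ddd(means))
```

Each estimator differences out a different set of confounding trends, which is why the identifying assumption differs across the three.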