SPRING 2022 - ECONOMETRICS
ECON-SHU 9301
PROBLEM SET 5
Distribution Date: May 2nd
Due Date: May 11th (by 1:00 PM Florence time, CEST, GMT+2)
Relevant Chapters: Chapters 8, 10 and 13 of Wooldridge [2018]
Instructor Details:
Prof. Giampiero M. Gallo (email: giampiero.gallo@nyu.edu)
Graders:
Marta Korczak (email: marta.korczak@nyu.edu)
Belén Rodríguez (email: belen.rodriguez@nyu.edu)
Instructions:
This problem set contains a mix of multiple choice and longer questions. For both categories you
have to show all relevant details (or the intuition) leading to the answer. Everything can be solved
with a pocket calculator (apart from the questions directly related to R, which can also be used
as a calculator). Please submit your answers in the form of a single PDF file uploaded through
Brightspace Assignments in the corresponding PSet folder. Start working on the answers at your
earliest convenience, in order to make sure there is enough time to clarify possible doubts. Send an
email to Prof. Gallo, Marta and Philip if you have relevant questions about the problem set. We will
post the solutions right after the deadline. Part of the learning experience is to be able to evaluate
whether your answers are in line with the suggested solutions, so keep a copy. You are allowed to
work in groups, but you need to submit your own solution (no copying and pasting from each other).
Meeting the deadline is mandatory: we allow for a few minutes grace period, but, as stated in the
syllabus, once solutions are out (approximately an hour past the deadline) no submissions will be
considered.
Breakdown of points:
Question:   1   2   3   4   5   6   7   8   9  10  11  12  Total
Points:     5   5   5   5   5   5   5   5  15  15  15  15    100
Score:
References:
Jeffrey M. Wooldridge. Introductory Econometrics: A Modern Approach. Upper Level Economics
Titles. Cengage Learning, 7th edition, 2018. ISBN 9781337558860.
Good luck!
ECON-SHU 9301 PROBLEM SET 5 SPRING 2022
MULTIPLE CHOICE SECTION
You should pick one answer to each question and provide at least a one sentence explanation.
1. (5 points) A long-standing issue in the literature of labour economics is how the arrival of the
first child influences a mother’s career. Suppose that you want to study the specific effect of the
first newborn on the mothers’ earnings in a country. We propose the following regression model:
$$wage = \beta_0 + \beta_1 \cdot 1_{Postbirth} + \beta_2 educ + \beta_3 age + \beta_4 exper + u \tag{1}$$
where wage measures the monthly wage in euros and 1Postbirth is a binary variable that takes value
one in all periods after the birth of the first child, and zero otherwise. The remaining variables
are the usual controls for years of education, working experience in the labor market, and the
mother’s age. Based on this information, say which of the following statements is true.
(a) The best data set we could have to answer this question would be two independently pooled
cross-sections gathering information on the variables in Equation 1 for a sample of new
mothers in a country.
(b) The best data set we could have to answer this question would be a time series, aggregating
the values of the variables in Equation 1 across a sample of new mothers in a country during
at least two periods.
(c) The best data set we could have to answer this question would be a longitudinal data set
gathering information on the variables in Equation 1 for the same sample of new mothers
in a country during at least two periods.
(d) The best data set we could have to answer this question would be a single cross-section
gathering information on the variables in Equation 1 for a sample of women (new mothers
and childless women) working in a country, since you could just compare the differences in
earnings between those mothers who gave birth and those who did not.
2. (5 points) Suppose that your colleague estimates the model in Equation 1 by OLS, getting the
usual standard errors. Reviewing her results, you see the graph shown in Figure 1.
Say which of the following statements are true based on this figure.
(a) The OLS estimators for the parameters in Equation 1 are for sure biased.
(b) The error term in Equation 1 is probably heteroskedastic.
(c) You can still take as valid the t-test and p-value results for all the variables except age, since
this graph only suggests that the error term may be heteroskedastic in age.
(d) A strategy to reduce the potential problem of heteroskedasticity would be to transform our
dependent and independent variables into logarithms.
(e) This graph is only telling us that our model performs badly in explaining the variation in
earnings for older women in our sample.
3. (5 points) Consider the same settings as in Questions 1 and 2. Based on Figure 1, you decide
to run the following regression in order to test whether there is heteroskedasticity
$$\hat{u}^2 = \delta_0 + \delta_1 age + \delta_2 educ + \nu \tag{2}$$
where û² is the squared OLS residual from estimating Equation 1. Say which of
the following statements is true:
Figure 1: Squared fitted residuals against the mother’s age.
(a) Suppose that the joint hypothesis H0 : δ1 = δ2 = 0 is rejected at the 5% significance
level. This is the same as rejecting the hypothesis of having homoskedastic standard errors.
(b) The degrees of freedom of the F-statistic that tests the joint significance of the parameters
δ1, δ2 are n-k-1, where n is the sample size and k is the total number of parameters considered
in the original model in Equation 1.
(c) We could transform the variables in Equation 2 into logarithms in order to better interpret
the coefficients.
(d) We could skip this step and directly perform the Breusch-Pagan test for heteroskedasticity.
4. (5 points) Consider the same setting as in Question 1, and suppose that you make the following
assumptions:
(ASS1) $E[u \mid 1_{Postbirth}, educ, age] = 0$
(ASS2) $Var(u \mid 1_{Postbirth}, educ, age) = \sigma^2 age^2$
How can you transform Equation 1 so that the transformed error term is homoskedastic?
(a) Divide all variables, including the intercept term, by the square root of age.
(b) Divide only the right-hand side variables by the square root of age.
(c) Add age² as an independent variable to the model.
(d) Divide all variables, including the intercept term, by age.
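As general background on weighted least squares (a sketch of the standard textbook argument, not a worked answer to this question), suppose the error variance is proportional to a known positive function of the regressors. The symbols $h_i$, $x_{ij}$ and $k$ are introduced here for illustration only and do not appear in the problem set.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i,
  \qquad Var(u_i \mid x_i) = \sigma^2 h_i, \quad h_i > 0, \\
\frac{y_i}{\sqrt{h_i}} &= \beta_0 \frac{1}{\sqrt{h_i}}
  + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \dots
  + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}, \\
Var\!\left( \frac{u_i}{\sqrt{h_i}} \,\Big|\, x_i \right)
  &= \frac{1}{h_i}\, \sigma^2 h_i = \sigma^2 .
\end{aligned}
```

Note that the transformed equation has no constant in the usual sense: the original intercept becomes the coefficient on $1/\sqrt{h_i}$.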
5. (5 points) Consider the following equation estimated on monthly data:
$$y_t = \alpha_0 + \alpha_1 x_t + \alpha_2 x_{t-1} + u_t$$
Which of the following statements is correct?
(a) The long-run multiplier is equal to α0 + α1 + α2.
(b) We might have a problem with seasonal effects even if our variables are transformed into
year-on-year percentage rates of change (each month relative to the same month of the previous year).
(c) The impact multiplier is equal to α0 + α1.
(d) The impact of a shock in January on the level of yt in July would suggest the presence of
heteroskedasticity.
6. (5 points) Consider the following finite-lag time series model:
$$y_t = \alpha_0 + \delta_0 z_t + \delta_1 z_{t-1} + \delta_2 z_{t-2} + u_t.$$
You are interested in the effect on y of a temporary change in z at time t = 0. You normalize
the initial value of y to y_{t−1} = 0, so that any observed change can be attributed to the change in z.
The graph below plots your estimated lag distribution:
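[Figure: estimated lag distribution, plotting the coefficient δj (vertical axis) against the lag j = 0, 1, 2, 3, 4 (horizontal axis).]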
Which of the following statements is false:
(a) The value of the impact multiplier is greater than the sum of the values of lagged effects.
(b) The strongest effect of the temporary change in z is after one period.
(c) Our predicted value for y_{t+3} is equal to the value before the shift.
(d) We do not find effects after the second lag because we assume that only two lags of z
appear in our model.
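To make the notions of lag distribution, impact multiplier and long-run propensity concrete, here is a minimal Python sketch. Python is used only to illustrate the arithmetic, since the problem set itself relies on R. The coefficient values and the helper names `temporary_response` and `permanent_response` are hypothetical, and the numbers are not read off the figure.

```python
from itertools import accumulate

# Hypothetical lag coefficients (delta_0, delta_1, delta_2); NOT taken from the figure.
deltas = [0.4, 0.25, 0.1]


def temporary_response(deltas, horizon):
    """Change in y at t, t+1, ..., t+horizon after a one-unit change in z
    that lasts for period t only (other z values and u held fixed)."""
    return [deltas[h] if h < len(deltas) else 0.0 for h in range(horizon + 1)]


def permanent_response(deltas, horizon):
    """Change in y after a one-unit change in z that persists from t onwards:
    the running sum of the lag coefficients."""
    return list(accumulate(temporary_response(deltas, horizon)))


impact_multiplier = deltas[0]      # effect in the period of the change
long_run_propensity = sum(deltas)  # total effect of a permanent change

print(temporary_response(deltas, 4))
print(permanent_response(deltas, 4))
print(impact_multiplier, long_run_propensity)
```

With q lags in the model, the response to a temporary change is zero by construction from period t + q + 1 onwards.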
7. (5 points) You are interested in understanding whether parental education has an effect on
children’s school outcomes. You propose the following linear model:
$$mathscore_{it} = \beta_0 + \beta_1 fatheduc_{it} + \beta_2 income_{it} + a_i + u_{it} \tag{3}$$
where mathscore_it measures the average score in math of child i in period t, fatheduc_it measures
the years of education of the parents in period t, income_it is a measure of the net yearly earnings
of the child’s household in period t, and a_i is the individual fixed effect. You decide to take
first differences of (3) to remove the individual fixed effects, which yields:
$$\Delta mathscore_{it} = \beta_1 \Delta fatheduc_{it} + \beta_2 \Delta income_{it} + \Delta u_{it} \tag{4}$$
Your colleague thinks that you will have problems in estimating the parameter β1 if you estimate
Equation 4 by OLS. What could be the problem she is referring to?
8. (5 points) A strand of urban economics studies the optimal design of cities. One of its main
concerns is the reduction of commuting times within cities, especially commuting times
to work. You want to study the impact on commuting times of the opening of a new metro line
connecting an outlying neighborhood with the business area. You collect data on the average daily
minutes spent travelling to work in the periods before and after the opening of the metro line
for a sample of dwellers in your city. You propose a two-group, two-period difference-in-differences
identification strategy using the following model:
$$wrktime = \beta_0 + \delta_0 d2 + \beta_1 dIn + \delta_1 d2 \cdot dIn + u$$
where wrktime measures the average daily commuting time to work (in minutes), dIn is a dummy
variable indicating whether an individual lives inside the outlying neighborhood where the metro
line has opened, and d2 is a binary variable that stands for the post-treatment period. State
which of the following statements is false (Hint: there could be more than one false statement.):
(a) The parameter δ1 captures the effect of interest.
(b) The sum β0 + β1 measures the average commuting time to work of dwellers in the
affected neighborhood before the opening of the metro line.
(c) The sum β1 + δ1 measures the difference in the average commuting time to work
between dwellers living inside and outside the affected neighbourhood after the opening of
the metro line.
(d) The parameter β1 captures the effect of interest.
(e) All the above statements are correct.
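In a two-group, two-period model with no further controls, OLS fits the four group-by-period sample means exactly, so each coefficient can be computed from those means. The Python sketch below illustrates that arithmetic. Python is used only for illustration, since the problem set itself relies on R. The commuting times and the dictionary `mean_time` are hypothetical and do not come from any data set.

```python
# Hypothetical average commuting times in minutes; illustrative numbers only.
mean_time = {
    ("outside", "before"): 40.0,
    ("outside", "after"): 38.0,
    ("inside", "before"): 55.0,
    ("inside", "after"): 45.0,
}

beta0 = mean_time[("outside", "before")]          # control group, pre-period
delta0 = mean_time[("outside", "after")] - beta0  # change over time, control group
beta1 = mean_time[("inside", "before")] - beta0   # pre-period gap between groups

change_inside = mean_time[("inside", "after")] - mean_time[("inside", "before")]
change_outside = mean_time[("outside", "after")] - mean_time[("outside", "before")]
delta1 = change_inside - change_outside           # difference-in-differences

# The four coefficients reproduce the remaining cell mean exactly.
fitted_inside_after = beta0 + delta0 + beta1 + delta1
print(beta0, delta0, beta1, delta1, fitted_inside_after)
```

The same decomposition can be used to check a regression output by hand: each cell mean is the sum of the coefficients whose dummies equal one in that cell.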
LONGER QUESTIONS
9. (15 points) The data set sleep75 from the wooldridge R package contains information on
individuals’ time use. You propose the following model to understand the determinants of workers’
sleeping behaviour.
$$sleep = \beta_0 + \beta_1 totwrk + \beta_2 educ + \beta_3 age + \beta_4 age^2 + \beta_5 yngkid + \beta_6 male + u$$
where sleep measures individuals’ sleeping time per week (in minutes) and totwrk measures the
time spent at work per week (in minutes). The remaining variables are the usual
controls for education (years of education) and age, together with dummies for gender (male) and
for whether an individual has a young kid (yngkid).
(a) Suppose that you estimate the model above by standard OLS regression, that
is, assuming homoskedasticity. Do you think that the estimated standard errors of the
coefficients are right? Why?
(b) Write down a model that allows the variance of the error term to differ between men
and women. The variance should not depend on other factors. (Hint: Consider defining
Var(u | totwrk, educ, age, yngkid, male) = Var(u | male) as a linear function of male.)
(c) Load the sleep75 data set in RStudio and estimate the parameters of the model above
by OLS. Is the estimated variance of the error term higher for men or for women?
(d) Run the auxiliary regression that allows you to determine whether the difference in the
variance of the error term between women and men is statistically significant.
(e) Making use of the previous steps, run the Breusch-Pagan test for heteroskedasticity manually.
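The mechanics of a manual Breusch-Pagan test can be sketched as follows: estimate the model by OLS, regress the squared residuals on the regressors, and build the LM statistic n·R² (approximately chi-squared with k degrees of freedom under homoskedasticity) or the F statistic (with k and n − k − 1 degrees of freedom) from the R² of that auxiliary regression. The Python sketch below is a simplified illustration with a single regressor (k = 1) and made-up data. It does not use sleep75, the helper `simple_ols` is hypothetical, and the problem set itself expects the computation in R.

```python
def simple_ols(x, y):
    """OLS of y on a constant and one regressor x.
    Returns (intercept, slope, residuals, R-squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum(e ** 2 for e in resid)
    return intercept, slope, resid, 1.0 - ssr / sst


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.8, 3.4, 3.5, 5.9, 5.0, 8.6, 6.3]

# Step 1: original regression, keep the residuals.
_, _, u_hat, _ = simple_ols(x, y)

# Step 2: auxiliary regression of the squared residuals on the regressor.
u_sq = [e ** 2 for e in u_hat]
_, _, _, r2_aux = simple_ols(x, u_sq)

# Step 3: test statistics built from the auxiliary R-squared.
n, k = len(x), 1
lm_stat = n * r2_aux
f_stat = (r2_aux / k) / ((1.0 - r2_aux) / (n - k - 1))
print(lm_stat, f_stat)
```

The statistics are then compared with the relevant chi-squared or F critical values. With several regressors, the auxiliary regression includes all of them and k is their number.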
10. (15 points) Heteroskedasticity. Load the gpa1 data set from the wooldridge R package in RStudio.
This data set contains information on students’ GPA as well as other characteristics.
Use this data set to answer the following questions.
(a) Estimate by OLS the model relating college GPA (colGPA) to the GPA of high school
(hsGPA), the achievement score (ACT ), the weekly average number of lectures skipped
(skipped) and to whether students have a personal computer available at the school (PC).
(b) Plot the squared fitted residuals against the different controls included in your model. Do
you have suggestive evidence of the existence of heteroskedasticity?
(c) Compute the special case of the White test for heteroskedasticity. Obtain the fitted values
from the auxiliary regression of $\hat{u}_i^2$ on $\widehat{colGPA}_i$ and $\widehat{colGPA}_i^2$, and store them under the
name $\hat{h}$.
(d) Are the fitted values from the previous part all strictly positive? If so, obtain the Weighted
Least Squares estimates using individual weights $1/\sqrt{\hat{h}_i}$. Compare the WLS estimates with
the corresponding OLS estimates. Is there any coefficient that is no longer statistically
significant?
(e) Suppose that the variance function that you used in the previous part is misspecified.
Obtain the heteroskedasticity-robust standard errors. Do the standard errors change much
relative to those obtained by WLS?
11. (15 points) Time trends. Load the traffic2 data set from the wooldridge R package in RStudio. This
data set contains 108 monthly observations on total automobile accidents per month (totacc)
and some other variables for California from January 1981 through December 1989. This
data set also includes two dummy variables that indicate the introduction of the mandatory use
of seatbelts (beltlaw) and the increase of the highway speed limit to 65 miles per hour (spdlaw). Use
this data set to answer the following questions.
(a) Looking at the data, during what month and year did these two laws take effect in California?
(b) Regress the logarithm of total accidents (ltotacc) on a linear time trend and 11 monthly
dummy variables, using January as the base month. Interpret the coefficient estimate on
the time trend. Would you say there is seasonality in total accidents? In which period of
the year does it occur? What testing procedure would you use to formally answer that
question? Does this seasonal pattern make sense?
(c) Add to the regression from part (b) the number of weekends in the month (wkends), the
unemployment rate (unem) and the dummy variables indicating whether the
two driving laws of part (a) were in place. Discuss your estimates. Does the estimated
coefficient on the unemployment variable make sense to you? What about the estimates for
the coefficients on spdlaw and beltlaw?
(d) The variable prcfat measures the percentage of accidents resulting in at least one fatality.
Plot the evolution of this variable over the observed period. Does the magnitude of this
variable seem about right? (Hint: Note that this variable is a percentage, not a proportion.)
(e) Re-run the regression from part (c), this time using prcfat as the outcome variable. Interpret
the estimates that you get now. Is there an explanation that reconciles the estimates that we
obtain here with those in part (c)?
12. (15 points) A quasi-experimental study. In the summer of 1980, an estimated 125,000
emigrants from Cuba entered the United States (the Mariel Boatlift), most of whom settled in
the Miami metropolitan region and, therefore, did not move to other cities. Economists have
long made use of this quasi-experimental set-up to study the effect of immigration on the labor
market. The file cps_boatlift.dta contains data in Stata format from the Current Population
Survey on individual characteristics and labor market outcomes (earnings and labor force status)
of respondents in Miami and in four additional cities: Atlanta, Houston, Los Angeles, and
Tampa-St. Petersburg. Based on this information, answer the following questions:
(a) Open the data set in RStudio and inspect the data frame. Based on the variables indicating
year and location, which type of data do you think we are dealing with and why: an
independently pooled cross section or a longitudinal data set?
(b) As stated in Wooldridge, a quasi-experimental setting occurs when some exogenous event,
like the arrival of the Marielitos in this case, changes the policy or the environment in which
individuals operate. In order to evaluate the impact of this type of change, it is
important to identify four elements: (i) the treated group, or the group of individuals
affected by the policy/environment change; (ii) the control group, or the comparable group
of individuals that are not affected by the change; (iii) the pre-treatment period, or the
baseline period before the policy change; and (iv) the post-treatment period, or the period
right after the policy change takes place. Identify these four elements in our case study.
Which variables help you to pin them down in our data set? Create a dummy variable that
takes value one in the post-treatment period and zero otherwise. (Hint: there are several ways
to create a dummy variable, e.g. the command ifelse(argument, 1, 0).)
(c) Before analysing the impact of any policy change, it is always advisable to characterize
the environment in the pre-treatment period. Present descriptive statistics on sample size,
mean education (educ), employment status (emp) and hourly earnings (earn) in our treated
and control groups before the arrival of the Mariel Boatlift. (Hint: you can specify the
option data=subset( year == 'x') when applying the function aggregate to compute the
average for a specific subset of the sample.)
(d) Test whether there were significant differences in the average values of the variables in part
(c) between the treated and control groups in the baseline period. (Hint: you can specify
the option subset when using the command lm, too.) Do your results call for caution
when you study the effect of the subsequent arrival of the Marielitos on Miami’s labor
market?
(e) Write down the regression equation that allows you to estimate the effect of the Mariel Boatlift
on employment and earnings in the Miami labor market using the difference-in-differences
estimator. Estimate the regression. What is the effect of the arrival of the Mariel Boatlift
on these outcomes? Is it statistically significant?
(f) Does the inclusion of additional regressors, such as age, sex, ethnicity, and educ (years of
schooling), change our estimate of the treatment effect in part (e)? Does our estimated effect
on wage and unemployment make sense from an economic point of view? What could be
going on?