Dr. Maria Kyriacou
Department of Economics
University of Southampton
Semester 2, 2021-2022
ECON2007: Econometrics with Big Data: Coursework 2
You are required to answer all questions A–E. This piece of Coursework will be assessed
and will count 15% towards your final grade. You need to submit your answers (accompanied
by a Cover Sheet) by Wednesday, 27th of April 2022 at 18:00 via Turnitin.
Only individual submissions are allowed for this Coursework. You are responsible
for submitting your answers and uploading them to Turnitin on time. Answers must
be typeset as a Word document or in .pdf format. You are encouraged to use the “copy as
picture” option to copy the Stata outputs from the software into your submission
document. You may use a stylus pen to hand-draw plots and annotate formulae (for
example, drawing a distribution plot, or adding a hat as in β̂ or a superscript/subscript).
No late submissions are allowed.
This coursework focuses on empirical questions and uses the dataset earnings.dta,
which is in Stata format (.dta). You need to include all the Stata outputs you obtain in your
completed answers.
Using the dataset earnings.dta, we are interested in analyzing the effects of schooling
(S), experience (EXP), ability (SAT), migration status (MIGRATION) and gender
(MALE) on earnings (EARNINGS). The variables S and EXP denote the completed
years of schooling and work experience of the respondent, respectively. SAT is the
SAT exam score obtained by the respondent. MIGRATION is a dummy variable with
MIGRATION = 1 if the respondent comes from a migrant family and zero otherwise.
Similarly, MALE = 1 if the respondent is male and zero otherwise.
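For reference, the population model behind questions A to E can be written as below. The coefficient labels β0 to β5, the observation index i and the error term u are notation assumed here; the handout itself does not name them.

```latex
\ln(\mathit{EARNINGS}_i) = \beta_0 + \beta_1 S_i + \beta_2 \mathit{EXP}_i + \beta_3 \mathit{SAT}_i
  + \beta_4 \mathit{MIGRATION}_i + \beta_5 \mathit{MALE}_i + u_i
```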
A. Perform an OLS regression of the logarithm of EARNINGS on S, EXP, SAT,
MIGRATION and MALE.
i. Write down the fitted model.
ii. Comment on the statistical significance of each estimated coefficient (specify
the significance levels). What can you conclude about the overall significance
of this model?
iii. Plot the squared residuals from the OLS regression against EXP. What do you
observe?
iv. Perform the same OLS regression, this time using heteroskedasticity-robust
standard errors. What do you conclude?
v. Perform a RESET test to check whether there is an omitted variables issue in
the model in (i.). Briefly discuss your results. (Hint: You can either use the
ovtest command or run your own specification of the RESET test. Specify the
null and alternative hypotheses, the test statistic, etc., to justify your answer.)
vi. Perform an OLS regression of the logarithm of EARNINGS on S and EXP.
Comment on the estimated output and perform a RESET test on this specification.
How do your results compare with the results in (v.)?
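The difference between parts iii and iv can be illustrated with a minimal Python/NumPy sketch (the coursework itself must be done in Stata). It assumes that Stata's robust option for regress uses the HC1 sandwich estimator, which rescales White's estimator by n/(n-k); the function name and the simulated data are hypothetical. When the error spread grows with the regressor, the coefficients are unchanged and only the standard errors move.

```python
import numpy as np

def ols_conventional_and_hc1(X, y):
    """OLS coefficients with conventional and HC1 (heteroskedasticity-robust) SEs.
    X is an n x k design matrix that already contains a column of ones."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Conventional SEs assume a constant error variance: s^2 (X'X)^{-1}
    s2 = resid @ resid / (n - k)
    se_conv = np.sqrt(np.diag(s2 * XtX_inv))
    # HC1 sandwich: n/(n-k) * (X'X)^{-1} [sum_i e_i^2 x_i x_i'] (X'X)^{-1}
    meat = (X * resid[:, None] ** 2).T @ X
    V_hc1 = (n / (n - k)) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(V_hc1))
    return beta, se_conv, se_hc1

# Simulated (hypothetical) data: the error spread grows with |x|
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta, se_conv, se_hc1 = ols_conventional_and_hc1(X, y)
print("beta:           ", beta)
print("conventional SE:", se_conv)
print("HC1 robust SE:  ", se_hc1)
```

In this simulation the robust standard error of the slope comes out larger than the conventional one, which understates the uncertainty when the error variance is not constant.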
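The RESET test in part v can be sketched in the same way. This assumes the ovtest variant that adds the second, third and fourth powers of the fitted values to the regression and tests them jointly with an F test; reset_F and the simulated data are illustrative only. Under H0 (no misspecification) the added terms have zero coefficients, so a large F leads to rejection.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS fitted values and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return X @ beta, resid @ resid

def reset_F(X, y, powers=(2, 3, 4)):
    """Ramsey RESET F statistic.
    H0: the coefficients on the added powers of the fitted values are all zero;
    H1: at least one is nonzero. X is the n x k design matrix incl. intercept."""
    n, k = X.shape
    yhat, rss_r = ols_fit(X, y)                               # restricted model
    aug = np.column_stack([X] + [yhat ** p for p in powers])
    _, rss_u = ols_fit(aug, y)                                # augmented model
    q = len(powers)
    df2 = n - k - q
    F = ((rss_r - rss_u) / q) / (rss_u / df2)
    return F, q, df2

# Simulated (hypothetical) data
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0, 3, n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 2.0 * x + rng.normal(0, 1, n)   # linear model is correct
y_curved = np.exp(x) + rng.normal(0, 1, n)       # linear model is misspecified

F_ok, q, df2 = reset_F(X, y_linear)
F_bad, _, _ = reset_F(X, y_curved)
print(f"correct spec: F({q}, {df2}) = {F_ok:.2f}")
print(f"misspecified: F({q}, {df2}) = {F_bad:.2f}")
```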
B. We suspect that SAT might not be a good measure of ability and might suffer from
measurement error. Perform an IV regression of the logarithm of EARNINGS on
S, EXP, SAT, MIGRATION and MALE, using the years of schooling of the respondent's
mother (variable SM in the dataset) as an instrument for SAT.
(Hint: The Stata command for IV regression is ivregress.)
How can you check whether SM satisfies the relevance condition as an instrument?
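One common way to check relevance is to look at the first-stage regression of SAT on SM and the exogenous regressors and test whether the coefficient on SM is zero (Stata reports first-stage diagnostics via estat firststage after ivregress). The sketch below, with hypothetical names and simulated data, computes that first-stage F statistic; it also handles several excluded instruments, as in part C. A frequently used rule of thumb treats an F below about 10 as a sign of a weak instrument.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an OLS regression of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

def first_stage_F(x_endog, W, Z):
    """F statistic for H0: all excluded instruments Z have zero coefficients
    in the first-stage regression of x_endog on [W, Z].
    W: exogenous regressors incl. the intercept; Z: excluded instrument(s)."""
    n = len(x_endog)
    Z = np.asarray(Z, dtype=float).reshape(n, -1)
    m = Z.shape[1]
    full = np.column_stack([W, Z])
    rss_r = rss(W, x_endog)      # first stage without the instruments
    rss_u = rss(full, x_endog)   # first stage with the instruments
    df2 = n - full.shape[1]
    return ((rss_r - rss_u) / m) / (rss_u / df2)

# Simulated (hypothetical) data
rng = np.random.default_rng(2)
n = 1000
w = rng.normal(size=n)           # an exogenous regressor
z_strong = rng.normal(size=n)    # instrument correlated with x
z_useless = rng.normal(size=n)   # instrument unrelated to x
x = 0.5 * z_strong + w + rng.normal(size=n)
W = np.column_stack([np.ones(n), w])

F_strong = first_stage_F(x, W, z_strong)
F_useless = first_stage_F(x, W, z_useless)
print(f"relevant instrument:   F = {F_strong:.1f}")
print(f"irrelevant instrument: F = {F_useless:.2f}")
```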
C. Now assume that, together with mother’s schooling, SM, we have other possible
instruments for SAT. Specifically, together with SM, we have data for the years of
schooling of the father (SF), the number of brothers and sisters (SIBLINGS) and a
dummy variable, LIBRARY, where LIBRARY = 1 if a member of the respondent's family
had a library card when the respondent was 14, and zero otherwise.
Discuss the advantages of using SM, SF, SIBLINGS and LIBRARY as instruments
for SAT, rather than using SM only. In particular, describe mathematically
the procedure known as Two-Stage Least Squares (2SLS).
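A compact way to write 2SLS, assuming the matrix notation below (X, Z, y, π̂ and P_Z are labels introduced here, not in the handout): the first stage replaces SAT by its fitted value from a regression on all exogenous variables, and the second stage runs OLS of ln(EARNINGS) on the resulting regressors. With four excluded instruments for one endogenous regressor the model is overidentified (three overidentifying restrictions), and adding instruments cannot lower the first-stage R² relative to using SM alone. Correct 2SLS standard errors use the residuals y - Xβ̂, not those of the literal second-stage regression.

```latex
\begin{aligned}
X &= [\,\mathbf{1},\ S,\ \mathit{EXP},\ \mathit{SAT},\ \mathit{MIGRATION},\ \mathit{MALE}\,] \quad (n \times 6),\\
Z &= [\,\mathbf{1},\ S,\ \mathit{EXP},\ \mathit{MIGRATION},\ \mathit{MALE},\ \mathit{SM},\ \mathit{SF},\ \mathit{SIBLINGS},\ \mathit{LIBRARY}\,] \quad (n \times 9),\\
\text{Stage 1:}\quad \widehat{\mathit{SAT}} &= Z\hat{\pi}, \qquad \hat{\pi} = (Z'Z)^{-1}Z'\,\mathit{SAT},\\
\hat{X} &= [\,\mathbf{1},\ S,\ \mathit{EXP},\ \widehat{\mathit{SAT}},\ \mathit{MIGRATION},\ \mathit{MALE}\,] = P_Z X, \qquad P_Z = Z(Z'Z)^{-1}Z',\\
\text{Stage 2:}\quad \hat{\beta}_{2SLS} &= (\hat{X}'\hat{X})^{-1}\hat{X}'y = (X'P_Z X)^{-1}X'P_Z\,y, \qquad y = \ln(\mathit{EARNINGS}).
\end{aligned}
```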
D. Using the dataset, perform the 2SLS regression of the logarithm of EARNINGS on S,
EXP, SAT, MIGRATION and MALE using all the instruments for SAT defined
in (C.), and compare the results with the outcomes in parts (A.) and (B.).
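The same two stages can be sketched in Python/NumPy. The function tsls and the simulated data are hypothetical: true ability a drives the outcome, the SAT-like proxy equals a plus noise, and two instruments are correlated with a but not with that noise. In this setup OLS on the proxy is attenuated towards zero while 2SLS recovers the true slope of 0.5. The point estimates follow the 2SLS formula; the standard errors here use an n - k divisor, which may differ slightly from Stata's ivregress default.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares.
    X: n x k regressors (intercept, exogenous and endogenous columns).
    Z: n x l instruments (intercept, exogenous regressors and excluded
       instruments), with l >= k."""
    # Stage 1: regress every column of X on Z; exogenous columns that are
    # also in Z are reproduced exactly, endogenous ones become fitted values
    pi, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ pi
    # Stage 2: OLS of y on the first-stage fitted regressors
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    # Residuals must use the original X, not X_hat
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X_hat.T @ X_hat)))
    return beta, se

# Simulated (hypothetical) example: true ability a is measured with error
rng = np.random.default_rng(3)
n = 10_000
z1, z2 = rng.normal(size=n), rng.normal(size=n)      # two instruments
a = 0.7 * z1 + 0.7 * z2 + 0.5 * rng.normal(size=n)   # true ability
sat = a + rng.normal(size=n)                         # noisy proxy for a
y = 1.0 + 0.5 * a + rng.normal(size=n)               # outcome depends on a

X = np.column_stack([np.ones(n), sat])
Z = np.column_stack([np.ones(n), z1, z2])

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_iv, se_iv = tsls(y, X, Z)
print("OLS slope :", beta_ols[1])   # attenuated towards zero
print("2SLS slope:", beta_iv[1], "SE:", se_iv[1])
```

If Z is set equal to X (no excluded instruments), the first stage reproduces X and tsls collapses to OLS, which is a quick sanity check on the implementation.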
E. Using the results of parts (A.) and (D.), perform a Hausman test to assess whether
SAT is indeed a poor measure of ability. Clearly specify the null and alternative hypotheses,
the logic of the test and the outcome you obtain.
Tip: Instructions on how to implement the Hausman test in Stata are shown in
the slides from Lecture 7 (and in the empirical example we covered while discussing
Instrumental Variable estimation).
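The contrast form of the Hausman statistic can be sketched as follows, under H0 that SAT is exogenous (OLS and 2SLS both consistent, OLS efficient) against H1 that SAT is correlated with the error (only 2SLS consistent). The inputs below are made-up illustrative numbers, not results from earnings.dta, and the function name is hypothetical; 3.841 is the 5% critical value of a χ²(1) distribution.

```python
import numpy as np

def hausman(b_iv, b_ols, V_iv, V_ols):
    """Hausman contrast statistic H = d' (V_iv - V_ols)^{-1} d, d = b_iv - b_ols.
    Under H0, H is approximately chi-squared with df equal to the number of
    contrasted coefficients."""
    d = np.atleast_1d(np.asarray(b_iv, dtype=float) - np.asarray(b_ols, dtype=float))
    V = np.atleast_2d(np.asarray(V_iv, dtype=float) - np.asarray(V_ols, dtype=float))
    H = float(d @ np.linalg.inv(V) @ d)
    return H, d.size

# Hypothetical numbers for the SAT coefficient (NOT results from earnings.dta)
H, df = hausman(b_iv=0.30, b_ols=0.20, V_iv=0.05**2, V_ols=0.03**2)
CHI2_1_CRIT_5PCT = 3.841   # 5% critical value of chi-squared(1)
print(f"H = {H:.2f}, df = {df}, reject H0 at 5%: {H > CHI2_1_CRIT_5PCT}")
```

With these inputs H = 0.1² / (0.0025 - 0.0009) = 6.25 > 3.841, so H0 would be rejected at the 5% level; vectors and covariance matrices can be passed in the same way to contrast several coefficients at once.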