ECOM30002/90002 Econometrics 2
Group Assignment 3
Deadline: 4pm, Wednesday May 5, 2021
Submission method: Electronically via the LMS
Weight: 7.5%
Material covered: Mainly Lectures 1–16 & Tutorials 1–8
Instructions
Group size: Minimum = 1, maximum = 4. Groups may be formed across different tutorials.
Group registration: If you plan to work in a group of 2–4 students, then you must register the
membership of your group prior to submission using the ECOM30002 A3 Group tab in the People
area on the LMS. If you plan to complete the assignment individually, you can ignore this step.
Further instructions for group registration as well as the deadline for group registration will be
announced via the LMS.
Cover page: Each assignment must include a cover page listing the full name of every member
of the group along with their student ID and the name of their tutor.
Division of marks: Equal marks will be awarded to each member of a group.
Word processing: Assignments should be submitted as fully-typed documents in PDF or Word
format. Question numbers should be clearly indicated.
Statistical output: Raw R output and/or screenshots are not acceptable. Regression output
must be presented in clearly labelled equation or table form. Figures should be presented on an
appropriate scale, labelled clearly (including axes) and with an appropriate heading. Marks may
be deducted for failure to meet any of these requirements.
Length of answers: The word limit for this assignment is 600 words. Penalties may apply for
submissions exceeding this limit by 20% or more. Concise and correct answers to questions requiring
interpretation/discussion will be valued over lengthier, unclear and/or off-topic attempts.
R script: You must append a complete copy of the R script that you have used to generate your
results. Your R script does not count towards the word limit.
Section 1: Conceptual Questions (30 marks)
(1.1) Consider a balanced panel dataset on fatalities from road traffic accidents for $n = 51$ US
states (including Washington D.C.), with data for each of the $T = 8$ years from 1990 to 1997.
States are indexed by $i = 1, 2, \ldots, n$ and years by $t = 1, 2, \ldots, T$. The policy question of
interest is whether the rate of seatbelt usage has any effect on the rate of road fatalities. The
variables in the dataset are defined as follows:
$\mathrm{FRate}_{i,t}$: number of road fatalities per million traffic miles
$\mathrm{SBUse}_{i,t}$: seatbelt usage rate, in percent
$\mathrm{Speed65}_{i,t}$: binary variable: 1 for 65 mile per hour speed limit, 0 otherwise
$\mathrm{Speed70}_{i,t}$: binary variable: 1 for 70 mile per hour speed limit, 0 otherwise
$\mathrm{BA08}_{i,t}$: binary variable: 1 for blood alcohol limit ≤ 0.08%, 0 otherwise
$\mathrm{Income}_{i,t}$: median per capita income, in US dollars
$\mathrm{Age}_{i,t}$: mean age, in years
The following model is estimated using data only from the first and last years of the dataset:
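(A) A model in differences estimated by OLS, where each variable is transformed by taking
the change in its value from 1990 to 1997 (e.g. $\Delta\mathrm{FRate}_i = \mathrm{FRate}_{i,1997} - \mathrm{FRate}_{i,1990}$):
$$\Delta\mathrm{FRate}_i = \beta_0 + \beta_1 \Delta\mathrm{SBUse}_i + \beta_2 \Delta\mathrm{Speed65}_i + \beta_3 \Delta\mathrm{Speed70}_i + \beta_4 \Delta\mathrm{BA08}_i + \beta_5 \Delta\log(\mathrm{Income}_i) + \beta_6 \Delta\mathrm{Age}_i + U_i$$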
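The differencing step in (A) can be illustrated with a minimal base R sketch. The data frame below is entirely made up (three hypothetical states, `AA`, `BB` and `CC`, and only two of the variables); it is not the assignment dataset, and the object names `panel`, `diffs` and `fd_fit` are assumptions chosen for illustration.

```r
# Hypothetical long-format panel: 3 made-up states observed in 1990 and 1997.
panel <- data.frame(
  state = rep(c("AA", "BB", "CC"), each = 2),
  year  = rep(c(1990, 1997), times = 3),
  FRate = c(0.030, 0.024, 0.025, 0.022, 0.020, 0.019),
  SBUse = c(40, 65, 50, 70, 55, 62)
)

# Split into first and last year, then align the rows by state.
first <- panel[panel$year == 1990, ]
last  <- panel[panel$year == 1997, ]
last  <- last[match(first$state, last$state), ]

# Change in each variable from 1990 to 1997, one row per state.
diffs <- data.frame(
  state  = first$state,
  dFRate = last$FRate - first$FRate,
  dSBUse = last$SBUse - first$SBUse
)

# OLS on the differenced data (intercept included, as in Model (A)).
fd_fit <- lm(dFRate ~ dSBUse, data = diffs)
print(diffs)
```

With the real data, the remaining regressors would be differenced in the same way before being added to the formula.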
The following four models are estimated using the entire dataset:
(B) A pooled model estimated by OLS:
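$$\mathrm{FRate}_{i,t} = \beta_0 + \beta_1 \mathrm{SBUse}_{i,t} + \beta_2 \mathrm{Speed65}_{i,t} + \beta_3 \mathrm{Speed70}_{i,t} + \beta_4 \mathrm{BA08}_{i,t} + \beta_5 \log(\mathrm{Income}_{i,t}) + \beta_6 \mathrm{Age}_{i,t} + U_{i,t}$$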
(C) A model with state fixed effects estimated using the within estimator:
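$$\mathrm{FRate}_{i,t} = \alpha_i + \beta_1 \mathrm{SBUse}_{i,t} + \beta_2 \mathrm{Speed65}_{i,t} + \beta_3 \mathrm{Speed70}_{i,t} + \beta_4 \mathrm{BA08}_{i,t} + \beta_5 \log(\mathrm{Income}_{i,t}) + \beta_6 \mathrm{Age}_{i,t} + U_{i,t}$$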
(D) A model with time fixed effects estimated using the within estimator:
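$$\mathrm{FRate}_{i,t} = \lambda_t + \beta_1 \mathrm{SBUse}_{i,t} + \beta_2 \mathrm{Speed65}_{i,t} + \beta_3 \mathrm{Speed70}_{i,t} + \beta_4 \mathrm{BA08}_{i,t} + \beta_5 \log(\mathrm{Income}_{i,t}) + \beta_6 \mathrm{Age}_{i,t} + U_{i,t}$$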
(E) A model with state and time fixed effects estimated using the within estimator:
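$$\mathrm{FRate}_{i,t} = \alpha_i + \lambda_t + \beta_1 \mathrm{SBUse}_{i,t} + \beta_2 \mathrm{Speed65}_{i,t} + \beta_3 \mathrm{Speed70}_{i,t} + \beta_4 \mathrm{BA08}_{i,t} + \beta_5 \log(\mathrm{Income}_{i,t}) + \beta_6 \mathrm{Age}_{i,t} + U_{i,t}$$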
Table 1.1 presents the coefficient estimate $\hat{\beta}_1$ from all five models, along with its standard
error and the p-value for a two-sided test of the statistical significance of the seatbelt usage
rate. For this question, we do not concern ourselves with the issue of heteroskedasticity,
because we have not yet covered robust standard errors for panel data models.
Table 1.1 Estimation Results for Question 1.1
Model    $\hat{\beta}_1$    Std. Error    p-value
(A)      -0.008             0.004         0.085
(B)       0.014             0.002         0.000
(C)      -0.007             0.001         0.000
(D)       0.015             0.002         0.000
(E)      -0.004             0.001         0.008
(a.) Marks Available: 1
What is the benefit of expressing the number of road fatalities relative to mileage travelled,
as opposed to working with the raw number of road fatalities?
(b.) Marks Available: 5
Interpret the sign, magnitude and significance of the estimates of $\beta_1$ from models (A)
and (C) using Table 1.1. Conceptually, do you expect the estimate of $\beta_1$ from model
(A) to be identical to the estimate from model (C) in this case? Explain your answer.
(c.) Marks Available: 4
Explain the reasons for the difference between the estimates of $\beta_1$ obtained from models
(B) and (C). Which estimate do you think is likely to be closer to the actual effect of
seatbelt usage rates on road fatalities? Explain your answer.
(d.) Marks Available: 4
Based on the information reported in Table 1.1, do you believe that state-invariant but
time-varying omitted variables generate omitted variables bias in the estimate of $\beta_1$
obtained from Model (C)? Explain your answer.
(e.) Marks Available: 5
By virtue of the within transformation, the values of the state fixed effects are not
estimated in Model (C). They can, however, be estimated using an OLS regression of
$\mathrm{FRate}_{i,t}$ on $\mathrm{SBUse}_{i,t}$, $\mathrm{Speed65}_{i,t}$, $\mathrm{Speed70}_{i,t}$, $\mathrm{BA08}_{i,t}$, $\log(\mathrm{Income}_{i,t})$ and $\mathrm{Age}_{i,t}$, as well as
a separate dummy variable for each state in the panel, having excluded the intercept
term.
Point estimates of the state fixed effects obtained in this way are presented as a bar chart
in Figure 1.1. Briefly interpret the reported values of the state fixed effects for Wisconsin
(WI), Mississippi (MS) and North Dakota (ND). Explain why care should be taken when
interpreting the values of the state fixed effects reported in Figure 1.1.
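The dummy-variable regression described in part (e) can be sketched in base R as follows. This is only an illustration on simulated data with four hypothetical states and a single regressor `x`; the variable names and the true values of the state effects are assumptions, not taken from the assignment dataset.

```r
# Simulated balanced panel: 4 hypothetical states, 5 years (not the assignment data).
set.seed(1)
n_states <- 4
n_years  <- 5
panel <- data.frame(
  state = factor(rep(c("AA", "BB", "CC", "DD"), each = n_years)),
  year  = rep(seq_len(n_years), times = n_states)
)
alpha <- c(AA = 1, BB = 2, CC = 3, DD = 4)  # assumed true state effects
panel$x <- rnorm(nrow(panel))
panel$y <- unname(alpha[as.character(panel$state)]) - 0.5 * panel$x +
  rnorm(nrow(panel), sd = 0.1)

# Dummy-variable regression: "0 +" drops the intercept, so every state
# receives its own dummy coefficient (stateAA, stateBB, ...).
lsdv <- lm(y ~ 0 + x + state, data = panel)
state_effects <- coef(lsdv)[grep("^state", names(coef(lsdv)))]

# Within estimator for comparison: demean y and x by state, no intercept.
panel$y_dm <- panel$y - ave(panel$y, panel$state)
panel$x_dm <- panel$x - ave(panel$x, panel$state)
within_fit <- lm(y_dm ~ 0 + x_dm, data = panel)

print(round(state_effects, 3))
```

In this sketch the slope on `x` from the dummy-variable regression coincides with the slope from the demeaned regression; only the former also reports one coefficient per state.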
(f.) Marks Available: 2
Below is a statement that is true for one or more of Models (A) to (E). Identify the
model or models to which this statement applies and explain your reasoning:
The estimate of $\beta_1$ controls for state-invariant,
time-varying omitted variables.
Figure 1.1 OLS Point Estimates of the State Fixed Effects for Question 1.1
[Figure: bar chart with one bar per state dummy, labelled StateAK to StateWY in alphabetical order of state code (51 bars, including StateDC); vertical axis ticks at 0.10, 0.11, 0.12, 0.13 and 0.14.]
(1.2) Marks Available: 6
Consider a balanced panel dataset that records observations on $i = 1, 2, \ldots, n$ individuals
over $t = 1, 2, \ldots, T$ years. One of the regressors in the dataset, $X_{i,t}$, can be split into the
additively separable form:
$$X_{i,t} = \alpha_i + \lambda_t,$$
where $\alpha_i$ is a time-invariant component that varies across individuals and $\lambda_t$ is a component
that varies over time in precisely the same way for every individual. An econometrician plans
to conduct three regressions using $X_{i,t}$ as a regressor:
(a.) An individual fixed effects regression
(b.) A time fixed effects regression
(c.) A two-ways fixed effects regression
For each of these three regression models, state whether or not a coefficient estimate will be
reported for $X_{i,t}$ and explain your reasoning.
(1.3) Marks Available: 3
Derive the within transformation for the panel data regression model with time fixed effects:
$$Y_{i,t} = \lambda_t + \beta_1 X_{i,t} + V_{i,t},$$
where $i = 1, 2, \ldots, n$, $t = 1, 2, \ldots, T$ and $\lambda_t$ is a dummy variable equal to 1 at time $t$ and 0
otherwise. Specifically, you should derive a representation of the form:
$$\tilde{Y}_{i,t} = \beta_1 \tilde{X}_{i,t} + \tilde{V}_{i,t},$$
where $\tilde{Y}_{i,t} = Y_{i,t} - \bar{Y}_t$, with $\bar{Y}_t$ denoting the average over individuals of $Y_{i,t}$, and so on.
Section 2: Empirical Questions (20 marks)
This question continues from where we left off in Question 1.1 of Assignment 2 in our exploration
of the nature of economic institutions¹ and their possible effect on economic development and
prosperity. The dataset A3_Data.csv contains a random sample of $n = 54$ formerly colonised
countries, with the following variables available:
$\mathrm{logGDPpc}_i$: Natural log of per capita GDP in 1995 (Purchasing Power Parity basis).
$\mathrm{AvExprRisk}_i$: The ‘expropriation’² risk score (from 0 to 10) averaged from 1985 to
1995, with 0 being the most risky and 10 being the least risky.
$\mathrm{Latitude}_i$: Absolute value of the latitude of country $i$ on a scale of 0 to 1, where
0 is the equator and 1 is the North (or South) Pole.
$\mathrm{logSettMort}_i$: Natural log of the rate of settler mortality faced by Europeans at the
time of colonisation.
$\mathrm{Democracy1900}_i$: Democracy score (from 0 to 10) capturing the extent of democracy in
country $i$ in 1900, with 0 being the least democratic and 10 being the
most democratic.
Consider the following causal representation:
$$\mathrm{logGDPpc}_i = \beta_0 + \beta_1 \mathrm{AvExprRisk}_i + \beta_2 \mathrm{Latitude}_i + V_i, \tag{1}$$
where $V_i$ is an i.i.d. mean-zero disturbance term. In this question, both $\mathrm{logSettMort}_i$ and
$\mathrm{Democracy1900}_i$ are used as IVs for $\mathrm{AvExprRisk}_i$. $\mathrm{Latitude}_i$ is to be treated as exogenous.
(2.1) Marks Available: 6
Use R to complete the last three columns of Table 2.1, rounding each value to three decimal
places. Please use heteroskedasticity-consistent standard errors in all cases.³
(2.2) Marks Available: 5
Use your completed version of Table 2.1 to provide statistical evidence on the validity (or
otherwise) of $\mathrm{logSettMort}_i$ and $\mathrm{Democracy1900}_i$ as IVs. Inference should be conducted at
the 5% significance level. Explain each step in your discussion and clearly indicate which
rows of Table 2.1 are being consulted and which $\chi^2$ distribution (i.e. $\chi^2_1$ or $\chi^2_2$) is being used
to obtain the relevant p-value(s).
(2.3) Marks Available: 3
An R user who wishes to estimate the parameters of equation (1) implements the following
2SLS procedure:
– First, use the lm command to regress $\mathrm{AvExprRisk}_i$ on an intercept, $\mathrm{logSettMort}_i$,
$\mathrm{Democracy1900}_i$ and $\mathrm{Latitude}_i$ and save the fitted values, $\widehat{\mathrm{AvExprRisk}}_i$.
– Second, use the lm command to regress $\mathrm{logGDPpc}_i$ on an intercept, $\widehat{\mathrm{AvExprRisk}}_i$ and
$\mathrm{Latitude}_i$.
Briefly explain whether this procedure causes any problems related to the computation of
the parameter estimate $\hat{\beta}_1$ and/or its standard error, $\mathrm{S.E.}(\hat{\beta}_1)$.
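For concreteness, the two lm steps described in this question might look like the following base R sketch. It runs on simulated data, not on A3_Data.csv; the names `z1`, `z2`, `w`, `x` and `y` and the data-generating values are assumptions standing in for the two IVs, the exogenous regressor, the endogenous regressor and the outcome.

```r
# Simulated data (hypothetical names and coefficients, not the assignment dataset).
set.seed(42)
n  <- 200
z1 <- rnorm(n)   # stand-in for the first IV
z2 <- rnorm(n)   # stand-in for the second IV
w  <- rnorm(n)   # stand-in for the exogenous regressor
v  <- rnorm(n)   # structural disturbance
x  <- 0.8 * z1 + 0.5 * z2 + 0.3 * w + 0.6 * v + rnorm(n)  # endogenous regressor
y  <- 1 + 2 * x - 1 * w + v

# Step 1: regress the endogenous regressor on an intercept, both IVs and
# the exogenous regressor, then save the fitted values.
stage1 <- lm(x ~ z1 + z2 + w)
x_hat  <- fitted(stage1)

# Step 2: regress the outcome on an intercept, the fitted values and
# the exogenous regressor.
stage2 <- lm(y ~ x_hat + w)
print(coef(stage2))
```

The sketch only reproduces the mechanics of the procedure as stated; what `summary(stage2)` reports is left for the discussion the question asks for.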
¹ You can think of these as the ‘rules’ that shape economic interaction. These involve things such as property
rights and contract enforcement.
² Definition: (noun) the action by the state or an authority of taking property from its owner. One possible example
might be a government seizing someone’s plot of land in order to build a road.
³ To get you started, the values of Row (A) have been filled out in bold. You may find it helpful to replicate these
values first, to ensure that you understand how to obtain the necessary test statistics and p-values.
(2.4) Marks Available: 6
(a.) Estimate an OLS regression of $\mathrm{logGDPpc}_i$ on an intercept, $\mathrm{AvExprRisk}_i$ and $\mathrm{Latitude}_i$.
Use your model to compute a prediction of log per capita GDP in 1995 for ‘Sudstralia’, a
fictitious country with Sudan’s average expropriation risk score and Australia’s latitude
(note that the country codes for Sudan and Australia are SDN and AUS, respectively).
(b.) Estimate the parameters of (1) by 2SLS using both $\mathrm{logSettMort}_i$ and $\mathrm{Democracy1900}_i$
as IVs for $\mathrm{AvExprRisk}_i$ and treating $\mathrm{Latitude}_i$ as exogenous. Use your 2SLS model to
compute a prediction of log per capita GDP in 1995 for Sudstralia.
(c.) Do you prefer the OLS-based prediction from question 2.4(a) or the 2SLS-based prediction
from question 2.4(b)? Explain your answer.
Table 2.1 Hypothesis Tests for Section 2
Row   Regression   $H_0$                           Wald      p-value ($\chi^2_1$)   p-value ($\chi^2_2$)
1     (A)          $\delta_1 = 0, \delta_2 = 0$    112.718   0.000                  0.000
2     (B)          $\beta_1 = 0, \beta_2 = 0$
3     (C)          $\pi_1 = 0, \pi_2 = 0$
4     (D)          $\alpha_1 = 0, \alpha_2 = 0$
Details of regressions (A) to (D):
(A) OLS estimation of the following Population Regression Function (PRF):
$$E(\mathrm{logGDPpc}_i \mid \mathrm{AvExprRisk}_i, \mathrm{Latitude}_i) = \delta_0 + \delta_1 \mathrm{AvExprRisk}_i + \delta_2 \mathrm{Latitude}_i. \tag{2}$$
(B) 2SLS estimation of equation (1) using the ivreg command with both $\mathrm{logSettMort}_i$ and
$\mathrm{Democracy1900}_i$ as IVs for $\mathrm{AvExprRisk}_i$ and treating $\mathrm{Latitude}_i$ as exogenous.
(C) OLS estimation of the following PRF:
$$E(\mathrm{AvExprRisk}_i \mid \mathrm{logSettMort}_i, \mathrm{Democracy1900}_i, \mathrm{Latitude}_i) = \pi_0 + \pi_1 \mathrm{logSettMort}_i + \pi_2 \mathrm{Democracy1900}_i + \pi_3 \mathrm{Latitude}_i. \tag{3}$$
(D) OLS estimation using the residuals of regression (B), $\hat{V}_i$, as the dependent variable in:
$$E(V_i \mid \mathrm{logSettMort}_i, \mathrm{Democracy1900}_i, \mathrm{Latitude}_i) = \alpha_0 + \alpha_1 \mathrm{logSettMort}_i + \alpha_2 \mathrm{Democracy1900}_i + \alpha_3 \mathrm{Latitude}_i. \tag{4}$$
Additional notes:
Row numbers are provided in the table for your reference.
Each row of the table relates to a joint test of the specified null hypothesis (H0) in Regression
(A), (B), (C) or (D) against the general alternative hypothesis that one or more of the stated
restrictions does not hold.
The ‘Wald’ column is to be populated with heteroskedasticity-consistent (HC) Wald statistics
testing the appropriate hypotheses.
The two ‘p-values’ columns are to be populated with p-values corresponding to the relevant
Wald statistic obtained from the chi-squared distribution with either one degree of freedom
($\chi^2_1$) or two degrees of freedom ($\chi^2_2$).
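As a rough illustration of how one row of the table could be produced, the base R sketch below computes an HC Wald statistic for a joint null and the two chi-squared p-values. It uses simulated data with hypothetical regressors `x1` and `x2`, and it assumes the HC1 variant of the covariance estimator; the assignment does not say which HC variant is intended, so the choice here is an assumption.

```r
# Simulated data with heteroskedastic errors (hypothetical, not the assignment data).
set.seed(7)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.25 * x2 + rnorm(n) * (1 + abs(x1))

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)
u   <- residuals(fit)

# HC1 sandwich covariance (assumed variant):
# (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}, scaled by n / (n - k).
bread <- solve(crossprod(X))
meat  <- crossprod(X * u)
V_hc  <- bread %*% meat %*% bread * n / (n - ncol(X))

# Joint null hypothesis that both slope coefficients are zero: R b = 0.
R    <- rbind(c(0, 1, 0),
              c(0, 0, 1))
Rb   <- R %*% coef(fit)
wald <- drop(t(Rb) %*% solve(R %*% V_hc %*% t(R)) %*% Rb)

# Upper-tail p-values under one and two degrees of freedom.
p_chisq1 <- pchisq(wald, df = 1, lower.tail = FALSE)
p_chisq2 <- pchisq(wald, df = 2, lower.tail = FALSE)
print(round(c(wald = wald, p_chisq1 = p_chisq1, p_chisq2 = p_chisq2), 3))

# The same pchisq call applied to the Row 1 statistic reported in Table 2.1.
p_rowA <- pchisq(112.718, df = c(1, 2), lower.tail = FALSE)
```

Both entries of `p_rowA` round to 0.000, in line with the values already filled in for Row 1.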