ECMT6007/ECON4954: Analysis of Panel Data
School of Economics, University of Sydney,
Semester 1, 2022
Sample Questions of Final Exam
These sample questions are not meant to be exhaustive or to cover every aspect of
the unit; rather, they illustrate the types of questions to expect in the exam.
The appendix contains tables of critical values for the $t$, $F$, and $\chi^2$ distributions,
which you may use in the final exam (in case you do not have them).
1. Consider the following dynamic linear panel data model:
$$y_{it} = \lambda y_{i,t-1} + \beta x_{it} + \eta_i + \varepsilon_{it} \qquad (1.1)$$
where $\eta_i$ and $\varepsilon_{it}$ are independently and identically distributed (i.i.d.) with zero means and
variances $\sigma_\eta^2$ and $\sigma_\varepsilon^2$, respectively. We are interested in the estimation of $\beta$ using a random
sample $\{y_{it}, x_{it}\}$, $i = 1, \dots, N$, $t = 1, \dots, T$, with $N > T > 2$. Suppose that the explanatory
variable $x_{it}$ is predetermined and given by:
$$x_{it} = \gamma x_{i,t-1} + \mu_i + e_{it} \qquad (1.2)$$
where $0 < \gamma < 1$, and $\mu_i$ and $e_{it}$ are i.i.d. with zero means and variances $\sigma_\mu^2$ and $\sigma_e^2$, respectively.
Suppose that $\eta_i$ and $\mu_i$ are uncorrelated with each other. Both $\varepsilon_{it}$ and $e_{it}$ are i.i.d. noise terms,
and each of them is uncorrelated with every other random variable.
(a) Does the Pooled OLS estimator applied to model (1.1) with $\{y_{it}, x_{it}\}$ typically
overestimate or underestimate $\lambda$? Explain briefly.
(b) Is the FD estimator of $\lambda$ consistent, and why? What happens if $x_{it}$ is strictly
exogenous?
(c) Is the FE estimator of $\lambda$ consistent, and why? What happens if $x_{it}$ is strictly
exogenous?
(d) If we resort to estimation with instrumental variables (either IV or GMM estimators),
propose two instruments and briefly illustrate the estimation procedure. Remember
to check that the variables you propose are indeed valid instruments.
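To build intuition for parts (a) to (d), one can simulate data from (1.1) and (1.2) and compare estimators numerically. The sketch below is illustrative only: the parameter values, the normal distributions, the unit variances, the burn-in length, and the function name `simulate_panel` are all assumptions and are not part of the question. It also computes the Pooled OLS coefficients so that the direction of any bias in the estimate of $\lambda$ can be inspected.

```python
import numpy as np

def simulate_panel(N=500, T=6, lam=0.5, beta=1.0, gamma=0.5, burn_in=50, seed=0):
    """Simulate (1.1)-(1.2). All default values are hypothetical choices."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(size=N)   # individual effect in (1.1), assumed N(0, 1)
    mu = rng.normal(size=N)    # individual effect in (1.2), assumed N(0, 1)
    total = T + burn_in
    x = np.zeros((N, total))
    y = np.zeros((N, total))
    for t in range(1, total):
        # (1.2): x_it = gamma * x_{i,t-1} + mu_i + e_it
        x[:, t] = gamma * x[:, t - 1] + mu + rng.normal(size=N)
        # (1.1): y_it = lam * y_{i,t-1} + beta * x_it + eta_i + eps_it
        y[:, t] = lam * y[:, t - 1] + beta * x[:, t] + eta + rng.normal(size=N)
    # Drop the burn-in periods so the start-up values (zeros) have little influence.
    return y[:, -T:], x[:, -T:]

y, x = simulate_panel()

# Pooled OLS of y_it on a constant, y_{i,t-1} and x_it, ignoring eta_i.
Y = y[:, 1:].ravel()
Z = np.column_stack([np.ones(Y.size), y[:, :-1].ravel(), x[:, 1:].ravel()])
coef = np.linalg.lstsq(Z, Y, rcond=None)[0]
print("Pooled OLS (const, lambda, beta):", coef)
```

Comparing the second coefficient with the value of `lam` used in the simulation gives a numerical check on the answer to part (a); the same simulated data can be reused to try FD, FE, or IV estimators.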
2. Consider the estimation of the Cobb-Douglas production function of an industry:
$$Y_{it} = A_{it} K_{it}^{\beta_1} L_{it}^{\beta_2} M_{it}^{\beta_3}$$
where $A_{it}$, $K_{it}$, $L_{it}$ and $M_{it}$ are, respectively, firm $i$’s productivity, capital, labor and
intermediate inputs at time $t$. After taking logarithms, we get
$$y_{it} = a_{it} + \beta_1 k_{it} + \beta_2 l_{it} + \beta_3 m_{it}$$
where lowercase letters denote the logarithms of the corresponding uppercase letters. We collect
a sample of $N$ firms’ output and input data $\{Y_{it}, K_{it}, L_{it}, M_{it}\}$ (or equivalently $\{y_{it}, k_{it}, l_{it}, m_{it}\}$),
where $i = 1, \dots, N$ and $t = 1, \dots, T$. Now suppose that $a_{it}$ can be decomposed as
$a_{it} = \beta_0 + a_i + e_{it}$, where $a_i$ is firm $i$’s intrinsic productivity, which does not change over
time, and $e_{it}$ is a temporary productivity shock; $\beta_0$ is a global intercept reflecting
the industry average productivity. Firm $i$ cannot observe $e_{it}$ at time $t$, so it makes
input decisions based on $a_i$ and then produces output $y_{it}$. We end up with the regression
model (2):
$$y_{it} = \beta_0 + \beta_1 k_{it} + \beta_2 l_{it} + \beta_3 m_{it} + a_i + e_{it} \qquad (2)$$
(a) Interpret the coefficients $\beta_1$, $\beta_2$ and $\beta_3$ one by one.
(b) Suppose that we want to test whether returns to scale are constant. Write down
the null and alternative hypotheses, the test statistic to be used and its distribution
under the null (including the degrees of freedom), and the decision rule.
(c) Outline the procedure of the First Difference (FD) estimator for this model. List
the assumptions for the FD estimator to be consistent.
(d) Outline the procedure of the Fixed Effects (FE) estimator for this model. List the
assumptions for the FE estimator to be consistent.
(e) Under what conditions on the error term $u_{it} = a_i + e_{it}$ will the Random Effects (RE)
estimator provide consistent estimates of $(\beta_1, \beta_2, \beta_3)$? Are the assumptions likely to
be valid in this model? Explain briefly.
(f) One by one, calculate the degrees of freedom of the Pooled OLS, FD, FE, and RE
estimators for model (2) with $T > 2$.
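The mechanics of the FD and FE transformations asked about in parts (c) and (d) can be sketched numerically. The example below is a minimal illustration on simulated data, not a model answer: the coefficient values, the sample sizes, the normal distributions, and the helper name `ols` are assumptions. Inputs are generated to depend on $a_i$, in line with the statement that firms choose inputs based on $a_i$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 5                            # hypothetical panel dimensions
beta = np.array([0.3, 0.5, 0.2])         # hypothetical (beta1, beta2, beta3)
a = rng.normal(size=(N, 1))              # firm effect a_i
# Log inputs (k, l, m) are correlated with a_i by construction.
X = rng.normal(size=(N, T, 3)) + a[:, :, None]
e = 0.1 * rng.normal(size=(N, T))        # transitory shock e_it
y = 1.0 + X @ beta + a + e               # model (2) with beta0 = 1.0 (hypothetical)

def ols(Z, v):
    """Least-squares coefficients from regressing v on the columns of Z."""
    return np.linalg.lstsq(Z, v, rcond=None)[0]

# First differences: subtracting period t-1 from period t removes beta0 and a_i.
dy = np.diff(y, axis=1)
dX = np.diff(X, axis=1)
b_fd = ols(dX.reshape(-1, 3), dy.ravel())

# Within transformation: subtracting firm-specific time means removes beta0 and a_i.
y_w = y - y.mean(axis=1, keepdims=True)
X_w = X - X.mean(axis=1, keepdims=True)
b_fe = ols(X_w.reshape(-1, 3), y_w.ravel())

print("FD estimate:", b_fd)
print("FE estimate:", b_fe)
```

Both transformed regressions omit the intercept because the transformation removes it together with $a_i$. Note that the differenced sample has $N(T-1)$ observations, while the within-transformed sample keeps $NT$ rows that are no longer linearly independent.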
3. We are interested in analyzing the effect of the government building a new hospital on
housing prices in the suburb of Sydenham. Rumors that a new hospital would be built
in Sydenham began after 2006, and the hospital was built and began operating in 2008.
We have data on the prices of houses sold in Sydenham in 2006 and another sample of
houses sold in Sydenham in 2010. The hypothesis we wish to test is that the price
of houses located near the site of the new hospital would rise relative to the price of more distant
houses. The data for each year include the dummy variable near, which equals one
if the house is located within 2 kilometers of the new hospital and zero otherwise. House prices for both
years are measured in 2010 prices. The variable rprice denotes the real house
price (scaled by $100,000). The following simple regression model was estimated using
only the 2010 sample:
$$\widehat{rprice} = \underset{(0.309)}{10.131} + \underset{(0.788)}{2.688}\, near \qquad (3.1)$$
$$n = 96, \quad R^2 = 0.199$$
while the following was estimated using only the 2006 sample of data
$$\widehat{rprice} = \underset{(0.265)}{9.252} + \underset{(0.671)}{1.412}\, near \qquad (3.2)$$
$$n = 105, \quad R^2 = 0.106$$
(a) Interpret each of the estimates in model (3.2).
(b) Based on the estimates in (3.1) and (3.2), what is the average price change
from 2006 to 2010 for all houses in Sydenham?
(c) Explain why we cannot infer from the estimates in (3.1) that the location of the
hospital caused the price of houses located nearby to increase. What evidence from
model (3.2) supports this conclusion?
(d) Using the information from models (3.1) and (3.2), calculate the difference-in-differences
estimate of the impact of the new hospital on the price of nearby houses.
(e) Propose a linear regression model that can directly estimate the effect of the new hospital
on housing prices.
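The arithmetic behind a difference-in-differences comparison can be sketched from the four group means implied by (3.1) and (3.2). This is an illustration of the mechanics rather than a model answer; the variable names are hypothetical, and the sketch relies only on the fact that, in a regression on a constant and a single dummy, the intercept equals the sample mean of the group with near = 0 and the intercept plus the slope equals the sample mean of the group with near = 1.

```python
# Coefficients as reported in (3.1) and (3.2); rprice is scaled by $100,000.
b0_2010, b1_2010 = 10.131, 2.688   # 2010 sample: intercept, coefficient on near
b0_2006, b1_2006 = 9.252, 1.412    # 2006 sample: intercept, coefficient on near

# Group means implied by the dummy-variable regressions.
mean_far_2006 = b0_2006
mean_near_2006 = b0_2006 + b1_2006
mean_far_2010 = b0_2010
mean_near_2010 = b0_2010 + b1_2010

# Change over time for each group, then the difference between those changes.
change_near = mean_near_2010 - mean_near_2006
change_far = mean_far_2010 - mean_far_2006
did = change_near - change_far

print("Change for near houses:", round(change_near, 3))
print("Change for far houses: ", round(change_far, 3))
print("Difference-in-differences:", round(did, 3))
```

The same number is obtained by subtracting the 2006 coefficient on near from the 2010 coefficient. The point estimate alone does not give a standard error; obtaining one requires a pooled regression of the kind asked for in part (e).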
4. Consider a linear panel data model
$$y_{it} = x_{it}'\beta + \alpha_i + \gamma_t + e_{it}$$
where $\alpha_i$ is a fixed effect and $\gamma_t$ is a time effect.
(a) Write down the model after the within transformation.
(b) State the assumptions for $\hat{\beta}_{FE}$ to be consistent and show that $\hat{\beta}_{FE}$ is consistent under
those conditions.
(c) Show that $\hat{\beta}_{FE}$ is a generalized least squares (GLS) estimator. In particular, show the
transformed regression model (in matrix form) for $\hat{\beta}_{FE}$.
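For a balanced panel, the two-way within transformation subtracts the individual mean and the period mean and adds back the overall mean. The sketch below checks numerically that this removes both additive effects. It is illustrative only: it assumes a balanced panel, a single scalar regressor instead of the vector $x_{it}$, normal draws, and hypothetical names and parameter values.

```python
import numpy as np

def two_way_within(a):
    """Two-way within transformation of an (N, T) balanced panel array."""
    return (a
            - a.mean(axis=1, keepdims=True)   # individual means over t
            - a.mean(axis=0, keepdims=True)   # period means over i
            + a.mean())                       # overall mean

rng = np.random.default_rng(0)
N, T = 50, 4                          # hypothetical panel dimensions
beta = 2.0                            # hypothetical slope
alpha = rng.normal(size=(N, 1))       # fixed effects alpha_i
gamma_t = rng.normal(size=(1, T))     # time effects gamma_t
x = rng.normal(size=(N, T)) + alpha   # regressor correlated with alpha_i
e = rng.normal(size=(N, T))
y = beta * x + alpha + gamma_t + e

# Any array of the form alpha_i + gamma_t is mapped to zero.
effects_removed = np.allclose(two_way_within(alpha + gamma_t), 0.0)

# Least squares on the transformed data (no intercept is needed).
y_dd, x_dd = two_way_within(y), two_way_within(x)
beta_hat = (x_dd * y_dd).sum() / (x_dd ** 2).sum()

print("Effects removed:", effects_removed)
print("Within estimate of beta:", beta_hat)
```

With unbalanced panels this simple double-demeaning formula no longer removes both effects exactly, so the sketch should not be applied there without modification.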
Table for Critical Values of the Chi-Squared Distribution