ECONOMETRIC METHODS
Midterm Exam
January 2021
SECTION A
A1 Consider an i.i.d. sample of $(y_i, x_i')'$ for $i = 1, 2, \ldots, n$, where $x_i$ is a $k \times 1$ vector not including a constant. Let $\mathbf{1} = (1, 1, \ldots, 1)'$ be the $n \times 1$ vector of ones, $M_1$ be its annihilator matrix, $Y = (y_1, \ldots, y_n)'$ be an $n \times 1$ vector, and $X = (x_1, \ldots, x_n)'$ be an $n \times k$ matrix.
(a) [5 points] Show the following:
(i) $M_1 Y = Y - \bar{y}\cdot\mathbf{1}$, where $\bar{y} = n^{-1}\sum_{i=1}^n y_i$,
(ii) $M_1 X = X - \mathbf{1}\bar{x}'$, where $\bar{x} = X'\mathbf{1}/n = (\bar{x}_1, \ldots, \bar{x}_k)'$ with $\bar{x}_j = n^{-1}\sum_{i=1}^n x_{j,i}$ being the mean of the $j$-th regressor.
Solution. To see (i), note that
\[
M_1 Y = Y - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'Y = Y - \mathbf{1}\cdot n^{-1}\sum_{i=1}^n y_i = Y - \bar{y}\cdot\mathbf{1}.
\]
For (ii), clearly
\[
M_1 X = X - \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'X = X - \mathbf{1}\mathbf{1}'X/n = X - \mathbf{1}\big(X'\mathbf{1}/n\big)' = X - \mathbf{1}\bar{x}'
\]
with $\bar{x} = X'\mathbf{1}/n$. Then $X'\mathbf{1}/n = (\bar{x}_1, \ldots, \bar{x}_k)'$ can be verified by direct matrix multiplication.
(b) [5 points] Let $\ddot{y}_i = y_i - \bar{y}$ and $\ddot{x}_i = x_i - \bar{x}$ for $i = 1, \ldots, n$ denote the demeaned values. Use the Frisch-Waugh-Lovell theorem to show that the OLS estimators of $\beta$ in $y_i = \alpha + x_i'\beta + \varepsilon_i$ and in $\ddot{y}_i = \ddot{x}_i'\beta + \varepsilon_i$ are the same.
Solution. Notice that $\ddot{Y} = (\ddot{y}_1, \ldots, \ddot{y}_n)' = Y - \bar{y}\cdot\mathbf{1} = M_1 Y$ and $\ddot{X} = (\ddot{x}_1, \ldots, \ddot{x}_n)' = M_1 X$. Hence, the problem is equivalent to showing that the OLS estimators of $\beta$ in $Y = \alpha\cdot\mathbf{1} + X\beta + \varepsilon$ and $\ddot{Y} = \ddot{X}\beta + \varepsilon$ are the same. The OLS estimator in the second regression is given by
\[
\hat{\beta} = (\ddot{X}'\ddot{X})^{-1}\ddot{X}'\ddot{Y}.
\]
By the Frisch-Waugh-Lovell theorem applied to the first regression, and using (a), we also have
\[
\hat{\beta} = \big((M_1 X)'M_1 X\big)^{-1}(M_1 X)'M_1 Y = \big(\ddot{X}'\ddot{X}\big)^{-1}\ddot{X}'\ddot{Y},
\]
as needed.
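A quick numerical illustration of this equivalence (a minimal numpy sketch on simulated data; the dimensions and coefficient values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

# OLS with an explicit constant: coefficients on X are the last k entries.
Xc = np.column_stack([np.ones(n), X])
beta_with_const = np.linalg.lstsq(Xc, y, rcond=None)[0][1:]

# OLS on demeaned data without a constant, as in part (b).
Xd = X - X.mean(axis=0)
yd = y - y.mean()
beta_demeaned = np.linalg.lstsq(Xd, yd, rcond=None)[0]

print(np.allclose(beta_with_const, beta_demeaned))  # True
```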
A2 Consider a regression model $y_i = \beta x_i + \varepsilon_i$. Which of the Gauss-Markov assumptions, if any, does each of the following four statements violate?
(a) [5 points] (i) $\mathrm{E}[\varepsilon_i^3 \mid x_i] = x_i$, (ii) $\mathrm{E}[\varepsilon_i^4 \mid x_i] = 1 + x_i^2$.
Solution. None of the assumptions is violated: (i) concerns the third moment of the errors and (ii) the fourth, and the Gauss-Markov theorem makes no assumptions about these higher moments.
(b) [5 points] (iii) $\mathrm{E}[\varepsilon_i^2\varepsilon_j^2 \mid x_i, x_j] \neq 0$ for $i \neq j$, (iv) $\mathrm{E}[\varepsilon_i^2\varepsilon_j^2 x_i x_j] \neq 0$ for $i \neq j$.
Solution. None of the assumptions is violated. As an example, suppose that the $x_i$ are i.i.d. with $\mathrm{E}[x_i] = 1$, that the $\varepsilon_i$ are i.i.d. with zero mean and variance $\sigma^2 > 0$, and that $x_i$ and $\varepsilon_i$ are independent. Then none of the Gauss-Markov assumptions is violated, yet
\[
\mathrm{E}[\varepsilon_i^2\varepsilon_j^2 \mid x_i, x_j] = \mathrm{E}[\varepsilon_i^2\varepsilon_j^2 x_i x_j] = \sigma^4 > 0.
\]
A3 Consider an i.i.d. sample of $(y_i, x_i')'$ for $i = 1, 2, \ldots, n$, where the $x_i$ are $k \times 1$ vectors, and a linear regression $y_i = x_i'\beta + \varepsilon_i$ with $\mathrm{E}[\varepsilon_i \mid x_i] = 0$ and $\mathrm{E}[\varepsilon_i^2] = \sigma^2 < \infty$.
(a) [5 points] Suppose that $k = 1$ so that $y_i = \beta x_i + \varepsilon_i$. Show that $n^{-1}\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i^2$ is a consistent estimator of $\mathrm{E}[\varepsilon_i^2 x_i^2] < \infty$, where the $\hat{\varepsilon}_i$ are the OLS residuals from the regression of $y_i$ on $x_i$. State any additional assumptions that you need. (Hint: write $\hat{\varepsilon}_i$ only in terms of $\varepsilon_i$, $x_i$, $\beta$, and $\hat{\beta}$.)
Solution. As the hint suggests, we can write $\hat{\varepsilon}_i = y_i - \hat{y}_i = \varepsilon_i - (\hat{\beta} - \beta)x_i$, so that
\[
\frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i^2
= (\hat{\beta}-\beta)^2\,\frac{1}{n}\sum_{i=1}^n x_i^4
- 2(\hat{\beta}-\beta)\,\frac{1}{n}\sum_{i=1}^n x_i^3\varepsilon_i
+ \frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 x_i^2.
\]
Under the current assumptions $\hat{\beta}$ is consistent, so $\hat{\beta} - \beta \xrightarrow{p} 0$ as $n \to \infty$. Assume in addition that $\mathrm{E}[x_i^4] < \infty$. Then, by the Cauchy-Schwarz inequality, $\mathrm{E}[x_i^3\varepsilon_i]$ is also finite, since
\[
\big|\mathrm{E}[x_i^3\varepsilon_i]\big| \le \sqrt{\mathrm{E}[x_i^4]\,\mathrm{E}[x_i^2\varepsilon_i^2]} < \infty.
\]
Hence, using Slutsky's theorem,
\[
\frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i^2
= \underbrace{(\hat{\beta}-\beta)^2}_{\xrightarrow{p}\, 0}\,\underbrace{\frac{1}{n}\sum_{i=1}^n x_i^4}_{\xrightarrow{p}\, \mathrm{E}[x_i^4]<\infty}
- 2\,\underbrace{(\hat{\beta}-\beta)}_{\xrightarrow{p}\, 0}\,\underbrace{\frac{1}{n}\sum_{i=1}^n x_i^3\varepsilon_i}_{\xrightarrow{p}\, \mathrm{E}[x_i^3\varepsilon_i]<\infty}
+ \underbrace{\frac{1}{n}\sum_{i=1}^n \varepsilon_i^2 x_i^2}_{\xrightarrow{p}\, \mathrm{E}[\varepsilon_i^2 x_i^2]<\infty}
\;\xrightarrow{p}\; \mathrm{E}[\varepsilon_i^2 x_i^2],
\]
where the Law of Large Numbers applies because all the limits are finite and the data are i.i.d.
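A small simulation sketch illustrating the consistency argument (the data-generating process, with $x_i \sim \mathcal{N}(0,1)$ and conditionally heteroskedastic errors, is purely illustrative; for it $\mathrm{E}[\varepsilon_i^2 x_i^2] = \mathrm{E}[(1 + x_i^2)x_i^2] = 4$):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 2.0

def white_term(n):
    # y_i = beta * x_i + eps_i with conditionally heteroskedastic errors
    x = rng.normal(size=n)
    eps = rng.normal(size=n) * np.sqrt(1 + x**2)
    y = beta * x + eps
    b_hat = (x @ y) / (x @ x)          # OLS without a constant
    resid = y - b_hat * x
    return np.mean(resid**2 * x**2)    # n^{-1} sum of eps_hat^2 x^2

# Population target: E[eps^2 x^2] = E[(1 + x^2) x^2] = 1 + 3 = 4 for x ~ N(0,1)
for n in (100, 10_000, 1_000_000):
    print(n, white_term(n))
```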
(b) [5 points] Suppose that $\varepsilon_i \mid x_i \sim \mathcal{N}(0, \sigma^2)$ and consider the following result: if $z \sim \mathcal{N}(0, \sigma^2 I)$ and $A$ is a symmetric idempotent matrix, then $z'Az/\sigma^2 \sim \chi^2(\mathrm{tr}(A))$. Use this result to show that
\[
(n-k)\,\frac{\hat{\sigma}^2}{\sigma^2} \,\Big|\, X \;\sim\; \chi^2(n-k),
\]
where $\hat{\sigma}^2 = (n-k)^{-1}\sum_{i=1}^n \hat{\varepsilon}_i^2$.
Solution. When showing unbiasedness of $\hat{\sigma}^2$ we saw that $\hat{\varepsilon}'\hat{\varepsilon} = \varepsilon' M_X \varepsilon$, where $M_X$ is the annihilator matrix of $X$. We have also seen that
\[
\mathrm{tr}(M_X) = \mathrm{tr}\big(I_n - X(X'X)^{-1}X'\big)
= n - \mathrm{tr}\big(X(X'X)^{-1}X'\big)
= n - \mathrm{tr}\big((X'X)^{-1}X'X\big)
= n - \mathrm{tr}(I_k)
= n - k.
\]
Hence,
\[
(n-k)\,\frac{\hat{\sigma}^2}{\sigma^2} = \frac{\hat{\varepsilon}'\hat{\varepsilon}}{\sigma^2} = \frac{\varepsilon' M_X \varepsilon}{\sigma^2}
\]
and thus, since $\varepsilon_i \mid x_i \sim \mathcal{N}(0, \sigma^2)$ so that $\varepsilon \mid X \sim \mathcal{N}(0, \sigma^2 I_n)$, and $M_X$ is symmetric and idempotent,
\[
(n-k)\,\frac{\hat{\sigma}^2}{\sigma^2} \,\Big|\, X \;\sim\; \chi^2(n-k).
\]
A4 Consider $y_i = \beta x_i + \varepsilon_i$ for $i = 1, 2$ with $Y = (2, 3)'$ and $X = (1, 2)'$, where it is known that
\[
\begin{pmatrix}\varepsilon_1 \\ \varepsilon_2\end{pmatrix} \,\Big|\, X \;\sim\; \mathcal{N}\!\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \Omega\right)
\qquad\text{with}\qquad
\Omega = \begin{pmatrix}1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 2\end{pmatrix}.
\]
(a) [5 points] Perform the GLS transformation and compute the GLS estimator. Hint: consider $\Omega = PP'$ with
\[
P = \begin{pmatrix} a & 0 \\ b & c \end{pmatrix}.
\]
Solution. We have
\[
\begin{pmatrix}1 & -\tfrac{1}{2} \\ -\tfrac{1}{2} & 2\end{pmatrix} = \Omega = PP' = \begin{pmatrix} a & 0 \\ b & c\end{pmatrix}\begin{pmatrix} a & b \\ 0 & c\end{pmatrix} = \begin{pmatrix} a^2 & ab \\ ab & b^2 + c^2\end{pmatrix}.
\]
Hence, $a = 1$, $b = -1/2$, and $c = \sqrt{7}/2$. Then
\[
P^{-1} = \frac{2\sqrt{7}}{7}\begin{pmatrix}\tfrac{\sqrt{7}}{2} & 0 \\ \tfrac{1}{2} & 1\end{pmatrix} = \begin{pmatrix}1 & 0 \\ \tfrac{\sqrt{7}}{7} & \tfrac{2\sqrt{7}}{7}\end{pmatrix}, \qquad
\tilde{Y} = P^{-1}Y = \begin{pmatrix}2 \\ \tfrac{8\sqrt{7}}{7}\end{pmatrix}, \qquad
\tilde{X} = P^{-1}X = \begin{pmatrix}1 \\ \tfrac{5\sqrt{7}}{7}\end{pmatrix},
\]
giving $\tilde{y}_i = \beta\tilde{x}_i + \tilde{\varepsilon}_i$, and
\[
\hat{\beta}_{GLS} = \big(\tilde{X}'\tilde{X}\big)^{-1}\tilde{X}'\tilde{Y}
= \frac{2 + \tfrac{40\cdot 7}{49}}{1 + \tfrac{25\cdot 7}{49}}
= \frac{98 + 40\cdot 7}{49 + 25\cdot 7}
= \frac{378}{224} = \frac{27}{16} = 1.6875.
\]
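A quick numerical check of this computation (a minimal numpy sketch; the Cholesky factor of $\Omega$ plays the role of $P$):

```python
import numpy as np

Y = np.array([2.0, 3.0])
X = np.array([[1.0], [2.0]])
Omega = np.array([[1.0, -0.5], [-0.5, 2.0]])

P = np.linalg.cholesky(Omega)        # lower-triangular P with Omega = P P'
P_inv = np.linalg.inv(P)
Y_t, X_t = P_inv @ Y, P_inv @ X      # GLS-transformed data

beta_gls = np.linalg.solve(X_t.T @ X_t, X_t.T @ Y_t)
print(beta_gls)                      # [1.6875] = 27/16

# Equivalent closed form: (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y
Oi = np.linalg.inv(Omega)
print((X.T @ Oi @ Y) / (X.T @ Oi @ X))
```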
(b) [5 points] Compute the $F$-statistic to test $\beta = 2$ using the available information. What is its finite sample distribution?
Solution. The transformed model satisfies the Gauss-Markov assumptions, and the errors remain normal. Hence,
\[
\hat{\beta}_{GLS} \mid X \;\sim\; \mathcal{N}\!\left(\beta, \big(\tilde{X}'\tilde{X}\big)^{-1}\right),
\]
where we already know that
\[
\big(\tilde{X}'\tilde{X}\big)^{-1} = \frac{1}{1 + \tfrac{25\cdot 7}{49}} = \frac{7}{32}.
\]
Note that in $R\beta = r$ we have simply $R = (1)$ and $r = (2)$. Hence, the $F$-statistic is
\[
F = \frac{1}{m}\big(R\hat{\beta} - r\big)'\Big[R\big(\tilde{X}'\tilde{X}\big)^{-1}R'\Big]^{-1}\big(R\hat{\beta} - r\big)
= \frac{1}{1}\left(\frac{27}{16} - 2\right)^2\left(\frac{7}{32}\right)^{-1}
= \frac{25}{56} \approx 0.45.
\]
Note that we can use $\tilde{\sigma}^2 = 1$ since it is known. As a result, the finite sample distribution of $F$ changes accordingly and $F \sim \chi^2(1)$.
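The arithmetic can be checked directly (a minimal sketch):

```python
XtX_inv = 7 / 32          # (X~'X~)^{-1} from part (a)
beta_gls = 27 / 16
R, r, m = 1.0, 2.0, 1

F = (R * beta_gls - r) ** 2 / (R * XtX_inv * R) / m
print(F, 25 / 56)          # both 0.44642857...
```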
A5 Consider a random walk with a hyperbolic trend,
\[
y_t = \beta\,\frac{1}{t} + y_{t-1} + \varepsilon_t
\]
for $t = 1, \ldots, T$, where $y_0 = 0$, the $\varepsilon_t$ are i.i.d. random variables distributed as $\mathcal{N}(0, \sigma^2)$, and $\sigma^2 > 0$ is assumed to be known.
(a) [5 points] Find the Maximum Likelihood estimator $\hat{\beta}_{MLE}$ of $\beta$. Is it unbiased?
Solution. The log-likelihood function is given by
\[
\ell(\beta \mid Y, y_0) = -\frac{T}{2}\log 2\pi - \frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^T\left(y_t - y_{t-1} - \beta\,\frac{1}{t}\right)^2.
\]
The score function then equals
\[
\frac{d}{d\beta}\,\ell(\beta \mid Y, y_0) = \frac{1}{\sigma^2}\sum_{t=1}^T \frac{1}{t}\left(y_t - y_{t-1} - \beta\,\frac{1}{t}\right).
\]
To find the maximum likelihood estimator, set the score to zero:
\[
\sum_{t=1}^T \frac{1}{t}\left(y_t - y_{t-1} - \hat{\beta}_{MLE}\,\frac{1}{t}\right) = 0,
\]
\[
\hat{\beta}_{MLE}\sum_{t=1}^T \frac{1}{t^2} = \sum_{t=1}^T \frac{1}{t}\,(y_t - y_{t-1}),
\]
\[
\hat{\beta}_{MLE} = \left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-1}\sum_{t=1}^T \frac{1}{t}\,(y_t - y_{t-1}).
\]
Since $y_t - y_{t-1} = \beta/t + \varepsilon_t$, we also have
\[
\mathrm{E}\big[\hat{\beta}_{MLE}\big] = \left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-1}\sum_{t=1}^T \frac{1}{t}\,\mathrm{E}\left[\beta\,\frac{1}{t} + \varepsilon_t\right] = \beta,
\]
so that $\hat{\beta}_{MLE}$ is unbiased.
(b) [5 points] What is the smallest possible variance of an unbiased estimator of $\beta$? Is it achieved by $\hat{\beta}_{MLE}$?
Solution. We have that
\[
\frac{d^2}{d\beta^2}\,\ell(\beta \mid Y, y_0) = -\frac{1}{\sigma^2}\sum_{t=1}^T \frac{1}{t^2},
\]
so that the Cramer-Rao bound is $\sigma^2\big(\sum_{t=1}^T \tfrac{1}{t^2}\big)^{-1}$. On the other hand,
\[
\mathrm{Var}\big[\hat{\beta}_{MLE}\big]
= \left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-2}\mathrm{Var}\left[\sum_{t=1}^T \frac{1}{t}\,(y_t - y_{t-1})\right]
= \left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-2}\sum_{t=1}^T \frac{1}{t^2}\,\mathrm{Var}\left[\beta\,\frac{1}{t} + \varepsilon_t\right]
= \left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-2}\sigma^2\sum_{t=1}^T\frac{1}{t^2}
= \sigma^2\left(\sum_{t=1}^T \frac{1}{t^2}\right)^{-1}.
\]
Thus, $\hat{\beta}_{MLE}$ achieves the Cramer-Rao bound.
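A small Monte Carlo sketch (with illustrative values $T = 50$, $\beta = 1$, $\sigma^2 = 1$) checking both unbiasedness and that the variance of $\hat{\beta}_{MLE}$ roughly matches the Cramer-Rao bound:

```python
import numpy as np

rng = np.random.default_rng(2)
T, beta, sigma2, reps = 50, 1.0, 1.0, 20_000
t = np.arange(1, T + 1)
cr_bound = sigma2 / np.sum(1 / t**2)

estimates = np.empty(reps)
for r in range(reps):
    eps = rng.normal(scale=np.sqrt(sigma2), size=T)
    dy = beta / t + eps                          # y_t - y_{t-1}
    estimates[r] = np.sum(dy / t) / np.sum(1 / t**2)

# mean ~ 1.0; the Monte Carlo variance should be close to the Cramer-Rao bound
print(estimates.mean(), estimates.var(), cr_bound)
```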
A6 Consider the following three regression models along with the corresponding null hypotheses:
(i) $y_i = \alpha + e^{\beta x_i} + \gamma z_i + \varepsilon_i$ with $H_0: \beta = 0$,
(ii) $y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i$ with $H_0: \beta = 0$,
(iii) $y_i = \alpha + x_i\beta^{-1} + \gamma z_i + \varepsilon_i$ with $H_0: \beta = 2$.
(a) [7 points] Which of the LR, W, and LM tests would you use in each case and why? Pick
each of the tests for just one model.
Solution. LM for (i), because the restricted model is linear and, hence, easy to estimate. W for (ii), as it is easy to obtain the unrestricted estimator. LR for (iii), since we can reparametrise the model with $\lambda = \beta^{-1}$, giving a linear model (so both the restricted and unrestricted versions are easy to estimate) without compromising the test statistic, as the LR statistic is invariant to reparametrisation.
(b) [3 points] If computational burden were not a problem and you had a large sample, would
you have any preference between the three tests under the null? Why?
Solution. No, because under the null all three statistics asymptotically follow a $\chi^2(1)$ distribution.
SECTION B
B1 Let $y_t$ follow an ARMA(1,1) process given by
\[
y_t = \phi y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}
\]
with independently and identically distributed errors $\varepsilon_t$ with zero mean and variance $\sigma^2$. Also, $|\phi| < 1$, $|\theta| < 1$, and $\phi + \theta \neq 0$.
(a) [5 points] Show that $y_t$ has the following MA($\infty$) representation:
\[
y_t = \varepsilon_t + \sum_{j=1}^\infty (\theta\phi^{j-1} + \phi^j)\varepsilon_{t-j}.
\]
Solution. It could also be shown by iterating backwards. Using lag polynomials we have
\[
(1 - \phi L)y_t = (1 + \theta L)\varepsilon_t,
\]
\[
y_t = \frac{1 + \theta L}{1 - \phi L}\,\varepsilon_t
= (1 + \theta L)\sum_{j=0}^\infty \phi^j\varepsilon_{t-j}
= \sum_{j=0}^\infty \phi^j\varepsilon_{t-j} + \theta\sum_{j=0}^\infty \phi^j\varepsilon_{t-(j+1)}
= \varepsilon_t + \sum_{j=1}^\infty \phi^j\varepsilon_{t-j} + \theta\sum_{j=1}^\infty \phi^{j-1}\varepsilon_{t-j}
= \varepsilon_t + \sum_{j=1}^\infty(\theta\phi^{j-1} + \phi^j)\varepsilon_{t-j}.
\]
(b) [6 points] Note that under our parameter restrictions $y_t$ is stationary. Let $\gamma(h) = \mathrm{Cov}[y_t, y_{t-h}]$. Show that
\[
\gamma(h) = \phi\gamma(h-1) + \mathrm{E}[\varepsilon_t y_{t-h}] + \theta\,\mathrm{E}[\varepsilon_{t-1}y_{t-h}]
\]
and, hence,
\[
\gamma(0) = \frac{\sigma^2(1 + \theta^2 + 2\theta\phi)}{1 - \phi^2}, \qquad
\gamma(1) = \frac{\sigma^2(1 + \theta\phi)(\theta + \phi)}{1 - \phi^2}, \qquad
\gamma(h) = \phi\gamma(h-1) \;\text{ for } h > 1.
\]
Solution. Since $y_t$ clearly has zero mean, $\gamma(h) = \mathrm{E}[y_t y_{t-h}]$ and, for $h \geq 0$ (with $\gamma(-1) = \gamma(1)$),
\[
y_t y_{t-h} = \phi y_{t-1}y_{t-h} + \varepsilon_t y_{t-h} + \theta\varepsilon_{t-1}y_{t-h},
\]
\[
\mathrm{E}[y_t y_{t-h}] = \phi\,\mathrm{E}[y_{t-1}y_{t-h}] + \mathrm{E}[\varepsilon_t y_{t-h}] + \theta\,\mathrm{E}[\varepsilon_{t-1}y_{t-h}],
\]
\[
\gamma(h) = \phi\gamma(h-1) + \mathrm{E}[\varepsilon_t y_{t-h}] + \theta\,\mathrm{E}[\varepsilon_{t-1}y_{t-h}].
\]
Using the derived relation and the MA($\infty$) representation for $h = 0, 1$ we get
\[
\gamma(0) = \phi\gamma(1) + \mathrm{E}[\varepsilon_t y_t] + \theta\,\mathrm{E}[\varepsilon_{t-1}y_t] = \phi\gamma(1) + \sigma^2 + \theta(\theta + \phi)\sigma^2,
\]
\[
\gamma(1) = \phi\gamma(0) + \mathrm{E}[\varepsilon_t y_{t-1}] + \theta\,\mathrm{E}[\varepsilon_{t-1}y_{t-1}] = \phi\gamma(0) + 0 + \theta\sigma^2.
\]
Solving the last two equations gives $\gamma(0)$ and $\gamma(1)$. Lastly, since $\mathrm{E}[\varepsilon_t y_{t-h}] = \mathrm{E}[\varepsilon_{t-1}y_{t-h}] = 0$ for $h > 1$, the last result follows.
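A simulation sketch (with illustrative values $\phi = 0.5$, $\theta = 0.3$, $\sigma^2 = 1$) comparing sample autocovariances with these formulas:

```python
import numpy as np

rng = np.random.default_rng(3)
phi, theta, sigma2, T = 0.5, 0.3, 1.0, 1_000_000

eps = rng.normal(scale=np.sqrt(sigma2), size=T + 1)
y = np.empty(T)
y_prev = 0.0
for t in range(T):
    y[t] = phi * y_prev + eps[t + 1] + theta * eps[t]
    y_prev = y[t]

gamma0 = sigma2 * (1 + theta**2 + 2 * theta * phi) / (1 - phi**2)
gamma1 = sigma2 * (1 + theta * phi) * (theta + phi) / (1 - phi**2)

print(np.mean(y * y), gamma0)                  # sample vs. theoretical gamma(0)
print(np.mean(y[1:] * y[:-1]), gamma1)         # sample vs. theoretical gamma(1)
print(np.mean(y[2:] * y[:-2]), phi * gamma1)   # gamma(2) = phi * gamma(1)
```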
(c) [3 points] Suppose that you observe $y_t$ for $t = 1, \ldots, T$ and have parameter estimates $\hat{\phi}$ and $\hat{\theta}$ (with $|\hat{\phi}| < 1$) as well as residuals $\hat{\varepsilon}_t$ for $t = 1, \ldots, T$. Using this information obtain a prediction for $y_{T+3}$. What feature of stationary ARMA processes do your results illustrate?
Solution. We have
\[
y_{T+1|T} = \hat{\phi}y_{T|T} + \varepsilon_{T+1|T} + \hat{\theta}\varepsilon_{T|T} = \hat{\phi}y_T + \hat{\theta}\hat{\varepsilon}_T,
\]
\[
y_{T+2|T} = \hat{\phi}y_{T+1|T} + \varepsilon_{T+2|T} + \hat{\theta}\varepsilon_{T+1|T} = \hat{\phi}^2 y_T + \hat{\theta}\hat{\phi}\hat{\varepsilon}_T,
\]
\[
y_{T+3|T} = \hat{\phi}y_{T+2|T} + \varepsilon_{T+3|T} + \hat{\theta}\varepsilon_{T+2|T} = \hat{\phi}^3 y_T + \hat{\theta}\hat{\phi}^2\hat{\varepsilon}_T.
\]
Notice that $y_{T+h|T} \to \mathrm{E}[y_t] = 0$ as $h \to \infty$. Thus, this illustrates the fact that stationary ARMA processes are mean-reverting.
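A minimal sketch of this forecast recursion (phi_hat, theta_hat, y_T, and eps_hat_T are placeholders for the estimated quantities; the numbers in the example call are arbitrary):

```python
def arma11_forecasts(phi_hat, theta_hat, y_T, eps_hat_T, h_max=3):
    """h-step-ahead forecasts y_{T+h|T} for an ARMA(1,1)."""
    forecasts = []
    y_prev, eps_prev = y_T, eps_hat_T
    for h in range(1, h_max + 1):
        y_next = phi_hat * y_prev + theta_hat * eps_prev
        forecasts.append(y_next)
        y_prev, eps_prev = y_next, 0.0   # future shocks are predicted as zero
    return forecasts

print(arma11_forecasts(0.5, 0.3, y_T=1.2, eps_hat_T=0.4))
```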
(d) [6 points] Suppose that you are only interested in the autoregressive parameter $\phi$ and estimate
\[
y_t = \rho y_{t-1} + u_t.
\]
What is the probability limit of the OLS estimate $\hat{\rho}$? When is it consistent for $\phi$?
Solution. We have
\[
\hat{\rho} = \frac{\sum_{t=2}^T y_{t-1}y_t}{\sum_{t=2}^T y_{t-1}^2}
= \frac{\sum_{t=2}^T y_{t-1}(\phi y_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1})}{\sum_{t=2}^T y_{t-1}^2}
= \phi + \frac{\tfrac{1}{T}\sum_{t=2}^T y_{t-1}(\varepsilon_t + \theta\varepsilon_{t-1})}{\tfrac{1}{T}\sum_{t=2}^T y_{t-1}^2}
\;\xrightarrow{p}\; \phi + \frac{\mathrm{E}[y_{t-1}\varepsilon_t] + \theta\,\mathrm{E}[y_{t-1}\varepsilon_{t-1}]}{\mathrm{E}\big[y_{t-1}^2\big]}
= \phi + \frac{\theta\sigma^2}{\gamma(0)}
= \phi + \frac{\theta(1 - \phi^2)}{1 + \theta^2 + 2\phi\theta}.
\]
Thus, we have consistency when $\theta = 0$, i.e., when the errors are not autocorrelated and we are in fact dealing with an AR(1).
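A quick simulation sketch (illustrative values $\phi = 0.5$, $\theta = 0.3$, $\sigma^2 = 1$) comparing $\hat{\rho}$ from a long sample with the probability limit derived above:

```python
import numpy as np

rng = np.random.default_rng(4)
phi, theta, T = 0.5, 0.3, 500_000

eps = rng.normal(size=T + 1)
y = np.empty(T)
y_prev = 0.0
for t in range(T):
    y[t] = phi * y_prev + eps[t + 1] + theta * eps[t]
    y_prev = y[t]

rho_hat = np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)
plim = phi + theta * (1 - phi**2) / (1 + theta**2 + 2 * phi * theta)
print(rho_hat, plim)   # both noticeably above phi = 0.5
```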
B2 Consider $T$ observations $y_1, \ldots, y_T$ following an AR(2) model $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t$ with independent and identically distributed errors $\varepsilon_t$ with zero mean and variance $\sigma^2 < \infty$.
(a) [4 points] Suppose that $\phi_1 = 0.3$ and $\phi_2 = -0.4$. Is $y_t$ stable?
Solution. The autoregressive lag polynomial is $\Phi(z) = 1 - 0.3z + 0.4z^2$ and has roots $3/8 \pm (\sqrt{151}/8)\,i$ with modulus $\sqrt{(3/8)^2 + 151/8^2} = \sqrt{10}/2 > 1$. Since both roots lie outside the unit circle, the process is stable.
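A numerical check of the root moduli (a minimal sketch using numpy.roots on the lag polynomial):

```python
import numpy as np

# Phi(z) = 1 - 0.3 z + 0.4 z^2, coefficients in decreasing powers of z
roots = np.roots([0.4, -0.3, 1.0])
print(roots, np.abs(roots))   # both moduli equal sqrt(10)/2 ~ 1.58 > 1
```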
(b) [6 points] Assume that $\phi_1$ and $\phi_2$ are such that $y_t$ is stable and, hence, stationary. Show that
\[
\gamma(1) = \phi_1\gamma(0) + \phi_2\gamma(1) \qquad\text{and}\qquad \gamma(2) = \phi_1\gamma(1) + \phi_2\gamma(0),
\]
where $\gamma(h) = \mathrm{Cov}[y_t, y_{t-h}]$ is the autocovariance function.
Solution. Since $y_t$ is stationary,
\[
\mathrm{E}[y_t] = \phi_1\mathrm{E}[y_{t-1}] + \phi_2\mathrm{E}[y_{t-2}] + \mathrm{E}[\varepsilon_t] = \phi_1\mathrm{E}[y_t] + \phi_2\mathrm{E}[y_t],
\]
so $\mathrm{E}[y_t] = 0$ and, hence, $\gamma(h) = \mathrm{E}[y_t y_{t-h}]$. Premultiply both sides of the model by $y_{t-1}$ and take expectations, noting that $\mathrm{E}[\varepsilon_t y_{t-1}] = 0$, to get
\[
y_t y_{t-1} = \phi_1 y_{t-1}^2 + \phi_2 y_{t-2}y_{t-1} + \varepsilon_t y_{t-1}, \qquad \gamma(1) = \phi_1\gamma(0) + \phi_2\gamma(1).
\]
Now multiply both sides by $y_{t-2}$ and take expectations to get
\[
y_t y_{t-2} = \phi_1 y_{t-1}y_{t-2} + \phi_2 y_{t-2}^2 + \varepsilon_t y_{t-2}, \qquad \gamma(2) = \phi_1\gamma(1) + \phi_2\gamma(0).
\]
(c) [5 points] Based on the results in (b), find method of moments estimators of $\phi_1$ and $\phi_2$.
Solution. We may additionally divide the two relations by $\gamma(0)$ to obtain their equivalents in terms of autocorrelations. Then, solving
\[
\rho(1) = \phi_1 + \phi_2\rho(1) \qquad\text{and}\qquad \rho(2) = \phi_1\rho(1) + \phi_2
\]
as a system of equations and replacing population autocorrelations by sample ones gives
\[
\hat{\phi}_1 = \frac{\hat{\rho}(1)\big(1 - \hat{\rho}(2)\big)}{1 - \hat{\rho}(1)^2} \qquad\text{and}\qquad \hat{\phi}_2 = \frac{\hat{\rho}(2) - \hat{\rho}(1)^2}{1 - \hat{\rho}(1)^2},
\]
where, for $h > 0$,
\[
\hat{\rho}(h) = \frac{\sum_{t=h+1}^T y_t y_{t-h}}{\sum_{t=1}^T y_t^2}.
\]
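A sketch of these method of moments (Yule-Walker) estimators on simulated AR(2) data ($\phi_1 = 0.3$, $\phi_2 = -0.4$ are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(5)
phi1, phi2, T = 0.3, -0.4, 200_000

y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()

def rho_hat(h):
    return np.sum(y[h:] * y[:T - h]) / np.sum(y**2)

r1, r2 = rho_hat(1), rho_hat(2)
phi1_hat = r1 * (1 - r2) / (1 - r1**2)
phi2_hat = (r2 - r1**2) / (1 - r1**2)
print(phi1_hat, phi2_hat)      # close to 0.3 and -0.4
```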
(d) [5 points] Assume that $\sigma^2 > 0$ is known. Construct a GMM estimator for $\phi_1$ and $\phi_2$ using more than two moment conditions. You may use an identity matrix as your weighting matrix.
Solution. Continuing in the same manner as in (b), we also get, e.g.,
\[
\gamma(3) = \phi_1\gamma(2) + \phi_2\gamma(1) \qquad\text{and}\qquad \gamma(4) = \phi_1\gamma(3) + \phi_2\gamma(2).
\]
Hence, the moment conditions in terms of autocovariances are
\[
g(\phi_1, \phi_2) = \mathrm{E}\begin{pmatrix}
\phi_1 y_{t-1}^2 + \phi_2 y_{t-2}y_{t-1} - y_t y_{t-1} \\
\phi_1 y_{t-1}y_{t-2} + \phi_2 y_{t-2}^2 - y_t y_{t-2} \\
\phi_1 y_{t-1}y_{t-3} + \phi_2 y_{t-2}y_{t-3} - y_t y_{t-3} \\
\phi_1 y_{t-1}y_{t-4} + \phi_2 y_{t-2}y_{t-4} - y_t y_{t-4}
\end{pmatrix} = 0.
\]
The sample moments then are
\[
\hat{g}(\phi_1, \phi_2) = \frac{1}{T-4}\sum_{t=5}^T\begin{pmatrix}
\phi_1 y_{t-1}^2 + \phi_2 y_{t-2}y_{t-1} - y_t y_{t-1} \\
\phi_1 y_{t-1}y_{t-2} + \phi_2 y_{t-2}^2 - y_t y_{t-2} \\
\phi_1 y_{t-1}y_{t-3} + \phi_2 y_{t-2}y_{t-3} - y_t y_{t-3} \\
\phi_1 y_{t-1}y_{t-4} + \phi_2 y_{t-2}y_{t-4} - y_t y_{t-4}
\end{pmatrix}.
\]
Thus,
\[
(\hat{\phi}_1, \hat{\phi}_2)' = \arg\min_{\phi_1, \phi_2}\; \hat{g}(\phi_1, \phi_2)'\,\hat{g}(\phi_1, \phi_2)
\]
defines a GMM estimator with the identity weighting matrix.
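A sketch of this GMM estimator with an identity weighting matrix, using scipy.optimize.minimize on simulated AR(2) data (the true values $\phi_1 = 0.3$, $\phi_2 = -0.4$ are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
phi1, phi2, T = 0.3, -0.4, 50_000

y = np.zeros(T)
for t in range(2, T):
    y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + rng.normal()

# lagged views: y_t, y_{t-1}, ..., y_{t-4}
yt, yt1, yt2, yt3, yt4 = y[4:], y[3:-1], y[2:-2], y[1:-3], y[:-4]

def g_bar(params):
    p1, p2 = params
    return np.array([np.mean(p1 * yt1 * yt1 + p2 * yt2 * yt1 - yt * yt1),
                     np.mean(p1 * yt1 * yt2 + p2 * yt2 * yt2 - yt * yt2),
                     np.mean(p1 * yt1 * yt3 + p2 * yt2 * yt3 - yt * yt3),
                     np.mean(p1 * yt1 * yt4 + p2 * yt2 * yt4 - yt * yt4)])

def objective(params):
    g = g_bar(params)
    return g @ g               # identity weighting matrix

result = minimize(objective, x0=np.array([0.0, 0.0]))
print(result.x)                # close to (0.3, -0.4)
```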
B3 Consider an i.i.d. sample of $(y_i, x_i')'$ for $i = 1, 2, \ldots, n$, where $y_i$ is binary. That is, $y_i$ can only take values in $\{0, 1\}$. We model the conditional distribution of $y_i$ assuming that $\mathrm{P}(y_i = 1 \mid x_i) = x_i'\beta$ for some unknown $\beta$ and, hence, consider estimating $y_i = x_i'\beta + \varepsilon_i$.
(a) [4 points] Find $\mathrm{E}[y_i \mid x_i]$ and $\mathrm{Var}[y_i \mid x_i]$. Which one of the Gauss-Markov assumptions is violated and why?
Solution. Conditional on $x_i$, $y_i$ is a Bernoulli random variable. Hence, $\mathrm{E}[y_i \mid x_i] = x_i'\beta$ and $\mathrm{Var}[y_i \mid x_i] = x_i'\beta\,(1 - x_i'\beta)$. Since this conditional variance depends on $x_i$, the model is conditionally heteroskedastic, violating the homoskedasticity assumption.
(b) [5 points] Provide a procedure to estimate the model efficiently and mention any additional assumptions that you make.
Solution. Step 1: estimate $y_i = x_i'\beta + \varepsilon_i$ by OLS and construct $\hat{\sigma}(x_i) = \big[x_i'\hat{\beta}\,\big(1 - x_i'\hat{\beta}\big)\big]^{1/2}$, where we assume that the fitted values $x_i'\hat{\beta}$ always fall strictly between 0 and 1. Step 2: use FGLS with $\hat{\Omega} = \mathrm{diag}\big(\hat{\sigma}^2(x_1), \ldots, \hat{\sigma}^2(x_n)\big)$ or, equivalently, WLS (Weighted Least Squares) with weights $1/\hat{\sigma}(x_i)$.
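A minimal sketch of this two-step procedure on simulated data (the coefficients are chosen so the true probabilities stay inside $(0, 1)$; the clipping of fitted values is an added practical safeguard, not part of the procedure above):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])
p = X @ np.array([0.5, 0.2])          # true P(y=1 | x), kept inside (0, 1)
y = rng.binomial(1, p)

# Step 1: OLS and estimated conditional standard deviations
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
p_hat = np.clip(X @ beta_ols, 1e-3, 1 - 1e-3)   # guard against boundary issues
sigma_hat = np.sqrt(p_hat * (1 - p_hat))

# Step 2: WLS = OLS on data reweighted by 1 / sigma_hat
Xw, yw = X / sigma_hat[:, None], y / sigma_hat
beta_wls = np.linalg.lstsq(Xw, yw, rcond=None)[0]
print(beta_ols, beta_wls)
```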
(c) [5 points] Suppose that $\mathrm{P}(y_i = 1 \mid x_i) = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i}$. That is, we have just a constant term and two additional regressors. Could you use the OLS estimator along with the usual (i.e., derived under the Gauss-Markov assumptions) $F$-statistic to test $\beta_1 = \beta_2 = 0$? What about $\beta_1 = \beta_2 = 1$?
Solution. Yes and no, respectively. Under the null $\beta_1 = \beta_2 = 0$ the model becomes homoskedastic with $\mathrm{Var}[y_i \mid x_i] = \beta_0(1 - \beta_0)$, so the usual $F$-statistic is valid. Under $\beta_1 = \beta_2 = 1$, however, we have $\mathrm{Var}[y_i \mid x_i] = (\beta_0 + x_{1,i} + x_{2,i})(1 - \beta_0 - x_{1,i} - x_{2,i})$, which is heteroskedastic and invalidates the usual $F$-statistic, since one of the Gauss-Markov assumptions is violated.
(d) [6 points] Consider again the general case $\mathrm{P}(y_i = 1 \mid x_i) = x_i'\beta$. Suppose that instead of your proposal in (b), you use the OLS estimator. Construct an asymptotically valid $F$-statistic to test $R\beta = r$. (You do not need to actually prove its asymptotic validity.)
Solution. Using OLS we have that
\[
\sqrt{n}\,\big(\hat{\beta} - \beta\big) \;\xrightarrow{d}\; \mathcal{N}\Big(0,\; \big(\mathrm{E}[x_i x_i']\big)^{-1}\mathrm{E}\big[\varepsilon_i^2 x_i x_i'\big]\big(\mathrm{E}[x_i x_i']\big)^{-1}\Big).
\]
Let $V = \big(\mathrm{E}[x_i x_i']\big)^{-1}\mathrm{E}\big[\varepsilon_i^2 x_i x_i'\big]\big(\mathrm{E}[x_i x_i']\big)^{-1}$, and note that we can further write
\[
\mathrm{E}\big[\varepsilon_i^2 x_i x_i'\big] = \mathrm{E}\big[\mathrm{E}\big[\varepsilon_i^2 \mid x_i\big]x_i x_i'\big] = \mathrm{E}\big[x_i'\beta\,\big(1 - x_i'\beta\big)x_i x_i'\big].
\]
Hence, $V$ could be estimated as
\[
\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n \hat{\varepsilon}_i^2 x_i x_i'\right)\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1},
\]
where the $\hat{\varepsilon}_i$ are the OLS residuals, as well as
\[
\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n x_i'\hat{\beta}\,\big(1 - x_i'\hat{\beta}\big)x_i x_i'\right)\left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}.
\]
Denote an estimator of $V$ by $\hat{V}$. Then, by Slutsky's theorem, under the null,
\[
\sqrt{n}\,\big(R\hat{\beta} - r\big) \;\xrightarrow{d}\; \mathcal{N}\big(0, RVR'\big).
\]
Thus,
\[
F = \frac{n}{m}\big(R\hat{\beta} - r\big)'\big[R\hat{V}R'\big]^{-1}\big(R\hat{\beta} - r\big),
\]
where $m$ is the number of linearly independent restrictions (the rank of $R$), is asymptotically valid with $F \xrightarrow{d} \chi^2(m)/m$.
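A sketch of this heteroskedasticity-robust statistic on simulated linear-probability data ($R$, $r$, and the data-generating values are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x1, x2])
beta_true = np.array([0.5, 0.1, 0.1])
y = rng.binomial(1, X @ beta_true)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Sandwich estimator of V = Sxx^{-1} E[eps^2 x x'] Sxx^{-1}
Sxx = X.T @ X / n
meat = (X * (resid**2)[:, None]).T @ X / n
A = np.linalg.inv(Sxx)
V_hat = A @ meat @ A

# Test beta_1 = beta_2 = 0.1 (true under this DGP), i.e. R beta = r
R = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
r = np.array([0.1, 0.1])
m = R.shape[0]
diff = R @ beta_hat - r
F = n / m * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)
print(F)    # distributed approximately as chi^2(2)/2 under the null
```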
B4 Discuss the following two topics.
(a) [10 points] Efficient estimation and inferences about the regression coefficients.
Solution. Under the Gauss-Markov assumptions use OLS (which is efficient) with usual
standard errors. Under heteroskedasticity use the efficient GLS or asymptotically efficient
FGLS; if the conditional variance-covariance structure is unknown, use the heteroskedasticity-
robust standard errors for valid inferences. With time series data, use the OLS with its
usual standard errors when the Gauss-Markov assumptions are satisfied, with the HAC
standard errors if unsure about the serial dependence structure in the errors, and the FGLS
when the dependence structure can be modelled. Maximum Likelihood is another option with time series data and achieves the Cramer-Rao bound; among other things, we also use it for nonlinear (e.g., limited dependent variable) models. Under endogeneity, use
the IV, 2SLS, or optimal GMM with the corresponding standard errors. 2SLS is as efficient
as the optimal GMM under homoskedasticity. GMM is more efficient than 2SLS under
heteroskedasticity. In general, optimal GMM is efficient in a wide class of estimators.
(b) [10 points] Inconsistent estimators and invalid inferences.
Solution. If we treat a model equation as describing a causal mechanism, endogeneity (as
a result of selection bias, omitted variables, measurement errors, simultaneous causality)
leads to inconsistency. Naturally, if we try to eliminate endogeneity using instrumental
variables, the bias will come back if the instruments are not valid (failure of their exogeneity). More generally, Maximum Likelihood may lead to inconsistent estimation under
incorrect distributional specifications.
The main reason for invalid inferences is using wrong standard errors, i.e., those that do
not correspond to the estimator that we are actually using. For instance, if the model is
heteroskedastic, we must take it into account at least using the heteroskedasticity-robust
standard errors. Incorrectly using the usual standard errors based on the Gauss-Markov
assumptions will lead to incorrect test statistics and, hence, invalid inferences. Another
typical issue is not taking into account serial correlation in the errors by using, say, FGLS
or HAC standard errors.