ECON6300/7320
Advanced Microeconometrics
Introduction and Review
Fu Ouyang
University of Queensland
Lecture 1
1 / 54
Course Staff
▶ Lecturer: Fu Ouyang, uqfouyan@uq.edu.au, 39-532A
▶ Lecture: 14:00-16:00 on Thursdays, 78-420
▶ Consultation: 10:00-11:00 on Wednesdays, 39-532A
▶ Tutor: Dao Nguyen
▶ Tutorial session: 18:00-20:00 on Tuesdays, 39-208
▶ Consultation: 14:30-15:30 on Fridays, 39-125B
2 / 54
Course Information
▶ The course will focus on estimation and inference methods
that are widely used in applied microeconomics.
▶ The course has a topics-based structure, and theory and
applications are closely integrated.
▶ Whenever you learn a method theoretically (lecture), there
will be an R exercise with data for the method (practical)
▶ The course will provide econometric skills that can be
used in quantitative research at the graduate level.
3 / 54
Course Meetings: Lectures and Tutorials
▶ Lectures
▶ Go through theory, illustrative examples, perhaps some
proofs, and Q&A
▶ Tutorials
▶ See the Course Timetable posted on Blackboard for the
time and location.
▶ Focus is on practical implementation in R
▶ Start in Week 2 (introduction to R programming)
4 / 54
Assessment
▶ Two assignments + Final problem set.
▶ Analytical and empirical problems.
▶ Assignments 1 and 2 each cover one third of the course
topics (30% each).
▶ Final problem set is comprehensive and covers the entire
course (40%).
5 / 54
Learning Resources
▶ Recommended Textbooks:
▶ Hansen, B.E. (2022). Econometrics. Princeton University
Press. (H)
▶ Cameron, A.C., and P.K. Trivedi (2005).
Microeconometrics: Methods and Applications. Cambridge
University Press. (CT)
▶ There are many other useful texts, e.g., Greene,
Wooldridge, etc. See ECP.
▶ As the semester progresses, lecture notes, slides,
datasets, problem sets, etc. will be provided. You are,
however, strongly encouraged to read the relevant parts of
the textbooks and other references.
6 / 54
This week: math review
▶ You are EXPECTED to have a basic grasp of
mathematics/statistics. If most of the concepts we cover today are
unfamiliar to you, this course may be unsuitable (unless it is
compulsory (!!!)).
▶ Random Variables, Expectation, Variance, Covariance
▶ Point Estimation, Hypothesis Testing, Interval Estimation
▶ Linear Algebra: vector and matrix and their operations
7 / 54
Math Review: random variable
▶ Informally, a Random Variable (RV) takes on numerical values
determined by an experiment.
▶ Example: Consider an experiment in which a fair coin is tossed. The
possible outcomes are Head and Tail, i.e., {H, T}. We define a random
variable X as follows:
  X = 0 if the outcome is T
  X = 1 if the outcome is H
X could be either 0 or 1 whenever the coin is tossed.
Each time, the realisation of X would be different.
The uncertainty can be summarised as Pr(X = 0) = 0.5.
8 / 54
Math Review: probability functions
▶ The uncertainty of a random variable, say X, is represented by its
cumulative distribution function (CDF), defined as
  F_X(x) := Pr(X ≤ x)
▶ When F_X(x) is a continuous function, X is said to be a continuous
random variable. For a continuous random variable, the probability
density function (PDF), denoted by f_X(x), is a function that satisfies
  Pr(a < X < b) = F_X(b) − F_X(a) = ∫_a^b f_X(t) dt
▶ If F_X(x) is differentiable, we have
  f_X(x) = (d/dx) F_X(x)
▶ If F_X(x) is a step function, X is discrete, which we do not cover today.
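▶ A quick numerical check in R (a sketch using the standard normal; the
values are illustrative): dnorm is the derivative of pnorm, and integrating
the PDF recovers interval probabilities.
  x <- 1.0; h <- 1e-6
  (pnorm(x + h) - pnorm(x - h)) / (2 * h)  # numerical derivative of the CDF
  dnorm(x)                                 # PDF at x: essentially the same number
  integrate(dnorm, -1, 1)$value            # integral of f_X over (-1, 1)
  pnorm(1) - pnorm(-1)                     # F_X(1) - F_X(-1): same probability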
9 / 54
Math Review: expectation
▶ A function of a random variable is also a random variable. For example,
exp(X), log(X), etc. More generally, g(X) for some function g(·).
▶ Suppose X has the PDF f_X(x). The expectation of g(X) is
  E[g(X)] := ∫_{−∞}^{∞} g(t) f_X(t) dt
which represents the central tendency of the random variable g(X).
▶ As a special case with g(X) = X,
  E[X] := ∫_{−∞}^{∞} t f_X(t) dt
which is the expectation of X, also called the expected value or the mean.
▶ Consider another case with g(X) = (X − E[X])².
  V(X) := E[(X − E[X])²]
which is called the variance and represents the variability of X. Note
that √V(X) is the standard deviation of X.
10 / 54
Math Review: expectation
▶ E[X] and V(X) are population parameters, or theoretical moments.
You need to know the distribution to compute E[X] and V(X).
▶ E[X] must be distinguished from the sample mean (average). For
example, when you observe some numbers x_1, . . . , x_n, their average is
  x̄_n := (1/n) ∑_{i=1}^n x_i
which is NOT the expectation of the distribution.
▶ Similarly, you must distinguish V(X) from the sample variance
  (1/n) ∑_{i=1}^n (x_i − x̄_n)²
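▶ A short R sketch (with illustrative parameter values) contrasting sample
and population moments:
  set.seed(1)
  x <- rnorm(1000, mean = 2, sd = 3)  # population: E[X] = 2, V(X) = 9
  mean(x)                             # sample mean: close to 2, but not exactly 2
  mean((x - mean(x))^2)               # sample variance with 1/n: close to 9
  var(x)                              # note: R's var() divides by n - 1, not n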
11 / 54
Math Review: expectation
1. E[c] = c, for any constant c
2. E[aX + b] = aE[X] + b for any constants a and b
3. a_1, . . . , a_n are constants, and X_1, . . . , X_n are RVs. Then,
  E[∑_{i=1}^n a_i X_i] = ∑_{i=1}^n a_i E[X_i].
As a special case, we have
  E[∑_{i=1}^n X_i] = ∑_{i=1}^n E[X_i].
4. V(c) = 0, for any constant c.
5. V(aX + b) = a²V(X) for any constants a and b.
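▶ Property 5 is easy to check by simulation in R (a sketch with arbitrary
a and b):
  set.seed(1)
  x <- rnorm(1e5); a <- 3; b <- 2
  var(a * x + b)  # approximately a^2 * V(X) = 9; the shift b does not matter
  a^2 * var(x)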
12 / 54
Math Review: joint distribution
▶ Suppose X and Y are jointly distributed with the joint PDF f_XY(x, y).
▶ The marginal PDF of X can be obtained by integrating Y out,
  f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy
▶ X and Y are independent if and only if for all x and y
  f_XY(x, y) = f_X(x) f_Y(y)
▶ The expectation of g(X, Y) for some function g(X, Y) is
  E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_XY(x, y) dx dy
13 / 54
Math Review: covariance
▶ If g(X, Y) = (X − E[X])(Y − E[Y]), then we have the covariance,
  C(X, Y) = E[(X − E[X])(Y − E[Y])]
▶ If C(X, Y) > 0, X and Y tend to move in the same direction.
If C(X, Y) < 0, they tend to move in opposite directions.
▶ |C(X, Y)| ≤ √V(X) √V(Y). So, the correlation coefficient
  ρ_XY := C(X, Y) / (√V(X) √V(Y))
is always between −1 and 1.
▶ If X and Y are independent, C(X, Y) = 0. But the converse is not
generally true.
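▶ In R, cov() and cor() compute the sample analogues; the sketch below
(arbitrary data-generating process) also illustrates that zero covariance
does not imply independence:
  set.seed(1)
  x <- rnorm(1e4)
  y <- 2 * x + rnorm(1e4)  # moves with x
  cov(x, y); cor(x, y)     # positive covariance; correlation in (-1, 1)
  z <- x^2                 # depends on x, yet C(X, X^2) = E[X^3] = 0 here
  cor(x, z)                # close to 0 despite dependence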
14 / 54
Math Review: conditional distribution
▶ Suppose X and Y are jointly distributed with the joint PDF f_XY(x, y).
Then, the conditional PDF of Y given X = x is
  f_{Y|X}(y|x) := f_XY(x, y) / f_X(x)
which summarises the distribution of Y when X takes a value x.
▶ If X and Y are independent, f_{Y|X}(y|x) = f_Y(y) and f_{X|Y}(x|y) = f_X(x)
15 / 54
Math Review: conditional expectation
▶ The conditional expectation of Y given X = x is
  E[Y|X = x] = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy
▶ The conditional variance of Y given X = x is
  V(Y|X = x) = E[(Y − E[Y|X = x])² | X = x]
▶ E[Y|X = x] and V(Y|X = x) are functions of x.
▶ When x is not specified, E[Y|X] and V(Y|X) are random.
16 / 54
Math Review: conditional expectation
1. E[c(X)|X] = c(X) for any function c(·)
2. E[a(X)Y + b(X)|X] = a(X)E[Y|X] + b(X) for any functions a(·), b(·)
3. If X and Y are independent, E[Y|X] = E[Y]
4. E[E[Y|X]] = E[Y]. This is called the law of iterated expectations.
5. If X and Y are independent, V(Y|X) = V(Y)
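▶ The law of iterated expectations (property 4) can be checked by
simulation (a sketch with a hypothetical model in which E[Y|X] = X):
  set.seed(1)
  x <- rnorm(1e5)
  y <- x + rnorm(1e5)  # E[Y|X] = X by construction
  mean(x)              # Monte Carlo version of E[E[Y|X]]
  mean(y)              # Monte Carlo version of E[Y]; both are near 0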
17 / 54
Math Review: normal distribution
▶ Suppose a random variable Y follows a normal distribution with mean µ
and variance σ². Or simply we can write
  Y ∼ N(µ, σ²)
▶ Y takes on a value from R = (−∞, ∞).
Whenever it is drawn from N(µ, σ²), the realisation of Y will be different.
The uncertainty is summarised by its probability density function (PDF),
  f_Y(y|µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²))
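▶ In R, dnorm evaluates this PDF; a quick sketch checking it against the
formula above (arbitrary µ, σ², and y):
  mu <- 1; sigma2 <- 4; y <- 0.5
  1 / sqrt(2 * pi * sigma2) * exp(-(y - mu)^2 / (2 * sigma2))
  dnorm(y, mean = mu, sd = sqrt(sigma2))  # same number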
18 / 54
Math Review: normal distribution
▶ The shape of f_Y(y|µ, σ²) when µ = 0 and σ² = 1: [figure: standard
normal density curve]
▶ It is symmetric about µ and the scale is determined by σ := √σ²
▶ The whole shape (distribution) is completely determined by (µ, σ²)
19 / 54
Math Review: normal distribution
▶ E[Y] = µ and V(Y) = σ² for Y ∼ N(µ, σ²)
▶ N(0, 1) is called the standard normal distribution (the graph above),
often used for statistical inference
▶ I simulated three numbers from N(0, 1) using my computer and I
obtained
  {0.5377, 1.8339, −2.2588}
▶ These numbers are realisations of N(0, 1). Their average is NOT the
expectation of N(0, 1). In fact, the average is 0.0376 while the
expectation is 0.
▶ But the average seems quite close to the mean.
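▶ You can reproduce this kind of experiment in R (your draws will differ
from the numbers above, which came from another machine):
  set.seed(123)  # any seed
  y <- rnorm(3)  # three draws from N(0, 1)
  y
  mean(y)        # a realisation of the sample mean, not E[Y] = 0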
20 / 54
Math Review: point estimation for normal mean
▶ Suppose Y_1, . . . , Y_n are independently and identically distributed
(i.i.d.) as N(µ, σ²), i.e.,
  Y_1, . . . , Y_n are i.i.d. N(µ, σ²),
and also assume that µ is unknown but σ² is known. That is, if we knew
the parameter µ, we would know the whole distribution.
▶ We wish to learn µ from the random sample of size n, i.e., Y_1, . . . , Y_n.
In particular, we consider the (point) estimator
  µ̂_n := (1/n) ∑_{i=1}^n Y_i
▶ The estimator µ̂_n is random because Y_1, . . . , Y_n are all random.
(Its realisation is an estimate of µ.)
21 / 54
Math Review: sampling distribution of the estimator
▶ Result: a linear transformation of normal variables is normal.
▶ This implies that
  µ̂_n = (1/n) ∑_{i=1}^n Y_i
is normally distributed and its distribution is fully characterised by its
mean and variance
▶ That is, we know that
  µ̂_n ∼ N(E[µ̂_n], V(µ̂_n))
▶ Then, what are the mean and the variance of the estimator µ̂_n?
22 / 54
Math Review: sampling distribution of the estimator
▶ Result 2: E[X + Z] = E[X] + E[Z] for any two RVs X and Z
▶ This implies that
  E[µ̂_n] = (1/n) ∑_{i=1}^n E[Y_i] = (1/n) ∑_{i=1}^n µ = (1/n)(nµ) = µ
which means
  µ̂_n ∼ N(µ, V(µ̂_n))
▶ The estimator µ̂_n is unbiased: E[µ̂_n] = µ
23 / 54
Math Review: sampling distribution of the estimator
▶ Result: V(aX) = a²V(X) when X is random and a is a constant
▶ Result: V(X + Z) = V(X) + V(Z) if X and Z are independent
▶ Therefore,
  V(µ̂_n) = V((1/n) ∑_{i=1}^n Y_i) = (1/n²) ∑_{i=1}^n V(Y_i)
         = (1/n²) ∑_{i=1}^n σ² = (1/n²)(nσ²) = σ²/n
▶ Finally, we have
  µ̂_n ∼ N(µ, σ²/n)
where µ is the unknown parameter and σ² is assumed to be known.
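▶ A simulation sketch of this sampling distribution (hypothetical µ, σ, n):
  set.seed(1)
  mu <- 2; sigma <- 3; n <- 50
  muhat <- replicate(1e4, mean(rnorm(n, mu, sigma)))  # many realisations of the estimator
  mean(muhat)   # approximately mu = 2
  var(muhat)    # approximately sigma^2 / n = 0.18
  sigma^2 / n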
24 / 54
Math Review: consistency of the estimator
▶ Recall that
  µ̂_n ∼ N(µ, σ²/n)
in particular, E[µ̂_n] = µ and V(µ̂_n) = σ²/n
▶ The estimator is distributed centred around µ and its variance gets
smaller as n grows.
▶ This suggests that µ̂_n gets ‘very’ close to µ as n → ∞.
▶ This property is referred to as the consistency of the estimator.
25 / 54
Math Review: consistency of the estimator
▶ Suppose X_1, X_2, . . . is a sequence of random variables such that
  lim_{n→∞} Pr(|X_n − c| > ε) = 0
for any ε > 0. Then X_n converges in probability to c, and we often write
  X_n →p c
▶ µ̂_n is consistent for µ if µ̂_n →p µ
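▶ A sketch of convergence in probability (hypothetical µ and ε): the
exceedance probability shrinks as n grows.
  set.seed(1)
  mu <- 2; eps <- 0.1
  for (n in c(10, 100, 1000, 10000)) {
    muhat <- replicate(2000, mean(rnorm(n, mean = mu)))
    print(mean(abs(muhat - mu) > eps))  # Monte Carlo Pr(|muhat - mu| > eps), heading to 0
  }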
26 / 54
Math Review: statistical inference
▶ When
  Y ∼ N(µ, σ²),
the properties of normal random variables imply
  (Y − µ)/σ ∼ N(0, 1)
Exercise: verify this.
▶ Back to our estimator µ̂_n: since
  µ̂_n ∼ N(µ, σ²/n)
we have
  (µ̂_n − µ)/√(σ²/n) ∼ N(0, 1)
27 / 54
Math Review: statistical inference
▶ Recall that when Z ∼ N(0, 1),
  Pr(|Z| ≤ 1.96) = 0.95 or, equivalently, Pr(|Z| > 1.96) = 0.05
▶ That is, the event |Z| ≤ 1.96 happens with 95% probability,
but the event |Z| > 1.96 happens with 5% probability.
▶ So, the event |Z| ≤ 1.96 is likely to happen,
but the event |Z| > 1.96 is unlikely to happen.
▶ Conversely, when the event |Z| > 1.96 happens, you might want to
suspect whether Z really follows N(0, 1)
28 / 54
Math Review: statistical inference
▶ We know
  (µ̂_n − µ)/√(σ²/n) ∼ N(0, 1)
but we do not know the true value of µ.
▶ So, let’s assume that µ = µ_0 for some number µ_0 and study whether this
value is reasonable with respect to our distributional knowledge of µ̂_n.
▶ We test the null hypothesis
  H_0 : µ = µ_0
29 / 54
Math Review: statistical inference
▶ Under the null hypothesis H_0, we know (µ̂_n − µ_0)/√(σ²/n) ∼ N(0, 1)
and the event
  |(µ̂_n − µ_0)/√(σ²/n)| > 1.96
happens with 5% probability (a low-probability event).
▶ Since µ_0 is a hypothesised value, we take this unlikely event as
statistical evidence against H_0. So, we reject H_0.
▶ We admit that even if H_0 is correct, we could mistakenly reject H_0 with
5% probability. This error rate is called the size (level) of the test.
▶ You could use a different size. For example, you could use
  Pr(|Z| > 2.5758) = 0.01
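▶ A minimal z-test in R with known σ² (all values hypothetical):
  set.seed(1)
  n <- 100; sigma2 <- 4
  y <- rnorm(n, mean = 0.3, sd = sqrt(sigma2))  # true mu = 0.3
  mu0 <- 0                                      # null value
  z <- (mean(y) - mu0) / sqrt(sigma2 / n)
  abs(z) > 1.96                                 # TRUE means reject H0 at the 5% level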
30 / 54
Math Review: interval estimation
▶ A confidence interval (CI) can be obtained by inverting the test.
▶ For example, the 95% CI is the set of all values of µ_0 that would not be
rejected by the test of 5% size. After some algebra, we construct
  95% CI = [µ̂_n − 1.96 √(σ²/n), µ̂_n + 1.96 √(σ²/n)]
or simply we may write µ̂_n ± 1.96 √(σ²/n)
▶ CI = interval estimator
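▶ The corresponding computation in R (reusing the hypothetical setup from
the z-test sketch):
  set.seed(1)
  n <- 100; sigma2 <- 4
  y <- rnorm(n, mean = 0.3, sd = sqrt(sigma2))
  mean(y) + c(-1.96, 1.96) * sqrt(sigma2 / n)  # 95% CI for mu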
31 / 54
Math Review: asymptotic inference
▶ We relax the normality assumption. That is, Y_1, . . . , Y_n are i.i.d. from
some distribution (not necessarily normal) with mean µ and variance σ²
▶ Two useful asymptotic results:
  µ̂_n →p µ   (Weak Law of Large Numbers, WLLN)
  √n (µ̂_n − µ)/σ →d N(0, 1)   (Central Limit Theorem, CLT)
▶ The WLLN directly implies consistency of µ̂_n
32 / 54
Math Review: asymptotic inference
▶ The CLT
  √n (µ̂_n − µ)/σ →d N(0, 1)
implies that when n is large
  √n (µ̂_n − µ)/σ ∼a N(0, 1)
where ∼a means ‘approximately distributed,’ which can be rewritten as
  (µ̂_n − µ)/√(σ²/n) ∼a N(0, 1)
33 / 54
Math Review: asymptotic inference
▶ So, the hypothesis test and confidence interval are asymptotically
valid even without the normality assumption.
▶ For example, we reject H_0 : µ = µ_0 at the 5% level if
  |(µ̂_n − µ_0)/√(σ²/n)| > 1.96
and the asymptotic 95% confidence interval is
  µ̂_n ± 1.96 √(σ²/n)
▶ Under normality, (µ̂_n − µ_0)/√(σ²/n) ∼ N(0, 1) exactly, but
without normality, (µ̂_n − µ_0)/√(σ²/n) ∼a N(0, 1) only approximately.
34 / 54
Math Review: asymptotic inference
▶ So far, we have assumed that σ² is known. Now, we relax this
assumption.
▶ If σ² is unknown, we cannot compute the test statistic or the
confidence interval.
▶ Suppose we have a consistent estimator for σ², i.e.,
  σ̂²_n →p σ²
▶ Exercise: show that the CLT and the Slutsky lemma imply
  √n (µ̂_n − µ)/σ̂_n →d N(0, 1)
35 / 54
Math Review: asymptotic inference
▶ Then, this result
  √n (µ̂_n − µ)/σ̂_n →d N(0, 1)
implies that when n is large
  √n (µ̂_n − µ)/σ̂_n ∼a N(0, 1)
where ∼a means ‘approximately distributed,’ which can be rewritten as
  (µ̂_n − µ)/√(σ̂²_n/n) ∼a N(0, 1)
36 / 54
Math Review: asymptotic inference
▶ Therefore, we can construct the asymptotic test and confidence
interval as before.
▶ Specifically, we reject H_0 : µ = µ_0 at the 5% level if
  |(µ̂_n − µ_0)/√(σ̂²_n/n)| > 1.96
and the asymptotic 95% confidence interval is
  µ̂_n ± 1.96 √(σ̂²_n/n)
▶ Note that √(σ̂²_n/n) is called the standard error, se(µ̂_n) = √(σ̂²_n/n).
37 / 54
Math Review: asymptotic inference
▶ Recall that if Z ∼ N(0, 1), then Z² ∼ χ²(1).
▶ Also, if Z_1, . . . , Z_K are i.i.d. N(0, 1), then ∑_{j=1}^K Z_j² ∼ χ²(K)
▶ Under H_0 : µ = µ_0, since
  (µ̂_n − µ_0)/√(σ̂²_n/n) ∼a N(0, 1)
we have
  W_n := ((µ̂_n − µ_0)/√(σ̂²_n/n))² ∼a χ²(1)
▶ W_n is the Wald statistic. At the 5% level, we reject H_0 if W_n > (1.96)².
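▶ A sketch of the Wald test in R (hypothetical data; var() serves as the
consistent estimator of σ²):
  set.seed(1)
  n <- 100
  y <- rnorm(n, mean = 0.5)
  mu0 <- 0
  Wn <- (mean(y) - mu0)^2 / (var(y) / n)  # Wald statistic
  Wn > 1.96^2                             # TRUE means reject H0 at the 5% level
  qchisq(0.95, df = 1)                    # the same critical value, about (1.96)^2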
38 / 54
Math Review: asymptotic inference
▶ We started from the result
  √n (µ̂_n − µ)/σ →d N(0, 1), or √n (µ̂_n − µ) →d N(0, σ²)
and by squaring both sides and using σ̂²_n →p σ², we obtain
  W_n = n (µ̂_n − µ)(σ̂²_n)^(−1)(µ̂_n − µ) ∼a χ²(1)
▶ More generally, for a parameter θ ∈ R^K and its estimator θ̂_n, if
  √n (θ̂_n − θ) →d N(0, V),
we can construct the Wald statistic
  n (θ̂_n − θ)′ V̂^(−1) (θ̂_n − θ) ∼a χ²(K)
with a consistent estimator V̂ →p V, the asymptotic variance-covariance
matrix of θ̂_n. You will see this sandwich form often.
39 / 54
Math Review: asymptotic inference
▶ To test
  H_0 : θ_1 = θ_{1,0}, θ_2 = θ_{2,0}, . . . , θ_q = θ_{q,0}
at the 0.05 level, why not do q z-tests?
▶ Let q = 2, so we conduct two z-tests, each at level α = 0.05, with
  Z_1 = (θ̂_1 − θ_{1,0})/se(θ̂_1),  Z_2 = (θ̂_2 − θ_{2,0})/se(θ̂_2)
▶ Reject H_0 if either z-test rejects the null. Then
  P(Reject | H_0) = P(|Z_1| > 1.96 | H_0) + P(|Z_2| > 1.96 | H_0)
                    − P(|Z_1| > 1.96, |Z_2| > 1.96 | H_0)
                  = 2 × 0.05 − p = 0.1 − p
▶ Problem 1: we do not know p
▶ Problem 2: 0.1 − p ≠ 0.05
▶ Even in the best case, in which Z_1 and Z_2 are independent, p = 0.05²,
so the level of this test is 0.1 − 0.05² ≈ 0.1 ≠ 0.05!
▶ Under independence we could adjust the level of the z-tests by solving
2α − α² = 0.05 (α ≈ 0.0253). But Z_1 and Z_2 will never be independent
in practice...
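▶ The size calculation is a one-liner in R:
  alpha <- 0.05
  2 * alpha - alpha^2  # size of the two-z-test procedure under independence: 0.0975
  1 - sqrt(1 - 0.05)   # the alpha solving 2a - a^2 = 0.05: about 0.0253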
40 / 54
Math Review: matrix algebra
▶ As mentioned briefly, we often use vector and matrix notation in
econometrics. Vectors and matrices are collections of numbers arranged
in a certain form.
▶ Let a := (a_1, . . . , a_n)′ and b := (b_1, . . . , b_n)′ be n-dimensional
column vectors.
▶ Then, a′ = (a_1 . . . a_n), the transpose of a, is an n-dimensional
row vector.
▶ Addition is defined when two vectors have the same form:
  a + b = (a_1 + b_1, . . . , a_n + b_n)′
and a′ + b′ = (a + b)′. But a′ + b is not defined.
41 / 54
Math Review: matrix algebra
▶ Inner product:
  (a, b) := a′b = ∑_{i=1}^n a_i b_i
▶ Consider an n × k matrix
  A := [a_{ij}]_{i=1,...,n; j=1,...,k} = (a_1 . . . a_k)
where a_j := (a_{1j}, . . . , a_{nj})′ is the j-th column, for j = 1, . . . , k.
▶ A vector is a special case of a matrix (e.g., k = 1)
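▶ A short R sketch of these objects (arbitrary numbers):
  a <- c(1, 2, 3); b <- c(4, 5, 6)
  sum(a * b)                  # inner product a'b = 32
  t(a) %*% b                  # the same number, as a 1 x 1 matrix
  A <- matrix(1:6, nrow = 3)  # a 3 x 2 matrix, filled column by column
  A[, 1]                      # its first column a_1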
42 / 54
Math Review: matrix algebra
▶ Transpose: A′ is the k × n matrix whose (i, j)-th entry is a_{ji}; its rows
are the transposed columns of A, i.e., A′ has rows a′_1, . . . , a′_k.
▶ Property: (AB)′ = B′A′
▶ Addition is allowed between two matrices of the same dimension. Let
  B := [b_{ij}] = (b_1 . . . b_k)
be n × k, where b_j denotes the j-th column of B. Then,
  A + B = [a_{ij} + b_{ij}]
But A′ + B is not defined unless k = n, i.e., A and B are square.
43 / 54
Math Review: matrix algebra
▶ To explain multiplication, we define a k × m matrix C := [c_{ℓj}]. Then,
A × C is the n × m matrix whose (i, j)-th entry is the inner product of the
i-th row of A and the j-th column of C:
  A × C = [∑_{ℓ=1}^k a_{iℓ} c_{ℓj}],  i = 1, . . . , n;  j = 1, . . . , m
44 / 54
Math Review: matrix algebra
▶ The (i, j)-th entry of A × C is ∑_{ℓ=1}^k a_{iℓ} c_{ℓj}
▶ Multiplication is defined only when the number of columns of the first
matrix is equal to the number of rows of the second matrix.
▶ For example, A (n × k) times C (k × m) is well defined and the resulting
matrix is n × m, but C (k × m) times A (n × k) is not defined unless m = n.
▶ Suppose A is a square matrix, i.e., n = k:
  a_{ij} is called a diagonal element if i = j. Otherwise, it is off-diagonal.
  If the off-diagonal elements are all zero, A is said to be a diagonal matrix.
  If A′ = A, it is symmetric (a_{ij} = a_{ji} for all (i, j))
45 / 54
Math Review: matrix algebra
▶ If A is diagonal and all diagonal elements are ones, A is the n-dimensional
identity matrix, for which the reserved notation is I_n. So,
  I_n := diag(1, 1, . . . , 1)
▶ Multiplication by the identity changes nothing:
A × I = A and I × B = B.
▶ For a square matrix A, if there is another matrix B such that
I = AB = BA, then B is called the inverse of A, denoted by A^(−1),
and A is said to be invertible or nonsingular.
▶ (i) (αA)^(−1) = (1/α) A^(−1) for a nonzero constant α,
(ii) (AB)^(−1) = B^(−1) A^(−1),
(iii) (A′)^(−1) = (A^(−1))′
46 / 54
Math Review: matrix algebra
▶ For any n × n matrix A, the trace of A, tr(A), is the sum of the diagonal
elements:
  tr(A) = ∑_{i=1}^n a_{ii}
▶ (i) tr(I_n) = n, (ii) tr(A′) = tr(A), (iii) tr(A + B) = tr(A) + tr(B),
(iv) tr(αA) = α tr(A) for a constant α,
(v) tr(AB) = tr(BA) for any dimensionally conformable A and B
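▶ Property (v) in R (arbitrary 2 × 2 matrices):
  A <- matrix(c(2, 1, 0, 3), 2, 2)
  B <- matrix(c(1, 4, 2, 1), 2, 2)
  sum(diag(A))        # tr(A) = 5
  sum(diag(A %*% B))  # tr(AB)
  sum(diag(B %*% A))  # equals tr(BA)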
47 / 54
Math Review: matrix algebra
▶ Let x_1, . . . , x_n be a set of n × 1 vectors. These are linearly
independent if and only if
  α_1 x_1 + α_2 x_2 + · · · + α_n x_n = 0
implies α_1 = α_2 = · · · = α_n = 0.
▶ When x_1, . . . , x_n are linearly independent, x_j cannot be represented
as a linear combination of the others.
▶ Let X be an n × k matrix (n ≥ k). The rank of X, denoted by rank(X), is
the maximum number of linearly independent column vectors in X.
▶ If rank(X) = k, X has full column rank.
48 / 54
Math Review: matrix algebra
▶ Let A be an n × n symmetric matrix. The quadratic form with A is a
function f(x) defined for all n × 1 vectors x,
  f(x) = x′Ax
Then, A is positive definite (p.d.) if x′Ax > 0 for all non-zero x.
Moreover, A is positive semi-definite (p.s.d.) if x′Ax ≥ 0 for all x.
▶ (i) If A is p.d., then a_{ii} > 0 for all i.
(ii) If A is p.s.d., then a_{ii} ≥ 0 for all i.
(iii) If A is p.d., then A is invertible, i.e., A^(−1) exists.
(iv) If X is n × k, then X′X and XX′ are p.s.d.
(v) If X is n × k and rank(X) = k, then X′X is p.d. (so invertible).
49 / 54
Math Review: matrix algebra
▶ For a given n × 1 vector a, consider the linear function
  f(x) = a′x
for all n × 1 vectors x. Then,
  ∂f(x)/∂x = a
▶ For an n × n symmetric matrix A, define the quadratic function
  g(x) = x′Ax
for all n × 1 vectors x. Then,
  ∂g(x)/∂x = 2Ax
▶ Note ∂f(x)/∂x′ = (∂f(x)/∂x)′ and ∂g(x)/∂x′ = (∂g(x)/∂x)′
50 / 54
Math Review: matrix algebra
▶ Let y be an n × 1 vector and X be n × k. Consider
  min_β (y − Xβ)′(y − Xβ)
▶ (y − Xβ)′ = y′ − (Xβ)′ = y′ − β′X′. So, the objective function is
  (y − Xβ)′(y − Xβ) = (y′ − β′X′)(y − Xβ)
                    = y′y − y′Xβ − β′X′y + β′X′Xβ
                    = y′y − 2y′Xβ + β′X′Xβ
(using that β′X′y is a scalar, so β′X′y = y′Xβ). Hence,
  min_β (y − Xβ)′(y − Xβ) = min_β (y′y − 2y′Xβ + β′X′Xβ)
51 / 54
Math Review: matrix algebra
▶ Then, the first-order necessary condition (FOC) is
  (∂/∂β)(y′y − 2y′Xβ + β′X′Xβ) = −2 (∂/∂β) y′Xβ + (∂/∂β) β′X′Xβ
▶ Applying the rules above,
  (∂/∂β) y′Xβ = X′y,  (∂/∂β) β′X′Xβ = 2X′Xβ
▶ The FOC is then
  −2X′y + 2X′Xβ = 0 ⇐⇒ X′y = X′Xβ
at the optimal solution for β, say β̂. The solution exists if and only if X′X
is invertible:
  β̂ = (X′X)^(−1)X′y
rank(X) = k =⇒ X′X is p.d. (therefore invertible)
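▶ The formula in R, as a sketch on simulated data (hypothetical
coefficients):
  set.seed(1)
  n <- 100
  X <- cbind(1, rnorm(n))        # n x 2 design with an intercept
  beta <- c(1, 2)
  y <- X %*% beta + rnorm(n)
  solve(t(X) %*% X, t(X) %*% y)  # solves X'X b = X'y, i.e. (X'X)^(-1) X'y
  coef(lm(y ~ X - 1))            # lm() gives the same numbers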
52 / 54
Math Review
▶ Suppose we have n random variables, Y_1, . . . , Y_n. Then, define the
random vector
  Y := (Y_1, . . . , Y_n)′, and then we have E[Y] := (E[Y_1], . . . , E[Y_n])′
▶ Also, V(Y) := E[(Y − E[Y])(Y − E[Y])′].
▶ (i) The (i, j)-th element of V(Y) is C(Y_i, Y_j).
(ii) The i-th diagonal element of V(Y) is C(Y_i, Y_i) = V(Y_i).
(iii) V(Y) is symmetric because C(Y_i, Y_j) = C(Y_j, Y_i).
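▶ The sample analogue of V(Y) in R (a two-component sketch with
correlated draws):
  set.seed(1)
  Y <- cbind(rnorm(1e4), rnorm(1e4))
  Y[, 2] <- Y[, 1] + Y[, 2]  # make the components correlated
  cov(Y)                     # symmetric; diagonal entries are the variances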
53 / 54
Math Review
▶ Let X be another random variable. Then,
  E[Y|X] := (E[Y_1|X], . . . , E[Y_n|X])′
and
  V(Y|X) := E[(Y − E[Y|X])(Y − E[Y|X])′ | X]
54 / 54

