ECMT1020 Introduction to Econometrics Week 9, 2022S1
Lecture 8: Specification of Regression Variables
Instructor: Ye Lu
Please read Chapter 6 of the textbook.
Contents
1 Omitting a relevant variable
  1.1 Omitted variable bias
    1.1.1 Review: If simple regression (2) were true
    1.1.2 Now: True model is multiple regression (1)
  1.2 Effects on the statistical tests and R2
2 Including a redundant variable
  2.1 No bias
  2.2 Efficiency loss
3 Proxy variable
4 Testing for linear restrictions
  4.1 F test for linear restrictions
    4.1.1 F test for one linear restriction
    4.1.2 F test for multiple linear restrictions
  4.2 t test for a linear restriction
Figure 1: Four cases of model specification
1 Omitting a relevant variable
Recall the example that we used at the beginning of Lecture 5 (Multiple Regression Analysis)
to introduce the multiple regression:
Y = β1 + β2X2 + β3X3 + u (1)
where
• Y = EARNINGS, the hourly earnings measured in dollars;
• X2 = S, years of schooling (highest grade completed);
• X3 = EXP, years spent working after leaving full-time education (experience);
• u is the disturbance term.
When we discussed the interpretation of coefficient β2 in Section 1.3 of Lecture 5, we em-
phasized the difference between the interpretation of β2 from a simple regression
Y = β1 + β2X2 + v, (2)
and the interpretation of β2 from the multiple regression (1) where an additional regressor
X3 is present. Note that I use v to denote the disturbance term in regression (2), because
I want to make it clear that the two disturbance terms in these two regressions are very
different1.
In this section, we will analyze in more detail the consequences of not including the
variable X3 in our regression, in particular when it ought to be included. In other words,
what will happen if (1) is the ‘true model’ but we run regression (2)? It turns out that our
estimator of the coefficient of X2 will suffer from the so-called omitted variable bias; and the
statistical tests will be invalid.
The analysis here is similar to what we did in Section 1.3 of Lecture 5. Again, we
• denote βˆ2 as the OLS estimator for β2 in the multiple regression (1);
• denote β˜2 as the OLS estimator for β2 in the simple regression (2).
A note before we proceed: we are still in the classical linear regression model (CLRM)
world, and all the assumptions for CLRM apply to the true models under our discussion.
1.1 Omitted variable bias
In Lecture 5, we showed (in Section 1.4.1) that βˆ2 is an unbiased estimator for β2 in model (1)
under the assumptions of CLRM. Moreover, we explained intuitively and by example (earn-
ings regression) that β˜2 and βˆ2 can be very different, and this is not just due to random
chance.2
1Oftentimes we use the same generic notation u to denote the disturbance term. But here is one of the
examples when we want to make the difference more explicit.
2By checking the formulas of βˆ2 and β˜2, you can see they differ in their values for sure. In fact, in our
earnings regression example, both βˆ2 and β˜2 are positive and β˜2 < βˆ2. This can be seen from the comparison
of two slopes estimated from the simple regression without X3 and from the multiple regression with X3,
shown in Figure 3.2 in the textbook, or from the Stata outputs. Again, omitting the work experience as
the explanatory variable for hourly earnings will lead us to underestimate the effect of schooling on hourly
earnings.
Here, we follow up on the previous discussion, and will formally show that β˜2 is a biased
estimator for the effect of X2 on Y , if model (1) is the true model. This addresses a very
common scenario in practice: suppose model (1) is the true model (X3 has explanatory power for the dependent variable and hence ought to be included in the model), but unfortunately we did not know it. Instead, we fit a regression omitting X3, like in (2), and get
the OLS estimate β˜2 for the slope coefficient of X2. Then we use β˜2 to interpret the effect
of X2 on Y . The question is:
HOW WRONG can we be?
To answer this question, we begin with writing down the OLS formula for β˜2 given n observations of Y and X2:

\tilde\beta_2 = \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(Y_i - \bar Y)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2},   (3)

where X̄2 = (1/n) ∑_{i=1}^n X2i and Ȳ = (1/n) ∑_{i=1}^n Yi. In what follows, we analyze the property of β˜2
in two different scenarios:
1. The simple regression (2) is indeed the true model; −→ This is a review of simple
regression analysis
2. The simple regression (2) is NOT the true model, and the true model is the multiple
regression (1). −→ This is new because we now deal with ‘model misspecification’
1.1.1 Review: If simple regression (2) were true
Recall that if model (2) were the true model where the assumptions of CLRM hold (including E(vi) = 0 for all i = 1, . . . , n), then β˜2 is an unbiased estimator for β2 in model (2). This is what we learned in Lecture 4 (Chapter 3 in the textbook) on simple regression analysis. The way we showed it was to first note that, under the true model (2),

Y_i = \beta_1 + \beta_2 X_{2i} + v_i \quad\text{and}\quad \bar Y = \beta_1 + \beta_2 \bar X_2 + \bar v,

which implies that, for each i = 1, . . . , n,

Y_i - \bar Y = \beta_2 (X_{2i} - \bar X_2) + (v_i - \bar v),   (4)

where v̄ = (1/n) ∑_{i=1}^n vi. Then we plug (4) into the OLS formula (3) to get
\tilde\beta_2 = \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)\,[\beta_2 (X_{2i} - \bar X_2) + (v_i - \bar v)]}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}
= \frac{\beta_2 \sum_{i=1}^n (X_{2i} - \bar X_2)^2 + \sum_{i=1}^n (X_{2i} - \bar X_2)(v_i - \bar v)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}
= \beta_2\,\frac{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2} + \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(v_i - \bar v)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}
= \beta_2 + \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(v_i - \bar v)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}.   (5)
Lastly, we denote

d_i := \frac{X_{2i} - \bar X_2}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}, \quad i = 1, \dots, n,   (6)

which allows us to write (5) as

\tilde\beta_2 = \beta_2 + \sum_{i=1}^n d_i (v_i - \bar v).   (7)
The unbiasedness can then be deduced by taking expectations on both sides of (7), noting that the di's are all deterministic, which yields

E(\tilde\beta_2) = \beta_2 + \sum_{i=1}^n d_i\,E(v_i - \bar v) = \beta_2 + 0 = \beta_2,

using the assumption that E(vi) = 0 for i = 1, . . . , n. The interpretation is that β˜2 is an unbiased estimator for β2, the population parameter in regression (2).
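As a quick sanity check on this algebra, the following sketch verifies identity (7) numerically for one simulated sample. It is written in Python with NumPy (the course itself uses Stata); all numbers are invented purely for illustration and are not from the textbook data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
beta1, beta2 = 2.0, 0.5                     # illustrative 'true' parameters

# One sample generated from the simple regression (2)
X2 = rng.uniform(5, 20, size=n)
v = rng.normal(0.0, 1.0, size=n)
Y = beta1 + beta2 * X2 + v

# OLS slope computed directly from formula (3)
x2_dev = X2 - X2.mean()
beta2_tilde = np.sum(x2_dev * (Y - Y.mean())) / np.sum(x2_dev ** 2)

# Right-hand side of identity (7): beta2 + sum_i d_i (v_i - vbar), with d_i as in (6)
d = x2_dev / np.sum(x2_dev ** 2)
rhs = beta2 + np.sum(d * (v - v.mean()))

print(beta2_tilde, rhs)                     # identical up to floating-point error
```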
1.1.2 Now: True model is multiple regression (1)
The estimator here is still β˜2 given by (3). But now suppose the true model is actually (1), which means the partial effect of X2 on Y is characterized by the population parameter β2 in (1). Again, we impose the usual assumptions of CLRM on the true model, which include that E(ui) = 0 for i = 1, . . . , n.

Note that under the true model (1), what we have in (4) is no longer true. Instead, we have

Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i \quad\text{and}\quad \bar Y = \beta_1 + \beta_2 \bar X_2 + \beta_3 \bar X_3 + \bar u,

which implies that

Y_i - \bar Y = \beta_2 (X_{2i} - \bar X_2) + \beta_3 (X_{3i} - \bar X_3) + (u_i - \bar u),   (8)

for each i = 1, . . . , n. Next, we plug (8) into the OLS formula (3) to understand the properties of β˜2 under the true model (1). It's a pain in the neck,3 but we can do it similarly to how we derived (5). Omitting some lines of tedious (but simple) derivations, we have
\tilde\beta_2 = \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)\,[\beta_2 (X_{2i} - \bar X_2) + \beta_3 (X_{3i} - \bar X_3) + (u_i - \bar u)]}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}
= \underbrace{\beta_2}_{\text{Term 1}} + \underbrace{\beta_3\,\frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(X_{3i} - \bar X_3)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}}_{\text{Term 2}} + \underbrace{\frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(u_i - \bar u)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}}_{\text{Term 3}}.   (9)
Let’s pause here a bit. We should have a close look at the three terms on the right-hand
side of (9), and compare them with the two terms on the right-hand side of (5).
3Especially because I did not assume away the intercept term in the regression as I did in Lecture 4 to
simplify the derivation.
1. Term 1 in (9) is β2, the population parameter in the true model (1). Note that β2 here is different from the first term in (5), although also denoted 'β2', which was the population parameter in model (2).
2. Term 3 in (9) is similar to the second term in (5) except that we have a different disturbance term u here. The zero-mean assumption on the disturbance term now applies to u because it is the disturbance term of the true model. So, using the assumption E(ui) = 0 for i = 1, . . . , n, we have E(ū) = (1/n) ∑_{i=1}^n E(ui) = 0 and hence E(ui − ū) = E(ui) − E(ū) = 0 for i = 1, . . . , n. Then it follows that

E(\text{Term 3}) = E\left( \sum_{i=1}^n d_i (u_i - \bar u) \right) = \sum_{i=1}^n d_i\,E(u_i - \bar u) = 0,   (10)
where the deterministic di’s are the same as defined in (6).
3. Term 2 in (9) is completely new. In particular,
\text{Term 2} = \beta_3 \cdot \gamma_{X_2,X_3},   (11)

where

\gamma_{X_2,X_3} := \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(X_{3i} - \bar X_3)}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}.   (12)
The first observation is that γX2,X3 is deterministic because it depends solely on the
observations of the regressors. Therefore,
E(\text{Term 2}) = \text{Term 2} = \beta_3\,\gamma_{X_2,X_3},   (13)

which is nonzero in general. The second observation4 is that γX2,X3 is closely related to the sample correlation coefficient between X2 and X3:

r_{X_2,X_3} = \frac{\sum_{i=1}^n (X_{2i} - \bar X_2)(X_{3i} - \bar X_3)}{\sqrt{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}\,\sqrt{\sum_{i=1}^n (X_{3i} - \bar X_3)^2}}.

In fact, we have

\gamma_{X_2,X_3} = r_{X_2,X_3} \cdot \underbrace{\sqrt{\frac{\sum_{i=1}^n (X_{3i} - \bar X_3)^2}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}}}_{\text{positive}} = r_{X_2,X_3} \cdot \sqrt{\frac{\mathrm{MSD}(X_3)}{\mathrm{MSD}(X_2)}},
and hence γX2,X3 and rX2,X3 have the same sign:
• If X2 and X3 are positively correlated, then γX2,X3 is positive;
• If X2 and X3 are negatively correlated, then γX2,X3 is negative;
• If X2 and X3 are not correlated, i.e. rX2,X3 = 0, then γX2,X3 = 0 and Term 2 in (9) is zero too.
4You might have also recognized that γX2,X3 defined in (12) is actually the estimated slope coefficient if
you fit a simple regression of X3 (the dependent variable) on X2 (the regressor)! Fitting a regression between
two regressors in a multiple regression should not sound unfamiliar to you, because this appeared in the
‘purged regression’ we discussed in Lecture 5 when presenting the Frisch-Waugh-Lovell theorem.
Now we have reached the point where we can deduce from (9), (13) and (10) that

E(\tilde\beta_2) = \beta_2 + \beta_3\,\gamma_{X_2,X_3}.
Clearly, β˜2 is a biased estimator for β2, the parameter in the true model (1) interpreted as the partial effect of X2 on Y, unless β3 = 0 or γX2,X3 = 0. When β˜2 is a biased estimator, its bias is

E(\tilde\beta_2) - \beta_2 = \beta_3 \cdot \gamma_{X_2,X_3},   (14)

which is often called the 'omitted variable bias', because the variable X3 is omitted when estimating the model. We say
• β˜2 has upward bias, or β˜2 overestimates the true parameter, if E(β˜2)− β2 > 0;
• β˜2 has downward bias, or β˜2 underestimates the true parameter, if E(β˜2)− β2 < 0.
Therefore, whether β˜2 overestimates or underestimates the partial effect of X2 on Y is determined by the signs of β3 and γX2,X3 (we have noted that the sign of γX2,X3 is the same as the sign of the sample correlation coefficient between X2 and X3). Depending on your application, the omitted variable bias can be either positive or negative.
For example, in Lecture 5 when we considered the earnings regression example, X2 (years of schooling) and X3 (work experience) are negatively correlated. This makes γX2,X3 negative. On the other hand, β3 is very likely to be positive because it captures the partial effect of work experience on hourly earnings. Altogether, E(β˜2) − β2 < 0 by (14), and hence β˜2 underestimates the effect of schooling on hourly earnings when the work experience variable is omitted.
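The bias formula (14) can also be checked by simulation. The following Python/NumPy sketch is illustrative only (the parameter values and data-generating choices are my own, not the textbook's): it holds the regressors fixed, repeatedly generates Y from the true model (1) with X3 negatively correlated with X2, and compares the average of β˜2 across replications with β2 + β3γX2,X3.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 200, 5000
beta1, beta2, beta3 = 1.0, 0.8, 0.3          # illustrative 'true' parameters

# Regressors held fixed across replications (as under CLRM);
# X3 is constructed to be negatively correlated with X2, as in the earnings example
X2 = rng.uniform(8, 20, size=n)
X3 = 30.0 - X2 + rng.normal(0.0, 3.0, size=n)

x2_dev, x3_dev = X2 - X2.mean(), X3 - X3.mean()
gamma = np.sum(x2_dev * x3_dev) / np.sum(x2_dev ** 2)     # gamma_{X2,X3} in (12)

slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, 1.0, size=n)
    Y = beta1 + beta2 * X2 + beta3 * X3 + u               # true model (1)
    slopes[r] = np.sum(x2_dev * (Y - Y.mean())) / np.sum(x2_dev ** 2)   # beta2_tilde

print(slopes.mean())                         # close to ...
print(beta2 + beta3 * gamma)                 # ... the value predicted by (14)
```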
The intuitive understanding is shown in Figure 2 which is also given in the textbook.
Figure 2: Direct and indirect effects of X2 when X3 is omitted.
1.2 Effects on the statistical tests and R2
Another consequence of omitting a variable that is in the true model is that the standard
errors of the coefficients and the test statistics are in general invalidated. In principle, this
means that we are unable to test any hypotheses based on the regression results.

In general, it is impossible to determine the contribution to R2 of each explanatory variable in multiple regression analysis. We can see why in the context of an omitted variable.
2 Including a redundant variable
Next, we analyze the flip side of the coin: What if the simple regression model (2) is true,
but we opt for a more complicated model – the multiple regression (1) with a redundant
variable X3?
First of all, there is an intuitive explanation. Note that the simple regression without variable X3 is nested in the multiple regression with variable X3, in the sense that the former can be considered as the latter subject to the parameter restriction that β3 = 0. Specifically,
the true model (2) can be written as
Y = \beta_1 + \beta_2 X_2 + \underbrace{0}_{\beta_3 = 0} \cdot X_3 + v.   (15)
Provided that this is the true model, then the OLS estimator for β2 will be an unbiased
estimator of β2 and the OLS estimator for β3 will be an unbiased estimator for β3 = 0 under
the assumptions of CLRM.
But if you realized beforehand that in the true model we have β3 = 0, then you would
be able to exploit this information to exclude X3 from your regression, which will yield an
efficiency gain in the estimation. Conversely, if you did not realize this but included X3 in
your model specification, then you will face an efficiency loss.
Below we give a more formal analysis on the properties of the OLS estimator for the
effect of X2 on Y when a redundant variable X3 is included in the regression specification.
2.1 No bias
We first denote

s_2^2 = \sum_{i=1}^n (X_{2i} - \bar X_2)^2, \qquad s_3^2 = \sum_{i=1}^n (X_{3i} - \bar X_3)^2, \qquad s_{23} = \sum_{i=1}^n (X_{2i} - \bar X_2)(X_{3i} - \bar X_3),

where s_2^2 is essentially n · MSD(X2) and s_3^2 is essentially n · MSD(X3). Then the formula for βˆ2 in the multiple regression (1) can be written as

\hat\beta_2 = \frac{\text{numerator}}{\text{denominator}},   (16)

where

\text{numerator} = s_3^2 \sum_{i=1}^n (X_{2i} - \bar X_2)(Y_i - \bar Y) - s_{23} \sum_{i=1}^n (X_{3i} - \bar X_3)(Y_i - \bar Y),

\text{denominator} = s_2^2 s_3^2 - s_{23}^2 = s_2^2 s_3^2 \left(1 - \frac{s_{23}^2}{s_2^2 s_3^2}\right) = s_2^2 s_3^2 \,(1 - r_{X_2,X_3}^2).   (17)
Note that the denominator of βˆ2 depends solely on the regressors.
Under the true model (2), we have (4). To analyze the behavior of βˆ2 under the true
model (2), we need to plug (4) into the OLS formula (16). Since the denominator does not
depend on Y , all that matters is to see that
\text{numerator} = s_3^2 \sum_{i=1}^n (X_{2i} - \bar X_2)\,[\beta_2 (X_{2i} - \bar X_2) + (v_i - \bar v)] - s_{23} \sum_{i=1}^n (X_{3i} - \bar X_3)\,[\beta_2 (X_{2i} - \bar X_2) + (v_i - \bar v)]
= s_3^2\,\beta_2 \underbrace{\sum_{i=1}^n (X_{2i} - \bar X_2)^2}_{s_2^2} + s_3^2 \sum_{i=1}^n (X_{2i} - \bar X_2)(v_i - \bar v) - s_{23}\,\beta_2 \underbrace{\sum_{i=1}^n (X_{2i} - \bar X_2)(X_{3i} - \bar X_3)}_{s_{23}} - s_{23} \sum_{i=1}^n (X_{3i} - \bar X_3)(v_i - \bar v)
= \underbrace{(s_2^2 s_3^2 - s_{23}^2)}_{\text{denominator}}\,\beta_2 + \sum_{i=1}^n s_3^2 (X_{2i} - \bar X_2)(v_i - \bar v) - \sum_{i=1}^n s_{23} (X_{3i} - \bar X_3)(v_i - \bar v)
= \beta_2 \cdot \text{denominator} + \sum_{i=1}^n \underbrace{\big[\,s_3^2 (X_{2i} - \bar X_2) - s_{23} (X_{3i} - \bar X_3)\,\big]}_{:=\,a_i} (v_i - \bar v).   (18)
Notice that the denominator given in (17) is simply a number calculated from the observations of the regressors. For each i = 1, . . . , n, we define
d_i^{\dagger} := \frac{a_i}{\text{denominator}}.

Then dividing both sides of (18) by the 'denominator', we have

\hat\beta_2 = \beta_2 + \sum_{i=1}^n d_i^{\dagger} (v_i - \bar v),   (19)
the usual form!
We should compare the right-hand side of equation (19) and that of equation (7):
1. β2 in (19) is the same as β2 in (7), because they are both the population parameter in the (assumed) true model (2).

2. The difference between the second term in (19) and that in (7) lies in the difference between di and d†i. It is clear that di ≠ d†i: the former depends only on the observations of X2, while the latter depends on the observations of both X2 and X3. But they are both deterministic, as in the usual structure.
Based on (19), we conclude immediately that

E(\hat\beta_2) = \beta_2 + \sum_{i=1}^n d_i^{\dagger}\,E(v_i - \bar v) = \beta_2 + 0 = \beta_2,   (20)

which means βˆ2 is unbiased for estimating the effect of X2 on Y, if model (2) is the true model!
2.2 Efficiency loss
Since βˆ2 is unbiased for β2 in model (2), we may want to further examine its variance,
especially for comparing it with the variance of β˜2 (another unbiased estimator).
We can certainly derive the variance of βˆ2 by definition and (19)–(20):

\sigma^2_{\hat\beta_2} = \mathrm{Var}(\hat\beta_2) = E\big[(\hat\beta_2 - E(\hat\beta_2))^2\big] = E\big[(\hat\beta_2 - \beta_2)^2\big] = E\left[\left(\sum_{i=1}^n d_i^{\dagger} (v_i - \bar v)\right)^2\right],
but here we will just directly cite the result that was stated in Lecture 5 (or Chapter 3 of
the textbook):
\sigma^2_{\hat\beta_2} = \frac{\sigma_v^2}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2} \cdot \frac{1}{1 - r_{X_2,X_3}^2} = \frac{\sigma_v^2}{n \cdot \mathrm{MSD}(X_2)} \cdot \frac{1}{1 - r_{X_2,X_3}^2},   (21)

where σ²v is the variance of v in the regression model (15), and rX2,X3 is the sample correlation between X2 and X3.
On the other hand, the variance of β˜2 is

\sigma^2_{\tilde\beta_2} = \frac{\sigma_v^2}{\sum_{i=1}^n (X_{2i} - \bar X_2)^2} = \frac{\sigma_v^2}{n \cdot \mathrm{MSD}(X_2)},   (22)
if (2) is the true model.
Comparing (21) and (22), we can clearly see the efficiency loss from using βˆ2 instead of β˜2 when X3 is a redundant variable (and X3 is correlated with X2). This is simply because

\sigma^2_{\hat\beta_2} > \sigma^2_{\tilde\beta_2}

unless rX2,X3 = 0. Moreover, the more correlated X2 and X3 are (whether positively or negatively), the larger \sigma^2_{\hat\beta_2} will be, and hence the greater the efficiency loss.
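The 'no bias, but efficiency loss' result can be illustrated in the same spirit. The Python/NumPy sketch below (illustrative only; all numbers are invented) generates Y from the true model (2), estimates β2 both without and with the redundant regressor X3, and compares the simulated sampling variances with what (21) and (22) predict.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 5000
beta1, beta2, sigma_v = 1.0, 0.8, 1.0        # illustrative values; beta3 = 0 in the true model

# Fixed regressors; X3 is correlated with X2 but does not belong to the true model
X2 = rng.normal(0.0, 1.0, size=n)
X3 = 0.7 * X2 + rng.normal(0.0, 1.0, size=n)
r23 = np.corrcoef(X2, X3)[0, 1]

x2_dev = X2 - X2.mean()
W = np.column_stack([np.ones(n), X2, X3])    # design matrix of the 'larger' regression (1)

beta2_tilde = np.empty(reps)                 # slope from the simple regression (2)
beta2_hat = np.empty(reps)                   # slope from the multiple regression (1)
for r in range(reps):
    v = rng.normal(0.0, sigma_v, size=n)
    Y = beta1 + beta2 * X2 + v               # true model (2)
    beta2_tilde[r] = np.sum(x2_dev * (Y - Y.mean())) / np.sum(x2_dev ** 2)
    beta2_hat[r] = np.linalg.lstsq(W, Y, rcond=None)[0][1]

print(beta2_tilde.mean(), beta2_hat.mean())  # both close to beta2: no bias
print(beta2_tilde.var(), beta2_hat.var())    # the second is larger: efficiency loss
print(beta2_tilde.var() / (1 - r23 ** 2))    # approx. beta2_hat.var(), as (21)/(22) imply
```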
3 Proxy variable
It happens often in practice that a variable that you would like to include in a regression
is unobservable or impossible to measure. In short, you cannot obtain data on that
variable. For example:
• The survey data set happens not to include data on a variable that you are interested
in;
• Socio-economic status and quality of education: these are vaguely defined and impossible to measure.
In such an unpleasant scenario, rather than leaving out the missing variable,5 it is generally
better to use a proxy variable to stand in for it, if you can find one. Back to the above two
examples:
5We have already seen that your regression may suffer from omitted variable bias if a missing variable
actually belongs to the true model.
• The survey data set happens not to include data on a variable that you are interested
in. −→ Have a look at the data actually collected and see whether there is a suitable
substitute.
• Socio-economic status −→ Use income as a proxy.
The proxy variable discussed in this chapter is the ‘perfect’ proxy in the sense that the
proxy variable, denoted as Z, has an exact linear relationship with the variable which we do
not have data on, denoted as X:
Z = µ+ λX.
If we use the 'perfect' proxy Z in place of X in the regression, although we will NOT be able to obtain an estimate of the coefficient of the variable X, most of the regression results will be preserved. Specifically,

• the coefficients of the other variables will not be biased;
• the standard errors and associated t tests will be valid;
• the R2 will be the same as if we had been able to include the variable X directly;
• the t statistic for the proxy variable Z will be the same as the t statistic for the variable X.
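These claims are easy to verify numerically. The Python/NumPy sketch below is purely illustrative: the data are simulated, and X3 is generated by the code only so that the infeasible regression with X3 can be compared with the feasible regression that uses the perfect proxy Z = µ + λX3. The coefficient on X2 and the R2 are identical across the two fits, while the coefficient on Z estimates β3/λ rather than β3.

```python
import numpy as np

def ols(X, y):
    """OLS fit; X must include a column of ones. Returns coefficients and R^2."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    r2 = 1.0 - (e @ e) / np.sum((y - y.mean()) ** 2)
    return b, r2

rng = np.random.default_rng(3)
n = 200
X2 = rng.normal(size=n)                      # observed regressor
X3 = rng.normal(size=n)                      # the variable we supposedly cannot observe
mu, lam = 2.0, 0.5
Z = mu + lam * X3                            # a 'perfect' proxy for X3
Y = 1.0 + 0.8 * X2 + 0.4 * X3 + rng.normal(size=n)

b_x, r2_x = ols(np.column_stack([np.ones(n), X2, X3]), Y)   # infeasible regression with X3
b_z, r2_z = ols(np.column_stack([np.ones(n), X2, Z]), Y)    # feasible regression with the proxy

print(b_x[1], b_z[1])          # coefficient on X2 is identical in the two fits
print(r2_x, r2_z)              # R^2 is identical
print(b_x[2], lam * b_z[2])    # the proxy's coefficient recovers beta3 only up to the factor lambda
```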
However, it is unusual to find a perfect proxy. Generally, the best you can hope for is
a proxy that is approximately linearly related to the missing variable. The consequences
of using an ‘imperfect’ proxy instead of a perfect one are parallel to those of using a vari-
able subject to measurement error which will be discussed in Lecture 10 (Chapter 8 in the
textbook).
4 Testing for linear restrictions
What is a linear restriction in a regression model? It means that the parameters conform to a simple linear equation such as

β2 = 0, β3 = 1, β2 = β3, or β2 + β3 = 2.   (23)

In contrast, a nonlinear restriction is one such as β2 = β3β4 (the parameters do not conform to a linear equation).
In this section we discuss how to test linear restrictions on the parameters of a regression model, such as those in (23), in general.
4.1 F test for linear restrictions
There is a very general framework for using the F test to test one or multiple linear restrictions.
The null hypothesis can be formed as
H0 : some linear restriction(s) on the parameters (24)
Then we have
Restricted model (restriction from H0) versus Unrestricted model
Idea: testing the null hypothesis in (24) on the parameters can be done by testing whether
the unrestricted model provides a significantly better goodness of fit than the restricted
model. If it does, then the restricted model (or the restrictions in H0) should be rejected.
In general, the unrestricted model is always ‘larger’/‘bigger’, and hence yields smaller
RSS and better fit at the cost of some degrees of freedom (DF)6. Again, no free lunch! The
question is whether this improvement in fit is significant or not.
Recall the general formula for the F test statistic we mentioned in the last lecture:
F(\text{extra DF},\ \text{DF remaining}) = \frac{\text{improvement in fit}/\text{extra DF}}{\text{RSS remaining}/\text{DF remaining}},   (25)
where the improvement in fit comes from using a model with more parameters (at the cost
of extra DF); and the RSS remaining and DF remaining are the RSS and DF of the model
with more parameters. The null hypothesis is
H0 : no improvement in fit from a model with more parameters
and we reject the null hypothesis if the F test statistic is greater than the critical value at a certain significance level.
We may apply this formula to testing the restricted model against the unrestricted model.
We first denote
• TSS as the total sum of squares, TSS = \sum_{i=1}^n (Y_i - \bar Y)^2, which is the same for both the restricted and unrestricted models;
• DFR and DFU as the degrees of freedom in the restricted model and unrestricted model,
respectively;
• ESSR,RSSR as the explained sum of squares, and residual sum of squares of the
restricted model;
• ESSU ,RSSU as the explained sum of squares, and residual sum of squares of the
unrestricted model.
Then the formula in (25) can be written as

F(\mathrm{DF}_R - \mathrm{DF}_U,\ \mathrm{DF}_U) = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/(\mathrm{DF}_R - \mathrm{DF}_U)}{\mathrm{RSS}_U/\mathrm{DF}_U},   (26)
where the improvement in fit RSSR − RSSU is the same as ESSU − ESSR, simply because
ESSR + RSSR = ESSU + RSSU = TSS.
Suppose there are k parameters in total in the unrestricted model. Let the number of restrictions be p; then, by moving from the restricted model to the unrestricted model, we gain a better fit at the cost of losing p degrees of freedom. In this case,
• DFU = n − k, and
• DFR − DFU = p, because there are p linear restrictions.7

6A 'larger' model (a model with more parameters, or a model that is less restricted) improves the goodness of fit at the cost of sacrificing extra degrees of freedom (DF). This is why the DF of the larger model in the test, compared with the more parsimonious model (the smaller model with fewer parameters or more parameter restrictions), is referred to as the 'DF remaining'.
Therefore, the formula for the F test statistic boils down to

F(p,\, n-k) = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/p}{\mathrm{RSS}_U/(n-k)}.   (27)
This is stated in equation (6.40) in the textbook.
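As an illustration of how (27) is used in practice, here is a small Python sketch. It assumes NumPy and SciPy are available, and the helper names rss and f_test are my own, not from the textbook; it applies whenever the restricted model can itself be written as a regression with k − p parameters (columns dropped or combined).

```python
import numpy as np
from scipy.stats import f as f_dist

def rss(X, y):
    """Residual sum of squares from an OLS fit; X must include a column of ones."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

def f_test(X_restricted, X_unrestricted, y):
    """F statistic (27) and its p-value for p linear restrictions."""
    n, k = X_unrestricted.shape
    p = k - X_restricted.shape[1]            # number of linear restrictions imposed
    rss_r, rss_u = rss(X_restricted, y), rss(X_unrestricted, y)
    F = ((rss_r - rss_u) / p) / (rss_u / (n - k))
    return F, f_dist.sf(F, p, n - k)         # reject H0 when the p-value is small
```

For example, dropping the last two columns of the unrestricted design matrix would correspond to testing two zero restrictions jointly.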
4.1.1 F test for one linear restriction
If there is only one linear restriction, and there are k parameters in the unrestricted model (the alternative model), then the F test uses the test statistic

F(1,\, n-k) = \frac{\mathrm{RSS}_R - \mathrm{RSS}_U}{\mathrm{RSS}_U/(n-k)}.   (28)
This is what we have seen before when we wanted to test:
• the goodness of fit of the simple regression with one regressor (Section 2.7 of the
textbook);
• the significance of the additional regressor in the multiple regression (Testing equation
(3.80) against equation (3.81) on p.186, Section 3.5 of the textbook)
In both of these examples, the null hypothesis is of the type
H0 : βi = 0, for some i. (29)
Here βi can be the slope parameter in the simple regression, or any additional parameter in
the multiple regression.
In this chapter, we give an example of testing a linear restriction of a different type:
H0 : β3 = β4 (30)
in an educational attainment regression
Unrestricted model : Y = β1 + β2X2 + β3X3 + β4X4 + u, (31)
where
• Y = S, years of schooling;
• X2 = ASVABC, a composite cognitive ability test score;
• X3 = SM, mother's education;
• X4 = SF, father's education.
Under the null hypothesis that mother’s education and father’s education are equally impor-
tant for educational attainment, the restricted model is
Restricted model : Y = β1 + β2X2 + β3(X3 +X4) + u. (32)
7By imposing restrictions on the parameters, we gain degrees of freedom and sacrifice the goodness of fit.
Since there are 4 parameters in the unrestricted model, the formula (28) implies that the F
statistic for testing the linear restriction is
F(1,\, n-4) = \frac{\mathrm{RSS}_R - \mathrm{RSS}_U}{\mathrm{RSS}_U/(n-4)},
where n is the number of observations in the sample, and RSSR and RSSU are the residual
sum of squares from fitting the restricted model (32) and unrestricted model (31), respec-
tively.
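A hedged sketch of this particular test, reusing the f_test helper above, is given below. The data are simulated stand-ins for S, ASVABC, SM and SF (the real data come from the textbook's data set), generated with β3 = β4 so that the null hypothesis holds by construction.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
# Simulated stand-ins for the textbook variables (purely illustrative numbers)
ASVABC = rng.normal(0.0, 1.0, size=n)
SM = rng.integers(8, 18, size=n).astype(float)            # mother's education
SF = rng.integers(8, 18, size=n).astype(float)            # father's education
S = 8.0 + 1.2 * ASVABC + 0.15 * SM + 0.15 * SF + rng.normal(0.0, 2.0, size=n)

X_u = np.column_stack([np.ones(n), ASVABC, SM, SF])       # unrestricted model (31)
X_r = np.column_stack([np.ones(n), ASVABC, SM + SF])      # restricted model (32)

F, pval = f_test(X_r, X_u, S)        # F(1, n - 4), using the f_test helper above
print(F, pval)                       # H0: beta3 = beta4 should typically not be rejected here
```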
4.1.2 F test for multiple linear restrictions
We have seen this use of the F test for the joint significance of multiple regressors. In other words, the F test was used to test a group of zero restrictions. For example, we may want to test a null hypothesis such as that in equation (3.78) on p.185, Section 3.5 of the textbook:8
H0 : βm+1 = βm+2 = · · · = βk = 0 (33)
where the parameters are from the regression
Unrestricted model: Y = β1 + β2X2 + · · ·+ βmXm + βm+1Xm+1 + · · ·+ βkXk + u. (34)
Note that when m = 1, the null hypothesis (33) is for testing the joint significance of all the
explanatory variables in the unrestricted model9.
In general, we have in total p = k −m linear restrictions:
βm+1 = 0, βm+2 = 0, · · · , and βk = 0
imposed by the null hypothesis in (33), and the restricted model is
Restricted model : Y = β1 + β2X2 + · · ·+ βmXm + u. (35)
So, testing the k −m linear restrictions in (33) can be done by testing whether there is a
significant improvement in fit by using the unrestricted model (34) instead of the restricted
model (35).
Since there are k parameters in the unrestricted model, the general formula for the F statistic in (27) implies that we should use

F(k-m,\, n-k) = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/(k-m)}{\mathrm{RSS}_U/(n-k)},
where RSSR and RSSU are from the restricted model (35) and unrestricted model (34),
respectively.
In particular, if m = 1, then we are testing β2 = · · · = βk = 0, and the restricted model reduces to

Restricted model (if m = 1) : Y = β1 + u.

In this case, RSSR = TSS and the F statistic becomes

F(k-1,\, n-k) = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/(k-1)}{\mathrm{RSS}_U/(n-k)} = \frac{(\mathrm{TSS} - \mathrm{RSS}_U)/(k-1)}{\mathrm{RSS}_U/(n-k)} = \frac{\mathrm{ESS}_U/(k-1)}{\mathrm{RSS}_U/(n-k)}.

This was the definition of the F statistic given before for testing the joint significance of the model in Chapter 3.

8Note that I changed the roles of 'm' and 'k' in equation (3.78) of the textbook, so that the unrestricted model has k parameters, which is consistent with the notation here.
9Or sometimes we just say 'the joint significance of the unrestricted model', since all the explanatory variables are included in the joint test.
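The equality of the two expressions is easy to confirm numerically. The following Python/NumPy sketch (illustrative only) fits a regression with an intercept and k − 1 = 3 regressors and computes the joint-significance F statistic both from formula (27) with RSS_R = TSS and from the Chapter 3 definition.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 120, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # intercept + 3 regressors
Y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b
rss_u = e @ e
tss = np.sum((Y - Y.mean()) ** 2)        # RSS of the intercept-only restricted model
ess_u = tss - rss_u

F_from_27 = ((tss - rss_u) / (k - 1)) / (rss_u / (n - k))   # formula (27) with RSS_R = TSS
F_chapter3 = (ess_u / (k - 1)) / (rss_u / (n - k))          # Chapter 3 definition
print(F_from_27, F_chapter3)             # the two numbers coincide
```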
4.2 t test for a linear restriction
We have seen that using the F test to test a single restriction like (29) is equivalent to using a t test to test the significance of βi in the regression model.

Here we show that we can also use a t test for a linear restriction like (30). First note that (30) is equivalent to
H0 : β4 − β3 = 0.
So, if we define a new parameter θ = β4−β3, then the null hypothesis can be further written
as
H0 : θ = 0
which can be tested using a simple t test if θ is one of the parameters in some regression
model.
In fact, if we substitute β4 = β3 + θ in the original regression (31), it follows that
Y = β1 + β2X2 + β3X3 + (β3 + θ)X4 + u
= β1 + β2X2 + β3(X3 +X4) + θX4 + u. (36)
We call (36) the reparameterized regression of the original regression (31). For comparison, we have
• Original regression (31):
– parameters: β1, β2, β3, β4
– regressors (including intercept): 1, X2, X3, X4
• Reparameterized regression (36):
– parameters: β1, β2, β3, θ
– regressors (including intercept): 1, X2, X3 +X4, X4
In particular, testing β3 = β4 in the original regression (31) is equivalent to testing θ = 0 in
the reparameterized regression (36). The latter can be done by a simple t test.
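A minimal Python/NumPy sketch of this t test (illustrative; the data are simulated with β3 = β4, so H0: θ = 0 is true by construction) fits the reparameterized regression (36) and computes the t statistic on θ from the usual OLS covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 400
X2, X3, X4 = rng.normal(size=(3, n))
Y = 1.0 + 0.5 * X2 + 0.3 * X3 + 0.3 * X4 + rng.normal(size=n)   # beta3 = beta4 in the DGP

# Reparameterized regression (36): regressors 1, X2, X3 + X4, X4; the last coefficient is theta
X = np.column_stack([np.ones(n), X2, X3 + X4, X4])
b, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ b
k = X.shape[1]
s2 = (e @ e) / (n - k)                       # estimated disturbance variance
cov = s2 * np.linalg.inv(X.T @ X)            # estimated covariance matrix of the OLS estimator
t_theta = b[3] / np.sqrt(cov[3, 3])          # t statistic for H0: theta = 0

print(b[3], t_theta)    # theta_hat near zero and an insignificant t statistic are expected
```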