ECON2300 - Introductory Econometrics
Linear Regression with Multiple Regressors
Alicia N. Rambaldi
School of Economics, UQ
Lecture 4
Outline
Omitted variable bias
Causality and regression analysis
Multiple regression and OLS
Measures of fit
Sampling distribution of the OLS estimator
Omitted Variable Bias (SW Section 6.1)
The error u arises because of factors, or variables, that influence Y but are not
included in the regression function. There are always omitted variables.
Sometimes, the omission of those variables can lead to bias in the OLS
estimator.
The bias in the OLS estimator that occurs as a result of an omitted factor, or
variable, is called omitted variable bias.
For omitted variable bias to occur, the omitted variable Z must satisfy the
following two conditions:
1 Z is a determinant of Y (i.e. Z is part of u); and
2 Z is correlated with the regressor X (i.e., corr(Z, X) ≠ 0)
Both conditions must hold for the omission of Z to result in omitted variable
bias; in that case the OLS estimator is biased and inconsistent.
In the test score example:
1 English language ability (e.g. the student has English as a second language) is
likely to affect standardized test scores as these are a combined score of reading
and math proficiency: Z is a determinant of Y , i.e., Z is part of u.
2 Immigrant communities tend to be less affluent and thus have smaller school
budgets and higher STR: Z is correlated with X .
Accordingly, β̂1 is biased. What is the direction of this bias?
In the test score example:
Suppose that the true model is given as
TestScorei = β0 + β1 STRi + β2 Zi + ei
where Zi is the proportion of ESL students in district i. Now, let's assume that the
following are logically sound:
STR ↑ ⇒ TestScore ↓. That is, β1 < 0.
Zi ↑ ⇒ English skill ↓ ⇒ TestScorei ↓. So, Zi is part of ui; indeed β2 < 0.
Zi ↑ ⇒ Educ Budget ↓ ⇒ STRi ↑. So, corr(Zi, STRi) > 0.
Hence, if the equation without Zi is estimated,
TestScorei = β0 + β1 STRi + ui,   where ui = β2 Zi + ei,
the effect of Zi on TestScore will be partially absorbed into the effect of STR on
TestScore.
That is, the OLS estimate of β1 will overstate the magnitude of the effect of STR on
TestScore (it is biased away from zero).
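A small simulation illustrates the direction of the bias. This is a sketch with
made-up coefficients (true β1 = −1, β2 = −2, corr(Z, STR) > 0), not the California data:

# Sketch: omitted variable bias with made-up parameters
set.seed(1)
n   <- 1000
Z   <- runif(n)                  # proportion of ESL students
STR <- 15 + 5 * Z + rnorm(n)     # budget channel: corr(Z, STR) > 0
TestScore <- 700 - 1 * STR - 2 * Z + rnorm(n)

coef(lm(TestScore ~ STR))        # slope below -1: overstates the effect
coef(lm(TestScore ~ STR + Z))    # slope close to the true beta1 = -1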
What does the sample say about this?
Districts with fewer English Learners have higher test scores
Districts with lower percent EL (PctEL) have smaller classes
Among districts with comparable PctEL, the effect of class size is small (recall
overall “test score gap” = 7.4)
Causality and regression analysis
This example (test score/STR/proportion of English Learners) shows that, if an
omitted variable satisfies the two conditions for omitted variable bias, then the
OLS estimator in the regression omitting that variable is biased and inconsistent.
So, even if n is large, β̂1 will not be close to β1.
This raises a deeper question: how do we define β1? That is, what precisely do
we want to estimate when we run a regression?
What precisely do we want to estimate
when we run a regression?
There are (at least) three possible answers to this question:
1 We want to estimate the slope of a line through a scatter plot as a simple
summary of the data to which we attach no substantive meaning.
▶ This can be useful at times, but isn’t useful for furthering economic understanding
and policy decisions and isn’t what this course is about.
2 We want to make forecasts, or predictions, of the value of Y for an entity not in
the data set, for which we know the value of X .
▶ Forecasting is an important job for economists, and can be done by regression
methods without considering causal effects.
3 We want to estimate the causal effect on Y of a change in X.
▶ This is why we are interested in the class size effect. Suppose the school decided to
cut class size by 2 students. What would be the effect on test scores? This is a causal
question (what is the causal effect of STR on test scores?).
▶ Except when we discuss forecasting, the aim of this course is the estimation of
causal effects using regression methods.
What is a causal effect?
“Causality” is a complex concept! In this course, we take a practical approach to
defining causality:
A causal effect is defined to be the effect measured in an ideal randomized
controlled experiment.
▶ Ideal: subjects all follow the treatment protocol – perfect compliance, no errors in
reporting, etc.!
▶ Randomized: subjects from the population of interest are randomly assigned to a
treatment or control group (no confounding factors)
▶ Controlled: having a control group permits measuring the differential effect of the
treatment
▶ Experiment: the treatment is assigned as part of the experiment: the subjects have
no choice, so there is no “reverse causality” in which subjects choose the treatment
they think will work best.
Back to class size
Imagine an ideal randomized controlled experiment for measuring the effect on Test
Score of reducing STR.
In that experiment, students would be randomly assigned to classes, which
would have different sizes.
Because they are randomly assigned, all student characteristics (and thus ui )
would be distributed independently of STRi .
Thus, E(ui | STRi) = 0; that is, LSA #1 holds in a randomized controlled
experiment.
How does our observational data
differ from this ideal?
The treatment is often not randomly assigned
Consider PctEL – percent of English learners – in the district. It plausibly
satisfies the two criteria for omitted variable bias: Z = PctEL is:
1 a determinant of Y ; and
2 correlated with the regressor X .
Thus, the “control” and “treatment” groups differ in a systematic way, so
corr(STR, PctEL) ≠ 0.
This means that E(ui | STRi) ≠ 0, because PctEL is included in u, and LSA #1 is
violated.
(Randomization + control group) ⇒ any differences between the treatment and
control groups are random – not systematically related to the treatment
We can eliminate the difference in PctEL between the large (control) and small
(treatment) groups by examining the effect of class size among districts with the
same PctEL.
▶ If the only systematic difference between the large and small class size groups is in
PctEL, then we are back to the randomized controlled experiment – within each
PctEL group.
▶ This is one way to control for the effect of PctEL when estimating the effect of STR.
Return to omitted variable bias
Three ways to overcome omitted variable bias:
1 Run a randomized controlled experiment in which treatment (STR) is randomly
assigned: then PctEL is still a determinant of TestScore, but PctEL is
uncorrelated with STR. (This solution to Omitted Variable bias is rarely
feasible.)
2 Adopt the “cross tabulation” approach, with finer gradations of STR and PctEL –
within each group, all classes have the same PctEL, so we control for PctEL (But
soon you will run out of data, and what about other determinants like family
income and parental education?)
3 Use a regression in which the omitted variable (PctEL) is no longer omitted:
include PctEL as an additional regressor in a multiple regression.
The Population Multiple Regression Model (SW Section 6.2)
Consider the case of two regressors:
Yi = β0 + β1 X1i + β2 X2i + ui,   i = 1, …, n
Y is the dependent variable (or LHS variable)
X1, X2 are the two independent variables (regressors, RHS variables)
(Yi, X1i, X2i) denotes the ith observation on Y, X1, and X2.
β0 = unknown population intercept
β1 = effect on Y of a change in X1, holding X2 constant
β2 = effect on Y of a change in X2, holding X1 constant
ui = the regression error (omitted factors)
Interpretation of coefficients in multiple regression
Yi = β0 + β1 X1i + β2 X2i + ui,   i = 1, …, n
Consider changing X1 by ∆X1 while holding X2 constant:
Population regression line before the change:
Y = β0 + β1 X1 + β2 X2
Population regression line after the change:
Y + ∆Y = β0 + β1 (X1 + ∆X1) + β2 X2
Difference: ∆Y = β1 ∆X1. So,
β1 = ∆Y/∆X1, holding X2 constant;
β2 = ∆Y/∆X2, holding X1 constant;
β0 = predicted value of Y when X1 = X2 = 0.
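A quick way to see this partial-effect interpretation numerically is to change one
regressor in a fitted model's prediction while holding the other fixed. A minimal
sketch with simulated data (all names illustrative):

# Sketch: the change in the fitted value equals b1 * dX1 when X2 is held fixed
set.seed(2)
n  <- 200
X1 <- rnorm(n); X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 3 * X2 + rnorm(n)
fit  <- lm(Y ~ X1 + X2)

new0 <- data.frame(X1 = 1.0, X2 = 5)      # before the change
new1 <- data.frame(X1 = 1.5, X2 = 5)      # X1 up by 0.5, X2 held constant
predict(fit, new1) - predict(fit, new0)   # equals coef(fit)["X1"] * 0.5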
The OLS Estimator in Multiple Regression (SW Section 6.3)
With two regressors, the OLS estimator solves:
min
b0,b1,b2
n
∑
i=1
[Yi − (b0+b1X1i +b2X2i )]2
The OLS estimator minimizes the average squared difference between the actual
values of Yi and the prediction (predicted value) based on the estimated line.
This minimization problem can be solved using calculus
This yields the OLS estimators of (β0,β1,β2).
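To make the connection concrete, the same estimates come out of a direct
numerical minimization of the sum of squared residuals (rather than the calculus
solution) and out of lm(). A sketch with simulated data:

# Sketch: minimize the sum of squared residuals directly and compare with lm()
set.seed(3)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 3 * X2 + rnorm(n)

ssr <- function(b) sum((Y - (b[1] + b[2] * X1 + b[3] * X2))^2)
optim(c(0, 0, 0), ssr, method = "BFGS")$par  # numerical minimizer of the SSR
coef(lm(Y ~ X1 + X2))                        # OLS: essentially identical values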
Example: the California test score data
Regression of TestScore against STR:
TestScore = 698.9 − 2.28×STR
Now include percent English Learners in the district (PctEL):
TestScore = 686.0 − 1.10×STR − 0.65×PctEL
What happens to the coefficient on STR?
Multiple regression in R
The fitted regression equation can be written as
TestScore^ = 686.03 − 1.10×STR − 0.65×PctEL,   R² = 0.43
with standard errors 8.73, 0.43, and 0.03, respectively.
Also, sqrt(reg1$res_var) gives SER = 14.5
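A sketch of how these numbers can be reproduced, assuming the CASchools data
from the AER package (the dataset SW use), with STR and TestScore constructed as
in the textbook; with base lm(), summary(reg1)$sigma plays the role of
sqrt(reg1$res_var) (res_var is the residual-variance element of an
estimatr::lm_robust fit):

library(AER)                     # assumed source of the CASchools data
data("CASchools")
CASchools$STR       <- CASchools$students / CASchools$teachers
CASchools$TestScore <- (CASchools$read + CASchools$math) / 2

reg1 <- lm(TestScore ~ STR + english, data = CASchools)
summary(reg1)        # coefficients approx. 686.0, -1.10, -0.65; R^2 approx. 0.43
summary(reg1)$sigma  # the SER, approx. 14.5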
Measures of Fit for Multiple Regression (SW Section 6.4)
Actual = predicted + residual: Yi = Ŷi + ûi
SER = standard deviation of ûi (with d.f. correction)
RMSE = standard deviation of ûi (without d.f. correction)
R2 = fraction of variance of Y explained by X
R̄² = “adjusted R²” = R² with a degrees-of-freedom correction that adjusts for
estimation uncertainty; R̄² < R²
SER and RMSE
As in regression with a single regressor, the SER and the RMSE are measures of
the spread of the Y ’s around the regression line:
SER = √[ (1/(n−k−1)) ∑_{i=1}^{n} ûi² ]
RMSE = √[ (1/n) ∑_{i=1}^{n} ûi² ]
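Both measures are straightforward to compute from the residuals of a fitted model;
a sketch, reusing the reg1 fit from the earlier slide:

# Sketch: SER (d.f.-corrected) and RMSE (uncorrected) from the residuals
u_hat <- residuals(reg1)
n     <- length(u_hat)
k     <- length(coef(reg1)) - 1            # regressors, excluding the intercept

SER   <- sqrt(sum(u_hat^2) / (n - k - 1))  # equals summary(reg1)$sigma
RMSE  <- sqrt(sum(u_hat^2) / n)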
R2 and adjusted R2
The R2 is the fraction of the variance explained – same definition as in
regression with a single regressor:
R² = ESS/TSS = 1 − SSR/TSS
where ESS = ∑_{i=1}^{n} (Ŷi − Ȳ)², TSS = ∑_{i=1}^{n} (Yi − Ȳ)², and SSR = ∑_{i=1}^{n} ûi²
The R2 always increases when you add another regressor – a bit of a problem for
a measure of “fit”
The R̄² (the “adjusted R²”) corrects this problem by “penalizing” you for
including another regressor – the R̄² does not necessarily increase when you add
another regressor.
R̄² = 1 − [(n−1)/(n−k−1)] × (SSR/TSS)
Note that R̄² ≤ R²; however, if n is large the two will be very close.
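A sketch computing both measures from the sums of squares, again for the fitted lm
object reg1:

# Sketch: R^2 and adjusted R^2 from the sums of squares
y   <- model.response(model.frame(reg1))
n   <- length(y)
k   <- length(coef(reg1)) - 1
SSR <- sum(residuals(reg1)^2)
TSS <- sum((y - mean(y))^2)

1 - SSR / TSS                           # R^2:     summary(reg1)$r.squared
1 - (n - 1) / (n - k - 1) * SSR / TSS   # R-bar^2: summary(reg1)$adj.r.squared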
Measures of fit (continued)
Test score example:
1 TestScore = 698.9 − 2.28×STR, with R² = 0.05 and SER = 18.6
2 TestScore = 686.0 − 1.10×STR − 0.65×PctEL,
with R² = 0.426, R̄² = 0.424, and SER = 14.5
Including PctEL substantially improves the goodness of fit.
▶ The SER falls (the units of the SER are the units of TestScore)
▶ R² increases substantially.
▶ Note: R̄² ≈ R² because n is large.
Question: how do we choose variables – should we simply maximize R̄²?
Chapter 7 discusses how to choose variables for a regression analysis.
The Least Squares Assumptions for
Multiple Regression (SW Section 6.5)
Yi = β0 + β1 X1i + β2 X2i + ⋯ + βk Xki + ui,   i = 1, …, n
1 The conditional distribution of u given X has mean zero, that is,
E(ui | X1i, X2i, …, Xki) = 0.
2 (X1i, …, Xki, Yi), i = 1, …, n, are i.i.d.
3 Large outliers are unlikely: X1, …, Xk, and Y have finite fourth moments:
E(X1i⁴) < ∞, …, E(Xki⁴) < ∞, E(Yi⁴) < ∞
4 There is no perfect multicollinearity.
The Least Squares Assumptions for
Multiple Regression (SW Section 6.5)
Assumption #1: the conditional mean of u given the included X ’s is zero.
E(ui | X1i, X2i, …, Xki) = 0
This has the same interpretation as in regression with a single regressor.
This condition fails when there is an omitted variable that
1 belongs in the equation (so is in u) and
2 is correlated with an included X;
in that case there is omitted variable bias.
The best solution, if possible, is to include the omitted variable in the regression.
A second, related solution is to include a variable that controls for the omitted
variable (discussed in Ch. 7)
The Least Squares Assumptions for
Multiple Regression (SW Section 6.5)
Assumption #2: (X1i, …, Xki, Yi), i = 1, …, n, are i.i.d.
This is satisfied automatically if the data are collected by simple random
sampling.
Assumption #3: large outliers are rare (finite fourth moments)
This is the same assumption as we had before for a single regressor. As in the
case of a single regressor, OLS can be sensitive to large outliers, so you need to
check your data (scatter plots!) to make sure there are no crazy values (typos or
coding errors).
Assumption #4: There is no perfect multicollinearity
Perfect multicollinearity is when one of the regressors is an exact linear function
of the other regressors.
Solution: just drop one of the problematic variables!
The Sampling Distribution of the
OLS Estimator (SW Section 6.6)
Under the four Least Squares Assumptions,
E[β̂j] = βj for j = 0, 1, …, k; i.e., the OLS estimators are unbiased
V(β̂j) is inversely proportional to n
For n large, (β̂j − βj)/SE(β̂j) is approximately distributed N(0, 1)
Conceptually, there is nothing new here! The way we test a simple hypothesis
such as H0: βj = βj⁰ is the same. When α = 0.05, reject H0
1 if |(β̂j − βj⁰)/SE(β̂j)| > 1.96;
2 if the p-value is smaller than 0.05;
3 if βj⁰ is outside the 95% confidence interval β̂j ± 1.96 SE(β̂j).
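All three (equivalent) decision rules can be read off a fitted model; a sketch for
testing H0: βSTR = 0 at the 5% level, reusing reg1 from earlier. (lm() uses t rather
than normal critical values, a negligible difference for large n, and reports
homoskedasticity-only SEs, whereas SW report robust SEs.)

# Sketch: the three 5% decision rules for H0: beta_STR = 0
ct <- summary(reg1)$coefficients    # estimate, SE, t statistic, p-value
abs(ct["STR", "t value"]) > 1.96    # rule 1: |t| > 1.96
ct["STR", "Pr(>|t|)"] < 0.05        # rule 2: p-value below 0.05
confint(reg1, "STR", level = 0.95)  # rule 3: is 0 outside this interval?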
Multicollinearity, Perfect and Imperfect
(SW Section 6.7)
Perfect multicollinearity is when one of the regressors is an exact linear
function of the other regressors.
Some more examples of perfect multicollinearity
1 Include the same variable twice, i.e., X1 = X2.
2 Regress TestScore on a constant, D, and B, where D is a dummy for STR ≤ 20 and
B is a dummy for STR > 20, so that B = 1 − D.
(2) above is an example of the ‘dummy variable trap’. More explicitly, suppose you
have a set of multiple binary (dummy) variables that are mutually exclusive
and exhaustive.
That is, there are multiple categories and every observation falls in one and only
one category. If you include all of these dummy variables and a constant, you will
have perfect multicollinearity. (Why? See the sketch below.)
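A sketch of the trap with simulated data: since D + B = 1, the set (constant, D, B)
is perfectly collinear, and R responds by dropping one of the regressors:

# Sketch: the dummy variable trap with simulated data
set.seed(4)
STR <- runif(100, 15, 25)
Y   <- 700 - 2 * STR + rnorm(100)
D   <- as.numeric(STR <= 20)
B   <- 1 - D              # D + B = 1: an exact linear function of the constant

coef(lm(Y ~ D + B))       # R reports NA for B: perfect multicollinearity
coef(lm(Y ~ D))           # fix (1): omit one group
coef(lm(Y ~ 0 + D + B))   # fix (2): omit the intercept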
Perfect multicollinearity (continued)
Solutions: (1) omit one of the groups or (2) omit the intercept. The interpretation
of the coefficients is different between (1) and (2)!!
Perfect multicollinearity usually reflects a mistake in the definitions of the
regressors, or an oddity in the data
If you have perfect multicollinearity, your statistical software will let you know –
either by crashing or returning an error message or by “dropping” one of the
variables arbitrarily
The solution to perfect multicollinearity is to modify your list of regressors so
that you no longer have perfect multicollinearity.
Imperfect multicollinearity
Imperfect and perfect multicollinearity are quite different despite the similarity
of the names.
Imperfect multicollinearity occurs when two or more regressors are highly
correlated.
Why the term “multicollinearity”?
If two regressors are highly correlated, then their scatterplot will pretty much
look like a straight line – they are “co-linear” – but unless the correlation is
exactly ±1, that collinearity is imperfect.
Imperfect multicollinearity, ctd.
Imperfect multicollinearity implies that one or more of the regression
coefficients will be imprecisely estimated (large standard errors).
The idea: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1
and X2 are highly correlated, there is very little variation in X1 once X2 is held
constant.
So the data don’t contain much information about what happens when X1
changes but X2 doesn’t. If so, the variance of the OLS estimator of the
coefficient on X1 will be large.
Example: X1 is a dummy for being a woman and X2 is a dummy for being a lipstick user.
Having high standard errors is a natural result: when X1 and X2 are highly
correlated, it is hard to disentangle the effect of X1 on Y from the effect of X2 on
Y . So, the estimates naturally have a lot of uncertainty.
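A sketch of this variance inflation with simulated data: the same regression is run
with weakly and with highly correlated regressors, and the standard error on X1
balloons in the second case:

# Sketch: standard errors inflate as corr(X1, X2) approaches 1
set.seed(5)
n  <- 200
X1 <- rnorm(n); u <- rnorm(n)
for (rho in c(0.10, 0.99)) {
  X2 <- rho * X1 + sqrt(1 - rho^2) * rnorm(n)  # corr(X1, X2) = rho
  Y  <- 1 + 2 * X1 + 2 * X2 + u
  print(summary(lm(Y ~ X1 + X2))$coefficients["X1", "Std. Error"])
}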
Next Week: Inference in Multiple Regression, Ch7.