ECOM30002/90002 Econometrics 2
Lecture 2: Short vs long regressions, and the omitted variable bias
Kevin Staub

Lecture roadmap

1. Causal and descriptive interpretations of linear regressions
   1.1 Causal and descriptive regression
   1.2 Long vs short regressions, and the OVB
2. Instrumental variables estimation
3. Estimation theory and simulation
4. Statistical properties of OLS and 2SLS
5. Panel data analysis
6. Prediction and "big data"
7. Time series analysis

Recap

In the last lecture, we looked mainly at simple regression. We learned that we can understand linear regression as a device to capture:

1. Conditional expectation functions: the regression parameters approximate the CEF. This interpretation is always "correct" (we can always interpret regression this way).
2. Causal effects: the regression parameters relate to a causal model (e.g., a model that can be understood in terms of potential outcomes). At a minimum, this requires the model error to be uncorrelated with the "treatment" variable. This is satisfied in a randomised experiment, but not necessarily in other situations.

This lecture

Descriptive vs causal interpretations of regression in the context of multiple regression: comparing "long" vs "short" regression models. In a nutshell:
- Descriptive interpretation: ceteris paribus (long) vs mutatis mutandis (short).
- Causal interpretation: the long regression as causal (conditional independence); the short regression as suffering from omitted variable bias.

Descriptive regression: What interpretation do we want?

"All models are wrong but some are useful" (George Box). The interpretation we want determines both
(a) the selection of regressors, and
(b) the functional form.
Regarding (a): should we regress $Y_i$ on $X_{1i}$ only, or rather on $X_{1i}$ and $X_{2i}$? (this lecture)
Regarding (b): should we regress $Y_i$ on $X_{1i}$, or rather $\log(Y_i)$ on $X_{1i}$? (not the focus of this subject)
In many cases, answers to these questions are given by an economic model.

Comparing short and long population regressions

  $Y_i = \beta_0^s + \beta_1^s X_{1i} + V_i^s$,  with $E(V_i^s \mid X_{1i}) = 0$   (short)
  $Y_i = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell$,  with $E(V_i^\ell \mid X_{1i}, X_{2i}) = 0$   (long)

Both models are "true" and "valid", but they describe different things and have different interpretations. In particular, $\beta_1^s \neq \beta_1^\ell$ in general.

Relation between $\beta_1^s$ and $\beta_1^\ell$

Method 1: take the expectation of the long model over $X_{2i}$:

  $E(Y_i \mid X_{1i}) = E_{X_{2i} \mid X_{1i}}[E(Y_i \mid X_{1i}, X_{2i})] = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell E(X_{2i} \mid X_{1i})$.

Suppose $E(X_{2i} \mid X_{1i}) = \gamma_0 + \gamma_1 X_{1i}$. Then

  $E(Y_i \mid X_{1i}) = (\beta_0^\ell + \beta_2^\ell \gamma_0) + (\beta_1^\ell + \beta_2^\ell \gamma_1) X_{1i}$.

Method 2: plug the long model into the short OLS estimand:

  $\beta_1^s = \frac{Cov(Y_i, X_{1i})}{V(X_{1i})} = \frac{Cov(\beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell,\, X_{1i})}{V(X_{1i})} = \beta_1^\ell + \beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$.

This gives an identical result if $E(X_{2i} \mid X_{1i})$ is indeed linear. Formulae for $K > 2$ regressors tend to be qualitatively similar but more complex.

Relation between $\beta_1^s$ and $\beta_1^\ell$

The population relationship between the short and long parameters is

  $\beta_1^s = \beta_1^\ell + \beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$:

short equals long plus the effect of the omitted variable times the regression of the omitted on the included.
Note: the same relation also holds between the sample estimates $\hat\beta_1^s$ and $\hat\beta_1^\ell$.
Interpretation: $\beta_1^s = \beta_1^\ell$ if $X_{1i}$ and $X_{2i}$ are uncorrelated, and/or $\beta_2^\ell = 0$. Otherwise, $\beta_1^s \neq \beta_1^\ell$.

A decomposition

The following relation holds among the population parameters:

  direct effect $\beta_1^\ell$  +  indirect effect $\beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$  =  total effect $\beta_1^s$.

Often the "direct effect" is also called a "ceteris paribus" effect: an effect holding everything else equal. The "total effect" is then a "mutatis mutandis" effect: an effect including changes through indirect channels.

Example: $Y_i$: price of a refrigerator; $X_{1i}$: energy costs; $X_{2i}$: volume.
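As a quick numerical check of this decomposition, here is a minimal simulation sketch in R (the data-generating process and all parameter values are made up purely for illustration). It generates correlated regressors, runs the short, long and auxiliary regressions, and confirms that the sample version of the identity holds exactly.

# Minimal simulation sketch (illustrative values only): verify that the
# short slope equals the long slope plus "effect of omitted times
# regression of omitted on included", exactly, in the sample.
set.seed(42)
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)               # omitted regressor, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # "long" model: beta1 = 2, beta2 = 3

b_long  <- coef(lm(y ~ x1 + x2))        # long regression (direct effects)
b_short <- coef(lm(y ~ x1))["x1"]       # short regression (total effect)
d1      <- coef(lm(x2 ~ x1))["x1"]      # auxiliary slope: omitted on included

b_short                                 # about 2 + 3 * 0.8 = 4.4
b_long["x1"] + b_long["x2"] * d1        # identical, up to machine precision

Setting the 0.8 to 0 (or the 3 to 0) makes the short and long slopes coincide, matching the no-bias cases discussed above.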
Causal regression models

So far: a descriptive interpretation of "short vs long". Now: a causal setting. We relate a general multiple regression model to a causal framework. Two generalisations relative to Lecture 1:
- a continuous "treatment" variable;
- multiple regressors.
The role of covariates other than the treatment in this setting is to serve as "control variables", allowing us to obtain causal effects under a weaker assumption (conditional independence rather than independence). We use the causal effect of schooling ($S_i$) on wages ($Y_i$) as the running example.

Potential outcomes with continuous treatment

General individual-specific potential outcome function: $Y_{si} = f_i(s)$.
Example: $Y_{12,i}$ is the wage $i$ would earn if they had 12 years of schooling ($S_i = 12$).
Possible objects of interest: $E(f_i(s) - f_i(s-1))$, $E(f_i(16) - f_i(12) \mid S_i = 16)$, $E(f_i'(S_i))$, etc.

Conditional independence

Conditional independence assumption (CIA): $Y_{si} \perp S_i \mid X_i$ for all $s$.
Conditional on a vector of covariates $X_i$, the treatment $S_i$ is independent of all potential outcomes. In particular,

  $E(f_i(s) \mid X_i, S_i = s) = E(f_i(s) \mid X_i, S_i = s') = E(f_i(s) \mid X_i)$, for all $s, s'$.

This implies that, for given values of $X_i$, conditional expectation contrasts have a causal interpretation. E.g.,

  $E(Y_i \mid X_i, S_i = 12) - E(Y_i \mid X_i, S_i = 11)$
  $= E(f_i(12) \mid X_i, S_i = 12) - E(f_i(11) \mid X_i, S_i = 11)$
  $= E(f_i(12) - f_i(11) \mid X_i, S_i = 12)$
  $= E(f_i(12) - f_i(11) \mid X_i)$.

Regression and CIA: Causal linear constant effects model

Linear constant causal effects model: $f_i(s) = \beta_0 + \beta_1 s + U_i$.
Observed outcome under the chosen treatment $S_i$: $Y_i = \beta_0 + \beta_1 S_i + U_i$.
Assume $U_i = X_i'\gamma + V_i$ with $E(U_i \mid X_i) = X_i'\gamma$.

Regression and CIA: Causal linear constant effects model

By the CIA, $E(f_i(s) \mid X_i, S_i = s) = E(f_i(s) \mid X_i) = \beta_0 + \beta_1 s + X_i'\gamma$. And so $V_i$ in the linear regression model

  $Y_i = \beta_0 + \beta_1 S_i + X_i'\gamma + V_i$

is uncorrelated with $S_i$ and $X_i$; $\beta_1$ is the causal effect and can be obtained by regressing $Y_i$ on $(S_i, X_i')'$.

Exogeneity: the key assumption that identifies $\beta_1$ is often called exogeneity. Definitions vary, depending on the setting and convenience, but they always imply some form of limitation on the dependence between the regressors and the error. Here, for instance, we just used $E(V_i \mid X_i, S_i) = E(V_i \mid X_i)$, rather than the more common but stronger $E(V_i \mid S_i, X_i) = 0$ (which we will use later).

Spurious Correlation and Omitted Variable Bias

Back to the relationship between "short" and "long" regressions. Suppose:
- Model of interest: the long model.
- Estimated model: the short model.
E.g., we see the long model as causal (CIA), or we estimate the short model (the total effect) but want the interpretation of the long model (the direct effect).
Problem: the short model suffers from omitted variable bias (OVB).

  Causal:       long   $Y_i = \beta_0 + \beta_1 S_i + X_i'\gamma + V_i$
                short  $Y_i = \alpha_0 + \alpha_1 S_i + V_i^s$
                OVB    $\alpha_1 = \beta_1 + \gamma'\delta_X$
  Descriptive:  long   $Y_i = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell$
                short  $Y_i = \beta_0^s + \beta_1^s X_{1i} + V_i^s$
                OVB    $\beta_1^s = \beta_1^\ell + \beta_2^\ell \delta_1$

where $\delta_X$ is the vector of slopes from regressions of $X_{ki}$ on $S_i$, and $\delta_1$ is the slope from the regression of $X_{2i}$ on $X_{1i}$.
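The following is a minimal simulation sketch of the causal OVB formula above (the variable names, DGP and numbers are invented purely for illustration). A single confounder raises both schooling and wages, so the short regression overstates the causal return; the long regression recovers it, and the OVB identity links the two slopes exactly.

# Illustrative sketch (made-up DGP): one confounder x ("ability") drives
# both schooling s and wages y, so the short slope is biased upward.
set.seed(1)
n <- 10000
x <- rnorm(n)                                    # confounder
s <- 12 + 1.5 * x + rnorm(n)                     # schooling depends on x
y <- 100 + 50 * s + 80 * x + rnorm(n, sd = 50)   # causal effect beta1 = 50

long   <- coef(lm(y ~ s + x))                    # long["s"] is about 50
alpha1 <- coef(lm(y ~ s))["s"]                   # short slope, biased upward
deltaX <- coef(lm(x ~ s))["s"]                   # slope of confounder on treatment

alpha1                                           # about 87 in this design
long["s"] + long["x"] * deltaX                   # OVB identity: same number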
Spurious Correlation and Omitted Variable Bias

For simplicity, let's return to the "descriptive" notation ($\beta^\ell$, $\beta^s$). Two cases without omitted variable bias ($\beta_1^\ell = \beta_1^s$): $\beta_2^\ell = 0$ and/or $Cov(X_{1i}, X_{2i}) = 0$. Otherwise:

                        $Cov(X_{1i}, X_{2i}) > 0$    $Cov(X_{1i}, X_{2i}) < 0$
  $\beta_2^\ell > 0$    upward bias                  downward bias
  $\beta_2^\ell < 0$    downward bias                upward bias

Note: the direction of the relation between $X_{1i}$ and $X_{2i}$ is immaterial for the argument. We call $X_{2i}$ a "confounder" if $X_{2i}$ affects both $X_{1i}$ and $Y_i$.
Extreme cases:
- fully spurious correlation ($\beta_1^\ell = 0$, $\beta_1^s \neq 0$);
- fully spurious non-correlation ($\beta_1^\ell \neq 0$, $\beta_1^s = 0$).

Example

## Dependent variable: wage (short)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   242.81      35.11    6.92        0
## educ           26.70       2.49   10.73        0

## Dependent variable: wage (long)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   149.19      41.90    3.56        0
## educ           20.73       2.88    7.19        0
## IQ              1.73       0.43    4.06        0

## Dependent variable: IQ (omitted on included)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    54.27       1.81   29.93        0
## educ            3.46       0.13   26.92        0

Check of the decomposition: $26.70 = \hat\beta_1^s = \hat\beta_1^\ell + \hat\beta_2^\ell \hat\delta_1 = 20.73 + 1.73 \times 3.46$ (up to rounding).

Including too many regressors

What if we include $X_{2i}$ but want to have a "short" interpretation? Why "too many"?
- $X_{2i}$ is part of the causal pathway from $X_{1i}$ to $Y_i$ (a mediator variable).
- $X_{2i}$ is a post-treatment outcome resulting from the intervention.
Example: regression of mortality ($Y_i$) on physical exercise ($X_{1i}$). Should we control for health ($X_{2i}$)? (A simulation sketch of this issue follows the Summary.)

Summary

Often we want to estimate ceteris paribus relationships ("causal flavour"). Then we need to condition on, or control for, other "confounding" variables. If we are interested in the relationship between $Y_i$ and a particular $X_{ik}$, the other variables in $X_i$ act as control variables. For example: what is the impact of energy costs on a refrigerator's price, controlling for size?
Sometimes, ceteris paribus is uninteresting and we want mutatis mutandis. For example: what is the impact of education on a person's wage, keeping job position fixed? Holding job position fixed strips out the part of the education effect that operates through obtaining better positions, so the total effect may be the more interesting object here.
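As flagged on the "Including too many regressors" slide, here is a minimal simulation sketch of the mediator ("bad control") problem, with a made-up DGP and made-up numbers. Exercise lowers a mortality-risk index both directly and through better health, so controlling for health removes the indirect channel and understates the total effect.

# Illustrative sketch (made-up DGP): health is a mediator between
# exercise and the outcome, so adding it as a control strips out the
# indirect part of the exercise effect.
set.seed(7)
n        <- 10000
exercise <- rnorm(n)
health   <- 0.6 * exercise + rnorm(n)                      # mediator
risk     <- 5 - 0.3 * exercise - 0.5 * health + rnorm(n)   # total effect: -0.3 - 0.5*0.6 = -0.6

coef(lm(risk ~ exercise))["exercise"]             # about -0.6 (total effect)
coef(lm(risk ~ exercise + health))["exercise"]    # about -0.3 (direct effect only)

Which of the two is relevant depends on the question: for "how much does taking up exercise reduce risk overall?", the total (mutatis mutandis) effect is the one we want.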