ECOM30002/90002 Econometrics 2
Lecture 2: Short vs long regressions, and the omitted variable bias
Kevin Staub

Lecture roadmap

1. Causal and descriptive interpretations of linear regressions
   1.1 Causal and descriptive regression
   1.2 Long vs short regressions, and the OVB
2. Instrumental variables estimation
3. Estimation theory and simulation
4. Statistical properties of OLS and 2SLS
5. Panel data analysis
6. Prediction and "big data"
7. Time series analysis

Recap

In the last lecture, we looked mainly at simple regression. We learned that we can understand linear regression as a device to capture:

1. Conditional expectation functions: the regression parameters approximate the CEF. This interpretation is always "correct" (we can always interpret regression this way).
2. Causal effects: the regression parameters relate to a causal model (e.g., a model that can be understood in terms of potential outcomes). At a minimum, this requires the model error to be uncorrelated with the "treatment" variable. This is satisfied in a randomised experiment, but not necessarily in other situations.

This lecture

Descriptive vs causal interpretations of regression in the context of multiple regression: comparing "long" vs "short" regression models. In a nutshell:
- Descriptive interpretation: ceteris paribus (long) vs mutatis mutandis (short).
- Causal interpretation: the long regression as causal (conditional independence); the short regression as suffering from omitted variable bias.

Descriptive regression: What interpretation do we want?

"All models are wrong but some are useful" (George Box). The interpretation we want determines both
(a) the selection of regressors, and
(b) the functional form.
Regarding (a): should we regress $Y_i$ on $X_{1i}$ only, or rather on $X_{1i}$ and $X_{2i}$? (this lecture)
Regarding (b): should we regress $Y_i$ on $X_{1i}$, or rather $\log(Y_i)$ on $X_{1i}$? (not the focus of this subject)
In many cases, answers to these questions are given by an economic model.

Comparing short and long population regressions

  $Y_i = \beta_0^s + \beta_1^s X_{1i} + V_i^s$,  with $E(V_i^s \mid X_{1i}) = 0$   (short)
  $Y_i = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell$,  with $E(V_i^\ell \mid X_{1i}, X_{2i}) = 0$   (long)

Both models are "true" and "valid", but they describe different things and have different interpretations. In particular, $\beta_1^s \neq \beta_1^\ell$ in general.

Relation between $\beta_1^s$ and $\beta_1^\ell$

Method 1: take the expectation of the long model over $X_{2i}$:

  $E(Y_i \mid X_{1i}) = E_{X_{2i} \mid X_{1i}}[E(Y_i \mid X_{1i}, X_{2i})] = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell E(X_{2i} \mid X_{1i})$.

Suppose $E(X_{2i} \mid X_{1i}) = \gamma_0 + \gamma_1 X_{1i}$. Then

  $E(Y_i \mid X_{1i}) = (\beta_0^\ell + \beta_2^\ell \gamma_0) + (\beta_1^\ell + \beta_2^\ell \gamma_1) X_{1i}$.

Method 2: plug the long model into the short OLS estimand:

  $\beta_1^s = \frac{Cov(Y_i, X_{1i})}{V(X_{1i})} = \frac{Cov(\beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell,\, X_{1i})}{V(X_{1i})} = \beta_1^\ell + \beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$.

This gives an identical result if $E(X_{2i} \mid X_{1i})$ is indeed linear. Formulae for $K > 2$ regressors tend to be qualitatively similar but more complex.

Relation between $\beta_1^s$ and $\beta_1^\ell$

The population relationship between the short and long parameters is

  $\beta_1^s = \beta_1^\ell + \beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$:

short equals long plus the effect of the omitted variable times the regression of the omitted on the included.
Note: the same relation also holds between the sample estimates $\hat\beta_1^s$ and $\hat\beta_1^\ell$.
Interpretation: $\beta_1^s = \beta_1^\ell$ if $X_{1i}$ and $X_{2i}$ are uncorrelated, and/or $\beta_2^\ell = 0$. Otherwise, $\beta_1^s \neq \beta_1^\ell$.

A decomposition

The following relation holds among the population parameters:

  direct effect $\beta_1^\ell$  +  indirect effect $\beta_2^\ell \frac{Cov(X_{1i}, X_{2i})}{V(X_{1i})}$  =  total effect $\beta_1^s$.

Often the "direct effect" is also called a "ceteris paribus" effect: an effect holding everything else equal. The "total effect" is then a "mutatis mutandis" effect: an effect including changes through indirect channels.

Example: $Y_i$: price of a refrigerator; $X_{1i}$: energy costs; $X_{2i}$: volume.
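As a quick numerical check of this decomposition, here is a minimal simulation sketch in R (the data-generating process and all parameter values are made up purely for illustration). It generates correlated regressors, runs the short, long and auxiliary regressions, and confirms that the sample version of the identity holds exactly.

# Minimal simulation sketch (illustrative values only): verify that the
# short slope equals the long slope plus "effect of omitted times
# regression of omitted on included", exactly, in the sample.
set.seed(42)
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n)               # omitted regressor, correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # "long" model: beta1 = 2, beta2 = 3

b_long  <- coef(lm(y ~ x1 + x2))        # long regression (direct effects)
b_short <- coef(lm(y ~ x1))["x1"]       # short regression (total effect)
d1      <- coef(lm(x2 ~ x1))["x1"]      # auxiliary slope: omitted on included

b_short                                 # about 2 + 3 * 0.8 = 4.4
b_long["x1"] + b_long["x2"] * d1        # identical, up to machine precision

Setting the 0.8 to 0 (or the 3 to 0) makes the short and long slopes coincide, matching the no-bias cases discussed above.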
Causal regression models

So far: a descriptive interpretation of "short vs long". Now: a causal setting. We relate a general multiple regression model to a causal framework. Two generalisations relative to Lecture 1:
- a continuous "treatment" variable;
- multiple regressors.
The role of covariates other than the treatment in this setting is to serve as "control variables", allowing us to obtain causal effects under a weaker assumption (conditional independence rather than independence). We use the causal effect of schooling ($S_i$) on wages ($Y_i$) as the running example.

Potential outcomes with continuous treatment

General individual-specific potential outcome function: $Y_{si} = f_i(s)$.
Example: $Y_{12,i}$ is the wage $i$ would earn if they had 12 years of schooling ($S_i = 12$).
Possible objects of interest: $E(f_i(s) - f_i(s-1))$, $E(f_i(16) - f_i(12) \mid S_i = 16)$, $E(f_i'(S_i))$, etc.

Conditional independence

Conditional independence assumption (CIA): $Y_{si} \perp S_i \mid X_i$ for all $s$.
Conditional on a vector of covariates $X_i$, the treatment $S_i$ is independent of all potential outcomes. In particular,

  $E(f_i(s) \mid X_i, S_i = s) = E(f_i(s) \mid X_i, S_i = s') = E(f_i(s) \mid X_i)$, for all $s, s'$.

This implies that, for given values of $X_i$, conditional expectation contrasts have a causal interpretation. E.g.,

  $E(Y_i \mid X_i, S_i = 12) - E(Y_i \mid X_i, S_i = 11)$
  $= E(f_i(12) \mid X_i, S_i = 12) - E(f_i(11) \mid X_i, S_i = 11)$
  $= E(f_i(12) - f_i(11) \mid X_i, S_i = 12)$
  $= E(f_i(12) - f_i(11) \mid X_i)$.

Regression and CIA: Causal linear constant effects model

Linear constant causal effects model: $f_i(s) = \beta_0 + \beta_1 s + U_i$.
Observed outcome under the chosen treatment $S_i$: $Y_i = \beta_0 + \beta_1 S_i + U_i$.
Assume $U_i = X_i'\gamma + V_i$ with $E(U_i \mid X_i) = X_i'\gamma$.

Regression and CIA: Causal linear constant effects model

By the CIA, $E(f_i(s) \mid X_i, S_i = s) = E(f_i(s) \mid X_i) = \beta_0 + \beta_1 s + X_i'\gamma$. And so $V_i$ in the linear regression model

  $Y_i = \beta_0 + \beta_1 S_i + X_i'\gamma + V_i$

is uncorrelated with $S_i$ and $X_i$; $\beta_1$ is the causal effect and can be obtained by regressing $Y_i$ on $(S_i, X_i')'$.

Exogeneity: the key assumption that identifies $\beta_1$ is often called exogeneity. Definitions vary, depending on the setting and convenience, but they always imply some form of limitation on the dependence between the regressors and the error. Here, for instance, we just used $E(V_i \mid X_i, S_i) = E(V_i \mid X_i)$, rather than the more common but stronger $E(V_i \mid S_i, X_i) = 0$ (which we will use later).

Spurious Correlation and Omitted Variable Bias

Back to the relationship between "short" and "long" regressions. Suppose:
- Model of interest: the long model.
- Estimated model: the short model.
E.g., we see the long model as causal (CIA), or we estimate the short model (the total effect) but want the interpretation of the long model (the direct effect).
Problem: the short model suffers from omitted variable bias (OVB).

  Causal:       long   $Y_i = \beta_0 + \beta_1 S_i + X_i'\gamma + V_i$
                short  $Y_i = \alpha_0 + \alpha_1 S_i + V_i^s$
                OVB    $\alpha_1 = \beta_1 + \gamma'\delta_X$
  Descriptive:  long   $Y_i = \beta_0^\ell + \beta_1^\ell X_{1i} + \beta_2^\ell X_{2i} + V_i^\ell$
                short  $Y_i = \beta_0^s + \beta_1^s X_{1i} + V_i^s$
                OVB    $\beta_1^s = \beta_1^\ell + \beta_2^\ell \delta_1$

where $\delta_X$ is the vector of slopes from regressions of $X_{ki}$ on $S_i$, and $\delta_1$ is the slope from the regression of $X_{2i}$ on $X_{1i}$.
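The following is a minimal simulation sketch of the causal OVB formula above (the variable names, DGP and numbers are invented purely for illustration). A single confounder raises both schooling and wages, so the short regression overstates the causal return; the long regression recovers it, and the OVB identity links the two slopes exactly.

# Illustrative sketch (made-up DGP): one confounder x ("ability") drives
# both schooling s and wages y, so the short slope is biased upward.
set.seed(1)
n <- 10000
x <- rnorm(n)                                    # confounder
s <- 12 + 1.5 * x + rnorm(n)                     # schooling depends on x
y <- 100 + 50 * s + 80 * x + rnorm(n, sd = 50)   # causal effect beta1 = 50

long   <- coef(lm(y ~ s + x))                    # long["s"] is about 50
alpha1 <- coef(lm(y ~ s))["s"]                   # short slope, biased upward
deltaX <- coef(lm(x ~ s))["s"]                   # slope of confounder on treatment

alpha1                                           # about 87 in this design
long["s"] + long["x"] * deltaX                   # OVB identity: same number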
Spurious Correlation and Omitted Variable Bias

For simplicity, let's return to the "descriptive" notation ($\beta^\ell$, $\beta^s$). Two cases without omitted variable bias ($\beta_1^\ell = \beta_1^s$): $\beta_2^\ell = 0$ and/or $Cov(X_{1i}, X_{2i}) = 0$. Otherwise:

                        $Cov(X_{1i}, X_{2i}) > 0$    $Cov(X_{1i}, X_{2i}) < 0$
  $\beta_2^\ell > 0$    upward bias                  downward bias
  $\beta_2^\ell < 0$    downward bias                upward bias

Note: the direction of the relation between $X_{1i}$ and $X_{2i}$ is immaterial for the argument. We call $X_{2i}$ a "confounder" if $X_{2i}$ affects both $X_{1i}$ and $Y_i$.
Extreme cases:
- fully spurious correlation ($\beta_1^\ell = 0$, $\beta_1^s \neq 0$);
- fully spurious non-correlation ($\beta_1^\ell \neq 0$, $\beta_1^s = 0$).

Example

## Dependent variable: wage (short)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   242.81      35.11    6.92        0
## educ           26.70       2.49   10.73        0

## Dependent variable: wage (long)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   149.19      41.90    3.56        0
## educ           20.73       2.88    7.19        0
## IQ              1.73       0.43    4.06        0

## Dependent variable: IQ (omitted on included)
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    54.27       1.81   29.93        0
## educ            3.46       0.13   26.92        0

Check of the decomposition: $26.70 = \hat\beta_1^s = \hat\beta_1^\ell + \hat\beta_2^\ell \hat\delta_1 = 20.73 + 1.73 \times 3.46$ (up to rounding).

Including too many regressors

What if we include $X_{2i}$ but want to have a "short" interpretation? Why "too many"?
- $X_{2i}$ is part of the causal pathway from $X_{1i}$ to $Y_i$ (a mediator variable).
- $X_{2i}$ is a post-treatment outcome resulting from the intervention.
Example: regression of mortality ($Y_i$) on physical exercise ($X_{1i}$). Should we control for health ($X_{2i}$)? (A simulation sketch of this issue follows the Summary.)

Summary

Often we want to estimate ceteris paribus relationships ("causal flavour"). Then we need to condition on, or control for, other "confounding" variables. If we are interested in the relationship between $Y_i$ and a particular $X_{ik}$, the other variables in $X_i$ act as control variables. For example: what is the impact of energy costs on a refrigerator's price, controlling for size?
Sometimes, ceteris paribus is uninteresting and we want mutatis mutandis. For example: what is the impact of education on a person's wage, keeping job position fixed? Holding job position fixed strips out the part of the education effect that operates through obtaining better positions, so the total effect may be the more interesting object here.
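As flagged on the "Including too many regressors" slide, here is a minimal simulation sketch of the mediator ("bad control") problem, with a made-up DGP and made-up numbers. Exercise lowers a mortality-risk index both directly and through better health, so controlling for health removes the indirect channel and understates the total effect.

# Illustrative sketch (made-up DGP): health is a mediator between
# exercise and the outcome, so adding it as a control strips out the
# indirect part of the exercise effect.
set.seed(7)
n        <- 10000
exercise <- rnorm(n)
health   <- 0.6 * exercise + rnorm(n)                      # mediator
risk     <- 5 - 0.3 * exercise - 0.5 * health + rnorm(n)   # total effect: -0.3 - 0.5*0.6 = -0.6

coef(lm(risk ~ exercise))["exercise"]             # about -0.6 (total effect)
coef(lm(risk ~ exercise + health))["exercise"]    # about -0.3 (direct effect only)

Which of the two is relevant depends on the question: for "how much does taking up exercise reduce risk overall?", the total (mutatis mutandis) effect is the one we want.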