Date: 2023-10-16

ECONOMETRICS
LECTURE 5B
Trang Le
LECTURE SLIDES #7A TOPICS
 Adjusted R-Squared
 Qualitative Information
 Dummy Variable and Multiple Groups
 Key references: 6.3, 7.1, 7.2 and 7.3
MORE ON GOODNESS OF FIT
 General remarks on R-squared
 High R-squared does not imply there is a causal interpretation
 Low R-squared does not preclude precise estimation of
marginal effects
 R-squared never decreases, and usually increases, when we add an extra variable
 How can we construct a version of R-squared that takes this fact into account?
MORE ON GOODNESS OF FIT...
 Adjusted R-squared accounts for degrees of freedom
$\bar{R}^2 = 1 - \dfrac{SSR/(n-k-1)}{SST/(n-1)}$
 Adjusted R-squared imposes a penalty for adding new
regressors
 Adjusted R-squared may increase or decrease when a variable is added
 Potentially useful in comparing models with alternative
numbers of regressors
 Adjusted R-squared may be negative
$\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$
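The two expressions for adjusted R-squared on this slide are the same quantity. A short derivation, using only the definition $R^2 = 1 - SSR/SST$, makes the link explicit. The numbers in the last line are illustrative (not from any data set) and show how a negative value can arise.

```latex
\begin{align*}
\bar{R}^2 &= 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}
           = 1 - \frac{SSR}{SST}\cdot\frac{n-1}{n-k-1}
           = 1 - (1-R^2)\,\frac{n-1}{n-k-1} \\[4pt]
\text{e.g. } R^2 = 0.05,\ n = 20,\ k = 3:\quad
\bar{R}^2 &= 1 - 0.95\cdot\frac{19}{16} \approx -0.128
\end{align*}
```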
ADJUSTED R-SQUARED IN STATA
. reg lsalary lsales

Source      SS           df    MS            Number of obs = 209
                                             F(1, 207)     = 55.30
Model       14.0661688   1     14.0661688    Prob > F      = 0.0000
Residual    52.6559944   207   .254376785    R-squared     = 0.2108
                                             Adj R-squared = 0.2070
Total       66.7221632   208   .320779631    Root MSE      = .50436

lsalary     Coef.      Std. Err.   t       P>|t|   [95% Conf. Interval]
lsales      .2566717   .0345167    7.44    0.000   .1886224   .3247209
_cons       4.821997   .2883396    16.72   0.000   4.253538   5.390455
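As a check on the reported fit statistics, the sketch below recomputes R-squared and adjusted R-squared from the sums of squares in the Stata output for `reg lsalary lsales`. Only the variable names are my own; every number is copied from that output.

```python
# Values copied from the Stata output for: reg lsalary lsales
ssr = 52.6559944      # residual sum of squares
sst = 66.7221632      # total sum of squares
n = 209               # number of observations
k = 1                 # number of slope regressors (lsales)

r2 = 1 - ssr / sst

# Degrees-of-freedom form of adjusted R-squared.
adj_r2_df = 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Equivalent form written in terms of R-squared.
adj_r2_from_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R-squared     = {r2:.4f}")
print(f"Adj R-squared = {adj_r2_df:.4f}")
```

Both forms give the same number, and to four decimals they agree with the `R-squared` and `Adj R-squared` entries that Stata reports.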
USING FIT TO CHOOSE BETWEEN MODELS
 If models are nested, one is a special case of the other
$y = \beta_0 + \beta_1 x + u$
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + u$
 First model is nested within the second: it is the special case $\beta_2 = 0$
 Could choose between models on the basis of a test of $H_0: \beta_2 = 0$
 Could also decide on the basis of fit using $\bar{R}^2$
 Implicitly selecting a specific critical value
 $\bar{R}^2$ for the second model exceeds that of the first iff the t-statistic for the estimate of $\beta_2$ is greater than one in absolute value
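The link between adjusted R-squared and the t-statistic can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy is available; the data-generating process, the seed, and the helper `fit` are all hypothetical choices made for illustration. For each simulated sample it records whether the t-statistic on the squared term exceeds one in absolute value and whether adjusted R-squared is higher in the quadratic model than in the linear one.

```python
import numpy as np

def fit(y, X):
    """OLS of y on X (X already contains a column of ones)."""
    n, p = X.shape                      # p = k + 1 estimated parameters
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    ssr = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    sigma2 = ssr / (n - p)              # estimated error variance
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    adj_r2 = 1 - (ssr / (n - p)) / (sst / (n - 1))
    return beta, se, adj_r2

rng = np.random.default_rng(0)          # arbitrary seed
n = 100
x = rng.uniform(0, 10, n)
ones = np.ones(n)

checks = []
for beta2 in (0.0, 0.01, 0.05, 0.3):    # hypothetical true values of beta_2
    y = 1 + 0.5 * x + beta2 * x**2 + rng.normal(0, 2, n)
    _, _, adj_small = fit(y, np.column_stack([ones, x]))
    b, se, adj_big = fit(y, np.column_stack([ones, x, x**2]))
    t_stat = b[2] / se[2]
    # Record the two conditions that should always agree.
    checks.append((bool(abs(t_stat) > 1), bool(adj_big > adj_small)))
    print(f"beta2={beta2:<5} t={t_stat:7.2f}  adj R2: {adj_small:.4f} -> {adj_big:.4f}")
```

In every row the two recorded conditions coincide, which is the sense in which choosing by adjusted R-squared amounts to using a critical value of one for the t-test.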
USING FIT TO CHOOSE BETWEEN MODELS...
 Models are nonnested if neither model is a special case of the other
$rdintens = \beta_0 + \beta_1 \log(sales) + u$
$rdintens = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
 Can't impose restrictions on $\beta_1$ & $\beta_2$ in the quadratic model to obtain the log model
 Testing option not available but can compare fit
 Using RDCHEM data, the log model has $R^2 = .061$ while the quadratic model yields $R^2 = .148$, but the comparison is unfair to the first model, which has one fewer parameter
 $\bar{R}^2 = 0.030$ for the log model & $\bar{R}^2 = 0.090$ for the quadratic model
 Even after adjusting for the difference in degrees of freedom, the quadratic model is preferred
USING FIT TO CHOOSE BETWEEN MODELS...
 Models with different dependent variables will typically be non-nested
 Here neither R-squared nor adjusted R-squared should be used for comparison
 Continuing the previous example, what if the comparison is between
$\log(rdintens) = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
$rdintens = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$
 Now not possible to compare fit
 Comparing how well variation in $\log(rdintens)$ is explained versus how well variation in $rdintens$ is explained
 Extent of variation in these two could be very different
QUALITATIVE INFORMATION
 Thus far variables have been quantitative – number of
bedrooms, years of education, hourly wage, …
 Many features likely to appear in analyses are
qualitative
 Gender of individual, their occupation, whether they are
employed or not, …
 Industry classification of firm, its credit rating, whether or not it
paid a dividend last quarter, …
 One way to incorporate qualitative information is to use
dummy (binary, indicator) variables
 Equals 1 or 0 representing presence or absence of feature
 May appear as dependent or as independent variables
DUMMY EXPLANATORY VARIABLE
 Single dummy independent variable
$wage = \beta_0 + \delta_0\, female + u$
$female = 1$ if the person is a woman & $female = 0$ otherwise
 Choice of which group is coded as 1 is arbitrary
 Using zero/one is also arbitrary but useful for interpretation
 In our example being a woman is chosen as the category for which the dummy variable equals 1. By using the binary variable female, we have chosen males to be the base/benchmark group.
DUMMY VARIABLE
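$wage = \beta_0 + \delta_0\, female + u$
 Specified model is regression representation of conditional means:
$E(wage \mid female = 0) = \beta_0$
$E(wage \mid female = 1) = \beta_0 + \delta_0$
$E(wage \mid female = 1) - E(wage \mid female = 0) = \delta_0$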
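To see that a regression on a single dummy simply reproduces group means, here is a small sketch using made-up wage data (the eight observations are hypothetical, chosen only so the arithmetic is easy to follow). It applies the textbook simple-regression formulas and compares the result with the two sample means.

```python
# Hypothetical hourly wages, made up for illustration only.
wage   = [12.0, 15.0, 9.0, 11.0, 14.0, 8.0, 10.0, 13.0]
female = [0,    0,    1,   1,    0,    1,   1,    0]

n = len(wage)
mean_w = sum(wage) / n
mean_f = sum(female) / n

# Simple-regression OLS formulas: slope = cov(x, y) / var(x).
cov_fw = sum((f - mean_f) * (w - mean_w) for f, w in zip(female, wage))
var_f = sum((f - mean_f) ** 2 for f in female)
delta0_hat = cov_fw / var_f                 # coefficient on the dummy
beta0_hat = mean_w - delta0_hat * mean_f    # intercept

# Group means computed directly.
mean_men = sum(w for f, w in zip(female, wage) if f == 0) / female.count(0)
mean_women = sum(w for f, w in zip(female, wage) if f == 1) / female.count(1)

print("intercept      :", beta0_hat, " mean for men:", mean_men)
print("dummy coef     :", delta0_hat, " women minus men:", mean_women - mean_men)
```

The intercept equals the mean for the base group (men) and the dummy coefficient equals the difference in means, as the conditional-mean expressions state.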
DUMMY EXPLANATORY VARIABLE
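 Have relied on the zero conditional mean (ZCM) assumption
 To better estimate the gender effect, need to control for other factors
$wage = \beta_0 + \delta_0\, female + \beta_1 educ + u$
$E(wage \mid female = 1, educ) - E(wage \mid female = 0, educ) = \delta_0$
 $\delta_0$ represents the difference in mean wage between women & men with the same education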
DUMMY EXPLANATORY VARIABLE
 Implication of this particular model
 Difference does not depend on level of education
 Data determine this difference
 Graphically, model specifies an intercept shift according to gender
DUMMY VARIABLE TRAP
 What happened to the male dummy?
 Why can’t we estimate
$wage = \beta_0 + \delta_0\, female + \gamma_0\, male + \beta_1 educ + u$ ?
 Answer: There is a perfect multicollinearity problem (MLR.3 not satisfied)
$female + male = 1$
 Male & female dummy variables are perfectly collinear with the intercept
 An example of the dummy variable trap
 More later when we talk about multiple groups
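The perfect collinearity can be made concrete with a tiny design matrix. The sketch below uses a hypothetical sample of five people and only the intercept and the two gender dummies (education is left out to keep the matrix 3x3). It shows that $X'X$ has determinant zero when both dummies are included, so the OLS normal equations have no unique solution.

```python
# Hypothetical sample of five people; female + male = 1 for everyone.
female = [1, 0, 1, 1, 0]
male = [1 - f for f in female]
const = [1] * len(female)

X = list(zip(const, female, male))          # rows of the design matrix

# X'X for the three columns (intercept, female, male).
xtx = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print("X'X =", xtx)
print("det(X'X) with both dummies =", det3(xtx))

# Dropping the male dummy leaves a 2x2 matrix that is invertible.
xtx_ok = [[xtx[i][j] for j in range(2)] for i in range(2)]
det_ok = xtx_ok[0][0] * xtx_ok[1][1] - xtx_ok[0][1] * xtx_ok[1][0]
print("det(X'X) after dropping male =", det_ok)
```

A zero determinant means $X'X$ cannot be inverted; once one dummy is dropped the determinant is nonzero and the estimates are uniquely determined.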
DUMMY VARIABLE TRAP
 Solution was to drop the male dummy
$wage = \beta_0 + \delta_0\, female + u$
 Males chosen as base group
 Alternatively could make females the base
$wage = \alpha_0 + \gamma_0\, male + u$
 Choice of base is arbitrary, as estimates for one specification can always be recovered from the other: $\alpha_0 = \beta_0 + \delta_0$; $\alpha_0 + \gamma_0 = \beta_0$
 These are just alternative reparameterizations of the same model
 Can also drop the intercept, although this is not advisable
$wage = \beta_m\, male + \beta_f\, female + u$
EXAMPLE 7.1
 Incorporating gender into our wage equation
$\widehat{wage} = \underset{(.72)}{-1.57} - \underset{(.26)}{1.81}\, female + \underset{(.049)}{.572}\, educ + \underset{(.012)}{.025}\, exper + \underset{(.021)}{.141}\, tenure$
$n = 526,\ R^2 = .364$
 Holding education, experience & tenure fixed, women earn $1.81 less per hour compared to men
 Does that imply wage discrimination against women?
 Depends on how good the controls are
 Being female may be correlated with other productivity characteristics not controlled for
EXAMPLE 7.1
 Comparing means
$\widehat{wage} = \underset{(.35)}{7.10} - \underset{(.30)}{2.51}\, female$
$n = 526,\ R^2 = .116$
 $7.10 is estimated mean hourly wage for men (base group)
 Women earn $2.51 less per hour (not controlling for anything)
 Is this difference in mean wages significant?
 Have 2 estimates of the gender effect (previously $1.81)
 Some but not all of the difference in male & female wages is
explained by differences in education, experience & tenure
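Two small calculations follow from the difference-in-means regression on this slide. The sketch below uses only the reported estimates (7.10, −2.51 and the standard error .30); the variable names are mine, and the comparison value 1.96 is the usual large-sample two-sided 5% critical value, stated here as an assumption. It computes the t-statistic for the gender difference and re-expresses the same fit with women as the base group.

```python
# Estimates reported on the slide: wage-hat = 7.10 - 2.51 * female
beta0_hat, delta0_hat = 7.10, -2.51
se_delta0 = 0.30

# t statistic for H0: no difference in mean wages between women and men.
t_stat = delta0_hat / se_delta0
print(f"t = {t_stat:.2f}  (compare |t| with about 1.96)")

# Same fit with women as the base group: wage-hat = alpha0 + gamma0 * male
alpha0_hat = beta0_hat + delta0_hat     # estimated mean wage of women
gamma0_hat = -delta0_hat                # men relative to women
print(f"wage-hat = {alpha0_hat:.2f} + {gamma0_hat:.2f} * male")
```

The t-statistic is far beyond 1.96 in absolute value, so the raw difference is statistically significant, and switching the base group changes the labels but not the fitted means.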
EXAMPLE 7.1
 What if the dependent variable is in logs?
$\widehat{\log(wage)} = \underset{(.10)}{.50} - \underset{(.04)}{.30}\, female + \underset{(.007)}{.087}\, educ + \underset{(.0016)}{.0046}\, exper + \underset{(.003)}{.017}\, tenure$
$n = 526,\ R^2 = .392$
 Effect of gender? (Recall previous lecture)
 As the dummy changes from 0 to 1 (males to females), the change in wage is approximately $100 \cdot (-.30) = -30$ percent
 Large change so approximation may be poor
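How poor the approximation is can be checked directly. In a log-level model the exact percentage change when a dummy switches from 0 to 1 is $100 \cdot (e^{\hat\delta_0} - 1)$. The sketch below applies this standard formula to the coefficient of −.30 reported above; the variable names are my own.

```python
import math

coef_female = -0.30   # coefficient on female in the log(wage) equation

approx_pct = 100 * coef_female                    # log approximation
exact_pct = 100 * (math.exp(coef_female) - 1)     # exact percentage change

print(f"approximate: {approx_pct:.1f}%")
print(f"exact      : {exact_pct:.1f}%")
```

The exact figure is roughly 26 percent lower rather than 30, a gap of about four percentage points.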
PROGRAM EVALUATION
 Useful & important application of a dummy explanatory
variable occurs in policy analysis
 Governments or firms are interested in costs & benefits of
alternative policies
 Program evaluation involves measuring effect of a specific
program (or treatment or intervention)
 E.g. a training program that potentially improves worker
employability
 To evaluate a program consider comparison between
 Control group that does not participate in the program
 Treatment group that does participate
 Natural model is outcome (hours employed) depending on a treatment dummy (attended training) plus controls (education)
PROGRAM EVALUATION
 Experimental evaluation
 In randomized experiments assignment to treatment is random
 Here causal effects can be inferred using a simple differences
in means regression
 = 0 + 0 +
 Self-selection into treatment as a source for endogeneity
 When treatment status is not randomly assigned then likely
related to characteristics that also influence the outcome
 Subjects self-select themselves into treatment depending on
their individual characteristics & prospects
20
PROGRAM EVALUATION - EXAMPLE
 In Week 1 we wanted to assess the effectiveness of a training program on wages
 Option 1: Random treatment
 Option 2: People can decide whether to train or not
$\log(wage) = \beta_0 + \delta_0\, train + u$
Now we know:
1. Option 1: $E(u \mid train) = 0$ → $\delta_0$ is the causal effect
2. Option 2: $E(u \mid train) \neq 0$ → selection into treatment; estimates are biased
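The contrast between the two options can be illustrated with a small simulation. Everything in the sketch below is hypothetical: the true effect of 2, the "ability" variable standing in for unobserved characteristics in $u$, the outcome equation in levels, and the seed. It compares the difference in means (the OLS slope on a treatment dummy) under random assignment and under self-selection.

```python
import random

random.seed(1)            # arbitrary seed, for reproducibility
TRUE_EFFECT = 2.0         # hypothetical causal effect of training
N = 20_000

def diff_in_means(outcome, treated):
    """Equals the OLS slope from regressing outcome on a treatment dummy."""
    y1 = [y for y, d in zip(outcome, treated) if d]
    y0 = [y for y, d in zip(outcome, treated) if not d]
    return sum(y1) / len(y1) - sum(y0) / len(y0)

ability = [random.gauss(0, 1) for _ in range(N)]   # unobserved, part of u
noise = [random.gauss(0, 1) for _ in range(N)]

def outcome(treated):
    """Outcome depends on treatment, unobserved ability and pure noise."""
    return [10 + TRUE_EFFECT * d + 3 * a + e
            for d, a, e in zip(treated, ability, noise)]

# Option 1: a coin flip decides who is trained, so treatment is unrelated to ability.
random_treat = [random.random() < 0.5 for _ in range(N)]
est_random = diff_in_means(outcome(random_treat), random_treat)

# Option 2: people with above-average ability choose to train.
self_treat = [a > 0 for a in ability]
est_selected = diff_in_means(outcome(self_treat), self_treat)

print(f"true effect          : {TRUE_EFFECT}")
print(f"random assignment    : {est_random:.2f}")
print(f"self-selected sample : {est_selected:.2f}")
```

Under random assignment the estimate lands close to the true effect, while under self-selection it is far too large because the trained group also has higher ability.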
DUMMY VARIABLES: MULTIPLE GROUPS
 Use of dummy variables easily extends to the case of multiple groups
 Examples include occupation, industry, region, ..., where groups are mutually exclusive & exhaustive
 Define membership in each category by a dummy variable: Group 1, Group 2, ..., Group S are all dummies
 In the regression model, avoid the dummy variable trap by leaving out one category (which becomes the base category)
DUMMY VARIABLES: MULTIPLE GROUPS
 US divided into north central, south, west & east
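$wage = \beta_0 + \delta_0\, female + \beta_1 educ + \beta_2 exper + \beta_3 tenure + \beta_4 northcen + \beta_5 south + \beta_6 west + u$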
 East is what is called the reference group
 Essential to drop one of the groups
 Arbitrary which one is dropped
 Important to know which group is dropped for interpretation
 Once one group is dropped, it is easy to work out what the estimates would have been with a different reference group
DUMMY VARIABLES: MULTIPLE GROUPS
 Other things equal, hourly wages compared to the east are lower in northcen ($.62 less) & south ($.57 less) but higher in west ($.57 more)
 None of these individual differences is significant at the 5% level
 What about joint significance?

Dependent variable: wage

Explanatory variable   Estimate (se)
female                 -1.86 (.26)
educ                   .566 (.049)
exper                  .027 (.011)
tenure                 .139 (.021)
northcen               -.621 (.372)
south                  -.572 (.348)
west                   .571 (.413)
constant               -1.23 (.77)
n                      526
R-squared              .378
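The claim that none of the regional differences is individually significant can be verified from the table. The sketch below divides each regional estimate by its standard error; the numbers are copied from the table, while the comparison value 1.96 is an assumption (the usual large-sample two-sided 5% critical value).

```python
# (estimate, standard error) pairs copied from the regression table.
regions = {
    "northcen": (-0.621, 0.372),
    "south": (-0.572, 0.348),
    "west": (0.571, 0.413),
}
CRIT_5PCT = 1.96   # assumption: large-sample two-sided 5% critical value

t_stats = {name: est / se for name, (est, se) in regions.items()}
for name, t in t_stats.items():
    verdict = "significant" if abs(t) > CRIT_5PCT else "not significant"
    print(f"{name:9s} t = {t:6.2f}  ->  {verdict} at 5%")
```

All three t-statistics are below 1.96 in absolute value, which leaves open whether the three regional dummies matter jointly.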
DUMMY VARIABLES: MULTIPLE GROUPS
 Test joint null
 0:4 = 5 = 6 = 0;1:
 Need F statistic for joint test of linear restrictions
 See Slides #4
, = � −
�
= 4557.3 − 4453.03 /34453.03/(526 − 7 − 1) ≈ 4.04
 5% critical value is 2.6 sufficient evidence to reject 0
 Conclude that regional effects are jointly significant
 Other estimates do not change much between models
 Omission of regional effects does not seem to be source of
omitted variable bias 25
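The F-statistic for the joint test can be reproduced in a few lines. The sketch below uses the two sums of squared residuals, the sample size, and the critical value quoted on this slide; the variable names are mine.

```python
ssr_r = 4557.3      # restricted model (no regional dummies)
ssr_ur = 4453.03    # unrestricted model (with northcen, south, west)
n = 526             # observations
k = 7               # slope regressors in the unrestricted model
q = 3               # number of restrictions being tested

f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
crit = 2.6          # 5% critical value quoted on the slide

print(f"F({q}, {n - k - 1}) = {f_stat:.2f}")
print("reject H0" if f_stat > crit else "do not reject H0")
```

The statistic is above the critical value, so the three regional dummies are jointly significant even though each one is individually insignificant.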
DUMMY VARIABLES: MULTIPLE GROUPS
 With East as the reference group we got the following estimates
 Northcen → −.621
 South → −.572
 West → .571
 What if the reference group were South?
 Treat East in the previous regression as having a coefficient of zero, then subtract South's coefficient (−.572) from every group:
 East → +.572
 Northcen → −.621 + .572 = −.049
 West → .571 + .572 = 1.143
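The change of reference group is just a subtraction, which the following sketch carries out for all groups at once. The coefficients are those reported relative to East; the helper `rebase` is a hypothetical name introduced for illustration.

```python
# Coefficients relative to East, from the regression table; East itself is 0.
rel_east = {"east": 0.0, "northcen": -0.621, "south": -0.572, "west": 0.571}

def rebase(coefs, new_base):
    """Re-express group effects relative to a different reference group."""
    shift = coefs[new_base]
    return {g: round(c - shift, 3) for g, c in coefs.items() if g != new_base}

rel_south = rebase(rel_east, "south")
print(rel_south)
```

The result matches the three values worked out by hand on this slide. The same function gives the coefficients for any other choice of base, for example `rebase(rel_east, "west")`.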