ECON2300: INTRODUCTORY ECONOMETRICS
Coordinator: Professor Alicia N. Rambaldi
Research Project 2
Due: 4 pm on 24 May
This project is worth 15% of your final overall mark. Total possible points: 100.
Submission of your report
Your report must be single-spaced, in 12-point font. Answer each of the following questions using a
format similar to that of the solutions to the tutorial problem sets. When you are required to use R,
you must show your R commands and R output (screenshots or figures generated from R). You will lose
2 points whenever you fail to provide R commands and output. For each question, when you are asked
to discuss or interpret, your answer must be brief and compact. You will lose 2 points if your answer
is needlessly wordy. You must upload your assignment in PDF format via the link provided on the
course's Blackboard site. (Do not submit a hard copy.)
Research Tasks
Part I: Labour Force Participation of Women (20 Marks)
A colleague has estimated the following probit model (t-statistics in parentheses):
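$$\widehat{\Pr}(lfp = 1 \mid age, educ, kids, mtr) = \Phi(1.1923 - 0.0206\,age + 0.0838\,educ - 0.3139\,kids - 1.3939\,mtr)$$
t-statistics: age (−2.93), educ (3.61), kids (−2.54), mtr (−2.26); none is reported for the intercept.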
where,
• lfp woman’s labour force participation
• age woman’s age
• educ woman’s years of education
• kids binary, =1 if she has children; zero otherwise
• mtr woman’s marginal tax rate
(a) What is the probability that a 30-year-old woman with 15 years of education, no kids and an
income that places her in the marginal tax rate of 39 cents per dollar will participate in the
labour force? (4 marks)
(b) How does the probability of her participating in the labour force change if she had children? (4
marks)
(c) How does the probability of her participating in the labour force change from (a) if she had
children and her marginal tax rate is 30 cents per dollar? (4 marks)
(d) Using the results from (a), (b) and (c), what would you advise policy makers in relation to
increasing the labour force participation of women? (8 marks)
You can compute the probabilities for parts (a) to (c) using R or a statistical table. Please indicate
which you used, and provide R code or workings as relevant.
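As a minimal sketch of the calculation in (a) to (c), the predicted probability is the standard normal CDF Φ evaluated at the fitted index. The sketch is in Python for illustration only (the project itself requires R), the function names are hypothetical, and it assumes mtr is entered as a proportion (0.39 for 39 cents per dollar), which the model statement does not specify.

```python
import math

# Coefficients of the colleague's probit model (Part I); index = x'beta
COEFS = {"const": 1.1923, "age": -0.0206, "educ": 0.0838,
         "kids": -0.3139, "mtr": -1.3939}


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_lfp(age, educ, kids, mtr):
    """Predicted Pr(lfp = 1 | x) = Phi(x'beta).

    ASSUMPTION: mtr is a proportion, e.g. 0.39 for 39 cents per dollar.
    """
    index = (COEFS["const"] + COEFS["age"] * age + COEFS["educ"] * educ
             + COEFS["kids"] * kids + COEFS["mtr"] * mtr)
    return std_normal_cdf(index)


# Usage: the three scenarios in (a), (b) and (c)
p_a = prob_lfp(age=30, educ=15, kids=0, mtr=0.39)
p_b = prob_lfp(age=30, educ=15, kids=1, mtr=0.39)
p_c = prob_lfp(age=30, educ=15, kids=1, mtr=0.30)
print(f"(a) {p_a:.4f}; change in (b) {p_b - p_a:+.4f}; change in (c) {p_c - p_a:+.4f}")
```

In R, the same value would come from `pnorm()` applied to the index.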
Part II: Wage Equation (80 Marks)
The file project2dat.csv contains data from a longitudinal survey conducted by the US Department of
Labor. The sample contains 716 women who had completed their schooling and were employed when
interviewed. These individuals were interviewed in 1982, 1983, 1985, 1987 and 1988, so the file
contains 3580 (= 716 × 5) lines of data. The following variables are included:
1. id is the entity identifier (i.e. individuals), i = 1, . . . , 716
2. year is the time period, t = 1982, 1983, 1985, 1987, 1988
3. wage in US dollars per hour
4. educ years of education
5. black binary, = 1 if the individual is black; zero otherwise.
6. exper years of experience
7. tenure years with current employer
8. south binary, = 1 if residing in a southern state; zero otherwise.
9. union binary, = 1 if affiliated with a union; zero otherwise.
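Because the data form a balanced panel (each of the 716 women observed exactly once in each of the 5 years), a quick structural check before estimation can catch missing or duplicated person-years. A minimal Python sketch, assuming project2dat.csv has a header row whose column names include id and year; the helper names are hypothetical.

```python
import csv
from collections import Counter

EXPECTED_YEARS = frozenset({1982, 1983, 1985, 1987, 1988})


def load_rows(path):
    """Read the CSV into a list of dicts (ASSUMES a header row with id, year, ...)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def is_balanced(rows, years=EXPECTED_YEARS):
    """True if every id appears exactly once in each expected year."""
    counts = Counter()
    years_by_id = {}
    for row in rows:
        key = (row["id"], int(row["year"]))
        counts[key] += 1
        years_by_id.setdefault(row["id"], set()).add(int(row["year"]))
    no_duplicates = all(c == 1 for c in counts.values())
    return no_duplicates and all(ys == years for ys in years_by_id.values())
```

For example, `rows = load_rows("project2dat.csv")` followed by `is_balanced(rows)` and `len(rows) == 3580` would confirm the structure described above, provided the header assumption holds.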
A: The Wage Equation in a Cross-Sectional Setting (36 Marks)
A member of your team has run a number of models using the sample of 716 individuals for the year
1988 and produced the results in Table 1 and the additional output presented below.
(a) Does it seem reasonable to model the relationship between wage and exper as log-linear? Produce
two scatter plots comparing the linear and log-linear cases, and comment on the choice. (6 marks)
(b) Does it seem reasonable to model the relationship between wage and tenure as log-linear? Produce
two scatter plots comparing the linear and log-linear cases, and comment on the choice. (6 marks)
(c) Using all the evidence provided, do you find empirical support for a wage gap based on the
geographical location of the worker? (6 marks)
(d) Using all the evidence provided, do you find empirical support for a wage gap based on the ethnicity
of the worker? (6 marks)
(e) Is there evidence that the interaction between geography and ethnicity might be important? Use
all the evidence provided to make your case. (6 marks)
(f) Is there evidence that the interaction between experience and tenure is a significant determinant
of ln(wage)? (6 marks)
Table 1: Regressions for Year 1988, ln(Wage)
Dependent variable: ln(Wage) in all specifications

| | (1) Small Model | (2) No Interact | (3) w/ One Interact | (4) w/ Two Interact |
|---|---|---|---|---|
| (Intercept) | 0.538*** (0.106) | 0.589*** (0.106) | 0.404** (0.151) | 0.393** (0.152) |
| south | −0.187*** (0.031) | −0.133*** (0.033) | −0.139*** (0.033) | −0.085* (0.040) |
| educ | 0.083*** (0.007) | 0.079*** (0.007) | 0.078*** (0.007) | 0.077*** (0.007) |
| exper | 0.031*** (0.005) | 0.028*** (0.006) | 0.041*** (0.009) | 0.041*** (0.009) |
| tenure | | 0.001 (0.003) | 0.027 (0.015) | 0.028 (0.015) |
| black | | −0.129*** (0.034) | −0.130*** (0.033) | −0.009 (0.047) |
| union | | 0.130*** (0.034) | 0.130*** (0.034) | 0.131*** (0.034) |
| exten | | | −0.002 (0.001) | −0.002 (0.001) |
| b_s | | | | −0.202** (0.065) |
| R2 | 0.288 | 0.309 | 0.313 | 0.320 |
| Adj. R2 | 0.285 | 0.303 | 0.306 | 0.312 |
| Num. obs. | 716 | 716 | 716 | 716 |
| RMSE | 0.410 | 0.405 | 0.404 | 0.403 |

Standard errors in parentheses. *** p < 0.001; ** p < 0.01; * p < 0.05
Note: exten = exper × tenure; b_s = black × south
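In a log-linear model, a dummy coefficient b is a difference in ln(wage); the exact implied percentage wage difference is 100(e^b − 1). With an interaction, the relevant effect combines two coefficients: in Model (4) the black vs non-black gap is b_black + b_s × south, and the effect of one more year of experience is b_exper + b_exten × tenure. A minimal Python sketch using the rounded point estimates reported for Model (4), so results are approximate; it does not produce standard errors, which would need the full coefficient covariance matrix from R. Function names are hypothetical.

```python
import math

# Model (4) point estimates from Table 1, rounded to 3 d.p. as reported
B = {"south": -0.085, "black": -0.009, "b_s": -0.202,
     "exper": 0.041, "exten": -0.002}


def pct_effect(log_diff):
    """Percentage wage difference implied by a difference of log_diff in ln(wage)."""
    return 100.0 * (math.exp(log_diff) - 1.0)


def black_gap(south):
    """ln(wage) gap, black vs non-black, at south = 0 or 1: b_black + b_s*south."""
    return B["black"] + B["b_s"] * south


def south_gap(black):
    """ln(wage) gap, south vs non-south, at black = 0 or 1: b_south + b_s*black."""
    return B["south"] + B["b_s"] * black


def exper_effect(tenure):
    """Effect of one more year of exper on ln(wage): b_exper + b_exten*tenure."""
    return B["exper"] + B["exten"] * tenure


# Usage: the black vs non-black gap outside and inside the South
for s in (0, 1):
    gap = black_gap(s)
    print(f"south={s}: {gap:+.3f} log points, about {pct_effect(gap):+.1f}%")
```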
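Joint F-test on Model (4), H0: south = 0 and b_s = 0 (in the R output, Model 2 is Model (4), not column (2) of Table 1)
- Linear hypothesis test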
Hypothesis:
south = 0
b_s = 0
Model 1: restricted model
Model 2: lwage ~ south + educ + exper + tenure + black + union + exten + b_s
Res.Df Df F Pr(>F)
1 709
2 707 2 16.198 1.323e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Joint F-test on Model (4), H0: black = 0 and b_s = 0
- Linear hypothesis test
Hypothesis:
black = 0
b_s = 0
Model 1: restricted model
Model 2: lwage ~ south + educ + exper + tenure + black + union + exten + b_s
Res.Df Df F Pr(>F)
1 709
2 707 2 10.831 2.327e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
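Both tests have 2 numerator degrees of freedom and 707 residual degrees of freedom. For an F(2, d2) statistic the upper-tail probability has the closed form (1 + 2F/d2)^(−d2/2), so the reported p-values can be checked without statistical tables. A small Python sketch (the function name is hypothetical); it should reproduce the reported p-values up to rounding.

```python
def f_pvalue_two_num_df(f_stat, df_resid):
    """Upper-tail p-value P(F > f_stat) for F with (2, df_resid) degrees of freedom.

    With 2 numerator degrees of freedom the F survival function reduces to
    (1 + 2 * f / df_resid) ** (-df_resid / 2).
    """
    return (1.0 + 2.0 * f_stat / df_resid) ** (-df_resid / 2.0)


# Usage: the two joint tests on Model (4), each distributed F(2, 707) under H0
for label, f in (("south = 0, b_s = 0", 16.198), ("black = 0, b_s = 0", 10.831)):
    print(f"{label}: p = {f_pvalue_two_num_df(f, 707):.3e}")
```

The closed form applies only when the numerator has 2 degrees of freedom; for other cases, R's `pf()` gives the same tail probability in general.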
B: Wage Equation in a Panel Setting (44 Marks)
(a) Use the full panel to estimate the following versions of Model (4) from Table 1. You can present
the results in a table or, for each model, in a standard equation format (e.g. as in Part I above):
• Pooled OLS regression
• Fixed Effects (FE)
• Fixed Effects and Time Effects (FE and TE)
(10 marks)
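The Fixed Effects estimator is OLS on data from which each individual's time average has been removed (the "within" transformation); adding time effects also removes each year's cross-sectional average. A minimal Python sketch of both transformations on hypothetical toy inputs, assuming a balanced panel for the two-way version (the simple formula used there is exact only in that case). In practice the estimation in R would be done with a panel-data package rather than by hand; the helper names here are hypothetical.

```python
from collections import defaultdict


def _group_means(values, keys):
    """Mean of values within each key (e.g. each id or each year)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, k in zip(values, keys):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


def within_entity(values, ids):
    """One-way (FE) within transformation: x_it - xbar_i."""
    m = _group_means(values, ids)
    return [v - m[i] for v, i in zip(values, ids)]


def within_two_way(values, ids, years):
    """Two-way (FE + TE) transformation, BALANCED panel only: x_it - xbar_i - xbar_t + xbar."""
    mi = _group_means(values, ids)
    mt = _group_means(values, years)
    grand = sum(values) / len(values)
    return [v - mi[i] - mt[t] + grand for v, i, t in zip(values, ids, years)]


# Usage on hypothetical toy data: two individuals observed in two years
ids = [1, 1, 2, 2]
years = [1987, 1988, 1987, 1988]
x = [3.0, 5.0, 4.0, 8.0]
print(within_entity(x, ids), within_two_way(x, ids, years))
```

Any regressor that is constant within each individual becomes identically zero after `within_entity`, and any regressor that varies only with the year becomes zero after `within_two_way`; such variables cannot be estimated in the corresponding model.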
(b) What is the Fixed Effects model controlling for that the Pooled OLS regression does not? (5
marks)
(c) What is the Fixed Effects + Time Effects model controlling for that the Pooled OLS regression
does not? (5 marks)
(d) The output from the FE and FE + TE models does not provide estimates for some of the variables in
the original model. List these variables and explain why this is the case. (7 marks)
(e) Formally test whether the preferred specification is a “two-way” effects model. What is your
conclusion at the 5% significance level? (5 marks)
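One standard way to carry out this test is an F-test of the FE model (restricted: no time effects) against the FE + TE model (unrestricted), with one restriction per time dummy. A sketch of the statistic, assuming a balanced panel with N individuals, T periods and K slope coefficients estimated in the unrestricted model:

$$F = \frac{(SSE_R - SSE_U)/J}{SSE_U/(NT - N - (T - 1) - K)}, \qquad J = T - 1$$

Under H0 (all time effects equal zero), F follows an F(J, NT − N − (T − 1) − K) distribution. Here N = 716 and T = 5, so J = 4; H0 is rejected at the 5% level if the p-value is below 0.05.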
(f) Present and discuss the key differences between the findings of the models in Table 1 and those
you obtained using the full panel, with respect to the presence of a wage gap due to geographical and
ethnic factors. (12 marks)