NOTE: The deadline for handing in this assignment is Friday, 26 March 2021, 12:00
noon (online via FASer). It is acceptable to copy/paste output from R. However, clarity
of presentation will be taken into account, and to get full marks you must comment
appropriately on your output. Each dataset is unique and identifiable; do not share it.
You MUST provide an R script that executes all relevant analyses. This script must be clear.
You may include comments within your script if relevant.
Please submit your answers as a single PDF file, including an appendix with your
script (in the same PDF). The R script is worth 20 marks.
Disclaimer: The dataset provided is computer simulated for the sole purpose of this assignment.
1. We want to estimate the determinants of wages. For this, a suitable candidate specification
is:
$$\text{Wages}_i = \alpha + \beta_0 \text{Experience}_i + \beta_1 \text{Gender}_i + \beta_2 \text{Age}_i + \beta_3 \text{Age}_i^2 + \varepsilon_i$$
where $\text{Wages}_i$ corresponds to the total wages (in pounds) of individual i, $\text{Experience}_i$
corresponds to the number of years of experience of individual i, $\text{Gender}_i$ corresponds
to a dummy that equals 1 if individual i is a woman and 0 if individual i is a man,
$\text{Age}_i$ corresponds to the age of individual i and $\text{Age}_i^2$ corresponds to the age squared
of individual i. $\varepsilon_i$ designates the disturbances. $\alpha$, $\beta_0$, $\beta_1$, $\beta_2$ and $\beta_3$ are parameters.
(a) [5 marks] Using your own words, explain what the disturbances $\varepsilon_i$ account for.
Illustrate your answer with an example of your choice.
(b) [5 marks] Give the condition(s) on the distribution of $\varepsilon_i$ such that the OLS
estimator is unbiased.
(c) [5 marks] Using your data, compute the mean, the minimum and the maximum
for the variables wages and age (rounding numbers to 2 decimal places
for the variable wages). Provide these in a well-presented table and report the
number of observations in your sample.
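As an illustration of the kind of table asked for in (c), here is a minimal base R sketch. The data frame `dat`, its column names `wages` and `age`, and the simulated values are all assumptions made for this example; the real dataset and its variable names may differ.

```r
# Hypothetical stand-in data: the assignment dataset and its column names
# are not shown here, so `wages` and `age` are assumed names.
set.seed(1)
dat <- data.frame(
  wages = round(runif(50, 10000, 60000), 2),
  age   = sample(18:65, 50, replace = TRUE)
)

# Mean, minimum and maximum of one variable, returned as a named vector.
describe <- function(x) c(Mean = mean(x), Min = min(x), Max = max(x))

# One row per variable, one column per statistic, rounded to 2 decimal places.
summary_table <- round(t(sapply(dat[c("wages", "age")], describe)), 2)
print(summary_table)

# Number of observations in the sample.
n_obs <- nrow(dat)
cat("Observations:", n_obs, "\n")
```

With real data, only the step that creates `dat` would change (for example, reading the file you were given).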
(d) [10 marks] Produce and provide a scatter plot of the relationship between wages
(on the y-axis) and age (on the x-axis). Does the relationship between wages
and age seem linear? (For full marks, plots must be properly labelled)
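A labelled scatter plot of the kind described in (d) could be sketched in base R as follows. The vectors `age` and `wages` are simulated here purely for illustration (their shape is an assumption, not a property of the assignment data), and the smoothed line is an optional visual aid for judging linearity.

```r
# Hypothetical stand-in data; the curvature is built in for illustration only.
set.seed(1)
age   <- sample(18:65, 50, replace = TRUE)
wages <- 5000 + 1800 * age - 18 * age^2 + rnorm(50, sd = 3000)

# Scatter plot with a title and labelled axes (wages on y, age on x).
plot(age, wages,
     main = "Wages against age",
     xlab = "Age (years)",
     ylab = "Wages (pounds)",
     pch  = 19)

# Optional: a lowess smoother makes departures from a straight line easier to see.
smooth_fit <- lowess(age, wages)
lines(smooth_fit, lwd = 2)
```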
(e) [15 marks] Estimate the three following models by OLS and present your results
in one table containing the three models, where column 1 will correspond to the
variable names (one variable per row), column 2 will correspond to model 1,
column 3 will correspond to model 2 and column 4 will correspond to model
3. When presenting the models, for each row you will display the coefficient
related to the variable in column 1 and the standard errors below in parentheses
(rounding numbers to 2 decimal places). The last two rows of the table
should indicate the coefficient of determination $R^2$ for each model and the sample
size. The three models are:
• Model 1: $\text{Wages}_i = \alpha + \beta_0 \text{Experience}_i + \beta_1 \text{Gender}_i + \varepsilon_i$
• Model 2: $\text{Wages}_i = \alpha + \beta_0 \text{Experience}_i + \beta_1 \text{Gender}_i + \beta_2 \text{Age}_i + \varepsilon_i$
• Model 3: $\text{Wages}_i = \alpha + \beta_0 \text{Experience}_i + \beta_1 \text{Gender}_i + \beta_2 \text{Age}_i + \beta_3 \text{Age}_i^2 + \varepsilon_i$
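The table layout described in (e) can be assembled with base R alone. The sketch below is one possible approach, not a required method: the data frame `dat`, the column names `experience`, `gender`, `age` and `wages`, and the helper functions `fmt` and `column` are all hypothetical, and the data are simulated for illustration.

```r
# Hypothetical stand-in data; variable names and values are assumptions.
set.seed(1)
n <- 200
dat <- data.frame(
  experience = runif(n, 0, 40),
  gender     = rbinom(n, 1, 0.5),
  age        = sample(18:65, n, replace = TRUE)
)
dat$wages <- 8000 + 400 * dat$experience - 1500 * dat$gender +
  900 * dat$age - 9 * dat$age^2 + rnorm(n, sd = 4000)

# The three models, estimated by OLS. I(age^2) adds the squared age term.
m1 <- lm(wages ~ experience + gender, data = dat)
m2 <- lm(wages ~ experience + gender + age, data = dat)
m3 <- lm(wages ~ experience + gender + age + I(age^2), data = dat)
models <- list(m1, m2, m3)

# Model 3 contains every term, so its coefficient names define the rows.
terms_all <- names(coef(m3))
fmt <- function(x) formatC(x, format = "f", digits = 2)

# Build one table column: coefficient, then standard error in parentheses,
# blank cells for terms the model does not include, then R-squared and N.
column <- function(m) {
  est <- summary(m)$coefficients
  cells <- character(0)
  for (tm in terms_all) {
    if (tm %in% rownames(est)) {
      cells <- c(cells, fmt(est[tm, "Estimate"]),
                 paste0("(", fmt(est[tm, "Std. Error"]), ")"))
    } else {
      cells <- c(cells, "", "")
    }
  }
  c(cells, fmt(summary(m)$r.squared), as.character(nobs(m)))
}

tab <- sapply(models, column)
colnames(tab) <- c("Model 1", "Model 2", "Model 3")
results <- data.frame(
  Variable = c(as.vector(rbind(terms_all, "")), "R-squared", "Observations"),
  tab, check.names = FALSE, stringsAsFactors = FALSE
)
print(results, row.names = FALSE)
```

Building the columns by hand keeps the example free of extra packages; a dedicated table package would be a reasonable alternative if one is available to you.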
(f) [10 marks] Interpret the coefficient of determination $R^2$ for each of the three
models estimated in question (e). Which model best fits the data?
(g) [10 marks] Suppose that the age of a given individual increases by 5. Based
on model 3 estimated in question (e), by how much will the expected wages
(h) [10 marks] How do you interpret the coefficient related to Gender in model 3
estimated in question (e)? At which standard significance level (10%, 5% or
1%) is the coefficient significant?
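For the significance part of (h), the p-value of a coefficient can be read from the regression summary and compared with the standard levels. The sketch below assumes simulated data and hypothetical column names (`experience`, `gender`, `age`, `wages`); the result it produces says nothing about your own dataset.

```r
# Hypothetical stand-in data; variable names and values are assumptions.
set.seed(1)
n <- 200
dat <- data.frame(
  experience = runif(n, 0, 40),
  gender     = rbinom(n, 1, 0.5),
  age        = sample(18:65, n, replace = TRUE)
)
dat$wages <- 8000 + 400 * dat$experience - 1500 * dat$gender +
  900 * dat$age - 9 * dat$age^2 + rnorm(n, sd = 4000)

m3 <- lm(wages ~ experience + gender + age + I(age^2), data = dat)

# Row of the coefficient table for the gender dummy:
# estimate, standard error, t statistic and two-sided p-value.
gender_row <- summary(m3)$coefficients["gender", ]
p_value <- unname(gender_row["Pr(>|t|)"])

# Standard significance levels at which the coefficient is significant.
sig_levels <- c("1%" = 0.01, "5%" = 0.05, "10%" = 0.10)
significant_at <- names(sig_levels)[p_value < sig_levels]

print(gender_row)
print(significant_at)
```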
(i) [10 marks] Using model 3 estimated in question (e), predict the expected wages
for each observation in the data set. Produce and provide a scatter plot of the
relationship between predicted wages (on the y-axis) and actual wages (on the
x-axis). Do predicted wages increase with actual wages? (For full marks, plots
must be properly labelled)
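The prediction step in (i) could be sketched as follows. As before, the data are simulated and the column names are assumptions; `predict()` called without new data returns the fitted values for the estimation sample, and the dashed 45-degree line is an optional reference showing where predicted and actual wages would coincide.

```r
# Hypothetical stand-in data; variable names and values are assumptions.
set.seed(1)
n <- 200
dat <- data.frame(
  experience = runif(n, 0, 40),
  gender     = rbinom(n, 1, 0.5),
  age        = sample(18:65, n, replace = TRUE)
)
dat$wages <- 8000 + 400 * dat$experience - 1500 * dat$gender +
  900 * dat$age - 9 * dat$age^2 + rnorm(n, sd = 4000)

m3 <- lm(wages ~ experience + gender + age + I(age^2), data = dat)

# Predicted (fitted) wages for every observation in the sample.
dat$predicted <- predict(m3)

# Predicted wages on the y-axis, actual wages on the x-axis.
plot(dat$wages, dat$predicted,
     main = "Predicted against actual wages (model 3)",
     xlab = "Actual wages (pounds)",
     ylab = "Predicted wages (pounds)",
     pch  = 19)
abline(a = 0, b = 1, lty = 2)  # 45-degree reference line
```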