ECO220Y5Y: Introduction to Data Analysis and Applied
Data Project Two
Winter Term 2021
Is COVID-19 likely to increase educational inequality amongst less developed
countries? Does cross-country evidence indicate some countries
are at greater risk of an educational crisis?
0.1 Project Overview
Your goal is to describe the variation in COVID-19 related educational inequality and risks facing the
education sector across countries. You should comment on the articles which discuss the severity of
the situation for many countries and what COVID-19 might mean for educational vulnerabilities. You
will do this using the ‘projectdata Fall2020.xlsx’ file provided. Continuing from your previous analysis,
you will produce additional evidence using your knowledge of linear regression. You must now
include and discuss results using multivariate regression techniques, for example: output
tables, interpretation of coefficients, goodness of fit statistics, plots of residuals, etc. You should
consider carefully the best specification and provide evidence (formal testing or analysis of plots) to
support your model selection. You should consider creating indicator variables that can be used
to shift the intercept or as interaction terms. You should look for common violations of the OLS
assumptions in your regressions such as heteroskedasticity, serial correlation and non-normality. You
should conclude by using your model to assess whether COVID-19 has increased educational inequality
for developing countries. You should comment on the statistical significance of your variables of interest
and on what economic significance you can uncover relating COVID-19 to educational outcomes amongst
different countries.
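As an illustration of how an indicator variable can act on both the intercept and the slope, consider the following sketch. The symbols are assumptions chosen for illustration and do not refer to specific columns in the data file: y_i is an educational outcome for country i, x_i is a COVID-19 related measure, D_i equals 1 if country i is classified as developing and 0 otherwise, and u_i is an error term with zero conditional mean.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 D_i + \beta_3 (D_i \times x_i) + u_i \\
E[y_i \mid x_i, D_i = 0] &= \beta_0 + \beta_1 x_i \\
E[y_i \mid x_i, D_i = 1] &= (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\, x_i
\end{align*}
```

In this sketch, \beta_2 is the difference in intercepts between the two groups and \beta_3 is the difference in slopes. A t-test of \beta_3 = 0 asks whether the relationship between x and y differs for developing countries, and a joint F-test of \beta_2 = \beta_3 = 0 asks whether the two groups can share a single regression line.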
As with data project one, the data are a set of variables downloaded and combined from the
Federal Reserve Economic Data, the OECD data repository and UNICEF.
The file comprises cross-sectional data for various countries measured on similar dates (e.g., data on
the severity of COVID-19 and its response are from April 1, 2020). A brief description of the available
variables is given in the Excel file (further description is available on the corresponding websites).
Using suitable quantitative techniques from ECO220, describe some interesting characteristics of the
variables of interest to you. In interpreting, explaining and assessing the validity of your output, you
should read the articles provided. Try to pick out variables that might be related in some way to
the question and discuss these. You can also search out your own literature to guide your discussion
but be sure to include any other sources in a bibliography.
0.2 Project Submission
As with project 1, project 2 will not be marked based on length but rather on how well you address
the question. Your submission should not exceed 1200 words of text and 4 pages of graphs and tables.
If it is written in a clear and concise style, and you have a good handle on generating useful graphs,
this limit will be sufficient for a full mark. Write an assessment that is smart, not long. Highlight
the findings that are puzzling, practically useful, thought provoking or seem to be counter-intuitive.
Try to deliver a submission that is interesting and easy to follow, a short piece of statistical analysis
that you yourself would like to read.
This Data Project is worth 7.5% of your final mark. All statistical analysis should be done using
either Stata or R. The final report should be submitted as a single written document in .pdf format
and you must also include your DO file for Stata or SCRIPT file for R. The submission deadline is
Monday April 19th.
0.3 Software Help
Several videos on how to use econometrics software are available online. An additional help lecture
will be provided. Alternatively, there are some good handbooks available for Stata.