程序代写案例-NALYSIS 1
时间:2022-04-13
1
DEPARTMENT OF ECONOMICS AND RELATED STUDIES
ECONOMIC DATA ANALYSIS 1
SUMMATIVE ASSIGNMENT – SPRING/SUMMER 2022
The submission deadline for the Economic Data Analysis assignment is Thursday 28 April
2022, by 2pm (UK time).
Introduction
The COVID-19 pandemic is one of the most relevant issues for many economies across the
world. The efforts of professions at the frontline have to be supported by people with excellent
data skills. As an economist, you have the tools and skills required to contribute to the
understanding of the COVID-19 pandemic.
This assignment uses data collected from the Epidemic intelligence (ECDC) weekly reports on
COVID-19 cases and fatalities for a sample of countries between December 2020 and February
2022 as well as country-level macroeconomic indicators from the World Bank.
The assignment has three parts. The objective of the first part is to understand and summarise
the data. The second part looks at the relationship between economic performance and the
cumulative number of COVID-19 cases and deaths as of February 2022. Finally, Part 3
evaluates the impact of government interventions on the number of weekly reported fatalities
due to COVID-19.
Before you start, please read the instructions on page 2 carefully.
2
Instructions
From the EDA VLE page in Summative Assignment – Instructions and Submission point:
Download the Excel file EDA22_summative_assignment_dataset.xlsx
Download the paper by Goodman-Bacon and Marcus (2020)
Access the pre-recorded video: “Introduction to the EDA summative assignment”. It
presents the assignment and offers technical help for questions 1.2(a), 1.4(a), 2.1(a).
Each part is composed of questions that involve computational tasks or/and analytical answers.
You are advised to use Excel (from Office 365) to complete the tasks. Answer all questions.
All the material from the EDA module remains available during the time of the assignment and
you are allowed to use these resources.
If you think that there is a typo or an inconsistency in the assignment, please use the dedicated
Discussion board “EDA Summative Assignment Forum” on the EDA VLE page. Please note
that questions related to Excel tasks or concepts seen during the lectures will not be answered.
The marks for each question are given in brackets: they show the total for each part, the total
for each question and the breakdown for sub-questions.
Respect the word limit indicated under each question. Markers will stop reading after the
limit. The word count does not include: charts or tables copied from Excel, Excel formulas,
mathematical expressions, captions and references to tables, figures or to the literature.
Provide your answers on a report that respects professional standards:
Your answers have to be clearly indicated, expressed in a clear and appropriate manner.
Use a standard font (Calibri, Arial or Times New Roman), size 12, font colour Black.
Your charts should have a title, axis labels and a legend if necessary.
Charts and tables that you copy from your Excel spreadsheet should be pasted as
“Pictures”. They should be legible and captioned.
If you write an Excel formula on your report, briefly explain how it works: which
function do you use, to which variables or values the cells refer to.
Your final report should be compiled in a single PDF document:
It is your responsibility to make sure that the PDF document is legible.
You do not have to submit your Excel file.
Your report will be marked anonymously. Do not include your name or student number. Only
include your exam number (starting with Y…).
This formally assessed assignment is marked as part of your Part I examinations and forms
100% of your final module mark for Economic Data Analysis (ECO00017C). Under no
circumstances should you submit a report that you have worked on with another student, this
is an individual assignment for you to complete on your own.
Submit your final report by Thursday 28 April, 2pm (UK time) on the EDA VLE page in
Summative Assignment – Instructions and Submission point.
3
Part 1 – Understanding and summarising the data [40]
The objective is to understand the variables in the dataset, produce summary statistics and
discuss potential limitations of the data in terms of accuracy and comparability.
Question 1.1 [15]
Open the Excel file EDA22_summative_assignment_dataset.xlsx
The “Definitions and sources” tab in your spreadsheet contains a description of all variables in
the dataset “ECDC Data February 2022”. Use this information as well as your knowledge in
data analysis to answer the following questions.
(a) How would you characterise the design (cross-section, panel, time-series…) of the
dataset “ECDC Data February 2022”? Is the source of the data experimental or
observational?
[3] Justify your answer. Max 75 words
(b) Explain what “GDP per capita, PPP (constant international 2017 US$) for the year
2020” means. Will you be able to compare GDP values across countries?
[3] Justify your answer. Max 75 words
(c) The Gini coefficient measures the extent of inequality in an economy. Is Canada’s
income distribution more or less unequal than China’s? Briefly describe an alternative
method to compare income distributions across countries.
[3] Justify your answer. Max 75 words
(d) Briefly describe how the World Bank classifies countries in different income
categories.
[3] Max 75 words
(e) The variables weekly_pop is not in the original dataset from the ECDC and has been
calculated. The formula used to get the value for the cell L2 is
=IF(F2="","",IF(E2="cases",F2/D2*100000,F2/D2*1000000))
Where F is the column for weekly_count, E the column for indicator and D the column
for population.
Explain how the formula works.
[3] Max 75 words
4
Question 1.2 [10]
To identify the number of observations and possible missing values, we will use Excel’s
PivotTable option.
(a) Make a table showing the number of weeks (year_week) for which cases and deaths
are reported (use count) with continent and country as the row variables and indicator
as the column variable. Do not select observations that have a blank value for
country_code (they identify total values for continents, e.g Africa (total)).
Paste your table into your report and briefly explain how you built it.
How many countries do you have in your database? How many observations
are available for reported COVID-19 cases in Mali? How many observations
are available for reported COVID-19 deaths in Italy?
[5] Max 75 words
(b) Make a table showing the number of observations available (to only count cells that are
not blank, use count numbers), the average, minimum, maximum and standard
deviation (use StdDev) for gdp_capita_20, with income_group as the row variable, for
week 2022-06 only. To avoid double counting, you also need to select only one
indicator, either cases or deaths. Make a similar table for the Gini coefficient (again
restricting the analysis to week 2022-06 and one indicator only).
Paste your two tables into your report and briefly explain how you built them.
Is there any missing data for GDP per capita and/or the Gini coefficient? If so,
for which country(ies)?
[5] Max 75 words
Question 1.3 [7]
To get an idea of what the distribution of COVID-19 cases and deaths looks like across
countries, we will use box and whisker plots.
(a) Using the Excel’s PivotTable option, make a table showing the average value of
cumulative_pop, with country as a row variable (excluding totals for continents),
indicator as a column variable, for week 2022-06 only.
For Algeria you should find the values 596 (cases) and 153 (deaths). Give an
interpretation for these two values.
[3] Max 75 words
(b) Copy your table and paste it on a new spreadsheet as “values”. Based on these values,
make two separate box and whisker plots that show the distribution of cumulative
COVID-19 cases and deaths across countries.
Paste your two charts into your report.
Comment on the overall distribution and the presence or not of outliers.
Compare the mean and the median. What does it say about the shape of the
overall distribution?
[4] Max 100 words
5
Question 1.4 [8]
We want to compare the weekly COVID-19 cases reported in France and the UK over the
period 2020-50 (December 2020) to 2022-06 (February 2022).
Using the Sort & Filter options (click on the arrow next to the variable name) follow these
steps: 1. In country, select only France and United Kingdom; 2. Sort year_week A to Z; 3.
Sort indicator A to Z; 4. Select only cases; 5. Sort country A to Z.
You should now only see the cases in chronological order (2020-50 to 2022-06) for France first
and then the UK.
(a) Make a chart that compares weekly COVID-19 cases in France and the UK over the
period 2020-50 to 2022-06.
Paste your chart into your report.
Briefly present your chart and explain your choices in terms of data
visualisation. You might want to refer to the work of Edward Tufte and Jonathan
Schwabish.
[4] Max 100 words
(b) Using the information presented in the “Definitions and sources” tab as well as your
understanding of common issues with data collection, measurement and revisions,
discuss whether we can confidently compare the impact of the COVID-19 pandemic in
France and in the UK over the period December 2020-February 2022.
[4] Max 150 words
6
Part 2 – A cross-country analysis of economic performance and COVID-19
reported cases and deaths [35]
We will now explore the relationship between national income, inequality and the cumulative
reported COVID-19 cases and deaths as of February 2022.
Question 2.1 [12]
To visualise the relationship between macroeconomic indicators of economic performance
(GDP per capita and Gini coefficient) and COVID-19 cases and deaths we will use a scatter
plot. We will also measure their correlation with the Pearson correlation coefficient.
Using Excel’s PivotTable option, make a table showing the average values of cumulative_pop,
gdp_capita_20 and gini, with country as a row variable (excluding totals for continents) and
indicator as a column variable, for the week 2022-06 only. Copy your table and paste it on a
new spreadsheet as “values”. Delete the values for the cells that indicates #DIV/0!, leave the
cells blank.
The values in the first three rows of your PivotTable should be similar to Figure 1.
Figure 1- First three rows of the PivotTable for question 2.1
(a) Using the data from your table, make a scatter plot showing on the horizontal axis the
GDP per capita and on the vertical axis the cumulative COVID-19 cases (per 100,000
pop. as of 2022-06). Add a linear trend line to your scatter plot. Repeat the same steps
to produce three additional scatter plots: GDP per capita v cumulative COVID-19
deaths (per 1,000,000 pop. as of 2022-06), Gini v cumulative COVID-19 cases (per
100,000 pop. as of 2022-06) and Gini v cumulative COVID-19 deaths (per 1,000,000
pop. as of 2022-06).
Paste your four charts into your report.
Comment on the relationship between macroeconomic indicators and COVID-
19 cases and fatalities.
[7] Max 100 words
(b) Calculate the Pearson correlation coefficient for 1) GDP per capita and cumulative
COVID-19 cases (per 100,000 pop. as of 2022-06); 2) GDP per capita and cumulative
COVID-19 deaths (per 1,000,000 pop. as of 2022-06); 3) Gini and cumulative COVID-
19 cases (per 100,000 pop. as of 2022-06); 4) Gini and cumulative COVID-19 deaths
(per 1,000,000 pop. as of 2022-06).
Present and interpret your results. Do the values of the correlation coefficients
confirm your graphical analysis of 2.1(a)?
[5] Max 100 words
7
Question 2.2 [23]
We want to compare the mean of reported cumulative COVID-19 cases and deaths across
income groups and test whether differences relative to high income countries are statistically
significant.
(a) Use Excel’s PivotTable option to make a table showing the average, standard deviation
(use stdev) and number of observations for cumulative_pop, with income_group as a
row variable and indicator as a column variable, for the week 2022-06 only.
Paste your table into your report.
Comment on any patterns or differences in the mean, distribution and number
of observations across income categories.
[5]Max 100 words
Copy your table and paste as “values” on a new spreadsheet. Use the values in this table to
create two tables similar to Table 1, one for cases and one for deaths. Table 1 shows the values
you should find for cases when comparing low and high-income countries.
Table 1 - Differences in means compared to high-income countries and 95% confidence interval for
COVID-19 cases
Cases Difference in means t* CI 95% (width)
Low income -15670.07 2.5706 7034.10
Lower middle-income
Upper middle-income
The column “Difference in means” is the difference between the average value for either low,
lower middle or upper middle-income categories and the high-income category.
The column “t*” is the critical value from the t-Student distribution. Use the Excel function
TINV, a probability of 0.05 and adopt a conservative approach to calculate the degrees of
freedom: take the smaller sample size between ଵ and ଶ, minus 1. Where ଵ and ଶ, are the
number of observations respectively for high-income countries and low/lower middle/upper
middle-income countries.
The column “CI 95% (width)” is the 95% confidence interval “width” of the difference in
means: the distance from one end of the interval to the mean. Use the following formula:
95% ℎ = ∗ × ඨ
ଵଶ
ଵ
+
ଶଶ
ଶ
Where ∗ is the critical t value you calculated, ଵ and ଶ represent the standard deviation of
your indicator (cases or deaths) respectively for high-income countries and low/lower
middle/upper middle-income countries.
8
(b) After you created two tables similar to Table 1, one for cases and one for deaths (see
instructions above)
Paste your two tables into your report.
Write and describe the Excel formulas you have used to calculate the values
presented in Table 1: difference in means, t* and CI95% for cases, low-income
countries versus high-income countries.
Interpret the values presented in Table 1: difference in means, t* and CI95% for
cases, low-income countries versus high-income countries.
[8] Max 100 words
(c) Plot two column charts, one for cases one for deaths, showing the differences in means
on the vertical axis and the three income categories (low, lower middle, upper middle-
income) on the horizontal axis. Add the confidence intervals from question 2.2(b) to
your charts.
Paste your two charts into your report.
Interpret your findings. Is there evidence that countries from low, lower middle
and upper middle-income countries have been more or less affected by the
COVID-19 pandemic than high-income countries?
[5] Max 100 words
(d) Based on your understanding of data collection in low and middle-income countries
(LMICs), discuss the potential limitations of your analysis in terms of accuracy and
comparability.
[5] Max 100 words
9
Part 3 – Measuring the impact of government intervention on the COVID-19 pandemic:
the case of the UK and Spain [25]
We now want to measure the impact of government intervention on the number of reported
deaths due to COVID-19. For this, we use additional information collected by the Oxford
University Coronavirus Government Response Tracker (OxCGRT).
The OxCGRT calculates a Stringency index, a composite measure of nine types of government
responses including school closures, restrictions on public gatherings, closure of public
transport, stay at home requirements, public information campaigns, restrictions on internal
movements and international travel control
They can identify over time and across countries whether these different policies are
implemented or relaxed by governments.
The Stringency index takes values from 0 to 100. A value of 0 represents a situation in which
no measures are in place and 100 a situation in which all respective policies are implemented
to the maximum stringency level.
Figure 2 uses data available from the OxCGRT dataset. It represents the changes over time,
from October 2021 to January 2022 of the Stringency index values for the UK (full line, in
orange) and Germany (dashed line, in blue).
You can see that the levels of stringency were pretty similar in Germany and the UK until
November 2021. At that time, Germany started increasing restrictions whilst the UK
maintained a lower level of stringency.
Figure 2 The Covid stringency index for the UK and Germany (19 October 2021 – 19 January 2022),
source: OxCGRT
10
Question 3.1 [5]
We want to evaluate the impact of an increase in restrictions in Germany observed at the end
of November (2021-48) by comparing the weekly reported COVID-19 deaths in the UK and
in Germany, 6 weeks before (2021-42) and 6 weeks after (2022-02).
(a) Explain why the increase in restrictions observed in Germany in November 2021 but
not in the UK, can be considered as a natural experiment. Define what would be the
control group and the treatment group in this natural experiment.
[2] Max 75 words
(b) Briefly explain how we can measure the impact of the policy with the differences-in-
differences method. What is the “common trend” or “parallel trend” assumption?
[3] Max 100 words
Question 3.2 [13]
We will now measure the impact of an increase in restrictions in Germany which took place in
2021-48 on reported COVID-19 deaths by comparing the weekly deaths (per 1,000,000
population) in Germany and in the UK, 6 weeks before the policy (2021-42) and 6 weeks after
(2022-02).
(a) Using a Pivot table or a similar method as for question 1.4(a), make a line chart that
compares the weekly COVID-19 deaths per 1,000,000 population in Germany and in
the UK during the period 2021-42 to 2022-02.
Paste your chart into your report.
Based on your chart, can you see whether the increase in restrictions in Germany
had an impact on reported COVID-19 deaths?
[5] Max 75 words
(b) Use the PivotTable option to make a table that shows the (average) weekly COVID-19
deaths (per 1,000,000 population) in Germany and the UK for the week 2021-42 and
the week 2022-02.
Paste your table into your report.
Briefly explain how you built it.
Without doing any calculations, does the data confirm your findings from
question 3.2(a)?
[5] Max 75 words
(c) Calculate the average treatment effect using the differences-in-differences method.
Explain how you calculated the average treatment effect.
Interpret your findings.
[3] Max 75 words
11
Question 3.3 [7]
The differences-in-differences method is widely used to evaluate economic policies. However,
the method relies on important assumptions that guarantee the internal and external validity of
the results.
The researchers Goodman-Bacon and Marcus wrote a short paper discussing the specific issues
with applying differences-in-differences to evaluate the effectiveness of COVID-19 policies.
Read the paper (especially pages 1-4) and answer the following question.
Based on the paper by Goodman-Bacon and Marcus and on your understanding of the
assumptions that guarantee internal and external validity of the differences-in-
differences method, discuss whether you are able to measure a causal effect of an
increase in restrictions in Germany on COVID-19 fatalities.
[7] Max 250 words
END OF THE ASSIGNMENT.