Date: 2023-12-06
ECON2040 – COMPUTATIONAL ECONOMICS
Semester 1, 2023-24
Coursework 2
• This coursework consists of three questions and is worth 35% of the overall mark for ECON2040.
• The deadline for submission is 18:00 GMT on Wednesday 13 December 2023.
• Standard University policies and procedures will be followed for late submission, extensions,
and academic integrity. (See the Module Syllabus and Programme Handbook for details.)
• Submission is via Blackboard. Your answers should be composed of three parts: a report
(typeset using Word/LaTeX and saved in PDF format) containing your analysis, summary
statistics and output (e.g. tables and figures), AND CSV files containing your data output, AND
your Python and R scripts containing the code that you used to obtain your results.
– You should submit your report via TurnItIn on Blackboard in a file called ECON2040CW2_ID.pdf,
where ID is your student ID number, for example ECON2040CW2_12345678.pdf. In the
Assignments folder, click on Coursework 2 – Report Submission to submit your report.
– Please make sure that your answers are typeset, that the output is well-organised and
clear, and that tables and figures are appropriately labelled.
– You should not include the Python/R code used in your analysis in your report, but you
must submit separate Python and R scripts via Blackboard containing your code, called
ECON2040CW2_ID.py and ECON2040CW2_ID.R, where ID is your student ID number, for
example ECON2040CW2_12345678.R. In the Assignments folder, click on Coursework 2 –
Code Submission to submit your code.
– Please also upload your CSV files called ECON2040CW2_Sales_ID.csv and ECON2040CW2_WDI_ID.csv,
where ID is your student ID number, for example ECON2040CW2_Sales_12345678.csv. In
the Assignments folder, click on Coursework 2 – CSV Submission to upload your data.
– Your Python and R scripts should include comments throughout to explain what you are
doing and each section should be properly labelled.
• In answering the questions, please keep in mind the Grade Descriptors for Year 2 as posted on
Blackboard at the start of the module.
• It is the policy of the Department of Economics that coursework is anonymous; therefore, please
do not put your name on any part of your report.
1. The file Sales Data.zip contains 12 CSV files with information about the sales of a U.S. electronic
goods store chain over the 12 months of a year. Each line contains information about a
product sold by one of the stores in a given month. The files have the following columns:
• Order ID identifies a specific sales order. There may be several lines with the same Order
ID, which means that all those goods were part of the same order.
• Product gives the name of the product sold.
• Quantity Ordered provides information about how many items of this product were sold
in this order.
• Price Each lists the price at which each item was sold.
• Order Date gives the date of the sale.
• Purchase Address provides the address of the store where the sale took place.
Some of the rows may contain missing information or processing errors, so you may need to do
some data cleaning. Use Python’s Pandas library to analyse this data in order to answer the
following questions:
(a) Merge all the data into one Pandas dataframe. How many observations are there altogether?
(b) Calculate and plot the total value of sales by month. In what month were the total sales
the highest and the lowest? What were they equal to?
(c) In what city were the total annual sales the highest? Be careful, since in the U.S., there
may be several cities with the same name in different states.
(d) The company wants to order some online advertising, and wants to know at what time of
the day most of the sales usually happen. Plot the distribution of sales by hour.
(e) Which items are most likely to be sold together?
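The steps in (a) to (e) can be sketched in pandas roughly as follows. This is a minimal sketch under stated assumptions, not a model answer. The folder pattern `Sales Data/*.csv` is assumed. So is the idea that processing errors appear as blank rows, repeated header rows or unparseable values, and so is the address format `street, city, STATE zip`. Check all three against the actual files. The helper names and the derived `Sales` column are hypothetical.

```python
import glob
from collections import Counter
from itertools import combinations

import pandas as pd


def load_sales(pattern="Sales Data/*.csv"):
    """(a) Stack all monthly CSV files into one DataFrame.

    The folder/file pattern is an assumption about where the zip was extracted.
    """
    files = sorted(glob.glob(pattern))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


def clean_sales(df):
    """Drop blank and repeated-header rows, then fix column types.

    Assumes errors show up as all-empty rows, rows repeating the header,
    or values that cannot be parsed as numbers/dates.
    """
    df = df.dropna(how="all")
    df = df[df["Order ID"] != "Order ID"].copy()
    df["Quantity Ordered"] = pd.to_numeric(df["Quantity Ordered"], errors="coerce")
    df["Price Each"] = pd.to_numeric(df["Price Each"], errors="coerce")
    df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
    df = df.dropna(subset=["Quantity Ordered", "Price Each", "Order Date"])
    # Value of each line = quantity x unit price (hypothetical column name).
    return df.assign(Sales=df["Quantity Ordered"] * df["Price Each"])


def monthly_sales(df):
    """(b) Total sales value per calendar month (1-12)."""
    return df.groupby(df["Order Date"].dt.month)["Sales"].sum()


def city_state(address):
    """Label an address as 'City (ST)' so same-named cities stay distinct.

    Assumed address format: '<street>, <city>, <ST> <zip>'.
    """
    parts = [p.strip() for p in address.split(",")]
    return f"{parts[1]} ({parts[2].split()[0]})"


def sales_by_city(df):
    """(c) Annual sales per city-state pair, largest first."""
    city = df["Purchase Address"].map(city_state)
    return df.groupby(city)["Sales"].sum().sort_values(ascending=False)


def orders_by_hour(df):
    """(d) Number of distinct orders placed in each hour of the day (0-23)."""
    return df.groupby(df["Order Date"].dt.hour)["Order ID"].nunique()


def pairs_sold_together(df, top=10):
    """(e) Count how often each pair of products shares an Order ID."""
    counts = Counter()
    for _, products in df.groupby("Order ID")["Product"]:
        # Sorting makes ('A', 'B') and ('B', 'A') count as the same pair.
        counts.update(combinations(sorted(set(products)), 2))
    return counts.most_common(top)
```

For (a), `len(load_sales())` and `len(clean_sales(load_sales()))` give the row counts before and after cleaning. Which one to report depends on how you define an observation. For the plots in (b) and (d), `monthly_sales(df).plot(kind="bar")` and `orders_by_hour(df).plot()` are one option. Counting distinct orders per hour is a design choice here; summing `Sales` by hour is an equally defensible reading of "sales".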
2. The European Commission has asked you to perform a study on the relationships between
pollution and socio-economic indicators of Member States. Your task is to organise and
summarise selected data using the World Development Indicators. The following files (obtained
from https://databank.worldbank.org) contain, for each of the 27 EU Member States and for
the years 2012 and 2017:
• pm25.csv: PM2.5 air pollution, mean annual exposure (micrograms per cubic meter)
• gini.csv: Gini Index
• mortality.csv: Mortality rate (per 1,000 male adults)
• seats.csv: Proportion of seats held by women in national parliaments
• unemployment.csv: Unemployment rate
(a) Merge all data so as to obtain a panel of observations with country code, year and all the
key variables contained in the above datasets. Save this data frame in a CSV file called
ECON2040CW2_WDI_ID.csv, where ID is your student ID number.
(b) Produce one table summarizing key variables for the year 2012 and one table summarizing
the change in the values of variables (i.e., the difference between 2017 and 2012). Make sure
that your summary statistics contain at least the mean and the standard deviation. Export
these tables in Word format or as an image, with appropriate labelling.
(c) Produce a correlation table or a correlation plot of the key variables for year 2012. Provide
an interpretation of the associations between pollution (PM2.5) and the other indicators.
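Parts (a) to (c) amount to reshaping each file to long format, merging on country and year, and then summarising. A minimal pandas sketch follows. It assumes each file is a databank-style wide export with a `Country Code` column and one column per year whose label starts with the year (e.g. `2012 [YR2012]`). It also assumes that missing values are non-numeric placeholders such as `..`. The brief does not specify the layout, so check the headers first. The short indicator names and helper functions are hypothetical.

```python
from functools import reduce

import pandas as pd

# File names from the brief; the short indicator names are hypothetical.
FILES = {"pm25": "pm25.csv", "gini": "gini.csv", "mortality": "mortality.csv",
         "seats": "seats.csv", "unemployment": "unemployment.csv"}
KEYS = ["Country Code", "year"]


def load_frames(files=FILES):
    """Read each indicator CSV (paths assumed relative to the working dir)."""
    return {name: pd.read_csv(path) for name, path in files.items()}


def to_long(wide, name):
    """Reshape one indicator from wide (one column per year) to long.

    Assumes a 'Country Code' column plus year columns whose labels start
    with the year, e.g. '2012 [YR2012]'.
    """
    wide = wide.dropna(subset=["Country Code"])  # drop footer/notes rows
    year_cols = [c for c in wide.columns if str(c)[:4].isdigit()]
    long = wide.melt(id_vars="Country Code", value_vars=year_cols,
                     var_name="year", value_name=name)
    long["year"] = long["year"].str[:4].astype(int)
    # Non-numeric placeholders for missing data (e.g. "..") become NaN.
    long[name] = pd.to_numeric(long[name], errors="coerce")
    return long


def build_panel(frames):
    """(a) Outer-merge {name: wide DataFrame} into one country-year panel."""
    longs = [to_long(df, name) for name, df in frames.items()]
    panel = reduce(lambda a, b: a.merge(b, on=KEYS, how="outer"), longs)
    return panel.sort_values(KEYS).reset_index(drop=True)


def summaries(panel, stats=("mean", "std", "min", "max")):
    """(b) and (c): 2012 summary, 2017-2012 change summary, 2012 correlations."""
    cols = [c for c in panel.columns if c not in KEYS]
    by_year = panel.set_index(KEYS)[cols]
    y2012 = by_year.xs(2012, level="year")
    y2017 = by_year.xs(2017, level="year")
    change = y2017 - y2012  # aligned on Country Code
    return y2012.agg(list(stats)).T, change.agg(list(stats)).T, y2012.corr()
```

A typical run would be `panel = build_panel(load_frames())` followed by `panel.to_csv("ECON2040CW2_WDI_ID.csv", index=False)`, with ID replaced by your student number. The outer merge keeps country-years that are missing from some files. `DataFrame.corr` is Pearson by default and uses pairwise-complete observations, so some correlations may rest on fewer than 27 countries.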
3. Imagine you are the data analyst for a chain of popular hotels. You have been asked to prepare
a visualisation of recent booking trends. The file hotel_bookings.csv contains data about
hotel demand from real bookings that arrived between 1 July 2015 and 31 August 2017.
(a) Produce a graph of the booking lead time vs the price, as measured by the Average Daily
Rate (ADR), for all City Hotel bookings received in February 2017. Differentiate the points
on your plot by the number of adults on the booking. Remember to include a legend.
(b) Plot the average weekly price for each week in 2016. You should calculate one time series
for the City Hotel and another one for the Resort Hotel.
(c) Compare the distribution of the Resort Hotel’s ADR for August bookings across the three
years, using box plots.
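The data preparation behind (a) to (c) can be sketched in pandas as below. This is a sketch under assumptions, not the expected solution. The brief does not list the columns of hotel_bookings.csv. The names used here (`hotel`, `lead_time`, `adults`, `adr`, `arrival_date_year`, `arrival_date_month` with full month names, `arrival_date_week_number`, `arrival_date_day_of_month`) are assumed from the widely circulated version of this dataset. "Received in February 2017" can be read as the arrival date or as the booking date (arrival minus lead time), so the filter takes either. All helper names are hypothetical.

```python
import pandas as pd


def add_dates(df):
    """Add an arrival date and an implied booking date (arrival - lead time)."""
    out = df.copy()
    out["arrival_date"] = pd.to_datetime(
        out["arrival_date_year"].astype(str) + "-" + out["arrival_date_month"]
        + "-" + out["arrival_date_day_of_month"].astype(str),
        format="%Y-%B-%d")
    out["booking_date"] = out["arrival_date"] - pd.to_timedelta(out["lead_time"], unit="D")
    return out


def city_feb_2017(df, date_col="arrival_date"):
    """(a) City Hotel bookings dated February 2017 by the chosen date column."""
    d = df[date_col]
    mask = (df["hotel"] == "City Hotel") & (d.dt.year == 2017) & (d.dt.month == 2)
    return df.loc[mask, ["lead_time", "adr", "adults"]]


def weekly_adr_2016(df):
    """(b) Mean ADR per arrival week of 2016, one column per hotel."""
    y = df[df["arrival_date_year"] == 2016]
    return y.pivot_table(index="arrival_date_week_number", columns="hotel",
                         values="adr", aggfunc="mean")


def resort_august(df):
    """(c) Resort Hotel August bookings: year and ADR, ready for box plots."""
    mask = (df["hotel"] == "Resort Hotel") & (df["arrival_date_month"] == "August")
    return df.loc[mask, ["arrival_date_year", "adr"]]


def plot_lead_vs_adr(sub):
    """(a) Scatter of lead time vs ADR, one colour per number of adults."""
    import matplotlib.pyplot as plt  # imported here so the helpers above run without it
    fig, ax = plt.subplots()
    for adults, grp in sub.groupby("adults"):
        ax.scatter(grp["lead_time"], grp["adr"], s=10, label=str(adults))
    ax.set_xlabel("Lead time (days)")
    ax.set_ylabel("Average Daily Rate (ADR)")
    ax.legend(title="Adults on booking")
    return fig
```

After `df = add_dates(pd.read_csv("hotel_bookings.csv"))`, two plotting calls are one possible route. `weekly_adr_2016(df).plot()` draws the two weekly series. `resort_august(df).boxplot(column="adr", by="arrival_date_year")` draws one box per year. Using the dataset's own week-number column rather than ISO weeks is an assumption worth stating in the report.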