MTH783P Time Series Analysis for Business Spring 2021
The travel time data for this assessment have been extracted from the dataset Uber
Daily Travel Times 18 Cities 2016-2020 on Kaggle, and the weather data have been
extracted from a London-based weather station dataset on Meteostat.
This dataset contains the mean travel time (in seconds) of Uber rides from the
centermost area in London to all other areas, together with information on weather conditions
(temperature, precipitation and wind speeds) from a weather station in London. Data
are provided for almost 2 years, from 2nd January 2016 to 31st December 2017. The
description of the variables is available in Table 1.
Table 1: Description of the variables
Variable               Description
Date                   Date (yyyy-mm-dd)
MeanTravelTimeSeconds  Mean travel time (seconds)
tavg                   Average temperature (°C)
tmin                   Minimum temperature (°C)
tmax                   Maximum temperature (°C)
prcp                   Total precipitation (mm)
windspeed              Wind speed (0: low; 1: medium; 2: strong winds)
The dataset is available on QMplus as uberLondon.csv. Use R to analyse the dataset
and address the following tasks.
1. (5 points) Split the data into two datasets: a training dataset and a test dataset.
The aim will be to use the training dataset to forecast the mean travel time in
December 2017. Therefore, the training dataset should include the observations until
30th November 2017. The test dataset should include the 31 daily observations for
December 2017.
2. (25 points) Explore the training dataset: plot and produce summary statistics to
identify the key characteristics of the data and produce a report of your main findings.
The topics that you might choose to discuss include: possible issues with the data
collection, identification of possible outliers or mistakes in the data, role of missing
data (if any), distribution of the variables provided, relationships between variables.
3. Fit a statistical model to the training data and use it to forecast the mean travel
time every day in December 2017.
(a) (20 points) How did you decide which model to fit? Include details of other
models that you tried, if any.
(b) (10 points) What are the underlying assumptions of the model that you have
chosen? Carry out a residual analysis to ensure that the assumptions are satisfied.
(c) (10 points) Forecast the mean travel time every day in December 2017 and
discuss the results.
(d) (10 points) Discuss any weaknesses of this analysis.
4. (10 points) All tables and plots that you include in your report should be
reproducible. Therefore, include in your submission on QMplus a text file with the R
commands that can be used to reproduce your results, including tables and plots.
This text file should include all, and only, the lines of code used to produce the results
presented in the report, and it should be written in a clear and readable way.
5. (10 points) Marks will be given for the overall presentation of the coursework, the
quality of figures and writing.
All modelling and forecasting choices and assumptions must be justified.
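As an illustration of the date-based split described in task 1, the following is a minimal sketch in base R. It is not a model answer. The data frame here is a synthetic stand-in with placeholder values, built on the assumption of one observation per calendar day; the real uberLondon.csv may have gaps. The object names uber, cutoff, train and test are hypothetical.

```r
# Synthetic stand-in for uberLondon.csv: placeholder values, NOT real data.
# Assumes one row per calendar day between the dates given in the brief.
dates <- seq(as.Date("2016-01-02"), as.Date("2017-12-31"), by = "day")
set.seed(1)
uber <- data.frame(
  Date = dates,
  MeanTravelTimeSeconds = round(1800 + rnorm(length(dates), sd = 100))
)

# With the real file one would instead read it in, for example:
# uber <- read.csv("uberLondon.csv")
# uber$Date <- as.Date(uber$Date)  # Table 1 gives the format as yyyy-mm-dd

# Training data: everything up to and including 30th November 2017.
# Test data: everything after that date, i.e. December 2017.
cutoff <- as.Date("2017-11-30")
train <- uber[uber$Date <= cutoff, ]
test  <- uber[uber$Date >  cutoff, ]

# Sanity checks on the split
stopifnot(max(train$Date) == cutoff)
stopifnot(nrow(train) + nrow(test) == nrow(uber))
```

Splitting on the Date column rather than on row positions keeps the split correct even if some days are missing from the real data; in that case the number of rows in each part should be checked against the brief.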
Requirements for the coursework submission:
• The submission deadline is 15:00 on Friday 30th April.
• The submission should include a document in .pdf format containing the answers
to questions 2 and 3 (with a 3-page limit, including figures and discussions) and a
text file (with extension .txt) containing the R code used for the results presented
in the report. The minimum font size is 12.
• While discussing the coursework with your classmates is encouraged, the
submission must be your own independent work. Every submission will be checked
for plagiarism using an automated system. Please refer to the QMUL Academic
Regulations for more information about the definition of plagiarism and the
related penalties: https://qmplus.qmul.ac.uk/mod/book/view.php?id=1322479&
• The policy for late submissions of the School of Mathematical Sciences will be used.
You can read the policy here: https://qmplus.qmul.ac.uk/mod/book/view.php?