UNSW ECON2209 Assessment
Project
2022
At the start of each R session for this course, remember to run library(fpp3) in the RStudio Console. This
loads most of the R packages you will need, including some data sets.
Details:
• Total value: 25 marks.
• Submission is due on Thursday of Week 9 (15 April), 5pm.
• A submission link is on the Moodle site under Assessments.
• Submit your answer document in PDF format. Your file name should follow this naming convention:
CP_your first name_zID_your last name_ECON2209.pdf
For example: CP_John_z1234567_Smith_ECON2209.pdf
• You get one opportunity to submit your file. Make sure that you submit the file that you intend to
submit.
• Your submitted answers should include the R code that you used.
• Format: No longer than 20 pages, including code, figures, tables and any appendices. Do not include
a separate title page. At least 11 point font should be used, with adequate margins for comments. Any
extra pages will not be marked.
• This project requires you to analyse time series data. The series will differ between students.
• The project is set out in four Parts, each with multiple subparts, mainly to guide your analysis
of your data series. It is strongly recommended that you follow the given sequence
in your analysis and in presenting your results.
• Unless approval for an extension is given on medical grounds (supported by a medical certificate
submitted through the Special Consideration process), there will be an immediate late penalty of 5%
from 5:01pm on 15 April, followed by additional penalties of 5% per calendar day or part thereof.
Submissions will not be accepted more than 5 days (120 hours) after the original deadline.
Marking Criteria: Marks are not awarded by Part, but by overall achievement against the following criteria:
(a) Suitability of methods. (10)
(b) Interpretation of the results, arguments used and conclusions drawn. (10)
(c) Presentation: Appropriate style of graphs, tables, reporting and clarity of writing. (5)
Maximum marks: 25
Note that criteria (b) and (c) together comprise 60% of the overall mark for the project.
Select the data series that you will analyse
Forecasting official statistics is very common in business and government. In this project you will
use data from the Australian Bureau of Statistics (ABS). Specifically, you will use data on Gross
Value Added for an industry from the latest release of the Australian National Accounts: ABS
Catalogue 5206.0. We will be interested in real Gross Value Added (GVA), or GVA expressed in
constant (2019-2020) dollar terms. Real GVA is then a quantity or output index, or a “volume”
index in the terminology used by the ABS. These data are available from Table 6 of the national
accounts.
We can download the Excel spreadsheet from the ABS website, or we can use an R package to
read in the data, as follows.
Install the package readabs:
install.packages("readabs")
We can use this package to read the data from the ABS website and create a tsibble, as follows:
library(readabs)
nadata <- read_abs("5206.0", tables = "6", check_local = FALSE) %>%
  mutate(Quarter = yearquarter(date)) %>%
  as_tsibble(
    index = Quarter,
    key = c(series_id)
  )
Keep only the volume series for the industries, dropping a few data series in the full data set that
we are not interested in modelling (e.g. the “Statistical Discrepancy”):
nadata_vol <- nadata %>%
  filter(series_type == "Original") %>%
  filter(!(`series_id` %in% c("A2323348A", "A2302356K",
                              "A2302358R", "A2302459A", "A2529213W")))
You must use the following method for selecting your data series.
Use the seven digits of your UNSW student ID to select the data series that you will analyse in this
project. The following example, for a student ID of z7654321, randomly selects the volume series
for a single industry:
set.seed(7654321)
myseries <- nadata_vol %>%
  filter(`series_id` == sample(nadata_vol$`series_id`, 1), year(Quarter) >= 1995)
On my computer the above code selects the Electricity industry within the “Electricity, Gas,
Water and Waste Services” sector (sector D, in the ABS classification). It is possible that your
student ID number will lead to the selection of the same series, but more likely that it will not. It
may result in the selection of a whole sector (e.g. sector D) rather than an individual industry
(e.g. Electricity). That is fine. (Note that while sample() takes a random sample, using the same
“seed” through set.seed() will result in the same series being selected each time.)
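The reproducibility point can be checked with a minimal base R sketch. The vector of IDs below is invented for illustration and is not ABS data; only set.seed() and sample() are taken from the text above.

```r
# Hypothetical series IDs, invented for illustration only (not ABS data)
ids <- c("ID_A", "ID_B", "ID_C", "ID_D", "ID_E")

# Draw once after setting the seed
set.seed(7654321)
first_draw <- sample(ids, 1)

# Reset the same seed and draw again
set.seed(7654321)
second_draw <- sample(ids, 1)

# The two draws are the same element because the seed was the same
identical(first_draw, second_draw)  # TRUE
```

The draw for a given seed can differ between R versions (the default sampling algorithm changed in R 3.6.0), which may be one reason the selected series differs from one computer to another.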
The ABS spreadsheet includes the official seasonally adjusted series. This can be extracted for
your industry as follows:
myseries_sa <- nadata %>%
  filter(series_type == "Seasonally Adjusted") %>%
  filter(series == myseries$series[1], year(Quarter) >= 1995)
Part 1: Data Exploration and Transformation
Plot your volume and seasonally adjusted volume data together using the following code:
myseries %>%
  autoplot(value) +
  autolayer(myseries_sa, .vars = value, colour = "red") +
  labs(y = "Volume Index",
       title = "GVA (black) and Seasonally Adjusted GVA (red)",
       subtitle = myseries$series[1])
a. Based on the plot, discuss characteristics of each series.
b. Now explore your original (non-seasonally adjusted) GVA series using the following functions,
being sure to discuss what you find:
gg_season(), gg_subseries(), gg_lag(), ACF() %>% autoplot()
c. What Box-Cox transformation, if any, would you select for your (non-seasonally adjusted)
data?
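As a rough illustration of how the functions in (b) and a Box-Cox choice in (c) are typically called, here is a sketch on a different series. It assumes the quarterly Beer series from the aus_production data set (loaded with fpp3), not your GVA series, and it uses the guerrero feature as one possible way of suggesting a Box-Cox parameter; the name lambda is just a label chosen here.

```r
library(fpp3)

# Assumption: quarterly beer production stands in for your GVA series
beer <- aus_production %>%
  filter(year(Quarter) >= 1995) %>%
  select(Quarter, Beer)

# Exploration plots named in part (b)
beer %>% gg_season(Beer)
beer %>% gg_subseries(Beer)
beer %>% gg_lag(Beer, geom = "point")
beer %>% ACF(Beer) %>% autoplot()

# One way to suggest a Box-Cox parameter: the Guerrero feature
lambda <- beer %>%
  features(Beer, features = guerrero) %>%
  pull(lambda_guerrero)

# Plot the series after applying the suggested transformation
beer %>% autoplot(box_cox(Beer, lambda))
```

A suggested value near 1 would indicate that little is gained from transforming, so the plots should still guide the final choice.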
Part 2: Time Series Decomposition
a. Consider the last ten years (i.e. 40 quarters) of your GVA data. Use an STL decomposition
and produce a standard decomposition plot showing the trend-cycle and seasonal components.
b. Then plot your seasonally adjusted data together with the official seasonally adjusted data
for the last ten years. What observations can you make about the respective series?
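The two steps in Part 2 might be sketched as follows. This again assumes the Beer series from aus_production as a stand-in, with default STL settings; the object names beer10, dcmp and cmp are chosen here for illustration. There is no official seasonally adjusted beer series, so in your project the comparison line would instead come from myseries_sa.

```r
library(fpp3)

# Assumption: the last 40 quarters of the beer series (2000 Q3 to 2010 Q2)
beer10 <- aus_production %>%
  filter(Quarter >= yearquarter("2000 Q3")) %>%
  select(Quarter, Beer)

# STL decomposition with default settings
dcmp <- beer10 %>% model(stl = STL(Beer))
cmp <- components(dcmp)

# Standard decomposition plot: data, trend-cycle, seasonal, remainder
cmp %>% autoplot()

# Original data (grey) with the STL seasonally adjusted series (blue)
cmp %>%
  as_tsibble() %>%
  autoplot(Beer, colour = "grey") +
  geom_line(aes(y = season_adjust), colour = "blue") +
  labs(y = "Megalitres",
       title = "Beer production (grey) and STL seasonally adjusted (blue)")
```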
Part 3: ETS Forecasting
For your (untransformed) data series:
a. Create a training dataset (myseries_train) consisting of observations before 2015. Check
that your data have been split appropriately by producing a plot of myseries_train and
myseries in one figure.
b. Fit an ETS model to your training data using the default ETS() command. Describe the
model chosen and comment on the residuals.
c. Produce forecasts for the test data, and plot these along with the data series from 2005.
Include and comment on the prediction intervals.
d. Compare the accuracy of the model on both the training data and the test data.
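The workflow in Part 3 could look roughly like the sketch below. It assumes the Beer series from aus_production and a split at 2005 (that data set ends in 2010, so the 2015 split used in the project is not possible here); beer_train, beer_test, fit and fc are names chosen for illustration.

```r
library(fpp3)

# Assumption: beer production stands in for your GVA series
beer <- aus_production %>%
  filter(year(Quarter) >= 1992) %>%
  select(Quarter, Beer)

# Training data before 2005, test data from 2005 onwards
beer_train <- beer %>% filter(year(Quarter) < 2005)
beer_test <- beer %>% filter(year(Quarter) >= 2005)

# Check the split: full series in grey, training data in black
autoplot(beer, Beer, colour = "grey") +
  autolayer(beer_train, Beer, colour = "black")

# Default ETS() chooses the model form automatically
fit <- beer_train %>% model(ets = ETS(Beer))
report(fit)
fit %>% gg_tsresiduals()

# Forecast over the test period and plot with 80% and 95% intervals
fc <- fit %>% forecast(h = nrow(beer_test))
fc %>% autoplot(beer %>% filter(year(Quarter) >= 2000), level = c(80, 95))

# Accuracy on the training data (first row) and test data (second row)
bind_rows(accuracy(fit), accuracy(fc, beer))
```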
Part 4: ARIMA Modelling
a. For your full original (non-seasonally adjusted) data series, use visual inspection of
plots to find the appropriate order of differencing, after transformation if necessary, to obtain
stationary data. Then use statistical tests to check your choices.
b. Select an appropriate ARIMA model. Explain your choice and report the results.
c. Using the training data set as before, try an STL decomposition (applied to the transformed
series if appropriate), followed by ARIMA on the seasonally adjusted data; that is, an
STL-ARIMA model. Using the test data set, compare the forecast performance with the
ETS model you obtained in Part 3, and plot forecasts from both models on the same figure,
along with the actual data from 2005 onwards. Include and comment on the prediction
intervals.
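A hedged sketch of the Part 4 steps follows, again on the Beer series from aus_production with a 2005 split rather than your GVA series. It assumes the urca package is installed (the unit root features rely on it), applies no transformation, and restricts the ARIMA model on the seasonally adjusted component to be non-seasonal via PDQ(0, 0, 0); all object names are chosen for illustration.

```r
library(fpp3)  # the unit root features below also need the urca package

# Assumption: beer production stands in for your GVA series
beer <- aus_production %>%
  filter(year(Quarter) >= 1992) %>%
  select(Quarter, Beer)
beer_train <- beer %>% filter(year(Quarter) < 2005)
beer_test <- beer %>% filter(year(Quarter) >= 2005)

# (a) Statistical checks on the order of differencing
beer %>% features(Beer, unitroot_nsdiffs)      # seasonal differences suggested
beer_sdiff <- beer %>% mutate(sdiff = difference(Beer, lag = 4))
beer_sdiff %>% features(sdiff, unitroot_ndiffs)  # further first differences
beer_sdiff %>% features(sdiff, unitroot_kpss)    # KPSS test on the differenced data

# (b) Automatic ARIMA selection on the full series
fit_arima <- beer %>% model(arima = ARIMA(Beer))
report(fit_arima)

# (c) ETS versus STL-ARIMA, both fitted to the training data
fits <- beer_train %>%
  model(
    ets = ETS(Beer),
    stl_arima = decomposition_model(
      STL(Beer),
      ARIMA(season_adjust ~ PDQ(0, 0, 0))
    )
  )

# Forecast the test period, plot both models with 95% intervals
fc <- fits %>% forecast(h = nrow(beer_test))
fc %>% autoplot(beer %>% filter(year(Quarter) >= 2000), level = 95)

# Test-set accuracy for each model
fc %>% accuracy(beer)
```

In decomposition_model(), the seasonal component is forecast with a seasonal naive method by default, so only the seasonally adjusted component is modelled by ARIMA here.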