Date: 2022-03-18

UNSW ECON2209 Assessment
Problem Set 1: Solutions
2022
At the start of an R session for this course, remember to type library(fpp3) in the RStudio console. This
will load (most of) the R packages you will need, including some data sets.
Details:
• Total value: 10 marks.
• Submission is due on Friday of Week 3 (4 March), 5pm.
• A submission link is on the Moodle site under Assessments.
• Submit your answer document in PDF format. Your file name should follow this naming convention:
PS1_your first name_zID_your last name_ECON2209.pdf
For example: PS1_John_z1234567_Smith_ECON2209.pdf
• You get only one opportunity to submit, so make sure that you upload the file you intend to submit.
• Your submitted answers should include the R code that you used and any figures produced. Note that
in the bottom right quadrant of RStudio, under the Plots tab, there is an Export button. This can be
used to export figures for inserting into your answer document; e.g. select “Copy to Clipboard” and
paste into a Word document. (Other methods are also possible.) You do not need to use Word. You
could use e.g. R Markdown, in which case the figures would be automatically included in your PDF file.
• Problems are not all of equal value.
Problem 1 [3 marks]:
The USgas package contains data on the demand for natural gas in the US.
a. Install the USgas package: install.packages("USgas")
b. Create a tsibble from us_residential with date as the index and state as the key.
c. Plot (in one figure) the monthly residential natural gas consumption by state for Florida,
Texas, New York and Minnesota from the start of 2010.
Submit your code, figure and any observations on the plotted series. You are expected to comment on any
interesting aspects of each series and to compare them.
Solution 1
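library(USgas)
# Convert the data frame to a tsibble: one monthly series per state
us_tsibble_res <- us_residential %>%
  as_tsibble(index = date, key = state)
# Plot four states from January 2010 onwards, in billions of cubic feet
us_tsibble_res %>%
  filter(state %in% c("Florida", "Texas", "New York", "Minnesota")) %>%
  filter(year(date) >= 2010) %>%
  autoplot(y / 1e3) +
  labs(y = "billion cubic feet", x = "Month")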
[Figure: monthly residential natural gas consumption (billion cubic feet) for Florida, Minnesota, New York and Texas, 2010 onwards; x-axis: Month; legend: state]
This solution expresses gas consumption in billions of cubic feet (by using y/1e3). This is not required, but it
makes the figure a bit tidier.
Any legitimate comments on the series are acceptable. They might include:
• Pronounced seasonality, especially noticeable in New York, Texas and Minnesota, less so in Florida.
Demand goes up in the winter months, which makes sense if natural gas is being used for heating.
• Florida has the smallest consumption of natural gas for residential use, possibly because it is warmer
than the other states.
• Texas is warmer than Minnesota in winter, yet it has higher consumption. This could reflect that
natural gas is a more readily available option for household heating in Texas than in Minnesota, or
that Minnesota has cheaper sources of energy than natural gas for heating, such as hydroelectric power.
But perhaps it simply reflects that the population of Minnesota (5.75 million) is much smaller than
that of Texas (29.1 million). Texas has around five times the population of Minnesota, yet its gas
consumption is not five times as large, so adjusting for population shows that per capita consumption
is higher in Minnesota than in Texas.
• Similar comparisons can be made between the other states. For example, Florida has a population of
around 21.5 million, approximately 75% of the population of Texas, yet in the winter months its natural
gas consumption is much less than 75% of that of Texas. This is likely due to climate differences: the
weak seasonal pattern in Florida's series suggests that heating is not a big driver of natural gas
consumption there.
• Some data points seem to be missing near the end of the period covered by the data. (Notice how R
still makes us a nice plot even with missing data.) Given their timing, the missing observations may be
due to problems with data collection during the early period of the pandemic.
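The population arithmetic behind the Texas, Minnesota and Florida comparisons can be checked with a few lines of base R. This is a minimal sketch: the populations are the figures quoted above, while per_capita() and mn_higher_per_capita() are hypothetical helpers introduced only for illustration, and no consumption figures from the data are assumed.

```r
# Populations in millions, as quoted in the solution text
pop <- c(Texas = 29.1, Minnesota = 5.75, Florida = 21.5)

# Texas relative to Minnesota: roughly five times as many people
tx_to_mn <- pop[["Texas"]] / pop[["Minnesota"]]

# Florida relative to Texas: the "approximately 75%" in the text
fl_to_tx <- pop[["Florida"]] / pop[["Texas"]]

# Hypothetical helper: consumption per person = total consumption / population
per_capita <- function(total, population) total / population

# Hypothetical helper: TRUE when Minnesota uses more gas per person than Texas,
# given each state's total consumption in the same units
mn_higher_per_capita <- function(total_tx, total_mn) {
  per_capita(total_mn, pop[["Minnesota"]]) > per_capita(total_tx, pop[["Texas"]])
}

round(c(tx_to_mn = tx_to_mn, fl_to_tx = fl_to_tx), 2)  # 5.06 and 0.74
```

The comparison reduces to a single inequality: Minnesota's per capita use is higher whenever Texas's total is less than about 5.06 times Minnesota's, so mn_higher_per_capita(4, 1) is TRUE while mn_higher_per_capita(6, 1) is FALSE.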
Problem 2 [4 marks]:
For this problem you will explore features of a data series provided in us_employment: specifically,
the number of people employed in the “Leisure and Hospitality: Food Services and Drinking Places”
sector of the US economy.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(),
ACF().
• Can you spot any seasonality, cyclicity and trend?
• What do you learn about the series?
• What can you say about the seasonal patterns?
• Can you identify any unusual years?
Hint 1: Data are only available for this sector from January 1990, but for other sectors in the data set they
are available for a longer period. Hence there are a lot of “NA” (“Not Available”) values in the data set for
this sector. You probably do not want to plot these! You can use a filter command to drop observations before
January 1990, but a simple way to exclude the NA values is to use the command drop_na():
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na()
Hint 2: For the lag plots, you may want to consider plotting up to 12 lags using gg_lag(Employed, geom=
'point', lags=1:12). Explain why this may be of interest.
Solution 2
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  autoplot(Employed)
[Figure: time plot of Employed in Food Services and Drinking Places from 1990; x-axis: Month [1M]; y-axis: Employed]
There is a strong upward trend and clear seasonality. Some cyclic behaviour is also visible, including a big
drop due to the global financial crisis.
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  gg_season(Employed)
[Figure: seasonal plot of Employed, Jan to Dec, one line per year, coloured by year; y-axis: Employed]
Employment seems highest around June and lowest in January in this sector, perhaps because fewer people
go out to eat and drink following the Christmas-New Year period.
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  gg_subseries(Employed)
[Figure: subseries plot of Employed (series CEU7072200001), one panel per month from Jan to Dec, each with a time axis from 1990 to 2020; y-axis: Employed]
This plot confirms that, on average, employment in this sector peaks in the warmer months of June, July and
August and is lowest in January. The downward dip visible in every monthly panel shows that the global
financial crisis affected employment in all months.
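Each panel of the subseries plot summarises one calendar month across all years, and the monthly averages can also be computed directly. The base R sketch below does this with tapply() and cycle() on a synthetic monthly series (a hypothetical stand-in for the us_employment data, built with an upward trend and a seasonal peak in June) so that it runs without any packages.

```r
# Synthetic monthly series, Jan 1990 onwards: upward trend plus a cycle peaking in June
idx <- seq_len(240)
y <- ts(100 + 0.5 * idx + 10 * cos(2 * pi * (idx - 6) / 12),
        start = c(1990, 1), frequency = 12)

# Average of each calendar month across all years (cycle() gives 1 = Jan, ..., 12 = Dec)
monthly_means <- tapply(y, cycle(y), mean)
names(monthly_means) <- month.abb

peak_month <- names(which.max(monthly_means))  # "Jun"
low_month <- names(which.min(monthly_means))   # "Jan"
```

The same two steps apply to any monthly ts object; because each average pools all years, a trend shifts every month's mean but does not by itself change which month is highest.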
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  gg_lag(Employed, geom = "point", lags = 1:12)
[Figure: lag plots of Employed against lag(Employed, n) for lags 1 to 12, points coloured by month (season)]
The dots do not deviate much from the 45-degree line at any lag length, because the trend is so strong. But
the relationship is closest at lag 12, where the dots lie almost exactly on the 45-degree line. This is because
employment in January is likely to be most similar to employment 12 months earlier (i.e. in January of the
previous year), and likewise for the other months. This is why it is worth looking at lags up to 12: to check
whether this is true.
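The lag-12 argument can be checked numerically. Each gg_lag() panel plots y_t against y_{t-k}, so the base R sketch below computes that correlation for k = 1 to 12 on a synthetic monthly series with a linear trend and a 12-month cycle (an assumption standing in for the employment data). The helper lag_cor() is hypothetical, introduced only for illustration.

```r
# Synthetic monthly series: linear trend plus a 12-month seasonal cycle, no noise
n <- 240
idx <- seq_len(n)
y <- 0.5 * idx + 10 * sin(2 * pi * idx / 12)

# Hypothetical helper: correlation between y_t and y_{t-k}
lag_cor <- function(x, k) {
  m <- length(x)
  cor(x[(k + 1):m], x[1:(m - k)])
}

cors <- sapply(1:12, function(k) lag_cor(y, k))
round(cors, 3)
which.max(cors)  # 12: y_t - y_{t-12} is constant, so the points lie on a straight line
```

All twelve correlations are high because the trend dominates, mirroring the lag plots, but only at lag 12 does the seasonal pattern line up exactly, so that correlation is essentially 1.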
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  ACF(Employed) %>%
  autoplot()
[Figure: ACF of Employed for lags 1 to 24; x-axis: lag [1M]; y-axis: acf]
In this plot, as in the lag plots, the trend is so dominant that it is hard to see anything else. We need to
remove the trend before we can explore the other features of the data.
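To illustrate what removing the trend can reveal, the base R sketch below compares autocorrelations before and after first differencing, using acf() with plot = FALSE. The series is synthetic (a linear trend plus a 12-month cycle, standing in for the employment data), and differencing is only one simple way to remove a trend.

```r
# Synthetic monthly series: linear trend plus a 12-month cycle (241 observations)
n <- 241
idx <- seq_len(n)
y <- 0.5 * idx + 10 * sin(2 * pi * idx / 12)

# Autocorrelations at lags 1 to 24 (dropping lag 0, which is always 1)
acf_level <- acf(y, lag.max = 24, plot = FALSE)$acf[-1]

# First differences y_t - y_{t-1} remove the linear trend
dy <- diff(y)
acf_diff <- acf(dy, lag.max = 24, plot = FALSE)$acf[-1]

round(acf_level[c(1, 6, 12)], 2)
round(acf_diff[c(1, 6, 12)], 2)
```

In levels, every autocorrelation up to lag 24 is large and positive because of the trend. After differencing, the values alternate in sign, with a strong negative autocorrelation at lag 6 and a strong positive one at lag 12, exposing the 12-month seasonality.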
Problem 3 [3 marks]:
For the following series, find an appropriate Box-Cox transformation in order to stabilise the
variance, if required. Report your observations on each series and the transformations you tried.
• Cement from aus_production
• Business class passengers between Melbourne and Sydney from ansett
Solution 3
Australian cement production (2 marks)
aus_production %>%
  autoplot(Cement)
[Figure: time plot of quarterly Cement production from aus_production; x-axis: Quarter [1Q]; y-axis: Cement]
The variation in this series appears to change with its level, which suggests that a fairly strong
transformation may be appropriate. This can be seen when a strong transformation, such as the log, is applied.
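One way to make "variation that changes with the level" concrete is to compare the within-year spread of a series before and after taking logs. The base R sketch below uses a synthetic quarterly series (an assumption, not the Cement data) whose seasonal swings are proportional to its level; tapply() computes the standard deviation within each year.

```r
# Synthetic quarterly series whose seasonal swings scale with its level (multiplicative)
n_years <- 40
q <- seq_len(4 * n_years)
y <- exp(5 + 0.02 * q) * (1 + 0.2 * sin(2 * pi * q / 4))

# Year identifier for each quarter
year_id <- rep(seq_len(n_years), each = 4)

# Within-year standard deviation of the raw and the log-transformed series
sd_raw <- tapply(y, year_id, sd)
sd_log <- tapply(log(y), year_id, sd)

round(c(first = sd_raw[[1]], last = sd_raw[[n_years]]), 2)
round(c(first = sd_log[[1]], last = sd_log[[n_years]]), 4)
```

The raw spread grows roughly twentyfold over the sample while the spread of the logs stays constant, which is the pattern that makes a log (or a Box-Cox transformation with λ near 0) a natural choice.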
aus_production %>%
  autoplot(log(Cement))
[Figure: time plot of log(Cement); x-axis: Quarter [1Q]]
Guerrero’s method suggests that a value of around λ = −0.161 is appropriate. This is a strong transformation:
λ = 0 corresponds to taking logs, and a λ slightly below 0 is slightly stronger still.
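For reference, the Box-Cox transformation with parameter λ is log(y) when λ = 0 and (y^λ − 1)/λ otherwise. The base R sketch below uses a hypothetical helper, box_cox_manual(), that implements this textbook definition for positive data; it is meant to show why λ = −0.161 behaves much like a log, not to replace fabletools' box_cox(). The grid 500 to 2500 is an assumption, chosen to span roughly the range shown on the Cement time plot.

```r
# Hypothetical helper: textbook Box-Cox transformation for positive y
box_cox_manual <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

# Values spanning roughly the range of quarterly cement production
cement_grid <- seq(500, 2500, by = 10)

# As lambda approaches 0, the transformation approaches the log
near_zero <- box_cox_manual(cement_grid, 1e-8)
max(abs(near_zero - log(cement_grid)))

# With lambda = -0.161 the transformed values are almost a linear function of log(y)
cor(box_cox_manual(cement_grid, -0.161), log(cement_grid))
```

The near-perfect correlation with log(y) over this range is consistent with the log and λ = −0.161 transformations producing very similar-looking plots.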
aus_production %>%
  features(Cement, features = guerrero)
## # A tibble: 1 x 1
##   lambda_guerrero
##             <dbl>
## 1          -0.161
aus_production %>%
  autoplot(box_cox(Cement, -0.161))
[Figure: time plot of box_cox(Cement, −0.161); x-axis: Quarter [1Q]]
The transformed series looks very similar to the log-transformed one. Either transformation seems to do a good
job of stabilising the variation, so that it no longer changes with the level of the series.
Business class passengers between Melbourne and Sydney (1 mark)
ansett %>%
  filter(Airports == "MEL-SYD", Class == "Business") %>%
  autoplot(Passengers) +
  labs(title = "Business passengers", subtitle = "MEL-SYD")
[Figure: “Business passengers”, subtitle “MEL-SYD”: weekly Passengers; x-axis: Week [1W]]
The variation in this series does not appear to change proportionally with its level, so a Box-Cox
transformation does not seem necessary. Some periods may need further attention, such as the very large and
temporary increase in business passengers in the first half of 1992, but these are probably better handled
through modelling than through transformations.