15. Regression Models for Prediction
Monash College Topic 15: Regression Models for Prediction MCD2080 200 / 250
Key Objectives for Topic 15
• Identify the components of a time series: seasonal, trend, cycle and
irregular.
• Estimate a multiple linear regression model when the data represents
observations over time.
• Interpret the coefficients of a multiple linear regression model when
the data represents observations over time.
• Use Excel to forecast in-sample and out-of-sample values of a
time-series and check the forecast accuracy.
• Interpret the R² and standard error of the regression model to assess
the fit of the model.
15.1. Regression When the Data is Recorded Over Time
So far we have mostly used regression to look at how one variable affects
another, e.g. how education affects income.
Another important use of regression is as a tool for prediction and
forecasting.
Forecasting is especially important for time series data, that is, data
indexed by time, e.g. production, unemployment, sales, inflation, interest
rates, prices, inventory, wages, etc.
An example of time series data is the annual revenue of the burger company
McDonald’s, which goes back to 1966.
Figure: Example of Time Series Data, McDonald’s Revenue
Figure: Line Graph of McDonald’s Revenue (x-axis: Year, 1966 to 2014; y-axis: Revenue ($ mil.), 0 to 30,000)
15.2. Components of a Time Series
Data that is recorded over time has some special features or components.
Let’s illustrate by looking at Australia’s unemployment rate.
Figure: Australia’s Unemployment Rate, Long Time Span (x-axis: Month and Year, Jan-1990 to Nov-2016; y-axis: Unemployment Rate (%), 2.0 to 14.0)
Figure: Australia’s Unemployment Rate, Short Time Span (x-axis: Month and Year, Jan-2012 to Oct-2016; y-axis: Unemployment Rate (%), 4.0 to 7.0)
The features to note in these figures are:
• Trend: A trend is a persistent, long term upward or downward
pattern of movement. The duration of a trend is usually several years.
The source of such a trend might be gradual and ongoing changes in
technology, population, wealth, etc.
A clear downward trend is evident in the unemployment rate from the
early 1990s to the late 2000s with a fairly flat trend thereafter.
• Cycle: A cycle is a pattern of up-and-down swings that tends to
repeat every 2-10 years. Cycles have periods of expansion leading to
peaks, then contractions leading to troughs, with the cycle then
repeating itself.
Cyclical fluctuations are evident that relate to the state of the
economy. Australia was in recession in the early 1990s and this led to
a rise in the unemployment rate. A much smaller, though still clear,
rise in the unemployment rate can be seen during the time of the
Global Financial Crisis around 2008-9.
• Seasonal: A seasonal pattern is a regular pattern of fluctuations that
occur within each year, and tend to repeat year after year.
In the short-time-span figure, the unemployment rate peaks in the
same month, February, in every year.
• Irregular: This component represents whatever is ‘left over’ after
identifying the other three systematic components. It represents the
random, unpredictable fluctuations in data. There is no pattern to the
irregular component.
In this unit we will discuss how to model the trend and seasonal
components. Modelling the cyclical component is a bit more complicated
and will be covered in later units.
15.3. Modelling the Trend
To illustrate the methods we are going to use quarterly sales revenue for
the retail chain Target from 2000Q1 to 2016Q3.
Figure: Quarterly Sales Revenue for Target (x-axis: Year and Quarter, 2000Q1 to 2016Q3; y-axis: Revenue ($ mil.), 0 to 25,000)
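The linear trend model is given by the equation:
\[ Y_t = \beta_0 + \beta_1 t + e_t \]
where t is the time index (t = 1 in 2000Q1, up to t = 67 in 2016Q3) and
\(e_t\) is the error term.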
Our regression results are shown below.
Figure: Regression Results, Target Revenue on a Time Trend
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.802386482
R Square             0.643824067
Adjusted R Square    0.638344437
Standard Error       2275.082382
Observations         67

ANOVA
             df   SS            MS            F            Significance F
Regression    1   608149350.2   608149350.2   117.494082   3.27349E-16
Residual     65   336439989.8   5175999.843
Total        66   944589339.9

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   9514.203528    562.1725556      16.92399146   1.64827E-25   8391.467406   10636.93965
Time        155.7872536    14.37222226      10.83946872   3.27349E-16   127.0839437   184.4905635
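The summary statistics in this output can be reproduced from the ANOVA table, which helps when interpreting R² and the standard error of the regression. The following is a minimal sketch in Python (the variable names are illustrative, not part of the Excel output):

```python
import math

# Values copied from the ANOVA table of the trend-only regression.
ss_regression = 608149350.2
ss_residual = 336439989.8
df_regression = 1
df_residual = 65

ss_total = ss_regression + ss_residual

# R-squared: share of the variation in revenue explained by the model.
r_squared = ss_regression / ss_total

# Standard error of the regression: square root of the residual mean square.
ms_residual = ss_residual / df_residual
standard_error = math.sqrt(ms_residual)

# F statistic: regression mean square divided by residual mean square.
f_stat = (ss_regression / df_regression) / ms_residual

# t statistic for the Time coefficient: estimate divided by its standard error.
t_time = 155.7872536 / 14.37222226

print(round(r_squared, 4), round(standard_error, 2), round(f_stat, 2), round(t_time, 2))
```

Read this way, an R² of about 0.64 says the time trend alone explains roughly 64% of the variation in revenue, and the standard error says the fitted values typically miss actual revenue by about $2,275 million.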
Figure: Quarterly Sales Revenue for Target with Regression Line (x-axis: Year and Quarter, 2000Q1 to 2016Q3; y-axis: Revenue ($ mil.), 0 to 25,000)
Fitted line: y = 155.79x + 9514.2, R² = 0.6438
Interpreting \(\hat{\beta}_0\) and \(\hat{\beta}_1\):
\(\hat{\beta}_0 = 9514\): This is the fitted value of Y when X = 0. Here X is
time, with t = 1 in 2000Q1, so t = 0 corresponds to 1999Q4. The model
therefore predicts a trend value for revenue of $9,514 million in 1999Q4.
\(\hat{\beta}_1 = 156\): This is the change in Y for a one unit change in X.
In this case, X increases by one unit each quarter, so the model estimates
the average growth in revenue to be $156 million per quarter.
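The mechanics of fitting a linear trend can be sketched outside Excel. The example below is a minimal illustration in Python using the standard least-squares formulas for a single regressor; the function name `fit_linear_trend` and the revenue figures are hypothetical, invented purely to show the calculation, and are not the Target data.

```python
def fit_linear_trend(y):
    """Least-squares fit of y_t = b0 + b1 * t with t = 1, 2, ..., len(y)."""
    n = len(y)
    t = list(range(1, n + 1))
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Slope: co-variation of t and y divided by the variation in t.
    s_ty = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    s_tt = sum((ti - t_mean) ** 2 for ti in t)
    b1 = s_ty / s_tt
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * t_mean
    return b0, b1


# Hypothetical quarterly revenue ($ mil.), invented for illustration only.
revenue = [102.0, 108.0, 111.0, 119.0, 121.0, 128.0, 133.0, 138.0]
b0, b1 = fit_linear_trend(revenue)

print(f"trend value at t = 0: {b0:.2f}")
print(f"average growth per quarter: {b1:.2f}")
```

As in the Target example, `b0` is the fitted trend value one period before the data start and `b1` is the estimated average change per quarter.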
15.4. Modelling Seasonal Patterns
The previous model is not really adequate. There are periods where the
predicted and actual values are quite different, and at least part of the
reason is seasonality.
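We augment our regression model with quarterly dummy variables. This
gives the model:
\[ Y_t = \beta_0 + \beta_1 t + \beta_2 Q1_t + \beta_3 Q2_t + \beta_4 Q3_t + e_t \]
Here, \(Q1_t\) is a dummy variable that takes the value 1 if the observation
falls in the first quarter of a year, and zero for the other quarters.
Likewise for \(Q2_t\) and \(Q3_t\). We do not include a dummy for Q4 as this is
our reference quarter: its effect is absorbed by the intercept, and
including all four dummies alongside the intercept would make the
regressors perfectly collinear.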
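The dummy variables can be generated directly from the time index. The sketch below, in Python, assumes t = 1 corresponds to 2000Q1 as in the Target data; the helper names `quarter_of` and `regressors` are illustrative.

```python
def quarter_of(t):
    """Quarter (1 to 4) for time index t, assuming t = 1 is 2000Q1."""
    return (t - 1) % 4 + 1


def regressors(t):
    """Return [t, Q1, Q2, Q3]; Q4 is the reference quarter, so it has no dummy."""
    q = quarter_of(t)
    return [t, int(q == 1), int(q == 2), int(q == 3)]


# First year of data, the last observation (2016Q3) and the first forecast period (2016Q4).
for t in (1, 2, 3, 4, 67, 68):
    print(t, regressors(t))
```

A fourth-quarter observation has all three dummies equal to zero, which is why its seasonal effect is carried by the intercept.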
Figure: Regression Results, Target Revenue on Time and Seasonal Dummy Variables
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.96123786
R Square             0.923978223
Adjusted R Square    0.919073593
Standard Error       1076.2042
Observations         67

ANOVA
             df   SS            MS            F             Significance F
Regression    4   872779980.2   218194995     188.3889473   6.03956E-34
Residual     62   71809359.76   1158215.48
Total        66   944589339.9

            Coefficients    Standard Error   t Stat         P-value       Lower 95%      Upper 95%
Intercept   13064.13587     354.8140174      36.81967236    7.56019E-44   12354.87275    13773.39898
Time        155.630195      6.803248223      22.87586604    6.35029E-32   142.0306956    169.2296944
Q1          -4800.814658    374.9200902      -12.80490105   4.5703E-19    -5550.269239   -4051.360076
Q2          -4599.738971    374.8583597      -12.27060529   3.07501E-18   -5349.070155   -3850.407787
Q3          -4569.310342    374.9200902      -12.18742463   4.15096E-18   -5318.764924   -3819.855761
We can interpret the coefficients as follows:
• \(\hat{\beta}_0 = 13064\): The estimated revenue for Target in period t = 0 (4th
quarter 1999) is $13,064 million.
• \(\hat{\beta}_1 = 156\): The estimated average growth in sales is $156 million
per quarter.
• \(\hat{\beta}_2 = -4801\): We estimate that sales in the 1st quarter each year are
typically $4,801 million below what they would be in the 4th quarter,
after adjusting for the trend.
• \(\hat{\beta}_3 = -4600\): We estimate that sales in the 2nd quarter each year are
typically $4,600 million below what they would be in the 4th quarter,
after adjusting for the trend.
• \(\hat{\beta}_4 = -4569\): We estimate that sales in the 3rd quarter each year are
typically $4,569 million below what they would be in the 4th quarter,
after adjusting for the trend.
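To judge how much the seasonal dummies improve the fit, the two regression outputs can be compared on adjusted R² and on the standard error of the regression. The sketch below, in Python, applies the standard adjusted R² formula to the R² values reported in the two outputs; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for n observations and k explanatory variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# R-squared values copied from the two regression outputs (67 observations each).
trend_only = adjusted_r_squared(0.643824067, n=67, k=1)
trend_seasonal = adjusted_r_squared(0.923978223, n=67, k=4)

# Ratio of the standard errors of the two regressions ($ mil.).
se_ratio = 1076.2042 / 2275.082382

print(round(trend_only, 4), round(trend_seasonal, 4), round(se_ratio, 2))
```

Adjusted R² rises from about 0.64 to about 0.92 and the typical forecast error is more than halved, which suggests the seasonal model fits the data considerably better even after allowing for the three extra variables.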
15.5. Using the Time Series Model to Forecast
Let us use the model of Target sales revenue to forecast revenue into the
future.
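The data ended in the 3rd quarter of 2016, with t = 67. So to forecast
revenue for the 4th quarter of 2016, we plug in t = 68 and
\(Q1_t = Q2_t = Q3_t = 0\), giving:
4th quarter of 2016: \(\hat{Y}_{68} = 13064.14 + 155.63(68) \approx 23{,}647\)
Predictions for the quarters that follow can be calculated in the same
way:
1st quarter of 2017: \(\hat{Y}_{69} = 13064.14 + 155.63(69) - 4800.81 \approx 19{,}002\)
2nd quarter of 2017: \(\hat{Y}_{70} = 13064.14 + 155.63(70) - 4599.74 \approx 19{,}359\)
3rd quarter of 2017: \(\hat{Y}_{71} = 13064.14 + 155.63(71) - 4569.31 \approx 19{,}545\)
These forecasts use the coefficients as reported in the regression output
rather than the rounded values quoted in the interpretation. Rounding the
slope from 155.63 to 156 would shift each forecast by about $25 million.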
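The same forecasts can be generated programmatically. The sketch below, in Python, uses the unrounded coefficients from the regression output and assumes t = 1 corresponds to 2000Q1; the names `forecast` and `SEASONAL` are illustrative.

```python
# Unrounded coefficients from the regression output (trend plus seasonal dummies).
INTERCEPT = 13064.13587
TIME = 155.630195
# Seasonal shift for each quarter; Q4 is the reference quarter, so its shift is zero.
SEASONAL = {1: -4800.814658, 2: -4599.738971, 3: -4569.310342, 4: 0.0}


def forecast(t):
    """Forecast revenue ($ mil.) for time index t, where t = 1 is 2000Q1."""
    quarter = (t - 1) % 4 + 1
    return INTERCEPT + TIME * t + SEASONAL[quarter]


# Out-of-sample forecasts for 2016Q4 (t = 68) through 2017Q3 (t = 71).
for t in range(68, 72):
    print(t, round(forecast(t)))
```

Calling `forecast` with t between 1 and 67 gives the in-sample fitted values, which can be compared with actual revenue to check forecast accuracy.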
Figure: Target Quarterly Revenue Forecast, Data
Figure: Target Quarterly Revenue Forecast, Graph (x-axis: Year and Quarter, axis ticks from 2000Q1 to 2017Q2; y-axis: Revenue ($ mil.), 0 to 25,000)
15.6. Other Aspects of Modelling Time Series
Outliers
Prior to modelling it is useful to plot the data to identify any ‘outliers’,
that is, values that occur because of a one-off event, e.g. the September
11th attacks, a legislative change, a large strike, or a political revolution.
It may be best to remove these data points from the modelling exercise as
they could distort your results.
Cycle
We have learned how to model trend and seasonal patterns, but not the
cycle; this will come later in your study of econometrics.
Meanwhile, we should recognise that our models produce forecasts that
ignore this component, and make some allowance for this when using them.
Standardising
We have talked about the importance of standardising data before drawing
interpretations or conclusions from it.
For example, it may have been more sensible (depending upon our precise
objective) to model the ‘real’ growth in Target revenue. This would
involve dividing by a price index such as the CPI.
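As a rough illustration of this kind of standardising, the sketch below converts nominal values to real values by dividing by a price index and rescaling to the index base. It is written in Python; the function name `real_value` and all figures are hypothetical, invented for illustration only.

```python
def real_value(nominal, price_index, base=100.0):
    """Deflate a nominal value by a price index whose base period equals `base`."""
    return nominal / price_index * base


# Hypothetical nominal revenue ($ mil.) and price index values, for illustration only.
nominal_revenue = [1000.0, 1100.0]
price_index = [100.0, 110.0]

real_revenue = [real_value(n, p) for n, p in zip(nominal_revenue, price_index)]
print(real_revenue)
```

In this made-up case nominal revenue grows by 10% but prices also rise by 10%, so real revenue is unchanged.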
Modelling Growth Rates vs. Levels
In practice it may be more sensible to model the growth rate of Y, that is
\(\frac{Y_t - Y_{t-1}}{Y_{t-1}}\), rather than the level \(Y_t\).
This will be discussed in later units.
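The growth rate calculation itself is straightforward. A minimal sketch in Python follows; the function name `growth_rates` and the levels are hypothetical, chosen only to illustrate the formula.

```python
def growth_rates(y):
    """Period-on-period growth rates (Y_t - Y_{t-1}) / Y_{t-1}."""
    return [(curr - prev) / prev for prev, curr in zip(y, y[1:])]


# Hypothetical levels, invented for illustration only.
levels = [200.0, 210.0, 199.5]
rates = growth_rates(levels)

print([f"{100 * r:.1f}%" for r in rates])
```

Note that a series of n levels gives only n - 1 growth rates, because the first observation has no previous value to compare with.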
Other Functional Forms
In our modelling of Target’s revenue we implicitly assumed that trend
revenue increased by a constant dollar amount each quarter. It may be
more sensible to suppose that revenue increases by a constant percentage
each quarter. This would imply a model like:
\[ \log(Y_t) = \beta_0 + \beta_1 t + \beta_2 Q1_t + \beta_3 Q2_t + \beta_4 Q3_t + e_t \]
where log is the natural logarithm. In this case it can be shown that
\(\beta_1 \approx \frac{\Delta Y / Y}{\Delta t}\), i.e. \(\beta_1\) is approximately the
proportional change in Y each period, so \(100\beta_1\) is approximately the
percentage change.
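The link between the slope in a log model and percentage growth can be checked numerically. The sketch below, in Python, builds a hypothetical series that grows by exactly 2% per period (no seasonal terms and no error, purely for illustration) and compares the approximate and exact percentage growth implied by the slope.

```python
import math

growth = 0.02  # hypothetical constant growth rate of 2% per period

# Hypothetical series with exact constant-percentage growth, for illustration only.
y = [1000.0 * (1 + growth) ** t for t in range(1, 9)]
log_y = [math.log(v) for v in y]

# With constant percentage growth, log(Y_t) rises by the same amount each period.
# That per-period step plays the role of beta_1 in the log model.
beta1 = log_y[1] - log_y[0]

approx_pct = 100 * beta1                  # approximate percentage growth per period
exact_pct = 100 * (math.exp(beta1) - 1)   # exact percentage growth implied by beta_1

print(round(beta1, 4), round(approx_pct, 2), round(exact_pct, 2))
```

For small growth rates the approximation is close (about 1.98% against an exact 2%); the gap widens as the growth rate gets larger.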