BUSINESS SCHOOL
QBUS6830
Financial Time Series and Forecasting
Practice questions for final exam
Q1
(a) If we can observe repeated finite values of a real data series, like a price series,
explain why and how it could possibly have an infinite mean, infinite variance and
infinite 4th moments.
It is not just possible, but quite usual, for a variable to have infinite variance but
still only ever yield finite values. A variance is a mathematical formula that involves
an integral (a weighted average) over the infinite number of values on the real line, and
effectively averages $(X - \mu)^2$. As an example, a Student-t distribution with 2 degrees of freedom has
infinite variance, yet will only generate finite values from its distribution. Variance
being infinite simply means that the tails of the distribution are fat, thick or long
enough that the weighted average of $(X - \mu)^2$, weighted by the probability
density function over the whole real line, is infinite. It implies we should expect to
see outliers in the data from this distribution.
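As a quick illustration (a sketch, not part of the original answer): draws from a t(2) distribution are always finite, yet the running sample variance never settles down.

```python
# A minimal sketch: finite draws from a Student-t(2), whose population
# variance is infinite, so the sample variance keeps jumping with new outliers.
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_t(df=2, size=1_000_000)
print(np.isfinite(x).all())                      # every draw is finite
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, np.var(x[:n]))                      # does not converge as n grows
```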
Another example is an asset price series. The price can never be infinite, but as the
series progresses over time, it does not have to have, and indeed usually does not
have, a long-run mean or any mean reversion; and if it follows a random walk it has no
long-run variance either, as the sketch below shows.
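A short simulation makes the random-walk point concrete (illustrative numbers only): every simulated price is finite, but the variance across paths at time t grows with t, so there is no long-run variance to converge to.

```python
# A minimal sketch: 5000 random-walk paths; the variance across paths at
# time t is roughly t, so it grows without bound rather than settling down.
import numpy as np

rng = np.random.default_rng(7)
paths = np.cumsum(rng.standard_normal((5000, 1000)), axis=1)
print(np.var(paths[:, 99]), np.var(paths[:, 999]))   # ~100 vs ~1000
```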
(b) What is the purpose of a factor model?
The purpose of a factor model is to find a small number of underlying components
in multiple series of data, so as to learn what might drive their variation. It applies
to situations where a number of variables have been observed at the same times,
over the same time period. The factor model tries to find a set of linear
combinations of these series that can explain most of the variation in those series.
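A minimal sketch of this idea, using principal components (one common way to estimate factors) on a simulated T x N returns matrix; all names and numbers here are illustrative assumptions, not from the question.

```python
# A minimal sketch of the factor idea via principal components, assuming
# R is a T x N matrix of returns on N assets observed over the same T days.
import numpy as np

rng = np.random.default_rng(3)
T, N = 500, 10
common = rng.standard_normal((T, 1))                  # one driving factor
R = common @ rng.uniform(0.5, 1.5, (1, N)) + 0.3 * rng.standard_normal((T, N))

Rc = R - R.mean(axis=0)                               # centre each series
U, s, Vt = np.linalg.svd(Rc, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(explained[:3])   # the first component explains most of the variation
```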
(c) Explain why the CAPM is not usually used for forecasting purposes. Explain one
method you could use to forecast with this model.
The CAPM relates asset premiums to market premiums, both at the same time. To
forecast what will happen to the asset premium in the next period (e.g. tomorrow),
using the CAPM would require knowing the market premium for that same time,
the next period (e.g. tomorrow). Of course this is not possible. The only way to
forecast with the CAPM is to make an assumption about what will happen
tomorrow in the market. E.g. we can do a stress test by seeing what would happen
if the market premium was a certain value, or a list of values.
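A tiny sketch of such a stress test, with a hypothetical estimated beta and a list of assumed market-premium scenarios (none of these numbers come from the question):

```python
# A minimal sketch: condition the CAPM on hypothetical market premiums
# for tomorrow, using an already-estimated (assumed) alpha and beta.
beta, alpha = 1.2, 0.0                                # assumed estimates
for market_premium in (-3.0, -1.0, 0.0, 1.0, 3.0):   # scenarios, in %
    asset_premium = alpha + beta * market_premium
    print(f"market {market_premium:+.1f}% -> asset {asset_premium:+.2f}%")
```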
(d) Why are likelihood methods most favoured, in general, for estimation in
volatility models like GARCH, but not favoured when estimating CAPM models
(where least squares is favoured)?
GARCH models are time series regressions in the squares of the data: i.e. they are
models for squared returns. For LS estimation to have good properties in this case, we
need the 4th moments of the squared returns, i.e. the 8th moment of returns to be finite.
Now, returns certainly do seem prone to outliers and extreme observations, and have
much fatter tails than a Gaussian, making even a finite 4th moment questionable.
Further, since the 8th moment involves the average of $(r_t - \mu)^8$, and return data has
outliers, it seems somewhat unlikely that this 8th moment would be finite for return data.
Thus, LS methods are not preferred for GARCH models in general.
When estimating a CAPM model, we only need the 4th moment of returns to be finite
(not the 8th moment) for LS estimation to have good properties. In this case, LS
estimation is favoured for regression since under the LS assumptions the estimates
found have desirable properties: like being unbiased, consistent and relatively efficient
among estimators for regression parameters. Further, likelihood methods force us to
make an assumption about the conditional distribution of Y given X, while LS
estimation does not. Also, LS estimators give the conditional mean of Y given X, which
is important in asset pricing models.
(e) Explain the leverage effect and how it is potentially captured by the GJR-
GARCH model.
The leverage effect is a theory that suggests that as asset prices fall, the volatility of that
asset’s returns increases. Naturally as prices fall a firm’s equity is decreased. If at the
same time the level of debt for the company is unchanged the drop in equity will
increase the debt/equity ratio and thus leverage increases. The leverage effect associates
this with a subsequent increase in volatility due to the increased risk the firm faces. The
GJR-GARCH model
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \gamma_1 I_{t-1} a_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \qquad I_{t-1} = \begin{cases} 0, & a_{t-1} \ge 0 \\ 1, & a_{t-1} < 0 \end{cases}$$
includes a dummy variable that is 1 whenever a return shock is negative. This dummy
variable allows the ARCH effect to differ between negative and positive return
shocks. If $\gamma_1$ is positive, then volatility would be higher following negative
shocks (since $a_{t-1}^2$ is also positive) than following positive return shocks. Note that this
is not exactly the leverage effect, since here volatility gets the extra increase only when the
return was lower than its estimated mean (i.e. when $a_t = r_t - \mu_t < 0$), not necessarily
when the price actually fell. However, the GJR would capture the leverage effect if the mean
was set to zero, since then a negative shock is exactly a negative return.
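A minimal sketch of the GJR variance update, with illustrative parameter values (assumptions, not estimates from this document):

```python
# A minimal sketch of the GJR-GARCH(1,1) variance recursion.
def gjr_variance(a_prev: float, var_prev: float,
                 a0=0.05, a1=0.05, gamma1=0.10, b1=0.85) -> float:
    """One-step GJR-GARCH variance update with a leverage dummy."""
    indicator = 1.0 if a_prev < 0 else 0.0   # dummy: 1 only after a negative shock
    return a0 + (a1 + gamma1 * indicator) * a_prev**2 + b1 * var_prev

# Same-sized shock, opposite signs: the negative shock raises variance more.
print(gjr_variance(-1.5, 1.0), gjr_variance(+1.5, 1.0))
```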
(f) Compare the Value at Risk and Expected Shortfall risk measures, listing and
discussing at least one advantage and one disadvantage of each, compared to
each other.
Value at Risk (VaR) is the minimum loss that could occur, at a given probability level, over a
fixed period of time. VaR at 1% is then the 1% percentile or
quantile of a return distribution over a fixed period of time.
Expected shortfall (ES) is the average amount that an asset return would realise in a
fixed period of time if it was more extreme than a return quantile at a given probability
level. ES at 1% is then the average return for returns below the 1% percentile or quantile
of a return distribution, i.e. below the 1% VaR, over a fixed period of time.
An advantage of ES is that it represents the average loss in a specific part of the return
distribution, which is perhaps more representative of such losses than the VaR, which
represents the minimum loss in that part of the distribution. A disadvantage of ES is
that it is often a point that is very far out in the tails of the returns distribution, and is
thus estimated with a high level of uncertainty or standard error (since hardly any actual
observations are this far out in the tails in real data sets), and is very sensitive to outliers.
A disadvantage of VaR is that it represents a minimum loss in one part of the return
distribution, which is not really representative of the range or typical losses in that part
of the distribution. An advantage of VaR is that it is not as extreme as ES and thus can
be estimated with more certainty and with less standard error.
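Both measures can be estimated empirically; a minimal sketch, using a simulated stand-in return sample:

```python
# A minimal sketch: empirical 1% VaR and ES from a (simulated) return sample.
import numpy as np

rng = np.random.default_rng(5)
returns = rng.standard_t(df=5, size=10_000)      # stand-in for real returns

var_1pct = np.quantile(returns, 0.01)            # 1% VaR: the 1% return quantile
es_1pct = returns[returns <= var_1pct].mean()    # ES: mean return beyond the VaR
print(var_1pct, es_1pct)                         # ES is further out than VaR
```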
(g) A GARCH model specifies the one-step-ahead forecast return distribution.
Explain why it does not specify the two-step-ahead return distribution and why this
distribution is not the same as the one-step-ahead forecast return distribution.
A GARCH model can be written as:
$$r_t = \mu_t + a_t; \quad a_t = \sigma_t \varepsilon_t$$
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$$
$$\varepsilon_t \sim D(0,1), \text{ where } E[\varepsilon_t] = 0 \text{ and } Var[\varepsilon_t] = 1$$
This setting implies that the one-step-ahead distribution for returns is:
$r_{t+1} \mid \mathcal{F}_t \sim D(\mu_{t+1}, \sigma_{t+1}^2)$, where $\mu_{t+1}$ and $\sigma_{t+1}$ are known at time $t$, so only $\varepsilon_{t+1}$ varies here,
and it varies according to $D$.
For the 2-step-ahead distribution, $r_{t+2} \mid \mathcal{F}_t = \mu_{t+2} + \sigma_{t+2} \varepsilon_{t+2}$.
Here $\sigma_{t+2}^2 = \alpha_0 + \alpha_1 a_{t+1}^2 + \beta_1 \sigma_{t+1}^2 = \alpha_0 + \alpha_1 \sigma_{t+1}^2 \varepsilon_{t+1}^2 + \beta_1 \sigma_{t+1}^2$. Since this depends on
$\varepsilon_{t+1}^2$, which is not known at time $t$, this two-step-ahead standard deviation is a random
variable. Thus $r_{t+2} \mid \mathcal{F}_t = \mu_{t+2} + \sigma_{t+2} \varepsilon_{t+2}$ is a random variable that involves the
distribution of the product of the rv $\varepsilon_{t+2}$ and the rv $\sigma_{t+2}$. We do not, in general,
know what the distribution of this new rv is, except that it is not the same as D.
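A small simulation can make this concrete. Under assumed GARCH(1,1) parameter values with D = N(0,1) (a sketch, not tied to the exam's numbers), the simulated two-step-ahead return is a scale mixture of normals and shows excess kurtosis:

```python
# A minimal sketch: sigma_{t+1} is known, but sigma_{t+2} depends on the random
# shock at t+1, so r_{t+2} is a scale mixture and no longer N(0,1)-shaped.
import numpy as np

a0, a1, b1 = 0.05, 0.10, 0.85                     # assumed parameter values
sigma2_t1 = 1.0                                   # known one-step variance
rng = np.random.default_rng(11)
eps1 = rng.standard_normal(100_000)
eps2 = rng.standard_normal(100_000)

sigma2_t2 = a0 + a1 * sigma2_t1 * eps1**2 + b1 * sigma2_t1   # random at time t
r_t2 = np.sqrt(sigma2_t2) * eps2                             # 2-step return (mean 0)
print(np.mean(r_t2**4) / np.var(r_t2)**2)   # kurtosis > 3: fatter tails than N(0,1)
```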
(h) Why is forecasting important in finance, especially in the context of investment?
Investment involves making a decision and seeing and realising the subsequent result
of that decision. In the simplest case if we buy an asset, we make money if the asset
price subsequently increases and lose money if that price subsequently decreases. Thus,
our investment decisions are based on what we think will happen after we make our
investment decision. Thus forecasting is important, since we are at least implicitly
forecasting what will occur. A little thought will help us realise that all investments
require forecasts of what will happen to the prices of assets or other financial instruments.
Q2
We consider the daily prices for the asset NAB (National Australia Bank) from January
2003 until June 2012 with 2434 observations.
[Figure: NAB daily closing prices (top) and percentage log returns (bottom), October 2002 to September 2013.]
Percentage log returns for NAB appear in the bottom plot above.
(a) (5 marks, as shown) GARCH models with Gaussian and Student-t errors are fit to this
data, using only the first 2000 returns in the sample, with the following results:
Gaussian errors:
$r_t = 0.005 + a_t; \quad a_t = \sigma_t \varepsilon_t; \quad \varepsilon_t \sim N(0,1)$
$\sigma_t^2 = 0.062 + 0.158\, a_{t-1}^2 + 0.831\, \sigma_{t-1}^2$

Student-t errors ($\hat{\nu} = 5.0$):
$r_t = 0.041 + a_t; \quad a_t = \sigma_t \varepsilon_t; \quad \varepsilon_t \sim t_{5.0}^*(0,1)$
$\sigma_t^2 = 0.033 + 0.141\, a_{t-1}^2 + 0.858\, \sigma_{t-1}^2$
i. Interpret the estimated GARCH-t model, i.e. explain the three parts of the
estimated model.
The unconditional average return is estimated as 0.041%. The conditional distribution is
estimated to be a standardised Student-t with 5 degrees of freedom. The volatility equation has
estimated intercept of 0.033, with ARCH effect of 0.141 and GARCH effect of 0.858. The
ARCH effect indicates how much the conditional variance will increase by, if yesterday's squared shock
was increased by one unit, holding yesterday's volatility constant. Thus, if yesterday's squared
shock was increased by 1, the variance estimate would increase by 0.141.
ii. If yesterday's variance was 0.93 and yesterday's return shock was -1.5%,
estimate today's volatility.
The estimate is:
$$\sigma_t^2 = 0.033 + 0.141(-1.5)^2 + 0.858(0.93) = 1.148$$
so today's volatility is estimated as $\sigma_t = \sqrt{1.148} \approx 1.07\%$.
iii. Estimate the volatility persistence and the average variance in NAB returns,
for both models.
Volatility persistence is given by the sum of the ARCH and GARCH effects: $0.141 + 0.858 = 0.999$
for the Student-t model, and $0.158 + 0.831 = 0.989$ for the Gaussian model.
The average (unconditional) variance is given by $\frac{\alpha_0}{1 - \alpha_1 - \beta_1}$: for the Student-t model this is
$\frac{0.033}{1 - 0.141 - 0.858} = 33$, and for the Gaussian model $\frac{0.062}{1 - 0.158 - 0.831} \approx 5.64$.
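These calculations can be checked with a few lines of code (parameter values taken from the estimated GARCH-t model above):

```python
# A quick check of the arithmetic in (a) ii-iii, using the estimated
# GARCH-t parameters from the question.
a0, a1, b1 = 0.033, 0.141, 0.858

var_today = a0 + a1 * (-1.5)**2 + b1 * 0.93      # = 1.148
print(var_today, var_today**0.5)                 # variance, volatility (~1.07)

persistence = a1 + b1                            # ARCH + GARCH = 0.999
avg_variance = a0 / (1 - persistence)            # 0.033 / 0.001 = 33
print(persistence, avg_variance)
```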
(b) Diagnostics are applied to the GARCH with Student-t errors, as follows:
The degrees of freedom are estimated as 6.1, and these are used to form the transformed
standardised residuals, as $\hat{e}_t = \Phi^{-1}\left(T_{6.1}(\hat{\varepsilon}_t)\right)$, where $\hat{\varepsilon}_t$ are the standardised residuals from
the 2nd model above, $T_{6.1}$ is the Student-t CDF with 6.1 degrees of freedom, and $\Phi^{-1}$ is the
inverse standard normal CDF, so the $\hat{e}_t$ should be N(0,1) if the model is correct. These
residuals $\hat{e}_t$ are plotted over time below, as is their ACF for the first 25 lags.
[Figure: transformed residuals $\hat{e}_t$ over time (top); their sample autocorrelation function for lags 1-20, with 95% bounds (bottom).]
i. Ljung-Box tests are applied to the residuals $\hat{e}_t$, testing the first 8 and 13 lags. The p-values
from these tests are 0.50 and 0.76. Conduct these tests by listing the hypotheses and
stating the conclusions. Do the results agree with what you see in the plots?
The null hypothesis is that the first 8 autocorrelations all equal 0, the alternative is that at
least one of these is non-zero. I choose a significance level of 5%, or 0.05. The p-value from
the Ljung-Box statistic is 0.50. Since 0.5 > 0.05, we cannot reject the null and conclude that
the first 8 autocorrelation estimates are not significantly different from 0, as a group.
The test is the same when testing 13 lags, except now the null is that the first 13
autocorrelations all equal 0. Since the p-value is 0.76 > 0.05, we cannot reject the null and
conclude that the first 13 autocorrelation estimates are not significantly different from 0, as
a group.
This agrees with the ACF plot above which shows no clearly significant correlations in the
first 20 lags (none of the correlations are outside the 95% intervals around 0 given in blue
in the plot).
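A sketch of how such tests might be run, assuming the statsmodels package and using simulated stand-in residuals (not the actual model output):

```python
# A minimal sketch of the Ljung-Box test at lags 8 and 13.
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
ehat = rng.standard_normal(2000)          # stand-in for the model residuals

# H0: the first k autocorrelations are all zero, for k = 8 and k = 13.
lb = acorr_ljungbox(ehat, lags=[8, 13], return_df=True)
print(lb)   # reject H0 at the 5% level when lb_pvalue < 0.05
```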
ii. The squares of the residuals $\hat{e}_t$ are also plotted below, as is their ACF.
[Figure: squared residuals $\hat{e}_t^2$ over time (top); their sample ACF for the first 20 lags, with 95% bounds (bottom).]
iii. Ljung-Box tests are applied to the squares of the residuals $\hat{e}_t$, testing the first 8 and 13
lags. The p-values from these tests are 0.002 and 0.018. Conduct these tests by listing
the hypotheses and stating the conclusions. Do the results agree with what you see
in the plots?
The null hypothesis is that the first 8 autocorrelations all equal 0, the alternative is that at
least one of these is non-zero. I choose a significance level of 5%, or 0.05. The p-value from
the Ljung-Box statistic is 0.002. Since 0.002 < 0.05, we can reject the null and conclude that
at least one of the first 8 autocorrelation estimates is significantly different from 0.
The test is the same when testing 13 lags, except now the null is that the first 13
autocorrelations all equal 0. Since the p-value is 0.018 < 0.05, we can reject the null and
conclude that at least one of the first 13 autocorrelation estimates is significantly different
from 0.
This agrees with the ACF plot above which shows that the first lag auto-correlation is
significant and that perhaps the 5th is marginally significant too (these two correlations are
just outside the 95% intervals around 0 given in blue in the plot).
(c) A histogram of the residuals $\hat{e}_t$ is shown below, together with a qq-plot.
[Figure: histogram of the residuals (top); QQ plot of sample quantiles versus standard normal quantiles (bottom).]
i. Does the distributional assumption of this model seem appropriate?
We expect to see a Gaussian N(0,1) distribution in the histogram and the crosses appearing
on the straight dashed line in the qqplot, if the distribution is the correct one. For a N(0,1),
we expect most points to lie within (-3, 3), and in a very large sample a few to be close to
-4 and 4. This seems to be the case in the histogram. Indeed, the blue crosses all seem very
close to being right on the dashed line, as expected for a N(0,1).
ii. The sample skewness and kurtosis are estimated as -0.066 and 3.018 respectively. A Jarque-
Bera test is performed with p-value of 0.459. Conduct this test by listing the hypotheses
and stating the conclusion. Does the distributional assumption of this model seem
appropriate?
The null hypothesis is that the skewness equals 0 and the kurtosis equals 3; the alternative is
that either or both of these are not true. The p-value of 0.459 is the probability of getting
sample skewness and kurtosis at least as far from 0 and 3 as -0.066 and 3.018 are, if the null
is true. Since 0.459 > 0.05, we cannot reject the null and conclude that these residuals could
indeed follow a Gaussian N(0,1) distribution.
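A sketch of this test using scipy, on simulated stand-in residuals:

```python
# A minimal sketch: the Jarque-Bera test of skewness = 0 and kurtosis = 3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
ehat = rng.standard_normal(2000)                 # stand-in for the residuals

print("skewness:", stats.skew(ehat))                      # H0 value: 0
print("kurtosis:", stats.kurtosis(ehat, fisher=False))    # H0 value: 3
stat, pval = stats.jarque_bera(ehat)
print("JB p-value:", pval)                       # > 0.05 => cannot reject normality
```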
(d) 1-step-ahead 5% and 1% forecast VaRs are estimated for each day's return in the
forecast sample (after day 2000) using these models, plus GJR-GARCH with Gaussian and
Student-t errors, plus an IGARCH (RiskMetrics) model with $\lambda = 0.94$, and a 100 day historical
simulation method.
The plot below shows 1-step-ahead 5% VaR forecasts for the last 433 days in the sample.
[Figure: returns (data1) with 5% VaR forecast series from the GARCH-Gaussian, GARCH-t, IGARCH and HS-100 day models.]
i. Compare and explain the behaviour of the three GARCH-type model 5% VaR
forecasts. Note the differences and similarities and try to explain why they have
occurred.
The three GARCH-type models all sit similarly on the “bottom shoulder” of the data in
their 5% VaR forecasts. They each get more extreme (negative) following outlying or
extreme returns, as their volatility estimates also will and mostly they are quite close to
each other as a group. However, following highly extreme returns, the GARCH-
Gaussian and GARCH-t immediately produce more extreme 5% VaR forecasts than the
IGARCH, while the IGARCH takes longer to move back towards the data than these
two models; so the IGARCH tends to be less extreme at first, for a few days, but then
more extreme than these two for long periods (around 50 days) after an extreme return.
Why is this? The ARCH effects in the three models are 0.16, 0.14 and 0.06. GARCH-
Gaussian and GARCH-t have much higher ARCH effects (0.16 and 0.14) and so react
much more strongly to extreme or large returns, than the IGARCH. However, the
IGARCH is non-stationary and not mean-reverting, while the other two are mean-
reverting in volatility. Thus the GARCH-Gaussian and GARCH-t have to revert to a
long-run average, while IGARCH does not, causing the latter to deviate from the data
for longer following an extreme return.
ii. Why do the 100 day historical simulation VaRs stay flat for long periods?
Following an extreme return, this return will be in the last 100 days for 100 days! i.e.
for the next 100 days, the sample quantile will be dominated by this extreme return, and
so stay flat. When the extreme return is 101 days ago, there will be a marked change in
the 5% sample VaR estimate (unless another extreme has occurred in that 100 day
period).
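A minimal sketch of this mechanism, using the 5th-smallest return in a rolling 100-day window as the 5% quantile (an order-statistic version of historical simulation; the data and the injected "crisis" are simulated for illustration):

```python
# Once a cluster of extreme returns enters the window it pins the sample
# quantile until those days drop out, producing the flat stretches in the plot.
import numpy as np

rng = np.random.default_rng(9)
returns = rng.standard_normal(600)
returns[300:305] = [-8.0, -7.5, -7.0, -6.5, -6.0]     # a 5-day "crisis"

hs_var = np.array([np.sort(returns[t - 100:t])[4]     # 5th smallest of last 100
                   for t in range(100, 600)])         # VaR for day t
print(hs_var[205:210])   # flat at -6.0 while all 5 crisis days are in the window
```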
(e) The table below shows number of violations and violation rates for each of the
models above, from their 5% VaR forecasts:
Model            Violations   Rate ($\hat{\alpha}$)   $\hat{\alpha}/0.05$   Significantly different to 0.05?
GARCH-Gaussian   22           0.051                   1.02                  No
GARCH-t          25           0.058                   1.15                  No
GJR-Gaussian     22           0.051                   1.02                  No
GJR-t            22           0.051                   1.02                  No
IGARCH           20           0.046                   0.92                  No
HS-100 day       23           0.053                   1.06                  No
i. The 95% confidence interval for the violation rate is (0.0295, 0.0705),
if the true rate was 0.05, in a sample of size 433. Fill in the last column
of the table above.
All sample violation rates are inside the 95% CI, meaning none can be rejected as
having a violation rate different from the required 0.05.
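The interval itself can be reproduced with the usual normal approximation for a proportion (a sketch, assuming this is how it was computed):

```python
# 95% interval for the violation rate under H0: true rate 0.05, n = 433 days.
import math

alpha0, n = 0.05, 433
se = math.sqrt(alpha0 * (1 - alpha0) / n)
print(alpha0 - 1.96 * se, alpha0 + 1.96 * se)    # approx (0.0295, 0.0705)
```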
ii. Briefly discuss these results, compare the models' performance and
discuss why you believe each model has performed the way it has,
regarding accuracy of forecasting 5% VaR in this data.
All models pass the unconditional coverage test and have acceptable violation rates.
Even so, three models (GARCH-Gaussian, GJR-Gaussian and GJR-t) are closest to 0.05
on this measure, while the GARCH-t is furthest away with 0.058, i.e. 15% too many
violations. The IGARCH and HS-100 methods have performed reasonably well on this
aspect, being about equally far from 0.05. Note that the IGARCH is the only
conservative risk model here, since it is the only one to have fewer violations than expected, at
a rate of 0.046.
(f) The table below shows p-values from the independence and Dynamic Quantile
(DQ) tests applied to the violations from each model's set of 5% VaR forecasts,
as well as the criterion function loss values.
Model            Violations   $\hat{\alpha}$   Independence   DQ     Loss
GARCH-Gaussian   22           0.051            0.12           0.47   77.47
GARCH-t          25           0.058            0.68           0.50   77.20
GJR-Gaussian     22           0.051            0.12           0.49   77.37
GJR-t            22           0.051            0.12           0.54   76.67
IGARCH           20           0.046            0.94           0.60   75.86
HS-100 day       23           0.053            0.13           0.12   78.21
i. Briefly discuss these results, compare the models' performance and
discuss why you believe each model has performed the way it has, via
these criteria.
No models are rejected by the independence test, nor by the DQ test. All seem quite
comparable and to have violations that are roughly independent over time and close to
the expected 5%. Regarding loss functions, the best model will have the lowest loss
value, and be the one that is “closest” to the unknown true 5% VaR values. The model
that does best on this criterion is the IGARCH, followed closely by the GJR-t; while
the worst is the HS-100.
ii. Which model has performed the best? Why? Which has performed the
worst? Why?
The HS-100 has performed the worst by loss function and has the lowest p-value for the DQ
test. The GARCH-t has the violation rate furthest from 0.05, and also has the most violations,
more than expected (0.058 and 25), which is not good for financial solvency. These two
models have performed the worst.
The best model is one of the GARCH-Gaussian, GJR-Gaussian, GJR-t and
IGARCH. The first three did best by violation rate, but the IGARCH is conservative here, which
is good for financial solvency (fewer violations than expected). The IGARCH does best by loss
function, followed closely by the GJR-t. It seems the IGARCH model may be the best
here.
FORMULAS YOU MAY ASSUME AND USE
Some formulas and information to assist you in the test
The assumptions of OLS regression are:
1. The population residuals and the X variables are uncorrelated. In other words,
$E[\varepsilon \mid X] = 0$.
2. The data sample is iid.
3. The 4th moments of both Y and each X are finite, i.e. $E[X^4] < \infty$ and $E[Y^4] < \infty$. This
implies that the mean and variance of Y and of each X are also finite.
4. The X variables, if there is more than one, are not perfectly correlated with each other,
and none is a perfect linear combination of the others.
The assumptions of Factor analysis are:
1. The data sample for Y is iid.
2. The estimated factors F are iid, with mean 0 and variance 1.
3. The population residuals and the factor variables are uncorrelated. In other words,
$Cov(\varepsilon, F_j) = 0$ for each factor $F_j$.
4. The 4th moments of both Y and each factor F are finite, i.e. $E[F^4] < \infty$ and $E[Y^4] < \infty$.
This implies that the mean and variance of Y and of each F are also finite.
Omitted variables:
(a) When X occurs in time before Y
A variable Z is an omitted variable from the analysis of the relationship between two
variables Y and X, under the following conditions:
1. The variable Z is not accounted for specifically in the analysis
2. The variable Z is correlated or associated with X
3. The variable Z is causal for Y
(b) If two variables X and Y occur simultaneously, one cannot cause the other. However,
there may be an omitted variable if:
1. The variable Z is not accounted for specifically in the analysis
2. The variable Z is correlated, associated with or causal for X and Y
Value at Risk (VaR)
The VaR is the minimum loss that could occur with probability level $\alpha$ over a fixed
time period.
Expected Shortfall (ES)
The ES is the average loss that could occur, for losses occurring with probability level $\alpha$,
over a fixed time period.
GARCH models:
The basic structure of a GARCH model has three components:
1. Mean equation: $r_t = \mu_t + a_t; \quad a_t = \sigma_t \varepsilon_t$
2. Volatility equation, e.g.:
ARCH(p): $\sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i a_{t-i}^2$
GARCH(1,1): $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
GJR-GARCH: $\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \gamma_1 I_{t-1} a_{t-1}^2 + \beta_1 \sigma_{t-1}^2$, where $I_{t-1} = \begin{cases} 0, & a_{t-1} \ge 0 \\ 1, & a_{t-1} < 0 \end{cases}$
Risk-Metrics: $\sigma_t^2 = (1 - \lambda) a_{t-1}^2 + \lambda \sigma_{t-1}^2$
EGARCH: $\log \sigma_t^2 = \alpha_0 + \alpha_1 \left( |\varepsilon_{t-1}| - E|\varepsilon_{t-1}| \right) + \gamma_1 \varepsilon_{t-1} + \beta_1 \log \sigma_{t-1}^2$
3. Conditional distribution: $\varepsilon_t \sim D(0,1)$, where $E[\varepsilon_t] = 0$ and $Var[\varepsilon_t] = 1$
e.g. Gaussian: $\varepsilon_t \sim N(0,1)$
or standardised Student-t: $\varepsilon_t \sim t_v^*(0,1) = t_v(0,1)\sqrt{\frac{v-2}{v}}$
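A quick numerical check of the standardised Student-t scaling (illustrative, using scipy):

```python
# A minimal sketch: multiplying t_v draws by sqrt((v-2)/v) gives unit variance,
# since the variance of a t_v distribution is v/(v-2).
import numpy as np
from scipy import stats

v = 5.0
x = stats.t.rvs(df=v, size=1_000_000, random_state=0)
print(np.var(np.sqrt((v - 2) / v) * x))          # close to 1
```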