A13073W1
DEGREE OF MASTER OF SCIENCE IN FINANCIAL ECONOMICS
FINANCIAL ECONOMETRICS
TRINITY TERM 2020
Tuesday, 21 April 2020
Time allowed is FOUR HOURS
You MUST upload your submission within 4 hours of accessing the paper
Candidates should answer ALL questions in Part A.
Candidates should answer THREE of the questions in Part B.
Candidates should answer ALL questions in Part C.
Examiners will place a weight of 2% on each question in Part A (40% total),
10% on each question in Part B and 30% on Part C.
Please use the solutions template provided if possible.
Materials: Candidates may use any calculator or any other software when preparing their
Do not turn over until told that you may do so.
PART A: MULTIPLE CHOICE
Answer ALL questions in this section.
The section contributes 40% towards the final mark. Each question is worth 2% of the exam mark.
Select a single answer for each question. Correct answers are awarded 2%, and incorrect answers reduce
the score by 0.5% so that random guessing does not improve your expected mark.
1. If you flip three fair coins, what is the probability that all three show the same side?
(a) 1/2
(b) 1/8
(c) 1/16
(d) 1/4
2. When evaluating a series of forecasts using the Mincer-Zarnowitz regression
$y_{t+h} - \hat{y}_{t+h|t} = \alpha + \beta \hat{y}_{t+h|t} + \eta_{t+h}$, what values of $\alpha$ and $\beta$ should occur when the
forecasting model is correctly specified?
(a) β = 0, no restriction on α
(b) α = 0,β = 0
(c) α = 0,β = 1
(d) β = 1, no restriction on α
3. Daily returns on a portfolio are i.i.d. with a $N(0.05\%, (1\%)^2)$ distribution. What is the 1-week
5% Value-at-Risk of a portfolio with £1,000,000,000 under management (to the nearest 10,000)?
(a) 2,500,000
(b) 34,280,000
(c) 41,330,000
(d) 4,130,000
4. How are Diebold-Mariano tests used to compare the forecasts from two VaR models?
(a) The two series of HITs from the models are used as losses.
(b) HITs are regressed on lagged HITs and the two VaR forecasts.
(c) Diebold-Mariano tests cannot be used to compare two VaR models.
(d) The two series of HITs are transformed using the tick-loss function, and then the difference
is tested.
5. How are simulated values used in Historical Simulation VaR?
(a) They are not. The quantile depends only on α and the sample size n.
(b) They rely on simulated models produced in the past.
(c) Multi-step VaR uses values simulated from historical observations.
(d) Returns are first filtered by an ARCH-type model and then simulated forward to estimate
the VaR.
6. What restrictions are required on $\Phi_1$ for a VAR(1)
$$y_t = \Phi_1 y_{t-1} + \varepsilon_t$$
to be stationary?
(a) All eigenvalues must be positive.
(b) All eigenvalues must be less than 1 in modulus.
(c) VAR(1) models are always covariance stationary. Only VAR(P) models can be non-stationary.
(d) All values in $\Phi_1$ must be less than 1 in absolute value.
7. The Local in Local Average Treatment Effects refers to:
(a) The estimate of the ATE is weighted to reflect the probability of participation in a
randomized controlled trial (RCT).
(b) The reality that all experiments only use subjects in one locality.
(c) That the maximum likelihood estimator of the ATE may achieve a local rather than a
global maximum.
(d) The effect is homogeneous across the treatment group.
8. Consider the model $y_t = y_{t-1} + \varepsilon_t$, where $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$. Which statement below is false?
(a) $E[y_t]$ grows over time.
(b) A regression of $y_t$ on $x_t$ with $x_t = 0.5 x_{t-1} + v_t$, where $v_t \sim N(0, \sigma_v^2)$ and where $\varepsilon_t$ and $v_t$
are independent, can result in spurious correlation.
(c) There is no mean reversion in long-run forecasts of $y_t$, so that $E_t[y_{t+h}] = y_t$ for any $h \geq 1$.
(d) In a regression $y_t = \beta y_{t-1} + \varepsilon_t$, our OLS estimate $\hat{\beta}$ would be inconsistent.
9. How does the GJR-GARCH model improve on the GARCH model?
(a) It models the log-variance instead of the variance, and so is always positive.
(b) It allows for more lags of the squared return and variance.
(c) It adds an asymmetry term that depends on the sign of the past return.
(d) It models the standard deviation instead of the variance.
10. In a hypothesis test, the power of the test is
(a) the probability the null is rejected given the alternative is true.
(b) the probability of a Type I error.
(c) the probability of a Type II error.
(d) the probability that the alternative is true.
Select all correct answers. Each question has between 0 and n correct answers where n is the number
of options in the question. Each of the n answers is treated as a true-false question so that a correct
subpart answer is awarded 2%/n, and an incorrect answer reduces the mark by 2%/n. For example, if
the correct answers in a 5-part question are A, D, and E, then an answer with A is awarded 0.4%,
and an answer without A is reduced by 0.4%. An answer with B is reduced by 0.4%, and an answer
without B is awarded 0.4%. Random guessing does not improve your expected mark.
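The marking rule for the select-all questions can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not an official marking script: the function names `select_all_score` and `expected_random_score` are hypothetical, and the 5-option question with correct answers A, D and E is the example from the instructions.

```python
from itertools import combinations


def select_all_score(selected, correct, options, question_weight=2.0):
    """Score one select-all question in percentage points of the exam mark.

    Each option is treated as a true-false subpart worth question_weight / n.
    """
    per_part = question_weight / len(options)
    score = 0.0
    for option in options:
        # A subpart is right when it is selected if and only if it is correct.
        if (option in selected) == (option in correct):
            score += per_part
        else:
            score -= per_part
    return score


def expected_random_score(correct, options, question_weight=2.0):
    """Average score over all possible answer sets (uniform random guessing)."""
    subsets = [
        set(c)
        for k in range(len(options) + 1)
        for c in combinations(options, k)
    ]
    total = sum(select_all_score(s, correct, options, question_weight) for s in subsets)
    return total / len(subsets)


options = ["A", "B", "C", "D", "E"]
correct = {"A", "D", "E"}

print(round(select_all_score({"A", "D", "E"}, correct, options), 4))  # all subparts right
print(round(select_all_score({"A", "B"}, correct, options), 4))       # two right, three wrong
print(round(expected_random_score(correct, options), 4))              # random guessing
```

Averaging over every possible answer set gives each subpart an equal chance of being right or wrong, which is why the expected mark from guessing is zero under this rule.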
11. If $E[Y|X] = 0$, where $X$ is uniformly distributed on $[5,10]$, then which of the following
statements are true:
(a) $E[g(Y)|X] = 0$ for any well defined function $g(\cdot)$
(b) $E[YX] = 0$
(c) $E[Y/X] = 0$
(d) $Cov[X,Y] = 0$
12. What restrictions are required in an APARCH model to get an ARCH(1)?
$$\sigma_t^\delta = \omega + \alpha\left(|\varepsilon_{t-1}| + \gamma\varepsilon_{t-1}\right)^\delta + \beta\sigma_{t-1}^\delta$$
(a) γ=0
(b) β = 0
(c) δ = 0
(d) β = 1−α
(e) δ = 2
13. Why is the companion form of a VAR(P) useful?
(a) It allows any VAR(P) to be expressed as a VAR(1)
(b) It simplifies computing the autocovariance function of a VAR.
(c) It allows an AR(P) to be written as a VAR(1)
(d) It transforms a VAR to ensure that it is covariance stationary.
14. Which of the following are true about the expectations operator, E [·]:
(a) E [XY ] = Cov [X ,Y ] only when the random variables X and Y are independent
(b) If g(·) is a concave function, then E [g(X)]≥ g [E [X ]]
(c) V [a+bX ] = bV [X ] where a and b are constants and X is a random variable
(d) E [E [X |Y ]] = E [X ] only if X and Y are independent
15. Which are true of a vector white noise process $\{\varepsilon_t\}$?
(a) The elements in the sequence $\{\varepsilon_t\}$ must have no contemporaneous correlation.
(b) The elements in the sequence $\{\varepsilon_t\}$ must have no correlation across time.
(c) The vector must be independent across time.
(d) $E_t[\varepsilon_t | \varepsilon_{t-1}, \varepsilon_{t-2}, \ldots] = 0$
(e) The vector must be conditionally homoskedastic.
16. Assuming $y_t$ is covariance stationary and is generated by a VAR(P) with white noise residuals,
which of the VAR order selection methods lead to consistent lag length selection?
(a) BIC (Bayesian)
(b) AIC (Akaike)
(c) Likelihood-ratio
(d) HQIC (Hannan-Quinn)
17. The central limit theorem ...
(a) holds with finite and asymptotic sample sizes.
(b) forms a basis for inference on the OLS regression parameters.
(c) is a distributional statement for the sample mean.
(d) states that the sample mean converges to the population mean for i.i.d. data with finite
variance.
(a) A finite number of outliers result in biased OLS estimates of linear regression coefficients.
(b) A finite number of outliers result in inconsistent OLS estimates of linear regression
coefficients.
(c) Winsorization and trimming identify outlying observations by the most extreme realizations
of the dependent variable.
(d) Trimming removes observations identified as outliers.
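19. Which regression specifications can be estimated with linear regression assuming $x_i$ is observable,
$E[\varepsilon_i | X] = 0$ and $E[\varepsilon_i | Y_{i-1}] = 0$?
(a) $y_i = \beta_1 x_i^{\beta_2} \varepsilon_i$, $\varepsilon_i > 0$
(b) $y_i = \beta_1 x_i^{\beta_2} + \varepsilon_i$, $\varepsilon_i > 0$
(c) $y_i = \sqrt{\sigma_i^2}\,\varepsilon_i$, with $\sigma_i^2 = \omega + \alpha y_{i-1}^2 + \beta\sigma_{i-1}^2$
(d) $y_i = \beta_1 \sin x_i + \beta_2 \ln x_i + \varepsilon_i$, $x_i > 0$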
20. Which of the following terms are sources of non-stationarity in the time-series model
$y_t = \phi + \delta t + y_{t-1} + \beta_1 x_{1t} + \beta_2 x_{2t} + \varepsilon_t$, where $\varepsilon_t \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\varepsilon^2)$ and
$x_t \overset{\text{i.i.d.}}{\sim} N(0, \Sigma)$ are bivariate normally distributed?
(a) The intercept $\phi$.
(b) The correlated regressors $x_{1t}$ and $x_{2t}$.
(c) The deterministic trend ($\delta t$).
(d) The lag ($y_{t-1}$).
PART B
Answer THREE of the seven questions in this section.
Each question is worth 10% of the exam mark (i.e., 1/3 of 30%). Within each question points sum
to 100% and so will be scaled by 10% when combined in the final exam mark. Answers must be as
precise as possible, i.e., should use mathematical notation and formulae where relevant.
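1. Suppose
$$y_t = \begin{bmatrix} 0.2 & 0.4 \\ 0.0 & 0.6 \end{bmatrix} y_{t-1} + \begin{bmatrix} 0.0 & 0.4 \\ 0.1 & 0.3 \end{bmatrix} y_{t-2} + \varepsilon_t$$
where $\varepsilon_t$ is a vector white noise process with covariance $\Sigma$. Answer the following questions.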
(a) [30%] In a bivariate VAR(2), what restrictions on the model’s parameters are implied if $y_2$
does not Granger cause $y_1$?
(b) [30%] Write the model in error correction form.
(c) [20%] Using the coefficient matrix on $y_{t-1}$ in the ECM, determine if the model is (i)
a cointegrated VAR, (ii) 2 random walks, or (iii) covariance stationary. Note that in a 2
by 2 matrix $A$, the eigenvalues are the solution to $\lambda_1\lambda_2 = a_{11}a_{22} - a_{12}a_{21}$ and
$\lambda_1 + \lambda_2 = a_{11} + a_{22}$.
(d) [20%] What are the 1-step and 2-step ahead forecasts from the ECM, $E_t[y_{t+h}]$ for $h = 1, 2$?
2. Suppose two assets, $X$ and $Y$, have returns that are bivariate normally distributed with $\mu_X = 8\%$,
$\mu_Y = 5\%$, $\sigma_X^2 = 0.25^2$, $\sigma_Y^2 = 0.15^2$ and $\rho_{XY} = -0.3$.
(a) [20%] What is the expected return to a portfolio $Z = wX + (1-w)Y$?
(b) [20%] What is the variance of $Z$ as a function of $w$?
(c) [40%] What value of $w$ minimizes the variance of the portfolio?
(d) [20%] What value of $w$ maximizes the Sharpe ratio of the portfolio, $E[Z]/\sqrt{V[Z]}$?
3. Suppose you observe a sequence of $n$ i.i.d. observations from a Poisson($\lambda$) distribution where each
observation has pmf
$$f(x;\lambda) = \frac{\lambda^x \exp(-\lambda)}{x!}.$$
(a) [30%] What is the MLE, $\hat{\lambda}$?
(b) [20%] What is the asymptotic distribution of the MLE?
Use the sample
{4,5,4,6,3,5,5,6,3,3}
to answer questions (c) and (d).
(c) [25%] Using the data above, test the null $H_0: \lambda = 3.3$ using a t-test and a 5% test size.
The lower-tail quantiles from a normal distribution are in the table below.
(d) [25%] Repeat the test in (c) using a likelihood ratio test.

Quantile   Value
1%         -2.32
2.5%       -1.95
5%         -1.64
10%        -1.28
4.
(a) [50%] Describe 2 methods to identify a causal effect in observational (non-experimental)
data. Compare the two methods and discuss their advantages and limitations.
(b) [50%] How might an RCT, which is often referred to as the gold standard for causal effect
5. Suppose $y_t = \phi_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \phi_{12} y_{t-12} + \theta\varepsilon_{t-1} + \varepsilon_t$ and $\varepsilon_t \overset{\text{i.i.d.}}{\sim} WN(0, \sigma_\varepsilon^2)$. For parts
(a) - (c) assume that the parameters are consistent with $y_t$ being a covariance stationary process.
(a) [20%] What is the value of $E[y_{t+2}]$?
(b) [20%] What is the value of $E_t[y_{t+2}]$?
(c) [20%] What is the value of $\lim_{h\to\infty} E_t[y_{t+h}]$?
(d) [20%] Now we do not assume $y_t$ to be covariance stationary. Let $\phi_0 = 8$, $\phi_1 = 0.8$,
$\phi_2 = -0.15$ and $\theta = 12$. Is $y_t$ stable for the given parameters?
(e) [10%] Rewrite the model using only differenced data $\Delta y_t, \Delta y_{t-1}, \Delta y_{t-2}, \ldots$ and $y_{t-1}$.
(f) [10%] How would you describe the process $\{y_t\}$ if the coefficient on $y_{t-1}$ is 0 in this form?
6. Suppose $\ln RV_t$ is modeled as a HAR
$$\ln RV_t = 0.1 + 0.4 \ln RV_{t-1} + 0.3 \ln RV_{t-1:5} + 0.22 \ln RV_{t-1:22} + \varepsilon_t$$
where $\varepsilon_t \sim N(0, \sigma^2)$ and $\ln RV_{t-1:h} = h^{-1} \sum_{i=1}^{h} \ln RV_{t-i}$ is the average of $h$ lags of $\ln RV$.
(a) [20%] What is $E_t[\ln RV_{t+1}]$?
(b) [20%] What is $E_t[\ln RV_{t+2}]$?
(c) [20%] What is $\lim_{h\to\infty} E_t[\ln RV_{t+h}]$?
(d) [20%] What is the conditional distribution of the 2-step forecast error, $\ln RV_{t+2} - E_t[\ln RV_{t+2}]$?
(e) [10%] What is $E_t[RV_{t+1}]$?
(f) [10%] What is $E_t[RV_{t+2}]$?
7. Suppose the correct model is
$$y_i = x_{1,i}\beta_1 + x_{2,i}\beta_2 + \varepsilon_i, \quad (1)$$
where $i = 1, \ldots, n$ and the researcher estimates
$$y_i = x_{1,i}\beta_1 + v_i.$$
(a) [20%] What is the effect on $\hat{\beta}_1$ of omitting the variable $x_{2,i}$ in the regression equation?
(b) [20%] Is there always a cost of missing $x_{2,i}$ in the regression equation? If not, give two
examples.
(c) [20%] Suppose the researcher did not include $x_{2,i}$ because she cannot access this variable.
Explain how she can use an instrumental variable $z_i$ to fix the problematic estimate $\hat{\beta}_1$.
Now suppose the correct model is
$$y_i = x_{1,i}\beta_1 + \varepsilon_i,$$
and the researcher estimates the larger model
$$y_i = x_{1,i}\beta_1 + x_{2,i}\beta_2 + \varepsilon_i.$$
(d) [20%] What is the cost of adding an unnecessary variable $x_{2,i}$?
(e) [20%] Explain how the researcher can use cross-validation to select which variables to
include in cross-sectional regression models.
PART C
Answer ALL questions in this section.
The section contributes 30% towards the final mark.
1. Your colleague has built two new models for Value-at-Risk, both using magical machine learning
methods (Models 2 & 3). Your colleague needs you to validate that her machine learning
approach is a good alternative to Filtered Historical Simulation (Model 1). While she did not
leave you the code or the raw data, she has produced some basic statistics and visualizations
(Tables 1 and 2 and Figure 1). All models are fit to the same return data, and all results are
out-of-sample.
(a) [67%] Use the available measures to construct the best story you can about whether you
think your firm should move to one of the machine-learning-based Value-at-Risk models or
remain with Filtered Historical Simulation. The best answers will use mathematical notation
where relevant and compute statistics using the data in the tables when these can be
transformed into measures of absolute or relative performance of the models.
(b) [33%] Why do we use the tick-loss function when forecasting Value-at-Risk? Explain
how the tick-loss function is like the Mean Square Error (MSE) loss function that is used
when forecasting the conditional mean or the Quasi-likelihood-loss (QLIK) function that is
used when forecasting the conditional variance.
Statistics Computed using the HITs

Summary Statistics
                                                                              Model 1   Model 2   Model 3
$\hat{\mu}$                                                                    0.0111   -0.0126    0.0164
$\hat{\sigma}$                                                                 0.3144    0.2824    0.3209
$\hat{\sigma}_{NW}$                                                            0.2964    0.3831    0.3475
$T$                                                                               756       756       756
$\sum_{t=1}^{T-1} I[r_t < -VaR_t^j] \, I[r_{t+1} < -VaR_{t+1}^j]$                   7        13        28
$\sum_{t=1}^{T-1} (1 - I[r_t < -VaR_t^j])(1 - I[r_{t+1} < -VaR_{t+1}^j])$         594       636       607
$Corr[HIT_t, HIT_{t+1}]$                                                     -0.03142    0.1200    0.2282

Covariance ($\hat{\Sigma}$)
           Model 1   Model 2   Model 3
Model 1     0.0987    0.0577    0.0743
Model 2     0.0577    0.0796    0.0506
Model 3     0.0743    0.0506    0.1028

Long-run Covariance ($\hat{\Sigma}_{NW}$)
           Model 1   Model 2   Model 3
Model 1     0.0878    0.0744    0.0813
Model 2     0.0744    0.1467    0.0793
Model 3     0.0813    0.0793    0.1207
Table 1: This table contains statistics based on the sequence of HITs, defined as
$I[r_{t+1} < -VaR_{t+1}^j] - \alpha$ for models $j = 1, 2, 3$, where $\alpha = 10\%$. The top panel contains the mean of the
HITs ($\hat{\mu}$), the standard deviation of the HITs ($\hat{\sigma}$), the long-run standard deviation of the HITs
computed as the square root of a Newey-West variance using 12 lags ($\hat{\sigma}_{NW}$), the number of
out-of-sample observations ($T$), the number of periods where a VaR violation (an exceedance) was
followed by a VaR violation ($\sum_{t=1}^{T-1} I[r_t < -VaR_t^j] \, I[r_{t+1} < -VaR_{t+1}^j]$), the number of periods where no
VaR violation was followed by no VaR violation
($\sum_{t=1}^{T-1} (1 - I[r_t < -VaR_t^j])(1 - I[r_{t+1} < -VaR_{t+1}^j])$), and the correlation of the HITs
across two consecutive periods ($Corr[HIT_t, HIT_{t+1}]$). The middle panel contains the covariance of
the HITs across the three methods ($\hat{\Sigma}$). The final panel contains the long-run covariance of the HITs
measured using a Newey-West covariance estimator with 12 lags ($\hat{\Sigma}_{NW}$).
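As a rough illustration of the quantities defined in the caption of Table 1, the sketch below computes the HITs and the two consecutive-period counts from a return series and a VaR forecast series. The function names and the six-observation sample are hypothetical (the exam provides no raw data), and VaR is assumed to be reported as a positive number, which is what the indicator $I[r_t < -VaR_t]$ implies.

```python
def hits(returns, var_forecasts, alpha):
    """HIT_t = I[r_t < -VaR_t] - alpha for each period (VaR assumed positive)."""
    return [(1.0 if r < -v else 0.0) - alpha for r, v in zip(returns, var_forecasts)]


def consecutive_counts(returns, var_forecasts):
    """Count violation-then-violation and no-violation-then-no-violation pairs."""
    viol = [r < -v for r, v in zip(returns, var_forecasts)]
    pairs = list(zip(viol[:-1], viol[1:]))
    n11 = sum(1 for a, b in pairs if a and b)
    n00 = sum(1 for a, b in pairs if not a and not b)
    return n11, n00


# Hypothetical data for illustration only: six returns and a constant 2% VaR.
returns = [-0.03, 0.01, -0.025, -0.04, 0.002, 0.015]
var_forecasts = [0.02] * 6
alpha = 0.10

hit_series = hits(returns, var_forecasts, alpha)
mean_hit = sum(hit_series) / len(hit_series)
n11, n00 = consecutive_counts(returns, var_forecasts)

print(hit_series)
print(round(mean_hit, 4), n11, n00)
```

A correctly calibrated model would have a mean HIT close to zero, since violations should occur with probability $\alpha$.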
Statistics Computed using the Tick Losses

Mean
           Model 1   Model 2   Model 3
$\bar{L}$   0.1273    0.1712    0.1294

Covariance ($\hat{\Sigma}$)
           Model 1   Model 2   Model 3
Model 1     0.0426    0.0433    0.0381
Model 2     0.0433    0.0752    0.0364
Model 3     0.0381    0.0364    0.0353

Long-run Covariance ($\hat{\Sigma}_{NW}$)
           Model 1   Model 2   Model 3
Model 1     0.1591    0.1783    0.1543
Model 2     0.1783    0.2332    0.1702
Model 3     0.1543    0.1702    0.1513
Table 2: The top panel contains the mean tick-loss for each of the models computed using $\alpha = 10\%$.
The middle panel contains the covariance ($\hat{\Sigma}$) of the tick-losses estimated using the standard
covariance estimator. The bottom panel contains an estimate of the long-run covariance ($\hat{\Sigma}_{NW}$) of the
tick-losses estimated using a Newey-West covariance estimator and 12 lags.
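The paper does not write out the tick-loss function, so the sketch below assumes the usual quantile "check" loss, $L(r, q) = (\alpha - I[r < q])(r - q)$, with the quantile forecast taken as $q = -VaR$ to match the sign convention in Table 1. The function names and sample values are hypothetical and only show how a mean tick-loss such as $\bar{L}$ could be computed.

```python
def tick_loss(r, var, alpha):
    """Quantile check loss, assuming VaR is positive so the quantile is -var."""
    q = -var
    indicator = 1.0 if r < q else 0.0
    # alpha * (r - q) when there is no violation, (1 - alpha) * (q - r) otherwise.
    return (alpha - indicator) * (r - q)


def mean_tick_loss(returns, var_forecasts, alpha):
    """Average tick-loss over the out-of-sample period."""
    losses = [tick_loss(r, v, alpha) for r, v in zip(returns, var_forecasts)]
    return sum(losses) / len(losses)


# Hypothetical data for illustration only.
returns = [-0.03, 0.01, -0.025, -0.04, 0.002, 0.015]
var_forecasts = [0.02] * 6

print(round(tick_loss(-0.03, 0.02, 0.10), 6))  # violation: weighted by 1 - alpha
print(round(tick_loss(0.01, 0.02, 0.10), 6))   # no violation: weighted by alpha
print(round(mean_tick_loss(returns, var_forecasts, 0.10), 6))
```

Under this assumed definition the loss is never negative and violations are penalized more heavily than non-violations whenever $\alpha < 0.5$.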

[Figure 1 image not reproduced: three panels, one per model, each with the axis label "Ann. Vol".]
Figure 1: Plots of the VaR violations for each of the three models (H) along with the fitted volatility,
which is the same in all three panels since the underlying asset is identical.