CAViaR: Conditional Autoregressive Value at
Risk by Regression Quantiles
Robert F. ENGLE
Stern School of Business, New York University, New York, NY 10012-1126 (rengle@stern.nyu.edu)
Simone MANGANELLI
DG-Research, European Central Bank, 60311 Frankfurt am Main, Germany (simone.manganelli@ecb.int)
Value at risk (VaR) is the standard measure of market risk used by financial institutions. Interpreting
the VaR as the quantile of future portfolio values conditional on current information, the conditional
autoregressive value at risk (CAViaR) model specifies the evolution of the quantile over time using an
autoregressive process and estimates the parameters with regression quantiles. Utilizing the criterion that
each period the probability of exceeding the VaR must be independent of all the past information, we
introduce a new test of model adequacy, the dynamic quantile test. Applications to real data provide
empirical support to this methodology.
KEY WORDS: Nonlinear regression quantile; Risk management; Specification testing.
1. INTRODUCTION
The importance of effective risk management has never been
greater. Recent financial disasters have emphasized the need
for accurate risk measures for financial institutions. As the na-
ture of the risks has changed over time, methods of measuring
these risks must adapt to recent experience. The use of quanti-
tative risk measures has become an essential management tool
to be placed in parallel with models of returns. These measures
are used for investment decisions, supervisory decisions, risk
capital allocation, and external regulation. In the fast-paced fi-
nancial world, effective risk measures must be as responsive to
news as are other forecasts and must be easy to grasp in even
complex situations.
Value at risk (VaR) has become the standard measure of mar-
ket risk used by financial institutions and their regulators. VaR
is a measure of how much a certain portfolio can lose within
a given time period, for a given confidence level. The great
popularity that this instrument has achieved among financial
practitioners is essentially due to its conceptual simplicity; VaR
reduces the (market) risk associated with any portfolio to just
one monetary amount. The summary of many complex bad out-
comes in a single number naturally represents a compromise
between the needs of different users. This compromise has re-
ceived the blessing of a wide range of users and regulators.
Despite VaR’s conceptual simplicity, its measurement is a
very challenging statistical problem, and none of the method-
ologies developed so far gives satisfactory solutions. Because
VaR is simply a particular quantile of future portfolio values,
conditional on current information, and because the distribution
of portfolio returns typically changes over time, the challenge
is to find a suitable model for time-varying conditional quan-
tiles. The problem is to forecast a value each period that will
be exceeded with probability (1 − θ) by the current portfolio,
where θ ∈ (0,1) represents the confidence level associated with
the VaR. Let {yt}Tt=1 denote the time series of portfolio returns
and T denote the sample size. We want to find VaRt such that
Pr[yt < −VaRt|Ωt] = θ, where Ωt denotes the information set
available at time t. Any reasonable methodology should address
the following three issues: (1) provide a formula for calculating
VaRt as a function of variables known at time t − 1 and a set of
parameters that need to be estimated; (2) provide a procedure
(namely, a loss function and a suitable optimization algorithm)
to estimate the set of unknown parameters; and (3) provide a
test to establish the quality of the estimate.
In this article we address each of these issues. We propose
a conditional autoregressive specification for VaRt , which we
call conditional autoregressive value at risk (CAViaR). The un-
known parameters are estimated using Koenker and Bassett’s
(1978) regression quantile framework. (See also Chernozhukov
and Umantsev 2001 for an application of linear regression
quantile to VaR estimation.) Consistency and asymptotic re-
sults build on existing contributions of the regression quantile
literature. We propose a new test, the dynamic quantile (DQ)
test, which can be interpreted as an overall goodness-of-fit test
for the estimated CAViaR processes. This test, which has been
independently derived by Chernozhukov (1999), is new in the
literature on regression quantiles.
The article is structured as follows. Section 2 reviews the cur-
rent approaches to VaR estimation, and Section 3 introduces the
CAViaR models. Section 4 reviews the literature on regression
quantiles and establishes consistency and asymptotic normal-
ity of the estimator. Section 5 introduces the DQ test, and Sec-
tion 6 presents an empirical application to real data. Section 7
concludes the article.
2. VALUE AT RISK MODELS
VaR was developed in the early 1990s in the financial in-
dustry to provide senior management with a single number that
could quickly and easily incorporate information about the risk
of a portfolio. Today VaR is part of every risk manager’s tool-
box. Indeed, VaR can help management estimate the cost of
positions in terms of risk, allowing them to allocate risk in
a more efficient way. Also, the Basel Committee on Banking
Supervision (1996) at the Bank for International Settlements
uses VaR to require financial institutions, such as banks and in-
vestment firms, to meet capital requirements to cover the mar-
ket risks that they incur as a result of their normal operations.
However, if the underlying risk is not properly estimated, these
requirements may lead financial institutions to overestimate (or
underestimate) their market risks and consequently to main-
tain excessively high (low) capital levels. The result is an in-
efficient allocation of financial resources that ultimately could
induce firms to move their activities into jurisdictions with less-
restrictive financial regulations.
The existing models for calculating VaR differ in many as-
pects, but all follow a common structure, which can be summa-
rized as follows: (1) The portfolio is marked-to-market daily,
(2) the distribution of the portfolio returns is estimated, and
(3) the VaR of the portfolio is computed. The main differ-
ences among VaR models are related to the second aspect. VaR
methodologies can be classified initially into two broad cate-
gories: factor models, such as RiskMetrics (1996), and port-
folio models, such as historical quantiles. In the first case, the
universe of assets is projected onto a limited number of fac-
tors whose volatilities and correlations have been forecast. Thus
time variation in the risk of a portfolio is associated with time
variation in the volatility or correlation of the factors. The VaR
is assumed to be proportional to the computed standard devi-
ation of the portfolio, often assuming normality. The portfolio
models construct historical returns that mimic the past perfor-
mance of the current portfolio. From these historical returns, the
current VaR is constructed based on a statistical model. Thus
changes in the risk of a particular portfolio are associated with
the historical experience of this portfolio. Although there may
be issues in the construction of the historical returns, the in-
teresting modeling question is how to forecast the quantiles.
Several different approaches have been used. Some first esti-
mate the volatility of the portfolio, perhaps by a generalized
autoregressive conditional heteroscedasticity (GARCH) or ex-
ponential smoothing, and then compute VaR from this, often as-
suming normality. Others use rolling historical quantiles under
the assumption that any return in a particular period is equally
likely. A third approach appeals to extreme value theory.
It is easy to criticize each of these approaches. The volatility
approach assumes that the negative extremes follow the same
process as the rest of the returns and that the distribution of the
returns divided by standard deviations will be iid, if not normal.
The rolling historical quantile method assumes that for a certain
window, such as a year, any return is equally likely, but a return
more than a year old has zero probability of occurring. It is easy
to see that the VaR of a portfolio will drop dramatically just
1 year after a very bad day. Implicit in this methodology is the
assumption that the distribution of returns does not vary over
time, at least within a year. An interesting variation of the his-
torical simulation method is the hybrid approach proposed by
Boudoukh, Richardson, and Whitelaw (1998), which combines
volatility and historical simulation methodologies by applying
exponentially declining weights to past returns of the portfolio.
However, both the choice of the parameters of interest and the
procedure behind the computation of the VaR seem to be ad
hoc and based on empirical justifications rather than on sound
statistical theory.
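To make the weighting scheme concrete, here is a minimal numpy sketch of the hybrid idea: exponentially declining weights are attached to past returns and the VaR is read off the weighted empirical distribution. The decay factor lam, the 250-observation window, and the Student-t returns are illustrative assumptions, not values taken from Boudoukh, Richardson, and Whitelaw (1998).

```python
import numpy as np

def hybrid_var(returns, theta=0.05, lam=0.98):
    """Hybrid (exponentially weighted) historical-simulation VaR.

    returns : 1-D array of past portfolio returns, oldest first
    theta   : tail probability (e.g., 0.05 for the 5% VaR)
    lam     : decay factor; lam -> 1 recovers plain historical simulation
    """
    n = len(returns)
    ages = np.arange(n - 1, -1, -1)          # newest observation has age 0
    w = lam ** ages
    w /= w.sum()
    order = np.argsort(returns)              # sort returns from worst to best
    cum_w = np.cumsum(w[order])              # accumulate weights up the sorted returns
    cutoff = returns[order][np.searchsorted(cum_w, theta)]
    return -cutoff                           # VaR reported as a positive number

# usage: 5% one-day VaR from 250 simulated fat-tailed returns
past = np.random.default_rng(0).standard_t(df=5, size=250)
print(hybrid_var(past, theta=0.05, lam=0.98))
```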
Applications of extreme quantile estimation methods to
VaR have recently been proposed (see, e.g., Danielsson and
de Vries 2000). The intuition here is to exploit results from sta-
tistical extreme value theory and to concentrate the attention
on the asymptotic form of the tail, rather than modeling the
whole distribution. There are two problems with this approach.
First, it works only for very low probability quantiles. As shown
by Danielsson and de Vries (2000), the approximation may be
very poor at very common probability levels (such as 5%), be-
cause they are not “extreme” enough. Second, and most impor-
tant, these models are nested in a framework of iid variables,
which is not consistent with the characteristics of most financial
datasets, and, consequently, the risk of a portfolio may not vary
with the conditioning information set. Recently, McNeil and
Frey (2000) suggested fitting a GARCH model to the time se-
ries of returns and then applying the extreme value theory to the
standardized residuals, which are assumed to be iid. Although
it is an improvement over existing applications, this approach
still suffers from the same problems as the volatility models.
Chernozhukov (2000) and Manganelli and Engle (2004) have
shown how extreme value theory can be incorporated into the
regression quantile framework.
3. CAVIAR
We propose a different approach to quantile estimation. In-
stead of modeling the whole distribution, we model the quantile
directly. The empirical fact that volatilities of stock market re-
turns cluster over time may be translated in statistical words by
saying that their distribution is autocorrelated. Consequently,
the VaR, which is tightly linked to the standard deviation of the
distribution, must exhibit similar behavior. A natural way to for-
malize this characteristic is to use some type of autoregressive
specification. We propose a conditional autoregressive quantile
specification, which we call CAViaR.
Suppose that we observe a vector of portfolio returns, {yt}Tt=1.
Let θ be the probability associated with VaR, let xt be a vector
of time t observable variables, and let βθ be a p-vector of un-
known parameters. Finally, let ft(β) ≡ ft(xt−1,βθ ) denote the
time t θ -quantile of the distribution of portfolio returns formed
at time t − 1, where we suppress the θ subscript from βθ for
notational convenience. A generic CAViaR specification might
be the following:
ft(β) = β0 + ∑qi=1 βi ft−i(β) + ∑rj=1 βj l(xt−j),   (1)
where p = q + r + 1 is the dimension of β and l is a function
of a finite number of lagged values of observables. The autore-
gressive terms βi ft−i(β), i = 1, . . . ,q, ensure that the quantile
changes “smoothly” over time. The role of l(xt−j) is to link
ft(β) to observable variables that belong to the information set.
This term thus has much the same role as the news impact curve
for GARCH models introduced by Engle and Ng (1993). A nat-
ural choice for xt−1 is lagged returns. Indeed, we would expect
the VaR to increase as yt−1 becomes very negative, because one
bad day makes the probability of the next somewhat greater. It
might be that very good days also increase VaR, as would be
the case for volatility models. Hence, VaR could depend sym-
metrically on |yt−1|.
Next we discuss some examples of CAViaR processes
that we estimate. Throughout, we use the notation (x)+ =
max(x,0), (x)− = −min(x,0).
Adaptive:
ft(β1) = ft−1(β1) + β1{[1 + exp(G[yt−1 − ft−1(β1)])]−1 − θ},
where G is some positive finite number. Note that as G → ∞,
the last term converges almost surely to β1[I( yt−1 ≤ ft−1(β1))−
θ ], where I(·) represents the indicator function; for finite G, this
model is a smoothed version of a step function. The adaptive
model incorporates the following rule: Whenever you exceed
your VaR, you should immediately increase it, but when you do
not exceed it, you should decrease it very slightly. This strategy
obviously will reduce the probability of sequences of hits and
will also make it unlikely that there will never be hits. But it
learns little from returns that are close to the VaR or are ex-
tremely positive, when G is large. It increases the VaR by the
same amount regardless of whether the returns exceeded the
VaR by a small margin or a large margin. This model has a unit
coefficient on the lagged VaR. Other alternatives are as follows:
Symmetric absolute value:
ft(β) = β1 + β2 ft−1(β) + β3|yt−1|.
Asymmetric slope:
ft(β) = β1 + β2 ft−1(β) + β3( yt−1)+ + β4( yt−1)−.
Indirect GARCH(1,1):
ft(β) = (β1 + β2 ft−1²(β) + β3 yt−1²)^{1/2}.
The first and third of these respond symmetrically to past re-
turns, whereas the second allows the response to positive and
negative returns to be different. All three are mean-reverting
in the sense that the coefficient on the lagged VaR is not con-
strained to be 1.
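As an illustration of how such a recursion is evaluated in practice, the following sketch computes the symmetric absolute value specification; the other specifications differ only in the one-step update. The parameter values are placeholders (not the estimates reported later), f_t denotes the θ-quantile as in the text, and the initialization by an empirical quantile mirrors the choice described in the empirical section.

```python
import numpy as np

def caviar_sav(y, beta, theta=0.05, init_window=300):
    """Symmetric absolute value CAViaR recursion:
       f_t = beta1 + beta2 * f_{t-1} + beta3 * |y_{t-1}|,
       where f_t denotes the theta-quantile of y_t given past information."""
    b1, b2, b3 = beta
    f = np.empty(len(y))
    f[0] = np.quantile(y[:init_window], theta)   # empirical quantile as initial condition
    for t in range(1, len(y)):
        f[t] = b1 + b2 * f[t - 1] + b3 * abs(y[t - 1])
    return f

# illustrative call with placeholder parameters (not estimates from this article)
y = np.random.default_rng(1).normal(size=1000)
f = caviar_sav(y, beta=(-0.05, 0.9, -0.15), theta=0.05)
var = -f                                         # VaR reported as a positive number
```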
The indirect GARCH model would be correctly specified if
the underlying data were truly a GARCH(1,1) with an iid er-
ror distribution. The symmetric absolute value and asymmetric
slope quantile specifications would be correctly specified by a
GARCH process in which the standard deviation, rather than
the variance, is modeled either symmetrically or asymmetri-
cally with iid errors. This model was introduced and estimated
by Taylor (1986) and Schwert (1988) and analyzed by Engle
(2002). But the CAViaR specifications are more general than
these GARCH models. Various forms of non-iid error distrib-
utions can be modeled in this way. In fact, these models can
be used for situations with constant volatilities but changing er-
ror distributions, or situations in which both error densities and
volatilities are changing.
4. REGRESSION QUANTILES
The parameters of CAViaR models are estimated by regres-
sion quantiles, as introduced by Koenker and Bassett (1978).
Koenker and Bassett showed how to extend the notion of a sam-
ple quantile to a linear regression model. Consider a sample of
observations y1, . . . , yT generated by the model
yt = x′tβ0 + εθt, Quantθ(εθt|xt) = 0,   (2)
where xt is a p-vector of regressors and Quantθ(εθt|xt) is the
θ-quantile of εθt conditional on xt. Let ft(β) ≡ x′tβ. Then the
θth regression quantile is defined as any βˆ that solves
minβ (1/T) ∑Tt=1 [θ − I(yt < ft(β))][yt − ft(β)].   (3)
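The objective in equation (3) is straightforward to code for any candidate quantile model. The sketch below evaluates it for a user-supplied function f(beta, y) returning the conditional quantiles; the constant-quantile model in the usage lines is purely illustrative.

```python
import numpy as np

def rq_objective(beta, y, f, theta):
    """Regression quantile ('tick') loss of equation (3):
       (1/T) * sum_t [theta - I(y_t < f_t(beta))] * [y_t - f_t(beta)].
       f(beta, y) must return the vector of conditional quantiles f_t(beta)."""
    u = y - f(beta, y)
    return np.mean((theta - (u < 0)) * u)

# usage with a purely illustrative constant-quantile model f_t(beta) = beta[0]
const_model = lambda beta, y: np.full_like(y, beta[0], dtype=float)
y = np.random.default_rng(2).normal(size=500)
print(rq_objective(np.array([-1.6]), y, const_model, theta=0.05))
```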
Regression quantiles include as a special case the least ab-
solute deviation (LAD) model. It is well known that LAD
is more robust than ordinary least squares (OLS) estimators
whenever the errors have a fat-tailed distribution. Koenker and
Bassett (1978), for example, ran a simple Monte Carlo exper-
iment and showed how the empirical variance of the median,
compared with the variance of the mean, is slightly higher un-
der the normal distribution but much lower under all of the other
distributions considered.
Analysis of linear regression quantile models has been ex-
tended to cases with heteroscedastic (Koenker and Bassett
1982) and nonstationary dependent errors (Portnoy 1991), time
series models (Bloomfield and Steiger 1983), simultaneous
equations models (Amemiya 1982; Powell 1983), and censored
regression models (Powell 1986; Buchinsky and Hahn 1998).
Extensions to the autoregressive quantiles have been proposed
by Koenker and Zhao (1996) and Koul and Saleh (1995). These
approaches differ from the one proposed in this article in that all
of the variables are observable and the models are linear in the
parameters. In the nonlinear case, asymptotic theory for mod-
els with serially independent (but not identically distributed)
errors has been proposed by, among others, Oberhofer (1982),
Dupacova (1987), Powell (1991), and Jureckova and Prochazka
(1993). There is relatively little literature that considers nonlin-
ear quantile regressions in the context of time series. The most
important contributions are those by White (1994, cor. 5.12),
who proved the consistency of the nonlinear regression quan-
tile, both in the iid and stationary dependent cases, and by Weiss
(1991), who showed consistency, asymptotic normality and as-
ymptotic equivalence of Lagrange multiplier (LM) and Wald
tests for LAD estimators for nonlinear dynamic models. Finally,
Mukherjee (1999) extended the concept of regression and au-
toregression quantiles to nonlinear time series models with iid
error terms.
Consider the model
yt = f(yt−1, xt−1, . . . , y1, x1; β0) + εtθ [Quantθ(εtθ|Ωt) = 0]
   ≡ ft(β0) + εtθ, t = 1, . . . , T,   (4)
where f1(β0) is some given initial condition, xt is a vector of
exogenous or predetermined variables, β0 ∈ R^p is the vector of
true unknown parameters that need to be estimated, and Ωt =
[yt−1, xt−1, . . . , y1, x1, f1(β0)] is the information set available
at time t. Let βˆ be the parameter vector that minimizes (3).
Theorems 1 and 2 show that the nonlinear regression quantile
estimator βˆ is consistent and asymptotically normal. Theorem 3
provides a consistent estimator of the variance–covariance ma-
trix. In Appendix A we give sufficient conditions on f in (4),
together with technical assumptions, for these results to hold.
The proofs are extensions of work of Weiss (1991) and Pow-
ell (1984, 1986, 1991) and are in Appendix B. We denote the
conditional density of εtθ evaluated at 0 by ht(0|Ωt), denote the
1 × p gradient of ft(β) by ∇ft(β), and define ∇f (β) to be a
T × p matrix with typical row ∇ft(β).
Theorem 1 (Consistency). In model (4), under assump-
tions C0–C7 (see App. A), βˆ p→ β0, where βˆ is the solution
to
minβ T−1 ∑Tt=1 {[θ − I(yt < ft(β))] · [yt − ft(β)]}.
Proof. See Appendix B.
Theorem 2 (Asymptotic normality). In model (4), under as-
sumptions AN1–AN4 and the conditions of Theorem 1,
√T A−1/2T DT(βˆ − β0) d→ N(0, I),
where
AT ≡ E[T−1θ(1 − θ) ∑Tt=1 ∇′ft(β0)∇ft(β0)],
DT ≡ E[T−1 ∑Tt=1 ht(0|Ωt)∇′ft(β0)∇ft(β0)],
and βˆ is computed as in Theorem 1.
Proof. See Appendix B.
Theorem 3 (Variance–covariance matrix estimation). Un-
der assumptions VC1–VC3 and the conditions of Theorems
1 and 2, AˆT − AT p→0 and DˆT − DT p→0, where
AˆT = T−1θ(1 − θ)∇′f (βˆ)∇f (βˆ),
DˆT = (2TcˆT)−1 ∑Tt=1 I(|yt − ft(βˆ)| < cˆT)∇′ft(βˆ)∇ft(βˆ),
AT and DT have been defined in Theorem 2, and cˆT is a band-
width defined in assumption VC1.
Proof. See Appendix B.
In the proof of Theorem 1, we apply corollary 5.12 of
White (1994), which establishes consistency results for non-
linear models of regression quantiles in a dynamic context.
Assumption C1, requiring continuity in the vector of para-
meters β of the quantile specification, is clearly satisfied by
all of the CAViaR models considered in this article. Assump-
tions C3 and C7 are identification conditions that are common
in the regression quantile literature. Assumptions C4 and C5
are dominance conditions that rule out explosive behaviors
(e.g., the CAViaR equivalent of an indirect integrated GARCH
(IGARCH) process would not be covered by these conditions).
Derivation of the asymptotic distribution builds on the
approximation of the discontinuous gradient of the objec-
tive function with a smooth differentiable function, so that
the usual Taylor expansion can be performed. Assumptions
AN1 and AN2 impose sufficient conditions on the func-
tion ft(β) and on the conditional density function of the error
terms to ensure that this smooth approximation will be suffi-
ciently well behaved. The device for obtaining such an approx-
imation is provided by an extension of theorem 3 of Huber
(1967). This technique is standard in the regression quantile
and LAD literature (Powell 1984, 1991; Weiss 1991). Alterna-
tive strategies for deriving the asymptotic distribution are the
approach suggested by Amemiya (1982), based on the approx-
imation of the regression quantile objective function by a con-
tinuously differentiable function, and the approach based on
empirical processes as suggested by, for example, van de Geer
(2000).
Regarding the variance–covariance matrix, note that AˆT is
simply the outer product of the gradient. Estimation of the
DT matrix is less straightforward, because it involves the
ht(0|t) term. Following Powell (1984, 1986, 1991), we pro-
pose an estimator that combines kernel density estimation with
the heteroscedasticity-consistent covariance matrix estimator
of White (1980). Our Theorem 3 is a generalization of Powell’s
(1991) theorem 3 that accommodates the nonlinear dependent
case. Buchinsky (1995) reported a Monte Carlo study on the
estimation of the variance–covariance matrices in quantile re-
gression models.
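A sketch of how the estimators of Theorem 3 translate into standard errors is given below: Â_T is the outer product of the gradient, D̂_T combines a uniform kernel in the residuals with the gradient outer product, and the sandwich T⁻¹D̂⁻¹ÂD̂⁻¹ estimates the variance of βˆ. The bandwidth rule used when none is supplied is a simple placeholder, not the k-nearest-neighbor choice used later in the empirical section.

```python
import numpy as np

def caviar_vcov(resid, grad, theta, c=None):
    """Sandwich variance estimate T^-1 * D^-1 A D^-1 based on Theorem 3.

    resid : length-T vector of residuals y_t - f_t(beta_hat)
    grad  : T x p matrix whose rows are the gradients of f_t at beta_hat
    c     : bandwidth c_T; if None, a simple placeholder rule of thumb is used
    """
    T, _ = grad.shape
    A = theta * (1 - theta) * (grad.T @ grad) / T          # outer product of the gradient
    if c is None:
        c = np.std(resid) * T ** (-1 / 3)                  # illustrative bandwidth only
    k = (np.abs(resid) < c).astype(float)                  # uniform kernel weights
    D = (grad.T * k) @ grad / (2 * T * c)                  # kernel estimate of D_T
    Dinv = np.linalg.inv(D)
    vcov = Dinv @ A @ Dinv / T
    return vcov, np.sqrt(np.diag(vcov))                    # covariance and standard errors
```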
Note that all of the models considered in Section 3 satisfy
the continuity and differentiability assumptions C1 and AN1
of Appendix A. The others are technical assumptions that are
impossible to verify in finite samples.
5. TESTING QUANTILE MODELS
If model (4) is the true data generating process (DGP), then
Pr[ yt < ft(β0)] = θ ∀ t. This is equivalent to requiring that
the sequence of indicator functions {I( yt < ft(β0))}Tt=1 be iid.
Hence a property that any VaR estimate should satisfy is that
of providing a filter to transform a (possibly) serially corre-
lated and heteroscedastic time series into a serially independent
sequence of indicator functions. A natural way to test the va-
lidity of the forecast model is to check whether the sequence
{I( yt < ft(β0))}Tt=1 ≡ {It}Tt=1 is iid, as was done by, for example,
Granger, White, and Kamstra (1989) and Christoffersen (1998).
Although these tests can detect the presence of serial correla-
tion in the sequence of indicator functions {It}Tt=1, this is only a
necessary but not sufficient condition to assess the performance
of a quantile model. Indeed, it is not difficult to generate a se-
quence of independent {It}Tt=1 from a given sequence of {yt}Tt=1.
It suffices to define a sequence of independent random variables
{zt}Tt=1, such that
zt = 1 with probability θ and zt = −1 with probability (1 − θ).   (5)
Then setting ft(β0) = Kzt , for K large, will do the job. No-
tice, however, that once zt is observed, the probability of ex-
ceeding the quantile is known to be almost 0 or 1. Thus the
unconditional probabilities are correct and serially uncorre-
lated, but the conditional probabilities given the quantile are
not. This example is an extreme case of quantile measure-
ment error. Any noise introduced into the quantile estimate will
change the conditional probability of a hit given the estimate
itself.
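A small simulation makes the point of this example concrete: the forecast Kz_t of equation (5) delivers the right unconditional fraction of hits, while the hit probability conditional on the forecast itself is close to 0 or 1. The values of K and the sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, K, T = 0.05, 100.0, 100_000

y = rng.normal(size=T)                                   # any return series
z = np.where(rng.uniform(size=T) < theta, 1.0, -1.0)     # independent z_t of equation (5)
f = K * z                                                # "quantile" forecast K * z_t
hit = y < f                                              # exceedance indicator

print(hit.mean())                                        # close to theta: unconditional coverage looks fine
print(hit[z == 1].mean(), hit[z == -1].mean())           # roughly 1 and 0: conditional coverage fails
```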
Therefore, none of these tests has power against this form of
misspecification and none can be simply extended to examine
other explanatory variables. We propose a new test that can be
easily extended to incorporate a variety of alternatives. Define
Hitt(β0) ≡ I(yt < ft(β0)) − θ.   (6)
The Hitt(β0) function assumes value (1 − θ) every time yt is
less than the quantile and −θ otherwise. Clearly, the expected
value of Hitt(β0) is 0. Furthermore, from the definition of the
quantile function, the conditional expectation of Hitt(β0) given
any information known at t − 1 must also be 0. In particu-
lar, Hitt(β0) must be uncorrelated with its own lagged values
and with ft(β0), and must have expected value equal to 0. If
Hitt(β0) satisfies these moment conditions, then there will be
no autocorrelation in the hits, no measurement error as in (5),
and the correct fraction of exceptions. Whether there is the right
proportion of hits in each calendar year can be determined by
checking the correlation of Hitt(β0) with annual dummy vari-
ables. If other functions of the past information set, such as
rolling standard deviations or a GARCH volatility estimate, are
suspected of being informative, then these can be incorporated.
A natural way to set up a test is to check whether the test
statistic T−1/2X′(βˆ)Hit(βˆ) is significantly different from 0,
where Xt(βˆ), t = 1, . . . , T, the typical row of X(βˆ) (possibly
depending on βˆ), is a q-vector measurable Ωt and Hit(βˆ) ≡
[Hit1(βˆ), . . . , HitT(βˆ)]′.
Let MT ≡ X′(β0) − E[T−1X′(β0)H∇f(β0)]D−1T ∇′f(β0), where H is
a diagonal matrix with typical entry ht(0|Ωt). Theorem 4 derives
the in-sample distribution of the DQ test. The out-of-sample case
is considered in Theorem 5.
Theorem 4 (In-sample dynamic quantile test). Under the as-
sumptions of Theorems 1 and 2 and assumptions DQ1–DQ6,
[θ(1 − θ)E(T−1MTM′T)]−1/2 T−1/2X′(βˆ)Hit(βˆ) d∼ N(0, I).
If assumption DQ7 and the conditions of Theorem 3 also hold,
then
DQIS ≡ Hit′(βˆ)X(βˆ)(MˆT Mˆ′T)−1X′(βˆ)Hit(βˆ) / [θ(1 − θ)] d∼ χ²q as T → ∞,
where
MˆT ≡ X′(βˆ) − {(2TcˆT)−1 ∑Tt=1 I(|yt − ft(βˆ)| < cˆT) X′t(βˆ)∇ft(βˆ)} Dˆ−1T ∇′f(βˆ).
Proof. See Appendix B.
If X(βˆ) contains m < q lagged Hitt−i(βˆ) (i = 1, . . . ,m),
then X(βˆ),Hit(βˆ), and ∇f (βˆ) are not conformable, because
X(βˆ) contains only (T − m) elements. Here we implicitly as-
sume, without loss of generality, that the matrices are made
conformable by deleting the first m rows of ∇f (βˆ) and X(βˆ).
Note that if we choose X(βˆ) = ∇f (βˆ), then M = 0, where
0 is a (p,p) matrix of 0’s. This is consistent with the fact that
T−1/2∇′f (βˆ)Hit(βˆ) = op(1), by the first-order conditions of
the regression quantile framework.
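For reference, the following sketch shows how the in-sample DQ statistic of Theorem 4 can be assembled from the residuals, the gradient of the quantile function, and a matrix of instruments, reusing the uniform-kernel bandwidth device of Theorem 3. Inputs and the bandwidth are left to the user; this is an illustrative implementation, not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def dq_in_sample(resid, grad, X, theta, c):
    """In-sample DQ statistic in the spirit of Theorem 4.

    resid : y_t - f_t(beta_hat), length T
    grad  : T x p matrix of gradients of f_t at beta_hat
    X     : T x q matrix of instruments, each row measurable at time t
    c     : bandwidth c_T chosen by the user
    """
    T = len(resid)
    hit = (resid < 0).astype(float) - theta               # Hit_t(beta_hat)
    k = (np.abs(resid) < c).astype(float)                 # uniform kernel weights
    D = (grad.T * k) @ grad / (2 * T * c)                 # D_T estimate of Theorem 3
    G = (X.T * k) @ grad / (2 * T * c)                    # kernel term multiplying D^-1
    M = X.T - G @ np.linalg.solve(D, grad.T)              # M_T estimate, q x T
    xh = X.T @ hit
    stat = xh @ np.linalg.solve(M @ M.T, xh) / (theta * (1 - theta))
    return stat, chi2.sf(stat, X.shape[1])                # statistic and chi-square p value
```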
To derive the out-of-sample DQ test, let TR denote the num-
ber of in-sample observations and let NR denote the num-
ber of out-of-sample observations (with the dependence of
TR and NR on R as specified in assumption DQ8). Make ex-
plicit the dependence of the relevant variables on the num-
ber of observations, using appropriate subscripts. Define the
q-vector measurable Ωn Xn(βˆTR), n = TR + 1, . . . , TR + NR,
as the typical row of X(βˆTR), possibly depending on βˆTR , and
Hit(βˆTR) ≡ [HitTR+1(βˆTR), . . . ,HitTR+NR(βˆTR)]′.
Theorem 5 (Out-of-sample dynamic quantile test). Under the
assumptions of Theorems 1 and 2 and assumptions DQ1–DQ3,
DQ8, and DQ9,
DQOOS ≡ N−1R Hit′(βˆTR)X(βˆTR)[X′(βˆTR)X(βˆTR)]−1X′(βˆTR)Hit(βˆTR)/(θ(1 − θ)) d∼ χ²q as R → ∞.
Proof. See Appendix B.
The in-sample DQ test is a specification test for the partic-
ular CAViaR process under study and it can be very useful for
model selection purposes. The simpler version of the out-of-
sample DQ test, instead, can be used by regulators to check
whether the VaR estimates submitted by a financial institution
satisfy some basic requirements that every good quantile esti-
mate must satisfy, such as unbiasedness, independent hits, and
independence of the quantile estimates. The nicest features of
the out-of-sample DQ test are its simplicity and the fact that
it does not depend on the estimation procedure: to implement
it, the evaluator (either the regulator or the risk manager) just
needs a sequence of VaRs and the corresponding values of the
portfolio.
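Because the out-of-sample test needs only the VaR series and the realized returns, it is easy to script. The sketch below uses a constant, the VaR forecast, and four lagged hits as instruments, mirroring the choice described in the empirical section, and computes the quadratic form Hit′X(X′X)⁻¹X′Hit/(θ(1 − θ)) against a χ²_q distribution, which is the normalization commonly used when applying the test; see Theorem 5 for the formal statement.

```python
import numpy as np
from scipy.stats import chi2

def dq_out_of_sample(y, var, theta, n_lags=4):
    """DQ-type out-of-sample test from returns y and positive VaR forecasts var."""
    hit = (y < -var).astype(float) - theta                # Hit_t = I(y_t < -VaR_t) - theta
    T = len(hit)
    # instruments: constant, VaR forecast, and n_lags lagged hits (known at t - 1)
    X = np.column_stack(
        [np.ones(T - n_lags), var[n_lags:]]
        + [hit[n_lags - i - 1 : T - i - 1] for i in range(n_lags)]
    )
    h = hit[n_lags:]
    xh = X.T @ h
    stat = xh @ np.linalg.solve(X.T @ X, xh) / (theta * (1 - theta))
    return stat, chi2.sf(stat, X.shape[1])                # statistic and chi-square p value
```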
6. EMPIRICAL RESULTS
To implement our methodology on real data, a researcher
needs to construct the historical series of portfolio returns and
to choose a specification of the functional form of the quantile.
We took a sample of 3,392 daily prices from Datastream for
General Motors (GM), IBM, and the S&P 500, and computed
the daily returns as 100 times the difference of the log of the
prices. The samples range from April 7, 1986, to April 7, 1999.
We used the first 2,892 observations to estimate the model and
the last 500 for out-of-sample testing. We estimated 1% and 5%
1-day VaRs, using the four CAViaR specifications described in
Section 3. For the adaptive model, we set G = 10, where G en-
tered the definition of the adaptive model in Section 3. In prin-
ciple, the parameter G itself could be estimated; however, this
would go against the spirit of this model, which is simplicity.
The 5% VaR estimates for GM are plotted in Figure 1, and all
of the results are reported in Table 1.
The table presents the value of the estimated parameters,
the corresponding standard errors and (one-sided) p values, the
value of the regression quantile objective function [eq. (3)], the
percentage of times the VaR is exceeded, and the p value of
the DQ test, both in-sample and out-of-sample. To compute
the VaR series with the CAViaR models, we initialize f1(β)
to the empirical θ -quantile of the first 300 observations. The
instruments used in the out-of-sample DQ test were a con-
stant, the VaR forecast and the first four lagged hits. For the
in-sample DQ test, we did not include the constant and the
VaR forecast, because for some models there was collinearity
Table 1. Estimates and Relevant Statistics for the Four CAViaR Specifications

                              Symmetric absolute value     Asymmetric slope             Indirect GARCH               Adaptive
                              GM      IBM     S&P 500      GM      IBM     S&P 500      GM      IBM     S&P 500      GM      IBM     S&P 500
1% VaR
Beta1                         .4511   .1261   .2039        .3734   .0558   .1476        1.4959  1.3289  .2328        .2968   .1626   .5562
Standard errors               .2028   .0929   .0604        .2418   .0540   .0456        .9252   1.9488  .1191        .1109   .0736   .1150
p values                      .0131   .0872   .0004        .0613   .1509   .0006        .0530   .2477   .0253        .0037   .0136   0
Beta2                         .8263   .9476   .8732        .7995   .9423   .8729        .7804   .8740   .8350
Standard errors               .0826   .0501   .0507        .0869   .0247   .0302        .0590   .1133   .0225
p values                      0       0       0            0       0       0            0       0       0
Beta3                         .3305   .1134   .3819        .2779   .0499   −.0139       .9356   .3374   1.0582
Standard errors               .1685   .1185   .2772        .1398   .0563   .1148        1.2619  .0953   1.0983
p values                      .0249   .1692   .0842        .0235   .1876   .4519        .2292   .0002   .1676
Beta4                                                      .4569   .2512   .4969
Standard errors                                            .1787   .0848   .1342
p values                                                   .0053   .0015   .0001
RQ                            172.04  182.32  109.68       169.22  179.40  105.82       170.99  183.43  108.34       179.61  192.20  117.42
Hits in-sample (%)            1.0028  .9682   1.0028       .9682   1.0373  .9682        1.0028  1.0028  1.0028       .9682   1.2448  .9336
Hits out-of-sample (%)        1.4000  1.6000  1.8000       1.4000  1.6000  1.6000       1.2000  1.6000  1.8000       1.8000  1.6000  1.2000
DQ in-sample (p values)       .6349   .5375   .3208        .5958   .7707   .5450        .5937   .5798   .7486        .0117   0∗      .1697
DQ out-of-sample (p values)   .8965   .0326   .0191        .9432   .0431   .0476        .9305   .0350   .0309        .0017∗  .0009∗  .0035∗

5% VaR
Beta1                         .1812   .1191   .0511        .0760   .0953   .0378        .3336   .5387   .0262        .2871   .3969   .3700
Standard errors               .0833   .0839   .0083        .0249   .0532   .0135        .1039   .1569   .0100        .0506   .0812   .0767
p values                      .0148   .0778   0            .0011   .0366   .0026        .0007   .0003   .0043        0       0       0
Beta2                         .8953   .9053   .9369        .9326   .8892   .9025        .9042   .8259   .9287
Standard errors               .0361   .0500   .0224        .0194   .0385   .0144        .0134   .0294   .0061
p values                      0       0       0            0       0       0            0       0       0
Beta3                         .1133   .1481   .1341        .0398   .0617   .0377        .1220   .1591   .1407
Standard errors               .0122   .0348   .0517        .0322   .0272   .0224        .1149   .1152   .6198
p values                      0       0       .0047        .1088   .0117   .0457        .1441   .0836   .4102
Beta4                                                      .1218   .2187   .2871
Standard errors                                            .0405   .0465   .0258
p values                                                   .0013   0       0
RQ                            550.83  522.43  306.68       548.31  515.58  300.82       552.12  524.79  305.93       553.79  527.72  312.06
Hits in-sample (%)            4.9793  5.0138  5.0484       4.9101  4.9793  5.0138       4.9793  5.0484  5.0138       4.9101  4.8409  4.7372
Hits out-of-sample (%)        4.8000  6.0000  5.6000       5.0000  7.4000  6.4000       4.6000  7.4000  5.8000       6.0000  5.0000  4.6000
DQ in-sample (p values)       .3609   .0824   .3685        .9132   .6149   .9540        .1037   .1727   .2661        .0543   .0032∗  .0380
DQ out-of-sample (p values)   .9855   .0884   .0005∗       .9235   .0071∗  .0007∗       .8770   .1208   .0001∗       .3681   .5021   .0240

NOTE: Significant coefficients at 5% formatted in bold; "∗" denotes rejection from the DQ test at 1% significance level.
Figure 1. 5% Estimated CAViaR Plots for GM: (a) Symmetric Absolute Value; (b) Asymmetric Slope; (c) GARCH; (d) Adaptive.
Because VaR is usually reported as a positive number, we set VaR̂t−1 = −ft−1(βˆ). The sample ranges from April 7, 1986, to April 7,
1999. The spike at the beginning of the sample is the 1987 crash. The increase in the quantile estimates toward the end of the sample reflects
the increase in overall volatility following the Russian and Asian crises.
with the matrix of derivatives. We computed the standard er-
rors and the variance–covariance matrix of the in-sample DQ
test as described in Theorems 3 and 4. The formulas to com-
pute DˆT and MˆT were implemented using k-nearest neighbor
estimators, with k = 40 for 1% VaR and k = 60 for 5% VaR.
As optimization routines, we used the Nelder–Mead simplex
algorithm and a quasi-Newton method. All of the computations
were done in MATLAB 6.1, using the functions fminsearch and
fminunc as optimization algorithms. The loops to compute the
recursive quantile functions were coded in C.
We optimized the models using the following procedure. We
generated n vectors using a uniform random number gener-
ator between 0 and 1. We computed the regression quantile
(RQ) function described in equation (3) for each of these vec-
tors and selected the m vectors that produced the lowest RQ
criterion as initial values for the optimization routine. We set
n = [10^4, 10^5, 10^4, 10^4] and m = [10, 15, 10, 5] for the sym-
metric absolute value, asymmetric slope, indirect GARCH, and
adaptive models. For each of these initial values, we ran first
the simplex algorithm. We then fed the optimal parameters to
the quasi-Newton algorithm and chose the new optimal parame-
ters as the new initial conditions for the simplex. We repeated
this procedure until the convergence criterion was satisfied. Tol-
erance levels for the function and the parameter values were
set to 10^−10. Finally, we selected the vector that produced the
lowest RQ criterion. An alternative optimization routine is the
interior point algorithm for nonlinear regression quantiles sug-
gested by Koenker and Park (1996).
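A condensed sketch of this multistart procedure, using scipy routines in place of the MATLAB functions mentioned above, is given below. The number of random draws, the number of retained starting values, and the fixed number of simplex/quasi-Newton alternations are illustrative simplifications of the convergence-based scheme described in the text.

```python
import numpy as np
from scipy.optimize import minimize

def fit_caviar(y, theta, model, p, n=1000, m=5, rounds=3, seed=0):
    """Random multistart plus simplex / quasi-Newton refinement of the RQ criterion.

    model(beta, y) must return the vector of conditional quantiles f_t(beta)."""
    rng = np.random.default_rng(seed)

    def obj(b):                                            # RQ loss of equation (3)
        u = y - model(b, y)
        return np.mean((theta - (u < 0)) * u)

    draws = rng.uniform(size=(n, p))                       # step 1: n uniform candidate vectors
    starts = draws[np.argsort([obj(b) for b in draws])[:m]]

    best_b, best_val = None, np.inf
    for b in starts:                                       # step 2: refine the m best candidates
        for _ in range(rounds):                            # fixed number of alternations (simplification)
            b = minimize(obj, b, method="Nelder-Mead").x   # simplex step
            b = minimize(obj, b, method="BFGS").x          # quasi-Newton step
        val = obj(b)
        if val < best_val:
            best_b, best_val = b, val
    return best_b, best_val
```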
Figure 2 plots the CAViaR news impact curve for the 1%
VaR estimates of the S&P 500. Notice how the adaptive and the
asymmetric slope news impact curves differ from the others.
For both indirect GARCH and symmetric absolute value mod-
els, past returns (either positive or negative) have a symmetric
Figure 2. 1% CAViaR News Impact Curve for S&P 500 for (a) Symmetric Absolute Value, (b) Asymmetric Slope, (c) Indirect GARCH, and
(d) Adaptive. For given estimated parameter vector βˆ and setting (arbitrarily) VaR̂t−1 = −1.645, the CAViaR news impact curve shows how
VaR̂t changes as lagged portfolio returns yt−1 vary. The strong asymmetry of the asymmetric slope news impact curve suggests that negative
returns might have a much stronger effect on the VaR estimate than positive returns.
impact on VaR. In contrast, for the adaptive model, the most
important news is whether or not past returns exceeded the pre-
vious VaR estimate. Finally, the sharp difference between the
impact of positive and negative returns in the asymmetric slope
model suggests that there might be relevant asymmetries in the
behavior of the 1% quantile of this portfolio.
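A news impact curve such as those in Figure 2 can be traced out by holding the lagged quantile fixed, moving the lagged return over a grid, and evaluating the one-step update. The sketch below does this for the symmetric absolute value update with placeholder parameters; the fixed lagged value of −1.645 follows the convention used in the figure.

```python
import numpy as np

def news_impact_curve(update, f_prev, y_grid):
    """Evaluate f_t = update(f_{t-1}, y_{t-1}) over a grid of lagged returns,
       holding the lagged quantile f_{t-1} fixed."""
    return np.array([update(f_prev, y) for y in y_grid])

# symmetric absolute value update with placeholder parameters (not Table 1 estimates)
b1, b2, b3 = -0.05, 0.9, -0.15
sav_update = lambda f_prev, y: b1 + b2 * f_prev + b3 * abs(y)

grid = np.linspace(-5, 5, 201)
curve = news_impact_curve(sav_update, f_prev=-1.645, y_grid=grid)
```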
Turning our attention to Table 1, the first striking result is
that the coefficient of the autoregressive term (β2) is always
very significant. This confirms that the phenomenon of clus-
tering of volatilities is relevant also in the tails. A second in-
teresting point is the precision of all the models, as measured
by the percentage of in-sample hits. This is not surprising, be-
cause the objective function of RQ models is designed exactly
to achieve this kind of result. The results for the 1% VaR show
that the symmetric absolute value, asymmetric slope, and indi-
rect GARCH models do a good job describing the evolution of
the left tail for the three assets under study. The results are par-
ticularly good for GM, producing a rather accurate percentage
of hits out-of-sample (1.4% for the symmetric absolute value
and the asymmetric slope and 1.2% for the indirect GARCH).
The performance of the adaptive model is inferior both in-
sample and out-of-sample, even though the percentage of hits
seems reasonably close to 1. This shows that looking only at
the number of exceptions, as suggested by the Basel Commit-
tee on Banking Supervision (1996), may be a very unsatisfac-
tory way of evaluating the performance of a VaR model. But
5% results present a different picture. All the models perform
well with GM. Note the remarkable precision of the percent-
age of out-of-sample hits generated by the asymmetric slope
model (5.0%). Also notice that this time the adaptive model, too,
is not rejected by the DQ tests. For IBM, the asymmetric slope,
of which the symmetric absolute value is a special case, tends
to overfit in-sample, providing a very poor performance out-of-
sample. Finally, for the S&P 500 5% VaR, only the adaptive
model survives the DQ test at the 1% confidence level, produc-
ing a rather accurate number of out-of-sample hits (4.6%). The
poor out-of-sample performance of the other models can prob-
ably be explained by the fact that the last part of the sample
of the S&P 500 is characterized by a sudden spurt of volatility
and roughly coincides with our out-of-sample period. Finally,
it is interesting to note in the asymmetric slope model how
the coefficients of the negative part of lagged returns are al-
ways strongly significant, whereas those associated with positive
returns are sometimes not significantly different from 0. This
indicates the presence of strong asymmetric impacts on VaR of
lagged returns.
The fact that the DQ tests select different models for differ-
ent confidence levels suggests that the process governing the
tail behavior might change as we move further out in the tail. In
particular, this contradicts the assumption behind GARCH and
RiskMetrics, because these approaches implicitly assume that
the tails follow the same process as the rest of the returns. Al-
though GARCH might be a useful model for describing the evo-
lution of volatility, the results in this article show that it might
provide an unsatisfactory approximation when applied to tail
estimation.
7. CONCLUSION
We have proposed a new approach to VaR estimation. Most
existing methods estimate the distribution of the returns and
then recover its quantile in an indirect way. In contrast, we di-
rectly model the quantile. To do this, we introduce a new class
of models, the CAViaR models, which specify the evolution of
the quantile over time using a special type of autoregressive
process. We estimate the unknown parameters by minimizing
the RQ loss function. We also introduce the DQ test, a new test
to evaluate the performance of quantile models. Applications
to real data illustrate the ability of CAViaR models to adapt to
new risk environments. Moreover, our findings suggest that the
process governing the behavior of the tails might be different
from that of the rest of the distribution.
ACKNOWLEDGMENTS
The authors thank Hal White, Moshe Buchinsky, Wouter
Den Haan, Joel Horowitz, Jim Powell, and the participants
of the UCSD, Yale, MIT-Harvard, Wharton, Iowa, and NYU
econometric seminars for their valuable comments. The views
expressed in this article are those of the authors and do not nec-
essarily reflect those of the European Central Bank.
APPENDIX A: ASSUMPTIONS
Consistency Assumptions
C0. (Ω, F, P) is a complete probability space, and {εtθ, xt},
t = 1,2, . . . , are random vectors on this space.
C1. The function ft(β): R^kt × B → R is such that for each
β ∈ B, a compact subset of R^p, ft(β) is measurable with
respect to the information set Ωt and ft(·) is continu-
ous in B, t = 1,2, . . . , for a given choice of explanatory
variables {yt−1,xt−1, . . . , y1,x1}.
C2. Conditional on all of the past information Ωt, the er-
ror terms εtθ form a stationary process, with continuous
conditional density ht(ε|Ωt).
C3. There exists h > 0 such that for all t, ht(0|Ωt) ≥ h.
C4. | ft(β)| < K(t) for each β ∈ B and for all t, where
K(t) is some (possibly) stochastic function of vari-
ables that belong to the information set, such that
E(|K(t)|) ≤ K0 < ∞, for some constant K0.
C5. E[|εtθ |] < ∞ for all t.
C6. {[θ − I( yt < ft(β))][ yt − ft(β)]} obeys the uniform law
of large numbers.
C7. For every ξ > 0, there exists a τ > 0 such that if
‖β − β0‖ ≥ ξ, then lim infT→∞ T−1 ∑Tt=1 P[|ft(β) −
ft(β0)| > τ] > 0.
Asymptotic Normality Assumptions
AN1. ft(β) is differentiable in B and for all β and γ in
a neighborhood υ0 of β0, such that ‖β − γ ‖ ≤ d for
d sufficiently small and for all t:
(a) ‖∇ft(β)‖ ≤ F(t), where F(t) is some (possi-
bly) stochastic function of variables that belong to
the information set and E(F(t)3) ≤ F0 < ∞, for
some constant F0.
(b) ‖∇ft(β) − ∇ft(γ )‖ ≤ M(t,β,γ ) = O(‖β −
γ ‖), where M(t,β,γ ) is some function such
that E[M(t,β,γ )]2 ≤ M0‖β − γ ‖ < ∞ and
E[M(t,β,γ )F(t)] ≤ M1‖β − γ ‖ < ∞ for
some constants M0 and M1.
AN2. (a) ht(ε|Ωt) ≤ N < ∞ ∀ t, for some constant N.
(b) ht(ε|Ωt) satisfies the Lipschitz condition |ht(λ1|Ωt) −
ht(λ2|Ωt)| ≤ L|λ1 − λ2| for some constant
L < ∞ ∀ t.
AN3. The matrices AT ≡ E[T−1θ(1 − θ)∑Tt=1 ∇′ft(β0)∇ft(β0)]
and DT ≡ E[T−1∑Tt=1 ht(0|Ωt)∇′ft(β0)∇ft(β0)] have
the smallest eigenvalues bounded below by a positive
constant for T sufficiently large.
AN4. The sequence {T−1/2 ∑Tt=1[θ − I( yt < ft(β0))] ·
∇′ft(β0)} obeys the central limit theorem.
Variance–Covariance Matrix Estimation Assumptions
VC1. cˆT/cT p→ 1, where the nonstochastic positive sequence
cT satisfies cT = o(1) and c−1T = o(T1/2).
VC2. E(|F(t)|4) ≤ F1 < ∞ for all t and for some con-
stant F1, where F(t) has been defined in assumption
AN1(a).
VC3. T−1θ(1 − θ)∑Tt=1 ∇′ft(β0)∇ft(β0) − AT p→ 0 and
T−1∑Tt=1 ht(0|Ωt)∇′ft(β0)∇ft(β0) − DT p→ 0.
In-Sample Dynamic Quantile Test Assumption
DQ1. Xt(β) is different elementwise from ∇ft(β), is
measurable Ωt, ‖Xt(β)‖ ≤ W(t), where W(t) is
some (possibly) stochastic function of variables that
belong to the information set, such that E[W(t) ×
M(t,β,γ )] ≤ W0‖β − γ ‖ < ∞ and E[[W(t) ·
F(t)]2] < W1 < ∞ for some finite constants
W0 and W1, and F(t) and M(t,β,γ ) have been
defined in AN1.
DQ2. ‖Xt(β) − Xt(γ)‖ ≤ S(t,β,γ), where E[S(t,β,
γ)] ≤ S0‖β − γ‖ < ∞ and E[W(t)S(t,β,γ)] ≤
S1‖β − γ‖ < ∞, for some constants S0 and S1.
DQ3. Let {ε1t, . . . , εJit} be the set of values for which Xt(β) is
not differentiable. Then Pr(εtθ = εjt) = 0 for j = 1,
. . . , Ji. Whenever the derivative exists, ‖∇Xt(β)‖ ≤
Z(t), where Z(t) is some (possibly) stochastic
function of variables that belong to the informa-
tion set, such that E[Z(t)r] < Z0 < ∞, r = 1,2, for
some constant Z0.
DQ4. T−1X′(β0)H∇f (β0)−E[T−1X′(β0)H∇f (β0)] p→0.
DQ5. T−1MTM′T − T−1E(MTM′T )
p→0, where MT ≡
X′(β0) − E[T−1X′(β0)H∇f (β0)] · D−1T · ∇′f (β0).
DQ6. The sequence {T−1/2MT Hit(β0)} obeys the central
limit theorem.
DQ7. T−1E(MTM′T) is a nonsingular matrix.
Out-of-Sample Dynamic Quantile Test Assumptions
DQ8. limR→∞ TR = ∞, limR→∞ NR = ∞, and
limR→∞ NR/TR = 0.
DQ9. The sequence {N−1/2R X′(β0)Hit(β0)} obeys the cen-
tral limit theorem.
APPENDIX B: PROOFS
Proof of Theorem 1
We verify that the conditions of corollary 5.12 of White
(1994, p. 75) are satisfied. We check only assumptions
3.1 and 3.2 of White’s corollary, because the others are ob-
viously satisfied.
Let QT(β) ≡ T−1 ∑Tt=1 qt(β) , where qt(β) ≡ [θ − I( yt <
ft(β))][ yt − ft(β)]. First, we need to show that E[qt(β)] exists
and is finite for every β . This can be easily checked as follows:
E[qt(β)] < E|yt − ft(β)| ≤ E|εtθ | + E| ft(β)| + E| ft(β0)| < ∞,
by assumptions C4 and C5. Moreover, because f is continuous
in β by assumption C1, qt(β) is continuous [because the re-
gression quantile objective function is continuous in f (β)], and
hence its expected value (which we just showed to be finite)
will be also continuous. It remains to show that E[VT(β)] =
E[QT(β) − QT(β0)] is uniquely minimized at β0 for T suffi-
ciently large. Let vt(β) ≡ qt(β) − qt(β0). Note that qt(β) =
[θ − I(εtθ < δt(β))][εtθ − δt(β)], where δt(β) ≡ ft(β) − ft(β0).
Then
vt(β) = (1 − θ)δt(β) if εtθ < δt(β) and εtθ < 0;
(1 − θ)δt(β) − εtθ if εtθ < δt(β) and εtθ > 0;
εtθ − θδt(β) if εtθ ≥ δt(β) and εtθ < 0;
−θδt(β) if εtθ ≥ δt(β) and εtθ > 0.
After some algebra, it can be shown that
E[vt(β)|Ωt] = I(δt(β) < 0) ∫_{−|δt(β)|}^{0} (λ + |δt(β)|) ht(λ|Ωt) dλ + I(δt(β) > 0) ∫_{0}^{|δt(β)|} (|δt(β)| − λ) ht(λ|Ωt) dλ.
Reasoning following Powell (1984), the continuity of ht(·|Ωt)
(assumption C2) and assumption C3 imply that there exists
h1 > 0 such that ht(λ|Ωt) > h1 whenever |λ| < h1. Hence, for
any 0 < τ < h1,
E[vt(β)|Ωt] ≥ I(δt(β) < −τ) ∫_{−τ}^{0} [λ + τ]h1 dλ + I(δt(β) > τ) ∫_{0}^{τ} [τ − λ]h1 dλ = (1/2)τ²h1 I(|δt(β)| > τ).
Therefore, taking the unconditional expectation,
E[VT(β)] ≡ E[T−1 ∑Tt=1 vt(β)] ≥ (1/2)τ²h1 T−1 ∑Tt=1 Pr[|ft(β) − ft(β0)| > τ],
which is greater than 0 by assumption C7 if ‖β − β0‖ ≥ ξ.
Proof of Theorem 2
The proof builds on Huber’s (1967) theorem 3. Weiss (1991)
showed that Huber’s conclusion also holds in the case of non-iid
dependent random variables. Define Hitt(β) ≡ I( yt < ft(β)) −
θ and gt(β) ≡ Hitt(β)∇′ft(β). The strategy of the proof fol-
lows three steps: (1) Show that Huber’s theorem holds; (2) ap-
ply Huber’s theorem; and (3) apply the central limit theorem to
T−1/2 ∑Tt=1 gt(β0).
To verify Huber's conditions, define λT(β) ≡ T−1 ∑Tt=1 E[gt(β)] and µt(β,d) ≡ sup‖τ−β‖≤d ‖gt(τ) − gt(β)‖.
Here we only show that T−1/2 ∑Tt=1 gt(βˆ) = op(1) and that
assumptions (N2) and (N3) in the proof of Theorem 3 of Weiss
(1991) are satisfied, because the other conditions are easily
checked. We follow Ruppert and Carroll’s (1980) strategy (see
Lemmas A1 and A2). Let {ej}pj=1 be the standard basis of R^p
and define Qj(a) ≡ −T−1/2 ∑Tt=1 qt(βˆ + aej), where a is a
scalar. Let Gj(a) be the (finite) one-sided derivative of Qj(a),
that is,
Gj(a) ≡ −T−1/2 ∑Tt=1 ∇j ft(βˆ + aej)Hitt(βˆ + aej).
Because Qj(a) is continuous in a and achieves a maximum at 0,
it must be that in a neighborhood of 0, for some ξ > 0,
|Gj(0)| ≤ Gj(ξ) − Gj(−ξ) = T−1/2 ∑Tt=1 [−∇j ft(βˆ + ξej)Hitt(βˆ + ξej) + ∇j ft(βˆ − ξej)Hitt(βˆ − ξej)].
Now note that by taking the limit for ξ → 0,
|Gj(0)| ≤ T−1/2 ∑Tt=1 |∇j ft(βˆ)| I(yt = ft(βˆ)) ≤ T−1/2 [max1≤t≤T F(t)] · ∑Tt=1 I(yt = ft(βˆ)).
But T−1/2[max1≤t≤T F(t)] = op(1) by assumption AN1(a)
and ∑Tt=1 I(yt = ft(βˆ)) = Oa.s.(1) by assumption C2. There-
fore, Gj(0) p→ 0. Since this holds for every j, this implies that
T−1/2 ∑Tt=1 gt(βˆ) = op(1).
For (N2) of Weiss's (1991) proof of Theorem 3, note that
λT(β0) is well defined, because β0 is an interior point of B
by assumption AN1. Then E[gt(β0)] = E[E(Hitt(β0)|Ωt) ×
∇′ft(β0)] = 0, by the assumption that model (4) is specified
correctly.
For N3(i), using the mean value theorem to expand λT(β)
around β0 (applying Leibnitz's rule for differentiating under the
integral sign), we get
λT(β) = [T−1 ∑Tt=1 E{∇′ft(β∗)∇ft(β∗) × [I(δt(β∗) > 0) ∫_{0}^{δt(β∗)} ht(λ|Ωt) dλ − I(δt(β∗) < 0) ∫_{δt(β∗)}^{0} ht(λ|Ωt) dλ]}
+ T−1 ∑Tt=1 E{∇′ft(β∗) · ∇ft(β∗)ht(δt(β∗)|Ωt)}] × (β − β0)
≡ T(β∗)(β − β0),
where β∗ lies between β and β0. We now show that T(β∗) =
DT + O(‖β − β0‖), where DT is defined in assumption AN3.
‖T(β∗) − DT‖
= ‖T−1 ∑Tt=1 E{∇′ft(β∗)∇ft(β∗) × [I(δt(β∗) > 0) ∫_{0}^{δt(β∗)} ht(λ|Ωt) dλ − I(δt(β∗) < 0) ∫_{δt(β∗)}^{0} ht(λ|Ωt) dλ]}
+ T−1 ∑Tt=1 E[∇′ft(β∗)∇ft(β∗)ht(δt(β∗)|Ωt) − ∇′ft(β0)∇ft(β0)ht(0|Ωt)]‖.
The first line of the foregoing expression can be shown to be
O(‖β∗ − β0‖); therefore, O(‖β − β0‖), by a mean value ex-
pansion of the integrals around β0. For the second line, invok-
ing assumptions AN1(a), AN1(b), AN2(a), and AN2(b), note
that it can be rewritten as
‖T−1 ∑Tt=1 E[∇′ft(β∗)∇ft(β∗)ht(δt(β∗)|Ωt) − ∇′ft(β0)∇ft(β∗)ht(δt(β∗)|Ωt)
+ ∇′ft(β0)∇ft(β∗)ht(δt(β∗)|Ωt) − ∇′ft(β0)∇ft(β0)ht(δt(β∗)|Ωt)
+ ∇′ft(β0)∇ft(β0)ht(δt(β∗)|Ωt) − ∇′ft(β0)∇ft(β0)ht(0|Ωt)]‖
≤ T−1 ∑Tt=1 E[M(t,β∗,β0) · F(t) · N + N · F(t) · M(t,β∗,β0) + F(t)³ · L‖β − β0‖]
≤ T−1 ∑Tt=1 (2N · M1 · ‖β − β0‖ + F0L‖β − β0‖)
≤ (2N · M1 + F0L)‖β − β0‖
= O(‖β − β0‖).
Therefore,
λT(β) = DT(β − β0) + O(‖β − β0‖2). (B.1)
But because DT is positive definite for T sufficiently large by
assumption AN3, the result follows.
For N3(ii), noting that |Hitt(τ ) − Hitt(β)| = I(|yt − ft(β)| <
| ft(τ ) − ft(β)|), we have
µt(β, δ) ≤ sup‖τ−β‖≤d ‖∇′ft(τ) − ∇′ft(β)‖ + sup‖τ−β‖≤d ‖∇′ft(β)‖ · I(|yt − ft(β)| < |ft(τ) − ft(β)|).
Thus,
µt(β, δ) ≤ M(t,β,τ) + F(t) · I(|yt − ft(β)| < |ft(τ) − ft(β)|)
and
E[µt(β, δ)] ≤ M0d + E[F(t) · 2|∇ft(τ∗) · (τ − β)| · N] ≤ M0d + 2NF0d = O(d),
where d is defined in assumption AN1.
Finally, for N3(iii), we have
E[µt(β, δ)²] ≤ M0d + E[F(t)² · 2F(t) · ‖τ − β‖ · N + 2M(t,β,τ) · F(t)] ≤ M0d + 2F0Nd + 2M1d = O(d).
We can therefore apply Huber’s theorem,
T1/2λT(βˆ) = −T−1/2 ∑Tt=1 gt(β0) + op(1).
Consistency of βˆ and application of Slutsky's theorem to (B.1) give
T1/2λT(βˆ) = DT · T1/2(βˆ − β0) + op(1).
This, together with Huber’s result, yields
DT · T1/2(βˆ − β0) = −T−1/2 ∑Tt=1 gt(β0) + op(1).   (B.2)
Application of the central limit theorem (assumption AN4)
completes the proof.
Proof of Theorem 3
The proof that AˆT − AT p→0 is standard and is omitted. Fol-
lowing Powell (1991), define
D˜T ≡ (2TcT)−1 ∑Tt=1 I(|εtθ| < cT)∇′ft(β0)∇ft(β0).
We first show that DˆT − D˜T = op(1) and then that D˜T − DT =
op(1).
Define εˆt ≡ yt − ft(βˆ). Then
‖DˆT − D˜T‖ = (cT/cˆT)‖(2TcT)−1 ∑Tt=1 {[I(|εˆt| < cˆT) − I(|εtθ| < cT)]∇′ft(βˆ)∇ft(βˆ)
+ I(|εtθ| < cT)[∇′ft(βˆ) − ∇′ft(β0)]∇ft(βˆ)
+ I(|εtθ| < cT)∇′ft(β0)[∇ft(βˆ) − ∇ft(β0)]
+ ((cT − cˆT)/cT) I(|εtθ| < cT)∇′ft(β0)∇ft(β0)}‖.
The indicator functions in the first line satisfy the inequality
|I(|εˆt| < cˆT) − I(|εtθ| < cT)| ≤ I(|εtθ − cT| < |δt(βˆ)| + |cˆT − cT|) + I(|εtθ + cT| < |δt(βˆ)| + |cˆT − cT|).
Thus
‖DˆT − D˜T‖ ≤ (cT/cˆT)(2TcT)−1 ∑Tt=1 {[I(|εt − cT| < |δt(βˆ)| + |cˆT − cT|) + I(|εt + cT| < |δt(βˆ)| + |cˆT − cT|)] · F(t)²
+ I(|εt| < cT) · M(t, βˆ, β0) · F(t) + I(|εt| < cT) · F(t) · M(t, βˆ, β0)
+ ((cT − cˆT)/cT) I(|εtθ| < cT) · F(t)²}
≡ (cT/cˆT)(A1 + 2A2 + A3).
Now suppose that for given d > 0 (which can be chosen ar-
bitrarily small), T is sufficiently large that |(cT − cˆT)/cT| < d and
c−1T‖βˆ − β0‖ < d. It is possible to show that E(Ai) = O(d),
for i = 1, 2, 3. This implies that we have found a bounding function
for ‖DˆT − D˜T‖, which can be made arbitrarily small in proba-
bility by choosing d sufficiently small. Here we show only that
E(A1) = O(d), because the others are easily derived:
E(A1) ≤ E{(2TcT)−1 ∑Tt=1 [I(|εt − cT| < ‖∇ft(β∗)‖ · ‖βˆ − β0‖ + |cˆT − cT|) + I(|εt + cT| < ‖∇ft(β∗)‖ · ‖βˆ − β0‖ + |cˆT − cT|)] · F(t)²}
≤ E{(2TcT)−1 ∑Tt=1 [I(|εt − cT| < dcT[F(t) + 1]) + I(|εt + cT| < dcT[F(t) + 1])] · F(t)²}
≤ E{(2TcT)−1 ∑Tt=1 4dcT[F(t) + 1]HF(t)²}
≤ T−1 ∑Tt=1 4HF0 · d = 4HF0 · d = O(d).
To show that D˜T − DT = op(1), rewrite this difference as
D˜T − DT = (2TcT)−1 ∑Tt=1 {I(|εtθ| < cT)∇′ft(β0)∇ft(β0) − E[I(|εtθ| < cT)|Ωt]∇′ft(β0)∇ft(β0)}
+ T−1 ∑Tt=1 {(2cT)−1E[I(|εtθ| < cT)|Ωt]∇′ft(β0)∇ft(β0) − E[ht(0|Ωt)∇′ft(β0)∇ft(β0)]}.
For the first term, note that it has expectation 0 and variance
equal to
E{(2TcT)−1 ∑Tt=1 I(|εtθ| < cT)∇′ft(β0)∇ft(β0) − E[I(|εtθ| < cT)|Ωt]∇′ft(β0)∇ft(β0)}²
= (2TcT)−2 E{∑Tt=1 {I(|εtθ| < cT) − E[I(|εtθ| < cT)|Ωt]}² × [∇′ft(β0)∇ft(β0)]²}
≤ (2TcT)−2 ∑Tt=1 E[F(t)F(t)]²
= (4Tc²T)−1F1 = o(1),
where the first equality holds because all of the cross-products
are 0 by the law of iterated expectations, and the inequality ex-
ploits assumption VC2. Hence the first term converges to 0 in
mean square, and therefore converges to 0 in probability. For
the second term, note that
|(2cT)−1E[I(|εtθ| < cT)|Ωt] − ht(0|Ωt)| = |(2cT)−1 ∫_{−cT}^{cT} ht(λ|Ωt) dλ − ht(0|Ωt)| ≤ |(2cT)−1 2cT ht(c∗|Ωt) − ht(0|Ωt)| ≤ L|cT| = op(1),
where ht(c∗|Ωt) ≡ maxλ∈[−cT,cT] ht(λ|Ωt) and the second in-
equality exploits AN2(b). Substituting and using assump-
tion VC3, the second term also converges to 0 in probability.
Therefore, D˜T − DT p→0, and the result follows.
Proof of Theorem 4
We first approximate the discontinuous function Hitt(βˆ) with
a continuously differentiable function. We then apply the mean
value theorem around β0 and show that the approximated test
statistic converges in distribution to the normal distribution
stated in Theorem 4. Finally, we prove that this approximation
converges in probability to the test statistic defined in Theo-
rem 4.
Define
Hit⊕t(βˆ) ≡ [1 + exp{c−1T εˆt}]−1 − θ ≡ I∗(εˆt) − θ,
where εˆt ≡ yt − ft(βˆ) and cT is a nonstochastic sequence such
that limT→∞ cT = 0. Then
∇βHit⊕t(βˆ) = c−1T exp{c−1T εˆt}[1 + exp{c−1T εˆt}]−2 ∇ft(βˆ) ≡ kcT(εˆt) · ∇ft(βˆ).
Note that kcT (εˆt) is the pdf of a logistic with mean 0 and para-
meter cT . In matrix form, we write ∇βHit⊕(βˆ) = K(εˆt)∇f (βˆ),
where K(εˆt) is a diagonal matrix with typical entry kcT (εˆt).
Now, because Xt(βˆ) is bounded in probability and Hit⊕t (βˆ) is
bounded between −θ and 1 − θ , note that
T−1/2X′(βˆ)Hit⊕(βˆ) = T−1/2 ∑Tt=1 [X′t(βˆ)Hit⊕t(βˆ) · (1 − ∑Jij=1 I(εtθ = εjt))] + op(1),
because the points over which Xt(βˆ) is nondifferentiable form
a set of measure 0 by assumption DQ3. In the following, for
simplicity of notation, we assume that Xt(βˆ) is differentiable
everywhere. The case of nondifferentiability would be covered
by working with T−1/2 ∑Tt=1 [X′t(βˆ)Hit⊕t(βˆ) · (1 − ∑Jij=1 I(εtθ = εjt))] rather than with T−1/2X′t(βˆ)Hit⊕t(βˆ). An application of
the mean value theorem gives
T−1/2X′(βˆ)Hit⊕(βˆ) = T−1/2X′(β0)Hit⊕(β0) + T−1/2[∇X(β∗)Hit⊕(β∗) + X(β∗)K(ε∗t)∇f(β∗)] × (βˆ − β0),
where β∗ lies between βˆ and β0, ε∗t ≡ yt − ft(β∗), X′(β) ≡
[X′1(β), . . . , X′T(β)], ∇X(β) ≡ [∇X1(β), . . . , ∇XT(β)],
Hit⊕(β) ≡ [Hit⊕1(β), . . . , Hit⊕T(β)]′, K(ε∗t) ≡ diag([kcT(ε∗1),
. . . , kcT(ε∗T)]), and ∇f(β) ≡ [∇′f1(β), . . . , ∇′fT(β)]′. By adding
and subtracting appropriate terms, we can rewrite the foregoing
expression as
T−1/2X′(βˆ)Hit⊕(βˆ)
= T−1/2X′(β0)Hit(β0)
− E[T−1X′(β0)H∇f(β0)] · D−1T · T−1/2∇′f(β0)Hit(β0)
+ E[T−1X′(β0)H∇f(β0)]D−1T T−1/2∇′f(β0)Hit(β0)
− T−1[X′(β0)H∇f(β0)]D−1T T−1/2∇′f(β0)Hit(β0)
+ T−1/2X′(β0)Hit⊕(β0) − T−1/2X′(β0)Hit(β0)
+ T−1/2[∇X(β∗)Hit⊕(β∗) + X(β∗)K(ε∗t)∇f(β∗)] × (βˆ − β0)
+ T−1[X′(β0)H∇f(β0)] × D−1T · T−1/2∇′f(β0)Hit(β0).   (B.3)
We first need to show that the terms in the last five lines
are op(1). The term in the fourth and fifth lines is op(1) by
assumption DQ4. For the term in the sixth line, noting that
I∗(|εtθ |) = 1 − I∗(−|εtθ |), we have, for each t,
|Hit⊕t(β0) − Hitt(β0)| ≤ I∗(|εtθ|)[I(|εtθ| ≥ T−d) + I(|εtθ| < T−d)] ≡ Ct + Dt,
where d is a positive number greater than 1/2, such that
limT→∞ cTTd = 0. Therefore,
T−1/2 ∑Tt=1 ‖Xt(β0)[Hit⊕t(β0) − Hitt(β0)]‖ ≤ T−1/2 ∑Tt=1 ‖Xt(β0)‖ · |Hit⊕t(β0) − Hitt(β0)| ≤ T−1/2 ∑Tt=1 W(t) · (Ct + Dt),
where Ct ≡ I∗(|εtθ|) · I(|εtθ| ≥ T−d) and Dt ≡ I∗(|εtθ|) · I(|εtθ| < T−d).
Noting that I∗(|εtθ |) is decreasing in |εtθ |, we have Ct ≤
I∗(T−d). Therefore,
T−1/2 ∑Tt=1 E[W(t)Ct] ≤ T−1/2 ∑Tt=1 E[W(t)][1 + exp(c−1T T−d)]−1 = T−1/2 ∑Tt=1 W0[1 + exp(c−1T T−d)]−1 = T1/2W0[1 + exp(c−1T T−d)]−1 → 0 as T → ∞.
For $D_t$, note that $D_t \le I(|\epsilon_{t\theta}| < T^{-d})$, because $I^{*}(|\epsilon_{t\theta}|)$ is bounded between 0 and 1. Therefore, for any $\xi > 0$,
\[
T^{-1/2}\sum_{t=1}^{T}\Pr\bigl( W(t)D_t > \xi \bigr)
\le T^{-1/2}\xi^{-1}\sum_{t=1}^{T} E\Bigl[ W(t)\int_{-T^{-d}}^{T^{-d}} h_t(\lambda|\Omega_t)\, d\lambda \Bigr]
\le T^{-1/2}\xi^{-1}\sum_{t=1}^{T} W_0\cdot 2T^{-d}N
= 2\xi^{-1} W_0 N T^{-d+1/2} \xrightarrow[T\to\infty]{} 0.
\]
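The bound on $D_t$ uses only Markov's inequality and the assumption that the conditional density is bounded by $N$. As a quick check (an exact calculation under assumed N(0,1) errors, in which case $N \approx 0.399$), $\Pr(|\epsilon_{t\theta}| < T^{-d}) \le 2NT^{-d}$, and scaling by $T^{1/2}$ shows the $T^{-d+1/2}$ decay for any $d > 1/2$.

# Exact check under an assumed N(0,1) error distribution (illustration only):
# Pr(|eps| < T^{-d}) = erf(T^{-d}/sqrt(2)) <= 2*N*T^{-d}, and sqrt(T)*Pr shrinks
# like T^{1/2 - d}, matching the rate in the display above.
import math

N_bound = 1.0 / math.sqrt(2.0 * math.pi)   # sup of the N(0,1) density
d = 0.6                                    # any d > 1/2 works, per the proof

for T in (10**3, 10**5, 10**7):
    a = T ** (-d)
    prob = math.erf(a / math.sqrt(2.0))    # exact Pr(|eps| < T^{-d}) for N(0,1)
    print(f"T={T:>8}  Pr={prob:.3e}  bound 2*N*T^-d={2*N_bound*a:.3e}  "
          f"sqrt(T)*Pr={math.sqrt(T)*prob:.3e}")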
Rewrite the term in the seventh line of (B.3) as
\[
T^{-1/2}\bigl[ \nabla X(\beta^{*})\mathit{Hit}^{\oplus}(\beta^{*}) + X'(\beta^{*})K(\epsilon^{*}_t)\nabla f(\beta^{*}) \bigr]\times T^{-1/2} D_T^{-1} D_T T^{1/2}(\hat{\beta} - \beta_0).
\]
Then the last two lines of (B.3) will cancel each other asymptotically if
\[
D_T T^{1/2}(\hat{\beta} - \beta_0) = -T^{-1/2}\nabla' f(\beta_0)\mathit{Hit}(\beta_0) + o_p(1) \tag{B.4}
\]
and
\[
T^{-1}\bigl[ \nabla X(\beta^{*})\mathit{Hit}^{\oplus}(\beta^{*}) + X'(\beta^{*})K(\epsilon^{*}_t)\nabla f(\beta^{*}) \bigr] D_T^{-1}
= T^{-1}[X'(\beta_0)H\nabla f(\beta_0)]\cdot D_T^{-1} + o_p(1). \tag{B.5}
\]
Equation (B.4) is equivalent to (B.2). For (B.5), note that, by the consistency of $\hat{\beta}$,
\[
T^{-1}\bigl[ \nabla X(\beta^{*})\mathit{Hit}^{\oplus}(\beta^{*}) + X'(\beta^{*})K(\epsilon^{*}_t)\nabla f(\beta^{*}) \bigr]
- T^{-1}\bigl[ \nabla X(\beta_0)\mathit{Hit}^{\oplus}(\beta_0) + X'(\beta_0)K(\epsilon_{t\theta})\nabla f(\beta_0) \bigr] \xrightarrow{p} 0.
\]
For $T^{-1}\nabla X(\beta_0)\mathit{Hit}^{\oplus}(\beta_0)$, we have already shown that $T^{-1}\nabla X(\beta_0)\mathit{Hit}^{\oplus}(\beta_0) - T^{-1}\nabla X(\beta_0)\mathit{Hit}(\beta_0) \xrightarrow{p} 0$. Moreover,
\[
E\Bigl[ T^{-1}\sum_{t=1}^{T}\nabla X_t(\beta_0)\mathit{Hit}_t(\beta_0) \Bigr]
= E\Bigl[ T^{-1}\sum_{t=1}^{T}\nabla X_t(\beta_0) E\bigl[ \mathit{Hit}_t(\beta_0)\,\big|\,\Omega_t \bigr] \Bigr] = 0
\]
and
\[
\mathrm{Var}\Bigl[ T^{-1}\sum_{t=1}^{T}\nabla X_t(\beta_0)\mathit{Hit}_t(\beta_0) \Bigr]
\le T^{-2}\sum_{t=1}^{T} E\bigl[ \nabla X_t(\beta_0)\nabla' X_t(\beta_0) \bigr]
\le T^{-2}\sum_{t=1}^{T} E[Z(t)Z(t)]
= T^{-1} Z_0 \xrightarrow[T\to\infty]{} 0.
\]
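The zero-expectation display above rests on the conditional moment restriction $E[\mathit{Hit}_t(\beta_0)\,|\,\Omega_t] = 0$. The sketch below (an illustration of mine with a known conditional quantile in place of an estimated CAViaR specification $f_t(\hat{\beta})$; the time-varying scale process is invented purely for the example) checks this property by simulation.

# Sketch (illustration only): Hit_t = 1{y_t < f_t} - theta has conditional mean 0
# when f_t is the true theta-quantile of y_t given Omega_t, which is the moment
# condition used in the zero-expectation argument above.
import numpy as np

rng = np.random.default_rng(2)
theta, T = 0.05, 200_000
z_theta = -1.6448536269514722                            # Phi^{-1}(0.05), standard normal
sigma = 1.0 + 0.5 * np.abs(np.sin(np.arange(T) / 50.0))  # assumed conditional scale (made up)
y = sigma * rng.standard_normal(T)
f_t = sigma * z_theta                                    # true conditional 5% quantile of y_t
hit = (y < f_t).astype(float) - theta                    # Hit_t(beta0)
print(hit.mean())                                        # approximately 0 (population mean is exactly 0)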
It remains to show that $T^{-1}[X'(\beta_0)K(\epsilon_{t\theta})\nabla f(\beta_0)] - T^{-1}[X'(\beta_0)H\nabla f(\beta_0)] = o_p(1)$. Rewrite this term as
\[
\begin{aligned}
T^{-1}\bigl[ X'(\beta_0)K(\epsilon_{t\theta})\nabla f(\beta_0) \bigr] - T^{-1}\bigl[ X'(\beta_0)H\nabla f(\beta_0) \bigr]
&= T^{-1}\sum_{t=1}^{T}\bigl[ k_{c_T}(\epsilon_{t\theta}) - E\bigl( k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr) \bigr] X_t'(\beta_0)\nabla f_t(\beta_0) \\
&\quad + T^{-1}\sum_{t=1}^{T}\bigl[ E\bigl( k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr) - h_t(0|\Omega_t) \bigr] X_t'(\beta_0)\nabla f_t(\beta_0).
\end{aligned}
\]
First, we show that the expected value of $k_{c_T}(\epsilon_{t\theta})$, given $\Omega_t$, converges to $h_t(0|\Omega_t)$. Let $k(u) \equiv e^{u}[1 + e^{u}]^{-2}$. Then
\[
E\bigl[ k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr]
= \int_{-\infty}^{\infty} k(u)\, h_t(u c_T|\Omega_t)\, du
= \int_{-\infty}^{\infty} k(u)\bigl[ h_t(0|\Omega_t) + h_t'(0|\Omega_t) u c_T + o(c_T) \bigr] du
= h_t(0|\Omega_t) + o(c_T),
\]
where in the first equality we performed a change of variables, in the second we applied the Taylor expansion of $h_t(u c_T|\Omega_t)$ around 0, and the last equality comes from the fact that $k(u)$ is a density function with first moment equal to 0.
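In other words, $E[k_{c_T}(\epsilon_{t\theta})|\Omega_t]$ is the conditional density smoothed by a logistic kernel of scale $c_T$, which collapses onto $h_t(0|\Omega_t)$ as $c_T \to 0$. A quick numerical check (mine; it assumes a standard normal conditional density purely for illustration) follows.

# Numerical check (assumption: h is the N(0,1) density, h(0) = 0.3989):
# the smoothed value integral k_{c_T}(lambda) h(lambda) d lambda -> h(0) as c_T -> 0.
import numpy as np

def k_cT(x, c_T):
    e = np.exp(-np.abs(x) / c_T)            # symmetric, overflow-safe form of the logistic pdf
    return e / (c_T * (1.0 + e) ** 2)

def h(x):                                   # assumed conditional density: N(0,1)
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

grid = np.linspace(-5.0, 5.0, 400_001)
dx = grid[1] - grid[0]
for c_T in (0.5, 0.1, 0.02):
    val = np.sum(k_cT(grid, c_T) * h(grid)) * dx   # Riemann approximation of the integral
    print(f"c_T={c_T:4.2f}  smoothed value={val:.4f}  h(0)={h(0.0):.4f}")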
Next, we need to show that $T^{-1}\sum_{t=1}^{T}[ k_{c_T}(\epsilon_{t\theta}) - E( k_{c_T}(\epsilon_{t\theta})|\Omega_t)] X_t'(\beta_0)\nabla f_t(\beta_0) = o_p(1)$. It obviously has 0 expectation. If, in addition, its variance converges to 0, then the result follows from application of the Chebyshev inequality,
\[
\begin{aligned}
\biggl\| E\Bigl[ T^{-1}\sum_{t=1}^{T}\bigl[ k_{c_T}(\epsilon_{t\theta}) - E\bigl( k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr) \bigr] X_t'(\beta_0)\nabla f_t(\beta_0) \Bigr]^{2}\biggr\|
&= \biggl\| E\Bigl[ T^{-2}\sum_{t=1}^{T}\bigl[ k_{c_T}(\epsilon_{t\theta}) - E\bigl( k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr) \bigr]^{2}\bigl[ X_t'(\beta_0)\nabla f_t(\beta_0) \bigr]^{2} \Bigr]\biggr\| \\
&\le T^{-2}\sum_{t=1}^{T} E\bigl[ O(c_T^{-1})[W(t)F(t)]^{2} \bigr] \\
&\le T^{-2}\sum_{t=1}^{T} W_1 O(c_T^{-1}) \\
&= T^{-1} W_1 O(c_T^{-1}) \xrightarrow[T\to\infty]{} 0,
\end{aligned}
\]
where the first equality holds because all of the cross-products are 0 by the law of iterated expectations, and the two inequalities follow from assumptions DQ1, AN1(a), and
\[
\begin{aligned}
E\Bigl\{\bigl[ k_{c_T}(\epsilon_{t\theta}) - E\bigl( k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr) \bigr]^{2}\,\Big|\,\Omega_t \Bigr\}
&= E\bigl[ k_{c_T}(\epsilon_{t\theta})^{2}\,\big|\,\Omega_t \bigr] - E\bigl[ k_{c_T}(\epsilon_{t\theta})\,\big|\,\Omega_t \bigr]^{2} \\
&= \int_{-\infty}^{\infty} k_{c_T}(\lambda)^{2} h_t(\lambda|\Omega_t)\, d\lambda - h_t(0|\Omega_t)^{2} + o(c_T) \\
&= c_T^{-1}\int_{-\infty}^{\infty} k(u)^{2} h_t(u c_T|\Omega_t)\, du - h_t(0|\Omega_t)^{2} + o(c_T) \\
&\le \tfrac{1}{4} c_T^{-1}\int_{-\infty}^{\infty} k(u)\bigl[ h_t(0|\Omega_t) + h_t'(0|\Omega_t) u c_T + o(c_T) \bigr] du - h_t(0|\Omega_t)^{2} + o(c_T) \\
&\le \tfrac{1}{4} c_T^{-1}\bigl[ h_t(0|\Omega_t) + o(c_T) \bigr] - h_t(0|\Omega_t)^{2} + o(c_T) \\
&= O(c_T^{-1}).
\end{aligned}
\]
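The $O(c_T^{-1})$ bound means the bandwidth cannot shrink too quickly: the averaged term vanishes only if $Tc_T \to \infty$. The following sketch (an illustration with assumed N(0,1) errors, not the paper's data) shows the variance of $k_{c_T}(\epsilon_{t\theta})$ growing at the rate $c_T^{-1}$, so that $c_T\cdot\mathrm{Var}[k_{c_T}(\epsilon_{t\theta})]$ approaches a constant.

# Illustration (assumed N(0,1) errors): Var[k_{c_T}(eps)] grows at rate 1/c_T,
# matching the O(c_T^{-1}) bound above, so the averaged term is o_p(1) as long
# as T*c_T -> infinity.
import numpy as np

def k_cT(x, c_T):
    e = np.exp(-np.abs(x) / c_T)
    return e / (c_T * (1.0 + e) ** 2)

rng = np.random.default_rng(3)
eps = rng.standard_normal(2_000_000)
for c_T in (0.2, 0.05, 0.0125):
    v = np.var(k_cT(eps, c_T))
    print(f"c_T={c_T:7.4f}  Var[k]={v:8.3f}  c_T*Var[k]={c_T * v:.3f}")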
Therefore, (B.3) can be rewritten as
\[
\begin{aligned}
T^{-1/2}X'(\hat{\beta})\mathit{Hit}^{\oplus}(\hat{\beta})
&= T^{-1/2}\bigl[ X'(\beta_0)\mathit{Hit}(\beta_0) - E[T^{-1}X'(\beta_0)H\nabla f(\beta_0)]\cdot D_T^{-1}\cdot\nabla' f(\beta_0)\mathit{Hit}(\beta_0) \bigr] + o_p(1) \\
&\equiv T^{-1/2} M_T\,\mathit{Hit}(\beta_0) + o_p(1).
\end{aligned}
\]
Finally, analogously to how we showed that $T^{-1/2}X'(\beta_0)\mathit{Hit}^{\oplus}(\beta_0) - T^{-1/2}X'(\beta_0)\mathit{Hit}(\beta_0) = o_p(1)$, it is also possible to show that $T^{-1/2}X'(\hat{\beta})\mathit{Hit}^{\oplus}(\hat{\beta}) - T^{-1/2}X'(\hat{\beta})\mathit{Hit}(\hat{\beta}) = o_p(1)$. Therefore,
\[
T^{-1/2}X'(\hat{\beta})\mathit{Hit}(\hat{\beta}) = T^{-1/2} M_T\,\mathit{Hit}(\beta_0) + o_p(1).
\]
We can now apply the central limit theorem to $T^{-1/2} M_T\,\mathit{Hit}(\beta_0)$, by assumption DQ6. The law of iterated expectations can be used to show that this term has expectation equal to 0 and variance equal to $\theta(1-\theta) E\{ T^{-1} M_T M_T' \}$. The result follows.
To conclude the proof of Theorem 4, it remains to show that $T^{-1}\hat{M}_T\hat{M}_T' - E(T^{-1} M_T M_T') \xrightarrow{p} 0$, where
\[
\hat{M}_T \equiv X'(\hat{\beta}) - \hat{G}_T\hat{D}_T^{-1}\nabla' f(\hat{\beta})
\]
and
\[
\hat{G}_T \equiv (2T\hat{c}_T)^{-1}\sum_{t=1}^{T} I\bigl( |y_t - f_t(\hat{\beta})| < \hat{c}_T \bigr) X_t'(\hat{\beta})\nabla f_t(\hat{\beta}).
\]
Expanding the product, we get
\[
T^{-1}\hat{M}_T\hat{M}_T' = T^{-1}\bigl[ X'(\hat{\beta})X(\hat{\beta}) - 2X'(\hat{\beta})\nabla f(\hat{\beta})\hat{D}_T^{-1}\hat{G}_T' + \hat{G}_T\hat{D}_T^{-1}\nabla' f(\hat{\beta})\nabla f(\hat{\beta})\hat{D}_T^{-1}\hat{G}_T' \bigr].
\]
Adopting the same strategy used in the proof of Theorem 3, each of these terms can be shown to converge in probability to the analogous terms of $E(T^{-1} M_T M_T')$.
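Putting the pieces together, the asymptotic normality of $T^{-1/2}X'(\hat{\beta})\mathit{Hit}(\hat{\beta})$ with variance $\theta(1-\theta)E\{T^{-1}M_T M_T'\}$, combined with the consistency of $T^{-1}\hat{M}_T\hat{M}_T'$, suggests a Wald-type in-sample statistic $\mathit{Hit}'(\hat{\beta})X(\hat{\beta})[\theta(1-\theta)\hat{M}_T\hat{M}_T']^{-1}X'(\hat{\beta})\mathit{Hit}(\hat{\beta})$, asymptotically chi-squared with degrees of freedom equal to the number of test regressors. The sketch below (my own illustration, not the authors' implementation) assembles this statistic from quantities a fitted model would supply; the function name, the inputs, and the choice of $\hat{D}_T$ as the kernel estimator analogous to $\hat{G}_T$ (in line with the $\tilde{D}_T$ term at the start of this section) are assumptions of the sketch.

# Sketch of the in-sample DQ statistic implied by the results above (illustration).
# Assumed inputs from a fitted CAViaR model: y (returns), f (fitted theta-quantiles),
# grad_f (T x p matrix with rows d f_t / d beta at beta-hat), X (T x q matrix of test
# regressors measurable w.r.t. Omega_t), theta, and a bandwidth c_hat. D-hat is taken
# to be the kernel estimator analogous to G-hat.
import numpy as np

def dq_statistic_in_sample(y, f, grad_f, X, theta, c_hat):
    T = len(y)
    eps = y - f                                                  # quantile residuals
    hit = (eps < 0).astype(float) - theta                        # Hit_t(beta-hat)
    w = (np.abs(eps) < c_hat).astype(float)                      # uniform kernel weights
    D_hat = (w[:, None] * grad_f).T @ grad_f / (2 * T * c_hat)   # p x p
    G_hat = (w[:, None] * X).T @ grad_f / (2 * T * c_hat)        # q x p
    M_hat = X.T - G_hat @ np.linalg.solve(D_hat, grad_f.T)       # q x T
    V = theta * (1 - theta) * (M_hat @ M_hat.T)                  # variance of X'Hit
    s = X.T @ hit                                                # q-vector
    dq = float(s @ np.linalg.solve(V, s))                        # ~ chi-squared, q dof
    return dq, X.shape[1]

A p value can then be read off the chi-squared distribution with q degrees of freedom (for instance, with scipy.stats.chi2.sf(dq, q)).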
Proof of Theorem 5
As in the proof of Theorem 4, we first approximate the dis-
continuous function Hitt(βˆTR) with a continuously differen-
tiable function and then apply the mean value expansion to this
approximation,
N−1/2R X
′(βˆTR
)
Hit⊕
(
βˆTR
)
= N−1/2R
{
X′(β0)Hit⊕(β0)
+ [∇X(β∗)Hit⊕(β∗) + X(β∗)K(ε∗t )∇f (β∗)
]
× (βˆTR − β0
)}
,
where β∗ lies between βˆ and β0 and the variables are defined in
the proof of Theorem 4. Assumption DQ8, consistency of βˆTR ,
and Slutsky’s theorem yield
lim
R→∞ N
−1/2
R X
′(βˆTR
)
Hit⊕
(
βˆTR
)
= lim
R→∞
{
N−1/2R X
′(β0)Hit⊕(β0)
+
(
NR
TR
)1/2
× 1
NR
[∇X(β∗)Hit⊕(β∗)
+ X(β∗)K(ε∗t )∇f (β∗)
]
T1/2R
(
βˆTR − β0
)}
= lim
R→∞ N
−1/2
R X
′(β0)Hit⊕(β0).
The rest of the proof follows the analogous parts in Theorem 4.
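Because the limit above discards the estimation-error term, out of sample no $\hat{M}_T$ correction is needed: $N_R^{-1/2}X'(\hat{\beta}_{T_R})\mathit{Hit}(\hat{\beta}_{T_R})$ behaves like $N_R^{-1/2}X'(\beta_0)\mathit{Hit}(\beta_0)$, whose variance is $\theta(1-\theta)$ times $E[N_R^{-1}X'X]$. The resulting out-of-sample statistic is sketched below (again my own illustration; the suggested contents of $X$, such as a constant, lagged hits, and the VaR forecast itself, are an assumption of the example rather than a prescription from this section).

# Sketch of the out-of-sample DQ statistic suggested by Theorem 5 (illustration):
# since estimation error vanishes out of sample, the variance of X'Hit is simply
# theta*(1-theta)*X'X. Assumed inputs: y and the quantile forecasts f over the N_R
# out-of-sample periods, and test regressors X (N_R x q).
import numpy as np

def dq_statistic_out_of_sample(y, f, X, theta):
    hit = ((y - f) < 0).astype(float) - theta      # out-of-sample Hit sequence
    s = X.T @ hit                                  # q-vector
    V = theta * (1 - theta) * (X.T @ X)            # asymptotic variance of X'Hit
    dq = float(s @ np.linalg.solve(V, s))          # ~ chi-squared with q dof
    return dq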
[Received July 2002. Revised January 2004.]