MA417/ST433 Computational Methods in Finance
Luitgard A. M. Veraart,
Email: L.Veraart@lse.ac.uk
Lent Term 2023
Contents
1 Generating Random Numbers
1.1 Introduction
1.2 The linear congruential generator
1.3 The inverse transform method
1.4 Von Neumann's acceptance-rejection method
1.5 The Box-Muller method
1.6 Sampling using the Cholesky factorization
1.7 Appendix
1.7.1 The moment generating function
1.7.2 The normal distribution
2 Foundations of Monte Carlo Methods
2.1 Introduction to Monte Carlo integration
2.2 Monte Carlo estimators
2.3 Confidence intervals of Monte Carlo estimators
2.4 Appendix: Convergence in distribution (weak convergence) and the Central Limit Theorem
3 Alternatives to Monte Carlo Methods
3.1 Quasi-Monte Carlo methods
3.2 Quadrature methods
3.2.1 One-dimensional integration
3.2.2 d-dimensional integration
3.3 Which method should one choose?
4 Variance Reduction Techniques for Monte Carlo Estimation
4.1 Control variates
4.2 Antithetic variates
4.3 Importance sampling
4.4 Appendix
4.4.1 Some remarks on absolute continuous probability measures and importance sampling
4.4.2 Some remarks on change of measure and importance sampling
5 Generating Sample Paths
5.1 Simulating a Brownian motion
5.2 Simulating a geometric Brownian motion
6 Euler Schemes for Simulating Solutions of SDEs
6.1 Introduction
6.2 A first order scheme
6.3 Order of convergence
6.4 A refinement of the first order scheme
6.5 A second order scheme
6.6 A simplified second order scheme
6.7 Example: Path-dependent options
7 Finite Difference Schemes for Partial Differential Equations (PDEs)
7.1 Black-Scholes-PDE and the heat equation
7.1.1 Transforming the BS-PDE to a PDE with constant coefficients
7.1.2 Elimination of lower order terms
7.1.3 Summary
7.2 Approximation of the derivatives of smooth functions
7.3 The grid
7.4 The explicit (forward) finite difference scheme
7.5 The implicit (backward) finite difference scheme
7.6 The Crank-Nicolson finite difference scheme
7.7 The θ-method
7.8 Boundary conditions
7.8.1 The general heat equation
7.8.2 The heat equation on [0, τmax] × [xmin, xmax]
7.9 Stability of the finite difference schemes
Remarks
These lecture notes are based on two textbooks: Chapters 1 - 5 mainly follow the presentation
provided in Glasserman (2004) and Chapter 6 is mainly based on Seydel (2009). More details
regarding which chapters/sections in these books are relevant are provided at the end of the
sections/chapters etc. in these lecture notes.
In addition to that, there are many other textbooks on computational methods that you might
find useful for this course, see e.g. Asmussen & Glynn (2007) and Ross (2006).
If you feel that you need to revise some general results in probability, you might find the
following books useful: Jacod & Protter (2000) or Kallenberg (2002).
Chapter 1
Generating Random Numbers
"Random numbers should not be generated with a method chosen at random", Knuth (1981).
1.1 Introduction
• In many applications in finance one needs to compute the expectation EX of a random
variable X, e.g., in option pricing.
• It is not always possible to compute EX analytically.
• Suppose we have a sequence of random variables (Xi)i∈N which are mutually independent and identically distributed with the same distribution as X (which is assumed to be integrable, i.e., E|X| < ∞). Then

P( lim_{n→∞} (1/n) ∑_{i=1}^n Xi = EX ) = 1

by the Strong Law of Large Numbers (see Chapter 2 for details).
• How can one obtain (realisations of) Xi?
Example 1. • Consider a random variable X which can take values in {1, 2, 3, 4, 5, 6} with P(X = i) = 1/6 for i = 1, . . . , 6.
• Then EX can be computed analytically and is given by

EX = ∑_{i=1}^6 i P(X = i) = ∑_{i=1}^6 i · (1/6) = 3.5.

• We can interpret the random variable X as the outcome of the roll of a fair die.
• We can compute an approximation for EX by doing a random experiment: We roll a fair die n times and thereby generate some numbers x1, . . . , xn. Then (1/n) ∑_{i=1}^n xi can be used as an approximation for EX.
• Here: Realisations of X can be obtained by rolling a fair die.
• We need more general methods than this!
• Need methods that can generate a large number of random numbers from any distribu-
tion on a computer.
• We are interested in generating random numbers from various distributions.
• As soon as we generate random numbers on a computer, the generation will have to
be based on a completely deterministic mechanism.
• The generated output is therefore sometimes also referred to as pseudo-random numbers.
• The idea is to develop mechanisms such that the computer-generated random num-
bers mimic the properties of true random numbers as much as possible.
1.2 The linear congruential generator for generating a sample
from the uniform distribution
Initially we want to generate a sample from a uniform distribution. As we will later see, this
sample can also be used to derive a sample from a different distribution.
• Recall that the cumulative distribution function (cdf) of a random variable X is the
function FX : R → [0, 1] given by FX(x) = P(X ≤ x). When there is no risk of ambiguity,
we sometimes drop the subscript and just write F (or some other letter) for a cdf.
• Properties of the cumulative distribution function F : F is non-decreasing, F is right-
continuous, limx→−∞ F (x) = 0, and limx→+∞ F (x) = 1.
• Let F be a cumulative distribution function (cdf). We refer to real numbers x1, . . . , xn as a
sample from F if the numbers are realisations of mutually independent random variables
all of which having cdf F .
Recall that a function F : R → [0, 1] is non-decreasing if for all x, y ∈ R with x < y it holds
that F (x) ≤ F (y).
Recall that a function f : R → [0, 1] is right-continuous at a ∈ R if for all ε > 0 there exists a δ > 0 such that for all x ∈ R with a < x < a + δ it holds that |f(x) − f(a)| < ε. It is right-continuous on R if it is right-continuous at every a ∈ R.
Example 2 (Uniform distribution). Let X be a random variable which has a (continuous)
uniform distribution on [a, b], a, b ∈ R, a < b. We write X ∼ U [a, b].
• The corresponding probability density function of X is given by

f(x) = (1/(b − a)) 1_[a,b](x) for x ∈ R,

where 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A.
• The corresponding cumulative distribution function of X is

F(x) = 0 if x < a, F(x) = (x − a)/(b − a) if x ∈ [a, b], F(x) = 1 if x > b.

• EX = (a + b)/2 and var(X) = E(X − EX)² = (1/12)(b − a)².
In the following we consider the linear congruential generator and discuss how well its
output resembles a sample from the uniform distribution on [0, 1).
Definition 3. Let m ∈ N and a, x0 ∈ {1, 2, . . . ,m− 1}.
A linear congruential generator is a recurrence of the following form. For i = 0, 1, 2, . . . set
xi+1 = axi mod m,
ui+1 = xi+1/m.
x0 is called the seed, a the multiplier and m the modulus.
• We do not allow x0 = 0, since for x0 = 0 we would get xi = 0 ∀i ∈ N.
• The sequence (ui)i∈N is a sequence in [0, 1).
It is deterministic (so-called pseudo-random numbers). However, for appropriate values
of a and m, it does resemble a sample from the uniform distribution on [0, 1).
• Recall that for x, m ∈ N the expression x mod m (say "x modulo m") returns the remainder of x after division by m, i.e.,

x mod m = x − m⌊x/m⌋ ∈ {0, 1, . . . , m − 1},

where ⌊x⌋ denotes the greatest integer less than or equal to x.
• Examples:
5 mod 7 = 5, 20 mod 10 = 0, 13 mod 7 = 6.
Example 4. For a = 6, m = 11 and x0 = 1 the linear congruential generator yields
1, 6, 3, 7, 9, 10, 5, 8, 4, 2, 1, 6, . . . .
for the xi and the corresponding sequence (ui) is obtained by setting ui+1 = xi+1/11.
Observe that this sequence periodically returns back to the seed 1!
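As an illustration, the following is a minimal Python sketch of the generator in Definition 3; the function name lcg and the use of NumPy are incidental choices and not part of the cited references.

import numpy as np

def lcg(a, m, x0, n):
    """Linear congruential generator: x_{i+1} = a*x_i mod m, u_{i+1} = x_{i+1}/m."""
    x = x0
    us = np.empty(n)
    for i in range(n):
        x = (a * x) % m
        us[i] = x / m
    return us

# Example 4: a = 6, m = 11, seed x0 = 1 reproduces the cycle 6, 3, 7, 9, 10, 5, 8, 4, 2, 1, ...
print(lcg(6, 11, 1, 12) * 11)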
• Each of the numbers in a sequence (xi) resulting from the linear congruential generator takes
values in the set {0, 1, . . . ,m− 1}.
• The sequence (xi) (and hence also (ui)) will repeat itself after at most m steps.
• Hence, large values of m are necessary for a long cycle.
• Large values of m are not sufficient for a long cycle.
The following example shows that a large m itself does not guarantee a long cycle.
Example 5. Let m = 11 as in Example 4, but now we choose multiplier a = 3.
1. With seed x0 = 1 we obtain
1, 3, 9, 5, 4, 1, . . .
for the xi.
2. Changing the seed to x0 = 2 yields
2, 6, 7, 10, 8, 2, . . . .
We see that the possible values are split into two cycles.
• One can show that for a prime number m, a full period is achieved for any x0 ≠ 0 if
– a^{m−1} − 1 is a multiple of m,
– a^j − 1 is not a multiple of m for j = 1, . . . , m − 2.
(Full period means that all m − 1 distinct values 1, 2, . . . , m − 1 are produced before repeating.)
• Such a number a is called a primitive root of m.
• The sequence (xi) is then given by

x0, a x0 mod m, a^2 x0 mod m, . . . .

• It returns to x0 at the smallest k for which a^k x0 mod m = x0, which is the smallest k for which a^k mod m = 1, i.e., the smallest k for which (a^k − 1) is a multiple of m.
Example 6 (Lattice structure).
• Consider the linear congruential generator in Example 4 with a = 6, m = 11, x0 = 1.
• Plot consecutive overlapping pairs (u1, u2), (u2, u3), . . . , (u10, u11).
• You will find that the ten points lie on just two parallel lines through the unit square (with ui on the x-axis and ui+1 on the y-axis).
• This lattice structure distinguishes this sample from genuine random numbers.
[Figure: scatter plot of the pairs (ui, ui+1) for the generator of Example 4; the points fall on two parallel lines in the unit square.]
• As we have seen, the linear congruential generator is started with some seed x0.
• The existence of this seed is a common feature of random number generators.
• On a computer, a time stamp is often used as a seed for a random number generator.
• If one would like to reproduce results one will need to start the random number generator
with the same seed.
• The linear congruential generator is a very simple random number generator and has some
undesirable properties (e.g. the lattice structure).
There are much more advanced methods for generating uniform random numbers but these
are beyond the scope of this course.
• Reading: (Glasserman, 2004, Section 2.1.1 - 2.1.2 (p. 39–44), 2.1.4. (p. 47–49)),
(Seydel, 2009, Section 2.1 (p. 69–76)).
1.3 The inverse transform method for generating a sample
from a cumulative distribution function F
We consider a random variable X with cumulative distribution function F , i.e., F (x) = P(X ≤ x).
Definition 7. Given a cumulative distribution function (cdf) F , its (generalized) inverse F−1 is
defined by
F−1(u) = inf {x ∈ R | F (x) ≥ u} for u ∈ (0, 1).
One can check that F−1 is non-decreasing (i.e., for all u, v ∈ R with u < v it holds that
F−1(u) ≤ F−1(v)) and left-continuous on (0, 1).
If the cdf F is continuous and strictly increasing (where strictly increasing means that for
all x, y ∈ R with x < y it holds that F (x) < F (y)), then the generalized inverse is the ordinary
inverse function satisfying F (F−1(u)) = u for all u ∈ (0, 1) and F−1(F (x)) = x for all x ∈ R. Since
the ordinary inverse does not always exist for a cumulative distribution function, the generalized
inverse has been introduced which is well-defined for any cumulative distribution function. We
will discuss examples in the seminar.
Lemma 8. If U ∼ U(0, 1), then the random variable F−1(U) has cumulative distribution function
F .
Proof. We first note that, given x ∈ R and u ∈ (0, 1), the right-continuity of F implies that

F⁻¹(u) ≤ x ⇔ u ≤ F(x).

We prove this in the Seminar. Hence,

P( F⁻¹(U) ≤ x ) = P( U ≤ F(x) ) = F(x),

where the last equality follows because U is uniform on (0, 1).
Example 9 (Exponential distribution).
• Consider the exponential distribution with parameter µ. Its cdf F is given by

F(x) = 1 − e^{−µx} for x > 0. (1.1)

• Here, F is continuous and strictly increasing. We can compute the generalized inverse by setting F(x) = 1 − e^{−µx} = u and solving for x. Then x = −ln(1 − u)/µ and hence we define

F⁻¹(u) = −ln(1 − u)/µ for all u ∈ (0, 1).

• Noting that, if U ∼ U(0, 1), then 1 − U ∼ U(0, 1), we conclude that the random variable −(1/µ) ln(U) has the exponential distribution with parameter µ.
Note that for the exponential distribution it holds for all u ∈ (0, 1) and all x > 0 that

F(F⁻¹(u)) = 1 − e^{−µ F⁻¹(u)} = 1 − e^{−µ(−ln(1−u)/µ)} = 1 − (1 − u) = u,
F⁻¹(F(x)) = −ln(1 − F(x))/µ = −ln(1 − 1 + e^{−µx})/µ = x,

i.e., here the generalized inverse coincides with the ordinary inverse. This is not always the case as we will see in the seminars.
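A minimal Python sketch of the inverse transform method for the exponential distribution of Example 9 might look as follows (the seed and function name are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility

def sample_exponential(mu, n):
    """Inverse transform method for Exp(mu): F^{-1}(u) = -ln(1-u)/mu.
    Since 1-U is again U(0,1), we may equivalently return -ln(U)/mu."""
    u = rng.uniform(size=n)
    return -np.log(u) / mu

x = sample_exponential(mu=2.0, n=100_000)
print(x.mean())  # should be close to 1/mu = 0.5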
• In many cases, F−1 cannot be calculated in closed analytical form.
• However, we can still calculate F−1 numerically.
Let u ∈ (0, 1). There are many situations (e.g., if F is continuous and strictly increasing) in
which one can compute F−1(u) by just solving the following equation for x:
F (x)− u = 0.
• For instance, if F is C¹, then we can use Newton's method. Choose an initial point x0 and compute recursively

x_{n+1} = x_n − ( F(x_n) − u ) / F′(x_n).

For a suitable choice of x0 and for a suitably well-behaved function F this sequence converges to the required solution.
If a discrete random variable is considered, the evaluation of F−1 reduces to a table lookup:
Example 10 (Discrete distributions).
• Suppose X is a random variable which has possible values c1 < . . . < cn. (Note: This ordering is important!) Let

P(X = ci) = pi, i = 1, . . . , n, with pi ≥ 0 and ∑_{i=1}^n pi = 1. (1.2)

• We define cumulative probabilities

q0 = 0, qi = ∑_{j=1}^i pj, i = 1, . . . , n.

• The cdf F of X satisfies F(ci) = P(X ≤ ci) = qi.
• Now we can sample from this distribution as follows:
1. Generate U from U [0, 1].
2. Find K ∈ {1, . . . , n} such that qK−1 < U ≤ qK.
3. Set X̃ = cK.
To see that X̃ has indeed the same distribution as X specified in (1.2), consider the following probabilities: for each k ∈ {1, . . . , n},

P(X̃ = ck) = P(qk−1 < U ≤ qk) = FU(qk) − FU(qk−1) = qk − qk−1 = pk.
Example 11 (Discrete distribution). Plot F(x) = P(X ≤ x) for a random variable X taking only the values c1 = 1, c2 = 3, c3 = 4.5, c4 = 7.
• Let P(X = 1) = 1/8 = p1, P(X = 3) = 1/2 = p2, P(X = 4.5) = 2/8 = p3, P(X = 7) = 1/8 = p4.
• Here q0 = 0 = P(X < 1), q1 = 1/8 = 0.125 = F(1), q2 = 5/8 = 0.625 = F(3), q3 = 7/8 = 0.875 = F(4.5), q4 = 1 = F(7).
• Suppose U = 0.7; then q2 = F(c2) < U ≤ F(c3) = q3. Therefore, X̃ = c3 = 4.5.
[Figure: step plot of the cdf F with jumps of size p1, . . . , p4 at 1, 3, 4.5 and 7; the level U = 0.7 falls between F(3) = 5/8 and F(4.5) = 7/8.]
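A short Python sketch of the table-lookup sampling described in Example 10, applied to the distribution of Example 11 (seed and function name are illustrative choices):

import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed

def sample_discrete(c, p, n):
    """Inverse transform / table lookup for a discrete distribution:
    c are the ordered values c_1 < ... < c_n, p the probabilities p_i."""
    q = np.cumsum(p)                 # cumulative probabilities q_1, ..., q_n
    u = rng.uniform(size=n)
    idx = np.searchsorted(q, u)      # smallest K with u <= q_K
    return np.asarray(c)[idx]

x = sample_discrete([1, 3, 4.5, 7], [1/8, 1/2, 2/8, 1/8], n=100_000)
print((x == 4.5).mean())  # should be close to p_3 = 0.25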
• Reading: (Glasserman, 2004, Section 2.2.1, pp. 54–58).
1.4 Von Neumann’s acceptance-rejection method for gen-
erating a sample from a probability density function f
• We would like to sample from a target distribution which has corresponding probability
density f on the set X (where X ⊆ Rd).
• Suppose there is a density g on X from which we know how to generate a sample and for which

f(x) ≤ cg(x) for all x ∈ X, (1.3)

for a constant c.
• The idea is now to generate X from g and accept the sample with probability f(X)/(cg(X)).
[Figure: the graphs of cg(x) (upper curve) and f(x) (lower curve); points drawn uniformly below cg are accepted if they lie below f ('◦') and rejected otherwise ('+').]
Simulation by acceptance-rejection sampling. Graphical interpretation:
• Sample X0 from g.
• Given this X0, sample Ũ from a uniform distribution on [0, cg(X0)].
• If Ũ ≤ f(X0) (labeled as '◦'), accept X0; otherwise (labeled as '+') reject X0.
We can use the following modified algorithm:
Definition 12 (Acceptance-Rejection Algorithm). Suppose condition (1.3) is satisfied. Then Von Neumann's acceptance-rejection algorithm is given by
1. Generate X from the density g.
2. Generate U from the uniform U [0, 1] distribution.
3. If U ≤ f(X)/(cg(X)), then accept X and return it. Otherwise, go back to 1.
Hence, we see that we first generate a sample from another distribution (with density g) which might be easier to sample from and then reject some of those candidates. The rejection mechanism is designed such that the accepted candidates are indeed a sample from our target distribution.
We see that the algorithm in Definition 12 returns samples from a distribution which coincides with the distribution of X (which has density g(·)) conditional on U ≤ f(X)/(cg(X)) (where U ∼ U [0, 1]). The two random variables X and U are independent.
Theorem 13. Let f, g be probability density functions (pdf) on X ⊆ R^d satisfying condition (1.3) for a constant c ∈ R. Let X, U be two independent random variables where X has pdf g and U ∼ U [0, 1]. Then for any A ∈ B(X) it holds that

P( X ∈ A | U ≤ f(X)/(cg(X)) ) = ∫_A f(x) dx.

Corollary 14. The algorithm in Definition 12 works, i.e., it indeed returns a sample from the density f.
Note that B(X) denotes the Borel σ-algebra on X.
Proof of Theorem 13.
• For any A ∈ B(X) we compute

P( X ∈ A | U ≤ f(X)/(cg(X)) ) = P( X ∈ A, U ≤ f(X)/(cg(X)) ) / P( U ≤ f(X)/(cg(X)) ). (1.4)

• Since U, X are independent, the joint density of (U, X), denoted by h(·), can be written as the product of the two marginal densities:

h(u, x) = 1_[0,1](u) g(x), u ∈ R, x ∈ X.

The denominator in (1.4) is given by

P( U ≤ f(X)/(cg(X)) ) = ∫_X ∫_R 1{u ≤ f(x)/(cg(x))} h(u, x) du dx
  = ∫_X ∫_R 1{u ≤ f(x)/(cg(x))} 1_[0,1](u) g(x) du dx
  = ∫_X g(x) ∫_0^{f(x)/(cg(x))} du dx
  = ∫_X g(x) f(x)/(cg(x)) dx
  = (1/c) ∫_X f(x) dx = 1/c, (1.5)

where the last step uses that f is a density and hence integrates to 1.
(Here we take 0/0 = 1 if g(x) = 0 somewhere on X.)
Similarly, the numerator in (1.4) is given by

P( X ∈ A, U ≤ f(X)/(cg(X)) ) = ∫_X ∫_R 1_A(x) 1{u ≤ f(x)/(cg(x))} g(x) 1_[0,1](u) du dx
  = ∫_A g(x) ∫_0^{f(x)/(cg(x))} du dx
  = ∫_A g(x) f(x)/(cg(x)) dx
  = (1/c) ∫_A f(x) dx. (1.6)

• By combining (1.5) and (1.6), we see that (1.4) reduces to

P( X ∈ A | U ≤ f(X)/(cg(X)) ) = ∫_A f(x) dx.

• Hence, we see that the algorithm indeed returns samples from the density f.
Remark 15.
• From (1.5) we see that the probability that X generated from g is accepted is P( U ≤ f(X)/(cg(X)) ) = 1/c.
• Obviously, one will try to maximize this acceptance probability.
• Since f, g are both probability densities they integrate to 1, and since (1.3) holds,

1 = ∫_X f(x) dx ≤ ∫_X c g(x) dx = c.

Hence, one would try to choose c as close to 1 as possible.
Example 16. We can use Von Neumann's acceptance-rejection method to generate samples from the standard normal distribution using the double exponential distribution as follows.
• Consider the probability density function of the standard normal distribution

f(x) = (1/√(2π)) e^{−x²/2} for x ∈ R,

• and the probability density function of the double exponential distribution

g(x) = (1/2) e^{−|x|} for x ∈ R.

[Figure: plot of f(x) and of √(2e/π) g(x) on [−3, 3]; the rescaled double exponential density dominates the standard normal density.]
• We next look for a constant c ≥ 1 such that (1.3) is true.
• We calculate

f(x)/g(x) = (2/√(2π)) e^{−x²/2 + |x|} = √(2/π) e^{−(x² − 2|x| + 1)/2 + 1/2} = √(2e/π) e^{−(|x|−1)²/2} ≤ √(2e/π) =: c ≈ 1.3155.

• In fact, the constant c = √(2e/π) is the smallest one such that (1.3) holds true, because f(1) = √(2e/π) g(1) and f(−1) = √(2e/π) g(−1).
We find that Von Neumann's algorithm for generating a sample from the standard normal distribution, which has probability density function f, using samples from the double exponential distribution can be described as follows.
• Von Neumann's algorithm for this example:
1. Generate X from the double exponential probability density function g and generate U from the uniform distribution on (0, 1);
2. If

U > f(X)/(cg(X)) = e^{−(|X|−1)²/2},

then reject X and go back to step (1);
3. Return X.
• The proportion of rejected samples is

1 − 1/c = 1 − √(π/(2e)) ≈ 23.98%.
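A minimal Python sketch of the acceptance-rejection sampler of Example 16 (seed, function name and the use of NumPy are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed
c = np.sqrt(2 * np.e / np.pi)        # smallest constant with f <= c*g

def sample_standard_normal_ar(n):
    """Acceptance-rejection sampling of N(0,1) with the double exponential
    proposal g(x) = 0.5*exp(-|x|), as in Example 16."""
    out = np.empty(n)
    i = 0
    while i < n:
        # double exponential sample: random sign times an Exp(1) variable
        x = rng.choice([-1.0, 1.0]) * (-np.log(rng.uniform()))
        u = rng.uniform()
        if u <= np.exp(-(abs(x) - 1.0) ** 2 / 2.0):  # accept with prob f(x)/(c g(x))
            out[i] = x
            i += 1
    return out

z = sample_standard_normal_ar(50_000)
print(z.mean(), z.var())  # should be close to 0 and 1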
• Reading: (Glasserman, 2004, Section 2.2.2, pp. 58 – 63).
A very nice description and proof of the acceptance-rejection method is given in Flury (1990).
1.5 The Box-Muller method for generating a sample from
the bivariate standard normal distribution
We can introduce the Box-Muller algorithm by means of the following observations.
Let (X1, X2)⊤ ∼ N2(0, I2), and define the random variable R = X1² + X2².
1. Then R is exponentially distributed with mean 2 (i.e., parameter 1/2).
2. Given R, the point (X1, X2)⊤ is uniformly distributed on the circle of radius √R centered around the origin.
Proposition 17. Let (X1, X2)⊤ ∼ N2(0, I2). Then the random variable R = X1² + X2² is exponentially distributed with mean 2 (i.e., parameter 1/2).
Proof. To see this claim, we calculate the moment generating function:

M_{X1²+X2²}(θ) = E[ e^{θ(X1²+X2²)} ] = E[ e^{θX1²} ] E[ e^{θX2²} ]
  = ( (1/√(2π)) ∫_{−∞}^{∞} e^{(θ−1/2)x1²} dx1 ) ( (1/√(2π)) ∫_{−∞}^{∞} e^{(θ−1/2)x2²} dx2 )
  = (1/√(1 − 2θ)) (1/√(1 − 2θ)) = 1/(1 − 2θ) for θ < 1/2,

which proves that X1² + X2² has the moment generating function of an exponentially distributed random variable with mean 2 (see also Exercise).
Recall that ∫_{−∞}^{∞} (1/√(2πσ²)) e^{−x²/(2σ²)} dx = 1, equivalently ∫_{−∞}^{∞} (1/√(2π)) e^{−x²/(2σ²)} dx = √(σ²). Hence, with σ² = 1/(1 − 2θ) we have θ − 1/2 = −1/(2σ²), which gives the value of the two integrals above.
The result then follows with Theorem 24.
• We can represent each point (x1, x2)⊤ in the 2-dimensional plane by its distance from the Cartesian origin (0, 0)⊤ and an angle θ from the Cartesian x-axis (polar coordinates).
• Hence we see that we can first generate the random variable R, which is exponentially distributed with parameter 1/2 and can be sampled by the inversion method by setting

R = −2 log(U1), U1 ∼ U(0, 1).

• Then we generate a random angle θ uniformly on [0, 2π) by setting

θ = 2πU2, U2 ∼ U [0, 1).

• The corresponding point on the circle with radius √R centered around the origin is then given by (√R cos(θ), √R sin(θ)) and is the required sample from a bivariate normal distribution.
We can write this as a compact algorithm:
Definition 18. The Box-Muller method is defined as follows:
1. Generate independent U1, U2 ∼ U(0, 1);
2. Set R = −2 ln(U1);
3. Set θ = 2πU2;
4. Set X1 = √R cos(θ);
5. Set X2 = √R sin(θ);
6. Return X1 and X2.
Theorem 19. Let R, θ be two independent random variables with R ∼ Exp(1/2) and θ ∼ U [0, 2π). Then (√R cos(θ), √R sin(θ))⊤ ∼ N2(0, I2).
Corollary 20. The algorithm in Definition 18 returns a sample (X1, X2)⊤ from the bivariate standard normal distribution.
Proof of Theorem 19.
• To prove the statement consider

P( √R cos(θ) ≤ z1, √R sin(θ) ≤ z2 ) = ∫_0^{2π} ∫_0^{∞} 1{√r cos(θ) ≤ z1, √r sin(θ) ≤ z2} (1/(2π)) (1/2) e^{−r/2} dr dθ.

• Change of variables from polar coordinates to Cartesian coordinates:

x1 = √r cos(θ) =: h1(r, θ), x2 = √r sin(θ) =: h2(r, θ).

• With h(r, θ) = (h1(r, θ), h2(r, θ))⊤ we compute the absolute value of the determinant of the Jacobi matrix:

|det Dh(r, θ)| = |det( ∂h1/∂r, ∂h1/∂θ ; ∂h2/∂r, ∂h2/∂θ )| = |det( cos(θ)/(2√r), −√r sin(θ) ; sin(θ)/(2√r), √r cos(θ) )| = 1/2.

• Recall the change-of-variables formula

∫_{h(G)} f(x1, x2) d(x1, x2) = ∫_G f(h(r, θ)) |det Dh(r, θ)| d(r, θ),

• and hence we obtain

P( √R cos(θ) ≤ z1, √R sin(θ) ≤ z2 )
  = ∫_0^{2π} ∫_0^{∞} 1{√r cos(θ) ≤ z1, √r sin(θ) ≤ z2} (1/(2π)) e^{−r/2} (1/2) dr dθ
  = ∫_{−∞}^{z1} ∫_{−∞}^{z2} (1/(2π)) e^{−(x1²+x2²)/2} dx1 dx2
  = ∫_{−∞}^{z1} (1/√(2π)) e^{−x1²/2} dx1 ∫_{−∞}^{z2} (1/√(2π)) e^{−x2²/2} dx2.
• Reading: (Glasserman, 2004, p.65–67)
Note that the proof in (Seydel, 2009, p. 78–81) is not completely correct, since if one wants to invert the tangent one needs to restrict it to an interval (usually (−π/2, π/2)). Then arctan : R → (−π/2, π/2) ≠ [0, 2π]. The correct inverse function, however, can be defined. It is usually a shifted version of a function called atan2. The inverse can be defined such that it indeed maps R to [0, 2π].
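A minimal Python sketch of the Box-Muller method from Definition 18 (seed and function name are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(seed=4)  # arbitrary seed

def box_muller(n):
    """Box-Muller method (Definition 18): returns 2n standard normal samples."""
    u1 = rng.uniform(size=n)
    u2 = rng.uniform(size=n)
    r = -2.0 * np.log(u1)          # R ~ Exp(1/2), i.e. mean 2
    theta = 2.0 * np.pi * u2       # uniform angle on [0, 2*pi)
    x1 = np.sqrt(r) * np.cos(theta)
    x2 = np.sqrt(r) * np.sin(theta)
    return np.concatenate([x1, x2])

z = box_muller(50_000)
print(z.mean(), z.std())  # should be close to 0 and 1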
1.6 Generating a sample from a multivariate normal distribution Nd(µ, Σ) using samples from the standard normal distribution N1(0, 1): the Cholesky factorization algorithm
Lemma 21. Let X ∼ Nd(0, Id), where Id is the d × d identity matrix (ones on the diagonal, zeros elsewhere), and consider a vector µ ∈ R^d and a matrix A ∈ R^{d×d}.
Then µ + AX ∼ Nd(µ, AA⊤).
Proof of Lemma 21. Using (1.8) below, we can see that the moment generating function of the random variable µ + AX is given by

M_{µ+AX}(θ) = E[ e^{θ⊤(µ+AX)} ] = e^{θ⊤µ} E[ e^{(A⊤θ)⊤X} ] = e^{θ⊤µ} M_X(A⊤θ) = e^{θ⊤µ} e^{(1/2)‖A⊤θ‖₂²} = e^{θ⊤µ + (1/2)θ⊤(AA⊤)θ} for all θ ∈ R^d.

In view of (1.8), it follows that µ + AX has the moment generating function of an Nd(µ, AA⊤) random variable, and the result follows.
• Lemma 21 shows that, to generate a sample from the Nd(µ, AA⊤) distribution, it suffices
– to generate a sample from the Nd(0, Id) distribution,
– which is the same as generating d independent numbers from the N1(0, 1) distribution,
– and to multiply them by A and add µ.
• We can generate a sample from the Nd(0, Σ) distribution in the same fashion, provided we can express the symmetric matrix Σ as Σ = AA⊤ for some matrix A.
Definition 22. The Cholesky factorization of a symmetric matrix Σ is a representation

Σ = AA⊤,

where A is a lower triangular matrix.
If Σ is a symmetric matrix which is positive definite, then a Cholesky factorization exists.
• Why do we consider the Cholesky factorization?
• Consider Z = µ + AX.
• If A is a lower triangular matrix, this simplifies to

Z1 = µ1 + A11 X1,
Z2 = µ2 + A21 X1 + A22 X2,
. . .
Zd = µd + Ad1 X1 + . . . + Add Xd.
• The Cholesky factorization of a symmetric matrix Σ into the product of a lower triangular matrix A and its transpose A⊤ starts from the calculation

AA⊤ =
[ A11   0     ···  0   ] [ A11  A21  ···  Ad1 ]
[ A21   A22   ···  0   ] [ 0    A22  ···  Ad2 ]
[ ...   ...        ... ] [ ...  ...       ... ]
[ Ad1   Ad2   ···  Add ] [ 0    0    ···  Add ]
=
[ A11²      A11 A21            ···  A11 Ad1                   ]
[ A21 A11   A21² + A22²        ···  A21 Ad1 + A22 Ad2         ]
[ ...       ...                     ...                       ]
[ Ad1 A11   Ad1 A21 + Ad2 A22  ···  Ad1² + Ad2² + · · · + Add² ]

• Equating the matrix on the right-hand side of this expression to Σ, we obtain a system of equations:

For j = 1, i = 1 : A11² = Σ11 ⇒ A11 = √Σ11.
For j = 1, i = 2 : A21 A11 = Σ21 ⇒ A21 = Σ21/A11.
. . .
For j = 1, i = d : Ad1 A11 = Σd1 ⇒ Ad1 = Σd1/A11.
For j = 2, i = 2 : A21² + A22² = Σ22 ⇒ A22 = √(Σ22 − A21²).
. . .

We find one new entry of A in each equation. Therefore we can solve for the individual entries sequentially. We obtain the following algorithm for solving these equations.
In the equations above we take A11 = √Σ11 etc. The matrix A is not unique, since setting A11 = −√Σ11 etc. would have also been possible. Note that there exists an alternative definition of the Cholesky factorization which additionally requires that the diagonal elements are non-negative.
Definition 23. Let Σ ∈ R^{d×d} be symmetric and positive definite. Then its Cholesky factorization Σ = AA⊤ can be computed as follows:
For j = 1, . . . , d:
  For i = j:
    Ajj = √( Σjj − ∑_{k=1}^{j−1} Ajk² ).
  For i = j + 1, . . . , d:
    Aij = ( Σij − ∑_{k=1}^{j−1} Aik Ajk ) / Ajj.
• One can prove by induction that Σjj − ∑_{k=1}^{j−1} Ajk² in the algorithm is non-negative.
• Note that all Ajj ≠ 0. We therefore do not divide by 0 in the algorithm.
To see that all Ajj ≠ 0, recall:
• Since the matrix Σ is assumed to be positive definite, i.e., x⊤Σx > 0 for all x ∈ R^d \ {0}, all its eigenvalues are positive.
• Hence det(Σ) > 0 (since the determinant is the product of the eigenvalues).
• Moreover, det(Σ) = det(A) det(A⊤) = (A11 · . . . · Add)² ≠ 0 (since A is a lower triangular matrix).
The algorithm above computes the d(d + 1)/2 unknown elements of the matrix A using O(d³) operations.
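A minimal Python sketch of Definition 23 combined with Lemma 21 (function names, seed and the example covariance matrix are illustrative assumptions):

import numpy as np

def cholesky_lower(Sigma):
    """Cholesky factorization Sigma = A A^T with A lower triangular (Definition 23)."""
    d = Sigma.shape[0]
    A = np.zeros((d, d))
    for j in range(d):
        A[j, j] = np.sqrt(Sigma[j, j] - np.sum(A[j, :j] ** 2))
        for i in range(j + 1, d):
            A[i, j] = (Sigma[i, j] - np.sum(A[i, :j] * A[j, :j])) / A[j, j]
    return A

def sample_multivariate_normal(mu, Sigma, n, rng):
    """Sample n draws from N_d(mu, Sigma) via mu + A X with X ~ N_d(0, I_d) (Lemma 21)."""
    A = cholesky_lower(np.asarray(Sigma, dtype=float))
    X = rng.standard_normal((n, len(mu)))
    return np.asarray(mu) + X @ A.T

rng = np.random.default_rng(seed=5)  # arbitrary seed
Sigma = np.array([[4.0, 1.0], [1.0, 2.0]])
Z = sample_multivariate_normal([0.0, 1.0], Sigma, n=100_000, rng=rng)
print(np.cov(Z, rowvar=False))  # should be close to Sigma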
• Reading: (Glasserman, 2004, Section 2.3, particularly pp. 72-73).
1.7 Appendix
1.7.1 The moment generating function
Let X be a random variable with cdf F. The moment generating function of X is defined as

ψX(t) = E e^{tX} = ∫_{−∞}^{∞} e^{tx} dF(x),

provided the expectation is finite for |t| < h, for some h > 0.
Theorem 24. Let X and Y be random variables and let ψX(t) = ψY(t) for all t with |t| < h for some h > 0. Then X and Y have the same distribution.
This theorem and its proof can be found in (Gut, 2005, Theorem 8.1, p. 189).
1.7.2 The normal distribution
• The probability density function fX of a random variable X ∼ N1(µ, σ²) is given by

fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)},

and its cumulative distribution function is given by

P(X ≤ x) ≡ Φ_{µ,σ²}(x) = Φ( (x − µ)/√(σ²) ) for x ∈ R,

where Φ is the cumulative distribution function of the standard normal distribution. Indeed, assuming that σ > 0, we can calculate

P(X ≤ x) = (1/(σ√(2π))) ∫_{−∞}^{x} e^{−(t−µ)²/(2σ²)} dt = (1/√(2π)) ∫_{−∞}^{(x−µ)/σ} e^{−s²/2} ds = Φ( (x − µ)/σ )

for x ∈ R, where the second equality is obtained by the substitution s = (t − µ)/σ.
The moment generating function of a random variable X ∼ N1(µ, σ²) is given by

MX(θ) := E[ e^{θX} ] = e^{µθ + (1/2)σ²θ²} for θ ∈ R. (1.7)
• The d-dimensional normal distribution Nd(µ, Σ) is characterised by its mean vector µ and its covariance matrix Σ. If X ∼ Nd(µ, Σ), then

E[Xi] = µi for i = 1, . . . , d,

and

E[ (Xi − E[Xi]) (Xj − E[Xj]) ] = Σij for i, j = 1, . . . , d.

A covariance matrix Σ is always symmetric and positive semidefinite, i.e., x⊤Σx ≥ 0 for all x ∈ R^d. In what follows, we assume that Σ is actually positive definite, i.e., that x⊤Σx > 0 for all x ∈ R^d \ {0}.
• The probability density function fX of a d-variate normal random variable X ∼ Nd(µ, Σ) is given by

fX(x) = (1/((2π)^{d/2} √|Σ|)) exp( −(1/2)(x − µ)⊤ Σ⁻¹ (x − µ) ),

where |Σ| is the determinant of the matrix Σ.
The choices µ = 0 and Σ = Id (the d × d identity matrix) give rise to the standard d-dimensional normal distribution Nd(0, Id). In this case, if X ∼ Nd(0, Id), then

fX(x) = (1/(2π)^{d/2}) exp( −(1/2)‖x‖₂² ),

where ‖x‖₂² = x⊤x = x1² + x2² + · · · + xd².
• The moment generating function of a d-variate Nd(µ, Σ) random variable X is given by

MX(θ) := E[ e^{θ⊤X} ] = e^{θ⊤µ + (1/2)θ⊤Σθ} for θ ∈ R^d. (1.8)
Chapter 2
Foundations of Monte Carlo
Methods
2.1 Introduction to Monte Carlo integration
• The numerical evaluation of definite integrals is one of the main applications that Monte Carlo simulation is concerned with.
• To fix ideas, let f be a given function and consider the integral

I = ∫_0^1 f(s) ds.

• We can view this integral as the expectation of the random variable X = f(U), where U ∼ U(0, 1). Indeed,

I = E[X] = E[f(U)].

• This representation gives rise to the Monte Carlo estimator of I, which is given by

In = (1/n) ∑_{i=1}^n Xi = (1/n) ∑_{i=1}^n f(Ui),

where U1, . . . , Un are i.i.d. random variables from the U(0, 1) distribution and Xi = f(Ui).
2.2 Monte Carlo estimators
Definition 25. Suppose that X is a random variable such that E[|X|] < ∞. Let X1, . . . , Xn be i.i.d. random variables from the distribution of X. A Monte Carlo estimator of µ = E[X] is given by the sample mean

X̄n = (1/n) ∑_{i=1}^n Xi. (2.1)

Remark 26.
• The mean of X̄n is

E[X̄n] = (1/n) ∑_{i=1}^n E[Xi] = E[X] = µ,

so this estimator is unbiased.
• Also, its variance is

var(X̄n) = (1/n²) ∑_{i=1}^n var(Xi) = (1/n) var(X). (2.2)
Remark 27.
• The Strong Law of Large Numbers implies that the Monte Carlo estimator X̄n of µ = E[X] given by (2.1) is strongly consistent, i.e.,

lim_{n→∞} X̄n = µ, P-a.s..

• Since P-a.s. convergence implies convergence in probability, the Monte Carlo estimator X̄n of µ = E[X] is also (weakly) consistent, i.e.,

lim_{n→∞} P( |X̄n − µ| > ε ) = 0 for all ε > 0.

For details on different types of convergence and implications see also Remark 30 below.
2.3 Confidence intervals of Monte Carlo estimators
To study the rate of convergence of Monte Carlo estimators, we further assume that

0 < σ := √(var(X)) < ∞,

so that the Central Limit Theorem can be used.
Remark 28.
We can use the Central Limit Theorem to obtain

lim_{n→∞} P( (X̄n − µ)/(σ/√n) ≤ x ) = Φ(x) for all x ∈ R,

where Φ is the cdf of the standard normal distribution. It follows that

lim_{n→∞} P( µ ∈ ( X̄n − aσ/√n , X̄n + bσ/√n ) )
  = lim_{n→∞} P( X̄n − aσ/√n < µ < X̄n + bσ/√n )
  = lim_{n→∞} P( −aσ/√n < µ − X̄n < bσ/√n )
  = lim_{n→∞} P( −b < (X̄n − µ)/(σ/√n) < a )
  = lim_{n→∞} P( (X̄n − µ)/(σ/√n) < a ) − lim_{n→∞} P( (X̄n − µ)/(σ/√n) ≤ −b )
  = Φ(a) − Φ(−b) for all a, b ≥ 0, (2.3)

which provides asymptotic confidence intervals for the mean µ.
See Section 2.4 for details on the Central Limit Theorem.
• In view of the symmetry of the standard normal distribution, these confidence intervals will be narrowest for each level of confidence if b = a.
• For this reason, given ε ∈ (0, 1), we consider the unique point aε such that

Φ(aε) = 1 − ε/2, (2.4)

and we note that Φ(−aε) = ε/2, since Φ(−x) = 1 − Φ(x) for all x.
• For a = b = aε, (2.3) implies that

lim_{n→∞} P( µ ∈ ( X̄n − aε σ/√n , X̄n + aε σ/√n ) ) = 1 − ε. (2.5)

• We conclude that the mean µ = E[X] belongs to the 1 − ε confidence interval ( X̄n − aε σ/√n , X̄n + aε σ/√n ) as n → ∞.
• In this context, we can state that the rate of convergence of the Monte Carlo method is O_P(n^{−1/2}).
Recall that for two sequences of random variables (an), (bn) we say that an = O_P(bn) if for all ε > 0 there exist C > 0 and n0 ∈ N such that P( |an/bn| ≤ C ) > 1 − ε for all n ≥ n0. Hence, X̄n − µ = O_P(n^{−1/2}).
Remark 29.
• The Central Limit Theorem provides information about the distribution of the error between the Monte Carlo estimator and µ; in particular, (X̄n − µ)/(σ/√n) is in the limit (i.e., as n → ∞) standard normally distributed.
• Since a standard normal distribution has support R, there is no finite bound on the error between the MC estimator X̄n and µ.
• Suppose Z ∼ N1(0, 1) and we choose ε = 0.05. Then aε ≈ 1.96 and

P(|Z| < 1.96) ≈ 0.95.

• Hence, we can say that with a probability close to 0.95, for n large enough, the error between the MC estimator X̄n and the true µ is bounded by 1.96 σ/√n.
• In practice, the variance σ² of X is typically unknown.
• In such a case, we may consider a consistent estimator sn > 0 of σ > 0.
• The convergence of sn to σ in probability implies that σ/sn converges to 1 in probability, hence, in distribution.
• By combining the Central Limit Theorem (Theorem 33) and the fact that σ/sn converges to 1 in probability, we obtain using Slutsky's Lemma (Lemma 32) that

(X̄n − µ)/(sn/√n) ≡ ((X̄n − µ)/(σ/√n)) · (σ/sn) ⇒ Z for Z ∼ N(0, 1).

• Combining this with the analysis in Remark 28 implies that

( X̄n − aε sn/√n , X̄n + aε sn/√n )

is an asymptotically valid 1 − ε confidence interval.
• In practice, we often consider the sample standard deviation

sn = √( (1/(n − 1)) ∑_{i=1}^n (Xi − X̄n)² )

when σ is not known.
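A minimal Python sketch of the Monte Carlo estimator with the asymptotic confidence interval of Remark 29 (seed, function name and the fixed quantile 1.96 for ε = 0.05 are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(seed=6)  # arbitrary seed

def mc_estimate(f, n, a_eps=1.96):
    """Monte Carlo estimator of E[f(U)], U ~ U(0,1), with asymptotic CI;
    a_eps = 1.96 corresponds to eps = 0.05."""
    x = f(rng.uniform(size=n))
    xbar = x.mean()
    s = x.std(ddof=1)                    # sample standard deviation s_n
    half_width = a_eps * s / np.sqrt(n)  # a_eps * s_n / sqrt(n)
    return xbar, (xbar - half_width, xbar + half_width)

est, ci = mc_estimate(np.exp, n=100_000)
print(est, ci)  # E[e^U] = e - 1 ≈ 1.71828 should lie in the interval roughly 95% of the time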
• We see that the number of Xi needed to achieve a certain degree of accuracy
depends on the variance σ² of the Xi in (2.1).
• One way of improving the efficiency of Monte Carlo simulation schemes is to con-
sider sampling that is associated with reduced variance, which is the topic of Chapter
4.
2.4 Appendix: Convergence in distribution (weak conver-
gence) and the Central Limit Theorem
• A sequence of random variables (Xn) converges to a random variable X, P-a.s., if

P( lim_{n→∞} Xn = X ) = 1.

• A sequence of random variables (Xn) converges to a random variable X in probability if

lim_{n→∞} P(|Xn − X| > ε) = 0 for all ε > 0.

• A sequence of random variables (Xn) converges to a random variable X in L¹ if

lim_{n→∞} E[|Xn − X|] = 0.

• A sequence of random variables (Xn) converges in distribution to a random variable X if

lim_{n→∞} E[f(Xn)] = E[f(X)]

for every bounded and continuous function f : R → R. Convergence in distribution is also called weak convergence, and it is denoted by "⇒": if (Xn) converge weakly to X, then we write Xn ⇒ X.
Remark 30. The following statements are true:
- P-a.s. convergence implies convergence in probability;
- convergence in L¹ implies convergence in probability;
- convergence in probability implies weak convergence.
Remark 31. A sequence of random variables (Xn) converges in distribution to the random variable X if and only if

lim_{n→∞} Fn(x) = F(x) for all x ∈ R at which F is continuous,

where Fn (resp., F) is the distribution function of Xn (resp., X); see the Portmanteau Theorem, (Kallenberg, 2002, Theorem 4.25, p. 75).
• Weak convergence is also equivalent to pointwise convergence of characteristic functions. Indeed, Xn ⇒ X if and only if

lim_{n→∞} E[ e^{itXn} ] = E[ e^{itX} ] for all t ∈ R,

where i = √(−1).
Lemma 32 (Slutsky's Lemma). Consider two sequences of random variables such that Xn converges to a random variable X in distribution and Yn converges to a constant c in probability as n → ∞. Then

Xn + Yn ⇒ X + c and Xn Yn ⇒ Xc as n → ∞. (2.6)

Theorem 33 (Central Limit Theorem). If (Xn) is a sequence of independent and identically distributed random variables with finite mean µ and finite variance σ² > 0, then

(X̄n − µ)/(σ/√n) = (1/√n) ∑_{i=1}^n (Xi − µ)/σ ⇒ Z for Z ∼ N1(0, 1) as n → ∞. (2.7)
• Reading: (Glasserman, 2004, Appendix A) (Seydel, 2009, Section 2.4).
Chapter 3
Alternatives to Monte Carlo
Methods
In the following, we briefly introduce some alternatives to Monte Carlo methods of approximating
integrals. We are particularly interested in the difference in the rate of convergence between these
methods.
3.1 Quasi-Monte Carlo methods
Quasi-Monte Carlo methods do not try to mimic randomness as Monte Carlo methods do. Instead they use points that are evenly distributed over the area of interest.
• Suppose U1, U2, . . . , Ud are i.i.d. random variables from a uniform distribution (on [0, 1)). Suppose the objective is to calculate

E(f(U1, . . . , Ud)) = ∫_{[0,1)^d} f(x) dx.

• We can simulate V1, . . . , Vn ∼ U [0, 1)^d and compute

E(f(U1, . . . , Ud)) ≈ (1/n) ∑_{i=1}^n f(Vi).

(We will discuss this in detail in Chapter 2.)
• Quasi-Monte Carlo methods approximate this integral by

∫_{[0,1)^d} f(x) dx ≈ (1/n) ∑_{i=1}^n f(xi),

for carefully and deterministically chosen points x1, . . . , xn ∈ [0, 1)^d.
• Quasi-Monte Carlo methods attempt to improve the rate of convergence of Monte Carlo methods by spreading out the points more evenly.
Recall that i.i.d. stands for independent and identically distributed. If a collection of random
variables is i.i.d. that means that they all have the same probability distribution and they are
mutually independent.
In ordinary Monte Carlo simulation in higher dimensions (e.g., when integrating over [0, 1)d),
taking a scalar i.i.d. sequence of uniforms u1, u2, . . . and taking vectors V1 = (u1, . . . , ud), V2 =
(ud+1, . . . , u2d), . . . produces a sequence of points in the d-dimensional hypercube. In Quasi Monte
Carlo the construction of the points xi depends explicitly on the dimension of the problem.
The Van der Corput sequences that we now introduce provide an example of a Quasi Monte
Carlo sequences.
Remark 34. Let b ≥ 2 be an integer which we call the base. Recall that every positive integer k has a unique base-b representation

k = ∑_{j=0}^{N(k)} aj(k) b^j,

where aj(k) ∈ {0, 1, . . . , b − 1} for all j ∈ {0, 1, . . . , N(k)}, N(k) ∈ N, and a_{N(k)}(k) ≠ 0.
Example 35. For base b = 2 we obtain, for example,

7 = 1 · 2^0 + 1 · 2^1 + 1 · 2^2.
We can now define a function which maps each k ∈ N to a point in [0, 1) as follows:
Definition 36. The radical inverse function with base b is defined by

ϕb(k) = ∑_{j=0}^{N(k)} aj(k) · b^{−j−1}, for k = 1, 2, . . . , (3.1)

where N(k) and (a0(k), . . . , a_{N(k)}(k)) are uniquely determined by the base-b representation of k.
The sequences (ϕb(k))_{k∈N} thus constructed are called Van der Corput sequences.
Example 37. For base b = 2, we can see that

1 = 1 · 2^0,   2 = 0 · 2^0 + 1 · 2^1,   3 = 1 · 2^0 + 1 · 2^1,
4 = 0 · 2^0 + 0 · 2^1 + 1 · 2^2,   5 = 1 · 2^0 + 0 · 2^1 + 1 · 2^2,   6 = 0 · 2^0 + 1 · 2^1 + 1 · 2^2,
7 = 1 · 2^0 + 1 · 2^1 + 1 · 2^2,   . . .

The associated Van der Corput sequence is therefore given by

ϕ2(1) = 1 · 2^{−1} = 1/2,   ϕ2(2) = 0 · 2^{−1} + 1 · 2^{−2} = 1/4,   ϕ2(3) = 1 · 2^{−1} + 1 · 2^{−2} = 3/4,
ϕ2(4) = 1/8,   ϕ2(5) = 5/8,   ϕ2(6) = 3/8,   ϕ2(7) = 7/8,   . . .
• We see that if we increase k we get a finer mesh filled by the Van der Corput sequence.
• The larger the base, the greater the number of points required to achieve uniformity.
In the following we are interested in generating points in [0, 1)d.
Definition 38. Let p1, p2, . . . , pd be pairwise relatively prime numbers. The Halton sequence is defined as the sequence of d-dimensional vectors

xk = ( ϕ_{p1}(k), ϕ_{p2}(k), . . . , ϕ_{pd}(k) )⊤, k = 1, 2, . . . ,

where ϕ_{pi}(k) are the radical inverse functions with base pi defined in (3.1).
• In practice, the first d of the prime numbers 2, 3, 5, 7, 11, 13, . . . are used for p1, . . . , pd.
• We see that the coordinates of the Halton sequence follow Van der Corput sequences in distinct bases.
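A minimal Python sketch of the radical inverse function (3.1) and the Halton sequence of Definition 38 (function names are illustrative assumptions):

import numpy as np

def radical_inverse(k, b):
    """Radical inverse function phi_b(k): reflect the base-b digits of k about the radix point."""
    phi, inv_b = 0.0, 1.0 / b
    while k > 0:
        k, digit = divmod(k, b)   # extract base-b digits a_0(k), a_1(k), ...
        phi += digit * inv_b
        inv_b /= b
    return phi

def halton(n, primes=(2, 3)):
    """First n points of the Halton sequence in dimension d = len(primes)."""
    return np.array([[radical_inverse(k, p) for p in primes] for k in range(1, n + 1)])

print([radical_inverse(k, 2) for k in range(1, 8)])  # 1/2, 1/4, 3/4, 1/8, 5/8, 3/8, 7/8 (Example 37)
print(halton(5))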
Quasi-Monte Carlo methods can provide faster convergence than Monte Carlo methods in some
situations.
See (Glasserman, 2004, Chapter 5) for more details on Quasi-Monte Carlo methods.
• Reading: (Glasserman, 2004, Introduction to Chapter 5 + Section 5.1.1-5.1.2., pp. 281–287),
(Seydel, 2009, Section 2.5, pp. 88–93).
3.2 Quadrature methods
3.2.1 One-dimensional integration
• We are considering the problem of approximating ∫_a^b f(x) dx, where f : R → R.
• Monte Carlo method: the rate of convergence is O_P(n^{−1/2}).
• Quadrature: Finding approximations to the integral ∫_a^b f(x) dx is also a classical problem of numerical analysis.
– Often the function f is approximated by some polynomial Pm.
– Then the integral ∫_a^b Pm(x) dx is considered, which leads to the so-called quadrature formulae.
• The trapezoidal rule yields

∫_a^b f(x) dx ≈ (b − a) (f(a) + f(b))/2.

• Here the function f is approximated by a polynomial of degree 1 going through (a, f(a)) and (b, f(b)), namely P1(x) = ((f(a) − f(b))/(a − b)) x + (a f(b) − b f(a))/(a − b).
• This can be generalized to

∫_a^b f(x) dx ≈ ((b − a)/n) ( (f(a) + f(b))/2 + ∑_{i=1}^{n−1} f(a + i(b − a)/n) ),

which can be shown to have a rate of convergence of order O(n^{−2}) for f ∈ C².
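A minimal Python sketch of the generalized trapezoidal rule just stated (function name and the test integrand are illustrative assumptions):

import numpy as np

def trapezoid(f, a, b, n):
    """Trapezoidal rule with n subintervals of [a, b]."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    return (b - a) / n * (0.5 * y[0] + y[1:-1].sum() + 0.5 * y[-1])

# Example: integral of exp over [0, 1]; the error decays like O(n^{-2})
exact = np.exp(1) - 1
for n in (10, 100, 1000):
    print(n, abs(trapezoid(np.exp, 0.0, 1.0, n) - exact))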
3.2.2 d-dimensional integration
• Consider the integral ∫_{[a1,b1]×···×[ad,bd]} f(x) dx, where f : R^d → R.
• Monte Carlo method: the rate of convergence is O_P(n^{−1/2}). (n is the sample size of vectors in R^d.)
• Trapezoidal rule: O(n^{−2/d}) for twice continuously differentiable integrands. (n is the number of interpolating points.)
3.3 Which method should one choose?
• Choice of method depends on the dimension of the problem.
• Monte Carlo method is not competitive for one-dimensional integrands but very at-
tractive for high-dimensional problems.
• Note also that if we are interested in evaluating an expectation of a random variable in
dimension d = 1 we can only use numerical integration if we know the density
of this random variable. If the density is unknown and we can sample from the required
distribution it can still be necessary/useful to evaluate expected values of one-dimensional
random variables by Monte-Carlo methods!
See also the discussion in (Glasserman, 2004, Section 1.1.).
Chapter 4
Variance Reduction Techniques
for Monte Carlo Estimation
We assume that all random variables considered in this chapter are in L¹ and have finite variances.
• We consider the problem of estimating the expectation of a random variable X, i.e., we would like to estimate µ = E[X].
• The classical Monte Carlo estimator is X̄n = (1/n) ∑_{i=1}^n Xi, where X1, . . . , Xn are i.i.d. random variables with the same distribution as X.
• The variance of X̄n is var(X̄n) = var(X)/n.
• We consider three variance reduction methods for estimating µ. They all try to find another random variable Z such that E[Z] = µ.
• Then, they estimate µ using a Monte Carlo estimator Z̄n = (1/n) ∑_{i=1}^n Zi, where Z1, . . . , Zn are i.i.d. random variables with the same distribution as Z.
• If var(Z̄n) ≤ var(X̄n), then the Monte Carlo estimator Z̄n has a reduced variance compared to the original Monte Carlo estimator X̄n.
• How to find Z?
4.1 Control variates
We can introduce the context of the control variates technique by means of the following example.
Example 39.
• Suppose we want to estimate E[f(X)], where f is a given function and X is a random variable with known µ = E[X].
• Let (Xi) be an i.i.d. sequence of random variables with the same distribution as X, and define Yi = f(Xi); then the sample mean

Ȳn = (1/n) ∑_{i=1}^n Yi (4.1)

is an unbiased Monte Carlo estimator of E[Y] = E[f(X)].
• The sample mean X̄n = (1/n) ∑_{i=1}^n Xi, which arises as a byproduct in the computation of Ȳn, is an unbiased estimator of the known mean µ = E[X].
• The control variates technique uses this extra information to construct a Monte Carlo estimator of E[Y] = E[f(X)] that performs better than the estimator Ȳn given by (4.1).
• Let X and Y be random variables such that E[X] is known.
• The control variates method for estimating E[Y] can be described as follows.
– Given a sequence (Xi, Yi) of i.i.d. random vectors from the joint distribution of (X, Y), we define

Yi(b) = Yi − b (Xi − E[X]), for i = 1, 2, . . . ,

where b ∈ R is a constant.
– For each choice of b ∈ R, (Yi(b)) is a sequence of i.i.d. random variables such that

E[Yi(b)] = E[Yi] − b (E[Xi] − E[X]) = E[Y], (4.2)
var(Yi(b)) = var(Yi − bXi) = var(Y) − 2b cov(X, Y) + b² var(X). (4.3)
Definition 40. Suppose the pairs (Xi, Yi), i = 1, . . . , n, are i.i.d. and that the expectation EX is known.
((X, Y) denotes a generic pair of random variables with the same distribution as each (Xi, Yi).)
The control variate estimator with parameter b of E[Y] is defined by

Ȳn(b) := Ȳn − b (X̄n − E[X]) = (1/n) ∑_{i=1}^n [Yi − b (Xi − E[X])] = (1/n) ∑_{i=1}^n Yi(b). (4.4)

Note that the observed error X̄n − E[X] is used to control the estimation of E[Y].
Remark 41.
• The Strong Law of Large Numbers implies that Ȳn(b) is strongly consistent, i.e., lim_{n→∞} Ȳn(b) = E[Y], P-a.s..
• The mean of the control variate estimator is

E[Ȳn(b)] = (1/n) ∑_{i=1}^n E[Yi(b)] = E[Y] by (4.2), (4.5)

so it is unbiased.
Remark 42.
• The variance of the control variate estimator is

var(Ȳn(b)) = (1/n²) ∑_{i=1}^n var(Yi(b)) = (1/n) var(Yi(b)) = (1/n) [ var(Y) − 2b cov(X, Y) + b² var(X) ], (4.6)

using (4.3) in the last step.
• This is a function of b and can be minimized with respect to b.
• The value b* of the parameter b that minimizes the variance of Ȳn(b) is given by

b* = cov(X, Y)/var(X). (4.7)

• Substituting b* for b in (4.6), we obtain

var(Ȳn(b*)) = (1/n) [ var(Y) − cov(X, Y)²/var(X) ]. (4.8)

• This expression and the fact that

var(Ȳn) = (1/n) var(Y)

imply that

var(Ȳn(b*))/var(Ȳn) = 1 − cov(X, Y)²/(var(X) var(Y)) = 1 − ρ²_XY, (4.9)

where ρ_XY is the correlation between X and Y.
• The control variates method is useful provided that the correlation ρ_XY of X and Y is big and the extra computational effort associated with generating the samples Xi is relatively small.
• In practice, if E[Y] is not known and we need simulations to estimate it, then it is unlikely that cov(X, Y), which is needed for determining b* in (4.7), is known.
• In such a case, we can use an estimator b̂*n of b*.
• In particular, we can choose

b̂*n = ∑_{j=1}^n (Xj − X̄n)(Yj − Ȳn) / ∑_{j=1}^n (Xj − X̄n)²,

which converges in probability to b* as n → ∞.
• The corresponding control variates estimator is given by (1/n) ∑_{i=1}^n Yi(b̂*n), where

Yi(b̂*n) = Yi − b̂*n (Xi − E[X]).
Example 43. In the following we consider an example which shows the potential benefit of using control variates.
• Suppose we would like to compute the integral

µ = ∫_0^1 e^u du = e − 1

using Monte Carlo simulation.
• We can express µ as the expectation of a function of a U [0, 1] random variable by setting

µ = ∫_0^1 e^u du = E[e^U], where U ∼ U [0, 1].

• Hence, using the previous notation we are interested in EY, where Y = f(X), f(x) = e^x and X = U ∼ U [0, 1].
• The control variate estimator is

Ȳn(b) = (1/n) ∑_{i=1}^n ( e^{Ui} − b(Ui − EU) ), with EU = 1/2,

where U1, . . . , Un ∼ U [0, 1] i.i.d..
Using (4.7) and (4.8) we obtain

var(U) = 1/12,
var(e^U) = E[e^{2U}] − (E[e^U])² = (1/2)(e² − 1) − (e − 1)²,
cov(U, e^U) = E[U e^U] − E[U] E[e^U] = ∫_0^1 u e^u du − (1/2)(e − 1)
  = [u e^u]_{u=0}^{u=1} − ∫_0^1 e^u du − (1/2)(e − 1)
  = e − (e − 1) − (1/2)(e − 1) = −e/2 + 3/2.

• Then var(Ȳn(b*)) = (1/n) [ var(e^U) − cov(U, e^U)²/var(U) ] ≈ (1/n) · 0.003940.
• If we compare this to the variance of the standard Monte Carlo estimator, given by

var(Ȳn) = var( (1/n) ∑_{i=1}^n e^{Ui} ) = (1/n) var(e^U) ≈ (1/n) · 0.242036,

we find that

1 − var(Ȳn(b*))/var(Ȳn) ≈ 0.9837.

Hence, the control variate estimator has reduced the variance by 98.37% compared to the Monte Carlo estimator.
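A minimal Python sketch of the control variate estimator of Example 43, using the estimated coefficient b̂*n (seed and variable names are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed
n = 100_000

u = rng.uniform(size=n)
y = np.exp(u)                                   # Y_i = e^{U_i}, plain MC samples
x = u                                           # control variate X_i = U_i with E[X] = 1/2

# estimated optimal coefficient b* = cov(X, Y)/var(X), cf. (4.7)
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

mc_estimate = y.mean()
cv_estimate = y.mean() - b_hat * (x.mean() - 0.5)  # control variate estimator (4.4)

print(mc_estimate, cv_estimate, np.e - 1)       # both estimate e - 1, the CV estimate with far smaller variance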
We consider now an example from option pricing.
Example 44.
• Suppose we want to price a European call written on an underlying stock with:
– time-T price S(T),
– maturity T,
– strike price K.
– We assume a constant interest rate r ≥ 0.
• Then Y = f(S(T)) = e^{−rT}(S(T) − K)⁺.
• The discounted stock price is a martingale under the risk-neutral measure and hence

E( e^{−rT} S(T) ) = S(0)

(where the expectation is taken with respect to the risk-neutral measure).
• Let Si be i.i.d. random variables with the same distribution as S(T) and Yi = e^{−rT}(Si − K)⁺.
• Then a control variate estimator is given by

Ȳn(b) = (1/n) ∑_{i=1}^n ( Yi − b( e^{−rT} Si − S(0) ) ).

• Here b might be replaced by b̂*n.
• The effectiveness of the control variate depends heavily on K, see (Glasserman, 2004, Example 4.1.1). For K = 0 we would have perfect correlation.
Remark 45.
• Interpretation in a linear regression context: Suppose we fit a least squares regression line through the pairs (Xi, Yi), i = 1, . . . , n.
• We define Y = (Y1, . . . , Yn)⊤ ∈ R^n, β = (β0, β1)⊤ ∈ R², and the design matrix X ∈ R^{n×2} whose i-th row is (1, Xi).
• Then minimizing ‖Y − Xβ‖₂² over β yields the optimal β*, where

β0* = Ȳn − β1* X̄n,
β1* = ∑_{i=1}^n (Xi − X̄n)(Yi − Ȳn) / ∑_{i=1}^n (Xi − X̄n)².

• Hence we see that the regression line through the points (Xi, Yi) has slope β1* = b̂*n and passes through (X̄n, Ȳn).
• If we set

f(x) = β0* + β1* x,

we see that f(X̄n) = Ȳn, and if we evaluate this function (the regression line) at EX we get

f(EX) = Ȳn − β1* X̄n + β1* EX = Ȳn − β1*(X̄n − EX) = Ȳn − b̂*n(X̄n − EX) = Ȳn(b̂*n),

which is exactly the control variate estimator.
• Interpretation of the control variate method in a regression context:
• The regression line fitted to the points (Xi, Yi) has slope b̂*n and goes through (X̄n, Ȳn).
• The control variate estimator is the value of the regression line evaluated at EX.
• In this example the sample mean X̄n underestimates EX and the control variate estimator uses this information to adjust for it.
[Figure: regression line through the points (Xi, Yi), passing through (X̄n, Ȳn); its value at EX is the control variate estimator Ȳn(b̂*n).]
• Further remarks and examples can be found in Glasserman (2004):
* Using tractable options as control for evaluating more complicated option prices, (Glasser-
man, 2004, Example 4.1.2).
* Using tractable dynamics as control for evaluating option prices in a more complicated asset
pricing model, (Glasserman, 2004, Example 4.1.4).
• Hedges as control, (Glasserman, 2004, Example 4.1.5).
• A discussion on the differences between using the estimated b̂*n and the theoretical optimal b* is given in (Glasserman, 2004, pp. 195–196). This contains a motivation why using the estimated b̂*n rather than b* does asymptotically (for n → ∞) still work. For small sample issues, see (Glasserman, 2004, Section 4.1.3).
• Reading: (Glasserman, 2004, Section 4.1, particularly pp. 185–189).
(Seydel, 2009, pp.121–122)
4.2 Antithetic variates
• The method of antithetic variates attempts to exploit negative correlation between
random variables to reduce the variance.
• Uniform distribution: U ∼ U(0, 1) if and only if 1− U ∼ U(0, 1).
• This observation can also be applied to more general distributions which can be generated
from the inverse transform method. Let U ∼ U(0, 1):
F−1(U) and F−1(1− U) both have cdf F and are antithetic since F−1 is monotone.
• Standard normal distribution: X ∼ Nd(0, Id) if and only if −X ∼ Nd(0, Id).
• Pairs of random variables, such as (U, 1−U), (F−1(U), F−1(1−U)) and (X,−X), are called
antithetic pairs.
• The method of antithetic variates uses antithetic pairs to produce estimators
with reduced variance.
Example 46.
• Suppose that we want to estimate E[f(U)], where f is a given function and U ∼ U(0, 1).
• Given a sequence (Ui) of i.i.d. random variables from U(0, 1),

X̄n = (1/n) ∑_{i=1}^n f(Ui) and Ȳn = (1/n) ∑_{i=1}^n f(1 − Ui)

are unbiased Monte Carlo estimators of E[f(U)].
• In particular, lim_{n→∞} X̄n = E[f(U)] and lim_{n→∞} Ȳn = E[f(U)], P-a.s..
• The antithetic variates estimator simply takes the average of these two estimators and considers

(X̄n + Ȳn)/2

as an estimator for E f(U).
Definition 47. Let (X,Y ) be a random vector such that X and Y have the same distribu-
tion. Given a sequence (Xi, Yi) of i.i.d. random vectors from the joint distribution of (X,Y ), the
antithetic variates estimator of E[X] ≡ E[Y ] is given by
$$\bar Z_n = \frac{\bar X_n + \bar Y_n}{2}, \tag{4.10}$$
where $\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i$ and $\bar Y_n = \frac{1}{n}\sum_{i=1}^n Y_i$.
• The antithetic variates estimator is strongly consistent because
$$\lim_{n\to\infty} \bar Z_n = \frac{1}{2}\lim_{n\to\infty}\bar X_n + \frac{1}{2}\lim_{n\to\infty}\bar Y_n = E[X] \equiv E[Y], \quad \mathbb{P}\text{-a.s..} \tag{4.11}$$
• Note that the pairs $(X_i, Y_i)$ are assumed to be i.i.d.; for each $i$ the random variables $X_i$ and $Y_i$ have the same distribution but are not necessarily independent.
Since the random vectors in the sequence $((X_i, Y_i),\ i = 1, 2, \dots)$ are independent, the variance of the antithetic variates estimator $\bar Z_n$ given by (4.10) is
$$\operatorname{var}(\bar Z_n) = \operatorname{var}\Bigl(\frac{\bar X_n + \bar Y_n}{2}\Bigr) = \operatorname{var}\Bigl(\frac{1}{2n}\sum_{i=1}^{n}(X_i + Y_i)\Bigr) = \frac{1}{4n^2}\operatorname{var}\Bigl(\sum_{i=1}^{n}(X_i + Y_i)\Bigr) = \frac{1}{4n}\operatorname{var}(X_i + Y_i)$$
$$= \frac{1}{4n}\bigl[\operatorname{var}(X_i) + 2\operatorname{cov}(X_i, Y_i) + \operatorname{var}(Y_i)\bigr] = \frac{1}{2n}\bigl[\operatorname{var}(Y) + \operatorname{cov}(X, Y)\bigr].$$
Comparing this result with
$$\operatorname{var}(\bar Y_{2n}) = \frac{1}{2n}\operatorname{var}(Y),$$
we see that $\operatorname{var}(\bar Z_n) < \operatorname{var}(\bar Y_{2n})$ if $\operatorname{cov}(X, Y) < 0$.
• Hence, the antithetic variates technique can be useful in practice when the random vari-
ables X and Y are negatively correlated and it takes roughly twice (or less than twice) the
computational effort to generate a sample (Xi, Yi) relative to generating a sample Yi.
• In applications, the following result will be very useful:
Lemma 48. If h : [0, 1]d → R is a monotone function of each of its arguments, then for
U1, . . . , Ud i.i.d. with distribution U [0, 1]:
cov(h(U1, . . . , Ud), h(1− U1, . . . , 1− Ud)) ≤ 0.
Remark 49.
• We already know that we can use the inverse transform method to generate a sample from a distribution with cumulative distribution function $F$ by setting $X_i = F^{-1}(U_i)$, where $U_1, \dots, U_n \sim U[0, 1]$ i.i.d..
• Note that for a cumulative distribution function $F$ the generalized inverse $F^{-1}$ is non-decreasing. Hence, if we consider a function $g : \mathbb{R} \to \mathbb{R}$ that is monotone, then $h := g \circ F^{-1}$ is monotone as well. This also generalizes to the case $g : \mathbb{R}^d \to \mathbb{R}$.
Example 50.
• We consider again the integral from Example 43:
$$\mu = \int_0^1 e^u\,du = E[e^U] = e - 1, \quad\text{where } U \sim U[0, 1].$$
• Let $U_1, \dots, U_{2n} \sim U[0, 1]$ i.i.d.. Then the standard Monte Carlo estimator using $2n$ random numbers has variance
$$\operatorname{var}\Bigl(\frac{1}{2n}\sum_{i=1}^{2n} e^{U_i}\Bigr) = \frac{\operatorname{var}(e^U)}{2n} = \frac{\tfrac{1}{2}(e^2 - 1) - (e - 1)^2}{2n} \approx \frac{1}{n}\,0.1210.$$
• Note that
$$\operatorname{cov}(e^U, e^{1-U}) = E[e^U e^{1-U}] - E[e^U]E[e^{1-U}] = e - (e - 1)^2 \approx -0.2342.$$
• Let $U_1, \dots, U_n \sim U[0, 1]$ i.i.d.. Then the variance of the antithetic variates estimator is
$$\operatorname{var}\Bigl(\frac{1}{2n}\Bigl(\sum_{i=1}^{n} e^{U_i} + \sum_{i=1}^{n} e^{1-U_i}\Bigr)\Bigr) = \frac{1}{2n}\bigl(\operatorname{var}(e^U) + \operatorname{cov}(e^U, e^{1-U})\bigr) \approx \frac{1}{n}\,0.003913.$$
• Hence,
$$1 - \frac{\operatorname{var}\bigl(\frac{1}{2n}(\sum_{i=1}^{n} e^{U_i} + \sum_{i=1}^{n} e^{1-U_i})\bigr)}{\operatorname{var}\bigl(\frac{1}{2n}\sum_{i=1}^{2n} e^{U_i}\bigr)} \approx 0.9677,$$
and therefore the antithetic variates method reduced the variance by 96.77 %.
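A minimal Python sketch (not in the original notes) that checks these numbers empirically; the sample size $n$ and the random seed are arbitrary illustrative choices.

```python
# Sketch: standard Monte Carlo with 2n uniforms vs. the antithetic estimator with n pairs.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

U = rng.random(2 * n)
standard = np.exp(U).mean()                       # standard estimator based on 2n draws

V = rng.random(n)
antithetic = 0.5 * (np.exp(V) + np.exp(1 - V))    # one antithetic pair per draw
anti_est = antithetic.mean()

print("exact:", np.e - 1, "standard MC:", standard, "antithetic:", anti_est)
# var((e^V + e^{1-V})/2) is roughly 0.0039, compared with var(e^U)/2 ≈ 0.121 for two
# independent draws: a reduction of about 96.8%, matching the calculation above.
print("var per pair:", antithetic.var(ddof=1), "var per draw:", np.exp(U).var(ddof=1))
```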
Example 51. The antithetic variates method eliminates variance due to the antisymmetric part
of an integrand.
• Suppose that we want to estimate E[f(Z)], where f is a given function and Z ∼ Nd(0, Id).
• The standard Monte Carlo estimator of $E[f(Z)]$ is given by
$$\frac{1}{n}\sum_{i=1}^{n} f(Z_i),$$
• while the antithetic variates estimator of $E[f(Z)]$ is given by
$$\frac{1}{n}\sum_{i=1}^{n} \frac{f(Z_i) + f(-Z_i)}{2}.$$
• The variances of these two estimators are as follows:
$$\text{standard Monte Carlo scheme:}\quad \frac{1}{n}\operatorname{var}(f(Z)), \tag{4.12}$$
$$\text{antithetic variates scheme:}\quad \frac{1}{n}\operatorname{var}\Bigl(\frac{f(Z) + f(-Z)}{2}\Bigr). \tag{4.13}$$
• To investigate how these variances compare, we define
$$f_s(z) = \frac{f(z) + f(-z)}{2} \quad\text{and}\quad f_a(z) = \frac{f(z) - f(-z)}{2},$$
so that $f(z) = f_s(z) + f_a(z)$.
• Then the variance of the antithetic variates scheme is
$$\frac{1}{n}\operatorname{var}\Bigl(\frac{f(Z) + f(-Z)}{2}\Bigr) = \frac{\operatorname{var}(f_s(Z))}{n}.$$
• One can show that
$$\operatorname{var}(f(Z)) = \operatorname{var}(f_s(Z)) + \operatorname{var}(f_a(Z)). \tag{4.14}$$
• In view of (4.14) and (4.12)–(4.13), we can see that
– if f is symmetric (even), i.e., if f = fs and fa = 0, then
var(f(Z)) = var(fs(Z)),
and antithetic sampling eliminates no variance;
– if f is antisymmetric (odd), i.e., if f = fa and fs = 0, then
var(fs(Z)) = 0,
and the method of antithetic variates eliminates all variance.
• Finally, we prove (4.14). First, observe that
$$\operatorname{var}(f(Z)) = \operatorname{var}(f_s(Z)) + \operatorname{var}(f_a(Z)) + 2\operatorname{cov}(f_s(Z), f_a(Z)). \tag{4.15}$$
• Now, since $Z$ and $-Z$ have the same distribution $N_d(0, I_d)$, we can calculate
$$E[f_s(Z) f_a(Z)] = \frac{1}{4}\,E\bigl[f^2(Z) - f^2(-Z)\bigr] = \frac{1}{4}\bigl(E[f^2(Z)] - E[f^2(-Z)]\bigr) = 0 = \underbrace{E[f_s(Z)]\,E[f_a(Z)]}_{=0},$$
• which implies that
$$\operatorname{cov}(f_s(Z), f_a(Z)) \equiv E[f_s(Z) f_a(Z)] - E[f_s(Z)]\,E[f_a(Z)] = 0,$$
and hence (4.15) reduces to (4.14).
Note that if f is symmetric, then f(z) = fs(z) + fa(z) = fs(z) ∀z, since fa(z) = 0 ∀z. Hence,
in particular f(z) = f(−z), so f is symmetric with respect to the y-axis (z = 0).
If f is antisymmetric, then f(z) = fs(z) + fa(z) = fa(z) ∀z and in particular −f(z) = f(−z) ∀z.
Hence f is symmetric with respect to the origin.
Note that fs is symmetric and fa is antisymmetric.
• Reading: (Glasserman, 2004, Section 4.2).
4.3 Importance sampling
• Importance sampling is another method which can lead to a reduced variance.
• It generates a sample from a different distribution than the one originally required and then uses importance weights to account for this.
It works as follows:
• Suppose $X$ is a random vector in $\mathbb{R}^d$ with probability density $f$ and $h : \mathbb{R}^d \to \mathbb{R}$ is a function.
• We are interested in evaluating
$$\mu := E_f[h(X)] = \int h(x) f(x)\,dx,$$
where we use the notation $E_f$ to indicate the expectation with respect to the probability measure under which $X$ has density $f$.
• Suppose there is another probability density $g$ on $\mathbb{R}^d$ satisfying
$$f(x) > 0 \implies g(x) > 0 \quad \forall x \in \mathbb{R}^d. \tag{4.16}$$
• We can therefore write
$$\mu = E_f[h(X)] = \int h(x) f(x)\,dx = \int h(x)\frac{f(x)}{g(x)}\,g(x)\,dx = E_g\Bigl(\frac{h(X) f(X)}{g(X)}\Bigr). \tag{4.17}$$
Definition 52. Let f, g : Rd → R be density functions which satisfy (4.16). The importance
sampling estimator of Ef (h(X)) is defined as
$$\mu_n^{IS} = \frac{1}{n}\sum_{i=1}^{n} h(X_i)\underbrace{\frac{f(X_i)}{g(X_i)}}_{=w(X_i)} = \frac{1}{n}\sum_{i=1}^{n} h(X_i) w(X_i),$$
where $X_1, \dots, X_n$ are i.i.d. random variables from density $g$. The ratio
$$w(X_i) = \frac{f(X_i)}{g(X_i)}$$
is referred to as the importance weight or Radon-Nikodým derivative evaluated at $X_i$.
• We see that in importance sampling a change of measure is performed to increase the
sampling efficiency.
• Important outcomes get more weight.
• From (4.17) one sees immediately that the importance sampling estimator is an unbiased
estimator of Efh(X).
• Suppose $E_g|h(X) w(X)| < \infty$ and $X_1, \dots, X_n$ are i.i.d. with density $g$; then by the Strong Law of Large Numbers
$$\frac{1}{n}\sum_{i=1}^{n} h(X_i) w(X_i) \to E_g(h(X) w(X)) = \mu \quad \mathbb{P}_g\text{-a.s..}$$
• $\operatorname{var}_g(\mu_n^{IS}) = \frac{1}{n}\operatorname{var}_g(h(X) w(X))$.
• In order to compare the variances we only need to compare the second moments, since the first moments coincide.
$$E_g\Bigl[\Bigl(h(X)\frac{f(X)}{g(X)}\Bigr)^2\Bigr] = \int h^2(x)\frac{f^2(x)}{g^2(x)}\,g(x)\,dx = \int h^2(x)\frac{f(x)}{g(x)}\,f(x)\,dx = E_f\Bigl[h^2(X)\frac{f(X)}{g(X)}\Bigr],$$
• which can be larger or smaller than $E_f(h^2(X))$.
• Hence a good choice of $g$ is crucial.
• Suppose the function $h$ is non-negative and we define $\tilde g : \mathbb{R}^d \to \mathbb{R}$ by
$$\tilde g(x) = \frac{h(x) f(x)}{\int h(x) f(x)\,dx} = \frac{h(x) f(x)}{\mu}. \tag{4.18}$$
• Then $\tilde g$ is a density, and it can be shown that using $\tilde g$ in the importance sampling results in an estimator with minimal variance (indeed zero variance!) compared to all other choices of densities $g$.
• Suppose $X_1, \dots, X_n$ are i.i.d. random variables from the density $\tilde g(x) = h(x) f(x)/\mu$. Then
$$\mu_n^{IS} = \frac{1}{n}\sum_{i=1}^{n} h(X_i)\frac{f(X_i)}{\tilde g(X_i)} = \frac{1}{n}\sum_{i=1}^{n} \mu = \mu,$$
and therefore the variance of $\mu_n^{IS}$ is zero.
• Obviously, in practice, we cannot use g˜ since in (4.18) the denominator is exactly the
quantity µ which we want to estimate.
• But we still see that we should sample in proportion to the product of h and f .
• If $h$ is not non-negative, we can choose
$$\hat g(x) = \frac{|h(x) f(x)|}{\int |h(x) f(x)|\,dx}.$$
This does not usually lead to zero variance of the importance sampling estimator, but it still gives minimal variance compared to other densities.
Example 53.
• Suppose we want to compute
$$\mu = E[\mathbf{1}_{\{X \ge 10\}}] = \mathbb{P}(X \ge 10) = 1 - \Phi(10),$$
where $X \sim N_1(0, 1)$, hence $f(x) = \frac{1}{\sqrt{2\pi}}\exp(-\frac{x^2}{2})$.
• If you sample i.i.d. $X_1, X_2, \dots, X_n \sim N_1(0, 1)$, for, e.g., $n = 10000$, and compute the Monte Carlo estimator
$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}_{\{X_i \ge 10\}},$$
in practice you will usually get $\hat\mu = 0$ and a sample variance $\frac{1}{n-1}\sum_{i=1}^n (\mathbf{1}_{\{X_i \ge 10\}} - \hat\mu)^2 = 0$.
• The event that {X ≥ 10} for X ∼ N1(0, 1) just does not have enough probability mass to
successfully use the standard Monte Carlo method.
• The idea is now to shift the probability mass and use importance sampling with density
$$g(x) = \frac{1}{\sqrt{2\pi}}\exp\Bigl(-\frac{1}{2}(x - 10)^2\Bigr),$$
which is the density of a random variable with $N_1(10, 1)$ distribution.
• The importance sampling estimator is then given by
$$\mu_n^{IS} = \frac{1}{n}\sum_{i=1}^{n} \frac{\mathbf{1}_{\{X_i \ge 10\}} f(X_i)}{g(X_i)},$$
where $X_1, \dots, X_n \sim N_1(10, 1)$ i.i.d..
We have seen before that it suffices to compare the second moments of the two estimators. One can show that
$$E_f[h(X)^2] = 1 - \Phi(10) > e^{100}\bigl(1 - \Phi(20)\bigr) = E_g\Bigl[\Bigl(\frac{h(X) f(X)}{g(X)}\Bigr)^2\Bigr],$$
where here $h(X) = \mathbf{1}_{\{X \ge 10\}}$.
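A minimal Python sketch (not from the notes) of this importance sampling estimator; it assumes NumPy and SciPy (`scipy.stats.norm`) are available, and the sample size is an illustrative choice.

```python
# Sketch: importance sampling estimator for P(X >= 10), X ~ N(0,1), sampling from N(10,1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 100_000

X = rng.normal(loc=10.0, scale=1.0, size=n)                              # draws from g
w = norm.pdf(X, loc=0.0, scale=1.0) / norm.pdf(X, loc=10.0, scale=1.0)   # weights f(X)/g(X)
h = (X >= 10.0).astype(float)

mu_is = np.mean(h * w)                                                   # IS estimator
print("IS estimate:", mu_is, "exact 1 - Phi(10):", norm.sf(10.0))
# A plain Monte Carlo estimator with n samples from N(0,1) would almost surely return 0 here.
```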
• Reading: (Glasserman, 2004, Section 4.6 (particularly 4.6.1) and Section 4.7).
4.4 Appendix
4.4.1 Some remarks on absolutely continuous probability measures and importance sampling
• Let $\mathbb{P}_1, \mathbb{P}_2$ be two probability measures defined on the measurable space $(\Omega, \mathcal{F})$ and assume that $\mathbb{P}_2$ is absolutely continuous with respect to $\mathbb{P}_1$, i.e., the following condition holds:
∀A ∈ F : P1(A) = 0 =⇒ P2(A) = 0. (4.19)
Then the $\mathcal{F}$-measurable random variable $L$ satisfying
$$\mathbb{P}_2(A) = \int_A L\,d\mathbb{P}_1 \quad \forall A \in \mathcal{F}$$
is the Radon-Nikodým derivative, and we also write $L = \frac{d\mathbb{P}_2}{d\mathbb{P}_1}$.
Note that the condition (4.19) is equivalent to
$$\forall A \in \mathcal{F} : \mathbb{P}_2(A) > 0 \implies \mathbb{P}_1(A) > 0.$$
It is still possible that $\mathbb{P}_1(A) > 0$ while $\mathbb{P}_2(A) = 0$, i.e., the probability measure $\mathbb{P}_1$ may assign a positive probability to events that are impossible under $\mathbb{P}_2$.
Then
$$E_{\mathbb{P}_2} h(X) = \int_\Omega h(X)\,d\mathbb{P}_2 = \int_\Omega h(X)\underbrace{\frac{d\mathbb{P}_2}{d\mathbb{P}_1}}_{=L}\,d\mathbb{P}_1 = \int_\Omega h(X) L\,d\mathbb{P}_1 = E_{\mathbb{P}_1}(h(X) L).$$
Now consider a random variable $X$ that takes values in $\mathbb{R}$ and has density $f$ with respect to the Lebesgue measure on $\mathbb{R}$. Then $X$ (or its distribution $\mathbb{P}_X$) is absolutely continuous with density $f$ and, for $x \in \mathbb{R}$,
$$F_X(x) = \mathbb{P}_X((-\infty, x]) = \int_{-\infty}^x f(y)\,dy.$$
Note that the distribution $\mathbb{P}_X$ is a probability measure on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$, where $\mathcal{B}(\mathbb{R})$ is the Borel $\sigma$-algebra on $\mathbb{R}$. Let $B \in \mathcal{B}(\mathbb{R})$; then $\mathbb{P}_X(B) = \int_B f(x)\,dx$. Hence, we see that condition (4.16) is related to absolute continuity of probability measures.
• Note that condition (4.16) can be equivalently expressed as
$$\operatorname{supp} f \subseteq \operatorname{supp} g,$$
where $\operatorname{supp} f = \{x \in \mathbb{R}^d : f(x) \neq 0\} = \{x \in \mathbb{R}^d : f(x) > 0\}$ (in the last step we used the fact that $f$ is a probability density function and therefore non-negative).
Let $f, g$ be probability density functions and $h$ a given function. From the computation
$$E_f(h(X)) = \int h(x) f(x)\,dx = \int h(x)\frac{f(x)}{g(x)}\,g(x)\,dx = E_g\Bigl(h(X)\frac{f(X)}{g(X)}\Bigr)$$
we see that we can change assumption (4.16) to
$$\operatorname{supp}(f \cdot h) \subseteq \operatorname{supp} g,$$
where $\operatorname{supp}(f \cdot h) = \{x \in \mathbb{R}^d : h(x) f(x) \neq 0\}$.
4.4.2 Some remarks on change of measure and importance sampling
• Consider a random variable $Z$ on $(\Omega, \mathcal{F})$ that has pdf $g$ under the probability measure $\mathbb{P}$, hence
$$\mathbb{P}(Z \le z) = \int_{-\infty}^{z} g(x)\,dx \quad \forall z \in \mathbb{R}.$$
• Let $f$ be another pdf on $\mathbb{R}$ with the property
$$g(z) = 0 \Rightarrow f(z) = 0.$$
• We define a new probability measure $\mathbb{Q}$ on $(\Omega, \mathcal{F})$ by
$$\mathbb{Q}(A) = E_{\mathbb{P}}\Bigl[\mathbf{1}_A \frac{f(Z)}{g(Z)}\Bigr], \quad A \in \mathcal{F}.$$
(Set $f(z)/g(z) = 1$ if $f(z) = g(z) = 0$.) Note that $\mathbb{P}\bigl(\frac{f(Z)}{g(Z)} \ge 0\bigr) = 1$ and $\int_\Omega \frac{f(Z(\omega))}{g(Z(\omega))}\,d\mathbb{P}(\omega) = \int_{-\infty}^{\infty}\frac{f(x)}{g(x)}\,g(x)\,dx = 1$.
• Under the new measure $\mathbb{Q}$, $Z$ has pdf $f$, since
$$\mathbb{Q}(Z \le z) = E_{\mathbb{P}}\Bigl[\mathbf{1}_{\{Z \le z\}}\frac{f(Z)}{g(Z)}\Bigr] = \int_{-\infty}^{z}\frac{f(x)}{g(x)}\,g(x)\,dx = \int_{-\infty}^{z} f(x)\,dx.$$
• Hence when we do importance sampling we generate a sample from the pdf $g$. Then we compute the expectation of $h(Z)$ under the new probability measure $\mathbb{Q}$ that was obtained using the Radon-Nikodým derivative $L = f(Z)/g(Z)$. Under the new measure $\mathbb{Q}$ the random variable $Z$ indeed has pdf $f$, which is what we need. Formally,
$$E_{\mathbb{Q}}[h(Z)] = \int_\Omega h(Z(\omega))\,d\mathbb{Q}(\omega) = \int_\Omega h(Z(\omega))\underbrace{\frac{d\mathbb{Q}(\omega)}{d\mathbb{P}(\omega)}}_{=f(Z(\omega))/g(Z(\omega))}\,d\mathbb{P}(\omega) = \int_{-\infty}^{\infty} h(z)\frac{f(z)}{g(z)}\,g(z)\,dz = \int_{-\infty}^{\infty} h(z) f(z)\,dz = E_f[h(Z)].$$
Chapter 5
Generating Sample Paths
The aim of this chapter is to present exact simulation methods for some stochastic processes which
are very important in financial mathematics.
We consider a stochastic process X = (Xt)t∈[0,T ] and want to simulate X at a fixed set of
points 0 < t1 < . . . < tn ≤ T such that the joint distribution of the simulated values (Xˆt1 , . . . , Xˆtn)
coincides with the distribution of the underlying stochastic process evaluated at these points in
time (Xt1 , . . . , Xtn).
• Consider a stochastic process X = (Xt)t∈[0,T ].
• Aim: to simulate X at a fixed set of points 0 < t1 < . . . < tn ≤ T such that
joint distribution of the simulated values (Xˆt1 , . . . , Xˆtn) coincides with that of
(Xt1 , . . . , Xtn).
• We refer to such a simulation method as exact.
5.1 Simulating a Brownian motion
Definition 54. A real-valued stochastic process W = (Wt)t∈[0,∞) is a Brownian motion
(Wiener process) if it has the properties
1. W0(ω) = 0, ∀ω ∈ Ω,
2. the map t 7→Wt(ω) is a continuous function of t ∈ [0,∞) for all ω ∈ Ω (continuous paths),
3. For every $t, h \ge 0$, the increment $W_{t+h} - W_t$ is independent of $\{W_u : 0 \le u \le t\}$ and normally distributed with mean 0 and variance $h$.
• From this definition we see that for a Brownian motion $W$ and $0 \le s \le t$ the covariance is
$$\operatorname{cov}(W_s, W_t) = E(W_s W_t) - \underbrace{EW_s}_{=0}\underbrace{EW_t}_{=0} = E\bigl(W_s(W_t - W_s + W_s)\bigr) = \underbrace{E\bigl(W_s(W_t - W_s)\bigr)}_{\text{independent increment } = 0} + EW_s^2 = s.$$
• Hence, $\operatorname{cov}(W_s, W_t) = \min\{s, t\}$.
• Definition 54 can be directly used to simulate a path of a Brownian motion:
• Let Z1, . . . , Zn be i.i.d. standard normally distributed random variables.
• Then we set $t_0 = 0$ and $\hat W_{t_0} = \hat W_0 = 0$, and
• for $i = 0, \dots, n-1$ we define
$$\hat W_{t_{i+1}} = \hat W_{t_i} + \sqrt{t_{i+1} - t_i}\,Z_{i+1}. \tag{5.1}$$
• We see that $\hat W_{t_{i+1}} - \hat W_{t_i} = \sqrt{t_{i+1} - t_i}\,Z_{i+1} \sim N_1(0, t_{i+1} - t_i)$, and the increments are independent by construction.
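A minimal Python sketch (not part of the notes) of the exact recursion (5.1) on an illustrative equidistant grid.

```python
# Sketch: simulating one Brownian path at t_0 = 0 < t_1 < ... < t_n = T via (5.1).
import numpy as np

rng = np.random.default_rng(3)
T, n = 1.0, 500
t = np.linspace(0.0, T, n + 1)

dt = np.diff(t)
Z = rng.standard_normal(n)
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * Z)))   # W_hat at t_0, ..., t_n
# W[i+1] - W[i] = sqrt(t_{i+1} - t_i) * Z_{i+1}, so the increments are independent
# N(0, t_{i+1} - t_i), exactly as required by Definition 54.
```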
Recall the following property of the multivariate normal distribution:
Lemma 55. Let X ∼ Nd(µ,Σ), µ ∈ Rd,Σ ∈ Rd×d. Then for any matrix A ∈ Rk×d AX ∼
Nk(Aµ,AΣA⊤).
• Note that $\hat W_0 = 0$ and
$$\underbrace{\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & 1 & 1 & \cdots & 1 \end{pmatrix}}_{=A}\underbrace{\begin{pmatrix} \hat W_{t_1} - \hat W_0 \\ \hat W_{t_2} - \hat W_{t_1} \\ \vdots \\ \hat W_{t_n} - \hat W_{t_{n-1}} \end{pmatrix}}_{=X} = \begin{pmatrix} \hat W_{t_1} \\ \hat W_{t_2} \\ \vdots \\ \hat W_{t_n} \end{pmatrix}.$$
• Moreover, $X \sim N_n(0, \Sigma)$ with $\Sigma_{ij} = 0$ for $i \neq j$ and $\Sigma_{ii} = t_i - t_{i-1}$ for $i = 1, \dots, n$.
• From Lemma 55 we then see that (Wˆt1 , . . . , Wˆtn)⊤ has indeed a multivariate normal distri-
bution with the correct mean and covariance structure.
• Hence, the joint distribution of the simulated values (Wˆt1 , . . . , Wˆtn) coincides with the dis-
tribution of the corresponding Brownian motion at t1, . . . , tn given by (Wt1 , . . . ,Wtn). ⇒
exact scheme!
• Recall that (Wt1 , . . . ,Wtn) ∼ Nn(0,Σ), where Σij = min{ti, tj}.
• We can use the same idea to simulate sample paths of more general stochastic processes:
• The process
$$X_t = \mu t + \sigma W_t$$
with $W$ a Brownian motion, $\mu \in \mathbb{R}$, $\sigma > 0$ (sometimes referred to as Brownian motion with drift $\mu$ and diffusion coefficient $\sigma^2$) can be simulated exactly
• by setting $\hat X_0 = 0$ and
$$\hat X_{t_{i+1}} = \hat X_{t_i} + \mu(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_{i+1}, \quad i = 0, \dots, n-1,$$
where again $Z_1, \dots, Z_n$ are i.i.d. standard normally distributed random variables.
• Recall that for 0 ≤ s ≤ t the increments of X satisfy
Xt −Xs = µ(t− s) + σ(Wt −Ws) ∼ N1(µ(t− s), σ2(t− s)).
Moreover, the increments are independent and X has continuous sample paths.
• We can use the same arguments as for the Brownian motion to show that the distributions
of the simulated vector and that of the original vector coincide.
• We now consider time-dependent deterministic coefficients $(\mu_s), (\sigma_s)$ and
$$X_t = X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s \tag{5.2}$$
with $W$ a Brownian motion and a constant $X_0$.
• Recall that for $0 \le s \le t$
$$X_t - X_s \sim N_1\Bigl(\int_s^t \mu_u\,du,\ \int_s^t \sigma_u^2\,du\Bigr),$$
and again the increments are independent and $X$ has continuous sample paths.
• Recall from the Itô isometry that
$$E\Bigl[\Bigl(\int_s^t \sigma_u\,dW_u\Bigr)^2\Bigr] = E\int_s^t \sigma_u^2\,\underbrace{d\langle W\rangle_u}_{=du}.$$
• Using this, $X$ can be simulated exactly by setting $\hat X_0 = X_0$ and
$$\hat X_{t_{i+1}} = \hat X_{t_i} + \int_{t_i}^{t_{i+1}} \mu_s\,ds + \sqrt{\int_{t_i}^{t_{i+1}} \sigma_u^2\,du}\;Z_{i+1}, \quad i = 0, \dots, n-1,$$
where again $Z_1, \dots, Z_n$ are i.i.d. standard normally distributed random variables.
• Note that we could also simulate (5.2) by setting
$$\hat X_{t_{i+1}} = \hat X_{t_i} + \mu_{t_i}(t_{i+1} - t_i) + \sigma_{t_i}\sqrt{t_{i+1} - t_i}\,Z_{i+1}, \quad i = 0, \dots, n-1,$$
• but then we will in general introduce a discretization error at $t_1, \dots, t_n$, since the increments of the scheme usually no longer have the correct mean and variance:
$$E[\hat X_{t_{i+1}} - \hat X_{t_i}] = \mu_{t_i}(t_{i+1} - t_i) \neq \int_{t_i}^{t_{i+1}} \mu_u\,du = E[X_{t_{i+1}} - X_{t_i}],$$
$$\operatorname{var}[\hat X_{t_{i+1}} - \hat X_{t_i}] = \sigma_{t_i}^2(t_{i+1} - t_i) \neq \operatorname{var}\Bigl[\int_{t_i}^{t_{i+1}} \sigma_s\,dW_s\Bigr] = \int_{t_i}^{t_{i+1}} \sigma_u^2\,du = \operatorname{var}[X_{t_{i+1}} - X_{t_i}].$$
We will come back to this in Chapter 6.
• Application of the Cholesky factorization:
Let W be a Brownian motion. Recall that for s, t ≥ 0 cov(Ws,Wt) = min{s, t}.
• Hence (Wt1 , . . . ,Wtn)⊤ ∼ Nn(0,Σ) where Σij = min{ti, tj}.
• This vector can be simulated from a vector Z ∼ Nn(0, In)
by taking AZ where A ∈ Rn×n satisfies AA⊤ = Σ and is a lower triangular matrix.
(Cholesky representation of Σ is used here!).
• Here,
$$A = \begin{pmatrix} \sqrt{t_1} & 0 & \cdots & 0 \\ \sqrt{t_1} & \sqrt{t_2 - t_1} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \sqrt{t_1} & \sqrt{t_2 - t_1} & \cdots & \sqrt{t_n - t_{n-1}} \end{pmatrix}.$$
The evaluation of AZ requires O(n2) operations whereas the recursive iteration (5.1) only
requires O(n) operations.
• Brownian bridge construction:
• Let W be a Brownian motion, then Xt =Wt − tW1 is a Brownian bridge on [0, 1].
• We will present a simulation method which allows us to simulate the path of a Brownian motion using a Brownian bridge construction:
– We first simulate the final value $W_{t_n}$,
– and then sample $W_{t_{\lfloor n/2\rfloor}}$ conditional on the values of $W_0$ and $W_{t_n}$,
– i.e., we then consider a Brownian bridge from $W_0$ to $W_{t_n}$.
– We proceed in this way recursively on the resulting subintervals.
We will use the following result regarding the multivariate normal distribution.
Lemma 56. Suppose the partitioned vector $(X_{[1]}, X_{[2]})^\top$ is multivariate normal with
$$\begin{pmatrix} X_{[1]} \\ X_{[2]} \end{pmatrix} \sim N_n\left(\begin{pmatrix} \mu_{[1]} \\ \mu_{[2]} \end{pmatrix}, \begin{pmatrix} \Sigma_{[11]} & \Sigma_{[12]} \\ \Sigma_{[21]} & \Sigma_{[22]} \end{pmatrix}\right)$$
and $\Sigma_{[22]}$ has full rank. Then
$$X_{[1]} \,\big|\, (X_{[2]} = z) \sim N_{n_1}\bigl(\mu_{[1]} + \Sigma_{[12]}\Sigma_{[22]}^{-1}(z - \mu_{[2]}),\ \Sigma_{[11]} - \Sigma_{[12]}\Sigma_{[22]}^{-1}\Sigma_{[21]}\bigr).$$
Regarding the dimensions of the block matrices we have $(X_{[1]}, X_{[2]})^\top \in \mathbb{R}^n$, $X_{[1]} \in \mathbb{R}^{n_1}$, $X_{[2]} \in \mathbb{R}^{n_2}$, where $n_1, n_2, n \in \mathbb{N}$ and $n_1 + n_2 = n$. Then $\mu_{[1]} \in \mathbb{R}^{n_1}$, $\mu_{[2]} \in \mathbb{R}^{n_2}$, $\Sigma_{[11]} \in \mathbb{R}^{n_1 \times n_1}$, $\Sigma_{[22]} \in \mathbb{R}^{n_2 \times n_2}$, $\Sigma_{[12]} \in \mathbb{R}^{n_1 \times n_2}$, $\Sigma_{[21]} \in \mathbb{R}^{n_2 \times n_1}$.
Let $0 < u < s < t$ and suppose we would like to generate $W_s$ conditional on $W_u$ and $W_t$. Recall that
$$\begin{pmatrix} W_u \\ W_s \\ W_t \end{pmatrix} \sim N_3\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} u & u & u \\ u & s & s \\ u & s & t \end{pmatrix}\right),$$
and if we change the order in the vector we obtain
$$\begin{pmatrix} W_s \\ W_u \\ W_t \end{pmatrix} \sim N_3\left(\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} s & u & s \\ u & u & u \\ s & u & t \end{pmatrix}\right).$$
• We set $X_{[1]} = W_s$, $X_{[2]} = (W_u, W_t)^\top$ and $z = (x, y)^\top \in \mathbb{R}^2$, $\Sigma_{[11]} = s$, $\Sigma_{[22]} = \begin{pmatrix} u & u \\ u & t \end{pmatrix}$, $\Sigma_{[12]} = (u \;\; s)$, $\Sigma_{[21]} = \begin{pmatrix} u \\ s \end{pmatrix}$ in Lemma 56.
• Then,
$$E[W_s \mid W_u = x, W_t = y] = 0 + (u \;\; s)\,\frac{1}{t - u}\begin{pmatrix} \frac{t}{u} & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \frac{x(t - s) + y(s - u)}{t - u},$$
$$\operatorname{var}[W_s \mid W_u = x, W_t = y] = s - (u \;\; s)\,\frac{1}{t - u}\begin{pmatrix} \frac{t}{u} & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} u \\ s \end{pmatrix} = \frac{(s - u)(t - s)}{t - u},$$
• and in particular Ws|(Wu = x,Wt = y) is normally distributed.
• The conditional mean is just the linear interpolation between (u, x), (t, y) evaluated at s.
• To sample from this conditional distribution we can set
$$\hat W_s = \frac{x(t - s) + y(s - u)}{t - u} + \sqrt{\frac{(s - u)(t - s)}{t - u}}\;Z, \tag{5.3}$$
where $Z \sim N_1(0, 1)$.
• Generally we can proceed as follows: Set u = 0, Wˆu = 0, and sample Wˆtn from N1(0, tn).
• Then we sample Wˆs for 0 = u < s < tn from (5.3).
• We can then repeat this method in the two subintervals etc..
• Suppose we have already sampled at the time points $s_1 < s_2 < \dots < s_k$ and we would like to sample at a time point $s$ with $s_i < s < s_{i+1}$. Then we choose $u = s_i$ and $t = s_{i+1}$ and apply (5.3); hence we always condition on the sampled values at the points $s_i, s_{i+1}$ closest to $s$. A code sketch of this construction follows below.
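A minimal Python sketch (not from the notes) of the Brownian bridge construction; the dyadic grid on $[0,1]$ and the number of refinement levels are illustrative choices.

```python
# Sketch: simulate W_{t_n} first, then repeatedly fill in midpoints using (5.3).
import numpy as np

rng = np.random.default_rng(4)
levels = 10
n = 2 ** levels
t = np.linspace(0.0, 1.0, n + 1)

W = np.empty(n + 1)
W[0] = 0.0
W[n] = np.sqrt(t[n]) * rng.standard_normal()        # final value W_{t_n} ~ N(0, t_n)

step = n
while step > 1:
    half = step // 2
    for left in range(0, n, step):                  # fill the midpoint of each interval
        right = left + step
        mid = left + half
        u, s, tt = t[left], t[mid], t[right]
        mean = (W[left] * (tt - s) + W[right] * (s - u)) / (tt - u)
        var = (s - u) * (tt - s) / (tt - u)
        W[mid] = mean + np.sqrt(var) * rng.standard_normal()
    step = half
```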
[Figure: three panels (Step 1, Step 2, Step 3) showing the Brownian bridge construction of a path $W_t$ against time $t \in [0, 1]$, refined at successively finer grid points.]
• Reading: (Glasserman, 2004, Section 3.1.1)
5.2 Simulating a geometric Brownian motion
• Consider a geometric Brownian motion X which solves the SDE
dXt = Xt(µdt+ σdWt), (5.4)
where µ ∈ R, σ > 0 and W is a Brownian motion.
• With initial value $X_0$ this SDE has the solution
$$X_t = X_0 \exp\Bigl(\bigl(\mu - \tfrac{\sigma^2}{2}\bigr)t + \sigma W_t\Bigr),$$
and more generally, for $0 < s < t$,
$$X_t = X_s \exp\Bigl(\bigl(\mu - \tfrac{\sigma^2}{2}\bigr)(t - s) + \sigma(W_t - W_s)\Bigr).$$
• We can simulate values of $X$ at $0 = t_0 < t_1 < \dots < t_n$ recursively by
$$\hat X_{t_{i+1}} = \hat X_{t_i}\exp\Bigl(\bigl(\mu - \tfrac{\sigma^2}{2}\bigr)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_{i+1}\Bigr), \quad i = 0, \dots, n-1, \tag{5.5}$$
where again $Z_1, \dots, Z_n$ are i.i.d. $N_1(0, 1)$ and $\hat X_0 = X_0$.
• This method is again exact.
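A minimal Python sketch (not from the notes) of the exact recursion (5.5); the parameter values and the grid are illustrative.

```python
# Sketch: exact simulation of a geometric Brownian motion path via (5.5).
import numpy as np

rng = np.random.default_rng(5)
X0, mu, sigma, T, n = 100.0, 0.05, 0.2, 1.0, 252
t = np.linspace(0.0, T, n + 1)

dt = np.diff(t)
Z = rng.standard_normal(n)
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z
X = X0 * np.exp(np.concatenate(([0.0], np.cumsum(log_increments))))   # X at t_0, ..., t_n
# Equivalently, this simulates log(X) exactly and exponentiates, so no discretisation
# bias is introduced, in contrast to the Euler recursion (5.6) discussed next.
```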
• Note that if we start from the SDE (5.4) and set $\hat X_0 = X_0$ and
$$\hat X_{t_{i+1}} = \hat X_{t_i} + \mu\hat X_{t_i}(t_{i+1} - t_i) + \sigma\hat X_{t_i}\sqrt{t_{i+1} - t_i}\,Z_{i+1} \tag{5.6}$$
$$\qquad\;\;\; = \hat X_{t_i}\bigl(1 + \mu(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_{i+1}\bigr), \quad i = 0, \dots, n-1, \tag{5.7}$$
where $Z_1, \dots, Z_n$ are i.i.d. $N_1(0, 1)$, we have introduced a discretisation bias.
• If we, however, consider the SDE of $Y_t = \log(X_t)$, which is given by
$$dY_t = d\log(X_t) = \Bigl(\mu - \frac{\sigma^2}{2}\Bigr)dt + \sigma\,dW_t,$$
• and simulate $Y$ by setting $\hat Y_0 = \log(X_0)$ and
$$\hat Y_{t_{i+1}} = \hat Y_{t_i} + \Bigl(\mu - \frac{\sigma^2}{2}\Bigr)(t_{i+1} - t_i) + \sigma\sqrt{t_{i+1} - t_i}\,Z_{i+1}, \quad i = 0, \dots, n-1, \tag{5.8}$$
where $Z_1, \dots, Z_n$ are i.i.d. $N_1(0, 1)$,
• we obtain an exact simulation scheme for $\log(X)$.
(Note that (5.5) and (5.8) are equivalent sampling schemes.)
• Note that the discretization scheme in (5.6) does not guarantee the non-negativity of the
process X any longer!
• We see in this example already that transformation of variables might be useful in a
simulation context to reduce discretization errors, to guarantee non-negativity etc..
• Reading: (Glasserman, 2004, Section 3.2.1).
Chapter 6
Euler Schemes for Simulating
Solutions of Stochastic Differential
Equations (SDEs)
6.1 Introduction
Throughout this chapter we consider a stochastic process satisfying the SDE
dXt = µ(Xt)dt+ σ(Xt)dWt, (6.1)
with a fixed initial value X0. The functions µ, σ : R → R satisfy certain conditions such that the
existence and uniqueness of a strong solution to the SDE (6.1) is guaranteed, see (Glasserman,
2004, Appendix B.2), and W is a standard Brownian motion.
In Chapter 5 we have discussed some exact simulation schemes for special stochastic processes.
Exact simulation, however, is not always possible. The idea is to derive a discrete-time approximation $\hat X$ of the continuous-time stochastic process $X$. This will usually lead to some discretisation
error. We will discuss different discretisation methods in the following. In order to simplify the
notation we choose a grid with equal step size for all time steps. We choose a constant h > 0 and
want to approximate the process X at the following points in time: t = 0, h, 2h, . . ..
• Consider stochastic process satisfying the SDE
dXt = µ(Xt)dt+ σ(Xt)dWt
with a fixed initial value X0.
• The functions µ, σ : R→ R satisfy certain conditions. W is a standard Brownian motion.
• Aim: Find a discrete-time approximation Xˆ of the continuous-time stochastic process X at
time points 0, h, 2h, 3h . . . for some h > 0.
6.2 A first order scheme
We consider the one-dimensional SDE (6.1):
• Let $h > 0$ and $t \ge 0$; then
$$X_{t+h} = X_t + \int_t^{t+h}\mu(X_s)\,ds + \int_t^{t+h}\sigma(X_s)\,dW_s \approx X_t + \mu(X_t)\int_t^{t+h}ds + \sigma(X_t)\int_t^{t+h}dW_s = X_t + \mu(X_t)h + \sigma(X_t)(W_{t+h} - W_t). \tag{6.2}$$
• Since Wt+h −Wt ∼ N1(0, h), this approximation gives rise to the following definition.
Definition 57. The first order Euler scheme for the SDE (6.1) is given by $\hat X_0 = X_0$ and
$$\hat X_{(j+1)h} = \hat X_{jh} + \mu(\hat X_{jh})h + \sigma(\hat X_{jh})\sqrt{h}\,Z_{j+1}, \quad j = 0, 1, 2, \dots, \tag{6.3}$$
where $(Z_{j+1})$ are i.i.d. random variables with $N_1(0, 1)$ distribution.
• For µ = 0 and σ = 1, this method provides a way for simulating paths of the standard
Brownian Motion as we have seen already in Chapter 5.
• Loosely speaking, the second term on the right-hand side of (6.3) is $O(h)$, while the third one is $O(\sqrt{h})$.
• A first attempt to improve the order of convergence of the algorithm as h ↓ 0 could
concentrate on the stochastic integral in (6.2).
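A minimal Python sketch (not from the notes) of the first order Euler scheme (6.3); the coefficient functions and parameter values are illustrative.

```python
# Sketch: first order Euler scheme for dX = mu(X) dt + sigma(X) dW on a grid with step h = T/n.
import numpy as np

def euler_path(mu, sigma, X0, T, n, rng):
    """Simulate one path of the first order Euler scheme (6.3)."""
    h = T / n
    X = np.empty(n + 1)
    X[0] = X0
    Z = rng.standard_normal(n)
    for j in range(n):
        X[j + 1] = X[j] + mu(X[j]) * h + sigma(X[j]) * np.sqrt(h) * Z[j]
    return X

rng = np.random.default_rng(6)
# Illustrative example: geometric Brownian motion coefficients mu(x) = 0.05 x, sigma(x) = 0.2 x.
path = euler_path(lambda x: 0.05 * x, lambda x: 0.2 * x, X0=100.0, T=1.0, n=252, rng=rng)
```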
6.3 Order of convergence
Let β > 0. We consider a discrete time approximation (Xˆjh, j = 0, . . . , n) of a stochastic process
X with h = T/n.
Definition 58. We say that a discrete time approximation $\hat X$ with maximum step size $h$ converges strongly with order $\beta > 0$ to $X$ at time $T$ (for $h \downarrow 0$) if there exist a positive constant $c$ which does not depend on $h$ and a $\beta_0 > 0$ such that
$$E\bigl[|\hat X_T - X_T|\bigr] \le c\,h^{\beta}$$
for all $h \in (0, \beta_0)$. (If $X$ is multidimensional, the absolute value is replaced by some vector norm.)
Definition 59. We say that a discrete time approximation $\hat X$ with maximum step size $h$ converges weakly with order $\beta > 0$ to $X$ at time $T$ (for $h \downarrow 0$) if for each $f \in C_P^{2\beta+2}(\mathbb{R}^d, \mathbb{R})$ there exist a positive constant $c = c(f)$ which does not depend on $h$ and a finite $\beta_0 > 0$ such that
$$\bigl|E[f(\hat X_T)] - E[f(X_T)]\bigr| \le c\,h^{\beta}$$
for all $h \in (0, \beta_0)$.
Here $C_P^{2\beta+2}(\mathbb{R}^d, \mathbb{R})$ is the set of all $(2\beta+2)$-times continuously differentiable functions $f : \mathbb{R}^d \to \mathbb{R}$ for which $f$ and all its partial derivatives of order $1, 2, \dots, 2\beta+2$ are of polynomial growth. (A function $g : \mathbb{R}^d \to \mathbb{R}$ is said to be of polynomial growth if it satisfies $|g(x)| \le K(1 + \|x\|^q)$ for some constants $K, q$ and all $x \in \mathbb{R}^d$.)
Under appropriate assumptions, the following statements about the order of convergence hold:
• The first order Euler scheme has strong order of convergence $\tfrac{1}{2}$ and weak order of convergence $1$.
• Next we will consider the Milstein scheme (a refinement of the first order Euler scheme), which has both strong and weak order of convergence $1$, and
• afterwards we will study the second order Euler scheme and the simplified second order Euler scheme, which have weak order of convergence $2$.
Remark 60. • Note that option pay-offs are usually not differentiable and therefore it is
important to check the numerical performance of the various schemes (particularly higher-
order schemes).
• In many applications in derivative pricing only weak order of convergence is important.
• Reading: Detailed proofs and the necessary assumptions can be found in (Kloeden & Platen,
1999, Chapter 10, Chapter 14).
6.4 A refinement of the first order scheme
• We consider again the one-dimensional SDE
dXt = µ(Xt) dt+ σ(Xt) dWt,
where µ, σ : R → R are given functions such that σ is C2, and W is a standard one-
dimensional Brownian motion.
• To simplify the notation, we consider the operator $\mathcal{L}$ defined by
$$\mathcal{L}f(x) = \frac{1}{2}\sigma^2(x)f''(x) + \mu(x)f'(x)$$
for a $C^2$ function $f$.
• With this notation, Itô's formula can be written as
$$f(X_s) = f(X_t) + \int_t^s \mathcal{L}f(X_u)\,du + \int_t^s \sigma(X_u)f'(X_u)\,dW_u, \quad \text{for } t < s.$$
• If we apply Itô's formula to the function $\sigma$, we obtain for $t \le u \le s$
$$\sigma(X_u) = \sigma(X_t) + \int_t^u \mathcal{L}\sigma(X_y)\,dy + \int_t^u \sigma(X_y)\sigma'(X_y)\,dW_y. \tag{6.4}$$
• We can now approximate (6.4) as in the derivation of the first-order Euler scheme to obtain
$$\sigma(X_u) \approx \sigma(X_t) + \mathcal{L}\sigma(X_t)(u - t) + \sigma(X_t)\sigma'(X_t)(W_u - W_t).$$
• Note that $W_u - W_t$ is of order $O(\sqrt{u - t})$ in probability, whereas the drift term is of the higher order $O(u - t)$. We therefore drop the drift term and obtain the simpler approximation
$$\sigma(X_u) \approx \sigma(X_t) + \sigma(X_t)\sigma'(X_t)(W_u - W_t).$$
• Then we obtain for $h > 0$
$$\int_t^{t+h}\sigma(X_u)\,dW_u \approx \int_t^{t+h}\bigl(\sigma(X_t) + \sigma(X_t)\sigma'(X_t)(W_u - W_t)\bigr)\,dW_u = \sigma(X_t)(W_{t+h} - W_t) + \sigma(X_t)\sigma'(X_t)\int_t^{t+h}(W_u - W_t)\,dW_u. \tag{6.5}$$
• Furthermore,
$$\int_t^{t+h}(W_u - W_t)\,dW_u = \int_t^{t+h}W_u\,dW_u - W_t(W_{t+h} - W_t) = \frac{1}{2}\bigl(W_{t+h}^2 - W_t^2 - h\bigr) - W_t(W_{t+h} - W_t) = \frac{1}{2}(W_{t+h} - W_t)^2 - \frac{1}{2}h. \tag{6.6}$$
• Combining (6.5) and (6.6) then yields
$$\int_t^{t+h}\sigma(X_u)\,dW_u \approx \sigma(X_t)(W_{t+h} - W_t) + \sigma(X_t)\sigma'(X_t)\,\frac{1}{2}\bigl((W_{t+h} - W_t)^2 - h\bigr).$$
We can now define a first refinement to the first order Euler scheme given a constant
h > 0 and points in time t = 0, h, 2h, . . . in terms of the following algorithm:
Definition 61. The Milstein scheme for the SDE (6.1) is given by $\hat X_0 = X_0$ and, for $j = 0, 1, 2, \dots$,
$$\hat X_{(j+1)h} = \hat X_{jh} + \mu(\hat X_{jh})h + \sigma(\hat X_{jh})\sqrt{h}\,Z_{j+1} + \frac{1}{2}\sigma(\hat X_{jh})\sigma'(\hat X_{jh})\,h\,(Z_{j+1}^2 - 1), \tag{6.7}$$
where $(Z_{j+1})$ are i.i.d. random variables with $N_1(0, 1)$ distribution.
• The Milstein scheme is the Euler scheme with the additional term
$$\frac{1}{2}\sigma(\hat X_{jh})\sigma'(\hat X_{jh})\,h\,(Z_{j+1}^2 - 1).$$
• Conditional on $\hat X_{jh}$ the additional term has expectation 0. Moreover, $Z_{j+1}^2 - 1$ and $Z_{j+1}$ are uncorrelated:
$$\operatorname{cov}(Z_{j+1}^2 - 1, Z_{j+1}) = E(Z_{j+1}^3 - Z_{j+1}) - \underbrace{E(Z_{j+1}^2 - 1)}_{=0}\,\underbrace{E(Z_{j+1})}_{=0} = 0 - 0 = 0.$$
• The Milstein scheme is an improvement over the first-order Euler scheme since it has both strong
and weak order of convergence of 1 under appropriate conditions.
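A minimal Python sketch (not from the notes) of the Milstein scheme (6.7); it requires the derivative $\sigma'$ in addition to the Euler scheme, and the coefficients and parameters below are illustrative.

```python
# Sketch: one path of the Milstein scheme (6.7).
import numpy as np

def milstein_path(mu, sigma, dsigma, X0, T, n, rng):
    """Milstein scheme with step h = T/n; dsigma is the derivative sigma'."""
    h = T / n
    X = np.empty(n + 1)
    X[0] = X0
    Z = rng.standard_normal(n)
    for j in range(n):
        X[j + 1] = (X[j]
                    + mu(X[j]) * h
                    + sigma(X[j]) * np.sqrt(h) * Z[j]
                    + 0.5 * sigma(X[j]) * dsigma(X[j]) * h * (Z[j] ** 2 - 1.0))
    return X

rng = np.random.default_rng(7)
# Illustrative GBM coefficients: mu(x) = 0.05 x, sigma(x) = 0.2 x, sigma'(x) = 0.2.
path = milstein_path(lambda x: 0.05 * x, lambda x: 0.2 * x, lambda x: 0.2,
                     X0=100.0, T=1.0, n=252, rng=rng)
```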
• Reading: (Glasserman, 2004, 6.1.1).
6.5 A second order scheme
• We consider again the one-dimensional SDE
dXt = µ(Xt) dt+ σ(Xt) dWt,
where µ, σ : R → R are smooth (e.g., C2) functions, and W is a standard one-dimensional
Brownian motion.
• To simplify the notation, we consider again the operator $\mathcal{L}$ defined by
$$\mathcal{L}f(x) = \frac{1}{2}\sigma^2(x)f''(x) + \mu(x)f'(x)$$
for a $C^2$ function $f$.
• With this notation, Itô's formula can be written as
$$f(X_s) = f(X_t) + \int_t^s \mathcal{L}f(X_u)\,du + \int_t^s \sigma(X_u)f'(X_u)\,dW_u, \quad \text{for } t < s.$$
• We use Itoˆ’s formula and apply it to both functions µ(·) and σ(·).
Again let h > 0 and t ≥ 0.
Applying Itoˆ’s formula to µ(·)
For t ≤ s ≤ t+ h:
•
$$\mu(X_s) = \mu(X_t) + \int_t^s \mathcal{L}\mu(X_u)\,du + \int_t^s \sigma(X_u)\mu'(X_u)\,dW_u \approx \mu(X_t) + \mathcal{L}\mu(X_t)\int_t^s du + \sigma(X_t)\mu'(X_t)\int_t^s dW_u.$$
• Hence,
$$\int_t^{t+h}\mu(X_s)\,ds \approx \mu(X_t)h + \mathcal{L}\mu(X_t)\underbrace{\int_t^{t+h}\!\!\int_t^s du\,ds}_{=:I_{(0,0)}} + \sigma(X_t)\mu'(X_t)\underbrace{\int_t^{t+h}\!\!\int_t^s dW_u\,ds}_{=:I_{(1,0)}} = \mu(X_t)h + \mathcal{L}\mu(X_t)I_{(0,0)} + \sigma(X_t)\mu'(X_t)I_{(1,0)}.$$
Applying Itô's formula to σ(·)
For $t \le s \le t + h$:
•
$$\sigma(X_s) = \sigma(X_t) + \int_t^s \mathcal{L}\sigma(X_u)\,du + \int_t^s \sigma(X_u)\sigma'(X_u)\,dW_u \approx \sigma(X_t) + \mathcal{L}\sigma(X_t)\int_t^s du + \sigma(X_t)\sigma'(X_t)\int_t^s dW_u.$$
•
$$\int_t^{t+h}\sigma(X_s)\,dW_s \approx \sigma(X_t)(W_{t+h} - W_t) + \mathcal{L}\sigma(X_t)\underbrace{\int_t^{t+h}\!\!\int_t^s du\,dW_s}_{=:I_{(0,1)}} + \sigma(X_t)\sigma'(X_t)\underbrace{\int_t^{t+h}\!\!\int_t^s dW_u\,dW_s}_{=:I_{(1,1)}} = \sigma(X_t)(W_{t+h} - W_t) + \mathcal{L}\sigma(X_t)I_{(0,1)} + \sigma(X_t)\sigma'(X_t)I_{(1,1)}.$$
Combining the results so far:
$$X_{t+h} \approx \underbrace{X_t + \mu(X_t)h + \sigma(X_t)(W_{t+h} - W_t)}_{\text{terms in Euler scheme}} + \underbrace{\sigma(X_t)\sigma'(X_t)I_{(1,1)}}_{\text{additional term in Milstein scheme}} + \underbrace{\mathcal{L}\mu(X_t)I_{(0,0)} + \sigma(X_t)\mu'(X_t)I_{(1,0)} + \mathcal{L}\sigma(X_t)I_{(0,1)}}_{\text{additional terms in second order scheme}}.$$
We evaluate the four double integrals $I_{(0,0)}, I_{(1,0)}, I_{(0,1)}, I_{(1,1)}$ in the following:
Lemma 62. Let $W$ be a standard Brownian motion, $h > 0$, $0 \le t \le s \le t + h$. Then
$$I_{(0,0)} = \int_t^{t+h}\!\!\int_t^s du\,ds = \frac{h^2}{2}, \tag{6.8}$$
$$I_{(1,1)} = \int_t^{t+h}\!\!\int_t^s dW_u\,dW_s = \frac{1}{2}(W_{t+h} - W_t)^2 - \frac{h}{2}, \tag{6.9}$$
$$I_{(1,0)} = \int_t^{t+h}\!\!\int_t^s dW_u\,ds = \int_t^{t+h}(W_s - W_t)\,ds = \int_t^{t+h}W_s\,ds - hW_t, \tag{6.10}$$
$$I_{(0,1)} = \int_t^{t+h}\!\!\int_t^s du\,dW_s = \int_t^{t+h}(s - t)\,dW_s = \int_t^{t+h}s\,dW_s - t(W_{t+h} - W_t). \tag{6.11}$$
Moreover,
$$I_{(0,1)} = h(W_{t+h} - W_t) - I_{(1,0)}. \tag{6.12}$$
Proof.
Equations (6.8), (6.10), (6.11) follow from direct calculations, and equation (6.9) follows from (6.6). In particular,
$$I_{(0,0)} = \int_t^{t+h}\!\!\int_t^s du\,ds = \int_t^{t+h}(s - t)\,ds = \frac{(t+h)^2}{2} - \frac{t^2}{2} - th = \frac{h^2}{2},$$
$$I_{(1,1)} = \int_t^{t+h}\!\!\int_t^s dW_u\,dW_s = \int_t^{t+h}(W_s - W_t)\,dW_s = \frac{1}{2}\bigl(W_{t+h}^2 - W_t^2 - h\bigr) - W_t(W_{t+h} - W_t) = \frac{1}{2}(W_{t+h} - W_t)^2 - \frac{h}{2},$$
$$I_{(0,1)} = \int_t^{t+h}(s - t)\,dW_s = \underbrace{\int_t^{t+h}s\,dW_s}_{=(*)} - t(W_{t+h} - W_t),$$
and $(*)$ can be computed using the integration by parts formula:
$$X_{t+h}Y_{t+h} = X_0Y_0 + \int_0^{t+h}X_s\,dY_s + \int_0^{t+h}Y_s\,dX_s + \langle X, Y\rangle_{t+h},$$
$$X_tY_t = X_0Y_0 + \int_0^{t}X_s\,dY_s + \int_0^{t}Y_s\,dX_s + \langle X, Y\rangle_{t},$$
$$\Rightarrow\; X_{t+h}Y_{t+h} - X_tY_t = \int_t^{t+h}X_s\,dY_s + \int_t^{t+h}Y_s\,dX_s + \langle X, Y\rangle_{t+h} - \langle X, Y\rangle_{t},$$
$$\Leftrightarrow\; \int_t^{t+h}X_s\,dY_s = X_{t+h}Y_{t+h} - X_tY_t - \int_t^{t+h}Y_s\,dX_s - \langle X, Y\rangle_{t+h} + \langle X, Y\rangle_{t}.$$
By setting $X_s = s$ and $Y_s = W_s$ in this formula one obtains
$$\int_t^{t+h}s\,dW_s = (t+h)W_{t+h} - tW_t - \int_t^{t+h}W_s\,ds - 0 + 0 = hW_{t+h} - \int_t^{t+h}W_s\,ds + t(W_{t+h} - W_t).$$
Hence,
$$I_{(0,1)} = \int_t^{t+h}(s - t)\,dW_s = \int_t^{t+h}s\,dW_s - t(W_{t+h} - W_t) = hW_{t+h} - \int_t^{t+h}W_s\,ds = hW_{t+h} - hW_t - \int_t^{t+h}(W_s - W_t)\,ds = h(W_{t+h} - W_t) - I_{(1,0)}.$$
Hence
$$I_{(1,0)} = h(W_{t+h} - W_t) - I_{(0,1)} \;\Leftrightarrow\; I_{(0,1)} = h(W_{t+h} - W_t) - I_{(1,0)}.$$
Lemma 63. Let $W$ be a standard Brownian motion and $h > 0$.
1. Let $J_h := \int_0^h s\,dW_s$; then $I_{(0,1)} \overset{D}{=} J_h$.
2. Let $\tilde J_h := \int_0^h W_s\,ds$; then $I_{(1,0)} \overset{D}{=} \tilde J_h$.
3. The random vector $(W_h, J_h)$ is bivariate normal. In particular,
$$\begin{pmatrix} W_h \\ J_h \end{pmatrix} \sim N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} h & h^2/2 \\ h^2/2 & h^3/3 \end{pmatrix}\right).$$
4. The random vector $(W_h, \tilde J_h)$ is bivariate normal. In particular,
$$\begin{pmatrix} W_h \\ \tilde J_h \end{pmatrix} \sim N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} h & h^2/2 \\ h^2/2 & h^3/3 \end{pmatrix}\right).$$
Note that $J_h = hW_h - \tilde J_h$. Recall that $\overset{D}{=}$ means equality in distribution.
Once we have proved the lemma, we have derived the second order Euler scheme for generating
paths of our SDE, which is realized by the algorithm:
Definition 64. The second order Euler scheme is given by the following recursion:
$$\hat X_{(j+1)h} = \hat X_{jh} + \mu(\hat X_{jh})h + \sigma(\hat X_{jh})\bar W_j + \frac{1}{2}\sigma(\hat X_{jh})\sigma'(\hat X_{jh})(\bar W_j^2 - h)$$
$$\qquad + \Bigl(\frac{1}{2}\sigma^2(\hat X_{jh})\mu''(\hat X_{jh}) + \mu(\hat X_{jh})\mu'(\hat X_{jh})\Bigr)\frac{h^2}{2} + \sigma(\hat X_{jh})\mu'(\hat X_{jh})\bar J_j$$
$$\qquad + \Bigl(\frac{1}{2}\sigma^2(\hat X_{jh})\sigma''(\hat X_{jh}) + \mu(\hat X_{jh})\sigma'(\hat X_{jh})\Bigr)(h\bar W_j - \bar J_j),$$
where $(\bar W_j, \bar J_j)$ is a sequence of i.i.d. random vectors with the same distribution as the Gaussian random vector $(W_h, \tilde J_h)$.
Proof of Lemma 63.
1.
$$I_{(0,1)} = \int_t^{t+h}(s - t)\,dW_s \sim N_1\Bigl(0, \int_t^{t+h}(s - t)^2\,ds\Bigr) = N_1\Bigl(0, \frac{h^3}{3}\Bigr), \qquad J_h = \int_0^h s\,dW_s \sim N_1\Bigl(0, \int_0^h s^2\,ds\Bigr) = N_1\Bigl(0, \frac{h^3}{3}\Bigr).$$
Hence $I_{(0,1)} \overset{D}{=} J_h$.
2.
$$I_{(1,0)} = \int_t^{t+h}\underbrace{(W_s - W_t)}_{\overset{D}{=}\,W_{s-t}}\,ds \overset{D}{=} \int_0^h W_u\,du = \tilde J_h.$$
3.
We want to show that
$$\begin{pmatrix} W_h \\ J_h \end{pmatrix} = \begin{pmatrix} W_h \\ \int_0^h s\,dW_s \end{pmatrix} \sim N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} h & h^2/2 \\ h^2/2 & h^3/3 \end{pmatrix}\right).$$
• Given constants $\theta_1, \theta_2 \in \mathbb{R}$, consider the function $g$ given by
$$g(t) = E\bigl[e^{\theta_1 W_t + \theta_2 J_t}\bigr] \quad \text{for } t \ge 0.$$
• Note that $\theta_1 W_t = \int_0^t \theta_1\,dW_s$.
• Define $Y_t = \int_0^t [\theta_1 + \theta_2 s]\,dW_s$.
• Then $g(t) = E[e^{Y_t}]$.
• Applying Itô's formula to the function $f(y) = e^y$ we obtain
$$f(Y_t) = e^{Y_t} = e^{Y_0} + \int_0^t f'(Y_s)\,dY_s + \frac{1}{2}\int_0^t f''(Y_s)\,d\langle Y, Y\rangle_s = 1 + \int_0^t e^{Y_s}(\theta_1 + \theta_2 s)\,dW_s + \frac{1}{2}\int_0^t e^{Y_s}(\theta_1 + \theta_2 s)^2\,ds.$$
• Taking expectations and assuming that the expectation of the stochastic integral is 0, we obtain
$$g(t) = E e^{Y_t} = 1 + \frac{1}{2}E\Bigl[\int_0^t e^{Y_s}(\theta_1 + \theta_2 s)^2\,ds\Bigr] = 1 + \frac{1}{2}\int_0^t \underbrace{E\bigl[e^{Y_s}\bigr]}_{=g(s)}(\theta_1 + \theta_2 s)^2\,ds.$$
• It follows that $g$ satisfies the ODE
$$g'(t) = \frac{1}{2}(\theta_1 + \theta_2 t)^2 g(t), \quad g(0) = 1.$$
• The solution of this first order linear ODE is given by
$$g(t) = \exp\Bigl(\frac{1}{2}\int_0^t (\theta_1 + \theta_2 s)^2\,ds\Bigr) = \exp\Bigl(\frac{1}{2}\theta_1^2 t + \frac{1}{2}\theta_1\theta_2 t^2 + \frac{1}{6}\theta_2^2 t^3\Bigr).$$
• Consider the moment generating function of $(W_h, J_h)$:
$$M_{W_h, J_h}(\theta_1, \theta_2) \equiv E\bigl[e^{\theta_1 W_h + \theta_2 J_h}\bigr] = g(h) = \exp\left(\frac{1}{2}\begin{pmatrix} \theta_1 & \theta_2 \end{pmatrix}\begin{pmatrix} h & \frac{h^2}{2} \\ \frac{h^2}{2} & \frac{h^3}{3} \end{pmatrix}\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}\right) \quad \forall\theta_1, \theta_2 \in \mathbb{R}.$$
Hence, we see that $(W_h, J_h)$ is indeed multivariate normally distributed with the stated covariance matrix and expectation 0.
4.
• Integration by parts with $X_s = s$ and $Y_s = W_s$ yields again $hW_h = J_h + \tilde J_h$,
• and hence $\tilde J_h = hW_h - J_h$.
• From part 3 we obtain immediately that
$$\begin{pmatrix} W_h \\ \tilde J_h \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ h & -1 \end{pmatrix}\begin{pmatrix} W_h \\ J_h \end{pmatrix} \sim N_2\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \underbrace{\begin{pmatrix} 1 & 0 \\ h & -1 \end{pmatrix}\begin{pmatrix} h & h^2/2 \\ h^2/2 & h^3/3 \end{pmatrix}\begin{pmatrix} 1 & h \\ 0 & -1 \end{pmatrix}}_{=(*)}\right)$$
• and
$$(*) = \begin{pmatrix} h & h^2/2 \\ h^2/2 & h^3/3 \end{pmatrix}.$$
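A minimal Python sketch (not from the notes) of one path of the second order Euler scheme of Definition 64, drawing $(\bar W_j, \bar J_j)$ from the bivariate normal distribution of Lemma 63; the coefficient functions and parameters are illustrative assumptions.

```python
# Sketch: second order Euler scheme of Definition 64 for dX = mu(X) dt + sigma(X) dW.
import numpy as np

def second_order_path(mu, dmu, ddmu, sigma, dsigma, ddsigma, X0, T, n, rng):
    h = T / n
    cov = np.array([[h, h**2 / 2.0], [h**2 / 2.0, h**3 / 3.0]])   # cov of (W_h, J~_h)
    X = np.empty(n + 1)
    X[0] = X0
    for j in range(n):
        W, J = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov)
        x = X[j]
        Lmu = 0.5 * sigma(x)**2 * ddmu(x) + mu(x) * dmu(x)          # L mu
        Lsig = 0.5 * sigma(x)**2 * ddsigma(x) + mu(x) * dsigma(x)   # L sigma
        X[j + 1] = (x + mu(x) * h + sigma(x) * W
                    + 0.5 * sigma(x) * dsigma(x) * (W**2 - h)
                    + Lmu * h**2 / 2.0
                    + sigma(x) * dmu(x) * J
                    + Lsig * (h * W - J))
    return X

rng = np.random.default_rng(8)
# Illustrative GBM coefficients mu(x) = 0.05 x, sigma(x) = 0.2 x (so mu'' = sigma'' = 0).
path = second_order_path(lambda x: 0.05*x, lambda x: 0.05, lambda x: 0.0,
                         lambda x: 0.2*x, lambda x: 0.2, lambda x: 0.0,
                         X0=100.0, T=1.0, n=100, rng=rng)
```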
• Reading: (Glasserman, 2004, 6.2.1).
6.6 A simplified second order scheme
A simplified second order scheme is based on the following observation:
• Let $(\bar W_j, \bar J_j)$ be a sequence of i.i.d. random vectors which have the same distribution as the Gaussian random vector $(W_h, \tilde J_h)$.
• Then we define
$$\hat W_j := \frac{h}{2}\bar W_j.$$
• Using $\bar W_j \sim N_1(0, h)$ we obtain the following results:
$$\hat W_j = \frac{h}{2}\bar W_j \sim N_1\Bigl(0, \frac{h^3}{4}\Bigr), \qquad E(\hat W_j \bar W_j) = E\Bigl(\frac{h}{2}\bar W_j^2\Bigr) = \frac{h^2}{2}.$$
• Hence, if we replace $\bar J_j$ by $\hat W_j = \frac{h}{2}\bar W_j$, we keep the same expectation and covariance (but a slightly different variance, $h^3/4$ rather than $h^3/3$) and only need to sample a one-dimensional random variable.
In (Kloeden & Platen, 1999, Section 14.2), the following simplified second order scheme is suggested:
Definition 65. The simplified second order Euler scheme is given by the following recursion:
$$\hat X_{(j+1)h} = \hat X_{jh} + \mu(\hat X_{jh})h + \sigma(\hat X_{jh})\bar W_j + \frac{1}{2}\sigma(\hat X_{jh})\sigma'(\hat X_{jh})(\bar W_j^2 - h)$$
$$\qquad + \Bigl(\frac{1}{2}\sigma^2(\hat X_{jh})\mu''(\hat X_{jh}) + \mu(\hat X_{jh})\mu'(\hat X_{jh})\Bigr)\frac{h^2}{2}$$
$$\qquad + \underbrace{\sigma(\hat X_{jh})\mu'(\hat X_{jh})\frac{h}{2}\bar W_j + \Bigl(\frac{1}{2}\sigma^2(\hat X_{jh})\sigma''(\hat X_{jh}) + \mu(\hat X_{jh})\sigma'(\hat X_{jh})\Bigr)\Bigl(h\bar W_j - \frac{h}{2}\bar W_j\Bigr)}_{=\bigl(\sigma(\hat X_{jh})\mu'(\hat X_{jh}) + \frac{1}{2}\sigma^2(\hat X_{jh})\sigma''(\hat X_{jh}) + \mu(\hat X_{jh})\sigma'(\hat X_{jh})\bigr)\frac{h}{2}\bar W_j},$$
where $(\bar W_j)$ is a sequence of i.i.d. random variables with $N_1(0, h)$ distribution.
• In (Kloeden & Platen, 1999, Section 14.2) it is also shown that in the simplified second order scheme the random variable $\bar W_j$ can even be replaced by a random variable which only takes three values: $\sqrt{3h}$ with probability $\tfrac{1}{6}$, $-\sqrt{3h}$ with probability $\tfrac{1}{6}$, and $0$ with probability $\tfrac{2}{3}$.
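A minimal Python sketch (not from the notes) of sampling this three-point replacement for $\bar W_j$; the step size and sample size are illustrative.

```python
# Sketch: three-point increments used in the simplified second order scheme.
import numpy as np

def three_point_increments(h, n, rng):
    """Draws taking sqrt(3h), -sqrt(3h), 0 with probabilities 1/6, 1/6, 2/3."""
    values = np.array([np.sqrt(3.0 * h), -np.sqrt(3.0 * h), 0.0])
    return rng.choice(values, size=n, p=[1/6, 1/6, 2/3])

rng = np.random.default_rng(9)
W_bar = three_point_increments(h=0.01, n=10, rng=rng)
# E[W_bar] = 0 and E[W_bar^2] = 3h * (1/6 + 1/6) = h, matching the mean and variance of N(0, h).
```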
• Reading: (Glasserman, 2004, pp.355-356).
6.7 Example: Path-dependent options
• Consider the problem of pricing the following Asian option whose payoff is given by
$$\Bigl(\frac{1}{m}\sum_{j=1}^{m} S_{t_j} - K\Bigr)^+,$$
where $0 = t_0 < t_1 < \dots < t_m = T$ is a fixed set of dates, $T$ is the maturity of the option and $K$ is the strike price.
• Note that the payoff of this option depends on the path of the stock price and not just its terminal value.
• We consider the Black-Scholes model with time-$t$ stock price
$$S_t = S_0 \exp\Bigl(\bigl(r - \tfrac{\sigma^2}{2}\bigr)t + \sigma W_t\Bigr)$$
and riskless asset with time-$t$ price $B_t = e^{rt}$.
• The time-0 price of the Asian option is given by (the expectation is taken under the risk-neutral probability measure)
$$I = E\Bigl[e^{-rT}\Bigl(\frac{1}{m}\sum_{j=1}^{m} S_{t_j} - K\Bigr)^+\Bigr].$$
• We can compute the price of this option using a Monte Carlo estimator
$$\hat I = \frac{1}{n}\sum_{i=1}^{n} Y_i,$$
where the $Y_i$ are i.i.d. with the same distribution as the random variable
$$Y = e^{-rT}\Bigl(\frac{1}{m}\sum_{j=1}^{m} S_{t_j} - K\Bigr)^+.$$
(As usual $r \ge 0$ is the interest rate, $\sigma > 0$ the volatility and $(W_t)$ is a Brownian motion under the risk-neutral probability measure.)
• For $i = 1, \dots, n$, we denote by $Z_j^{(i)}$ the $j$th draw from the $N_1(0, 1)$ distribution along the $i$th path. We assume that the $Z_j^{(i)}$, $j = 1, \dots, m$, $i = 1, \dots, n$, are mutually independent.
– We generate one sample path of the stock price using the exact simulation scheme, i.e., we set $\hat S_0^{(i)} = S_0$ and, for $j = 1, \dots, m$, we generate $Z_j^{(i)}$ from $N_1(0, 1)$ and set
$$\hat S_{t_j}^{(i)} = \hat S_{t_{j-1}}^{(i)}\exp\Bigl(\bigl(r - \tfrac{\sigma^2}{2}\bigr)(t_j - t_{j-1}) + \sigma\sqrt{t_j - t_{j-1}}\,Z_j^{(i)}\Bigr).$$
– Then we define
$$Y_i = e^{-rT}\Bigl(\frac{1}{m}\sum_{j=1}^{m} \hat S_{t_j}^{(i)} - K\Bigr)^+.$$
– Hence the Monte Carlo estimator is given by
$$\frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}\sum_{i=1}^{n} e^{-rT}\Bigl(\frac{1}{m}\sum_{j=1}^{m} \hat S_{t_j}^{(i)} - K\Bigr)^+.$$
A code sketch of this estimator follows below.
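A minimal Python sketch (not from the notes) of the Monte Carlo estimator for the discretely monitored Asian call described above; all parameter values are illustrative.

```python
# Sketch: Monte Carlo pricing of a discretely monitored Asian call.
import numpy as np

rng = np.random.default_rng(10)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
m, n = 12, 50_000                      # m monitoring dates, n simulated paths
t = np.linspace(0.0, T, m + 1)         # t_0 = 0 < t_1 < ... < t_m = T
dt = np.diff(t)

Z = rng.standard_normal((n, m))
log_inc = (r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z      # broadcast over paths
S = S0 * np.exp(np.cumsum(log_inc, axis=1))                        # S at t_1, ..., t_m per path

Y = np.exp(-r * T) * np.maximum(S.mean(axis=1) - K, 0.0)           # discounted payoffs Y_i
price, stderr = Y.mean(), Y.std(ddof=1) / np.sqrt(n)
print("MC price:", price, "standard error:", stderr)
```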
• Consider the problem of pricing the following Asian option whose payoff is given by
$$\Bigl(\frac{1}{T}\int_0^T S_t\,dt - K\Bigr)^+.$$
• Now we consider the continuous monitoring case rather than the discrete monitoring case.
• Let $M \in \mathbb{N}$ and $h = T/M$. Then,
$$\int_0^T S_t\,dt = \sum_{i=0}^{M-1}\int_{ih}^{(i+1)h} S_t\,dt \approx \sum_{i=0}^{M-1} S_{ih}\int_{ih}^{(i+1)h} dt = \sum_{i=0}^{M-1} hS_{ih}.$$
Hence we can approximate $\int_0^T S_t\,dt$ by $\sum_{i=0}^{M-1} hS_{ih}$ and then proceed similarly as in the previous case.
• Further details can be found in (Glasserman, 2004, Section 1.1.2 (p. 7–8)).
Chapter 7
Finite Difference Schemes for
Partial Differential Equations
(PDEs)
In the following we will look at numerical methods for solving partial differential equations (PDEs).
The importance of PDEs in finance comes from the fact that option prices can be expressed as
solutions of special PDEs. The mathematical link between PDEs and conditional expectations is
provided by the Feynman-Kac-Theorem. We will start our analysis with the well-known Black-
Scholes-PDE. The Black-Scholes-PDE can be transformed to the heat equation by using several
transformations of variables. We will then describe several numerical techniques for solving the
heat equation. We also discuss the stability of the various numerical schemes.
7.1 Black-Scholes-PDE and the heat equation
• Aim: Transforming the Black-Scholes-PDE to the heat equation.
• Discussing numerical methods and their stability for solving the heat equation.
• We consider the Black-Scholes PDE
$$V_t(t, s) + \frac{1}{2}\sigma^2 s^2 V_{ss}(t, s) + m s V_s(t, s) - rV(t, s) = 0 \quad \forall (t, s) \in D_{BS} := \{(t, s) \in \mathbb{R}^2 : 0 \le t < T,\ s > 0\}, \tag{7.1}$$
with boundary condition
V (T, s) = g(s) ∀s > 0, (7.2)
• where, e.g., g(s) = (s −K)+ in the case of a European call option, g(s) = (K − s)+ in the
case of a European put option (and K strike price).
• The constant m is just m = r if the underlying stock is not dividend-paying
• and if we consider a dividend-paying stock, then $m = r - \delta$, where $\delta$ is the continuously compounded dividend yield.
• Recall that the heat equation is
$$\frac{\partial u(\tau, x)}{\partial \tau} = \frac{\partial^2 u(\tau, x)}{\partial x^2} \quad \forall (\tau, x) \in D_H, \tag{7.3}$$
with initial condition
$$u(0, x) = f(x) \quad \forall x \in \mathbb{R}, \tag{7.4}$$
for some given function $f$.
• Note that the heat equation is usually considered on $D_H = \{(\tau, x) \in \mathbb{R}^2 : \tau > 0,\ x \in \mathbb{R}\}$,
• but since we consider a finite time domain in the asset pricing context we will consider the heat equation only on $D_H = \{(\tau, x) \in \mathbb{R}^2 : 0 < \tau \le \frac{T\sigma^2}{2},\ x \in \mathbb{R}\}$.
• Several PDEs of interest can be reduced to the heat equation by means of appropriate trans-
formations. In the following we will show how the BS-PDE (7.1)–(7.2) can be transformed to the
heat equation (7.3)–(7.4).
7.1.1 Transforming the BS-PDE to a PDE with constant coefficients
• We consider the new variables $(\tau, x)$ defined by
$$s = Ke^x \;\Leftrightarrow\; x = \log\Bigl(\frac{s}{K}\Bigr), \qquad t = T - \frac{2}{\sigma^2}\tau \;\Leftrightarrow\; \tau = (T - t)\frac{\sigma^2}{2},$$
where $K > 0$ is a constant.
• We define a new function $\bar v : [0, \frac{\sigma^2 T}{2}] \times \mathbb{R} \to \mathbb{R}$ by setting
$$\bar v(\tau, x) := V\Bigl(T - \frac{2}{\sigma^2}\tau, Ke^x\Bigr) = V(t, s).$$
• An application of the chain rule yields
$$\bar v_\tau = \frac{\partial \bar v}{\partial \tau} = -\frac{2}{\sigma^2}V_t, \qquad \bar v_x = \frac{\partial \bar v}{\partial x} = Ke^x V_s = sV_s, \qquad \bar v_{xx} = (Ke^x)^2 V_{ss} + Ke^x V_s = s^2 V_{ss} + sV_s = s^2 V_{ss} + \bar v_x.$$
Here the transformation from s to x ensures that we obtain a PDE with constant coefficients.
The transformation from t to τ ensures that the coefficient of the leading term is 1.
• If we plug these results into the BS-PDE (7.1)–(7.2) we obtain
$$-\frac{\sigma^2}{2}\bar v_\tau(\tau, x) + \frac{1}{2}\sigma^2 s^2 \frac{1}{s^2}\bigl(\bar v_{xx}(\tau, x) - \bar v_x(\tau, x)\bigr) + ms\frac{1}{s}\bar v_x(\tau, x) - r\bar v(\tau, x) = 0,$$
• which can be equivalently expressed as
$$-\bar v_\tau(\tau, x) + \bar v_{xx}(\tau, x) + \bar v_x(\tau, x)\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr) - \frac{2r}{\sigma^2}\bar v(\tau, x) = 0, \tag{7.5}$$
• and the boundary condition is
$$\bar v(0, x) = g(Ke^x), \quad x \in \mathbb{R}. \tag{7.6}$$
7.1.2 Elimination of lower order terms
• We assume that
$$\bar v(\tau, x) = e^{ax + b\tau}u(\tau, x)K,$$
for some function $u$ and some constants $a, b \in \mathbb{R}$ (and, as before, $K > 0$).
• We compute the derivatives of $\bar v$ and obtain
$$\bar v_\tau = \bigl(be^{ax+b\tau}u + e^{ax+b\tau}u_\tau\bigr)K = e^{ax+b\tau}K(bu + u_\tau),$$
$$\bar v_x = \bigl(ae^{ax+b\tau}u + e^{ax+b\tau}u_x\bigr)K = e^{ax+b\tau}K(au + u_x),$$
$$\bar v_{xx} = \bigl(ae^{ax+b\tau}(au + u_x) + e^{ax+b\tau}(au_x + u_{xx})\bigr)K = e^{ax+b\tau}K(a^2 u + 2au_x + u_{xx}).$$
• We plug this into (7.5) and divide by $Ke^{ax+b\tau}$. Then we obtain
$$0 = -(bu + u_\tau) + (a^2 u + 2au_x + u_{xx}) + (au + u_x)\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr) - \frac{2r}{\sigma^2}u$$
$$\;\; = u\Bigl(-b + a^2 + a\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr) - \frac{2r}{\sigma^2}\Bigr) - u_\tau + u_x\Bigl(2a + \Bigl(-1 + \frac{2m}{\sigma^2}\Bigr)\Bigr) + u_{xx}.$$
• Hence if we choose
$$a = -\frac{1}{2}\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr), \qquad b = a^2 + a\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr) - \frac{2r}{\sigma^2} = -\frac{1}{4}\Bigl(-1 + \frac{2m}{\sigma^2}\Bigr)^2 - \frac{2r}{\sigma^2}, \tag{7.7}$$
• this simplifies to
$$-u_\tau(\tau, x) + u_{xx}(\tau, x) = 0,$$
• and the boundary condition is
$$u(0, x) = e^{\frac{1}{2}(-1 + \frac{2m}{\sigma^2})x}\,g(Ke^x)\,\frac{1}{K}. \tag{7.8}$$
7.1.3 Summary
• Recall the following relationship:
$$V(t, s) = V\Bigl(T - \frac{2}{\sigma^2}\tau, Ke^x\Bigr) = \bar v(\tau, x) = e^{ax + b\tau}u(\tau, x)K,$$
$$s = Ke^x, \quad x = \log\Bigl(\frac{s}{K}\Bigr), \quad t = T - \frac{2}{\sigma^2}\tau, \quad \tau = (T - t)\frac{\sigma^2}{2},$$
where $a, b$ are defined in (7.7).
• Hence we find that if $u$ solves the heat equation (7.3) with boundary condition (7.8), then
$$V(t, s) = e^{a\log(\frac{s}{K}) + b(T-t)\frac{\sigma^2}{2}}\,u\Bigl((T - t)\frac{\sigma^2}{2}, \log\Bigl(\frac{s}{K}\Bigr)\Bigr)K$$
solves the BS-PDE (7.1) with boundary condition (7.2).
• Similarly, if $V$ solves the BS-PDE (7.1) with boundary condition (7.2), then
$$u(\tau, x) = e^{-ax - b\tau}\frac{1}{K}\bar v(\tau, x) = e^{-ax - b\tau}\frac{1}{K}V\Bigl(T - \frac{2}{\sigma^2}\tau, Ke^x\Bigr)$$
solves the heat equation (7.3) with boundary condition (7.8).
• You might wonder why we have used the constant $K$ in the transformation of the PDEs. The motivation is that this will allow us to simplify the boundary conditions later. Consider, e.g., a European call option with strike price $K$: the boundary condition of the BS-PDE is $g(s) = (s - K)^+$, which can be rewritten as $V(T, s) = g(s) = (Ke^x - K)^+ = K(e^x - 1)^+$.
• Reading: (Seydel, 2009, Section 4.1).
7.2 Approximation of the derivatives of smooth functions
• In what follows, we fix a constant $h > 0$ and a smooth function $g$. Also, we recall that if a function $h \mapsto w(h)$ is $O(h^k)$ for some $k \ge 1$, then the function $h \mapsto w(h)/h$ is $O(h^{k-1})$.
• To estimate $g'(x)$, we consider Taylor's expansion of $g$ at $x$, namely,
$$g(x + h) = g(x) + g'(x)h + O(h^2), \qquad g(x - h) = g(x) - g'(x)h + O(h^2).$$
• Rearranging terms, we obtain
$$g'(x) = \frac{g(x + h) - g(x)}{h} + O(h), \tag{7.9}$$
$$g'(x) = \frac{g(x) - g(x - h)}{h} + O(h), \tag{7.10}$$
respectively.
Definition 66. We refer to an approximation of the first derivative as a forward scheme if
$$g'(x) \approx \frac{g(x + h) - g(x)}{h},$$
and as a backward scheme if
$$g'(x) \approx \frac{g(x) - g(x - h)}{h}.$$
• To approximate $g''(x)$, we consider higher order terms in the Taylor expansion of $g$ at $x$.
• In particular, we note that
$$g(x + h) = g(x) + g'(x)h + \frac{1}{2}g''(x)h^2 + \frac{1}{3!}g'''(x)h^3 + O(h^4),$$
$$g(x - h) = g(x) - g'(x)h + \frac{1}{2}g''(x)h^2 - \frac{1}{3!}g'''(x)h^3 + O(h^4).$$
• Adding these equations side by side, we obtain
$$g(x + h) + g(x - h) = 2g(x) + g''(x)h^2 + O(h^4),$$
• which implies that
$$g''(x) = \frac{g(x + h) - 2g(x) + g(x - h)}{h^2} + O(h^2). \tag{7.11}$$
You might have wondered why we used the Taylor expansion up to third order and not just up to second order. By doing this we could exploit the fact that the term including the third derivative cancels, so the order of convergence is actually better. Going up to an even higher order (assuming the function is sufficiently often continuously differentiable) does not help, because the derivatives $g''''(x)$, $g''''''(x)$, etc. would not cancel out and would need to be approximated as well.
Definition 67. We refer to an approximation of the second derivative as a centered scheme if
$$g''(x) \approx \frac{g(x + h) - 2g(x) + g(x - h)}{h^2}. \tag{7.12}$$
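A minimal Python sketch (not from the notes) checking the forward, backward and centered difference approximations (7.9)–(7.12) on an illustrative smooth test function.

```python
# Sketch: finite difference approximations of g'(x) and g''(x) for g(x) = sin(x) at x = 1.
import numpy as np

g = np.sin
x, h = 1.0, 1e-4

forward = (g(x + h) - g(x)) / h                       # forward scheme, O(h) error
backward = (g(x) - g(x - h)) / h                      # backward scheme, O(h) error
centered2 = (g(x + h) - 2 * g(x) + g(x - h)) / h**2   # centered scheme for g'', O(h^2) error

print("g'(1) exact:", np.cos(1.0), "forward:", forward, "backward:", backward)
print("g''(1) exact:", -np.sin(1.0), "centered:", centered2)
```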
• Reading: (Seydel, 2009, Section 4.2.1).
7.3 The grid
• We discretise the time and state variables $\tau$ and $x$ and therefore obtain a two-dimensional grid.
• Note that $\tau \in [0, \frac{\sigma^2 T}{2}]$ and $x \in \mathbb{R}$.
Discretisation in the state variable:
• Note that for the discretisation in $x$ we need to replace $\mathbb{R}$ by a finite interval.
• We choose two real numbers $x_{\min} < x_{\max}$ and consider the interval $[x_{\min}, x_{\max}]$.
• Note that since $S = Ke^x$ we obtain $S_{\min} := Ke^{x_{\min}} \le S = Ke^x \le Ke^{x_{\max}} =: S_{\max}$, and from this suitable $x_{\min}, x_{\max}$ should be chosen.
• Let $m$ be a strictly positive integer.
• The step length in $x$ is then defined by $\Delta x := \frac{x_{\max} - x_{\min}}{m} > 0$,
• and we use the $m + 1$ points of the state space
$$x_{\min} = x_0 < x_1 < x_2 < \dots < x_m = x_{\max}, \quad \text{where } x_i := x_{\min} + i\Delta x \text{ for } i = 0, \dots, m.$$
Discretisation in the time variable:
• We consider the time interval $[0, \tau_{\max}]$, where $\tau_{\max} = \frac{\sigma^2 T}{2}$.
• Given a strictly positive integer $\ell_{\max}$, we define $\Delta\tau = \frac{\tau_{\max}}{\ell_{\max}} > 0$,
• and we use the $\ell_{\max} + 1$ points in time
$$0 = \tau_0 < \tau_1 < \tau_2 < \dots < \tau_{\ell_{\max}} = \tau_{\max}, \quad \text{where } \tau_\ell = \ell\Delta\tau \text{ for } \ell = 0, \dots, \ell_{\max}.$$
Given this grid and a function $u$, we write
$$u_{\ell,i} = u(\tau_\ell, x_i) \quad \text{for } \ell = 0, 1, \dots, \ell_{\max} \text{ and } i = 0, 1, \dots, m$$
in what follows.
• Reading: (Seydel, 2009, Section 4.2.2).
7.4 The explicit (forward) finite difference scheme
• The idea of the explicit scheme is to use a
– forward scheme for the first derivative with respect to time and
– a centered scheme for the second derivative with respect to the state space.
• Therefore the explicit scheme is sometimes also referred to as Forward-Time-Centered-
Space scheme.
• We approximate the derivatives appearing in the heat equation (7.3) using (7.9) and (7.11) to obtain
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} + O(\Delta\tau) = \frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}}{(\Delta x)^2} + O((\Delta x)^2).$$
• Ignoring the error terms, we obtain the difference equation
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} = \frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}}{(\Delta x)^2}.$$
• This is equivalent to
$$u_{\ell+1,i} - u_{\ell,i} = \frac{\Delta\tau}{(\Delta x)^2}\bigl(u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}\bigr).$$
• If we define $\lambda = \frac{\Delta\tau}{(\Delta x)^2}$, then this expression can be rewritten, which yields the explicit scheme.
Definition 68. The explicit finite difference method is given by the following scheme
uℓ+1,i = λuℓ,i+1 + (1− 2λ)uℓ,i + λuℓ,i−1,
for ℓ = 0, . . . , ℓmax − 1 and i = 1, . . . ,m− 1
and some suitable boundary conditions for computing u0,i ∀i = 0, . . . ,m and for the points with
i = 0 (left boundary) and i = m (right boundary).
[Stencil: the new value $u_{\ell+1,i}$ is computed from the three values $u_{\ell,i-1}, u_{\ell,i}, u_{\ell,i+1}$ at the previous time index $\ell$.]
• The explicit method only uses points on the grid from the previous time step (index $\ell$) to compute the new values with time index $\ell + 1$.
• This suggests starting with the values on the grid with time index $\ell = 0$ and then computing those at $\Delta\tau$, $2\Delta\tau$, up to $\tau_{\max}$.
Remark 69 (Boundaries). The two-dimensional grid representing the state space ($x$-axis) and the time ($y$-axis) has four boundaries. Recall that $u_{\ell,i} = u(\tau_\ell, x_i)$, $\ell = 0, 1, \dots, \ell_{\max}$, $i = 0, 1, \dots, m$.
• Lower boundary: ℓ = 0, τℓ = 0: These points can be computed directly from the boundary
condition of the heat equation u0,i = u(0, xi) = f(xi), ∀i = 0, . . . ,m.
• Left and right boundary: i = 0, xi = xmin and i = m, xi = xmax: These boundaries
are artificial boundaries. For now we set uℓ,0 = 0, ∀ℓ = 1, . . . , ℓmax and uℓ,m = 0, ∀ℓ =
1, . . . , ℓmax.
• Upper boundary: ℓ = ℓmax: These points can be computed directly from the explicit
scheme for i = 1, . . . ,m − 1 and for i = 0, i = m they are determined from the previous
boundary conditions.
Remark 70 (Matrix notation).
• We can write the explicit scheme in matrix notation as follows.
• We collect all grid points which are not on the left or right boundary and correspond to time
τℓ in the vector uℓ = (uℓ,1, . . . , uℓ,m−1)
⊤ ∈ Rm−1.
• Then
$$u_{\ell+1} = A_E\,u_\ell + \lambda u_\ell^b, \quad \text{for } \ell = 0, \dots, \ell_{\max} - 1,$$
• where $A_E \in \mathbb{R}^{(m-1)\times(m-1)}$, $u_\ell^b \in \mathbb{R}^{m-1}$ and
$$A_E = \begin{pmatrix} 1-2\lambda & \lambda & 0 & \cdots & 0 & 0 \\ \lambda & 1-2\lambda & \lambda & \cdots & 0 & 0 \\ 0 & \lambda & 1-2\lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1-2\lambda & \lambda \\ 0 & 0 & 0 & \cdots & \lambda & 1-2\lambda \end{pmatrix}, \qquad u_\ell^b = \begin{pmatrix} u_{\ell,0} \\ 0 \\ \vdots \\ 0 \\ u_{\ell,m} \end{pmatrix}.$$
• Boundaries: The initial vector u0 and ub0 is determined by the initial condition of the heat
equation by setting
u0,i = f(xi), for i = 1, . . . ,m− 1.
• For simplicity we set ubℓ = (0, . . . , 0) ∈ Rm−1 ∀ℓ = 0, . . . , ℓmax.
• We could have considered ub0 = (f(x0), 0, . . . , 0, f(xm)) ∈ Rm−1 and then ubℓ = (0, . . . , 0) ∈
Rm−1 ∀ℓ = 1, . . . , ℓmax. This would be the equivalent formulation to the previous scheme.
• Algorithm: With these simplified boundary conditions (left and right boundaries are 0), the approximation of the solution of the heat equation is obtained by repeatedly computing
$$u_1 = A_E\,u_0, \quad u_2 = A_E^2\,u_0, \quad \dots, \quad u_\ell = A_E^\ell\,u_0.$$
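A minimal Python sketch (not from the notes) of the explicit scheme $u_{\ell+1} = A_E u_\ell$ with zero left/right boundaries; the grid sizes and the initial condition are illustrative choices.

```python
# Sketch: explicit (forward) finite difference scheme for the heat equation.
import numpy as np

def explicit_heat(f, x_min, x_max, m, tau_max, l_max):
    """Return the grid values u_{l,i} at the interior points of the explicit scheme."""
    dx = (x_max - x_min) / m
    dtau = tau_max / l_max
    lam = dtau / dx**2                 # the explicit scheme is only stable for lam <= 1/2
    x = x_min + dx * np.arange(1, m)   # interior points x_1, ..., x_{m-1}

    u = np.empty((l_max + 1, m - 1))
    u[0] = f(x)
    for l in range(l_max):
        u_pad = np.concatenate(([0.0], u[l], [0.0]))      # zero boundary values
        u[l + 1] = lam * u_pad[2:] + (1 - 2 * lam) * u_pad[1:-1] + lam * u_pad[:-2]
    return u

# Illustrative call-type initial condition u(0, x) = max(e^{x/2} - e^{-x/2}, 0).
u = explicit_heat(lambda x: np.maximum(np.exp(0.5 * x) - np.exp(-0.5 * x), 0.0),
                  x_min=-3.0, x_max=3.0, m=60, tau_max=0.02, l_max=100)
```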
• Reading: (Seydel, 2009, Section 4.2.3).
7.5 The implicit (backward) finite difference scheme
• The idea of the implicit scheme is to use
– a backward scheme for the first derivative with respect to time and
– a centered scheme for the second derivative with respect to the state space.
• Therefore the implicit scheme is sometimes also referred to as Backward-Time-Centered-
Space scheme.
• We approximate the derivatives appearing in the heat equation (7.3) using (7.10) and (7.11) to obtain
$$\frac{u_{\ell,i} - u_{\ell-1,i}}{\Delta\tau} = \frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}}{(\Delta x)^2}.$$
• Again we set $\lambda = \Delta\tau/(\Delta x)^2$ and obtain the equivalent formulation.
Definition 71. The implicit finite difference method is given by the following scheme
− λuℓ,i+1 + (1 + 2λ)uℓ,i − λuℓ,i−1 = uℓ−1,i
for ℓ = 1, . . . , ℓmax and i = 1, . . . ,m− 1
and some suitable boundary conditions.
[Stencil: at time index $\ell$ the three unknowns $u_{\ell,i-1}, u_{\ell,i}, u_{\ell,i+1}$ are linked to the known value $u_{\ell-1,i}$ at the previous time index.]
• In the implicit scheme it is no longer
possible to solve for one individual grid
point at a time, but we have to solve
the whole system of equations.
• We see that at time ℓ we only know the
value uℓ−1,i but have to compute three
unknown variables uℓ,i+1, uℓ,i, uℓ,i−1.
We therefore write the implicit scheme in matrix form as well.
Remark 72 (Matrix notation).
• We collect all grid points which are not on the left or right boundary and correspond to time
τℓ in the vector uℓ = (uℓ,1, . . . , uℓ,m−1)
⊤ ∈ Rm−1.
• Then,
$$u_{\ell-1} = A_I\,u_\ell - \lambda u_\ell^B, \quad \text{for } \ell = 1, \dots, \ell_{\max},$$
• where $A_I \in \mathbb{R}^{(m-1)\times(m-1)}$, $u_\ell^B \in \mathbb{R}^{m-1}$ and
$$A_I = \begin{pmatrix} 1+2\lambda & -\lambda & 0 & \cdots & 0 & 0 \\ -\lambda & 1+2\lambda & -\lambda & \cdots & 0 & 0 \\ 0 & -\lambda & 1+2\lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1+2\lambda & -\lambda \\ 0 & 0 & 0 & \cdots & -\lambda & 1+2\lambda \end{pmatrix}, \qquad u_\ell^B = \begin{pmatrix} u_{\ell,0} \\ 0 \\ \vdots \\ 0 \\ u_{\ell,m} \end{pmatrix}.$$
• The initial vector u0 is again determined by the initial condition of the heat equation by
setting
u0,i = f(xi) for i = 1, . . . ,m− 1.
• Also, the points uℓ,0 and uℓ,m appearing in the vectors uBℓ are determined by boundary
conditions.
• For now we set again
uℓ,0 = uℓ,m = 0 for all ℓ = 0, . . . , ℓmax.
• In this case, the approximate solution of the heat equation (7.3) is obtained by repeatedly computing
$$u_1 = A_I^{-1}u_0, \quad u_2 = (A_I^{-1})^2 u_0, \quad \dots, \quad u_\ell = (A_I^{-1})^\ell u_0.$$
• Reading: (Seydel, 2009, Section 4.2.5).
7.6 The Crank-Nicolson finite difference scheme
• The idea underlying the Crank-Nicolson scheme is to take the “average” of the previous two
methods:
• explicit (forward in time) for $\ell$:
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} = \frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}}{(\Delta x)^2},$$
• implicit (backward in time) for $\ell + 1$:
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} = \frac{u_{\ell+1,i+1} - 2u_{\ell+1,i} + u_{\ell+1,i-1}}{(\Delta x)^2}.$$
• The Crank-Nicolson method just adds these two equations and hence
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} = \frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1} + u_{\ell+1,i+1} - 2u_{\ell+1,i} + u_{\ell+1,i-1}}{2(\Delta x)^2}.$$
By rearranging terms, we obtain
$$u_{\ell+1,i} - u_{\ell,i} = \bigl(u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1} + u_{\ell+1,i+1} - 2u_{\ell+1,i} + u_{\ell+1,i-1}\bigr)\underbrace{\frac{\Delta\tau}{2(\Delta x)^2}}_{=\frac{\lambda}{2}}$$
and thus
$$u_{\ell+1,i} - \frac{\lambda}{2}u_{\ell+1,i+1} + \lambda u_{\ell+1,i} - \frac{\lambda}{2}u_{\ell+1,i-1} = \bigl(u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}\bigr)\frac{\lambda}{2} + u_{\ell,i}.$$
Definition 73. The Crank-Nicolson finite difference method is given by the following scheme
$$-\frac{\lambda}{2}u_{\ell+1,i-1} + (1 + \lambda)u_{\ell+1,i} - \frac{\lambda}{2}u_{\ell+1,i+1} = \frac{\lambda}{2}u_{\ell,i-1} + (1 - \lambda)u_{\ell,i} + \frac{\lambda}{2}u_{\ell,i+1}$$
for $\ell = 0, \dots, \ell_{\max} - 1$ and $i = 1, \dots, m - 1$, and some suitable boundary conditions.
[Stencil: the Crank-Nicolson scheme involves three grid points at each of the two time levels $\ell$ and $\ell + 1$.]
• Use of the points on the grid in the
Crank-Nicolson scheme.
• At each time level ℓ and ℓ + 1 three
grid points are involved.
• Let uℓ = (uℓ,1, . . . , uℓ,m−1)⊤.
• Then the Crank-Nicolson scheme can be written as
$$A_{CN}\,u_{\ell+1} - \frac{\lambda}{2}u_{\ell+1}^b = B_{CN}\,u_\ell + \frac{\lambda}{2}u_\ell^b,$$
• where
$$u_\ell^b = \begin{pmatrix} u_{\ell,0} & 0 & \cdots & 0 & u_{\ell,m} \end{pmatrix}^\top \in \mathbb{R}^{m-1},$$
$A_{CN}, B_{CN} \in \mathbb{R}^{(m-1)\times(m-1)}$ with
$$A_{CN} = \begin{pmatrix} 1+\lambda & -\frac{\lambda}{2} & 0 & \cdots & 0 & 0 \\ -\frac{\lambda}{2} & 1+\lambda & -\frac{\lambda}{2} & \cdots & 0 & 0 \\ 0 & -\frac{\lambda}{2} & 1+\lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1+\lambda & -\frac{\lambda}{2} \\ 0 & 0 & 0 & \cdots & -\frac{\lambda}{2} & 1+\lambda \end{pmatrix}, \qquad B_{CN} = \begin{pmatrix} 1-\lambda & \frac{\lambda}{2} & 0 & \cdots & 0 & 0 \\ \frac{\lambda}{2} & 1-\lambda & \frac{\lambda}{2} & \cdots & 0 & 0 \\ 0 & \frac{\lambda}{2} & 1-\lambda & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1-\lambda & \frac{\lambda}{2} \\ 0 & 0 & 0 & \cdots & \frac{\lambda}{2} & 1-\lambda \end{pmatrix}.$$
• The initial vector $u_0$ is given by
$$u_{0,i} = f(x_i), \quad \text{for } i = 1, \dots, m - 1.$$
• Furthermore, if we set
$$u_{\ell,0} = u_{\ell,m} = 0 \quad \text{for all } \ell = 0, \dots, \ell_{\max},$$
• then the approximate solution of the heat equation (7.3) is obtained by repeatedly computing
$$u_1 = A_{CN}^{-1}B_{CN}\,u_0, \quad u_2 = (A_{CN}^{-1}B_{CN})^2 u_0, \quad \dots, \quad u_\ell = (A_{CN}^{-1}B_{CN})^\ell u_0.$$
• Reading: (Seydel, 2009, Section 4.3).
7.7 The θ-method
• The θ-method is a generalisation of the previous schemes.
• It considers the following approximation of the heat equation:
$$\frac{u_{\ell+1,i} - u_{\ell,i}}{\Delta\tau} = (1 - \theta)\Bigl(\frac{u_{\ell,i+1} - 2u_{\ell,i} + u_{\ell,i-1}}{(\Delta x)^2}\Bigr) + \theta\Bigl(\frac{u_{\ell+1,i+1} - 2u_{\ell+1,i} + u_{\ell+1,i-1}}{(\Delta x)^2}\Bigr).$$
• Hence with $\theta = 0$ we obtain the explicit scheme, with $\theta = 1/2$ we obtain the Crank-Nicolson scheme, and with $\theta = 1$ we obtain the implicit scheme. A code sketch of the $\theta$-method is given below.
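A minimal Python sketch (not from the notes) of the θ-method with zero left/right boundaries; the grid sizes and the initial condition are illustrative assumptions.

```python
# Sketch: theta-method for the heat equation; theta = 0, 1/2, 1 give the explicit,
# Crank-Nicolson and implicit schemes respectively.
import numpy as np

def theta_method(f, x_min, x_max, m, tau_max, l_max, theta):
    dx = (x_max - x_min) / m
    lam = (tau_max / l_max) / dx**2
    x = x_min + dx * np.arange(1, m)                  # interior points

    main = np.eye(m - 1)
    laplace = np.eye(m - 1, k=1) + np.eye(m - 1, k=-1) - 2 * main  # (u_{i+1} - 2u_i + u_{i-1}) stencil
    A = main - theta * lam * laplace                  # applied to the unknown time level
    B = main + (1 - theta) * lam * laplace            # applied to the known time level

    u = f(x)
    for _ in range(l_max):
        u = np.linalg.solve(A, B @ u)                 # one time step: A u_{l+1} = B u_l
    return x, u

x, u = theta_method(lambda x: np.maximum(np.exp(0.5 * x) - np.exp(-0.5 * x), 0.0),
                    x_min=-3.0, x_max=3.0, m=60, tau_max=0.02, l_max=50, theta=0.5)
```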
• Reading (Seydel, 2009, Section 4.6.1)
7.8 Boundary conditions
7.8.1 The general heat equation
We consider a function u on [0,∞)× R which satisfies
\[ u_{xx}(\tau, x) = u_{\tau}(\tau, x) \quad \forall (\tau, x) \in (0,\infty)\times \mathbb{R}, \qquad u(0, x) = u_0(x) \quad \forall x \in \mathbb{R}, \tag{7.13} \]
where u0 is a continuous and bounded function.
• Consider the function
\[ U(\tau, x) = \frac{1}{\sqrt{4\pi\tau}} \exp\left(-\frac{x^2}{4\tau}\right) = \frac{1}{\sqrt{2\pi(2\tau)}} \exp\left(-\frac{1}{2}\left(\frac{x^2}{2\tau}\right)\right) \]
which is the probability density function of the N1(0, 2τ)-distribution.
• Then one can show by differentiation that
Uxx(τ, x) = Uτ (τ, x), ∀(τ, x) ∈ (0,∞)× R.
• Moreover, one can show that
\[ \tilde u(\tau, x) := \int_{-\infty}^{+\infty} U(\tau, x - y)\, u_0(y)\, dy \tag{7.14} \]
\[ \hphantom{\tilde u(\tau, x) := } = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi(2\tau)}} \exp\left(-\frac{1}{2}\left(\frac{(x-y)^2}{2\tau}\right)\right) u_0(y)\, dy \tag{7.15} \]
solves (7.13) and indeed limτ→0+ ũ(τ, x) = u0(x). This solution is unique.
• Recall that in (7.8) the boundary condition was stated in terms of the terminal payoff
function g of an option as
\[ u(0, x) = e^{\frac{1}{2}\left(-1 + \frac{2m}{\sigma^2}\right)x}\, g(K e^{x})\, \frac{1}{K}. \]
• If we consider a European Call option, then g(Kex) = (Kex−K)+ = K(ex−1)+ and hence
we obtain
\[ u(0, x) = e^{\frac{1}{2}\left(-1 + \frac{2m}{\sigma^2}\right)x} \left(e^{x} - 1\right)^+ = \left( e^{\left(-\frac{1}{2} + \frac{m}{\sigma^2} + 1\right)x} - e^{\left(-\frac{1}{2} + \frac{m}{\sigma^2}\right)x} \right)^+. \]
• Similarly, for a European put option, g(Kex) = (K −Kex)+ = K(1− ex)+ and hence
\[ u(0, x) = \left( e^{\left(-\frac{1}{2} + \frac{m}{\sigma^2}\right)x} - e^{\left(-\frac{1}{2} + \frac{m}{\sigma^2} + 1\right)x} \right)^+. \]
• If one uses these boundary conditions in (7.14), one can derive the Black-Scholes formula for
a European Call or Put. A small numerical illustration of (7.14) with the transformed call payoff
is sketched below.
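The following minimal Python sketch (not part of the notes; the parameter values and the truncation of the integral are illustrative assumptions) implements the transformed call payoff u(0, x) above and evaluates the convolution (7.14) by numerical quadrature. Transforming back via V = K e^{ax+bτ} u(τ, x), with the constants a and b of the transformation used in Section 7.8.2, would then reproduce Black-Scholes prices.

import numpy as np
from scipy.integrate import quad

# Illustrative parameters (assumptions, not from the notes)
r, delta, sigma = 0.05, 0.0, 0.2
m_ = r - delta          # the parameter m, so that 2m/sigma^2 = 2r/sigma^2 - 2*delta/sigma^2

def u0_call(x):
    """Transformed call payoff u(0, x) derived above."""
    return max(np.exp((-0.5 + m_ / sigma**2 + 1.0) * x)
               - np.exp((-0.5 + m_ / sigma**2) * x), 0.0)

def u_tilde(tau, x):
    """Convolution (7.14): integral of U(tau, x - y) u0(y) dy, truncated to a
    region containing essentially all of the mass of the Gaussian kernel."""
    def integrand(y):
        kernel = np.exp(-(x - y)**2 / (4.0 * tau)) / np.sqrt(4.0 * np.pi * tau)
        return kernel * u0_call(y)
    width = 10.0 * np.sqrt(2.0 * tau)            # ~10 standard deviations of N(0, 2*tau)
    value, _ = quad(integrand, x - width, x + width)
    return value

print(u_tilde(0.02, 0.0))   # u at tau = sigma^2 (T - t)/2 = 0.02 and x = log(S/K) = 0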
7.8.2 The heat equation on [0, τmax]× [xmin, xmax]
• In order to apply a finite difference scheme we cannot consider a function u on [0,∞)×R,
but we will have to consider a space [0, τmax]× [xmin, xmax] where [xmin, xmax] ⊆ R.
• As we have seen before, τmax = σ²T/2, where T is the maturity of the option.
Hence this is a natural boundary in an option pricing context.
• The bounds xmin, xmax are artificial.
They come from the change of variables S = Kex, i.e., x = log(S/K). Since the logarithm and the
exponential are increasing functions, Smin = Ke^{xmin} and Smax = Ke^{xmax}.
• Let VP be the European put price and VC the European call price.
• It is clear that VC(t, 0) = 0 ∀t, i.e., the price of a European Call is zero if the asset
price is 0. This is obvious from the payoff (0−K)+ = 0.
• Similarly, for large values of the stock, the price of a European put will be zero:
VP (t, S) = 0 ∀t and S large.
• We keep these results in mind and consider the Put-Call-Parity (with dividends, i.e.,
δ ≥ 0):
VC(t, St)− VP (t, St) = Ste−δ(T−t) −Ke−r(T−t).
• At terminal time T this is clear, since VC(T, ST )− VP (T, ST ) = (ST −K)+ − (K − ST )+ =
ST −K. One can use an arbitrage argument to prove this for t < T .
• Hence we can read off the following boundary conditions for s ≈ 0 and for s→∞ which hold
for all 0 ≤ t < T :
\[
\begin{aligned}
V_C(t, s) &= 0 && \text{for } s \approx 0,\\
V_C(t, s) &= s e^{-\delta(T-t)} - K e^{-r(T-t)} && \text{for large } s,\\
V_P(t, s) &= 0 && \text{for large } s,\\
V_P(t, s) &= K e^{-r(T-t)} - s e^{-\delta(T-t)} && \text{for } s \approx 0.
\end{aligned}
\tag{7.16}
\]
• These conditions are sometimes simplified further to
\[
\begin{aligned}
V_C(t, s) &= 0 && \text{for } s \approx 0,\\
V_C(t, s) &= s e^{-\delta(T-t)} && \text{for large } s,\\
V_P(t, s) &= 0 && \text{for large } s,\\
V_P(t, s) &= K e^{-r(T-t)} && \text{for } s \approx 0.
\end{aligned}
\tag{7.17}
\]
• For large values of s we then obtain
\[ V_C(t, s) = s e^{-\delta(T-t)} = K e^{x} e^{-\delta \frac{2\tau}{\sigma^2}} = K e^{ax + b\tau}\, u(\tau, x)
\quad\Longleftrightarrow\quad
u(\tau, x) = \exp\left( x\,(1-a) + \tau\left(-\delta \frac{2}{\sigma^2} - b\right) \right) \]
• and
\[ 1 - a = 1 + \frac{1}{2}\left(-1 + \frac{2m}{\sigma^2}\right) = \frac{1}{2}\left(1 + \frac{2m}{\sigma^2}\right), \]
\[ -b - \delta\frac{2}{\sigma^2} = \frac{1}{4}\left(-1 + \frac{2m}{\sigma^2}\right)^2 + \underbrace{\frac{2r}{\sigma^2} - \frac{2\delta}{\sigma^2}}_{= \frac{2m}{\sigma^2}} = \frac{1}{4}\left(1 + \frac{2m}{\sigma^2}\right)^2. \]
• For s ≈ 0 we obtain
\[ V_P(t, s) = K e^{-r(T-t)} = K e^{-r \frac{2\tau}{\sigma^2}} = K e^{ax + b\tau}\, u(\tau, x)
\quad\Longleftrightarrow\quad
u(\tau, x) = \exp\left( x\,(-a) + \tau\left(-r \frac{2}{\sigma^2} - b\right) \right) \]
• and
\[ -a = \frac{1}{2}\left(-1 + \frac{2m}{\sigma^2}\right), \]
\[ -b - r\frac{2}{\sigma^2} = \frac{1}{4}\left(-1 + \frac{2m}{\sigma^2}\right)^2 + \frac{2r}{\sigma^2} - \frac{2r}{\sigma^2} = \frac{1}{4}\left(-1 + \frac{2m}{\sigma^2}\right)^2. \]
Remark 74.
• Boundary conditions ∀ 0 ≤ τ < σ²T/2 for large x = xmax and small x = xmin are
u(τ, xmin) = r1(τ, xmin),
u(τ, xmax) = r2(τ, xmax),
• where for a European Call
\[ u(\tau, x_{\min}) = r_1(\tau, x_{\min}) = 0, \]
\[ u(\tau, x_{\max}) = r_2(\tau, x_{\max}) = \exp\left( \frac{1}{2}\left(1 + \frac{2m}{\sigma^2}\right) x_{\max} + \frac{1}{4}\left(1 + \frac{2m}{\sigma^2}\right)^2 \tau \right), \]
• and for a European Put
\[ u(\tau, x_{\min}) = r_1(\tau, x_{\min}) = \exp\left( \frac{1}{2}\left(-1 + \frac{2m}{\sigma^2}\right) x_{\min} + \frac{1}{4}\left(-1 + \frac{2m}{\sigma^2}\right)^2 \tau \right), \]
\[ u(\tau, x_{\max}) = r_2(\tau, x_{\max}) = 0. \]
• For boundary points uℓ,0, uℓ,m ∀ℓ = 1, . . . , ℓmax we choose
uℓ,0 = r1(τℓ, xmin),
uℓ,m = r2(τℓ, xmax).
A small code sketch of r1 and r2 for the Call follows the reading reference below.
• Reading: (Seydel, 2009, Section 4.4).
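The boundary functions of Remark 74 translate directly into code. A minimal Python sketch for the European Call (the Put is analogous); m_ denotes the parameter m of the transformation:

import numpy as np

def call_boundary_functions(m_, sigma):
    """Boundary functions r1 (at x_min) and r2 (at x_max) for a European Call,
    as in Remark 74; q stands for 2m/sigma^2."""
    q = 2.0 * m_ / sigma**2

    def r1(tau, x_min):
        return 0.0

    def r2(tau, x_max):
        return np.exp(0.5 * (1.0 + q) * x_max + 0.25 * (1.0 + q)**2 * tau)

    return r1, r2

# In a finite difference solver these supply the boundary entries
# u_{l,0} = r1(tau_l, x_min) and u_{l,m} = r2(tau_l, x_max).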
7.9 Stability of the finite difference schemes
• In the explicit scheme with simplified boundary conditions we do the iteration
uℓ+1 = AE uℓ.
• We analyse rounding errors in the following.
• Let uℓ be the theoretical (exact) value and ûℓ the vector computed by the computer.
• Then we define the error in time step ℓ by
eℓ = ûℓ − uℓ, ℓ ≥ 0. (7.18)
• If a computer evaluates AE ûℓ, a rounding error denoted by rℓ+1 occurs.
• Then
ûℓ+1 = AE ûℓ + rℓ+1.
• Suppose a rounding error occurs at a step ℓ∗. We are interested in the consequences of this
rounding error as we continue with our computations.
• We could, for example, investigate the effect of a rounding error at ℓ∗ = 0. For simplification
we assume rℓ = 0 for all ℓ ≥ 1.
• The error e0 is the initial rounding error which occurs when the boundary condition is
evaluated to derive û0. We analyse the effect of this error as we progress.
• By assumption rℓ = 0 ∀ℓ ≥ 1. Hence,
\[ \hat u_1 = A_E \hat u_0 + \underbrace{r_1}_{=0} = A_E \hat u_0, \qquad
\hat u_2 = A_E \hat u_1 + \underbrace{r_2}_{=0} = A_E \hat u_1 = A_E^{2} \hat u_0, \qquad \dots, \qquad
\hat u_\ell = A_E \hat u_{\ell-1} + \underbrace{r_\ell}_{=0} = A_E^{\ell} \hat u_0. \]
• Furthermore, $u_1 = A_E u_0, \dots, u_\ell = A_E^{\ell} u_0$. Hence,
\[ e_\ell = \hat u_\ell - u_\ell = A_E^{\ell} \hat u_0 - A_E^{\ell} u_0 = A_E^{\ell} (\hat u_0 - u_0) = A_E^{\ell} e_0. \]
• We call a method stable if the error is damped. We require that
\[ A_E^{\ell}\, e_0 \to 0 \quad \text{as } \ell \to \infty. \]
Hence, $\lim_{\ell\to\infty} (A_E^{\ell})_{ij} = 0$ for all pairs of indices (i, j). A small numerical
illustration of this damping (and of the blow-up when it fails) is sketched below.
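The following minimal Python sketch (the grid size, the number of steps and the size of the initial perturbation are illustrative assumptions) propagates a small initial error through eℓ+1 = AE eℓ and shows the contrast between damping and amplification:

import numpy as np

def error_growth(lam, m=20, steps=200):
    """Propagate a small random initial error e_0 through e_{l+1} = A_E e_l
    and return the final error norm."""
    main, off = np.ones(m - 1), np.ones(m - 2)
    A_E = np.diag((1 - 2 * lam) * main) + lam * (np.diag(off, 1) + np.diag(off, -1))
    e = 1e-8 * np.random.default_rng(0).standard_normal(m - 1)   # initial rounding error e_0
    for _ in range(steps):
        e = A_E @ e
    return np.linalg.norm(e)

print(error_growth(0.4))   # lambda <= 1/2: the error is damped
print(error_growth(0.6))   # lambda  > 1/2: the error blows up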
Lemma 75. Let µ1, . . . , µm−1 be the eigenvalues of A ∈ R(m−1)×(m−1) and ρ(A) = maxi |µi| be
the spectral radius of A. Then the following statements are equivalent:
\[ \rho(A) < 1
\quad\Longleftrightarrow\quad A^{\ell} z \to 0 \;\; \forall z \text{ as } \ell \to \infty
\quad\Longleftrightarrow\quad \lim_{\ell \to \infty} (A^{\ell})_{ij} = 0 \text{ for all pairs of indices } (i, j). \]
See (Seydel, 2009, Lemma 4.2, p. 149).
Lemma 76. Consider the matrix A ∈ Rn×n, where
\[
A = \begin{pmatrix}
\alpha & \beta & 0 & \cdots & 0 & 0\\
\gamma & \alpha & \beta & \cdots & 0 & 0\\
0 & \gamma & \alpha & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & \alpha & \beta\\
0 & 0 & 0 & \cdots & \gamma & \alpha
\end{pmatrix},
\]
for some α, β, γ ∈ R, β ≠ 0, βγ ≥ 0. Then the eigenvalues µk and the eigenvectors v(k) of A are
given by
\[ \mu_k = \alpha + 2\beta \sqrt{\frac{\gamma}{\beta}} \cos\left(\frac{k\pi}{n+1}\right), \]
\[ v^{(k)} = \left( \sqrt{\frac{\gamma}{\beta}} \sin\left(\frac{k\pi}{n+1}\right),\;
\left(\sqrt{\frac{\gamma}{\beta}}\right)^{2} \sin\left(\frac{2k\pi}{n+1}\right),\; \dots,\;
\left(\sqrt{\frac{\gamma}{\beta}}\right)^{n} \sin\left(\frac{nk\pi}{n+1}\right) \right)^{\top}, \]
for k = 1, . . . , n.
See (Seydel, 2009, Lemma 4.3, p. 149).
Proof. This result can be verified by checking that indeed $A v^{(k)} = \mu_k v^{(k)}$ for all k = 1, . . . , n.
• Using the previous lemma one finds that the eigenvalues of the matrix AE are given by
\[ \mu_k = 1 - 4\lambda \sin^2\left(\frac{k\pi}{2m}\right), \qquad k = 1, \dots, m-1, \]
see (Seydel, 2009, p. 150).
• Hence requiring that |µk| < 1 for all k leads to the condition for all k = 1, . . . ,m− 1 that
\[ -1 < 1 - 4\lambda \sin^2\left(\frac{k\pi}{2m}\right) < 1
\quad\Longleftrightarrow\quad -2 < -4\lambda \sin^2\left(\frac{k\pi}{2m}\right) < 0
\quad\Longleftrightarrow\quad \frac{1}{2} > \lambda \sin^2\left(\frac{k\pi}{2m}\right) > 0. \]
• Hence λ > 0, and for k = m− 1 we obtain (m−1)π/(2m) as the argument of sin. For large m this
argument tends to π/2 and hence sin²((m−1)π/(2m)) tends to 1.
When deriving the eigenvalues of AE, use Lemma 76 with α = 1− 2λ, β = λ, γ = λ (where
λ = ∆τ/(∆x)²) and use the fact that cos(2θ) = 1− 2 sin²(θ) for all θ (and set here θ = kπ/(2m)).
• Hence we see that the explicit method is stable for 0 < λ ≤ 1/2 and hence for
\[ 0 < \Delta\tau \leq \frac{(\Delta x)^2}{2}. \]
A numerical check of this threshold is sketched at the end of this section.
• The implicit finite difference scheme is computationally somewhat more expensive (though
not excessively so) than the explicit scheme, because it requires solving a linear system with
the matrix AI in every time step.
However, one can also derive the corresponding eigenvalues and one finds that the implicit
scheme is stable for all ∆τ > 0.
• The Crank-Nicolson method is also stable for all ∆τ > 0. Moreover, the order of convergence
is O((∆τ)²) + O((∆x)²) (rather than O(∆τ) + O((∆x)²) as in the explicit and implicit schemes).
For proofs see (Seydel, 2009, Theorem 4.4, p. 153).
• Reading: (Seydel, 2009, Section 4.2.4, 4.3).
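A quick numerical check of the eigenvalue formula and of the stability threshold λ ≤ 1/2 (a sketch; the grid size is an illustrative assumption):

import numpy as np

def spectral_radius_AE(lam, m=30):
    """Spectral radius of A_E: numerically via numpy, and via the closed-form
    eigenvalues mu_k = 1 - 4*lam*sin^2(k*pi/(2m)) obtained from Lemma 76."""
    main, off = np.ones(m - 1), np.ones(m - 2)
    A_E = np.diag((1 - 2 * lam) * main) + lam * (np.diag(off, 1) + np.diag(off, -1))
    rho_numeric = np.max(np.abs(np.linalg.eigvals(A_E)))
    k = np.arange(1, m)
    rho_formula = np.max(np.abs(1 - 4 * lam * np.sin(k * np.pi / (2 * m))**2))
    return rho_numeric, rho_formula

for lam in (0.3, 0.5, 0.7):
    rho_num, rho_form = spectral_radius_AE(lam)
    print(lam, rho_num, rho_form, "stable" if rho_form < 1 else "unstable")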
Bibliography
Asmussen, S. & Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Springer.
Flury, B. (1990). Acceptance-rejection sampling made easy. SIAM Review 32, 474–476.
Glasserman, P. (2004). Monte Carlo Methods in Financial Engineering. Springer.
Gut, A. (2005). Probability: A Graduate Course. Springer.
Jacod, J. & Protter, P. E. (2000). Probability Essentials. Springer.
Kallenberg, O. (2002). Foundations of Modern Probability. Springer.
Kloeden, P. & Platen, E. (1999). Numerical Solution of Stochastic Differential Equations. Springer.
Ross, S. N. (2006). Simulation. 4 ed., Elsevier Academic Press.
Seydel, R. U. (2009). Tools for Computational Finance. 4 ed., Springer.