Long-time dynamics
of stochastic differential equations
Nils Berglund
Institut Denis Poisson – UMR 7013
Universite d’Orleans, Universite de Tours, CNRS
Lecture notes
Summer School — From kinetic equations to statistical mechanics
Saint Jean de Monts, June–July 2021
— Version of June 25, 2021 —
arXiv:2106.12998v1 [math.PR] 24 Jun 2021

Contents
1 Stochastic Differential Equations and Partial Differential Equations
  1.1 Brownian motion
    1.1.1 Construction of Brownian motion
    1.1.2 Basic properties of Brownian motion
    1.1.3 Brownian motion and heat equation
  1.2 Ito calculus
    1.2.1 Ito's integral
    1.2.2 Ito's formula
    1.2.3 Stochastic differential equations
  1.3 Diffusions
    1.3.1 The Markov property
    1.3.2 Semigroups and generators
    1.3.3 Dynkin's formula
    1.3.4 Kolmogorov's equations
    1.3.5 The Feynman–Kac formula
2 Invariant measures for SDEs
  2.1 Existence of invariant probability measures
    2.1.1 Some basic examples
    2.1.2 The Krylov–Bogoliubov criterion
  2.2 The Lyapunov function approach by Meyn and Tweedie
    2.2.1 Non-explosion and Harris recurrence criteria
    2.2.2 Positive Harris recurrence and existence of an invariant probability
    2.2.3 Convergence to the invariant probability measure
    2.2.4 A simplified proof by Hairer and Mattingly
  2.3 Garrett Birkhoff's approach
    2.3.1 Two-dimensional case
    2.3.2 General vector space
    2.3.3 Integral transformations and Jentzsch's theorem
    2.3.4 Banach lattices and spectral gap
    2.3.5 From discrete time to continuous time
Bibliography
Preface
These lecture notes have been prepared for a series of lectures to be given at the Summer School
“From kinetic equations to statistical mechanics”, organised by the Henri Lebesgue Center in
Saint Jean de Monts, from June 28th to July 2nd 2021. This is a preliminary version of the notes,
which may still contain errors. A third chapter, concerning the theory of large deviations and
its applications to stochastic differential equations, has yet to be written.
The author wishes to thank the organisers of the Summer School, Frédéric Hérau (Univer-
sité de Nantes), Laurent Michel (Université de Bordeaux), and Karel Pravda-Starov (Université
de Rennes 1) for the kind invitation that provided the motivation to compile these lecture
notes.
Chapter 1
Stochastic Differential Equations and
Partial Differential Equations
1.1 Brownian motion
The fundamental building block of the theory of stochastic differential equations is a math-
ematical object called Wiener process, or Brownian motion. This should not be confused with
the physical phenomenon of Brownian motion, describing for instance the erratic movements
of a small particle in a fluid, though the mathematical model has of course been introduced
as a simplified description of the physical process. There is a huge literature on properties of
Brownian motion. In what follows, we will focus on only a few of these properties that will be
important for links between stochastic and partial differential equations.
1.1.1 Construction of Brownian motion
Heuristically, Brownian motion can be defined as a scaling limit of a random walk. Let {X_n}_{n≥0} be a symmetric random walk on Z, defined as

X_n = ∑_{i=1}^{n} ξ_i ,

where the ξ_i are i.i.d. (independent and identically distributed) random variables, taking values ±1 with probability 1/2. The following properties are easy to check:
1. Xn has zero expectation: E[Xn] = 0 for all n;
2. The variance of Xn satisfies Var(Xn) = n;
3. X_n takes values in {−n, −n+2, ..., n−2, n}, with

P{X_n = k} = (1/2^n) · n! / [((n+k)/2)! ((n−k)/2)!] .
4. Independent increments: for all n > m> 0, Xn −Xm is independent of X1, . . . ,Xm;
5. Stationary increments: for all n > m> 0, Xn −Xm has the same distribution as Xn−m.
Consider now the sequence of processes

W^{(n)}_t = (1/√n) X_{⌊nt⌋} , t ∈ R_+ , n ∈ N .

At stage n, time has been sped up by a factor n, while space has been compressed by a factor √n (Figure 1.1).
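This scaling is easy to observe numerically. The following sketch (in Python with NumPy; the function name and parameter choices are ours, not part of the notes) samples the rescaled endpoint W^{(n)}_1 = X_n/√n many times and checks that its mean and variance are close to 0 and 1:

```python
import numpy as np

def scaled_walk(n, t_max=1.0, rng=None):
    """Sample the rescaled random walk W^(n)_t = X_{floor(nt)} / sqrt(n) on [0, t_max]."""
    rng = np.random.default_rng(rng)
    steps = rng.choice([-1, 1], size=int(n * t_max))  # the i.i.d. signs xi_i
    x = np.concatenate(([0], np.cumsum(steps)))       # X_0, X_1, ..., X_{n t_max}
    return x / np.sqrt(n)                             # compress space by sqrt(n)

# Empirical check of E[W_1] = 0 and Var(W_1) = 1 for large n
n = 10_000
samples = np.array([scaled_walk(n, rng=k)[-1] for k in range(2000)])
print(samples.mean(), samples.var())  # both should be close to 0 and 1
```

By the central limit theorem, the empirical distribution of these endpoints also approaches N(0, 1), consistent with property 3 below.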
Figure 1.1 – Two realisations (one in red, the other one in blue) of a symmetric random walk on Z, seen at different scales. From one picture to the next, the horizontal scale is compressed by a factor 5, while the vertical scale is compressed by a factor √5.
Formally, as n → ∞, the processes {W^{(n)}_t}_{t≥0} should converge to a stochastic process {W_t}_{t≥0} satisfying the following properties.
1. E[W_t] = 0 for all t ≥ 0;
2. The variance of W_t satisfies

Var(W_t) = lim_{n→∞} (1/√n)² ⌊nt⌋ = t .

3. By the central limit theorem, X_{⌊nt⌋}/√⌊nt⌋ converges in distribution to a standard normal random variable. Therefore, for each t, W_t follows a normal law N(0, t).
4. Independent increments: for all t > s ≥ 0, W_t − W_s is independent of {W_u}_{0≤u≤s};
5. Stationary increments: for all t > s ≥ 0, W_t − W_s has the same distribution as W_{t−s}.
This motivates the following definition.
Definition 1.1.1: Brownian motion
Standard Brownian motion (also called the standard Wiener process) is the stochastic process
{Wt}t>0 satisfying:
1. W0 = 0;
2. Independent increments: for all t > s ≥ 0, W_t − W_s is independent of {W_u}_{u≤s};
3. Stationary increments: for all t > s ≥ 0, W_t − W_s follows a normal law N(0, t − s).
Theorem 1.1.2: Existence of Brownian motion
There exists a stochastic process {W_t}_{t≥0} satisfying Definition 1.1.1, and whose trajectories t ↦ W_t(ω) are continuous.
Proof:
Figure 1.2 – Construction of Brownian motion by interpolation.
1. We start by constructing {W_t}_{0≤t≤1} from a collection of independent Gaussian random variables V_1, V_{1/2}, V_{1/4}, V_{3/4}, V_{1/8}, ..., all with zero mean, where V_1 and V_{1/2} have variance 1 and each V_{k2^{−n}} has variance 2^{−(n−1)} (k < 2^n odd).
We first show that if X_s and X_t are two random variables such that X_t − X_s is centred, Gaussian with variance t − s, then there exists a random variable X_{(t+s)/2} such that the random variables X_t − X_{(t+s)/2} and X_{(t+s)/2} − X_s are i.i.d. with law N(0, (t − s)/2). If U = X_t − X_s and V is independent of U, with the same distribution, it suffices to define X_{(t+s)/2} by

X_t − X_{(t+s)/2} = (U + V)/2 ,
X_{(t+s)/2} − X_s = (U − V)/2 . (1.1.1)

Indeed, it is easy to check that these variables have the required distributions, and that they are independent, since E[(U + V)(U − V)] = E[U²] − E[V²] = 0, and jointly Gaussian random variables are independent if and only if they are uncorrelated.
Let us set X_0 = 0, X_1 = V_1, and construct X_{1/2} with the above procedure, taking V = V_{1/2}. Then we construct X_{1/4} with the help of X_0, X_{1/2} and V_{1/4}, and so on, to obtain a family of variables {X_t}_{t=k2^{−n}, n≥1, k<2^n} such that for t > s, X_t − X_s is independent of X_s and has distribution N(0, t − s).
2. For n ≥ 0, let {W^{(n)}_t}_{0≤t≤1} be the stochastic process with piecewise linear trajectories on intervals [k2^{−n}, (k+1)2^{−n}], k < 2^n, and such that W^{(n)}_{k2^{−n}} = X_{k2^{−n}} (Figure 1.2). We want to show that the sequence W^{(n)}(ω) converges uniformly on [0,1] for any realisation ω of the V_i. We thus have to estimate

∆^{(n)}(ω) = sup_{0≤t≤1} |W^{(n+1)}_t(ω) − W^{(n)}_t(ω)|
= max_{0≤k≤2^n−1} max_{k2^{−n}≤t≤(k+1)2^{−n}} |W^{(n+1)}_t(ω) − W^{(n)}_t(ω)|
= max_{0≤k≤2^n−1} |X_{(2k+1)2^{−(n+1)}}(ω) − ½(X_{k2^{−n}}(ω) + X_{(k+1)2^{−n}}(ω))|

(see Figure 1.3). The term in the absolute value is ½ V_{(2k+1)2^{−(n+1)}} by construction, cf. (1.1.1),
Figure 1.3 – Computation of ∆(n).
which is Gaussian with variance 2^{−n}. Therefore,

P{∆^{(n)} > √(n2^{−n})} = P{ max_{0≤k≤2^n−1} |V_{(2k+1)2^{−(n+1)}}| > 2√(n2^{−n}) }
≤ 2 · 2^n ∫_{2√(n2^{−n})}^{∞} e^{−x²/(2·2^{−n})} dx/√(2π·2^{−n})
= 2 · 2^n ∫_{2√n}^{∞} e^{−y²/2} dy/√(2π)
≤ const 2^n e^{−2n} ,

and thus

∑_{n≥0} P{∆^{(n)} > √(n2^{−n})} ≤ const ∑_{n≥0} (2e^{−2})^n < ∞ .

The Borel–Cantelli lemma shows that with probability 1, there exist only finitely many n for which ∆^{(n)} > √(n2^{−n}). It follows that

P{ ∑_{n≥0} ∆^{(n)} < ∞ } = 1 .

The sequence {W^{(n)}_t}_{0≤t≤1} is thus a Cauchy sequence for the sup norm with probability 1, and therefore converges uniformly. For t ∈ [0,1] we set

W^0_t = lim_{n→∞} W^{(n)}_t if the sequence converges uniformly, and W^0_t = 0 otherwise (an event of probability 0).
It is easy to check that W^0 satisfies the three properties of the definition.
3. To extend the process to all times, we build independent copies {W^i}_{i≥0} and set

W_t = W^0_t for 0 ≤ t < 1 ,
W_t = W^0_1 + W^1_{t−1} for 1 ≤ t < 2 ,
W_t = W^0_1 + W^1_1 + W^2_{t−2} for 2 ≤ t < 3 ,

and so on. This concludes the proof.
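The dyadic interpolation of step 1 translates directly into a few lines of code. The sketch below (an illustration of the construction, with our own function name and conventions) refines a path level by level, inserting at each midpoint an independent Gaussian with the variance prescribed by (1.1.1):

```python
import numpy as np

def levy_brownian(levels, rng=None):
    """Dyadic construction of Brownian motion on [0, 1].

    Refines a path on the grid k 2^-n by inserting midpoints: each new value
    is the average of its two neighbours plus an independent centred Gaussian
    whose variance is a quarter of the current grid spacing, as in (1.1.1).
    """
    rng = np.random.default_rng(rng)
    w = np.array([0.0, rng.normal(0.0, 1.0)])  # W_0 = 0, W_1 ~ N(0, 1)
    for n in range(levels):
        dt = 2.0 ** -(n + 1)                   # spacing of the refined grid
        mid = 0.5 * (w[:-1] + w[1:]) + rng.normal(0.0, np.sqrt(dt / 2), size=len(w) - 1)
        out = np.empty(2 * len(w) - 1)
        out[0::2], out[1::2] = w, mid          # interleave old points and midpoints
        w = out
    return w                                   # values at k 2^-levels, k = 0 .. 2^levels

path = levy_brownian(10, rng=0)
print(len(path))  # 2**10 + 1 = 1025 grid points
```

The squared increments of the resulting path average to the grid spacing, as the independent-increments property requires.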
Remark 1.1.3: n-dimensional Brownian motion
For any n ∈ N, one can define n-dimensional Brownian motion in the same way as in Defini-
tion 1.1.1, except that the normal laws are n-dimensional. Its components are then simply
independent 1-dimensional Brownian motions.
1.1.2 Basic properties of Brownian motion
The following basic properties of Brownian motion follow more or less immediately from Def-
inition 1.1.1.
1. Markov property: For any Borel set A ⊂ R,

P{W_{t+s} ∈ A | W_t = x} = ∫_A p(t + s, y | t, x) dy ,

independently of {W_u}_{u<t}, where the transition density is given by

p(t + s, y | t, x) = e^{−(y−x)²/2s} / √(2πs) . (1.1.2)
The proof follows directly from the decomposition W_{t+s} = W_t + (W_{t+s} − W_t), where the second term is independent of the first one, with distribution N(0, s). In particular, one checks the Chapman–Kolmogorov equation: For t > u > s,

p(t, y | s, x) = ∫_R p(t, y | u, z) p(u, z | s, x) dz . (1.1.3)

2. Differential property: For all t ≥ 0, {W_{t+s} − W_t}_{s≥0} is a standard Brownian motion, independent of {W_u}_{u≤t}.
3. Scaling property: For all c > 0, {cW_{t/c²}}_{t≥0} is a standard Brownian motion.
4. Symmetry: {−Wt}t>0 is a standard Brownian motion.
5. Gaussian process: The Wiener process is Gaussian with zero mean (meaning that its finite-dimensional joint distributions P{W_{t_1} ≤ x_1, ..., W_{t_n} ≤ x_n} are centred normal), and characterised by its covariance

Cov{W_t, W_s} ≡ E[W_t W_s] = s ∧ t (1.1.4)

(where s ∧ t denotes the minimum of s and t).
Proof: For s < t, we have

E[W_t W_s] = E[W_s(W_s + W_t − W_s)] = E[W_s²] + E[W_s(W_t − W_s)] = s ,

since the second term vanishes by the independent increments property.
In fact, one can show that a centred Gaussian process whose covariance satisfies (1.1.4) is a standard Wiener process.
One important consequence of the scaling and independent increments properties is then
the following.
Theorem 1.1.4: Non-differentiability of Brownian paths
The paths t ↦ W_t(ω) are almost surely nowhere Lipschitz continuous, and thus nowhere differentiable.
Proof: Fix C < ∞ and introduce, for n ≥ 1, the event

A_n = {ω : ∃s ∈ [0,1] s.t. |W_t(ω) − W_s(ω)| ≤ C|t − s| if |t − s| ≤ 3/n} .

We have to show that P(A_n) = 0 for all n. Observe that if n increases, the condition gets weaker, so that A_n ⊂ A_{n+1}. For n ≥ 3 and 1 ≤ k ≤ n − 2, define

Y_{k,n}(ω) = max_{j=0,1,2} |W_{(k+j)/n}(ω) − W_{(k+j−1)/n}(ω)| ,
B_n = ⋃_{k=1}^{n−2} {ω : Y_{k,n}(ω) ≤ 5C/n} .

The triangle inequality implies A_n ⊂ B_n. Indeed, let ω ∈ A_n. If for instance s = 1, then for k = n − 2, one has

|W_{(n−3)/n}(ω) − W_{(n−2)/n}(ω)| ≤ |W_{(n−3)/n}(ω) − W_1(ω)| + |W_1(ω) − W_{(n−2)/n}(ω)| ≤ C(3/n + 2/n) ,

and thus ω ∈ B_n. It follows from the independent increments and scaling properties that

P(A_n) ≤ P(B_n) ≤ n P(|W_{1/n}| ≤ 5C/n)³ = n P(|W_1| ≤ 5C/√n)³ ≤ n (10C/√(2πn))³ .

Therefore P(A_n) → 0 as n → ∞. But since P(A_n) ≤ P(A_{n+1}) for all n, this implies P(A_n) = 0 for all n.
Remark 1.1.5: Hölder regularity of Brownian paths
Even though paths of Brownian motion are nowhere differentiable, one can show that they
do have a regularity that is better than continuity: namely, the paths are almost surely (locally) Hölder continuous of exponent α for any α < 1/2. This can be shown by applying the Kolmogorov–Chentsov continuity criterion.
1.1.3 Brownian motion and heat equation
Observe that the Gaussian transition probabilities (1.1.2) of the Wiener process are, up to a scaling, equal to the heat kernel. In particular, p(t, x | 0, 0) satisfies the heat equation

∂/∂t p(t, x | 0, 0) = ½ ∆ p(t, x | 0, 0) ,
p(0, x | 0, 0) = δ(x) ,

where we write ∆ for the second derivative with respect to x. This reflects the fact that paths of Brownian motion have the same diffusive behaviour as solutions of the heat equation.
Similarly, transition probabilities of n-dimensional Brownian motion are given by

p(t + s, y | t, x) = e^{−‖y−x‖²/2s} / (2πs)^{n/2} ,

and therefore satisfy the n-dimensional heat equation.
It is, however, important to realise that Brownian motion contains much more information than the solutions of the heat equation, since it gives a probability distribution on paths t ↦ W_t(ω), rather than just a collection of probability distributions for the W_t(ω) with t > 0. To illustrate the difference, we discuss two examples of modifications of Brownian motion.
Figure 1.4 – Brownian motion reflected at level H .
Example 1.1.6: Reflected Brownian motion
Denote by W^r_t Brownian motion reflected on the line x = H in (t,x)-space, for a constant H > 0. For any x ≤ H we can write

P{W^r_t ≤ x} = P{W_t ≤ x} + P{W_t ≥ 2H − x} .

Indeed, at least heuristically, if W_t(ω) ≤ x, then one obtains a reflected path of Brownian motion from an original path simply by reflecting all parts of the path that lie above the line x = H. If W_t(ω) ≥ 2H − x, reflecting all parts above the line also yields a reflected path, ending up at 2H − W_t(ω) ≤ x. We thus have

P{W^r_t ≤ x} = Φ(x/√t) + Φ((x − 2H)/√t) ,

where

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y²/2} dy

denotes the distribution function of the standard normal law. Note in particular that since Φ(−x) = 1 − Φ(x), one has P{W^r_t ≤ H} = 1 for all t ≥ 0, as it should be. Taking the derivative with respect to x, we obtain the density

p^r(t, x) = (1/√(2πt)) ( e^{−x²/(2t)} + e^{−(2H−x)²/(2t)} ) ,

which solves

∂/∂t p^r(t, x) = ½ ∆ p^r(t, x) , x ≤ H ,
∂/∂x p^r(t, H) = 0 ,

that is, the heat equation with Neumann boundary conditions.
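The formula P{W^r_t ≤ x} = Φ(x/√t) + Φ((x − 2H)/√t) can be checked by Monte Carlo. Since folding a value about the line x = H (the map w ↦ H − |H − w|) realises the reflected endpoint, no path simulation is needed; only the marginal of W_t enters. A minimal sketch (parameter values are ours):

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

H, t, x = 1.0, 2.0, 0.5
rng = np.random.default_rng(42)
w = rng.normal(0.0, np.sqrt(t), size=200_000)  # W_t ~ N(0, t)
w_reflected = H - np.abs(H - w)                # fold the part above x = H downwards

empirical = np.mean(w_reflected <= x)
exact = Phi(x / np.sqrt(t)) + Phi((x - 2 * H) / np.sqrt(t))
print(empirical, exact)  # should agree to roughly three decimals
```

The folded samples never exceed H, mirroring the fact that P{W^r_t ≤ H} = 1.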
To discuss our second example, we will need the so-called reflection principle, which we give
here without proof.
Figure 1.5 – Reflection principle.
Proposition 1.1.7: André's reflection principle
For any H > 0 and setting τ = inf{t ≥ 0 : W_t ≥ H}, the process

W*_t = W_t if t ≤ τ , and W*_t = 2H − W_t if t > τ ,

is a standard Brownian motion.
Example 1.1.8: Brownian motion killed upon reaching level H
Denote by W^k_t Brownian motion killed at level H > 0, which is defined by

P{W^k_t ≤ x} = P{W_t ≤ x, τ > t}

for any x ≤ H, where τ = inf{t ≥ 0 : W_t ≥ H} is as in Proposition 1.1.7. Note that W^k_t is an improper random variable for any t > 0, in the sense that the total probability is strictly smaller than 1. Since paths of Brownian motion are continuous, we can write for any y ≥ H

P{W_t > y} = P{W_t > y, τ ≤ t}
= P{2H − W_t < 2H − y, τ ≤ t}
= P{W*_t < 2H − y, τ ≤ t}
= P{W_t ≤ 2H − y, τ ≤ t} ,

where the last step uses Proposition 1.1.7 (and the fact that W_t has a density, so boundary cases have probability zero). Setting y = 2H − x, this provides us with an expression for P{W_t ≤ x, τ ≤ t} that yields

P{W^k_t ≤ x} = P{W_t ≤ x} − P{W_t ≤ x, τ ≤ t}
= P{W_t ≤ x} − P{W_t ≥ 2H − x} .

This gives the density

p^k(t, x) = (1/√(2πt)) ( e^{−x²/(2t)} − e^{−(2H−x)²/(2t)} ) ,

which solves

∂/∂t p^k(t, x) = ½ ∆ p^k(t, x) , x ≤ H ,
p^k(t, H) = 0 ,

that is, the heat equation with Dirichlet boundary conditions.
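In contrast with the reflected case, the killed law genuinely depends on the whole path through τ, so a Monte Carlo check needs discretised paths. The sketch below (our parameter choices) estimates P{W_t ≤ x, τ > t} and compares it with Φ(x/√t) − Φ((x − 2H)/√t); discrete monitoring slightly under-detects barrier crossings, hence the loose tolerance:

```python
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

H, t, x = 1.0, 1.0, 0.5
n_steps, n_paths = 500, 20_000
rng = np.random.default_rng(1)

dW = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)                     # discretised Brownian paths from 0
alive = W.max(axis=1) < H                     # tau > t, up to discretisation bias
empirical = np.mean(alive & (W[:, -1] <= x))  # P{W_t <= x, tau > t}
exact = Phi(x / np.sqrt(t)) - Phi((x - 2 * H) / np.sqrt(t))
print(empirical, exact)
```

A Brownian-bridge correction between grid points would remove most of the remaining bias, but is not needed for a qualitative check.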
Exercise 1.1.9: 2-dimensional Brownian motion hitting a straight line
Let W_t = (W^{(1)}_t, W^{(2)}_t) be a standard 2-dimensional Brownian motion, and let

τ = inf{t > 0 : W^{(1)}_t = 1}

be the first-hitting time of the line {x = 1}. Determine the density of τ, and use it to compute the distribution of W^{(2)}_τ.
1.2 Ito calculus
While Brownian motion is a useful model with many interesting properties, one may have to deal with more general processes, such as functions of Brownian motion. The question then arises of which stochastic or partial differential equations are associated with these processes.
Example 1.2.1: Brownian motion squared
Consider Brownian motion squared. Since

P{W_t² ≤ x} = P{|W_t| ≤ √x} = Φ(√(x/t)) − Φ(−√(x/t)) ,

we obtain that the density of W_t² is given by

2 ∂/∂x Φ(√(x/t)) = (1/√(2πtx)) e^{−x/(2t)} .

This function should solve some PDE, which, however, is not straightforward to guess. In this section, we will develop methods that will ultimately allow us to determine associated PDEs quite easily.
1.2.1 Ito’s integral
The key notion of stochastic calculus is the Ito integral, which allows one to give a meaning to the quantity

∫_0^t f(s) dW_s

for suitable, possibly random functions f. Since the Wiener process is not Lipschitz continuous, it does not have bounded variation, so that the above integral cannot be defined as a Riemann–Stieltjes integral. There are nowadays several (essentially equivalent) ways of defining the above integral, the oldest one going back to Kiyoshi Ito.
Fix a Brownian motion {W_t}_{t≥0}. The random variables {W_s}_{0≤s≤t} generate an increasing family of σ-algebras {F_t}_{t≥0} (called a filtration) that will play a key role in what follows. In particular, we will use the notion of random variables that are measurable with respect to a given F_t. Intuitively, these are exactly the random variables that depend only on the behaviour of the Wiener process up to time t.
Definition 1.2.2: Ito integral of elementary functions
Fix a time interval [0,T]. A random function {e_t}_{t∈[0,T]} is called simple or elementary if there exists a partition 0 = t_0 < t_1 < · · · < t_N = T of [0,T] such that

e_t = ∑_{k=1}^{N} e_{t_{k−1}} 1_{[t_{k−1}, t_k)}(t) .

It is called adapted to the filtration {F_t}_{t≥0} if each e_{t_{k−1}} is a random variable measurable with respect to F_{t_{k−1}}. For such an elementary function, Ito's integral is defined as

∫_0^t e_s dW_s = ∑_{k=1}^{m} e_{t_{k−1}} [W_{t_k} − W_{t_{k−1}}] + e_{t_m} [W_t − W_{t_m}] (1.2.1)

for any t ∈ [0,T], where m is such that t ∈ [t_m, t_{m+1}).
One easily sees that this integral is a linear functional of the integrand, and is additive with respect to time intervals. Furthermore, since each increment W_{t_k} − W_{t_{k−1}} is independent of e_{t_{k−1}}, the integral has zero expectation. The key property is then the following.
Lemma 1.2.3: Ito isometry
If ∫_0^t E[e_s²] ds < ∞, then

E[ ( ∫_0^t e_s dW_s )² ] = ∫_0^t E[e_s²] ds . (1.2.2)
Proof: Set t_{m+1} = t. Then

E[ ( ∫_0^t e_s dW_s )² ] = E[ ∑_{k,l=1}^{m+1} e_{t_{k−1}} e_{t_{l−1}} (W_{t_k} − W_{t_{k−1}})(W_{t_l} − W_{t_{l−1}}) ]
= ∑_{k=1}^{m+1} E[e_{t_{k−1}}²] E[(W_{t_k} − W_{t_{k−1}})²]
= ∑_{k=1}^{m+1} E[e_{t_{k−1}}²] (t_k − t_{k−1})
= ∫_0^t E[e_s²] ds .

We have used the property of independent increments to eliminate the terms k ≠ l from the double sum, and the fact that each e_s is measurable with respect to F_s.
Ito's isometry is an isometry between the Hilbert space L²_ad([0,T] × Ω, P) of adapted square-integrable processes and the Hilbert space L²(Ω, P) of square-integrable random variables. For a general square-integrable adapted process (X_t)_{t≥0}, one can find a sequence of elementary functions e^{(n)} such that

lim_{n→∞} ∫_0^T E[(X_s − e^{(n)}_s)²] ds = 0 .

The isometry (1.2.2) then shows that for any t ∈ [0,T], the following limit exists in L²(P):

lim_{n→∞} ∫_0^t e^{(n)}_s dW_s =: ∫_0^t X_s dW_s .

This is by definition the Ito integral of X_s against W_s. This integral has the same linearity and additivity properties as integrals of elementary functions, and also satisfies Ito's isometry

E[ ( ∫_0^t X_s dW_s )² ] = ∫_0^t E[X_s²] ds .
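Ito's isometry is easy to illustrate for a deterministic integrand: for X_s = s, the variance of ∫_0^t s dW_s should equal ∫_0^t s² ds = t³/3. The sketch below (our names and parameter choices) approximates the integral by left-endpoint Riemann sums, exactly as in (1.2.1):

```python
import numpy as np

def ito_integral(integrand_vals, dW):
    """Left-endpoint (Ito) Riemann sum: sum_k X_{t_{k-1}} (W_{t_k} - W_{t_{k-1}})."""
    return np.sum(integrand_vals * dW, axis=-1)

t, n_steps, n_samples = 1.0, 500, 20_000
dt = t / n_steps
rng = np.random.default_rng(7)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, n_steps))
s = np.arange(n_steps) * dt    # left endpoints t_{k-1}
I = ito_integral(s, dW)        # approximates int_0^t s dW_s

print(I.mean(), I.var())       # close to 0 and t^3/3 = 1/3
```

The left-endpoint evaluation is what makes the sum adapted; a right-endpoint or midpoint rule would approximate a different (non-Ito) integral for random integrands.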
1.2.2 Ito’s formula
Ito's formula gives a simple answer to a question raised above, namely what kind of differential relation governs functions of Brownian motion. We start with a simple example, which however contains all the essential ideas of the general case.
Example 1.2.4
Let us show that

∫_0^t W_s dW_s = ½ W_t² − t/2 . (1.2.3)

Define the sequence of elementary functions e^{(n)}_t = W_{2^{−n}⌊2^n t⌋}. It is then sufficient to check that

lim_{n→∞} ∫_0^t e^{(n)}_s dW_s = ½ W_t² − t/2 .

Write t_k = k2^{−n} for k ≤ m = ⌊2^n t⌋ and t_{m+1} = t. The definition (1.2.1) implies

2 ∫_0^t e^{(n)}_s dW_s = 2 ∑_{k=1}^{m+1} W_{t_{k−1}} (W_{t_k} − W_{t_{k−1}})
= ∑_{k=1}^{m+1} [ W_{t_k}² − W_{t_{k−1}}² − (W_{t_k} − W_{t_{k−1}})² ]
= W_t² − ∑_{k=1}^{m+1} (W_{t_k} − W_{t_{k−1}})² .

Consider now the random variable

M^{(n)}_t = ∑_{k=1}^{m+1} (W_{t_k} − W_{t_{k−1}})² − t = ∑_{k=1}^{m+1} [ (W_{t_k} − W_{t_{k−1}})² − (t_k − t_{k−1}) ] . (1.2.4)

Since all terms of the sum are independent and have zero expectation, we obtain

E[(M^{(n)}_t)²] = ∑_{k=1}^{m+1} E[ [(W_{t_k} − W_{t_{k−1}})² − (t_k − t_{k−1})]² ]
≤ (m + 1) E[ [(W_{t_1} − W_{t_0})² − (t_1 − t_0)]² ]
≤ const 2^n E[ [(W_{2^{−n}})² − 2^{−n}]² ]
= const 2^{−n} E[ [(W_1)² − 1]² ]
≤ const 2^{−n} ,

owing to the scaling property. Therefore, M^{(n)}_t converges to zero in L², proving (1.2.3).
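Relation (1.2.3) can be tested numerically: left-point Riemann sums of W against dW should approach (W_t² − t)/2 path by path, the discrepancy being exactly −½ M^{(n)}_t. A sketch (our parameter choices):

```python
import numpy as np

t, n_steps, n_paths = 1.0, 2000, 2000
dt = t / n_steps
rng = np.random.default_rng(3)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.concatenate([np.zeros((n_paths, 1)), W[:, :-1]], axis=1)  # W_{t_{k-1}}

ito = np.sum(W_left * dW, axis=1)      # left-point (Ito) Riemann sums
exact = 0.5 * W[:, -1] ** 2 - 0.5 * t  # (W_t^2 - t)/2
print(np.max(np.abs(ito - exact)))     # small: the error is -(1/2) M^(n)_t
```

The maximal pathwise error shrinks like 2^{−n/2}, in line with the L² estimate E[(M^{(n)}_t)²] ≤ const 2^{−n} above.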
Remark 1.2.5
Using more sophisticated tools from stochastic analysis, it is possible to prove a stronger type of convergence. Indeed, M^{(n)}_t is what is known as a submartingale, for which Doob's inequality yields

P{ sup_{0≤s≤t} (M^{(n)}_s)² > n² 2^{−n} } ≤ 2^n n^{−2} E[(M^{(n)}_t)²] ≤ const n^{−2} .

The Borel–Cantelli lemma then shows that

P{ sup_{0≤s≤t} |M^{(n)}_s| < n 2^{−n/2} for all sufficiently large n } = 1 ,

proving almost sure convergence.
Consider now a stochastic integral of the form

X_t = X_0 + ∫_0^t f_s ds + ∫_0^t g_s dW_s , t ∈ [0,T] , (1.2.5)

where X_0 is a random variable independent of the Brownian motion, and f and g are two adapted processes satisfying

P{ ∫_0^T |f_s| ds < ∞ } = 1 , P{ ∫_0^T g_s² ds < ∞ } = 1 .

The process (1.2.5) can also be written in differential form as

dX_t = f_t dt + g_t dW_t .

For instance, Relation (1.2.3) is equivalent to

d(W_t²) = dt + 2W_t dW_t . (1.2.6)
Ito's formula determines, in a general way, the effect of a change of variables on the stochastic integral (1.2.5).
Lemma 1.2.6: Ito’s formula
Let u : [0,∞) × R → R, (t,x) ↦ u(t,x) be continuously differentiable with respect to t and twice continuously differentiable with respect to x. Then the stochastic process Y_t = u(t, X_t) satisfies the equation

Y_t = Y_0 + ∫_0^t (∂u/∂t)(s, X_s) ds + ∫_0^t (∂u/∂x)(s, X_s) f_s ds + ∫_0^t (∂u/∂x)(s, X_s) g_s dW_s
+ ½ ∫_0^t (∂²u/∂x²)(s, X_s) g_s² ds .
Proof: It suffices to prove the result for elementary integrands, and by additivity of the integrals, one can reduce the problem to the case of constant integrands. In that case, X_t = f_0 t + g_0 W_t and Y_t = u(t, f_0 t + g_0 W_t) can be expressed as functions of (t, W_t). It thus suffices to consider the case X_t = W_t. Now for a partition 0 = t_0 < t_1 < · · · < t_n = t, one has

u(t, W_t) − u(0, 0) = ∑_{k=1}^{n} [u(t_k, W_{t_k}) − u(t_{k−1}, W_{t_k})] + [u(t_{k−1}, W_{t_k}) − u(t_{k−1}, W_{t_{k−1}})]
= ∑_{k=1}^{n} (∂u/∂t)(t_{k−1}, W_{t_k})(t_k − t_{k−1}) + (∂u/∂x)(t_{k−1}, W_{t_{k−1}})(W_{t_k} − W_{t_{k−1}})
+ ½ (∂²u/∂x²)(t_{k−1}, W_{t_{k−1}})(W_{t_k} − W_{t_{k−1}})² + O(t_k − t_{k−1}) + O((W_{t_k} − W_{t_{k−1}})²)
= ∫_0^t (∂u/∂t)(s, W_s) ds + ∫_0^t (∂u/∂x)(s, W_s) dW_s + ½ ∫_0^t (∂²u/∂x²)(s, W_s) ds
+ ∑_{k=1}^{n} ½ (∂²u/∂x²)(t_{k−1}, W_{t_{k−1}}) [(W_{t_k} − W_{t_{k−1}})² − (t_k − t_{k−1})] + o(1) .

The sum can be dealt with as M^{(n)}_t in the above example when t_k − t_{k−1} → 0, cf. (1.2.4).
Remark 1.2.7
1. Ito's formula can be written in differential form as

dY_t = (∂u/∂t)(t, X_t) dt + (∂u/∂x)(t, X_t) [f_t dt + g_t dW_t] + ½ (∂²u/∂x²)(t, X_t) g_t² dt .

2. A mnemotechnic way to recover the formula is to write it in the form

dY_t = (∂u/∂t) dt + (∂u/∂x) dX_t + ½ (∂²u/∂x²) dX_t² ,

where dX_t² can be computed using the rules

dt² = dt dW_t = 0 , dW_t² = dt .

3. The formula can be generalised to functions u(t, X^{(1)}_t, ..., X^{(n)}_t), depending on n processes defined by dX^{(i)}_t = f^{(i)}_t dt + g^{(i)}_t dW_t, to

dY_t = (∂u/∂t) dt + ∑_i (∂u/∂x_i) dX^{(i)}_t + ½ ∑_{i,j} (∂²u/∂x_i∂x_j) dX^{(i)}_t dX^{(j)}_t ,

where dX^{(i)}_t dX^{(j)}_t = g^{(i)}_t g^{(j)}_t dt.
Example 1.2.8
1. If X_t = W_t and u(x) = x², one recovers Relation (1.2.6).
2. If dX_t = g_t dW_t − ½ g_t² dt and u(x) = e^x, one obtains

d(e^{X_t}) = g_t e^{X_t} dW_t .

Therefore, M_t = exp{γW_t − γ²t/2} solves the equation

dM_t = γ M_t dW_t .
Exercise 1.2.9: Ornstein–Uhlenbeck process
Consider the two stochastic processes

X_t = ∫_0^t e^s dW_s , Y_t = e^{−t} X_t .

1. Determine E[X_t], Var(X_t), E[Y_t] and Var(Y_t).
2. Specify the law of X_t and Y_t.
3. Show that Y_t converges in distribution to a random variable Y_∞ as t → ∞, and specify its law.
4. Express dY_t as a function of Y_t and W_t.
Exercise 1.2.10: Stratonovich integral
Let {W_t}_{t∈[0,T]} be a standard Brownian motion. Let 0 = t_0 < t_1 < · · · < t_N = T be a partition of [0,T], and let

e_t = ∑_{k=1}^{N} e_{t_{k−1}} 1_{[t_{k−1}, t_k)}(t)

be an elementary function, adapted to the canonical filtration of Brownian motion. The Stratonovich integral of e_t is defined by

∫_0^T e_t ◦ dW_t = ∑_{k=1}^{N} ((e_{t_k} + e_{t_{k−1}})/2) ∆W_k , where ∆W_k = W_{t_k} − W_{t_{k−1}} .

The Stratonovich integral

∫_0^T X_t ◦ dW_t

of an adapted process X_t is defined as the limit of the sequence

∫_0^T e^{(n)}_t ◦ dW_t ,

where e^{(n)} is a sequence of elementary functions converging to X_t in L². Assume that this limit exists and is independent of the sequence e^{(n)}.

1. Compute

∫_0^T W_t ◦ dW_t .

2. Let g : R → R be a C² function, and let X_t be an adapted process satisfying

X_t = ∫_0^t g(X_s) ◦ dW_s for all t ∈ [0,T] .

Let Y_t be the Ito integral

Y_t = ∫_0^t g(X_s) dW_s .

Show that

X_t − Y_t = ½ ∫_0^t g′(X_s) g(X_s) ds for all t ∈ [0,T] .
1.2.3 Stochastic differential equations
A stochastic differential equation (SDE) is an equation of the form

dX_t = f(X_t, t) dt + g(X_t, t) dW_t , (1.2.7)

where f, g : R × [0,T] → R are deterministic measurable functions. A strong solution of this equation is by definition an adapted process satisfying

X_t = X_0 + ∫_0^t f(X_s, s) ds + ∫_0^t g(X_s, s) dW_s (1.2.8)

almost surely for all t ∈ [0,T], as well as the regularity conditions

P{ ∫_0^T |f(X_s, s)| ds < ∞ } = P{ ∫_0^T g(X_s, s)² ds < ∞ } = 1 .
Here are two important examples of solvable SDEs.
Example 1.2.11: Linear SDE with additive noise
Consider the linear SDE with additive noise

dX_t = a(t) X_t dt + σ(t) dW_t , (1.2.9)

where a and σ are deterministic functions. In the particular case σ ≡ 0, the solution can be simply written

X_t = e^{α(t)} X_0 , α(t) = ∫_0^t a(s) ds .

This suggests applying the method of variation of the constant, that is, looking for a solution of the form X_t = e^{α(t)} Y_t. Ito's formula applied to Y_t = u(X_t, t) = e^{−α(t)} X_t gives us

dY_t = −a(t) e^{−α(t)} X_t dt + e^{−α(t)} dX_t = e^{−α(t)} σ(t) dW_t ,

so that integrating and using Y_0 = X_0, one gets

Y_t = X_0 + ∫_0^t e^{−α(s)} σ(s) dW_s .

This finally gives the strong solution of equation (1.2.9):

X_t = X_0 e^{α(t)} + ∫_0^t e^{α(t)−α(s)} σ(s) dW_s .

One checks that this process indeed solves (1.2.8) by applying Ito's formula once again. Note in particular that if the initial condition X_0 is deterministic, then X_t follows a normal law, with expectation E[X_t] = X_0 e^{α(t)} and variance

Var(X_t) = ∫_0^t e^{2(α(t)−α(s))} σ(s)² ds ,

as a consequence of Ito's isometry.
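For the special case a ≡ −1, σ ≡ 1 (an Ornstein–Uhlenbeck process, cf. Exercise 1.2.9), the strong solution reads X_t = X_0 e^{−t} + ∫_0^t e^{−(t−s)} dW_s, so X_t is Gaussian with mean X_0 e^{−t} and variance (1 − e^{−2t})/2. The sketch below (our parameter choices) discretises the Wiener integral, which is legitimate here since the integrand is deterministic, and checks both moments:

```python
import numpy as np

# Ornstein-Uhlenbeck case a(t) = -1, sigma(t) = 1 of the linear SDE:
# X_t = X_0 e^{-t} + int_0^t e^{-(t-s)} dW_s  ~  N(X_0 e^{-t}, (1 - e^{-2t})/2)
t, x0, n_steps, n_paths = 2.0, 1.0, 500, 20_000
dt = t / n_steps
rng = np.random.default_rng(11)
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
s = (np.arange(n_steps) + 0.5) * dt  # midpoints (fine for a deterministic integrand)
X = x0 * np.exp(-t) + np.sum(np.exp(-(t - s)) * dW, axis=1)

print(X.mean(), X.var())
# theory: mean = e^{-2}, variance = (1 - e^{-4})/2
```

As t → ∞, the variance tends to 1/2, the variance of the invariant law of this process.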
Example 1.2.12: Linear SDE with multiplicative noise
Consider the linear SDE with multiplicative noise

dX_t = a(t) X_t dt + σ(t) X_t dW_t ,

with again a and σ deterministic functions. We can then write

dX_t / X_t = a(t) dt + σ(t) dW_t .

Integrating the left-hand side, one should get log(X_t), but is this compatible with Ito calculus? To check this, set Y_t = u(X_t) = log(X_t). Then Ito's formula gives

dY_t = (1/X_t) dX_t − (1/(2X_t²)) dX_t²
= a(t) dt + σ(t) dW_t − ½ σ(t)² dt .

Integrating and taking the exponential, one obtains the strong solution

X_t = X_0 exp{ ∫_0^t [a(s) − ½ σ(s)²] ds + ∫_0^t σ(s) dW_s } .

In particular, if a ≡ 0 and σ ≡ γ, one recovers X_t = X_0 exp{γW_t − γ²t/2}, which is called geometric (or exponential) Brownian motion.
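The explicit solution also provides a benchmark for numerical schemes. Below, the Euler–Maruyama scheme (a standard discretisation, not discussed in these notes) for constant a and σ is driven by the same Brownian increments as the exact solution, and the two endpoints are compared:

```python
import numpy as np

a, sigma, x0, t = 0.1, 0.5, 1.0, 1.0
n_steps = 10_000
dt = t / n_steps
rng = np.random.default_rng(5)
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.cumsum(dW)

# Euler-Maruyama: X_{k+1} = X_k + a X_k dt + sigma X_k dW_k
X = x0
for k in range(n_steps):
    X = X + a * X * dt + sigma * X * dW[k]

# exact strong solution X_t = x0 exp{(a - sigma^2/2) t + sigma W_t}
exact = x0 * np.exp((a - 0.5 * sigma ** 2) * t + sigma * W[-1])
print(X, exact)  # the two endpoints agree to a few decimals
```

Euler–Maruyama has strong order 1/2 in general, so the pathwise error shrinks only like √dt; refining dt (or switching to the Milstein scheme) improves the agreement.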
We now state an existence and uniqueness result of solutions for a class of SDEs.
Theorem 1.2.13: Existence and uniqueness of a strong solution
Assume the functions f and g satisfy the following two conditions:
1. Global Lipschitz condition: there exists a constant K such that

|f(x,t) − f(y,t)| + |g(x,t) − g(y,t)| ≤ K|x − y|

for all x, y ∈ R and t ∈ [0,T].
2. Bounded growth condition: there exists a constant L such that

|f(x,t)| + |g(x,t)| ≤ L(1 + |x|)

for all x ∈ R and t ∈ [0,T].
Then the SDE (1.2.7) admits, for any square-integrable initial condition X_0, a strong solution {X_t}_{t∈[0,T]}, which is almost surely continuous. This solution is unique in the sense that if {X_t}_{t∈[0,T]} and {Y_t}_{t∈[0,T]} are two almost surely continuous solutions, then

P{ sup_{0≤t≤T} |X_t − Y_t| > 0 } = 0 .

We will omit the details of the proof of this result, which is very similar to corresponding proofs in the deterministic case. Uniqueness follows by estimating the derivative of the expected difference E[|X_t − Y_t|²] and applying Gronwall's lemma, while existence is obtained by applying a fixed-point argument, or more precisely by showing that the sequence of processes

X^{(k+1)}_t = X_0 + ∫_0^t f(X^{(k)}_s, s) ds + ∫_0^t g(X^{(k)}_s, s) dW_s

converges to a limit which solves the SDE.
Remark 1.2.14: Weaker conditions on drift and diffusion coefficients
The conditions on f and g in the above result can be relaxed to the following ones:
1. Local Lipschitz condition: for any compact set 𝒦 ⊂ R, there exists a constant K = K(𝒦) such that

|f(x,t) − f(y,t)| + |g(x,t) − g(y,t)| ≤ K|x − y|

for all x, y ∈ 𝒦 and t ∈ [0,T].
2. Bounded growth condition: there exists a constant L such that

x f(x,t) + g(x,t)² ≤ L²(1 + x²)

for all x, t.
Indeed, one can show that under the local Lipschitz condition, any solution path X_t(ω) either exists up to time T, or leaves any compact set 𝒦 at a time τ(ω) < T. Therefore, there exists a random blow-up time τ such that either τ(ω) = +∞, and then X_t(ω) exists up to time T, or τ(ω) ≤ T, and then X_t(ω) → ±∞ as t → τ(ω).
Under the bounded growth condition, one shows that solution paths X_t(ω) cannot blow up (because the drift term does not grow fast enough, or pulls paths back towards the origin if x f(x,t) is negative).
Exercise 1.2.15
Solve the SDE

dX_t = −½ X_t dt + √(1 − X_t²) dW_t , X_0 = 0 ,

using the change of variables Y = arcsin(X).
Exercise 1.2.16
Fix r, α ∈ R. Solve the SDE

dY_t = r dt + α Y_t dW_t , Y_0 = 1 ,

by using the "integrating factor" F_t = e^{−αW_t + ½α²t}, and considering X_t = F_t Y_t.
1.3 Diffusions
A diffusion is a stochastic process solving an SDE of the form
dXt = f (Xt)dt + g(Xt)dWt ,
with a drift coefficient f (modelling a deterministic force), and a diffusion coefficient g (mod-
elling a random effect such as collisions with particles of a fluid). When speaking of diffusions,
we focus on the dependence of solutions on the initial condition X0 = x, which is one of the
main mechanisms creating links between SDEs and PDEs.
Definition 1.3.1: Ito diffusion
A time-homogeneous Ito diffusion is a stochastic process {X_t(ω)}_{t≥0} satisfying an SDE of the form

dX_t = f(X_t) dt + g(X_t) dW_t , t ≥ s ≥ 0 , X_s = x , (1.3.1)

where W_t is a standard Brownian motion of dimension m, and the drift coefficient f : R^n → R^n and diffusion coefficient g : R^n → R^{n×m} are such that the SDE (1.3.1) admits a unique solution for all times.

We will denote the solution of (1.3.1) by X^{s,x}_t.
1.3.1 The Markov property
Time homogeneity, that is, the fact that f and g do not depend on time, has the following
important consequence.
Lemma 1.3.2: Time homogeneity of the law
The processes {X_{s+h}^{s,x}}_{h≥0} and {X_h^{0,x}}_{h≥0} have the same distribution.
Proof: By definition, X_h^{0,x} satisfies the integral equation
X_h^{0,x} = x + ∫_0^h f(X_v^{0,x}) dv + ∫_0^h g(X_v^{0,x}) dW_v .   (1.3.2)
Furthermore, X_{s+h}^{s,x} satisfies the equation
X_{s+h}^{s,x} = x + ∫_s^{s+h} f(X_u^{s,x}) du + ∫_s^{s+h} g(X_u^{s,x}) dW_u
            = x + ∫_0^h f(X_{s+v}^{s,x}) dv + ∫_0^h g(X_{s+v}^{s,x}) dW̃_v ,   (1.3.3)
where we have used the change of variables u = s + v, and W̃_v = W_{s+v} − W_s. By the differential property, W̃_v is a standard Brownian motion, so that by uniqueness of solutions of the SDE (1.3.1), the integrals (1.3.3) and (1.3.2) have the same distribution.
We will denote by P^x the probability measure on the σ-algebra generated by all random variables X_t^{0,x}, t ≥ 0, x ∈ R^n, defined by
P^x{X_{t_1} ∈ A_1, …, X_{t_k} ∈ A_k} = P{X_{t_1}^{0,x} ∈ A_1, …, X_{t_k}^{0,x} ∈ A_k}
for any choice of times 0 ≤ t_1 < t_2 < ⋯ < t_k and Borel sets A_1, …, A_k ⊂ R^n. Expectations with respect to P^x will be denoted E^x.
Theorem 1.3.3: Markov property for Ito diffusions
For any bounded measurable function ϕ: R^n → R,
E^x[ϕ(X_{t+h}) | F_t](ω) = E^{X_t(ω)}[ϕ(X_h)] ,   (1.3.4)
where the right-hand side denotes the function E^y[ϕ(X_h)] evaluated at y = X_t(ω).
Proof: Consider for y ∈ R^n and s ≥ t the function
F(y,t,s,ω) = X_s^{t,y}(ω) = y + ∫_t^s f(X_u(ω)) du + ∫_t^s g(X_u(ω)) dW_u(ω) .
Note that F is independent of F_t. By uniqueness of solutions of the SDE (1.3.1), we have
X_s(ω) = F(X_t(ω), t, s, ω) .
Let g(y,ω) = ϕ ∘ F(y,t,t+h,ω). One can check that this function is measurable. Relation (1.3.4) is thus equivalent to
E[g(X_t,ω) | F_t] = E[ϕ ∘ F(y,0,h,ω)] |_{y=X_t(ω)} .
We have
E[g(X_t,ω) | F_t] = E[g(y,ω) | F_t] |_{y=X_t(ω)} .
Indeed, this relation is true for functions of the form g(y,ω) = φ(y)ψ(ω), since
E[φ(X_t)ψ(ω) | F_t] = φ(X_t) E[ψ(ω) | F_t] = E[φ(y)ψ(ω) | F_t] |_{y=X_t(ω)} .
It can thus be extended to any bounded measurable function, by approximating it by a sequence of linear combinations of functions as above. It follows from the independence of F and F_t that
E[g(y,ω) | F_t] = E[g(y,ω)]
               = E[ϕ ∘ F(y,t,t+h,ω)]
               = E[ϕ ∘ F(y,0,h,ω)] ,
where the last equality follows from Lemma 1.3.2. The result then follows by evaluating the last equality at y = X_t.
There exists an important generalisation of the Markov property to so-called stopping
times. We have already encountered such a time in André’s reflection principle, see Propo-
sition 1.1.7, with the random time τ = inf{t> 0: Wt >H}. The general definition of a stopping
time is as follows.
Definition 1.3.4: Stopping time
A stopping time is a random variable τ: Ω → [0,∞] such that {τ < t} ∈ F_t for all t ≥ 0. For such a stopping time, the pre-τ sigma-algebra is defined by
F_τ = {A ∈ F : A ∩ {τ ≤ t} ∈ F_t  ∀t ≥ 0} .
In what follows, it will be sufficient to know that first-exit times
τ = inf{t > 0: X_t ∉ A}
of an open or closed set A are stopping times. The pre-τ sigma-algebra is in this case the set of all events that only depend on the behaviour of the process as long as it stays in A.
The generalisation of the Markov property to stopping times reads as follows.
Theorem 1.3.5: Strong Markov property for Ito diffusions
For any bounded, measurable function ϕ: R^n → R and almost surely finite stopping time τ,
E^x[ϕ(X_{τ+h}) | F_τ](ω) = E^{X_τ(ω)}[ϕ(X_h)] .
Proof: The proof is a relatively direct adaptation of the previous proof. See for instance
[Øks03, Theorem 7.2.4].
1.3.2 Semigroups and generators
Definition 1.3.6: Markov semi-group
To any bounded measurable function ϕ: R^n → R, one associates for all t ≥ 0 the function P_tϕ defined by
(P_tϕ)(x) = E^x[ϕ(X_t)] .
The linear operator P_t is called the Markov semi-group of the diffusion.
For instance, if ϕ(x) = 1_A(x) denotes the indicator function of a Borel set A ⊂ R^n, one has
(P_t 1_A)(x) = P^x{X_t ∈ A} .
The name semi-group is justified by the following result.
Lemma 1.3.7: Semi-group property
For any t, h ≥ 0, one has
P_h ∘ P_t = P_{t+h} .
Proof: We have
(P_h ∘ P_t)(ϕ)(x) = (P_h(P_tϕ))(x)
                 = E^x[(P_tϕ)(X_h)]
                 = E^x[E^{X_h}[ϕ(X_t)]]
                 = E^x[E^x[ϕ(X_{t+h}) | F_h]]
                 = E^x[ϕ(X_{t+h})]
                 = (P_{t+h}ϕ)(x) ,
where we have used the Markov property to go from the third to the fourth line.
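The semi-group property can be checked numerically for Brownian motion, whose semigroup acts by convolution with the Gaussian heat kernel (see Section 1.1.3): P_h ∘ P_t = P_{t+h} becomes the Chapman–Kolmogorov identity ∫ p_t(x,z) p_h(z,y) dz = p_{t+h}(x,y). A minimal quadrature sketch (our illustration; grid and parameter values are arbitrary):

```python
import numpy as np

def heat_kernel(t, x, y):
    """Transition density p_t(x, y) of one-dimensional Brownian motion."""
    return np.exp(-(x - y)**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t, h, x, y = 0.3, 0.5, 0.2, -0.4
z = np.linspace(-15.0, 15.0, 20001)   # quadrature grid for the intermediate point
dz = z[1] - z[0]

# Chapman-Kolmogorov: integrate over the position z at the intermediate time
lhs = float(np.sum(heat_kernel(t, x, z) * heat_kernel(h, z, y)) * dz)
rhs = heat_kernel(t + h, x, y)
```

Since the integrand and all its derivatives vanish at the grid ends, the plain Riemann sum is extremely accurate here, and lhs agrees with rhs to many digits.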
The following properties are easy to check:
1. P_t preserves constant functions: P_t(c 1_{R^n}) = c 1_{R^n};
2. P_t preserves non-negative functions: ϕ(x) ≥ 0 ∀x ⇒ (P_tϕ)(x) ≥ 0 ∀x;
3. P_t is contracting (in the non-strict sense) in the L^∞-norm:
sup_{x∈R^n} |(P_tϕ)(x)| = sup_{x∈R^n} |E^x[ϕ(X_t)]| ≤ sup_{y∈R^n} |ϕ(y)| sup_{x∈R^n} E^x[1] = sup_{y∈R^n} |ϕ(y)| .
The Markov semigroup is thus a positive, linear operator, which is bounded in L^∞-norm (in fact, it has operator norm 1).
The semi-group property implies that the behaviour of P_t on any interval [0,ε], with ε > 0 arbitrarily small, determines its behaviour for any t > 0. It is thus natural to consider the derivative of P_t at t = 0.
Definition 1.3.8: Infinitesimal generator of an Ito diffusion
The infinitesimal generator L of an Ito diffusion is defined by its action on test functions ϕ via
(Lϕ)(x) = lim_{h→0⁺} [(P_hϕ)(x) − ϕ(x)] / h .   (1.3.5)
The domain of L is by definition the set of functions ϕ for which the limit (1.3.5) exists for all x ∈ R^n.
Remark 1.3.9
Formally, Relation (1.3.5) can be written
L = (d/dt) P_t |_{t=0} .
By the Markov property, this relation generalises to
(d/dt) P_t = lim_{h→0⁺} (P_{t+h} − P_t)/h = lim_{h→0⁺} ((P_h − id)/h) P_t = L P_t ,
and the semigroup can thus be formally written
P_t = e^{tL} .
Proposition 1.3.10
The generator of the Ito diffusion (1.3.1) is the differential operator
L = ∑_{i=1}^n f_i(x) ∂/∂x_i + (1/2) ∑_{i,j=1}^n (g gᵀ)_{ij}(x) ∂²/∂x_i∂x_j .
The domain of L contains the set of twice continuously differentiable functions of compact support.
Proof: Consider the case n = m = 1. Let ϕ be a twice continuously differentiable function of compact support, and let Y_t = ϕ(X_t). By Ito's formula,
Y_h = ϕ(X_0) + ∫_0^h ϕ′(X_s) f(X_s) ds + ∫_0^h ϕ′(X_s) g(X_s) dW_s + (1/2) ∫_0^h ϕ″(X_s) g(X_s)² ds .
Taking the expectation, as the expectation of the Ito integral vanishes, one gets
E^x[Y_h] = ϕ(x) + E^x[ ∫_0^h ϕ′(X_s) f(X_s) ds + (1/2) ∫_0^h ϕ″(X_s) g(X_s)² ds ] ,   (1.3.6)
so that
(E^x[ϕ(X_h)] − ϕ(x)) / h = (1/h) ∫_0^h E^x[ϕ′(X_s) f(X_s)] ds + (1/2h) ∫_0^h E^x[ϕ″(X_s) g(X_s)²] ds .
Taking the limit h → 0⁺, we get
(Lϕ)(x) = ϕ′(x) f(x) + (1/2) ϕ″(x) g(x)² .
The cases n ≥ 2 or m ≥ 2 are treated similarly, using the multidimensional Ito formula.
The cases n> 2 or m> 2 are treated similarly, using the multidimensional Ito formula.
Example 1.3.11: Generator of Brownian motion
Let W_t be an m-dimensional Brownian motion. This is a particular case of diffusion, with f = 0 and g = 1l (the identity matrix). Its generator is thus given by
L = (1/2) ∑_{i=1}^m ∂²/∂x_i² = (1/2) Δ .
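One can check the definition (1.3.5) against this formula in a simple case. For Brownian motion and ϕ(x) = sin x, the Gaussian identity E[sin(x + W_h)] = e^{−h/2} sin x gives (P_hϕ)(x) in closed form, and the difference quotient should approach (Lϕ)(x) = −(1/2) sin x. A minimal sketch (our illustration; the point x and step h are arbitrary):

```python
import numpy as np

x, h = 0.7, 1e-3
# For phi(x) = sin(x), (P_h phi)(x) = E[sin(x + W_h)] = exp(-h/2) sin(x),
# since W_h is a centred Gaussian with variance h.
P_h_phi = np.exp(-h / 2) * np.sin(x)
quotient = (P_h_phi - np.sin(x)) / h       # difference quotient of (1.3.5)
generator = -0.5 * np.sin(x)               # (L phi)(x) = (1/2) phi''(x)
```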
1.3.3 Dynkin’s formula
Dynkin’s formula is essentially a generalisation of the expression (1.3.6) to stopping times. It
will yield a first important class of links between SDEs and PDEs.
Proposition 1.3.12: Dynkin’s formula
Let {X_t}_{t≥0} be a diffusion with generator L. Fix x ∈ R^n, a stopping time τ such that E^x[τ] < ∞, and a compactly supported, twice continuously differentiable function ϕ: R^n → R. Then
E^x[ϕ(X_τ)] = ϕ(x) + E^x[ ∫_0^τ (Lϕ)(X_s) ds ] .
Proof: Consider the case n = m = 1, m being the dimension of Brownian motion. Proceeding as in the proof of Proposition 1.3.10, we obtain
E^x[ϕ(X_τ)] = ϕ(x) + E^x[ ∫_0^τ (Lϕ)(X_s) ds ] + E^x[ ∫_0^τ g(X_s) ϕ′(X_s) dW_s ] .   (1.3.7)
It thus suffices to show that the expectation of the stochastic integral vanishes. For any function h bounded by M and any N ∈ N, one has
E^x[ ∫_0^{τ∧N} h(X_s) dW_s ] = E^x[ ∫_0^N 1_{s<τ} h(X_s) dW_s ] = 0 ,
owing to the F_s-measurability of 1_{s<τ} and h(X_s). Moreover,
E^x[ ( ∫_0^τ h(X_s) dW_s − ∫_0^{τ∧N} h(X_s) dW_s )² ] = E^x[ ∫_{τ∧N}^τ h(X_s)² ds ] ≤ M² E^x[τ − τ∧N] ,
which goes to 0 as N → ∞, owing to the assumption E^x[τ] < ∞, by Lebesgue's dominated convergence theorem. One can thus write
0 = lim_{N→∞} E^x[ ∫_0^{τ∧N} h(X_s) dW_s ] = E^x[ ∫_0^τ h(X_s) dW_s ] ,   (1.3.8)
which finishes the proof, after plugging (1.3.8) into (1.3.7). The proof of the general case is analogous.
Consider now the particular case where the stopping time τ is the first-exit time from an open bounded set D ⊂ R^n. Assume the boundary value problem
(Lu)(x) = θ(x) ,   x ∈ D ,
u(x) = ψ(x) ,   x ∈ ∂D   (1.3.9)
admits a unique solution. This is the case if D, θ and ψ are sufficiently regular. Replacing ϕ by u in Dynkin's formula, we get the relation
u(x) = E^x[ ψ(X_τ) − ∫_0^τ θ(X_s) ds ] .   (1.3.10)
For ψ = 0 and θ = −1, u(x) is equal to the expectation of τ, starting from x. For θ = 0 and ψ the indicator of a subset A of the boundary ∂D, u(x) is the probability of leaving D through A. Hence, if one can solve the problem (1.3.9), one obtains information on the first-exit time and location from D. Conversely, simulating the expression (1.3.10) by a Monte Carlo method, one gets a numerical approximation of the solution of the boundary value problem (1.3.9).
Example 1.3.13: Mean exit time of Brownian motion from a ball
Let K = {x ∈ R^n : ‖x‖ < R} be the ball of radius R centred at the origin. Given a point x ∈ K, let
τ_K = inf{t > 0: x + W_t ∉ K}
and let
τ(N) = τ_K ∧ N .
The function ϕ(x) = ‖x‖² 1_{‖x‖≤R} is compactly supported and satisfies Δϕ(x) = 2n for all x ∈ K. One can furthermore extend it outside K in a smooth and compactly supported way. Plugging into Dynkin's formula, one gets
E^x[‖x + W_{τ(N)}‖²] = ‖x‖² + E^x[ ∫_0^{τ(N)} (1/2) Δϕ(x + W_s) ds ] = ‖x‖² + n E^x[τ(N)] .
Since ‖x + W_{τ(N)}‖ ≤ R, letting N go to infinity, one obtains by dominated convergence
E^x[τ_K] = (R² − ‖x‖²)/n .   (1.3.11)
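Formula (1.3.11) lends itself to a Monte Carlo check, in the spirit of the remark after (1.3.10). The sketch below (our illustration; step size, sample size and tolerance are arbitrary choices) simulates Brownian paths from the centre of the unit disk in dimension n = 2, where the formula predicts E⁰[τ_K] = R²/n = 1/2.

```python
import numpy as np

rng = np.random.default_rng(1)
n_dim, R, dt, n_paths, max_steps = 2, 1.0, 1e-3, 2000, 20_000

X = np.zeros((n_paths, n_dim))             # all paths start at the origin
tau = np.full(n_paths, np.nan)             # recorded exit times
alive = np.ones(n_paths, dtype=bool)
for k in range(1, max_steps + 1):
    X[alive] += rng.normal(0.0, np.sqrt(dt), (int(alive.sum()), n_dim))
    exited = alive & (np.sum(X**2, axis=1) >= R**2)
    tau[exited] = k * dt
    alive &= ~exited
    if not alive.any():
        break

mc_mean = float(np.nanmean(tau))
exact = R**2 / n_dim                       # (1.3.11) with x = 0: here 1/2
```

The discretisation slightly overestimates the exit time (the boundary is only checked at grid times), but the estimate agrees with 1/2 to within a few percent.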
Example 1.3.14: Recurrence/transience of Brownian motion
Let again K = {x ∈ R^n : ‖x‖ < R}. We now consider the case where x ∉ K, and we want to determine if Brownian motion starting in x hits K almost surely, in which case it is called recurrent, or if it hits K with a probability strictly less than 1, in which case it is called transient. As for random walks, the answer depends on the dimension n of space.
We define
τ_K = inf{t > 0: x + W_t ∈ K} .
For N ∈ N, let A_N be the annulus
A_N = {x ∈ R^n : R < ‖x‖ < 2^N R} ,
and let τ be the first-exit time of x + W_t from A_N. We thus have
τ = τ_K ∧ τ′ ,   τ′ = inf{t > 0: ‖x + W_t‖ = 2^N R} .
Finally, let
p = P^x{τ_K < τ′} = P^x{‖x + W_τ‖ = R} = 1 − P^x{‖x + W_τ‖ = 2^N R} .
The spherically symmetric solutions of Δϕ = 0 are of the form
ϕ(x) = |x|         if n = 1 ,
     = −log‖x‖     if n = 2 ,
     = ‖x‖^{2−n}   if n > 2 .
For such a ϕ, Dynkin's formula yields
E^x[ϕ(x + W_τ)] = ϕ(x) .
On the other hand,
E^x[ϕ(x + W_τ)] = ϕ(R) p + ϕ(2^N R)(1 − p) .
Solving with respect to p, one gets
p = (ϕ(x) − ϕ(2^N R)) / (ϕ(R) − ϕ(2^N R)) .
As N → ∞, one has τ′ → ∞, so that
P^x{τ_K < ∞} = lim_{N→∞} (ϕ(x) − ϕ(2^N R)) / (ϕ(R) − ϕ(2^N R)) .
Consider now separately the cases n = 1, n = 2 and n > 2.
1. For n = 1, one has
P^x{τ_K < ∞} = lim_{N→∞} (2^N R − |x|) / (2^N R − R) = 1 ,
showing that Brownian motion is recurrent in dimension 1.
2. For n = 2, one has
P^x{τ_K < ∞} = lim_{N→∞} (N log 2 + log R − log‖x‖) / (N log 2) = 1 ,
showing that Brownian motion is also recurrent in dimension 2.
3. For n > 2, one has
P^x{τ_K < ∞} = lim_{N→∞} (‖x‖^{2−n} − (2^N R)^{2−n}) / (R^{2−n} − (2^N R)^{2−n}) = (R/‖x‖)^{n−2} < 1 .
Brownian motion is thus transient in dimension n > 2.
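The exit probability from the annulus can also be estimated by simulation. The sketch below (our illustration, with arbitrary numerical parameters) takes n = 3, R = 1, outer radius 2R (that is, N = 1) and ‖x‖ = 1.5, for which p = (‖x‖⁻¹ − (2R)⁻¹)/(R⁻¹ − (2R)⁻¹) = 1/3.

```python
import numpy as np

rng = np.random.default_rng(2)
R_in, R_out, dt, n_paths = 1.0, 2.0, 1e-3, 4000
X = np.tile([1.5, 0.0, 0.0], (n_paths, 1))    # start on the sphere ||x|| = 1.5

hit_inner = np.zeros(n_paths, dtype=bool)
alive = np.ones(n_paths, dtype=bool)
for _ in range(50_000):
    X[alive] += rng.normal(0.0, np.sqrt(dt), (int(alive.sum()), 3))
    r = np.linalg.norm(X, axis=1)
    hit_inner |= alive & (r <= R_in)
    alive &= (r > R_in) & (r < R_out)
    if not alive.any():
        break

p_mc = float(hit_inner.mean())
# harmonic function ||x||^{2-n} with n = 3:
p_exact = (1 / 1.5 - 1 / R_out) / (1 / R_in - 1 / R_out)   # = 1/3
```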
Exercise 1.3.15: First-exit time of geometric Brownian motion
Consider the diffusion defined by the equation
dX_t = X_t dW_t .
1. Determine its generator L.
2. Find the general solution of the equation Lu = 0.
3. Determine P^x{τ_a < τ_b}, where τ_a denotes the first-passage time of X_t at a.
Hint: This amounts to computing E^x[ψ(X_τ)], where τ is the first-exit time from [a,b], and ψ(a) = 1, ψ(b) = 0.
Exercise 1.3.16: First-exit time of geometric Brownian motion with linear drift
Consider more generally the diffusion defined by the equation
dX_t = r X_t dt + X_t dW_t ,   r ∈ R .
1. Compute its generator L.
2. Show that if r ≠ 1/2, the general solution of the equation Lu = 0 is given by
u(x) = c_1 x^γ + c_2 ,
where γ is a function of r to be determined.
3. Assume r < 1/2. Compute P^x{τ_b < τ_a} for 0 < a < x < b, and then P^x{τ_b < τ_0} by letting a go to 0. Note that if X_{t_0} = 0, then X_t = 0 for all t ≥ t_0. Therefore, if τ_0 < τ_b, then X_t will never reach b. What is the probability that this happens?
4. Assume now r > 1/2.
(a) Compute Px{τa < τb} for 0 < a < x < b, and show that this probability goes to 0 as
a → 0+ for all x ∈]a,b[. Conclude that almost surely, Xt will never reach 0 in this
situation.
(b) Find α and β such that u(x) = α log x + β satisfies the problem
(Lu)(x) = −1   if 0 < x < b ,
u(x) = 0   if x = b .
(c) Use this to compute Ex[τb].
1.3.4 Kolmogorov’s equations
A second class of links between SDEs and PDEs is given by Kolmogorov’s equations, which are
initial value problems.
Observe that by taking the derivative of Dynkin's formula with respect to t, in the particular case τ = t, one gets
(∂/∂t)(P_tϕ)(x) = (∂/∂t) E^x[ϕ(X_t)] = E^x[(Lϕ)(X_t)] = (P_t Lϕ)(x) ,
which can be written in compact form as
(d/dt) P_t = P_t L .
We have seen in Remark 1.3.9 that one can also formally write (d/dt) P_t = L P_t. Therefore, the operators L and P_t commute, at least formally. The next theorem makes this observation rigorous.
Theorem 1.3.17: Backward Kolmogorov equation
Let ϕ: R^n → R be a compactly supported, twice continuously differentiable function.
1. The function
u(t,x) = (P_tϕ)(x) = E^x[ϕ(X_t)]
satisfies the initial value problem
∂u/∂t (t,x) = (Lu)(t,x) ,   t > 0 , x ∈ R^n ,
u(0,x) = ϕ(x) ,   x ∈ R^n .   (1.3.12)
2. If w(t,x) is a bounded function, which is continuously differentiable in t and twice continuously differentiable in x, satisfying the initial value problem (1.3.12), then w(t,x) = (P_tϕ)(x).
Proof:
1. One has u(0,x) = (P_0ϕ)(x) = ϕ(x) and
(Lu)(t,x) = lim_{h→0⁺} [(P_h ∘ P_tϕ)(x) − (P_tϕ)(x)] / h = lim_{h→0⁺} [(P_{t+h}ϕ)(x) − (P_tϕ)(x)] / h = (∂/∂t)(P_tϕ)(x) = (∂/∂t) u(t,x) .
2. If w(t,x) satisfies (1.3.12), then one has
L̃w = 0   where   L̃w = −∂w/∂t + Lw .
Fix (s,x) ∈ R₊ × R^n. The process Y_t = (s − t, X_t^{0,x}) admits L̃ as generator. Let
τ_R = inf{t > 0: ‖X_t‖ ≥ R} .
Dynkin's formula shows that
E^{s,x}[w(Y_{t∧τ_R})] = w(s,x) + E^{s,x}[ ∫_0^{t∧τ_R} (L̃w)(Y_u) du ] = w(s,x) .
Letting R go to infinity, one obtains
w(s,x) = E^{s,x}[w(Y_t)]   ∀t ≥ 0 .
In particular, taking t = s, one has
w(s,x) = E^{s,x}[w(Y_s)] = E[w(0, X_s^{0,x})] = E[ϕ(X_s^{0,x})] = E^x[ϕ(X_s)] ,
as claimed.
Note that in the case of Brownian motion, which has generator L = 12∆, Kolmogorov’s
backward equation (1.3.12) is nothing but the heat equation.
Since Kolmogorov’s backward equation is linear, it is sufficient to solve it for a complete
family of initial conditions ϕ to determine its solution for all initial conditions. A first impor-
tant case occurs when one knows all eigenfunctions and eigenvalues of L . Then the general
solution can be decomposed on a basis of eigenfunctions, with coefficients depending expo-
nentially on time.
Example 1.3.18: Brownian motion
Eigenfunctions of the generator L = (1/2)Δ = (1/2) d²/dx² of one-dimensional Brownian motion are of the form e^{ikx}. Decomposing the solution on this basis of eigenfunctions amounts to solving the heat equation by Fourier transform. One knows that the solution can be written as
u(t,x) = (1/√(2π)) ∫_R e^{−k²t/2} ϕ̂(k) e^{ikx} dk ,   (1.3.13)
where ϕ̂(k) is the Fourier transform of the initial condition.
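Formula (1.3.13) can be checked by quadrature for a Gaussian initial condition ϕ(x) = e^{−x²/2}, whose Fourier transform is ϕ̂(k) = e^{−k²/2} and for which convolution with the heat kernel gives u(t,x) = e^{−x²/(2(1+t))}/√(1+t). A minimal sketch (our illustration; the grid and the values of t and x are arbitrary):

```python
import numpy as np

# Initial condition phi(x) = exp(-x^2/2); with the convention
# phi_hat(k) = (2 pi)^{-1/2} * integral of phi(x) exp(-ikx) dx,
# its Fourier transform is phi_hat(k) = exp(-k^2/2).
t, x = 0.7, 0.4
k = np.linspace(-30.0, 30.0, 60001)
dk = k[1] - k[0]

# Right-hand side of (1.3.13), evaluated by a plain Riemann sum
integrand = np.exp(-k**2 * t / 2) * np.exp(-k**2 / 2) * np.exp(1j * k * x)
u_fourier = float(np.real(np.sum(integrand)) * dk) / np.sqrt(2 * np.pi)

# Convolution of phi with the heat kernel, in closed form
u_exact = np.exp(-x**2 / (2 * (1 + t))) / np.sqrt(1 + t)
```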
A second important case occurs when formally decomposing the initial condition on a “ba-
sis” of Dirac distributions. In practice, this amounts to using the notion of transition density.
Definition 1.3.19: Transition density
The diffusion {X_t}_t is said to admit the transition density p_t(x,y), also written p(y,t|x,0), if
E^x[ϕ(X_t)] = ∫_{R^n} ϕ(y) p_t(x,y) dy
for all bounded measurable functions ϕ: R^n → R.
By linearity, if the transition density exists and is smooth, it satisfies Kolmogorov's backward equation (the generator L acting on the variable x), with initial condition p_0(x,y) = δ(x − y).
Example 1.3.20: Brownian motion and heat kernel
In the case of one-dimensional Brownian motion, we have seen (cf. (1.1.2)) that the transition density is given by the heat kernel
p(y,t|x,0) = (1/√(2πt)) e^{−(x−y)²/2t} .
This is also the value of the integral (1.3.13) with ϕ̂(k) = e^{−iky}/√(2π), which is indeed the Fourier transform of ϕ(x) = δ(x − y).
The adjoint of the generator L is by definition the linear operator L† such that
⟨Lφ, ψ⟩ = ⟨φ, L†ψ⟩   (1.3.14)
for any choice of twice continuously differentiable functions φ, ψ: R^n → R, with φ compactly supported, where ⟨·,·⟩ denotes the usual inner product in L². Integrating ⟨Lφ,ψ⟩ by parts twice, one obtains
(L†ψ)(y) = (1/2) ∑_{i,j=1}^n ∂²/∂y_i∂y_j ((g gᵀ)_{ij} ψ)(y) − ∑_{i=1}^n ∂/∂y_i (f_i ψ)(y) .
Theorem 1.3.21: Forward Kolmogorov equation
If X_t admits a smooth transition density p_t(x,y), then it satisfies the equation
(∂/∂t) p_t(x,y) = L†_y p_t(x,y) ,   (1.3.15)
where the notation L†_y means that L† acts on the variable y.
Proof: Dynkin's formula with τ = t implies
∫_{R^n} ϕ(y) p_t(x,y) dy = E^x[ϕ(X_t)]
= ϕ(x) + ∫_0^t E^x[(Lϕ)(X_s)] ds
= ϕ(x) + ∫_0^t ∫_{R^n} (Lϕ)(y) p_s(x,y) dy ds .
Taking the derivative with respect to time, and using (1.3.14), we get
(∂/∂t) ∫_{R^n} ϕ(y) p_t(x,y) dy = ∫_{R^n} (Lϕ)(y) p_t(x,y) dy = ∫_{R^n} ϕ(y) (L†_y p_t)(x,y) dy ,
which implies the result.
Assume the distribution of X_0 admits a density ρ with respect to Lebesgue measure. Then X_t has the density
ρ(t,y) = (Q_tρ)(y) := ∫_{R^n} p_t(x,y) ρ(x) dx .
Applying Kolmogorov's forward equation (1.3.15), one obtains the Fokker–Planck equation
(∂/∂t) ρ(t,y) = L†_y ρ(t,y) ,
which can also be formally written
(d/dt) Q_t = L† Q_t .
The adjoint generator L† is thus the generator of the adjoint semi-group Q_t.
Corollary 1.3.22
If ρ_0(y) is the density of a probability measure satisfying L†ρ_0 = 0, then ρ_0 is a stationary measure of the diffusion. In other words, if the distribution of X_0 admits the density ρ_0, then X_t admits the density ρ_0 for all t ≥ 0.
Exercise 1.3.23: Invariant measure of the Ornstein–Uhlenbeck process
Consider the diffusion defined by the equation
dX_t = −X_t dt + dW_t .
1. Give its generator L and its adjoint L†.
2. Let ρ(x) = π^{−1/2} e^{−x²}. Compute L†ρ(x) and interpret the result.
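As a numerical companion to this exercise (our illustration, without giving away the computation): if the density above is indeed stationary, a long simulation of this Ornstein–Uhlenbeck process should produce samples with second moment ∫ x² π^{−1/2} e^{−x²} dx = 1/2. A minimal Euler–Maruyama sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n_steps, n_paths = 1e-2, 1000, 10_000      # total time T = 10

X = rng.normal(0.0, 2.0, n_paths)              # arbitrary initial distribution
for _ in range(n_steps):
    X += -X * dt + rng.normal(0.0, np.sqrt(dt), n_paths)

second_moment = float(np.mean(X**2))
# the density pi^{-1/2} exp(-x^2) has second moment 1/2
```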
1.3.5 The Feynman–Kac formula
So far, we have encountered elliptic boundary value problems of the form Lu = θ, as well as parabolic evolution equations of the form ∂_t u = Lu. The Feynman–Kac formula will show that one can also link properties of diffusions with those of parabolic equations containing a term linear in u. Adding a linear term to the generator can be interpreted as "killing" the diffusion at a certain rate. The simplest case is that of a constant rate. Let ζ be a random variable of exponential distribution with parameter λ, independent of W_t. Set
X̃_t = X_t if t < ζ ,   X̃_t = Δ if t ≥ ζ ,
where Δ is a "cemetery state" that has been added to R^n. One checks that owing to the exponential distribution of ζ, X̃_t is a Markov process on R^n ∪ {Δ}. If ϕ: R^n → R is a bounded measurable test function, one has (setting ϕ(Δ) = 0)
E^x[ϕ(X̃_t)] = E^x[ϕ(X_t) 1_{t<ζ}] = P{ζ > t} E^x[ϕ(X_t)] = e^{−λt} E^x[ϕ(X_t)] .
It follows that
lim_{h→0} (E^x[ϕ(X̃_h)] − ϕ(x)) / h = −λϕ(x) + (Lϕ)(x) ,
which shows that the infinitesimal generator of X̃ is the differential operator
L̃ = L − λ .
More generally, if q: R^n → R is a continuous function bounded from below, one can construct a random variable ζ such that
E^x[ϕ(X̃_t)] = E^x[ ϕ(X_t) e^{−∫_0^t q(X_s) ds} ] .
In this case, the generator of X̃_t is
L̃ = L − q ,
that is, (L̃ϕ)(x) = (Lϕ)(x) − q(x)ϕ(x).
Theorem 1.3.24: Feynman–Kac formula
Let ϕ: R^n → R be a compactly supported, twice continuously differentiable function, and let q: R^n → R be a continuous function bounded from below.
1. The function
v(t,x) = E^x[ e^{−∫_0^t q(X_s) ds} ϕ(X_t) ]   (1.3.16)
solves the initial value problem
∂v/∂t (t,x) = (Lv)(t,x) − q(x) v(t,x) ,   t > 0 , x ∈ R^n ,
v(0,x) = ϕ(x) ,   x ∈ R^n .   (1.3.17)
2. If w(t,x) is continuously differentiable in t and twice continuously differentiable in x, bounded for x in a compact set, and satisfies (1.3.17), then w(t,x) is equal to the right-hand side of (1.3.16).
Proof:
1. Set Y_t = ϕ(X_t) and Z_t = e^{−∫_0^t q(X_s) ds}, and let v(t,x) be given by (1.3.16). Then for h > 0,
(1/h)[E^x[v(t,X_h)] − v(t,x)] = (1/h)[E^x[E^{X_h}[Y_t Z_t]] − E^x[Y_t Z_t]]
= (1/h) E^x[ E^x[Y_{t+h} e^{−∫_0^t q(X_{s+h}) ds} | F_h] − Y_t Z_t ]
= (1/h) E^x[ Y_{t+h} Z_{t+h} e^{∫_0^h q(X_s) ds} − Y_t Z_t ]
= (1/h) E^x[ Y_{t+h} Z_{t+h} − Y_t Z_t ] + (1/h) E^x[ Y_{t+h} Z_{t+h} ( e^{∫_0^h q(X_s) ds} − 1 ) ] .
As h goes to 0, the first term in the last expression converges to ∂_t v(t,x), while the second one converges to q(x) v(t,x).
2. If w(t,x) satisfies (1.3.17), then
L̃w = 0   where   L̃w = −∂w/∂t + Lw − qw .
Fix (s,x,z) ∈ R₊ × R^n × R and set Z_t = z + ∫_0^t q(X_s) ds. The process Y_t = (s − t, X_t^{0,x}, Z_t) is a diffusion with generator
L̂ = −∂/∂s + L + q ∂/∂z .
Let φ(s,x,z) = e^{−z} w(s,x). Then L̂φ = 0, and Dynkin's formula shows that if τ_R is the first-exit time from a ball of radius R, then
E^{s,x,z}[φ(Y_{t∧τ_R})] = φ(s,x,z) .
It follows that
w(s,x) = φ(s,x,0) = E^{s,x,0}[φ(Y_{t∧τ_R})] = E^x[ φ(s − t∧τ_R, X_{t∧τ_R}^{0,x}, Z_{t∧τ_R}) ] = E^x[ e^{−∫_0^{t∧τ_R} q(X_u) du} w(s − t∧τ_R, X_{t∧τ_R}^{0,x}) ] ,
which converges to the expectation of e^{−∫_0^t q(X_u) du} w(s − t, X_t^{0,x}) as R goes to infinity. In particular, for t = s one obtains
w(s,x) = E^x[ e^{−∫_0^s q(X_u) du} w(0, X_s^{0,x}) ] ,
which is indeed equal to the function v(s,x) defined in (1.3.16).
In combination with Dynkin's formula, the Feynman–Kac formula can be generalised to stopping times. If for instance D ⊂ R^n is a regular domain, and τ denotes the first-exit time from D, then under some regularity conditions on the functions q, ϕ, θ: D → R, the quantity
v(t,x) = E^x[ e^{−∫_0^{t∧τ} q(X_s) ds} ϕ(X_{t∧τ}) − ∫_0^{t∧τ} e^{−∫_0^s q(X_u) du} θ(X_s) ds ]
satisfies the initial value problem with boundary conditions
∂v/∂t (t,x) = (Lv)(t,x) − q(x) v(t,x) − θ(x) ,   t > 0 , x ∈ D ,
v(0,x) = ϕ(x) ,   x ∈ D ,
v(t,x) = ϕ(x) ,   x ∈ ∂D .
In particular, if τ is almost surely finite, taking the limit t → ∞, one obtains that
v(x) = E^x[ e^{−∫_0^τ q(X_s) ds} ϕ(X_τ) − ∫_0^τ e^{−∫_0^s q(X_u) du} θ(X_s) ds ]
satisfies
(Lv)(x) = q(x) v(x) + θ(x) ,   x ∈ D ,
v(x) = ϕ(x) ,   x ∈ ∂D .
Note that in the case q = 0, one recovers Relations (1.3.9) and (1.3.10).
Note that in the case q = 0, one recovers Relations (1.3.9) and (1.3.10).
Example 1.3.25
Let D = ]−a,a[ and X_t = x + W_t. Then v(x) = E^x[e^{−λτ}] satisfies
(1/2) v″(x) = λ v(x) ,   x ∈ D ,
v(−a) = v(a) = 1 .
The general solution of the first equation is of the form v(x) = c_1 e^{√(2λ) x} + c_2 e^{−√(2λ) x}. The integration constants c_1 and c_2 are determined by the boundary conditions, and one finds
E^x[e^{−λτ}] = cosh(√(2λ) x) / cosh(√(2λ) a) .   (1.3.18)
Evaluating the derivative at λ = 0, one obtains
E^x[τ] = a² − x² ,
which is a particular case of (1.3.11), but (1.3.18) also determines all other moments of τ, as well as its density.
Solving the equation with boundary conditions v(−a) = 0 and v(a) = 1, one obtains
E^x[ e^{−λτ} 1_{τ_a < τ_{−a}} ] = sinh(√(2λ)(x + a)) / sinh(√(2λ) · 2a) .
In particular, for λ = 0, we find
P^x{τ_a < τ_{−a}} = (x + a) / (2a) ,
which can also be obtained directly from Dynkin's formula. However, taking derivatives at λ = 0, we also obtain
E^x[ τ 1_{τ_a < τ_{−a}} ] = (a² − x²)(3a + x) / (6a) ,
E^x[ τ | τ_a < τ_{−a} ] = (a − x)(3a + x) / 3 .
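The Laplace transform (1.3.18) can be compared with a direct simulation of the exit time. The sketch below (our illustration; parameter values arbitrary) takes x = 0, a = 1 and λ = 1, for which (1.3.18) gives E⁰[e^{−λτ}] = 1/cosh(√2) ≈ 0.459.

```python
import numpy as np

rng = np.random.default_rng(4)
a, lam, dt, n_paths, max_steps = 1.0, 1.0, 1e-3, 5000, 30_000

X = np.zeros(n_paths)                      # start at x = 0
tau = np.full(n_paths, max_steps * dt)     # capped exit times (cap is harmless here)
alive = np.ones(n_paths, dtype=bool)
for k in range(1, max_steps + 1):
    X[alive] += rng.normal(0.0, np.sqrt(dt), int(alive.sum()))
    exited = alive & (np.abs(X) >= a)
    tau[exited] = k * dt
    alive &= ~exited
    if not alive.any():
        break

mc = float(np.mean(np.exp(-lam * tau)))
exact = 1.0 / np.cosh(np.sqrt(2 * lam) * a)    # (1.3.18) at x = 0
```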
Remark 1.3.26: Cover image
The cover image shows a numerical solution of the heat equation, with constant temperatures (say 1 and 0) inside the Mandelbrot set, and outside the ellipse, after a time long enough for the solution to be close to a stationary state. The colour code represents the norm of the gradient of the solution u(t,x). By the above results (assuming regularity of the Mandelbrot set does not pose any problem), u(x) = lim_{t→∞} u(t,x) is the solution of Δu = 0 in the domain, with boundary conditions 1 on the Mandelbrot set, and 0 on the ellipse. Therefore, u(x) gives the probability, starting at x, to hit the Mandelbrot set before the ellipse. It also represents the electric potential in a capacitor formed by two conductors shaped like the boundary sets, while the colours represent the intensity of the electric field. One can observe a "knife edge effect": the electric field is stronger near the sharp tips of the Mandelbrot set. A video of the convergence towards the equilibrium field can be found on the page https://www.youtube.com/c/NilsBerglund.
Exercise 1.3.27: The arcsine law
Let {W_t}_{t≥0} be a standard Brownian motion in R. Consider the process
X_t = (1/t) ∫_0^t 1_{W_s>0} ds ,   t > 0 .
The aim of this exercise is to prove the arcsine law:
P{X_t < u} = (2/π) Arcsin(√u) ,   0 ≤ u ≤ 1 .   (1.3.19)
1. What does the variable X_t represent?
2. Show that X_t is equal in distribution to X_1 for all t > 0.
3. Fix λ > 0. For t > 0 and x ∈ R, one defines
v(t,x) = E[ e^{−λ ∫_0^t 1_{x+W_s>0} ds} ]
and its Laplace transform
g_ρ(x) = ∫_0^∞ v(t,x) e^{−ρt} dt ,   ρ > 0 .
Show that
g_ρ(0) = E[ 1 / (ρ + λX_1) ] .
4. Compute ∂v/∂t (t,x) using the Feynman–Kac formula.
5. Compute g″_ρ(x). Conclude that g_ρ(x) satisfies a second-order ODE with piecewise constant coefficients. Show that its general solution is given by
g_ρ(x) = A_± + B_± e^{γ_± x} + C_± e^{−γ_± x}
with constants A_±, B_±, C_±, γ_± depending on the sign of x.
6. Determine these constants by using the fact that g_ρ should be bounded, continuous at 0, and that g′_ρ should be continuous at 0. Conclude that g_ρ(0) = 1/√(ρ(λ + ρ)).
7. Prove (1.3.19) by using the identity
1/√(1 + λ) = ∑_{n=0}^∞ (−λ)^n (1/π) ∫_0^1 x^n / √(x(1 − x)) dx .
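Before solving the exercise analytically, one can verify (1.3.19) by simulation: the occupation fraction of discretised Brownian paths should be approximately arcsine-distributed, e.g. P{X_1 < 1/4} ≈ (2/π) Arcsin(1/2) = 1/3. A minimal sketch (our illustration; the discretisation is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_steps = 10_000, 1000
dt = 1.0 / n_steps

# Brownian paths on [0, 1] and their occupation fraction above 0
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps)), axis=1)
X1 = (W > 0).mean(axis=1)

emp = float(np.mean(X1 < 0.25))
pred = (2 / np.pi) * np.arcsin(np.sqrt(0.25))   # = 1/3 by (1.3.19)
```

One also observes the characteristic U-shape: values of X_1 near 0 and 1 are much more likely than values near 1/2.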
Chapter 2
Invariant measures for SDEs
We consider in this chapter SDEs in Rn of the form
dXt = f (Xt)dt + g(Xt)dWt , (2.0.1)
where the drift coefficient f and the diffusion coefficient g are such that there exists a unique
strong solution, which is global in time. Then it is natural to ask the following questions:
1. Does the diffusion (2.0.1) admit an invariant probability measure?
2. If so, is this measure unique?
3. If so, does any initial distribution converge to the invariant measure?
4. If so, how fast does this convergence occur? For which distance? Can one obtain explicit
bounds on the speed of convergence?
Various methods have been derived to answer these questions, each one having its advan-
tages and drawbacks. Some methods are easier to use and provide, for instance, convergence
to an invariant distribution, but without any bound on the speed of convergence, while oth-
ers may provide sharp bounds, but are limited to specific sets of initial distributions. In what
follows, we are going to present a few selected examples of these methods, which have been
chosen because they proved useful in particular applications. But one should keep in mind
that there exist many more approaches.
In what follows, it will be useful to employ the notation
P_t(x,A) = (P_t 1_A)(x) = P^x{X_t ∈ A}
for the Markov semigroup, where A is any Borel set in R^n. Given a probability measure µ, we write
(µP_t)(A) = ∫_{R^n} µ(dx) P_t(x,A)
instead of (Q_tµ)(A) for the action of the adjoint semigroup, because it is reminiscent of the matrix multiplication used for Markov chains on finite sets. A measure is invariant if it satisfies
(µP_t)(A) = µ(A)
for all t ≥ 0 and all Borel sets A ⊂ R^n.
Definition 2.0.1: Feller property
A semigroup (P_t)_{t≥0} is said to have the Feller property if P_t f is bounded and continuous whenever f is bounded and continuous.
A useful standard result in our situation is then the following.
Proposition 2.0.2: Condition for Feller property
Any diffusion (Xt)t>0 solving an SDE with globally Lipschitz coefficients has the Feller
property.
For a proof, see for instance [Øks03, Lemma 8.1.4], or [RY99, Theorem IX.2.5]. The global
Lipschitz condition can often be relaxed to a local condition by working with appropriate stop-
ping times (that is, by considering the diffusion up to its first exit from a sequence of balls of
growing radius).
Remark 2.0.3: Strong Feller property
The semigroup P_t is said to have the strong Feller property if P_t f is bounded and continuous whenever f is bounded and measurable, but not necessarily continuous. A sufficient condition for a diffusion to satisfy the strong Feller property is the ellipticity condition on the diffusion coefficient
⟨ξ, g(x)g(x)ᵀ ξ⟩ ≥ c‖ξ‖²   ∀ξ ∈ R^n ,
for some constant c > 0.
This condition can be relaxed to hypoellipticity (Hörmander condition).
2.1 Existence of invariant probability measures
2.1.1 Some basic examples
Corollary 1.3.22 shows that the density ρ of an invariant probability measure should satisfy
L †ρ = 0, where L † is the adjoint generator. Cases where this equation can be solved are rare,
but one important example is given by gradient SDEs.
Example 2.1.1: Gradient system
Consider the SDE
dX_t = −∇V(X_t) dt + √2 dW_t ,   (2.1.1)
where V: R^n → R is bounded below, and satisfies
∫_{R^n} e^{−V(x)} dx < ∞ .
The generator of the diffusion can be written in the two equivalent ways
L = Δ − ∇V · ∇ = e^V ∇ · e^{−V} ∇
(the factor √2 in (2.1.1) avoids a factor 1/2 in front of the Laplacian). Integrating by parts twice, we find
⟨f, Lg⟩ = −∫ e^{−V(x)} ∇(f(x) e^{V(x)}) · ∇g(x) dx = ⟨L†f, g⟩ ,
with the adjoint generator given by
L†f = ∇·( e^{−V} ∇(e^V f) ) .
In view of Corollary 1.3.22, this shows that
ρ(x) = (1/Z) e^{−V(x)} ,   Z = ∫ e^{−V(x)} dx ,
is the density of an invariant probability measure of the diffusion (2.1.1), since it satisfies L†ρ = 0.
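This can be illustrated numerically for a potential without a Gaussian invariant law, say V(x) = x⁴/4 in dimension 1 (our choice, not from the notes): a long Euler–Maruyama simulation of (2.1.1) should produce samples whose second moment matches that of the density Z⁻¹ e^{−V}, computed here by quadrature.

```python
import numpy as np

rng = np.random.default_rng(6)
dt, n_steps, n_paths = 1e-2, 2000, 5000        # total time T = 20

# dX = -V'(X) dt + sqrt(2) dW with V(x) = x^4 / 4, i.e. drift -x^3
X = rng.normal(0.0, 1.0, n_paths)
for _ in range(n_steps):
    X += -X**3 * dt + rng.normal(0.0, np.sqrt(2 * dt), n_paths)

mc_second_moment = float(np.mean(X**2))

# second moment of the density proportional to exp(-V), by quadrature
x = np.linspace(-6.0, 6.0, 4001)
w = np.exp(-x**4 / 4)
quad_second_moment = float(np.sum(x**2 * w) / np.sum(w))
```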
Next, we discuss two very simple examples for which existence or uniqueness of an invari-
ant probability measure fails.
Example 2.1.2: Brownian motion
The transition density of Brownian motion in R^n is a Gaussian of variance t, as we have seen for instance in Section 1.1.3. Therefore, for any fixed x ∈ R^n, we have
lim_{t→∞} p(t,x|0,0) = 0 :
the mass spreads out, and there is no normalisable limiting measure. Therefore, Brownian motion in R^n does not admit an invariant probability measure (though it does admit invariant measures, which are simply all multiples of Lebesgue measure).
Example 2.1.3: Non-irreducible SDE
Consider the SDE in R²
dX_t = −X_t dt + dW_t ,
dY_t = (Y_t − Y_t³) dt .
The diffusion admits three invariant measures, given by
π_−(dx,dy) = (1/√π) e^{−x²} dx δ_{−1}(dy) ,
π_0(dx,dy) = (1/√π) e^{−x²} dx δ_0(dy) ,
π_+(dx,dy) = (1/√π) e^{−x²} dx δ_1(dy) .
This is because the x- and y-components do not interact with each other, and the process X_t is an Ornstein–Uhlenbeck process with Gaussian invariant measure, while Y_t is deterministic, with three invariant points located at ±1 and 0.
The two last examples illustrate a general fact about invariant probability measures of
Markov processes, namely that their existence requires two properties to hold:
1. There should be a mechanism preventing all the probability mass from escaping to infinity. More
precisely, a positive recurrence property is required to hold, that is, the return time to some
bounded set should have finite expectation.
2. There should be a mechanism making the diffusion irreducible, that is, there should not be
any non-trivial invariant sets.
These two conditions are analogous to those that one finds for Markov chains on a count-
able space. The main difference is that discrete-time Markov chains require an aperiodicity
condition to hold in addition. However, this is specific to discrete time, and no such condition
is necessary as soon as transition times between different points are sufficiently random.
2.1.2 The Krylov–Bogoliubov criterion
A general criterion for existence of invariant measures, going back to Krylov and Bogoliubov,
is based on the notion of ergodic averages (or Cesàro means). Given an initial point X_0 ∈ R^n, consider the family of measures
{ (1/T) ∫_0^T P_t(X_0, ·) dt : T ≥ 1 } .   (2.1.2)
In a similar way as for Markov chains, if these ergodic averages converge to a limiting probability measure, then this limit should be invariant. A convergence criterion is given by tightness.
Definition 2.1.4: Tightness
A family {µ_t} of probability measures on R^n is tight if, for any δ > 0, there exists a compact
set K ⊂ R^n such that µ_t(K) ≥ 1 − δ for all t.
Then one has the following existence result, see for instance [DPZ96, Corollary 3.1.2].
Proposition 2.1.5: Existence of an invariant probability measure
If the family of measures (2.1.2) is tight, then there exists an invariant probability measure.
While this criterion may be used to obtain abstract existence results for invariant proba-
bility measures, it is not so easy to apply, because it requires some a priori knowledge of the
semigroup P_t. Therefore, in what follows, we discuss more practical criteria for
analysing invariant measures.
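As an informal numerical sanity check of the ergodic-average idea (our own example, not from the text): for the Ornstein–Uhlenbeck SDE dX_t = −X_t dt + dW_t, whose invariant measure is N(0, 1/2), time averages along a single long trajectory already reproduce the invariant mean and variance.

```python
import numpy as np

def ou_time_average(x0=0.0, dt=1e-2, n_steps=500_000, seed=1):
    """Euler-Maruyama path of dX = -X dt + dW; returns the sampled trajectory."""
    rng = np.random.default_rng(seed)
    path = np.empty(n_steps)
    x = x0
    sqdt = np.sqrt(dt)
    for i in range(n_steps):
        x += -x * dt + sqdt * rng.standard_normal()
        path[i] = x
    return path

path = ou_time_average()
# Time averages approximate the invariant measure N(0, 1/2):
print(path.mean(), path.var())   # approximately 0 and 0.5
```

The empirical law of the time averages concentrates on the invariant Gaussian, exactly the mechanism that the Krylov–Bogoliubov criterion exploits.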
2.2 The Lyapunov function approach by Meyn and Tweedie
We present here some results of the approach based on Lyapunov functions, as developed
in [MT93c] for continuous-time Markov processes. This approach provides a relatively easy
way of proving global existence of solutions, existence of an invariant measure, and some con-
vergence results, provided one can guess an appropriate Lyapunov function.
Definition 2.2.1: Norm-like function
A function V : R^n → R_+ is called norm-like if
\[
\lim_{\|x\| \to \infty} V(x) = +\infty\;.
\]
This means that the level sets {x ∈ R^n : V(x) ≤ h} are precompact for any h > 0.
In the case of ordinary differential equations, Lyapunov functions are norm-like functions
that decrease along orbits of the dynamical system, at least when starting far away from the ori-
gin. The application to SDEs uses quite similar ideas, where the time derivative along orbits is
replaced by the action of the generator. In the following, we will present several of these results,
without giving detailed proofs. A detailed proof of a quite general existence and convergence
result, due to Martin Hairer and Jonathan Mattingly, will be discussed in Section 2.2.4.
2.2.1 Non-explosion and Harris recurrence criteria
A first application of Lyapunov functions is a relatively easy criterion for the existence of
solutions that are global in time.
Theorem 2.2.2: Non-explosion criterion
Assume that there exist a norm-like function V and constants c, d ≥ 0 such that
\[
(\mathcal{L} V)(x) \leq c\,V(x) + d \tag{2.2.1}
\]
for all x ∈ R^n. Then
1. The SDE (2.0.1) admits solutions that are global in time for any starting point x ∈ R^n.
2. There exists an almost surely finite random variable D such that
\[
V(X_t) \leq D\, e^{ct} \quad \forall t \geq 0\;.
\]
The random variable D satisfies the bound
\[
P^x\{D > a\} \leq \frac{V(x)}{a} \quad \forall a > 0\,,\ \forall x \in R^n\;. \tag{2.2.2}
\]
3. The expectation E^x[V(X_t)] is finite for all x ∈ R^n and all t ≥ 0, and satisfies
\[
E^x[V(X_t)] \leq e^{ct}\, V(x)\;. \tag{2.2.3}
\]
Sketch of proof: Consider first the case c = d = 0. Then Itô's formula (cf. Lemma 1.2.6) yields
\[
E^x[V(X_t)] = V(x) + E^x\Bigl[ \int_0^t (\mathcal{L} V)(X_s)\,ds \Bigr] \leq V(x)\;.
\]
This proves (2.2.3), as well as global existence of the solution. The bound (2.2.2) follows from
a slightly more sophisticated stochastic-analysis argument, using the fact that (e^{-cs} V(X_s))_{s \geq 0}
is a supermartingale.
The case c > 0 and d = 0 follows in a similar way from the Feynman–Kac formula (see
Theorem 1.3.24), while the remaining cases can be reduced to those already treated by modifying
the Lyapunov function.
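As a quick symbolic illustration (our own example, not from the text): for the one-dimensional Ornstein–Uhlenbeck SDE dX_t = −X_t dt + dW_t, the generator is (L V)(x) = −x V′(x) + ½ V″(x), and the candidate V(x) = x² can be checked mechanically.

```python
import sympy as sp

x = sp.symbols('x', real=True)
V = x**2                                   # candidate Lyapunov function
drift = -x                                 # OU drift b(x) = -x, unit noise
# Generator of dX = b(X) dt + dW applied to V:
LV = drift * sp.diff(V, x) + sp.Rational(1, 2) * sp.diff(V, x, 2)
print(sp.expand(LV))                       # -2*x**2 + 1
```

Thus L V = −2x² + 1 ≤ 0·V + 1, so (2.2.1) holds with c = 0 and d = 1; in fact L V = −2V + 1, which is the stronger drift condition (2.2.6) below with c = 2, d = 1.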
Remark 2.2.3: Stopping times
To rule out the possibility of finite-time blow-up, the actual argument given in [MT93c,
Theorem 2.1] uses a more careful computation based on the process killed when leaving a
large ball, whose radius is then sent to infinity. This requires in particular using Dynkin’s
formula instead of Ito’s formula.
The following result gives a condition under which solutions remain bounded almost surely
(a property called non-evanescence in [MT93c]), which is slightly stronger than (2.2.1). For a
proof, see [MT93c, Theorem 3.1].
Theorem 2.2.4: Non-evanescence condition
Assume there exist a compact set C ⊂ R^n, a constant d > 0, and a norm-like function V
such that
\[
(\mathcal{L} V)(x) \leq d\, \mathbf{1}_C(x) \tag{2.2.4}
\]
for all x ∈ R^n. Then
\[
P\Bigl\{ \lim_{t \to \infty} \|X_t\| = \infty \Bigr\} = 0\;.
\]
As discussed above, in order to obtain existence of an invariant measure, we will need some
stronger form of recurrence condition. Recall that a discrete-space Markov chain is said to
be recurrent if it almost surely returns to its starting point, and thus visits this point infinitely
often. Such a property cannot hold for continuous-space processes, since sets of measure 0 may
never be hit. The relevant concept is Harris recurrence.
Definition 2.2.5: Harris recurrence
The diffusion (X_t)_{t ≥ 0} is called Harris recurrent if there exists a σ-finite measure µ such that
whenever µ(A) > 0, one has, for all x ∈ R^n,
\[
P^x\{\tau_A < \infty\} = 1\,, \qquad \text{where } \tau_A = \inf\{t \geq 0 : X_t \in A\}\;.
\]
Equivalently, the diffusion is Harris recurrent if there exists a σ-finite measure ν such that
whenever ν(A) > 0, one has, for all x ∈ R^n,
\[
P^x\{\eta_A = \infty\} = 1\,, \qquad \text{where } \eta_A = \int_0^\infty \mathbf{1}_{X_t \in A}\,dt\;.
\]
The equivalence of the two definitions is well-known for Markov chains on countable
spaces, and a proof in the general case can be found for instance in [MT93a, Theorem 1.1].
The interest of this definition is the following classical result [ADR69, Get80].
Proposition 2.2.6: Existence of an invariant measure
If (Xt)t>0 is Harris recurrent, then it admits an (essentially unique) invariant measure pi.
One way of showing Harris recurrence is to use the (somewhat tricky) notion of petite sets.
Definition 2.2.7: Petite set
Let a be a probability distribution on R_+, and define a Markov kernel K_a by
\[
K_a(x,A) = \int_0^\infty P_t(x,A)\,a(dt)
\]
for any Borel set A ⊂ R^n. Let φ_a be a nontrivial measure on R^n. A non-empty Borel
set C ⊂ R^n is called φ_a-petite if K_a(x,A) ≥ φ_a(A) for all x ∈ C and all Borel sets A ⊂ R^n.
The intuition behind this definition is as follows. If we choose, say, a = δ_1, then K_a(x,A) =
P_1(x,A). The condition K_a(x,A) ≥ φ_a(A) then requires that the probability of being in the set A
at time 1 is bounded below by a measure φ_a(A) independent of the starting point x in the petite
set (the measure φ_a need not be a probability measure). This would be quite restrictive, but the
definition allows one to replace P_1(x,A) by an average of P_t(x,A) over all times t ≥ 0, a much
weaker requirement. Petite sets allow us to give a first condition for a diffusion to be Harris recurrent.
Theorem 2.2.8: Harris recurrence condition
If all compact subsets of Rn are petite and (2.2.4) holds for a compact set C ⊂ Rn, a con-
stant d > 0, and a norm-like function V , then the process (Xt)t>0 is Harris recurrent, and
therefore admits an essentially unique invariant measure pi.
This result follows from Theorem 2.2.4, combined with [MT93b, Theorem 5.1], which gives
an analogous statement in the discrete-time case.
2.2.2 Positive Harris recurrence and existence of an invariant probability
Combining Theorem 2.2.8 with Proposition 2.2.6, we obtain a condition for the existence of an
invariant measure. This measure need not have finite mass, preventing it from being normal-
isable to yield an invariant probability measure. This motivates the following definition.
Definition 2.2.9: Positive Harris recurrence
Let (Xt)t>0 be a Harris recurrent diffusion, with invariant measure pi. If pi(Rn) < ∞, then
(Xt)t>0 is said to be positive Harris recurrent.
Remark 2.2.10: Link with expected return times
A diffusion satisfying E^x[\tau_A] < ∞ for all x ∈ R^n and all Borel sets A such that µ(A) > 0,
for some σ-finite measure µ, is positive Harris recurrent. Indeed, in this case τ_A is almost surely
finite, so that the process is Harris recurrent. Furthermore, given x ∈ R^n and A with µ(A) > 0,
the probability measure π_A(x,·) given by
\[
\pi_A(x,B) = \frac{1}{E^x[\tau_A]}\, E^x\Bigl[ \int_0^{\tau_A} \mathbf{1}_{X_t \in B}\,dt \Bigr]
\]
for any Borel set B ⊂ R^n can easily be shown to be invariant. By essential uniqueness, any
invariant measure is thus normalisable.
A sufficient condition for the process (Xt)t>0 to be positive Harris recurrent will thus auto-
matically be a sufficient condition for the existence of an invariant probability measure. A first
such condition is provided by the following result, which is [MT93c, Theorem 4.2].
Theorem 2.2.11: Positive Harris recurrence condition
Assume there exist constants c, d > 0, a function f : R^n → [1,∞), a closed petite set C ⊂ R^n,
and a positive function V such that
\[
(\mathcal{L} V)(x) \leq -c\,f(x) + d\, \mathbf{1}_C(x) \tag{2.2.5}
\]
for all x ∈ R^n. Assume furthermore that V is bounded on C. Then the process (X_t)_{t ≥ 0} is
positive Harris recurrent, and therefore admits an invariant probability measure π. Fur-
thermore,
\[
\langle \pi, f \rangle := E^\pi[f] = \int_{R^n} f(x)\,\pi(dx) < \infty\;.
\]
While Condition (2.2.5) is usually easy to check for a given f and V , the requirement that C
be petite may be harder to verify. Fortunately, for Feller diffusions, there exists an alternative
criterion for the existence of an invariant measure, which avoids having to check that C is
petite. The following result is [MT93c, Theorem 4.5].
Theorem 2.2.12: Existence of invariant probability measures for Feller diffusions
Assume that the diffusion (X_t)_{t ≥ 0} has the Feller property, and that (2.2.5) holds for a com-
pact set C ⊂ R^n. Then the diffusion admits an invariant probability measure π. Furthermore,
any invariant probability measure π satisfies ⟨π, f⟩ ≤ d/c.
2.2.3 Convergence to the invariant probability measure
Once the existence of an invariant probability measure pi is established, the next natural ques-
tion is whether the distribution of (Xt)t>0 will converge to pi, at least under some conditions on
the law of X0. There are many choices of norms quantifying such a convergence, and results
exist for several of them. Here we consider the following weighted norm on signed measures.
Definition 2.2.13: f -norm of a signed measure
Let µ be a signed measure on (Rn,B ), and let f : Rn → [1,∞) be a measurable function.
Then we define the f-norm of µ by
\[
\|\mu\|_f = \sup_{g : |g| \leq f} \bigl| \langle \mu, g \rangle \bigr|\,, \qquad
\langle \mu, g \rangle := E^\mu[g] = \int_{R^n} g(x)\,\mu(dx)\,,
\]
where the supremum runs over all measurable functions g such that |g(x)| ≤ f(x) for all
x ∈ R^n.
Definition 2.2.14: Exponential ergodicity
Given a measurable function f : R^n → [1,∞), a diffusion process (X_t)_{t ≥ 0} admitting an
invariant probability measure π is called f-exponentially ergodic if there exist β > 0 and a
function B : R^n → R_+ such that
\[
\|P_t(x,\cdot) - \pi\|_f \leq B(x)\, e^{-\beta t}
\]
for all x ∈ R^n and t ≥ 0.
The following result, which is [MT93c, Theorem 6.1], provides a condition on Lyapunov
functions that guarantees exponential ergodicity.
Theorem 2.2.15: Condition for exponential ergodicity
Assume there exist a norm-like function V and constants c > 0, d ∈ R such that the diffu-
sion (X_t)_{t ≥ 0} satisfies the condition
\[
(\mathcal{L} V)(x) \leq -c\,V(x) + d \tag{2.2.6}
\]
for all x ∈ R^n. Assume further that all compact sets K ⊂ R^n are petite for some discrete-time
skeleton chain (X_{n\Delta})_{n \geq 0}. Then the diffusion is exponentially ergodic. More precisely, there
exist constants β, b > 0 such that
\[
\|P_t(x,\cdot) - \pi\|_{1+V} \leq b\,(1 + V(x))\, e^{-\beta t}
\]
for all x ∈ R^n and t ≥ 0.
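For the Ornstein–Uhlenbeck example above, exponential ergodicity can be observed directly, since both P_t(x,·) = N(x e^{−t}, (1 − e^{−2t})/2) and π = N(0, 1/2) are explicit Gaussians. The following grid-based computation of the L¹ (total variation) distances is our own illustration, not part of [MT93c]:

```python
import numpy as np
from scipy.stats import norm

def tv_gauss(m1, s1, m2, s2, grid):
    """L1 distance between two Gaussian densities, evaluated on a grid."""
    p = norm.pdf(grid, m1, s1)
    q = norm.pdf(grid, m2, s2)
    return np.sum(np.abs(p - q)) * (grid[1] - grid[0])

x0 = 2.0
grid = np.linspace(-8.0, 8.0, 8001)
dists = []
for t in (1.0, 2.0, 3.0):
    m = x0 * np.exp(-t)                     # mean of P_t(x0, .)
    s = np.sqrt((1 - np.exp(-2 * t)) / 2)   # std of P_t(x0, .)
    dists.append(tv_gauss(m, s, 0.0, np.sqrt(0.5), grid))
print(dists)                                # decreasing, roughly like e^{-t}
```

The successive ratios of the distances are close to e^{−1}, consistent with an exponential rate β = 1 for this diffusion.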
Again, while condition (2.2.6) is usually easy to check for a given Lyapunov function V ,
the requirement that all compact subsets be petite is often more difficult to verify. This is why
we present in the next section an alternative approach, due to Martin Hairer and Jonathan
Mattingly, that provides an exponential ergodicity criterion in a slightly different norm, and
avoids any condition on sets being petite.
2.2.4 A simplified proof by Hairer and Mattingly
We present here a convergence criterion from [HM11], which applies to discrete-time Markov
processes. This is, however, not really a restriction: if P_t is the semigroup of a diffusion
(X_t)_{t ≥ 0}, then for any δ > 0, P_δ generates the embedded discrete-time Markov chain (X_{nδ})_{n ∈ N},
obtained by restricting t to integer multiples of δ. An invariant measure π of the diffusion is
clearly also invariant for the discrete-time Markov chain, and it is not difficult to convert a
convergence result in discrete time into a convergence result in continuous time. To avoid con-
fusion, we will denote the discrete semigroup by P, and its actions on bounded measurable
functions f and signed measures µ by
\[
(P f)(x) = E^x[f(X_\delta)] := \int_{R^n} f(y)\, P(x,dy)\,, \qquad
(\mu P)(A) = P^\mu\{X_\delta \in A\} := \int_{R^n} P(x,A)\,\mu(dx)\;.
\]
The convergence result of [HM11] requires two rather simple conditions on P . The first one is
a discrete-time analogue of (2.2.6), which guarantees that the Markov process does not escape
to infinity.
Assumption 2.2.16: Geometric drift condition
There exist a function V : R^n → [0,∞) and constants d ≥ 0 and γ ∈ (0,1) such that
\[
(P V)(x) \leq \gamma\,V(x) + d \tag{2.2.7}
\]
for all x ∈ R^n.
Note that γ is the discrete-time analogue of e^{-δc} in continuous time, so that (2.2.7) is indeed
the discrete analogue of (2.2.6). The second condition is a form of irreducibility condition,
analogous to the conditions on sets being petite that we encountered above, but much simpler
to verify. It is also similar to what is known as the Doeblin condition in the Markov chain literature.
Assumption 2.2.17: Minorisation condition
Let C = {x ∈ R^n : V(x) < R} for some R > 2d(1 − γ)^{-1}. Then there exist α ∈ (0,1) and a
probability measure ν such that
\[
\inf_{x \in C} P(x,A) \geq \alpha\, \nu(A) \tag{2.2.8}
\]
holds for all Borel sets A ⊂ R^n.
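To see what the two assumptions look like in practice, consider the toy chain X_{n+1} = γX_n + ξ_n with ξ_n ∼ N(0,1) and V(x) = x² (our own illustration; the constants below are one admissible choice, not the ones used in [HM11]). The drift condition holds because P V(x) = γ²x² + 1 ≤ γV(x) + 1, and the minorisation mass α can be computed by integrating the pointwise infimum of the transition densities over C:

```python
import numpy as np
from scipy.stats import norm

gamma, d = 0.5, 1.0                  # toy chain X' = gamma*X + N(0,1); V(x) = x^2
# Drift (2.2.7): E[V(X') | X = x] = gamma^2 x^2 + 1 <= gamma V(x) + d.
R = 2 * d / (1 - gamma) + 1          # any R > 2d/(1 - gamma) works; C = {V < R}
r = np.sqrt(R)
ys = np.linspace(-10.0, 10.0, 4001)
dy = ys[1] - ys[0]
# For x in [-r, r], the transition density phi(y - gamma*x) is minimised in x
# at one of the endpoints x = +-r, since phi is unimodal:
min_dens = np.minimum(norm.pdf(ys - gamma * r), norm.pdf(ys + gamma * r))
alpha = np.sum(min_dens) * dy        # total mass of the minorising measure
print(alpha > 0)                     # True: (2.2.8) holds with nu = min_dens/alpha
```

Here α ≈ 2(1 − Φ(γr)) is strictly positive, so both Assumptions 2.2.16 and 2.2.17 are satisfied for this chain.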
Under these two conditions, the main result of [HM11] is the following statement, which
provides both existence of a unique invariant measure and convergence to this measure. Con-
vergence takes place in the weighted supremum norm defined by
\[
\|f\|_{1+V} = \sup_{x \in R^n} \frac{|f(x)|}{1 + V(x)}\;.
\]
Theorem 2.2.18: Exponential ergodicity in discrete time
If Assumptions 2.2.16 and 2.2.17 hold, then P admits a unique invariant probability mea-
sure π. Furthermore, there exist constants M > 0 and \bar\gamma ∈ (0,1) such that
\[
\|P^n f - \langle \pi, f \rangle\|_{1+V} \leq M\, \bar\gamma^{\,n}\, \|f - \langle \pi, f \rangle\|_{1+V}
\]
holds for all measurable functions f : R^n → R such that \|f\|_{1+V} < ∞.
Since the proof of this result is rather elementary (and also quite elegant), we will provide
the details of it in the remainder of this section. Our presentation follows closely the arti-
cle [HM11]. We first recall the definition of the total variation distance between probability
measures.
Definition 2.2.19: Total variation distance
Let µ and ν be two probability measures on a measurable space (X, F). The total variation
distance between µ and ν is defined as
\[
\|\mu - \nu\|_{TV} = 2 \sup\bigl\{ |\mu(A) - \nu(A)| : A \in F \bigr\}\;.
\]
It is known that the total variation distance between µ and ν coincides with the L^1-distance,
that is,
\[
\|\mu - \nu\|_{TV} = \int_X |\mu - \nu|(dx)\;.
\]
This can be seen, for instance, by assuming that the measures have densities with respect to
Lebesgue measure.
The main idea of the proof of Theorem 2.2.18 is to work with a whole family of equivalent
norms. Instead of just \|f\|_{1+V}, we thus consider the norms \|f\|_{1+\beta V}, where β > 0 is a scale
parameter. We also consider the dual metric on probability measures given by
\[
\rho_\beta(\mu,\nu) = \sup_{f : \|f\|_{1+\beta V} \leq 1} \int_{R^n} f(x)\,(\mu - \nu)(dx)\;. \tag{2.2.9}
\]
By an argument similar to the one yielding the equivalence between the two definitions of the
total variation norm, the distance (2.2.9) is in fact equal to the weighted total variation distance
\[
\rho_\beta(\mu,\nu) = \int_{R^n} \bigl( 1 + \beta V(x) \bigr)\, |\mu - \nu|(dx)\;.
\]
The key result for this distance is the following.
Proposition 2.2.20: Contraction estimate in ρβ distance
If Assumptions 2.2.16 and 2.2.17 hold, then there exist \bar\alpha ∈ (0,1) and β > 0 such that
\[
\rho_\beta(\mu P, \nu P) \leq \bar\alpha\, \rho_\beta(\mu,\nu)
\]
holds for all probability measures µ, ν on R^n. More precisely, for any α_0 ∈ (0,α) and any
γ_0 ∈ (γ + 2dR^{-1}, 1), one can choose
\[
\beta = \frac{\alpha_0}{d}\,, \qquad
\bar\alpha = \bigl( 1 - (\alpha - \alpha_0) \bigr) \vee \frac{2 + R\beta\gamma_0}{2 + R\beta}\;.
\]
To prove this result, we introduce an alternative characterisation of ρ_β. Consider first the function
\[
d_\beta(x,y) =
\begin{cases}
0 & \text{if } x = y\,, \\
2 + \beta V(x) + \beta V(y) & \text{if } x \neq y\,,
\end{cases}
\]
which can easily be checked to be a metric on R^n. This metric induces the Lipschitz seminorm
\[
|||f|||_\beta = \sup_{x \neq y} \frac{|f(x) - f(y)|}{d_\beta(x,y)}\,, \tag{2.2.10}
\]
and a dual metric on probability measures given by
\[
\rho^*_\beta(\mu,\nu) = \sup_{f : |||f|||_\beta \leq 1} \int_{R^n} f(x)\,(\mu - \nu)(dx)\;.
\]
Note that the supremum is taken over a different set of functions than in (2.2.9).
Lemma 2.2.21: Equivalence of norms
We have
\[
|||f|||_\beta = \inf_{c \in R} \|f + c\|_{1+\beta V}\;.
\]
In particular, ρ^*_β = ρ_β.
Proof: We first note that, since |f(x)| ≤ \|f\|_{1+\beta V}(1 + βV(x)) for all x ∈ R^n, we have
\[
\frac{|f(x) - f(y)|}{2 + \beta V(x) + \beta V(y)}
\leq \frac{|f(x)| + |f(y)|}{2 + \beta V(x) + \beta V(y)}
\leq \|f\|_{1+\beta V}
\]
for all x ≠ y, so that |||f|||_β ≤ \|f\|_{1+\beta V}. Since, by the definition (2.2.10), |||f + c|||_β = |||f|||_β
for any constant c, it follows that
\[
|||f|||_\beta \leq \inf_{c \in R} \|f + c\|_{1+\beta V}\;.
\]
To prove the reverse inequality, we take f with |||f|||_β ≤ 1 and set
\[
c^* = \inf_{x \in R^n} \bigl( 1 + \beta V(x) - f(x) \bigr)\;.
\]
For any x, y ∈ R^n, we have
\[
f(x) \leq |f(y)| + |f(x) - f(y)| \leq |f(y)| + 2 + \beta V(x) + \beta V(y)\,,
\]
which implies
\[
1 + \beta V(x) - f(x) \geq -1 - \beta V(y) - |f(y)|\;.
\]
Since V(y) is finite at at least one point, c^* is bounded below, and hence |c^*| < ∞. Now we
observe that, on one hand,
\[
f(x) + c^* \leq f(x) + 1 + \beta V(x) - f(x) = 1 + \beta V(x)\,,
\]
while on the other hand,
\[
f(x) + c^* = \inf_{y \in R^n} \bigl( f(x) + 1 + \beta V(y) - f(y) \bigr)
\geq \inf_{y \in R^n} \bigl( 1 + \beta V(y) - |||f|||_\beta\, d_\beta(x,y) \bigr)
\geq -1 - \beta V(x)\,,
\]
where we have used the fact that |||f|||_β ≤ 1 and d_β(x,y) ≤ 2 + βV(x) + βV(y). Hence
|f(x) + c^*| ≤ 1 + βV(x), and thus
\[
\inf_{c \in R} \|f + c\|_{1+\beta V} \leq \|f + c^*\|_{1+\beta V} \leq 1\,,
\]
proving the reverse inequality. The equality of ρ^*_β and ρ_β follows from the fact that the unit
balls {f : \|f\|_{1+\beta V} \leq 1} and {f : |||f|||_β ≤ 1} only differ by additive constants.
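The lemma can also be checked numerically on a finite grid, where the same proof applies verbatim (a sketch of our own; the grid, the weight V, and the test function are arbitrary choices):

```python
import numpy as np

beta = 0.5
xs = np.linspace(-5.0, 5.0, 401)
V = xs**2                       # Lyapunov-type weight on the grid
f = np.sin(xs)                  # arbitrary bounded test function

# Lipschitz seminorm |||f|||_beta for d_beta(x, y) = 2 + beta V(x) + beta V(y):
diff = np.abs(f[:, None] - f[None, :])
dbeta = 2.0 + beta * (V[:, None] + V[None, :])
lip = (diff / dbeta).max()

# inf over constants c of the weighted sup norm ||f + c||_{1 + beta V}:
cs = np.linspace(-2.0, 2.0, 2001)
norms = [np.max(np.abs(f + c) / (1.0 + beta * V)) for c in cs]
print(abs(lip - min(norms)) < 1e-2)   # True: the two quantities agree
```

Up to the discretisation of the infimum over c, both sides of the lemma coincide, as expected.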
Proof of Proposition 2.2.20. We prove that, under Assumptions 2.2.16 and 2.2.17, one has
\[
|||P f|||_\beta \leq \bar\alpha\, |||f|||_\beta\;. \tag{2.2.11}
\]
Fix a test function f with |||f|||_β ≤ 1. By Lemma 2.2.21, we may assume without loss of
generality that \|f\|_{1+\beta V} ≤ 1 as well. It then suffices to show that
\[
\bigl| (P f)(x) - (P f)(y) \bigr| \leq \bar\alpha\, d_\beta(x,y)\;.
\]
Since the claim is true for x = y, we consider the case x ≠ y, and treat separately the cases
V(x) + V(y) ≥ R and V(x) + V(y) < R.
• If V(x) + V(y) ≥ R, we note that
\[
\bigl| (P f)(x) \bigr| \leq \|f\|_{1+\beta V} \int_{R^n} \bigl( 1 + \beta V(y) \bigr) P(x,dy)
\leq 1 + \beta (P V)(x)\;. \tag{2.2.12}
\]
Therefore, the geometric drift condition (2.2.7) yields
\[
\bigl| (P f)(x) - (P f)(y) \bigr|
\leq 2 + \beta(P V)(x) + \beta(P V)(y)
\leq 2 + \beta\gamma V(x) + \beta\gamma V(y) + 2\beta d
\leq 2 + \beta\gamma_0 V(x) + \beta\gamma_0 V(y)\,,
\]
where we have set γ_0 = γ + 2dR^{-1} and used the fact that 2d ≤ 2dR^{-1}(V(x) + V(y))
whenever V(x) + V(y) ≥ R. We now set
\[
\gamma_1 = \frac{2 + \beta R \gamma_0}{2 + \beta R}\;.
\]
One readily checks that 2(1 − γ_1) = βR(γ_1 − γ_0) ≤ β(γ_1 − γ_0)(V(x) + V(y)), so that
\[
\bigl| (P f)(x) - (P f)(y) \bigr| \leq \gamma_1 \bigl( 2 + \beta V(x) + \beta V(y) \bigr)
= \gamma_1\, d_\beta(x,y)\;. \tag{2.2.13}
\]
• If V(x) + V(y) < R, then x, y ∈ C. We introduce the kernel \tilde{P} defined by
\[
\tilde{P}(x,A) = \frac{1}{1-\alpha}\, P(x,A) - \frac{\alpha}{1-\alpha}\, \nu(A)\,,
\]
which is a (nonnegative) Markov kernel on C thanks to the minorisation condition (2.2.8).
Then we have
\[
(P f)(x) = (1-\alpha)(\tilde{P} f)(x) + \alpha \int_{R^n} f(y)\,\nu(dy)\,,
\]
showing that
\[
\bigl| (P f)(x) - (P f)(y) \bigr|
= (1-\alpha) \bigl| (\tilde{P} f)(x) - (\tilde{P} f)(y) \bigr|
\leq (1-\alpha) \bigl[ 2 + \beta(\tilde{P} V)(x) + \beta(\tilde{P} V)(y) \bigr]
\leq 2(1-\alpha) + \beta(P V)(x) + \beta(P V)(y)
\leq 2(1-\alpha) + \gamma\beta \bigl[ V(x) + V(y) \bigr] + 2\beta d\;.
\]
Here, to obtain the second line, we have used a similar argument as in (2.2.12), while the
third line uses the fact that
\[
(\tilde{P} V)(x) \leq \frac{1}{1-\alpha}\, (P V)(x)
\]
since V is non-negative. It follows that, setting
\[
\beta = \frac{\alpha_0}{d}\,, \qquad \gamma_2 = \gamma \vee \bigl( 1 - (\alpha - \alpha_0) \bigr)
\]
for some α_0 ∈ (0,α), one obtains
\[
\bigl| (P f)(x) - (P f)(y) \bigr| \leq \gamma_2\, d_\beta(x,y)\;. \tag{2.2.14}
\]
It follows from (2.2.13) and (2.2.14) that
\[
\bigl| (P f)(x) - (P f)(y) \bigr| \leq \bar\alpha\, d_\beta(x,y)\,, \qquad \bar\alpha = \gamma_1 \vee \gamma_2\;.
\]
Since γ_1 ≥ γ, this implies (2.2.11). The result then follows from the fact that ρ^*_β = ρ_β, and
that d_β induces the metric dual to ||| · |||_β.
To conclude the proof of Theorem 2.2.18, it remains to prove existence of the invariant
measure, which can be done by a contraction argument.
Proof of Theorem 2.2.18. We fix some x ∈ R^n, and define the measures µ_n = P^n δ_x for
n ∈ N. Then, by Proposition 2.2.20, we have
\[
\rho_\beta(\mu_{n+1}, \mu_n) \leq \bar\alpha^{\,n}\, \rho_\beta(\mu_1, \delta_x)
\]
for some ᾱ ∈ (0,1) and β > 0. Therefore, (µ_n) is a Cauchy sequence for ρ_β. It is known that the
total variation distance is complete on the space of measures with finite mass, which implies that
ρ_β is complete on the space of probability measures integrating V. Therefore, there exists
a probability measure µ_∞ such that ρ_β(µ_n, µ_∞) → 0 as n → ∞. In particular, µ_n con-
verges to µ_∞ in total variation. Since the map µ ↦ µP is a contraction in total variation, it
follows that µ_∞ P = lim_{n→∞} µ_n P = lim_{n→∞} µ_{n+1} = µ_∞.
2.3 Garrett Birkhoff’s approach
The aim of this section is to present a slightly different approach to estimating the rate of
convergence to an invariant probability distribution, due to Garrett Birkhoff [Bir57]. Compared
to the approaches we have discussed so far, it has the following advantages:
1. The proof has a more transparent geometric interpretation, which helps in understanding the
minorisation condition (2.2.8) seen in the last section.
2. Like Theorem 2.2.18, the result provides explicit bounds on the rate of convergence to the
invariant probability.
3. The result also works for submarkovian processes, that is, processes in which the total prob-
ability decreases. In that case, it provides information on the principal eigenvalue of the
process, as well as on the spectral gap to the next-to-leading eigenvalue.
As in the last subsection, the approach applies to discrete-time Markov chains. For simplic-
ity, we are going to assume that the transition kernel P has a density p(x,y) defined on X × X
for a domain X ⊂ R^n (or possibly a more general Banach space). The transition kernel thus
acts on bounded measurable functions f and on signed measures µ according to
\[
(P f)(x) = \int_X p(x,y)\, f(y)\,dy = E^x[f(X_1)]\,, \qquad
(\mu P)(dy) = \Bigl( \int_X p(x,y)\,\mu(dx) \Bigr) dy = P^\mu\{X_1 \in dy\}\;. \tag{2.3.1}
\]
The main property that will guarantee convergence to an invariant probability distribution
(or to a so-called quasistationary distribution in the submarkovian case) is the following. Note
the similarity of the lower bound with the minorisation condition (2.2.8).
Definition 2.3.1: Uniform positivity
The transformation P is called uniformly positive if there exist strictly positive functions
s, m : X → R_+ and a constant L such that
\[
s(x)\, m(y) \leq p(x,y) \leq L\, s(x)\, m(y) \quad \forall x, y \in X\;. \tag{2.3.2}
\]
The main result we are going to prove is the following.
Theorem 2.3.2: Perron–Frobenius theorem and spectral gap estimate
If P is uniformly positive, there exist λ_0 > 0, a bounded measurable function h_0 : X → R_+,
and a probability measure π_0 on X such that
\[
(P h_0)(x) = \lambda_0\, h_0(x)\,, \qquad (\pi_0 P)(A) = \lambda_0\, \pi_0(A)
\]
for all x ∈ X and all Borel sets A ⊂ X. In particular, in the Markovian case P(x, X) = 1 for
all x ∈ X, one has λ_0 = 1 and h_0(x) = 1 for all x ∈ X.
Furthermore, for any bounded measurable f : X → R_+, there exist finite constants
M_1(f), M_2(f) such that
\[
\bigl| (P^n f)(x) - \lambda_0^n M_1(f)\, h_0(x) \bigr|
\leq M_2(f)\, \lambda_0^n \Bigl( 1 - \frac{1}{L^2} \Bigr)^{\!n} h_0(x)
\]
for all x ∈ X. In particular,
\[
M_1(f) = \frac{\langle \pi_0, f \rangle}{\langle \pi_0, h_0 \rangle}\,,
\]
which reduces in the Markovian case to M_1(f) = ⟨π_0, f⟩.
The number λ0 is called the principal eigenvalue of the Markov process, the function h0
is called the principal eigenfunction, while pi0 is called the quasistationary distribution (in the
submarkovian case, when λ0 < 1), and is equal to the stationary distribution in the Markovian
case, when λ0 = 1. The first part of this theorem is thus a generalisation to integral operators
of the well-known Perron–Frobenius theorem, which was first obtained by Jentzsch [Jen12].
In what follows, we are going to provide a detailed proof of Theorem 2.3.2, starting with
some simpler situations in order to build the intuition.
2.3.1 Two-dimensional case
The simplest case occurs when X is a discrete set of cardinality 2. Then P is a linear operator
on E = R^2, that is, a 2 × 2 matrix
\[
P = \begin{pmatrix} a & b \\ c & d \end{pmatrix}
\]
with strictly positive entries. Therefore P maps the cone E^+ = R_+ × R_+ strictly into itself.
Iterating P, the image of E^+ becomes thinner and thinner, and concentrates on the eigenvector
of P associated with the largest eigenvalue. However, unless the principal eigenvalue λ_0 of P is
equal to 1, iterates of a vector in E^+ will not converge to an eigenvector: they shrink to 0 if
λ_0 < 1 and diverge if λ_0 > 1.
To avoid this, one can identify all vectors f, g such that f = λg for some λ > 0. In other
words, this amounts to working on the projective line. Iterates of a ray in E^+ will then
converge to the eigenspace associated with λ_0.
Birkhoff introduces Hilbert’s projective metric by defining, for f = (f1, f2) and g = (g1, g2) ∈
E +, the distance
θ(f ,g) =
∣∣∣∣∣log(f2g1f1g2
)∣∣∣∣∣ .
Note that this distance is infinite if f or g belongs to a coordinate axis; in fact, it induces a
hyperbolic geometry. Also note that by definition,
θ(λf ,µg) = θ(f ,g) ∀λ,µ > 0 . (2.3.3)
Birkhoff then computes the operator norm of P for this metric, showing that
sup
f ,g∈E +
θ(P f ,P g)
θ(f ,g)
= tanh
(

4
)
, (2.3.4)
where ∆ = |log(ad/bc)| is the diameter, in the projective norm, of P E +. The computation is
based on the fact that z = f2/f1 evolves according to the homographic transformation
z 7→ c+ dz
a+ bz
.
However, the details of this computation will not matter in what follows.
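Birkhoff's contraction bound (2.3.4) is easy to test numerically for a concrete positive matrix (our own sketch; the matrix and sampling scheme are arbitrary choices):

```python
import numpy as np

def hilbert_dist(f, g):
    """Hilbert projective metric on the open positive quadrant of R^2."""
    return abs(np.log((f[1] * g[0]) / (f[0] * g[1])))

P = np.array([[2.0, 1.0],
              [1.0, 3.0]])                     # positive matrix (a b; c d)
delta = abs(np.log(P[0, 0] * P[1, 1] / (P[0, 1] * P[1, 0])))
bound = np.tanh(delta / 4)                     # Birkhoff's contraction constant

rng = np.random.default_rng(0)
ratios = []
for _ in range(1000):
    f, g = rng.uniform(0.1, 10.0, size=(2, 2))
    ratios.append(hilbert_dist(P @ f, P @ g) / hilbert_dist(f, g))
print(max(ratios) <= bound + 1e-12)            # True: no pair beats tanh(delta/4)
```

For this matrix, ∆ = log 6, so the contraction constant tanh(∆/4) ≈ 0.42 is strictly less than 1, and no randomly sampled pair of positive vectors exceeds it.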
2.3.2 General vector space
Let now E be a general vector space, of finite or infinite dimension. Again, let E + be the cone
of elements whose components are all non-negative.
Definition 2.3.3: Projective metric
Let f, g ∈ E^+. Consider the two-dimensional subspace spanned by f and g. Its intersection
C with E^+ is a cone (it is invariant under multiplication by positive constants). There exists
a linear map A mapping C to R_+ × R_+. We define
\[
\theta(f, g; E^+) = \theta(Af, Ag)\;.
\]
The definition does not depend on the choice of the map A (this follows from (2.3.3)). θ is
called the projective metric associated with E^+.
To understand this definition better, consider the line {f_α = f − αg : α ∈ R}. If α ≤ 0, then f_α,
being the sum of two positive elements, is in E^+. When α > 0, however, the components of f_α
decrease as α increases, and change sign at some point. Let
\[
\alpha^* = \sup\{\alpha > 0 : f - \alpha g \in E^+\}\;. \tag{2.3.5}
\]
Similarly, we define
\[
\beta^* = \sup\{\beta > 0 : g - \beta f \in E^+\}\;. \tag{2.3.6}
\]
Then the linear map with matrix (in the basis (f, g))
\[
A = \begin{pmatrix} 1 & \beta^* \\ \alpha^* & 1 \end{pmatrix}
\]
maps f − α^* g to a multiple of (1, 0) and g − β^* f to a multiple of (0, 1). It thus satisfies the
definition. Furthermore, we have Af = (1, α^*) and Ag = (β^*, 1), so that
\[
\theta(f, g; E^+) = \theta(Af, Ag) = \bigl| \log(\alpha^* \beta^*) \bigr|\;. \tag{2.3.7}
\]
Proposition 2.3.4: Operator norm of P
Let P : E + → E + be a linear map. If P E + has finite diameter ∆, then the operator norm
of P is given by
sup
f ,g∈E +
θ(P f ,P g;E +)
θ(f ,g;E +)
= tanh
(

4
)
.
Proof: If θ(f ,g;E +) < ∞, let a,b be the endpoints of the intersection of E + with the line
through f and g. By definition of ∆, θ(P a,P b;E +)6∆. Thus the operator norm is bounded by
tanh(∆/4) as a consequence of (2.3.4). To show equality, one uses an approximation argument
for a sequence of (fn, gn) of growing projective distance.
Theorem 2.3.5: Convergence of iterates in projective space
If P E + has finite diameter ∆ and the cone E + is complete with respect to the distance θ,
then there is a unique ray h in E + to which P nf converges for all f ∈ E +.
The proof is a standard contraction argument. A proof of completeness will be given in
Corollary 2.3.10 below.
2.3.3 Integral transformations and Jentzsch’s theorem
We now return to the situation described at the beginning of this section, where X is a Borel
set of Rn, and E is the Banach space of continuous functions f : X → R, equipped with the
supremum norm. Let E + denote the cone of positive functions f : X → R+. Consider the
integral operator P : E +→ E + defined by (2.3.1).
Proposition 2.3.6: Bound on the projective diameter
If P satisfies the uniform positivity condition (2.3.2), then the diameter of P E^+ satisfies
∆ ≤ 2 log L.
Proof: Let f, g ∈ E^+. Without loss of generality, we may assume that
\[
\int_X f(y)\, m(y)\,dy = \int_X g(y)\, m(y)\,dy = 1\;.
\]
This implies
\[
s(x) \leq (P f)(x),\ (P g)(x) \leq L\, s(x) \quad \forall x \in X\;.
\]
It follows that (P f) − \frac{1}{L}(P g) ≥ 0 and (P g) − \frac{1}{L}(P f) ≥ 0. Thus α^*, β^* defined in
(2.3.5) and (2.3.6) are greater than or equal to 1/L, that is, 1/α^*, 1/β^* ≤ L, and the result
follows from (2.3.7).
Applying Theorem 2.3.5, we recover Jentzsch's generalisation of the Perron–Frobenius the-
orem to integral operators [Jen12]:
Theorem 2.3.7: Perron–Frobenius theorem for integral operators
If P is uniformly positive, then there exist a strictly positive h_0 ∈ E^+ and λ_0 > 0 such that
P h_0 = λ_0 h_0. Moreover, for any f ∈ E^+, the sequence of rays spanned by P^n f converges
to the ray spanned by h_0.
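A discretised version of this statement can be observed numerically: replacing the integral operator by a quadrature matrix, power iteration on a strictly positive kernel converges to the principal eigenpair (our own sketch; the kernel, grid, and iteration count are arbitrary choices):

```python
import numpy as np

n = 200
xs = np.linspace(0.0, 1.0, n)
w = 1.0 / n                                    # crude quadrature weight
K = np.exp(-np.subtract.outer(xs, xs) ** 2)    # strictly positive kernel p(x, y)

f = np.ones(n)
for _ in range(50):                            # power iteration: f <- P f, renormalised
    f = K @ f * w
    f /= np.linalg.norm(f)                     # track the ray only

lam0 = (K @ f * w)[0] / f[0]                   # approximate principal eigenvalue
evals = np.linalg.eigvalsh(K * w)              # K is symmetric here
print(abs(lam0 - evals.max()) < 1e-8)          # True: power iteration found lambda_0
```

The normalisation step implements exactly the projective point of view: only the ray spanned by P^n f matters, and it converges to the ray of the (strictly positive) principal eigenfunction.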
Remark 2.3.8: Dual picture
The dual map P^* given by
\[
(P^* v)(y) := (v P)(y) = \int_X v(x)\, p(x,y)\,dx
\]
satisfies ⟨P^* v, f⟩ = ⟨v, P f⟩, where ⟨·,·⟩ denotes the usual inner product. Jentzsch's theorem
shows the existence of a strictly positive function p_0 such that P^* p_0 = λ_0 p_0, with a similar
convergence property. The eigenvalue λ_0 is the same, since
\[
\lambda_0 \langle p_0, h_0 \rangle = \langle p_0, P h_0 \rangle = \langle P^* p_0, h_0 \rangle
\]
and ⟨p_0, h_0⟩ > 0. Furthermore, we have, for any f ∈ E^+,
\[
\lambda_0^n \langle p_0, f \rangle = \langle (P^*)^n p_0, f \rangle = \langle p_0, P^n f \rangle\,,
\]
which implies that
\[
\lim_{n \to \infty} \frac{P^n f}{\lambda_0^n} = c(f)\, h_0\,, \qquad
\text{where } c(f) = \frac{\langle p_0, f \rangle}{\langle p_0, h_0 \rangle}\;.
\]
2.3.4 Banach lattices and spectral gap
Birkhoff extends the theory to Banach lattices, that is, Banach spaces E with a (partial) order
in which every pair of elements f ,g admits an infimum f ∧g and a supremum f ∨g. Examples
of vector lattices include
1. the space of continuous functions f : X → R, equipped with the supremum norm, with
pointwise order given by
f 6 g ⇔ f (x)6 g(x) ∀x ∈X ,
and
(f ∧ g)(x) = f (x)∧ g(x) and (f ∨ g)(x) = f (x)∨ g(x) ;
2. the space of bounded measurable functions f :X → R, with the same norm;
3. the space of finite signed measures µ on X , equipped with the L1-norm.
The last two examples are associated with Markov kernels P(x,A) and the (dual) maps in-
troduced in (2.3.1). As in Definition 2.3.1, the Markov kernel is called uniformly positive if
there exist a positive function s, a measure ν (absolutely continuous with respect to Lebesgue
measure, with strictly positive density¹), and a constant L such that
\[
s(x)\, \nu(A) \leq P(x,A) \leq L\, s(x)\, \nu(A) \quad \forall x \in X\,,\ \forall A \subset X\;.
\]
A computation similar to the one above shows that P E^+ has projective diameter ∆ ≤ 2 log L.
Similar arguments as before then show that P admits a unique principal eigenvalue λ_0, a
measure π_0 such that π_0 P = λ_0 π_0, called the quasistationary distribution, and a positive
function h_0 such that P h_0 = λ_0 h_0.
¹For results on more general measures, see [Num84, Ore71].
We now examine the speed of convergence of iterates of a positive map P for a general
Banach lattice. The following proposition is a key result.
Proposition 2.3.9: Strong comparability
Any f, g ∈ E^+ are strongly comparable, in the sense that there exist strictly positive con-
stants α, β, R such that
\[
\alpha f \leq g \leq R\,\alpha f\,, \qquad \beta g \leq f \leq R\,\beta g\;. \tag{2.3.8}
\]
The optimal constant is R = e^{\theta(f,g;E^+)}.
Proof: Let A be the linear map of Definition 2.3.3, and write Af = (f_1, f_2), Ag = (g_1, g_2).
Assume without loss of generality that f_1 g_2 ≥ f_2 g_1. Then
\[
f_1 (Ag) - g_1 (Af) = (0,\, f_1 g_2 - g_1 f_2) \in R_+ \times R_+\,,
\]
and thus f_1 g − g_1 f ∈ E^+. Similarly, we have g_2 f − f_2 g ∈ E^+. This shows that
\[
\frac{g_1}{f_1}\, f \leq g \leq \frac{g_2}{f_2}\, f = e^{\theta(f,g;E^+)}\, \frac{g_1}{f_1}\, f\,,
\]
and thus the first inequality in (2.3.8) holds with α = g_1/f_1. The proof of the second inequality
is analogous.
A first consequence of this result is that we can prove completeness.
Corollary 2.3.10: Completeness of the metric
If ‖f‖ = ‖g‖ = 1, then
\[
\|f - g\| \leq e^{\theta(f,g;E^+)} - 1\;.
\]
As a consequence, in the metric defined by θ, any θ-connected component of the unit
sphere is a complete metric space.
Proof: If ‖f‖ = ‖g‖ = 1, then (2.3.8) holds with α ≤ 1 ≤ Rα and R = e^{\theta(f,g;E^+)}. Thus
\[
\|f - g\| = \|f \vee g - f \wedge g\| \leq \|R\alpha f - \alpha f\| = (R - 1)\,\alpha\,\|f\| \leq R - 1\,,
\]
as claimed.
It follows that Theorem 2.3.5 indeed applies in this setting. Let us finally derive a spectral-gap
estimate.
Proof of Theorem 2.3.2. Denote P^n f by f_n. For any n, let α_n and β_n be the optimal constants for which

α_n h_0 ≤ f_n / λ_0^n ≤ β_n h_0 .

Such constants exist and are positive for n = 1, because f_1 and h_0 belong to a cone with diameter ∆. Assuming by induction that the above inequality holds for some n, and applying P, we obtain that it holds for n + 1 with

α_n ≤ α_{n+1} ≤ β_{n+1} ≤ β_n .
Define

r_n = f_n − α_n λ_0^n h_0 ∈ E⁺ ,
s_n = β_n λ_0^n h_0 − f_n ∈ E⁺ .  (2.3.9)

We have

r_n + s_n = (β_n − α_n) λ_0^n h_0 ,
P r_n + P s_n = (β_n − α_n) λ_0^{n+1} h_0 .

By Proposition 2.3.9, there exist positive constants a_n, b_n and R ≤ e^∆ such that

a_n h_0 ≤ P r_n ≤ R a_n h_0 ,
b_n h_0 ≤ P s_n ≤ R b_n h_0 .
On the one hand, it follows that

(a_n + b_n) h_0 ≤ P r_n + P s_n = (β_n − α_n) λ_0^{n+1} h_0 ≤ R (a_n + b_n) h_0 .  (2.3.10)

On the other hand, applying P to (2.3.9), we conclude that

(α_n λ_0^{n+1} + a_n) h_0 ≤ P f_n = f_{n+1} ≤ (β_n λ_0^{n+1} − b_n) h_0 .

This yields

α_{n+1} ≥ α_n + a_n / λ_0^{n+1} ,  β_{n+1} ≤ β_n − b_n / λ_0^{n+1} .
Using (2.3.10), it follows that

β_{n+1} − α_{n+1} ≤ (β_n − α_n) − (a_n + b_n) / λ_0^{n+1} ≤ (1 − 1/R) (β_n − α_n) .

This shows that the sequences (α_n) and (β_n) converge to a common limit M_1(f), and thus that f_n / λ_0^n converges to M_1(f) h_0 at rate (1 − 1/R)^n ≤ (1 − e^{−∆})^n.

Finally, the uniform positivity condition (2.3.2) implies that e^{−∆} is bounded below by 1/L², which concludes the proof.
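In finite dimension, the monotone squeeze α_n ↑, β_n ↓ and its geometric rate can be observed directly. The setup below is an illustrative assumption, not part of the proof: a random 8×8 matrix with entries in [1/L, L], so that uniform positivity of the form (2.3.2) holds with m ≡ 1, and a principal eigenpair obtained by power iteration.

```python
import numpy as np

rng = np.random.default_rng(1)
L, n = 2.0, 8
P = rng.uniform(1.0 / L, L, size=(n, n))  # uniformly positive kernel, m = 1

# Principal eigenpair P h0 = lam0 h0 via power iteration.
h0 = np.ones(n)
for _ in range(1000):
    h0 = P @ h0
    h0 /= h0.max()
lam0 = (P @ h0)[0] / h0[0]

# Track the optimal constants alpha_k h0 <= f_k / lam0^k <= beta_k h0.
f = rng.uniform(0.5, 1.5, size=n)
gaps, fn = [], f.copy()
for k in range(1, 51):
    fn = P @ fn
    ratios = fn / (lam0**k * h0)
    gaps.append(ratios.max() - ratios.min())  # beta_k - alpha_k
```

The recorded gaps β_k − α_k decrease geometrically, matching the contraction factor 1 − 1/R of the proof.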
Remark 2.3.11: Dual picture

As in Remark 2.3.8, we have

λ_0^n ⟨π_0, f⟩ = ⟨π_0 P^n, f⟩ = ⟨π_0, P^n f⟩

for all n, which shows that

M_1(f) = ⟨π_0, f⟩ / ⟨π_0, h_0⟩ .
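This dual formula can be verified numerically in a hypothetical finite-dimensional setting, with π_0 computed as a left principal eigenvector; the concrete matrix below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
P = rng.uniform(0.5, 2.0, size=(n, n))  # illustrative positive matrix

# Right and left principal eigenvectors: P h0 = lam0 h0, pi0 P = lam0 pi0.
w, V = np.linalg.eig(P)
k = np.argmax(w.real)
lam0 = w[k].real
h0 = np.abs(V[:, k].real)          # Perron vector, strictly positive
wl, U = np.linalg.eig(P.T)
pi0 = np.abs(U[:, np.argmax(wl.real)].real)

f = rng.uniform(0.5, 1.5, size=n)
N = 40
limit = np.linalg.matrix_power(P, N) @ f / lam0**N  # approximates M1(f) h0
M1 = (pi0 @ f) / (pi0 @ h0)                         # predicted coefficient
```

After a moderate number of iterations, P^N f / λ_0^N agrees with M_1(f) h_0 to high precision, since the subdominant eigenvalue ratio decays exponentially.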
2.3.5 From discrete time to continuous time
We provide here a simple illustration of how the discrete-time results presented in this section
(and in Section 2.2.4) can be applied to continuous-time SDEs. Consider the SDE
dX_t = f(X_t) dt + σ dW_t ,  (2.3.11)

where σ > 0 is a small parameter, and f has a stable equilibrium point at the origin; that is, f(0) = 0 and all the eigenvalues of the Jacobian matrix

A = (∂f/∂x)(0)

have strictly negative real parts. The approaches we just introduced require upper and lower bounds on
the transition density pt(x,y) for some t > 0, say t = 1. A general approach for obtaining such
bounds is based on Malliavin calculus, but it is also possible to obtain the required information
by less elaborate methods. The approach we outline here is a simplification of the method used
in [BG14, BB17].
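For concreteness, the transition law entering these bounds can be sampled by simulation. The sketch below uses the Euler–Maruyama scheme for (2.3.11) with the illustrative one-dimensional choice f(x) = −x (an Ornstein–Uhlenbeck process, so A = −1); all numerical parameters are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(3)

def euler_maruyama(f, x0, sigma, T=1.0, dt=1e-3, n_paths=5000):
    """Sample X_T for dX_t = f(X_t) dt + sigma dW_t, started at x0."""
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(int(T / dt)):
        x += f(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

sigma = 0.1
x1 = euler_maruyama(lambda x: -x, x0=0.5, sigma=sigma)
# For f(x) = -x, X_1 is Gaussian with mean x0 e^{-1} and
# variance sigma^2 (1 - e^{-2}) / 2, so p_1(x, .) concentrates near 0.
```

The empirical mean and standard deviation of the samples match the explicit Gaussian transition law, illustrating the small-noise concentration of p_1(x, ·) around the stable equilibrium.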
A first point is that one can use Harnack inequalities for L-harmonic functions (see [GT01, Corollaries 9.24 and 9.25]) to show that the transition density at time 1, p_1(x,y), satisfies the following two regularity estimates on small balls. For x ∈ ℝ^n and r > 0, we let B_r(x) denote the ball of radius r centred at x.
Lemma 2.3.12: Harnack-type bounds on the transition density

1. Fix x_0, y ∈ ℝ^n. There exists a constant C_0, independent of x_0 and σ, such that

sup_{x ∈ B_{σ²}(x_0)} p_1(x,y) ≤ C_0 inf_{x ∈ B_{σ²}(x_0)} p_1(x,y) .

2. Fix x_0, y ∈ ℝ^n and r_0 > 0, and let R_0 = r_0 σ². Then there exist constants C_1 ≥ 1 and α > 0, independent of σ, such that for any R ≤ R_0, one has

osc_{B_R(x_0)} p_1 := sup_{x ∈ B_R(x_0)} p_1(x,y) − inf_{x ∈ B_R(x_0)} p_1(x,y) ≤ C_1 (R/R_0)^α osc_{B_{R_0}(x_0)} p_1 .
Using these properties, one can then show (cf. [BG14, Lemma 5.7]) that for y in a compact set D, one has the rough a priori bound

sup_{x ∈ D} p_1(x,y) / inf_{x ∈ D} p_1(x,y) ≤ e^{C/σ²}

for a constant C, depending on D, but not on σ. Furthermore, one obtains (cf. [BG14, Lemma 5.8]) that for x_0, y in D and any η > 0, there exists r = r(y,η) > 0, independent of σ, such that

sup_{x ∈ B_{rσ²}(x_0)} p_1(x,y) ≤ (1 + η) inf_{x ∈ B_{rσ²}(x_0)} p_1(x,y) .
This result can now be extended to larger balls by using a coupling argument. Let X_t^x denote the solution of (2.3.11) with initial condition x, conditioned to stay in D up to time t. For x_1, x_2, y ∈ D, let

N(x_1, x_2) = inf{ n ≥ 1 : |X_n^{x_2} − X_n^{x_1}| < r(y,η) σ² } .
If p_n^D denotes the transition density at time n of the process conditioned to stay in D, one obtains [BG14, Proposition 5.9] that for all n ≥ 2, one has

sup_{x ∈ D} p_n^D(x,y) / inf_{x ∈ D} p_n^D(x,y) ≤ 1 + η + sup_{x_1, x_2 ∈ D} P{ N(x_1, x_2) > n − 1 } e^{C/σ²}  (2.3.12)
for a constant C independent of σ and y ∈ D. Controlling the tails of N(x_1, x_2) thus reduces to a coupling argument, with an error of size rσ². To do this, we observe that the difference Y_t = X_t^{x_1} − X_t^{x_2} satisfies the equation

dY_t = A Y_t dt + b(t, Y_t) dt ,

where b(t,y) = O(‖y‖²) for small y. Using the integral representation

Y_t = (x_1 − x_2) e^{At} + ∫_0^t e^{A(t−s)} b(s, Y_s) ds ,
one can prove an estimate of the form

P{ ‖X_1^{x_1} − X_1^{x_2}‖ ≥ ρ ‖x_2 − x_1‖ } ≤ e^{−κ/σ²}

uniformly over x_1, x_2 ∈ D, for some ρ ∈ (0,1) and κ > 0. Using the Markov property², one arrives at

P{ N(x_1, x_2) > n } ≤ e^{−nκ/σ²} ,

so that by choosing n large enough, one can make the right-hand side of (2.3.12) smaller than 1 + 2η. We thus obtain a bound on the variation of the map x ↦ p(x,y) inside D, yielding a uniform positivity property of the form (2.3.2) with m(y) = 1 and L close to 1.
²We have simplified the argument somewhat, because one has to account for the difference between the initial process and the process conditioned on staying in D. See [BG14, Proposition 6.13] for the precise argument.
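The coupling step can be illustrated by driving two copies of (2.3.11) with the same Brownian increments; the difference then contracts at a rate governed by A. The drift f(x) = −x + 0.1x³ (a stable equilibrium at the origin, A = −1) and all numerical parameters below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def coupled_endpoints(f, x1, x2, sigma, T=1.0, dt=1e-3, n_paths=2000):
    """Run two copies of the SDE driven by the SAME Brownian increments."""
    a = np.full(n_paths, x1, dtype=float)
    b = np.full(n_paths, x2, dtype=float)
    for _ in range(int(T / dt)):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)  # shared noise
        a += f(a) * dt + sigma * dW
        b += f(b) * dt + sigma * dW
    return a, b

def drift(x):
    return -x + 0.1 * x**3  # stable equilibrium at 0, A = -1

a1, b1 = coupled_endpoints(drift, 0.3, -0.3, sigma=0.05)
rho = 0.5
# Fraction of paths on which |X_1^{x1} - X_1^{x2}| >= rho |x1 - x2|.
frac_not_contracted = np.mean(np.abs(a1 - b1) >= rho * 0.6)
```

With the shared noise, almost every path contracts by at least the factor ρ over one unit of time, which is the event whose failure probability the estimate above bounds by e^{−κ/σ²}.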
Bibliography
[ADR69] J. Azéma, M. Duflo, and D. Revuz. Mesure invariante des processus de Markov récurrents. Séminaire de Probabilités III, Univ. Strasbourg 1967/68, Lecture Notes in Mathematics 88, pages 24–33, 1969.
[BB17] Manon Baudel and Nils Berglund. Spectral Theory for Random Poincaré Maps. SIAM
J. Math. Anal., 49(6):4319–4375, 2017.
[BG14] Nils Berglund and Barbara Gentz. On the noise-induced passage through an unstable
periodic orbit II: General case. SIAM J. Math. Anal., 46(1):310–352, 2014.
[Bir57] Garrett Birkhoff. Extensions of Jentzsch’s theorem. Trans. Amer. Math. Soc., 85:219–
227, 1957.
[DPZ96] G. Da Prato and J. Zabczyk. Ergodicity for infinite-dimensional systems, volume 229
of London Mathematical Society Lecture Note Series. Cambridge University Press, Cam-
bridge, 1996.
[Get80] R. K. Getoor. Transience and recurrence of Markov processes. Séminaire de Probabilités XIV, 1978/79, Lecture Notes in Mathematics 784, pages 397–409, 1980.
[GT01] David Gilbarg and Neil S. Trudinger. Elliptic partial differential equations of second
order. Classics in Mathematics. Springer-Verlag, Berlin, 2001. Reprint of the 1998
edition.
[HM11] Martin Hairer and Jonathan C. Mattingly. Yet another look at Harris’ ergodic theorem
for Markov chains. In Seminar on Stochastic Analysis, Random Fields and Applications
VI, volume 63 of Progr. Probab., pages 109–117. Birkhäuser/Springer Basel AG, Basel,
2011.
[Jen12] R. Jentzsch. Über Integralgleichungen mit positivem Kern. J. f. d. reine und angew. Math., 141:235–244, 1912.
[MT93a] S. P. Meyn and R. L. Tweedie. Generalized resolvents and Harris recurrence of
Markov processes. In Doeblin and modern probability (Blaubeuren, 1991), volume 149
of Contemp. Math., pages 227–250. Amer. Math. Soc., Providence, RI, 1993.
[MT93b] Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes. II. Continuous-
time processes and sampled chains. Adv. in Appl. Probab., 25(3):487–517, 1993.
[MT93c] Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes. III. Foster-
Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab., 25(3):518–548,
1993.
[Num84] Esa Nummelin. General irreducible Markov chains and nonnegative operators, vol-
ume 83 of Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge,
1984.
[Øks03] B. Øksendal. Stochastic Differential Equations. Springer, 2003.
[Ore71] Steven Orey. Lecture notes on limit theorems for Markov chain transition probabilities.
Van Nostrand Reinhold Co., London, 1971. Van Nostrand Reinhold Mathematical
Studies, No. 34.
[RY99] Daniel Revuz and Marc Yor. Continuous martingales and Brownian motion, volume 293
of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathe-
matical Sciences]. Springer-Verlag, Berlin, third edition, 1999.
Nils Berglund
Institut Denis Poisson (IDP)
Université d'Orléans, Université de Tours, CNRS – UMR 7013
Bâtiment de Mathématiques, B.P. 6759
45067 Orléans Cedex 2, France
E-mail address: nils.berglund@univ-orleans.fr
https://www.idpoisson.fr/berglund
