Lecture Notes
March 3, 2022
1 Stochastic models of epidemics
MAS316/414 Mathematical Modelling of Natural Systems Topic 2
Alex Best (a.best@shef.ac.uk)
1.1 Lecture 1: A deterministic model for epidemics
1.1.1 Introduction
In this topic we are going to be thinking about modelling the spread of infectious disease in some
population. We will do this using a classic compartment model, where we divide the population
into three types:
• Susceptible - uninfected individuals with no prior immunity,
• Infected - infected and infectious individuals,
• Recovered - immune to reinfection.
More detailed model structures exist, but we will focus on this initial simple model for now.
We can then think about how individuals move between these three compartments to describe the
dynamics of the model. Again, we will start with the simplest possible (yet biologically reason-
able) structure where the only two processes are infection (Susceptible -> Infected) and recovery
(Infected -> Recovered).
This is known as the SIR model and has formed the cornerstone of modelling infectious dis-
eases in plant, animal and human populations since the 1920s. In general, the model is built as a
set of ordinary differential equations. This is a deterministic model, where we assume the popula-
tion sizes are large and that time and the variables are continuous. An important feature of this
type of model is that, if we take the same parameter values and initial conditions, we will get the
exact same time-course every time we run the model, and there is no randomness involved. Of
course, the real world is messier than this, and we would expect a degree of inherent randomness
- the precise time an infection takes place, for example. An alternative approach that captures this
inherent randomness is a stochastic model. Now the same parameters and initial conditions will
lead to different time-courses every time. Stochastic models can also describe more accurately what
happens when population numbers are small, since they assume that events cause discrete changes
in the numbers of individuals (rather than continuous changes to the density). However, their
inherent randomness often makes them harder to analyse.
1.1.2 The deterministic model
Although our focus is to be on the stochastic implementation of a model of disease spread, it is
useful for us first to get a good understanding of its deterministic counterpart. As such, let us
assume that the population is large, and that both time and the population densities are contin-
uous. We can then describe the dynamics of the model using the following ordinary differential
equations,
$$\frac{dS}{dt} = -\beta S I \qquad (1)$$
$$\frac{dI}{dt} = \beta S I - \gamma I \qquad (2)$$
$$\frac{dR}{dt} = \gamma I \qquad (3)$$
with the total population N = S + I + R. Note that dN/dt = dS/dt + dI/dt + dR/dt = 0, and so
the total population stays at a constant size. This is a reasonable assumption for studying a short-
term epidemic outbreak in long-lived populations. Because of this we can eliminate one variable,
with R = N − (S + I) and ignore the dR/dt equation. We should also stress that while we will
often think of these variables as numbers, they are actually densities (since they are continuous
variables).
While this is a fairly simple looking system (two variables and two parameters), there is no
explicit solution. However, we can explore the qualitative behaviour of the model by identifying
equilibria of the system and their stability. Moreover, we can use numerical routines to simulate
the possible dynamics, as we will do in the first computer lab.
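To give a flavour of what we will do in the lab, below is a minimal sketch of a numerical simulation of the reduced (S, I) system. It assumes the numpy and scipy libraries are available, uses the parameter values from Figure 1 (with β recovered from R0 = βN/γ), and the function and variable names are purely illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameter values as in Figure 1; beta is recovered from R0 = beta*N/gamma
N = 1000
gamma = 1 / 14
R0 = 5
beta = R0 * gamma / N

def sir_rhs(t, y):
    """Right-hand side of the reduced (S, I) system."""
    S, I = y
    return [-beta * S * I, beta * S * I - gamma * I]

# Start with a single infected individual and integrate forward in time
sol = solve_ivp(sir_rhs, [0, 150], [N - 1, 1], dense_output=True)
t = np.linspace(0, 150, 301)
S, I = sol.sol(t)
print(f"Peak infected density: {I.max():.0f} at t = {t[I.argmax()]:.1f}")
```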
Considering these equations, we can see that the only equilibrium is when I = 0 and S may
take any value < N . We therefore seem to have a continuum of non-isolated equilibria. To as-
sess the stability here we need to take the Jacobian of the system - a matrix made up of partial
derivatives of our ODEs:
$$J = \begin{pmatrix} \frac{\partial(dS/dt)}{\partial S} & \frac{\partial(dS/dt)}{\partial I} \\ \frac{\partial(dI/dt)}{\partial S} & \frac{\partial(dI/dt)}{\partial I} \end{pmatrix} = \begin{pmatrix} -\beta I & -\beta S \\ \beta I & \beta S - \gamma \end{pmatrix}$$
Since I = 0 the bottom-left entry is 0, meaning we can read the eigenvalues off as the entries
on the main diagonal. We therefore have λ1 = 0 and λ2 = βS−γ. The fact that one eigenvalue is 0
confirms our assessment that we have a line of non-isolated equilibria. The stability of each point
now just depends on λ2. We can see that for S > γ/β, the eigenvalue is positive and that point
is unstable, while for S < γ/β, the eigenvalue is negative and that point is stable. Overall then,
we expect the dynamics to eventually reach an equilibrium where I = 0 and S is some value less
than γ/β (but the precise value depends on the initial conditions).
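If you would like to double-check the Jacobian and its eigenvalues along this line of equilibria, a brief symbolic sketch is given below. It assumes the sympy library; it is only a check on the algebra, not part of the analysis itself.

```python
import sympy as sp

S, I, beta, gamma = sp.symbols('S I beta gamma', positive=True)

# Right-hand sides of the reduced (S, I) system
dS = -beta * S * I
dI = beta * S * I - gamma * I

# Jacobian matrix of partial derivatives, then its eigenvalues when I = 0
J = sp.Matrix([dS, dI]).jacobian(sp.Matrix([S, I]))
print(J)                          # the matrix of partial derivatives above
print(J.subs(I, 0).eigenvals())   # eigenvalues 0 and beta*S - gamma
```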
This implies that in the long-run, we would expect the population to reach a point with no
infected hosts i.e. the system will become disease-free. In the long-run this is often the case with
emerging diseases - they will eventually burn out - but it is still important for us to know whether
there can be an epidemic, when an initially small amount of infection causes a large outbreak of
disease in the population (even if it eventually tends to zero).
Figure 1: Example of epidemic curve from deterministic model. Parameters: N = 1000, γ =
1/14, R0 = 5
For an epidemic to occur, we need to have dI/dt > 0 initially. Before an outbreak, the initial
densities are S(0) ≈ N, I(0) ≈ 0. This gives,
$$\frac{dI}{dt} = (\beta N - \gamma) I \qquad (4)$$
• βN − γ < 0 =⇒ disease dies out.
• βN − γ > 0 =⇒ epidemic.
Note that βN − γ > 0 =⇒ βN/γ > 1. This fraction is an important measure of how quickly
a disease spreads at early points in the epidemic, and is known as the basic reproduction ratio, or
R0. Provided R0 > 1, we see a classic epidemic curve, of initially exponential growth, slowing to
a peak and then gradually tailing off towards 0.
1.1.3 An extended model
We have introduced the simplest possible model for the spread of an infectious disease, with just
three possible compartments (S, I and R) and only two mechanisms (infection and recovery). We
have therefore built a lot of assumptions into the model, some of which are more obvious than
others. Let us think about one particular extension, which is to allow immunity to wane at rate ω,
such that individuals who were in the recovered compartment can now return to being susceptible
once again. We can then update our model as follows,
$$\frac{dS}{dt} = -\beta S I + \omega R \qquad (5)$$
$$\frac{dI}{dt} = \beta S I - \gamma I \qquad (6)$$
$$\frac{dR}{dt} = \gamma I - \omega R. \qquad (7)$$
Notice that we still have a constant population size, meaning we can still study only the first
two equations, setting R = N − S − I . We now have,
$$\frac{dS}{dt} = -\beta S I + \omega(N - S - I) \qquad (8)$$
$$\frac{dI}{dt} = \beta S I - \gamma I \qquad (9)$$
We now find we have two possible equilibria. Firstly we can have (S, I) = (N, 0), the disease-
free case. This time, however, the susceptible density takes a fixed value of N , because eventually
everyone who was recovered will lose their immunity and return to being susceptible. The second
equilibrium occurs at:
$$(S, I) = \left( \frac{\gamma}{\beta}, \; \frac{\omega\left(N - \frac{\gamma}{\beta}\right)}{\gamma + \omega} \right),$$
which after some re-arranging can be written as,
$$(S, I) = \left( \frac{\gamma}{\beta}, \; \frac{\omega\gamma}{\beta(\gamma + \omega)}\,(R_0 - 1) \right).$$
Clearly, then, the second equilibrium only exists for feasible values if R0 > 1. It is left as a
challenge for you to prove that the stability of each point depends on whether R0 is larger than or
less than 1.
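As a quick check on the algebra above (this does not answer the stability challenge), we can substitute the proposed endemic equilibrium back into the right-hand sides and confirm that both vanish. The sketch below assumes the sympy library.

```python
import sympy as sp

S, I, N, beta, gamma, omega = sp.symbols('S I N beta gamma omega', positive=True)

# Reduced (S, I) system with waning immunity, using R = N - S - I
dS = -beta * S * I + omega * (N - S - I)
dI = beta * S * I - gamma * I

# Proposed endemic equilibrium from the notes
S_star = gamma / beta
I_star = omega * (N - gamma / beta) / (gamma + omega)

# Both derivatives should simplify to zero at this point
print(sp.simplify(dS.subs({S: S_star, I: I_star})))   # 0
print(sp.simplify(dI.subs({S: S_star, I: I_star})))   # 0
```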
1.2 Lecture 2: A first stochastic simulation model
We will now start thinking about how we can build a stochastic model. Recall that unlike the
deterministic ODE model, we will now assume that we have discrete numbers of individuals, that
there is variation in individual event rates around a mean and that the population size need not be
large (in fact, large population sizes will slow us down considerably). We will use a well-known
algorithm for building our stochastic model, popularly known as the Gillespie algorithm, named
after the scientist who popularised the method in the 1970s (in fact, the fundamental techniques
date back to the 1930s and 1940s). It is also sometimes called the direct method. This was originally
designed to model chemical reactions but is equally applicable to epidemic models.
1.2.1 Introducing the basics
Let us start by considering an example where there is just one type of event happening. Assume
we have a population of infected individuals where there is no further transmission occurring
(perhaps a vaccine has rapidly removed all other susceptible individuals, or we have quarantined
all infected individuals together). The only process we need to model, therefore, is recovery. The
deterministic equivalent of this system would be,
$$\frac{dI}{dt} = -\gamma I. \qquad (10)$$
We can also think of this as a discrete event as,
$$I \xrightarrow{\;\gamma\;} R. \qquad (11)$$
The parameter, or rate constant, γ has units 1/time, and is defined so that γdt gives the prob-
ability that a randomly chosen infected individual recovers during the time interval [t, t + dt),
where dt is an infinitesimally small timestep. By extension, the probability that exactly one re-
covery happens during this interval [t, t + dt) is γI(t)dt. We would call this a first-order event,
because the number of such events in some time period depends linearly on one of the variables -
the number of infecteds.
Our goal is to find how many infected individuals there are at future timepoints. Let us start at
t = 0 with I(t) = N . Given the definitions above, our first naive attempt might be to move along
at very small time intervals, ∆t, and each time decide, probabilistically, whether a recovery event
occurs. We could do that as follows:
1. Choose a random number, r, uniformly distributed in (0,1).
2. (a) If r < γI(t)∆t, then recovery occurs. Set I(t+ ∆t) = I(t)− 1 and update t = t+ ∆t.
(b) Otherwise no recovery occurs, I(t+ ∆t) = I(t) and update t = t+ ∆t.
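A minimal sketch of this naive scheme in Python is given below (assuming the numpy library; the variable names are my own). It is correct in the limit of small ∆t but, as discussed next, rather wasteful.

```python
import numpy as np

rng = np.random.default_rng()

gamma = 1 / 10        # recovery rate
I = 20                # initial number of infected individuals
t, t_end = 0.0, 100.0
dt = 1e-3             # small fixed timestep

while t < t_end and I > 0:
    r = rng.random()           # step 1: uniform random number in [0, 1)
    if r < gamma * I * dt:     # step 2(a): a recovery occurs with probability gamma*I*dt
        I -= 1
    t += dt                    # advance time whether or not an event occurred

print(f"I({t:.1f}) = {I}")
```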
Provided we set ∆t to be small enough (remember, in the definition it is assumed to be infinitesimally small), this should accurately implement the process we have described, or at least
be a very good approximation of it. However, notice that if the time steps are very small, a large
proportion of the time we will find that no recovery occurs and our system stays as it was. This
is computationally inefficient since we spend a lot of time drawing random numbers and testing
conditions only for nothing to happen. Instead, we shall try and develop a method where we only
calculate one random number in between each event (and at the same time make it exact, not just
a very good approximation). This links to something called ‘the master equation’, but here we
will be focussing on how we can build a stochastic simulation algorithm.
1.2.2 The direct method (Gillespie algorithm)
The key to our approach is to generate random numbers that represent waiting times between
events. In other words, if the current time is t, we are asking, “at what time t + τ will the next
event occur?” We will assume τ is a continuous random variable between 0 and ∞. Let f(I(t), s)ds
be the probability that if there are I(t) individuals at time t, the next recovery occurs in the interval
[t + s, t + s + ds), where ds is infinitesimally small. We will call f(I(t), s) the reaction probability
density function. We can expand this function as meaning that there is no recovery between t and
t + s, with the recovery then occurring in the interval [t + s, t + s + ds). If we let g(I(t), s) be the
probability that no event occurs in the interval [t, t+ s), then we can write,
f(I(t), s)ds = g(I(t), s)γI(t+ s)ds. (12)
Now, since there was no recovery in the interval [t, t+s) we know that I(t+s) = I(t), meaning,
f(I(t), s)ds = g(I(t), s)γI(t)ds. (13)
Let us now consider g(I(t), s). For some σ > 0, the probability that no recovery event happens
in the interval [t, t + σ + dσ) can be expressed as the product of the probability that no recovery
happens in the interval [t, t + σ) and the probability that no recovery happens in the interval
[t+ σ, t+ σ + dσ). This gives,
g(I(t), σ + dσ) = g(I(t), σ)[1− γI(t+ σ)dσ], (14)
= g(I(t), σ)[1− γI(t)dσ], (15)
since we know that if no recovery has happened in [t, t+ σ) it must be that I(t+ σ) = I(t). We
can rearrange this to give,
$$\frac{g(I(t), \sigma + d\sigma) - g(I(t), \sigma)}{d\sigma} = -\gamma I(t)\, g(I(t), \sigma). \qquad (16)$$
If we now take the limit dσ → 0 we can express this simply as a first-order linear differential
equation,
$$\frac{dg(I(t), \sigma)}{d\sigma} = -\gamma I(t)\, g(I(t), \sigma). \qquad (17)$$
For an initial condition of g(I(t), 0) = 1 (i.e. no event can occur in an interval of zero length) and using
separation of variables we obtain the solution,
g(I(t), σ) = exp[−γI(t)σ]. (18)
Finally we can substitute this back into our original equation for the probability that the next
recovery event happens in the interval [t+ s, t+ s+ ds),
f(I(t), s)ds = exp[−γI(t)s]γI(t)ds. (19)
We notice that this means the reaction probability density function is f(I(t), s) =
exp[−γI(t)s]γI(t), which is in fact the probability density function for the exponential distribu-
tion with mean 1/(γI(t)). In other words, the waiting times between events are exponentially
distributed. We want to use this result to easily calculate waiting times, τ, for the next event to
occur. Let us take the function,
F (τ) = exp[−γI(t)τ ]. (20)
Since τ was a random number in [0,∞), then F (τ) is a random number in the interval [0, 1]. In
fact, we can show that F (τ) is a random number uniformly distributed in the interval [0, 1]. This
is worked through below for completeness, but we will not dwell on it.
Take constants a, b in the interval [0,1]. The probability that F (τ) is in the interval (a, b) is the
same as the probability that τ is in the interval (F−1(b), F−1(a)). This latter probability is given by
the integral of f(I(t), s) with respect to s between F−1(b) and F−1(a). This gives,
$$\int_{F^{-1}(b)}^{F^{-1}(a)} f(I(t), s)\,ds = \int_{F^{-1}(b)}^{F^{-1}(a)} \gamma I(t) \exp[-\gamma I(t) s]\,ds \qquad (21)$$
$$= -\int_{F^{-1}(b)}^{F^{-1}(a)} \frac{dF}{ds}\,ds \qquad (22)$$
$$= -F[F^{-1}(a)] + F[F^{-1}(b)] \qquad (23)$$
$$= b - a. \qquad (24)$$
We therefore know that our waiting times are exponentially distributed with mean 1/(γI(t)),
and that we can find each one simply by drawing a random number uniformly distributed be-
tween 0 and 1. If we have such a random number, r, then we can set,
$$r = F(\tau) = e^{-\gamma I(t)\tau} \qquad (25)$$
$$\implies \tau = \frac{1}{\gamma I(t)} \ln\left(\frac{1}{r}\right). \qquad (26)$$
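As a quick numerical check (a sketch assuming numpy), we can draw many waiting times using equation (26) and confirm that the sample mean is close to 1/(γI(t)).

```python
import numpy as np

rng = np.random.default_rng()

gamma, I = 1 / 10, 20
r = rng.random(100_000)               # uniform random numbers in [0, 1)
tau = np.log(1 / r) / (gamma * I)     # waiting times from equation (26)

print(tau.mean(), 1 / (gamma * I))    # both should be close to 0.5
```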
Using this, we can now build our full algorithm:
1. Set t = 0 and I(0) = N .
2. Draw a random number r from the uniform distribution in [0, 1].
3. Set $\tau = \frac{1}{\gamma I(t)} \ln\left(\frac{1}{r}\right)$.
4. (a) If t + τ < tend, set I(t + τ) = I(t)− 1 and t = t + τ . If I(t) = 0, exit. Otherwise, return
to step 2.
(b) If t+ τ > tend, set I(tend) = I(t) and t = tend and exit.
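A minimal Python implementation of this algorithm might look like the sketch below (assuming numpy; the function name and default values are purely illustrative, with γ = 1/10 and I(0) = 20 chosen to match the example that follows).

```python
import numpy as np

def recovery_only_direct(I0=20, gamma=1/10, t_end=100.0, rng=None):
    """Direct-method (Gillespie) simulation of the recovery-only model.

    Returns the list of event times and the infected count after each event."""
    rng = rng or np.random.default_rng()
    t, I = 0.0, I0
    times, infecteds = [t], [I]
    while I > 0:
        r = rng.random()
        tau = np.log(1 / r) / (gamma * I)   # exponential waiting time, mean 1/(gamma*I)
        if t + tau >= t_end:                # step 4(b): next event falls beyond the end time
            break
        t += tau                            # step 4(a): a recovery occurs at time t + tau
        I -= 1
        times.append(t)
        infecteds.append(I)
    return times, infecteds

times, infecteds = recovery_only_direct()
print(f"{len(times) - 1} recoveries; final I = {infecteds[-1]}")
```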
1.2.3 Example
We can illustrate a few steps of the algorithm with some numbers. We will populate the numbers
live in the lecture. Assume a population of 20 infected individuals and that the recovery rate is
γ = 1/10. If we work through our algorithm we will have the following,
• Set t = 0 and I(0) = 20.
• Draw a random number: r = .....
• Find the waiting time $\tau = \frac{1}{2} \ln\left(\frac{1}{.....}\right) \approx .....$
• If ..... < tend we continue.
• Set t = ..... and I(.....) = 19.
• Draw a random number r = .....
• Find the waiting time $\tau = \frac{1}{1.9} \ln\left(\frac{1}{.....}\right) \approx .....$
Figure 2: Example trajectories from the stochastic and deterministic versions of the recovery-only
model with γ = 1/10 and I(0) = 20.
• If ..... < tend we continue.
• Set t = ..... and I(.....) = 18.
• . . .
Figure 2 shows 20 trajectories from the stochastic model alongside the deterministic
version. We can see that the deterministic version is a reasonable ‘average’ of the stochastic runs.
We can also see the variation in the stochastic runs caused by the inherent randomness.
1.3 The full stochastic model
Last time we built up our first stochastic model. This was not really a true infectious disease model
because the only mechanism we included was recovery. Now we shall build up our full model.
Recall that the deterministic equivalent for our model is,
$$\frac{dS}{dt} = -\beta S I \qquad (27)$$
$$\frac{dI}{dt} = \beta S I - \gamma I \qquad (28)$$
We now have two possible events,
$$S \xrightarrow{\;\beta I\;} I. \qquad (29)$$
$$I \xrightarrow{\;\gamma\;} R. \qquad (30)$$
The first event occurs when a susceptible individual becomes infected. Notice that in this case
the transition rate depends on the current infected density. This is a second-order event. The second
event occurs, as we saw last time, when an individual recovers.
Our direct method algorithm for producing our stochastic runs is going to be very similar
to what we saw last time. In particular, the time to the next event will still be exponentially dis-
tributed. However, because we now have two different types of event, the mean of the exponential
distribution will be the inverse of the sum of all the rates. We can therefore set,
ρ = βSI + γI.
Then if we draw a random number, r1 from the uniform distribution in (0,1), the time to the
next event will be,
$$\tau = \frac{1}{\rho} \ln\left(\frac{1}{r_1}\right). \qquad (31)$$
The second change to our algorithm is that the time we have just calculated is until the next
time an event occurs, but it does not tell us which event occurs. It makes sense that the probability
of one or other event happening will be proportional to its rate. Therefore, if we were to draw a
second random number between 0 and 1, r2, then we could say,
• if r2 ≤ γI/ρ then the event is recovery,
• if r2 > γI/ρ then the event is infection.
We can now update our full algorithm:
1. Set t = 0, S(0) and I(0).
2. Draw a random number r1 from the uniform distribution in (0, 1).
3. Calculate the total of all rates, ρ.
4. Set $\tau = \frac{1}{\rho} \ln\left(\frac{1}{r_1}\right)$.
5. Check if t+ τ < tend. If not, exit.
6. Draw a second random number r2 from the uniform distribution in (0, 1).
7. (a) If r2 ≤ γI/ρ, the event is recovery. Set I(t+ τ) = I(t)− 1 and t = t+ τ. Return to step 2.
(b) Otherwise the event is transmission. Set S(t+ τ) = S(t)− 1, I(t+ τ) = I(t) + 1 and t = t+ τ.
Return to step 2.
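Putting the pieces together, here is a minimal sketch of the full algorithm in Python (assuming numpy; the function name and arguments are illustrative). The numbers in the call match the worked example below.

```python
import numpy as np

def sir_direct(S0, I0, beta, gamma, t_end, rng=None):
    """Direct-method (Gillespie) simulation of the stochastic SIR model."""
    rng = rng or np.random.default_rng()
    t, S, I = 0.0, S0, I0
    times, sus, inf = [t], [S], [I]
    while I > 0:
        rho = beta * S * I + gamma * I        # step 3: total of all event rates
        r1 = rng.random()
        tau = np.log(1 / r1) / rho            # step 4: time to the next event
        if t + tau >= t_end:                  # step 5: stop if we pass the end time
            break
        t += tau
        r2 = rng.random()                     # step 6: second random number
        if r2 <= gamma * I / rho:             # step 7(a): recovery
            I -= 1
        else:                                 # step 7(b): transmission
            S -= 1
            I += 1
        times.append(t)
        sus.append(S)
        inf.append(I)
    return times, sus, inf

# One run with the values used in the worked example (gamma = 1/10, beta = 0.2)
times, sus, inf = sir_direct(S0=98, I0=2, beta=0.2, gamma=1/10, t_end=100.0)
print(f"Final state: S = {sus[-1]}, I = {inf[-1]}")
```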
1.3.1 An example
Again, we can illustrate a few steps of the algorithm with some numbers. Assume a population of
100 individuals, with 2 people initially infected and the rest susceptible. Assume that the recovery
rate is γ = 1/10 and that R0 = 2 meaning β = 0.2. We will again populate this with real numbers
during the lecture.
• First, t = 0, S(0)=98 and I(0) = 2.
Figure 3: Example trajectories from the stochastic and deterministic versions of the full SIR
model with γ = 1/10, R0 = 5, N = 100 and I(0) = 2.
• Draw a random number r1 = ......
• Calculate the total of all rates, ρ = γI + βSI = 39.4.
• Set $\tau = \frac{1}{39.4} \ln\left(\frac{1}{.....}\right) \approx .....$
• Check if ..... < tend. If not, exit.
• Draw a second random number ..... from the uniform distribution in [0, 1].
• Check if ..... > 0.2/39.4.
– If so set S(.....) = 97, I(.....) = 3 and t = ......
– If not set I(.....) = 1 and t = ......
• Draw a random number r1 = ......
• Calculate the total of all rates, ρ = γI + βSI = ......
• Set $\tau = \frac{1}{.....} \ln\left(\frac{1}{.....}\right) \approx .....$
• Check if ..... < tend. If not, exit.
• Draw a second random number ..... from the uniform distribution in [0, 1].
• Check if ..... > ......
– Therefore set S(.....) = ....., I(.....) = ..... and t = ......
A full time-course from 20 replicate runs is shown in Figure 3.
1.3.2 More complex models
This algorithm works if we have just two mechanisms at work. How will things change if we
have a more complex system? For example, what about our model from the first lecture where
we included waning immunity? This means we now have a third mechanism. Therefore the total
rate, ρ will be the sum of the three event rates. Similarly, when we check which event occurs
we will need more than a single if-else statement. Most programming languages have a switch-
case construct, which is an effective extension of if-else statements to multiple cases. For many
years Python had no such functionality, and instead we would use extended if-elif-else statements.
Since Python 3.10 there has been a match-case statement, which serves much the same purpose.
Given that this is quite a recent addition, it is probably best to stick to if-elif-else ladders for now.
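For illustration, the event-selection step for the waning-immunity model might look like the following if-elif-else ladder (a sketch with made-up state and parameter values; scaling the uniform random number by ρ is equivalent to comparing it against the cumulative rate fractions).

```python
import numpy as np

rng = np.random.default_rng()

# Illustrative state and parameter values only
N, S, I = 100, 90, 5
beta, gamma, omega = 0.002, 1/10, 1/100
R = N - S - I                       # recovered numbers follow from the constant population size

# Transmission, recovery and waning rates, in that order
rates = [beta * S * I, gamma * I, omega * R]
rho = sum(rates)

# Choose which event occurs with an if-elif-else ladder
r2 = rng.random() * rho
if r2 <= rates[0]:                  # transmission: S -> I
    S, I = S - 1, I + 1
elif r2 <= rates[0] + rates[1]:     # recovery: I -> R (only I needs updating)
    I -= 1
else:                               # waning immunity: R -> S (only S needs updating)
    S += 1

print(S, I, N - S - I)
```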
Even in the above case we still only need to keep track of two variables - susceptibles and
infecteds - because we knowR = N−S−I . If we had a model where the total population was not
necessarily constant, such as the extended model seen in the first practical, then we would need
to make sure we were keeping track of all three variables.