MAT 5170 Probability Theory I - Lecture Notes
October 23, 2020
1 Probability Measures
A quick review for sets and events
• ∅ denotes the empty set.
• A ⊂ B means A is a subset of B. That is, all elements in A also belong
to B. Typically, B is a bigger set. For example, if A = {1, 2, 3} and
B = {1, 2, 3, 4}, then A ⊂ B. But if C = {3, 4, 5}, then C ⊄ B.
• A ∪ B is the union of two sets. This is the set of all outcomes that
are either in A or in B, or in both. For example, if A = {1, 2, 3} and
B = {3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}. Note that 3 is not counted
twice. If C = {1, 2, 3, 4}, then A ∪ C = {1, 2, 3, 4} = C.
• A ∩ B is the intersection of two sets. This is the set of all outcomes that
belong to both A and B. For example, if A = {1, 2, 3} and B = {3, 4, 5},
then A ∩ B = {3}. If C = {4, 5}, then A ∩ C = ∅.
• A \ B is the difference of two sets. We remove from A all elements
that are also in B. For example, if A = {1, 2, 3} and B = {3, 4, 5}, then
A \ B = {1, 2} and B \ A = {4, 5}.
• A′ is the complement of A. That is, A′ is the set of all elements in S
that are not in A.
• Commutative laws: A ∩B = B ∩A; A ∪B = B ∪A.
• Associative laws:
(A ∪B) ∪ C = A ∪ (B ∪ C) = A ∪B ∪ C
(A ∩B) ∩ C = A ∩ (B ∩ C) = A ∩B ∩ C
• Distributive laws:
A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C)
• De Morgan’s laws:
(A1 ∩ · · · ∩Ak)′ = A′1 ∪ · · · ∪A′k
(A1 ∪ · · · ∪Ak)′ = A′1 ∩ · · · ∩A′k
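The set operations reviewed above can be checked on the small examples from the text. The following is a minimal sketch using Python's built-in sets; the universe S = {1, 2, 3, 4, 5} and the helper complement are assumptions made only for this illustration.

```python
# Sets from the review above; S is an assumed universe used for complements.
S = {1, 2, 3, 4, 5}
A = {1, 2, 3}
B = {3, 4, 5}

def complement(X, universe=S):
    """Return the complement of X within the universe."""
    return universe - X

union = A | B          # A ∪ B
intersection = A & B   # A ∩ B
difference = A - B     # A \ B

# De Morgan's laws for two sets
de_morgan_1 = complement(A & B) == complement(A) | complement(B)
de_morgan_2 = complement(A | B) == complement(A) & complement(B)
```

Both de_morgan_1 and de_morgan_2 evaluate to True for this choice of A, B and S.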
1.1 Spaces and Probabilities
In probability theory, Ω denotes a sample space. It could be as simple
as {0, 1} in the case of a single coin toss, or as complicated as the space of all continuous
functions defined on [0, 1].
A subset of Ω is called an event and an element ω of Ω is a sample point.
This is the definition that you learned in an elementary probability class:
Definition 1. Probability is a real-valued set function P that assigns, to each
event A ⊆ Ω a number P(A), called the probability of the event A, such that the
following axioms of probability are satisfied:
1. 0 ≤ P(A) ≤ 1 for any event A ⊂ Ω
2. P(∅) = 0, P(Ω) = 1
3. If A1, A2, . . . , Ak are mutually exclusive, then
P(A1 ∪A2 ∪ · · · ∪Ak) = P(A1) + P(A2) + · · ·+ P(Ak) .
You will learn in a moment that this definition is not 100% correct. Our
goal is to assign probabilities to as many events as possible.
For example, if Ω is finite then we can assign a uniform probability measure:
P({ω}) = 1/|Ω|. But this is not the only choice. For example, if you roll a fair die
then Ω = {1, 2, 3, 4, 5, 6} and P({1}) = · · · = P({6}) = 1/6. But if the die
is loaded as follows: 1 side with "1", 2 sides with "2", 3 sides with "3", then
P({1}) = 1/6, P({2}) = 2/6, P({3}) = 3/6 and P({4}) = P({5}) = P({6}) = 0.
In other words, for the same sample space we may
have different probabilities.
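The two probability measures on the same sample space can be written down explicitly. The sketch below stores point masses as exact fractions and computes P(A) by finite additivity; the names fair, loaded and prob are illustrative only.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]

# Uniform measure: every outcome has mass 1/|Omega|.
fair = {w: Fraction(1, 6) for w in omega}

# Loaded die from the text: one face shows 1, two show 2, three show 3.
loaded = {1: Fraction(1, 6), 2: Fraction(2, 6), 3: Fraction(3, 6),
          4: Fraction(0), 5: Fraction(0), 6: Fraction(0)}

def prob(event, p):
    """P(event) as the sum of point masses (finite additivity)."""
    return sum(p[w] for w in event)
```

For instance, prob({2, 3}, fair) is 1/3 while prob({2, 3}, loaded) is 5/6, although the sample space is the same.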
If Ω = [0, 1] and A = (a, b), 0 < a < b < 1, then we can assign
P(A) = b− a .
Using the third axiom of probability, we can extend it to all finite unions of
intervals (see Theorem 11). What about more complicated events? For example,
what is P(Q), where Q is the set of all rationals in [0, 1]?
In fact, we cannot assign a probability to an arbitrary event, rather we need
to consider some special classes of sets like fields or σ-fields.
1.2 Classes of sets
A ”nice” class of sets should be closed under the formation of countable unions
and intersections.
Definition 2. A class F of subsets of Ω is called a field if:
• Ω ∈ F ;
• If A ∈ F then Ac = Ω \A ∈ F ;
• If A,B ∈ F then A ∪B ∈ F . (”closed under union of two sets”)
Note that De Morgan’s laws immediately imply that A ∩ B ∈ F . Indeed,
A∩B = (Ac∪Bc)c. Also, the third property implies that a field is closed under
any finite union.
Definition 3. A class F of subsets of Ω is called a σ-field if:
• Ω ∈ F ;
• If A ∈ F then Ac = Ω \A ∈ F ;
• If A1, A2, . . . ∈ F then A1 ∪ A2 ∪ · · · ∈ F . (”closed under countable
unions”)
Example 4. Let F consist of the finite and the cofinite sets (A being cofinite
if Ac is finite). For example, if Ω = [0, 1] and A = [0, 1) then A is cofinite since
Ac = {1}.
Then F is a field.
• If Ω is finite then F is also a σ-field.
• If Ω is infinite then F may not be a σ-field. For example, let Ω =
{ω1, ω2, . . .}. Take A = {ω2, ω4, . . .}. Then A ∉ F , but A is a countable
union of the singletons {ω2i}, i = 1, 2, . . ., each of which belongs to F .
Let A be a (”small” and ”easy”) class of subsets of Ω. For example, A can
be a class of all subintervals of [0, 1]. Note that this class is not closed under
finite unions since e.g. for a1 < a2 < a3 < a4, [a1, a2]∪ [a3, a4] is not an interval.
So, it is not a field or a σ-field.
Since we want to consider all sets in A and at the same time have a ”nice”
structure of a σ-field, we will consider a σ-field generated by the class A, denoted
by σ(A). This is the smallest σ-field that contains A.
In general, σ(A) has the following properties:
• A ⊂ σ(A);
• σ(A) is a σ-field;
• If A ⊂ G and G is σ-field, then σ(A) ⊂ G.
Example 5. If Ω = {1, 2, 3}, then σ({1}) = σ({2, 3}) = {∅, {1}, {2, 3},Ω}.
Indeed, denote by A the class whose only member is the set {1}, and let A′ = {∅, {1}, {2, 3},Ω}. Check (this is easy!)
that A is not a field (since the complement of {1} does not belong to A), but
A′ is a field. It is also a σ-field, since the space Ω is finite. Of course, A ⊂ A′.
Now, is A′ the smallest field that contains A? We list all the possible classes
that contain the set {1}:
• {∅, {1},Ω} - it is not a field, since it is not closed under taking comple-
ments;
• {{1},Ω} - it is not a field, since it is not closed under taking complements;
• {∅, {1}} - it is not a field, since it is not closed under taking complements;
• {∅, {1}, {1, 2},Ω} - it is not a field, since it is not closed under taking
complements;
• you can continue listing all the possibilities and you will see that most of
them will not be fields. For example, {∅, {1}, {1, 2}, {3}, {2, 3},Ω} is not a field,
since {1} ∪ {3} = {1, 3} does not belong to it.
• Now, I will list all the fields that include {1}:
– A′ = {∅, {1}, {2, 3},Ω};
– {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3},Ω} - it is bigger than A′.
Thus, we proved that A′ is the smallest field that contains {1}, hence A′ = σ(A).
Of course, this is not the most effective way of proving this.
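On a finite Ω the generated σ-field can also be computed mechanically, by closing the generating class under complements and pairwise unions until nothing new appears. The following is a minimal sketch of that idea; the function name generated_field is hypothetical and the method is only practical for very small Ω.

```python
from itertools import combinations

def generated_field(omega, generators):
    """Smallest class containing the generators that is closed under
    complement and pairwise union (a sigma-field when omega is finite)."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(field)
        # Close under complements.
        for a in current:
            if omega - a not in field:
                field.add(omega - a)
                changed = True
        # Close under unions of two sets.
        for a, b in combinations(current, 2):
            if a | b not in field:
                field.add(a | b)
                changed = True
    return field

sigma_1 = generated_field({1, 2, 3}, [{1}])
```

Here sigma_1 consists of the four sets ∅, {1}, {2, 3}, Ω, in agreement with Example 5, and generated_field({1, 2, 3}, [{2, 3}]) gives the same class.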
Example 6. If F is a σ-field, then σ(F) = F .
Example 7. Let Ω = [0, 1] and let I be the class of all subintervals (a, b] of (0, 1].
Then B = σ(I) is called the Borel σ-field. It contains all countable unions
and intersections of intervals. Its elements are called Borel sets.
The set Q of rationals in (0, 1] is a Borel set. Indeed, for each rational q the singleton {q} can be obtained as
{q} = ⋂_{i=1}^∞ (q − 1/i, q] .
The sets (q − 1/i, q] belong to I. Note that {q} ∉ I, but {q} ∈ B.
Since there are countably many rationals, Q ∈ B.
1.3 Probability Measures
Definition 8. A set function P on a field F is called probability if
1. 0 ≤ P(A) ≤ 1 for any event A ∈ F
2. P(∅) = 0, P(Ω) = 1
3. If A1, A2, . . . ∈ F are mutually exclusive (disjoint) and ⋃_{i=1}^∞ Ai ∈ F , then
P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · · .
From the axioms of probability we can derive further properties. Let A ⊂
B ∈ F . Then B = A ∪ (B \ A) is a union of disjoint sets. Thus, P(B) =
P(A) + P(B \A) and hence we conclude that P is monotone:
P(A) ≤ P(B) , A ⊂ B .
Another property is
P(A ∪B) = P(A) + P(B)− P(A ∩B) .
Further, let B1 = A1 and B2 = A2 ∩ A1^c. Then the events B1, B2 are disjoint
and B1 ∪ B2 = A1 ∪ A2. Thus,
P(A1 ∪ A2) = P(B1 ∪ B2) = P(B1) + P(B2)
= P(A1) + P(A2 ∩ A1^c) ≤ P(A1) + P(A2) .
In the last step we used the fact that A2 ∩ A1^c ⊆ A2.
This can be generalized to obtain finite subadditivity:
P(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai) , Ai ∈ F .
Definition 9. Let F be a σ-field of subsets of Ω and let P be a probability on F . The triple (Ω,F ,P) is called a probability
space.
Theorem 10. Let F be a field.
• (Continuity from below) If A,An ∈ F , n ≥ 1, A1 ⊆ A2 ⊆ · · · and A = ⋃_{n=1}^∞ An,
then
lim_{n→∞} P(An) = P(A) .
• (Continuity from above) If A,An ∈ F , n ≥ 1, · · · ⊆ A2 ⊆ A1 and A = ⋂_{n=1}^∞ An, then
lim_{n→∞} P(An) = P(A) .
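Proof. We prove continuity from below. Set B1 = A1, Bk = Ak \ Ak−1 for k ≥ 2. Then the Bk are disjoint, A = ⋃_{k=1}^∞ Bk
and An = ⋃_{k=1}^n Bk, since An, n ≥ 1, is an increasing sequence. By countable additivity,
P(A) = P(⋃_{k=1}^∞ Bk) = ∑_{k=1}^∞ P(Bk)
= lim_{n→∞} ∑_{k=1}^n P(Bk) = lim_{n→∞} P(⋃_{k=1}^n Bk) = lim_{n→∞} P(An) .
Continuity from above follows by applying continuity from below to the complements, which increase to the complement of A.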
1.4 Lebesgue measure
Let Ω = [0, 1]. Let I be the class of subintervals (a, b] of (0, 1]. Let B0 be the class
that consists of all finite unions of disjoint intervals from I. Note that B0 is a field. For
I = (a, b] its Lebesgue measure is λ(I) = b − a.
Theorem 11. The Lebesgue measure is a probability measure on B0.
Proof. See the textbook.
Example 12. Assume that we select a value from the interval (0, 1] at "random".
What is the probability that the selected number is rational?
To each interval A = (a, b] we assigned P(A) = b − a. Thus, P((q − 1/i, q]) = 1/i. Then
P({q}) = P(⋂_{i=1}^∞ (q − 1/i, q]) = lim_{i→∞} P((q − 1/i, q]) = lim_{i→∞} (1/i) = 0 .
On the other hand, Q is a countable union of such singletons, thus P(Q) = 0.¹
2 Existence of Probability Measures
Some facts about sets and classes of sets. Let F and G be two classes of
sets. If G ⊆ F then any set that belongs to G also belongs to F : A ∈ G ⇒ A ∈ F .
On the other hand, if A ∈ G and B ⊆ A (that is, B is a subset of set A) then
we cannot conclude that B ∈ G. Indeed, let G be a class of subintervals (a, b] of
(0, 10]. Then B = {1/2} is included in many of the intervals, but B ∉ G.
2.1 Existence
The main result of today’s lecture is the following theorem:
Theorem 13. A probability measure on a field has a unique extension to the
generated σ-field.
The meaning of this theorem is the following. Let F0 be a field and let
F = σ(F0). Let P be a probability measure on F0. We are allowed to write
P(A) for A ∈ F0, but not for A ∈ F . Then there exists (only one) probability
measure P̃ on F such that P(A) = P̃(A) for all A ∈ F0. Usually the extension
P̃ of P is denoted just by P.
¹ Note: At this moment, thanks to Theorem 11, we can compute the Lebesgue measure
on finite unions and intersections of subintervals. We do not know yet if we can compute the
measure on countable unions and intersections. This will come later. Thus, at this point, the
first displayed equation of Example 12 is not completely justified, since ⋂_{i=1}^∞ (q − 1/i, q] ∉ B0.
Example 14. Let Ω = [0, 1]. Let I be the class of subintervals I = (a, b] of (0, 1].
It is easy to define the Lebesgue measure on I by λ(I) = b − a. Let
B0 be the field that consists of all finite unions of disjoint intervals. By Theorem
11, the Lebesgue measure is also a probability measure on B0.
This is still easy. Our main theorem extends the Lebesgue measure to all Borel
sets of (0, 1]. In particular, we are allowed to write λ(Q) (which happens to be
zero, as we saw in Example 12).
Let P be a probability measure on F0. For A ⊆ Ω define
P∗(A) = inf ∑_n P(An) , (1)
where the infimum extends over all finite and infinite sequences A1, A2, . . . such
that A ⊂ ⋃_n An and An ∈ F0. Note that for A ⊆ Ω such that A ∉ F0 we are
not allowed to write P(A). However, if A ∈ F0 then we will show that
P∗(A) = P(A) . (2)
The measure P∗ is called the outer measure of P.
The measure P∗ has the following properties:
1. P∗(∅) = 0;
2. P∗(A) ≥ 0 for all A;
3. P∗ is monotone, that is, if A ⊂ B, then P∗(A) ≤ P∗(B);
4. P∗ is countably subadditive: P∗(⋃_n An) ≤ ∑_n P∗(An).
However, we do not know if P∗ is a probability measure.
The inner measure P_∗ is defined as
P_∗(A) = 1 − P∗(A′) .
Both P∗ (outer) and P_∗ (inner) are candidates for an extension of P.
If P∗ and P_∗ agree at A then we have
P∗(A) + P∗(A′) = 1 .
We impose the more stringent requirement
P∗(A ∩ E) + P∗(A′ ∩ E) = P∗(E) (3)
for all sets E ⊆ Ω. Note that by subadditivity, (3) is equivalent to
P∗(A ∩ E) + P∗(A′ ∩ E) ≤ P∗(E) . (4)
We do not know at this point how many (if any at all!) sets A fulfill this. Let
M be the class of sets A that fulfill (3). We call such sets P∗-measurable.
Lemma 15. The class M is a field.
Proof. First, Ω ∈ M, since
P∗(Ω ∩ E) + P∗(Ω′ ∩ E) = P∗(E) + P∗(∅) = P∗(E) .
Next, M is closed under complementation: if A ∈ M, then A′ ∈ M. This is
obvious, since (3) is symmetric in A and A′. Furthermore, if A,B ∈ M, then A ∩ B ∈ M. For this one needs to
show that if
P∗(A ∩ F ) + P∗(A′ ∩ F ) = P∗(F ) , P∗(B ∩ E) + P∗(B′ ∩ E) = P∗(E) ,
for all sets E,F , then
P∗((A ∩B) ∩ E) + P∗((A ∩B)′ ∩ E) ≤ P∗(E) .
We have
P∗(E) = P∗(B ∩ E) + P∗(B′ ∩ E)    (below, take F = B ∩ E and then F = B′ ∩ E)
= {P∗(A ∩B ∩ E) + P∗(A′ ∩B ∩ E)}
+ {P∗(A ∩B′ ∩ E) + P∗(A′ ∩B′ ∩ E)}
≥ P∗(A ∩B ∩ E) + P∗(((A′ ∩B) ∪ (A ∩B′) ∪ (A′ ∩B′)) ∩ E) .
In the first equation we used the property (3) applied to B, then in the next
equality the property of (3) applied to A twice. Then we used subadditivity of
P∗. Now,
((A′ ∩B) ∪ (A ∩B′) ∪ (A′ ∩B′)) = (A ∩B)′ ,
thus
P∗(E) ≥ P∗(A ∩B ∩ E) + P∗((A ∩B)′ ∩ E) .
Hence, A ∩B ∈M.
Lemma 16. If A1, A2, . . . is a finite or infinite sequence of disjoint M-sets,
then for each E ⊆ Ω,
P∗(E ∩ ⋃_k Ak) = ∑_k P∗(E ∩ Ak) . (5)
Proof. We give the proof for two sets A1, A2 only. If A1 ∪ A2 = Ω, then A2 = A1′ and
hence (5) is just (3). If A1 ∪ A2 ⊂ Ω, then set F = E ∩ (A1 ∪ A2) and write
F = (A1 ∩ F) ∪ (A1′ ∩ F) ,
and use (3) with A = A1 and E replaced by F . Since A1 and A2 are disjoint, A1 ∩ F = E ∩ A1 and A1′ ∩ F = E ∩ A2, which gives (5).
Lemma 17. The class M is a σ-field.
Proof. Note that we only need to prove that M is closed under countable unions.
I will not prove it; you may consult the textbook.
Combining Lemma 17 with (5) (taking E = Ω) gives us countable additivity on M:
P∗(⋃_k Ak) = ∑_k P∗(Ak) , (6)
where Ak are disjoint and belong to M.
Lemma 18. If P∗ is defined by (1), then F0 ⊂ M.
Proof. Let A ∈ F0 and E ⊆ Ω. Let ε > 0. Choose sets An ∈ F0 such that
E ⊂ ⋃_n An and ∑_n P(An) ≤ P∗(E) + ε. The sets Bn = A ∩ An and Cn = A′ ∩ An
belong to F0, since it is a field. Also, E ∩ A ⊂ ⋃_n Bn and E ∩ A′ ⊂ ⋃_n Cn. Thus, by
monotonicity and countable subadditivity of P∗,
P∗(E ∩ A) + P∗(E ∩ A′) ≤ P∗(⋃_n Bn) + P∗(⋃_n Cn)
≤ ∑_n P∗(Bn) + ∑_n P∗(Cn)
= ∑_n P(Bn) + ∑_n P(Cn) .
Since Bn and Cn are disjoint and Bn ∪ Cn = An,
P∗(E ∩ A) + P∗(E ∩ A′) ≤ ∑_n P(An) ≤ P∗(E) + ε .
Since ε > 0 is arbitrary, (4) holds and thus A ∈ M. Hence, F0 ⊂ M.
Corollary 19. If P∗ is defined by (1), then F = σ(F0) ⊂M.
Lemma 20. If P∗ is defined by (1) then P∗(A) = P(A) for all A ∈ F0.
Proof. From (1) we obtain that if A ∈ F0, then P∗(A) ≤ P(A) (since A can be
covered by itself). Now, let A ⊂ ⋃_n An, where A,An ∈ F0. Then
P(A) = P(A ∩ (⋃_n An)) ≤ ∑_n P(A ∩ An) ≤ ∑_n P(An) .
Note that here we are allowed to write P since all the sets belong to F0. Thus
P∗(A) ≤ P(A) ≤ ∑_n P(An)
and since the left-hand side does not depend on the covering sequence {An} we have
P∗(A) ≤ P(A) ≤ inf ∑_n P(An) , (7)
where the infimum is taken over all such coverings. By (1), the right-hand side of (7) equals P∗(A), thus the inequalities in (7) must be equalities.
• By Lemma 20, P∗(Ω) = P(Ω).
• Equation (6) gives us countable additivity of P∗ on M, hence also on
F = σ(F0).
• Thus, P∗ is a probability measure on F .
• Lemma 20 means that P∗ is an extension of P.
2.2 Uniqueness
A class P of subsets of Ω is called a π-system² if it is closed under finite
intersections:
• If A,B ∈ P, then A ∩ B ∈ P.
Note that a field is necessarily a π-system, but the converse is not true.
A class L of subsets of Ω is called a λ-system if
• Ω ∈ L;
• A ∈ L implies A′ ∈ L;
• L is closed under countable disjoint unions, that is, if An ∈ L are disjoint,
then ⋃_n An ∈ L.
Note that a σ-field is also a λ-system, but the converse is not true.
Lemma. A class that is both a π-system and a λ-system is a σ-field.
Theorem. (Dynkin π-λ Theorem). If P is a π-system and L is a λ-system and
P ⊆ L, then σ(P) ⊆ L.
Note that if P is a field and L is a σ-field, and P ⊆ L, then σ(P) ⊆ L. This is obvious by the
definition of σ(P).
Theorem. Suppose that P1 and P2 are probability measures on σ(P), where
P is a π-system. Assume that P1 and P2 agree on P. Then they agree on σ(P).
Proof. Let G be the class of sets A such that P1(A) = P2(A).
• Obviously, Ω ∈ G, since P1(Ω) = P2(Ω).
• If A ∈ G, then P1(A′) = 1 − P1(A) = 1 − P2(A) = P2(A′), hence A′ ∈ G.
• If An ∈ G, n ≥ 1, are disjoint, then the union belongs to G by the countable
additivity of both P1 and P2.
• Thus, G is a λ-system. We do not know if G is a π-system, but we know
from the assumption that P ⊆ G.
• Dynkin's theorem tells us that σ(P) ⊆ G.
² This material is optional.
• Note that if we have a stronger assumption that P is a field, then a proof
would be slightly shorter.
How do we apply this result? Take a field F0. Then for A ∈ F0, P_∗(A) =
P∗(A). Thus, the inner and the outer measure agree on a field. So, they must
agree on the generated σ-field.
3 Denumerable Probabilities
3.1 General Formulas
Let (Ω,F ,P) be a probability space. Let An, n ≥ 1 be a sequence of events,
An ∈ F . Define
lim sup_n An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak ,
lim inf_n An = ⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak .
The sets above are called the limit superior and the limit inferior, respectively.
Since F is a σ-field, both lim sup_n An and lim inf_n An belong to F . Then ω ∈
lim sup_n An if and only if for each n there is some k ≥ n for which ω ∈ Ak. In
other words, ω lies in lim sup_n An if and only if it lies in infinitely many of the
An's. Likewise, ω ∈ lim inf_n An if and only if there is n such that for all k ≥ n,
ω ∈ Ak. In other words, ω lies in all but finitely many of the An's.
• lim sup_n An = {An i.o.} = {An infinitely often} (the limit superior is also written as lim with an overline);
• ⋂_{k=n}^∞ Ak ↑ lim inf_n An;
• ⋃_{k=n}^∞ Ak ↓ lim sup_n An;
• lim inf_n An ⊆ lim sup_n An (to prove it, use the "words" description
given above, instead of set-theoretic arguments). To see that the
inclusion may be strict, consider Ω = {ω1, ω2}. Let An = {ω1} for n even
and An = {ω2} for n odd. Then lim inf_n An = ∅ and lim sup_n An = Ω.
• Check https://en.wikipedia.org/wiki/Set-theoretic_limit for "Examples".
• From De Morgan's laws we can also conclude that lim inf_n An = (lim sup_n An^c)^c.
• If lim sup_n An = lim inf_n An, then we write
A = lim_n An = lim sup_n An = lim inf_n An .
In this case A ∈ F .
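For a periodic sequence of sets the two limits are easy to compute: a point lies in infinitely many An exactly when it lies in at least one set of the period, and in all but finitely many An exactly when it lies in every set of the period. The sketch below uses this observation; the function names and the restriction to periodic sequences are assumptions of this illustration.

```python
from functools import reduce

def limsup_periodic(period):
    """lim sup of the sequence obtained by repeating `period` forever:
    the points that lie in at least one set of the period."""
    return reduce(set.union, period, set())

def liminf_periodic(period):
    """lim inf of the same sequence: the points that lie in every set of the period."""
    return reduce(set.intersection, period[1:], set(period[0]))

# The example above: A_n = {w2} for n odd and A_n = {w1} for n even.
period = [{"w2"}, {"w1"}]
upper = limsup_periodic(period)
lower = liminf_periodic(period)
```

Here upper is the whole space {w1, w2} and lower is empty, so the inclusion lim inf ⊆ lim sup is strict.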
Theorem 21. We have
P(lim inf_n An) ≤ lim inf_n P(An) ≤ lim sup_n P(An) ≤ P(lim sup_n An) .
Proof. Set Bn = ⋂_{k=n}^∞ Ak and Cn = ⋃_{k=n}^∞ Ak. Then Bn ⊆ An ⊆ Cn, Bn ↑ lim inf_n An and
Cn ↓ lim sup_n An. Continuity from below and from above gives
lim sup_n P(An) ≤ lim_n P(Cn) = P(lim sup_n An)
and
lim inf_n P(An) ≥ lim_n P(Bn) = P(lim inf_n An) .
Corollary 22. If An → A, then lim_{n→∞} P(An) = P(A).
Note that this result extends the previously proven continuity from below
and above.
Definition 23. A finite collection A1, . . . , An of events is independent if
P(Ai1 ∩ · · · ∩Aik) = P(Ai1)× · · · × P(Aik) (8)
for all collections of indices 1 ≤ i1 < · · · < ik ≤ n.
The requirement (8) can be stated in a different way:
P(B1 ∩ · · · ∩Bn) = P(B1)× · · · × P(Bn) , (9)
where Bi is either Ai or Ω.
In particular, a collection of 3 events A,B,C is independent if
P(A ∩ B ∩ C) = P(A)P(B)P(C) , (10)
P(A ∩ B) = P(A)P(B) , P(A ∩ C) = P(A)P(C) , P(B ∩ C) = P(B)P(C) . (11)
Example 24. Let Ω = {1, 2, 3, 4, 5, 6} with the uniform probability. Take A1 = {1, 2, 3, 4}, A2 = A3 =
{4, 5, 6}. Then (10) holds, but none of (11) are satisfied.
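The claim of Example 24 can be verified by direct counting. The sketch below assumes the uniform probability on Ω = {1, . . . , 6} (the example does not state P explicitly); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))

def prob(event):
    """Uniform probability on OMEGA (an assumption of this sketch)."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def product_rule_holds(events):
    """Check P(intersection of events) == product of the P(event)."""
    inter = OMEGA
    prod = Fraction(1)
    for e in events:
        inter = inter & e
        prod *= prob(e)
    return prob(inter) == prod

A1 = frozenset({1, 2, 3, 4})
A2 = A3 = frozenset({4, 5, 6})

triple_ok = product_rule_holds([A1, A2, A3])          # condition (10)
pairs_ok = [product_rule_holds(pair)                   # the three conditions in (11)
            for pair in combinations([A1, A2, A3], 2)]
```

The triple condition holds (both sides equal 1/6), while each of the three pairwise conditions fails, so the events are not independent.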
Definition 25. We say that classes A1 and A2 are independent if for each
choice of A1 ∈ A1 and A2 ∈ A2, the events A1 and A2 are independent.
Theorem 26. Assume that fields A1 and A2 are independent. Then the σ-fields
σ(A1) and σ(A2) are also independent.³
Proof. By the assumption (9) holds for all sets B1, B2 from A1 and A2, respec-
tively. We want (9) to hold for all sets B1, B2 from σ(A1) and σ(A2). Fix
B2 ∈ A2. Let L be the class of sets B1 for which (9) holds. Our goal is to show
that the class L is a σ-field.
³ In the textbook, you can see a slightly different formulation, namely it is only required that
A1 and A2 are π-systems. Here, in our theorem, we require more.
• Ω ∈ L: Indeed, P(Ω ∩B2) = P(Ω)P(B2) trivially holds.
• L is closed under complement: Indeed, if P(B1 ∩B2) = P(B1)P(B2), then
P(B′1 ∩B2) = P(Ω ∩B2)− P(B1 ∩B2)
= P(Ω ∩B2)− P(B1)P(B2) = P(B′1)P(B2) .
• L is closed under finite disjoint unions: Indeed, if A,B ∈ L, A ∩ B = ∅
and both P(A∩B2) = P(A)P(B2) and P(B ∩B2) = P(B)P(B2) hold then
P((A ∪B) ∩B2) = P((A ∩B2) ∪ (B ∩B2)) = P(A ∩B2) + P(B ∩B2)
= P(A)P(B2) + P(B)P(B2) = P(A ∪B)P(B2) .
• The same argument allows us to conclude that L is closed under countable
disjoint unions. Instead of finite additivity of P use countable additivity.
• We cannot conclude yet that L is a σ-field, since we only know that L is
closed under countable disjoint unions.
• But, L is a π-system (closed under finite intersections) and a λ-system
(closed under complement and countable disjoint unions), hence by Problem
Set 2 it is a σ-field.
• Since (9) holds for A1 and σ(A1) ⊂ L, (9) holds for σ(A1).
• Now, in principle, you should fix B1 ∈ A1 and redo the proof. But the
steps are exactly the same.
3.2 The Borel-Cantelli Lemmas
Note first that if lim_{n→∞} P(An) = 0, then P(lim inf_n An) = 0. The next result
has a stronger assumption but also a stronger conclusion.
Theorem 27 (First Borel-Cantelli Lemma). If ∑_n P(An) < ∞, then P(lim sup_n An) = 0.
Proof. Fix m ∈ N. We have
P(lim sup_n An) ≤ P(⋃_{k=m}^∞ Ak) ≤ ∑_{k=m}^∞ P(Ak) .
Since the series is summable, the latter expression converges to zero as m → ∞.
Theorem 28 (Second Borel-Cantelli Lemma). If ∑_n P(An) = ∞ and the
events An are independent, then P(lim sup_n An) = 1.
Proof. Recall that lim sup_n An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak, so that the complement is
⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak^c. We will show that the complement has probability zero.
For this it suffices to prove that P(⋂_{k=n}^∞ Ak^c) = 0 for each n. Fix j. Then, using independence and
1 − x ≤ exp(−x), x ∈ [0, 1],
P(⋂_{k=n}^{n+j} Ak^c) = ∏_{k=n}^{n+j} P(Ak^c) = ∏_{k=n}^{n+j} (1 − P(Ak)) ≤ exp(−∑_{k=n}^{n+j} P(Ak)) .
If we let j → ∞, then the last expression goes to zero, since the series ∑_k P(Ak) diverges.
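To see the bound from the proof at work, one can take independent events with P(Ak) = 1/k (an assumption made only for this illustration; the harmonic series diverges). The sketch below computes the product ∏(1 − P(Ak)) exactly and the exponential upper bound numerically.

```python
import math
from fractions import Fraction

def prob_none_occur(n, j):
    """P(no A_k occurs for n <= k <= n+j) for independent A_k with P(A_k) = 1/k."""
    result = Fraction(1)
    for k in range(n, n + j + 1):
        result *= 1 - Fraction(1, k)
    return result

def exp_bound(n, j):
    """Upper bound exp(-sum_{k=n}^{n+j} P(A_k)) used in the proof."""
    return math.exp(-sum(1.0 / k for k in range(n, n + j + 1)))
```

For this choice the product telescopes to (n − 1)/(n + j), which tends to zero as j → ∞ for every fixed n, in agreement with the proof.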
Example 29. The assumption ∑_n P(An) < ∞ is sufficient, but not necessary
for the conclusion of the first Borel-Cantelli Lemma. Consider the probability space ((0, 1],B, λ)
and the sets An = (0, 1/n]. Then An → ∅ and so lim sup_n An = ∅ (so that its
measure is 0). On the other hand, ∑_n λ(An) = ∞.
Note also that this example shows that the independence assumption cannot be dropped
from the second Borel-Cantelli Lemma: the events An are not independent.
Let {An, n ≥ 1} be a sequence of events in the probability space (Ω,F ,P).
Consider the σ-fields σ(An, An+1, . . .) and define the tail σ-field
T = ⋂_{n=1}^∞ σ(An, An+1, . . .) .
Any event in T is called a tail event and is determined by the An's for arbitrarily
large n.
Example 30. lim sup_n An and lim inf_n An are tail events.
Theorem 31. If the events An are independent, then A ∈ T implies P(A) = 0
or P(A) = 1.
Proof. Since the events are independent, the σ-fields σ(A1), . . . , σ(An−1) and
σ(An, An+1, . . .) are independent. If A ∈ T , then A ∈ σ(An, An+1, . . .) and
hence the events A,A1, . . . , An−1 are independent. Since independence of a collection
of events is defined by independence of each finite subcollection, the events A,A1, A2, . . . are independent.
Thus, σ(A) and σ(A1, A2, . . .) are independent. Moreover, A ∈ T ⊂
σ(A1, A2, . . .), hence A is independent of itself, that is, P(A) = P(A ∩ A) = P(A)^2. This means that its
probability is 0 or 1.
4 Random variables
4.1 Definition
Definition 32. Let (Ω,F ,P) be a probability space and let (S,S) be a measurable
space. A mapping X : Ω→ S such that
X^{-1}(B) = {ω : X(ω) ∈ B} = {X ∈ B} ∈ F (12)
for any B ∈ S is called a ((S,S)-valued)-random variable. Such a mapping
is also called a measurable mapping.
Example 33. If S = R and S = B, then we simply say that X is a random
variable. Recall that B is the smallest σ-algebra that contains all intervals.
Remark 34. There is no guarantee just by the definition that the class {X^{-1}(B) :
B ∈ S} is a σ-field. It needs a proof.
The σ-algebra S may be huge. Therefore, verifying (12) may be difficult.
The following theorem simplifies (12) in a special case.
Theorem 35. Assume that S = σ(A) and X : Ω → S. Assume that X^{-1}(B) ∈
F for all B ∈ A. Then X is an (S,S)-valued random variable.
See Assignment 3.
Example 36. Let S = R and S = B. Then we only need to check that
X^{-1}((−∞, x]) ∈ F for any x ∈ R (in fact, for any x ∈ Q).
4.2 Simple random variables
Let A ∈ F . Define
I_A(ω) = 1 if ω ∈ A, and I_A(ω) = 0 if ω ∉ A .
Note that for any Borel set B, {ω : I_A(ω) ∈ B} is either ∅, A, A^c or Ω
(depending on whether 0 ∈ B or not and whether 1 ∈ B or not). Thus, I_A is
a random variable, called an indicator random variable. Likewise, a finite
sum
X = ∑_{i=1}^d xi I_{Ai}(ω)
is a simple random variable if the sets Ai form a finite partition of Ω into
F-sets.
4.3 σ-fields generated by random variables
If G ⊂ F is a σ-field, then a random variable X is G-measurable or measurable
with respect to G if {X ∈ B} = {ω : X(ω) ∈ B} ∈ G for any Borel set B.
In case of simple random variables, this reduces to {X = x} ∈ G. Indeed,
{X ∈ B} = ⋃{X = x}, where the union extends over the finitely many values
of x lying both in B and the range of X.
Example 37. Let (Ω,F ,P) = (R,B,P) and consider X(ω) = ω. Then X is
F-measurable (and hence a random variable). Take now G = σ([a, b] : a, b ∈
Z). Then X is not G-measurable. Indeed, the set X^{-1}((−∞, x]) = {ω : ω ≤ x},
x ∉ Z, is not in G.
Definition 38. The σ-field generated by an (S,S)-valued random variable
X, denoted by σ(X), is the smallest σ-field with respect to which X is measurable.
Example 39. Assume that S = σ(A) and X : Ω → S. Then σ(X) is generated
by the collection of sets X^{-1}(A) = {X^{-1}(B) : B ∈ A} ⊂ F . That is, σ(X) =
σ(G), where G is the collection of sets A such that A = X^{-1}(B) for some B ∈ A.
Example 40. For a random variable X we have
σ(X) = σ({ω : X(ω) ≤ q}, q ∈ Q) .
Note that {ω : X(ω) ≤ q} = X^{-1}((−∞, q]).
Theorem 41. Assume that X is a simple random variable. Then σ(X) consists
of the sets {ω : X(ω) ∈ B}, where B ⊂ R.
Proof. Let M be the class of the subsets of Ω of the form {ω : X(ω) ∈ B}.
Then M ⊂ σ(X). At the same time M is a σ-field. This finishes the proof. See
the textbook for details.
Theorem 42. A simple random variable Y is σ(X)-measurable if and only if
Y = f(X) for a function f : R→ R.
Example 43. For example, Y = X^2 is σ(X)-measurable, but Y = X^2 + Z,
where Z is another random variable, is in general not σ(X)-measurable. However, Y is
σ(X,Z)-measurable.
4.4 Independence
Definition 44. A sequence X1, X2, . . . (finite or infinite) of (S,S)-valued random
variables is independent if the classes σ(X1), σ(X2), . . . are independent. That is, for each
finite k,
P(X1 ∈ B1, . . . , Xk ∈ Bk) = P(X1 ∈ B1) · · ·P(Xk ∈ Bk) (13)
for all B1, . . . , Bk ∈ S.
If S = σ(A) it suffices to verify (13) for sets B1, . . . , Bk ∈ A only (see the
previous lectures). In case of random variables it reduces to
P(X1 ≤ x1, . . . , Xk ≤ xk) = P(X1 ≤ x1) · · ·P(Xk ≤ xk) , (14)
while in case of simple random variables it further reduces to
P(X1 = y1, . . . , Xk = yk) = P(X1 = y1) · · ·P(Xk = yk) . (15)
Indeed, {X1 ≤ x1} = ⋃_{y ≤ x1} {X1 = y} and the latter is a finite union. Thus,
(15) implies (14).
4.5 Closure properties
Recall that f is a measurable function from (S,S) to (T, T ) if
f^{-1}(B) = {x : f(x) ∈ B} ∈ S
for any B ∈ T .
Theorem 45. Let X : Ω → S be an (S,S)-valued random variable and let f be a
measurable function from (S,S) to (T, T ). Then the composition f ◦ X : Ω → T
is a (T, T )-valued random variable.
Proof. Let B ∈ T . We know that f^{-1}(B) ∈ S. Now
(f ◦ X)^{-1}(B) = X^{-1}(f^{-1}(B)) ∈ F .
Remark 46. Note that Theorem 45 extends Theorem 42 from simple to arbitrary
random variables. In Theorem 45, however, we need f to be a measurable
function.
Example 47. Let X : Ω → R^d be measurable so that (X1(ω), . . . , Xd(ω)) is a
random vector. Then ∑_{i=1}^d Xi, ∏_{i=1}^d Xi, min_{i=1,...,d} Xi, max_{i=1,...,d} Xi are
random variables.
4.6 Convergence of random variables
https://en.wikipedia.org/wiki/Uniform_convergence
Let (Ω,F ,P) be a probability space and let X,Xn, n ≥ 1 be random variables
defined on that probability space. That is, X : Ω → R and Xn : Ω → R. Then
Xn(ω) converges to X(ω) if for each ε > 0 there exists n0 such that for all
n ≥ n0 we have |Xn(ω) − X(ω)| < ε. Thus, the complementary event is:
|Xn(ω) − X(ω)| ≥ ε for some ε > 0 and infinitely many values of n:
{lim_n Xn = X}^c = ⋃_{ε>0} {|Xn − X| ≥ ε i.o.} .
Note that the union can be taken over rational values of ε.
Definition 48. We say that Xn converges with probability 1 to X (converges
almost surely) if
P(|Xn − X| ≥ ε i.o.) = 0 (16)
for all ε > 0. We will write Xn −a.s.→ X.
Let An = {|Xn − X| ≥ ε}. Then {|Xn − X| ≥ ε i.o.} = lim sup_n An.
Thus Xn −a.s.→ X if and only if P(lim sup_n An) = 0 for every ε > 0. Since lim sup_n P(An) ≤
P(lim sup_n An) we conclude that (16) implies
lim_{n→∞} P(|Xn − X| ≥ ε) = 0 . (17)
Definition 49. We say that Xn converges in probability to X if
lim_{n→∞} P(|Xn − X| ≥ ε) = 0 (18)
for all ε > 0. We will write Xn −p→ X.
Remark 50. We get immediately
Xn −a.s.→ X ⇒ Xn −p→ X .
The converse is not true.
Example 51. Let An be events such that P(An) → 0. Take X ≡ 0 and Xn =
I_{An}. Note that Xn −p→ X is equivalent to P(An) → 0. Now take any such sets An for which
{An i.o.} has a positive probability; then Xn does not converge to X almost surely. For example, on ((0, 1],B, λ) take An = (tn, tn + sn) with
sn ↓ 0 and such that almost every ω is covered by infinitely many intervals An: tn = (i − 1)/k,
sn = 1/k when n = k(k − 1)/2 + i, i = 1, . . . , k, k ≥ 1.
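The sliding intervals of Example 51 can be generated explicitly. The sketch below recovers (tn, sn) from n and lists the indices n for which a given ω lies in An; the function names are hypothetical, and exact fractions are used to avoid rounding at the interval endpoints.

```python
from fractions import Fraction

def interval(n):
    """Return (t_n, s_n) for n = k(k-1)/2 + i with 1 <= i <= k, as in Example 51."""
    k = 1
    while k * (k + 1) // 2 < n:   # block k covers n = k(k-1)/2 + 1, ..., k(k+1)/2
        k += 1
    i = n - k * (k - 1) // 2
    return Fraction(i - 1, k), Fraction(1, k)

def hits(omega, n_max):
    """Indices n <= n_max with t_n < omega < t_n + s_n."""
    out = []
    for n in range(1, n_max + 1):
        t, s = interval(n)
        if t < omega < t + s:
            out.append(n)
    return out
```

A point ω that is not an endpoint of any of the intervals is hit exactly once in every block k, so it lies in infinitely many An although P(An) = sn → 0.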
Example. This example was suggested by Joe. Consider a sequence of independent
random variables Xn, n ≥ 1, taking value 1 with probability 1/n and 0
with probability 1 − 1/n. We can write Xn = 1_{An}, where the An are independent events
with P(An) = 1/n. (The nested sets An = (0, 1/n] on Ω = (0, 1] with P = λ have the right
probabilities, but they are not independent; see Example 29.) Then P(An) = 1/n → 0 and hence we have
convergence in probability: Xn −p→ X, where X ≡ 0. You can also see this in the
following way: for 0 < ε ≤ 1, P(|Xn| ≥ ε) = P(Xn = 1) = 1/n → 0. But we do not have almost sure
convergence. Indeed, since ∑_n P(An) = +∞, using the second Borel-Cantelli
lemma, P(lim sup_n An) = 1. Thus P({ω : Xn(ω) = 1 for infinitely many n}) = 1,
hence Xn does not converge to zero almost surely.
4.7 Approximation by simple random variables
Theorem 52. For any random variable X there exists a sequence Xn of simple
random variables such that lim_n Xn(ω) = X(ω) for each fixed ω ∈ Ω.
Proof. Let
fn(x) = n I_{(n,∞)}(x) + 2^{-n} ∑_{k=0}^{n2^n − 1} k I_{(k2^{-n}, (k+1)2^{-n}]}(x) .
If X ≥ 0 then Xn = fn(X) is a simple random variable. Note that |X(ω) − Xn(ω)| ≤ 2^{-n}
whenever X(ω) ≤ n. Thus the statement holds for X ≥ 0. If X is arbitrary,
then we can write X(ω) = X^+(ω) − X^−(ω), where X^+(ω) = max{X(ω), 0}
and X^−(ω) = −min{X(ω), 0}. Both are nonnegative random variables and we take
Xn = fn(X^+) − fn(X^−).
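The staircase function fn from the proof is easy to evaluate numerically. The following sketch implements it for a single nonnegative value x (the name f_n and the use of floating point are choices of this illustration): x is rounded down to the left endpoint of its dyadic interval of length 2^{-n}, and capped at n.

```python
import math

def f_n(x, n):
    """Dyadic staircase f_n from the proof of Theorem 52, for x >= 0."""
    if x > n:
        return float(n)          # the term n * I_(n, infinity)(x)
    if x <= 0:
        return 0.0               # no indicator is active at x = 0
    # x lies in (k 2^-n, (k+1) 2^-n] exactly for k = ceil(x 2^n) - 1
    k = math.ceil(x * 2 ** n) - 1
    return k / 2 ** n
```

For example, f_n(0.7, 1) equals 0.5, and for x ≤ n the error x − f_n(x, n) is at most 2^{-n}, which is the estimate used in the proof.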
4.8 Expected Value
Let X be a simple random variable
X = ∑_{i=1}^d xi 1_{Ai} .
Then the expected value is defined as
E[X] = ∑_{i=1}^d xi P(Ai) . (19)
On the other hand, we have
E[X] = ∑_x x P(X = x) . (20)
Indeed, the right-hand sides of (19) and (20) can be written as
∑_x ∑_{i: xi = x} xi P(Ai) .
In particular, E[I_A] = P(A).
If Y = f(X) then Y is the simple random variable of the form
Y = ∑_{i=1}^d f(xi) 1_{Ai}
and hence
E[Y] = E[f(X)] = ∑_{i=1}^d f(xi) P(Ai) = ∑_x f(x) P(X = x) .
The k-th moment of X is given by
E[X^k] = ∑_x x^k P(X = x) .
The properties of the expected value follow from the corresponding properties of the sum
operation:
• Linearity: If X = ∑_i xi 1_{Ai} and Y = ∑_j yj 1_{Bj}, and α, β ∈ R, then
xi 1_{Ai} = xi ∑_j 1_{Ai ∩ Bj}
and hence
αX + βY = ∑_{i,j} (αxi + βyj) 1_{Ai ∩ Bj} .
Then
E[αX + βY] = ∑_{i,j} (αxi + βyj) P(Ai ∩ Bj)
= α ∑_i xi P(Ai) + β ∑_j yj P(Bj) = αE[X] + βE[Y] .
• It extends to a finite number of simple random variables:
E[∑_{i=1}^n Xi] = ∑_{i=1}^n E[Xi] . (21)
• If X(ω) ≤ Y(ω) for all ω, then E[X] ≤ E[Y];
• |E[X]| ≤ E[|X|];
• |E[X − Y]| ≤ E[|X − Y|] . (22)
• If X and Y are independent, then
E[XY] = E[∑_{i,j} xi yj 1_{Ai} 1_{Bj}] = E[∑_{i,j} xi yj 1_{Ai ∩ Bj}] = ∑_{i,j} xi yj P(Ai ∩ Bj)
= ∑_{i,j} xi yj P(Ai) P(Bj) = (∑_i xi P(Ai)) (∑_j yj P(Bj)) = E[X]E[Y] .
Assume that X is nonnegative. Order the range as 0 ≤ x1 < x2 < · · · < xd. Then
E[X] = ∑_{i=1}^d xi P(X = xi) = ∑_{i=1}^{d−1} xi (P(X ≥ xi) − P(X ≥ x_{i+1})) + xd P(X = xd)
= x1 P(X ≥ x1) + ∑_{i=2}^d (xi − x_{i−1}) P(X ≥ xi) .
This can be written as
∫_0^∞ P(X ≥ x) dx .
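The rearrangement above can be checked on a concrete simple random variable. In the sketch below a distribution is given as a dictionary {value: probability} with exact fractions; the example distribution (the loaded die from Section 1.1) and the function names are choices of this illustration.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum_x x P(X = x) for a simple random variable given as {value: prob}."""
    return sum(x * p for x, p in pmf.items())

def tail_sum(pmf):
    """x_1 P(X >= x_1) + sum_{i>=2} (x_i - x_{i-1}) P(X >= x_i), for values >= 0."""
    total = Fraction(0)
    prev = Fraction(0)
    for x in sorted(pmf):
        tail = sum(p for y, p in pmf.items() if y >= x)   # P(X >= x)
        total += (x - prev) * tail
        prev = x
    return total

pmf = {Fraction(1): Fraction(1, 6),
       Fraction(2): Fraction(2, 6),
       Fraction(3): Fraction(3, 6)}
```

Both expectation(pmf) and tail_sum(pmf) give 7/3 here.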
Theorem 53. If the sequence {Xn} is uniformly bounded and X = lim_n Xn (that is,
Xn −a.s.→ X), then lim_n E[Xn] = E[X].
Proof. There exists K0 such that sup_n sup_ω |Xn(ω)| < K0. Let K1 = sup_ω |X(ω)|,
K = max{K0,K1}. Then sup_n sup_ω |X(ω) − Xn(ω)| ≤ 2K. Fix ε > 0 and let An = {ω : |X(ω) −
Xn(ω)| > ε}. Then for all ω
|X(ω) − Xn(ω)| = |X − Xn| 1_{An}(ω) + |X − Xn| 1_{An′}(ω) ≤ 2K 1_{An}(ω) + ε 1_{An′}(ω) .
Thus
E[|X − Xn|] ≤ 2K P(An) + ε P(An′) ≤ 2K P(An) + ε .
Since Xn −a.s.→ X, we also have Xn −p→ X and by the definition lim_n P(An) = 0. This
implies
lim sup_n E[|X − Xn|] ≤ ε .
Since ε is arbitrary, lim_n E[|X − Xn|] = 0. Apply (22).
Let µ = E[X]. We define the variance
Var(X) = E[(X − µ)^2] = E[X^2] − µ^2 .
Justification:
E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − E[2µX] + E[µ^2]
= E[X^2] − 2µE[X] + µ^2 = E[X^2] − 2µ^2 + µ^2 = E[X^2] − µ^2 .
• Var(aX + b) = a^2 Var(X) .
Justification: Let Y = aX + b. Then µ_Y = E[Y] = aE[X] + b = aµ_X + b. We
have
Var(aX + b) = Var(Y) = E[Y^2] − µ_Y^2 = E[(aX + b)^2] − (aµ_X + b)^2
= E[a^2X^2 + 2abX + b^2] − (a^2µ_X^2 + 2abµ_X + b^2)
= E[a^2X^2] + E[2abX] + E[b^2] − (a^2µ_X^2 + 2abµ_X + b^2)
= a^2E[X^2] + 2abE[X] + b^2 − (a^2µ_X^2 + 2abµ_X + b^2)
= a^2 {E[X^2] − µ_X^2} = a^2 Var(X) .
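• Let $S = \sum_{i=1}^{n} X_i$ and $\mu_i = E[X_i]$. Using (21) we get
\[ \operatorname{Var}(S) = E\Big[\Big(\sum_{i=1}^{n} (X_i - \mu_i)\Big)^2\Big] = \sum_{i=1}^{n} E[(X_i - \mu_i)^2] + 2 \sum_{1 \le i < j \le n} E[(X_i - \mu_i)(X_j - \mu_j)] . \]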
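• If the $X_i$ are independent, then the cross terms $E[(X_i - \mu_i)(X_j - \mu_j)]$, $i \ne j$, vanish and
\[ \operatorname{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \operatorname{Var}(X_i) . \]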
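The variance identities above can be checked exactly on finite distributions. This is a minimal sketch; the coin and die examples and the helper names (`variance`, `affine`, `sum_independent`) are assumptions made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2] for a finite pmf {x: P(X = x)}."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def affine(pmf, a, b):
    """Law of aX + b."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[a * x + b] += p
    return dict(out)

def sum_independent(pmf_x, pmf_y):
    """Law of X + Y when X and Y are independent."""
    out = defaultdict(Fraction)
    for x, p in pmf_x.items():
        for y, q in pmf_y.items():
            out[x + y] += p * q   # independence: P(X = x, Y = y) = P(X = x) P(Y = y)
    return dict(out)

# Hypothetical examples: a fair coin and a fair die.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {x: Fraction(1, 6) for x in range(1, 7)}
```

For instance, `variance(affine(die, 3, -2))` can be compared with `9 * variance(die)`, and `variance(sum_independent(coin, die))` with `variance(coin) + variance(die)`.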
4.9 Inequalities
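Lemma 54. Assume that $X$ is nonnegative. Then for any $\varepsilon > 0$,
\[ P(X \ge \varepsilon) \le \varepsilon^{-1} E[X] . \]
Proof.
\[ E[X] = \sum_{x} x\,P(X = x) \ge \sum_{x :\, x \ge \varepsilon} x\,P(X = x) \ge \varepsilon \sum_{x :\, x \ge \varepsilon} P(X = x) = \varepsilon\,P(X \ge \varepsilon) . \]
Applying this to $|X|^k$ we obtain the Markov inequality: for any $X$,
\[ P(|X| \ge \varepsilon) \le \varepsilon^{-k} E[|X|^k] . \]
Applying this with $k = 2$ and $|X - \mu|$ we obtain the Chebyshev inequality:
\[ P(|X - \mu| \ge \varepsilon) \le \varepsilon^{-2} \operatorname{Var}(X) . \]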
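To see how loose or tight these bounds are in a concrete case, the sketch below evaluates both sides of the Markov and Chebyshev inequalities for a fair die. The die and the thresholds `eps = 5` and `delta = 2` are hypothetical choices.

```python
from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}   # hypothetical example

def prob(pmf, event):
    """P(X in event), where event is a predicate on the values of X."""
    return sum(p for x, p in pmf.items() if event(x))

mu = sum(x * p for x, p in die.items())
var = sum((x - mu) ** 2 * p for x, p in die.items())

eps = 5
markov_lhs = prob(die, lambda x: x >= eps)                   # P(X >= 5)
markov_rhs = mu / eps                                        # E[X] / 5

delta = 2
chebyshev_lhs = prob(die, lambda x: abs(x - mu) >= delta)    # P(|X - mu| >= 2)
chebyshev_rhs = var / delta ** 2                             # Var(X) / 4
```

In both cases the true probability is well below the bound, which is typical: the inequalities hold for every distribution with the given moment, so they cannot be sharp for most of them.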
Other inequalities:
• If ϕ is convex then we obtain the Jensen inequality:
ϕ(E[X]) ≤ E[ϕ(X)] .
• If $p, q > 1$ and $1/p + 1/q = 1$, then we obtain the Hölder inequality:
\[ E[|XY|] \le (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q} \]
(use $ab \le a^p/p + b^q/q$ for $a, b > 0$).
• If $0 < p \le q$ then we obtain the Lyapunov inequality:
\[ (E[|X|^p])^{1/p} \le (E[|X|^q])^{1/q} . \]
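The three inequalities in this list can be sanity-checked numerically. The sketch below uses a hypothetical four-point probability space; the values of `probs`, `X` and `Y` and the exponents are arbitrary illustrative choices, and floating-point arithmetic is used because of the fractional powers.

```python
# Hypothetical finite probability space with four outcomes.
probs = [0.1, 0.2, 0.3, 0.4]
X = [-2.0, 0.0, 1.0, 3.0]
Y = [1.0, -1.0, 2.0, 0.5]

def E(values):
    """Expected value of a random variable listed outcome by outcome."""
    return sum(v * p for v, p in zip(values, probs))

def norm(values, r):
    """(E[|Z|^r])^(1/r)."""
    return E([abs(v) ** r for v in values]) ** (1.0 / r)

# Jensen with the convex function phi(x) = x^2.
jensen_lhs = E(X) ** 2
jensen_rhs = E([x ** 2 for x in X])

# Hoelder with p = 3 and q = 3/2, so that 1/p + 1/q = 1.
p, q = 3.0, 1.5
holder_lhs = E([abs(x * y) for x, y in zip(X, Y)])
holder_rhs = norm(X, p) * norm(Y, q)

# Lyapunov with exponents 1 <= 2.
lyapunov_lhs = norm(X, 1.0)
lyapunov_rhs = norm(X, 2.0)
```

Each `*_lhs` should not exceed the matching `*_rhs`; trying other values of `X`, `Y` and the exponents is an easy way to build intuition.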
5 Measures in Euclidean Spaces
5.1 Lebesgue Measure
The Lebesgue measure $\lambda$ was already defined on $\mathcal{B}$, the Borel σ-field on $\mathbb{R}$. It is the only measure on $\mathcal{B}$ that satisfies $\lambda((a, b]) = b - a$. The class $\mathcal{B}^k$ is the σ-field generated by the sets of the form
\[ I_k := \prod_{i=1}^{k} (a_i, b_i] = (a_1, b_1] \times \cdots \times (a_k, b_k] . \]
The $k$-dimensional Lebesgue measure $\lambda_k$ is defined on such rectangles by
\[ \lambda_k(I_k) = \prod_{i=1}^{k} (b_i - a_i) . \]
The extension theorem allows us to consider this measure on $\mathcal{B}^k$.
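Theorem 55. If $A \in \mathcal{B}^k$, then for $x \in \mathbb{R}^k$, $A + x = \{a + x : a \in A\} \in \mathcal{B}^k$ and $\lambda_k(A + x) = \lambda_k(A)$.
Proof. Let $\mathcal{G} = \{A \in \mathcal{B}^k : A + x \in \mathcal{B}^k \text{ for all } x \in \mathbb{R}^k\}$. Then
• $\mathcal{G}$ contains $\mathcal{I}^k$, the class of all rectangles $I_k$;
• $\mathcal{G}$ is a σ-field. For example, if $A, B \in \mathcal{G}$, then $C = A \cup B \in \mathcal{G}$. Indeed, $C + x = (A \cup B) + x = (A + x) \cup (B + x)$.
• Thus, $\mathcal{B}^k = \sigma(\mathcal{I}^k) \subseteq \mathcal{G}$.
For the second statement,
• Note that $\mathcal{I}^k$ is a π-system (closed under intersections; this is not a field!);
• Fix $x \in \mathbb{R}^k$ and define $\mu(A) = \lambda_k(A + x)$.
• Then $\mu$ and $\lambda_k$ agree on the π-system, since for $A \in \mathcal{I}^k$, $\lambda_k(A) = \lambda_k(A + x)$. Thus, $\mu$ and $\lambda_k$ agree on all Borel sets.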
Let $T : \mathbb{R}^k \to \mathbb{R}^k$ be a linear and nonsingular map. For example,
• rotation or reflection (special cases of so-called orthogonal or unitary transformations); $\det(T) = \pm 1$;
• $T(x_1, \ldots, x_k) = (x_1 + x_2, x_2, \ldots, x_k)$, then $\det(T) = 1$;
• $T(x_1, \ldots, x_k) = (a x_1, x_2, \ldots, x_k)$ with $a \ne 0$, then $\det(T) = a$.
For $A \subseteq \mathbb{R}^k$ denote $TA = \{Ta : a \in A\}$.
Theorem 56. If $T$ is linear and nonsingular, then $A \in \mathcal{B}^k$ implies $TA \in \mathcal{B}^k$ and
\[ \lambda_k(TA) = |\det(T)|\,\lambda_k(A) . \]
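In dimension $k = 2$ the scaling rule of Theorem 56 can be illustrated on the unit square, whose image under a linear map is a parallelogram. The sketch below computes the area of the image with the shoelace formula and compares it with $|\det(T)|$; the two matrices and all function names are hypothetical.

```python
def apply(T, v):
    """Apply the 2x2 matrix T (a tuple of rows) to the point v."""
    return (T[0][0] * v[0] + T[0][1] * v[1],
            T[1][0] * v[0] + T[1][1] * v[1])

def det(T):
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]

def polygon_area(points):
    """Area of a simple polygon from its vertices in order (shoelace formula)."""
    n = len(points)
    twice_area = sum(points[i][0] * points[(i + 1) % n][1]
                     - points[(i + 1) % n][0] * points[i][1]
                     for i in range(n))
    return abs(twice_area) / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # vertices of a square of area 1

shear_scale = ((2, 1), (0, 3))    # hypothetical map with det = 6
reflection = ((0, 1), (1, 0))     # swaps the coordinates, det = -1

def image_area(T):
    """Area of T applied to the unit square."""
    return polygon_area([apply(T, v) for v in unit_square])
```

The reflection has negative determinant but preserves area, which is why the absolute value appears in the theorem.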
Singular case. Note that
\[ B = \{a\} \times (a_2, b_2] \times \cdots \times (a_k, b_k] \]
can be viewed as a subset of both $\mathbb{R}^k$ and $\mathbb{R}^{k-1}$. Then
\[ \lambda_k(B) = 0 , \qquad \lambda_{k-1}(B) = \prod_{i=2}^{k} (b_i - a_i) . \]
Note that $B$ can be viewed as the image of
\[ A = (a_1, b_1] \times (a_2, b_2] \times \cdots \times (a_k, b_k] \]
through the projection: $B = TA$, where $T(x_1, x_2, \ldots, x_k) = (a, x_2, \ldots, x_k)$. We have $\det(T) = 0$.
In fact we have a general statement: if $A, TA \in \mathcal{B}^k$ and $T$ is singular, then $\lambda_k(TA) = 0$. Note however that, contrary to the nonsingular case, it is not always true that $TA \in \mathcal{B}^k$.
5.2 Regularity
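Open and closed sets. A subset $A \subseteq \mathbb{R}$ is open if for every $x \in A$ there exists $\varepsilon > 0$ such that $(x - \varepsilon, x + \varepsilon) \subseteq A$. For example, $(a, b)$ is open, but $(a, b]$ is not. For $x, y \in \mathbb{R}^k$ let $d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$ be the Euclidean distance. Then $A \subseteq \mathbb{R}^k$ is open if for any $x \in A$ there exists $\varepsilon > 0$ such that $d(x, y) < \varepsilon$ implies $y \in A$. Equivalently, a set $A$ is open if for every $x \in A$ there exists $\varepsilon > 0$ such that the ball $B(x, \varepsilon) = \{y \in \mathbb{R}^k : d(x, y) < \varepsilon\} \subseteq A$. A closed set is the complement of an open set. A subset of $\mathbb{R}^k$ is compact if and only if it is bounded and closed.
Theorem 57. For any $A \in \mathcal{B}^k$ and $\varepsilon > 0$ there exist an open set $O$ and a closed set $C$ such that $C \subset A \subset O$ and $\lambda_k(O \setminus C) < \varepsilon$.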
5.3 Specifying Measures on the Line
Measures on $\mathbb{R}$. Let $\mu$ be a measure on $\mathbb{R}$. Then
• $\mu$ is finite on bounded sets if it assigns finite values to bounded sets $A \subseteq \mathbb{R}$.
• $\mu$ is a finite measure if $\mu(\mathbb{R}) < \infty$.
• Any probability measure is a finite measure (in fact, probability measures are normalized finite measures).
• The Lebesgue measure is finite on bounded sets, but it is not finite: $\lambda(\mathbb{R}) = \infty$.
For a measure that is finite on bounded sets, define
\[ F(x) = \begin{cases} \mu((0, x]) , & x \ge 0 , \\ -\mu((x, 0]) , & x < 0 . \end{cases} \]
• The function $F$ is nondecreasing;
• The function $F$ is continuous from the right: if $x_n \downarrow x$, then $F(x_n) \to F(x)$.
Distribution function. For every bounded interval $(a, b]$,
\[ \mu((a, b]) = F(b) - F(a) . \tag{23} \]
Note that (23) determines $F$ only up to an additive constant: if we know $\mu$, then from (23) we can recover $F(x) + c$ for some constant $c$, but not $c$ itself. If moreover $\mu$ is finite, then we can alternatively define
\[ F(x) = \mu((-\infty, x]) . \]
Then:
• $\lim_{x \to -\infty} F(x) = 0$;
• $\lim_{x \to +\infty} F(x) = \mu(\mathbb{R})$;
• If $\mu(\mathbb{R}) = 1$, then $F$ is called a (cumulative) distribution function.
• If $\mu = P$ is a probability measure, then for some random variable $X$, $\mu((-\infty, x]) = P(X \le x)$.
Theorem 58. Let $F$ be a nondecreasing, right-continuous function. Then there exists a unique measure $\mu$ such that (23) holds for all $a < b$. In particular, there is a one-to-one correspondence between probability measures on $\mathbb{R}$ and distribution functions.
Proof. We need to show first that $\mu$ defined in (23) is a measure. Let $A = (a, b]$ and $B = (c, d]$ be disjoint. We want to show that $\mu((a, b] \cup (c, d]) = \mu((a, b]) + \mu((c, d])$. (Full details are left to the assignment.)
Example 59. Consider the following measure defined on the Borel sets of $[0, 1]$:
\[ \mu = (1/3)\,\delta_{1/2} + (2/3)\,\delta_{1} , \]
where $\delta_x$ is the Dirac measure at $x$. Then
\[ F(x) = \begin{cases} 0 & \text{if } x < 1/2 , \\ 1/3 & \text{if } 1/2 \le x < 1 , \\ 1 & \text{if } x = 1 . \end{cases} \]
It is a probability measure since the total mass is one.
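A distribution function of a purely atomic measure can be evaluated by summing the masses of the atoms at or below $x$. The sketch below does this for the measure $\mu$ as stated in Example 59, with atoms at $1/2$ and $1$; the dictionary `atoms` and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Atoms of mu = (1/3) delta_{1/2} + (2/3) delta_1, stored as {location: mass}.
atoms = {Fraction(1, 2): Fraction(1, 3), Fraction(1): Fraction(2, 3)}

def F(x):
    """F(x) = mu([0, x]): total mass of the atoms located at or below x."""
    return sum((m for a, m in atoms.items() if a <= x), Fraction(0))

def mu_interval(a, b):
    """mu((a, b]) = F(b) - F(a), as in (23)."""
    return F(b) - F(a)
```

With these atoms, `F` jumps by 1/3 at 1/2 and by 2/3 at 1, and `mu_interval(Fraction(1, 2), 1)` returns the mass of the atom at 1 only, because the interval is open on the left.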
5.4 Specifying Measures on R2
Measures on $\mathbb{R}^2$. Let $a = (a_1, a_2)$, $b = (b_1, b_2)$ and let $A = (a, b] = (a_1, b_1] \times (a_2, b_2]$ be a bounded rectangle. For $x \in \mathbb{R}^2$ consider
\[ S_x = \{y \in \mathbb{R}^2 : y_1 \le x_1,\ y_2 \le x_2\} . \]
The class of sets $\{S_x, x \in \mathbb{R}^2\}$ generates $\mathcal{B}^2$. For a function $F : \mathbb{R}^2 \to \mathbb{R}$ define
\[ \Delta_A F = F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2) . \]
Let now $\mu$ be a finite measure on $\mathbb{R}^2$. Define $F$ by
\[ F(x) = \mu(S_x) = \mu(\{y : y_1 \le x_1,\ y_2 \le x_2\}) . \tag{24} \]
Theorem 60. Let $F$ be continuous from above and such that $\Delta_A F \ge 0$ for any bounded rectangle $A$. Then there exists a unique measure $\mu$ such that (24) holds.
If $\mu = P$ is a probability measure, then for some random vector $(X, Y)$,
\[ \begin{aligned}
\Delta_A F &= F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2) \\
&= \mu((-\infty, b_1] \times (-\infty, b_2]) - \mu((-\infty, b_1] \times (-\infty, a_2]) \\
&\quad - \mu((-\infty, a_1] \times (-\infty, b_2]) + \mu((-\infty, a_1] \times (-\infty, a_2]) \\
&= P(X \le b_1, Y \le b_2) - P(X \le b_1, Y \le a_2) \\
&\quad - P(X \le a_1, Y \le b_2) + P(X \le a_1, Y \le a_2) \\
&= P(a_1 < X \le b_1,\ a_2 < Y \le b_2) .
\end{aligned} \]
Thus, the requirement $\Delta_A F \ge 0$ is quite natural.
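The inclusion-exclusion identity behind $\Delta_A F$ can be checked on a small discrete joint law. In the sketch below, the four-point distribution `joint` and the helper names are hypothetical; `delta` evaluates $\Delta_A F$ from the joint distribution function, while `rectangle_prob` sums the probabilities inside the rectangle directly.

```python
from fractions import Fraction

# Hypothetical joint law of (X, Y) on four points, stored as {(x, y): probability}.
joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
         (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 2)}

def F(x1, x2):
    """F(x) = P(X <= x1, Y <= x2), as in (24)."""
    return sum((p for (u, v), p in joint.items() if u <= x1 and v <= x2),
               Fraction(0))

def delta(a, b):
    """Delta_A F for the rectangle A = (a1, b1] x (a2, b2]."""
    (a1, a2), (b1, b2) = a, b
    return F(b1, b2) - F(b1, a2) - F(a1, b2) + F(a1, a2)

def rectangle_prob(a, b):
    """P(a1 < X <= b1, a2 < Y <= b2), computed directly from the joint law."""
    (a1, a2), (b1, b2) = a, b
    return sum((p for (u, v), p in joint.items()
                if a1 < u <= b1 and a2 < v <= b2), Fraction(0))
```

For any rectangle the two functions should agree; in particular `delta` is never negative, which is the condition required in Theorem 60.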
6 Measurable Functions and Mappings
6.1 Measurable Mappings
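Let $(\Omega, \mathcal{F})$ and $(\Omega', \mathcal{F}')$ be two measurable spaces. A map $T : \Omega \to \Omega'$ is measurable (sometimes written as $\mathcal{F}/\mathcal{F}'$-measurable) if for each $A' \in \mathcal{F}'$ we have $T^{-1}A' \in \mathcal{F}$. Recall that an $\mathcal{F}/\mathcal{B}$-measurable map is called a random variable. In this case we will simply write $\mathcal{F}$-measurable.
Example 61. A real function $f : \Omega \to \mathbb{R}$ with a finite range is measurable if $f^{-1}(\{x\}) \in \mathcal{F}$ for every $x$ in its range.
Theorem 62. If $T^{-1}A' \in \mathcal{F}$ for each $A' \in \mathcal{A}'$ and $\sigma(\mathcal{A}') = \mathcal{F}'$, then $T$ is $\mathcal{F}/\mathcal{F}'$-measurable (cf. Theorem 4, Slides Set E).
Proof. Let $\mathcal{G} = \{A' : T^{-1}A' \in \mathcal{F}\}$. Then $\mathcal{G}$ is a σ-field and contains $\mathcal{A}'$, hence $\mathcal{F}' = \sigma(\mathcal{A}') \subseteq \mathcal{G}$.
Let $T : \Omega \to \Omega'$ and $T' : \Omega' \to \Omega''$. The space $\Omega''$ is equipped with a σ-field $\mathcal{F}''$. A composition $T' \circ T : \Omega \to \Omega''$ is defined as $\omega \mapsto T'(T(\omega))$.
Theorem 63. If $T$ is $\mathcal{F}/\mathcal{F}'$-measurable and $T'$ is $\mathcal{F}'/\mathcal{F}''$-measurable, then $T' \circ T$ is $\mathcal{F}/\mathcal{F}''$-measurable (a composition of measurable maps is measurable).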
6.2 Mappings into Rk
A map $f : \Omega \to \mathbb{R}^k$ has the form
\[ f(\omega) = (f_1(\omega), \ldots, f_k(\omega)) . \]
Since the sets
\[ S_x = \{y \in \mathbb{R}^k : y_i \le x_i,\ i = 1, \ldots, k\} \]
generate $\mathcal{B}^k$, a function $f$ is $\mathcal{F}$-measurable if and only if the set
\[ \{\omega : f_i(\omega) \le x_i,\ i = 1, \ldots, k\} \]
belongs to $\mathcal{F}$ for each $(x_1, \ldots, x_k)$.
Theorem 64. If $f : \mathbb{R}^i \to \mathbb{R}^k$ is continuous, then it is measurable.
Proof. Take for simplicity $i = 1$, $k = 1$. We need to show that $S_x = \{\omega : f(\omega) \le x\} \in \mathcal{B}$ for each $x$. But continuity implies that $S_x$ is a closed set, hence the complement of an open set. Open sets are Borel sets, and so are their complements.
6.3 Limits and Measurability
Theorem 65. Assume that $f_n$, $n \ge 1$, are real-valued $\mathcal{F}$-measurable functions. Then
(a) The functions $\sup_n f_n$, $\inf_n f_n$, $\limsup_n f_n$, $\liminf_n f_n$ are measurable.
(b) If $\lim_n f_n$ exists everywhere, then it is measurable.
(c) The set $\{\omega : \lim_n f_n(\omega) \text{ exists}\}$ belongs to $\mathcal{F}$.
Proof. We have
\[ \{\omega : \sup_n f_n(\omega) \le x\} = \bigcap_n \{\omega : f_n(\omega) \le x\} \in \mathcal{F} . \]
Thus, $\sup_n f_n$ is measurable.
Note: Assume that we have a probability space $(\Omega, \mathcal{F}, P)$ and $f_n : \Omega \to \mathbb{R}$. Then "$\lim_n f_n$ exists almost everywhere" means that
\[ P(\{\omega : \lim_n f_n(\omega) \text{ does not exist}\}) = 0 . \]
For example, if $P = \lambda$, then a function $f : (0, 1] \to \mathbb{R}$ that has jumps at a countable number of points is not continuous, but it is measurable.
Example 66. Let f(x) = x, g(x) = 1−x, x ∈ [0, 1]. Then h = sup{f, g} is a function
on [0, 1] defined as h(x) = 1− x, x ∈ [0, 1/2], h(x) = x, x ∈ (1/2, 1].
6.4 Transformations of Measures
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space (for example, a probability space). Let $T : \Omega \to \Omega'$ be $\mathcal{F}/\mathcal{F}'$-measurable. Define a set function $\nu$ (see footnote 4) by
\[ \nu(A') = \mu T^{-1}(A') = \mu(T^{-1}A') , \qquad A' \in \mathcal{F}' . \]
We note that $\nu = \mu T^{-1}$ is a set function on $\mathcal{F}'$.
It is not obvious that $\nu$ is itself a measure. However, assume that $A', B' \in \mathcal{F}'$ are disjoint. Then the sets $T^{-1}(A')$ and $T^{-1}(B')$ are also disjoint and
\[ \begin{aligned}
\nu(A' \cup B') &= \mu(T^{-1}(A' \cup B')) = \mu(T^{-1}(A') \cup T^{-1}(B')) \\
&= \mu(T^{-1}(A')) + \mu(T^{-1}(B')) = \nu(A') + \nu(B') .
\end{aligned} \]
In a similar way one can justify the other properties. Hence $\nu$ is a measure.
• If $\mu$ is finite then $\mu T^{-1}$ is also finite.
• If $\mu$ is a probability measure then $\mu T^{-1}$ is also a probability measure. Indeed,
\[ \nu(\Omega') = \mu(T^{-1}\Omega') = \mu(\Omega) = 1 . \]
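On a finite space the image measure $\nu = \mu T^{-1}$ can be computed by moving the mass of each point $\omega$ to $T(\omega)$. The sketch below assumes a uniform probability on $\{1, \ldots, 6\}$ and the map $T(\omega) = \omega \bmod 3$ as a hypothetical example.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu, T):
    """Return nu = mu T^{-1} for a measure mu given as {point: mass}."""
    nu = defaultdict(Fraction)
    for w, mass in mu.items():
        nu[T(w)] += mass          # the mass of w is moved to T(w)
    return dict(nu)

def measure(m, A):
    """m(A) for a set A of points."""
    return sum((mass for w, mass in m.items() if w in A), Fraction(0))

# Hypothetical example: uniform probability on {1, ..., 6} and T(w) = w mod 3.
mu = {w: Fraction(1, 6) for w in range(1, 7)}
T = lambda w: w % 3
nu = pushforward(mu, T)
```

By construction `measure(nu, A)` equals `measure(mu, preimage of A)`, and the total mass is unchanged, so the image of a probability measure is again a probability measure.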
Footnote 4: Note that any measure is a set function, but not vice versa.
Measure preserving transformations
Definition 67. Let $T : \Omega \to \Omega$ be measurable. If
\[ \mu(A) = \mu(T^{-1}(A)) , \qquad A \in \mathcal{F} , \]
then we say that $T$ preserves the measure $\mu$.
Example 68. If $\Omega = \mathbb{R}^k$ and $T$ is a linear map with $\det(T) = 1$, then $T$ preserves the Lebesgue measure $\lambda_k$ (apply Theorem 56 to $T^{-1}$).
Example 69. The identity map is always measure preserving.
7 Distributions
7.1 Transformation of probability measures. Distributions
Consider a probability space $(\Omega, \mathcal{F}, P)$.
Recall that a random variable $X$ is a measurable map $X : \Omega \to \mathbb{R}$. Define the measure $\mu$ on $\mathcal{B}$ by
\[ \mu = P X^{-1} . \]
Thus, for each $A \in \mathcal{B}$ we have
\[ \mu(A) = P(X^{-1}A) = P(\{\omega : X(\omega) \in A\}) = P(X \in A) . \]
We say that $\mu$ is the law or the distribution of the random variable $X$.
The (cumulative) distribution function of $X$ is
\[ F(x) = \mu((-\infty, x]) = P(X \le x) . \]
Example 70. Consider the probability space $([0, 1], \mathcal{B}, \lambda)$. Define a map $U(\omega) = \omega$. Then $\mu = \lambda$ and the cumulative distribution function of $U$ is
\[ F_U(x) = P(U \le x) = \begin{cases} 0 , & x < 0 , \\ x , & x \in [0, 1] , \\ 1 , & x > 1 . \end{cases} \]
We say that $U$ has a uniform distribution.
Example 71. Consider a probability space $(\Omega, \mathcal{F}, P)$, for example $([0, 1], \mathcal{B}, \lambda)$. Assume that $X$ is a random variable defined on $\Omega$ with the law $\mu = P X^{-1}$. Define $Y = X + b$, $b \in \mathbb{R}$. Note that $Y$ is also defined on $\Omega$. Consider the law $\nu = P Y^{-1}$. Recall that for $A \subseteq \mathbb{R}$ the set $A - b$ is defined as $A - b = \{a - b : a \in A\}$. Then
\[ \begin{aligned}
\nu(A) &= P(Y^{-1}A) = P(\{\omega : Y(\omega) \in A\}) = P(\{\omega : X(\omega) + b \in A\}) \\
&= P(\{\omega : X(\omega) \in A - b\}) = P(X \in A - b) = \mu(A - b) .
\end{aligned} \]
If $F_X$ and $F_Y$ are the cumulative distribution functions of $X$ and $Y$, respectively, then
\[ F_Y(x) = F_X(x - b) . \]
Properties of cdf
• $\lim_{x \to -\infty} F(x) = 0$;
• $\lim_{x \to +\infty} F(x) = 1$;
• If $F$ is a distribution function then
\[ \bar{F}(x) = 1 - F(x) = 1 - P(X \le x) = P(X > x) \]
is called the tail distribution function.
• $F(x-) = \lim_{y \uparrow x} F(y) = P(X < x) = \mu((-\infty, x))$;
• $P(X = x) = F(x) - F(x-)$. If $P(X = x) \ne 0$ then we say that $F$ has a jump at $x$.
Example 72. Assume that $X$ is a nonnegative random variable with the cumulative distribution function $F_X$ (for example, without any jumps). For $b > 0$ define
\[ Y = (X - b)^+ = \begin{cases} X - b & \text{if } X > b , \\ 0 & \text{if } X \le b . \end{cases} \]
By definition $Y \ge 0$. Therefore, $P(Y \le y) = 0$ for all $y < 0$.
For $y = 0$ we have
\[ P(Y \le 0) = P(Y = 0) = P(X \le b) = F_X(b) . \]
Moreover, for $y > 0$,
\[ \begin{aligned}
P(Y \le y) &= P(Y \le y, X \le b) + P(Y \le y, X > b) \\
&= P(0 \le y, X \le b) + P(X - b \le y, X > b) \\
&= P(X \le b) + P(b < X \le y + b) \\
&= P(X \le y + b) = F_X(y + b) .
\end{aligned} \]
In summary,
\[ F_Y(y) = \begin{cases} 0 , & y < 0 , \\ F_X(b) , & y = 0 , \\ F_X(y + b) , & y > 0 . \end{cases} \]
Note that the cumulative distribution function $F_Y$ has a jump of size $P(X \le b)$ at the point $y = 0$.
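The formula for $F_Y$ does not depend on $F_X$ being continuous, so it can be checked exactly on a discrete example. The sketch below assumes, purely for illustration, that $X$ is uniform on $\{0, 1, 2, 3, 4\}$ and $b = 2$, and compares the derived formula with a direct computation from $Y = \max(X - b, 0)$.

```python
from fractions import Fraction

# Hypothetical example: X uniform on {0, 1, 2, 3, 4} and b = 2.
law_X = {x: Fraction(1, 5) for x in range(5)}
b = 2

def F_X(x):
    """P(X <= x)."""
    return sum((p for v, p in law_X.items() if v <= x), Fraction(0))

def F_Y_formula(y):
    """The cdf of Y = (X - b)^+ as derived in Example 72."""
    return Fraction(0) if y < 0 else F_X(y + b)

def F_Y_direct(y):
    """P(Y <= y) computed from the definition Y = max(X - b, 0)."""
    return sum((p for v, p in law_X.items() if max(v - b, 0) <= y), Fraction(0))
```

The cases $y = 0$ and $y > 0$ collapse into one branch in `F_Y_formula` because $F_X(0 + b) = F_X(b)$; the jump at $0$ is `F_Y_formula(0)`.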
Quantile function. Let $F$ be a cumulative distribution function. Recall that $F : \mathbb{R} \to [0, 1]$. Define a function $Q : (0, 1) \to \mathbb{R}$ by
\[ Q(y) = \inf\{x : F(x) \ge y\} . \]
• If $F$ is strictly increasing and continuous (no jumps), then $Q$ is nothing else but the inverse function of $F$ (strictly increasing and continuous). Then $Q(F(x)) = x$.
• If $F$ is strictly increasing but has some jumps, then $Q$ is continuous but not strictly increasing.
• If $F$ is not strictly increasing but continuous, then $Q$ has jumps but is strictly increasing.
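For a discrete distribution the definition $Q(y) = \inf\{x : F(x) \ge y\}$ amounts to scanning the atoms in increasing order until the cumulative probability reaches $y$. The sketch below implements this for a finite pmf; the example distribution and the name `make_quantile` are hypothetical, and $y = 1$ is allowed here because the distribution has bounded support.

```python
from fractions import Fraction

def make_quantile(pmf):
    """Return Q(y) = inf{x : F(x) >= y} for a finite pmf {x: P(X = x)}, 0 < y <= 1."""
    xs = sorted(pmf)
    cdf_values, running = [], Fraction(0)
    for x in xs:
        running += pmf[x]
        cdf_values.append(running)      # F(x) at each atom

    def Q(y):
        if not 0 < y <= 1:
            raise ValueError("y must lie in (0, 1]")
        for x, c in zip(xs, cdf_values):
            if c >= y:                   # first atom where F reaches level y
                return x
        raise ValueError("the probabilities do not sum to 1")

    return Q

# Hypothetical discrete distribution with jumps at 0, 1 and 3.
Q = make_quantile({0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)})
```

Here $Q$ is constant on $(1/4, 1/2]$, which corresponds to the jump of $F$ at $1$, and it jumps from $1$ to $3$ at $y = 1/2$, which corresponds to the interval $[1, 3)$ where $F$ is flat.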
Lemma 73. Assume that $U$ is uniform on $[0, 1]$, that is, its distribution is given by $F_U(x) = 0$ if $x < 0$, $F_U(x) = x$ for $x \in [0, 1]$ and $F_U(x) = 1$ for $x > 1$. Let $F$ be a strictly increasing and continuous distribution function and let $Q$ be its quantile function. Define $X = Q(U)$. Then $X$ has distribution $F$.
Proof. We have
\[ P(X \le x) = P(Q(U) \le x) = P(F(Q(U)) \le F(x)) = P(U \le \underbrace{F(x)}_{=y}) = y = F(x) . \]
Remark 74. This result is very important for simulations.
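A minimal simulation sketch of the construction $X = Q(U)$ follows. It assumes the exponential distribution with rate 1 as a hypothetical target, whose distribution function is continuous and strictly increasing on $(0, \infty)$; the sample size and the seed are arbitrary.

```python
import math
import random

def F(x):
    """Exponential(1) cdf: continuous, and strictly increasing on (0, infinity)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def Q(u):
    """Quantile function of F, that is, its inverse on (0, 1)."""
    return -math.log(1.0 - u)

def sample(n, seed=0):
    """Draw n values X = Q(U) with U uniform on [0, 1)."""
    rng = random.Random(seed)
    return [Q(rng.random()) for _ in range(n)]

def empirical_cdf(xs, x):
    """Fraction of the sample that is at most x."""
    return sum(1 for v in xs if v <= x) / len(xs)

xs = sample(20000)
```

Comparing `empirical_cdf(xs, x)` with `F(x)` at a few points gives an informal check that the simulated values follow the target distribution; the agreement is only approximate and improves with the sample size.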
Existence theorem. Before, we started with a probability measure $P$ and a random variable $X$ and we defined the cdf. The following result gives a converse statement.
Theorem 75. If $F$ is a nondecreasing, right-continuous function satisfying $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$, then there exists on some probability space $(\Omega, \mathcal{F}, P)$ a random variable $X$ for which $F(x) = P(X \le x)$.
Proof. Assume for a moment that $F$ is strictly increasing and continuous. From Example 70 we already know that there exists a uniform random variable $U$ defined on the probability space $([0, 1], \mathcal{B}, \lambda)$. By Lemma 73 we can construct a random variable $X$ with the cdf $F$.
For the general case, please see the textbook.
