
MAT 5170 Probability Theory I - Lecture Notes

October 23, 2020

1 Probability Measures

A quick review for sets and events

• ∅ denotes the empty set.

• A ⊂ B means A is a subset of B. That is, all elements in A also belong

to B. Typically, B is a bigger set. For example, if A = {1, 2, 3} and

B = {1, 2, 3, 4}, then A ⊂ B. But if C = {3, 4, 5}, then C ⊄ B.

• A ∪ B is the union of two sets. This is the set of all outcomes that

are either in A or in B, or in both. For example, if A = {1, 2, 3} and

B = {3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}. Note that 3 is not counted

twice. If C = {1, 2, 3, 4}, then A ∪ C = {1, 2, 3, 4} = C.

• A ∩ B is the intersection of two sets. This is the set of all outcomes that
belong to both A and B. For example, if A = {1, 2, 3} and B = {3, 4, 5},
then A ∩ B = {3}. If C = {4, 5}, then A ∩ C = ∅.

• B \ A is the difference of two sets. We remove from B all elements
that are also in A. For example, if A = {1, 2, 3} and B = {3, 4, 5}, then
B \ A = {4, 5}.

• A′ is the complement of A. That is, A′ is the set of all elements in S

that are not in A.

• Commutative laws: A ∩B = B ∩A; A ∪B = B ∪A.

• Associative laws:

(A ∪B) ∪ C = A ∪ (B ∪ C) = A ∪B ∪ C

(A ∩B) ∩ C = A ∩ (B ∩ C) = A ∩B ∩ C

• Distributive laws:

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C)


• De Morgan’s laws:

(A1 ∩ · · · ∩ Ak)′ = A1′ ∪ · · · ∪ Ak′

(A1 ∪ · · · ∪ Ak)′ = A1′ ∩ · · · ∩ Ak′
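The set operations reviewed above can be checked directly on small finite sets. The following is a minimal Python sketch; the universe S = {1, ..., 5} and the helper `complement` are assumptions introduced only for this illustration.

```python
# Finite sets from the examples above; S is an assumed universe for complements.
S = {1, 2, 3, 4, 5}
A = {1, 2, 3}
B = {3, 4, 5}
C = {4, 5}

def complement(X, universe=S):
    """Return the complement of X relative to the universe."""
    return universe - X

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
difference = B - A       # B \ A
disjoint = A & C         # A ∩ C

# De Morgan's laws for two sets
de_morgan_1 = complement(A & B) == complement(A) | complement(B)
de_morgan_2 = complement(A | B) == complement(A) & complement(B)

print(union, intersection, difference, disjoint, de_morgan_1, de_morgan_2)
```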

1.1 Spaces and Probabilities

In probability theory, Ω denotes a sample space. It can be as simple
as {0, 1} in the case of a single coin toss, or as complicated as the space of all
continuous functions defined on [0, 1].

A subset of Ω is called an event and an element ω of Ω is a sample point.

This is a definition that you learned in elementary probability class:

Definition 1. Probability is a real-valued set function P that assigns, to each

event A ⊆ Ω a number P(A), called the probability of the event A, such that the

following axioms of probability are satisfied:

1. 0 ≤ P(A) ≤ 1 for any event A ⊂ Ω

2. P(∅) = 0, P(Ω) = 1

3. If A1, A2, . . . , Ak are mutually exclusive, then

P(A1 ∪A2 ∪ · · · ∪Ak) = P(A1) + P(A2) + · · ·+ P(Ak) .

You will see in a moment that this definition is not entirely correct. Our
goal is to assign probabilities to as many events as possible.

For example, if Ω is finite then we can assign the uniform probability measure:
P({ω}) = 1/|Ω|. But this is not the only choice. For example, if you roll a fair die
then Ω = {1, 2, 3, 4, 5, 6} and P({1}) = · · · = P({6}) = 1/6. But if the die
is loaded as follows: 1 side with "1", 2 sides with "2", 3 sides with "3", then
P({1}) = 1/6, P({2}) = 2/6, P({3}) = 3/6. In other words, for the same sample
space we may have different probabilities.

If Ω = [0, 1] and A = (a, b), 0 < a < b < 1, then we can assign

P(A) = b− a .

Using the third axiom of probability, we can extend it to all finite unions of

intervals (see Theorem 11). What about more complicated events? For example,

what is P(Q), where Q is the set of all rationals in [0, 1]?

In fact, we cannot assign a probability to an arbitrary event, rather we need

to consider some special classes of sets like fields or σ-fields.

1.2 Classes of sets

A ”nice” class of sets should be closed under the formation of countable unions

and intersections.


Definition 2. A class F of subsets of Ω is called a field if:

• Ω ∈ F ;

• If A ∈ F then A^c = Ω \ A ∈ F ;

• If A, B ∈ F then A ∪ B ∈ F . ("closed under union of two sets")

Note that De Morgan's laws immediately imply that A ∩ B ∈ F . Indeed,
A ∩ B = (A^c ∪ B^c)^c. Also, the third property implies that a field is closed under
any finite union.

Definition 3. A class F of subsets of Ω is called a σ-field if:

• Ω ∈ F ;

• If A ∈ F then A^c = Ω \ A ∈ F ;

• If A1, A2, . . . ∈ F then A1 ∪ A2 ∪ · · · ∈ F . (”closed under countable

unions”)

Example 4. Let F consist of the finite and the cofinite sets (A being cofinite
if A^c is finite). For example, if Ω = [0, 1] and A = [0, 1) then A is cofinite since
A^c = {1}.

Then F is a field.

• If Ω is finite then F is also a σ-field.

• If Ω is infinite then F is not a σ-field. For example, let Ω =
{ω1, ω2, . . .}. Take A = {ω2, ω4, . . .}. Then A ∉ F (neither A nor A^c is finite),
but A is a countable union of the singletons {ω2i}, i = 1, 2, . . ., which belong to F .

Let A be a ("small" and "easy") class of subsets of Ω. For example, A can
be the class of all subintervals of [0, 1]. Note that this class is not closed under
finite unions since e.g. for a1 < a2 < a3 < a4, [a1, a2] ∪ [a3, a4] is not an interval.
So, it is not a field or a σ-field.

Since we want to consider all sets in A and at the same time have a ”nice”

structure of a σ-field, we will consider a σ-field generated by the class A, denoted

by σ(A). This is the smallest σ-field that contains A.

In general, σ(A) has the following properties:

• A ⊂ σ(A);

• σ(A) is a σ-field;

• If A ⊂ G and G is σ-field, then σ(A) ⊂ G.

Example 5. If Ω = {1, 2, 3}, then σ({1}) = σ({2, 3}) = {∅, {1}, {2, 3},Ω}.

Indeed, denote A = {1} and A′ = {∅, {1}, {2, 3},Ω}. Check (this is easy!)

that A is not a field (since the complement of {1} does not belong to A), but

A′ is a field. It is also a σ-field, since the space Ω is finite. Of course, A ⊂ A′.

Now, is A′ the smallest field that contains A? We list all the possible classes

that contain A = {1}:


• {∅, {1},Ω} - it is not a field, since it is not closed under taking comple-

ments;

• {{1},Ω} - it is not a field, since it is not closed under taking complements;

• {∅, {1}} - it is not a field, since it is not closed under taking complements;

• {∅, {1}, {1, 2},Ω} - it is not a field, since it is not closed under taking

complements;

• you can continue listing all the possibilities and you will see that most of

them will not be fields.

• Now, I will list all the fields that include {1}:

– A′ = {∅, {1}, {2, 3},Ω};

– {∅, {1}, {1, 2}, {3}, {2, 3},Ω} - it is bigger than A′;

– {∅, {1}, {1, 3}, {2}, {2, 3},Ω} - it is bigger than A′;

– {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3},Ω} - it is bigger than A′;

Thus, we proved that A′ is the smallest field that contains {1}, hence A′ = σ(A).

Of course, this is not the most effective way of proving this.
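On a finite Ω the generated σ-field can also be found mechanically, by repeatedly adding complements and pairwise unions until nothing new appears. The sketch below is only an illustration of that closure idea; the function name `generated_sigma_field` is an assumption, and the brute-force loop is practical only for very small Ω.

```python
from itertools import combinations

def generated_sigma_field(omega, generators):
    """Smallest family containing the generators that is closed under
    complements and pairwise unions (a sigma-field when omega is finite)."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            if omega - a not in family:      # close under complements
                family.add(omega - a)
                changed = True
        for a, b in combinations(current, 2):
            if a | b not in family:          # close under unions
                family.add(a | b)
                changed = True
    return family

sigma = generated_sigma_field({1, 2, 3}, [{1}])
print(sorted(sorted(s) for s in sigma))
```

Calling the function with the generator {2, 3} instead of {1} returns the same family, in line with σ({1}) = σ({2, 3}) in Example 5.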

Example 6. If F is a σ-field, then σ(F) = F .

Example 7. Let Ω = [0, 1] and let I be the class of all subintervals (a, b] of (0, 1].
Then B = σ(I) is called the Borel σ-field. It contains, in particular, all countable
unions and intersections of intervals. Its elements are called Borel sets.

The set of rationals Q is a Borel set. Indeed, for each rational q the singleton {q}
can be obtained as

\[ \{q\} = \bigcap_{i=1}^{\infty} (q - 1/i, q] . \]

The sets (q − 1/i, q] belong to I. Note that {q} ∉ I, but {q} ∈ B.
Since there are countably many rationals, Q ∈ B.

1.3 Probability Measures

Definition 8. A set function P on a field F is called probability if

1. 0 ≤ P(A) ≤ 1 for any event A ∈ F

2. P(∅) = 0, P(Ω) = 1

3. If A1, A2, . . . ∈ F are mutually exclusive (disjoint) and ⋃_{i=1}^∞ Ai ∈ F , then

P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · · .

From the axioms of probability we can derive further properties. Let A, B ∈ F
with A ⊂ B. Then B = A ∪ (B \ A) is a union of disjoint sets. Thus, P(B) =
P(A) + P(B \ A) and hence we conclude that P is monotone:

P(A) ≤ P(B) , A ⊂ B .

Another property is

P(A ∪B) = P(A) + P(B)− P(A ∩B) .

Further, let B1 = A1 and B2 = A2 ∩ A1^c. Then the events B1, B2 are disjoint
and B1 ∪ B2 = A1 ∪ A2. Thus,

\[ P(A_1 \cup A_2) = P(B_1 \cup B_2) = P(B_1) + P(B_2) = P(A_1) + P(A_2 \cap A_1^c) \le P(A_1) + P(A_2) . \]

In the last step we used the fact that A2 ∩ A1^c ⊆ A2.

This can be generalized to obtain finite subadditivity:

\[ P\Big( \bigcup_{i=1}^{n} A_i \Big) \le \sum_{i=1}^{n} P(A_i) , \qquad A_i \in \mathcal{F} . \]

Definition 9. Let F be a σ-field. A triple (Ω,F ,P) is called a probability

space.

Theorem 10. Let F be a field.

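• (Continuity from below) If A, An ∈ F , n ≥ 1, A1 ⊆ A2 ⊆ · · · and A = ⋃_{n=1}^∞ An,
then

\[ \lim_{n\to\infty} P(A_n) = P(A) . \]

• (Continuity from above) If A, An ∈ F , n ≥ 1, A1 ⊇ A2 ⊇ · · · and A = ⋂_{n=1}^∞ An, then

\[ \lim_{n\to\infty} P(A_n) = P(A) . \]

Proof. We prove continuity from below. Set B1 = A1, Bk = Ak \ A_{k−1} for k ≥ 2.
Then the Bk are disjoint, A = ⋃_{k=1}^∞ Bk and An = ⋃_{k=1}^n Bk. By countable
additivity:

\[ P(A) = P\Big( \bigcup_{k=1}^{\infty} B_k \Big) = \sum_{k=1}^{\infty} P(B_k) = \lim_{n\to\infty} \sum_{k=1}^{n} P(B_k) = \lim_{n\to\infty} P\Big( \bigcup_{k=1}^{n} B_k \Big) = \lim_{n\to\infty} P(A_n) . \]

Continuity from above follows by applying this to the complements An^c, which
increase to A^c.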


1.4 Lebesgue measure

Let Ω = [0, 1]. Let I be the class of subintervals (a, b] of (0, 1]. Let B0 be the class
that consists of all finite unions of disjoint intervals from I. Note that B0 is a field. For
I = (a, b] its Lebesgue measure is λ(I) = b − a.

Theorem 11. The Lebesgue measure is a probability measure on B0.

Proof. See the textbook.

Example 12. Assume that we select a value from the interval (0, 1] at "random".
What is the probability that the selected number is rational?

To each interval A = (a, b] we assigned P(A) = b − a. Thus, P((q − 1/i, q]) =
1/i. Then

\[ P(\{q\}) = P\Big( \bigcap_{i=1}^{\infty} (q - 1/i, q] \Big) = \lim_{i\to\infty} P((q - 1/i, q]) = \lim_{i\to\infty} (1/i) = 0 . \]

On the other hand, Q is a countable union of such singletons, thus P(Q) = 0. [1]

2 Existence of Probability Measures

Some facts about sets and classes of sets. Let F and G be two classes of

sets. If G ⊆ F then any set that belongs to G also belongs to F : A ∈ G ⇒ A ∈ F .

On the other hand, if A ∈ G and B ⊆ A (that is, B is a subset of set A) then

we cannot conclude that B ∈ G. Indeed, let G be a class of subintervals (a, b] of

(0, 10]. Then B = {1/2} is included in many of the intervals, but B ∉ G.

2.1 Existence

The main result of today’s lecture is the following theorem:

Theorem 13. A probability measure on a field has a unique extension to the

generated σ-field.

The meaning of this theorem is the following. Let F0 be a field and let

F = σ(F0). Let P be a probability measure on F0. We are allowed to write

P(A) for A ∈ F0, but not for A ∈ F . Then there exists (only one) probability

measure P˜ on F such that P(A) = P˜(A) for all A ∈ F0. Usually the extension

P˜ of P is denoted just by P.

[1] Note: At this moment, thanks to Theorem 11, we can compute the Lebesgue measure
on finite unions and intersections of subintervals. We do not know if we can compute the
measure on countable unions and intersections. This will come later. Thus, at this point, the
first displayed equation is not completely justified, since ⋂_{i=1}^∞ (q − 1/i, q] ∉ B0.


Example 14. Let Ω = [0, 1]. Let I be the class of subintervals I = (a, b] of (0, 1].
It is easy to define the Lebesgue measure on I by λ(I) = b − a. Let
B0 be the field that consists of all finite unions of disjoint intervals. By Theorem
11, the Lebesgue measure is also a probability measure on B0.
This is still easy. Our main theorem extends the Lebesgue measure to all Borel
sets of (0, 1]. In particular, we are allowed to write λ(Q) (which happens to be
zero, as we saw in Example 12).

Let P be a probability measure on a field F0. For A ⊆ Ω define

\[ P^*(A) = \inf \sum_{n} P(A_n) , \tag{1} \]

where the infimum extends over all finite and infinite sequences A1, A2, . . . such
that A ⊂ ⋃_n An and An ∈ F0. Note that for A ⊆ Ω such that A ∉ F0 we are
not allowed to write P(A). However, if A ∈ F0 then we will show that

\[ P^*(A) = P(A) . \tag{2} \]

The set function P^* is called the outer measure of P.

The outer measure P^* has the following properties:

1. P^*(∅) = 0;

2. P^*(A) ≥ 0 for all A;

3. P^* is monotone, that is, if A ⊂ B, then P^*(A) ≤ P^*(B);

4. P^* is countably subadditive: P^*(⋃_n An) ≤ Σ_n P^*(An).

However, we do not know if P^* is a probability measure.

The inner measure P_* is defined as

\[ P_*(A) = 1 - P^*(A') . \]

Both P^* and P_* are candidates for an extension of P.

If P^* and P_* agree then we have

\[ P^*(A) + P^*(A') = 1 . \]

We will ask for more:

\[ P^*(A \cap E) + P^*(A' \cap E) = P^*(E) \tag{3} \]

for all sets E ⊆ Ω. Note that by subadditivity, (3) is equivalent to

\[ P^*(A \cap E) + P^*(A' \cap E) \le P^*(E) . \tag{4} \]

We do not know at this point how many sets A (if any at all) fulfill this. Let
M be the class of sets A that fulfill (3). We call such sets P^*-measurable.


Lemma 15. The class M is a field.

Proof. First, Ω ∈ M, since

\[ P^*(\Omega \cap E) + P^*(\Omega' \cap E) = P^*(E) + P^*(\emptyset) = P^*(E) . \]

Next, M is closed under complementation: if A ∈ M, then A′ ∈ M. This is
obvious, since (3) is symmetric in A and A′. Furthermore, if A, B ∈ M, then
A ∩ B ∈ M. For this one needs to show that if

\[ P^*(A \cap F) + P^*(A' \cap F) = P^*(F) , \qquad P^*(B \cap E) + P^*(B' \cap E) = P^*(E) , \]

for all sets E, F , then

\[ P^*((A \cap B) \cap E) + P^*((A \cap B)' \cap E) \le P^*(E) . \]

We have

\[
\begin{aligned}
P^*(E) &= P^*(B \cap E) + P^*(B' \cap E) \\
&= \{ P^*(A \cap B \cap E) + P^*(A' \cap B \cap E) \} + \{ P^*(A \cap B' \cap E) + P^*(A' \cap B' \cap E) \} \\
&\ge P^*(A \cap B \cap E) + P^*\big( ((A' \cap B) \cup (A \cap B') \cup (A' \cap B')) \cap E \big) .
\end{aligned}
\]

In the first equality we used property (3) applied to B; in the next
equality we used property (3) applied to A twice, once with F = B ∩ E and once
with F = B′ ∩ E. Then we used subadditivity of P^*. Now,

\[ (A' \cap B) \cup (A \cap B') \cup (A' \cap B') = (A \cap B)' , \]

thus

\[ P^*(E) \ge P^*(A \cap B \cap E) + P^*((A \cap B)' \cap E) . \]

Hence, A ∩ B ∈ M.

Lemma 16. If A1, A2, . . . is a finite or infinite sequence of disjoint M-sets,
then for each E ⊆ Ω,

\[ P^*\Big( E \cap \bigcup_{k} A_k \Big) = \sum_{k} P^*(E \cap A_k) . \tag{5} \]

Proof. We give the proof for two sets A1, A2 only. If A1 ∪ A2 = Ω, then A2 = A1′ and
hence (5) is just (3). If A1 ∪ A2 ⊂ Ω, then write F = E ∩ (A1 ∪ A2), so that

\[ F = (A_1 \cap F) \cup (A_1' \cap F) , \]

and use (3) with A = A1 and E replaced by F . Since A1 and A2 are disjoint,
A1 ∩ F = E ∩ A1 and A1′ ∩ F = E ∩ A2.


Lemma 17. The class M is a σ-field.

Proof. Note that we only need to prove that M is closed under countable unions.
I will not prove it; you may consult the textbook.

Combining Lemma 17 with (5) (taking E = Ω) gives us countable additivity
on M:

\[ P^*\Big( \bigcup_{k} A_k \Big) = \sum_{k} P^*(A_k) , \tag{6} \]

where Ak are disjoint and belong to M.

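Lemma 18. If P^* is defined by (1), then F0 ⊂ M.

Proof. Let A ∈ F0 and E ⊆ Ω. Let ε > 0. Choose sets An ∈ F0 such that
E ⊂ ⋃_n An and Σ_n P(An) ≤ P^*(E) + ε. The sets Bn = A ∩ An and Cn = A′ ∩ An
belong to F0, since it is a field. Also, E ∩ A ⊂ ⋃_n Bn, E ∩ A′ ⊂ ⋃_n Cn. Thus, by
monotonicity and countable subadditivity of P^*,

\[
\begin{aligned}
P^*(E \cap A) + P^*(E \cap A') &\le P^*\Big( \bigcup_{n} B_n \Big) + P^*\Big( \bigcup_{n} C_n \Big) \\
&\le \sum_{n} P^*(B_n) + \sum_{n} P^*(C_n) \\
&\le \sum_{n} P(B_n) + \sum_{n} P(C_n) ,
\end{aligned}
\]

where the last step uses that each set in F0 covers itself, so P^* ≤ P on F0.
Since Bn and Cn are disjoint and Bn ∪ Cn = An,

\[ P^*(E \cap A) + P^*(E \cap A') \le \sum_{n} P(A_n) \le P^*(E) + \varepsilon . \]

Since ε is arbitrary, (4) holds for every A ∈ F0 and thus A ∈ M. Hence, F0 ⊂ M.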

Corollary 19. If P^* is defined by (1), then F = σ(F0) ⊂ M.

Lemma 20. If P^* is defined by (1) then P^*(A) = P(A) for all A ∈ F0.

Proof. From (1) we obtain that if A ∈ F0, then P^*(A) ≤ P(A) (since A can be
covered by itself). Now, let A ⊂ ⋃_n An, where A, An ∈ F0. Then

\[ P(A) = P\Big( A \cap \bigcup_{n} A_n \Big) \le \sum_{n} P(A \cap A_n) \le \sum_{n} P(A_n) . \]

Note that here we are allowed to write P since all the sets belong to F0. Thus

\[ P^*(A) \le P(A) \le \sum_{n} P(A_n) \]

and since the left-hand side does not depend on the covering (An) we have

\[ P^*(A) \le P(A) \le \inf \sum_{n} P(A_n) , \tag{7} \]

where the infimum is taken over all such coverings. By (1), the right-hand side of (7)
equals P^*(A), thus the inequalities in (7) must be equalities.


• By Lemma 20, P^*(Ω) = P(Ω) = 1.

• Equation (6) gives us countable additivity of P^* on M, hence also on
F = σ(F0).

• Thus, P^* is a probability measure on F .

• Lemma 20 means that P^* is an extension of P.

2.2 Uniqueness

A class P of subsets of Ω is called a π-system [2] if it is closed under finite
intersections:

• If A, B ∈ P, then A ∩ B ∈ P.

Note that a field is necessarily a π-system, but the converse is not true.

A class L of subsets of Ω is called a λ-system if

• Ω ∈ L;

• A ∈ L implies A′ ∈ L;

• L is closed under countable disjoint unions, that is, if An ∈ L are disjoint,
then ⋃_n An ∈ L.

Note that a σ-field is also a λ-system, but the converse is not true.

Lemma. A class that is both a π-system and a λ-system is a σ-field.

Theorem (Dynkin π-λ Theorem). If P is a π-system and L is a λ-system and
P ⊆ L, then σ(P) ⊆ L.

I will not prove this theorem (see the textbook), but think about this: If P

is a field and L is a σ-field, and P ⊆ L, then σ(P) ⊆ L. This is obvious by the

definition of σ(P).

Theorem. Suppose that P1 and P2 are probability measures on σ(P), where
P is a π-system. Assume that P1 and P2 agree on P. Then they agree on σ(P).

Proof. Let G be the class of sets A such that P1(A) = P2(A).

• Obviously, Ω ∈ G, since P1(Ω) = P2(Ω).

• If A ∈ G, then P1(A′) = 1 − P1(A) = 1 − P2(A) = P2(A′), hence A′ ∈ G.

• If An ∈ G, n ≥ 1, are disjoint, then the union belongs to G by the countable

additivity of both P1 and P2.

• Thus, G is a λ-system. We do not know if G is a π-system, but we know
from the assumption that P ⊆ G.

• Dynkin theorem tells us that σ(P) ⊆ G.

[2] This material is optional.


• Note that if we have a stronger assumption that P is a field, then a proof

would be slightly shorter.

How do we apply this result? Take a field F0. Then for A ∈ F0, P_*(A) =
P^*(A). Thus, the inner and the outer measure agree on a field. So, they must
agree on the generated σ-field.

3 Denumerable Probabilities

3.1 General Formulas

Let (Ω,F ,P) be a probability space. Let An, n ≥ 1 be a sequence of events,

An ∈ F . Define

\[ \limsup_{n} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k , \qquad \liminf_{n} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k . \]

The sets above are called the limit superior and the limit inferior, respectively.
Since F is a σ-field, both lim sup_n An and lim inf_n An belong to F . Then ω ∈
lim sup_n An if and only if for each n there is some k ≥ n for which ω ∈ Ak. In
other words, ω lies in lim sup_n An if and only if it lies in infinitely many of the
An's. Likewise, ω ∈ lim inf_n An if and only if there is n such that for all k ≥ n,
ω ∈ Ak. In other words, ω lies in all but finitely many of the An's.

• lim sup_n An = {An i.o.} = {An infinitely often};

• ⋂_{k=n}^∞ Ak ↑ lim inf_n An as n → ∞;

• ⋃_{k=n}^∞ Ak ↓ lim sup_n An as n → ∞;

• lim inf_n An ⊆ lim sup_n An (to prove it, use the description in words given
above instead of set-theoretic arguments). To see that the
inclusion may be strict, consider Ω = {ω1, ω2}. Let An = {ω1} for n even
and An = {ω2} for n odd. Then lim inf_n An = ∅ and lim sup_n An = Ω.

• Check https://en.wikipedia.org/wiki/Set-theoretic_limit for "Examples".

• From De Morgan's laws we can also conclude that lim inf_n An = (lim sup_n An^c)^c.

• If lim sup_n An = lim inf_n An, then we write

\[ A = \lim_{n} A_n = \limsup_{n} A_n = \liminf_{n} A_n . \]

In this case A ∈ F .
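For a concrete feel for these definitions, the two set limits can be approximated on a computer by truncating the infinite unions and intersections at a finite horizon. The sketch below is an approximation under that assumption (it is exact for sequences that are periodic or eventually constant well before the horizon); the names `lim_sup`, `lim_inf`, `n_max` and `horizon` are introduced here for illustration only.

```python
def tail_union(A, n, horizon):
    """Union of A(k) for n <= k <= horizon (finite stand-in for k >= n)."""
    out = set()
    for k in range(n, horizon + 1):
        out |= A(k)
    return out

def tail_intersection(A, n, horizon):
    """Intersection of A(k) for n <= k <= horizon."""
    out = set(A(n))
    for k in range(n + 1, horizon + 1):
        out &= A(k)
    return out

def lim_sup(A, n_max, horizon):
    """Intersection over n <= n_max of the tail unions."""
    out = tail_union(A, 1, horizon)
    for n in range(2, n_max + 1):
        out &= tail_union(A, n, horizon)
    return out

def lim_inf(A, n_max, horizon):
    """Union over n <= n_max of the tail intersections."""
    out = set()
    for n in range(1, n_max + 1):
        out |= tail_intersection(A, n, horizon)
    return out

# The alternating sequence from the text: A_n = {w1} for n even, {w2} for n odd.
def A(n):
    return {"w1"} if n % 2 == 0 else {"w2"}

print(lim_inf(A, 10, 20), lim_sup(A, 10, 20))
```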


Theorem 21. We have

\[ P(\liminf_{n} A_n) \le \liminf_{n} P(A_n) \le \limsup_{n} P(A_n) \le P(\limsup_{n} A_n) . \]

Proof. Set Bn = ⋂_{k=n}^∞ Ak and Cn = ⋃_{k=n}^∞ Ak. Then Bn ⊆ An ⊆ Cn,
Bn ↑ lim inf_n An and Cn ↓ lim sup_n An. Monotonicity together with continuity from
below and above gives

\[ \limsup_{n} P(A_n) \le \limsup_{n} P(C_n) = \lim_{n} P(C_n) = P(\limsup_{n} A_n) \]

and

\[ \liminf_{n} P(A_n) \ge \liminf_{n} P(B_n) = \lim_{n} P(B_n) = P(\liminf_{n} A_n) . \]

Corollary 22. If An → A, then limn→∞ P(An) = P(A).

Note that this result extends the previously proven continuity from below

and above.

Definition 23. A finite collection A1, . . . , An of events is independent if

P(Ai1 ∩ · · · ∩Aik) = P(Ai1)× · · · × P(Aik) (8)

for all collections of indices 1 ≤ i1 < · · · < ik ≤ n.

The requirement (8) can be stated in a different way:

P(B1 ∩ · · · ∩Bn) = P(B1)× · · · × P(Bn) , (9)

where Bi is either Ai or Ω.

In particular, a collection of 3 events A, B, C is independent if

P(A ∩B ∩ C) = P(A)P(B)P(C) , (10)

P(A ∩ B) = P(A)P(B) , P(A ∩ C) = P(A)P(C) , P(B ∩ C) = P(B)P(C) . (11)

Example 24. Let Ω = {1, 2, 3, 4, 5, 6} with the uniform probability. Take
A1 = {1, 2, 3, 4}, A2 = A3 = {4, 5, 6}. Then (10) holds, but none of the
equalities in (11) is satisfied.
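The claim in Example 24 is a short exact computation. The sketch below checks it with rational arithmetic, assuming the uniform probability on Ω (an assumption, since the example does not state P explicitly); the helper name `prob` is hypothetical.

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Uniform probability on OMEGA (an assumption of this sketch)."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A1 = {1, 2, 3, 4}
A2 = {4, 5, 6}
A3 = {4, 5, 6}

# Condition (10): the triple product rule.
triple_ok = prob(A1 & A2 & A3) == prob(A1) * prob(A2) * prob(A3)

# Condition (11): the three pairwise product rules.
pairs_ok = [
    prob(X & Y) == prob(X) * prob(Y)
    for X, Y in [(A1, A2), (A1, A3), (A2, A3)]
]
print(triple_ok, pairs_ok)
```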

Definition 25. We say that classes A1 and A2 are independent if for each

choice of A1 ∈ A1 and A2 ∈ A2, the events A1 and A2 are independent.

Theorem 26. Assume that fields A1 and A2 are independent. Then the σ-fields

σ(A1) and σ(A2) are also independent. [3]

Proof. By the assumption (9) holds for all sets B1, B2 from A1 and A2, respec-

tively. We want (9) to hold for all sets B1, B2 from σ(A1) and σ(A2). Fix

B2 ∈ A2. Let L be the class of sets B1 for which (9) holds. Our goal is to show

that the class L is a σ-field.

[3] In the textbook, you can see a slightly different formulation, namely it is required that
A1 and A2 are π-systems. Here, in our theorem, we require more.


• Ω ∈ L: Indeed, P(Ω ∩B2) = P(Ω)P(B2) trivially holds.

• L is closed under complement: Indeed, if P(B1 ∩B2) = P(B1)P(B2), then

P(B′1 ∩B2) = P(Ω ∩B2)− P(B1 ∩B2)

= P(Ω ∩B2)− P(B1)P(B2) = P(B′1)P(B2) .

• L is closed under finite disjoint unions: Indeed, if A,B ∈ L, A ∩ B = ∅

and both P(A∩B2) = P(A)P(B2) and P(B ∩B2) = P(B)P(B2) hold then

P((A ∪B) ∩B2) = P((A ∩B2) ∪ (B ∩B2)) = P(A ∩B2) + P(B ∩B2)

= P(A)P(B2) + P(B)P(B2) = P(A ∪B)P(B2) .

• The same argument allows us to conclude that L is closed under count-

able disjoint unions. Instead of finite additivity of P use the countable

additivity.

• We cannot conclude yet that L is a σ-field, since we only know that L is

closed under countable disjoint unions.

• But L is a π-system (closed under finite intersections) and a λ-system
(closed under complement and countable disjoint unions), hence by Problem
Set 2 it is a σ-field.

• Since (9) holds for A1 and σ(A1) ⊂ L, (9) holds for σ(A1).

• Now, in principle, you should fix B1 ∈ A1 and redo the proof. But the

steps are exactly the same.

3.2 The Borel-Cantelli Lemmas

Note first that if lim_{n→∞} P(An) = 0, then P(lim inf_n An) = 0. The next result

has a stronger assumption but also a stronger conclusion.

Theorem 27 (First Borel-Cantelli Lemma). If Σ_n P(An) < ∞, then P(lim sup_n An) = 0.

Proof. Fix m ∈ N. We have

\[ P\Big( \limsup_{n} A_n \Big) \le P\Big( \bigcup_{k=m}^{\infty} A_k \Big) \le \sum_{k=m}^{\infty} P(A_k) . \]

Since the series is summable, the latter expression converges to zero as m → ∞.

Theorem 28 (Second Borel-Cantelli Lemma). If Σ_n P(An) = ∞ and the
events An are independent, then P(lim sup_n An) = 1.


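Proof. Recall that lim sup_n An = ⋂_{n=1}^∞ ⋃_{k=n}^∞ Ak, so that the complement is
⋃_{n=1}^∞ ⋂_{k=n}^∞ Ak^c. We will show that the complement has probability zero.
For this it suffices to prove that P(⋂_{k=n}^∞ Ak^c) = 0 for each n. Fix j. Then, using
independence and 1 − x ≤ exp(−x), x ∈ [0, 1],

\[ P\Big( \bigcap_{k=n}^{n+j} A_k^c \Big) = \prod_{k=n}^{n+j} P(A_k^c) = \prod_{k=n}^{n+j} (1 - P(A_k)) \le \exp\Big( -\sum_{k=n}^{n+j} P(A_k) \Big) . \]

If we let j → ∞, then the last expression goes to zero, since the series diverges.
As ⋂_{k=n}^∞ Ak^c ⊆ ⋂_{k=n}^{n+j} Ak^c for every j, this gives P(⋂_{k=n}^∞ Ak^c) = 0.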
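The key estimate in this proof can be examined numerically. The sketch below compares the exact product with the exponential bound for the illustrative choice P(A_k) = 1/k; that choice, and the names `prob_none_occur` and `exp_bound`, are assumptions made for this example and do not come from the notes.

```python
import math
from fractions import Fraction

def prob_none_occur(p, n, j):
    """P(no A_k occurs for n <= k <= n+j), for independent events with P(A_k) = p(k)."""
    result = Fraction(1)
    for k in range(n, n + j + 1):
        result *= 1 - p(k)
    return result

def exp_bound(p, n, j):
    """The upper bound exp(-sum_{k=n}^{n+j} P(A_k)) used in the proof."""
    return math.exp(-sum(float(p(k)) for k in range(n, n + j + 1)))

# Illustrative choice (not from the text): P(A_k) = 1/k, whose sum diverges.
def p(k):
    return Fraction(1, k)

for j in (10, 100, 1000):
    print(j, float(prob_none_occur(p, 2, j)), exp_bound(p, 2, j))
```

With this choice the product telescopes to 1/(n + j) when n = 2, so both columns can be seen to decrease to zero as j grows.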

Example 29. The assumption Σ_n P(An) < ∞ is sufficient, but not necessary,
for the conclusion of the first Borel-Cantelli Lemma. Consider the probability
space ((0, 1], B, λ) and the sets An = (0, 1/n]. Then An → ∅ and so
lim sup_n An = ∅ (so that its measure is 0). On the other hand, Σ_n λ(An) = ∞.

Note also that this example shows that the independence assumption in the second
Borel-Cantelli Lemma cannot be dropped: the events An are not independent.

Let {An, n ≥ 1} be a sequence of events in the probability space (Ω, F , P).
Consider the σ-fields σ(An, An+1, . . .) and define the tail σ-field

\[ \mathcal{T} = \bigcap_{n=1}^{\infty} \sigma(A_n, A_{n+1}, \ldots) . \]

Any event in T is called a tail event and is determined by the An's for arbitrarily
large n.

Example 30. lim supnAn and lim infnAn are tail events.

Theorem 31. If the events An are independent, then A ∈ T implies P(A) = 0

or P(A) = 1.

Proof. Since the events are independent, the σ-fields σ(A1), . . . , σ(An−1) and
σ(An, An+1, . . .) are independent. If A ∈ T , then A ∈ σ(An, An+1, . . .) and
hence the events A, A1, . . . , An−1 are independent. Since independence of a
collection of events is defined by independence of each finite subcollection, the
events A, A1, A2, . . . are independent. Thus, σ(A) and σ(A1, A2, . . .) are
independent. Moreover, A ∈ T ⊂ σ(A1, A2, . . .), hence A is independent of
itself: P(A) = P(A ∩ A) = P(A)^2. This means that its probability is zero or 1.

4 Random variables

4.1 Definition

Definition 32. Let (Ω,F ,P) be a probability space and let (S,S) be a measurable

space. A mapping X : Ω→ S such that

X−1(B) = {ω : X(ω) ∈ B} = {X ∈ B} ∈ F (12)

for any B ∈ S is called a ((S,S)-valued)-random variable. Such a mapping

is also called a measurable mapping.


Example 33. If S = R and S = B, then we simply say that X is a random

variable. Recall that B is the smallest σ-algebra that contains all intervals.

Remark 34. There is no guarantee just by the definition that the class {X−1(B) :

B ∈ S} is a σ-field. It needs a proof.

The σ-algebra S may be huge. Therefore, verifying (12) may be difficult.
The following theorem simplifies (12) in a special case.

Theorem 35. Assume that S = σ(A) and X : Ω→ S. Assume that X−1(B) ∈

F for all B ∈ A. Then X is (S,S)-valued random variable.

See Assignment 3.

Example 36. Let S = R and S = B. Then we only need to check that

X−1((−∞, x]) ∈ F for any x ∈ R (in fact, for any x ∈ Q).

4.2 Simple random variables

Let A ∈ F . Define

\[ I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A , \\ 0 & \text{if } \omega \notin A . \end{cases} \]

Note that for any Borel set B, {ω : I_A(ω) ∈ B} is either ∅, A, A^c or Ω
(depending on whether 0 ∈ B or not and whether 1 ∈ B or not). Thus, I_A is
a random variable, called an indicator random variable. Likewise, a finite
sum

\[ X = \sum_{i=1}^{d} x_i I_{A_i} \]

is a simple random variable if the sets Ai form a finite partition of Ω into
F-sets.

4.3 σ-fields generated by random variables

If G ⊂ F is a σ-field, then a random variable X is G-measurable or measurable

with respect to G if {X ∈ B} = {ω : X(ω) ∈ B} ∈ G for any Borel set B.

In case of simple random variables, this reduces to {X = x} ∈ G. Indeed,

{X ∈ B} = ⋃{X = x}, where the union extends over the finitely many values

of x lying both in B and the range of X.

Example 37. Let (Ω, F , P) = (R, B, P) and consider X(ω) = ω. Then X is
F-measurable (and hence a random variable). Take now G = σ([a, b] : a, b ∈
Z). Then X is not G-measurable. Indeed, the set X^{-1}((−∞, x]) = {ω : ω ≤ x},
x ∉ Z, is not in G.

Definition 38. The σ-field generated by (S,S)-valued random variable

X denoted by σ(X), is the smallest σ-field with respect to which X is measurable.


Example 39. Assume that S = σ(A) and X : Ω→ S. Then σ(X) is generated

by the collection of sets X−1(A) = {X−1(B) : B ∈ A} ⊂ F . That is, σ(X) =

σ(G), where G is a collection of sets A such that A = X−1(B) for some B ∈ A.

Example 40. For a random variable X we have

σ(X) = σ({ω : X(ω) ≤ q}, q ∈ Q) .

Note that {ω : X(ω) ≤ q} = X−1((−∞, q]).

Theorem 41. Assume that X is a simple random variable. Then σ(X) consists

of the sets {ω : X(ω) ∈ B}, where B ⊂ R.

Proof. Let M be the class of the subsets of Ω of the form {ω : X(ω) ∈ B}.
Then M ⊂ σ(X). At the same time M is a σ-field. This finishes the proof. See
the textbook for details.

Theorem 42. A simple random variable Y is σ(X)-measurable if and only if

Y = f(X) for a function f : R→ R.

Example 43. For example, Y = X^2 is σ(X)-measurable, but Y = X^2 + Z,
where Z is another random variable, is in general not σ(X)-measurable. However,
Y is σ(X, Z)-measurable.
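On a finite sample space, Theorem 42 gives a simple test: Y is σ(X)-measurable exactly when Y is constant on each level set {X = x}. The sketch below checks this for a small hypothetical sample space of pairs; the space `omega` and the function name `is_function_of` are assumptions made for illustration.

```python
def is_function_of(Y, X, omega):
    """Return True if Y = f(X) for some f, i.e. Y is constant on each level set of X."""
    seen = {}
    for w in omega:
        x, y = X(w), Y(w)
        # Remember the first Y-value seen for this X-value; any mismatch rules out Y = f(X).
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical finite sample space: points w = (w[0], w[1]).
omega = [(a, b) for a in (-1, 0, 1) for b in (0, 1)]
X = lambda w: w[0]
Z = lambda w: w[1]

print(is_function_of(lambda w: X(w) ** 2, X, omega))         # Y = X^2
print(is_function_of(lambda w: X(w) ** 2 + Z(w), X, omega))  # Y = X^2 + Z
```

Replacing X by the pair (X, Z) in the second call makes the check succeed, which corresponds to σ(X, Z)-measurability.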

4.4 Independence

Definition 44. A sequence X1, X2, . . . (finite or infinite) of (S, S)-valued random
variables is independent if the classes σ(X1), σ(X2), . . . are independent. That is,
for each finite k,

P(X1 ∈ B1, . . . , Xk ∈ Bk) = P(X1 ∈ B1) · · ·P(Xk ∈ Bk) (13)

for all B1, . . . , Bk ∈ S.

If S = σ(A), it suffices to verify (13) for sets B1, . . . , Bk ∈ A only (see the
previous lectures). In the case of real-valued random variables it reduces to

P(X1 ≤ x1, . . . , Xk ≤ xk) = P(X1 ≤ x1) · · · P(Xk ≤ xk) , (14)

while in the case of simple random variables it further reduces to

P(X1 = y1, . . . , Xk = yk) = P(X1 = y1) · · · P(Xk = yk) . (15)

Indeed, {X1 ≤ x1} = ⋃_{y ≤ x1} {X1 = y} and the latter is a finite union. Thus,
(15) implies (14).


4.5 Closure properties

Recall that f is a measurable function from (S,S) to (T, T ) if

f−1(B) = {x : f(x) ∈ B} ∈ S

for any B ∈ T .

Theorem 45. Let X : Ω→ S be a (S,S)-valued random variable and let f be a

measurable function from (S,S) to (T, T ). Then the composition f ◦X : Ω→ T

is (T, T )-valued random variable.

Proof. Let B ∈ T . We know that f−1(B) ∈ S. Now

(f ◦X)−1(B) = X−1(f−1(B)) ∈ F .

Remark 46. Note that Theorem 45 extends Theorem 42 from simple to arbi-

trary random variables. In Theorem 45 we need however f to be measurable

function.

Example 47. Let X : Ω → R^d be measurable so that X(ω) = (X1(ω), . . . , Xd(ω)) is a
random vector. Then Σ_{i=1}^d Xi, ∏_{i=1}^d Xi, min_{i=1,...,d} Xi, max_{i=1,...,d} Xi are
random variables.

4.6 Convergence of random variables

Refresh your memory about pointwise and uniform convergence of functions:

https://en.wikipedia.org/wiki/Uniform_convergence

Let (Ω, F , P) be a probability space and let X, Xn, n ≥ 1, be random variables
defined on that probability space. That is, X : Ω → R and Xn : Ω → R. Then
Xn(ω) converges to X(ω) if for each ε > 0 there exists n0 such that for all
n ≥ n0 we have |Xn(ω) − X(ω)| < ε. Thus, the complementary event is:
|Xn(ω) − X(ω)| ≥ ε for some ε > 0 and infinitely many values of n:

\[ \{ \lim_{n} X_n = X \}^c = \bigcup_{\varepsilon > 0} \{ |X_n - X| \ge \varepsilon \ \text{i.o.} \} . \]

Note that the union can be taken over rational values of ε.

Definition 48. We say that Xn converges with probability 1 to X (converges
almost surely) if

\[ P(|X_n - X| \ge \varepsilon \ \text{i.o.}) = 0 \tag{16} \]

for all ε > 0. We will write $X_n \xrightarrow{\text{a.s.}} X$.

Let An = {|Xn − X| ≥ ε}. Then {|Xn − X| ≥ ε i.o.} = lim sup_n An.
Thus $X_n \xrightarrow{\text{a.s.}} X$ if and only if P(lim sup_n An) = 0 for every ε > 0.
Since lim sup_n P(An) ≤ P(lim sup_n An) we conclude that (16) implies

\[ \lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0 . \tag{17} \]


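Definition 49. We say that Xn converges in probability to X if

\[ \lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0 \tag{18} \]

for all ε > 0. We will write $X_n \xrightarrow{p} X$.

Remark 50. We get immediately

\[ X_n \xrightarrow{\text{a.s.}} X \ \Rightarrow \ X_n \xrightarrow{p} X . \]

The converse is not true.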

Example 51. Let An be events. Take X ≡ 0 and Xn = I_{An}. Note that
$X_n \xrightarrow{p} X$ is equivalent to P(An) → 0. Take any sets An such
that P(An) → 0 but {An i.o.} has a positive probability; then Xn converges to X
in probability, but not almost surely. For example, on ((0, 1], B, λ) take
An = (tn, tn + sn) with sn ↓ 0 and such that ω is covered by infinitely many
intervals An: tn = (i − 1)/k,
sn = 1/k when n = k(k − 1)/2 + i, i = 1, . . . , k, k ≥ 1.
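The indexing in Example 51 is easier to follow when the intervals are listed explicitly. The sketch below computes (t_n, s_n) from n and records which intervals contain a fixed sample point; the point ω = 1/√2 and the helper names `interval` and `hits` are arbitrary choices made for this illustration.

```python
import math

def interval(n):
    """Return (t_n, s_n) for n = k(k-1)/2 + i with 1 <= i <= k."""
    k = 1
    while k * (k + 1) // 2 < n:   # block k holds indices k(k-1)/2 + 1 .. k(k+1)/2
        k += 1
    i = n - k * (k - 1) // 2
    return (i - 1) / k, 1 / k

def hits(omega, n_max):
    """Indices n <= n_max with omega in A_n = (t_n, t_n + s_n)."""
    out = []
    for n in range(1, n_max + 1):
        t, s = interval(n)
        if t < omega < t + s:
            out.append(n)
    return out

omega = 1 / math.sqrt(2)          # an arbitrary sample point, chosen for illustration
n_max = 50 * 51 // 2              # all blocks k = 1, ..., 50
print(len(hits(omega, n_max)), interval(n_max)[1])
```

Each block k contributes intervals of length 1/k that sweep across (0, 1), so the lengths shrink while a point that is not an endpoint keeps being covered once per block.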

Example. This example was suggested by Joe. Consider a sequence of independent
random variables Xn, n ≥ 1, taking value 1 with probability 1/n and 0
with probability 1 − 1/n. Write Xn = 1_{An}, where An = {Xn = 1}, so that the
events An are independent and P(An) = 1/n → 0. Hence we have
convergence in probability: $X_n \xrightarrow{p} X$, where X ≡ 0. You can also see this in the
following way: P(|Xn| > ε) = P(Xn = 1) = 1/n → 0 for ε ∈ (0, 1). But we do not have
almost sure convergence. Indeed, since Σ_n P(An) = +∞, using the second Borel-Cantelli
lemma, P(lim sup_n An) = 1. Thus P({ω : Xn(ω) = 1 for infinitely many n}) = 1,
hence Xn does not converge to zero almost surely. (Note that the nested intervals
An = (0, 1/n] on ((0, 1], B, λ) have the same probabilities but are not independent;
see Example 29.)

4.7 Approximation by simple random variables

Theorem 52. For any random variable X there exists a sequence Xn of simple

random variables such that limnXn(ω) = X(ω) for each fixed ω ∈ Ω.

Proof. Let

\[ f_n(x) = n I_{(n,\infty)}(x) + 2^{-n} \sum_{k=0}^{n2^n - 1} k \, I_{(k2^{-n}, (k+1)2^{-n}]}(x) . \]

If X ≥ 0 then Xn = fn(X) is a simple random variable. Note that |X(ω) − Xn(ω)| ≤ 2^{-n}
whenever X(ω) ≤ n. Thus the statement holds for X ≥ 0. If X is arbitrary,
then we can write X(ω) = X^+(ω) − X^−(ω), where X^+(ω) = max{X(ω), 0}
and X^−(ω) = −min{X(ω), 0}. Both are nonnegative random variables and we take
Xn = fn(X^+) − fn(X^−).
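The step function f_n from the proof can be evaluated directly. The sketch below is one way to code it for a single value x = X(ω); it uses floating-point numbers, and the function names `f_n` and `approximate` are chosen for this illustration.

```python
import math

def f_n(x, n):
    """Dyadic step function from the proof, for x >= 0."""
    if x > n:
        return float(n)
    if x <= 0:
        return 0.0
    k = math.ceil(x * 2**n) - 1      # x lies in (k 2^-n, (k+1) 2^-n]
    return k / 2**n

def approximate(x, n):
    """X_n = f_n(X^+) - f_n(X^-) evaluated at a single value x = X(omega)."""
    return f_n(max(x, 0.0), n) - f_n(-min(x, 0.0), n)

for n in (1, 2, 4, 8, 16):
    print(n, approximate(math.pi, n), approximate(-0.3, n))
```

For small n the value is capped at n; once n exceeds |x| the error is at most 2^{-n}, as stated in the proof.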

4.8 Expected Value

Let X be a simple random variable

\[ X = \sum_{i=1}^{d} x_i 1_{A_i} . \]

Then the expected value is defined as

\[ E[X] = \sum_{i=1}^{d} x_i P(A_i) . \tag{19} \]

On the other hand, we have

\[ E[X] = \sum_{x} x \, P(X = x) . \tag{20} \]

Indeed, the right-hand sides of (19) and (20) can both be written as

\[ \sum_{x} \sum_{i : x_i = x} x_i P(A_i) . \]

In particular, E[I_A] = P(A).

If Y = f(X) then Y is the simple random variable of the form

\[ Y = \sum_{i=1}^{d} f(x_i) 1_{A_i} \]

and hence

\[ E[Y] = E[f(X)] = \sum_{i=1}^{d} f(x_i) P(A_i) = \sum_{x} f(x) P(X = x) . \]

The k-th moment of X is given by

\[ E[X^k] = \sum_{x} x^k P(X = x) . \]

The properties of the expected value follow from the corresponding properties of the
sum operation:

• Linearity: If X = Σ_i x_i 1_{A_i} and Y = Σ_j y_j 1_{B_j}, and α, β ∈ R, then

\[ x_i 1_{A_i} = x_i \sum_{j} 1_{A_i \cap B_j} \]

and hence

\[ \alpha X + \beta Y = \sum_{i,j} (\alpha x_i + \beta y_j) 1_{A_i \cap B_j} . \]

Then

\[ E[\alpha X + \beta Y] = \sum_{i,j} (\alpha x_i + \beta y_j) P(A_i \cap B_j) = \alpha \sum_{i} x_i P(A_i) + \beta \sum_{j} y_j P(B_j) = \alpha E[X] + \beta E[Y] . \]

• It extends to a finite number of simple random variables:

\[ E\Big[ \sum_{i=1}^{n} X_i \Big] = \sum_{i=1}^{n} E[X_i] . \tag{21} \]


• If X(ω) ≤ Y (ω) for all ω, then E[X] ≤ E[Y ];

• |E[X]| ≤ E[|X|];

• $|E[X - Y]| \le E[|X - Y|]$ . (22)

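• If $X$ and $Y$ are independent, then

\[ \begin{aligned} E[XY] &= E\Big[\sum_{i,j} x_i y_j 1_{A_i} 1_{B_j}\Big] = E\Big[\sum_{i,j} x_i y_j 1_{A_i \cap B_j}\Big] = \sum_{i,j} x_i y_j P(A_i \cap B_j) \\ &= \sum_{i,j} x_i y_j P(A_i) P(B_j) = \Big(\sum_{i} x_i P(A_i)\Big)\Big(\sum_{j} y_j P(B_j)\Big) = E[X]E[Y] . \end{aligned} \]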

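Assume that $X$ is nonnegative. Order the range as $0 \le x_1 < x_2 < \cdots < x_d$. Then

\[ \begin{aligned} E[X] &= \sum_{i=1}^{d} x_i P(X = x_i) = \sum_{i=1}^{d-1} x_i \big(P(X \ge x_i) - P(X \ge x_{i+1})\big) + x_d P(X = x_d) \\ &= x_1 P(X \ge x_1) + \sum_{i=2}^{d} (x_i - x_{i-1}) P(X \ge x_i) . \end{aligned} \]

This can be written as

\[ \int_0^\infty P(X \ge x)\, dx . \]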
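The rearranged sum can be checked against the direct definition of $E[X]$. The sketch below assumes a small hypothetical distribution (`pmf`, mapping values to probabilities); the function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical nonnegative simple random variable: value -> P(X = value).
pmf = {Fraction(0): Fraction(1, 4),
       Fraction(1): Fraction(1, 4),
       Fraction(3): Fraction(1, 2)}

def mean(pmf):
    """Direct definition: sum_x x P(X = x)."""
    return sum(x * p for x, p in pmf.items())

def tail(pmf, t):
    """P(X >= t)."""
    return sum(p for x, p in pmf.items() if x >= t)

def tail_sum(pmf):
    """x_1 P(X >= x_1) + sum_{i>=2} (x_i - x_{i-1}) P(X >= x_i)."""
    xs = sorted(pmf)
    total = xs[0] * tail(pmf, xs[0])
    for prev, cur in zip(xs, xs[1:]):
        total += (cur - prev) * tail(pmf, cur)
    return total

print(mean(pmf), tail_sum(pmf))
```

Each term $(x_i - x_{i-1}) P(X \ge x_i)$ is the area of one horizontal strip under the step function $x \mapsto P(X \ge x)$, which is why the sum equals the integral.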

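Theorem 53. If the sequence $\{X_n\}$ is uniformly bounded and $X = \lim_n X_n$ (that is, $X_n \xrightarrow{a.s.} X$), then $\lim_n E[X_n] = E[X]$.

Proof. There exists $K_0$ such that $\sup_n \sup_\omega |X_n(\omega)| < K_0$. Let $K_1 = \sup_\omega |X(\omega)|$ and $K = \max\{K_0, K_1\}$. Then $\sup_n \sup_\omega |X(\omega) - X_n(\omega)| \le 2K$. Fix $\varepsilon > 0$ and let $A_n = \{\omega : |X(\omega) - X_n(\omega)| > \varepsilon\}$. Then for all $\omega$

\[ |X(\omega) - X_n(\omega)| = |X - X_n| 1_{A_n}(\omega) + |X - X_n| 1_{A_n'}(\omega) \le 2K\, 1_{A_n}(\omega) + \varepsilon\, 1_{A_n'}(\omega) . \]

Thus

\[ E[|X - X_n|] \le 2K\, P(A_n) + \varepsilon\, P(A_n') \le 2K\, P(A_n) + \varepsilon . \]

Since $X_n \xrightarrow{a.s.} X$, we also have $X_n \xrightarrow{p} X$, and by the definition $\lim_n P(A_n) = 0$. This implies

\[ \limsup_n E[|X - X_n|] \le \varepsilon . \]

Since $\varepsilon$ is arbitrary, $\lim_n E[|X - X_n|] = 0$. By (22), $|E[X] - E[X_n]| \le E[|X - X_n|] \to 0$.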

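Let $\mu = E[X]$. We define the variance

\[ \mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2 . \]

Justification:

\[ \begin{aligned} E[(X - \mu)^2] &= E[X^2 - 2\mu X + \mu^2] = E[X^2] - E[2\mu X] + E[\mu^2] \\ &= E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - 2\mu^2 + \mu^2 = E[X^2] - \mu^2 . \end{aligned} \]

• For constants $a, b \in \mathbb{R}$,

\[ \mathrm{Var}(aX + b) = a^2\, \mathrm{Var}(X) . \]

Justification: Let $Y = aX + b$. Then $\mu_Y = E[Y] = aE[X] + b = a\mu_X + b$. We have

\[ \begin{aligned} \mathrm{Var}(aX + b) = \mathrm{Var}(Y) &= E[Y^2] - \mu_Y^2 = E[(aX + b)^2] - (a\mu_X + b)^2 \\ &= E[a^2X^2 + 2abX + b^2] - (a^2\mu_X^2 + 2ab\mu_X + b^2) \\ &= E[a^2X^2] + E[2abX] + E[b^2] - (a^2\mu_X^2 + 2ab\mu_X + b^2) \\ &= a^2E[X^2] + 2abE[X] + b^2 - (a^2\mu_X^2 + 2ab\mu_X + b^2) \\ &= a^2\big\{E[X^2] - \mu_X^2\big\} = a^2\, \mathrm{Var}(X) . \end{aligned} \]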
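The scaling rule for the variance can be verified exactly on a finite distribution. This is a small sketch with an assumed distribution `pmf_x` and assumed constants `a`, `b`; none of these names appear in the notes.

```python
from fractions import Fraction

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[X^2] - (E[X])^2 for a finite pmf {value: probability}."""
    mu = mean(pmf)
    return sum(x**2 * p for x, p in pmf.items()) - mu**2

def affine(pmf, a, b):
    """pmf of Y = a*X + b (a != 0, so distinct values stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}

# Hypothetical distribution of X and hypothetical constants a, b.
pmf_x = {Fraction(-1): Fraction(1, 5),
         Fraction(0): Fraction(2, 5),
         Fraction(2): Fraction(2, 5)}
a, b = Fraction(3), Fraction(7)

print(variance(pmf_x), variance(affine(pmf_x, a, b)))
```

The shift `b` has no effect on the result, while the factor `a` enters squared, in line with the justification above.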

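• Let $S = \sum_{i=1}^{n} X_i$ and $\mu_i = E[X_i]$. Using (21) we get

\[ \mathrm{Var}(S) = E\Big[\Big(\sum_{i=1}^{n} (X_i - \mu_i)\Big)^2\Big] = \sum_{i=1}^{n} E[(X_i - \mu_i)^2] + 2 \sum_{1 \le i < j \le n} E[(X_i - \mu_i)(X_j - \mu_j)] . \]

• If the $X_i$ are independent, then

\[ \mathrm{Var}\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} \mathrm{Var}(X_i) . \]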

4.9 Inequalities

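Lemma 54. Assume that $X$ is nonnegative. Then for any $\varepsilon > 0$,

\[ P(X \ge \varepsilon) \le \varepsilon^{-1} E[X] . \]

Proof.

\[ E[X] = \sum_{x} x\, P(X = x) \ge \sum_{x:\, x \ge \varepsilon} x\, P(X = x) \ge \varepsilon \sum_{x:\, x \ge \varepsilon} P(X = x) = \varepsilon\, P(X \ge \varepsilon) . \]

Applying this to $|X|^k$ we obtain the Markov inequality: for any $X$,

\[ P(|X| \ge \varepsilon) \le \varepsilon^{-k} E[|X|^k] . \]

Applying this with $k = 2$ and $|X - \mu|$ we obtain the Chebyshev inequality:

\[ P(|X - \mu| \ge \varepsilon) \le \varepsilon^{-2}\, \mathrm{Var}(X) . \]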
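Both bounds can be checked numerically on a finite distribution. The following sketch uses an assumed distribution `pmf` and illustrative helper names; it only confirms the inequalities on a grid of thresholds and is not a proof.

```python
from fractions import Fraction

# Hypothetical nonnegative simple random variable: value -> probability.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

def expect(pmf, f=lambda x: x):
    return sum(f(x) * p for x, p in pmf.items())

def prob(pmf, event):
    return sum(p for x, p in pmf.items() if event(x))

mu = expect(pmf)
var = expect(pmf, lambda x: (x - mu) ** 2)

def markov_holds(eps, k=1):
    """P(|X| >= eps) <= eps^-k E[|X|^k]."""
    lhs = prob(pmf, lambda x: abs(x) >= eps)
    return lhs <= expect(pmf, lambda x: abs(x) ** k) / eps**k

def chebyshev_holds(eps):
    """P(|X - mu| >= eps) <= eps^-2 Var(X)."""
    return prob(pmf, lambda x: abs(x - mu) >= eps) <= var / eps**2

grid = [Fraction(e, 4) for e in range(1, 40)]
print(all(markov_holds(e, k) for e in grid for k in (1, 2, 3)))
print(all(chebyshev_holds(e) for e in grid))
```

For small thresholds the bounds exceed 1 and carry no information; they become useful only once $\varepsilon$ is large relative to the moment being used.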

Other inequalities:

• If $\varphi$ is convex then we obtain the Jensen inequality:

\[ \varphi(E[X]) \le E[\varphi(X)] . \]

• If $p, q > 1$ and $1/p + 1/q = 1$, then we obtain the Hölder inequality:

\[ E[|XY|] \le (E[|X|^p])^{1/p} (E[|Y|^q])^{1/q} \]

(Use $ab \le a^p/p + b^q/q$ for $a, b > 0$.)

• If $0 < p \le q$ then we obtain the Lyapunov inequality:

\[ (E[|X|^p])^{1/p} \le (E[|X|^q])^{1/q} . \]

5 Measures in Euclidean Spaces

5.1 Lebesgue Measure

The Lebesgue measure $\lambda$ was already defined on $\mathcal{B}$, the Borel σ-field on $\mathbb{R}$. It is the only measure that satisfies $\lambda((a, b]) = b - a$. The class $\mathcal{B}^k$ is the σ-field generated by the sets of the form

\[ I_k := \prod_{i=1}^{k} (a_i, b_i] = (a_1, b_1] \times \cdots \times (a_k, b_k] . \]

The $k$-dimensional Lebesgue measure $\lambda_k$ is defined as

\[ \lambda_k(I_k) = \prod_{i=1}^{k} (b_i - a_i) . \]

The extension theorem allows us to consider this measure on $\mathcal{B}^k$.

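Theorem 55. If $A \in \mathcal{B}^k$, then for $x \in \mathbb{R}^k$, $A + x = \{a + x : a \in A\} \in \mathcal{B}^k$ and $\lambda_k(A + x) = \lambda_k(A)$.

Proof. Let $\mathcal{G} = \{A \in \mathcal{B}^k : A + x \in \mathcal{B}^k \text{ for all } x \in \mathbb{R}^k\}$. Then

• $\mathcal{G}$ contains $\mathcal{I}^k$, the class of all the rectangles $I_k$;

• $\mathcal{G}$ is a σ-field. For example, if $A, B \in \mathcal{G}$, then $C = A \cup B \in \mathcal{G}$. Indeed, $C + x = (A \cup B) + x = (A + x) \cup (B + x)$.

• Thus, $\mathcal{B}^k = \sigma(\mathcal{I}^k) \subseteq \mathcal{G}$.

For the second statement,

• Note that $\mathcal{I}^k$ is a π-system (closed under intersections; this is not a field!);

• Fix $x \in \mathbb{R}^k$ and define $\mu(A) = \lambda_k(A + x)$.

• Then $\mu$ and $\lambda_k$ agree on the π-system since for $A \in \mathcal{I}^k$, $\lambda_k(A) = \lambda_k(A + x)$. Thus, $\mu$ and $\lambda_k$ agree on all Borel sets.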

Let $T : \mathbb{R}^k \to \mathbb{R}^k$ be a linear and nonsingular map. For example,

• rotation or reflection (special case of so-called orthogonal or unitary transformation); $\det(T) = \pm 1$;

• $T(x_1, \dots, x_k) = (x_1 + x_2, x_2, \dots, x_k)$, then $\det(T) = 1$;

• $T(x_1, \dots, x_k) = (ax_1, x_2, \dots, x_k)$, then $\det(T) = a$.

For $A \subseteq \mathbb{R}^k$ denote $TA = \{Ta : a \in A\}$.

Theorem 56. If $T$ is linear and nonsingular, then $A \in \mathcal{B}^k$ implies $TA \in \mathcal{B}^k$ and

\[ \lambda_k(TA) = |\det(T)|\, \lambda_k(A) . \]
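In dimension $k = 2$ the statement of Theorem 56 can be illustrated on the unit square, whose image under a linear map is a parallelogram. The sketch below is an assumption-laden illustration: the matrices `shear`, `stretch` and `reflection` and all helper names are chosen here, and the area is computed with the shoelace formula rather than from the definition of $\lambda_2$. The closed square is used in place of $(0,1] \times (0,1]$, which does not change the area.

```python
def det2(T):
    (a, b), (c, d) = T
    return a * d - b * c

def apply(T, p):
    """Apply the 2x2 matrix T to the point p."""
    (a, b), (c, d) = T
    x, y = p
    return (a * x + b * y, c * x + d * y)

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its vertices in order."""
    s = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # unit square, area 1
shear = ((1, 1), (0, 1))        # T(x1, x2) = (x1 + x2, x2), det = 1
stretch = ((3, 0), (0, 1))      # T(x1, x2) = (3 x1, x2),    det = 3
reflection = ((0, 1), (1, 0))   # swaps the coordinates,     det = -1

for T in (shear, stretch, reflection):
    image = [apply(T, p) for p in square]
    print(det2(T), polygon_area(image))
```

The reflection shows why the absolute value is needed: the determinant is negative, but the area of the image is still 1.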

Singular case. Note that

\[ B = \{a\} \times (a_2, b_2] \times \cdots \times (a_k, b_k] \]

can be viewed as a subset of both $\mathbb{R}^k$ and $\mathbb{R}^{k-1}$. Then

\[ \lambda_k(B) = 0 , \qquad \lambda_{k-1}(B) = \prod_{i=2}^{k} (b_i - a_i) . \]

Note that $B$ can be viewed as the image of

\[ A = (a_1, b_1] \times (a_2, b_2] \times \cdots \times (a_k, b_k] \]

through the projection: $B = TA$, where $T(x_1, x_2, \dots, x_k) = (a, x_2, \dots, x_k)$. We have $\det(T) = 0$.

In fact we have a general statement: if $A, TA \in \mathcal{B}^k$ and $T$ is singular, then $\lambda_k(TA) = 0$. Note however that, contrary to the nonsingular case, it is not always true that $TA \in \mathcal{B}^k$.

5.2 Regularity

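Open and closed sets A subset $A \subseteq \mathbb{R}$ is open if for every $x \in A$ there exists $\varepsilon > 0$ such that $(x - \varepsilon, x + \varepsilon) \subseteq A$. For example, $(a, b)$ is open, but $(a, b]$ is not. For $x, y \in \mathbb{R}^k$ let $d(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$ be the Euclidean distance. Then $A \subseteq \mathbb{R}^k$ is open if for any $x \in A$ there exists $\varepsilon > 0$ such that $d(x, y) < \varepsilon$ implies $y \in A$. Equivalently, a set $A$ is open if for every $x \in A$ there exists $\varepsilon > 0$ such that the ball $B(x, \varepsilon) = \{y \in \mathbb{R}^k : d(x, y) < \varepsilon\} \subseteq A$. A closed set is the complement of an open set. A compact set is bounded and closed.

Theorem 57. For any $A \in \mathcal{B}^k$ and $\varepsilon > 0$ there exist open and closed sets $O$ and $C$ such that $C \subset A \subset O$ and $\lambda_k(O \setminus C) < \varepsilon$.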

5.3 Specifying Measures on the Line

Measures on R Let µ be a measure on R. Then

• $\mu$ is finite on bounded sets if it assigns finite values to bounded subsets $A \subseteq \mathbb{R}$.

• µ is a finite measure if µ(R) <∞.

• Any probability measure is a finite measure (in fact, probability measures are

normalized finite measures).

• The Lebesgue measure is finite on bounded sets, but it is not finite: λ(R) =∞.

For a measure that is finite on bounded sets, define

\[ F(x) = \begin{cases} \mu((0, x]) , & x \ge 0 , \\ -\mu((x, 0]) , & x < 0 . \end{cases} \]

• The function F is nondecreasing;

• The function F is continuous from the right: if xn ↓ x, then F (xn)→ F (x).


Distribution function For every bounded interval (a, b],

µ((a, b]) = F (b)− F (a) . (23)

Note that (23) determines F up to an additive constant: if we know µ then from (23)

we can recover F (x) + c. If moreover µ is finite, then we can alternatively define

F (x) = µ((−∞, x]) .

Then:

• limx→−∞ F (x) = 0;

• limx→+∞ F (x) = µ(R);

• If µ(R) = 1, then F is called a (cumulative) distribution function.

• If $\mu = P$ is a probability measure, then for some random variable $X$, $\mu((-\infty, x]) = P(X \le x)$.

Theorem 58. Let F be a nondecreasing, right-continuous function. Then there exists

a unique measure µ such that (23) holds for all a, b. In particular, there is 1-1

equivalence between probability measures and distribution functions.

Proof. We need to show first that µ defined in (23) is a measure. Let A = (a, b] and

B = (c, d] be disjoint. We want to show that µ((a, b] ∪ (c, d]) = µ((a, b]) + µ((c, d]).

(Full details in assignment)

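Example 59. Consider the following measure defined on the Borel sets of $[0, 1]$:

\[ \mu = (1/3)\,\delta_{1/2} + (2/3)\,\delta_{1} , \]

where $\delta_x$ is the Dirac measure at $x$. Since $\mu$ has mass $1/3$ at the point $1/2$ and mass $2/3$ at the point $1$,

\[ F(x) = \begin{cases} 0 & \text{if } x < 1/2 , \\ 1/3 & \text{if } 1/2 \le x < 1 , \\ 1 & \text{if } x = 1 . \end{cases} \]

It is a probability measure since the total mass is $1/3 + 2/3 = 1$.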
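The distribution function of a measure made of point masses can be computed directly from the locations and weights of the atoms. The sketch below does this for $\mu = (1/3)\delta_{1/2} + (2/3)\delta_1$; the dictionary `atoms` and the function names are illustrative assumptions.

```python
from fractions import Fraction

# Atoms of mu = (1/3) delta_{1/2} + (2/3) delta_1, stored as {location: mass}.
atoms = {Fraction(1, 2): Fraction(1, 3), Fraction(1): Fraction(2, 3)}

def mu_interval(a, b):
    """mu((a, b]) for the discrete measure given by `atoms`."""
    return sum(m for loc, m in atoms.items() if a < loc <= b)

def F(x):
    """F(x) = mu([0, x]): total mass of the atoms located at or below x."""
    return sum(m for loc, m in atoms.items() if loc <= x)

for x in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print(x, F(x))
```

The function jumps exactly at the atoms, by the mass of each atom, and relation (23), $\mu((a, b]) = F(b) - F(a)$, can be checked with `mu_interval`.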

5.4 Specifying Measures on $\mathbb{R}^2$

Measures on $\mathbb{R}^2$ Let $a = (a_1, a_2)$, $b = (b_1, b_2)$ and let $A = (a, b] = (a_1, b_1] \times (a_2, b_2]$ be a bounded rectangle. For $x \in \mathbb{R}^2$ consider

\[ S_x = \{y \in \mathbb{R}^2 : y_1 \le x_1,\ y_2 \le x_2\} . \]

The class of sets $\{S_x, x \in \mathbb{R}^2\}$ generates $\mathcal{B}^2$. For a function $F : \mathbb{R}^2 \to \mathbb{R}$ define

\[ \Delta_A F = F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2) . \]

Let now $\mu$ be a finite measure on $\mathbb{R}^2$. Define $F$ by

\[ F(x) = \mu(S_x) = \mu(\{y : y_1 \le x_1,\ y_2 \le x_2\}) . \tag{24} \]

Theorem 60. Let $F$ be continuous from above and such that $\Delta_A F \ge 0$ for any bounded rectangle $A$. Then there exists a unique measure $\mu$ such that (24) holds.

If $\mu = P$ is a probability measure, then for some random vector $(X, Y)$,

\[ \begin{aligned} \Delta_A F &= F(b_1, b_2) - F(b_1, a_2) - F(a_1, b_2) + F(a_1, a_2) \\ &= \mu((-\infty, b_1] \times (-\infty, b_2]) - \mu((-\infty, b_1] \times (-\infty, a_2]) \\ &\quad - \mu((-\infty, a_1] \times (-\infty, b_2]) + \mu((-\infty, a_1] \times (-\infty, a_2]) \\ &= P(X \le b_1, Y \le b_2) - P(X \le b_1, Y \le a_2) \\ &\quad - P(X \le a_1, Y \le b_2) + P(X \le a_1, Y \le a_2) \\ &= P(a_1 < X \le b_1,\ a_2 < Y \le b_2) . \end{aligned} \]

Thus, the requirement $\Delta_A F \ge 0$ is quite natural.
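The rectangle increment $\Delta_A F$ can be computed for concrete functions. In the sketch below, `F` is the joint distribution function of two independent Uniform[0, 1] variables (an assumed example), and `G` is a hypothetical function that is nondecreasing in each coordinate yet has a negative increment, so it cannot come from a measure. All names are illustrative.

```python
def clamp01(t):
    return min(max(t, 0.0), 1.0)

def F(x1, x2):
    """Joint cdf of two independent Uniform[0, 1] variables (assumed example)."""
    return clamp01(x1) * clamp01(x2)

def G(x1, x2):
    """Nondecreasing in each coordinate, but not a valid 2-d distribution function."""
    return 1.0 if x1 + x2 >= 1 else 0.0

def delta(H, a, b):
    """Delta_A H for the rectangle A = (a1, b1] x (a2, b2]."""
    (a1, a2), (b1, b2) = a, b
    return H(b1, b2) - H(b1, a2) - H(a1, b2) + H(a1, a2)

print(delta(F, (0.25, 0.5), (0.75, 1.0)))   # area of the rectangle
print(delta(G, (0.0, 0.0), (1.0, 1.0)))     # negative increment
```

The second call shows that monotonicity in each coordinate alone is not enough in two dimensions; the condition $\Delta_A F \ge 0$ has to be imposed separately.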

6 Measurable Functions and Mappings

6.1 Measurable Mappings

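Let $(\Omega, \mathcal{F})$ and $(\Omega', \mathcal{F}')$ be two measurable spaces. A map $T : \Omega \to \Omega'$ is measurable (sometimes written as $\mathcal{F}/\mathcal{F}'$-measurable) if for each $A' \in \mathcal{F}'$, we have $T^{-1}A' \in \mathcal{F}$. Recall that an $\mathcal{F}/\mathcal{B}$-measurable map is called a random variable. In this case we will simply write $\mathcal{F}$-measurable.

Example 61. A real function $f : \Omega \to \mathbb{R}$ with a finite range is measurable if $f^{-1}(\{x\}) \in \mathcal{F}$ for each $x$ in the range of $f$.

Theorem 62. If $T^{-1}A' \in \mathcal{F}$ for each $A' \in \mathcal{A}'$ and $\sigma(\mathcal{A}') = \mathcal{F}'$, then $T$ is $\mathcal{F}/\mathcal{F}'$-measurable. (cf. Theorem 4, Slides Set E).

Proof. Let $\mathcal{G} = \{A' : T^{-1}A' \in \mathcal{F}\}$. Then $\mathcal{G}$ is a σ-field and contains $\mathcal{A}'$. Hence $\mathcal{F}' = \sigma(\mathcal{A}') \subseteq \mathcal{G}$.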

Let $T : \Omega \to \Omega'$ and $T' : \Omega' \to \Omega''$. The space $\Omega''$ is equipped with a σ-field $\mathcal{F}''$. The composition $T' \circ T : \Omega \to \Omega''$ is defined as $\omega \mapsto T'(T(\omega))$.

Theorem 63. If $T$ is $\mathcal{F}/\mathcal{F}'$-measurable and $T'$ is $\mathcal{F}'/\mathcal{F}''$-measurable then $T' \circ T$ is $\mathcal{F}/\mathcal{F}''$-measurable (composition of measurable maps is measurable).

6.2 Mappings into $\mathbb{R}^k$

A map $f : \Omega \to \mathbb{R}^k$ has the form

\[ f(\omega) = (f_1(\omega), \dots, f_k(\omega)) . \]

Since the sets

\[ S_x = \{y \in \mathbb{R}^k : y_i \le x_i,\ i = 1, \dots, k\} \]

generate $\mathcal{B}^k$, a function $f$ is $\mathcal{F}$-measurable if and only if the set

\[ \{\omega : f_i(\omega) \le x_i,\ i = 1, \dots, k\} \]

belongs to $\mathcal{F}$ for each $(x_1, \dots, x_k)$.

Theorem 64. If $f : \mathbb{R}^i \to \mathbb{R}^k$ is continuous, then it is measurable.

Proof. Take for simplicity $i = 1$, $k = 1$. We need to show that $S_x = \{\omega : f(\omega) \le x\} \in \mathcal{B}$ for each $x$. But continuity implies that $S_x$ is a closed set, hence the complement of an open set. Open sets are Borel sets.

6.3 Limits and Measurability

Theorem 65. Assume that $f_n$, $n \ge 1$, are real valued $\mathcal{F}$-measurable functions. Then

(a) The functions $\sup_n f_n$, $\inf_n f_n$, $\limsup_n f_n$, $\liminf_n f_n$ are measurable.

(b) If $\lim_n f_n$ exists everywhere then it is measurable.

(c) The set $\{\omega : \lim_n f_n(\omega) \text{ exists}\} \in \mathcal{F}$.

Proof. We have

\[ \{\omega : \sup_n f_n(\omega) \le x\} = \bigcap_n \{\omega : f_n(\omega) \le x\} \in \mathcal{F} . \]

Thus, $\sup_n f_n$ is measurable.

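Note: Assume that we have a probability space $(\Omega, \mathcal{F}, P)$ and $f_n : \Omega \to \mathbb{R}$. In this setting the requirement "$\lim_n f_n$ exists everywhere" is usually relaxed to "almost everywhere", which means that

\[ P(\{\omega : \lim_n f_n(\omega) \text{ does not exist}\}) = 0 . \]

For example, if $P = \lambda$ then a function $f : (0, 1] \to \mathbb{R}$ that has jumps at a countable number of points is not continuous, but it is measurable.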

Example 66. Let f(x) = x, g(x) = 1−x, x ∈ [0, 1]. Then h = sup{f, g} is a function

on [0, 1] defined as h(x) = 1− x, x ∈ [0, 1/2], h(x) = x, x ∈ (1/2, 1].

6.4 Transformations of Measures

Let $(\Omega, \mathcal{F}, \mu)$ be a measure space (for example, a probability space). Let $T : \Omega \to \Omega'$ be $\mathcal{F}/\mathcal{F}'$-measurable. Define a set function $\nu$ (note that any measure is a set function, but not vice versa) by

\[ \nu(A') = \mu T^{-1}(A') = \mu(T^{-1}A') , \quad A' \in \mathcal{F}' . \]

We note that $\nu = \mu T^{-1}$ is a set function on $\mathcal{F}'$.

It is not obvious that $\nu$ is itself a measure. However, assume that $A', B' \in \mathcal{F}'$ are disjoint. Then the sets $T^{-1}(A')$ and $T^{-1}(B')$ are also disjoint and

\[ \begin{aligned} \nu(A' \cup B') &= \mu(T^{-1}(A' \cup B')) = \mu(T^{-1}(A') \cup T^{-1}(B')) \\ &= \mu(T^{-1}(A')) + \mu(T^{-1}(B')) = \nu(A') + \nu(B') . \end{aligned} \]

In a similar way you can justify the other properties. Hence $\nu$ is a measure.

• If $\mu$ is finite then $\mu T^{-1}$ is also finite.

• If $\mu$ is a probability measure then $\mu T^{-1}$ is also a probability measure. Indeed,

\[ \nu(\Omega') = \mu(T^{-1}\Omega') = \mu(\Omega) = 1 . \]
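On a finite space the image measure $\nu = \mu T^{-1}$ can be computed by moving each point mass along $T$. The sketch below uses an assumed example (the uniform measure on $\{1, \dots, 6\}$ and $T(\omega) = \omega \bmod 3$); the names `pushforward` and `measure` are illustrative.

```python
from fractions import Fraction
from collections import defaultdict

# Hypothetical finite example: Omega = {1, ..., 6} with the uniform measure mu,
# and T(omega) = omega mod 3, mapping into Omega' = {0, 1, 2}.
mu = {w: Fraction(1, 6) for w in range(1, 7)}

def T(w):
    return w % 3

def pushforward(mu, T):
    """nu({w'}) = mu(T^{-1}{w'}), accumulated point by point."""
    nu = defaultdict(Fraction)
    for w, mass in mu.items():
        nu[T(w)] += mass
    return dict(nu)

def measure(m, A):
    """m(A) for a measure on a finite set given by its point masses."""
    return sum(m.get(w, 0) for w in A)

nu = pushforward(mu, T)
target = {0, 2}
preimage = {w for w in mu if T(w) in target}
print(measure(nu, target), measure(mu, preimage))
```

The two printed numbers agree, which is the defining relation $\nu(A') = \mu(T^{-1}A')$, and the total mass of `nu` is again 1.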

Measure preserving transformations

Definition 67. Let $T : \Omega \to \Omega$. If

\[ \mu(A) = \mu(T^{-1}(A)) , \quad A \in \mathcal{F} , \]

then we say that $T$ preserves the measure $\mu$.

Example 68. If $\Omega = \mathbb{R}^k$ and $T$ is a linear map with $\det(T) = 1$, then $T$ preserves the Lebesgue measure $\lambda_k$.

Example 69. The identity map is always measure preserving.

7 Distributions

7.1 Transformation of probability measures. Distributions

Consider a probability space (Ω,F ,P).

Recall that a random variable $X$ is a measurable map $X : \Omega \to \mathbb{R}$. Define the measure $\mu$ on $\mathcal{B}$ by

\[ \mu = P X^{-1} . \]

Thus, for each $A \in \mathcal{B}$ we have

\[ \mu(A) = P(X^{-1}A) = P(\{\omega : X(\omega) \in A\}) = P(X \in A) . \]

We say that $\mu$ is the law or the distribution of the random variable $X$.

The (cumulative) distribution function of $X$ is

\[ F(x) = \mu((-\infty, x]) = P(X \le x) . \]

Example 70. Consider a probability space $([0, 1], \mathcal{B}, \lambda)$. Define a map $U(\omega) = \omega$. Then $\mu = \lambda$ and the cumulative distribution function of $U$ is

\[ F_U(x) = P(U \le x) = \begin{cases} 0 , & x < 0 , \\ x , & x \in [0, 1] , \\ 1 , & x > 1 . \end{cases} \]

We say that $U$ has a uniform distribution.

Example 71. Consider a probability space $([0, 1], \mathcal{B}, \lambda)$. Assume that $X$ is a random variable defined on $\Omega$ with the law $\mu = P X^{-1}$. Define $Y = X + b$, $b \in \mathbb{R}$. Note that $Y$ is also defined on $\Omega$. Consider the law $\nu = P Y^{-1}$. Recall that for $A \subseteq \mathbb{R}$ the set $A - b$ is defined as $A - b = \{a - b : a \in A\}$. Then

\[ \begin{aligned} \nu(A) &= P(Y^{-1}A) = P(\{\omega : Y(\omega) \in A\}) = P(\{\omega : X(\omega) + b \in A\}) \\ &= P(\{\omega : X(\omega) \in A - b\}) = P(X \in A - b) = \mu(A - b) . \end{aligned} \]

If $F_X$ and $F_Y$ are the cumulative distribution functions of $X$ and $Y$, respectively, then

\[ F_Y(x) = F_X(x - b) . \]

Properties of cdf

• $\lim_{x \to -\infty} F(x) = 0$;

• $\lim_{x \to +\infty} F(x) = 1$;

• If $F$ is a distribution function then

\[ \bar{F}(x) = 1 - F(x) = 1 - P(X \le x) = P(X > x) \]

is called the tail distribution function.

• $F(x-) = \lim_{y \uparrow x} F(y) = P(X < x) = \mu((-\infty, x))$;

• $P(X = x) = F(x) - F(x-)$. If $P(X = x) \ne 0$ then we say that $F$ has a jump at $x$.

Example 72. Assume that $X$ is a nonnegative random variable with the cumulative distribution function $F_X$ (for example, without any jumps). For $b > 0$ define

\[ Y = (X - b)^+ = \begin{cases} X - b & \text{if } X > b , \\ 0 & \text{if } X \le b . \end{cases} \]

By the definition $Y \ge 0$. Therefore, $P(Y \le y) = 0$ for all $y < 0$.

For $y = 0$ we have

\[ P(Y \le 0) = P(Y = 0) = P(X \le b) = F_X(b) . \]

Moreover, for $y > 0$,

\[ \begin{aligned} P(Y \le y) &= P(Y \le y, X \le b) + P(Y \le y, X > b) \\ &= P(0 \le y, X \le b) + P(X - b \le y, X > b) \\ &= P(X \le b) + P(b < X \le y + b) \\ &= P(X \le y + b) = F_X(y + b) . \end{aligned} \]

In summary

\[ F_Y(y) = \begin{cases} 0 , & y < 0 , \\ F_X(b) , & y = 0 , \\ F_X(y + b) , & y > 0 . \end{cases} \]

Note that the cumulative distribution function has a jump of size $P(X \le b)$ at the point $y = 0$.
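To make the three cases concrete, the sketch below evaluates $F_Y$ when $X$ is assumed to be exponential with rate 1. That choice of $F_X$ and the function names are assumptions made only for illustration; the example itself holds for any nonnegative $X$.

```python
import math

def F_X(x):
    """cdf of an Exponential(1) variable (assumed here only as an example)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_Y(y, b):
    """cdf of Y = (X - b)^+ following the three cases of Example 72."""
    if y < 0:
        return 0.0
    # For y = 0 this returns F_X(b); for y > 0 it returns F_X(y + b).
    return F_X(y + b)

b = 1.0
jump_at_zero = F_Y(0.0, b) - F_Y(-1e-9, b)
print(jump_at_zero, F_X(b))
```

Even though $F_X$ is continuous, truncating at $b$ places the whole mass $P(X \le b)$ at the single point $0$, so $F_Y$ has a jump there.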

Quantile function Let F be the cumulative distribution function. Recall that

F : R→ [0, 1]. Define a function Q : [0, 1]→ R by

Q(y) = inf{x : F (x) ≥ y} .

• If F is strictly increasing and continuous (no jumps), then Q is nothing else but

the inverse function of F (strictly increasing and continuous). Then Q(F (x)) =

x.

• If $F$ is strictly increasing but has some jumps, then $Q$ is continuous but not strictly increasing.

• If $F$ is continuous but not strictly increasing, then $Q$ has jumps but is strictly increasing.
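For a distribution function with jumps, the quantile function $Q(y) = \inf\{x : F(x) \ge y\}$ can be computed by scanning the cumulative probabilities. The sketch below assumes a small discrete distribution (`values`, `probs`); these names and the restriction to $0 < y \le 1$ are choices made here for illustration.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical discrete distribution with atoms at 1, 2 and 5.
values = [1, 2, 5]
probs = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
cum = list(accumulate(probs))   # F evaluated at the atoms: 1/5, 7/10, 1

def F(x):
    """F(x) = P(X <= x)."""
    return sum(p for v, p in zip(values, probs) if v <= x)

def Q(y):
    """Q(y) = inf{x : F(x) >= y} for 0 < y <= 1."""
    for v, c in zip(values, cum):
        if c >= y:
            return v
    raise ValueError("y must lie in (0, 1]")

for y in (Fraction(1, 5), Fraction(1, 4), Fraction(7, 10), Fraction(3, 4)):
    print(y, Q(y))
```

Here $Q$ is constant on each interval of levels $y$ covered by one jump of $F$, so it is not strictly increasing, and it satisfies $Q(y) \le x$ exactly when $y \le F(x)$.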

Lemma 73. Assume that $U$ is uniform on $[0, 1]$, that is, its distribution is given by $F_U(x) = 0$ if $x < 0$, $F_U(x) = x$ for $x \in [0, 1]$ and $F_U(x) = 1$ for $x > 1$. Let $F$ be a strictly increasing and continuous distribution function and let $Q$ be its quantile function. Define $X = Q(U)$. Then $X$ has distribution $F$.

Proof. We have

\[ P(X \le x) = P(Q(U) \le x) = P(F(Q(U)) \le F(x)) = P(U \le \underbrace{F(x)}_{=y}) = y = F(x) . \]

Remark 74. This result is very important for simulations.
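A typical use in simulation is inverse transform sampling: draw $U$ uniformly and return $Q(U)$. The sketch below assumes an exponential distribution with a hypothetical rate of 2, whose distribution function is continuous and strictly increasing on $(0, \infty)$ only, so it is a slight extension of the setting of Lemma 73. The function names and the seed are illustrative.

```python
import math
import random

def F(x, rate=2.0):
    """Exponential cdf; continuous, strictly increasing on (0, inf)."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0

def Q(u, rate=2.0):
    """Quantile function: the inverse of F for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

def sample(n, rate=2.0, seed=0):
    """Draw n values of X = Q(U) with U uniform on [0, 1)."""
    rng = random.Random(seed)
    return [Q(rng.random(), rate) for _ in range(n)]

xs = sample(100_000)
# Compare the empirical distribution function with F at a few points.
for x in (0.1, 0.5, 1.0):
    empirical = sum(v <= x for v in xs) / len(xs)
    print(x, round(empirical, 3), round(F(x), 3))
```

The empirical fractions should be close to $F(x)$, which is what the lemma predicts; any distribution with a computable quantile function can be simulated the same way.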

Existence theorem Before, we started with a probability measure P and a random

variable X and we defined the cdf. The following result gives a converse statement.

Theorem 75. If $F$ is a nondecreasing, right-continuous function satisfying $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$, then there exists on some probability space $(\Omega, \mathcal{F}, P)$ a random variable $X$ for which $F(x) = P(X \le x)$.

Proof. Assume for a moment that $F$ is strictly increasing and continuous. From Example 70 we already know that there exists a uniform random variable $U$ defined on the probability space $([0, 1], \mathcal{B}, \lambda)$. By Lemma 73 we can construct a random variable $X$ with the cdf $F$.

For the general case, please see the textbook.


学霸联盟:"https://www.xuebaunion.com"

October 23, 2020

1 Probability Measures

A quick review for sets and events

• ∅ denotes the empty set.

• A ⊂ B means A is a subset of B. That is, all elements in A also belong

to B. Typically, B is a bigger set. For example, if A = {1, 2, 3} and

B = {1, 2, 3, 4}, then A ⊂ B. But if C = {3, 4, 5}, then C 6⊂ B.

• A ∪ B is the union of two sets. This is the set of all outcomes that

are either in A or in B, or in both. For example, if A = {1, 2, 3} and

B = {3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}. Note that 3 is not counted

twice. If C = {1, 2, 3, 4}, then A ∪ C = {1, 2, 3, 4} = C.

• A∩B is the intersection of two sets. This is the set of all outcomes that

belong to both A and B. For example, if A = {1, 2, 3} and B = {3, 4, 5},

then A ∪B = {3}. If C = {4, 5}, then A ∪ C = ∅.

• A \ B is the difference of two sets. We remove from B all elements

that are also in A. For example, if A = {1, 2, 3} and B = {3, 4, 5}, then

B \A = {4, 5}.

• A′ is the complement of A. That is, A′ is the set of all elements in S

that are not in A.

• Commutative laws: A ∩B = B ∩A; A ∪B = B ∪A.

• Associative laws:

(A ∪B) ∪ C = A ∪ (B ∪ C) = A ∪B ∪ C

(A ∩B) ∩ C = A ∩ (B ∩ C) = A ∩B ∩ C

• Distributive laws:

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C)

A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C)

1

• De Morgan’s laws:

(A1 ∩ · · · ∩Ak)′ = A′1 ∪ · · · ∪A′k

(A1 ∪ · · · ∪Ak)′ = A′1 ∩ · · · ∩A′k

1.1 Spaces and Probabilities

In the probability theory Ω denotes a probability space. It could be as simple

as {0, 1} in case of one coin tossing, or as complicated as a space of all contin-

uous functions defined on [0, 1].

A subset of Ω is called an event and an element ω of Ω is a sample point.

This is a definition that you learned in elementary probability class:

Definition 1. Probability is a real-valued set function P that assigns, to each

event A ⊆ Ω a number P(A), called the probability of the event A, such that the

following axioms of probability are satisfied:

1. 0 ≤ P(A) ≤ 1 for any event A ⊂ Ω

2. P(∅) = 0, P(Ω) = 1

3. If A1, A2, ....Ak are mutually exclusive, then

P(A1 ∪A2 ∪ · · · ∪Ak) = P(A1) + P(A2) + · · ·+ P(Ak) .

You will learn in a moment that this definition is not 100% correct ..... Our

goal is to assign probabilities to as many events as possible.

For example, if Ω is finite then we can assign a uniform probability measure:

P({ω}) = 1|Ω| . But this is not only one choice. For example if you roll a fair dice

then Ω = {1, 2, 3, 4, 5, 6} and then P({1}) = · · · = P({6}) = 1/6. But if the dice

is loaded as follows: 1 side with ”1”, 2 sides with ”2”, 3 sides with ”3”, then

P({2}) = 2/6, P({3}) = 3/6. In other words, for the same sample space we may

have different probabilities.

If Ω = [0, 1] and A = (a, b), 0 < a < b < 1, then we can assign

P(A) = b− a .

Using the third axiom of probability, we can extend it to all finite unions of

intervals (see Theorem 11). What about more complicated events? For example,

what is P(Q), where Q is the set of all rationals in [0, 1]?

In fact, we cannot assign a probability to an arbitrary event, rather we need

to consider some special classes of sets like fields or σ-fields.

1.2 Classes of sets

A ”nice” class of sets should be closed under the formation of countable unions

and intersections.


Definition 2. A class F of subsets of Ω is called a field if:

• Ω ∈ F ;

• If A ∈ F then Ac = Ω \A ∈ F ;

• If A,B ∈ F then A ∪B ∈ F . (”closed under union of two sets”)

Note that De Morgan’s laws immediately imply that A ∩ B ∈ F . Indeed,

A∩B = (Ac∪Bc)c. Also, the third property implies that a field is closed under

any finite union.

Definition 3. A class F of subsets of Ω is called a σ-field if:

• Ω ∈ F ;

• If A ∈ F then Ac = Ω \A ∈ F ;

• If A1, A2, . . . ∈ F then A1 ∪ A2 ∪ · · · ∈ F . (”closed under countable

unions”)

Example 4. Let F consist of the finite and the cofinite sets (A being cofinite

if Ac is finite). For example, if Ω = [0, 1] and A = [0, 1) then A is cofinite since

Ac = {1}.

Then F is a field.

• If Ω is finite then F is also a σ-field.

• If Ω is infinite then F may not be a σ-field. For example, let Ω = {ω1, ω2, . . .}. Take A = {ω2, ω4, . . .}. Then A ∉ F (neither A nor its complement is finite), but A is a countable union of the singletons {ω2i}, i = 1, 2, . . ., each of which belongs to F.

Let A be a (”small” and ”easy”) class of subsets of Ω. For example, A can

be a class of all subintervals of [0, 1]. Note that this class is not closed under

finite unions since e.g. for a1 < a2 < a3 < a4, [a1, a2]∪ [a3, a4] is not an interval.

So, it is not a field or a σ-field.

Since we want to consider all sets in A and at the same time have a ”nice”

structure of a σ-field, we will consider a σ-field generated by the class A, denoted

by σ(A). This is the smallest σ-field that contains A.

In general, σ(A) has the following properties:

• A ⊂ σ(A);

• σ(A) is a σ-field;

• If A ⊂ G and G is σ-field, then σ(A) ⊂ G.

Example 5. If Ω = {1, 2, 3}, then σ({1}) = σ({2, 3}) = {∅, {1}, {2, 3},Ω}.

Indeed, denote A = {{1}} (the class whose only member is the set {1}) and A′ = {∅, {1}, {2, 3},Ω}. Check (this is easy!) that A is not a field (since the complement of {1} does not belong to A), but A′ is a field. It is also a σ-field, since the space Ω is finite. Of course, A ⊂ A′.

Now, is A′ the smallest field that contains A? We list some of the possible classes that contain the set {1}:


• {∅, {1},Ω} - it is not a field, since it is not closed under taking comple-

ments;

• {{1},Ω} - it is not a field, since it is not closed under taking complements;

• {∅, {1}} - it is not a field, since it is not closed under taking complements;

• {∅, {1}, {1, 2},Ω} - it is not a field, since it is not closed under taking

complements;

• you can continue listing all the possibilities and you will see that most of

them will not be fields.

• Now, I will list all the fields that include {1}:

– A′ = {∅, {1}, {2, 3},Ω};

– {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3},Ω} (the class of all subsets of Ω) - it is bigger than A′.

Note that classes such as {∅, {1}, {1, 2}, {3}, {2, 3},Ω} or {∅, {1}, {1, 3}, {2}, {2, 3},Ω} are closed under complements but are not fields: the first does not contain {1} ∪ {3} = {1, 3}, the second does not contain {1} ∪ {2} = {1, 2}.

Thus, we proved that A′ is the smallest field that contains {1}, hence A′ = σ(A). Of course, this is not the most effective way of proving this.
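The enumeration in Example 5 can also be carried out by brute force. The sketch below tests each of the 2^8 classes of subsets of Ω = {1, 2, 3} against the three conditions of Definition 2; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import combinations

OMEGA = frozenset({1, 2, 3})


def powerset(s):
    """All subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_field(cls):
    """Check Definition 2: contains OMEGA, closed under complement and pairwise union."""
    if OMEGA not in cls:
        return False
    if any(OMEGA - a not in cls for a in cls):
        return False
    return all(a | b in cls for a in cls for b in cls)


SUBSETS = powerset(OMEGA)  # the 8 subsets of OMEGA


def fields_containing(target):
    """Enumerate every class of subsets that is a field and has `target` as a member."""
    found = []
    for r in range(len(SUBSETS) + 1):
        for combo in combinations(SUBSETS, r):
            cls = frozenset(combo)
            if target in cls and is_field(cls):
                found.append(cls)
    return found


fields = fields_containing(frozenset({1}))
smallest = min(fields, key=len)
print(len(fields), sorted(map(sorted, smallest)))
```

On a finite Ω a field is the same thing as a σ-field, so the smallest class found this way is σ({1}). The search is exponential in the number of subsets and is only feasible for very small Ω.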

Example 6. If F is a σ-field, then σ(F) = F .

Example 7. Let Ω = (0, 1] and let I be the class of all subintervals (a, b] of (0, 1]. Then B = σ(I) is called the Borel σ-field. In particular, it contains all countable unions and countable intersections of intervals. Its elements are called Borel sets.

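The set Q of rationals in (0, 1] is a Borel set. Indeed, for each rational q the singleton {q} can be obtained as

\[ \{q\} = \bigcap_{i=1}^{\infty} (q - 1/i,\, q] . \]

The sets (q − 1/i, q] (intersected with (0, 1]) belong to I. Note that {q} ∉ I, but {q} ∈ B. Since there are countably many rationals, Q is a countable union of such singletons and hence Q ∈ B.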

1.3 Probability Measures

Definition 8. A set function P on a field F is called probability if

1. 0 ≤ P(A) ≤ 1 for any event A ∈ F

2. P(∅) = 0, P(Ω) = 1

3. If A1, A2, . . . ∈ F are mutually exclusive (disjoint) and their union A1 ∪ A2 ∪ · · · also belongs to F , then

P(A1 ∪A2 ∪ · · · ) = P(A1) + P(A2) + · · · .

From the axioms of probability we can derive further properties. Let A ⊂

B ∈ F . Then B = A ∪ (B \ A) is a union of disjoint sets. Thus, P(B) =

P(A) + P(B \A) and hence we conclude that P is monotone:


P(A) ≤ P(B) , A ⊂ B .

Another property is

P(A ∪B) = P(A) + P(B)− P(A ∩B) .

Further, let B1 = A1 and B2 = A2 ∩ A1^c. Then the events B1, B2 are disjoint and B1 ∪B2 = A1 ∪A2. Thus,

P(A1 ∪A2) = P(B1 ∪B2) = P(B1) + P(B2)

= P(A1) + P(A2 ∩ A1^c) ≤ P(A1) + P(A2) .

In the last step we used monotonicity and the fact that A2 ∩ A1^c ⊆ A2.

This can be generalized to obtain finite subadditivity:

\[ P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i) , \qquad A_i \in F . \]

Definition 9. Let F be a σ-field. A triple (Ω,F ,P) is called a probability

space.

Theorem 10. Let F be a field.

• (Continuity from below) If A,An ∈ F , n ≥ 1, A1 ⊆ A2 ⊆ · · · and A = A1 ∪ A2 ∪ · · · , then

\[ \lim_{n\to\infty} P(A_n) = P(A) . \]

• (Continuity from above) If A,An ∈ F , n ≥ 1, A1 ⊇ A2 ⊇ · · · and A = A1 ∩ A2 ∩ · · · , then

\[ \lim_{n\to\infty} P(A_n) = P(A) . \]

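Proof. We prove continuity from below. Set B1 = A1 and Bk = Ak \ Ak−1 for k ≥ 2. Then the sets Bk are disjoint,

\[ A = \bigcup_{k=1}^{\infty} B_k \qquad \text{and} \qquad A_n = \bigcup_{k=1}^{n} B_k . \]

By countable additivity:

\[ P(A) = P\left(\bigcup_{k=1}^{\infty} B_k\right) = \sum_{k=1}^{\infty} P(B_k) = \lim_{n\to\infty} \sum_{k=1}^{n} P(B_k) = \lim_{n\to\infty} P\left(\bigcup_{k=1}^{n} B_k\right) = \lim_{n\to\infty} P(A_n) . \]

Continuity from above follows by applying continuity from below to the complements An^c, which increase to A^c.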


1.4 Lebesgue measure

Let Ω = (0, 1]. Let I be the class of subintervals (a, b] of (0, 1]. Let B0 be the class that consists of all finite unions of disjoint intervals from I. Note that B0 is a field. For I = (a, b] its Lebesgue measure is λ(I) = b− a.

Theorem 11. The Lebesgue measure is a probability measure on B0.

Proof. See the textbook.

Example 12. Assume that we select a value from the interval (0, 1] at "random". What is the probability that the selected number is rational?

To each interval A = (a, b] we assigned P(A) = b − a. Thus, P((q − 1/i, q]) = 1/i. Then

\[ P(\{q\}) = P\left(\bigcap_{i=1}^{\infty} (q - 1/i,\, q]\right) = \lim_{i\to\infty} P((q - 1/i,\, q]) = \lim_{i\to\infty} (1/i) = 0 . \]

Since Q is a countable union of such singletons, countable additivity gives P(Q) = 0.¹

2 Existence of Probability Measures

Some facts about sets and classes of sets. Let F and G be two classes of

sets. If G ⊆ F then any set that belongs to G also belongs to F : A ∈ G ⇒ A ∈ F .

On the other hand, if A ∈ G and B ⊆ A (that is, B is a subset of set A) then

we cannot conclude that B ∈ G. Indeed, let G be a class of subintervals (a, b] of

(0, 10]. Then B = {1/2} is included in many of the intervals, but B 6∈ G.

2.1 Existence

The main result of today’s lecture is the following theorem:

Theorem 13. A probability measure on a field has a unique extension to the

generated σ-field.

The meaning of this theorem is the following. Let F0 be a field and let

F = σ(F0). Let P be a probability measure on F0. We are allowed to write

P(A) for A ∈ F0, but not for A ∈ F . Then there exists (only one) probability

measure P˜ on F such that P(A) = P˜(A) for all A ∈ F0. Usually the extension

P˜ of P is denoted just by P.

¹ Note: At this moment, thanks to Theorem 11, we can compute the Lebesgue measure on finite unions and intersections of subintervals. We do not know if we can compute the measure on countable unions and intersections. This will come later. Thus, at this point, the first displayed equation is not completely justified, since the intersection of the intervals (q − 1/i, q], i ≥ 1, does not belong to B0.


Example 14. Let Ω = (0, 1]. Let I be the class of subintervals I = (a, b] of (0, 1]. It is easy to define the Lebesgue measure on I by λ(I) = b − a. Let B0 be the field that consists of all finite unions of disjoint intervals. By Theorem 11, the Lebesgue measure is also a probability measure on B0. This is still easy. Our main theorem extends the Lebesgue measure to all Borel sets of (0, 1]. In particular, we are allowed to write λ(Q) (which happens to be zero, as we saw in Example 12).

Let P be a probability measure on F0. For A ⊂ Ω define

\[ P^{*}(A) = \inf \sum_{n} P(A_n) , \tag{1} \]

where the infimum extends over all finite and infinite sequences A1, A2, . . . of sets An ∈ F0 such that A ⊂ ⋃n An. Note that for A ⊂ Ω such that A ∉ F0 we are not allowed to write P(A). However, if A ∈ F0 then we will show that

P∗(A) = P(A) . (2)

The set function P∗ is called the outer measure of P.

The measure P∗ has the following properties:

1. P∗(∅) = 0;

2. P∗(A) ≥ 0 for all A;

3. P∗ is monotone, that is, if A ⊂ B, then P∗(A) ≤ P∗(B);

4. P∗ is countably subadditive: P∗(⋃n An) ≤ ∑n P∗(An).

However, we do not know if P∗ is a probability measure.

The inner measure P_∗ is defined as

P_∗(A) = 1 − P∗(A′) .

Both the outer measure P∗ and the inner measure P_∗ are candidates for an extension of P.

If P∗ and P_∗ agree then we have

P∗(A) + P∗(A′) = 1 .

We will ask for more:

P∗(A ∩ E) + P∗(A′ ∩ E) = P∗(E) (3)

for all sets E ⊂ Ω. Note that by subadditivity the reverse inequality always holds, so (3) is equivalent to

P∗(A ∩ E) + P∗(A′ ∩ E) ≤ P∗(E) . (4)

We do not know at this point how many (if any at all!!!) sets A fulfill this. Let

M be the class of sets A that fulfill (3). We call such sets P∗-measurable.


Lemma 15. The class M is a field.

Proof. First, Ω ∈M, since

P∗(Ω ∩ E) + P∗(Ω′ ∩ E) = P∗(E) .

Next, M is closed under complementation: if A ∈ M, then A′ ∈ M. This is obvious, since (3) is symmetric in A and A′. Furthermore, if A,B ∈ M, then A ∩ B ∈ M. For this one needs to show that if

P∗(A ∩ F ) + P∗(A′ ∩ F ) = P∗(F ) , P∗(B ∩ E) + P∗(B′ ∩ E) = P∗(E) ,

for all sets E,F , then

P∗((A ∩B) ∩ E) + P∗((A ∩B)′ ∩ E) ≤ P∗(E) .

We have

\[ \begin{aligned} P^{*}(E) &= P^{*}(B \cap E) + P^{*}(B' \cap E) \\ &= \{P^{*}(A \cap B \cap E) + P^{*}(A' \cap B \cap E)\} + \{P^{*}(A \cap B' \cap E) + P^{*}(A' \cap B' \cap E)\} \\ &\ge P^{*}(A \cap B \cap E) + P^{*}(((A' \cap B) \cup (A \cap B') \cup (A' \cap B')) \cap E) . \end{aligned} \]

In the first equality we used property (3) applied to B; in the second, property (3) applied to A twice (with F = B ∩ E and with F = B′ ∩ E). Then we used subadditivity of P∗. Now,

((A′ ∩B) ∪ (A ∩B′) ∪ (A′ ∩B′)) = (A ∩B)′ ,

thus

P∗(E) ≥ P∗(A ∩B ∩ E) + P∗((A ∩B)′ ∩ E) .

Hence, A ∩B ∈M.

Lemma 16. If A1, A2, . . . is a finite or infinite sequence of disjoint M-sets, then for each E ⊂ Ω,

\[ P^{*}\left(E \cap \bigcup_{k} A_k\right) = \sum_{k} P^{*}(E \cap A_k) . \tag{5} \]

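Proof. We give the proof for two sets A1, A2 only. If A1 ∪ A2 = Ω, then A2 = A′1 and hence (5) is just (3). Otherwise, set F = E ∩ (A1 ∪A2) and write

\[ F = (A_1 \cap F) \cup (A_1' \cap F) , \qquad A_1 \cap F = E \cap A_1 , \qquad A_1' \cap F = E \cap A_2 , \]

where the last identity uses that A1 and A2 are disjoint. Now use (3) with A = A1 and with E replaced by F .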


Lemma 17. The class M is a σ-field.

Proof. Note that we only need to prove that M is closed under countable unions. I will not prove it; you may consult the textbook.

Combining Lemma 17 with (5) (taking E = Ω) gives us countable additivity on M:

\[ P^{*}\left(\bigcup_{k} A_k\right) = \sum_{k} P^{*}(A_k) , \tag{6} \]

where Ak are disjoint and belong to M.

Lemma 18. If P∗ is defined by (1), then F0 ⊂M.

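Proof. Let A ∈ F0 and E ⊂ Ω. Let ε > 0. Choose sets An ∈ F0 such that E ⊂ ⋃n An and ∑n P(An) ≤ P∗(E) + ε. The sets Bn = A ∩ An and Cn = A′ ∩ An belong to F0, since it is a field. Also, E ∩ A ⊂ ⋃n Bn and E ∩ A′ ⊂ ⋃n Cn. Thus, by monotonicity and countable subadditivity of P∗,

\[ P^{*}(E \cap A) + P^{*}(E \cap A') \le P^{*}\left(\bigcup_{n} B_n\right) + P^{*}\left(\bigcup_{n} C_n\right) \le \sum_{n} P^{*}(B_n) + \sum_{n} P^{*}(C_n) \le \sum_{n} P(B_n) + \sum_{n} P(C_n) . \]

In the last step we used P∗(B) ≤ P(B) for B ∈ F0, since B covers itself. Since Bn and Cn are disjoint and Bn ∪ Cn = An,

\[ P^{*}(E \cap A) + P^{*}(E \cap A') \le \sum_{n} P(A_n) \le P^{*}(E) + \varepsilon . \]

Since ε > 0 is arbitrary, (4) holds whenever A ∈ F0, and thus A ∈ M. Hence, F0 ⊂ M.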

Corollary 19. If P∗ is defined by (1), then F = σ(F0) ⊂M.

Lemma 20. If P∗ is defined by (1) then P∗(A) = P(A) for all A ∈ F0.

Proof. From (1) we obtain that if A ∈ F0, then P∗(A) ≤ P(A) (since A can be

covered by itself). Now, let A ⊂ ∪nAn, where A,An ∈ F0. Then

\[ P(A) = P\left(A \cap \bigcup_{n} A_n\right) \le \sum_{n} P(A \cap A_n) \le \sum_{n} P(A_n) . \]

Note that here we are allowed to write P since all the sets belong to F0. Since this holds for every such cover and the left-hand side does not depend on the cover, taking the infimum over all covers gives

\[ P^{*}(A) \le P(A) \le \inf \sum_{n} P(A_n) = P^{*}(A) , \tag{7} \]

where the last equality is the definition (1). Thus the inequalities in (7) must be equalities.


• By Lemma 20, P∗(Ω) = P(Ω).

• Equation (6) gives us countable additivity of P∗ on M, hence also on

F = σ(F0).

• Thus, P∗ is a probability measure on F .

• Lemma 20 means that P∗ is an extension of P.

2.2 Uniqueness ²

A class P of subsets of Ω is called a π-system if it is closed under finite intersections:

• If A,B ∈ P, then A ∩B ∈ P.

Note that a field is necessarily a π-system, but the converse is not true.

A class L of subsets of Ω is called a λ-system if

• Ω ∈ L;

• A ∈ L implies A′ ∈ L;

• L is closed under countable disjoint unions, that is, if An ∈ L are disjoint, then ⋃n An ∈ L.

Note that a σ-field is also a λ-system, but the converse is not true.

Lemma. A class that is both a π-system and a λ-system is a σ-field.

Theorem. (Dynkin π-λ Theorem). If P is a π-system and L is a λ-system and P ⊆ L, then σ(P) ⊆ L.

I will not prove this theorem (see the textbook), but think about this: If P

is a field and L is a σ-field, and P ⊆ L, then σ(P) ⊆ L. This is obvious by the

definition of σ(P).

Theorem. Suppose that P1 and P2 are probability measures on σ(P), where P is a π-system. Assume that P1 and P2 agree on P. Then they agree on σ(P).

Proof. Let G be the class of sets A such that P1(A) = P2(A).

• Obviously, Ω ∈ G, since P1(Ω) = P2(Ω).

• If A ∈ G, then P1(A′) = 1− P1(A) = 1− P2(A) = P2(A′), hence A′ ∈ G.

• If An ∈ G, n ≥ 1, are disjoint, then the union belongs to G by the countable

additivity of both P1 and P2.

• Thus, G is a λ-system. We do not know if G is a π-system, but we know from the assumption that P ⊆ G.

• Dynkin's theorem tells us that σ(P) ⊆ G.

² This material is optional.


• Note that if we have a stronger assumption that P is a field, then a proof

would be slightly shorter.

How do we apply this result? Take a field F0. Then for A ∈ F0, P∗(A) = P_∗(A). Thus, the inner and outer measures agree on a field, so they must agree on the generated σ-field.

3 Denumerable Probabilities

3.1 General Formulas

Let (Ω,F ,P) be a probability space. Let An, n ≥ 1 be a sequence of events,

An ∈ F . Define

\[ \limsup_{n} A_n = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_k , \qquad \liminf_{n} A_n = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_k . \]

The sets above are called the limit superior and the limit inferior of the sequence, respectively. Since F is a σ-field, both lim supn An and lim infn An belong to F . We have ω ∈ lim supn An if and only if for each n there is some k ≥ n for which ω ∈ Ak. In other words, ω lies in lim supn An if and only if it lies in infinitely many of the An’s. Likewise, ω ∈ lim infn An if and only if there is n such that for all k ≥ n, ω ∈ Ak. In other words, ω lies in all but finitely many of the An’s.

• lim supn An = {An i.o.} = {An infinitely often};

• ⋂_{k≥n} Ak ↑ lim infn An as n → ∞;

• ⋃_{k≥n} Ak ↓ lim supn An as n → ∞;

• lim infn An ⊆ lim supn An (to prove it, use the description in words given above, instead of set-theoretic arguments). To see that the inclusion may be strict, consider Ω = {ω1, ω2}. Let An = {ω1} for n even and An = {ω2} for n odd. Then lim infn An = ∅ and lim supn An = Ω.

• Check https://en.wikipedia.org/wiki/Set-theoretic_limit for "Examples".

• From De Morgan’s laws we can also conclude that lim infn An = (lim supn An^c)^c.

• If lim supn An = lim infn An, then we write

\[ A = \lim_{n} A_n = \limsup_{n} A_n = \liminf_{n} A_n . \]

In this case A ∈ F .
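The definitions of lim sup and lim inf can be tried out on the alternating example above. The sketch below replaces the infinite unions and intersections by windows of finite length, which is an assumption of the illustration: it gives the exact answer for periodic or eventually constant sequences once the horizon exceeds the period, but not for arbitrary sequences. The function name `limsup_liminf` is hypothetical.

```python
def limsup_liminf(a, horizon):
    """Finite-horizon stand-ins for lim sup / lim inf of the sets a(1), a(2), ...

    The tail union and tail intersection starting at n are taken over
    k = n, ..., n + horizon - 1, for n = 1, ..., horizon.
    """
    tail_unions = []
    tail_intersections = []
    for n in range(1, horizon + 1):
        window = [a(k) for k in range(n, n + horizon)]
        tail_unions.append(frozenset().union(*window))
        tail_intersections.append(frozenset.intersection(*window))
    limsup = frozenset.intersection(*tail_unions)      # intersection of tail unions
    liminf = frozenset().union(*tail_intersections)    # union of tail intersections
    return limsup, liminf


def alternating(n):
    """A_n = {w1} for n even, {w2} for n odd."""
    return frozenset({"w1"}) if n % 2 == 0 else frozenset({"w2"})


def eventually_constant(n):
    """A_n = {0, ..., min(n, 3)}, constant from n = 3 on."""
    return frozenset(range(min(n, 3) + 1))


sup_alt, inf_alt = limsup_liminf(alternating, horizon=4)
sup_const, inf_const = limsup_liminf(eventually_constant, horizon=5)
print(sorted(sup_alt), sorted(inf_alt), sorted(sup_const), sorted(inf_const))
```

For the alternating sequence the two limits differ, so the sequence has no limit; for the eventually constant sequence they coincide.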


Theorem 21. We have

\[ P(\liminf_{n} A_n) \le \liminf_{n} P(A_n) \le \limsup_{n} P(A_n) \le P(\limsup_{n} A_n) . \]

Proof. Set Bn = ⋂_{k≥n} Ak and Cn = ⋃_{k≥n} Ak. Then Bn ⊆ An ⊆ Cn, Bn ↑ lim infn An and Cn ↓ lim supn An. Continuity from below and from above gives

\[ \limsup_{n} P(A_n) \le \lim_{n} P(C_n) = P(\limsup_{n} A_n) \]

and

\[ \liminf_{n} P(A_n) \ge \lim_{n} P(B_n) = P(\liminf_{n} A_n) . \]

Corollary 22. If An → A, then limn→∞ P(An) = P(A).

Note that this result extends the previously proven continuity from below

and above.

Definition 23. A finite collection A1, . . . , An of events is independent if

P(Ai1 ∩ · · · ∩Aik) = P(Ai1)× · · · × P(Aik) (8)

for all collections of indices 1 ≤ i1 < · · · < ik ≤ n.

The requirement (8) can be stated in a different way:

P(B1 ∩ · · · ∩Bn) = P(B1)× · · · × P(Bn) , (9)

where Bi is either Ai or Ω.

In particular, a collection of 3 events A,B,C is independent if

P(A ∩B ∩ C) = P(A)P(B)P(C) , (10)

P(A ∩B) = P(A)P(B) , P(A ∩ C) = P(A)P(C) , P(B ∩ C) = P(B)P(C) . (11)

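Example 24. Let Ω = {1, 2, 3, 4, 5, 6} with the uniform probability. Take A1 = {1, 2, 3, 4}, A2 = A3 = {4, 5, 6}. Then (10) holds, since P(A1 ∩ A2 ∩ A3) = P({4}) = 1/6 = (2/3)(1/2)(1/2), but none of the equalities in (11) is satisfied.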
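Example 24 can be verified with exact arithmetic. The sketch assumes the uniform probability on Ω (the natural reading of the example); the helper `product_rule_holds` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations

OMEGA = frozenset(range(1, 7))


def prob(event):
    """Uniform probability on OMEGA (an assumption of this sketch)."""
    return Fraction(len(event), len(OMEGA))


def product_rule_holds(events):
    """Check P(intersection of events) == product of P(event)."""
    inter = OMEGA
    prod = Fraction(1)
    for e in events:
        inter = inter & e
        prod *= prob(e)
    return prob(inter) == prod


A1 = frozenset({1, 2, 3, 4})
A2 = frozenset({4, 5, 6})
A3 = frozenset({4, 5, 6})

triple_ok = product_rule_holds([A1, A2, A3])                                      # condition (10)
pairs_ok = [product_rule_holds(pair) for pair in combinations([A1, A2, A3], 2)]   # conditions (11)
print(triple_ok, pairs_ok)
```

The triple product rule holds while every pairwise rule fails, so (10) alone does not give independence.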

Definition 25. We say that classes A1 and A2 are independent if for each

choice of A1 ∈ A1 and A2 ∈ A2, the events A1 and A2 are independent.

Theorem 26. Assume that fields A1 and A2 are independent. Then the σ-fields σ(A1) and σ(A2) are also independent.³

Proof. By the assumption (9) holds for all sets B1, B2 from A1 and A2, respectively. We want (9) to hold for all sets B1, B2 from σ(A1) and σ(A2). Fix B2 ∈ A2. Let L be the class of sets B1 for which (9) holds. Our goal is to show that L contains σ(A1).

³ In the textbook, you can see a slightly different formulation, namely it is required that A1 and A2 are π-systems. Here, in our theorem, we require more.


• Ω ∈ L: Indeed, P(Ω ∩B2) = P(Ω)P(B2) trivially holds.

• L is closed under complement: Indeed, if P(B1 ∩B2) = P(B1)P(B2), then

P(B′1 ∩B2) = P(Ω ∩B2)− P(B1 ∩B2)

= P(Ω ∩B2)− P(B1)P(B2) = P(B′1)P(B2) .

• L is closed under finite disjoint unions: Indeed, if A,B ∈ L, A ∩ B = ∅

and both P(A∩B2) = P(A)P(B2) and P(B ∩B2) = P(B)P(B2) hold then

P((A ∪B) ∩B2) = P((A ∩B2) ∪ (B ∩B2)) = P(A ∩B2) + P(B ∩B2)

= P(A)P(B2) + P(B)P(B2) = P(A ∪B)P(B2) .

• The same argument allows us to conclude that L is closed under count-

able disjoint unions. Instead of finite additivity of P use the countable

additivity.

• We cannot conclude that L is a σ-field, since we only know that L is closed under countable disjoint unions. However, the properties above show that L is a λ-system.

• The field A1 is a π-system (closed under finite intersections) and A1 ⊂ L by assumption. Hence, by the Dynkin π-λ theorem, σ(A1) ⊂ L.

• Thus (9) holds for all B1 ∈ σ(A1) and the fixed B2 ∈ A2.

• Now, in principle, you should fix B1 ∈ σ(A1) and redo the proof with the roles of B1 and B2 exchanged. But the steps are exactly the same.

3.2 The Borel-Cantelli Lemmas

Note first that if limn→∞ P(An) = 0, then P(lim infn An) = 0 (by Theorem 21). The next result has a stronger assumption but also a stronger conclusion.

Theorem 27 (First Borel-Cantelli Lemma). If ∑n P(An) < ∞, then P(lim supn An) = 0.

Proof. Fix m ∈ N. We have

\[ P\left(\limsup_{n} A_n\right) \le P\left(\bigcup_{k=m}^{\infty} A_k\right) \le \sum_{k=m}^{\infty} P(A_k) . \]

Since the series converges, the latter expression converges to zero as m → ∞.

Theorem 28 (Second Borel-Cantelli Lemma). If ∑n P(An) = ∞ and the events An are independent, then P(lim supn An) = 1.


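Proof. Recall that lim supn An = ⋂_{n≥1} ⋃_{k≥n} Ak, so that the complement is ⋃_{n≥1} ⋂_{k≥n} Ak^c. We will show that the complement has probability zero. For this it suffices to prove that P(⋂_{k≥n} Ak^c) = 0 for each n. Fix n and j. Then, using independence and 1 − x ≤ exp(−x), x ∈ [0, 1],

\[ P\left(\bigcap_{k=n}^{n+j} A_k^c\right) = \prod_{k=n}^{n+j} P(A_k^c) = \prod_{k=n}^{n+j} (1 - P(A_k)) \le \exp\left(-\sum_{k=n}^{n+j} P(A_k)\right) . \]

If we let j → ∞, then the last expression goes to zero, since the series diverges. Since ⋂_{k≥n} Ak^c is contained in the intersection on the left for every j, its probability is zero.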
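The key estimate in the proof can be checked numerically in a concrete case. The sketch below assumes independent events with P(A_k) = 1/k for k ≥ 2 (a choice made only for this illustration), for which the product telescopes to (n − 1)/(n + j).

```python
from fractions import Fraction
import math


def prob_none_occur(n, j):
    """P(no A_k occurs, k = n..n+j) for independent events with P(A_k) = 1/k."""
    out = Fraction(1)
    for k in range(n, n + j + 1):
        out *= 1 - Fraction(1, k)
    return out


def exp_bound(n, j):
    """The upper bound exp(-sum of P(A_k), k = n..n+j) from the proof."""
    return math.exp(-sum(1.0 / k for k in range(n, n + j + 1)))


for j in (10, 100, 1000):
    print(j, float(prob_none_occur(2, j)), exp_bound(2, j))
```

Both columns decrease to zero as j grows, in line with the divergence of the harmonic series.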

Example 29. The assumption ∑n P(An) < ∞ is sufficient, but not necessary, for the conclusion of the first Borel-Cantelli Lemma. Consider the probability space ((0, 1],B, λ) and the sets An = (0, 1/n]. Then An ↓ ∅ and so lim supn An = ∅ (so that its measure is 0). On the other hand, ∑n λ(An) = ∑n 1/n = ∞.

Note also that this example shows that the independence assumption cannot be dropped from the second Borel-Cantelli Lemma: the series diverges, but the events An are not independent and P(lim supn An) = 0.

Let {An, n ≥ 1} be a sequence of events in the probability space (Ω,F ,P). Consider the σ-fields σ(An, An+1, . . .) and define the tail σ-field

\[ T = \bigcap_{n=1}^{\infty} \sigma(A_n, A_{n+1}, \ldots) . \]

Any event in T is called a tail event; it is determined by the An’s for arbitrarily large n.

Example 30. lim supnAn and lim infnAn are tail events.

Theorem 31. If the events An are independent, then A ∈ T implies P(A) = 0

or P(A) = 1.

Proof. Since the events are independent, the σ-fields σ(A1), . . . , σ(An−1) and σ(An, An+1, . . .) are independent. If A ∈ T , then A ∈ σ(An, An+1, . . .) and hence the events A,A1, . . . , An−1 are independent. Since independence of an infinite collection of events is defined by independence of each finite subcollection, the events A,A1, A2, . . . are independent. Thus, σ(A) and σ(A1, A2, . . .) are independent. Moreover, A ∈ T ⊂ σ(A1, A2, . . .), hence A is independent of itself: P(A) = P(A ∩ A) = P(A)². This means that its probability is zero or 1.

4 Random variables

4.1 Definition

Definition 32. Let (Ω,F ,P) be a probability space and let (S,S) be a measurable

space. A mapping X : Ω→ S such that

X−1(B) = {ω : X(ω) ∈ B} = {X ∈ B} ∈ F (12)

for any B ∈ S is called a ((S,S)-valued)-random variable. Such a mapping

is also called a measurable mapping.


Example 33. If S = R and S = B, then we simply say that X is a random

variable. Recall that B is the smallest σ-algebra that contains all intervals.

Remark 34. There is no guarantee just by the definition that the class {X−1(B) :

B ∈ S} is a σ-field. It needs a proof.

The σ-algebra S may be huge. Therefore, verifying (12) may be difficult. The following theorem simplifies (12) in a special case.

Theorem 35. Assume that S = σ(A) and X : Ω→ S. Assume that X−1(B) ∈

F for all B ∈ A. Then X is (S,S)-valued random variable.

See Assignment 3.

Example 36. Let S = R and S = B. Then we only need to check that

X−1((−∞, x]) ∈ F for any x ∈ R (in fact, for any x ∈ Q).

4.2 Simple random variables

Let A ∈ F . Define

\[ I_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A , \\ 0 & \text{if } \omega \notin A . \end{cases} \]

Note that for any Borel set B, {ω : IA(ω) ∈ B} is either ∅, A, Ac or Ω (depending on whether 0 ∈ B or not and whether 1 ∈ B or not). Thus, IA is a random variable, called an indicator random variable. Likewise, a finite sum

\[ X(\omega) = \sum_{i=1}^{d} x_i I_{A_i}(\omega) \]

is a simple random variable if the sets Ai form a finite partition of Ω into F-sets.

4.3 σ-fields generated by random variables

If G ⊂ F is a σ-field, then a random variable X is G-measurable or measurable

with respect to G if {X ∈ B} = {ω : X(ω) ∈ B} ∈ G for any Borel set B.

In the case of simple random variables, this reduces to {X = x} ∈ G for every x. Indeed, {X ∈ B} = ⋃x {X = x}, where the union extends over the finitely many values x lying both in B and in the range of X.

Example 37. Let (Ω,F ,P) = (R,B,P) and consider X(ω) = ω. Then X is F-measurable (and hence a random variable). Take now G = σ([a, b] : a, b ∈ Z). Then X is not G-measurable. Indeed, the set X−1((−∞, x]) = {ω : ω ≤ x}, x ∉ Z, is not in G.

Definition 38. The σ-field generated by (S,S)-valued random variable

X denoted by σ(X), is the smallest σ-field with respect to which X is measurable.


Example 39. Assume that S = σ(A) and X : Ω→ S. Then σ(X) is generated

by the collection of sets X−1(A) = {X−1(B) : B ∈ A} ⊂ F . That is, σ(X) =

σ(G), where G is a collection of sets A such that A = X−1(B) for some B ∈ A.

Example 40. For a random variable X we have

σ(X) = σ({ω : X(ω) ≤ q}, q ∈ Q) .

Note that {ω : X(ω) ≤ q} = X−1((−∞, q]).

Theorem 41. Assume that X is a simple random variable. Then σ(X) consists

of the sets {ω : X(ω) ∈ B}, where B ⊂ R.

Proof. Let M be the class of the subsets of Ω of the form {ω : X(ω) ∈ B}. Then M ⊂ σ(X). At the same time M is a σ-field with respect to which X is measurable. This finishes the proof. See the textbook for details.

Theorem 42. A simple random variable Y is σ(X)-measurable if and only if

Y = f(X) for a function f : R→ R.

Example 43. For example, Y = X² is σ(X)-measurable, but Y = X² + Z, where Z is another random variable, is in general not σ(X)-measurable. However, Y is σ(X,Z)-measurable.

4.4 Independence

Definition 44. A sequence X1, X2, . . . (finite or infinite) of (S,S)-valued random variables is independent if the classes σ(X1), σ(X2), . . . are independent. That is, for each finite k,

P(X1 ∈ B1, . . . , Xk ∈ Bk) = P(X1 ∈ B1) · · ·P(Xk ∈ Bk) (13)

for all B1, . . . , Bk ∈ S.

If S = σ(A), it suffices to verify (13) for sets B1, . . . , Bk ∈ A only (see the previous lectures). In the case of real-valued random variables it reduces to

P(X1 ≤ x1, . . . , Xk ≤ xk) = P(X1 ≤ x1) · · ·P(Xk ≤ xk) , (14)

while in case of simple random variables it further reduces to

P(X1 = y1, . . . , Xk = yk) = P(X1 = y1) · · ·P(Xk = yk) , (15)

Indeed, {X1 ≤ x1} = ⋃_{y ≤ x1} {X1 = y} and the latter is a finite disjoint union. Thus, (15) implies (14).


4.5 Closure properties

Recall that f is a measurable function from (S,S) to (T, T ) if

f−1(B) = {x : f(x) ∈ B} ∈ S

for any B ∈ T .

Theorem 45. Let X : Ω→ S be a (S,S)-valued random variable and let f be a

measurable function from (S,S) to (T, T ). Then the composition f ◦X : Ω→ T

is (T, T )-valued random variable.

Proof. Let B ∈ T . We know that f−1(B) ∈ S. Now

(f ◦X)−1(B) = X−1(f−1(B)) ∈ F .

Remark 46. Note that Theorem 45 extends Theorem 42 from simple to arbi-

trary random variables. In Theorem 45 we need however f to be measurable

function.

Example 47. Let X : Ω → R^d be measurable so that (X1(ω), . . . , Xd(ω)) is a random vector. Then ∑_{i=1}^{d} Xi, ∏_{i=1}^{d} Xi, min_{i=1,...,d} Xi and max_{i=1,...,d} Xi are random variables.

4.6 Convergence of random variables

Refresh your memory about pointwise and uniform convergence of functions:

https://en.wikipedia.org/wiki/Uniform_convergence

Let (Ω,F ,P) be a probability space and let X,Xn, n ≥ 1 be random variables defined on that probability space. That is X : Ω → R and Xn : Ω → R. Then Xn(ω) converges to X(ω) if for each ε > 0 there exists n0 such that for all n ≥ n0 we have |Xn(ω) − X(ω)| < ε. Thus, the complementary event is: |Xn(ω)−X(ω)| ≥ ε for some ε > 0 and infinitely many values of n:

\[ \{\lim_{n} X_n = X\}^c = \bigcup_{\varepsilon > 0} \{|X_n - X| \ge \varepsilon \ \text{i.o.}\} . \]

Note that the union can be taken over rational values of ε.

Definition 48. We say that Xn converges with probability 1 to X (converges almost surely) if

P(|Xn −X| ≥ ε i.o.) = 0 (16)

for all ε > 0. We will write Xn a.s.−→ X.

Fix ε > 0 and let An = {|Xn − X| ≥ ε}. Then {|Xn − X| ≥ ε i.o.} = lim supn An. Thus Xn a.s.−→ X if and only if P(lim supn An) = 0 for every ε > 0. Since lim supn P(An) ≤ P(lim supn An), we conclude that (16) implies

\[ \lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0 . \tag{17} \]


Definition 49. We say that Xn converges in probability to X if

\[ \lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0 \tag{18} \]

for all ε > 0. We will write Xn p−→ X.

Remark 50. We get immediately

Xn a.s.−→ X ⇒ Xn p−→ X .

The converse is not true.

Example 51. Take X ≡ 0 and Xn = IAn for events An. Note that Xn p−→ X is equivalent to P(An) → 0, while Xn a.s.−→ X is equivalent to P(An i.o.) = 0. So take any sets An such that P(An) → 0 but {An i.o.} has positive probability. For example, on ((0, 1],B, λ) take An = (tn, tn + sn] with sn ↓ 0 and such that every ω is covered by infinitely many intervals An: tn = (i−1)/k, sn = 1/k when n = k(k − 1)/2 + i, i = 1, . . . , k, k ≥ 1.
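The sliding intervals of Example 51 are easy to generate explicitly. The sketch below uses half-open intervals (t_n, t_n + s_n] on (0, 1], which is an assumption of this illustration, so that every ω is hit exactly once in each block k; the function names are hypothetical.

```python
from fractions import Fraction


def interval(n):
    """Return (t_n, s_n) for n = k(k-1)/2 + i with 1 <= i <= k."""
    k = 1
    while k * (k + 1) // 2 < n:  # find the block k that contains index n
        k += 1
    i = n - k * (k - 1) // 2
    return Fraction(i - 1, k), Fraction(1, k)


def x(n, omega):
    """X_n(omega) = indicator of A_n = (t_n, t_n + s_n]."""
    t, s = interval(n)
    return 1 if t < omega <= t + s else 0


omega = Fraction(1, 3)
hits = [n for n in range(1, 56) if x(n, omega) == 1]  # indices 1..55 cover blocks k = 1..10
print(hits)
print([interval(n)[1] for n in (1, 3, 6, 10, 55)])    # lengths s_n shrink to 0
```

The lengths s_n tend to zero, which gives convergence in probability, while any fixed ω keeps being hit once per block, so X_n(ω) does not converge to 0.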

Example. This example was suggested by Joe. Consider a sequence of independent random variables Xn, n ≥ 1, taking value 1 with probability 1/n and 0 with probability 1 − 1/n. Write Xn = 1An, where An = {Xn = 1}; the events An are independent. (Note that the nested sets (0, 1/n] in ((0, 1],B, λ) have the same probabilities but are not independent; see Example 29.) Then P(An) = 1/n → 0 and hence we have convergence in probability: Xn p−→ X, where X ≡ 0. You can also see this in the following way: for 0 < ε ≤ 1, P(|Xn| ≥ ε) = P(Xn = 1) = 1/n → 0. But we do not have almost sure convergence. Indeed, since ∑n P(An) = +∞, using the second Borel-Cantelli lemma, P(lim supn An) = 1. Thus P({ω : Xn(ω) = 1 for infinitely many n}) = 1, hence Xn does not converge to zero almost surely.

4.7 Approximation by simple random variables

Theorem 52. For any random variable X there exists a sequence Xn of simple

random variables such that limnXn(ω) = X(ω) for each fixed ω ∈ Ω.

Proof. Let

\[ f_n(x) = n I_{(n,\infty)}(x) + 2^{-n} \sum_{k=0}^{n2^n - 1} k\, I_{(k2^{-n},\,(k+1)2^{-n}]}(x) . \]

If X ≥ 0 then Xn = fn(X) is a simple random variable. Note that |X(ω) − Xn(ω)| ≤ 2^{−n} whenever X(ω) ≤ n. Thus the statement holds for X ≥ 0. If X is arbitrary, then we can write X(ω) = X+(ω) − X−(ω), where X+(ω) = max{X(ω), 0} and X−(ω) = −min{X(ω), 0}. Both are nonnegative random variables and we take Xn = fn(X+) − fn(X−).
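The dyadic approximation used in the proof can be sketched in a few lines. The function below evaluates f_n at a single nonnegative point, using the fact that x lies in (k 2^{-n}, (k+1) 2^{-n}] exactly when k = ⌈x 2^n⌉ − 1; the name `f` and the test grid are choices made for this illustration.

```python
import math


def f(n, x):
    """Dyadic approximation f_n(x) from the proof, for x >= 0."""
    if x > n:
        return float(n)           # the term n * I_(n, infinity)(x)
    if x <= 0:
        return 0.0                # no dyadic interval contains x
    k = math.ceil(x * 2**n) - 1   # x in (k 2^-n, (k+1) 2^-n]
    return k / 2**n


x0 = math.pi
approximations = [f(n, x0) for n in range(1, 11)]
print(approximations)
```

Once n exceeds x0, the error is at most 2^{-n}, so the printed values approach π from below.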

4.8 Expected Value

Let X be a simple random variable

\[ X = \sum_{i=1}^{d} x_i 1_{A_i} . \]


Then the expected value is defined as

\[ E[X] = \sum_{i=1}^{d} x_i P(A_i) . \tag{19} \]

On the other hand, we have

\[ E[X] = \sum_{x} x P(X = x) . \tag{20} \]

Indeed, the right-hand sides of (19) and (20) can both be written as

\[ \sum_{x} \sum_{i : x_i = x} x_i P(A_i) . \]

In particular, E[IA] = P(A).

If Y = f(X) then Y is a simple random variable of the form

\[ Y = \sum_{i=1}^{d} f(x_i) 1_{A_i} \]

and hence

\[ E[Y] = E[f(X)] = \sum_{i=1}^{d} f(x_i) P(A_i) = \sum_{x} f(x) P(X = x) . \]

The k-th moment of X is given by

\[ E[X^k] = \sum_{x} x^k P(X = x) . \]

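The properties of the expected value follow from the corresponding properties of the sum operation:

• Linearity: If X = ∑i xi 1Ai and Y = ∑j yj 1Bj , where {Ai} and {Bj} are finite partitions of Ω, and α, β ∈ R, then

\[ x_i 1_{A_i} = x_i \sum_{j} 1_{A_i \cap B_j} \]

and hence

\[ \alpha X + \beta Y = \sum_{i,j} (\alpha x_i + \beta y_j) 1_{A_i \cap B_j} . \]

Then

\[ E[\alpha X + \beta Y] = \sum_{i,j} (\alpha x_i + \beta y_j) P(A_i \cap B_j) = \alpha \sum_{i} x_i P(A_i) + \beta \sum_{j} y_j P(B_j) = \alpha E[X] + \beta E[Y] . \]

• It extends to a finite number of simple random variables:

\[ E\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i] . \tag{21} \]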


• If X(ω) ≤ Y (ω) for all ω, then E[X] ≤ E[Y ];

• |E[X]| ≤ E[|X|];

• |E[X − Y ]| ≤ E[|X − Y |] . (22)

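• If X and Y are independent, then

\[ E[XY] = E\left[\sum_{i,j} x_i y_j 1_{A_i} 1_{B_j}\right] = E\left[\sum_{i,j} x_i y_j 1_{A_i \cap B_j}\right] = \sum_{i,j} x_i y_j P(A_i \cap B_j) = \sum_{i,j} x_i y_j P(A_i) P(B_j) = \left(\sum_{i} x_i P(A_i)\right)\left(\sum_{j} y_j P(B_j)\right) = E[X]E[Y] . \]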

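Assume that X is nonnegative. Order the range as 0 ≤ x1 < x2 < · · · < xd. Since P(X = xi) = P(X ≥ xi) − P(X ≥ xi+1) for i < d and P(X = xd) = P(X ≥ xd),

\[ E[X] = \sum_{i=1}^{d} x_i P(X = x_i) = \sum_{i=1}^{d-1} x_i \left(P(X \ge x_i) - P(X \ge x_{i+1})\right) + x_d P(X \ge x_d) = x_1 P(X \ge x_1) + \sum_{i=2}^{d} (x_i - x_{i-1}) P(X \ge x_i) . \]

This can be written as

\[ \int_{0}^{\infty} P(X \ge x)\, dx . \]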
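The tail-sum identity for a nonnegative simple random variable can be confirmed with exact arithmetic. The sketch below represents the distribution as a dictionary from values to probabilities; this representation and the sample distribution (the loaded die) are choices made only for the illustration.

```python
from fractions import Fraction


def expectation(pmf):
    """E[X] = sum of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())


def tail_sum(pmf):
    """x_1 P(X >= x_1) + sum over i >= 2 of (x_i - x_{i-1}) P(X >= x_i)."""
    total = Fraction(0)
    prev = Fraction(0)  # plays the role of x_0 = 0
    for x in sorted(pmf):
        tail = sum(p for y, p in pmf.items() if y >= x)  # P(X >= x)
        total += (x - prev) * tail
        prev = x
    return total


loaded_die = {1: Fraction(1, 6), 2: Fraction(1, 3), 3: Fraction(1, 2)}
print(expectation(loaded_die), tail_sum(loaded_die))
```

The second function is the integral of the step function x ↦ P(X ≥ x) over (0, ∞), computed piece by piece.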

Theorem 53. If the sequence {Xn} is uniformly bounded and X = limn Xn (that is, Xn a.s.−→ X), then limn E[Xn] = E[X].

Proof. There exists K0 such that supn supω |Xn(ω)| < K0. Let K1 = supω |X(ω)|, K = max{K0,K1}. Then supn supω |X(ω) − Xn(ω)| ≤ 2K. Fix ε > 0 and let An = {ω : |X(ω) − Xn(ω)| > ε}. Then for all ω

|X(ω)−Xn(ω)| = |X −Xn|1An(ω) + |X −Xn|1A′n(ω) ≤ 2K 1An(ω) + ε 1A′n(ω) .

Thus

E [|X −Xn|] ≤ 2K P(An) + ε P(A′n) ≤ 2K P(An) + ε .

Since Xn a.s.−→ X, we also have Xn p−→ X and by the definition limn P(An) = 0. This implies

\[ \limsup_{n} E[|X - X_n|] \le \varepsilon . \]

Since ε is arbitrary, limn E [|X −Xn|] = 0. Apply (22).

Let µ = E[X]. We define the variance

Var(X) = E[(X − µ)²] = E[X²] − µ² .

Justification:

E[(X − µ)²] = E[X² − 2µX + µ²] = E[X²] − E[2µX] + E[µ²]

= E[X²] − 2µE[X] + µ² = E[X²] − 2µ² + µ² = E[X²] − µ² .


• Var(aX + b) = a² Var(X) for a, b ∈ R.

Justification: Let Y = aX + b. Then µY = E[Y ] = aE[X] + b = aµX + b. We have

\[ \begin{aligned} \mathrm{Var}(aX + b) = \mathrm{Var}(Y) &= E[Y^2] - \mu_Y^2 = E[(aX + b)^2] - (a\mu_X + b)^2 \\ &= E[a^2X^2 + 2abX + b^2] - (a^2\mu_X^2 + 2ab\mu_X + b^2) \\ &= a^2E[X^2] + 2abE[X] + b^2 - (a^2\mu_X^2 + 2ab\mu_X + b^2) \\ &= a^2\left\{E[X^2] - \mu_X^2\right\} = a^2 \mathrm{Var}(X) . \end{aligned} \]

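• Let S = ∑_{i=1}^{n} Xi and µi = E[Xi]. Using (21) we get

\[ \mathrm{Var}(S) = E\left[\left(\sum_{i=1}^{n} (X_i - \mu_i)\right)^2\right] = \sum_{i=1}^{n} E[(X_i - \mu_i)^2] + 2 \sum_{1 \le i < j \le n} E[(X_i - \mu_i)(X_j - \mu_j)] . \]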

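• If Xi are independent, then E[(Xi − µi)(Xj − µj)] = E[Xi − µi]E[Xj − µj ] = 0 for i ≠ j, and hence

\[ \mathrm{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) . \]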

4.9 Inequalities

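Lemma 54. Assume that X is nonnegative. Then for any ε > 0,

P(X ≥ ε) ≤ ε^{−1} E[X] .

Proof.

\[ E[X] = \sum_{x} x P(X = x) \ge \sum_{x : x \ge \varepsilon} x P(X = x) \ge \varepsilon \sum_{x : x \ge \varepsilon} P(X = x) = \varepsilon P(X \ge \varepsilon) . \]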

Applying this to |X|^k (with ε^k in place of ε) we obtain the Markov inequality: for any X and k > 0,

P(|X| ≥ ε) ≤ ε^{−k} E[|X|^k] .

Applying this with k = 2 and |X − µ| we obtain the Chebyshev inequality:

P(|X − µ| ≥ ε) ≤ ε^{−2} Var(X) .
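Both inequalities can be checked exactly on a small example. The distribution `PMF` below is made up for this sketch, and the helper names are hypothetical; the computation only uses the sum formula for the expectation of a simple random variable.

```python
from fractions import Fraction

# A made-up simple random variable: value -> probability.
PMF = {-2: Fraction(1, 8), 0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 8)}


def expect(g, pmf=PMF):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


def prob(pred, pmf=PMF):
    """P(pred(X)) = sum of P(X = x) over the values x satisfying pred."""
    return sum((p for x, p in pmf.items() if pred(x)), Fraction(0))


MU = expect(lambda x: x)
VAR = expect(lambda x: (x - MU) ** 2)


def markov_holds(eps, k):
    """P(|X| >= eps) <= E|X|^k / eps^k."""
    return prob(lambda x: abs(x) >= eps) <= expect(lambda x: abs(x) ** k) / eps**k


def chebyshev_holds(eps):
    """P(|X - mu| >= eps) <= Var(X) / eps^2."""
    return prob(lambda x: abs(x - MU) >= eps) <= VAR / eps**2


print(MU, VAR, markov_holds(Fraction(3, 2), 2), chebyshev_holds(Fraction(2)))
```

For small ε the right-hand sides exceed 1 and the bounds carry no information; they become useful only for ε large compared with the moments.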

Other inequalities:

• If ϕ is convex then we obtain the Jensen inequality:

ϕ(E[X]) ≤ E[ϕ(X)] .


• If p, q > 1 and 1/p + 1/q = 1, then we obtain the Hölder inequality:

E[|XY |] ≤ (E[|X|^p])^{1/p} (E[|Y |^q])^{1/q}

(Use ab ≤ a^p/p + b^q/q for a, b > 0).

• If 0 < p ≤ q then we obtain the Lyapunov inequality:

(E[|X|^p])^{1/p} ≤ (E[|X|^q])^{1/q} .

5 Measures in Euclidean Spaces

5.1 Lebesgue Measure

The Lebesgue measure λ was already defined on B, the Borel σ-field on R. It is the only measure that satisfies λ((a, b]) = b − a. The class B^k is the σ-field on R^k generated by the sets of the form

\[ I_k := \prod_{i=1}^{k} (a_i, b_i] = (a_1, b_1] \times \cdots \times (a_k, b_k] . \]

The k-dimensional Lebesgue measure λk is defined as

\[ \lambda_k(I_k) = \prod_{i=1}^{k} (b_i - a_i) . \]

The extension theorem allows us to consider this measure on B^k.

Theorem 55. If A ∈ B^k, then for x ∈ R^k, A + x = {a + x : a ∈ A} ∈ B^k and λ_k(A + x) = λ_k(A).

Proof. Let G = {A ∈ B^k : A + x ∈ B^k for all x ∈ R^k}. Then

• G contains I^k, the class of all rectangles I_k, since a translate of a rectangle is again a rectangle;

• G is a σ-field. For example, if A, B ∈ G, then C = A ∪ B ∈ G. Indeed, C + x = (A ∪ B) + x = (A + x) ∪ (B + x).

• Thus, B^k = σ(I^k) ⊆ G.

For the second statement,

• Note that I^k is a π-system (closed under intersections; it is not a field!);

• Fix x ∈ R^k and define µ(A) = λ_k(A + x).

• Then µ and λ_k agree on the π-system, since for A ∈ I^k, λ_k(A) = λ_k(A + x). Thus, µ and λ_k agree on all Borel sets.

Let T : R^k → R^k be a linear and nonsingular map. For example,

• rotation or reflection (special cases of so-called orthogonal or unitary transformations); det(T) = ±1;

• T(x_1, ..., x_k) = (x_1 + x_2, x_2, ..., x_k), then det(T) = 1;

• T(x_1, ..., x_k) = (a x_1, x_2, ..., x_k) with a ≠ 0, then det(T) = a.

For A ⊆ R^k denote TA = {Ta : a ∈ A}.

Theorem 56. If T is linear and nonsingular, then A ∈ B^k implies TA ∈ B^k and

λ_k(TA) = |det(T)| λ_k(A) .
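Theorem 56 can be sanity-checked in R^2, where the image of the unit square under a linear map is a parallelogram whose area is computable with the shoelace formula. This is a minimal sketch, not part of the notes: the helper names (det2, apply, polygon_area) and the three example matrices are assumptions chosen to mirror the examples of linear maps listed before the theorem.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = T
    return a * d - b * c

def apply(T, v):
    """Matrix-vector product T v in R^2."""
    (a, b), (c, d) = T
    x, y = v
    return (a * x + b * y, c * x + d * y)

def polygon_area(vertices):
    """Shoelace formula; vertices must be listed in order around the polygon."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Corners of the unit square; its boundary has Lebesgue measure zero,
# so the area equals lambda_2((0, 1] x (0, 1]) = 1.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]

maps = {
    "shear": ((1, 1), (0, 1)),       # (x1 + x2, x2), det = 1
    "scale": ((3, 0), (0, 1)),       # (3 x1, x2), det = 3
    "reflection": ((0, 1), (1, 0)),  # swaps the coordinates, det = -1
}
for name, T in maps.items():
    image = [apply(T, v) for v in unit_square]
    print(name, polygon_area(image), abs(det2(T)))
```

The absolute value matters for the reflection: its determinant is −1 while the area of the image is still 1.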


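Singular case. Note that

B = {a} × (a_2, b_2] × ··· × (a_k, b_k]

can be viewed as a subset of R^k and, after dropping the first coordinate, as a rectangle in R^{k−1}. Then

\[
\lambda_k(B) = 0 , \qquad \lambda_{k-1}(B) = \prod_{i=2}^{k} (b_i - a_i) .
\]

Note that B can be viewed as the image of

A = (a_1, b_1] × (a_2, b_2] × ··· × (a_k, b_k]

under the projection: B = TA, where T(x_1, x_2, ..., x_k) = (a, x_2, ..., x_k). (Strictly speaking, T is linear only when a = 0; otherwise it is a linear projection followed by a translation, and by Theorem 55 the translation does not change the measure.) For the projection we have det(T) = 0.

In fact we have a general statement: if A, TA ∈ B^k and T is singular, then λ_k(TA) = 0. Note however that, contrary to the nonsingular case, it is not always true that TA ∈ B^k.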

5.2 Regularity

Open and closed sets. A subset A ⊆ R is open if for every x ∈ A there exists ε > 0 such that (x − ε, x + ε) ⊆ A. For example, (a, b) is open, but (a, b] is not. For x, y ∈ R^k let

\[
d(x,y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}
\]

be the Euclidean distance. Then A ⊆ R^k is open if for any x ∈ A there exists ε > 0 such that d(x, y) < ε implies y ∈ A. Equivalently, A is open if for every x ∈ A there exists ε > 0 such that the ball B(x, ε) = {y ∈ R^k : d(x, y) < ε} ⊆ A. A closed set is the complement of an open set. In R^k, a set is compact if and only if it is bounded and closed.

Theorem 57. For any A ∈ B^k and ε > 0 there exist a closed set C and an open set O such that C ⊂ A ⊂ O and λ_k(O \ C) < ε.

5.3 Specifying Measures on the Line

Measures on R. Let µ be a measure on (R, B). Then

• µ is finite on bounded sets if it assigns finite values to bounded sets A ⊆ R.

• µ is a finite measure if µ(R) < ∞.

• Any probability measure is a finite measure (in fact, probability measures are normalized finite measures).

• The Lebesgue measure is finite on bounded sets, but it is not finite: λ(R) = ∞.

For a measure that is finite on bounded sets, define

\[
F(x) = \begin{cases} \mu((0, x]) , & x \ge 0 , \\ -\mu((x, 0]) , & x < 0 . \end{cases}
\]

• The function F is nondecreasing;

• The function F is continuous from the right: if x_n ↓ x, then F(x_n) → F(x).


Distribution function For every bounded interval (a, b],

µ((a, b]) = F (b)− F (a) . (23)

Note that (23) determines F only up to an additive constant: if we know µ, then F(x) + c satisfies (23) for every constant c. If moreover µ is finite, then we can alternatively define

F(x) = µ((−∞, x]) .

Then:

• lim_{x→−∞} F(x) = 0;

• lim_{x→+∞} F(x) = µ(R);

• If µ(R) = 1, then F is called a (cumulative) distribution function.

• If µ = P is a probability measure, then for some random variable X, µ((−∞, x]) = P(X ≤ x).

Theorem 58. Let F be a nondecreasing, right-continuous function. Then there exists a unique measure µ such that (23) holds for all a < b. In particular, there is a one-to-one correspondence between probability measures and distribution functions.

Proof. We need to show first that µ defined in (23) is a measure. Let A = (a, b] and B = (c, d] be disjoint. We want to show that µ((a, b] ∪ (c, d]) = µ((a, b]) + µ((c, d]). (Full details in the assignment.)

Example 59. Consider the following measure defined on the Borel sets of [0, 1]:

µ = (1/3) δ_{1/2} + (2/3) δ_1 ,

where δ_x is the Dirac measure at x. Then, with F(x) = µ([0, x]),

\[
F(x) = \begin{cases} 0 & \text{if } x < 1/2 , \\ 1/3 & \text{if } 1/2 \le x < 1 , \\ 1 & \text{if } x = 1 . \end{cases}
\]

µ is a probability measure since the total mass is one.
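For a purely atomic measure such as the one in Example 59, F can be evaluated directly from the definition by summing the masses of the atoms located at or below x. The sketch below is an illustration only; the dictionary representation and the function names F and mu_interval are assumptions made for this example.

```python
from fractions import Fraction

# Atoms of mu = (1/3) delta_{1/2} + (2/3) delta_1, stored as {location: mass}.
atoms = {Fraction(1, 2): Fraction(1, 3), Fraction(1): Fraction(2, 3)}

def F(x):
    """F(x) = total mass of the atoms located at or below x."""
    return sum(mass for loc, mass in atoms.items() if loc <= x)

def mu_interval(a, b):
    """mu((a, b]) = F(b) - F(a), as in equation (23)."""
    return F(b) - F(a)

for x in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print(x, F(x))
```

The jump of F at each atom equals the mass of that atom, so F stays at 1/3 on [1/2, 1) and reaches 1 only at x = 1.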

5.4 Specifying Measures on R2

Measures on R^2. Let a = (a_1, a_2), b = (b_1, b_2) and let A = (a, b] = (a_1, b_1] × (a_2, b_2] be a bounded rectangle. For x ∈ R^2 consider

S_x = {y ∈ R^2 : y_1 ≤ x_1, y_2 ≤ x_2} .

The class of sets {S_x, x ∈ R^2} generates B^2. For a function F : R^2 → R define

∆_A F = F(b_1, b_2) − F(b_1, a_2) − F(a_1, b_2) + F(a_1, a_2) .

Let now µ be a finite measure on R^2. Define F by

F(x) = µ(S_x) = µ({y : y_1 ≤ x_1, y_2 ≤ x_2}) . (24)


Theorem 60. Let F be continuous from above and such that ∆_A F ≥ 0 for any bounded rectangle A. Then there exists a unique measure µ such that (24) holds.

If µ = P is a probability measure, then for some random vector (X, Y) with law µ,

\[
\begin{aligned}
\Delta_A F &= F(b_1,b_2) - F(b_1,a_2) - F(a_1,b_2) + F(a_1,a_2) \\
&= \mu((-\infty,b_1]\times(-\infty,b_2]) - \mu((-\infty,b_1]\times(-\infty,a_2]) \\
&\quad - \mu((-\infty,a_1]\times(-\infty,b_2]) + \mu((-\infty,a_1]\times(-\infty,a_2]) \\
&= P(X\le b_1, Y\le b_2) - P(X\le b_1, Y\le a_2) \\
&\quad - P(X\le a_1, Y\le b_2) + P(X\le a_1, Y\le a_2) \\
&= P(a_1 < X \le b_1,\ a_2 < Y \le b_2) .
\end{aligned}
\]

Thus, the requirement ∆_A F ≥ 0 is quite natural.
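The role of the condition ∆_A F ≥ 0 can be seen on two small examples. This is a hedged sketch: F_indep_uniform (the joint cdf of two independent Uniform[0, 1] variables) and F_bad (a function that is nondecreasing in each argument but assigns negative "mass" to a rectangle) are illustrative choices, not taken from the notes.

```python
def clip01(t):
    """Clamp t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)

def F_indep_uniform(x, y):
    """Joint cdf of two independent Uniform[0, 1] random variables."""
    return clip01(x) * clip01(y)

def F_bad(x, y):
    """Nondecreasing in each argument, yet not a joint cdf."""
    return 1.0 if x + y >= 1.0 else 0.0

def delta(F, a, b):
    """Delta_A F for the rectangle A = (a1, b1] x (a2, b2]."""
    (a1, a2), (b1, b2) = a, b
    return F(b1, b2) - F(b1, a2) - F(a1, b2) + F(a1, a2)

# For independent uniforms the increment is the area of the rectangle.
print(delta(F_indep_uniform, (0.25, 0.5), (0.75, 1.0)))
# For F_bad the increment over (0, 1] x (0, 1] is negative.
print(delta(F_bad, (0.0, 0.0), (1.0, 1.0)))
```

Monotonicity in each coordinate separately is therefore not enough in R^2; the rectangle condition is what guarantees nonnegative mass.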

6 Measurable Functions and Mappings

6.1 Measurable Mappings

Let (Ω, F) and (Ω′, F′) be two measurable spaces. A map T : Ω → Ω′ is measurable (sometimes written as F/F′-measurable) if for each A′ ∈ F′ we have T^{-1}A′ ∈ F. Recall that an F/B-measurable map is called a random variable. In this case we will simply write F-measurable.

Example 61. A real function f : Ω → R with a finite range is measurable if f^{-1}({x}) ∈ F for each x in the range of f.

Theorem 62. If T^{-1}A′ ∈ F for each A′ in a class 𝒜′ of subsets of Ω′, and σ(𝒜′) = F′, then T is F/F′-measurable. (cf. Theorem 4, Slides Set E).

Proof. Let G = {A′ ⊆ Ω′ : T^{-1}A′ ∈ F}. Then G is a σ-field and contains 𝒜′, so F′ = σ(𝒜′) ⊆ G.

Let T : Ω → Ω′ and T′ : Ω′ → Ω′′. The space Ω′′ is equipped with a σ-field F′′. The composition T′ ◦ T : Ω → Ω′′ is defined as ω ↦ T′(T(ω)).

Theorem 63. If T is F/F′-measurable and T′ is F′/F′′-measurable then T′ ◦ T is F/F′′-measurable (a composition of measurable maps is measurable).

6.2 Mappings into Rk

A map f : Ω → R^k has the form

f(ω) = (f_1(ω), ..., f_k(ω)) .

Since the sets

S_x = {y ∈ R^k : y_i ≤ x_i, i = 1, ..., k}

generate B^k, a function f is F-measurable if and only if the set

{ω : f_i(ω) ≤ x_i, i = 1, ..., k}

belongs to F for each (x_1, ..., x_k).

Theorem 64. If f : R^i → R^k is continuous, then it is measurable.

Proof. Take for simplicity i = 1, k = 1. We need to show that S_x = {ω : f(ω) ≤ x} ∈ B for each x. But continuity implies that S_x is a closed set, hence the complement of an open set. Open sets are Borel sets and B is closed under complements, so S_x ∈ B.


6.3 Limits and Measurability

Theorem 65. Assume that f_n, n ≥ 1, are real-valued F-measurable functions. Then

(a) The functions sup_n f_n, inf_n f_n, lim sup_n f_n, lim inf_n f_n are measurable.

(b) If lim_n f_n exists everywhere then it is measurable.

(c) The set {ω : lim_n f_n(ω) exists} ∈ F.

Proof. We have

\[
\{\omega : \sup_n f_n(\omega) \le x\} = \bigcap_n \{\omega : f_n(\omega) \le x\} \in \mathcal{F} .
\]

Thus, sup_n f_n is measurable.

Note: Assume that we have a probability space (Ω, F, P) and f_n : Ω → R. Then "lim_n f_n exists almost everywhere" (a weaker requirement than existence everywhere) means that

\[
P(\{\omega : \lim_n f_n(\omega) \text{ does not exist}\}) = 0 .
\]

For example, if P = λ then a function f : (0, 1] → R that has jumps at countably many points is not continuous, but it is measurable.

Example 66. Let f(x) = x, g(x) = 1 − x, x ∈ [0, 1]. Then h = sup{f, g} is the function on [0, 1] given by h(x) = 1 − x for x ∈ [0, 1/2] and h(x) = x for x ∈ (1/2, 1].

6.4 Transformations of Measures

Let (Ω, F, µ) be a measure space (for example, a probability space). Let T : Ω → Ω′ be F/F′-measurable. Define a set function⁴ ν by

ν(A′) = µT^{-1}(A′) = µ(T^{-1}A′) , A′ ∈ F′ .

We note that ν = µT^{-1} is a set function on F′.

It is not immediately obvious that ν is itself a measure. However, assume that A′, B′ ∈ F′ are disjoint. Then the sets T^{-1}(A′) and T^{-1}(B′) are also disjoint and

\[
\begin{aligned}
\nu(A' \cup B') &= \mu(T^{-1}(A' \cup B')) = \mu(T^{-1}(A') \cup T^{-1}(B')) \\
&= \mu(T^{-1}(A')) + \mu(T^{-1}(B')) = \nu(A') + \nu(B') .
\end{aligned}
\]

Countable additivity is justified in the same way, and ν(∅) = µ(T^{-1}∅) = µ(∅) = 0. Hence ν is a measure.

• If µ is finite then µT^{-1} is also finite.

• If µ is a probability measure then µT^{-1} is also a probability measure. Indeed,

ν(Ω′) = µ(T^{-1}Ω′) = µ(Ω) = 1 .
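On a finite space the image measure µT^{-1} can be computed by accumulating the mass of each point ω at its image T(ω). The following is a minimal sketch under stated assumptions: the fair-die measure, the map T(ω) = ω mod 3 and the helper names pushforward and measure are hypothetical choices for illustration.

```python
from fractions import Fraction

def pushforward(mu, T):
    """Return nu = mu T^{-1} as {point: mass} for a finite space.

    mu is a dict {omega: mass}; T is a function on the keys of mu.
    nu({w'}) = mu(T^{-1}{w'}) is the total mass of all omega with T(omega) = w'.
    """
    nu = {}
    for omega, mass in mu.items():
        image = T(omega)
        nu[image] = nu.get(image, 0) + mass
    return nu

def measure(m, event):
    """Measure of a set of points under a point-mass dictionary."""
    return sum(mass for point, mass in m.items() if point in event)

def T(omega):
    """Hypothetical measurable map: remainder modulo 3."""
    return omega % 3

# Hypothetical example: a fair die.
mu = {k: Fraction(1, 6) for k in range(1, 7)}
nu = pushforward(mu, T)

print(nu)
print(measure(nu, {0}), measure(mu, {3, 6}))  # nu({0}) = mu(T^{-1}{0})
```

Since µ is a probability measure here, the total mass of ν is again one, in line with the last bullet above.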

⁴ Note that any measure is a set function, but not vice versa.


Measure preserving transformations

Definition 67. Let T : Ω → Ω be F/F-measurable. If

µ(A) = µ(T^{-1}(A)) , A ∈ F ,

then we say that T preserves the measure µ.

Example 68. If Ω = R^k and T is a linear map with det(T) = 1, then T preserves the Lebesgue measure λ_k (apply Theorem 56 to T^{-1}, for which det(T^{-1}) = 1).

Example 69. The identity map is always measure preserving.

7 Distributions

7.1 Transformation of probability measures. Distributions

Consider a probability space (Ω,F ,P).

Recall that a random variable X is a measurable map X : Ω → R. Define the

measure µ on B by

µ = P X^{-1} .

Thus, for each A ∈ B we have

µ(A) = P(X^{-1}A) = P({ω : X(ω) ∈ A}) = P(X ∈ A) .

We say that µ is the law or the distribution of a random variable X.

The (cumulative) distribution function of X is

F (x) = µ((−∞, x]) = P(X ≤ x) .

Example 70. Consider the probability space ([0, 1], B, λ). Define a map U(ω) = ω. Then the law of U is µ = λ and the cumulative distribution function of U is

\[
F_U(x) = P(U \le x) = \begin{cases} 0 , & x < 0 , \\ x , & x \in [0, 1] , \\ 1 , & x > 1 . \end{cases}
\]

We say that U has a uniform distribution.

Example 71. Consider the probability space (Ω, F, P) = ([0, 1], B, λ). Assume that X is a random variable defined on Ω with the law µ = P X^{-1}. Define Y = X + b, b ∈ R. Note that Y is also defined on Ω. Consider the law ν = P Y^{-1}. Recall that for A ⊆ R the set A − b is defined as A − b = {a − b : a ∈ A}. Then

\[
\begin{aligned}
\nu(A) &= P(Y^{-1}A) = P(\{\omega : Y(\omega) \in A\}) = P(\{\omega : X(\omega) + b \in A\}) \\
&= P(\{\omega : X(\omega) \in A - b\}) = P(X \in A - b) = \mu(A - b) .
\end{aligned}
\]

If F_X and F_Y are the cumulative distribution functions of X and Y, respectively, then taking A = (−∞, x] gives

F_Y(x) = F_X(x − b) .


Properties of cdf

• lim_{x→−∞} F(x) = 0;

• lim_{x→+∞} F(x) = 1;

• If F is a distribution function then

F̄(x) = 1 − F(x) = 1 − P(X ≤ x) = P(X > x)

is called the tail distribution function.

• F(x−) = lim_{y↑x} F(y) = P(X < x) = µ((−∞, x));

• P(X = x) = F(x) − F(x−). If P(X = x) ≠ 0 then we say that F has a jump at x.

Example 72. Assume that X is a nonnegative random variable with the cumulative distribution function F_X (for example, one without any jumps). For b > 0 define

\[
Y = (X - b)^+ = \begin{cases} X - b & \text{if } X > b , \\ 0 & \text{if } X \le b . \end{cases}
\]

By definition Y ≥ 0. Therefore, P(Y ≤ y) = 0 for all y < 0.

For y = 0 we have

P(Y ≤ 0) = P(Y = 0) = P(X ≤ b) = F_X(b) .

Moreover, for y > 0,

\[
\begin{aligned}
P(Y \le y) &= P(Y \le y, X \le b) + P(Y \le y, X > b) \\
&= P(0 \le y, X \le b) + P(X - b \le y, X > b) \\
&= P(X \le b) + P(b < X \le y + b) \\
&= P(X \le y + b) = F_X(y + b) .
\end{aligned}
\]

In summary

\[
F_Y(y) = \begin{cases} 0 , & y < 0 , \\ F_X(b) , & y = 0 , \\ F_X(y + b) , & y > 0 . \end{cases}
\]

Note that the cumulative distribution function F_Y has a jump of size P(X ≤ b) at the point y = 0.
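The formula for F_Y in Example 72 can be compared with a simulation. The sketch assumes X ~ Exp(1), which is one possible continuous nonnegative choice and is not specified in the notes; the value b = 1, the sample size and the function names are likewise assumptions for illustration.

```python
import math
import random

def F_X(x):
    """cdf of Exp(1), an assumed example of a continuous nonnegative X."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def F_Y(y, b):
    """cdf of Y = (X - b)^+ as derived in Example 72."""
    return F_X(y + b) if y >= 0 else 0.0

random.seed(2)
b, n = 1.0, 100_000
ys = [max(random.expovariate(1.0) - b, 0.0) for _ in range(n)]

def empirical_cdf(y):
    """Fraction of simulated values of Y that are <= y."""
    return sum(v <= y for v in ys) / n

for y in (0.0, 0.5, 2.0):
    print(y, empirical_cdf(y), F_Y(y, b))
```

The value at y = 0 is the size of the jump, P(X ≤ b) = F_X(b); the simulated fraction of exact zeros should be close to it.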

Quantile function. Let F be a cumulative distribution function. Recall that F : R → [0, 1]. Define a function Q on [0, 1] by

Q(y) = inf{x : F(x) ≥ y} .

(With this definition Q(0) = −∞, and Q(1) = +∞ when F(x) < 1 for all x, so Q is real-valued on (0, 1).)

• If F is strictly increasing and continuous (no jumps), then Q is nothing else but the inverse function of F (and is itself strictly increasing and continuous). Then Q(F(x)) = x.

• If F is strictly increasing but has some jumps, then Q is continuous but not strictly increasing.

• If F is continuous but not strictly increasing, then Q has jumps but is strictly increasing.
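For a purely atomic distribution the infimum in the definition of Q is attained at an atom, so Q can be computed by scanning the atoms in increasing order. The sketch below is an illustration under stated assumptions: the three-atom distribution and the function names are hypothetical, and only arguments 0 < y ≤ 1 are handled.

```python
from fractions import Fraction

# Hypothetical discrete distribution, stored as {location: mass}.
atoms = {0: Fraction(1, 4), 1: Fraction(1, 2), 3: Fraction(1, 4)}

def F(x):
    """cdf: total mass of the atoms located at or below x."""
    return sum(m for loc, m in atoms.items() if loc <= x)

def Q(y):
    """Q(y) = inf{x : F(x) >= y} for 0 < y <= 1.

    F is a right-continuous step function, so {x : F(x) >= y} is a
    half-line starting at the smallest atom whose cdf value reaches y.
    """
    if not 0 < y <= 1:
        raise ValueError("this sketch only handles 0 < y <= 1")
    for loc in sorted(atoms):
        if F(loc) >= y:
            return loc

for y in (Fraction(1, 4), Fraction(3, 10), Fraction(3, 4), Fraction(4, 5)):
    print(y, Q(y))
```

Here F has jumps, so Q is constant on intervals such as (1/4, 3/4]; and F is flat between atoms, so Q jumps. In particular Q(F(2)) = 1 rather than 2, which shows that Q(F(x)) = x can fail when F is not strictly increasing.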


Lemma 73. Assume that U is uniform on [0, 1], that is, its distribution function is given by F_U(x) = 0 if x < 0, F_U(x) = x for x ∈ [0, 1] and F_U(x) = 1 for x > 1. Let F be a strictly increasing and continuous distribution function and let Q be its quantile function. Define X = Q(U). Then X has distribution F.

Proof. Since F is strictly increasing and F(Q(u)) = u for u ∈ (0, 1), we have

\[
P(X \le x) = P(Q(U) \le x) = P(F(Q(U)) \le F(x)) = P(U \le F(x)) = F_U(F(x)) = F(x) ,
\]

where the last step uses F_U(y) = y with y = F(x) ∈ [0, 1].

Remark 74. This result is very important for simulations.
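A standard way Lemma 73 is used in simulation is inverse transform sampling: draw U uniformly and return Q(U). The sketch below applies this to the exponential distribution, an assumed example not taken from the notes. Its cdf is strictly increasing only on [0, ∞), which is enough for the argument because U falls in (0, 1) with probability one; the names Q_exp, F_exp and empirical_cdf are hypothetical.

```python
import math
import random

def F_exp(x, rate=1.0):
    """cdf of the exponential distribution with the given rate."""
    return 1.0 - math.exp(-rate * x) if x >= 0 else 0.0

def Q_exp(u, rate=1.0):
    """Quantile function of the exponential distribution, for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

random.seed(3)
rate, n = 2.0, 100_000
# random.random() returns values in [0, 1), so 1 - u stays positive.
sample = [Q_exp(random.random(), rate) for _ in range(n)]  # X = Q(U)

def empirical_cdf(x):
    """Fraction of simulated values that are <= x."""
    return sum(s <= x for s in sample) / n

for x in (0.1, 0.5, 1.0):
    print(x, empirical_cdf(x), F_exp(x, rate))
```

The empirical cdf of the simulated values should be close to F at every point, which is what "X has distribution F" predicts.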

Existence theorem Before, we started with a probability measure P and a random

variable X and we defined the cdf. The following result gives a converse statement.

Theorem 75. If F is a nondecreasing, right-continuous function satisfying lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1, then there exists on some probability space (Ω, F, P) a random variable X for which F(x) = P(X ≤ x).

Proof. Assume for a moment that F is strictly increasing and continuous. From Example 70 we already know that there exists a uniform random variable U defined on the probability space ([0, 1], B, λ). By Lemma 73, X = Q(U) is a random variable with the cdf F.

For the general case, please see the textbook.

