ST2133

REVISION

University of London
International Programmes
Dr Larry Gui
ST2133 Topics
This module provides a basis for an advanced course in statistical inference.
Aim: to provide a thorough theoretical grounding in probability distributions. The course teaches fundamental material
required for specialised courses in statistics, actuarial science and econometrics.
2 Probability space
• Probability measure, Methods for counting, Conditional probability and independence
3 Random variables and univariate distributions
• Random Variables, Distribution functions, Expectation, variance and higher moments.
• Generating functions, Functions of random variables
4 Multivariate distributions
• Joint & marginal distributions, Expectation and joint moments, Random vectors and random matrices
• Transformations and Sum of random variables, Multivariate normal distribution
5 Conditional distributions
• Discrete and continuous conditional distributions, Conditional expectation and conditional moments
• Hierarchies and mixtures, Random sums, Conditioning for random vectors
CHAPTER 2 – PROBABILITY SPACE
CHAPTER 2 — Probability space
2.1  Intuitive probability
2.2  Mathematical probability
2.2.1 Measure
2.2.2 Probability measure
2.3  Methods for counting outcomes
2.3.1 Permutations and combinations
2.3.2 Number of combinations and multinomial coefficients
2.4  Conditional probability and independence
2.4.1 Conditional probability
2.4.2 Law of total probability and Bayes’ theorem
2.4.3 Independence
2.2.1 Measure
Definition 2.2.4 (Indicator function)
The indicator function of a set S ⊆ ℝ is

I_S(x) = { 1  if x ∈ S
           0  otherwise.
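As a quick illustration (not from the slides), the indicator function can be coded directly; the choice S = [0, 1] below is arbitrary:

```python
# Indicator function I_S for a set S ⊆ ℝ (Definition 2.2.4).
# A set is represented here by its membership test; S = [0, 1] is illustrative.
def indicator(S, x):
    """Return 1 if x is in S, else 0."""
    return 1 if S(x) else 0

S = lambda x: 0 <= x <= 1  # S = [0, 1]
print(indicator(S, 0.5))   # x ∈ S, so the indicator is 1
print(indicator(S, 2.0))   # x ∉ S, so the indicator is 0
```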
CHAPTER 3 – RANDOM VARIABLES AND UNIVARIATE DISTRIBUTIONS
Chapter 3 – Random variables and univariate distributions
3.1  Random Variables as Functions
3.2  Distribution Functions
3.3  Discrete and continuous random variables
3.4 Expectation, variance and higher moments
3.5 Generating functions
3.6 Functions of random variables
3.7 Sequences of random variables and convergence
3.4.5 Moments
Definition 3.4.18 (Moments)
For a random variable X and positive integer r, the rth moment of X is (whenever it is well defined)

μ′_r = E(X^r).

Definition 3.4.19 (Central moments)
For a random variable X and positive integer r, the rth central moment of X is (whenever it is well defined)

μ_r = E[(X − E(X))^r].
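These definitions can be checked on a small example (added here; the fair die is an arbitrary choice), computing raw and central moments directly from the mass function f_X(x) = 1/6:

```python
# Raw and central moments (Definitions 3.4.18 and 3.4.19) for a fair
# six-sided die, using exact rational arithmetic.
from fractions import Fraction

support = range(1, 7)
f = Fraction(1, 6)  # f_X(x) = 1/6 for x = 1, ..., 6

def raw_moment(r):
    # μ'_r = E(X^r) = Σ_x x^r f_X(x)
    return sum(Fraction(x) ** r * f for x in support)

def central_moment(r):
    # μ_r = E[(X − E(X))^r]
    mean = raw_moment(1)
    return sum((Fraction(x) - mean) ** r * f for x in support)

print(raw_moment(1))      # E(X) = 7/2
print(central_moment(2))  # Var(X) = 35/12
```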
3.5.1 Moment generating functions
Definition 3.5.1 (Moment generating function)
The moment generating function of a random variable X is the function M_X : ℝ → [0, ∞) given by

M_X(t) = E(e^{tX}) = { Σ_x e^{tx} f_X(x)             if X is discrete
                       ∫_{−∞}^{∞} e^{tx} f_X(x) dx   if X is continuous,

where we require that there exists h > 0 such that M_X(t) < ∞ for all t ∈ [−h, h].

Recall the Maclaurin expansion of e^x:

e^x = 1 + x + x²/2! + … + x^r/r! + …
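The definition and the Maclaurin expansion can be checked against each other numerically (an illustration added here, with Bernoulli(p) as an arbitrary choice; E(X^r) = p for all r ≥ 1 in that case):

```python
import math

# MGF of a Bernoulli(p) variable evaluated two ways:
#   directly:    M_X(t) = Σ_x e^{tx} f_X(x) = (1 − p) + p e^t
#   via moments: M_X(t) = Σ_r t^r E(X^r) / r!, with E(X^r) = p for r ≥ 1.
p, t = 0.3, 0.7

direct = (1 - p) * math.exp(t * 0) + p * math.exp(t * 1)

# Truncated moment series; terms beyond r = 29 are negligible here.
series = 1.0 + sum(t**r * p / math.factorial(r) for r in range(1, 30))

print(abs(direct - series) < 1e-12)  # the two evaluations agree
```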
3.5.1 Moment generating functions
Proposition 3.5.2 (Expansion of the moment generating function)
Expanding M_X(t) = E(e^{tX}) term by term gives

M_X(t) = 1 + E(X) t + (1/2!) E(X²) t² + … + (1/r!) E(X^r) t^r + …

That is, the coefficient of t^r in the expansion of the moment generating function is the rth moment divided by r!.
3.5.1 Moment generating functions
Proposition 3.5.3 (Derivatives of a moment generating function at zero)
The rth derivative of the moment generating function evaluated at zero is the rth moment:

M_X^(r)(0) = d^r/dt^r M_X(t) |_{t=0} = E(X^r).

Proposition 3.5.4 (Uniqueness of moment generating functions)
If X and Y are random variables and we can find h > 0 such that M_X(t) = M_Y(t) for all t ∈ [−h, h], then F_X(x) = F_Y(x) for all x ∈ ℝ.
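A worked illustration of Proposition 3.5.3 (added here; the exponential distribution with rate λ is an arbitrary choice, and it also previews question 1(c) of the exam paper below):

```latex
% Moments of X ~ Exponential(lambda) via derivatives of the MGF at zero.
M_X(t) = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx
       = \frac{\lambda}{\lambda - t}, \qquad t < \lambda,
\\[4pt]
M_X'(t) = \frac{\lambda}{(\lambda - t)^2}
\;\Rightarrow\; \mathrm{E}(X) = M_X'(0) = \frac{1}{\lambda},
\\[4pt]
M_X''(t) = \frac{2\lambda}{(\lambda - t)^3}
\;\Rightarrow\; \mathrm{E}(X^2) = M_X''(0) = \frac{2}{\lambda^2}.
```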
CHAPTER 4 – MULTIVARIATE DISTRIBUTIONS
Chapter 4 – Multivariate Distributions
4.1  Joint and marginal distributions
4.2  Joint mass and joint density
4.3  Expectation and joint moments
4.4 Independence for pairs of random variables
4.5 Random vectors and random matrices
4.6 Transformations of continuous random variables
4.7 Sums of random variables
4.8  Multivariate normal distribution
4.2.1 Mass for discrete distributions
Claim 4.2.4 (Marginal mass from joint mass)
For discrete random variables X and Y with joint mass function f_{X,Y}, the marginal mass functions are given by

f_X(x) = Σ_y f_{X,Y}(x, y)    and    f_Y(y) = Σ_x f_{X,Y}(x, y).

Multivariate extension: for discrete random variables X₁, X₂, …, X_n,

f_{X_j}(x_j) = Σ_{x_1} … Σ_{x_{j−1}} Σ_{x_{j+1}} … Σ_{x_n} f_{X_1,…,X_n}(x_1, …, x_n).
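Claim 4.2.4 in action on a toy joint mass function (an illustration added here; the 2×2 table of probabilities is made up for the example):

```python
from fractions import Fraction

# A toy joint mass function f_{X,Y}(x, y) on the support {0,1} × {0,1}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_X(x):
    # f_X(x) = Σ_y f_{X,Y}(x, y)
    return sum(p for (xx, y), p in joint.items() if xx == x)

def marginal_Y(y):
    # f_Y(y) = Σ_x f_{X,Y}(x, y)
    return sum(p for (x, yy), p in joint.items() if yy == y)

print(marginal_X(0))  # 1/8 + 3/8 = 1/2
print(marginal_Y(1))  # 3/8 + 1/4 = 5/8
```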
4.2.2 Density for continuous distributions
Definition 4.2.5 (Joint density function)
For jointly continuous random variables X and Y, with joint distribution function F_{X,Y}, the joint density function f_{X,Y} : ℝ² → [0, ∞) is such that

F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv    for all x, y ∈ ℝ.

Claim 4.2.6 (Joint density from joint distribution)
For jointly continuous random variables X and Y, with joint distribution function F_{X,Y}, the joint density function is given by

f_{X,Y}(x, y) = ∂²/∂u∂v F_{X,Y}(u, v) |_{u=x, v=y}.
4.2.2 Density for continuous distributions
Claim 4.2.9 (Marginal density from joint density)
If X and Y are jointly continuous random variables with joint density function f_{X,Y}(x, y), then

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy    and    f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.
4.3.2 Covariance and correlation
Definition 4.3.5 (Covariance)
For random variables X and Y, the covariance between X and Y is defined as

Cov(X, Y) = E[(X − E(X))(Y − E(Y))].

Alternatively,

Cov(X, Y) = E(XY) − E(X)E(Y).
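The two forms of the covariance can be verified to agree on a small example (added here; the joint probabilities are made up for the illustration):

```python
from fractions import Fraction

# A toy joint pmf on {0,1} × {0,1}; both covariance formulas are evaluated.
joint = {
    (0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
    (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

EX  = sum(Fraction(x) * p for (x, y), p in joint.items())
EY  = sum(Fraction(y) * p for (x, y), p in joint.items())
EXY = sum(Fraction(x * y) * p for (x, y), p in joint.items())

# Definition: Cov(X, Y) = E[(X − E(X))(Y − E(Y))]
cov_def = sum((x - EX) * (y - EY) * p for (x, y), p in joint.items())
# Alternative form: Cov(X, Y) = E(XY) − E(X)E(Y)
cov_alt = EXY - EX * EY

print(cov_def, cov_alt)  # both equal 1/16
```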
4.6.1 Bivariate transformations
Definition 4.6.1 (Change of variables formula)
If (U, V) is a pair of continuous random variables with support D ⊆ ℝ², g maps D onto the range R ⊆ ℝ², (X, Y) = g(U, V), and h(X, Y) = g⁻¹(X, Y) = (U, V), then the joint density of X and Y is

f_{X,Y}(x, y) = { f_{U,V}(h(x, y)) |J_h(x, y)|   for (x, y) ∈ R
                  0                               otherwise,

where, writing h = (h₁, h₂), the Jacobian of the inverse transformation is

J_h(x, y) = det [ ∂h₁/∂x   ∂h₁/∂y
                  ∂h₂/∂x   ∂h₂/∂y ].
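A short worked illustration of the change of variables formula (added here; the linear map is an arbitrary choice):

```latex
% Take g(u, v) = (u + v, u - v), so h(x, y) = ((x+y)/2, (x-y)/2). Then
J_h(x, y) = \det
\begin{pmatrix}
 \tfrac{1}{2} & \tfrac{1}{2} \\[2pt]
 \tfrac{1}{2} & -\tfrac{1}{2}
\end{pmatrix}
 = -\tfrac{1}{2},
\qquad
f_{X,Y}(x, y) = \tfrac{1}{2}\, f_{U,V}\!\left(\tfrac{x+y}{2}, \tfrac{x-y}{2}\right)
\quad \text{for } (x, y) \in R.
```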
CHAPTER 5 – CONDITIONAL DISTRIBUTIONS
Chapter 5 – Conditional Distributions
5.1  Discrete conditional distributions
5.2  Continuous conditional distributions
5.3  Relationship between joint, marginal and conditional
5.4 Conditional expectation and conditional moments
5.5 Hierarchies and mixtures
5.6 Random sums
5.7 Conditioning for random vectors
5.2 Continuous conditional distributions
Definition 5.2.1 (Conditional density)
Suppose X and Y are jointly continuous random variables with joint density f_{X,Y}, and the marginal density of X is f_X. The conditional density of Y given X = x is

f_{Y|X}(y|x) = { f_{X,Y}(x, y) / f_X(x)   for f_X(x) > 0
                 0                         otherwise.

The distribution function and expected value of Y | X = x are

F_{Y|X}(y|x) = ∫_{−∞}^{y} f_{Y|X}(u|x) du    and    E(Y | X = x) = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy.
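A worked illustration of Definition 5.2.1 (added here; the density on the unit square is an arbitrary choice):

```latex
% Take f_{X,Y}(x, y) = x + y on the unit square 0 < x, y < 1. Then
f_X(x) = \int_0^1 (x + y)\,dy = x + \tfrac{1}{2},
\qquad
f_{Y\mid X}(y \mid x) = \frac{x + y}{x + \tfrac{1}{2}},
\\[4pt]
\mathrm{E}(Y \mid X = x) = \int_0^1 y\,\frac{x + y}{x + \tfrac{1}{2}}\,dy
 = \frac{\tfrac{x}{2} + \tfrac{1}{3}}{x + \tfrac{1}{2}}
 = \frac{3x + 2}{6x + 3}.
```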
5.4 Conditional expectation and moments
Definition 5.4.1 (Conditional expectation)
Suppose X and Y are random variables. Define

ψ(x) = E[Y | X = x] = { Σ_y y f_{Y|X}(y|x)              discrete case
                        ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy    continuous case.

The conditional expectation of Y given X is then E(Y | X) = ψ(X), which is a random variable.

Proposition 5.4.2 (Law of iterated expectations)
If X and Y are random variables with conditional expectation E(Y | X), then

E(Y) = E[E(Y | X)].
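The law of iterated expectations can be checked exactly on a discrete example (added here; the joint probabilities are made up for the illustration):

```python
from fractions import Fraction

# A toy joint pmf for (X, Y); E(Y) is computed directly and by conditioning.
joint = {
    (0, 1): Fraction(1, 6), (0, 2): Fraction(1, 3),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
}

def f_X(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

def E_Y_given(x):
    # E(Y | X = x) = Σ_y y f_{Y|X}(y|x) = Σ_y y f_{X,Y}(x, y) / f_X(x)
    return sum(Fraction(y) * p for (xx, y), p in joint.items() if xx == x) / f_X(x)

EY_direct   = sum(Fraction(y) * p for (x, y), p in joint.items())
EY_iterated = sum(E_Y_given(x) * f_X(x) for x in (0, 1))

print(EY_direct, EY_iterated)  # both equal 19/12
```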
5.4.1 Conditional expectation
Proposition 5.4.2 (Law of iterated expectations)
If X and Y are random variables with conditional expectation E(Y | X), then

E(Y) = E[E(Y | X)].

Consequence:

E[Y] = { Σ_x E(Y | X = x) f_X(x)               discrete case
         ∫_{−∞}^{∞} E(Y | X = x) f_X(x) dx     continuous case.
5.4.2 Conditional moments
Lemma 5.4.9 (Useful representation of conditional variance)
For random variables X and Y, the conditional variance of Y given X can be written as

Var(Y | X) = E(Y² | X) − [E(Y | X)]².

Proposition 5.4.10 (Decomposition of variance)
For random variables X and Y, the variance of Y is given by

Var(Y) = E(Var(Y | X)) + Var(E(Y | X)).
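The variance decomposition can also be verified exactly on a small discrete example (added here; the joint probabilities are made up for the illustration):

```python
from fractions import Fraction

# A toy joint pmf for (X, Y); Var(Y) is computed directly and via
# Var(Y) = E(Var(Y|X)) + Var(E(Y|X)).
joint = {
    (0, 0): Fraction(1, 4), (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 4), (1, 3): Fraction(1, 4),
}
xs = (0, 1)

def f_X(x):
    return sum(p for (xx, y), p in joint.items() if xx == x)

def cond_moment(x, r):
    # E(Y^r | X = x)
    return sum(Fraction(y) ** r * p for (xx, y), p in joint.items() if xx == x) / f_X(x)

EY  = sum(Fraction(y) * p for (x, y), p in joint.items())
EY2 = sum(Fraction(y) ** 2 * p for (x, y), p in joint.items())
var_Y = EY2 - EY ** 2

# E(Var(Y|X)), using Var(Y|X) = E(Y²|X) − [E(Y|X)]²:
E_condvar = sum((cond_moment(x, 2) - cond_moment(x, 1) ** 2) * f_X(x) for x in xs)
# Var(E(Y|X)):
E_condmean   = sum(cond_moment(x, 1) * f_X(x) for x in xs)
var_condmean = sum(cond_moment(x, 1) ** 2 * f_X(x) for x in xs) - E_condmean ** 2

print(var_Y == E_condvar + var_condmean)  # True: the decomposition holds
```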
Exam 201b (4.7, 5.6)
3. If X is Gamma distributed with parameters α and β, i.e. X ∼ Gamma(α, β), then it has density

f_X(x) = (β^α / Γ(α)) x^{α−1} e^{−βx},   for x > 0,

and Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy for α > 0.

(a) If X ∼ Gamma(α₁, β), Y ∼ Gamma(α₂, β), and X is independent of Y, derive the distribution of X + Y. You may use the moment generating function of a Gamma random variable without proof, as long as you state it clearly.
(b) Let X_i ∼ Gamma(α, β), i = 1, …, N, be independent of each other, with α, β > 0. Each X_i is also independent of N, which is Poisson distributed with mean μ, so that the probability mass function for N is given by

p_N(n) = μ^n e^{−μ} / n!,   for n = 0, 1, …

Consider the random variable W = Σ_{i=1}^N X_i (where W = 0 if N = 0).
(i) Derive the moment generating function of W.
(ii) Find the mean of W. You can use the means of a Poisson and a Gamma random variable without proof. If you use any standard results about sums, you must first state them clearly.
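Part (b)(i) turns on the random-sum MGF identity from Section 5.6; as a reminder (a standard result, stated here without proof), conditioning on N via the law of iterated expectations gives, for i.i.d. X_i independent of N:

```latex
M_W(t) = \mathrm{E}\!\left(e^{tW}\right)
       = \mathrm{E}\!\left[\mathrm{E}\!\left(e^{tW} \mid N\right)\right]
       = \mathrm{E}\!\left[M_X(t)^{N}\right]
       = G_N\!\left(M_X(t)\right),
\\[4pt]
\text{and for } N \sim \mathrm{Poisson}(\mu):\quad
M_W(t) = \exp\!\left\{\mu\left(M_X(t) - 1\right)\right\}.
```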

Section A
Answer all three parts of question 1 (40 marks in total)
1. (a) Let g(x) be a function taking on integer values of x, with

g(x) = { 2a,  x = −3, −1;
         a,   x = 0, 2;
         3a,  x = 1, 3;
         0,   otherwise.

i. Find a so that g(x) is a probability mass function. [3 marks]
ii. Let X be a discrete random variable with probability mass function g(x). Find E(X) and Var(X). [5 marks]
iii. Write down the probability mass function of Y = X² − 4|X| + 4. [4 marks]
(b) The cumulative distribution function F_X(·) for the continuous random variable X is defined by

F_X(x) = { 0,                  x < 0;
           ax²/4,              0 ≤ x < 1;
           ((x − 1)³ + a)/4,   1 ≤ x < 2;
           1,                  x ≥ 2.

i. Find the value of a. [1 mark]
ii. Derive the probability density function of X. [4 marks]
iii. Let W = X². Derive the cumulative distribution function of W. Hence, derive the probability density function of W. [7 marks]
(c) Let X follow an exponential distribution with rate λ, i.e., X has a density function

f_X(x) = { λe^{−λx},  x > 0;
           0,          otherwise.
UL18/0327 Page 2 of 7
i. Derive the moment generating function of X. [3 marks]
ii. Let Y be an independent and identically distributed copy of X. For w > 0, show that

P(X − Y ≤ w) = 1 − e^{−λw}/2.

(Hint: find the joint density of X and Y first. Determine the valid region in the double integral involved.) [5 marks]
iii. For w ≤ 0, show that

P(X − Y ≤ w) = e^{λw}/2.

[5 marks]
iv. Using parts ii and iii of question (c), show that the density function of W = X − Y is given by

f_W(w) = λe^{−λ|w|}/2,   w ∈ ℝ.

[3 marks]

Section B
Answer all three questions in this section (60 marks in total)
2. The conditional density of a random variable X given Y = y is given by

f_{X|Y}(x|y) = { x/(2y²),  0 < x < 2y < 2;
                 0,         otherwise.

The conditional density of Y given X = x is given by

f_{Y|X}(y|x) = { 24y²/(8 − x³),  0 < x < 2y < 2;
                 0,               otherwise.

(a) Find the ratio f_Y(y)/f_X(x), where f_X(x) and f_Y(y) are the marginal densities of X and Y, respectively. [2 marks]
(b) By integrating out y first in the answer in (a), show that

f_X(x) = { 5x(8 − x³)/48,  0 < x < 2;
           0,               otherwise.

(c) Let U = XY and V = X/Y. Derive the joint density for U, V, and carefully state the region for (U, V) where this joint density is non-zero. [9 marks]

3. If X is Gamma distributed with parameters α and β, i.e., X ∼ Gamma(α, β), then it has density

f_X(x) = (β^α / Γ(α)) x^{α−1} e^{−βx},   x > 0,

and Γ(α) = ∫_0^∞ y^{α−1} e^{−y} dy for α > 0.

(a) Suppose X ∼ Gamma(α₁, β₁), Y ∼ Gamma(α₂, β₂), and X is independent of Y. Derive the distribution of β₁X + β₂Y. You may use the moment generating function of a Gamma random variable without proof, as long as you state it clearly. [7 marks]
(b) Let X_i ∼ Gamma(α, β_i), i = 1, …, N, be independent of each other and α, β_i > 0. Each X_i is also independent of N, which is Poisson distributed with mean μ, so that the probability mass function for N is given by

p_N(n) = μ^n e^{−μ} / n!,   n = 0, 1, ….

Consider the random variable

W = Σ_{i=1}^N β_i X_i,

with the convention that W = 0 if N = 0.
i. Derive the moment generating function of W. [8 marks]
ii. Find the mean of W. You can use the mean of a Poisson random variable without proof. The mean of X ∼ Gamma(α, β) is α/β. [5 marks]
4. Suppose we have a biased coin, which comes up heads with probability u. An experiment is carried out so that X is the number of independent flips of the coin required for r heads to show up, where r ≥ 1 is known.
(a) Show that the probability mass function for X is

p_X(x) = { (x−1 choose r−1) u^r (1 − u)^{x−r},  x = r, r + 1, …;
           0,                                    otherwise.

[5 marks]
(b) Suppose U is random and has a density given by

f_U(u) = { [Γ(α+β) / (Γ(α)Γ(β))] u^{α−1} (1 − u)^{β−1},  0 < u < 1;
           0,                                             otherwise,

where α, β > 0, and Γ(α) is defined in question 3, which has the property that Γ(α) = (α − 1)Γ(α − 1) for α ≥ 1, and Γ(k) = (k − 1)! for a positive integer k. The distribution in part (a) thus becomes

p_{X|U}(x|u) = { (x−1 choose r−1) u^r (1 − u)^{x−r},  x = r, r + 1, …;
                 0,                                    otherwise.

i. Find the marginal probability mass function of X if α = β = 2. [6 marks]
ii. With α = β = 2 still, show that the density of U | X = x is given by

f_{U|X}(u|x) = { [(x+3)! / ((r+1)!(x−r+1)!)] u^{r+1} (1 − u)^{x−r+1},  0 < u < 1;
                 0,                                                     otherwise.

Hence find the mean of U | X = x. [5 marks]
(c) Another independent experiment is carried out, with Y denoting the number of independent flips of the coin required for r heads to show up (the same r as for the first experiment).
State (no need for a derivation) the density of U | (X, Y) = (x, y) and its mean, where U still has the density in part (b) with α = β = 2. [4 marks]
END OF PAPER