MTH313 Loss Distribution
Chapter 2: Estimation for Complete Data
Xi’an Jiaotong-Liverpool University
Dr Jiajun Liu
Dr Jiajun Liu (XJTLU) MTH 313 1 / 23
Introduction
Data Set A This data set is well known in the casualty actuarial
literature. The data were collected in 1956–1958 and record the number of
accidents incurred by one driver in one year.
Data Set A
Number of accidents Number of drivers
0 81,714
1 11,306
2 1,618
3 250
4 40
5 or more 7
Total 94,935
Data Set B
Data Set B These numbers are artificial. They represent the amounts
paid on workers compensation medical benefits but are not related to any
particular policy or set of policyholders. These payments are the full
amount of loss.
Data Set B
27 82 115 126 155 161 243 294 340 384
457 680 855 877 974 1,193 1,340 1,884 2,558 15,743
Data Set C
Data Set C These observations represent payments on 227 claims from
a general liability insurance policy.
Data Set C
Payments range Number of payments
0-7,500 99
7,500-17,500 42
17,500-32,500 29
32,500-67,500 28
67,500-125,000 17
125,000-300,000 9
Over 300,000 3
Total 227
Data Set D
Data Set D These numbers are artificial. They represent the time at
which a five-year term insurance policy terminates. For some
policyholders, termination is by death, for some it is by surrender (the
cancellation of the insurance contract), and for the remainder it is
expiration of the five-year period.
For Data Set D1, there were 30 policies observed from issue. For each,
both the time of death and time of surrender are presented, provided they
were before the expiration of the five-year period.
Data Set D1
Policyholder Time of death Time of surrender
1 - 0.1
2 4.8 0.5
3 - 0.8
4 0.8 3.9
5 3.1 1.8
6 - 1.8
7 - 2.1
8 - 2.5
9 - 2.8
10 2.9 4.6
11 2.9 4.6
12 - 3.9
13 4.0 -
14 - 4.0
15 - 4.1
16 4.8 -
17 - 4.8
18 - 4.8
19-30 - -
Data Set D2
For Data Set D2, only the time of the first event is observed. In addition,
there are 10 more policyholders who were first observed at some time after
the policy was issued. The table presents the results for all 40 policies.
Data Set D2
Policy First Last Event Policy First Last Event
1 0 0.1 s 16 0 4.8 d
2 0 0.5 s 17 0 4.8 s
3 0 0.8 s 18 0 4.8 s
4 0 0.8 d 19-30 0 5.0 e
5 0 1.8 s 31 0.3 5.0 e
6 0 1.8 s 32 0.7 5.0 e
7 0 2.1 s 33 1.0 4.1 d
8 0 2.5 s 34 1.8 3.1 d
9 0 2.8 s 35 2.1 3.9 s
10 0 2.9 d 36 2.9 5.0 e
11 0 2.9 d 37 2.9 4.8 s
12 0 3.9 s 38 3.2 4.0 d
13 0 4.0 d 39 3.4 5.0 e
14 0 4.0 s 40 3.9 5.0 e
15 0 4.1 s
11.2 The empirical distribution for complete individual data
Definition 11.5 The empirical distribution is obtained by assigning
probability $1/n$ to each data point. As a result, the empirical distribution
function is
\[ F_n(x) = \frac{1}{n} \times (\text{number of observations} \le x). \]
Example
Example Draw the empirical distribution for a data set containing the
numbers
1.0 1.3 1.5 1.5 2.1 2.1 2.1.
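As a minimal sketch of Definition 11.5, the empirical distribution function for this sample can be evaluated by counting the observations that do not exceed x. The function name `empirical_cdf` and the evaluation points are illustrative choices, not part of the course material.

```python
from fractions import Fraction


def empirical_cdf(sample, x):
    """Return F_n(x): the fraction of observations less than or equal to x."""
    n = len(sample)
    return Fraction(sum(1 for obs in sample if obs <= x), n)


sample = [1.0, 1.3, 1.5, 1.5, 2.1, 2.1, 2.1]

# F_n is a right-continuous step function that jumps at each distinct value,
# so evaluating just below and at the data points shows the steps to draw.
for x in [0.9, 1.0, 1.3, 1.5, 2.0, 2.1]:
    print(x, empirical_cdf(sample, x))
```

The jump at each distinct value equals the number of observations at that value divided by n, so the repeated values 1.5 and 2.1 produce larger steps.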
Empirical distribution function
Let $x_1, x_2, \dots, x_n$ be a sample of size $n$, let
\[ y_1 < y_2 < \cdots < y_k \]
be the $k$ unique values that appear in the sample, and let $s_j$ be the
number of times the observation $y_j$ appears in the sample. Thus,
$\sum_{j=1}^{k} s_j = n$. Also of interest is the number of observations in the data set
that are greater than or equal to a given value. Both the observations and
the number of observations are referred to as the risk set.
Let $r_j = \sum_{i=j}^{k} s_i$ be the number of observations greater than or equal to
$y_j$. Using this notation, the empirical distribution function is
\[
F_n(x) =
\begin{cases}
0, & x < y_1, \\
1 - \dfrac{r_j}{n}, & y_{j-1} \le x < y_j,\ j = 2, \dots, k, \\
1, & x \ge y_k.
\end{cases}
\]
Example 11.2
Example 11.2 Consider a data set containing the numbers
1.0 1.3 1.5 1.5 2.1 2.1 2.1.
Determine the quantities described above.
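One way to tabulate the quantities is sketched below: the distinct values y_j, their multiplicities s_j, the risk sets r_j, and the resulting values of F_n obtained from 1 - r_j/n. The helper name `risk_set_table` is hypothetical and chosen only for this illustration.

```python
from collections import Counter
from fractions import Fraction


def risk_set_table(sample):
    """Return (y, s, r): distinct values, their counts, and risk-set sizes."""
    counts = Counter(sample)
    ys = sorted(counts)
    ss = [counts[y] for y in ys]
    rs = []
    remaining = len(sample)  # r_1 = n: every observation is >= y_1
    for s in ss:
        rs.append(remaining)
        remaining -= s       # observations equal to y_j leave the risk set
    return ys, ss, rs


sample = [1.0, 1.3, 1.5, 1.5, 2.1, 2.1, 2.1]
n = len(sample)
ys, ss, rs = risk_set_table(sample)

for j, (y, s, r) in enumerate(zip(ys, ss, rs), start=1):
    print(f"j={j}  y_j={y}  s_j={s}  r_j={r}")

# Values of F_n on [y_{j-1}, y_j) for j = 2..k, followed by the value for x >= y_k.
cdf_steps = [1 - Fraction(r, n) for r in rs[1:]] + [Fraction(1)]
print(cdf_steps)
```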
Cumulative Hazard Rate function
Definition 11.6 The cumulative hazard rate function is defined as
\[ H(x) = -\ln S(x), \]
where $S(x) = 1 - F(x)$ is the survival function.
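Nelson-Åalen estimate
Definition 11.7 The Nelson-Åalen estimate of the cumulative hazard
rate function is
\[
\hat{H}(x) =
\begin{cases}
0, & x < y_1, \\
\sum_{i=1}^{j-1} \dfrac{s_i}{r_i}, & y_{j-1} \le x < y_j,\ j = 2, \dots, k, \\
\sum_{i=1}^{k} \dfrac{s_i}{r_i}, & x \ge y_k.
\end{cases}
\]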
Example
Example 11.3 Consider a data set containing the numbers
1.0 1.3 1.5 1.5 2.1 2.1 2.1.
Determine the Nelson-Åalen estimate of the cumulative hazard rate
function.
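The estimate can be computed by accumulating s_i / r_i over the distinct values in increasing order. The following is a sketch under that reading of the definition; the function name `nelson_aalen` is an illustrative choice, and exact fractions are used only to keep the arithmetic transparent.

```python
from collections import Counter
from fractions import Fraction


def nelson_aalen(sample):
    """Return [(y_j, H_hat(y_j))]: the cumulative hazard estimate from each y_j onward."""
    counts = Counter(sample)
    at_risk = len(sample)          # r_1 = n
    cumulative = Fraction(0)
    steps = []
    for y in sorted(counts):
        cumulative += Fraction(counts[y], at_risk)  # add s_j / r_j
        steps.append((y, cumulative))
        at_risk -= counts[y]                        # r_{j+1} = r_j - s_j
    return steps


sample = [1.0, 1.3, 1.5, 1.5, 2.1, 2.1, 2.1]
steps = nelson_aalen(sample)
for y, h in steps:
    print(f"x >= {y}: H_hat = {h} (approx. {float(h):.4f})")
```

Each pair gives the value of the estimate from that y_j up to (but not including) the next distinct value; the estimate is 0 for x below 1.0.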
Example 11.4
Example 11.4 Determine the empirical survival function and
Nelson-Åalen estimate of the cumulative hazard rate function for the time
to death for Data Set D1. Estimate the survival function from the
Nelson-Åalen estimate. Assume that the death time is known for those
who surrender.
Data Set D1
Policyholder Time of death
1 -
2 4.8
3 -
4 0.8
5 3.1
6 -
7 -
8 -
9 -
10 2.9
11 2.9
12 -
13 4.0
14 -
15 -
16 4.8
17 -
18 -
19-30 -
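A possible way to carry out the calculation is sketched below. It assumes that the 23 policyholders with no recorded death time survived beyond the five-year period, so they remain in every risk set, and that the estimates are only meaningful for times below 5. The survival function is recovered from the Nelson-Åalen estimate through S(x) = exp(-H(x)), which follows from Definition 11.6. Variable names are illustrative.

```python
import math
from collections import Counter

# Recorded death times from Data Set D1 (policyholders 2, 4, 5, 10, 11, 13, 16).
deaths = [4.8, 0.8, 3.1, 2.9, 2.9, 4.0, 4.8]
n = 30  # policies observed from issue; the other 23 are assumed alive at time 5

counts = Counter(deaths)
at_risk = n
cum_hazard = 0.0
rows = []
for y in sorted(counts):
    s = counts[y]
    cum_hazard += s / at_risk           # Nelson-Aalen increment s_j / r_j
    at_risk -= s                        # survivors just after time y
    empirical_survival = at_risk / n    # S_n(x) for y <= x < next death time
    rows.append((y, empirical_survival, cum_hazard, math.exp(-cum_hazard)))

print("  y_j   S_n(x)   H_hat(x)   exp(-H_hat(x))")
for y, s_emp, h, s_na in rows:
    print(f"{y:5.1f}  {s_emp:7.4f}  {h:9.4f}  {s_na:15.4f}")
```

Each row applies from the listed death time up to the next one (or up to time 5 for the last row); before the first death both survival estimates equal 1.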
11.3 Empirical distributions for grouped data
For grouped data, construction of the empirical distribution as defined
previously is not possible. However, it is possible to approximate the
empirical distribution in some reasonable way.
Definition 11.8 For grouped data, the distribution function obtained by
connecting the values of the empirical distribution at the group boundaries
with straight lines is called the ogive. The formula is
\[
F_n(x) = \frac{c_j - x}{c_j - c_{j-1}} F_n(c_{j-1}) + \frac{x - c_{j-1}}{c_j - c_{j-1}} F_n(c_j),
\quad c_{j-1} \le x \le c_j.
\]
Histogram
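Definition 11.9 For grouped data, the empirical density function
can be obtained by differentiating the ogive. The resulting function is
called a histogram. The formula is
\[
f_n(x) = \frac{F_n(c_j) - F_n(c_{j-1})}{c_j - c_{j-1}} = \frac{n_j}{n (c_j - c_{j-1})},
\quad c_{j-1} \le x < c_j,
\]
where $n_j$ is the number of observations in the group from $c_{j-1}$ to $c_j$.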
Example 11.5
Example 11.5 Construct the ogive and histogram for Data Set C.
Data Set C
Payments range Number of payments
0-7,500 99
7,500-17,500 42
17,500-32,500 29
32,500-67,500 28
67,500-125,000 17
125,000-300,000 9
Over 300,000 3
Total 227
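The ogive and histogram for these data could be evaluated along the following lines. This is a sketch, not a prescribed method: the names `ogive`, `histogram`, and `_group` are hypothetical, and because the last group (over 300,000) has no upper boundary, both functions are left undefined beyond 300,000.

```python
from bisect import bisect_right

# Group boundaries c_0 < c_1 < ... < c_6 and counts n_j for the closed groups.
boundaries = [0, 7_500, 17_500, 32_500, 67_500, 125_000, 300_000]
counts = [99, 42, 29, 28, 17, 9]
open_count = 3                     # payments over 300,000
n = sum(counts) + open_count       # 227

# Empirical distribution function at each boundary: F_n(c_j) = (n_1 + ... + n_j) / n.
cdf_at = [0.0]
running = 0
for c in counts:
    running += c
    cdf_at.append(running / n)


def _group(x):
    """Return j such that c_{j-1} <= x <= c_j (right end only for the last group)."""
    if not boundaries[0] <= x <= boundaries[-1]:
        raise ValueError("x must lie between 0 and 300,000")
    return min(bisect_right(boundaries, x), len(boundaries) - 1)


def ogive(x):
    """Linear interpolation of F_n between the surrounding group boundaries."""
    j = _group(x)
    lo, hi = boundaries[j - 1], boundaries[j]
    return ((hi - x) * cdf_at[j - 1] + (x - lo) * cdf_at[j]) / (hi - lo)


def histogram(x):
    """Slope of the ogive on the group containing x: n_j / (n (c_j - c_{j-1}))."""
    if x >= boundaries[-1]:
        raise ValueError("histogram is undefined at or beyond 300,000")
    j = _group(x)
    return counts[j - 1] / (n * (boundaries[j] - boundaries[j - 1]))


print(ogive(10_000), histogram(10_000))
```

Plotting `ogive` at the boundaries and joining the points with straight lines gives the ogive; plotting `histogram` as a constant on each group gives the histogram, whose bar areas sum to 224/227 rather than 1 because the open group is excluded.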