DEPARTMENT OF ECONOMICS
CENTRE FOR ACTUARIAL STUDIES
Final Assessment, Semester 2, 2020
ACTL30004 Actuarial Statistics
Preamble
Time allowed: 3 hours
Reading time: 30 minutes
This exam contributes 70% to the total assessment in this subject.
Total: 100 Marks
Instructions to candidates
• Show your answer correct to 4 decimal places unless stated otherwise.
• Except for the True/False questions, submit a single PDF file containing your
answers to all questions under the first question.
Page 1 of 18 pages.

Long Answer Questions
Common Question: 12 Marks
1. The negative binomial distribution has probability mass function (pmf):
$$f(y \mid r, \beta) = \binom{y + r - 1}{y} \left(\frac{1}{1 + \beta}\right)^{r} \left(\frac{\beta}{1 + \beta}\right)^{y}, \qquad y = 0, 1, \ldots, \quad r, \beta > 0.$$
(a) Show that $E(Y) = r\beta$.
(b) By denoting $\mu = r\beta$, find a new parametrization of the pmf of the
negative binomial distribution in terms of $f(y \mid r, \mu)$.
(c) Assuming that $r$ is known, express this new parametrization in the form
of the exponential family of probability distributions. Specify the form
of the functions $b(\cdot)$ and $c(\cdot, \cdot)$ and also give the expressions for the
canonical parameter $\theta$, dispersion parameter $\phi$ and prior weight $w$.
(d) Find E(Y) and Var(Y) under the new parametrization.
[Total 3+3+3+3=12 marks ]
Students answer either A or B: 8 Marks.
2A. Data on the response variable Y and the predictor x are given below:
x 2 4 8
y 6 2 9
Assume a simple linear regression model $E(Y \mid x) = \beta_0 + \beta_1 x$ given the
observed pairs $(x_i, y_i)$ with i = 1, 2, 3.
(a) Compute Spearman's rank correlation coefficient $r_s$.
(b) Find the distribution of Spearman's rank correlation coefficient $r_s$.
(c) Use Spearman's rank correlation to test $H_0: \rho = 0$ vs $H_1: \rho > 0$ at
the 5% significance level.
[Total 2+3+3=8 marks ]
2B. Data on the response variable Y and the predictor x are given below:
x 2 4 8
y 6 9 2
Assume a simple linear regression model $E(Y \mid x) = \beta_0 + \beta_1 x$ given the
observed pairs $(x_i, y_i)$ with i = 1, 2, 3.
(a) Compute Spearman's rank correlation coefficient $r_s$.
(b) Find the distribution of Spearman's rank correlation coefficient $r_s$.
(c) Use Spearman's rank correlation to test $H_0: \rho = 0$ vs $H_1: \rho < 0$ at
the 5% significance level.
[Total 2+3+3=8 marks ]
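Illustration. As a minimal sketch of how Spearman's rank correlation can be computed in R, the snippet below applies Pearson's correlation to the ranks and compares the result with the built-in `cor(..., method = "spearman")`. The data vectors `x` and `y` are hypothetical (five points without ties) and are not the data of Question 2; the one-sided `cor.test` call is shown only as one possible way to obtain a p-value.

```r
# Hypothetical data without ties (not the data from Question 2).
x <- c(1.2, 3.4, 2.2, 5.0, 4.1)
y <- c(2.0, 3.9, 4.5, 6.1, 3.0)

# Spearman's coefficient is Pearson's correlation applied to the ranks.
rs_ranks <- cor(rank(x), rank(y))

# Built-in computation for comparison.
rs_builtin <- cor(x, y, method = "spearman")

# One-sided test of no association against positive association.
test_result <- cor.test(x, y, method = "spearman", alternative = "greater")

print(c(rs_ranks = rs_ranks, rs_builtin = rs_builtin))
print(test_result$p.value)
```

With these hypothetical values the ranks differ only in two positions, so both computations give a coefficient of 0.6.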
Students answer either A or B: 14 Marks.
3A. The cumulative distribution function (cdf) of the extreme value distribution
is given by
$$F_Y(y \mid \mu, \sigma) = 1 - \exp\left\{-\exp\left\{\frac{y - \mu}{\sigma}\right\}\right\}, \quad \text{with } y \in \mathbb{R},\ \mu \in \mathbb{R} \text{ and } \sigma \in \mathbb{R}^{+}.$$
(a) Show that the complementary log-log (cloglog) link function, i.e.
$g(p) = \log(-\log(1 - p))$ with $p \in (0, 1)$, for the binomial family GLM can be
derived from the cdf of the extreme value distribution.
The probability mass function for the binomial proportion in the form of
exponential family is given by the following expression
$$f(z \mid n, p) = \exp\left\{ \frac{z \log\left(\frac{p}{1 - p}\right) + \log(1 - p)}{1/n} + \log \binom{n}{nz} \right\}$$
where $z \in \{0, \tfrac{1}{n}, \ldots, \tfrac{n-1}{n}, 1\}$ and $0 < p < 1$. Also assume that the parameter
$n$ is known.
A binomial proportion family GLM with cloglog link function is fitted to
dependent variable data denoted $z_i$, i = 1, 2, ..., m. Each data point in
the GLM is given a prior weight $w_i = n_i$ for i = 1, 2, ..., m. Associated
with each dependent variable observation is a row vector of length 3 of
explanatory variables denoted $x_i = (1, x_{i1}, x_{i2})$ for i = 1, 2, ..., m. The model
for the success probability conditional on the values taken by the
explanatory variables is
$$p_i = 1 - \exp\{-\exp\{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\}\}, \quad \text{where } \beta_j \in \mathbb{R}, \text{ with } j = 0, 1, 2.$$
(b) Derive an expression for the jth score equation.
An investigation is based on 30 patients selected in a probability sample.
The response variable Y describes whether a patient is covered by health
insurance (Y = 0 if no and Y = 1 if yes). Two explanatory variables have
been included in the analysis: the age A (in years) of the patient and the
unemployment status U of the patient (U = 0 if employed and U = 1 if
unemployed). The data is stored in an R data frame called healthInsurance.
Some R code and output are given in the Appendix.
(c) Write out the model fitted in the R object coverage1, showing
the estimated regression coefficients.
The odds of being covered by health insurance is defined as the probability
of being covered by health insurance divided by the complement of this
probability. When we compare two groups, an odds ratio of 1 indicates that
the condition or event under study is equally likely to occur in both groups.
An odds ratio greater than 1 indicates that the condition or event is more
likely to occur in the first group.
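Illustration. As a minimal sketch of the definition above, the following R snippet computes the odds for two groups and the resulting odds ratio. The probabilities 0.8 and 0.6 and the helper `odds` are hypothetical, chosen only for illustration; they are not taken from the fitted models in the Appendix.

```r
# Hypothetical probabilities of being covered in two groups (illustration only).
p_group1 <- 0.8
p_group2 <- 0.6

# Odds: probability of the event divided by the probability of its complement.
odds <- function(p) p / (1 - p)

# Odds ratio comparing group 1 with group 2.
odds_ratio <- odds(p_group1) / odds(p_group2)
print(odds_ratio)
```

Here the odds are 4 and 1.5, so the odds ratio is 8/3, which is greater than 1: under these assumed probabilities the event is more likely in the first group.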
(d) Using the fitted model in coverage1, calculate the odds ratio for being
covered by health insurance to compare (1) a patient who is unemployed
and 55 years old and (2) a patient who is 45 years old and employed.
Compare both groups.
(e) Using the model stored in the R object coverage2, perform a statistical
test at the 5% significance level to assess whether adding an indicator
variable for the employment status of the patient improves the model
when an intercept and a linear term for the age of the patient are already
included in the model.
(f) Write a single line R command that will give the p-value of the statis-
tical test in (e).
> healthInsurance=read.csv("healthInsurance.csv");
> coverage1=glm(Y~A+U,family=binomial(link=cloglog),data=healthInsurance)
> summary(coverage1)
Call:
glm(formula = Y ~ A + U, family = binomial(link = cloglog), data = healthInsurance)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0221   0.5121   0.5510   0.6860   0.9220
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.900007   0.665726   1.352    0.176
A           -0.008405   0.016145  -0.521    0.603
U           -0.370453   0.609248  -0.608    0.543
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.024 on 29 degrees of freedom
Residual deviance: 29.008 on 27 degrees of freedom
AIC: 35.008
Number of Fisher Scoring iterations: 5
>
>
> coverage2=glm(Y~A,family=binomial(link=cloglog),data=healthInsurance)
> summary(coverage2)
Call:
glm(formula = Y ~ A, family = binomial(link = cloglog), data = healthInsurance)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0189   0.5069   0.5647   0.7365   0.8430
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.98624    0.66189   1.490    0.136
A           -0.01247    0.01545  -0.807    0.420
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.024 on 29 degrees of freedom
Residual deviance: 29.360 on 28 degrees of freedom
AIC: 33.36
Number of Fisher Scoring iterations: 5
[Total 2+3+2+3+3+1=14 marks ]
3B. Suppose that Y is a two-parameter Cauchy random variable with probability
density function
$$f(y \mid \mu, \sigma) = \frac{\sigma}{\pi \left(\sigma^2 + (y - \mu)^2\right)}, \quad \text{for all } -\infty < y < \infty,\ \mu \in \mathbb{R} \text{ and } \sigma > 0.$$
(a) Using the fact that $\int \frac{1}{a^2 + x^2}\, dx = \frac{1}{a} \arctan\left(\frac{x}{a}\right) + c$, show that the
cauchit link function, i.e. $g(p) = \tan(\pi(p - 1/2))$ with $p \in (0, 1)$, for the
binomial family GLM can be derived from the cdf of the Cauchy distribution.
The probability mass function for the binomial proportion in the form of
exponential family is given by the following expression
$$f(z \mid n, p) = \exp\left\{ \frac{z \log\left(\frac{p}{1 - p}\right) + \log(1 - p)}{1/n} + \log \binom{n}{nz} \right\}$$
where $z \in \{0, \tfrac{1}{n}, \ldots, \tfrac{n-1}{n}, 1\}$ and $0 < p < 1$. Also assume that the parameter
$n$ is known.
A binomial proportion family GLM with cauchit link function is fitted to
dependent variable data denoted $z_i$, i = 1, 2, ..., m. Each data point in
the GLM is given a prior weight $w_i = n_i$ for i = 1, 2, ..., m. Associated
with each dependent variable observation is a row vector of length 3 of
explanatory variables denoted $x_i = (1, x_{i1}, x_{i2})$ for i = 1, 2, ..., m. The model
for the success probability conditional on the values taken by the
explanatory variables is
$$p_i = \frac{1}{2} + \frac{1}{\pi} \arctan\{\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}\}, \quad \text{where } \beta_j \in \mathbb{R}, \text{ with } j = 0, 1, 2.$$
(b) Derive an expression for the jth score equation.
An investigation is based on 30 patients selected in a probability sample.
The response variable Y describes whether a patient is covered by health
insurance (Y = 0 if no and Y = 1 if yes). Two explanatory variables have
been included in the analysis: the age A (in years) of the patient and the
unemployment status U of the patient (U = 0 if employed and U = 1 if
unemployed). The data is stored in an R data frame called healthInsurance.
Some R code and output are given in the Appendix.
(c) Write out the model fitted in the R object coverage1, showing
the estimated regression coefficients.
The odds of being covered by health insurance is defined as the probability
of being covered by health insurance divided by the complement of this
probability. When we compare two groups, an odds ratio of 1 indicates that
the condition or event under study is equally likely to occur in both groups.
An odds ratio greater than 1 indicates that the condition or event is more
likely to occur in the first group.
(d) Using the fitted model in coverage1, calculate the odds ratio for being
covered by health insurance to compare (1) a patient who is unemployed
and 55 years old and (2) a patient who is 45 years old and employed.
Compare both groups.
(e) Using the model stored in the R object coverage2, perform a statistical
test at the 5% significance level to assess whether adding an indicator
variable for the employment status of the patient improves the model
when an intercept and a linear term for the age of the patient are already
included in the model.
(f) Write a single line R command that will give the p-value of the statis-
tical test in (e).
> healthInsurance=read.csv("healthInsurance.csv");
> coverage1=glm(Y~A+U,family=binomial(link=cauchit),data=healthInsurance)
> summary(coverage1)
Call:
glm(formula = Y ~ A + U, family = binomial(link = cauchit), data = healthInsurance)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1380   0.4519   0.4840   0.6536   1.0814
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  4.16236    3.52585   1.181    0.238
A           -0.05183    0.06590  -0.787    0.432
U           -1.07766    1.23694  -0.871    0.384
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.024 on 29 degrees of freedom
Residual deviance: 28.490 on 27 degrees of freedom
AIC: 34.49
Number of Fisher Scoring iterations: 8
> coverage2=glm(Y~A,family=binomial(link=cauchit),data=healthInsurance)
> summary(coverage2)
Call:
glm(formula = Y ~ A, family = binomial(link = cauchit), data = healthInsurance)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.0195   0.5148   0.5524   0.7223   0.9109
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.22850    2.79184   1.156    0.248
A           -0.04183    0.05334  -0.784    0.433
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 30.024 on 29 degrees of freedom
Residual deviance: 29.305 on 28 degrees of freedom
AIC: 33.305
Number of Fisher Scoring iterations: 5
[Total 2+3+2+3+3+1=14 marks ]
Students answer either A or B: 6 Marks.
4A. The truncated exponential distribution on the interval (0, k) is defined by the
following probability density function:
$$g(x \mid m, \theta) = m \exp\{-\theta x\}, \quad \text{for } 0 < x < k,$$
where $\theta > 0$ is a parameter and $m$ is a normalizing constant.
(a) Give an expression for $m$ in terms of $\theta$ and $k$.
(b) Use the inverse transformation method of simulation to generate a
random variate from the truncated exponential distribution.
[Total 3+3=6 marks ]
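Illustration. The inverse transformation method can be sketched in R for a simpler case than the one examined here: an untruncated exponential distribution with cdf F(x) = 1 - exp(-rate * x). The rate of 2, the seed and the sample size are hypothetical choices made only for this sketch.

```r
# Inverse transformation method for an (untruncated) exponential distribution.
# The rate, the seed and the sample size are hypothetical choices.
set.seed(1)
rate <- 2

# Step 1: draw U from the standard uniform distribution.
u <- runif(5)

# Step 2: solve F(x) = u for x, where F(x) = 1 - exp(-rate * x).
x <- -log(1 - u) / rate

# Check: applying the cdf to the simulated values recovers the uniforms.
print(all.equal(1 - exp(-rate * x), u))
```

The same two steps apply to any distribution whose cdf can be inverted; only the expression in step 2 changes.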
4B. Given the mixed random variable X with probability density/mass function
$$f(x) = \begin{cases} \dfrac{3}{2} \left(\dfrac{2}{x}\right)^{3/2} \dfrac{1}{x}, & 2 \le x \le 10, \\[1ex] \left(\dfrac{2}{10}\right)^{3/2}, & x = 12. \end{cases}$$
(a) Find the cumulative distribution function of X.
(b) Outline, using bullet points, the method of inverse transformation.
Simulate four values from the mixed random variable X by using this
method. Four random variates from the standard uniform distribution
are given below for use in this question:
0.973414 0.197390 0.309618 0.163117.
[Total 3+3=6 marks ]
Students answer either A or B: 10 Marks.
5A. Suppose that $x_1, x_2, \ldots, x_n$ with $n \ge 1$, are independent realisations of a
continuous random variable X, with probability density function
$$f(x) = \alpha x^{-(\alpha + 1)} \quad \text{with } x > 1 \text{ and } \alpha > 0.$$
(a) Find the probability density function of $Y = \log X$.
Show that $E(Y) = \frac{1}{\alpha}$.
(b) Find the maximum likelihood estimator for $\alpha$.
Denote this estimator $T_n(X)$.
(c) Write down a 95% confidence interval for $\alpha$ based on the observed
sample of size 40 where $\sum_{i=1}^{40} \log x_i = 68.698$. You may use asymptotic
properties of maximum likelihood estimators in forming your confidence
interval.
(d) Given an initial estimate of $\alpha$ equal to $\alpha_0$, use the Fisher-Scoring
algorithm to find an updated estimate of $\alpha$ using one iteration. Recall that
the Fisher-Scoring algorithm uses
$$\alpha_1 = \alpha_0 + \left( E\left[ -\frac{\partial^2 \log L_n(\alpha \mid x)}{\partial \alpha^2} \right] \Bigg|_{\alpha_0} \right)^{-1} \frac{\partial \log L_n(\alpha \mid x)}{\partial \alpha} \Bigg|_{\alpha_0}.$$
(e) Is MSE(Tn(X)) = Var(Tn(X))? Justify your answer.
[Total 2+2+2+2+2=10 marks ]
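Illustration. A single Fisher scoring update can be sketched in R for a different model from the one examined here: an exponential distribution with rate parameter `rate`, whose log-likelihood is n * log(rate) - rate * sum(x). The data vector and the starting value are hypothetical and serve only to show the mechanics of the update.

```r
# Fisher scoring for the rate of an exponential distribution (illustration only).
# The data and the starting value are hypothetical.
x <- c(0.5, 1.2, 0.3, 2.0, 1.0)
n <- length(x)
rate0 <- 0.5  # initial estimate

# Log-likelihood: n * log(rate) - rate * sum(x)
score <- function(rate) n / rate - sum(x)   # first derivative
fisher_info <- function(rate) n / rate^2    # expected value of minus the second derivative

# One Fisher scoring update: new value = old value + score / information.
rate1 <- rate0 + score(rate0) / fisher_info(rate0)

# Closed-form maximum likelihood estimate for comparison.
rate_mle <- n / sum(x)
print(c(rate1 = rate1, rate_mle = rate_mle))
```

With these assumed inputs the update moves the estimate from 0.5 to 0.75, towards the maximum likelihood estimate of 1.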
5B. Let us suppose that X is a discrete random variable with probability
mass function
$$f_X(x \mid \lambda) = \frac{1}{1 + \lambda} \left(\frac{\lambda}{1 + \lambda}\right)^{x - 1}, \quad \text{for } x = 1, 2, 3, \ldots;\ \lambda > 0. \tag{0.1}$$
(a) Show that the mean of the random variable X is $\lambda + 1$.
(b) Consider a sequence $X_1, X_2, \ldots, X_n$ of independent and identically distributed
random variables that follow the probability mass function given in
(0.1). Show that the maximum likelihood estimator of $\lambda$, $T_n(X)$, is an
unbiased estimator of the parameter $\lambda$.
(c) Let us assume that a random sample $x_1, x_2, \ldots, x_{70}$ of size 70, drawn
from the random variable X, is available and $\sum_{i=1}^{70} x_i = 128$. Write
down a 95% confidence interval for $\lambda$.
(d) Given an initial estimate $\lambda_0$ of $\lambda$, use Fisher-Scoring to determine the
minimum number of iterations required to compute the maximum
likelihood estimate of the parameter $\lambda$.
(e) Is MSE(Tn(X)) = Var(Tn(X))? Justify your answer.
[Total 2+2+2+2+2=10 marks ]
Students answer either A or B: 10 Marks.
6A. Consider a simulated dataset of n = 100 observations with $s_y^2 = \frac{TSS}{n - 1} = 4900$.
We run a regression with an intercept and three explanatory variables, i.e.
$x_{i0}, x_{i1}, x_{i2}, x_{i3}$ with i = 1, ..., n, to obtain the mean square error $s^2 = 1600$
and the vector of regression coefficients $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)^{\top}$. We also
obtain
$$(X^{\top} X)^{-1} = \begin{pmatrix} 90 & 20 & 20 & 20 \\ 20 & 80 & 30 & 40 \\ 20 & 30 & 70 & 50 \\ 20 & 40 & 50 & 60 \end{pmatrix}.$$
(a) Calculate the standard error of $\hat{\beta}_3$, $SE(\hat{\beta}_3)$.
(b) Determine the estimated covariance between $\hat{\beta}_2$ and $\hat{\beta}_3$.
(c) Find the estimated correlation between $\hat{\beta}_2$ and $\hat{\beta}_3$.
(d) Find the estimated variance of $4\hat{\beta}_1 + 3\hat{\beta}_2$.
(e) Compute the coefficient of determination adjusted for degrees of
freedom, $R_a^2$.
[Total 2+2+2+2+2=10 marks ]
6B. Consider a simulated dataset of n = 100 observations with $s_y^2 = \frac{TSS}{n - 1} = 4900$.
We run a regression with an intercept and three explanatory variables, i.e.
$x_{i0}, x_{i1}, x_{i2}, x_{i3}$ with i = 1, ..., n, to obtain the mean square error $s^2 = 1600$
and the vector of regression coefficients $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3)^{\top}$. We also
obtain
$$(X^{\top} X)^{-1} = \begin{pmatrix} 90 & 20 & 20 & 20 \\ 20 & 80 & 30 & 40 \\ 20 & 30 & 70 & 50 \\ 20 & 40 & 50 & 60 \end{pmatrix}.$$
(a) Calculate the standard error of $\hat{\beta}_1$, $SE(\hat{\beta}_1)$.
(b) Determine the estimated covariance between $\hat{\beta}_1$ and $\hat{\beta}_2$.
(c) Find the estimated correlation between $\hat{\beta}_1$ and $\hat{\beta}_2$.
(d) Find the estimated variance of $3\hat{\beta}_2 + 5\hat{\beta}_3$.
(e) Compute the coefficient of determination adjusted for degrees of
freedom, $R_a^2$.
[Total 2+2+2+2+2=10 marks ]
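Illustration. The quantities asked for in Question 6 all follow from the estimated covariance matrix of the coefficient estimates, which is the mean square error times the inverse of X'X. The R sketch below shows the arithmetic on a hypothetical two-coefficient example; the values of `s2`, `XtX_inv` and the weights `a` are assumptions made for illustration and are not the values of Question 6.

```r
# Hypothetical inputs (not the values from Question 6).
s2 <- 4                                                  # mean square error
XtX_inv <- matrix(c(2.0, 0.5,
                    0.5, 1.0), nrow = 2, byrow = TRUE)   # inverse of X'X

# Estimated covariance matrix of the coefficient estimates.
V <- s2 * XtX_inv

# Standard errors and the correlation between the two estimates.
se <- sqrt(diag(V))
corr_12 <- V[1, 2] / (se[1] * se[2])

# Variance of a linear combination a'b of the estimates is a' V a.
a <- c(3, 2)
var_comb <- as.numeric(t(a) %*% V %*% a)

print(list(se = se, corr_12 = corr_12, var_comb = var_comb))
```

With these assumed inputs the covariance between the two estimates is 2 and the variance of the combination is 9(8) + 2(3)(2)(2) + 4(4) = 112.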
Students answer either A or B: 10 Marks.
7A. The total claim amount in a year on a particular insurance policy follows
a normal distribution with mean $\lambda$ and standard deviation 100. Prior
investigations show that $\lambda$ follows a normal distribution with mean 500 and
standard deviation 50. Claim sizes $x_1, \ldots, x_n$ are observed over n years.
(a) Provide the posterior distribution of $\lambda$.
(b) Show that the mean of the posterior distribution of $\lambda$ can be written as
a credibility formula.
(c) Assume that n = 4 and the value of total claims over four years is \$2,500.
Calculate the posterior probability that $\lambda$ is greater than 500.
[Total 3+4+3=10 marks ]
7B. The total claim amount in a year on a particular insurance policy follows
a normal distribution with mean $\lambda$ and standard deviation 150. Prior
investigations show that $\lambda$ follows a normal distribution with mean 700 and
standard deviation 75. Claim sizes $x_1, \ldots, x_n$ are observed over n years.
(a) Provide the posterior distribution of $\lambda$.
(b) Show that the mean of the posterior distribution of $\lambda$ can be written as
a credibility formula.
(c) Assume that n = 6 and the value of total claims over six years is \$4,500.
Calculate the posterior probability that $\lambda$ is less than 500.
[Total 3+4+3=10 marks ]
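Illustration. For a normal likelihood with known standard deviation and a normal prior on the mean, the posterior mean is a weighted average of the sample mean and the prior mean. The R sketch below shows that arithmetic with hypothetical inputs; the prior parameters, the known standard deviation `sigma` and the claims vector `x` are assumptions made for illustration and are not the values of Question 7.

```r
# Hypothetical inputs (not the values from Question 7).
prior_mean <- 100
prior_sd <- 10                # prior standard deviation of the mean
sigma <- 20                   # known standard deviation of yearly claims
x <- c(90, 120, 110, 100)     # observed yearly claims
n <- length(x)

# Credibility factor and posterior parameters for the normal/normal model.
Z <- n / (n + sigma^2 / prior_sd^2)
post_mean <- Z * mean(x) + (1 - Z) * prior_mean
post_var <- 1 / (n / sigma^2 + 1 / prior_sd^2)

# Posterior probability that the mean exceeds the prior mean.
prob <- 1 - pnorm(prior_mean, mean = post_mean, sd = sqrt(post_var))
print(c(Z = Z, post_mean = post_mean, post_var = post_var, prob = prob))
```

With these assumed inputs the credibility factor is 0.5, so the posterior mean of 102.5 lies halfway between the sample mean of 105 and the prior mean of 100.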
Students answer either A or B: 10 Marks.
8A. The table below shows aggregate claims for each of four risks in a collective
over the past four years.
              Year, j
Risk, i     1     2     3     4
   1       82    75    92    86
   2       60   101   106    75
   3      112   124    96    75
   4      160    91   133   135
(a) State the assumptions of the EBCT Model 1.
(b) Assuming that the risks in the collective satisfy the assumptions for
EBCT Model 1, calculate the credibility estimate of the pure premium
for the coming year for the second risk in the collective.
[Total 4+6=10 marks ]
8B. The table below shows aggregate claims for each of four risks in a collective
over the past four years.
              Year, j
Risk, i     1     2     3     4
   1       82    75    92    86
   2       68   102   104    75
   3      110   120    94    77
   4      150    91   122   132
(a) State the assumptions of the EBCT Model 1.
(b) Assuming that the risks in the collective satisfy the assumptions for
EBCT Model 1, calculate the credibility estimate of the pure premium
for the coming year for the third risk in the collective.
[Total 4+6=10 marks ]
[Total Marks for Long Answer Questions 80 marks ]
Short Answer Questions:
Students will answer three questions: 4 Marks each.
1. The mean vector and covariance matrix for two random variables are
$$\bar{x} = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \qquad S = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}.$$
What percentage of the variance is explained by the first principal component?
[Total 4 marks ]
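Illustration. The share of variance explained by each principal component is the corresponding eigenvalue of the covariance matrix divided by the sum of all eigenvalues. The R sketch below uses a hypothetical covariance matrix `S`, which is not the matrix given in the question above.

```r
# Hypothetical covariance matrix (not the one from the question).
S <- matrix(c(5, 2,
              2, 2), nrow = 2, byrow = TRUE)

# Principal component variances are the eigenvalues of the covariance matrix,
# returned by eigen() in decreasing order.
eig <- eigen(S, symmetric = TRUE)$values

# Proportion of total variance explained by each component.
prop <- eig / sum(eig)
print(prop)
```

For this assumed matrix the eigenvalues are 6 and 1, so the first principal component explains 6/7 of the total variance.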
2. Data on the response variable Y and the predictor x are given below:
x 2 4 8
y 6 2 9
Assume a simple linear regression model $E(Y \mid x) = \beta_0 + \beta_1 x$ given the
observed pairs $(x_i, y_i)$ with i = 1, 2, 3. Compute Kendall's rank correlation
coefficient $\tau$.
[Total 4 marks ]
3. Let us suppose that x is a realisation of a random variable $X \mid \theta$ with
probability mass function $\Pr\{X = x \mid \theta\} = \frac{e^{-\theta} \theta^{x}}{x!}$ with x = 0, 1, 2, ..., and that $\theta > 0$ is
a random variable with prior density function given by $p(\theta) = \lambda e^{-\lambda \theta}$ with
$\lambda > 0$. Find the unconditional distribution of X.
[Total 4 marks ]
4. Let us assume that the proportion of insurance policies in a portfolio on
which a claim is made over a fixed period is Bernoulli, i.e. $\mathrm{Bin}(1, \theta)$.
Suppose that $x = \{x_1, \ldots, x_n\}$ is a random sample from a Bernoulli
distribution with parameter $\theta \in (0, 1)$. Prior knowledge about $\theta$ is described by
a Beta distribution with parameters $\alpha$ and $\beta$. The Bayesian estimator
under the squared error loss function is a credibility formula. Give the analytical
expression of the credibility factor.
[Total 4 marks ]
5. An actuarial student has fitted an exponential distribution with rate
parameter $\lambda$ and a Gamma distribution with shape parameter $\alpha$ and rate
parameter $\theta$ to a dataset of 30 losses $(x_1, x_2, \ldots, x_{30})$ over a period of six months in a
certain community. Empirical calculations give $\sum_{i=1}^{30} x_i = 8822$ and
$\sum_{i=1}^{30} \log x_i = 164.7894$. The student has used the method of maximum likelihood and
obtained $\hat{\lambda} = 0.0034$, $\hat{\alpha} = 2.7756$ and $\hat{\beta} = 0.0094$, and also $\log \Gamma(\hat{\alpha}) = 0.4963$.
Which model is preferable in terms of the AIC and BIC?
[Total 4 marks ]
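Illustration. AIC and BIC are both computed from the maximised log-likelihood and the number of fitted parameters: AIC = -2 * loglik + 2 * k and BIC = -2 * loglik + k * log(n). The R sketch below uses hypothetical log-likelihood values, parameter counts and sample size; none of them is taken from the question above.

```r
# Hypothetical maximised log-likelihoods and parameter counts (illustration only).
n <- 50
loglik <- c(model1 = -120.4, model2 = -118.9)
k <- c(model1 = 1, model2 = 2)

# Information criteria: smaller values indicate the preferred model.
aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

best_aic <- names(which.min(aic))
best_bic <- names(which.min(bic))
print(rbind(aic, bic))
print(c(best_aic = best_aic, best_bic = best_bic))
```

In this assumed example the two criteria disagree: AIC prefers the two-parameter model, while BIC, which penalises extra parameters more heavily when log(n) > 2, prefers the one-parameter model.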
6. Derive the analytical expression of the ith deviance residual for the
binomial (proportion) family GLM with known prior weight $n_i$.
[Total 4 marks ]
7. Define the hat matrix to be $H = X (X^{\top} X)^{-1} X^{\top}$, so that $\hat{y} = X \beta^{*} = H y$. The
matrix H is said to project the vector of responses y onto the vector of fitted
values $\hat{y}$, and $\beta^{*}$ is the least squares estimator in a multiple linear regression
model. Assume also that $X^{\top} X$ is invertible. Show that H is a symmetric
and idempotent matrix.
Hint: The matrix A is symmetric if and only if $A^{\top} = A$. The matrix A is
idempotent if and only if $A^2 = A$.
[Total 4 marks ]
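Illustration. The two properties named in the hint can be checked numerically, which is a sanity check rather than a proof. The R sketch below builds the hat matrix for a hypothetical design matrix `X` (an intercept column plus one predictor) and tests symmetry and idempotence up to floating-point tolerance.

```r
# Hypothetical design matrix: intercept plus one predictor.
X <- cbind(1, c(1, 2, 4, 7))

# Hat matrix H = X (X'X)^{-1} X'.
H <- X %*% solve(t(X) %*% X) %*% t(X)

# Numerical checks of symmetry and idempotence.
is_symmetric <- isTRUE(all.equal(H, t(H)))
is_idempotent <- isTRUE(all.equal(H %*% H, H))
print(c(symmetric = is_symmetric, idempotent = is_idempotent))

# The trace of H equals the number of columns of X.
print(sum(diag(H)))
```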
True/False Questions
Students will answer eight questions: 1 Mark each.
1. The probit link is the canonical link function in the binomial family GLM.
• True
• False
[Total 1 marks ]
2. The log link is the canonical link function in the Poisson family GLM.
• True
• False
[Total 1 marks ]
3. The coefficient of determination adjusted for degrees of freedom is inter-
preted as the proportion of variation in the response variable explained by
changes in the explanatory variables.
• True
• False
[Total 1 marks ]
4. The coefficient of determination cannot decrease whenever an explanatory
variable is added to the model.
• True
• False
[Total 1 marks ]
5. The principal components derived from the variance-covariance matrix are
scale invariant.
• True
• False
[Total 1 marks ]
6. Spearman's rank correlation coefficient can always be calculated by using
the expression
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} (r(x_i) - r(y_i))^2}{n (n^2 - 1)}.$$
• True
• False
[Total 1 marks ]
7. Kendall's rank correlation coefficient can always be calculated by using
the expression
$$\tau = \frac{n_c - n_d}{\binom{n}{2}} = \frac{n_c - n_d}{n (n - 1)/2}.$$
• True
• False
[Total 1 marks ]
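8. Is the following equation correct?
$$E\left[ -\frac{\partial^2 \log(L_1(\theta \mid Y))}{\partial \theta^2} \right] = E\left[ \left( \frac{\partial \log(L_1(\theta \mid Y))}{\partial \theta} \right)^{2} \right]$$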
• True
• False
[Total 1 marks ]
9. The inverse transformation method of simulation can only be used to simulate
random variates from a continuous random variable.
• True
• False
[Total 1 marks ]
10. Sometimes a biased estimator is preferable to an unbiased estimator.
• True
• False
[Total 1 marks ]
11. In the gamma family GLM the scaled deviance is equal to the deviance.
• True
• False
[Total 1 marks ]
12. Jeffreys' noninformative prior is invariant under transformations.
• True
• False
[Total 1 marks ]
13. The Bayesian estimator under the absolute error loss function is the mean
of the posterior distribution.
• True
• False
[Total 1 marks ]
14. In the credibility premium formula, if the credibility factor is equal to 1, the
premium is computed entirely based on individual information.
• True
• False
[Total 1 marks ]
15. In the Empirical Bayes Credibility Theory Model 1, the random variables $X_{ij}$
given $\theta_i$, with j = 1, ..., n and i = 1, ..., N, are only independent random
variables.
• True
• False
[Total 1 marks ]