Chapter 5
Week 5
5.1 L5A: Summarising Posteriors
Overview
A complete summary of a posterior is the posterior distribution. In the case of a single parameter, θ, simply
drawing a picture of the pmf (a histogram) or the pdf (a curve) “shows everything” about the parameter. In
the case of two parameters, Θ=(θ1, θ2), a three dimensional (perspective) plot or contour plot can be drawn.
Given that the θs often have some degree of dependency in the joint posterior distribution, e.g., p(θ1, θ2|y)
≠ p(θ1|y)p(θ2|y), one would want to look at the joint distribution, by looking at contour plots, for example.
However, in the case of dozens or even 100s of parameters, sorting through dozens or 100s of histograms or
density plots may be burdensome.
Thus, while graphical summaries of posterior distributions are indeed good statistical practice, there is
utility in simpler numerical summaries: point estimates such as the posterior mean, measures of spread
such as the posterior standard deviation, and interval estimates.
5.1.1 Point estimates
Point estimates are single number summaries, such as the mean, or median, or mode. As it turns out, each
of these summaries, in the context of Decision Theory, can be viewed as optimal estimators for particular
types of Risk, where risk is the expected value of a Loss Function.
Components of Decision Theory with emphasis on parameter estimation
Decision theory includes parameter estimation, hypothesis testing, and prediction: here we emphasize parameter
estimation. We note below that the notation for the loss function is somewhat atypical.
1. State space, Θ: There is a true state of nature. In this case this true state is an unknown parameter θ
which is contained in a set of possible values called the state space, denoted Θ; thus θ ∈ Θ. (The state
space could be a set of hypotheses.)
2. Action space, A: The decision maker will choose a possible action a from a set of possible actions, the
action space denoted A; thus a ∈ A. The action might be viewed as a “decision”. In some cases A is
the same as Θ and the action is to choose a value from Θ. (Selecting one hypothesis (H0) or another
(H1) can be the action.)
3. Sampling distribution for data, f(y|θ): Conditional on the state of nature, there are data y which
have a distribution that depends upon θ, f(y|θ). The data in turn can affect the action taken, i.e., the
action taken can be a function of the data: a = δ(y). An example action is to estimate a parameter using
the data: a = θ̂(y).
4. Loss function, L(a|θ): Given a specific state of nature, denoted θ, there is an associated loss or
"cost" for any action or decision. This loss is denoted L(a|θ)¹, and in the examples below L(θ̂|θ). An
example loss function is L(θ̂|θ) = (θ − θ̂)². Note that L(θ̂|θ) is a function of θ̂, as θ is a fixed value.
5. Risk, R_θ(a|y): The Risk is the expected value of the loss function under some probability distribution
for the state of nature, Θ. In a Bayesian setting the probability distribution is the posterior distribution²
for θ:

Risk: R_θ(a|y) = E_θ[L(a|θ, y)] = ∫_{θ∈Θ} L(a|θ) p(θ|y) dθ
Here we focus on the action of estimating a parameter, namely a = θ̂:

Risk: R_θ(θ̂|y) = E_θ[L(θ̂|θ, y)] = ∫_{θ∈Θ} L(θ̂|θ) p(θ|y) dθ    (5.1)
The subscript θ has been attached to R and E to emphasize that the expectation is being taken with
regard to the distribution of θ.
Note that, like loss, risk is a function of θ̂. The distinction from loss is that θ is not part of the risk
function, as θ has been removed from the function by integration over the support of θ.
6. Bayes estimator of a parameter: The Bayes Estimator of θ is defined as the value of θ̂ that minimizes
the posterior risk:

Bayes Estimator: θ̂_BE = argmin_{θ̂∈Θ} R_θ(θ̂|y)    (5.2)

For clarity, the operation yields the value of θ̂ with the smallest risk. Note that the subscript BE is
not standard notation.
Comments.
• Determining what loss function to use is not necessarily simple, and can be arrived at via “elicitation”,
much like priors. (See the 2017 presentation by Soares on Learn in Course Materials, Week 4.)
• Reich & Ghosh (2019) have a tidy 2 page description of Decision Theory, focusing on applications to
point estimation, hypothesis testing, and prediction.
Example distinguishing loss and risk. Suppose that θ is restricted to a finite interval, L ≤ θ ≤ U, e.g.,
[2,8], and suppose that the true value of θ is 5. Figure 5.1 shows plots of L(θ̂|θ = 5) for four different loss
functions (to be discussed shortly) against varying values of θ̂. Note that the loss is minimized when θ̂ = 5,
namely θ, for all four loss functions.
¹This is not typical notation; l(θ, a) or l(θ, θ̂) is more common, but that notation does not make clear that θ is fixed and a or θ̂ is the
variable.
²A prior distribution alone could in principle be used, too.
In practice one will not know the true value of θ. Uncertainty about the value of θ is reflected by its posterior
distribution p(θ|y). This leads to the risk function calculation. Suppose that the posterior distribution for
θ is right-triangular on [2,8] (see Figure 5.2):

p(θ|y) = −1/9 + (1/18)θ,  2 ≤ θ ≤ 8

The risk function for the different loss functions and the triangle pdf is calculated using eq'n 5.1. Plots of
the risk versus θ̂ for the four loss functions are shown in Figure 5.3 and the corresponding Bayes Estimators are
indicated on the plot. R code to reproduce these plots is in Appendices 5.2 and 5.3.
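The risk calculation in eq'n 5.1 can also be sketched numerically outside R. The following Python sketch (grid sizes are arbitrary choices, not from the notes) evaluates the posterior risk of each loss function under the triangle posterior and reads off the minimiser, mirroring the computation in Appendix 5.3:

```python
# Numerical sketch of eq'n 5.1 for the triangle posterior on [2, 8].
# The Bayes estimator under each loss is the grid point minimising the risk.
def pdf(t):                                        # p(theta|y) = -1/9 + theta/18
    return -1/9 + t/18

N = 600
thetas = [2 + 6 * i / N for i in range(N + 1)]     # integration grid for theta
dx = 6 / N
cands = [2 + 6 * i / 300 for i in range(301)]      # candidate theta_hat values

def bayes_estimator(loss):
    def risk(th):                                  # R(theta_hat) = E_theta[loss | y]
        return sum(loss(t, th) * pdf(t) for t in thetas) * dx
    return min(cands, key=risk)

w = 0.3
be_sq = bayes_estimator(lambda t, th: (t - th) ** 2)        # posterior mean: 6
be_abs = bayes_estimator(lambda t, th: abs(t - th))         # posterior median: ~6.24
be_asym = bayes_estimator(
    lambda t, th: (1 - w) * (t - th) if t >= th else w * (th - t))  # ~7.02
print(be_sq, be_abs, be_asym)
```

With the coarser 0.1 grid used in the R appendix, the absolute-error and asymmetric estimates print as 6.2 and 7, as indicated in Figure 5.3.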
Figure 5.1: Different loss function values for θ̂ when θ = 5 (squared error, absolute error, 0-1 loss, and asymmetric absolute error).
Figure 5.2: Triangle pdf for θ: p(θ|y) = −1/9 + (1/18)θ, 2 ≤ θ ≤ 8.
Figure 5.3: Risk functions for the triangle pdf (on [2,8]) for four loss functions, with the corresponding θ̂_BE indicated: squared error BE = 6, absolute error BE = 6.2, asymmetric (w = 0.3) BE = 7, 0-1 loss BE = 8.
Common loss functions for estimators
Squared error loss
Squared error loss: L(θ̂|θ) = (θ − θ̂)². (The dependence on y is suppressed in the notation.) Here we consider the continuous
case. The Bayes Estimator, θ̂, minimizing expected squared error loss is the posterior mean, E[θ|y].
Proof:

d/dθ̂ E[(θ − θ̂)² | y] = d/dθ̂ ∫ (θ − θ̂)² p(θ|y) dθ = ∫ d/dθ̂ (θ − θ̂)² p(θ|y) dθ = ∫ −2(θ − θ̂) p(θ|y) dθ
Setting the above equal to 0 and solving for θ̂:

∫ (θ − θ̂) p(θ|y) dθ = 0  ⇒  ∫ θ p(θ|y) dθ = θ̂ ∫ p(θ|y) dθ  ⇒  θ̂ = E(θ|y)    (5.3)
Thus E(θ|y) is a critical point. To check that it is a minimum, take the 2nd derivative:

d/dθ̂ ∫ −2(θ − θ̂) p(θ|y) dθ = 2 ∫ p(θ|y) dθ = 2 > 0

Thus E(θ|y) is a minimum.
Example A. Let θ be the average amount students in Edinburgh will spend on entertainment this month.
In this case the state space is a range of positive numbers, e.g., Θ = [£1, £1000]. A random sample of
n students will be taken next month and the data are the amounts spent by each sampled student, y =
y1, . . . , yn. The "action" or decision is to estimate θ based on the data, a = θ̂(y). The sampling distribution of
the yi is assumed Normal(θ, 50²). The prior for θ is Normal(30, 50²/5)³. A random sample of n = 100 students
was selected and the average spent was £42. The posterior mean⁴ is the Bayes Estimator for θ, namely,
41.43.
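The update behind footnote 4 takes only a few lines. This is a generic Normal-Normal conjugate calculation (an illustrative sketch, not code from the notes), treating the Normal(30, 50²/5) prior as worth κ₀ = 5 prior observations:

```python
# Normal-Normal conjugate update for Example A (sampling sd 50, known).
sigma2 = 50 ** 2
kappa0, mu0 = 5, 30              # prior acts like 5 observations with mean 30
n, ybar = 100, 42

post_mean = (kappa0 * mu0 + n * ybar) / (kappa0 + n)   # Bayes estimator, squared error loss
post_sd = (sigma2 / (kappa0 + n)) ** 0.5
print(round(post_mean, 2), round(post_sd, 2))          # 41.43 4.88
```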
Absolute error loss
Absolute error loss= L(θˆ|θ)=|θ − θˆ|. Here again we consider the continuous case. The Bayes Estimator, θˆ,
for minimizing expected absolute error loss is the posterior median, denoted θ0.5.
Proof:
First re-express the Risk as follows:

E[|θ − θ̂| | y] = ∫_{−∞}^{∞} |θ − θ̂| p(θ|y) dθ = ∫_{−∞}^{θ̂} (θ̂ − θ) p(θ|y) dθ + ∫_{θ̂}^{∞} (θ − θ̂) p(θ|y) dθ

= θ̂ ∫_{−∞}^{θ̂} p(θ|y) dθ − ∫_{−∞}^{θ̂} θ p(θ|y) dθ + ∫_{θ̂}^{∞} θ p(θ|y) dθ − θ̂ ∫_{θ̂}^{∞} p(θ|y) dθ

= θ̂ F_θ|y(θ̂) − ∫_{−∞}^{θ̂} θ p(θ|y) dθ + ∫_{θ̂}^{∞} θ p(θ|y) dθ − θ̂ (1 − F_θ|y(θ̂))

= 2θ̂ F_θ|y(θ̂) − θ̂ − ∫_{−∞}^{θ̂} θ p(θ|y) dθ + ∫_{θ̂}^{∞} θ p(θ|y) dθ
³This is not entirely realistic as θ must be non-negative.
⁴The posterior distribution is Normal((5·30 + 100·42)/(5 + 100), 50²/(5 + 100)), i.e., approximately Normal(41.43, 4.88²).
where F_θ|y(θ) is the cumulative posterior distribution function for p(θ|y).
Now differentiate with respect to θ̂⁵:

d/dθ̂ E[|θ − θ̂| | y] = 2F_θ|y(θ̂) + 2θ̂ p(θ̂) − 1 − θ̂ p(θ̂) − θ̂ p(θ̂) = 2F_θ|y(θ̂) − 1
Setting the above equal to 0,

2F_θ|y(θ̂) = 1  ⇒  F_θ|y(θ̂) = 1/2    (5.4)
Thus θ̂ = θ_0.5, the 50th percentile, is a critical value. To check that it is a minimum, take the 2nd derivative:

d/dθ̂ [ 2F_θ|y(θ̂) − 1 ] = 2p(θ̂)
where p(θ̂) is the probability density function, which is greater than or equal to 0. If greater than 0, then
θ̂ = median is a minimum. If equal to 0, then one needs to check further (we will not worry about this).
Example A (cont). Because the posterior distribution is normal, the median equals the mean; thus the
Bayes estimator under absolute error loss is again £41.43.
Exercise. Show that the Bayes estimator for the following loss function:

L(θ̂|θ) = (1 − τ)(θ̂ − θ) if θ̂ > θ,  and  τ(θ − θ̂) if θ̂ < θ,

where 0 < τ < 1, is θ_τ, the τth quantile.
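A quick numerical sanity check of the exercise (not a proof), assuming the triangle posterior on [2,8] from the running example: its cdf is F(θ) = (θ − 2)²/36, so the τth quantile is 2 + 6√τ, and minimising the risk numerically should recover it for any τ.

```python
import math

# For each tau, the grid point minimising the risk of the asymmetric loss
# should match the tau-th quantile of the triangle posterior, 2 + 6*sqrt(tau).
def pdf(t):
    return -1/9 + t/18

N = 600
thetas = [2 + 6 * i / N for i in range(N + 1)]
dx = 6 / N
cands = [2 + 6 * i / 300 for i in range(301)]

def minimiser(tau):
    def risk(th):
        return sum(((1 - tau) * (th - t) if th > t else tau * (t - th)) * pdf(t)
                   for t in thetas) * dx
    return min(cands, key=risk)

for tau in (0.25, 0.5, 0.9):
    print(tau, minimiser(tau), round(2 + 6 * math.sqrt(tau), 3))
```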
0-1 Loss

0-1 loss is defined as:

L(θ̂|θ) = 1 if θ̂ ≠ θ;  0 if θ̂ = θ

or, equivalently, L(θ̂|θ) = I(θ ≠ θ̂). We only consider the proof for the discrete case, but the results are the same for the
continuous case⁶. We want to minimize

E[L(θ̂|θ)] = Σ_{θ∈Θ} I(θ ≠ θ̂) p(θ|y)    (5.5)
Before giving the general result, suppose there are 3 values of θ: θ1, θ2, and θ3. Thus θ̂ can be chosen to be
one of these 3 values. The expected loss (risk) for each choice is then:

E[L(θ̂ = θ1|θ)] = 0·p(θ1|y) + 1·p(θ2|y) + 1·p(θ3|y) = p(θ2|y) + p(θ3|y) = 1 − p(θ1|y)

E[L(θ̂ = θ2|θ)] = 1·p(θ1|y) + 0·p(θ2|y) + 1·p(θ3|y) = p(θ1|y) + p(θ3|y) = 1 − p(θ2|y)

E[L(θ̂ = θ3|θ)] = 1·p(θ1|y) + 1·p(θ2|y) + 0·p(θ3|y) = p(θ1|y) + p(θ2|y) = 1 − p(θ3|y)
⁵Recall the Fundamental Theorem of Calculus: d/dx ∫_c^x f(t) dt = f(x), and similarly d/dx ∫_x^c f(t) dt = −f(x).
⁶The proof for the continuous case involves the Dirac delta function; or one can define an even more complicated loss
function which has 0-1 loss as a limiting case.
The value of θ̂ minimizing expected loss is the θi that has the largest probability, the most likely value or
the posterior mode⁷.
The more general solution: let Θ denote a finite set of possible parameter values. The risk is

E[L(θ̂|θ)] = Σ_{θ∈Θ} I(θ ≠ θ̂) p(θ|y) = 1 − p(θ̂|y)    (5.6)

Thus the risk is minimized by setting θ̂ equal to the θ with the largest probability, namely the mode.
More Examples

Example B. Binomial θ with a Beta prior. Given a Beta(α, β) prior for a Binomial(n, θ) sampling
distribution, the posterior distribution is Beta(α + y, β + n − y). Suppose α = 2, β = 3, n = 10, and y = 4.
Then the posterior is Beta(6,9). The Bayes estimators for the loss functions given above:

Quadratic loss (posterior mean): θ̂ = (α + y)/(α + β + n) = (2 + 4)/(2 + 3 + 10) = 0.4

Absolute error loss (posterior median): θ̂ = qbeta(0.5, shape1 = α + y, shape2 = β + n − y) = qbeta(0.5, 6, 9) = 0.395

0-1 loss (posterior mode): θ̂ = (α + y − 1)/(α + β + n − 2) = (2 + 4 − 1)/(2 + 3 + 10 − 2) = 0.385
The three values are shown in Figure 5.4.
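These three summaries can be checked with any Beta implementation. The standard-library Python sketch below (illustrative only; qbeta in R does the median directly) gets the Beta(6,9) median by integrating the density numerically:

```python
import math

# Beta(6, 9) posterior: mean and mode in closed form; median via numerical cdf.
a, b = 6, 9
const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

def pdf(x):
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

mean = a / (a + b)                 # 0.4
mode = (a - 1) / (a + b - 2)       # 5/13 ~ 0.385

def cdf(x, n=2000):                # trapezoid rule on [0, x]
    h = x / n
    return h * (sum(pdf(i * h) for i in range(1, n)) + (pdf(0) + pdf(x)) / 2)

lo, hi = 0.0, 1.0                  # bisection for cdf(median) = 0.5
for _ in range(40):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cdf(mid) < 0.5 else (lo, mid)
median = (lo + hi) / 2
print(round(mean, 3), round(median, 3), round(mode, 3))
```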
Figure 5.4: Posterior distribution for Binomial θ, Beta(6,9), with posterior mean, median, and mode.
⁷Note the Bayes estimate would not be unique if there were two or more modes of equal, largest value.
Example C. Poisson θ with a Gamma prior. As shown previously, given a Gamma(α, β) prior for a
Poisson(θ) sampling distribution, the posterior distribution (given n = 1) is Gamma(α + y, β + 1). Suppose
α = 5, β = 3, and y = 3. The posterior for θ is Gamma(8,4). The Bayes estimators for the loss functions
given above:

Quadratic loss (posterior mean): θ̂ = (α + y)/(β + 1) = (5 + 3)/(3 + 1) = 2

Absolute error loss (posterior median): θ̂ = qgamma(0.5, shape = α + y, rate = β + 1) = qgamma(0.5, 8, 4) = 1.917

0-1 loss (posterior mode): θ̂ = (α + y − 1)/(β + 1) = (5 + 3 − 1)/(3 + 1) = 1.75
Values are shown in Figure 5.5.
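The same numerical check works for the Gamma posterior (again an illustrative standard-library sketch; qgamma in R gives the median directly):

```python
import math

# Gamma(shape=8, rate=4) posterior: mean and mode in closed form; median numerically.
a, rate = 8, 4
const = rate ** a / math.gamma(a)

def pdf(x):
    return const * x ** (a - 1) * math.exp(-rate * x)

mean = a / rate                    # 2.0
mode = (a - 1) / rate              # 1.75

def cdf(x, n=4000):                # trapezoid rule on [0, x]
    h = x / n
    return h * (sum(pdf(i * h) for i in range(1, n)) + (pdf(0) + pdf(x)) / 2)

lo, hi = 0.0, 10.0                 # bisection for cdf(median) = 0.5
for _ in range(50):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cdf(mid) < 0.5 else (lo, mid)
median = (lo + hi) / 2
print(mean, round(median, 3), mode)
```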
Figure 5.5: Posterior distribution for Poisson θ, Gamma(8,4), with posterior mean, median, and mode.
5.1.2 Interval estimates
Analogous to frequentist Confidence Intervals are Bayesian Credible Intervals, but the latter are more easily
interpreted. To begin we consider the situation of a single parameter, θ, and then discuss intervals,
or "regions", for two or more parameters.
For a given probability P, where P is often expressed as 1 − α with 0 < α < 1, a 100P% Bayesian credible
interval is defined as an interval [LB, UB] where

∫_{LB}^{UB} p(θ|y) dθ = P    (5.7)

Suppose α equals 0.05; then P = 1 − 0.05 = 0.95, and a 95% credible interval is given by values LB and UB
such that

∫_{LB}^{UB} p(θ|y) dθ = 0.95
Similar to one-sided confidence bounds, one can define lower credible bounds, [LB, ∞), and upper credible
bounds, (−∞, UB].
Example

Binomial θ with Beta prior. Suppose the prior for θ is Beta(2,3) and for n = 10 Bernoulli trials the
observed number of successes is y = 4. Then the posterior for θ is Beta(6,9). A 95% credible interval will be
any combination of LB and UB such that:

∫_{LB}^{UB} [Γ(α+β)/(Γ(α)Γ(β))] θ^(α−1) (1−θ)^(β−1) dθ = ∫_{LB}^{UB} [Γ(6+9)/(Γ(6)Γ(9))] θ^(6−1) (1−θ)^(9−1) dθ = 0.95

Note that these bounds are not unique. For example, the following three intervals are all 95% credible
intervals (see Figure 5.6):

[0.177, 0.649]  [0.146, 0.623]  [0.196, 0.692]
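One can confirm numerically that each interval holds (about) 95% posterior mass. A standard-library sketch using the Beta(6,9) density (the rounded endpoints mean the masses agree only to about two decimals):

```python
import math

# Posterior mass of Beta(6, 9) over each interval, via trapezoid rule.
a, b = 6, 9
const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))

def pdf(x):
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def mass(lo, hi, n=4000):
    h = (hi - lo) / n
    return h * (sum(pdf(lo + i * h) for i in range(1, n)) + (pdf(lo) + pdf(hi)) / 2)

intervals = [(0.177, 0.649), (0.146, 0.623), (0.196, 0.692)]
masses = [mass(lo, hi) for lo, hi in intervals]
print([round(m, 3) for m in masses])
```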
Figure 5.6: Three 95% credible intervals for Binomial θ, with posterior Beta(6,9).
Bimodal posteriors. If the posterior distribution is bimodal, then a credible interval composed of two
line segments may be more sensible. (Draw a picture.)
Symmetric credible intervals
A less arbitrary approach to constructing credible intervals is to use symmetric intervals, where the probability
1 − P, or α, is divided evenly between the tails:

∫_{LB_sym}^{UB_sym} p(θ|y) dθ = P    (5.8)

where Pr(θ ≤ LB_sym) = Pr(θ ≥ UB_sym) = α/2.
• One advantage of the symmetric approach is that for monotonic increasing transformations of θ, g(θ), the 1−α
symmetric interval for g(θ) is [g(LB), g(UB)], where [LB, UB] are the symmetric bounds for θ.
Referring to the previous Beta example: a 95% symmetric credible interval is [0.177, 0.649], and is shown in
Figure 5.7.
Figure 5.7: Symmetric 95% credible interval for Binomial θ, with posterior Beta(6,9).
Highest Posterior Density Intervals

Another non-arbitrary approach is Highest Posterior Density Intervals (HPDIs). These are 1−α intervals
where the densities (or probabilities) for values in the interval are higher than for values not in the interval.
Another way to say this: "it is the interval with the shortest interval width, U − L, while maintaining
appropriate coverage" (Reich and Ghosh, p 268). Formally: the interval [LB, UB] is the 1−α HPDI if

1. [LB, UB] is a 1−α credible interval.
2. For all θ′ ∈ [LB, UB] and all θ′′ ∉ [LB, UB], p(θ′|y) ≥ p(θ′′|y).

• This will yield the shortest 1−α credible interval⁸.
• If the posterior is unimodal and symmetric, the symmetric interval and the HPDI are the same.
• If the posterior is multimodal, the HPDI may consist of two or more line segments.
• In contrast to symmetric intervals, HPDIs are not invariant to monotonic transformations.

⁸Although bimodal distributions may have two non-overlapping intervals that are shorter in total length than a single
interval.
Software for HPDI calculations. The R package HDInterval has a function hdi that can be used to
calculate HPDIs for a variety of situations. Below are two examples.
Demonstration with Binomial.
library(HDInterval)  # provides the hdi() function
#Jeffreys prior
alpha.prior <- 0.5
beta.prior <- 0.5
#Sampling distribution is Binomial(n=5,theta)
set.seed(1922)
n <- 5
true.theta <- 0.7
y <- rbinom(n=1,size=n,prob=true.theta)
cat("Obs'd data=",y,"\n")
# Obs'd data= 4
#posterior Beta(a+y, b+n-y)
alpha.post <- alpha.prior + y
beta.post <- beta.prior + n - y
cat("posterior alpha=",alpha.post,"beta=",beta.post,"\n")
#posterior alpha= 4.5 beta= 1.5
x <- seq(0.01,0.99,by=0.01)
plot(x,dbeta(x,alpha.post,beta.post),xlab=expression(theta),
ylab="",main="Beta Posterior",type="l")
# symmetric and HPDI 99% credible intervals
credible.level <- 0.99
alpha.level <- 1-credible.level
sym.interval <- qbeta(c(alpha.level/2, 1-alpha.level/2),
shape1=alpha.post,shape2=beta.post)
cat("Symmetric interval",sym.interval[1],sym.interval[2],
"length=",diff(sym.interval),"\n")
# HPDI
hpdi.interval <- hdi(object=qbeta,credMass=credible.level,
shape1=alpha.post,shape2=beta.post)
cat("HPD interval",hpdi.interval["lower"],hpdi.interval["upper"],
"length=",diff(hpdi.interval),"\n")
Note that the HPD interval is never longer than the symmetric interval, and for a skewed posterior such as this one the two can differ noticeably.
Demonstration with Poisson.
# Prior is Gamma(25.0, 2.5) to yield a
# prior mean of 10 and a CV=0.2
shape.prior <- 25.0
rate.prior <- 2.5
set.seed(128)
# Sampling distribution
true.theta <- 3
n <- 5
y <- rpois(n=n,lambda=true.theta)
print(y)
# [1] 4 6 4 5 6
# Posterior distribution
shape.post <- shape.prior+sum(y)
rate.post <- rate.prior+n
#posterior shape= 50 rate= 7.5
x <- seq(0.1,12,by=0.01)
plot(x,dgamma(x,shape.post,rate.post),type="l")
# symmetric and HPDI 90% credible intervals
credible.level <- 0.90
alpha.level <- 1-credible.level
sym.interval <- qgamma(c(alpha.level/2, 1-alpha.level/2),
shape=shape.post,rate=rate.post)
cat("Symmetric interval",sym.interval[1],sym.interval[2],
"length=",diff(sym.interval),"\n")
# Symmetric interval 5.195298 8.289474 length= 3.094177
# HPDI
hpdi.interval <- hdi(object=qgamma,credMass=credible.level,
shape=shape.post,rate=rate.post)
cat("HPD interval",hpdi.interval["lower"],hpdi.interval["upper"],
"length=",diff(hpdi.interval),"\n")
# HPD interval 5.113741 8.194088 length= 3.080347
In this case the symmetric and HPDI interval endpoints differ slightly but the lengths are nearly identical.
Credible regions
Given two or more parameters, 1−α credible regions can be defined as well. There are a variety of ways to
construct such regions and the construction can be quite complex. As a simple example suppose that there
are two parameters, thus the joint posterior is a surface over the (θ1, θ2) plane. A rectangular region can be
defined with the four corners being [(LB1, LB2), (LB1, UB2), (UB1, LB2), (UB1, UB2)] such that

∫_{LB2}^{UB2} ∫_{LB1}^{UB1} p(θ1, θ2|y) dθ1 dθ2 = 1 − α

The region need not be rectangular, however. An HPD region can be calculated as well, generally by
numerical methods, and it will usually not be rectangular. (Draw a picture.)
Contrast with frequentist intervals
Often, particularly if the data dominate the prior, 1−α frequentist confidence intervals and Bayesian credible
intervals can be quite similar. Their interpretations are quite different, however.

For example, suppose the sampling model is Normal(θ, 1) and the prior for θ is Normal(0, 100). One then takes
a random sample of n = 10 observations and the sample average ȳ is 1.54. The posterior distribution is
Normal(1.540, 0.0999) (we will show why this is in a later lecture).
• Frequentist 95% confidence interval:

[ȳ − 1.96√(1/n), ȳ + 1.96√(1/n)] = [0.922, 2.162]
All randomness is in terms of the data as θ is treated as a fixed but unknown quantity.
The interpretation of a 95% confidence interval is that if such intervals are constructed repeatedly
based on repeated random samples of the data, the interval will include the unknown parameter θ 95%
of the time.
For this given sample the interval either contains θ or it doesn’t. Once the sample has been observed
there is no more randomness, and one cannot make a probability statement about an observed event
after the fact. That’s like rolling a die and seeing a 5 and then trying to say there’s a 90% probability
that it is a 3.
• Bayesian 95% credible interval (the symmetric interval and HPDI are the same here), letting θ_q|y denote the qth
quantile of the Normal(1.540, 0.0999) posterior:

[θ_0.025|y, θ_0.975|y] = [0.921, 2.160]
Thus the credible interval is quite similar to the confidence interval.
However, the interpretation of the credible interval is that there is a 95% probability that this particular
interval contains the unknown parameter θ.
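The two intervals can be reproduced in a few lines (an illustrative sketch; tiny differences from the figures in the text are due to rounding ȳ to 1.54):

```python
import math

# Frequentist 95% CI and Bayesian 95% credible interval for the Normal example:
# sampling sd 1, n = 10, ybar = 1.54, prior Normal(0, 100).
n, ybar, z = 10, 1.54, 1.96

freq = (ybar - z * math.sqrt(1 / n), ybar + z * math.sqrt(1 / n))

post_prec = 1 / 100 + n / 1              # prior precision + data precision
post_mean = (n * ybar) / post_prec       # prior mean 0 contributes nothing
post_sd = math.sqrt(1 / post_prec)
cred = (post_mean - z * post_sd, post_mean + z * post_sd)

print([round(v, 2) for v in freq], [round(v, 2) for v in cred])
```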
5.1.3 Other summaries
Standard numerical output. Numerical summaries of posterior distributions are routinely produced by
software (e.g., JAGS). These include:
Parameter Min 1st q Median Mean 3rd q Max StdDev
θ1 3 12 21 19 31 45 8
θ2 83 101 122 126 154 198 27
Posterior probabilities for specific events. Given a posterior density one can calculate many quantitative
summaries, some of which could be quite complex. For example, if one has a joint posterior density for
two parameters, θ1 and θ2, one can calculate:
Pr(θ1 + θ2 > 7)
Pr[(2 ≤ θ1 ≤ 4) ∩ (1 ≤ θ2)]
Pr(exp(θ1) < 10)
Posterior distributions for functions of parameters. The Schaefer surplus production model is used
to model the biomass of a harvested fish stock. Suppose that the sampling model is

B_{t+1} ~ Lognormal( ln[ B_t + r_max·B_t(1 − B_t/K) − C_t ], σ² )
where B_t is fish biomass in year t, C_t is the catch, and r_max and K are unknown parameters. Suppose one
has n years of catch and biomass data, carries out a Bayesian analysis of the unknown parameters, and
arrives at a joint posterior distribution for r_max and K. The maximum sustainable yield (the maximum
harvest that can be taken in "perpetuity") is calculated by

MSY = r_max·K / 4

Given the joint posterior distribution for r_max and K (or a sample from it), one can calculate a posterior
distribution for MSY.
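With posterior draws in hand, a derived quantity such as MSY is computed draw by draw. The joint posterior sample below is simulated for illustration only (hypothetical Normal draws, not a real fit):

```python
import random

# Map a joint posterior sample for (rmax, K) to a posterior sample for MSY = rmax*K/4.
random.seed(1)
draws = [(random.gauss(0.4, 0.05), random.gauss(1000, 100)) for _ in range(10000)]
msy = sorted(r * k / 4 for r, k in draws)

msy_mean = sum(msy) / len(msy)
ci95 = (msy[249], msy[9749])             # symmetric 95% credible interval
print(round(msy_mean, 1), round(ci95[0], 1), round(ci95[1], 1))
```

Any posterior summary of MSY (mean, median, credible interval) then comes straight from this derived sample.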
5.2 R Code for Loss Function Plots
# 4 Different Loss Functions:
# Squared Error
L1 <- function(theta,theta.hat) {
out <- (theta-theta.hat)^2; return(out)
}
# Absolute Error
L2 <- function(theta,theta.hat) {
out <- abs(theta-theta.hat); return(out)
}
# Asymmetric Error
L3 <- function(theta,theta.hat,w) {
out <- rep(NA_real_,length(theta.hat))  # one loss value per theta.hat
ok <- theta >= theta.hat
out[ok] <- (1-w)*(theta-theta.hat[ok])
out[!ok] <- w*(theta.hat[!ok]-theta)
return(out)
}
# 0-1 Loss
L4 <- function(theta,theta.hat) {
out <- ifelse(theta==theta.hat,0,1); return(out)
}
theta <- 5; w <- 0.3
lower <- 2; upper <- 8
theta.hat <- seq(lower,upper,by=0.1)
L1.example <- L1(theta,theta.hat)
L2.example <- L2(theta,theta.hat)
L3.example <- L3(theta,theta.hat,w=w)
L4.example <- L4(theta,theta.hat)
my.ylim <- range(c(L1.example,L2.example,L3.example))
my.lwd <- 2
plot(theta.hat,L1.example,type="l",lty=1,
col=1,ylim=my.ylim,lwd=my.lwd,ylab="",xlab=expression(hat(theta)),
main=paste("Different loss functions for theta=",theta))
lines(theta.hat,L2.example,lty=2,col=2,lwd=my.lwd)
lines(theta.hat,L3.example,lty=3,col=3,lwd=my.lwd)
lines(theta.hat,L4.example,lty=4,col=4,lwd=my.lwd)
legend("top",legend=c("Squared Error","Absolute Error",
"Asymmetric Abs Error","0-1 Loss"),
lty=1:4,col=1:4,lwd=my.lwd)
5.3 R Code for Risk Function Plots
5.3.1 Triangle pdf code
# Calculate triangle pdf parameters and pdf values
triangle.pdf <- function(theta,lower,upper,verbose=FALSE) {
b1 <- 2/(upper-lower)^2
b0 <- -b1*lower
if(verbose) {
cat("b0=",b0,"b1=",b1,"\n")
}
out <- list(pdf=b0+b1*theta,b0=b0,b1=b1)
return(out)
}
# Plot the Triangle pdf (and calculate hyper-parameters)
lower <- 2; upper <- 8
theta.hat <- seq(lower,upper,by=0.1)
true.theta <- theta.hat
prob.theta <- triangle.pdf(theta=true.theta,lower=lower,upper=upper,verbose=TRUE)
b0 <- prob.theta$b0
b1 <- prob.theta$b1
plot(true.theta,prob.theta$pdf,type="l",xlab=expression(theta),
ylab="",main=expression(paste("Triangle pdf")))
5.3.2 Risk function code
# Function to calculate the Risk for 4 loss functions
risk.fun <- function(theta.hat,b0,b1,lower,upper,loss.choice=1,
w=NULL,verbose=FALSE) {
n <- length(theta.hat)
out <- numeric(n)
for(i in 1:n) {
theta.hat.val <- theta.hat[i]
switch(as.character(loss.choice),
#squared error
"1"= {
integrand <-function(theta,theta.hat.val,b0,b1,w) {
x <- (b0+b1*theta)*(theta-theta.hat.val)^2
return(x)
}
},
#absolute error
"2"= {
integrand <-function(theta,theta.hat.val,b0,b1,w) {
x <- (b0+b1*theta)*abs(theta-theta.hat.val)
return(x)
}
},
# asymmetric absolute error
"3"= {
integrand <-function(theta,theta.hat.val,b0,b1,w) {
x <- numeric(length(theta))
ok <- theta >= theta.hat.val
x[ok] <- (1-w)*(b0+b1*theta[ok])*(theta[ok]-theta.hat.val)
x[!ok] <- w*(b0+b1*theta[!ok])*(theta.hat.val-theta[!ok])
# x <- 2*x # this will scale to the absolute error loss function
return(x)
}
}
)
if(loss.choice !=4) {
out[i]<- integrate(f=integrand,lower=lower,upper=upper,
theta.hat.val=theta.hat.val,b0=b0,b1=b1,w=w)$value
} else {
# 0-1 Loss function
out[i] <- 1-(b0+b1*theta.hat.val)
}
} #end of loop over different values of theta.hat
return(out)
}
5.3.3 Risk function plots
#---Now Calculate and plot Risk functions
w <- 0.3
R1 <- risk.fun(theta.hat,b0=b0,b1=b1,lower=lower,upper=upper,loss.choice=1)
R2 <- risk.fun(theta.hat,b0=b0,b1=b1,lower=lower,upper=upper,loss.choice=2)
R3 <- risk.fun(theta.hat,b0=b0,b1=b1,lower=lower,upper=upper,loss.choice=3,w=w)
R4 <- risk.fun(theta.hat,b0=b0,b1=b1,lower=lower,upper=upper,loss.choice=4)
my.ylim <- range(c(0,R1,R2,R3,R4))
plot(theta.hat,R1,type="l",xlab=expression(hat(theta)),ylab="",
main="Different risk functions",ylim=my.ylim,lwd=my.lwd)
x <- which(R1==min(R1))
Bayes.L1 <- theta.hat[x]
segments(Bayes.L1,0,Bayes.L1,R1[x],col=1,lwd=my.lwd)
cat("Bayes estimator Sq Err=",Bayes.L1,"\n")
# Bayes estimator Sq Err= 6
lines(theta.hat,R2,lty=2,col=2,lwd=my.lwd)
x <- which(R2==min(R2))
Bayes.L2 <- theta.hat[x]
segments(Bayes.L2,0,Bayes.L2,R2[x],col=2,lty=2,lwd=my.lwd)
cat("Bayes estimator abs error=",Bayes.L2,"\n")
# Bayes estimator abs error= 6.2
lines(theta.hat,R3,lty=3,col=3,lwd=my.lwd)
x <- which(R3==min(R3))
Bayes.L3 <- theta.hat[x]
segments(Bayes.L3,0,Bayes.L3,R3[x],col=3,lty=3,lwd=my.lwd)
cat("Bayes estimator quantile=",Bayes.L3,"\n")
# Bayes estimator quantile= 7
lines(theta.hat,R4,lty=4,col=4,lwd=my.lwd)
x <- which(R4==min(R4))
Bayes.L4 <- theta.hat[x]
segments(Bayes.L4,0,Bayes.L4,R4[x],col=4,lty=4,lwd=my.lwd)
cat("Bayes estimator 0-1=",Bayes.L4,"\n")
# Bayes estimator 0-1= 8
legend("top",legend=c("Squared Error","Absolute Error","Asymmetric Abs Error",
"0-1 Loss"),
lty=1:4,col=c(1,2,3,4),lwd=my.lwd)
legend("topright",legend=c(paste("Sq err BE=",Bayes.L1),
paste("Abs err BE=",Bayes.L2),
paste("Asym err w=",w,"BE=",Bayes.L3),
paste("0-1 Loss BE=",Bayes.L4)),
lty=1:4,col=1:4,lwd=my.lwd)