MAST20005/MAST90058: Week 10 Solutions
1. The sign test is the only test we can do with these summary data. Let Z = freq(X ≤ 40); then
H0 : m = 40, H1 : m ≠ 40 and Z ∼ Bi(100, 0.5) ≈ N(50, 25). Using the exact binomial distribution
against a two-sided alternative, with α not allowed to exceed 0.05, we note that
2*pbinom(39,100,0.5) = 0.0352 but 2*pbinom(40,100,0.5) = 0.0569, and so we decide to reject H0
unless 40 ≤ Z ≤ 60; the observed z = 37, so we reject H0. For the normal approximation the
p-value is 2*pnorm(37.5,50,5) = 0.0124 with continuity correction, so again we reject at the
nominal α = 0.05 level.
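The critical values and p-values above can be reproduced in R from the summary counts alone:

    2 * pbinom(39, 100, 0.5)   # 0.0352: rejecting for Z <= 39 (or Z >= 61) keeps the size below 0.05
    2 * pbinom(40, 100, 0.5)   # 0.0569: rejecting also for Z = 40 (or Z = 60) would exceed 0.05
    2 * pbinom(37, 100, 0.5)   # exact two-sided p-value for the observed z = 37
    2 * pnorm(37.5, 50, 5)     # 0.0124: normal approximation with continuity correction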
2. In a truly random sequence of digits, the probability of the next digit being the same
as the preceding one is 1/10, and the probability of the next one differing by 1 from the
preceding one is 2/10 (this holds for every digit because we linked 0 and 9 together, so each
digit has two neighbours). The null hypothesis is:
H0 : p1 = 0.1, p2 = 0.2, p3 = 0.7
Suppose we obtain the following observed counts:
Category        Observed   Expected
Same               0       50 × 0.1 = 5
Differ by 1        8       50 × 0.2 = 10
Other             42       50 × 0.7 = 35
The chi-squared statistic is:
(0 − 5)²/5 + (8 − 10)²/10 + (42 − 35)²/35 = 6.8 > 5.991 (0.95 quantile of χ²₂)
Thus we conclude that the string of 51 digits is unlikely to have been randomly generated.
In your groups it is likely, judging from past years' experience, that you will have a low
count for the 'same' category, as people are reluctant to repeat digits when attempting to
mimic randomness. As this category has a small expected count, its residual is large even for
small deviations, so this cell is likely to have contributed most to your test statistic.
Most attempts at randomness in the past have failed mainly for this reason.
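The same test can be run directly in R with chisq.test:

    obs <- c(0, 8, 42)                       # same, differ by 1, other
    chisq.test(obs, p = c(0.1, 0.2, 0.7))    # X-squared = 6.8, df = 2, p-value approx 0.033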
3. (a) Note W+ + W− = 1 + 2 + · · · + n = n(n + 1)/2 and W = W+ − W− = V − W−. So
W = V − W− = V − [n(n + 1)/2 − V] = 2V − n(n + 1)/2.
Hence
V = [W + n(n + 1)/2] / 2.
(b) Define the sum we seek as Sn ≡ 1² + 2² + · · · + n².
i. First note that ∑_{i=1}^n [(i + 1)³ − i³] = (2³ − 1) + (3³ − 2³) + · · · + (n³ − (n − 1)³) +
((n + 1)³ − n³) = (n + 1)³ − 1 by cancellation of telescoping terms. Secondly:
∑_{i=1}^n [(i + 1)³ − i³] = ∑_{i=1}^n [i³ + 3i² + 3i + 1 − i³] = 3Sn + 3·n(n + 1)/2 + n
Equating these two expressions gives:
(n + 1)³ − 1 = 3Sn + (3/2)n(n + 1) + n
and elementary algebra gives the required formula.
ii. By induction: we first verify the formula for n = 1, namely S1 = 1² = 1 = (1/6)·1·(1 + 1)·(2·1 + 1),
which is true. We then assume the formula is true for n and prove it holds for (n + 1) as follows:
1² + 2² + · · · + n² + (n + 1)² = Sn + (n + 1)²
= (1/6)n(n + 1)(2n + 1) + (n + 1)²
= (1/6)(n + 1)[n(2n + 1) + 6(n + 1)]
= (1/6)(n + 1)[2n² + 7n + 6]
= (1/6)(n + 1)[(n + 2)(2n + 3)]
= (1/6)(n + 1)((n + 1) + 1)(2(n + 1) + 1) = Sn+1.
So by induction the formula holds for all n.
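A one-line numerical sanity check of the formula in R, for say n = 10:

    n <- 10
    sum((1:n)^2) == n * (n + 1) * (2 * n + 1) / 6   # TRUE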
(c) Finally we have:
E(V) = [E(W) + n(n + 1)/2] / 2 = n(n + 1)/4
var(V) = var(W)/4 = n(n + 1)(2n + 1)/24
V, like W, can be approximated by a normal distribution, since it is a linear function of W.
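These moments can be checked against R's exact null distribution of V (dsignrank); for n = 10
they should be 27.5 and 96.25:

    n <- 10
    v <- 0:(n * (n + 1) / 2)      # possible values of V
    p <- dsignrank(v, n)          # exact null probabilities
    m <- sum(v * p); m            # 27.5  = n(n+1)/4
    sum((v - m)^2 * p)            # 96.25 = n(n+1)(2n+1)/24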
4. (a) Given X =d Y (equal in distribution), and assuming continuous distributions so that ties
have probability 0, we have P(X > Y) = P(Y > X) = P(X < Y) = 1 − P(X > Y), which implies
P(D > 0) = P(X − Y > 0) = P(X > Y) = 1/2, so mD = 0.
(b) Given X =d Y we have D = X − Y =d Y − X = −(X − Y) = −D, so the pdf of D must be
symmetric around 0. Alternatively, FD(d) = P(D < d) = P(−D < d) = P(D > −d) = 1 − FD(−d), so
differentiating gives fD(d) = −fD(−d) × (−1) = fD(−d), which is a definition of symmetry.
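A quick simulation illustration in R, using independent standard normal X and Y as one
arbitrary case of X =d Y (the distribution choice is purely illustrative):

    set.seed(1)
    x <- rnorm(1e5); y <- rnorm(1e5)   # any identically distributed pair would do
    d <- x - y
    mean(d > 0)                        # close to 1/2, consistent with m_D = 0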
5. Let our sign test statistic be N = number of Xi < 0 ∼ Binomial(10, 0.5). Note that
2*pbinom(2, 10, 0.5) = 0.109 > 0.05 and 2*pbinom(1, 10, 0.5) = 0.0215 < 0.05. So we fail
to reject with the sign test if 2 ≤ N ≤ 8.
The signed rank test will reject if |W| ≥ c. To find an extreme value of W for a data set
for which the sign test accepts H0, let's make the 8 largest ranks positive and the only
negative ranks −1 and −2.
This gives W = 49. We need to convert to V to use the psignrank command in R.
V = (W + n(n + 1)/2)/2 = (49 + 55)/2 = 52. So:
p-value = 2 Pr(V ≥ 52) = 2(1 − Pr(V ≤ 51)) = 2(1 − psignrank(51, 10)) = 0.009765
To verify, note that all 2^10 possible allocations of signs to the ranks are equally likely,
each with probability 2^(−10). So we just need to count up all sign allocations resulting in
a value of |W| ≥ 49. The sets for W ≥ 49 are as follows (where we just list the negative
ranks in the set and the corresponding W value):
{none negative} with W=55
{-1} with W=53
{-2} with W=51
{-3} with W=49
{-1,-2} with W=49
Clearly W is symmetric around 0, so there are another 5 outcomes with W ≤ −49, hence
p-value = 2 × 5/2^10 = 0.009765 as before.
The signed rank test is rejecting at better than the 1% level so we can probably afford
to add in more small negative ranks. If we have {−1,−2,−3}, then V = 49 with
p-value = 2*(1 − psignrank(48, 10)) = 0.0273. Trying a fourth gives {−1,−2,−3,−4} with
V = 45 and p-value = 2*(1 − psignrank(44, 10)) = 0.0839, which finally fails to reject.
So {−1,−2,−3} gives rejection for signed rank with p-value= 0.0273 but acceptance for
sign test at p-value=2 ∗ pbinom(3, 10, 0.5) = 0.34.
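The R calculations used above, collected in one place:

    # Sign test for n = 10
    2 * pbinom(2, 10, 0.5)        # 0.109: a rejection region {N <= 2 or N >= 8} would have size > 0.05
    2 * pbinom(1, 10, 0.5)        # 0.0215: so reject only if N <= 1 or N >= 9
    # Signed-rank p-values via V = (W + 55)/2
    2 * (1 - psignrank(51, 10))   # 0.00977 for negative ranks {1, 2}       (V = 52)
    2 * (1 - psignrank(48, 10))   # 0.0273  for negative ranks {1, 2, 3}    (V = 49)
    2 * (1 - psignrank(44, 10))   # 0.0839  for negative ranks {1, 2, 3, 4} (V = 45)
    # Sign test p-value for the data set with three negative ranks
    2 * pbinom(3, 10, 0.5)        # 0.344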
6. (a) Given the dependence of the before and after observations here, we must work with
the differences. If the distributions before and after are the same, then the median of the
differences will be 0. One pair is tied, and we must have only two categories (positive or
negative differences) as we are modelling the data with a Binomial distribution, so we
ignore that pair, giving n = 9. We have H0 : mD = 0, test statistic Z = number of positive
differences (expected to be high if the treatment works), Z ∼ Bi(9, 0.5), observed z = 7, so
the p-value is 1 − pbinom(6, 9, 0.5) = 0.0898 and we do not reject H0.
(b) The null distribution of the signed rank test assumes that differences are either
positive or negative, so we again ignore the tie and reduce n to 9. We know from Q4 that
the differences have a symmetric distribution. The differences (in the order the pairs are
listed) are (2, −3, 6, 8, 5, 7, 4, 9, −1), with V = 41. The p-value is
1 − psignrank(40, 9) = 0.0137, so the signed rank test rejects at the α = 0.05 level.
(c) The signed rank test is more likely to reject H0 when it is false than the sign test,
as it uses not just the sign but also takes account of the magnitude of the differences;
using more information gives it more power, provided the additional assumption of symmetry
is met.
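Both calculations can be reproduced in R from the nine non-zero differences:

    d <- c(2, -3, 6, 8, 5, 7, 4, 9, -1)
    1 - pbinom(6, 9, 0.5)                       # sign test p-value, 0.0898, as in (a)
    binom.test(sum(d > 0), length(d), alternative = "greater")   # equivalent exact sign test
    wilcox.test(d, alternative = "greater")     # signed-rank test: V = 41, p-value approx 0.0137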
7. (a) The table is:
i xi xi −m Rank Sign
1 41.195 1.195 5 1
2 39.485 −0.515 1 −1
3 41.229 1.229 6 1
4 36.840 −3.160 10 −1
5 38.050 −1.950 8.5 −1
6 40.890 0.890 4 1
7 38.345 −1.655 7 −1
8 34.930 −5.070 11 −1
9 39.245 −0.755 2 −1
10 31.031 −8.969 12 −1
11 40.780 0.780 3 1
12 38.050 −1.950 8.5 −1
13 30.906 −9.094 13 −1
and
W = 5 − 1 + 6 − 10 − 8.5 + 4 − 7 − 11 − 2 − 12 + 3 − 8.5 − 13 = −55.
Recall that
E(W) = 0,  var(W) = n(n + 1)(2n + 1)/6 = 819,
so
z = −55/√819 = −1.922,
which is less than −1.645 (0.05 quantile of a standard normal distribution), so we
reject H0 at a 5% level of significance.
(b) Bounding z against known quantiles of the standard normal distribution gives
−1.96 < z < −1.645. Therefore, we deduce that 0.025 < p-value < 0.05.
(c) There are 4 positive signs. Therefore, with Y ∼ Bi(13, 0.5), the p-value is
Pr(Y ⩽ 4 | p = 0.5) = 0.1334.
This is greater than α so we cannot reject H0.
The null hypothesis is rejected using the signed-rank test but cannot be rejected
using the sign test.
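A sketch of these calculations in R, taking the hypothesized median to be m = 40 (the value
implied by the xi − m column in the table above):

    x <- c(41.195, 39.485, 41.229, 36.840, 38.050, 40.890, 38.345,
           34.930, 39.245, 31.031, 40.780, 38.050, 30.906)
    d <- x - 40
    n <- length(x)
    W <- sum(sign(d) * rank(abs(d)))               # -55 (ties get average ranks)
    z <- W / sqrt(n * (n + 1) * (2 * n + 1) / 6)   # -1.922
    pnorm(z)                                       # one-sided p-value, between 0.025 and 0.05
    pbinom(sum(d > 0), n, 0.5)                     # sign test: Pr(Y <= 4) = 0.1334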
8. (a) i. As under H0 we have X =d Y =d C, say, we can view the combined sample as a
sample of size (nX + nY) from the common distribution C. To designate which Ck are to be
Y's (and hence determine the Ri) we must choose a subset of size nY from the C's, such that
all subsets are equally likely (this follows from the equivalence of the X's and Y's).
There are (nX + nY choose nY) such subsets.
ii. If the Yi are all smaller than all of the X's then the minimum value of WY is
1 + 2 + · · · + nY = nY(nY + 1)/2. The maximum value is obtained when the Yi are all larger
than the X's, which adds precisely nX to every one of the ranks in the minimum set, so the
maximum value of WY is nY(nY + 1)/2 + nX·nY = nY(nX + nY + 1)/2.
iii. This is really an immediate consequence of part (i) and the definition of #(w, nX , nY ).
It enables the calculation of exact p-values for the statistic by enumeration of
cases when it is close to its most extreme values.
(b) Let’s put the Y ’s in order to form the order statistics Y(1), Y(2), . . . , Y(nY ). Let the
corresponding rank of Y(i) = R(i). Since the rank of Y(1) = R(1) there are R(1) − 1
smaller observations which must all be X’s. So the number of X’s less than Y(1) is
R(1) − 1. Similarly there are R(2) − 1 observations smaller than Y(2). One of these is
a Y (namely Y(1)), so the number of X’s less than Y(2) is (R(2) − 1)− 1 = R(2) − 2.
In the same way it is seen quite generally that since the number of Y ’s less than Y(i)
is (i− 1), the number of X’s less than Y(i) is R(i) − 1− (i− 1) = R(i) − i.
The total number of pairs (Xi, Yj) with Xi < Yj is therefore:
(R(1) − 1) + (R(2) − 2) + · · · + (R(nY) − nY) = WY − nY(nY + 1)/2 = WXY
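A quick numerical check of this identity in R, on an arbitrary made-up example (the values
below are purely illustrative):

    x <- c(1.2, 3.4, 0.7, 2.9)                            # illustrative X sample (nX = 4)
    y <- c(2.1, 5.0, 1.8)                                 # illustrative Y sample (nY = 3)
    nY <- length(y)
    WY  <- sum(rank(c(x, y))[length(x) + seq_along(y)])   # rank sum of the Y's in the combined sample
    WXY <- sum(outer(x, y, "<"))                          # number of pairs with X_i < Y_j
    WY - nY * (nY + 1) / 2 == WXY                         # TRUE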
(c) The average of the min and max values of WY (defining N = nX + nY) is:
(1/2)[nY(nY + 1)/2 + nX·nY + nY(nY + 1)/2] = nY(N + 1)/2
Now consider ranking the combined N observations in reverse order, so the observa-
tion that previously had rank 1 now has rank N , rank 2 is replaced by rank N − 1,
and in general rank R is replaced by rank R′ = N − R + 1. Let WY ′ denote the
rank sum of these R′ inverse ranks for the Y ’s. The argument in (a)(i) still applies
to these ranks so the null distributions of WY and WY ′ are the same. But
WY′ = [(N + 1) − R(1)] + [(N + 1) − R(2)] + · · · + [(N + 1) − R(nY)] = nY(N + 1) − WY
So:
WY′ − nY(N + 1)/2 = nY(N + 1)/2 − WY.
Combining this identity with the fact that WY′ =d WY, we have:
WY − nY(N + 1)/2 =d WY′ − nY(N + 1)/2 = nY(N + 1)/2 − WY.
From this, symmetry around the average follows:
Pr(WY = nY(N + 1)/2 − k) = Pr(WY = nY(N + 1)/2 + k)
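This symmetry can also be checked numerically: R's dwilcox tabulates the null distribution of
the Mann-Whitney form of the statistic, WXY = WY − nY(nY + 1)/2, which by the above should be
symmetric about nX·nY/2. For example, with nX = 4 and nY = 3 (sizes chosen arbitrarily for the
check):

    nX <- 4; nY <- 3
    u <- 0:(nX * nY)            # possible values of WXY
    p <- dwilcox(u, nY, nX)     # exact null probabilities
    all.equal(p, rev(p))        # TRUE: distribution symmetric about nX*nY/2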
9. (a) In order of size, the ranks of the control group are (3, 5, 7, 9, 10) and those of the
treatment group are (1, 2, 4, 6, 8), so WY = 21. We reject for small values of WY, as this
indicates that the treatment scores tend to be smaller than the control scores.
(b) Note there are (10 choose 5) = 252 equally likely rank assignments. To find a 0.05
one-sided rejection region requires c cases where c/252 ≈ 0.05, which gives c = 12.6. Listing
cases we have:
1 + 2 + 3 + 4 + 5 = 15 1 + 2 + 4 + 5 + 6 = 18
1 + 2 + 3 + 4 + 6 = 16 1 + 2 + 3 + 4 + 9 = 19
1 + 2 + 3 + 4 + 7 = 17 1 + 2 + 3 + 5 + 8 = 19
1 + 2 + 3 + 5 + 6 = 17 1 + 2 + 3 + 6 + 7 = 19
1 + 2 + 3 + 4 + 8 = 18 1 + 2 + 4 + 5 + 7 = 19
1 + 2 + 3 + 5 + 7 = 18 1 + 3 + 4 + 5 + 6 = 19
This exhausts all cases with WY ≤ 19 and shows that under H0, Pr(WY ≤ 19) = 12/252 = 0.0476.
As the observed WY is 21, we do not reject H0.
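The exact enumeration can be done in R by listing all 252 possible sets of treatment ranks:

    sums <- colSums(combn(10, 5))   # WY for every equally likely assignment of 5 ranks out of 10
    mean(sums <= 19)                # 12/252 = 0.0476, the size of the rejection region {WY <= 19}
    mean(sums <= 21)                # exact one-sided p-value for the observed WY = 21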
10. Rank sum test (two-sided, size 0.05): reject H0 unless 79 ≤ W ≤ 131; the observations give
wB = 129 (the rank sum of the B sample), so we accept H0.
With the additional observations, rank sum test: reject H0 unless 95 ≤ W ≤ 165 (using
H0 ⇒ W ≈ N(130, 18.03²)); the observations give wB = 168, so we reject H0.
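The summary above does not restate the sample sizes, but the quoted means and variances are
consistent with samples of size 10 and 10 originally, and 15 (A sample) and 10 (B sample) after
the extra observations; assuming those sizes, the normal-approximation acceptance regions can be
sketched in R as follows (accept_region is a hypothetical helper, not part of the original
solution):

    # Approximate two-sided size-0.05 acceptance region for the rank sum W of the B sample
    accept_region <- function(nA, nB) {
      N  <- nA + nB
      mu <- nB * (N + 1) / 2               # E(W) under H0
      s  <- sqrt(nA * nB * (N + 1) / 12)   # sd(W) under H0
      mu + c(-1, 1) * qnorm(0.975) * s
    }
    accept_region(10, 10)   # roughly (79, 131): wB = 129 lies inside, so accept H0
    accept_region(15, 10)   # roughly (95, 165): wB = 168 lies outside, so reject H0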
11. The observed and expected frequencies are:
Red Brown Scarlet White
O 254 69 87 22
E 243 81 81 27
and
χ² = 3.646 < 7.815 (0.95 quantile of χ²₃)
so we cannot reject H0 at the 5% level of significance.
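The test can be reproduced in R using the expected proportions implied by the table
(243, 81, 81 and 27 out of 432):

    obs <- c(254, 69, 87, 22)                        # Red, Brown, Scarlet, White
    chisq.test(obs, p = c(243, 81, 81, 27) / 432)    # X-squared = 3.646, df = 3, p > 0.05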
12. This is a problem where we want to do a goodness-of-fit test of a particular model but where
we need to first estimate some of the parameters. We can set it up in one of two ways.
The first way is to think about the null distribution and work out which parameters need
to be estimated. Under H0 we have p_{i1} = p_{i2}, so let's call both of them p_i (since they
are equal). These define the probabilities of each category (columns) that apply to each
group of nurses (rows). Note that these are conditional probabilities, p_i = Pr(category i |
group I) = Pr(category i | group II). To complete the model we also need to estimate the
marginal probabilities of the two groups; let's call these g_j = Pr(group j), for j = 1, 2.
The null model, therefore, is that the probability of an observation for category i in
group j is g_j·p_i. Note that there are 6 independent parameters to estimate (5 conditional
column probabilities and one row probability), so ultimately we end up with a test with
12 − 6 − 1 = 5 degrees of freedom.
The other way is to note that this model is equivalent to the usual test of independence
for a contingency table; we end up estimating the same parameters and applying the same
test as described above.
Under either setup, the observed and expected frequencies are:
Category
1 2 3 4 5 6
Group I O 95.0 36.0 71.0 21.0 45.0 32.0
E 88.8 37.2 68.4 23.4 46.2 36.0
Group II O 53.0 26.0 43.0 18.0 32.0 28.0
E 59.2 24.8 45.6 15.6 30.8 24.0
and, as there are 5 df,
χ² = (95 − 88.8)²/88.8 + · · · + (28 − 24)²/24 = 3.23 < 11.07 (0.95 quantile of χ²₅)
so we cannot reject H0.
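The equivalent test of independence in R:

    counts <- rbind(c(95, 36, 71, 21, 45, 32),    # Group I
                    c(53, 26, 43, 18, 32, 28))    # Group II
    chisq.test(counts)            # X-squared = 3.23, df = 5, p-value well above 0.05
    chisq.test(counts)$expected   # reproduces the expected frequencies in the table above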