Date: 2022-04-28

ACST8040 Quantitative Research Methods

Solution to Exercise 2

Question 1
(a) Calculate $Z_i = Y_i - X_i$ to obtain
$$(Z_1, \dots, Z_{10}) = (6, -18, 12, -3, -25, -18, 8, -15, -21, -12).$$
Their absolute values $(|Z_1|, \dots, |Z_{10}|) = (6, 18, 12, 3, 25, 18, 8, 15, 21, 12)$ are ordered as
$$(|Z_4|, |Z_1|, |Z_7|, |Z_3|, |Z_{10}|, |Z_8|, |Z_2|, |Z_6|, |Z_9|, |Z_5|) = (3, 6, 8, 12, 12, 15, 18, 18, 21, 25).$$
Hence the ordered ranks are $(1, 2, 3, 4.5, 4.5, 6, 7.5, 7.5, 9, 10)$, with average ranks for ties.
Thus the ranks of $(|Z_1|, \dots, |Z_{10}|)$ are $(R_1, \dots, R_{10}) = (2, 7.5, 4.5, 1, 10, 7.5, 3, 6, 9, 4.5)$.
It follows that the observed value of the Wilcoxon signed rank test statistic is
$$T^+ = R_1 + R_3 + R_7 = 2 + 4.5 + 3 = 9.5.$$
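The ranking in part (a) can be checked with a short base-R sketch. The vector `z` holds the ten differences from part (a); the variable names are illustrative only.

```r
# Differences Z_i = Y_i - X_i from part (a)
z <- c(6, -18, 12, -3, -25, -18, 8, -15, -21, -12)

# Ranks of |Z_i|; rank() assigns average ranks to ties by default
r <- rank(abs(z))

# Wilcoxon signed rank statistic: sum of the ranks of the positive differences
t_plus <- sum(r[z > 0])

print(r)
print(t_plus)
```

The positive differences are $Z_1$, $Z_3$ and $Z_7$, so `t_plus` adds exactly the three ranks used in the hand calculation.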
(b) To assess the effect of the new measure in lowering the cost, we can test $H_0: \theta = 0$ against
$H_1: \theta < 0$. By part (a), the exact p-value of the test based on the data is $\Pr(T^+ \le 9.5)$
conditional on ties. Count the number of outcomes such that $T^+ \le 9.5$ as follows:
• $T^+ = 0, 1, 2, 3$ have $1 + 1 + 1 + 2 = 5$ outcomes, as in the case of no ties.
• $4 \le T^+ \le 9.5$ have $1 \times 4 + 2 \times 4 + 4 \times 3 + 6 = 30$ outcomes, as listed below:
T+     Outcomes                                  Number
4      (1,3)                                     1
4.5    (4.5) ×2                                  2
5      (2,3)                                     1
5.5    (1,4.5) ×2                                2
6      (6), (1,2,3)                              2
6.5    (2,4.5) ×2                                2
7      (1,6)                                     1
7.5    (7.5) ×2, (3,4.5) ×2, (1,2,4.5) ×2        6
8      (2,6)                                     1
8.5    (1,7.5) ×2, (1,3,4.5) ×2                  4
9      (9), (1,2,6), (3,6), (4.5,4.5)            4
9.5    (2,7.5) ×2, (2,3,4.5) ×2                  4
It follows that the exact p-value of the Wilcoxon signed rank test of $H_0: \theta = 0$ against
$H_1: \theta < 0$ is calculated by
$$\Pr(T^+ \le 9.5) = \frac{5 + 30}{2^{10}} = \frac{35}{1024} = 0.03418 < 0.05.$$
Thus $H_0$ is rejected in favour of $H_1: \theta < 0$ at the 5% level of significance. This provides
sufficient evidence that the new measure is effective in reducing the cost.
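The outcome count in part (b) can be cross-checked by brute force. The sketch below enumerates all $2^{10}$ sign assignments over the tied rank set from part (a); it is an illustrative check, not part of the original solution.

```r
# Ordered ranks of |Z_i| with average ranks for ties, from part (a)
ranks <- c(1, 2, 3, 4.5, 4.5, 6, 7.5, 7.5, 9, 10)

# All 2^10 equally likely sign patterns under H0 (1 = positive difference)
signs <- as.matrix(expand.grid(rep(list(0:1), length(ranks))))

# T+ for every sign pattern
t_plus <- as.vector(signs %*% ranks)

# Number of outcomes with T+ <= 9.5 and the exact conditional p-value
n_low <- sum(t_plus <= 9.5)
p_exact <- n_low / length(t_plus)

print(n_low)
print(p_exact)
```

Because the two members of each tied pair are treated as distinct positions, outcomes such as (4.5) are counted twice, which matches the "×2" entries in the table.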

(c) To test $H_0$ by the normal approximation, calculate
$$\mathrm{E}_0[T^+] = \frac{10(11)}{4} = 27.5, \qquad \operatorname{Var}_0(T^+) = \frac{10(11)(21)}{24} - \frac{2(2)(1)(3)}{48} = \frac{385}{4} - \frac{1}{4} = 96,$$
so that
$$T^* = \frac{T^+ - \mathrm{E}_0[T^+]}{\sqrt{\operatorname{Var}_0(T^+)}} = \frac{9.5 - 27.5}{\sqrt{96}} = -1.837 < -1.645 = -z_{0.05}.$$
This also shows sufficient evidence for $\theta < 0$ at the 5% level, confirming the effect of the
new measure in reducing the cost.
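The normal approximation in part (c), including the tie correction, can be reproduced as follows. This is a minimal sketch; `tie_sizes` encodes the two tied groups of size 2 found in part (a).

```r
n <- 10                 # number of non-zero differences
t_plus <- 9.5           # observed statistic from part (a)
tie_sizes <- c(2, 2)    # two tied groups of size 2 among the |Z_i|

# Null mean and tie-corrected null variance of T+
mean0 <- n * (n + 1) / 4
var0 <- n * (n + 1) * (2 * n + 1) / 24 -
  sum(tie_sizes * (tie_sizes - 1) * (tie_sizes + 1)) / 48

# Standardized statistic and lower-tail approximate p-value
t_star <- (t_plus - mean0) / sqrt(var0)
p_approx <- pnorm(t_star)

print(c(mean0 = mean0, var0 = var0, t_star = t_star))
```

Comparing `t_star` with `-qnorm(0.95)` gives the same decision as the hand calculation.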
(d) Calculate the Walsh averages to obtain their ordered values $W_{(1)} \le \dots \le W_{(55)}$ below:
 i   W(i)      i   W(i)      i   W(i)      i   W(i)      i   W(i)
 1   -25      12   -18      23   -12      34   -6       45   -1.5
 2   -23      13   -18      24   -10.5    35   -5       46   0
 3   -21.5    14   -16.5    25   -10.5    36   -5       47   1.5
 4   -21.5    15   -16.5    26   -9.5     37   -4.5     48   2.5
 5   -21      16   -16.5    27   -9       38   -4.5     49   4.5
 6   -20      17   -15      28   -8.5     39   -3.5     50   6
 7   -19.5    18   -15      29   -7.5     40   -3       51   7
 8   -19.5    19   -15      30   -7.5     41   -3       52   8
 9   -18.5    20   -14      31   -6.5     42   -3       53   9
10   -18      21   -13.5    32   -6.5     43   -3       54   10
11   -18      22   -12      33   -6       44   -2       55   12
Then $\theta$ is estimated by $\hat\theta = W_{((55+1)/2)} = W_{(28)} = -8.5$.
By the numbers of outcomes counted in part (b),
$$\Pr(T^+ \ge 46) = \Pr(T^+ \le 9) = \frac{35 - 4}{1024} = \frac{31}{1024} = 0.03027 > 0.025$$
and
$$\Pr(T^+ \ge 47) = \Pr(T^+ \le 8) = \frac{31 - 4 - 4}{1024} = \frac{23}{1024} = 0.02246 < 0.025.$$
Thus $t_{\alpha/2} = 47$ and $C_\alpha = 55 + 1 - 47 = 9$ for $\alpha = 2(0.02246) = 0.04492$. Then the exact
$100(1-\alpha)\% = 95.51\%$ confidence interval for $\theta$ is given by
$$\left(W_{(C_\alpha)}, W_{(t_{\alpha/2})}\right) = \left(W_{(9)}, W_{(47)}\right) = (-18.5, 1.5).$$
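The Walsh averages, the point estimate and the interval endpoints in part (d) can be recomputed from the differences in part (a). The sketch below uses base R only; the order-statistic indices 9 and 47 are taken from the solution rather than derived in the code.

```r
# Differences Z_i = Y_i - X_i from part (a)
z <- c(6, -18, 12, -3, -25, -18, 8, -15, -21, -12)

# Walsh averages (Z_i + Z_j) / 2 for all i <= j, then sorted
s <- outer(z, z, "+") / 2
w <- sort(s[upper.tri(s, diag = TRUE)])

# Hodges-Lehmann estimate: median of the 55 Walsh averages
theta_hat <- median(w)

# Confidence interval endpoints W(9) and W(47) from part (d)
ci <- c(w[9], w[47])

print(theta_hat)
print(ci)
```

The largest Walsh average equals the largest difference, so `max(w)` gives the last entry of the ordered table.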
Question 2
(a) Let $f_i(x)$ denote the density and $F_i(x)$ the cdf of $X_i$, $i = 1, 2$.
Since $X_1$ and $X_2$ are continuous with median 0,
$$F_i(0) = \Pr(X_i < 0) = 0.5 \quad \text{and} \quad \Pr(X_i > 0) = 0.5, \quad i = 1, 2.$$
Then the assumptions of $X_1 \sim -X_2$ and independent $X_1$, $X_2$ imply
$$\begin{aligned}
\Pr(X_1 > 0, R_1 = 1, X_2 < 0) &= \Pr(X_1 > 0, |X_1| < |X_2|, X_2 < 0) = \Pr(0 < X_1 < -X_2) \\
&= \int_0^\infty \Pr(-X_2 > x) f_1(x)\,dx = \int_0^\infty \Pr(X_1 > x) f_1(x)\,dx \\
&= \int_0^\infty [1 - F_1(x)] f_1(x)\,dx = -\frac{1}{2}\Big[1 - F_1(x)\Big]^2 \Big|_0^\infty \\
&= \frac{1}{2}[1 - F_1(0)]^2 = \frac{1}{2}[1 - 0.5]^2 = \frac{1}{8} = 0.125.
\end{aligned}$$
Similarly, by interchanging $X_1$ and $X_2$ in the above equations,
$$\begin{aligned}
\Pr(X_1 < 0, R_2 = 1, X_2 > 0) &= \Pr(0 < X_2 < -X_1) = \int_0^\infty \Pr(-X_1 > x) f_2(x)\,dx \\
&= \int_0^\infty \Pr(X_2 > x) f_2(x)\,dx = \int_0^\infty [1 - F_2(x)] f_2(x)\,dx \\
&= \frac{1}{2}[1 - F_2(0)]^2 = \frac{1}{2}[1 - 0.5]^2 = 0.125.
\end{aligned}$$
It then follows from the independence of $X_1$ and $X_2$ that
$$\begin{aligned}
\Pr(T^+ = 1) &= \Pr(X_1 > 0, R_1 = 1, X_2 < 0) + \Pr(X_1 < 0, R_2 = 1, X_2 > 0) \\
&= 0.125 + 0.125 = 0.25 = \Pr(X_1 > 0)\Pr(X_2 < 0) \\
&= \Pr(X_1 > 0, X_2 < 0) = \Pr(S = 1).
\end{aligned}$$
The range of $T^+$ is $\{0, 1, 2, 3\}$. It is obvious that
$$\Pr(T^+ = 0) = \Pr(X_1 < 0, X_2 < 0) = \Pr(X_1 < 0)\Pr(X_2 < 0) = 0.25 = \Pr(S = 0)$$
and
$$\Pr(T^+ = 3) = \Pr(X_1 > 0, X_2 > 0) = 0.25 = \Pr(S = 3).$$
Thus $\Pr(T^+ = 2) = \Pr(S = 2)$. These together prove $T^+ \sim S$.

(b) If $N$ is even, then
$a_j = a_{N-j+1} = j$ for $j = 1, 2, \dots, N/2$, and
$a_j = a_{N-j+1} = N - j + 1$ for $j = N/2 + 1, \dots, N$.
For each outcome $(r_1, \dots, r_n) = (a_{j(1)}, \dots, a_{j(n)})$ of Y-scores drawn from $\{a_1, \dots, a_N\}$ with
$1 \le j(1) < \dots < j(n) \le N$, take
$$\tilde r_i = N/2 + 1 - r_i = N/2 + 1 - a_{j(i)}, \quad i = 1, \dots, n.$$
If $j(i) \le N/2$, then $a_{j(i)} = j(i) \Rightarrow 1 \le \tilde r_i = N/2 + 1 - j(i) \le N/2 \Rightarrow \tilde r_i \in \{a_1, \dots, a_N\}$.
If $j(i) > N/2$, then $a_{j(i)} = N + 1 - j(i) \Rightarrow 1 \le \tilde r_i = j(i) - N/2 \le N - N/2 = N/2$.
Thus $\tilde r_i \in \{a_1, \dots, a_N\}$ for all $i = 1, \dots, n$. Rearrange $(\tilde r_1, \dots, \tilde r_n)$ in the order of $\{a_1, \dots, a_N\}$
if needed. Then there is a one-to-one mapping between $(r_1, \dots, r_n)$ and $(\tilde r_1, \dots, \tilde r_n)$.
Therefore, for each outcome $(r_1, \dots, r_n)$ of Y-scores with $r_1 + \dots + r_n = c$, there is one
corresponding outcome $(\tilde r_1, \dots, \tilde r_n)$ such that
$$\tilde r_1 + \dots + \tilde r_n = n(N/2 + 1) - (r_1 + \dots + r_n) = n(N/2 + 1) - c.$$
It follows that under $H_0: \gamma^2 = 1$,
$$\Pr\{(r_1, \dots, r_n)\} = \Pr\{(\tilde r_1, \dots, \tilde r_n)\} = 1 \Big/ \binom{N}{n} \;\Rightarrow\; \Pr(C = c) = \Pr\big(C = n(N/2 + 1) - c\big)$$
for every value $c$ of $C$. Consequently, $C$ is symmetric about
$$\frac{n}{2}\left(\frac{N}{2} + 1\right) = \frac{n(N+2)}{4} = \mathrm{E}_0[C].$$
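The symmetry result can be illustrated numerically for a small even $N$. The sketch below assumes $N = 6$ and $n = 3$ (values chosen only for illustration, not taken from the exercise) and enumerates every possible set of Y-positions.

```r
N <- 6   # total sample size (even); illustrative choice
n <- 3   # size of the Y-sample; illustrative choice

# Ansari-Bradley scores a_j = min(j, N + 1 - j)
a <- pmin(1:N, N + 1 - 1:N)

# Statistic C for every equally likely choice of n positions out of N
c_vals <- combn(N, n, FUN = function(idx) sum(a[idx]))

# Claimed centre of symmetry n(N + 2) / 4
center <- n * (N + 2) / 4

# Symmetry: reflecting every value about the centre gives the same multiset
is_symmetric <- isTRUE(all.equal(sort(c_vals), sort(2 * center - c_vals)))

print(table(c_vals))
print(is_symmetric)
```

Changing `N` to another even number and `n` to any size up to `N` should leave `is_symmetric` true, in line with the proof.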

Question 3 [25 marks]
(a) The ranks of the observations in the combined sample are given by:
X 35 57 39 30 52 42 38 49 24 36 32 44
Rank 6 19 9 4 17 11 8 15 2 7 5 12
Y 47 40 61 80 28 89 54 74 45 50 21
Rank 14 10 20 22 3 23 18 21 13 16 1
The observed value of $W$ is
$$w = 14 + 10 + 20 + 22 + 3 + 23 + 18 + 21 + 13 + 16 + 1 = 161.$$
Since $m = 12$, $n = 11$ and $N = 12 + 11 = 23$, the mean and variance of $W$ under the null
hypothesis $H_0: \Delta = 0$ are
$$\mathrm{E}_0[W] = \frac{11(23 + 1)}{2} = 132 \quad \text{and} \quad \operatorname{Var}_0(W) = \frac{12(11)(23 + 1)}{12} = 264.$$
Hence
$$W^* = \frac{W - \mathrm{E}_0[W]}{\sqrt{\operatorname{Var}_0(W)}} = \frac{161 - 132}{\sqrt{264}} = 1.785 > 1.645 = z_{0.05}.$$
This result shows sufficient evidence for $\Delta > 0$, i.e., $Y$ has a greater location parameter
than $X$, at the 5% level of significance.
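The rank sum and its normal approximation in part (a) can be reproduced directly from the two samples. This is a minimal base-R sketch with illustrative variable names.

```r
x <- c(35, 57, 39, 30, 52, 42, 38, 49, 24, 36, 32, 44)
y <- c(47, 40, 61, 80, 28, 89, 54, 74, 45, 50, 21)
m <- length(x)
n <- length(y)
N <- m + n

# Joint ranks of the combined sample; W is the sum of the Y-ranks
r <- rank(c(x, y))
w <- sum(r[(m + 1):N])

# Null mean and variance (no ties in these data)
mean0 <- n * (N + 1) / 2
var0 <- m * n * (N + 1) / 12

# Standardized statistic, to be compared with z_0.05 = qnorm(0.95)
w_star <- (w - mean0) / sqrt(var0)

print(c(w = w, mean0 = mean0, var0 = var0, w_star = w_star))
```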
(b) The rank scores for the Ansari-Bradley rank test are (1,2,…,10,11,12,11,10,…,2,1) for
ranks (1,2,…,23). Hence from the Y-ranks obtained in part (a), the Y-scores are
given by
Y      47  40  61  80  28  89  54  74  45  50  21
Rank   14  10  20  22   3  23  18  21  13  16   1
Score  10  10   4   2   3   1   6   3  11   8   1
Thus the Ansari-Bradley rank test statistic is
$$C = 10 + 10 + 4 + 2 + 3 + 1 + 6 + 3 + 11 + 8 + 1 = 59.$$
Since $N = 23$ is odd,
$$\mathrm{E}_0[C] = \frac{11(23 + 1)^2}{4(23)} = 68.87, \qquad \operatorname{Var}_0(C) = \frac{12(11)(23 + 1)(23^2 + 3)}{48(23)^2} = 66.374.$$
Hence
$$C^* = \frac{C - \mathrm{E}_0[C]}{\sqrt{\operatorname{Var}_0(C)}} = \frac{59 - 68.87}{\sqrt{66.374}} = -1.211 > -1.282 = -z_{0.1}.$$
This shows insufficient evidence at the 10% level of significance for $\operatorname{Var}(X) < \operatorname{Var}(Y)$.
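The Ansari-Bradley scores and the standardized statistic in part (b) can be checked with a few lines of R. The sketch starts from the Y-ranks found in part (a) and uses the odd-$N$ formulas quoted in the solution.

```r
# Joint ranks of the Y-sample from part (a)
ry <- c(14, 10, 20, 22, 3, 23, 18, 21, 13, 16, 1)
m <- 12
n <- 11
N <- m + n

# Ansari-Bradley score of rank r is min(r, N + 1 - r)
scores <- pmin(ry, N + 1 - ry)
c_stat <- sum(scores)

# Null mean and variance for odd N
mean0 <- n * (N + 1)^2 / (4 * N)
var0 <- m * n * (N + 1) * (N^2 + 3) / (48 * N^2)

c_star <- (c_stat - mean0) / sqrt(var0)

print(scores)
print(c(c_stat = c_stat, mean0 = mean0, var0 = var0, c_star = c_star))
```

Rank 21 maps to score $24 - 21 = 3$, which is the value that makes the scores add up to 59.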

(c) The values of $A = (A_1, \dots, A_{12})$ and $B = (B_1, \dots, B_{11})$ for Miller's Jackknife test are:
A 3.75 7.73 3.47 4.69 5.40 3.52 3.50 4.52 6.98 3.64 4.22 3.67
B 5.15 5.51 5.18 6.94 6.80 8.76 5.05 6.14 5.23 5.08 8.08
Calculate
$$\bar A = \frac{3.75 + 7.73 + \dots + 3.67}{12} = 4.591, \qquad \bar B = \frac{5.15 + 5.51 + \dots + 8.08}{11} = 6.175,$$
$$V_1 = \sum_{i=1}^{12} \frac{(A_i - \bar A)^2}{12(11)} = 0.170 \quad \text{and} \quad V_2 = \sum_{j=1}^{11} \frac{(B_j - \bar B)^2}{11(10)} = 0.156.$$
Then
$$Q = \frac{\bar A - \bar B}{\sqrt{V_1 + V_2}} = \frac{4.591 - 6.175}{\sqrt{0.170 + 0.156}} = -2.776.$$
Thus the approximate p-value for $\operatorname{Var}(X) < \operatorname{Var}(Y)$ is $\Pr(Z < -2.776) = 0.00275$ by
Miller's Jackknife test. This provides very strong evidence for $\operatorname{Var}(X) < \operatorname{Var}(Y)$.
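Given the tabulated jackknife values, the last step of part (c) is a two-sample comparison of means. The sketch below starts from the rounded $A$ and $B$ values in the table, so its result may differ from the solution in the third decimal place.

```r
# Jackknife values A_i and B_j as tabulated in part (c) (rounded to 2 decimals)
a <- c(3.75, 7.73, 3.47, 4.69, 5.40, 3.52, 3.50, 4.52, 6.98, 3.64, 4.22, 3.67)
b <- c(5.15, 5.51, 5.18, 6.94, 6.80, 8.76, 5.05, 6.14, 5.23, 5.08, 8.08)

# V1 and V2: sample variance divided by the sample size
v1 <- var(a) / length(a)
v2 <- var(b) / length(b)

# Miller's statistic and its lower-tail normal p-value
q <- (mean(a) - mean(b)) / sqrt(v1 + v2)
p_val <- pnorm(q)

print(c(A_bar = mean(a), B_bar = mean(b), V1 = v1, V2 = v2, Q = q))
```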
(d) The values of $X^* = 3X$ and $Y^* = Y + 66$ with their ranks are shown below:
X*    105 171 117  90 156 126 114 147  72 108  96 132
Rank    6  23  13   3  22  15  11  20   1   8   5  17
Y*    113 106 127 146  94 155 120 140 111 116  87
Rank   10   7  16  19   4  21  14  18   9  12   2
The empirical distribution functions $F^*_{12}(t)$ of $X^*$ and $G^*_{11}(t)$ of $Y^*$ at the ordered
values $Z_{(1)} \le \dots \le Z_{(23)}$ of $(X^*, Y^*)$ are given by
i         1     2     3     4     5     6     7     8     9     10    11    12
Sample    X*    Y*    X*    Y*    X*    X*    Y*    X*    Y*    Y*    X*    Y*
F*_12     1/12  1/12  2/12  2/12  3/12  4/12  4/12  5/12  5/12  5/12  6/12  6/12
G*_11     0     1/11  1/11  2/11  2/11  2/11  3/11  3/11  4/11  5/11  5/11  6/11

i         13    14    15    16    17    18    19    20    21    22    23
Sample    X*    Y*    X*    Y*    X*    Y*    Y*    X*    Y*    X*    X*
F*_12     7/12  7/12  8/12  8/12  9/12  9/12  9/12  10/12 10/12 11/12 1
G*_11     6/11  7/11  7/11  8/11  8/11  9/11  10/11 10/11 1     1     1
It follows that, with $d = 1$ the greatest common divisor of $m = 12$ and $n = 11$,
$$J = \frac{mn}{d} \max_{1 \le i \le 23} \left| F^*_{12}(Z_{(i)}) - G^*_{11}(Z_{(i)}) \right| = \frac{12(11)}{1} \left| \frac{10}{12} - 1 \right| = 11\,|10 - 12| = 22.$$
Run the following R code to obtain the output below:
x <- c(35,57,39,30,52,42,38,49,24,36,32,44)
y <- c(47,40,61,80,28,89,54,74,45,50,21)
ks.test(3*x, y+66)
Two-sample Kolmogorov-Smirnov test
data: 3*x and y + 66
D = 0.16667, p-value = 0.98
alternative hypothesis: two-sided
It shows $D = 0.16667$ and hence verifies $J = (mn/d)\,D = 12(11)(0.16667) = 22$.
The p-value of the test is 0.98.
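The hand calculation of $J$ can also be reproduced from the empirical distribution functions with base R's `ecdf`. The helper `gcd` below is a small illustrative function written for this sketch, not a built-in.

```r
x <- c(35, 57, 39, 30, 52, 42, 38, 49, 24, 36, 32, 44)
y <- c(47, 40, 61, 80, 28, 89, 54, 74, 45, 50, 21)
xs <- 3 * x      # X* = 3X
ys <- y + 66     # Y* = Y + 66

# Evaluate both empirical distribution functions at the pooled ordered values
z <- sort(c(xs, ys))
d_max <- max(abs(ecdf(xs)(z) - ecdf(ys)(z)))

# Greatest common divisor by the Euclidean algorithm (illustrative helper)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)

m <- length(xs)
n <- length(ys)
j_stat <- m * n / gcd(m, n) * d_max

print(d_max)
print(j_stat)
```

Here `d_max` plays the role of the statistic `D` reported by `ks.test`.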
(e) Based on the results in parts (a) to (d), we can draw the following answers:
(i) The test in part (d) shows a very large p-value of 0.98, which provides no evidence
against the hypothesis of equal distributions for $X^* = 3X$ and $Y^* = Y + 66$.
(ii) Let $\theta_1$ and $\theta_2$ denote the medians, and $\eta_1$ and $\eta_2$ the dispersion parameters, of $X$ and
$Y$, respectively. Then the test in part (d) accepts $3X \sim Y + 66$. As a result,
$$\frac{X - \theta_1}{\eta_1} \sim \frac{\frac{1}{3}(Y + 66) - \theta_1}{\eta_1} = \frac{Y - (3\theta_1 - 66)}{3\eta_1} = \frac{Y - \theta_2}{\eta_2} \;\Rightarrow\; \begin{cases} 3\theta_1 - 66 = \theta_2 \\ 3\eta_1 = \eta_2 \end{cases}$$
(iii) The location-shift model in part (a) is not appropriate since Miller's Jackknife
test in part (c) shows very strong evidence for $\operatorname{Var}(X) < \operatorname{Var}(Y)$ and part (d) shows
$3X \sim Y + 66$, which contradicts $X \sim Y + \Delta$ under the location-shift model.
(iv) The location-scale parameter model is not justified in part (b) as $X$ and $Y$ do not
have an equal location parameter $\theta$ by part (c). It is however justified in part (c),
which allows $\theta_1 \ne \theta_2$, by part (d) as shown in item (ii) above.
(v) The result of part (a) is not justified because the location-shift model is not appropriate.
The result of part (b) is not justified since the Ansari-Bradley test requires $\theta_1 = \theta_2$,
which has no support from the analyses.
The result of part (c) is justified because the location-scale parameter model is
justified and $\theta_1 = \theta_2$ is not required for Miller's Jackknife test.
(vi) There is insufficient evidence of a difference ($\theta_1 \ne \theta_2$) in the profitability of the two
banks, even though the total profit in sample $Y$ is greater than in $X$. On the other hand,
the result of part (c) and the relation $3\eta_1 = \eta_2$ indicate that profits are more stable
at the bank with profits $X$ than at the bank with profits $Y$.

Question 4
(a) First find the ranks $\{r_{ij}\}$ of all 15 observations $\{X_{ij}\}$ as follows:
Treatment j       1         2         3         4         5
X_ij (r_ij)       17 (2)    22 (5)    20 (4)    86 (15)   68 (13)
                  28 (7)    15 (1)    39 (9)    54 (11)   73 (14)
                  18 (3)    43 (10)   61 (12)   32 (8)    25 (6)
Rank sum R_j      12        16        25        34        33
Since $k = 5$, $n_1 = \dots = n_5 = 3$ and $N = 15$, the Kruskal-Wallis test statistic from the data
is calculated by
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1) = \frac{12}{15(16)} \cdot \frac{12^2 + 16^2 + 25^2 + 34^2 + 33^2}{3} - 3(16)$$
$$= \frac{3270}{60} - 48 = \frac{109}{2} - \frac{96}{2} = \frac{13}{2} = 6.5 < 7.78 = \chi^2_{4,0.10} = \chi^2_{k-1,0.10}.$$
Thus $H_0$ is accepted at the 10% level of significance. This shows insufficient evidence
against $H_0$ for general alternatives at the 10% level.
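The Kruskal-Wallis statistic in part (a) can be recomputed from the raw observations. The sketch below lays the data out treatment by treatment; the names `obs` and `trt` are illustrative.

```r
# Observations listed treatment by treatment (3 per treatment)
obs <- c(17, 28, 18,  22, 15, 43,  20, 39, 61,  86, 54, 32,  68, 73, 25)
trt <- rep(1:5, each = 3)

# Joint ranks and treatment rank sums R_j
r <- rank(obs)
rank_sums <- tapply(r, trt, sum)
n_j <- tapply(r, trt, length)
N <- length(obs)

# Kruskal-Wallis statistic (no ties, so no correction is needed)
h_stat <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_j) - 3 * (N + 1)

# Upper 10% point of the chi-square distribution with k - 1 = 4 df
crit <- qchisq(0.90, df = 4)

print(rank_sums)
print(c(H = h_stat, critical = crit))
```

The same statistic should be returned by `kruskal.test(obs, trt)`, which also reports the chi-square p-value.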
(b) The counts $U_{uv} = \#\{(i, j) : X_{iu} < X_{jv}\}$ for $1 \le u < v \le 5$ are calculated below:
U_uv     v = 2   v = 3   v = 4   v = 5
u = 1    5       8       9       8
u = 2            6       8       8
u = 3                    6       7
u = 4                            4
Hence the Jonckheere-Terpstra test statistic is
$$J = \sum_{u < v} U_{uv} = 4 + 5 + 2(6) + 7 + 4(8) + 9 = 69.$$
Calculate
$$\mathrm{E}_0[J] = \frac{1}{4}\left(N^2 - \sum_{u=1}^{k} n_u^2\right) = \frac{1}{4}\left(15^2 - \sum_{u=1}^{5} 3^2\right) = \frac{225 - 45}{4} = 45$$
and
$$\operatorname{Var}_0(J) = \frac{1}{72}\left[N^2(2N + 3) - \sum_{u=1}^{k} n_u^2(2n_u + 3)\right] = \frac{15^2(30 + 3) - 5(9)(6 + 3)}{72} = 97.5.$$
It follows that
$$J^* = \frac{J - \mathrm{E}_0[J]}{\sqrt{\operatorname{Var}_0(J)}} = \frac{69 - 45}{\sqrt{97.5}} = 2.431 > 2.326 = z_{0.01}.$$
Thus there is very strong evidence for ordered alternatives $\tau_1 \le \dots \le \tau_5$ with at least one
strict inequality.
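The pairwise counts and the Jonckheere-Terpstra statistic in part (b) can be verified with a short loop over treatment pairs. This is a base-R sketch with illustrative names; it relies on the absence of ties in the data.

```r
# Observations grouped by treatment 1, ..., 5
groups <- list(c(17, 28, 18), c(22, 15, 43), c(20, 39, 61),
               c(86, 54, 32), c(68, 73, 25))
k <- length(groups)

# J = sum over u < v of U_uv, where U_uv counts pairs with X_iu < X_jv
j_stat <- 0
for (u in 1:(k - 1)) {
  for (v in (u + 1):k) {
    j_stat <- j_stat + sum(outer(groups[[u]], groups[[v]], "<"))
  }
}

# Null mean and variance
n <- lengths(groups)
N <- sum(n)
mean0 <- (N^2 - sum(n^2)) / 4
var0 <- (N^2 * (2 * N + 3) - sum(n^2 * (2 * n + 3))) / 72

j_star <- (j_stat - mean0) / sqrt(var0)

print(c(J = j_stat, mean0 = mean0, var0 = var0, J_star = j_star))
```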
(c) Since $n_1 = \dots = n_5 = 3$, $N^* = 3$ and $N^* \bar R_u = 3(R_u/3) = R_u$, $u = 1, \dots, 5$.
Therefore, the Nemenyi-Damico-Wolfe one-sided treatments-versus-control multiple
comparison procedure is given by
Decide $\tau_u > \tau_1$ if $R_u - R_1 \ge y^*_\alpha$; otherwise accept $\tau_u = \tau_1$, $u = 2, \dots, 5$.
By R, we get $y^*_{0.0919} = 21$ as shown below:
cNDWol(0.1,c(3,3,3,3,3))
Monte Carlo Approximation (with 10000 Iterations) used:
Control group size: 3 Treatment group size(s): 3 3 3 3
For the given experimentwise alpha=0.1, the upper cutoff value is Nemenyi,
Damico-Wolfe Y*=21, with true experimentwise alpha level=0.0919
Thus at $\alpha = 10\%$, the decisions are
$R_2 - R_1 = 16 - 12 = 4 < 21 \Rightarrow \tau_2 = \tau_1$, $\quad R_3 - R_1 = 25 - 12 = 13 < 21 \Rightarrow \tau_3 = \tau_1$,
$R_4 - R_1 = 34 - 12 = 22 > 21 \Rightarrow \tau_4 > \tau_1$, $\quad R_5 - R_1 = 33 - 12 = 21 \Rightarrow \tau_5 > \tau_1$.
(d) The test in (a) accepts $H_0: \tau_1 = \dots = \tau_5$ at the 10% level, the test in (b) rejects $H_0$ and
concludes $\tau_1 \le \dots \le \tau_5$ with at least one "<" at the 1% level, and the procedure in (c)
decides $\tau_2 = \tau_3 = \tau_1 < \tau_4, \tau_5$ at the 10% level. While these results appear quite different,
they are not contradictory, because the alternatives differ.
Specifically, (a) looks at the evidence for any difference in $\tau_1, \dots, \tau_5$, whereas (b) only
considers $H_1: \tau_1 \le \dots \le \tau_5$, which attracts stronger evidence when the data match $H_1$
with $R_1, \dots, R_k$ mostly in the same order as $H_1$ (as in this question).
Similarly, the restrictions to $\tau_u \ge \tau_1$ for $u = 2, 3, 4, 5$ in (c) lead to stronger evidence
against $\tau_1 = \dots = \tau_5$ than in (a), even on a multiple-comparison basis, because the data
match it with $R_u > R_1$ for $u = 2, 3, 4, 5$. Moreover, the results of (b) and (c) are consistent
in the sense that they both include cases such as $\tau_1 = \tau_2 = \tau_3 < \tau_4 = \tau_5$.

Question 5
(a) Based on the sample data $\{X_{ij}\}$, the ranks $r_{ij}$ of $X_{ij}$ and $U_{uv} = \#\{(i, j) : X_{iu} < X_{jv}\}$
for $1 \le u < v \le 4$ are calculated in the following tables:
i    r_i1   r_i2   r_i3   r_i4
1    11     14     10     2
2    4      18     17     8
3    13     3      20     6
4    1      19     7      16
5    9      12     15     5

U_uv     v = 2   v = 3   v = 4
u = 1    20      20      12
u = 2            13      6
u = 3                    4

The Mack-Wolfe test statistic for known peak $p = 2$ is calculated by
$$A_2 = U_{12} + U_{32} + U_{42} + U_{43} = 20 + 12 + 19 + 21 = 72.$$
Then $(n_1, \dots, n_4) = (5, 5, 5, 5) \Rightarrow N_1 = 5 + 5 = 10$, $N_2 = 3(5) = 15 \Rightarrow$
$$\mathrm{E}_0[A_2] = \frac{N_1^2 + N_2^2 - n_1^2 - \dots - n_4^2 - n_2^2}{4} = \frac{10^2 + 15^2 - 5 \times 5^2}{4} = 50$$
and
$$\operatorname{Var}_0(A_2) = \frac{2(10^3 + 15^3) + 3(10^2 + 15^2) - 5 \cdot 5^2(10 + 3)}{72} + \frac{5(10)(15) - 5^2(20)}{6} = 154.17.$$
Hence the normalized Mack-Wolfe statistic is
$$A_2^* = \frac{72 - 50}{\sqrt{154.17}} = 1.772 > 1.645 = z_{0.05} \;\Rightarrow\; \text{Reject } H_0 \text{ at } \alpha = 0.05.$$
Thus there is sufficient evidence for $\tau_1 \le \tau_2 \ge \tau_3 \ge \tau_4$ at the 5% level.
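The known-peak calculation can be sketched in R directly from the rank table. The function `mack_wolfe` below is a hypothetical helper written for this illustration (it is not an NSM3 function); its mean and variance expressions are written out so that they reproduce the values 50 and 154.17 obtained above.

```r
# Ranks r_ij from the table: one column per treatment
r <- cbind(c(11, 4, 13, 1, 9), c(14, 18, 3, 19, 12),
           c(10, 17, 20, 7, 15), c(2, 8, 6, 16, 5))
k <- ncol(r)
n <- rep(nrow(r), k)
N <- sum(n)

# U(u, v) = number of pairs (i, j) with r_iu < r_jv
U <- function(u, v) sum(outer(r[, u], r[, v], "<"))

# Hypothetical helper: Mack-Wolfe statistic for a known peak p
mack_wolfe <- function(p) {
  a <- 0
  if (p > 1) for (u in 1:(p - 1)) for (v in (u + 1):p) a <- a + U(u, v)  # rising side
  if (p < k) for (u in p:(k - 1)) for (v in (u + 1):k) a <- a + U(v, u)  # falling side
  N1 <- sum(n[1:p])
  N2 <- sum(n[p:k])
  mean0 <- (N1^2 + N2^2 - sum(n^2) - n[p]^2) / 4
  var0 <- (2 * (N1^3 + N2^3) + 3 * (N1^2 + N2^2) - sum(n^2 * (2 * n + 3)) -
             n[p]^2 * (2 * n[p] + 3) + 12 * n[p] * N1 * N2 - 12 * n[p]^2 * N) / 72
  c(A = a, mean0 = mean0, var0 = var0, A_star = (a - mean0) / sqrt(var0))
}

res2 <- mack_wolfe(2)
print(res2)
```

Calling `mack_wolfe(3)` gives the corresponding quantities for a peak at the third treatment.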
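(b) Calculate $U_{vu} = n_u n_v - U_{uv} = 25 - U_{uv}$ for $1 \le u < v \le 4$ to obtain $U_{uv}$ for all $u \ne v$:
U_uv     v = 1   v = 2   v = 3   v = 4
u = 1    –       20      20      12
u = 2    5       –       13      6
u = 3    5       12      –       4
u = 4    13      19      21      –
Then $U_{\cdot q}$ are calculated by
$U_{\cdot 1} = U_{21} + U_{31} + U_{41} = 5 + 5 + 13 = 23$, $\quad U_{\cdot 2} = U_{12} + U_{32} + U_{42} = 20 + 12 + 19 = 51$,
$U_{\cdot 3} = U_{13} + U_{23} + U_{43} = 20 + 13 + 21 = 54$, $\quad U_{\cdot 4} = U_{14} + U_{24} + U_{34} = 12 + 6 + 4 = 22$.
Next, as $(n_1, \dots, n_4) = (5, 5, 5, 5)$ are all equal, $\mathrm{E}_0[U_{\cdot q}]$ and $\operatorname{Var}_0(U_{\cdot q})$ are invariant in
$q \in \{1, 2, 3, 4\}$. Hence $\{U_{\cdot 1}, U_{\cdot 2}, U_{\cdot 3}, U_{\cdot 4}\} = \{23, 51, 54, 22\} \Rightarrow U_{\cdot 3} > U_{\cdot q}$, $q \in \{1, 2, 4\} \Rightarrow$
$$U^*_{\cdot 3} = \frac{U_{\cdot 3} - \mathrm{E}_0[U_{\cdot 3}]}{\sqrt{\operatorname{Var}_0(U_{\cdot 3})}} > \frac{U_{\cdot q} - \mathrm{E}_0[U_{\cdot q}]}{\sqrt{\operatorname{Var}_0(U_{\cdot q})}} = U^*_{\cdot q}$$
for $q \in \{1, 2, 4\} \Rightarrow \hat p = 3$.
For $p = 3$, $A_3 = U_{12} + U_{13} + U_{23} + U_{43} = 20 + 20 + 13 + 21 = 74$ and $N_1 = 15$, $N_2 = 10 \Rightarrow$
$\mathrm{E}_0[A_3] = \mathrm{E}_0[A_2] = 50$ and $\operatorname{Var}_0(A_3) = \operatorname{Var}_0(A_2) = 154.17$. It follows that
$$A^*_{\hat p} = A^*_3 = \frac{74 - 50}{\sqrt{154.17}} = 1.933.$$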
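The peak estimate in part (b) can be checked by computing the column sums $U_{\cdot q}$ from the rank table. The sketch below uses the fact stated above that, with equal sample sizes, the largest $U_{\cdot q}$ identifies $\hat p$; variable names are illustrative.

```r
# Ranks r_ij: one column per treatment
r <- cbind(c(11, 4, 13, 1, 9), c(14, 18, 3, 19, 12),
           c(10, 17, 20, 7, 15), c(2, 8, 6, 16, 5))
k <- ncol(r)

# U(u, v) = number of pairs (i, j) with r_iu < r_jv
U <- function(u, v) sum(outer(r[, u], r[, v], "<"))

# U_.q = sum over u != q of U_uq
u_dot <- sapply(1:k, function(q) {
  sum(sapply(setdiff(1:k, q), function(u) U(u, q)))
})

# With equal sample sizes the estimated peak maximizes U_.q
p_hat <- which.max(u_dot)

# Standardized statistic at the estimated peak, using the values from part (a)
a3 <- U(1, 2) + U(1, 3) + U(2, 3) + U(4, 3)
a_star <- (a3 - 50) / sqrt(154.17)

print(u_dot)
print(c(p_hat = p_hat, A3 = a3, A_star = a_star))
```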
(c) The R command cUmbrPU with $\alpha = 0.06$ produces the following output:
> cUmbrPU(0.06,c(5,5,5,5))
Monte Carlo Approximation (with 10000 Iterations) used:
Group sizes: 5 5 5 5
For the given alpha=0.06, the upper cutoff value is Mack-Wolfe Peak Unknown
A*(p-hat)= 2.1533650717, with true alpha level=0.0526
The R output shows $a^*_{\hat p, 0.0526} = 2.153$. As $A^*_{\hat p} = 1.933 < 2.153 = a^*_{\hat p, 0.0526}$, $H_0$ is accepted
at the 5% level of significance. As a result, there is insufficient evidence for umbrella
alternatives with unknown peak at the 5% level.
(d) If $\hat p = 3$ in part (b) is known, then
$$A^*_3 = 1.933 > 1.645 = z_{0.05} \;\Rightarrow\; \text{Reject } H_0 \text{ at } \alpha = 0.05.$$
Thus there is sufficient evidence for umbrella alternatives with known peak $p = 3$ at the
5% level of significance.
The main reason for the difference in the test results between known peak $p = 3$ and
unknown $p$ with estimate $\hat p = 3$ is that $A^*_3$ only takes the value with $p = 3$, whereas
$A^*_{\hat p}$ can also take the values of $A^*_1$, $A^*_2$ and $A^*_4$ with positive probabilities. This leads to
different distributions of $A^*_{\hat p}$ and $A^*_3$, and hence to different critical points and p-values,
which produce different test results.