University of British Columbia
Final Examination
STAT 305 Introduction to Statistical Inference 2016–17 Term 2
Instructor: William J. Welch
Student FAMILY Name:
(Please PRINT)
Student Given Names:
(Please PRINT)
Student ID Number:
Signature:
Date of Exam: April 13, 2017
Time Period: 12:00–2:30 pm
Number of Exam Pages: 12, including this cover sheet
(please check for completeness)
Additional Materials Allowed: Calculator; formula sheet (8½ × 11, 2-sided)
Question Marks Score
1 12
2 13
3 12
4 13
Total 50
Student Conduct During Examinations
1. Each examination candidate must be prepared to produce, upon request of the invigilator or
examiner, his or her UBCcard for identification.
2. Examination candidates are not permitted to ask questions of the examiners or invigilators,
except in cases of supposed errors or ambiguities in examination questions, illegible material,
or the like.
3. No examination candidate shall be permitted to enter the examination room after the expiration
of one-half hour from the scheduled starting time, or to leave during the first half hour of
the examination. Should the examination run forty-five (45) minutes or less, no examination
candidate shall be permitted to enter the examination room once the examination has begun.
STAT 305, April 13, 2017 Page 2
4. Examination candidates must conduct themselves honestly and in accordance with established
rules for a given examination, which will be articulated by the examiner or invigilator prior
to the examination commencing. Should dishonest behaviour be observed by the examiner(s)
or invigilator(s), pleas of accident or forgetfulness shall not be received.
5. Examination candidates suspected of any of the following, or any other similar practices, may
be immediately dismissed from the examination by the examiner/invigilator, and may be
subject to disciplinary action:
(a) speaking or communicating with other candidates, unless otherwise authorized;
(b) purposely exposing written papers to the view of other examination candidates or imag-
ing devices;
(c) purposely viewing the written papers of other examination candidates;
(d) using or having visible at the place of writing any books, papers or other memory aid
devices other than those authorized by the examiner(s); and,
(e) using or operating electronic devices including but not limited to telephones, calculators,
computers, or similar devices other than those authorized by the examiner(s) (electronic
devices other than those authorized by the examiner(s) must be completely powered down
if present at the place of writing).
6. Examination candidates must not destroy or damage any examination material, must hand
in all examination papers, and must not take any examination material from the examination
room without permission of the examiner or invigilator.
7. Notwithstanding the above, for any mode of examination that does not fall into the traditional,
paper-based method, examination candidates shall adhere to any special rules for conduct as
established and articulated by the examiner.
8. Examination candidates must follow any additional examination rules or directions commu-
nicated by the examiner(s) or invigilator(s).
Tables 1 and 2 at the end of the booklet contain some common discrete and continuous distributions,
along with their properties. They are the same as Tables 1.2 and 1.3 in the Course Notes, except
that the Laplace distribution is added to Table 2.
For full marks in questions asking you to show or derive a property, you must be clear about any
RESULT you are using, including any CONDITIONS for it to hold, and HOW the result is applied.
1. The United States PR/HACCP Act prescribes an inspection plan to monitor a facility's
control of E. coli in the production of beef carcasses. It involves taking a sample of n = 13
carcasses periodically from production and testing them.
Let $\pi$ be the probability that a randomly chosen carcass from a large batch of carcasses has an
unacceptable test result, and assume the 13 test results in a sample are statistically independent.
Then $Y$, the number of carcasses in the random sample with unacceptable levels of E. coli,
has a $\mathrm{Bin}(n = 13, \pi)$ distribution.
Suppose we want to test $\pi = 0.1822$ ("safe") against $\pi = 2 \times 0.1822 = 0.3644$ ("unsafe").
(a) [1 mark] Write down the null hypothesis, $H_0$, and the alternative hypothesis, $H_a$, of
interest here.
(b) [2 marks] Write down the likelihood ratio for testing $H_0$ versus $H_a$ and simplify it.
(c) [2 marks] Let $y$ be the observed value of $Y$. Do small or large values of $y$ give more
evidence against $H_0$ in favour of $H_a$? What theorem or lemma justifies your answer?
(d) Suppose the hypothesis test rejects $H_0$ if $y$ is 3 or more.
i. [2 marks] Write down an expression for the probability of a Type I error. Be specific
but do not evaluate the expression numerically.
ii. [2 marks] Write down an expression for the power of the test. Be specific but do not
evaluate the expression numerically.
iii. [1 mark] Give an R expression to evaluate the power of the test numerically.
(e) Now consider a test of $\pi = 0.1822$ against $\pi > 0.1822$. We still have $n = 13$, and $H_0$ is
still rejected if $y$ is 3 or more.
i. [1 mark] Is the probability of a Type I error the same as in question 1(d)i? Why or
why not?
ii. [1 mark] Is the power the same as in question 1(d)ii? Explain.
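As an illustration of the quantities in question 1(d), the following is a minimal Python sketch (Python rather than R, and not a model answer) that evaluates an upper-tail binomial probability by summing the PMF from Table 1. The function name `binom_upper_tail` and the variable names are hypothetical; the values 13, 3, 0.1822 and 0.3644 come from the question.

```python
from math import comb

def binom_upper_tail(n, p, c):
    """Return P(Y >= c) for Y ~ Bin(n, p) by summing the binomial PMF."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c, n + 1))

n, cutoff = 13, 3  # sample size and rejection threshold from the question

# Rejection probability when the "safe" value of the parameter is true.
type_one_error = binom_upper_tail(n, 0.1822, cutoff)

# Rejection probability when the "unsafe" value of the parameter is true.
power = binom_upper_tail(n, 0.3644, cutoff)

print(type_one_error, power)
```

The same function gives the rejection probability at any other parameter value, which may help when thinking about the one-sided alternative in part (e).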
2. A contingency table of frequency data has a "row variable" with levels indexed by $i = 1, \ldots, I$
and a "column variable" with levels $j = 1, \ldots, J$. Thus, there are frequencies $y_{ij}$ for $IJ$
categories generated by all combinations of the levels of the row and column variables, as set
out in the following table.

Frequency, by level of row variable (rows) and level of column variable (columns):

Level of row variable | 1 | 2 | ... | J | Total
1 | $y_{11}$ | $y_{12}$ | ... | $y_{1J}$ | $y_{1\cdot}$
2 | $y_{21}$ | $y_{22}$ | ... | $y_{2J}$ | $y_{2\cdot}$
... | ... | ... | ... | ... | ...
I | $y_{I1}$ | $y_{I2}$ | ... | $y_{IJ}$ | $y_{I\cdot}$
Total | $y_{\cdot 1}$ | $y_{\cdot 2}$ | ... | $y_{\cdot J}$ | $n$

The table also defines the following quantities: $y_{i\cdot}$, the total frequency for row $i$; $y_{\cdot j}$, the total
frequency for column $j$; and $n$, the total sample size across all categories.
We assume that $y_{11}, \ldots, y_{IJ}$ are frequencies for a random sample of size $n$ from a multinomial
distribution with category probabilities $\pi_{11}, \ldots, \pi_{IJ}$. This multinomial distribution has
probability mass function
$$f_{Y_{11},\ldots,Y_{IJ}}(y_{11}, \ldots, y_{IJ} \mid n, \pi_{11}, \ldots, \pi_{IJ}) = \binom{n}{y_{11}, \ldots, y_{IJ}} \prod_{i=1}^{I} \prod_{j=1}^{J} \pi_{ij}^{y_{ij}}$$
$$\Bigl(0 \le \pi_{ij} \le 1;\ \sum_{i=1}^{I} \sum_{j=1}^{J} \pi_{ij} = 1;\ y_{ij} = 0, 1, \ldots, n;\ \sum_{i=1}^{I} \sum_{j=1}^{J} y_{ij} = n\Bigr).$$
(a) [2 marks] Suppose there are no further restrictions on the $\pi_{ij}$. How many free parameters
are there among $\pi_{11}, \ldots, \pi_{IJ}$? Explain.
(b) [1 mark] Write down the log likelihood.
(c) Now impose the null hypothesis
$$H_0 : \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j} \quad (i = 1, \ldots, I;\ j = 1, \ldots, J),$$
where $\pi_{i\cdot}$ and $\pi_{\cdot j}$ are marginal probabilities for the levels of the row and column variables,
respectively.
i. [1 mark] Briefly, what does $H_0$ imply in terms of the row and column variables?
ii. [2 marks] In total how many free parameters are there among $\pi_{1\cdot}, \ldots, \pi_{I\cdot}$ and
$\pi_{\cdot 1}, \ldots, \pi_{\cdot J}$? Explain.
iii. [2 marks] Hence how many degrees of freedom are there for testing $H_0$ against an
alternative hypothesis that $H_0$ is not true? Explain briefly.
iv. [2 marks] The maximum likelihood estimates under $H_0$ are $\hat{\pi}_{i\cdot} = y_{i\cdot}/n$ and
$\hat{\pi}_{\cdot j} = y_{\cdot j}/n$. What is the expected frequency under $H_0$ corresponding to $y_{ij}$? Explain.
v. [3 marks] How would you test $H_0$ against an alternative $H_a$ that is the negation of
$H_0$? Make sure you outline all steps with enough detail to implement the test.
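For readers who want to see the arithmetic behind question 2(c), here is a minimal Python sketch, not a model answer. It plugs the stated estimates (row total over n and column total over n) into the product form of the null hypothesis to get fitted frequencies, then forms Pearson's chi-squared statistic. Pearson's statistic is one common choice and is an assumption here; the course may intend a likelihood ratio statistic instead. The 2 x 2 frequencies and all function names are hypothetical.

```python
def expected_frequencies(table):
    """Fitted counts n * (row total / n) * (column total / n) under the product form."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_statistic(table):
    """Sum over all cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (y - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for y, e in zip(obs_row, exp_row)
    )

table = [[10, 20], [20, 10]]  # hypothetical frequencies, for illustration only
expected = expected_frequencies(table)
x2 = pearson_statistic(table)

# Reference chi-squared degrees of freedom for an I x J table: (I - 1)(J - 1).
df = (len(table) - 1) * (len(table[0]) - 1)
print(expected, x2, df)
```

The statistic would then be compared with an upper quantile of the chi-squared distribution on `df` degrees of freedom; the standard library has no chi-squared quantile function, so that step is left out of the sketch.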
3. Two samples of data-transmission lines are available. The first sample consists of $n_1$ lines
of length about 22 km; the second is $n_2$ lines of about 170 km. They are referred to as the
"22-km sample" and the "170-km sample" below. For both samples, data on the number of
faults per line is recorded.
Let $y_1, \ldots, y_{n_1}$ denote the observed numbers of faults for the lines in the 22-km sample,
assumed to be the values of IID draws from a $\mathrm{Pois}(\mu_1)$ distribution. Similarly, $z_1, \ldots, z_{n_2}$ for
the 170-km sample are assumed to be values of IID draws from $\mathrm{Pois}(\mu_2)$. Thus, the Poisson
mean, $\mu$, is allowed to be different for the two distributions. Furthermore, the two random
samples are assumed to be drawn independently of each other.
(a) [3 marks] Explain carefully why the joint log likelihood for both samples is
$$c - n_1 \mu_1 - n_2 \mu_2 + \ln(\mu_1) \sum_{i=1}^{n_1} y_i + \ln(\mu_2) \sum_{j=1}^{n_2} z_j,$$
where $c$ is a constant which does not depend on $\mu_1$ or $\mu_2$.
(b) [2 marks] Now we further assume that the Poisson mean is proportional to the length
of the line, i.e.,
$$\mu_1 = 22\lambda \quad \text{and} \quad \mu_2 = 170\lambda,$$
where $\lambda > 0$ is the mean number of faults per kilometre. Show that the log likelihood is
$$d - (22 n_1 + 170 n_2)\lambda + \ln(\lambda) \left( \sum_{i=1}^{n_1} y_i + \sum_{j=1}^{n_2} z_j \right),$$
where $d$ is a constant not depending on $\lambda$.
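To make the log likelihood in part (b) concrete, the sketch below evaluates it numerically with the constant d omitted, using `lam` for the mean number of faults per kilometre. This is only an illustration under stated assumptions: the fault counts are hypothetical, and the function and argument names are not from the exam.

```python
from math import log

def fault_log_likelihood(lam, ys, zs, len1=22.0, len2=170.0):
    """Log likelihood in lam (faults per km), omitting the additive constant d.

    ys and zs are fault counts for lines of length len1 and len2 (in km).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    total_length = len1 * len(ys) + len2 * len(zs)  # 22*n1 + 170*n2
    total_faults = sum(ys) + sum(zs)                # sum of y_i plus sum of z_j
    return -total_length * lam + log(lam) * total_faults

ys = [0, 1, 0]  # hypothetical counts for three 22-km lines
zs = [3, 5]     # hypothetical counts for two 170-km lines

# Evaluate the function on a small grid of candidate values of lam.
values = {lam: fault_log_likelihood(lam, ys, zs) for lam in (0.01, 0.02, 0.03)}
for lam, value in values.items():
    print(lam, value)
```

Evaluating on a grid like this is only a numerical check; it does not replace the calculus asked for in the later parts.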
(c) [3 marks] Find the maximum likelihood estimate of $\lambda$.
(d) [3 marks] Find the Fisher information for estimating $\lambda$.
(e) [1 mark] Hence, give an approximate formula for the variance of the maximum likelihood
estimator of $\lambda$.
(f) [Bonus 2 marks] How do you know that your method for maximizing the likelihood in
part 3c does indeed find the maximum?
4. Let $Y_1, \ldots, Y_n$ be a random sample of independent and identically distributed draws from a
Laplace distribution with parameters $\mu$ and $\phi$. (See Table 2 for properties of the distribution.)
(a) Let $n = 9$. The observations $y_1, \ldots, y_9$ in sorted order are
62.7, 99.3, 100.0, 104.5, 106.3, 107.3, 107.5, 109.5, 117.7.
The following plot shows the log of the joint Laplace probability density function,
$\ln f_{Y_1,\ldots,Y_9}(y_1, \ldots, y_9 \mid \mu, \phi)$, as a function of $\mu$ and $\phi$. (Note that labels of the contours
are negative values.)
[Figure: contour plot of the log joint density against $\mu$ (axis ticks at 95, 100, 105, 110, 115) and $\phi$ (axis ticks at 5, 10, 15, 20, 25); contour labels run from -35 to -45.]
i. [1 mark] What is this function called?
ii. [2 marks] Student A says that the maximum likelihood (ML) estimate of $\mu$ is the
sample mean, whereas Student B says that the ML estimate of $\mu$ is the sample
median (the middle value). Which student is right? Explain briefly (no derivation
required).
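The surface plotted in part (a) can be evaluated at any single point with a few lines of Python. The sketch below uses the Laplace density from Table 2, the nine observations listed in the question, and an arbitrarily chosen point (105, 10) inside the plotted region; the function name and the choice of point are assumptions made for illustration.

```python
from math import log

def laplace_log_density(ys, mu, phi):
    """Log joint Laplace density: sum over observations of -ln(2*phi) - |y - mu| / phi."""
    if phi <= 0:
        raise ValueError("phi must be positive")
    return -len(ys) * log(2 * phi) - sum(abs(y - mu) for y in ys) / phi

# The nine sorted observations from question 4(a).
ys = [62.7, 99.3, 100.0, 104.5, 106.3, 107.3, 107.5, 109.5, 117.7]

# One arbitrary point inside the plotted region.
value = laplace_log_density(ys, mu=105.0, phi=10.0)
print(value)
```

Calling the function over a grid of `mu` and `phi` values would trace out the same contours as the plot.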
(b) Now consider the general case of sample size $n$ with observed values $y_1, \ldots, y_n$.
i. [3 marks] Write down and simplify the log likelihood.
ii. [3 marks] Suppose $n$ is odd. Find the maximum likelihood estimate of $\mu$. Make sure
you argue that the estimate provides the unique maximum of the likelihood.
iii. [2 marks] Suppose the maximum likelihood estimate of $\phi$ is also found. Student B
suggests finding approximate standard errors for the estimators $\tilde{\mu}$ and $\tilde{\phi}$ using the
observed information matrix. Do you think the suggestion is reasonable? Why or
why not?
iv. [2 marks] Suggest another way of finding an approximate standard error for $\tilde{\mu}$.
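Distribution and notation | PMF, $f_Y(y)$ | $E(Y)$ | $\mathrm{Var}(Y)$ | MGF, $M_Y(t)$
Bernoulli, $\mathrm{Bern}(\pi)$ | $f_Y(0) = 1 - \pi$, $f_Y(1) = \pi$ $(y = 0, 1;\ 0 < \pi < 1)$ | $\pi$ | $\pi(1 - \pi)$ | $1 - \pi + \pi e^{t}$ $(-\infty < t < \infty)$
Binomial, $\mathrm{Bin}(n, \pi)$ | $\binom{n}{y} \pi^{y} (1 - \pi)^{n - y}$ $(y = 0, 1, \ldots, n;\ n = 1, 2, \ldots;\ 0 < \pi < 1)$ | $n\pi$ | $n\pi(1 - \pi)$ | $(1 - \pi + \pi e^{t})^{n}$ $(-\infty < t < \infty)$
Geometric, $\mathrm{Geom}_0(\pi)$ | $\pi (1 - \pi)^{y}$ $(y = 0, 1, \ldots, \infty;\ 0 < \pi < 1)$ | $\frac{1 - \pi}{\pi}$ | $\frac{1 - \pi}{\pi^{2}}$ | $\frac{\pi}{1 - (1 - \pi) e^{t}}$ $(-\infty < t < -\ln(1 - \pi))$
Geometric, $\mathrm{Geom}_1(\pi)$ | $\pi (1 - \pi)^{y - 1}$ $(y = 1, 2, \ldots, \infty;\ 0 < \pi < 1)$ | $\frac{1}{\pi}$ | $\frac{1 - \pi}{\pi^{2}}$ | $\frac{\pi e^{t}}{1 - (1 - \pi) e^{t}}$ $(-\infty < t < -\ln(1 - \pi))$
Negative binomial, $\mathrm{NegBin}(n, \pi)$ | $\binom{y - 1}{n - 1} (1 - \pi)^{y - n} \pi^{n}$ $(y = n, n + 1, \ldots, \infty;\ n = 1, 2, \ldots, \infty;\ 0 < \pi < 1)$ | $\frac{n}{\pi}$ | $\frac{n(1 - \pi)}{\pi^{2}}$ | $\left( \frac{\pi e^{t}}{1 - (1 - \pi) e^{t}} \right)^{n}$ $(-\infty < t < -\ln(1 - \pi))$
Poisson, $\mathrm{Pois}(\mu)$ | $\frac{e^{-\mu} \mu^{y}}{y!}$ $(y = 0, 1, \ldots, \infty;\ \mu > 0)$ | $\mu$ | $\mu$ | $e^{\mu(e^{t} - 1)}$ $(-\infty < t < \infty)$

Table 1: Some commonly used discrete distributions, along with their expectations, variances, and
moment generating functions (MGFs)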
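Distribution and notation | PDF, $f_Y(y)$ | $E(Y)$ | $\mathrm{Var}(Y)$ | MGF, $M_Y(t)$
Beta, $\mathrm{Beta}(a, b)$ | $\frac{1}{B(a, b)} y^{a - 1} (1 - y)^{b - 1}$ $(0 < y < 1;\ a > 0;\ b > 0)$ | $\frac{a}{a + b}$ | $\frac{ab}{(a + b)^{2}(a + b + 1)}$ | Not useful
Chi-squared, $\chi^{2}_{d}$ | $\frac{1}{2^{d/2} \Gamma(d/2)} y^{d/2 - 1} e^{-y/2}$ $(y > 0;\ d = 1, 2, \ldots)$ | $d$ | $2d$ | $\frac{1}{(1 - 2t)^{d/2}}$ $(-\infty < t < \frac{1}{2})$
Exponential, $\mathrm{Expon}(\lambda)$ | $\lambda e^{-\lambda y}$ $(y > 0;\ \lambda > 0)$ | $\frac{1}{\lambda}$ | $\frac{1}{\lambda^{2}}$ | $\frac{\lambda}{\lambda - t}$ $(-\infty < t < \lambda)$
Fisher's F, $F_{d_1, d_2}$ | $\frac{(d_1/d_2)^{d_1/2}\, y^{d_1/2 - 1}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right) \left(1 + \frac{d_1}{d_2} y\right)^{\frac{d_1 + d_2}{2}}}$ $(y > 0;\ d_1, d_2 = 1, 2, \ldots)$ | $\frac{d_2}{d_2 - 2}$ $(d_2 > 2)$ | $\frac{2 d_2^{2} (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^{2} (d_2 - 4)}$ $(d_2 > 4)$ | Does not exist
Gamma, $\mathrm{Gamma}(\alpha, \beta)$ | $\frac{1}{\Gamma(\alpha)} (\beta y)^{\alpha - 1} \beta e^{-\beta y}$ $(y > 0;\ \alpha > 0;\ \beta > 0)$ | $\frac{\alpha}{\beta}$ | $\frac{\alpha}{\beta^{2}}$ | $\left( \frac{\beta}{\beta - t} \right)^{\alpha}$ $(-\infty < t < \beta)$
Laplace, $\mathrm{Lap}(\mu, \phi)$ | $\frac{1}{2\phi} e^{-|y - \mu|/\phi}$ $(-\infty < y < \infty;\ -\infty < \mu < \infty;\ \phi > 0)$ | $\mu$ | $2\phi^{2}$ | $\frac{e^{\mu t}}{1 - \phi^{2} t^{2}}$ $(|t| < 1/\phi)$
Log-normal, $\mathrm{logN}(\mu, \sigma^{2})$ | $\frac{1}{\sqrt{2\pi}\, \sigma y} e^{-\frac{1}{2\sigma^{2}} (\ln(y) - \mu)^{2}}$ $(y > 0;\ -\infty < \mu < \infty;\ \sigma^{2} > 0)$ | $e^{\mu + \sigma^{2}/2}$ | $(e^{\sigma^{2}} - 1) e^{2\mu + \sigma^{2}}$ | Does not exist in a neighbourhood of $t = 0$
Normal, $N(\mu, \sigma^{2})$ | $\frac{1}{\sqrt{2\pi}\, \sigma} e^{-\frac{1}{2\sigma^{2}} (y - \mu)^{2}}$ $(-\infty < y < \infty;\ -\infty < \mu < \infty;\ \sigma^{2} > 0)$ | $\mu$ | $\sigma^{2}$ | $e^{\mu t + \frac{1}{2} \sigma^{2} t^{2}}$ $(-\infty < t < \infty)$
Student's t, $t_{d}$ | $\frac{1}{B\left(\frac{1}{2}, \frac{d}{2}\right) \sqrt{d} \left(1 + \frac{y^{2}}{d}\right)^{\frac{d + 1}{2}}}$ $(-\infty < y < \infty;\ d = 1, 2, \ldots)$ | $0$ $(d > 1)$ | $\frac{d}{d - 2}$ $(d > 2)$ | Does not exist
Uniform (rectangular), $\mathrm{Unif}(a, b)$ | $\frac{1}{b - a}$ $(a < y < b;\ a < b)$ | $\frac{a + b}{2}$ | $\frac{(b - a)^{2}}{12}$ | $\frac{e^{bt} - e^{at}}{(b - a) t}$ $(-\infty < t < \infty)$

Table 2: Some commonly used continuous distributions, along with their expectations, variances,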
and moment generating functions (MGFs)