MAST30027: Modern Applied Statistics
Assignment 3, 2024.
Due: 11:59pm Monday September 23rd
• This assignment is worth 12% of your total mark.
• To get full marks, show your working, including 1) the R commands and outputs you use, 2) mathematical derivations, and 3) a rigorous explanation of why you reach your conclusions or answers. If you provide only final answers, you will receive zero marks.
• The assignment you hand in must be typed (except for math formulas), and be submitted using LMS as
a single PDF document only (no other formats allowed). For math formulas, you can take a picture of
them. Your answers must be clearly numbered and in the same order as the assignment questions.
• The LMS will not accept late submissions. It is your responsibility to ensure that your assignments
are submitted correctly and on time, and problems with online submissions are not a valid excuse for
submitting a late or incorrect version of an assignment.
• We will mark a selected set of problems. We will select problems worth ≥ 50% of the full marks listed.
• If you need an extension, please contact the lecturer before the due date with appropriate justification and supporting documents. Late assignments will only be accepted if you have obtained an extension from the lecturer before the due date. To ensure that the lecturer can respond to your extension request email before the due date, please contact them at least 24 hours in advance. Under no circumstances will an assignment be marked if solutions for it have been released.
• Also, please read the “Assessments” section in “Subject Overview” page of the LMS.
1. The file assignment3_2024_prob1.txt contains 300 observations. We can read the observations and make a histogram as follows:
> X <- scan(file="assignment3_2024_prob1.txt", what=double())
Read 300 items
> length(X)
[1] 300
> hist(X, breaks=0:max(X))
We will model the observed data using a mixture of three negative binomial distributions. Specifically, we
assume that the observations X1, . . . , X300 are independent of each other, and each Xi follows the mixture
model
Zi ∼ Categorical(π1, π2, 1 − π1 − π2),
Xi|Zi = 1 ∼ NegBinomial(r, p1),
Xi|Zi = 2 ∼ NegBinomial(r, p2),
Xi|Zi = 3 ∼ NegBinomial(r, p3),
where r is some known constant. The negative binomial distribution has the probability mass function
f(x; r, p) = \binom{x + r - 1}{x} p^r (1 - p)^x.
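As a quick sanity check (not required for any marks), this pmf coincides with R's built-in dnbinom() when size = r and prob = p, so the component densities can be evaluated directly. A minimal sketch, with p chosen arbitrarily for illustration:

# Check that choose(x+r-1, x) * p^r * (1-p)^x agrees with dnbinom().
r <- 10       # value of r used later in the assignment
p <- 0.3      # arbitrary illustrative value
x <- 0:20
manual  <- choose(x + r - 1, x) * p^r * (1 - p)^x
builtin <- dnbinom(x, size = r, prob = p)
all.equal(manual, builtin)    # should return TRUE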
We aim to obtain the MLE of the parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
(a) (5 marks) Let X = (X1, . . . , X300) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].
(b) (3 marks) Derive the E-step of the EM algorithm.
(c) (5 marks) Derive the M-step of the EM algorithm.
(d) (5 marks) Note: Your answer for this problem should be typed. Answers that include screen captures of R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X300, assuming a value of r = 10. Terminate the algorithm when the number of EM iterations reaches 500, or when the incomplete log-likelihood has changed by less than 10^{-5} (ε = 10^{-5}). Run the EM algorithm twice with the following two different initial values, and report the parameter estimates corresponding to the highest incomplete log-likelihood.
                     π1    π2    p1    p2    p3
1st initial values   0.1   0.3   0.1   0.8   0.5
2nd initial values   0.2   0.4   0.3   0.2   0.7
For each EM run, check that the incomplete log-likelihoods increase at each EM iteration by plotting them.
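One possible way to organise the implementation is sketched below; it shows only the loop and stopping rule described above. The helpers loglik_incomplete(), e_step(), and m_step() are hypothetical placeholders for the quantities you derive in parts (a)-(c); they are not supplied with the assignment.

# Sketch of an EM loop with the required stopping rule: stop after 500
# iterations or when the incomplete log-likelihood changes by less than 1e-5.
run_em <- function(X, r, theta0, max.iter = 500, eps = 1e-5) {
  theta <- theta0
  # record the incomplete log-likelihood at the initial value and after each
  # iteration so that its monotone increase can be checked by plotting
  logliks <- loglik_incomplete(X, r, theta)     # hypothetical helper
  for (iter in 1:max.iter) {
    resp  <- e_step(X, r, theta)                # hypothetical E-step: responsibilities
    theta <- m_step(X, r, resp)                 # hypothetical M-step: updated estimates
    logliks <- c(logliks, loglik_incomplete(X, r, theta))
    if (abs(logliks[iter + 1] - logliks[iter]) < eps) break
  }
  list(theta = theta, logliks = logliks)
}
# e.g. plot(run_em(X, 10, theta.init)$logliks, type = "b") to check the increase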
2. The file assignment3_2024_prob2.txt contains 100 additional observations. We can read the 300 observations from Problem 1 and the new 100 observations and make histograms as follows:
> X <- scan(file="assignment3_2024_prob1.txt", what=double())
Read 300 items
> X.more <- scan(file="assignment3_2024_prob2.txt", what=double())
Read 100 items
> length(X)
[1] 300
> length(X.more)
[1] 100
> xgrid <- 0:max(c(X,X.more))
> par(mfrow=c(2,2))
> hist(X, breaks=xgrid, ylim=c(0,21), main="Histogram of X", xlab="X")
> hist(X.more, breaks=xgrid, ylim=c(0,21), main="Histogram of X.more", xlab="X.more")
> hist(c(X,X.more), breaks=xgrid, ylim=c(0,21),
+      main="Histogram of X & X.more", xlab="X & X.more")
Let X1, . . . , X300 and X301, . . . , X400 denote the 300 observations from assignment3_2024_prob1.txt and the 100 observations from assignment3_2024_prob2.txt, respectively. We assume that the observations X1, . . . , X400 are independent of each other. We model X1, . . . , X300 using a mixture of three negative
binomial distributions (as we did in Problem 1), but we model X301, . . . , X400 using the first of those three
negative binomial distributions. Specifically, for i = 1, . . . , 300, Xi follows the mixture model
Zi ∼ Categorical(π1, π2, 1 − π1 − π2),
Xi|Zi = 1 ∼ NegBinomial(r, p1),
Xi|Zi = 2 ∼ NegBinomial(r, p2),
Xi|Zi = 3 ∼ NegBinomial(r, p3),
whereas for i = 301, . . . , 400, Xi follows
Xi ∼ NegBinomial(r, p1),
where r is some known constant. We aim to obtain the MLE of the parameters θ = (π1, π2, p1, p2, p3) using the EM algorithm.
(a) (5 marks) Let X = (X1, . . . , X400) and Z = (Z1, . . . , Z300). Derive the expectation of the complete log-likelihood, Q(θ, θ0) = E_{Z|X,θ0}[log P(X, Z | θ)].
(b) (5 marks) Derive the E-step and M-step of the EM algorithm.
(c) (5 marks) Note: Your answer for this problem should be typed. Answers that include screen captures of R code or figures won't be marked.
Implement the EM algorithm and obtain the MLE of the parameters by applying the implemented algorithm to the observed data, X1, . . . , X400, assuming a value of r = 10. Terminate the algorithm when the number of EM iterations reaches 500, or when the incomplete log-likelihood has changed by less than 10^{-5} (ε = 10^{-5}). Run the EM algorithm twice with the following two different initial values, and report the parameter estimates corresponding to the highest incomplete log-likelihood.
                     π1    π2    p1    p2    p3
1st initial values   0.1   0.3   0.1   0.8   0.5
2nd initial values   0.2   0.4   0.3   0.2   0.7
For each EM run, check that the incomplete log-likelihoods increase at each EM iteration by plotting them.
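For reference, a minimal sketch of how the observed-data (incomplete) log-likelihood splits under the Problem 2 model, assuming the parameters are stored as a vector theta = (π1, π2, p1, p2, p3); the function name and layout are illustrative only, not a prescribed structure:

# Incomplete log-likelihood implied by the Problem 2 model: a three-component
# negative binomial mixture for X[1:300] and NegBinomial(r, p1) for X[301:400].
loglik_incomplete_p2 <- function(X, r, theta) {
  pi1 <- theta[1]; pi2 <- theta[2]; pi3 <- 1 - pi1 - pi2
  p   <- theta[3:5]
  X.mix   <- X[1:300]     # observations modelled by the mixture
  X.extra <- X[301:400]   # observations modelled by the first component only
  mix.dens <- pi1 * dnbinom(X.mix, size = r, prob = p[1]) +
              pi2 * dnbinom(X.mix, size = r, prob = p[2]) +
              pi3 * dnbinom(X.mix, size = r, prob = p[3])
  sum(log(mix.dens)) + sum(dnbinom(X.extra, size = r, prob = p[1], log = TRUE))
}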