School of Mathematics and Statistics
MAST90083: Computational Statistics and Data Science
Assignment 2
Due date: No later than 5:00 pm on Tuesday 3rd October 2023
Weight: 15%
Question 1: Kernel Regression
Starter code for this problem is in starter.R. That code generates the data set to be used
for this problem and also provides the true mean function µ(x). The resulting data frame
has an x column (the predictor) and a y column (the response).
1. Plot y versus x. Overlay the true mean function µ(x) using the curve function in R.
What do you notice for x < 4π and x > 4π?
2. Using the np library in R, fit a kernel regression on each of the following data sets:
- only the data points with x < 4π;
- only the data points with x > 4π;
- all the data points.
For each of these regressions, what is the optimal bandwidth? How does the optimal
bandwidth for the overall data set compare to the optimal bandwidth for each of the
halves?
3. For each of the three selected bandwidths, make a plot showing:
- the true mean µ(x);
- the data points;
- the kernel regression predictions, using the selected bandwidth;
- the 95% confidence band for the regression curve µ, using resampling of residuals;
- the 95% confidence band for the regression curve µ, using resampling of cases.
The result should be three plots, each tuned to one of the selected bandwidths. Give
these plots clear titles to distinguish them.
4. How do these three plots differ? In particular, how well do the regressions trained on
the left and right halves perform on each half of the data set? How well does the regression
using the bandwidth selected on the overall data set perform on each half? (Be specific about
the types of problems that occur.) What lesson, if any, does this suggest about kernel
regression for functions of varying smoothness?
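The bandwidth selection in part 2 and the two bootstrap bands in part 3 can be sketched in base R as follows. This is a minimal sketch, not the np implementation: it hand-codes a local-constant (Nadaraya-Watson) estimator with a Gaussian kernel and chooses the bandwidth by leave-one-out least-squares cross-validation, which should correspond to np's defaults (regtype = "lc", ckertype = "gaussian", bwmethod = "cv.ls"). The helpers nw_fit(), loo_cv(), select_bw() and boot_bands() are hypothetical names, the bands are pointwise percentile intervals (one of several reasonable choices), and the data at the end are simulated stand-ins, not the starter.R data.

```r
# Local-constant (Nadaraya-Watson) fit with a Gaussian kernel, evaluated at points x0
nw_fit <- function(x0, x, y, h) {
  w <- dnorm(outer(x0, x, "-") / h)   # length(x0) x length(x) matrix of kernel weights
  as.vector(w %*% y) / rowSums(w)
}

# Leave-one-out least-squares cross-validation score for bandwidth h
loo_cv <- function(h, x, y) {
  w <- dnorm(outer(x, x, "-") / h)
  diag(w) <- 0                        # each point is left out of its own fit
  s <- rowSums(w)
  if (any(s == 0)) return(Inf)        # bandwidth too small: some fits are undefined
  mean((y - as.vector(w %*% y) / s)^2)
}

# One-dimensional search for the CV-minimising bandwidth
select_bw <- function(x, y, lower = 0.01, upper = diff(range(x))) {
  optimize(loo_cv, interval = c(lower, upper), x = x, y = y)$minimum
}

# Pointwise percentile bands from resampling residuals and from resampling cases
boot_bands <- function(x, y, h, grid, B = 500, level = 0.95) {
  n <- length(x)
  fit_x <- nw_fit(x, x, y, h)         # fitted values at the observed x
  res <- y - fit_x                    # residuals
  probs <- c((1 - level) / 2, (1 + level) / 2)
  by_resid <- matrix(NA_real_, B, length(grid))
  by_case <- matrix(NA_real_, B, length(grid))
  for (b in seq_len(B)) {
    # Residuals: keep x fixed, add resampled residuals to the fitted values
    y_star <- fit_x + sample(res, n, replace = TRUE)
    by_resid[b, ] <- nw_fit(grid, x, y_star, h)
    # Cases: resample (x, y) pairs with replacement
    idx <- sample.int(n, n, replace = TRUE)
    by_case[b, ] <- nw_fit(grid, x[idx], y[idx], h)
  }
  band <- function(m) apply(m, 2, quantile, probs = probs, na.rm = TRUE)
  list(fit = nw_fit(grid, x, y, h), resid = band(by_resid), cases = band(by_case))
}

# Usage on simulated toy data (not the starter.R data set)
set.seed(1)
x <- sort(runif(200, 0, 4 * pi))
y <- sin(x) + rnorm(200, sd = 0.3)
h <- select_bw(x, y)
grid <- seq(min(x), max(x), length.out = 100)
bands <- boot_bands(x, y, h, grid, B = 200)
```

With np itself, the selection step would be along the lines of npregbw(y ~ x, data = subset(df, x < 4 * pi))$bw, where df is a placeholder for whatever the starter.R data frame is called. The bands describe the variability of the smoother at a fixed h; they do not correct for smoothing bias.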
Question 2: EM Algorithm
Assume Y has a multinomial distribution with probabilities
p = (0.35 − π, 0.2 + π, 0.3 − π, 0.15 + π).
Assume that the observed frequencies are y = (y₁, y₂, y₃, y₄).
1. Find the log-likelihood function ℓ(π) and its derivatives ∂ℓ(π)/∂π and ∂²ℓ(π)/∂π².
2. Let y = (40, 42, 38, 40). Estimate π using the Newton-Raphson algorithm, with π₀ = 0.2
as the starting value.
3. Assume now that X has a multinomial distribution with probabilities
p = (0.25, 0.1 − π, 0.1, 0.1 + π, 0.2, 0.1 − π, 0.05, 0.1 + π).
The observed data are y₁ = x₁ + x₂ = 40, y₂ = x₃ + x₄ = 42, y₃ = x₅ + x₆ = 38 and
y₄ = x₇ + x₈ = 40, as in part 2. Derive the E-step of the EM algorithm for estimating π.
4. Estimate π using the EM algorithm, taking π₀ = 0.2 as the starting value.
Do you obtain the same answer as in part 2?
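Parts 2 and 4 can be checked numerically with a short R sketch. Up to an additive constant, the log-likelihood is ℓ(π) = y₁ log(0.35 − π) + y₂ log(0.2 + π) + y₃ log(0.3 − π) + y₄ log(0.15 + π) for −0.15 < π < 0.3, and score() and hessian() below are its first and second derivatives. In the 8-cell model, x₂ given y₁ is Binomial(y₁, (0.1 − π)/(0.35 − π)), and x₄, x₆, x₈ are handled the same way, which gives the E-step in em(); maximising the resulting expected complete-data log-likelihood (x₂ + x₆) log(0.1 − π) + (x₄ + x₈) log(0.1 + π) gives the closed-form M-step π = 0.1(a − b)/(a + b), with a = x₄ + x₈ and b = x₂ + x₆. The names newton(), em(), score() and hessian() are hypothetical, and the variable theta stands for π so that R's built-in constant pi is not masked.

```r
# Derivatives of l(pi) = y1 log(0.35 - pi) + y2 log(0.2 + pi) + y3 log(0.3 - pi) + y4 log(0.15 + pi);
# `theta` stands for pi so that R's built-in constant `pi` is left untouched
score <- function(theta, y) {
  -y[1] / (0.35 - theta) + y[2] / (0.2 + theta) - y[3] / (0.3 - theta) + y[4] / (0.15 + theta)
}
hessian <- function(theta, y) {
  -y[1] / (0.35 - theta)^2 - y[2] / (0.2 + theta)^2 -
    y[3] / (0.3 - theta)^2 - y[4] / (0.15 + theta)^2
}

# Newton-Raphson on the observed-data log-likelihood
newton <- function(y, theta0, tol = 1e-10, max_iter = 100) {
  theta <- theta0
  for (k in seq_len(max_iter)) {
    theta_new <- theta - score(theta, y) / hessian(theta, y)   # Newton-Raphson update
    if (theta_new <= -0.15 || theta_new >= 0.3) stop("Newton step left (-0.15, 0.3)")
    if (abs(theta_new - theta) < tol) return(list(estimate = theta_new, iterations = k))
    theta <- theta_new
  }
  stop("Newton-Raphson did not converge")
}

# EM for the 8-cell model; the cells 0.1 - theta and 0.1 + theta need -0.1 < theta < 0.1
em <- function(y, theta0, tol = 1e-10, max_iter = 10000) {
  theta <- theta0
  for (k in seq_len(max_iter)) {
    if (theta <= -0.1 || theta >= 0.1) stop("EM iterate outside (-0.1, 0.1)")
    # E-step: conditional expectations of the latent cells that involve theta
    x2 <- y[1] * (0.1 - theta) / (0.35 - theta)
    x4 <- y[2] * (0.1 + theta) / (0.2 + theta)
    x6 <- y[3] * (0.1 - theta) / (0.3 - theta)
    x8 <- y[4] * (0.1 + theta) / (0.15 + theta)
    # M-step: maximise (x2 + x6) log(0.1 - theta) + (x4 + x8) log(0.1 + theta)
    a <- x4 + x8
    b <- x2 + x6
    theta_new <- 0.1 * (a - b) / (a + b)
    if (abs(theta_new - theta) < tol) return(list(estimate = theta_new, iterations = k))
    theta <- theta_new
  }
  stop("EM did not converge")
}

# Usage with the observed counts from part 2
y <- c(40, 42, 38, 40)
nr <- newton(y, theta0 = 0.2)
em_fit <- em(y, theta0 = 0)
c(newton = nr$estimate, em = em_fit$estimate)
```

The complete-data probabilities 0.1 − π and 0.1 + π are positive only for −0.1 < π < 0.1, so em() stops on the prescribed start π₀ = 0.2 (there the E-step would produce negative expected counts); the usage lines start EM from 0 instead, and since ℓ(π) is concave both methods should then converge to the same maximiser.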
Grading:
Total: 15 points
Question 1: 8 points
Question 2: 7 points
The assignment is to be submitted via the LMS.