Assignment 2
MATH3821 Statistical Modelling and Computing, UNSW
Term 2, 2024
The total marks available for this assignment is 20. It will be possible to obtain a raw score above 20, however
the final mark will be the minimum of the raw mark and 20. The assignment is due on Friday July 26 at
5pm and should be submitted through the link titled “Submission link - Assignment 2” on the subject’s
Moodle page. Please combine your work into a single pdf document that includes all relevant R code.
Question 1 [3+4+2+2+5+4=20 Marks]
Let $y := (y_1, \ldots, y_n)^T$. In the univariate scatterplot smoothing problem, $y_i$, $i = 1, \ldots, n$, are responses that depend on a single predictor $x_i$ through the equation
\[
y_i = f(x_i) + \varepsilon_i,
\]
where $f(\cdot)$ is a general smooth function, $x_1 < x_2 < \cdots < x_n$, and $\varepsilon_i$, $i = 1, \ldots, n$, are independent error terms with mean 0 and variance $\sigma^2$. Our goal is to estimate $f := (f(x_1), \ldots, f(x_n))^T$, and we denote this estimate by $\hat{f} := (\hat{f}(x_1), \ldots, \hat{f}(x_n))^T$. If we apply a linear smoothing method to compute $\hat{f}$, then there always exists an $n \times n$ matrix $S$ such that
\[
\hat{f} = S y,
\]
which is called the smoothing matrix. The matrix $S$ depends on $x := (x_1, \ldots, x_n)^T$, the smoothing method applied, and the smoothing parameter (labelled $h$ below).
In this question we will apply the kernel smoothing method to compute $\hat{f}$. In this method
\[
\hat{f}(x_i) = \sum_{j=1}^{n} w_j(x_i)\, y_j, \quad \text{where} \quad
w_j(x_i) = \frac{K\!\left(\frac{x_j - x_i}{h}\right)}{\sum_{k=1}^{n} K\!\left(\frac{x_k - x_i}{h}\right)}.
\]
Here h is the smoothing parameter that can be adjusted to manage the bias/variance trade-off and K is the
kernel. The above equation implies that
\[
S_{ij} = w_j(x_i) = \frac{K\!\left(\frac{x_j - x_i}{h}\right)}{\sum_{k=1}^{n} K\!\left(\frac{x_k - x_i}{h}\right)},
\]
where $S_{ij}$ is the $ij$-th entry of the smoothing matrix $S$. Below we let $x = (x_1, \ldots, x_n)^T = (0, 1/6, 2/6, \ldots, 59/6, 10)^T$ and let the kernel $K$ be the standard normal density. The following code computes $S$ with $h = 1$.
h <- 1                                   # smoothing parameter
n <- 61                                  # number of design points
x <- seq(from = 0, to = 10, length.out = n)
S <- matrix(0, nrow = n, ncol = n)
for (i in 1:n) {
  # i-th row of S: normalised kernel weights w_j(x_i)
  S[i, ] <- dnorm((x - x[i])/h) / sum(dnorm((x - x[i])/h))
}
(a) Suppose that $f(x) = x$, so that $f = (f(x_1), \ldots, f(x_{61}))^T = (0, 1/6, \ldots, 10)^T$. Compute the bias vector $b := f - E(\hat{f})$ for $h = 1$, and use it to plot the bias as a function of $x$. Does the smoothing method suffer from boundary bias? Explain your answer. (Hint: we can compute the bias using one of the formulas given in lecture 14.)
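A minimal sketch of one way to carry out this computation, assuming (as for any linear smoother) that $E(\hat{f}) = S E(y) = S f$, so that $b = f - S f$:
f <- x                      # true function values, f(x) = x
b <- f - S %*% f            # bias vector, assuming E(f-hat) = S f
plot(x, b, type = "l", xlab = "x", ylab = "bias")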
(b) Now suppose that $f(x) = x\cos(10 - x)$ and $\sigma^2 = 16$, with $f = (f(x_1), \ldots, f(x_{61}))^T$.
x<-seq(from=0, to=10, length.out=n)
f<-x*cos(10-x)
plot(x,f,type="l")
[Figure: plot of $f(x) = x\cos(10 - x)$ against $x$ for $x \in [0, 10]$.]
(b-continued) The code below plots the average squared bias
\[
\frac{b^T b}{n} = \frac{1}{n}\sum_{i=1}^{n} \left(f_i - E(\hat{f}_i)\right)^2
\]
as a function of the smoothing parameter $h$ for $h \in [0.1, 1.5]$. Edit the code so that it also computes the average variance and the average mean squared error
\[
\frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}(\hat{f}_i)
\quad \text{and} \quad
\mathrm{MSE}(h) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}(\hat{f}_i) + \frac{1}{n}\sum_{i=1}^{n} \left(f_i - E(\hat{f}_i)\right)^2
\]
as functions of $h$. Plot all three functions on the same graph, with $\mathrm{MSE}(h)$ illustrated in black and $\frac{1}{n}\sum_{i=1}^{n} \mathrm{Var}(\hat{f}_i)$ in blue. Use the plot to explain how the smoothing parameter $h$ manages the bias/variance trade-off, and find the value of $h$ (accurate to $\pm 0.05$) that minimises the average mean squared error.
k <- 100                                   # number of h values to try
sigma2 <- 16                               # error variance
hvals <- seq(0.1, 1.5, length.out = k)     # grid of smoothing parameters
bs <- rep(0, k)                            # average squared bias for each h
#
# add some code here to answer question
#
j <- 1
for (h in hvals) {
  S <- matrix(0, nrow = n, ncol = n)
  for (i in 1:n) {
    S[i, ] <- dnorm((x - x[i])/h) / sum(dnorm((x - x[i])/h))
  }
  b <- f - S %*% f                         # bias vector at this h
  bs[j] <- t(b) %*% b / n                  # average squared bias
  #
  # add some code here to answer question
  #
  j <- j + 1
}
plot(hvals,bs,col="red",type="l",ylim=c(0,10))
[Figure: average squared bias bs plotted in red against hvals.]
#use "lines" function here to add function to plot
(c) The code below simulates an outcome of the response vector $y$. Plot $y$, $f$ and $\hat{f}$ as functions of $x$, where $\hat{f}$ is computed using the optimal value of $h$ found in (b). In addition, add 95% confidence intervals for $f$ using the method described at the end of lecture 14.
set.seed(3821)
y<-f+rnorm(n,sd=4)
(d) Explain why in practice (i.e., when we work with data from real-world applications rather than simulated data) we are generally unable to select $h$ by minimising $\mathrm{MSE}(h)$ directly, and suggest an alternative.
(e) Prove that
\[
\hat{f}_{-i}(x_i) = \sum_{j \neq i} S_{ij}\, y_j + S_{ii}\, \hat{f}_{-i}(x_i),
\]
where
\[
\hat{f}_{-i}(x_i) = \frac{\sum_{j \neq i} K\!\left(\frac{x_j - x_i}{h}\right) y_j}{\sum_{k \neq i} K\!\left(\frac{x_k - x_i}{h}\right)},
\]
and explain how this result can be used to construct an algorithm that computes CV(h) efficiently.
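For illustration, a minimal sketch of one such algorithm, assuming CV(h) is the average of the squared leave-one-out prediction errors $(y_i - \hat{f}_{-i}(x_i))^2$: rearranging the identity above gives $\hat{f}_{-i}(x_i) = (\hat{f}(x_i) - S_{ii} y_i)/(1 - S_{ii})$, hence $y_i - \hat{f}_{-i}(x_i) = (y_i - \hat{f}(x_i))/(1 - S_{ii})$, so CV(h) can be obtained from a single fit of the smoother without refitting $n$ times.
# cv_h: leave-one-out cross-validation score for a given h,
# computed from one pass over the data using the identity above
cv_h <- function(h, x, y) {
  n <- length(y)
  S <- matrix(0, nrow = n, ncol = n)
  for (i in 1:n) {
    S[i, ] <- dnorm((x - x[i])/h) / sum(dnorm((x - x[i])/h))
  }
  fhat <- S %*% y
  mean(((y - fhat) / (1 - diag(S)))^2)     # CV(h) = (1/n) sum (y_i - fhat_{-i}(x_i))^2
}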
(f) For y in (c), use the computational method described in (e) to compute CV (h) for h ∈ [0.1, 1.5].
Compare this to PSE(h) by plotting both functions on the same axis. What are the values of h that
minimise CV (h) and PSE(h), respectively? Do you expect these values to be similar? Explain your
answer by citing any relevant results from lectures.
Question 2 [0+0+0=0 Marks — Optional, up to 2 bonus marks available]
In this question we will analyse the spam2.txt dataset seen in the tutorials. The data can be read into R
with the following code:
spam<-read.csv("spam2.txt",header=TRUE)
The predictors are
• ‘freq.excl’, which is the percentage of characters in the email which are exclamation marks ($x_{i1}$),
• ‘freq.dollar’, which is the percentage of characters in the email which are dollar signs ($x_{i2}$),
• ‘freq.hash’, which is the percentage of characters in the email which are hash symbols ($x_{i3}$),
• ‘average’, which is the average run length of capital letters ($x_{i4}$),
and the response is
• ‘isjunk’, which is 1 if the email is spam and 0 otherwise ($y_i$).
There is often a trade-off between the interpretability of a statistical model and its predictive performance.
We are going to use the spam dataset and some of the statistical models seen in the course to explore this
trade-off.
In order to measure the predictive performance of the models we will split the data into a training set and a
validation set. This can be done using the following code (entering the command set.seed(3821) before
using sample ensures your training and validation set are the same as mine):
set.seed(3821)
train<-sample(4601, 3000)
spam.train<-spam[train,]
spam.valid<-spam[-train,]
We consider three statistical models. Some key assumptions of each model are:
Model 1: GLM $y_i \sim \mathrm{Bern}(p(x_i))$, where
\[
\log\!\left(\frac{p(x_i)}{1 - p(x_i)}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \beta_4 x_{i4}.
\]
This model can be fit in R using the code:
spam.glm<-glm(isjunk~.,family=binomial, data=spam.train)
Model 2: GAM $y_i \sim \mathrm{Bern}(p(x_i))$, where
\[
\log\!\left(\frac{p(x_i)}{1 - p(x_i)}\right) = f_1(x_{i1}) + f_2(x_{i2}) + f_3(x_{i3}) + f_4(x_{i4}).
\]
This model can be fit in R using the code:
library(mgcv)
spam.gam<-gam(isjunk~s(freq.excl)+s(freq.dollar)+s(freq.hash)+s(average),
family=binomial, data=spam.train)
Model 3: PPR $y_i = f(x_i) + \varepsilon_i$, where $\varepsilon_i$ is a mean zero error term and
\[
f(x_i) = f(x_{i1}, \ldots, x_{i4}) = \sum_{j=1}^{M} f_j(\alpha_j^T x_i),
\]
for $M = 5$ and $\alpha_1, \ldots, \alpha_M \in \mathbb{R}^4$. This model can be fit in R using the code:
spam.ppr<-ppr(isjunk~freq.excl+freq.dollar+freq.hash+average,
data=spam.train, nterms=5,max.terms=25)
(a) In general, does the addition of a single $, #, or ! to an email have the biggest positive impact on the
chance that the email is spam? Which of the three models would you use to answer this question and
what is your conclusion?
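One possible starting point is to inspect the fitted coefficients of the model you choose; for example, for the GLM the estimated coefficients measure the change in the log-odds of spam per one-unit increase in each predictor (this sketch only displays the fit, it does not settle the question for you):
summary(spam.glm)                 # estimated coefficients, standard errors and p-values
exp(coef(spam.glm))               # multiplicative effect on the odds per one-unit increase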
(b) A programmer believes that the inclusion of a few dollar signs in an email is a strong indication that
the email is spam, but the inclusion of many dollar signs in an email is an indication that the email is
not spam but rather contains text from a programming language. Which of the three models would
you use to explore this question and what is your conclusion?
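A minimal sketch for inspecting how the effect of the dollar-sign frequency changes over its range, assuming Model 2 (the GAM) has been fitted as above:
# plot the four estimated smooth functions from the GAM on one page;
# the panel for s(freq.dollar) shows whether its effect rises and then falls
plot(spam.gam, pages = 1)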
(c) You are tasked with coming up with a spam filter. You choose to apply the classification rule
\[
\hat{y}_i =
\begin{cases}
1, & \text{if } \hat{p}(x_i) \geq 0.5 \\
0, & \text{if } \hat{p}(x_i) < 0.5
\end{cases}
\]
for Models 1 and 2, and
\[
\hat{y}_i =
\begin{cases}
1, & \text{if } \hat{f}(x_i) \geq 0.5 \\
0, & \text{if } \hat{f}(x_i) < 0.5
\end{cases}
\]
for Model 3, where if $\hat{y}_i = 1$ we classify email $i$ as spam. To evaluate the predictive performance of the three models, you use the models fit with the training data to predict which of the emails in the validation set are spam, and then compare these predictions to the ground truth stored in spam.valid[,5]. Based on your analysis, which of the three models would you use for the spam filter? (Hint: the function predict(fitted_model, newdata = spam.valid[,-5]) and the function round may help to obtain the predictions $\hat{y}$. The functions abs and sum may help to compare these predictions to the ground truth spam.valid[,5].)
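A minimal sketch of the comparison for Model 1, assuming the misclassification rate on the validation set is used as the measure of predictive performance (Models 2 and 3 can be handled in the same way, noting that predict applied to the PPR fit returns fitted values $\hat{f}(x_i)$ rather than probabilities):
p.glm <- predict(spam.glm, newdata = spam.valid[, -5], type = "response")   # estimated p(x_i)
yhat.glm <- as.numeric(p.glm >= 0.5)                                        # classify as spam if p >= 0.5
sum(abs(yhat.glm - spam.valid[, 5])) / nrow(spam.valid)                     # misclassification rate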

