STATS 769
Mock Questions
Yong Wang
Department of Statistics
The University of Auckland
2021
Question 1
Using least squares for model fitting, a data scientist decides to
eliminate an insignificant predictor variable from a linear regression
model. Will the residual sum of squares increase, decrease, or
possibly vary in either direction?
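One way to check an answer to this question empirically is a small simulation. The sketch below uses made-up data (the variables x1, x2, y and the sample size are assumptions, not part of the question), fits two nested least-squares models with lm(), and compares their residual sums of squares.

```r
# Simulated data: x2 has no true effect on y (an assumption for illustration)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)

# Full model and the reduced model with x2 eliminated
full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)

# Residual sum of squares of each fit on the training data
rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)

cat("RSS full:   ", rss_full, "\n")
cat("RSS reduced:", rss_reduced, "\n")
```

Because the reduced model is nested in the full one, the least-squares fit of the full model can always reproduce the reduced fit by setting the extra coefficient to zero, which is the key fact to reason from here.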
Question 2
A cubic regression spline fit to a data set is shown below, with
knots marked by solid points. What is the major problem with this
regression spline and how should we modify it for better predictive
performance?
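[Figure: cubic regression spline fitted to a scatterplot of y (axis ticks 0.5 to 3.5) against x (axis ticks 0.0 to 1.0); knots marked by solid points.]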
Question 3
A random sample of univariate observations has been collected,
with n1 = 100 observations in class 1 and n2 = 200 observations in
class 2. The sample mean and variance are, respectively, 0.5 and
1.44 for class 1, and are, respectively, 3 and 4 for class 2. If
quadratic discriminant analysis is performed, which class should an
observation with x = 1.8 be allocated to?
[Can you also work out the class label of the observation predicted
by linear discriminant analysis?]
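The calculation can be sketched in R by evaluating the univariate discriminant scores directly from the summary statistics. Two assumptions are made here that the question does not state: the prior probabilities are estimated by the class proportions, and the common variance for the linear case is the pooled estimate built from sample variances with n - 1 divisors.

```r
# Summary statistics given in the question
n  <- c(100, 200)   # class sizes
mu <- c(0.5, 3)     # sample means
s2 <- c(1.44, 4)    # sample variances
x  <- 1.8           # observation to classify

# Assumption: priors estimated by the class proportions
prior <- n / sum(n)

# Quadratic discriminant scores: each class keeps its own variance
qda_score <- -0.5 * log(s2) - (x - mu)^2 / (2 * s2) + log(prior)

# Linear discriminant scores: one common variance
# Assumption: pooled variance with n - 1 divisors
s2_pooled <- sum((n - 1) * s2) / (sum(n) - 2)
lda_score <- x * mu / s2_pooled - mu^2 / (2 * s2_pooled) + log(prior)

# Allocate to the class with the larger score
qda_class <- which.max(qda_score)
lda_class <- which.max(lda_score)

print(rbind(qda_score, lda_score))
cat("QDA class:", qda_class, " LDA class:", lda_class, "\n")
```

Under these assumptions both rules give class 2 the larger score at x = 1.8; a different choice of priors or variance estimator could change the scores.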
Question 4
Let fit be the output of function tree(), which stores an
unpruned tree built for some data set. Based on the following
cross-validation results, how many leaf nodes should the pruned
tree have for optimal predictive performance?
> cv.tree(fit, method="misclass")
$size
[1] 20 11 9 7 4 2 1
$dev
[1] 38 38 43 54 57 58 212
$k
[1] -Inf 0.00 2.50 3.50 3.67 4.50 164.00
$method
[1] "misclass"
attr(,"class")
[1] "prune" "tree.sequence"
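A minimal sketch of how the size could be read off the output above: copy the $size and $dev vectors, find the smallest cross-validated deviance, and, where several sizes tie, take the smallest tree among them. The tie-breaking rule (prefer the simpler tree) is an assumption about what "optimal" means here; the object fit itself is not available, so the pruning call is shown only as a comment.

```r
# Values copied from the cv.tree() output shown in the question
size <- c(20, 11, 9, 7, 4, 2, 1)
dev  <- c(38, 38, 43, 54, 57, 58, 212)

# Sizes that attain the minimum cross-validated misclassification count
tied_sizes <- size[dev == min(dev)]

# Assumption: among tied sizes, prefer the smallest (simplest) tree
best_size <- min(tied_sizes)
cat("Sizes with minimum dev:", tied_sizes, "\n")
cat("Chosen number of leaf nodes:", best_size, "\n")

# With the tree package loaded and fit available, the pruned tree
# could then be obtained with (not run here):
# pruned <- prune.misclass(fit, best = best_size)
```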
Question 5
Out of the complete, single, average and centroid linkage methods,
which one will most likely produce a clustering result with 2
ring-shaped clusters for the following data?
[Figure: scatterplot of x2 against x1, with both axes running from −1.0 to 1.0.]
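The four linkage methods can be compared on a stand-in data set with hclust(). The plotted data are not reproduced here, so the sketch below assumes two noise-free concentric rings (radii 0.4 and 1.0, 100 points each); these radii and sizes are hypothetical, and results on the actual data may differ.

```r
# Assumption: two concentric rings stand in for the plotted data
theta <- seq(0, 2 * pi, length.out = 101)[-101]   # 100 evenly spaced angles
inner <- 0.4 * cbind(cos(theta), sin(theta))      # inner ring, radius 0.4
outer <- 1.0 * cbind(cos(theta), sin(theta))      # outer ring, radius 1.0
X     <- rbind(inner, outer)
ring  <- rep(1:2, each = 100)                     # true ring membership

d <- dist(X)                                      # Euclidean distances
methods <- c("complete", "single", "average", "centroid")

# Cut each dendrogram into 2 clusters
cl <- sapply(methods, function(m) {
  # hclust() documents centroid linkage for squared Euclidean distances
  dd <- if (m == "centroid") d^2 else d
  cutree(hclust(dd, method = m), k = 2)
})

# Cross-tabulate true rings against the clusters found by each method
for (m in methods) {
  cat("\nLinkage:", m, "\n")
  print(table(ring = ring, cluster = cl[, m]))
}
```

A method recovers the rings when each row of its table has a single non-zero cell, in different columns for the two rings. In this layout, neighbouring points on a ring are far closer to each other than to any point on the other ring, which is the property to think about when comparing how each linkage measures the distance between clusters.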