Primary Exam Semester 2, 2021
Introduction to Statistical Machine Learning
COMP SCI 3314 Course ID: 109413
Total Duration: 150 mins
Questions: answer all 6 questions
Time: 150 mins
Marks: 100 total
Permitted Materials
• Calculator (standard), drawing instruments or rulers, lecture notes,
paper dictionary, paper English dictionary, and paper translation
dictionary are permitted.
• Use of internet is not permitted.
• 1 answer booklet
Introduction to Statistical Machine Learning
Primary Exam Semester 2, 2021 Page 2 of 7
Overview of Machine Learning, etc.
Question 1
(a) Please judge the correctness of the following statement. If it is not
correct, please give reasons:
True or False To perform cross-validation, we can use all the training
data to train the model and a subset of the training data to assess how
the learned model will generalise to an unseen data set.
[4 marks]
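For reference, the protocol that the statement describes can be contrasted with a standard k-fold split, in which each fold is held out in turn and never used to train the model it evaluates. A minimal pure-Python sketch of the splitting step (fold count and dataset size are arbitrary choices):

```python
# Minimal k-fold cross-validation split (illustrative sketch).
# Each fold serves once as held-out validation data; the model is
# trained only on the remaining folds, never on the held-out part.

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k (nearly) equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def kfold_splits(n, k):
    """Yield (train_indices, val_indices) pairs, one per fold."""
    folds = kfold_indices(n, k)
    for i, val in enumerate(folds):
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        yield train, val
```

Each of the k rounds trains on k−1 folds and validates on the remaining one, so every example is used for validation exactly once.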
(b) Which algorithm(s) of the following can be used as supervised
machine learning algorithm(s): (1) Kernel Principal Component
Analysis; (2) k-means clustering; (3) Support Vector Machine
Classifier; (4) k-nearest neighbour classifier; (5) Ridge Regression.
[4 marks]
(c) Describe the difference between a generative classifier and a
discriminative classifier (3 marks), and give one example of each
(2 marks).
[5 marks]
(d) Please judge the correctness of the following statement. If it is not
correct, please give reasons:
True or False Increasing k in a k-nearest neighbour classifier can
lead to a smoother decision boundary.
[3 marks]
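The smoothing effect of a larger k can be seen on a toy 1-D dataset containing one mislabelled point; in this sketch (data and query chosen purely for illustration) k = 1 follows the outlier while k = 3 votes it down:

```python
# Toy 1-D k-NN classifier (illustrative sketch): a single mislabelled
# point sways the k=1 prediction, while k=3 outvotes it -- the
# "smoother decision boundary" effect of increasing k.
from collections import Counter

def knn_predict(train, query, k):
    """train: list of (x, label) pairs; returns the majority label
    among the k training points nearest to query."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Labels are 0 for x < 5 and 1 otherwise, except an outlier at x = 2.
train = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
train[2] = (2, 1)  # mislabelled outlier
```

Querying near the outlier (e.g. at x = 2.1) illustrates the effect: the k = 1 prediction copies the outlier's label, while k = 3 returns the majority label of the neighbourhood.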
[Total for Question 1: 16 marks]
Support Vector Machines (SVMs)
Question 2
Let {(xi, yi)}_{i=1}^{n} be the training data for a binary classification problem,
where xi ∈ R^d and yi ∈ {−1, 1}. Let w ∈ R^d be the parameter vector,
b ∈ R be the offset, and ξi be the slack variable for i = 1, ..., n.
Here the notation ⟨p, q⟩ = p · q denotes the inner product of two
vectors.
(a) What is wrong with the following primal form of the soft margin
SVMs?
    min_{w,b,ξ}  (1/2)‖w‖² + C ∑_{i=1}^{n} ξi,
    s.t.  yi(⟨xi, w⟩ + b) ≥ 1 − ξi,  i = 1, · · · , n,
          ξi ≤ 0,  i = 1, · · · , n.
[3 marks]
(b) The dual form of the hard margin SVMs is given below.
    max_{α}  ∑_{i=1}^{n} αi − (1/2) ∑_{i,j} αiαj yiyj ⟨xi, xj⟩
    s.t.  αi ≥ 0,  i = 1, · · · , n,
          ∑_{i=1}^{n} αiyi = 0.
Answer the following two questions: (1) Express w using the dual
variables and the training data (3 marks). (2) Describe how to find
the support vectors using only the dual variables αi (3 marks).
[6 marks]
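As a point of reference for part (1), stationarity of the Lagrangian gives w = ∑_i αi yi xi, and for part (2) the support vectors are exactly the points with αi > 0. A sketch on a two-point toy problem whose dual optimum α = (1/2, 1/2) can be worked out by hand:

```python
# Recover w from the dual variables: w = sum_i alpha_i * y_i * x_i.
# Toy separable problem: x1 = (1, 0), y1 = +1; x2 = (-1, 0), y2 = -1.
# The hard-margin dual for this problem is maximised at
# alpha = (0.5, 0.5) (hand-derived), so both points have alpha_i > 0
# and are therefore support vectors.

X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]  # known analytic dual solution for this toy problem

w = [sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
     for d in range(2)]
support_vectors = [i for i, a in enumerate(alpha) if a > 1e-8]
```

Here the recovered w = (1, 0) is the maximum-margin separator for these two points, and both training points lie exactly on the margin, as expected for support vectors.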
(c) Assume that we have a modified version of the standard Support
Vector Machines, which has the following primal formulation:
    min_{w,b,ξ}  (1/2)‖w‖² + (C/2) ∑_{i=1}^{n} ξi²,
    s.t.  yi(⟨xi, w⟩ + b) ≥ 1 − ξi,  i = 1, · · · , n.
i. Explain why, with or without the constraint ξi ≥ 0, the above
problem will have the same optimal solution. (4 marks)
ii. Derive the dual formulation of the above primal problem. (8
marks)
[12 marks]
[Total for Question 2: 21 marks]
Ensemble Learning and Regression
Question 3
(a) True or False If a classifier can easily achieve 100% training accu-
racy when trained on the training set, its classification accuracy on
the test set can be further boosted by the Adaboost algorithm.
[3 marks]
(b) Please describe how to use Bagging to learn an ensemble of
classifiers (1 mark) and why classifiers trained by the Bagging
algorithm tend to be different (3 marks).
[4 marks]
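The mechanism Bagging relies on can be sketched directly: each base classifier is fitted to its own bootstrap sample (drawn with replacement), so different classifiers see different data. A minimal sketch of the sampling step (dataset size, ensemble size, and seed are arbitrary choices):

```python
# Bagging's sampling step (illustrative sketch): each base classifier
# is trained on a bootstrap sample drawn with replacement, so the
# samples -- and hence the trained classifiers -- tend to differ.
# Final predictions would be combined by majority vote.
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]

def bagging_samples(data, n_models, seed=0):
    """One bootstrap sample per base classifier."""
    rng = random.Random(seed)
    return [bootstrap_sample(data, rng) for _ in range(n_models)]

data = list(range(20))
samples = bagging_samples(data, n_models=3)
```

Because each sample is drawn with replacement, some examples repeat and others are left out, which is what makes the base classifiers diverse.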
(c) Please write down the objective function of Ridge regression (2
marks) and its solution (3 marks).
[5 marks]
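For reference, the Ridge objective ‖y − Xw‖² + λ‖w‖² has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. A minimal NumPy sketch on noise-free toy data (λ and the data are arbitrary choices):

```python
# Closed-form ridge regression: w = (X^T X + lambda*I)^{-1} X^T y.
import numpy as np

def ridge(X, y, lam):
    """Solve the ridge normal equations for the weight vector."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Toy data generated exactly from y = 2*x1 - 3*x2 (no noise).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = X @ np.array([2.0, -3.0])
w = ridge(X, y, lam=1e-8)  # tiny lambda ~ ordinary least squares
```

With a near-zero λ on noise-free data the solution recovers the true coefficients; a larger λ would shrink w towards zero in exchange for lower variance.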
[Total for Question 3: 12 marks]
Clustering and Kernels
Question 4
(a) True or False The loss function of k-means algorithm will mono-
tonically decrease with the number of iterations. However, for ker-
nel k-means algorithm, this is not necessarily true.
[3 marks]
(b) Please describe at least two advantages of Gaussian-Mixture-Model-
based clustering over the k-means clustering algorithm.
[4 marks]
(c) Suppose we have two kernels K1(·, ·) and K2(·, ·) such that there
exist implicit high-dimensional feature maps Φj : R^d → R^D that
satisfy ∀x, z ∈ R^d, Kj(x, z) = Φj(x) · Φj(z), j = 1, 2, where
Φj(x) · Φj(z) = ⟨Φj(x), Φj(z)⟩ = ∑_{i=1}^{D} Φj(x)_i Φj(z)_i is the dot
product (a.k.a. inner product) in the D-dimensional space.
Note that here Φj(x)_i means the i-th dimension of the vector Φj(x).
Define K(x, z) = K1(x, z)K2(x, z). Will K(x, z) be a valid kernel
function?
If the answer is yes, prove that. If the answer is no, explain why.
[7 marks]
[Total for Question 4: 14 marks]
Principal Component Analysis (PCA) and Linear Discriminant Analysis
(LDA)
Question 5
In this problem, two linear dimensionality reduction methods will be
discussed: principal component analysis (PCA) and linear
discriminant analysis (LDA).
(a) LDA reduces the dimensionality given labels by maximising the
overall interclass variance relative to intraclass variance. Plot the
directions (roughly) of the first PCA and LDA components in the
following figure respectively. In the figure, squares and circles rep-
resent two different classes of data points.
[6 marks]
(b) Given a dataset {xi}, i = 1, · · · , N, with xi ∈ R^d, suppose that after
applying a linear transform W to the data, that is, x′i = Wᵀxi, the
covariance matrix of the transformed data becomes the identity
matrix I. Please describe how to obtain W.
[6 marks]
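One standard construction (a sketch, not the only valid answer): with the eigendecomposition Σ = U diag(s) Uᵀ of the data covariance, taking W = U diag(s^{−1/2}) gives WᵀΣW = I. This whitening can be checked numerically on arbitrary correlated data:

```python
# Whitening sketch: with covariance Sigma = U diag(s) U^T, the choice
# W = U diag(s^{-1/2}) gives cov(W^T x) = W^T Sigma W = I.
import numpy as np

rng = np.random.default_rng(1)
# Correlated toy data: mix independent Gaussians with a random matrix.
X = rng.standard_normal((500, 3)) @ rng.standard_normal((3, 3))
Xc = X - X.mean(axis=0)                  # centre the data
Sigma = Xc.T @ Xc / len(Xc)              # sample covariance

s, U = np.linalg.eigh(Sigma)             # Sigma = U diag(s) U^T
W = U @ np.diag(s ** -0.5)               # whitening transform

Xw = Xc @ W                              # rows are x'_i = W^T x_i
cov_w = Xw.T @ Xw / len(Xw)              # should be the identity
```

The check WᵀΣW = diag(s^{−1/2}) Uᵀ U diag(s) Uᵀ U diag(s^{−1/2}) = I holds exactly in exact arithmetic, so the computed covariance matches I up to rounding.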
(c) Please explain how to efficiently calculate the PCA projection
matrix from a small number of very high-dimensional data samples.
Assume the data matrix is X ∈ R^{d×N}, where d is the feature
dimensionality and N is the number of samples, with d ≫ N, for
example, d = 1,000,000 and N = 100.
[6 marks]
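The usual trick here is to eigendecompose the N×N Gram matrix XᵀX instead of the d×d covariance: if v is an eigenvector of XᵀX with eigenvalue λ, then u = Xv/‖Xv‖ is an eigenvector of XXᵀ with the same eigenvalue. A sketch with stand-in sizes (d = 1000, N = 10 rather than the values in the question):

```python
# PCA via the N x N Gram matrix when d >> N: eigenvectors v of X^T X
# map to eigenvectors u = Xv / ||Xv|| of XX^T, so the d x d covariance
# matrix is never formed.
import numpy as np

rng = np.random.default_rng(2)
d, N = 1000, 10                    # stand-in sizes; the point is d >> N
X = rng.standard_normal((d, N))    # columns are samples
Xc = X - X.mean(axis=1, keepdims=True)   # centre each feature

G = Xc.T @ Xc                      # N x N Gram matrix (cheap to build)
evals, V = np.linalg.eigh(G)       # eigenvalues in ascending order
v_top = V[:, -1]                   # eigenvector of the largest eigenvalue
u_top = Xc @ v_top
u_top /= np.linalg.norm(u_top)     # first principal direction in R^d
```

The cost is dominated by the N×N eigendecomposition and a few d×N products, instead of anything cubic in d.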
[Total for Question 5: 18 marks]
Neural Networks and Semi-supervised Learning
Question 6
(a) Please describe at least one advantage of deep learning over tra-
ditional machine learning approaches, for example, support vector
machines.
[3 marks]
(b) When training a convolutional neural network to recognise
handwritten digits, one finds that performance on the training set is
very good while the performance on the validation set is
unacceptably low. A reasonable fix might be to:
(Select the answer(s) that could be the solution(s), and briefly
explain why):
(A) Reduce the training set size, and increase the validation set
size.
(B) Increase the number of layers and neurons.
(C) Reduce the number of layers and neurons.
(D) Train longer with more iterations.
[2 marks]
(c) Please briefly describe one semi-supervised learning approach (1
mark) and why it can build a stronger model by using unlabelled
data (2 marks).
[3 marks]
(d) We use the following convolutional neural network to classify a set
of 32×32 colour images, that is, the input size is 32×32×3:
1) Layer 1: convolutional layer with the ReLU nonlinear activation
function, 100 5×5 filters with stride 2.
2) Layer 2: 2×2 max-pooling layer.
3) Layer 3: convolutional layer with the ReLU nonlinear activation
function, 50 3×3 filters with stride 1.
4) Layer 4: 2×2 max-pooling layer.
5) Layer 5: fully-connected layer.
6) Layer 6: classification layer.
How many parameters are in the first layer (4 marks), the second
layer (3 marks) and the third layer (assume bias term is used) (4
marks)?
[11 marks]
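For counting purposes, the usual convention (and the one assumed by the "bias term" hint above) is that each convolutional filter spans all input channels and carries one bias, while pooling layers have no learnable parameters. A small sketch of that convention:

```python
# Convolutional parameter counting, assuming the standard convention
# that each filter spans all input channels and carries one bias;
# pooling layers have no learnable parameters.

def conv_params(k, c_in, c_out):
    """Parameters of a conv layer: c_out filters of size k x k x c_in,
    each with one bias term."""
    return (k * k * c_in + 1) * c_out

layer1 = conv_params(5, 3, 100)    # 100 filters of size 5x5x3, plus biases
layer2 = 0                         # 2x2 max-pooling: no parameters
layer3 = conv_params(3, 100, 50)   # pooling preserves the 100 channels
```

Note that stride and padding change the output spatial size but not the parameter count, which depends only on filter size and channel counts.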
[Total for Question 6: 19 marks]
End of exam
