Supervised Learning, COMP0078
Main Summer Exam Period (Sample Exam)
Suitable for Cohorts: 22/23
Answer ALL EIGHT questions.
Notation: Let [m] := {1, . . . ,m}. We also overload notation so that
[pred] := 1 if pred is true, 0 if pred is false.
Marks for each part of each question are indicated in square brackets.
Calculators are NOT permitted.
1. a. “For the K-nearest neighbour algorithm, larger K values will tend to lead to overfitting.”
True or False? Explain your answer.
[5 marks]
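For intuition only, here is a minimal Python sketch of K-nearest-neighbour prediction on a made-up two-dimensional data set; the data, the query point, and the majority-vote tie-breaking are illustrative assumptions, not part of the question.

from collections import Counter
import math

def knn_predict(train, x, k):
    # Majority vote over the k training points closest to x in Euclidean distance.
    neighbours = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy data with one noisy point near the origin.
train = [((0.0, 0.0), -1), ((0.1, 0.0), +1), ((0.2, 0.1), +1),
         ((1.0, 1.0), +1), ((1.1, 0.9), +1), ((2.0, 2.0), +1)]

print(knn_predict(train, (0.02, 0.0), k=1))  # -1: follows the single noisy point
print(knn_predict(train, (0.02, 0.0), k=3))  # +1: larger K smooths the prediction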
b. First, give Cover’s bound on the Bayes error of the 1-nearest neighbour algorithm
as the number of examples goes to infinity. Second, give an example where the
1-nearest neighbour algorithm achieves the Bayes error as the number of examples
goes to infinity, and explain your reasoning.
[5 marks]
2. a. We consider convexity in this subquestion.
i. Is the following statement true or false? “If f and g are convex then f + g is
convex.” Provide an argument supporting your answer.
ii. Is the following statement true or false? “If f is convex and g is affine (linear
+ a constant) then f(g(·)) is convex.” Provide an argument supporting your
answer.
[5 marks]
b. Kernels between sets. Let X be a finite set. Define K : 2^X × 2^X → R as
K(A, B) := 2^{|A∩B|}
where A, B ⊆ X. Prove that K is a kernel.
[10 marks]
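As a quick numerical sanity check (not a proof, and not part of the question), the sketch below builds the Gram matrix of K(A, B) = 2^{|A∩B|} over all subsets of a small, arbitrarily chosen X and confirms its smallest eigenvalue is non-negative.

import itertools
import numpy as np

X = {"a", "b", "c"}                                  # arbitrary small ground set
subsets = [frozenset(s) for r in range(len(X) + 1)
           for s in itertools.combinations(X, r)]

def K(A, B):
    return 2.0 ** len(A & B)                         # K(A, B) = 2^{|A ∩ B|}

G = np.array([[K(A, B) for B in subsets] for A in subsets])
print(np.linalg.eigvalsh(G).min())                   # non-negative: Gram matrix is PSD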
3. a. Consider the optimisation given by
min_{w,ξ}  ‖w‖ + C ∑_{i=1}^{m} ξ_i
subject to  y_i〈w, x_i〉 ≥ 1 − ξ_i,  ξ_i ≥ 0,  i = 1, . . . , m.
Which part of the optimisation corresponds to the regulariser? What is the loss
function incorporated into the optimisation?
[5 marks]
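An illustrative Python sketch (the data, w and C below are made up): for a fixed w, the smallest slacks satisfying the constraints are ξ_i = max(0, 1 − y_i〈w, x_i〉), which makes the objective easy to evaluate numerically.

import numpy as np

def objective(w, X, y, C):
    slacks = np.maximum(0.0, 1.0 - y * (X @ w))   # tightest feasible xi_i
    return np.linalg.norm(w) + C * slacks.sum()   # ||w|| + C * sum_i xi_i

X = np.array([[1.0, 2.0], [-1.5, 0.5], [0.5, -1.0]])
y = np.array([1.0, -1.0, -1.0])
print(objective(np.array([0.5, 0.5]), X, y, C=1.0))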
b. Assume that the set S = {(x_i, y_i)}_{i=1}^{m} ⊂ R^2 × {−1, 1} of binary examples is strictly
linearly separable by a line going through the origin, that is, there exists a vector
w ∈ R^2 such that the linear function f(x) = w^⊤x, x ∈ R^2, has the property that
y_i f(x_i) > 0 for every i = 1, . . . , m. Consider the optimisation problem (linearly
separable SVM):
P1 :  min_{w}  (1/2) w^⊤w
subject to  y_i w^⊤x_i ≥ 1,  i = 1, . . . , m.
Argue that the above problem has a unique solution. Describe the geometric meaning
of this solution.
[10 marks]
4. A dataset (x_1, y_1), . . . , (x_m, y_m) is generic iff x_i = x_j =⇒ y_i = y_j. A set of functions
H is called weakly-universal if for every generic data set and weighting of the data
{D_i}_{i=1}^{m} such that ∑_{i=1}^{m} D_i = 1, there exists a function h ∈ H with weighted error
∑_{i=1}^{m} D_i [h(x_i) ≠ y_i] < 0.5.
i. A positive decision stump h_{i,z} : R^n → {−1, 1} is defined as
h_{i,z}(x) := 1 if x_i ≤ z, −1 if x_i > z.
The set of all decision stumps is defined as
H_ds := {s · h_{i,z} : s ∈ {−1, 1}, i ∈ {1, . . . , n}, z ∈ R}.
Is H_ds weakly-universal? Provide an argument supporting your answer.
[5 marks]
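An illustrative sketch of the objects defined above (the toy data set and the uniform weights are assumptions): a positive decision stump h_{i,z}, its signed version s · h_{i,z}, and the weighted error under weights D.

def stump(i, z, s=1):
    # Signed positive decision stump: s * h_{i,z}(x).
    return lambda x: s * (1 if x[i] <= z else -1)

def weighted_error(h, data, D):
    # Weighted error: sum_k D_k [h(x_k) != y_k].
    return sum(D[k] for k, (x, y) in enumerate(data) if h(x) != y)

data = [((0.2, 1.0), 1), ((0.7, 0.3), -1), ((0.9, 0.8), -1)]
D = [1 / 3, 1 / 3, 1 / 3]
h = stump(i=0, z=0.5, s=1)          # predicts +1 exactly when x_0 <= 0.5
print(weighted_error(h, data, D))   # 0.0 on this toy data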
ii. Let
S := {a, b, aa, ab, ba, bb, aaa, . . . }
denote the set of all strings of non-zero finite length over {a, b}. A string x is a
substring of y, denoted x ⊑ y, iff x is contained as a consecutive sequence of
characters in y. Thus a ⊑ abba, ab ⊑ abba, abba ⊑ abba, but aba ⋢ abba and
abbaa ⋢ abba. A positive substring-match function g_z : S → {−1, 1} is defined as
g_z(x) := 1 if x ⊑ z, −1 if x ⋢ z.
The set of all substring-match functions is denoted as
H_SS := {s · g_z : s ∈ {−1, 1}, z ∈ S}.
Is H_SS weakly-universal? Provide an argument supporting your answer.
[5 marks]
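A one-line observation worth sketching (the strings are taken from the examples above): in Python, “x is a consecutive substring of z” is exactly the containment test x in z, so g_z is straightforward to implement.

def g(z):
    # Positive substring-match function g_z.
    return lambda x: 1 if x in z else -1

g_abba = g("abba")
print(g_abba("ab"), g_abba("abba"), g_abba("aba"))   # 1 1 -1, matching the examples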
5. Define the c-regret of a learning algorithm A as
c-regret(m) := L_A(S) − c · min_{i∈[n]} L_i(S),
thus the usual regret is the 1-regret.
a. For the weighted majority set-up, argue that without randomised prediction it
is impossible to obtain sublinear c-regret on all training sequences for c < 2.
[5 marks]
b. Show how to select β to obtain sublinear 2-regret.
[10 marks]
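For reference, a minimal sketch of the deterministic weighted majority algorithm the question refers to, assuming n experts giving {−1, +1} advice, multiplicative updates with parameter β ∈ (0, 1), and a weighted majority-vote prediction; the variable names and interface are my own.

def weighted_majority(expert_advice, outcomes, beta):
    # expert_advice[t][i] is expert i's {-1,+1} prediction at trial t; outcomes[t] is y_t.
    n = len(expert_advice[0])
    w = [1.0] * n
    mistakes = 0
    for advice, y in zip(expert_advice, outcomes):
        score = sum(wi * a for wi, a in zip(w, advice))
        pred = 1 if score >= 0 else -1                # weighted majority vote
        mistakes += int(pred != y)
        w = [wi * (beta if a != y else 1.0)           # shrink the experts that erred
             for wi, a in zip(w, advice)]
    return mistakes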
6. a. Given the unlabelled data points (−2, 0), (−1,−1), (0, 0), (1, 1), (2, 0), draw the
2-nearest neighbour graph with respect to the Euclidean distance.
[5 marks]
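A small sketch to tabulate the two nearest neighbours of each of the five given points; whether an edge requires one endpoint or both to select the other follows the course's convention for the K-NN graph, which is not fixed here.

import math

pts = [(-2, 0), (-1, -1), (0, 0), (1, 1), (2, 0)]
for p in pts:
    # Two nearest other points under the Euclidean distance.
    nbrs = sorted((q for q in pts if q != p), key=lambda q: math.dist(p, q))[:2]
    print(p, "->", nbrs)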
b. Consider the parameterised graph D_m (m = 5 in the illustration), which consists
of two m-cliques connected by a single bridge edge. The first clique contains a
node labelled with a ‘−1’ while the second clique contains a node labelled with
a ‘+1’.
Consider the labelling of the graph D_m resulting from harmonic energy minimisation.
Now consider the labelling of D_m as m → ∞. What is the labelling in the limit?
Provide a justification of your answer.
[10 marks]
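An exploratory sketch, not a proof: it builds D_m as two m-cliques joined by one bridge edge, pins the two labelled nodes, and solves the harmonic energy minimisation f_U = (D_UU − W_UU)^{-1} W_UL f_L for the unlabelled nodes. Which node inside each clique carries the label (here: a non-bridge node) is an assumption.

import numpy as np

def harmonic_labels(m):
    n = 2 * m
    W = np.zeros((n, n))
    for clique in (range(m), range(m, n)):          # two m-cliques
        for i in clique:
            for j in clique:
                if i != j:
                    W[i, j] = 1.0
    W[m - 1, m] = W[m, m - 1] = 1.0                 # the bridge edge
    labelled = {0: -1.0, n - 1: +1.0}               # assumed positions of the labels
    U = [i for i in range(n) if i not in labelled]
    L = list(labelled)
    D = np.diag(W.sum(axis=1))
    f_L = np.array([labelled[i] for i in L])
    # Harmonic solution: (D_UU - W_UU) f_U = W_UL f_L.
    return np.linalg.solve((D - W)[np.ix_(U, U)], W[np.ix_(U, L)] @ f_L)

for m in (5, 20, 100):
    f = harmonic_labels(m)
    # Mean value of the unlabelled nodes in each clique.
    print(m, round(f[: m - 1].mean(), 3), round(f[m - 1:].mean(), 3))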
7. We have shown that for a finite hypothesis class H, VCdim(H) ≤ ⌊log2(|H|)⌋. However,
this is just an upper bound. The VC-dimension of a class can be much lower than that:
i. Find an example of a class H of functions over the real interval X = [0, 1] such that
H is infinite while VCdim(H) = 1.
ii. Give an example of a finite hypothesis class H over the domain X = [0, 1], where
VCdim(H) = ⌊log2(|H|)⌋.
[10 marks]
8. a. Explain how the algorithm EXP3:
i. makes a prediction on each trial;
ii. updates its internal state on each trial.
[5 marks]
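For reference, a standard EXP3 sketch for an adversarial bandit with K arms and losses in [0, 1]; the parameter choice, the loss interface get_loss, and using a single mixing/learning parameter γ are illustrative assumptions.

import math
import random

def exp3(K, T, get_loss, gamma=0.1):
    w = [1.0] * K
    for t in range(T):
        total = sum(w)
        # Prediction: sample an arm from exponential weights mixed with uniform exploration.
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]
        arm = random.choices(range(K), weights=p)[0]
        # Update: importance-weighted loss estimate for the pulled arm only.
        loss = get_loss(t, arm)                       # observed loss in [0, 1]
        estimate = loss / p[arm]
        w[arm] *= math.exp(-gamma * estimate / K)     # internal-state update
    return w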
b. Suppose we are setting the parameter α in the fixed-share algorithm. In scenario A
the best sequence of experts/hypotheses has many switches, and in scenario B the
best sequence of experts/hypotheses has significantly fewer switches. All else being
the same, should we choose
i. α_A < α_B, or
ii. α_B < α_A,
where α_A is a good tuning for scenario A and α_B is a good tuning for scenario B.
Indicate your answer and explain your reasoning.
[5 marks]
END OF PAPER