COMP5328 Sample Final Exam 2021 Semester 2
This is a sample final exam of COMP5328.
The information about the final exam of COMP5328 is provided below.
Duration: 3 hours and 10 minutes (190 minutes). This includes 10 minutes of reading time,
but you can start writing whenever you are ready.
• Buffer time (Assignment): You will be allowed a buffer time of 15 minutes. This
means that you have 15 minutes after the 'Due' date and time to still be able to
submit your exam before the assignment closes. Please note that your assignment
will be marked as a late submission if you submit after the 'Due' date and time. If
you are unable to submit your exam within the 15 minutes of buffer time, you should
apply for special consideration. Buffer time does NOT mean you have extra time to
complete your exam.
• Upload time: The final exam has 15 minutes of upload time added to the duration
to allow you to upload images/workings etc. as per your exam instructions. Do NOT
treat this as extra time. The upload time must be used solely to save and upload your
documents correctly as per the exam instructions.
Format: The final exam is a take-home exam with 10 questions. You should read the
questions and FILL WORDS / INSERT IMAGES OF YOUR HAND-WRITTEN ANSWERS IN
THE TEX TEMPLATE. For hand-written answers, write them down on A4 paper clearly.
Take images that cover the entire A4 page. Only the image formats ".png" and
".jpeg" are accepted. You should attempt all questions and follow the instructions for each
question carefully.
Question type | Points | Recommended time spent (minutes)
Part A: You are allowed to answer the questions in text only. | 40 | 60
Part B: You are allowed to answer the questions in text and/or image. Words on the images are only to explain the formula, the derivation, or the logic chain. If you want to explain concepts/statements in words, please type the words in the LaTeX answer template. | 60 | 120
Part A: You are allowed to answer the questions in text only.
Question 1 (Approximation error, estimation error) [10pts]
Suppose we have a task of identifying different persons, and a training dataset that
contains pairs of human face images and identities. Two predefined hypothesis classes
are given: linear classifiers and deep neural networks.
1). Which hypothesis class has a larger Vapnik–Chervonenkis (VC) dimension? Explain
why in detail.
2). Which hypothesis class is more likely to produce a smaller approximation error?
Explain why in detail.
3). Which hypothesis class is more likely to produce a smaller estimation error when
trained with this dataset? Explain why in detail.
Question 2 (Causal Inference) [10pts]
We have a directed graphical causal model as follows.
[Figure: a directed graphical causal model over the variables X1, X2, X3, X4, X5, and X6.]
[Note that you can use X-1, X-2, X-3, X-4, X-5, and X-6 to refer to X1, X2, X3, X4, X5,
and X6, respectively. You can use NULL to denote the empty set ∅.]
1). Let X = {X2} and Y = {X6}. Provide a set which d-separates X and Y. Explain
why in detail.
2). Let X = {X1, X2} and Y = {X3, X4}. Provide a set which d-separates X and Y. Explain
why in detail.
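Although the exam's actual diagram is needed to answer these parts, the d-separation criterion itself can be illustrated. Below is a minimal sketch of the moral-graph test (restrict the graph to the ancestors of X ∪ Y ∪ Z, moralize it, delete Z, then test connectivity), run on a hypothetical chain X1 → X2 → X3 rather than the exam's graph; the `parents` dictionary and function names are illustrative assumptions, not part of the question.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All ancestors of `nodes` in the DAG, including the nodes themselves."""
    seen, stack = set(), list(nodes)
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(parents.get(v, ()))
    return seen

def d_separated(parents, X, Y, Z):
    """Moral-graph test: X and Y are d-separated given Z iff, after
    restricting to ancestors of X∪Y∪Z, marrying co-parents, dropping edge
    directions, and deleting Z, no path connects X and Y."""
    keep = ancestors(parents, set(X) | set(Y) | set(Z))
    adj = {v: set() for v in keep}
    for v in keep:
        ps = [p for p in parents.get(v, ()) if p in keep]
        for p in ps:                      # undirected parent-child edges
            adj[v].add(p); adj[p].add(v)
        for a, b in combinations(ps, 2):  # marry co-parents (moralize)
            adj[a].add(b); adj[b].add(a)
    blocked = set(Z)                      # delete the conditioning set
    stack = [x for x in X if x not in blocked]
    seen = set(stack)
    while stack:
        v = stack.pop()
        if v in Y:
            return False                  # X and Y still connected
        for w in adj[v] - blocked:
            if w not in seen:
                seen.add(w); stack.append(w)
    return True

# hypothetical chain X1 -> X2 -> X3 (not the exam's figure)
parents = {"X2": ["X1"], "X3": ["X2"]}
print(d_separated(parents, {"X1"}, {"X3"}, {"X2"}))  # True: X2 blocks the chain
print(d_separated(parents, {"X1"}, {"X3"}, set()))   # False: open path X1-X2-X3
```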
Part B: You are allowed to answer the questions in text and/or image.
Question 5 (Sparse coding) [10pts]
In sparse coding, we prefer sparse matrices.
1). Please briefly explain the concept of sparsity.
2). The definition of the ℓp norm is as follows:

    ℓ_p = ‖α‖_p = ( ∑_{j=1}^{k} |α_j|^p )^{1/p},

where α ∈ R^k. Among the ℓ0, ℓ1, and ℓ2 norms, which one is the best measure of sparsity? Please
give a brief explanation.
3). Explain why the ℓ0 norm is hard to optimize. How can this problem be handled?
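As a sketch of how the three candidate measures behave, consider the toy vector below (chosen arbitrarily for illustration). It has two nonzero entries: the ℓ0 "norm" counts them directly, while ℓ1 and ℓ2 instead weigh the magnitudes of the entries.

```python
import numpy as np

alpha = np.array([0.0, 3.0, 0.0, -4.0, 0.0])  # a sparse toy vector

l0 = np.count_nonzero(alpha)      # number of nonzero entries: 2
l1 = np.sum(np.abs(alpha))        # |3| + |-4| = 7
l2 = np.sqrt(np.sum(alpha ** 2))  # sqrt(9 + 16) = 5

print(l0, l1, l2)  # 2 7.0 5.0
```

Note that scaling alpha by a constant leaves ℓ0 unchanged but rescales ℓ1 and ℓ2, which is one reason ℓ0 is the direct measure of sparsity and ℓ1 is merely its common convex surrogate.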
Question 6 (Expected Risk and Empirical Risk) [10pts]
Let ℓ be a loss function. We can define the expected risk of a hypothesis h as

    R(h) = E[ℓ(X, Y, h)].

The corresponding empirical risk can be defined as

    R_S(h) = (1/n) ∑_{i=1}^{n} ℓ(X_i, Y_i, h),

where S = {(X1, Y1), . . . , (Xn, Yn)} is a training sample consisting of n pairs of i.i.d. random
variables. Let H be the predefined hypothesis class. We further define
    f* = argmin_{h∈H} R(h),
    f_S = argmin_{h∈H} R_S(h).

Prove that R(f_S) − R(f*) ≤ 2 sup_{h∈H} |R_S(h) − R(h)|. Show the detailed proof steps.
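One standard route to this bound, sketched here as a hedged outline rather than a complete graded answer, adds and subtracts the empirical risks of f_S and f*:

```latex
\begin{aligned}
R(f_S) - R(f^*)
  &= \big(R(f_S) - R_S(f_S)\big)
   + \big(R_S(f_S) - R_S(f^*)\big)
   + \big(R_S(f^*) - R(f^*)\big) \\
  &\le \big(R(f_S) - R_S(f_S)\big) + \big(R_S(f^*) - R(f^*)\big)
     && \text{since } f_S \text{ minimizes } R_S \text{ over } \mathcal{H} \\
  &\le 2 \sup_{h \in \mathcal{H}} \big| R_S(h) - R(h) \big|,
\end{aligned}
```

where the last step bounds each of the two remaining terms by the supremum of the absolute deviation over the hypothesis class.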
Question 7 (Optimization) [10pts]
Suppose we have a function f(h), where h is the hypothesis. By Taylor's theorem, for an
update h_{k+1} = h_k + η d_k with step size η > 0 and direction d_k, we have

    f(h_{k+1}) = f(h_k) + η ∇f(h_k)^⊤ d_k + o(η).
1). If we want to minimize the function value, the steepest gradient descent method tells us
that we need to update our hypothesis in the direction of the negative gradient −∇f (hk).
Please explain why.
2). Please explain how Armijo backtracking line search works. (Include some mathematical
notation and formulas if necessary. You have to carefully explain the meaning of each
notation used.)
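As a concrete sketch touching both parts, the snippet below implements steepest descent with Armijo backtracking: the direction is d_k = −∇f(h_k), which makes ∇f(h_k)^⊤ d_k = −‖∇f(h_k)‖² < 0 so the Taylor expansion above decreases f for small η, and the step size is shrunk by a factor β until the sufficient-decrease condition holds. The function names and constants (η₀, β, c) are illustrative choices, not prescribed by the question.

```python
import numpy as np

def armijo_gradient_descent(f, grad, h0, eta0=1.0, beta=0.5, c=1e-4,
                            tol=1e-8, max_iter=500):
    """Steepest descent with Armijo backtracking line search.
    At each iterate, start from step size eta0 and shrink it by beta
    until the sufficient-decrease (Armijo) condition
        f(h + eta*d) <= f(h) + c * eta * grad(h)^T d
    holds, with descent direction d = -grad(h)."""
    h = np.asarray(h0, dtype=float)
    for _ in range(max_iter):
        g = grad(h)
        if np.linalg.norm(g) < tol:
            break                     # gradient ~ 0: (local) minimizer found
        d = -g                        # steepest-descent direction
        eta = eta0
        while f(h + eta * d) > f(h) + c * eta * (g @ d):
            eta *= beta               # backtrack until sufficient decrease
        h = h + eta * d
    return h

# quadratic sanity check: f(h) = ||h - 1||^2 has its minimizer at h = (1, 1)
f = lambda h: np.sum((h - 1.0) ** 2)
grad = lambda h: 2.0 * (h - 1.0)
h_star = armijo_gradient_descent(f, grad, np.zeros(2))
print(np.round(h_star, 4))  # ≈ [1. 1.]
```

On this quadratic, the very first backtrack (η = 0.5) lands exactly on the minimizer; in general the loop terminates because the Armijo condition always holds for sufficiently small η when d is a descent direction.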