Programming assignment sample: FIT5215 | 学霸联盟

Date: 2021-10-30

2020 Semester Two (November-December 2020)
Examination Period
Faculty of Information Technology
EXAM CODES: FIT5215
TITLE OF PAPER: Deep learning
EXAM DURATION: 2 hours 20 mins
Rules
During an exam, you must not have in your possession any item/material that has not been authorised for your exam. This includes
books, notes, paper, electronic device/s, mobile phone, smart watch/device, calculator, pencil case, or writing on any part of your
body. Any authorised items are listed above. Items/materials on your desk, chair, in your clothing or otherwise on your person will
be deemed to be in your possession.
You must not retain, copy, memorise or note down any exam content for personal use or to share with any other person by any
means following your exam.
You must comply with any instructions given to you by an exam supervisor.
As a student, and under Monash University’s Student Academic Integrity procedure, you must undertake your in-semester tasks,
and end-of-semester tasks, including exams, with honesty and integrity. In exams, you must not allow anyone else to do work for
you and you must not do any work for others. You must not contact, or attempt to contact, another person in an attempt to gain
unfair advantage during your exam session. Assessors may take reasonable steps to check that your work displays the expected
standards of academic integrity.
Failure to comply with the above instructions, or attempting to cheat or cheating in an exam may constitute a breach of instructions
under regulation 23 of the Monash University (Academic Board) Regulations or may constitute an act of academic misconduct
under Part 7 of the Monash University (Council) Regulations.
Authorised Materials
CALCULATORS YES NO Calculator
DICTIONARIES YES NO
NOTES YES NO Notes - Double Sided A4 x 1 -Handwritten only
PERMITTED ITEM YES NO
if yes, items permitted are:
Instructions
Once your exam finishes, you will be given time to scan
a QR code and upload your answers using
your smartphone and laptop.
Here's how to do it. 
This examination is designed for the FIT5215 Deep Learning unit, Semester 2, 2020. It contains THREE (3) parts with a total of 100
marks:
Part A contains 11 multiple-choice questions, together worth a total of 25 marks. For questions with more than one answer,
each correct choice receives a partial mark and each incorrect choice reduces the mark. To receive full marks, only the correct
choices must be selected.
Part B contains 8 short workout questions, worth 35 marks. These questions typically require short knowledge answers and
calculations based on what you have learned in the unit. Having a calculator handy for these questions is
recommended.
Part C contains 9 mixed and written-answer questions, worth 40 marks. These questions typically assess knowledge and
understanding of the lecture content.
Good luck with your exam!
Part A - Multiple Choices (11 questions, 25 marks)
Question 1
Given an auto-encoder with the encoder \(g_{\alpha}\) and the decoder \(h_{\beta}\), match the correct type of auto-encoders
and their corresponding objective functions:
Objective function A: \(\min_{\alpha,\beta} \mathbb{E}_{x \sim P}\left[d(x, h_{\beta}(g_{\alpha}(x))) + \lambda\Omega(z)\right]\).
Objective function B: \(\min_{\alpha,\beta} \mathbb{E}_{x \sim P}\left[d(x, h_{\beta}(g_{\alpha}(x)))\right]\).
Objective function C: \(\min_{\alpha,\beta} \mathbb{E}_{x \sim P}\left[\mathbb{E}_{\epsilon \sim N(0, \eta I)}\left[d(x, h_{\beta}(g_{\alpha}(x + \epsilon)))\right]\right]\).
Standard auto-encoder
• Objective function A • Objective function B
• Objective function C
Denoising auto-encoder
• Objective function A • Objective function B
• Objective function C
Sparse auto-encoder
• Objective function A • Objective function B
• Objective function C
3
Marks
Question 2
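Let \(f(x, y) = x \log(e^{x} + e^{y})\) and \(u = e^{x} + e^{y}\), what is the correct formula for the gradient vector
\(\nabla f\)?
Select one:
A.
\(\mathrm{softmax}([x, y])\)
B.
\(\left[\log u + \frac{x e^{x}}{u},\ \log u + \frac{y e^{y}}{u}\right]\)
C.
\(\left[\log u + \frac{x e^{x}}{u},\ \frac{x e^{y}}{u}\right]\)
D.
\(\left[\log u + \frac{e^{x}}{u},\ \log u + \frac{e^{y}}{u}\right]\)
E.
None of the above.
2
Marks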
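For revision, a candidate gradient for this question can be checked numerically. The sketch below is not an official solution: it applies the product and chain rules to \(f(x, y) = x \log(e^{x} + e^{y})\), which gives \(\partial f/\partial x = \log u + x e^{x}/u\) and \(\partial f/\partial y = x e^{y}/u\), and compares that with central finite differences. The function names and the test point are illustrative assumptions.

```python
import math


def f(x, y):
    """f(x, y) = x * log(e^x + e^y), as defined in the question."""
    return x * math.log(math.exp(x) + math.exp(y))


def analytic_grad(x, y):
    """Gradient from the product and chain rules, with u = e^x + e^y."""
    u = math.exp(x) + math.exp(y)
    df_dx = math.log(u) + x * math.exp(x) / u
    df_dy = x * math.exp(y) / u
    return (df_dx, df_dy)


def numeric_grad(x, y, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    df_dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    df_dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return (df_dx, df_dy)


# Arbitrary test point (an assumption, any finite point works).
point = (0.7, -1.3)
ga = analytic_grad(*point)
gn = numeric_grad(*point)
max_abs_diff = max(abs(a - n) for a, n in zip(ga, gn))
```

The same comparison can be repeated with any of the other listed options in place of `analytic_grad` to see which ones disagree with the numerical estimate.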
Question 3
Let \(\bar{h} = Wx + b\) and \(h = \mathrm{sigmoid}(\bar{h}) = \sigma(\bar{h})\) where the function \(\mathrm{sigmoid}()\) is taken element-wise
as usual. Assume the shapes for \(W\) and \(x\) are \(7 \times 150\) and \(150 \times 1\) respectively. Which of the
following statements are correct:
Select one or more:
A.
The dimension of \(\bar{h}\) and \(x\) must be the same.
B.
The dimension of \(h\) and \(b\) must be the same.
C.
The bias \(b\) must be a vector whose dimension is \(7\).
D.
The dimension for the gradient/Jacobian \(\partial h / \partial x\) is \(150 \times 150\).
E.
The dimension for the gradient/Jacobian \(\partial h / \partial x\) is \(7 \times 150\).
3
Marks
Question 4
Consider a general supervised learning objective of deep learning which typically has the following
optimisation problem, select all correct statement(s):
\(\min_{\theta} J(\theta) = \lambda\Omega(\theta) + \frac{1}{N}\sum_{t=1}^{N} l(y_{t}, f(x_{t}, \theta))\)
Select one or more:
A.
The first term \(\lambda\Omega(\theta)\) is the regularization term used to encourage simpler models.
B.
The second term \(\frac{1}{N}\sum_{t=1}^{N} l(y_{t}, f(x_{t}, \theta))\) is the empirical loss incurred from the prediction in comparison with the true
outcome.
C.
The first term helps to combat underfitting.
D.
The second term helps to combat overfitting.
E.
The first term can help to combat overfitting.
3
Marks
Question 5
For which application is the following architecture best suited?
Select one:
1
Mark
A.
Image captioning.
B.
Machine translation.
C.
Sentiment prediction from texts.
D.
Video classification at frame level.
E.
None of the above.
Question 6
Consider the LSTM cell shown in the following figure. Which statements are correct for the
forget gate?
2
Marks
Select one or more:
A.
We apply sigmoid activation when computing forget gate.
B.
We apply tanh activation when computing forget gate.
C.
It controls the proportion of information in long-term memory to be carried forward.
D.
It controls the proportion of information in short-term memory to be carried forward.
E.
No correct answer(s) listed.
Question 7
Given a sequence of random variables \(x_{1}, x_{2}, \dots, x_{n}\), which of the following equations expresses the
product rule in probability theory?
Select one:
A.
\(p(x_{1}, x_{2}, \dots, x_{n}) = 1\).
B.
\(p(x_{1}, x_{2}, \dots, x_{n}) = p(x_{1})p(x_{2}) \dots p(x_{n})\).
C.
\(p(x_{1}, x_{2}, \dots, x_{n}) = p(x_{n} \mid x_{n-1})p(x_{n-1} \mid x_{n-2}) \dots p(x_{2} \mid x_{1})p(x_{1})\).
D.
\(p(x_{1}, x_{2}, \dots, x_{n}) = p(x_{n} \mid x_{1:n-1})p(x_{n-1} \mid x_{1:n-2}) \dots p(x_{2} \mid x_{1})p(x_{1})\).
1
Mark
Question 8
Consider gradient descent (GD)-based optimisation methods. Which of the following statement(s) are
correct?
Select one or more:
A.
Stochastic GD employs the gradient of a mini-batch to estimate the full gradient.
B.
The GD solution is updated by following the direction of the positive gradient.
C.
A very large learning rate helps gradient descent to converge faster.
D.
Gradient descent is guaranteed to find the global minimum for non-convex functions.
E.
A local maximum is the highest value in its neighborhood in all directions.
F.
SGD’s gradient with respect to a mini-batch is an unbiased estimate of the full gradient.
3
Marks
Question 9
Consider different types of activation functions used in deep learning, select all correct statement(s):
Select one or more:
A.
Sigmoid has a derivative everywhere.
B.
ReLU also has a derivative everywhere.
C.
Sigmoid is an unsaturated activation function.
D.
ReLU has a derivative of zero if the input is less than zero.
E.
None of the above.
2
Marks
Question 10
Regarding the dropout technique in deep learning, select all correct statement(s):
Select one or more:
A.
If the dropout rate during training is 50%, then the dropout rate during testing should be a half of this, which is 25%.
B.
Dropout is simple, but could be an effective method to reduce model capacity, hence helps to reduce overfitting.
C.
Dropout is an effective way to break the symmetry effect in the network.
D.
Dropout helps to solve the internal covariate shift.
E.
None of the above.
3
Marks
Question 11
Given an adversarial example \(x_{adv}\) of a clean example \(x\) with respect to a model \(f()\), where \(y\) is the true
label. Select all correct answer(s):
Select one or more:
A.
\(x_{adv}\) and \(x\) must visually look very different to create the adversarial effect.
B.
\(x_{adv}\) and \(x\) visually look very similar.
C.
\(\arg\max f(x_{adv}) = y\).
D.
\(\arg\max f(x_{adv}) \neq y\).
E.
None of the above.
2
Marks
Part B - Short Workout & Knowledge Questions (8 questions, 35 marks)
Question 12
Let \(\bar{h} = Wx + b\) and \(h = \mathrm{sigmoid}(\bar{h}) = \sigma(\bar{h})\) where the sigmoid function is applied element-wise.
Calculate \(\frac{\partial h}{\partial x}\) and show your steps.
Please answer this question on your blank piece of paper.
After your exam finishes, you’ll have extra time to access your phone to scan a QR code and upload your
answer.
Clearly label each page with your Student ID and this question number (and sub-part if applicable) (for example,
'Question 7a').
Do not write your name on it.
No. of answer sheets: 1
4
Marks
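As a study aid rather than a marking scheme, the chain rule gives \(\frac{\partial h}{\partial x} = \mathrm{diag}\big(\sigma(\bar{h}) \odot (1 - \sigma(\bar{h}))\big) W\), which has the same shape as \(W\). The sketch below checks this against finite differences in plain Python. The small \(2 \times 3\) weights, the input and the bias are hypothetical values chosen only for illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def forward(W, x, b):
    """h = sigmoid(W x + b), with W stored as a list of rows."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def jacobian(W, x, b):
    """Analytic Jacobian: J[i][j] = h_i * (1 - h_i) * W[i][j]."""
    h = forward(W, x, b)
    return [[hi * (1.0 - hi) * w for w in row] for hi, row in zip(h, W)]


def numeric_jacobian(W, x, b, eps=1e-6):
    """Central finite differences, one input coordinate at a time."""
    rows, cols = len(W), len(x)
    J = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        h_plus, h_minus = forward(W, x_plus, b), forward(W, x_minus, b)
        for i in range(rows):
            J[i][j] = (h_plus[i] - h_minus[i]) / (2 * eps)
    return J


# Hypothetical small example: W is 2 x 3, so the Jacobian is also 2 x 3.
W = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]
x = [0.2, -0.4, 0.1]
b = [0.1, -0.2]
J = jacobian(W, x, b)
J_num = numeric_jacobian(W, x, b)
```

With \(W\) of shape \(7 \times 150\), the same reasoning gives a \(7 \times 150\) Jacobian: one row per output unit and one column per input coordinate.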
Question 13
You are constructing a CNN model that mimics the VGG network. Answer the following questions
based on the following code segment:
13a)
a) Explain what the values \(64\) and \((5, 5)\) are in the code. (2 marks)
13b)
b) Identify the depth of the output from this convolutional layer if the input shape is
\([64, 320, 480, 8]\). (2 marks)
13c)
c) Calculate the output size of the feature map after going through these layers assuming the same
input shape as before. (2 marks)
13d)
d) Explain the purpose of including Batch Normalization and Max Pooling in this CNN architecture. (3
marks)
9
Marks
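The code segment itself is not reproduced here, so the following is only a sketch of how convolution output shapes are usually computed. It assumes a channels-last layout (batch, height, width, channels), that \(64\) is the number of filters and \((5, 5)\) the kernel size, and it treats stride and padding as free parameters because the source does not show them. The helper name `conv2d_output_shape` is hypothetical.

```python
def conv2d_output_shape(input_shape, filters, kernel_size, stride=1, padding="valid"):
    """Output shape of a 2-D convolution on a channels-last input.

    'valid' applies no padding: out = floor((n - k) / stride) + 1.
    'same' pads so that:        out = ceil(n / stride).
    The output depth equals the number of filters, whatever the input depth.
    """
    batch, height, width, _channels = input_shape
    k_h, k_w = kernel_size
    if padding == "valid":
        out_h = (height - k_h) // stride + 1
        out_w = (width - k_w) // stride + 1
    elif padding == "same":
        out_h = -(-height // stride)  # ceiling division
        out_w = -(-width // stride)
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    return (batch, out_h, out_w, filters)


shape_in = (64, 320, 480, 8)
valid_out = conv2d_output_shape(shape_in, 64, (5, 5), padding="valid")
same_out = conv2d_output_shape(shape_in, 64, (5, 5), padding="same")
# valid_out == (64, 316, 476, 64); same_out == (64, 320, 480, 64)
```

Under either padding assumption the output depth is 64, since it depends only on the number of filters. The spatial size depends on the padding and stride used in the actual code.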
Information
The following information relates to Question 14 and Question 15.
Consider the first two time-slices of an RNN architecture for the regression task with weight matrices
and bias terms shown in the following figure. Instead of using tanh() as in a standard RNN, assume
that the ReLU() activation is used in all cases. Answer the following questions:
Question 14
Without using any of the specific values on the right-hand side of the figure, write down the analytical
expressions for \(h_{0}\) and \(h_{1}\).
No. of answer sheets: 1
2
Marks
Question 15
Now given the values for \(x_{0}, x_{1}, b, c\) and matrices \(U, V, W\) as shown on the right-hand side of the
figure, calculate \(h_{0}, h_{1}\) and then the output \(\hat{y}_{1}\).
No. of answer sheets: 1
3
Marks
Question 16
Draw the computational graph for a simple two-layer neural network regression function
\(f(x) = W_{2}h + b_{2}\) where \(h = \sigma(W_{1}x + b_{1})\).
No. of answer sheets: 1
3
Marks
Question 17
Consider the setting of applying the data augmentation technique in training image-based deep networks,
answer the following questions:
17a)
a) Explain the key idea behind the data augmentation technique and list THREE (3) methods that could be
performed on the input image for this technique. (3 marks)
17b)
b) List THREE (3) reasons why data augmentation can help to improve the model performance on the test
dataset. (3 marks)
17c)
c) List ONE (1) disadvantage of using data augmentation. (1 mark)
7
Marks
Question 18
Consider the following Autoencoder architecture where \(d\) and \(k\) are respectively the dimensions of \(x\) and \(z\),
answer the following questions:
18a)
a) Is this an under-complete autoencoder? Explain your answer. (1 mark)
18b)
b) What is \(r\) in this figure and how can it be computed? (1 mark)
18c)
c) Explain how the latent code \(z\) can learn a useful representation in an under-complete
autoencoder. (2 marks)
4
Marks
Question 19
Consider an embedding approach in text analytics using the word2vec algorithm with a simple text
corpus consisting of the following sentence: “the quick brown fox jumps over the lazy dog”.
Tokenizing this corpus gives us a dictionary which stores all vocabularies (i.e., unique words in the
corpus) as well as their indices. Assume the above corpus returns the following word-to-indices
dictionary = {'brown': 0, 'lazy': 1, 'over': 2, 'fox': 3, 'dog': 4, 'quick': 5, 'the': 6, 'jumps': 7}. Answer the
following questions:
19a)
a) What is the one-hot encoding for the word ‘lazy’? (1 mark)
19b)
b) Consider using word2vec technique to perform word embedding, which word(s) will be used as
context to predict the target word ‘jumps’ if the window size is \(w = 3\) (i.e., extended to left and right by
\(w\))? (1 mark)
19c)
c) Name the method used in the previous question, i.e., skip-gram or CBOW? (1 mark)
3
Marks
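The two mechanical steps in this question, one-hot encoding and collecting a symmetric context window, can be sketched in a few lines of plain Python. This is an illustrative sketch, not a word2vec implementation: the helper names `one_hot` and `context_words` are hypothetical, and whitespace tokenisation is an assumption.

```python
# Sentence and word-to-index dictionary as given in the question.
tokens = "the quick brown fox jumps over the lazy dog".split()
word_to_index = {'brown': 0, 'lazy': 1, 'over': 2, 'fox': 3,
                 'dog': 4, 'quick': 5, 'the': 6, 'jumps': 7}


def one_hot(word, vocab):
    """Vector of vocabulary length with a single 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab[word]] = 1
    return vec


def context_words(tokens, target, window):
    """Words within `window` positions to the left and right of the target."""
    pos = tokens.index(target)
    left = tokens[max(0, pos - window):pos]
    right = tokens[pos + 1:pos + 1 + window]
    return left + right


lazy_vec = one_hot('lazy', word_to_index)
# lazy_vec == [0, 1, 0, 0, 0, 0, 0, 0]
jumps_context = context_words(tokens, 'jumps', 3)
# jumps_context == ['quick', 'brown', 'fox', 'over', 'the', 'lazy']
```

Predicting a target word from its surrounding context words is the CBOW formulation, while skip-gram goes the other way and predicts the context from the target.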
Part C - Mixed & Written-Answer Questions (9 questions, 40 marks)
Question 20
Describe the overfitting problem in deep learning and describe THREE (3) approaches to address it.
6
Marks
Information
The following information relates to Question 21 and Question 22.
During the COVID period in Melbourne, a research team was using a CNN-based deep learning
approach to predict whether a patient is positive for the novel coronavirus based on a single chest
X-ray image. The team has collected a training dataset, an evaluation dataset and a test dataset. Each
data point contains a single chest X-ray of the patient and a binary outcome indicating whether he or
she has contracted the coronavirus (positive means having contracted the virus).
Question 21
Describe the general architecture of CNNs used for image classification tasks. Briefly explain the
function of each layer.
No. of answer sheets: 2
8
Marks
Question 22
Using the training and evaluation datasets, the team has trained a model G. When applying this model G
to the test dataset, the team achieved a recall of 80% and a precision also of 80%. We also know that
there are 500 patients who have contracted the coronavirus in the test dataset. Answer the following questions:
(a) What is the number of True Positives (TP) and the number of False Positives (FP)? (2 marks)
(b) The test result also yields a specificity of 95%. What is the number of negative cases (i.e.,
healthy patients without the coronavirus)? (2 marks)
(c) What is the total number of patients in the test dataset? Subsequently, calculate the
accuracy. (2 marks)
(d) From the information provided above, is it possible to calculate the AUC? If yes, calculate its
value; if no, explain why. (1 mark)
No. of answer sheets: 2
7
Marks
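The arithmetic behind parts (a) to (c) follows directly from the definitions of recall, precision and specificity. The sketch below works through it as a revision aid, not an official answer key. The variable names are illustrative, and `round` is used only to absorb floating-point error.

```python
# Quantities stated in the question.
positives = 500      # patients who have the virus (TP + FN)
recall = 0.80        # TP / (TP + FN)
precision = 0.80     # TP / (TP + FP)
specificity = 0.95   # TN / (TN + FP)

tp = round(recall * positives)             # true positives
fn = positives - tp                        # false negatives
fp = round(tp / precision) - tp            # false positives
negatives = round(fp / (1 - specificity))  # FP / (TN + FP) = 1 - specificity
tn = negatives - fp                        # true negatives

total = positives + negatives
accuracy = (tp + tn) / total
# tp = 400, fn = 100, fp = 100, tn = 1900, negatives = 2000, total = 2500
# accuracy = 2300 / 2500 = 0.92
```

For part (d), these figures describe a single operating point (one decision threshold). AUC summarises the ROC curve over all thresholds, so it needs the model's scores or several operating points, not one confusion matrix.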
Question 23
What are the gradient vanishing and gradient exploding problems when training a deep learning model
such as DNNs, RNNs or CNNs with back-propagation? Explain why these problems can significantly degrade
learning performance, and list THREE (3) potential approaches which can be used to address them.
No. of answer sheets: 2
6
Marks
Information
The following information relates to Question 24, Question 25, Question 26 and Question 27.
Consider a seq2seq model using the encoder-decoder architecture shown in the figure below. Let
\(\theta_{e}\) and \(\theta_{d}\) be the parameters for the encoder and the decoder
respectively. Answer the following questions:
Question 24
Explain the role of the symbol ‘’.
1
Mark
Question 25
Let \( \theta = [\theta_{e}, \theta_{d}] \) be the parameter vector to be learned and \(D\) be the training
set consisting of multiple pairs of sequences \( (\mathbf{x}, \mathbf{y}) \). Write down the maximum
log-likelihood objective function to learn \(\theta\).
No. of answer sheets: 1
2
Marks
Question 26
Using the product rule, expand the log-likelihood term \( \log P(\mathbf{y} | \mathbf{x}, \theta) \) where
\( \mathbf{y} = [y_{1},\dots,y_{T_{y}}] \) and \( \mathbf{x} = [x_{1},\dots, x_{T_{x}}] \) so that it can be
computed based on the local probability \( P(y_{j} | q_{j}, c, \theta) \).
No. of answer sheets: 1
2
Marks
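One way the expansion in Question 26 is usually written is sketched below. The figure is not available here, so the following are assumptions rather than the paper's definitions: \(q_{j}\) is taken to be the decoder state summarising the previously generated outputs \(y_{1}, \dots, y_{j-1}\), and \(c\) the context vector the encoder computes from \(\mathbf{x}\).

```latex
\[
\log P(\mathbf{y} \mid \mathbf{x}, \theta)
  = \log \prod_{j=1}^{T_{y}} P(y_{j} \mid y_{1:j-1}, \mathbf{x}, \theta)
  = \sum_{j=1}^{T_{y}} \log P(y_{j} \mid y_{1:j-1}, \mathbf{x}, \theta)
  = \sum_{j=1}^{T_{y}} \log P(y_{j} \mid q_{j}, c, \theta)
\]
```

The first equality is the product rule applied to the output sequence. The last step holds only under the stated assumption that \(q_{j}\) and \(c\) together carry all the information the decoder uses about \(y_{1:j-1}\) and \(\mathbf{x}\).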
Question 27
During inference, given an input sequence \(\mathbf{x}\), what are TWO (2) common strategies to
infer the corresponding output sequence \(\mathbf{y}\)?
2
Marks
Question 28
As an AI engineer working on a face anti-spoofing project, you are tasked with using modern deep
generative models to generate as many synthetic unseen faces as possible to train the model. In
addition, the goal is also to make the appearance of the generated faces as diverse as possible.
28a)
a) Describe the key steps involved in using Generative Adversarial Networks (GAN) for this task. (2 marks)
28b)
b) Describe the roles of the generator and the discriminator in a GAN. How should they ideally behave
when the GAN reaches its ideal Nash equilibrium solution? (2 marks)
28c)
c) Name TWO (2) technical problems with GANs that might significantly affect training and the
quality of the output used for the above task. (2 marks)
6
Marks