程序代写案例-A33648
时间:2022-01-12
A33648 Specified calculator
Page 1 of 8 TURN OVER
Calculators may be used in this examination but must not be used to store text.
Calculators with the ability to store text should have their memories deleted prior to
the start of the examination.


Special Requirements: None












School of Engineering
Department of Electronic, Electrical and Systems
Engineering


Level M


04 30058


Data Mining and Machine Learning

January Examinations 2020


Time Allowed: 2 hours


Answer THREE questions


The allocation of marks within each question is stated in the right-hand
margin.







Specified calculator

A33648 Page 2 of 8 TURN OVER
1. (a) Consider the following six short texts 1, 2, 3, 4, 5, 6:

1: He famously said that his priorities were education, education and
education.
2: The education secretary was Mr Gove.
3: Some parents teach their children at home.
4: Home schooling can be a challenge for parents.
5: The school education system needs reform.
6: Education is a priority.

After stop word removal and stemming these become the following
documents:

1: famous priority educate educate educate
2: educate secretary gove
3: parent teach child home
4: home school challenge parent
5: school educate system reform
6: educate priority


(i) Calculate the Inverse Document Frequency (IDF) for each term in the
documents and the TF-IDF weights for each term relative to documents 1
and 5.
[6]

(ii) Now consider the query “the education system”, which after stop word
removal and stemming, has the form:

: educate system

Calculate the lengths of the documents 1 and 5 and the query .
[4]

(iii) Calculate the similarities between the query and the documents 1 and
5.
[4]

(iv) How many times would the term “educate” need to appear in to make 1
more similar than 5 to ?
[4]

(v) If the terms “teach” in 3 and “school” in 4 are both replaced by the term
“educate”, how many times would the term “educate” need to appear in
to make 1 more similar than 5 to ?
[2]










Specified calculator

A33648 Page 3 of 8 TURN OVER
2. (a) Let = {1, … , 8} be the following set of points in the plane:

1 = [
−3
3
] , 2 = [
1
3
] , 3 = [
−1
2
] , 4 = [
2
4
],

5 = [
−2
0
], 6 = [
3
4
] , 7 = [
−1
3
] , 8 = [
2
2
],

and let = {1, 2} be two centroids:

1 = [
1
3.5
] , 2 = [
2
3
] ,

that represent .


(i) Calculate the new values of 1 and 2 after one iteration of -means
clustering ( = 2). Use the Euclidean (2) metric for distance calculations.
You must show the steps in your calculations.
[5]

(ii) Carefully draw a scatter plot of the points , the original centroids and the
new centroids. Based on this plot, would the centroids change or remain
unchanged after a further iteration of -means clustering? Explain your
answer.
[4]


(b) Consider 2-dimensional data being modelled using a Gaussian Mixture
Model (GMM) with two mixture components and diagonal covariance
matrices. The parameters of the GMM components are:

Gaussian 1: mean 1 = [
4
6
], variance 1
2 = [
1
1
], and weight 1 = 0.6

Gaussian 2: mean 2 = [
8
3
], variance 2
2 = [
1
4
], and weight 2 = 0.4

Calculate the probability of the sequence of data = ([
4
5
] , [
8
4
]) being
generated by the GMM. Show your working.
[4]


(c) Latent Semantic Analysis (LSA) is applied to a set of documents with a
vocabulary of different words. The Word-Document matrix is
decomposed into = using Singular Value Decomposition.


(i) In this context, explain what is meant by the Word-Document matrix. [2]

(ii) What are the dimensions and properties of the matrices , and ? [3]

(iii) Explain how the latent semantic classes that emerge from LSA can be
interpreted.
[2]

Specified calculator

A33648 Page 4 of 8 TURN OVER
3. (a) Consider two sequences = {1, 2, 3} and = {1, 2, 3, 4, 5}, where:

1 = [
1
2
1
] , 2 = [
2
3.5
2.2
] , 3 = [
6
4
4
]

and
1 = [
0.5
2
1
] , 2 = [
1
2.5
1
] , 3 = [
5
3.5
4
] 4 = [
4.5
3
4.2
] , 5 = [
4.5
2
2
] .


Use Dynamic Programming with the city-block (1) metric to calculate the
accumulated distance and the optimal alignment path between and .
You should assume that insertion and deletion penalties are zero. Your
answer must include all of the steps in the calculation, in particular:


(i) The distance matrix between and . [2]

(ii) The accumulated distance matrix between and . [3]

(iii) The path matrix for and . [2]

(iv) The optimal alignment path for and . [1]

(v) The accumulated distance between and . [1]


(b) Figure 3.1 shows a Multi-Layer Perceptron (MLP) with 3 layers: an input
layer with 3 units and a fixed ‘bias’ unit that always has input 1, a hidden
layer with 2 units, and an output layer with 2 units.











Figure 3.1


The weight matrices W1 (between the Input Layer and the Hidden Layer)
and W2 (between the Hidden Layer and the Output Layer) are given by:

1 = [
0.8 0.3 −0.4 0.5
0.4 −0.8 −0.7 −4
] , 2 = [
−0.6 0.9
0.2 0.4
].





Input Layer



Hidden Layer



Output Layer
1
Specified calculator

A33648 Page 5 of 8 TURN OVER
The Input Layer has linear activation functions, the Hidden Layer has
sigmoid activation functions, and the Output Layer has sigmoid activation
functions.
The sigmoid activation function is given by: () = (1 + −)−1.
The vector = [
5
−3
2
] is applied to the input layer.


(i) Calculate the output from each unit in the Hidden Layer. [2]

(ii) Calculate the output from each unit in the Output Layer. [2]


(c) Consider the MLP from part (b). Now suppose that the input is a training
pattern with target output = [
1
0
].


(i) Use Error Back-Propagation to calculate the values of derivatives of the
error with respect to the weights , , = 1,2, where is the weight
associated with the link between the ℎ unit in the Hidden Layer and the
ℎ unit in the Output Layer.
[4]

(ii) Explain why the initial values of the matrices 1 and 2 may be important
in determining their final values after convergence of the Error Back-
Propagation training process.
[3]
























Specified calculator

A33648 Page 6 of 8 END OF PAPER
4. (a) Figure 4.1 depicts a hidden Markov model (HMM) with three emitting
states. Each of the HMM state can emit the symbol , , or .



Figure 4.1


The state output probability distributions for these states are given by:

(|1) = 0.5, (|1) = 0.3, (|1) = 0.2
(|2) = 0.2, (|2) = 0.3, (|2) = ?
(|3) = 0.1, (|3) = 0.2, (|3) = 0.7

The initial state probability vector is given by P0=(1,0,0).


(i) Calculate the state transition probabilities that are missing in Figure 4.1
and write down the state transition probability matrix. Calculate the
missing value for the state output probability distribution (|2). Justify
your answers.
[2]

(ii) Let = (1, 1, 2, 3, 3) be a sequence of states of M. Let be the
observation sequence (, , , , ). Calculate the probability (, |) of
the observation sequence and the state sequence . Calculate the
probability (|, ).
[4]

(iii) Give the definition of the quantity that is computed using the Viterbi
algorithm and the recursive equation used by the algorithm.
[3]


(b) Consider a sequence of data = (1, … , ) to be modelled by a HMM
whose state output probability density functions (pdfs) are single Gaussian
with a diagonal covariance matrix. Describe how you would estimate the
parameters of the state output pdfs of such a HMM using the maximum-
likelihood procedure.
[5]


(c) What is the precise definition of a phoneme? How is the concept of
phoneme exploited in an HMM-based large vocabulary speech recognition
system?
[3]


(d) What is a ‘triphone HMM’ and what problem does it attempt to solve
compared with a ‘monophone HMM’?
[3]


s3 s2 s1
0.7
0.1
0.5
Specified calculator
A33648 Page 7 of 8























This page is intentionally blank
A33648 LM Data Mining and Machine Learning Specified calculator
Page 8 of 8






Important Reminders

 Coats/outwear should be placed in the designated area.

 Unauthorised materials (e.g. notes or Tippex) must be placed in
the designated area.

 Check that you do not have any unauthorised materials with you
(e.g. in your pockets, pencil case).

 Mobile phones and smart watches must be switched off and
placed in the designated area or under your desk. They must
not be left on your person or in your pockets.

 You are not permitted to use a mobile phone as a clock. If you
have difficulty seeing a clock, please alert an Invigilator.

 You are not permitted to have writing on your hand, arm or
other body part.

 Check that you do not have writing on your hand, arm or other
body part – if you do, you must inform an Invigilator immediately

 Alert an Invigilator immediately if you find any unauthorised item
upon you during the examination.

Any students found with non-permitted items upon their person
during the examination, or who fail to comply with Examination
rules may be subject to Student Conduct procedures.
Do not complete the attendance slip, fill in the front
of the answer book or turn over the question paper
until you are told to do so


essay、essay代写