UNIVERSITY OF SHEFFIELD COM 2004
MODEL SOLUTIONS
SETTER: JPB
Data Provided:
None
DEPARTMENT OF COMPUTER SCIENCE AUTUMN SEMESTER 2017–2018
DATA DRIVEN COMPUTING 2 hours
Answer THREE questions.
All questions carry equal weight. Figures in square brackets indicate the percentage of available marks allocated to each part of a question.
1. This question concerns probability theory.
a) The discrete random variable X represents the outcome of a biased coin toss. X has
the probability distribution given in the table below,
x           H    T
P(X = x)    θ    1−θ
where H represents a head and T represents a tail.
(i) Write an expression in terms of θ for the probability of observing the sequence
H, T, H, H. [5%]
ANSWER:
$\theta \times (1-\theta) \times \theta \times \theta = \theta^3(1-\theta)$
(ii) A sequence of coin tosses is observed that happens to contain $N_H$ heads and $N_T$ tails. Write an expression in terms of θ for the probability of observing this specific sequence. [5%]
ANSWER:
$\theta^{N_H}(1-\theta)^{N_T}$
(iii) Show that, having observed a sequence of coin tosses containing $N_H$ heads and $N_T$ tails, the maximum likelihood estimate of the parameter θ is given by
$$\hat{\theta} = \frac{N_H}{N_H + N_T}$$
[20%]
ANSWER:
Write down the expression for the likelihood of the observed sequence in terms of the parameter θ,
$$P(x;\theta) = \theta^{N_H}(1-\theta)^{N_T}$$
Find the maximum by differentiating with respect to θ and setting the derivative to zero,
$$\frac{dP(x;\theta)}{d\theta} = N_H\,\theta^{N_H-1}(1-\theta)^{N_T} - \theta^{N_H}N_T(1-\theta)^{N_T-1} = 0$$
Dividing through by $\theta^{N_H-1}(1-\theta)^{N_T-1}$ and solving for θ,
$$N_H(1-\theta) - \theta N_T = 0$$
$$N_H = N_H\theta + N_T\theta$$
$$\theta = \frac{N_H}{N_H + N_T}$$
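As a quick numerical cross-check (not part of the model answer), the likelihood can be maximised on a grid; a minimal Python sketch, using the counts from the sequence in part (i) as example values:

```python
# Maximise L(theta) = theta^NH * (1 - theta)^NT on a grid and compare
# with the closed-form estimate NH / (NH + NT).
import numpy as np

NH, NT = 3, 1  # counts from the sequence H, T, H, H in part (i)

theta = np.linspace(0.001, 0.999, 100_000)
likelihood = theta**NH * (1 - theta)**NT

print(theta[np.argmax(likelihood)])  # ~0.75
print(NH / (NH + NT))                # 0.75
```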
b) The discrete random variables X1 and X2 represent the outcome of a pair of independent
but biased coin tosses. Their joint distribution P(X1,X2) is given by the probabilities in
the table below,
          X1 = H    X1 = T
X2 = H      λ         3λ
X2 = T      2λ        ρ
(i) Write down the probability P(X1 = H,X2 = H). [5%]
ANSWER:
λ
(ii) Calculate the probability P(X1 = H) in terms of λ. [5%]
ANSWER:
λ+2λ = 3λ
(iii) Calculate the probability P(X2 = H) in terms of λ. [5%]
ANSWER:
λ+3λ = 4λ
(iv) Given that the coin tosses are independent and that λ is greater than 0, use your
previous answers to calculate the value of λ. [15%]
ANSWER:
Independence implies that
P(X1 = H,X2 = H) = P(X1 = H)×P(X2 = H)
So
$\lambda = 3\lambda \times 4\lambda = 12\lambda^2$
Since λ > 0 we may divide both sides by λ, giving
$$\lambda = \frac{1}{12}$$
COM 2004 3
UNIVERSITY OF SHEFFIELD COM 2004
(v) Calculate the value of ρ. [5%]
ANSWER:
The entries in the table must also sum to one, so
$$\lambda + 3\lambda + 2\lambda + \rho = 1$$
Therefore, with $\lambda = \frac{1}{12}$,
$$\rho = \frac{1}{2}$$
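A small sketch (illustrative only, not part of the model answer) confirming that λ = 1/12 makes the table sum to one and satisfies independence in every cell:

```python
# Check that lam = 1/12 makes the table sum to one and makes the two
# tosses independent in every cell.
from itertools import product

lam = 1 / 12
rho = 1 - 6 * lam  # entries sum to one -> rho = 1/2

joint = {("H", "H"): lam, ("T", "H"): 3 * lam,   # keys are (x1, x2)
         ("H", "T"): 2 * lam, ("T", "T"): rho}

p1 = {v: joint[(v, "H")] + joint[(v, "T")] for v in "HT"}  # marginal of X1
p2 = {v: joint[("H", v)] + joint[("T", v)] for v in "HT"}  # marginal of X2

for x1, x2 in product("HT", repeat=2):
    assert abs(joint[(x1, x2)] - p1[x1] * p2[x2]) < 1e-12
print(rho)  # 0.5
```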
COM 2004 4
UNIVERSITY OF SHEFFIELD COM 2004
c) Consider the distribution sketched in the figure below.
[Figure: a step-function pdf p(x) against x, taking the value 2λ on [0, b) and λ on [b, 1].]
$$p(x) = \begin{cases} 2\lambda & \text{if } 0 \le x < b \\ \lambda & \text{if } b \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
(i) Write an expression for λ in terms of the parameter b. [15%]
ANSWER:
$$2\lambda b + (1-b)\lambda = 1$$
$$\lambda = \frac{1}{1+b}$$
(ii) Two independent samples, x1 and x2, are observed. x1 has the value 0.25 and
x2 has the value 0.75. Sketch p(x1,x2;b) as a function of b as b varies between
0 and 1. Using your sketch, calculate the maximum likelihood estimate of the
parameter b given the observed samples. [20%]
ANSWER:
• For b in the range 0 to 0.25, $p(x_1,x_2) = \lambda \times \lambda = \frac{1}{(1+b)^2}$, which falls from 1 to 16/25.
• For b in the range 0.25 to 0.75, $p(x_1,x_2) = \lambda \times 2\lambda = \frac{2}{(1+b)^2}$, which falls from 32/25 to 32/49.
• For b in the range 0.75 to 1.0, $p(x_1,x_2) = 2\lambda \times 2\lambda = \frac{4}{(1+b)^2}$, which falls from 64/49 to 1.
So $p(x_1,x_2;b)$ attains its maximum value of 64/49 at b = 0.75, which is therefore the maximum likelihood estimate.
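A short numerical sketch (not required by the question) that evaluates the joint likelihood over a grid of b values and confirms the maximum:

```python
# Evaluate the joint likelihood p(x1, x2; b) on a grid of b values and
# confirm the maximum just above b = 0.75.
import numpy as np

x1, x2 = 0.25, 0.75

def density(x, b):
    """Piecewise pdf: 2*lam on [0, b), lam on [b, 1], with lam = 1/(1+b)."""
    lam = 1.0 / (1.0 + b)
    return np.where(x < b, 2 * lam, lam)

b = np.linspace(0.01, 0.99, 10_000)
joint = density(x1, b) * density(x2, b)

print(b[np.argmax(joint)])  # ~0.75
print(joint.max())          # ~64/49 ~ 1.306
```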
2. This question concerns the multivariate normal distribution.
a) Consider the data in the following table showing the height (x1) and arm span (x2) of a
sample of 8 adults.
x1 151.1 152.4 152.9 156.8 161.8 158.6 157.4 158.8
x2 154.5 162.2 151.5 158.2 165.3 165.6 159.8 162.0
The joint distribution of the two variables is to be modelled using a multivariate Gaussian with mean vector µ and covariance matrix Σ.
(i) Calculate an appropriate value for the mean vector, µ. [5%]
ANSWER:
µ = (156.2, 159.9)^T
(ii) Write down the formula for sample variance. Use it to calculate the unbiased
variance estimate for both height and arm span. [10%]
ANSWER:
$$\mathrm{Var}(x) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)^2$$
The variances of height and arm span are 13.9 and 24.9, respectively.
(iii) Write down the formula for sample covariance. Use it to calculate the unbiased
estimate of the covariance between height and arm span. [10%]
ANSWER:
$$\mathrm{Cov}(x,y) = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\mu_x)(y_i-\mu_y)$$
The covariance between height and arm span is 13.5.
(iv) Write down the covariance matrix, Σ. [5%]
ANSWER:
$$\Sigma = \begin{pmatrix} 13.9 & 13.5 \\ 13.5 & 24.9 \end{pmatrix}$$
(v) Compute the inverse covariance matrix, Σ−1. [15%]
ANSWER:
$$\Sigma^{-1} = \begin{pmatrix} 0.154 & -0.0840 \\ -0.0840 & 0.0860 \end{pmatrix}$$
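A compact NumPy sketch (illustrative, not part of the model answer) reproducing parts (i)–(v) from the raw data:

```python
# Reproduce parts (i)-(v) numerically with NumPy.
import numpy as np

X = np.array([[151.1, 154.5], [152.4, 162.2], [152.9, 151.5],
              [156.8, 158.2], [161.8, 165.3], [158.6, 165.6],
              [157.4, 159.8], [158.8, 162.0]])  # rows: people; cols: x1, x2

mu = X.mean(axis=0)              # ~[156.2, 159.9]
Sigma = np.cov(X, rowvar=False)  # unbiased (divides by N-1) by default
Sigma_inv = np.linalg.inv(Sigma)

print(mu)
print(Sigma)      # ~[[13.9, 13.5], [13.5, 24.9]]
print(Sigma_inv)  # ~[[0.154, -0.084], [-0.084, 0.086]]
```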
b) Remember that the pdf of a multivariate Gaussian is given by
$$p(x) = C\,e^{-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)}$$
where C is a scaling constant that does not depend on x.
Using the answer to 2 (a) and the equation above, answer the following questions.
(i) Who should be considered more unusual:
• Ginny who is 162.1 cm tall and has arms 164.2 cm long, or
• Cho who is 156.0 cm tall and has arms 153.1 cm long?
Show your reasoning. [20%]
ANSWER:
Compute $-(x-\mu)^T\Sigma^{-1}(x-\mu)$ for each person and see which is smaller, i.e., more negative, since that corresponds to a lower probability density. For Ginny it is −2.67 and for Cho it is −3.71. Therefore Cho's proportions are more unusual.
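The two exponents can be checked with a few lines of NumPy (a sketch; µ and Σ⁻¹ are entered as the literals derived in part (a)):

```python
# Exponents -(x - mu)^T Sigma^{-1} (x - mu) for both people, using the
# values derived in part (a) (entered here as literals).
import numpy as np

mu = np.array([156.225, 159.8875])
Sigma_inv = np.array([[0.1542, -0.0840], [-0.0840, 0.0860]])

def neg_quad(x):
    d = x - mu
    return -(d @ Sigma_inv @ d)

print(neg_quad(np.array([162.1, 164.2])))  # Ginny: ~ -2.67
print(neg_quad(np.array([156.0, 153.1])))  # Cho:   ~ -3.71 (more unusual)
```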
(ii) A large sample of women is taken and it is found that 120 have measurements similar to those of Ginny. How many women in the same sample would be expected to have measurements similar to those of Cho? [15%]
ANSWER:
The samples should be in the ratio
exp(−(x1−µ)
T Σ−1(x1−µ)) : exp(−(x2−µ)
T Σ−1(x2−µ))
Plugging in the numbers from the previous section
120∗ exp(−3.71)/exp(−2.67) = 42
c) A person’s ‘ape index’ is defined as their arm span minus their height.
(i) Use the data in 2 (a) to estimate a mean and variance for ape index. [10%]
ANSWER:
The ape index is $a = x_2 - x_1$, so $\mu_a = 159.9 - 156.2 = 3.66$. Using $\mathrm{Var}(x_2-x_1) = \mathrm{Var}(x_1) + \mathrm{Var}(x_2) - 2\,\mathrm{Cov}(x_1,x_2)$,
$$\sigma^2 = 13.9 + 24.9 - 2 \times 13.5 = 11.6, \qquad \sigma = 3.41$$
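The same numbers drop out of the paired data directly; a minimal sketch:

```python
# Ape-index statistics computed directly from the paired data in part (a).
import numpy as np

height = np.array([151.1, 152.4, 152.9, 156.8, 161.8, 158.6, 157.4, 158.8])
span = np.array([154.5, 162.2, 151.5, 158.2, 165.3, 165.6, 159.8, 162.0])

ape = span - height
print(ape.mean())       # ~3.66
print(ape.var(ddof=1))  # ~11.6 (unbiased estimate)
print(ape.std(ddof=1))  # ~3.41
```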
(ii) The figure below shows a standard normal distribution, i.e., X ∼ N(0,1). The
percentages indicate the proportion of the total area under the curve for each
segment.
[Figure: the standard normal pdf with the area under the curve divided into half-unit segments from 0 to ±3; moving away from the mean, the segment areas on each side are 19.1%, 15.0%, 9.2%, 4.4%, 1.7%, and 0.5%.]
Using the diagram, estimate the proportion of the population who will have an ape index greater than 10.5. [5%]
ANSWER:
10.5 is 2σ greater than the mean, since (10.5 − 3.66)/3.41 ≈ 2. The area beyond 2σ is 1.7% + 0.5%, so about 2.2% of the population would be expected to have an ape index of 10.5 or greater.
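For comparison, the exact normal tail can be computed in one line (a sketch, assuming SciPy is available):

```python
# Exact tail probability for comparison with the value read off the figure
# (assumes SciPy is available).
from scipy.stats import norm

mu, sigma = 3.66, 3.41
z = (10.5 - mu) / sigma  # ~2.0 standard deviations above the mean
print(norm.sf(z))        # ~0.023, close to the 2.2% read from the diagram
```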
(iii) Using the figure above, estimate the mean-centred range of ape indexes that would include 99% of the population. [5%]
ANSWER:
This would be about ±2.5σ about the mean, since only 0.5% of the area lies beyond 2.5σ in each tail. So
• smallest: 3.66 − 2.5 × 3.41 = −4.87
• biggest: 3.66 + 2.5 × 3.41 = 12.19
i.e., the range [−4.87, 12.19].
COM 2004 9
UNIVERSITY OF SHEFFIELD COM 2004
3. This question concerns classifiers.
a) Consider a Bayesian classification system based on a pair of univariate normal distributions. The distributions have equal variance and equal priors. The mean of class 1 is less than the mean of class 2. For each case below, say whether the decision threshold increases, decreases, remains unchanged, or can move in either direction.
(i) The mean of class 2 is increased. [5%]
ANSWER:
Increases
(ii) The means of class 1 and class 2 are decreased by equal amounts. [5%]
ANSWER:
Decreases
(iii) The prior probability of class 2 is increased. [5%]
ANSWER:
Decreases
(iv) The variances of class 1 and class 2 are increased by equal amounts. [5%]
ANSWER:
Remains the same.
(v) The variance of class 2 is increased. [5%]
ANSWER:
It may increase or decrease, depending on the specific values of the parameters.
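Although not asked for, the statements above can be explored numerically using the standard closed-form threshold for two equal-variance Gaussians; a sketch with illustrative values:

```python
# Closed-form decision threshold between p1*N(mu1, s2) and p2*N(mu2, s2),
# for experimenting with the statements above (illustrative values only).
import numpy as np

def threshold(mu1, mu2, s2, p1=0.5, p2=0.5):
    """Solve p1*N(x; mu1, s2) = p2*N(x; mu2, s2) for x, assuming mu1 < mu2."""
    return 0.5 * (mu1 + mu2) + s2 * np.log(p1 / p2) / (mu2 - mu1)

print(threshold(0, 4, 1))            # 2.0: midpoint for equal priors
print(threshold(0, 6, 1))            # 3.0: raising mu2 raises the threshold (i)
print(threshold(0, 4, 1, 0.3, 0.7))  # ~1.79: raising P(w2) lowers it (iii)
```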
b) Consider a Bayesian classification system based on a pair of 2-D multivariate normal
distributions, p(x|ω1) ∼ N(µ1,Σ1) and p(x|ω2) ∼ N(µ2,Σ2) . The distributions have
the following parameters
$$\mu_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \qquad \mu_2 = \begin{pmatrix} 3 \\ 5 \end{pmatrix} \qquad \Sigma_1 = \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
The classes have equal priors, i.e., P(ω1) = P(ω2).
Calculate the equation for the decision boundary in the form x2 = mx1 + c.
[25%]
ANSWER:
The covariances are equal and circular, so the decision boundary is the perpendicular bisector of the segment joining the class means. The vector between the means is $(3,5)^T - (1,2)^T = (2,3)^T$, therefore the boundary runs in the direction $(-3,2)^T$, i.e., its gradient is $m = -\frac{2}{3}$. It passes through the midpoint of the means, $\frac{1}{2}\left((1,2)^T + (3,5)^T\right) = (2, 3.5)^T$. So $3.5 = -\frac{2}{3} \times 2 + c$, giving $c = \frac{29}{6}$.
$$m = -\frac{2}{3}, \qquad c = \frac{29}{6}$$
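A quick numerical check (not part of the model answer): any point on the computed line should be equidistant from the two means, since the covariances are identity matrices.

```python
# Numerical check of the boundary: with equal identity covariances the
# boundary is exactly the set of points equidistant from the two means.
import numpy as np

mu1, mu2 = np.array([1.0, 2.0]), np.array([3.0, 5.0])
m, c = -2.0 / 3.0, 29.0 / 6.0

for x1 in (0.0, 2.0, 4.0):
    x = np.array([x1, m * x1 + c])  # a point on x2 = m*x1 + c
    print(np.linalg.norm(x - mu1) - np.linalg.norm(x - mu2))  # ~0 each time
```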
c) Consider a K-nearest-neighbour classifier being used to classify 1-D data belonging to
classes ω1 and ω2. The training samples for the two classes are
ω1 = {1,3,5,7,9} ω2 = {2,4,6,8}
The diagram below shows the decision boundaries and class labels for the case K = 1.
x:      1    2    3    4    5    6    7    8    9
class:  ω1 | ω2 | ω1 | ω2 | ω1 | ω2 | ω1 | ω2 | ω1
(decision boundaries at 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5 and 8.5)
Make similar sketches for the cases K = 3, K = 5, K = 7 and K = 9. [25%]
ANSWER:
K = 3
x:      1    2  |  3  |  4  |  5  |  6  |  7  |  8    9
class:    ω1    |  ω2 |  ω1 |  ω2 |  ω1 |  ω2 |   ω1
(decision boundaries at 2.5, 3.5, 4.5, 5.5, 6.5 and 7.5)
K = 5
x:      1    2    3  |  4  |  5  |  6  |  7    8    9
class:       ω1      |  ω2 |  ω1 |  ω2 |      ω1
(decision boundaries at 3.5, 4.5, 5.5 and 6.5)
K = 7
x:      1    2    3    4  |  5  |  6    7    8    9
class:         ω1         |  ω2 |        ω1
(decision boundaries at 4.5 and 5.5)
K = 9
x:      1    2    3    4    5    6    7    8    9
class:                    ω1
(no decision boundaries: all nine training points always vote, and ω1 holds the 5-to-4 majority)
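These patterns can be reproduced by brute force; a minimal NumPy sketch (not part of the model answer):

```python
# Brute-force K-NN over a fine grid of test points; the printed labels
# switch exactly at the decision boundaries sketched above.
import numpy as np

train_x = np.array([1, 3, 5, 7, 9, 2, 4, 6, 8], dtype=float)
train_y = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2])  # 1 = omega_1, 2 = omega_2

def knn_label(x, k):
    nearest = np.argsort(np.abs(train_x - x))[:k]  # indices of k closest samples
    votes = train_y[nearest]
    return 1 if np.sum(votes == 1) > np.sum(votes == 2) else 2

for k in (1, 3, 5, 7, 9):
    labels = [knn_label(x, k) for x in np.arange(0.6, 9.5, 0.2)]
    print(k, "".join(str(lab) for lab in labels))
```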
d) Consider a K-nearest-neighbour classifier that uses a Euclidean distance measure, K = 1, and
the following samples as training data,
$$\omega_1 = \left\{ (0,1)^T, (1,1)^T, (1,2)^T \right\} \qquad \omega_2 = \left\{ (1,0)^T, (2,1)^T \right\}$$
A point is selected uniformly at random from the region defined by 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2.
What is the probability that the point is classified as belonging to class ω1? [Hint: start
by sketching the decision boundary.]
[25%]
ANSWER:
The decision boundary is shown below,
[Figure: sketch of the 1-NN decision regions over the square 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2, with the ω2 region in the lower and lower-right parts of the square nearest the ω2 training points, and the ω1 region covering the rest.]
Within the region 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2, $\frac{5}{8}$ of the area is assigned to class ω1. So there is a 62.5% chance that a randomly selected point is classified as belonging to class ω1.
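The area fraction can be verified by simulation; a Monte-Carlo sketch (illustrative only):

```python
# Monte-Carlo estimate of the probability that a uniform random point in
# the square [0,2] x [0,2] is classified as omega_1 by the 1-NN rule.
import numpy as np

rng = np.random.default_rng(0)
w1 = np.array([[0, 1], [1, 1], [1, 2]], dtype=float)
w2 = np.array([[1, 0], [2, 1]], dtype=float)

pts = rng.uniform(0, 2, size=(200_000, 2))
d1 = np.linalg.norm(pts[:, None, :] - w1, axis=2).min(axis=1)  # nearest omega_1
d2 = np.linalg.norm(pts[:, None, :] - w2, axis=2).min(axis=1)  # nearest omega_2

print((d1 < d2).mean())  # ~0.625, i.e. 5/8
```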
4. This question concerns clustering and dimensionality reduction.
[Figure: eight labelled points A–H plotted in the plane, with both axes marked at 2, 4 and 6.]
a) The points in the above figure are to be clustered using the agglomerative clustering
algorithm. The cluster-to-cluster distance is defined to be the minimum point-to-point
distance. In the initial clustering, C0, each point is in a separate cluster and the clustering
can be represented as a set of sets, as follows.
C0 = { {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H} }
(i) Point-to-point distances are measured using the Manhattan distance. Perform
the algorithm and use set notation to show the clustering after each iteration.
[10%]
ANSWER:
C0 = { {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H} }
C1 = { {A,B}, {C}, {D}, {E}, {F}, {G}, {H} }
C2 = { {A,B}, {D,E}, {C}, {F}, {G}, {H} }
C3 = { {A,B}, {D,E}, {C}, {F,G}, {H} }
C4 = { {A,B,C}, {D,E}, {F,G}, {H} }
C5 = { {A,B,C,D,E}, {F,G}, {H} }
C6 = { {A,B,C,D,E,H}, {F,G} }
C7 = { {A,B,C,D,E,F,G,H} }
(ii) Point-to-point distances are measured using the Euclidean distance. Perform the
algorithm and use set notation to show the clustering after each iteration. [10%]
ANSWER:
C0 = { {A}, {B}, {C}, {D}, {E}, {F}, {G}, {H} }
C1 = { {A,B}, {C}, {D}, {E}, {F}, {G}, {H} }
C2 = { {A,B}, {D,E}, {C}, {F}, {G}, {H} }
C3 = { {A,B,C}, {D,E}, {F}, {G}, {H} }
C4 = { {A,B,C}, {D,E}, {F,G}, {H} }
C5 = { {A,B,C,D,E}, {F,G}, {H} }
C6 = { {A,B,C,D,E,F,G}, {H} }
C7 = { {A,B,C,D,E,F,G,H} }
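For completeness, a generic single-linkage sketch in Python (illustrative only: the coordinates of A–H must be read from the figure and are not recoverable here, so the `points` dictionary below is a placeholder):

```python
# Generic single-linkage agglomerative clustering. The coordinates of A-H
# come from the figure, so the dictionary below is a placeholder.
import itertools
import numpy as np

points = {"A": (1.0, 5.0), "B": (1.5, 5.5)}  # placeholder: fill in all of A-H

def dist(p, q, metric="euclidean"):
    """Point-to-point distance; switch metric to 'manhattan' for part (i)."""
    p, q = np.asarray(p), np.asarray(q)
    return np.abs(p - q).sum() if metric == "manhattan" else np.linalg.norm(p - q)

def cluster_dist(c1, c2):
    """Single linkage: minimum point-to-point distance between clusters."""
    return min(dist(points[a], points[b]) for a in c1 for b in c2)

clusters = [{name} for name in points]
while len(clusters) > 1:
    i, j = min(itertools.combinations(range(len(clusters)), 2),
               key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] |= clusters[j]   # merge the closest pair of clusters
    del clusters[j]              # j > i, so this index is still valid
    print(clusters)
```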
(iii) Draw a dendrogram to represent the hierarchical sequence of clusterings found
when using the Euclidean distance. [10%]
ANSWER:
[Figure: dendrogram over the points A–H showing the merge order C1–C7 from part 4(a)(ii): A+B first, then D+E, then C joining {A,B}, then F+G, then {A,B,C} joining {D,E}, then {F,G} joining that cluster, and finally H.]
(iv) Consider a naive implementation of the algorithm which does not store point-
to-point distance measures across iterations. Calculate the precise number of
point-to-point distances that would need to be computed for each iteration when
performing the clustering described in 4 (a)(ii). [20%]
ANSWER:
• Iteration 1 (eight singletons): 1 + 2 + ... + 7 = 28
• Iteration 2 (cluster sizes 2,1,1,1,1,1,1): 2×6 + (1 + ... + 5) = 27
• Iteration 3 (sizes 2,2,1,1,1,1): 2×2 + 2×4 + 2×4 + 6 = 26
• Iteration 4 (sizes 3,2,1,1,1): 3×2 + 3×3 + 2×3 + 3 = 24
• Iteration 5 (sizes 3,2,2,1): 3×2 + 3×2 + 2×2 + 3 + 2 + 2 = 23
• Iteration 6 (sizes 5,2,1): 5×2 + 5 + 2 = 17
• Iteration 7 (sizes 7,1): only two clusters remain, so there is just one possible merge and no distances need to be computed: 0
Total: 28 + 27 + 26 + 24 + 23 + 17 = 145
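These counts depend only on the cluster sizes at each iteration (each pair of clusters contributes |Ci| × |Cj| distances), so they can be checked mechanically; a short sketch:

```python
# Verify the per-iteration distance counts from the cluster sizes alone: a
# naive pass computes |Ci| * |Cj| distances for every pair of clusters,
# except when only two clusters remain and the merge is forced.
from itertools import combinations

iterations = [[1] * 8, [2] + [1] * 6, [2, 2, 1, 1, 1, 1],
              [3, 2, 1, 1, 1], [3, 2, 2, 1], [5, 2, 1], [7, 1]]

counts = [0 if len(sizes) == 2 else sum(a * b for a, b in combinations(sizes, 2))
          for sizes in iterations]

print(counts)       # [28, 27, 26, 24, 23, 17, 0]
print(sum(counts))  # 145
```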
b) Consider the following dimensionality reduction techniques
• Discrete Cosine Transform (DCT),
• Principal Component Analysis (PCA) transform and
• Linear Discriminant Analysis (LDA) transform.
They can all be expressed as a linear transform of the form Y = XM, where M is the transform matrix, X is the data matrix, and Y is the data matrix after dimensionality reduction.
(i) Copy the table below and fill the cells with either 'Yes' or 'No' to indicate what information is required in order to determine M.
        The data points    The class labels
DCT
PCA
LDA
[15%]
ANSWER:
The table should be completed as follows
        The data points    The class labels
DCT           No                 No
PCA           Yes                No
LDA           Yes                Yes
(ii) PCA is being used to reduce the dimensionality of a 1000-sample set from 50 dimensions down to 5. State the number of rows and columns in each of Y, X and M in the equation Y = XM that performs the dimensionality reduction. [15%]
ANSWER:
• X - 1000 rows and 50 columns
• M - 50 rows and 5 columns
• Y - 1000 rows and 5 columns
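The shapes are easy to check mechanically; a sketch where M is built with a plain SVD-based PCA (one common construction, with random data standing in for the real sample set):

```python
# Shape check for Y = X M, with M built here via a plain SVD-based PCA
# (one common construction; no particular library's API is implied).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))   # 1000 samples, 50 dimensions

Xc = X - X.mean(axis=0)           # centre the data first
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
M = Vt[:5].T                      # 50 x 5: top five principal directions

Y = Xc @ M
print(X.shape, M.shape, Y.shape)  # (1000, 50) (50, 5) (1000, 5)
```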
(iii) Dimensionality reduction is to be used to reduce two-dimensional data to one dimension. Draw a scatter plot for a two-class problem in which PCA would perform very badly but for which LDA would work well. [20%]
ANSWER:
Scatter plot should show points that (when the class label is ignored) are spread
along a direction that lies parallel to the decision boundary. For full marks the
sketch should be clear and well labeled.
[Figure: scatter plot with axes x1 and x2 showing two elongated class clusters, ω1 and ω2, lying side by side so that the direction of greatest overall variance runs parallel to the class boundary.]
END OF QUESTION PAPER