Date: 2021-01-18

COM 2004

1. This question concerns probability theory.

a) The discrete random variable X represents the outcome of a biased coin toss. X has the probability distribution given in the table below,

x           H     T
P(X = x)    θ     1 − θ

where H represents a head and T represents a tail.

(i) Write an expression in terms of θ for the probability of observing the sequence H, T, H, H. [5%]

(ii) A sequence of coin tosses is observed that happens to contain $N_H$ heads and $N_T$ tails. Write an expression in terms of θ for the probability of observing this specific sequence. [5%]

(iii) Show that having observed a sequence of coin tosses containing $N_H$ heads and $N_T$ tails, the maximum likelihood estimate of the parameter θ is given by

$$\frac{N_H}{N_H + N_T}$$

[20%]
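The Python sketch below is an illustration only, not part of the original paper. It evaluates the probability of one specific sequence as θ^{N_H}(1 − θ)^{N_T} and checks numerically, with a brute-force grid search over θ (an arbitrary choice of method and resolution), that the maximum falls at N_H/(N_H + N_T). The analytic route is to set the derivative of the log-likelihood N_H ln θ + N_T ln(1 − θ) to zero.

```python
from fractions import Fraction


def sequence_likelihood(seq, theta):
    """Probability of one specific sequence of independent tosses, e.g. "HTHH".

    Each head contributes a factor theta and each tail a factor (1 - theta),
    so the result is theta**N_H * (1 - theta)**N_T.
    """
    n_heads = seq.count("H")
    n_tails = seq.count("T")
    return theta ** n_heads * (1 - theta) ** n_tails


def grid_mle(seq, steps=1000):
    """Brute-force maximiser of the likelihood over an evenly spaced grid on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: sequence_likelihood(seq, t))


seq = "HTHH"
print(sequence_likelihood(seq, Fraction(1, 2)))   # theta^3 (1 - theta) at theta = 1/2
print(grid_mle(seq), seq.count("H") / len(seq))   # grid maximiser vs N_H / (N_H + N_T)
```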

b) The discrete random variables X1 and X2 represent the outcome of a pair of independent but biased coin tosses. Their joint distribution P(X1, X2) is given by the probabilities in the table below,

            X1 = H    X1 = T
X2 = H      λ         3λ
X2 = T      2λ        ρ

(i) Write down the probability P(X1 = H,X2 = H). [5%]

(ii) Calculate the probability P(X1 = H) in terms of λ. [5%]

(iii) Calculate the probability P(X2 = H) in terms of λ. [5%]

(iv) Given that the coin tosses are independent and that λ is greater than 0, use your previous answers to calculate the value of λ. [15%]

(v) Calculate the value of ρ. [5%]
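A minimal sketch of the reasoning in 1 b), using exact fractions. It relies only on what the question gives: the marginals are the column and row sums of the table, independence requires P(X1 = H, X2 = H) = P(X1 = H) P(X2 = H), and the four entries sum to 1. The variable names are illustrative.

```python
from fractions import Fraction

# Each table entry except rho is a multiple of lambda:
#             X1 = H   X1 = T
#   X2 = H    1*lam    3*lam
#   X2 = T    2*lam    rho
c_hh, c_th, c_ht = 1, 3, 2   # subscripts are (X1, X2): c_th means X1 = T, X2 = H

p_x1_heads = c_hh + c_ht     # P(X1 = H) = lam + 2 lam, as a multiple of lam
p_x2_heads = c_hh + c_th     # P(X2 = H) = lam + 3 lam, as a multiple of lam

# Independence: c_hh * lam = (p_x1_heads * lam) * (p_x2_heads * lam).
# Dividing by lam (allowed because lam > 0) gives lam = c_hh / (p_x1_heads * p_x2_heads).
lam = Fraction(c_hh, p_x1_heads * p_x2_heads)

# The four joint probabilities must sum to 1.
rho = 1 - (c_hh + c_th + c_ht) * lam

print("lambda =", lam, " rho =", rho)
```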


c) Consider the distribution sketched in the figure below.

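[Figure: sketch of p(x) against x for 0 ≤ x ≤ 1, a step function at height 2λ up to x = b and at height λ from b to 1.]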

$$p(x) = \begin{cases} 2\lambda & \text{if } 0 \le x < b \\ \lambda & \text{if } b \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$

(i) Write an expression for λ in terms of the parameter b. [15%]

(ii) Two independent samples, x1 and x2, are observed. x1 has the value 0.25 and x2 has the value 0.75. Sketch p(x1, x2; b) as a function of b as b varies between 0 and 1. Using your sketch, calculate the maximum likelihood estimate of the parameter b given the observed samples. [20%]
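A numerical sketch for 1 c), offered as an illustration rather than the expected pen-and-paper answer. Requiring the density to integrate to one gives 2λb + λ(1 − b) = 1, so λ = 1/(1 + b). The code evaluates p(x1, x2; b) = p(x1; b) p(x2; b) for the two observed samples over a grid of b values (the 0.001 spacing is an arbitrary choice) and prints a few values that can be compared with a hand-drawn sketch.

```python
def lam(b):
    """Height of the lower step, from 2*lam*b + lam*(1 - b) = 1."""
    return 1.0 / (1.0 + b)


def pdf(x, b):
    """Piecewise-constant density of question 1 c)."""
    if 0 <= x < b:
        return 2 * lam(b)
    if b <= x <= 1:
        return lam(b)
    return 0.0


def likelihood(b, samples=(0.25, 0.75)):
    """Joint density of independent samples, viewed as a function of b."""
    result = 1.0
    for x in samples:
        result *= pdf(x, b)
    return result


grid = [i / 1000 for i in range(1001)]   # b from 0 to 1 in steps of 0.001
b_hat = max(grid, key=likelihood)

for b in (0.0, 0.25, 0.26, 0.5, 0.75, 0.76, 1.0):
    print(f"b = {b:.2f}  likelihood = {likelihood(b):.3f}")
print("grid argmax:", b_hat)
```

The printed values show where the likelihood jumps (as each sample moves from the λ region into the 2λ region) and how it decays in between, which is the shape the sketch in the question should capture.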


2. This question concerns the multivariate normal distribution.

a) Consider the data in the following table showing the height (x1) and arm span (x2) of a sample of 8 adults.

x1    151.1  152.4  152.9  156.8  161.8  158.6  157.4  158.8
x2    154.5  162.2  151.5  158.2  165.3  165.6  159.8  162.0

The joint distribution of the two variables is to be modeled using a multivariate Gaussian with mean vector µ and covariance matrix Σ.

(i) Calculate an appropriate value for the mean vector, µ. [5%]

(ii) Write down the formula for sample variance. Use it to calculate the unbiased variance estimate for both height and arm span. [10%]

(iii) Write down the formula for sample covariance. Use it to calculate the unbiased estimate of the covariance between height and arm span. [10%]

(iv) Write down the covariance matrix, Σ. [5%]

(v) Compute the inverse covariance matrix, Σ⁻¹. [15%]
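The quantities asked for in 2 a) can be checked with a short pure-Python sketch. It uses the unbiased (N − 1) estimators that the question specifies and inverts the 2 × 2 covariance matrix with the adjugate formula; the helper names are illustrative.

```python
from statistics import mean

height = [151.1, 152.4, 152.9, 156.8, 161.8, 158.6, 157.4, 158.8]    # x1
arm_span = [154.5, 162.2, 151.5, 158.2, 165.3, 165.6, 159.8, 162.0]  # x2


def unbiased_cov(a, b):
    """Sample covariance with the N - 1 (unbiased) denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def inverse_2x2(m):
    """Inverse of a 2x2 matrix: adjugate divided by the determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


mu = [mean(height), mean(arm_span)]          # (i) mean vector
var_h = unbiased_cov(height, height)         # (ii) a variance is a self-covariance
var_a = unbiased_cov(arm_span, arm_span)
cov_ha = unbiased_cov(height, arm_span)      # (iii)
sigma = [[var_h, cov_ha], [cov_ha, var_a]]   # (iv)
sigma_inv = inverse_2x2(sigma)               # (v)

print("mu =", [round(v, 4) for v in mu])
print("Sigma =", [[round(v, 4) for v in row] for row in sigma])
print("Sigma^-1 =", [[round(v, 5) for v in row] for row in sigma_inv])
```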

b) Remember that the pdf of a multivariate Gaussian is given by

$$p(x) = C \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

where C is a scaling constant that does not depend on x.

Using the answer to 2 (a) and the equation above, answer the following questions.

(i) Who should be considered more unusual:

• Ginny who is 162.1 cm tall and has arms 164.2 cm long, or

• Cho who is 156.0 cm tall and has arms 153.1 cm long?

Show your reasoning. [20%]

(ii) A large sample of women is taken and it is found that 120 have measurements similar to those of Ginny. How many women in the same sample would be expected to have measurements similar to those of Cho? [15%]
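One way to compare Ginny and Cho, sketched below: because C is common to both, the person with the larger value of (x − µ)ᵀ Σ⁻¹ (x − µ) has the lower density and is the more unusual. For part (ii) the sketch additionally assumes that "similar measurements" means a small neighbourhood of the same size around each person, so expected counts scale with the ratio of densities. The data and estimators repeat those of 2 a) so that the block runs on its own.

```python
import math
from statistics import mean

height = [151.1, 152.4, 152.9, 156.8, 161.8, 158.6, 157.4, 158.8]
arm_span = [154.5, 162.2, 151.5, 158.2, 165.3, 165.6, 159.8, 162.0]


def unbiased_cov(a, b):
    """Sample covariance with the N - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


mu = (mean(height), mean(arm_span))
s11 = unbiased_cov(height, height)
s12 = unbiased_cov(height, arm_span)
s22 = unbiased_cov(arm_span, arm_span)
det = s11 * s22 - s12 * s12
sigma_inv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))


def mahalanobis_sq(x):
    """(x - mu)^T Sigma^-1 (x - mu), the quadratic form in the Gaussian exponent."""
    dx = (x[0] - mu[0], x[1] - mu[1])
    return sum(dx[i] * sigma_inv[i][j] * dx[j] for i in range(2) for j in range(2))


d2_ginny = mahalanobis_sq((162.1, 164.2))
d2_cho = mahalanobis_sq((156.0, 153.1))

# C cancels in the ratio, so p(Cho) / p(Ginny) = exp(-(d2_cho - d2_ginny) / 2).
ratio = math.exp(-0.5 * (d2_cho - d2_ginny))
expected_cho = 120 * ratio   # assumes counts scale with density (see lead-in)

print(f"Ginny: {d2_ginny:.3f}  Cho: {d2_cho:.3f}  (larger = more unusual)")
print(f"expected number similar to Cho: about {expected_cho:.0f}")
```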


c) A person’s ‘ape index’ is defined as their arm span minus their height.

(i) Use the data in 2 (a) to estimate a mean and variance for ape index. [10%]

(ii) The figure below shows a standard normal distribution, i.e., X ∼ N(0, 1). The percentages indicate the proportion of the total area under the curve for each segment.

Using the diagram, estimate the proportion of the population who will have an ape index greater than 10.5. [5%]

(iii) Using the figure above, estimate the mean-centred range of ape indexes that would include 99% of the population. [5%]
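The following sketch reproduces the calculation behind 2 c) under the assumption that ape index is normally distributed. Instead of reading areas off the figure it uses the exact normal CDF from Python's `statistics.NormalDist`, so its answers will be slightly more precise than figure-based estimates.

```python
from statistics import NormalDist, mean, variance

height = [151.1, 152.4, 152.9, 156.8, 161.8, 158.6, 157.4, 158.8]
arm_span = [154.5, 162.2, 151.5, 158.2, 165.3, 165.6, 159.8, 162.0]

# Ape index = arm span minus height, one value per person
ape = [span - h for h, span in zip(height, arm_span)]

ape_mean = mean(ape)
ape_var = variance(ape)                          # unbiased, N - 1 denominator
ape_dist = NormalDist(ape_mean, ape_var ** 0.5)  # assumption: ape index is normal

# (ii) How many standard deviations 10.5 lies above the mean, and the tail area
z_10_5 = (10.5 - ape_mean) / ape_dist.stdev
p_above = 1 - ape_dist.cdf(10.5)

# (iii) Symmetric range about the mean containing 99% of the probability mass
z_99 = NormalDist().inv_cdf(0.995)
low, high = ape_mean - z_99 * ape_dist.stdev, ape_mean + z_99 * ape_dist.stdev

print(f"mean = {ape_mean:.4f}, variance = {ape_var:.4f}")
print(f"10.5 is {z_10_5:.2f} SDs above the mean; P(ape > 10.5) = {p_above:.4f}")
print(f"99% range: [{low:.2f}, {high:.2f}]")
```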


3. This question concerns classifiers.

a) Consider a Bayesian classification system based on a pair of univariate normal distributions. The distributions have equal variance and equal priors. The mean of class 1 is less than the mean of class 2. For each case below say whether the decision threshold increases, decreases, remains unchanged or can move in either direction.

(i) The mean of class 2 is increased. [5%]

(ii) The means of class 1 and class 2 are decreased by equal amounts. [5%]

(iii) The prior probability of class 2 is increased. [5%]

(iv) The variances of class 1 and class 2 are increased by equal amounts. [5%]

(v) The variance of class 2 is increased. [5%]
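For the equal-variance cases the threshold has a closed form, obtained by equating ln P(ω1) − (x − µ1)²/(2σ²) with ln P(ω2) − (x − µ2)²/(2σ²). The sketch below evaluates it for hypothetical parameters (µ1 = 0, µ2 = 4, σ² = 1) so that the direction of each change in cases (i) to (iv) can be seen. Case (v) is left out: unequal variances make the boundary quadratic, with two crossing points, so this formula no longer applies.

```python
import math


def threshold(mu1, mu2, var, p1=0.5, p2=0.5):
    """Decision threshold for two univariate Gaussians sharing variance `var`.

    Solves ln p1 - (x - mu1)^2 / (2 var) = ln p2 - (x - mu2)^2 / (2 var)
    for x, assuming mu1 < mu2.
    """
    return (mu1 + mu2) / 2 + var * math.log(p1 / p2) / (mu2 - mu1)


base = threshold(0.0, 4.0, 1.0)  # hypothetical starting parameters
cases = {
    "i":   threshold(0.0, 5.0, 1.0),                   # mean of class 2 increased
    "ii":  threshold(-1.0, 3.0, 1.0),                  # both means decreased by 1
    "iii": threshold(0.0, 4.0, 1.0, p1=0.4, p2=0.6),   # prior of class 2 increased
    "iv":  threshold(0.0, 4.0, 2.0),                   # both variances increased
}
for name, t in cases.items():
    print(f"({name}) threshold {base:.3f} -> {t:.3f}")
```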

b) Consider a Bayesian classification system based on a pair of 2-D multivariate normal distributions, p(x|ω1) ∼ N(µ1, Σ1) and p(x|ω2) ∼ N(µ2, Σ2). The distributions have the following parameters:

$$\mu_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \mu_2 = \begin{pmatrix} 3 \\ 5 \end{pmatrix}, \qquad \Sigma_1 = \Sigma_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

The classes have equal priors, i.e., P(ω1) = P(ω2).

Calculate the equation for the decision boundary in the form x2 = m x1 + c.

[25%]
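With Σ1 = Σ2 = I and equal priors, comparing the log-posteriors reduces to comparing squared Euclidean distances to the two means, so the boundary is the set of points equidistant from µ1 and µ2. Expanding ‖x − µ1‖² = ‖x − µ2‖² gives (µ2 − µ1)ᵀx = (‖µ2‖² − ‖µ1‖²)/2, which the sketch below rearranges into x2 = m x1 + c with exact fractions. The function name is illustrative.

```python
from fractions import Fraction


def bisector_boundary(mu1, mu2):
    """Return (m, c) with x2 = m*x1 + c for points equidistant from mu1 and mu2.

    Expanding ||x - mu1||^2 = ||x - mu2||^2 gives
    w . x = (||mu2||^2 - ||mu1||^2) / 2  with  w = mu2 - mu1.
    Assumes w[1] != 0, i.e. the boundary is not vertical.
    """
    w1, w2 = mu2[0] - mu1[0], mu2[1] - mu1[1]
    rhs = Fraction(mu2[0] ** 2 + mu2[1] ** 2 - mu1[0] ** 2 - mu1[1] ** 2, 2)
    return Fraction(-w1, w2), rhs / w2


m, c = bisector_boundary((1, 2), (3, 5))
print(f"x2 = {m} x1 + {c}")
```

As a sanity check, the midpoint of the two means should satisfy the returned equation.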

c) Consider a K-nearest neighbour classifier being used to classify 1-D data belonging to classes ω1 and ω2. The training samples for the two classes are

ω1 = {1, 3, 5, 7, 9}    ω2 = {2, 4, 6, 8}

The diagram below shows the decision boundaries and class labels for the case K = 1.

[Figure: number line from 1 to 9 marking the K = 1 decision boundaries, with each region labelled ω1 or ω2.]

Make similar sketches for the cases K = 3, K = 5, K = 7 and K = 9. [25%]
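The sketches for 3 c) can be checked with a brute-force K-NN scan, shown below as a minimal sketch. It labels a fine grid of query points on [0, 10] (an assumed plotting range) and reports where the label changes; the grid is offset so that no query point falls exactly on a half-integer, where distance ties occur. Labels 1 and 2 stand for ω1 and ω2.

```python
# Training data from 3 c); labels 1 and 2 stand for omega_1 and omega_2
train = [(x, 1) for x in (1, 3, 5, 7, 9)] + [(x, 2) for x in (2, 4, 6, 8)]


def knn_label(x, k):
    """Majority label among the k training points nearest to x.

    Equal distances are broken by training-point position; with odd k and two
    classes the vote itself can never tie.
    """
    nearest = sorted(train, key=lambda p: (abs(p[0] - x), p[0]))[:k]
    votes_1 = sum(1 for _, label in nearest if label == 1)
    return 1 if votes_1 > k - votes_1 else 2


def boundaries(k, lo=0.0, hi=10.0, step=0.1):
    """Approximate decision boundaries by scanning a grid that avoids half-integers."""
    n = int(round((hi - lo) / step))
    xs = [lo + step / 2 + i * step for i in range(n)]
    labels = [knn_label(x, k) for x in xs]
    return [round((xs[i] + xs[i + 1]) / 2, 2)
            for i in range(n - 1) if labels[i] != labels[i + 1]]


for k in (1, 3, 5, 7, 9):
    print(f"K = {k}: boundaries at {boundaries(k)}")
```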


d) Consider a K-nearest neighbour classifier that uses a Euclidean distance measure, K = 1, and the following samples as training data,

$$\omega_1 = \{(0,1)^T, (1,1)^T, (1,2)^T\}, \qquad \omega_2 = \{(1,0)^T, (2,1)^T\}$$

A point is selected uniformly at random from the region defined by 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2. What is the probability that the point is classified as belonging to class ω1? [Hint: start by sketching the decision boundary.]

[25%]
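A numerical check for 3 d), as a sketch: it classifies the centre of every cell of a fine grid over 0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2 with the 1-NN rule and returns the fraction labelled ω1, which approximates the requested probability (the area of the ω1 region divided by 4). The grid size is an arbitrary choice; the result can be compared with the area read off the sketched decision boundary.

```python
import math

# Training data from 3 d); labels 1 and 2 stand for omega_1 and omega_2
train = [((0, 1), 1), ((1, 1), 1), ((1, 2), 1), ((1, 0), 2), ((2, 1), 2)]


def nn_label(p):
    """1-nearest-neighbour label under Euclidean distance (first wins on exact ties)."""
    return min(train, key=lambda t: math.dist(p, t[0]))[1]


def fraction_class1(n=400):
    """Midpoint-rule estimate of the fraction of [0, 2] x [0, 2] labelled omega_1."""
    h = 2 / n
    hits = sum(nn_label(((i + 0.5) * h, (j + 0.5) * h)) == 1
               for i in range(n) for j in range(n))
    return hits / (n * n)


p_omega1 = fraction_class1()
print(f"estimated P(omega_1) = {p_omega1:.4f}")
```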


4. This question concerns clustering and dimensionality reduction.

[Figure: scatter plot of eight labelled points, A to H, with both axes marked at 2, 4 and 6.]

a) The points in the above figure are to be clustered using the agglomerative clustering algorithm. The cluster-to-cluster distance is defined to be the minimum point-to-point distance. In the initial clustering, C0, each point is in a separate cluster, and the clustering can be represented as a set of sets as follows:

$$C_0 = \big\{\{A\}, \{B\}, \{C\}, \{D\}, \{E\}, \{F\}, \{G\}, \{H\}\big\}$$

(i) Point-to-point distances are measured using the Manhattan distance. Perform the algorithm and use set notation to show the clustering after each iteration. [10%]

(ii) Point-to-point distances are measured using the Euclidean distance. Perform the algorithm and use set notation to show the clustering after each iteration. [10%]

(iii) Draw a dendrogram to represent the hierarchical sequence of clusterings found when using the Euclidean distance. [10%]

(iv) Consider a naive implementation of the algorithm which does not store point-to-point distance measures across iterations. Calculate the precise number of point-to-point distances that would need to be computed for each iteration when performing the clustering described in 4 (a)(ii). [20%]
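The coordinates of points A to H exist only in the figure, so the sketch below runs the naive single-link procedure of 4 a) on a small hypothetical point set; the names P, Q, R, S and their coordinates are invented for illustration. For each iteration it records the merge distance (the height needed for a dendrogram), the clustering in set form, and the number of point-to-point distances a naive implementation computes, which is the sum of |Ci| × |Cj| over all pairs of current clusters. Replacing `points` with the figure's coordinates would reproduce the exercise.

```python
import math
from itertools import combinations


def manhattan(p, q):
    """City-block distance, as used in part (i)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def single_linkage(points, dist):
    """Naive agglomerative clustering with the minimum point-to-point distance.

    Returns one (merge_distance, clustering, n_distances) tuple per iteration.
    n_distances counts the point-to-point distances computed in that iteration,
    since nothing is cached between iterations. Ties go to the first pair examined.
    """
    clusters = [frozenset([name]) for name in points]
    history = []
    while len(clusters) > 1:
        best, n_distances = None, 0
        for ci, cj in combinations(clusters, 2):
            d = min(dist(points[a], points[b]) for a in ci for b in cj)
            n_distances += len(ci) * len(cj)
            if best is None or d < best[0]:
                best = (d, ci, cj)
        d, ci, cj = best
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
        history.append((d, sorted(sorted(c) for c in clusters), n_distances))
    return history


# Hypothetical coordinates for illustration only, NOT the points in the figure
points = {"P": (0, 0), "Q": (1, 0), "R": (2, 3), "S": (6, 0)}
for d, clustering, n in single_linkage(points, math.dist):
    print(f"merge at {d:.3f}, {n} distances computed: {clustering}")
```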


b) Consider the following dimensionality reduction techniques:

• Discrete Cosine Transform (DCT),

• Principal Component Analysis (PCA) transform and

• Linear Discriminant Analysis (LDA) transform.

They can all be expressed as a linear transform of the form Y = XM, where M is the transform, X is the data matrix and Y is the data matrix after dimensionality reduction.

(i) Copy the table below and fill the cells with either ‘Yes’ or ‘No’ to indicate what information is required in order to determine M.

        The data points    The class labels
DCT
PCA
LDA

[15%]

(ii) PCA is being used to reduce the dimensionality of a 1000 sample set from 50 dimensions down to 5. State the number of rows and columns in each of Y, X and M in the equation Y = XM that performs the dimensionality reduction. [15%]
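With the samples stored as the rows of X, as the form Y = XM implies, the matrix shapes must conform as follows, each column of M being one of the retained principal component directions:

$$\underbrace{Y}_{1000 \times 5} = \underbrace{X}_{1000 \times 50}\;\underbrace{M}_{50 \times 5}$$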

(iii) Dimensionality reduction is to be used to reduce two-dimensional data to one dimension. Draw a scatter plot for a two-class problem in which PCA would perform very badly but for which LDA would work well. [20%]
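A minimal sketch of the kind of data 4 b)(iii) asks for, using hypothetical, randomly generated points: two long, thin, parallel clouds that differ only in x2. PCA keeps the direction of greatest total variance, which here runs along the clouds, while Fisher's LDA direction S_w⁻¹(µ_b − µ_a) points across them. The Fisher ratio (squared difference of projected means over the sum of projected variances) is used as a simple separation score; that choice, the random seed and the cloud parameters are all assumptions.

```python
import math
import random

random.seed(0)
# Hypothetical data: two elongated clouds, spread along x1, separated only in x2
class_a = [(random.gauss(0, 5), random.gauss(-1, 0.3)) for _ in range(200)]
class_b = [(random.gauss(0, 5), random.gauss(1, 0.3)) for _ in range(200)]


def mean2(pts):
    n = len(pts)
    return (sum(p[0] for p in pts) / n, sum(p[1] for p in pts) / n)


def scatter(pts, m):
    """Unnormalised 2x2 scatter matrix: sum of (p - m)(p - m)^T."""
    sxx = sum((p[0] - m[0]) ** 2 for p in pts)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in pts)
    syy = sum((p[1] - m[1]) ** 2 for p in pts)
    return [[sxx, sxy], [sxy, syy]]


def top_eigvec(s):
    """Unit eigenvector for the largest eigenvalue of a symmetric 2x2 matrix."""
    (a, b), (_, d) = s
    lam = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    if abs(b) > 1e-12:
        v = (b, lam - a)          # solves (a - lam) v1 + b v2 = 0
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


def fisher_ratio(w):
    """Squared gap between projected class means over summed projected variances."""
    pa = [p[0] * w[0] + p[1] * w[1] for p in class_a]
    pb = [p[0] * w[0] + p[1] * w[1] for p in class_b]
    mean_a, mean_b = sum(pa) / len(pa), sum(pb) / len(pb)
    var_a = sum((x - mean_a) ** 2 for x in pa) / len(pa)
    var_b = sum((x - mean_b) ** 2 for x in pb) / len(pb)
    return (mean_a - mean_b) ** 2 / (var_a + var_b)


# PCA: leading eigenvector of the pooled scatter matrix (class labels ignored)
pooled = class_a + class_b
w_pca = top_eigvec(scatter(pooled, mean2(pooled)))

# LDA: w proportional to Sw^-1 (m_b - m_a), with Sw the within-class scatter
m_a, m_b = mean2(class_a), mean2(class_b)
s_a, s_b = scatter(class_a, m_a), scatter(class_b, m_b)
sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
dm = (m_b[0] - m_a[0], m_b[1] - m_a[1])
w_lda = ((sw[1][1] * dm[0] - sw[0][1] * dm[1]) / det,
         (-sw[1][0] * dm[0] + sw[0][0] * dm[1]) / det)

print("PCA direction", w_pca, "Fisher ratio", round(fisher_ratio(w_pca), 3))
print("LDA direction", w_lda, "Fisher ratio", round(fisher_ratio(w_lda), 3))
```

Plotting `class_a` and `class_b` gives the requested scatter plot: projecting onto the PCA direction collapses both clouds onto the same interval, while the LDA projection keeps them apart.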

END OF QUESTION PAPER
