ESE 402-542 : Statistics for Data Science
Instructor: Hamed Hassani
Fall 2020
Final Examination
NAME
            Grade (y/n)    Score    Max. Score
Problem 1                              30
Problem 2                              40
Problem 3                              30
TOTAL                                 100
Problem 1 (30 points)
Assume that X1, X2, . . . , Xn are generated i.i.d. according to the following
distribution:

Pr(X = i) = θ(1 − θ)^i, for i = 0, 1, 2, 3, . . .

In other words, the pmf of the distribution that generates the data
is of the form f(X = i | θ) = θ(1 − θ)^i, where θ is an unknown parameter.
Consider the following hypothesis testing problem:
H0 : θ = θ0
Ha : θ = θa,
where we assume that θa > θ0.
Derive the most powerful test for this hypothesis testing problem and specify
what the acceptance/rejection regions are for a given significance level α0
(you can assume that n is large).
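If your derivation yields a test that rejects H0 when the sum of the observations falls below a CLT-based threshold, a quick simulation can confirm the significance level. A sketch of such a check (not part of the exam; the values θ0 = 0.3, α = 0.05, and n = 500 are illustrative assumptions):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
theta0, alpha, n = 0.3, 0.05, 500

# Under H0, each X_i has mean (1 - theta0)/theta0 and variance (1 - theta0)/theta0**2
mu0 = (1 - theta0) / theta0
var0 = (1 - theta0) / theta0 ** 2

# CLT threshold for a test that rejects H0 when sum(X) is small
# (a larger theta shrinks the mean, so small sums favor H_a)
z = NormalDist().inv_cdf(alpha)          # lower-tail standard normal quantile
c = n * mu0 + z * np.sqrt(n * var0)

# Monte Carlo estimate of the type-I error
trials = 10_000
# numpy's geometric has support {1, 2, ...}; subtracting 1 matches
# the pmf theta * (1 - theta)^i on {0, 1, 2, ...}
X = rng.geometric(theta0, size=(trials, n)) - 1
print((X.sum(axis=1) < c).mean())        # should be close to alpha
```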
Problem 2 (40 points)
In this question we assume that the data is generated according to a distribution
P(X = x, Y = y) given as follows: x ∈ R and y ∈ {−1, 1}, i.e. the data is
one-dimensional and the label is binary. Write P(X = x, Y = y) = P(Y = y) P(X = x | Y = y). We let

P(Y = +1) = 3/4  and  P(Y = −1) = 1/4,

and

P(X = x | Y = +1) = (1/2) exp(−|x − 2|),
P(X = x | Y = −1) = (1/2) exp(−|x + 2|).

I.e. P(X = x | Y = +1) and P(X = x | Y = −1) follow the Laplace distribution.
(The mean of a Laplace distribution with pdf (1/(2b)) exp(−|x − µ|/b) is µ and the
variance is 2b².)
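As a quick numerical check of the Laplace facts quoted above, one can sample from the Y = +1 class (µ = 2, b = 1) and compare the empirical mean and variance to µ and 2b². A sketch, with the sample size an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Laplace(mu, b) has pdf (1/(2b)) exp(-|x - mu| / b), mean mu, variance 2 b^2
mu, b, n = 2.0, 1.0, 200_000
x = rng.laplace(mu, b, size=n)
print(x.mean(), x.var())   # close to mu = 2 and 2 b^2 = 2
```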
1. Derive the expression for P (X = x, Y = y).
2. Plot the conditional distributions P (X = x, Y = +1) and P (X =
x, Y = −1) in one figure.
3. Write the Bayes optimal classification rule given the above distribution
P and simplify it (hint: in the end you should arrive at a very simple
classification rule that classifies an input x based on whether or not its
value is greater than a threshold).
4. Compute the probability of classification error for the Bayes optimal
classifier.
5. Let us now consider QDA (this part and the next can be answered inde-
pendently of the previous parts). Given training data (x1, y1), . . . , (xn, yn),
briefly explain the main steps of training the QDA model: what
quantities/probabilities does QDA estimate? What parametric model is
used? How are the parameters of the model found? (Recall that QDA is
similar to LDA except that the variance per class can be different.)
6. Assume that the number of training data points is very large (i.e. n →
∞). What will be the exact values of the parameters of the trained QDA
model in this case? (Hint: you don't really need much calculation to
derive the parameters.)
7. Simplify the QDA classifier that you obtained in the previous part
(hint: in the end you should arrive at a very simple classification rule
that classifies an input x based on whether or not its value is greater
than a threshold).
8. Given your answers in Parts 2 and 5, what do you think about the
performance of QDA compared to what can be done optimally? Does
QDA perform optimally when we have many training data points?
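The distribution in this problem can be simulated directly, which gives a numerical sanity check for several parts above: the per-class proportions, means, and variances printed below are the quantities a QDA fit would estimate (and that settle down as n grows), and the final line estimates the error of the rule that predicts the label with the larger joint density. A sketch; the sample size and seed are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000

# sample (X, Y) from the joint: P(Y = +1) = 3/4, X | Y ~ Laplace(+/- 2, b = 1)
y = np.where(rng.random(n) < 0.75, 1, -1)
x = rng.laplace(np.where(y == 1, 2.0, -2.0), 1.0)

# per-class prior, mean, and variance -- the quantities QDA estimates (parts 5-6)
for label in (1, -1):
    xc = x[y == label]
    print(label, len(xc) / n, xc.mean(), xc.var())

# Bayes rule (part 3): predict the label with the larger joint density
joint_pos = 0.75 * 0.5 * np.exp(-np.abs(x - 2.0))
joint_neg = 0.25 * 0.5 * np.exp(-np.abs(x + 2.0))
yhat = np.where(joint_pos >= joint_neg, 1, -1)
print((yhat != y).mean())   # Monte Carlo estimate of the error in part 4
```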
Problem 3 (30 points)
1. Find the VC-dimension of the function class H1 = {h_a ; a ∈ R}, where
h_a : R → {0, 1} is defined as

       h_a(x) = 1  if x ∈ [a, a+1],
                1  if x ∈ [a+5, a+6],
                0  otherwise.
2. Find the VC-dimension of the function class H2 = {h_{a,b} ; a, b ∈ [0, +∞)},
where h_{a,b} : R → {0, 1} is defined as

       h_{a,b}(x) = 1  if |x| ≤ a,
                    1  if |x| ≥ b,
                    0  otherwise.
3. Consider the class H1. Given a training data set S, let h∗S be the
outcome of the empirical risk minimization procedure restricted to class
H1. How many training samples do we need (ignoring constants) to
ensure that for any distribution of the data, D, the following event
occurs with probability 1− δ?
LD(h∗S) − min_{h ∈ H1} LD(h) ≤ ε.
(Justify your answer in one sentence.)
4. Consider again the same class H1. Recall that we defined h∗S to be
the outcome of the empirical risk minimization procedure restricted to
class H1. How many training samples do we need (ignoring constants)
to ensure that for any distribution of the data, D, the following event
occurs with probability 1− δ?
LD(h∗S) − min_{h ∈ {all possible functions}} LD(h) ≤ ε.
(Justify your answer in one sentence.)