xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

Python代写｜机器学习代写 - CSE 5523: Machine Learning

时间：2020-12-07

Version:0.9 StartHTML:0000000105 EndHTML:0000087257 StartFragment:0000000141 EndFragment:0000087217
Instructions (Important)
â€¢ Total points = 55. Full Mark = 50.
â€¢ Read each problem statement carefully. Make sure that you answer exactly what is required and
that your solution is expressed clearly.
â€¢ Some notation introduced in class, e.g., the notation we used for true risk, empirical risk,
the output of a learning algorithm given a training set, parameter vectors, etc., will be
used here directly without being defifined again. Please, refer to the notes if you have any
confusion about the notation. Also, here k Â· k will be used to denote the Euclidean (i.e., L2) norm.
â€¢ You may use any result covered in class, but please cite the result that you use.
Problem 1: Linear Hard-Margin SVMs (15 points)
In this problem, refer to the fifigure attached in the last page. In the attached fifigure, consider the
constellation of feature vectors x
i
= (x
(i)
1 , x
(i)
2 ) âˆˆ R2
and their assigned labels y(i) , i = 1, . . . , 7 (see the
color/label code in the top-right corner of the fifigure):
1. Find the margin-maximizing linear classififier; that is, fifind (Ï‰, b) = ((w1, w2), b) describing the equation
of the plane w1x1 + w2x2 + b = 0 of the largest margin with respect to the given constellation. Express
(Ï‰, b) after performing the proper normalization such that min iâˆˆ{1,...,7}
y(i)
â¹°?hÏ‰, x(i)
i + b = 1.
[Note: You do not need to solve the optimization problem for the SVM or prove formally that your
plane is margin-maximizing. It is only required to fifind the parameters of the margin-maximizing
classififier (Ï‰, b) as instructed above. You may want to provide an intuitive explanation of why you think
your solution is optimal, for which you may get partial credit in case your solution is incorrect.]
2. Given your solution in Part 1:
(a) Identify the set Q âŠ† {x (1) , . . . , x (7)
} of points that are closest to your plane.
(b) Find a set of scalars {Î±i : i âˆˆ Q} that satisfy all of the following (showing all details of your
derivation):
âˆ€ i âˆˆ Q, Î±i â‰¥ 0,
XiâˆˆQ
y
(i)
Î±i = 0,
Ï‰ = XiâˆˆQ
Î±iy
(i)
x
(i)
(c) What are the support vectors of your classififier?Problem 2: Solving a Kernel-Based SVM with SGD (18 points)
This problem illustrates a simple example for solving a kernel-based SVM with Gaussian kernel via SGD.
Consider a training set of 3 data points: (x(i) , y(i)
)
âˆˆ
R2
Ã— {âˆ’
1, +1
}
, i = 1, 2, 3, where
x
(1)
=
0
0
, y
(1)
= +1
x
(2)
=
2
1
, y
(2)
= 1
x
(3)
=
2
1
, y
(3)
= 1
Let K : R2
Ã—
R2
â†’
R be a Gaussian kernel defifined as: K(x, x0
) = exp
1
8
k
x x0
k
2
for any x, x0
âˆˆ
R2
.
Suppose we want to solve the kernel-based SVM as described in class with respect to this training set:
1. Obtain the Kernel matrix with respect to the given training set. Hence, write down the
optimization problem corresponding to solving the soft-margin kernel-based SVM, i.e.,
write down an expression for the objective function (the regularized loss) that needs to be minimized
over the variable coeffiffifficients Î± = (Î±1, Î±2, Î±3) as discussed in class.
2. Set the regularization parameter Î› in the objective function of Part 1 as Î› = 1. Now, suppose we want
to solve this optimization problem via SGD as discussed in class. For simplicity, consider SGD with
total number of updates T = 6, i.e., 5 update steps not including initialization, which is assumed to
be at Î±(1)
= 0. Showing all intermediate steps, perform those updates and obtain the fifinal
coeffiffifficient-vector Î±
b
(i.e., show all updates Î±(t) , t = 1, . . . , 6, as well as the fifinal output Î±
b
). Note
that in the actual execution, in each step, SGD samples one data point uniformly at random from the
training set. For the sake of this problem, assume that over the course of the 5 update steps after
initialization, SGD samples the folowing data points in the following order:
at t = 1,
x
(1)
, y
(1)
is sampled
at t = 2,
x
(2)
, y
(2)
is sampled
at t = 3,
x
(3)
, y
(3)
is sampled
at t = 4,
x
(2)
, y
(2)
is sampled
at t = 5,
x
(3)
, y
(3)
is sampled
3. Given the coeffiffifficient-vector Î±
b
obtained in Part 2,
(a) give an expression for the resulting kernel-based classififier,
(b) suppose you are given a new feature-vector x = (0, 1)
âˆˆ
R2
. What label will your kernel classififier
generate for x? Show your derivation.
Problem 3: Stability (10 points)
We say that a learning algorithm
A
is an AERM (Asymptotic Empirical Risk Minimizer ) for the class
H with excess empirical risk (n) if for every distribution D it holds that
E
Sâˆ¼Dn
L
b
(
A
(S); S) min hâˆˆH L
b
(h; S)
â‰¤
(n).We say learning algorithm
A
learns a class
H
with excess true risk (n) if for every distribution D it holds
that
E
Sâˆ¼Dn
L(
A
(S); D) min hâˆˆH L(h; D)
â‰¤
(n).
Prove that if a learning algorithm
A
is 1(n)-On-Average-Replace-One stable, and is an AERM with excess
empirical risk 2(n) for the class
H
, then it learns
H
with excess true risk 1(n) + 2(n).
Problem 4: Noisy Gradient Descent (12 points)
Let M, Ï > 0. Let
C âŠ‚
Rd
be a closed convex set where max wâˆˆC k
w
k â‰¤
M. Let f : Rd
â†’
R be a convex,
Ï-Lipschitz function. Consider a variant of the basic (non-stochastic) gradient descent algorithm discussed
in class, where at each update step, we get a noisy version of the gradient. Namely, at each iteration
t = 1, . . . , T 1, the update step is given by:
wt+1 = Î C (wt Î± (
âˆ‡
f(wt) + vt)),
where vt
âˆ¼ N
0, Ïƒ2Id
, t = 1, . . . , T 1, are independent Gaussian random variables (Id is the identity
matrix of size d), and Î C is the Euclidean projection onto
C
discussed in class. You may assume the standard
initialization: w1 = 0.
Show that if Î± =
âˆš
M
(Ï2
+dÏƒ2
)T , then the output of the algorithm w
b
=
1
T
P
T
t=1 wt after T iterations satisfifies
E [f(w
b
)] min wâˆˆC f(w)
â‰¤
M
p
Ï2 + dÏƒ2
âˆšT
,
where the expectation is taken w.r.t. the Gaussian noise variables vt, t = 1, . . . , T 1.
[Note that the problem here is an optimization problem not a learning problem, i.e., there is no training data
or the notion of excess risk. The goal is to minimize an objective function f over
C âŠ‚
Rd
via this noisy
version of the basic gradient descent algorithm.]
[Note that the bound to be proved is on the expectation of the optimization error. In your derivation, you
need to take the expectation w.r.t. the randomness due to the Gaussian noise.](1, 5)
+ 1 label
- 1 label
(4, -2)
( - 1, 1)
x2
x1
(-1, 4)
(-2, -5)
(1, - 1)
(-5, 3)
x
(1)
=
x
(2)
=
x
(3)
=
x
(5)
=
x
(4)
=
x
(7)
=
x
(6)
=