APM462 summer 2024
Final project¹
Due date: Monday, August 19 before 11:59pm
(1) This problem is intended to make you think about the origins of
the method of conjugate directions. Given a positive definite sym-
metric n × n matrix Q and b ∈ Rn, let f(x) = (1/2)xTQx − bTx
be a quadratic function. Let B0 = {0}, B1 = span{d0}, B2 =
span{d0, d1}, . . . , Bn = span{d0, . . . , dn−1} where d0, d1, . . . , dn−1 is
a basis of Rn. Note we are not assuming that the vectors di are
Q−orthogonal. Now fix x0 ∈ Rn and let xk ∈ x0 + Bk be the mini-
mizer of f(x) on x0 +Bk.
(a) Note that we can write x1 = x0 + α0d0 for a unique α0 ∈ R.
Express α0 in terms of x0, d0 and Q.
(b) Note that we can write x2 = x0 + α′0d0 + α′1d1 for unique α′0, α′1 ∈ R.
Express α′0 and α′1 in terms of x0, d0, d1 and Q.
(c) Under what conditions on d0, d1 is α0 = α′0? Hint: conjugate
directions method.
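[Editor's note: the following Python sketch is not part of the assignment; it only illustrates the setup of problem (1). The matrix Q, the vector b, the point x0 and the basis d0, . . . , dn−1 are random, illustrative choices, and the minimizer xk over x0 + Bk is computed by solving the reduced first-order condition in the coefficients of d0, . . . , dk−1.]

```python
import numpy as np

# Illustrative sketch for problem (1): compute the minimizer x_k of
# f(x) = 1/2 x^T Q x - b^T x over the affine set x0 + B_k by solving the
# reduced first-order condition in the coefficients of d_0, ..., d_{k-1}.
# All data below are random, illustrative choices.
rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)        # a positive definite symmetric Q
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)
D = rng.standard_normal((n, n))    # columns d_0, ..., d_{n-1}: a basis, not Q-orthogonal

for k in range(1, n + 1):
    Dk = D[:, :k]                  # B_k = span of the first k columns
    # Writing x = x0 + Dk c, the first-order condition restricted to x0 + B_k is
    #   Dk^T Q Dk c = Dk^T (b - Q x0).
    c = np.linalg.solve(Dk.T @ Q @ Dk, Dk.T @ (b - Q @ x0))
    print(k, x0 + Dk @ c)

print(np.linalg.solve(Q, b))       # x_n agrees with the unconstrained minimizer
```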
(2) Given a function Φ : Rn → RN , define K(x,x′) := Φ(x) · Φ(x′).
The function K : Rn × Rn → R is called the kernel associated to
Φ. Kernels are used in Machine Learning. Given a set of m vectors
X := {x1, . . . ,xm} ⊂ Rn, consider the m × m matrix K whose
ij entry is K(xi,xj). The matrix K is called the kernel matrix
associated to the kernel K(·, ·) and the set X .
(a) Prove that the matrix K is a positive semidefinite symmetric
matrix.
(b) Given the function
Φ : R2 → R3 : (x1, x2) ↦ (x1², x2², √2 x1x2),
find the associated kernel K(·, ·). Now find the 2×2 kernel ma-
trix associated to the kernel K(·, ·) and the set X := {x,x′} ⊂
R2.
(c) Given an m×m positive semidefinite symmetric matrix K and
a set of m vectors X := {x1, . . . ,xm} ⊂ Rn, it is possible to find
a function Φ : X → RN such that Kij = Φ(xi) · Φ(xj). Show
this when K = [a b; b c] is a 2×2 positive semidefinite symmetric
matrix.
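[Editor's note: the sketch below is a numerical illustration of problem (2), using a toy feature map that is an assumption chosen only for illustration. It shows that a kernel matrix is a Gram matrix, hence symmetric positive semidefinite, and that a positive semidefinite K can be factored to produce feature vectors, which is the idea behind part (c).]

```python
import numpy as np

# Illustrative sketch for problem (2).  The feature map below is a toy
# choice (an assumption, not necessarily the one in part (b)).
def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2.0) * x1 * x2])

X = [np.array([1.0, 2.0]), np.array([-1.0, 0.5]), np.array([0.0, 3.0])]

# Kernel matrix K_ij = Phi(x_i) . Phi(x_j): a Gram matrix.
K = np.array([[phi(xi) @ phi(xj) for xj in X] for xi in X])
print(np.allclose(K, K.T))                     # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # no negative eigenvalues (PSD)

# Idea behind part (c): factor K = L L^T and read off feature vectors
# Phi(x_i) := (row i of L).  An eigendecomposition handles the singular case.
w, V = np.linalg.eigh(K)
L = V @ np.diag(np.sqrt(np.clip(w, 0.0, None)))
print(np.allclose(L @ L.T, K))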
¹ Copyright ©2024 J. Korman. Sharing or selling this material without permission of
the author is a copyright violation.
(3) This is a question about optimization with inequality constraints. The
problem we will investigate arises in the context of Support Vector
Machines, an area of Machine Learning. You do not need to
know any machine learning to solve this problem, but here is some
discussion in case you are interested. If you wish, you can ignore the
discussion and go directly to the details of the problem.
Suppose you are given m points xi ∈ Rn, i = 1, . . . ,m (these
points are called ‘data points’). Each point xi is labelled by a
number yi which is either +1 or −1. You can think of the data
{(xi, yi) | i = 1, . . . ,m} as a collection of points in Rn some of which
are labelled + and some labelled −. The goal is to separate the +
points from the − points by a hyperplane. In general this may not
be possible, but here we assume that it is possible to separate the
+ points from the − points by a hyperplane. For example, in one
application the + data points are pictures of dogs, and the − data
points are pictures of cats, and we would like to find an affine func-
tion that can tell the difference between these cats and dogs, which
is equivalent to finding a separating hyperplane.
Recall that any hyperplane in Rn can be specified by a vector
w ∈ Rn and a number b: {x ∈ Rn | w · x + b = 0}. Note that
the hyperplanes separating the + points from the − points are the
ones satisfying: yi(w · xi + b) ≥ 0 for all i = 1, . . . ,m. For any such
separating hyperplane (w, b), the distance to the closest data point
is given by:
    min_{i=1,...,m}  |w · xi + b| / |w|.
When building an SVM one wants to pick, among the many
separating hyperplanes, one that is optimal in the following sense:
    max_{w,b : yi(w·xi+b)≥0}  min_{i=1,...,m}  |w · xi + b| / |w|.
The solution to this optimization problem is a hyperplane that
maximizes the distance to the closest data point, which is the most
“stable” separating hyperplane. For example, if you choose a hyper-
plane that is very close to one of the data points, then a new nearby
data point of the same class might be misclassified by your SVM in
the future, which is something you would like to avoid.
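[Editor's note: before the reformulation below, here is a minimal numerical illustration, with made-up w, b and data points, of the margin formula min_i |w · xi + b| / |w| above.]

```python
import numpy as np

# Margin of a hyperplane {x : w.x + b = 0} with respect to a data set:
# min_i |w.x_i + b| / |w|.  The numbers below are made up for illustration.
w = np.array([1.0, -1.0])
b = 0.5
X = np.array([[2.0, 0.0], [0.0, 2.0], [3.0, 1.0]])

margin = np.min(np.abs(X @ w + b)) / np.linalg.norm(w)
print(margin)
```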
It is not difficult to show that the above problem is equivalent to
the following problem, which is the problem (*) we will be investi-
gating in this question:

    min (1/2)|w|²   over w ∈ Rn, b ∈ R,
    subject to: 1 − yi(w · xi − b) ≤ 0, for i = 1, . . . ,m.        (*)
(a) Write the 1st order conditions for the problem (*) above. Note
that the unknowns here are the vector w ∈ Rn and the number
b ∈ R. Use µi as Lagrange multipliers.
(b) Suppose the data you are given has only two points x1 and x2
where y1 = +1 and y2 = −1. Use your conditions from part (a)
to find a candidate for the optimal separating hyperplane for
the two points x1 and x2. Hint: your answer should be a vector
w and a number b expressed in terms of the points x1 and x2.
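[Editor's note: a hedged numerical sketch for problem (3). It solves (*) with a generic constrained solver for two made-up points; the data and the use of scipy's SLSQP method are assumptions, not part of the assignment, but the output can serve as a sanity check for the candidate found in part (b).]

```python
import numpy as np
from scipy.optimize import minimize

# Numerical sketch for problem (*): data and solver choice are illustrative
# assumptions.  Unknowns are packed as z = (w_1, w_2, b); the constraints
# 1 - y_i (w . x_i - b) <= 0 are passed to SLSQP as y_i (w . x_i - b) - 1 >= 0.
X = np.array([[2.0, 1.0], [0.0, -1.0]])   # x1, x2 (made-up points)
y = np.array([1.0, -1.0])                 # y1 = +1, y2 = -1

def objective(z):
    w = z[:2]
    return 0.5 * w @ w

constraints = [
    {"type": "ineq", "fun": lambda z, i=i: y[i] * (z[:2] @ X[i] - z[2]) - 1.0}
    for i in range(2)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
print(res.x[:2], res.x[2])                # numerical w and b
```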
(4) Let A be an m×n matrix and x0 ∈ Rn. Consider the unconstrained
optimization problem
    min_{x∈Rm} |ATx − x0|².
(a) Is f(x) = |ATx − x0|² a convex function? Explain.
(b) Show that the above problem is equivalent to the problem
    min_{y∈Rn} |Qy − x0|²,
where Q = ATA. Hint: according to the fundamental theorem
of linear algebra any x ∈ Rm can be written as a sum x = Ay+z
for some y ∈ Rn and z ∈ Rm where AT z = 0.
(c) Assuming Q is positive definite, find a formula for the minimizer
y∗ of the problem in part (b).
(d) Assuming Q is positive definite, use part (c) to find all the min-
imizers x∗ ∈ Rm of the original unconstrained problem above.
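[Editor's note: a small numerical check for problem (4), parts (a)-(b), with random A and x0 chosen only for illustration: the optimal values of the two problems coincide. Here m < n, so this instance does not satisfy the positive definiteness assumption of parts (c)-(d), which forces m ≥ n.]

```python
import numpy as np

# Sanity check for problem (4)(b) with random data: the optimal values of
#   min_x |A^T x - x0|^2   and   min_y |Q y - x0|^2,  Q = A^T A,  coincide.
rng = np.random.default_rng(1)
m, n = 3, 5                      # m < n here, so the common optimal value is nonzero
A = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
Q = A.T @ A

x_star, *_ = np.linalg.lstsq(A.T, x0, rcond=None)    # minimizes |A^T x - x0|
y_star, *_ = np.linalg.lstsq(Q, x0, rcond=None)      # minimizes |Q y - x0|

print(np.linalg.norm(A.T @ x_star - x0) ** 2,
      np.linalg.norm(Q @ y_star - x0) ** 2)
```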
(5) In homework 3, you derived the Lagrange multiplier condition for a
constrained minimization problem with one constraint using the 1st-
order condition for unconstrained minimization in the special case
when the surface is the graph of a function. That is, the Lagrange
multiplier condition for a problem of the form
min f(x, y, z)
subject to g(x, y, z) = z − h(x, y) = 0
(x, y, z) ∈ R3,
can be derived by converting the problem into the unconstrained
problem
min F (x, y)
(x, y) ∈ R2
where F (x, y) := f(x, y, h(x, y)) and then using the condition
∇F (x∗, y∗) = 0 to conclude
∇f(x∗, y∗, z∗) + λ∇g(x∗, y∗, z∗) = 0.
Here you are asked to do the same thing for two constraints. That
is, starting with the problem
min f(x, y, z)
subject to g1(x, y, z) = z − h1(x, y) = 0
subject to g2(x, y, z) = z − h2(x, y) = 0
(x, y, z) ∈ R3,
derive the Lagrange multiplier condition ∇f + λ1∇g1 + λ2∇g2 = 0
by first converting the problem to an unconstrained minimization
problem as follows. Let (x∗, y∗, z∗) be a local minimum of f subject
to the two constraints and assume (x∗, y∗, z∗) is a regular point, so
∇g1(x∗, y∗, z∗) and ∇g2(x∗, y∗, z∗) are linearly independent. Assume
further that we are in the special case where the intersection of the
two surfaces g1 = 0 and g2 = 0 is a curve C which is parametrized
by a function of x. That is C = {(x, ϕ(x), ψ(x)) | x ∈ R} for some
C1 functions ϕ, ψ : R → R. Now convert the original constrained
problem into the following unconstrained problem:
min F (x)
x ∈ R
where F (x) := f(x, ϕ(x), ψ(x)) and use the condition ∇F (x∗) = 0
to conclude
∇f(x∗, y∗, z∗) + λ1∇g1(x∗, y∗, z∗) + λ2∇g2(x∗, y∗, z∗) = 0.
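[Editor's note: the sympy sketch below checks the conclusion of problem (5) on one concrete example. The functions f, h1, h2 are assumptions chosen so that the intersection curve really is a graph over x; this is a verification aid, not the requested derivation.]

```python
import sympy as sp

# Concrete check for problem (5) (illustrative f, h1, h2; not the derivation
# itself): at a critical point of F(x) = f(x, phi(x), psi(x)) on the curve C,
# multipliers with grad f + l1 grad g1 + l2 grad g2 = 0 exist.
x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2')

f = (x - 1)**2 + y**2 + z**2       # assumed objective
h1 = x + y                         # assumed h1
h2 = 2*x - y                       # assumed h2
g1, g2 = z - h1, z - h2

# Intersection of {g1 = 0} and {g2 = 0}: h1 = h2 gives y = x/2, z = 3x/2.
phi, psi = x/2, sp.Rational(3, 2)*x
F = f.subs({y: phi, z: psi})

xstar = sp.solve(sp.diff(F, x), x)[0]                    # F'(x*) = 0
pt = {x: xstar, y: phi.subs(x, xstar), z: psi.subs(x, xstar)}

grad = lambda h: sp.Matrix([sp.diff(h, v) for v in (x, y, z)])
eqs = (grad(f) + l1*grad(g1) + l2*grad(g2)).subs(pt)
print(sp.solve(list(eqs), [l1, l2]))                     # the multipliers exist
```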
(6) Consider the optimization problem:
min f(x, y, z)
subject to: g(x, y, z) = z − (x² + y²)^(1/2) = 0.
Note that {g = 0} is a cone with vertex at the origin. Also note that
g is not differentiable at the origin.
(a) Find all the regular points of the surface {g = 0} \ {(0, 0, 0)}.
(b) Suppose f(x, y, z) = z. Use the 1st order necessary conditions
for a local min to find the candidates for minimizers and then
solve the optimization problem.
(c) Suppose f(x, y, z) = ax + cz where a, c ∈ R. Use the 1st order
necessary conditions for a local min to find the candidates for
minimizers and then solve the optimization problem. Hint: you
should consider different cases depending on the values of a and
c.
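[Editor's note: for problem (6), the sympy lines below compute the gradient of g(x, y, z) = z − (x² + y²)^(1/2), which is the object the regularity and first-order conditions are about; the sample point is an arbitrary illustrative choice.]

```python
import sympy as sp

# Helper for problem (6): the gradient of g(x,y,z) = z - (x^2 + y^2)^(1/2).
# Note g is not differentiable where x = y = 0.
x, y, z = sp.symbols('x y z', real=True)
g = z - sp.sqrt(x**2 + y**2)

grad_g = sp.Matrix([sp.diff(g, v) for v in (x, y, z)])
print(sp.simplify(grad_g))

# Evaluate at an arbitrary point of the cone away from the vertex, e.g. (3, 4, 5).
print(grad_g.subs({x: 3, y: 4, z: 5}))
```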
(7) Consider the optimization problem:
min cTx
subject to: aTi x ≤ bi for i = 1 . . .m
where x ∈ Rn, 1 ≤ m < n, and the ai's are linearly independent.
Assuming a minimizer exists, let x∗ and µ∗ satisfy the 1st order nec-
essary Kuhn-Tucker conditions.
(a) Prove that µ∗ is unique.
(b) Suppose c = −aj for some j and that all the active constraints
at x∗ are strongly active. Prove that x∗ is not unique.
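[Editor's note: a toy numerical experiment for problem (7). The data are made up and the use of scipy's linprog is an assumption; it solves one instance of the LP and then recovers the multipliers µi from the stationarity condition c + Σ µi ai = 0 on the active constraints.]

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance for problem (7): m = 2 < n = 3, a_i's linearly independent,
# c = -a_1 (the situation of part (b)).  Data are made up for illustration.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])            # rows a_1, a_2
b = np.array([1.0, 2.0])
c = -A[0]                                  # c = -a_1

# linprog defaults to x >= 0, so free bounds are passed explicitly.
res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] * 3, method="highs")
x_star = res.x

# Recover mu from stationarity  c + sum_i mu_i a_i = 0  on the active set.
active = np.isclose(A @ x_star, b)
mu = np.zeros(len(b))
sol, *_ = np.linalg.lstsq(A[active].T, -c, rcond=None)
mu[active] = sol
print(x_star, mu)
```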
(8) Let
F [x(·), y(·), z(·)] = ∫₀ᵀ { (1/2)[ẋ(t)² + ẏ(t)² + ż(t)²] − g z(t) } dt,
and
H[x(t), y(t), z(t)] = x(t) + y(t)− z(t) = 0.
(a) Find and solve the Euler-Lagrange equations for the functional
F subject to the holonomic constraint H.
Note: L(t, x, y, z, ẋ, ẏ, ż) = (1/2)[ẋ² + ẏ² + ż²] − gz is the kinetic
energy minus the potential energy.
(b) Using the equation H[x, y, z] = x+ y − z = 0 to express z(t) =
x(t) + y(t) we can convert the above holonomic problem to the
unconstrained problem
F̃ [x(·), y(·)] = ∫₀ᵀ { [ẋ(t)² + ẏ(t)² + ẋ(t)ẏ(t)] − g[x(t) + y(t)] } dt.
Find and solve the Euler-Lagrange equation for the functional F̃.
Hint: you should get the same answer as in part (a).
Hint: you should get the same answer as in part (a).
(c) Show conservation of energy: that is, show that the total energy
is the same at time t = 0 and at time t = T . Note: assume
(x, y)(0) = (1, 1), (x, y)(T ) = (−1/2, −1/2) and that (ẋ, ẏ)(0) =
(0, 0). The total energy at time t is the sum of the kinetic and
potential energies: [ẋ² + ẏ² + ẋẏ] + g[x(t) + y(t)].
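[Editor's note: for problem (8), the sympy snippet below forms the integrand of F̃ and lets sympy produce its Euler-Lagrange equations, as a sanity check for part (b). The use of sympy's euler_equations helper is an editorial choice, not part of the assignment.]

```python
import sympy as sp
from sympy.calculus.euler import euler_equations

# Sanity check for problem (8)(b): the Euler-Lagrange equations of F~.
t, g = sp.symbols('t g', positive=True)
x, y = sp.Function('x'), sp.Function('y')

# Integrand of F~: [x'^2 + y'^2 + x'y'] - g[x + y].
L = x(t).diff(t)**2 + y(t).diff(t)**2 + x(t).diff(t)*y(t).diff(t) - g*(x(t) + y(t))
print(euler_equations(L, [x(t), y(t)], t))

# Adding and subtracting the two printed equations decouples them, after which
# they can be integrated using the data given in part (c).
```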
(9) Let f : Rn → R be a C1 function and consider the functional:
F [u(·)] = f( ∫ₐᵇ δ1(x)u(x)dx, . . . , ∫ₐᵇ δn(x)u(x)dx )
where for each i = 1, ..., n, δi(x) is 1 for x ∈ [a + (i−1)(b−a)/n, a + i(b−a)/n]
and 0 elsewhere. As usual the space A = {u : [a, b] → R | u ∈
C1, u(a) = A, u(b) = B}. Find the first order condition for a
minimizer u∗(·) of F in A.
Hint: Consider computing the “directional derivative” 0 = (d/ds)|s=0 F [u∗(·) + sv(·)] = · · · ,
where v(·) is a test function.
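[Editor's note: a numerical illustration of the hint in problem (9). With a made-up f, candidate u and test function v (all assumptions), the finite-difference value of d/ds|s=0 F[u + sv] matches the chain-rule expression Σi ∂f/∂pi · ∫ δi(x)v(x)dx.]

```python
import numpy as np
from scipy.integrate import quad

# Numerical illustration of the hint in problem (9).  f, u and v below are
# made-up choices; delta_i is the indicator of the i-th of n equal subintervals.
a, b, n = 0.0, 1.0, 2

def f(p):                        # an assumed C^1 function on R^n
    return p[0]**2 + p[0]*p[1]

def grad_f(p):
    return np.array([2*p[0] + p[1], p[0]])

def moments(w):                  # the n integrals  int delta_i(x) w(x) dx
    edges = np.linspace(a, b, n + 1)
    return np.array([quad(w, edges[i], edges[i + 1])[0] for i in range(n)])

u = lambda x: x**2               # a candidate u (illustrative)
v = lambda x: (x - a)*(b - x)    # test function with v(a) = v(b) = 0
F = lambda w: f(moments(w))

eps = 1e-5
fd = (F(lambda x: u(x) + eps*v(x)) - F(lambda x: u(x) - eps*v(x))) / (2*eps)
print(fd, grad_f(moments(u)) @ moments(v))   # the two numbers should agree
```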