
CSE 251A: Machine learning Winter 2021

Programming project 2 — Coordinate descent

Overview

In this project we consider a standard unconstrained optimization problem:

min_w L(w)

where L(·) is some cost function and w ∈ R^d. In class, we looked at several approaches to solving such

problems—such as gradient descent and stochastic gradient descent—under differentiability conditions on

L(w). We will now look at a different, and in many ways simpler, approach:

• Initialize w somehow.

• Repeat: pick a coordinate i ∈ {1, 2, ..., d}, and update the value of w_i so as to reduce the loss.

Two questions need to be answered in order to fully specify the updates:

(a) Which coordinate to choose?

(b) How to set the new value of w_i?

Give answers to these questions, and thereby flesh out a coordinate descent method. For (a), your solution

should do something more adaptive than merely picking a coordinate at random.
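For concreteness, here is a minimal sketch of one possible instantiation (not the required solution), assuming L is differentiable and a gradient function is available: coordinate (a) is chosen by the largest-magnitude partial derivative (the Gauss-Southwell rule), and update (b) is a simple fixed-step move along that coordinate. Names such as grad_L are placeholders for whatever your own implementation provides.

import numpy as np

def coordinate_descent(L, grad_L, w0, n_iters=1000, step=0.1):
    # Illustrative loop only: your choices for (a) and (b) may differ.
    w = w0.astype(float)
    losses = [L(w)]
    for _ in range(n_iters):
        g = grad_L(w)                   # full gradient, used only to pick a coordinate
        i = int(np.argmax(np.abs(g)))   # (a) adaptive choice: largest partial derivative
        w[i] -= step * g[i]             # (b) fixed-step update of w_i
        losses.append(L(w))
    return w, losses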

Then implement and test your algorithm on a logistic regression problem. First download the wine data

set that we have frequently alluded to in class:

https://archive.ics.uci.edu/ml/datasets/Wine

This contains 178 data points in 13 dimensions, with 3 classes. Just keep the first two classes (with 59 and

71 points, respectively) so as to create a binary problem.
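If it is more convenient, the same data can be loaded directly through scikit-learn, which ships this UCI dataset as load_wine; classes 0 and 1 there are the first two classes (59 and 71 points). This is only a convenience sketch, and downloading the file from the URL above works equally well.

from sklearn.datasets import load_wine

X, y = load_wine(return_X_y=True)
mask = y < 2                  # keep only the first two classes
X, y = X[mask], y[mask]       # 130 points in 13 dimensions, binary labels 0/1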

What to turn in

On the due date, upload (to Gradescope) a typewritten report containing the following elements (each

labeled clearly).

1. A short, high-level description of your coordinate descent method.

In particular, you should give a concise description of how you solve problems (a) and (b) above. Do you

need the function L(·) to be differentiable (and maybe even have continuous second-order derivatives),

or does it work with any cost function?

2. Convergence.

Under what conditions do you think your method converges to the optimal loss? There’s no need to

prove anything: just give a few sentences of brief explanation.


3. Experimental results.

(Remember that all training must take place on just classes 1 and 2.)

• Begin by running a standard logistic regression solver (e.g., from scikit-learn) on the training

set. It should not be regularized: if the solver forces you to do this, just set the regularization

constant suitably to make it irrelevant. Make note of the final loss L∗ (a minimal sketch of this baseline appears after this list).

• Then, implement your coordinate descent method and run it on this data.

• Finally, compare to a method that chooses coordinates i uniformly at random and then updates

w_i using your method (we’ll call this “random-feature coordinate descent”).

• Produce a clearly-labeled graph that shows how the loss of your algorithm’s current iterate—that is, L(w_t)—decreases with t; it should asymptote to L∗. On the same graph, show the corresponding curve for random-feature coordinate descent.
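A minimal sketch of the baseline fit and the plot, assuming X and y are the two-class wine data from above and that losses_adaptive and losses_random are the per-iteration loss values recorded by your two coordinate descent variants (all placeholder names). Here L is taken to be the mean logistic loss on the training set; whichever convention you use, apply it consistently to the baseline and to your own iterates.

import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Effectively unregularized fit: a very large C makes the penalty irrelevant.
# Note that fit_intercept=True by default; match the convention of your own code.
clf = LogisticRegression(C=1e10, max_iter=10000)
clf.fit(X, y)
L_star = log_loss(y, clf.predict_proba(X))   # mean logistic loss on the training set

plt.plot(losses_adaptive, label="adaptive coordinate descent")
plt.plot(losses_random, label="random-feature coordinate descent")
plt.axhline(L_star, color="k", linestyle="--", label="L* (unregularized solver)")
plt.xlabel("iteration t")
plt.ylabel("L(w_t)")
plt.legend()
plt.show()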

4. Critical evaluation.

Do you think there is scope for further improvement in your coordinate descent scheme from part (1)? If so, how?

5. (Optional) Sparse coordinate descent.

Now, suppose we want a k-sparse solution w: that is, one that has at most k nonzero entries.

• Propose a modified version of your method for this task. Assume k is part of the input, along with the data. (One naive possibility is sketched after this list.)

• Do you think this method always finds the best k-sparse solution when L(·) is convex?

• Try this out on the wine data. Make a table of loss values obtained for a few selected values of k.
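One naive possibility, purely for illustration and not claimed to be the best scheme: run the same coordinate updates as before, but after each update project w back onto the set of k-sparse vectors by zeroing all but its k largest-magnitude entries. As above, L and grad_L are placeholders for your own loss and gradient functions.

import numpy as np

def project_k_sparse(w, k):
    # Zero out all but the k largest-magnitude entries of w.
    v = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]
    v[keep] = w[keep]
    return v

def sparse_coordinate_descent(L, grad_L, w0, k, n_iters=1000, step=0.1):
    # Illustrative only: coordinate updates interleaved with hard thresholding.
    w = project_k_sparse(w0.astype(float), k)
    for _ in range(n_iters):
        g = grad_L(w)
        i = int(np.argmax(np.abs(g)))
        w[i] -= step * g[i]
        w = project_k_sparse(w, k)
    return w, L(w)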
