Homework Assignment 1
COGS 185: Advanced Machine Learning Methods
Due: April 20, 2023 11:59 PM PDT
Instructions: Answer the questions below, attach your code, and insert figures
to create a PDF file; submit your file via Gradescope. You may look up the
information on the Internet, but you must write the final homework solutions
by yourself.
Late policy: 5% of the total received points will be deducted on the first day
past due. An additional 10% of the total received points will be deducted for
each extra day thereafter.
Introduction
In multi-class classification, each training data point belongs to one of N
different classes. The goal is to construct a function that, given a new data point,
correctly predicts the class to which it belongs. In this assignment, you will
explore solutions to multi-class classification tasks and compare the one-vs-all
strategy with dedicated multi-class classification algorithms.
Datasets
Throughout the assignment, we use the UCI Iris dataset for both training and
testing. There are 3 classes in the Iris dataset, and each class has 50 examples.
Set aside 10 examples from each class to form a test set; the training set will
therefore contain 120 examples and the test set 30.
(Bonus) In addition, verify the off-the-shelf classifier you chose on the dna
dataset, which also has 3 classes. It contains 2,000 examples for training
and 1,186 examples for testing. For this dataset, you only need to perform the
bonus question; there is no need to run the code you wrote in Tasks 1, 2, and 3.
Here is the link to the Iris dataset: http://archive.ics.uci.edu/ml/datasets/Iris?ref=datanews.io
Alternatively, you can use sklearn.datasets.load_iris() to access the Iris dataset.
The dna dataset can be downloaded from: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html
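For reference, here is a minimal sketch (not required code) of one way to load the Iris data with scikit-learn and build the 120/30 split described above; which 10 examples per class you hold out is up to you.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target              # X: (150, 4), y in {0, 1, 2}

train_idx, test_idx = [], []
for c in range(3):
    idx = np.where(y == c)[0]
    train_idx.extend(idx[:40])             # 40 training examples per class
    test_idx.extend(idx[40:])              # 10 test examples per class

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
print(X_train.shape, X_test.shape)         # (120, 4) (30, 4)
```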
1 (10 points) Conceptual Questions
Go to the Quizzes section in Canvas and submit answers to the conceptual questions.
2 (30 points) Task 1: One-vs-All SVM
You need to implement the 3 linear SVMs using gradient descent yourself.
Here are the instructions for one-vs-all SVMs:
1. For each SVM, convert the labels from 3 classes to 2 classes. In other
words, take turns mapping label 0, 1, or 2 to +1 and the other 2 labels to −1.
2. Train and test on the three bi-class datasets.
3. During the prediction phase, predict an example x with the 3 SVMs and
take the results as degrees of belief to decide the predicted class (a sketch is shown below).
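As a rough illustration of this strategy (the function names are placeholders, not required code), label conversion and prediction could look like the following; it assumes each binary SVM scores an example with $f_k(x) = w_k \cdot x$ and the bias is folded in as $x_0 = 1$.

```python
import numpy as np

def to_binary_labels(y, positive_class):
    """Map the chosen class to +1 and the other two classes to -1."""
    return np.where(y == positive_class, 1, -1)

def predict_one_vs_all(X, W):
    """X: (n, d) with x0 = 1 appended; W: (3, d), row k holds the weights of
    the k-th binary SVM. The class with the largest score (degree of belief) wins."""
    scores = X @ W.T                       # (n, 3) scores f_k(x) = w_k . x
    return np.argmax(scores, axis=1)
```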
The loss function for each one-vs-all SVM is shown below. Here we assume the
labels $y \in \{-1, +1\}$ and $x = (x_0, \ldots, x_m)$, where $x_0 = 1$ is added as a bias. Given
the training dataset $\{(x^{(i)}, y^{(i)})\}$, the loss is defined over all data points in the
dataset:

$$\text{minimize} \quad L(w) = \frac{1}{2}\|w\|^2 + C \sum_i \max\left[0,\ 1 - y^{(i)} f(x^{(i)}; w)\right], \tag{1}$$

where $f(x^{(i)}; w) = w \cdot x^{(i)}$ and $C = 0.5, 2.0, 5.0, 10.0$.
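For orientation only, a full-batch subgradient-descent sketch for this objective might look like the following; the learning rate, epoch count, and function name are illustrative choices, not part of the assignment.

```python
import numpy as np

def train_binary_svm(X, y, C=0.5, lr=1e-3, epochs=1000):
    """X: (n, d) with x0 = 1 already appended; y in {-1, +1}.
    Full-batch subgradient descent on the objective in Eq. (1)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)                         # y^(i) * f(x^(i); w)
        violated = margins < 1                        # hinge term is active
        # subgradient: w - C * sum over violated i of y^(i) x^(i)
        grad = w - C * (X[violated].T @ y[violated])
        w -= lr * grad
    return w
```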
You are NOT allowed to use sklearn’s svm.SVC() here. Train the linear
SVM models and report your code together with the following results:
1. The mathematical form of the gradient of the loss function.
2. The optimal $w^* = \arg\min_w L(w)$ as the minimizer.
3. Training accuracy and test accuracy with $C = 0.5, 2.0, 5.0, 10.0$.
4. Plot the training data along with the decision boundaries $(w_1^*, \ldots, w_K^*)$, $K = 3$,
using the first two dimensions of the features for $x$ (a plotting sketch is shown below).
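One way to produce such a plot is sketched below; it assumes each $w_k^*$ was fit on the first two features plus the bias term $x_0 = 1$, so every decision boundary is a line $w_0 + w_1 x_1 + w_2 x_2 = 0$ in that plane.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boundaries(X2, y, W):
    """X2: (n, 2) first two feature dimensions; y: labels in {0, 1, 2};
    W: (3, 3) rows [w0, w1, w2] fit on inputs [1, x1, x2]."""
    plt.scatter(X2[:, 0], X2[:, 1], c=y, edgecolors="k")
    xs = np.linspace(X2[:, 0].min(), X2[:, 0].max(), 100)
    for k, w in enumerate(W):
        if abs(w[2]) > 1e-12:                    # solve w0 + w1*x1 + w2*x2 = 0 for x2
            plt.plot(xs, -(w[0] + w[1] * xs) / w[2], label=f"boundary {k}")
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")
    plt.legend()
    plt.show()
```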
3 (30 points) Task 2: Explicit Multi-class SVM
An explicit multi-class SVM utilizes multiple hyperplanes [1] to classify data.
Instead of resorting to the duality of the primal problem and a QP solver, you
will use gradient descent to implement an explicit multi-class SVM. We use the
same notation as in Task 1; $K = 3$ refers to the total number of classes.
Remember to add a bias term ($x_0 = 1$) as in Task 1.
$$\text{minimize} \quad L(w_1, \ldots, w_K) = \frac{1}{2}\sum_{k=1}^{K}\|w_k\|^2 + C \sum_i \sum_{\substack{k=1 \\ k \neq y^{(i)}}}^{K} \max\left[0,\ 1 - \left(w_{y^{(i)}} \cdot x^{(i)} - w_k \cdot x^{(i)}\right)\right] \tag{2}$$
Derive the gradient of the loss function w.r.t. the $w_k$'s, and implement the training
of the explicit multi-class SVM in Python yourself (a subgradient-descent sketch is given at the end of this section). Report the following:
1. The mathematical form of the gradient of the loss function.
2. The optimal $(w_1^*, \ldots, w_K^*) = \arg\min_{w_1, \ldots, w_K} L(w_1, \ldots, w_K)$ as the minimizer.
3. Training accuracy and test accuracy with $C = 0.5, 2.0, 5.0, 10.0$.
4. Plot the training data along with the decision boundaries $(w_1^*, \ldots, w_K^*)$, $K = 3$,
using the first two dimensions of the features for $x$.
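As promised above, a rough full-batch subgradient-descent sketch for Eq. (2) follows; the vectorized gradient trick and all hyperparameters are illustrative, not prescribed, and you should still derive the gradient on paper.

```python
import numpy as np

def train_multiclass_svm(X, y, K=3, C=0.5, lr=1e-3, epochs=1000):
    """X: (n, d) with x0 = 1 appended; y: integer labels in {0, ..., K-1}.
    Full-batch subgradient descent on the objective in Eq. (2)."""
    n, d = X.shape
    W = np.zeros((K, d))
    for _ in range(epochs):
        scores = X @ W.T                                  # (n, K), entry (i, k) = w_k . x^(i)
        correct = scores[np.arange(n), y][:, None]        # w_{y^(i)} . x^(i)
        margins = np.maximum(0, 1 - (correct - scores))   # hinge terms, (n, K)
        margins[np.arange(n), y] = 0                      # the k = y^(i) term is excluded
        coeff = (margins > 0).astype(float)               # +1 for each violating class k
        coeff[np.arange(n), y] = -coeff.sum(axis=1)       # true class collects -1 per violation
        grad = W + C * (coeff.T @ X)                      # regularizer + hinge subgradient
        W -= lr * grad
    return W
```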
4 (30 points) Task 3: Softmax
Similar to the explicit multi-class SVM, softmax uses multiple hyperplanes to
classify data. You will use gradient descent to implement a softmax
classifier. $K = 3$ refers to the total number of classes. In the softmax classifier, we
can express the conditional probability (or confidence) of class $j$, parameterized
by $W = (w_1, \ldots, w_K)^\top$ and $b = (b_1, \ldots, b_K)$, as

$$p_j = p(y = j \mid x) = \frac{e^{f_j}}{\sum_{k=1}^{K} e^{f_k}}, \quad \text{where } f_j = w_j \cdot x + b_j.$$
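A small sketch of evaluating these probabilities is given below; subtracting the row-wise maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax_probs(X, W, b):
    """X: (n, d); W: (K, d); b: (K,). Returns p with p[i, j] = p(y = j | x^(i))."""
    f = X @ W.T + b                            # (n, K), f_j = w_j . x + b_j
    f -= f.max(axis=1, keepdims=True)          # stability: shift scores before exp
    e = np.exp(f)
    return e / e.sum(axis=1, keepdims=True)
```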
Given the training dataset $\{(x^{(i)}, y^{(i)})\}$, we sum the cross-entropy losses over
all data points in the dataset, with L2 regularization:

$$\text{minimize} \quad L(w_1, \ldots, w_K, b_1, \ldots, b_K) = -\sum_i \ln p_{y^{(i)}} + \frac{\lambda}{2}\sum_{k=1}^{K}\|w_k\|^2. \tag{3}$$
Derive the gradient of the loss function w.r.t. the $w_k$'s and $b_k$'s, and implement the
training of the softmax classifier in Python yourself (a gradient-descent sketch is given at the end of this section). Report the following:
1. The mathematical form of the gradient of the loss function.
2. The optimal $(w_1^*, \ldots, w_K^*, b_1^*, \ldots, b_K^*) = \arg\min_{w_1, \ldots, w_K, b_1, \ldots, b_K} L(w_1, \ldots, w_K, b_1, \ldots, b_K)$
as the minimizer.
3. Training accuracy and test accuracy with $\lambda = 0, 10^{-5}, 10^{-3}, 10^{-1}$.
4. Plot the training data along with the decision boundaries $(w_1^*, \ldots, w_K^*, b_1^*, \ldots, b_K^*)$, $K = 3$,
using the first two dimensions of the features for $x$.
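As noted above, here is a rough full-batch gradient-descent sketch for Eq. (3); it uses the cross-entropy gradient $\partial L / \partial w_k = \sum_i \bigl(p_k^{(i)} - \mathbb{1}[y^{(i)} = k]\bigr)\, x^{(i)} + \lambda w_k$ (and the analogous expression for $b_k$), which you should still derive yourself. All hyperparameters are illustrative.

```python
import numpy as np

def train_softmax(X, y, K=3, lam=1e-3, lr=1e-3, epochs=1000):
    """X: (n, d); y: integer labels in {0, ..., K-1}.
    Full-batch gradient descent on the objective in Eq. (3)."""
    n, d = X.shape
    W = np.zeros((K, d))
    b = np.zeros(K)
    for _ in range(epochs):
        f = X @ W.T + b                            # scores f_j = w_j . x + b_j
        f -= f.max(axis=1, keepdims=True)          # numerical stability
        e = np.exp(f)
        p = e / e.sum(axis=1, keepdims=True)       # p[i, k] = p(y = k | x^(i))
        p[np.arange(n), y] -= 1.0                  # p_k - 1[y^(i) = k]
        grad_W = p.T @ X + lam * W
        grad_b = p.sum(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b
```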
5 (+15 points) Bonus: Off-the-shelf Classifiers
You will choose ONE classifier below and test it on the Iris and the dna datasets.
Some candidate classifiers can be:
• SVM
The svm-light can be downloaded from:
http://daoudclarke.github.io/pysvmlight/
• Boosting
You can use or implement the AdaBoost algorithm with the one-vs-all
strategy, or implement a multi-class AdaBoost algorithm, for example
the AdaBoost.MH algorithm [2]; see [5] for a review. Some existing
libraries are scikit-learn (http://scikit-learn.org/stable/auto_examples/ensemble/plot_adaboost_multiclass.html)
and XGBoost (https://github.com/dmlc/xgboost/blob/master/demo/README.md).
• Random Forest
Using a random forest classifier is another option.
scikit-learn also provides a random forest classifier:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
5.1 One-vs-All
Follow the instructions described in Task 1 to convert labels and obtain 3 new
datasets, in which the labels $y_i \in \{+1, -1\}$. Train and test 3 classifiers on the 3
bi-class datasets. Take the outputs of the 3 classifiers to predict the class.
5.2 Explicit Multiclass
Without any processing of the labels, directly train and test the classifier
on the dataset; a minimal sketch is given at the end of this section.
In your report, provide a short description of the classifier and list the training and test accuracies.
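For the random forest option, a minimal direct multi-class sketch (reusing the X_train/X_test split from the Datasets section; the hyperparameters are illustrative, not prescribed) could be:

```python
from sklearn.ensemble import RandomForestClassifier

# Train directly on the original 3-class labels; no label conversion needed.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))
print("test accuracy:", clf.score(X_test, y_test))
```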
6 Requirement
Write a brief report summarizing your work and comparing the results. We
encourage you to read the following two highly cited review/analysis papers to
learn how to proceed with your study:
A Comparison of Methods for Multiclass Support Vector Machines,
Chih-Wei Hsu and Chih-Jen Lin [3]
In Defense of One-vs-all Classification, Ryan Rifkin and Aldebaro Klau-
tau [4]
References
[1] Koby Crammer and Yoram Singer. On the algorithmic implementation of
multiclass kernel-based vector machines. Journal of Machine Learning Research,
2(Dec):265–292, 2001.
[2] Jerome Friedman, Trevor Hastie, Robert Tibshirani, et al. Additive logistic
regression: a statistical view of boosting (with discussion and a rejoinder by
the authors). The Annals of Statistics, 28(2):337–407, 2000.
[3] Chih-Wei Hsu and Chih-Jen Lin. A comparison of methods for multiclass
support vector machines. IEEE Transactions on Neural Networks,
13(2):415–425, 2002.
[4] Ryan Rifkin and Aldebaro Klautau. In defense of one-vs-all classification.
The Journal of Machine Learning Research, 5:101–141, 2004.
[5] Ji Zhu, Hui Zou, Saharon Rosset, and Trevor Hastie. Multi-class AdaBoost.
Statistics and Its Interface, 2(3):349–360, 2009.