CSE 5523: Machine Learning

Date: 2020-11-27

Please read the following instructions carefully.
• This is a mini-project in which you are required to implement and test the SGD algorithm for logistic regression in different scenarios. Please read all parts of this assignment carefully.
• Your submission should be in the form of a report that includes
– A brief introduction containing a description of the goals and a high-level description of your
procedure (you may augment that with a pseudo-code if it helps with clarity).
– A section on your experiments in which you explain clearly and concisely how you devised the experiments (in light of the guidelines and specifications provided in this assignment) and describe each task involved, including the generation of training and test data. State precisely the setting of each parameter used in any task (that is, do not leave any parameter unspecified).
– A section on the results of your experiments, where you state and discuss your results, and include
all the relevant figures.
– A conclusion where you state the main findings.
– A symbol index listing all the symbols used for the parameters and variables in your code and
what they denote.
– An appendix that contains a well-documented copy of your code.
• Please follow the template given at the end of this document.
• You can use any language you prefer, e.g., MATLAB, Python, C, etc.
• If you use a function from an existing package/library, you must clearly describe (preferably in a separate section) the inputs (including all the relevant parameters) and outputs of that function. Also, whenever this function is invoked in your code, you must clearly and explicitly show the setting of each of the input parameters and the format of the expected output. Please do not copy and paste the documentation found in the help folder of the package or any vague description you found online. You need to describe concisely only the relevant functionality and the relevant parameter settings.
• Building your own version of SGD is highly recommended.
• All submitted plots must clearly label the variables on the axes, and each figure needs a caption that briefly describes its plots and (if applicable) the setting that each plot represents.
• Use different line styles or markers for plots that are on the same figure. It is recommended that you use figure legends. If you plan to print your final submission in black and white, please do not use different colors to distinguish different plots on the same figure. Instead, use different line styles or different markers on your plots.
Stochastic Gradient Descent for Logistic Regression
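Recall the logistic loss function we defined in class:
$$\ell_{\mathrm{logist}}(w,(x,y)) = \ln\left(1 + \exp\left(-y\langle w, \tilde{x}\rangle\right)\right)$$
where $x \in \mathcal{X} \subset \mathbb{R}^{d-1}$, $\tilde{x} \triangleq (x, 1)$, $y \in \{-1, +1\}$, and $w \in \mathcal{C} \subset \mathbb{R}^{d}$. We will consider two scenarios, each with a different setting for $\mathcal{X}$ and $\mathcal{C}$. In both scenarios, the dimensionality parameter d is 5.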
Scenario 1
• The domain set $\mathcal{X} = [-1, 1]^{d-1}$, i.e., $\mathcal{X}$ is the 4-dimensional hypercube with edge length 2, centered at the origin.
• The parameter set $\mathcal{C} = [-1, 1]^{d}$, i.e., $\mathcal{C}$ is the 5-dimensional hypercube with edge length 2, centered at the origin.
Scenario 2
• The domain set $\mathcal{X} = \{x \in \mathbb{R}^{d-1} : \|x\| \le 1\}$, i.e., $\mathcal{X}$ is the 4-dimensional unit ball centered at the origin.
• The parameter set $\mathcal{C} = \{w \in \mathbb{R}^{d} : \|w\| \le 1\}$, i.e., $\mathcal{C}$ is the 5-dimensional unit ball centered at the origin.
For each scenario, show that there is a constant ρ such that for all $z \in \mathcal{X} \times \{-1, +1\}$, $\ell_{\mathrm{logist}}(\cdot, z)$ is convex and ρ-Lipschitz over $\mathcal{C}$, and that $\mathcal{C}$ is M-bounded for some M > 0. Specify ρ and M for each of the two scenarios. (Note that the values of ρ and M may not be the same for the two scenarios.)
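A sketch of one way to derive these constants, assuming "M-bounded" means $\|w\| \le M$ for all $w \in \mathcal{C}$ (if your course defines it through the diameter of $\mathcal{C}$, the M values double). Convexity holds because $t \mapsto \ln(1+e^{t})$ is convex and $w \mapsto -y\langle w, \tilde{x}\rangle$ is linear. The fraction in the gradient below lies in (0, 1), so the gradient norm is at most $\|\tilde{x}\|$, and a bound on the gradient norm over the convex set $\mathcal{C}$ yields the Lipschitz constant via the mean value theorem.

```latex
\begin{align*}
\nabla_w \ell_{\mathrm{logist}}(w,(x,y))
  &= -\,y\,\tilde{x}\,\frac{e^{-y\langle w,\tilde{x}\rangle}}{1+e^{-y\langle w,\tilde{x}\rangle}},
  \qquad \bigl\|\nabla_w \ell_{\mathrm{logist}}(w,(x,y))\bigr\| \le \|\tilde{x}\| = \sqrt{\|x\|^2+1},\\
\text{Scenario 1:}\quad \|x\|^2 \le d-1 = 4 &\;\Rightarrow\; \rho = \sqrt{5},
  \qquad \|w\| \le \sqrt{d} = \sqrt{5} \;\Rightarrow\; M = \sqrt{5},\\
\text{Scenario 2:}\quad \|x\| \le 1 &\;\Rightarrow\; \rho = \sqrt{2},
  \qquad \|w\| \le 1 \;\Rightarrow\; M = 1.
\end{align*}
```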
Data Distribution $\mathcal{D}$: In practice, the data distribution is usually unknown. However, since you will be asked to generate training and test examples for the sake of running your experiments, we describe a data distribution from which examples will be generated for each scenario. (Nevertheless, note that the SGD learner should remain oblivious to the distribution.) Each example (x, y) is generated as follows.
• With probability 1/2, set y = −1 and generate a (d−1)-dimensional Gaussian vector $u \sim \mathcal{N}(\mu_0, \sigma^2 I_{d-1})$, where $\mu_0 = (-1/4, -1/4, -1/4, -1/4)$ and $I_{d-1}$ is the identity matrix of rank d−1; that is, u is composed of 4 i.i.d. Gaussian components, each with mean −1/4 and variance σ² (σ will be specified later).
• With the remaining probability, set y = 1 and generate $u \sim \mathcal{N}(\mu_1, \sigma^2 I_{d-1})$, where $\mu_1 = (1/4, 1/4, 1/4, 1/4)$.
Then, set $x = \Pi_{\mathcal{X}}(u)$, where $\Pi_{\mathcal{X}}$ is the Euclidean projection onto $\mathcal{X}$; that is, the vector u generated above is projected onto $\mathcal{X}$ (in case it lies outside $\mathcal{X}$), and the resulting vector is x.
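The sampling and projection steps above can be sketched in Python with NumPy as follows. For the hypercube, the Euclidean projection clips each coordinate to [−1, 1] (the set is a product of intervals); for the unit ball, it rescales u by $1/\max(1, \|u\|)$. The function names `project_X` and `generate_examples` and the use of `numpy.random.Generator` are illustrative choices, not part of the assignment.

```python
import numpy as np

D_X = 4  # d - 1, the dimension of x (d = 5 in both scenarios)


def project_X(u, scenario):
    """Euclidean projection of each row of u onto the domain set X."""
    if scenario == 1:
        # Hypercube [-1, 1]^(d-1): clip each coordinate independently.
        return np.clip(u, -1.0, 1.0)
    # Unit ball: rescale any row whose norm exceeds 1 back onto the sphere.
    norms = np.linalg.norm(u, axis=1, keepdims=True)
    return u / np.maximum(norms, 1.0)


def generate_examples(num, sigma, scenario, rng):
    """Draw num i.i.d. examples (x, y) from the distribution D described above."""
    # y = -1 or +1, each with probability 1/2.
    y = rng.choice(np.array([-1, 1]), size=num)
    # Class mean is (-1/4, ..., -1/4) for y = -1 and (1/4, ..., 1/4) for y = +1.
    means = 0.25 * y[:, None] * np.ones((1, D_X))
    u = means + sigma * rng.standard_normal((num, D_X))
    return project_X(u, scenario), y
```

For instance, one test set per (scenario, σ) pair could be drawn once with `generate_examples(400, sigma, scenario, rng)` and reused across all runs.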
Note that the procedure above will be used in both scenarios to generate examples for training and testing; however, since $\mathcal{X}$ is different in the two scenarios, the projection step will also differ.
Let n denote the number of training examples (used by the SGD learner to output a predictor), and let N denote the number of test examples used to evaluate the performance of the output predictor on fresh examples. The number of test examples is fixed to N = 400 in all experiments.
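Because each of the n training examples drives exactly one SGD iteration, a single run of the learner can be sketched as one pass of projected SGD over the training set. This sketch assumes a constant step size $\eta = M/(\rho\sqrt{n})$, the starting point $w_1 = 0$, and the averaged output $\hat{w} = \frac{1}{n}\sum_{t=1}^{n} w_t$, as in the standard textbook analysis of projected SGD for convex, ρ-Lipschitz losses over an M-bounded set, which bounds the expected excess risk by $M\rho/\sqrt{n}$. The assignment leaves these choices to you. The names `project_C`, `logistic_grad`, `projected_sgd`, and `CONSTANTS` are hypothetical, and the (ρ, M) values assume "M-bounded" means $\|w\| \le M$.

```python
import numpy as np

# (rho, M) per scenario: rho bounds ||x_tilde||, and M assumes "M-bounded"
# means ||w|| <= M for every w in C (an assumption, see the lead-in).
CONSTANTS = {1: (np.sqrt(5.0), np.sqrt(5.0)), 2: (np.sqrt(2.0), 1.0)}


def project_C(w, scenario):
    """Euclidean projection onto the parameter set C (d = 5)."""
    if scenario == 1:
        return np.clip(w, -1.0, 1.0)          # hypercube [-1, 1]^d
    norm = np.linalg.norm(w)
    return w if norm <= 1.0 else w / norm      # unit ball


def logistic_grad(w, x_tilde, y):
    """Gradient of ln(1 + exp(-y <w, x_tilde>)) with respect to w."""
    margin = y * np.dot(w, x_tilde)
    # exp(-m) / (1 + exp(-m)) rewritten as 1 / (1 + exp(m))
    coeff = 1.0 / (1.0 + np.exp(margin))
    return -y * coeff * x_tilde


def projected_sgd(X, y, scenario):
    """One pass of projected SGD; returns the average of the iterates w_1..w_n."""
    n, d_minus_1 = X.shape
    rho, M = CONSTANTS[scenario]
    eta = M / (rho * np.sqrt(n))                # constant step size
    X_tilde = np.hstack([X, np.ones((n, 1))])   # x_tilde = (x, 1)
    w = np.zeros(d_minus_1 + 1)                 # w_1 = 0 lies in C in both scenarios
    w_sum = np.zeros_like(w)
    for t in range(n):
        w_sum += w                              # accumulate w_t before the update
        g = logistic_grad(w, X_tilde[t], y[t])
        w = project_C(w - eta * g, scenario)
    return w_sum / n
```

For example, `projected_sgd(X, y, 2)` on examples drawn with the Scenario 2 projection returns a vector in the unit ball, since an average of points in a convex set stays in that set.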
Experiments
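Let $L(w; \mathcal{D}) \triangleq \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell_{\mathrm{logist}}(w,(x,y))\right]$ denote the risk incurred by a predictor $w \in \mathcal{C}$ under the logistic loss with respect to the distribution $\mathcal{D}$. Let $\mathrm{err}(w; \mathcal{D}) \triangleq \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathbf{1}\left(\mathrm{sign}(\langle w,(x,1)\rangle) \neq y\right)\right]$ denote the binary classification error (the risk under the 0-1 loss) incurred by w with respect to $\mathcal{D}$.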
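In the experiments, both quantities are estimated by plug-in averages over the held-out test set. A minimal sketch of those averages, with hypothetical function names: `np.logaddexp(0, -m)` computes ln(1 + e^(−m)) without overflow, and counting sign(0) = 0 as a mistake is an assumption the assignment does not settle.

```python
import numpy as np


def augment(X):
    """Map each row x to x_tilde = (x, 1)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def empirical_logistic_loss(w, X, y):
    """Average of ln(1 + exp(-y <w, (x, 1)>)) over the examples."""
    margins = y * (augment(X) @ w)
    # np.logaddexp(0, -m) = ln(exp(0) + exp(-m)) = ln(1 + exp(-m))
    return float(np.mean(np.logaddexp(0.0, -margins)))


def empirical_01_error(w, X, y):
    """Fraction of examples with sign(<w, (x, 1)>) != y."""
    predictions = np.sign(augment(X) @ w)   # sign(0) = 0 counts as a mistake here
    return float(np.mean(predictions != y))
```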
For each scenario above, you are required to conduct a set of experiments on the performance of the SGD learner, where each experiment corresponds to a different setting of the parameters σ and n. Namely, for each of the two scenarios and for each σ ∈ {0.1, 0.35}, you are required to plot
• an estimate of the expected excess risk of the SGD learner, namely, $\mathbb{E}_{\hat{w}}\left[L(\hat{w}; \mathcal{D})\right] - \min_{w \in \mathcal{C}} L(w; \mathcal{D})$, where $\hat{w}$ is the output predictor of SGD given n training examples, and
• an estimate of the expected classification error of the SGD learner, namely, $\mathbb{E}_{\hat{w}}\left[\mathrm{err}(\hat{w}; \mathcal{D})\right]$,
versus the number of training examples, n (which equals the number of iterations of SGD), for n = 50, 100, 500, 1000. On your plots, you are also required to show the standard deviation of your estimates using error bars. That is, for each estimate of the expected excess risk (and each estimate of the expected classification error), you need to provide an estimate of $\sqrt{\mathrm{Var}_{\hat{w}}\left[L(\hat{w}; \mathcal{D})\right]}$ (and of $\sqrt{\mathrm{Var}_{\hat{w}}\left[\mathrm{err}(\hat{w}; \mathcal{D})\right]}$), shown as an error bar on your plots for the respective expected quantities. Refer to the procedure outlined below for obtaining these estimates.
Obtaining estimates of the expected performance of SGD:
• For each setting of n and σ, to obtain an estimate of the expected performance of the output predictor $\hat{w}$, you need to run SGD several times (say, 30 times).
• Each time, SGD is run on a fresh set of n training examples (that is, in total you need to generate 30n training examples).
• In each run, SGD outputs a (possibly different) predictor vector $\hat{w}$. For each output predictor $\hat{w}$, you need to evaluate the risk and classification error incurred by that predictor using the test set of N = 400 examples that is held out separately from the training set; that is, compute the average logistic loss and the average binary classification error incurred by $\hat{w}$ on those N test examples. (You do not need to generate a new test set every time you run SGD. There should be only one test set for each (scenario, σ) pair; i.e., you will need 4 test sets for all your experiments in this project.)
• Hence, for each value of n, σ, you end up with a set of 30 estimates of the risk (and another set of 30 estimates of the binary classification error) corresponding to 30 (possibly different) output predictors.
• For the 30 risk estimates: compute their mean (i.e., average), their minimum, and their standard deviation. Here, the standard deviation is the square root of the average squared deviation of the 30 estimates from their mean. Calculate the difference between the mean and the minimum; this is your estimate of the expected excess risk for the particular setting of n, σ being considered (i.e., a single point on the expected excess risk plot corresponding to a particular setting of σ). Use your estimate of the standard deviation to add an error bar to your plot.
• For the 30 binary classification error estimates: compute their mean and their standard deviation. The mean is your estimate of the expected binary classification error for the considered values of n, σ. Show your estimate of the standard deviation as an error bar on your plot.
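The procedure above can be sketched as a loop that draws a fresh training set for every run while reusing one fixed test set, followed by the statistics reported in one row of the results table. The callables `sample`, `train`, and `evaluate` are hypothetical placeholders (they could wrap data generation, an SGD run, and the test-set averages), and `ndarray.std()` uses ddof=0 by default, i.e., the square root of the average squared deviation.

```python
import numpy as np


def run_trials(sample, train, evaluate, n, test_set, n_trials=30):
    """Repeat training on fresh data; evaluate every output on the same test set.

    sample(n) -> (X, y): draws n fresh training examples (hypothetical helper)
    train(X, y) -> w_hat: runs SGD once (hypothetical helper)
    evaluate(w_hat, X_test, y_test) -> (risk, error) on the held-out test set
    """
    X_test, y_test = test_set
    risks, errors = np.empty(n_trials), np.empty(n_trials)
    for k in range(n_trials):
        X_train, y_train = sample(n)          # fresh training set for every run
        w_hat = train(X_train, y_train)
        risks[k], errors[k] = evaluate(w_hat, X_test, y_test)
    return risks, errors


def summarize(risks, errors):
    """Statistics reported in one row of the results table."""
    return {
        "loss_mean": risks.mean(),
        "loss_std": risks.std(),              # ddof=0: root-mean-square deviation
        "loss_min": risks.min(),
        "excess_risk": risks.mean() - risks.min(),
        "err_mean": errors.mean(),
        "err_std": errors.std(),
    }
```

The excess-risk entry uses the minimum over the 30 runs as a stand-in for $\min_{w \in \mathcal{C}} L(w; \mathcal{D})$, exactly as the procedure prescribes.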
Comment on your results. Explain whether or not they agree with the theoretical results we derived in class. Compare your results in Scenarios 1 and 2. Is there any difference? If so, can you justify it? For each scenario, compare your results for the two settings of σ (the standard deviation of the Gaussian distribution). Do you spot any difference? If so, can you justify it?
Project: Stochastic Gradient Descent
TODO: Add author(s)
1 Introduction
TODO: Describe the overall goals of this report
TODO: Outline your algorithm for stochastic gradient descent
2 Experiments
TODO: Describe the experiments that you ran, including the combinations of parameters that you
chose for SGD. Also, describe how you generated your training and test datasets in each scenario
including a clear description of how the projection step is done in each of the two
scenarios.
3 Analysis of ρ-Lipschitz properties
TODO: Answer the question about ρ-Lipschitz properties for each scenario
TODO: If you use non-constant learning rates, describe how you selected them (for each scenario)
4 Results
TODO: Fill in the following table of results. (Note: here, n is the training set size, and N is the test set size. Excess risk should be calculated as “mean − min”.)
Logistic loss Classification error
Scenario σ n N # trials Mean Std Dev Min Excess Risk Mean Std Dev
1 0.1 50 400 30 TODO TODO TODO TODO TODO TODO
1 0.1 100 400 30 TODO TODO TODO TODO TODO TODO
1 0.1 500 400 30 TODO TODO TODO TODO TODO TODO
1 0.1 1000 400 30 TODO TODO TODO TODO TODO TODO
1 0.35 50 400 30 TODO TODO TODO TODO TODO TODO
1 0.35 100 400 30 TODO TODO TODO TODO TODO TODO
1 0.35 500 400 30 TODO TODO TODO TODO TODO TODO
1 0.35 1000 400 30 TODO TODO TODO TODO TODO TODO
2 0.1 50 400 30 TODO TODO TODO TODO TODO TODO
2 0.1 100 400 30 TODO TODO TODO TODO TODO TODO
2 0.1 500 400 30 TODO TODO TODO TODO TODO TODO
2 0.1 1000 400 30 TODO TODO TODO TODO TODO TODO
2 0.35 50 400 30 TODO TODO TODO TODO TODO TODO
2 0.35 100 400 30 TODO TODO TODO TODO TODO TODO
2 0.35 500 400 30 TODO TODO TODO TODO TODO TODO
2 0.35 1000 400 30 TODO TODO TODO TODO TODO TODO
TODO: Include figures of results
5 Conclusion
TODO: Comment on your results. Explain whether or not they agree with the theoretical results we derived in class. Compare your results in Scenarios 1 and 2. Is there any difference? If so, can you justify it? For each scenario, compare your results for the two settings of σ (the standard deviation of the Gaussian distribution). Do you spot any difference? If so, can you justify it?
A Appendix: Symbol Listing
TODO: Provide a listing of the symbols used in your analysis, along with a brief description of
their meaning. If your code uses a different variable name corresponding to a particular symbol,
mention it here (for example, your analysis might use N to refer to the test-set size, while your
code uses testSetSize.)
B Appendix: Library Routines
TODO: If you used any packages or built-in libraries in your code (for example, to perform linear
algebra operations), please briefly describe the functions/subroutines you used here. This is merely
to help us understand your code, so you only need to describe library routines which are not trivial
and whose functionality is not obvious from the name.
C Appendix: Code
TODO: Include your code below. Please make sure your code is sufficiently well-documented for
us to understand what it is doing.
#### Insert your code here ####