Please, read the following instructions carefully
• This is a mini-project in which you are required to implement and test the SGD algorithm for logistic regression in different scenarios. Please, read all parts of this assignment carefully.
• Your submission should be in the form of a report that includes
– A brief introduction containing a description of the goals and a high-level description of your procedure (you may augment that with pseudo-code if it helps with clarity).
– A section on your experiments where you explain clearly and concisely how you devised the experiments (in the light of the guidelines and specifications provided in this assignment), and provide a description of each task involved, including the generation of training and test data, stating precisely the specific setting of each parameter to be used in any task (that is, do not leave any parameter unspecified).
– A section on the results of your experiments, where you state and
discuss your results, and include
all the relevant figures.
– A conclusion where you state the main findings.
– A symbol index listing all the symbols used for the parameters and
variables in your code and
what they denote.
– An appendix that contains a well-documented copy of your code.
• Please, follow the template given at the end of this document.
• You can use any language you prefer, e.g., MATLAB, Python, C, etc.
• If you are going to use a function from an existing package/library,
you must clearly describe (preferably
in a separate section) the inputs (including all the relevant
parameters) and outputs of such a function.
Also, whenever this function is invoked in your code, you must clearly
and explicitly show the setting
of each of the input parameters and the format of the output to be
expected. Please, do not copy
and paste the documentation found in the help folder of the package or
any vague description you
found online. You need to describe concisely only the relevant functionality and the relevant parameter settings.
• Building your own version of SGD is highly recommended.
• All submitted plots must clearly show/describe all the variables on
the axes, and you need to add
a caption to describe briefly the plots in each figure, and (if
applicable) the setting that each plot
represents.
• Use different line styles or marks for plots that are on the same
figure. It is recommended that you use
figure legends. If you will eventually print your final submission in
black-and-white, please, do not use
different colors to distinguish different plots on the same figure.
Instead, use different line styles or
different marks on your plots.
Stochastic Gradient Descent for Logistic Regression
Recall the logistic loss function we defined in class:
$\ell_{\mathrm{logist}}(w,(x,y)) = \ln\left(1 + \exp\left(-y\langle w, \tilde{x}\rangle\right)\right)$
where $x \in \mathcal{X} \subset \mathbb{R}^{d-1}$, $\tilde{x} \triangleq (x,1)$, $y \in \{-1,+1\}$, and $w \in \mathcal{C} \subset \mathbb{R}^{d}$. We will consider two scenarios, each with a different setting for $\mathcal{X}$ and $\mathcal{C}$. In both scenarios, the dimensionality parameter is $d = 5$.
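For reference when implementing the SGD update (this is a standard computation, not part of the assignment statement), the gradient of the logistic loss above with respect to $w$ is
\[
\nabla_w\, \ell_{\mathrm{logist}}(w,(x,y)) \;=\; \frac{-y\,\tilde{x}\,\exp(-y\langle w,\tilde{x}\rangle)}{1+\exp(-y\langle w,\tilde{x}\rangle)} \;=\; \frac{-y\,\tilde{x}}{1+\exp(y\langle w,\tilde{x}\rangle)}.
\]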
Scenario 1
• The domain set $\mathcal{X} = [-1,1]^{d-1}$, i.e., $\mathcal{X}$ is the 4-dimensional hypercube with edge length 2 and centered around the origin.
• The parameter set $\mathcal{C} = [-1,1]^{d}$, i.e., $\mathcal{C}$ is the 5-dimensional hypercube with edge length 2 and centered around the origin.
Scenario 2
• The domain set $\mathcal{X} = \{x \in \mathbb{R}^{d-1} : \|x\| \leq 1\}$, i.e., $\mathcal{X}$ is the 4-dimensional unit ball centered around the origin.
• The parameter set $\mathcal{C} = \{w \in \mathbb{R}^{d} : \|w\| \leq 1\}$, i.e., $\mathcal{C}$ is the 5-dimensional unit ball centered around the origin.
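As a rough illustration (not part of the assignment statement), the Euclidean projections onto these two kinds of sets could be implemented along the following lines, assuming NumPy; the function names project_box and project_ball are placeholders:

import numpy as np

def project_box(v, lo=-1.0, hi=1.0):
    # Euclidean projection onto the hypercube [lo, hi]^k: clip each coordinate.
    return np.clip(v, lo, hi)

def project_ball(v, radius=1.0):
    # Euclidean projection onto the ball of the given radius: rescale v if it lies outside.
    norm = np.linalg.norm(v)
    return v if norm <= radius else (radius / norm) * v

The same two routines can serve both as the projection onto $\mathcal{X}$ during data generation and as the projection onto $\mathcal{C}$ inside the SGD update, applied in the appropriate dimension.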
For each scenario, show that there is a constant ρ such that for all $z \in \mathcal{X} \times \{-1,+1\}$, $\ell_{\mathrm{logist}}(\cdot, z)$ is convex and ρ-Lipschitz over $\mathcal{C}$, and that $\mathcal{C}$ is M-bounded for some M > 0. Specify ρ and M for each of the two scenarios. (Note that the values of ρ and M may not be the same for the two scenarios.)
Data Distribution $\mathcal{D}$: In practice, the data distribution is usually
unknown. However, since you will be
asked to generate training and test examples for the sake of running
your experiments, we will describe a
data distribution from which examples will be generated for each
scenario. (Nevertheless, note that the SGD
learner should remain oblivious to the distribution). Each example (x,
y) is generated as follows.
• with probability 1/2, set y = −1 and generate a (d−1)-dimensional Gaussian vector $u \sim \mathcal{N}(\mu_0, \sigma^2 I_{d-1})$, where $\mu_0 = (-1/4, -1/4, -1/4, -1/4)$ and $I_{d-1}$ is the identity matrix of rank d−1; that is, u is composed of 4 i.i.d. Gaussian components, each of mean −1/4 and variance $\sigma^2$ (σ will be specified later).
• with the remaining probability, set y = +1 and generate $u \sim \mathcal{N}(\mu_1, \sigma^2 I_{d-1})$, where $\mu_1 = (1/4, 1/4, 1/4, 1/4)$.
Then, set $x = \Pi_{\mathcal{X}}(u)$, where $\Pi_{\mathcal{X}}$ is the Euclidean projection onto $\mathcal{X}$; that is, the u generated above is projected onto $\mathcal{X}$ (in case it lies outside $\mathcal{X}$) and the resulting vector is x.
Note that the procedure above will be used in both scenarios to generate examples for training and testing; however, since $\mathcal{X}$ is different in the two scenarios, the projection step described above will be different.
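A minimal sketch of this generation procedure (assuming NumPy, d = 5, and a placeholder function project_onto_X implementing the scenario-specific projection described above) might look like:

import numpy as np

def generate_example(sigma, project_onto_X, rng, d=5):
    # Choose the label with probability 1/2 each; the Gaussian mean depends on the label.
    if rng.random() < 0.5:
        y, mu = -1, -0.25 * np.ones(d - 1)
    else:
        y, mu = +1, 0.25 * np.ones(d - 1)
    # u has d-1 i.i.d. Gaussian components with the chosen mean and variance sigma^2.
    u = rng.normal(loc=mu, scale=sigma, size=d - 1)
    # Project onto the domain set X of the current scenario (box in Scenario 1, ball in Scenario 2).
    return project_onto_X(u), y

Here rng stands for a NumPy random generator, e.g., np.random.default_rng(seed).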
Let n denote the number of training examples (that will be used by the
SGD learner to output a predictor),
and let N denote the number of test examples that will be used to
evaluate the performance of the output
predictor on fresh examples. The number of test examples, N, will be
fixed in all experiments to
400 examples.
Experiments
Let $L(w; \mathcal{D}) \triangleq \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\ell_{\mathrm{logist}}(w,(x,y))\right]$ denote the risk incurred by a predictor $w \in \mathcal{C}$ under the logistic loss w.r.t. the distribution $\mathcal{D}$. Let $\mathrm{err}(w; \mathcal{D}) \triangleq \mathbb{E}_{(x,y)\sim\mathcal{D}}\left[\mathbb{1}\left(\mathrm{sign}(\langle w,(x,1)\rangle) \neq y\right)\right]$ denote the binary classification error (the risk under the 0-1 loss) incurred by w w.r.t. the distribution $\mathcal{D}$.
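In code, the corresponding empirical estimates on a finite set of examples can be computed along the following lines (a sketch assuming NumPy, a feature matrix X of shape (m, d-1), and labels y in {-1, +1}; the function names are illustrative only):

import numpy as np

def empirical_logistic_risk(w, X, y):
    # Average logistic loss of predictor w over the m examples in (X, y).
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])  # append the constant feature 1
    margins = y * (X_tilde @ w)                          # y * <w, x~> for each example
    return np.mean(np.logaddexp(0.0, -margins))          # numerically stable ln(1 + exp(-margin))

def empirical_classification_error(w, X, y):
    # Fraction of examples on which sign(<w, x~>) disagrees with the label y.
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.mean(np.sign(X_tilde @ w) != y)            # a zero inner product counts as an error

These are exactly the quantities to be reported on the N test examples for each output predictor.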
For each scenario above, it is required to conduct a set of experiments on the performance of the SGD learner, where each experiment represents a different setting of the parameters σ and n. Namely, for each of the two scenarios and for each σ ∈ {0.1, 0.35}, it is required to plot
• an estimate of the expected excess risk of the SGD learner, namely,
\[
\mathbb{E}_{\hat{w}}\left[L(\hat{w}; \mathcal{D})\right] - \min_{w \in \mathcal{C}} L(w; \mathcal{D}),
\]
where $\hat{w}$ is the output predictor of the SGD given n training examples,
• an estimate of the expected classification error of the SGD learner, namely, $\mathbb{E}_{\hat{w}}\left[\mathrm{err}(\hat{w}; \mathcal{D})\right]$,
versus the number of training examples, n (which is equal to the number of iterations of the SGD), for n = 50, 100, 500, 1000. On your plots, it is also required to show, using error bars, the standard deviation of your estimates. That is, for each estimate of the expected excess risk (and each estimate of the expected classification error), you need to provide an estimate of $\sqrt{\mathrm{Var}_{\hat{w}}\left[L(\hat{w}; \mathcal{D})\right]}$ (respectively, $\sqrt{\mathrm{Var}_{\hat{w}}\left[\mathrm{err}(\hat{w}; \mathcal{D})\right]}$), shown as an error bar on your plots for the respective expected quantities. Refer to the procedure outlined below for obtaining these estimates.
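One possible shape of the projected-SGD learner itself is sketched below (assuming NumPy; this is only an illustration, not a prescribed implementation: the step size eta is shown as a constant, and whether you average the iterates, use the last iterate, or use a non-constant learning rate should follow the analysis from class; project_onto_C is a placeholder for the projection onto $\mathcal{C}$):

import numpy as np

def logistic_grad(w, x, y):
    # Gradient of the logistic loss at w for a single example (x, y), with x~ = (x, 1).
    x_tilde = np.append(x, 1.0)
    return -y * x_tilde / (1.0 + np.exp(y * np.dot(w, x_tilde)))

def sgd_logistic(examples, project_onto_C, eta, d=5):
    # One pass of projected SGD over the n training examples; returns the averaged iterate.
    w = np.zeros(d)
    iterates = []
    for x, y in examples:
        iterates.append(w)                                    # keep w_t before the update
        w = project_onto_C(w - eta * logistic_grad(w, x, y))  # gradient step, then project onto C
    return np.mean(iterates, axis=0)                          # average of w_1, ..., w_n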
Obtaining estimates of the expected performance of SGD:
• For each setting of n and σ, in order to obtain an estimate of the expected performance of the output predictor $\hat{w}$, you need to run the SGD several times (say, 30 times).
• Each time, the SGD is run on a fresh set of n training examples (that is, in total you need to generate 30n training examples).
• In each run, the SGD outputs a (possibly different) predictor vector $\hat{w}$. For each output predictor $\hat{w}$, you need to evaluate the risk and classification error incurred by that predictor using the test set of N = 400 examples that is held out separately from the training set; that is, compute the average logistic loss and the average binary classification error incurred by $\hat{w}$ on those N test examples. (You do not need to generate a new test set every time you run the SGD. There should be only one test set for each (scenario, σ) pair; i.e., you will need 4 test sets for all your experiments in this project.)
• Hence, for each value of n, σ, you end up with a set of 30 estimates
for the risk (and another set of 30
estimates for the binary classification error) corresponding to 30
(possibly different) output predictors.
• For the 30 risk estimates: compute their mean (i.e., average), their minimum, and their standard deviation (i.e., the spread of the 30 estimates around their mean). Calculate the difference between the mean and the minimum; that is your estimate of the expected excess risk for the particular setting of n and σ being considered (i.e., this is a single point on the expected excess-risk plot corresponding to a particular setting of σ). Use your estimate of the standard deviation to add an error bar on your plot.
• For the 30 binary classification error estimates: Compute their mean
and their standard deviation. The mean represents your estimate for the
expected binary classification error for the considered
values of n, σ. Show your estimate for the standard deviation as an
error bar on your plot.
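Putting the above steps together, the outer estimation loop for one (scenario, σ, n) setting might be organized roughly as follows (a sketch only; run_sgd, generate_training_set, eval_risk, and eval_error stand for whatever routines you implemented for the corresponding steps above):

import numpy as np

def estimate_performance(n, sigma, X_test, y_test,
                         run_sgd, generate_training_set, eval_risk, eval_error,
                         num_runs=30):
    # Run SGD num_runs times on fresh training sets and evaluate every output predictor
    # on the single held-out test set for this (scenario, sigma) pair.
    risks, errors = [], []
    for _ in range(num_runs):
        X_train, y_train = generate_training_set(n, sigma)
        w_hat = run_sgd(X_train, y_train)
        risks.append(eval_risk(w_hat, X_test, y_test))
        errors.append(eval_error(w_hat, X_test, y_test))
    risks, errors = np.array(risks), np.array(errors)
    return {
        "excess_risk_estimate": risks.mean() - risks.min(),  # "mean - min" excess-risk estimate
        "risk_std": risks.std(),                             # error bar for the excess-risk plot
        "error_mean": errors.mean(),                         # expected classification error estimate
        "error_std": errors.std(),                           # error bar for the error plot
    }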
Comment on your results. Explain whether or not they agree with the
theoretical results we
derived in class. Compare your results in Scenarios 1 and 2. Is there
any difference? If so,
can you justify it? For each scenario, compare between your results for
each setting of σ (the
standard deviation of the Gaussian distribution). Do you spot any
difference? If so, can you
justify it?
Project: Stochastic Gradient Descent
TODO: Add author(s)
1 Introduction
TODO: Describe the overall goals of this report
TODO: Outline your algorithm for stochastic gradient descent
2 Experiments
TODO: Describe the experiments that you ran, including the combinations
of parameters that you
chose for SGD. Also, describe how you generated your training and test
datasets in each scenario
including a clear description of how the projection step is done in each
of the two
scenarios.
3 Analysis of ρ-Lipschitz properties
TODO: Answer the question about ρ-Lipschitz properties for each scenario
TODO: If you use non-constant learning rates, describe how you selected
them (for each scenario)
4 Results
TODO: Fill in the following table of results. (Note: here, n is the
training set size, and N is the
test set size. Excess risk should be calculated as “mean − min”.)
                                      Logistic loss                        Classification error
Scenario  σ     n     N    # trials   Mean   Std Dev  Min    Excess Risk   Mean   Std Dev
1         0.1   50    400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.1   100   400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.1   500   400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.1   1000  400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.35  50    400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.35  100   400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.35  500   400  30         TODO   TODO     TODO   TODO          TODO   TODO
1         0.35  1000  400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.1   50    400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.1   100   400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.1   500   400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.1   1000  400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.35  50    400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.35  100   400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.35  500   400  30         TODO   TODO     TODO   TODO          TODO   TODO
2         0.35  1000  400  30         TODO   TODO     TODO   TODO          TODO   TODO
TODO: Include figures of results
5 Conclusion
TODO: Comment on your results. Explain whether or not they agree with
the theoretical results
we derived in class. Compare your results in Scenarios 1 and 2. Is there
any difference? If so, can
you justify it? For each scenario, compare your results for each setting of σ (the standard
deviation of the Gaussian distribution). Do you spot any difference? If
so, can you justify it?
A Appendix: Symbol Listing
TODO: Provide a listing of the symbols used in your analysis, along with
a brief description of
their meaning. If your code uses a different variable name corresponding
to a particular symbol,
mention it here (for example, your analysis might use N to refer to the
test-set size, while your
code uses testSetSize.)
B Appendix: Library Routines
TODO: If you used any packages or built-in libraries in your code (for
example, to perform linear
algebra operations), please briefly describe the functions/subroutines
you used here. This is merely
to help us understand your code, so you only need to describe library
routines which are not trivial
and whose functionality is not obvious from the name.
C Appendix: Code
TODO: Include your code below. Please make sure your code is
sufficiently well-documented for
us to understand what it is doing.
#### Insert your code here ####