R Assignment Writing - FIT5201 - Assignment 1 | 学霸联盟

Date: 2021-04-26

Assignment 1
FIT5201: Machine Learning
Please note that:
1. A delay of even 1 second will be penalised as a 1-day delay. Please submit your
assignment in advance (allowing for possible internet delays) and
do not wait until the last minute.
2. We will not accept resubmissions. Please double-check
your assignment before submitting.
Objectives. This assignment assesses your understanding of model complexity, model selection, uncertainty in prediction with bootstrapping, probabilistic machine learning, and linear models for regression and classification, covered in Modules 1, 2, and 3. The total mark for this assignment is 150. This assignment constitutes 25% of your final mark for this unit.
Section A. Model Complexity and Model Selection
In this section, you study the effect of model complexity on the training and testing errors. You also demonstrate your programming skills by developing a regression algorithm and a cross-validation technique that will be used to select the model with the most effective complexity.
Background. A KNN regressor is similar to a KNN classifier (covered in Activity 1.1) in that it finds the K nearest neighbors and estimates the value of the given test point based on the values of its neighbours. The main difference between KNN regression and KNN classification is that a KNN classifier returns the label that has the majority vote in the neighborhood, whilst a KNN regressor returns the average of the neighbors’ values. In Activity 1 of Module 1, we used the number of misclassifications as the measure of training and testing error for the KNN classifier. For the KNN regressor, you need to choose another error function as the measure of training and testing error.
Question 1 [KNN Regressor, 20 Marks]
I. [5 marks] Implement the KNN regressor function:
knn(train.data, train.label, test.data, K=3)
which takes the training data and their labels (continuous values), the test set, and the size of the neighborhood (K). It should return the regressed values for the test data points. Note that you need to use a distance function to choose the neighbors. The distance function used to measure the distance between a pair of data points is the Manhattan distance function.
Hint: You are allowed to use KNN classifier code from Activity 1 of Module 1.
II. [5 marks] Plot the training and the testing errors versus 1/K for K=1,..,35 in one plot, using the Task1A_train.csv and Task1A_test.csv datasets provided for this assignment. Save the plot in your Jupyter Notebook file for Question 1. Report your chosen error function in your Jupyter Notebook file.
III. [10 marks] Report (in your Jupyter Notebook file) the optimum value for K in terms of the testing error. Discuss the values of K and model complexity corresponding to underfitting and overfitting based on your plot in the previous part (Part II).
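As a rough illustration of the regressor described in Part I, the sketch below averages the labels of the K training points closest in Manhattan distance. It is only one possible shape for knn(), not the reference solution; the toy vectors train.x, train.y and the two test points are made up for the example.

```r
# Minimal KNN regressor sketch using Manhattan (L1) distance.
knn <- function(train.data, train.label, test.data, K = 3) {
  train.data <- as.matrix(train.data)
  test.data <- as.matrix(test.data)
  sapply(seq_len(nrow(test.data)), function(i) {
    # L1 distance from test point i to every training point
    d <- rowSums(abs(sweep(train.data, 2, test.data[i, ], "-")))
    # average the labels of the K closest training points
    mean(train.label[order(d)[1:K]])
  })
}

# Toy data (hypothetical, not the Task1A files)
train.x <- matrix(c(1, 2, 3, 10), ncol = 1)
train.y <- c(1, 2, 3, 10)
pred <- knn(train.x, train.y, matrix(c(2, 9), ncol = 1), K = 3)
```

With a squared-error style measure such as sqrt(mean((pred - truth)^2)), the same function can be evaluated on both the training and the test set for each K.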
Question 2 [Leave-One-Out Cross-Validation, 15 Marks]
I. [5 marks] A special case of L-Fold cross-validation is Leave-One-Out cross-validation, where L (i.e., the number of folds/subsets) is equal to the size of the training dataset. In each iteration, one training data point is used as the validation set. Implement a Leave-One-Out cross-validation (CV) function for your KNN regressor:
cv(train.data, train.label, K)
which takes the training data and their labels (continuous values), the K value (the number of neighbors), the number of folds, and returns errors for different folds of the training data.
II. [8 marks] Using the training data in Question 1, run your Leave-One-Out CV. Change the value of K=1,..,20 in your KNN regressor, and for each K compute the average of the error values you obtained across the folds. Plot the average of the error values versus 1/K for K=1,..,20 in your KNN regressor. Save the plot in your Jupyter Notebook file for Question 2.
III. [2 marks] Report (in your Jupyter Notebook file) the optimum value for K based on your plot for this Leave-One-Out cross-validation in the previous part (Part II).
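The leave-one-out procedure in Part I can be sketched as follows. This is an illustration under assumptions: the per-fold error is taken to be the squared error (the assignment leaves the error function to you), the compact knn() is repeated only so the block runs on its own, and the toy x and y are invented.

```r
# Compact Manhattan-distance KNN regressor, repeated so this block runs alone.
knn <- function(train.data, train.label, test.data, K = 3) {
  train.data <- as.matrix(train.data)
  test.data <- as.matrix(test.data)
  sapply(seq_len(nrow(test.data)), function(i) {
    d <- rowSums(abs(sweep(train.data, 2, test.data[i, ], "-")))
    mean(train.label[order(d)[1:K]])
  })
}

# Leave-one-out CV: each point is held out once as the validation set.
# Returns one squared error per fold (an assumed choice of error function).
cv <- function(train.data, train.label, K) {
  train.data <- as.matrix(train.data)
  sapply(seq_len(nrow(train.data)), function(i) {
    pred <- knn(train.data[-i, , drop = FALSE], train.label[-i],
                train.data[i, , drop = FALSE], K)
    (pred - train.label[i])^2
  })
}

# Toy data (hypothetical)
x <- matrix(c(1, 2, 3, 4), ncol = 1)
y <- c(1, 2, 3, 4)
fold.errors <- cv(x, y, K = 1)
avg.error <- mean(fold.errors)
```

Repeating the call for each K and plotting the mean of fold.errors against 1/K gives the curve requested in Part II.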
Section B. Prediction Uncertainty with Bootstrapping
This section is the adaptation of Activity 1.2 from KNN classification to KNN regression. You use the bootstrapping technique to quantify the uncertainty of predictions for the KNN regressor that you implemented in Section A.
Background. Please refer to the background in Section A.
Question 3 [Bootstrapping, 25 Marks]
I. [5 marks] Modify the code in Activity 1.2 to handle bootstrapping for KNN regression.
II. [5 marks] Load Task1B_train.csv and Task1B_test.csv sets. Apply your bootstrapping for KNN regression with times = 30 (the number of subsets), size = 60 (the size of each subset), and change K=1,..,20 (the neighbourhood size). Now create a boxplot where the x-axis is K, and the y-axis is the average test error (and the uncertainty around it) corresponding to each K. Save the plot in your Jupyter Notebook file for Question 3.
Hint: You can refer to the boxplot in Activity 1.2 of Module 1. However, the error is measured differently from the KNN classifier.
III. [5 marks] Based on the plot in the previous part (Part II), how do the test error and its uncertainty behave as K increases? Explain in your Jupyter Notebook file.
IV. [5 marks] Load Task1B_train.csv and Task1B_test.csv sets. Apply your bootstrapping for KNN regression with K=5 (the neighbourhood size), times = 50 (the number of subsets), and change sizes = 5, 10, 15,..., 75 (the size of each subset). Now create a boxplot where the x-axis is ‘sizes’, and the y-axis is the average test error (and the uncertainty around it) corresponding to each value of ‘sizes’. Save the plot in your Jupyter Notebook file for Question 3.
V. [5 marks] Based on the plot in the previous part (Part IV), how do the test error and its uncertainty behave as the size of each subset in bootstrapping increases? Explain in your Jupyter Notebook file.
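The bootstrapping loop used in this question might look roughly like the sketch below. The helper boot() is hypothetical and may not match the code in Activity 1.2; the error measure (root mean squared error) and the synthetic data are assumptions made only for illustration.

```r
# Compact Manhattan-distance KNN regressor, repeated so this block runs alone.
knn <- function(train.data, train.label, test.data, K = 3) {
  train.data <- as.matrix(train.data)
  test.data <- as.matrix(test.data)
  sapply(seq_len(nrow(test.data)), function(i) {
    d <- rowSums(abs(sweep(train.data, 2, test.data[i, ], "-")))
    mean(train.label[order(d)[1:K]])
  })
}

# Hypothetical helper: one row of indices (sampled with replacement) per subset.
boot <- function(original.size, sample.size, times) {
  indx <- matrix(nrow = times, ncol = sample.size)
  for (b in seq_len(times)) {
    indx[b, ] <- sample(original.size, sample.size, replace = TRUE)
  }
  indx
}

# Synthetic data standing in for the Task1B files
set.seed(1)
train.x <- matrix(runif(40), ncol = 1)
train.y <- 2 * train.x[, 1] + rnorm(40, sd = 0.1)
test.x <- matrix(runif(20), ncol = 1)
test.y <- 2 * test.x[, 1]

# One test RMSE per bootstrap subset, for a single K
idx <- boot(nrow(train.x), sample.size = 30, times = 10)
errors <- apply(idx, 1, function(rows) {
  pred <- knn(train.x[rows, , drop = FALSE], train.y[rows], test.x, K = 5)
  sqrt(mean((pred - test.y)^2))
})
```

Collecting one such errors vector per K (or per subset size) and passing them to boxplot() shows both the typical test error and its spread.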
Section C. Probabilistic Machine Learning
In this section, you show your knowledge of the foundations of probabilistic machine learning (i.e. probabilistic inference and modeling) by solving a simple but fundamental statistical inference problem. Solve the following problem based on the probability concepts you have learned in Module 1, using the same mathematical conventions.
Question 4 [Bayes Rule, 20 Marks]
Recall the simple example from Appendix A of Module 1. Suppose we have one red, one blue, and one yellow box. In the red box we have 3 apples and 1 orange, in the blue box we have 4 apples and 4 oranges, and in the yellow box we have 5 apples and 3 oranges. Now suppose we randomly select one of the boxes and pick a fruit. If the picked fruit is an orange, what is the probability that it was picked from the yellow box? Note that the chances of picking the red, blue, and yellow boxes are 50%, 30%, and 20% respectively, and every piece of fruit in a box is equally likely to be picked. Please show your work in your PDF report.
Hint: You can formulate this problem following the notation in the “Random Variable” paragraph in Appendix A of Module 1.
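One way to set the problem up is sketched below. The symbols B (the selected box) and F (the picked fruit) are assumed here and may differ from the notation in Appendix A.

```latex
P(B = \text{yellow} \mid F = \text{orange})
  = \frac{P(F = \text{orange} \mid B = \text{yellow})\, P(B = \text{yellow})}
         {\sum_{b \in \{\text{red},\, \text{blue},\, \text{yellow}\}}
            P(F = \text{orange} \mid B = b)\, P(B = b)}
```

Here P(B = b) is the box-selection chance stated in the question, and P(F = orange | B = b) is the fraction of oranges in box b.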
Section D. Ridge Regression
In this section, you develop Ridge Regression by adding L2 norm regularization to linear regression (covered in Activity 2.1 of Module 2) and study the effect of L2 norm regularization on the training and testing errors. This section assesses your mathematical (derivation), programming, and analytical skills.
Question 5 [Ridge Regression, 25 Marks]
I. [10 marks] Given the gradient descent algorithms for linear regression (discussed in Chapter 2 of Module 2), derive the weight update steps of stochastic gradient descent (SGD) for linear regression with L2 norm regularisation. Show your work with enough explanation in your PDF report; you should provide the steps of SGD.
Hint: Recall that for linear regression we defined the error function E. For this assignment, you only need to add an L2 regularization term to the error function (error term plus the regularization term). This question is similar to Activity 2.1 of Module 2.
II. [5 marks] Using R (with no use of special libraries), implement the algorithm that you derived in Part I. The implementation is straightforward as you are allowed to use the code examples provided.
III. Now let’s study the effect of the L2 norm regularization on the training and testing errors:
a. Load Task1C_train.csv and Task1C_test.csv sets.
b. [5 marks] For each lambda (the regularization parameter) in {0, 0.5, 1.0, …, 10}, build a regression model and compute the training and testing errors, using the provided data sets. While building each model, all parameter settings (initial values, learning rate, etc.) are exactly the same, except for the lambda value. Set the termination criterion as a maximum of 20 x N weight updates (where N is the number of training data points). Create a plot of error rates (use different colors for the training and testing errors), where the x-axis is log lambda and the y-axis is the error rate. Save your plot in your Jupyter Notebook file for Question 5.
c. [5 marks] Based on your plot in the previous part (Part b), what’s the best value for lambda? Discuss lambda, model complexity, and error rates, corresponding to underfitting and overfitting, by observing your plot. (Include all your answers in your Jupyter Notebook file.)
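For orientation, the regularised error function mentioned in the hint to Part I, and the general shape of a per-sample update that follows from it, can be written as below. This is a sketch under assumed conventions: the factors of 1/2, the basis function symbol \phi, and applying the full \lambda w at every single-sample step (rather than \lambda / N) are choices that may differ from the Module 2 notes.

```latex
E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \bigl( t_n - \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n) \bigr)^2
              + \frac{\lambda}{2} \lVert \mathbf{w} \rVert_2^2

\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)}
  + \eta \Bigl[ \bigl( t_n - \mathbf{w}^{(\tau)\top} \boldsymbol{\phi}(\mathbf{x}_n) \bigr) \boldsymbol{\phi}(\mathbf{x}_n)
  - \lambda\, \mathbf{w}^{(\tau)} \Bigr]
```

Setting \lambda = 0 recovers the plain SGD update for linear regression, which is a quick sanity check on any derivation.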
Section E. Multiclass Perceptron
In this section, you are asked to demonstrate your understanding of linear models for classification. You extend the binary perceptron algorithm covered in Activity 3.1 of Module 3 into a multiclass classifier. Then, you study the effect of the learning rate on the error rate. This section assesses your programming and analytical skills.
Background. Assume we have N training examples {(x1,t1),...,(xN,tN)} where tn can take K discrete values {C1, ..., CK}, i.e. a K-class classification problem. We use y to represent the predicted label of xn.
Model. To solve a K-class classification problem, we can learn K weight vectors wk, each corresponding to one of the classes.
Prediction. At prediction time, a data point x is classified as argmaxk wk . x
Training Algorithm. We train the multiclass perceptron based on the following algorithm:
• Initialise the weight vectors w1,..,wK randomly
• While not converged do:
  o For n = 1 to N do:
    ▪ y = argmaxk wk . xn
    ▪ If y != tn do
      • wy := wy - η xn
      • wtn := wtn + η xn
In what follows, we look into the convergence properties of the training algorithm for multiclass perceptron (similar to Activity 3.1 of Module 3).
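The training loop above can be sketched in R as follows. This is an illustrative outline rather than the expected submission: the function names train.perceptron and perceptron.predict, the bias column, the max.epochs cap, the "no mistakes in a full pass" convergence test, and the toy data are all assumptions.

```r
# Multiclass perceptron sketch: one weight vector (row of W) per class.
train.perceptron <- function(X, labels, eta = 0.1, max.epochs = 1000) {
  classes <- sort(unique(labels))
  Phi <- cbind(1, as.matrix(X))  # prepend a bias term (an assumption)
  # small random initial weights, as in the first step of the algorithm
  W <- matrix(rnorm(length(classes) * ncol(Phi), sd = 0.01),
              nrow = length(classes))
  for (epoch in seq_len(max.epochs)) {
    mistakes <- 0
    for (n in seq_len(nrow(Phi))) {
      y <- which.max(W %*% Phi[n, ])        # predicted class index
      target <- match(labels[n], classes)   # true class index
      if (y != target) {
        W[y, ] <- W[y, ] - eta * Phi[n, ]            # push the wrong class away
        W[target, ] <- W[target, ] + eta * Phi[n, ]  # pull the true class closer
        mistakes <- mistakes + 1
      }
    }
    if (mistakes == 0) break  # treated as converged: a full pass with no updates
  }
  list(W = W, classes = classes)
}

# Classify each row of X as the class with the largest score w_k . x
perceptron.predict <- function(model, X) {
  Phi <- cbind(1, as.matrix(X))
  model$classes[apply(Phi %*% t(model$W), 1, which.max)]
}

# Toy, linearly separable data (hypothetical, not the Task1D files)
X <- rbind(c(0, 0), c(0, 1), c(5, 0), c(5, 1), c(0, 5), c(1, 5))
labels <- c("a", "a", "b", "b", "c", "c")
set.seed(1)
model <- train.perceptron(X, labels, eta = 0.1)
train.error <- mean(perceptron.predict(model, X) != labels)
```

To trace test error during training, the same prediction function could be called on the test set inside the inner loop after every few processed points.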
Question 6 [Multiclass Perceptron, 20 Marks]
I. Load Task1D_train.csv and Task1D_test.csv sets.
II. [10 marks] Implement the multiclass perceptron as explained above. Please provide enough comments for your code in your submission.
III. [10 marks] Train two multiclass perceptron models on the provided training data by setting the learning rates η to .1 and .01 respectively. Note that all parameter settings stay the same, except the learning rate, when building each model. For each model, evaluate the error of the model on the test data after processing every 5 training data points (also known as a mini-batch). Then, plot the testing errors of the two models built with learning rates .1 and .01 (with different colors) versus the number of mini-batches. Include the plot in your Jupyter Notebook file for Question 6. Now, explain how the testing errors of the two models behave differently as the amount of training data increases, by observing your plot. (Include all your answers in your Jupyter Notebook file.)
Section F. Logistic Regression vs. Bayesian Classifier
This task assesses your analytical skills. You need to study the performance of two well-known generative and discriminative models, i.e. Bayesian classifier and logistic regression, as the size of the training set increases. Then, you show your understanding of the behavior of learning curves of typical generative and discriminative models.
Question 7 [Discriminative vs Generative Models, 25 Marks]
I. Load Task1E_train.csv and Task1E_test.csv as well as the Bayesian classifier (BC) and logistic regression (LR) codes from Activities 3.2 and 3.3 in Module 3.
II. [10 marks] Using the first 5 data points from the training set, train a BC and an LR model, and compute their training and testing errors. In a “for loop”, increase the size of the training set (5 data points at a time), retrain the models, and calculate their training and testing errors until all training data points are used. In one figure, plot the training errors of the BC and LR models (with different colors) versus the size of the training set, and in the other figure, plot the testing errors of the BC and LR models (with different colors) versus the size of the training set; include the two plots in your Jupyter Notebook file for Question 7.
III. Explain your observations in your Jupyter Notebook file:
a. [5 marks] What happens to each classifier when the number of training data points is increased?
b. [5 marks] Which classifier is best suited when the training set is small, and which is best suited when the training set is big?
c. [5 marks] Justify your observations in the previous questions (III.a & III.b) by providing some speculations and possible reasons.
Hint: Think about model complexity and the fundamental concepts of machine learning covered in Module 1.
Submission & Due Date:
The files that you need to submit are:
1. Jupyter Notebook files containing the code and your answers for questions {1,2,3,5,6,7} with the extension “.ipynb”. The file names should be in the following format: STUDENTID_assessment_1_qX.ipynb where ‘X=1,2,3,5,6,7’ is the question number. For example, the Notebook for Question 2 should be named STUDENTID_assessment_1_q2.ipynb
2. You must add enough comments to your code to make it readable and understandable by the tutor. Furthermore, you may be asked to meet (online) with your tutor when your assessment is marked to complete your interview.
3. A PDF file that contains your report. The file name should be in the following format: STUDENTID_assessment_1_report.pdf You should replace STUDENTID with your own student ID. All files must be submitted via Moodle before the due date and time.
4. Zip all of your files and submit the archive via Moodle. The name of your file must be in the following format: STUDENTID_FirstName_LastName_assessment_1_report.pdf where, in addition to your student ID, you need to use your first name and last name as well.
Assessment Criteria: The following outlines the criteria which you will be assessed against:
• Ability to understand the fundamentals of machine learning and linear models.
• Working code: The code executes without errors and produces correct results.
• Quality of report: Your report should show your understanding of the fundamentals of machine learning and linear models by answering the questions in this assessment and attaching the required figures.
Penalties:
• Late submission: students who submit an assessment task after the due date will receive a late penalty of 10% of the available marks in that task per calendar day. Assessments submitted more than 7 calendar days after the due date will receive a mark of zero (0) for that assessment task.
• Jupyter Notebook file is not properly named (-5%)
• The report PDF file is not properly named (-5%)
