
COMP9417 - Machine Learning

Homework 1: Gradient Descent & Friends

Introduction In this homework, you will be required to manually implement (Stochastic) Gradient Descent

in Python to learn the parameters of a linear regression model. You will make use of the publicly

available ‘real estate.csv’ dataset, which you can download directly from the course Moodle page. The dataset

contains 414 real estate records, each of which contains the following features:

• transactiondate: date of transaction

• age: age of property

• nearestMRT: distance of property to the nearest MRT station

• nConvenience: number of convenience stores in nearby locations

• latitude

• longitude

The target variable is the property price. The goal is to learn to predict property prices as a function of a

subset of the above features.

Question 1. (Pre-processing)

A number of pre-processing steps are required before we attempt to fit any models to the data.

(a) Remove any rows of the data that contain a missing (‘NA’) value. List the indices of the removed

data points. Then, delete all features from the dataset apart from: age, nearestMRT and

nConvenience.

(b) An important pre-processing step is feature normalisation: rescaling the features so that they

all have similar scales. This is also important for algorithms like gradient

descent to ensure the convergence of the algorithm. One common technique is called min-max

normalisation, in which each feature is scaled to the range [0, 1]. To do this, for each feature we

must find the minimum and maximum values in the available sample, and then use the following

formula to make the transformation:

x_new = (x − min(x)) / (max(x) − min(x)).

After applying this normalisation, the minimum value of your feature will be 0, and the maximum

will be 1. For each of the features, provide the mean value over your dataset.
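The min-max transformation above can be sketched in NumPy; the helper name and the toy values below are our own illustration, not the real dataset:

```python
import numpy as np

def min_max_normalise(X):
    """Column-wise min-max scaling to [0, 1]: (x - min(x)) / (max(x) - min(x))."""
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    return (X - mins) / (maxs - mins)

# Toy illustration with hypothetical values (two features, three rows):
X = np.array([[0.0, 1.0],
              [10.0, 3.0],
              [20.0, 5.0]])
Xn = min_max_normalise(X)
print(Xn[:, 0])          # [0.  0.5 1. ]
print(Xn.mean(axis=0))   # per-feature means, as asked for in (b)
```

After scaling, each column has minimum 0 and maximum 1 by construction, so the per-feature means reported in (b) must lie in [0, 1].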

Question 2. (Train and Test sets)

Now that the data is pre-processed, we will create train and test sets to build and then evaluate our


models on. Use the first half of observations (in the same order as in the original csv file excluding

those you removed in the previous question) to create the training set, and the remaining half for the

test set. Print out the first and last rows of both your training and test sets.
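A sketch of this deterministic half split, assuming the cleaned observations sit in a NumPy array in the original CSV order (the function name and toy data are hypothetical):

```python
import numpy as np

def half_split(data):
    """First half of rows -> training set, remaining half -> test set.
    No shuffling: rows keep the order of the original CSV file."""
    n_train = len(data) // 2
    return data[:n_train], data[n_train:]

# Toy illustration with 6 rows of 2 columns:
data = np.arange(12).reshape(6, 2)
train, test = half_split(data)
print(train[0], train[-1])  # first and last rows of the training set
print(test[0], test[-1])    # first and last rows of the test set
```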

Question 3. (Loss Function)

Consider the loss function

L_c(x, y) = √( (1/c²)(x − y)² + 1 ) − 1,

where c ∈ R is a hyper-parameter. Consider the (simple) linear model

ŷ^(i) = w_0 + w_1 x_1^(i) + w_2 x_2^(i) + w_3 x_3^(i),   i = 1, …, n.

We can write this more succinctly by letting w = (w_0, w_1, w_2, w_3)^T and X^(i) = (1, x_1^(i), x_2^(i), x_3^(i))^T,

so that ŷ^(i) = w^T X^(i). The mean-loss achieved by our model (w) on a given dataset of n observations is then

L_c(y, ŷ) = (1/n) Σ_{i=1}^{n} L_c(y^(i), ŷ^(i)) = (1/n) Σ_{i=1}^{n} [ √( (1/c²)(y^(i) − 〈w, X^(i)〉)² + 1 ) − 1 ].

Compute the following derivatives:

∂L_c(y^(i), ŷ^(i)) / ∂w_k,   k = 0, 1, 2, 3.

You must show your working for full marks.
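Whatever form your derivatives take, it helps for Questions 5 and 6 to have the loss itself available in code. A minimal NumPy sketch of L_c (the function name `loss_c` is our own):

```python
import numpy as np

def loss_c(y, y_hat, c):
    """Mean of sqrt((1/c^2)(y - y_hat)^2 + 1) - 1 over all observations."""
    return np.mean(np.sqrt(((y - y_hat) ** 2) / c ** 2 + 1) - 1)

# Sanity checks: zero residuals give zero loss, and a single observation
# with residual r and c = 1 gives sqrt(r^2 + 1) - 1.
y = np.array([1.0, 2.0, 3.0])
print(loss_c(y, y, c=2))                                # 0.0
print(loss_c(np.array([0.0]), np.array([1.0]), c=1))    # sqrt(2) - 1 ≈ 0.4142
```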

Question 4. (Gradient Descent Pseudocode)

For some value of c > 0, we have

∂L_c(y^(i), ŷ^(i)) / ∂w_k = x_k^(i) (〈w^(t), X^(i)〉 − y^(i)) / ( 2 √( (〈w^(t), X^(i)〉 − y^(i))² + 4 ) ),   k = 0, 1, 2, 3.

Using this result, write down the gradient descent updates for w0, w1, w2, w3 (using pseudocode), as-

suming a step size of η. Note that in gradient descent we consider the loss over the entire dataset, not

just at a single observation. Further, provide pseudocode for stochastic gradient descent updates. Here,

w^(t) denotes the weight vector at the t-th iteration of gradient descent.
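As an illustration only (the question asks you to write the pseudocode yourself), one full-batch GD step using the gradient stated above can be realised in NumPy as follows; the function names `gradient` and `gd_step` and the toy data are our own, and we assume c is fixed to the value that makes the stated formula hold:

```python
import numpy as np

def gradient(w, X, y):
    """Gradient of the mean loss, averaged over all rows of X.
    X has a leading column of ones; rows are observations."""
    r = X @ w - y                          # residuals <w, X^(i)> - y^(i)
    coef = r / (2 * np.sqrt(r ** 2 + 4))   # per-observation scalar factor
    return X.T @ coef / len(y)             # average of per-observation gradients

def gd_step(w, X, y, eta):
    """One gradient descent update: w <- w - eta * gradient."""
    return w - eta * gradient(w, X, y)

# Toy illustration with hypothetical data (2 observations, 3 features + bias):
X = np.array([[1.0, 0.5, 0.2, 0.1],
              [1.0, 0.3, 0.8, 0.4]])
y = np.array([1.0, 2.0])
w = np.ones(4)
w = gd_step(w, X, y, eta=0.5)
```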

Question 5. (Gradient Descent Implementation)

In this section, you will implement gradient descent from scratch on the generated dataset using the

gradients computed in Question 3, and the pseudocode in Question 4.

(a) Initialise your weight vector to w^(0) = [1, 1, 1, 1]^T. Consider step sizes

η ∈ {10, 5, 2, 1, 0.5, 0.25, 0.1, 0.05, 0.01}

(a total of 9 step sizes). For each step-size, generate a plot of the loss achieved at each iteration of

gradient descent. You should use 400 iterations in total (which will require you to loop over your

training data in order). Generate a 3×3 grid of plots showing performance for each step-size. Add

a screenshot of the Python code written for this question in your report (as well as pasting the

code into your solutions.py file). The following code may help with plotting:


import matplotlib.pyplot as plt

fig, axs = plt.subplots(3, 3, figsize=(10, 10))
nIter = 400
alphas = [10, 5, 2, 1, 0.5, 0.25, 0.1, 0.05, 0.01]
for i, ax in enumerate(axs.flat):
    # losses is a list of 9 elements; each element is an array of length
    # nIter storing the loss at each iteration for that particular step size
    ax.plot(losses[i])
    ax.set_title(f"step size: {alphas[i]}")  # plot titles
plt.tight_layout()  # plot formatting
plt.show()

(b) From your results in the previous part, choose an appropriate step size (and state your choice), and

explain why you made this choice.

(c) For this part, take η = 0.3 and re-run GD under the same parameter initialisations and number of

iterations. On a single plot, plot the progression of each of the four weights over the iterations.

Print out the final weight vector. Finally, run your model on the train and test set, and print the

achieved losses.

Question 6. (Stochastic Gradient Descent Implementation)

We will now re-run the analysis in question 5, but using stochastic gradient descent (SGD) instead. In

SGD, we update the weights after seeing a single observation.
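A minimal sketch of the per-observation update and epoch loop, assuming a single-observation gradient of the form given in Question 4 (the function names `gradient_single` and `sgd` and the toy data are hypothetical):

```python
import numpy as np

def gradient_single(w, x_i, y_i):
    """Per-observation gradient of the form stated in Question 4."""
    r = x_i @ w - y_i
    return x_i * r / (2 * np.sqrt(r ** 2 + 4))

def sgd(w, X, y, eta, n_epochs):
    """Update after each observation, in stored order, for n_epochs passes,
    giving n_epochs * len(y) updates in total."""
    w = w.copy()
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            w = w - eta * gradient_single(w, x_i, y_i)
    return w

# Toy illustration with hypothetical data (2 observations, 1 feature + bias):
X = np.array([[1.0, 0.5],
              [1.0, 0.3]])
y = np.array([1.0, 2.0])
w = sgd(np.ones(2), X, y, eta=0.1, n_epochs=6)
```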

(a) Use the same settings as in 5 (a). For each step-size, generate a plot of the loss achieved at each

iteration of stochastic gradient descent, which is to be run for 6 epochs. An epoch is one pass

over the entire training dataset, so in total there should be 6 × ntrain updates, where ntrain is the

size of your training data. Be sure to run these updates in the same ordering that the data is

stored. Generate a 3 × 3 grid of plots showing performance for each step-size. Add a screenshot

of the Python code written for this question in your report (as well as pasting the code into your

solutions.py file).

(b) From your results in the previous part, choose an appropriate step size (and state your choice), and

explain why you made this choice.

(c) Take η = 0.4 and re-run SGD under the same parameter initialisations and number of epochs. On

a single plot, plot the progression of each of the four weights over the iterations. Print your final

model. Finally, run your model on the train and test set, and record the achieved losses.

Question 7. (Results Analysis)

In a few lines, comment on your results in Questions 5 and 6. Explain the importance of the step-size

in both GD and SGD. Explain why one might have a preference for GD or SGD. Explain why the GD

paths look much smoother than the SGD paths.

Points Allocation: There are a total of 10 marks, allocated as follows:

• Question 1: 1 mark (split 0.5/0.5)

• Question 2: 0.5 marks

• Question 3: 1.5 marks

• Question 4: 1 mark

• Question 5: 2.5 marks (split 1/0.5/1)


• Question 6: 2.5 marks (split 1/0.5/1)

• Question 7: 1 mark

What to Submit

• A single PDF file which contains solutions to each question. For each question, provide your solution

in the form of text and requested plots. For any question in which you use code, provide a copy of

your code at the bottom of the relevant section.

• .py file(s) containing all code you used for the project, which should be provided in a separate .zip

file. This code must match the code provided in the report.

• You may be deducted points for not following these instructions.

• You may be deducted points for poorly presented/formatted work. Please be neat and make your

solutions clear.

When and Where to Submit

• Due date: Monday 8 March, 2021 by 5:00pm.

• Late submissions incur a penalty of 10% per day for the first 5 days and 100% for more than 5 days.

• Submission has to be done through Moodle.

Final Reminder: You are required to submit one PDF file, AND also to submit the Python file(s) with all

your code as a separate .zip file.
