ESE 402/542 Homework
(For Problems 1 and 2, no packages other than numpy and matplotlib should be used
for the programming questions. For Problem 3, you may use the packages of your choice.)
Problem 1.
(a) In this problem we will analyze the logistic regression model learned in class.
The sigmoid function can be written as S(x) = 1 / (1 + e^(-x)).
For a given variable X, assume P(Y = 1|X) is modeled as P(Y = 1|X) = S(β0 + β1X).
Plot a 3D figure showing the relation between this output and the variables β0 and β1
when X = 1. Take values in [-2, 2] for both β0 and β1 with a step size of 0.1 to plot
the 3D figure.
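One way to produce such a surface plot with numpy and matplotlib (a minimal sketch; the variable names are our own):

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Grid over beta0, beta1 in [-2, 2] with step 0.1
b0 = np.arange(-2, 2.1, 0.1)
b1 = np.arange(-2, 2.1, 0.1)
B0, B1 = np.meshgrid(b0, b1)

X = 1.0
P = 1.0 / (1.0 + np.exp(-(B0 + B1 * X)))  # S(beta0 + beta1 * X)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(B0, B1, P)
ax.set_xlabel('beta0')
ax.set_ylabel('beta1')
ax.set_zlabel('P(Y = 1 | X = 1)')
plt.show()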
(b) In class, we have done binary classification with labels Y = {0, 1}. In this problem, we
will use the labels Y = {-1, 1}, as this makes it easier to derive the likelihood P(Y |X).
• Show that if Y takes values in {-1, 1}, the probability of Y given X can be written as (not programming)

P(Y |X) = 1 / (1 + e^(-y(β0 + β1x)))

• We have learned that the coefficients β0 and β1 can be found using MLE estimates. Show that the log-likelihood function for m data points can be written as (not programming)

ln L(β0, β1) = -Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i)))

• Plot a 3D figure showing the relation between the log-likelihood function and the variables β0, β1 for the cases X = 1, Y = -1 and X = 1, Y = 1. Take values in [-2, 2] for both β0 and β1 with a step size of 0.1 to plot the 3D figure. (A plotting sketch follows this list.)
• Based on the graph, is it possible to maximize this function?
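A similar sketch for the two requested log-likelihood surfaces, using the single-point log-likelihood ln P(y|x) = -ln(1 + e^(-y(β0 + β1x))) (again, the naming is our own):

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

b0 = np.arange(-2, 2.1, 0.1)
b1 = np.arange(-2, 2.1, 0.1)
B0, B1 = np.meshgrid(b0, b1)

fig = plt.figure()
for k, (x, y) in enumerate([(1, -1), (1, 1)]):
    # Log-likelihood of one point: ln P(y|x) = -ln(1 + exp(-y * (b0 + b1 * x)))
    LL = -np.log(1.0 + np.exp(-y * (B0 + B1 * x)))
    ax = fig.add_subplot(1, 2, k + 1, projection='3d')
    ax.plot_surface(B0, B1, LL)
    ax.set_title('X = %d, Y = %d' % (x, y))
    ax.set_xlabel('beta0')
    ax.set_ylabel('beta1')
plt.show()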
Problem 2.
1. While we can formalize the likelihood estimate, there is no closed-form expression for the
coefficients β0, β1 maximizing the above log-likelihood. Hence, we will use an iterative
algorithm to solve for the coefficients.
We can see that

max_{β0,β1} ( -Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i))) ) = -min_{β0,β1} Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i)))
We will define our loss function as L = (1/m) Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i))).
Our objective is to iteratively decrease this loss as we keep updating the coefficients
toward the optimum. Here x_i ∈ R.
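For concreteness, this loss can be computed in a vectorized way along the following lines (a sketch; the helper name loss_1d is our own):

import numpy as np

def loss_1d(x, y, b0, b1):
    # Average logistic loss: (1/m) * sum_i ln(1 + exp(-y_i * (b0 + b1 * x_i))).
    # x, y are 1-D numpy arrays of the same length; y takes values in {-1, +1}.
    return np.mean(np.log(1.0 + np.exp(-y * (b0 + b1 * x))))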
In this problem we will be working with real image data, where the goal is to classify whether an image shows a 0 or a 1 using logistic regression.
The input is X ∈ R^(m×d), where a single data point is x_i ∈ R^d with d = 784. The labels form a vector Y ∈ R^m, where each label y_i ∈ {0, 1}.
• Load the data into memory and visualize one input as an image for each of label 0 and label 1. (The data should be reshaped back to [28 x 28] to be able to visualize it; a loading sketch follows this list.)
• The data values lie between 0 and 255. Normalize the data to the range [0, 1].
• Set y_i = 1 for images labeled 0 and y_i = -1 for images labeled 1. Split the data
randomly into train and test sets with a ratio of 80:20.
Why is random splitting better than sequential splitting in our case?
• Initialize the coefficients using a univariate "normal" (Gaussian) distribution with
mean 0 and variance 1. (Remember that the coefficients form a vector [β0, β1, ..., β_d],
where d is the dimension of the input.)
• Compute the loss using the above-mentioned loss L.
(The loss can be written as L = (1/m) Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + Σ_{j=0}^{d-1} β_{j+1}·x_{i,j}))),
where (i, j) indexes the jth dimension of data point x_i, for i ∈ [1 ... m] and
j ∈ [0 ... d-1]. A vectorized sketch of this computation follows this list.)
• To minimize the loss function, a widely used algorithm is to move in the direction
opposite to the gradient of the loss function.
(It's helpful to write the coefficients [β1, ..., β_d] as a vector β and β0 as a scalar.
Now β is of size [d] ∈ R^d and β0 is of size [1] ∈ R.)
We can write the gradients of the loss function as

∂L/∂β0 = -(1/m) Σ_{i=1}^{m} [ e^(-y_i(β0 + β·x_i^T)) / (1 + e^(-y_i(β0 + β·x_i^T))) ] · y_i = dβ0

∂L/∂β = -(1/m) Σ_{i=1}^{m} [ e^(-y_i(β0 + β·x_i^T)) / (1 + e^(-y_i(β0 + β·x_i^T))) ] · y_i·x_i = dβ
Write a function to compute the gradients (a vectorized sketch follows this list).
• Update the parameters as
β = β - 0.05 · dβ
β0 = β0 - 0.05 · dβ0
(Gradient updates should be computed based on the train set.)
• Repeat the process for 50 iterations and report the loss after the 50th epoch.
• Plot the loss at each iteration for the train and test sets.
• Logistic regression solves a classification problem: we classify a point as +1 if P(Y = 1|X) ≥ 0.5. Derive the classification rule for the threshold 0.5. (Not a programming
question.)
• For the classification rule derived, compute the accuracy on the test set at each
iteration and plot the accuracy. (A sketch appears after the sample code below.)
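A possible sketch for the loading, visualization, and normalization steps above (the file names are placeholders, not the assignment's actual paths):

import numpy as np
from matplotlib import pyplot as plt

x = np.load('data.npy')    # hypothetical file names; use the files
y = np.load('labels.npy')  # provided with the assignment

x = x / 255.0  # normalize pixel values from [0, 255] to [0, 1]

# Show one example of each digit, reshaped from length-784 vectors to 28 x 28
for k, digit in enumerate([0, 1]):
    idx = np.where(y == digit)[0][0]  # first image with this label
    plt.subplot(1, 2, k + 1)
    plt.imshow(x[idx].reshape(28, 28), cmap='gray')
    plt.title('label %d' % digit)
plt.show()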
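And one possible vectorized implementation of the loss and gradient helpers referenced above, matching the function names in the sample code below (a sketch; it assumes data has shape [m, d], labels is a length-m vector with values in {-1, +1}, and B, B_0 have the skeleton's shapes [1, d] and [1]):

import numpy as np

def compute_loss(data, labels, B, B_0):
    # Margins z_i = y_i * (B_0 + B . x_i), computed for all i at once
    z = labels * (B_0 + data @ B.ravel())
    # Average loss (1/m) * sum_i ln(1 + exp(-z_i))
    return np.mean(np.logaddexp(0.0, -z))

def compute_gradients(data, labels, B, B_0):
    z = labels * (B_0 + data @ B.ravel())
    # e^(-z) / (1 + e^(-z)) = 1 / (1 + e^(z))
    s = 1.0 / (1.0 + np.exp(z))
    coeff = -s * labels                 # per-point factor, shape [m]
    dB_0 = np.mean(coeff)               # gradient w.r.t. the bias
    dB = (coeff @ data) / len(labels)   # gradient w.r.t. the weights, shape [d]
    return dB.reshape(1, -1), dB_0

With labels in {-1, +1}, these are direct transcriptions of the dβ0 and dβ expressions above; np.logaddexp(0, -z) computes ln(1 + e^(-z)) without overflow for large |z|.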
The final code should follow roughly this format:
import numpy as np
from matplotlib import pyplot as plt

def compute_loss(data, labels, B, B_0):
    ...
    return logloss

def compute_gradients(data, labels, B, B_0):
    ...
    return dB, dB_0

if __name__ == '__main__':
    x = np.load(data)
    y = np.load(label)

    ## Split the data into train and test
    x_train, y_train, x_test, y_test = ...  # split_data

    B = np.random.randn(1, x.shape[1])
    B_0 = np.random.randn(1)
    lr = 0.05
    for _ in range(50):
        ## Compute loss
        loss = compute_loss(x_train, y_train, B, B_0)
        ## Compute gradients
        dB, dB_0 = compute_gradients(x_train, y_train, B, B_0)
        ## Update parameters
        B = B - lr*dB
        B_0 = B_0 - lr*dB_0
        ## Compute accuracy and loss on the test set (x_test, y_test)
        accuracy_test = ...
        loss_test = ...
    ## Plot loss and accuracy
Make sure to vectorize the code. Ideally, the 50 iterations should run in 15 seconds or less.
If possible, avoid using for loops, except for the 50 iterations of gradient updates shown in
the sample code.
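Finally, a possible sketch for the remaining blanks in the skeleton; split_data and accuracy are hypothetical helpers of our own naming. For the accuracy, note that P(Y = 1|X) ≥ 0.5 is equivalent to β0 + β · x ≥ 0:

import numpy as np

def split_data(x, y, ratio=0.8, seed=0):
    # Random 80:20 split: shuffle indices first so the split is not sequential
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(ratio * len(y))
    return x[idx[:n_train]], y[idx[:n_train]], x[idx[n_train:]], y[idx[n_train:]]

def accuracy(data, labels, B, B_0):
    # Predict +1 iff B_0 + B . x >= 0, i.e. P(Y = 1 | X) >= 0.5
    preds = np.where(B_0 + data @ B.ravel() >= 0, 1, -1)
    return np.mean(preds == labels)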