ESE 402/542 Homework
(For Problems 1 and 2, no packages other than numpy and matplotlib should be used
for the programming questions. For Problem 3, you may use the packages of your choice.)
Problem 1.
(a) In this problem we will analyze the logistic regression model learned in class.
The sigmoid function can be written as S(x) = 1 / (1 + e^(-x)).
For a given variable X, assume P(Y = 1|X) is modeled as P(Y = 1|X) = S(β0 + β1X).
Plot a 3D figure showing the relation between this output and the variables β0 and β1
when X = 1. Take values in [-2, 2] for both β0 and β1 with a step size of 0.1 to plot
the 3D figure.
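One way to produce such a surface plot with numpy and matplotlib (a minimal sketch; the variable names are our own):

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

# Grid over beta0, beta1 in [-2, 2] with step 0.1
b0 = np.arange(-2, 2.1, 0.1)
b1 = np.arange(-2, 2.1, 0.1)
B0, B1 = np.meshgrid(b0, b1)

X = 1.0
P = 1.0 / (1.0 + np.exp(-(B0 + B1 * X)))  # S(beta0 + beta1 * X)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(B0, B1, P)
ax.set_xlabel('beta0')
ax.set_ylabel('beta1')
ax.set_zlabel('P(Y = 1 | X = 1)')
plt.show()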
(b) In class, we have done binary classification with labels Y = {0, 1}. In this problem, we
will use the labels Y = {-1, 1}, as this makes it easier to derive the likelihood P(Y |X).
• Show that if Y takes values in {-1, 1}, the probability of Y given X can be written as (not programming)

P(Y |X) = 1 / (1 + e^(-y(β0 + β1x)))

• We have learned that the coefficients β0 and β1 can be found using MLE estimates. Show that the log-likelihood function for m data points can be written as (not programming)

ln L(β0, β1) = -Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i)))

• Plot a 3D figure showing the relation between the log-likelihood function and the variables β0, β1 for the cases X = 1, Y = -1 and X = 1, Y = 1. Take values in [-2, 2] for both β0 and β1 with a step size of 0.1 to plot the 3D figure. (A plotting sketch follows this list.)
• Based on the graph, is it possible to maximize this function?
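A similar sketch for the two requested log-likelihood surfaces, using the single-point log-likelihood ln P(y|x) = -ln(1 + e^(-y(β0 + β1x))) (again, the naming is our own):

import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older matplotlib

b0 = np.arange(-2, 2.1, 0.1)
b1 = np.arange(-2, 2.1, 0.1)
B0, B1 = np.meshgrid(b0, b1)

fig = plt.figure()
for k, (x, y) in enumerate([(1, -1), (1, 1)]):
    # Log-likelihood of one point: ln P(y|x) = -ln(1 + exp(-y * (b0 + b1 * x)))
    LL = -np.log(1.0 + np.exp(-y * (B0 + B1 * x)))
    ax = fig.add_subplot(1, 2, k + 1, projection='3d')
    ax.plot_surface(B0, B1, LL)
    ax.set_title('X = %d, Y = %d' % (x, y))
    ax.set_xlabel('beta0')
    ax.set_ylabel('beta1')
plt.show()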
Problem 2.
1. While we can formalize the likelihood estimate, there is no closed-form expression for the
coefficients β0, β1 maximizing the above log-likelihood. Hence, we will use an iterative
algorithm to solve for the coefficients.
We can see that

max_{β0,β1} ( -Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i))) ) = -min_{β0,β1} Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i)))
We will define our loss function as L = (1/m) Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + β1x_i))).
Our objective is to iteratively decrease this loss as we keep updating the coefficients
toward the optimum. Here x_i ∈ R.
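For concreteness, this loss can be computed in a vectorized way along the following lines (a sketch; the helper name loss_1d is our own):

import numpy as np

def loss_1d(x, y, b0, b1):
    # Average logistic loss: (1/m) * sum_i ln(1 + exp(-y_i * (b0 + b1 * x_i))).
    # x, y are 1-D numpy arrays of the same length; y takes values in {-1, +1}.
    return np.mean(np.log(1.0 + np.exp(-y * (b0 + b1 * x))))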
In this problem we will be working with real image data, where the goal is to classify whether an image shows a 0 or a 1 using logistic regression.
The input is X ∈ R^(m×d), where a single data point is x_i ∈ R^d with d = 784. The labels form a vector Y ∈ R^m, where each label y_i ∈ {0, 1}.
• Load the data into memory and visualize one input as an image for each of label 0 and label 1. (The data should be reshaped back to [28 x 28] to be able to visualize it; a loading sketch follows this list.)
• The data values lie between 0 and 255. Normalize the data to the range [0, 1].
• Set y_i = 1 for images labeled 0 and y_i = -1 for images labeled 1. Split the data
randomly into train and test sets with a ratio of 80:20.
Why is random splitting better than sequential splitting in our case?
• Initialize the coefficients using a univariate "normal" (Gaussian) distribution with
mean 0 and variance 1. (Remember that the coefficients form a vector [β0, β1, ..., β_d],
where d is the dimension of the input.)
• Compute the loss using the above-mentioned loss L.
(The loss can be written as L = (1/m) Σ_{i=1}^{m} ln(1 + e^(-y_i(β0 + Σ_{j=0}^{d-1} β_{j+1}·x_{i,j}))),
where (i, j) indexes the jth dimension of data point x_i, for i ∈ [1 ... m] and
j ∈ [0 ... d-1]. A vectorized sketch of this computation follows this list.)
• To minimize the loss function, a widely used algorithm is to move in the direction
opposite to the gradient of the loss function.
(It's helpful to write the coefficients [β1, ..., β_d] as a vector β and β0 as a scalar.
Now β is of size [d] ∈ R^d and β0 is of size [1] ∈ R.)
We can write the gradients of the loss function as

∂L/∂β0 = -(1/m) Σ_{i=1}^{m} [ e^(-y_i(β0 + β·x_i^T)) / (1 + e^(-y_i(β0 + β·x_i^T))) ] · y_i = dβ0

∂L/∂β = -(1/m) Σ_{i=1}^{m} [ e^(-y_i(β0 + β·x_i^T)) / (1 + e^(-y_i(β0 + β·x_i^T))) ] · y_i·x_i = dβ
Write a function to compute the gradients (a vectorized sketch follows this list).
• Update the parameters as
β = β - 0.05 · dβ
β0 = β0 - 0.05 · dβ0
(Gradient updates should be computed based on the train set.)
• Repeat the process for 50 iterations and report the loss after the 50th epoch.
• Plot the loss at each iteration for the train and test sets.
• Logistic regression solves a classification problem: we classify a point as +1 if P(Y = 1|X) ≥ 0.5. Derive the classification rule for the threshold 0.5. (Not a programming
question.)
• For the classification rule derived, compute the accuracy on the test set at each
iteration and plot the accuracy. (A sketch appears after the sample code below.)
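A possible sketch for the loading, visualization, and normalization steps above (the file names are placeholders, not the assignment's actual paths):

import numpy as np
from matplotlib import pyplot as plt

x = np.load('data.npy')    # hypothetical file names; use the files
y = np.load('labels.npy')  # provided with the assignment

x = x / 255.0  # normalize pixel values from [0, 255] to [0, 1]

# Show one example of each digit, reshaped from length-784 vectors to 28 x 28
for k, digit in enumerate([0, 1]):
    idx = np.where(y == digit)[0][0]  # first image with this label
    plt.subplot(1, 2, k + 1)
    plt.imshow(x[idx].reshape(28, 28), cmap='gray')
    plt.title('label %d' % digit)
plt.show()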
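And one possible vectorized implementation of the loss and gradient helpers referenced above, matching the function names in the sample code below (a sketch; it assumes data has shape [m, d], labels is a length-m vector with values in {-1, +1}, and B, B_0 have the skeleton's shapes [1, d] and [1]):

import numpy as np

def compute_loss(data, labels, B, B_0):
    # Margins z_i = y_i * (B_0 + B . x_i), computed for all i at once
    z = labels * (B_0 + data @ B.ravel())
    # Average loss (1/m) * sum_i ln(1 + exp(-z_i))
    return np.mean(np.logaddexp(0.0, -z))

def compute_gradients(data, labels, B, B_0):
    z = labels * (B_0 + data @ B.ravel())
    # e^(-z) / (1 + e^(-z)) = 1 / (1 + e^(z))
    s = 1.0 / (1.0 + np.exp(z))
    coeff = -s * labels                 # per-point factor, shape [m]
    dB_0 = np.mean(coeff)               # gradient w.r.t. the bias
    dB = (coeff @ data) / len(labels)   # gradient w.r.t. the weights, shape [d]
    return dB.reshape(1, -1), dB_0

With labels in {-1, +1}, these are direct transcriptions of the dβ0 and dβ expressions above; np.logaddexp(0, -z) computes ln(1 + e^(-z)) without overflow for large |z|.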
The final code should follow roughly this format:
import numpy as np
from matplotlib import pyplot as plt

def compute_loss(data, labels, B, B_0):
    ...
    return logloss

def compute_gradients(data, labels, B, B_0):
    ...
    return dB, dB_0

if __name__ == '__main__':
    x = np.load(data)
    y = np.load(label)

    ## Split the data into train and test
    x_train, y_train, x_test, y_test = ...  # split_data

    B = np.random.randn(1, x.shape[1])
    B_0 = np.random.randn(1)
    lr = 0.05
    for _ in range(50):
        ## Compute loss
        loss = compute_loss(x_train, y_train, B, B_0)
        ## Compute gradients
        dB, dB_0 = compute_gradients(x_train, y_train, B, B_0)
        ## Update parameters
        B = B - lr*dB
        B_0 = B_0 - lr*dB_0
        ## Compute accuracy and loss on the test set (x_test, y_test)
        accuracy_test = ...
        loss_test = ...
    ## Plot loss and accuracy
Make sure to vectorize the code. Ideally, the 50 iterations should run in 15 seconds or less.
If possible, avoid using for loops, except for the 50 iterations of gradient updates shown in
the sample code.
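Finally, a possible sketch for the remaining blanks in the skeleton; split_data and accuracy are hypothetical helpers of our own naming. For the accuracy, note that P(Y = 1|X) ≥ 0.5 is equivalent to β0 + β · x ≥ 0:

import numpy as np

def split_data(x, y, ratio=0.8, seed=0):
    # Random 80:20 split: shuffle indices first so the split is not sequential
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(ratio * len(y))
    return x[idx[:n_train]], y[idx[:n_train]], x[idx[n_train:]], y[idx[n_train:]]

def accuracy(data, labels, B, B_0):
    # Predict +1 iff B_0 + B . x >= 0, i.e. P(Y = 1 | X) >= 0.5
    preds = np.where(B_0 + data @ B.ravel() >= 0, 1, -1)
    return np.mean(preds == labels)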