CSCE 636: Deep Learning (Fall 2021)
Assignment #2
Due 11:59PM on 10/7/2021

1. You need to submit (1) a report in PDF and (2) your code files, both to Canvas.

2. Your PDF report should include (1) answers to the non-programming part, and (2) results and analysis of the programming part. For the programming part, your PDF report should at least include the results you obtained, for example the accuracy, training curves, parameters, etc. You should also analyze your results as needed.

3. Please name your PDF report "HW# FirstName LastName.pdf" and submit it to the "HW# report" assignment. Put all code files into a compressed file named "HW# FirstName LastName.zip" and submit it to the "HW# code" assignment.

4. An unlimited number of submissions is allowed, and the latest one will be timed and graded.

5. Please read and follow the submission instructions. No exception will be made to accommodate incorrectly submitted files/reports.

6. All students are highly encouraged to typeset their reports using Word or LaTeX. If you decide to hand-write, please make sure your answers are clearly readable in the scanned PDF.

7. Only write your code between the following lines. Do not modify other parts.
   ### YOUR CODE HERE
   ### END YOUR CODE

Required Reading Materials:
[1] Deep Residual Learning for Image Recognition (https://arxiv.org/abs/1512.03385)
[2] Identity Mappings in Deep Residual Networks (https://arxiv.org/abs/1603.05027)
[3] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (https://arxiv.org/abs/1502.03167)

1. (10 points) Exercise 7.7 (e-Chap:7-11) in the book "Learning from Data".

2. (10 points) Exercise 7.8 (e-Chap:7-15) in the book "Learning from Data".

3. (20 points) Consider the standard residual block and the bottleneck block in the case where the inputs and outputs have the same dimension (e.g., Figure 5 in [1]); in other words, the residual connection is an identity connection. For the standard residual block, compute the number of training parameters when the dimension of the inputs and outputs is 128 × 16 × 16 × 32. Here, 128 is the batch size, 16 × 16 is the spatial size of the feature maps, and 32 is the number of channels. For the bottleneck block, compute the number of training parameters when the dimension of the inputs and outputs is 128 × 16 × 16 × 128. Compare the two results and explain the advantages and disadvantages of the bottleneck block. (A short PyTorch sketch for checking such counts appears after Problem 4.)

4. (20 points) Using batch normalization in training requires computing the mean and variance of a tensor.

   (a) (8 points) Suppose the tensor x is the output of a fully-connected layer and we want to perform batch normalization on it. The training batch size is N and the fully-connected layer has C output nodes; therefore, the shape of x is N × C. What are the shapes of the mean and the variance computed in batch normalization, respectively?

   (b) (12 points) Now suppose the tensor x is the output of a 2D convolution and has shape N × H × W × C. What are the shapes of the mean and the variance computed in batch normalization, respectively?
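For Problem 3, one way to sanity-check a hand count is to instantiate the convolutions of each block in PyTorch and count their parameters directly. The sketch below is only an illustration under assumed settings: biases are omitted (as is common when batch normalization follows each convolution), only the convolution weights are counted, and the bottleneck's intermediate width of 32 follows the 4× reduction convention in [1]. State your own assumptions (e.g., whether batch-norm parameters are included) in your report.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Standard residual block path (identity shortcut): two 3x3 convolutions, 32 -> 32 channels.
standard_path = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False),
)
print("standard block convs:", count_params(standard_path))

# Bottleneck path: 1x1 reduce, 3x3, 1x1 expand, for 128 input/output channels.
bottleneck_path = nn.Sequential(
    nn.Conv2d(128, 32, kernel_size=1, bias=False),
    nn.Conv2d(32, 32, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(32, 128, kernel_size=1, bias=False),
)
print("bottleneck block convs:", count_params(bottleneck_path))
```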
5. (50 points) We investigate the back-propagation of the convolution using a simple example. In this problem, we focus on the convolution operation without any normalization or activation function. For simplicity, we consider the 1D case. Given a 1D input with a spatial size of 4 and 2 channels, i.e.,

   $$X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \end{bmatrix} \in \mathbb{R}^{2 \times 4}, \qquad (1)$$

   we perform a 1D convolution with a kernel size of 3 to produce the output Y with 2 channels. No padding is involved. It is easy to see that

   $$Y = \begin{bmatrix} y_{11} & y_{12} \\ y_{21} & y_{22} \end{bmatrix} \in \mathbb{R}^{2 \times 2}, \qquad (2)$$

   where each row corresponds to a channel. There are 12 training parameters involved in this convolution, forming 4 different kernels of size 3:

   $$W^{ij} = [w^{ij}_{1}, w^{ij}_{2}, w^{ij}_{3}], \quad i = 1, 2, \quad j = 1, 2, \qquad (3)$$

   where $W^{ij}$ scans the i-th channel of the inputs and contributes to the j-th channel of the outputs. Note that the notation here differs slightly from the usual convention: one kernel/filter here connects ONE input feature map (instead of ALL input feature maps) to ONE output feature map.

   (a) (15 points) Now we flatten X and Y into vectors as

   $$\tilde{X} = [x_{11}, x_{12}, x_{13}, x_{14}, x_{21}, x_{22}, x_{23}, x_{24}]^{T}, \qquad \tilde{Y} = [y_{11}, y_{12}, y_{21}, y_{22}]^{T}.$$

   Please write the convolution in the form of a fully-connected layer, $\tilde{Y} = A\tilde{X}$, using the notations above. You can assume there is no bias term. Hint: We discussed how to view convolution layers as fully-connected layers in the case of a single input and a single output feature map. This question asks you to extend that view to the case of multiple input and output feature maps.

   (b) (15 points) Next, for the back-propagation, assume we have already computed the gradients of the loss L with respect to $\tilde{Y}$:

   $$\frac{\partial L}{\partial \tilde{Y}} = \left[ \frac{\partial L}{\partial y_{11}}, \frac{\partial L}{\partial y_{12}}, \frac{\partial L}{\partial y_{21}}, \frac{\partial L}{\partial y_{22}} \right]^{T}. \qquad (4)$$

   Please write the back-propagation step of the convolution in the form $\frac{\partial L}{\partial \tilde{X}} = B \frac{\partial L}{\partial \tilde{Y}}$, and explain the relationship between A and B.

   (c) (20 points) While the forward propagation of the convolution from X to Y can be written as $\tilde{Y} = A\tilde{X}$, can you figure out whether $\frac{\partial L}{\partial \tilde{X}} = B \frac{\partial L}{\partial \tilde{Y}}$ also corresponds to a convolution from $\frac{\partial L}{\partial Y}$ to $\frac{\partial L}{\partial X}$? If yes, write down the kernels of this convolution. If no, explain why.
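For part (a) of Problem 5, one way to check the matrix you derive by hand is to probe the convolution numerically: feeding the k-th standard basis vector through a linear map returns the k-th column of its matrix. The sketch below is only such a numerical check, not the requested symbolic answer; it uses torch.nn.functional.conv1d (which, like most deep-learning frameworks, computes the cross-correlation form of convolution), and the names (kernel, conv_as_vector, etc.) are illustrative rather than part of the starter code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

in_channels, out_channels, length, kernel_size = 2, 2, 4, 3
# Kernel tensor of shape (out_channels, in_channels, kernel_size); its entries
# play the role of the w^{ij}_k values in Equation (3), here filled with random numbers.
kernel = torch.randn(out_channels, in_channels, kernel_size)

def conv_as_vector(x_flat):
    """Apply the 1D convolution to a flattened input and return the flattened output."""
    x = x_flat.reshape(1, in_channels, length)   # (batch, channels, length), matching the order of X~
    y = F.conv1d(x, kernel)                      # no padding, stride 1 -> spatial size 2
    return y.reshape(-1)                         # flattened as [y11, y12, y21, y22], matching Y~

# Probe with the standard basis vectors of R^8: column k of A is conv_as_vector(e_k).
n = in_channels * length
A = torch.stack([conv_as_vector(torch.eye(n)[k]) for k in range(n)], dim=1)
print(A)  # shape (4, 8); compare against the matrix you derived by hand
```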
6. (90 points) (Coding Task) Deep Residual Networks for CIFAR-10 Image Classification: In this assignment, you will implement advanced convolutional neural networks on CIFAR-10 using PyTorch. In this classification task, models take a 32 × 32 image with RGB channels as input and classify the image into one of ten pre-defined classes. The "ResNet" folder provides the starting code. You must implement the model using the starting code. In this assignment, you must use a GPU.

   Requirements: Python 3.8, PyTorch 1.7.0 (make sure you use this particular version for installation and documentation!), tqdm, numpy. Please do not use torchvision.datasets or torchvision.transforms in the data pre-processing part for this homework.

   Please submit a running report (briefly explain how you completed those functions; include screenshots of your training loss, validation/test accuracy, etc., and anything else required in the original assignment) for all coding tasks, and paste the training and testing console records as an appendix in your report. You do not have to submit the pre-trained model or the dataset, which might be large.

   (a) (10 points) Download the CIFAR-10 dataset (https://www.cs.toronto.edu/~kriz/cifar.html) and complete "DataReader.py". You can download any version of the dataset, but make sure you write the corresponding code in "DataReader.py" to read it.

   (b) (10 points) Implement data augmentation. To complete "ImageUtils.py", you will implement the augmentation process for a single image using numpy.

   (c) (30 points) Complete "NetWork.py". Read the required materials carefully before this step. You are asked to implement two versions of ResNet: version 1 uses the original residual blocks (Figure 4(a) in [2]) and version 2 uses full pre-activation residual blocks (Figure 4(e) in [2]). In particular, for version 2, implement bottleneck blocks instead of standard residual blocks (an illustrative sketch of one such block appears after this problem). In this step, only basic PyTorch APIs in torch.nn and torch.optim are allowed.

   (d) (20 points) Complete "Model.py". Note: for this step and the previous step, pay attention to how to use batch normalization.

   (e) (20 points) Tune all the hyperparameters in "main.py" and report your final testing accuracy.
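To give a sense of the layer ordering that part (c) asks for, below is a minimal sketch of a full pre-activation bottleneck block with an identity shortcut, following the BN → ReLU → conv ordering of Figure 4(e) in [2]. The class name, constructor arguments, and the identity-only shortcut are illustrative assumptions; the starter code's "NetWork.py" prescribes its own interface (e.g., strides and projection shortcuts when dimensions change), so use this only as a reference for the block structure rather than copying it verbatim.

```python
import torch
import torch.nn as nn

class PreActBottleneck(nn.Module):
    """Illustrative full pre-activation bottleneck block (BN -> ReLU -> conv), identity shortcut."""

    def __init__(self, channels: int, bottleneck_channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(bottleneck_channels)
        self.conv2 = nn.Conv2d(bottleneck_channels, bottleneck_channels,
                               kernel_size=3, padding=1, bias=False)
        self.bn3 = nn.BatchNorm2d(bottleneck_channels)
        self.conv3 = nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-activation: normalization and non-linearity come before each convolution,
        # so the shortcut path stays a clean identity and the raw input is added back at the end.
        out = self.conv1(self.relu(self.bn1(x)))
        out = self.conv2(self.relu(self.bn2(out)))
        out = self.conv3(self.relu(self.bn3(out)))
        return out + x
```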