Homework 2: Emotion Classification with Neural Networks (100 points)
Kathleen McKeown, Fall 2021
COMS W4705: Natural Language Processing
Due 10/20/2021 at 11:59pm ET

Please post all clarification questions about this homework on the class Edstem under the "hw2" folder. If your question includes code or a partial solution, please post it privately to the instructors ONLY.

Overview

This homework will serve as your introduction to using neural networks, a powerful tool in NLP. You will be doing emotion classification, in which you assign one of several emotion labels to a piece of text. You will implement and apply different neural architectures to this problem using PyTorch and work through the math that defines them.

The data you will use for this homework is taken from a CrowdFlower dataset [1] from which a subset was used in an experiment published on Microsoft Azure AI Gallery [2].

[1] https://www.figure-eight.com/data-for-everyone/
[2] https://gallery.azure.ai/Experiment/Logistic-Regression-for-Text-Classification-Sentiment-Analysis-1

1 Programming (46 points)

To get started, download the provided code from the website. Ultimately, the provided code and the code you write should work together to load the text data from the data/crowdflower_data.csv file, preprocess it and place it into PyTorch DataLoaders, create a number of models, train them on the training and development data, and test them on a blind test set. Most of the code is already written for you; you will complete this assignment by filling in some code of your own.

1.1 Provided Resources

The data in data/crowdflower_data.csv consists of tweets labeled with 4 emotion labels: 'neutral', 'happiness', 'worry', and 'sadness'. Each tweet has exactly one label. The data is not pre-processed in any way.

Code is already provided to 1) load, preprocess, and vectorize the data; 2) load pretrained 100-dimensional GloVe embeddings; and 3) test a generic model on the test set. The preprocessing code is located in utils.py. You may not modify test_model().

The main() function in hw2.py is provided to start you out; it loads and preprocesses the data, and will save it to file if you set FRESH_START = True and load it if you set FRESH_START = False so that you do not have to re-process the data every time you run the code. You should use this function to run and test your code.

1.2 Tasks

In this assignment you will need to do the following:

1. Fill in the train_model() function in hw2.py to train your models (§1.2.1)
2. Implement a feed-forward neural network (dense network)
3. Implement a recurrent neural network
4. Implement 2 extensions of your choice
5. Fill in main() in hw2.py to run your models

1.2.1 Training Code

You will need to fill in the train_model() function in hw2.py to train your models. You may not modify the function header. The train_model() function you submit should do the following:

• Train by looping through minibatches of the whole training set;
• Calculate the loss on each minibatch (between the gold labels and your model output) using the existing loss function;
• Do backpropagation with the loss from each minibatch and take an optimizer step;
• At the end of each epoch, calculate and print the total loss on the development set;
• Train until the loss on the development set stops improving; and
• Return the trained model.
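A minimal sketch of such a loop follows. The argument names (train_loader, dev_loader, loss_fn, optimizer) are assumptions for illustration; match them to the actual function header in the starter code, which you may not modify.

```python
import copy
import torch

def train_model(model, loss_fn, optimizer, train_loader, dev_loader):
    """Sketch of the required training loop; not the official solution."""
    best_dev_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epoch = 0
    while True:
        # One pass through minibatches of the whole training set.
        model.train()
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(X_batch), y_batch)  # the existing loss function
            loss.backward()                          # backpropagation
            optimizer.step()                         # optimizer step

        # Total loss on the development set at the end of each epoch.
        model.eval()
        dev_loss = 0.0
        with torch.no_grad():
            for X_batch, y_batch in dev_loader:
                dev_loss += loss_fn(model(X_batch), y_batch).item()
        epoch += 1
        print(f"Epoch {epoch}: dev loss = {dev_loss:.4f}")

        # Stop once the development loss stops improving. A patience-based
        # scheme would count as one of the optimization extensions in §1.2.3.
        if dev_loss >= best_dev_loss:
            break
        best_dev_loss = dev_loss
        best_state = copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)
    return model
```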
1.2.2 Models

You will implement two basic models, a dense neural network and a recurrent neural network, by filling in the init() and forward() functions for the DenseNetwork and RecurrentNetwork classes in the models.py file.

Dense Network

Your dense network should follow the computation graph below:

[Figure: computation graph of the dense network. The input vectors $x_1, x_2, \ldots, x_m$ are combined by pool, then pass through Layer 1 ($W_1$, $b_1$) and the non-linearity $f$, then Layer 2 ($W_2$, $b_2$), and finally the softmax $s$ produces the output $\hat{y}$.]

where $s$ is the softmax function and pool is a function $g : \mathbb{R}^{d \times m} \to \mathbb{R}^d$ that takes as input a sequence of $m$ vectors in $\mathbb{R}^d$ and outputs a single vector in $\mathbb{R}^d$; for example, max-pooling. Note that $f$ is a non-linear function and $y \in \mathbb{R}^4$.

Recurrent Network

Your recurrent network should follow the computation graph below:

[Figure: computation graph of the recurrent network (not recovered in this copy).]

You can choose the type of RNN (plain RNN, LSTM, GRU).
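As a starting point, here is a minimal skeleton of the two classes. It assumes the GloVe vectors are passed in as a tensor, that the provided loss applies the softmax (so forward() returns logits), and that the constructor signature and hidden size are free choices; none of this is dictated by the assignment.

```python
import torch
import torch.nn as nn

class DenseNetwork(nn.Module):
    """Sketch of the pooled feed-forward model from the computation graph."""
    def __init__(self, embeddings, hidden_dim=64, num_classes=4):
        super().__init__()
        # Pretrained 100-dimensional GloVe vectors, loaded by the provided code.
        self.embedding = nn.Embedding.from_pretrained(embeddings)
        self.layer1 = nn.Linear(embeddings.shape[1], hidden_dim)   # W1, b1
        self.layer2 = nn.Linear(hidden_dim, num_classes)           # W2, b2

    def forward(self, x):                 # x: (batch, seq_len) of token ids
        emb = self.embedding(x)           # (batch, seq_len, embed_dim)
        pooled, _ = emb.max(dim=1)        # pool = max-pooling over the sequence
        h = torch.relu(self.layer1(pooled))   # f = ReLU here, as an example
        return self.layer2(h)             # logits; softmax applied in the loss

class RecurrentNetwork(nn.Module):
    """Sketch using a GRU; any of plain RNN, LSTM, or GRU is allowed."""
    def __init__(self, embeddings, hidden_dim=64, num_classes=4):
        super().__init__()
        self.embedding = nn.Embedding.from_pretrained(embeddings)
        self.rnn = nn.GRU(embeddings.shape[1], hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        emb = self.embedding(x)
        _, h_n = self.rnn(emb)            # h_n: (1, batch, hidden_dim)
        return self.out(h_n.squeeze(0))   # classify from the final hidden state
```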
1.2.3 Extensions

Finally, experiment with the architecture and training of your networks. Select 2 non-trivial extensions and create a new network class or classes to test them out.

NOTE: if you change the pre-processing, you must put this into different functions so that your original dense and recurrent networks run without the extensions.

Choose from the following non-trivial (i.e., acceptable) extensions. You can also design your own extension if it is of similar complexity to these.

• A different word embedding setting (using a different set of embeddings trained on an emotion-related task, training your own word embeddings on the corpus, etc.)
• Changes to the preprocessing of the data (tokenizers specifically for Tweets, etc.)
• Architecture changes (adding attention to your recurrent network, CNN, hierarchical encoder (e.g., characters and words))
• Different strategies for optimization and training (adding a learning rate scheduler, doing more complicated forms of early stopping, etc.)
• Use of features from an emotion lexicon of your choice
• Embedding polarity from a sentiment lexicon of your choice
• Use of features from an emoji lexicon

Some examples of trivial (i.e., unacceptable) extensions include:

• Changing the non-linearity or pooling function
• Changing hidden sizes, batch size, etc.
• Adding dropout

These extensions should all be present in the hw2.py and models.py you submit.

2 Written Problems (54 points)

For the written part, you will do an exercise that will help you understand how neural networks work on a mathematical level. You will also do some homework problems related to part-of-speech tagging and syntax. There are four questions in this part:

1. Coding reflections (4 points total)
2. Backpropagation with RNNs (28 points total)
3. Context-free grammar (12 points total)
4. Viterbi algorithm (10 points total)

Submit the answers to these problems, with your work, in your typeset submission as described in §3.2. (You do not have to show work for multiplying two matrices together.)

2.1 Coding Reflections (4 points)

Answer the following questions about the programming portion of this assignment:

• What extensions did you try in §1.2.3? Where in the code did you need to implement them? Why did you think each of them might improve your performance? What was the actual effect of each one, and why do you think that happened? (For each extension, write a short paragraph.) (4 points, 2 per extension)

2.2 Backpropagation with RNNs (28 points total)

In this section, you will see how a recurrent neural network learns by working through one iteration of stochastic gradient descent (SGD).

2.2.1 Forward Propagation (11 points)

Suppose we have a simple recurrent neural network whose architecture is described as:

[Figure: the RNN architecture. The input $x_t$ feeds into $h_t$ through $W_x$; $h_t$ feeds back into itself through $W_h$, $b_h$; and $h_t$ feeds into $\hat{y}_t$ through $W_y$, $b_y$.]

and which has the following parameter specifications:

$$\hat{y}_t = f(p_t), \qquad p_t = W_y^\top h_t + b_y$$
$$h_t = f(q_t), \qquad q_t = W_x^\top x_t + W_h^\top h_{t-1} + b_h$$

where $x_t \in \mathbb{R}$, $h_t \in \mathbb{R}^2$, $\hat{y}_t \in \mathbb{R}$, $f$ is the ReLU activation function specified as $f(x) = \max(0, x)$, and

$$W_x = \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \quad W_y = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad W_h = \begin{bmatrix} 1 & 3 \\ 2 & 1 \end{bmatrix}, \quad b_h = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad b_y = \begin{bmatrix} 1 \end{bmatrix}, \quad h_0 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

Questions:

The first step in SGD is to predict the RNN's output on a given input via forward propagation. Suppose we have an input $X = [-4 \;\; 1]$, such that $x_{t=1} = -4$ and $x_{t=2} = 1$. Let the corresponding targets (actual values of $y$) be $Y = [10 \;\; 20]$, that is, $y_{t=1} = 10$ and $y_{t=2} = 20$.

1. (10 points) Calculate the output and hidden state of the RNN (Hint: consider how the network unfolds over time). Specifically, calculate the following:
   a) $h_{t=1}$ (2 points)
   b) $\hat{y}_{t=1}$ (2 points)
   c) $h_{t=2}$ (3 points)
   d) $\hat{y}_{t=2}$ (3 points)

2. (1 point) Now that we have the predicted output of the RNN, we determine the 'error' or loss with respect to the gold labels. Calculate the loss using the Mean Squared Error (MSE) loss function:

$$L_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{m_i} \left( y_t^{(i)} - \hat{y}_t^{(i)} \right)^2$$

where $N$ is the number of examples, $m_i$ is the sequence length for example $i$, $\hat{y}_t^{(i)}$ is the predicted value for example $i$ at timestep $t$, and $y_t^{(i)}$ is the actual value for example $i$ at timestep $t$.

2.2.2 Backpropagation Through Time (17 points)

Backpropagation is the key to learning via SGD. For recurrent networks, we use a specific form of backpropagation, called backpropagation through time (BPTT), to compute gradients of network parameters with respect to the loss. In order to do BPTT, we must first expand the computational graph of the RNN in order to obtain the dependencies among model parameters. We say a parameter $a$ is dependent on another parameter $b$ if $b$ is used to calculate $a$ during forward propagation. Once we determine the dependencies between parameters and calculate gradients with respect to the loss, the final step during learning is to update the network's parameters.

Questions:

1. (4 points) Draw the computation graph of this RNN. Hint: we have provided the first step of the graph; fill in the rest.

[Figure: the first step of the computation graph, showing $L_{\mathrm{MSE}}$ with incoming edges from $y_{t=1}$, $\hat{y}_{t=1}$, $y_{t=2}$, and $\hat{y}_{t=2}$.]

2. (10 points; 2 points each) Calculate the gradients with respect to the parameters of the RNN. You must show your work and your answers should be numerical (not equations). Specifically, calculate the following:
   a) $\partial L_{\mathrm{MSE}} / \partial W_y$
   b) $\partial L_{\mathrm{MSE}} / \partial W_h$
   c) $\partial L_{\mathrm{MSE}} / \partial W_x$
   d) $\partial L_{\mathrm{MSE}} / \partial b_y$
   e) $\partial L_{\mathrm{MSE}} / \partial b_h$

Hints: 1) use numerator layout for intermediate vector and matrix derivatives, and watch out for dimension mismatches (check out the corresponding tables in the Wikipedia article on matrix calculus); 2) some multiplications may involve 3D tensors; think about how individual elements of the vectors/matrices/tensors in your expression relate to each other; 3) if $W$ is a weight matrix we want to update, each element in $\partial L_{\mathrm{MSE}} / \partial W$ must correspond to the element in the same position in $W$.

3. (3 points) Update the following parameters of our RNN. Recall that the standard SGD update is

$$W \leftarrow W - \eta \, \frac{\partial L_{\mathrm{MSE}}}{\partial W}$$

Let the learning rate be $\eta = 0.01$. Specifically, update:
   a) $b_y$ (1 point)
   b) $W_h$ (2 points)
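Once you have worked through these questions by hand, you can sanity-check your numbers with PyTorch autograd. The sketch below is a checking aid, not part of the required submission; it assumes the column-vector convention above, storing $W_x$, $W_y$, $b_h$, and $h_0$ as 1-D tensors and $b_y$ as a scalar.

```python
import torch

# Parameters from the problem statement.
Wx = torch.tensor([1., 3.], requires_grad=True)
Wh = torch.tensor([[1., 3.], [2., 1.]], requires_grad=True)
Wy = torch.tensor([3., 1.], requires_grad=True)
bh = torch.tensor([1., 2.], requires_grad=True)
by = torch.tensor(1., requires_grad=True)

h = torch.tensor([2., 1.])            # h_0
loss = torch.tensor(0.)
for x, y in [(-4., 10.), (1., 20.)]:  # (x_t, y_t) for t = 1, 2
    q = x * Wx + Wh.T @ h + bh        # q_t = Wx^T x_t + Wh^T h_{t-1} + b_h
    h = torch.relu(q)                 # h_t = f(q_t)
    y_hat = torch.relu(Wy @ h + by)   # yhat_t = f(p_t)
    loss = loss + (y - y_hat) ** 2    # MSE with N = 1 example

loss.backward()                       # BPTT via autograd
print(loss.item())
print(Wy.grad, Wh.grad, Wx.grad, by.grad, bh.grad)
with torch.no_grad():
    print(by - 0.01 * by.grad)        # SGD update for b_y with eta = 0.01
```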
2.3 Syntax and Grammar

You have been hired as a linguist by the Grammy Award-winning country music superstar, Kacey Musgraves. She wants you to help her better understand the syntactic meaning of her lyrics. In the following sections you will answer questions about the syntax of two of her most popular songs, Space Cowboy and Slow Burn.

2.3.1 Parse Trees (12 points)

(a) Boots were not made for sitting by the door.
(b) You do not want to stay anymore.
(c) You can have your space, cowboy.
(d) I am not going to fence you in.

1. (8 points) Provide a context-free grammar that can be used to parse sentences (a) through (d). Your grammar should provide at least one parse tree that captures the correct meaning of each sentence. Ideally, your rules would also handle similar sentences not in the list above.

Keep in mind that siblings on the right-hand side of a rule indicate modification, and the right-hand side provides the sub-structure of the non-terminal on the left-hand side. You can use recursion to allow for multiple modifiers. Modification should be represented correctly in the generated parse tree.

The grammar must consist of at most 14 rules. Of those 14 rules, at most two rules may have three non-terminals on the right-hand side, and the rest must have two non-terminals on the right-hand side. Your grammar should not allow ungrammatical sentences to be generated. We provide the following rules for free (these do not count against your limit):

S -> Pron VP
S -> N VP
Pron -> You/you | your | I
V -> were | made | sitting | do | want | stay | have | am | going | fence
N -> Boots | door | space | cowboy
ADV -> not | anymore | in
P -> for | by | to
D -> the
MV -> can

For the given rules, note that the OR operator counts as multiple rules. In other words, the rule X -> Y | Z is also expressed as X -> Y and X -> Z. You are welcome to express your rules in either format, but note that we will count rules written with the OR operator separately; in this example we have written two rules for X, not one.

2. (4 points) Show all possible parse trees that your grammar allows to generate sentences (a)-(d). (You may find it helpful to machine-check your grammar; see the sketch below.)
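NLTK can parse with a CFG written as a string, which makes it easy to sanity-check a candidate grammar. The sketch below is optional and not required by the assignment; it contains only the free rules, written in NLTK's quoted-terminal syntax, so it will find no parse until you add your own rules.

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S -> Pron VP | N VP
    Pron -> 'You' | 'you' | 'your' | 'I'
    V -> 'were' | 'made' | 'sitting' | 'do' | 'want' | 'stay' | 'have' | 'am' | 'going' | 'fence'
    N -> 'Boots' | 'door' | 'space' | 'cowboy'
    ADV -> 'not' | 'anymore' | 'in'
    P -> 'for' | 'by' | 'to'
    D -> 'the'
    MV -> 'can'
""")
# VP (and any other non-terminals you introduce) are yours to define;
# with no VP productions the parser simply finds no trees.
parser = nltk.ChartParser(grammar)
for tree in parser.parse("You can have your space cowboy".split()):
    tree.pretty_print()
```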

2.3.2 Viterbi Algorithm (10 points)

        It     is     a      slow   burn
N       .4     0      .1     0      .5
V       0      .8     .1     .05    .05
D       0      0      1      0      0
A       0      0      0      1      0

Table 1: Observation Likelihoods. The rows are labeled with the tags.

        N      V      D      A
<s>     .6     .1     .2     .1
N       .2     .8     0      0
V       .05    .05    .8     .1
D       .5     0      0      .5
A       .8     0      0      .2

Table 2: Tag transition probabilities. The rows are labeled with the conditioning event. Thus, P(N|<s>) = .6.

Given the sentence "It is a slow burn.", show how you would compute the probability of "burn" as a verb versus the probability of "burn" as a noun in the context of this sentence, using the probabilities in Tables 1 and 2 and the Viterbi algorithm. For this question you should:

1. (8 points) Show the dynamic programming trellis at each state up to and including the point where "burn" is disambiguated.
2. (2 points) Show the formula with the values that would be used to compute the
probability of “burn” as either verb or noun. You do not need to do the arithmetic.
Just show the formula that would be computed.
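To check your trellis, a direct implementation of Viterbi over Tables 1 and 2 is sketched below. The variable names are ours and the probabilities are copied from the tables; this is a checking aid, not a required deliverable.

```python
import numpy as np

tags = ["N", "V", "D", "A"]
words = ["It", "is", "a", "slow", "burn"]

# Observation likelihoods P(word | tag) from Table 1 (rows: tags, cols: words).
B = np.array([[.4,   0, .1,   0,  .5],
              [ 0,  .8, .1, .05, .05],
              [ 0,   0,  1,   0,   0],
              [ 0,   0,  0,   1,   0]])

# Tag transition probabilities from Table 2; pi is the <s> row.
pi = np.array([.6, .1, .2, .1])
A = np.array([[.2,  .8,  0,  0],
              [.05, .05, .8, .1],
              [.5,   0,  0, .5],
              [.8,   0,  0, .2]])

T, K = len(words), len(tags)
v = np.zeros((T, K))            # v[t, j]: best path probability ending in tag j
v[0] = pi * B[:, 0]
for t in range(1, T):
    for j in range(K):
        v[t, j] = max(v[t - 1, i] * A[i, j] for i in range(K)) * B[j, t]

for t, w in enumerate(words):   # one trellis column per word
    print(w, dict(zip(tags, np.round(v[t], 6))))
```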
3 Deliverables and Submission Instructions
PLEASE READ:
• We WILL NOT accept code that does not run in Python 3.6.*; this
includes code broken by Python 2 print statements.
• We WILL NOT accept hand-written work for the written portion EXCEPT
for the computation graph, trellis, and parse trees.
• If we cannot find your extensions with CTRL+F for "extension-grading", you will
lose points.
3.1 Programming (46 points total)
Submit one zip file named <UNI>_hw2.zip to CourseWorks.
This should have exactly the following files. Please DO NOT include the data files in
the zip. Include any external files needed to run your code (e.g., word embeddings we
did not provide you).
Your zip file should have at least the following:
• (8 points total) hw2.py including:
– (8 points) The train_model() function as in §1.2.1.
– A main() function that runs your dense and recurrent models without any
extensions, and then runs your extensions separately. This will be used to
grade your F1 score.
• (18 points total) models.py, including:
– (6 points) Your DenseNetwork class as in §1.2.2.
– (12 points) Your RecurrentNetwork class as in §1.2.2.
• (14 points total) Extensions:
– The extensions may be anywhere in your code; you MUST tag them with a
comment pointing them out, and include the verbatim text “extension-grading”
in that comment, or they will not be graded!
– Each extension is worth 7 points. If you do more than two, you will be graded
on the first two listed in your written answers.
• utils.py: As provided, you do not need to change this.
• (2 points total) Documentation: Your code should be documented in a meaningful
way and implemented efficiently. This can mean expressive function/variable
names as well as informative comments.
• (4 points) F1 Score: You will also be graded on the performance of your dense
and recurrent models using the provided test_model() function. You will receive
2 points for each model if you achieve a macro F1 score of at least 40 for that
model.
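For reference, macro F1 averages the per-class F1 scores, so every emotion class counts equally regardless of frequency. A quick way to compute it outside of test_model() (a sketch, assuming scikit-learn is available; the label arrays here are made up):

```python
from sklearn.metrics import f1_score

# Hypothetical gold and predicted label indices for the 4 emotion classes.
y_true = [0, 1, 2, 3, 0, 1]
y_pred = [0, 1, 2, 2, 0, 3]
print(f1_score(y_true, y_pred, average="macro"))  # macro F1 in [0, 1]
```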
3.2 Written Answers (54 points total)
You should submit the following on Gradescope:
• A hw2-written.pdf file containing your name, email address, the homework
number, and your answers for the written portion. If you are using any late days
for this assignment, note them at the top of this file. This file should include:
1. Coding Reflections (4 points)
2. Recurrent Neural Network (28 points)
3. Context-Free Grammar (12 points)
4. Viterbi Algorithm (10 points)
4 Academic integrity
Copying or paraphrasing someone’s work (code included), or permitting your own work
to be copied or paraphrased, even if only in part, is not allowed, and will result in
an automatic grade of 0 for the entire assignment or exam in which the copying or
paraphrasing was done. Your grade should reflect your own work. If you believe you are
going to have trouble completing an assignment, please talk to the instructor or a TA in
advance of the due date. Note that for the programming portion:
• You may consult internet tutorials and official package source code (e.g., PyTorch)
but you may not copy ANY PORTION without citation.
• You may NOT consult any other code including: Github repositories and other
students’ solutions.

• You may NOT post your homework solutions publicly on Github or allow another
student to see any portion of your solution.