Assignment 3: Fine tuning a multiclass classification BERT model

Description: This assignment covers fine-tuning of a multiclass classifier. You will compare two different types of solutions using BERT-based models.

You should also be able to develop an intuition for:

- Working with BERT
- The effects of using different model checkpoints and fine-tuning some hyperparameters
- Different metrics to measure the effectiveness of your model
- The effect of partially cleaning/normalizing your training data

The assignment notebook closely follows the lesson notebooks. We will use the 20 newsgroups dataset and will leverage some of the models, or parts of the code, for our current investigation.

You are strongly encouraged to read through the entire notebook before answering any questions or writing any code.

The initial part of the notebook is purely setup. We will then generate our BERT model and see if and how we can improve it.

Do not try to run this entire notebook on your GCP instance, as training the models requires a GPU to finish in a timely fashion. This notebook should be run on Google Colab with a GPU; by default, when you open the notebook in Colab it will try to use one. Total runtime of the entire notebook (with solutions and a Colab GPU) should be about 1h.

Open in Colab

The overall assignment structure is as follows:

1. Setup
   1.1 Libraries & Helper Functions
   1.2 Data Acquisition
   1.3 Training/Test/Validation Sets for BERT-based models
2. Classification with a fine-tuned BERT model
   2.1 Create the specified BERT model
   2.2 Fine-tune the BERT model as directed
   2.3 Examine the predictions with various metrics
3. Classification with some preprocessed data and the BERT model
   3.1 Clean up the data a bit
   3.2 Regenerate the data with the appropriate tokenizer
   3.3 Regenerate the BERT model
   3.4 Rerun the data and examine the predictions
4. Try again with a different mini batch size to see if that improves performance

INSTRUCTIONS:

Questions are always indicated as QUESTION:, so you can search for this string to make sure you answered all of the questions. You are expected to fill out, run, and submit this notebook, as well as to answer the questions in the answers file as you did in a1 and a2.

### YOUR CODE HERE indicates that you are supposed to write code.

If you want to, you can run all of the cells in section 1 in bulk. This is setup work and no questions are in there.
At the end of section 1 we will state all of the relevant variables that were defined and created in section 1.

1. Setup

Let's get all our libraries, then download and process our data.

In [1]: !pip install -q transformers

In [2]: !pip install pydot --quiet

In [3]: from sklearn.datasets import fetch_20newsgroups
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

In [4]: from collections import Counter
        import numpy as np
        import tensorflow as tf
        from tensorflow import keras
        import seaborn as sns
        import matplotlib.pyplot as plt
        from pprint import pprint

In [5]: from transformers import BertTokenizer, TFBertModel

In [6]: # 4-window plot. Small modification from matplotlib examples.
        def make_plot(axs,
                      history1,
                      history2,
                      y_lim_loss_lower=0.4,
                      y_lim_loss_upper=1.6,
                      y_lim_accuracy_lower=0.4,
                      y_lim_accuracy_upper=0.9,
                      model_1_name='model 1',
                      model_2_name='model 2'):
            box = dict(facecolor='yellow', pad=5, alpha=0.2)

            ax1 = axs[0, 0]
            ax1.plot(history1.history['loss'])
            ax1.plot(history1.history['val_loss'])
            ax1.set_title('loss - ' + model_1_name)
            ax1.set_ylabel('loss', bbox=box)
            ax1.set_ylim(y_lim_loss_lower, y_lim_loss_upper)

            ax3 = axs[1, 0]
            ax3.set_title('accuracy - ' + model_1_name)
            ax3.plot(history1.history['accuracy'])
            ax3.plot(history1.history['val_accuracy'])
            ax3.set_ylabel('accuracy', bbox=box)
            ax3.set_ylim(y_lim_accuracy_lower, y_lim_accuracy_upper)

            ax2 = axs[0, 1]
            ax2.set_title('loss - ' + model_2_name)
            ax2.plot(history2.history['loss'])
            ax2.plot(history2.history['val_loss'])
            ax2.set_ylim(y_lim_loss_lower, y_lim_loss_upper)

            ax4 = axs[1, 1]
            ax4.set_title('accuracy - ' + model_2_name)
            ax4.plot(history2.history['accuracy'])
            ax4.plot(history2.history['val_accuracy'])
            ax4.set_ylim(y_lim_accuracy_lower, y_lim_accuracy_upper)

In [7]: def read_20newsgroups(test_size=0.1):
            # download & load the 20 newsgroups dataset from sklearn's repos
            dataset = fetch_20newsgroups(subset="all", shuffle=True, remove=("headers", "footers", "quotes"))
            documents = dataset.data
            labels = dataset.target
            # split into training & testing and return the data as well as the label names
            return train_test_split(documents, labels, test_size=test_size), dataset.target_names

        # call the function
        (train_texts, test_texts, train_labels, test_labels), target_names = read_20newsgroups()

In [8]: train_texts[:2]

Take a look at the records. We basically have a long string of text and an associated label. That label is the Usenet group where the posting occurred. The records are the raw text, and they vary significantly in size.

Notice that the "labels" are just integers that are offsets into the list of target names. The variable target_names stores all of the names of the labels.

In [9]: train_labels[:2]

In [10]: print(target_names)
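As a quick illustrative check (not part of the assignment), you can recover the newsgroup name for any record simply by indexing into target_names with its integer label:

    # Each label is an index into target_names.
    for label in train_labels[:3]:
        print(label, '->', target_names[label])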
We already have a test set and a train set. Let's explicitly set aside part of our training set for validation purposes. The validation set will always have 961 records; the training set will always have 16000 records.

In [11]: len(train_texts)

In [ ]: valid_texts = train_texts[16000:]
        valid_labels = train_labels[16000:]
        train_texts = train_texts[:16000]
        train_labels = train_labels[:16000]

In [12]: len(valid_texts)

In [13]: len(train_texts)

In [14]: #get the labels in a needed data format for validation
         npvalid_labels = np.asarray(valid_labels)

Here are the relevant variables defined and created in section 1:

train_texts - an array of text strings for training
test_texts - an array of text strings for testing
valid_texts - an array of text strings for validation
train_labels - an array of integers representing the labels associated with train_texts
test_labels - an array of integers representing the labels associated with test_texts
valid_labels - an array of integers representing the labels associated with valid_texts
target_names - an array of label strings that correspond to the integers in the *_labels arrays

2. Classification with a fine tuned BERT model

Let's pick our BERT model. We'll start with the base BERT model, and we'll use the cased version since our data has capital and lowercase letters.

In [15]: #make it easier to use a variety of BERT subword models
         model_checkpoint = 'bert-base-cased'

In [16]: bert_tokenizer = BertTokenizer.from_pretrained(model_checkpoint)
         bert_model = TFBertModel.from_pretrained(model_checkpoint)

We're setting our maximum training record length to 200. BERT models can handle more, and after you've completed the assignment you're welcome to try larger and smaller record lengths.

In [17]: max_length = 200

Now we'll tokenize our three data slices. This will take a minute or two.

In [18]: # tokenize the dataset, truncate when past `max_length`,
         # pad with 0's when less than `max_length`, and return a tf Tensor
         train_encodings = bert_tokenizer(train_texts, truncation=True, padding='max_length', max_length=max_length, return_tensors='tf')
         valid_encodings = bert_tokenizer(valid_texts, truncation=True, padding='max_length', max_length=max_length, return_tensors='tf')
         test_encodings = bert_tokenizer(test_texts, truncation=True, padding='max_length', max_length=max_length, return_tensors='tf')

In [19]: train_encodings.input_ids[:1]

Notice our input_ids for the first training record and their padding. The train_encodings also includes a token_type_ids array and an attention_mask array.
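If you want to see what the tokenizer actually produced (an illustrative check, not part of the assignment), you can peek at those parallel arrays and convert a record's input_ids back into subword tokens:

    # Segment ids are all 0 for single-sequence input; the attention mask is 1
    # for real tokens and 0 for padding.
    print(train_encodings.token_type_ids[0][:10])
    print(train_encodings.attention_mask[0][:10])
    # Recover the subword tokens, including the [CLS] and [SEP] markers.
    print(bert_tokenizer.convert_ids_to_tokens(train_encodings.input_ids[0].numpy()[:10]))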
Write a function to create this multiclass BERT model. Keep in mind the following:

- Each record can have one of n labels, where n = the size of target_names.
- We'll still want a hidden layer of size 100.
- We'll also want dropout.
- Our classification layer will need to be appropriately sized and use the correct non-linearity for a multi-class problem.
- Since we have multiple labels, we can no longer use binary cross entropy. Instead we need to change our loss to a categorical cross entropy. Which of the two categorical cross entropy metrics will work best here?

QUESTION: 2.1 How many trainable parameters are in your dense hidden layer?

QUESTION: 2.2 How many trainable parameters are in your classification layer?

In [20]: def create_bert_multiclass_model(train_layers=-1,
                                          hidden_size=100,
                                          dropout=0.3,
                                          learning_rate=0.00005):
             """Build a simple classification model with BERT.
                Use the Pooled Output for classification purposes."""
             ### YOUR CODE HERE

             #restrict training to the train_layers outer transformer layers

             ### END YOUR CODE
             return classification_model
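As a reference point, here is one minimal sketch of what such a builder could look like. It is not the official solution: the layer-freezing logic, the assumption that a negative train_layers means "train everything", and the use of the pooler_output field are choices made here for illustration, not requirements stated above.

    def create_bert_multiclass_model_sketch(train_layers=-1,
                                            hidden_size=100,
                                            dropout=0.3,
                                            learning_rate=0.00005):
        # Three parallel inputs, matching the order used in the fit() calls below.
        input_ids = keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='input_ids')
        token_type_ids = keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='token_type_ids')
        attention_mask = keras.layers.Input(shape=(max_length,), dtype=tf.int32, name='attention_mask')

        # Freeze all but the train_layers outermost transformer layers
        # (assumption: a negative value leaves every layer trainable).
        if train_layers >= 0:
            n_layers = len(bert_model.bert.encoder.layer)
            for layer in bert_model.bert.encoder.layer[:n_layers - train_layers]:
                layer.trainable = False

        bert_outputs = bert_model(input_ids,
                                  token_type_ids=token_type_ids,
                                  attention_mask=attention_mask)
        pooled = bert_outputs.pooler_output  # [CLS]-based pooled representation

        hidden = keras.layers.Dense(hidden_size, activation='relu')(pooled)
        hidden = keras.layers.Dropout(dropout)(hidden)
        # One output per newsgroup, with a softmax non-linearity.
        classification = keras.layers.Dense(len(target_names), activation='softmax')(hidden)

        classification_model = keras.Model(
            inputs=[input_ids, token_type_ids, attention_mask],
            outputs=classification)
        # The labels here are integers (not one-hot vectors), so this sketch
        # pairs the softmax output with the *sparse* categorical cross entropy loss.
        classification_model.compile(
            optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
            loss=keras.losses.SparseCategoricalCrossentropy(),
            metrics=['accuracy'])
        return classification_model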
This is based on gethering all the predictions from our ourtest set.validation_data=([npvalid_labels),batch_size=16,epochs=1)In [ ]: #batch 8, ML=200score = pooled_bert_model.evaluate([test_encodings.input_ids, test_encodingstest_labels)print('Test loss:', score[0])print('Test accuracy:', score[1])In [26]: #run predict for the first three elements in the test data setpredictions = pooled_bert_model.predict([test_encodings.input_ids[:3In [27]: predictionsIn [28]: #run and capture all predictions from our test set using model.predict### YOUR CODE HERE### END YOUR CODE#now we need to get the highest probability in the distribution for each prediction#and store that in a tf.Tensorpredictions = tf.argmax(predictions, axis=-1)predictions6/21/22, 2:47 PM2022-summer-main/Multiclass_text_classification.ipynb at master · datasci-w266/2022-summer-mainPage 9 of 12https://github.com/datasci-w266/2022-summer-main/blob/master/assignment/a3/Multiclass_text_classification.ipynbQUESTION: 2.4 What is the macro average f1 score you get from theclassification report for batch size 8?Now we'll generate another very valuable visualization of what's happening withour classifier -- a confusion matrix.And now we'll display it!3. Classification with some preprocessed data and theBERT modelOkay, not bad. As you saw there are a lot of odd characters in our input somaybe cleaning some of those out and forcing everything to lower case whilerunning a bert-base-uncased model will give us some imporvement in ourprediciotns. Let's give that a shot. First let's clean out our text a bit. Remember,it is critical that we preform identical preprocessing on our training, text, andvalidation sets.In [29]: print(classification_report(test_labels, predictions.numpy(), target_namesIn [30]: cm = tf.math.confusion_matrix(test_labels, predictions)cm = cm/cm.numpy().sum(axis=1)[:, tf.newaxis]In [31]: plt.figure(figsize=(20,7))sns.heatmap(cm, annot=True,xticklabels=target_names,yticklabels=target_names)plt.xlabel("Predicted")plt.ylabel("True")In [ ]:In [32]: def preprocess(sentence):sentence=str(sentence)sentence = sentence.lower()sentence = sentence.replace('\n', ' ')#what other characters or strings might you replace to clean up this data#we don't expect a full set. Please enter six of them here.### YOUR CODE HERE### END YOUR CODEreturn sentence6/21/22, 2:47 PM2022-summer-main/Multiclass_text_classification.ipynb at master · datasci-w266/2022-summer-mainPage 10 of 12https://github.com/datasci-w266/2022-summer-main/blob/master/assignment/a3/Multiclass_text_classification.ipynbCall the function to recreate our BERT model only this time it will use themodel_checkpoint of bert-base-uncased.This will only display a plot if we've run for more than one epoch. 
We're notasking you to run more than one in this assignment but when you're done youmight try running another just to see how much more the model learns.return sentencecleantrain_texts = list(map(preprocess, train_texts))#you need to make sure you apply the same preprocessing to the test and validation sets### YOUR CODE HERE### END YOUR CODEIn [33]: cleantrain_texts[:2]In [34]: cleantest_texts[:2]In [35]: model_checkpoint = 'bert-base-uncased'bert_uctokenizer = BertTokenizer.from_pretrained(model_checkpoint)bert_model = TFBertModel.from_pretrained(model_checkpoint)In [36]: # tokenize the dataset, truncate when passed `max_length`,# and pad with 0's when less than `max_length`cleantrain_encodings = bert_uctokenizer(cleantrain_texts, truncationcleanvalid_encodings = bert_uctokenizer(cleanvalid_texts, truncationcleantest_encodings = bert_uctokenizer(cleantest_texts, truncation=TrueIn [37]: cleanvalid_encodings.input_ids[:2]In [38]: clean_pooled_bert_model = create_bert_multiclass_model()In [39]: clean_pooled_bert_model_history = clean_pooled_bert_model.fit([cleantrain_encodingstrain_labels,validation_data=([npvalid_labelsbatch_size=8,epochs=1)In [ ]: fig, axs = plt.subplots(2, 2)6/21/22, 2:47 PM2022-summer-main/Multiclass_text_classification.ipynb at master · datasci-w266/2022-summer-mainPage 11 of 12https://github.com/datasci-w266/2022-summer-main/blob/master/assignment/a3/Multiclass_text_classification.ipynbQUESTION:3.1 What is the test accuracy you get when you run the cleaned model withbatch size 8?fig, axs = plt.subplots(2, 2)fig.subplots_adjust(left=0.2, wspace=0.6)make_plot(axs,pooled_bert_model_history,clean_pooled_bert_model_history,model_1_name='raw',model_2_name='clean',y_lim_accuracy_lower=0.42,y_lim_accuracy_upper=0.82)fig.align_ylabels(axs[:, 1])fig.set_size_inches(18.5, 10.5)plt.show()In [ ]:In [40]: #Evaluate the fine tuned clean model against the cleaned test data### YOUR CODE HERE### END YOUR CODEprint('Test loss:', score[0])print('Test accuracy:', score[1])In [41]: #run and capture all the predictions from the clean test data### YOUR CODE HERE### END YOUR CODEpredictionsIn [42]: #Generate a confusion matrix using your new clean test predictions# ccm = ...### YOUR CODE HERE### END YOUR CODEIn [43]: #display that new confusion matrixplt.figure(figsize=(20,7))sns.heatmap(ccm, annot=True,xticklabels=target_names,yticklabels=target_names)plt.xlabel("Predicted")plt.ylabel("True")6/21/22, 2:47 PM2022-summer-main/Multiclass_text_classification.ipynb at master · datasci-w266/2022-summer-mainPage 12 of 12https://github.com/datasci-w266/2022-summer-main/blob/master/assignment/a3/Multiclass_text_classification.ipynbQUESTION:3.2 What is the weighted avg F1 score in the classification when you run thecleaned model with batch size of 8?4. Try again with a different mini batch size to see if thatIn [44]: # Run the sklearn classification_report again with the new predictions### YOUR CODE HERE### END YOUR CODE