CS584: Natural Language Processing    Due: Sep. 22, 2021
Assignment 1: Logistic Regression

Homework assignments will be done individually: each student must hand in their own answers. Use of partial or entire solutions obtained from others or online is strictly prohibited. Electronic submission on Canvas is mandatory.

1. Document Classification (100 points)

In this homework, you will classify news articles into four categories (World, Sports, Business, and Science/Technology) by building your own Logistic Regression classifier. The data provided is from AG News. Please follow the steps below:

• (30 pts) Data Preprocessing:
  – (10 pts) Fill in the functions in the Jupyter notebook to remove punctuation, URLs, and numbers, and to convert text to lower case.
  – (5 pts) Tokenize the input text into a list of tokens and remove stopwords.
  – (5 pts) Split the data into training, validation, and test sets.
  – (10 pts) Feature extraction: build your own TF-IDF feature extractor for the provided dataset.
• (60 pts) Build a logistic regression classifier.
  – (10 pts) Given the objective function of a logistic regression (LR) model with L2 regularization:

      $$J = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log\left(\frac{\exp f_k}{\sum_{c=1}^{K}\exp f_c}\right) + \lambda\sum_{j=1}^{d} w_{kj}^{2} \tag{1}$$

    derive the gradient of the objective function of LR with respect to $w_k$. Please write down detailed steps.
  – (20 pts) Implement this Logistic Regression model. This step includes writing code for initialization, the objective function, the gradient, and gradient descent.
  – (15 pts) Stochastic Gradient Descent (SGD): fill in the code for the SGD function.
  – (15 pts) Mini-batch Gradient Descent: fill in the code for the mini-batch GD function.
  – (0 pts) Evaluate your model with the provided code.
• (10 pts) Cross-validation: implement cross-validation on the training data; report the recall and precision for each category on the test and validation sets; choose the best λ using the validation set.

Please follow the instructions below when you submit the assignment.

1. You are NOT allowed to use packages for implementing the code required in this assignment.
2. Your submission should consist of a zip file named Assignment1 LastName FirstName.zip which contains:
   • a Jupyter notebook file (.ipynb). The file should contain the code and the output after execution. You should run all the code provided in the Jupyter notebook. You should also include detailed comments.
   • (optional) a PDF file showing (1) the derivation steps of the gradient of J with respect to $w_k$ in LR (this can be included in the .ipynb file) and (2) an analysis of the results (plots, tables, etc.).
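The first two Data Preprocessing bullets (cleaning, then tokenization with stopword removal) can be sketched with only the Python standard library. This is a minimal sketch under assumptions, not the notebook's required functions: the names `clean_text`, `tokenize`, and `preprocess`, the regular expressions, and the tiny `STOPWORDS` set are hypothetical placeholders. A real stopword list would be much longer.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on",
             "for", "is", "are", "was", "were", "at", "by", "with"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # assumed URL shape
NUM_RE = re.compile(r"\d+")
PUNCT_RE = re.compile(r"[^\w\s]")               # anything not a word char or whitespace

def clean_text(text: str) -> str:
    """Lower-case the text and replace URLs, numbers, and punctuation with spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)   # remove URLs before punctuation splits them apart
    text = NUM_RE.sub(" ", text)
    text = PUNCT_RE.sub(" ", text)
    return text.replace("_", " ")  # \w keeps underscores, so drop them explicitly

def tokenize(text: str) -> list[str]:
    """Split on whitespace and drop stopwords."""
    return [tok for tok in text.split() if tok not in STOPWORDS]

def preprocess(text: str) -> list[str]:
    return tokenize(clean_text(text))
```

For example, `preprocess("Stocks rose 5% at http://example.com, the Fed said!")` returns `['stocks', 'rose', 'fed', 'said']`. Removing URLs first matters: once punctuation is stripped, a URL breaks into several junk tokens.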
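The train/validation/test split bullet can be sketched as a seeded shuffle followed by slicing. This sketch assumes each item is one example, for instance a `(tokens, label)` pair. The 80/10/10 proportions and the name `train_val_test_split` are illustrative choices, and the sketch does not stratify by class.

```python
import random

def train_val_test_split(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `items` and cut it into (train, val, test) lists."""
    if not (0 <= val_frac and 0 <= test_frac and val_frac + test_frac < 1):
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG keeps the split reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seed means every later experiment, including the choice of λ, sees the same validation and test examples.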
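The TF-IDF feature-extraction bullet can be sketched as below. This sketch assumes raw term counts for TF and the smoothed IDF variant ln((1 + N) / (1 + df)) + 1, followed by L2 normalization of each row. Other TF and IDF variants are equally valid, and the class name `SimpleTfidf` is hypothetical. Fit it on the training split only, so that validation and test statistics do not leak into the features. The dense rows are easy to read but large for the AG News vocabulary, so a sparse `{index: weight}` representation or a capped vocabulary may be preferable in practice.

```python
import math
from collections import Counter

class SimpleTfidf:
    """Raw term counts times a smoothed IDF, with each row L2-normalized."""

    def fit(self, docs):
        # docs: list of token lists, e.g. the output of the preprocessing step
        df = Counter()
        for doc in docs:
            df.update(set(doc))  # document frequency counts each term once per doc
        self.vocab = {term: i for i, term in enumerate(sorted(df))}
        n = len(docs)
        self.idf = [0.0] * len(self.vocab)
        for term, i in self.vocab.items():
            # smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1
            self.idf[i] = math.log((1 + n) / (1 + df[term])) + 1.0
        return self

    def transform(self, docs):
        rows = []
        for doc in docs:
            row = [0.0] * len(self.vocab)
            for term, count in Counter(doc).items():
                i = self.vocab.get(term)
                if i is not None:  # terms unseen during fit are ignored
                    row[i] = count * self.idf[i]
            norm = math.sqrt(sum(v * v for v in row))
            if norm > 0:
                row = [v / norm for v in row]
            rows.append(row)
        return rows
```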
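One way to carry out the gradient derivation for Eq. (1) is sketched below under stated assumptions. It takes $f_{ik} = \mathbf{w}_k^\top \mathbf{x}_i$ for example $i$ (written $f_k$ in Eq. (1)), assumes no separate bias term, and assumes one-hot labels, so $\sum_{k} y_{ik} = 1$. Because $\mathbf{w}_k$ is the variable of differentiation, the class index inside the loss sum is renamed to $k'$, and $\delta_{kk'}$ is the Kronecker delta. The penalty contributes $2\lambda\mathbf{w}_k$ whether it is read as $\lambda\sum_j w_{kj}^2$ for one class or summed over all classes.

```latex
\begin{aligned}
p_{ik} &= \frac{\exp f_{ik}}{\sum_{c=1}^{K}\exp f_{ic}}, \qquad f_{ik} = \mathbf{w}_k^\top \mathbf{x}_i \\
\log p_{ik'} &= f_{ik'} - \log\sum_{c=1}^{K}\exp f_{ic} \\
\frac{\partial \log p_{ik'}}{\partial \mathbf{w}_k} &= \delta_{kk'}\,\mathbf{x}_i - \frac{\exp f_{ik}}{\sum_{c=1}^{K}\exp f_{ic}}\,\mathbf{x}_i = (\delta_{kk'} - p_{ik})\,\mathbf{x}_i \\
\sum_{k'=1}^{K} y_{ik'}\,(\delta_{kk'} - p_{ik})\,\mathbf{x}_i &= \Big(y_{ik} - p_{ik}\sum_{k'=1}^{K} y_{ik'}\Big)\mathbf{x}_i = (y_{ik} - p_{ik})\,\mathbf{x}_i \\
\frac{\partial}{\partial \mathbf{w}_k}\Big(\lambda\sum_{j=1}^{d} w_{kj}^{2}\Big) &= 2\lambda\,\mathbf{w}_k \\
\frac{\partial J}{\partial \mathbf{w}_k} &= -\frac{1}{N}\sum_{i=1}^{N}\left(y_{ik} - p_{ik}\right)\mathbf{x}_i + 2\lambda\,\mathbf{w}_k
\end{aligned}
```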
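The model bullets (initialization, objective, gradient, and gradient descent, plus the SGD and mini-batch variants) can be sketched together. One `train` loop covers all three: a batch size equal to the whole training set gives full-batch GD, a batch size of 1 gives SGD, and anything in between gives mini-batch GD. This is a plain-Python sketch under assumptions, not the notebook's expected API. It assumes dense feature lists, one-hot label lists, linear scores with no bias term, and an L2 penalty summed over all classes. The function names and the defaults for `lr` and `epochs` are hypothetical. The gradient is the expression derived for Eq. (1).

```python
import math
import random

def softmax(scores):
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def init_weights(num_classes, dim):
    # Zero initialization; the objective is convex, so this is a safe start.
    return [[0.0] * dim for _ in range(num_classes)]

def predict_proba(W, x):
    # f_k = w_k . x for every class k (assumption: linear scores, no bias)
    return softmax([sum(wj * xj for wj, xj in zip(w_k, x)) for w_k in W])

def predict(W, x):
    p = predict_proba(W, x)
    return max(range(len(p)), key=p.__getitem__)

def objective(W, X, Y, lam):
    """Eq. (1): mean cross-entropy plus lam * sum of all squared weights."""
    loss = 0.0
    for x, y in zip(X, Y):
        p = predict_proba(W, x)
        loss -= sum(y_k * math.log(max(p_k, 1e-12)) for y_k, p_k in zip(y, p))
    return loss / len(X) + lam * sum(w * w for w_k in W for w in w_k)

def gradient(W, X, Y, lam):
    """dJ/dw_k = -(1/N) * sum_i (y_ik - p_ik) x_i + 2 * lam * w_k."""
    n = len(X)
    G = [[2.0 * lam * w for w in w_k] for w_k in W]
    for x, y in zip(X, Y):
        p = predict_proba(W, x)
        for k, row in enumerate(G):
            coef = (p[k] - y[k]) / n
            for j, xj in enumerate(x):
                row[j] += coef * xj
    return G

def train(X, Y, lam=1e-3, lr=0.5, epochs=100, batch_size=None, seed=0):
    """batch_size=None -> full-batch GD, 1 -> SGD, b > 1 -> mini-batch GD."""
    rng = random.Random(seed)
    W = init_weights(len(Y[0]), len(X[0]))
    order = list(range(len(X)))
    step = len(X) if batch_size is None else batch_size
    for _ in range(epochs):
        rng.shuffle(order)  # new sample order every epoch
        for start in range(0, len(order), step):
            batch = order[start:start + step]
            G = gradient(W, [X[i] for i in batch], [Y[i] for i in batch], lam)
            for w_k, g_k in zip(W, G):
                for j, g in enumerate(g_k):
                    w_k[j] -= lr * g
    return W
```

A finite-difference check, comparing `gradient` against `(objective(W + eps) - objective(W - eps)) / (2 * eps)` for each weight, is a cheap way to confirm a hand-derived gradient before training on the full dataset.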
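The per-category precision and recall requested in the cross-validation bullet follow from true-positive, false-positive, and false-negative counts for each class. This sketch assumes integer labels `0 .. num_classes-1`. Reporting 0.0 when a denominator is zero is a chosen convention, not something the assignment specifies.

```python
def per_class_precision_recall(y_true, y_pred, num_classes):
    """Return a list of (precision, recall) tuples, one per class index."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the true label differs
            fn[t] += 1  # true label t was missed
    results = []
    for c in range(num_classes):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        results.append((precision, recall))
    return results
```

For example, with `y_true = [0, 0, 1, 1, 2]` and `y_pred = [0, 1, 1, 1, 0]`, class 1 gets precision 2/3 and recall 1.0.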
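The cross-validation bullet can be read as k-fold cross-validation on the training data to score each candidate λ, followed by a check of the chosen value on the validation set. The sketch below implements only the k-fold part. `train_fn(X, y, lam)` and `score_fn(model, X, y)` are hypothetical callbacks, for example a wrapper around your own training code and a held-out accuracy or macro-averaged F1. The fold count `k=5` is an arbitrary default.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k disjoint, roughly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # round-robin assignment to folds
    for f in range(k):
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        yield train_idx, folds[f]

def select_lambda(X, y, lambdas, train_fn, score_fn, k=5, seed=0):
    """Return (best_lambda, mean_fold_score); a higher score is assumed better."""
    best_lam, best_score = None, float("-inf")
    for lam in lambdas:
        scores = []
        for tr, va in k_fold_indices(len(X), k, seed):
            model = train_fn([X[i] for i in tr], [y[i] for i in tr], lam)
            scores.append(score_fn(model, [X[i] for i in va], [y[i] for i in va]))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_lam, best_score = lam, mean
    return best_lam, best_score
```

Reusing the same `seed` for every λ means all candidates are compared on identical folds.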