Posted: 2021-11-15

MMAI 5000A -- Assignment 2
October 13, 2021
Task
This is an individual assignment that requires you to participate in a machine learning competition on Kaggle. Specifically, you
will participate in the competition Titanic: Machine Learning from Disaster, where the task is to predict survival based on
passenger information.
You will have to register a Kaggle account and follow the instructions under Overview on the competition site. Beyond the
Overview, I recommend that you closely study a couple of notebooks under the Notebooks tab. For example, "Titanic: 81.1%
Leader board Score Guaranteed" and "A Data Science Framework: To Achieve 99% Accuracy" provide good examples of
exploratory analysis and feature engineering and are thus worth your time. Use them as tutorials.
Submission
The assignment is due on November 17 at 8:30 am. You have to do two things.
1) Submit your predictions on the test set to Kaggle and send me your user name so that I can verify your submission.
2) Submit on Canvas a standard Python file (i.e. .py) containing the code used to generate the predictions.
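For the Kaggle part, the predictions are uploaded as a CSV file. The sketch below assumes the format used by the Titanic competition, a header row followed by one PassengerId,Survived pair per test passenger with Survived as 0 or 1; check the competition's Overview page for the authoritative description. The function name write_submission, the default file name and the sample IDs are illustrative assumptions, not part of the assignment.

```python
import csv


def write_submission(passenger_ids, predictions, path="submission.csv"):
    """Write Kaggle-style predictions: one row per passenger, labels as 0/1."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["PassengerId", "Survived"])  # assumed header row
        for pid, pred in zip(passenger_ids, predictions):
            writer.writerow([pid, int(pred)])  # cast True / 1.0 to the integer 1
```

Calling write_submission([892, 893], [0, 1]) would write a three-line file (header plus two rows) to submission.csv in the working directory.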
Grading
The submitted Python file should contain the following standard steps of a data science project:
1. Load data.
2. Pre-process the data (aka data wrangling).
   - Data cleaning.
   - Identification and treatment of missing values and outliers.
   - Feature engineering.
3. Exploratory data analysis.
   - At least two plots describing different aspects of the data set (e.g. identifying outliers, histograms of different
     distributions, or scatter plots to explore correlations).
   - Print a basic data description (e.g. number of examples, number of features, and number of examples in each class).
   - Print (or include in the plots) descriptive statistics (e.g. means, medians, standard deviations).
i. Partition data into train, validation and test sets.
ii. Fit models on the training set (this can include a hyper-parameter search) and select the best based on validation set
performance.
iii. Print the results of the final model on the test set. This should include accuracy, F1-score and AUC.
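Steps i and iii can be sketched with the standard library alone, mainly to show what the partition and the three metrics actually compute. This is a minimal illustration under stated assumptions, not a required implementation: the names three_way_split and binary_metrics, the 60/20/20 proportions and the 0.5 decision threshold are all hypothetical choices. In practice, scikit-learn's train_test_split, accuracy_score, f1_score and roc_auc_score cover the same ground. Since the labels for Kaggle's test file are not published, the test set in step iii presumably has to be held out from the labelled training data.

```python
import random


def three_way_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle a copy of rows and cut it into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def binary_metrics(y_true, scores, threshold=0.5):
    """Return accuracy, F1 and ROC AUC for 0/1 labels and predicted scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)

    # F1 = 2*TP / (2*TP + FP + FN); reported as 0 when the denominator is 0.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # AUC = probability that a random positive example gets a higher score
    # than a random negative one, counting ties as one half.
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))

    return {"accuracy": accuracy, "f1": f1, "auc": auc}


# Toy usage: 100 row indices, then metrics for four hand-made predictions.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
print(binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Model selection (step ii) would sit between the two calls: each candidate is fitted on train, compared on val using one of these metrics, and only the selected model is scored once on test.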
While the first part of this submission could be completed by simply copying an existing notebook, the second part cannot. Your
code will be marked based on its originality and the extent to which it reflects an understanding of the task. Extensive copying will be
considered plagiarism, and Turnitin will be used to detect it. For this assignment, learning and understanding are more
important than prediction accuracy.
Good luck!
Hjalmar
