Date: 2023-04-17
Advanced (Business) Data Analytics – 2023 S1 – Module 1.2
Advanced (Business)
Data Analytics
MODULE 1.2
Introduction
This tutorial aims to develop Python scripts for data preparation and exploration.
Download “ChurnData.xlsx” from Blackboard and copy it to your desktop.
Making a Jupyter Notebook
1. Type Anaconda in the Windows search bar, and then click on Anaconda Navigator to
open Anaconda (see Figure 1).
Note: Anaconda provides an easy and quick way of performing predictive analytics in Python.
Figure 1. Choosing Anaconda Navigator
2. In Anaconda Navigator, click the “Launch” button on the Jupyter Notebook tile (see
Figure 2).
Note: This will direct you to the Jupyter homepage. Jupyter is an open-source web application
that provides an interactive Python (IPython) environment for writing and running scripts.
Figure 2. Launching Jupyter Notebook
3. In the folder list shown in Jupyter, choose your desktop (see Figure 3).
Note: Make sure to create the new Python 3 notebook in the folder where your data is stored (in this
tutorial: your desktop). This makes importing your data easier.
Figure 3. Choosing your desktop as the working folder
4. Click on the New icon in the top right of your working pane, and then select Python 3 (see
Figure 4).
Note: This will make a new Jupyter Notebook running Python 3.
Figure 4. Creating a new Python 3 Notebook
Importing the required libraries
1. Add a new cell by clicking on the “+” icon (see Figure 5).
Note: You need to add a cell for every new script.
Figure 5. Adding a new cell below your current cell
2. In the newly added cell, type the following script (see Figure 6).
import pandas as pd
Note: This script imports the pandas library into your Jupyter notebook. pandas is an open-source
data analysis and manipulation tool, built on top of the Python programming language.
Note: We chose pd as an alias (a short name) to refer to the pandas library.
Figure 6. Adding the pandas library to Jupyter Notebook
3. Run the script written in the previous step by using Ctrl + Enter.
Note: Alternatively, you can hit the “Run” icon on the top of the Jupyter screen (see Figure 7).
Figure 7. Running a cell in Jupyter Notebook
Importing data
1. In your Jupyter Notebook, add a new cell, type the following script, and run it.
df=pd.read_excel("ChurnData.xlsx")
Note: This script imports the xlsx file named “ChurnData.xlsx” into your Jupyter
notebook environment as a data frame. In this script, we chose to name the imported data frame df.
You can use any name you wish.
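A common stumbling block at this step is that the notebook and the data file are in different folders. The sketch below wraps the import in a small helper that reports where Python looked. The helper name load_churn_data is hypothetical and not part of pandas; pd.read_excel itself needs an Excel engine such as openpyxl, which Anaconda normally includes.

```python
from pathlib import Path

import pandas as pd


def load_churn_data(path="ChurnData.xlsx"):
    """Hypothetical helper: read the workbook, or explain where Python looked."""
    file_path = Path(path)
    if not file_path.exists():
        # Path.cwd() is the folder the notebook is running in.
        raise FileNotFoundError(
            f"{file_path.name} not found in {Path.cwd()}; "
            "copy the file next to the notebook or pass its full path."
        )
    return pd.read_excel(file_path)


# Usage (assumes ChurnData.xlsx sits in the notebook's folder):
# df = load_churn_data("ChurnData.xlsx")
```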
Data exploration
1. Add a cell, type the following script, and run it.
df.columns
Note: This displays the column names of the data frame.
2. Add a cell, type the following script, and run it.
df.head()
Note: This script displays the first 5 rows of the dataset.
Q1: What do you have in your dataset?
Q2: What happens if you put a number inside the parentheses, as in df.head(10)?
3. Add a new cell, type the following script, and run it.
df.shape
Q: What does this show you?
4. Add a new cell, type the following script, and run it.
df.dtypes
Q: What does this show you, and how can it help in your model building?
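The exploration steps above can be tried on a tiny stand-in data frame when the real file is not at hand. This is a minimal sketch: the column names follow the tutorial, but every value is invented for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the tutorial.
df = pd.DataFrame({
    "Gender": ["female", "male", "female", "male"],
    "Age": [18, 35, 52, 27],
    "Payment Method": ["cash", "credit card", "cheque", "cash"],
    "LastTransaction": [12, 98, 45, 7],
    "Churn": ["loyal", "churn", "loyal", "churn"],
})

print(df.columns.tolist())   # column names as a plain list
print(df.head(2))            # first 2 rows instead of the default 5

n_rows, n_cols = df.shape    # shape is a (rows, columns) tuple
print(n_rows, n_cols)

print(df.dtypes)             # one data type per column

# Numeric columns can usually go into a model directly;
# text columns normally need encoding first.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

Checking df.dtypes early shows which columns are numeric and which are text, which affects how each one must be prepared before modelling.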
Selecting attributes
1. Add a cell, type the following script, and run it.
df['Age']
Note: This displays the content of the column ['Age'] of the data frame.
2. Add a cell, type the following script, and run it.
df[['Age','Churn']]
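The two selections above return different kinds of objects, which matters for later steps. A minimal sketch on invented values: single brackets with one column name give a Series, while a list of names inside the brackets gives a DataFrame.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "Age": [18, 35, 52],
    "Churn": ["loyal", "churn", "loyal"],
})

age = df["Age"]                 # one column name -> Series
subset = df[["Age", "Churn"]]   # list of column names -> DataFrame
one_col_frame = df[["Age"]]     # a one-item list still gives a DataFrame

print(type(age).__name__)            # Series
print(type(subset).__name__)         # DataFrame
print(type(one_col_frame).__name__)  # DataFrame
```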
Filtering examples
1. Add a cell, type the following script, and run it.
df['Age']>2
2. Add a cell, type the following script, and run it.
df[df['Age']>2]
3. Add a cell, type the following script, and run it.
df[(df['Age']<20) & (df['Gender']=="female")]
4. Add a cell, type the following script, and run it.
df[df['Age']==18]
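The filtering steps above work in two stages: a comparison produces a column of True/False values (a Boolean mask), and passing that mask to df[...] keeps only the True rows. The sketch below shows both stages on invented values; the threshold of 30 is an arbitrary choice for the example.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "Gender": ["female", "male", "female", "male"],
    "Age": [18, 35, 52, 27],
})

mask = df["Age"] < 30    # Boolean Series, one True/False per row
young = df[mask]         # keeps only the rows where the mask is True

# Combine conditions with & (and) or | (or); each condition needs its own
# parentheses because & and | bind more tightly than comparisons.
young_female = df[(df["Age"] < 30) & (df["Gender"] == "female")]

print(mask.tolist())
print(young)
print(young_female)
```

Note that the filtered rows keep their original index labels, so young has labels 0 and 3 rather than 0 and 1.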
Data Cleaning
1. Add a new cell, type the following script, and run it.
df.isnull()
2. Add a new cell, type the following script, and run it.
df.isnull().sum()
3. Add a new cell, type the following script, and run it.
df.isnull().any()
4. Add a new cell, type the following script, and run it.
cleaned_df=df.dropna()
Note: This script removes every row that contains a null value and stores the result as a new dataset.
5. Add a new cell, type the following script, and run it.
cleaned_df
6. Add a new cell, type the following script, and run it.
cleaned_df.shape
Note: You can explore the size of the dataset after removing the null values.
Q1: What is the difference between the cleaned and uncleaned datasets?
Q2: What are the sizes of these two datasets?
Q3: What happened to the rows with missing values?
Q4: What happened to their index labels?
7. Add a new cell, type the following script, and run it.
cleaned_df=cleaned_df.reset_index(drop=True)
Note: This script renumbers the index of the cleaned dataset from 0, without gaps.
8. Add a new cell, type the following script, and run it.
cleaned_df
Q: Can you figure out the difference?
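The cleaning steps above can be followed end to end on a small example. This is a minimal sketch with invented values and two deliberately missing entries; it shows that dropna() leaves gaps in the index and that reset_index(drop=True) closes them.

```python
import numpy as np
import pandas as pd

# Hypothetical values; row 1 lacks an Age and row 2 lacks a Gender.
df = pd.DataFrame({
    "Gender": ["female", "male", None, "male"],
    "Age": [18, np.nan, 52, 27],
    "Churn": ["loyal", "churn", "loyal", "churn"],
})

missing_per_column = df.isnull().sum()   # count of nulls in each column
print(missing_per_column)

cleaned_df = df.dropna()                 # drop rows with at least one null
labels_before_reset = cleaned_df.index.tolist()
print(labels_before_reset)               # [0, 3]: labels 1 and 2 are gone

# drop=True discards the old labels instead of keeping them as a new column.
cleaned_df = cleaned_df.reset_index(drop=True)
print(cleaned_df.index.tolist())         # [0, 1]
```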
Working with the indexes
1. Add a cell, type the following script, and run it.
df.iloc[:,1]
Note: This displays all rows of the second column of the data frame.
2. Add a cell, type the following script, and run it.
df.iloc[1,:]
Note: This displays the second row, across all columns, of the data frame.
3. Add a cell, type the following script, and run it.
df.iloc[1,1]
Note: This displays the value in the second row and second column of the data frame.
4. Add a cell, type the following script, and run it.
df.iloc[1,[1,2,3]]
Note: This displays the values of the second row in the columns at positions 1, 2, and 3 (the second,
third, and fourth columns) of the data frame.
5. Add a cell, type the following script, and run it.
df.iloc[:,[1,2,3]]
Note: This displays all rows of the columns at positions 1, 2, and 3 of the data frame.
6. Add a cell, type the following script, and run it.
df.loc[1]
Note: This displays the row whose index label is 1.
Note: In Python, the index starts from 0, not 1. Hence, with the default index, loc[1] shows the
second row of the data frame.
7. Add a cell, type the following script, and run it.
df.loc[:1]
8. Add a cell, type the following script, and run it.
df.loc[1:]
9. Add a cell, type the following script, and run it.
df.loc[0:10]
10. Add a cell, type the following script, and run it.
df.loc[[1,3]]
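With the default index, iloc and loc often return the same rows, which hides the difference between them: iloc selects by position, while loc selects by index label. The sketch below, on invented values, shows two cases where they diverge: slicing, and selecting from a filtered data frame whose labels no longer start at 0.

```python
import pandas as pd

# Hypothetical values for illustration only.
df = pd.DataFrame({
    "Gender": ["female", "male", "female", "male"],
    "Age": [18, 35, 52, 27],
})

# Slices: iloc excludes the end position, loc includes the end label.
first_two = df.iloc[0:2]      # positions 0 and 1
up_to_label_2 = df.loc[0:2]   # labels 0, 1 and 2
print(len(first_two), len(up_to_label_2))

# After filtering, the remaining rows keep labels 1, 2 and 3.
filtered = df[df["Age"] > 20]
by_position = filtered.iloc[1]   # second remaining row (label 2)
by_label = filtered.loc[1]       # row whose label is 1
print(by_position["Age"], by_label["Age"])
```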
Setting target variable
1. Add a cell, type the following script, and run it.
X=df[['Gender','Age','Payment Method', 'LastTransaction']]
2. Add a cell, type the following script, and run it.
y=df['Churn']
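After separating predictors and target, it is worth confirming that the two pieces still line up row for row. A minimal sketch on invented values, with the column names taken from the steps above:

```python
import pandas as pd

# Hypothetical values; only the column names come from the tutorial.
df = pd.DataFrame({
    "Gender": ["female", "male", "female", "male"],
    "Age": [18, 35, 52, 27],
    "Payment Method": ["cash", "credit card", "cheque", "cash"],
    "LastTransaction": [12, 98, 45, 7],
    "Churn": ["loyal", "churn", "loyal", "churn"],
})

X = df[["Gender", "Age", "Payment Method", "LastTransaction"]]  # predictors
y = df["Churn"]                                                 # target

# The target must not appear among the predictors, and both parts
# must describe the same rows.
print("Churn" in X.columns)      # False
print(len(X) == len(y))          # True
print(X.index.equals(y.index))   # True

# Class counts in the target show how balanced the two outcomes are.
print(y.value_counts())
```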
Storing your data
1. Add a cell, type the following script, and run it.
X.to_excel("Predictors.xlsx")
y.to_excel("Lable.xlsx")
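By default, pandas also writes the index as an extra first column when saving, and that column reappears when the file is read back. The sketch below illustrates the effect with to_csv in a temporary folder, a stand-in chosen so the example runs without an Excel engine; to_excel accepts the same index=False argument. The values and file names here are invented for the example.

```python
import os
import tempfile

import pandas as pd

# Hypothetical values for illustration only.
X = pd.DataFrame({"Gender": ["female", "male"], "Age": [18, 35]})

out_dir = tempfile.mkdtemp()   # throwaway folder for the example files
path_default = os.path.join(out_dir, "Predictors_with_index.csv")
path_no_index = os.path.join(out_dir, "Predictors.csv")

X.to_csv(path_default)                # index is written as an extra column
X.to_csv(path_no_index, index=False)  # only the data columns are written

with_index = pd.read_csv(path_default)
round_trip = pd.read_csv(path_no_index)

print(with_index.shape)   # one more column than X
print(round_trip.shape)   # same shape as X
```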
Renaming your notebook
1. Follow Figure 8 to rename your notebook file.
Figure 8. Renaming your .ipynb file