DS-600 Data Mining
Exploratory data analysis in R programming
This project should be done individually.
Due date: 3/16/2021 by 11:59 PM ET
Task: Clean and preprocess the data in R, then analyze and visualize it. Base your
conclusions on your research and on the results you obtain after applying the data
preprocessing techniques and analysis.
Data cleaning should be done before data preprocessing. For the data cleaning process,
please review “measurement and data collection issues” in the Lecture - 2 notes.
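As a starting point for the cleaning step, here is a minimal sketch on a toy data frame. It assumes the issues of interest include missing values, duplicate rows, inconsistent labels, and implausible values; the Lecture - 2 notes remain the authority on what to check. The column names and values are hypothetical and do not come from the real file.

```r
# Hypothetical toy data standing in for the supermarket file; column names
# and values are illustrative assumptions, not the real dataset.
sales <- data.frame(
  invoice_id = c("A-001", "A-002", "A-002", "B-003", "C-004"),
  branch     = c("A", "A", "A", "b ", "C"),
  unit_price = c("74.69", "15.28", "15.28", NA, "58.22"),
  quantity   = c(7, 5, 5, 8, -1),
  stringsAsFactors = FALSE
)

# 1. Count missing values per column.
missing_per_column <- colSums(is.na(sales))

# 2. Drop exact duplicate rows (keeps the first occurrence).
sales_clean <- sales[!duplicated(sales), ]

# 3. Normalise inconsistent category labels (stray whitespace, case).
sales_clean$branch <- toupper(trimws(sales_clean$branch))

# 4. Convert text that should be numeric.
sales_clean$unit_price <- as.numeric(sales_clean$unit_price)

# 5. Flag implausible values instead of silently deleting them.
sales_clean$bad_quantity <- sales_clean$quantity <= 0

# 6. Impute remaining missing prices with the median (one option among several).
median_price <- median(sales_clean$unit_price, na.rm = TRUE)
sales_clean$unit_price[is.na(sales_clean$unit_price)] <- median_price
```

Median imputation is only one possible choice here; dropping the affected rows or leaving the values missing may be more appropriate depending on how many rows are affected.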
Data Preprocessing Techniques: (Apply all these techniques in your analysis)
❑ Dimensionality reduction
❑ Feature subset selection
❑ Feature creation
❑ Discretization and binarization
❑ Variable transformation
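The five techniques above can be sketched in base R as follows. This is an illustration under stated assumptions, not a required method: the data is randomly generated, the column names, the "Member" label, the 0.95 correlation cut-off, and the tertile price bands are all hypothetical choices.

```r
set.seed(1)  # reproducible toy data

# Hypothetical stand-in for the real file: names and values are assumptions.
n <- 60
sales <- data.frame(
  unit_price    = round(runif(n, 10, 100), 2),
  quantity      = sample(1:10, n, replace = TRUE),
  customer_type = sample(c("Member", "Normal"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Feature creation: derive new columns from existing ones.
sales$subtotal <- sales$unit_price * sales$quantity
sales$tax      <- 0.05 * sales$subtotal        # 5% tax, per the data info
sales$total    <- sales$subtotal + sales$tax

# Feature subset selection: among highly correlated numeric columns,
# keep the earlier column and mark the later ones as redundant.
numeric_cols <- sales[sapply(sales, is.numeric)]
cors <- abs(cor(numeric_cols))
cors[upper.tri(cors, diag = TRUE)] <- 0
redundant <- names(which(apply(cors, 1, max) > 0.95))
selected  <- numeric_cols[setdiff(names(numeric_cols), redundant)]

# Dimensionality reduction: PCA on the standardised numeric columns.
pca <- prcomp(numeric_cols, center = TRUE, scale. = TRUE)
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Discretization: equal-frequency price bands from tertiles.
breaks <- quantile(sales$unit_price, probs = c(0, 1/3, 2/3, 1))
sales$price_band <- cut(sales$unit_price, breaks = breaks,
                        labels = c("low", "mid", "high"),
                        include.lowest = TRUE)

# Binarization: encode a two-level category as 0/1.
sales$is_member <- as.integer(sales$customer_type == "Member")

# Variable transformation: log to reduce right skew, z-score to standardise.
sales$log_total <- log(sales$total)
sales$z_price   <- as.numeric(scale(sales$unit_price))
```

Inspecting `redundant`, `variance_explained`, and `table(sales$price_band)` gives the kind of step-by-step results the report asks for.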
What to submit for the assignment?
Submit a PDF document (maximum 5 pages) on Blackboard.
What to cover in the PDF?
1. Data Cleaning results
2. Data Preprocessing results (please include the results of each step, one by one)
3. Visualizations (Include Graphs and Plots in this section and one line description below
5. R code (using R is mandatory for this assignment)
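For the visualization section, a minimal base R sketch might look like the following. The data is randomly generated and the variable names are hypothetical; the plot types are only examples of graphs that could each be followed by a one-line description.

```r
set.seed(42)  # reproducible toy data

# Hypothetical invoice totals for three branches (not the real dataset).
branch <- rep(c("A", "B", "C"), each = 20)
total  <- round(rlnorm(60, meanlog = 5.5, sdlog = 0.6), 2)

# Histogram: shape of the distribution of invoice totals.
h <- hist(total, main = "Distribution of invoice totals", xlab = "Total ($)")

# Box plot: compare totals across branches and spot outliers.
boxplot(total ~ branch, main = "Invoice total by branch",
        xlab = "Branch", ylab = "Total ($)")

# Bar plot: number of invoices per branch.
counts <- table(branch)
barplot(counts, main = "Invoices per branch",
        xlab = "Branch", ylab = "Count")
```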
Data name: Sales in Supermarket
Data location: Week - 2 Folder of Week-by-Week section on Blackboard
Data info: Supermarkets are growing in the most populated cities, and market
competition is high. The dataset contains historical sales of a supermarket company,
recorded at 3 different branches over 3 months.
Invoice id: Computer generated sales slip invoice identification number
Branch: Branch of supercenter (3 branches are available identified by A, B and C).
City: Location of supercenters
Customer type: Type of customer, recorded as Members for customers using a member card
and Normal for customers without one.
Gender: Gender type of customer
Product line: General item categorization groups - Electronic accessories, Fashion accessories,
Food and beverages, Health and beauty, Home and lifestyle, Sports and travel
Unit price: Price of each product in $
Quantity: Number of products purchased by customer
Tax: 5% tax fee on the customer's purchase
Total: Total price including tax
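The Tax and Total definitions suggest a consistency check that can be part of data cleaning. The sketch below assumes that the tax is 5% of unit price times quantity and that Total is that subtotal plus the tax; this relationship is an assumption to verify against the real file. The rows are hypothetical, and the third total is deliberately wrong.

```r
# Hypothetical rows; values are illustrative, not taken from the real file.
sales <- data.frame(
  unit_price = c(74.69, 15.28, 46.33),
  quantity   = c(7, 5, 7),
  tax        = c(26.1415, 3.82, 16.2155),
  total      = c(548.9715, 80.22, 350.00)   # third total is deliberately wrong
)

# Recompute the derived columns (assumed relationship, see lead-in).
subtotal       <- sales$unit_price * sales$quantity
expected_tax   <- 0.05 * subtotal
expected_total <- subtotal + expected_tax

# Compare with a small tolerance to allow for rounding in the source data.
tolerance <- 0.005
sales$tax_ok   <- abs(sales$tax - expected_tax) < tolerance
sales$total_ok <- abs(sales$total - expected_total) < tolerance

# Rows failing either check deserve a closer look.
suspect_rows <- which(!(sales$tax_ok & sales$total_ok))
```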
Date: Date of purchase (Record available from January 2019 to March 2019)
Time: Purchase time (10am to 9pm)
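Date and Time are natural inputs for feature creation and for range checks. The sketch below assumes dates are stored as month/day/year text and times as 24-hour "HH:MM" text; both formats are assumptions to confirm against the real file, and the sample rows are hypothetical.

```r
# Hypothetical rows; the text formats are assumptions about the real file.
purchases <- data.frame(
  date = c("1/5/2019", "3/8/2019", "2/14/2019"),
  time = c("13:08", "10:29", "20:33"),
  stringsAsFactors = FALSE
)

# Parse the text columns into date and time objects.
purchases$date_parsed <- as.Date(purchases$date, format = "%m/%d/%Y")
clock <- strptime(purchases$time, format = "%H:%M", tz = "UTC")

# Feature creation: month, weekday (0 = Sunday ... 6 = Saturday), hour.
purchases$month   <- as.integer(format(purchases$date_parsed, "%m"))
purchases$weekday <- as.POSIXlt(purchases$date_parsed)$wday
purchases$hour    <- clock$hour

# Range checks against the documented period and purchase times.
purchases$in_period <- purchases$date_parsed >= as.Date("2019-01-01") &
                       purchases$date_parsed <= as.Date("2019-03-31")
minutes <- clock$hour * 60 + clock$min
purchases$in_hours <- minutes >= 10 * 60 & minutes <= 21 * 60
```

Rows where `in_period` or `in_hours` is `FALSE` would contradict the data description and are worth reporting in the cleaning results.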
Payment: Payment used by customer for purchase (3 methods are available – Cash, Credit card
COGS: Cost of goods sold
Gross margin percentage: Gross margin percentage
Gross income: Gross income
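The three columns above are related by the usual accounting definitions: gross income is revenue minus COGS, and gross margin percentage is gross income divided by revenue, times 100. The sketch below assumes that Total plays the role of revenue and that Total is COGS plus 5% tax; both are assumptions to check against the real data. Under those assumptions the percentage is the same for every row (0.05 / 1.05, about 4.76%), and a constant column carries no information for feature subset selection.

```r
# Hypothetical COGS values; Total is built from the assumed 5% tax rule.
cogs  <- c(522.83, 76.40, 324.31)
total <- cogs * 1.05

# Standard definitions, with Total assumed to be the revenue.
gross_income     <- total - cogs
gross_margin_pct <- 100 * gross_income / total

# A column whose values do not vary is a candidate for removal.
is_constant <- diff(range(gross_margin_pct)) < 1e-9
```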
Rating: Customer satisfaction rating of their overall shopping experience (On a scale of 1 to