ENGG2851
Project Analytics
Week 9: Predictive
Analytics
Presented by:
Dr. Mehdi Rajabi Asadabadi
ENGG2851 Course Modules
Weeks Topic
Week 1 Introduction and Overview
Week 2 Data Basics
Week 3 Basic Analytical tools
Week 4 Descriptive Analytics
Week 5 Diagnostic Analytics
Week 6,7 Data Visualisation and Dashboarding
Week 7,9 Predictive Analytics
Week 8 Online Quiz
Week 9 Prescriptive Analytics
Week 10 Automated tools
Week 11 Student presentations
Week 12 Student Presentations
Week 13 Online Quiz
Predictive Analytics
Predictive analytics focus
on using historical data to
identify patterns enabling
the prediction of future
events.
The identified pattern or
trend from historical data is
represented by a
mathematical model.
This model can then be
used to predict future
events based on the
previous data and the
new data.
Descriptive
Analytics
(Analysis)
Diagnostic
Analytics
Predictive
Analytics
Prescriptive
Analytics
Complexity and Cost
A
d
d
e
d
V
a
lu
e
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data, based
on the existing data.
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data, based
on the existing data.
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data, based
on the existing data.
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Binary classification:
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Binary classification:
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Multiclass classification:
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Decision Tree
• A decision tree is essentially a tree shaped diagram
used to classify.
• It is a predictive model based on a branching series
of Boolean tests.
• It has a root node which is the starting point of the
decision tree.
• Splitting or branching is the process of dividing a
node into two or more sub-nodes.
• Decision nodes are any node that has sub-nodes
and leaf/terminal nodes are the ones without a
split.
• Sub-nodes of a specific node is known as child
nodes and the node is known as parent node.
Decision Tree
Decision Tree
Decision Tree
In small datasets with limited attributes, it is
possible to do the graphing of decision tree by
hand, in larger datasets, it may become really
time consuming and confusing.
Later in your jobs, if you wanted to use decision tree,
remember that there are automated tools to do
decision tree for you.
Predictive Analytics Models
Classification:
Classification is the process of creating a set
of classes for data based on the existing
data.
Decision trees
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Decision trees
Random forest
Predictive Analytics Models
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Decision trees
Random forest
Voting classifiers
Classification:
Classification is the process of creating a set of classes for data based
on the existing data.
Decision trees
Random forest
Voting classifiers
Neural networks and deep learning
…
Predictive Analytics Models
Predictive Analytics Models
• Classification
Classification is a mathematical model that can differentiate between two or
more outcomes. For instance, given the atmospheric data for the last two
weeks, the decision of ‘will it rain?’ can be answered by a classification model.
Instances like this where outcome is known as binary classification. A popular
classification technique is decision trees.
• Regression
As opposed to predicting a decision, regression focus on predicting a number
using available data. This is typically used in banking, investing and other
finance models. For instance, given the bitcoin prices for the last year, ‘what
would be the price today?’ can be answered by a regression model.
• Other models (Clustering, Time Series, Forecasting, and similar)
Regression
• Regression investigates the relationship between a dependent
(target) variable and independent variables (predictor)
• It is used for forecasting, time series analysis and finding causal effect
relationship between variables
• Regression can determine significant relationships between dependent and
independent variables
• Regression can also determine strength of impact of multiple independent
variables on a dependent variable
• Regression can also compare the effects of variables measured on different
scales, such as the effect of price changes and number of promotional
activities.
Linear Regression
• Establishes a relationship between dependent
variable (y) and one or more independent
variables (X) using best fit straight line (also known
as regression line)
• This is represented by an equation y = a*x + b,
where a is the intercept, b is the slope of the line
and e is the error term
• The available data is fitted to this regression line
to determine the values for a, b and e. The
equation can then be used to predict the value of
the target variable based on any new independent
variables
Activity time
Example 1: The
regression
between the
experience
(Years with
Company) and
Salary
Look Look for p-value and coefficients as well
Look Look for the value of R2 as it tells us how much change in Y is driven by X.
Select Select your dependent variable for Y range and independent variable for X range.
Click Click Data->Data Analysis->Regression
Questions:
• What is the independent variable?
• What is the dependent variable?
• What if P-value is more than 0.05?
• What is your prediction of the salary of a person with 5 years of
experience?
Multiple Regression
• This is determining the relationship between multiple independent
variables and a dependent variable. The dependent variable is
modelled as a function of several independent variables with
corresponding coefficients, along with the constant term.
• Multiple regression requires two or more independent variables
which is why it’s called multiple regression.
• It can be represented by:
• Y = a1x1 + a2x2 + … + anxn + b
Activity time
How cost variance in projects is impacted by
variances in time and scope
How cost variance in projects are impacted by
variances in time and scope
How cost variance in projects are impacted by
variances in time and scope
How cost variance in projects are impacted by
variances in time and scope
Can someone say which one has a higher impact on
cost variance, scope variance or time variance?
Assume that P value for Time Variance was 0.25,
what would you do?
Activity time
Activity time
Please copy
and past this
table
somewhere,
and remove
the forecast
values
Please copy
and past this
table, and
remove the
forecast
values
Now, you
can predict!
That was all on
Predictive
Analytics!