MGT7160
Marketing Analytics for Managers
Block 2
Predicting Customer Behaviour Using
Decision Trees
Queen's University Belfast
Learning Objectives
▪ By successfully completing this section you
should be able to:
- Describe applications of predictive analytics
- Understand the decision tree algorithm
- Implement the decision tree algorithm in SAS
- Interpret and evaluate the output of the decision
tree algorithm in SAS
PREDICTING CUSTOMER
BEHAVIOURS
MODELLING TOOLS IN SAS EM
Decision Trees
▪ Decision trees are one of the most
widely used predictive modelling
techniques
▪ Learns input > target relationships by
building a tree composed of decision
nodes and prediction nodes
▪ The tree consists of a series of rules
▪ Predictions are made by navigating
down the tree until we reach an
appropriate leaf node
▪ The tricky bit is building the most
efficient and accurate tree
J. Ross Quinlan is a
famed researcher in
data mining and
decision theory. He
has done pioneering
work in the area of
decision trees,
including inventing the
ID3 and C4.5
algorithms.
Decision Trees: Telco Example
Overage Income Children CreditRating Churn
No high no fair no
No high no excellent no
Yes high no fair yes
Unknown medium no fair yes
Unknown low yes fair yes
Unknown low yes excellent no
Yes low yes excellent yes
No medium no fair no
No low yes fair yes
Unknown medium yes fair yes
No medium yes excellent yes
Yes medium no excellent yes
Yes high yes fair yes
Unknown medium no excellent no
Decision Trees
[Figure: the decision tree learned from the telco data]
▪ The root node tests Overage:
- Overage = yes → leaf: Churn = Yes
- Overage = no → test Children: no → Churn = No; yes → Churn = Yes
- Overage = unknown → test CreditRating: excellent → Churn = No; fair → Churn = Yes
QUERY
Overage: no
Income: medium
Children: yes
CreditRating: fair
▪ To classify the query we navigate down the tree: Overage is no, so we follow the no branch to the Children node; Children is yes, so we reach a leaf and predict Churn = Yes
▪ Income is never consulted, because no node on this path tests it
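The tree and the query traversal above can be sketched in Python. This is purely illustrative — SAS Enterprise Miner builds and applies the tree for us, and `predict_churn` is a hypothetical helper, not part of any toolkit:

```python
# A sketch of the learned telco churn tree as nested rules.
def predict_churn(overage, children, credit_rating):
    if overage == "yes":
        return "yes"                      # every Overage=yes row in the data churns
    if overage == "no":
        return "yes" if children == "yes" else "no"
    # overage == "unknown": split on credit rating
    return "no" if credit_rating == "excellent" else "yes"

# The query from the slides: Overage=no, Income=medium, Children=yes, CreditRating=fair
print(predict_churn("no", "yes", "fair"))  # -> yes
```

Note that Income never appears in the rules: the tree only asks about variables that lie on the path from the root to a leaf.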
Decision Trees
▪ The important question when
building a decision tree is what
variables should we use at the
different nodes in the tree
▪ We would like to use variables that
partition the data into pure splits
according to the target variable as
much as possible
▪ Purer splits are going to bring us
closest to making a prediction
Decision Trees
This is the full dataset: 5 examples with a no outcome for churn and 9 with a yes outcome
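Purity can be quantified with entropy (defined formally in Appendix 1). As a minimal Python sketch, assuming only the 9 yes / 5 no class counts above:

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

# Full telco dataset: 9 churn=yes, 5 churn=no
print(round(entropy([9, 5]), 3))  # -> 0.94
```

A perfectly pure node (all one class) has entropy 0; a 50/50 split has entropy 1, the maximum for two classes. Good splits push the partitions towards 0.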
Try Splitting on Overage
Overage Income Children CreditRating Churn
No high no fair no
No high no excellent no
No medium no fair no
No low yes fair yes
No medium yes excellent yes
Yes high no fair yes
Yes low yes excellent yes
Yes medium no excellent yes
Yes high yes fair yes
Unknown medium no fair yes
Unknown low yes fair yes
Unknown low yes excellent no
Unknown medium yes fair yes
Unknown medium no excellent no
Decision Trees
[Figure: the result of splitting the dataset on the Overage variable, producing three branches: no, yes, and unknown]
Try Splitting on Income
Overage Income Children CreditRating Churn
No high no fair no
No high no excellent no
Yes high no fair yes
Yes high yes fair yes
No medium no fair no
No medium yes excellent yes
Yes medium no excellent yes
Unknown medium no fair yes
Unknown medium yes fair yes
Unknown medium no excellent no
No low yes fair yes
Yes low yes excellent yes
Unknown low yes fair yes
Unknown low yes excellent no
Decision Trees
[Figure: the result of splitting the dataset on the Income variable, producing three branches: high, medium, and low]
Decision Trees
We need to choose the best split from all of those available
[Figure: the candidate splits side by side — Income (high, medium, low) and Overage (no, yes, unknown)]
Decision Trees
Information gain is a formal measure that allows us to do this
Decision Trees
▪ To build a decision tree we recursively
calculate the information gain of all of
the variables available to split on and
choose the one that gives us the
highest information gain
▪ Once the tree is built we can use it to
make predictions for unseen instances
▪ There are lots of extensions to this
basic idea to make decision trees
applicable to real-world datasets
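The recursive procedure above is essentially Quinlan's ID3 algorithm. The split-selection step can be sketched in Python — a toy illustration, not the SAS EM implementation; `info_gain` and the inline copy of the 14-row dataset are mine:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def info_gain(rows, attr, target="Churn"):
    """Entropy reduction achieved by partitioning rows on attr."""
    base = entropy([r[target] for r in rows])
    split = 0.0
    for value in {r[attr] for r in rows}:
        part = [r[target] for r in rows if r[attr] == value]
        split += len(part) / len(rows) * entropy(part)
    return base - split

# The 14-row telco dataset from the slides
cols = ["Overage", "Income", "Children", "CreditRating", "Churn"]
data = [("no","high","no","fair","no"), ("no","high","no","excellent","no"),
        ("yes","high","no","fair","yes"), ("unknown","medium","no","fair","yes"),
        ("unknown","low","yes","fair","yes"), ("unknown","low","yes","excellent","no"),
        ("yes","low","yes","excellent","yes"), ("no","medium","no","fair","no"),
        ("no","low","yes","fair","yes"), ("unknown","medium","yes","fair","yes"),
        ("no","medium","yes","excellent","yes"), ("yes","medium","no","excellent","yes"),
        ("yes","high","yes","fair","yes"), ("unknown","medium","no","excellent","no")]
rows = [dict(zip(cols, r)) for r in data]

print(round(info_gain(rows, "Overage"), 3))  # -> 0.247
print(round(info_gain(rows, "Income"), 3))   # -> 0.029
```

Overage's gain (≈ 0.247) beats Income's (≈ 0.029), which is why the example tree splits on Overage at the root; ID3 then recurses on each resulting partition.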
DECISION TREES IN SAS
ENTERPRISE MINER
Business Rationale
▪ Business Problem: The CRM manager in our mobile phone company is feeling the pressure from competitors. The churn rate for bill-pay customers is higher than the industry average; can you help?
▪ Analytics Solution: Develop a model to predict
who is likely to leave so that the CRM manager
can focus the attention of the team on these
customers
▪ Plan of Action:
- Import data
- Create Model
- Analyse the results
Data Partitioning
▪ When we start building predictive
models we must think about how
we are going to test how well our
model predicts our outcome
▪ The most effective way to do this is to create
a holdout or test sample
▪ In this instance our model is generated using
a training dataset and is tested using a test
sample
[Figure: the full sample is split into a training dataset and a testing dataset]
Data Partition Node
▪ The data partition node is
found under the sample tab
▪ The data partition node allows the user to
separate the data into the following types:
▪ Training data
▪ Validation data
▪ Test data
Data Partition Node
▪ SAS Enterprise Miner also allows us to create a third dataset, called a validation dataset
▪ The validation dataset is used for tuning the model to prevent it from over-fitting (we will discuss over-fitting later in this section)
▪ For the purposes of this course we are not
going to create validation data
Partitioning Data: Add Data Partition Node
▪ Import the QUB_INSURE_TRAINING data table and drag it into the diagram.
▪ Make sure for this example to set the target node.
▪ Drag and drop a Data Partition Node from the
sample tab into the diagram
Partitioning Data: Set Partition Node Settings
▪ Click on the Data Partition node to activate the
properties window
▪ Set the partition settings as follows:
- Training = 80%
- Validation = 0%
- Test = 20%
▪ Run the Data Partition Node
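Behind the Data Partition node is a simple idea that can be sketched in Python. This is an illustration only — SAS EM does the split internally (and can also stratify on the target so the class ratios match across partitions); `partition` and the stand-in data are mine:

```python
import random

def partition(rows, train_frac=0.8, seed=42):
    """Randomly split rows into a training set and a test set (no validation set)."""
    rows = rows[:]                        # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)     # fixed seed makes the split reproducible
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

customers = list(range(1000))             # stand-in for the QUB_INSURE_TRAINING rows
train, test = partition(customers)
print(len(train), len(test))              # -> 800 200
```

Every row lands in exactly one partition, so the test set contains only observations the model has never seen.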
Partitioning Data: Result
Partitioning Data: Result
What do you notice
about the preferred
channel ratios for the
3 datasets?
Decision Tree Node
▪ The Decision Tree node is
found under the Model tab
▪ You use the Decision Tree node to create
decision trees that do one of the following
tasks:
- Classify observations based on the values of
nominal, binary, or ordinal targets
- Predict outcomes for interval targets
Create Decision Tree: Run Decision Tree
▪ Drag and drop the Decision Tree Node from
the model tab into the diagram
▪ Connect the Data Partition node to the
Decision Tree node
▪ Run the Decision Tree Node
Create Decision Tree: Results
Create Decision Tree: Decision Tree
Model Accuracy
▪ The accuracy of a model on a given data set
is the percentage of observations that are
correctly classified by the model
- Often also referred to as the recognition rate
- The error rate (or misclassification rate) is 1 minus the accuracy
Confusion Matrix
▪ A confusion matrix is a device used to
illustrate how a model is performing in terms
of false positives and false negatives
- It gives us more information than a single
accuracy figure
- It allows us to think about the cost of
mistakes
- It can be extended to any number of classes
Confusion Matrix
                        Predicted: Churner      Predicted: Non-Churner
Actual: Churner         True Positive (TP)      False Negative (FN)
Actual: Non-Churner     False Positive (FP)     True Negative (TN)
False Positives Vs False Negatives
▪ While it is useful to generate the simple
accuracy of a model, sometimes we need
more
▪ When is the model wrong?
- False positives vs. false negatives
- Related to type I and type II errors in statistics
▪ Often there is a different cost associated with
false positives and false negatives
- Think about diagnosing diseases
Other Accuracy Measures
▪ Sometimes a simple accuracy measure is not enough
- Model Accuracy = (TP + TN) / (TP + FP + TN + FN)
- Misclassification Rate = (FP + FN) / (TP + FP + TN + FN)
- Sensitivity (True Positive Rate) = TP / (TP + FN)
- Specificity (True Negative Rate) = TN / (TN + FP)
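A quick way to sanity-check these formulas is to compute them from raw confusion-matrix counts. The sketch below is plain Python, not SAS EM output; how the 3209 training observations in the classification-table example split into TP/FP/TN/FN cells is an assumption here, chosen only so that the two error cells sum to 209:

```python
def rates(tp, fp, tn, fn):
    """The four measures from the slide, computed from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy":          (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "sensitivity":       tp / (tp + fn),   # true positive rate
        "specificity":       tn / (tn + fp),   # true negative rate
    }

# Assumed cell assignment for the 3209 training observations (17 + 192 = 209 errors)
m = rates(tp=153, fp=17, tn=2847, fn=192)
print(round(m["misclassification"], 3))  # -> 0.065
```

Accuracy and misclassification always sum to 1, so with these counts the accuracy is about 0.935.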
Create Decision Tree: Classification Table
▪ Based on the formula from the previous slide the misclassification rate for the training data is calculated as:
(17 + 192) / (2847 + 17 + 192 + 153) = 209 / 3209 = 0.065
Create Decision Tree: Fit Statistics
▪ The fit statistic table outputs a number of
statistics used to calculate the accuracy of the
model
▪ The misclassification rate is included
Decision Tree: Rename Tree
▪ It is good practice to name the nodes with a
name that describes them in some way
▪ Right click on the decision tree node and
choose the rename option
Over-fitting Explained
▪ Over-fitting occurs when a statistical model is too
sensitive to random error or noise instead of the
underlying relationship.
▪ Over-fitting generally occurs when a model is
excessively complex
▪ An over-fit model will generally have poor predictive performance
▪ When in doubt “Keep it Simple Stupid”
Decision Tree: Interactive Option
▪ The decision tree that is created when you
run the decision tree node is 100% data
driven
▪ In some instances it is useful to be able to
have some business input into the
construction of the decision tree
▪ The interactive decision tree application
allows the user to drive the splits in the
decision tree
Interactive Decision Tree
▪ Add a second Decision Tree node to the
diagram
▪ Connect the Data Partition node to the second
Decision Tree Node
▪ Run the Decision Tree
Decision Tree: Interactive Option
▪ In the properties window for the decision tree
click on the Interactive button to open the
interactive decision tree application
Interactive Decision Tree
▪ The view has three panels
1. The Tree View displays the decision tree that we will interactively
grow
2. The Rules Panel displays the split count from the root node, the
node ID for the current and each predecessor node, the variable
used for this particular splitting decision, and what values or criteria
were used in the split.
3. The Statistics Panel displays summary statistics related to the
training at each node.
Interactive Decision Tree
▪ Right click on the root node and choose Prune
Node to prune the tree back to the root node
Interactive Decision Tree
▪ Right click on the Root Node
▪ Options from here
- Grow the tree automatically (Train node)
- Grow the tree based on user decided splits (Split Node)
▪ Click on Split Node
Interactive Decision Tree
▪ The split node window lists the variables that can be used
to split the node
▪ The variables are listed in order of their predictive power
▪ Choose Payment_Method and click OK
Interactive Decision Tree
▪ Right click the CC leaf node
▪ Select the Split Node option
▪ Select the variable AVG_of_OOB_Amount_to_Total
and click Edit Rule
Interactive Decision Tree
▪ Add two new branches with split points 0.1 and 0.2
▪ Remove the original branch with the split of 0.2183
▪ Click OK to apply the new splits
Interactive Decision Tree
▪ To grow the left side of the tree right click on
the DD or Missing leaf node
▪ Select Train Node
Interactive Decision Tree
▪ There is evidence that this model is over-fitting the data
Interactive Decision Tree
▪ Close the Interactive Decision Tree application
▪ It is good practice to use the Frozen Tree option; this option prevents the node from running again and overwriting the changes you have made to the tree
Vary the Decision Tree Options
▪ Add a third decision tree to the diagram
▪ Set the following properties
- Set Maximum Branch to 3
- Leaf Size to 50
▪ Run the Node
Vary the Decision Tree Options
Vary the Decision Tree Options
How might we use
this output to help the
CRM manager with
customer retention?
MODEL VALIDATION
Section Outline
▪ In this section we explore the following
topics
- Statistical Model Validation Techniques
- Statistics for measuring model accuracy
- Charts for measuring model accuracy
- Business Model Validation
Model Validation
▪ As we have already discussed, once your model has been built you need to check that it is working from a statistical context
▪ We should also consider if the model works
from a business context
Validation Data Sets
▪ Split the available data into a training/development dataset and a test dataset
▪ Train the model on the training dataset and evaluate it on the test dataset
▪ A couple of drawbacks:
- We may not have enough data
- We may happen upon an unfortunate split
[Figure: the total number of available examples divided into training data and test data]
Evaluating Model Accuracy
▪ During model development, in testing, and
after deploying a model in the wild, we need
to be able to quantify the performance of the
model
- How accurate is the model?
- When the model is wrong, how is it wrong?
Data Samples for validation
▪ The data samples that you use to validate your
model must be carefully chosen
▪ There are a number of types of samples that can be used to validate a model; the most commonly used are:
- Out of Sample: this validation sample is often referred to as the hold-out sample and is usually a subset of the initial ABT
- Out of Time: this validation sample is taken from a period of time that was not included in the training ABT; it is used to make sure that your model is robust through an economic cycle
Measures Beyond Simple Accuracy
▪ Most classification algorithms actually give
us a numeric output which we can use to go
further than accuracy:
- The Mahalanobis Distance
- The Kolmogorov-Smirnov Distance
- ROC Curves & Area Under the Curve
- Gini
Example Customer Expected Predicted Prob
100010 Y Y 0.99
100011 N Y 0.65
100012 Y Y 0.60
100013 Y Y 0.58
100014 N N 0.21
100015 N N 0.26
100016 Y Y 0.69
100017 Y N 0.35
100018 N N 0.17
100019 Y Y 0.77
100020 N Y 0.74
100021 Y Y 0.89
100022 Y N 0.01
100023 N N 0.33
100024 Y Y 0.53
100025 N N 0.36
100026 Y Y 0.73
100027 Y N 0.19
100028 N Y 0.63
100029 N N 0.23
Receiver Operating Characteristic (ROC) Curves
▪ The ROC curve is a graph that plots sensitivity on the y-axis versus 1 − specificity on the x-axis for various cut-offs
▪ In our example, sensitivity is the percentage of churners correctly predicted to churn, and 1 − specificity is the percentage of non-churners incorrectly predicted to churn
ROC Curves
▪ At the point (0,0) all of the instances are classified as non-churners and at the point (1,1) all instances are classified as churners
▪ The straight line from (0,0) to (1,1) represents a model that selects at random
▪ The closer the ROC curve gets to the (0,1) point, the better the model
For some great ROC curve examples have a look http://www.anaesthetist.com/mnm/stats/roc/Findex.htm
ROC Curves (cont…)
▪ROC curves can be
used to compare
models
▪ The area under the curve (AUC) is also a measure of the accuracy of the model
▪ A good model should have an AUC greater than 0.5
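To make AUC concrete, here is a small Python sketch (not SAS EM) that scores the 20 customers from the example table a few slides back. It uses the rank-based formulation of AUC: the fraction of (churner, non-churner) pairs where the churner receives the higher predicted probability, which equals the area under the ROC curve:

```python
def auc(labels, probs):
    """AUC via pairwise ranking: the fraction of (positive, negative)
    pairs that the model orders correctly, counting ties as half."""
    pos = [p for l, p in zip(labels, probs) if l == "Y"]
    neg = [p for l, p in zip(labels, probs) if l == "N"]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Expected outcome and predicted probability for the 20 customers in the example table
labels = list("YNYYNNYYNYNYYNYNYYNN")
probs  = [0.99, 0.65, 0.60, 0.58, 0.21, 0.26, 0.69, 0.35, 0.17, 0.77,
          0.74, 0.89, 0.01, 0.33, 0.53, 0.36, 0.73, 0.19, 0.63, 0.23]
print(round(auc(labels, probs), 3))  # -> 0.677
```

The result, about 0.68, is above the 0.5 of a random model but well short of the 1.0 of a perfect one.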
Business Validation
▪ If it is possible, it can be very insightful to get a business expert to validate the outputs of a model based on their business expertise
▪ For a model to be adopted and utilised by the business, it is imperative that the outputs of the model make 'business sense', and it is your job as the model builder to help the business understand the model
Section Takeaways
▪ The important takeaway messages from this
section are:
- Validation of models is a really important step in
the process
- It is important that you validate your models using
a number of different methods and data samples
- While statistical validation is vital it is also
important to be aware of the business validation of
the model
MODEL VALIDATION IN SAS
ENTERPRISE MINER
Model Comparison Node
▪ The Model Comparison node in Enterprise Miner is found under the Assess tab
▪ The Model Comparison node enables you to compare the
performance of competing models using various
benchmarking criteria.
▪ The Model Comparison node outputs a wide range of the
many criteria that can be used to compare models such as:
- ROC charts and corresponding area under the curve.
- Statistical Measures: comparative criteria from statistical
literature include Bayesian Information Criterion (BIC), Akaike's
Information Criterion (AIC), Gini statistics, Kolmogorov-Smirnov
statistics, and Bin-Best Two-Way Kolmogorov-Smirnov tests.
Compare Decision Tree Models
▪ Drag and drop a Model Comparison node from the Assess tab into the diagram
▪ Connect the three Decision Tree nodes to the Model
Comparison node
▪ Run the Model Comparison node
Compare Decision Tree Models: Output
Compare Decision Tree Models: Output
Compare Decision Tree Models: Output
Compare Decision Tree Models: Output
Summary of Fit Statistics
Compare Decision Tree Models: Output
Which model would
you deploy and why?
Reporter Node
▪ The Reporter node in Enterprise
Miner is found under the Utility
tab
▪ The Reporter node creates a document
that outlines details of the process flow
▪ Once you are finished the modelling phase of
the project it is good practice to run the
reporter node to document your model
Section Takeaways
▪ The important takeaway messages from this
section are:
- Prediction models are central to any analytics
strategy
- We can build prediction models using well-defined algorithms such as Decision Trees
- Enterprise Miner includes functionality for running Decision Trees and Regression
- It is imperative that you measure the accuracy and predictive power of your models
Appendix 1: Information Gain
▪ The entropy of a dataset D according to a target variable is:

Info(D) = −Σ_{i=1}^{m} p_i log2(p_i)

where p_i is the probability that an arbitrary row from D belongs to class C_i
Appendix 1: Information Gain
▪ The average entropy of a dataset D after partitioning on a particular attribute A is:

Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)

where D_1 to D_v are the dataset partitions created by the different levels of A and |D| is the size of a dataset
Appendix 1: Information Gain
▪ The information gain of a variable A is the difference between the original entropy and the average entropy after partitioning:

Gain(A) = Info(D) − Info_A(D)
THANK YOU