ISSS602 Data Analytics Lab

Hands-on Exercise 5:
Explanatory Model Building with
Multiple Linear Regression – JMP
Methods



Learning Outcome:
By the end of this session, you will be able to:
• Perform a data quality check on the dataset provided.
• Explore the dataset using appropriate univariate and bivariate analysis functions of JMP.
• Check for multicollinearity among the predictors.
• Perform simple linear regression analysis using the Fit Y by X platform of JMP.
• Perform multiple linear regression analysis using the Fit Model platform of JMP Pro.
• Test the assumptions of linear regression analysis using JMP.
• Interpret the linear regression analysis output generated by JMP’s Fit Y by X and Fit Model platforms.
• Validate the linear regression model using JMP.












Versions history

Version  Description of changes                                 Date        Editor
1.0.0    First working draft                                    09/01/2012  Murphy
1.0.1    Revised Data Example                                   16/02/2012  Murphy
1.1.0    Dropped version 1.0.1 and started a new lab exercise   24/03/2012  Kam
1.1.1    Proofread and finalized the lab exercise               27/03/2012  Kam
1.2.0    Rewrote the lab exercise based on JMP                  30/03/2012  Kam
1.2.1    Added section 6.6 and 6.6                              01/04/2012  Kam
1.3.0    Revised the exercise based on JMP Pro 10               12/02/2013  Kam
1.3.1    Revised by adding more explanation                     23/09/2013  Kam
1.4.0    Revised the exercise based on JMP Pro 11               21/02/2014  Kam
1.6.0    Revised by taking into account students’ feedback      22/09/2014  Kam
1.6.0    Minor revision                                         06/02/2016  Kam
1.7.0    Revised the exercise based on JMP Pro 12               29/09/2016  Kam
1.8.0    Minor revision                                         07/02/2016  Kam
1.9.0    Major revision                                         16/09/2016  Kam
2.0.0    Revised the exercise based on JMP Pro 13               16/02/2017  Kam
2.1.0    Minor revision                                         12/09/2017  Kam
2.2.0    Minor revision                                         22/01/2018  Kam
2.3.0    Revised the exercise based on JMP Pro 14               16/09/2018  Kam
2.4.0    Minor revision                                         21/01/2019  Kam
2.5.0    Minor revision                                         01/10/2019  Kam
2.6.0    Revised the exercise based on JMP Pro 15               17/02/2020  Kam
2.7.0    Edited for AY2020-21 Term 1 batch                      25/09/2020  Kam
2.8.0    Minor revision for AY2020-21 Term 2                    12/01/2021  Kam
2.9.0    Revision based on JMP Pro 16                           30/09/2021  Kam
2.10.0   Minor revision for AY2021-22 Term 2                    06/02/2022  Kam
2.11.0   Revision for AY2021-22 Term 3                          11/05/2022  Tan










Contents
1.0 Introduction
1.1 Setting the Scene
1.2 The data
2.0 Data Loading
2.1 Importing the data
2.2 Reviewing the data and metadata
3.0 Exploration and Data Preparations
3.1 Univariate data analysis
3.2 Bivariate Data Analysis
3.3 Data Preparation
4.0 Building a Simple Linear Regression Model
4.1 Working with Fit Y by X platform
5.0 Multiple Linear Regression Modelling
5.1 Computing a base multiple linear regression model
5.2 Detecting multicollinearity using variance inflation factors
5.3 Model checking using residuals
5.4 Saving prediction formula












1.0 Introduction

How many orders will my company receive from its customers next month? How many
customers will churn when their contracts expire? How many catalogues do I need to mail to
increase the probability that potential customers will buy? These and many other related
questions are the challenges faced by you and many other business analysts. Multiple Linear
Regression (MLR) models are one of the three most common analytics modelling techniques
used by practitioners today. In this lab exercise, you will gain hands-on experience in building
explanatory models using the Fit Y by X and Fit Model platforms of JMP Pro 16.


1.1 Setting the Scene

A large Toyota car dealership offers purchasers of new Toyota cars the option to buy their
used car as part of a trade-in. In particular, a new promotion promises to pay high prices for
used Toyota Corolla cars for purchasers of a new car. The dealer then sells the used cars for
a small profit. To ensure a reasonable profit, the dealer needs to be able to predict the price
that the dealership will get for the used cars.

1.2 The data

The file provided for the analysis is called ToyotaCorolla.xls. The xls extension indicates that
it is in Microsoft Excel format. The file consists of two worksheets, namely data and
metadata. The data worksheet provides the actual data records, and the metadata worksheet describes
the variables of the data records. The data set comprises 38 columns (i.e., variables) and
1436 rows (i.e., data records).

The figure below shows a subset of the data worksheet.





The table below provides detailed information about the variables.







2.0 Data Loading

2.1 Importing the data

To begin the analysis, you will first load the data worksheet of ToyotaCorolla.xls provided
into JMP Pro.

DIY: Using the steps you learned in the previous lessons, load the data
worksheet into JMP Pro. Save the data file in JMP file format and
name the file ToyotaCorolla.jmp.

2.2 Reviewing the data and metadata

Best Practice: Once the data has been imported into JMP, you should examine the data table carefully
for accuracy and completeness. Accuracy refers to the degree to which the data types in the imported
dataset match those of the original dataset. Completeness refers to the degree to which the number of
records and fields in the imported dataset matches the original dataset.
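
If you would like to double-check the import outside JMP, the sketch below shows a minimal accuracy and completeness check in Python with pandas. It assumes (hypothetically) that the data worksheet has been exported to a file named ToyotaCorolla.csv; the expected shape of 1436 rows by 38 columns comes from Section 1.2.

import pandas as pd

# Minimal post-import check, assuming a hypothetical CSV export of the data worksheet.
df = pd.read_csv("ToyotaCorolla.csv")

print(df.shape)          # completeness: expect (1436, 38) as stated in Section 1.2
print(df.dtypes)         # accuracy: data types should match the metadata worksheet
print(df.isna().sum())   # number of missing values in each column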

Quiz: How many categorical and continuous variables are there in the
ToyotaCorolla data table?

Next, you will take a quick tour of the data.

• At the data table, click on the scroll bar located at the bottom of the data table and
scroll towards the right.

• Examine the data carefully.

The purpose of this step is to understand the data types and to identify the dummy variables.
3.0 Exploration and Data Preparations

In this section, you will explore the ToyotaCorolla data set using appropriate exploratory
univariate and bivariate data analysis techniques. The purpose of this analysis is to discover
the distributions of the variables and their inter-relationships.
3.1 Univariate data analysis

In this section, you will use the Distribution platform and Graph Builder of JMP Pro to
examine the properties of the variables given.
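
As a scripted point of comparison (not part of the JMP workflow), a quick univariate summary can also be produced with pandas. This is only a sketch; it assumes the hypothetical ToyotaCorolla.csv export used earlier, and the column names shown are ones referenced elsewhere in this exercise.

import pandas as pd

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export of the data worksheet

# Summary statistics (count, mean, std, quartiles) for a few continuous variables
print(df[["Price", "Age_08_04", "KM", "Weight"]].describe())

# Frequency counts for a binary (dummy) variable
print(df["Met_Color"].value_counts())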

DIY: Using the steps you learned in the previous lesson, perform
distribution analysis on the variables given.


Quiz: What conclusions can you draw from the distribution analysis
results?

Before you continue to the next analysis, you will save a script to recreate the analysis in the
data table.

• At the ToyotaCorolla – Distribution window, click on the red triangle next to
Distribution.

• Select Script -> Save Script to Data Table from the context menu.

Notice that a new script called Distribution has been added in the File Name pane.

• Right-click on the Distribution and select Edit from the context menu.

The Script for ToyotaCorolla window appears.



• Click on the OK button to close the window.
3.2 Bivariate Data Analysis

Next, you will examine the inter-relationships between the predictors, also known as
independent variables. In multiple linear regression analysis, we would like to use
uncorrelated predictors. To ensure that appropriate predictors will be used in the model, you
will explore the predictors using bivariate data analysis techniques.

DIY: Using the steps you learned in the previous lesson, perform
bivariate data analysis on the variables given.

Quiz: What conclusions can you draw from the bivariate analysis
results?

You might wonder which predictors are most strongly correlated with each other, either in a
positive or negative sense. The scatterplot certainly gives us some clear visual information.
However, a quick numerical summary showing correlations for all pairs of variables would
be nice.

• Select Analyze -> Multivariate Methods -> Multivariate, assign all the numeric variables to Y, Columns, and click OK.

• At the Multivariate window, click on the red triangle and select Pairwise
Correlations from the drop-down list.










The Pairwise Correlations panel appears. By default, it is not sorted, so you will sort the panel
first.



• In the Pairwise Correlations panel, right-click and select Sort by Column from the
context menu.








The Select Columns dialog window appears.



• Select Correlation.

• Click on the OK button.

Your screen should look similar to the figure below.




With the help of the Pairwise Correlations report, we can identify the relationships between the
independent variables more easily. For example, the report reveals that Radio_cassette
is strongly correlated with Radio and their relationship is positive. Similarly, Mfg_Year is
strongly correlated with Age_08_04, but their relationship is negative.
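
For readers who want a scripted analogue of the sorted Pairwise Correlations report, the sketch below uses pandas and assumes the hypothetical ToyotaCorolla.csv export used in the earlier sketches.

import numpy as np
import pandas as pd

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export

corr = df.select_dtypes("number").corr()            # Pearson correlation matrix
keep = np.triu(np.ones(corr.shape, dtype=bool), 1)  # keep each pair of variables once
pairs = corr.where(keep).stack()                    # flatten to (var1, var2) -> r
print(pairs.sort_values(key=abs, ascending=False).head(10))  # strongest pairwise correlations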

Quiz: What other observations can you draw from the Multivariate
reports?

DIY: Using the steps you learned earlier, save the Multivariate analysis as a script to the data table.
3.3 Data Preparation

There are a number of variables which are binary (dummy variables). For the purpose of this
linear regression exercise, convert them to nominal using the steps below (a scripted analogue is sketched after the steps).

• Select variables: Met_Color, Automatic, Mfr_Guarantee, BOVAG_Guarantee, ABS,
Airbag_1, Airbag_2, Airco, Automatic_airco, Boardcomputer, CD_Player,
Central_Lock, Powered_Windows, Power_Steering, Radio, Mistlamps, Sport_Model,
Backseat_Divider, Metallic_Rim, Radio_cassette, Tow_Bar.

• On one of the highlighted columns, right-click and select Standardize Attributes, then
change the modeling type of the selected variables to Nominal.
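
A scripted analogue of this modeling-type change, again assuming the hypothetical ToyotaCorolla.csv export, would simply treat the dummy columns listed above as categorical:

import pandas as pd

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export

dummy_cols = ["Met_Color", "Automatic", "Mfr_Guarantee", "BOVAG_Guarantee", "ABS",
              "Airbag_1", "Airbag_2", "Airco", "Automatic_airco", "Boardcomputer",
              "CD_Player", "Central_Lock", "Powered_Windows", "Power_Steering",
              "Radio", "Mistlamps", "Sport_Model", "Backseat_Divider",
              "Metallic_Rim", "Radio_cassette", "Tow_Bar"]

# Treat the binary dummies as nominal (categorical) rather than numeric
df[dummy_cols] = df[dummy_cols].astype("category")
print(df[dummy_cols].dtypes)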

4.0 Building a Simple Linear Regression Model

In this section, you will learn how to build a simple linear regression model using the Toyota
Corolla dataset. The simple linear regression will be used to determine if re-sale prices are
related to the age of the cars.
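
In notation, the simple linear regression model fitted in this section has the usual form, with Price as the response and Age_08_04 as the single predictor:

\text{Price}_i = \beta_0 + \beta_1\,\text{Age\_08\_04}_i + \varepsilon_i, \qquad i = 1, \dots, 1436

where \beta_0 is the intercept, \beta_1 is the slope estimated by least squares, and \varepsilon_i is the random error term.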
4.1 Working with Fit Y by X platform

You will use the Fit Y by X platform to build the simple linear regression model.

• From the menu bar of the ToyotaCorolla data table, click on Analyze -> Fit Y by X.


The Fit Y by X platform dialog window appears.

First, you will assign Price to Y.

• From the Select Columns pane, click on Price.

• At the Cast Selected Columns into Roles pane, click on the Y, Response button.

Next, you will assign Age_08_04 to X.

• From the Select Columns pane, click on Age_08_04.

• At the Cast Selected Columns into Roles pane, click on the X, Factor button.




Your screen should look similar to the figure below.




Now, you are ready to perform the analysis.

• Click on the OK button.

The Fit Y by X output window appears.




It is a bivariate scatterplot. The Y-axis is Price and the X-axis is Age_08_04. Each dot represents a
single transaction.

Next, you are going to use the Fitting Commands of JMP Pro to perform a simple linear
regression analysis.

• At the Fit Y by X output window, click on the red triangle.

• Select Fit Line from the context menu.




Notice that a best fit line has been added onto the scatter plot.










The Fit Y by X output window also provides the formula of the simple linear regression model, together with its
Summary of Fit, Lack of Fit, Analysis of Variance and Parameter Estimates reports.
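
For comparison only, the same simple linear regression can be reproduced in a script. The sketch below uses Python's statsmodels and the hypothetical ToyotaCorolla.csv export; its summary output contains the same kinds of figures (R-squared, the ANOVA F-test and the parameter estimates) that the Fit Y by X reports show.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export

# Ordinary least squares: Price regressed on Age_08_04
fit = smf.ols("Price ~ Age_08_04", data=df).fit()
print(fit.summary())   # R-squared, F statistic (ANOVA) and parameter estimates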




Quiz: With reference to the lesson notes, interpret the analysis results
provided by the Linear Fit, Summary of Fit, Analysis of Variance and
Parameter Estimates reports.










5.0 Multiple Linear Regression Modelling

In this section, you will learn how to calibrate a multiple linear regression model using the Fit Model
platform of JMP. The dependent variable (or response) is Price, and the predictors (or independent variables)
are Age_08_04, Mfg_Year, KM, Quarterly_Tax, Weight, Guarantee_Period, CC, HP, Doors and Gears.
Notice that all these variables are continuous.
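
Written out, the base model calibrated in this section has the form:

\text{Price} = \beta_0 + \beta_1\,\text{Age\_08\_04} + \beta_2\,\text{Mfg\_Year} + \beta_3\,\text{KM} + \beta_4\,\text{Quarterly\_Tax} + \beta_5\,\text{Weight} + \beta_6\,\text{Guarantee\_Period} + \beta_7\,\text{CC} + \beta_8\,\text{HP} + \beta_9\,\text{Doors} + \beta_{10}\,\text{Gears} + \varepsilon

where \beta_0, \dots, \beta_{10} are coefficients estimated by ordinary least squares and \varepsilon is the random error term.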

5.1 Computing a base multiple linear regression model

First, you are going to perform multiple linear regression analysis using Fit Model platform.

• From the menu bar of ToyotaCorolla data table, select Analyze -> Fit Model.

The Fit Model dialog window appears.



• From the Select Columns pane, click on Price.

• At the Pick Role Variables pane, click on the Y button.

• From the Select Columns pane, click on the red triangle, select Modeling Type, and uncheck Nominal so that only continuous columns are listed.


• Select the following variables: Age_08_04, Mfg_Year, KM, HP, CC, Doors, Gears,
Quarterly_Tax, Weight, Guarantee_Period.
Note: We have left out the numeric variable Mfg_Month.

• At the Construct Model Effects pane, click on the Add button.

• For Personality, select Standard Least Squares from the drop-down list.

• For Emphasis, select Minimal Report from the drop-down list.

Your screen should look similar to the figure below.



• Click on the Run button.


The Fit Least Squares report window appears.




Quiz: What observations can you draw from the report?


5.2 Detecting multicollinearity using variance inflation factors

Multicollinearity exists whenever two or more of the predictors in a regression model
are moderately or highly correlated. It is the curse of multiple linear regression because
incorporating highly correlated predictors in a model significantly affects the
robustness of the estimates. Hence, it is very important for us to detect and
eliminate multicollinearity as much as possible.

Some of the common signs used for detecting multicollinearity include:

• The analysis exhibits the symptoms of multicollinearity, such as estimates of the
coefficients varying substantially from model to model.
• The t-tests for each of the individual slopes are non-significant (P > 0.05), but
the overall F-test that all of the slopes are simultaneously zero is significant
(P < 0.05).
• The correlations among pairs of predictor variables are large.

Looking at correlations only among pairs of predictors, however, is limiting. It is possible
that the pairwise correlations are small, and yet a linear dependence exists among three
or even more variables. That's why many regression analysts often rely on what are
called variance inflation factors (VIF) to help detect multicollinearity.
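
For reference, the VIF of the j-th predictor is computed from the R-squared of an auxiliary regression of that predictor on all the other predictors:

\text{VIF}_j = \frac{1}{1 - R_j^2}

A VIF of 1 means predictor j is uncorrelated with the other predictors, while large values (the rule of thumb used later in this section is VIF > 10) signal serious multicollinearity.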

In this section, you will learn how to use VIF to detect multicollinearity of the predictors.

• At the Parameter Estimates pane of the Fit Least Squares report window, right-click.

• Select Columns -> VIF.

Notice that a new column called VIF has been added to the Parameter Estimates pane.




We would like to detect predictors with VIF values greater than 10. You can sort the
column so that it is easier to spot predictors with high VIF values.

• At the Parameter Estimates pane, right-click and select Sort by Column from
the context menu.







The Select Columns dialog window appears.



• Click on VIF.

• Keep the Ascending box unchecked.

• Click on the OK button.

Notice that the predictors in the Parameter Estimates pane have been sorted in descending order
according to their VIF values.
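
A scripted analogue of this VIF check, using statsmodels' variance_inflation_factor and the hypothetical ToyotaCorolla.csv export, might look like the sketch below.

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export
predictors = ["Age_08_04", "Mfg_Year", "KM", "HP", "CC", "Doors", "Gears",
              "Quarterly_Tax", "Weight", "Guarantee_Period"]

X = sm.add_constant(df[predictors])    # design matrix with an intercept column
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const").sort_values(ascending=False))  # flag predictors with VIF > 10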


Quiz: Which are the predictors with VIF values greater than 10?

Now, you will need to open the correlation matrix to understand the relationships among the predictors better.

• At the Response Price bar, click on the red triangle.

• Select Estimates -> Correlation of Estimates.





The Correlation of Estimates pane appears.




Now, let us try to understand why Age_08_04 has a high VIF value.

Notice that the correlation value between Age_08_04 and Mfg_Year is marked in blue. This
indicates that Age_08_04 is strongly and positively correlated with Mfg_Year.

Since these two predictors are highly correlated (i.e., 0.9687), it is safe for you to drop
one of them in the subsequent analysis.

• At the Response Price bar, click on the red triangle and select Redo -> Relaunch Analysis.

• Remove Mfg_Year from the Construct Model Effects pane, then click Run.

• Check the VIF values of the variables. You should get an output similar to the one below.



5.3 Model checking using residuals

In this section, you will learn how to check the appropriateness of a linear regression model
by using the residuals.

When conducting a residual analysis, a "residuals versus fits plot" is the most frequently created
plot. It is a scatter plot of residuals on the y axis and fitted values (estimated responses) on the x
axis. The plot is used to detect non-linearity, unequal error variances, and outliers.
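
A scripted version of the residuals-versus-fitted plot, based on the reduced model from Section 5.2 (Mfg_Year dropped) and the hypothetical ToyotaCorolla.csv export, could be sketched as follows.

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export

# Reduced multiple linear regression model (Mfg_Year dropped)
formula = ("Price ~ Age_08_04 + KM + HP + CC + Doors + Gears"
           " + Quarterly_Tax + Weight + Guarantee_Period")
fit = smf.ols(formula, data=df).fit()

# Residuals on the y axis against fitted (predicted) values on the x axis
plt.scatter(fit.fittedvalues, fit.resid, s=10)
plt.axhline(0, color="grey", linestyle="--")
plt.xlabel("Price Predicted")
plt.ylabel("Price Residual")
plt.show()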

• From the Response Price bar, click on the red triangle.

• Select Row Diagnostics -> Plot Residual by Predicted.




A new scatter plot that looks similar to the figure below will be added to the Fit Least Squares window.





Note that, as defined, the residuals appear on the y axis and the fitted values appear on the x axis.
You should be able to look back at the scatter plot of the data and see how the data points there
correspond to the data points in the residual versus fits plot here.

Here are the characteristics of a well-behaved residual vs. fits plot and what they suggest about the
appropriateness of the multiple linear regression model:

• The residuals "bounce randomly" around the 0 line. This suggests that the assumption that
the relationship is linear is reasonable.
• The residuals roughly form a "horizontal band" around the 0 line. This suggests that the
variances of the error terms are equal.
• No one residual "stands out" from the basic random pattern of residuals. This suggests that
there are no outliers.


QUIZ: With reference to the guides given above, what observations can you draw
from the Residual by Predicted Plot in the figure?




Two cents' worth of thought

Please note that interpreting these plots is subjective. By and large, students learning residual
analysis for the first time tend to over-interpret these plots, looking at every twist and turn as
something potentially troublesome. You will especially want to be careful about putting too much
weight on residual vs. fits plots based on small data sets. Sometimes the data sets are just too small
to make interpretation of a residuals vs. fits plot worthwhile. Don't worry! You will learn, with
practice, how to "read" these plots.

5.4 Saving prediction formula

In this section, you will learn how to save the prediction equation for subsequent model validation
and testing.

• At the Fit Least Squares report window, from the Response Price bar, click on the red
triangle.

• Select Save Columns -> Prediction Formula.





Notice that a new column called Pred Formula Price has been added to the ToyotaCorolla data table.
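
The saved prediction formula simply applies the fitted equation to every row of the data table. A self-contained scripted analogue, again assuming the hypothetical ToyotaCorolla.csv export and the reduced model from Section 5.2, is sketched below.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ToyotaCorolla.csv")  # hypothetical CSV export
fit = smf.ols("Price ~ Age_08_04 + KM + HP + CC + Doors + Gears"
              " + Quarterly_Tax + Weight + Guarantee_Period", data=df).fit()

# Scripted analogue of Save Columns -> Prediction Formula (and of saving residuals)
df["Pred_Price"] = fit.predict(df)                   # predicted Price for each record
df["Resid_Price"] = df["Price"] - df["Pred_Price"]   # residual = observed - predicted
print(df[["Price", "Pred_Price", "Resid_Price"]].head())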




Let us check out the prediction formula.

• At the ToyotaCorolla data table, right-click on Pred Formula Price.

• Select Formula from the context menu.






The Column Properties dialog window of Pred Formula Price appears.




Can you see the prediction formula?

Next, you will save the residuals of the model.


DIY: Using the steps you learned in the previous section, save the
residuals of the model into the ToyotaCorolla data table.
