Date: 2022-06-27
The University of Sydney
MKTG2113
Marketing Insight
Dr. Aekyoung (Amy) Kim
Copyright © 2017 Pearson Education, Inc.
Multivariate Analysis I
LEARNING OBJECTIVES
• The nature of logistic regression analysis
• How to test logistic regression analysis using SPSS
• How to report logistic regression analysis findings
Outline
• Characteristics of Logistic Regression
• Use of Logistic Regression
• Types of Logistic Regression Analysis
− Bivariate logistic regression
− Multiple logistic regression
• Understanding the Logistic Regression Model
• Example of Logistic Regression Analysis
• SPSS Exercise
When Is Logistic Regression Used?
• An extension of regression
• What is different from regression?
− It uses the same types of independent variable (IV) as regression analysis (interval or ratio)
− However, the dependent variable (DV) has only two categories
Characteristics of Logistic Regression
• Relationship between one or more IVs and a binary DV (i.e., a DV with only two categories)
• Example DV:
− Yes vs. No
− True vs. False
− Success vs. Failure
− Buy vs. Not Buy
− Response vs. No Response
− Visit vs. Not Visit
• Estimating the probability of an event taking place (a value between 0 and 1), such as the probability of a product purchase, a team winning, or a patient being healthy
Use of Logistic Regression
• A catalog company wants to increase the proportion of mailings that
result in sales (buy or not buy)
− The catalog company can send mailings to the people who are
more likely to respond
• A doctor wants to accurately diagnose a possibly cancerous tumor
− The doctor can determine whether the tumor is more likely to be benign or malignant
• A loan officer wants to know whether the next customer is likely to
default
− The loan officer can assess the risk of extending credit to a
particular customer
Types of Logistic Regression Analysis
• Bivariate logistic regression (simple logistic regression): having only one independent variable to predict a dependent variable (e.g., Can the likelihood of donation be predicted by annual income?)
• Multiple logistic regression: having two or more independent variables to predict a dependent variable (e.g., Can the likelihood of donation be predicted by a combination of age, annual income, gender, and perceived empathy?)
[Figure: bivariate model (IV → DV) and multiple model (IV1, IV2, IV3, …, IVn → DV)]
Brainstorm 10-1
• Think about a situation in which you could use a bivariate or multiple logistic regression.
• State the IV(s) and DV respectively.
− IV: interval or ratio variable
− DV: A nominal variable with only two categories
Quiz!
Q1. Which questions can be answered by a logistic regression analysis?
I. Can we predict the number of times a person returns to prison for a similar offence based on the variables socio-economic status, gender, post-release counselling follow-ups, and length of sentence?
II. Is the likelihood of successful cancer treatment influenced by factors such as family support, age of onset, and locus of control?
III. Can I predict the probability of a person having a gym membership based on age, gender, and whether or not they use a photo-blogging app on their phone?
a. I and II
b. I and III
c. II and III
d. I, II, and III
How Can We Analyze These Data?
Age and signs of coronary heart disease (CD: 1 = signs present, 0 = absent)
Age  CD | Age  CD | Age  CD
22   0  | 40   0  | 54   0
23   0  | 41   1  | 55   1
24   0  | 46   0  | 58   1
27   0  | 47   0  | 60   1
28   0  | 48   0  | 60   0
30   0  | 49   1  | 62   1
30   0  | 49   0  | 65   1
32   0  | 50   1  | 67   1
33   0  | 51   0  | 71   1
35   1  | 51   1  | 77   1
38   0  | 52   0  | 81   1
How Can We Analyze These Data?
• T-test? (Compare the mean age of the diseased and non-diseased groups)
− Non-diseased: 38.6 years
− Diseased: 58.7 years (p < 0.0001)
• Correlation?
• Linear regression?
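As a quick check of the t-test comparison, the group means can be recomputed from the table. This minimal sketch only reproduces the two means; it does not recompute the t-test p-value. The names `AGES`, `CD` and `mean_age_by_group` are illustrative, not from the slides.

```python
from statistics import mean

# Age and coronary heart disease (CD) data from the table above,
# read column by column: 1 = signs of CD present, 0 = absent.
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def mean_age_by_group(ages, cd):
    """Return {0: mean age of non-diseased, 1: mean age of diseased}."""
    return {group: mean(a for a, y in zip(ages, cd) if y == group)
            for group in (0, 1)}


means = mean_age_by_group(AGES, CD)
print(f"Non-diseased: {means[0]:.1f} years")  # Non-diseased: 38.6 years
print(f"Diseased:     {means[1]:.1f} years")  # Diseased:     58.7 years
```

The t-test answers "do the groups differ in mean age?", but it does not give the probability of disease at a particular age, which is what logistic regression estimates.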
Scatter Diagram of These Data
[Figure: scatter plot of signs of coronary disease (No/Yes) against AGE (years), 0 to 100]
[Figure: the same scatter plot, annotated]
• Older people, more disease
• Younger people, less disease
Correlation or Linear Regression?
[Figure: the same scatter plot, with signs of coronary disease coded Yes (1) and No (0), and a fitted linear regression line]
We could add a linear regression line, but:
• People aged above 61 would have a > 1 probability of disease
• People aged below 38 would have a < 0 probability of disease
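The problem with a straight line can be sketched by fitting ordinary least squares to the 0/1 outcomes from the table. This is a minimal illustration, assuming the 33 observations above; `ols_line` is a hypothetical helper, and the exact ages at which the line leaves the 0 to 1 range depend on the fitted line.

```python
# Same 33 observations as the table above (repeated so this block runs on its own).
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def ols_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_line(AGES, CD)
for age in (min(AGES), max(AGES)):
    # A "probability" read off the straight line can fall outside [0, 1].
    print(age, round(intercept + slope * age, 2))
```

With these data the line already gives a negative value at the youngest age (22) and a value above 1 at the oldest (81), which is impossible for a probability.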
Why Logistic Regression?
• Where the dependent variable (DV) is not interval or ratio, we cannot use simple linear regression, because…
− The DV is not normally distributed (it takes only two values)
− The relationship between the IV and the DV is not linear, so no straight line fits the data well
− We want to predict a probability, which can only vary between 0 and 1, but simple linear regression may predict values below 0 or above 1
A Best Line to Fit…
[Figure: scatter plot of signs of coronary disease (No/Yes) against AGE (years), 0 to 100, with an S-shaped curve]
Here's a more realistic representation of the relationship between age and the probability of disease.
Logistic Function
[Figure: S-shaped logistic curve. y-axis: probability of disease (P), 0.0 to 1.0; x-axis: α + β1X1. Labels: Probability of Yes (P) and Probability of No (1 − P)]
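The S-shaped curve is the logistic function, which turns any value of α + βX into a probability strictly between 0 and 1. A minimal sketch (the function names `logistic` and `log_odds` are illustrative):

```python
import math


def logistic(z):
    """Map any linear predictor z = α + βX to a probability P in (0, 1).

    The two branches are algebraically identical; splitting on the sign of z
    avoids overflow in math.exp for large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_odds(p):
    """Inverse of the logistic function: ln(P / (1 - P))."""
    return math.log(p / (1.0 - p))


for z in (-4, -2, 0, 2, 4):
    p = logistic(z)
    print(f"z = {z:+d}: P(Yes) = {p:.3f}, P(No) = 1 - P = {1 - p:.3f}")
```

At z = 0 the probability is exactly 0.5; large negative or positive values of α + βX push P towards 0 or 1 without ever crossing them.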
Bivariate Logistic Regression Analysis
• $Y = \alpha + \beta X$ (Linear regression)
• $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta X$ (Logistic regression)
– $P$ : probability that the event Y occurs (e.g., P = Yes vs. 1 − P = No)
– $X$ : independent variable
– $\alpha$ : constant
– $\beta$ : coefficient of variable X
• $\frac{P}{1-P} = e^{\alpha + \beta X} = e^{\alpha} \times e^{\beta X}$
• Estimating the probability of an event taking place
– How much (the size of β) each factor influences it
– A one-unit increase in X multiplies the odds that the event Y occurs (vs. does not occur), $\frac{P}{1-P}$, by $\mathrm{Exp}(\beta) = e^{\beta}$
Log functions are the inverses of exponential functions.
The log function $\ln(A) = B + C$ is defined to be equivalent to the exponential equation $A = e^{B+C} = e^{B} \times e^{C}$.
If $e^{\beta} = 1$: no change in the odds
If $e^{\beta} = 2$: the odds double (100% increase)
If $e^{\beta} = 0.2$: the odds are multiplied by 0.2 (80% decrease)
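To see where α and β come from, here is a minimal sketch that fits the bivariate model to the age and coronary disease data from the earlier table by maximum likelihood, using Newton-Raphson. SPSS likewise estimates logistic regression by maximum likelihood, but the slides do not show SPSS output for this dataset, so nothing here should be read as reproducing it. `fit_logistic` and `odds` are hypothetical helper names.

```python
import math

# Same 33 observations as the age/CD table (repeated so this block runs on its own).
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def fit_logistic(x, y, iterations=25):
    """Fit ln(P / (1 - P)) = alpha + beta * x by maximum likelihood (Newton-Raphson)."""
    alpha = beta = 0.0
    for _ in range(iterations):
        # Score (gradient of the log-likelihood) and information matrix X'WX.
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(alpha + beta * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: add (X'WX)^-1 times the score, using the 2x2 inverse.
        alpha += (h_bb * g_a - h_ab * g_b) / det
        beta += (h_aa * g_b - h_ab * g_a) / det
    return alpha, beta


def odds(age, alpha, beta):
    """P / (1 - P) = e^(alpha + beta * age)."""
    return math.exp(alpha + beta * age)


alpha, beta = fit_logistic(AGES, CD)
print(f"alpha = {alpha:.3f}, beta = {beta:.4f}, Exp(beta) = {math.exp(beta):.3f}")
# One extra year of age multiplies the odds of disease by Exp(beta):
print(odds(51, alpha, beta) / odds(50, alpha, beta))
```

The last line illustrates the interpretation above: whatever the starting age, a one-unit increase in X multiplies the odds by the same factor $e^{\beta}$, while the change in the probability itself depends on where you are on the S-curve.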
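Multiple Logistic Regression Analysis
• $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$ (Linear regression)
• $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$ (Logistic regression)
– $P$ : probability that the event Y occurs (e.g., P = Yes vs. 1 − P = No)
– $X_i$ : independent variable i
– $\alpha$ : constant
– $\beta_i$ : coefficient of variable $X_i$
• $\frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n} = e^{\alpha} \times e^{\beta_1 X_1} \times e^{\beta_2 X_2} \times e^{\beta_3 X_3} \times \dots \times e^{\beta_n X_n}$
• Estimating the probability of an event taking place
– How much (the size of $\beta_i$) each factor influences it
– A one-unit increase in $X_1$, holding the other IVs constant, multiplies the odds that the event Y occurs (vs. does not occur) by $\mathrm{Exp}(\beta_1) = e^{\beta_1}$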
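Solving the logistic regression equation for P shows why the predicted probability always stays between 0 and 1, and why $\mathrm{Exp}(\beta_1)$ is a ratio of odds when the other IVs are held constant. Here $z$ is just shorthand for the linear predictor:

```latex
\begin{aligned}
\ln\!\left(\frac{P}{1-P}\right) &= \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n \equiv z \\
\frac{P}{1-P} &= e^{z} \\
P &= e^{z}(1-P) \;\Longrightarrow\; P\,(1+e^{z}) = e^{z} \\
P &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}} \\[6pt]
\frac{\text{odds}(X_1+1)}{\text{odds}(X_1)}
  &= \frac{e^{\alpha + \beta_1 (X_1+1) + \beta_2 X_2 + \dots + \beta_n X_n}}
          {e^{\alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n}} = e^{\beta_1}
\end{aligned}
```

Because $e^{z} > 0$ for every z, the fraction $\frac{1}{1+e^{-z}}$ can never reach 0 or 1.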
What If the Relationship Is Negative?
[Figure: scatter plot of gorilla spotter (No = 0, Yes = 1) against concentration span, 0 to 100]
• Low concentration span, more spotters
• High concentration span, fewer spotters
[Figure: the same scatter plot with a fitted linear regression line]
We could add a linear regression line, but:
• People with a concentration span (CS) below 21 would have a > 1 probability of being a spotter…
• People with a CS above 92 would have a < 0 probability of being a spotter…
[Figure: decreasing S-shaped curve of the probability of being a gorilla spotter against concentration span, 0 to 100]
Here's a more realistic representation of the relationship between concentration span and the probability of spotting the difference.
Brainstorm 10-2
• Think about a situation in which you could use a bivariate logistic regression with a negative relationship.
• State the IV(s) and DV respectively.
− IV: interval or ratio variable
− DV: A nominal variable with only two categories
• Create survey questions.
Quiz!
Q2. Which one is not a type of DV for a binary logistic regression analysis?
a. True vs. False
b. Success vs. Failure
c. Eat 0 pieces vs. Eat 5 pieces vs. Eat 10 pieces
d. Not Visit vs. Visit
Q3. Logistic regression analysis can use an interval or ratio scale for the independent variable. (True or False?)
Example: Bivariate Logistic Regression
– To predict website visit based on age
– The null hypothesis: There is no relationship between the independent and dependent variables (β = 0)
– The alternative hypothesis: There is a relationship between the independent and dependent variables (β ≠ 0)
[Figure: Age (IV) → Website visit (DV)]
Example SPSS output: Bivariate Logistic Regression
Variable    B      Std. error  Exp(B)  p-value
Age         -.07   .01         .93     .00
Intercept   3.69   .72         40.04   .00
The Exp(B) coefficient for age = .93
For a one-unit increase in age, the odds of a website visit (vs. no visit) decrease by 7% (1 - .93 = .07).
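Exp(B) is simply $e^{B}$, and the percentage change in the odds is (Exp(B) − 1) × 100. A minimal sketch using the B values as printed in the table (they are already rounded, so the last digit may differ slightly from SPSS); `odds_ratio` and `percent_change_in_odds` are illustrative names:

```python
import math

# B coefficients as printed in the bivariate SPSS output above (already rounded).
COEFFICIENTS = {"Age": -0.07, "Intercept": 3.69}


def odds_ratio(b):
    """Exp(B): factor by which the odds change for a one-unit increase in the IV."""
    return math.exp(b)


def percent_change_in_odds(b):
    """(Exp(B) - 1) x 100: negative means the odds fall, positive means they rise."""
    return (math.exp(b) - 1.0) * 100.0


for name, b in COEFFICIENTS.items():
    print(f"{name}: Exp(B) = {odds_ratio(b):.2f}")
print(f"Age: {percent_change_in_odds(COEFFICIENTS['Age']):.1f}% change in the odds per unit")
```

Working from the rounded B = −.07 gives Exp(B) ≈ .932, a decrease of about 6.8%; the 7% above comes from rounding Exp(B) to .93 first.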
Example: Multiple Logistic Regression
• To predict website visit based on age and gender
– The null hypothesis: all coefficients in the model are equal to zero (none of the IVs has a relationship with the DV)
β1 = β2 = β3 ... = βn = 0
– The alternative hypothesis: not every coefficient is equal to zero (at least one IV has a relationship with the DV)
At least one βi ≠ 0
[Figure: Age and Gender (IVs) → Website visit (DV)]
Example SPSS output: Multiple Logistic Regression
Variable    B      Std. error  Exp(B)  p-value
Age         -.09   .02         .914    .000
Gender      3.15   .96         23.3    .001
Intercept   4.01   .83         55.15   .000
The Exp(B) coefficient for age = .914
For a one-unit increase in age, the odds of a website visit (vs. no visit) decrease by 8.6% (1 - .914 = .086).
The Exp(B) coefficient for gender = 23.3
Gender is a dummy variable, so a one-unit increase means moving from the gender coded 0 to the gender coded 1: the odds of a website visit (vs. no visit) are 23.3 times as high, an increase of 2,230% (23.3 - 1 = 22.3).
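The p-value for each coefficient tests H0: βi = 0. SPSS reports a Wald statistic, (B / SE)², compared with a chi-square distribution with 1 df, which gives the same p-value as a two-sided z-test of B / SE. A minimal sketch from the rounded values in the table (`wald_test` is an illustrative name). The overall null hypothesis that all coefficients are zero is tested separately (SPSS's omnibus test of model coefficients), which needs the raw data and is not sketched here.

```python
import math

# (B, standard error) pairs as printed in the multiple logistic regression output above.
OUTPUT = {"Age": (-0.09, 0.02), "Gender": (3.15, 0.96), "Intercept": (4.01, 0.83)}


def wald_test(b, se):
    """Wald chi-square (1 df) for H0: beta = 0, with its p-value.

    For 1 df, P(chi-square > z**2) equals the two-sided normal p-value of
    z = B / SE, i.e. erfc(|z| / sqrt(2)).
    """
    z = b / se
    return z * z, math.erfc(abs(z) / math.sqrt(2.0))


for name, (b, se) in OUTPUT.items():
    wald, p = wald_test(b, se)
    print(f"{name}: Wald = {wald:.2f}, p = {p:.3f}")
```

With these inputs the p-values round to .000, .001 and .000, matching the table.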
SPSS Exercise: Logistic Regression
• Employee turnover: To predict the likelihood of marketing employees leaving an organisation, data on past employees were examined.
• 60 participants provided data on four facets:
− DV: Turnover: Whether they left the organisation or stayed.
− IV1: Job Satisfaction: Satisfaction with their current job.
− IV2: Workload: Level of perceived workload.
− IV3: Employment Type: Whether they were employed on a full-time basis or not.
data_14_1.sav
SPSS Output: Logistic Regression
Only the IV job satisfaction was significant in predicting turnover, p = .002.
The Exp(B) coefficient of .076 tells us that for a one-unit increase in job satisfaction, the odds of turnover (vs. staying) decreased by 92.4% (1 - .076 = .924).
Write-up: Logistic Regression
Results
To estimate the probability of turnover for employees working in the marketing department of an organisation, a logistic regression analysis was conducted. The probability of employee turnover was estimated using existing questionnaire data on job satisfaction, workload, and whether employees were employed full time. Job satisfaction was the only predictor that significantly improved the model's predictive capability (Exp(B) = .076, p = .002). If an employee's job satisfaction increased by one unit, the odds of the employee leaving the organisation decreased by 92.4%. Workload and full-time employment did not appear to significantly influence the probability of an employee leaving the organisation.
Quiz!
Q4. Choose the correct interpretation of the following logistic regression output, using product sale as the DV and price ($) as the IV.
Variable    B        Std. error  Exp(B)  p-value
price       -6.6035  2.3514      .001    .004
Intercept   .3070    .1148       1.36    .008
a. For a $1 increase in price, the odds of a sale (vs. no sale) increase by 0.1%.
b. For a $1 increase in price, the odds of a sale (vs. no sale) decrease by 99.9%.
c. The relationship between price and product sale is not statistically significant.
d. Price has a positive effect on product sale.
Key Takeaways
• When can marketers use logistic regression analysis?
• Can you choose appropriate variables for the IV(s) and DV for logistic regression analysis?
• Can you state the null and alternative hypotheses for logistic regression analysis?
• Can you run logistic regression analysis using SPSS?
• Can you test the significance of a logistic regression analysis?
• Can you interpret the output of a logistic regression analysis and suggest managerial implications?
Housekeeping
• See if you can replicate the SPSS outputs in the lecture notes.
• Have a weekly group meeting to work on your group project.
• Download the data you collected from Qualtrics and complete the data analysis by week 11.
• Final presentation: weeks 12–13 (randomly ordered)
• Due date for research report: June 3rd 2022, 11:59 pm
Q&A