Case Study: MKTG2113
Date: 2022-06-27
The University of Sydney
MKTG2113 Marketing Insight
Dr. Aekyoung (Amy) Kim
Copyright © 2017 Pearson Education, Inc.

Multivariate Analysis I

LEARNING OBJECTIVES
• The nature of logistic regression analysis
• How to test logistic regression analysis using SPSS
• How to report logistic regression analysis findings

Outline
• Characteristics of Logistic Regression
• Use of Logistic Regression
• Types of Logistic Regression Analysis
− Bivariate logistic regression
− Multiple logistic regression
• Understanding the Logistic Regression Model
• Example of Logistic Regression Analysis
• SPSS Exercise

When Is Logistic Regression Used?
• It is an extension of regression
• What is different from regression?
− The independent variables (IVs) are of the same types as in regression analysis (interval or ratio)
− However, the dependent variable (DV) has only two categories

Characteristics of Logistic Regression
• Models the relationship between one or more IVs and a binary DV (i.e., two categories)
• Example DVs:
− Yes vs. No
− True vs. False
− Success vs. Failure
− Buy vs. Not Buy
− Response vs. No Response
− Visit vs. Not Visit
• Estimates the probability of an event taking place, such as the probability of a product purchase, of a team winning, or of a patient being healthy (a probability can only lie between 0 and 1)

Use of Logistic Regression
• A catalog company wants to increase the proportion of mailings that result in sales (buy vs. not buy)
− The catalog company can send mailings to the people who are more likely to respond
• A doctor wants to accurately diagnose a possibly cancerous tumor
− The doctor can determine whether the tumor is more likely to be benign or malignant
• A loan officer wants to know whether the next customer is likely to default
− The loan officer can assess the risk of extending credit to a particular customer

Types of Logistic Regression Analysis
• Bivariate logistic regression (simple logistic regression): only one independent variable is used to predict the dependent variable (e.g., Can the likelihood of donation be predicted by annual income?)
• Multiple logistic regression: two or more independent variables are used to predict the dependent variable (e.g., Can the likelihood of donation be predicted by a combination of age, annual income, gender, and perceived empathy?)
[Diagram: IV → DV (bivariate); IV1, IV2, IV3, …, IVn → DV (multiple)]

Brainstorm 10-1
• Think of a situation in which you could use a bivariate or multiple logistic regression.
• State the IV(s) and the DV.
− IV: interval or ratio variable
− DV: a nominal variable with only two categories

Quiz!
Q1. Which question can be answered by a logistic regression analysis?
I. Can we predict the number of times a person returns to prison for a similar offence based on socio-economic status, gender, post-release counselling follow-ups, and length of sentence?
II. Is the likelihood of successful cancer treatment influenced by factors such as family support, age of onset, and locus of control?
III. Can I predict the probability of a person having a gym membership based on age, gender, and whether or not they use a photo-blogging app on their phone?
a. I and II
b. I and III
c. II and III
d. I, II, and III

How Can We Analyze These Data?
Age and signs of coronary heart disease (CD: 1 = signs present, 0 = no signs)

Age  CD    Age  CD    Age  CD
22   0     40   0     54   0
23   0     41   1     55   1
24   0     46   0     58   1
27   0     47   0     60   1
28   0     48   0     60   0
30   0     49   1     62   1
30   0     49   0     65   1
32   0     50   1     67   1
33   0     51   0     71   1
35   1     51   1     77   1
38   0     52   0     81   1

How Can We Analyze These Data?
• T-test? (Compare the mean age of the diseased and non-diseased groups)
− Non-diseased: 38.6 years
− Diseased: 58.7 years (p < 0.0001)
• Correlation?
• Linear regression?

Scatter Diagram of These Data
[Figure: scatter diagram; x-axis: AGE (years), 0 to 100; y-axis: Signs of coronary disease (No/Yes)]

Scatter Diagram of These Data
[Figure: the same scatter diagram, annotated]
• Older people, more disease
• Younger people, less disease

Correlation or Linear Regression?
[Figure: the same scatter diagram with a linear regression line; y-axis coded Yes (1) and No (0)]
• Could add a linear regression line
• People aged above 61 have a > 1 probability of disease
• People aged below 38 have a < 0 probability of disease

Why Logistic Regression?
• When the dependent variable (DV) is not interval or ratio, we cannot use simple linear regression, because:
− The variables are not normally distributed
− There is no linear relationship between the IV and the DV from which to obtain the best-fitting line
− We want to predict a probability, which can only vary between 0 and 1, but simple linear regression may predict values below 0 or above 1

A Best Line to Fit…
[Figure: scatter diagram of AGE (years), 0 to 100, against signs of coronary disease (No/Yes), with an S-shaped curve]
Here's a more realistic representation of the relationship between age and the probability of disease.

Logistic Function
[Figure: S-shaped logistic curve; y-axis: probability of disease (P), 0.0 to 1.0; x-axis: x]
[Figure: logistic curve; y-axis: probability of Yes (P), with the remainder as the probability of No (1 − P); x-axis: α + β1X1]

Bivariate Logistic Regression Analysis
• $Y = \alpha + \beta X$ (linear regression)
• $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta X$ (logistic regression)
– P: probability that the event Y occurs (e.g., P = Yes vs. 1 − P = No)
– X: independent variable
– α: constant
– β: coefficient of variable X
• $\frac{P}{1-P} = e^{\alpha + \beta X} = e^{\alpha} \times e^{\beta X}$
• Estimates the probability of an event taking place
– How much (the size of β) each factor influences it
– A one-unit increase in X multiplies the odds that the event Y occurs (vs. not), $\frac{P}{1-P}$, by $\mathrm{Exp}(\beta) = e^{\beta}$

Log functions are the inverses of exponential functions. The log function $\ln(A) = B + C$ is defined to be equivalent to the exponential equation $A = e^{B+C} = e^{B} \times e^{C}$.
• If $e^{\beta} = 1$: no change in the odds
• If $e^{\beta} = 2$: the odds double (100% increase)
• If $e^{\beta} = 0.2$: the odds are multiplied by 0.2 (80% decrease)

Multiple Logistic Regression Analysis
• $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$ (linear regression)
• $\ln\left(\frac{P}{1-P}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$ (logistic regression)
– P: probability that the event Y occurs (e.g., P = Yes vs. 1 − P = No)
– $X_i$: independent variable i
– α: constant
– $\beta_i$: coefficient of variable $X_i$
• $\frac{P}{1-P} = e^{\alpha + \beta_1 X_1 + \dots + \beta_n X_n} = e^{\alpha} \times e^{\beta_1 X_1} \times e^{\beta_2 X_2} \times e^{\beta_3 X_3} \times \dots \times e^{\beta_n X_n}$
• Estimates the probability of an event taking place
– How much (the size of $\beta_i$) each factor influences it
– A one-unit increase in $X_i$, holding the other IVs constant, multiplies the odds that the event Y occurs (vs. not) by $\mathrm{Exp}(\beta_i) = e^{\beta_i}$

What If Negative Relationship?
[Figure: scatter diagram; x-axis: Concentration span, 0 to 100; y-axis: Gorilla spotter (No/Yes)]
• Low concentration span, more spotters
• High concentration span, fewer spotters

[Figure: the same data with a linear regression line]
• Could add a linear regression line
• People with a concentration span (CS) below 21 have a > 1 probability of being a spotter…
• People with a CS above 92 have a < 0 probability of being a spotter…

[Figure: the same data with an S-shaped logistic curve]
Here's a more realistic representation of the relationship between concentration span and the probability of being a spotter.

Brainstorm 10-2
• Think of a situation in which you could use a bivariate logistic regression with a negative relationship.
• State the IV(s) and the DV.
− IV: interval or ratio variable
− DV: a nominal variable with only two categories
• Create survey questions.

Quiz!
Q2. Which one is not a type of DV for a binary logistic regression analysis?
a. True vs. False
b. Success vs. Failure
c. Eat 0 pieces vs. Eat 5 pieces vs. Eat 10 pieces
d. Not Visit vs. Visit
Q3. Logistic regression analysis can use an interval or ratio scale for the independent variable. (True or False?)

Example: Bivariate Logistic Regression
[Diagram: Age (IV) → Website visit (DV)]
• To predict website visit based on age
– The null hypothesis: there is no relationship between the independent and dependent variables ($H_0: \beta = 0$)
– The alternative hypothesis: there is a relationship between the independent and dependent variables ($H_1: \beta \neq 0$)

Example SPSS Output: Bivariate Logistic Regression
Variable    B      Standard error   Exp(B)   p-value
Age         -.07   .01              .93      .00
Intercept   3.69   .72              40.04    .00

The Exp(B) coefficient for age = .93.
For a one-unit increase in age, the odds of a website visit (vs. no visit) decrease by 7% (1 − .93 = .07).

Example: Multiple Logistic Regression
[Diagram: Age, Gender → Website visit]
• To predict website visit based on age and gender
– The null hypothesis: all coefficients in the model are equal to zero (none of the IVs has a relationship with the DV), $H_0: \beta_1 = \beta_2 = \dots = \beta_n = 0$
– The alternative hypothesis: not every coefficient is simultaneously equal to zero (at least one IV has a relationship with the DV), $H_1$: at least one $\beta_i \neq 0$

Example SPSS Output: Multiple Logistic Regression
Variable    B      Standard error   Exp(B)   p-value
Age         -.09   .02              .914     .000
Gender      3.15   .96              23.3     .001
Intercept   4.01   .83              55.15    .000

The Exp(B) coefficient for age = .914.
For a one-unit increase in age, the odds of a website visit (vs. no visit) decrease by 8.6% (1 − .914 = .086).
The Exp(B) coefficient for gender = 23.3.
For a one-unit increase in gender, the odds of a website visit (vs. no visit) increase by 2230% (23.3 − 1 = 22.3).

SPSS Exercise: Logistic Regression
• Employee turnover: to predict the likelihood of marketing employees leaving an organisation, data on past employees were examined.
• 60 participants provided data on four facets:
− DV: Turnover: whether they left the organisation or stayed.
− IV1: Job Satisfaction: satisfaction with their current job.
− IV2: Workload: level of perceived workload.
− IV3: Employment Type: whether they were employed on a full-time basis or not.
Data file: data_14_1.sav

SPSS Output: Logistic Regression
• Only the IV job satisfaction was significant in predicting turnover, p = .002.
• The Exp(B) coefficient of .076 tells us that for a one-unit increase in job satisfaction, the odds of turnover decreased by 92.4% (1 − .076 = .924).

Write-up: Logistic Regression
Results
To estimate the probability of turnover for employees working in the marketing department of an organisation, a logistic regression analysis was conducted. The probability of employee turnover was estimated using existing questionnaire data on job satisfaction, workload, and whether employees were employed full-time. Job satisfaction was the only predictor that significantly improved the model's predictive capability (Exp(B) = .076, p = .002). If an employee's job satisfaction increased by one unit, there was a 92.4% reduction in the odds of the employee leaving the organisation. Workload and full-time employment did not appear to significantly influence the probability of an employee leaving the organisation.

Quiz!
Q4. Choose the correct interpretation of the following logistic regression analysis, which uses product sale as the DV and price ($) as the IV.
Variable    B         Standard error   Exp(B)   p-value
Price       -6.6035   2.3514           .001     .004
Intercept   .3070     .1148            1.36     .008
a. For a $1 increase in price, the odds of a sale (vs. no sale) increase by 0.1%.
b. For a $1 increase in price, the odds of a sale (vs. no sale) decrease by 99.9%.
c. The relationship between price and product sale is not statistically significant.
d. Price has a positive effect on product sale.

Key Takeaways
• When can marketers use logistic regression analysis?
• Can you choose appropriate variables for the IV(s) and DV of a logistic regression analysis?
• Can you state the null and alternative hypotheses for a logistic regression analysis?
• Can you run a logistic regression analysis using SPSS?
• Can you test the significance of a logistic regression analysis?
• Can you interpret the output of a logistic regression analysis and suggest managerial implications?

Housekeeping
• See if you can replicate the SPSS outputs in the lecture notes.
• Have a weekly group meeting to work on your group project.
• Download the data you collected from Qualtrics and complete the data analysis by week 11.
• Final presentation: weeks 12-13 (randomly ordered)
• Due date for research report: June 3rd 2022, 11:59 pm

Q&A
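The t-test and linear-regression questions on the coronary-disease slides can be checked against the 33-row age/CD table in this lecture. The sketch below is a plain-Python illustration, not SPSS output. It assumes CD is coded 1 for signs of disease and 0 otherwise. It reproduces the two group means, then fits an ordinary least-squares line to show that the line predicts "probabilities" below 0 and above 1 at the extremes of age. The exact ages at which this line crosses 0 and 1 depend on the fitted line, so they need not match the cut-offs annotated on the slide.

```python
# Age and signs of coronary heart disease (CD), transcribed column by column
# from the lecture's table. Assumption: CD = 1 means signs present, 0 means none.
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def group_means(ages, cd):
    """Return (mean age without signs, mean age with signs)."""
    no = [a for a, c in zip(ages, cd) if c == 0]
    yes = [a for a, c in zip(ages, cd) if c == 1]
    return sum(no) / len(no), sum(yes) / len(yes)


def ols_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


mean_no, mean_yes = group_means(AGES, CD)
intercept, slope = ols_line(AGES, CD)
print(f"Mean age, no signs: {mean_no:.1f}; with signs: {mean_yes:.1f}")
for age in (22, 81):
    # A straight line is not bounded, so its prediction can leave the 0-1 range.
    print(age, round(intercept + slope * age, 3))
```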
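SPSS estimates logistic regression coefficients by maximum likelihood. To illustrate what happens behind the dialog boxes, the sketch below fits $\ln(P/(1-P)) = \alpha + \beta X$ to the age/CD table in this lecture using Newton-Raphson iterations in plain Python. The algorithm choice, the hypothetical function name `fit_logistic`, and the iteration settings are assumptions made for illustration. SPSS's own convergence criteria may differ, so its estimates could differ slightly in the last digits.

```python
import math

# Age and CD data transcribed from the lecture's table (CD = 1: signs present).
AGES = [22, 23, 24, 27, 28, 30, 30, 32, 33, 35, 38,
        40, 41, 46, 47, 48, 49, 49, 50, 51, 51, 52,
        54, 55, 58, 60, 60, 62, 65, 67, 71, 77, 81]
CD = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0,
      0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of ln(P / (1 - P)) = a + b * x via Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information) * step = gradient.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


alpha, beta = fit_logistic(AGES, CD)
fitted = [1.0 / (1.0 + math.exp(-(alpha + beta * age))) for age in AGES]
print(f"alpha = {alpha:.3f}, beta = {beta:.4f}, Exp(B) = {math.exp(beta):.3f}")
print("Fitted probabilities range from", round(min(fitted), 3), "to", round(max(fitted), 3))
```

Unlike the straight line, every fitted probability stays strictly between 0 and 1. With these data β comes out positive (Exp(B) > 1), which matches the "older people, more disease" pattern on the scatter diagram.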
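The formula slides say that a one-unit increase in X multiplies the odds, P/(1 − P), by Exp(β). They do not say it multiplies the probability P itself by that amount. The sketch below checks this numerically using the rounded coefficients from the bivariate website-visit output (α = 3.69, β = −.07). The helper names are hypothetical.

```python
import math

ALPHA, BETA = 3.69, -0.07  # rounded B values from the bivariate SPSS output


def probability(x, alpha=ALPHA, beta=BETA):
    """Logistic function P = 1 / (1 + e^-(alpha + beta * x)); always between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def odds(p):
    """Odds of the event: P / (1 - P)."""
    return p / (1.0 - p)


def ratios_for_one_unit_increase(x):
    """Return (odds ratio, probability ratio) when x increases by one unit."""
    p0, p1 = probability(x), probability(x + 1)
    return odds(p1) / odds(p0), p1 / p0


for age in (20, 40, 60):
    odds_ratio, prob_ratio = ratios_for_one_unit_increase(age)
    # The odds ratio is e^beta at every age; the probability ratio is not constant.
    print(age, round(odds_ratio, 4), round(prob_ratio, 4))
print("Exp(B) =", round(math.exp(BETA), 4))
```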
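Each interpretation sentence on the output slides follows one rule: the percent change in the odds equals (Exp(B) − 1) × 100. The sketch below applies this rule to the B and Exp(B) values shown in this lecture. The function names are hypothetical.

```python
import math


def exp_b(b):
    """Exp(B): the odds ratio for a one-unit increase in the IV."""
    return math.exp(b)


def percent_change_in_odds(odds_ratio):
    """(Exp(B) - 1) * 100; negative values are decreases, positive are increases."""
    return (odds_ratio - 1.0) * 100.0


# (label, B) pairs taken from the SPSS outputs in this lecture.
COEFFICIENTS = [
    ("Age (bivariate)", -0.07),
    ("Age (multiple)", -0.09),
    ("Gender (multiple)", 3.15),
    ("Price (Quiz Q4)", -6.6035),
]

for label, b in COEFFICIENTS:
    ratio = exp_b(b)
    print(f"{label}: Exp(B) = {ratio:.3f}, change in odds = {percent_change_in_odds(ratio):+.1f}%")

# The SPSS exercise reports only Exp(B) = .076 for job satisfaction.
print(f"Job satisfaction: change in odds = {percent_change_in_odds(0.076):+.1f}%")
```

The slides start from the rounded Exp(B). Starting from the unrounded B gives slightly different percentages. For gender, exp(3.15) ≈ 23.34, which is an increase of about 2234% rather than 2230%.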
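The p-values in the SPSS output tables come from the Wald test. SPSS reports this as Wald = (B / S.E.)² on 1 degree of freedom. The sketch below recomputes the p-values for the multiple logistic regression example from its rounded B and standard-error values. Because those inputs are rounded, a recomputed p-value can differ from the slide in the last digit. The function name `wald_test` is hypothetical.

```python
import math


def wald_test(b, se):
    """Return (Wald statistic, two-sided p-value) for H0: beta = 0.

    Under H0, Wald = (B / SE)^2 follows a chi-square distribution with 1 df.
    This gives the same p-value as a two-sided z-test on z = B / SE.
    """
    z = b / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z * z, p_value


# B and standard error from the multiple logistic regression example.
for label, b, se in [("Age", -0.09, 0.02), ("Gender", 3.15, 0.96)]:
    wald, p = wald_test(b, se)
    print(f"{label}: Wald = {wald:.2f}, p = {p:.4f}")
```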