Date: 2023-03-15
Customer Analytics (Practice) Final Exam – 70 minutes BU.450.760.K5
NO COMMUNICATION WITH OTHERS IS PERMITTED
The Johns Hopkins Carey Business School
Honor Code
The Carey Business School measures success by the way a Carey graduate stands out as an
innovative business leader and exemplary citizen. The Carey community believes that honesty,
integrity, and community responsibility are qualities inherent in an exemplary citizen. The objective
of the Carey Business School Honor Code is to create an environment of trust among all members
of the academic community while the qualities associated with success are developed in students.
The Honor Code requires that each student act with honesty and integrity in all academic and co-
curricular activities and that each student endeavor to hold his or her peers to the same standard.
Upon witnessing an alleged violation of the Honor Code, a student is expected to inform either the
responsible faculty member or the Honor Council of both the alleged violation and the name of the
student accused of committing the alleged violation. Each member of the Carey community, as a
person of integrity, has a personal obligation to adhere to this requirement. It is only by upholding
the Honor Code that members of the entire Carey community can contribute to the School’s ability
to maintain its high standards and its reputation.
Violations of this agreement are viewed as serious matters that are subject to disciplinary sanctions
imposed by the Honor Council of the Carey Business School, which is composed of a fair
representation of part-time and full-time MBA, MS, BS and BBA students and faculty members.
INSTRUCTIONS
• No interpersonal communication.
• To answer questions, make assumptions if necessary.
• Fill in your answers in the “EXAM_ANSWERS.docx” template. Do not exceed the allotted
number of lines.
• Save your work continuously. Make sure you upload the correct file and that the upload is
successful.
• Submit the file with answers via the “Final exam” link in the assignments tab in Blackboard.
This link expires 2 minutes after the due time. If it has expired, email your submission to the
instructor (jzliu@jhu.edu). Late submissions face a per-minute point penalty.
1. [8 points] Consider the following sample corpus from Yelp. Each row (review) is a
document. Assume the list of stopwords = c(“so”, “or”, “when”, “and”, “the”) and that
non-words consist of white space, punctuation, numbers, and symbols (e.g., $).
[1] All the food is great here. But the best thing they have is their wings. Their wings are simply fantastic!!
[2] This place is truly a Yinzer's dream!! "Pittsburgh Dad" would love this place n'at!!
[3] Wing sauce is like water. Pretty much a lot of butter and some hot sauce (franks red hot maybe).
[4] The whole wings are good size and crispy, but for $1 a wing the sauce could be better.
[5] The fish sandwich is good and is a large portion, sides are decent.
(1) [4 points] After removing non-words and stopwords, what is the TF-IDF score of the
term “good” in doc 4?
# terms in doc 4 = 14
TF = (frequency of “good” in doc 4) / (# terms in doc 4) = 1/14
IDF = log2(# docs in the corpus / # docs that include the term) = log2(5/2)
TF-IDF = TF × IDF = log2(5/2) × 1/14 ≈ 0.0944
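The calculation above can be checked with a short script. This is a minimal sketch, not the course's reference solution: it assumes a simple tokenizer (lowercase, alphabetic runs only) and a base-2 logarithm for IDF, which is the choice that reproduces the stated score of about 0.0944. The function names `tokenize` and `tf_idf` are illustrative.

```python
import math
import re

STOPWORDS = {"so", "or", "when", "and", "the"}

corpus = [
    "All the food is great here. But the best thing they have is their wings. Their wings are simply fantastic!!",
    'This place is truly a Yinzer\'s dream!! "Pittsburgh Dad" would love this place n\'at!!',
    "Wing sauce is like water. Pretty much a lot of butter and some hot sauce (franks red hot maybe).",
    "The whole wings are good size and crispy, but for $1 a wing the sauce could be better.",
    "The fish sandwich is good and is a large portion, sides are decent.",
]

def tokenize(text):
    """Lowercase, keep alphabetic runs only (drops punctuation, numbers, symbols), then drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

docs = [tokenize(d) for d in corpus]

def tf_idf(term, doc_index):
    """TF = term count / doc length; IDF = log2(#docs / #docs containing the term) (assumed base 2)."""
    doc = docs[doc_index]
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)
    idf = math.log2(len(docs) / df)
    return tf * idf

score = tf_idf("good", 3)  # doc 4 is index 3
print(len(docs[3]), round(score, 4))  # 14 0.0944
```

With a natural logarithm the same inputs would give roughly 0.065, so the base of the logarithm matters when comparing against the stated answer.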
(2) [4 points] If we use this corpus to predict restaurants’ survival rate, is there a “wide X”
problem? Why or why not? Please explain.
The “wide X” problem refers to the case where the number of variables (i.e., coefficients to
estimate) >> the number of observations. In this case, there is potentially a “wide X” problem
because we have far more terms (~60-70 terms even after pre-processing) than
observations (= 5).
We can address the wide X problem by removing sparse terms from the document-term
matrix and evaluating model performance in a validation sample.
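To make the width of the design matrix concrete, the sketch below builds a binary document-term matrix for the five reviews and then drops sparse terms. The tokenizer and the "appears in at least 2 documents" cutoff are assumptions chosen for illustration; the exact term count depends on how the text is tokenized.

```python
import re

STOPWORDS = {"so", "or", "when", "and", "the"}

corpus = [
    "All the food is great here. But the best thing they have is their wings. Their wings are simply fantastic!!",
    'This place is truly a Yinzer\'s dream!! "Pittsburgh Dad" would love this place n\'at!!',
    "Wing sauce is like water. Pretty much a lot of butter and some hot sauce (franks red hot maybe).",
    "The whole wings are good size and crispy, but for $1 a wing the sauce could be better.",
    "The fish sandwich is good and is a large portion, sides are decent.",
]

# One set of distinct terms per document (assumed tokenizer: lowercase alphabetic runs).
docs = [
    {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}
    for text in corpus
]

vocab = sorted(set().union(*docs))
n_docs, n_terms = len(docs), len(vocab)

# Binary document-term matrix: one row per review, one column per term.
dtm = [[int(term in d) for term in vocab] for d in docs]

# Remove sparse terms: keep only terms that occur in at least 2 documents (assumed cutoff).
MIN_DOCS = 2
kept = [t for j, t in enumerate(vocab) if sum(row[j] for row in dtm) >= MIN_DOCS]

print(f"observations: {n_docs}, terms: {n_terms}, terms after pruning: {len(kept)}")
```

Even after pruning, the number of columns can exceed the number of rows here, which is why performance should still be judged on a validation sample.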
2. [8 points] Suppose that you work for the marketing division of the athletic apparel
company Reebok. You are now discussing the allocation of advertising dollars. In a
meeting, the chart shown below is presented. This chart describes the relationship between
the number of times Facebook users see an ad for Reebok shoes (horizontal axis) and the
probability that users will purchase a pair of Reebok shoes after clicking on the link
(vertical axis). Your colleague presents this figure in a meeting, arguing that it provides
“undisputable evidence that advertising on Facebook pays-off” and that the company
should “probably increase the number of advertising dollars in this platform.” Do you
agree? Explain your argument.
• The statement based on the graph is misleading because correlation is not causation.
• There are potentially many reasons, other than exposure to Facebook ads, that could produce
a correlation between a user’s purchase probability and the number of times that user saw an ad. For
example, people who are loyal to Reebok might follow many Reebok-related Facebook
accounts and are thus likely to see more ads on average. If more ads are allocated to people who
would purchase with high probability anyway, the ads will appear very effective to anyone
looking only at the correlation.
• This is the very definition of “selection bias” and the reason why we need to run an A/B test to
estimate the true “causal effect” of advertising on conversion (purchase probability).
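The selection-bias argument can be illustrated with a small simulation. This is a sketch under made-up assumptions: every probability below is hypothetical, and ads are given zero true effect by construction, so any gap in the observational data comes purely from loyal users seeing more ads.

```python
import random

random.seed(0)
N = 20_000

def buys(loyal):
    """Purchase depends only on loyalty; ads have zero effect by construction (assumption)."""
    return random.random() < (0.30 if loyal else 0.05)

def gap(rows):
    """Purchase rate of heavily exposed users (>= 3 ads) minus lightly exposed users."""
    high = [bought for ads, bought in rows if ads >= 3]
    low = [bought for ads, bought in rows if ads < 3]
    return sum(high) / len(high) - sum(low) / len(low)

observational, randomized = [], []
for _ in range(N):
    loyal = random.random() < 0.3
    # Observational data: loyal users end up seeing more ads.
    ads = random.randint(3, 6) if loyal else random.randint(0, 3)
    observational.append((ads, buys(loyal)))
    # A/B test: exposure is assigned by coin flip, independent of loyalty.
    ads = random.choice([0, 6])
    randomized.append((ads, buys(loyal)))

naive_gap = gap(observational)  # clearly positive despite a zero causal effect
ab_gap = gap(randomized)        # close to zero, the true effect
print(f"observational gap: {naive_gap:.3f}, A/B gap: {ab_gap:.3f}")
```

The observational comparison suggests ads "work" even though they do nothing in this setup, while random assignment recovers an effect near zero.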
3. [8 points] The blue line in the graph below represents the outcomes of a set of units
affected by a shock (i.e., an “as if” natural experiment) that unfolded in week 60 of the dataset
at hand. To evaluate the impact of this shock on treated units, we would like to implement a
diff-in-diff analysis, which requires us to select a control series. Our data contains two
candidate control series, controls #1 and #2, respectively shown by the red and green lines.
Which of these two controls would you select to implement the diff-in-diff analysis? Justify
your answer.
We should choose control #1 as the control group because, based on the data before week 60 (i.e.,
prior to the event), it follows a trend parallel to that of the treated group.
By contrast, the outcome trend of candidate control #2 is almost opposite to that of the
treated group, implying that the post-event gap between the two groups is not attributable solely to the
treatment effect.
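One rough way to screen candidate controls is to compare pre-shock slopes. The sketch below uses hypothetical series (they are not read off the exam figure) and a hand-written least-squares slope; in practice one would also plot the series and inspect the full pre-period.

```python
def slope(ys):
    """Ordinary least-squares slope of ys against the time index 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

# HYPOTHETICAL pre-shock outcomes, for illustration only.
treated = [10, 12, 14, 16, 18]
control_1 = [5, 7, 9, 11, 13]      # same slope as treated, different level
control_2 = [20, 18, 16, 14, 12]   # opposite slope

# Parallel trends concerns slopes, not levels: a level difference is fine.
gap_1 = abs(slope(treated) - slope(control_1))
gap_2 = abs(slope(treated) - slope(control_2))
best = "control #1" if gap_1 < gap_2 else "control #2"
print(best)  # control #1
```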
4. [6 points] The figure below represents the stylized impact of a shock (i.e., a natural
experiment) on a set of treated units (in red). Blue markers represent the outcomes that are
observed for an (adequate) control.
What are the implied values of the parameters β0, β1, β2, β3 in the diff-in-diff equation below?
y = β0 + β1 × Post + β2 × Treated + β3 × Treated × Post + ε
(Post and Treated are 0/1 indicators for the post-shock period and the treated group.)
β0 = 33, β1 = (15 − 33) = −18, β2 = (22 − 33) = −11, β3 = (38 − 22) − (15 − 33) = 16 + 18 = 34
(Note that if not due to the event, the outcome of the treated group was supposed to fall based on
the trend of the control group.)
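The parameter values follow directly from the four group means used in the answer. The sketch below assumes the mapping implied by that arithmetic: control before = 33, control after = 15, treated before = 22, treated after = 38. The variable names are illustrative.

```python
# Group means implied by the arithmetic in the answer.
control_pre, control_post = 33, 15
treated_pre, treated_post = 22, 38

b0 = control_pre                      # baseline: control group before the shock
b1 = control_post - control_pre       # time effect shared by both groups
b2 = treated_pre - control_pre        # pre-existing level gap of the treated group
b3 = (treated_post - treated_pre) - (control_post - control_pre)  # diff-in-diff

# Sanity check: the four coefficients must reproduce the treated post-shock mean.
assert b0 + b1 + b2 + b3 == treated_post

# Counterfactual for the treated group had it followed the control trend.
counterfactual = treated_pre + b1

print(b0, b1, b2, b3, counterfactual)  # 33 -18 -11 34 4
```

The counterfactual of 4 shows why the treatment effect (34) is larger than the raw before/after change in the treated group (16).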
5. [10 points] Suppose that you manage marketing campaigns for a subscription-based
business. Your customer base is described by three segments, as shown by the table below.
You are considering a campaign that consists of sending out a one-time gift worth
$100 (e.g., a champagne bottle) per segment. It is believed that this gift would
reduce the churn probability of segment 1 customers by 0.05, of segment 2 customers by
0.03, and of segment 3 customers by 0.01.
[Assume unlimited budget.]
• [5 points] Would it make sense to adopt this campaign (all segments receive the
gift)? What would be the impact on the valuation of the firm’s customer
portfolio?
No. This is because the gain from the campaign is less than its cost for segments
1 and 2.
If we were to target all segments, the valuation of the customer portfolio would
increase by $136.
• [5 points] Judging from your results, should the company consider an
alternative targeting policy? What would be the impact on the valuation of the
firm’s customer portfolio in this case?
Yes. The company should target only segment 3 because only that segment
satisfies (new CLV − baseline CLV) > $100.
Under the optimal policy, the valuation would increase by $104.
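Notes: (i) Use a discount factor of 0.97. (ii) Work in Excel, but paste your results into the
document containing your answers.
[CLV formula: the symbols did not survive in the source; the remaining fragments have the form 1 / (1 − a × b).]
Target a segment only if (new CLV − baseline CLV) > $100.
Total portfolio = SUM(CLV × % of customer base)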
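The targeting logic can be sketched in code. Two things here are assumptions, not taken from the exam: the CLV formula (a common simple-retention form, margin / (1 − retention × discount)) and every segment share, margin, and baseline churn value, since the exam's table is not reproduced. Only the discount factor, the $100 gift cost, and the churn reductions come from the question, so the output will not match the dollar figures in the answer.

```python
DISCOUNT = 0.97   # from the exam notes
GIFT_COST = 100   # from the question

def clv(margin, churn, discount=DISCOUNT):
    """ASSUMED formula: margin / (1 - retention * discount), with retention = 1 - churn."""
    return margin / (1 - (1 - churn) * discount)

# share, margin, churn are HYPOTHETICAL; churn_cut values come from the question.
segments = {
    1: dict(share=0.5, margin=50, churn=0.30, churn_cut=0.05),
    2: dict(share=0.3, margin=100, churn=0.20, churn_cut=0.03),
    3: dict(share=0.2, margin=200, churn=0.10, churn_cut=0.01),
}

# Net gain per customer = (new CLV - baseline CLV) - gift cost.
net_gain = {}
for seg, s in segments.items():
    gain = clv(s["margin"], s["churn"] - s["churn_cut"]) - clv(s["margin"], s["churn"])
    net_gain[seg] = gain - GIFT_COST

# Target only segments where the CLV gain exceeds the gift cost.
targeted = [seg for seg, g in net_gain.items() if g > 0]

# Portfolio impact = SUM(net gain * % of customer base) over the segments that get the gift.
change_all = sum(segments[k]["share"] * net_gain[k] for k in segments)
change_targeted = sum(segments[k]["share"] * net_gain[k] for k in targeted)

print(targeted, round(change_all, 2), round(change_targeted, 2))
```

Under these hypothetical inputs only segment 3 clears the $100 threshold, and gifting only the profitable segments can never yield a lower portfolio value than gifting everyone.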