Date: 2024-08-30
BSAN2205 MACHINE LEARNING FOR BUSINESS
Project Plan
The course BSAN2205 Machine Learning for Business has three assessment items: a
Project Plan, a Project Report and Presentation, and a School-based Take-home Assessment
(weighted 20%, 50%, and 30%, respectively). These notes outline my expectations for the
Project Plan and introduce the context for the project work. I intend the Plan or proposal to be a
formative piece of assessment. The Plan should set the groundwork for your project and project
report. I will provide feedback on your Plan that you can incorporate into your project.
Background and Context

In competitive markets, businesses face the challenge of acquiring and retaining customers.
Consider subscription services, for example, subscriptions to digital editions of newspapers and
magazines, subscriptions to streaming services (film and television, music, news, sport, etc.), and
subscriptions to cable television services (e.g., Foxtel). Other businesses face the same challenges, for
example, airlines, banks, insurance companies, telecommunication companies, retailers,
restaurants, and personal services businesses. One retention strategy is to deepen relationships
with customers through “upselling”: convincing a customer to buy something in addition to, or more
expensive than, what they have previously purchased from the business. Streaming services like Netflix
and Spotify strive to build customer “engagement” by increasing the number of downloads and/or
the time spent streaming.

Bank marketing provides the specific context for the project. Like many consumer businesses, banks
confront the challenges of attracting new customers and retaining existing customers. Strategies for
retaining customers provide the setting for the project. For banks, engagement is reflected in the
number of products (active accounts) customers maintain. Retention strategies often aim to
deepen engagement by encouraging customers to open new accounts. Consolidating accounts
with one bank rather than many may offer consumers some benefits at the margin. For example,
highly engaged customers may be offered lower rates on loans and access to services for which they do
not have to pay (at least, not directly), and they avoid the burden of managing multiple
banking relationships. For banks, the benefits of more highly engaged customers are larger and
more stable cash flows, lower marketing expenses (because attracting a new customer typically costs
more than retaining an existing one), and thus potentially higher profits.

Before moving on, I would like you to appreciate that many problems in business can be addressed with
effective predictive models of binary outcomes: the decision to purchase or not purchase shares in
a company, to acquire or merge with another business, to hire or not hire a prospective employee,
and so on. All of these decisions involve binary outcomes (in some cases, they can be characterised as
“go/no go” decisions). The specific focus of the project is customer acceptance of a marketing offer,
but the concepts and models have much broader application.
Aims of the Proposal
The Project Plan has two broad aims. First, the Plan is a marketing document. Second, the Plan is
a roadmap. As a marketing document, the Project Plan must sell the project to the stakeholder(s)
and/or client. Thus, the Plan should emphasise the benefits of doing the project. As a proposal or
“roadmap,” the Project Plan should outline in some detail the likely direction of the project. This
might include identifying the key variables and methods of analysis.
Key Sections of the Project Plan

More specifically, you might consider including the following sections in your Plan.
1. Background statement
2. Conceptual development
3. Variable selection
4. Methods of analysis/analysis plan
5. Form of the results
6. Next steps
In the background statement (section 1), you may wish to sketch out the initial motivation for the
study. This might include reference to the key stakeholder(s) and/or client. I recommend targeting
the proposal at a (hypothetical) client to bring a degree of realism to the project and to help focus it
(for example, you could contextualise the study with reference to an Australian bank). In this
section, also make sure to sell the project. For example, what are the likely benefits of doing the project,
what new insights do you anticipate, and how will these improve decision making?
You might find value in a section 2 that outlines the conceptual framework for your project work. If
you focus your project on customer engagement with banks, for example, you might give some
thought to the advantages to banks and their customers of greater engagement, and to the process that
might lead customers to respond favourably to a bank’s marketing efforts. My preference is that you
use your own common sense and logic to define the key concepts and to develop a rationale for
their links. I do not expect a review of the literature, but you might find some desk (Google)
research helpful in identifying past studies that have explored issues similar to yours. A
boxes-and-arrows diagram might help to illustrate the core concepts and relationships.
The section on variable selection (section 3) is probably the key section. Be very specific about the
variables you intend to study. In the social science tradition, much emphasis is placed on explaining
why the variables selected for study have been selected: the focus is explanation rather than
prediction. This is less the case with the data science paradigm and its focus on prediction;
business analysts/data scientists may wish to specify an (initial) model that includes all of the possible
feature variables. My minimum expectation for this section is that you describe the output and
feature variables you intend to study and explain why you chose these feature variables.
Section 4 outlines the methods of analysis. Here I would like you to be specific about the models you
might use to analyse the data. You may have completed the course BSAN2204 Methods of Business
Analytics. A focus of that course was predicting a numeric output variable (“song hotness”) using
linear regression. For this course (BSAN2205 Machine Learning for Business), our target variable is
categorical: it records whether customers opened or did not open a new account in response to the
Bank’s marketing efforts. My expectations for section 4 are that you can identify one or more appropriate
statistical models for analysing the data, state something about the assumptions of each model, and
perhaps list the key steps in employing the model. You could also write out the specific model you
intend to estimate (for example, write out the regression equation with reference to the y- and
x-variables).
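As one illustration of writing out the model, a logistic regression for the response variable in Appendix A could be specified as below. This is a sketch only: the particular predictors shown (age, campaign, euribor3m and dummy variables for job) are an assumption chosen for illustration, not a recommended specification.

```latex
\ln\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0 + \beta_1\,\mathrm{age}_i + \beta_2\,\mathrm{campaign}_i
  + \beta_3\,\mathrm{euribor3m}_i + \sum_{k=1}^{K} \gamma_k\, D_{ik},
\qquad
p_i = \Pr(\mathrm{response}_i = \mathrm{yes} \mid \text{features of customer } i)
```

Here i indexes customers, p_i is the probability that customer i opens a new account, and D_i1, ..., D_iK are 0/1 dummy variables for the job categories other than a chosen reference category. Writing the right-hand side as η_i, the model implies p_i = 1 / (1 + exp(−η_i)), so predicted probabilities always lie between 0 and 1. The coefficients are usually estimated by maximum likelihood; key assumptions include independent observations and log-odds that are linear in the predictors, and, unlike linear regression, there is no assumption of normally distributed errors.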
Section 5 (form of the results) should give an indication of what the outputs might look like. You
could prepare a mock-up of the results. You could also say that you will document the results in
PowerPoint format and present them verbally. At this stage, refrain from doing any statistical
analysis of the data; save the analysis for the project reports. Use the Plan to develop some general
knowledge of the models you intend to use and to sketch out your best plan for the analysis you
intend to implement.
The final section of your Plan might address next steps (section 6). This section concludes the
proposal. You can briefly restate the main motivation for your Plan, remind the client of the core
benefits, and highlight the key “next steps,” including what you need to initiate the project (final
client sign-off, for example). You could also add a timeline or perhaps a Gantt chart (timetabling the
key activities, when you will do them, and identifying any critical paths). Remember the Plan is a
marketing document: perhaps remind the reader that this project is an important one and should
be completed now.
The Bank Marketing Dataset
The project work for this Semester uses the Bank Marketing dataset. Several variations of the
dataset exist: one is available from the UCI Machine Learning Repository and another on Kaggle.
We will use the version of the dataset available from Kaggle (with some
minor variations). Owned by Google, Kaggle is an online community of business analysts and data
scientists. Users can freely upload and download data to and from the site (kaggle.com). Kaggle
runs competitions, often sponsored by third parties. I encourage you to explore the Kaggle website
and join the Kaggle community. Kaggle is a great place for those with an interest in machine
learning.
I have downloaded the dataset from Kaggle, introduced some further variations, and placed the
dataset on the Blackboard site. Please use this version of the dataset for your project. Appendix A
provides a list of the variables in the Bank Marketing dataset, including brief descriptions. The target
or output variable is customers’ responses to a recent marketing campaign run by the Bank (a
Portuguese bank). The data is real-world data offered freely by the Bank to the data science
community. The data consists of 21 variables (the target variable and 20 feature variables) and
observations on approximately 40,000 customers targeted with a particular marketing campaign.
The output variable is a binary categorical variable: customers responded to the marketing campaign
by either opening a new account or not. The 20 feature variables include a mix of variables reflecting
customers’ characteristics (age, education, etc.), the nature and status of their existing accounts
with the Bank (housing and personal loans, credit in default, etc.), variables describing the campaign
(number of customer contacts during the campaign), and socio-economic variables (consumer
confidence, etc.). The feature variables are a mix of categorical and numeric variables.
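To make the split between the output and feature variables concrete, the sketch below records the Appendix A variable names by type and shows one way to encode them for modelling. It is a minimal, standard-library Python illustration that assumes each customer record has already been read into a dictionary keyed by the Appendix A variable names; the function names are hypothetical, and in practice a library such as pandas or scikit-learn would usually handle the encoding.

```python
# Variable names as listed in Appendix A (Table A1).
TARGET = "response"

CATEGORICAL_FEATURES = [
    "job", "marital", "education", "default", "housing",
    "loan", "contact", "month", "day_of_week", "poutcome",
]
NUMERIC_FEATURES = [
    "age", "duration", "campaign", "pdays", "previous",
    "emp_var_rate", "cons_price_idx", "cons_conf_idx",
    "euribor3m", "nr_employed",
]


def encode_target(value: str) -> int:
    """Map the binary response label to 1 (yes) or 0 (no)."""
    if value == "yes":
        return 1
    if value == "no":
        return 0
    raise ValueError(f"unexpected response label: {value!r}")


def one_hot(rows: list[dict], column: str) -> list[dict]:
    """Return one 0/1 indicator column per observed category of `column`.

    'unknown' is kept as its own category rather than treated as missing.
    """
    categories = sorted({row[column] for row in rows})
    return [
        {f"{column}={cat}": int(row[column] == cat) for cat in categories}
        for row in rows
    ]
```

Keeping “unknown” as a separate category avoids discarding customers with missing information. For logistic regression, one indicator per categorical variable is normally dropped as the reference category to avoid perfect collinearity with the intercept; tree-based methods do not need this.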
Given the output variable is a (binary) categorical variable, you should explore model forms other
than linear regression. As a starting point, I recommend you fit a logistic regression model to the
data and subsequently use tree-based methods. A comparison of these methods (logistic regression
vs decision trees) could be an important part of your overall project. Further, you might explore
ensemble methods to enhance your implementation of tree-based methods. We will cover these
methods in the coming weeks!
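To give a feel for how tree-based methods work, the sketch below shows the core step a classification tree repeats at every node: choosing the split of a numeric feature that makes the two child nodes as “pure” as possible. It is an illustrative standard-library Python sketch using Gini impurity (one common splitting criterion; entropy is another), and the function names are hypothetical. It is not a full tree; ensemble methods such as random forests and boosting build many such trees.

```python
from collections import Counter


def gini(labels: list[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a node's class labels (0 = no, 1 = yes)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(values: list[float], labels: list[int], threshold: float) -> float:
    """Weighted Gini impurity of the child nodes from the rule `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values: list[float], labels: list[int]) -> tuple[float, float]:
    """Try each observed value as a threshold; return (threshold, impurity) with the lowest impurity."""
    candidates = sorted(set(values))
    return min(
        ((t, split_impurity(values, labels, t)) for t in candidates),
        key=lambda pair: pair[1],
    )
```

For example, on the made-up values `best_threshold([1, 1, 2, 3, 5, 8], [1, 1, 1, 0, 0, 0])` (imagine campaign contacts and responses; these are not data from the Bank dataset), the result is `(2, 0.0)`: splitting at two contacts separates the toy responders and non-responders perfectly.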
Submission Guidelines
The Project Plan is worth 20 percent of your score for the course. Please submit your Plan in
the form of a written Word document. I expect you could easily write 2,000 words. Try not to write
more than 3,000. I will give your Plan a score out of 100. I will also provide you with written
feedback. When marking the Project Plan, I will be looking closely at the links between the sections
as much as at what you write in each individual section. For example, the background statement
should set up the conceptual development, which in turn should set up the variable selection, etc. A
high-scoring Plan will have a degree of novelty to it (a unique and/or compelling contextualisation, a
thoughtfully specified analysis plan including appropriate performance metrics, etc.). Finally, these
notes are only a guide to preparing your Project Plan. You may find other ways to present it that are
more compelling, more compact, and more complete. If in doubt, do what you think is best.
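The “appropriate performance metrics” mentioned above will depend on your business context. As one hedged illustration, the standard-library Python sketch below computes common metrics for a binary classifier from actual and predicted labels coded 1 (yes) and 0 (no). The function names are hypothetical; libraries such as scikit-learn provide equivalent functions.

```python
def confusion_counts(actual: list[int], predicted: list[int]) -> dict[str, int]:
    """Count true/false positives/negatives, treating 1 ('yes') as the positive class.

    Assumes both lists contain only 0/1 labels.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1
        elif p == 1 and a == 0:
            counts["fp"] += 1
        elif p == 0 and a == 0:
            counts["tn"] += 1
        else:  # predicted 0, actual 1
            counts["fn"] += 1
    return counts


def classification_metrics(actual: list[int], predicted: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for the 'yes' class (0.0 when undefined)."""
    c = confusion_counts(actual, predicted)
    total = sum(c.values())
    predicted_yes = c["tp"] + c["fp"]
    actual_yes = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_yes if predicted_yes else 0.0
    recall = c["tp"] / actual_yes if actual_yes else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because customers who open an account are likely to be a minority of those contacted, accuracy alone can look flattering: a model that predicts “no” for every customer may score highly while identifying no responders. Precision (how many targeted customers actually respond) and recall (how many responders the model finds) are usually more informative for a marketing decision.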
I will separately provide you with the marking criteria for the Project Plan. Note that they will closely
follow the criteria for the Project Plan in the course BSAN2204 Methods of Business Analytics.
Appendix A
The Bank Marketing Dataset is based on the “Bank Marketing” UCI dataset, with some variations.
Table A1 below lists the variables in the dataset and offers brief descriptions.
Table A1
Variables and Variable Descriptions

Variable | Variable Name | Variable Type | Units/Category Labels
--- | --- | --- | ---
Age | age | Numeric | Years
Type of Job | job | Categorical | admin, blue-collar, entrepreneur, housemaid, management, retired, self-employed, services, student, technician, unemployed, unknown
Marital Status | marital | Categorical | divorced, married, single, unknown
Education History | education | Categorical | basic4y, basic6y, basic9y, highschool, illiterate, professionalcourse, universitydegree, unknown
Credit in Default | default | Categorical | no, yes, unknown
Housing Loan | housing | Categorical | no, yes, unknown
Personal Loan | loan | Categorical | no, yes, unknown
Contact Type | contact | Categorical | cellular, telephone
Month of Last Contact | month | Categorical | jan, feb, mar, ...
Day of Last Contact | day_of_week | Categorical | mon, tue, wed, thu, fri
Duration of Last Call | duration | Numeric | Seconds
Number of Contacts | campaign | Numeric | Counts
Days since Last Contact | pdays | Numeric | Days
Prior Contacts | previous | Numeric | Counts
Response to Last Campaign | poutcome | Categorical | failure, nonexistent, success
Cyclical Employment Variation | emp_var_rate | Numeric | Index
Consumer Price Index | cons_price_idx | Numeric | Index
Consumer Confidence | cons_conf_idx | Numeric | Index
Euro Interbank Offered Rate (Euribor), 3-month | euribor3m | Numeric | Interest rate
Employment Rate | nr_employed | Numeric | Index
Customer Response | response | Categorical | no, yes