Python Assignment Help - QBUS6810

Date: 2022-05-23

HD EDUCATION

QBUS6810
Assignment Extension Class

TUTOR: Burger

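Tutor self-introduction:

1. USYD master's graduate, class of 2021
2. Business analytics and finance
3. HD average in business analytics; received recommendation letters for QBUS6810 and QBUS6860
4. Conscientious and responsible; has served as group leader many times
5. Enjoys hunting down good burgers and cooking
6. Interned in core departments at Bank of China and CITIC Securities; now works in project operations at a multinational company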

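1. Click [Participants]
2. Click [Raise Hand] to interact with the tutor in real time
3. Once your question has been answered, click [Lower Hand]
Type your question in the area circled in red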

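Course characteristics and study methods

Course characteristics:
1. QBUS6810 extends BUSS6002, adding many new models on top of the regression models covered there.
2. It is also a prerequisite for QBUS6850, laying the theoretical and coding foundations for later study.
3. It is one of the flagship courses in USYD business analytics, and is very helpful for anyone who wants
to work as a business analyst or in data analysis and risk management at a financial institution.

Study methods:
1. There is some maths, but it is not hard, so don't be afraid. Focus on understanding; rote memorisation does little.
2. The course covers a lot of material, so follow the lectures closely every week. The topics build on each other
and the first few weeks are the key foundation; if you miss them, it is hard to keep pace later. Don't leave everything until exam time and cram!
3. Summarise the key points and common mistakes yourself, and get your reasoning straight.
Ideally, resolve your questions every week: ask the tutor in the tut, or ask by email or on Ed.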

Topics covered in this session
Assignment weight: 30%
To submit: a report of up to 15 pages, team expectations agreement, Kaggle predictions, Python code, self and
peer assessment
General rules:
• Originality. The analysis of the data must be entirely the group's original work. Material borrowed
from any source based on the same or a similar dataset will be disregarded in the marking, even with
appropriate referencing.
• Length. Your written report should have a maximum of 15 pages (single-spaced, 11pt; cover
page, references, and appendix not included). However, there would be no penalties for exceeding
the limit, within reason. The main part of your report should be objective and focus on the highlights of
your analysis. You can include as much extra material to support the main report as you like in the
appendix.
• Computation. You must use Python for this assignment.
• Kaggle competition. Your work should be strictly based only on the training and test sets provided.
The predictions for the test data on Kaggle must come from your own analysis in Python and be
consistent with the description in the report.
• Announcements. You must follow any further instructions announced on Canvas.
• Referencing. You must follow the University of Sydney referencing rules and guidelines. In particular,
you must use quotation marks and provide a reference whenever you copy and paste someone else's
words.
• University rules. You must follow all other University of Sydney rules relevant to this assessment.

Deadline:

Regression Project: Airbnb Pricing Analytics

1. Overview:
In this project your team will analyse data from Airbnb rentals in Sydney to provide market advice to hosts,
real estate investors, and other stakeholders (these are your clients).
Two tasks:
• build a predictive model for vacation rental prices
• uncover interesting facts from the data that can help your clients make better decisions (that is, find
patterns in the data that clients can use to make good investment or pricing decisions)
2. Problem description
Airbnb (www.airbnb.com) is a global platform that runs an online marketplace for short term travel rentals.
As a team of data scientists and business analysts working at a market intelligence and consulting company
targeting the Airbnb market, you are tasked with developing an advice service for hosts, property managers,
and real estate investors.

To achieve your project’s goals, you are provided with a dataset containing detailed information on a number
of existing Airbnb listings in Sydney. Your team has two tasks:
1) To develop a predictive model for the daily prices of Airbnb rentals based on state-of-the-art machine
learning techniques. This model will allow the company to advise hosts on pricing and help
owners and investors predict the potential revenue of Airbnb rentals (which also depends on the
occupancy rate). In short: predict the price.
2) To obtain at least three insights that can help hosts to make better decisions. What are the best hosts
doing? (What does a good host do to earn good reviews, command a higher price, raise the occupancy rate, ...?
Look for distinctive patterns in the data. What insights did you find?)
We will refer to these tasks as statistical learning and data mining respectively.
As part of the contract, you are asked to write a report according to the instructions given below.

3. Understanding the data
3.1 Training, validation, and test sets
The data are split into two files, a training dataset and a second dataset for validation and evaluation. The
second omits the price values.
We will run a Kaggle competition as part of the assignment. Kaggle randomly splits the observations in the
second file into validation (50%) and test (50%) cases, but you will not know which ones are which. You get a
score equal to the RMSLE computed on the validation cases when you submit to the competition. These
scores are displayed on the Public Leaderboard and provide an ongoing ranking of teams. You can use the
scores of your submissions to help you select the best predictive model.
You will select one of your submissions to be used as the final model at the end of the competition. Once the
competition is over, Kaggle will rank the teams’ final submissions based on the test cases only. Those will be
displayed on the Private Leaderboard. Your goal is to achieve the best possible score on the Private
Leaderboard at the end of the competition.
Be careful not to overfit the validation cases in an attempt to improve your public ranking.
(You need to split the data into a training set and a validation set.)
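
As a rough illustration of the split and the competition metric, the sketch below holds out part of a training set and scores a simple model by RMSLE. It assumes the usual log(1 + y) definition of RMSLE (the brief does not spell the formula out) and uses synthetic data in place of the real files; the `rmsle` helper and all variable names are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, assuming the log(1 + y) form."""
    y_true = np.asarray(y_true, dtype=float)
    # Prices cannot be negative, so clip predictions at zero before taking logs.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


# Synthetic stand-in for the training file (assumption: not the real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
price = np.exp(4.5 + 0.4 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.3, size=500))

# Hold out 30% of the labelled data as a local validation set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, price, test_size=0.3, random_state=0
)

# Fitting on log(1 + price) makes squared error on that scale line up with RMSLE.
model = LinearRegression().fit(X_train, np.log1p(y_train))
pred_valid = np.expm1(model.predict(X_valid))

valid_rmsle = rmsle(y_valid, pred_valid)
print(f"Validation RMSLE: {valid_rmsle:.3f}")
```

A local validation score like this can be compared across models without spending Kaggle submissions, which also reduces the temptation to tune against the Public Leaderboard.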

3.2 Data description
Each row corresponds to a separate Airbnb listing in Sydney. Because the dataset was scraped from Airbnb,
a detailed description of the variables is not available. However, you can identify their meanings from the
context.
The response variable, price, is the last column in the training dataset. It gives the price per night for each
listing in Australian Dollars. The latitude and longitude variables specify the geographic location of each
property. Some variables are binary, with the word “true” recorded as “t” and “false” recorded as “f”.
Since this is a real dataset, you will encounter several practical issues, such as redundant columns and
missing values. Overcoming these practical problems is part of the assessment.
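
The practical issues mentioned above (the "t"/"f" flags, redundant columns, missing values) could be handled along the following lines. This is only a sketch on a toy table: the column names are hypothetical, and median imputation with a missingness indicator is one common choice rather than anything the brief prescribes.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the scraped listings; column names are hypothetical.
df = pd.DataFrame({
    "host_is_superhost": ["t", "f", None, "t"],
    "instant_bookable": ["f", "f", "t", "t"],
    "bedrooms": [1.0, np.nan, 3.0, 2.0],
    "country": ["Australia"] * 4,  # constant, hence redundant
    "price": [120.0, 85.0, 300.0, 150.0],
})

# 1) Recode "t"/"f" flags as 1/0, leaving missing values as NaN for now.
flag_cols = [
    c for c in df.columns
    if df[c].notna().any() and df[c].dropna().isin(["t", "f"]).all()
]
for c in flag_cols:
    df[c] = df[c].map({"t": 1.0, "f": 0.0})

# 2) Drop columns with a single distinct value: they carry no information.
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
df = df.drop(columns=constant_cols)

# 3) Median-impute predictors and record where values were missing.
#    (With real data, compute the medians on the training fold only.)
for c in df.columns.drop("price"):
    if df[c].isna().any():
        df[c + "_missing"] = df[c].isna().astype(int)
        df[c] = df[c].fillna(df[c].median())

print(df)
```

Keeping the `_missing` indicator lets a model pick up cases where the absence of a value is itself informative.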

4. Statistical Learning (Task 1)
Requirements:
• Your report must provide validation or cross-validation metrics based on the training set.
• Your report must provide the Kaggle Public Leaderboard scores for at least five different sets of
predictions, including your final model. You need to submit to Kaggle to get each validation score. The
five sets of predictions should all come from different machine learning methods.
• At least one of your models should be a linear model.
• At least one of your models should be a tree-based model.
• At least one of your models should be a model average or model stack.
• Identify one of your five models as a benchmark (typically linear regression).
Note that these are only minimum requirements. Refer to the rubric for the details on the marking criteria.
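
One possible way to line up a benchmark, a linear model, a tree-based model and a stack is sketched below with scikit-learn on synthetic data. The specific estimators (OLS, ridge, random forest) and their settings are illustrative assumptions, not the required choices, and only four of the five required sets of predictions are shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression, RidgeCV
from sklearn.model_selection import cross_val_score

# Synthetic log-price data with one nonlinear effect (assumption: not the real data).
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 5))
log_price = 4.5 + 0.5 * X[:, 0] + 0.3 * np.abs(X[:, 1]) + rng.normal(scale=0.2, size=400)

ridge = RidgeCV(alphas=np.logspace(-3, 3, 13))
forest = RandomForestRegressor(n_estimators=50, random_state=0)

models = {
    "benchmark (OLS)": LinearRegression(),
    "linear (ridge)": ridge,
    "tree-based (random forest)": forest,
    # The stack combines out-of-fold predictions of the base models.
    "stack (ridge + forest)": StackingRegressor(
        estimators=[("ridge", ridge), ("forest", forest)],
        final_estimator=LinearRegression(),
        cv=5,
    ),
}

# 5-fold cross-validated RMSE on the log scale for each candidate.
cv_rmse = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X, log_price, cv=5, scoring="neg_root_mean_squared_error"
    )
    cv_rmse[name] = -scores.mean()
    print(f"{name:28s} CV RMSE (log scale): {cv_rmse[name]:.3f}")
```

Reporting every model against the same benchmark and the same folds makes the comparison in the report easier to justify.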

Suggested:
• Try to build at least some features based on text data.
5. Data Mining (Task 2)
Business question: What are the best hosts doing? Requirements:
• Extract at least three quantitative insights from the data that address the business question.
• The meaning of “best hosts” is for the group to decide based on the context of the project. Your clients are
hosts and real estate investors, so they’d probably be interested in maximising their property income.
Therefore, you want to consider outcomes that relate to that, such as price and revenue.

Notes:
• This task is open-ended as is the nature of data mining applications. Here you should think creatively and
explore the data in a way that is interesting for you. The ability to explore open-ended problems is important
for industry work in data science.
• Remember that association is not causation. Do not oversell your insights.
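
To make "quantitative insight" concrete, the toy example below compares median prices for two groups of hosts, first overall and then within room type. The data and column names are invented for illustration. Even the within-group gap is only an association, which is why the wording of any conclusion should stay cautious.

```python
import pandas as pd

# Invented listings; the column names are hypothetical.
listings = pd.DataFrame({
    "host_is_superhost": [1, 1, 1, 0, 0, 0, 0, 1],
    "room_type": ["Entire home", "Entire home", "Private room", "Private room",
                  "Entire home", "Private room", "Entire home", "Private room"],
    "price": [220, 260, 95, 80, 200, 70, 210, 100],
})

# Raw comparison: median price by superhost status.
raw = listings.groupby("host_is_superhost")["price"].median()

# Same comparison within room type, because room type drives price
# and may be distributed differently across the two groups.
by_type = (
    listings.groupby(["room_type", "host_is_superhost"])["price"]
    .median()
    .unstack()
)
by_type["gap"] = by_type[1] - by_type[0]

print(raw)
print(by_type)
```

Stating the size of the gap, the groups it was computed on, and what was held fixed is usually more convincing to clients than a bare claim that one group charges more.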

6. Written report
The purpose of the report is to describe, explain, and justify your solution to the clients. You can assume that
the clients have training in business analytics. However, please do not assume that they are experts on the
methods used in your project. (They are not experts, so write in plain language!)
Preparing the report will involve careful consideration of what should go in the main text (15 pages). The main
text should focus on the highlights of the project. Note that there is no page limit for the appendix. It’s ok to
put extra material (such as additional figures and tables) in the appendix and refer to it in the main text.
Requirements:
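• The report should discuss problem formulation, exploratory data analysis, feature engineering,
methodology, and results. (That is: what the problem is, how you did the EDA and prepared the data, which
models you used and what they do, and your results and conclusions.)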
• Write about the data mining task in a separate section.
• In the problem formulation section, discuss the business problem from the perspective of decision
theory. In particular, is Airbnb pricing a prediction problem? In what ways can machine learning
meaningfully help hosts to increase revenue or reduce costs?
• Discuss three models in detail in the methodology section. One model should be your best linear model,
the other your best nonlinear model, and the third is the model stack (or average).
• When you submit the report on Canvas, include the Python code that generates all the results that
appear on the report as an additional attachment.
Suggested outline:
1. Introduction: write a few paragraphs introducing the project and overview the methodology and main
results. Use plain English and avoid technical language as much as possible in this section (write it for
a broad audience).
2. Problem formulation and objectives: state the problem to be solved and the goals of the project.
3. Exploratory data analysis: provide essential information about the data, discuss potential issues and
highlight the most interesting findings. Due to a possible lack of space, you may want to refer to the
appendix for most EDA plots.
4. Feature engineering.
5. Methodology: focus on the three models specified above. Explain the rationale for using these
learning algorithms and explain the choices that you’ve made regarding configuration, training and
hyperparameter optimisation. This part is allowed to be more technical than the rest of the report.
6. Results.
7. What are the best hosts doing?
7. Kaggle Competition
We’ll post the link to join the competition on Canvas.
You will need to create a Kaggle account identifiable by your name to access the competition and make
submissions. After creating an account and logging into Kaggle, use the provided link to get to the competition
page. Click on “Join Competition”, located in a light blue box near the top right corner of the page, then click
to accept the competition rules.

Each group should create a team on Kaggle. The group leader can create a team by joining the competition
and then going into the “Team” tab, which will appear near the top of the competition page. The leader can
then invite other group members using their (Kaggle) names. The name of the Kaggle team must be identical
to the group name on Canvas, i.e. the team number must match the group number. Each student in the group
must sign up and be identifiable as a member of a Kaggle team.

Requirement: the Kaggle team must be set up and have a valid submission by the first prediction deadline
posted on Canvas.

The purpose of the Kaggle competition is to incorporate feedback by allowing you to compare your
performance with that of other groups. Participation in the competition is part of the assessment. Make sure
that your final submission is correct. Your ranking in the competition will typically not affect your marks directly,
as long as we can establish that your participation represents a genuine effort to submit good predictions and
improve them over the course of the competition.

Real-world relevance: employers highly value the ability to participate in a Kaggle competition. Some
companies in Australia go as far as to set up a Kaggle competition just for recruitment.

Bonus marks: The team with the best performance on the Private Leaderboard will receive ten bonus marks
on the assignment. To qualify for the bonus, the choice of the final model needs to be well justified in the report,
and your Python code must reproduce the winning predictions. Furthermore, the group would need to post a
description of their winning solution on Ed. Please do this as soon as you can once the competition is finished.
Attention! You have to manually select which submission Kaggle will use to compute the test (Private
Leaderboard) results. It will not necessarily pick the best submission for you.

Classification Project: Marketing Analytics
1. Overview
In this project, your team will analyse marketing data from a bank and a retail company. Your team will have
two tasks. The first will be to build machine learning models to predict the success of marketing campaigns.
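The second will be to uncover insights that can help your clients make better marketing decisions. (In short:
analyse the data from the bank and the retail company, model whether their marketing campaigns succeed, and
advise the clients on how to make good marketing decisions.)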
2. Problem description
As a team of data scientists and business analysts working for a marketing consulting company, you have
been tasked with helping two clients, a bank and a fashion store, to leverage their data to increase the
effectiveness of their marketing campaigns.
The two clients provided your team with data from their latest direct marketing campaigns. You have two tasks:
1. To develop statistical learning models to predict whether the marketing campaign will be successful with a
customer.
2. To obtain at least three insights that can help the clients make decisions about their marketing campaigns.
What types of customers are more responsive to marketing campaigns?
We will refer to these tasks as statistical learning and data mining, respectively.
As part of the project, you need to write a report according to the instructions below.
3. Understanding the data
3.1 Two datasets
This project involves two marketing datasets, one from a bank and another from a fashion store. The
assignment requires you to work with both datasets.
One dataset primarily has numerical variables, while the other emphasises categorical variables.
3.2 Bank dataset
The bank dataset is from a phone campaign to encourage clients to subscribe to a term deposit.
The dataset has two files, a training dataset and a second dataset without the response labels for the
Kaggle competition.
Kaggle randomly splits this second file into validation (50%) and test (50%) cases, but you will not know
which ones are which. You get a score equal to the competition metric (to be announced) computed on the
validation cases when you submit to the competition. These scores are displayed on the Public Leaderboard
and provide an ongoing ranking of teams. You can use the scores of your submissions to help you select the
best model.
You will select one of your submissions to be used as the final model at the end of the competition. Once the
competition is over, Kaggle will rank the teams’ final submissions based on the test cases only, and those
will be displayed on the Private Leaderboard. Your goal is to score as well as possible on the Private
Leaderboard at the end of the competition. Therefore, please be careful not to overfit the validation cases in
an attempt to improve your public ranking.
Each row corresponds to a call made to a customer. The response variable, subscribed, is the last column in
the dataset. It indicates whether the client subscribed to a term deposit, which was the objective of the
campaign.
The data dictionary file describes the predictor variables.
3.3 Fashion store dataset
The store dataset refers to a promotional e-mail campaign.
Each row refers to a different customer. The response variable, RESP, indicates whether the customer
responded to the promotion. It’s the last column in the dataset.
The data dictionary file describes the predictor variables.
4. Statistical Learning (Task 1)
Requirement for bank dataset only:
• Assume a loss matrix.
• Your report must provide model selection results for at least five different models, including your final
model.
• Your report must include model evaluation.
Requirements for both datasets:
• At least one of your models should be a linear model.
• At least one of your models should be a tree-based model.
• At least one of your models should be a model average or model stack.
• Identify one of your five models as the benchmark.
Note that these are only minimum requirements. Refer to the rubric for the details on the marking criteria.
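
The "assume a loss matrix" requirement ties in with decision theory: with zero loss for correct decisions, a loss L_FP for a false positive and a loss L_FN for a false negative, expected loss is minimised by predicting a positive whenever the estimated probability exceeds L_FP / (L_FP + L_FN). The sketch below applies that rule to synthetic data; the loss values, the logistic regression model and the `average_loss` helper are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical loss matrix (values are assumptions, not from the assignment).
LOSS_FP = 1.0  # calling a customer who does not subscribe
LOSS_FN = 5.0  # not calling a customer who would have subscribed

# Predict "subscribe" when p * LOSS_FN > (1 - p) * LOSS_FP.
threshold = LOSS_FP / (LOSS_FP + LOSS_FN)

# Synthetic imbalanced data standing in for the bank dataset.
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.85, 0.15], random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
prob = clf.predict_proba(X_valid)[:, 1]


def average_loss(y_true, y_hat):
    """Average loss per customer under the assumed loss matrix."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_hat, labels=[0, 1]).ravel()
    return (LOSS_FP * fp + LOSS_FN * fn) / len(y_true)


loss_default = average_loss(y_valid, (prob >= 0.5).astype(int))
loss_tuned = average_loss(y_valid, (prob >= threshold).astype(int))

print(f"Decision threshold from the loss matrix: {threshold:.3f}")
print(f"Average loss at 0.5: {loss_default:.3f}, at the tuned threshold: {loss_tuned:.3f}")
```

Evaluating candidate models by average loss under the chosen loss matrix, rather than by accuracy alone, keeps model selection aligned with the business decision.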
5. Data Mining (Task 2)
Business question: What kinds of customers are most responsive to marketing campaigns?
Requirements:
• Extract at least three quantitative insights from the data that address the business question.
• You can use any combination of the two datasets for this task.
Notes:
• This task is open-ended, as is the nature of data mining applications. Think creatively and explore
the data in a way that you find interesting. The ability to approach open-ended problems is vital in
data science.
• Remember that association is not causation. Do not oversell your insights.

6. Written report
The purpose of the report is to describe, explain, and justify your solution to the clients. You can assume
that the clients have training in business analytics. However, do not assume that they are experts on the
methods used in your project.
Preparing the report will involve careful consideration of what should go in the main text (15 pages). The
main text should focus on the highlights of the project. Note that there is no page limit for the appendix. It’s
ok to put extra material (such as additional figures and tables) in the appendix and refer to it in the main text.
Requirements:
• Discuss problem formulation, exploratory data analysis, feature engineering, methodology, and
results.
• Write about the data mining task in a separate section.
• In the problem formulation section, discuss the business problem from the perspective of decision
theory. Is it a prediction problem? How can machine learning help businesses optimise their
marketing efforts?
• Discuss three models in detail in the methodology section. One model should be your best linear
model, the other your best nonlinear model, and the third is the model stack (or average).
• When you submit the report on Canvas, include the Python code that generates all the results that
appear on the report as an additional attachment.
Suggested outline:
1. Introduction: write a few paragraphs introducing the project and overview the methodology and
main results. Use plain English and avoid technical language as much as possible in this section
(write it for a broad audience).
2. Problem formulation and objectives: state the problem to be solved and the goals of the project.
3. Data understanding: provide essential information about the data, discuss potential issues, and
highlight the most interesting findings. Due to a possible lack of space, you may want to refer to
the appendix for most EDA plots.
4. Feature engineering.
5. Methodology: focus on the three models specified above. Explain the rationale for using these
learning algorithms and explain the choices that you’ve made regarding configuration, training and
hyperparameter optimisation. This part is allowed to be more technical than the rest of the report.
6. Results.
7. What kinds of customers are most responsive to marketing campaigns?
7. Kaggle Competition
We will post the link to join the competition on Canvas.
You will need to create a Kaggle account identifiable by your name to access the competition and make
submissions. After creating an account and logging into Kaggle, use the provided link to get to the
competition page. Click on “Join Competition”, located in a light blue box near the top right corner of the
page, then click to accept the competition rules.
Each group should create a team on Kaggle. The group leader can create a team by joining the
competition and then going into the “Team” tab, which will appear near the top of the competition page.
The leader can then invite other group members using their (Kaggle) names. The name of the Kaggle
team must be identical to the group name on Canvas, i.e. the team number must match the group
number. Each student in the group must sign up and be identifiable as a member of a Kaggle team.
Requirement: the Kaggle team must be set up and have a valid submission by the first prediction
deadline posted on Canvas.
The purpose of the Kaggle competition is to incorporate feedback by allowing you to compare your
performance with that of other groups. Participation in the competition is part of the assessment. Make
sure that your final submission is correct. Your ranking in the competition will typically not affect your
marks directly, as long as we can establish that your participation represents a genuine effort to submit
good predictions and improve them over the course of the competition.
Real-world relevance: employers highly value the ability to participate in a Kaggle competition. Some
companies in Australia go as far as to set up a Kaggle competition just for recruitment.
Bonus marks: The team with the best performance on the Private Leaderboard will receive ten bonus
marks on the assignment. To qualify for the bonus, the choice of the final model needs to be well justified
in the report, and your Python code must reproduce the winning predictions. Furthermore, the group
would need to post a description of their winning solution on Ed. Please do this as soon as you can once
the competition is finished.
Attention! You have to manually select which submission Kaggle will use to compute the test
(Private Leaderboard) results. It will not necessarily pick the best submission for you.
