ICM317 - Machine Learning and Big Data in Finance
2020/2021 In Class Project: Part 1
Submission Date: Monday 10th May 2021, 12pm UK Time (noon)
Introduction:
This is Part-1 of your project and accounts for 20% of the overall ICM317 assessment.
The project covers material from Lectures 1 to 10 of ICM317. In addition, you need to
complete Part-2 of the project which accounts for an additional 20% of the overall
ICM317 assessment, details of Part-2 are below. Part-2 will cover material from
Lectures 11 to 20 of ICM317. The two parts of the project are completely independent
of each other. You should work alone on this project and make your own individual
submission.
Project Background: You work in the Debt Origination team at Hyper Big Bank. Your
team helps companies to raise money by finding buyers for Commercial Bonds.
For those unfamiliar with Commercial Bonds, they work like this. Let’s say a company
called Superstar Manufacturing wants to open a new factory and they need to raise
money for the project. Superstar Manufacturing could approach the Debt Origination
team at Hyper Big Bank to create a Bond Issuance. For example, Hyper Big Bank could
work with Superstar Manufacturing (The Bond Issuer) to create 5-year Bonds; each
Bond will have a Face Value of $100 and pay a Coupon of $4 per year (for example).
Hyper Big Bank will then try to find buyers for the Bonds, and the money raised (minus a
fee from Hyper Big Bank) will be passed to Superstar Manufacturing (The Bond Issuer).
Over the course of the next 5 years, Superstar Manufacturing will pay $4 per year to
each Bond Holder for each Bond that they own; this is equivalent to Interest Paid.
At the end of 5 years, Superstar Manufacturing will pay $100 (The Face Value) to each
Bond Holder for each Bond that they own. The Bond Buyer bears the risk that Superstar
Manufacturing will not be able to maintain payments, in which case the Bond Buyer may
lose money. For this reason, Bond Buyers would prefer some Bonds over
others depending on the level of the Coupon ($4 per year in this case), the initial Bond
Price and the Bond Buyer's perceived probability of the Bond Issuer defaulting.
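As a small worked illustration of the example above, the sketch below lists the cash flows that a holder of one Superstar Manufacturing Bond would receive, then discounts them to a price. The flat annual yield used for discounting (4% and 5% here) and the end-of-year coupon timing are assumptions made for illustration only. The brief does not say how the Bonds are priced.

```python
from __future__ import annotations


def bond_cash_flows(face_value: float, coupon: float, years: int) -> list[float]:
    """Cash flow paid per Bond at the end of each year: the Coupon, plus the Face Value at maturity."""
    flows = [coupon] * years
    flows[-1] += face_value  # the final payment combines the last Coupon and the Face Value
    return flows


def bond_price(face_value: float, coupon: float, years: int, annual_yield: float) -> float:
    """Present value of the promised cash flows at a flat annual yield (assumes no default)."""
    return sum(
        cf / (1 + annual_yield) ** t
        for t, cf in enumerate(bond_cash_flows(face_value, coupon, years), start=1)
    )


flows = bond_cash_flows(100, 4, 5)
print(flows)       # [4, 4, 4, 4, 104]
print(sum(flows))  # 120 received per Bond over 5 years if the Issuer never defaults
print(round(bond_price(100, 4, 5, 0.04), 2))  # 100.0: yield equals the 4% Coupon rate, so price = Face Value
print(round(bond_price(100, 4, 5, 0.05), 2))  # 95.67: a buyer demanding 5% pays less than $100
```

When buyers perceive a higher probability of default, they demand a higher yield. Under this simple model, that lowers the price they are willing to pay for the same promised cash flows.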
The Debt Origination team is working on Four Bond issuances for Four Companies; we
will call the Four Bonds [BondA, BondB, BondC, BondD]. Sales of BondA, BondB and
BondC are going very well; however, there is little interest from clients of Hyper Big
Bank in buying BondD.
The Debt Origination Team Manager has asked you as the Data Science Specialist in
the team to work on a targeted pitch campaign to increase sales of BondD. The idea is
to reach out to some of the other Bond Buyers ([BondA, BondB, BondC]) to encourage
them to also buy BondD. The Debt Origination team has so far collected Data from
2,000 Customers, each of whom has purchased one of [BondA, BondB, BondC] and
some of whom have also purchased BondD. Your task is to build a Machine
Learning model to predict which customers who have purchased one of [BondA,
BondB, BondC] are likely to also purchase BondD. Your working model can then
be used in a targeted pitch campaign to select future buyers of one of [BondA, BondB,
BondC] and encourage them to also purchase BondD.
The model you will build is an example of a Cross-Selling Model. We know some
customers who buy one of [BondA, BondB, BondC] have also bought BondD. There
may be some Features of the Customer (Original Bond Purchase, Location, Wealth,
Risk Appetite etc.) that make them more likely to buy certain combinations of the
available Bonds. Building a Machine Learning model is a highly efficient way of finding
out which existing customers we should be trying to cross-sell BondD to.
The Data Set is provided to you in an Excel (CSV) file. The customer information has
been anonymized to protect customer privacy. You are provided with the customer
information in 12 columns (each a Feature) which are labelled [Feat0, Feat1, Feat2, …
Feat11]. Of the 2000 available rows, only 1500 have been provided to you. An
Independent Model Validation Team (Dr Mininder Sethi) has kept aside the remaining
500 rows to test the model and the information that you provide.
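Before any modelling, it helps to audit the 1,500 provided rows. The sketch below is a minimal example and rests on a few assumptions. It expects the CSV to have been loaded into a pandas DataFrame with the columns Feat0 to Feat11 plus a label column. The label name `Target` is hypothetical, because the brief does not name that column. The demo frame is random and is not project data.

```python
import numpy as np
import pandas as pd

FEATURES = [f"Feat{i}" for i in range(12)]  # column names given in the brief
TARGET = "Target"  # hypothetical: the brief does not name the label column


def audit(df: pd.DataFrame) -> dict:
    """Summarise size, missing values per Feature and the Class 0/1 balance."""
    return {
        "n_rows": len(df),
        "missing_per_feature": df[FEATURES].isna().sum().to_dict(),
        "class_share": df[TARGET].value_counts(normalize=True).sort_index().to_dict(),
    }


# Toy frame standing in for the real file (random values, not the project data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(8, 12)), columns=FEATURES)
demo.loc[[1, 5], "Feat2"] = np.nan  # two missing values in Feat2
demo[TARGET] = [0, 0, 1, 0, 1, 0, 0, 1]
report = audit(demo)
print(report["n_rows"], report["missing_per_feature"]["Feat2"], report["class_share"])
```

The class balance matters for the later questions. If Class 1 is rare, overall accuracy can look high even for a model that never predicts a BondD buyer.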
Project Questions: The Debt Origination Team Manager would like to know if you can
build a Machine Learning Model to predict if a customer who has purchased one of
[BondA, BondB, BondC] will also buy BondD. This is a Classification Problem with two
classes [Class 0=No (Not a Buyer of BondD), Class 1=Yes (Is a Buyer of BondD)]. In
particular, the Debt Origination Team Manager would like you to answer the following
questions.
1- Is it possible to build a predictive Machine Learning Model?
2- If so, which Machine Learning Model would you recommend and why?
3- How accurate is your proposed model on the Data provided?
4- If we are to start collecting Data for fewer than the current 12 Features, then
which Features would you recommend and why? How many Features do we
need to collect Data for? What is the lowest possible number of Features that
maintains 70% accuracy?
5- One of the Features (Feat3) is the Bond that the customer originally purchased
(one of [BondA, BondB, BondC]). Are previous purchases of any of BondA,
BondB or BondC in particular more indicative of a purchase of BondD?
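Questions 4 and 5 can be approached in several ways. The sketch below shows one option, using scikit-learn.

For Question 4, it ranks the Features with Recursive Feature Elimination (RFE) around a Logistic Regression. It then searches for the fewest top-ranked Features whose 5-fold cross-validated accuracy still reaches 70%.

For Question 5, it compares the share of BondD buyers within each Feat3 group.

The sketch rests on several assumptions:
- The label column name `Target` is hypothetical.
- The coding of Feat3 as bond names is also hypothetical.
- The demo uses synthetic data from `make_classification`, not the project file.
- For brevity, the ranking is fitted on all rows. A stricter version would repeat it inside each cross-validation fold to avoid leakage.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def smallest_feature_set(X, y, target_accuracy=0.70):
    """Rank Features with RFE, then return (k, names, score) for the fewest top-ranked
    Features whose mean 5-fold accuracy reaches target_accuracy, or None if none do."""
    ranker = RFE(LogisticRegression(max_iter=1000), n_features_to_select=1)
    ranker.fit(StandardScaler().fit_transform(X), y)
    order = np.argsort(ranker.ranking_)  # column indices, best-ranked first
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    for k in range(1, X.shape[1] + 1):
        cols = order[:k]
        score = cross_val_score(model, X[:, cols], y, cv=5, scoring="accuracy").mean()
        if score >= target_accuracy:
            return k, [f"Feat{i}" for i in cols], score
    return None


def bondd_rate_by_original_bond(df, bond_col="Feat3", target_col="Target"):
    """Share of customers in each original-Bond group who also bought BondD (Class 1)."""
    return df.groupby(bond_col)[target_col].mean().sort_values(ascending=False)


# Question 4 on synthetic data (12 Features, 3 of them informative).
X, y = make_classification(n_samples=400, n_features=12, n_informative=3,
                           n_redundant=0, random_state=0)
result = smallest_feature_set(X, y)
print(result)

# Question 5 on a toy frame (hypothetical coding of Feat3 and Target).
toy = pd.DataFrame({"Feat3": ["BondA", "BondA", "BondB", "BondB", "BondC", "BondC"],
                    "Target": [1, 0, 1, 1, 0, 0]})
rates = bondd_rate_by_original_bond(toy)
print(rates)  # BondB 1.0, BondA 0.5, BondC 0.0
```

On the real data, large gaps between the Feat3 group rates would hint that the original Bond is informative. A statistical test or the model's own coefficients would be needed before drawing firm conclusions.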
The Debt Origination Team Manager would like to see your answers to the questions above
in an executive (summary) report. The Debt Origination Team Manager would also
appreciate any further insights on the Data that you can provide beyond the questions
above. Extra insights might include advice on collecting future Data. The Debt Origination
Team Manager is particularly sensitive about attempted cross-selling of BondD to customers
who are unlikely to follow through with a purchase, so the accuracy of your model's
Class 1=Yes (Is a Buyer of BondD) predictions is important.
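The accuracy of Class 1=Yes predictions is what is usually called precision for Class 1. It is the fraction of customers flagged as likely BondD buyers who actually buy. The sketch below computes it alongside overall accuracy on made-up labels, to show that the two numbers can differ considerably.

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of all customers whose class is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class1_precision(y_true, y_pred):
    """Of the customers predicted as Class 1 (buyer of BondD), the fraction who really
    are buyers: TP / (TP + FP). Returns NaN if the model never predicts Class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    return tp / (tp + fp) if tp + fp else float("nan")


# Made-up labels for illustration only.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(overall_accuracy(y_true, y_pred))  # 0.7
print(class1_precision(y_true, y_pred))  # 0.5: half of the BondD pitches go to non-buyers
```

scikit-learn's `precision_score(y_true, y_pred, pos_label=1)` returns the same quantity. Raising the probability threshold at which a customer is labelled Class 1 usually increases precision, at the cost of flagging fewer of the true buyers.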
The Debt Origination Team Manager would appreciate the provision of technical details about
your Data decisions. In particular, the manager would be interested to know why you have
chosen a particular Machine Learning model, how you deal with missing Data and outliers and
your construction of a Training Set and Testing Set.
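One common way to handle these three points is sketched below. It takes a stratified, seeded Training/Testing split first. It then learns median fill values and interquartile-range (IQR) fences from the Training Set only, and applies them to both sets. Outliers are clipped to the fences (winsorised) rather than their rows being dropped.

The sketch makes several assumptions:
- All 12 Features are numeric. Feat3, the original Bond, is probably categorical and would need encoding instead.
- The label name `Target` is hypothetical.
- The fence multiplier k = 1.5 is a convention, not a requirement.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = [f"Feat{i}" for i in range(12)]
TARGET = "Target"  # hypothetical label column name


def split(df, test_size=0.25, seed=42):
    """Stratified split so both sets keep the same Class 0/1 proportions."""
    return train_test_split(df, test_size=test_size, stratify=df[TARGET], random_state=seed)


def fit_cleaner(train, k=1.5):
    """Learn medians and IQR fences from the Training Set only, to avoid leaking test data."""
    q1, q3 = train[FEATURES].quantile(0.25), train[FEATURES].quantile(0.75)
    iqr = q3 - q1
    return {"median": train[FEATURES].median(), "lower": q1 - k * iqr, "upper": q3 + k * iqr}


def apply_cleaner(df, params):
    """Fill missing values with Training medians, then clip outliers to the IQR fences."""
    out = df.copy()
    out[FEATURES] = out[FEATURES].fillna(params["median"])
    out[FEATURES] = out[FEATURES].clip(lower=params["lower"], upper=params["upper"], axis=1)
    return out


# Synthetic demo (not project data): some missing values in Feat4, one outlier in Feat0.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 12)), columns=FEATURES)
demo.iloc[::17, 4] = np.nan
demo.iloc[3, 0] = 50.0
demo[TARGET] = rng.integers(0, 2, size=200)

train, test = split(demo)
params = fit_cleaner(train)
train_clean, test_clean = apply_cleaner(train, params), apply_cleaner(test, params)
print(len(train), len(test), train_clean[FEATURES].isna().sum().sum())
```

Fitting the fill values and fences on the Training Set alone keeps the Testing Set a fair stand-in for the 500 rows held back by the Validation Team.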
You should make the following submissions.
1- An executive report. The report should be no longer than 9 pages and contain no more
than 3000 words. The report should also contain at least 4 visualizations (charts of
some kind). You may add more visualizations if you wish to, but the final report length
should not exceed 9 pages. If your report exceeds 9 pages, then only the first 9 pages
will be considered for grading. If your report exceeds 3000 words, then only the first
3000 words will be considered for grading.
2- A single Jupyter Notebook that can be used to reproduce the numerical results of your
report. The Jupyter Notebook should also be able to reproduce the visualizations in
your report. Your Jupyter Notebook may contain extra information or insights that are
not in your report. You may use such extra information to develop your own thoughts.
However, any information inside the Jupyter Notebook that is not added into your
written report will NOT be used for grading.
You must follow these instructions.
1- You must use the Jupyter Notebook version that is packaged with at least the 2019.10
Windows Installer of Anaconda. You may use any later version that is available.
2- Your code should be in Python 3. You may use any libraries that are included with the
standard Windows Installer of Anaconda. This can include libraries that we have not used
in the lectures. You should not use additional libraries outside of the standard
Anaconda installation.
3- Your code must run on Windows 10. You may develop on any operating system that
you wish to (MacOS, Linux etc.) but you should check yourself that your submitted
code does run on Windows 10.
4- Make sure your results are reproducible, so seed your random number generators
(any seed you like). Also make sure that your final Jupyter Notebook does match up
to your written report.
5- You must use Logistic Regression and at least one other Classifier (but you are
encouraged to use more). You may use any techniques or Machine Learning Models
from outside of the Lecture Material as long as they are included in the standard
Windows Installer of Anaconda. You should explain your modelling decisions in your
report. You should not use any extra Data beyond that which has been provided to you
in the Excel (CSV) File.
6- You should place the code that reads the Excel (CSV) File in the first cell of your Jupyter
Notebook so that the Independent Model Validation Team can later link to the
complete Excel (CSV) File and test your code against the extra Data that has not been
provided to you. You can assume the extra Data has the same properties (in terms
of presence of outliers and missing data) as the Data that is provided to you.
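The instructions above (seeded randomness, Logistic Regression plus another Classifier, and the file read in the first cell) can be combined in a short sketch.

A few points are choices rather than requirements:
- The file name `bond_data.csv` is a placeholder.
- The Random Forest is just one possible choice for the "other Classifier".
- Median imputation is kept inside each pipeline so that it is re-fitted on every training fold.

In the notebook, the first cell would hold `df = pd.read_csv(DATA_PATH)`. The demo instead uses synthetic data so that it runs without the file.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42                    # any fixed value; reused for every random component
np.random.seed(SEED)         # covers any code that draws from NumPy's global generator
DATA_PATH = "bond_data.csv"  # placeholder name; first notebook cell: df = pd.read_csv(DATA_PATH)


def compare_models(X, y, seed=SEED):
    """Mean 5-fold accuracy and Class 1 precision for Logistic Regression and a Random Forest."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    models = {
        # Imputation sits inside each pipeline so it is re-fitted on every training fold.
        "logistic_regression": make_pipeline(
            SimpleImputer(strategy="median"), StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": make_pipeline(
            SimpleImputer(strategy="median"),
            RandomForestClassifier(n_estimators=100, random_state=seed),
        ),
    }
    results = {}
    for name, model in models.items():
        results[name] = {
            "accuracy": cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean(),
            "precision_class1": cross_val_score(model, X, y, cv=cv, scoring="precision").mean(),
        }
    return results


# Synthetic stand-in for the 1,500 provided rows (not the project data).
X_arr, y_demo = make_classification(n_samples=300, n_features=12, random_state=SEED)
X_demo = pd.DataFrame(X_arr, columns=[f"Feat{i}" for i in range(12)])
scores = compare_models(X_demo, y_demo)
for name, s in scores.items():
    print(f"{name}: accuracy={s['accuracy']:.3f}, class-1 precision={s['precision_class1']:.3f}")
```

Because every split and model takes its seed from `SEED`, rerunning the notebook should reproduce the same numbers. This is what the written report and the Validation Team rely on.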
Your final mark will reflect the accuracy of your proposed model over the Data that is and is
not provided to you. Your final mark will also reflect your explanation of your modelling and
Data decisions and how you address the questions from the Debt Origination Team Manager.
There are no strict right or wrong answers, therefore it is important that you explain your
decisions.
ICM317 - Machine Learning and Big Data in Finance
2020/2021 In Class Project: Part 2
Submission Date: Monday 10th May 2021, 12pm UK Time (noon)
Introduction:
This is Part-2 of your project and accounts for 20% of the overall ICM317 assessment.
The project covers material from Lectures 11 to 20 of ICM317. In addition, you need
to complete Part-1 of the project which accounts for an additional 20% of the overall
ICM317 assessment, details of Part-1 are above. Part-1 will cover material from
Lectures 1 to 10 of ICM317. The two parts of the project are completely independent
of each other. You should work alone on this project and make your own individual
submission.
Project Background: You have been hired by Hyper Big Bank as a consultant. Hyper
Big Bank are considering setting up a new independent standalone business called
Sunshine Day Securities (SDS). SDS will provide low-cost stock trading accounts
(brokerage accounts) to clients only in the USA and will only allow trading in the
20 largest capitalization (large cap) stocks listed in the USA.
For those unfamiliar with brokerage accounts, they work like this. As an individual you
might want to carry out some stock trading. You can go to a number of ‘low-cost’
brokers, like Interactive Brokers for example, and open a trading account. To begin
trading you need to complete an application process and then deposit money into
your account. You then purchase stocks. The broker will act as a custodian for the
stocks that you purchase and hold them on your behalf; this avoids the need for the
distribution of paper share certificates. Most low-cost brokers offer a ‘full service’
model, which means that they typically offer most of the following to their customers.
1- An ability to buy a wide range of stocks, typically anything that is listed on an
exchange in multiple countries.
2- An ability to trade multiple types of securities, including stocks, currencies,
bonds, and futures and options. Trading in futures and options is typically
limited to people who can show some minimum amount of trading experience.
3- An ability to trade on margin. Here for example, you could deposit $10,000
USD in a brokerage account and the brokerage firm would loan you additional
money to purchase stocks. A margin loan might be around 5 times your original
deposit amount, so that with $10,000 USD deposited you could purchase up
to $60,000 in stocks. If your portfolio starts to incur losses, then you are liable
for the full loss amount; the broker will typically liquidate (close out) your
portfolio as losses approach $10,000 USD. This is so that your total loss does
not exceed the initial amount that you had deposited with the broker. Margin
loans incur interest charges from the broker. A broker may view a margin loan
as low risk as they are typically able to close out stock positions before losses
exceed the original deposit amount ($10,000 USD in this case). If losses exceed
the original deposit amount, then the customer is legally obliged to make a
cash payment against the additional losses.
4- A typical low-cost broker will facilitate short selling of stocks by arranging to
borrow the stocks on your behalf (for a fee).
5- The platform of a typical low-cost broker will allow you to make multiple trades
on the same stock in the same day; this is to facilitate day-traders who look to
profit from short-term intraday trends in stocks. Such day-traders often close
out all their positions before the end of the trading day, so they hold nothing overnight.
6- A typical low-cost broker will only allow trading through an online platform,
there is no phone trading or voice trading capability. Online platforms often
provide an Application Programming Interface (API) that allows programmers
to write computer programs that can interface with the trading platform to
execute trades. Such an API approach might be used for example by people
building algorithmic trading strategies. There is typically 2-factor
authentication on an online platform and customers need to (i) enter a
password and (ii) generate a code with an electronic token device.
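The margin example in item 3 can be made concrete with a little arithmetic. The sketch below follows the text in assuming a loan of 5 times the deposit. The 90% liquidation trigger in `must_liquidate` is a hypothetical broker rule chosen for illustration. It is not a rule stated in the brief or used by any particular broker.

```python
def buying_power(deposit: float, loan_multiple: float) -> float:
    """Cash deposited plus a margin loan of loan_multiple times the deposit."""
    return deposit * (1 + loan_multiple)


def must_liquidate(deposit: float, portfolio_loss: float, trigger: float = 0.9) -> bool:
    """Hypothetical rule: close out once losses reach trigger x deposit, so the
    position is cut before losses exceed the customer's own cash."""
    return portfolio_loss >= trigger * deposit


deposit = 10_000
power = buying_power(deposit, 5)            # 60000: $10,000 own cash + $50,000 loan
loss_per_1pct_fall = 0.01 * power           # 600.0: a 1% market fall costs 6% of the deposit
drop_to_wipe_out_deposit = deposit / power  # about 0.167: a ~16.7% fall erases the deposit
print(power, loss_per_1pct_fall, round(drop_to_wipe_out_deposit, 3))
print(must_liquidate(deposit, 9_500), must_liquidate(deposit, 5_000))  # True False
```

The last ratio is the key one. With 6 times leverage, a fall of about 16.7% in the portfolio's value is enough to wipe out the whole $10,000 deposit, which is why brokers monitor margin accounts closely.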
Hyper Big Bank believe they have spotted a gap in the market: there is demand from
customers in the USA for a different type of product, and Sunshine Day Securities (SDS)
will offer the following.
1- Trading limited to just the 20 largest capitalization (large cap) stocks listed in
the USA. The set of 20 stocks available will be updated quarterly and will be
taken to be the top 20 stocks (by weight) of the S&P500 Index. If you want
more details of the S&P500 Index construction then you can find that here and
if you want to know what the current top 20 stocks are then you can find that
here. SDS will act as custodian for any stocks purchased.
2- No other securities are offered; the range is limited to just 20 stocks, with no
currency trading and no trading of futures and options.
3- No margin trading. A customer can only purchase up to the limits of the money
in their trading account.
4- No short selling of stocks. Only stock purchases are possible (long only trading).
5- No intra-day trading. If a stock is purchased it must be held for a minimum of
one trading day before it can be sold. Multiple purchases of a stock in the same
day are possible.
6- Trading through an online platform and also through a phone/voice platform.
Authentication for the online platform is proposed to be single factor through
facial recognition using the camera on a smart phone or computer.
Authentication for the phone/voice platform is proposed to be single factor
through voice recognition.
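Rules 1, 3, 4 and 5 above amount to a small set of checks that the SDS platform would run on every order. The sketch below is a minimal example, and several details are assumptions:
- Trading days are simplified to integer indices. A real system would need an exchange calendar.
- The tickers are placeholders, not the actual top-20 S&P 500 list.
- The `Account` structure is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    cash: float
    # Each lot: (ticker, shares, trading-day index on which it was bought).
    lots: list[tuple[str, int, int]] = field(default_factory=list)

    def sellable(self, ticker: str, today: int) -> int:
        """Shares of ticker bought on an earlier trading day (minimum one-day hold)."""
        return sum(n for t, n, day in self.lots if t == ticker and day < today)


def check_order(acct: Account, universe: set[str], side: str, ticker: str,
                shares: int, price: float, today: int) -> str | None:
    """Return the reason an order breaks an SDS rule, or None if it is allowed."""
    if ticker not in universe:
        return "not in the 20-stock universe"
    if shares <= 0:
        return "share count must be positive"
    if side == "buy":
        # Rule 3: no margin, so the purchase must be covered by cash in the account.
        return "insufficient cash (no margin)" if shares * price > acct.cash else None
    if side == "sell":
        # Rules 4 and 5: no short selling, and only shares held for a full trading day.
        if shares > acct.sellable(ticker, today):
            return "selling more than held for a full day (no short selling, no intraday)"
        return None
    return "unknown side"


universe = {"AAA", "BBB"}  # placeholder tickers standing in for the quarterly top-20 list
acct = Account(cash=1_000.0, lots=[("AAA", 5, 10)])
print(check_order(acct, universe, "buy", "AAA", 2, 150.0, today=11))   # None
print(check_order(acct, universe, "sell", "AAA", 5, 150.0, today=10))  # rejected: bought that same day
```

Multiple purchases on the same day pass these checks, as rule 5 allows. A sale is rejected until the shares have been held for at least one trading day.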
The management of Hyper Big Bank believe that SDS will be able to offer one of the
lowest fee brokerage services around because the platform limitations (only 20 stocks,
only USA market, no margin trading, no short-selling) mean that the business setup
and maintenance costs of SDS will be much lower than for any other low-cost broker.
The First Year Business Plan is summarized as follows. It is expected that SDS will reach
100,000 customers. The plan is to target customers who do not yet have a brokerage
account of their own. The target customer group is under 25 years old and SDS will be
looking to target college students. The average number of trades (buy or sell) for each
customer in a month is expected to be 3 trades at a fixed fee of $10 USD per trade
(exclusive of any exchange fees, bid-offer spread, or trade taxes which are all paid for
by the customer). SDS expects 90% of all trades to be through the online platform. For
the voice platform SDS feels that it only requires the platform to support English and
a human operated call centre will be able to handle the expected call volume. All
operations will be in the Silicon Valley area of California, USA and all staff will work in
one office-based location (assume a post-covid world). The plan is to go live within 3
months of today.
The Second Year Business Plan is summarized as follows. It is expected that SDS will
reach 4,000,000 customers, almost all college students. The average number of trades
(buy or sell) for each customer in a month is expected to be 2 trades at a fixed fee of
$10 USD per trade (exclusive of any exchange fees or trade taxes which are paid by
the customer). SDS expects 70% of all trades to be through the online platform. For
the voice platform SDS feels that it would require the platform to support at least 4
languages which are (i) English, (ii) Spanish, (iii) Mandarin and (iv) Korean. The plan is
to retain all operations in the Silicon Valley area of California, USA and all staff will
work in one office-based location (assume a post-covid world).
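The two business plans can be compared on a run-rate basis. The sketch below assumes that the stated customer numbers hold for all 12 months of each year. This overstates Year 1, when SDS is still growing towards 100,000 customers. The sketch also counts only the $10 trade fee.

```python
def plan_run_rate(customers: int, trades_per_month: int, fee_usd: int, online_pct: int) -> dict:
    """Annualised run rate, assuming the customer count holds for all 12 months."""
    monthly_trades = customers * trades_per_month
    return {
        "trades_per_year": monthly_trades * 12,
        "fee_revenue_usd_per_year": monthly_trades * 12 * fee_usd,
        "voice_trades_per_month": monthly_trades * (100 - online_pct) // 100,
    }


year1 = plan_run_rate(customers=100_000, trades_per_month=3, fee_usd=10, online_pct=90)
year2 = plan_run_rate(customers=4_000_000, trades_per_month=2, fee_usd=10, online_pct=70)
print(year1)  # {'trades_per_year': 3600000, 'fee_revenue_usd_per_year': 36000000, 'voice_trades_per_month': 30000}
print(year2)  # {'trades_per_year': 96000000, 'fee_revenue_usd_per_year': 960000000, 'voice_trades_per_month': 2400000}
print(year2["voice_trades_per_month"] / year1["voice_trades_per_month"])  # 80.0
```

Under these assumptions, voice trades grow 80-fold between Year 1 and Year 2, while fee revenue grows about 27-fold. The phone/voice platform and call centre therefore scale much faster than revenue. Assuming roughly 21 trading days a month, this means on the order of 114,000 voice trades per trading day in Year 2.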
By Year 3, the business is expected to be mature with no net increase or decrease in
customer numbers or significant changes in customer activity. The management at
Hyper Big Bank have no current plans to grow or evolve the business at Year 3.
Project Questions: Hyper Big Bank has hired you as a consultant at the very start of
the project. No project work has been carried out. They would like you to provide a
report that addresses some of the questions that they have. Hyper Big Bank views you
as a Business Expert; they understand that you are not an implementation specialist,
but you are someone who understands finance and understands the technical
challenges that may be involved with the project. The questions that Hyper Big Bank
have are as follows.
1- What are your opinions on the decision to launch SDS?
2- What are your opinions on the business plan, is there anything you would do
differently?
3- If SDS is to be an independent business from Hyper Big Bank, then it will require
its own independent technical infrastructure. Should SDS go for building its
own in-house infrastructure or should it opt for using a Cloud Computing
Provider? How would engagement with a Cloud Computing Provider work?
4- Do you have any views on how the business could be grown beyond Year 3? Are
there particular directions which utilize the infrastructure that has already
been built?
5- Are there any other questions that Hyper Big Bank should be asking
themselves? What are these questions, and do you have answers?
The management at Hyper Big Bank would like to see your answers to the questions above in
an executive (summary) report. The management would also appreciate any further insights
that you can provide beyond the questions above. Extra insights might include advice on
collecting data, storing data, customer security, regulatory issues, increasing profitability or
anything else that you feel is important.
You should submit a single report. The report should be no longer than 9 pages and contain
no more than 3000 words. The report should contain at least 4 images/figures (visuals are
impactful and allow conveyance of a lot of information). You may add more figures if you wish
to, but the final report length should not exceed 9 pages. If your report exceeds 9 pages, then
only the first 9 pages will be considered for grading. If your report exceeds 3000 words, then
only the first 3000 words will be considered.
You may use any of the content of the lecture notes and any external sources that you wish
to. Acceptable external sources include books, research papers and reputable websites
(decide yourself what counts as reputable). You should reference external sources and
provide links to the sources (Hyperlinks for all websites should be provided). Please only quote
and refer to content in English. Make sure that any images that you reuse from other sources
are properly referenced. You should obviously follow all of the University guidelines on report
writing and not plagiarise.
For guidance you might refer back to the four-step Project Management approach that we
considered in the lecture notes. The 4 steps are:
(i) Setting aims
(ii) Putting in place data policies
(iii) Making plans to solve problems
(iv) Analysis and action
Your report might cover some parts of the first 3 steps.