Date: 2023-08-19
BEAM079
Coding Analytics for Accounting and Finance
Assignment 2023
30 credits (100% of Final Grade)
Deadline: 12.00pm 1st September 2023
Word Limit: 7500 (Includes tables, references, appendices)
Submission to include:
(1) Written report
(2) Python code
Answer one of the following questions:
EITHER Assignment Option (A)
OR Assignment Option (B)
OR Assignment Option (C)
Assignment Option (A) – Social Network Analysis
Requirement
This assignment investigates Social Network Analysis and requires you to evaluate a network
of company directors.
Using data from BoardEx (available via WRDS)¹, you are tasked with compiling a network
of company directors using your knowledge of Social Network Theory. The network should
consist of nodes (the directors) and edges. An edge should connect two directors only if
they sit on the same company board in the same year.
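The edge rule above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method. It assumes the BoardEx extract has already been reduced to (board ID, director ID, year) records; the tuple layout and the sample IDs are hypothetical. Each pair of directors who share a board in the same year becomes one undirected edge, weighted by the number of board-years they share.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(records):
    """Return {(director_a, director_b): weight} for directors sharing a board in a year.

    `records` is an iterable of (board_id, director_id, year) tuples; this layout is
    an assumption about how the BoardEx extract has been prepared.
    """
    # Group directors by (board, year) so only same-board, same-year pairs are linked.
    # A set also removes duplicates (e.g. one director holding two roles on a board).
    boards = defaultdict(set)
    for board_id, director_id, year in records:
        boards[(board_id, year)].add(director_id)

    edges = defaultdict(int)
    for members in boards.values():
        # Sorting gives each undirected pair one canonical key
        for a, b in combinations(sorted(members), 2):
            edges[(a, b)] += 1  # weight = number of shared board-years
    return dict(edges)


# Hypothetical sample: two boards observed in 2022
sample = [
    ("B1", "D1", 2022), ("B1", "D2", 2022), ("B1", "D3", 2022),
    ("B2", "D2", 2022), ("B2", "D3", 2022), ("B2", "D4", 2022),
]
print(build_edges(sample))
```

Here D2 and D3 sit together on both boards, so their edge carries a weight of 2. All other pairs have a weight of 1.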
Your network should span at least one whole year and include the latest full (calendar) year
of data from BoardEx (currently 2022). BoardEx has three databases of companies and their
directors (US, UK and Europe); you may use any one of these for your analysis.²³⁴
Your analysis should include a statistical investigation into the network you create. Who are
the key, central and influential players in the network? Why are they so influential? Do they
often represent a particular type of company or companies, or have particular characteristics?
To answer these questions, you should provide information regarding the network in terms of
the centrality measures provided to you in class: Degree, Betweenness, Closeness and
Eigenvector centrality.
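Libraries such as NetworkX provide these four measures directly (`degree_centrality`, `betweenness_centrality`, `closeness_centrality`, `eigenvector_centrality`). The dependency-free sketch below only illustrates what each measure computes. It assumes an undirected, unweighted and connected graph, for example the largest connected component of the director network. The graph is stored as a dictionary of neighbour sets, and the normalisation follows the usual conventions (e.g. degree divided by n - 1).

```python
from collections import deque


def shortest_paths(adj, source):
    """BFS from `source`: distances, shortest-path counts, predecessors and visit order."""
    dist = {source: 0}
    sigma = {v: 0 for v in adj}
    sigma[source] = 1
    preds = {v: [] for v in adj}
    order = []
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    return dist, sigma, preds, order


def centralities(adj, max_iter=1000, tol=1e-10):
    """Degree, closeness, betweenness (Brandes) and eigenvector centrality.

    `adj` maps each node to the set of its neighbours; the graph is assumed to be
    undirected, unweighted and connected, with at least three nodes.
    """
    n = len(adj)
    degree = {v: len(adj[v]) / (n - 1) for v in adj}
    closeness = {}
    betweenness = {v: 0.0 for v in adj}
    for s in adj:
        dist, sigma, preds, order = shortest_paths(adj, s)
        # Closeness: (n - 1) divided by the sum of distances to all other nodes
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        # Brandes: accumulate pair dependencies in reverse BFS order
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each undirected pair was counted from both ends, so divide by (n - 1)(n - 2)
    betweenness = {v: b / ((n - 1) * (n - 2)) for v, b in betweenness.items()}
    # Eigenvector centrality by power iteration on (A + I): same leading
    # eigenvector as A, but it does not oscillate on bipartite graphs
    x = {v: 1.0 / n for v in adj}
    for _ in range(max_iter):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(val * val for val in new.values()) ** 0.5
        new = {v: val / norm for v, val in new.items()}
        converged = sum(abs(new[v] - x[v]) for v in adj) < n * tol
        x = new
        if converged:
            break
    return {"degree": degree, "closeness": closeness,
            "betweenness": betweenness, "eigenvector": x}


# Hypothetical three-director chain: D2 sits between D1 and D3
graph = {"D1": {"D2"}, "D2": {"D1", "D3"}, "D3": {"D2"}}
for measure, scores in centralities(graph).items():
    print(measure, {d: round(s, 3) for d, s in scores.items()})
```

The example shows how the measures can disagree in scale. D2 scores highest on all four, but betweenness gives D1 and D3 exactly zero because no shortest path runs through them. Closeness and eigenvector centrality, by contrast, still give them substantial scores.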
Using your Python skills acquired within this module (and elsewhere), your report should
seek to provide (but is not limited to) the following:
An introduction – A general discussion of Social Network Analysis, the potential
application to corporate director networks, what your research goals are, and what you hope
to achieve through your analysis.
A literature review – You have one main strand of literature to assess: Social Network
Theory and, in particular, its use in the study of business power and influence.
Remember that a literature review should be a cohesive discussion of the extant literature
and how it relates to your research; it is the main indicator of whether you understand
the subject area.
Methodology - What statistical tests/models are you going to perform and why?
Data – A description of the data and its origin, including summary statistics of the key
variables where appropriate, the size of the network and other pertinent
information.
1 Please come and see me if you have never used WRDS or simply need a refresher.
2 Using BoardEx is the easier option, as I know that the data exists there. You are welcome to investigate
directors of companies within any country, but it will be up to the individual to source the necessary data.
3 Data can be collected from WRDS > BoardEx > (REGION) > Organization Summary > Analytics.
From this database you can get information on directors. The key items you need to calculate the edges are:
Company ID and/or Board ID, Director ID/name, and Annual report date.
4 From past experience, the database may include non-commercial government or oversight boards. They
should be easy to spot, but please do not use them. Please use company boards only.
Analysis – You are required to analyse the network as a whole and discuss the most
influential directors across the network. The four centrality measures may give
different results, so you may wish to formulate an “overall score” that takes each
measure into account in combination.⁵
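One possible way to build such an “overall score” is sketched below. It min-max scales each measure to [0, 1] and takes an equally weighted average. Both choices are illustrative assumptions rather than a prescribed method, and averaging each director’s rank across the four measures would be an equally defensible alternative. The centrality values are hypothetical.

```python
def overall_score(measures, weights=None):
    """Combine centrality measures into one score per director.

    `measures` maps a measure name to {director: value}. Each measure is min-max
    scaled to [0, 1] before a weighted average; equal weights are a hypothetical default.
    """
    names = list(measures)
    if weights is None:
        weights = {name: 1 / len(names) for name in names}
    directors = list(measures[names[0]])
    scaled = {}
    for name in names:
        values = measures[name]
        low, high = min(values.values()), max(values.values())
        span = high - low
        # A measure with no variation contributes nothing to the ranking
        scaled[name] = {d: (values[d] - low) / span if span else 0.0 for d in directors}
    return {d: sum(weights[name] * scaled[name][d] for name in names) for d in directors}


# Hypothetical centrality values for three directors
example = {
    "degree": {"D1": 0.25, "D2": 0.5, "D3": 0.75},
    "betweenness": {"D1": 0.5, "D2": 0.0, "D3": 0.25},
}
scores = overall_score(example)
print(sorted(scores, key=scores.get, reverse=True))  # ['D3', 'D1', 'D2']
```

Scaling first matters because the raw measures live on different ranges. Without it, the measure with the largest spread would dominate the combined ranking.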
Visualisation – Visualise your network. This can be done in any way you see fit. Make sure
to explain your visualisation and how it was created. You may wish to colour code the
network by community, or by industry, for extra visual appeal. The examples and links given
to you in class will help with this.
Conclusion – What are your key findings and what are the implications?
Python code – A code (*.py) file which documents each stage of your analysis (uploaded as
a separate file).
You may expand on or explore this topic in any way you feel is appropriate, provided that the key
areas highlighted above are covered.
5 For an additional analysis (time permitting), you may also wish to perform (a) a separate analysis of
individual business sectors taken in isolation and/or (b) a regression that uses individual director
characteristics (also available on BoardEx) to explain which directors have high centrality scores. For example,
do these directors differ in age from their peers? Did they attend a particular school or university?
Have they been on the board a long time? Are they inside or outside directors? The centrality score would be the
dependent variable; the question is which factors or qualities increase or decrease a director’s centrality.
Assignment Option (B) – Fraud Detection
Requirement
This assignment concerns fraud detection and asks you to investigate the application of
Benford’s Law as a tool to detect financial reporting fraud.
A list of 35 companies has been provided for you. Audit Analytics has determined that these
companies not only misrepresented their accounts but did so fraudulently. The details of each
fraud and the affected account-years are available on ELE in the Assignment section.⁶ Note that
only the last 10 years (2013-2022) are to be used.
Using financial data, your task is to provide empirical analysis and establish whether these
fraudulent accounts can be detected by Benford’s Law.
In particular, you should analyse these fraudulent reports and compare the results with those of
companies that are deemed to be clean. The clean companies can be any of your choosing,
but the clean sample must be comparable to the fraudulent one (e.g. in terms of size
and industry).⁷
Your analysis should include a comparison of the mean absolute deviation (MAD),
Kolmogorov-Smirnov (KS) and chi-squared tests for both groups, including tests of difference
where possible.
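A minimal sketch of the first-digit test follows. It assumes the input is a flat list of reported line items for one firm-year, and it drops zeros and missing values because they have no leading digit. Expected proportions follow Benford’s Law, P(d) = log10(1 + 1/d). The 5% critical values used for the flags are conventional approximations: 15.507 for chi-squared with 8 degrees of freedom, and roughly 1.36/sqrt(n) for KS. The chi-squared test becomes very sensitive as n grows, which is one reason MAD is often reported alongside it. The sample values are hypothetical.

```python
import math
from collections import Counter

# Benford's expected first-digit proportions, P(d) = log10(1 + 1/d)
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading non-zero digit of a non-zero number (the sign is ignored)."""
    return int(f"{abs(x):.15e}"[0])  # scientific notation starts with the leading digit


def benford_tests(values):
    """First-digit MAD, chi-squared and KS statistics for one set of accounting values."""
    # Zeros and missing values have no leading digit, so they are dropped
    digits = [first_digit(v) for v in values if v and not math.isnan(v)]
    n = len(digits)
    counts = Counter(digits)
    observed = {d: counts[d] / n for d in BENFORD}
    # Mean absolute deviation between observed and expected proportions
    mad = sum(abs(observed[d] - BENFORD[d]) for d in BENFORD) / 9
    # Pearson chi-squared on counts (9 digits, so 8 degrees of freedom)
    chi2 = sum((counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d]) for d in BENFORD)
    # KS statistic: largest gap between cumulative observed and expected proportions
    ks = cum_obs = cum_exp = 0.0
    for d in range(1, 10):
        cum_obs += observed[d]
        cum_exp += BENFORD[d]
        ks = max(ks, abs(cum_obs - cum_exp))
    return {
        "n": n, "mad": mad, "chi2": chi2, "ks": ks,
        # Conventional 5% critical values: chi-squared(8) = 15.507, KS approx. 1.36 / sqrt(n)
        "chi2_reject": chi2 > 15.507,
        "ks_reject": ks > 1.36 / math.sqrt(n),
    }


# Hypothetical line items (e.g. Compustat values for one firm-year)
print(benford_tests([1250.4, 310.0, 0.0, 19.8, 4021.0, 155.2, 78.9, 1.3]))
```

The same function can be applied separately to the balance sheet, income statement and cash flow items of each firm-year. This yields the statement-level conformity measures that the comparison requires.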
You are also required to perform separate analyses, not only on the financial data as a whole
but also by type of information, i.e. balance sheet, income statement and cash flow items.
Comparisons should be made with the findings of Amiram et al. (2015).
You may also wish to take the type of fraud into account when performing any analysis.
The column named “RES_FRAUD_RES_TITLE_LIST” details the type of fraud; e.g. [6] “revenue
recognition issues” suggests that the fraud would be more likely to be detected in the income
statement than in the other statements. You may wish to test whether this is true.
No financial data is provided; you will need to collect it manually yourselves, for
example through WRDS.⁸
6 There are potentially 105 fraudulent firm-years in total that were required to be restated. Not all of
these companies had sufficient accounting data (available through Compustat) at the time of writing, so
your final analysis may include fewer firm-years.
7 You may assume that any company which has not been determined by Audit Analytics to be fraudulent is a
“clean” company.
8 Please come and see me if you have never used WRDS or simply need a refresher. The best place to gather
your financial data is WRDS > Compustat > North America > Fundamentals Annual.
Using your Python skills acquired within this module (and elsewhere), your report should
seek to provide (but is not limited to) the following:
An introduction – Discussion of Benford’s Law, its potential application to fraudulent
account detection, what your research goals are, and what you hope to achieve through your
analysis.
A literature review – You have two strands of literature to discuss, namely Benford’s Law
and fraud detection. Remember that a literature review should be a cohesive discussion of the
extant literature and how it relates to your research; it is the main indicator of whether you
understand the subject area.
Methodology - What statistical tests/models are you going to perform and why?
Data – A description of the data and its origin, including summary statistics of the key
variables.
Univariate analysis – The inclusion of (but not limited to) tests of difference between the
two groups of firms in terms of the financial data acquired, along with separate analyses of
balance sheet, income statement and cash flow statement items. You would expect to see larger
deviations from Benford’s Law (BL) in the fraudulent accounts.
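For the tests of difference, a Welch t-test is one reasonable choice because it does not assume equal variances in the two groups. The sketch below computes the statistic and its Welch-Satterthwaite degrees of freedom using only the standard library. The MAD values are hypothetical placeholders, not results. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value. Given the small samples, a non-parametric Mann-Whitney U test (`scipy.stats.mannwhitneyu`) is also worth reporting.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard errors of each group mean (sample variances, n - 1 denominator)
    se2_a, se2_b = variance(group_a) / n_a, variance(group_b) / n_b
    t = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical firm-year MAD scores (placeholders, not results)
fraud_mad = [0.012, 0.015, 0.011, 0.018, 0.014]
clean_mad = [0.008, 0.009, 0.007, 0.010, 0.009]
t, df = welch_t(fraud_mad, clean_mad)
print(f"t = {t:.2f}, Welch df = {df:.1f}")
```

A positive t indicates a higher mean MAD, i.e. weaker conformity to Benford’s Law, in the first group.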
Multivariate analysis – No multivariate analysis is required, but you may wish to investigate
Amiram et al.’s (2015) proposition that misstatements are more likely to occur in smaller,
younger, more volatile, growing firms. This may be done by regressing your Benford conformity
measures (the dependent variable) on key independent variables.
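The regression described above can be sketched as ordinary least squares. A firm-year conformity measure (e.g. MAD) is the dependent variable, and firm characteristics are the regressors; for example, x1 might be log total assets and x2 firm age. These variable choices are assumptions. The sketch solves the normal equations using only the standard library and is checked on synthetic data. For inference (standard errors, p-values), a package such as statsmodels (`sm.OLS(y, sm.add_constant(X)).fit()`) is the practical route.

```python
def ols(y, X):
    """OLS coefficients [intercept, b1, ..., bk] from the normal equations (X'X) b = X'y.

    `X` is a list of rows of regressors; an intercept column is added here.
    """
    rows = [[1.0, *row] for row in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(k):
            if i != col:
                factor = A[i][col] / A[col][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic check (not real data): y = 2 + 3 * x1 - 0.5 * x2 exactly
X = [[1, 0], [2, 1], [3, 5], [4, 2], [5, 3]]
y = [2 + 3 * x1 - 0.5 * x2 for x1, x2 in X]
print([round(b, 6) for b in ols(y, X)])
```

On this synthetic data, the routine recovers an intercept of 2 and slopes of 3 and -0.5.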
Conclusion – What are your key findings and what are the implications?
Python code – A code (*.py) file which documents each stage of your analysis (uploaded as
a separate file).
You may expand on or explore this topic in any way you feel is appropriate, provided that the key
areas highlighted above are covered.
Assignment Option (C) – Bankruptcy Prediction with Text Analysis
Requirement
This assignment incorporates several elements of the BEAM079 module and requires you to
perform an empirical investigation and discussion of the following scenario, building
upon the work you may have carried out in BEAM078 Applied Empirical Accounting and
Finance or in other modules that have introduced Bankruptcy Prediction as a topic.
Using financial and textual data, your task is to provide empirical analysis to establish
whether several high-profile US corporate bankruptcies (which occurred in 2022) could have
been predicted or not. You are tasked with creating a series of models and assessing how
accurate these models are in the prediction of bankruptcies in the US.
Your data comes in two forms. First, the list below gives the names and tickers of 10
bankrupt and 20 non-bankrupt companies.⁹
Bankrupt companies (10)
Name                          Ticker
ATLAS FINANCIAL HOLDINGS INC  AFHIF
ARMSTRONG FLOORING INC        AFIIQ
ALFI INC                      ALFIQ
ALLENA PHARMACEUTICALS        ALNAQ
CYPRESS ENVIRONMNTL PRTNS LP  CELPQ
ENDO INTERNATIONAL PLC        ENDPQ
EASTMAN KODAK CO              KODK
PHASEBIO PHARMACEUTIC         PHASQ
REVLON INC                    REVRQ
ZOSANO PHARMA CORP            ZSANQ

Non-Bankrupt companies (20)
Name                          Ticker
MARPAI INC                    MRAI
COMMERCIAL VEHICLE GROUP INC  CVGI
BLACKBOXSTOCKS INC            BLBX
NRX PHARMACEUTICALS INC       NRXP
UR ENERGY INC                 URG
HORIZON THERAPEUTICS PUB LTD  HZNP
NATIONAL INSTRUMENTS CORP     NATI
T2 BIOSYSTEMS INC             TTOO
NU SKIN ENTERPRISES INC       NUS
STRATA SKIN SCIENCES INC      SSKN
GOOSEHEAD INSURANCE           GSHD
HYZON MOTORS INC              HYZN
INTELLINETICS INC             INLX
ACERAGEN INC                  ACGN
PEDEVCO CORP                  PED
MALLINCKRODT PLC              MNK
ITRON INC                     ITRI
MIROMATRIX MEDICAL INC        MIRO
HERBALIFE NUTRITION LTD       HLF
ASSURE HOLDINGS CORP          IONM
This is your sample, and only this sample should be used. You should also use this list as
the basis for collecting your financial data, which I recommend gathering from
WRDS.¹⁰
9 The 10 bankrupt companies filed for Chapter 7, 11 or 15 in the United States in 2022. The non-bankrupt
companies have been matched to the bankrupt companies based on industry and asset size.
10 Please come and see me if you have never used WRDS or simply need a refresher. The best place to gather
your financial data is WRDS > Compustat > North America > Fundamentals Annual.
Second, you have been provided with one annual report for each of the companies listed
above. Each annual report is from 2021 (one year prior to bankruptcy) and can be
found on ELE in the Assignment section. You should use these annual reports to add
textual information to your bankruptcy prediction models.
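Two of the textual measures mentioned in this brief can be sketched as below. The first is a set of Loughran-McDonald word proportions with a net tone score. The second is the Gunning Fog readability index, 0.4 × (words per sentence + 100 × complex words / words). The word sets are tiny hypothetical subsets standing in for the dictionary provided with the assignment. The syllable counter is a rough heuristic, so the Fog values are only approximate. Real annual reports will also need cleaning (tables, page headers, numbers with decimal points) before sentences are split.

```python
import re

# Tiny hypothetical subsets; use the full Loughran-McDonald lists provided on ELE
NEGATIVE = {"loss", "losses", "impairment", "default", "adverse", "decline"}
POSITIVE = {"gain", "gains", "improve", "improved", "strong", "profitable"}


def count_syllables(word):
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_features(text):
    """Sentiment proportions, net tone and Gunning Fog index for one document."""
    words = re.findall(r"[a-z]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    neg = sum(w in NEGATIVE for w in words)
    pos = sum(w in POSITIVE for w in words)
    complex_words = sum(count_syllables(w) >= 3 for w in words)
    return {
        "negative": neg / n_words,
        "positive": pos / n_words,
        # Net tone in [-1, 1]; 0 when no sentiment words are found
        "net_tone": (pos - neg) / (pos + neg) if pos + neg else 0.0,
        # Gunning Fog = 0.4 * (words per sentence + 100 * complex words / words)
        "fog": 0.4 * (n_words / len(sentences) + 100 * complex_words / n_words),
    }


report_text = "Losses increased. The company reported an impairment and a decline in revenue."
print(text_features(report_text))
```

Each report then contributes one row of textual variables. These can be merged with the financial ratios by ticker before modelling.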
Using your Python skills acquired within this module (and elsewhere), the empirical analysis
in your Assignment should include (but is not limited to) the following
models:
• A logit model predicting bankruptcy one year before failure (2021) using only
financial ratios of your choosing. These should be selected based on the prior
literature. This should be tested for accuracy via an ROC curve.
• Does adding textual information from the annual reports (e.g. sentiment and readability
scores) improve the accuracy of the logit model?
• Does methodology matter? If a Neural Network is used instead of a logit model, how are
the above two models affected?¹¹
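A dependency-free sketch of the first model and its evaluation follows. It assumes the financial ratios have already been computed and standardised. The logit is fitted by gradient ascent on the log-likelihood. The ROC AUC is computed as the probability that a randomly chosen bankrupt firm receives a higher predicted probability than a randomly chosen non-bankrupt firm, with ties counting one half. The one-ratio data (e.g. a standardised leverage ratio) are hypothetical. In practice, statsmodels (`sm.Logit`) gives coefficient inference, and `sklearn.metrics.roc_curve` and `roc_auc_score` produce the curve and its area. For the neural network comparison, `sklearn.neural_network.MLPClassifier` is one option. With only 30 firms, perfect separation is possible, and in that case maximum-likelihood logit estimates do not converge.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def predict(beta, row):
    """Predicted bankruptcy probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))


def fit_logit(X, y, lr=0.1, epochs=5000):
    """Unregularised logit fitted by batch gradient ascent on the log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for row, target in zip(X, y):
            err = target - predict(beta, row)  # score contribution of one firm
            grad[0] += err
            for j, x in enumerate(row, start=1):
                grad[j] += err * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def roc_auc(scores, labels):
    """Area under the ROC curve: P(bankrupt firm outscores a non-bankrupt firm), ties = 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical standardised ratio (one column) and status: 1 = bankrupt, 0 = non-bankrupt
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
beta = fit_logit(X, y)
probs = [predict(beta, row) for row in X]
print("coefficients:", beta, "AUC:", roc_auc(probs, y))
```

Comparing the AUC of the financial-only model with that of a model that adds the textual variables is one direct way to answer the second question above.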
In addition, your Assignment should follow the structure below; some further suggestions
are provided for each section:
An introduction – Discussion of the task at hand, highlighting your research goals and
objectives while setting the scene on which everything else is based.
A literature review – You have two main strands of literature to discuss, namely bankruptcy
prediction literature and sentiment/textual analysis literature. Your study will add to the small
number of papers that have joined the two strands together. Remember that a literature
review should be a cohesive discussion of the extant literature and how it relates to your research;
it is often seen as the main indicator of whether you understand the subject area.
Methodology - What statistical tests/models are you going to perform and why? How have
they been used in the past? How successful have they been?
Data – A description of the data and its origin, including summary statistics of the key
variables.
Univariate analysis – The inclusion of (but not limited to) tests of difference (t-tests)
between the two groups of firms in terms of both financial and textual data; correlation
analysis, to ensure and demonstrate that there are no extreme correlations that will affect your
multivariate logit model¹²; and, possibly, a comparison of bankrupt and non-bankrupt word clouds,
which may be interesting if an adequate stopword list is used.
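A simple correlation screen is sketched below. It assumes each candidate variable is stored as a list of firm-level values. The 0.8 cut-off is a common rule of thumb rather than a formal test, and the variable names and values are hypothetical. `pandas.DataFrame.corr()` gives the full matrix in one call. Variance inflation factors (`statsmodels.stats.outliers_influence.variance_inflation_factor`) are a complementary check on multicollinearity.

```python
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)


def flag_high_correlations(columns, threshold=0.8):
    """Return (var_a, var_b, r) for every pair whose |r| exceeds `threshold`."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical firm-level variables (placeholders, not collected data)
ratios = {
    "ebit_ta": [0.05, -0.20, 0.10, -0.35, 0.02],
    "ni_ta": [0.03, -0.25, 0.08, -0.40, 0.01],
    "net_tone": [-0.6, -0.9, -0.4, -0.8, -0.7],
}
print(flag_high_correlations(ratios))
```

Flagged pairs suggest keeping only one of the two variables in the logit specification.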
Multivariate analysis – Combine your individual variables into multivariate models. As
described above, your main task is to discover whether a bankruptcy prediction model
performs better with textual data than with financial variables alone. Does the textual data
add any value to the model(s)? How much more accurate (if at all) is a neural network than a
logit model? Due to the lack of bankrupt firms in 2021, you do not need to validate your models
with a separate sample; you may base your results on the training sample alone.
11 Given the small sample size, it may be difficult to assess this with certainty. This limitation can be discussed
within the assignment.
12 Multicollinearity will have a severe effect on the logit model but is not as detrimental to a Neural Network.
Conclusion – What are your key findings and what are the implications?
Python code – A code (*.py) file which documents each stage of your analysis (uploaded as
a separate file).
Notes:
• The provided annual reports were collected from Filings Expert (UoE database library).
• Financial data should be collected from WRDS (Compustat – Capital IQ).
• The list of bankrupt companies was collected from Audit Analytics.
• The list of sentiment words from the Loughran McDonald dictionary is provided.
Feel free to collect any additional information that you feel is necessary for your study.