Empirical Assignment
BMME116 Financial Data Analytics
Mancy Luo, March 2022
This document includes a section of general information and separate sections for the
group assignment and the individual assignment. Please read the document carefully.
1 General Information
Please save each report as a single PDF: “Group Number.pdf” for the group assignment
written report, and “Last name student number.pdf” for the individual assignment written
report. Please upload them on Canvas. You are not allowed to re-submit an assignment
once it has been submitted. Each team submits only one report for the group assignment.
Assignments that are not in PDF format will not be considered, which means that you will
receive zero for this component.
For each assignment, the report should have a professional layout with a cover page, an
executive summary, a main body, tables or graphs, and an appendix with the R code for
completeness. Please use font size 11, top and bottom margins of 1 inch, line spacing of 1.5,
and justify your text evenly between the margins. The cover page contains the report title
(which reflects the central idea), author name(s), student number(s), and the group number
for the group assignment. I recommend that each report be under 10 pages, including graphs
and tables but excluding references and code. If you feel it necessary to exceed 10 pages
with informative content, that is fine.
You are encouraged to research related work online. However, you MUST have some of
your own ideas and include all references at the end of the report. The degree of difficulty
of each topic can differ depending on your programming skills. It does not affect your
grade per se: choosing a difficult project does not guarantee you a high grade. It may,
however, give you more room for creative thoughts and idea implementation. Please keep
3 Individual Assignment
The individual assignment consists of a written report, which accounts for 50% of the total
grade. The deadline is April 30, 2022, 11:59 pm (Dutch time).
3.1 Suggested Battle Plan
This section serves as a suggestion for how to organize the reports. Please make sure the
executive summary answers the following questions well:
• What is the question that you are investigating?
• Which data do you use?
• What textual analysis techniques do you apply?
• What are the main findings based on your analysis?
The main body could consist of the following sections:
Introduction
Be clear about which project you choose and what your question is. You could provide a
brief introduction or motivation for why you decided to investigate this question.
Data cleaning
Please be clear on the following points:
• How do you obtain your sample (e.g., data sources, how you select the time periods,
what filters you apply to select your sample etc.)?
• How do you clean your sample? Cleaning steps should be project-specific, depending on
your question and your interests. Please explain how you cleaned the data in precise
and concise terms. For example, include informative steps rather than something
like “I load all the texts into R”. This sentence does not provide any information because
clearly you must load the data to proceed. Instead, something like “I
replace all special characters such as # $ with blanks” provides more information.
Please focus on the concepts rather than the coding details. For example, “I replace all
special characters such as # $ with blanks” is much better than “I use gsub("#", "", text)
to remove #, and gsub("$", "", text, fixed = TRUE) to remove $ etc.”. I will check your
code in the end and there is no need to spend your pages on such things.
• Please pay attention to words that appear more often by construction; you may need
to remove these words for a more informative analysis.
For example, if all your articles are based on the search term “sustainability”, it is
unsurprising that this particular word has a high word frequency, which may not
give you sufficient insights.
• In the end, please briefly discuss what your sample looks like, e.g., # of documents about
xx firms over the time period 2010 to 2015 etc.
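To illustrate the cleaning steps listed above, here is a minimal sketch in base R. The toy documents, the search term, and the helper name clean_text are hypothetical and only serve the example; your own cleaning steps should follow from your question. Code like this belongs in the appendix, not in the main text.

```r
# Hypothetical toy sample; in practice these would be your collected texts.
docs <- c(
  "Sustainability report #1: profits rose by $5 million.",
  "Firms discuss sustainability and climate risk."
)

# Hypothetical search term that appears in every document by construction.
search_term <- "sustainability"

# Illustrative helper (not from any package): lower-case the text, replace
# special characters and digits with blanks, and drop by-construction words.
clean_text <- function(x, drop_words = character(0)) {
  x <- tolower(x)
  # Replace everything that is not a letter or whitespace with a blank.
  x <- gsub("[^a-z[:space:]]", " ", x)
  # Remove each by-construction word, matching whole words only.
  for (w in drop_words) {
    x <- gsub(paste0("\\b", w, "\\b"), " ", x, perl = TRUE)
  }
  # Collapse repeated blanks and trim both ends.
  trimws(gsub("[[:space:]]+", " ", x))
}

cleaned <- clean_text(docs, drop_words = search_term)

# Word frequencies over the cleaned sample, most frequent first.
word_freq <- sort(table(unlist(strsplit(cleaned, " "))), decreasing = TRUE)
print(head(word_freq))
```

In the report itself, one sentence such as “I replace special characters and digits with blanks and remove the search term” would describe all of this.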
Exploratory analysis
This section helps readers understand what the data look like. It could start from basic,
relevant and interesting patterns, e.g., # of articles over time/per author etc. Please use
data visualization for better understanding (figures are normally better than tables). Be
selective and discuss the features that are (in your opinion) most relevant and important
for the reader's understanding. Think of the following questions when you prepare tables/figures:
• What do the figures/tables tell us? What can be concluded/inferred/implied? For
example, if you want to plot a word cloud for each year (e.g., 20 figures for the sample
from 2001 to 2020), what is the main message you try to convey using the 20 figures?
• Are the figures/tables informative and relevant to the research question?
• Do the figures/tables provide any insights for further analysis?
• Try to show the most interesting and relevant figures/tables. Piling up too many
figures/tables without any focus is not a good idea.
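As an example of a basic pattern such as the # of articles over time, the sketch below counts articles per year in base R. The data frame articles and its columns are hypothetical placeholders for your own metadata.

```r
# Hypothetical metadata: one row per article with its publication date.
articles <- data.frame(
  id   = 1:6,
  date = as.Date(c("2010-03-01", "2010-11-15", "2011-06-30",
                   "2012-01-10", "2012-05-05", "2012-12-24"))
)

# Extract the publication year and count the articles in each year.
articles$year <- as.integer(format(articles$date, "%Y"))
per_year <- table(articles$year)
print(per_year)
```

Passing per_year to barplot() would turn the counts into a single figure, which is usually easier to read than the table.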
More advanced analysis
After having explored the basic patterns in the data, you may want to perform more advanced
analysis, e.g., sentiment analysis. This section may convey more messages, especially because
there are many ways to examine the sentiment content, e.g., topic modelling,
polarity, sentiment scale, sensational word popularity etc. Please also discuss what can be
concluded from the analysis. Be clear about your findings.
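One simple way to measure polarity is to count positive and negative words from a dictionary. The sketch below does this in base R under strong assumptions: the two word lists are tiny, made-up examples (a real analysis would rely on an established sentiment dictionary), and the function name polarity and the score definition (pos - neg) / (pos + neg) are just one possible choice.

```r
# Hypothetical, deliberately tiny word lists for illustration only.
positive <- c("gain", "growth", "strong")
negative <- c("loss", "risk", "weak")

# Polarity of one text: (positive - negative) / (positive + negative) hits,
# defined as 0 when the text contains no dictionary words at all.
polarity <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]
  pos <- sum(words %in% positive)
  neg <- sum(words %in% negative)
  if (pos + neg == 0) {
    return(0)
  }
  (pos - neg) / (pos + neg)
}

# Hypothetical example sentences.
docs <- c(
  "Strong growth offsets the loss.",
  "Weak demand raises risk.",
  "The board met on Monday."
)

scores <- vapply(docs, polarity, numeric(1), USE.NAMES = FALSE)
print(scores)
```

A score near 1 indicates mostly positive dictionary words and a score near -1 mostly negative ones. What the scores imply for your question is the part that belongs in the report.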
3.2 Topics
Topic II.1 Congressional speech and information content
Polarization has increased tremendously in the U.S. in recent years. Congressional speech
provides useful material for understanding the change and the trend in the focus of politicians’
political views, and how it may affect individuals’ political activities.
Objective: Some potential questions: i) understand the different topics associated with
parties, ii) the time-series changes of speech focus, iii) the distribution of different central
topics etc.
Data: You can either get the raw data from the Congress database or use the semi-cleaned data
provided by Gentzkow, Shapiro, and Taddy (2019). Please make sure you understand how the
websites and databases are organized.
Other data sources: Parliamentary speeches for some European countries are provided
by Rauh, De Wilde and Schwalbach (2017) via Harvard Dataverse in R data format. Data
related to voting and elections in the U.S. can be found via the Federal Election Commission
or the MIT Election Lab.
Topic II.2 News Articles
News articles contain information and opinions about the macro economy or firms.
Researchers also use them to quantify political bias in the media and to understand how they
affect people’s political activities, e.g., voting in elections. Many news outlets (e.g., the Wall
Street Journal, Reuters, the New York Times etc.) have online archives that contain news
articles dating back to the 1800s. Some news providers also gather news articles from different
sources, e.g., Factiva, ProQuest, LexisNexis etc.
Objective: Some potential questions: i) understand the different writing styles of authors,
ii) the time-series changes of news sentiment, iii) the distribution of different central topics,
iv) COVID-19 related sentiment changes etc.
Data: News pieces can be obtained from the online archive of a news outlet, e.g., the New York
Times archive, or from online archive websites, e.g., https://archive.org/. In addition,
Erasmus University has a subscription to Factiva, where you can select news articles in a
systematic way.
Topic II.3 Firms’ Financial Documents
Listed firms in the U.S. are required to file regular disclosures with the Securities and Exchange
Commission (SEC). All the files are stored in the EDGAR electronic disclosure system. The
majority of studies in finance and accounting use this database to analyze firms’ 10-X files.
Objective: Some potential questions: i) understand the general pattern of 10-X reporting
over time and across firms, ii) understand the sentiment in specific sections of 10-X
forms during certain events etc.
Data: Since the database is widely used, you can easily find online sources with data that
scholars have already cleaned. You can get the data either raw from EDGAR or from online
sources such as the Software Repository for Accounting and Finance.
4 Frequently Asked Questions
4.1 Questions about Data Issues
1. Some topics do not specify what data I exactly need to download. Could you
please be more specific?
I do not specify this because I want to leave you enough discretion in data selection. The
idea is that I point to some potential sources that provide interesting data, and then, based
on the data sources, you decide what you want to investigate and what you need to collect.
Please be aware that the emphasis of the assignments is always on the analysis of an interesting
question, rather than on patience or ability in data collection and programming per se.