MATLAB ghostwriting - II 2022 | 学霸联盟

Date: 2022-05-12

Empirical Finance Spring II 2022
Final Group Project: Due Thursday, May 19th, 2022
In this project, you will use the following files to analyze the events surrounding Elon
Musk’s path to buying Twitter. In the returns file, note that all returns are holding-period
returns given in percent; these are not log returns.
- twtr_russell3000_and_5factor_daily.xlsx: This file contains daily excess holding-period
  returns for Twitter and the Russell 3000 index, as well as the corresponding
  Fama-French five-factor portfolio returns and the risk-free rate (RF).
- Tweets_excel.xlsx: This file contains tweets and other news text from April 4th to
  April 25th.
- RunTweetTextAnalysis_final_project.m: This MATLAB file gives you a start and
  some hints on Q1, related to tweet sentiment analysis.
- RunTwtr5Factor_final_project.m: This MATLAB file gives you a start and some
  hints on Q2, related to estimating some daily factor models and calculating the
  cumulative abnormal return (CAR).
Q1. Sentiment Analysis [18 points total]: Using RunTweetTextAnalysis_final_project.m
as your starting point, read in both Excel files and perform the following steps:
1. [1 point] Winsorize the Twitter returns at the 1st and 99th percentiles. In other words,
set all returns that are greater than the 99th percentile to the value of the 99th
percentile return, and set all returns less than the 1st percentile return to the value of the
1st percentile return. Hint: Use the MATLAB prctile function to help with this.
Note that winsorization is a very common technique when dealing with datasets
whose outliers throw off a model. A less desirable alternative is removing
the observations, but this eliminates data and reduces statistical power.
2. [0 points] Move any weekend tweets/news to the next business day in a new column
in your tweets table called bus_date. I have already done this in the MATLAB file
for you, and I suggest you look up the busdate function in MATLAB to see how it
works as well as what I’m doing in that code.
3. [2 points] Calculate the VADER sentiment score for each tweet and news item and
add these scores as a new column to your tweets table.
4. [2 points] Calculate the mean sentiment score for each bus_date. We will call this the
daily sentiment score, $\text{score}_t$. Note that you will not have a sentiment score for every
day (only for a few days, in fact). The groupsummary function will be helpful here.
5. [3 points] Using MATLAB code, find the Twitter returns corresponding to each
day t from your list of sentiment scores, then run the following regression:
$$\text{HPR\_rf}_t = \alpha + \beta \, \text{score}_t$$
where $\text{HPR\_rf}_t$ is the Twitter excess holding-period return for day t (this is what’s
already in the Excel file). Hint: an innerjoin will be helpful to find the returns
corresponding to the sentiment scores.
6. [2 points] Display a table showing your estimated α and β as well as the standard
error and t-statistic for each of these estimates (I’ve given you several examples of
how to display a nice table with regression coefficients). Then display your R-squared
estimate as well.
7. [3 points] Based on your regression outputs, what is the effect on Twitter’s excess
return from increasing a daily sentiment score from completely neutral (i.e., $\text{score}_t = 0$)
to completely positive (i.e., $\text{score}_t = 1$)? Comment on your R-squared value as well.
8. [5 points] I carefully chose the tweets and news snippets hoping to get a result that
I liked, but they didn’t turn out as I had expected. Keeping this in mind, do the
following:
- Create an Excel file with a single column, called "Words", where you enter 10
  words (one in each cell) by typing in a word you think is completely positive and
  then another that you think is completely negative (so you will be alternating
  good and bad words and you’ll end up with five of each). This way, you know to
  expect positive and negative scores to alternate.
- Read in the words, then calculate the VADER sentiment scores and display a
  table showing both the words and their scores.
- Comment on the results. Are they what you expected, and what does this result
  tell you about doing sentiment analysis? Can you think of any ways to improve
  this, whether in code, by hand, or a combination of both? Note: You don’t need to
  implement any improvements; just describe at a high level (a couple of sentences)
  what you might do.
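The winsorization, daily-averaging, join, and regression steps of Q1 can be sketched on synthetic data as follows. This is only a rough illustration, not the instructor's starter code: the table and column names (`returns`, `tweets`, `date`, `bus_date`, `HPR_rf`, `score`) are assumptions, the returns and scores are made up, and the VADER call is shown only as a comment because it needs the Text Analytics Toolbox. The sketch also assumes the Statistics and Machine Learning Toolbox for prctile and fitlm.

```matlab
% Synthetic stand-ins for the two Excel tables (all names are assumptions).
rng(1);                                             % reproducible fake data
dates = (datetime(2021,12,1):caldays(1):datetime(2022,4,25))';
dates = dates(~isweekend(dates));                   % keep Monday-Friday only
returns = table(dates, 3*randn(numel(dates),1), ...
                'VariableNames', {'date','HPR_rf'});  % returns in percent

% Step 1: winsorize at the 1st and 99th percentiles.
bounds = prctile(returns.HPR_rf, [1 99]);
returns.HPR_rf_w = min(max(returns.HPR_rf, bounds(1)), bounds(2));

% Step 3: with real text, the scores might come from something like
%   tweets.score = vaderSentimentScores(tokenizedDocument(tweets.text));
% Here we use hypothetical scores on five business days instead.
tweetDays = dates(end - [9 9 7 7 5 3 1 1]);
scores    = [0.6; 0.2; -0.4; 0.1; -0.3; 0.8; 0.5; -0.1];
tweets    = table(tweetDays, scores, 'VariableNames', {'bus_date','score'});

% Step 4: mean score per business day (output column is named mean_score).
daily = groupsummary(tweets, 'bus_date', 'mean', 'score');

% Step 5: keep only the return days that have a score, then regress.
joined = innerjoin(daily, returns, 'LeftKeys', 'bus_date', 'RightKeys', 'date');
mdl = fitlm(joined.mean_score, joined.HPR_rf_w);

% Step 6: coefficient table (Estimate, SE, tStat, pValue) and R-squared.
disp(mdl.Coefficients);
fprintf('R-squared: %.4f\n', mdl.Rsquared.Ordinary);
```

With only a handful of sentiment days, as in the real data, the regression has very few degrees of freedom, which is worth keeping in mind when interpreting the t-statistics and the R-squared.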
Q2. Abnormal Returns (event-study stuff) [30 points total]: Using
RunTwtr5Factor_final_project.m as your starting point, read in the
twtr_russell3000_and_5factor_daily.xlsx file and perform the following steps:
1. [5 points] Using the daily holding-period return data from Dec 1st, 2021 to March 31st, 2022
(including those two dates), estimate the following factor models and, for each, display
the estimated coefficients along with their standard errors and their t-statistics, and
the R-squared for each regression:
$$\text{HPR\_rf}_t = \alpha + \beta \, \text{Mkt\_RF}_t + s \, \text{SMB}_t + h \, \text{HML}_t + r \, \text{RMW}_t + c \, \text{CMA}_t$$
$$\text{HPR\_rf}_t = \alpha + \beta \, \text{ret\_rus3000\_rf}_t + s \, \text{SMB}_t + h \, \text{HML}_t + r \, \text{RMW}_t + c \, \text{CMA}_t$$
$$\text{HPR\_rf}_t = \alpha + \beta \, \text{ret\_rus3000\_rf}_t$$
where $\text{HPR\_rf}_t$ is the Twitter excess holding-period return for day t (this is what’s
already in the Excel file) and the other returns are named as listed in the Excel file.
Note that I have already set up the MATLAB script file with code that filters the data
by the estimation time, and I have created a for loop that gives you a head start: you
write the code to run and display a regression just once, and the loop executes it
three times. This is better than rewriting the same code three times.
2. [3 points] Using the coefficients from the last regression (the one with only the Russell
3000 factor) as your benchmark data (i.e., a control group), predict the Twitter returns,
$\widehat{\text{HPR\_rf}}_t$, from April 4th through the end of the data. The reason we are predicting from
April 4th is that this is when Elon Musk’s 9.2% stake in Twitter was announced,
kicking off the events that eventually led to the announced purchase.
3. [3 points] For each day of your predicted returns, calculate the cumulative abnormal
return ($\text{CAR}_t$), i.e.:
$$\text{CAR}_t = \prod_{j=1}^{t} \left(1 + \text{HPR\_rf}_j\right) - \prod_{j=1}^{t} \left(1 + \widehat{\text{HPR\_rf}}_j\right)$$
where j = 1 is the first day of your predicted returns (i.e., April 4th). Note: Remember
that your holding-period returns from the Excel file are in percent,
so you will need to divide them by 100 in the above equation. Also, the
cumprod MATLAB function is helpful here.
4. [2 points] Similarly, for each day of your predicted returns, calculate the cumulative
abnormal log return ($\text{lCAR}_t$), i.e.:
$$\text{lCAR}_t = \sum_{j=1}^{t} \left[ \log\left(1 + \text{HPR\_rf}_j\right) - \log\left(1 + \widehat{\text{HPR\_rf}}_j\right) \right]$$
The cumsum function is helpful for the $\text{lCAR}_t$ calculation, just as the cumprod function was
in the above CAR calculation.
5. [2 points] Plot your cumulative abnormal returns and your cumulative abnormal log
returns (all returns in percent, i.e., multiplied by 100) on the same plot, with the date
of those abnormal returns on the X-axis.
6. [3 points] Create another plot with dates on the X-axis and the following four daily
series plotted on the Y-axis in percent (also, include a legend):
- The Twitter return during the measurement period (note that this series will not
  overlap dates with the other three series)
- The predicted Twitter return during the abnormal return period
- The actual Twitter return during the prediction period
- The CAR during the prediction period
7. [12 points] Answer the following questions, in just a couple of sentences each:
- Note that I used the Russell 3000 data because Kenneth French doesn’t post
  his updated Fama-French (F-F) data until relatively late in the month after the
  data are generated, so I couldn’t give you April F-F data yet. Based on your
  regressions above, does the Russell 3000 seem like a pretty good proxy
  for the Mkt_RF from the Fama-French data? Why or why not? (By the
  way, as I mentioned in class, the S&P is not a good proxy for the Mkt_RF.)
- Which of the factor portfolios had coefficients that were statistically
  significant in the regressions?
- Do the R-squared values of the five-factor portfolio regressions give you
  confidence that they would make for a good control group if you were
  to do a full-scale event study with many firms and not just Twitter?
- How much of a drop-off in R-squared was there from the five-factor
  models to the one-factor model we used for the predictions? Does that
  invalidate the predictions (and ultimately the CARs) for you, and why
  or why not?
- The CARs during the prediction period are pretty large, and we know
  that log returns are approximately equal to holding-period returns, at
  least when they are small. Given the large magnitudes of the CARs,
  does the difference between the CAR and lCAR bother you enough
  that you think log returns are deceptive in this case?
- If we were to make this a full event study, how would you describe
  the event that we are studying and what other data would we need to
  gather?
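The estimation loop and the CAR and lCAR calculations in Q2 can be sketched on synthetic data as follows. This is a rough illustration under stated assumptions, not the instructor's starter script: the column names (`HPR_rf`, `Mkt_RF`, `ret_rus3000_rf`, `SMB`, `HML`, `RMW`, `CMA`) are assumed spellings of the Excel headers, the window lengths `nEst` and `nEvt` are hypothetical, and all returns are randomly generated. The sketch assumes the Statistics and Machine Learning Toolbox for fitlm and predict.

```matlab
% Synthetic daily data in percent (names and sizes are assumptions).
rng(2);
nEst = 80;  nEvt = 15;  n = nEst + nEvt;
T = array2table(randn(n, 5), 'VariableNames', {'Mkt_RF','SMB','HML','RMW','CMA'});
T.ret_rus3000_rf = T.Mkt_RF + 0.1*randn(n, 1);       % noisy market proxy
T.HPR_rf         = 0.05 + 1.2*T.Mkt_RF + 2*randn(n, 1);
est = T(1:nEst, :);          % estimation window
evt = T(nEst+1:end, :);      % prediction (event) window

% Write the regression code once and loop over the three specifications.
specs = { ...
    'HPR_rf ~ Mkt_RF + SMB + HML + RMW + CMA', ...
    'HPR_rf ~ ret_rus3000_rf + SMB + HML + RMW + CMA', ...
    'HPR_rf ~ ret_rus3000_rf'};
for k = 1:numel(specs)
    mdl = fitlm(est, specs{k});
    disp(mdl.Coefficients(:, {'Estimate','SE','tStat'}));
    fprintf('R-squared: %.4f\n\n', mdl.Rsquared.Ordinary);
end
% After the loop, mdl holds the last (one-factor) model.

% Predicted benchmark returns over the event window, still in percent.
predPct = predict(mdl, evt);

% Convert percent to decimals before compounding.
act  = evt.HPR_rf / 100;
pred = predPct    / 100;

% Cumulative abnormal return and cumulative abnormal log return.
CAR  = cumprod(1 + act) - cumprod(1 + pred);
lCAR = cumsum(log(1 + act) - log(1 + pred));

% Display both in percent, side by side.
disp(table(100*CAR, 100*lCAR, 'VariableNames', {'CAR_pct','lCAR_pct'}));
```

Because a sum of logs is the log of a product, lCAR here equals the difference of the logs of the two cumulative products, whereas CAR is the difference of the cumulative products themselves. The two measures therefore agree closely only while the cumulative returns stay small.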