FINA 5840: Financial Modeling Assignment 1
Halis Sak
Due date: April 16, 2021
Question. We will work with firm characteristics data that we downloaded from WRDS for this
homework assignment.
a) Read “data.csv” into a Pandas dataframe (“df”) and print the dimensions of “df”. How many rows
and columns does the data have?
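A minimal sketch of part (a), assuming pandas is installed. Because “data.csv” itself is not available here, a small in-memory CSV with hypothetical column names (YEAR, next_ret, LOGMMT, MMT6) stands in for it. Only the read_csv call and the shape attribute carry over to the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for "data.csv": three rows, four made-up columns.
# With the real file, call pd.read_csv("data.csv") instead.
sample_csv = io.StringIO(
    "YEAR,next_ret,LOGMMT,MMT6\n"
    "1985,0.012,0.30,0.10\n"
    "1999,C,-0.05,0.20\n"
    "2003,-0.020,0.15,-0.40\n"
)

df = pd.read_csv(sample_csv)

# .shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = df.shape
print(df.shape)
print(f"rows: {n_rows}, columns: {n_cols}")
```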
b) Please do the following to pre-process the data in the given order.
• Get rid of the rows for which “next_ret” is a string
• Change the column names of “df” to lowercase
• Split the data into training and test sets (train: 1980 to 1999; test: 2000 to 2019). Name these
two new dataframes “df_train” and “df_test”.
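The three pre-processing steps of part (b) can be sketched as follows. This is an illustration under assumptions, not the reference solution: the data are hypothetical, the year column is assumed to be called “YEAR” (the handout does not name it), and a “string” in “next_ret” is taken to mean an entry that is present but cannot be parsed as a number (for example a missing-return code such as “C”).

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; the year column name "YEAR" is an assumption.
df = pd.DataFrame(
    {
        "YEAR": [1980, 1995, 1999, 2000, 2010, 2019],
        "next_ret": [0.012, "C", -0.030, 0.025, "B", 0.004],
        "LOGMMT": [0.30, -0.05, 0.10, np.nan, 0.20, -0.10],
        "MMT6": [0.10, 0.20, 0.05, 0.30, np.nan, 0.15],
    }
)

# Step 1: drop rows whose next_ret is a non-numeric string.
# errors="coerce" turns unparseable entries into NaN, so a value that is
# present (notna) but becomes NaN after conversion is one of those strings.
numeric_ret = pd.to_numeric(df["next_ret"], errors="coerce")
is_string = numeric_ret.isna() & df["next_ret"].notna()
df = df[~is_string].copy()
# Convert the surviving values to floats so later arithmetic works.
df["next_ret"] = pd.to_numeric(df["next_ret"])

# Step 2: lowercase every column name.
df.columns = df.columns.str.lower()

# Step 3: split by year; Series.between includes both bounds by default.
# .copy() gives independent frames, so adding columns later does not
# trigger pandas' SettingWithCopyWarning.
df_train = df[df["year"].between(1980, 1999)].copy()
df_test = df[df["year"].between(2000, 2019)].copy()
print(df_train.shape, df_test.shape)
```

Note that if a column mixes numbers and text, read_csv may load every entry in it as a string, so checking isinstance(value, str) alone can flag numeric-looking rows too; the to_numeric test above avoids that.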
c) We want to compute the correlation between the “logmmt” and “mmt6” features on the training data.
Some values of these features are NaN. Please do the following steps in the given order.
• Step 1. Import the numpy Python package.
• Step 2. Create a boolean pandas series, “bool_index_finite”, whose ith element is True if the ith
elements of both “logmmt” and “mmt6” are finite, and False otherwise. Hint: use numpy's isfinite
function to check whether a value is finite, and the & operator to combine boolean arrays
element-wise.
• Step 3. Use “bool_index_finite” as a boolean index on “df_train.logmmt” and “df_train.mmt6” to
select the rows of “df_train” in which both “logmmt” and “mmt6” are finite. Then use numpy's
corrcoef function to compute the correlation between these finite “logmmt” and “mmt6” values.
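Steps 2 and 3 of part (c) might look like the sketch below. The small “df_train” here is hypothetical data containing both NaN and inf values; with the real training set, only the last few lines are needed.

```python
import numpy as np
import pandas as pd

# Hypothetical training data containing NaN and inf values.
df_train = pd.DataFrame(
    {
        "logmmt": [0.10, 0.20, np.nan, 0.40, np.inf, -0.10],
        "mmt6": [0.25, 0.45, 0.50, np.nan, 0.30, -0.15],
    }
)

# Step 2: True only where both features are finite (neither NaN nor +/-inf).
# np.isfinite applied to a Series returns a boolean Series with the same index.
bool_index_finite = np.isfinite(df_train["logmmt"]) & np.isfinite(df_train["mmt6"])

# Step 3: keep only the rows where both values are finite, then correlate.
logmmt_finite = df_train.logmmt[bool_index_finite]
mmt6_finite = df_train.mmt6[bool_index_finite]

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the two inputs.
corr = np.corrcoef(logmmt_finite, mmt6_finite)[0, 1]
print(bool_index_finite.tolist())
print(corr)
```

In this made-up sample the three surviving pairs lie on a straight line, so the printed correlation is 1 up to floating-point rounding.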
d) We want to create a new feature from “logmmt” and “mmt6”. If both “logmmt” and “mmt6” are
greater than zero, the new feature “mmt_dir” should equal 1; otherwise it should equal 0. Unlike
part c, this time we use a for loop. Please do the following steps in the given order.
• Step 1. Use numpy's zeros function to create an array whose length equals the number of rows of
the training data, and assign it to the “mmt_dir” column of “df_train”.
• Step 2. Write a traditional for loop that iterates over the rows of the training data.
– Step 3. Check whether both the “logmmt” and “mmt6” values of the current row are greater than
zero. If they are, set “mmt_dir” for that row to 1.
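A minimal sketch of the loop in part (d), using a hypothetical “df_train” whose index labels are non-contiguous, as they would be after filtering by year. The loop walks row positions with iloc; a loop-free comparison at the end serves only as a check and is not part of the requested steps.

```python
import numpy as np
import pandas as pd

# Hypothetical training data; index labels are non-contiguous on purpose.
df_train = pd.DataFrame(
    {
        "logmmt": [0.30, -0.05, 0.10, np.nan, 0.20],
        "mmt6": [0.10, 0.20, -0.40, 0.30, 0.15],
    },
    index=[3, 7, 8, 12, 15],
)

# Step 1: start every row at 0 (np.zeros produces floats).
df_train["mmt_dir"] = np.zeros(df_train.shape[0])
mmt_dir_col = df_train.columns.get_loc("mmt_dir")

# Step 2: a traditional loop over row positions 0, 1, ..., n-1.
for i in range(df_train.shape[0]):
    # Step 3: NaN > 0 evaluates to False, so rows with missing values stay 0.
    if df_train["logmmt"].iloc[i] > 0 and df_train["mmt6"].iloc[i] > 0:
        df_train.iloc[i, mmt_dir_col] = 1

# Loop-free equivalent, used here only to check the loop's result.
vectorized = ((df_train["logmmt"] > 0) & (df_train["mmt6"] > 0)).astype(float)
print(df_train["mmt_dir"].tolist())
print((df_train["mmt_dir"] == vectorized).all())
```

Using iloc with positions rather than index labels keeps the loop correct even when the labels skip values. If “df_train” was created by slicing a larger dataframe, taking a .copy() of the slice first avoids pandas' SettingWithCopyWarning when the new column is assigned.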