ECOM 2001 Term Project: (KO RAD CWT)
LIUHAORAN 20750822
Due Tuesday, 5th October 2021 16:00 AWST

# packages
library(tidyquant) # for importing stock data
library(tidyverse) # for working with data
# library(broom) # for tidying output from various statistical procedures
library(knitr) # for tables
# library(kableExtra) # for improving the appearance of tables
# Add any additional packages that you use to this code chunk

1 Import the Data (2 points)

## 1) Import your assigned stocks
## Use the package tidyquant. You may need to install this package first.
## Replace Stock1, Stock2, Stock3 with your assigned stock names (in quotation marks), uncomment the code, and Run
stocks <- c("KO", "RAD", "CWT") %>%
  tq_get(get = "stock.prices", from = "2000-01-01") %>%
  select(symbol, date, adjusted)
## This is your data set for this project (rename yourDataName to something more descriptive)
## output the first 6 rows of your data frame:
head(stocks, n = 6) %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  kable(caption = "Three Stocks")

Table 1: Three Stocks
symbol  date        adjusted
KO      2000-01-03  15.572
KO      2000-01-04  15.590
KO      2000-01-05  15.728
KO      2000-01-06  15.745
KO      2000-01-07  16.781
KO      2000-01-10  16.246

2 The Analysis

2.1 Plot
prices over time (3 points) Plot the prices of each asset over time
separately. Succinctly describe in words the evolution of each asset
over time. (limit: 100 words for each time series).

## Don't forget to add fig.cap= "Your caption" to the code chunk header.
## facet_wrap() may be useful
ggplot(stocks, aes(date, adjusted)) +
  geom_line() +
  facet_wrap(~symbol)

[Figure 1: prices over time. Panels: CWT, KO, RAD; x-axis: date (2000 to 2020); y-axis: adjusted (0 to 200).]

The CWT price trends upward over the whole period, and the rise accelerates after 2015. The KO price shows no clear trend before 2010 and trends upward afterwards. Both CWT and KO dropped in 2020. The RAD price shows no sustained trend: it rises in some periods and falls in others, so it is hard to find a clear pattern.

2.2 Calculate returns and plot returns
over time (4 points) Calculate the daily percentage returns of each
asset using the following formula:

$$r_t = 100 \ln\left(\frac{P_t}{P_{t-1}}\right)$$

where $P_t$ is the asset price at time $t$. Then plot the returns for each asset over time.

## Hint: you need to add a column to your data frame (yourDataName).
## You can use the mutate() function
## Don't forget to group_by()
## The lag() function can be used to find the price in the previous date
## Double check your results!!
stocks <- stocks %>%
  group_by(symbol) %>%
  mutate(return = 100 * log(adjusted / lag(adjusted)))
ggplot(stocks, aes(date, return)) +
  geom_line() +
  facet_wrap(~symbol)

[Figure 2: returns over time. Panels: CWT, KO, RAD; x-axis: date (2000 to 2020); y-axis: return (-40 to 20).]
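The return calculation can be sanity-checked on a small made-up price series before trusting it on the real data. The sketch below is only an illustration: the symbols "AAA" and "BBB" and their prices are hypothetical, and it assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy prices (not from the project data) for two made-up symbols.
toy <- tibble(
  symbol = rep(c("AAA", "BBB"), each = 3),
  adjusted = c(100, 110, 99, 50, 50, 55)
)

# Same recipe as in the project: group by symbol so that lag() never
# reaches back into the previous symbol's prices.
toy_returns <- toy %>%
  group_by(symbol) %>%
  mutate(return = 100 * log(adjusted / lag(adjusted))) %>%
  ungroup()

# The first row of each symbol has no previous price, so its return is NA.
print(toy_returns)
```

Grouping before calling lag() is what keeps the first return of each symbol missing instead of being computed from another stock's last price.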
2.3 Histogram of returns (4 points) Create a histogram for each of the
returns series (explain how you determined the number of bins to use).
ggplot(stocks, aes(return)) +
  geom_histogram() +
  facet_wrap(~symbol)

[Figure 3: return histogram. Panels: CWT, KO, RAD; x-axis: return (-40 to 40); y-axis: count (0 to 4000).]

I use the default bin setting of geom_histogram(), which is 30 bins; the horizontal axis runs from about -40 to 40.

2.4
Summary table of returns (4 points) Report the descriptive statistics in
a single table which includes the mean, median, variance, standard
deviation, skewness and kurtosis for each series. What conclusions can
you draw from these descriptive statistics?

## Your summary table here. Be sure to format the table appropriately.
stocks %>%
  drop_na() %>%
  group_by(symbol) %>%
  summarise(mean = mean(return), median = median(return),
            variance = var(return), `standard deviation` = sd(return),
            skewness = skewness(return), kurtosis = kurtosis(return)) %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  kable(caption = "Summary table of returns")

Table 2: Summary table of returns
symbol  mean    median  variance  standard deviation  skewness  kurtosis
CWT      0.038  0.069    3.337    1.827                0.315    11.429
KO       0.022  0.041    1.768    1.330               -0.168     9.284
RAD     -0.052  0.000   19.024    4.362               -0.029    12.033

Conclusion: the mean daily returns of CWT (0.038) and KO (0.022) are positive and larger than that of RAD (-0.052). Moreover, the variance of RAD returns (19.024) is much greater than that of CWT (3.337) and KO (1.768), so RAD is the riskiest of the three. All three series have kurtosis between 9.3 and 12.0, which points to heavier tails than a normal distribution.

2.5 Are average returns significantly
different from zero? (5 points) Under the assumption that the returns of
each asset are drawn from an independently and identically distributed
normal distribution, are the expected returns of each asset
statistically different from zero at the 1% level of significance?
Provide details for all 5 steps to conduct a hypothesis test, including
the equation for the test statistic. Calculate and report all the
relevant values for your conclusion and be sure to provide an
interpretation of the results.

Steps
1. The null and alternative hypotheses: $H_0: \mu = 0$ and $H_1: \mu \neq 0$.
2. The level of significance and number of observations. Let's use $\alpha = 0.01$. Each series has $n = 5472$ daily returns.
3. The test statistic. We do not know the true population standard deviation, so we use a t-test statistic: $t = \frac{m}{s/\sqrt{n}}$, where $m$ is the sample mean, $s$ is the sample standard deviation and $n$ is the sample size. Under the null hypothesis, $t$ follows a t distribution with $n - 1$ degrees of freedom.
4. The critical values for our test statistic. For a two-tailed test at $\alpha = 0.01$ with $n - 1 = 5471$ degrees of freedom, the critical values are approximately $\pm 2.58$.
5. The decision. If the test statistic falls into either rejection region (so t is less than the lower cutoff value or greater than the upper cutoff value), reject the null. Equivalently, reject the null if the p-value of the test is less than the significance level we chose ($\alpha$); both methods lead to the same result.

## Hint: you can extract
specific values from t.test objects using the $ ## Eg. using
t.test(x,y)$statistic will extract the value of the test statistic. ##
Consult the help file for the other values generated by the t.test()
function. ## The relevant values are: the t-test method, the estimated
mean , the test statistic, whether the test is one or two tailed, the
degrees of freedom, and the p-value. (You might wish to present this in a
table)
ttesttable <- data.frame(
  t.test.method = rep("one sample t test (two-tailed)", 3),
  `estimated mean` = c(
    t.test(stocks$return[stocks$symbol == "CWT"])$estimate,
    t.test(stocks$return[stocks$symbol == "KO"])$estimate,
    t.test(stocks$return[stocks$symbol == "RAD"])$estimate),
  `test statistic` = c(
    t.test(stocks$return[stocks$symbol == "CWT"])$statistic,
    t.test(stocks$return[stocks$symbol == "KO"])$statistic,
    t.test(stocks$return[stocks$symbol == "RAD"])$statistic),
  df = c(
    t.test(stocks$return[stocks$symbol == "CWT"])$parameter,
    t.test(stocks$return[stocks$symbol == "KO"])$parameter,
    t.test(stocks$return[stocks$symbol == "RAD"])$parameter),
  p.value = c(
    t.test(stocks$return[stocks$symbol == "CWT"])$p.value,
    t.test(stocks$return[stocks$symbol == "KO"])$p.value,
    t.test(stocks$return[stocks$symbol == "RAD"])$p.value)
)
ttesttable %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  kable(caption = "t test table of returns")

Table 3: t test table of returns
t.test.method                   estimated.mean  test.statistic  df    p.value
one sample t test (two-tailed)   0.038           1.552          5471  0.121
one sample t test (two-tailed)   0.022           1.246          5471  0.213
one sample t test (two-tailed)  -0.052          -0.877          5471  0.380

The rows of Table 3 are in the order CWT, KO, RAD. For all three stocks the absolute value of the test statistic is below the two-tailed 1% critical value of about 2.58, and the p-values (0.121, 0.213 and 0.380) are all greater than 0.01. We therefore cannot reject the null hypothesis for any stock: the average daily returns are not significantly different from zero at the 1% level.

2.6 Are average returns different
from each other? (6 points) Assume the returns of each asset are
independent from each other. With this assumption, are the mean returns
statistically different from each other at the 1% level of significance?
Provide details for all 5 steps to conduct each of the hypothesis tests
using what you have learned in the unit. Calculate and report all the
relevant values for your conclusion and be sure to provide an
interpretation of the results. (Hint: You need to discuss the equality
of variances to determine which type of test to use.)

The testing procedure
1. The null hypothesis is that all three mean returns are equal. The alternative is that at least one mean is not equal to the others.
2. Again, use a level of significance of 0.01.
3. The test statistic is an F statistic: $F = \frac{MSB}{MSW} \sim F_{c-1,\,n-c}$, where MSB is the mean square between groups, MSW is the mean square within groups, $c$ is the number of groups and $n$ is the total number of observations.
4. The critical values are from an F distribution with $c-1$ degrees of freedom in the numerator and $n-c$ degrees of freedom in the denominator. This is a one-tailed test so we place all $\alpha = 0.01$ in the upper tail.
5. The decision and interpretation.

Before implementing the test, we should test the
equality of variance. We can test for homogeneity of variance using Levene's test. Its null hypothesis is that the return variances of the three stocks are equal; the alternative is that at least one variance differs.

## Decide on which test is appropriate for testing differences in mean returns
## Hint: Include the results of your supporting test for the differences in variances (include all 5 hypothesis step tests and the equation for the test statistics, and a clear interpretation of the result).
## Hint: http://www.sthda.com/english/wiki/one-way-anova-test-in-r
## So this section has (at least) 2 significance tests.
qf(0.99, 3-1, 5472 - 3)
## [1] 4.60905

car::leveneTest(return ~ symbol, data = stocks) %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  knitr::kable(caption = "Levene's test")

Table 4: Levene's test
       Df     F value   Pr(>F)
group  2      1098.819  0
       16413  NA        NA

The F statistic for the test is 1098.8, which is far larger than our critical value of 4.609. The p-value of the test is < 2.2e-16, which is less than our level of significance of 0.01, so we reject the null hypothesis of equal variances and conclude that the return variances are unequal across the three stocks. This result informs our choice of test for differences in mean returns: we use the one-way test that does not assume equal variances (Welch's test), whose output is shown in Table 5.

Table 5: One-way analysis of means
num.df  den.df    statistic  p.value  method
2       9793.712  0.996      0.369    One-way analysis of means (not assuming equal variances)
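The two-stage logic described above (first test the variances, then pick the test for the means) can be sketched with base R on simulated data. This is only an illustration under stated assumptions: the data frame `sim` and its groups "A", "B", "C" are made up, and Levene's test is computed by hand as a one-way ANOVA on absolute deviations from the group medians, which is the centring that car::leveneTest() uses by default.

```r
set.seed(1)
# Simulated returns (hypothetical, not the project data): same mean, different spread.
sim <- data.frame(
  symbol = factor(rep(c("A", "B", "C"), each = 500)),
  return = c(rnorm(500, 0, 1), rnorm(500, 0, 1.5), rnorm(500, 0, 4))
)

# Levene's test with median centring: a one-way ANOVA on the absolute
# deviation of each observation from its own group median.
sim$abs_dev <- abs(sim$return - ave(sim$return, sim$symbol, FUN = median))
levene <- anova(lm(abs_dev ~ symbol, data = sim))
levene_F <- levene[["F value"]][1]
levene_p <- levene[["Pr(>F)"]][1]

# 1% critical value with c - 1 and n - c degrees of freedom,
# where n is the pooled number of observations across all groups.
n_total <- nrow(sim)
n_groups <- nlevels(sim$symbol)
crit <- qf(0.99, n_groups - 1, n_total - n_groups)

# If equal variances are rejected, compare the means with Welch's one-way test.
welch <- oneway.test(return ~ symbol, data = sim, var.equal = FALSE)

cat("Levene F:", levene_F, " critical value:", crit, " p-value:", levene_p, "\n")
cat("Welch F:", welch$statistic, " p-value:", welch$p.value, "\n")
```

Here the denominator degrees of freedom for the critical value use the pooled sample size, which matches the residual Df that the Levene table reports.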
oneway.test(return ~ symbol, data = stocks, var.equal = FALSE) %>%
  broom::tidy() %>%
  mutate_if(is.numeric, round, digits = 3) %>%
  knitr::kable(caption = "One-way analysis of means")

The F statistic is 0.996, which is less than our critical value of 4.609. The p-value of the test is 0.369, which is larger than our level of significance of 0.01, so we cannot reject the null hypothesis. There is no evidence at the 1% level that the mean returns of the three stocks differ.

2.7
Correlations (2 points) Calculate and present the correlation matrix of
the returns. Discuss the direction and strength of the correlations.

## Include a formatted correlation matrix here
## Hint: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
correlationmat <- cor(data.frame(
  CWT = stocks$return[stocks$symbol == "CWT"],
  KO = stocks$return[stocks$symbol == "KO"],
  RAD = stocks$return[stocks$symbol == "RAD"]), use = "complete.obs")
kable(round(correlationmat, 3), caption = "correlation matrix")

Table 6: correlation matrix
      CWT    KO     RAD
CWT   1.000  0.328  0.178
KO    0.328  1.000  0.156
RAD   0.178  0.156  1.000

All pairwise correlations are positive. The correlation between CWT and KO is the strongest but still modest (0.328), while the correlations with RAD are weak (0.178 with CWT and 0.156 with KO).

2.8 Testing the significance of correlations (2 points)
Is the assumption of independence of stock returns realistic? Provide evidence (the hypothesis test including all 5 steps of the hypothesis test and the equation for the test statistic) and a rationale to support your conclusion.

The testing procedure
1. The null hypothesis is that the returns of each pair of stocks are uncorrelated, $H_0: \rho = 0$. The alternative is that they are correlated, $H_1: \rho \neq 0$.
2. Again, use a level of significance of 0.01.
3. The test statistic is a t statistic: $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$, where $r$ is the sample correlation and $n$ is the number of paired observations of the x and y variables.
4. The critical values are from a t distribution with $n-2$ degrees of freedom. This is a two-tailed test.
5. The decision and interpretation.

## Report the results of tests for statistical significance of the correlations here.
## Hint: http://www.sthda.com/english/wiki/correlation-matrix-a-quick-start-guide-to-analyze-format-and-visualize-a-correlation-matrix-using-r-software
# ++++++++++++++++++++++++++++
# flattenCorrMatrix
# ++++++++++++++++++++++++++++
# cormat : matrix of the correlation coefficients
# pmat : matrix of the correlation p-values
flattenCorrMatrix <- function(cormat, pmat) {
  ut <- upper.tri(cormat)
  data.frame(
    row = rownames(cormat)[row(cormat)[ut]],
    column = rownames(cormat)[col(cormat)[ut]],
    cor = (cormat)[ut],
    p = pmat[ut]
  )
}
res2 <- Hmisc::rcorr(as.matrix(data.frame(
  CWT = stocks$return[stocks$symbol == "CWT"],
  KO = stocks$return[stocks$symbol == "KO"],
  RAD = stocks$return[stocks$symbol == "RAD"])))
kable(flattenCorrMatrix(res2$r, res2$P), caption = "correlation test")

Table 7: correlation test
row  column  cor        p
CWT  KO      0.3281730  0
CWT  RAD     0.1781285  0
KO   RAD     0.1557560  0

All three reported p-values are 0 to the displayed precision, which is below our level of significance of 0.01. We therefore reject the null hypothesis for every pair and conclude that the assumption of independence of stock returns is not realistic.

2.9 Advising an
investor (12 points) Suppose that an investor has asked you to assist
them in choosing two of these three stocks to include in their
portfolio. The portfolio is defined by

$$r = w_1 r_1 + w_2 r_2$$

where $r_1$ and $r_2$ represent the returns from the first and second stock, respectively, and $w_1$ and $w_2$ represent the proportion of the investment placed in each stock. The entire investment is allocated between the two stocks, so $w_1 + w_2 = 1$. The investor favours the combination of stocks that provides the highest return, but dislikes risk. Thus the investor's happiness is a function of the portfolio, $r$:

$$h(r) = E(r) - \mathrm{Var}(r)$$

where $E(r)$ is the expected return of the portfolio, and $\mathrm{Var}(r)$ is the variance of the portfolio (see footnote 1). Given your values for $E(r_1)$, $E(r_2)$, $\mathrm{Var}(r_1)$, $\mathrm{Var}(r_2)$ and $\mathrm{Cov}(r_1, r_2)$ which portfolio would you recommend to the investor?
What is the expected return to this portfolio? Provide evidence to
support your answer, including all the steps undertaken to arrive at the
result. (Hint: review your notes from tutorial 6 on portfolio
optimisation. A complete answer will include the optimal weights for
each possible portfolio (pair of stocks) and the expected return for
each of these portfolios.)

Portfolio 1: CWT and KO
Let $r_1$ be the CWT return and $r_2$ be the KO return. From the results above, $E(r_1) = 0.038$, $E(r_2) = 0.022$, $\mathrm{Var}(r_1) = 3.337$, $\mathrm{Var}(r_2) = 1.768$ and $\mathrm{Cov}(r_1, r_2) = 0.797$.
Choose the optimal $w_1$ and $w_2$:
$$y = E(r) - \mathrm{Var}(r) = w_1 E(r_1) + w_2 E(r_2) - w_1^2 \mathrm{Var}(r_1) - w_2^2 \mathrm{Var}(r_2) - 2 w_1 w_2 \mathrm{Cov}(r_1, r_2)$$
$$= 0.038 w_1 + 0.022 w_2 - 3.337 w_1^2 - 1.768 w_2^2 - 2(0.797) w_1 w_2$$
Since $w_2 = 1 - w_1$, this becomes $y = -3.511 w_1^2 + 1.958 w_1 - 1.746$. A quadratic $a w_1^2 + b w_1 + c$ with $a < 0$ reaches its maximum at $w_1 = -b/(2a)$, so $y$ is at its maximum when $w_1 = -1.958/(-2 \times 3.511) = 0.279$. Thus the optimal weights are $w_1 = 0.279$ and $w_2 = 1 - 0.279 = 0.721$. The expected return of this portfolio is
$$E(r) = 0.279 \times 0.038 + 0.721 \times 0.022 = 0.026$$
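The substitution and vertex step can also be checked numerically. The helper below is a minimal sketch, not part of the original analysis: the function name `optimal_portfolio` and its argument names are hypothetical, and it takes the rounded inputs quoted in the text, so its results are only as precise as those inputs.

```r
# Optimal weight on asset 1 for h(r) = E(r) - Var(r), with w2 = 1 - w1.
# Hypothetical helper for illustration; argument names are assumptions.
optimal_portfolio <- function(e1, e2, v1, v2, cov12) {
  # After substituting w2 = 1 - w1, h is the quadratic a * w1^2 + b * w1 + c0.
  a <- -(v1 + v2 - 2 * cov12)
  b <- (e1 - e2) + 2 * v2 - 2 * cov12
  c0 <- e2 - v2
  w1 <- -b / (2 * a)  # vertex of the parabola; a maximum whenever a < 0
  w2 <- 1 - w1
  list(a = a, b = b, c = c0, w1 = w1, w2 = w2,
       expected_return = w1 * e1 + w2 * e2,
       happiness = a * w1^2 + b * w1 + c0)
}

# CWT (asset 1) and KO (asset 2), using the rounded values quoted in the text.
p1 <- optimal_portfolio(e1 = 0.038, e2 = 0.022, v1 = 3.337, v2 = 1.768, cov12 = 0.797)
print(round(unlist(p1), 3))
```

Calling the same function with the means, variances and covariance of the other two pairs gives their weights, expected returns and values of $h$ on a like-for-like basis.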
Portfolio 2: CWT and RAD
Let $r_1$ be the CWT return and $r_2$ be the RAD return. From the results above, $E(r_1) = 0.038$, $E(r_2) = -0.052$, $\mathrm{Var}(r_1) = 3.337$, $\mathrm{Var}(r_2) = 19.024$ and $\mathrm{Cov}(r_1, r_2) = 1.419$.
Choose the optimal $w_1$ and $w_2$:
$$y = E(r) - \mathrm{Var}(r) = w_1 E(r_1) + w_2 E(r_2) - w_1^2 \mathrm{Var}(r_1) - w_2^2 \mathrm{Var}(r_2) - 2 w_1 w_2 \mathrm{Cov}(r_1, r_2)$$
$$= 0.038 w_1 - 0.052 w_2 - 3.337 w_1^2 - 19.024 w_2^2 - 2(1.419) w_1 w_2$$
Since $w_2 = 1 - w_1$, this becomes $y = -19.523 w_1^2 + 35.3 w_1 - 19.076$. By the vertex property of a quadratic function, $y$ is at its maximum when $w_1 = -35.3/(-2 \times 19.523) = 0.904$. Thus the optimal weights are $w_1 = 0.904$ and $w_2 = 1 - 0.904 = 0.096$. The expected return of this portfolio is
$$E(r) = 0.904 \times 0.038 - 0.096 \times 0.052 = 0.029$$

Portfolio 3: KO and RAD
Let $r_1$ be the KO return and $r_2$ be the RAD return. From the results above, $E(r_1) = 0.022$, $E(r_2) = -0.052$, $\mathrm{Var}(r_1) = 1.768$, $\mathrm{Var}(r_2) = 19.024$ and $\mathrm{Cov}(r_1, r_2) = 0.903$.
Choose the optimal $w_1$ and $w_2$:
$$y = E(r) - \mathrm{Var}(r) = w_1 E(r_1) + w_2 E(r_2) - w_1^2 \mathrm{Var}(r_1) - w_2^2 \mathrm{Var}(r_2) - 2 w_1 w_2 \mathrm{Cov}(r_1, r_2)$$
$$= 0.022 w_1 - 0.052 w_2 - 1.768 w_1^2 - 19.024 w_2^2 - 2(0.903) w_1 w_2$$
Since $w_2 = 1 - w_1$, this becomes $y = -18.986 w_1^2 + 36.316 w_1 - 19.076$. By the vertex property of a quadratic function, $y$ is at its maximum when $w_1 = -36.316/(-2 \times 18.986) = 0.956$. Thus the optimal weights are $w_1 = 0.956$ and $w_2 = 1 - 0.956 = 0.044$. The expected return of this portfolio is
$$E(r) = 0.956 \times 0.022 - 0.044 \times 0.052 = 0.019$$

Of the three optimised portfolios, CWT and RAD has the highest expected return (0.029, against 0.026 for CWT and KO and 0.019 for KO and RAD), so it is the portfolio selected here. Its expected return is broken down in the table below.

cov(stocks$return[stocks$symbol == "CWT"], stocks$return[stocks$symbol == "KO"], use = "complete.obs")
## [1] 0.7970251
cov(stocks$return[stocks$symbol == "CWT"], stocks$return[stocks$symbol == "RAD"], use = "complete.obs")
## [1] 1.419271
cov(stocks$return[stocks$symbol == "KO"], stocks$return[stocks$symbol == "RAD"], use = "complete.obs")
## [1] 0.9032068

# You can use this section to create a table of your results.
tibble("Stock" = c("CWT", "RAD"),
       "Returns" = c(0.038, -0.052),
       "Variances" = c(3.337, 19.024),
       "Weights" = c(0.904, 0.096),
       "Return*Weight" = Returns * Weights) %>%
  kable(caption = "Best Portfolio Stock Returns", label = "stocks")

Table 8: Best Portfolio Stock Returns
Stock  Returns  Variances  Weights  Return*Weight
CWT     0.038    3.337     0.904     0.034352
RAD    -0.052   19.024     0.096    -0.004992

Footnote 1: Note that $E(r) = w_1 E(r_1) + w_2 E(r_2)$, and $\mathrm{Var}(r) = w_1^2 \mathrm{Var}(r_1) + w_2^2 \mathrm{Var}(r_2) + 2 w_1 w_2 \mathrm{Cov}(r_1, r_2)$.

2.10 The impact of financial events
on returns (6 points) Two significant financial events have occurred in
recent history. On September 15, 2008 Lehman Brothers declared
bankruptcy and a Global Financial Crisis started. On March 11, 2020 the
WHO declared COVID-19 a pandemic. Use linear regression to determine if
a. Any of the stocks in your data exhibit positive returns over time. b.
Either of the two events had a significant impact on returns. Report
the regression output for each stock and interpret the results to
address these two questions. How would you interpret this information in
the context of your chosen portfolio?

## Add a column to your returns data set.
## This is a factor variable with three levels:
## 'Lehman Bankruptcy' for the date 2008-09-15,
## 'Pandemic' for the date 2020-03-11, and
## 'BAU' (Business as usual) for all other dates.
finacialevent <- rep("BAU", nrow(stocks))
finacialevent[stocks$date == "2008-09-15"] <- "Lehman Bankruptcy"
finacialevent[stocks$date == "2020-03-11"] <- "Pandemic"
stocks <- cbind(stocks, data.frame(finacialevent))

## Then run a regression
## analysis to determine whether returns to each stock are increasing over
## time and if the events had a statistically significant impact on the
## returns of each stock.
CWT <- stocks[stocks$symbol == "CWT", ] %>% drop_na()
KO <- stocks[stocks$symbol == "KO", ] %>% drop_na()
RAD <- stocks[stocks$symbol == "RAD", ] %>% drop_na()

CWT.lm <- lm(return ~ date + finacialevent, CWT)
summary(CWT.lm)
##
## Call:
## lm(formula = return ~ date + finacialevent, data = CWT)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -12.3224  -0.9246   0.0246   0.9586  25.6784
##
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -6.051e-02  1.625e-01  -0.372    0.710
## date                            6.743e-06  1.076e-05   0.627    0.531
## finacialeventLehman Bankruptcy -2.253e+00  1.824e+00  -1.235    0.217
## finacialeventPandemic          -7.945e+00  1.824e+00  -4.355 1.35e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.824 on 5468 degrees of freedom
## Multiple R-squared:  0.003787, Adjusted R-squared:  0.003241
## F-statistic:  6.93 on 3 and 5468 DF,  p-value: 0.0001186

KO.lm <- lm(return ~ date + finacialevent, KO)
summary(KO.lm)
##
## Call:
## lm(formula = return ~ date + finacialevent, data = KO)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -10.6083  -0.5715   0.0139   0.6057  12.9790
##
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -6.969e-02  1.184e-01  -0.588   0.5563
## date                            6.195e-06  7.840e-06   0.790   0.4294
## finacialeventLehman Bankruptcy  4.397e-01  1.329e+00   0.331   0.7408
## finacialeventPandemic          -2.783e+00  1.330e+00  -2.093   0.0364 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.329 on 5468 degrees of freedom
## Multiple R-squared:  0.0009225, Adjusted R-squared:  0.0003744
## F-statistic: 1.683 on 3 and 5468 DF,  p-value: 0.1684

RAD.lm <- lm(return ~ date + finacialevent, RAD)
summary(RAD.lm)
##
## Call:
## lm(formula = return ~ date + finacialevent, data = RAD)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -38.210  -1.802   0.037   1.768  35.524
##
## Coefficients:
##                                  Estimate Std. Error t value Pr(>|t|)
## (Intercept)                    -1.393e-01  3.882e-01  -0.359 0.719721
## date                            6.052e-06  2.570e-05   0.236 0.813818
## finacialeventLehman Bankruptcy  1.059e+00  4.358e+00   0.243 0.808035
## finacialeventPandemic          -1.638e+01  4.358e+00  -3.759 0.000173 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.357 on 5468 degrees of freedom
## Multiple R-squared:  0.002593, Adjusted R-squared:  0.002045
## F-statistic: 4.738 on 3 and 5468 DF,  p-value: 0.00265

a. The estimated date coefficients are positive for all three stocks, but none is statistically significant (p-values 0.531 for CWT, 0.429 for KO and 0.814 for RAD). The sign alone is therefore not evidence of a trend: the regressions do not show that returns to any of the stocks increase over time.

b. The Lehman Bankruptcy coefficient is not significant for any stock (p-values 0.217, 0.741 and 0.808). The Pandemic coefficient is negative for all three stocks and its p-value is below 0.05 in each regression (1.35e-05 for CWT, 0.0364 for KO and 0.000173 for RAD), so we reject the null hypothesis of no effect and conclude that the Pandemic event had a significant negative impact on returns. At the 1% level this holds for CWT and RAD but not for KO.

In the context of the chosen portfolio (CWT and RAD), both holdings have large negative Pandemic estimates (-7.945 and -16.38).
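The structure of these regressions can be reproduced on simulated data to see how the event factor enters the model and how to read significance from the coefficient table. This is a sketch under stated assumptions: the data frame `sim`, the column name `event` and the simulated returns are all hypothetical and unrelated to the project data.

```r
set.seed(42)
# Simulated daily returns (hypothetical, not the project data).
dates <- seq(as.Date("2008-01-01"), as.Date("2020-12-31"), by = "day")
sim <- data.frame(date = dates,
                  return = rnorm(length(dates), mean = 0.03, sd = 1.5))

# Three-level factor: one day per event, 'BAU' for every other date.
sim$event <- "BAU"
sim$event[sim$date == as.Date("2008-09-15")] <- "Lehman Bankruptcy"
sim$event[sim$date == as.Date("2020-03-11")] <- "Pandemic"
sim$event <- factor(sim$event, levels = c("BAU", "Lehman Bankruptcy", "Pandemic"))

# 'BAU' is the baseline level, so each event coefficient measures the
# difference between that day's return and the fitted business-as-usual return.
fit <- lm(return ~ date + event, data = sim)
coefs <- summary(fit)$coefficients
print(round(coefs, 4))

# Judge each coefficient by its two-sided p-value, not by its sign alone.
significant_at_5pct <- coefs[, "Pr(>|t|)"] < 0.05
print(significant_at_5pct)
```

Because each event level covers a single trading day, its coefficient rests on one observation, which is worth keeping in mind when reading the event estimates.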