Date: 2023-10-04
UTS CRICOS 00099F
Financial Modelling and Analysis
Seminar 6
Linear Regression – Simple and Multiple
Linear Regression
• What is Linear Regression?
• How to estimate regression coefficients?
o Ordinary Least Squares (OLS) estimation
o Assumptions and properties
• Interpreting coefficients
• Significance of coefficients
• Regression with log variables
• Coefficient of Determination $R^2$
Seminar 6 - Linear Regression – Simple and Multiple
Correlation
• Measures the strength of the linear relationship between two variables
• Unitless
• Ranges between -1 and 1
o Close to -1: strong negative linear relationship
o Close to 1: strong positive linear relationship
o Close to 0: weak linear relationship
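As a rough illustration of these properties, the sketch below computes the sample (Pearson) correlation coefficient in plain Python. The function name `pearson_r` and the small data sets are made up for illustration and do not come from the seminar.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations around the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data (not from the slides)
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # exact positive linear relation
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # exact negative linear relation

# Unitless: rescaling X (e.g. dollars to cents) leaves the correlation unchanged
y_noisy = [2, 4, 5, 4, 6]
r_original = pearson_r(x, y_noisy)
r_rescaled = pearson_r([100 * a for a in x], y_noisy)

print(r_pos, r_neg, r_original, r_rescaled)
```

The last two values agree because multiplying a variable by a positive constant scales the numerator and the denominator by the same factor.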
Linear Correlation (figure not reproduced)
Linear Regression
• Correlation is about the strength of the relation.
• It does not tell us how much one variable is “affected” by the other.
• If trading volume increases by 10%, how much would volatility change?
• In regression, one variable is considered:
o the independent variable / predictor (X); and the other is
o the dependent variable / outcome (Y)
• A change in X is used to predict the change in Y
• Example: a 10% increase in volume is associated with an average 3% increase in volatility
Linear Regression
• Regression is a technique to estimate the statistical relation between Y and X.
• The population (true) relation between Y and X is:
$Y = \beta_0 + \beta_1 X + \varepsilon$
• The error term $\varepsilon$ captures random deviations.
• If there is no random deviation, i.e. $\mathrm{Var}(\varepsilon) = 0$ → Y and X have a deterministic relation.
• Unknown parameters:
o $\beta_0$ → intercept
o $\beta_1$ → slope
OLS Estimation
• If a sample of $(X_i, Y_i)$, i = 1,…,n, is taken from the population, $\beta_0$ and $\beta_1$ can be estimated using ordinary least squares (OLS).
• The estimated relationship for the sample is:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• The residual is $e_i = Y_i - \hat{Y}_i$
• $\hat{\beta}_0$ and $\hat{\beta}_1$ are chosen to minimize the sum of squared errors
$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right)^2$
• Same as choosing α of SES to minimize MSE
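For reference, in the simple (one-regressor) case this minimization has a closed-form solution. The derivation below is the standard textbook result rather than something stated on the slide; it writes SSE for the sum of squared errors and sets its partial derivatives with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$ to zero.

```latex
\frac{\partial SSE}{\partial \hat{\beta}_0}
  = -2 \sum_{i=1}^{n} \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0,
\qquad
\frac{\partial SSE}{\partial \hat{\beta}_1}
  = -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i \right) = 0

\Longrightarrow\quad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}
```

The second result shows that the fitted line always passes through the point of sample means $(\bar{X}, \bar{Y})$.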
Simple Linear Regression Model
1. Regression Model (unknown parameters): $Y = \beta_0 + \beta_1 X + \varepsilon$
2. Sample Data
3. Estimated Regression (sample statistics): $Y = \hat{\beta}_0 + \hat{\beta}_1 X + e$
OLS Assumptions
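I. The population (true) relation between Y and X is linear in parameters.
o It can be $Y = \beta_0 + \beta_1 X^2 + \varepsilon$ or $\log Y = \beta_0 + \beta_1 \log X + \varepsilon$
o It cannot be $Y = \beta_0 + \beta_1^2 X + \varepsilon$, or any other form in which a parameter enters nonlinearly
II. The sample of $(X_i, Y_i)$, i = 1,…,n, is an independent (random) draw from the population.
III. $E(\varepsilon) = E(\varepsilon \mid X) = 0$
Knowing X does not help to predict ε
IV. $\mathrm{Var}(\varepsilon) = \mathrm{Var}(\varepsilon \mid X) = \sigma^2$: same error variance regardless of X (homoscedasticity)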
Homoscedasticity (figure not reproduced)
Properties of OLS Estimators
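• Under Assumptions I to III, $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased.
o $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$: the distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ are centred around the true population parameters $\beta_0$ and $\beta_1$, respectively.
• Under Assumptions I to IV, the variance of $\hat{\beta}_1$ based on a sample of $X_i$, i = 1,…,n, is:
$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$
o A large $\sigma^2$ results in a less accurate estimate $\hat{\beta}_1$.
o A large $s_X^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ results in a more accurate estimate $\hat{\beta}_1$.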
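Both properties can be checked informally by simulation. The sketch below is a minimal Monte Carlo experiment using only the standard library: the "true" parameters, the X values, the normal error distribution and the number of replications are all assumptions chosen for illustration, not values from the seminar.

```python
import random

def ols_slope(x, y):
    """OLS slope estimate for a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return s_xy / s_xx

random.seed(0)

# Assumed population: Y = 1 + 2 X + eps, eps ~ N(0, 1), homoscedastic
beta0, beta1, sigma = 1.0, 2.0, 1.0
x = [float(i) for i in range(20)]          # X held fixed across samples
mean_x = sum(x) / len(x)
s_xx = sum((a - mean_x) ** 2 for a in x)

# Draw many samples from the population and re-estimate the slope each time
slopes = []
for _ in range(5000):
    y = [beta0 + beta1 * a + random.gauss(0.0, sigma) for a in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / len(slopes)
empirical_var = sum((b - mean_slope) ** 2 for b in slopes) / (len(slopes) - 1)
theoretical_var = sigma ** 2 / s_xx        # sigma^2 / sum (X_i - Xbar)^2

print(f"average slope estimate: {mean_slope:.4f} (true slope {beta1})")
print(f"empirical variance:     {empirical_var:.6f}")
print(f"theoretical variance:   {theoretical_var:.6f}")
```

The average of the estimates should sit close to the true slope (unbiasedness), and their spread should be close to the variance given by the formula.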
Example 1
• The number of shareholders is an important determinant of a stock’s liquidity and valuation.
• An analyst wants to build a model to quantify the relationship between the sales/revenue of a listed firm and its share ownership.
• A random sample of 20 stocks was selected from the NYSE, with the data presented in the following table:
Stock Sales ($m) Shareholders
1 1,001 9,170
2 926 11,050
3 506 8,840
4 741 10,021
5 789 9,420
6 889 11,280
7 874 12,450
8 510 6,640
9 529 7,240
10 364 6,120
11 679 7,630
12 771 8,040
13 924 9,120
14 607 6,640
15 452 6,920
16 794 9,330
17 665 8,950
18 844 10,230
19 1,010 11,770
20 567 5,170
Example 1 – Scatter Plot
• A scatter plot is a good first approach to see if there is a relationship between X and Y.
[Figure: scatter plot of Shareholders (vertical axis) against Sales ($m) (horizontal axis)]
Example 1 – Regression Output
• The equation for the ‘best’ straight line is:
$\hat{Y} = 2{,}804.83 + 8.30\,X$, where Y is the number of shareholders and X is sales in $m
Regression Statistics
  Multiple R            0.80
  R Square              0.63
  Adjusted R Square     0.61
  Standard Error        1,237.80
  Observations          20
ANOVA
               df   SS              MS              F       Significance F
  Regression    1   47,846,825.69   47,846,825.69   31.23   0.00
  Residual     18   27,578,867.26    1,532,159.29
  Total        19   75,425,692.95
               Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 99.0%   Upper 99.0%
  Intercept        2,804.83         1,108.22    2.531     0.021      476.55    5,133.11       -385.11      5,994.77
  Sales ($m)           8.30             1.49    5.588     0.000        5.18       11.43          4.03         12.58
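The Excel output for Example 1 can be reproduced from the 20 data points with a few lines of plain Python. This is a sketch using the textbook closed-form OLS formulas, not the procedure Excel runs internally; the variable names are illustrative.

```python
# Example 1 data: sales ($m) and number of shareholders for the 20 stocks
sales = [1001, 926, 506, 741, 789, 889, 874, 510, 529, 364,
         679, 771, 924, 607, 452, 794, 665, 844, 1010, 567]
shareholders = [9170, 11050, 8840, 10021, 9420, 11280, 12450, 6640, 7240, 6120,
                7630, 8040, 9120, 6640, 6920, 9330, 8950, 10230, 11770, 5170]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(shareholders) / n

# Sums of squares and cross-products around the sample means
s_xx = sum((x - mean_x) ** 2 for x in sales)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, shareholders))
sst = sum((y - mean_y) ** 2 for y in shareholders)   # total sum of squares

# OLS estimates
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residual sum of squares and the summary statistics Excel reports
residuals = [y - (intercept + slope * x) for x, y in zip(sales, shareholders)]
sse = sum(e ** 2 for e in residuals)
r_squared = 1 - sse / sst
std_error = (sse / (n - 2)) ** 0.5        # "Standard Error" of the regression
se_slope = std_error / s_xx ** 0.5        # standard error of the slope
t_stat = slope / se_slope                 # t statistic for H0: slope = 0

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"R Square = {r_squared:.2f}, Standard Error = {std_error:.2f}")
print(f"SE(slope) = {se_slope:.2f}, t Stat = {t_stat:.3f}")
```

Up to rounding, the printed values should match the Intercept, Sales ($m), R Square, Standard Error and t Stat entries in the output above.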
Example 1 – Slope
• The general interpretation of the slope is that for each one-unit increase in X, Y is estimated to increase by the coefficient of X.
• A $1 million increase in sales is associated with an expected increase of about 8.3 in the number of shareholders.
Example 1 – Intercept
• The intercept represents the average value of Y when X equals zero.
o This is a mechanical interpretation and often not meaningful
• Since sales is never zero, the intercept can be interpreted as the average number of shareholders associated with factors other than sales.
Variable Significance
• What does the sample estimate tell us about the population?
o Does X help explain Y in the population?
o The variable X is significant if the population slope β1 ≠ 0
• To determine if the variable X is significant, we conduct a hypothesis test:
o H0: β1 = 0; X does not help to explain Y
o H1: β1 ≠ 0; X does help to explain Y
• The test statistic is
$t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)}$, where $\beta_1$ is the value hypothesized under H0 and $SE(\hat{\beta}_1)$ is the standard error of $\hat{\beta}_1$
Testing Variable Significance
• We can use three approaches:
o Test statistic: if |test statistic| > critical value(df, α/2) → Reject H0: β1 = 0
o p-value: if p-value < α → Reject H0: β1 = 0
o Confidence interval: if β1 = 0 is not contained in the CI → Reject H0: β1 = 0
• The last two are provided in Excel outputs
Testing Variable Significance | p-value
• Assuming that the null hypothesis is true, the p-value is the probability of obtaining a test statistic at least as extreme as the observed value.
o If the p-value is less than α, H0 is rejected.
o If it is greater than α, H0 is not rejected.
• The p-value indicates how likely it is to observe a relationship at least this strong by chance alone, given there is no relationship in the population.
Example 1 – Testing Variable Significance | p-value
• Perform a hypothesis test that sales is significant. Use α = 1%.
$t = \frac{\hat{\beta}_1 - \beta_1}{SE(\hat{\beta}_1)} = \frac{8.30 - 0}{1.49} \approx 5.588$ (the t Stat reported by Excel, which uses unrounded estimates; the rounded inputs give about 5.57)
• Since the p-value (0.000 to three decimal places) is less than 1%, we reject H0: β1 = 0 in favour of H1: β1 ≠ 0; therefore, sales does help to explain the number of shareholders.
Testing Variable Significance | Confidence Intervals
• A (1 − α) confidence interval for β1 is:
$\hat{\beta}_1 \pm t_{df,\,\alpha/2}\; SE(\hat{\beta}_1)$
• If zero is not contained within the interval then we would reject H0 at the α significance level
Example 1 – Testing Variable Significance | Confidence Intervals
• Perform a hypothesis test that sales is significant. Use α = 1%.
• The 99% CI for the slope in the Excel output is (4.03, 12.58). Since 0 is not contained in the 99% CI, we reject H0: β1 = 0 in favour of H1: β1 ≠ 0; therefore, sales does help to explain the number of shareholders.
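The test-statistic and confidence-interval approaches can be sketched with the rounded figures reported for Example 1. The critical values below are copied from a standard t-table for 18 degrees of freedom (an assumption of this sketch; they are not computed here), and the function name `slope_significance` is illustrative.

```python
# Rounded values reported in the Example 1 regression output
slope = 8.30        # estimated coefficient on Sales ($m)
se_slope = 1.49     # standard error of that coefficient
n_obs, n_regressors = 20, 1
df = n_obs - n_regressors - 1   # residual degrees of freedom

# Two-sided critical values t(df=18, alpha/2), taken from a standard t-table
T_CRIT_DF18 = {0.05: 2.101, 0.01: 2.878}

def slope_significance(alpha, hypothesized=0.0):
    """Return (t_stat, confidence_interval, reject_by_t, reject_by_ci)."""
    t_stat = (slope - hypothesized) / se_slope
    t_crit = T_CRIT_DF18[alpha]
    ci = (slope - t_crit * se_slope, slope + t_crit * se_slope)
    reject_by_t = abs(t_stat) > t_crit
    reject_by_ci = not (ci[0] <= hypothesized <= ci[1])
    return t_stat, ci, reject_by_t, reject_by_ci

t_stat, ci_99, reject_t, reject_ci = slope_significance(alpha=0.01)
print(f"t = {t_stat:.2f}, 99% CI = ({ci_99[0]:.2f}, {ci_99[1]:.2f})")
print(f"reject H0 by t test: {reject_t}, by CI: {reject_ci}")
```

Because rounded inputs are used, the interval differs from Excel's (4.03, 12.58) in the second decimal. The two decision rules always agree, since both compare the same distance between the estimate and the hypothesized value against the same multiple of the standard error.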
Example 1 – Forecast
• The estimated relationship can now be used to forecast share ownership.
• For example, the predicted number of shareholders for sales of $600m is given by:
$\hat{Y} = 2{,}804.83 + 8.30 \times 600 \approx 7{,}787.56$ (computed with the unrounded slope, about 8.3046; the rounded 8.30 gives 7,784.83)
• When forecasting, we should avoid selecting a value of X that is well outside its observed range.
• Interpolation is acceptable but extrapolation can be dangerous: the estimated relationship may not hold outside the observed range of X.
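One way to make the interpolation/extrapolation distinction concrete is to have the forecasting helper flag inputs outside the sample range. The sketch below uses the rounded coefficients and the smallest and largest sales values in the Example 1 table (364 and 1,010); the function name `forecast_shareholders` is hypothetical.

```python
INTERCEPT = 2804.83          # rounded estimates from the Example 1 output
SLOPE = 8.30
X_MIN, X_MAX = 364, 1010     # observed range of Sales ($m) in the sample

def forecast_shareholders(sales_m):
    """Return (predicted shareholders, True if sales_m is outside the sample range)."""
    prediction = INTERCEPT + SLOPE * sales_m
    is_extrapolation = not (X_MIN <= sales_m <= X_MAX)
    return prediction, is_extrapolation

pred_600, extrap_600 = forecast_shareholders(600)      # inside the observed range
pred_5000, extrap_5000 = forecast_shareholders(5000)   # far outside it

print(f"sales $600m   -> {pred_600:,.2f} (extrapolation: {extrap_600})")
print(f"sales $5,000m -> {pred_5000:,.2f} (extrapolation: {extrap_5000})")
```

The second forecast is still a number, but nothing in the sample supports the assumption that the same straight line holds at that level of sales.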
Example 2 – Logarithmic Transformation
• The new model introduces a logarithmic transformation on both Sales and Number of Shareholders.
• The transformation has an impact on the interpretation of the coefficients.
Stock Sales ($m) Shareholders ln(Sales) ln(Shareholders)
1 1,001 9,170 6.91 9.12
2 926 11,050 6.83 9.31
3 506 8,840 6.23 9.09
4 741 10,021 6.61 9.21
5 789 9,420 6.67 9.15
6 889 11,280 6.79 9.33
7 874 12,450 6.77 9.43
8 510 6,640 6.23 8.80
9 529 7,240 6.27 8.89
10 364 6,120 5.90 8.72
11 679 7,630 6.52 8.94
12 771 8,040 6.65 8.99
13 924 9,120 6.83 9.12
14 607 6,640 6.41 8.80
15 452 6,920 6.11 8.84
16 794 9,330 6.68 9.14
17 665 8,950 6.50 9.10
18 844 10,230 6.74 9.23
19 1,010 11,770 6.92 9.37
20 567 5,170 6.34 8.55
Interpretation of the Coefficients
Model                                                       Model type    Change in X   Y is estimated to change by
$Y = \hat{\beta}_0 + \hat{\beta}_1 X + e$                   Level-Level   1 unit        $\hat{\beta}_1$ units
$Y = \hat{\beta}_0 + \hat{\beta}_1 \log X + e$              Level-Log     1%            $\hat{\beta}_1 / 100$ units
$\log Y = \hat{\beta}_0 + \hat{\beta}_1 \log X + e$         Log-Log       1%            $\hat{\beta}_1$ %
$\log Y = \hat{\beta}_0 + \hat{\beta}_1 X + e$              Log-Level     1 unit        $100 \times \hat{\beta}_1$ %
Example 2 – Slope
• In the log-log model, the general interpretation of the slope is that for each 1% increase in X, Y is estimated to increase by $\hat{\beta}_1$%, where $\hat{\beta}_1$ is the coefficient of log X.
• A 1% increase in sales is associated with an expected increase of 0.64% in the number of shareholders.
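The log-log slope can be re-estimated from the Example 1 data by taking natural logs of both variables and applying the same closed-form OLS formula. This is a sketch in plain Python; it uses unrounded logs, so the result may differ from a fit on the two-decimal values in the table in the third decimal.

```python
import math

# Example 1 / Example 2 data: sales ($m) and number of shareholders
sales = [1001, 926, 506, 741, 789, 889, 874, 510, 529, 364,
         679, 771, 924, 607, 452, 794, 665, 844, 1010, 567]
shareholders = [9170, 11050, 8840, 10021, 9420, 11280, 12450, 6640, 7240, 6120,
                7630, 8040, 9120, 6640, 6920, 9330, 8950, 10230, 11770, 5170]

# Log-log model: ln(Shareholders) = b0 + b1 * ln(Sales) + e
log_x = [math.log(v) for v in sales]
log_y = [math.log(v) for v in shareholders]

n = len(log_x)
mean_lx = sum(log_x) / n
mean_ly = sum(log_y) / n
s_xx = sum((a - mean_lx) ** 2 for a in log_x)
s_xy = sum((a - mean_lx) * (b - mean_ly) for a, b in zip(log_x, log_y))

elasticity = s_xy / s_xx                      # slope of the log-log regression
log_intercept = mean_ly - elasticity * mean_lx

print(f"ln(Shareholders) = {log_intercept:.2f} + {elasticity:.2f} ln(Sales)")
print(f"a 1% increase in sales ~ {elasticity:.2f}% increase in shareholders")
```

In a log-log model the slope is an elasticity: it relates percentage changes in X to percentage changes in Y, which is why no units appear in the interpretation.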
Example 3
• A mutual fund manager wants to develop a regression model to estimate the relationship between a fund’s value of assets under management (AUM, $b) and its annual gross return (R, %).
• He has data from the following 10 funds:
Fund AUM ($b) Annual Return (%)
1 10 -7
2 6 -11
3 5 -8
4 12 8
5 10 11
6 15 2
7 5 -9
8 12 -4
9 17 11
10 20 14
Example 3 – Slope
• The estimated model is $\widehat{AUM} = 10.93 + 0.43\,R$
• A 1 percentage point increase in annual return is associated with an expected increase in AUM of $0.43b.
Multiple Linear Regression
• In addition to annual return, are there other variables that can help explain AUM?
o Management fee? Advertising expense?
o Currently the effects of other variables are absorbed in the constant (10.93) or the residual $e_i$
• Multiple Linear Regression Models have more than one explanatory variable.
Multiple Linear Regression
• The population relationship with k explanatory variables is:
$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$
• $\beta_j$: average change in Y for a 1-unit change in $X_j$, holding the other explanatory variables constant
• i = 1,2,…,n observations; j = 1,2,…,k independent variables
• The coefficients ($\beta_j$) are estimated from the sample data:
$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki} + e_i$
• Here $\hat{\beta}_j$ is an estimate of $\beta_j$ and $e_i$ is an estimate of $\varepsilon_i$
Example 4
• A mutual fund manager wants to develop a regression model to estimate the relationship between a fund’s value of assets under management (AUM, $b), its annual gross return (R, %), and the management fee (F, %) it charges.
• He has data from the following 10 funds:
Fund AUM ($b) Annual Return (%) Fee (%)
1 10 -7 1.3
2 6 -11 2.0
3 5 -8 1.7
4 12 8 1.5
5 10 11 1.6
6 15 2 1.2
7 5 -9 1.6
8 12 -4 1.4
9 17 11 1.0
10 20 14 1.1
Example 4 – Coefficients
• The estimated model is $\widehat{AUM} = 25.44 + 0.23\,R - 9.99\,F$
• A 1 percentage point increase in annual return is associated with an expected increase in AUM of $0.23b, holding the management fee constant.
• A 1 percentage point increase in the management fee is associated with an expected decrease in AUM of $9.99b, holding the annual return constant.
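With two explanatory variables, the OLS coefficients can still be computed by hand by solving the two normal equations written in deviations from the means. The sketch below does this in plain Python for the Example 4 table; it is one textbook route, not necessarily how the slide's estimates were produced, and the results may differ slightly from the reported coefficients in the later decimals (for instance if the slide was fitted on unrounded data).

```python
# Example 4 data for the 10 funds
aum = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]                  # AUM ($b)
ret = [-7, -11, -8, 8, 11, 2, -9, -4, 11, 14]                # annual return (%)
fee = [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1]     # management fee (%)

def mean(values):
    return sum(values) / len(values)

def cross_dev(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mean_u, mean_v = mean(u), mean(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

s_rr = cross_dev(ret, ret)
s_ff = cross_dev(fee, fee)
s_rf = cross_dev(ret, fee)
s_ry = cross_dev(ret, aum)
s_fy = cross_dev(fee, aum)

# Normal equations in deviation form:
#   s_rr * b_return + s_rf * b_fee = s_ry
#   s_rf * b_return + s_ff * b_fee = s_fy
# solved with Cramer's rule
det = s_rr * s_ff - s_rf ** 2
b_return = (s_ry * s_ff - s_rf * s_fy) / det
b_fee = (s_rr * s_fy - s_rf * s_ry) / det
b_const = mean(aum) - b_return * mean(ret) - b_fee * mean(fee)

print(f"AUM_hat = {b_const:.2f} + {b_return:.2f} R + ({b_fee:.2f}) F")
```

Note that the return coefficient is smaller than the 0.43 from the simple regression in Example 3: once the fee is included, part of the variation previously attributed to return is attributed to the fee instead.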
Goodness of Fit
• How well does a model fit the data?
• How strong is the relation between Y and a set of explanatory variables?
• Recall that correlation measures the strength of the linear relationship between two variables.
• The coefficient of determination measures the strength of the linear relationship between Y and a set of k explanatory variables (k ≥ 1) and is denoted as $R^2$.
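The Calculation of $R^2$
• The coefficient of determination ($R^2$) is defined as the proportion of Var(Y) explained by the X’s.
• Let $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \dots + \hat{\beta}_k X_k$. It is the value of Y explained by the X’s
$\mathrm{Var}(Y) = \mathrm{Var}(\hat{Y} + e) = \mathrm{Var}(\hat{Y}) + \mathrm{Var}(e)$
• Therefore,
$R^2 = \frac{\mathrm{Var}(\hat{Y})}{\mathrm{Var}(Y)} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = 1 - \frac{\mathrm{Var}(e)}{\mathrm{Var}(Y)}$
$0 \le R^2 \le 1$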
Interpreting $R^2$
• Example 3
$\widehat{AUM} = 10.93 + 0.43\,R$
$R^2 = 0.65$
65% of the variation in AUM is explained by the fund’s return
• Example 4
$\widehat{AUM} = 25.44 + 0.23\,R - 9.99\,F$
$R^2 = 0.86$
86% of the variation in AUM is explained by the fund’s return and fees
• $R^2$ never decreases when an explanatory variable is added, so it (weakly) increases with k, the number of explanatory variables.
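The two $R^2$ values can be recomputed from the fund data, along with the adjusted $R^2$ that Excel reports next to it. The sketch below uses the standard adjustment $1 - (1 - R^2)(n - 1)/(n - k - 1)$, which is a textbook formula and not one given on these slides; helper names are illustrative.

```python
# Fund data from Examples 3 and 4
aum = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]                  # AUM ($b)
ret = [-7, -11, -8, 8, 11, 2, -9, -4, 11, 14]                # annual return (%)
fee = [1.3, 2.0, 1.7, 1.5, 1.6, 1.2, 1.6, 1.4, 1.0, 1.1]     # management fee (%)

def cross_dev(u, v):
    """Sum of cross-products of deviations from the means of u and v."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))

def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = len(aum)
s_yy = cross_dev(aum, aum)        # total variation in AUM
s_rr, s_ff, s_rf = cross_dev(ret, ret), cross_dev(fee, fee), cross_dev(ret, fee)
s_ry, s_fy = cross_dev(ret, aum), cross_dev(fee, aum)

# Example 3: AUM on return only (explained variation = slope * s_ry)
r2_simple = (s_ry / s_rr) * s_ry / s_yy

# Example 4: AUM on return and fee (slopes from the 2x2 normal equations)
det = s_rr * s_ff - s_rf ** 2
b_return = (s_ry * s_ff - s_rf * s_fy) / det
b_fee = (s_rr * s_fy - s_rf * s_ry) / det
r2_multi = (b_return * s_ry + b_fee * s_fy) / s_yy

adj_simple = adjusted_r2(r2_simple, n, k=1)
adj_multi = adjusted_r2(r2_multi, n, k=2)

print(f"Example 3: R2 = {r2_simple:.2f}, adjusted R2 = {adj_simple:.2f}")
print(f"Example 4: R2 = {r2_multi:.2f}, adjusted R2 = {adj_multi:.2f}")
```

Because $R^2$ cannot fall when a regressor is added, the adjusted version, which penalizes extra variables through the degrees of freedom, is the more informative figure when comparing models with different k.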