Date: 2022-12-12
13. The Multiple Linear Regression Model
Monash College Topic 13: Multiple Linear Regression MCD2080 168 / 250
Key Objectives for Topic 13
• Use Excel to estimate a multiple linear regression model.
• Interpret estimated coefficients in a multiple linear regression model.
• Use the multiple linear regression model for prediction.
• Perform hypothesis tests to check for a significant relationship
between an independent variable and the dependent variable.
• Perform a 1-sided hypothesis test to check for a significant positive or
negative relationship between an independent variable and the
dependent variable.
Up until now we have looked at regression models of the ‘simple’ variety.
They took the form:
$$Y_i = \beta_0 + \beta_1 X_i + e_i$$
This is a simple linear regression model in the sense that it includes just
one X variable.
But this can be problematic when a number of factors influence Y .
Let’s consider our US income data where Y is income and we have
potential factors which influence income like Age and Education.
In such situations we have to be careful in our use of simple linear
regression. Consider the model:
$$\text{Income}_i = \beta_0 + \beta_1 \text{Education}_i + e_i$$
Estimates are shown below.
Figure: Regression Results—Income on Education
Both Education and Age are likely to positively affect Income. Moreover,
they are likely to be positively correlated with each other.
This means the coefficient on Education in the simple linear regression
will include part of the effect of Age.
The coefficient will be too big.
Figure: An Illustration of the Correlation-Between-the-X's Problem. Diagram labels: Y (Income), X1 (Education), X2 (Age), β1, β2, Corr(X1, X2) ≠ 0.
Figure: An Illustration of Omitted Variable Bias. Diagram labels: Y (Income), X1 (Education), X2 (Age), β1*, Corr(X1, X2) > 0.
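The effect described above can be reproduced with simulated data. The following is a minimal sketch, not the course's US income data: the sample, the "true" coefficients (4000 and 3000), and the helper name `fit_ols` are all assumptions made up for the illustration.

```python
import numpy as np

def fit_ols(X, y):
    """Least squares coefficients (intercept first) for y on the columns of X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: Education is built to be positively correlated with Age.
age = rng.uniform(20, 60, size=n)
education = 8 + 0.1 * age + rng.normal(0, 2, size=n)

# Hypothetical "true" effects, chosen only for this illustration.
true_b_edu, true_b_age = 4000.0, 3000.0
noise = rng.normal(0, 10000, size=n)
income = -100000 + true_b_edu * education + true_b_age * age + noise

# Simple regression (Education only) versus multiple regression (Education and Age).
simple = fit_ols(education, income)
multiple = fit_ols(np.column_stack([education, age]), income)

print("Education slope, simple regression:  ", round(simple[1]))
print("Education slope, multiple regression:", round(multiple[1]))
```

In this simulated setting the simple regression slope on Education comes out well above 4000 because it absorbs part of the Age effect, while the multiple regression slope stays close to 4000.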
13.1. The Multiple Linear Regression Model
The solution to this problem is to estimate a multiple linear regression
model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + e_i$$
This is estimated the same way in Excel as a simple linear regression
model.
• Instead of selecting a single column in the ‘Input X Range’ we just
select a number of columns.
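For readers working outside Excel, the same idea (several X columns side by side) can be sketched in Python with NumPy. This is an illustration under assumptions, not part of the course workflow: the toy data below are hypothetical and constructed so that y = 1 + 2·x1 + 3·x2 holds exactly.

```python
import numpy as np

# Hypothetical toy data built to satisfy y = 1 + 2*x1 + 3*x2 exactly.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1.0 + 2.0 * x1 + 3.0 * x2

# Place the X variables side by side (like adjacent columns in a spreadsheet)
# and prepend a column of ones so the model has an intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares estimates of the intercept and the two slopes.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)  # approximately 1, 2, 3
```

Adding another X variable only means adding another column to `X`; the estimation call does not change.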
Let us estimate a model with both Education and Age included.
Figure: Regression Results—Income on Education and Age
The coefficients on both variables are positive as we expected.
The coefficient on Education is smaller than in the simple linear regression
(with just Education).
This reflects the omitted variable bias problem we discussed.
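The size of the gap between the two Education coefficients can be written down explicitly. As a sketch, let $\hat{\beta}_1^{*}$ denote the slope from the simple regression of Income on Education alone, and let $\widehat{\text{Cov}}$ and $\widehat{\text{Var}}$ denote the sample covariance and variance (notation introduced here, not taken from the slides). For least squares estimates from the same data:

$$\hat{\beta}_1^{*} = \hat{\beta}_1 + \hat{\beta}_2 \, \frac{\widehat{\text{Cov}}(X_1, X_2)}{\widehat{\text{Var}}(X_1)}$$

With $\hat{\beta}_2 > 0$ and a positive sample covariance between Education and Age, the second term is positive, so the simple regression coefficient is larger than the multiple regression coefficient.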
13.2. Interpreting the Model
What do the β̂0, β̂1 and β̂2 actually tell us?
• β̂0 is the intercept: the estimated value of Y when X1 = 0 and
X2 = 0.
• β̂1 and β̂2 are the slopes of Y with respect to X1 and X2: they
estimate the change in Y for a 1 unit change in the respective
variable, holding the other variable fixed.
Let us explain by looking at the model of Income on Education and Age.
• β̂0 = −117120: the model says that if X1 = 0 and
X2 = 0 (i.e. Education = 0 and Age = 0) then this
person would earn −$117,120 per year.
This is not particularly meaningful. There is no one in our data who
is zero years old, and there is never likely to be! There is also no one
with zero years of education.
In the case of this particular regression, the intercept is not
informative. But in many cases, as we will see below, the intercept is
quite useful.
• β̂1 = 4541 is an estimate of the effect of Education on Income. It
tells us how much Income would change if Education were one year
higher.
In particular, we can say “take two people of the same age, one of
whom has one more year of education than the other. The person
with one more year of education can expect to earn, on average, $4,541
more per year than the person of the same age with less education.”
• β̂2 = 3369 is an estimate of the effect of Age on Income. It tells us
how much Income would change if Age were 1 year higher.
In particular, we can say “take 2 people with the same education, one
of whom is 1 year older than the other. The person who is 1 year
older can expect to earn, on average, $3,369 more per year than the
younger person with the same education.”
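The "same age" comparison can be checked directly from the fitted equation. As a worked sketch, write E for years of education and A for age (shorthand introduced here) and compare two people with the same A whose education differs by one year:

$$\widehat{\text{Income}}(E+1, A) - \widehat{\text{Income}}(E, A) = \left[\hat{\beta}_0 + \hat{\beta}_1 (E+1) + \hat{\beta}_2 A\right] - \left[\hat{\beta}_0 + \hat{\beta}_1 E + \hat{\beta}_2 A\right] = \hat{\beta}_1 = 4541$$

The intercept and the Age term cancel, which is why β̂1 is read as the effect of Education with Age held fixed. The same argument with the roles swapped gives β̂2 = 3369 for Age.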
13.3. Obtaining Predictions From the Model
One of the reasons why you might want to construct a regression model in
the first place is for prediction.
We might want to know what the average income is for a person with
certain characteristics (e.g. a 30 year old with 12 years of education).
Consider the previous regression model:
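$$\widehat{\text{Income}} = \hat{\beta}_0 + \hat{\beta}_1 \text{Education} + \hat{\beta}_2 \text{Age} = -117120 + 4541 \times \text{Education} + 3369 \times \text{Age}$$
To predict we simply plug in the X values:
$$\widehat{\text{Income}} = -117120 + 4541 \times 12 + 3369 \times 30 = 38442$$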
A person who has 12 years of education and is 30 years old is expected to
earn $38,442 per year.
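The plug-in calculation can also be sketched as a small Python function. The coefficients are the estimates quoted in this section; the function and constant names are hypothetical, chosen for the illustration.

```python
# Estimated coefficients as quoted in this section.
B0 = -117120        # intercept
B_EDUCATION = 4541  # slope on Education (years)
B_AGE = 3369        # slope on Age (years)

def predict_income(education_years, age_years):
    """Predicted annual income from the fitted model (hypothetical helper)."""
    return B0 + B_EDUCATION * education_years + B_AGE * age_years

# A 30 year old with 12 years of education.
print(predict_income(12, 30))  # 38442
```

Predictions are most trustworthy for X values inside the range of the data; as the intercept discussion shows, plugging in Age = 0 and Education = 0 gives a figure with no practical meaning.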
13.4. Testing Hypotheses About the Slope
Testing hypotheses about coefficients in the multiple linear regression
model follows the same process as in the simple linear regression model.
Let’s test the hypothesis that education has no effect on income against
the alternative that it has a positive effect using the p-value approach.
1: Formulate the null and alternative hypotheses
H0 : β1 = 0 (the null is that there is no relationship between X1 and
Y where X1 is education)
H1 : β1 > 0 (the alternative is that there is a positive relationship
between X1 and Y )
2: Decide a significance level
Let us use α = 0.05 or 5%.
3: Calculate the test statistic and critical value(s)
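The p-value reported for the Education coefficient is 1.695E−84. This
reported value is for a two-sided test. We are undertaking a one-sided
test and the estimated coefficient is positive, in the direction stated by
the alternative hypothesis, so the relevant number is 1.695E−84/2.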
4: Make a decision and draw a conclusion
As 1.695E−84/2 is less than 0.05, we reject the null hypothesis that
Education does not influence Income. There is sufficient evidence to
conclude that Education has a positive influence on Income.
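The halving rule used in step 3 can be sketched in Python. This is a minimal illustration that assumes the reported p-value is two-sided and that the test statistic has a symmetric distribution (as the t statistic does); the function name and its arguments are hypothetical.

```python
def one_sided_p_value(two_sided_p, estimate, alternative="greater"):
    """Convert a two-sided p-value for H0: beta = 0 into a one-sided p-value.

    Assumes a symmetric test statistic, as with the t-test for a slope.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    # Halve only when the estimate points in the direction stated by H1.
    if alternative == "greater":
        sign_matches = estimate > 0
    else:
        sign_matches = estimate < 0
    return two_sided_p / 2 if sign_matches else 1 - two_sided_p / 2

alpha = 0.05
p = one_sided_p_value(1.695e-84, estimate=4541, alternative="greater")
reject_null = p < alpha
print(p, reject_null)
```

If the estimated coefficient had the opposite sign to the one stated in the alternative hypothesis, the one-sided p-value would be above 0.5 and the null would not be rejected.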
13.5. Advanced: Multiple Linear Regression and Least Squares
We saw previously that the coefficients in the simple linear regression case
were derived as the solution to a least squares problem.
The coefficients in a multiple linear regression are derived in very much the
same way.
• Though a great deal more algebra is required because of the extra
variables.
The error in the multiple linear regression is:
$$\hat{e}_i = Y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \hat{\beta}_2 X_{i2} + \cdots + \hat{\beta}_k X_{ik}\right)$$
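We find the best coefficients by minimizing:
$$SSE(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k) = \sum_{i=1}^{n} \hat{e}_i^2 = \sum_{i=1}^{n} \left(Y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \hat{\beta}_2 X_{i2} + \cdots + \hat{\beta}_k X_{ik}\right)\right)^2$$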
This is a quadratic function of the coefficients. It has a unique minimum
provided no X variable is an exact linear combination of the others.
We proceed to find the minimum by differentiating SSE with respect to
each of the coefficients and setting it equal to zero. Then we solve for the
coefficients.
• This is a laborious task by hand—computers can do it much faster!
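The differentiate-and-solve procedure can be sketched numerically. Setting every partial derivative of SSE to zero gives a system of linear equations which, in matrix notation (not used in these slides), reads (X'X)β̂ = X'Y. The example below is a minimal sketch on hypothetical simulated data, with made-up coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data with two X variables and made-up coefficients.
X_vars = rng.normal(size=(n, 2))
y = 5.0 + 1.5 * X_vars[:, 0] - 2.0 * X_vars[:, 1] + rng.normal(size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), X_vars])

def sse(beta):
    """Sum of squared errors for a candidate coefficient vector."""
    residuals = y - X @ beta
    return float(residuals @ residuals)

# Solve the first-order conditions (X'X) beta = X'y for the coefficients.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution the gradient of SSE, -2 X'(y - X beta), is zero up to rounding.
gradient = -2 * X.T @ (y - X @ beta_hat)

print(beta_hat)
print(np.allclose(gradient, 0, atol=1e-8))
print(sse(beta_hat) < sse(beta_hat + 0.01))
```

Moving the coefficients away from `beta_hat` in any direction increases SSE, which is what it means for the solution to be the minimum of the quadratic function.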