Prediction & Residual Analysis
ECON 452: Lecture 13
Jung Hwan Koh
Department of Economics
University of Michigan
March 9, 2022
1 / 24
Logistics
Homework 4 + 5
Assigned today (3/9)
Due 3/16
Research Paper
Instructions will be posted today (3/9)
2 / 24
Prediction and Residual Analysis
3 / 24
Confidence Interval for OLS Estimators
Example: CEO Salary and Sales

$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$

Estimates of the Regression Model

(Intercept)    lsales
  4.9610774 0.2242794

Confidence Interval

confint(ols0, "lsales", level=0.95)
           2.5 %    97.5 %
lsales 0.1707372 0.2778217

Confidence interval for the OLS estimator:

$\hat{\beta}_1 \pm t_{\alpha/2} \cdot se(\hat{\beta}_1)$

4 / 24
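The interval printed by `confint()` is just $\hat{\beta}_1 \pm t_{\alpha/2} \cdot se(\hat{\beta}_1)$. As a rough cross-check, a minimal Python sketch of that arithmetic; the standard error (≈ 0.0271) is an assumption backed out from the reported interval, since the slide does not print it:

```python
# Reconstructing the 95% CI for the slope of log(salary) on log(sales).
# beta1_hat comes from the slide's R output; se_beta1 is an ASSUMED value
# backed out of the reported interval (the slide never prints it).
beta1_hat = 0.2242794
se_beta1 = 0.02713    # assumption, not shown on the slide
t_crit = 1.9736       # t critical value, alpha/2 = 0.025, df = 175

lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
print(round(lower, 4), round(upper, 4))  # close to the confint() bounds above
```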
Confidence Intervals for Predictions

Suppose we have estimated the equation:

$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_k x_k$

Given values of $x$, we can obtain a prediction $\hat{y}$ (an estimate of the expected value of $y$ given the particular values of the explanatory variables).

Let $c_1, c_2, \ldots, c_k$ denote values of each of the independent variables (they may or may not correspond to an actual data point in the sample).

We would like to estimate the parameter $\theta_0$ given $c_1, \ldots, c_k$ for $x_1, \ldots, x_k$:

$\theta_0 = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \cdots + \beta_k c_k = E(y \mid x_1 = c_1, \ldots, x_k = c_k)$

5 / 24
Confidence Intervals for Predictions

Confidence Interval for $\theta_0$:

$\hat{\theta}_0 \pm t_{\alpha/2} \cdot se(\hat{\theta}_0)$

1. Point Estimate: $\hat{\theta}_0 = \hat{\beta}_0 + \hat{\beta}_1 c_1 + \hat{\beta}_2 c_2 + \cdots + \hat{\beta}_k c_k$
2. Standard Error: $se(\hat{\theta}_0)$ → How to obtain this?

6 / 24
Confidence Intervals for Predictions

Confidence Interval for $\theta_0$: $\hat{\theta}_0 \pm t_{\alpha/2} \cdot se(\hat{\theta}_0)$

How to obtain the standard error of $\hat{\theta}_0$? Write $\beta_0$ in terms of $\theta_0$:

$\theta_0 = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + \cdots + \beta_k c_k$

$\beta_0 = \theta_0 - \beta_1 c_1 - \beta_2 c_2 - \cdots - \beta_k c_k$

7 / 24
Confidence Intervals for Predictions

How to obtain the standard error of $\hat{\theta}_0$?

Plug $\beta_0 = \theta_0 - \beta_1 c_1 - \cdots - \beta_k c_k$ into the regression model $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + u$:

$y = \theta_0 - \beta_1 c_1 - \beta_2 c_2 - \cdots - \beta_k c_k + \beta_1 x_1 + \cdots + \beta_k x_k + u$

$y = \theta_0 + \beta_1 (x_1 - c_1) + \beta_2 (x_2 - c_2) + \cdots + \beta_k (x_k - c_k) + u$

Regress $y$ on $(x_1 - c_1), \ldots, (x_k - c_k)$: $se(\hat{\theta}_0)$ is the standard error of the intercept estimator of this regression.

If the $c_j$ are closer to the sample means $\bar{x}_j$, the standard error of $\hat{\theta}_0$ is smaller.

8 / 24
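The centering trick can be demonstrated numerically. A small Python sketch on hypothetical data (closed-form simple OLS; the point carries over to the multiple-regression case): regressing y on the centered regressor (x − c) leaves the slope unchanged and turns the intercept into the fitted value at x = c, i.e. θ̂0.

```python
# Toy data (hypothetical, for illustration only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
c = 3.0  # the point at which we want the prediction

def ols_simple(x, y):
    """Closed-form OLS for y = b0 + b1*x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = ols_simple(x, y)
x_centered = [xi - c for xi in x]           # regress y on (x - c)
theta0, b1_centered = ols_simple(x_centered, y)

assert abs(b1 - b1_centered) < 1e-12        # slope is unchanged
assert abs(theta0 - (b0 + b1 * c)) < 1e-12  # intercept = fitted value at x = c
```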
Confidence Intervals for Predictions
Example: College GPA and Its Determinants

$colgpa = \beta_0 + \beta_1 sat + \beta_2 hsperc + \beta_3 hsize + \beta_4 hsize^2 + u$

colgpa is measured on a four-point scale.
hsperc is the percentile in the high school graduating class (defined so that, for example, hsperc = 5 means the top 5% of the class).
sat is the combined math and verbal score on the student achievement test.
hsize is the size of the graduating class (in 100s).

9 / 24
Confidence Intervals for Predictions
Example: College GPA and Its Determinants
Estimates of the Regression Model
library(wooldridge)
data('gpa2')
gpa2$hsize_sq = gpa2$hsize^2
lm_gpa <- lm(colgpa ~ sat + hsperc + hsize + hsize_sq , data = gpa2)
summary(lm_gpa)
Call:
lm(formula = colgpa ~ sat + hsperc + hsize + hsize_sq, data = gpa2)
Residuals:
Min 1Q Median 3Q Max
-2.57543 -0.35081 0.03342 0.39945 1.81683
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.493e+00 7.534e-02 19.812 < 2e-16 ***
sat 1.492e-03 6.521e-05 22.886 < 2e-16 ***
hsperc -1.386e-02 5.610e-04 -24.698 < 2e-16 ***
hsize -6.088e-02 1.650e-02 -3.690 0.000228 ***
hsize_sq 5.460e-03 2.270e-03 2.406 0.016191 *
10 / 24
Confidence Intervals for Predictions
Example: College GPA and Its Determinants

Prediction: the prediction of colgpa given sat = 1200, hsperc = 30, hsize = 5.
gpa2$sat0 = gpa2$sat - 1200
gpa2$hsperc0 = gpa2$hsperc - 30
gpa2$hsize0 = gpa2$hsize - 5
lm_gpa2 <- lm(colgpa ~ sat0 + hsperc0 + hsize0 + I(hsize0^2) , data = gpa2)
summary(lm_gpa2)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.700075482 1.987784e-02 135.8334366 0.000000e+00
sat0 0.001492497 6.521336e-05 22.8863702 3.027519e-109
hsperc0 -0.013855782 5.610052e-04 -24.6981355 9.760760e-126
hsize0 -0.006278506 8.600589e-03 -0.7300088 4.654262e-01
I(hsize0^2) 0.005460295 2.269848e-03 2.4055777 1.619059e-02
$\hat{\theta}_0 = 2.700$ and $se(\hat{\theta}_0) = 0.0199$

95% CI for $\theta_0$:

$2.700 \pm 1.96 \times 0.0199$

11 / 24
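A quick arithmetic check of this interval, as a Python sketch (the intercept and its standard error are taken from the re-centered regression's R output above):

```python
# 95% CI for theta0 = E(colgpa | sat=1200, hsperc=30, hsize=5).
theta0_hat = 2.700  # intercept of the re-centered regression (R output above)
se_theta0 = 0.0199  # its standard error
t_crit = 1.96       # large-sample critical value used on the slide

lower = theta0_hat - t_crit * se_theta0
upper = theta0_hat + t_crit * se_theta0
print(round(lower, 3), round(upper, 3))  # about (2.661, 2.739)
```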
Confidence Intervals for Predictions
Example: CEO Salary and Sales

$\log(salary_i) = \beta_0 + \beta_1 \log(sales_i) + u_i$

12 / 24
Confidence Intervals for Predictions

A confidence interval for $E(y \mid x)$, the average (expected) value of $y$ for a given $x^*$, is

$\hat{y} \pm t_{\alpha/2}\, \hat{\sigma} \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{(n-1)\, s_x^2}}$

where $\hat{\sigma}$ is the standard error of the regression (the estimated standard deviation of the error).

13 / 24
Example: CEO Salary and Sales

Results of the Regression

lm(formula = lsalary ~ lsales, data = ceosal2)
[...]
Residual standard error: 0.5154 on 175 degrees of freedom
Multiple R-squared: 0.2809, Adjusted R-squared: 0.2767
F-statistic: 68.35 on 1 and 175 DF, p-value: 3.317e-14

Standard deviation of lsales:
[1] 1.432086

Confidence Interval for the Prediction

$\hat{y} \pm t_{\alpha/2}\, \hat{\sigma} \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{(n-1)\, s_x^2}}$

$6.477 \pm 1.974 \times 0.51542 \times \sqrt{\dfrac{1}{177} + \dfrac{(6.763 - 7.2310)^2}{(177-1) \times 1.4321^2}} = 6.477 \pm 0.0805$

14 / 24
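Plugging the slide's numbers into the formula, as a Python sketch (all inputs come from the regression output above):

```python
import math

# yhat ± t * sigma_hat * sqrt(1/n + (x* - xbar)^2 / ((n-1) * s_x^2))
yhat = 6.477        # predicted lsalary at lsales = x*
t_crit = 1.974      # t critical value, df = 175
sigma_hat = 0.51542 # residual standard error of the regression
n = 177
x_star, xbar, s_x = 6.763, 7.2310, 1.4321

margin = t_crit * sigma_hat * math.sqrt(
    1 / n + (x_star - xbar) ** 2 / ((n - 1) * s_x ** 2)
)
print(round(margin, 4))  # about 0.0805
```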
Confidence Intervals for Predictions
Example: CEO salary and Sales
15 / 24
Prediction Interval for the Predicted Value

Let $y^0$ denote the value for which we would like to construct an interval, called a prediction interval:

$y^0 = \beta_0 + \beta_1 x_1^0 + \beta_2 x_2^0 + \cdots + \beta_k x_k^0 + u^0$

The interval must account for an additional source of variation: variation in the unobserved error.

Prediction Error:

$\hat{e}^0 = y^0 - \hat{y}^0 = (\beta_0 + \beta_1 x_1^0 + \cdots + \beta_k x_k^0) + u^0 - \hat{y}^0$

where

$\hat{y}^0 = \hat{\beta}_0 + \hat{\beta}_1 x_1^0 + \hat{\beta}_2 x_2^0 + \cdots + \hat{\beta}_k x_k^0$

16 / 24
Prediction Interval for the Predicted Value

Variance of the Prediction Error:

$Var(\hat{e}^0) = Var(\hat{y}^0) + Var(u^0) = Var(\hat{y}^0) + \sigma^2$

$u^0$ and $\hat{y}^0$ are uncorrelated, and $Var(u^0) = \sigma^2$.
$Var(\hat{y}^0)$ is proportional to $1/n$. This means that, for large samples, $Var(\hat{y}^0)$ can be very small.

Standard Error of the Prediction Error:

$se(\hat{e}^0) = [se(\hat{y}^0)^2 + \hat{\sigma}^2]^{1/2}$

Prediction Interval:

$\hat{y}^0 \pm t_{\alpha/2} \cdot se(\hat{e}^0)$

17 / 24
Prediction Interval for the Predicted Value
Example: College GPA and Its Determinants

Construct a prediction interval for the predicted value given sat = 1200, hsperc = 30, hsize = 5.
lm(formula = colgpa ~ sat0 + hsperc0 + hsize0 + I(hsize0^2), data = gpa2)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.700e+00 1.988e-02 135.833 <2e-16 ***
sat0 1.492e-03 6.521e-05 22.886 <2e-16 ***
hsperc0 -1.386e-02 5.610e-04 -24.698 <2e-16 ***
hsize0 -6.279e-03 8.601e-03 -0.730 0.4654
I(hsize0^2) 5.460e-03 2.270e-03 2.406 0.0162 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.5599 on 4132 degrees of freedom
Multiple R-squared: 0.2781, Adjusted R-squared: 0.2774
F-statistic: 398 on 4 and 4132 DF, p-value: < 2.2e-16
Prediction Interval

$Var(\hat{y}^0) = 0.0199^2$ ; $\hat{\sigma}^2 = (0.560)^2$

$se(\hat{e}^0) = [0.0199^2 + (0.560)^2]^{1/2} = 0.5603$

$\hat{y}^0 \pm t_{\alpha/2} \cdot se(\hat{e}^0) = 2.700 \pm 1.96 \times 0.5603$

18 / 24
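A check of the arithmetic on this slide, as a Python sketch (inputs taken from the R output above):

```python
import math

# se of the prediction error combines the sampling variance of yhat0
# with the error variance: se(e0) = sqrt(se(yhat0)^2 + sigma_hat^2).
se_yhat0 = 0.0199   # se of the intercept in the re-centered regression
sigma_hat = 0.560   # residual standard error (rounded, from the R output)
t_crit = 1.96
theta0_hat = 2.700  # predicted colgpa at the chosen c values

se_e0 = math.sqrt(se_yhat0 ** 2 + sigma_hat ** 2)
lower = theta0_hat - t_crit * se_e0
upper = theta0_hat + t_crit * se_e0
print(round(se_e0, 3), round(lower, 2), round(upper, 2))
```

Note how little the term $se(\hat{y}^0)^2$ contributes: almost all of the interval's width comes from $\hat{\sigma}^2$, the variance of the unobserved error.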
Prediction Interval for the Predicted Value

A prediction interval for $y$ for a given $x^*$ is

$\hat{y} \pm t_{\alpha/2}\, \hat{\sigma} \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{(n-1)\, s_x^2}}$

where $\hat{\sigma}$ is the standard error of the regression (the estimated standard deviation of the error).

19 / 24
CI for E(y) vs. PI for y
20 / 24
CI for E(y) vs. PI for y - differences
A prediction interval is similar in spirit to a confidence interval, except that:

the prediction interval is designed to cover a "moving target", the random future value of $y$, while
the confidence interval is designed to cover the "fixed target", the average (expected) value of $y$, $E(y)$, for a given $x^*$.

Although both are centered at $\hat{y}$, the prediction interval is wider than the confidence interval for a given $x^*$ and confidence level. This makes sense, since:

the prediction interval must account for the tendency of $y$ to fluctuate from its mean value, while
the confidence interval only needs to account for the uncertainty in estimating the mean value.
21 / 24
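The width difference is visible directly in the formulas: the prediction interval has an extra "1 +" under the square root. A Python sketch using the CEO salary numbers from the earlier slides:

```python
import math

# Same x*, same data: the PI adds "1 +" under the square root,
# so it is always wider than the CI for E(y|x).
t_crit, sigma_hat, n = 1.974, 0.51542, 177
x_star, xbar, s_x = 6.763, 7.2310, 1.4321

common = 1 / n + (x_star - xbar) ** 2 / ((n - 1) * s_x ** 2)
ci_margin = t_crit * sigma_hat * math.sqrt(common)      # covers E(y|x*)
pi_margin = t_crit * sigma_hat * math.sqrt(1 + common)  # covers a new y
print(round(ci_margin, 3), round(pi_margin, 3))
assert pi_margin > ci_margin
```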
CI for E(y) vs. PI for y - similarities
For a given data set, the error in estimating $E(y)$ and $\hat{y}$ grows as $x^*$ moves away from $\bar{x}$.
Thus, the further $x^*$ is from $\bar{x}$, the wider the confidence and prediction intervals will be.
If any of the conditions underlying the model are violated, the confidence intervals
and prediction intervals may be invalid as well. This is why it is so important to check the
assumptions.
22 / 24
Residual Analysis
23 / 24
Residual Analysis
1. Linearity - the relationship between the dependent variable and the independent variable(s) should be linear.
2. Multivariate normality - the residuals should be (approximately) normally distributed (i.e., resembling a bell curve).
3. No autocorrelation - autocorrelation occurs when the residuals are not independent of each other. For instance, this typically occurs in stock prices, where the price is not independent of the previous price.
4. Homoscedasticity - the residuals should be equally distributed across the regression line (i.e., above and below it), and the variance of the residuals should be the same for all predicted values along the regression line.

Example: Fuel Efficiency of Cars and Their Sizes

$mpg_i = \beta_0 + \beta_1 wt_i + u_i$
24 / 24
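As a starting point for checking these conditions, a Python sketch on hypothetical data standing in for the mpg/weight example (the numbers below are made up, not from any real car data set): fit the line, form the residuals, and inspect them. OLS residuals always sum to (numerically) zero, so the informative part is their pattern against the fitted values.

```python
# Hypothetical weights (1000s of lbs) and fuel efficiencies (mpg).
wt = [2.2, 2.6, 3.0, 3.4, 3.8, 4.2]
mpg = [30.0, 27.0, 24.5, 22.0, 19.0, 17.5]

# Closed-form simple OLS: mpg = b0 + b1 * wt + u.
n = len(wt)
xbar = sum(wt) / n
ybar = sum(mpg) / n
b1 = sum((x - xbar) * (y - ybar) for x, y in zip(wt, mpg)) / \
     sum((x - xbar) ** 2 for x in wt)
b0 = ybar - b1 * xbar

# Residuals: the raw material for checks 1-4 above (plot them against
# the fitted values to screen for nonlinearity and heteroskedasticity).
residuals = [y - (b0 + b1 * x) for x, y in zip(wt, mpg)]
print(round(sum(residuals), 10))  # OLS residuals sum to (numerically) zero
```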