PSTAT126-回归分析代写
时间:2022-11-22
Lab 6
PSTAT126
library("alr4")
library("MASS")
Consider the variable selection for a regression model:
From Highway data in package we try to model the ‘rate’ variable (accident rates per million vehicle miles,
1973) on other predictors.
After checking for correlaton of the response rate with the other numerical variables, the chosen top 3
covariates for the model are: acpt (0.752), sigs (0.564), slim (-0.681), the correlation with the response are
given in the parentheses.
pairs(Highway[,c("acpt","sigs","slim","rate")])
acpt
0.
0
1.
0
2.
0
10 20 30 40 50
2
4
6
8
0.0 1.0 2.0
sigs
slim
40 50 60 70
2 4 6 8
10
30
50
40
50
60
70
rate
fit<- lm(rate~ acpt + sigs + slim + acpt:sigs + acpt:slim + sigs:slim , data = Highway) # We include all the interactions
summary(fit)
1
##
## Call:
## lm(formula = rate ~ acpt + sigs + slim + acpt:sigs + acpt:slim +
## sigs:slim, data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.16988 -0.85548 0.06791 0.68842 2.66209
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.1320336 4.5873140 1.119 0.272
## acpt 0.1579835 0.2375678 0.665 0.511
## sigs 3.0694973 5.4586207 0.562 0.578
## slim -0.0598072 0.0748099 -0.799 0.430
## acpt:sigs -0.0415319 0.0441722 -0.940 0.354
## acpt:slim -0.0001594 0.0043631 -0.037 0.971
## sigs:slim -0.0310221 0.0921576 -0.337 0.739
##
## Residual standard error: 1.234 on 32 degrees of freedom
## Multiple R-squared: 0.675, Adjusted R-squared: 0.6141
## F-statistic: 11.08 on 6 and 32 DF, p-value: 1.141e-06
Remove the predictor with highest p-value iteratively;
fit<- lm(rate~ acpt + sigs + slim + acpt:sigs + sigs:slim , data = Highway) # We remove acpt:slim
summary(fit)
##
## Call:
## lm(formula = rate ~ acpt + sigs + slim + acpt:sigs + sigs:slim,
## data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.17381 -0.85892 0.06613 0.68082 2.66184
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.20161 4.10951 1.266 0.21447
## acpt 0.14953 0.05311 2.815 0.00816 **
## sigs 3.11374 5.24141 0.594 0.55652
## slim -0.06108 0.06518 -0.937 0.35555
## acpt:sigs -0.04102 0.04125 -0.994 0.32724
## sigs:slim -0.03196 0.08715 -0.367 0.71616
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.215 on 33 degrees of freedom
## Multiple R-squared: 0.675, Adjusted R-squared: 0.6258
## F-statistic: 13.71 on 5 and 33 DF, p-value: 2.867e-07
fit<- lm(rate~ acpt + sigs + slim + acpt:sigs , data = Highway)
summary(fit)
##
2
## Call:
## lm(formula = rate ~ acpt + sigs + slim + acpt:sigs, data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.16651 -0.86568 0.00595 0.60643 2.77836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.20897 3.01741 2.058 0.04735 *
## acpt 0.13936 0.04471 3.117 0.00371 **
## sigs 1.20099 0.51134 2.349 0.02479 *
## slim -0.07699 0.04803 -1.603 0.11821
## acpt:sigs -0.02801 0.02078 -1.348 0.18661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.199 on 34 degrees of freedom
## Multiple R-squared: 0.6737, Adjusted R-squared: 0.6353
## F-statistic: 17.55 on 4 and 34 DF, p-value: 6.711e-08
fit<- lm(rate~ acpt + sigs + slim, data = Highway)
summary(fit)
##
## Call:
## lm(formula = rate ~ acpt + sigs + slim, data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05563 -0.90780 -0.01854 0.59923 2.83674
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.81760 2.80351 2.789 0.00851 **
## acpt 0.09493 0.03056 3.107 0.00374 **
## sigs 0.70700 0.36072 1.960 0.05800 .
## slim -0.09676 0.04627 -2.091 0.04384 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.213 on 35 degrees of freedom
## Multiple R-squared: 0.6563, Adjusted R-squared: 0.6268
## F-statistic: 22.27 on 3 and 35 DF, p-value: 3.031e-08
fit<- lm(rate~ acpt + slim, data = Highway)
summary(fit)
##
## Call:
## lm(formula = rate ~ acpt + slim, data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.30429 -0.86525 0.01272 0.44776 3.00942
3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.41484 2.89479 2.907 0.006212 **
## acpt 0.11459 0.02998 3.822 0.000505 ***
## slim -0.10681 0.04776 -2.236 0.031618 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.26 on 36 degrees of freedom
## Multiple R-squared: 0.6185, Adjusted R-squared: 0.5973
## F-statistic: 29.19 on 2 and 36 DF, p-value: 2.925e-08
Diagnostic plots:
par(mfrow=c(2,2))
plot(fit$fitted.values, fit$residuals)
abline(h=0, col="red")
plot(Highway$acpt, fit$residuals)
abline(h=0, col="red")
plot(Highway$slim, fit$residuals)
abline(h=0, col="red")
qqnorm(fit$residuals)
qqline(fit$residuals, col="red")
4
2 4 6 8 10

2
0
2
fit$fitted.values
fit
$re
sid
ua
ls
10 20 30 40 50

2
0
2
Highway$acpt
fit
$re
sid
ua
ls
40 45 50 55 60 65 70

2
0
2
Highway$slim
fit
$re
sid
ua
ls
−2 −1 0 1 2

2
0
2
Normal Q−Q Plot
Theoretical Quantiles
Sa
m
pl
e
Qu
an
tile
s
par(mfrow=c(2,2))
plot(hatvalues(fit), cooks.distance(fit), xlab="Leverages", ylab="Cook's Distance")
plot(hatvalues(fit), type = 'h',xlab = "Observation")
plot(cooks.distance(fit), type = "h", xlab ="Observation")
5
0.1 0.2 0.3 0.4 0.5
0.
0
0.
3
Leverages
Co
ok
's
Di
st
an
ce
0 10 20 30 40
0.
1
0.
4
Observation
ha
tv
a
lu
es
(fit
)
0 10 20 30 40
0.
0
0.
3
Observation
co
o
ks
.
di
st
an
ce
(fit
)
Box-Cox transformation
boxcox(fit, plotit=T, lambda=seq(-1,1,len=100))
6
−1.0 −0.5 0.0 0.5 1.0

29

28

27

26
λ
lo
g−
Li
ke
lih
oo
d
95%
Therefore, take log transformation on the response
fit<- lm(log(rate)~ acpt + slim, data = Highway)
summary(fit)
##
## Call:
## lm(formula = log(rate) ~ acpt + slim, data = Highway)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.68333 -0.21717 0.06092 0.17052 0.57559
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.986307 0.736793 4.053 0.000258 ***
## acpt 0.017795 0.007631 2.332 0.025411 *
## slim -0.035315 0.012157 -2.905 0.006243 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3208 on 36 degrees of freedom
## Multiple R-squared: 0.5452, Adjusted R-squared: 0.5199
## F-statistic: 21.58 on 2 and 36 DF, p-value: 6.931e-07
par(mfrow=c(2,2))
plot(fit$fitted.values, fit$residuals)
7
abline(h=0, col="red")
plot(Highway$acpt, fit$residuals)
abline(h=0, col="red")
plot(Highway$slim, fit$residuals)
abline(h=0, col="red")
qqnorm(fit$residuals)
qqline(fit$residuals, col="red")
0.5 1.0 1.5 2.0 2.5

0.
6
0.
2
fit$fitted.values
fit
$re
sid
ua
ls
10 20 30 40 50

0.
6
0.
2
Highway$acpt
fit
$re
sid
ua
ls
40 45 50 55 60 65 70

0.
6
0.
2
Highway$slim
fit
$re
sid
ua
ls
−2 −1 0 1 2

0.
6
0.
2
Normal Q−Q Plot
Theoretical Quantiles
Sa
m
pl
e
Qu
an
tile
s
par(mfrow=c(2,2))
plot(hatvalues(fit), cooks.distance(fit), xlab="Leverages", ylab="Cook's Distance")
plot(hatvalues(fit), type = 'h',xlab = "Observation")
plot(cooks.distance(fit), type = "h", xlab ="Observation")
8
0.1 0.2 0.3 0.4 0.5
0.
0
0.
4
Leverages
Co
ok
's
Di
st
an
ce
0 10 20 30 40
0.
1
0.
4
Observation
ha
tv
a
lu
es
(fit
)
0 10 20 30 40
0.
0
0.
4
Observation
co
o
ks
.
di
st
an
ce
(fit
)
9
essay、essay代写