回归分析代写-PSTAT 126|学霸联盟

回归分析代写-PSTAT 126

时间：2022-10-30

PSTAT 126
Lab 3
10/18/2022
Multiple Linear Regression (MLR)
We can extend the Simple Linear Regression to a Multiple Linear Regression by incorporating more that one
predictor:
yi = β0 + β1xi1 + . . .+ βpxip + ϵi, i = 1, . . . , n
Using matrix notation, the model can be written as:
y = Xβ + ϵ, ϵ ∼ Nn(0, σ2In)
To get the LS solution of β we minimize the Sum of Squared Residuals:
SSR = (y−Xβ)T (y−Xβ)
We obtain the solution: βˆLSE = (XTX)−1XTy. Additionally, it can be proved that βˆ ∼
Np∗(β, (XTX)−1σ2).
Therefore, it is possible to obtain the Standard Error of βˆj :
SE(βˆj) =
√
σˆ2 [XTX]−1jj
Where σˆ2 = (y−Xβˆ)
T (y−Xβˆ)
n−p∗ , with p∗ = p+ 1.
Data Example
library(faraway)
data(diabetes)
model <- lm(weight~chol+ stab.glu+ hdl+height+waist+age, data=diabetes)
summary(model)
##
## Call:
## lm(formula = weight ~ chol + stab.glu + hdl + height + waist +
## age, data = diabetes)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.090 -11.415 0.022 11.952 58.537
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.540e+02 1.800e+01 -8.556 2.77e-16 ***
## chol 2.082e-02 2.232e-02 0.933 0.351
## stab.glu 8.289e-03 1.896e-02 0.437 0.662
1
## hdl -8.465e-02 5.790e-02 -1.462 0.145
## height 1.889e+00 2.392e-01 7.899 2.95e-14 ***
## waist 5.992e+00 1.741e-01 34.417 < 2e-16 ***
## age -4.466e-01 6.070e-02 -7.357 1.14e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 18.34 on 387 degrees of freedom
## (9 observations deleted due to missingness)
## Multiple R-squared: 0.7958, Adjusted R-squared: 0.7926
## F-statistic: 251.3 on 6 and 387 DF, p-value: < 2.2e-16
summary(model)$coefficients
Inference
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.539895e+02 17.99772561 -8.5560510 2.767558e-16
## chol 2.082080e-02 0.02231999 0.9328316 3.514885e-01
## stab.glu 8.289022e-03 0.01896535 0.4370614 6.623106e-01
## hdl -8.465368e-02 0.05789839 -1.4621077 1.445233e-01
## height 1.889310e+00 0.23919598 7.8985848 2.949709e-14
## waist 5.991746e+00 0.17409034 34.4174514 7.958028e-120
## age -4.465606e-01 0.06070061 -7.3567719 1.136818e-12
We get βˆ:
library(faraway)
betas = summary(model)$coefficients[,1]
betas
## (Intercept) chol stab.glu hdl height
## -1.539895e+02 2.082080e-02 8.289022e-03 -8.465368e-02 1.889310e+00
## waist age
## 5.991746e+00 -4.465606e-01
coef(model) # this returns the same results
## (Intercept) chol stab.glu hdl height
## -1.539895e+02 2.082080e-02 8.289022e-03 -8.465368e-02 1.889310e+00
## waist age
## 5.991746e+00 -4.465606e-01
We can get SE(βˆj) j = 0, . . . , p:
summary(model)$coefficients[,2]
## (Intercept) chol stab.glu hdl height waist
## 17.99772561 0.02231999 0.01896535 0.05789839 0.23919598 0.17409034
## age
## 0.06070061
Coefficient od determination R2:
summary(model)$r.squared
## [1] 0.7957582
Residuals:
2
Res=residuals(model)
Standard deviation of residuals (σˆ):
sqrt(sum(Resˆ2) /model$df.residual) # using formula
## [1] 18.33928
sigma(model) # using R built-in function
## [1] 18.33928
Confidence and Prediction Intervals:
# LSE of coefficients with CI
confint(model, level = .95)
## 2.5 % 97.5 %
## (Intercept) -189.37501620 -118.60389976
## chol -0.02306283 0.06470442
## stab.glu -0.02899899 0.04557703
## hdl -0.19848845 0.02918109
## height 1.41902346 2.35959601
## waist 5.64946462 6.33402711
## age -0.56590480 -0.32721631
# 95% CI for the mean response w/ Age=34,chol=186,gluc=85,hdl=46,height=66,waist=46
new = data.frame(chol=186, stab.glu=85, hdl=46,height=66,waist=46,age=34)
ans1 = predict(model, new, se.fit = TRUE, interval = "confidence", level = 0.95, type = "response")
ans1$fit
## fit lwr upr
## 1 231.8254 227.9674 235.6834
# 95% PI for a new observation w/ Age=34,chol=186,gluc=85,hdl=46,height=66,waist=46
ans2 = predict(model, new, se.fit = TRUE, interval = "prediction", level = 0.95, type = "response")
ans2$fit
## fit lwr upr
## 1 231.8254 195.5625 268.0883
Normal Distribution Review
Let X1, X2, ..., Xn
iid∼ N(µ, σ2). Then the pdf is given for any µ ∈ R, x ∈ R, σ > 0 Then define the
following: E(Xi) = µ, V ar(Xi) = σ2 ∀i
The CDF is given by, FX(x) = P (X ≤ x) = 1 − P (X > x) For µ = 0, σ = 1, we have the standard
normal Z ∼ N(0, 1):
Some properties are as follows:
1) P (Z < z) = P (Z > −z)
2) P (|Z| > z) = 2P (Z > z) = 2P (Z < −z)
3) if P (Z > zα) = α, then
P (z1−α/2 < Z < zα/2)
= P (−zα/2 < Z < zα/2) = 1− α
3
# consider the x-values
x<- seq(-3.5, 3.5, length.out=100)
# pdf of N(0,1)
f<- dnorm(x)
f2<- dnorm(x, mean =0, sd = .5) # pdf of N(0,.5)
f3<- dnorm(x, mean = 1, sd = .75) # pdf of N(1,.75)
f4<- dnorm(x, mean= -2, sd = 1.5) # pdf of N(-2,1.5)
# plot of pdfs
plot(x, f, type = "l", lwd=1, col= 1, ylab="Density", ylim=c(0, .95))
lines(x, f2, type = "l", lwd=1, col= 2)
lines(x, f3, type = "l", lwd=1, col= 3)
lines(x, f4, type = "l", lwd=1, col= 4)
legend(2, .8, legend=c("N(0,1)", "N(0,0.5)","N(1,0.75)","N(-2,1.5)"),
col=c(1,2,3,4), lty=1, cex=0.8)
Computation:
−3 −2 −1 0 1 2 3
0.
0
0.
2
0.
4
0.
6
0.
8
x
D
en
si
ty
N(0,1)
N(0,0.5)
N(1,0.75)
N(−2,1.5)
#cdf
cf1<- pnorm(x) # pdf of N(0,1)
cf2<- pnorm(x, mean =0, sd = .5) # pdf of N(0,.5)
cf3<- pnorm(x, mean = 1, sd = .75) # pdf of N(1,.75)
cf4<- pnorm(x, mean= -2, sd = 1.5) # pdf of N(-2,1.5)
#quantiles
4
q<- seq(0,1, length.out=100)
qf1<- qnorm(q) # pdf of N(0,1)
qf2<- qnorm(q, mean =0, sd = .5) # pdf of N(0,.5)
qf3<- qnorm(q, mean = 1, sd = .75) # pdf of N(1,.75)
qf4<- qnorm(q, mean= -2, sd = 1.5) # pdf of N(-2,1.5)
par(mfrow= c(1,2))
# plot of cdfs
plot(x, cf1, type = "l", lwd=1, col= 1, ylab="CDF", ylim=c(0, 1))
lines(x, cf2, type = "l", lwd=1, col= 2)
lines(x, cf3, type = "l", lwd=1, col= 3)
lines(x, cf4, type = "l", lwd=1, col= 4)
legend(-3.6, 1.02, legend=c("N(0,1)", "N(0,0.5)","N(1,0.75)","N(-2,1.5)"),
col=c(1,2,3,4), lty=1, cex=0.6)
# plot of quantiles
plot(q, qf1, type = "l", lwd=1, col= 1, ylab="Quantile function", ylim=c(-3,3))
lines(q, qf2, type = "l", lwd=1, col= 2)
lines(q, qf3, type = "l", lwd=1, col= 3)
lines(q, qf4, type = "l", lwd=1, col= 4)
legend(0, 3, legend=c("N(0,1)", "N(0,0.5)","N(1,0.75)","N(-2,1.5)"),
col=c(1,2,3,4), lty=1, cex=0.6)
−3 −1 1 2 3
0.
0
0.
2
0.
4
0.
6
0.
8
1.
0
x
CD
F
N(0,1)
N(0,0.5)
N(1,0.75)
N(−2,1.5)
0.0 0.2 0.4 0.6 0.8 1.0
−
3
−
2
−
1
0
1
2
3
q
Qu
an
tile
fu
nc
tio
n
N(0,1)
N(0,0.5)
N(1,0.75)
N(−2,1.5)
5
Simulation of a linear regression Consider the x covariate in [0, 1] distributed uniformly. Then we have
the model
Yi = 1.21 + 2.445Xi + ϵi, i = 1, 2, ..., 200
# generate x's
set.seed(54321)
x<- runif(200)
# generate y's
y<- 1.21+ 2.445*x+ rnorm(200)
# fit linear model
fit1<- lm(y~x)
# plot the regression
plot(x,y, col="red", pch=19)
abline(fit1)
0.0 0.2 0.4 0.6 0.8 1.0
−
1
0
1
2
3
4
5
x
y
fit1$coefficients
## (Intercept) x
## 1.123134 2.664617
6