
BIOSTAT 274 Spring 2021 Homework 1

Due 11:59 PM 04/21/2021 (Submit to CCLE)

YOUR NAME

Remark. For the Computational Part, please complete your answers in the RMarkdown file and submit the generated PDF and Rmd files. Related packages have been loaded in the setup chunk.

Computational Part

1. (Model Selection, [ISL] 6.8, 25 pt) In this exercise, we will generate simulated data, and will then use this data to perform model selection.

(a) Use the rnorm function to generate a predictor X of length n = 100, as well as a noise vector ε of length n = 100.

(b) Generate a response vector Y of length n = 100 according to the model

Y = β0 + β1 X + β2 X^2 + β3 X^3 + ε,

where β0 = 3, β1 = 2, β2 = -3, β3 = 0.3.
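Parts (a) and (b) can be sketched as follows; note that the seed and the noise scale (standard normal) are illustrative assumptions, since the assignment does not specify them:

```r
set.seed(1)              # seed chosen arbitrarily for reproducibility
n <- 100
x <- rnorm(n)            # (a) predictor X
eps <- rnorm(n)          # (a) noise vector
# (b) response under the stated model with the given coefficients
y <- 3 + 2 * x - 3 * x^2 + 0.3 * x^3 + eps
```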

(c) Use the regsubsets function from the leaps package to perform best subset selection in order to choose the best model from the set of predictors (X, X^2, ..., X^10). What are the best models obtained according to Cp, BIC, and adjusted R^2, respectively? Show some plots to provide evidence for your answer, and report the coefficients of the best model obtained.
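One way to set up part (c), assuming x and y from part (b) are in the workspace:

```r
library(leaps)
# Build a data frame with the ten polynomial predictors X, X^2, ..., X^10
dat <- data.frame(y = y, poly(x, 10, raw = TRUE))
fit <- regsubsets(y ~ ., data = dat, nvmax = 10)
s <- summary(fit)
which.min(s$cp)      # model size minimizing Cp
which.min(s$bic)     # model size minimizing BIC
which.max(s$adjr2)   # model size maximizing adjusted R^2
# Coefficients of, e.g., the BIC-best model
coef(fit, which.min(s$bic))
```

Plotting s$cp, s$bic, and s$adjr2 against model size gives the evidence the question asks for; passing method = "forward" or method = "backward" to regsubsets adapts this to part (d).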

(d) Repeat (c), using forward stepwise selection and also using backward stepwise selection. How does your answer compare to the results in (c)?

(e) Now fit a LASSO model with the glmnet function from the glmnet package to the simulated data, again using (X, X^2, ..., X^10) as predictors. Use cross-validation to select the optimal value of λ. Create plots of the cross-validation error as a function of λ. Report the resulting coefficient estimates, and discuss the results obtained.
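A minimal sketch for part (e), again assuming x and y from part (b); cv.glmnet defaults to 10-fold cross-validation:

```r
library(glmnet)
X <- poly(x, 10, raw = TRUE)       # predictor matrix X, X^2, ..., X^10
cv <- cv.glmnet(X, y, alpha = 1)   # alpha = 1 selects the LASSO penalty
plot(cv)                           # CV error as a function of log(lambda)
cv$lambda.min                      # lambda minimizing the CV error
coef(cv, s = "lambda.min")         # coefficient estimates at that lambda
```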

(f) Now generate a response vector Y according to the model

Y = β0 + β7 X^7 + ε,

where β7 = 7, and perform best subset selection and the LASSO. Discuss the results obtained.

2. (Prediction, [ISL] 6.9, 20 pt) In this exercise, we will predict the number of applications received (Apps) using the other variables in the College data set from the ISLR package.

(a) Randomly split the data set into equal-sized training and test sets (1:1).
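The split in (a) can be done as follows; the seed and the object names college_train / college_test are arbitrary choices:

```r
library(ISLR)
set.seed(1)   # arbitrary seed for a reproducible split
train <- sample(nrow(College), nrow(College) %/% 2)
college_train <- College[train, ]
college_test  <- College[-train, ]
```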

(b) Fit a linear model using least squares on the training set, and report the test error obtained.

(c) Fit a ridge regression model on the training set, with λ chosen by 5-fold cross-validation. Report the test error obtained.

(d) Fit a LASSO model on the training set, with λ chosen by 5-fold cross-validation. Report the test error obtained, along with the number of non-zero coefficient estimates.
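Parts (b)-(d) might be sketched as below, assuming training and test data frames named college_train and college_test from part (a); test error here is taken to mean test MSE:

```r
library(glmnet)

# (b) Least squares: test MSE
lm_fit <- lm(Apps ~ ., data = college_train)
mean((predict(lm_fit, college_test) - college_test$Apps)^2)

# glmnet needs numeric design matrices; drop the intercept column
x_train <- model.matrix(Apps ~ ., college_train)[, -1]
x_test  <- model.matrix(Apps ~ ., college_test)[, -1]

# (c) Ridge (alpha = 0), lambda by 5-fold CV
cv_ridge <- cv.glmnet(x_train, college_train$Apps, alpha = 0, nfolds = 5)
mean((predict(cv_ridge, x_test, s = "lambda.min") - college_test$Apps)^2)

# (d) LASSO (alpha = 1), lambda by 5-fold CV
cv_lasso <- cv.glmnet(x_train, college_train$Apps, alpha = 1, nfolds = 5)
mean((predict(cv_lasso, x_test, s = "lambda.min") - college_test$Apps)^2)
sum(coef(cv_lasso, s = "lambda.min") != 0) - 1   # non-zero coefficients, excluding intercept
```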

(e) Comment on the results obtained. How accurately can we predict the number of college applications received? Is there much difference among the test errors resulting from these three approaches?
