Date: 2025-09-04
QUIZ 3: BIG DATA
General instructions: This assignment is due at 4:00 pm on Friday, 5 September. Please generate
a single PDF file using R Markdown. You may either knit directly to PDF or create an HTML document
and convert it to PDF. Once completed, submit the PDF via Turnitin on the course webpage.
Caution: Do not set a seed. If you do, no credit will be given for this quiz. The same penalty applies if
you do not use R Markdown to generate a single document. When a word limit is specified (e.g., 50 words),
do not exceed it; otherwise, no credit will be given. You may count words at https://wordcounter.net/.
Preliminaries: Import the dataset Carseats from the ISLR2 R package, and consider the polynomial model
$$\mathrm{Sales} = \beta_0 + \beta_1 \mathrm{Income} + \beta_2 \mathrm{Income}^2 + \cdots + \beta_d \mathrm{Income}^d + \varepsilon$$
where ε is a mean-zero error term and d is the degree of the polynomial.
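As a minimal sketch of this setup (not a required approach), the snippet below loads Carseats and fits the polynomial for a single, arbitrarily chosen degree. The value d = 2 and the object names `d` and `fit` are illustrative assumptions; `raw = TRUE` is used so that the coefficients correspond to the powers of Income in the model above.

```r
# Assumes the ISLR2 package is installed: install.packages("ISLR2")
library(ISLR2)

# Carseats has 400 observations; Sales and Income are two of its columns
data(Carseats)

# Illustrative degree only; the quiz asks for d in 1, ..., 10
d <- 2

# raw = TRUE uses Income, Income^2, ... directly instead of orthogonal polynomials
fit <- lm(Sales ~ poly(Income, degree = d, raw = TRUE), data = Carseats)

# Estimated coefficients beta_0, ..., beta_d with their standard errors
coef(summary(fit))
```

With the default `raw = FALSE`, the fitted values are the same, but the coefficients refer to an orthogonal polynomial basis rather than to powers of Income.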
Total: 10 marks (1 mark each)
1. Divide the dataset into two subsamples of equal size, each containing 200 observations.
2. Using one of the two subsamples, train the model for each d ∈ {1, . . . , 10} and compute each model’s
test MSE and training MSE. Make a figure with two diagrams as below, indicating the lowest MSE.
[Figure: two diagrams, one plotting test MSE and one plotting train MSE, each against the degree of polynomials.]
3. Compute test MSEs using LOOCV and 10-fold CV for d ∈ {1, 2, . . . , 10} and make a figure with
two diagrams as above, indicating the method used. Here, use the entire sample with n = 400.
4. Based on your results so far, choose the best model. Let d∗ be your choice (preferred model).
5. You computed validation set MSEs, training MSEs, LOOCV errors, and 10-fold CV errors. Which one
did you use for choosing d∗? Explain in 20 words.
6. Report your preferred model’s estimated coefficients and standard errors.
7. For your preferred model with d∗, compute the standard errors using 1,000 bootstrap samples.
8. Obtain the standard error of $\hat\beta_0 \times \hat\beta_1 \times \cdots \times \hat\beta_{d^*}$
with your d∗ above. Hint: prod() and bootstrap.
9. Implement the best subset approach using all available variables in Carseats to predict Sales. Make a
figure of 2 × 2 diagrams, showing RSS, adjusted R², AIC, and BIC for different numbers of variables.
10. Using Q9, choose the best model and explain your decision in 20 words.
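
For reference, the error measures named in Questions 2, 3, 7 and 8 are commonly written as below. The notation ($S$, $n_S$, $\hat f$, $K$, $\mathrm{MSE}_k$, $B$, $\hat\theta^{*}_b$) is an assumption introduced here for illustration and is not taken from the quiz; the cross-validation line assumes folds of equal size.

```latex
\begin{align*}
\mathrm{MSE}(S) &= \frac{1}{n_S} \sum_{i \in S}
    \bigl(\mathrm{Sales}_i - \hat f(\mathrm{Income}_i)\bigr)^2
  && \text{training MSE if $\hat f$ was fitted on $S$, test MSE otherwise} \\
\mathrm{CV}_{(K)} &= \frac{1}{K} \sum_{k=1}^{K} \mathrm{MSE}_k
  && \text{$K = 10$ for 10-fold CV, $K = n$ for LOOCV} \\
\mathrm{SE}_B(\hat\theta) &= \sqrt{\frac{1}{B-1} \sum_{b=1}^{B}
    \bigl(\hat\theta^{*}_b - \bar\theta^{*}\bigr)^2},
  \quad \bar\theta^{*} = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*}_b
  && \text{bootstrap standard error with $B$ resamples}
\end{align*}
```

Here $\mathrm{MSE}_k$ is the MSE on the $k$-th held-out fold and $\hat\theta^{*}_b$ is the statistic recomputed on the $b$-th bootstrap sample; in Question 8 that statistic would be the product of the estimated coefficients.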