QUIZ 3: BIG DATA

General instruction: This assignment is due at 4:00 pm on Friday, 5 September. Please generate a single PDF file using R Markdown. You may either knit directly to PDF or create an HTML document and convert it to PDF. Once completed, submit the PDF via Turnitin on the course webpage.

Caution: Do not set a seed. If you do, no credit will be given for this quiz. The same penalty applies if you do not use R Markdown to generate a single document. When a word limit is specified (e.g., 50 words), do not exceed it; otherwise, no credit will be given. You may count words at https://wordcounter.net/.

Preliminaries: Import the dataset Carseats from the ISLR2 R package, and consider the polynomial models

$$\text{Sales} = \beta_0 + \beta_1 \text{Income} + \beta_2 \text{Income}^2 + \cdots + \beta_d \text{Income}^d + \varepsilon$$

where $\varepsilon$ is a mean-zero error term and $d$ is the degree of the polynomial.

Total: 10 marks (1 mark each)

1. Divide the dataset into two subsamples of equal size, each containing 200 observations.

2. Using one of the two subsamples, train the model for each $d \in \{1, \ldots, 10\}$ and compute each model's test MSE and training MSE. Make a figure with two diagrams as below, indicating the lowest MSE.

[Figure: two panels, one showing test MSE and one showing train MSE, each plotted against the degree of polynomials (1 to 10).]

3. Compute test MSEs using LOOCV and 10-fold CV for $d \in \{1, 2, \ldots, 10\}$ and make a figure with two diagrams as above, indicating the method employed. Here, use the entire sample with $n = 400$.

4. Based on your exercise so far, choose the best model. Let $d^*$ be your choice (preferred model).

5. You computed validation set MSEs, training MSEs, LOOCV errors, and 10-fold CV errors. Which one did you use for choosing $d^*$? Explain in 20 words.

6. Report your preferred model's estimated coefficients and standard errors.

7. For your preferred model with $d^*$, compute the standard errors using 1,000 bootstrap samples.

8. Obtain the standard error of $(\hat{\beta}_0 \times \hat{\beta}_1 \times \cdots \times \hat{\beta}_{d^*})$ with your $d^*$ above. Hint: prod() and bootstrap.

9. Implement the best subset approach using all available variables in Carseats to predict Sales. Make a figure of 2 × 2 diagrams, showing RSS, adjusted $R^2$, AIC, and BIC for different numbers of variables.

10. Using Q9, choose the best model and explain your decision in 20 words.
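
The hint in question 8 (prod() and bootstrap) can be illustrated with a minimal sketch. It is not a model answer: it assumes the boot package is installed alongside ISLR2, fits raw polynomial terms with poly(..., raw = TRUE) so that the coefficients match the model in the Preliminaries, and uses d_star <- 2 purely as a placeholder for whatever degree is actually chosen. The helper name prod_coef is hypothetical. No seed is set, in line with the caution above, so the reported value will differ from run to run.

```r
library(ISLR2)  # provides the Carseats data
library(boot)   # provides boot()

d_star <- 2     # placeholder only; replace with your own preferred degree

# Statistic for boot(): refit the polynomial on the resampled rows
# and return the product of all estimated coefficients.
prod_coef <- function(data, index) {
  fit <- lm(Sales ~ poly(Income, d_star, raw = TRUE),
            data = data, subset = index)
  prod(coef(fit))
}

# 1,000 bootstrap replications of the product of coefficients
boot_out <- boot(Carseats, prod_coef, R = 1000)

# Bootstrap standard error: standard deviation of the replicated statistic
se_prod <- sd(boot_out$t[, 1])
se_prod
```

Printing boot_out also reports the same standard error in its "std. error" column, which gives a quick cross-check of se_prod.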