Date: 2025-08-28

QUIZ 2: BIG DATA
This assignment is due at 4:00 pm on Friday, 29 August. Please generate a single PDF file
using R Markdown. You may either knit directly to PDF or create an HTML document and convert it to
PDF. Once completed, submit the PDF via Turnitin on the course webpage.
Caution: Do not set a seed. If you do, no credit will be given for this quiz. The same penalty applies if
you do not use R Markdown to generate a single document. When a word limit is specified (e.g., 50 words),
do not exceed it; otherwise, no credit will be given. You may count words at https://wordcounter.net/.
Total: 10 marks (1 mark per question).
1. Import the dataset Carseats from the R package ISLR2. Create a new binary variable HighSales indicating
whether Sales is above its median value, and add it to the Carseats dataset (as done in Quiz 1). Then,
use the sample() function to split the dataset into two parts by selecting a random subset of 300
observations out of the original 400 observations. These 300 observations form the training set and the
other 100 form the test set. For the rest of the problem set, we use the training set for estimation and
the test set for evaluation.
Hint: See the tutorial material for cross-validation and bootstrap for sample(). Particularly, you can
construct the test set as Carseats[-train, ], where train holds the indexes of the training observations.
2. Fit a Linear Discriminant Analysis (LDA) model to predict HighSales using the three predictors
(Price, Income, Advertising) based on the training set. Use the estimation results to answer the
following questions. Use the mean() function to confirm that the prior probability $\hat{\pi}_1$ of HighSales = 1
matches the proportion of training observations for which HighSales = 1.
3. Using the estimation results in Q2, report the sample means of (Price, Income, Advertising) for training
observations for which HighSales = 1.
4. Using results in Q2, build a confusion matrix by predicting HighSales for test observations.
5. Fit a Quadratic Discriminant Analysis (QDA) model to predict HighSales using the three predictors
(Price, Income, Advertising) based on the training set. Compare, in 10 words, the prior probabilities
and the sample means of the predictors between QDA and LDA.
6. Using estimates in Q5, build the confusion matrix by predicting HighSales for test observations. The
confusion matrix here should differ from the one in Q4. What is the difference in model assumptions
between QDA and LDA that differentiates the predictions? Explain in 20 words.
7. Make the confusion matrix for test observations after fitting a Naive Bayes model using training
observations.
8. Make the confusion matrix for test observations after fitting a K-Nearest Neighbors (KNN) model with
K = 3 to the training observations.
9. Compute the overall test error rates for LDA, QDA, NB, and KNN with K = 3 using the results above.
Which one would you prefer and why? Explain in 10 words.
10. Try KNN with K ∈ {1, ..., 10}. Does any of them give a better fit than the best model in Q9?
Some questions in this quiz use random numbers, which means your code might produce a slightly different
result each time you run it. If your answer doesn’t match the expected output on the first try, simply run
your code again. This is the intended behavior and not a mistake in your solution.
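
The hint in Question 1 about sample() and negative row indexing can be illustrated with a minimal sketch. The data frame toy and its columns id and x are hypothetical stand-ins used only for illustration, not the Carseats data, and the split sizes (7 and 3) are arbitrary. No seed is set, so the selected rows may differ from run to run.

```r
# Hypothetical toy data frame with 10 rows (not the quiz dataset).
toy <- data.frame(id = 1:10, x = (1:10)^2)

# Draw 7 distinct row indexes at random, without replacement.
train <- sample(nrow(toy), 7)

# Rows at the sampled indexes form the training set.
toy_train <- toy[train, ]

# Negative indexing drops those rows, leaving the rest as the test set.
toy_test <- toy[-train, ]

# The two sets are disjoint and together cover every row.
print(nrow(toy_train))
print(nrow(toy_test))
```

Because the indexes are drawn without replacement, no observation can appear in both sets, which is what makes the held-out rows a fair basis for evaluation.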