R代写-S2

Week Four Section 1 Proof of S2 = 1 n− 1 n∑ i=1 ( Xi − X¯ )2 is an unbised estimator; and S2∗ = 1n ∑n i=1 ( Xi − X¯ )2 is biased. 1 Section 2 library(tidyverse) library(patchwork) set.seed(100) # Functions to compute the estimators T1 <- function(x) var(x) T2 <- function(x) (n-1)/n*var(x) n <- 100 M <- 1000 # One million simulations thesimulations <- list( T1 = numeric(M), T2 = numeric(M) ) for (i in 1:M) { # Do the simulation # Sample from a discrete uniform (?sample.int): thesample <- rnorm(n, mean = 0, sd = sqrt(4)) # Record the values of the two estimators: thesimulations\$T1[i] <- T1(thesample) thesimulations\$T2[i] <- T2(thesample) } # Evaluate the bias of T1 and T2 mean(thesimulations\$T1) - 4 ## [1] -0.005560429 mean(thesimulations\$T2) - 4 ## [1] -0.04550482 Finally, let’s look at the sampled values of T1, T2 # Recreate the plots in Figure 20.1: leftplot <- tibble(T1 = thesimulations\$T1) %>% ggplot(aes(x = T1)) + theme_classic() + geom_histogram(aes(y = ..density..),bins = 30,colour = "black",fill = "transparent") rightplot <- tibble(T2 = thesimulations\$T2) %>% ggplot(aes(x = T2)) + theme_classic() + geom_histogram(aes(y = ..density..),bins = 30,colour = "black",fill = "transparent") + geom_vline(xintercept = 4, color = "red") + #optional (sta130 code) geom_vline(xintercept = mean(thesimulations\$T2), color = "blue") #optional (sta130 code) # google ggplot tag; caption, labs .... leftplot| rightplot 2 0.0 0.2 0.4 0.6 0.8 3 4 5 6 T1 de ns ity 0.0 0.2 0.4 0.6 0.8 3 4 5 6 T2 de ns ity Mean Squared Error mse <- function(x) var(x) + (mean(x) - 4)^2 Variance of the 3 estimators: var(thesimulations\$T1) ## [1] 0.351997 var(thesimulations\$T2) ## [1] 0.3449923 MSE mse(thesimulations\$T1) ## [1] 0.352028 mse(thesimulations\$T2) ## [1] 0.347063 Tables MSE T1 0.352028 T2 0.347063 There are many other options, google it!! 3 Section 3 Linear Regression Let’s perform a linear regression analysis using the coffee ratings data that was introduced in week 1. Load in the coffee ratings data here: Let’s start off by plotting some variables against each other to see what might make for a good linear relationship (we want something appropriate both practically and statistically). # Using Tidyverse coffee_ratings %>% ggplot(aes(x = aroma,y = acidity)) + theme_classic() + geom_point() + scale_x_continuous(breaks = seq(0,10,by=1)) + scale_y_continuous(breaks = seq(0,10,by=1)) + coord_cartesian(xlim = c(5,9),ylim = c(5,9)) + labs(x = "aroma", y = "acidity", title = "Scatterplot") 5 6 7 8 9 5 6 7 8 9 aroma a ci di ty Scatterplot So our model is as follows: Yi = α+ βxi + Ui or (some notation also use Yi = β0 + β1xi + i ). Later on we will learn about how to estimate these parameters, but for now let’s let R do the work for us. 4 model <- lm(acidity ~ aroma, data = coffee_ratings) model ## ## Call: ## lm(formula = acidity ~ aroma, data = coffee_ratings) ## ## Coefficients: ## (Intercept) aroma ## 2.9006 0.6129 Now let’s re-plot the original the original scatterplot but with the affiliated line. # Using Tidyverse coffee_ratings %>% ggplot(aes(x = aroma,y = acidity)) + theme_classic() + geom_point() + scale_x_continuous(breaks = seq(0,10,by=1)) + scale_y_continuous(breaks = seq(0,10,by=1)) + coord_cartesian(xlim = c(5,9),ylim = c(5,9)) + labs(x = "aroma", y = "acidity", title = "Scatterplot") + geom_abline(slope = 0.6129, intercept = 2.9006, col="red") 5 6 7 8 9 5 6 7 8 9 aroma a ci di ty Scatterplot C <- coffee_ratings %>% ggplot(aes(x = aroma,y = acidity)) + 5 theme_classic() + geom_point() + scale_x_continuous(breaks = seq(0,10,by=1)) + scale_y_continuous(breaks = seq(0,10,by=1)) + coord_cartesian(xlim = c(5,9),ylim = c(5,9)) + labs(x = "aroma", y = "acidity", title = "Scatterplot") C + geom_smooth(method = "lm", se = FALSE,colour = "black",size = .5) ## `geom_smooth()` using formula 'y ~ x' 5 6 7 8 9 5 6 7 8 9 aroma a ci di ty Scatterplot 6