Rmarkdown代写-STA465/ STA2016|学霸联盟

Rmarkdown代写-STA465/ STA2016

时间：2021-03-07

Homework 3: STA465/ STA2016
Homework 3 is due on Friday, March 12th at 23:59 EST. The homework assignment is worth 20 points in
total.
Question 1 (20 pts)
For this question, we will use the Pennsylvania data set we have already covered in class. More information
on the data and an analysis can be found here: https://journal.r-project.org/archive/2018/RJ-2018-036/RJ-
2018-036.pdf
library(SpatialEpi)
library(tidyverse)
library(sf)
library(spdep)
data(pennLC)
population <- pennLC$data$population
cases <- pennLC$data$cases
n.strata <- 16
E <- expected(population, cases, n.strata)
d <- aggregate(x = pennLC$data$cases, by = list(county = pennLC$data$county), FUN = sum)
# from spatial polygon to simple feature
pennLC.sf <- st_as_sf(pennLC$spatial.polygon)
pennLC.sf$county <- d$county
pennLC.sf$counts <- d$x
pennLC.sf$E <- E[match(pennLC.sf$county, unique(pennLC$data$county))]
pennLC.sf <- merge(pennLC.sf, pennLC$smoking, by.x = "county", by.y = "county")
pennLC.sf <- pennLC.sf%>%
mutate(SIR = counts/E)
The BYM model for the data:
Yi|θi ∼ Po(Eiθi)
log(θi) = β0 + β1 × smokingi + ui + vi
ui|uj∼i ∼ N
(∑
j∼i uj
di,i
,
1
di,iτu
)
vi ∼ N
(
0, 1
τv
)
where β0 is the intercept, β1 relates to the effect of proportion of smokers on lung cancer, τu and τv are the
precision of the spatial and unstructured random effect, respectively. The expected counts per county are
denoted by Ei and total number of neighbors for location ni is denoted as di,i.
1
Question 1.1
What is the geometry type and CRS of the Pennsylvania lip cancer data set (pennLC.sf)? Display counts,
E, smoking, and SIR in a four panel plot.
Question 1.2
We will fit a Besag-York-Mollié model to the data, but first do a few prior predictive checks. The ICAR
structure is an improper and non-generative prior for the spatial random effect of the BYM model. However,
we can impose a sum-to-zero constraint and use an eigenvalue decomposition to be able to generate data
from the ICAR prior.
Use the following two sets of priors to simulate 100 data sets from the prior predictive distribution of the
BYM model:
1. β0 ∼ N(0, 1), β1 ∼ N(0, 1), τv ∼ Gamma(1, 1), τu ∼ Gamma(1, 1)
2. β0 ∼ N(0, 100), β1 ∼ N(0, 100), τv ∼ Gamma(0.1, 0.1), τu ∼ Gamma(0.1, 0.1)
For counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the prior
predictive draws and add a vertical line at the observed value.
Question 1.3
For prior sets 1 and 2 given in Question 1.2, create maps of 4 different prior predictive draws for lung cancer
counts in Pennsylvania. You should have a total of 8 plots.
Question 1.4
Fit the model in INLA using the two different sets of priors in Question 2.2 and using the default priors
specified in INLA (a total of 3 models). Organize the results of the estimated parameters (mean and 95%
credible intervals) in a table. Include R code used.
## Example code for the BYM model -- modify as needed
library(INLA)
## Values of E_i and neighborhood structure
E.penn <- pennLC.sf$E
neighbor.penn <- poly2nb(pennLC.sf)
nb2INLA("npenn.adj", neighbor.penn)
g <- inla.read.graph(filename = "npenn.adj")
pennLC.sf$re_u <- 1:nrow(pennLC.sf)
pennLC.sf$re_v <- 1:nrow(pennLC.sf)
## priors specified through ’hyper’ and ’control.fixed’ arguments
formula <- counts ~ smoking +
f(re_u, model = "besag", graph = g,
hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01)))) +
f(re_v, model = "iid",
2
hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01))))
res <- inla(formula, family = "poisson", data = pennLC.sf, E = E.penn,
control.predictor = list(compute = TRUE),
control.fixed = list(mean.intercept = 0,
prec.intercept = 1,
mean = 0,
prec = 1))
Question 1.5
Create two maps of the results for each fitted model.
• Map 1: mean expected count of lung cancer cases across Pennsylvania.
• Map 2: standard deviation of expected counts of lung cancer cases across Pennsylvania
## Example code for Map 1:
res$summary.fitted.values[,"mean"]
## Example code for Map 2:
res$summary.fitted.values[,"sd"]
Question 1.6
Simulate 100 data sets from the posterior predictive distribution per each of the three fitted models. For
counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the posterior
predictive draws and add a vertical line at the observed value.
Question 1.7
Per each of the three fitted models in Question 1.4, create maps of 4 different posterior predictive draws for
lung cancer counts in Pennsylvania (a total of 12 maps).
Optional: Question 2 (bonus 1 pt)
Let τu = 1, and show that for the ICAR model for a spatial random effect, u, we have that - 12u>[D −W ]u
is equal to - 12
(∑
j∼i(ui − uj)2
)
.
3

学霸联盟