xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

Rmarkdown代写-STA465/ STA2016

时间：2021-03-07

Homework 3: STA465/ STA2016

Homework 3 is due on Friday, March 12th at 23:59 EST. The homework assignment is worth 20 points in

total.

Question 1 (20 pts)

For this question, we will use the Pennsylvania data set we have already covered in class. More information

on the data and an analysis can be found here: https://journal.r-project.org/archive/2018/RJ-2018-036/RJ-

2018-036.pdf

library(SpatialEpi)

library(tidyverse)

library(sf)

library(spdep)

data(pennLC)

population <- pennLC$data$population

cases <- pennLC$data$cases

n.strata <- 16

E <- expected(population, cases, n.strata)

d <- aggregate(x = pennLC$data$cases, by = list(county = pennLC$data$county), FUN = sum)

# from spatial polygon to simple feature

pennLC.sf <- st_as_sf(pennLC$spatial.polygon)

pennLC.sf$county <- d$county

pennLC.sf$counts <- d$x

pennLC.sf$E <- E[match(pennLC.sf$county, unique(pennLC$data$county))]

pennLC.sf <- merge(pennLC.sf, pennLC$smoking, by.x = "county", by.y = "county")

pennLC.sf <- pennLC.sf%>%

mutate(SIR = counts/E)

The BYM model for the data:

Yi|θi ∼ Po(Eiθi)

log(θi) = β0 + β1 × smokingi + ui + vi

ui|uj∼i ∼ N

(∑

j∼i uj

di,i

,

1

di,iτu

)

vi ∼ N

(

0, 1

τv

)

where β0 is the intercept, β1 relates to the effect of proportion of smokers on lung cancer, τu and τv are the

precision of the spatial and unstructured random effect, respectively. The expected counts per county are

denoted by Ei and total number of neighbors for location ni is denoted as di,i.

1

Question 1.1

What is the geometry type and CRS of the Pennsylvania lip cancer data set (pennLC.sf)? Display counts,

E, smoking, and SIR in a four panel plot.

Question 1.2

We will fit a Besag-York-Mollié model to the data, but first do a few prior predictive checks. The ICAR

structure is an improper and non-generative prior for the spatial random effect of the BYM model. However,

we can impose a sum-to-zero constraint and use an eigenvalue decomposition to be able to generate data

from the ICAR prior.

Use the following two sets of priors to simulate 100 data sets from the prior predictive distribution of the

BYM model:

1. β0 ∼ N(0, 1), β1 ∼ N(0, 1), τv ∼ Gamma(1, 1), τu ∼ Gamma(1, 1)

2. β0 ∼ N(0, 100), β1 ∼ N(0, 100), τv ∼ Gamma(0.1, 0.1), τu ∼ Gamma(0.1, 0.1)

For counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the prior

predictive draws and add a vertical line at the observed value.

Question 1.3

For prior sets 1 and 2 given in Question 1.2, create maps of 4 different prior predictive draws for lung cancer

counts in Pennsylvania. You should have a total of 8 plots.

Question 1.4

Fit the model in INLA using the two different sets of priors in Question 2.2 and using the default priors

specified in INLA (a total of 3 models). Organize the results of the estimated parameters (mean and 95%

credible intervals) in a table. Include R code used.

## Example code for the BYM model -- modify as needed

library(INLA)

## Values of E_i and neighborhood structure

E.penn <- pennLC.sf$E

neighbor.penn <- poly2nb(pennLC.sf)

nb2INLA("npenn.adj", neighbor.penn)

g <- inla.read.graph(filename = "npenn.adj")

pennLC.sf$re_u <- 1:nrow(pennLC.sf)

pennLC.sf$re_v <- 1:nrow(pennLC.sf)

## priors specified through ’hyper’ and ’control.fixed’ arguments

formula <- counts ~ smoking +

f(re_u, model = "besag", graph = g,

hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01)))) +

f(re_v, model = "iid",

2

hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01))))

res <- inla(formula, family = "poisson", data = pennLC.sf, E = E.penn,

control.predictor = list(compute = TRUE),

control.fixed = list(mean.intercept = 0,

prec.intercept = 1,

mean = 0,

prec = 1))

Question 1.5

Create two maps of the results for each fitted model.

• Map 1: mean expected count of lung cancer cases across Pennsylvania.

• Map 2: standard deviation of expected counts of lung cancer cases across Pennsylvania

## Example code for Map 1:

res$summary.fitted.values[,"mean"]

## Example code for Map 2:

res$summary.fitted.values[,"sd"]

Question 1.6

Simulate 100 data sets from the posterior predictive distribution per each of the three fitted models. For

counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the posterior

predictive draws and add a vertical line at the observed value.

Question 1.7

Per each of the three fitted models in Question 1.4, create maps of 4 different posterior predictive draws for

lung cancer counts in Pennsylvania (a total of 12 maps).

Optional: Question 2 (bonus 1 pt)

Let τu = 1, and show that for the ICAR model for a spatial random effect, u, we have that - 12u>[D −W ]u

is equal to - 12

(∑

j∼i(ui − uj)2

)

.

3

学霸联盟

Homework 3 is due on Friday, March 12th at 23:59 EST. The homework assignment is worth 20 points in

total.

Question 1 (20 pts)

For this question, we will use the Pennsylvania data set we have already covered in class. More information

on the data and an analysis can be found here: https://journal.r-project.org/archive/2018/RJ-2018-036/RJ-

2018-036.pdf

library(SpatialEpi)

library(tidyverse)

library(sf)

library(spdep)

data(pennLC)

population <- pennLC$data$population

cases <- pennLC$data$cases

n.strata <- 16

E <- expected(population, cases, n.strata)

d <- aggregate(x = pennLC$data$cases, by = list(county = pennLC$data$county), FUN = sum)

# from spatial polygon to simple feature

pennLC.sf <- st_as_sf(pennLC$spatial.polygon)

pennLC.sf$county <- d$county

pennLC.sf$counts <- d$x

pennLC.sf$E <- E[match(pennLC.sf$county, unique(pennLC$data$county))]

pennLC.sf <- merge(pennLC.sf, pennLC$smoking, by.x = "county", by.y = "county")

pennLC.sf <- pennLC.sf%>%

mutate(SIR = counts/E)

The BYM model for the data:

Yi|θi ∼ Po(Eiθi)

log(θi) = β0 + β1 × smokingi + ui + vi

ui|uj∼i ∼ N

(∑

j∼i uj

di,i

,

1

di,iτu

)

vi ∼ N

(

0, 1

τv

)

where β0 is the intercept, β1 relates to the effect of proportion of smokers on lung cancer, τu and τv are the

precision of the spatial and unstructured random effect, respectively. The expected counts per county are

denoted by Ei and total number of neighbors for location ni is denoted as di,i.

1

Question 1.1

What is the geometry type and CRS of the Pennsylvania lip cancer data set (pennLC.sf)? Display counts,

E, smoking, and SIR in a four panel plot.

Question 1.2

We will fit a Besag-York-Mollié model to the data, but first do a few prior predictive checks. The ICAR

structure is an improper and non-generative prior for the spatial random effect of the BYM model. However,

we can impose a sum-to-zero constraint and use an eigenvalue decomposition to be able to generate data

from the ICAR prior.

Use the following two sets of priors to simulate 100 data sets from the prior predictive distribution of the

BYM model:

1. β0 ∼ N(0, 1), β1 ∼ N(0, 1), τv ∼ Gamma(1, 1), τu ∼ Gamma(1, 1)

2. β0 ∼ N(0, 100), β1 ∼ N(0, 100), τv ∼ Gamma(0.1, 0.1), τu ∼ Gamma(0.1, 0.1)

For counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the prior

predictive draws and add a vertical line at the observed value.

Question 1.3

For prior sets 1 and 2 given in Question 1.2, create maps of 4 different prior predictive draws for lung cancer

counts in Pennsylvania. You should have a total of 8 plots.

Question 1.4

Fit the model in INLA using the two different sets of priors in Question 2.2 and using the default priors

specified in INLA (a total of 3 models). Organize the results of the estimated parameters (mean and 95%

credible intervals) in a table. Include R code used.

## Example code for the BYM model -- modify as needed

library(INLA)

## Values of E_i and neighborhood structure

E.penn <- pennLC.sf$E

neighbor.penn <- poly2nb(pennLC.sf)

nb2INLA("npenn.adj", neighbor.penn)

g <- inla.read.graph(filename = "npenn.adj")

pennLC.sf$re_u <- 1:nrow(pennLC.sf)

pennLC.sf$re_v <- 1:nrow(pennLC.sf)

## priors specified through ’hyper’ and ’control.fixed’ arguments

formula <- counts ~ smoking +

f(re_u, model = "besag", graph = g,

hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01)))) +

f(re_v, model = "iid",

2

hyper = list(prec = list(prior = "loggamma",param = c(0.01, 0.01))))

res <- inla(formula, family = "poisson", data = pennLC.sf, E = E.penn,

control.predictor = list(compute = TRUE),

control.fixed = list(mean.intercept = 0,

prec.intercept = 1,

mean = 0,

prec = 1))

Question 1.5

Create two maps of the results for each fitted model.

• Map 1: mean expected count of lung cancer cases across Pennsylvania.

• Map 2: standard deviation of expected counts of lung cancer cases across Pennsylvania

## Example code for Map 1:

res$summary.fitted.values[,"mean"]

## Example code for Map 2:

res$summary.fitted.values[,"sd"]

Question 1.6

Simulate 100 data sets from the posterior predictive distribution per each of the three fitted models. For

counties centre, monroe, westmoreland, lebanon, beaver, and erie, make a histogram of the posterior

predictive draws and add a vertical line at the observed value.

Question 1.7

Per each of the three fitted models in Question 1.4, create maps of 4 different posterior predictive draws for

lung cancer counts in Pennsylvania (a total of 12 maps).

Optional: Question 2 (bonus 1 pt)

Let τu = 1, and show that for the ICAR model for a spatial random effect, u, we have that - 12u>[D −W ]u

is equal to - 12

(∑

j∼i(ui − uj)2

)

.

3

学霸联盟