Date: 2021-03-16

MAST90084: Statistical Modelling Assignment 1

1. Let $X$ and $Y$ be two categorical random variables, $X$ with $I$ different categories identified with the set $\{1, \ldots, I\}$ and $Y$ with $J$ different categories identified with the set $\{1, \ldots, J\}$. Suppose observations of the variable pair $(X, Y)$ are tabulated in an $I \times J$ contingency table. Using standard notation, for a given $(i, j) \in \{1, \ldots, I\} \times \{1, \ldots, J\}$, $n_{ij}$ is the entry in the $(i, j)$-th cell, which denotes the count of observations with $X$ equal to its $i$-th category and $Y$ equal to its $j$-th category. A Poisson sampling model for the contingency table assumes that the $n_{ij}$'s are independently distributed with

$$n_{ij} \sim \mathrm{Poi}(\mu_{ij}),$$

where $\mu_{ij}$ denotes the Poisson mean for the cell count $n_{ij}$.

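(a) Derive the conditional joint distribution of $\{n_{ij}\}_{(i,j) \in \{1, \ldots, I\} \times \{1, \ldots, J\}}$ given the total count $n = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij}$. Identify the name of this distribution, and explicitly state its parameter values in terms of $\{\mu_{ij}\}_{(i,j) \in \{1, \ldots, I\} \times \{1, \ldots, J\}}$ and $n$. [5]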
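As a numerical sanity check (an illustration, not a derivation), the R sketch below compares $P(\{n_{ij}\} \mid n)$, computed directly from independent Poisson probabilities, with a multinomial probability whose cell probabilities are $\mu_{ij} / \sum_{i,j} \mu_{ij}$, the classical result for Poisson sampling conditioned on the total. The means `mu` and counts `tab` are hypothetical values chosen for illustration, not data from the assignment.

```r
# Hypothetical Poisson means for a 2 x 3 table (illustrative values only)
mu <- matrix(c(2.0, 3.5, 1.2, 4.1, 0.7, 2.5), nrow = 2)
# Hypothetical observed counts for the same table
tab <- matrix(c(1, 4, 0, 3, 2, 2), nrow = 2)
n <- sum(tab)

# P({n_ij} | n) = P({n_ij}) / P(n), because the event {n_ij} already fixes the
# total; the numerator uses independence of the cells, and the total n is
# itself Poisson with mean sum(mu)
p_cond <- prod(dpois(tab, mu)) / dpois(n, sum(mu))

# Multinomial probability with cell probabilities pi_ij = mu_ij / sum(mu)
p_mult <- dmultinom(as.vector(tab), size = n, prob = as.vector(mu) / sum(mu))

c(conditional = p_cond, multinomial = p_mult)
```

Changing `mu` or `tab` should leave the two printed probabilities equal up to floating-point error.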

(b) Let $I = J = 2$. The quantity

$$\frac{\mu_{11}/\mu_{12}}{\mu_{21}/\mu_{22}} = \frac{\mu_{11}\,\mu_{22}}{\mu_{12}\,\mu_{21}},$$

also known as the odds ratio, measures the association between $X$ and $Y$. What should the value of the odds ratio be if $X$ and $Y$ are independent, and why? [3]
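A small R illustration, assuming independence is expressed in the usual way as $\mu_{ij} = n\,\pi_{i+}\,\pi_{+j}$; the total `n` and the marginal probabilities `pi_row` and `pi_col` are hypothetical values. Building the table of means as an outer product shows the row and column factors cancelling in the odds ratio.

```r
n      <- 200          # hypothetical total count
pi_row <- c(0.3, 0.7)  # hypothetical marginal probabilities of X
pi_col <- c(0.6, 0.4)  # hypothetical marginal probabilities of Y

# Under independence each cell mean factorises: mu_ij = n * pi_i+ * pi_+j
mu <- n * outer(pi_row, pi_col)

# Odds ratio (mu_11 / mu_12) / (mu_21 / mu_22)
odds_ratio <- (mu[1, 1] / mu[1, 2]) / (mu[2, 1] / mu[2, 2])
odds_ratio
```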

2. Data in the following 2 × 2 × 3 contingency table were used to study the effect of passive smoking on lung cancer. The table summarizes the results of case-control studies from 3 countries for nonsmoking women married to smokers. (Source: Blot and Fraumeni, J. Nat. Cancer Inst., 77:993-1000 (1986) and Agresti (1996).)

Country   Spouse smoked   Cases   Controls
Japan     No                 21         82
          Yes                73        188
UK        No                  5         16
          Yes                19         38
USA       No                 71        249
          Yes               137        363

(a) A log-linear model, mod1, was fitted to the data, with the results given in the following R output. Give the mathematical formula, of the form $\ln(\mu) = \cdots$, for the mean model of mod1, where $\mu$ is the mean of the response. Any dummy variables in your formula should be explicitly defined. [5]

> pasSmoking.dat=data.frame(freq=c(21,73,5,19,71,137,82,188,16,38,249,363))
> pasSmoking.dat$Cnt=factor(rep(c("Japan","UK", "USA"), times=2, each=2))
> pasSmoking.dat$Smo=factor(rep(c("No","Yes"), times=6))
> pasSmoking.dat$Can=factor(rep(c("Case","Control"), each=6))
> pasSmoking.dat
   freq   Cnt Smo     Can
1    21 Japan  No    Case
2    73 Japan Yes    Case
3     5    UK  No    Case
4    19    UK Yes    Case
5    71   USA  No    Case
6   137   USA Yes    Case
7    82 Japan  No Control
8   188 Japan Yes Control
9    16    UK  No Control
10   38    UK Yes Control
11  249   USA  No Control
12  363   USA Yes Control

MAST90084 Statistical Modelling Assignment 1 Semester 1, 2021

> mod1=glm(freq~Cnt+Smo+Can+Cnt:Smo+Cnt:Can+Smo:Can, family=poisson, data=pasSmoking.dat)
> anova(mod1, test="Chisq")
Analysis of Deviance Table; Model: poisson; Link: log; Response: freq
Terms added sequentially (first to last)

        Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                       11    1168.85
Cnt      2   726.43         9     442.42 < 2.2e-16
Smo      1   112.52         8     329.90 < 2.2e-16
Can      1   307.56         7      22.34 < 2.2e-16
Cnt:Smo  2    15.50         5       6.84 0.0004316
Cnt:Can  2     1.05         3       5.80 0.5919109
Smo:Can  1     5.56         2       0.24 0.0184215
> 1-pchisq(0.24,2)
[1] 0.8869204
> 1-pchisq(5.80,3)
[1] 0.1217566
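To see which dummy variables R constructs for mod1, the design matrix can be inspected. This sketch rebuilds the data exactly as in the session above; under R's default treatment contrasts the first level of each factor (alphabetical here: Japan, No, Case) is the baseline, and each remaining level gets a 0/1 indicator column, with interactions formed as products of indicators.

```r
# Rebuild the data exactly as in the R session above
pasSmoking.dat <- data.frame(freq = c(21, 73, 5, 19, 71, 137, 82, 188, 16, 38, 249, 363))
pasSmoking.dat$Cnt <- factor(rep(c("Japan", "UK", "USA"), times = 2, each = 2))
pasSmoking.dat$Smo <- factor(rep(c("No", "Yes"), times = 6))
pasSmoking.dat$Can <- factor(rep(c("Case", "Control"), each = 6))

mod1 <- glm(freq ~ Cnt + Smo + Can + Cnt:Smo + Cnt:Can + Smo:Can,
            family = poisson, data = pasSmoking.dat)

# Design matrix: intercept, one indicator per non-baseline level,
# and products of indicators for the two-way interactions
X <- model.matrix(mod1)
colnames(X)
coef(mod1)
```

The 10 columns (12 cells minus 2 residual degrees of freedom) correspond one-to-one with the coefficients printed by `coef(mod1)`.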

(b) Extending the notation from Question 1, for the current contingency table we can also use $n_{ijk}$ to denote the count in each cell, where $i \in \{1, 2\}$, $j \in \{1, 2\}$ and $k \in \{1, 2, 3\}$ are indices corresponding to Can (variable $X$), Smo (variable $Y$) and Cnt (variable $Z$), respectively. Moreover, if the $n_{ijk}$ are independently distributed with

$$n_{ijk} \sim \mathrm{Poi}(\mu_{ijk}),$$

one can, for any $k \in \{1, 2, 3\}$, define the odds ratio

$$\theta_{XY(k)} = \frac{\mu_{11k}\,\mu_{22k}}{\mu_{12k}\,\mu_{21k}}$$

for the partial table with $Z = k$. The table is said to have homogeneous $XY$ association when $\theta_{XY(1)} = \theta_{XY(2)} = \theta_{XY(3)}$. Explain why the model in part (a) has homogeneous $XY$ association. [5]
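A numerical illustration, refitting mod1 as in the session above: the sketch computes $\theta_{XY(k)}$ for each country from the observed counts and from the fitted means of mod1. The helper `partial_or` is hypothetical, written only for this illustration. The observed partial odds ratios differ across countries, while the fitted ones coincide and equal exp of the `SmoYes:CanControl` coefficient, since all other terms cancel in the log odds ratio when there is no Cnt:Smo:Can term.

```r
# Rebuild the data and refit mod1 as in the R session above
pasSmoking.dat <- data.frame(freq = c(21, 73, 5, 19, 71, 137, 82, 188, 16, 38, 249, 363))
pasSmoking.dat$Cnt <- factor(rep(c("Japan", "UK", "USA"), times = 2, each = 2))
pasSmoking.dat$Smo <- factor(rep(c("No", "Yes"), times = 6))
pasSmoking.dat$Can <- factor(rep(c("Case", "Control"), each = 6))
mod1 <- glm(freq ~ Cnt + Smo + Can + Cnt:Smo + Cnt:Can + Smo:Can,
            family = poisson, data = pasSmoking.dat)

# theta_XY(k) = mu_11k * mu_22k / (mu_12k * mu_21k), with i indexing Can
# (1 = Case, 2 = Control) and j indexing Smo (1 = No, 2 = Yes)
partial_or <- function(values, data) {
  sapply(levels(data$Cnt), function(k) {
    cell <- function(can, smo) {
      values[data$Cnt == k & data$Can == can & data$Smo == smo]
    }
    cell("Case", "No") * cell("Control", "Yes") /
      (cell("Case", "Yes") * cell("Control", "No"))
  })
}

observed_or <- partial_or(pasSmoking.dat$freq, pasSmoking.dat)
fitted_or   <- partial_or(as.numeric(fitted(mod1)), pasSmoking.dat)

observed_or                              # differs across countries
fitted_or                                # identical across countries
exp(coef(mod1)[["SmoYes:CanControl"]])   # the common fitted odds ratio
```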

(c) Based on the R output displayed in (a), test the significance of the interaction effect Smo:Can at significance level 0.05, after eliminating the effects of all other terms in mod1. Provide your conclusion with a clear explanation. [4]
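Because Smo:Can is the last term entered, its sequential deviance in the table above is the likelihood-ratio statistic comparing mod1 with the model that drops only Smo:Can. A minimal sketch of the corresponding p-value and 5% critical value, using the rounded deviance 5.56 on 1 df read off the output (so the p-value agrees with the printed 0.0184215 only up to that rounding):

```r
# Likelihood-ratio (deviance) statistic and df for Smo:Can, taken from the
# "Smo:Can" row of the analysis-of-deviance table above
lr_stat <- 5.56
lr_df   <- 1

p_value  <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
critical <- qchisq(0.95, df = lr_df)   # 5% critical value, about 3.84

c(statistic = lr_stat, critical = critical, p.value = p_value)
```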

(d) Based on the R output displayed in (a), test the adequacy of the model Cnt+Smo+Can+Cnt:Smo+Cnt:Can at significance level 0.05. Provide your conclusion with a clear explanation. [4]
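One way to sketch this goodness-of-fit check in R is to fit the reduced model directly (named `mod0` here purely for illustration) and compare its residual deviance, which tests it against the saturated model, with a chi-squared distribution on its residual degrees of freedom. This reproduces the `Cnt:Can` row of the sequential table (about 5.80 on 3 df) and the `1-pchisq(5.80,3)` call above.

```r
# Rebuild the data exactly as in the R session above
pasSmoking.dat <- data.frame(freq = c(21, 73, 5, 19, 71, 137, 82, 188, 16, 38, 249, 363))
pasSmoking.dat$Cnt <- factor(rep(c("Japan", "UK", "USA"), times = 2, each = 2))
pasSmoking.dat$Smo <- factor(rep(c("No", "Yes"), times = 6))
pasSmoking.dat$Can <- factor(rep(c("Case", "Control"), each = 6))

# Hypothetical name: the model without the Smo:Can term
mod0 <- glm(freq ~ Cnt + Smo + Can + Cnt:Smo + Cnt:Can,
            family = poisson, data = pasSmoking.dat)

# Goodness of fit against the saturated model: residual deviance ~ chi^2(resid. df)
gof_p <- pchisq(deviance(mod0), df = df.residual(mod0), lower.tail = FALSE)
c(deviance = deviance(mod0), df = df.residual(mod0), p.value = gof_p)
```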

(e) Are your conclusions in (c) and (d) contradictory? You must give an explanation to receive any marks. [5]

3. A variable $Y$ taking values in $\{0, 1, 2, \ldots\}$ has a Negative Binomial (NB) distribution if its probability mass function has the form

$$p(Y = y; \mu, \kappa) = \frac{\Gamma(\kappa + y)}{\Gamma(\kappa)\, y!} \, \frac{\kappa^{\kappa} \mu^{y}}{(\mu + \kappa)^{\kappa + y}},$$

for $y = 0, 1, \ldots$, where $\mu$ is the mean of $Y$.
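A quick R check that the formula above is the same distribution that R's `dnbinom()` provides through its `size`/`mu` parameterisation, with `size` playing the role of $\kappa$. The function name `nb_pmf` and the values of `mu` and `kappa` are illustrative choices.

```r
# NB probability mass function exactly as written above, computed on the
# log scale with lgamma() for numerical stability
nb_pmf <- function(y, mu, kappa) {
  exp(lgamma(kappa + y) - lgamma(kappa) - lgamma(y + 1) +
        kappa * log(kappa) + y * log(mu) - (kappa + y) * log(mu + kappa))
}

y     <- 0:20
mu    <- 3.2   # illustrative mean
kappa <- 1.5   # illustrative kappa

# Largest discrepancy from R's built-in NB density (should be near zero)
max(abs(nb_pmf(y, mu, kappa) - dnbinom(y, size = kappa, mu = mu)))

# The mean of the distribution, summing far into the tail, should be close to mu
sum(nb_pmf(0:1000, mu, kappa) * (0:1000))
```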

(a) When $\kappa$ is considered fixed (or known), the NB distribution belongs to the exponential dispersion model (EDM) family discussed in class. Write out its form as an EDM explicitly. In particular, you have to identify the natural parameter $\theta$ and the dispersion parameter $\phi$ in terms of $\mu$ and $\kappa$ where appropriate, and identify $b(\cdot)$ as a function of $\theta$. You can simply take the weight $\omega$ to be 1. [5]
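A way to check a candidate EDM form numerically, sketched under the assumption $\theta = \log\{\mu/(\mu+\kappa)\}$, $b(\theta) = -\kappa \log(1 - e^{\theta})$ and $\phi = 1$ (one standard choice; compare it with your own derivation). If the form is right, $\log p(y) - \{y\theta - b(\theta)\}$ depends on $y$ only, and $b'(\theta)$ returns the mean $\mu$. The names `theta`, `b` and `c_term` are illustrative.

```r
kappa <- 1.5                                     # treated as known

theta <- function(mu) log(mu / (mu + kappa))     # candidate natural parameter
b     <- function(th) -kappa * log(1 - exp(th))  # candidate cumulant function

# c(y) = log p(y) - (y * theta - b(theta)) must not depend on mu:
# evaluate it at two different means and compare
c_term <- function(y, mu) {
  dnbinom(y, size = kappa, mu = mu, log = TRUE) - (y * theta(mu) - b(theta(mu)))
}
y <- 0:15
max(abs(c_term(y, mu = 0.8) - c_term(y, mu = 6.0)))

# b'(theta) should equal the mean; central difference at mu = 2.5
h   <- 1e-5
th0 <- theta(2.5)
(b(th0 + h) - b(th0 - h)) / (2 * h)
```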

(b) Let $\sigma^2$ be the variance of $Y$. From your answer above, derive the formula for $\sigma^2$ as a function of $\mu$. Why do we say that the NB distribution can be used as a likelihood model to handle “overdispersion” compared to the Poisson distribution? [4]
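A numerical sketch, for illustrative values of `mu` and `kappa`, that computes the NB variance directly from the probability mass function and compares it with $\mu + \mu^2/\kappa$ (the form obtained from $b''(\theta)$ under the parameterisation assumed in the previous check) and with the Poisson variance $\mu$.

```r
mu    <- 2.5   # illustrative mean
kappa <- 1.5   # illustrative kappa

# Variance computed directly from the pmf, truncating the negligible far tail
y      <- 0:2000
p      <- dnbinom(y, size = kappa, mu = mu)
nb_var <- sum((y - mu)^2 * p)

c(poisson_var = mu, nb_var = nb_var, formula = mu + mu^2 / kappa)

# As kappa grows, mu + mu^2 / kappa shrinks towards the Poisson variance mu
sapply(c(1, 10, 100, 1e4), function(k) mu + mu^2 / k)
```

For any finite $\kappa$ the extra term $\mu^2/\kappa$ is positive, so the variance exceeds the mean.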


Total marks = 40
