R assignment writing service: MTHM506

Date: 2021-01-16

MTHM506 Statistical Data Modelling

Problem Sheet 1

(Covers Topics 1-2)

You should attempt all questions on this sheet. The questions constitute both summative (indicated by marks) and formative assessment. Marks achieved in this assignment will contribute 25% of the final module mark. Solutions are expected to be clearly explained, concise, well structured and well presented. Give R input commands for each model fitted (e.g. ‘model <- glm(...)’). Do not display too much raw R output as part of your solutions (e.g. don’t display the full output of ‘summary(model)’), but edit this down to the essentials. All plots should have titles and appropriately labelled axes. Handwritten solutions will be accepted, but a more professional word-processed submission is preferred (as is a mixture of the two).

Topic 1

1. In the lecture notes, there is an example relating to a study of the number of fish of differing sizes caught in a net of a particular mesh size. We developed a binomial GLM for the probability of escape from the net with a single predictor x, the fish length in cm. The model for the data $y_1, \ldots, y_n$ (fish escaping) is

$$Y_i \sim \mathrm{Bin}(N_i, p_i), \quad Y_i \text{ independent},$$

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i,$$

where $N_i$ (total fish number) has also been observed and $x_i$ is fish length. The notes contain details of the relevant log-likelihood and code: the R script topic1.R directly maximises the log-likelihood of this binomial model for the data set fish using the R optimiser function nlm(), and reports the maximum likelihood estimates for $\beta_0$ and $\beta_1$ along with their standard errors and other model diagnostics.

(a) Extend the R code in order to fit the modified model:

$$Y_i \sim \mathrm{Bin}(N_i, p_i),$$

$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_i + \beta_2 x_i^2.$$

Report the maximum likelihood estimates for $\beta_0$, $\beta_1$ and $\beta_2$ along with their standard errors and the AIC for the modified model.

(b) Calculate the log likelihood ratio statistic relevant to a formal comparison of difference in fit between

the original and the modified model. Perform a formal test of the hypothesis that there is no significant

difference in fit between the original and the modified model.
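The workflow in parts (a) and (b) can be sketched as follows. This is a minimal illustration and not the contents of topic1.R: the data are simulated stand-ins for fish, the helper negll() and all variable names are hypothetical, and the length is centred and rescaled (an assumption made here only to keep the optimiser well conditioned, so the coefficients are not on the scale of raw length in cm).

```r
# Simulated stand-in for the 'fish' data (hypothetical values, not the module data)
set.seed(1)
len <- seq(20, 40, by = 2)            # fish length in cm
x   <- (len - 30) / 10                # centred and rescaled length (assumption)
N   <- rep(50, length(x))             # total number of fish per length class
y   <- rbinom(length(x), size = N, prob = plogis(0.5 - 2 * x))  # fish escaping

# Negative log-likelihood of a binomial model with logit link;
# X is the design matrix, beta the coefficient vector
negll <- function(beta, X, y, N) {
  eta <- as.vector(X %*% beta)
  -sum(dbinom(y, size = N, prob = plogis(eta), log = TRUE))
}

X1 <- cbind(1, x)                     # original model: beta0 + beta1 * x
X2 <- cbind(1, x, x^2)                # modified model: adds beta2 * x^2

fit1 <- nlm(negll, p = c(0, 0), X = X1, y = y, N = N, hessian = TRUE)
# Start the larger model at the smaller model's optimum with beta2 = 0
fit2 <- nlm(negll, p = c(fit1$estimate, 0), X = X2, y = y, N = N, hessian = TRUE)

# Standard errors from the inverse Hessian of the negative log-likelihood
se2  <- sqrt(diag(solve(fit2$hessian)))
aic2 <- 2 * fit2$minimum + 2 * length(fit2$estimate)

# Likelihood ratio statistic and its p-value (one extra parameter, so df = 1)
lrt  <- 2 * (fit1$minimum - fit2$minimum)
pval <- pchisq(lrt, df = 1, lower.tail = FALSE)

print(cbind(estimate = fit2$estimate, se = se2))
print(c(AIC = aic2, LRT = lrt, p.value = pval))
```

Because dbinom() includes the binomial coefficient, the AIC computed this way should be directly comparable with the value reported by glm() for the same model.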

2. The module’s R workspace contains a data frame nlmodel which involves data on a response variable y and a single explanatory variable x. A scatter plot of y versus x suggests a strong non-linear relationship. Suppose for these data we wish to consider the model:

$$Y_i \sim N\left(\frac{\theta_1 x_i}{\theta_2 + x_i},\ \sigma^2\right), \quad Y_i \text{ independent}.$$

(a) Why can’t this model be fitted as a linear (regression) model? (1)

(b) Write down the likelihood $L(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$ and the log-likelihood $\ell(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$. (2)

(c) Write an R function mylike() which evaluates $-\ell(\theta_1, \theta_2, \sigma^2; \mathbf{y}, \mathbf{x})$ for any values of the three parameters. (1)

(d) Use the R function nlm() in association with your function mylike() to numerically minimise the negative log-likelihood. Provide some evidence of how you chose sensible starting values. (3)

(e) Report the maximum likelihood estimates of the parameters and superimpose a plot of the associated

mean relationship on a scatter plot of y versus x. (2)

(f) Report the standard errors for θ1 and θ2, and use those to construct 95% confidence intervals. (5)

(g) Test the hypothesis that θ2 = 0.08 at the 5% significance level (not using the confidence interval) and

compute the associated p-value of the test. (3)

(h) Use plug-in prediction to construct and plot 95% prediction intervals. (4)

(21)
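The steps in parts (c) to (h) can be sketched as below. This is a hedged illustration on simulated data, not a model answer: the data and "true" parameter values are hypothetical, the argument names yobs and xobs are illustrative, and the variance is optimised on the log scale (an assumption made here so that it stays positive), which means the third parameter is $\log\sigma^2$ rather than $\sigma^2$.

```r
# Simulated stand-in for 'nlmodel' (hypothetical parameter values)
set.seed(2)
x <- runif(60, min = 0.02, max = 1.1)
y <- 2 * x / (0.08 + x) + rnorm(60, sd = 0.1)

# Negative log-likelihood; par = (theta1, theta2, log(sigma^2))
mylike <- function(par, yobs, xobs) {
  mu <- par[1] * xobs / (par[2] + xobs)
  -sum(dnorm(yobs, mean = mu, sd = sqrt(exp(par[3])), log = TRUE))
}

# Starting values read off the scatter plot: theta1 is the plateau of y,
# theta2 is the x value at which y reaches about half the plateau
start <- c(max(y), x[which.min(abs(y - max(y) / 2))], log(var(y)))

fit <- nlm(mylike, p = start, yobs = y, xobs = x, hessian = TRUE)
est <- fit$estimate
se  <- sqrt(diag(solve(fit$hessian)))   # from the inverse observed information

# 95% Wald confidence intervals for theta1 and theta2
ci <- cbind(lower = est[1:2] - 1.96 * se[1:2],
            upper = est[1:2] + 1.96 * se[1:2])

# Wald test of H0: theta2 = 0.08 against a two-sided alternative
z    <- (est[2] - 0.08) / se[2]
pval <- 2 * pnorm(-abs(z))

# Plug-in 95% prediction band on a grid of x values
xg    <- seq(min(x), max(x), length.out = 100)
mug   <- est[1] * xg / (est[2] + xg)
sigma <- sqrt(exp(est[3]))
band  <- cbind(lower = mug - 1.96 * sigma, upper = mug + 1.96 * sigma)

print(ci)
print(c(z = z, p.value = pval))
```

The fitted curve and band can then be superimposed on plot(x, y) with lines(xg, mug) and lines(xg, band[, "lower"]), lines(xg, band[, "upper"]). The plug-in band ignores the uncertainty in the parameter estimates.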

3. The dataframe sexr contains a variable ratio which is the ratio of male births to female births across various countries. Interest lies in seeing how this relates to a deprivation measure, namely infant mortality tinmort, so it was suggested to use a Gamma distribution (since the ratio is strictly positive) with a mean that is a function of tinmort. Specifically:

$$Y_i \sim \mathrm{Gamma}(\mu_i, \lambda), \quad Y_i \text{ independent},$$

$$\log(\mu_i) = \beta_0 + \beta_1 x_i,$$

where $x_i$ is the infant mortality (tinmort) and where the pdf of the Gamma is

$$p(y_i) = \frac{y_i^{\alpha - 1} e^{-y_i/\lambda_i}}{\Gamma(\alpha)\, \lambda_i^{\alpha}}, \quad y_i > 0, \tag{1}$$

which has mean $E[Y_i] = \mu_i = \alpha\lambda_i$ and variance $\mathrm{var}[Y_i] = \sigma_i^2 = \alpha\lambda_i^2$. Parameter $\alpha$ is called the shape parameter, while $\lambda_i$ is the scale parameter.

(a) Write down the likelihood and the log-likelihood of this model.

(b) Use R to estimate the unknown parameters and report the standard errors.

(c) Plot the estimated relationship.

(d) Test the hypothesis that β1 = 0.

(e) Predict the ratio for the infant mortality value of 0.4 and report a 95% prediction interval.
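One possible way to organise parts (b), (d) and (e) is sketched below, using the relation $\lambda_i = \mu_i / \alpha$ implied by equation (1). The data are simulated (the sexr data frame is not reproduced here), the "true" values are hypothetical, and the shape is optimised as $\log\alpha$, an assumption made here to keep it positive.

```r
# Simulated stand-in for 'sexr' (hypothetical values)
set.seed(5)
n       <- 100
tinmort <- runif(n, 0, 1)
ratio   <- rgamma(n, shape = 400, scale = exp(0.05 - 0.02 * tinmort) / 400)

# Negative log-likelihood; par = (beta0, beta1, log(alpha)),
# with scale lambda_i = mu_i / alpha so that E[Y_i] = mu_i
negll <- function(par, yobs, xobs) {
  mu    <- exp(par[1] + par[2] * xobs)
  alpha <- exp(par[3])
  -sum(dgamma(yobs, shape = alpha, scale = mu / alpha, log = TRUE))
}

# Starting values: regression of log(ratio) and a moment estimate of alpha
b0    <- coef(lm(log(ratio) ~ tinmort))
start <- c(b0, log(mean(ratio)^2 / var(ratio)))

fit <- nlm(negll, p = start, yobs = ratio, xobs = tinmort, hessian = TRUE)
est <- fit$estimate
se  <- sqrt(diag(solve(fit$hessian)))

# Wald test of H0: beta1 = 0
z    <- est[2] / se[2]
pval <- 2 * pnorm(-abs(z))

# Plug-in prediction at tinmort = 0.4 with a 95% prediction interval
mu0   <- exp(est[1] + est[2] * 0.4)
alpha <- exp(est[3])
pi95  <- qgamma(c(0.025, 0.975), shape = alpha, scale = mu0 / alpha)

print(cbind(estimate = est, se = se))
print(c(prediction = mu0, lower = pi95[1], upper = pi95[2], p.value = pval))
```

The estimates of $\beta_0$ and $\beta_1$ from this direct maximisation should agree closely with glm(ratio ~ tinmort, family = Gamma(link = "log")), which provides a convenient cross-check.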

Topic 2

4. In the lecture slides for Topic 2, we argued that the Normal, Binomial and Poisson distributions all belong to the exponential family

$$p(y; \theta, \phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right\}.$$

Show that this is the case by using the fact that a mathematical function f(x) can be written as exp{log(f(x))}.
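As an illustration of the technique for one of the three cases, the Poisson probability mass function can be rewritten as follows. This is a worked sketch using the standard Poisson form; writing the mean as $\lambda$ is an assumption here, chosen to match the notation of Question 7.

$$p(y; \lambda) = \frac{\lambda^{y} e^{-\lambda}}{y!} = \exp\left\{\log\left(\frac{\lambda^{y} e^{-\lambda}}{y!}\right)\right\} = \exp\left\{y \log\lambda - \lambda - \log y!\right\}.$$

Matching terms with the general form gives $\theta = \log\lambda$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$ and $c(y, \phi) = -\log y!$. As a check, $b'(\theta) = e^{\theta} = \lambda$ recovers the mean and $b''(\theta)\, a(\phi) = \lambda$ recovers the variance. The Normal and Binomial cases follow the same pattern.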

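5. Consider the Gamma distribution

$$p(y) = \frac{y^{\alpha - 1} e^{-y/\lambda}}{\Gamma(\alpha)\, \lambda^{\alpha}}, \quad y > 0,$$

which has mean $E[Y] = \mu = \alpha\lambda$ and variance $\mathrm{var}[Y] = \sigma^2 = \alpha\lambda^2$.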

(a) Rewrite the probability distribution p(y) in terms of the mean µ and parameter α. (1)

(b) Thus show that it belongs to the exponential family, identifying parameters θ and functions a(·), b (·) and

c (·) in terms of α and µ. (3)

(c) Verify the expressions for the mean and the variance of the distribution using the associated general

results for the exponential family. (3)

(7)

6. The module workspace contains a data frame dicentric which presents data from an experiment conducted to determine the effect of gamma radiation on the numbers of chromosomal abnormalities ca observed. The number of cells exposed (in hundreds) differs between runs and is contained in the variable cells. The dose amount doseamt and the rate doserate at which the dose is applied are the predictors of interest.

(a) Directly model the rate, i.e. number of abnormalities per cell by using a Poisson GLM with response ca,

with a log link, with doseamt and doserate as predictors and with a suitable offset involving the numbers

of cells (cells). Comment on the model summary - parameter estimates, standard errors, model fit and

residual plots.

(b) If your final model from (a) is substantially over-dispersed, then fit a quasi-Poisson model instead and

comment on the differences between that and the results from (a).

(c) Alternatively, fit a Negative Binomial model and comment on the differences between that and the results

from (a).

(d) It was suggested that there might be an interaction between dose rate and dose amount, on the basis

that the effect of dose rate is not independent of dose amount (and vice versa). Fit a Poisson GLM with

the interaction, and also non-linear effects if deemed necessary.

(e) Use the AIC to choose between the Negative Binomial model and the Poisson GLM with the interactions,

and discuss what this model tells us about the relationship of the covariates to the response.
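The sequence of fits in parts (a) to (c) might look like the sketch below. It runs on simulated counts (hypothetical values standing in for dicentric, with the same variable names) and relies on glm.nb() from the MASS package for the Negative Binomial model.

```r
library(MASS)   # for glm.nb()

# Simulated stand-in for 'dicentric' (hypothetical design and effects)
set.seed(3)
n   <- 27
dat <- data.frame(
  cells    = sample(c(5, 10, 20, 40), n, replace = TRUE),  # in hundreds
  doseamt  = rep(c(1, 2.5, 5), each = 9),
  doserate = rep(c(0.1, 0.25, 0.5, 1, 1.5, 2, 2.5, 3, 4), times = 3)
)
mu     <- dat$cells * exp(-2 + 0.4 * dat$doseamt + 0.1 * dat$doserate)
dat$ca <- rnbinom(n, mu = mu, size = 5)   # deliberately over-dispersed counts

# (a) Poisson GLM for the rate: log(E[ca]) = log(cells) + linear predictor.
# cells is in hundreds, so this is a rate per hundred cells;
# offset(log(100 * cells)) would give a rate per cell instead.
pois <- glm(ca ~ doseamt + doserate + offset(log(cells)),
            family = poisson, data = dat)

# Pearson estimate of the dispersion; values well above 1 suggest over-dispersion
disp <- sum(residuals(pois, type = "pearson")^2) / df.residual(pois)

# (b) Quasi-Poisson: same coefficients, standard errors inflated by sqrt(disp)
quasi <- glm(ca ~ doseamt + doserate + offset(log(cells)),
             family = quasipoisson, data = dat)

# (c) Negative Binomial alternative
nb <- glm.nb(ca ~ doseamt + doserate + offset(log(cells)), data = dat)

print(disp)
print(AIC(pois, nb))
```

The quasi-Poisson model has no likelihood and therefore no AIC, which is why only the Poisson and Negative Binomial fits are compared in the last line.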

7. In lectures, we have seen data on the number of quarterly AIDS cases in the UK, $y_i$, from January 1983 to March 1994. The data are in dataframe aids, where the variable cases is $y_i$ and date is time, symbolised here as $x_i$. In this question we consider two competing models to describe the trend in the number of cases. Model 1 is

$$Y_i \sim \mathrm{Pois}(\lambda_i), \quad Y_i \text{ independent},$$

$$\log(\lambda_i) = \beta_0 + \beta_1 x_i,$$

and Model 2 is

$$Y_i \sim N(\mu_i, \sigma^2), \quad Y_i \text{ independent},$$

$$\log(\mu_i) = \gamma_0 + \gamma_1 x_i.$$

(a) Plot yi against xi and comment on whether the two proposed models are sensible. (3)

(b) Fit the two models in R, add the estimated trends from each model ($\hat\lambda_i$ and $\hat\mu_i$) to the plot from (7a) along with approximate 95% confidence intervals on the mean, and comment on the validity of each model (based on the plot). For the confidence intervals you can assume that the sampling distribution of $\hat\lambda_i$ and $\hat\mu_i$ is approximately Gaussian, with standard errors obtained from the R function predict(..., type="response", se.fit=TRUE). Obtain the AIC for each model and thus comment on which model is preferable according to this criterion. (5)

(c) Produce the deviance residuals vs fitted values ($\hat\lambda_i$ and $\hat\mu_i$) plot for each model, comment appropriately and thus propose a way that the two models might be extended to improve the fit. (2)

(d) Implement the proposed extensions to each model, to arrive at a final version for each of them (justified

by appropriate hypothesis tests). (3)

(e) On the basis of the analogous plots as in (7b) and (7c) but also on arguments of model fit based on the

deviance and the AIC, comment on which (if any) of the two final models in (d) you would choose as the

best. Mention at least one reason why either model is not ideal. (9)

(f) Further extend the final Poisson model to a Negative Binomial model and comment on whether this model

is preferable to the other two, on the basis of all the criteria used for comparison so far. (3)

(25)
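The confidence bands described in part (b) can be computed along the following lines. The sketch uses simulated quarterly counts (hypothetical values, not the aids data); the helper ci_band() and the time variable t (years since the first quarter) are illustrative names, and the Poisson estimates are passed as starting values to the Gaussian fit as a precaution.

```r
# Simulated stand-in for the 'aids' data frame (hypothetical trend)
set.seed(4)
sim       <- data.frame(date = 1983 + (0:44) / 4)   # 45 quarters
sim$t     <- sim$date - 1983                        # years since first quarter
sim$cases <- rpois(nrow(sim), lambda = exp(2.5 + 0.25 * sim$t))

# Model 1: Poisson with log link
m1 <- glm(cases ~ t, family = poisson(link = "log"), data = sim)

# Model 2: Gaussian with log link, started at the Poisson estimates
m2 <- glm(cases ~ t, family = gaussian(link = "log"), data = sim,
          start = coef(m1))

# Approximate 95% confidence band for the mean, on the response scale
ci_band <- function(model) {
  p <- predict(model, type = "response", se.fit = TRUE)
  data.frame(fit   = p$fit,
             lower = p$fit - 1.96 * p$se.fit,
             upper = p$fit + 1.96 * p$se.fit)
}
b1 <- ci_band(m1)
b2 <- ci_band(m2)

print(head(b1))
print(AIC(m1, m2))
```

After plot(sim$date, sim$cases), the trend and band of each model can be added with lines(sim$date, b1$fit) and lines(sim$date, b1$lower, lty = 2), and similarly for the upper limit and for b2. Deviance residuals for part (c) are available from residuals(m1, type = "deviance").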

8. The dataframe titanic2 contains an aggregated version of the Titanic survival data. The age covariate has been aggregated to “child” and “adult”.

(a) Fit a Binomial model with just the “main effects” of age_group, pclass and gender, and interpret the estimates.

(b) To see whether the effect of age is different for males and females, and also whether the effect of class is different for males and females, fit the model with an interaction between age_group and gender as well as between pclass and gender, and interpret the estimates.

9. The module workspace contains a data frame titanic which relates to 1309 passengers on the last voyage of the ocean liner ‘Titanic’. The response variable survived is a binary variable where the value 1 means the passenger survived the sinking. The data frame also contains predictors relating to passenger class (1st, 2nd, 3rd), gender, age and the fare amount each passenger paid. Passenger names are also available (for interest, rather than for modelling).

(a) Fit a Bernoulli GLM with logistic link for survived with age, pclass and gender as predictors, as well as all the associated two-way interactions. Reduce the model if and as appropriate using the AIC in conjunction with the R function drop1(), and interpret the final model in terms of parameter estimates and their significance. Perform relevant model checking analysis (model fit and residuals).

(13)

10. In 1972-74, a survey of one in six residents of Whickham, near Newcastle, England, was made. Twenty years later, these data were recorded in a follow-up study. Only women who are current smokers or who have never smoked are included. The resulting data set comprises 28 observations on the following 4 variables: y is the observed count for the given combination, smoker is a factor with levels yes/no, dead is a factor with levels yes/no, and age is a factor with age-group levels 18-24, 25-34, 35-44, 45-54, 55-64, 65-74 and 75+. Interest here lies in the effects of age and smoking on the probability of death.

(a) Investigate the association of age and smoking on the chance of dying, using Poisson log-linear models

for this contingency table. Clearly state any assumptions you are making about the sampling design.

11. In an archaeological study of animal husbandry, there is a particular interest in the proportions of fragments of animal bones of different types discovered in excavations in the south of England over four time eras. A data set is collated from the excavation reports which consists of a three-way contingency table where cells contain total numbers of bone fragments from all excavations, cross-classified by animal type (‘sheep’ or ‘cattle’ or ‘pig’), by the surface geology of the excavation sites (‘valley terrace’ or ‘other’) and by the era (‘early saxon’ or ‘middle saxon’ or ‘late saxon’ or ‘high medieval’). These data are contained in the dataframe bones in the module workspace. Interest lies in the association of era and geology with the number of different bone types.

Explore significant associations in these data using Poisson log-linear models. In particular, identify an appropriate final preferred model which suitably fits the data and then report what this model suggests concerning significant associations between the numbers of bones of different types and surface geology, and likewise with era. What assumptions are you making about the sampling design for the study in carrying out your analysis and what are the modelling implications of those? (15)

12. The dataframe ToothGrowth, included in the base R installation, see ?ToothGrowth, contains information on

the growth of teeth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, and 2

mg/day) by one of two delivery methods, orange juice or ascorbic acid (a form of vitamin C and coded as VC).

(a) Fit an Exponential GLM to length of teeth with two covariates, with the default link function and check

whether the model fits.

(b) Alternatively, fit this as a Gamma distribution and interpret the effect of the covariates.

(c) Check the residual plots of the Gamma model, and suggest a way that these can be improved.

(d) Implement your suggestion and check whether this has remedied the problem.

(e) Interpret the effects of the covariates from your final model.
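Parts (a) and (b) rest on the fact that the Exponential distribution is a Gamma distribution with shape 1, so one common approach is to fit a Gamma GLM and fix the dispersion at 1 when summarising. The sketch below takes that approach (an assumption about the intended method, not something the sheet states) and treats dose as a numeric covariate; the object names are illustrative.

```r
# ToothGrowth ships with base R (package 'datasets')
tg <- ToothGrowth

# Gamma GLM with the default (inverse) link:
# 1 / mu = b0 + b1 * suppVC + b2 * dose
fit <- glm(len ~ supp + dose, family = Gamma, data = tg)

# (a) Exponential model: same fit, dispersion fixed at 1
s_exp <- summary(fit, dispersion = 1)

# (b) Gamma model: dispersion estimated from the data
s_gam <- summary(fit)

# Rough goodness-of-fit check for the Exponential model: compare the
# deviance with a chi-squared distribution on the residual degrees of freedom
gof_p <- pchisq(deviance(fit), df = df.residual(fit), lower.tail = FALSE)

print(coef(s_exp))
print(coef(s_gam))
print(c(dispersion = s_gam$dispersion, gof.p.value = gof_p))
```

The coefficient estimates are identical in the two summaries; only the standard errors change, by a factor of the square root of the estimated dispersion. With the inverse link, a positive coefficient corresponds to a lower mean tooth length. Residual plots for part (c) can be obtained with plot(fit).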

13. Consider the data frame gehan in the module R workspace. This involves a trial of 42 leukaemia patients

where some were treated with the drug 6-mercaptopurine and the rest are controls. The trial was designed

as matched pairs, both withdrawn from the trial when either came out of remission (more info can be found by

typing ?gehan having first loaded the R package MASS).

Variable cens indicates that some observations are censored (cens=0). This means that the patient was

removed or dropped out from the trial before their remission time can be recorded. The only data available for

such patients is the time they were removed, i.e. that they didn’t come out of remission before this removal

time. When cens=0, variable time records the drop-out time rather than the remission time.

(a) Create a new dataframe by removing the censored observations and use the command

gehan$treat <- relevel(gehan$treat, "control")

to ensure that it is the “control” that gets buried in the intercept. Write down and fit an Exponential

GLM with an inverse link, with time as the response and treat as the covariate. Discuss the effects of

treatment, and check whether the model fits. (5)

(b) Discarding censored observations simplifies the situation; however, this means throwing away potentially useful information (even if partial). To fit a model with all the data we can instead alter the likelihood contributions of the censored and non-censored data points:

$$L(\lambda_i; t_i) = p(t_i; \lambda_i) = \lambda_i e^{-\lambda_i t_i} \quad \text{if no censoring},$$

$$L(\lambda_i; t_i) = \Pr(t > t_i; \lambda_i) = 1 - \Pr(t < t_i; \lambda_i) = e^{-\lambda_i t_i} \quad \text{if censoring},$$

where $p(t_i; \lambda_i)$ is the Exponential probability density function for survival times $t_i$, and $\Pr(t < t_i; \lambda_i)$ is the Exponential cumulative distribution function. The likelihood can then be written as:

$$L(\lambda_i; t_1, \ldots, t_n) = \prod_{i=1}^{n} p(t_i; \lambda_i)^{c_i} \left[1 - \Pr(t < t_i; \lambda_i)\right]^{1 - c_i},$$

where $c_i = 1$ for no censoring and $c_i = 0$ for censoring. Show that for the exponential distribution $p(t_i; \lambda_i) = \lambda_i e^{-\lambda_i t_i}$, the likelihood can be written as:

$$L(\lambda_i; t_1, \ldots, t_n) = \prod_{i=1}^{n} \frac{(\lambda_i t_i)^{c_i} e^{-\lambda_i t_i}}{t_i^{c_i}},$$

and note that this likelihood is equivalent to the Poisson likelihood up to a constant. One can therefore fit this model using a Poisson model with mean $\lambda_i t_i$ where the data are the $c_i$. Write down this Poisson model mathematically, such that $\lambda_i = \beta_0 + \beta_1 x_i$ where $x_i$ is the treatment variable. Fit this Poisson model in R, check that it fits, and compare the treatment effect with the Exponential model on the ‘thinned’ data from part (a). (Hint: to fit a model without an intercept, add a “-1” in the model formula of the glm() call. Also, you will need to use the command gehan$treat <- relevel(gehan$treat, "6-MP") before fitting the model, to revert the changes from part (a).) (14)

(19)
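One way to set up the two fits is sketched below, using the gehan data from the MASS package. The model formulas are one possible reading of the question rather than a prescribed answer: since the Poisson mean is $\lambda_i t_i = \beta_0 t_i + \beta_1 x_i t_i$, the sketch uses an identity link with time and time:treat as the only terms. Object names are illustrative.

```r
library(MASS)   # provides the gehan data frame

# Part (b): all 42 patients, censoring indicator as a Poisson response
g       <- gehan
g$treat <- relevel(g$treat, "6-MP")      # 6-MP as the baseline level

# c_i ~ Poisson(lambda_i * t_i), lambda_i = beta0 + beta1 * x_i:
# identity link, no intercept, columns t_i and x_i * t_i
pois <- glm(cens ~ time + time:treat - 1,
            family = poisson(link = "identity"), data = g)

# Closed-form check: each group's rate estimate is events / total follow-up
rate <- with(g, tapply(cens, treat, sum) / tapply(time, treat, sum))

# Part (a): uncensored patients only, Exponential GLM with inverse link
# (Gamma family with dispersion fixed at 1), control as the baseline
unc       <- subset(gehan, cens == 1)
unc$treat <- relevel(unc$treat, "control")
expo      <- glm(time ~ treat, family = Gamma(link = "inverse"), data = unc)

print(coef(summary(pois)))
print(rate)
print(coef(summary(expo, dispersion = 1)))
```

In both fits the coefficients are on the scale of the rate $\lambda_i$: in the Poisson model the first coefficient is the 6-MP rate and the second is the difference between control and 6-MP, while in the Exponential model the intercept is the control rate and the second coefficient is the difference between 6-MP and control.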

(100)

- maya代写
- Directx代写
- PPT代写
- 可视化代写
- 工程材料代写
- 环境代写
- abaqus代写
- 投资组合代写
- 选择题代写
- openmp.c代写
- cuda.cu代写
- 传感器基础代写
- 区块链比特币代写
- 土壤固结代写
- 电气代写
- 电子设计代写
- 主观题代写
- 金融微积代写
- ajax代写
- Risk theory代写
- tcp代写
- tableau代写
- mylab代写
- research paper代写
- 手写代写
- 管理代写
- paper代写
- 毕设代写
- 衍生品代写
- 学术论文代写
- 计算画图代写
- SPIM汇编代写
- 演讲稿代写
- 金融实证代写
- 环境化学代写
- 通信代写
- 股权市场代写
- 计算机逻辑代写
- Microsoft Visio代写
- 业务流程管理代写
- Spark代写
- USYD代写
- 数值分析代写
- 有限元代写
- 抽代代写
- 不限定代写
- IOS代写
- scikit-learn代写
- ts angular代写
- sml代写
- 管理决策分析代写
- vba代写
- 墨大代写
- erlang代写
- Azure代写
- 粒子物理代写
- 编译器代写
- socket代写
- 商业分析代写
- 财务报表分析代写
- Machine Learning代写
- 国际贸易代写
- code代写
- 流体力学代写
- 辅导代写
- 设计代写
- marketing代写
- web代写
- 计算机代写
- verilog代写
- 心理学代写
- 线性回归代写
- 高级数据分析代写
- clingo代写
- Mplab代写
- coventorware代写
- creo代写
- nosql代写
- 供应链代写
- uml代写
- 数字业务技术代写
- 数字业务管理代写
- 结构分析代写
- tf-idf代写
- 地理代写
- financial modeling代写
- quantlib代写
- 电力电子元件代写
- atenda 2D代写
- 宏观代写
- 媒体代写
- 政治代写
- 化学代写
- 随机过程代写
- self attension算法代写
- arm assembly代写
- wireshark代写
- openCV代写
- Uncertainty Quantificatio代写
- prolong代写
- IPYthon代写
- Digital system design 代写
- julia代写
- Advanced Geotechnical Engineering代写
- 回答问题代写
- junit代写
- solidty代写
- maple代写
- 光电技术代写
- 网页代写
- 网络分析代写
- ENVI代写
- gimp代写
- sfml代写
- 社会学代写
- simulationX solidwork代写
- unity 3D代写
- ansys代写
- react native代写
- Alloy代写
- Applied Matrix代写
- JMP PRO代写
- 微观代写
- 人类健康代写
- 市场代写
- proposal代写
- 软件代写
- 信息检索代写
- 商法代写
- 信号代写
- pycharm代写
- 金融风险管理代写
- 数据可视化代写
- fashion代写
- 加拿大代写
- 经济学代写
- Behavioural Finance代写
- cytoscape代写
- 推荐代写
- 金融经济代写
- optimization代写
- alteryxy代写
- tabluea代写
- sas viya代写
- ads代写
- 实时系统代写
- 药剂学代写
- os代写
- Mathematica代写
- Xcode代写
- Swift代写
- rattle代写
- 人工智能代写
- 流体代写
- 结构力学代写
- Communications代写
- 动物学代写
- 问答代写
- MiKTEX代写
- 图论代写
- 数据科学代写
- 计算机安全代写
- 日本历史代写
- gis代写
- rs代写
- 语言代写
- 电学代写
- flutter代写
- drat代写
- 澳洲代写
- 医药代写
- ox代写
- 营销代写
- pddl代写
- 工程项目代写
- archi代写
- Propositional Logic代写
- 国际财务管理代写
- 高宏代写
- 模型代写
- 润色代写
- 营养学论文代写
- 热力学代写
- Acct代写
- Data Synthesis代写
- 翻译代写
- 公司法代写
- 管理学代写
- 建筑学代写
- 生理课程代写
- 动画代写
- 高数代写
- 内嵌式代写
- Truffles代写
- 地质学代写