Date: 2020-12-14

STAT 8310 - Data Analysis I

Final - Fall 2020

Instructions: Provide answers to each of the four questions below. Points for each problem are specified.

• For each part of each problem, write your answers in complete sentences providing context and statistical justification.

• Do not include R code in your solutions; rather, use R as a means for doing your analysis.

• You are not allowed to discuss any part of these analyses with anyone other than me. Contact me if

you have questions.

1. Nine well-trained cyclists participated in a study in which they were each given 3 doses of caffeine (0, 5,

13mg) and their endurance performance time was measured. The data are given in caffeineCycling.Rdata.

(a) (5 points) Using nonparametric methods, determine whether there is a significant difference in

endurance performance between the 0 and 5mg caffeine dosages. Provide a written summary of

your results along with statistical support to justify your recommendation.

(b) (5 points) Assuming you are interested in comparing all 3 dosages of caffeine, what type of model/test would you suggest to your collaborators to investigate the relationship between caffeine and endurance performance? [You don’t need to do the modeling/testing.]

2. Assuming you’ve been following the news, both Moderna and Pfizer/BioNTech have developed vaccinations for the coronavirus with efficacy of approximately 90-95%. It has been reported recently (https://science.sciencemag.org/content/370/6520/1022) that there are some side effects for these vaccines, including headache and fatigue. The following table provides information about the results of the clinical trials of both Moderna and Pfizer/BioNTech. Note that both clinical trials administered the vaccine to half of the participants while the other half received a placebo. The trials were double-blind, meaning that neither the physician nor the participant knew whether they received the vaccine or placebo.

                                                                      Moderna   Pfizer/BioNTech
Total number of participants in the clinical trial                    30,000    43,000
Proportion of vaccine recipients reporting fatigue                    0.097     0.038
Proportion of vaccine recipients reporting headache                   0.045     0.020
Proportion of vaccine recipients reporting both headache & fatigue    0.024     0.017

(a) (5 points) Compute a 95% confidence interval for the risk of fatigue from the Moderna vaccine.

(b) (5 points) Perform a statistical test to determine whether there is a difference in the risk of headache

between the two vaccines.

(c) (5 points) Suppose with a type I error rate of α = 0.05, you want to simultaneously test the hypotheses that the risk of having both a headache and fatigue for the Moderna vaccine is different from 0.02 and that the risk of having both a headache and fatigue for the Pfizer/BioNTech vaccine is different from 0.02. What is the power of each test?

(d) (5 points) Pooling the data from both Moderna and Pfizer/BioNTech, test whether headache and

fatigue are independent side effects of the vaccine. Provide statistical support to justify your answer.
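As an illustration of the kind of computation parts (a) and (b) call for, here is a minimal sketch in R. It assumes that exactly half of each trial's participants form the vaccine arm (as stated above) and that the reported proportions can be converted to whole counts by rounding. The variable names are made up for this sketch, and `correct = FALSE` (no continuity correction) is one reasonable choice rather than a required one.

```r
# Assumption: vaccine-arm size is half of the total enrollment in each trial.
n_moderna <- 30000 / 2   # 15,000 vaccine recipients
n_pfizer  <- 43000 / 2   # 21,500 vaccine recipients

# Convert the reported proportions to counts of vaccine recipients.
fatigue_moderna  <- round(0.097 * n_moderna)   # 1455
headache_moderna <- round(0.045 * n_moderna)   # 675
headache_pfizer  <- round(0.020 * n_pfizer)    # 430

# One-sample 95% interval for the risk of fatigue (Wilson score interval).
ci_fatigue <- prop.test(fatigue_moderna, n_moderna,
                        conf.level = 0.95, correct = FALSE)$conf.int

# Two-sample test of equal headache risk in the two vaccine arms.
headache_test <- prop.test(c(headache_moderna, headache_pfizer),
                           c(n_moderna, n_pfizer), correct = FALSE)

print(ci_fatigue)
print(headache_test$p.value)
```

`prop.test` reports a score-based interval; a Wald interval computed by hand from the same counts would be a slightly different but equally defensible choice.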

3. (30 points) Consider the data lakeNitrogen.Rdata, which are observations taken at 200 lakes across

the northeastern part of the United States. The data, which come from a research project I am currently

working on, consist of measures of total nitrogen in the lake (measured in micrograms per liter) as well

as important explanatory variables of nitrogen (e.g., land use, lake characteristics). Specifics about the

variables are given in the table below. The researcher is interested in determining what variables are

important for explaining total nitrogen in lakes.

TN            total nitrogen in the lake
Baseflow      measure of stream flow between precipitation events
NO3Depo       nitrate deposited into the lake from the atmosphere, usually through precipitation
TotalDepo     total nitrogen deposited into the lake from the atmosphere, usually through precipitation
Runoff        measure of precipitation that flows off the surface of the land into the lake
Urban         percent of area around the lake classified as urban
Rowcrop       percent of area around the lake classified as rowcrop (agricultural land)
Pasture       percent of area around the lake classified as pasture
Forest        percent of area around the lake classified as forest
Wetland       percent of area around the lake classified as wetland
LakeArea      area of the lake
MaxDepth      maximum depth of the lake
Connectivity  categorical variable for whether the lake is located downriver from a lake & stream, located downriver from a stream, or is an isolated/headwater lake
LWR           lake-to-watershed ratio, which measures the proportion of the watershed that the lake makes up

Your task is to serve as the data analyst on the project. Build a regression model for total nitrogen using

the techniques we learned in class. Prepare a final report outlining the steps of your analysis. Include

a detailed summary of your final model providing statistical justification for your choice. Make sure to

report the fitted regression equation. Your model must fit the data well, be interpretable, parsimonious,

and not violate any model assumption. I would encourage you to do a fair amount of exploratory data

analysis to get a feel for the different variables before you begin building your model.

4. The dataset training.Rdata contains training session data for four collegiate athletes. In particular,

included in the dataset are the rate of perceived effort (RPE) for each athlete for 16 different training

sessions. RPE is a measure that each athlete assigns to a training session based on their perceived

level of difficulty. Also included in the data is a predefined ordinal intensity for each training session

(low=L1, low/moderate=L2, moderate/high=L3, and high=L4) as well as position played (F=forward,

M=midfield, D=defense).

(a) (5 points) Disregard the athlete variable to start. Is there a significant difference in RPE for the

different intensities of training session? Is there a significant difference in RPE for the different

positions? Summarize your results in context providing statistical justification.

(b) (5 points) Consider an appropriate two-factor fixed effects model for RPE with intensity and position as fixed factors. Write out the model explicitly in terms of the variables/parameters. Make sure to define all terms.

(c) (5 points) Fit the two-factor fixed effects model in (b) and interpret your results.

(d) (5 points) For the two-factor fixed effects model in (b), compare the three positions in terms of the linear and quadratic contrasts of intensity. Summarize your findings in context and provide statistical justification.

(e) (5 points) Returning to the full data, notice that 4 athletes were observed in this data. That is, each

athlete was subjected to each combination of training intensity and position. One could consider

the 12 intensity × position combinations as 12 different treatments. Explain why we should include

athlete as a “block” in our model.

(f) (5 points) Discuss under what scenarios “athlete” should be considered as a fixed block effect versus

a random block effect. Is randomization necessary in the experimental design? Why or why not?

(g) (5 points) Assume a mixed effects model with athlete as a random block effect and intensity, position, and their interaction as fixed effects. Let Y_ijk be the response for athlete i, intensity j, position k. What are the following:

• cov(Y_111, Y_122)

• cov(Y_111, Y_211)

• var(Y_111)

(h) (5 points) Fit the mixed effects model with athlete as a random block effect and intensity, position,

and their interaction as fixed effects. Report the efficiency gain of including the random block effect.
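For reference, one common way to write the random-block model described in parts (g) and (h) is sketched below. The symbols \sigma_b^2 (between-athlete variance) and \sigma^2 (residual variance) are notation introduced for this sketch, not taken from the exam. The sketch assumes the athlete effects and the errors are mutually independent with constant variances. Other covariance structures are possible and would give different answers.

```latex
Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + b_i + \varepsilon_{ijk},
\qquad b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)

% Same athlete, different treatment combination: only b_i is shared.
\operatorname{cov}(Y_{ijk}, Y_{ij'k'}) = \sigma_b^2 \quad \text{for } (j,k) \neq (j',k')

% Different athletes: no shared random term.
\operatorname{cov}(Y_{ijk}, Y_{i'j'k'}) = 0 \quad \text{for } i \neq i'

% A single observation carries both variance components.
\operatorname{var}(Y_{ijk}) = \sigma_b^2 + \sigma^2
```

Under these assumptions, two observations on the same athlete are correlated only through the shared block effect b_i, which is what makes a comparison against the model without blocks meaningful.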
