Posted: 2021-01-20

STAT 231 Online Assignment 1

Assignment 1 is due on Thursday January 21 at 11:00 am EST.

Your assignment submission must be typed. There are no exceptions. Any submitted answer which is not typed will not be marked and will be given a mark of zero.

You may create your document in Word, Google Docs, LaTeX or any other word processor. The requirement to type your assignment is to facilitate the marking of hundreds of assignments so that the marked assignments can be returned to you in a timely fashion. It is also useful for you to gain some experience in creating a document containing mathematical expressions, especially in this time of doing everything online! Two documents on how to use the equation editor in Word have been posted in the Assignment 1 folder on LEARN.

Follow the steps in the document Introduction to R and RStudio (posted on the course website on LEARN) to install the software needed for this course; see Section 1 - Introduction. To learn how to run R code, see Section 2 - Getting Started.

Upload your assignment to Crowdmark as a PDF file. Here is a useful link for all information related to Crowdmark assessments: https://crowdmark.com/help/

You can upload your assignment as one document or individually for each problem. If you upload one document then you must drag and drop the pages for each problem to the appropriate question as indicated in Crowdmark. You can resubmit your assignment any number of times before the due time. Therefore, to ensure that there are no issues with uploading, we advise you to upload your assignment well in advance of the due time.

Assignments which are left as a single document and not uploaded to the appropriate places in Crowdmark will be assigned a 10% penalty.

A penalty of 10% per hour is applied for late assignments.

Please see the course policy on missed assignments posted on LEARN under Syllabus.


In this course we will use many concepts that were covered in STAT 230 (a prerequisite for this course). In Problems 1-4 you will review some of these concepts and use the software R to evaluate probabilities. You may find it useful to look at review problems 14 to 18 in Chapter 1 of the STAT 231 Course Notes before attempting these problems. A review document about the continuity correction is posted in the Assignment 1 folder on LEARN.

Problem 1: Binomial distribution

In a very large population 1% of the people have a certain genetic mutation. Suppose 1200 people are

selected at random. Define the random variable Y = number of people with the genetic mutation in

the sample.

(a) What are the assumptions for a Binomial model? Explain, with reasons, whether or not these

assumptions might hold in this context. Your answer must be written in sentences.

(b) Use the Normal approximation to the Binomial with continuity correction and the Normal table in

the Course Notes to approximate the following probabilities.

P(Y ≤ 8), P(Y ≥ 16), and P(|Y – 12| < 7)

You must show your work for full marks.

(c) Type help(pbinom) in R to see the syntax for the R functions pbinom, qbinom, dbinom, and

rbinom. Use the appropriate R functions to obtain values for:

P(Y ≤ 8), P(Y ≥ 16), and P(|Y – 12| < 7)

Include the R statements that you used in your submitted answer.
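
As a rough sketch of the pbinom and dbinom syntax for the three kinds of events in part (c), the following uses illustrative values (n = 500, p = 0.02, and the cut-offs 6, 14 and 4), which are assumptions chosen for the example and not the values in this problem.

```r
# Illustrative values only (hypothetical, not the ones in this problem).
n <- 500
p <- 0.02

# P(Y <= 6): pbinom returns the cumulative probability P(Y <= q).
p_le <- pbinom(6, size = n, prob = p)

# P(Y >= 14) = 1 - P(Y <= 13), since Y only takes integer values.
p_ge <- 1 - pbinom(13, size = n, prob = p)
# Equivalent form using the upper tail P(Y > 13).
p_ge_alt <- pbinom(13, size = n, prob = p, lower.tail = FALSE)

# P(|Y - 10| < 4) = P(7 <= Y <= 13) = P(Y <= 13) - P(Y <= 6).
p_mid <- pbinom(13, size = n, prob = p) - pbinom(6, size = n, prob = p)
# The same probability as a sum of the probability function over 7, ..., 13.
p_mid_alt <- sum(dbinom(7:13, size = n, prob = p))

print(c(p_le = p_le, p_ge = p_ge, p_mid = p_mid))
```

The main point is the handling of the inequalities: because Y is integer-valued, an event such as Y ≥ 14 has to be rewritten in terms of Y ≤ 13 before pbinom is applied.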

(d) For each of the probabilities in (b) and (c) determine the percent relative error $100 \times \frac{|\tilde{p} - p|}{p}$, where $\tilde{p}$ is the approximate probability and $p$ is the probability calculated using R. Explain why each pair of values is in good agreement or not.
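
The comparison in part (d) could be organized along the following lines. This is a minimal sketch that assumes the error is measured relative to the value computed in R, uses the hypothetical helper name percent_rel_error, and uses illustrative values (n = 500, p = 0.02) rather than the ones in this problem. It also uses pnorm in place of the Normal table, so the approximation here is not rounded the way a table lookup would be.

```r
# Percent relative error of an approximation, relative to the value from R.
# (Hypothetical helper name; the reference value is assumed to be `exact`.)
percent_rel_error <- function(approx, exact) {
  100 * abs(approx - exact) / exact
}

# Illustrative values only (hypothetical, not the ones in this problem).
n <- 500
p <- 0.02
mu <- n * p                      # mean of Binomial(n, p)
sigma <- sqrt(n * p * (1 - p))   # standard deviation of Binomial(n, p)

# P(Y <= 6) computed directly from the Binomial distribution.
exact <- pbinom(6, size = n, prob = p)

# Normal approximation with continuity correction:
# P(Y <= 6) is approximated by P(Z <= (6 + 0.5 - mu) / sigma).
approx <- pnorm((6 + 0.5 - mu) / sigma)

print(c(exact = exact, approx = approx,
        pct_error = percent_rel_error(approx, exact)))
```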

(e) Suppose the proportion of people with the genetic mutation is an unknown value equal to θ. Suppose n people are selected at random where n is large. Approximate the probability:

$$P\left(\frac{Y}{n} - 2.17\sqrt{\frac{\theta(1-\theta)}{n}} \le \theta \le \frac{Y}{n} + 2.17\sqrt{\frac{\theta(1-\theta)}{n}}\right)$$

You may ignore the continuity correction. You must show your work for full marks.


Problem 2: Poisson distribution

During the week of December 6-13, 2020 the visits to an Eastern Ontario Health Unit website to book

a Covid test occurred at random at the average rate of 10 visits per minute. Suppose it is reasonable

to use a Poisson process to model this process. Define the random variable Y = number of visits to the

website in one minute.

(a) Using the three assumptions for a Poisson process argue whether you think it is reasonable or not

for these assumptions to hold in this scenario. Your answer must be written in sentences.

(b) Use the Normal approximation to the Poisson with continuity correction and the Normal table in

the Course Notes to approximate:

P(Y < 5), P(Y > 14), and P(|Y – 10| ≥ 7)

You must show your work for full marks.

(c) Type help(ppois) in R to see the syntax for the R functions ppois, qpois, dpois, and rpois. Use the

appropriate R functions to obtain values for:

P(Y < 5), P(Y > 14), and P(|Y – 10| ≥ 7)

Include the R statements that you used in your submitted answer.
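
A short sketch of the ppois syntax for strict inequalities and two-sided events follows. The mean of 6 and the cut-offs are illustrative assumptions, not the values in this problem; the last statement shows the corresponding Normal approximation with continuity correction, computed with pnorm rather than the Normal table.

```r
# Illustrative mean only (hypothetical, not the one in this problem).
mu <- 6

# P(Y < 3) = P(Y <= 2): a strict inequality steps down by one for integer Y.
p_lt <- ppois(2, lambda = mu)

# P(Y > 9) = 1 - P(Y <= 9), taken directly from the upper tail.
p_gt <- ppois(9, lambda = mu, lower.tail = FALSE)

# P(|Y - 6| >= 4) = P(Y <= 2) + P(Y >= 10) = P(Y <= 2) + P(Y > 9).
p_two_sided <- ppois(2, lambda = mu) + ppois(9, lambda = mu, lower.tail = FALSE)

# Normal approximation with continuity correction for P(Y < 3) = P(Y <= 2).
# Poisson(mu) has mean mu and standard deviation sqrt(mu).
p_lt_normal <- pnorm((2 + 0.5 - mu) / sqrt(mu))

print(c(p_lt = p_lt, p_gt = p_gt, p_two_sided = p_two_sided,
        p_lt_normal = p_lt_normal))
```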

(d) For each of the probabilities in (b) and (c) determine the percent relative error $100 \times \frac{|\tilde{p} - p|}{p}$, where $\tilde{p}$ is the approximate probability and $p$ is the probability calculated using R. Explain why each pair of values is in good agreement or not.

(e) Suppose Y1, Y2, …, Yn is a random sample from a Poisson(θ) distribution and let $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean. Approximate the probability:

$$P\left(\bar{Y} - 1.61\sqrt{\frac{\theta}{n}} \le \theta \le \bar{Y} + 1.61\sqrt{\frac{\theta}{n}}\right)$$

You may ignore the continuity correction. You must show your work for full marks.


Problem 3: Normal or Gaussian distribution

Suppose it is reasonable to assume that the heights in centimeters of second year female Math

students at the University of Waterloo have a G(160,9) = N(160, 81) distribution. Define the random

variable Y = height of a female Math student chosen at random.

(a) Use the Normal table in the Course Notes to determine P(Y ≥ 169).

You must show your work for full marks.

(b) Type help(pnorm) in R to see the syntax for the R functions pnorm, qnorm, dnorm, and rnorm. Use

the appropriate R function to obtain the value for P(Y ≥ 169).

Include the R statement that you used in your submitted answer.
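
One detail worth illustrating for part (b): in the G(μ, σ) notation used here the second parameter is the standard deviation, and pnorm likewise expects the standard deviation (argument sd), not the variance. The sketch below uses a hypothetical G(100, 15) distribution and cut-off 120, which are not the values in this problem.

```r
# Illustrative distribution only: G(100, 15), i.e. mean 100 and sd 15.
mu <- 100
sigma <- 15

# P(Y >= 120) from the upper tail; pnorm takes the standard deviation.
p_upper <- pnorm(120, mean = mu, sd = sigma, lower.tail = FALSE)

# The same probability after standardizing: P(Z >= (120 - mu) / sigma).
p_upper_std <- 1 - pnorm((120 - mu) / sigma)

print(c(p_upper = p_upper, p_upper_std = p_upper_std))
```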

(c) Find the percent relative error $100 \times \frac{|\tilde{p} - p|}{p}$, where $\tilde{p}$ is the probability determined in (a) using the Normal table and $p$ is the probability determined in (b) using R. Explain why the answers are in good agreement or not.

(d) Determine a such that P(Y ≥ a) = 0.83 using the inverse Normal cumulative distribution table in the

Course Notes.

You must show your work for full marks.

(e) Use the appropriate R function to obtain the value for a such that P(Y ≥ a) = 0.83.

Include the R statement that you used in your submitted answer.
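
For an upper-tail condition of the form P(Y ≥ a) = c, qnorm can be called either with lower.tail = FALSE or with the complementary probability 1 - c. The following sketch uses a hypothetical G(100, 15) distribution and c = 0.9, which are not the values in this problem.

```r
# Illustrative distribution only: G(100, 15), i.e. mean 100 and sd 15.
mu <- 100
sigma <- 15

# Find a such that P(Y >= a) = 0.9, using the upper tail directly.
a <- qnorm(0.9, mean = mu, sd = sigma, lower.tail = FALSE)

# Equivalent: P(Y >= a) = 0.9 means P(Y <= a) = 0.1.
a_alt <- qnorm(0.1, mean = mu, sd = sigma)

# Check by plugging a back into pnorm.
check <- pnorm(a, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(a = a, a_alt = a_alt, check = check))
```

Since the upper-tail probability exceeds 0.5 in this example, the resulting value of a lies below the mean.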

(f) Are the answers in (d) and (e) in good agreement or not?

(g) Suppose 64 female Math students are chosen at random. Determine the probability that their

average height lies between 159 and 162. Use R to find the probability, not the Normal table in the

Course Notes.

You must show your work for full marks.

Include the R statement that you used in your submitted answer.
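
Part (g) rests on the fact that the mean of n independent G(μ, σ) observations has a G(μ, σ/√n) distribution. A minimal sketch with hypothetical values (μ = 100, σ = 15, n = 25, and the interval from 97 to 104), not the values in this problem:

```r
# Illustrative values only (hypothetical, not the ones in this problem).
mu <- 100
sigma <- 15
n <- 25

# Standard deviation of the sample mean of n independent observations.
se <- sigma / sqrt(n)

# P(97 <= Ybar <= 104), where Ybar has a G(mu, se) distribution.
p_between <- pnorm(104, mean = mu, sd = se) - pnorm(97, mean = mu, sd = se)

print(c(se = se, p_between = p_between))
```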


Problem 4: Exponential distribution

Suppose it is reasonable to model the battery life (in hours) of a certain type of watch battery using

the Exponential(3) distribution. Define the random variable Y = battery life (in hours) of a randomly

chosen watch battery.

(a) With reference to the Memoryless Property of the Exponential Distribution discuss whether you

think an Exponential Model is a reasonable model for Y.

Your answer must be written in sentences.

(b) Determine P(Y ≥ 4) using the probability density function of Y and integration.

You must show your work for full marks.

(c) Type help(pexp) in R to see the syntax for the R functions pexp, qexp, dexp, and rexp. Use the

appropriate R function to obtain the value for P(Y ≥ 4). Include the R statement that you used in your

submitted answer.

(d) Determine the median of this distribution, that is, determine the value m such that

P(Y ≤ m ) = 0.5

You must show your work for full marks.
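
The R functions pexp, qexp and dexp are parameterized by a rate, so a mean has to be converted first. The sketch below assumes that Exponential(θ) in this course denotes the distribution with mean θ (so rate = 1/θ), and uses a hypothetical θ = 5 and cut-off 2 rather than the values in this problem. It also checks the tail probability by numerically integrating the density, mirroring the integration in part (b).

```r
# Illustrative mean only (hypothetical, not the one in this problem).
# Assumption: Exponential(theta) has mean theta, so R's rate is 1 / theta.
theta <- 5

# P(Y >= 2) from the upper tail of the distribution function.
p_upper <- pexp(2, rate = 1 / theta, lower.tail = FALSE)

# The same probability by numerical integration of the density over [2, Inf).
p_upper_int <- integrate(function(y) dexp(y, rate = 1 / theta),
                         lower = 2, upper = Inf)$value

# Median: the value m with P(Y <= m) = 0.5.
m <- qexp(0.5, rate = 1 / theta)

print(c(p_upper = p_upper, p_upper_int = p_upper_int, median = m))
```

Under the stated parameterization the closed forms are exp(-2/θ) for the tail probability and θ log 2 for the median, which gives a quick check on the R output.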

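(e) Suppose Y1, Y2, …, Yn is a random sample from an Exponential(θ) distribution and let $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean. Approximate the probability:

$$P\left(\bar{Y} - 1.96\frac{\theta}{\sqrt{n}} \le \theta \le \bar{Y} + 1.96\frac{\theta}{\sqrt{n}}\right)$$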

You may ignore the continuity correction. Use R to find the probability, not the Normal table in the

Course Notes.

You must show your work for full marks.

Include the R statement that you used in your submitted answer.


Problem 5: Empirical Studies

The purpose of this problem is to examine how empirical studies are reported in the

news media.

On the course website on LEARN you will find a module under Additional Resources called Statistics

in the Media. These are all examples of empirical studies which have been reported in the news

media.

Find your own example of statistics in the news media.

Pick a topic which is of interest to you and search online using keywords which describe

your topic.

News media includes print media (newspapers, newsmagazines), broadcast news (radio and

television), and the Internet (online newspapers, news blogs, news videos, live news

streaming, etc.).

Your article must not come from a research journal.

Your example should be less than 2 pages long.

Make sure you choose an example for which the data are a sample of a larger population and not a census of that population.

The example must have appeared in the news media after December 31, 2019.

(a) Indicate clearly the information on where the article appeared and the date it appeared.

Give the link to the article. To help the TAs mark this question please cut and paste the article into

your assignment.

The answers to (b) - (f) must be written in sentences.

(b) Indicate clearly the keywords you used to find your example and why this topic is of interest to

you.

(c) State clearly and succinctly what the purpose of the study was and the conclusion reached by the

researchers.

(d) The study you selected can be best described as which of the following: an observational study, a

sample survey or an experimental study? Justify your answer.

(e) What are the units in this study? Based on the given information, what population or collection of

units are the researchers interested in?

(f) Give the 2 most important variates in this study and indicate the type of each.


Problem 6:

The purpose of this problem is to use R to generate numerical summaries (see Chapter 1) and the relative frequency histogram for a Gaussian data set which has been randomly generated in R. There are two data sets for each sample size of n = 50, 100, 200, and 300. The aim is to compare the observed summaries with what is expected for Gaussian data.

The R code for this problem is posted as a text file called RCodeAssignment1.txt in

the Assignment 1 folder on LEARN.

Run the R code provided and verify that you obtain the same plots as shown on the

next 4 pages.

Follow the instructions and answer the questions which appear after these plots.

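[Plots omitted: relative frequency histograms with numerical summaries for two data sets at each of n = 50, 100, 200, and 300.]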

Run the R code for this problem again except modify the line

"id<-20456484"

by replacing the number 20456484 with your UWaterloo ID number.

When you run the R code with your ID number you will generate 8 new plots. Export

these plots as .png files using RStudio (See Introduction to R and RStudio - Section 6).

(a) My ID number is _________________.

(b) Insert the plots generated using your ID number in your assignment (2 per page).

(c) Each of these data sets was randomly generated from a G(0,1) distribution.

Complete the following sentence and include it with your assignment:

For each data set we expect the sample mean to be close to ______________, the sample median to be close to _____________, the sample standard deviation to be close to ______________, the sample skewness to be close to ______________, the sample kurtosis to be close to ____________, and the shape of the relative frequency histogram to be approximately ___________________.

(d) For each of the 8 plots generated using your ID number, compare the observed

numerical summaries and the relative frequency histogram to what is expected for

G(0,1) data. Comment on any differences. What do you notice as the sample size

changes?

Your answer must be written in sentences.
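
For readers who want to see how the summaries in Problem 6 might be produced, here is a minimal sketch. It is not the posted RCodeAssignment1.txt: the function name numerical_summaries, the seed, and the moment-based (divisor n) definitions of skewness and kurtosis are assumptions made for this example.

```r
# Numerical summaries for a data vector y. Skewness and kurtosis use
# moment-based definitions with divisor n (an assumption for this sketch).
numerical_summaries <- function(y) {
  n <- length(y)
  d <- y - mean(y)        # deviations from the sample mean
  m2 <- sum(d^2) / n      # second central sample moment
  c(mean = mean(y),
    median = median(y),
    sd = sd(y),
    skewness = (sum(d^3) / n) / m2^(3 / 2),
    kurtosis = (sum(d^4) / n) / m2^2)
}

set.seed(12345)   # hypothetical seed, chosen only for reproducibility
y <- rnorm(300)   # n = 300 observations generated from G(0, 1)
print(round(numerical_summaries(y), 3))

# Relative frequency histogram (bar areas sum to 1) with the G(0, 1)
# probability density function superimposed for comparison.
hist(y, freq = FALSE, main = "n = 300", xlab = "y")
curve(dnorm(x), add = TRUE)
```

Rerunning the sketch with smaller sample sizes, for example rnorm(50), gives a feel for how much the summaries and the histogram shape vary from sample to sample.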
