STAT 231 Online Assignment 1
Assignment 1 is due on Thursday January 21 at 11:00 am EST.
Your assignment submission must be typed. There are no exceptions.
Any submitted answer that is not typed will not be marked and will receive a mark of
zero.
You may create your document in Word, Google Docs, LaTeX or any other word
processor. The requirement to type your assignment is to facilitate the marking of
hundreds of assignments so that the marked assignments can be returned to you in a
timely fashion. It is also useful for you to gain some experience in creating a
document containing mathematical expressions, especially in this time of doing
everything online! Two documents have been posted in the Assignment 1 folder on
LEARN on how to use the equation editor in Word.
Follow the steps in the document Introduction to R and RStudio (posted on the
course website on LEARN) to install the software needed for this course. See Section 1
- Introduction. To learn how to run R code, see Section 2 - Getting Started.
Upload your assignment to Crowdmark as a PDF file. Here is a useful link for all
information related to Crowdmark assessments: https://crowdmark.com/help/
You can upload your assignment as one document or individually for each problem. If
you upload one document then you must drag and drop the pages for each problem
to the appropriate question as indicated in Crowdmark. You can resubmit your
assignment any number of times before the due time. To ensure that there
are no issues with uploading, we advise you to upload your assignment well in
advance of the due time.
Assignments which are left as a single document and not uploaded to the appropriate
places in Crowdmark will be assigned a 10% penalty.
A penalty of 10% per hour is applied for late assignments.
Please see the course policy on missed assignments, posted on LEARN
under Syllabus.
In this course we will use many concepts that were covered in STAT 230 (a
prerequisite for this course). In Problems 1-4 you will review some of these concepts
and use the software R to evaluate probabilities. You may find it useful to look
at review problems 14 to 18 in Chapter 1 of the STAT 231 Course Notes before
attempting these problems. A review document about the continuity correction is
posted in the Assignment 1 folder on LEARN.
Problem 1: Binomial distribution
In a very large population 1% of the people have a certain genetic mutation. Suppose 1200 people are
selected at random. Define the random variable Y = number of people with the genetic mutation in
the sample.
(a) What are the assumptions for a Binomial model? Explain, with reasons, whether or not these
assumptions might hold in this context. Your answer must be written in sentences.
(b) Use the Normal approximation to the Binomial with continuity correction and the Normal table in
the Course Notes to approximate the following probabilities.
P(Y ≤ 8), P(Y ≥ 16), and P(|Y – 12| < 7)
You must show your work for full marks.
(c) Type help(pbinom) in R to see the syntax for the R functions pbinom, qbinom, dbinom, and
rbinom. Use the appropriate R functions to obtain values for:
P(Y ≤ 8), P(Y ≥ 16), and P(|Y – 12| < 7)
Include the R statements that you used in your submitted answer.
(d) For each of the probabilities in (b) and (c) determine the percent relative error
$$100 \times \frac{|a - p|}{p}$$
where $a$ is the approximate probability and $p$ is the probability calculated using R. Explain why each pair of
values is in good agreement or not.
(e) Suppose the proportion of people with the genetic mutation is an unknown value equal to θ.
Suppose n people are selected at random where n is large. Approximate the probability:
$$P\left(n\theta - 2.17\sqrt{n\theta(1-\theta)} \le Y \le n\theta + 2.17\sqrt{n\theta(1-\theta)}\right)$$
You may ignore the continuity correction. You must show your work for full marks.
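The R patterns used in parts (b) to (d) can be sketched as follows. This is only an illustration: the Binomial(50, 0.2) distribution and the variable names are hypothetical choices, not the values from Problem 1, and the relative error is taken relative to the probability computed by R.

```r
# Hypothetical example: X ~ Binomial(size = 50, prob = 0.2).
# These values are illustrative assumptions, not those of Problem 1.
size <- 50
prob <- 0.2

# P(X <= 8): pbinom returns the cumulative probability P(X <= q)
p_le <- pbinom(8, size, prob)

# P(X >= 16) = 1 - P(X <= 15), because X takes integer values only
p_ge <- 1 - pbinom(15, size, prob)

# P(|X - 10| < 4) = P(7 <= X <= 13) = P(X <= 13) - P(X <= 6)
p_mid <- pbinom(13, size, prob) - pbinom(6, size, prob)

# Normal approximation to P(X <= 8) with continuity correction
mu <- size * prob
sigma <- sqrt(size * prob * (1 - prob))
a_le <- pnorm((8 + 0.5 - mu) / sigma)

# Percent relative error of the approximation, relative to the R value
rel_err <- 100 * abs(a_le - p_le) / p_le

print(c(exact = p_le, approx = a_le, rel_err = rel_err))
```

Rewriting each event in terms of P(X <= q) first, as in the comments, makes it easier to see which argument to pass to pbinom.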
Problem 2: Poisson distribution
During the week of December 6-13, 2020 the visits to an Eastern Ontario Health Unit website to book
a Covid test occurred at random at the average rate of 10 visits per minute. Suppose it is reasonable
to use a Poisson process to model this process. Define the random variable Y = number of visits to the
website in one minute.
(a) Using the three assumptions for a Poisson process argue whether you think it is reasonable or not
for these assumptions to hold in this scenario. Your answer must be written in sentences.
(b) Use the Normal approximation to the Poisson with continuity correction and the Normal table in
the Course Notes to approximate:
P(Y < 5), P(Y > 14), and P(|Y – 10| ≥ 7)
You must show your work for full marks.
(c) Type help(ppois) in R to see the syntax for the R functions ppois, qpois, dpois, and rpois. Use the
appropriate R functions to obtain values for:
P(Y < 5), P(Y > 14), and P(|Y – 10| ≥ 7)
Include the R statements that you used in your submitted answer.
(d) For each of the probabilities in (b) and (c) determine the percent relative error
$$100 \times \frac{|a - p|}{p}$$
where $a$ is the approximate probability and $p$ is the probability calculated using R. Explain why each pair of
values is in good agreement or not.
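(e) Suppose Y1, Y2, …, Yn is a random sample from a Poisson(θ) distribution and let
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean.
Approximate the probability:
$$P\left(\bar{Y} - 1.61\sqrt{\frac{\theta}{n}} \le \theta \le \bar{Y} + 1.61\sqrt{\frac{\theta}{n}}\right)$$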
You may ignore the continuity correction.
You must show your work for full marks.
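For the strict inequalities in parts (b) and (c), it helps to convert each event to a statement about P(X <= q) before calling ppois. A minimal sketch, assuming a hypothetical Poisson distribution with mean 4 (not the rate from Problem 2):

```r
# Hypothetical example: X ~ Poisson(lambda = 4).
# The rate is an illustrative assumption, not the one in Problem 2.
lambda <- 4

# P(X < 3) = P(X <= 2)
p_lt <- ppois(2, lambda)

# P(X > 6): upper tail, the same as 1 - ppois(6, lambda)
p_gt <- ppois(6, lambda, lower.tail = FALSE)

# P(|X - 4| >= 3) = P(X <= 1) + P(X >= 7) = P(X <= 1) + P(X > 6)
p_out <- ppois(1, lambda) + ppois(6, lambda, lower.tail = FALSE)

# Normal approximation to P(X < 3) = P(X <= 2) with continuity correction;
# a Poisson(lambda) variable has mean lambda and variance lambda
a_lt <- pnorm((2 + 0.5 - lambda) / sqrt(lambda))

print(c(p_lt = p_lt, p_gt = p_gt, p_out = p_out, a_lt = a_lt))
```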
Problem 3: Normal or Gaussian distribution
Suppose it is reasonable to assume that the heights in centimeters of second year female Math
students at the University of Waterloo have a G(160,9) = N(160, 81) distribution. Define the random
variable Y = height of a female Math student chosen at random.
(a) Use the Normal table in the Course Notes to determine P(Y ≥ 169).
You must show your work for full marks.
(b) Type help(pnorm) in R to see the syntax for the R functions pnorm, qnorm, dnorm, and rnorm. Use
the appropriate R function to obtain the value for P(Y ≥ 169).
Include the R statement that you used in your submitted answer.
(c) Find the percent relative error
$$100 \times \frac{|a - p|}{p}$$
where $a$ is the probability determined in (a) using
the Normal table and $p$ is the probability determined in (b) using R. Explain why the answers are in
good agreement or not.
(d) Determine a such that P(Y ≥ a) = 0.83 using the inverse Normal cumulative distribution table in the
Course Notes.
You must show your work for full marks.
(e) Use the appropriate R function to obtain the value for a such that P(Y ≥ a) = 0.83.
Include the R statement that you used in your submitted answer.
(f) Are the answers in (d) and (e) in good agreement or not?
(g) Suppose 64 female Math students are chosen at random. Determine the probability that their
average height lies between 159 and 162. Use R to find the probability, not the Normal table in the
Course Notes.
You must show your work for full marks.
Include the R statement that you used in your submitted answer.
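The pnorm and qnorm calls needed for this problem follow the pattern sketched below. The G(100, 15) distribution, the sample size of 25 and the cut-off values are hypothetical. Note that the second parameter of G(μ, σ) is the standard deviation, which matches the sd argument of pnorm and qnorm.

```r
# Hypothetical example: X ~ G(100, 15), i.e. mean 100 and standard
# deviation 15. These values are illustrative, not those of Problem 3.
mu <- 100
sigma <- 15

# P(X >= 120): upper tail probability
p_upper <- pnorm(120, mean = mu, sd = sigma, lower.tail = FALSE)

# Value a with P(X >= a) = 0.9, equivalently P(X <= a) = 0.1
a <- qnorm(0.1, mean = mu, sd = sigma)

# The mean of n independent observations is G(mu, sigma / sqrt(n))
n <- 25
se <- sigma / sqrt(n)

# P(98 <= sample mean <= 103)
p_mean <- pnorm(103, mean = mu, sd = se) - pnorm(98, mean = mu, sd = se)

print(c(p_upper = p_upper, a = a, p_mean = p_mean))
```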
Problem 4: Exponential distribution
Suppose it is reasonable to model the battery life (in hours) of a certain type of watch battery using
the Exponential(3) distribution. Define the random variable Y = battery life (in hours) of a randomly
chosen watch battery.
(a) With reference to the Memoryless Property of the Exponential Distribution discuss whether you
think an Exponential Model is a reasonable model for Y.
Your answer must be written in sentences.
(b) Determine P(Y ≥ 4) using the probability density function of Y and integration.
You must show your work for full marks.
(c) Type help(pexp) in R to see the syntax for the R functions pexp, qexp, dexp, and rexp. Use the
appropriate R function to obtain the value for P(Y ≥ 4). Include the R statement that you used in your
submitted answer.
(d) Determine the median of this distribution, that is, determine the value m such that
P(Y ≤ m ) = 0.5
You must show your work for full marks.
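(e) Suppose Y1, Y2, …, Yn is a random sample from an Exponential(θ) distribution and let
$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean.
Approximate the probability:
$$P\left(\bar{Y} - 1.96\frac{\theta}{\sqrt{n}} \le \theta \le \bar{Y} + 1.96\frac{\theta}{\sqrt{n}}\right)$$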
You may ignore the continuity correction. Use R to find the probability, not the Normal table in the
Course Notes.
You must show your work for full marks.
Include the R statement that you used in your submitted answer.
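One point to watch when using pexp and qexp: R parametrizes the Exponential distribution by its rate. The sketch below assumes that Exponential(θ) denotes the distribution with mean θ, so that rate = 1/θ; check this against the Course Notes. The value θ = 5 and the cut-off 2 are hypothetical.

```r
# Hypothetical example: X ~ Exponential(theta) with mean theta = 5
# (assumed mean parametrization). R's *exp functions take rate = 1 / theta.
theta <- 5
rate <- 1 / theta

# P(X >= 2) from the cumulative distribution function
p_tail <- pexp(2, rate = rate, lower.tail = FALSE)

# The same probability by numerically integrating the density from 2 to Inf
p_int <- integrate(dexp, lower = 2, upper = Inf, rate = rate)$value

# Median: the value m with P(X <= m) = 0.5
m <- qexp(0.5, rate = rate)

print(c(p_tail = p_tail, p_int = p_int, median = m))
```

Comparing p_tail with p_int is a quick way to confirm that the rate has been specified correctly.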
Problem 5: Empirical Studies
The purpose of this problem is to examine how empirical studies are reported in the
news media.
On the course website on LEARN you will find a module under Additional Resources called Statistics
in the Media. These are all examples of empirical studies which have been reported in the news
media.
Find your own example of statistics in the news media.
Pick a topic which is of interest to you and search online using keywords which describe
your topic.
News media includes print media (newspapers, newsmagazines), broadcast news (radio and
television), and the Internet (online newspapers, news blogs, news videos, live news
streaming, etc.).
Your article must not come from a research journal.
Your example should be less than 2 pages long.
Make sure you choose an example for which the data are a sample of a larger population and
not a census of that population.
The example must have appeared in the news media after December 31, 2019.
(a) Indicate clearly the information on where the article appeared and the date it appeared.
Give the link to the article. To help the TAs mark this question please cut and paste the article into
your assignment.
The answers to (b) - (f) must be written in sentences.
(b) Indicate clearly the keywords you used to find your example and why this topic is of interest to
you.
(c) State clearly and succinctly what the purpose of the study was and the conclusion reached by the
researchers.
(d) The study you selected can be best described as which of the following: an observational study, a
sample survey or an experimental study? Justify your answer.
(e) What are the units in this study? Based on the given information, what population or collection of
units are the researchers interested in?
(f) Give the 2 most important variates in this study and indicate the type of each.
Problem 6:
The purpose of this problem is to use R to generate numerical summaries (see Chapter 1)
and the relative frequency histogram for a Gaussian data set which has been
randomly generated in R. There are two data sets for each sample size of n = 50, 100,
200, and 300. The aim is to compare the observed summaries with what is expected
for Gaussian data.
The R code for this problem is posted as a text file called RCodeAssignment1.txt in
the Assignment 1 folder on LEARN.
Run the R code provided and verify that you obtain the same plots as shown on the
next 4 pages.
Follow the instructions and answer the questions which appear after these plots.
[Plots: relative frequency histograms for the two data sets at each sample size n = 50, 100, 200, and 300]
Run the R code for this problem again except modify the line
"id<-20456484"
by replacing the number 20456484 with your UWaterloo ID number.
When you run the R code with your ID number you will generate 8 new plots. Export
these plots as .png files using RStudio (see Introduction to R and RStudio - Section 6).
(a) My ID number is _________________.
(b) Insert the plots generated using your ID number in your assignment (2 per page).
(c) Each of these data sets was randomly generated from a G(0,1) distribution.
Complete the following sentence and include it with your assignment:
For each data set we expect the sample mean to be close to ______________,
the sample median to be close to _____________, the sample standard deviation to
be close to ______________, the sample skewness to be close to ______________,
the sample kurtosis to be close to ____________, and the shape of the relative
frequency histogram to be approximately ___________________.
(d) For each of the 8 plots generated using your ID number, compare the observed
numerical summaries and the relative frequency histogram to what is expected for
G(0,1) data. Comment on any differences. What do you notice as the sample size
changes?
Your answer must be written in sentences.
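For orientation, the kind of summaries and plot described in this problem can be sketched as follows. This is not the posted RCodeAssignment1.txt: the seed, the sample size and the helper functions sample_skewness and sample_kurtosis are hypothetical, and the moment-based definitions of skewness and kurtosis are assumed to match those in Chapter 1 of the Course Notes.

```r
# Minimal sketch, not the posted course code. The seed and sample size
# are arbitrary illustrative choices.
set.seed(231)
n <- 100
y <- rnorm(n, mean = 0, sd = 1)   # random sample from G(0, 1)

# Moment-based sample skewness (assumed definition)
sample_skewness <- function(y) {
  d <- y - mean(y)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Moment-based sample kurtosis (assumed definition)
sample_kurtosis <- function(y) {
  d <- y - mean(y)
  mean(d^4) / mean(d^2)^2
}

summaries <- c(mean = mean(y), median = median(y), sd = sd(y),
               skewness = sample_skewness(y), kurtosis = sample_kurtosis(y))
print(round(summaries, 3))

# Relative frequency histogram (total area 1) with the G(0, 1) density overlaid
hist(y, freq = FALSE, main = paste("n =", n), xlab = "y")
curve(dnorm(x), add = TRUE)
```

Changing n and rerunning shows how much the summaries vary from sample to sample at each sample size.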