R Assignment Help: STAT 428 | 学霸联盟

Date: 2022-02-22

STAT 428 Exam 1 (SP22)
Due: Feb 23 (Wed) at noon (12:00pm)
Instructions:
• Please include the following setup code chunk (circled in yellow) above the response to
Question 1. You should replace xxxxxxxxxx with your UIN:
• Unless otherwise specified, you can use built-in stats functions (e.g., dnorm()).
• Please include intermediate steps for partial-credit grading.
• The PDF report should include all code written to solve problems, as well as any output and
plots being requested by the problems. Setting seed to UIN, consistent Rmd source code and
PDF, and correctly tagged pages for each question on Gradescope are required for responses
to be fully considered.
• The exam must be independent work, meaning that you are not allowed to consult or discuss
the exam questions with others, including classmates, the instructor or TA, and online
forums. Assisting others or sharing answers, in person or online, is also strictly prohibited.
Academic dishonesty may result in a failing grade.
• The instructor will only answer clarification questions about exam problems.
• Please do NOT discuss the exam with anyone, including on Campuswire, prior to 2/24.
There are 4 questions. Each question is worth 7 to 11 points.
1. (10 pts) Consider a random variable X with probability density
$$f_X(x) = \frac{\cos(x)}{2}, \qquad -\frac{\pi}{2} < x < \frac{\pi}{2}.$$
(a) Write a function that takes input $u \in (0, 1)$ and computes the inverse of the CDF, $F_X^{-1}(u)$.
(Note: The R asin() function computes arcsine, i.e., $\sin^{-1}(\cdot)$.)
(b) Using the function in (a), generate 1000 samples from fX(x) with the inverse CDF method.
On a Q-Q plot, compare the empirical quantiles of the obtained draws and the theoretical
quantiles of X.
(Note: The inverse CDF function is equivalent to the theoretical quantile function.)
(c) Propose a probability density g(y) to conduct acceptance-rejection sampling from fX(x).
You should provide the analytic expression for the PDF, g(y).
(d) Using acceptance/rejection sampling with g(y) from part (c), draw approximately 1000
accepted samples from fX(x). Plot the histogram of empirical density and superimpose
the theoretical density curve.
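The two sampling schemes in Question 1 can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes the CDF $F_X(x) = (\sin(x) + 1)/2$ obtained by integrating the density, and it assumes a Uniform$(-\pi/2, \pi/2)$ proposal for part (c), which is only one of many valid choices. The names inv_cdf, n_prop and accepted are hypothetical, and the seed is a placeholder rather than a UIN.

```r
# Inverse CDF of f_X(x) = cos(x)/2 on (-pi/2, pi/2).
# Integrating the density gives F_X(x) = (sin(x) + 1) / 2,
# so solving u = F_X(x) for x yields x = asin(2u - 1).
inv_cdf <- function(u) {
  stopifnot(all(u > 0 & u < 1))
  asin(2 * u - 1)
}

set.seed(1)  # placeholder seed; the exam asks for the UIN here

# Inverse CDF method: 1000 draws from f_X
x <- inv_cdf(runif(1000))

# Q-Q plot: theoretical quantiles (inverse CDF at plotting positions)
# against the sorted draws
p <- ppoints(1000)
plot(inv_cdf(p), sort(x),
     xlab = "Theoretical quantiles", ylab = "Empirical quantiles")
abline(0, 1, col = "red")

# Acceptance-rejection with an assumed proposal g(y) = 1/pi on (-pi/2, pi/2).
# The envelope constant is c = max f_X / g = (1/2) / (1/pi) = pi/2,
# so a candidate y is accepted when U <= f_X(y) / (c * g(y)) = cos(y).
n_prop <- ceiling(1000 * pi / 2)  # about 1000 acceptances expected
y <- runif(n_prop, -pi / 2, pi / 2)
accepted <- y[runif(n_prop) <= cos(y)]

hist(accepted, freq = FALSE, breaks = 30,
     main = "Acceptance-rejection draws", xlab = "x")
curve(cos(x) / 2, from = -pi / 2, to = pi / 2, add = TRUE, col = "red")
```

With this proposal the acceptance probability is 1/c = 2/π, which is why roughly 1571 candidates are generated to obtain about 1000 accepted draws.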
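2. (11+3 pts) Suppose we want to use Monte Carlo estimation to approximate $E\left(\frac{1}{\sqrt{2\pi}} e^{-X^2/2}\right)$,
where X follows fX(·) in Question 1. Then,
$$\theta = E\left[\frac{1}{\sqrt{2\pi}} e^{-X^2/2}\right] = \int_{-\pi/2}^{\pi/2} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) \cdot \frac{\cos(x)}{2} \, dx.$$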
(a) Using fX(·) in Question 1 as the importance function and exactly 1000 random samples,
perform importance sampling to obtain a Monte Carlo estimate for θ.
(Note: You may reuse R objects in Q1(b) from the inverse CDF sampling.)
(b) Using simple Monte Carlo estimation with 1000 samples, obtain a point estimate and a
95% confidence interval for θ.
(c) With a total sample size of 1000 and 4 equal-width, equal-size strata, compute a stratified
sampling estimate for θ.
(d) Out of the three estimators, (a) the importance sampling estimator, (b) the simple Monte
Carlo estimator, and (c) the stratified sampling estimator, which one do you expect to be
the LEAST efficient? Provide a brief verbal justification.
(e) (Bonus, 3 pts) One practical application of importance sampling is for estimating an
expectation, E[g(X)], when the random variable X is not straightforward to draw from.
For this question, θ can also be regarded as $E_Y[a \cdot \cos(Y)/2]$, where Y follows the truncated
standard normal distribution on $(-\pi/2, \pi/2)$. Find the value of the constant a, and use
1000 samples from the truncated N(0, 1) on $(-\pi/2, \pi/2)$ to estimate θ.
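Parts (a) to (c) of Question 2 might be sketched as below. This is an illustration under stated assumptions, not a reference solution: "simple Monte Carlo" is taken to mean uniform draws on $(-\pi/2, \pi/2)$, the strata are sampled uniformly, and the inverse CDF $\sin^{-1}(2u - 1)$ follows from integrating the density in Question 1. All object names (h, theta_is, theta_mc, theta_strat, and so on) are hypothetical, and the seed is a placeholder.

```r
set.seed(1)  # placeholder seed; the exam asks for the UIN instead
n <- 1000
a <- -pi / 2
b <- pi / 2

# Integrand of theta on (a, b): standard normal density times f_X
h <- function(x) dnorm(x) * cos(x) / 2

# (a) Importance sampling with f_X as the importance function:
# the ratio h(x) / f_X(x) reduces to dnorm(x).
x <- asin(2 * runif(n) - 1)  # inverse-CDF draws from f_X
theta_is <- mean(dnorm(x))

# (b) Simple Monte Carlo, assuming uniform draws on (a, b)
u <- runif(n, a, b)
vals <- (b - a) * h(u)
theta_mc <- mean(vals)
se_mc <- sd(vals) / sqrt(n)
ci_mc <- theta_mc + c(-1, 1) * qnorm(0.975) * se_mc  # normal-approximation 95% CI

# (c) Stratified sampling: 4 equal-width strata, n / 4 uniform draws in each
k <- 4
cuts <- seq(a, b, length.out = k + 1)
strata_est <- sapply(seq_len(k), function(j) {
  uj <- runif(n / k, cuts[j], cuts[j + 1])
  (cuts[j + 1] - cuts[j]) * mean(h(uj))
})
theta_strat <- sum(strata_est)

c(importance = theta_is, simple = theta_mc, stratified = theta_strat)
```

As a sanity check, the three estimates can be compared with the numerical value from integrate(h, a, b).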
3. (7 pts) Suppose Y1, Y2 are independent, and $Y_1 \sim \chi^2(m)$, $Y_2 \sim \chi^2(n)$. Then
$$X = \frac{Y_1/m}{Y_2/n} \sim F(m, n),$$
that is, X follows the F distribution with m and n degrees of freedom.
(a) Using only draws from the standard normal distribution (rnorm()), apply the
transformation method to generate 1000 samples from the F(2, 30) distribution.
(Note: Do NOT use rchisq, rt, rf or the corresponding density/quantile functions.)
(b) Using the samples from part (a) and the F(2, 30) density, fX(·), as the importance
function, obtain a Monte Carlo estimate of
$$p = P(X \ge 3) = E_X[I(X \ge 3)] = \int_0^\infty I(x \ge 3) \, f_X(x) \, dx.$$
Compare your result with that from the R built-in function, 1-pf(3, df1 = 2, df2 = 30). [1]
(Note: You do not need to evaluate fX(), because the importance function and the
integrand should partially cancel out.)
[1] Sidenote: The F distribution is used for hypothesis testing in analysis of variance (ANOVA). Here, you are
essentially using Monte Carlo techniques to estimate the p-value of an F-test, that is, the probability of observing
a test statistic X as large as or larger than 3 under the null distribution, F(2, 30).
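One way to sketch Question 3 is shown below. It relies on the standard fact that a sum of k squared independent standard normals follows $\chi^2(k)$, then forms the ratio defined above. The variable names (z1, z2, p_hat, p_ref) are hypothetical and the seed is a placeholder.

```r
set.seed(1)  # placeholder seed; the exam asks for the UIN instead
n <- 1000
m <- 2    # numerator degrees of freedom
k <- 30   # denominator degrees of freedom

# Each row holds the standard normal draws behind one chi-square variate
z1 <- matrix(rnorm(n * m), nrow = n)
z2 <- matrix(rnorm(n * k), nrow = n)

y1 <- rowSums(z1^2)  # chi-square with m degrees of freedom
y2 <- rowSums(z2^2)  # chi-square with k degrees of freedom

# Transformation method: ratio of scaled chi-squares is F(m, k)
x <- (y1 / m) / (y2 / k)

# With f_X as the importance function, the density cancels and the
# estimator is the sample proportion of draws at or above 3.
p_hat <- mean(x >= 3)
p_ref <- 1 - pf(3, df1 = 2, df2 = 30)

c(estimate = p_hat, builtin = p_ref)
```

Because the draws already come from F(2, 30), no density evaluation is needed; the indicator is averaged directly.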
4. (9 pts) An instructor who teaches on a whiteboard has run out of whiteboard markers and
plans to buy new markers. Suppose that, for the type of marker he will order:
• Each marker has a 70% chance of coming from a “high-quality” manufacturer and a 30%
chance of coming from a “low-quality” manufacturer.
• The lifetime (in weeks) of a marker produced by the high-quality manufacturer follows
an exponential distribution with rate λ = .5 (i.e., Exp(.5)).
• Markers produced by the low-quality manufacturer wear out at a faster rate, with
lifetime following Exp(2).
Therefore, the lifetime of a randomly chosen marker (i.e., the number of weeks it takes to be
worn out), X, follows a discrete mixture distribution:
fX(x) = .7f1(x) + .3f2(x), x > 0,
where f1(·) is the Exp(.5) density, and f2(·) is the Exp(2) density.
(a) Generate a random sample of size 500 for X, i.e., the lifetime of 500 randomly chosen
markers. Plot the histogram of empirical density, with the theoretical density, fX(·),
superimposed.
(b) This instructor plans to buy 20 markers. He uses one marker at a time and replaces it with
a new one when the current one is worn out. Thus, the total time it takes for all 20
markers to be worn out is
$$T = \sum_{i=1}^{20} X_i,$$
where $X_1, \ldots, X_{20} \overset{\text{i.i.d.}}{\sim} f_X(\cdot)$. Draw 1000 random samples of the total time, T.
(c) Let us use Monte Carlo integration to help the instructor estimate when he should expect
to run out of markers again, i.e.,
$$E(T) = \int_0^\infty t \cdot f_T(t) \, dt.$$
With the density of T, fT(·), as the importance function, use the 1000 samples from part
(b) to obtain an importance sampling estimate for E(T). [2]
[2] Sidenote: This is connected to the renewal process, a type of stochastic process commonly used in industrial and
systems engineering. Assuming that the lifetime of equipment (e.g., lightbulb, marker) is i.i.d., and that we renew
with a new one at failure, the renewal process can help us estimate the number of renewals by time t or the expected
time of the jth renewal.
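The mixture sampling in Question 4 can be sketched as follows. This is a minimal illustration, not a reference solution: the helper r_marker is a hypothetical name, and the seed is a placeholder. Each draw first picks a manufacturer with probabilities .7 and .3, then draws an exponential lifetime with the matching rate.

```r
set.seed(1)  # placeholder seed; the exam asks for the UIN instead

# Draw n marker lifetimes from the mixture .7 * Exp(.5) + .3 * Exp(2)
r_marker <- function(n) {
  high_quality <- runif(n) < 0.7          # TRUE with probability .7
  rate <- ifelse(high_quality, 0.5, 2)    # per-marker exponential rate
  rexp(n, rate = rate)                    # rexp is vectorized over rate
}

# (a) 500 lifetimes, with the theoretical mixture density superimposed
x <- r_marker(500)
hist(x, freq = FALSE, breaks = 40,
     main = "Marker lifetimes", xlab = "weeks")
curve(0.7 * dexp(x, rate = 0.5) + 0.3 * dexp(x, rate = 2),
      from = 0, to = max(x), add = TRUE, col = "red")

# (b) 1000 draws of T, each the sum of 20 i.i.d. lifetimes
t_total <- replicate(1000, sum(r_marker(20)))

# (c) With f_T as the importance function, t * f_T(t) / f_T(t) = t,
# so the importance sampling estimate is the sample mean of T.
e_t_hat <- mean(t_total)
e_t_hat
```

By linearity of expectation, E(T) = 20 × (.7/.5 + .3/2) = 31 weeks, which gives a quick check on the estimate.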