Date: 2021-04-29
OR 664 / SYST 664 / CSI 674 Final Exam Page 1 Spring 2021
OR 664 / SYST 664 / CSI 674: Final Exam
Due Monday May 3, 2021, 11:59 PM
Each question is worth ten points, for a total of 100 points for the final exam. Show all your
reasoning. You will receive some credit for making an honest attempt and more credit if I can
tell you were thinking along the right lines. Write up your solutions in a pdf file and submit to
Gradescope. Your writeup for each problem should be self-contained and may include sections
of code. You may submit other documents (R file, spreadsheet) separately to Blackboard for
reference, but your solution must be understandable on its own. You are bound by the GMU
honor code to work by yourself on the exam. I will be available by phone or email to answer
clarification questions.
1. A sensor is designed to detect a pollutant in soil samples. The manufacturer claims the
sensor has 85% accuracy. Based on this claim, the following prior distribution has been
specified for the true positive and false positive rates:
• True positive rate (sensitivity). The probability of correctly detecting the pollutant if it
is present has a beta distribution with shape parameters 4.25 and 0.75.
• False positive rate (1 – specificity). The probability that the test will erroneously
report the pollutant if it is not present is independent of the sensitivity and has a beta
distribution with shape parameters 0.75 and 4.25.
A test was performed on 35 soil samples known to contain the pollutant and 40 soil samples
known not to contain the pollutant. The sensor correctly detected 26 out of the 35 samples
containing the pollutant, and incorrectly reported the pollutant in 7 of the samples not
containing the pollutant. What is the joint posterior distribution for the true positive and false
positive rate of the sensor? Find 95% posterior credible intervals for the true and false
positive rates.
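The conjugate update behind this kind of question can be sketched as follows: a Beta(a, b) prior combined with s successes in n binomial trials gives a Beta(a + s, b + n − s) posterior, and because the two rates have independent priors and independent data, their joint posterior is the product of the two beta posteriors. The Python below is only an illustration under those assumptions; the function names, the seed, and the number of draws are invented here, and the Monte Carlo interval stands in for an exact beta quantile function.

```python
import random

def beta_posterior(a, b, successes, trials):
    """Conjugate update: Beta(a, b) prior plus binomial data gives a beta posterior."""
    return a + successes, b + (trials - successes)

def mc_interval(a, b, level=0.95, draws=20000, seed=1):
    """Approximate an equal-tailed credible interval for Beta(a, b) by simulation."""
    rng = random.Random(seed)
    sample = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = sample[int((1 - level) / 2 * draws)]
    upper = sample[int((1 + level) / 2 * draws) - 1]
    return lower, upper

# Sensitivity: 26 detections in 35 contaminated samples.
tpr_post = beta_posterior(4.25, 0.75, successes=26, trials=35)
# False positive rate: 7 false reports in 40 clean samples.
fpr_post = beta_posterior(0.75, 4.25, successes=7, trials=40)

for name, (a, b) in (("true positive", tpr_post), ("false positive", fpr_post)):
    print(name, "posterior: Beta(%.2f, %.2f)" % (a, b), "interval:", mc_interval(a, b))
```

The simulated interval endpoints carry Monte Carlo error, so they should be read as approximations to the exact quantiles.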
2. A scientist is studying the effect of soil conditions on the height of a species of plant. Height
data in centimeters was collected on four different plots of land with different soil conditions.
Plot A   Plot B   Plot C   Plot D
 21.1     19.2     27.0     20.1
 30.5     35.7     23.6     42.4
 19.3     16.7     19.3     30.5
 14.8     18.6      8.8     32.2
 17.5     23.8     21.9     28.3
 12.0     24.6     10.8     23.8
 23.4     25.9     12.4     25.8
 34.1     10.1     10.6     15.5
 16.7     21.7     20.2     19.8
 11.4     17.7      7.9     23.3
 14.3     18.6     16.0     23.0
 25.0     37.4     20.5     33.3
 13.9     27.1      1.1     16.6
 21.9     29.5     16.6     28.5
 21.5     32.7     26.9     18.8
Assume the observations from each plot are normally distributed with unknown plot-specific
means Qi and precisions Ri for i=1, …, 4. Assume the parameters (Qi, Ri) are independent
draws from a normal-gamma(µ, k, a, b) distribution. Find empirical Bayes estimates for the
hyperparameters µ, k, a, and b as follows:
• Estimate the center µ as the grand mean of all the observations.
• Estimate the shape and scale as follows. Estimate the sample precisions $r_1, r_2, r_3, r_4$
by calculating the sample variances and taking their inverses. Estimate the mean $ab$
of the Gamma distribution as the average of $r_1, r_2, r_3, r_4$. Estimate the variance
$ab^2$ as the sample variance of $r_1, r_2, r_3, r_4$. Then solve for $a$ and $b$.
• To estimate the precision multiplier k, first find the sample means
$\bar{y}_1, \bar{y}_2, \bar{y}_3, \bar{y}_4$ for the four plots. Then calculate the sample
variance of these four sample means, and invert to estimate the precision of the means.
Divide this value by the average of $r_1, r_2, r_3, r_4$ to estimate k, the ratio of the
precision of the means to the precision of the observations.
Find the joint posterior distribution for Q1, Q2, Q3, and Q4. Find a 95% posterior credible
interval for each Qi. Comment on your results, including whether the assumption that the
observations are normally distributed is justified.
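The empirical Bayes recipe above can be sketched in a few lines. This is a minimal illustration, assuming the gamma distribution is parameterized by shape a and scale b (so its mean is ab and its variance is ab²) and that "sample variance" means the usual n − 1 version; the function and variable names are invented for the example.

```python
from statistics import mean, variance

# Heights in cm, one tuple per table row: (Plot A, Plot B, Plot C, Plot D).
rows = [
    (21.1, 19.2, 27.0, 20.1), (30.5, 35.7, 23.6, 42.4), (19.3, 16.7, 19.3, 30.5),
    (14.8, 18.6, 8.8, 32.2), (17.5, 23.8, 21.9, 28.3), (12.0, 24.6, 10.8, 23.8),
    (23.4, 25.9, 12.4, 25.8), (34.1, 10.1, 10.6, 15.5), (16.7, 21.7, 20.2, 19.8),
    (11.4, 17.7, 7.9, 23.3), (14.3, 18.6, 16.0, 23.0), (25.0, 37.4, 20.5, 33.3),
    (13.9, 27.1, 1.1, 16.6), (21.9, 29.5, 16.6, 28.5), (21.5, 32.7, 26.9, 18.8),
]
plots = [list(column) for column in zip(*rows)]  # transpose into four plots

def empirical_bayes(plots):
    """Return (mu, k, a, b) following the three estimation steps in the text."""
    all_obs = [y for plot in plots for y in plot]
    mu = mean(all_obs)                               # grand mean
    precisions = [1.0 / variance(plot) for plot in plots]
    prec_mean = mean(precisions)                     # estimates a * b
    prec_var = variance(precisions)                  # estimates a * b**2
    b = prec_var / prec_mean                         # scale
    a = prec_mean / b                                # shape
    plot_means = [mean(plot) for plot in plots]
    k = (1.0 / variance(plot_means)) / prec_mean     # precision multiplier
    return mu, k, a, b

mu, k, a, b = empirical_bayes(plots)
print("mu=%.3f k=%.3f a=%.3f b=%.6f" % (mu, k, a, b))
```

With only four plots, the variance of the four sample precisions is itself a noisy estimate, so the resulting a and b should be treated as rough.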
3. Using the posterior distributions from the previous problem, use 5000 direct Monte Carlo
samples to estimate the following probabilities: (1) the posterior probability that the mean
height Q3 for Plot C is the smallest of the four group means; and (2) for four new plants, one
growing on each of the plots, the posterior probability that the plant on Plot C is the shortest
of the four plants. Clearly describe the process you use to find your Monte Carlo estimates.
Discuss your results. Clearly explain the difference between the two estimated probabilities.
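To make the distinction between the two probabilities concrete, here is a hedged sketch of direct Monte Carlo sampling from normal-gamma posteriors. It assumes each plot's posterior is normal-gamma(mu, k, a, b) with shape a and scale b, so that the precision R is gamma, the mean Q given R is normal with precision kR, and a new plant's height given (Q, R) is normal with precision R. The posterior parameter values below are made-up placeholders, not the answer to the previous problem.

```python
import random

# Hypothetical placeholder posteriors (mu, k, a, b), one per plot. NOT exam answers.
posteriors = [
    (20.0, 15.5, 9.0, 0.0020),   # Plot A
    (24.0, 15.5, 9.0, 0.0020),   # Plot B
    (16.5, 15.5, 9.0, 0.0020),   # Plot C
    (25.0, 15.5, 9.0, 0.0020),   # Plot D
]

def smallest_probabilities(posteriors, target, draws=5000, seed=1):
    """Estimate P(target has the smallest mean) and P(target's new plant is shortest)."""
    rng = random.Random(seed)
    mean_hits = plant_hits = 0
    for _ in range(draws):
        means, plants = [], []
        for mu, k, a, b in posteriors:
            r = rng.gammavariate(a, b)            # precision: shape a, scale b
            q = rng.gauss(mu, (k * r) ** -0.5)    # plot mean given the precision
            y = rng.gauss(q, r ** -0.5)           # height of one new plant
            means.append(q)
            plants.append(y)
        mean_hits += means.index(min(means)) == target
        plant_hits += plants.index(min(plants)) == target
    return mean_hits / draws, plant_hits / draws

p_mean, p_plant = smallest_probabilities(posteriors, target=2)  # index 2 is Plot C
print("P(mean smallest) ~", p_mean, " P(new plant shortest) ~", p_plant)
```

The second probability adds plant-to-plant variability on top of uncertainty about the means, which is why it is pulled toward 1/4 relative to the first.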
4. The file poverty.txt, included on Blackboard with this exam, contains data on poverty
rates and births per 1000 females 15 to 17 years old for 50 states and the District of
Columbia. Assume the relationship between poverty rate $x_{1:51}$ and birth rate $y_{1:51}$ is linear
with independent normally distributed errors. Assume the following unit-information prior
distribution for the parameters (h, b):
• Conditional on the precision r, the transformed intercept h is normally distributed
with mean 0 and precision r.
• Conditional on the precision r, the slope b is independent of h and normally
distributed with mean 0 and precision $\frac{1}{51} S_{xx} r$, where $S_{xx} = \sum_i (x_i - \bar{x})^2$.
• The precision r has a gamma distribution with shape ½ and scale 0.065 (the scale is
$2/\hat{\sigma}^2$, where $\hat{\sigma}^2$ is the residual sum of squares divided by $n - 2$).
Find the joint posterior distribution for (h, b, r). Find 95% credible intervals for the slope b
and the untransformed intercept a. Comment on your results, including whether the
assumptions for normal linear regression are met.
5. For the poverty problem, find a 90% posterior predictive interval for the number of births per
1000 females aged 15 to 17 for an area with poverty rate 25%. Repeat for poverty rate 50%.
Discuss.
6. A biologist counts the number of sparrows visiting five bird feeders placed on a given day.
Feeder Number of Birds
1 11
2 22
3 9
4 28
5 19
• Assume that the bird counts are independent Poisson random variables with feeder-
dependent means Li, for i=1,…,5.
• Assume that the means Li are independent and identically distributed gamma random
variables with shape a and scale b (or equivalently, shape a and mean m = ab).
• The mean m = ab of the gamma distribution is uniformly distributed on a grid of 50
equally spaced values starting at 5 and ending at 40.
• The shape a is independent of the mean m and distributed on a grid of 50 equally spaced
points starting at 1 and ending at 50, with prior probability proportional to 1/a.
Use Gibbs sampling to draw 5000 samples from the joint posterior distribution of the mean
M, the shape parameter A, and the five rate parameters Li, i=1,…,5, conditional on the
observed bird counts. Using your sample, calculate 95% credible intervals for the five rate
parameters Li. Discuss.
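One way to organize a Gibbs sampler for a model of this form is sketched below. It relies on three full conditionals that follow from the stated assumptions: each rate given (a, m) and its count y is gamma with shape a + y and scale b/(b + 1), where b = m/a; the mean m is drawn from its grid with weights given by the gamma density of the current rates; and the shape a is drawn from its grid with the same likelihood times the 1/a prior. Function names, starting values, and the seed are illustrative choices, and burn-in and convergence checks are left out.

```python
import math
import random

counts = [11, 22, 9, 28, 19]
m_grid = [5 + 35 * i / 49 for i in range(50)]   # 50 equally spaced means, 5 to 40
a_grid = [1 + i for i in range(50)]             # 50 equally spaced shapes, 1 to 50

def log_gamma_pdf(x, shape, scale):
    """Log density of a gamma distribution with the given shape and scale."""
    return ((shape - 1) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))

def draw_from_grid(rng, grid, log_weights):
    """Sample one grid point with probability proportional to exp(log weight)."""
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    return rng.choices(grid, weights=weights)[0]

def gibbs(counts, n_samples=5000, seed=1):
    rng = random.Random(seed)
    m, a = m_grid[21], a_grid[4]                # arbitrary starting values
    samples = []
    for _ in range(n_samples):
        scale = m / a
        # Rates given (a, m): conjugate gamma update for Poisson counts.
        rates = [rng.gammavariate(a + y, scale / (scale + 1)) for y in counts]
        # Mean given (a, rates): uniform prior on the grid.
        m = draw_from_grid(rng, m_grid, [
            sum(log_gamma_pdf(lam, a, mm / a) for lam in rates) for mm in m_grid])
        # Shape given (m, rates): prior proportional to 1/a.
        a = draw_from_grid(rng, a_grid, [
            -math.log(aa) + sum(log_gamma_pdf(lam, aa, m / aa) for lam in rates)
            for aa in a_grid])
        samples.append((m, a, rates))
    return samples
```

Calling `gibbs(counts)` returns a list of `(m, a, rates)` tuples; sorting the draws of one rate and reading off the 2.5% and 97.5% order statistics gives an approximate 95% credible interval for that feeder.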
7. Assume that a virus detection system has a miss probability of 6% and a false alarm
probability of 8%, where a miss is defined as failing to report an infected file, and a false
alarm is defined as reporting an infection when there is none. The company is considering
using the system to screen attachments on all incoming emails. Assume a loss of 0 for
making the right choice: catching an infected file or not reporting a clean attachment.
Assume that the loss for failing to identify an infected file is 25 times the loss for reporting
an infection when there is none. Let p be the prior probability that an attachment is infected.
As a function of p, find the expected loss of three policies: (1) report all attachments as
infected; (2) do not report any infections; and (3) report an attachment as infected if and only
if the testing system identifies it as infected. For what range of p is each policy optimal?
Comment on your results.
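The expected-loss comparison can be set up as below. This is a sketch that assumes the false-alarm loss is taken as the unit of loss, so a miss costs 25 units; the names are invented for the example. Each policy's expected loss is linear in p, so the ranges where each is optimal follow from where the lines cross.

```python
MISS = 0.06               # P(system reports clean | file infected)
FALSE_ALARM = 0.08        # P(system reports infected | file clean)
LOSS_FALSE_ALARM = 1.0    # assumed unit of loss
LOSS_MISS = 25.0 * LOSS_FALSE_ALARM

def expected_losses(p):
    """Expected loss of each policy when the prior probability of infection is p."""
    return {
        # Every clean attachment becomes a false alarm.
        "report_all": (1 - p) * LOSS_FALSE_ALARM,
        # Every infected attachment is missed.
        "report_none": p * LOSS_MISS,
        # Losses arise only from the system's own misses and false alarms.
        "follow_system": p * MISS * LOSS_MISS
                         + (1 - p) * FALSE_ALARM * LOSS_FALSE_ALARM,
    }

def best_policy(p):
    """Name of the policy with the smallest expected loss at this p."""
    losses = expected_losses(p)
    return min(losses, key=losses.get)

for p in (0.001, 0.1, 0.9):
    print(p, best_policy(p), expected_losses(p))
```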
8. Management at a diner is investigating business at the morning rush in order to find an
efficient staffing policy. Assume that the number of customers arriving per minute during the
breakfast period has a Poisson distribution with an unknown rate L. A non-informative prior
distribution with density proportional to $L^{-1/2}$ is defined on the positive real line for the unknown rate L. This is
the Jeffreys prior, and it is the limit of a gamma distribution with shape ½ and scale tending
to infinity. Over the 1-hour morning rush, 78 customers arrived. Find the posterior
distribution for the unknown rate L. Given this posterior distribution for L, find the
probability that 25 or more customers will arrive in a future 15-minute time period during the
morning rush.
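A sketch of the calculation: a prior proportional to $L^{-1/2}$ combined with a Poisson total of 78 arrivals over 60 minutes gives a gamma posterior with shape 78.5 and rate 60 per minute (scale 1/60), and mixing a Poisson count for a 15-minute window over that posterior gives a negative binomial predictive distribution. The helper names are invented for this illustration, and the code uses the rate parameterization rather than the scale.

```python
import math

def posterior_gamma(total_count, minutes, prior_shape=0.5):
    """Posterior (shape, rate) for a per-minute Poisson rate under the L^(-1/2) prior."""
    return prior_shape + total_count, float(minutes)

def predictive_pmf(k, shape, rate, window):
    """P(k arrivals in `window` minutes) for the gamma-Poisson mixture."""
    q = rate / (rate + window)
    log_p = (math.lgamma(shape + k) - math.lgamma(shape) - math.lgamma(k + 1)
             + shape * math.log(q) + k * math.log(1 - q))
    return math.exp(log_p)

def prob_at_least(n, shape, rate, window):
    """P(at least n arrivals in the window), via the complement of the lower tail."""
    return 1.0 - sum(predictive_pmf(k, shape, rate, window) for k in range(n))

shape, rate = posterior_gamma(total_count=78, minutes=60)
p_busy = prob_at_least(25, shape, rate, window=15)
print("posterior: Gamma(shape=%.1f, rate=%.0f); P(N >= 25) = %.4f" % (shape, rate, p_busy))
```

The predictive distribution is wider than a Poisson with the posterior mean rate plugged in, because it also carries the remaining uncertainty about L.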
9. An analyst is working with an expert to specify a prior distribution for data on 5-year relapse
rates in a treatment program for addiction. The expert has given the following judgments:
• There is a 95% chance that the relapse rate is greater than 10%;
• There is an 80% chance that the relapse rate is greater than 20%;
• There is a 50% chance that the relapse rate is greater than 30%;
• There is a 10% chance that the relapse rate is greater than 55%.
Find a beta prior distribution for the relapse rate that fits these judgments as well as possible.
Comment on your results.
10. A management consulting firm conducted a survey of employees of a company. 17 out of 50
respondents stated they were “very satisfied”; 22 out of 50 respondents said they were
“somewhat satisfied,” 8 out of 50 respondents said they were “neither satisfied nor
dissatisfied,” and 3 out of 50 respondents said they were “dissatisfied.” Assume that the
survey respondents are a representative sample from the target population of employees.
Assume a uniform prior distribution for the probability vector $(p_1, p_2, p_3, p_4)$ for the choice
among the four responses by an employee in the target population. Find the posterior distribution
for the probability vector. Find 90% credible intervals for the probability that an employee in
the target population would choose each of the four responses. Comment on your results.
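As a hedged illustration of the multinomial-Dirichlet update: a uniform prior over a four-category probability vector is Dirichlet(1, 1, 1, 1), and adding the observed counts gives a Dirichlet posterior whose marginals are beta distributions. The sketch below samples the posterior by normalizing independent gamma draws and reads off equal-tailed intervals; the function names, seed, and number of draws are assumptions made for the example.

```python
import random

counts = [17, 22, 8, 3]   # very satisfied, somewhat satisfied, neither, dissatisfied

def dirichlet_posterior(counts, prior=1.0):
    """Dirichlet parameters after adding multinomial counts to a symmetric prior."""
    return [prior + c for c in counts]

def sample_dirichlet(rng, alphas):
    """One Dirichlet draw, built by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

def marginal_intervals(alphas, level=0.90, draws=20000, seed=1):
    """Approximate equal-tailed credible intervals for each component."""
    rng = random.Random(seed)
    samples = [sample_dirichlet(rng, alphas) for _ in range(draws)]
    intervals = []
    for j in range(len(alphas)):
        column = sorted(s[j] for s in samples)
        lower = column[int((1 - level) / 2 * draws)]
        upper = column[int((1 + level) / 2 * draws) - 1]
        intervals.append((lower, upper))
    return intervals

alphas = dirichlet_posterior(counts)
for alpha, interval in zip(alphas, marginal_intervals(alphas)):
    print("alpha=%.0f  90%% interval ~ (%.3f, %.3f)" % ((alpha,) + interval))
```

The intervals are for each probability separately; they are not a joint region for the whole vector.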