Programming assignment sample: MATH 185 | 学霸联盟

Date: 2022-05-01

MATH 185 – Take-Home Exam 1
AGREEMENT
By taking this exam, starting now, you agree not to discuss the exam with anyone or seek
assistance from anyone, whether a classmate or anyone else, and whether in person or through
other means. You can access the lecture notes, code, and textbook at will. You can also
search the internet to find out how to do something in R, such as changing the color of a
histogram’s bars or adding a legend to a plot. You can ask the instructor clarifying
questions directly by email. (And only clarifying questions, as you would if you were taking
the exam in a lecture hall.)
Problem 1. (Subsampling) This is a variant of the bootstrap where the sampling is
without replacement. We look at this variant in the context of building a confidence interval
for the mean $\theta$ of some (unknown) distribution on the real line. Consider an iid sample
$X_1, \dots, X_n$ from that distribution. Let $1 - \alpha$ denote the desired confidence level.
• For $b = 1, \dots, B$:
◦ Draw $r$ observations uniformly at random without replacement from the original
sample, obtaining $X^b_1, \dots, X^b_r$.
◦ Compute the corresponding sample mean $\bar{X}_b$ and sample standard deviation $S_b$,
and form the t-ratio $T_b = (\bar{X}_b - \bar{X})/(S_b/\sqrt{r})$.
• Letting $t_\gamma$ denote the $\gamma$-quantile of $\{T_1, \dots, T_B\}$, return the interval
$$\left[\, \bar{X} - t_{1-\alpha/2}\, S/\sqrt{n},\ \ \bar{X} - t_{\alpha/2}\, S/\sqrt{n} \,\right]$$
where $\bar{X}$ and $S$ denote the mean and standard deviation of the original sample.
A. Write a function
subsample.mean.CI(x, conf = 0.95, sub = length(x)/10, B = 1e4)
that takes in the sample in the form of a numerical vector x, the desired confidence level
conf, the subsample size sub (corresponding to r above), and the number of bootstrap
replicates B, and returns the interval as computed above.
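A minimal sketch of one way such a function could look, under stated assumptions that the problem does not settle: the subsample size is rounded down to an integer (the default length(x)/10 need not be one), and the quantiles are taken with R's default quantile() rule. It is an illustration of the procedure, not a reference solution.

```r
# Sketch of the subsampling t-interval for the mean.
# Assumptions (not specified in the problem): `sub` is rounded down to an
# integer >= 2, and quantile() is used with its default interpolation rule.
subsample.mean.CI <- function(x, conf = 0.95, sub = length(x) / 10, B = 1e4) {
  n <- length(x)
  r <- max(2, floor(sub))          # subsample size r
  stopifnot(r <= n)
  xbar <- mean(x)                  # mean of the original sample
  s <- sd(x)                       # standard deviation of the original sample
  alpha <- 1 - conf

  # t-ratios from B subsamples drawn WITHOUT replacement
  t.ratios <- replicate(B, {
    xb <- sample(x, size = r, replace = FALSE)
    (mean(xb) - xbar) / (sd(xb) / sqrt(r))
  })

  # empirical quantiles t_{alpha/2} and t_{1 - alpha/2}
  q <- quantile(t.ratios, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

  # note the reversal: the upper quantile gives the lower endpoint
  c(lower = xbar - q[2] * s / sqrt(n),
    upper = xbar - q[1] * s / sqrt(n))
}

# Usage on a simulated standard normal sample
set.seed(1)
x <- rnorm(1000)
ci <- subsample.mean.CI(x, B = 1000)
print(ci)
```

The endpoints are returned as a named vector so that coverage (lower <= θ <= upper) and length (upper - lower) are easy to compute afterwards.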
B. Generate a sample of size n from the standard normal distribution. Compute the
subsampling confidence interval corresponding to subsample size r, and record whether
it covers the population mean (1 if yes, 0 if not) and record its length. Do this for
$n \in \{1000, 2000, \dots, 10000\}$ and $r \in \{n/50, n/20, n/10, n/5, n/2\}$, corresponding to
a total of $10 \times 5 = 50$ settings, repeating each setting $M = 100$ times. Use B = 1e3 to
limit the computational burden.
• Plot the interval coverage (= the fraction of times in the M repeats that the
interval included the true value of the mean) as a function of the sample size n.
Each of the five choices for r results in a different curve, plotted in a different
color. (Add a legend.)
• Plot the interval length in a similar way.
[Note that the output is two plots, each with five curves.] Offer some brief comments.
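The simulation in part B could be organized roughly as follows. This is a sketch only: the helper name sub.ci is hypothetical, M and B are deliberately reduced so the code runs quickly (the problem asks for M = 100 and B = 1e3), and summarizing length by its average over the M repeats is an assumption.

```r
# Compact helper computing the subsampling interval (hypothetical name).
sub.ci <- function(x, conf, r, B) {
  n <- length(x); xbar <- mean(x); s <- sd(x); alpha <- 1 - conf
  tb <- replicate(B, {
    xb <- sample(x, r)                       # without replacement by default
    (mean(xb) - xbar) / (sd(xb) / sqrt(r))
  })
  q <- quantile(tb, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(xbar - q[2] * s / sqrt(n), xbar - q[1] * s / sqrt(n))
}

set.seed(185)
n.grid   <- seq(1000, 10000, by = 1000)
divisors <- c(50, 20, 10, 5, 2)              # r = n / divisor
M <- 5; B <- 100                             # reduced for speed; exam: M = 100, B = 1e3

coverage <- matrix(NA, length(n.grid), length(divisors))
len <- coverage

for (i in seq_along(n.grid)) {
  for (j in seq_along(divisors)) {
    n <- n.grid[i]; r <- n / divisors[j]
    res <- replicate(M, {
      ci <- sub.ci(rnorm(n), 0.95, r, B)     # true mean is 0
      c(cover = ci[1] <= 0 && 0 <= ci[2], length = ci[2] - ci[1])
    })
    coverage[i, j] <- mean(res["cover", ])   # fraction of intervals covering 0
    len[i, j]      <- mean(res["length", ])  # average length (an assumption)
  }
}

# One curve per choice of r, each in its own color, with a legend
cols <- 1:5
labs <- paste0("r = n/", divisors)
matplot(n.grid, coverage, type = "b", lty = 1, pch = 19, col = cols,
        ylim = c(0, 1), xlab = "sample size n", ylab = "coverage")
abline(h = 0.95, lty = 2)                    # nominal level
legend("bottomright", legend = labs, col = cols, lty = 1, pch = 19)

matplot(n.grid, len, type = "b", lty = 1, pch = 19, col = cols,
        xlab = "sample size n", ylab = "average interval length")
legend("topright", legend = labs, col = cols, lty = 1, pch = 19)
```

Storing the results in n-by-r matrices lets matplot() draw the five curves in one call.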
Problem 2. (Pulse of the Nation) Cards Against Humanity is a popular game, and
the company was somehow involved in sponsoring several opinion polls in 2017-2018.
Details and datasets are available at the Roper Center for Public Opinion Research
at Cornell University.¹ We focus on the first survey, conducted in September 2017. The
dataset will be made available to you separately in .RDA format.
Several questions can be asked and answered with the tools seen so far in lecture. For
example, consider the question “Is there an association between gender and party affiliation?”
• Provide one or two plots to assess the question visually.
• Apply a test (name the test, specify how you obtain a p-value for the test).
• Offer some brief comments.
Repeat the above with 3 additional questions of your choosing. [Some variables are
numerical, but if needed, may be converted to categorical by binning. We will see later in
the quarter how to test for association between numerical variables, but for now, only use
tools covered in lecture.]
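For the gender and party affiliation example, the workflow might look like the sketch below. The data here are simulated toy data, not the survey: the variable names gender and party and their category labels are hypothetical, and the actual names in the .RDA file may differ. The sketch also assumes that the chi-squared test of independence, with a p-value obtained by Monte Carlo simulation, is among the tools covered in lecture.

```r
set.seed(1)

# Hypothetical toy data standing in for the survey (NOT the real dataset).
toy <- data.frame(
  gender = sample(c("Female", "Male"), 300, replace = TRUE),
  party  = sample(c("Democrat", "Independent", "Republican"), 300, replace = TRUE)
)

# Contingency table of the two categorical variables
tab <- table(toy$gender, toy$party)

# Plot 1: mosaic plot of the joint distribution
mosaicplot(tab, main = "Party affiliation by gender (toy data)", color = TRUE)

# Plot 2: party proportions within each gender, side by side
barplot(prop.table(tab, margin = 1), beside = TRUE, legend.text = TRUE,
        xlab = "party", ylab = "proportion within gender")

# Pearson chi-squared test of independence, with the p-value estimated by
# Monte Carlo simulation (tables sampled with the observed margins fixed)
test <- chisq.test(tab, simulate.p.value = TRUE, B = 1e4)
print(test$p.value)
```

A numerical variable can be binned with cut() before building the table, which puts it in the same categorical framework.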
¹ Note that we are not endorsing this game or these polls in any way. Opinion polls can be politically
motivated/oriented, and it might well be the case here. It just
happens that these polls are fairly large and detailed.