Date: 2021-11-02

STAT5002
Sample final exam

1. The ages of 100 randomly selected USYD students who enrolled for the Winter 2019 semester ranged
from a low of 17 to a high of 28. The sum of the 100 measurements is 2213; the sum of their squares
is 50359. All measurements were recorded to the nearest year.
a. What is the population in this study?
b. What is the level of measurement in this study?
c. Suppose the sample was obtained by asking the age of the first 100 students who entered Carslaw
Building at 8:40 am on July 11. Is the sample a random sample? Why, or why not?
d. Compute the sample variance.
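The shortcut formula s² = (Σx² − (Σx)²/n)/(n − 1) turns the two sums given in the question straight into the sample variance. A minimal Python sketch that uses only the figures stated above:

```python
# Summary statistics given in Question 1
n = 100          # number of students
sum_x = 2213     # sum of the ages
sum_x2 = 50359   # sum of the squared ages

# Shortcut formula: s^2 = (sum x^2 - (sum x)^2 / n) / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
print(round(s2, 4))
```

This gives s² ≈ 13.99 (years²).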


2. Refer to the following R code and output.



a. Compute and interpret the interquartile range (IQR).
b. Calculate 1.5*IQR below the first quartile.
c. Refer to part (b). How many data points can we say are low outliers?
d. Calculate 1.5*IQR above the third quartile.
e. Refer to part (d). How many data points can we say are high outliers?
f. Is there skewness present? If so, what type?
g. The log transformation was applied to the original data. R produces the following boxplots:





What do these boxplots show?
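Parts (a) to (e) use Tukey's 1.5 × IQR rule. The R output with the actual quartiles is not reproduced here, so the sketch below is only a generic helper; the quartiles and data in the usage lines are hypothetical placeholders, to be replaced with the Q1 and Q3 read from the R output. Note that R's boxplot() draws its whiskers from Tukey's hinges, which can differ slightly from the quartiles reported by quantile() or summary().

```python
def tukey_fences(q1, q3):
    """Return (IQR, lower fence, upper fence) for Tukey's 1.5*IQR rule."""
    iqr = q3 - q1
    return iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr


def count_outliers(data, q1, q3):
    """Count points below the lower fence (low) and above the upper fence (high)."""
    _, lower, upper = tukey_fences(q1, q3)
    low = sum(1 for x in data if x < lower)
    high = sum(1 for x in data if x > upper)
    return low, high


# Hypothetical quartiles and data, for illustration only
iqr, lower, upper = tukey_fences(10, 20)
low, high = count_outliers([-6, 0, 15, 36, 40], 10, 20)
print(iqr, lower, upper, low, high)
```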


3. A large shipment of portable television sets is received by a retailer. To protect herself against a bad
shipment, she will inspect 5 randomly selected sets and accept the entire shipment of sets if she
observes 0 or 1 defectives in the 5 selected sets. Suppose that 10% of the television sets in the
shipment are defective.
a. What is the probability that she accepts the entire shipment?
b. Given that the retailer accepts the entire lot, what is the probability that she observed exactly 1
defective set?
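Because the shipment is large, the number of defectives X among the 5 inspected sets can be modelled, as an approximation, as Binomial(5, 0.10). A short Python sketch of parts (a) and (b):

```python
from math import comb

n, p = 5, 0.10  # sets inspected, proportion defective


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_accept = binom_pmf(0, n, p) + binom_pmf(1, n, p)   # (a) P(X <= 1)
p_one_given_accept = binom_pmf(1, n, p) / p_accept   # (b) P(X = 1 | X <= 1)
print(round(p_accept, 5), round(p_one_given_accept, 4))
```

This gives P(accept) ≈ 0.9185 and P(exactly 1 defective | accept) = 0.5/1.4 = 5/14 ≈ 0.3571.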


4. A car dealer offers free coffee and free donuts. He notices that of all the people who enter the
showroom, 62% drink coffee and 27% eat donuts. He also notes that 72% of those entering the
showroom either drink coffee or eat donuts or both.
a. Find the probability that a person entering the showroom both eats donuts and drinks coffee.
b. If a person entering the showroom drinks coffee, what is the probability that he or she eats donuts?
c. For this population, are the events C = drink coffee and D = eat donuts independent? Show your
work for credit.
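Parts (a) to (c) follow from the addition rule, the definition of conditional probability and the product test for independence. A minimal Python check:

```python
p_c, p_d, p_c_or_d = 0.62, 0.27, 0.72

p_c_and_d = p_c + p_d - p_c_or_d                 # (a) addition rule rearranged
p_d_given_c = p_c_and_d / p_c                    # (b) conditional probability
independent = abs(p_c_and_d - p_c * p_d) < 1e-9  # (c) compare with P(C)P(D)
print(round(p_c_and_d, 2), round(p_d_given_c, 4), independent)
```

P(C ∩ D) = 0.17 and P(D | C) ≈ 0.274, while P(C)P(D) = 0.1674 ≠ 0.17, so C and D are not independent, although they are close to it.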


5. A PhD graduate has applied for a job with 2 universities, A and B. The graduate feels that she has a
60% chance of receiving an offer from university A, and a 50% chance of receiving an offer from
university B. If she receives an offer from university B, she believes that she has an 80% chance of
receiving an offer from university A.
a. What is the probability that both universities will make her an offer?
b. What is the probability that at least one university will make her an offer?
c. If she receives an offer from university B, what is the probability that she will not receive an offer
from university A?
d. If she receives an offer from university A, what is the probability that she will receive an offer from
university B?
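Here P(A | B) = 0.8 is given directly, so the multiplication rule gives P(A ∩ B) first and the remaining parts follow. A small Python sketch:

```python
p_a, p_b, p_a_given_b = 0.60, 0.50, 0.80

p_both = p_a_given_b * p_b             # (a) multiplication rule
p_at_least_one = p_a + p_b - p_both    # (b) addition rule
p_not_a_given_b = 1 - p_a_given_b      # (c) complement, conditional on B
p_b_given_a = p_both / p_a             # (d) conditional probability
print(p_both, p_at_least_one, p_not_a_given_b, round(p_b_given_a, 4))
```

The answers are 0.40, 0.70, 0.20 and 2/3 ≈ 0.667 respectively.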


6. Ed Jones is a stockbroker who recommends 2 stocks (IBM and Citibank) to a client. Suppose that the
probability that each stock’s price will go up next year is 0.6, and that each stock’s price behavior is
independent of the price behavior of the other stock.
a. What is the probability that neither stock will go up next year?
b. What is the probability that both stocks will go up next year?
c. What is the probability that exactly one of these stocks will go up next year?
d. What is the expected number of these stocks that will go up next year?
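With independence, the number of stocks that go up is Binomial(2, 0.6). A Python sketch of all four parts:

```python
p = 0.6  # P(a stock goes up), the same for both stocks

p_neither = (1 - p) ** 2          # (a)
p_both = p ** 2                   # (b)
p_exactly_one = 2 * p * (1 - p)   # (c)
expected = 2 * p                  # (d) mean of Binomial(2, 0.6)
print(p_neither, p_both, p_exactly_one, expected)
```

The answers are 0.16, 0.36, 0.48 and 1.2 stocks.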


7. A case of 12 bottles of catsup has 2 contaminated bottles. Three bottles are chosen at random without
replacement from this case. Let p be the proportion of contaminated bottles and p̂ the sample
proportion of contaminated bottles. The sampling distribution of p̂ is given below:

p̂        0        A     B
P(p̂)     12/22    C     D

a. Find the value of p.
b. Find the values of A, B, C, and D.
c. Compute the mean of the sampling distribution of p̂.
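Drawing 3 bottles without replacement makes the number of contaminated bottles hypergeometric; dividing that count by 3 gives p̂. A sketch using exact fractions (Python prints 12/22 in lowest terms as 6/11):

```python
from fractions import Fraction
from math import comb

N, K, n = 12, 2, 3   # bottles in case, contaminated bottles, bottles drawn

p = Fraction(K, N)   # (a) population proportion
total = comb(N, n)   # number of equally likely samples

# (b) hypergeometric distribution of x contaminated bottles, expressed as p_hat = x / n
dist = {Fraction(x, n): Fraction(comb(K, x) * comb(N - K, n - x), total)
        for x in range(min(K, n) + 1)}

mean_p_hat = sum(v * prob for v, prob in dist.items())   # (c)
for v, prob in dist.items():
    print(v, prob)
print("p =", p, " mean of p_hat =", mean_p_hat)
```

This gives A = 1/3, B = 2/3, C = 9/22 and D = 1/22, and the mean of p̂ equals p = 1/6.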




8. The diameters of a special tire produced by Firestone are normally distributed with a mean of 25 inches
and a standard deviation of 0.5 inches.
a. Find a value of x such that 89.97% of the special tires produced by Firestone will have a diameter of
more than x inches.
b. What is the probability that a randomly chosen tire will have a diameter of less than 24 inches?
c. If 5 tires are randomly selected from the Firestone exhibition center, what is the probability that at
most 4 of the 5 tires selected will have diameters of more than 24 inches?
d. If a random sample of 36 tires produced by Firestone is taken, what is the probability that the mean
diameter in the sample exceeds 24.81 inches?
e. What can be said about your answer in part (d) if the diameters of the tires are not normally
distributed? Why?
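Parts (a) to (d) reduce to normal probabilities, and Python's standard-library statistics.NormalDist (Python 3.8 or later) is enough. A sketch, assuming the 5 tires in (c) are independent and the 36 tires in (d) form a random sample:

```python
from statistics import NormalDist

tire = NormalDist(mu=25, sigma=0.5)

x = tire.inv_cdf(1 - 0.8997)               # (a) 89.97% of tires lie above x
p_below_24 = tire.cdf(24)                  # (b)
p_above_24 = 1 - p_below_24
p_at_most_4 = 1 - p_above_24 ** 5          # (c) 1 - P(all 5 exceed 24)
mean_dist = NormalDist(mu=25, sigma=0.5 / 36 ** 0.5)   # (d) sampling distribution, n = 36
p_mean_above = 1 - mean_dist.cdf(24.81)
print(round(x, 2), round(p_below_24, 4), round(p_at_most_4, 4), round(p_mean_above, 4))
```

This gives x ≈ 24.36 inches, P(X < 24) ≈ 0.0228, about 0.109 for (c) and about 0.989 for (d). For (e), with n = 36 the central limit theorem still makes the sample mean approximately normal, so the answer in (d) remains a reasonable approximation.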


9. Technicians are trained with a computer simulation that randomly generates temperatures for one
sector of a chemical reactor. Those temperatures are uniformly distributed between 400 and 500
degrees Celsius.
a. If one temperature is randomly generated, what is the probability that its value will be greater than
460 degrees?
b. If one temperature is randomly generated, what is the probability that its value is 430 degrees?
c. If 40 of these temperatures are randomly generated, what is the probability that the sample mean
will be greater than 460 degrees?
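A Uniform(400, 500) variable has mean 450 and standard deviation 100/√12; part (c) then applies the central limit theorem with n = 40. A Python sketch:

```python
from statistics import NormalDist

a, b = 400, 500  # Uniform(a, b) temperatures in degrees Celsius

p_above_460 = (b - 460) / (b - a)   # (a)
p_exactly_430 = 0.0                 # (b) a single point of a continuous distribution
mu = (a + b) / 2                    # mean of Uniform(a, b)
sigma = (b - a) / 12 ** 0.5         # standard deviation of Uniform(a, b)
n = 40
p_mean_above_460 = 1 - NormalDist(mu, sigma / n ** 0.5).cdf(460)   # (c) CLT
print(p_above_460, p_exactly_430, round(p_mean_above_460, 4))
```

The answers are 0.4, 0 and about 0.014.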


10. According to government data, 22% of children under the age of 6 in Australia live in households with
incomes below the poverty level. Suppose that a random sample of 300 Australian children under the age
of 6 is selected for a study.
a. Write down the formula to compute the probability that at least 80 of the children in the sample live
in poverty.
b. What is the approximate probability that at least 80 of the children in the sample live in poverty?
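Part (a) is a binomial tail sum, P(X ≥ 80) = Σ C(300, k)(0.22)^k(0.78)^(300−k) for k = 80, …, 300, and part (b) uses the normal approximation with mean np = 66 and standard deviation √(np(1 − p)) ≈ 7.18. The sketch applies a continuity correction, which is a common but not universal choice, and also evaluates the exact sum for comparison:

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.22

# (a) exact binomial tail P(X >= 80)
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(80, n + 1))

# (b) normal approximation with continuity correction
mu, sd = n * p, sqrt(n * p * (1 - p))
approx = 1 - NormalDist(mu, sd).cdf(79.5)
print(round(exact, 4), round(approx, 4))
```

The approximation gives about 0.030, and the exact binomial sum is close to it.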


11. The Grocery Manufacturers of Australia reported that 76% of consumers read the ingredients listed on
a product’s label. Assume that the population proportion is p = 0.76 and a sample of 400 consumers is
selected from the population.
a. Is the sample size sufficiently large that the normal distribution provides a reasonable
approximation for the sampling distribution of p̂? Why or why not?
b. What is the probability that the sample proportion will be within ± 0.03 of the population proportion?
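The usual check is that np and n(1 − p) are both at least 10 (some texts use 5); here they are 304 and 96. Part (b) then uses the standard error √(p(1 − p)/n). A sketch:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.76, 400

# (a) rule of thumb: np and n(1 - p) both at least 10
large_enough = n * p >= 10 and n * (1 - p) >= 10

# (b) P(|p_hat - p| <= 0.03) under the normal approximation
se = sqrt(p * (1 - p) / n)
z = 0.03 / se
prob_within = 2 * NormalDist().cdf(z) - 1
print(large_enough, round(prob_within, 4))
```

The probability is about 0.84.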


12. A market researcher with Catalog Apparel, Inc., constructed a questionnaire seeking customer
responses to a particular product. She hypothesized that people were equally split on whether they
liked the product’s package design. To test this hypothesis, she asked the following question: “What is
your feeling about the product’s package design?” The respondents could then check a response on a
seven-point scale that ranged from 1 = strongly dislike to 7 = strongly favor. The response in the
middle was 4 = indifferent. In a pilot test of her questionnaire, she obtained the following responses to
this question from 12 people:
7 3 5 4 7 1 2 2 5 7 6 5
Test H0: μ = 4 against H1: μ ≠ 4 using the sign test. Use α = 0.05.
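The sign test is really a test about the median: responses equal to 4 are discarded, and under H0 the number of responses above 4 is Binomial(n, 0.5). A sketch computing the exact two-sided p-value:

```python
from math import comb

responses = [7, 3, 5, 4, 7, 1, 2, 2, 5, 7, 6, 5]
m0 = 4  # hypothesised centre

plus = sum(1 for r in responses if r > m0)
minus = sum(1 for r in responses if r < m0)
n = plus + minus               # responses equal to m0 are dropped
k = min(plus, minus)

# two-sided exact p-value: 2 * P(X <= k), X ~ Binomial(n, 0.5), capped at 1
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n)
reject = p_value < 0.05
print(plus, minus, n, round(p_value, 4), reject)
```

There are 7 responses above 4 and 4 below (one tie dropped, so n = 11), giving p ≈ 0.549 > 0.05; H0 is not rejected.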


13. You theorize that 75% of Data Science students are male. You survey a random sample of 12 Data
Science students and find that 7 are male. Do your results significantly differ from the expected
results?
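One way to answer this is an exact binomial test of H0: p = 0.75 with n = 12; a chi-square test is less suitable here because the expected number of non-males is only 3. The sketch doubles the smaller tail to get a two-sided p-value, which is one common convention; R's binom.test() uses a different two-sided rule and can report a somewhat different value.

```python
from math import comb

n, x, p0 = 12, 7, 0.75


def binom_pmf(k):
    """P(X = k) for X ~ Binomial(12, 0.75)."""
    return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)


p_lower = sum(binom_pmf(k) for k in range(x + 1))   # P(X <= 7)
p_two_sided = min(1.0, 2 * p_lower)                 # doubling convention
print(round(p_lower, 4), round(p_two_sided, 4))
```

P(X ≤ 7) ≈ 0.158, so the two-sided p-value is about 0.32 and the result does not differ significantly from 75% at the usual levels.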

14. From past experience the manager of the parking facilities at a major airport knows that 58 percent of
the customers stay less than one hour, 23 percent between one and two hours, 10 percent between
two and three hours, and nine percent three hours or more.
The manager wants to update this study. A sample of 500 stamped parking tickets is selected. The
results showed 300 stayed less than one hour, 100 from one to two hours, 60 from two to three hours,
and 40 parked three hours or more.
At the 0.01 significance level, does the data suggest there has been a change in the length of time
customers use the parking facilities?
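This is a chi-square goodness-of-fit test with 4 − 1 = 3 degrees of freedom. The sketch hard-codes the tabulated 1% critical value 11.345 rather than calling a statistics library:

```python
observed = [300, 100, 60, 40]
proportions = [0.58, 0.23, 0.10, 0.09]
n = sum(observed)

expected = [n * p for p in proportions]   # 290, 115, 50, 45
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical = 11.345   # upper 1% point of chi-square with 3 df, from tables
reject = chi2 > critical
print(round(chi2, 3), df, reject)
```

The statistic is about 4.86 < 11.345, so at the 0.01 level there is no evidence that the length of stay has changed.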


15. On May 5, 2019, a random sample of 10 gas stations in St. Louis was taken by an auto club and the
price of a gallon of premium unleaded gasoline at each station was recorded. The observed prices
were as follows:

1.35 1.37 1.39 1.39 1.48 1.39 1.44 1.41 1.32 1.46

These values were entered into R. Results of some R commands are shown below.






a. Is the price of a gallon of premium unleaded gasoline normally distributed? Justify your answer
with an explanation.
b. Find a 99% confidence interval for the mean price of a gallon of premium unleaded gasoline in St.
Louis on May 5, 2019.
c. Refer to part (b). Interpret this confidence interval in terms of repeated sampling.
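A t interval with n − 1 = 9 degrees of freedom is appropriate if part (a) supports approximate normality. In this sketch the critical value t(0.005, 9) ≈ 3.2498 is taken from tables:

```python
from math import sqrt
from statistics import mean, stdev

prices = [1.35, 1.37, 1.39, 1.39, 1.48, 1.39, 1.44, 1.41, 1.32, 1.46]
n = len(prices)
xbar = mean(prices)
s = stdev(prices)           # sample standard deviation (n - 1 divisor)
t_crit = 3.2498             # t(0.005, 9), from tables
margin = t_crit * s / sqrt(n)
ci = (xbar - margin, xbar + margin)
print(round(xbar, 3), round(s, 4), tuple(round(c, 4) for c in ci))
```

The sample mean is 1.40 and s ≈ 0.0492, giving a 99% interval of roughly (1.349, 1.451).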
16. A utility company is trying to find sites for large wind machines for generating electric power. Wind
speeds must average at least 15 miles per hour (mph) for a site to be acceptable. Assume that 50 wind
speed recordings were taken at random times on a site under consideration for a wind machine and
that the wind speeds averaged 16.2 mph with a standard deviation of 6.3 mph.
a. What assumptions are required to construct a 95% confidence interval estimate for the mean wind
speed at this site? Are these assumptions satisfied? Explain.
b. Find a 95% confidence interval estimate of the mean wind speed at this site.



c. Based on your answer to part (b), would you think the utility company would consider this to be a
suitable site for generating electric power? Explain briefly.
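With n = 50, the central limit theorem justifies a t (or z) interval even if individual wind speeds are not normal, provided the recordings are a random sample of independent readings. A sketch with t(0.025, 49) ≈ 2.0096 taken from tables:

```python
from math import sqrt

n, xbar, s = 50, 16.2, 6.3
t_crit = 2.0096                    # t(0.025, 49), from tables; z = 1.96 gives nearly the same
margin = t_crit * s / sqrt(n)
ci = (xbar - margin, xbar + margin)
entirely_above_15 = ci[0] >= 15    # is the whole interval at or above 15 mph?
print(tuple(round(c, 2) for c in ci), entirely_above_15)
```

The interval is about (14.41, 17.99) mph. Because it contains values below 15 mph, the data do not establish that the site meets the requirement.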


17. You wish to compare the lifetimes of 2 brands of batteries. Independent random samples of 15
Duracell (Brand D) batteries and 10 EverReady (Brand E) batteries were taken. Each battery was used
in a flashlight and its lifetime recorded. The sample of Brand D batteries had an average lifetime of 6
hours and a standard deviation of 0.51 hours. The sample of Brand E batteries had an average lifetime
of 6.1 hours and a standard deviation of 0.65 hours. Using a 0.05 level of significance and the critical-value
approach, state and test appropriate hypotheses to determine if there is a significant difference in the
mean lifetimes of these 2 brands of batteries. Assume that the lifetimes of both brands of batteries are
normally distributed.
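A pooled two-sample t test is sketched below for H0: μD = μE against H1: μD ≠ μE, assuming equal population variances (the ratio of the sample variances is about 1.6, usually taken as acceptable). The critical value t(0.025, 23) ≈ 2.069 is taken from tables. Welch's unequal-variance test gives t ≈ −0.41 and the same conclusion.

```python
from math import sqrt

n_d, mean_d, s_d = 15, 6.0, 0.51
n_e, mean_e, s_e = 10, 6.1, 0.65

df = n_d + n_e - 2
sp2 = ((n_d - 1) * s_d ** 2 + (n_e - 1) * s_e ** 2) / df   # pooled variance
t_stat = (mean_d - mean_e) / sqrt(sp2 * (1 / n_d + 1 / n_e))
t_crit = 2.069             # t(0.025, 23), from tables
reject = abs(t_stat) > t_crit
print(df, round(t_stat, 3), reject)
```

The statistic is t ≈ −0.43, so |t| < 2.069 and H0 is not rejected.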




18. Suppose you want to test H0: μ = 1000 against H1: μ > 1000 using α = 0.05. The population in question
is normally distributed with standard deviation 120. A random sample of size n = 36 will be used.
a. Explain clearly what α implies for tests conducted when the null hypothesis is true.
b. Why is a one-sided alternative hypothesis (greater than or less than) preferable to a two-sided
alternative (not equal to)?
c. Despite this, a two-sided alternative hypothesis is often used. Why?
d. Compute the power of the usual test when μ = 1020.
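Under H0 the usual test rejects when the sample mean exceeds 1000 + 1.645 × 120/√36 ≈ 1032.9; the power at μ = 1020 is the probability of that event when μ = 1020. A sketch:

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1, sigma, n, alpha = 1000, 1020, 120, 36, 0.05
se = sigma / sqrt(n)                                   # 20
cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se    # reject H0 if x_bar > cutoff
power = 1 - NormalDist(mu1, se).cdf(cutoff)
print(round(cutoff, 2), round(power, 4))
```

The power is only about 0.26.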


19. The manufacturer of an insecticide, Fly-die, claims that one application of the spray will kill more than
90% of the flies sprayed. As purchasing agent for the Noc. M. Dead Store, you find that 830 out of 900 flies
sprayed were killed.
a. Write down the null and alternative hypotheses.
b. Find a point estimate of the true proportion of flies sprayed being killed.
c. Is the sample size large enough to construct a z-interval for p? Explain.
d. The 95% confidence interval for the true proportion of flies sprayed being killed is between 0.9047
and 0.9397. Based on this confidence interval, is there evidence to show that one application of the
spray will kill more than 90% of the flies sprayed? Explain.
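The sketch below reproduces the quoted interval with the usual large-sample formula p̂ ± 1.96 √(p̂(1 − p̂)/n):

```python
from math import sqrt

killed, n, p0 = 830, 900, 0.90
p_hat = killed / n                                          # (b) point estimate
large_enough = n * p_hat >= 10 and n * (1 - p_hat) >= 10   # (c)
se = sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)                 # (d) 95% z-interval
supports_claim = ci[0] > p0
print(round(p_hat, 4), tuple(round(c, 4) for c in ci), supports_claim)
```

Here p̂ ≈ 0.922 and the interval is about (0.9047, 0.9397); its lower limit is above 0.90, which supports the claim.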

20. Consider the following data set:

Y 8.47 3.79 24.91 7.57 0.29
X 7 9 12 11 8

The data was entered into R, and the results of some R commands are shown below:



a. Write down the least squares line.
b. Calculate the total variation in Y.
c. Calculate and interpret the coefficient of non-determination.
d. Obtain a point estimate of σ².
e. What proportion of the total variation in Y is explained by the model?
f. Interpret the coefficients and p-values in this regression.
g. Can the model be reduced to the null model; i.e., Y = β0 + ε?
h. Suppose the population model is Y = 1 + 0.2*X + ε. Fill in the missing information.

Observation Y X E(Y|X) ε Ŷ ε̂
1 8.47 47
2 3.79 42
3 24.91 40
4 7.57 27
5 0.29 33

i. In part (h), observation 3 has a very large positive random error (ε). Does it imply that the
population model is incorrect? Briefly explain.
j. Test H0: β1 = 0.2 against H1: β1 ≠ 0.2
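The sketch below recomputes the least-squares quantities from the data as listed at the start of the question, which may help when reading the R output. Note an inconsistency in the question as printed: the X values in the data set (7, 9, 12, 11, 8) differ from those in the part (h) table (47, 42, 40, 27, 33). Each is used as printed here, and both should be checked against the original paper. For part (j), t(0.025, 3) ≈ 3.182 is taken from tables.

```python
from math import sqrt

y = [8.47, 3.79, 24.91, 7.57, 0.29]
x = [7, 9, 12, 11, 8]
n = len(y)

xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx                       # (a) slope
b0 = ybar - b1 * xbar                # (a) intercept
sst = sum((yi - ybar) ** 2 for yi in y)                        # (b) total variation
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
r2 = 1 - sse / sst                   # (e); (c) non-determination is 1 - r2
s2 = sse / (n - 2)                   # (d) estimate of sigma^2
t_j = (b1 - 0.2) / sqrt(s2 / sxx)    # (j) compare with t(0.025, 3) = 3.182

# (h) population model Y = 1 + 0.2 X, using the X column of the part (h) table
x_h = [47, 42, 40, 27, 33]
ey = [1 + 0.2 * xi for xi in x_h]        # E(Y|X)
eps = [yi - e for yi, e in zip(y, ey)]   # random errors
print(round(b0, 3), round(b1, 4), round(sst, 3), round(r2, 4), round(s2, 3), round(t_j, 3))
```

From the listed data, Ŷ ≈ −20.85 + 3.176X, the total variation is about 358.46, R² ≈ 0.484, s² ≈ 61.65, and the part (j) statistic is about 1.57, so H0: β1 = 0.2 is not rejected. With the part (h) X values, observation 3 has ε ≈ 15.91.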


21. Below is a set of data containing one dependent variable (Y) and 6 independent variables. Data were
entered into R. Results of some R commands are shown below.




a. Find the total variation in Y.
b. Find the standard deviation of Y (Sy).
c. What are the degrees of freedom associated with RegSS?
d. Write down the fitted regression line.
e. What is the average distance of the residuals from the regression line?
f. What does the standard error of estimate tell you about the regression model? (Y̅ = 6.3119)
g. Determine the coefficient of determination and interpret its meaning.
h. Find the coefficient of determination adjusted for degrees of freedom.
i. Test the overall utility of the model. What does the p-value of the test statistic tell you?
j. Identify predictors which are significant at α = 0.05.
k. Interpret the physical meaning of the coefficients of significant predictors in the model.

l. Refer to the following R output. What automatic search algorithm is used here to search for the
best subset of predictors? Describe the selection procedures. Write down the estimated final MLR
equation.







m. Write down the null and alternative hypotheses for the following R output. What conclusion can be
made?




n. Refer to part (l). Find and interpret the multiple correlation. What is the range of multiple
correlation?











The following residual diagnostics are based on the final model.







o. Are there any high leverage points in the data for the final model?
p. What characteristics does a high leverage point have in general?
q. Are there any outliers in the data for the final model?
r. Comment on the residual plots for the final model.
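The R output for this question is not reproduced, but several parts rely on a few standard identities built from the regression and residual sums of squares (RegSS has k degrees of freedom and RSS has n − k − 1). The sketch below uses hypothetical inputs and a hypothetical helper name, mlr_summary; replace RegSS, RSS and n with the values in the output, with k = 6 predictors for the full model.

```python
from math import sqrt


def mlr_summary(reg_ss, res_ss, n, k):
    """Summary quantities for a multiple regression with k predictors and n observations."""
    tss = reg_ss + res_ss                            # total variation in Y
    s_y = sqrt(tss / (n - 1))                        # standard deviation of Y
    r2 = reg_ss / tss                                # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted for degrees of freedom
    s_e = sqrt(res_ss / (n - k - 1))                 # standard error of estimate
    f_stat = (reg_ss / k) / (res_ss / (n - k - 1))   # overall F test, df = (k, n - k - 1)
    multiple_r = sqrt(r2)                            # multiple correlation, 0 <= R <= 1
    return {"TSS": tss, "Sy": s_y, "R2": r2, "adjR2": adj_r2,
            "se": s_e, "F": f_stat, "R": multiple_r}


# Hypothetical inputs for illustration only
summary = mlr_summary(reg_ss=80.0, res_ss=20.0, n=30, k=6)
print(summary)
```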










22. The data below, from a study of computer-assisted learning by 10 students, show the total number of
correct and incorrect responses in completing a lesson (x) and the cost of computer time (y in cents):



a. Write down the least squares line.
b. Comment on any pattern of the residual plots and identify any potential outliers.









c. Estimate the standard error of the estimator for β1.
d. To account for the changes in variability, we can revise the model M0

$$M_0:\quad Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\ k X_i^2)$$

to fit alternatively

$$M_1:\quad \frac{Y_i}{X_i} = \frac{\beta_0}{X_i} + \beta_1 + \frac{\varepsilon_i}{X_i}$$

as this produces a model whose error variance, Var(εi/Xi) = k, no longer increases with Xi. Write
down the fitted regression equation.



e. Estimate the standard error of the estimator for β1,new, which is the intercept parameter in the new
model. Do the estimated standard errors of both parameter estimators decrease compared with the
model in part (a)?
f. Estimate the variance constant k.
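The data for this question are not reproduced, so the sketch below only shows the mechanics: model M0 is an ordinary least-squares fit of Y on X, and model M1 is the same fit applied to Y/X against 1/X, whose intercept estimates β1, whose slope estimates β0, and whose residual variance estimates k. The helper names ols and fit_m1, and the exactly linear test data, are hypothetical.

```python
from math import sqrt


def ols(x, y):
    """Simple least squares fit y = a + b*x.

    Returns (a, b, se_a, se_b, s2), where s2 = SSE / (n - 2).
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    se_b = sqrt(s2 / sxx)
    se_a = sqrt(s2 * (1 / n + xbar ** 2 / sxx))
    return a, b, se_a, se_b, s2


def fit_m1(x, y):
    """Fit M1 by regressing Y/X on 1/X."""
    u = [1 / xi for xi in x]
    v = [yi / xi for xi, yi in zip(x, y)]
    intercept, slope, se_intercept, se_slope, s2 = ols(u, v)
    # In M1 the intercept estimates beta1 and the slope on 1/X estimates beta0
    return {"beta0": slope, "beta1": intercept,
            "se_beta0": se_slope, "se_beta1": se_intercept, "k": s2}


# Hypothetical, exactly linear data (Y = 2 + 3X) to check the transformation
xs = [1, 2, 3, 4, 5]
ys = [2 + 3 * xi for xi in xs]
fit = fit_m1(xs, ys)
print(fit)
```

Applied to the study's 10 (x, y) pairs, ols(x, y) would give the part (c) standard error and fit_m1(x, y) the quantities for parts (d) to (f).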


















23. People are asked to perform a computer programming task, including debugging, within a specified
time. The collected data have been imported into R. There are two columns of data, labelled
“experience” (the number of months of programming experience each person has had) and “success”
(whether the person was successful, coded 1, or not successful, coded 0, in completing the task in the
specified time). The R output is shown below:




a. How many observations are there in this dataset?
b. What type of variable is success called?
c. Is a simple linear regression model appropriate?
d. What is the name of the underlying model as shown in the R output?
e. Was this model estimated by the method of least squares? If not, what estimation method was
used?
f. Write down the estimated regression equation.
g. Estimate the odds and hence the probability that a person with 12 months experience completes
the task successfully in the specified time.
h. How is the value of the odds ratio obtained from the logistic regression line, and what is the physical
interpretation of its value in this case?
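The estimated coefficients are in the R output, which is not reproduced, so the sketch below is generic: for a fitted logit model, odds = exp(b0 + b1·x), probability = odds/(1 + odds), and the odds ratio for a one-month increase in experience is exp(b1). The coefficient values in the usage line are hypothetical.

```python
from math import exp


def odds_and_probability(b0, b1, x):
    """Odds and probability of success from a fitted model logit(p) = b0 + b1*x."""
    odds = exp(b0 + b1 * x)
    return odds, odds / (1 + odds)


def odds_ratio(b1):
    """Multiplicative change in the odds for a one-unit increase in x."""
    return exp(b1)


# Hypothetical coefficients for illustration only
odds, prob = odds_and_probability(b0=-2.0, b1=0.1, x=12)
print(round(odds, 4), round(prob, 4), round(odds_ratio(0.1), 4))
```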


24. A Data Science professor is making up a final examination which is to be given to a very large group of
students. His feelings about the average grade they should get are expressed subjectively by a normal
distribution with the mean μ0 = 65.2 and the standard deviation σ0 = 1.5.
a. What prior probability does the professor assign to the actual average grade being somewhere on
the interval from 63 to 68?
b. What posterior probability would he assign to this event if the examination is tried on a random
sample of 40 students whose grades have a mean of 72.9 and a standard deviation of 7.4? Use s
= 7.4 as an estimate of σ.
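With a normal prior for μ and a normal likelihood with σ treated as known (s = 7.4), the posterior is also normal, and its precision is the sum of the prior precision and the data precision. A sketch of both parts:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma0 = 65.2, 1.5         # prior for the mean grade
n, xbar, sigma = 40, 72.9, 7.4  # sample; s = 7.4 used as sigma

prior = NormalDist(mu0, sigma0)
prior_prob = prior.cdf(68) - prior.cdf(63)                  # (a)

# (b) normal prior + normal likelihood (known sigma) -> normal posterior
post_var = 1 / (1 / sigma0 ** 2 + n / sigma ** 2)
post_mean = post_var * (mu0 / sigma0 ** 2 + n * xbar / sigma ** 2)
posterior = NormalDist(post_mean, sqrt(post_var))
post_prob = posterior.cdf(68) - posterior.cdf(63)
print(round(prior_prob, 4), round(post_mean, 3), round(post_var, 4), round(post_prob, 4))
```

The prior probability is about 0.898. The posterior has mean ≈ 69.99 and variance ≈ 0.851, so the posterior probability of the interval from 63 to 68 drops to about 0.016.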


25. Find the mean and variance of the posterior distribution as an estimate of the “true” probability of a
success, if 42 successes are obtained in 120 binomial trials and the prior distribution of p is a beta
distribution with α = β = 40.
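The beta prior is conjugate to the binomial likelihood, so the posterior is Beta(α + x, β + n − x) = Beta(82, 118). A sketch using exact fractions:

```python
from fractions import Fraction

alpha, beta = 40, 40        # Beta prior parameters
successes, trials = 42, 120

# Beta prior + binomial likelihood -> Beta(alpha + x, beta + n - x) posterior
a_post = alpha + successes
b_post = beta + trials - successes
post_mean = Fraction(a_post, a_post + b_post)
post_var = Fraction(a_post * b_post, (a_post + b_post) ** 2 * (a_post + b_post + 1))
print(a_post, b_post, float(post_mean), float(post_var))
```

The posterior mean is 82/200 = 0.41 and the posterior variance is 82·118/(200²·201) ≈ 0.0012.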

