Date: 2022-12-12

BA 2551 Practice Final Exam - Fall 2022
Instructor: Oliver Papp

SOLUTIONS

1. Artesian Spring Water provides bottled drinking water to homes in 15-gallon containers. The manager
wants to estimate the mean number of containers the typical home uses each month. A sample of 70 homes
is collected and the number of containers is recorded. The sample mean is computed to be 3.1 containers
and the sample standard deviation is computed to be 0.38 containers.

a. Using alpha = 0.05, test whether the data supports the claim that the mean number of containers per month
is different from 3. State the two hypotheses, compute the test statistic, find the rejection region, and make
your conclusion.

Ho: Mu = 3 = Mu0        x-bar = 3.1, s = 0.38, n = 70 (large sample)
Ha: Mu not = 3          Two Tailed Test, alpha = 0.05

z* = ( x-bar – Mu0) / ( s / sqrt(n)) = (3.1 – 3) / (0.38 / sqrt( 70)) = 2.202

Rejection Region: | z* | > z (alpha/2) = z (0.025) = 1.96

Since | z*| = 2.202 > z (alpha/2) = 1.96, we can reject Ho. Thus, we DO have strong enough statistical
evidence to support the claim that the mean number of containers per month is different from 3 (at
the 95% Confidence Level).
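
As an optional check, the arithmetic in part a can be reproduced with a short Python sketch. Python and the variable names are not part of the original solutions; the critical value is taken from the standard library's normal distribution rather than from a z-table.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
x_bar, mu0, s, n, alpha = 3.1, 3.0, 0.38, 70, 0.05

# Large-sample test statistic: z* = (x-bar - Mu0) / (s / sqrt(n))
z_star = (x_bar - mu0) / (s / sqrt(n))

# Two-tailed critical value z(alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Reject Ho when |z*| falls in the rejection region
reject_h0 = abs(z_star) > z_crit
print(f"z* = {z_star:.3f}, z(alpha/2) = {z_crit:.3f}, reject Ho: {reject_h0}")
```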

b. Suppose that the sample mean of 3.1 containers and the sample standard deviation of 0.38 containers were
computed from a smaller sample of 15 homes. Using alpha = 0.05, test whether the data supports the claim
that the mean number of containers per month is greater than 3. State the two hypotheses, compute the test
statistic, find the rejection region, and make your conclusion.

Ho: Mu = 3 = Mu0        x-bar = 3.1, s = 0.38, n = 15 (small sample)
Ha: Mu > 3              Upper Tailed Test, alpha = 0.05

t* = ( x-bar – Mu0) / ( s / sqrt(n)) = (3.1 – 3) / (0.38 / sqrt( 15)) = 1.0192

Rejection Region: t* > t (alpha, n-1) = t (0.05,14) = 1.761

Since t* = 1.0192 < t (alpha, n-1) = 1.761, we cannot reject Ho. Thus, we DO NOT have strong
enough statistical evidence to support the claim that the mean number of containers per month is
greater than 3 (at the 95% Confidence Level).
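
The same check for the small-sample case is sketched below. Python's standard library has no t distribution, so the critical value is simply copied from the t-table value quoted above; the variable names are illustrative assumptions.

```python
from math import sqrt

# Values from the problem statement
x_bar, mu0, s, n = 3.1, 3.0, 0.38, 15

# t(0.05, 14), copied from the t-table value used in the solution
t_crit = 1.761

# Small-sample test statistic: t* = (x-bar - Mu0) / (s / sqrt(n))
t_star = (x_bar - mu0) / (s / sqrt(n))

# Upper-tailed test: reject Ho only when t* exceeds the critical value
reject_h0 = t_star > t_crit
print(f"t* = {t_star:.4f}, t(0.05, 14) = {t_crit}, reject Ho: {reject_h0}")
```

Note that the point estimate and standard deviation are unchanged from part a; only the smaller n (a larger standard error) and the t critical value change the conclusion.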

2. We are interested in the relationship between the Asking Price of Diamonds in the open market and the size
of the diamonds (measured in Carats). Based on data available for 308 diamonds, we decide to build a
simple linear regression model. The scatter plot of the data and the RStudio printout of the regression
analysis are presented below.

a. Report the least squares regression line and interpret the estimate of the slope of the line in the context of
this problem.

Asking Price = -$2,298.4 + $11,598.9 * Carats

For each one-Carat increase in diamond size, we can expect the mean Asking Price to increase by $11,599.

b. Should we interpret the estimate of Beta0? Explain.

Since 0 is not in the range of the given Carat values, we should not interpret the estimate of Beta0 for
this problem.

c. Using hypothesis testing at the 99% confidence level, investigate whether this model is useful for predicting
the Asking Price or not.

Ho: Beta1 = 0
Ha: Beta1 not = 0 Two Tailed Test

t*= 50.41 with p-value < 2e-16 from the Coefficients portion of the printout

Since p-value < 2e-16 < alpha = 0.01, we can reject Ho.

Thus, there is sufficient evidence to say that the slope is not zero and the model is useful for predicting
the Asking Price at the 99% confidence level.

d. Find the 99% confidence interval for Beta1. Is this result consistent with the result in part c? Explain.

Beta1-hat = $11,598.9 S of Beta1-hat = $230.1 from printout
t(alpha/2 , n-2) = t(0.005 , 306) approximately = z(0.005) = 2.576 from t-table

99% CI for Beta1:

Beta1-hat +- t(0.005,306) * S of Beta1-hat = $11,598.9 +- 2.576 * $230.1 =
= $11,598.9 +- $592.74 ➔ ( $11,006.16 , $12,191.64 )

Since this confidence interval is all positive and does not include 0, we can conclude that at the 99%
confidence level the slope is not zero and the regression model is useful. This result is consistent with
the result in part c.
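
A minimal sketch of the interval arithmetic, using only the estimate and standard error quoted from the printout. The variable names are assumptions for illustration, and the multiplier 2.576 is the table value used above.

```python
# Values quoted from the regression printout in the solution
beta1_hat = 11598.9   # estimated slope, dollars per Carat
se_beta1 = 230.1      # standard error of the slope estimate
t_mult = 2.576        # t(0.005, 306), approximated by z(0.005)

# Confidence interval: Beta1-hat +- t * S of Beta1-hat
margin = t_mult * se_beta1
lower, upper = beta1_hat - margin, beta1_hat + margin

# The t statistic from part c is the same estimate divided by its standard error
t_star = beta1_hat / se_beta1

print(f"99% CI: ({lower:,.2f}, {upper:,.2f}), t* = {t_star:.2f}")
```

The interval excludes 0 exactly when |t*| exceeds the multiplier, which is why parts c and d must agree.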

e. Obtain and interpret the estimated standard error of the regression model.

S = Residual standard error = $1,118 from printout

So, we can expect about 95% of the actual Asking Prices to fall within +- 2*(1,118) = +- $2,236 of the
predicted Asking Price.

3. The manufacturer of an over-the-counter pain reliever claims that its product brings pain relief to headache
sufferers in less than 3.5 minutes, on average. To be able to make this claim in its TV advertisements, the
manufacturer was required to present statistical evidence in support of the claim. The manufacturer reported
that for a random sample of 100 headache sufferers, the mean time to relief was 3.3 minutes and the
standard deviation of the sample was 1.2 minutes.

a. Using alpha = 0.01, test whether the data supports the manufacturer’s claim or not. State the two
hypotheses, compute the test statistic, find the rejection region, and make your conclusion.

Ho: Mu >= 3.5 = Mu0      x-bar = 3.3, s = 1.2, n = 100 (large sample)
Ha: Mu < 3.5             Lower Tailed Test, alpha = 0.01

z* =( x-bar – Mu0) / ( s / sqrt(n)) = (3.3 – 3.5) / (1.2 / sqrt(100)) = -1.67

Rejection Region: z* < - Z(alpha) = - Z(0.01) = -2.326

Since z* = -1.67 > - Z(alpha) = -2.326, we cannot reject Ho. Thus, we DO NOT have strong enough
statistical evidence to support the manufacturer’s claim that its product brings pain relief to headache
sufferers in less than 3.5 minutes, on average (at the 99% Confidence Level).

b. Find the p-value of the test. Is this p-value consistent with your conclusion from part a?

Since this is a Lower Tailed Test we have:
p-value = P ( Z < z*) = P ( Z < -1.67 ) = 0.5 – P ( 0 < Z < 1.67) = 0.5 – 0.4525 = 0.0475

Since this p-value is larger than the given alpha = 0.01, we cannot reject Ho, and we conclude that the
data DOES NOT support the manufacturer’s claim. Hence the results are consistent.
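
The lower-tail p-value can also be checked numerically. The sketch below is an illustration, not part of the original solutions; it rounds z* to two decimals first so that the result is comparable to the z-table lookup used above.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
x_bar, mu0, s, n, alpha = 3.3, 3.5, 1.2, 100, 0.01

# Test statistic from part a
z_star = (x_bar - mu0) / (s / sqrt(n))

# Lower-tailed test: p-value = P(Z < z*)
p_value = NormalDist().cdf(z_star)

# Same lookup with z* rounded to -1.67, as a z-table would require
p_value_table = NormalDist().cdf(round(z_star, 2))

print(f"z* = {z_star:.3f}, p-value = {p_value:.4f}, table-style p-value = {p_value_table:.4f}")
print("reject Ho:", p_value < alpha)
```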

c. Suppose instead of the random sample of 100 only 25 headache sufferers were selected. Describe how the
rejection region differs in this case from the rejection region found in part a.
(Note: No Step-by-Step Hypothesis Testing needs to be conducted.)

Since n = 25 is a small sample, we will need to use t distribution with n-1 = 24 degrees of freedom.

Rejection Region: t < - t (alpha , n-1) = - t (0.01 , 24) = -2.492

d. With the sample size of 25 mentioned in part c, what assumptions would be needed in order to be able to
conduct any hypothesis test for the true mean time to relief? Explain briefly.

Since n < 30, we cannot rely on the Central Limit Theorem.
Hence, we need to assume that the original population that we are sampling from has a distribution
that is approximately normal, and that the sample collected is a random sample.

4. A real estate appraiser wants to model the relationship between the Sale Price of apartment buildings and
the following independent variables: (1) Number of Apartments; (2) Age of Structure; (3) Lot Size Square
Footage; (4) Gross Building Area Square Footage.

Based on the data from 25 properties the following first order multiple regression model was fitted in RStudio.

a. Report the least squares prediction equation. Interpret the sample estimate of Beta4 in the context of this
model.

Sale Price = $113,500 + $4,998 * Number of Apartments – $1,053 * Age of Structure
+ $0.1273 * Lot Size Square Footage + $14.97 * Gross Building Area Square Footage

Interpretation of Beta 4-hat:

For each square foot increase in Gross Building Area we can expect the mean Sale Price to increase
by $14.97 when all other independent variables are held at constant levels.

b. Test whether there is sufficient evidence to conclude that the overall model is useful for predicting Sale Price
(alpha = 0.05).

Ho: Beta1 = Beta2 = Beta3 = Beta4 = 0 ANOVA F - Test
Ha: at least one of Beta1, Beta2, Beta3, or Beta4 is not equal to 0

F*= 217 with p-value = 3.626e-16 from the RStudio printout

Since p-value = 3.626e-16 < alpha = 0.05, we can reject Ho.

Thus, there is sufficient evidence to say that at least one of the Betas is not zero, and this first order
multiple regression model is useful for predicting Sale Price.

c. Does the data support the hypothesis that as Lot Size Square Footage is increased, the mean Sale Price will
also increase? Test using alpha = 0.05.

Ho: Beta3 = 0 Upper Tailed Test
Ha: Beta3 > 0

t*= 0.043 with p-value = 0.966115 / 2 = 0.483 from the RStudio printout.

Since p-value = 0.483 > alpha = 0.05, we cannot reject Ho.

Thus, there is insufficient evidence to show that Beta3 > 0. Hence we do not have strong evidence that
Sale Price and Lot Size Square Footage are linearly positively related.
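
The printout reports a two-sided p-value, so the solution halves it for the upper-tailed test. That shortcut is valid only because t* is positive, that is, the estimate points in the direction of Ha. A small sketch of the conversion, with illustrative variable names that are not from the original:

```python
# Values quoted from the RStudio printout in the solution
t_star = 0.043
p_two_sided = 0.966115
alpha = 0.05

# Upper-tailed p-value: half the two-sided value when t* > 0,
# otherwise one minus that half (the estimate points away from Ha)
p_upper = p_two_sided / 2 if t_star > 0 else 1 - p_two_sided / 2

print(f"upper-tailed p-value = {p_upper:.3f}, reject Ho: {p_upper < alpha}")
```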

d. Obtain and interpret the R-square adjusted value of the model.

Adjusted R-Square = 0.973 from the RStudio printout.

Hence, about 97.3% of the variation in Sale Price is explained by this first order multiple
regression model using (1) Number of Apartments; (2) Age of Structure; (3) Lot Size Square Footage;
(4) Gross Building Area Square Footage as predictors, after adjusting for sample size and number of
independent variables in the model.

e. Obtain and interpret the estimated standard error of the multiple regression model.

S = Residual standard error = $34,780 from the RStudio printout.

So, we can expect about 95% of the actual Sale Prices to fall within +- 2*($34,780) = +- $69,560 of the
predicted Sale Price.

5. Some college professors make lecture notes available to their classes in an effort to improve teaching
effectiveness. Two groups of students were surveyed, with the first group receiving lecture notes, while no
notes were offered to the second group. At the end of the semester, the students were asked to respond to the
statement: “Having a copy of the notes was [would be] helpful in understanding the material.” Responses
were measured on a nine-point semantic difference scale, where 1 = “strongly disagree” and 9 = “strongly
agree”. The results are summarized in the following table:

Class with notes: | Class without notes:
n1 = 86 | n2 = 35
x-bar1 = 8.48 | x-bar2 = 7.80
s1 = 0.97 | s2 = 1.73

a. Do the samples provide sufficient evidence to conclude that there is a difference in the mean responses of
the two groups of students? Test using alpha = 0.01

Since both n1 & n2 > 30, we have large samples.
Ho: Mu1 – Mu2 = 0        Two Tailed Test
Ha: Mu1 – Mu2 not = 0

z* = ( (x-bar1 – x-bar2) – 0 ) / sqrt( s1^2 / n1 + s2^2 / n2 )
   = (8.48 – 7.80) / sqrt( (0.97)^2 / 86 + (1.73)^2 / 35 ) = 0.68 / 0.3106 = 2.19

Rejection Region: z > 2.576 or z < - 2.576 since Z alpha/2= Z (0.005) = 2.576

Since -2.576 < z* = 2.19 < 2.576, at the 1% level of significance (99% confidence level) we do not have
strong enough statistical evidence to reject Ho in favor of Ha. Hence there is not enough statistical
evidence to conclude that the means differ.
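
The two-sample test statistic can be verified with a short sketch. It is an optional check with assumed variable names; the standard error is computed once and the hypothesized difference of 0 is written out explicitly.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table
n1, x_bar1, s1 = 86, 8.48, 0.97   # class with notes
n2, x_bar2, s2 = 35, 7.80, 1.73   # class without notes
alpha = 0.01

# Standard error of (x-bar1 - x-bar2) for large independent samples
se = sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic under Ho: Mu1 - Mu2 = 0
z_star = ((x_bar1 - x_bar2) - 0) / se

# Two-tailed critical value z(alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z_star) > z_crit
print(f"se = {se:.4f}, z* = {z_star:.2f}, z(alpha/2) = {z_crit:.3f}, reject Ho: {reject_h0}")
```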

b. Construct a 99% Confidence Interval for (Mu1 – Mu2), and interpret the result.

The 99% Confidence Interval for (Mu1 – Mu2) is given by:

(x-bar1 – x-bar2) +- Z alpha/2 * sqrt( s1^2 / n1 + s2^2 / n2 ) =
= (8.48 – 7.80) +- 2.576 * sqrt( (0.97)^2 / 86 + (1.73)^2 / 35 ) = 0.68 +- 0.80 → ( -0.12, 1.48 )

Since this Confidence Interval includes 0, we cannot conclude that the two means differ, nor tell which
mean is larger, at the 99% Confidence level.

c. Would the 95 % Confidence Interval for (Mu1 – Mu2) be narrower or wider? Explain.

The 95% Confidence Interval would be narrower, since the Z alpha/2 multiplier would be 1.96 instead of
the 2.576 value used for the 99% Confidence Interval.
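
To make the width comparison concrete, the sketch below computes both intervals from the same summary statistics. The helper name diff_ci is hypothetical and introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table
n1, x_bar1, s1 = 86, 8.48, 0.97
n2, x_bar2, s2 = 35, 7.80, 1.73

diff = x_bar1 - x_bar2
se = sqrt(s1**2 / n1 + s2**2 / n2)

def diff_ci(conf_level):
    """Large-sample confidence interval for Mu1 - Mu2 (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z * se
    return diff - margin, diff + margin

ci99 = diff_ci(0.99)
ci95 = diff_ci(0.95)
print(f"99% CI: ({ci99[0]:.2f}, {ci99[1]:.2f}), width = {ci99[1] - ci99[0]:.2f}")
print(f"95% CI: ({ci95[0]:.2f}, {ci95[1]:.2f}), width = {ci95[1] - ci95[0]:.2f}")
```

With these numbers the 95% interval comes out to roughly (0.07, 1.29), which no longer contains 0. This matches z* = 2.19 exceeding 1.96 but not 2.576.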

6. A completely randomized design is utilized to compare four treatment means. The ANOVA analysis from
RStudio is given below.

a. Conduct the test to investigate whether the treatment means differ or not using alpha=0.01

Ho: Mu1=Mu2=Mu3=Mu4
Ha: at least two of the treatment means differ

F*= 7.698 with p-value =0.0021 from ANOVA table on printout

Since p-value =0.0021 < alpha = 0.01, we can reject Ho.

Thus, there is sufficient evidence to say that at least two of the treatment means differ.

b. What assumptions must be met to ensure the validity of the inference you made in part a?

Independent random samples are collected from the respective Normal populations. In addition, the
population variances are approximately equal.

c. Based on the results of the Tukey multiple comparison method given below, how do the treatment means
differ?

Confidence Intervals which include 0 suggest that the corresponding pairs of treatment means do not
differ significantly.

Thus, only the confidence intervals for (Mu3-Mu2) and (Mu4-Mu2) suggest statistically significant
difference between the corresponding treatment means. Since the confidence intervals for (Mu3-Mu2) and
(Mu4-Mu2) are both all positive, we have Mu3 > Mu2 and Mu4 > Mu2.

Given that the other four confidence intervals all have negative lower endpoint and positive upper
endpoint, they all contain 0 in the interval. Hence, we do not have statistically significant difference
between the remaining treatment means pairs.

7. A major airline recently began encouraging reservation agents to nap during their breaks. The following
table lists the number of complaints received about each of a sample of 10 reservation agents during the six
months before naps were encouraged and during the six months after the policy change.

Reservation Agent | Before Naps Complaints | With Naps Complaints | Difference (After - Before)
1 | 10 | 5 | -5
2 | 3 | 0 | -3
3 | 16 | 7 | -9
4 | 11 | 4 | -7
5 | 8 | 6 | -2
6 | 2 | 4 | 2
7 | 1 | 2 | 1
8 | 14 | 3 | -11
9 | 5 | 5 | 0
10 | 6 | 1 | -5

Note: d-bar = -3.9 and Sd = 4.3063

a. Does the data present sufficient evidence to conclude that the new napping policy reduces the mean number
of customer complaints about reservation agents? Test using alpha = .05

Given the way the differences are computed, if the mean of the population of differences is less than
zero then the napping policy is effective. Hence, we test the following:

Ho: Mu D = 0        Lower Tailed Test
Ha: Mu D < 0

The test statistic is given by:
t* = ( d-bar – 0 ) / ( Sd / sqrt(n) ) = (-3.9 – 0) / ( 4.3063 / sqrt(10) ) = -2.864

The rejection region requires alpha = 0.05 in the lower tail of the t distribution with 9 degrees of
freedom. From our t-table we see that t(0.05,9) = 1.833 so the rejection region is given by t < -t(0.05,9)
= -1.833

Since t* = -2.864 < -t(0.05,9) = -1.833 we can reject Ho and conclude that mu D < 0 and the napping
policy does reduce the mean number of customer complaints.
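
Because the raw data are given, the summary values d-bar and Sd can be recomputed rather than taken from the note. The sketch below is an optional check with assumed variable names; the critical value is the t-table value quoted above, since the standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Complaints per agent, in agent order 1..10, from the table
before = [10, 3, 16, 11, 8, 2, 1, 14, 5, 6]
after = [5, 0, 7, 4, 6, 4, 2, 3, 5, 1]

# Paired differences, computed as After - Before
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)    # sample mean of the differences
s_d = stdev(diffs)     # sample standard deviation (n - 1 in the denominator)
n = len(diffs)

# Paired t statistic under Ho: Mu D = 0
t_star = (d_bar - 0) / (s_d / sqrt(n))

# t(0.05, 9), copied from the t-table value used in the solution
t_crit = 1.833

# Lower-tailed test: reject Ho when t* < -t(0.05, 9)
reject_h0 = t_star < -t_crit
print(f"d-bar = {d_bar}, Sd = {s_d:.4f}, t* = {t_star:.3f}, reject Ho: {reject_h0}")
```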

b. What assumptions must hold to ensure the validity of this test?

The population of differences is approximately normally distributed, and the sample of differences is
randomly selected.

8. Failure to meet payments on student loans guaranteed by the U.S. government has been a major problem for
banks and the government. Approximately 50% of all student loans guaranteed by the government are in
default. A random sample of 350 loans to college students in one region of the U.S. indicates that 147 loans
are in default.

a. Test using alpha = 0.01 whether the data indicates that the proportion of defaulted student loans in this area
of the country differs from the proportion of all student loans in the U.S. that are in default. State the two
hypotheses, compute the test statistic, find the rejection region, and make your conclusion.

Ho: p = 0.50 = p0 Two tailed test.
Ha: p not = 0.50 = p0

p-hat = 147 / 350 = .42 and since p0 = 0.50 we also have q0 = 1 - p0 = 0.50

z* = (p-hat – p0) / sqrt(p0*q0 / n) = (0.42 – 0.50) / sqrt(0.5 * 0.5 / 350) = -0.08 / 0.0267 = -2.99

Rejection Region: | z* | > Z(alpha/2) = Z(0.005) = 2.576

Since z* = -2.99 < -Z(alpha/2) = -Z(0.005) = -2.576, we reject Ho and conclude that p is not equal to 0.5

So, we have strong evidence that the proportion of defaulted student loans in this area of the country
differs from the proportion of all student loans in the U.S. (at the 99% confidence level).
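
A short sketch of the one-sample proportion test, offered as an optional check. The variable names are assumptions; note that the standard error uses the hypothesized p0, not p-hat, as in the formula above.

```python
from math import sqrt
from statistics import NormalDist

# Values from the problem statement
defaults, n, p0, alpha = 147, 350, 0.50, 0.01

p_hat = defaults / n
q0 = 1 - p0

# Test statistic: z* = (p-hat - p0) / sqrt(p0 * q0 / n)
z_star = (p_hat - p0) / sqrt(p0 * q0 / n)

# Two-tailed critical value z(alpha/2)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z_star) > z_crit
print(f"p-hat = {p_hat:.2f}, z* = {z_star:.2f}, z(alpha/2) = {z_crit:.3f}, reject Ho: {reject_h0}")
```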

b. Is the sample size large enough to use the inferential procedures based on a large sample? Justify your
answer.

n *p0 = 350 * 0.5 = 175 and n*q0 = 350 * 0.5 = 175

Since both n*p0 > 15 and n*q0 > 15, the sample size is large enough.

c. Find the observed significance level for the test and report what conclusion you can draw about the
hypothesis testing problem based on the observed significance level.

z* = -2.99 and since we had a 2 tailed test, the p-value is given by:

P(Z < -2.99) + P(Z > 2.99) = 2* P(Z > 2.99) = 2 * (0.5 – P( 0 < Z < 2.99)) = 2 * (0.5 – 0.4986) =
= 2* (0.0014) = 0.0028

Since p-value = 0.0028 < alpha = 0.01 we can reject Ho and conclude that p is not equal to 0.5

So, we have strong evidence that the proportion of defaulted student loans in this area of the country
differs from the proportion of all student loans in the U.S. (at the 99% confidence level).
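
The two-tailed p-value can be checked the same way. This sketch starts from the rounded z* = -2.99 so the result is comparable to the z-table lookup above; the variable names are illustrative.

```python
from statistics import NormalDist

# Rounded test statistic from part a, as used in the table lookup
z_star = -2.99
alpha = 0.01

# Two-tailed p-value: P(Z < -|z*|) + P(Z > |z*|) = 2 * P(Z < -|z*|)
p_value = 2 * NormalDist().cdf(-abs(z_star))

print(f"p-value = {p_value:.4f}, reject Ho: {p_value < alpha}")
```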