Hypothesis Testing (Mean)
with a Single Population
By the end of the lecture you should be able to:
• Know the steps in testing hypotheses and define H0 and H1.
• Define Type I error, Type II error, and power.
• Formulate a null and alternative hypothesis for μ or π.
• Explain decision rules, critical values, and rejection regions.
• Perform a hypothesis test for a mean with known σ using z.
• Perform a hypothesis test for a mean with unknown σ using t.
• Perform a hypothesis test for a proportion.
Hypothesis testing
• In statistics, a hypothesis is a claim or a statement about the property of an
underlying population.
• In undertaking hypothesis testing, you want to determine: Is the hypothesis
true or false?
• In other words: are the data consistent with a particular hypothesis?
e.g. given our data, can we reject the claim that women are paid less, on
average, than men?
Examples of daily uses of data in business
• To support marketing claims
• Help managers make decisions
• Measure business improvement
The use of data allows businesses to find the best answers to their questions, e.g.:
• Should a manufacturing company change suppliers of an input?
• Did product defects decrease after a new manufacturing process was introduced?
• Has the average service time at Starbucks decreased since last year?
• Has the NHS ambulance service decreased its average response time to accidents?
Hypothesis testing uses
Hypothesis testing is used in science and business to test assumptions
and theories and guide managers when facing decisions.
First we will explain the logic behind hypothesis testing and then show
how statistical hypothesis testing helps businesses make decisions.
Logic of hypothesis testing
• The business analyst asks questions, makes assumptions, and proposes
testable theories about the values of key parameters of the business
operating environment.
• Each assumption is tested against observed data.
• If an assumption has not been disproved, in spite of rigorous efforts to do so,
the business may operate under the belief that the statement is true.
• The analyst states the assumption, called a hypothesis, in a format that
can be tested using well-known statistical procedures. The hypothesis is
compared with sample data to determine if the data are consistent or
inconsistent with the hypothesis.
• When the data are found to be inconsistent (i.e. in conflict) with the
hypothesis, the hypothesis is either discarded or reformulated.
Estimation
• Estimation is the process of using sample data to draw inferences about
the population
• In order to estimate a parameter (such as the population mean) a rule
which describes how to derive that estimate is required; such a rule is
known as an estimator.
[Figure: sample information (x̄, s²) is used to draw inferences about the population parameters (µ, σ²).]
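To make the idea of an estimator concrete, here is a minimal Python sketch that treats each estimator as a function (a rule) applied to sample data. The sample values are hypothetical, made up purely for illustration, and the function names are our own. The sample mean and the sample variance (with an n − 1 divisor, as computed by the standard library's `statistics.variance`) are the usual estimators of µ and σ².

```python
import statistics

# Hypothetical sample of weekly turnovers (£); illustrative values only.
sample = [4800, 5100, 4950, 4700, 5050, 4900]

def estimate_mean(data):
    """Estimator of the population mean µ: the sample mean x̄."""
    return statistics.mean(data)

def estimate_variance(data):
    """Estimator of the population variance σ²: s² with an n - 1 divisor."""
    return statistics.variance(data)

x_bar = estimate_mean(sample)
s2 = estimate_variance(sample)
print(x_bar, s2)
```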
Principles of hypothesis testing
• The approach involves making a tentative assumption about the population
parameter, which is usually referred to as the null hypothesis and denoted by Ho.
• Ho is what we wish to test.
• Another hypothesis referred to as the alternative hypothesis (and denoted by
Ha) is then defined.
• The hypothesis-testing procedure involves using data from a sample to test two
competing statements – one denoted by Ho and the other by Ha.
Principles of hypothesis testing
• A random sample of data is chosen.
• If the sample data are consistent with the null hypothesis, then we do not reject the
null hypothesis.
• Never ‘accept’ a null hypothesis – just because a hypothesis is consistent with our
data does not mean it is true!
• If the sample data are inconsistent with the null hypothesis, then we reject the null
hypothesis in favour of the alternative.
Choosing the Null Hypothesis
The first step in setting up a hypothesis test is to decide on the null and the alternative
hypothesis.
The parameter we will focus on will be the population mean (µ).
The null hypothesis will always specify a single value for the parameter in question.
The null is expressed as:
Ho: µ = µ0
µ0 is some numeric value assumed for the population mean µ.
Choosing the Alternative Hypothesis
Three choices are possible for the alternative hypothesis:
(a) Ha: µ ≠ µ0
A hypothesis test of this kind is known as a two-tailed test.
(b) Ha: µ < µ0
A hypothesis test of this kind is known as a left-tailed (or one-tailed) test.
(c) Ha: µ > µ0
A hypothesis test of this kind is known as a right-tailed (or one-tailed) test.
Type I & Type II Errors in Hypothesis Testing
                        State of Nature
Conclusion              Ho is true             Ha is true
Do not reject Ho        Correct conclusion     Type II error
Reject Ho               Type I error           Correct conclusion
Type I & Type II Errors in Hypothesis Testing
• We would like to avoid both Type I and Type II errors.
• However, for a given sample size, reducing the chance of one can only be achieved at the expense of
increasing the chance of the other.
• In undertaking hypothesis testing, there are α-risks and β-risks.
• We could define the two risks as follows:
α = Prob[Reject the Null Hypothesis | The Null Hypothesis is True]
β = Prob[Do Not Reject the Null Hypothesis | The Alternative Hypothesis is True]
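The definition of α can be checked by simulation. The Python sketch below borrows the Subway figures used later in the lecture as hypothetical inputs and assumes a normally distributed population with known σ = 500. It repeatedly draws samples from a population in which H0 is true and records how often a left-tailed 5% z test rejects H0. The rejection rate should come out close to α = 0.05. The function name `simulate_type_i_rate` is our own.

```python
import random
import statistics
from statistics import NormalDist

def simulate_type_i_rate(mu0=5000.0, sigma=500.0, n=80, alpha=0.05,
                         trials=2000, seed=1):
    """Estimate P(reject H0 | H0 true) for a left-tailed z test by simulation."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(alpha)   # about -1.645 for alpha = 0.05
    se = sigma / n ** 0.5                  # standard error of the mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population in which H0 is true (mean = mu0).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (statistics.fmean(sample) - mu0) / se
        if z < z_crit:
            rejections += 1                # a Type I error
    return rejections / trials

print(simulate_type_i_rate())
```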
Type I and type II errors: examples
Roman law and many other western legal systems assume a defendant is innocent unless the evidence
gathered by the prosecutor is sufficient to reject this assumption.
The hypotheses are:
H0: The defendant is innocent
H1: The defendant is guilty
Type I error?
=> Convicting an innocent defendant
Type II error?
=> Failing to convict a guilty defendant
Type I and type II errors: examples
When an Olympic athlete is tested for performance-enhancing drugs the
presumption is that the athlete is in compliance with the rules. The hypotheses are:
H0: No banned substance was used
H1: Banned substance was used
Type I error?
=> Unfairly disqualifying an athlete who is “clean”
Type II error?
=> Letting the drug user get away with it
Type I and type II errors: examples
To identify authorized and unauthorized persons for computer access, ATM
withdrawals, and entry into secure facilities, there is increasing interest in using the
person’s physical characteristics (e.g. fingerprints, facial structure, or iris patterns)
instead of plastic and paper IDs, which can be forged. The hypotheses are:
H0: User is legitimate
H1: User is not legitimate
Type I error?
=> Denying a legitimate user access to a facility or funds
Type II error?
=> Letting an unauthorized user have access to facilities or a financial account
An example: In dubio pro reo
• There is clearly a trade-off: reducing one type of error increases the other.
• “In doubt in favour of the accused”: the presumption of innocence.
• A cornerstone of Roman law and many western legal systems.
• Not universal: some legal systems incorporate elements of presumption of
guilt
• Better that ten guilty persons escape than that one innocent suffer… or
better that ten innocents suffer than one guilty one escape?
A man is on trial for murder.
There is a presumption of innocence: H0 : innocent
Type I error: finding him guilty when he is innocent
Type II error: finding him not guilty when he is guilty
Trade-off: to lower the chance of a Type I error, the burden of proof
required to convict could be made tougher…but then there is a
higher chance of a Type II error!
Type I & Type II Errors in Hypothesis Testing
Many statisticians suggest minimizing the α-risk (i.e., minimize the probability of
making a Type I error).
The α-risk is the significance level of the test.
If α = 0.05, there is a 5% risk that the investigator rejects the null hypothesis
when in fact it is true.
Conversely, when the null hypothesis is true, there is a 95% probability of correctly not rejecting it.
This probability, 1 – α, is known as the confidence level.
Type I & Type II Errors in Hypothesis Testing
The β-risk is related to the power of the test.
Since:
Prob[Do Not Reject the Null Hypothesis | The Alternative Hypothesis is True] + Prob[Reject the Null Hypothesis
| The Alternative Hypothesis is True] = 1
It follows that: 1 – Prob[Do Not Reject the Null Hypothesis | The Alternative Hypothesis is True] =
Prob[Reject the Null Hypothesis | The Alternative Hypothesis is True]
Thus: 1 – β = Prob[Reject the Null Hypothesis | The Alternative Hypothesis is True]
The expression 1 – β represents the power of the test: the probability of correctly rejecting a false null hypothesis.
The investigator's aim is to keep the α-risk low while maximizing the power of the test (1 – β).
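To see what power means numerically, the sketch below computes 1 – β for a left-tailed 5% z test. It assumes, hypothetically, that the true mean is 4,900 and that σ = 500 is known, reusing the Subway numbers introduced next. The function name `power_left_tailed` is our own. Power is the probability that the sample mean falls below the rejection cut-off when the mean is actually μ1.

```python
from statistics import NormalDist

def power_left_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power 1 - β of a left-tailed z test of H0: μ = mu0 when the true mean is mu1."""
    se = sigma / n ** 0.5
    # Reject H0 when x̄ falls below this cut-off (the decision line).
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * se
    # Probability that x̄ ~ N(mu1, se²) lands below the cut-off.
    return NormalDist(mu1, se).cdf(cutoff)

power = power_left_tailed(mu0=5000, mu1=4900, sigma=500, n=80)
print(round(power, 3))
```

With these inputs the power is only a little above one half, so β is not much below 0.5; increasing n raises the power for the same α.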
Testing a Claim about a Population Mean
Example: should you invest in a Subway franchise?
• The owners of Subway claim that the weekly turnover of each
existing franchise is £5000 on average and at this level you are
willing to take on a franchise.
• A sample of 80 Subways is tested. The average weekly turnover is
found to be £4900, with standard deviation £500.
• Can we reject the owners’ claim?
Diagram of the decision rule
[Figure: sampling distributions of the mean under the null hypothesis (μ = 5000) and under the alternative hypothesis (μ < 5000). A decision line D marks the null rejection region to its left: if the sample mean of 4900 falls to the left of D, reject the null hypothesis; otherwise, fail to reject it.]
But how do we determine D?
How to make a decision
• Where do we place the decision line?
• Set the probability of a Type I error to a particular value. By convention, this is
5%.
• There is thus a 5% probability of wrongly rejecting the null.
• This is known as the significance level (α) of the test. It is complementary to
the confidence level (1 – α) of estimation.
• A 5% significance level corresponds to a 95% confidence level.
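Significance Level
[Figure: distribution of the mean under the null hypothesis (μ = 5000), with the bottom 5% of the distribution, to the left of D, marked as the rejection region.]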
Hypothesis testing
• Is 4,900 far enough below 5,000 to reject the null hypothesis?
Step 1: Formulate the null and alternative hypotheses
• H0: μ = 5,000
H1: μ < 5,000
• This is a one-tailed test, since the rejection region occupies only one side of
the distribution.
• The alternative says the true mean lies to the left of the null value, so this is a left-tailed test.
• The null must always contain the = sign.
Step 2: Find your “critical value” corresponding to your significance level (e.g. 5%)
• Again, we need to revisit the standard Normal
• What is the z-score which cuts off the lower 5% tail of the standard Normal?
• Look up 0.05 in the body of the standard Normal table, then find
the corresponding z-score.
• 1.64 standard deviations below the mean cuts off the bottom 5% of the standard Normal
distribution (1.645 to three decimal places).
• Hence -1.64 is the critical value for conducting the test.
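The table lookup can also be done in code. This Python sketch uses the standard library's `statistics.NormalDist`, whose `inv_cdf` method returns the z-score with a given area to its left. It reproduces the one-tailed 5% value, the two-tailed ±1.96 used later, and a 1% one-tailed value for comparison.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the standard Normal N(0, 1)

# Left-tailed test at 5%: z* cuts off the bottom 5% of N(0, 1).
z_left_5 = std_normal.inv_cdf(0.05)
# Two-tailed test at 5%: α/2 = 2.5% in each tail.
z_two_5 = std_normal.inv_cdf(1 - 0.05 / 2)
# A stricter left-tailed test at 1%.
z_left_1 = std_normal.inv_cdf(0.01)

print(f"{z_left_5:.3f}  ±{z_two_5:.3f}  {z_left_1:.3f}")
```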
Significance Level
[Figure: the standard Normal N(0,1) with critical value -1.64, the threshold for deciding whether to reject the null hypothesis; the 5% area to its left is the null rejection region.]
Step 3: Find the z-score that corresponds to 4900; this is the “test statistic”
• Calculate the test statistic:
$$ z = \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}} = \frac{4{,}900 - 5{,}000}{\sqrt{500^2/80}} = -1.79 $$
where µ0 is the mean under H0 and $\sqrt{s^2/n}$ is the standard error of the mean.
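As a check on the arithmetic, here is the same calculation in Python. The helper name `z_statistic` is hypothetical, and, as in the lecture, the sample standard deviation s stands in for σ because n is large.

```python
import math

def z_statistic(x_bar, mu0, s, n):
    """z = (x̄ - µ0) / sqrt(s² / n), using s in place of σ (large n)."""
    standard_error = math.sqrt(s ** 2 / n)
    return (x_bar - mu0) / standard_error

z = z_statistic(x_bar=4900, mu0=5000, s=500, n=80)
print(round(z, 2))
```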
Comparing the Test Statistic with the Critical Value
[Figure: the standard Normal N(0,1) with critical value -1.64; the test statistic -1.79 lies in the null rejection region, hence reject H0.]
Step 4: Specify the “Decision Rule”
In this one-tailed test (left tail):
• If the test statistic is smaller than the critical value, reject the null
hypothesis
• If the test statistic is larger than the critical value, fail to reject the null
hypothesis.
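The decision rule can be written as a small function covering all three kinds of alternative. This is a sketch with a hypothetical name, `reject_h0`; for a two-tailed test it assumes the positive critical value is passed in and compares |z| against it.

```python
def reject_h0(test_stat, critical, tail):
    """Apply the decision rule.

    tail is "left", "right" or "two"; for "two", pass the positive z*.
    """
    if tail == "left":
        return test_stat < critical
    if tail == "right":
        return test_stat > critical
    if tail == "two":
        return abs(test_stat) > critical
    raise ValueError(f"unknown tail: {tail!r}")

print(reject_h0(-1.79, -1.64, "left"))   # Subway example: True
print(reject_h0(-0.625, 1.96, "two"))    # child TV-viewing example: False
```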
Step 5: Apply the decision rule and conclude
• 4,900 is 1.79 standard errors below 5,000, so it falls into the rejection
region (bottom 5% of the distribution).
• Hence, we can reject H0 at the 5% significance level.
• If the true mean were 5,000, there would be less than a 5% chance of obtaining
a sample mean as low as x̄ = 4,900 from a sample of n = 80.
One or two tailed tests
• Use a one-tailed test if:
• you are only concerned about falling on one side of the hypothesised value
(e.g. we would not worry if the turnover is greater than £5,000)
• only one side is possible
• Use a two-tailed test if:
• you are just as concerned about being above or below the hypothesised
value
• you know both outcomes are possible
• you are not sure!
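[Figure: one-tailed test (5% significance): reject H0 to the left of the critical value z* = -1.64; do not reject H0 to its right.]
[Figure: two-tailed test (5% significance): reject H0 below -1.96 or above +1.96; do not reject H0 between them.]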
Steps for hypothesis testing
1. Write out the null and alternative hypotheses (decide
whether to conduct a one- or two-tailed test).
2. Choose a significance level: e.g. 5%.
3. Look up the critical value z*: e.g. at the 5% level, one-tailed test: z0.05 = -1.64.
4. Calculate the test statistic: e.g. in our example z = -1.79.
5. Decision rule: reject H0 or do not reject H0. E.g. in our one-tailed example, -1.79 < -1.64
falls into the rejection region, so reject H0.
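Putting the five steps together, the following Python sketch runs a large-sample z test for a mean. The function name `z_test_mean` and its `tail` argument are our own choices, not a library API. As in the lecture, s replaces σ, which is appropriate only for large n.

```python
import math
from statistics import NormalDist

def z_test_mean(x_bar, mu0, s, n, alpha=0.05, tail="two"):
    """Large-sample z test for a mean, following the five steps (a sketch)."""
    # Steps 1-2: H0: µ = mu0, the alternative is set by `tail`, alpha is the level.
    std_normal = NormalDist()
    # Step 3: critical value from the standard Normal.
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
    elif tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)   # positive z*
    else:
        raise ValueError(f"unknown tail: {tail!r}")
    # Step 4: test statistic.
    z = (x_bar - mu0) / math.sqrt(s ** 2 / n)
    # Step 5: decision rule.
    if tail == "left":
        reject = z < critical
    elif tail == "right":
        reject = z > critical
    else:
        reject = abs(z) > critical
    return z, critical, reject

print(z_test_mean(4900, 5000, 500, 80, tail="left"))   # Subway example
print(z_test_mean(14.5, 15, 8, 100, tail="two"))       # child TV-viewing example
```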
Two tailed test example
• It is claimed that an average child spends 15 hours per week watching
television. A survey of 100 children finds an average of 14.5 hours per week,
with standard deviation 8 hours. Is the claim justified?
• The claim would be wrong if children spend either more or less than 15
hours watching TV. The rejection region is split across the two tails of
the distribution. This is a two tailed test.
A two-tailed test – 5% significance level
[Figure: under H0, the rejection regions (reject H0) are the two tails beyond -z* and +z*, each with area α/2 = 2.5%.]
Solution to the problem
1. Write out the null and alternative hypotheses
H0: μ = 15
H1: μ ≠15
2. Choose significance level: 5%
3. Look up critical value: z*0.025 = ±1.96
4. Calculate the test statistic:
$$ z = \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}} = \frac{14.5 - 15}{\sqrt{8^2/100}} = -0.625 $$
5. Decision rule: we do not reject H0, since
-1.96 < -0.625 < 1.96 and the test statistic does not fall into the rejection region.
The choice of significance level
• Why 5%?
• Like its complement, the 95% confidence level, it is a convention.
A different value can be chosen.
• If the cost of making a Type I error is especially high, then set a
lower significance level, e.g. 1%. The significance level is the
probability of making a Type I error.
P-value of a test
• The p-value is the probability of obtaining a test statistic at least
as extreme as the one observed, assuming H0 is true.
• It provides a measure of the strength of the evidence against H0, in
contrast to a simple reject or do not reject.
• Reject H0 at significance level α if the p-value is smaller than α.
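P-values for the two worked examples can be computed from the standard Normal CDF. This sketch uses `statistics.NormalDist().cdf`; `p_value` is a hypothetical helper, and the two-tailed p-value doubles the tail area beyond |z|.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value of a z test statistic under H0 (standard Normal reference)."""
    phi = NormalDist().cdf
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")

print(round(p_value(-1.79, "left"), 4))   # Subway example
print(round(p_value(-0.625, "two"), 4))   # child TV-viewing example
```

Consistent with the decisions reached above, the Subway p-value is below 0.05 and the TV-viewing p-value is well above it.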
Test Statistics for the Population Mean
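Is n large (i.e., n ≥ 30)?
• Yes, σ known: use $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• Yes, σ unknown: use $z = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
• No, σ known: use $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
• No, σ unknown: use $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$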
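The decision tree above reduces to a few lines of logic. A sketch in Python; `choose_statistic` and its return strings are our own labels, not part of any library.

```python
def choose_statistic(n, sigma_known):
    """Return the test statistic prescribed by the decision tree."""
    if sigma_known:
        return "z with sigma"            # z = (x̄ - µ) / (σ / √n), any n
    if n >= 30:
        return "z with s"                # large n: s stands in for σ
    return "t with s, df = n - 1"        # small n and σ unknown

print(choose_statistic(80, False))
print(choose_statistic(12, False))
```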
Test Statistics for the Population Mean: small sample
If we have a small sample and do not know the population standard
deviation, we use the t-test.
The critical values for this test are determined by the degrees of
freedom, which are n – 1 in single-population applications.
Small samples (n < 30)
• Two consequences:
– the t distribution is used instead of the standard normal for tests of the mean:
$$ t = \frac{\bar{x} - \mu}{\sqrt{s^2/n}} \sim t_{n-1} $$
– tests of proportions in small samples cannot be done by the standard methods used
in the book, since in small samples the normal approximation fails and the sample
proportion must be analysed with the Binomial distribution
Testing a mean with small samples
• A sample of 12 cars of a particular brand averages 35 mpg, with standard
deviation 15. Test the manufacturer’s claim of 40 mpg as the true average.
• H0: μ = 40
H1: μ < 40
• The test statistic is
$$ t = \frac{\bar{x} - \mu_0}{\sqrt{s^2/n}} = \frac{35 - 40}{\sqrt{15^2/12}} = -1.15 $$
• The critical value of the t distribution (df = 11, 5% significance level, one tail)
is t*0.05,11 = -1.796
• Since -1.15 > -1.796, the test statistic does not fall into the rejection region,
hence we cannot reject the manufacturer’s claim
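Here is the same arithmetic in Python. The standard library has no t distribution, so this sketch hard-codes the table value t*0.05,11 = -1.796 quoted above; if SciPy is installed, `scipy.stats.t.ppf(0.05, 11)` returns the same quantile.

```python
import math

x_bar, mu0, s, n = 35, 40, 15, 12
t = (x_bar - mu0) / math.sqrt(s ** 2 / n)   # degrees of freedom: n - 1 = 11
t_crit = -1.796                              # t*(0.05, 11), taken from the t table
reject = t < t_crit                          # left-tailed decision rule
print(round(t, 2), reject)
```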
Testing a Claim about a Proportion
In order to implement tests concerning proportions, a number of
assumptions need to be satisfied.
If the conditions
n π ≥ 5 and n(1 – π) ≥ 5 are both satisfied,
the binomial distribution can be approximated by a normal
distribution.
The z-score then provides the basis for the statistical test.
Testing a Claim about a Proportion
Define n = number of trials (or observations),
p = sample proportion; π = population proportion.
The test statistic for hypothesis tests on proportions is expressed as:
$$ z = \frac{p - \pi}{\sigma_p} \sim N(0, 1) $$
The standard deviation of the proportion is computed from the value of π specified
under the null hypothesis. Thus:
$$ \sigma_p = \sqrt{\frac{\pi(1 - \pi)}{n}} $$
Testing a Claim about a Proportion
We could re-write the z-statistic as:
$$ z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} \sim N(0, 1) $$
Use z critical values for inferences.
Testing hypotheses about a proportion
• Same principles: reject H0 if the test statistic falls into the rejection region.
• To test H0: π = 0.5 vs. H1: π ≠ 0.5 (e.g. a coin is fair vs. not fair)
• The test statistic is:
$$ z = \frac{p - \pi}{\sqrt{\pi(1 - \pi)/n}} = \frac{p - 0.5}{\sqrt{0.5(1 - 0.5)/n}} $$
• Suppose the sample evidence were 60 heads from 100 tosses (p = 0.6).
Note that nπ = 50 ≥ 5 and n(1 – π) = 50 ≥ 5, so both conditions are satisfied.
We would have:
$$ z = \frac{0.6 - 0.5}{\sqrt{0.5(1 - 0.5)/100}} = 2 $$
• So we would (just) reject H0, since 2 > 1.96.
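A Python sketch of the proportion test, with the nπ ≥ 5 and n(1 – π) ≥ 5 check built in. The function name `z_proportion` is our own; raising an error when the conditions fail is a design choice for this sketch, reflecting that the normal approximation should not be used then.

```python
import math

def z_proportion(successes, n, pi0):
    """z = (p - π0) / sqrt(π0(1 - π0)/n); requires nπ0 ≥ 5 and n(1 - π0) ≥ 5."""
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("normal approximation not justified: need nπ ≥ 5 and n(1 - π) ≥ 5")
    p = successes / n
    return (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

z = z_proportion(successes=60, n=100, pi0=0.5)
print(round(z, 2), abs(z) > 1.96)   # two-tailed test at the 5% level
```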
Summary
• The principles are the same for all tests:
• write out the null and alternative
• choose a significance level
• look up the critical value from the z or t tables
• calculate the test statistic
• decide whether to reject or not reject null (sketch!)
• The formula for the test statistic depends upon the problem (mean,
proportion, etc.)
• The rejection region varies, depending upon whether it is a one- or two-tailed test
Appendix: How to derive the standard deviation of the mean
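We use two results:
$$ \operatorname{var}(ax) = a^2 \operatorname{var}(x) $$
where var(•) denotes the variance and a is some constant, and, for independent random variables x and y,
$$ \operatorname{var}(x + y) = \operatorname{var}(x) + \operatorname{var}(y) $$
Using these two formulae we can now derive the expression for var(x̄), starting from:
$$ \operatorname{var}(\bar{x}) = \operatorname{var}\left(\frac{\sum x_i}{n}\right) = \operatorname{var}\left(\frac{1}{n}\sum x_i\right) $$
Using the first formula above, we can write this as:
$$ \operatorname{var}(\bar{x}) = \frac{1}{n^2}\operatorname{var}\left(\sum x_i\right) = \frac{1}{n^2}\operatorname{var}(x_1 + x_2 + x_3 + x_4 + \dots + x_n) $$
Since the observations in a random sample are independent, the second formula gives:
$$ \operatorname{var}(\bar{x}) = \frac{1}{n^2}\left[\operatorname{var}(x_1) + \operatorname{var}(x_2) + \operatorname{var}(x_3) + \dots + \operatorname{var}(x_n)\right] $$
$$ \operatorname{var}(\bar{x}) = \frac{1}{n^2}\left[\sigma^2 + \sigma^2 + \sigma^2 + \dots + \sigma^2\right] = \frac{1}{n^2}\left[n\sigma^2\right] = \frac{\sigma^2}{n} $$
Hence the standard deviation of the mean (the standard error) is σ/√n.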
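The result var(x̄) = σ²/n can be checked numerically. This Python sketch draws many independent samples of size n from a normal population (an assumption made only for convenience; the derivation does not require normality), computes each sample mean, and compares the variance of those means with σ²/n. The values σ = 2 and n = 25, the number of repetitions, and the function name are arbitrary choices for illustration.

```python
import random
import statistics

def variance_of_sample_mean(sigma=2.0, n=25, reps=20000, seed=0):
    """Simulate var(x̄) for i.i.d. N(0, σ²) draws; return (simulated, σ²/n)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.variance(means), sigma ** 2 / n

simulated, theoretical = variance_of_sample_mean()
print(simulated, theoretical)
```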