Quantitative Methods
for Business
COMM5005
Lecture 7
Yiyuan Xie
Statistics Flow Chart
[Flow chart: Statistics splits into Descriptive and Inferential branches. Topics shown include the moments (first: mean; second: variance; third: skewness; fourth: kurtosis), distributions of variables (normal, binomial, bi-variate, uniform), sampling distributions, estimation, confidence intervals, hypothesis testing, probability, simple linear regression and multiple linear regression, mapped to Lecture 6 (Week 5), Lecture 7 and Lecture 8.]

Disclaimer: These lecture notes are property of the author. They are for the exclusive use of the UNSW students enrolled in COMM 5005 Quantitative Methods for Business for Term 3, 2020. Reproduction, distribution and re-use of these lecture notes in any way, shape or form is strictly prohibited.
In this lecture we will cover:
• Confidence interval estimation
• Fundamentals of hypothesis testing: One-sample
tests
Lecture 7 topics
Objectives
• Construct and interpret confidence interval estimates
• Determine the sample size necessary to develop a confidence interval
• Identify the basic principles of hypothesis testing
• Explain the assumptions of the hypothesis-testing procedure
• Conduct a hypothesis test
Readings
Sections of Berenson, M. et al. 5th ed. will help you to understand this week's topics more clearly.

Chapter     Name                                                   Pages
8.1 – 8.4   Confidence interval estimation                         279–299
9.1 – 9.4   Fundamentals of hypothesis testing: One-sample tests   315–340
1. Confidence interval estimation
• Suppose you want to know the mean number of hours of paid work undertaken per week by UNSW students.
• You conduct a survey, which gives a sample mean of 14.8 hours.
• But how accurate is this figure of 14.8 hours?
• We need a confidence interval to answer this question.
A point estimate is the value of a single sample
statistic.
A confidence interval provides a range of values
constructed around the point estimate.
Confidence interval (1 of 3)
An interval gives a range of values
• Takes into consideration variation in sample statistics from
sample to sample
• Based on observations from one sample
• Gives information about closeness to unknown population
parameters
• Stated in terms of level of confidence
• Can never be 100% confident
Confidence interval (2 of 3)
• The general formula for all confidence intervals
is:
Point Estimate ± (Critical Value)*(Standard Error)
• This gives the level of confidence with which the interval will contain the unknown population parameter.
• How do we determine the critical value? It depends on how confident we want to be.
• A higher confidence level → a larger critical value.
Confidence interval (3 of 3)
Common confidence levels = 90%, 95% or 99%
• Also written (1 − α) = 0.90, 0.95 or 0.99
A relative frequency interpretation:
• In the long run, 90%, 95% or 99% of all the confidence intervals
that can be constructed (in repeated samples) will contain the
unknown true parameter
A specific interval either will contain or will not contain the true
parameter:
• Example: suppose [12.2, 17.4] is the 95% confidence interval
you constructed for the paid work example in the beginning of
the slides. You can say “I am 95% confident that the mean work
hours in the population of UNSW students is somewhere
between 12.2 and 17.4 hours.”
Confidence interval for μ (σ known)
\[ \bar{X} \pm Z \frac{\sigma}{\sqrt{n}} \]
where X̄ is the sample mean, Z is the critical value from the standard normal distribution, and σ/√n is the standard error of the mean.
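To make the formula concrete, here is a minimal Python sketch (my addition, not part of the slides). Only the 14.8-hour sample mean comes from the paid-work example above; the population standard deviation and sample size are assumed values for illustration.

```python
from scipy.stats import norm
import math

# Hypothetical inputs for the paid-work example (only the 14.8-hour
# sample mean comes from the slides; sigma and n are assumed here).
x_bar = 14.8      # sample mean (hours)
sigma = 5.0       # assumed known population standard deviation
n = 100           # assumed sample size
confidence = 0.95

alpha = 1 - confidence
z = norm.ppf(1 - alpha / 2)          # critical Z value (1.96 for 95%)
std_error = sigma / math.sqrt(n)     # standard error of the mean
margin = z * std_error

print(f"{confidence:.0%} CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```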
Finding the critical Z value
The value of Z needed for constructing a confidence interval is called
the critical value for the distribution.
• Critical value: The value in a distribution that cuts off the required
probability in the tail for a given confidence level.
For a 95% confidence interval the value of α is 0.05.
The critical Z value corresponding to a cumulative area of 0.9750 is
1.96 because there is 0.025 in the upper tail of the distribution and
the cumulative area less than Z = 1.96 is 0.975.
There is a different critical value for each level of confidence 1 - α.
[Figure: Normal curve for determining the Z value needed for 95% confidence]
[Figure: Normal curve for determining the Z value needed for 99% confidence]
Common levels of confidence

Confidence Level   Confidence Coefficient (1 − α)   Z Value
80%                0.80                             1.28
90%                0.90                             1.645
95%                0.95                             1.96
98%                0.98                             2.33
99%                0.99                             2.576
99.8%              0.998                            3.08
99.9%              0.999                            3.27
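These critical values can be reproduced computationally. The following sketch (my addition) uses the standard normal inverse CDF; small differences from the printed table arise only from rounding.

```python
from scipy.stats import norm

# Critical Z value for a two-sided confidence interval:
# the value cutting off alpha/2 in the upper tail.
for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    alpha = 1 - level
    z = norm.ppf(1 - alpha / 2)
    print(f"{level:.1%} confidence -> Z = {z:.3f}")
```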
Confidence interval for μ (σ unknown)
If the population standard deviation σ is unknown, we can
substitute the sample standard deviation, S
This introduces extra uncertainty, since S is variable from sample to
sample
So we use the Student’s t distribution instead of the normal
distribution
• the t value depends on the degrees of freedom, given by the sample size minus 1 (d.f. = n − 1)
• the d.f. are the number of observations that are free to vary after the sample mean has been calculated
Confidence interval for μ (σ unknown)
Confidence interval estimate:
\[ \bar{X} \pm t_{n-1} \frac{S}{\sqrt{n}} \]
• where t_{n-1} is the critical value of the t distribution with n − 1 degrees of freedom and an area of α/2 in each tail
• Note that S is the standard deviation of the sample:
\[ S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}} \]
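A minimal sketch of the t-based interval (my addition; the sample data are made up for illustration):

```python
import numpy as np
from scipy.stats import t

# Hypothetical sample of weekly paid-work hours (made-up data).
sample = np.array([12, 18, 9, 15, 20, 14, 11, 16, 13, 17], dtype=float)

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)          # sample standard deviation (divides by n - 1)

confidence = 0.95
alpha = 1 - confidence
t_crit = t.ppf(1 - alpha / 2, df=n - 1)   # critical t value with n - 1 d.f.

margin = t_crit * s / np.sqrt(n)
print(f"{confidence:.0%} CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```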
Degrees of freedom
Degrees of freedom: the number of values in the
calculation of a statistic that are free to vary.
Suppose the mean of 3 numbers is 8.0
• Let X1 = 7
• Let X2 = 8
• What is X3?
If the mean of these three values is 8.0, then X3 must be 9
(i.e. X3 is not free to vary)
Here, n = 3, so degrees of freedom = n – 1 = 3 – 1 = 2
(2 values can be any numbers, but the third is not free to vary for a
given mean)
[Figure: t distribution with 99 degrees of freedom]
Application: confidence interval estimation for the proportion
The standard error of the proportion is
\[ \sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}} \]
where π is the population proportion.
Confidence interval endpoints
• Upper and lower confidence limits for the population proportion
are calculated with the formula
• Z is the standard normal value for the level of confidence
desired
• p is the sample proportion
• n is the sample size
\[ p \pm Z \sqrt{\frac{p(1-p)}{n}} \]
Example
• A random sample of 100 people shows that 25 are left-handed
• The point estimate of the proportion is 25/100.
• Form a 95% confidence interval for the true proportion of left-
handers:
\[ p \pm Z\sqrt{\frac{p(1-p)}{n}} = 0.25 \pm 1.96\sqrt{\frac{0.25 \times 0.75}{100}} = 0.25 \pm 1.96 \times 0.0433 = (0.1651,\ 0.3349) \]
Interpretation
We are 95% confident that the true percentage of left-handers in
the population is between 0.1651 and 0.3349; i.e.
16.51% and 33.49%
Although the interval from 0.1651 to 0.3349 may or may not contain
the true proportion, 95% of intervals formed from repeated
samples of size 100 in this manner will contain the true
proportion.
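The same interval can be reproduced with a short sketch (my addition), using the 25-out-of-100 left-handers sample from the example:

```python
import math
from scipy.stats import norm

# Left-handers example from the slides: 25 successes in a sample of 100.
successes, n = 25, 100
p = successes / n

confidence = 0.95
z = norm.ppf(1 - (1 - confidence) / 2)   # 1.96 for 95% confidence

std_error = math.sqrt(p * (1 - p) / n)
margin = z * std_error

print(f"{confidence:.0%} CI for the proportion: "
      f"({p - margin:.4f}, {p + margin:.4f})")   # (0.1651, 0.3349)
```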
Determining sample size: general guideline
Start from the interval X̄ ± Z σ/√n: the half-width Z σ/√n is the sampling error e, and solving e = Z σ/√n for n gives the required sample size.
Determining sample size for the
mean
The sample size n is equal to the product of the Z value squared and the variance σ², divided by the sampling error e squared:
\[ n = \frac{Z^2 \sigma^2}{e^2} \]
To determine the required sample size for the mean, you must know the:
• desired level of confidence (1 − α), which determines the critical Z value
• acceptable sampling error, e
• standard deviation, σ
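A minimal sketch of this calculation (my addition; the values of σ, e and the confidence level are assumptions, not from the slides):

```python
import math
from scipy.stats import norm

sigma = 5.0        # assumed population standard deviation
e = 1.0            # acceptable sampling error
confidence = 0.95

z = norm.ppf(1 - (1 - confidence) / 2)
n = (z ** 2) * (sigma ** 2) / (e ** 2)

# Always round the required sample size up to the next whole number.
print(f"required n = {math.ceil(n)} (raw value {n:.2f})")
```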
If σ is unknown
If unknown, σ can be estimated using one of the following
approaches:
• from past data, using that data's standard deviation
• if the population is normal, the range is approximately 6σ, so we can estimate σ by dividing the range by 6
• conduct a pilot study and estimate σ with the sample standard deviation,
\[ S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}} \]
Application: determining sample size for the proportion
\[ n = \frac{Z^2\, \pi(1-\pi)}{e^2} \]
where π is the (estimated) population proportion and e is the acceptable sampling error.
Example: determining sample size for the proportion
With Z = 1.96 (95% confidence), π = 0.12 and e = 0.03:
\[ n = \frac{Z^2\, \pi(1-\pi)}{e^2} = \frac{(1.96)^2 (0.12)(1-0.12)}{(0.03)^2} = 450.74 \]
so round up to a sample size of n = 451.
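Reproducing this calculation (my addition; the inputs are the ones used in the example above):

```python
import math
from scipy.stats import norm

pi_est = 0.12       # estimated population proportion (from the example)
e = 0.03            # acceptable sampling error
confidence = 0.95

z = norm.ppf(1 - (1 - confidence) / 2)        # approximately 1.96
n = (z ** 2) * pi_est * (1 - pi_est) / e ** 2

print(f"raw n = {n:.2f}, required sample size = {math.ceil(n)}")  # about 450.7, rounds up to 451
```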
2. Fundamentals of hypothesis testing
The null hypothesis, H0
How to formulate a null hypothesis, H0?
• States the belief or assumption in the current situation (status
quo)
• Begin with the assumption that the null hypothesis is true
• Refers to the status quo
• Always contains an '=', '≤' or '≥' sign
• Is always about a population parameter; e.g. μ, not about a
sample statistic
Example: The average number of TV sets in Australian homes is
equal to 3 (H0 : μ = 3 )
The alternative hypothesis, H1
Alternative hypothesis is the opposite of the null hypothesis, and it
is generally the claim or hypothesis that the researcher is trying
to prove.
How to formulate an alternative hypothesis?
• Challenges the status quo
• Can only contain either the '<', '>' or '≠' sign (not '=')
e.g. The average number of TV sets in Australian homes is not
equal to 3 (H1 : μ ≠ 3 )
Determining the test statistic and regions
of rejection and non-rejection
The test statistic is a value derived from sample data that is used
to determine whether the null hypothesis should be rejected or
not.
The sampling distribution of the test statistic is divided into two
regions, a region of rejection and a region of non-rejection.
To make a decision concerning the null hypothesis, you first
determine the critical value of the test statistic.
[Figure: Regions of rejection and non-rejection in hypothesis testing]
Risks in decision making using hypothesis
testing – type 1 error
Type I error
• Reject a true null hypothesis
• Considered a serious type of error
Example: The Australian Government will invest almost $6 million to support research and development of an Australian COVID-19 vaccine. When testing the vaccine, the null hypothesis is that the vaccine has no effect, and the alternative hypothesis is that the vaccine has an effect.
Type I error in this example: the test rejects the null hypothesis (i.e. shows an effect), but in fact the vaccine does not have any effect.
The probability of Type I error is α
• Called level of significance of the test; i.e. 0.01, 0.05, 0.10
• Set by the researcher in advance
Risks in decision making using hypothesis
testing
Type II error
• Fail to reject a false null hypothesis
Example: In the previous example, the null hypothesis: the vaccine
has no effect, and the alternative hypothesis is: the vaccine has
an effect.
Type II error: the test does not reject the null hypothesis (i.e. shows no effect), but in fact the vaccine does have an effect.
The probability of Type II error is β
The power of a statistical test, 1 – β, is the probability that you will
reject the null hypothesis when it is false and should be rejected.
The level of significance, α
The level of significance, α, defines the unlikely values of the sample statistic if the null hypothesis is true
• That is, it defines the rejection region of the sampling distribution
• Typical values of α are 0.01, 0.05 or 0.10
• α is selected by the researcher at the beginning
• It provides the critical value(s) of the test
The confidence coefficient
The confidence coefficient, 1 − α, is the probability that you will not reject the null hypothesis, H0, when it is true and should not be rejected.
The confidence level of a hypothesis test is:
(1 − α) × 100%.
Z Test of Hypothesis for the Mean (σ Known)
The Z test of hypothesis for the mean is a test
about the population mean which uses the
standard normal distribution.
\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \]
Two-tail tests
A two-tail test is a hypothesis test where the rejection region is
divided into the two tails of the probability distribution.
There are two cut-off values (critical values) that define the
regions of rejection.
Example: The two-tail test can be used to test the null hypothesis
that the average number of TV sets in Australian homes is
equal to 3 (H0 : μ = 3 ), because the alternative hypothesis is μ
≠ 3 (that is, either greater or less than 3 – two sides).
Critical value approach to testing
For a two-tail test for the mean, σ known:
• Convert the sample statistic (X̄) to the test statistic (Z statistic)
• Determine the critical Z values for a specified level of significance α from Table E.2 or a computer (important: each tail of the distribution must have an area of α/2)
• Decision rule: if the test statistic falls in the rejection region, reject H0; otherwise do not reject H0
The six-step method of hypothesis testing
1 State the null hypothesis, H0, and the alternative hypothesis, H1
2 Choose the level of significance, , and the sample size, n
3 Determine the appropriate test statistic and sampling distribution
4 Determine the critical values that divide the rejection and non-
rejection regions
5 Collect data and calculate the value of the test statistic
6 Make the statistical decision and state the managerial conclusion
• if the test statistic falls into the non-rejection region, do not
reject the null hypothesis H0; if the test statistic falls into the
rejection region, reject the null hypothesis
• express the managerial conclusion in the context of the real-
world problem
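As an illustration of the six steps (my sketch, not from the slides): the hypothesised mean of 3 TV sets comes from the earlier example, while the sample mean, σ, α and n below are assumed values.

```python
import math
from scipy.stats import norm

# Step 1: H0: mu = 3, H1: mu != 3 (two-tail test)
mu_0 = 3.0

# Step 2: level of significance and sample size (assumed values)
alpha = 0.05
n = 100

# Step 3: sigma is assumed known, so the test statistic is Z
sigma = 0.8                        # assumed population standard deviation

# Step 4: critical values dividing the rejection and non-rejection regions
z_crit = norm.ppf(1 - alpha / 2)   # +/- 1.96 for alpha = 0.05

# Step 5: collect data and compute the test statistic (sample mean assumed)
x_bar = 3.2
z_stat = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Step 6: statistical decision and managerial conclusion
if abs(z_stat) > z_crit:
    print(f"Z = {z_stat:.2f}: reject H0; the mean number of TV sets differs from 3")
else:
    print(f"Z = {z_stat:.2f}: do not reject H0; no evidence the mean differs from 3")
```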
The p-value approach to hypothesis testing
The p-value is the probability of getting a test statistic more extreme than the
sample results, given that the null hypothesis, H0, is true.
• It is also called the observed level of significance
• It is the smallest value of α for which H0 can be rejected
Decision making:
(1) choose a level of significance, α;
(2) compute the sample statistic and use it to obtain the p-value from Table E.2 or a computer, then
• If p-value < α, reject H0
• If p-value ≥ α, do not reject H0
Finding a p-value for a two-tail test: suppose the Z statistic is 1.5; then p-value = 0.0668 + 0.0668 = 0.1336 > α = 0.05, so do not reject H0.
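The two-tail p-value in this example can be checked with a short sketch (my addition):

```python
from scipy.stats import norm

z_stat = 1.5
# Two-tail p-value: probability of a result at least this extreme in either tail.
p_value = 2 * norm.sf(abs(z_stat))     # sf(z) = 1 - cdf(z)
print(f"p-value = {p_value:.4f}")      # 0.1336

alpha = 0.05
print("reject H0" if p_value < alpha else "do not reject H0")
```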
A connection between confidence
interval and hypothesis testing
Although confidence interval estimation and hypothesis testing are
based on the same set of concepts, they are used for different
purposes.
Confidence intervals: used to estimate parameters
Hypothesis testing: used for making decisions about specified
values of population parameters
One-tail tests
A one-tail or directional test is a hypothesis test where the entire rejection region is contained in one tail of the sampling distribution. The test can be either upper-tail or lower-tail.
Critical value approach to testing
For a one-tail test for the mean, σ known:
• Convert the sample statistic (X̄) to the test statistic (Z statistic)
• Determine the critical Z value for a specified level of significance α from Table E.2 or a computer (important: be careful about which tail the region of rejection is in; that tail of the distribution must have an area of α)
• Decision rule: if the test statistic falls in the rejection region, reject H0; otherwise do not reject H0
t test of hypothesis for the mean (σ
unknown)
The t test of hypothesis for the mean (σ unknown) is a test about
the population mean that uses a t distribution.
where S is the sample standard deviation, and the test statistic t
follows a t distribution having n – 1 degrees of freedom.
The procedure of the test is similar to the case where σ is known.
The only difference is that you should use a t distribution table (E.3)
rather than a normal distribution table (E.2).
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
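A minimal sketch of this test (my addition; the sample data and hypothesised mean are made up). The manual calculation is cross-checked against scipy's built-in one-sample t test:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

# Hypothetical sample and hypothesised population mean.
sample = np.array([2.8, 3.1, 3.4, 2.9, 3.3, 3.0, 3.2, 2.7], dtype=float)
mu_0 = 3.0

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)

# Test statistic with n - 1 degrees of freedom.
t_stat = (x_bar - mu_0) / (s / np.sqrt(n))
p_value = 2 * t.sf(abs(t_stat), df=n - 1)   # two-tail p-value
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")

# Cross-check with scipy's built-in one-sample t test.
print(ttest_1samp(sample, popmean=mu_0))
```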