MT2300
1. Introduction and Preliminaries
Faraway: Chapter 1
Krzanowski: Chapter 1
Kleinbaum et al: Chapter 4
Frees: Sections 1.1–1.3
Mendenhall et al: Chapter 2
1.1 Motivation
We deal with measurements of several variables for each of n experimental units or
individuals. The variables are of two types (though the distinction between them is not
always rigid in applications): those of primary interest to the investigator and those
which might provide supplementary or background information. The variables of the
former type are called response, outcome or dependent variables, while those of latter
type are called explanatory, independent or predictor variables. Econometricians also
use the terms endogenous and exogenous to distinguish the two types of variables. The
explanatory variables are used to predict or to understand the response variables.
Relation between variables, models
We distinguish between a functional relation and a statistical relation. The functional
relation between the independent variable X and the dependent variable Y is often
expressed as a mathematical formula
Y = f(X)
and the main feature of this relation is that the observations $(x_i, y_i)$ (i = 1, ..., n) fall
directly on the “curve” of the relationship, that is, on the curve y = f(x).
A statistical relation, unlike a functional relation, is not a “perfect” one. Very often
explanatory variables are thought of as fixed, and response variables are thought of as
random variables with a distribution depending on the explanatory variables. Therefore,
for each value of an explanatory variable x the response Y may be supposed to be a
random variable with expectation (mean value) f(x) = E(Y | X = x), or E(Y | x) in short.
Then a statistician may wish to determine the function f using sample data consisting of
pairs $(x_i, y_i)$ (i = 1, ..., n). The function f is called the regression function for regressing Y
on X, and X is called the regressor.
The regression function f(x) represents the systematic component of the model. The
systematic component of the model is concerned with overall population features such as
expected values. To emphasize the existence of the random component of the model, the
response is often written in the form
Y = f(x) + ε
where f is the regression function, that is, the systematic component, and ε is the random
component. In most applications ε is assumed to be a normal random variable with mean zero and
variance σ² (ε ∼ N(0, σ²)).
Linear statistical model
The systematic component f is often expressed in terms of explanatory variables
through a parametric equation. If, for example, it is supposed that
$f(x) = A + Bx + Cx^2$, or
$f(x) = A\,2^x + B$, or
$f(x) = A \log x + B$,
then the problem is reduced to one of identifying a few parameters, here labelled A, B, C.
In each of these three forms for f given above, f is linear in these parameters.
For example, $A\,2^x + B$ can be written as $f(x, \beta) = g(x)^T\beta$, where $g(x)^T = (1, 2^x)$ is
known as the transformed input and $\beta^T = (B, A)$ is the vector of model parameters. Similarly,
$A \log x + B$ can be written as $g(x)^T\beta$, where the transformed input is $g(x)^T = (1, \log x)$
and the vector of model parameters is still $\beta^T = (B, A)$.
It is the linearity in the parameters that makes the model a linear statistical model!
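As a minimal sketch of what linearity in the parameters buys us, the Python snippet below recovers (B, A) for the forms $A\,2^x + B$ and $A \log x + B$ with one and the same routine; only the transformed input g(x) changes. The function name `fit_linear_in_parameters`, the parameter values and the x-grid are hypothetical, and the fit uses the least squares method (introduced in Section 1.2) through the usual 2×2 normal equations, which is an assumption of this illustration rather than something stated above.

```python
import math

def fit_linear_in_parameters(xs, ys, transform):
    """Least squares fit of y = B + A * transform(x).

    The model is linear in beta = (B, A) with transformed input
    g(x) = (1, transform(x)), so the 2x2 normal equations are solved directly.
    """
    g = [transform(x) for x in xs]
    n = len(xs)
    sum_g = sum(g)
    sum_gg = sum(v * v for v in g)
    sum_y = sum(ys)
    sum_gy = sum(v * y for v, y in zip(g, ys))
    det = n * sum_gg - sum_g ** 2
    a = (n * sum_gy - sum_g * sum_y) / det
    b = (sum_y - a * sum_g) / n
    return b, a  # beta^T = (B, A)

xs = [0.5, 1.0, 1.5, 2.0, 3.0]
# Noise-free responses from f(x) = A * 2**x + B with A = 3, B = -1 (hypothetical values).
ys_exp = [3.0 * 2.0 ** x - 1.0 for x in xs]
# Noise-free responses from f(x) = A * log(x) + B with A = 0.5, B = 2 (hypothetical values).
ys_log = [0.5 * math.log(x) + 2.0 for x in xs]

beta_exp = fit_linear_in_parameters(xs, ys_exp, lambda x: 2.0 ** x)
beta_log = fit_linear_in_parameters(xs, ys_log, math.log)
print(beta_exp)  # close to (-1, 3)
print(beta_log)  # close to (2, 0.5)
```

Because the data here contain no random component, the fitted parameters match the generating values up to rounding error.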
1.2 The model of measurements, revision of Year 1 Statistics.
Let µ be an unknown quantity of interest which can be measured with some error. A
mathematical (statistical) model for this experiment is specified by the following equation
(the model equation) Y = µ+ ε, where Y is the available measurement (observation) and
ε is a random error modelled as a random variable with zero mean, say, with normal
distribution with variance σ², i.e. ε ∼ N(0, σ²). By properties of the normal distribution,
we have that Y ∼ N(µ, σ²). Suppose that we have n measurements $Y_i = \mu + \varepsilon_i$, where
$\varepsilon_i \sim N(0, \sigma^2)$ (i = 1, ..., n) are independently distributed. It follows that $Y_1, \dots, Y_n$ are
then also independent random variables with $Y_i \sim N(\mu, \sigma^2)$. In other words, $Y_1, \dots, Y_n$ is
a random sample from a normally distributed population with mean µ and variance σ²,
so that the problem of estimating an unknown quantity µ is the well known (from 1st
Year statistics) problem of estimating the population mean of a normal population. The
sample mean
$$\bar{Y} = \frac{1}{n}(Y_1 + \cdots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$
is usually used as a point estimator of µ.
In MT1300 we briefly stated that there are several general methods of obtaining point
estimators. In this course we are going to use one of these methods, namely, the method
of the least squares (LS). To demonstrate the main idea of this method, let us consider
the case of the model of measurements. Given observations $Y_1, \dots, Y_n$, define the following
function:
$$S(\mu) = \sum_{i=1}^{n} (Y_i - \mu)^2.$$
The value of µ that minimises S(µ) is called the least squares estimator of µ. We can find
the point of minimum of S(µ) by equating to zero the first derivative of S(µ):
$$S'(\mu) = -2\sum_{i=1}^{n} (Y_i - \mu) = -2\left(\sum_{i=1}^{n} Y_i - n\mu\right) = 0.$$
Now it is easy to see that the solution of the above equation is $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, the sample
mean, and this is the point of minimum, as the second derivative of S(µ) at $\mu = \bar{Y}$ is
2n > 0. There is also a direct way to see that the sample mean is the point of minimum,
and, hence, the least squares estimator of µ. Indeed,
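$$\begin{aligned}
S(\mu) &= \sum_{i=1}^{n} (Y_i^2 - 2Y_i\mu + \mu^2) = \sum_{i=1}^{n} Y_i^2 - 2n\mu\bar{Y} + n\mu^2 \\
&= -2n\mu\bar{Y} + n\mu^2 + n\bar{Y}^2 + \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \\
&= n(\mu - \bar{Y})^2 + \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \ \ge\ \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2,
\end{aligned}$$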
where the inequality becomes an equality if and only if $\mu = \bar{Y}$. Note finally that
$$S(\bar{Y}) = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = (n-1)s^2,$$
where s² is the sample variance, which is the point estimator of another model parameter,
σ² (see the next section).
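The least squares argument can be checked numerically. The following Python sketch, using made-up measurements (the values in `ys` are hypothetical), evaluates S(µ) on a grid around the sample mean, checks the identity S(µ) = n(µ − Ȳ)² + S(Ȳ) at an arbitrary µ, and compares S(Ȳ) with (n − 1)s².

```python
from statistics import mean, variance

def sum_of_squares(ys, mu):
    """S(mu) = sum over i of (Y_i - mu)^2."""
    return sum((y - mu) ** 2 for y in ys)

# Hypothetical repeated measurements of one unknown quantity.
ys = [9.8, 10.4, 10.1, 9.7, 10.5, 10.1]
n = len(ys)
y_bar = mean(ys)

# Evaluate S on a grid centred at the sample mean; the grid point k = 0 is y_bar itself.
grid = [y_bar + k * 0.01 for k in range(-100, 101)]
best_on_grid = min(grid, key=lambda mu: sum_of_squares(ys, mu))

# The identity S(mu) = n (mu - y_bar)^2 + S(y_bar), checked at an arbitrary mu.
mu = 9.0
lhs = sum_of_squares(ys, mu)
rhs = n * (mu - y_bar) ** 2 + sum_of_squares(ys, y_bar)

print(y_bar, best_on_grid)                                   # the same value
print(lhs, rhs)                                              # equal up to rounding
print(sum_of_squares(ys, y_bar), (n - 1) * variance(ys))     # equal up to rounding
```

Here `statistics.variance` is the sample variance with divisor n − 1, which is the s² used in these notes.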
1.3 Parametric statistical inference, brief revision of Year 1 background
Krzanowski: Chapter 2
Kleinbaum et al: Chapter 3
Frees: Chapter 2
Mendenhall et al: Chapter 1
Newbold, Chapter 9
The process of making statements about population characteristics/parameters given
only information from samples is known as parametric statistical inference.
Example 1 A mechanical jar filler for filling jars with coffee does not fill every jar with
the same quantity. The weight of coffee Y filled in a jar is a random variable which can be
assumed to be normally distributed with mean value µ and variance σ2 (Y ∼ N (µ, σ2)).
Suppose that we have a sample of n independent measurements on Y and wish to
“identify” the parameters of the population (µ, σ2).
The sort of statements that we wish to make about parameters will often fall into one
of the following three categories:
• Point estimation;
• Interval estimation;
• Hypotheses testing.
1.3.1 Point estimation
Point estimation is the aspect of statistical inference in which we wish to find “the
best guess” of the true value of a population parameter.
Suppose that Y1, Y2, · · · , Yn is a random sample from the population of interest.
Then an estimator of an unknown parameter θ is some function of the observations
Y1, Y2, · · · , Yn, that is
θˆ = θˆ(Y1, Y2, · · · , Yn)
(which is in some sense a “good approximation” to the unknown parameter θ).
Example 1 (continued) A point estimator of µ in a N(µ, σ²) population is provided by
the sample mean $\bar{Y}$, which is defined by
$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}(Y_1 + Y_2 + \cdots + Y_n).$$
To estimate σ² in a N(µ, σ²) population we generally use as its estimator the sample variance
s² defined by
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$
It is easy to see that $s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right)$.
Properties of Estimators
Let θˆ = θˆ(Y1, Y2, · · · , Yn) be an estimator of an unknown parameter θ. To clarify in
what sense θˆ is a “good approximation” to θ we consider estimators which are (1) unbiased
and (2) mean square consistent.
(1) θˆ is said to be an unbiased estimator of θ if E(θˆ) = θ.
Example 1 (continued) Y¯ is an unbiased estimator of µ and s2 is an unbiased estimator
of σ2.
To check whether we have a sensible estimator we need to ensure that θˆ is increasingly
likely to yield the right answer θ as the sample size n gets bigger. The mean square error
(MSE) of θˆ is defined to be E[(θˆ − θ)²]. Since the MSE of θˆ is the average squared distance
of θˆ from the true value θ, a good estimator is one with a small MSE.
(2) θˆ is said to be a mean square consistent estimator of θ if
MSE(θˆ)→ 0 as n→∞.
Note that if θˆ is unbiased then MSE(θˆ) = Var(θˆ), so θˆ is also mean square consistent
provided that Var(θˆ) → 0 as n → ∞.
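A small Monte Carlo experiment can make unbiasedness and mean square consistency concrete. The Python sketch below repeatedly draws normal samples and averages the resulting sample means and sample variances; it also estimates the MSE of the sample mean for increasing n, to be compared with its theoretical value σ²/n. The values µ = 484 and σ = 1.8, the number of replications and the seed are arbitrary choices for illustration only.

```python
import random

def sample_mean(ys):
    return sum(ys) / len(ys)

def sample_variance(ys):
    y_bar = sample_mean(ys)
    return sum((y - y_bar) ** 2 for y in ys) / (len(ys) - 1)

def simulate(n, reps, mu=484.0, sigma=1.8, seed=0):
    """Monte Carlo estimates of E(Ybar), E(s^2) and MSE(Ybar) for samples of size n."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sample_mean(sample))
        variances.append(sample_variance(sample))
    mse = sample_mean([(m - mu) ** 2 for m in means])
    return sample_mean(means), sample_mean(variances), mse

results = {n: simulate(n, reps=4000) for n in (10, 40, 160)}
for n, (avg_mean, avg_var, mse) in results.items():
    # Last column: theoretical MSE of the sample mean, sigma^2 / n.
    print(n, round(avg_mean, 3), round(avg_var, 3), round(mse, 4), round(1.8 ** 2 / n, 4))
```

The averages of the sample means and sample variances should sit close to µ and σ² for every n, while the estimated MSE of the sample mean shrinks roughly in proportion to 1/n.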
1.3.2 Interval estimation
Point estimation is often not sufficiently informative as it does not say anything about
the error of the estimation procedure. Naturally, if the error is large, then we are less
confident in our estimate. Replacing a point estimator by an interval estimator allows
us to quantify the uncertainty of estimation by specifying a desirable level of confidence,
which is the probability of the interval capturing the true value of the parameter. Such
interval estimators are known as confidence intervals (C.I.).
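Example 1 (continued) To construct a confidence interval for µ we recall that $\bar{Y}$ is a linear
combination of independent N(µ, σ²) random variables ($\bar{Y} = \sum_{i=1}^{n} Y_i/n$) and therefore is
normally distributed with mean µ (unbiased) and variance σ²/n, that is, $\bar{Y} \sim N(\mu, \sigma^2/n)$.
It therefore follows that if σ² is known, then
$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$
and so
$$P\left(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha,$$
where $z_{\alpha/2}$ is the upper α/2 point of the standard normal distribution, $P(Z > z_{\alpha/2}) = \alpha/2$.
That is, the (1 − α)100% confidence interval for µ is given by
$$\left(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n}\right).$$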
If σ2 is unknown, then we construct our CI based on the following T -variable
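$$T = \frac{\bar{Y} - \mu}{s/\sqrt{n}} \sim t_{n-1},$$
where $t_{n-1}$ is the t-distribution with n − 1 degrees of freedom, and
$$P\left(\bar{Y} - t_{n-1,\alpha/2}\,s/\sqrt{n} \le \mu \le \bar{Y} + t_{n-1,\alpha/2}\,s/\sqrt{n}\right) = 1 - \alpha,$$
so that the (1 − α)100% confidence interval for µ is given by
$$\left(\bar{Y} - t_{n-1,\alpha/2}\,s/\sqrt{n},\ \bar{Y} + t_{n-1,\alpha/2}\,s/\sqrt{n}\right).$$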
Note that while both intervals are centered at $\bar{Y}$, the margin of error $t_{n-1,\alpha/2}\,s/\sqrt{n}$
is a random variable, unlike the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$ in the normal CI, which is a
non-random quantity (i.e. does not depend on the sample).
Example 1 (continued) Jars of coffee are labeled as 484 grams in weight. A random
sample of ten jars from a production line are opened and weighed accurately. The ten
weights are found to be as follows:
483.7 485.6 486.2 486.0 488.1 480.3 485.4 485.2 483.7 483.3
It is assumed that weights of coffee are normally distributed with mean µ grams and
standard deviation σ.
Find the 95% CI for the true population mean of jar weights.
Using the information provided we find $\bar{y} = \frac{1}{10}(y_1 + \cdots + y_{10}) = 484.75$ and
$s^2 = \frac{1}{9}\left(\sum_{i=1}^{10} y_i^2 - 10\bar{y}^2\right) = 3.24$ (grams squared). Therefore, the 95% CI for the population mean is
$$484.75 \pm 2.262 \times \frac{1.8}{\sqrt{10}} = (483.462, 486.038),$$
where the critical value $t_{9,0.025} = 2.262$ is found in the
Tables (or using software).
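The interval arithmetic can be reproduced in a few lines of Python. This sketch takes the summary values quoted in the example (ȳ = 484.75, s² = 3.24, n = 10) and the tabulated critical value 2.262 as given rather than recomputing them; recomputing s² from the ten weights as printed gives a larger value (about 4.48), so the printed data and the quoted summary may not correspond exactly. For comparison it also computes the normal-theory interval that would apply if σ = 1.8 were known; the helper name `mean_confidence_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(y_bar, scale, n, crit):
    """Interval y_bar +/- crit * scale / sqrt(n).

    scale is sigma (known variance, crit = z_{alpha/2}) or s (unknown variance,
    crit = t_{n-1, alpha/2}).
    """
    margin = crit * scale / sqrt(n)
    return y_bar - margin, y_bar + margin

n = 10
y_bar = 484.75          # sample mean quoted in the example
s = sqrt(3.24)          # sample standard deviation from the quoted s^2 = 3.24
t_crit = 2.262          # t_{9, 0.025}, read from tables as in the example

ci_t = mean_confidence_interval(y_bar, s, n, t_crit)

# Normal-theory interval, assuming sigma = 1.8 is known exactly.
z_crit = NormalDist().inv_cdf(0.975)    # z_{0.025}, about 1.96
ci_z = mean_confidence_interval(y_bar, 1.8, n, z_crit)

print(f"t interval: ({ci_t[0]:.3f}, {ci_t[1]:.3f})")
print(f"z interval: ({ci_z[0]:.3f}, {ci_z[1]:.3f})")
```

The t interval is the wider of the two because the critical value 2.262 exceeds the normal value of about 1.96, reflecting the extra uncertainty from estimating σ.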
1.3.3 Hypotheses testing
Often an investigator has a theory about the phenomenon under study, and wishes
to see whether this theory is confirmed by the data that have been collected. The
null hypothesis H0 is, usually, what we are prepared to “go along with” until we obtain
convincing evidence in favour of the alternative hypothesis H1. To conduct a hypothesis
test we need to complete the following steps.
(1) Specify the null and alternative hypotheses.
(2) Choose a test statistic T which is such that
◦ T behaves differently under the null and alternative hypotheses;
◦ the sampling distribution of T is fully specified when H0 is true.
(3) Formulate some decision rule based on the statistic T .
Whatever decision rule is adopted, there is some chance of reaching an erroneous
conclusion about the population parameter of interest. One error that could be made,
called a Type I error, is the rejection of a true null hypothesis. If the decision rule is
such that the probability of rejecting a true null hypothesis is α, then α is said to
be the significance level of the test. The other possible error, called a Type II error, arises
when a false null hypothesis is accepted. Suppose that for a particular decision rule the
probability of making such an error is β. Then, the probability of rejecting a false null
hypothesis is (1− β), which is called the power of the test.
              H0 true                   H0 false
ACCEPT H0     Correct decision          Type II error
              Probability = 1 − α       Probability = β
REJECT H0     Type I error              Correct decision
              Probability = α           Probability = 1 − β
              (significance level)      (power)
Ideally we would like to have the probabilities of both types of error as small as possible.
However, in general, once a sample has been taken, any adjustment to the decision rule to
reduce the probability α of type I error automatically increases the probability β of type
II error. The only way of simultaneously lowering both α and β would be to obtain more
information about the population, e.g., by taking a larger sample. In practice we usually
specify the significance level α (the probability of a Type I error) to have a small value such as
0.10, 0.05, 0.025, or 0.01. This then determines the probability of a Type II error β (if there is a choice of
tests then we prefer the one with the smallest β, that is, with the highest power (1 − β)).
For a given significance level, the larger the sample size, the higher the power
of the test.
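The trade-off between α, β and n can be illustrated for the one-sided test of a normal mean with known variance (the Z test described later in this section). Under a true mean µ1, the statistic Z is normal with mean (µ1 − µ0)√n/σ and variance 1, so the power is 1 − Φ(c_α − (µ1 − µ0)√n/σ), where Φ is the standard normal distribution function. The Python sketch below evaluates this; the function name and the numerical values (µ0 = 484, µ1 = 485, σ = 1.8) are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the test of H0: mu = mu0 against H1: mu > mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    c_alpha = std_normal.inv_cdf(1 - alpha)     # P(Z > c_alpha) = alpha under H0
    shift = (mu1 - mu0) * sqrt(n) / sigma       # mean of Z when the true mean is mu1
    return 1 - std_normal.cdf(c_alpha - shift)

# Power grows with the sample size at a fixed significance level.
powers_by_n = {n: power_one_sided_z(484.0, 485.0, 1.8, n, 0.05) for n in (5, 10, 20, 40)}
for n, power in powers_by_n.items():
    print(n, round(power, 3))

# At a fixed sample size, a smaller alpha means a larger beta (lower power).
power_05 = power_one_sided_z(484.0, 485.0, 1.8, 10, 0.05)
power_01 = power_one_sided_z(484.0, 485.0, 1.8, 10, 0.01)
print(round(power_05, 3), round(power_01, 3))
```

When µ1 = µ0 the same formula returns α, which is a quick sanity check that the rejection probability under H0 equals the significance level.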
Example 1 (continued) Would you say that the jars are labeled correctly?
So, the statistical model is already specified. We have a random sample of n obser-
vations Y1, Y2, . . . , Yn with Yi ∼ N (µ, σ2). The objective is to test hypotheses about the
unknown population mean.
Consider the problem of testing the simple null hypothesis that the population mean
is equal to some specified value µ0
H0 : µ = µ0
against one of the following three alternative hypotheses
(i) H1 : µ > µ0, (ii) H1 : µ < µ0, (iii) H1 : µ ≠ µ0.
Test of the mean of a normal distribution:
Population variance known
Assume first that population variance is known. For all three cases, when the null
hypothesis is true we have
$$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$
If H1 is true then in case (i) the r.v. Z will tend to be larger (for (ii) Z will tend to be
smaller and for (iii) the absolute value of Z will tend to be larger) than would be expected
for a standard normal random variable. Let us denote by cα the number for which
P{Z > cα} = α
where Z ∼ N (0, 1). Then a test with significance level α (type I error) is obtained from
the decision rule:
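(i) For H1 : µ > µ0,
Reject H0 if $\frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > c_\alpha$.
(ii) For H1 : µ < µ0,
Reject H0 if $\frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} < -c_\alpha$.
(iii) For H1 : µ ≠ µ0,
Reject H0 if $\left|\frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}\right| > c_{\alpha/2}$.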
Example 1 (continued) Assume that the standard deviation is given as σ = 1.8 gram.
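The decision rule for known variance can be sketched in Python as follows. The inputs are taken from the coffee example (ȳ = 484.75, n = 10, labelled weight µ0 = 484, σ = 1.8); the significance level α = 0.05 and the function name `z_test` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test(y_bar, mu0, sigma, n, alpha, alternative):
    """Test H0: mu = mu0 with known sigma; returns (z_obs, reject).

    alternative: "greater" (mu > mu0), "less" (mu < mu0) or "two-sided" (mu != mu0).
    """
    z_obs = (y_bar - mu0) / (sigma / sqrt(n))

    def c(a):
        # Upper-tail critical value: P(Z > c(a)) = a for Z ~ N(0, 1).
        return NormalDist().inv_cdf(1 - a)

    if alternative == "greater":
        reject = z_obs > c(alpha)
    elif alternative == "less":
        reject = z_obs < -c(alpha)
    elif alternative == "two-sided":
        reject = abs(z_obs) > c(alpha / 2)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z_obs, reject

z_obs, reject_two_sided = z_test(484.75, 484.0, 1.8, 10, 0.05, "two-sided")
_, reject_greater = z_test(484.75, 484.0, 1.8, 10, 0.05, "greater")
print(round(z_obs, 3), reject_two_sided, reject_greater)
```

With these inputs the observed statistic is about 1.318, which is below both the two-sided critical value (about 1.96) and the one-sided one (about 1.645), so H0 is not rejected at the 5% level under either alternative.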
Test of the mean of a normal distribution:
Population variance unknown
Suppose now that the population variance is no longer assumed known. If the sample
size is not large, the procedures discussed above are no longer appropriate.
To perform a testing procedure we replace σ² by its estimator, the sample variance s²:
$$T = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}.$$
Now, if the null hypothesis is true then the r.v. T follows a Student’s t distribution with
(n − 1) degrees of freedom ($t_{n-1}$). We can therefore use precisely the same arguments as
above, with the Student’s t distribution now playing the role of the standard normal
distribution. Let us denote by $c_\alpha$ the number for which
$P\{T > c_\alpha\} = \alpha$ where $T \sim t_{n-1}$
($c_\alpha$ is the (1 − α)th quantile, $t_{n-1}(1-\alpha)$, of the $t_{n-1}$ distribution). Then a test with significance
level α (the probability of a Type I error) is obtained from the decision rule:
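(i) For H1 : µ > µ0,
Reject H0 if $\frac{\bar{y} - \mu_0}{s/\sqrt{n}} > c_\alpha$.
(ii) For H1 : µ < µ0,
Reject H0 if $\frac{\bar{y} - \mu_0}{s/\sqrt{n}} < -c_\alpha$.
(iii) For H1 : µ ≠ µ0,
Reject H0 if $\left|\frac{\bar{y} - \mu_0}{s/\sqrt{n}}\right| > c_{\alpha/2}$.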
Example 1 (continued) Assume that the standard deviation is unknown.
Test of the mean of a normal distribution:
Large sample sizes
Suppose that we have a random sample of n observations from a population with mean
µ and variance σ2. If the sample size n is large (n ≥ 30), the test procedures developed
for the case where the population variance is known can be employed when it is unknown,
replacing σ2 by the observed sample variance s2. Moreover, these procedures remain
approximately valid even if the population distribution is not normal.
The smallest significance level at which a null hypothesis can be rejected is called the
probability value or p-value of the test on the given sample.
The p-value gives the probability of observing a value as extreme as the one we have
got or even more extreme, when the null hypothesis is true. Suppose that the data produce
Tobs as the value of the test statistic T . Then we assume that H0 is true and calculate
the probability p of observing a value of T that is as extreme as Tobs or more extreme
than Tobs, where ‘extreme’ is determined by the direction of departure of H1 from H0. For
example, in the above procedures, if we test H0 : µ = µ0 against H1 : µ > µ0 then the
values of T more extreme than Tobs in the direction of departure of H1 would be values
such that T > Tobs. On the other hand, if H1 : µ ≠ µ0, then “more extreme” would be
either T > |Tobs| or T < −|Tobs|.
Example 1 (continued) Find the p-value of the test if σ2 is assumed to be known.
In general, to draw conclusions about a test on the basis of the p-value, the following
guidelines are recommended:
1. If p is small (less than 0.01), reject H0.
2. If p is large (greater than 0.10), do not reject H0.
3. If 0.01 < p < 0.10, the significance is borderline: that is, we reject H0 for α = 0.10
but do not reject H0 for α = 0.01.
Note that if we actually do specify α a priori, we reject H0 if p < α.
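For the case where σ² is assumed known, the p-value follows directly from the standard normal distribution function. The Python sketch below uses the coffee example values (ȳ = 484.75, µ0 = 484, σ = 1.8, n = 10); the function name `z_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_p_value(z_obs, alternative):
    """p-value of the Z test for the given direction of the alternative hypothesis."""
    phi = NormalDist().cdf
    if alternative == "greater":        # H1: mu > mu0, extreme means Z > z_obs
        return 1 - phi(z_obs)
    if alternative == "less":           # H1: mu < mu0, extreme means Z < z_obs
        return phi(z_obs)
    if alternative == "two-sided":      # H1: mu != mu0, extreme means |Z| > |z_obs|
        return 2 * (1 - phi(abs(z_obs)))
    raise ValueError(f"unknown alternative: {alternative!r}")

z_obs = (484.75 - 484.0) / (1.8 / sqrt(10))
p_two_sided = z_p_value(z_obs, "two-sided")
p_greater = z_p_value(z_obs, "greater")
print(round(z_obs, 3), round(p_two_sided, 3), round(p_greater, 3))
```

The two-sided p-value here is roughly 0.19, which is above 0.10, so by the guidelines above H0 would not be rejected; the one-sided p-value is exactly half of it.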
In this example, the obvious choices of the null and alternative hypotheses are H0 :
µ = 484, H1 : µ ≠ 484. Significance level 0.05 (that is, 5%) is specified. The standard
deviation σ is unknown, but from the information provided we can estimate it by the
sample standard deviation $s = \sqrt{s^2} = 1.8$ grams (and s² could be calculated from the
statistics $y_1 + \cdots + y_{10} = 4847.5$ and $y_1^2 + \cdots + y_{10}^2 = 2349850$).
The test statistic is the t-statistic $T = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}} \sim t_9$, which has the t-distribution with
9 = 10 − 1 degrees of freedom. Recall from the above that $\bar{y} = 484.75$ and $s^2 = 3.24$.
Therefore, for the sample we obtain
$$T_{obs} = \frac{484.75 - 484}{1.8/\sqrt{10}} = 1.318.$$
Decision with the critical values (acceptance/rejection regions).
From the tables of the t-distribution we find that $t_{9,0.025} = 2.262$. So, the corresponding
rejection region (or critical region) is (−∞, −2.262) ∪ (2.262, ∞). Then, since −2.262 <
1.318 < 2.262, we say that at the 5% significance level the data do not provide enough
evidence for rejection of the null hypothesis.
Decision with the p-value. H1 is two-sided, so that p-value = 2(1 − P(T ≤ |Tobs|)) =
2(1 − P(T ≤ 1.318)).
P(T ≤ 1.318) is not explicitly given in the Tables but can be bracketed by the
closest available values P(T ≤ 1.3) ≤ P(T ≤ 1.318) ≤ P(T ≤ 1.4), that is, 0.887 ≤
P(T ≤ 1.318) ≤ 0.9025, which gives a p-value of at least 2(1 − 0.9025) = 0.195. This is higher than
0.05, hence we do not reject H0.
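Where tables only bracket the p-value, software can compute it. The Python sketch below avoids any external library by integrating the Student's t density numerically with Simpson's rule; this numerical approach, the function names and the number of integration steps are choices made for this illustration, and the summary values ȳ = 484.75 and s = 1.8 are taken from the example as given.

```python
import math

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via composite Simpson's rule on [0, |x|] plus symmetry (steps must be even)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, y_bar, s, mu0 = 10, 484.75, 1.8, 484.0
t_obs = (y_bar - mu0) / (s / math.sqrt(n))

t_crit = 2.262                                  # t_{9, 0.025} from the tables
reject = abs(t_obs) > t_crit                    # decision by the critical region
p_value = 2 * (1 - t_cdf(abs(t_obs), n - 1))    # decision by the p-value

print(round(t_obs, 3), reject, round(p_value, 3))
print(round(t_cdf(t_crit, n - 1), 4))           # close to 0.975, a check on the integration
```

The computed p-value falls inside the range implied by the tabulated bounds above (between 0.195 and 0.226), and both routes lead to the same decision not to reject H0 at the 5% level.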
Test for the difference between two means: Matched pairs
Consider a different testing situation in which there are n experimental units, each of
which generates a pair of observations as a result of some treatment. Thus there is a set
of n values before the application of the treatment and then a second set of n values after
the application of the treatment, i.e.
Experimental unit    1      2      3      ...    n
Before treatment     y11    y12    y13    ...    y1n
After treatment      y21    y22    y23    ...    y2n
Differences          d1     d2     d3     ...    dn
The single sample $d_1, d_2, \dots, d_n$ is formed from the differences of the samples, i.e. $d_i =
y_{1i} - y_{2i}$. The objective is to test whether the ‘before’ and ‘after’ populations are the
same. Assume that $d_1, d_2, \dots, d_n$ come from a N(µ, σ²) population; then the procedures
developed for the one-sample test can be employed to investigate the null hypothesis
H0 : µ = 0, where $\mu = E[Y_1 - Y_2]$ is the population mean difference of scores before and
after the treatment. The three alternative hypotheses are µ ≠ 0 or µ < 0 or µ > 0.
Example 2 A group of 12 subjects were given a series of tests to assess their memory,
concentration and capacity to undertake simple arithmetic and logic computations. Their
scores were recorded as Score 1. The same subjects again completed an equivalent series
of tests when they were in the fifth week of a slimming diet and their scores were recorded
as Score 2. The results are given in the table below.
Subject   1     2     3     4     5     6     7     8     9     10    11    12
Score 1   60.2  70.7  39.5  40.3  22.5  53.8  62.5  57.1  54.0  63.9  59.1  67.0
Score 2   51.6  63.9  43.3  41.2  20.9  47.3  53.6  60.2  44.3  56.7  47.2  72.3
Do the data support the suggestion that dieting reduces mental effectiveness (during the
period of dieting)?
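One way to approach this question with the matched-pairs procedure is sketched below in Python. It forms the differences d = Score 1 − Score 2, so that a positive mean difference corresponds to lower scores while dieting, and tests H0 : µ = 0 against H1 : µ > 0. The significance level α = 0.05 and the critical value 1.796 (the upper 5% point of the t-distribution with 11 degrees of freedom, from standard tables) are assumptions of this illustration, since the example does not specify a level.

```python
from math import sqrt
from statistics import mean, stdev

score1 = [60.2, 70.7, 39.5, 40.3, 22.5, 53.8, 62.5, 57.1, 54.0, 63.9, 59.1, 67.0]
score2 = [51.6, 63.9, 43.3, 41.2, 20.9, 47.3, 53.6, 60.2, 44.3, 56.7, 47.2, 72.3]

# Differences before minus during dieting, one per subject.
d = [a - b for a, b in zip(score1, score2)]
n = len(d)

d_bar = mean(d)     # sample mean of the differences
s_d = stdev(d)      # sample standard deviation (divisor n - 1)
t_obs = d_bar / (s_d / sqrt(n))

t_crit = 1.796      # assumed: upper 5% point of t_11 from standard tables
reject = t_obs > t_crit

print(round(d_bar, 3), round(s_d, 3), round(t_obs, 3), reject)
```

With these data the observed statistic is about 2.33, which exceeds 1.796, so under the stated assumptions the one-sided test rejects H0 at the 5% level, in line with the suggestion that scores are lower during dieting.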