Date: 2021-02-25

MT2300

2020/2021

1. Introduction and Preliminaries

Reading

Faraway: Chapter 1 Krzanowski: Chapter 1

Kleinbaum et al: Chapter 4

Frees: Sections 1.1–1.3

Mendenhall et al: Chapter 2

1.1 Motivation

Terminology

We deal with measurements of several variables for each of n experimental units or

individuals. The variables are of two types (though the distinction between them is not

always rigid in applications): those of primary interest to the investigator and those

which might provide supplementary or background information. The variables of the

former type are called response, outcome or dependent variables, while those of the latter

type are called explanatory, independent or predictor variables. Econometricians also

use the terms endogenous and exogenous to distinguish the two types of variables. The

explanatory variables are used to predict or to understand the response variables.

Relation between variables, models

We distinguish between a functional relation and a statistical relation. The functional

relation between the independent variable $X$ and the dependent variable $Y$ is often expressed as a mathematical formula

$$Y = f(X)$$

and the main feature of this relation is that the observations $(x_i, y_i)$ $(i = 1, \dots, n)$ fall directly on the “curve” of the relationship, that is, on the curve $y = f(x)$.

A statistical relation, unlike a functional relation, is not a “perfect” one. Very often

explanatory variables are thought of as fixed, and response variables are thought of as

random variables with a distribution depending on the explanatory variables. Therefore,

for each value of an explanatory variable $x$ the response $Y$ may be supposed to be a random variable with expectation (mean value) $f(x) = E(Y \mid X = x)$, or $E(Y \mid x)$ for short. Then a statistician may wish to determine the function $f$ using sample data consisting of pairs $(x_i, y_i)$ $(i = 1, \dots, n)$. The function $f$ is called the regression function for regressing $Y$ on $X$, and $X$ is called the regressor.

The regression function f(x) represents the systematic component of the model. The

systematic component of the model is concerned with overall population features such as

expected values. To emphasize the existence of the random component of the model, the

response is often written in the form

$$Y = f(x) + \varepsilon$$

where $f$ is the regression function, that is, the systematic component, and $\varepsilon$ is the random component. In most applications $\varepsilon$ is assumed to be a normal random variable with mean zero and variance $\sigma^2$, that is, $\varepsilon \sim N(0, \sigma^2)$.


Linear statistical model

The systematic component f is often expressed in terms of explanatory variables

through a parametric equation. If, for example, it is supposed that

$$f(x) = A + Bx + Cx^2$$

or

$$f(x) = A\,2^x + B$$

or

$$f(x) = A \log x + B,$$

then the problem is reduced to one of identifying a few parameters, here labeled $A$, $B$, $C$. In each of the three forms given above, $f$ is linear in these parameters.

For example, $A\,2^x + B$ can be written as $f(x, \boldsymbol{\beta}) = g(x)^T \boldsymbol{\beta}$, where $g(x)^T = (1, 2^x)$ is known as the transformed input and $\boldsymbol{\beta}^T = (B, A)$ is the vector of model parameters. Similarly, $A \log x + B$ can be written as $g(x)^T \boldsymbol{\beta}$, where the transformed input is $g(x)^T = (1, \log x)$ and the vector of model parameters is still $\boldsymbol{\beta}^T = (B, A)$.

It is the linearity in the parameters that makes the model a linear statistical model!
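The following short Python sketch illustrates what "linear in the parameters" means for the exponential and logarithmic forms above. The function names and the parameter values are hypothetical, chosen only for illustration; the check is simply that evaluating the model at a sum of two parameter vectors gives the sum of the two evaluations, even though the dependence on x is not linear.

```python
import math

def g_exponential(x):
    """Transformed input g(x) = (1, 2**x) for the model f(x) = B + A * 2**x."""
    return (1.0, 2.0 ** x)

def g_logarithmic(x):
    """Transformed input g(x) = (1, log x) for the model f(x) = B + A * log x."""
    return (1.0, math.log(x))

def f(x, beta, g):
    """Evaluate f(x, beta) = g(x)^T beta, with beta = (B, A)."""
    return sum(g_j * beta_j for g_j, beta_j in zip(g(x), beta))

# Hypothetical parameter vector (B, A) and input value.
beta = (1.5, 0.5)
x = 3.0
value_exp = f(x, beta, g_exponential)   # 1.5 + 0.5 * 2**3
value_log = f(x, beta, g_logarithmic)   # 1.5 + 0.5 * log(3)

# Linearity in the parameters: f(x, b1 + b2) equals f(x, b1) + f(x, b2).
beta1 = (1.0, 2.0)
beta2 = (-0.5, 0.25)
beta_sum = tuple(b1 + b2 for b1, b2 in zip(beta1, beta2))
lhs = f(x, beta_sum, g_exponential)
rhs = f(x, beta1, g_exponential) + f(x, beta2, g_exponential)
print(value_exp, value_log, lhs, rhs)
```

Only the transformed input changes between the two models; the way the parameters enter is identical.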

1.2 The model of measurements, revision of Year 1 Statistics.

Let $\mu$ be an unknown quantity of interest which can be measured with some error. A mathematical (statistical) model for this experiment is specified by the model equation $Y = \mu + \varepsilon$, where $Y$ is the available measurement (observation) and $\varepsilon$ is a random error modelled as a random variable with zero mean, say, normally distributed with variance $\sigma^2$, i.e. $\varepsilon \sim N(0, \sigma^2)$. By properties of the normal distribution, we have $Y \sim N(\mu, \sigma^2)$. Suppose that we have $n$ measurements $Y_i = \mu + \varepsilon_i$ $(i = 1, \dots, n)$, where the errors $\varepsilon_i \sim N(0, \sigma^2)$ are independent. It follows that $Y_1, \dots, Y_n$ are also independent random variables with $Y_i \sim N(\mu, \sigma^2)$. In other words, $Y_1, \dots, Y_n$ is a random sample from a normally distributed population with mean $\mu$ and variance $\sigma^2$, so that the problem of estimating the unknown quantity $\mu$ is the well-known (from Year 1 Statistics) problem of estimating the mean of a normal population. The sample mean

$$\bar{Y} = \frac{1}{n}(Y_1 + \dots + Y_n) = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

is usually used as a point estimator of $\mu$.

In MT1300 we briefly stated that there are several general methods of obtaining point estimators. In this course we are going to use one of these methods, namely the method of least squares (LS). To demonstrate the main idea of this method, let us consider the model of measurements. Given observations $Y_1, \dots, Y_n$, define the function

$$S(\mu) = \sum_{i=1}^{n} (Y_i - \mu)^2.$$

The value of $\mu$ that minimises $S(\mu)$ is called the least squares estimator of $\mu$. We can find the point of minimum of $S(\mu)$ by equating its first derivative to zero:

$$S'(\mu) = -2\sum_{i=1}^{n} (Y_i - \mu) = -2\left(\sum_{i=1}^{n} Y_i - n\mu\right) = 0.$$

It is easy to see that the solution of this equation is the sample mean $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, and this is the point of minimum, as the second derivative of $S(\mu)$ is $S''(\mu) = 2n > 0$. There is also a direct way to see that the sample mean is the point of minimum and, hence, the least squares estimator of $\mu$. Indeed,

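$$\begin{aligned}
S(\mu) &= \sum_{i=1}^{n} (Y_i^2 - 2Y_i\mu + \mu^2) = \sum_{i=1}^{n} Y_i^2 - 2n\mu\bar{Y} + n\mu^2 \\
&= -2n\mu\bar{Y} + n\mu^2 + n\bar{Y}^2 + \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \\
&= n(\mu - \bar{Y})^2 + \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \ge \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2,
\end{aligned}$$

where the inequality becomes an equality if and only if $\mu = \bar{Y}$. Note finally that

$$S(\bar{Y}) = \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 = (n - 1)s^2,$$

where $s^2$ is the sample variance, which is the point estimator of the other model parameter $\sigma^2$ (see the next section).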
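The least squares argument can be checked numerically. The sketch below uses a small set of made-up measurements (an assumption, not data from these notes), evaluates S(µ) on a grid of candidate values around the sample mean, and confirms both that the grid minimum sits at the sample mean and that S(µ) = n(µ − Ȳ)² + S(Ȳ) holds at an arbitrary µ.

```python
def sum_of_squares(mu, ys):
    """S(mu) = sum over i of (y_i - mu)^2."""
    return sum((y - mu) ** 2 for y in ys)

# Hypothetical measurements of one unknown quantity.
ys = [9.8, 10.4, 10.1, 9.6, 10.6]
n = len(ys)
y_bar = sum(ys) / n
s_min = sum_of_squares(y_bar, ys)

# Candidate values of mu on a grid centred at the sample mean (step 0.01).
grid = [y_bar + k * 0.01 for k in range(-200, 201)]
best_mu = min(grid, key=lambda mu: sum_of_squares(mu, ys))

# Check the identity S(mu) = n (mu - y_bar)^2 + S(y_bar) at an arbitrary mu.
mu = 9.0
identity_gap = sum_of_squares(mu, ys) - (n * (mu - y_bar) ** 2 + s_min)

print(y_bar, best_mu, identity_gap)
```

A grid search is of course unnecessary here, since the minimiser is known in closed form; it is used only to make the minimisation visible.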


1.3 Parametric statistical inference, brief revision of Year 1 background

Reading

Krzanowski: Chapter 2

Kleinbaum et al: Chapter 3

Frees: Chapter 2

Mendenhall et al: Chapter 1

Newbold: Chapter 9

The process of making statements about population characteristics/parameters given

only information from samples is known as parametric statistical inference.

Example 1 A mechanical jar filler for filling jars with coffee does not fill every jar with the same quantity. The weight of coffee $Y$ filled in a jar is a random variable which can be assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$, that is, $Y \sim N(\mu, \sigma^2)$.

Suppose that we have a sample of $n$ independent measurements on $Y$ and wish to “identify” the parameters $(\mu, \sigma^2)$ of the population.

The sort of statements that we wish to make about parameters will often fall into one

of the following three categories:

• Point estimation;

• Interval estimation;

• Hypotheses testing.

1.3.1 Point estimation

Point estimation is the aspect of statistical inference in which we wish to find the “best guess” of the true value of a population parameter.

Suppose that $Y_1, Y_2, \dots, Y_n$ is a random sample from the population of interest. Then an estimator of an unknown parameter $\theta$ is some function of the observations $Y_1, Y_2, \dots, Y_n$, that is,

$$\hat{\theta} = \hat{\theta}(Y_1, Y_2, \dots, Y_n)$$

(which is in some sense a “good approximation” to the unknown parameter $\theta$).

Example 1 (continued) A point estimator of $\mu$ in a $N(\mu, \sigma^2)$ population is provided by the sample mean $\bar{Y}$, which is defined by

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i = \frac{1}{n}(Y_1 + Y_2 + \dots + Y_n).$$

To estimate $\sigma^2$ in a $N(\mu, \sigma^2)$ population we generally use the sample variance $s^2$, defined by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

It is easy to see that

$$s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2\right).$$
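As a quick illustration, the two expressions for the sample variance can be compared in Python. The data values are hypothetical; `statistics.variance` from the standard library (which uses the n − 1 divisor) is included as a third reference value.

```python
import statistics

# Hypothetical sample.
ys = [4.1, 3.8, 4.4, 4.0, 3.7, 4.3]
n = len(ys)
y_bar = sum(ys) / n

# Definition: squared deviations from the sample mean, divided by n - 1.
s2_definition = sum((y - y_bar) ** 2 for y in ys) / (n - 1)

# Shortcut: (sum of squares minus n times the squared mean), divided by n - 1.
s2_shortcut = (sum(y * y for y in ys) - n * y_bar ** 2) / (n - 1)

# Library value for comparison.
s2_library = statistics.variance(ys)

print(s2_definition, s2_shortcut, s2_library)
```

The shortcut form is convenient for hand calculation from the totals of the observations and their squares, but in floating-point arithmetic it can lose accuracy when the mean is large relative to the spread, so the definition is usually the safer choice in code.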


Properties of Estimators

Let $\hat{\theta} = \hat{\theta}(Y_1, Y_2, \dots, Y_n)$ be an estimator of an unknown parameter $\theta$. To clarify in what sense $\hat{\theta}$ is a “good approximation” to $\theta$ we consider estimators which are (1) unbiased and (2) mean square consistent.

(1) $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$.

Example 1 (continued) $\bar{Y}$ is an unbiased estimator of $\mu$ and $s^2$ is an unbiased estimator of $\sigma^2$.

To check whether we have a sensible estimator we need to ensure that $\hat{\theta}$ is increasingly likely to be close to the true value $\theta$ as the sample size $n$ gets bigger. The mean square error (MSE) of $\hat{\theta}$ is defined to be $E(\hat{\theta} - \theta)^2$. Since the MSE of $\hat{\theta}$ is the average squared distance of $\hat{\theta}$ from the true value $\theta$, a good estimator is one with a small MSE.

(2) $\hat{\theta}$ is said to be a mean square consistent estimator of $\theta$ if

$$\mathrm{MSE}(\hat{\theta}) \to 0 \quad \text{as } n \to \infty.$$

Note that if $\hat{\theta}$ is unbiased, then it is mean square consistent provided $\mathrm{Var}(\hat{\theta}) \to 0$ as $n \to \infty$.
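The last remark follows from the standard decomposition of the mean square error into variance plus squared bias. A short derivation is sketched below, followed by its application to the sample mean of Example 1; it uses only the definitions given above and the fact, stated in the next subsection, that the sample mean of a normal sample has variance σ²/n.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
  &= E\bigl[(\hat{\theta} - E\hat{\theta}) + (E\hat{\theta} - \theta)\bigr]^2 \\
  &= E(\hat{\theta} - E\hat{\theta})^2
     + 2\,(E\hat{\theta} - \theta)\,E(\hat{\theta} - E\hat{\theta})
     + (E\hat{\theta} - \theta)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \bigl(E\hat{\theta} - \theta\bigr)^2,
  \qquad \text{since } E(\hat{\theta} - E\hat{\theta}) = 0. \\[6pt]
\text{For } \hat{\theta} = \bar{Y}:\quad
\mathrm{MSE}(\bar{Y}) &= \mathrm{Var}(\bar{Y}) + 0 = \frac{\sigma^2}{n} \to 0
  \quad \text{as } n \to \infty .
\end{aligned}
```

So for an unbiased estimator the MSE equals the variance, and the sample mean is mean square consistent for µ.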

1.3.2 Interval estimation

Point estimation is often not sufficiently informative as it does not say anything about

the error of the estimation procedure. Naturally, if the error is large, then we are less

confident in our estimate. Replacing a point estimator by an interval estimator allows

us to quantify the uncertainty of estimation by specifying a desirable level of confidence,

which is the probability of the interval capturing the true value of the parameter. Such

interval estimators are known as confidence intervals (C.I.).

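Example 1 (continued) To construct a confidence interval for $\mu$ we recall that $\bar{Y}$ is a linear combination of independent $N(\mu, \sigma^2)$ random variables ($\bar{Y} = \sum_{i=1}^{n} Y_i / n$) and therefore is normally distributed with mean $\mu$ (unbiased) and variance $\sigma^2/n$, that is, $\bar{Y} \sim N(\mu, \sigma^2/n)$. It therefore follows that if $\sigma^2$ is known, then

$$Z = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

and so

$$P\left(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha,$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution, $P(Z > z_{\alpha/2}) = \alpha/2$. That is, the $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$$\left(\bar{Y} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{Y} + z_{\alpha/2}\,\sigma/\sqrt{n}\right).$$

If $\sigma^2$ is unknown, then we construct our CI based on the following $T$-variable

$$T = \frac{\bar{Y} - \mu}{s/\sqrt{n}} \sim t_{n-1},$$

where $t_{n-1}$ is the t-distribution with $n-1$ degrees of freedom, and

$$P\left(\bar{Y} - t_{n-1,\alpha/2}\,s/\sqrt{n} \le \mu \le \bar{Y} + t_{n-1,\alpha/2}\,s/\sqrt{n}\right) = 1 - \alpha,$$

so that the $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$$\left(\bar{Y} - t_{n-1,\alpha/2}\,s/\sqrt{n},\ \bar{Y} + t_{n-1,\alpha/2}\,s/\sqrt{n}\right).$$

Note that while both intervals are centered at $\bar{Y}$, the margin of error $t_{n-1,\alpha/2}\,s/\sqrt{n}$ is a random variable, unlike the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$ in the normal CI, which is a non-random quantity (i.e. does not depend on the sample).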

Example 1 (continued) Jars of coffee are labeled as 484 grams in weight. A random

sample of ten jars from a production line are opened and weighed accurately. The ten

weights are found to be as follows:

483.7 485.6 486.2 486.0 488.1 480.3 485.4 485.2 483.7 483.3

It is assumed that weights of coffee are normally distributed with mean µ grams and

standard deviation σ.

Find the 95%CI for the true population mean of jar weights.

Using the information provided we find $\bar{y} = \frac{1}{10}(y_1 + \dots + y_{10}) = 484.75$ and $s^2 = \frac{1}{9}\left(\sum_{i=1}^{10} y_i^2 - 10\bar{y}^2\right) = 3.24$ (grams squared). Therefore, the 95% CI for the population mean is

$$484.75 \pm t_{9,0.025}\,\frac{s}{\sqrt{n}} = (483.462, 486.038),$$

where the critical value $t_{9,0.025} = 2.262$ is found in the Tables (or using software).
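The interval arithmetic can be reproduced with a few lines of Python. This is a minimal sketch that takes the summary values quoted in the text as given (sample mean 484.75, s = 1.8 as the square root of 3.24, n = 10, tabulated critical value 2.262) rather than recomputing them; the function name is hypothetical.

```python
import math

def t_confidence_interval(y_bar, s, n, t_crit):
    """Return (lower, upper) for y_bar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return y_bar - margin, y_bar + margin

# Summary values as quoted in the example; t_crit is the tabulated t_{9, 0.025}.
lower, upper = t_confidence_interval(y_bar=484.75, s=1.8, n=10, t_crit=2.262)
print(round(lower, 3), round(upper, 3))  # 483.462 486.038
```

The critical value is passed in explicitly because the Python standard library has no t-distribution quantile function; with a statistics package installed it could be looked up in software instead of the Tables.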

1.3.3 Hypotheses testing

Often an investigator has a theory about the phenomenon under study, and wishes

to see whether this theory is confirmed by the data that have been collected. The

null hypothesis H0 is, usually, what we are prepared to “go along with” until we obtain

convincing evidence in favour of the alternative hypothesis H1. To conduct a hypothesis

test we need to complete the following steps.

(1) Specify the null and alternative hypotheses.

(2) Choose a test statistic T which is such that

◦ T behaves differently under the null and alternative hypotheses;

◦ the sampling distribution of T is fully specified when H0 is true.

(3) Formulate some decision rule based on the statistic T .

Whatever decision rule is adopted, there is some chance of reaching an erroneous

conclusion about the population parameter of interest. One error that could be made,

called a Type I error, is the rejection of a true null hypothesis. If the decision rule is such that the probability of rejecting a true null hypothesis is $\alpha$, then $\alpha$ is said to be the significance level of the test. The other possible error, called a Type II error, arises when a false null hypothesis is accepted. Suppose that for a particular decision rule the probability of making such an error is $\beta$. Then the probability of rejecting a false null hypothesis is $(1 - \beta)$, which is called the power of the test.


| Decision | Null hypothesis TRUE | Null hypothesis FALSE |
|---|---|---|
| ACCEPT | Correct decision, probability = $1 - \alpha$ | Type II error, probability = $\beta$ |
| REJECT | Type I error, probability = $\alpha$ (significance level) | Correct decision, probability = $1 - \beta$ (power) |

Ideally we would like to have the probabilities of both types of error as small as possible.

However, in general, once a sample has been taken, any adjustment to the decision rule to

reduce the probability α of type I error automatically increases the probability β of type

II error. The only way of simultaneously lowering both α and β would be to obtain more

information about the population, e.g., by taking a larger sample. In practice we usually specify the significance level $\alpha$ (the probability of a Type I error) to have a small value such as 0.10, 0.05, 0.025 or 0.01. This then determines the probability $\beta$ of a Type II error (if there is a choice of tests then we prefer the one with the smallest $\beta$, that is, with the highest power $(1 - \beta)$). For a given significance level, the larger the sample size, the higher the power of the test.
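To make the link between sample size and power concrete, here is a hedged sketch for one particular test: a one-sided test of a normal mean with known standard deviation, which rejects when the standardised sample mean exceeds the upper α point of N(0, 1). The true mean of 485 grams is a hypothetical value chosen for illustration; 484 and 1.8 echo the coffee example, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha):
    """Power of the test rejecting H0: mu = mu0 (against mu > mu0) when
    Z = (y_bar - mu0) / (sigma / sqrt(n)) exceeds c_alpha, if the true mean is mu1."""
    std_normal = NormalDist()
    c_alpha = std_normal.inv_cdf(1 - alpha)      # upper alpha point of N(0, 1)
    shift = (mu1 - mu0) * sqrt(n) / sigma        # mean of Z when the true mean is mu1
    return 1 - std_normal.cdf(c_alpha - shift)

# Power at a hypothetical true mean of 485 for several sample sizes.
powers = {n: power_one_sided_z(484.0, 485.0, 1.8, n, 0.05) for n in (10, 20, 40)}

# When the true mean equals mu0, the rejection probability is alpha itself.
size = power_one_sided_z(484.0, 484.0, 1.8, 10, 0.05)

for n, p in powers.items():
    print(n, round(p, 3))
print(round(size, 3))
```

With α held fixed, the printed powers increase with n, which is the behaviour described in the paragraph above.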

Example 1 (continued) Would you say that the jars are labeled correctly?

The statistical model is already specified: we have a random sample of $n$ observations $Y_1, Y_2, \dots, Y_n$ with $Y_i \sim N(\mu, \sigma^2)$. The objective is to test hypotheses about the unknown population mean.

Consider the problem of testing the simple null hypothesis that the population mean is equal to some specified value $\mu_0$,

$$H_0 : \mu = \mu_0,$$

against one of the following three alternative hypotheses:

(i) $H_1 : \mu > \mu_0$, (ii) $H_1 : \mu < \mu_0$, (iii) $H_1 : \mu \ne \mu_0$.

Test of the mean of a normal distribution:

Population variance known

Assume first that the population variance is known. For all three cases, when the null hypothesis is true we have

$$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If $H_1$ is true then in case (i) the r.v. $Z$ will tend to be larger (for (ii) $Z$ will tend to be smaller and for (iii) the absolute value of $Z$ will tend to be larger) than would be expected for a standard normal random variable. Let us denote by $c_\alpha$ the number for which

$$P\{Z > c_\alpha\} = \alpha,$$

where $Z \sim N(0, 1)$. Then a test with significance level $\alpha$ (the probability of a Type I error) is obtained from the decision rule:

(i) For $H_1 : \mu > \mu_0$, reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > c_\alpha$.

(ii) For $H_1 : \mu < \mu_0$, reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} < -c_\alpha$.

(iii) For $H_1 : \mu \ne \mu_0$, reject $H_0$ if $\left|\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}\right| > c_{\alpha/2}$.

Example 1 (continued) Assume that the standard deviation is given as σ = 1.8 gram.
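A minimal sketch of decision rule (iii) for the coffee example with known standard deviation, assuming the two-sided alternative with null value 484 grams and a 5% significance level as used for this example later in the section. The variable names are hypothetical; `NormalDist` is the standard-library normal distribution.

```python
from math import sqrt
from statistics import NormalDist

y_bar = 484.75   # sample mean of the ten jars
mu0 = 484.0      # labeled weight, the null value
sigma = 1.8      # standard deviation, assumed known here
n = 10
alpha = 0.05

# Observed value of Z = (y_bar - mu0) / (sigma / sqrt(n)).
z_obs = (y_bar - mu0) / (sigma / sqrt(n))

# c_{alpha/2}: the number with P(Z > c) = alpha / 2 for Z ~ N(0, 1).
c_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)

# Rule (iii): reject H0 when |z_obs| exceeds c_{alpha/2}.
reject_h0 = abs(z_obs) > c_half_alpha
print(round(z_obs, 3), round(c_half_alpha, 3), reject_h0)
```

Under these assumptions the observed statistic falls inside the acceptance region, so the null hypothesis is not rejected.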

Test of the mean of a normal distribution:

Population variance unknown

Suppose now that the population variance is no longer assumed known. If the sample size is not large, the procedures discussed above are no longer appropriate.

To perform a testing procedure we replace $\sigma^2$ by its estimator, the sample variance $s^2$:

$$T = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}.$$

If the null hypothesis is true then the r.v. $T$ follows a Student's t distribution with $(n-1)$ degrees of freedom ($t_{n-1}$). We can use precisely the same arguments as above, with the Student's t distribution now playing the role of the standard normal distribution.

Let us denote by $c_\alpha$ the number for which

$$P\{T > c_\alpha\} = \alpha, \quad \text{where } T \sim t_{n-1}$$

($c_\alpha$ is the $(1-\alpha)$th quantile, $t_{n-1}(1-\alpha)$, of the $t_{n-1}$ distribution). Then a test with significance level $\alpha$ (the probability of a Type I error) is obtained from the decision rule:

(i) For $H_1 : \mu > \mu_0$, reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} > c_\alpha$.

(ii) For $H_1 : \mu < \mu_0$, reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} < -c_\alpha$.

(iii) For $H_1 : \mu \ne \mu_0$, reject $H_0$ if $\left|\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}\right| > c_{\alpha/2}$.

Example 1 (continued) Assume that the standard deviation is unknown.

Test of the mean of a normal distribution:

Large sample sizes

Suppose that we have a random sample of $n$ observations from a population with mean $\mu$ and variance $\sigma^2$. If the sample size $n$ is large ($n \ge 30$), the test procedures developed for the case where the population variance is known can be employed when it is unknown, replacing $\sigma^2$ by the observed sample variance $s^2$. Moreover, these procedures remain approximately valid even if the population distribution is not normal.

P-value

The smallest significance level at which a null hypothesis can be rejected is called the probability value or p-value of the test on the given sample.

The p-value gives the probability of observing a value as extreme as the one we have got or even more extreme, when the null hypothesis is true. Suppose that the data produce $T_{obs}$ as the value of the test statistic $T$. Then we assume that $H_0$ is true and calculate the probability $p$ of observing a value of $T$ that is as extreme as $T_{obs}$ or more extreme, where ‘extreme’ is determined by the direction of departure of $H_1$ from $H_0$. For example, in the above procedures, if we test $H_0 : \mu = \mu_0$ against $H_1 : \mu > \mu_0$ then the values of $T$ more extreme than $T_{obs}$ in the direction of departure of $H_1$ are those with $T > T_{obs}$. On the other hand, if $H_1 : \mu \ne \mu_0$, then “more extreme” means either $T > |T_{obs}|$ or $T < -|T_{obs}|$.

Example 1 (continued) Find the p-value of the test if $\sigma^2$ is assumed to be known.

In general, to draw conclusions about a test on the basis of the p-value, the following

guidelines are recommended:

1. If p is small (less than 0.01), reject H0.

2. If p is large (greater than 0.10), do not reject H0.

3. If 0.01 < p < 0.10, the significance is borderline: that is, we reject H0 for α = 0.10

but do not reject H0 for α = 0.01.

Note that if we actually do specify α a priori, we reject H0 if p < α.
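For the case where the variance is assumed known, the p-value can be computed directly from the standard normal distribution. The sketch below is illustrative only: it uses the summary figures of Example 1 (sample mean 484.75, null value 484, σ = 1.8, n = 10), and the helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_p_value(z_obs):
    """p-value for H1: mu > mu0, i.e. P(Z > z_obs) under H0."""
    return 1 - NormalDist().cdf(z_obs)

def two_sided_p_value(z_obs):
    """p-value for H1: mu != mu0, i.e. P(Z > |z_obs|) + P(Z < -|z_obs|) under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z_obs)))

# Observed statistic for the coffee jars with sigma treated as known.
z_obs = (484.75 - 484.0) / (1.8 / sqrt(10))
p_one = upper_tail_p_value(z_obs)
p_two = two_sided_p_value(z_obs)
print(round(z_obs, 3), round(p_one, 4), round(p_two, 4))
```

By the guidelines above, a two-sided p-value of this size (greater than 0.10) would not lead to rejection of the null hypothesis.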

In this example, the obvious choices of the null and alternative hypotheses are $H_0 : \mu = 484$, $H_1 : \mu \ne 484$. A significance level of 0.05 (that is, 5%) is specified. The standard deviation $\sigma$ is unknown, but from the information provided we can estimate it by the sample standard deviation $s = \sqrt{s^2} = 1.8$ gram (and $s^2$ could be calculated from the statistics $y_1 + \dots + y_{10} = 4847.5$ and $y_1^2 + \dots + y_{10}^2 = 2349850$).

The test statistic is the t-statistic $T = \dfrac{\bar{Y} - \mu_0}{s/\sqrt{n}} \sim t_9$, which has the t-distribution with $9 = 10 - 1$ degrees of freedom. Recall from the above that $\bar{Y} = 484.75$ and $s^2 = 3.24$. Therefore, for the sample we get

$$T_{obs} = \frac{484.75 - 484}{1.8/\sqrt{10}} = 1.318.$$

Decision with the critical values (acceptance/rejection regions). From the tables of the t-distribution we find that $t_{9,0.025} = 2.262$. So the corresponding rejection region (or critical region) is $(-\infty, -2.262) \cup (2.262, \infty)$. Then, since $-2.262 < 1.318 < 2.262$, we say that at the 5% significance level the data do not provide enough evidence for rejection of the null hypothesis.

Decision with the p-value. $H_1$ is two-sided, so that p-value $= 2(1 - P(T \le |T_{obs}|)) = 2(1 - P(T \le 1.318))$.

$P(T \le 1.318)$ is not explicitly given in the Tables but can be bounded using the closest available values, $P(T \le 1.3) \le P(T \le 1.318) \le P(T \le 1.4)$, that is, $0.887 \le P(T \le 1.318) \le 0.9025$, which gives a p-value of at least 0.195. This is higher than 0.05, hence we do not reject $H_0$.
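When tables give only bounds, the t-distribution probability can be approximated numerically. The sketch below integrates the Student's t density with Simpson's rule, since the Python standard library has no t-distribution functions; this is an illustrative approximation under that assumption, not the method used in the notes, and the function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Approximate P(T <= t) by Simpson's rule on [0, |t|], using symmetry about 0.
    steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

# Observed t statistic from the summary values quoted in the example.
t_obs = (484.75 - 484.0) / (1.8 / math.sqrt(10))
p_value = 2 * (1 - t_cdf(abs(t_obs), df=9))

# Sanity check against the tabulated critical value: P(T <= 2.262) should be about 0.975.
check = t_cdf(2.262, df=9)
print(round(t_obs, 3), round(p_value, 3), round(check, 4))
```

The computed p-value lies inside the range implied by the table bounds and is well above 0.05, in agreement with the decision reached from the critical values.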


Test for the difference between two means: Matched pairs

Consider a different testing situation in which there are $n$ experimental units, each of which generates a pair of observations as a result of some treatment. Thus there is a set of $n$ values before the application of the treatment and a second set of $n$ values after the application of the treatment, i.e.

| Experimental unit | 1 | 2 | 3 | ... | n |
|---|---|---|---|---|---|
| Before treatment | $y_{11}$ | $y_{12}$ | $y_{13}$ | ... | $y_{1n}$ |
| After treatment | $y_{21}$ | $y_{22}$ | $y_{23}$ | ... | $y_{2n}$ |
| Differences | $d_1$ | $d_2$ | $d_3$ | ... | $d_n$ |

The single sample $d_1, d_2, \dots, d_n$ is formed from the differences of the samples, i.e. $d_i = y_{1i} - y_{2i}$. The objective is to test whether the ‘before’ and ‘after’ populations are the same. Assume that $d_1, d_2, \dots, d_n$ come from a $N(\mu, \sigma^2)$ population; then the procedures developed for the one-sample test can be employed to investigate the null hypothesis $H_0 : \mu = 0$, where $\mu = E[D_1 - D_2]$ is the population mean difference of scores before and after the treatment. The three alternative hypotheses are $\mu \ne 0$, $\mu < 0$ or $\mu > 0$.

Example 2 A group of 12 subjects were given a series of tests to assess their memory,

concentration and capacity to undertake simple arithmetic and logic computations. Their

scores were recorded as Score 1. The same subjects again completed an equivalent series

of tests when they were in the fifth week of a slimming diet and their scores were recorded

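as Score 2. The results are given in the table below.

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score 1 | 60.2 | 70.7 | 39.5 | 40.3 | 22.5 | 53.8 | 62.5 | 57.1 | 54 | 63.9 | 59.1 | 67 |
| Score 2 | 51.6 | 63.9 | 43.3 | 41.2 | 20.9 | 47.3 | 53.6 | 60.2 | 44.3 | 56.7 | 47.2 | 72.3 |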

Do the data support the suggestion that dieting reduces mental effectiveness (during the

period of dieting)?
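One way to set up the calculation is sketched below. It forms the differences Score 1 minus Score 2 and computes the one-sample t statistic for the null hypothesis of zero mean difference against the one-sided alternative of a positive mean difference (scores lower while dieting). The 5% significance level is an assumption made for illustration, and 1.796 is the upper 5% point of the t-distribution with 11 degrees of freedom taken from standard tables.

```python
import math
import statistics

score1 = [60.2, 70.7, 39.5, 40.3, 22.5, 53.8, 62.5, 57.1, 54.0, 63.9, 59.1, 67.0]
score2 = [51.6, 63.9, 43.3, 41.2, 20.9, 47.3, 53.6, 60.2, 44.3, 56.7, 47.2, 72.3]

# Differences d_i = y_1i - y_2i (before minus during the diet).
diffs = [before - during for before, during in zip(score1, score2)]
n = len(diffs)

d_bar = statistics.mean(diffs)    # sample mean of the differences
s_d = statistics.stdev(diffs)     # sample standard deviation (n - 1 divisor)
t_obs = d_bar / (s_d / math.sqrt(n))

# Assumed 5% one-sided test; tabulated upper 5% point of t with n - 1 = 11 df.
t_crit = 1.796
reject_h0 = t_obs > t_crit
print(round(d_bar, 3), round(s_d, 3), round(t_obs, 3), reject_h0)
```

Under these assumptions the observed statistic exceeds the critical value, so at the 5% level the data would be taken as evidence that scores are lower during dieting; a different significance level or a two-sided alternative would need a different critical value.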

10

2020/2021

1. Introduction and Preliminaries

Reading

Faraway: Chapter 1 Krzanowski: Chapter 1

Kleinbaum et al: Chapter 4

Frees: Sections 1.1–1.3

Mendenhall et al: Chapter 2

1.1 Motivation

Terminology

We deal with measurements of several variables for each of n experimental units or

individuals. The variables are of two types (though the distinction between them is not

always rigid in applications): those of primary interest to the investigator and those

which might provide supplementary or background information. The variables of the

former type are called response, outcome or dependent variables, while those of latter

type are called explanatory, independent or predictor variables. Econometricians also

use the terms endogenous and exogenous to distinguish the two types of variables. The

explanatory variables are used to predict or to understand the response variables.

Relation between variables, models

We distinguish between a functional relation and a statistical relation. The functional

relation between the independent variable X and the dependent variable Y is often ex-

pressed as a mathematical formula

Y = f(X)

and the main feature of this relation is that the observations (xi, yi) (i = 1, . . . , n) fall

directly on the “curve” of the relationship, that is, on the curve y = f(x).

A statistical relation, unlike a functional relation is not a “perfect”one. Very often

explanatory variables are thought of as fixed, and response variables are thought of as

random variables with a distribution depending on the explanatory variables. Therefore,

for each value of an explanatory variable x the response Y may be supposed to be a

random variable with expectation (mean value) f(x) = E(Y |X = x), E(Y |x) in short.

Then a statistician may wish to determine the function f using sample data consisting of

pairs (xi, yi) (i = 1, . . . , n). Function f is called the regression function for regressing Y

on X and X is called the regressor.

The regression function f(x) represents the systematic component of the model. The

systematic component of the model is concerned with overall population features such as

expected values. To emphasize the existence of the random component of the model, the

response is often written in the form

Y = f(x) + ε

where f is the regression function, that is, the systematic component, and ε is the random

component. In most applications ε is a normal random variable with mean zero and

variance σ2 (ε ∼ N (0, σ2)).

1

Linear statistical model

The systematic component f is often expressed in terms of explanatory variables

through a parametric equation. If, for example, it is supposed that

f(x) = A+Bx+ Cx2

or

f(x) = A2x +B

or

f(x) = A log x+B,

then the problem is reduced to one of identifying a few parameters, here labeled as A,B,C.

In each of these three forms for f given above, f is linear in these parameters.

For example, A2x + B can be written as f(x,β) = g(x)Tβ, where g(x)T = (1, 2x) is

known as transformed input and βT = (B,A) is vector of the model parameters. Similarly,

A log(x) + B writes as g(x)Tβ, where transformed input is given by g(x)T = (1, log(x))

and vector of model parameters is still βT = (B,A).

It is the linearity in the parameters that makes the model a linear statistical model!

1.2 The model of measurements, revision of Year 1 Statistics.

Let µ be an unknown quantity of interest which can be measured with some error. A

mathematical (statistical) model for this experiment is specified by the following equation

(the model equation) Y = µ+ ε, where Y is the available measurement (observation) and

ε is a random error modelled as a random variable with zero mean, say, with normal

distribution with variance σ2, i.e. ε ∼ N(0, σ2). By properties of the normal distribution,

we have that Y ∼ N(µ, σ2). Suppose that we have n measurements Yi = µ + εi, where

εi ∼ N(0, σ2), (i = 1, ..., n, ) are independently distributed. It follows that Y1, ..., Yn are

then also independent random variables with Yi ∼ N(µ, σ2). In other words, Y1, ..., Yn is

a random sample from a normally distributed population with mean µ and variance σ2,

so that the problem of estimating an unknown quantity µ is the well known (from the 1st

Year statistics) problem of estimating a population mean of a normal population. The

sample mean Y¯ = 1

n

(Y1 + ...+ Yn) =

1

n

∑n

i=1 Yi is usually used as a point estimator of µ.

In MT1300 we briefly stated that there are several general methods of obtaining point

estimators. In this course we are going to use one of these methods, namely, the method

of the least squares (LS). To demonstrate the main idea of this method, let us consider

the case of the model of measurements. Given observations Y1, ..., Yn define the following

function

S(µ) =

n∑

i=1

(Yi − µ)2.

The value of µ that minimises S(µ) is called the least square estimator of µ. We can find

the point of minimum of S(µ), by equateing to zero the first derivative of S(µ)

S ′(µ) = −2

n∑

i=1

(Yi − µ) = −2

(

n∑

i=1

Yi − nµ

)

= 0.

Now it is easy to see that the solution of the above equation is Y¯ = 1

n

∑n

i=1 Yi, the sample

mean, and this is the point of minimum, as the second derivative of S(µ) at µ = Y¯ is

2

2n > 0. There is also a direct way to see that the sample mean is the point of minimum,

and, hence, the least square estimator of µ. Indeed,

S(µ) =

n∑

i=1

(Y 2i − 2Yiµ+ µ2) =

n∑

i=1

Y 2i − 2nµY¯ + nµ2

= −2nµY¯ + nµ2 + nY¯ 2 +

n∑

i=1

Y 2i − nY¯ 2

= n(µ− Y¯ )2 +

n∑

i=1

Y 2i − nY¯ 2 ≥

n∑

i=1

Y 2i − nY¯ 2,

where inequality becomes equality if and only if µ = Y¯ . Note finally that

S(Y¯ ) =

n∑

i=1

Y 2i − nY¯ 2 = (n− 1)s2,

where s2 is the sample variance, which is the point estimator of another model parameter

σ2 (see the next section).

3

1.3 Parametric statistical inference, brief revision of

Year 1 background

Reading

Krzanowski: Chapter 2

Kleinbaum et al: Chapter 3

Frees: Chapter 2

Mendenhall et al: Chapter 1

Newbold, Chapter 9

The process of making statements about population characteristics/parameters given

only information from samples is known as parametric statistical inference.

Example 1 A mechanical jar filler for filling jars with coffee does not fill every jar with

the same quantity. The weight of coffee Y filled in a jar is a random variable which can be

assumed to be normally distributed with mean value µ and variance σ2 (Y ∼ N (µ, σ2)).

Suppose that we have a sample of n independent measurements on Y and wish to

“identify” the parameters of the population (µ, σ2).

The sort of statements that we wish to make about parameters will often fall into one

of the following three categories:

• Point estimation;

• Interval estimation;

• Hypotheses testing.

1.3.1 Point estimation

Point estimation is the aspect of statistical inference in which we wish to find the “the

best guess” of the true value of a population parameter.

Suppose that Y1, Y2, · · · , Yn is a random sample from the population of interest.

Then an estimator of an unknown parameter θ is some function of the observations

Y1, Y2, · · · , Yn, that is

θˆ = θˆ(Y1, Y2, · · · , Yn)

(which is in some sense a “good approximation” to the unknown parameter θ).

Example 1 (continued) A point estimator of µ in a N (µ, σ2) population is provided by

the sample mean Y¯ , which is defined by

Y¯ =

1

n

n∑

i=1

Yi =

1

n

(Y1 + Y2 + · · ·+ Yn).

To estimate σ2 in aN(µ, σ2) population we generally use as its estimator the sample variance

s2 defined by

s2 =

1

n− 1

n∑

i=1

(Yi − Y¯ )2.

It is easy to see that s2 = 1

n−1

(∑n

i=1 Y

2

i − nY¯ 2

)

.

4

Properties of Estimators

Let θˆ = θˆ(Y1, Y2, · · · , Yn) be an estimator of an unknown parameter θ. To clarify in

what sense θˆ is a “good approximation” to θ we consider estimators which are (1) unbiased

and (2) mean square consistent.

(1) θˆ is said to be an unbiased estimator of θ if E(θˆ) = θ.

Example 1 (continued) Y¯ is an unbiased estimator of µ and s2 is an unbiased estimator

of σ2.

To check whether we have a sensible estimator we need to ensure that θˆ is increasingly

likely to yield the right answer θ as the sample size n gets bigger. The mean square error

(MSE) of θˆ is defined to be E(θˆ − θ)2. Since the MSE of θˆ is the average square distance

of θˆ from the true value θ, a good estimator is one with a small MSE.

(2) θˆ is said to be a mean square consistent estimator of θ if

MSE(θˆ)→ 0 as n→∞.

Note that if θˆ is unbiased then it is also mean square consistent if Var(θˆ) → 0 with

n→∞.

1.3.2 Interval estimation

Point estimation is often not sufficiently informative as it does not say anything about

the error of the estimation procedure. Naturally, if the error is large, then we are less

confident in our estimate. Replacing a point estimator by an interval estimator allows

us to quantify the uncertainty of estimation by specifying a desirable level of confidence,

which is the probability of the interval capturing the true value of the parameter. Such

interval estimators are known as confidence intervals (C.I.).

Example 1 (continued) To construct a confidence interval for µ we recall that Y¯ is a linear

combination of independent N (µ, σ2) random variables (Y¯ = ∑ni=1 Yi/n) and therefore is

normally distributed with mean µ (unbiased) and variance σ2/n, that is, Y¯ ∼ N (µ, σ2/n).

It therefore follows that if σ2 is known, then

Z =

Y¯ − µ

σ/

√

n

∼ N (0, 1)

and so

P

(

Y¯ − zα/2σ/

√

n ≤ µ ≤ Y¯ + zα/2σ/

√

n

)

= 1− α,

that is, the (1− α)100% confidence interval for µ is given by(

Y¯ − zα/2σ/

√

n, Y¯ + zα/2σ/

√

n

)

.

If σ2 is unknown, then we construct our CI based on the following T -variable

T =

Y¯ − µ

s/

√

n

∼ tn−1,

where tn−1 is the t-distribution with n− 1 degrees of freedom, and

P

(

Y¯ − tn−1,α/2s/

√

n ≤ µ ≤ Y¯ + tn−1,α/2s/

√

n

)

= 1− α,

5

so that the (1− α)100% confidence interval for µ is given by(

Y¯ − tn−1,α/2s/

√

n, Y¯ + tn−1,α/2s/

√

n

)

.

Note that while both the intervals are centered at Y¯ , the margin of error tn−1,α/2s/

√

n

is a random variable unlike the margin of error zα/2σ/

√

n in the normal CI, which is a

non-random quantity (i.e. does not depend on the sample).

Example 1 (continued) Jars of coffee are labeled as 484 grams in weight. A random

sample of ten jars from a production line are opened and weighed accurately. The ten

weights are found to be as follows:

483.7 485.6 486.2 486.0 488.1 480.3 485.4 485.2 483.7 483.3

It is assumed that weights of coffee are normally distributed with mean µ grams and

standard deviation σ.

Find the 95% CI for the true population mean of jar weights.

Using the information provided we find $\bar{y} = \frac{1}{10}(y_1 + \cdots + y_{10}) = 484.75$ and
$s^2 = \frac{1}{9}\left(\sum_{i=1}^{10} y_i^2 - 10\,\bar{y}^2\right) = 3.24$ (grams squared). Therefore, the 95% CI for the population mean is
$484.75 \pm t_{9,0.025}\, s/\sqrt{n} = (483.462, 486.038)$, where the critical value $t_{9,0.025} = 2.262$ is found in the
Tables (or using software).
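The arithmetic of this interval can be checked with a short sketch. It takes the summary values quoted in the example (sample mean 484.75, $s^2 = 3.24$, n = 10) and the tabulated critical value 2.262 as inputs rather than recomputing them; the function name `t_interval` is an assumption made for illustration.

```python
from math import sqrt


def t_interval(ybar, s, n, t_crit):
    """Return ybar -/+ t_crit * s / sqrt(n), with t_crit = t_{n-1, alpha/2} from tables."""
    margin = t_crit * s / sqrt(n)  # random margin of error: it depends on s
    return ybar - margin, ybar + margin


# Summary values quoted in the example: ybar = 484.75, s^2 = 3.24, n = 10
s = sqrt(3.24)
lower, upper = t_interval(ybar=484.75, s=s, n=10, t_crit=2.262)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # (483.462, 486.038)
```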

1.3.3 Hypotheses testing

Often an investigator has a theory about the phenomenon under study, and wishes

to see whether this theory is confirmed by the data that have been collected. The

null hypothesis H0 is, usually, what we are prepared to “go along with” until we obtain

convincing evidence in favour of the alternative hypothesis H1. To conduct a hypothesis

test we need to complete the following steps.

(1) Specify the null and alternative hypotheses.

(2) Choose a test statistic T which is such that

◦ T behaves differently under the null and alternative hypotheses;

◦ the sampling distribution of T is fully specified when H0 is true.

(3) Formulate some decision rule based on the statistic T .

Whatever decision rule is adopted, there is some chance of reaching an erroneous
conclusion about the population parameter of interest. One error that could be made,
called a Type I error, is the rejection of a true null hypothesis. If the decision rule is
such that the probability of rejecting a true null hypothesis is α, then α is said to
be the significance level of the test. The other possible error, called a Type II error, arises
when a false null hypothesis is accepted. Suppose that for a particular decision rule the
probability of making such an error is β. Then the probability of rejecting a false null
hypothesis is (1 − β), which is called the power of the test.


| Decision | NULL HYPOTHESIS TRUE | NULL HYPOTHESIS FALSE |
|---|---|---|
| ACCEPT | Correct decision, Probability = 1 − α | Type II error, Probability = β |
| REJECT | Type I error, Probability = α (significance level) | Correct decision, Probability = 1 − β (power) |

Ideally we would like to have the probabilities of both types of error as small as possible.
However, in general, once a sample has been taken, any adjustment to the decision rule to
reduce the probability α of type I error automatically increases the probability β of type
II error. The only way of simultaneously lowering both α and β would be to obtain more
information about the population, e.g., by taking a larger sample. In practice we usually
specify the significance level α (the probability of a type I error) to have a small value such as 0.10, 0.05, 0.025,
or 0.01. This then determines the probability of Type II error β (if there is a choice of
tests then we prefer the one with the smallest β, that is, with the highest power (1 − β)).
For a given significance level, the larger the sample size, the higher the power
of the test.
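The link between sample size and power can be made concrete for one simple case: a one-sided test of a normal mean with known σ that rejects H0 : µ = µ0 when $(\bar{y} - \mu_0)/(\sigma/\sqrt{n})$ exceeds the upper α point $c_\alpha$ of N(0, 1). If the true mean is µ1, the statistic is normal with mean $(\mu_1 - \mu_0)\sqrt{n}/\sigma$ and unit variance, so the power is $1 - \Phi\left(c_\alpha - (\mu_1 - \mu_0)\sqrt{n}/\sigma\right)$. The sketch below is only an illustration; the function name `power_one_sided_z` and the values µ0 = 484, µ1 = 485 and σ = 1.8 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the test that rejects H0: mu = mu0 when Z > c_alpha, evaluated at mu = mu1."""
    std_normal = NormalDist()
    c_alpha = std_normal.inv_cdf(1 - alpha)     # P(Z > c_alpha) = alpha under H0
    shift = (mu1 - mu0) / (sigma / sqrt(n))     # mean of Z when the true mean is mu1
    return 1 - std_normal.cdf(c_alpha - shift)  # P(reject H0 | mu = mu1) = 1 - beta


# Hypothetical values: mu0 = 484, true mean 485, sigma = 1.8, fixed alpha = 0.05
for n in (10, 30, 100):
    print(n, round(power_one_sided_z(484, 485, 1.8, n), 3))
```

With α held fixed, the printed power increases with n, and setting `mu1` equal to `mu0` returns α itself.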

Example 1 (continued) Would you say that the jars are labeled correctly?

So, the statistical model is already specified. We have a random sample of $n$ observations
$Y_1, Y_2, \ldots, Y_n$ with $Y_i \sim N(\mu, \sigma^2)$. The objective is to test hypotheses about the
unknown population mean.

Consider the problem of testing the simple null hypothesis that the population mean
is equal to some specified value $\mu_0$,

$$H_0 : \mu = \mu_0,$$

against one of the following three alternative hypotheses:

(i) $H_1 : \mu > \mu_0$, (ii) $H_1 : \mu < \mu_0$, (iii) $H_1 : \mu \ne \mu_0$.

Test of the mean of a normal distribution:

Population variance known

Assume first that the population variance is known. For all three cases, when the null
hypothesis is true we have

$$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1).$$

If $H_1$ is true then in case (i) the r.v. $Z$ will tend to be larger (for (ii) $Z$ will tend to be
smaller and for (iii) the absolute value of $Z$ will tend to be larger) than would be expected
for a standard normal random variable. Let us denote by $c_\alpha$ the number for which

$$P\{Z > c_\alpha\} = \alpha,$$

where $Z \sim N(0, 1)$. Then a test with significance level $\alpha$ (type I error) is obtained from
the decision rule:

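(i) For $H_1 : \mu > \mu_0$,

Reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} > c_\alpha$

(ii) For $H_1 : \mu < \mu_0$,

Reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} < -c_\alpha$

(iii) For $H_1 : \mu \ne \mu_0$,

Reject $H_0$ if $\left|\dfrac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}\right| > c_{\alpha/2}$.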

Example 1 (continued) Assume that the standard deviation is given as σ = 1.8 gram.
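One way to carry out this known-variance test numerically is sketched below. It applies the three decision rules (i) to (iii) with the sample mean 484.75 from the ten jars and σ = 1.8. The choices µ0 = 484 (the labelled weight), a two-sided alternative and α = 0.05 are working assumptions, and the function name `z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test(ybar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z_obs, reject) for H0: mu = mu0 when sigma is known."""
    z_obs = (ybar - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # (i)   H1: mu > mu0
        reject = z_obs > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # (ii)  H1: mu < mu0
        reject = z_obs < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":    # (iii) H1: mu != mu0, uses c_{alpha/2}
        reject = abs(z_obs) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_obs, reject


# Assumed setup: mu0 = 484, two-sided alternative, alpha = 0.05
z_obs, reject = z_test(ybar=484.75, mu0=484, sigma=1.8, n=10)
print(f"z = {z_obs:.3f}, reject H0: {reject}")
```

Under these assumptions the observed statistic is about 1.318, which lies below the two-sided 5% critical value of roughly 1.96, so H0 is not rejected.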

Test of the mean of a normal distribution:

Population variance unknown

Suppose now that the population variance is no longer assumed known. If the sample

size is not large, the procedures discussed above are no longer appropriate.

To perform a testing procedure we replace $\sigma^2$ by its estimator, the sample variance $s^2$:

$$T = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}.$$

Now, if the null hypothesis is true then the r.v. $T$ follows a Student's t distribution with
$(n-1)$ degrees of freedom ($t_{n-1}$). We can therefore use precisely the same arguments adopted
above, with the Student's t distribution playing the role of the standard normal
distribution.

Let us denote by $c_\alpha$ the number for which

$$P\{T > c_\alpha\} = \alpha \quad \text{where } T \sim t_{n-1}$$

($c_\alpha$ is the $(1-\alpha)$th quantile, $t_{n-1}(1-\alpha)$, of the $t_{n-1}$ distribution). Then a test with significance
level $\alpha$ (type I error) is obtained from the decision rule:

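(i) For $H_1 : \mu > \mu_0$,

Reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} > c_\alpha$

(ii) For $H_1 : \mu < \mu_0$,

Reject $H_0$ if $\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}} < -c_\alpha$

(iii) For $H_1 : \mu \ne \mu_0$,

Reject $H_0$ if $\left|\dfrac{\bar{y} - \mu_0}{s/\sqrt{n}}\right| > c_{\alpha/2}$.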

Example 1 (continued) Assume that the standard deviation is unknown.

Test of the mean of a normal distribution:

Large sample sizes

Suppose that we have a random sample of n observations from a population with mean

µ and variance σ2. If the sample size n is large (n ≥ 30), the test procedures developed


for the case where the population variance is known can be employed when it is unknown,

replacing σ2 by the observed sample variance s2. Moreover, these procedures remain

approximately valid even if the population distribution is not normal.

P-value

The smallest significance level at which a null hypothesis can be rejected is called the

probability value or p-value of the test on the given sample.

The p-value gives the probability of observing a value as extreme as the one we have
got or even more extreme, when the null hypothesis is true. Suppose that the data produce
$T_{obs}$ as the value of the test statistic $T$. Then we assume that $H_0$ is true and calculate
the probability $p$ of observing a value of $T$ that is as extreme as $T_{obs}$ or more extreme
than $T_{obs}$, where 'extreme' is determined by the direction of departure of $H_1$ from $H_0$. For
example, in the above procedures, if we test $H_0 : \mu = \mu_0$ against $H_1 : \mu > \mu_0$ then the
values of $T$ more extreme than $T_{obs}$ in the direction of departure of $H_1$ would be values
such that $T > T_{obs}$. On the other hand, if $H_1 : \mu \ne \mu_0$, then "more extreme" would be
either $T > |T_{obs}|$ or $T < -|T_{obs}|$.

Example 1 (continued) Find the p-value of the test if σ2 is assumed to be known.
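A possible way to compute this p-value is sketched below, using the standard normal distribution from Python's standard library. The function name `z_p_value` is hypothetical. The inputs (sample mean 484.75, µ0 = 484, σ = 1.8, n = 10) follow the coffee-jar example, and the two-sided alternative is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def z_p_value(z_obs, alternative="two-sided"):
    """p-value of an observed z statistic, computed under H0 where Z ~ N(0, 1)."""
    cdf = NormalDist().cdf
    if alternative == "greater":     # H1: mu > mu0, extreme means Z > z_obs
        return 1 - cdf(z_obs)
    if alternative == "less":        # H1: mu < mu0, extreme means Z < z_obs
        return cdf(z_obs)
    if alternative == "two-sided":   # H1: mu != mu0, extreme means |Z| > |z_obs|
        return 2 * (1 - cdf(abs(z_obs)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Assumed setup: two-sided test of mu0 = 484 with known sigma = 1.8
z_obs = (484.75 - 484) / (1.8 / sqrt(10))
p = z_p_value(z_obs)
print(f"z = {z_obs:.3f}, two-sided p-value = {p:.3f}")
```

The one-sided p-value for H1 : µ > µ0 is half the two-sided value here, because the observed statistic is positive.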

In general, to draw conclusions about a test on the basis of the p-value, the following

guidelines are recommended:

1. If p is small (less than 0.01), reject H0.

2. If p is large (greater than 0.10), do not reject H0.

3. If 0.01 < p < 0.10, the significance is borderline: that is, we reject H0 for α = 0.10

but do not reject H0 for α = 0.01.

Note that if we actually do specify α a priori, we reject H0 if p < α.

In this example, the obvious choices of the null and alternative hypotheses are $H_0 : \mu = 484$,
$H_1 : \mu \ne 484$. Significance level 0.05 (that is 5%) is specified. The standard
deviation $\sigma$ is unknown, but from the information provided we can estimate it by the
sample standard deviation $s = \sqrt{s^2} = 1.8$ gram (and $s^2$ could be calculated from the
statistics $y_1 + \cdots + y_{10} = 4847.5$, and $y_1^2 + \cdots + y_{10}^2 = 2349850$).

The test statistic is the t-statistic $T = \dfrac{\bar{Y} - \mu_0}{s/\sqrt{n}} \sim t_9$, which has the t-distribution with
9 = 10 − 1 degrees of freedom. Recall from the above that $\bar{Y} = 484.75$ and $s^2 = 3.24$.
Therefore, we have got for the sample that

$$T_{obs} = \frac{484.75 - 484}{1.8/\sqrt{10}} = 1.318.$$

Decision with the critical values (acceptance/rejection regions).

From the tables of the t-distribution we find that $t_{9,0.025} = 2.262$. So, the corresponding
rejection region (or critical region) is $(-\infty, -2.262) \cup (2.262, \infty)$. Then, since $-2.262 < 1.318 < 2.262$,
we say that at the 5% significance level the data do not provide enough
evidence for rejection of the null hypothesis.

Decision with the p-value. $H_1$ is two-sided, so that p-value $= 2(1 - P(T \le |T_{obs}|)) = 2(1 - P(T \le 1.318))$.

$P(T \le 1.318)$ is not explicitly given in the Tables but can be bracketed by the
closest available values, $P(T \le 1.3) \le P(T \le 1.318) \le P(T \le 1.4)$, that is, $0.887 \le P(T \le 1.318) \le 0.9025$.
This gives a p-value of at least $2(1 - 0.9025) = 0.195$, which is higher than
0.05; hence we do not reject $H_0$.
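When tables only bracket the probability, the t-distribution function can be evaluated numerically instead. The sketch below is one way to do this with the standard library alone: it integrates the Student's t density with composite Simpson's rule. The helper names `t_pdf` and `t_cdf` and the number of integration steps are assumptions made for illustration; statistical software would normally supply this function directly.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x), using composite Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    half = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + half if x >= 0 else 0.5 - half


# Values from the example: ybar = 484.75, mu0 = 484, s = 1.8, n = 10
t_obs = (484.75 - 484) / (1.8 / sqrt(10))
p_value = 2 * (1 - t_cdf(abs(t_obs), df=9))
print(f"t = {t_obs:.3f}, two-sided p-value = {p_value:.3f}")
```

The computed p-value should fall inside the range implied by the table bounds, that is, between 2(1 − 0.9025) = 0.195 and 2(1 − 0.887) = 0.226.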


Test for the difference between two means: Matched pairs

Consider a different testing situation in which there are n experimental units, each of

which generates a pair of observations as a result of some treatment. Thus there is a set

of n values before the application of the treatment and then a second set of n values after

the application of the treatment, i.e.

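| Experimental unit | 1 | 2 | 3 | ... | n |
|---|---|---|---|---|---|
| Before treatment | $y_{11}$ | $y_{12}$ | $y_{13}$ | ... | $y_{1n}$ |
| After treatment | $y_{21}$ | $y_{22}$ | $y_{23}$ | ... | $y_{2n}$ |
| Differences | $d_1$ | $d_2$ | $d_3$ | ... | $d_n$ |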

The single sample $d_1, d_2, \ldots, d_n$ is formed from the differences of the samples, i.e.
$d_i = y_{1i} - y_{2i}$. The objective is to test whether the 'before' and 'after' populations are the
same. Assume that $d_1, d_2, \ldots, d_n$ come from an $N(\mu, \sigma^2)$ population; then the procedures
developed for the one-sample test can be employed to investigate the null hypothesis
$H_0 : \mu = 0$, where $\mu = E[Y_{1i} - Y_{2i}]$ is the population mean difference of scores before and
after the treatment. The three alternative hypotheses are $\mu \ne 0$ or $\mu < 0$ or $\mu > 0$.

Example 2 A group of 12 subjects were given a series of tests to assess their memory,
concentration and capacity to undertake simple arithmetic and logic computations. Their
scores were recorded as Score 1. The same subjects again completed an equivalent series
of tests when they were in the fifth week of a slimming diet and their scores were recorded
as Score 2. The results are given in the table below.

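| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score 1 | 60.2 | 70.7 | 39.5 | 40.3 | 22.5 | 53.8 | 62.5 | 57.1 | 54 | 63.9 | 59.1 | 67 |
| Score 2 | 51.6 | 63.9 | 43.3 | 41.2 | 20.9 | 47.3 | 53.6 | 60.2 | 44.3 | 56.7 | 47.2 | 72.3 |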

Do the data support the suggestion that dieting reduces mental effectiveness (during the

period of dieting)?
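A minimal sketch of how the matched-pairs procedure might be applied to these scores is given below. It forms the differences $d_i = y_{1i} - y_{2i}$ (Score 1 minus Score 2) and computes the one-sample t statistic for H0 : µ = 0. The one-sided alternative H1 : µ > 0 (scores fall during dieting) and the 5% significance level are assumptions about how the question would be set up, and the critical value 1.796 is the upper 5% point of the t distribution with 11 degrees of freedom taken from standard tables.

```python
from math import sqrt
from statistics import mean, stdev

score1 = [60.2, 70.7, 39.5, 40.3, 22.5, 53.8, 62.5, 57.1, 54.0, 63.9, 59.1, 67.0]
score2 = [51.6, 63.9, 43.3, 41.2, 20.9, 47.3, 53.6, 60.2, 44.3, 56.7, 47.2, 72.3]

# d_i = y_1i - y_2i (before minus after); a positive value means the score fell
d = [before - after for before, after in zip(score1, score2)]
n = len(d)

d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation (divisor n - 1)
t_obs = d_bar / (s_d / sqrt(n))

# Assumed setup: H0: mu = 0 against H1: mu > 0 at the 5% level, with
# t_{11, 0.05} = 1.796 from standard t tables (n - 1 = 11 degrees of freedom)
t_crit = 1.796
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_obs:.3f}, reject H0: {t_obs > t_crit}")
```

Under these assumptions the statistic is about 2.33, which exceeds 1.796, so the data would support a reduction in scores at the 5% level. A different alternative or significance level would change the critical value and possibly the conclusion.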
