
Date: 2021-04-15

CIS 315

Introduction to Business Data Analytics

WEEK 9

MARCH 8, 2021

Course Roadmap

Data Analytics

• Chapter 3: Data Visualization

• Chapter 2: Descriptive Analytics

• Chapter 7: Regression

• Chapter 8: Time Series Analysis & Forecasting

• Chapter 12, 13, & 14: Optimization & Prescriptive Analytics

• Experimental Design

• Chapter 11: Simulation

Today’s Agenda

• Experimental Design

• Sampling

Sampling

• So far you have been given observational data

• Little to no control over variables

• Merely observe their values

• For example, Age, Income, etc.

Sampling

• But if you designed an experiment…

• Then you could control one or more variables

• And observe their effect

Sampling

• Experiment

• Apply treatments to experimental units (such as people, animals, land, etc.) and then observe the effect of the treatments on the experimental units

Sampling

• Observational Studies

• Observe subjects and measure variables of interest without assigning treatments to subjects.

Sampling

Suppose you want to study the effect of smoking on lung capacity in women

Experiment

How would you design an experiment?

Observational Study

How would you design an observational study?

Sampling

Suppose you want to study the effect of smoking on lung capacity in women

Experiment

• Find 100 women, age 20, who do not currently smoke.

• Randomly assign half (50) to the smoking treatment and the other half to the no smoking treatment.

• Those in the smoking treatment should smoke a pack a day for 10 years, while those in the no smoking treatment should remain smoke free for 10 years.

• After 10 years, measure lung capacity for each of the 100 women.

• Analyze, interpret, and draw conclusions from the data.

Observational Study

• Find 100 women, age 30, of whom 50 have been smoking a pack a day for 10 years while the other 50 have remained smoke free for those 10 years.

• Measure lung capacity for each of the 100 women.

• Analyze, interpret, and draw conclusions from the data.
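The random assignment step of the experiment can be sketched in a few lines of Python. This is only an illustration: the subject IDs 1 to 100, the function name `randomly_assign`, and the seed are assumptions made for the example, not part of the course material.

```python
import random


def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out evenly across the treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for position, subject in enumerate(shuffled):
        # Alternate through the treatments so group sizes stay balanced.
        groups[treatments[position % len(treatments)]].append(subject)
    return groups


# Hypothetical subject IDs standing in for the 100 women.
assignment = randomly_assign(range(1, 101), ["smoking", "no smoking"], seed=315)
```

Because the researcher, not the subject, decides who lands in each group, this is an experiment; in the observational study the women have already sorted themselves into smokers and non-smokers.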

Sampling

An economist obtains the unemployment rate and gross state product for a sample of states over the past 10 years, with the objective of examining the relationship between the unemployment rate and the gross state product by census region.

Experiment or Observational Study?

Observational Study

Sampling

A psychologist tests the effect of three different feedback programs by randomly assigning five rats to each program and recording their response times at specified intervals during the program.

Experiment or Observational Study?

Experiment

Sampling

Random Experiment

A design in which the treatments are randomly assigned to the experimental units

Is this a good choice?

Sampling

The objective of a randomized design is usually to compare the treatment means

We want to test the null hypothesis that the treatment means are all equal against the alternative that at least two differ

$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$

$H_a$: at least two of the treatment means differ

Sampling

For example, suppose you randomly selected five males and five females and looked at their SAT scores.

[Dot plot of the female and male SAT scores on an axis from 450 to 650]

Female Average: 550

Male Average: 590

Can we conclude that there is a difference in test scores between Females and Males?

No, as the difference in the means is dominated by the sampling variability

Sampling

For example, suppose you randomly selected five males and five females and looked at their SAT scores.

[Dot plot of the female and male SAT scores on an axis from 450 to 650]

Female Average: 550

Male Average: 590

Can we conclude that there is a difference in test scores between Females and Males?

Probably, as the difference in the means is large relative to the sampling variability

Sampling

The key to sampling is to compare the difference between the treatment means with the amount of sampling variability

SST = Sum of Squares for Treatments

$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2$

where $n_i$ is the sample size of the ith treatment, $\bar{x}_i$ is the mean of the ith treatment, and $\bar{x}$ is the mean of the overall sample

SSE = Sum of Squares for Error

$SSE = \sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + \cdots + \sum_{j=1}^{n_k} (x_{kj} - \bar{x}_k)^2$

Looks complicated, but we can rewrite to…

$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$

where $s^2$ is the sample variance, $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
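The two sums of squares can be sketched in Python as follows. The slides only show the SAT scores as a dot plot, so the raw values below are hypothetical: they were chosen so that the group averages match the quoted 550 and 590 and each group has a sample variance of 2250. The function names are likewise just for illustration.

```python
from statistics import mean, variance


def sum_of_squares_treatments(groups):
    """SST: sample size times squared distance of each group mean from the overall mean."""
    overall_mean = mean([x for group in groups for x in group])
    return sum(len(group) * (mean(group) - overall_mean) ** 2 for group in groups)


def sum_of_squares_error(groups):
    """SSE via the shortcut (n_i - 1) * s_i^2, summed over the groups."""
    return sum((len(group) - 1) * variance(group) for group in groups)


# Hypothetical SAT scores (not from the slides): means 550 and 590, variance 2250 each.
female = [490, 520, 550, 580, 610]
male = [530, 560, 590, 620, 650]

sst = sum_of_squares_treatments([female, male])
sse = sum_of_squares_error([female, male])
```

With these values SST comes out to 4000 and SSE to 18000; `statistics.variance` already divides by n − 1, which is why the shortcut form of SSE is enough.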

Sampling

For example, suppose you randomly selected five males and five females and looked at their SAT scores.

[Dot plot of the female and male SAT scores on an axis from 450 to 650]

$SST = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2 = 5(550 - 570)^2 + 5(590 - 570)^2 = 4000$

$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2 = (5 - 1)(2250) + (5 - 1)(2250) = 18000$

Don’t worry about calculating these right now…

But what we are really after is the MST and MSE…

Sampling

MST = Mean Square for Treatments

(measures the variability among the treatment means)

$MST = \frac{SST}{k - 1}$, where $k - 1$ is the degrees of freedom

$MST = \frac{4000}{2 - 1} = 4000$

MSE = Mean Square for Error

(measures the variability within the treatments)

$MSE = \frac{SSE}{n - k} = \frac{18000}{10 - 2} = 2250$

Sampling

Use the SST, SSE, MST, MSE => F-Statistic

$F\text{-statistic} = \frac{MST}{MSE}$

$F\text{-statistic} = \frac{4000}{2250} = 1.78$

The F-statistic is used to decide whether the means of the treatment groups are equal (H0) or different (Ha)

Can we reject the null hypothesis that the means of the treatments are equal?
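The chain from sums of squares to the F-statistic is short enough to sketch in Python. The function name `f_statistic` is an assumption for this example; the inputs are the values worked out for the SAT example (SST = 4000, SSE = 18000, n = 10 observations, k = 2 treatments).

```python
def f_statistic(sst, sse, n, k):
    """Return (MST, MSE, F) for a randomized design with n observations and k treatments."""
    mst = sst / (k - 1)   # mean square for treatments, df = k - 1
    mse = sse / (n - k)   # mean square for error, df = n - k
    return mst, mse, mst / mse


mst, mse, f = f_statistic(sst=4000, sse=18000, n=10, k=2)
print(f"MST = {mst:.0f}, MSE = {mse:.0f}, F = {f:.2f}")
```

A small F means the spread between the group means is no larger than what the scatter inside the groups would produce on its own.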

Sampling

This graph will change based on the degrees of freedom in the numerator and denominator, but to reject the null hypothesis you want your F-statistic to be larger than the value of F at the designated level of significance

Sampling

Historically, you would read a table like this with the degrees of freedom of the numerator in the columns and the degrees of freedom of the denominator in the rows for a certain level of significance… but now we have technology that will give you these numbers…

Sampling

Let’s choose a level of significance of α = 0.05. For our example, the cut-off value of F0.05 is 5.32

Our F-statistic was 1.78

Can we reject the null hypothesis that the means of the treatments are equal?

NO!

Our F-stat = 1.78 < 5.32 = F0.05

We fail to reject the null hypothesis that the means are equal.

Sampling

Suppose you randomly selected five males and five females and looked at their SAT scores.

[Dot plot of the female and male SAT scores on an axis from 450 to 650]

Let's do the same as before, but with this data.

Sampling

$SSE = (5 - 1)(62.5) + (5 - 1)(62.5) = 500$

$MSE = \frac{SSE}{n - k} = \frac{500}{10 - 2} = 62.5$

$MST = 4000$ (the same as in the other example)

$F\text{-statistic} = \frac{MST}{MSE} = \frac{4000}{62.5} = 64.0$

Can we reject the null hypothesis that the means of the treatments are equal?

YES!

Our F-stat = 64.0 > 5.32 = F0.05

We can reject the null hypothesis that the means are equal.

Sampling

Since we rejected the null hypothesis that the means are equal, we can conclude that the SAT mean score of males differs from that of females.

[Dot plot of the female and male SAT scores on an axis from 450 to 650]

Sampling

This type of analysis is called ANOVA or Analysis of Variance

Source        df       SS                       MS                   F
Treatments    k − 1    SST                      MST = SST/(k − 1)    F = MST/MSE
Error         n − k    SSE                      MSE = SSE/(n − k)
Total         n − 1    SS(Total) = SST + SSE

Sampling

Total Sum of Squares, SS(Total), df = n − 1, splits into:

• Sum of Squares for Treatments, SST, df = k − 1

• Sum of Squares for Error, SSE, df = n − k

Sampling

Example: Find the F-statistic and determine if we can reject the null hypothesis of the means being equal at the 0.10 level of significance (F0.10=2.87)

Source        df            SS         MS                   F
Treatments    k − 1 = 3     2794.39    MST = SST/(k − 1)    F = MST/MSE
Error         n − k = 36    762.30     MSE = SSE/(n − k)
Total         n − 1 = 39    3556.69

Based on the table, tell me something about the experiment…

k=4, so we are comparing 4 different things

n=40, so we have 40 observations

Sampling

$MST = \frac{SST}{k - 1} = \frac{2794.39}{4 - 1} = 931.46$

$MSE = \frac{SSE}{n - k} = \frac{762.30}{40 - 4} = 21.18$

$F = \frac{MST}{MSE} = \frac{931.46}{21.18} = 43.99$

Sampling

$F = \frac{931.46}{21.18} = 43.99 > F_{0.10} = 2.87$

Can we reject the null hypothesis that the means of the treatments are equal at the 0.10 level of significance?

Yes!

Sampling Example

Robotics researchers investigated whether robots could be trained to behave like ants in an ant colony. Robots were trained and randomly assigned to “colonies” (i.e. groups) consisting of 3, 6, 9, or 12 robots. The robots were assigned the tasks of foraging for “food” and recruiting another robot when they identified a resource-rich area. One goal of the experiment was to compare the mean energy expended (per robot) of the four different sizes of colonies.

Sampling Example

1. Experiment or Observational Study? If experiment, what kind?

Randomized Experiment

2. Identify the treatments and the dependent variable.

Treatments: 3, 6, 9, 12 robots & Dependent Variable: Energy Expended

3. Set up the null and alternative hypotheses of the test.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$, $H_a$: at least two of the treatment means differ

4. The following results were reported:

• F=7.70

• numerator df=3, denominator df=56

• F0.05=2.76

Interpret the results.

Since F = 7.70 > F0.05 = 2.76, reject H0 and conclude that the means differ for the robot treatments

Sampling Example

We rejected H0 that all the means are equal for the robot treatments, but that doesn’t tell you anything about the differences between individual treatments

Now we want to test $\mu_1 = \mu_2$, $\mu_1 = \mu_3$, $\mu_1 = \mu_4$, $\mu_2 = \mu_3$, $\mu_2 = \mu_4$, $\mu_3 = \mu_4$

Essentially testing if the mean of the treatment with 3 robots is the same as the mean of the treatment with 6 robots, etc…

When you have equal treatment sample sizes you use the Tukey Method

You don’t want to do this by hand; instead, have the computer calculate the t-statistic and reject the hypothesis that the means are equal if the t-statistic found is larger than the t-statistic critical value (same procedure as when we used the F-statistic)

Sampling Example

A=3 robots, B=6 robots, C=9 robots, D=12 robots

Pair   Mean (1st)   Mean (2nd)   Variance (1st)   Variance (2nd)   Observations (each)   df   t Stat   P(T<=t) one-tail   t Critical one-tail
AB     250.78       261.06       22.42            14.95            10                    18   -5.32    0.00               1.33
AC     250.78       269.95       22.42            20.26            10                    18   -9.28    0.00               1.33
AD     250.78       249.32       22.42            27.07            10                    18   0.66     0.26               1.33
BC     261.06       269.95       14.95            20.26            10                    18   -4.74    0.00               1.33
BD     261.06       249.32       14.95            27.07            10                    18   5.73     0.00               1.33
CD     269.95       249.32       20.26            27.07            10                    18   9.48     0.00               1.33

The Mean columns give the means of the two samples in each pair.

For AB, 5.32 > 1.33, so we can reject the hypothesis that the mean of A is the same as the mean of B

Do this for every combination.

For which combination can you not reject the hypothesis that the means are the same?
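The t Stat row can be reproduced from the means, variances, and sample sizes alone. The sketch below assumes the software ran an equal-variance (pooled) two-sample t-test for each pair, which is consistent with df = 18 but is an inference from the table, not something the slides state. It only reproduces these pairwise t-statistics; it is not an implementation of the Tukey method. The names `colonies` and `pooled_t` are made up for the example.

```python
from itertools import combinations
from math import sqrt

# Summary statistics from the table: (mean, variance, observations).
colonies = {
    "A": (250.78, 22.42, 10),
    "B": (261.06, 14.95, 10),
    "C": (269.95, 20.26, 10),
    "D": (249.32, 27.07, 10),
}

T_CRITICAL = 1.33  # one-tail critical value quoted in the table


def pooled_t(stats_1, stats_2):
    """Equal-variance two-sample t-statistic and its degrees of freedom."""
    mean_1, var_1, n_1 = stats_1
    mean_2, var_2, n_2 = stats_2
    df = n_1 + n_2 - 2
    pooled_var = ((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / df
    t = (mean_1 - mean_2) / sqrt(pooled_var * (1 / n_1 + 1 / n_2))
    return t, df


results = {}
for first, second in combinations(colonies, 2):
    t, df = pooled_t(colonies[first], colonies[second])
    # Compare the size of t (ignoring its sign) with the critical value.
    results[first + second] = (round(t, 2), df, abs(t) > T_CRITICAL)

for pair, (t, df, reject) in results.items():
    print(pair, t, df, "reject" if reject else "fail to reject")
```

The sign of t only reflects which colony of the pair has the larger mean, so the comparison with the critical value uses the absolute value.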

Sampling

We can also introduce a better experimental design with better controls to help account for variability

Take our SAT score example, what else could we control for?

• School

• GPA

• Classes

• SES

Instead of selecting independent samples, we choose experimental units (students in this example) that are matched sets.

The matched sets are called blocks.

Sampling

Source        df               SS           MS                           F
Treatments    k − 1            SST          MST = SST/(k − 1)            F = MST/MSE
Blocks        b − 1            SSB          MSB = SSB/(b − 1)            F = MSB/MSE
Error         n − k − b + 1    SSE          MSE = SSE/(n − k − b + 1)
Total         n − 1            SS(Total)

Sampling

Randomized Block Design

Attempting to reduce the sampling variability of the experimental units in each block, which in turn reduces the MSE

Now again we compare SAT scores of male and female high school seniors, but now we select matched pairs of females and males according to their GPA and school

Sampling

Block                 Female SAT Score   Male SAT Score   Block Mean
School A, 2.75 GPA    540                530              535
School B, 3.00 GPA    570                550              560
School C, 3.25 GPA    590                580              585
School D, 3.50 GPA    640                620              630
School E, 3.75 GPA    690                690              690
Treatment Mean        606                594

Sampling

Follow the same procedure but now with the blocks

Start with the SST, which measures the variation between female and male means

$SST = \sum_{i=1}^{k} b (\bar{x}_{T_i} - \bar{x})^2$

where $\bar{x}_{T_i}$ is the sample mean for the ith treatment, $b$ is the number of blocks, and $k$ is the number of treatments

Squaring the distance between each treatment mean and the overall mean, multiplying each squared distance by the number of measurements for the treatment, and then summing over treatments

Sampling

$SST = \sum_{i=1}^{k} b (\bar{x}_{T_i} - \bar{x})^2 = 5(606 - 600)^2 + 5(594 - 600)^2 = 360$

Here 5 is the number of blocks, 606 and 594 are the female and male treatment means, and 600 is the overall mean.

Sampling

Now calculate the Sum of Squares for Blocks (SSB)

Measure of variation among the five block means representing different schools and GPA

$SSB = \sum_{i=1}^{b} k (\bar{x}_{B_i} - \bar{x})^2$

where $\bar{x}_{B_i}$ is the sample mean for the ith block and $k$ is the number of treatments

Square the difference between each block mean and the overall mean, multiply each squared difference by the number of measurements for each block, and then sum over all blocks

Sampling

$SSB = \sum_{i=1}^{b} k (\bar{x}_{B_i} - \bar{x})^2 = 2(535 - 600)^2 + 2(560 - 600)^2 + 2(585 - 600)^2 + 2(630 - 600)^2 + 2(690 - 600)^2 = 30100$

Here 2 is the number of treatments, 535 through 690 are the five block means, and 600 is the overall mean.

Sampling

In a randomized block design, the sampling variability is measured by subtracting the portion attributed to treatments and blocks from the total sum of squares, SS(Total)

$SS(Total) = \sum_{i=1}^{n} (x_i - \bar{x})^2$

Sampling

$SS(Total) = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (540 - 600)^2 + (530 - 600)^2 + \cdots + (690 - 600)^2 = 30600$

Here the sum runs over all 10 SAT scores and 600 is the overall mean.

Sampling

In a randomized block design, the sampling variability is measured by subtracting the portion attributed to treatments and blocks from the total sum of squares, SS(Total)

$SS(Total) = SST + SSB + SSE$

where SST is the Sum of Squares for Treatment, SSB is the Sum of Squares for Blocks, and SSE is the Sum of Squares for Error

$SSE = SS(Total) - SST - SSB = 30600 - 360 - 30100 = 140$
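All four sums of squares for the matched SAT data can be computed directly from the block table. The sketch below is one way to do it in Python; the layout (rows as blocks, columns as treatments) and the function name are choices made for this example.

```python
from statistics import mean

# Rows are blocks (School A to School E); columns are treatments (female, male).
scores = [
    [540, 530],
    [570, 550],
    [590, 580],
    [640, 620],
    [690, 690],
]


def block_design_sums_of_squares(table):
    """Return SST, SSB, SS(Total) and SSE for a randomized block design."""
    b = len(table)       # number of blocks
    k = len(table[0])    # number of treatments
    overall_mean = mean([x for row in table for x in row])
    treatment_means = [mean([row[j] for row in table]) for j in range(k)]
    block_means = [mean(row) for row in table]

    sst = sum(b * (m - overall_mean) ** 2 for m in treatment_means)
    ssb = sum(k * (m - overall_mean) ** 2 for m in block_means)
    ss_total = sum((x - overall_mean) ** 2 for row in table for x in row)
    sse = ss_total - sst - ssb   # what is left after treatments and blocks
    return {"SST": sst, "SSB": ssb, "SS(Total)": ss_total, "SSE": sse}


sums = block_design_sums_of_squares(scores)
print(sums)
```

Almost all of the total variation (30100 of 30600) is explained by the blocks, which is exactly why matching on school and GPA shrinks the error term so much.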

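Sampling

Randomized Design

Total Sum of Squares, SS(Total), df = n − 1, splits into:

• Sum of Squares for Treatments, SST, df = k − 1

• Sum of Squares for Error, SSE, df = n − k

Randomized Block Design

The Sum of Squares for Error of the randomized design splits further into:

• Sum of Squares for Blocks, SSB, df = b − 1

• Sum of Squares for Error, SSE, df = n − b − k + 1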

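Sampling

$MST = \frac{SST}{k - 1} = \frac{360}{2 - 1} = 360$

$MSE = \frac{SSE}{n - b - k + 1} = \frac{140}{10 - 5 - 2 + 1} = 35$

$F\text{-statistic} = \frac{MST}{MSE} = \frac{360}{35} = 10.29$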

Sampling

Source        df                                    SS       MS                                                  F
Treatments    k − 1 = 2 − 1 = 1                     360      MST = SST/(k − 1) = 360/(2 − 1) = 360               F = MST/MSE = 360/35 = 10.29
Blocks        b − 1 = 4                             30100    MSB = SSB/(b − 1) = 30100/4 = 7525                  F = MSB/MSE = 7525/35 = 215
Error         n − k − b + 1 = 10 − 2 − 5 + 1 = 4    140      MSE = SSE/(n − k − b + 1) = 140/(10 − 2 − 5 + 1) = 35
Total         n − 1 = 9                             30600

Use the F in the Treatments row (10.29) to test the difference in the means of the treatments

Use the F in the Blocks row (215) to test the difference in the means of the blocks
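The whole table can be regenerated from the three sums of squares plus the number of treatments and blocks. The sketch below assumes one observation per treatment in each block, so n = k × b; the function name `block_anova` and the dictionary layout are illustrative choices only.

```python
def block_anova(sst, ssb, sse, k, b):
    """ANOVA quantities for a randomized block design with k treatments and b blocks."""
    n = k * b                 # assumes one observation per treatment in each block
    df_error = n - k - b + 1
    mst = sst / (k - 1)
    msb = ssb / (b - 1)
    mse = sse / df_error
    return {
        "df": {"treatments": k - 1, "blocks": b - 1, "error": df_error, "total": n - 1},
        "MST": mst,
        "MSB": msb,
        "MSE": mse,
        "F_treatments": mst / mse,   # tests the treatment means
        "F_blocks": msb / mse,       # tests the block means
    }


table = block_anova(sst=360, ssb=30100, sse=140, k=2, b=5)
print(round(table["F_treatments"], 2), round(table["F_blocks"], 2))
```

A quick sanity check on any ANOVA table is that the treatment, block, and error degrees of freedom add up to the total, here 1 + 4 + 4 = 9 for the 10 SAT scores.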

Sampling

$F_{0.05} = 7.71$

$F = 10.29 > F_{0.05} = 7.71$

Can we reject the null hypothesis that the mean SAT scores are the same for females and males?

YES! And we can conclude that the mean SAT scores differ for females and males.

Sampling Example

A randomized block design yielded the following results:

Source        df    SS     MS        F
Treatments    4     501    125.25    9.11
Blocks        2     225    112.5     8.18
Error         8     110    13.75
Total         14    836


1. How many blocks and treatments were used in this experiment?

3 blocks, 5 treatments


2. How many observations were collected in the experiment?

15 observations
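Reading the design back out of the degrees-of-freedom column is simple arithmetic, sketched here in Python. The function name `design_from_df` is an assumption for the example; the numbers are the ones in the results table.

```python
def design_from_df(df_treatments, df_blocks, df_error):
    """Recover the number of treatments, blocks and observations from the df column."""
    k = df_treatments + 1                            # df = k - 1
    b = df_blocks + 1                                # df = b - 1
    n = df_treatments + df_blocks + df_error + 1     # total df = n - 1
    return k, b, n


k, b, n = design_from_df(df_treatments=4, df_blocks=2, df_error=8)

# The MS and F columns follow from SS and df in the same table.
mse = 110 / 8
f_treatments = (501 / 4) / mse
f_blocks = (225 / 2) / mse
print(k, b, n, round(f_treatments, 2), round(f_blocks, 2))
```

Since each block receives every treatment once, n should also equal k × b, which gives 5 × 3 = 15 here.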

3. Specify the null and alternative hypotheses you would use to compare the treatment means.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$, $H_a$: at least two of the treatment means differ

4a. Which test statistic should you use to test the null hypothesis regarding treatment means?

$F\text{-statistic} = \frac{MST}{MSE}$

4b. Which test statistic should you use to test the null hypothesis regarding block means?

$F\text{-statistic} = \frac{MSB}{MSE}$

5. Conduct the test for treatment means against F0.05=3.84 and interpret the results.

$F = 9.11 > F_{0.05} = 3.84$

Reject H0 that the treatment means are equal


Today’s Agenda

• Experimental Design

• Experiment vs Observational

• Random Sampling

• ANOVA

• Randomized Block Design

Next Class

LAB DAY

To Do List

Homework Assignment #6

Due 3/12/2021 by 12:00pm (noon)
