Biostatistics M235
Winter 2021
Homework 1
Due: Friday, Feb. 12, 2021
1. A partially filled-in analysis of variance table is shown below.

Source of variation    df      SS        MS       F
A                      2       ______    12.07    ______
B                      ____    ______    ______   ______
AB interaction         6       ______    ______   3.04
Error                  ____    ______    12.09
Total                  59      870.93
1.(a) Fill in the blanks in the table above
1.(b) How many treatment combinations are there in this study (where a treatment combination refers to
a unique combination of one of the levels of Factor A and one of the levels of Factor B)?
1.(c) Identify critical values at the α = 0.05 level for the F tests of the A main effect, B main effect, and
AB interaction effect. In addition, describe which, if any, of the F-tests are statistically significant at the
α = 0.05 level.
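The bookkeeping identities that link the entries of the table in problem 1 can be sketched in a few lines of Python. This is a minimal sketch rather than an official solution: it assumes a balanced two-factor layout, so that the interaction df equals (a - 1)(b - 1), and all variable names are illustrative.

```python
# Known entries from the partially filled-in ANOVA table.
df_A, ms_A = 2, 12.07
df_AB, f_AB = 6, 3.04
ms_error = 12.09
df_total, ss_total = 59, 870.93

# Assumption: df_AB = (a - 1)(b - 1) with a - 1 = df_A, which gives the df for B.
df_B = df_AB // df_A
df_error = df_total - df_A - df_B - df_AB

# SS = df * MS, and F = MS / MS_error.
ss_A = df_A * ms_A
ms_AB = f_AB * ms_error
ss_AB = df_AB * ms_AB
ss_error = df_error * ms_error

# The sums of squares add up to the total, which pins down SS_B.
ss_B = ss_total - ss_A - ss_AB - ss_error
ms_B = ss_B / df_B
f_A = ms_A / ms_error
f_B = ms_B / ms_error

print("df:", df_A, df_B, df_AB, df_error)
print("SS:", round(ss_A, 2), round(ss_B, 2), round(ss_AB, 2), round(ss_error, 2))
print("F :", round(f_A, 2), round(f_B, 2), f_AB)
```

The resulting F statistics would still need to be compared with tabled critical values on the corresponding numerator and error degrees of freedom.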
2. Characterize each of the following statements as “True” or “False”, and explain your reasoning in a
sentence or two.
2.(a) If you were to use regression analysis with dummy variables to analyze data from a two-factor
factorial design, then the number of dummy variables used to represent main effects of the study factors is
guaranteed to be greater than or equal to the number of dummy variables used to represent interaction
effects involving the study factors.
2.(b) If a two-way interaction effect is statistically significant, then the main effects of the factors
contributing to the interaction effect must also be statistically significant.
2.(c) In a multi-way factorial design with at least 2 replications of each treatment combination, the
number of degrees of freedom for error in the factorial-analysis-of-variance breakdown is always larger
than the number of degrees of freedom for interaction effects.
2.(d) In a multi-way factorial design with each treatment factor at 2 levels, it would be possible to
determine the significance of every main effect and interaction term using tabled values from the family
of t distributions.
2.(e) If an F-test for one-way ANOVA is statistically significant at α = 0.05, then there must be a contrast
between group means that is also statistically significant at α = 0.05.
3. A study is done to explore patterns of milk consumption in boys, featuring 10 boys in each of four age
categories. Average milk consumption (in ounces) in each group is described below:
                                  Age
                                  6-8     8-10    10-12   12-14
Average milk consumption, boys    20.0    24.0    25.0    27.0
3.(a) After assigning labels to group means, write down contrast coefficients appropriate for testing
whether there is a linear trend in milk consumption across age groups, and calculate the estimated value of
the contrast.
3.(b) Suppose data were also collected on 10 girls in each of the four age categories, with the following
results:
                                  Age
                                  6-8     8-10    10-12   12-14
Average milk consumption, girls   22.0    21.0    20.0    17.0
Suppose you are also told that the MSE in a two-way ANOVA (involving age group and gender) is equal
to 9.00. Making use of the formula

$$SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \left( \bar{Y}_{ij} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot j} + \bar{Y}_{\cdot\cdot} \right)^2 ,$$

which is appropriate for a balanced design with n replicates for each combination of age group and
gender, comment on whether the effects of gender and age are additive.
3.(c) (6 points) Describe a contrast that could be used to test whether there is a difference between the
degree to which there is a linear trend for boys and the degree to which there is a linear trend for girls, and
carry out the test at the α = 0.05 significance level, making use of the information provided in problems
3.(a) and 3.(b).
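The interaction sum-of-squares formula from problem 3.(b) can be evaluated directly from the two tables of cell means. The following is a minimal sketch, not an official solution; the variable names are illustrative, and the layout (rows for gender, columns for age group) is simply one convenient choice.

```python
# Cell means from problems 3.(a) and 3.(b): rows are gender, columns are age group.
cell_means = [
    [20.0, 24.0, 25.0, 27.0],  # boys
    [22.0, 21.0, 20.0, 17.0],  # girls
]
n = 10      # children in each gender-by-age cell
mse = 9.00  # mean squared error given in problem 3.(b)

a, b = len(cell_means), len(cell_means[0])
row_means = [sum(row) / b for row in cell_means]
col_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
grand_mean = sum(row_means) / a  # valid because the design is balanced

# SS_AB = n * sum over cells of (cell - row mean - column mean + grand mean)^2
ss_ab = n * sum(
    (cell_means[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
    for i in range(a)
    for j in range(b)
)
df_ab = (a - 1) * (b - 1)
f_ab = (ss_ab / df_ab) / mse

print("SS_AB =", ss_ab, " df =", df_ab, " F =", round(f_ab, 2))
```

With 8 cells of 10 observations each, the error term has 80 - 8 = 72 degrees of freedom, so the F statistic would be compared with a tabled critical value on (3, 72) degrees of freedom.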
4. Suppose data are collected on blood oxygen levels of 20 males who smoke at least one pack of
cigarettes a day: Group 1 (n1=5) aged 40-49, Group 2 (n2=5) aged 50-59, Group 3 (n3=5) aged 60-69, and
Group 4 (n4=5) aged 70-79. Suppose also that the mean-squared error in a one-way analysis of variance
across these four groups is 25.0.
(a) Suppose the sample means across Groups 1, 2, 3, and 4 are 79, 81, 82, and 86, respectively. Carry
out the ANOVA F-test and orthogonal polynomial contrasts (linear, quadratic, and cubic), and
summarize in a sentence the main findings from these analyses.
(b) Suppose the sample means across Groups 1, 2, 3, and 4 are 79, 86, 82, and 81, respectively. Carry
out the ANOVA F-test and orthogonal polynomial contrasts (linear, quadratic, and cubic), and
summarize in a sentence the main findings from these analyses.
(c) Comment on the similarities and differences in findings between (a) and (b).
(d) Describe a reasonable approach in this context regarding multiple comparisons, and comment on
how a multiple-comparisons perspective might affect your summary of the study results.
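The contrast calculations requested in problem 4 can be sketched as follows. This is a minimal sketch under one assumption not stated in the problem: the four age decades are treated as equally spaced, so the standard orthogonal polynomial coefficients for four groups apply. The function names are illustrative.

```python
# Standard orthogonal polynomial coefficients for 4 equally spaced groups
# (assumption: the age decades are treated as equally spaced).
CONTRASTS = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}


def contrast_ss(means, coefs, n):
    """Sum of squares for one contrast with n observations per group."""
    estimate = sum(c * m for c, m in zip(coefs, means))
    return n * estimate ** 2 / sum(c ** 2 for c in coefs)


def between_ss(means, n):
    """Between-group sum of squares for equal group sizes."""
    grand = sum(means) / len(means)
    return n * sum((m - grand) ** 2 for m in means)


n, mse = 5, 25.0
for label, means in [("(a)", [79, 81, 82, 86]), ("(b)", [79, 86, 82, 81])]:
    parts = {name: contrast_ss(means, c, n) for name, c in CONTRASTS.items()}
    f_overall = (between_ss(means, n) / (len(means) - 1)) / mse
    # Each contrast carries 1 df, so its F statistic is SS / MSE.
    f_parts = {name: round(ss / mse, 2) for name, ss in parts.items()}
    print(label, "contrast SS:", parts)
    print(label, "overall F:", round(f_overall, 2), "contrast F:", f_parts)
```

Because the three contrasts are mutually orthogonal, their sums of squares add up to the between-group sum of squares, which gives a quick check on the arithmetic.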
5. Consider a study of men aged 55 where their heart rate after running on a treadmill is measured in a
setting where the investigator exercises control over four factors:
A: Temperature in the room (a1: 72°F , a2: 84°F)
B: Whether the treadmill speed is 6.0 miles/hour or 7.2 miles/hour (b1: 6.0 , b2: 7.2)
C: Relative humidity in the room (c1: 40% , c2: 80%)
D: Oxygen level in room (d1: normal = 20%, d2: high = 24%)
Suppose heart rate measurements are obtained for one individual for each possible treatment combination.
5.(a) Suppose it is decided to carry out the analysis assuming that there are no interaction effects higher
than two-way interactions. List sources of variation that would appear in an ANOVA table for such an
analysis along with corresponding degrees of freedom.
5.(b) Suppose instead of studying one individual for each possible treatment combination among four
factors where each factor is at two levels, it was decided to study one individual for each possible
treatment combination among two factors where each factor is at four levels, namely:
A: Temperature in the room (a1: 72°F , a2: 76°F, a3: 80°F , a4: 84°F)
B: Treadmill speed (b1: 6.0 miles/hr , b2: 6.4 miles/hr, b3: 6.8 miles/hr , b4: 7.2 miles/hr)
Describe a procedure that could be used to test whether there is a significant multiplicative interaction
between factor A and factor B, and describe the structure of an ANOVA table (i.e., list sources of
variation and corresponding degrees of freedom along with notation such as “SSA” for sum of squares
associated with factor A) that would provide a basis for testing whether there is a significant
multiplicative interaction between factor A and factor B.
5.(c) Discuss whether there are any advantages of the design described in problem 5.(a) compared to the
design in problem 5.(b) as well as whether there are any advantages of the design in problem 5.(b) over
the design in problem 5.(a) in terms of the ability to detect significant differences that might be of
scientific interest.
6. An experiment has been run to compare mobility scores (Y) at three different doses (X) of a drug,
where the doses considered were X = 5, 10, or 40. Fifteen observations were made overall, with five
observations taken at each of X = 5, X = 10, and X = 40.
Summary statistics for the data from each group are given in the table below:
Dose group   Sample size   Dose   Sample mean (by group)   Sample variance (by group)
g=1          n1 = 5        5      Ȳ1 = 7.6                 S1² = 4.300 = (2.073644)²
g=2          n2 = 5        10     Ȳ2 = 11.0                S2² = 0.625 = (0.790569)²
g=3          n3 = 5        40     Ȳ3 = 13.8                S3² = 8.200 = (2.863564)²
A scatterplot of mobility score vs. dose, with the fitted regression line included, is shown below:
The following output comes from the regression of mobility score on dose:
The REG Procedure
Dependent Variable: Mobility

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model               1    78.39070     78.39070      14.45    0.0022
Error              13    70.50930      5.42379
Corrected Total    14   148.90000

Root MSE           2.32890    R-Square    0.5265
Dependent Mean    10.80000    Adj R-Sq    0.4900
Coeff Var         21.56392

                   Parameter    Standard
Variable     DF     Estimate       Error    t Value    Pr > |t|
Intercept     1      8.08837     0.93291       8.67      <.0001
Dose          1      0.14791     0.03891       3.80      0.0022
The following output comes from a one-way analysis of variance treating dose groups as a class variable:
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model               2    96.40000     48.20000      11.02    0.0019
Error              12    52.50000      4.37500
Corrected Total    14   148.90000
6.(a) Suppose you want to explain the difference in the F-test findings from these two analyses to a
colleague who is a subject-matter specialist in the field being studied (perhaps a medical doctor or a
public-health official) whom you know to be familiar with regression and analysis of variance based on
knowing that the colleague obtained a master's degree in public health. Explain in a few sentences why
the F-test results are different in a way that you expect would be convincing to this colleague.
6.(b) Making use of the formula

$$SS_{\text{Treatment}} = \sum_{i=1}^{3} n_i \left( \bar{Y}_i - \bar{Y} \right)^2 ,$$

which is appropriate for a one-way layout with ni replicates in each treatment arm, show how the
Model sum-of-squares value in the analysis-of-variance output is calculated.
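The treatment sum-of-squares formula in problem 6.(b) can be checked numerically against the summary statistics given for the three dose groups. This is a minimal sketch with illustrative variable names; the pooled within-group sum of squares is included only as a cross-check against the Error line of the one-way analysis-of-variance output.

```python
# Summary statistics from the table of dose groups (X = 5, 10, 40).
group_sizes = [5, 5, 5]
group_means = [7.6, 11.0, 13.8]
group_vars = [4.300, 0.625, 8.200]

n_total = sum(group_sizes)
# Overall mean as the size-weighted average of the group means.
grand_mean = sum(n * m for n, m in zip(group_sizes, group_means)) / n_total

# SS_Treatment = sum over groups of n_i * (group mean - overall mean)^2
ss_treatment = sum(
    n * (m - grand_mean) ** 2 for n, m in zip(group_sizes, group_means)
)

# Pooled within-group sum of squares: sum over groups of (n_i - 1) * S_i^2
ss_error = sum((n - 1) * v for n, v in zip(group_sizes, group_vars))

print("grand mean   =", round(grand_mean, 4))
print("SS_Treatment =", round(ss_treatment, 4))
print("SS_Error     =", round(ss_error, 4))
```

The two sums of squares can be compared with the Model and Error rows of the one-way output, and their total with the Corrected Total row.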
6.(c) Instead of fitting a linear regression of mobility score (Y) versus dose (X), suppose we had
introduced indicators for dose groups, letting the X=5 group be the reference level, defining
X1 = 1 if dose = 10, and 0 otherwise;
X2 = 1 if dose = 40, and 0 otherwise,
and fitting the regression of Y on X1 and X2. What would be the estimated values of β1 (the coefficient
of X1) and β2 (the coefficient of X2) using this approach?
6.(d) (6 points) Suppose we were to fit the regression of Y on X and X², where X² is the square of the
dose value X. Comment on whether you think the coefficient of X² would be positive, negative, or
identically equal to zero, and describe in a sentence the evidence that leads you to this conclusion.
6.(e) When we fit the regression of Y on X1 and X2 as defined in part 6.(c), we obtain the following
analysis-of-variance table:
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model               2    96.40000     48.20000      11.02    0.0019
Error              12    52.50000      4.37500
Corrected Total    14   148.90000
When we fit the regression of Y on X and X² as defined in part 6.(d), we obtain the following
analysis-of-variance table:
Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square    F Value    Pr > F
Model               2    96.40000     48.20000      11.02    0.0019
Error              12    52.50000      4.37500
Corrected Total    14   148.90000
In a way that would be understandable to a professional colleague, describe in a sentence or two why
these different analyses give rise to identical analysis-of-variance tables. Also describe why the approach
in part 6.(c) can be viewed as providing a test of non-linearity.
7. True or false: In a 3×3 factorial design involving Factor A (with levels a1, a2, and a3) and Factor B
(with levels b1, b2, and b3) with 1 replication per treatment combination, if the outcomes are given by
                   Factor B
                b1    b2    b3
Factor A   a1    2     2     2
           a2    2     3     4
           a3    2     4     6
then a Tukey one-degree-of-freedom test for non-additivity would be significant at the 0.05 level.
Explain your reasoning. (Hint: It is possible to answer this question without having memorized the
analysis-of-variance formula for the Tukey test of non-additivity.)
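For readers who want to check their reasoning numerically, the quantities behind Tukey's test can be computed from the 3×3 table in problem 7. This is a minimal sketch, not an official solution: it uses the usual one-degree-of-freedom decomposition of the residual sum of squares under the additive model, and the variable names are illustrative.

```python
# Outcomes from the table in problem 7: rows are a1..a3, columns are b1..b3.
y = [
    [2, 2, 2],  # a1
    [2, 3, 4],  # a2
    [2, 4, 6],  # a3
]
a, b = len(y), len(y[0])
grand = sum(map(sum, y)) / (a * b)

# Estimated main effects under the additive model.
alpha = [sum(row) / b - grand for row in y]
beta = [sum(y[i][j] for i in range(a)) / a - grand for j in range(b)]

# Residual (interaction) sum of squares after fitting the additive model.
ss_resid = sum(
    (y[i][j] - grand - alpha[i] - beta[j]) ** 2
    for i in range(a)
    for j in range(b)
)

# Tukey's one-degree-of-freedom sum of squares for non-additivity.
numerator = sum(alpha[i] * beta[j] * y[i][j] for i in range(a) for j in range(b))
ss_nonadd = numerator ** 2 / (sum(x ** 2 for x in alpha) * sum(x ** 2 for x in beta))

# What is left over serves as the error term for the test.
ss_remainder = ss_resid - ss_nonadd

print("residual SS      =", ss_resid)
print("non-additivity SS =", ss_nonadd)
print("remainder SS      =", ss_remainder)
```

Comparing the non-additivity sum of squares with the remainder shows how much of the departure from additivity is captured by the product of the estimated row and column effects, which is the structure the hint points toward.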