R代写-MODELS 639
时间:2022-05-28
MULTILEVEL LINEAR MODELS 639
suggesting that the simple random effect structure for the clustering of fetal weights
is adequate. The estimate of the intra-cluster correlation, p = 0.57, indicates that
there are moderate litter effects.
Finally, to assess the adequacy of the linear dose-response trend, we considered a
model that included a quadratic effect of (transformed) dose. Both Wald and likeli-
hood ratio tests of the quadratic effect of dose indicated that the linear trend is adequate
for these data (Wald W 2 = 1.38, with 1 df, p > 0.20; likelihood ratio G2 = 1.37,
with l df, p > 0.20).
Television School and Family Smoking Prevention and Cessation Project
Although smoking prevalence has declined among adults in recent decades, substan-
tial numbers of young people begin to smoke and become addicted to tobacco. The
Television, School and Family Smoking Prevention and Cessation Project (TVSFP)
was a study designed to determine the efficacy of a school-based smoking prevention
curriculum in conjunction with a television-based prevention program, in terms of
preventing smoking onset and increasing smoking cessation (Flay et al., 1995). The
study used a 2 x 2 factorial design, with four intervention conditions determined by
the cross-classification of a school-based social-resistance curriculum (CC: coded 1
= yes, 0 = no) with a television-based prevention program (TV: coded 1 = yes. 0 =
no). Randomization to one of the four intervention conditions was at the school level,
while much of the intervention was delivered at the classroom level. The TVSFP
study is described in greater detail in Flay et al. (1995).
The original study involved 6695 students in 47 schools in Southern California.
Our analysis focuses on a subset of 1600 seventh-grade students from 135 classes in
28 schools in Los Angeles. The response variable, a tobacco and health knowledge
scale (THKS), was administered before and after randomization of schools to one of
the four intervention conditions. The scale assessed a student's knowledge of tobacco
and health.
We considered a linear model for the post-intervention THKS score, with the
baseline or pre-intervention THKS score as a covariate. This model for the adjusted
change in THKS scores included the main effects of CC and TV and the CC x TV
interaction. School and classroom effects were modeled by incorporating random
effects at levels 3 and 2, respectively. Letting Yijk denote the post-intervention THKS
score of the ith student within the lh classroom within the k th school, our model is
given by
(3 (3 (3 b(3) (2) Yijk = f31 + fJ2Pre-THKS + 3CC + 4TV + sCC x TV+ k + b1k + Eijk,
where Eijk ~ N(0, O"i), bJ~ ~ N(0, O"~), and b~3) ~ N(0, 0"5). Once again, in
a slight departure from the notation introduced previously, the first-, second-, and
third-level variances are denoted by O"i, O"~, and 0"5, respectively.
The results of fitting this model to the data are presented in Table 22.3. The
REML estimates of the three sources of variability indicate that there is variability at
both classroom and school levels, with almost twice as much variability among class-
640 MULTILEVEL MODELS
Table 22.3 Fixed and random effects estimates for the THKS scores from the Television,
School and Family Smoking Prevention and Cessation Project.
Variable
Intercept
Pre-Intervention THKS
cc
TV
cc X TV
Level 3 Variance:
O"§
Level 2 Variance:
O"§
Level 1 Variance:
af
Estimate
1.702
0.305
0.641
0.182
-0.331
0.039
0.065
1.602
SE z
0.1254 13.57
0.0259 11.79
0.1609 3.99
0.1572 1.16
0.2245 -1.47
0.0253
0.0286
0.0591
rooms within a school as among schools. The correlation among the THKS scores for
classmates (or children within the same classroom within the same school) is approx-
imately 0.061 (or 0_0~;}:~1~~~~602 ), while the correlation among the THKS scores for
children from different classrooms within the same school is approximately 0.023 (or
0_039+0o~ti+ 1 _602 ). The estimates of the fixed effects for the intervention conditions,
when compared to their standard errors, indicate that neither the mass-media inter-
vention (TV) nor its interaction with the social-resistance classroom curriculum (CC)
have an impact on adjusted changes in the THKS scores from baseline. There is a sig-
nificant effect of the social-resistance classroom curriculum, with children assigned
to the social-resistance curriculum showing increased knowledge about tobacco and
health. The estimate of the main effect of CC, in the model that excludes the CC x
TV interaction, is 0.47 (SE= 0.113, p < 0.0001).
The intra-cluster correlations at both the school and classroom levels are relatively
small. The reader might be tempted to regard this as an indication that the clustering
in these data is inconsequential. However, such a conclusion would be erroneous.
Although the intra-cluster correlations are relatively small, they have a substantial im-
pact on inferences concerning the effects of the intervention conditions. To illustrate
this, consider the following model for the adjusted changes in THKS scores:
Y.;jk = /31 + /32Pre-THKS + /33CC + /34TV + /35CC x TV+ Eijk,
where Eijk ~ N(O, a 2). This model ignores clustering in the data at the classroom
and school levels; it is a standard linear regression model and assumes independent
MULTILEVEL GENERALIZED LINEAR MODELS 641
Table 22.4 Fixed effects estimates from analysis that ignores clustering in the THKS scores
from the Television, School and Family Smoking Prevention and Cessation Project.
Variable Estimate SE z
Intercept 1.661 0.0844 19.69
Pre-Intervention THKS 0.325 0.0258 12.58
cc 0.641 0.0921 6.95
TV 0.199 0.0900 2.21
CC x TV -0.322 0.1302 -2.47
observations and homogeneous variance. The results of fitting this model to the THKS
scores are presented in Table 22.4. The estimates of the fixed effects are similar to
those reported in Table 22.3. However, the model-based standard errors (assuming
no clustering) are misleadingly small for the randomized intervention effects and lead
to substantively different conclusions about the effects of the intervention conditions.
This highlights an important lesson: the impact of clustering depends on both the
magnitude of the intra-cluster correlation and the cluster size. For the data from the
TVSFP, the cluster sizes vary from 1 to 13 classrooms within a school and from 2 to
28 students within a classroom. With relatively large cluster sizes, even very modest
intra-cluster correlation can have a discernible impact on inferences.
22.4 MULTILEVEL GENERALIZED LINEAR MODELS
So far the discussion of multilevel models has focused on linear models for a continu-
ous response, where clustering was accounted for through the introduction of random
effects at different levels. Next we briefly describe how multilevel modeling can be
extended to discrete response data. These models can be thought of as multilevel
generalized linear models, and they extend in a natural way the conceptual approach
described in Section 22.3. However, they differ in terms of assumptions concern-
ing the distribution of observations at level 1. The level 1 observations are no longer
required to have a normal distribution; instead, they are assumed to have a distribution
belonging to the exponential family ( e.g., Bernoulli or Poisson). We focus on models
for two- and three-level data; the generalizations to more levels follow directly.
22.4.1 Two-Level Generalized Linear Models
The basic premise of multilevel generalized linear models is that clustering among
units can be thought of as arising from their sharing a set of random effects. For
example, with two-level binary data, it is assumed that the clustering of level 1 units
642 MULTILEVEL MODELS
(within level 2 units) can be accounted for by heterogeneity across level 2 clusters
in a subset of the regression coefficients from a generalized linear model (e.g., a
logistic regression model with randomly varying intercepts). Conditional on the
random effects, the level 1 observations are assumed to be independent and with a
distribution belonging to the exponential family (e.g., Bernoulli).
In our description of two-level generalized linear models we adopt the notation
used earlier. Let Y;j denote the response on the ith level 1 unit in the lh level 2
cluster; the response can be continuous, binary, or a count. Associated with each Y;j
is a 1 x p (row) vector of covariates, Xij• We can formulate two-level models for
discrete ( and continuous) outcomes, 1'.;j, using the familiar three-part specification of
generalized linear mixed effects models outlined in Chapter 14:
1. We assume that the conditional distribution of each Y;j, given a vector of
random effects bj (and the covariates), belongs to the exponential family of
distributions and that Var(Y;jlbj) = v{E(Y;jlbj)} ¢, where v(·) is a known
variance function, a function of the conditional mean, E(Y;j lbj ), and ¢ is a
scale or dispersion parameter. In addition, given the random effects bj, it is
assumed that the Y;j are independent of one another.
2. The conditional mean of Y;j is assumed to depend on fixed and random effects
via the following linear predictor:
with
g{ E(Y;j lbj)} = 7/ij = Xij/3 + Zijbj
for some known link function, g ( ·).
3. The random effects are assumed to have some probability distribution. In
principle, any multivariate distribution can be assumed for the b1; in practice,
for computational convenience, the random effects are usually assumed to have
a multivariate normal distribution, with zero mean and covariance matrix, G.
These three components completely specify a broad class of two-level generalized
linear models for different types of responses. Next, to clarify the main ideas, we
consider two examples of multilevel generalized linear models in greater detail.
Example 1 : Two-Level Generalized Linear Model for Counts
Consider a study comparing cross-national rates of skin cancer and the factors (e.g.,
climate, economic and social factors, regional differences in diagnostic procedures)
that influence variability in the rates of disease. Suppose that we have counts of the
number of cases of skin cancer in a set of well-defined regions, indexed by i, within
countries, indexed by j. Let Y.;1 be a count of the number of individuals who develop
skin cancer within the i th region of the lh country during a given period of time (e.g.,
5 years). The resulting counts have a two-level structure with regional units at the
MULTILEVEL GENERALIZED LINEAR MODELS 643
lower level (level 1 units) nested within countries (level 2 units). Usually the analysis
of count data requires knowledge of the denominator, the population at risk. That is,
the rate at which the disease occurs is of more direct interest than the corresponding
count.
Counts are often modeled as Poisson random variables using a log link function.
This motivates the following illustration of a two-level generalized linear model for
Y;j given by the three-part specification:
1. Conditional on a vector of random effects bj, the Y;j are assumed to be indepen-
dent observations from a Poisson distribution, with Var(Y;j lb1) = E(Y;j lbj ),
(i.e., ¢ = 1).
2. The conditional mean of Y;1 depends on fixed and random effects via the
following log link function,
log {E(Y;jlb1)} = log(Ti1) + Xi1(3 + Zijb1,
where Tij is the population at risk in the i th region of the jlh country and
log(Tij) is an offset.
3. The random effects are assumed to have a multivariate normal distribution,
with zero mean and covariance matrix G.
This is an example of a two-level log-linear model that assumes a linear relationship
between the log rate of disease occurrence and the covariates.
Example 2: Two-Level Generalized Linear Model for Binary Responses
Consider a study of men with newly diagnosed prostate cancer. The study is designed
to evaluate the factors that determine physician recommendations for surgery (radical
prostatectomy) versus radiation therapy. In particular, it is of interest to determine
the relative importance of patient factors (e.g., patient's age, level of prostate spe-
cific antigen) and physician factors (e.g., specialty training, years of experience) on
physician recommendations for treatment. Many patients in the study seek the rec-
ommendation of the same physician. As a result patients (level 1 units) are nested
within physicians (level 2 units). For each patient we have a binary outcome denoting
the physician recommendation (surgery versus radiation therapy).
Let Y;j be the binary response, taking values O and 1 (e.g., denoting surgery or
radiation therapy) for the ith patient of the jlh physician. An illustrative example of
a two-level logistic model for Y;j is given by the following three-part specification:
1. Conditional on a single random effect bj, the Y;j are independent and have a
Bernoulli distribution, with Var(Y;j lbj) = E(Y;j lbj) {1 - E(Y;j lbj )},
(i.e.,¢= 1).
2. The conditional mean of Y;j depends on fixed and random effects via the
following linear predictor:
644 MULTILEVEL MODELS
with
{ Pr(Y;1 = llb1)} log Pr(Y;j = 0lb1) = 'T/ij = Xij/3 + b1.
That is, the conditional mean of Y;1 is related to the linear predictor by a logit
link function.
3. The single random effect b1 is assumed to have a univariate normal distribution,
with zero mean and variance g 11 .
In this example the model is a simple two-level logistic regression model with ran-
domly varying intercepts. In principle, the linear predictor can include additional
random effects. However, some caution must be exercised because there is usually
not much information in binary data to estimate more than a single variance compo-
nent unless the number of level l units is relatively large.
In Section 11.3 we discussed how the logistic regression model for binary data can
also be developed through the notion of an underlying latent variable distribution.
The same notion can be applied to multilevel models for binary responses. Suppose
that Lij is a latent (i.e., unobserved) continuous variable and that a positive response
is observed only when Lij exceeds some threshold. Consider the following two-level
linear model for Lij,
Li1 = Xi1/3 + Zi1b1 + Eij,
where the random effects are assumed to have a multivariate normal distribution,
with mean zero and covariance matrix, G, and the Eij are assumed to have a standard
logistic distribution, with mean zero and variance 1r2 /3. Without loss of generality,
we can assume the threshold for categorizing Lij is zero, with
Y;1 = 1 if Lij > 0,
Y;1 = 0 if Lij :'.S 0.
Then the relationship between Y;1 and Lij results in a logistic regression model for
Pr(Y;1 = llb1). That is, the two-level linear model for Lij with standard logistic
errors,
Lij = xi1/3 + zi1b1 + f.ij,
implies the two-level logistic regression model for Y;,
Using the notion of an underlying latent variable distribution, we can then compare the
magnitudes of the between-cluster and within-cluster sources of variability of the Lij.
For example, in a two-level logistic regression model with a single random effect b1
( with variance g11 ), the relative magnitudes of the between-cluster and within-cluster
sources of variability can be summarized in terms of the intra-cluster correlation
p = Corr(Lij, Li'j) = gu 2 / ; (for i -/= i'). g11 +1r 3
MULTILEVEL GENERALIZED LINEAR MODELS 645
Note that p is the marginal correlation among the latent variables, Lij; it is not the
marginal correlation among the binary variables, ¥;1.
Although in both of the examples of two-level generalized linear models we have
chosen canonical link functions to relate the conditional mean of ¥;1 to 'T/ij, in prin-
ciple, any suitable link function can be selected. These two examples are intended
to be purely illustrative. They demonstrate how the choices of the three components
might differ according to the type of response variable.
So far our discussion of two-level models has closely paralleled the description
of the generalized linear mixed effects model given in Chapter 14 (albeit, with the
indices i and j reversed). In Chapter 14 we focused on two-level models where
measurement occasions are the level 1 units and subjects are the level 2 units; this is
a special case of two-level data. However, the methods in Chapter 14 can be applied
more broadly to different types of two-level data and also extend naturally to more
than two levels.
22.4.2 Three-Level Generalized Linear Models
The extension of two-level generalized linear models to three or more levels is straight-
forward and follows from the previous sections. With three-level data the variability
of the response is accounted for by the introduction of random effects at both levels
2 and 3. For a three-level data structure, we adopt the same notation as in Section
22.3, except that the response can be continuous, binary, or a count. Let Yijk denote
the response on the i th level I unit within the lh level 2 cluster within the k th level
3 cluster. Associated with each Y;jk is a 1 x p (row) vector of covariates, Xijk, with
the covariates defined at different levels. A three-level generalized linear model for
Y;jk is given by the following three part specification:
1. We assume that the conditional distribution of each Y;jk, given vectors of
random effects, bk3) and bj~ (defined at levels 3 and 2, respectively), be-
longs to the exponential family of distributions and that Var(Y;jk lbk3), b;!)) =
v{ E(Y;jk lbk3), bJ~)} ¢, where v( ·) is a known variance function, a function of
the conditional mean, E(Y;jk lbk3), bj~ ), and¢ is a scale or dispersion parame-
ter. In addition, given bk3) and bj~, it is assumed that the Y;jk are independent
of one another.
2. The conditional mean of Y;j k is assumed to depend on fixed and random effects
via the following linear predictor:
_ (3) (3) (2) (2)
'T/ijk - xijk/3 + zijk bk + zijk bjk ,
with
g{ E(Y;jk lbk3), bJ~)} = 'T/ijk = Xijk/3 + zi~z bk3 ) + zi~z bJ~,
for some known link function, g( · ).
646 MULTILEVEL MODELS
3. The random effects are assumed to have multivariate normal distributions, with
mean zero and covariance matrices, Cov(bk3)) = c(3) and Cov(b;r) = G(2).
Although random effects may be correlated within a given level, random effects
at different levels are assumed to be independent of each other.
These three components completely specify a broad class of three-level generalized
linear models.
22.4.3 Estimation
The multilevel generalized linear models described in the previous section fully spec-
ify the joint distribution of the responses at level 1 and the random effects at all higher
levels. As a result we can base estimation and inference on the likelihood func-
tion. However, unlike the case with a continuous response assumed to have a normal
distribution, maximum likelihood (ML) estimation for multilevel generalized linear
models is not straightforward and will, in general, require numerical quadrature.
For example, for three-level data, inference about /3, G(2l and G(3l is based on
the marginal likelihood function. The marginal likelihood can be expressed as the
product of the probability density functions, f (Yk), for the level 3 units. Specifically,
the marginal log-likelihood function is given by the following sum:
n3
Llogf(Yk),
k=l
where f (Yk) can be obtained by recognizing that observations on level 2 units (within
level 3 units) are conditionally independent of one another given the level 3 random
effects, b(3 ); similarly, observations on level 1 units (within level 2 units) are condi-
tionally independent of one another given the level 3 and level 2 random effects, b(3)
and b(2l. The k th level 3 unit's contribution to the likelihood function is
where f ( b ;r) and f ( bk3)) denote the multivariate normal distributions forthe random
effects at levels 2 and 3, respectively. The ML estimates of /3, G(2l and G(3l are simply
those values of /3, G(2l and G(3) that maximize the marginal log-likelihood function.
The primary reason for displaying the expression given above is to highlight that
multivariate integrals must be evaluated to compute the marginal log-likelihood. That
is, the log-likelihood function is obtained by integrating out or averaging over the
distribution of the random effects, b;r and bk3). Because integrals (denoting averag-
ing over the distribution of the random effects) appear in the log-likelihood function,
there are no simple, closed-form solutions. Instead, numerical integration techniques,
for instance, Gaussian quadrature, are required for maximizing the log-likelihood
function. ML estimation, using Gaussian quadrature, for two-level and higher-level
generalized linear models is implemented in some of the major statistical software
MULTILEVEL GENERALIZED LINEAR MODELS 647
packages ( e.g., PROC GLIMMIX in SAS, the g lme r function in the lme 4 package
in R, and the xtmelogit and xtmepoisson commands in Stata). Various alter-
native approximations to ML estimation for the extensions to three or more levels are
implemented in more specialized, stand-alone programs that have been specifically
developed for multilevel modeling (e.g., MLwiN and HLM).
22.4.4 Case Studies
We illustrate the main ideas underlying multilevel modeling by conducting analyses
of two-level data where the observations at level 1 are counts and binary outcomes.
The first example analyzes two-level count data from a study of malignant melanoma
mortality and ultraviolet (UV) radiation exposure. The second example analyzes two-
level data on fetal malformations, a binary outcome, from the developmental toxicity
study of ethylene glycol (EG) described in Section 22.3. For the latter, we present a
traditional multilevel analysis of the fetal malformation data and contrast the results
with those obtained from a marginal model that accounts for clustering in the data in
a different way.
Malignant Melanoma Mortality and Ultraviolet Light Exposure
In a study of the effects of ultraviolet (UV) light exposure on malignant melanoma
mortality (Langford et al., 1998), counts of the number of deaths due to malignant
melanoma were recorded for males of all ages in the United Kingdom. The counts
of the number of deaths between 1975 and 1980 were aggregated over areas that
correspond to counties or shires; hereafter referred to as counties. Data were collected
on 70 counties nested within 11 regions of the United Kingdom. The resulting data
structure is multilevel, with counties at level 1 (indexed by i) nested within regions
at level 2 (indexed by j). The main predictor of interest is exposure to ultraviolet
light in the B band (UVB). An index of UVB dose reaching the earth's surface was
calculated for each county. The mean UVB index in the United Kingdom was 10.9,
with a standard deviation of 1.5.
Let Y;1 denote the count of the number of deaths in the ith county in the jlh region
due to malignant melanoma. Within a given region, we assume Y;1 has a Poisson
distribution to account for level 1 variation in the counts. Variation in the counts
across regions is accounted for by the inclusion of a random region effect, b1. That
is, conditional on a random region effect b1, the counts are assumed to have a Poisson
distribution with conditional mean related to UVB dose via a log link function,
where UVBij is the UVB index (centered at the mean UVB index in the United
Kingdom) in the i th county of the jlh region. For each county, Tij is the number
of deaths that would be expected were U.K. national age- and gender-specific death
rates to apply to the population of the county. Note that TiJ is known and log(TiJ) is
an offset in this model. The ratio of the observed number of deaths to the expected
648 MULTILEVEL MODELS
Table 22.5 Fixed and random effects estimates for the malignant melanoma mortality data
for males in the United Kingdom.
Variable Estimate SE z
Intercept -0.0365 0.0352 -1.04
UVB 0.1301 0.0279 4.67
Level 2 Variance:
u 2 X 100 0.6222 0.5087
Note: ML estimation is based on 50-point adaptive Gaussian quadrature.
number of deaths, Y /T, is referred to as the standardized mortality ratio (SMR) in
each county. Our model assumes a linear relationship between the log SMR due to
malignant melanoma and county level UV radiation exposure. Finally, we assume
the random region effect has a normal distribution, bj ~ N(0, u 2 ).
The results of fitting this model to the UK malignant melanoma mortality data,
using maximum likelihood estimation, are presented in Table 22.5. There is a sig-
nificant positive relationship between the SMRs and exposure to UVB. Recall that
the standard deviation of UVB in the United Kingdom is 1.5. Therefore the esti-
mated effect of UVB dose indicates that the SMR is approximately 1.5 times larger
(or e3 x 0 ·13 , with 95% confidence interval: 1.25 to 1.74) when comparing a county
with UVB index 1 standard deviation above the UK average to a county with UVB
index 1 standard deviation below. Finally, in interpreting these results, it should be
remembered that the UVB covariate used here is simply an index of exposure for each
county; it is the potential, not the actual, UVB dose experienced by the population of
residents in each county.
Developmental Toxicity Study of Ethylene Glycol
Next we consider a two-level logistic regression model for binary data on fetal mal-
formations from the developmental toxicity study of ethylene glycol (EG). Recall that
in this study, EG was administered (0, 750, 1500, or 3000 mg/kg/day) to 94 pregnant
mice (dams) over the period of major organogenesis, beginning just after implantation
(Price et al., 1985). Following sacrifice, each live fetus was examined for evidence
of malformations, recorded as present or absent. The primary question of scientific
interest is the effect of dose on fetal malformations.
Summary statistics (ignoring clustering in the data) for fetal malformations for
the 94 litters (composed of a total of 1028 live fetuses) are presented in Table 22.6.
The percentage of fetal malformations increases monotonically with increasing dose,
with less than I% in the control group and almost 60% in the group administered the
highest dose (3000 mg/kg).
MULTILEVEL GENERALIZED LINEAR MODELS 649
Table 22.6 Descriptive statistics on fetal malformations from the ethylene glycol (EG)
experiment.
Dose Fetal Malformations
(mg/kg) Dams Fetuses Number Percentage
0 25 297 1 0.34
750 24 276 26 9.42
1500 22 229 89 38.86
3000 23 226 129 57.08
Table 22.7 Fixed and random effects estimates for the fetal malformation data from the
ethylene glycol (EG) experiment.
Variable
Intercept
Dose/750
Level 2 Variance:
Estimate
-4.360
1.336
2.517
SE
0.440
0.166
0.685
Note: ML estimation is based on SO-point adaptive Gaussian quadrature.
z
-9.92
8.06
Letting 1'ii = 1 denote the presence of fetal malformations in the i th live fetus
from the Jih litter (and 1'ij = 0 otherwise), we considered the following logistic
model relating the log odds of fetal malformations to a linear effect of dose:
where dj = Dosej /750 denotes the dose (in units of 750 mg/kg) administered to the
Jih dam (cluster). The random effect b1 is assumed to vary independently across
litters, with b1 ~ N(O, a;). This model assumes that fetuses within a litter are
exchangeable and the positive association among the fetal malformation outcomes is
accounted for by their sharing a common random effect, b1.
The results of fitting the model to the fetal malformation data, using maximum
likelihood estimation, are presented in Table 22. 7. The estimated regression parameter
for dose indicates that the log odds of malformation increases with increasing dose.
In particular, the odds ratio for malformation, comparing the highest dose group
650 MULTILEVEL MODELS
Table 22.8 Estimates of regression parameters from marginal model for the fetal malforma-
tion data from the ethylene glycol (EG) experiment.
Variable
Intercept
Dose/750
Log Odds Ratio
Estimate
-3.190
0.960
1.447
SE
0.220
0.099
0.221
z
-14.53
9.66
6.56
to the control group, is 209.2 (or e4 x1. 335 , with 95% confidence interval: 56.1 to
779.9). This provides overwhelming evidence of the increased risk of malformations
at the highest dose of EG. The odds ratio for malformations, comparing the lowest
dose group to the control group, is 3.80 (or el. 336 , with 95% confidence interval:
2.75 to 5.26). Finally, the estimate of al indicates that there are moderate litter
effects, with heterogeneity across dams in the underlying risk of producing fetuses
with malformations. For example, in the control group, 95% of dams have a risk
of producing fetuses with malformations between 0% and 22%. Alternatively, if we
appeal to the notion of a latent variable distribution and assume an underlying two-
level linear model for the latent variable with standard logistic errors, the estimated
intra-cluster correlation is
~= iJl = 2.511 = 4
P iJl + 1r2 /3 2.511 + 3.290 °· 3·
For illustrative purposes we also consider a marginal logistic regression model
relating the log odds of fetal malformations to a linear effect of dose
logit{ E(Y;j)} = f31 + f32 dj,
and account for the intra-litter association by a common log odds ratio,
1
{o (v· __ ~,. )} _ l { Pr(Y;j = 1, Y;k = 1) Pr(Y;j = 0, Y;k = 0)} og R i
,1, i ik - og -~~----~-~~-----C.. . Pr(Y;j = 1, Y;k = 0) Pr(Y;j = 0,
Y;k = 1)
This can be thought of as a marginal model analogue of the random intercepts model
for the within-litter association. The results of fitting this marginal model, using GEE
methods, are presented in Table 22.8. The estimate of the effect of dose indicates that
the odds ratio for malformations, comparing the highest dose group to the control
group, is 46.4 (or e4 x0 ·960 , with 95% confidence interval: 21.3 to 101.2). These
results also provide strong evidence of the increased risk of malformations at the
highest dose of EG. The within-litter odds ratio of 4.26 (or e 1.447 ) indicates that there
is clustering in the fetal malformations data.
Note that the estimated effect of dose in the marginal model is discernibly smaller
than that reported in Table 22. 7. This should not be too surprising given the important
SUMMARY 651
distinctions between the regression parameters in marginal models and generalized
linear mixed effects models that were highlighted in Chapter 16. The regression
parameters for dose in the two models have quite different interpretations. In the
logistic regression model with a random litter effect, /32 describes the change in the
risk of producing fetuses with malformations for any given dam; this change in the
risk for a single-unit change in dose depends on bj, a specific dam's random effect
or underlying propensity for producing fetuses with malformations. In the marginal
model, (32 describes changes in the prevalence of fetal malformations when sub-
populations of dams exposed to different doses of EG are compared. Although both
models account for clustering in the data, the targets of inference are different and
the two analyses address distinct scientific questions.
22.5 SUMMARY
In previous sections we described models for data with a hierarchical structure, where
lower-level units are nested within higher-level units. The dominant approach for
modeling such data is regression models where random effects are introduced at
different levels. A central feature of multilevel modeling is the incorporation of
covariates that can be measured at any level of the hierarchy, thereby allowing the
effects at each level to be disentangled. By combining covariates that have been
measured at different levels within a single regression model, their relative importance
can be determined. For example, multilevel models can address questions about the
effects of individuals' characteristics (e.g., disease severity) while adjusting for their
context ( e.g., being treated at a large university teaching hospital versus a rural clinic).
Multilevel data can be challenging to analyze for at least two main reasons. First,
the covariates can be measured at different levels, and the same covariate can operate
at many different levels. As a result somewhat greater care is required in the inter-
pretation of regression parameters in multilevel models. It is not always transparent
how best to combine covariates measured at different levels within a single model so
that the regression parameters have useful interpretations. In our brief overview of
multilevel models, we have not touched on this important topic; for more information,
the interested reader is directed to the references at the end of this chapter.
The second challenge in the analysis of multilevel data is how best to account
for the clustering that can arise at different levels of the hierarchy. In the multilevel
modeling literature the dominant approach for accounting for the intra-cluster corre-
lations at different levels is via the introduction of random effects at different levels.
This gives rise to mixed effects models that can be extended in a very natural way
to any number of levels of clustering in the data. For linear models, this is certainly
a very natural way to account for clustering. However, for generalized linear mod-
els for discrete data, it does raise subtle issues concerning the interpretation of the
fixed effect regression parameters and questions about what is the relevant target of
inference. The same issues that were given an airing in Chapter 16 apply equally to
multilevel models for discrete data. These issues were highlighted in our analyses
of the two-level clustered data on fetal malformations in Section 22.4, where the
652 MULTILEVEL MODELS
estimated effect of dose was discernibly different depending on how the clustering
was accounted for. The estimates of the effect of dose from the marginal and ran-
dom effects logistic regression models differed because the corresponding regression
parameters have distinct interpretations and address somewhat different scientific
questions. In general, the fixed effects parameters in a two-level model for discrete
data represent changes in the (transformed) mean response, for a single-unit change
in the corresponding covariate, for any given level 2 unit. In Chapters 14 and 16,
in the context of longitudinal data, these regression coefficients were referred to as
"subject-specific"; here they are "cluster-specific" and describe covariate effects for
an individual cluster. In contrast, the regression parameters in a marginal model rep-
resent changes in the (transformed) mean response when sub-populations defined by
different values of the corresponding covariate are compared. The regression param-
eters in marginal models address the dependence of the population-averaged response
(where averaging is over all possible units in the hierarchy) on the covariates. These
regression parameters do not have any direct interpretation for an individual cluster
when there is heterogeneity across clusters.
Although much of the multilevel literature on the analysis of discrete data is dom-
inated by the use of generalized linear mixed effects models, we note that marginal
models can also be used to account for clustering at different levels. All the issues
discussed in Chapter 16 for two-level longitudinal data apply equally to two-level
and higher-level data more broadly defined. In general, the choice between the two
classes of models should not be driven by the availability of software for multilevel
modeling but on the basis of careful thought about the questions of scientific interest.
22.6 FURTHER READING
There is an extensive literature on multilevel models that appears in the statistical,
psychometric, and educational literature. A comprehensive description of multilevel
models, and their application to a wide range of problems, can be found in the books
by Raudenbush and Bryk (2002), Longford (1993), and Goldstein (2003). For readers
who find the level of mathematical difficulty in these books too challenging, the books
by Hox (2002), Kreft and De Leeuw (1998), and Snijders and Bosker (1999) provide
a more introductory and accessible presentation of similar topics targeted at empirical
researchers. An engaging and non-technical introduction to multilevel modeling can
be found in the excellent text by Gelman and Hill (2007).
For illustrations of the application of multilevel models in the biomedical and
health sciences, we recommend the edited volume of articles in Leyland and Goldstein
(2001) and the review articles by Sullivan et al. (1999), Goldstein et al. (2002), and
Subramanian et al. (2003).
FURTHER READING 653
Bibliographic Notes
Although most of the statistical literature on marginal models has focused on two-
level data, Qaqish and Liang (1992) discuss the use of marginal models for multilevel
binary data, with multiple levels of nesting.
Daniels and Gatsonis (1999) describe multilevel modeling in a Bayesian frame-
work; also see Lindley and Smith (1972), Zeger and Karim (1991), Browne et al.
(2002), Carlin and Louis (2000), and Chapters 15 and 16 of Gelman et al. (2003).
Bayesian methods for multilevel modeling can be implemented using the publicly
available software WinBUGS (Spiegelhalter et al., 1999).
Appendix A
Gentle Introduction to
Vectors and Matrices
We present a very brief introduction to vectors and matrices, intended for readers
with no prior exposure to matrix algebra. Specifically, we cover basic definitions and
summarize some of the main properties of vectors and matrices. Vectors and matrices
allow us to perform common mathematical operations (e.g., addition, subtraction,
and multiplication) on a collection of numbers; they also facilitate the description
of statistical methods for multivariate data. Our primary motivation for using them
is the conciseness and compactness with which statistical techniques for analyzing
longitudinal data can be presented when expressed in terms of vectors and matrices.
Mastery of the material presented in this section is a prerequisite for understanding
the statistical methods for longitudinal data described in the book. Although we do
not assume a profound understanding of matrix algebra, vectors and matrices are used
extensively throughout the book to simplify notation and the reader is required to have
some basic facility with the addition and multiplication of vectors and matrices.
655
656 GENTLE INTRODUCTION TO VECTORS AND MATRICES
Basic Concepts and Definitions
A matrix is a rectangular array of elements (e.g., numbers), arranged in rows and
columns. For example,
A= (
13
7
9
8
11
3
2
is a matrix with three rows and four columns. The element or entry in the i th row and
the fh column of the matrix is referred to as the ( i, j) th element of the matrix. For
example, the entry in the 2nd row and 3rd column of A is 3. If we let aij denote the
element in the ith row and the l h column of the matrix A, then
au = 2, a12 = 7, a13 = 11, a14 = 5;
a21 = 4, a22 = 9, a23 = 3, a24 = l;
a31 = 13, a32 = 8, a33 = 2, a34 = 6.
The subscripts on the element aij denote its position in the ith row and the f h column
of the matrix A.
The dimension of a matrix is the number of rows and columns in the matrix. By
convention, the number of rows is listed first, and then the number of columns. Thus
we refer to the matrix A above as being a 3 x 4, or a "3 by 4", matrix.
A vector is a special kind of matrix, having either a single row or a single column.
For example,
V=
2
4
9
7
3
is a 5 x 1 (column) vector. Since the dimension of a vector corresponds to the number
of elements in the vector, the dimension of a vector is often loosely referred to as its
length. 1
Finally, a scalar is a single element (e.g., a single number), and hence can be
treated either as a single element vector or as a 1 x 1 matrix.
1 In matrix algebra, vectors have a geometric meaning, denoting the coordinates of a point in Euclidean
space. The geometric concept of the "length" ( or magnitude) of a vector in Euclidean space has a very
precise definition and technical meaning that is quite different from our informal use of the term here.
GENTLE INTRODUCTION TO VECTORS AND MATRICES 657
Transpose
The transpose is a function that interchanges the rows and columns of a matrix. That
is, the first row becomes the first column, the second row becomes the second column,
and so on. By convention, the transpose of a matrix A is denoted A' (or "A prime").
(Note that in some texts, a superscript T, instead of a prime, is used to denote the
transpose of a matrix, e.g., AT.)
For example, consider the 3 x 4 matrix A,
A= ( ! : 1: ) .
13 8 2 6
The transpose of A,
A'-C~ ; I~),
is the 4 x 3 matrix with rows and columns interchanged. Similarly, since a vector is
a matrix with either a single row or column, if
V=
2
4
9
7
3
then V' = (2 4 9 7 3).
Examples of vectors and matrices that play key roles in the analysis of longitu-
dinal data are the response vector, often denoted Y, and the covariate matrix, often
denoted X. For example, consider the following data from a subject participating
in a longitudinal clinical trial. In this trial, the subject was assigned to the placebo
group (Group = 0 if assigned to placebo, Group = 1 if assigned to active treatment)
and four repeated measures of blood lead levels were obtained at baseline ( or week
0), week 1, week 4, and week 6:
Blood Lead Treatment Group Week
30.8 0 0
26.9 0 1
25.8 0 4
23.8 0 6
If we let Y denote the vector of repeated measurements of the response variable, then
658 GENTLE INTRODUCTION TO VECTORS AND MATRICES
Y= ( ) ( ~~::) Y3 25.8 .
Y4 23.8
Similarly we can let X denote a matrix of covariates associated with the vector of
repeated measurements, with
The first column of X contains only 1 's, while the second column of X contains a
variable denoting the treatment group assignment and the third column contains the
times of the repeated measurements.
Square and Symmetric Matrices
A matrix is said to be square if it has the same number of rows and columns. A square
matrix is symmetric if it equals its transpose. For example,
is a symmetric matrix since it equals its transpose
S'= ( : : ; 1~) 7 1 5 8 .
11 2 8 4
Examples of symmetric matrices that play an important role in the analysis of longi-
tudinal data are the covariance and correlation matrices for the repeated measures on
the same individuals.
Finally, a diagonal matrix is a special case of a symmetric square matrix that has
non-zero elements only in the main diagonal positions, and zeros elsewhere. The
main diagonal elements are those in the same row and column, from the upper left to
GENTLE INTRODUCTION TO VECTORS AND MATRICES 659
the lower right comers of the matrix. For example,
(
2 0 0 0) 0 9 0 0
D= 0 0 5 0
0 0 0 4
is a diagonal matrix. The diagonal matrix having all ones along the main diagonal is
known as the identity matrix and is often denoted by I or In, where the subscript n
denotes the dimension of the identity matrix. Thus
Arithmetic Operations
Addition and subtraction of matrices are defined only for matrices of the same dimen-
sion. That is, the matrices must share the same number of rows and the same number
of columns. The sum of two matrices is obtained by adding their corresponding
elements. For example,
3 2
7 8
6 5
':) ( 2 + 3 7 + 2 11 + 14 ) 4+7 9+8 3+4
13 + 6 8 + 5 2 + 9
(
5
11 17
19 13
Subtraction of matrices is defined in a similar way. For example,
U : 'D ( : 1D (
(
2-3 7-2
4-7 9-8
13 - 6 8 - 5
-1 5 3 )
-3 1 =1 .
7 3 -7
11-14 )
3-4
2-9
660 GENTLE INTRODUCTION TO VECTORS AND MATRICES
Scalar Multiplication of a Matrix
A scalar is a single number, as opposed to a vector or matrix of numbers. The scalar
multiple of a matrix is formed by multiplying each element of the matrix by the scalar.
For example, if
A= (
13
Multiplication of Matrices
4 14 22
6 1~ ) 8 18
26 16 4 12
The multiplication of two matrices is somewhat more involved. The multiplication
of two matrices A and B, denoted AB, is defined only if the number of columns of
A is equal to the number of rows of B. For example, if A is a p x q matrix and B is
a q x r matrix, then the product of the two matrices AB is a p x r matrix. Letting C
be the product of A and B,
C=AB,
the ( i, j) th element of C is the sum of the products of the corresponding elements in
the ith row of A and the fh column of B. Specifically, if Cij is the element in the i th
row and the f h column of the matrix C = AB, then
q
Cij = L aikbkj = ai1bi1 + ai2b21 + · · · + aiqbq1, i = 1, ... ,p; j = 1, ... , r;
k=l
where q is the number of columns in A or the number of rows in B. Matrix multipli-
cation is best understood by considering a simple example. Suppose
then
AB (
(2 X 1) + (7 X 3) + (11 X 2)
( 4 X 1) + (9 X 3) + (3 X 2)
(13 X 1) + (8 X 3) + (2 X 2)
( :~ ~: ) .
41 42
(2 X 2) + (7 X !) + (11 X 4) )
( 4 X 2) + (9 X 1) + (3 X 4)
(13 X 2) + (8 X 1) + (2 X 4)
Note that the order of multiplication is very important. For example, if A and B are
both square matrices of the same dimension, then AB is usually not equal to BA.