R代写-ICPSR 2017|学霸联盟

R代写-ICPSR 2017

时间：2021-04-03

Mixed Models with R:
Longitudinal Data Analysis

The Special Roles of Time

ICPSR 2017 at York University

Georges Monette
random@yorku.ca
2

Contents
Summary: .............................................................................................. 5
Take 1: The basic ideas ........................................................................ 8
A traditional example ........................................................................ 9
Pooling the data ('wrong' analysis) .................................................. 11
Fixed effects regression model ........................................................ 22
Other approaches ............................................................................. 29
Multilevel Models............................................................................ 30
From Multilevel Model to Mixed Model ........................................ 33
Mixed Model for Longitudinal Data in R: ...................................... 38
Notes on interpreting autocorrelation ........................................... 47
Some issues concerning autocorrelation ...................................... 48
Mixed Model in Matrices ................................................................ 51
Fitting the mixed model ................................................................... 53
Comparing GLS and OLS ............................................................... 56
Testing linear hypotheses in R ........................................................ 57
3
Modeling dependencies in time ....................................................... 63
G-side vs. R-side.............................................................................. 65
Simpler Models................................................................................ 68
BLUPS: Estimating Within-Subject Effects ................................... 70
Where the EBLUP comes from : looking at a single subject ....... 79
Interpreting G .................................................................................. 87
Differences between lm (OLS) and lme (mixed model) with
balanced data ................................................................................... 95
Take 2: Learning lessons from unbalanced data ................................ 97
R code and output .......................................................................... 102
Between, Within and Pooled Models ............................................ 104
The Mixed Model .......................................................................... 113
A serious a problem? a simulation ................................................ 118
Splitting age into two variables ..................................................... 121
Using 'lme' with a contextual mean ............................................... 131
> fit.contextual <- lme( .................................................................. 131
+ y ~ (age + cvar(age,Subject) ) * Sex ................................. 131
Simulation Revisited ..................................................................... 135
4
Power ................................................................................................ 138
Some links ........................................................................................ 139
A few books ...................................................................................... 140
Appendix: Reinterpreting weights .................................................... 141

5
Summary:

At first sight a mixed model for longitudinal data analysis does not
look very different from a mixed model for hierarchical data. In
matrices:
Linear Model 2~ ( , )N  y Xβ ε ε 0 I
Mixed Model
for
Hierarchical
Data:
2~ ( , )
~ ( , )
N
N

  

j j j j j
j j
j
y Xγ Zu ε
ε 0 I
u 0 G

1
2
j
j
j
jn
y
y
y
       
jy 
Observations
in jth cluster
(students in
jth school)
Mixed Model
for
Longitudinal
Data:
j j j j j
j j
j
~ ( , )
~ ( , )
N
N
  

y Xγ Zu ε
ε 0 R
u 0 G

1
2
j
j
j
jn
y
y
y
       
jy 
Observations
over time on
jth subject

6

Formally, mixed models for hierarchical data and for longitudinal data look
almost the same. In practice, longitudinal data introduces some fascinating
challenges:

1) The observations within a cluster are not necessarily independent. This
is the reason for the broader conditions that ~ ( , )j jNε 0 R (where jR is a
variance matrix) instead of merely the special case: 2~ ( , )j jN  Iε 0 .
Observations close in time might depend on each other in ways that are
different from those that are far in time. Note that if all observations have
equal variance and are equally positively correlated – what is called a
Compound Symmetry variance structure – this is entirely accounted for by
the random intercept model on the G side. The purpose of the R matrix is to
potentially capture interdependence that is more complex than compound
symmetry.

2) The mean response may depend on time in ways that are far more
complex than is typical for other types of predictors. Depending on the time
7
scale of the observations, it may be necessary to use polynomial models,
asymptotic models, Fourier analysis (orthogonal trigonometric functions)
or splines that adapt to different features of the relationship in different
periods of times.

3) There can be many partially confounded 'clocks' in the same analysis:
period-age-cohort effects, age and time relative to a focal event such as
giving birth, injury, arousal from coma, etc.

4) Some periodic patterns can be modeled either with fixed effects or with
random effects through the R matrix. Periodic patterns with a fixed period
(e.g. seasonal periodic variation) are naturally modeled with fixed effects
(FE) using, for example, trigonometric functions. Periodic patterns with a
randomly varying period (e.g. sunspot cycles, are more appropriately
modeled with the R matrix which can be used to model the kind of pattern
encountered in time series analysis, autoregressive and moving average
models for residuals
These slides focus on the simple functions of time and the R side. Lab 3
introduces more complex forms for functions of time.
8

Take 1: The basic ideas

9
A traditional example
Figure 1: Pothoff and Roy dental measurements in boys and girls.
Balanced data:
 everyone measured at
the same set of ages
 could use a classical
repeated measures
analysis

Some terminology:

Cluster: the set of
observations on one
subject

Occasion: observations
at a given time for each
subject

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 12 14
M16 M05
8 9 10 12 14
M02 M11
8 9 10 12 14
M07 M08
M03 M12 M13 M14 M09
20
25
30
M15
20
25
30
M06 M04 M01 M10
F10 F09 F06 F01 F05
20
25
30
F07
20
25
30
F02
8 9 10 12 14
F08 F03
8 9 10 12 14
F04 F11
10
Figure 2: A different view by sex
Viewing by sex helps to
see pattern between
sexes:
Note:
Slopes are relatively
consistent within each
sex – except for a few
anomalous male curves.
BUT
Intercept is highly
variable.
An analysis that pools
the data ignores this
feature. Slope estimates
will have excessively
large SEs and 'level'
estimates too low SEs.

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 11 12 13 14
Male
8 9 10 11 12 13 14
Female
11
Pooling the data ('wrong' analysis)

Ordinary least-squares on pooled data:

0
1, , (number of subjects [clusters])
1, , (number of occasions for th subject)
age sex age sexit it i it i it
i
y age sex age sex
i N
t T i
        





Note that it is customary to use i for subjects (clusters) and t for
occasions (time). This means that ‘i’ assumes the role previously
played by ‘j’.

R:

> library( spida ) # see notes on installation
> library( nlme ) # loaded automatically
> library( lattice ) # ditto
> data ( Orthodont ) # without spida
12

> head(Orthodont)
distance age Subject Sex
1 26.0 8 M01 Male
2 25.0 10 M01 Male
3 29.0 12 M01 Male
4 31.0 14 M01 Male
5 21.5 8 M02 Male
6 22.5 10 M02 Male
> dd <- Orthodont

> tab(dd, ~Sex)
Sex
Male Female Total
64 44 108

13
> dd$Sub <- reorder( dd$Subject, dd$distance)
# for plotting

## OLS Pooled Model

> fit <- lm ( distance ~ age * Sex , dd)
> summary(fit)
. . .
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.3406 1.4162 11.538 < 2e-16 ***
age 0.7844 0.1262 6.217 1.07e-08 ***
SexFemale 1.0321 2.2188 0.465 0.643
age:SexFemale -0.3048 0.1977 -1.542 0.126

Residual standard error: 2.257 on 104 degrees of freedom
Multiple R-squared: 0.4227, Adjusted R-squared: 0.4061
F-statistic: 25.39 on 3 and 104 DF, p-value: 2.108e-12

14
Note that both SexFemale and age:SexFemale have
large p-values. Are you tempted to just drop
both of them?

Check the joint hypothesis that they are
BOTH 0.

> wald( fit, "Sex")

numDF denDF F.value p.value
Sex 2 104 14.97688 <.00001

Coefficients Estimate Std.Error DF t-value p-value
SexFemale 1.032102 2.218797 104 0.465163 0.64279
age:SexFemale -0.304830 0.197666 104 -1.542143 0.12608

This analysis suggests that we could drop one
or the other but not both! Which one should we
choose? To respect the principle of marginality
15
we should drop the interaction, not the main
effect of Sex. This leads us to:

> fit2 <- lm( distance ~ age + Sex ,dd )
> summary( fit2 )

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.70671 1.11221 15.920 < 2e-16 ***
age 0.66019 0.09776 6.753 8.25e-10 ***
SexFemale -2.32102 0.44489 -5.217 9.20e-07 ***

and we conclude there is an effect of Sex of
jaw size but did not find evidence that the
rate of growth is different.

Revisiting the graph we saw earlier:
16

Figure 3: A different view by sex

The analysis is inconsistent
with what we see. It fails
Tukey’s interocular test.
Except for a few irregular
male trajectories, the male
trajectories appear steeper
than the female ones. On
the other hand, if we
extrapolate the curves back
to age 0, there would not
much difference in levels.
estimates too low SEs.
OLS cannot exploit the consistency in slopes to recognize that
hypotheses about slopes should have a relatively smaller SE than
hypotheses about the levels of the curves. OLS is blind to the lines.
age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 11 12 13 14
Male
8 9 10 11 12 13 14
Female
17

From the first OLS fit:

Estimated variance
within each
subject:

2
2
2
2
2.26 . . .
. 2.26 . .
. . 2.26 .
. . . 2.26
         

Why is this wrong?

 Residuals within
clusters are not
independent; they tend
to be highly correlated
with each other
age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 12 14
M16 M05
8 9 10 12 14
M02 M11
8 9 10 12 14
M07 M08
M03 M12 M13 M14 M09
20
25
30
M15
20
25
30
M06 M04 M01 M10
F10 F09 F06 F01 F05
20
25
30
F07
20
25
30
F02
8 9 10 12 14
F08 F03
8 9 10 12 14
F04 F11
18

Fitted lines in ‘data
space’.

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 11 12 13 14
Male
8 9 10 11 12 13 14
Female
19

Determining the
intercept and slope of
each line

age
d
i
s
t
a
n
c
e
15
20
25
30
0 5 10
Male
15
20
25
30
Female
20

Fitted lines in
‘data’ space

age
d
i
s
t
a
n
c
e
15
20
25
0 5 10
Male
Female
21

Fitted ‘lines’ in
‘beta’ space

^age


^
0
16.0
16.5
17.0
17.5
0.4 0.5 0.6 0.7 0.8 0.9
Male
Female
22
Fixed effects regression model

See
Paul D. Allison (2005) Fixed Effects Regression Methods for
Longitudinal Data Using SAS. SAS Institute – a great book on basics
of mixed models!

 Treat Subject as a factor

 Lose Sex unless it is constructed as a Subject contrast

 Fits a separate OLS model to each subject:

it i age it iti agey     

See the wiki for details: SPIDA 2010: Longitudinal Models:
Additional material
23
Estimated variance for
each subject:

2
2
2
2
1.31 . . .
. 1.31 . .
. . 1.31 .
. . . 1.31
         

Problems:

● No estimate of sex
effect
● Can't generalize to
population, only to 'new'
observations from same
subjects

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 12 14
M16 M05
8 9 10 12 14
M02 M11
8 9 10 12 14
M07 M08
M03 M12 M13 M14 M09
20
25
30
M15
20
25
30
M06 M04 M01 M10
F10 F09 F06 F01 F05
20
25
30
F07
20
25
30
F02
8 9 10 12 14
F08 F03
8 9 10 12 14
F04 F11
24

● Can't predict for new
subject.
● Can construct sex
effect but CI is for
difference between
sexes in this sample
● No autocorrelation in
time

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 12 14
M16 M05
8 9 10 12 14
M02 M11
8 9 10 12 14
M07 M08
M03 M12 M13 M14 M09
20
25
30
M15
20
25
30
M06 M04 M01 M10
F10 F09 F06 F01 F05
20
25
30
F07
20
25
30
F02
8 9 10 12 14
F08 F03
8 9 10 12 14
F04 F11
25

Fitted lines in data
space

● Female lines lower
and less steep

● Patterns within Sexes
not so obvious.

age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 11 12 13 14
Male
8 9 10 11 12 13 14
Female
26

Fitted lines in beta
space

● Patterns within sexes
more obvious: steeper
slope associated with
smaller intercept.

● Single male outlier
stands out

^age


^
0
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
27

Each within-subject
least squares estimate
0
ˆ
ˆ
ˆi
i
i
i age
 
       

has variance ' 1( )i i  X X
which is used to
construct a confidence
ellipse for the ‘fixed
effect’
0i
i
i age
 
      

for the ith subject.
Each CI uses only the
information from that
subject (except for the
estimate of   )

^age


^
0
0
10
20
30
0 1 2
Male
0 1 2
Female
28

Differences between
subjects such as the
dispersion of ˆi s and the
information they provide
on the dispersion of the
true i s is ignored in this
model.

The standard error of the
estimate of each average
Sex line uses the sample
distribution of it s within
subjects but not the
variability in i s
between subjects.

^age


^
0
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
29
Other approaches

 Repeated measures (univariate and multivariate)

o Need same times for each subject, no other time-varying
(Level 1) variables

 Two-stage approach: use ˆ s in second level analysis:

o If design not balanced, then iˆ s have different variances, and
would need different weights, Using ' 1( )i i  X X does not
work because the relevant weight is based on the marginal
variance ' 1( )i i  G X X , not the conditional variance given
the ith subject, ' 1( )i i  X X .
30
Multilevel Models

Start with the fixed effects model:

Within-subject model (same as fixed effects model above):

1it it iti iy X     ~ (0, )i N I  
1, , 1, , ii N t T  

0i is the ‘true’ intercept and 1i is the ‘true’ slope with respect to
X.

  is the within-subject residual variance.

X (age in our example) is a time-varying variable. We could have
more than one.
31
Then add:
Between-subject model (new part):
We suppose that 0i and 1i vary randomly from subject to
subject.
But the distribution might be different for different Sexes (a
‘between-subject’ or ‘time-invariant’ variable). So we assume a
multivariate distribution:

0
11 11
1 1 1
1, ,
1
i i i
i
i ii
i
ii
uW i NuW
u
uW
    
 
 
 
 
  
 
                  
                
   
 


0
1 11
0~ , ~ ( , ) 1, ,0
i
i
u g gN N i Ng gu
 
 
                      
0 G 

32
where iW is a coding variable for Sex, e.g. 0 for Males and 1 for
Females.

0
1
1 1
among Males
among Females
i
i age
E
 

 
 


 
 
               
    

 

Some software packages use the formulation of the multilevel model,
e.g. MLWin.
SAS and R use the ‘mixed model’ formulation. It is very useful to
know how to go from one formulation to the other.

33
From Multilevel Model to Mixed Model

Combining the two levels of the multilevel model by substituting
the between subject model into the within-subject model. Then
gather together the fixed terms and the random terms:

0 1
0 1
1 1
0
1
1
1
1 1
1
(fixed part of the m
(random part
odel
of t
)
i
i
i
i i
it it iti i
i i it
ii ii it it
it i it
it
iti i
it
it
W W
u u
u u
y X
W W X
W W X X
X X
X
u u
   
  
   
   



         
        
   

   
     
 
 
  
      
    
 
 he model)

34

Anatomy of the fixed part:

1
1
(Intercept)
(between-subject, time-invariant variable)
(within-subject, time-varying variable)
(cross-level interaction)
i
it
i it
W
W
X
X












Interpretation of the fixed part: the parameters reflect population
average values.

Anatomy of the random part:
For one occasion:

0 1i i iit ittu u X  

35
Putting the observations of one subject together:

0
1
1
2
11
22
3 3
4
3
44
1
1
1
1
i
i
i
ii
ii
ii
ii
i
i
i i
i
i
i
u
u
u
X
X
X
X






     
                           



 Ζ  

Note: the random-effects design uses only time-varying variables

Distribution assumption:

~ (0, ) independent of ~ (0, )i iiu N NG R 
where, so far, i I R

36
Notes:

 G (usually) does not vary with i. It is usually a free positive
definite matrix or it may be a structured pos-def matrix.
More on G later.

 iR (usually) does change with i – as it must if iT is not
constant. iR is expressed as a function of parameters. The
simplest example is
i in ni I  R . Later we will use iR to
include auto-regressive parameters for longitudinal
modeling.

 We can’t estimate G and R directly. We estimate them
through:

'Var( )i i i ii  V Z GZ R

37

 Some things can be parametrized either on the G-side or on
the R-side. If they’re done in both, you lose identifiability.
Ill-conditioning due “collinearity” between the G- and R-
side models is a common problem.

38
Mixed Model for Longitudinal Data in R:

> fit <- lme( distance ~ age * Sex, dd,
+ random = ~ 1 + age | Subject,
+ correlation
+ = corAR1 ( form = ~ 1 | Subject))

 Model formula: distance ~ age * Sex

o specifies the fixed model
o includes the intercept and marginal main effects by default
o contains time-varying, time-invariant and cross-level
variables together
39
 Random argument: ~ 1 + age | Subject

o Specifies the variables in the random model and the variable
defining clusters.

o The G matrix is the variance covariance matrix for the
random effect. Here 00 0,00
,,, ,0
Var Var ageii
age ageage iage i age
g gu
u g g


                 
  G

o Normally, the random model only contains an intercept and,
possibly, time-varying variables
40
 Correlation argument: Specifies the model for the iR matrices

o Omit to get the default:
i in ni   R I

o Here we illustrate the use of an AR(1) structure producing for
example
1 2 3
1 1 2
2 1 1
3 2 1
1
1
1
1
iR
  
     
  

        
 in a cluster with 4 occasions.

41

# Mixed Model in R:

> fit <- lme( distance ~ age * Sex, dd,
+ random = ~ 1 + age | Subject,
+ correlation
+ = corAR1 ( form = ~ 1 | Subject))

> summary(fit)
Linear mixed-effects model fit by REML
Data: dd
AIC BIC logLik
446.8076 470.6072 -214.4038

Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-
Cholesky parametrization
42
StdDev Corr
(Intercept) 3.3730482 (Intr)
00
g
age 0.2907673 -0.831

11 01 01 00 11
/g r g g g
Residual 1.0919754

Correlation Structure: AR(1)
Formula: ~1 | Subject
Parameter estimate(s):
Phi
-0.47328
correlation between adjoining obervations

43
Fixed effects: distance ~ age * Sex
Value Std.Error DF t-value p-value
(Intercept) 16.152435 0.9984616 79 16.177323 0.0000
age 0.797950 0.0870677 79 9.164702 0.0000
SexFemale 1.264698 1.5642886 25 0.808481 0.4264
age:SexFemale -0.322243 0.1364089 79 -2.362334 0.0206
Correlation: among gammas
(Intr) age SexFml
age -0.877
SexFemale -0.638 0.559
age:SexFemale 0.559 -0.638 -0.877

Standardized Within-Group Residuals:
Min Q1 Med
Q3 Max
-3.288886631 -0.419431536 -0.001271185
0.456257976 4.203271248

Number of Observations: 108
44
Number of Groups: 27

Confidence intervals for all parameters:

> intervals( fit )
Approximate 95% confidence intervals

Fixed effects:
lower est. upper
(Intercept) 14.1650475 16.1524355 18.13982351
age 0.6246456 0.7979496 0.97125348
SexFemale -1.9570145 1.2646982 4.48641100
age:SexFemale -0.5937584 -0.3222434 -0.05072829
attr(,"label")
[1] "Fixed effects:"

45

Random Effects:
Level: Subject
lower est. upper
sd((Intercept)) 2.2066308 3.3730482 5.1560298
sd(age) 0.1848904 0.2907673 0.4572741
cor((Intercept),age) -0.9377008 -0.8309622 -0.5808998

Correlation structure:
lower est. upper
Phi -0.7559617 -0.4728 -0.04182947
attr(,"label")
[1] "Correlation structure:"

Do NOT use CIs for SDs
to test whether they are 0.
Use anova + simulate. Cf
Lab 1.
Negative!
46

Within-group standard error:
lower est. upper
0.9000055 1.0919754 1.3248923

> VarCorr( fit ) from the G matrix
Subject = pdLogChol(1 + age)
Variance StdDev Corr
(Intercept) 11.3774543 3.3730482 (Intr)
age 0.0845456 0.2907673 -0.831
Residual 1.1924103 1.0919754

To get the G matrix itself
in a form that can be used
in matrix expressions

> getVarCov( fit )
47

Random effects variance covariance matrix
(Intercept) age
(Intercept) 11.37700 -0.814980
age -0.81498 0.084546
Standard Deviations: 3.373 0.29077

Notes on interpreting autocorrelation

The estimated autocorrelation is negative. Although most natural
processes would be expected to produce positive autocorrelations,
occasional large measurement errors can create the appearance of a
negative autocorrelation. Some processes correct so that a unusually
large value tends to be compensated by an unusually small one, e.g.
residue in chemical process.

48

Some issues concerning autocorrelation

1. Lack of fit will generally contribute positively to
autocorrelation. For example, if trajectories are quadratic but
you are fitting a linear trajectory, the residuals will be
positively autocorrelated. Strong positive autocorrelation
can be a symptom of lack of fit. This is an example of poor
identification between the FE model and the R model, that is,
between the deterministic and the stochastic aspects of the
model. See Lab 3 for a similar discussion of seasonal (FE)
versus cyclical variation (R-side) periodic patterns.

2. As mentioned above, occasional large measurement errors will
contribute negatively to the estimate of autocorrelation.

49
3. In a well fitted OLS model, the residuals are expected to be
negatively correlated, more so if there are few observations per
subject.

4. With few observations per subject, the estimate of
autocorrelation (R side) can be poorly identified and highly
correlated with G-side parameters. [See 'Additional Notes']

Looking at the data we suspect that M09 might be highly influential
for autocorrelation. We can refit without M09 to see how the estimate
changes.

What happens when we drop M09?

> fit.dropM09 <- update( fit,
+ subset = Subject != "M09")
50
> summary( fit.dropM09 )
Linear mixed-effects model fit by REML
Data: dd
Subset: Subject != "M09"
AIC BIC logLik
406.3080 429.7545 -194.1540
. . . . .
Correlation Structure: AR(1)
Formula: ~1 | Subject
Parameter estimate(s):
Phi
-0.1246035 still negative
. . . . . .
> intervals( fit.dropM09 )
Approximate 95% confidence intervals
. . . .
Correlation structure:
lower est. upper
Phi -0.5885311 -0.1246035 0.4010562
attr(,"label") but not significantly
[1] "Correlation structure:"
51
Mixed Model in Matrices

In the ith cluster:

1 1
1
0
1
1
0
1
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
1 1
1 1
1 1
1 1
i i
i
i
i
it it i it it
i ii i i i
i ii i i i
i ii i i i
i ii i i i
itu u
u
W
u
Wy X X X
y W X W X X
y W X W X X
y W X W X X
y W X W X X
   




   




                                         
    

     
 
1
2
3
4
i
i
i
i
i iii i




       

  y X uγ Ζ ε

[Could we fit this model in cluster i?]
where
'
~ ( , ) ~ ( , )
~ ( , )
i
i
i i
i i i i i i i
N N
N   
0 Gu ε
u ε
0 R
Ζ 0 Ζ GΖ R

52
For the whole sample
1 1 1 11 0
0N N N N N
            
  
           
               
 
u ε
u
y X Ζ
γ
y X Ζ ε

    

 

Finally making the complex look deceptively simple:
  
 
u εy Xγ Ζ
Xγ δ

with

1 Var( )
Var( )
Var( ) 'N
       

   
  
R 0 G
R Ζ
0
u
ε δ u
R G
ε
δ V Ζ Z R

  

53
Fitting the mixed model

Use Generalized Least Squares on

~ ( , ' )N y Xγ ΖGZ R

1
1 1ˆ ˆˆ ' 'GLS
    γ X V X X V y

We need ˆGLSγ to get Vˆ and vice versa so one algorithm iterates from
one to the other until convergence.

There are two main ways of fitting mixed models with normal
responses:

54
1. Maximum Likelihood (ML)

a. Fits all of , ,γ G R at the same time.

b. Two ML fits can be used in the 'anova' function to test
models that differ in their FE models or in their RE
models: G and/or R.

c. ML fits tend to underestimate V and Wald tests will tend
to err on the liberal side.

2. Restricted (or Residual) Maximum Likelihood

a. Maximum likelihood is applied to the 'residual space'
with respect to the X matrix and only estimates G and R,
hence V. The estimated V is then used to obtain the
GLS estimate for γ .

55
b. 'anova' can only be used to compare two models with
identical FE models. Thus 'anova' (Likelihood Ratio
Tests) can only be applied to hypotheses about REs.

c. The estimate of V tends to be better than with ML and
Wald tests are expected to be more accurate than with
ML.

d. Thus with REML, you sacrifice the ability to perform
LRTs for FEs but improve Wald tests for FEs. Also,
LRTs for REs are expected to be more accurate.

e. REML is the default for 'lme' and PROC MIXED in
SAS.

56
Comparing GLS and OLS

We used OLS above:

  1ˆ ' 'OLS γ X X X y

instead of
1
1 1ˆ ˆˆ ' 'GLS
    γ X V X X V y

How does OLS differ from GLS?

Do they differ only in that GLS produces more accurate standard
errors? Or can ˆOLSβ be very different from ˆGLSβ ?

With balanced data they will be the same. With unbalanced data they
can be dramatically different. OLS is an estimate based on the pooled
data. GLS provides an estimate that is closer to that of the unpooled
57
data. Estimation of the FE model and of RE model are highly related
in contrast with OLS and GLMs with canonical links where they are
orthogonal.

Testing linear hypotheses in R

Hypotheses involving linear combination of the fixed effects
coefficients can be tested with a Wald test. The Wald test is
based on the normal approximation for maximum likelihood
estimators using the estimated variance-covariance matrix.

Using the 'wald' function alone displays the estimated fixed
effects coefficients and Wald-type confidence intervals as well as
a test that all true coefficients are equal 0 (this is rarely of any
interest).

58

> wald (fit)
numDF denDF F.value p.value
4 24 952.019 <.00001

Coefficients Estimate Std.Error DF t-value p-value
(Intercept) 16.479081 1.050894 76 15.681019 <.00001
age 0.769464 0.089141 76 8.631956 <.00001
SexFemale 0.905327 1.615657 24 0.560346 0.58044
age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Continuation:

Coefficients Lower 0.95 Upper 0.95
(Intercept) 14.386046 18.572117
age 0.591923 0.947004
SexFemale -2.429224 4.239878
age:SexFemale -0.563872 -0.017967

59
We can estimate the response level at age 14 for Males and Females
by specifying the appropriate linear transformation of the coefficients.

> L <- rbind( "Male at 14" = c( 1, 14, 0, 0),
+ "Female at 14" = c( 1, 14, 1, 14))
> L
[,1] [,2] [,3] [,4]
Male at 14 1 14 0 0
Female at 14 1 14 1 14

> wald ( fit, L )
numDF denDF F.value p.value
1 2 24 1591.651 <.00001

Estimate Std.Error DF t-value p-value Lower 0.95
Male at 14 27.25157 0.605738 76 44.98908 <.00001 26.04514
Female at 14 24.08403 0.707349 24 34.04829 <.00001 22.62413 Upper 0.95
Male at 14 28.45800
Female at 14 25.54392
60
To estimate the gap at 14:

> L.gap <- rbind( "Gap at 14" = c( 0, 0, 1, 14))
> L.gap
[,1] [,2] [,3] [,4]
Gap at 14 0 0 1 14

> wald ( fit, L.gap)
numDF denDF F.value p.value
1 1 24 11.56902 0.00235

Estimate Std.Error DF t-value p-value Lower 0.95
Gap at 14 -3.167548 0.931268 24 -3.401327 0.00235 -5.089591

Upper 0.95
Gap at 14 -1.245504

61
To simultaneously estimate the gap at 14 and at 8 we can do the
following. Note that the overall (simultaneous) null hypothesis here is
equivalent to the hypothesis that there is no difference between the
sexes.

> L.gaps <- rbind( "Gap at 14" = c( 0, 0, 1, 14),
+ "Gap at 8" = c( 0,0,1, 8))
> L.gaps
[,1] [,2] [,3] [,4]
Gap at 14 0 0 1 14
Gap at 8 0 0 1 8

> wald ( fit, L.gaps)
numDF denDF F.value p.value
1 2 24 5.83927 0.00858

Estimate Std.Error DF t-value p-value Lower 0.95
Gap at 14 -3.167548 0.931268 24 -3.401327 0.00235 -5.089591
Gap at 8 -1.422030 0.844256 24 -1.684359 0.10508 -3.164489

62

An equivalent hypothesis that there is no difference between the sexes
is the hypothesis that the two coefficients for sex are simultaneously
equal to 0. The 'wald' function simplifies this by allowing a string as a
second argument that is used to match coefficient names. The test
conducted is that all coefficients whose name has been matched are
simultaneously 0.

> wald ( fit, "Sex" )
numDF denDF F.value p.value
Sex 2 24 5.83927 0.00858

Coefficients Estimate Std.Error DF t-value p-value
SexFemale 0.905327 1.615657 24 0.560346 0.58044
age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Coefficients Lower 0.95 Upper 0.95
SexFemale -2.429224 4.239878
age:SexFemale -0.563872 -0.017967
Note the equivalence of the two F-tests above.
63
Modeling dependencies in time
The main difference between using mixed models for multilevel
modeling as opposed to longitudinal modeling are the assumptions
about
it
 , plus the more complex functional forms for time effects. For
observations observed in time, part of the correlation between  s
could be related to their distance in time.
R-side model allows the modeling of temporal and spatial
dependence.
Correlation argument R
Autoregressive of order 1:
corAR1( form =
~ 1 | Subject)
2 3
22
2
3 2
1
1
1
1
  
     
  
            

64
Correlation argument R
Autoregressive Moving Average
of order (1,1)
corARMA( form =
~ 1 | Subject,
p = 1, q =1)
2
2
2
1
1
1
1
  
     
  
            

AR(1) in continuous time
e.g. supposing a subject with
times 1,2, 5.5 and 102
corCAR1( form =
~ time | Subject)
4.5 9
3.5 82
4.5 3.5 4.5
9 8 4.5
1
1
1
1
  
     
  
             

2 Note that the times and the number of times – hence the indices – can change from subject to subject but 2 and have the same value.
65
G-side vs. R-side

 A few things can be done with either side. But don’t do it with
both in the same model. The redundant parameters will not be
identifiable. For example, the G-side random intercept model is
‘almost’ equivalent to the R-side compound symmetry model.

 With OLS the linear parameters are orthogonal to the variance
parameter. Collinearity among the linear parameters is determined
by the design, X, and does not depend on values of parameters.
Computational problems due to collinearity can be addressed by
orthogonalizing the X matrix.

 With mixed models the variance parameters are generally not
orthogonal to each other and, with unbalanced data, the linear
parameters are not orthogonal to the variance parameters.

66
 G-side parameters can be highly collinear even if the X matrix is
orthogonal. Centering the variables of the RE model around the
“point of minimal variance” will help but the resulting design
matrix may be highly collinear.

 G-side and R-side parameters can be highly collinear. The degree
of collinearity may depend on the value of the parameters.

 For example, our model identifies  through:
2 3
2200 01
210 11
3 2
11 3
11 1 1 1 1 1ˆ
1 1 3 1 1 3 1
1 3 1
g
g g
g
  
     
  
                                                      

  V
For values of  above 0.5, the Hesssian is very ill-conditioned. The
lesson may be that to use AR, ARMA models effectively, you need
at least some subjects observed on many occasions.

67
 R-side only: population average models

 G-side only: hierarchical models with conditionally independent
observations in each cluster

 Population average longitudinal models can be done on the R-side
with AR, ARMA structures, etc.

 The absence of the G-side may be less crucial with balanced data.

 The G-side is not enough to provide control for unmeasured
between subject confounders if the time-varying predictors are
unbalanced (more on this soon).

 A G-side random effects model DOES NOT provide the equivalent
of temporal correlation.

68
Simpler Models

The model we’ve looked at is deliberately complex including
examples of the main typical components of a mixed model. We can
use mixed models for simpler problems.
Using X as a generic time-varying (within-subject) predictor and W as
a generic time-invariant (between-subject) predictor we have the
following:

MODEL RANDOM Formula
One-way
ANOVA with
random effects
~ 1 ~ 1 | Sub 0i itit uy    
Means as
outcomes
~ 1 + W
(~W)3
~ 1 | Sub
0
i
i i
it
tu
Wy 
 

 


3 The model in () is equivalent in R
69
One-way
ANCOVA with
random effects
~ 1 + X (~X) ~ 1 | Sub
0
1
i it
it it
u
y X 
 
 

Random
coefficients
model
~ 1 + X (~X) ~ 1+ X|Sub
1
0 1i i
it it
it itu u
y X
X
 
 

 


Intercepts and
slopes as
outcomes
~ 1 + X
+ W + X:W
(~ X*W)
~ 1+
X|Sub
1
1
0 1
i
i i
it it
i it
it it
W
u u
W
y X
X
X
  


  



 
 


Non- random
slopes
~ 1 + X
+ W + X:W
(~ X*W)
~ 1 | Sub
0
1
1
i
i
it it
i it
it
W
W
y X
X

 


  




 


70
BLUPS: Estimating Within-Subject Effects

We’ve seen how to estimate  , G and R. Now we consider 0
1
i
i
i
 
    
 .

We’ve already estimated i using the fixed-effects model with a OLS
regression within each subject. Call this estimator: iˆ . How good is
it?
1
'ˆVar( )i i i i  
      X X

Can we do better? We have another ‘estimator’ of i .
71
Suppose we know  s for the population. We could also predict4 i by
using the within Sex mean intercepts and slopes, e.g. for Males we
could use:
1


   
with error variance:

0 00
1 10
Var i
i
 
 
             
G

We could then combine iˆ and
1


   
by weighting then by inverse
variance (= precision). This yields the BLUP (Best Linear Unbiased
Predictor):
   11 11 11 ' 1 '
1
ˆ
i i i i i
  
     

                     
G X X G X X

4 Non-statisticians are always thrown for a loop when we ‘predict’ something that
happened in the past. We use 'predict' for things that are random, 'estimate' for
things that are 'fixed'. Orthodox Bayesians always predict.
72

If we replace the unknown parameters with their estimates, we get the
EBLUP (Empirical BLUP):

   11 11 11 ' 1 '
1
ˆˆ ˆ ˆˆ ˆ
ˆi i i i i i
   
     

                      
G X X G X X

The EBLUP ‘optimally’ combines the information from the ith cluster
with the information from the other clusters. We borrow strength
from the other clusters.
The process ‘shrinks’ iˆ towards
1
ˆ
ˆ




   
along a path determined by the
locus of osculation of the families of ellipses with shape Gˆ around
1
ˆ
ˆ




   

and shape   1'ˆ i i    X X around iˆ .
73

The slope of the BLUP
is close to the population
slope

but

the level of the BLUP is
close to the level of the
BLUE

This suggests that G has
a large variance for
intercepts and a small
variance for slopes
age
d
i
s
t
a
n
c
e
20
25
30
8 9 10 12 14
M16 M05
8 9 10 12 14
M02 M11
8 9 10 12 14
M07 M08
M03 M12 M13 M14 M09
20
25
30
M15
20
25
30
M06 M04 M01 M10
F10 F09 F06 F01 F05
20
25
30
F07
20
25
30
F02
8 9 10 12 14
F08 F03
8 9 10 12 14
F04 F11
Popn BLUE BLUP
74

Population estimate
BLUE and BLUP
in beta space

slope
I
n
t
5
10
15
20
25
0.5 1.0 1.5 2.0
M16 M05
0.5 1.0 1.5 2.0
M02 M11
0.5 1.0 1.5 2.0
M07 M08
M03 M12 M13 M14 M09
5
10
15
20
25
M15
5
10
15
20
25
M06 M04 M01 M10
F10 F09 F06 F01 F05
5
10
15
20
25
F07
5
10
15
20
25
F02
0.5 1.0 1.5 2.0
F08 F03
0.5 1.0 1.5 2.0
F04 F11
Popn BLUE BLUP
75

The marginal dispersion
of BLUEs comes from:

2 ' 1
2
1 11 2
ˆVar( ) ( )
ˆVar( )
i
i i i
i
i X
g
T S
 

 
  
G X X

 Var( )i  G [population
var.]
 2 ' 1ˆVar( | ) ( )i i i i    X X
[cond’l var.
resampling from ith
subject]
 ˆE( | )i i i   [BLUE]

slope
I
n
t
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
Popn BLUE BLUP
76

So:

 
 
2 ' 1
ˆ ˆVar( ) Var(E( | ))
ˆE Var( | )
Var( )
ˆE Var( | )
( )
i i i
i i
i
i i
i i
  
 

 
 
 


 G X X

slope
I
n
t
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
Popn BLUE BLUP
77

While the expected
variance of the BLUEs
is larger than G

the expected variance of
the BLUPs is smaller
than G.

Beware of drawing
conclusions about G
from the dispersion of
the BLUPs.

slope
I
n
t
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
Popn BLUE BLUP
78

The estimate of G can
be unstable and often
collapses to singularity
leading to non-
convergence for many
methods.
Possible remedies:
- Recentre X near point
of minimal variance,
- Use a smaller G
- Change the model

slope
I
n
t
5
10
15
20
25
0.5 1.0 1.5 2.0
Male
0.5 1.0 1.5 2.0
Female
Popn BLUE BLUP
79
Where the EBLUP comes from : looking at a single subject

Note that the EBLUP’s
slope is close to the slope
of the population estimate
(i.e. the male population
conditioning on between-
subject predictors) while
the level of the line is
close to level of the
BLUE.

The relative precisions of
the BLUE and of the
population estimate on
slope and level are
reflected through the
shapes of G and 2 ' 1( )i i X X
M11
age
d
i
s
t
a
n
c
e
22
23
24
25
26
27
8 9 10 11 12 13 14
Popn BLUE BLUP
80

The same picture in
“beta-space”
M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
81

The population
estimate with a SD
ellipse.
M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
82

The population
estimate with a SD
ellipse

and

the BLUE with its
SE ellipse

M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
83

The EBLUP is an Inverse
Variance Weighted mean
of the BLUE and of the
population estimate.

We can think of taking the
BLUE and ‘shrinking’ it
towards the population
estimate along a path that
optimally combines the
two components.

The path is formed by the
osculation points of the
families of ellipses around
the BLUE and the
population estimate.

M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
84

The amount and direction
of shrinkage depends on
the relative shapes and
sizes of

G

and
2
2 ' 1 1ˆVar( | ) ( )
ii i i
i
i
T
     XX X S

The BLUP is at an
osculation point of the
families of ellipses
generated around the
BLUE and population
estimate.
M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
85

Imagine what could
happen if G were
oriented differently:

Paradoxically, both the
slope and the intercept
could be far outside the
population estimate and
the BLUE.

M11
Int
s
l
o
p
e
0.0
0.2
0.4
0.6
0.8
16 18 20 22
Popn BLUE BLUP
86

When is a BLUP a BLUPPER?

The rationale behind BLUPs is based on exchangeability. No outside
information should make this cluster stand out from the others and the
mean of the population deserves the same weight in prediction for this
cluster as it deserves for any other cluster that doesn’t stand out.

If a cluster stands out somehow, then the BLUP might be a
BLUPPER.

87
Interpreting G

The parameters of G give the variance of the intercepts, the variance
of the slopes and the covariance between intercepts and the slopes.

Would it make sense to assume that the covariance is 0 to reduce the
number of parameters in the model? To address this, consider that the
variance of the heights of individual regression lines a fixed value of
X is:

1
1
00 01
10 11
2
00 01 11
Var( ) Var 1
11
2
X X
g g
X g g X
g g X g X
  
   
             
               
 

  

88

Summarizing:

2
1 00 01 11Var( ) 2X g g X g X      

is quadratic function of X.

So 1Var( )X    has a minimum at 01
11
g
g

and the minimum variance is
2
01
00
11
gg g

89

Thus, assuming that the covariance is 0 is equivalent to
assuming that the minimum variance occurs when X = 0. This is
an assumption that is not invariant with location transformations of
X. It is similar to removing a main effect that is marginal to an
interaction in a model, something that should not be done without a
thorough understanding of its consequences.

Example: Let 20
1




        
and 10.5 1
1 0.1
    G

90

A sample of lines in
beta space
1


0
15
20
25
30
-2.0 -1.5 -1.0 -0.5 0.0
concentration ellipse
91

The same lines in
data space.
X
Y
0
5
10
15
20
25
5 10 15 20 25
92

The same lines in
data space with the
population mean
line and lines at one
SD above and
below the
population mean
line

X
Y
0
5
10
15
20
25
5 10 15 20 25
93

The parameters of
G determine the
location and value
of the minimum
standard deviation
of lines

X
Y
0
5
10
15
20
25
5 10 15 20 25
g01 g11
 g00
 g00 g012 g11
94
With two time-varying variables with random effects, the G matrix
would look like:

0 00 01 02
1 10 11 12
2 20 21 22
Var
i
i
i
g g g
g g g
g g g



                         


The point of minimum variance is located at:

1
1011 12
21 22 20
gg g
g g g
            


95
Differences between lm (OLS) and lme (mixed model) with
balanced data

Just looking at regression coefficients:

> fit.ols <- lm( distance ~ age * Sex, dd)
> fit.mm <- lme( distance ~ age * Sex, dd,
+ random = ~ 1 + age | Subject)
> summary(fit.ols)

Call:
lm(formula = distance ~ age * Sex, data = dd)

Residuals:
Min 1Q Median 3Q Max
-5.6156 -1.3219 -0.1682 1.3299 5.2469

96
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.3406 1.4162 11.538 < 2e-16
age 0.7844 0.1262 6.217 1.07e-08
SexFemale 1.0321 2.2188 0.465 0.643
age:SexFemale -0.3048 0.1977 -1.542 0.126
. . .
> summary(fit.mm)
. . .
Fixed effects: distance ~ age * Sex
Value Std.Error DF t-value p-value
(Intercept) 16.340625 1.0185320 79 16.043311 0.0000
age 0.784375 0.0859995 79 9.120691 0.0000
SexFemale 1.032102 1.5957329 25 0.646789 0.5237
age:SexFemale -0.304830 0.1347353 79 -2.262432 0.0264

Note that going from OLS to MM, precision shifts from between-
subject comparisons to within-subject comparisons. When data are
balanced, lm (OLS) and lme (mixed models) produce the same ˆ s
with different SEs.
97

Take 2: Learning lessons from unbalanced
data

What can happen with unbalanced data?

Here is some data that is similar to the Pothoff and Roy data but with:

 different age ranges for different subjects

 a between-subject effect of age that is different from the within-
subject effect of age

98
> head(du)
y x id xb xw Subject Sex age
1 12.37216 8 1 11 -3 F09 Female 8
2 11.20801 10 1 11 -1 F09 Female 10
3 10.44755 12 1 11 1 F09 Female 12
4 10.43831 13 1 11 2 F09 Female 13
5 14.13549 9 2 12 -3 F11 Female 9
6 13.47965 11 2 12 -1 F11 Female 11

> tail(du)
y x id xb xw Subject Sex age
103 35.67045 37 26 36 1 M08 Male 37
104 35.70928 38 26 36 2 M08 Male 38
105 38.81624 34 27 37 -3 M10 Male 34
106 37.87866 36 27 37 -1 M10 Male 36
107 36.22499 38 27 37 1 M10 Male 38
108 35.62520 39 27 37 2 M10 Male 39
99

age_raw
y
10
20
30
40
10 20 30 40
M08 M10
10 20 30 40
M04 M02
10 20 30 40
M16 M14
M12 M11 M07 M06 M01
10
20
30
40
M05
10
20
30
40
M09 M15 M13 M03
F02 F10 F06 F03 F08
10
20
30
40
F07
10
20
30
40
F01
10 20 30 40
F04 F11
10 20 30 40
F09 F05
100

age_raw
y
10
20
30
40
10 20 30 40
Male
10 20 30 40
Female
101

Using age centered at
25.

Why? Like the ordinary
regression model, the
mixed model is
equivariant under
global centering but
convergence may be
improved because the G
matrix is less eccentric.

age
y
10
20
30
40
-10 0 10
Male
-10 0 10
Female
102

R code and output
> fit <- lme( y ~ age * Sex, du,
+ random = ~ 1 + age| Subject)
> summary( fit )
Linear mixed-effects model fit by REML
Data: du
AIC BIC logLik
374.6932 395.8484 -179.3466

Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-
Cholesky parametrization
StdDev Corr
(Intercept) 9.32672995 (Intr)
age 0.05221248 0.941
Residual 0.50627022
103

Fixed effects: y ~ age * Sex
Value Std.Error DF t-value p-value
(Intercept) 40.26568 2.497546 79 16.122095 0.0000
age -0.48066 0.035307 79 -13.613685 0.0000
SexFemale -14.01875 3.830956 25 -3.659333 0.0012
age:SexFemale 0.05239 0.055373 79 0.946092 0.3470

Correlation:
(Intr) age SexFml
age -0.007
SexFemale -0.652 0.005
age:SexFemale 0.005 -0.638 0.058

Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.10716969 -0.54148659 -0.02688422 0.59030024 2.14279806

Number of Observations: 108
Number of Groups: 27

104
Between, Within and Pooled Models

We first focus on one
group, the female data:

What models could we fit
to this data?

age
y
10
20
30
40
-15 -10 -5 0 5 10

105

age
y
10
20
30
40
-15 -10 -5 0 5 10

marginal SD ellipse
106

Regressing with the
pooled data – ignoring
Subject – yields the
marginal (unconditional)
estimate of the slope: Pˆ

age
y
10
20
30
40
-15 -10 -5 0 5 10

marginal regression line
107

We could replace each
Subject by its means for x
and y and use the resulting
aggregated data with one
point for each Subject.

age
y
10
20
30
40
-15 -10 -5 0 5 10

108

age
y
10
20
30
40
-15 -10 -5 0 5 10

dispersion ellipse of subject means
109

Performing a regression
on the aggregated data
yields the ‘between-
subject’ regression, in
some contexts called an
‘ecological regression’
estimating, in some
contexts, the
'compositional effect' of
age.
age
y
10
20
30
40
-15 -10 -5 0 5 10

regression on subject means = ecological regression
110

We can combine all
within-subject regressions
to get a combined estimate
of the within-subject
slope. This is the estimate
obtained with a fixed-
effects model using age
and Subject additively.
Equivalently, we can
perform a regression using
(the within-subject
residuals of y minus mean
y) on (age minus mean
age).

Q: Which is better: ˆ, Wˆ or
Pˆ ?
.
age
y
10
20
30
40
-15 -10 -5 0 5 10

within-subject regression
111

A: None. They answer
different questions.
Typically, Pˆ would be
used for prediction across
the population; Wˆ for
‘causal’ inference
controlling for between-
subject confounders,
assuming that all
confounders affect all
observations similarly.
age
y
10
20
30
40
-15 -10 -5 0 5 10

within-subject regression
112

The relationship among
estimators:

Pˆ combines ˆ and Wˆ :

   1P B W B B W Wˆ ˆ ˆW W W W    

The weights depend only
on the design ( X matrix),
not of estimated variances
of the response.

age
y
10
20
30
40
-15 -10 -5 0 5 10
between-subject
marginal
within-subject
113
The Mixed Model
The mixed model
estimate5 also combines
ˆ and Wˆ :

 
 
1MM
MM B W
MM
B B W W
ˆ
ˆ ˆ
W W
W W

 
 


but with a lower weight on
ˆ:

MM 00
B B B
00
00
/
/
/ /1
T
gTW W W
T g T
g


 


   
Note that
MM
B BW W

5 Using a random intercept model
age
y
10
20
30
40
-15 -10 -5 0 5 10
between-subject
marginal
mixed model
within-subject
114

 The mixed model estimator is a variance optimal combination of ˆ
and Wˆ .
 It makes perfect sense if ˆ and Wˆ estimate the same thing, i.e. if
W   !
 Otherwise, it’s an arbitrary combination of estimates that estimate
different things. The weights in the combination have no
substantive interpretation.
 i.e. it’s an optimal answer to a meaningless question.

Summary of the relationships among 4 models:

Model Estimate of slope Precision
Between Subjects ˆ BW
Marginal (pooled data) Pˆ
Mixed Model MMˆ
Within Subjects Wˆ WW
115

The pooled estimate combines ˆ and Wˆ :

1
P B W B B W W
ˆ ˆ ˆW W W W            
  

Mixed model

With a random intercept model:
1 0 0, 00~ (0, ), ~ (0, )i iit iit it tu uy X N g N      

with 00 ,g   known MMˆ is also a weighted combination of ˆ
and Wˆ but with less weight on ˆ:
116
MM
B B
00
B
/
/
Between-Subject Information
Within-Subject Informationmonotone
TW W
T g
f W




    
 
 

 MMˆ is between Wˆ and Pˆ , i.e. it does better than Pˆ in the sense
of being closer to Wˆ but is not equivalent to Wˆ .

 With balanced data W MM P Pˆ ˆ ˆ ˆ     

 As
00
1 0
T g
    , MM Wˆ ˆ  , so a mixed model estimates the
within effect asymptotically in T – which is the cluster size
NOT the number of clusters.
117

 As
00
1
T g
    , MM Bˆ ˆ  . Thus the mixed model estimate fails
to control for between-subject confounding factors. Note that
this does not capture the whole story because Wˆ and Bˆ are not
independent of
00
g . If
00
0g  then
B W  so that MM B W   
118
A serious a problem? a simulation

1,000 simulations
showing
mixed model estimates
of slope
using the same
configuration of Xs with
W 1/ 2   and B 1  ,
keeping 00 1/ 2g  and
allowing
  to vary from 0.005 to
5




^
1
-0.5
0.0
0.5
1.0
0 1 2 3 4 5
119

What happened?

As  gets larger,
the relatively small
value of 00g is harder to
identify
and
both sources of
variability
(within-subject and
between-subject)
are attributed to  .




^
0
2
4
6
0 1 2 3 4 5
120

The blue line is the
diagonal ˆ  and the
equation of the red line
is ˆ 1   .

When 00ˆ 0g  , the
between- subject
relationship is treated as
if it has very high
precision and it
dominates in forming
the mixed model
estimate.




^
0
2
4
6
0 1 2 3 4 5
121
Splitting age into two variables

Since age has a within-subject effect that is inconsistent with its
between-subject effect we can split it into two variables:

1. Between-subject ‘contextual predictor’: e.g. age.mean of each
subject (or the starting age), and

2. within-subject predictor:

a. age itself or
b. within-subject residual: age.resid = age – age.mean

So we model:

.E( ) .age mean ageit i it ity age mean age      

or
122

* * *
.0 .E( ) . .age meanit i it itage diffy age mean age diff      

Surprisingly

*
. ageage diff 

but

* *
. . .
.
age mean age mean age diff
age mean age
  
 
 
 

123

9 10 11 12
9
1
0
1
1
1
2
age
y
124

9 10 11 12
9
1
0
1
1
1
2
age
y
125

9 10 11 12
9
1
0
1
1
1
2
age
y
age
126

9 10 11 12
9
1
0
1
1
1
2
age
y
age  age.diff*
127

9 10 11 12
9
1
0
1
1
1
2
age
y
age  age.diff*
128

9 10 11 12
9
1
0
1
1
1
2
age
y
age  age.diff*age.mean
129

9 10 11 12
9
1
0
1
1
1
2
age
y
age  age.diff*age.mean
130

*
.
.
age mean
age mean age

  

*
.age mean keeps
age.diff constant

.age mean keeps age
constant

9 10 11 12
9
1
0
1
1
1
2
age
y
age  age.diff*age.mean
age.mean*
Compositional effect
= Contextual effect
+ Within-subject effect
131
Using 'lme' with a contextual mean

> fit.contextual <- lme(
+ y ~ (age + cvar(age,Subject) ) * Sex,
+ du,
+ random = ~ 1 + age | Subject)

> summary(fit.contextual)
Linear mixed-effects model fit by REML
Data: du
AIC BIC logLik
296.8729 323.1227 -138.4365

Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 1.53161007 (Intr)
age 0.03287630 0.024
Residual 0.51263884

132
Fixed effects: y ~ (age + cvar(age, Subject)) * Sex
Value Std.Error DF t-value
(Intercept) -3.681624 1.6963039 79 -2.170380
age -0.493880 0.0343672 79 -14.370670
cvar(age, Subject) 1.628584 0.0695822 23 23.405165
SexFemale 6.000170 2.5050694 23 2.395211
age:SexFemale 0.060143 0.0538431 79 1.116996
cvar(age, Subject):SexFemale -0.313087 0.1266960 23 -2.471167
p-value
(Intercept) 0.0330
age 0.0000
cvar(age, Subject) 0.0000
SexFemale 0.0251
age:SexFemale 0.2674
cvar(age, Subject):SexFemale 0.0213

. . . . .

Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.871139553 -0.502221634 -0.006447848 0.552360837 2.428148053

Number of Observations: 108
Number of Groups: 27

133
> fit.compositional <- lme( y ~ (dvar(age,Subject) +
+ cvar(age,Subject) ) * Sex, du,
+ random = ~ 1 + age | Subject)
> summary(fit.compositional)
Linear mixed-effects model fit by REML
Data: du
AIC BIC logLik
296.8729 323.1227 -138.4365

Random effects:
Formula: ~1 + age | Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 1.53161006 (Intr)
age 0.03287629 0.024
Residual 0.51263884

Fixed effects: y ~ (dvar(age, Subject) + cvar(age, Subject)) * Sex
Value Std.Error DF t-value
(Intercept) -3.681624 1.6963039 79 -2.170380
dvar(age, Subject) -0.493880 0.0343672 79 -14.370670
cvar(age, Subject) 1.134704 0.0616092 23 18.417778
SexFemale 6.000170 2.5050694 23 2.395211
dvar(age, Subject):SexFemale 0.060143 0.0538431 79 1.116996
cvar(age, Subject):SexFemale -0.252945 0.1161225 23 -2.178257
134
p-value
(Intercept) 0.0330
dvar(age, Subject) 0.0000
cvar(age, Subject) 0.0000
SexFemale 0.0251
dvar(age, Subject):SexFemale 0.2674
cvar(age, Subject):SexFemale 0.0399

. . . . .

Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-1.871139550 -0.502221640 -0.006447847 0.552360836 2.428148063

Number of Observations: 108
Number of Groups: 27
135
Simulation Revisited

1,000 simulations
using the same
models as the
earlier simulation,
i.e.
the same
configuration of Xs
with
W 1/ 2   and B 1  ,
keeping 00 1/ 2g 
and
allowing
  to vary from
0.005 to 5





^
W


^
B
-1.0
-0.5
0.0
0.5
1.0
0 1 2 3 4 5
136

Here a mixed
model is used with
mean age by
subject and the
within-subject
residual of age
from mean age.




^
W


^
B
-1.0
-0.5
0.0
0.5
1.0
0 1 2 3 4 5
137

Including the
contextual variable
gives better
estimates of
variance
components. The
estimate of  does
not eventually
include 00g




^
0
1
2
3
4
5
0 1 2 3 4 5
138
Power

The best way to carry out power calculations is to simulate. You end
up learning about a lot more than power.

Nevertheless, Stephen Raudenbush and colleagues have a nice
graphical package available at Optimal Design Software.

139

Some links

 There is a very good current bibliography as well as many other
resources at the UCLA Academic Technology Services site. Start
your visit at
http://www.ats.ucla.edu/stat/sas/topics/repeated_measures.htm

 Another important site is the Centre for Multilevel Modeling,
currently at the University of Bristol:
http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-
support/news.shtml

140
A few books

 Pinheiro, Jose C. and Bates, Douglas M. (2000) Mixed-Effects
Models in S and S-PLUS. Springer

 Fitzmaurice, Garrett M., Laird, Nan M., Ware, James H. (2004)
Applied Longitudinal Analysis, Wiley.

 Allison, Paul D. (2005) Fixed Effects Regression Methods for
Longitudinal Data Using SAS, SAS Institute.

 Littell, Ramon C. et al. (2006) SAS for Mixed Models (2nd ed.),
SAS Institute.

 Singer, Judith D. and Willett, John B. (2003) Applied Longitudinal
Data Analysis : Modeling Change and Event Occurrence. Oxford
University Press.
141

Appendix: Reinterpreting weights

The mixed model estimate using a random intercept model can be seen
either as a weighted combination of ˆ and Wˆ or of Pˆ and Wˆ

       
1
MM B W B B W W
00 00
11 1 1 1
00 B W 00 B B W W
11
1 1
00 B W W 00 B W P
/ /ˆ ˆ ˆ
/ /
ˆ ˆ
ˆ
T TW W W W
T g T g
g W W g W W
T T T T
g W W W g W W
T T
    
    
 
 
 
      
  
         
                                 
         
   
    
    
1
W W
ˆW 
     

学霸联盟