
Mixed Models with R:

Longitudinal Data Analysis

The Special Roles of Time

ICPSR 2017 at York University

Georges Monette

random@yorku.ca

Contents

Summary
Take 1: The basic ideas
    A traditional example
    Pooling the data ('wrong' analysis)
    Fixed effects regression model
    Other approaches
    Multilevel Models
    From Multilevel Model to Mixed Model
    Mixed Model for Longitudinal Data in R
        Notes on interpreting autocorrelation
        Some issues concerning autocorrelation
    Mixed Model in Matrices
    Fitting the mixed model
    Comparing GLS and OLS
    Testing linear hypotheses in R
    Modeling dependencies in time
    G-side vs. R-side
    Simpler Models
    BLUPS: Estimating Within-Subject Effects
        Where the EBLUP comes from: looking at a single subject
    Interpreting G
    Differences between lm (OLS) and lme (mixed model) with balanced data
Take 2: Learning lessons from unbalanced data
    R code and output
    Between, Within and Pooled Models
    The Mixed Model
    A serious problem? A simulation
    Splitting age into two variables
    Using 'lme' with a contextual mean
    Simulation Revisited
Power
Some links
A few books
Appendix: Reinterpreting weights

Summary:

At first sight a mixed model for longitudinal data analysis does not

look very different from a mixed model for hierarchical data. In

matrices:

Linear Model:

    y = Xβ + ε,   ε ~ N(0, σ²I)

Mixed Model for Hierarchical Data:

    y_j = X_j γ + Z_j u_j + ε_j
    ε_j ~ N(0, σ²I)
    u_j ~ N(0, G)

where y_j = (y_j1, ..., y_jn_j)' stacks the observations in the jth cluster (students in the jth school).

Mixed Model for Longitudinal Data:

    y_j = X_j γ + Z_j u_j + ε_j
    ε_j ~ N(0, R_j)
    u_j ~ N(0, G)

where y_j stacks the observations over time on the jth subject.

Formally, mixed models for hierarchical data and for longitudinal data look

almost the same. In practice, longitudinal data introduces some fascinating

challenges:

1) The observations within a cluster are not necessarily independent. This is the reason for the broader condition that ε_j ~ N(0, R_j) (where R_j is a variance matrix) instead of merely the special case ε_j ~ N(0, σ²I). Observations close together in time might depend on each other in ways that are different from observations far apart in time. Note that if all observations have equal variance and are equally positively correlated – what is called a Compound Symmetry variance structure – this is entirely accounted for by the random intercept model on the G side. The purpose of the R matrix is to potentially capture interdependence that is more complex than compound symmetry.
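This equivalence is easy to check numerically: a random intercept with variance g00 plus independent residuals with variance σ² gives every pair of observations covariance g00 and every observation variance g00 + σ². A minimal base-R sketch (g00 = 4 and σ² = 1 are arbitrary illustrative values, not estimates from any data in these notes):

```r
# Marginal covariance implied by a G-side random intercept:
# V = Z G Z' + sigma^2 * I, with Z a column of ones.
g00    <- 4                 # illustrative random-intercept variance
sigma2 <- 1                 # illustrative residual variance
n      <- 4                 # occasions per subject

Z <- matrix(1, n, 1)        # random-intercept design matrix
V <- Z %*% g00 %*% t(Z) + sigma2 * diag(n)

V   # constant diagonal (g00 + sigma2), constant off-diagonal (g00):
    # compound symmetry, intraclass correlation g00 / (g00 + sigma2)
```

This is why an R-side compound-symmetry structure fitted alongside a G-side random intercept is redundant: both produce the same marginal V.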

2) The mean response may depend on time in ways that are far more complex than is typical for other types of predictors. Depending on the time scale of the observations, it may be necessary to use polynomial models, asymptotic models, Fourier analysis (orthogonal trigonometric functions) or splines that adapt to different features of the relationship in different periods of time.

3) There can be many partially confounded 'clocks' in the same analysis: age-period-cohort effects, age and time relative to a focal event such as giving birth, injury, arousal from coma, etc.

4) Some periodic patterns can be modeled either with fixed effects or through the R matrix. Periodic patterns with a fixed period (e.g. seasonal variation) are naturally modeled with fixed effects (FE) using, for example, trigonometric functions. Periodic patterns with a randomly varying period (e.g. sunspot cycles) are more appropriately modeled with the R matrix, which can be used to model the kinds of patterns encountered in time series analysis: autoregressive and moving-average models for residuals.

These slides focus on the simple functions of time and the R side. Lab 3

introduces more complex forms for functions of time.


Take 1: The basic ideas


A traditional example

Figure 1: Potthoff and Roy dental measurements in boys and girls.

Balanced data:

everyone measured at

the same set of ages

could use a classical

repeated measures

analysis

Some terminology:

Cluster: the set of

observations on one

subject

Occasion: observations

at a given time for each

subject

[Trellis plot of distance (20–30) against age (8–14), one panel per subject: M01–M16 and F01–F11.]


Figure 2: A different view by sex

Viewing by sex helps to see the pattern within and between sexes.

Note: Slopes are relatively consistent within each sex – except for a few anomalous male curves. BUT the intercept is highly variable.

An analysis that pools the data ignores this feature: slope estimates will have excessively large SEs, and 'level' estimates will have SEs that are too small.

[Plot of distance against age, one panel for Males and one for Females, with a curve for each subject.]


Pooling the data ('wrong' analysis)

Ordinary least squares on pooled data:

    y_it = β_0 + β_age age_it + β_sex sex_i + β_{age:sex} age_it sex_i + ε_it

    i = 1, ..., N    (number of subjects [clusters])
    t = 1, ..., T_i  (number of occasions for the ith subject)

Note that it is customary to use i for subjects (clusters) and t for

occasions (time). This means that ‘i’ assumes the role previously

played by ‘j’.

R:

> library( spida ) # see notes on installation

> library( nlme ) # loaded automatically

> library( lattice ) # ditto

> data ( Orthodont ) # without spida


> head(Orthodont)

distance age Subject Sex

1 26.0 8 M01 Male

2 25.0 10 M01 Male

3 29.0 12 M01 Male

4 31.0 14 M01 Male

5 21.5 8 M02 Male

6 22.5 10 M02 Male

> dd <- Orthodont

> tab(dd, ~Sex)

Sex

Male Female Total

64 44 108


> dd$Sub <- reorder( dd$Subject, dd$distance)

# for plotting

## OLS Pooled Model

> fit <- lm ( distance ~ age * Sex , dd)

> summary(fit)

. . .

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.3406 1.4162 11.538 < 2e-16 ***

age 0.7844 0.1262 6.217 1.07e-08 ***

SexFemale 1.0321 2.2188 0.465 0.643

age:SexFemale -0.3048 0.1977 -1.542 0.126

Residual standard error: 2.257 on 104 degrees of freedom

Multiple R-squared: 0.4227, Adjusted R-squared: 0.4061

F-statistic: 25.39 on 3 and 104 DF, p-value: 2.108e-12


Note that both SexFemale and age:SexFemale have

large p-values. Are you tempted to just drop

both of them?

Check the joint hypothesis that they are

BOTH 0.

> wald( fit, "Sex")

numDF denDF F.value p.value

Sex 2 104 14.97688 <.00001

Coefficients Estimate Std.Error DF t-value p-value

SexFemale 1.032102 2.218797 104 0.465163 0.64279

age:SexFemale -0.304830 0.197666 104 -1.542143 0.12608

This analysis suggests that we could drop one or the other but not both! Which one should we choose? To respect the principle of marginality we should drop the interaction, not the main effect of Sex. This leads us to:

> fit2 <- lm( distance ~ age + Sex ,dd )

> summary( fit2 )

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 17.70671 1.11221 15.920 < 2e-16 ***

age 0.66019 0.09776 6.753 8.25e-10 ***

SexFemale -2.32102 0.44489 -5.217 9.20e-07 ***

and we conclude there is an effect of Sex on jaw size but we did not find evidence that the rate of growth differs between the sexes.

Revisiting the graph we saw earlier:


Figure 3: A different view by sex

The analysis is inconsistent

with what we see. It fails

Tukey’s interocular test.

Except for a few irregular

male trajectories, the male

trajectories appear steeper

than the female ones. On

the other hand, if we

extrapolate the curves back

to age 0, there would not be much difference in levels.

OLS cannot exploit the consistency in slopes to recognize that

hypotheses about slopes should have a relatively smaller SE than

hypotheses about the levels of the curves. OLS is blind to the lines.

[Plot of distance against age, one panel for Males and one for Females.]


From the first OLS fit:

Estimated variance

within each

subject:

    [ 2.26²    .       .       .     ]
    [   .     2.26²    .       .     ]
    [   .      .      2.26²    .     ]
    [   .      .       .      2.26²  ]

Why is this wrong?

Residuals within

clusters are not

independent; they tend

to be highly correlated

with each other
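We can see this directly from the pooled fit: reshaping the OLS residuals into a subjects-by-occasions matrix shows substantial positive correlations between occasions. A quick sketch (it relies on the Orthodont data being sorted by Subject with four rows each, as in nlme):

```r
library(nlme)                     # provides the Orthodont data
data(Orthodont)
fit <- lm(distance ~ age * Sex, Orthodont)

# One row per subject, one column per occasion (ages 8, 10, 12, 14)
res <- matrix(residuals(fit), ncol = 4, byrow = TRUE)
round(cor(res), 2)   # off-diagonal correlations are strongly positive
```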

[Trellis plot of distance against age, one panel per subject (M01–M16, F01–F11).]


Fitted lines in ‘data

space’.

[Plot of the fitted lines for distance against age, by Sex.]


Determining the

intercept and slope of

each line

[Plot of the fitted lines extended back to age 0, by Sex, showing each line's intercept and slope.]


Fitted lines in

‘data’ space

[Plot of the fitted lines in 'data' space, extended to age 0, by Sex.]


Fitted ‘lines’ in

‘beta’ space

[Plot in 'beta' space: each fitted line shown as a point (β̂_age, β̂_0), by Sex.]


Fixed effects regression model

See

Paul D. Allison (2005) Fixed Effects Regression Methods for

Longitudinal Data Using SAS. SAS Institute – a great book on basics

of mixed models!

Treat Subject as a factor

Lose Sex unless it is constructed as a Subject contrast

Fits a separate OLS model to each subject:

    y_it = β_0i + β_age,i age_it + ε_it

See the wiki for details: SPIDA 2010: Longitudinal Models:

Additional material

Estimated variance for each subject:

    [ 1.31²    .       .       .     ]
    [   .     1.31²    .       .     ]
    [   .      .      1.31²    .     ]
    [   .      .       .      1.31²  ]

Problems:

● No estimate of sex

effect

● Can't generalize to

population, only to 'new'

observations from same

subjects

[Trellis plot of distance against age with each subject's own fitted line, one panel per subject.]


● Can't predict for new

subject.

● Can construct sex

effect but CI is for

difference between

sexes in this sample

● No autocorrelation in

time

[Trellis plot of distance against age, one panel per subject.]


Fitted lines in data

space

● Female lines lower

and less steep

● Patterns within Sexes

not so obvious.

[Fitted lines in data space, by Sex.]


Fitted lines in beta

space

● Patterns within sexes

more obvious: steeper

slope associated with

smaller intercept.

● Single male outlier

stands out

[Fitted lines in beta space: points (β̂_age, β̂_0), by Sex.]


Each within-subject least squares estimate

    β̂_i = (β̂_0i, β̂_age,i)'

has variance σ²(X_i'X_i)⁻¹, which is used to construct a confidence ellipse for the 'fixed effect' β_i = (β_0i, β_age,i)' of the ith subject.

Each CI uses only the information from that subject (except for the estimate of σ).

[Beta-space plot: confidence ellipses for (β̂_age, β̂_0), by Sex.]


Differences between subjects – such as the dispersion of the β̂_i's and the information they provide on the dispersion of the true β_i's – are ignored in this model.

The standard error of the estimate of each average Sex line uses the sample distribution of the ε_it's within subjects but not the variability in the β_i's between subjects.

[Beta-space plot of the within-subject estimates (β̂_age, β̂_0), by Sex.]


Other approaches

Repeated measures (univariate and multivariate)

o Need same times for each subject, no other time-varying

(Level 1) variables

Two-stage approach: use the β̂_i's in a second-level analysis:

o If the design is not balanced, the β̂_i's have different variances and would need different weights. Using σ²(X_i'X_i)⁻¹ does not work because the relevant weight is based on the marginal variance G + σ²(X_i'X_i)⁻¹, not the conditional variance given the ith subject, σ²(X_i'X_i)⁻¹.


Multilevel Models

Start with the fixed effects model:

Within-subject model (same as fixed effects model above):

    y_it = β_0i + β_1i X_it + ε_it,   ε_i ~ N(0, σ²I)
    i = 1, ..., N;  t = 1, ..., T_i

β_0i is the 'true' intercept and β_1i is the 'true' slope with respect to X. σ² is the within-subject residual variance.

X (age in our example) is a time-varying variable. We could have more than one.


Then add:

Between-subject model (new part):

We suppose that β_0i and β_1i vary randomly from subject to subject. But the distribution might be different for different Sexes (a 'between-subject' or 'time-invariant' variable). So we assume a multivariate distribution:

    β_0i = γ00 + γ01 W_i + u_0i
    β_1i = γ10 + γ11 W_i + u_1i        i = 1, ..., N

    u_i = (u_0i, u_1i)' ~ N(0, G),   G = [ g00  g01 ]
                                         [ g01  g11 ]


where W_i is a coding variable for Sex, e.g. 0 for Males and 1 for Females. Thus

    E(β_0i) = γ00   among Males,   γ00 + γ01   among Females
    E(β_1i) = γ10   among Males,   γ10 + γ11   among Females

Some software packages use the formulation of the multilevel model,

e.g. MLWin.

SAS and R use the ‘mixed model’ formulation. It is very useful to

know how to go from one formulation to the other.


From Multilevel Model to Mixed Model

Combine the two levels of the multilevel model by substituting the between-subject model into the within-subject model, then gather together the fixed terms and the random terms:

    y_it = β_0i + β_1i X_it + ε_it
         = (γ00 + γ01 W_i + u_0i) + (γ10 + γ11 W_i + u_1i) X_it + ε_it
         = γ00 + γ01 W_i + γ10 X_it + γ11 W_i X_it     (fixed part of the model)
           + u_0i + u_1i X_it + ε_it                   (random part of the model)

Anatomy of the fixed part:

    γ00              (Intercept)
    γ01 W_i          (between-subject, time-invariant variable)
    γ10 X_it         (within-subject, time-varying variable)
    γ11 W_i X_it     (cross-level interaction)

Interpretation of the fixed part: the parameters reflect population average values.

Anatomy of the random part:

For one occasion:

    u_0i + u_1i X_it + ε_it


Putting the observations of one subject together:

    [ u_0i + u_1i X_i1 + ε_i1 ]   [ 1  X_i1 ]            [ ε_i1 ]
    [ u_0i + u_1i X_i2 + ε_i2 ] = [ 1  X_i2 ] [ u_0i ] + [ ε_i2 ]
    [ u_0i + u_1i X_i3 + ε_i3 ]   [ 1  X_i3 ] [ u_1i ]   [ ε_i3 ]
    [ u_0i + u_1i X_i4 + ε_i4 ]   [ 1  X_i4 ]            [ ε_i4 ]

                                =     Z_i      u_i     +   ε_i

Note: the random-effects design matrix Z_i uses only time-varying variables.

Distribution assumption:

    u_i ~ N(0, G)   independent of   ε_i ~ N(0, R_i)

where, so far, R_i = σ²I.


Notes:

G (usually) does not vary with i. It is usually a free positive-definite matrix, or it may be a structured positive-definite matrix. More on G later.

R_i (usually) does change with i – as it must if T_i is not constant. R_i is expressed as a function of parameters. The simplest example is R_i = σ²I (of order T_i). Later we will use R_i to include autoregressive parameters for longitudinal modeling.

We can't estimate G and R directly. We estimate them through:

    V_i = Var(y_i) = Z_i G Z_i' + R_i
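The relation V_i = Z_i G Z_i' + R_i is just matrix arithmetic; a base-R sketch with illustrative values (the G and σ² below are made up, not the Orthodont estimates):

```r
# Marginal covariance of one subject's four observations:
# V_i = Z_i G Z_i' + R_i
X_i <- c(8, 10, 12, 14)                    # this subject's ages
Z_i <- cbind(1, X_i)                       # random intercept and slope
G   <- matrix(c(4.00, -0.20,
                -0.20,  0.05), 2, 2)       # illustrative G (positive definite)
R_i <- 1.5 * diag(4)                       # simplest R-side: sigma^2 * I

V_i <- Z_i %*% G %*% t(Z_i) + R_i
V_i   # a symmetric, positive-definite 4 x 4 matrix
```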


Some things can be parametrized either on the G-side or on the R-side. If they are done on both, you lose identifiability. Ill-conditioning due to 'collinearity' between the G-side and R-side models is a common problem.


Mixed Model for Longitudinal Data in R:

> fit <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject,

+ correlation

+ = corAR1 ( form = ~ 1 | Subject))

Model formula: distance ~ age * Sex

o specifies the fixed model

o includes the intercept and marginal main effects by default

o contains time-varying, time-invariant and cross-level

variables together


Random argument: ~ 1 + age | Subject

o Specifies the variables in the random model and the variable

defining clusters.

o The G matrix is the variance–covariance matrix of the random effects. Here

    G = Var[ (u_0i, u_age,i)' ] = [ g00     g0,age    ]
                                  [ g0,age  g_age,age ]

o Normally, the random model only contains an intercept and,

possibly, time-varying variables


Correlation argument: specifies the model for the R_i matrices

o Omit to get the default:

    R_i = σ²I (of order T_i)

o Here we illustrate the use of an AR(1) structure, producing, for example,

    R_i = σ² [ 1    φ    φ²   φ³ ]
             [ φ    1    φ    φ² ]
             [ φ²   φ    1    φ  ]
             [ φ³   φ²   φ    1  ]

in a cluster with 4 occasions.
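The AR(1) pattern – entry φ^|t−s| in position (t, s) – can be generated mechanically in base R (φ = 0.5 is an arbitrary illustration):

```r
# AR(1) correlation matrix for 4 equally spaced occasions:
# entry (t, s) is phi^|t - s|
phi <- 0.5
lag <- abs(outer(1:4, 1:4, "-"))   # matrix of lags |t - s|
R   <- phi^lag
R   # 1 on the diagonal; phi, phi^2, phi^3 moving away from it
```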


# Mixed Model in R:

> fit <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject,

+ correlation

+ = corAR1 ( form = ~ 1 | Subject))

> summary(fit)

Linear mixed-effects model fit by REML

Data: dd

AIC BIC logLik

446.8076 470.6072 -214.4038

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-

Cholesky parametrization


                StdDev    Corr
(Intercept)  3.3730482    (Intr)      <- sqrt(g00)
age          0.2907673    -0.831      <- sqrt(g11); r01 = g01 / sqrt(g00 g11)
Residual     1.0919754

Correlation Structure: AR(1)
 Formula: ~1 | Subject
 Parameter estimate(s):
      Phi
-0.47328      <- correlation between adjoining observations


Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.152435 0.9984616 79 16.177323 0.0000

age 0.797950 0.0870677 79 9.164702 0.0000

SexFemale 1.264698 1.5642886 25 0.808481 0.4264

age:SexFemale -0.322243 0.1364089 79 -2.362334 0.0206

Correlation: among gammas

(Intr) age SexFml

age -0.877

SexFemale -0.638 0.559

age:SexFemale 0.559 -0.638 -0.877

Standardized Within-Group Residuals:

          Min            Q1           Med            Q3           Max
 -3.288886631  -0.419431536  -0.001271185   0.456257976   4.203271248

Number of Observations: 108


Number of Groups: 27

Confidence intervals for all parameters:

> intervals( fit )

Approximate 95% confidence intervals

Fixed effects:

lower est. upper

(Intercept) 14.1650475 16.1524355 18.13982351

age 0.6246456 0.7979496 0.97125348

SexFemale -1.9570145 1.2646982 4.48641100

age:SexFemale -0.5937584 -0.3222434 -0.05072829

attr(,"label")

[1] "Fixed effects:"


Random Effects:

Level: Subject

lower est. upper

sd((Intercept)) 2.2066308 3.3730482 5.1560298

sd(age) 0.1848904 0.2907673 0.4572741

cor((Intercept),age) -0.9377008 -0.8309622 -0.5808998

Correlation structure:
         lower    est.       upper
Phi -0.7559617 -0.4728 -0.04182947      <- Negative!
attr(,"label")
[1] "Correlation structure:"

Do NOT use CIs for SDs to test whether they are 0. Use anova + simulate. Cf. Lab 1.


Within-group standard error:

lower est. upper

0.9000055 1.0919754 1.3248923

> VarCorr( fit ) from the G matrix

Subject = pdLogChol(1 + age)

Variance StdDev Corr

(Intercept) 11.3774543 3.3730482 (Intr)

age 0.0845456 0.2907673 -0.831

Residual 1.1924103 1.0919754

To get the G matrix itself

in a form that can be used

in matrix expressions

> getVarCov( fit )


Random effects variance covariance matrix

(Intercept) age

(Intercept) 11.37700 -0.814980

age -0.81498 0.084546

Standard Deviations: 3.373 0.29077

Notes on interpreting autocorrelation

The estimated autocorrelation is negative. Although most natural processes would be expected to produce positive autocorrelations, occasional large measurement errors can create the appearance of a negative autocorrelation. Some processes self-correct, so that an unusually large value tends to be compensated by an unusually small one, e.g. residue in a chemical process.


Some issues concerning autocorrelation

1. Lack of fit will generally contribute positively to

autocorrelation. For example, if trajectories are quadratic but

you are fitting a linear trajectory, the residuals will be

positively autocorrelated. Strong positive autocorrelation

can be a symptom of lack of fit. This is an example of poor

identification between the FE model and the R model, that is,

between the deterministic and the stochastic aspects of the

model. See Lab 3 for a similar discussion of seasonal (FE)

versus cyclical variation (R-side) periodic patterns.

2. As mentioned above, occasional large measurement errors will

contribute negatively to the estimate of autocorrelation.


3. In a well fitted OLS model, the residuals are expected to be

negatively correlated, more so if there are few observations per

subject.

4. With few observations per subject, the estimate of

autocorrelation (R side) can be poorly identified and highly

correlated with G-side parameters. [See 'Additional Notes']

Looking at the data we suspect that M09 might be highly influential

for autocorrelation. We can refit without M09 to see how the estimate

changes.

What happens when we drop M09?

> fit.dropM09 <- update( fit,

+ subset = Subject != "M09")


> summary( fit.dropM09 )

Linear mixed-effects model fit by REML

Data: dd

Subset: Subject != "M09"

AIC BIC logLik

406.3080 429.7545 -194.1540

. . . . .

Correlation Structure: AR(1)

Formula: ~1 | Subject

Parameter estimate(s):

Phi

-0.1246035 still negative

. . . . . .

> intervals( fit.dropM09 )

Approximate 95% confidence intervals

. . . .

Correlation structure:

lower est. upper

Phi -0.5885311 -0.1246035 0.4010562

attr(,"label") but not significantly

[1] "Correlation structure:"


Mixed Model in Matrices

In the ith cluster:

    y_it = γ00 + γ01 W_i + γ10 X_it + γ11 W_i X_it + u_0i + u_1i X_it + ε_it

Stacking the occasions (here T_i = 4):

    [ y_i1 ]   [ 1  W_i  X_i1  W_i X_i1 ] [ γ00 ]   [ 1  X_i1 ]            [ ε_i1 ]
    [ y_i2 ] = [ 1  W_i  X_i2  W_i X_i2 ] [ γ01 ] + [ 1  X_i2 ] [ u_0i ] + [ ε_i2 ]
    [ y_i3 ]   [ 1  W_i  X_i3  W_i X_i3 ] [ γ10 ]   [ 1  X_i3 ] [ u_1i ]   [ ε_i3 ]
    [ y_i4 ]   [ 1  W_i  X_i4  W_i X_i4 ] [ γ11 ]   [ 1  X_i4 ]            [ ε_i4 ]

      y_i  =              X_i              γ     +      Z_i      u_i     +   ε_i

[Could we fit this model in cluster i?]

where

    u_i ~ N(0, G)   independent of   ε_i ~ N(0, R_i)

    Var(y_i) = Z_i G Z_i' + R_i

For the whole sample

    [ y_1 ]   [ X_1 ]       [ Z_1       0  ] [ u_1 ]   [ ε_1 ]
    [ ... ] = [ ... ] γ  +  [    ...       ] [ ... ] + [ ... ]
    [ y_N ]   [ X_N ]       [  0      Z_N  ] [ u_N ]   [ ε_N ]

    y = Xγ + Zu + ε

Finally, making the complex look deceptively simple:

    y = Xγ + Zu + ε = Xγ + δ

with

    Var(u) = diag(G, ..., G)            (N blocks)
    Var(ε) = R = diag(R_1, ..., R_N)
    Var(δ) = V = Z diag(G, ..., G) Z' + R


Fitting the mixed model

Use Generalized Least Squares on

    y ~ N(Xγ, ZGZ' + R)

    γ̂_GLS = (X'V̂⁻¹X)⁻¹ X'V̂⁻¹ y

We need γ̂_GLS to get V̂ and vice versa, so one algorithm iterates from one to the other until convergence.
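One iteration's GLS step is a single weighted least-squares solve; a base-R sketch on simulated data (the AR(1)-style working V̂ and the true coefficients (2, 0.3) are illustrative):

```r
# One generalized least squares step:
# gamma_hat = (X' V^{-1} X)^{-1} X' V^{-1} y
set.seed(1)
n <- 8
X <- cbind(1, 1:n)                      # intercept and a trend
V <- 0.5^abs(outer(1:n, 1:n, "-"))      # a working V-hat (AR(1)-like)
y <- X %*% c(2, 0.3) + drop(t(chol(V)) %*% rnorm(n))

Vinv      <- solve(V)
gamma_gls <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
drop(gamma_gls)   # GLS estimate of the two coefficients
```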

There are two main ways of fitting mixed models with normal

responses:


1. Maximum Likelihood (ML)

a. Fits all of γ, G, R at the same time.

b. Two ML fits can be used in the 'anova' function to test

models that differ in their FE models or in their RE

models: G and/or R.

c. ML fits tend to underestimate V and Wald tests will tend

to err on the liberal side.

2. Restricted (or Residual) Maximum Likelihood

a. Maximum likelihood is applied to the 'residual space'

with respect to the X matrix and only estimates G and R,

hence V. The estimated V is then used to obtain the

GLS estimate for γ .


b. 'anova' can only be used to compare two models with

identical FE models. Thus 'anova' (Likelihood Ratio

Tests) can only be applied to hypotheses about REs.

c. The estimate of V tends to be better than with ML and

Wald tests are expected to be more accurate than with

ML.

d. Thus with REML, you sacrifice the ability to perform

LRTs for FEs but improve Wald tests for FEs. Also,

LRTs for REs are expected to be more accurate.

e. REML is the default for 'lme' and PROC MIXED in

SAS.


Comparing GLS and OLS

We used OLS above:

    γ̂_OLS = (X'X)⁻¹ X'y

instead of

    γ̂_GLS = (X'V̂⁻¹X)⁻¹ X'V̂⁻¹ y

How does OLS differ from GLS?

Do they differ only in that GLS produces more accurate standard errors? Or can γ̂_OLS be very different from γ̂_GLS?

With balanced data they will be the same. With unbalanced data they can be dramatically different. OLS is an estimate based on the pooled data; GLS provides an estimate that is closer to that of the unpooled data. Estimation of the FE model and of the RE model are highly interrelated – in contrast with OLS and with GLMs with canonical links, where they are orthogonal.
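The balanced-data claim can be verified directly: with every subject measured at the same ages and a common compound-symmetric within-subject covariance, the GLS and OLS coefficient estimates coincide exactly. A base-R sketch (three hypothetical subjects, illustrative variance values):

```r
# Balanced design: every subject has the same X_i
ages <- c(8, 10, 12, 14)
X  <- do.call(rbind, rep(list(cbind(1, ages)), 3))  # 3 subjects stacked
Vi <- 4 + 1.5 * diag(4)          # compound symmetry: cov 4, var 5.5
V  <- kronecker(diag(3), Vi)     # block-diagonal marginal covariance
set.seed(2)
y <- X %*% c(16, 0.8) + drop(t(chol(V)) %*% rnorm(12))

b_ols <- solve(t(X) %*% X, t(X) %*% y)
Vinv  <- solve(V)
b_gls <- solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
all.equal(drop(b_ols), drop(b_gls))   # TRUE: identical estimates
```

With unbalanced data (subjects observed at different sets of ages) the two estimates diverge, and GLS moves toward the within-subject information.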

Testing linear hypotheses in R

Hypotheses involving linear combinations of the fixed-effects coefficients can be tested with a Wald test. The Wald test is based on the normal approximation for maximum likelihood estimators using the estimated variance–covariance matrix.

Using the 'wald' function alone displays the estimated fixed-effects coefficients and Wald-type confidence intervals, as well as a test that all true coefficients are equal to 0 (this is rarely of any interest).


> wald (fit)

numDF denDF F.value p.value

4 24 952.019 <.00001

Coefficients Estimate Std.Error DF t-value p-value

(Intercept) 16.479081 1.050894 76 15.681019 <.00001

age 0.769464 0.089141 76 8.631956 <.00001

SexFemale 0.905327 1.615657 24 0.560346 0.58044

age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Continuation:

Coefficients Lower 0.95 Upper 0.95

(Intercept) 14.386046 18.572117

age 0.591923 0.947004

SexFemale -2.429224 4.239878

age:SexFemale -0.563872 -0.017967


We can estimate the response level at age 14 for Males and Females

by specifying the appropriate linear transformation of the coefficients.

> L <- rbind( "Male at 14" = c( 1, 14, 0, 0),

+ "Female at 14" = c( 1, 14, 1, 14))

> L

[,1] [,2] [,3] [,4]

Male at 14 1 14 0 0

Female at 14 1 14 1 14

> wald ( fit, L )

numDF denDF F.value p.value

1 2 24 1591.651 <.00001

             Estimate Std.Error DF  t-value p-value Lower 0.95 Upper 0.95
Male at 14   27.25157  0.605738 76 44.98908 <.00001   26.04514   28.45800
Female at 14 24.08403  0.707349 24 34.04829 <.00001   22.62413   25.54392


To estimate the gap at 14:

> L.gap <- rbind( "Gap at 14" = c( 0, 0, 1, 14))

> L.gap

[,1] [,2] [,3] [,4]

Gap at 14 0 0 1 14

> wald ( fit, L.gap)

numDF denDF F.value p.value

1 1 24 11.56902 0.00235

           Estimate Std.Error DF   t-value p-value Lower 0.95 Upper 0.95
Gap at 14 -3.167548  0.931268 24 -3.401327 0.00235  -5.089591  -1.245504


To simultaneously estimate the gap at 14 and at 8 we can do the

following. Note that the overall (simultaneous) null hypothesis here is

equivalent to the hypothesis that there is no difference between the

sexes.

> L.gaps <- rbind( "Gap at 14" = c( 0, 0, 1, 14),

+ "Gap at 8" = c( 0,0,1, 8))

> L.gaps

[,1] [,2] [,3] [,4]

Gap at 14 0 0 1 14

Gap at 8 0 0 1 8

> wald ( fit, L.gaps)

numDF denDF F.value p.value

1 2 24 5.83927 0.00858

Estimate Std.Error DF t-value p-value Lower 0.95

Gap at 14 -3.167548 0.931268 24 -3.401327 0.00235 -5.089591

Gap at 8 -1.422030 0.844256 24 -1.684359 0.10508 -3.164489


An equivalent hypothesis that there is no difference between the sexes

is the hypothesis that the two coefficients for sex are simultaneously

equal to 0. The 'wald' function simplifies this by allowing a string as a

second argument that is used to match coefficient names. The test

conducted is that all coefficients whose name has been matched are

simultaneously 0.

> wald ( fit, "Sex" )

numDF denDF F.value p.value

Sex 2 24 5.83927 0.00858

Coefficients Estimate Std.Error DF t-value p-value

SexFemale 0.905327 1.615657 24 0.560346 0.58044

age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Coefficients Lower 0.95 Upper 0.95

SexFemale -2.429224 4.239878

age:SexFemale -0.563872 -0.017967

Note the equivalence of the two F-tests above.


Modeling dependencies in time

The main differences between using mixed models for multilevel modeling as opposed to longitudinal modeling are the assumptions about the ε_it's, plus the more complex functional forms for time effects. For observations observed in time, part of the correlation between the ε's could be related to their distance in time.

The R-side model allows the modeling of temporal and spatial dependence.

Correlation argument for R:

Autoregressive of order 1:

    corAR1( form = ~ 1 | Subject)

    R_i = σ² [ 1    φ    φ²   φ³ ]
             [ φ    1    φ    φ² ]
             [ φ²   φ    1    φ  ]
             [ φ³   φ²   φ    1  ]

Autoregressive–moving average of order (1,1):

    corARMA( form = ~ 1 | Subject, p = 1, q = 1)

    R_i = σ² [ 1      ρ₁     ρ₁φ    ρ₁φ² ]
             [ ρ₁     1      ρ₁     ρ₁φ  ]
             [ ρ₁φ    ρ₁     1      ρ₁   ]
             [ ρ₁φ²   ρ₁φ    ρ₁     1    ]

AR(1) in continuous time, e.g. supposing a subject with times 1, 2, 5.5 and 10:

    corCAR1( form = ~ time | Subject)

    R_i = σ² [ 1      φ      φ^4.5  φ^9   ]
             [ φ      1      φ^3.5  φ^8   ]
             [ φ^4.5  φ^3.5  1      φ^4.5 ]
             [ φ^9    φ^8    φ^4.5  1     ]

Note that the times and the number of times – hence the indices – can change from subject to subject, but σ² and φ have the same value.
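Because corCAR1 uses elapsed time as the lag, unequal spacing is handled automatically; a base-R sketch with a subject observed at times 1, 2, 5.5 and 10 (φ = 0.8 is an arbitrary illustration):

```r
# Continuous-time AR(1): entry (j, k) is phi^|t_j - t_k|
phi   <- 0.8
times <- c(1, 2, 5.5, 10)
R <- phi^abs(outer(times, times, "-"))
round(R, 3)   # e.g. entry (1, 4) is phi^9; entry (2, 3) is phi^3.5
```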


G-side vs. R-side

A few things can be done with either side. But don’t do it with

both in the same model. The redundant parameters will not be

identifiable. For example, the G-side random intercept model is

‘almost’ equivalent to the R-side compound symmetry model.

With OLS the linear parameters are orthogonal to the variance

parameter. Collinearity among the linear parameters is determined

by the design, X, and does not depend on values of parameters.

Computational problems due to collinearity can be addressed by

orthogonalizing the X matrix.

With mixed models the variance parameters are generally not

orthogonal to each other and, with unbalanced data, the linear

parameters are not orthogonal to the variance parameters.


G-side parameters can be highly collinear even if the X matrix is

orthogonal. Centering the variables of the RE model around the

“point of minimal variance” will help but the resulting design

matrix may be highly collinear.

G-side and R-side parameters can be highly collinear. The degree

of collinearity may depend on the value of the parameters.

For example, our model identifies σ², φ and G through:

    V̂_i = Z_i [ g00  g01 ] Z_i'  +  σ² [ 1    φ    φ²   φ³ ]
              [ g01  g11 ]             [ φ    1    φ    φ² ]
                                       [ φ²   φ    1    φ  ]
                                       [ φ³   φ²   φ    1  ]

For values of φ above 0.5, the Hessian is very ill-conditioned. The lesson may be that to use AR and ARMA models effectively, you need at least some subjects observed on many occasions.


R-side only: population average models

G-side only: hierarchical models with conditionally independent

observations in each cluster

Population average longitudinal models can be done on the R-side

with AR, ARMA structures, etc.

The absence of the G-side may be less crucial with balanced data.

The G-side is not enough to provide control for unmeasured between-subject confounders if the time-varying predictors are unbalanced (more on this soon).

A G-side random effects model DOES NOT provide the equivalent

of temporal correlation.

68

Simpler Models

The model we've looked at is deliberately complex, including examples of the main typical components of a mixed model. We can also use mixed models for simpler problems.

Using X as a generic time-varying (within-subject) predictor and W as

a generic time-invariant (between-subject) predictor we have the

following:

MODEL                      Fixed formula          RANDOM          Equation

One-way ANOVA with         ~ 1                    ~ 1 | Sub       y_it = β_0 + u_i + ε_it
random effects

Means as outcomes          ~ 1 + W  (~ W)³        ~ 1 | Sub       y_it = β_0 + β_1 W_i + u_i + ε_it

One-way ANCOVA with        ~ 1 + X  (~ X)         ~ 1 | Sub       y_it = β_0 + β_1 X_it + u_i + ε_it
random effects

Random coefficients        ~ 1 + X  (~ X)         ~ 1 + X | Sub   y_it = β_0 + β_1 X_it + u_0i + u_1i X_it + ε_it
model

Intercepts and slopes      ~ 1 + X + W + X:W      ~ 1 + X | Sub   y_it = β_0 + β_1 X_it + β_2 W_i + β_3 W_i X_it
as outcomes                (~ X*W)                                       + u_0i + u_1i X_it + ε_it

Non-random slopes          ~ 1 + X + W + X:W      ~ 1 | Sub       y_it = β_0 + β_1 X_it + β_2 W_i + β_3 W_i X_it
                           (~ X*W)                                       + u_0i + ε_it

³ The model in parentheses is an equivalent specification in R.
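The table's specifications translate directly into lme() calls. A sketch with a hypothetical data frame d containing y, X, W and Sub (the names are illustrative, following the generic notation above):

```r
library(nlme)
# d: hypothetical data with response y, time-varying X,
# time-invariant W, and cluster id Sub
fit1 <- lme(y ~ 1,     data = d, random = ~ 1 | Sub)      # one-way ANOVA with random effects
fit2 <- lme(y ~ W,     data = d, random = ~ 1 | Sub)      # means as outcomes
fit3 <- lme(y ~ X,     data = d, random = ~ 1 | Sub)      # one-way ANCOVA with random effects
fit4 <- lme(y ~ X,     data = d, random = ~ 1 + X | Sub)  # random coefficients model
fit5 <- lme(y ~ X * W, data = d, random = ~ 1 + X | Sub)  # intercepts and slopes as outcomes
fit6 <- lme(y ~ X * W, data = d, random = ~ 1 | Sub)      # non-random slopes
```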

70

BLUPS: Estimating Within-Subject Effects

We've seen how to estimate $\gamma$, $G$ and $R$. Now we consider the subject-specific coefficients $\beta_i = (\beta_{0i}, \beta_{1i})'$.

We've already estimated $\beta_i$ using the fixed-effects model with an OLS regression within each subject. Call this estimator $\hat{\beta}_i$. How good is it?

$\mathrm{Var}(\hat{\beta}_i) = \sigma^2 (X_i' X_i)^{-1}$

Can we do better? We have another 'estimator' of $\beta_i$.

71

Suppose we knew the $\gamma$s for the population. We could also predict⁴ $\beta_i$ using the within-Sex mean intercepts and slopes; e.g. for Males we could use

$\tilde{\beta}_i = \begin{pmatrix} \gamma_{00} \\ \gamma_{10} \end{pmatrix}$

with error variance

$\mathrm{Var}(\beta_i - \tilde{\beta}_i) = G$

We could then combine $\hat{\beta}_i$ and $\tilde{\beta}_i$ by weighting them by inverse variance (= precision). This yields the BLUP (Best Linear Unbiased Predictor):

$\beta_i^{BLUP} = \left( G^{-1} + \sigma^{-2} X_i' X_i \right)^{-1} \left( G^{-1} \tilde{\beta}_i + \sigma^{-2} X_i' X_i \hat{\beta}_i \right)$

4 Non-statisticians are always thrown for a loop when we ‘predict’ something that

happened in the past. We use 'predict' for things that are random, 'estimate' for

things that are 'fixed'. Orthodox Bayesians always predict.

72

If we replace the unknown parameters with their estimates, we get the EBLUP (Empirical BLUP):

$\hat{\beta}_i^{EBLUP} = \left( \hat{G}^{-1} + \hat{\sigma}^{-2} X_i' X_i \right)^{-1} \left( \hat{G}^{-1} \hat{\tilde{\beta}}_i + \hat{\sigma}^{-2} X_i' X_i \hat{\beta}_i \right)$

The EBLUP 'optimally' combines the information from the ith cluster with the information from the other clusters. We borrow strength from the other clusters.

The process 'shrinks' $\hat{\beta}_i$ towards $\hat{\tilde{\beta}}_i$ along a path determined by the locus of osculation of the families of ellipses with shape $\hat{G}$ around $\hat{\tilde{\beta}}_i$ and shape $\hat{\sigma}^2 (X_i' X_i)^{-1}$ around $\hat{\beta}_i$.
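In nlme the EBLUPs need not be computed by hand; ranef() and coef() return them. A sketch with the Orthodont data:

```r
library(nlme)
data(Orthodont)
fit <- lme(distance ~ age * Sex, data = Orthodont,
           random = ~ 1 + age | Subject)

ranef(fit)  # EBLUPs of the random effects u_i (deviations from the fixed part)
coef(fit)   # subject-level EBLUP coefficients: fixed part + u_i for each Subject
```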

73

The slope of the BLUP is close to the population slope, but the level of the BLUP is close to the level of the BLUE. This suggests that G has a large variance for intercepts and a small variance for slopes.

[Figure: distance vs age by Subject (M01–M16, F01–F11), showing the Popn, BLUE and BLUP lines for each subject.]

74

Population estimate, BLUE and BLUP in beta space.

[Figure: intercept vs slope by Subject, showing the Popn, BLUE and BLUP points for each subject.]

75

The marginal dispersion of BLUEs comes from:

$\mathrm{Var}(\beta_i) = G$  [population variance]

$\mathrm{Var}(\hat{\beta}_i \mid \beta_i) = \sigma^2 (X_i' X_i)^{-1}$  [conditional variance, resampling from the ith subject]

$\mathrm{E}(\hat{\beta}_i \mid \beta_i) = \beta_i$  [BLUE]

[Figure: beta space (Int vs slope) by Sex, showing Popn, BLUE and BLUP points.]

76

So:

$\mathrm{Var}(\hat{\beta}_i) = \mathrm{Var}\!\left(\mathrm{E}(\hat{\beta}_i \mid \beta_i)\right) + \mathrm{E}\,\mathrm{Var}(\hat{\beta}_i \mid \beta_i) = \mathrm{Var}(\beta_i) + \mathrm{E}\,\mathrm{Var}(\hat{\beta}_i \mid \beta_i) = G + \sigma^2\, \mathrm{E}\,(X_i' X_i)^{-1}$

[Figure: beta space by Sex, showing Popn, BLUE and BLUP points.]

77

While the expected variance of the BLUEs is larger than G, the expected variance of the BLUPs is smaller than G. Beware of drawing conclusions about G from the dispersion of the BLUPs.

[Figure: beta space by Sex, showing Popn, BLUE and BLUP points.]

78

The estimate of G can be unstable and often collapses to singularity, leading to non-convergence for many methods. Possible remedies:

- Recentre X near the point of minimal variance
- Use a smaller G
- Change the model

[Figure: beta space by Sex, showing Popn, BLUE and BLUP points.]

79

Where the EBLUP comes from: looking at a single subject

Note that the EBLUP's slope is close to the slope of the population estimate (i.e. the male population conditioning on between-subject predictors) while the level of the line is close to the level of the BLUE.

The relative precisions of the BLUE and of the population estimate on slope and level are reflected through the shapes of $G$ and $\sigma^2 (X_i' X_i)^{-1}$.

[Figure: distance vs age for subject M11, showing the Popn, BLUE and BLUP lines.]

80

The same picture in "beta-space".

[Figure: Int vs slope for subject M11, showing the Popn, BLUE and BLUP points.]

81

The population estimate with a SD ellipse.

[Figure: Int vs slope for M11, with the SD ellipse around the population estimate.]

82

The population estimate with a SD ellipse and the BLUE with its SE ellipse.

[Figure: Int vs slope for M11, with ellipses around the population estimate and the BLUE.]

83

The EBLUP is an Inverse-Variance Weighted mean of the BLUE and of the population estimate.

We can think of taking the BLUE and 'shrinking' it towards the population estimate along a path that optimally combines the two components. The path is formed by the osculation points of the families of ellipses around the BLUE and the population estimate.

[Figure: Int vs slope for M11, showing the shrinkage path between the BLUE and the population estimate.]

84

The amount and direction of shrinkage depends on the relative shapes and sizes of

$G$  and  $\mathrm{Var}(\hat{\beta}_i \mid \beta_i) = \sigma^2 (X_i' X_i)^{-1}$

The BLUP is at an osculation point of the families of ellipses generated around the BLUE and the population estimate.

[Figure: Int vs slope for M11, with osculating ellipses around the BLUE and the population estimate.]

85

Imagine what could happen if G were oriented differently: paradoxically, both the slope and the intercept of the BLUP could be far outside the population estimate and the BLUE.

[Figure: Int vs slope for M11 with a differently oriented G.]

86

When is a BLUP a BLUPPER?

The rationale behind BLUPs is based on exchangeability. No outside

information should make this cluster stand out from the others and the

mean of the population deserves the same weight in prediction for this

cluster as it deserves for any other cluster that doesn’t stand out.

If a cluster stands out somehow, then the BLUP might be a

BLUPPER.

87

Interpreting G

The parameters of G give the variance of the intercepts, the variance of the slopes and the covariance between the intercepts and the slopes.

Would it make sense to assume that the covariance is 0 to reduce the number of parameters in the model? To address this, consider that the variance of the heights of individual regression lines at a fixed value of X is:

$\mathrm{Var}(\beta_{0i} + \beta_{1i} X) = (1 \;\; X) \begin{pmatrix} g_{00} & g_{01} \\ g_{10} & g_{11} \end{pmatrix} \begin{pmatrix} 1 \\ X \end{pmatrix} = g_{00} + 2 g_{01} X + g_{11} X^2$

88

Summarizing:

$\mathrm{Var}(\beta_{0i} + \beta_{1i} X) = g_{00} + 2 g_{01} X + g_{11} X^2$

is a quadratic function of X. So $\mathrm{Var}(\beta_{0i} + \beta_{1i} X)$ has a minimum at $X = -g_{01}/g_{11}$, and the minimum variance is $g_{00} - g_{01}^2 / g_{11}$.
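Given a fitted lme model with a random intercept and slope, these quantities can be computed from the estimated G. A sketch (getVarCov() returns the random-effects covariance matrix of an lme fit):

```r
library(nlme)
data(Orthodont)
fit <- lme(distance ~ age * Sex, data = Orthodont,
           random = ~ 1 + age | Subject)

G <- getVarCov(fit)                     # estimated G (2 x 2: intercept, age)
x.min <- -G[1, 2] / G[2, 2]             # point of minimal variance of line heights
v.min <- G[1, 1] - G[1, 2]^2 / G[2, 2]  # the minimal variance itself
```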

89

Thus, assuming that the covariance is 0 is equivalent to assuming that the minimum variance occurs when X = 0. This assumption is not invariant under location transformations of X. It is similar to removing a main effect that is marginal to an interaction in a model, something that should not be done without a thorough understanding of its consequences.

Example: Let $\gamma = \begin{pmatrix} 20 \\ -1 \end{pmatrix}$ and $G = \begin{pmatrix} 10.5 & -1 \\ -1 & 0.1 \end{pmatrix}$

90

A sample of lines in beta space.

[Figure: sample of $(\beta_0, \beta_1)$ points in beta space with a concentration ellipse.]

91

The same lines in data space.

[Figure: the sample of lines in the (X, Y) plane.]

92

The same lines in data space with the population mean line and lines at one SD above and below the population mean line.

[Figure: sample lines in data space with the population mean line ± one SD.]

93

The parameters of G determine the location and value of the minimum standard deviation of lines.

[Figure: lines in data space annotated with the minimum-SD location $X = -g_{01}/g_{11}$, where the SD of line heights is $\sqrt{g_{00} - g_{01}^2/g_{11}}$.]

94

With two time-varying variables with random effects, the G matrix would look like:

$\mathrm{Var}\begin{pmatrix} u_{0i} \\ u_{1i} \\ u_{2i} \end{pmatrix} = \begin{pmatrix} g_{00} & g_{01} & g_{02} \\ g_{10} & g_{11} & g_{12} \\ g_{20} & g_{21} & g_{22} \end{pmatrix}$

The point of minimum variance is located at:

$-\begin{pmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{pmatrix}^{-1} \begin{pmatrix} g_{10} \\ g_{20} \end{pmatrix}$

95

Differences between lm (OLS) and lme (mixed model) with

balanced data

Just looking at regression coefficients:

> fit.ols <- lm( distance ~ age * Sex, dd)

> fit.mm <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject)

> summary(fit.ols)

Call:

lm(formula = distance ~ age * Sex, data = dd)

Residuals:

Min 1Q Median 3Q Max

-5.6156 -1.3219 -0.1682 1.3299 5.2469

96

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.3406 1.4162 11.538 < 2e-16

age 0.7844 0.1262 6.217 1.07e-08

SexFemale 1.0321 2.2188 0.465 0.643

age:SexFemale -0.3048 0.1977 -1.542 0.126

. . .

> summary(fit.mm)

. . .

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.340625 1.0185320 79 16.043311 0.0000

age 0.784375 0.0859995 79 9.120691 0.0000

SexFemale 1.032102 1.5957329 25 0.646789 0.5237

age:SexFemale -0.304830 0.1347353 79 -2.262432 0.0264

Note that going from OLS to MM, precision shifts from between-subject comparisons to within-subject comparisons. When the data are balanced, lm (OLS) and lme (mixed models) produce the same $\hat{\beta}$s but with different SEs.

97

Take 2: Learning lessons from unbalanced

data

What can happen with unbalanced data?

Here are some data similar to the Potthoff and Roy data but with:

- different age ranges for different subjects
- a between-subject effect of age that is different from the within-subject effect of age

98

> head(du)

y x id xb xw Subject Sex age

1 12.37216 8 1 11 -3 F09 Female 8

2 11.20801 10 1 11 -1 F09 Female 10

3 10.44755 12 1 11 1 F09 Female 12

4 10.43831 13 1 11 2 F09 Female 13

5 14.13549 9 2 12 -3 F11 Female 9

6 13.47965 11 2 12 -1 F11 Female 11

> tail(du)

y x id xb xw Subject Sex age

103 35.67045 37 26 36 1 M08 Male 37

104 35.70928 38 26 36 2 M08 Male 38

105 38.81624 34 27 37 -3 M10 Male 34

106 37.87866 36 27 37 -1 M10 Male 36

107 36.22499 38 27 37 1 M10 Male 38

108 35.62520 39 27 37 2 M10 Male 39

99

[Figure: y vs age_raw by Subject, showing the different age ranges across subjects.]

100

[Figure: y vs age_raw, one panel per Sex.]

101

Using age centered at 25. Why? Like the ordinary regression model, the mixed model is equivariant under global centering, but convergence may be improved because the G matrix is less eccentric.

[Figure: y vs centered age, one panel per Sex.]

102

R code and output

> fit <- lme( y ~ age * Sex, du,

+ random = ~ 1 + age| Subject)

> summary( fit )

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

374.6932 395.8484 -179.3466

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-

Cholesky parametrization

StdDev Corr

(Intercept) 9.32672995 (Intr)

age 0.05221248 0.941

Residual 0.50627022

103

Fixed effects: y ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 40.26568 2.497546 79 16.122095 0.0000

age -0.48066 0.035307 79 -13.613685 0.0000

SexFemale -14.01875 3.830956 25 -3.659333 0.0012

age:SexFemale 0.05239 0.055373 79 0.946092 0.3470

Correlation:

(Intr) age SexFml

age -0.007

SexFemale -0.652 0.005

age:SexFemale 0.005 -0.638 0.058

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.10716969 -0.54148659 -0.02688422 0.59030024 2.14279806

Number of Observations: 108

Number of Groups: 27

104

Between, Within and Pooled Models

We first focus on one group, the female data: what models could we fit to this data?

[Figure: y vs age for the female subjects.]

105

[Figure: the female data with the marginal SD ellipse.]

106

Regressing with the pooled data – ignoring Subject – yields the marginal (unconditional) estimate of the slope: $\hat{\beta}_P$.

[Figure: the female data with the marginal regression line.]

107

We could replace each Subject by its means for x and y and use the resulting aggregated data with one point for each Subject.

[Figure: the female data with the subject means marked.]

108

[Figure: dispersion ellipse of the subject means.]

109

Performing a regression on the aggregated data yields the 'between-subject' regression – in some contexts called an 'ecological regression' – estimating, in some contexts, the 'compositional effect' of age: $\hat{\beta}_B$.

[Figure: regression on subject means = ecological regression.]
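In base R the aggregated ('between') regression can be sketched as follows, assuming the data frame du used in these notes:

```r
# Between-subject ("ecological") regression on the subject means:
agg <- aggregate(cbind(y, age) ~ Subject, data = du, FUN = mean)
fit.between <- lm(y ~ age, data = agg)  # its slope is the between estimate
```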

110

We can combine all the within-subject regressions to get a combined estimate of the within-subject slope, $\hat{\beta}_W$. This is the estimate obtained with a fixed-effects model using age and Subject additively. Equivalently, we can perform a regression of the within-subject residuals (y minus the subject mean of y) on (age minus the subject mean of age).

Q: Which is better: $\hat{\beta}_B$, $\hat{\beta}_W$ or $\hat{\beta}_P$?

[Figure: within-subject regression lines.]
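Both routes to the within-subject slope can be sketched in base R, again assuming the data frame du from these notes:

```r
# Combined within-subject slope via the fixed-effects model
# (age and Subject additively):
fit.within <- lm(y ~ age + Subject, data = du)  # the age coefficient is the within estimate

# Equivalently, regress within-subject residuals on centered age:
y.dev   <- du$y   - ave(du$y,   du$Subject)  # y minus its subject mean
age.dev <- du$age - ave(du$age, du$Subject)  # age minus its subject mean
lm(y.dev ~ age.dev)  # same slope (SEs differ by a degrees-of-freedom adjustment)
```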

111

A: None. They answer different questions. Typically, $\hat{\beta}_P$ would be used for prediction across the population; $\hat{\beta}_W$ for 'causal' inference controlling for between-subject confounders, assuming that all confounders affect all observations similarly.

[Figure: within-subject regression lines.]

within-subject regression

112

The relationship among the estimators: $\hat{\beta}_P$ combines $\hat{\beta}_B$ and $\hat{\beta}_W$:

$\hat{\beta}_P = (W_B + W_W)^{-1} (W_B \hat{\beta}_B + W_W \hat{\beta}_W)$

The weights depend only on the design (the X matrix), not on the estimated variances of the response.

[Figure: the between-subject, marginal and within-subject regression lines.]

113

The Mixed Model

The mixed model estimate⁵ also combines $\hat{\beta}_B$ and $\hat{\beta}_W$:

$\hat{\beta}_{MM} = (W_B^{MM} + W_W)^{-1} (W_B^{MM} \hat{\beta}_B + W_W \hat{\beta}_W)$

but with a lower weight on $\hat{\beta}_B$:

$W_B^{MM} = \frac{\sigma^2/T}{\sigma^2/T + g_{00}}\, W_B$

Note that $W_B^{MM} \le W_B$.

⁵ Using a random intercept model.

[Figure: the between-subject, marginal, mixed model and within-subject regression lines.]

114

The mixed model estimator is a variance-optimal combination of $\hat{\beta}_B$ and $\hat{\beta}_W$.

It makes perfect sense if $\hat{\beta}_B$ and $\hat{\beta}_W$ estimate the same thing, i.e. if $\beta_B = \beta_W$!

Otherwise, it's an arbitrary combination of estimates that estimate different things. The weights in the combination have no substantive interpretation. I.e. it's an optimal answer to a meaningless question.

Summary of the relationships among 4 models:

Model                    Estimate of slope     Precision
Between Subjects         $\hat{\beta}_B$       $W_B$
Marginal (pooled data)   $\hat{\beta}_P$
Mixed Model              $\hat{\beta}_{MM}$
Within Subjects          $\hat{\beta}_W$       $W_W$

115

The pooled estimate combines $\hat{\beta}_B$ and $\hat{\beta}_W$:

$\hat{\beta}_P = (W_B + W_W)^{-1} (W_B \hat{\beta}_B + W_W \hat{\beta}_W)$

Mixed model

With a random intercept model:

$y_{it} = \beta_0 + \beta_1 X_{it} + u_i + \varepsilon_{it}, \qquad u_i \sim N(0, g_{00}), \quad \varepsilon_{it} \sim N(0, \sigma^2)$

with $g_{00}$ and $\sigma^2$ known, $\hat{\beta}_{MM}$ is also a weighted combination of $\hat{\beta}_B$ and $\hat{\beta}_W$ but with less weight on $\hat{\beta}_B$:

116

$W_B^{MM} = \frac{\sigma^2/T}{\sigma^2/T + g_{00}}\, W_B$

i.e. the weight on the between-subject information is discounted by a monotone function of the ratio of between-subject to within-subject information.

$\hat{\beta}_{MM}$ is between $\hat{\beta}_W$ and $\hat{\beta}_P$, i.e. it does better than $\hat{\beta}_P$ in the sense of being closer to $\hat{\beta}_W$, but it is not equivalent to $\hat{\beta}_W$.

With balanced data, $\hat{\beta}_{MM} = \hat{\beta}_P$.

As $\frac{\sigma^2/T}{g_{00}} \to 0$, $\hat{\beta}_{MM} \to \hat{\beta}_W$, so a mixed model estimates the within effect asymptotically in T – which is the cluster size, NOT the number of clusters.

117

As $\frac{\sigma^2/T}{g_{00}} \to \infty$, $\hat{\beta}_{MM} \to \hat{\beta}_P$, which restores the full weight $W_B$ on $\hat{\beta}_B$. Thus the mixed model estimate fails to control for between-subject confounding factors. Note that this does not capture the whole story because $\hat{\beta}_W$ and $\hat{\beta}_B$ are not independent of $g_{00}$. If $g_{00} = 0$ then $\beta_B = \beta_W$, so that $\beta_{MM} = \beta_B = \beta_W$.

118

A serious problem? A simulation

1,000 simulations showing mixed model estimates of the slope, using the same configuration of Xs with $\beta_W = 1/2$ and $\beta_B = 1$, keeping $g_{00} = 1/2$ and allowing $\sigma$ to vary from 0.005 to 5.

[Figure: mixed model slope estimates $\hat{\beta}_1$ plotted against $\sigma$.]

119

What happened? As $\sigma$ gets larger, the relatively small value of $g_{00}$ is harder to identify, and both sources of variability (within-subject and between-subject) are attributed to $\sigma$.

[Figure: $\hat{\sigma}$ plotted against $\sigma$.]

120

The blue line is the diagonal $\hat{\sigma} = \sigma$ and the equation of the red line is $\hat{\sigma} = \sigma + 1$. When $\hat{g}_{00} = 0$, the between-subject relationship is treated as if it has very high precision, and it dominates in forming the mixed model estimate.

[Figure: $\hat{\sigma}$ vs $\sigma$ with the two reference lines.]

121

Splitting age into two variables

Since age has a within-subject effect that is inconsistent with its between-subject effect, we can split it into two variables:

1. a between-subject 'contextual' predictor: e.g. age.mean, the mean age of each subject (or the starting age), and
2. a within-subject predictor:
   a. age itself, or
   b. the within-subject residual: age.diff = age − age.mean

So we model:

$\mathrm{E}(y_{it}) = \beta_0 + \beta_{age.mean}\, \mathrm{age.mean}_i + \beta_{age}\, \mathrm{age}_{it}$

or

$\mathrm{E}(y_{it}) = \beta_0^* + \beta_{age.mean}^*\, \mathrm{age.mean}_i + \beta_{age.diff}^*\, \mathrm{age.diff}_{it}$

Surprisingly,

$\beta_{age.diff}^* = \beta_{age}$

but

$\beta_{age.mean}^* = \beta_{age.mean} + \beta_{age} \ne \beta_{age.mean}$
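The decomposition can be built without spida's helpers. A sketch, assuming (as the later output suggests) that cvar() computes the cluster mean and dvar() the deviation from it:

```r
library(nlme)
# Base-R construction of the contextual and within-subject variables,
# using the data frame du from these notes:
du$age.mean <- ave(du$age, du$Subject)  # between-subject contextual predictor
du$age.diff <- du$age - du$age.mean     # within-subject predictor

fit.split <- lme(y ~ (age.diff + age.mean) * Sex, data = du,
                 random = ~ 1 + age | Subject)
```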

123

[Figures: a sequence of schematic plots in the (age, y) plane, building up step by step the arrows for the age, age.diff* and age.mean effects.]

130

$\beta_{age.mean}^* = \beta_{age.mean} + \beta_{age}$

$\beta_{age.mean}^*$ keeps age.diff constant; $\beta_{age.mean}$ keeps age constant.

Compositional effect = Contextual effect + Within-subject effect

[Figure: the (age, y) schematic with the age, age.diff* and age.mean* arrows.]

131

Using 'lme' with a contextual mean

> fit.contextual <- lme(

+ y ~ (age + cvar(age,Subject) ) * Sex,

+ du,

+ random = ~ 1 + age | Subject)

> summary(fit.contextual)

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

296.8729 323.1227 -138.4365

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 1.53161007 (Intr)

age 0.03287630 0.024

Residual 0.51263884

132

Fixed effects: y ~ (age + cvar(age, Subject)) * Sex

Value Std.Error DF t-value

(Intercept) -3.681624 1.6963039 79 -2.170380

age -0.493880 0.0343672 79 -14.370670

cvar(age, Subject) 1.628584 0.0695822 23 23.405165

SexFemale 6.000170 2.5050694 23 2.395211

age:SexFemale 0.060143 0.0538431 79 1.116996

cvar(age, Subject):SexFemale -0.313087 0.1266960 23 -2.471167

p-value

(Intercept) 0.0330

age 0.0000

cvar(age, Subject) 0.0000

SexFemale 0.0251

age:SexFemale 0.2674

cvar(age, Subject):SexFemale 0.0213

. . . . .

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-1.871139553 -0.502221634 -0.006447848 0.552360837 2.428148053

Number of Observations: 108

Number of Groups: 27

133

> fit.compositional <- lme( y ~ (dvar(age,Subject) +

+ cvar(age,Subject) ) * Sex, du,

+ random = ~ 1 + age | Subject)

> summary(fit.compositional)

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

296.8729 323.1227 -138.4365

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 1.53161006 (Intr)

age 0.03287629 0.024

Residual 0.51263884

Fixed effects: y ~ (dvar(age, Subject) + cvar(age, Subject)) * Sex

Value Std.Error DF t-value

(Intercept) -3.681624 1.6963039 79 -2.170380

dvar(age, Subject) -0.493880 0.0343672 79 -14.370670

cvar(age, Subject) 1.134704 0.0616092 23 18.417778

SexFemale 6.000170 2.5050694 23 2.395211

dvar(age, Subject):SexFemale 0.060143 0.0538431 79 1.116996

cvar(age, Subject):SexFemale -0.252945 0.1161225 23 -2.178257

134

p-value

(Intercept) 0.0330

dvar(age, Subject) 0.0000

cvar(age, Subject) 0.0000

SexFemale 0.0251

dvar(age, Subject):SexFemale 0.2674

cvar(age, Subject):SexFemale 0.0399

. . . . .

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-1.871139550 -0.502221640 -0.006447847 0.552360836 2.428148063

Number of Observations: 108

Number of Groups: 27

135

Simulation Revisited

1,000 simulations using the same models as the earlier simulation, i.e. the same configuration of Xs with $\beta_W = 1/2$ and $\beta_B = 1$, keeping $g_{00} = 1/2$ and allowing $\sigma$ to vary from 0.005 to 5.

[Figure: slope estimates vs $\sigma$, with reference lines at $\hat{\beta}_W$ and $\hat{\beta}_B$.]

136

Here a mixed model is used with mean age by subject and the within-subject residual of age from mean age.

[Figure: slope estimates vs $\sigma$ for the model with the contextual mean, with reference lines at $\hat{\beta}_W$ and $\hat{\beta}_B$.]

137

Including the contextual variable gives better estimates of the variance components. The estimate of $\sigma$ does not eventually absorb $g_{00}$.

[Figure: $\hat{\sigma}$ vs $\sigma$ for the model with the contextual mean.]

138

Power

The best way to carry out power calculations is to simulate. You end

up learning about a lot more than power.

Nevertheless, Stephen Raudenbush and colleagues have a nice

graphical package available at Optimal Design Software.
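A minimal sketch of such a simulation, with illustrative (assumed) sample sizes and effect sizes rather than values from these notes:

```r
library(nlme)
set.seed(1)

# Simulation-based power for a group-by-time interaction:
one.sim <- function() {
  d <- expand.grid(t = 0:3, id = 1:30)
  d$grp <- as.numeric(d$id > 15)               # two groups of 15 subjects
  d$y <- 10 + 0.5 * d$t + 0.4 * d$grp * d$t +  # assumed fixed effects
    rnorm(30, sd = 1)[d$id] +                  # random intercepts
    rnorm(nrow(d), sd = 0.5)                   # residuals
  fit <- lme(y ~ t * grp, data = d, random = ~ 1 | id)
  summary(fit)$tTable["t:grp", "p-value"] < 0.05
}
mean(replicate(200, one.sim()))  # estimated power
```

Varying the sample sizes and effect sizes in the simulation is also a good way to learn how the design drives precision.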

139

Some links

There is a very good current bibliography as well as many other

resources at the UCLA Academic Technology Services site. Start

your visit at

http://www.ats.ucla.edu/stat/sas/topics/repeated_measures.htm

Another important site is the Centre for Multilevel Modeling,

currently at the University of Bristol:

http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-support/news.shtml

140

A few books

Pinheiro, Jose C. and Bates, Douglas M. (2000) Mixed-Effects

Models in S and S-PLUS. Springer

Fitzmaurice, Garrett M., Laird, Nan M., Ware, James H. (2004)

Applied Longitudinal Analysis, Wiley.

Allison, Paul D. (2005) Fixed Effects Regression Methods for

Longitudinal Data Using SAS, SAS Institute.

Littell, Ramon C. et al. (2006) SAS for Mixed Models (2nd ed.),

SAS Institute.

Singer, Judith D. and Willett, John B. (2003) Applied Longitudinal

Data Analysis : Modeling Change and Event Occurrence. Oxford

University Press.

141

Appendix: Reinterpreting weights

The mixed model estimate using a random intercept model can be seen either as a weighted combination of $\hat{\beta}_B$ and $\hat{\beta}_W$ or of $\hat{\beta}_P$ and $\hat{\beta}_W$. Writing $\lambda = \frac{\sigma^2/T}{\sigma^2/T + g_{00}}$, so that $W_B^{MM} = \lambda W_B$:

$\hat{\beta}_{MM} = (\lambda W_B + W_W)^{-1} (\lambda W_B \hat{\beta}_B + W_W \hat{\beta}_W)$

$\qquad = (\lambda W_B + W_W)^{-1} \left\{ \lambda (W_B \hat{\beta}_B + W_W \hat{\beta}_W) + (1 - \lambda) W_W \hat{\beta}_W \right\}$

$\qquad = (\lambda W_B + W_W)^{-1} \left\{ \lambda (W_B + W_W) \hat{\beta}_P + (1 - \lambda) W_W \hat{\beta}_W \right\}$

so $\hat{\beta}_{MM}$ is also a matrix-weighted average of $\hat{\beta}_P$ and $\hat{\beta}_W$.


2

Contents

Summary: .............................................................................................. 5

Take 1: The basic ideas ........................................................................ 8

A traditional example ........................................................................ 9

Pooling the data ('wrong' analysis) .................................................. 11

Fixed effects regression model ........................................................ 22

Other approaches ............................................................................. 29

Multilevel Models............................................................................ 30

From Multilevel Model to Mixed Model ........................................ 33

Mixed Model for Longitudinal Data in R: ...................................... 38

Notes on interpreting autocorrelation ........................................... 47

Some issues concerning autocorrelation ...................................... 48

Mixed Model in Matrices ................................................................ 51

Fitting the mixed model ................................................................... 53

Comparing GLS and OLS ............................................................... 56

Testing linear hypotheses in R ........................................................ 57

3

Modeling dependencies in time ....................................................... 63

G-side vs. R-side.............................................................................. 65

Simpler Models................................................................................ 68

BLUPS: Estimating Within-Subject Effects ................................... 70

Where the EBLUP comes from : looking at a single subject ....... 79

Interpreting G .................................................................................. 87

Differences between lm (OLS) and lme (mixed model) with

balanced data ................................................................................... 95

Take 2: Learning lessons from unbalanced data ................................ 97

R code and output .......................................................................... 102

Between, Within and Pooled Models ............................................ 104

The Mixed Model .......................................................................... 113

A serious a problem? a simulation ................................................ 118

Splitting age into two variables ..................................................... 121

Using 'lme' with a contextual mean ............................................... 131

> fit.contextual <- lme( .................................................................. 131

+ y ~ (age + cvar(age,Subject) ) * Sex ................................. 131

Simulation Revisited ..................................................................... 135

4

Power ................................................................................................ 138

Some links ........................................................................................ 139

A few books ...................................................................................... 140

Appendix: Reinterpreting weights .................................................... 141

5

Summary:

At first sight a mixed model for longitudinal data analysis does not look very different from a mixed model for hierarchical data. In matrices:

Linear Model:
$\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$

Mixed Model for Hierarchical Data:
$\mathbf{y}_j = X_j\boldsymbol{\gamma} + Z_j\mathbf{u}_j + \boldsymbol{\varepsilon}_j, \qquad \boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, \sigma^2 I_j), \quad \mathbf{u}_j \sim N(\mathbf{0}, G)$
where $\mathbf{y}_j = (y_{1j}, \ldots, y_{n_j j})'$ collects the observations in the jth cluster (students in the jth school).

Mixed Model for Longitudinal Data:
$\mathbf{y}_j = X_j\boldsymbol{\gamma} + Z_j\mathbf{u}_j + \boldsymbol{\varepsilon}_j, \qquad \boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, R_j), \quad \mathbf{u}_j \sim N(\mathbf{0}, G)$
where $\mathbf{y}_j = (y_{1j}, \ldots, y_{n_j j})'$ collects the observations over time on the jth subject.

6

Formally, mixed models for hierarchical data and for longitudinal data look almost the same. In practice, longitudinal data introduces some fascinating challenges:

1) The observations within a cluster are not necessarily independent. This is the reason for the broader condition that $\boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, R_j)$ (where $R_j$ is a variance matrix) instead of merely the special case $\boldsymbol{\varepsilon}_j \sim N(\mathbf{0}, \sigma^2 I_j)$. Observations close in time might depend on each other in ways that are different from those far apart in time. Note that if all observations have equal variance and are equally positively correlated – what is called a Compound Symmetry variance structure – this is entirely accounted for by the random intercept model on the G side. The purpose of the R matrix is to potentially capture interdependence that is more complex than compound symmetry.

2) The mean response may depend on time in ways that are far more complex than is typical for other types of predictors. Depending on the time scale of the observations, it may be necessary to use polynomial models, asymptotic models, Fourier analysis (orthogonal trigonometric functions) or splines that adapt to different features of the relationship in different periods of time.

3) There can be many partially confounded 'clocks' in the same analysis: period-age-cohort effects, age and time relative to a focal event such as giving birth, injury, arousal from coma, etc.

4) Some periodic patterns can be modeled either with fixed effects or with random effects through the R matrix. Periodic patterns with a fixed period (e.g. seasonal periodic variation) are naturally modeled with fixed effects (FE) using, for example, trigonometric functions. Periodic patterns with a randomly varying period (e.g. sunspot cycles) are more appropriately modeled with the R matrix, which can be used to model the kinds of patterns encountered in time series analysis: autoregressive and moving average models for residuals.

These slides focus on the simple functions of time and the R side. Lab 3 introduces more complex forms for functions of time.

8

Take 1: The basic ideas

9

A traditional example

Figure 1: Potthoff and Roy dental measurements in boys and girls.

Balanced data: everyone is measured at the same set of ages, so a classical repeated measures analysis could be used.

Some terminology:
Cluster: the set of observations on one subject.
Occasion: the observations at a given time for each subject.

[Figure: distance vs age by Subject (M01–M16, F01–F11).]

10

Figure 2: A different view by sex

Viewing by sex helps to see the pattern between sexes. Note: slopes are relatively consistent within each sex – except for a few anomalous male curves – BUT the intercept is highly variable.

An analysis that pools the data ignores this feature. Slope estimates will have excessively large SEs and 'level' estimates too-low SEs.

[Figure: distance vs age, one panel per Sex.]

11

Pooling the data ('wrong' analysis)

Ordinary least-squares on pooled data:

$y_{it} = \beta_0 + \beta_{age}\, age_{it} + \beta_{sex}\, sex_i + \beta_{age:sex}\, age_{it}\, sex_i + \varepsilon_{it}$

$i = 1, \ldots, N$ (number of subjects [clusters]), $\quad t = 1, \ldots, T_i$ (number of occasions for the ith subject)

Note that it is customary to use i for subjects (clusters) and t for occasions (time). This means that 'i' assumes the role previously played by 'j'.

R:

> library( spida ) # see notes on installation

> library( nlme ) # loaded automatically

> library( lattice ) # ditto

> data ( Orthodont ) # without spida

12

> head(Orthodont)

distance age Subject Sex

1 26.0 8 M01 Male

2 25.0 10 M01 Male

3 29.0 12 M01 Male

4 31.0 14 M01 Male

5 21.5 8 M02 Male

6 22.5 10 M02 Male

> dd <- Orthodont

> tab(dd, ~Sex)

Sex

Male Female Total

64 44 108

13

> dd$Sub <- reorder( dd$Subject, dd$distance)

# for plotting

## OLS Pooled Model

> fit <- lm ( distance ~ age * Sex , dd)

> summary(fit)

. . .

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.3406 1.4162 11.538 < 2e-16 ***

age 0.7844 0.1262 6.217 1.07e-08 ***

SexFemale 1.0321 2.2188 0.465 0.643

age:SexFemale -0.3048 0.1977 -1.542 0.126

Residual standard error: 2.257 on 104 degrees of freedom

Multiple R-squared: 0.4227, Adjusted R-squared: 0.4061

F-statistic: 25.39 on 3 and 104 DF, p-value: 2.108e-12

14

Note that both SexFemale and age:SexFemale have

large p-values. Are you tempted to just drop

both of them?

Check the joint hypothesis that they are

BOTH 0.

> wald( fit, "Sex")

numDF denDF F.value p.value

Sex 2 104 14.97688 <.00001

Coefficients Estimate Std.Error DF t-value p-value

SexFemale 1.032102 2.218797 104 0.465163 0.64279

age:SexFemale -0.304830 0.197666 104 -1.542143 0.12608

This analysis suggests that we could drop one

or the other but not both! Which one should we

choose? To respect the principle of marginality

15

we should drop the interaction, not the main

effect of Sex. This leads us to:

> fit2 <- lm( distance ~ age + Sex ,dd )

> summary( fit2 )

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 17.70671 1.11221 15.920 < 2e-16 ***

age 0.66019 0.09776 6.753 8.25e-10 ***

SexFemale -2.32102 0.44489 -5.217 9.20e-07 ***

and we conclude that there is an effect of Sex on jaw size, but we
did not find evidence that the rate of growth differs.

Revisiting the graph we saw earlier:

16

Figure 3: A different view by sex

The analysis is inconsistent

with what we see. It fails

Tukey’s interocular test.

Except for a few irregular

male trajectories, the male

trajectories appear steeper

than the female ones. On

the other hand, if we extrapolate the curves back to age 0, there
would not be much difference in levels.


OLS cannot exploit the consistency in slopes to recognize that

hypotheses about slopes should have a relatively smaller SE than

hypotheses about the levels of the curves. OLS is blind to the lines.


17

From the first OLS fit:

Estimated variance within each subject:

         [ 2.26²    .       .       .     ]
σ̂² I  =  [  .      2.26²    .       .     ]
         [  .       .      2.26²    .     ]
         [  .       .       .      2.26²  ]

Why is this wrong?

Residuals within

clusters are not

independent; they tend

to be highly correlated

with each other

[Figure: distance vs. age, one panel per subject (M16, M05, …, F04, F11),
showing each subject's trajectory.]

18

Fitted lines in ‘data

space’.


19

Determining the

intercept and slope of

each line


20

Fitted lines in

‘data’ space


21

Fitted ‘lines’ in

‘beta’ space

[Plot: β̂_age vs. β̂_0, by Sex.]

22

Fixed effects regression model

See

Paul D. Allison (2005) Fixed Effects Regression Methods for

Longitudinal Data Using SAS. SAS Institute – a great book on basics

of mixed models!

Treat Subject as a factor

Lose Sex unless it is constructed as a Subject contrast

Fits a separate OLS model to each subject:

y_it = β_0i + β_age,i age_it + ε_it

See the wiki for details: SPIDA 2010: Longitudinal Models:

Additional material

23

Estimated variance for each subject:

         [ 1.31²    .       .       .     ]
σ̂² I  =  [  .      1.31²    .       .     ]
         [  .       .      1.31²    .     ]
         [  .       .       .      1.31²  ]

Problems:

● No estimate of sex

effect

● Can't generalize to

population, only to 'new'

observations from same

subjects


24

● Can't predict for new

subject.

● Can construct sex

effect but CI is for

difference between

sexes in this sample

● No autocorrelation in

time


25

Fitted lines in data

space

● Female lines lower

and less steep

● Patterns within Sexes

not so obvious.


26

Fitted lines in beta

space

● Patterns within sexes

more obvious: steeper

slope associated with

smaller intercept.

● Single male outlier

stands out


27

Each within-subject least squares estimate

β̂_i = ( β̂_0i, β̂_age,i )'

has variance σ² ( X_i' X_i )^-1, which is used to construct a
confidence ellipse for the ‘fixed effect’

β_i = ( β_0i, β_age,i )'

for the ith subject. Each CI uses only the information from that
subject (except for the estimate of σ).


28

Differences between subjects, such as the dispersion of the β̂_i's and
the information they provide on the dispersion of the true β_i's, are
ignored in this model.

The standard error of the estimate of each average Sex line uses the
sample distribution of the ε_it's within subjects but not the
variability in the β_i's between subjects.


29

Other approaches

Repeated measures (univariate and multivariate)

o Need same times for each subject, no other time-varying

(Level 1) variables

Two-stage approach: use the β̂_i's in a second-level analysis:

o If the design is not balanced, then the β̂_i's have different
variances and would need different weights. Using σ² ( X_i' X_i )^-1
does not work because the relevant weight is based on the marginal
variance, G + σ² ( X_i' X_i )^-1, not the conditional variance given
the ith subject, σ² ( X_i' X_i )^-1.

30

Multilevel Models

Start with the fixed effects model:

Within-subject model (same as fixed effects model above):

y_it = β_0i + β_1i X_it + ε_it,    ε_i ~ N( 0, σ² I )

i = 1, …, N;   t = 1, …, T_i

β_0i is the ‘true’ intercept and β_1i is the ‘true’ slope with respect to
X.

σ² is the within-subject residual variance.

X (age in our example) is a time-varying variable. We could have

more than one.

31

Then add:

Between-subject model (new part):

We suppose that 0i and 1i vary randomly from subject to

subject.

But the distribution might be different for different Sexes (a

‘between-subject’ or ‘time-invariant’ variable). So we assume a

multivariate distribution:

β_0i = γ_00 + γ_01 W_i + u_0i
β_1i = γ_10 + γ_11 W_i + u_1i,        i = 1, …, N

u_i = ( u_0i, u_1i )' ~ N( 0, G ),    G = [ g_00   g_01 ]
                                          [ g_10   g_11 ]

32

where iW is a coding variable for Sex, e.g. 0 for Males and 1 for

Females.

E [ β_0i ]  =  [ γ_00 ]              among Males
  [ β_1i ]     [ γ_10 ]

            =  [ γ_00 + γ_01 ]       among Females
               [ γ_10 + γ_11 ]

Some software packages use the formulation of the multilevel model,

e.g. MLWin.

SAS and R use the ‘mixed model’ formulation. It is very useful to

know how to go from one formulation to the other.

33

From Multilevel Model to Mixed Model

Combine the two levels of the multilevel model by substituting the
between-subject model into the within-subject model, then gather
together the fixed terms and the random terms:

y_it = β_0i + β_1i X_it + ε_it

     = ( γ_00 + γ_01 W_i + u_0i ) + ( γ_10 + γ_11 W_i + u_1i ) X_it + ε_it

     = γ_00 + γ_01 W_i + γ_10 X_it + γ_11 W_i X_it   (fixed part of the model)

       + u_0i + u_1i X_it + ε_it                     (random part of the model)

34

Anatomy of the fixed part:

γ_00              (Intercept)
γ_01 W_i          (between-subject, time-invariant variable)
γ_10 X_it         (within-subject, time-varying variable)
γ_11 W_i X_it     (cross-level interaction)

Interpretation of the fixed part: the parameters reflect population

average values.

Anatomy of the random part:

For one occasion:

u_0i + u_1i X_it + ε_it

35

Putting the observations of one subject together:

[ u_0i + u_1i X_i1 + ε_i1 ]   [ 1   X_i1 ]              [ ε_i1 ]
[ u_0i + u_1i X_i2 + ε_i2 ] = [ 1   X_i2 ]  [ u_0i ]  + [ ε_i2 ]
[ u_0i + u_1i X_i3 + ε_i3 ]   [ 1   X_i3 ]  [ u_1i ]    [ ε_i3 ]
[ u_0i + u_1i X_i4 + ε_i4 ]   [ 1   X_i4 ]              [ ε_i4 ]

                            =  Z_i u_i + ε_i

Note: the random-effects design uses only time-varying variables

Distribution assumption:

u_i ~ N( 0, G )   independent of   ε_i ~ N( 0, R_i )

where, so far, R_i = σ² I.

36

Notes:

G (usually) does not vary with i. It is usually a free positive

definite matrix or it may be a structured pos-def matrix.

More on G later.

R_i (usually) does change with i, as it must if T_i is not
constant. R_i is expressed as a function of parameters. The
simplest example is R_i = σ² I_{n_i}. Later we will use R_i to
include auto-regressive parameters for longitudinal
modeling.

We can't estimate G and R directly. We estimate them
through:

V_i = Var( y_i ) = Z_i G Z_i' + R_i

37

Some things can be parametrized either on the G-side or on

the R-side. If they’re done in both, you lose identifiability.

Ill-conditioning due to “collinearity” between the G-side and R-side
models is a common problem.

38

Mixed Model for Longitudinal Data in R:

> fit <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject,

+ correlation

+ = corAR1 ( form = ~ 1 | Subject))

Model formula: distance ~ age * Sex

o specifies the fixed model

o includes the intercept and marginal main effects by default

o contains time-varying, time-invariant and cross-level

variables together

39

Random argument: ~ 1 + age | Subject

o Specifies the variables in the random model and the variable

defining clusters.

o The G matrix is the variance-covariance matrix of the random
effects. Here

  G = Var [ u_0i    ]  =  [ g_00       g_0,age   ]
          [ u_age,i ]     [ g_age,0    g_age,age ]

o Normally, the random model only contains an intercept and,

possibly, time-varying variables

40

Correlation argument: Specifies the model for the iR matrices

o Omit to get the default: R_i = σ² I_{n_i}

o Here we illustrate the use of an AR(1) structure, producing, for
example,

  R_i = σ² [ 1     ρ     ρ²    ρ³ ]
           [ ρ     1     ρ     ρ² ]
           [ ρ²    ρ     1     ρ  ]
           [ ρ³    ρ²    ρ     1  ]

in a cluster with 4 occasions.
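The implied AR(1) correlation matrix can be inspected directly with
nlme's corStruct machinery. A sketch, with phi = 0.5 an assumed value
for illustration:

```r
library(nlme)
cs <- corAR1(0.5, form = ~ 1 | Subject)   # phi = 0.5, assumed for illustration
cs <- Initialize(cs, data = Orthodont)    # attach the grouping/ordering info
round(corMatrix(cs)[["M01"]], 3)          # 4 x 4 matrix with powers of phi off the diagonal
```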

41

# Mixed Model in R:

> fit <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject,

+ correlation

+ = corAR1 ( form = ~ 1 | Subject))

> summary(fit)

Linear mixed-effects model fit by REML

Data: dd

AIC BIC logLik

446.8076 470.6072 -214.4038

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-

Cholesky parametrization

42

StdDev Corr
(Intercept) 3.3730482 (Intr)     [= √g_00]
age 0.2907673 -0.831             [= √g_11;  -0.831 = g_01 / √(g_00 g_11)]
Residual 1.0919754               [= σ̂]

Correlation Structure: AR(1)
Formula: ~1 | Subject
Parameter estimate(s):
Phi
-0.47328                         [correlation between adjoining observations]

43

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.152435 0.9984616 79 16.177323 0.0000

age 0.797950 0.0870677 79 9.164702 0.0000

SexFemale 1.264698 1.5642886 25 0.808481 0.4264

age:SexFemale -0.322243 0.1364089 79 -2.362334 0.0206

Correlation:                     [among the γ̂'s]

(Intr) age SexFml

age -0.877

SexFemale -0.638 0.559

age:SexFemale 0.559 -0.638 -0.877

Standardized Within-Group Residuals:

Min Q1 Med

Q3 Max

-3.288886631 -0.419431536 -0.001271185

0.456257976 4.203271248

Number of Observations: 108

44

Number of Groups: 27

Confidence intervals for all parameters:

> intervals( fit )

Approximate 95% confidence intervals

Fixed effects:

lower est. upper

(Intercept) 14.1650475 16.1524355 18.13982351

age 0.6246456 0.7979496 0.97125348

SexFemale -1.9570145 1.2646982 4.48641100

age:SexFemale -0.5937584 -0.3222434 -0.05072829

attr(,"label")

[1] "Fixed effects:"

45

Random Effects:

Level: Subject

lower est. upper

sd((Intercept)) 2.2066308 3.3730482 5.1560298

sd(age) 0.1848904 0.2907673 0.4572741

cor((Intercept),age) -0.9377008 -0.8309622 -0.5808998

Correlation structure:

lower est. upper

Phi -0.7559617 -0.4728 -0.04182947

attr(,"label")

[1] "Correlation structure:"

Do NOT use CIs for SDs

to test whether they are 0.

Use anova + simulate. Cf

Lab 1.

Negative!

46

Within-group standard error:

lower est. upper

0.9000055 1.0919754 1.3248923

> VarCorr( fit )                 [from the G matrix]

Subject = pdLogChol(1 + age)

Variance StdDev Corr

(Intercept) 11.3774543 3.3730482 (Intr)

age 0.0845456 0.2907673 -0.831

Residual 1.1924103 1.0919754

To get the G matrix itself

in a form that can be used

in matrix expressions

> getVarCov( fit )

47

Random effects variance covariance matrix

(Intercept) age

(Intercept) 11.37700 -0.814980

age -0.81498 0.084546

Standard Deviations: 3.373 0.29077

Notes on interpreting autocorrelation

The estimated autocorrelation is negative. Although most natural

processes would be expected to produce positive autocorrelations,

occasional large measurement errors can create the appearance of a

negative autocorrelation. Some processes self-correct, so that an unusually
large value tends to be compensated by an unusually small one, e.g.
residue in a chemical process.

48

Some issues concerning autocorrelation

1. Lack of fit will generally contribute positively to

autocorrelation. For example, if trajectories are quadratic but

you are fitting a linear trajectory, the residuals will be

positively autocorrelated. Strong positive autocorrelation

can be a symptom of lack of fit. This is an example of poor

identification between the FE model and the R model, that is,

between the deterministic and the stochastic aspects of the

model. See Lab 3 for a similar discussion of seasonal (FE)

versus cyclical variation (R-side) periodic patterns.

2. As mentioned above, occasional large measurement errors will

contribute negatively to the estimate of autocorrelation.

49

3. In a well fitted OLS model, the residuals are expected to be

negatively correlated, more so if there are few observations per

subject.

4. With few observations per subject, the estimate of

autocorrelation (R side) can be poorly identified and highly

correlated with G-side parameters. [See 'Additional Notes']

Looking at the data we suspect that M09 might be highly influential

for autocorrelation. We can refit without M09 to see how the estimate

changes.

What happens when we drop M09?

> fit.dropM09 <- update( fit,

+ subset = Subject != "M09")

50

> summary( fit.dropM09 )

Linear mixed-effects model fit by REML

Data: dd

Subset: Subject != "M09"

AIC BIC logLik

406.3080 429.7545 -194.1540

. . . . .

Correlation Structure: AR(1)

Formula: ~1 | Subject

Parameter estimate(s):

Phi

-0.1246035                       [still negative]

. . . . . .

> intervals( fit.dropM09 )

Approximate 95% confidence intervals

. . . .

Correlation structure:

lower est. upper

Phi -0.5885311 -0.1246035 0.4010562

attr(,"label")                   [but not significantly different from 0]

[1] "Correlation structure:"

51

Mixed Model in Matrices

In the ith cluster:

y_it = γ_00 + γ_01 W_i + γ_10 X_it + γ_11 W_i X_it + u_0i + u_1i X_it + ε_it

Stacking the occasions (here T_i = 4):

[ y_i1 ]   [ 1  W_i  X_i1  W_i X_i1 ] [ γ_00 ]   [ 1  X_i1 ]            [ ε_i1 ]
[ y_i2 ] = [ 1  W_i  X_i2  W_i X_i2 ] [ γ_01 ] + [ 1  X_i2 ] [ u_0i ] + [ ε_i2 ]
[ y_i3 ]   [ 1  W_i  X_i3  W_i X_i3 ] [ γ_10 ]   [ 1  X_i3 ] [ u_1i ]   [ ε_i3 ]
[ y_i4 ]   [ 1  W_i  X_i4  W_i X_i4 ] [ γ_11 ]   [ 1  X_i4 ]            [ ε_i4 ]

y_i = X_i γ + Z_i u_i + ε_i

[Could we fit this model in cluster i?]

where

u_i ~ N( 0, G )   independent of   ε_i ~ N( 0, R_i )

Z_i u_i + ε_i ~ N( 0, Z_i G Z_i' + R_i )

52

For the whole sample

[ y_1 ]   [ X_1 ]       [ Z_1         0  ] [ u_1 ]   [ ε_1 ]
[  ⋮  ] = [  ⋮  ] γ  +  [      ⋱         ] [  ⋮  ] + [  ⋮  ]
[ y_N ]   [ X_N ]       [  0         Z_N ] [ u_N ]   [ ε_N ]

y = Xγ + Zu + ε

Finally, making the complex look deceptively simple:

y = Xγ + Zu + ε
  = Xγ + δ

with

Var( u ) = diag( G, …, G )          (one block per cluster)
Var( ε ) = R = diag( R_1, …, R_N )
Var( δ ) = V = Z diag( G, …, G ) Z' + R

53

Fitting the mixed model

Use Generalized Least Squares on

y ~ N( Xγ, ZGZ' + R )

γ̂_GLS = ( X' V̂^-1 X )^-1 X' V̂^-1 y

We need γ̂_GLS to get V̂ and vice versa, so one algorithm iterates from
one to the other until convergence.
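The GLS step itself can be sketched in a few lines of base R. This is
an illustration of the formula only (lme's actual algorithm works with
the profiled likelihood), using a toy V as a stand-in for the estimated
variance matrix:

```r
# gamma_hat = (X' V^-1 X)^-1 X' V^-1 y, for a given V
gls_coef <- function(X, y, V) {
  Vinv <- solve(V)
  solve(t(X) %*% Vinv %*% X, t(X) %*% Vinv %*% y)
}

# Sanity check: with V = I, GLS reduces to OLS
X <- cbind(1, 1:4)
y <- c(2, 4, 5, 9)
all.equal(c(gls_coef(X, y, diag(4))), unname(coef(lm(y ~ X - 1))))
```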

There are two main ways of fitting mixed models with normal

responses:

54

1. Maximum Likelihood (ML)

a. Fits all of , ,γ G R at the same time.

b. Two ML fits can be used in the 'anova' function to test

models that differ in their FE models or in their RE

models: G and/or R.

c. ML fits tend to underestimate V and Wald tests will tend

to err on the liberal side.

2. Restricted (or Residual) Maximum Likelihood

a. Maximum likelihood is applied to the 'residual space'

with respect to the X matrix and only estimates G and R,

hence V. The estimated V is then used to obtain the

GLS estimate for γ .

55

b. 'anova' can only be used to compare two models with

identical FE models. Thus 'anova' (Likelihood Ratio

Tests) can only be applied to hypotheses about REs.

c. The estimate of V tends to be better than with ML and

Wald tests are expected to be more accurate than with

ML.

d. Thus with REML, you sacrifice the ability to perform

LRTs for FEs but improve Wald tests for FEs. Also,

LRTs for REs are expected to be more accurate.

e. REML is the default for 'lme' and PROC MIXED in

SAS.

56

Comparing GLS and OLS

We used OLS above:

1ˆ ' 'OLS γ X X X y

instead of

1

1 1ˆ ˆˆ ' 'GLS

γ X V X X V y

How does OLS differ from GLS?

Do they differ only in that GLS produces more accurate standard
errors? Or can γ̂_OLS be very different from γ̂_GLS?

With balanced data they will be the same. With unbalanced data they

can be dramatically different. OLS is an estimate based on the pooled

data. GLS provides an estimate that is closer to that of the unpooled

57

data. Estimation of the FE model and of the RE model are highly
interrelated, in contrast with OLS and with GLMs with canonical links,
where they are orthogonal.

Testing linear hypotheses in R

Hypotheses involving linear combination of the fixed effects

coefficients can be tested with a Wald test. The Wald test is

based on the normal approximation for maximum likelihood

estimators using the estimated variance-covariance matrix.

Using the 'wald' function alone displays the estimated fixed

effects coefficients and Wald-type confidence intervals as well as

a test that all true coefficients are equal to 0 (this is rarely of any
interest).

58

> wald (fit)

numDF denDF F.value p.value

4 24 952.019 <.00001

Coefficients Estimate Std.Error DF t-value p-value

(Intercept) 16.479081 1.050894 76 15.681019 <.00001

age 0.769464 0.089141 76 8.631956 <.00001

SexFemale 0.905327 1.615657 24 0.560346 0.58044

age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Continuation:

Coefficients Lower 0.95 Upper 0.95

(Intercept) 14.386046 18.572117

age 0.591923 0.947004

SexFemale -2.429224 4.239878

age:SexFemale -0.563872 -0.017967

59

We can estimate the response level at age 14 for Males and Females

by specifying the appropriate linear transformation of the coefficients.

> L <- rbind( "Male at 14" = c( 1, 14, 0, 0),

+ "Female at 14" = c( 1, 14, 1, 14))

> L

[,1] [,2] [,3] [,4]

Male at 14 1 14 0 0

Female at 14 1 14 1 14

> wald ( fit, L )

numDF denDF F.value p.value

1 2 24 1591.651 <.00001

Estimate Std.Error DF t-value p-value Lower 0.95

Male at 14 27.25157 0.605738 76 44.98908 <.00001 26.04514

Female at 14 24.08403 0.707349 24 34.04829 <.00001 22.62413 Upper 0.95

Male at 14 28.45800

Female at 14 25.54392

60

To estimate the gap at 14:

> L.gap <- rbind( "Gap at 14" = c( 0, 0, 1, 14))

> L.gap

[,1] [,2] [,3] [,4]

Gap at 14 0 0 1 14

> wald ( fit, L.gap)

numDF denDF F.value p.value

1 1 24 11.56902 0.00235

Estimate Std.Error DF t-value p-value Lower 0.95

Gap at 14 -3.167548 0.931268 24 -3.401327 0.00235 -5.089591

Upper 0.95

Gap at 14 -1.245504

61

To simultaneously estimate the gap at 14 and at 8 we can do the

following. Note that the overall (simultaneous) null hypothesis here is

equivalent to the hypothesis that there is no difference between the

sexes.

> L.gaps <- rbind( "Gap at 14" = c( 0, 0, 1, 14),

+ "Gap at 8" = c( 0,0,1, 8))

> L.gaps

[,1] [,2] [,3] [,4]

Gap at 14 0 0 1 14

Gap at 8 0 0 1 8

> wald ( fit, L.gaps)

numDF denDF F.value p.value

1 2 24 5.83927 0.00858

Estimate Std.Error DF t-value p-value Lower 0.95

Gap at 14 -3.167548 0.931268 24 -3.401327 0.00235 -5.089591

Gap at 8 -1.422030 0.844256 24 -1.684359 0.10508 -3.164489

62

An equivalent hypothesis that there is no difference between the sexes

is the hypothesis that the two coefficients for sex are simultaneously

equal to 0. The 'wald' function simplifies this by allowing a string as a

second argument that is used to match coefficient names. The test

conducted is that all coefficients whose name has been matched are

simultaneously 0.

> wald ( fit, "Sex" )

numDF denDF F.value p.value

Sex 2 24 5.83927 0.00858

Coefficients Estimate Std.Error DF t-value p-value

SexFemale 0.905327 1.615657 24 0.560346 0.58044

age:SexFemale -0.290920 0.137047 76 -2.122774 0.03703

Coefficients Lower 0.95 Upper 0.95

SexFemale -2.429224 4.239878

age:SexFemale -0.563872 -0.017967

Note the equivalence of the two F-tests above.

63

Modeling dependencies in time

The main difference between using mixed models for multilevel
modeling as opposed to longitudinal modeling is the assumptions
about the ε_it's, plus the more complex functional forms for time
effects. For observations made in time, part of the correlation
between the ε's could be related to their distance in time.

The R-side model allows the modeling of temporal and spatial
dependence.

Correlation argument (R-side):

Autoregressive of order 1:

corAR1( form = ~ 1 | Subject )

R_i = σ² [ 1     ρ     ρ²    ρ³ ]
         [ ρ     1     ρ     ρ² ]
         [ ρ²    ρ     1     ρ  ]
         [ ρ³    ρ²    ρ     1  ]

64

Correlation argument (R-side):

Autoregressive moving average of order (1,1):

corARMA( form = ~ 1 | Subject, p = 1, q = 1 )

The lag-1 correlation ρ₁ is a free parameter; correlations decay by the
factor φ at each additional lag:

R_i = σ² [ 1      ρ₁     ρ₁φ    ρ₁φ² ]
         [ ρ₁     1      ρ₁     ρ₁φ  ]
         [ ρ₁φ    ρ₁     1      ρ₁   ]
         [ ρ₁φ²   ρ₁φ    ρ₁     1    ]

AR(1) in continuous time, e.g. supposing a subject with
times 1, 2, 5.5 and 10:²

corCAR1( form = ~ time | Subject )

R_i = σ² [ 1       ρ       ρ^4.5   ρ^9   ]
         [ ρ       1       ρ^3.5   ρ^8   ]
         [ ρ^4.5   ρ^3.5   1       ρ^4.5 ]
         [ ρ^9     ρ^8     ρ^4.5   1     ]

² Note that the times and the number of times (hence the indices) can change from subject to subject, but σ² and ρ have the same value for all subjects.
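The continuous-time powers ρ^|t_s - t_t| can be generated directly from
the observation times. A sketch, with phi = 0.6 an assumed value:

```r
times <- c(1, 2, 5.5, 10)   # the example times above
phi   <- 0.6                # assumed autocorrelation parameter
# Correlation between occasions s and t is phi^|t_s - t_t|
R <- outer(times, times, function(s, t) phi^abs(s - t))
round(R, 4)                 # entries such as phi^4.5 and phi^9
```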

65

G-side vs. R-side

A few things can be done with either side. But don’t do it with

both in the same model. The redundant parameters will not be

identifiable. For example, the G-side random intercept model is

‘almost’ equivalent to the R-side compound symmetry model.

With OLS the linear parameters are orthogonal to the variance

parameter. Collinearity among the linear parameters is determined

by the design, X, and does not depend on values of parameters.

Computational problems due to collinearity can be addressed by

orthogonalizing the X matrix.

With mixed models the variance parameters are generally not

orthogonal to each other and, with unbalanced data, the linear

parameters are not orthogonal to the variance parameters.

66

G-side parameters can be highly collinear even if the X matrix is

orthogonal. Centering the variables of the RE model around the

“point of minimal variance” will help but the resulting design

matrix may be highly collinear.

G-side and R-side parameters can be highly collinear. The degree

of collinearity may depend on the value of the parameters.

For example, our model identifies σ², φ and G only through

V̂ = Z_i Ĝ Z_i' + σ̂² R_i( φ̂ )

where Z_i is the random-effects design matrix, Ĝ has entries
ĝ_00, ĝ_01, ĝ_10, ĝ_11, and R_i( φ ) is the AR(1) correlation matrix.
For values of φ above 0.5, the Hessian is very ill-conditioned. The
lesson may be that to use AR and ARMA models effectively, you need
at least some subjects observed on many occasions.

67

R-side only: population average models

G-side only: hierarchical models with conditionally independent

observations in each cluster

Population average longitudinal models can be done on the R-side

with AR, ARMA structures, etc.

The absence of the G-side may be less crucial with balanced data.

The G-side is not enough to provide control for unmeasured

between subject confounders if the time-varying predictors are

unbalanced (more on this soon).

A G-side random effects model DOES NOT provide the equivalent

of temporal correlation.

68

Simpler Models

The model we’ve looked at is deliberately complex, including
examples of the main typical components of a mixed model. We can
also use mixed models for simpler problems.

Using X as a generic time-varying (within-subject) predictor and W as

a generic time-invariant (between-subject) predictor we have the

following:

MODEL                  | Formula             | RANDOM        | Model
-----------------------+---------------------+---------------+--------------------------------------------------
One-way ANOVA with     | ~ 1                 | ~ 1 | Sub     | y_it = γ_00 + u_0i + ε_it
random effects         |                     |               |
Means as outcomes      | ~ 1 + W  (~W)³      | ~ 1 | Sub     | y_it = γ_00 + γ_01 W_i + u_0i + ε_it
One-way ANCOVA with    | ~ 1 + X  (~X)       | ~ 1 | Sub     | y_it = γ_00 + γ_10 X_it + u_0i + ε_it
random effects         |                     |               |
Random coefficients    | ~ 1 + X  (~X)       | ~ 1 + X | Sub | y_it = γ_00 + γ_10 X_it + u_0i + u_1i X_it + ε_it
model                  |                     |               |
Intercepts and slopes  | ~ 1 + X + W + X:W   | ~ 1 + X | Sub | y_it = γ_00 + γ_01 W_i + γ_10 X_it + γ_11 W_i X_it
as outcomes            | (~ X*W)             |               |        + u_0i + u_1i X_it + ε_it
Non-random slopes      | ~ 1 + X + W + X:W   | ~ 1 | Sub     | y_it = γ_00 + γ_01 W_i + γ_10 X_it + γ_11 W_i X_it
                       | (~ X*W)             |               |        + u_0i + ε_it

³ The model in ( ) is equivalent in R.
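As lme calls on the Orthodont data, a sketch with X = age and
W = Sex (so the last rows reduce to models already fitted above):

```r
library(nlme)
dd <- Orthodont
# One-way ANOVA with random effects
fit.anova <- lme(distance ~ 1, dd, random = ~ 1 | Subject)
# Random coefficients model
fit.rc    <- lme(distance ~ age, dd, random = ~ 1 + age | Subject)
# Intercepts and slopes as outcomes
fit.iso   <- lme(distance ~ age * Sex, dd, random = ~ 1 + age | Subject)
```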

70

BLUPS: Estimating Within-Subject Effects

We’ve seen how to estimate γ, G and R. Now we consider
β_i = ( β_0i, β_1i )'.

We’ve already estimated β_i using the fixed-effects model with an OLS
regression within each subject. Call this estimator β̂_i. How good is
it?

Var( β̂_i | β_i ) = σ² ( X_i' X_i )^-1

Can we do better? We have another ‘estimator’ of β_i.

71

Suppose we know the γ's for the population. We could also predict⁴ β_i by
using the within-Sex mean intercepts and slopes, e.g. for Males we
could use

β̃_i = ( γ_00, γ_10 )'

with error variance

Var( β_i - β̃_i ) = G

We could then combine β̂_i and β̃_i by weighting them by inverse
variance (= precision). This yields the BLUP (Best Linear Unbiased
Predictor):

β_i^BLUP = ( G^-1 + σ^-2 X_i' X_i )^-1 ( G^-1 β̃_i + σ^-2 X_i' X_i β̂_i )

4 Non-statisticians are always thrown for a loop when we ‘predict’ something that

happened in the past. We use 'predict' for things that are random, 'estimate' for

things that are 'fixed'. Orthodox Bayesians always predict.
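A numeric sketch of the inverse-variance weighting; all input values
here are hypothetical, chosen only to illustrate the computation:

```r
# blup = (G^-1 + Xi'Xi/sigma2)^-1 (G^-1 mu + (Xi'Xi/sigma2) beta_hat_i)
blup <- function(G, sigma2, Xi, beta_hat_i, mu) {
  P.pop <- solve(G)                 # precision of the population 'estimator'
  P.ols <- crossprod(Xi) / sigma2   # precision of the within-subject OLS estimate
  solve(P.pop + P.ols, P.pop %*% mu + P.ols %*% beta_hat_i)
}

G  <- matrix(c(4, -0.3, -0.3, 0.05), 2, 2)   # hypothetical G
Xi <- cbind(1, c(8, 10, 12, 14))             # design for one subject
blup(G, sigma2 = 1.2, Xi,
     beta_hat_i = c(15, 1.1),                # hypothetical within-subject OLS
     mu = c(17, 0.8))                        # hypothetical population values
```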

72

If we replace the unknown parameters with their estimates, we get the

EBLUP (Empirical BLUP):

β̂_i^EBLUP = ( Ĝ^-1 + σ̂^-2 X_i' X_i )^-1 ( Ĝ^-1 β̃_i + σ̂^-2 X_i' X_i β̂_i )

with β̃_i now computed from the estimated γ̂'s.

The EBLUP ‘optimally’ combines the information from the ith cluster

with the information from the other clusters. We borrow strength

from the other clusters.
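In nlme the EBLUPs are available directly. A sketch, refitting the lme
model from earlier (without the AR(1) term, for simplicity):

```r
library(nlme)
fit <- lme(distance ~ age * Sex, Orthodont,
           random = ~ 1 + age | Subject)
head(ranef(fit))   # EBLUPs of the u_i: deviations from the population lines
head(coef(fit))    # per-subject coefficients: fixed part + EBLUP deviation
```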

The process ‘shrinks’ β̂_i towards the population estimate β̃_i along a
path determined by the locus of osculation of the families of ellipses
with shape Ĝ around β̃_i and shape σ̂² ( X_i' X_i )^-1 around β̂_i.

73

The slope of the BLUP

is close to the population

slope

but

the level of the BLUP is

close to the level of the

BLUE

This suggests that G has

a large variance for

intercepts and a small

variance for slopes

[Figure: distance vs. age, one panel per subject, with the Popn, BLUE
and BLUP lines for each subject.]

74

Population estimate

BLUE and BLUP

in beta space


75

The marginal dispersion of the BLUEs comes from:

Var( β̂_i ) = G + σ² ( X_i' X_i )^-1

since

Var( β_i ) = G                            [population var.]

Var( β̂_i | β_i ) = σ² ( X_i' X_i )^-1    [cond'l var., resampling from the ith subject]

E( β̂_i | β_i ) = β_i                     [BLUE]

[Figure: Int vs. slope in beta space by Sex, with Popn, BLUE and BLUP
estimates.]

76

So:

Var( β̂_i ) = Var( E( β̂_i | β_i ) ) + E( Var( β̂_i | β_i ) )

           = Var( β_i ) + E( Var( β̂_i | β_i ) )

           = G + σ² E( X_i' X_i )^-1

77

While the expected

variance of the BLUEs

is larger than G

the expected variance of

the BLUPs is smaller

than G.

Beware of drawing

conclusions about G

from the dispersion of

the BLUPs.


78

The estimate of G can

be unstable and often

collapses to singularity

leading to non-

convergence for many

methods.

Possible remedies:

- Recentre X near point

of minimal variance,

- Use a smaller G

- Change the model


79

Where the EBLUP comes from : looking at a single subject

Note that the EBLUP’s

slope is close to the slope

of the population estimate

(i.e. the male population

conditioning on between-

subject predictors) while

the level of the line is

close to level of the

BLUE.

The relative precisions of the BLUE and of the population estimate on
slope and level are reflected through the shapes of G and
σ² ( X_i' X_i )^-1.

[Figure: subject M11, distance vs. age, with the Popn, BLUE and BLUP
lines.]

80

The same picture in

“beta-space”

[Figure: subject M11 in beta space (Int vs. slope), with Popn, BLUE and
BLUP.]

81

The population estimate with an SD ellipse.

[Figure: beta space for M11, with an SD ellipse around the population
estimate.]

82

The population

estimate with a SD

ellipse

and

the BLUE with its

SE ellipse


83

The EBLUP is an Inverse

Variance Weighted mean

of the BLUE and of the

population estimate.

We can think of taking the

BLUE and ‘shrinking’ it

towards the population

estimate along a path that

optimally combines the

two components.

The path is formed by the

osculation points of the

families of ellipses around

the BLUE and the

population estimate.


84

The amount and direction of shrinkage depends on the relative shapes
and sizes of G and of

Var( β̂_i | β_i ) = σ² ( X_i' X_i )^-1

The BLUP is at an osculation point of the families of ellipses
generated around the BLUE and the population estimate.

[Figure: beta space for M11, with the osculation path from the BLUE to
the population estimate.]

85

Imagine what could

happen if G were

oriented differently:

Paradoxically, both the

slope and the intercept

could be far outside the

population estimate and

the BLUE.


86

When is a BLUP a BLUPPER?

The rationale behind BLUPs is based on exchangeability. No outside

information should make this cluster stand out from the others and the

mean of the population deserves the same weight in prediction for this

cluster as it deserves for any other cluster that doesn’t stand out.

If a cluster stands out somehow, then the BLUP might be a

BLUPPER.

87

Interpreting G

The parameters of G give the variance of the intercepts, the variance

of the slopes and the covariance between intercepts and the slopes.

Would it make sense to assume that the covariance is 0 to reduce the

number of parameters in the model? To address this, consider that the

variance of the heights of individual regression lines at a fixed value of
X is:

Var( β_0i + β_1i X ) = Var( [ 1  X ] [ β_0i ] )
                                     [ β_1i ]

                     = [ 1  X ] [ g_00   g_01 ] [ 1 ]
                                [ g_10   g_11 ] [ X ]

                     = g_00 + 2 g_01 X + g_11 X²

88

Summarizing:

Var( β_0i + β_1i X ) = g_00 + 2 g_01 X + g_11 X²

is a quadratic function of X. So Var( β_0i + β_1i X ) has a minimum at

X = - g_01 / g_11

and the minimum variance is

g_00 - g_01² / g_11
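A quick numerical check of the quadratic and its minimum, with
hypothetical g values:

```r
g00 <- 10.5; g01 <- -1; g11 <- 0.1           # hypothetical entries of G
v    <- function(x) g00 + 2 * g01 * x + g11 * x^2   # Var of line height at X = x
xmin <- -g01 / g11                           # location of the minimum
vmin <- g00 - g01^2 / g11                    # minimum variance
c(xmin = xmin, vmin = vmin, check = v(xmin))
```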

89

Thus, assuming that the covariance is 0 is equivalent to

assuming that the minimum variance occurs when X = 0. This is

an assumption that is not invariant with location transformations of

X. It is similar to removing a main effect that is marginal to an

interaction in a model, something that should not be done without a

thorough understanding of its consequences.

Example: let

γ = [ 20 ]        and        G = [ 10.5    -1  ]
    [ -1 ]                       [  -1     0.1 ]

90

A sample of lines in

beta space

[Figure: β_1 vs. β_0 scatter of sampled (β_0i, β_1i) pairs with the
concentration ellipse.]

91

The same lines in

data space.


92

The same lines in

data space with the

population mean

line and lines at one

SD above and

below the

population mean

line


93

The parameters of

G determine the

location and value

of the minimum

standard deviation

of lines

[Figure: lines in data space; the minimum SD of line heights,
√( g_00 - g_01² / g_11 ), occurs at X = - g_01 / g_11.]

94

With two time-varying variables with random effects, the G matrix

would look like:

Var [ u_0i ]   [ g_00   g_01   g_02 ]
    [ u_1i ] = [ g_10   g_11   g_12 ]
    [ u_2i ]   [ g_20   g_21   g_22 ]

The point of minimum variance is located at:

- [ g_11   g_12 ]^-1 [ g_10 ]
  [ g_21   g_22 ]    [ g_20 ]
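The minimum point solves a 2x2 linear system. A sketch with a
hypothetical 3x3 G:

```r
G <- matrix(c(9.0, 0.6, 0.4,
              0.6, 0.2, 0.1,
              0.4, 0.1, 0.3), 3, 3)       # hypothetical symmetric G
xmin <- -solve(G[2:3, 2:3], G[2:3, 1])    # point of minimum variance
xmin

# Var of line height at (x1, x2) is (1, x1, x2) G (1, x1, x2)'
v <- function(x) c(t(c(1, x)) %*% G %*% c(1, x))
```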

95

Differences between lm (OLS) and lme (mixed model) with

balanced data

Just looking at regression coefficients:

> fit.ols <- lm( distance ~ age * Sex, dd)

> fit.mm <- lme( distance ~ age * Sex, dd,

+ random = ~ 1 + age | Subject)

> summary(fit.ols)

Call:

lm(formula = distance ~ age * Sex, data = dd)

Residuals:

Min 1Q Median 3Q Max

-5.6156 -1.3219 -0.1682 1.3299 5.2469

96

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 16.3406 1.4162 11.538 < 2e-16

age 0.7844 0.1262 6.217 1.07e-08

SexFemale 1.0321 2.2188 0.465 0.643

age:SexFemale -0.3048 0.1977 -1.542 0.126

. . .

> summary(fit.mm)

. . .

Fixed effects: distance ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 16.340625 1.0185320 79 16.043311 0.0000

age 0.784375 0.0859995 79 9.120691 0.0000

SexFemale 1.032102 1.5957329 25 0.646789 0.5237

age:SexFemale -0.304830 0.1347353 79 -2.262432 0.0264

Note that going from OLS to MM, precision shifts from between-subject comparisons to within-subject comparisons. When data are balanced, lm (OLS) and lme (mixed models) produce the same $\hat\beta$s but with different SEs.

97

Take 2: Learning lessons from unbalanced data

What can happen with unbalanced data?

Here are data similar to the Potthoff and Roy data but with:

- different age ranges for different subjects
- a between-subject effect of age that is different from the within-subject effect of age

98

> head(du)

y x id xb xw Subject Sex age

1 12.37216 8 1 11 -3 F09 Female 8

2 11.20801 10 1 11 -1 F09 Female 10

3 10.44755 12 1 11 1 F09 Female 12

4 10.43831 13 1 11 2 F09 Female 13

5 14.13549 9 2 12 -3 F11 Female 9

6 13.47965 11 2 12 -1 F11 Female 11

> tail(du)

y x id xb xw Subject Sex age

103 35.67045 37 26 36 1 M08 Male 37

104 35.70928 38 26 36 2 M08 Male 38

105 38.81624 34 27 37 -3 M10 Male 34

106 37.87866 36 27 37 -1 M10 Male 36

107 36.22499 38 27 37 1 M10 Male 38

108 35.62520 39 27 37 2 M10 Male 39

99

[Figure: y against age_raw with a panel for each of the 27 Subjects (M01–M16, F01–F11)]

100

[Figure: y against age_raw by Sex (Male and Female panels)]

101

Using age centered at 25.

Why? Like the ordinary regression model, the mixed model is equivariant under global centering, but convergence may be improved because the G matrix is less eccentric.

[Figure: y against centered age by Sex (Male and Female panels)]

102

R code and output

> fit <- lme( y ~ age * Sex, du,

+ random = ~ 1 + age| Subject)

> summary( fit )

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

374.6932 395.8484 -179.3466

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 9.32672995 (Intr)

age 0.05221248 0.941

Residual 0.50627022

103

Fixed effects: y ~ age * Sex

Value Std.Error DF t-value p-value

(Intercept) 40.26568 2.497546 79 16.122095 0.0000

age -0.48066 0.035307 79 -13.613685 0.0000

SexFemale -14.01875 3.830956 25 -3.659333 0.0012

age:SexFemale 0.05239 0.055373 79 0.946092 0.3470

Correlation:

(Intr) age SexFml

age -0.007

SexFemale -0.652 0.005

age:SexFemale 0.005 -0.638 0.058

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.10716969 -0.54148659 -0.02688422 0.59030024 2.14279806

Number of Observations: 108

Number of Groups: 27

104

Between, Within and Pooled Models

We first focus on one group, the female data: What models could we fit to these data?

[Figure: y against age for the female subjects]

105

[Figure: y against age for the female subjects, with the marginal SD ellipse]

106

Regressing with the pooled data – ignoring Subject – yields the marginal (unconditional) estimate of the slope: $\hat\beta_P$.

[Figure: y against age with the marginal regression line]

107

We could replace each Subject by its means for x and y and use the resulting aggregated data with one point for each Subject.

[Figure: y against age showing one mean point per Subject]

108

[Figure: dispersion ellipse of subject means]

109

Performing a regression on the aggregated data yields the 'between-subject' regression estimate $\hat\beta_B$, in some contexts called an 'ecological regression', estimating, in some contexts, the 'compositional effect' of age.

[Figure: regression on subject means = ecological regression]

110

We can combine all within-subject regressions to get a combined estimate $\hat\beta_W$ of the within-subject slope. This is the estimate obtained with a fixed-effects model using age and Subject additively. Equivalently, we can perform a regression of the within-subject residuals of y (y minus the subject mean of y) on the within-subject residuals of age (age minus the subject mean of age).

Q: Which is better: $\hat\beta_B$, $\hat\beta_W$ or $\hat\beta_P$?

[Figure: within-subject regressions]

111

A: None. They answer different questions. Typically, $\hat\beta_P$ would be used for prediction across the population; $\hat\beta_W$ for 'causal' inference controlling for between-subject confounders, assuming that all confounders affect all observations similarly.

[Figure: within-subject regressions]

112

The relationship among estimators:

$\hat\beta_P$ combines $\hat\beta_B$ and $\hat\beta_W$:

$$\hat\beta_P = (W_B + W_W)^{-1}(W_B \hat\beta_B + W_W \hat\beta_W)$$

The weights depend only on the design (the X matrix), not on estimated variances of the response.

[Figure: between-subject, marginal, and within-subject regression lines]
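In scalar form the identity looks like this (the weights and slopes below are hypothetical, chosen only to illustrate the formula):

```python
# Pooled slope as a precision-weighted combination of the between-subject
# and within-subject slopes:
#   beta_P = (W_B*beta_B + W_W*beta_W) / (W_B + W_W)
W_B, W_W = 4.0, 8.0          # hypothetical design-based weights
beta_B, beta_W = 1.0, -0.5   # hypothetical between and within slopes

beta_P = (W_B * beta_B + W_W * beta_W) / (W_B + W_W)
print(beta_P)  # lies between beta_W and beta_B
```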

113

The Mixed Model

The mixed model estimate^5 also combines $\hat\beta_B$ and $\hat\beta_W$:

$$\hat\beta_{MM} = (W_B^{MM} + W_W)^{-1}(W_B^{MM} \hat\beta_B + W_W \hat\beta_W)$$

but with a lower weight on $\hat\beta_B$:

$$W_B^{MM} = W_B \times \frac{\sigma^2/T}{\sigma^2/T + g_{00}}
= W_B \times \frac{1}{1 + g_{00}T/\sigma^2}$$

Note that $W_B^{MM} \le W_B$.

5 Using a random intercept model

[Figure: between-subject, marginal, mixed model, and within-subject regression lines]

114

The mixed model estimator is a variance-optimal combination of $\hat\beta_B$ and $\hat\beta_W$.

It makes perfect sense if $\hat\beta_B$ and $\hat\beta_W$ estimate the same thing, i.e. if $\beta_B = \beta_W$!

Otherwise, it's an arbitrary combination of estimates that estimate different things. The weights in the combination have no substantive interpretation.

i.e. it's an optimal answer to a meaningless question.

Summary of the relationships among 4 models:

Model                    Estimate of slope    Precision
Between Subjects         $\hat\beta_B$        $W_B$
Marginal (pooled data)   $\hat\beta_P$        $W_B + W_W$
Mixed Model              $\hat\beta_{MM}$     $W_B^{MM} + W_W$
Within Subjects          $\hat\beta_W$        $W_W$

115

The pooled estimate combines $\hat\beta_B$ and $\hat\beta_W$:

$$\hat\beta_P = (W_B + W_W)^{-1}(W_B \hat\beta_B + W_W \hat\beta_W)$$

Mixed model

With a random intercept model:

$$y_{it} = X_{it}\beta + u_{0i} + \varepsilon_{it}, \quad
u_{0i} \sim N(0, g_{00}), \quad \varepsilon_{it} \sim N(0, \sigma^2)$$

with $g_{00}$, $\sigma^2$ known, $\hat\beta_{MM}$ is also a weighted combination of $\hat\beta_B$ and $\hat\beta_W$ but with less weight on $\hat\beta_B$:

116

$$W_B^{MM} = W_B \times \frac{\sigma^2/T}{\sigma^2/T + g_{00}}$$

i.e. $W_B$ is shrunk by a factor that is a monotone function of the ratio of within-subject information ($T/\sigma^2$) to between-subject information ($1/g_{00}$).

$\hat\beta_{MM}$ is between $\hat\beta_W$ and $\hat\beta_P$, i.e. it does better than $\hat\beta_P$ in the sense of being closer to $\hat\beta_W$, but it is not equivalent to $\hat\beta_W$.

With balanced data, $\hat\beta_W = \hat\beta_{MM} = \hat\beta_P$.

As $\sigma^2/(T g_{00}) \to 0$, $\hat\beta_{MM} \to \hat\beta_W$, so a mixed model estimates the within effect asymptotically in T – which is the cluster size, NOT the number of clusters.
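The shrinkage factor and its two limits can be sketched numerically (hypothetical weights and slopes again):

```python
# Mixed-model slope with the between weight shrunk by
#   lam = (sigma2/T) / (sigma2/T + g00)
def beta_mm(W_B, W_W, beta_B, beta_W, sigma2, g00, T):
    lam = (sigma2 / T) / (sigma2 / T + g00)  # shrinkage factor in (0, 1]
    W_B_mm = lam * W_B
    return (W_B_mm * beta_B + W_W * beta_W) / (W_B_mm + W_W)

W_B, W_W, beta_B, beta_W = 4.0, 8.0, 1.0, -0.5

# sigma2/(T*g00) -> 0: beta_MM approaches beta_W, the within estimate
print(beta_mm(W_B, W_W, beta_B, beta_W, sigma2=1e-3, g00=0.5, T=4))

# sigma2/(T*g00) -> infinity: beta_MM approaches the pooled estimate
print(beta_mm(W_B, W_W, beta_B, beta_W, sigma2=1e6, g00=0.5, T=4))
```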

117

As $\sigma^2/(T g_{00}) \to \infty$, $\hat\beta_{MM} \to \hat\beta_P$. Thus the mixed model estimate fails to control for between-subject confounding factors. Note that this does not capture the whole story because $\hat\beta_W$ and $\hat\beta_B$ are not independent of $\hat g_{00}$. If $\hat g_{00} = 0$ then $W_B^{MM} = W_B$, so that $\hat\beta_{MM}$ is the $(W_B, W_W)$-weighted combination of $\hat\beta_B$ and $\hat\beta_W$, i.e. $\hat\beta_P$.

118

A serious problem? A simulation

1,000 simulations showing mixed model estimates of the slope, using the same configuration of Xs with $\beta_W = -1/2$ and $\beta_B = 1$, keeping $g_{00} = 1/2$ and allowing $\sigma$ to vary from 0.005 to 5.

[Figure: $\hat\beta_1$ plotted against $\sigma$, ranging from $-0.5$ to $1.0$]

119

What happened?

As $\sigma$ gets larger, the relatively small value of $g_{00}$ is harder to identify, and both sources of variability (within-subject and between-subject) are attributed to $\sigma$.

[Figure: $\hat\sigma$ plotted against $\sigma$]

120

The blue line is the diagonal $\hat\sigma = \sigma$ and the equation of the red line is $\hat\sigma = \sigma + 1$.

When $\hat g_{00} = 0$, the between-subject relationship is treated as if it has very high precision and it dominates in forming the mixed model estimate.

[Figure: $\hat\sigma$ against $\sigma$ with the blue diagonal and the red line]

121

Splitting age into two variables

Since age has a within-subject effect that is inconsistent with its between-subject effect, we can split it into two variables:

1. Between-subject 'contextual predictor': e.g. age.mean of each subject (or the starting age), and
2. within-subject predictor:
   a. age itself, or
   b. the within-subject residual: age.resid = age – age.mean

So we model:

$$E(y_{it}) = \beta_0 + \beta_{\text{age.mean}}\,\text{age.mean}_i + \beta_{\text{age}}\,\text{age}_{it}$$

or

122

$$E(y_{it}) = \beta_0^* + \beta^*_{\text{age.mean}}\,\text{age.mean}_i + \beta^*_{\text{age.diff}}\,\text{age.diff}_{it}$$

Surprisingly,

$$\beta^*_{\text{age.diff}} = \beta_{\text{age}}$$

but

$$\beta^*_{\text{age.mean}} = \beta_{\text{age.mean}} + \beta^*_{\text{age.diff}}
= \beta_{\text{age.mean}} + \beta_{\text{age}} \neq \beta_{\text{age.mean}}$$
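This follows from substituting age = age.mean + age.diff into the first parametrization; a quick check with hypothetical coefficient values:

```python
# Reparametrizing E(y) = b0 + bm*age_mean + ba*age in terms of
# (age_mean, age_diff), where age = age_mean + age_diff, gives
# E(y) = b0 + (bm + ba)*age_mean + ba*age_diff.
b0, bm, ba = 1.0, 2.0, -0.5      # hypothetical coefficients

def mean_y(age_mean, age_diff):
    age = age_mean + age_diff
    return b0 + bm * age_mean + ba * age

def mean_y_star(age_mean, age_diff):
    bm_star = bm + ba            # beta*_age.mean = beta_age.mean + beta_age
    bd_star = ba                 # beta*_age.diff = beta_age
    return b0 + bm_star * age_mean + bd_star * age_diff

# the two parametrizations agree at every (age_mean, age_diff)
for m in (9.0, 11.0, 36.0):
    for d in (-3.0, 0.0, 2.0):
        assert abs(mean_y(m, d) - mean_y_star(m, d)) < 1e-12
```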

123

[Slides 123–129: a sequence of figures in the (age, y) plane showing a subject's data and, with arrows, the effect of changing age with age.mean fixed ($\beta_{\text{age}} = \beta^*_{\text{age.diff}}$) and the effect of changing age.mean]

130

$$\beta^*_{\text{age.mean}} = \beta_{\text{age.mean}} + \beta_{\text{age}}$$

$\beta^*_{\text{age.mean}}$ keeps age.diff constant; $\beta_{\text{age.mean}}$ keeps age constant.

[Figure: the directions corresponding to $\beta_{\text{age}} = \beta^*_{\text{age.diff}}$, $\beta_{\text{age.mean}}$, and $\beta^*_{\text{age.mean}}$ in the (age, y) plane]

Compositional effect = Contextual effect + Within-subject effect

131

Using 'lme' with a contextual mean

> fit.contextual <- lme(

+ y ~ (age + cvar(age,Subject) ) * Sex,

+ du,

+ random = ~ 1 + age | Subject)

> summary(fit.contextual)

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

296.8729 323.1227 -138.4365

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 1.53161007 (Intr)

age 0.03287630 0.024

Residual 0.51263884

132

Fixed effects: y ~ (age + cvar(age, Subject)) * Sex

Value Std.Error DF t-value

(Intercept) -3.681624 1.6963039 79 -2.170380

age -0.493880 0.0343672 79 -14.370670

cvar(age, Subject) 1.628584 0.0695822 23 23.405165

SexFemale 6.000170 2.5050694 23 2.395211

age:SexFemale 0.060143 0.0538431 79 1.116996

cvar(age, Subject):SexFemale -0.313087 0.1266960 23 -2.471167

p-value

(Intercept) 0.0330

age 0.0000

cvar(age, Subject) 0.0000

SexFemale 0.0251

age:SexFemale 0.2674

cvar(age, Subject):SexFemale 0.0213

. . . . .

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-1.871139553 -0.502221634 -0.006447848 0.552360837 2.428148053

Number of Observations: 108

Number of Groups: 27

133

> fit.compositional <- lme( y ~ (dvar(age,Subject) +

+ cvar(age,Subject) ) * Sex, du,

+ random = ~ 1 + age | Subject)

> summary(fit.compositional)

Linear mixed-effects model fit by REML

Data: du

AIC BIC logLik

296.8729 323.1227 -138.4365

Random effects:

Formula: ~1 + age | Subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 1.53161006 (Intr)

age 0.03287629 0.024

Residual 0.51263884

Fixed effects: y ~ (dvar(age, Subject) + cvar(age, Subject)) * Sex

Value Std.Error DF t-value

(Intercept) -3.681624 1.6963039 79 -2.170380

dvar(age, Subject) -0.493880 0.0343672 79 -14.370670

cvar(age, Subject) 1.134704 0.0616092 23 18.417778

SexFemale 6.000170 2.5050694 23 2.395211

dvar(age, Subject):SexFemale 0.060143 0.0538431 79 1.116996

cvar(age, Subject):SexFemale -0.252945 0.1161225 23 -2.178257

134

p-value

(Intercept) 0.0330

dvar(age, Subject) 0.0000

cvar(age, Subject) 0.0000

SexFemale 0.0251

dvar(age, Subject):SexFemale 0.2674

cvar(age, Subject):SexFemale 0.0399

. . . . .

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-1.871139550 -0.502221640 -0.006447847 0.552360836 2.428148063

Number of Observations: 108

Number of Groups: 27

135

Simulation Revisited

1,000 simulations using the same models as the earlier simulation, i.e. the same configuration of Xs with $\beta_W = -1/2$ and $\beta_B = 1$, keeping $g_{00} = 1/2$ and allowing $\sigma$ to vary from 0.005 to 5.

[Figure: estimates against $\sigma$, with reference lines at $\hat\beta_W = -1/2$ and $\hat\beta_B = 1$]

136

Here a mixed model is used with mean age by subject and the within-subject residual of age from mean age.

[Figure: estimates against $\sigma$, with reference lines at $\hat\beta_W = -1/2$ and $\hat\beta_B = 1$]

137

Including the contextual variable gives better estimates of variance components. The estimate of $\sigma$ does not eventually include $g_{00}$.

[Figure: $\hat\sigma$ against $\sigma$, close to the diagonal]

138

Power

The best way to carry out power calculations is to simulate. You end

up learning about a lot more than power.
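As a toy illustration of the simulate-and-count approach (a simple two-group z-test, not one of the mixed-model designs in these notes; all design values hypothetical):

```python
import random

# Monte Carlo power sketch: two groups of n, true mean difference delta,
# SD 1, two-sided z-test at alpha = 0.05 (reject when |z| > 1.96).
def power(n, delta, nsim=2000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(nsim):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(delta, 1.0) for _ in range(n)]
        se = (2.0 / n) ** 0.5
        z = (sum(b) / n - sum(a) / n) / se
        if abs(z) > 1.96:
            hits += 1
    return hits / nsim

print(power(n=50, delta=0.5))   # roughly 0.7 for this design
```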

Nevertheless, Stephen Raudenbush and colleagues have a nice

graphical package available at Optimal Design Software.

139

Some links

There is a very good current bibliography as well as many other

resources at the UCLA Academic Technology Services site. Start

your visit at

http://www.ats.ucla.edu/stat/sas/topics/repeated_measures.htm

Another important site is the Centre for Multilevel Modeling,

currently at the University of Bristol:

http://www.cmm.bristol.ac.uk/learning-training/multilevel-m-support/news.shtml

140

A few books

Pinheiro, Jose C. and Bates, Douglas M. (2000) Mixed-Effects

Models in S and S-PLUS. Springer

Fitzmaurice, Garrett M., Laird, Nan M., Ware, James H. (2004)

Applied Longitudinal Analysis, Wiley.

Allison, Paul D. (2005) Fixed Effects Regression Methods for

Longitudinal Data Using SAS, SAS Institute.

Littell, Ramon C. et al. (2006) SAS for Mixed Models (2nd ed.),

SAS Institute.

Singer, Judith D. and Willett, John B. (2003) Applied Longitudinal

Data Analysis : Modeling Change and Event Occurrence. Oxford

University Press.

141

Appendix: Reinterpreting weights

The mixed model estimate using a random intercept model can be seen either as a weighted combination of $\hat\beta_B$ and $\hat\beta_W$ or of $\hat\beta_P$ and $\hat\beta_W$:

$$
\begin{aligned}
\hat\beta_{MM}
&= (W_B^{MM} + W_W)^{-1}(W_B^{MM}\hat\beta_B + W_W\hat\beta_W) \\
&= \left(\frac{\sigma^2/T}{\sigma^2/T + g_{00}} W_B + W_W\right)^{-1}
   \left(\frac{\sigma^2/T}{\sigma^2/T + g_{00}} W_B \hat\beta_B + W_W \hat\beta_W\right) \\
&= \left(W_B + \left(1 + \frac{g_{00}T}{\sigma^2}\right) W_W\right)^{-1}
   \left(W_B \hat\beta_B + \left(1 + \frac{g_{00}T}{\sigma^2}\right) W_W \hat\beta_W\right) \\
&= \left((W_B + W_W) + \frac{g_{00}T}{\sigma^2} W_W\right)^{-1}
   \left((W_B + W_W)\hat\beta_P + \frac{g_{00}T}{\sigma^2} W_W \hat\beta_W\right)
\end{aligned}
$$
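A quick numerical check that the two forms of the combination agree (hypothetical values):

```python
# Verify: the (W_B^MM, W_W) combination of (beta_B, beta_W) equals the
# (beta_P, beta_W) combination derived above.
W_B, W_W = 4.0, 8.0
beta_B, beta_W = 1.0, -0.5
sigma2, g00, T = 2.0, 0.5, 4.0

lam = (sigma2 / T) / (sigma2 / T + g00)   # shrinkage factor
beta_P = (W_B * beta_B + W_W * beta_W) / (W_B + W_W)

form1 = (lam * W_B * beta_B + W_W * beta_W) / (lam * W_B + W_W)

c = g00 * T / sigma2
form2 = ((W_B + W_W) * beta_P + c * W_W * beta_W) / ((W_B + W_W) + c * W_W)

assert abs(form1 - form2) < 1e-12
print(form1)
```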
