Linear mixed effect models
Dr. Linh Nghiem
STAT3022
Fixed effect and random effect
• A factor is said to have a fixed effect if the levels of the factor are specifically selected by the experimenter, and we are usually interested in differences in responses among those specific levels.
• In contrast, a factor is said to have a random effect if the levels are meant to be representative of a general population of possible levels.
Examples
• Y is the rating of job applicants.
• Five personnel officers were selected at random to be
interviewers, and four prospective employee candidates were
assigned at random to each selected officer.
• We are not interested in the differences between the five
interviewers that happened to be picked; rather, we want to
quantify and account for the effect of interviewers in general,
so interviewer is a random effect factor.
These selected interviewers are representatives of a general
“population” of interviewers.
Examples
• A traffic engineer is interested in comparing the total unused
red-light time (Y ) for five different traffic-light signal
sequences.
• The experiment was conducted with a Latin-square design in which the two blocking factors are (a) five randomly selected intersections and (b) five specific time periods.
• Classify:
Signal sequences: fixed effect.
Intersections: random effect
Time periods: fixed effect.
Fixed and random effects
• In some situations it is clear from the experiment whether an effect is fixed or random. However, there are also situations in which calling an effect fixed or random depends on your point of view, and on your interpretation and understanding.
• With fixed effects we are primarily interested in the means of the factor levels (and differences between them). With random effects, we are primarily interested in their variances.
Variance component estimation
One-way random effect model
Let $y_{ij}$ denote the $j$th observation on factor level $i$.
• Fixed effect model (so far):
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2).$$
• Random effect model:
$$y_{ij} = \mu + a_i + \varepsilon_{ij}, \quad a_i \overset{iid}{\sim} N(0, \sigma_a^2), \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),$$
with $a_i$ independent of $\varepsilon_{ij}$.
• For the fixed effect model, we have only one random variable to explain $y_{ij}$, but for the random effect model, we have two random variables.
One-way random effect model
$$y_{ij} = \mu + a_i + \varepsilon_{ij}, \quad a_i \overset{iid}{\sim} N(0, \sigma_a^2), \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),$$
with $a_i$ independent of $\varepsilon_{ij}$.
• There are three parameters: $\mu$, $\sigma_a^2$, and $\sigma^2$.
• Note that $a_i$ is a random variable, so we can only predict a realization of it.
• In general, we will use Greek letters ($\alpha$, $\beta$, etc.) to denote fixed effects, and regular Roman letters ($a$, $b$, etc.) to denote random effects.
Intraclass correlation coefficient
$$y_{ij} = \mu + a_i + \varepsilon_{ij}, \quad a_i \overset{iid}{\sim} N(0, \sigma_a^2), \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2),$$
with $a_i$ independent of $\varepsilon_{ij}$. For $j \neq k$, we have
$$\mathrm{Cov}(y_{ij}, y_{ik}) = \mathrm{Cov}(\mu + a_i + \varepsilon_{ij},\ \mu + a_i + \varepsilon_{ik}) = \mathrm{Var}(a_i) = \sigma_a^2,$$
$$\mathrm{Var}(y_{ij}) = \mathrm{Var}(\mu + a_i + \varepsilon_{ij}) = \mathrm{Var}(a_i + \varepsilon_{ij}) = \sigma_a^2 + \sigma^2,$$
and the intraclass correlation coefficient is defined to be
$$\rho = \mathrm{Cor}(y_{ij}, y_{ik}) = \frac{\mathrm{Cov}(y_{ij}, y_{ik})}{\sqrt{\mathrm{Var}(y_{ij})\,\mathrm{Var}(y_{ik})}} = \frac{\sigma_a^2}{\sigma_a^2 + \sigma^2}.$$
Partitioning total variability
As in the one-way ANOVA model with a fixed factor, the partitioning of total variability is unchanged.

Source of variation   df      SS        MS    F
Factor                $t-1$   SSR       MSR   MSR/MSE
Residuals             $n-t$   SSE       MSE
Total                 $n-1$   SSTotal

with $SSE = \sum_{i=1}^{t}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{i\bullet})^2$, $SSR = \sum_{i=1}^{t} n_i(\bar{y}_{i\bullet} - \bar{y}_{\bullet\bullet})^2$, and $SSTotal = \sum_{i=1}^{t}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_{\bullet\bullet})^2$. Each mean square is the ratio of the corresponding sum of squares to its degrees of freedom.
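As a small illustration, this table can be obtained in R with aov; this is a sketch only, since the data frame interview (with columns score and officer) is hypothetical:

  # hypothetical data frame `interview`: 20 rows, with `score` (the rating)
  # and `officer` (a factor with t = 5 levels and r = 4 candidates each)
  fit <- aov(score ~ officer, data = interview)
  summary(fit)   # df, SS, MS, and F = MSR/MSE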
Method-of-moments estimates for variance components
The most important change from fixed effects to random effects is in the expected mean squares, particularly the expected value of MSR.
• With both fixed and random effects: $E(MSE) = \sigma^2$.
• With a fixed effect: $E(MSR) = \sigma^2 + Q(\alpha_1, \ldots, \alpha_t)$, where $Q(\alpha_1, \ldots, \alpha_t)$ denotes a function of $\alpha_1, \ldots, \alpha_t$.
• With a random effect: assuming each level has $r$ replicates, it can be proven that
$$E(MSR) = \sigma^2 + r\sigma_a^2.$$
Method-of-moments estimates for variance components
The method-of-moments estimates for $\sigma^2$ and $\sigma_a^2$ when each level has $r$ replicates are therefore
$$\hat{\sigma}^2 = MSE, \qquad \hat{\sigma}_a^2 = \frac{MSR - MSE}{r}.$$
These estimates are unbiased, but $\hat{\sigma}_a^2$ is not guaranteed to be positive!
• Maximum likelihood estimators will resolve this issue (we will talk about them later).
It is also possible to derive confidence intervals for these variance components (self-study).
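Continuing the hypothetical aov fit above, a minimal sketch of the method-of-moments computation in R:

  ms  <- summary(fit)[[1]][["Mean Sq"]]   # c(MSR, MSE)
  r   <- 4                                # replicates per level (assumed)
  s2  <- ms[2]                            # sigma^2-hat = MSE
  s2a <- (ms[1] - ms[2]) / r              # sigma_a^2-hat; may be negative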
Hypothesis test for random effect component
• Our null hypothesis is that there is no effect of factor A. Under the random effect model, it corresponds to:
$$H_0: \sigma_a^2 = 0, \qquad H_a: \sigma_a^2 > 0.$$
• In other words, we test whether the differences in factor-level means can be explained by the random noise alone.
• Under $H_0$, $E(MSR) = E(MSE) = \sigma^2$, so the test statistic is
$$F = \frac{MSR}{MSE} \sim F_{t-1,\,n-t} \quad \text{under } H_0,$$
and we reject $H_0$ when p-value $= P(F_{t-1,\,n-t} > F) \leq \alpha$.
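In R, the p-value can be computed with pf; for the hypothetical interview data, $t = 5$ and $n = 20$:

  Fstat <- ms[1] / ms[2]                                     # MSR / MSE
  pf(Fstat, df1 = 5 - 1, df2 = 20 - 5, lower.tail = FALSE)   # p-value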
Estimation of µ
Since $E(y_{ij}) = \mu$, an unbiased estimator for $\mu$ is $\hat{\mu} = \bar{y}_{\bullet\bullet}$. The variance of this estimator is
$$\mathrm{Var}(\hat{\mu}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{t}\sum_{j=1}^{r} y_{ij}\right) = \frac{t}{n^2}\,\mathrm{Var}\left(\sum_{j=1}^{r} y_{ij}\right)$$
$$= \frac{t}{n^2}\left(\sum_{j=1}^{r}\mathrm{Var}(y_{ij}) + 2\sum_{j=1}^{r}\sum_{k>j}\mathrm{Cov}(y_{ij}, y_{ik})\right)$$
$$= \frac{t}{n^2}\left(r(\sigma_a^2 + \sigma^2) + r(r-1)\sigma_a^2\right) = \frac{1}{n}\left(\sigma^2 + r\sigma_a^2\right) = \frac{E(MSR)}{n},$$
so the standard error is $se(\hat{\mu}) = \sqrt{n^{-1}MSR}$. The confidence interval for $\mu$ will therefore be
$$\hat{\mu} \pm t_{\alpha/2,\,t-1}\,se(\hat{\mu}).$$
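A sketch of this interval in R, again for the hypothetical interview data ($n = 20$, $t = 5$, 95% level):

  mu.hat <- mean(interview$score)                    # grand mean
  se.mu  <- sqrt(ms[1] / 20)                         # sqrt(MSR / n)
  mu.hat + c(-1, 1) * qt(0.975, df = 5 - 1) * se.mu  # 95% CI for mu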
Interview score example
[Figure: side-by-side boxplots of interview scores (Score, roughly 50 to 90) for Officers 1 to 5.]
Method-of-moments estimates for the variance components:
$$\hat{\sigma}^2 = 73.28, \qquad \hat{\sigma}_a^2 = \frac{394.92 - 73.28}{4} = 80.41, \qquad \hat{\rho} = \frac{\hat{\sigma}_a^2}{\hat{\sigma}^2 + \hat{\sigma}_a^2} = 0.5232,$$
and the F-test is significant, showing $\sigma_a^2$ is significantly different from zero. The ICC shows about half the variance in score is explained by officer.
Interview score example
The point estimate for the mean score is $\hat{\mu} = \bar{y}_{\bullet\bullet} = 71.45$, and the 95% confidence interval is
$$71.45 \pm t_{0.025,\,4}\sqrt{\frac{394.92}{20}} = (59.11,\ 83.78).$$
Note that if officer were a fixed factor, then the variance of $\bar{y}_{\bullet\bullet}$ would equal $\sigma^2/n$, so the standard error for the grand mean is bigger when the factor is treated as random.
Two random factor models
We can extend the idea to more than one random factor. For two factors A and B, the model is
$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk},$$
$$a_i \overset{iid}{\sim} N(0, \sigma_a^2), \quad b_j \overset{iid}{\sim} N(0, \sigma_b^2), \quad c_{ij} \overset{iid}{\sim} N(0, \sigma_{ab}^2), \quad \varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2),$$
where $a_i$ and $b_j$ represent the main effects of factors A and B respectively, and $c_{ij}$ represents the interaction effect.
For simplicity, we only consider the case where we have $r \geq 2$ replicates for each factor-level combination of A and B.
Expected mean squares
The partitioning of total variability stays the same as in the fixed factor case; only the expected mean squares change.

Source        df              SS      MS      E(MS)
Factor A      $a-1$           SSA     MSA     $\sigma^2 + r\sigma_{ab}^2 + rb\sigma_a^2$
Factor B      $b-1$           SSB     MSB     $\sigma^2 + r\sigma_{ab}^2 + ra\sigma_b^2$
Interaction   $(a-1)(b-1)$    SSAB    MSAB    $\sigma^2 + r\sigma_{ab}^2$
Residuals     $ab(r-1)$       SSE     MSE     $\sigma^2$
Total         $abr-1$         SSTo

The data are balanced, so the order of A and B does not matter. We can use these expected mean squares to estimate the variance components as well, but these estimates are not guaranteed to be positive.
Hypothesis testing for variance components
• Testing for the interaction, $H_0: \sigma_{ab}^2 = 0$ versus $H_a: \sigma_{ab}^2 > 0$, is similar to testing the interaction in the fixed factor case. When $H_0$ is true, $E(MSAB) = E(MSE) = \sigma^2$, so the test statistic is
$$F = \frac{MSAB}{MSE} \sim F_{(a-1)(b-1),\,ab(r-1)} \quad \text{under } H_0.$$
• Testing for the main effect, $H_0: \sigma_a^2 = 0$ versus $H_a: \sigma_a^2 > 0$, changes. When $H_0$ is true, $E(MSA) = \sigma^2 + r\sigma_{ab}^2 = E(MSAB)$, so the test statistic is
$$F = \frac{MSA}{MSAB} \sim F_{(a-1),\,(a-1)(b-1)} \quad \text{under } H_0.$$
A similar procedure applies for testing $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$.
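Because standard ANOVA output divides every mean square by MSE, the main-effect test against MSAB is usually computed by hand. A sketch in R, where msA and msAB are the relevant mean squares and a, b, r are the design dimensions (all assumed to be available):

  F.A <- msA / msAB
  pf(F.A, df1 = a - 1, df2 = (a - 1) * (b - 1), lower.tail = FALSE)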
R&R Example
• A common industrial application is to use a designed experiment to study the components of variability in a measurement system; such experiments are called gauge repeatability and reproducibility (R&R) studies.
Gauge repeatability is the ability of a single operator to obtain the same measurement value multiple times using the same measuring instrument (or gauge) on the same feature of a single manufactured component (or part).
Gauge reproducibility is the ability of different operators to obtain the same measured value multiple times using the same gauge on the same part.
• First, a sample of 10 parts was selected. Next, a random or
representative sample of three inspectors is selected. Finally,
each inspector measures each part twice in a random order.
R&R example
Statistical model:
$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk},$$
$$a_i \overset{iid}{\sim} N(0, \sigma_a^2), \quad b_j \overset{iid}{\sim} N(0, \sigma_b^2), \quad c_{ij} \overset{iid}{\sim} N(0, \sigma_{ab}^2), \quad \varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2),$$
where $y_{ijk}$ is the $k$th measurement ($k = 1, 2$) made by the $j$th inspector ($j = 1, 2, 3$) on the $i$th part ($i = 1, \ldots, 10$); $a_i$ represents the part effect, $b_j$ the inspector (operator) effect, and $c_{ij}$ the interaction effect.
R&R example
Estimation of the variance components:
$$\hat{\sigma}^2 = MSE = 0.00075$$
$$\hat{\sigma}^2 + 2\hat{\sigma}_{ab}^2 = MSAB = 0.027$$
$$\hat{\sigma}^2 + 2\hat{\sigma}_{ab}^2 + 20\hat{\sigma}_b^2 = MSB = 0.014$$
$$\hat{\sigma}^2 + 2\hat{\sigma}_{ab}^2 + 6\hat{\sigma}_a^2 = MSA = 0.1610$$
which gives $\hat{\sigma}_{ab}^2 = 0.013$, $\hat{\sigma}_b^2 = -0.0006$, and $\hat{\sigma}_a^2 = 0.022$.
R&R example
From the ANOVA table, the F-test corresponding to the interaction
term is significant, so we can conclude the variability of the system
is significantly driven by the interaction between operator and part.
R&R example
• Note that if one wants to test for the main effect of part and/or operator, the F-statistic given by the above ANOVA table for each main effect term is not valid (the denominator of these statistics is MSE, not MSAB).
• Instead, if we want to test $H_0: \sigma_b^2 = 0$ versus $H_a: \sigma_b^2 > 0$, the test statistic would be
$$F = \frac{MSB}{MSAB} = \frac{0.01485}{0.02689} = 0.552,$$
and the p-value is $P(F_{2,18} > 0.552) = 0.585$, so we fail to reject $H_0$.
Linear mixed effect models
Model description
When we have both fixed effect and random effect factors, we
typically use the linear mixed effect model to study both the mean
of the fixed factors and the (co)variance of the random factors.
The most general form of the linear mixed model is
$$y = X\beta + Zu + \varepsilon, \quad \text{where} \quad \begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}\right),$$
• $y$ is an $n \times 1$ vector of responses,
• $X$ is the $n \times p$ design matrix for the fixed effects $\beta$,
• $Z$ is the $n \times q$ design matrix for the random effects $u$,
• $\varepsilon$ is the $n \times 1$ vector of errors.
Model description
$$y = X\beta + Zu + \varepsilon, \quad \text{where} \quad \begin{bmatrix} u \\ \varepsilon \end{bmatrix} \sim N\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} G & 0 \\ 0 & R \end{bmatrix}\right),$$
• This model implies $u \sim N(0, G)$, $\varepsilon \sim N(0, R)$, and $u$ and $\varepsilon$ are independent. As a result,
$$E(y) = X\beta, \qquad \mathrm{Var}(y) = \mathrm{Var}(Zu + \varepsilon) = ZGZ^\top + R := V.$$
• The model can also be written in the hierarchical form:
$$y \mid u \sim N(X\beta + Zu,\ R), \qquad u \sim N(0, G).$$
Example: One fixed effect and one random effect
$$y_{ij} = \mu + \alpha_i + b_j + \varepsilon_{ij}, \quad b_j \overset{iid}{\sim} N(0, \sigma_b^2), \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2).$$
For simplicity, let's assume each factor has only 2 levels, and each factor-level combination has one replicate.
$$y = \begin{bmatrix} y_{11} \\ y_{12} \\ y_{21} \\ y_{22} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}, \quad \beta = \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix},$$
$$Z = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad u = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \quad \varepsilon = \begin{bmatrix} \varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{21} \\ \varepsilon_{22} \end{bmatrix}.$$
Example: One fixed effect and one random effect
$$y_{ij} = \mu + \alpha_i + b_j + \varepsilon_{ij}, \quad b_j \overset{iid}{\sim} N(0, \sigma_b^2), \quad \varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2).$$
$$u \sim N_2(0,\ \sigma_b^2 I_2), \quad G = \sigma_b^2 I_2,$$
$$\varepsilon \sim N_4(0,\ \sigma^2 I_4), \quad R = \sigma^2 I_4,$$
$$V = ZGZ^\top + R = \sigma_b^2 ZZ^\top + \sigma^2 I_4 = \begin{bmatrix} \sigma^2 + \sigma_b^2 & 0 & \sigma_b^2 & 0 \\ 0 & \sigma^2 + \sigma_b^2 & 0 & \sigma_b^2 \\ \sigma_b^2 & 0 & \sigma^2 + \sigma_b^2 & 0 \\ 0 & \sigma_b^2 & 0 & \sigma^2 + \sigma_b^2 \end{bmatrix},$$
which is expected: observations sharing the same random-effect level have covariance $\sigma_b^2$, as in the intraclass calculation earlier.
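As a numerical sanity check, a minimal R sketch of this covariance construction (the values of sigma2 and sigma2b are arbitrary assumptions):

  Z <- rbind(c(1, 0), c(0, 1), c(1, 0), c(0, 1))
  sigma2  <- 1.0                      # error variance (assumed)
  sigma2b <- 0.5                      # random-effect variance (assumed)
  G <- sigma2b * diag(2)
  R <- sigma2  * diag(4)
  V <- Z %*% G %*% t(Z) + R           # matches the 4 x 4 matrix above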
Example: One fixed effect and one random effect
If we have replication for each factor-level combination, then an interaction can exist between the fixed and random factors. The interaction itself is a random effect.
$$y_{ijk} = \mu + \alpha_i + b_j + c_{ij} + \varepsilon_{ijk}, \quad i, j, k = 1, 2.$$
$$y = (y_{111}, y_{112}, y_{121}, \ldots, y_{222})^\top, \quad \varepsilon = (\varepsilon_{111}, \varepsilon_{112}, \ldots, \varepsilon_{222})^\top,$$
$$\beta = (\mu, \alpha_1, \alpha_2)^\top, \quad u = (b_1, b_2, c_{11}, c_{12}, c_{21}, c_{22})^\top,$$
$$X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ \vdots & & & & & \vdots \\ 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
One fixed effect and one random effect
We assume $b_j \overset{iid}{\sim} N(0, \sigma_b^2)$ and $c_{ij} \overset{iid}{\sim} N(0, \sigma_{ab}^2)$, with $b_j$ and $c_{ij}$ independent of each other. As a result, the covariance matrix $G$ of the random effects $u$ has a block diagonal structure:
$$G = \begin{bmatrix} \sigma_b^2 I_2 & 0_{2\times 4} \\ 0_{4\times 2} & \sigma_{ab}^2 I_4 \end{bmatrix}.$$
We also assume $\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$, i.e. $R = \mathrm{Var}(\varepsilon) = \sigma^2 I_8$.
Estimating β when V is known
When $V$ is known, we estimate $\beta$ based on the marginal model
$$y \sim N(X\beta, V).$$
Recall that for the linear regression model, we assume $y \sim N(X\beta, \sigma^2 I_n)$ and use OLS to estimate $\beta$. The current model only has a different covariance structure, but if we let $\tilde{y} = V^{-1/2}y$ and $\tilde{X} = V^{-1/2}X$, then we would have
$$\tilde{y} \sim N(\tilde{X}\beta,\ I_n).$$
Hence, we can use OLS on $(\tilde{X}, \tilde{y})$ to obtain the generalized least squares (GLS) estimator of $\beta$. For example, if $X$ has full rank, then
$$\hat{\beta} = \left(\tilde{X}^\top \tilde{X}\right)^{-1} \tilde{X}^\top \tilde{y} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y.$$
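A minimal sketch of the GLS computation in R, assuming X, V, and y have already been built as in the example above:

  # beta-hat = (X' V^{-1} X)^{-1} X' V^{-1} y, using solve() rather than
  # an explicit inverse for numerical stability
  Vinv.X   <- solve(V, X)                              # V^{-1} X
  beta.hat <- solve(t(X) %*% Vinv.X, t(Vinv.X) %*% y)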
Estimating variance components
• $V = ZGZ^\top + R$ is rarely known in practice, since $G$ and $R$ are unknown and they are usually the main objects of interest in the LMM.
• Recall that for a $p$-variate normal random vector $x$ with mean $\mu$ and covariance matrix $\Sigma$, the log likelihood function is
$$\ell(\mu, \Sigma) = -\frac{1}{2}\left\{\log|\Sigma| + (x - \mu)^\top \Sigma^{-1} (x - \mu)\right\} + c,$$
where $c$ is a constant not depending on $\mu$ and $\Sigma$. The maximum likelihood estimator is found by maximizing this log likelihood function.
Estimating variance components: ML estimation
Recall that $y \sim N(X\beta, V)$, so the log likelihood function is
$$\ell(\beta, V) = -\frac{1}{2}\left\{\log|V| + (y - X\beta)^\top V^{-1} (y - X\beta)\right\} + c.$$
This is a function of both $\beta$ and $V$. However, if we know $V$, then we use $\hat{\beta}$ to estimate $\beta$, and hence we can replace $\beta$ with $\hat{\beta}$ to obtain a profile likelihood for $V$:
$$\ell_p(V) = -\frac{1}{2}\left\{\log|V| + (y - X\hat{\beta})^\top V^{-1} (y - X\hat{\beta})\right\} + c,$$
with $\hat{\beta} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y$. Since $V$ depends on $G$ and $R$, we can write $\ell_p(V) = \ell_p(G, R)$, and the ML estimators for $G$ and $R$ are found by maximizing this function (no closed form solution is available).
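A hedged sketch of this profile log likelihood in R, for the one random factor case where $V = \sigma_b^2 ZZ^\top + \sigma^2 I_n$ (the log-scale parameterization and the optim call are illustrative assumptions, not the algorithm any particular package uses):

  profile.loglik <- function(theta, y, X, Z) {
    sigma2   <- exp(theta[1])          # log scale keeps variances positive
    sigma2b  <- exp(theta[2])
    V        <- sigma2b * Z %*% t(Z) + sigma2 * diag(length(y))
    Vinv.X   <- solve(V, X)
    beta.hat <- solve(t(X) %*% Vinv.X, t(Vinv.X) %*% y)  # GLS given V
    res      <- y - X %*% beta.hat
    as.numeric(-0.5 * (determinant(V)$modulus + t(res) %*% solve(V, res)))
  }
  # maximize, e.g.:
  # optim(c(0, 0), profile.loglik, y = y, X = X, Z = Z,
  #       control = list(fnscale = -1))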
Estimating variance components: Restricted ML
• For ML estimation, estimating the variance components depends upon $\beta$, which is arguably somewhat undesirable.
• Restricted maximum likelihood (REML) estimation: instead of maximizing the log likelihood of $y$, we maximize the log likelihood of $Ky$, where $K$ is a full-rank matrix such that $KX = 0$.
Hence, $E(Ky) = E(KX\beta) = 0$ and $\mathrm{Var}(Ky) = KVK^\top$, so $Ky \sim N(0, KVK^\top)$ and the log likelihood based on $Ky$ does not depend on $\beta$:
$$\ell_r(V) = -\frac{1}{2}\left\{\log|KVK^\top| + (Ky)^\top \left(KVK^\top\right)^{-1} (Ky)\right\} + c.$$
Estimating variance components: Restricted ML
• There are infinitely many choices of such a matrix $K$. Assuming $X$ has full rank, a common choice of $K$ is
$$K = I - H, \qquad H = X\left(X^\top X\right)^{-1} X^\top.$$
• It can be proven that with this choice of $K$, we can write
$$\ell_r(V) = \ell_p(V) - \frac{1}{2}\log|X^\top V^{-1} X|.$$
• Again $\ell_r(V) = \ell_r(G, R)$, and the REML estimators for $G$ and $R$ are found by maximizing this function (no closed form solution is available either).
Estimating variance components
In the most general form, $\ell_p$ and $\ell_r$ are functions of the matrices $G$ and $R$, but in specific problems they can usually be reduced to functions of scalars.
• Typically, $R = \sigma^2 I_n$.
• In the example of the model with one fixed and one random factor without interaction, $G = \sigma_b^2 I_q$, so these likelihood functions are functions of $\sigma^2$ and $\sigma_b^2$ only.
• As a result, the optimization problem for both ML and REML is usually not too expensive.
• Finally, REML is typically the default method for estimating variance components for LMMs in statistical software.
Revisiting R&R example
In the R&R example above, we have 2 random factors (parts and operators) without any fixed factor. Previously, we used the method of moments to estimate the variance components; now we will use ML and REML to estimate them.
$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk}$$
• Note that the fixed effect matrix $X$ now has only a column of ones (corresponding to the constant $\mu$).
• We will use the lmer function from the lme4 package. To include a factor as a random effect, we use the vertical bar (|).
Revisiting R&R example
Syntax for the lmer function:
• Formula: y ~ 1 + (1 | part) + (1 | oper) + (1 | part:oper),
• REML = TRUE (the default) uses REML; set REML = FALSE to use ML.
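Putting this together, a sketch of the fit (the data frame rr with columns y, part, and oper is hypothetical; the lmer syntax is as described above):

  library(lme4)
  fit.reml <- lmer(y ~ 1 + (1 | part) + (1 | oper) + (1 | part:oper),
                   data = rr)                  # REML = TRUE by default
  fit.ml   <- update(fit.reml, REML = FALSE)   # same model, fitted by ML
  VarCorr(fit.reml)                            # estimated variance components
  VarCorr(fit.ml)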
Revisiting R&R example
[R output: variance component estimates from lmer under REML and under ML.]
• For reference, the MOM estimates are $\hat{\sigma}_{ab}^2 = 0.013$, $\hat{\sigma}_b^2 = -0.0006$, and $\hat{\sigma}_a^2 = 0.022$.
• The REML estimates are typically closer to the MOM estimates, especially when the design is balanced, and REML is less biased than ML.
• For both ML and REML, the estimated variance components are guaranteed to be non-negative.
Example: disk drive service
• A service center for electronic equipment wants to study the
effects of make of disk drive (factor A) and technicians (B) on
the service time.
• Three makes of disk drives are considered, while the three
technicians are selected at random from a large number of
technicians who work at the company.
• Each technician performs the service on five disks of each
make.
Example: disk drive service
Statistical model (make of disk drive is a fixed factor; technician is a random factor):
$$y_{ijk} = \mu + \alpha_i + b_j + c_{ij} + \varepsilon_{ijk},$$
$$b_j \overset{iid}{\sim} N(0, \sigma_b^2), \quad c_{ij} \overset{iid}{\sim} N(0, \sigma_{ab}^2), \quad \varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2),$$
where $y_{ijk}$ is the service time for the $k$th disk ($k = 1, \ldots, 5$) of make $i$ ($i = 1, 2, 3$) serviced by technician $j$ ($j = 1, 2, 3$).
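A sketch of the corresponding lmer call (the data frame disk with columns time, make, and tech is hypothetical):

  library(lme4)
  fit <- lmer(time ~ make + (1 | tech) + (1 | make:tech), data = disk)
  summary(fit)   # fixed effects for make; variance components for tech
                 # and the make-by-tech interaction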
Example: disk drive service
[R output: REML and ML estimates of the variance components, and the ANOVA table for the fixed effect of make.]
Best linear unbiased prediction (BLUP)
Recall the hierarchical form of the LMM:
$$y \mid u \sim N(X\beta + Zu,\ R), \qquad u \sim N(0, G),$$
so instead of marginalizing over $u$, we can form the joint log likelihood based on the joint density $p(y, u) = p(y \mid u)\,p(u)$:
$$\ell = \log p(y, u) = \log p(y \mid u) + \log p(u)$$
$$= -\frac{1}{2}\log|R| - \frac{1}{2}(y - X\beta - Zu)^\top R^{-1} (y - X\beta - Zu) - \frac{1}{2}\log|G| - \frac{1}{2}u^\top G^{-1} u + \text{constant}.$$
Best linear unbiased prediction (BLUP)
Differentiating $\ell$ with respect to $\beta$ and $u$ gives the mixed model equations
$$\begin{bmatrix} X^\top R^{-1} X & X^\top R^{-1} Z \\ Z^\top R^{-1} X & Z^\top R^{-1} Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \tilde{u} \end{bmatrix} = \begin{bmatrix} X^\top R^{-1} y \\ Z^\top R^{-1} y \end{bmatrix},$$
and by solving these equations, we obtain
$$\hat{\beta} = \left(X^\top V^{-1} X\right)^{-1} X^\top V^{-1} y, \qquad \tilde{u} = GZ^\top V^{-1}\left(y - X\hat{\beta}\right).$$
Note that the estimator $\hat{\beta}$ is the same as before, while the second equation gives the best linear unbiased prediction (BLUP) of the random effects $u$.
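In lme4, these quantities can be extracted directly from a fitted model; a brief sketch (fit is any fitted lmer model, such as the hypothetical fits above):

  fixef(fit)    # estimated fixed effects beta-hat
  ranef(fit)    # BLUPs u-tilde for each random-effect level
  fitted(fit)   # X beta-hat + Z u-tilde for each observation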
Interview example
Random effect model: $y_{ij} = \mu + a_i + \varepsilon_{ij}$, $i = 1, \ldots, 5$.
Interview example
Comparing the fitted values for each officer when officer is treated as a fixed effect versus a random effect:
The grand mean of the scores is $\bar{y}_{\bullet\bullet} = 71.45$, and the BLUP tends to shrink the fitted value for each officer toward the grand mean.
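A sketch of this comparison in R (the hypothetical interview data again; fitted values are averaged within officer):

  fit.fixed  <- lm(score ~ officer, data = interview)             # fixed effect
  fit.random <- lmer(score ~ 1 + (1 | officer), data = interview) # random effect
  cbind(fixed  = tapply(fitted(fit.fixed),  interview$officer, mean),
        random = tapply(fitted(fit.random), interview$officer, mean))
  # the `random` column lies closer to the grand mean 71.45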