STA135 Lecture 2: Sample Mean Vector and Sample Covariance Matrix
Xiaodong Li UC Davis
1 Sample mean and sample covariance
Recall that in the 1-dimensional case, given a sample $x_1, \dots, x_n$, we define the sample mean
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i
\]
and the (unbiased) sample variance
\[
s^2 := \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2.
\]
p-dimensional case: Suppose we have $p$ variates $X_1, \dots, X_p$. For the vector of variates
\[
\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix},
\]
we have a $p$-variate sample of size $n$:
\[
\vec{x}_1, \dots, \vec{x}_n \in \mathbb{R}^p.
\]
This sample of $n$ observations gives the following data matrix:
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}. \tag{1.1}
\]
Notice that each column of the data matrix corresponds to a particular variate $X_j$.
Sample mean: For each variate $X_j$, define the sample mean
\[
\bar{x}_j = \frac{1}{n}\sum_{i=1}^n x_{ij}, \qquad j = 1, \dots, p.
\]
Then the sample mean vector is
\[
\bar{\vec{x}} := \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \frac{1}{n}\sum_{i=1}^n x_{i1} \\ \vdots \\ \frac{1}{n}\sum_{i=1}^n x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^n \begin{bmatrix} x_{i1} \\ \vdots \\ x_{ip} \end{bmatrix}
= \frac{1}{n}\sum_{i=1}^n \vec{x}_i.
\]
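As a quick numerical sketch (with a small made-up data matrix), the sample mean vector is just the vector of column means of $X$:

```python
import numpy as np

# A small illustrative sample: n = 4 observations of p = 3 variates.
X = np.array([[1.0, 2.0, 0.0],
              [3.0, 4.0, 1.0],
              [5.0, 6.0, 2.0],
              [7.0, 8.0, 3.0]])

# Column-wise means: xbar_j = (1/n) * sum_i x_ij
xbar = X.mean(axis=0)
print(xbar)  # [4.  5.  1.5]
```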
Sample covariance matrix: For each variate $X_j$, define its sample variance
\[
s_{jj} = s_j^2 := \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2, \qquad j = 1, \dots, p,
\]
and the sample covariance between $X_j$ and $X_k$
\[
s_{jk} = s_{kj} := \frac{1}{n-1}\sum_{i=1}^n (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k), \qquad 1 \le j, k \le p, \; j \ne k.
\]
The sample covariance matrix is defined as
\[
S = \begin{bmatrix}
s_{11} & s_{12} & \dots & s_{1p} \\
s_{21} & s_{22} & \dots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \dots & s_{pp}
\end{bmatrix}.
\]
Then
\[
\begin{aligned}
S &= \begin{bmatrix}
\frac{1}{n-1}\sum_{i=1}^n (x_{i1}-\bar{x}_1)^2 & \dots & \frac{1}{n-1}\sum_{i=1}^n (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
\frac{1}{n-1}\sum_{i=1}^n (x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \dots & \frac{1}{n-1}\sum_{i=1}^n (x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n \begin{bmatrix}
(x_{i1}-\bar{x}_1)^2 & \dots & (x_{i1}-\bar{x}_1)(x_{ip}-\bar{x}_p) \\
\vdots & \ddots & \vdots \\
(x_{ip}-\bar{x}_p)(x_{i1}-\bar{x}_1) & \dots & (x_{ip}-\bar{x}_p)^2
\end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n \begin{bmatrix} x_{i1}-\bar{x}_1 \\ \vdots \\ x_{ip}-\bar{x}_p \end{bmatrix}
\begin{bmatrix} x_{i1}-\bar{x}_1 & \dots & x_{ip}-\bar{x}_p \end{bmatrix} \\
&= \frac{1}{n-1}\sum_{i=1}^n \left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top.
\end{aligned}
\]
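The outer-product form of $S$ can be checked numerically; a minimal sketch on random data, compared against NumPy's `np.cov` (which uses the same $1/(n-1)$ convention by default):

```python
import numpy as np

# Illustrative random sample: n = 5 observations of p = 3 variates.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
n = X.shape[0]
xbar = X.mean(axis=0)

# Sum of outer products (x_i - xbar)(x_i - xbar)^T, divided by n - 1.
S = sum(np.outer(x - xbar, x - xbar) for x in X) / (n - 1)

# np.cov with rowvar=False treats rows as observations and uses 1/(n-1).
assert np.allclose(S, np.cov(X, rowvar=False))
```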
2 Linear transformation of observations
Consider a sample $\vec{x}_1, \dots, \vec{x}_n$ of size $n$ of $\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}$. The corresponding data matrix is represented as
\[
X = \begin{bmatrix}
x_{11} & x_{12} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \dots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{x}_1^\top \\ \vec{x}_2^\top \\ \vdots \\ \vec{x}_n^\top \end{bmatrix}.
\]
For some $C \in \mathbb{R}^{q \times p}$ and $\vec{d} \in \mathbb{R}^q$, consider the linear transformation
\[
\vec{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_q \end{bmatrix} = C\vec{X} + \vec{d}.
\]
Then we get a $q$-variate sample
\[
\vec{y}_i = C\vec{x}_i + \vec{d}, \qquad i = 1, \dots, n.
\]
The sample mean of $\vec{y}_1, \dots, \vec{y}_n$ is
\[
\bar{\vec{y}} = \frac{1}{n}\sum_{i=1}^n \vec{y}_i
= \frac{1}{n}\sum_{i=1}^n (C\vec{x}_i + \vec{d})
= C\left(\frac{1}{n}\sum_{i=1}^n \vec{x}_i\right) + \vec{d}
= C\bar{\vec{x}} + \vec{d}.
\]
And the sample covariance is
\[
\begin{aligned}
S_y &= \frac{1}{n-1}\sum_{i=1}^n \left(\vec{y}_i - \bar{\vec{y}}\right)\left(\vec{y}_i - \bar{\vec{y}}\right)^\top
= \frac{1}{n-1}\sum_{i=1}^n \left(C\vec{x}_i - C\bar{\vec{x}}\right)\left(C\vec{x}_i - C\bar{\vec{x}}\right)^\top \\
&= \frac{1}{n-1}\sum_{i=1}^n C\left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top C^\top
= C\left(\frac{1}{n-1}\sum_{i=1}^n \left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top\right) C^\top
= C S_x C^\top.
\end{aligned}
\]
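The identities $\bar{\vec{y}} = C\bar{\vec{x}} + \vec{d}$ and $S_y = C S_x C^\top$ are easy to confirm numerically; a sketch with an arbitrary illustrative $C$ and $\vec{d}$:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))           # n = 20 observations of p = 3 variates
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])       # q = 2 (values chosen for illustration)
d = np.array([5.0, -3.0])

Y = X @ C.T + d                        # rows are y_i = C x_i + d
Sx = np.cov(X, rowvar=False)
Sy = np.cov(Y, rowvar=False)

assert np.allclose(Y.mean(axis=0), C @ X.mean(axis=0) + d)
assert np.allclose(Sy, C @ Sx @ C.T)   # the shift d drops out of the covariance
```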
3 Block structure of the sample covariance
For the vector of variables $\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_p \end{bmatrix}$, we can divide it into two parts:
\[
\vec{X}^{(1)} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_q \end{bmatrix}, \qquad
\vec{X}^{(2)} = \begin{bmatrix} X_{q+1} \\ X_{q+2} \\ \vdots \\ X_p \end{bmatrix}.
\]
In other words,
\[
\vec{X} = \begin{bmatrix} X_1 \\ \vdots \\ X_q \\ X_{q+1} \\ \vdots \\ X_p \end{bmatrix}
= \begin{bmatrix} \vec{X}^{(1)} \\ \vec{X}^{(2)} \end{bmatrix},
\qquad \text{in correspondence} \qquad
\vec{x}_i = \begin{bmatrix} x_{i1} \\ \vdots \\ x_{iq} \\ x_{i(q+1)} \\ \vdots \\ x_{ip} \end{bmatrix}
= \begin{bmatrix} \vec{x}_i^{(1)} \\ \vec{x}_i^{(2)} \end{bmatrix},
\]
where
\[
\vec{x}_i^{(1)} = \begin{bmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{iq} \end{bmatrix}, \qquad
\vec{x}_i^{(2)} = \begin{bmatrix} x_{i(q+1)} \\ x_{i(q+2)} \\ \vdots \\ x_{ip} \end{bmatrix}.
\]
We have the following partition of the sample mean vector and the sample covariance matrix:
\[
\bar{\vec{x}} = \begin{bmatrix} \bar{x}_1 \\ \vdots \\ \bar{x}_q \\ \bar{x}_{q+1} \\ \vdots \\ \bar{x}_p \end{bmatrix}
= \begin{bmatrix} \bar{\vec{x}}^{(1)} \\ \bar{\vec{x}}^{(2)} \end{bmatrix},
\qquad
S = \begin{bmatrix}
s_{11} & \dots & s_{1q} & s_{1,q+1} & \dots & s_{1p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{q1} & \dots & s_{qq} & s_{q,q+1} & \dots & s_{qp} \\
s_{q+1,1} & \dots & s_{q+1,q} & s_{q+1,q+1} & \dots & s_{q+1,p} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
s_{p1} & \dots & s_{pq} & s_{p,q+1} & \dots & s_{pp}
\end{bmatrix}
= \begin{bmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{bmatrix}.
\]
By definition, $S_{11}$ is the sample covariance matrix of $\vec{X}^{(1)}$ and $S_{22}$ is the sample covariance matrix of $\vec{X}^{(2)}$. Here $S_{12}$ is referred to as the sample cross-covariance matrix between $\vec{X}^{(1)}$ and $\vec{X}^{(2)}$. In fact, we can derive the following formula:
\[
S_{21} = S_{12}^\top = \frac{1}{n-1}\sum_{i=1}^n \left(\vec{x}_i^{(2)} - \bar{\vec{x}}^{(2)}\right)\left(\vec{x}_i^{(1)} - \bar{\vec{x}}^{(1)}\right)^\top.
\]
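In array terms, the blocks are just sub-slices of $S$; a sketch on illustrative random data with $p = 4$ and $q = 2$:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))           # n = 30 observations of p = 4 variates
S = np.cov(X, rowvar=False)
q = 2

# Block partition of S.
S11, S12 = S[:q, :q], S[:q, q:]
S21, S22 = S[q:, :q], S[q:, q:]

# S11 and S22 are the sample covariances of the two sub-vectors, and S21 = S12^T.
assert np.allclose(S11, np.cov(X[:, :q], rowvar=False))
assert np.allclose(S22, np.cov(X[:, q:], rowvar=False))
assert np.allclose(S21, S12.T)
```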
4 Standardization and Sample Correlation Matrix
For the data matrix (1.1), denote the sample mean vector by $\bar{\vec{x}}$ and the sample covariance matrix by $S$. In particular, for $j = 1, \dots, p$, let $\bar{x}_j$ be the sample mean of the $j$-th variable and $\sqrt{s_{jj}}$ its sample standard deviation.
For each entry $x_{ij}$, $i = 1, \dots, n$, $j = 1, \dots, p$, we get the standardized entry
\[
z_{ij} = \frac{x_{ij} - \bar{x}_j}{\sqrt{s_{jj}}}.
\]
Then the data matrix $X$ is standardized to
\[
Z = \begin{bmatrix}
z_{11} & z_{12} & \dots & z_{1p} \\
z_{21} & z_{22} & \dots & z_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
z_{n1} & z_{n2} & \dots & z_{np}
\end{bmatrix}
= \begin{bmatrix} \vec{z}_1^\top \\ \vec{z}_2^\top \\ \vdots \\ \vec{z}_n^\top \end{bmatrix}.
\]
Denote by $R$ the sample covariance matrix of the sample $\vec{z}_1, \dots, \vec{z}_n$. What is the connection between $R$ and $S$?
The $i$-th row of $Z$ can be written as
\[
\begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ip} \end{bmatrix}
= \begin{bmatrix} (x_{i1} - \bar{x}_1)/\sqrt{s_{11}} \\ (x_{i2} - \bar{x}_2)/\sqrt{s_{22}} \\ \vdots \\ (x_{ip} - \bar{x}_p)/\sqrt{s_{pp}} \end{bmatrix}
= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix} x_{i1} - \bar{x}_1 \\ x_{i2} - \bar{x}_2 \\ \vdots \\ x_{ip} - \bar{x}_p \end{bmatrix}.
\]
Let
\[
V^{-\frac{1}{2}} = \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & & \\
& \frac{1}{\sqrt{s_{22}}} & & \\
& & \ddots & \\
& & & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}.
\]
This transformation can be represented as
\[
\vec{z}_i = V^{-\frac{1}{2}}\left(\vec{x}_i - \bar{\vec{x}}\right) = V^{-\frac{1}{2}}\vec{x}_i - V^{-\frac{1}{2}}\bar{\vec{x}}, \qquad i = 1, \dots, n.
\]
This implies that the sample mean of the new data matrix is
\[
\bar{\vec{z}} = V^{-\frac{1}{2}}\left(\bar{\vec{x}} - \bar{\vec{x}}\right) = \vec{0}.
\]
By the formula for the sample covariance of linear transformations of variates, the sample covariance matrix of the new data matrix $Z$ is
\[
\begin{aligned}
R &= V^{-\frac{1}{2}} S \left(V^{-\frac{1}{2}}\right)^\top \\
&= \begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix}
\begin{bmatrix}
s_{11} & s_{12} & \dots & s_{1p} \\
s_{21} & s_{22} & \dots & s_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
s_{p1} & s_{p2} & \dots & s_{pp}
\end{bmatrix}
\begin{bmatrix}
\frac{1}{\sqrt{s_{11}}} & & \\
& \ddots & \\
& & \frac{1}{\sqrt{s_{pp}}}
\end{bmatrix} \\
&= \begin{bmatrix}
1 & \frac{s_{12}}{\sqrt{s_{11}s_{22}}} & \dots & \frac{s_{1p}}{\sqrt{s_{11}s_{pp}}} \\
\frac{s_{21}}{\sqrt{s_{22}s_{11}}} & 1 & \dots & \frac{s_{2p}}{\sqrt{s_{22}s_{pp}}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{s_{p1}}{\sqrt{s_{pp}s_{11}}} & \frac{s_{p2}}{\sqrt{s_{pp}s_{22}}} & \dots & 1
\end{bmatrix}
:= \begin{bmatrix}
r_{11} & r_{12} & \dots & r_{1p} \\
r_{21} & r_{22} & \dots & r_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
r_{p1} & r_{p2} & \dots & r_{pp}
\end{bmatrix}.
\end{aligned}
\]
The matrix R is called the sample correlation matrix for the original data matrix X.
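The relation $R = V^{-1/2} S V^{-1/2}$ can be verified against NumPy's built-in `np.corrcoef`; a minimal sketch on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3))
S = np.cov(X, rowvar=False)

# V^{-1/2}: diagonal matrix of reciprocal sample standard deviations.
V_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(S)))
R = V_inv_sqrt @ S @ V_inv_sqrt

assert np.allclose(R, np.corrcoef(X, rowvar=False))
assert np.allclose(np.diag(R), 1.0)   # a correlation matrix has unit diagonal
```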
5 Mahalanobis distance and mean-centered ellipse
Sample covariance is p.s.d.
Recall that the sample covariance is
\[
S = \frac{1}{n-1}\sum_{i=1}^n \left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top.
\]
Is $S$ always positive semidefinite? Consider the spectral decomposition
\[
S = \sum_{j=1}^p \lambda_j \vec{u}_j \vec{u}_j^\top.
\]
Then $S\vec{u}_j = \lambda_j \vec{u}_j$, which implies that
\[
\vec{u}_j^\top S \vec{u}_j = \vec{u}_j^\top (\lambda_j \vec{u}_j) = \lambda_j \vec{u}_j^\top \vec{u}_j = \lambda_j.
\]
On the other hand,
\[
\begin{aligned}
\vec{u}_j^\top S \vec{u}_j &= \frac{1}{n-1}\vec{u}_j^\top \left(\sum_{i=1}^n \left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top\right)\vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^n \vec{u}_j^\top \left(\vec{x}_i - \bar{\vec{x}}\right)\left(\vec{x}_i - \bar{\vec{x}}\right)^\top \vec{u}_j \\
&= \frac{1}{n-1}\sum_{i=1}^n \left|\vec{u}_j^\top \left(\vec{x}_i - \bar{\vec{x}}\right)\right|^2 \ge 0.
\end{aligned}
\]
This implies that all eigenvalues of $S$ are nonnegative, so $S$ is positive semidefinite.
In this course, we always assume $n > p$ and that $S$ is positive definite, which implies that the inverse sample covariance matrix $S^{-1}$ is also positive definite.
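A quick numerical sanity check of this fact, on illustrative random data with $n > p$ (for which $S$ is positive definite with probability one):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 4))            # n = 50 > p = 4
S = np.cov(X, rowvar=False)

# eigvalsh is the eigenvalue routine for symmetric matrices.
eigvals = np.linalg.eigvalsh(S)
assert np.all(eigvals > 0)              # positive definite, so S^{-1} exists
```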
Mahalanobis distance
For any two vectors $\vec{x}, \vec{y} \in \mathbb{R}^p$, their Mahalanobis distance based on $S^{-1}$ is defined as
\[
d_M(\vec{x}, \vec{y}) = \sqrt{(\vec{x} - \vec{y})^\top S^{-1} (\vec{x} - \vec{y})}.
\]
By the spectral decomposition of $S^{-1}$,
\[
S^{-1} = \sum_{j=1}^p \frac{1}{\lambda_j}\vec{u}_j\vec{u}_j^\top,
\]
the Mahalanobis distance is well-defined since
\[
(\vec{x} - \vec{y})^\top S^{-1}(\vec{x} - \vec{y})
= (\vec{x} - \vec{y})^\top \left(\sum_{j=1}^p \frac{1}{\lambda_j}\vec{u}_j\vec{u}_j^\top\right)(\vec{x} - \vec{y})
= \sum_{j=1}^p \frac{1}{\lambda_j}\left|(\vec{x} - \vec{y})^\top\vec{u}_j\right|^2 \ge 0.
\]
The mean-centered ellipse with Mahalanobis radius $c$ is defined as
\[
\{\vec{x} \in \mathbb{R}^p : d_M(\vec{x}, \bar{\vec{x}}) \le c\}
= \{\vec{x} \in \mathbb{R}^p : (\vec{x} - \bar{\vec{x}})^\top S^{-1}(\vec{x} - \bar{\vec{x}}) \le c^2\}.
\]
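A sketch of both forms of the quadratic: computing $(\vec{x}-\vec{y})^\top S^{-1}(\vec{x}-\vec{y})$ directly and via the spectral decomposition, on illustrative random data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))
S = np.cov(X, rowvar=False)
x, y = X[0], X[1]
diff = x - y

# Direct form: solve S v = diff instead of forming S^{-1} explicitly.
q_direct = diff @ np.linalg.solve(S, diff)

# Spectral form: sum_j |(x - y)^T u_j|^2 / lambda_j.
lam, U = np.linalg.eigh(S)              # columns of U are eigenvectors u_j
q_spectral = np.sum((diff @ U) ** 2 / lam)

assert np.isclose(q_direct, q_spectral)
dM = np.sqrt(q_direct)                  # the Mahalanobis distance
assert dM >= 0
```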
Mean-centered ellipse
For any $\vec{x}$, we have
\[
(\vec{x} - \bar{\vec{x}})^\top S^{-1}(\vec{x} - \bar{\vec{x}})
= (\vec{x} - \bar{\vec{x}})^\top \left(\sum_{j=1}^p \frac{1}{\lambda_j}\vec{u}_j\vec{u}_j^\top\right)(\vec{x} - \bar{\vec{x}})
= \sum_{j=1}^p \frac{1}{\lambda_j}\left|(\vec{x} - \bar{\vec{x}})^\top\vec{u}_j\right|^2.
\]
Consider a new Cartesian coordinate system with center $\bar{\vec{x}}$ and axes $\vec{u}_1, \vec{u}_2, \dots, \vec{u}_p$; the new coordinate of $\vec{x}$ along the axis $\vec{u}_j$ becomes $w_j = (\vec{x} - \bar{\vec{x}})^\top\vec{u}_j$, $j = 1, \dots, p$. Then the mean-centered ellipse
\[
\{\vec{x} : (\vec{x} - \bar{\vec{x}})^\top S^{-1}(\vec{x} - \bar{\vec{x}}) \le c^2\}
\]
becomes
\[
\left\{\vec{w} : \sum_{j=1}^p \frac{w_j^2}{(\sqrt{\lambda_j})^2} \le c^2\right\}
= \left\{\vec{w} : \sum_{j=1}^p \frac{w_j^2}{(c\sqrt{\lambda_j})^2} \le 1\right\}
\]
in the new coordinate system, which is an ellipse with half-axis lengths $c\sqrt{\lambda_1}, c\sqrt{\lambda_2}, \dots, c\sqrt{\lambda_p}$.
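The half-axis geometry can be checked numerically: the point $\bar{\vec{x}} + c\sqrt{\lambda_j}\,\vec{u}_j$ at the tip of the $j$-th half-axis should lie exactly on the boundary $d_M = c$. A sketch on illustrative 2-dimensional random data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 2))
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)
lam, U = np.linalg.eigh(S)              # eigenpairs (lambda_j, u_j)
c = 2.0                                 # an arbitrary Mahalanobis radius

for j in range(2):
    tip = xbar + c * np.sqrt(lam[j]) * U[:, j]   # tip of the j-th half-axis
    q = (tip - xbar) @ np.linalg.solve(S, tip - xbar)
    assert np.isclose(q, c ** 2)        # the tip sits on the ellipse boundary
```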
[Figure: a mean-centered ellipse in the $(X_1, X_2)$-plane, centered at $\bar{\vec{x}}$, with half-axes of length $c\sqrt{\lambda_1}$ along $\vec{u}_1$ and $c\sqrt{\lambda_2}$ along $\vec{u}_2$.]
6 Examples
Example 1
Consider a 2-variate data matrix
\[
X = \begin{bmatrix}
x_{11} & x_{12} \\
x_{21} & x_{22} \\
\vdots & \vdots \\
x_{n1} & x_{n2}
\end{bmatrix}
\]
with sample mean vector $\bar{\vec{x}}$ and sample covariance matrix $S_{\vec{x}}$.
Define the new sample
\[
y_1 = x_{11} + x_{12}, \quad y_2 = x_{21} + x_{22}, \quad \dots, \quad y_n = x_{n1} + x_{n2}.
\]
Can we compute its sample mean and sample variance directly from $\bar{\vec{x}}$ and $S_{\vec{x}}$?
Denote $C = [1, 1]$. Then
\[
y_i = x_{i1} + x_{i2} = [1, 1]\begin{bmatrix} x_{i1} \\ x_{i2} \end{bmatrix} = C\vec{x}_i.
\]
The sample mean of $y_1, \dots, y_n$ can be represented as
\[
\bar{y} = \frac{1}{n}\left[(x_{11} + x_{12}) + \dots + (x_{n1} + x_{n2})\right]
= \frac{1}{n}\left[x_{11} + \dots + x_{n1}\right] + \frac{1}{n}\left[x_{12} + \dots + x_{n2}\right]
= \bar{x}_1 + \bar{x}_2 = C\bar{\vec{x}}.
\]
Denote the sample variance of $y_1, \dots, y_n$ by $s_y^2$. Then
\[
\begin{aligned}
(n-1)s_y^2 &= \sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n \left((x_{i1} + x_{i2}) - (\bar{x}_1 + \bar{x}_2)\right)^2 \\
&= \sum_{i=1}^n \left((x_{i1} - \bar{x}_1) + (x_{i2} - \bar{x}_2)\right)^2 \\
&= \sum_{i=1}^n \left((x_{i1} - \bar{x}_1)^2 + 2(x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + (x_{i2} - \bar{x}_2)^2\right) \\
&= \sum_{i=1}^n (x_{i1} - \bar{x}_1)^2 + 2\sum_{i=1}^n (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2) + \sum_{i=1}^n (x_{i2} - \bar{x}_2)^2 \\
&= (n-1)s_{11} + 2(n-1)s_{12} + (n-1)s_{22}.
\end{aligned}
\]
Then
\[
s_y^2 = s_{11} + 2s_{12} + s_{22} = s_{11} + s_{12} + s_{21} + s_{22}
= [1, 1]\begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}
= C S_{\vec{x}} C^\top.
\]
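Example 1 is easy to confirm numerically; a sketch on a small made-up 2-variate sample:

```python
import numpy as np

# Made-up 2-variate sample, n = 4.
X = np.array([[1.0, 4.0],
              [2.0, 1.0],
              [3.0, 5.0],
              [6.0, 2.0]])
C = np.array([1.0, 1.0])
y = X @ C                                  # y_i = x_i1 + x_i2

Sx = np.cov(X, rowvar=False)
assert np.isclose(y.mean(), C @ X.mean(axis=0))      # ybar = C xbar
assert np.isclose(np.var(y, ddof=1), C @ Sx @ C)     # s_y^2 = C S_x C^T
```

Here `ddof=1` makes `np.var` use the unbiased $1/(n-1)$ convention, matching the definition of $s_y^2$ above.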
Example 2
Suppose $X \in \mathbb{R}^{n \times 4}$ is a data matrix for the variables $\vec{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}$, with the following sample covariance matrix:
\[
S_x = \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}.
\]
What is the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$?
Solution. Since
\[
\vec{Y} := \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix}
:= C\vec{X},
\]
we know its sample covariance matrix is
\[
\begin{aligned}
S_y &= C S_x C^\top
= \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \\
&= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 1 & 2 & 1 \\
0 & 2 & 1 & 0 \\
0 & 0 & 1 & 2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{bmatrix}.
\end{aligned}
\]
From the partition
\[
\vec{Y} = \begin{bmatrix} X_1 \\ X_3 \\ X_2 \\ X_4 \end{bmatrix}
\]
we have the partition
\[
S_y = \left[\begin{array}{cc|cc}
2 & 0 & 0 & 0 \\
0 & 2 & 1 & 1 \\
\hline
0 & 1 & 2 & 0 \\
0 & 1 & 0 & 2
\end{array}\right].
\]
Then the sample cross-covariance matrix between $\begin{bmatrix} X_1 \\ X_3 \end{bmatrix}$ and $\begin{bmatrix} X_2 \\ X_4 \end{bmatrix}$ is
\[
S_{12} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}.
\]
This result can be verified entrywise.
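The permutation computation in Example 2 can be reproduced in a few lines:

```python
import numpy as np

Sx = np.array([[2, 0, 0, 0],
               [0, 2, 1, 0],
               [0, 1, 2, 1],
               [0, 0, 1, 2]], dtype=float)

# C reorders (X1, X2, X3, X4) -> (X1, X3, X2, X4).
C = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)

Sy = C @ Sx @ C.T
S12 = Sy[:2, 2:]     # cross-covariance between (X1, X3) and (X2, X4)
print(S12)           # [[0. 0.]
                     #  [1. 1.]]
```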