The Annals of Statistics 
2016, Vol. 44, No. 4, 1564–1592 
DOI: 10.1214/15-AOS1427 
© Institute of Mathematical Statistics, 2016 
THE TRACY–WIDOM LAW FOR THE LARGEST EIGENVALUE OF 
F TYPE MATRICES 
BY XIAO HAN, GUANGMING PAN1 AND BO ZHANG 
Nanyang Technological University 
Let $A_p = \frac{YY^*}{m}$ and $B_p = \frac{XX^*}{n}$ be two independent random matrices, where $X = (X_{ij})_{p\times n}$ and $Y = (Y_{ij})_{p\times m}$ respectively consist of real (or complex) independent random variables with $EX_{ij} = EY_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$. Denote by $\lambda_1$ the largest root of the determinantal equation $\det(\lambda A_p - B_p) = 0$. We establish Tracy–Widom type universality for $\lambda_1$ under some moment conditions on $X_{ij}$ and $Y_{ij}$ when $p/m$ and $p/n$ approach positive constants as $p \to \infty$.
1. Introduction. High-dimensional data now commonly arise in many scientific fields such as genomics, image processing, microarray analysis, proteomics and finance, to name but a few. It is well known that the classical theory of multivariate statistical analysis, developed for fixed dimension p and large sample size n, may lose its validity when handling high-dimensional data. A popular tool for analyzing large covariance matrices, and hence high-dimensional data, is random matrix theory. The spectral analysis of high-dimensional sample covariance matrices has attracted considerable interest among statisticians, probabilists and mathematicians since the seminal work of Marčenko and Pastur [21] on the limiting spectral distribution of a class of sample covariance matrices. One may refer to the monograph of Bai and Silverstein [1] for a comprehensive summary and further references.
The largest eigenvalue of covariance matrices plays an important role in multivariate statistical analysis, for example in principal component analysis (PCA), multivariate analysis of variance (MANOVA) and discriminant analysis. One may refer to [22] for more details. In this paper, we focus on the largest eigenvalue of F type matrices. Suppose that
\[
A_p = \frac{YY^*}{m}, \qquad B_p = \frac{XX^*}{n} \tag{1.1}
\]
are two independent random matrices, where $X = (X_{ij})_{p\times n}$ and $Y = (Y_{ij})_{p\times m}$ respectively consist of real (or complex) independent random variables with $EX_{ij} = EY_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$.
Received June 2015; revised December 2015. 
1Supported in part by MOE Tier 2 Grant 2014-T2-2-060 and MOE Tier 1 Grant RG25/14 at Nanyang Technological University, Singapore.
MSC2010 subject classifications. Primary 60B20, 34K25; secondary 60F05, 62H10. 
Key words and phrases. Tracy–Widom distribution, largest eigenvalue, sample covariance matrix, 
F matrix. 
Consider the determinantal equation
\[
\det(\lambda A_p - B_p) = 0. \tag{1.2}
\]
When $A_p$ is invertible, the roots of (1.2) are the eigenvalues of the F matrix
\[
A_p^{-1}B_p, \tag{1.3}
\]
referred to as a Fisher matrix in the literature. The determinantal equation (1.2) is 
closely connected with the generalized eigenproblem
\[
\det\bigl(\lambda(A_p + B_p) - B_p\bigr) = 0. \tag{1.4}
\]
We illustrate this in the next section. Many classical multivariate statistical tests are based on the roots of (1.2) or (1.4). For instance, one may use them to test the equality of two covariance matrices or a general linear hypothesis. In the framework of multivariate analysis of variance (MANOVA), $A_p$ represents the within-group covariance matrix while $B_p$ represents the between-group covariance matrix. A one-way MANOVA can then be used to examine the hypothesis that the mean vectors of interest are equal.
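Numerically, the roots of (1.2) can be obtained by solving the generalized eigenvalue problem $B_p v = \lambda A_p v$. The following is a minimal sketch (not from the paper; the dimensions and the use of SciPy's generalized eigenvalue solver are our own illustrative choices):

```python
# Hypothetical illustration: the largest root of det(lambda*A_p - B_p) = 0
# computed as a generalized eigenvalue problem B_p v = lambda A_p v.
import numpy as np
from scipy.linalg import eigvals

rng = np.random.default_rng(0)
p, m, n = 30, 60, 50                  # p < m, so A_p is invertible
Y = rng.standard_normal((p, m))
X = rng.standard_normal((p, n))
Ap = Y @ Y.T / m                      # within-group covariance
Bp = X @ X.T / n                      # between-group covariance

lam = np.sort(eigvals(Bp, Ap).real)   # roots of det(lambda*Ap - Bp) = 0
lambda_1 = lam[-1]                    # the largest root studied below
```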
Tracy and Widom [29, 30] first discovered the limiting distributions of the largest eigenvalue of the large Gaussian Wigner ensemble, now known as the Tracy–Widom law. Since their pioneering work, the study of the largest eigenvalues of large random matrices has flourished; to name a few contributions, we mention [6, 10, 13, 14] and [26]. Among them we highlight El Karoui [6], which handled the largest eigenvalue of Wishart matrices with a nonnull population covariance matrix and provided a condition on the population covariance matrix that ensures the Tracy–Widom law [see (4.16) below].
A follow-up to the above results is to establish the so-called universality property for generally distributed large random matrices. Specifically, the universality property states that the limiting behavior of an eigenvalue statistic typically does not depend on the distribution of the matrix entries. Indeed, the Tracy–Widom law has been established for general sample covariance matrices under very general assumptions on the distributions of the entries of X. Readers can refer to [3, 7, 9, 17, 19, 25, 27, 28, 33] for some representative developments on this topic. An important tool in proving universality is the Lindeberg comparison strategy (see Tao and Vu [27] and Erdős, Yau and Yin [9]), and a key input when applying it is the strong local law developed by Erdős, Schlein and Yau [8] and Erdős, Yau and Yin [9].
Johnstone [15] proved that the largest root of (1.2) converges to the type-one Tracy–Widom distribution after appropriate centering and scaling when the dimension p of the matrices $A_p$ and $B_p$ is even, $\lim_{p\to\infty} p/m < 1$, and $B_p$ and $A_p$ are both Wishart matrices. It is believed that the limiting distribution should not be affected by the parity of the dimension p. Indeed, numerical investigations in both [15] and [16] suggest that the Tracy–Widom approximation works as well in the odd-dimension case as in the even-dimension case. Moreover, as one might guess, the Tracy–Widom approximation should not rely on the Gaussian assumption. However, theoretical support for these claims has remained open. Furthermore, when $A_p$ is not invertible, the limiting distribution of the largest root of (1.2) is still unknown, even under the Gaussian assumption.
In this paper, we prove universality of the largest root of (1.2) by imposing some moment conditions on $A_p$ and $B_p$. Specifically, we prove that the largest root of (1.2) converges in distribution to the Tracy–Widom law for general distributions of the entries of X and Y, regardless of whether the dimension p is even or odd. Moreover, the result holds both when $\lim_{p\to\infty} p/m < 1$ and when $\lim_{p\to\infty} p/m > 1$, corresponding to invertible $A_p$ and noninvertible $A_p$, respectively. This result also implies the asymptotic distribution of the largest root of (1.4).
At this point, it is also appropriate to mention some related work on the roots of (1.2). The limiting spectral distribution of the roots was derived by [32] and [1]. One may also find the limits of the largest root and the smallest root in [1]. A central limit theorem for linear spectral statistics was established in [38]. Very recently, the so-called spiked F model has been investigated by [5] and [34]. We would like to point out that these papers prove local asymptotic normality or asymptotic normality for the largest eigenvalue of the spiked F model, which is completely different from our setting.
We conclude this section by outlining some ideas of the proof and presenting the structure of the rest of the paper. When $A_p$ is invertible, the roots of (1.2) become the eigenvalues of the F matrix $A_p^{-1}B_p$, so we may work with $A_p^{-1}B_p$. Roughly speaking, conditioning on $A_p$, the matrix $A_p^{-1}B_p$ can be viewed as a kind of general sample covariance matrix $T_n^{1/2}XX^*T_n^{1/2}$, with $T_n$ being a population covariance matrix. Denote the largest root of (1.2) by $\lambda_1$. The key idea is to break $\lambda_1$ into a sum of two parts as follows:
\[
\lambda_1 - \mu_p = (\lambda_1 - \hat\mu_p) + (\hat\mu_p - \mu_p), \tag{1.5}
\]
where $\hat\mu_p$ is an appropriate value when $A_p$ is given and $\mu_p$ is an appropriate value when $A_p$ is not given (their definitions are given in later sections). However, we cannot condition on $A_p$ directly. Instead, we first construct an appropriate event on which we can handle the first term on the right-hand side of (1.5) by applying earlier results about $T_n^{1/2}XX^*T_n^{1/2}$. In particular, we need to verify condition (4.16) below. Once this is done, the next step is to prove that the second term on the right-hand side of (1.5), after scaling, converges to zero in probability. This approach differs from those used in the literature to prove universality of local eigenvalue statistics.
Unfortunately, when $A_p$ is not invertible we can no longer work with the F matrix $A_p^{-1}B_p$. To overcome this difficulty, we instead start from the determinantal equation (1.2). It turns out that the largest root $\lambda_1$ can then be linked to the largest root of some F matrix when X consists of Gaussian random variables, so that the result about F matrices $A_p^{-1}B_p$ is applicable. For general distributions, we find that it is equivalent to working with the “covariance-type” matrix
\[
D^{-1/2}U_1X\bigl(I - X^*U_2^*\bigl(U_2XX^*U_2^*\bigr)^{-1}U_2X\bigr)X^*U_1^*D^{-1/2}. \tag{1.6}
\]
The definitions of D and $U_j$, $j = 1, 2$, are given in a later section. This matrix is much more complicated than a general sample covariance matrix. To deal with (1.6), we construct a $3\times 3$ block linearization matrix
\[
H = H(X) = \begin{pmatrix} -zI & 0 & D^{-1/2}U_1X \\ 0 & 0 & U_2X \\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I \end{pmatrix}, \tag{1.7}
\]
where $z = E + i\eta$ is a complex number with positive imaginary part. A simple calculation shows that the upper-left block of the $3\times 3$ block matrix $H^{-1}$ is the Stieltjes transform of (1.6). We next develop a strong local law around the right end point $\mu_p$ of the support by using a type of Lindeberg comparison strategy developed in [17], and then use it to prove edge universality by adapting the approaches of [9] and [3].
The paper is organized as follows. Section 2 presents the main results. Statistical applications and the Tracy–Widom approximation are discussed in Section 3. Section 4 is devoted to proving the main result when $A_p$ is invertible. Sections 5 and 6 prove the main result when $A_p$ is not invertible. Some lemmas (theorems) and their proofs are provided in the supplementary material [12] (Sections 7–12).
2. The main results. Throughout the paper, we impose the following conditions.
CONDITION 1. Assume that $\{Z_{ij}\}$ are independent random variables with $EZ_{ij} = 0$ and $E|Z_{ij}|^2 = 1$. For each $k \in \mathbb{N}$, there is a constant $C_k$ such that $E|Z_{ij}|^k \le C_k$. In addition, if $\{Z_{ij}\}$ are complex, then $EZ_{ij}^2 = 0$.

We say that a random matrix $Z = (Z_{ij})$ satisfies Condition 1 if its entries $\{Z_{ij}\}$ satisfy Condition 1.
CONDITION 2. Assume that random matrices X = (Xij )p,n and Y = (Yij )p,m 
are independent. 
CONDITION 3. Set $m = m(p)$ and $n = n(p)$. Suppose that
\[
\lim_{p\to\infty}\frac{p}{m} = d_1 > 0, \qquad \lim_{p\to\infty}\frac{p}{n} = d_2 > 0, \qquad 0 < \lim_{p\to\infty}\frac{p}{m+n} < 1.
\]
To present the main results uniformly, we define $\breve m = \max\{m, p\}$, $\breve n = \min\{n, m+n-p\}$ and $\breve p = \min\{m, p\}$. Moreover, let
\[
\sin^2(\gamma/2) = \frac{\min\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \qquad \sin^2(\psi/2) = \frac{\max\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \tag{2.1}
\]
\[
\mu_{J,p} = \tan^2\Bigl(\frac{\gamma+\psi}{2}\Bigr), \qquad
\sigma_{J,p}^3 = \frac{16\,\mu_{J,p}^3}{(\breve m + \breve n - 1)^2\,\sin(\gamma)\sin(\psi)\sin^2(\gamma+\psi)}. \tag{2.2}
\]
Formulas (2.2) can be found in [15] when d1 < 1. 
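The quantities in (2.1)–(2.2) are straightforward to evaluate numerically. Below is a minimal sketch based on the formulas above (the helper name and the example triple are our own illustrative choices):

```python
# Hypothetical helper computing mu_{J,p} and sigma_{J,p} from (2.1)-(2.2).
import numpy as np

def jacobi_params(p, m, n):
    m_b = max(m, p)                    # m-breve
    n_b = min(n, m + n - p)            # n-breve
    p_b = min(m, p)                    # p-breve
    gamma = 2 * np.arcsin(np.sqrt((min(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    psi = 2 * np.arcsin(np.sqrt((max(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    mu = np.tan((gamma + psi) / 2) ** 2
    sigma3 = 16 * mu ** 3 / (
        (m_b + n_b - 1) ** 2 * np.sin(gamma) * np.sin(psi)
        * np.sin(gamma + psi) ** 2
    )
    return mu, sigma3 ** (1 / 3)

mu_J, sigma_J = jacobi_params(p=30, m=20, n=25)   # the triple M1 below
```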
We now present alternative expressions for $\mu_{J,p}$ and $\sigma_{J,p}$. To this end, define a modified density of the Marčenko–Pastur law [21] (M–P law) by
\[
p(x) = \frac{1}{2\pi x(\breve p/\breve m)}\sqrt{(b_p - x)(x - a_p)}\,I(a_p \le x \le b_p), \tag{2.3}
\]
where $a_p = (1 - \sqrt{\breve p/\breve m})^2$ and $b_p = (1 + \sqrt{\breve p/\breve m})^2$. Let $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_{\breve p}$ satisfy
\[
\int_{\gamma_j}^{+\infty} p(x)\,dx = \frac{j}{\breve p}, \tag{2.4}
\]
with $\gamma_0 = b_p$ and $\gamma_{\breve p} = a_p$. Moreover, suppose that $c_p \in [0, a_p)$ satisfies the equation
\[
\int_{-\infty}^{+\infty}\Bigl(\frac{c_p}{x - c_p}\Bigr)^2 p(x)\,dx = \frac{\breve n}{\breve p}. \tag{2.5}
\]
One may easily check the existence and uniqueness of $c_p$. Define
\[
\mu_p = \frac{1}{c_p}\Bigl(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\frac{c_p}{x - c_p}\,p(x)\,dx\Bigr) \tag{2.6}
\]
and
\[
\sigma_p^3 = \frac{1}{c_p^3}\Bigl(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\Bigl(\frac{c_p}{x - c_p}\Bigr)^3 p(x)\,dx\Bigr). \tag{2.7}
\]
It turns out that (2.2) and (2.6)–(2.7) are equivalent subject to some scaling, which 
is verified in Section 7 in the supplementary material [12]. 
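The constant $c_p$ and the pair (2.6)–(2.7) can likewise be computed by one-dimensional root-finding and quadrature. A hedged numerical sketch (our own illustration, not code from the paper):

```python
# Hypothetical illustration of (2.5)-(2.7): solve for c_p by root-finding,
# then evaluate mu_p and sigma_p by quadrature against the density (2.3).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def mp_centering_scaling(p, m, n):
    m_b, n_b, p_b = max(m, p), min(n, m + n - p), min(m, p)
    y = p_b / m_b
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    rho = lambda x: np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * y)

    def moment(c, k):   # integral of (c/(x-c))^k * rho(x) over [a, b]
        return quad(lambda x: (c / (x - c)) ** k * rho(x), a, b)[0]

    # The left side of (2.5) increases from 0 to infinity as c -> a-.
    c = brentq(lambda c: moment(c, 2) - n_b / p_b, 1e-8, a - 1e-6)
    mu = (1 + (p_b / n_b) * moment(c, 1)) / c                       # (2.6)
    sigma = ((1 + (p_b / n_b) * moment(c, 3)) / c ** 3) ** (1 / 3)  # (2.7)
    return mu, sigma
```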
We also need the following moment matching condition.
DEFINITION 1 (Moment matching). Let $X^1 = (x_{ij}^1)_{M\times N}$ and $X^0 = (x_{ij}^0)_{M\times N}$ be two matrices satisfying Condition 1. We say that $X^1$ matches $X^0$ to order q if, for all integers i, j, l and k satisfying $1 \le i \le M$, $1 \le j \le N$, $0 \le l, k$ and $l + k \le q$,
\[
E\bigl[(\Re x_{ij}^1)^l(\Im x_{ij}^1)^k\bigr] = E\bigl[(\Re x_{ij}^0)^l(\Im x_{ij}^0)^k\bigr] + O\bigl(\exp\bigl(-(\log p)^C\bigr)\bigr), \tag{2.8}
\]
where C is some positive constant bigger than one, and $\Re x$ and $\Im x$ denote the real and imaginary parts of x.
Throughout the paper, we use X0 to stand for the random matrix consisting of 
independent Gaussian random variables with mean zero and variance one. 
Denote the type-i Tracy–Widom distribution by $F_i$, $i = 1, 2$ (see [30]). Set $B_p = \frac{XX^*}{\breve n}$ and $A_p = \frac{YY^*}{\breve m}$. We are now in a position to state the main results about F type matrices.
THEOREM 2.1. Suppose that the real random matrices X and Y satisfy Conditions 1–3. Moreover, suppose that $0 < d_2 < \infty$. Denote the largest root of $\det(\lambda A_p - B_p) = 0$ by $\lambda_1$.

(i) If $0 < d_1 < 1$, then
\[
\lim_{p\to\infty} P\Bigl(\frac{(\breve n/\breve m)\lambda_1 - \mu_{J,p}}{\sigma_{J,p}} \le s\Bigr) = F_1(s). \tag{2.9}
\]

(ii) If $d_1 > 1$ and X matches the standard Gaussian matrix $X^0$ to order 3, then (2.9) still holds.
REMARK 1. When X and Y are complex random matrices, Theorem 2.1 still holds with the Tracy–Widom distribution $F_1(s)$ replaced by $F_2(s)$.

If $0 < d_1 < 1$, then $A_p$ is invertible, and the largest root $\lambda_1$ is the largest eigenvalue of the F matrix $A_p^{-1}B_p$. If $d_1 > 1$, then $A_p$ is not invertible.
REMARK 2. Theorem 2.1 immediately implies the distribution of the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. In fact, when $0 < d_1 < 1$ the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1+\lambda_1}$ if $\lambda_1$ is the largest root of the F matrix $B_pA_p^{-1}$ in Theorem 2.1.

When $d_1 > 1$, the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is one, with multiplicity $(p - m)$. We instead consider the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. It turns out that the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1+\lambda_1}$ if $\lambda_1$ is the largest root of $\det(\lambda A_p - B_p) = 0$. Moreover, note the equality
\[
(B_p + A_p)^{-1}B_p + (B_p + A_p)^{-1}A_p = I.
\]
If Y matches $X^0$ to order 3, then the smallest positive root of $\det(\lambda(B_p + A_p) - B_p) = 0$ also tends to the type-1 Tracy–Widom distribution after appropriate centering and rescaling by Theorem 2.1 when $d_1 > 1$ and $d_2 > 1$.
We would like to point out that Johnstone [15] proved part (i) of Theorem 2.1 when p is even and $A_p$ and $B_p$ are both Wishart matrices. Part (ii) of Theorem 2.1 is new even when $A_p$ and $B_p$ are both Wishart matrices. In proving Theorem 2.1, we have in fact obtained a different asymptotic mean and variance. Precisely, we have proved that
\[
\lim_{p\to\infty} P\bigl(\sigma_p \breve n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = F_1(s) \tag{2.10}
\]
and that
\[
\Bigl|\frac{\breve m}{\breve n}\,\mu_{J,p} - \mu_p\Bigr| = O\bigl(p^{-1}\bigr), \qquad
\lim_{p\to\infty}\sigma_p\,\frac{\breve m}{\breve n^{1/3}}\,\sigma_{J,p} = 1. \tag{2.11}
\]
Equations (2.10) and (2.11) imply Theorem 2.1. The proof of (2.11) is provided in the supplementary material [12], and we prove (2.10) in the main paper.
3. Applications and simulations. This section explores some applications of our universality results in high-dimensional statistical inference and conducts simulations to check the quality of the limiting-law approximations.
3.1. Long-side spherical test (LSST) for separable covariance matrices. Consider a data matrix $Y = (y_1, \ldots, y_N)_{p\times N}$ of the form
\[
Y = \Sigma^{1/2}XT^{1/2}, \tag{3.1}
\]
where X is a $p \times N$ matrix satisfying Condition 1, X matches $X^0$ to order 3 (when $2p > N$), and $\Sigma$ and T are $p \times p$ and $N \times N$ positive definite matrices. We start from the special case $T = I$, in which the model becomes $Y = \Sigma^{1/2}X$. For such a simplified model, the spherical test $H_0: \Sigma = \sigma^2 I$ vs. $H_1: \Sigma \ne \sigma^2 I$ has been widely discussed in the literature; when p is comparable to N, there is considerable work on it, for example, [18] and Section 9.5 of [35]. We can extend this test to the more general model (3.1). In model (3.1), $YY^*$ is called a separable covariance matrix, which is used to model spatial-temporal data. For high-dimensional data, the spectral properties of $YY^*$ are studied in some recent papers such as [24] and [37]. In this section, we focus on testing whether $\Sigma$ or T is proportional to I. To be precise, we consider the following hypothesis testing problems:
if $\lim_{p\to\infty} p/N < 1$, we test $H_0: T = \sigma_1^2 I$ vs. $H_1: T \ne \sigma_1^2 I$;
or, if $\lim_{p\to\infty} p/N > 1$, we test $H_0: \Sigma = \sigma_2^2 I$ vs. $H_1: \Sigma \ne \sigma_2^2 I$.
In the sequel, we focus on the first testing problem, that is, $\lim_{p\to\infty} p/N < 1$; the second can be discussed similarly. We choose an index subset $S \subset \{1, \ldots, N\}$ such that the cardinality of S is $N/2$ (we also suggest an approach for selecting S in stationary time series models in the simulations). Moreover, we define $Z_2Z_2^* = \sum_{i\in S} y_iy_i^*$, $Z_1Z_1^* = \sum_{i\notin S} y_iy_i^*$, $n = N/2$ and $m = N - n$. We use the largest root $\lambda_1$ of $\det(\lambda\frac{Z_1Z_1^*}{\breve m} - \frac{Z_2Z_2^*}{\breve n}) = 0$ as a test statistic. Under the null hypothesis, $\lambda_1$ tends to the Tracy–Widom distribution after centering and rescaling, for any selection of S. The key observation is that $\Sigma$ can be eliminated in $\det(\lambda\frac{Z_1Z_1^*}{\breve m} - \frac{Z_2Z_2^*}{\breve n}) = 0$. Under the alternative hypothesis, however, for a suitably chosen S the correlation structure involved in $Z_2$ can be very different from that of $Z_1$, which makes the largest root of the above determinantal equation deviate substantially from $\mu_p$. This observation ensures that $n^{2/3}(\lambda_1 - \mu_p)$ will be very large when the null hypothesis does not hold. One can see that the restriction $\lim_{p\to\infty} p/N < 1$ comes from the conditions of Theorem 2.1.
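A hedged sketch of the LSST statistic described above (the helper name is our own; the index set follows the suggestion for stationary data in Section 3.5.2):

```python
# Hypothetical implementation of the LSST statistic of Section 3.1.
import numpy as np
from scipy.linalg import eigvals

def lsst_statistic(Y):
    p, N = Y.shape
    idx = np.arange(N)
    S = np.concatenate([idx[: N // 4], idx[3 * N // 4:]])   # suggested S
    Z2, Z1 = Y[:, S], Y[:, np.setdiff1d(idx, S)]
    n, m = N // 2, N - N // 2
    m_b, n_b = max(m, p), min(n, m + n - p)
    # Largest root of det(lambda * Z1 Z1^T / m_b - Z2 Z2^T / n_b) = 0.
    lam = eigvals(Z2 @ Z2.T / n_b, Z1 @ Z1.T / m_b).real
    lam = lam[np.isfinite(lam)]   # drop infinities if Z1 Z1^T is singular
    return np.max(lam)
```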
In addition, we would like to point out that the extreme eigenvalue of a sample covariance matrix is not a proper statistic for such a hypothesis test when there are no spiked eigenvalues, while our statistic does not depend on whether $\Sigma$ is spiked or not. We explain the reasons below. First, one cannot directly use the largest eigenvalue of $YY^*$ since $\Sigma$ is unknown. Therefore, one has to apply the statistic [one-side identity test (OSI)] in Section 2.1 of [3], that is,
\[
\frac{\lambda_1(YY^*) - \lambda_2(YY^*)}{\lambda_2(YY^*) - \lambda_3(YY^*)}.
\]
When T is a spiked matrix, this statistic works well (see Table 4 in [3]). However, when T is not spiked and $\Sigma = I$, the statistic tends to the same distribution for any T satisfying (1.4) of [3] (including $T = \sigma^2 I$), which means the statistic does not work in this case. Table 4 below confirms this phenomenon.
3.2. Equality of K covariance matrices (EOM). Consider the model
\[
Z_i = \Sigma_i^{1/2}X_i, \qquad i = 1, \ldots, K,
\]
where $\{X_i\}$ are $p \times n_i$ random matrices, $\{\Sigma_i\}$ are $p \times p$ invertible population covariance matrices and K is a positive integer. Moreover, we assume that there exists a $k_0$ such that $\lim_{p\to\infty}\frac{\sum_{i=1}^{k_0} n_i}{p} \in (0, \infty)$ and $\lim_{p\to\infty}\frac{\sum_{i=k_0+1}^{K} n_i}{p} \in (0, \infty)$. For simplicity, and to be consistent with the previous notation, we set $n = \sum_{i=1}^{k_0} n_i$, $m = \sum_{i=k_0+1}^{K} n_i$, $Y = (X_{k_0+1}, \ldots, X_K)$ and $X = (X_1, \ldots, X_{k_0})$. We also assume that X and Y satisfy the conditions of Theorem 2.1.

We are interested in the following hypothesis test:
\[
H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_K \quad\text{vs.}\quad H_1: \exists\, 1 \le i < j \le K \text{ such that } \Sigma_i \ne \Sigma_j.
\]
Under the null hypothesis, we have
\[
\det\Bigl(\lambda\,\frac{\sum_{i=k_0+1}^{K} Z_iZ_i^*}{\breve m} - \frac{\sum_{i=1}^{k_0} Z_iZ_i^*}{\breve n}\Bigr) = 0 \iff \det\Bigl(\lambda\,\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}\Bigr) = 0.
\]
In view of this, we propose the largest root $\lambda_1$ of $\det(\lambda\frac{\sum_{i=k_0+1}^{K} Z_iZ_i^*}{\breve m} - \frac{\sum_{i=1}^{k_0} Z_iZ_i^*}{\breve n}) = 0$ as a test statistic. By Theorem 2.1, $\lambda_1$ tends to the Tracy–Widom distribution after centering and rescaling.
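For K = 2, the statistic takes a particularly simple form. A minimal sketch (our own illustration):

```python
# Hypothetical EOM statistic for K = 2: under H0 the common covariance
# cancels inside the determinantal equation.
import numpy as np
from scipy.linalg import eigvals

def eom_statistic(Z1, Z2):
    p, n = Z1.shape            # group 1: p x n
    m = Z2.shape[1]            # group 2: p x m
    m_b, n_b = max(m, p), min(n, m + n - p)
    lam = eigvals(Z1 @ Z1.T / n_b, Z2 @ Z2.T / m_b).real
    lam = lam[np.isfinite(lam)]
    return np.max(lam)         # compare with TW1 quantiles via (3.9)-(3.10) below
```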
We now investigate the power of the test under a kind of sparse alternative hypothesis when K = 2. Specifically, we consider the alternative
\[
Z_1 = \Sigma^{1/2}X, \qquad Z_2 = Y. \tag{3.2}
\]
If $YY^*$ is invertible, we choose $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$ with
\[
r = \sqrt{\frac{p}{m} + \frac{p}{n} - \frac{p^2}{mn}}.
\]
The reason we choose the factor $\frac{p/m+r}{1-p/m}$ is that the resulting matrix is a spiked F matrix when $\tau > 1$; the largest eigenvalue $\lambda_1$ then converges weakly to a normal distribution by Proposition 11 of [5] and Theorem 4.1 of [34]. In fact, by Proposition 5 of [5] and Theorem 3.1 of [34] we immediately have the following proposition.
PROPOSITION 1. For the model (3.2), suppose $YY^*$ is invertible and $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$. Let $\phi(x) = \frac{x(x - 1 + p/n)}{x(1 - p/m) - 1}$. When $\tau > 1$, the largest eigenvalue of the spiked F matrix $\frac{m}{n}(YY^*)^{-1}\Sigma^{1/2}XX^*\Sigma^{1/2}$ (denoted by $\lambda_1$) almost surely converges to $\frac{m}{n}\phi(1 + \tau\frac{p/m+r}{1-p/m})$, and for any positive constant C we have
\[
\lim_{p\to\infty} P\Bigl(\frac{(n/m)\lambda_1 - \mu_{J,p}}{\sigma_{J,p}} > C\Bigr) = 1 \tag{3.3}
\]
(the power of the test goes to 1 as $p \to \infty$).
An interesting feature of this approach is that K can tend to infinity. Below, we compare our statistic with the corrected likelihood ratio test (CLRT) proposed in Chapter 9 of [35] for K = 2. First, unlike [35], we do not assume $E|X_{ij}|^4$ to be equal to some known constant $\beta$ for all i and j. Moreover, the fourth-moment assumption restricts the extension of their approach to the equality test of K matrices, since it is not reasonable to make such an assumption when K is large. The advantage of CLRT is that it uses all the information in the F matrix's spectrum, so their test is more powerful when the population eigenvalues are close to one another. But when $\Sigma_1^{-1}\Sigma_2$ is a spiked matrix, the largest eigenvalue of F type matrices works better; see Table 8 below.
3.3. Correlated noise detection. Let
\[
y_t = f(x_t) + \Sigma^{1/2}\varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{3.4}
\]
be the signal received at time t, where $y_t$ is a p-dimensional real or complex vector and $\varepsilon_t$ is a p-dimensional white noise vector (i.i.d.) satisfying Condition 1. Moreover, if $2p > T$, we assume that the third moments of the entries of $\varepsilon_t$ are 0; $\Sigma$ is an unknown $p \times p$ invertible matrix; $\lim_{p\to\infty} p/T \in (0, 1)$; x is a vector or matrix of arbitrary dimension (possibly correlated with y); and $f(x)$ is a given function, for example the regression model $f(x) = x^*\beta$. We are interested in whether there is a “real signal” contained in $y_t$. Our hypothesis testing problem is
\[
H_0: y_t = \Sigma^{1/2}\varepsilon_t \quad\text{vs.}\quad H_1: y_t \ne \Sigma^{1/2}\varepsilon_t. \tag{3.5}
\]
Let $X = (y_1, \ldots, y_{T/2})$, $Y = (y_{T/2+1}, \ldots, y_T)$, $n = T/2$ and $m = T - n$. As before, we use the largest root of $\det(\lambda\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}) = 0$ as a test statistic, which converges to the Tracy–Widom distribution after centering and rescaling by Theorem 2.1.
In engineering, we do not need to split the sample into X and Y. Specifically, in signal detection or cognitive radio, model (3.4) takes the form
\[
y_t = Ax_t + \Sigma^{1/2}\varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{3.6}
\]
where $x_t$ is a k-dimensional signal vector with covariance matrix S, A is a $p \times k$ deterministic matrix, and the other assumptions are the same as those in the previous model (3.4). We are again interested in the test (3.5). This is a widely discussed problem in cognitive radio; for the high-dimensional setting, one may see [36] and [23]. We also refer to the recent paper [31], which assumes correlated noise. In engineering, there exist methods to obtain another, signal-free sample, say $r_t = \Sigma^{1/2}z_t$, $t = 1, \ldots, T_1$, where $\{\varepsilon_t\}_{t=1}^T$ and $\{z_t\}_{t=1}^{T_1}$ are i.i.d. One can refer to [20, 34] and [23] for detailed discussions. Let $R_1 = (y_1, \ldots, y_T)$ and $R_2 = (r_1, \ldots, r_{T_1})$. We use the largest root of $\det(\lambda\frac{R_2R_2^*}{T_1} - \frac{R_1R_1^*}{T}) = 0$ as a statistic. The power is stated in Table 9 below.
3.4. Other applications under the Gaussian distribution. There are many other applications that can be connected with the largest eigenvalue of F matrices thanks to nice properties of the Gaussian distribution. We illustrate a multivariate ANOVA test below; one can refer to [15] for more applications. Consider the multivariate regression model
\[
Y = XB + Z,
\]
where Y is an $N \times p$ response matrix, X is a known $N \times q$ design matrix, B is a $q \times p$ unknown regression matrix, Z is an $N \times p$ random matrix whose rows are i.i.d. Gaussian with mean zero and covariance matrix $\Sigma$, and $\Sigma$ is an invertible deterministic matrix. We are interested in the following hypothesis test: given a $g \times q$ matrix C,
\[
H_0: CB = 0 \quad\text{vs.}\quad H_1: CB \ne 0.
\]
To explain the motivation behind the test, we consider a low-dimensional example. If $q = 3$, $p = 1$, $Y = (y_1, \ldots, y_N)^T$, $y_i \sim N(b_j, \sigma^2)$ for $N_{j-1} \le i \le N_j$, $1 = N_1 \le \cdots \le N_4 = N$ and
\[
C = \begin{pmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\]
then the null hypothesis $CB = 0$ is equivalent to $b_1 = b_2 = b_3$, that is, an ANOVA test. Of course, we consider the high-dimensional setting in this paper. We assume $N > q$. The least squares estimator is $\hat B = (X^*X)^{-1}X^*Y$. Under the null hypothesis, it is easy to see that the matrices
\[
D = Y^*(I - P_X)Y \sim W_p(\Sigma, N - q), \qquad
E = (C\hat B)^*\bigl[C(X^*X)^{-1}C^*\bigr]^{-1}(C\hat B) \sim W_p(\Sigma, g)
\]
are independent, where $P_X$ is the projection matrix generated by X. Then the largest root of $\det(\lambda\frac{D}{N-q} - \frac{E}{g}) = 0$ can be used as a statistic for the test. One can further refer to pages 210–213 of [11] for constructing a confidence interval for linear combinations of the entries of B.
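Under the Gaussian assumption, this statistic is easy to assemble. A minimal sketch (our own illustration; here p must not exceed $N - q$ so that D is invertible):

```python
# Hypothetical MANOVA-type statistic of Section 3.4.
import numpy as np
from scipy.linalg import eigvals

def manova_largest_root(Y, X, C):
    N, q = X.shape
    g = C.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                # least squares estimator
    P = X @ XtX_inv @ X.T                    # projection onto col(X)
    D = Y.T @ (np.eye(N) - P) @ Y            # ~ W_p(Sigma, N - q) under H0
    CB = C @ B_hat
    E = CB.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CB   # ~ W_p(Sigma, g)
    lam = eigvals(E / g, D / (N - q)).real   # det(lambda*D/(N-q) - E/g) = 0
    return np.max(lam[np.isfinite(lam)])
```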
3.5. Simulations. We conduct numerical simulations to check the accuracy of the distributional approximations in Theorem 2.1 under various settings of $(p, m, n)$ and of the distribution of X. We also study the power of the hypothesis tests in Sections 3.1–3.2.

As in [15], we use $\ln(\lambda_1)$ to run the simulations, so we first derive its asymptotic distribution. By [15] and (2.10), we can write
\[
\lambda_1 = \mu_p + \frac{Z}{\sigma_p\breve n^{2/3}} + o_p\bigl(\breve n^{-2/3}\bigr), \tag{3.7}
\]
where $Z = F_1^{-1}(U)$ and U is a $U(0,1)$ random variable. By Taylor expansion, we then have
\[
\ln(\lambda_1) = \ln(\mu_p) + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p\bigl(\breve n^{-2/3}\bigr). \tag{3.8}
\]
Recall from Section 2 that $|\frac{\breve m}{\breve n}\mu_{J,p} - \mu_p| = O(p^{-1})$ and $\lim_{p\to\infty}\sigma_p\frac{\breve m}{\breve n^{1/3}}\sigma_{J,p} = 1$. Summarizing the above, we find
\[
\lim_{p\to\infty} P\bigl(\sigma_p^{\ln}\bigl(\ln(\lambda_1) - \mu_p^{\ln}\bigr) \le s\bigr) = F_1(s), \tag{3.9}
\]
where
\[
\mu_p^{\ln} = \ln\Bigl(\frac{\breve m}{\breve n}\mu_{J,p}\Bigr), \qquad \sigma_p^{\ln} = \frac{\mu_{J,p}}{\sigma_{J,p}}. \tag{3.10}
\]
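A minimal self-contained sketch of the log-scale centering and scaling in (3.10), combining it with (2.1)–(2.2) (the helper name is our own):

```python
# Hypothetical helper for (3.10); the renormalized statistic
# sigma_ln * (log(lambda_1) - mu_ln) is compared with TW1 quantiles.
import numpy as np

def log_scale_params(p, m, n):
    m_b, n_b, p_b = max(m, p), min(n, m + n - p), min(m, p)
    gamma = 2 * np.arcsin(np.sqrt((min(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    psi = 2 * np.arcsin(np.sqrt((max(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    mu_J = np.tan((gamma + psi) / 2) ** 2                         # (2.2)
    sigma_J = (16 * mu_J ** 3 / (
        (m_b + n_b - 1) ** 2 * np.sin(gamma) * np.sin(psi)
        * np.sin(gamma + psi) ** 2
    )) ** (1 / 3)
    return np.log(m_b / n_b * mu_J), mu_J / sigma_J               # (3.10)
```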
3.5.1. Accuracy of approximations for TW laws and size. We conduct numerical simulations to check the accuracy of the distributional approximations in Theorem 2.1; these also give the sizes of the tests.
Table 1 was produced with the software R. We set two initial triples $(p, m, n)$, $M_0 = (5, 40, 10)$ and $M_1 = (30, 20, 25)$, and then consider $2M_i$, $3M_i$ and $4M_i$, $i = 0, 1$. The triples $M_0$ and $M_1$ correspond to invertible $YY^*$ and noninvertible $YY^*$, respectively. For each case, we generate 10,000 pairs (X, Y) whose entries follow the standard normal distribution. We calculate the largest root of $\det(\lambda\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}) = 0$ to obtain $\ln(\lambda_1)$ and renormalize it with $\mu_p^{\ln}$ and $\sigma_p^{\ln}$.
TABLE 1 
Standard quantiles for several triples (p,m,n): Gaussian case 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0208 0.0133 0.0124 0.0115 0.0017 0.0035 0.0048 0.0060 0.002 
−3.18 0.05 0.0680 0.0601 0.0562 0.0582 0.0210 0.0276 0.0327 0.0370 0.004 
−2.78 0.1 0.1176 0.1120 0.1088 0.1095 0.0608 0.0712 0.0808 0.0842 0.006 
−1.91 0.3 0.3154 0.3030 0.3080 0.3084 0.2641 0.2744 0.2864 0.2909 0.009 
−1.27 0.5 0.5139 0.5070 0.5051 0.5082 0.4839 0.4904 0.4960 0.4964 0.01 
−0.59 0.7 0.7073 0.7154 0.7012 0.7111 0.7055 0.7031 0.7019 0.7005 0.009 
0.45 0.9 0.9083 0.9058 0.9047 0.9090 0.9040 0.9010 0.9016 0.9003 0.006 
0.98 0.95 0.9561 0.9544 0.9517 0.9557 0.9489 0.9530 0.9504 0.9498 0.004 
2.02 0.99 0.9919 0.9909 0.9913 0.9919 0.9878 0.9887 0.9897 0.9901 0.002 
The “Percentile” column lists the quantiles of the TW1 law corresponding to the probabilities in the “TW” column. Columns 3–10 report the values of the empirical distributions of the renormalized $\lambda_1$ for the various triples at the corresponding quantiles, and the standard errors based on binomial sampling are listed in the last column. QQ-plots for the triples (20,160,40) and (120,80,100) are shown in Figure 1.
Tables 2 and 3 and Figures 2 and 3 parallel Table 1 and Figure 1, except that the Gaussian distribution is replaced by a discrete distribution and by a continuous uniform distribution, respectively.
For the tests in Sections 3.1–3.3, one may also refer to Tables 1–3 for their sizes at the nominal significance levels.
FIG. 1. QQ plots of the triples (20,160,40) and (120,80,100) corresponding to Table 1. 
TABLE 2 
Standard quantiles for several triples (p,m,n): Discrete distribution with the probability mass 
function P(x = √3) = P(x = −√3) = 1/6 and P(x = 0) = 2/3 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0192 0.0132 0.0136 0.0123 0.0006 0.0031 0.0046 0.0047 0.002 
−3.18 0.05 0.0637 0.0581 0.0571 0.0573 0.0216 0.0302 0.0321 0.0356 0.004 
−2.78 0.1 0.1147 0.1101 0.1099 0.1088 0.0626 0.0733 0.0757 0.0824 0.006 
−1.91 0.3 0.3100 0.2966 0.3060 0.3029 0.2665 0.2721 0.2808 0.2827 0.009 
−1.27 0.5 0.5000 0.4959 0.4969 0.4996 0.4841 0.4834 0.4985 0.4899 0.01 
−0.59 0.7 0.7025 0.7013 0.7099 0.7018 0.6990 0.6992 0.7109 0.6975 0.009 
0.45 0.9 0.9107 0.9061 0.9071 0.9036 0.9014 0.9040 0.9059 0.9001 0.006 
0.98 0.95 0.9566 0.9546 0.9538 0.9546 0.9503 0.9527 0.9526 0.9512 0.004 
2.02 0.99 0.9929 0.994 0.9903 0.9914 0.9890 0.9908 0.9901 0.9894 0.002 
FIG. 2. QQ plots of the triples (20,160,40) and (120,80,100) corresponding to Table 2. 
TABLE 3 
Standard quantiles for several triples (p,m,n): Continuous uniform distribution U(−√3,√3) 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0098 0.0117 0.0122 0.0120 0.0101 0.0087 0.0092 0.0096 0.002 
−3.18 0.05 0.0612 0.0632 0.0606 0.0592 0.0514 0.0462 0.0492 0.0482 0.004 
−2.78 0.1 0.1205 0.1243 0.1208 0.1197 0.1023 0.0942 0.1033 0.0992 0.006 
−1.91 0.3 0.3644 0.3542 0.351 0.3432 0.3132 0.2946 0.3101 0.3017 0.009 
−1.27 0.5 0.5767 0.5575 0.5563 0.5496 0.516 0.5073 0.5151 0.5069 0.01 
−0.59 0.7 0.7728 0.7540 0.7443 0.7440 0.7182 0.7123 0.714 0.7171 0.009 
0.45 0.9 0.9397 0.9243 0.9181 0.9202 0.9141 0.9068 0.9071 0.9059 0.006 
0.98 0.95 0.9722 0.9672 0.9599 0.9614 0.9584 0.9538 0.9556 0.9534 0.004 
2.02 0.99 0.9959 0.9941 0.993 0.9922 0.9932 0.9912 0.9919 0.9916 0.002 
FIG. 3. QQ plots of the triples (120,320,160) and (320,140,200) corresponding to Table 3. 
3.5.2. Power study of the “Long-side spherical test (LSST) for separable covariance matrices” (see Section 3.1). We consider the following alternative model:
\[
y_1 = z_1, \qquad y_t = ay_{t-1} + \sqrt{1 - a^2}\,z_t, \qquad t = 2, \ldots, N,
\]
where $\{y_t\}_{t=1}^N$ are p-dimensional vectors, $\{z_t\}_{t=1}^N$ are independent noise vectors satisfying Condition 1 and $a \in (0, 1)$. It is easy to see that $\{y_t\}_{t=1}^N$ form a stationary sequence, and hence the matrix T [see (3.1)] is a Toeplitz matrix. We then suggest choosing the set $S = \{1, \ldots, N/4\} \cup \{3N/4, \ldots, N\}$, so that the correlation structures involved in $Z_1$ and $Z_2$ are different. In this subsection, we also compare our approach (denoted by LSST) with the one proposed by Bao et al. in [3] (denoted by OSI).
The power of the tests is listed in Table 4 below; the nominal significance level of the tests is 5%.
From the table, we can see that the OSI approach does not gain power under the alternative hypothesis, which is consistent with our earlier analysis. The power of LSST increases as either the dimension or a becomes larger. When a is small, say 0.1, the power is very poor. This phenomenon is easy to understand: since $a^3 = 0.001 \approx 0$, that is, $\operatorname{Cov}(y_t, y_{t+3}) = 0.001 \approx 0$, the data $\{y_t\}_{t=1}^T$ look nearly independent.
3.5.3. Power of “Equality of K covariance matrices (EOM)” for K = 2. We study the power of the test under the alternative case (3.2). When $YY^*$ is invertible, we choose $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$ as mentioned below (3.2).
TABLE 4 
Power of several two-tuples (p,N): the entries of zt follow continuous uniform distribution 
U(−√3,√3) 
Initial two-tuples M0 = (5,40) and M1 = (30,40) 
Tuples Approach a = 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
M0 LSST 0.0198 0.0226 0.0251 0.0363 0.0608 0.1223 0.2510 0.4812 0.7789 
OSI 0.0593 0.0618 0.0645 0.0669 0.0782 0.0841 0.0935 0.1204 0.1845 
2M0 LSST 0.0324 0.0339 0.0389 0.0709 0.1669 0.3910 0.7120 0.9584 0.9996 
OSI 0.0590 0.0607 0.0622 0.0601 0.0631 0.0694 0.0741 0.0815 0.1028 
3M0 LSST 0.0388 0.0386 0.0474 0.1275 0.3062 0.6668 0.9516 0.9996 1.0000 
OSI 0.0616 0.0593 0.0574 0.0607 0.0590 0.0655 0.0665 0.0766 0.0854 
4M0 LSST 0.0389 0.0380 0.0688 0.1805 0.4707 0.8653 0.9961 1.0000 1.0000 
OSI 0.0579 0.0606 0.0583 0.0623 0.0598 0.0609 0.0669 0.0672 0.0753 
5M0 LSST 0.0390 0.0409 0.0862 0.2463 0.6321 0.9567 1.0000 1.0000 1.0000 
OSI 0.0551 0.0589 0.0550 0.0570 0.0641 0.0618 0.0651 0.0671 0.0704 
M1 LSST 0.0293 0.0321 0.0364 0.0507 0.0730 0.1025 0.1438 0.1868 0.2267 
OSI 0.0613 0.0615 0.0668 0.0676 0.0695 0.0678 0.0667 0.0705 0.0904 
2M1 LSST 0.0360 0.0369 0.0499 0.0878 0.1503 0.2467 0.3531 0.4604 0.5364 
OSI 0.0550 0.0605 0.0565 0.0601 0.0595 0.0613 0.0632 0.0578 0.0577 
3M1 LSST 0.0388 0.0453 0.0689 0.1333 0.2550 0.4188 0.5903 0.7249 0.7883 
OSI 0.0562 0.0602 0.0566 0.0552 0.0611 0.0558 0.0601 0.0566 0.0518 
4M1 LSST 0.0439 0.0478 0.0920 0.1871 0.3556 0.5922 0.7885 0.8914 0.9316 
OSI 0.0587 0.0562 0.0583 0.0608 0.0611 0.0577 0.0556 0.0559 0.0540 
5M1 LSST 0.0396 0.0504 0.1107 0.2458 0.4794 0.7570 0.9068 0.9659 0.9831 
OSI 0.0566 0.0585 0.0579 0.0607 0.0580 0.0622 0.0603 0.0576 0.0538 
When $YY^*$ is not invertible, by Theorem 1.2 of [2] we find that the smallest nonzero eigenvalue of $\Sigma^{-1/2}(YY^*/\breve m)\Sigma^{-1/2}$ is not spiked for the above $\Sigma$, so it is hard to obtain a spiked F matrix. We therefore use instead the matrix
\[
\Sigma(\omega) = \operatorname{diag}(1, \omega, 1, \omega, \ldots, 1, \omega),
\]
whose diagonal entries alternate between 1 and $\omega$.
In Tables 5–7, the data X and Y are generated as in Tables 1–3, and the nominal significance level of our test is 5%.
TABLE 5 
Power of several triples (p,m,n): Gaussian distribution 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0672 0.0585 0.0563 0.0593 0.3 0.2178 0.4934 0.7071 0.8419 
2 0.2763 0.3801 0.4551 0.5067 0.6 0.0574 0.1332 0.2241 0.3106 
4 0.6291 0.816 0.9072 0.9567 2 0.1037 0.2166 0.3463 0.5029 
6 0.8162 0.9543 0.988 0.9967 3 0.2242 0.5521 0.8156 0.9537 
TABLE 6 
Power of several triples (p,m,n): Discrete distribution with the probability mass function 
P(x = √3) = P(x = −√3) = 1/6 and P(x = 0) = 2/3 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0674 0.0573 0.0576 0.0595 0.3 0.2101 0.4883 0.7024 0.8425 
2 0.3045 0.397 0.4561 0.5171 0.6 0.057 0.1382 0.2176 0.3078 
4 0.647 0.8137 0.8984 0.9478 2 0.1055 0.2232 0.3504 0.4974 
6 0.8147 0.943 0.9813 0.9936 3 0.2254 0.5487 0.8211 0.9529 
TABLE 7 
Power of several triples (p,m,n): Continuous uniform distribution U(−√3,√3) 
Initial triple M0 = (30,80,40) Initial triple M1 = (80,40,50) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0394 0.0452 0.0497 0.0440 0.3 0.8209 0.9865 0.9995 1.0000 
2 0.4495 0.6622 0.7989 0.8708 0.6 0.2712 0.5675 0.7669 0.8765 
4 0.9791 0.9998 1.0000 1.0000 2 0.3115 0.7163 0.9381 0.9937 
6 1.0000 1.0000 1.0000 1.0000 3 0.7285 0.9933 1.0000 1.0000 
From Tables 5–7, we can see that when $\tau = 0.5 < 1$, $(YY^*)^{-1}\Sigma^{1/2}XX^*\Sigma^{1/2}$ is not a spiked F matrix and the power is poor. When $\tau > 1$, it is a spiked F matrix and, by Proposition 1, the power increases with the dimension and with $\tau$. This phenomenon is due to the fact that a finite-rank perturbation that is too weak does not cause a significant change to the largest eigenvalue of an F matrix. This phenomenon has been widely discussed for sample covariance matrices; see [10] and [3]. For the spiked F matrix, one can refer to [5] and [34]. For the noninvertible case, when $\Sigma$ is far from I ($\omega = 0.3$ or 3), the power improves. This is because, when the empirical spectral distribution (ESD) of $\Sigma^{-1/2}YY^*\Sigma^{-1/2}$ is very different from the M–P law, $\lambda_1$ may tend to another point $\mu$ instead of $\mu_p$; we may then gain good power because $\breve n^{2/3}(\mu - \mu_p)$ may tend to infinity.
TABLE 8 
Power comparison of several triples (p,m,n): Gaussian distribution with M0 = (5,40,10) 
EOM CLRT 
τ M0 2M0 3M0 4M0 M0 2M0 3M0 4M0 
0.5 0.0672 0.0585 0.0563 0.0593 0.1051 0.0909 0.0777 0.0787 
2 0.2763 0.3801 0.4551 0.5067 0.2696 0.2614 0.2503 0.2500 
4 0.6291 0.816 0.9072 0.9567 0.5385 0.5867 0.6126 0.6282 
6 0.8162 0.9543 0.988 0.9967 0.7318 0.8157 0.8499 0.8676 
Table 8 lists the power comparison between our approach (denoted by EOM) and CLRT, where we only consider the alternative case (3.2) with $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$. The nominal significance level is again 5%.
The comparison shows that when $\Sigma$ is a spiked covariance matrix, our statistic performs better than CLRT, which is consistent with our discussion in Section 3.2.
3.5.4. Power of “correlated noise detection”. We consider the model (3.6), that is, $y_t = Ax_t + \Sigma^{1/2}\varepsilon_t$, $t = 1, 2, \ldots, T$. Here, we choose $k = 1$, $A = \sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1$, $\Sigma = I$, $x_t \sim U(-\sqrt3, \sqrt3)$, and the entries of $\varepsilon_t$ also follow the uniform distribution $U(-\sqrt3, \sqrt3)$. Then it is easy to see that $\operatorname{Cov}(y_t) = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$, which is similar to Section 3.5.3. Since, as discussed in Section 3.3, in engineering we can generate an i.i.d. copy of $\varepsilon_t$, say $z_t$, $t = 1, \ldots, T_1$, we always choose $T_1 > p$ for simplicity.
Table 9 lists the power study of this model; the nominal significance level of our test is 5%.
TABLE 9
Power of several triples $(p, T_1, T)$: Continuous uniform distribution $U(-\sqrt3, \sqrt3)$
Initial triple M0 = (30,80,40) 
τ M0 2M0 3M0 4M0 
0.5 0.0412 0.0436 0.0485 0.0469 
2 0.4485 0.6399 0.7624 0.8401 
4 0.9486 0.9983 0.9998 1.0000 
6 0.9964 1.0000 1.0000 1.0000 
The simulation results in Table 9 are similar to those for the spiked case in Table 7 above, but we would mention one difference. Write $y_t = (\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)\binom{s_t}{\varepsilon_t}$, with $(\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)$ being of size $p \times (p+1)$. However, Theorem 3.1 of [34] did not consider such a rectangular matrix. This means we cannot directly apply Proposition 1 to conclude that the power tends to 1 when $\tau > 1$. But when the $(p+1)$-dimensional vector $\binom{s_t}{\varepsilon_t}$ follows the standard multivariate Gaussian distribution, we have $y_t \overset{d}{=} (\tau\frac{p/m+r}{1-p/m}e_1e_1^T + I)^{1/2}z_t$, where $z_t \overset{d}{=} \varepsilon_t$. This means that Proposition 1 holds in the Gaussian case. From the simulation results, we conjecture that Theorem 3.1 of [34] and Proposition 1 still hold for this model even when $(\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)$ is not a square matrix.
4. Proof of part (i) of Theorem 2.1. In this section, we focus on part (i) of Theorem 2.1, that is, $(\breve m, \breve n, \breve p) = (m, n, p)$. In fact, Lemmas 1 and 2 below hold if we replace $(m, n, p)$ by $(\breve m, \breve n, \breve p)$.
4.1. Two key lemmas. In this subsection we first establish two key lemmas used in the proof of part (i) of Theorem 2.1. We begin with some notation and definitions. Throughout the paper, we use $M, M_0, M_0', M_0'', M_1, M_1''$ to denote generic positive constants whose values may differ from line to line. We also use D to denote a sufficiently large positive constant whose value may differ from line to line. We say that an event $\Omega$ holds with high probability if, for any large positive constant D,
\[
P\bigl(\Omega^c\bigr) \le n^{-D}
\]
for sufficiently large n. Recall the definition of $\gamma_j$ in (2.4). Let $c_{p,0} \in [0, a_p)$ satisfy
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr)^2 = \frac{n}{p}. \tag{4.1}
\]
The existence of $c_{p,0}$ will be verified in Lemma 1 below. Moreover, define
\[
\mu_{p,0} = \frac{1}{c_{p,0}}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr), \qquad
\sigma_{p,0}^3 = \frac{1}{c_{p,0}^3}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\Bigl(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr)^3\Bigr). \tag{4.2}
\]
Set $A_p = \frac{1}{m}YY^*$ and $B_p = \frac{1}{n}XX^*$. Rank the eigenvalues of the matrix $A_p$ as $\hat\gamma_1 \ge \hat\gamma_2 \ge \cdots \ge \hat\gamma_p$. Let $\hat c_p \in [0, \hat\gamma_p)$ satisfy
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr)^2 = \frac{n}{p}. \tag{4.3}
\]
The existence of $\hat c_p$ with high probability will be given in Lemma 2 below. Moreover, set
\[
\hat\mu_p = \frac{1}{\hat c_p}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr), \qquad
\hat\sigma_p^3 = \frac{1}{\hat c_p^3}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\Bigl(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr)^3\Bigr). \tag{4.4}
\]
We now discuss the properties of $c_p, c_{p,0}, \hat c_p, \mu_p, \mu_{p,0}, \hat\mu_p, \sigma_p, \sigma_{p,0}$, defined in (2.5)–(2.7) and (4.1)–(4.4), in the next two lemmas. These lemmas are crucial to the proof strategy, which transforms F matrices into an appropriate sample covariance matrix.
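Given the eigenvalues $\hat\gamma_j$ of $A_p$, the quantities in (4.3)–(4.4) reduce to one-dimensional root-finding. A hedged sketch (our own illustration, not code from the paper):

```python
# Hypothetical illustration of (4.3)-(4.4): solve for c_hat, then form
# mu_hat and sigma_hat from the observed eigenvalues of A_p.
import numpy as np
from scipy.optimize import brentq

def conditional_centering(gamma_hat, n):
    g = np.sort(gamma_hat)         # g[0] is the smallest eigenvalue of A_p
    p = g.size
    lhs = lambda c: np.sum((c / (g - c)) ** 2) / p - n / p       # (4.3)
    c = brentq(lhs, 1e-10, g[0] - 1e-10)  # left side grows to +inf at g[0]
    mu = (1 + np.sum(c / (g - c)) / n) / c                       # (4.4)
    sigma = ((1 + np.sum((c / (g - c)) ** 3) / n) / c ** 3) ** (1 / 3)
    return mu, sigma
```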
LEMMA 1. Under the conditions of Theorem 2.1, there exists a constant $M_0$ such that
\[
\sup_p \frac{c_p}{a_p - c_p} \le M_0, \qquad \sup_p \frac{c_{p,0}}{a_p - c_{p,0}} \le M_0, \tag{4.5}
\]
\[
\lim_{p\to\infty} n^{2/3}|\mu_p - \mu_{p,0}| = 0 \tag{4.6}
\]
and
\[
\lim_{p\to\infty}\frac{\sigma_p}{\sigma_{p,0}} = 1, \qquad \limsup_{p\to\infty}\frac{c_{p,0}}{a_p} < 1. \tag{4.7}
\]
LEMMA 2. Under the conditions of Theorem 2.1, for any $\zeta > 0$ there exists a constant $M_\zeta \ge M_0$ such that
\[
\sup_p \frac{\hat c_p}{\hat\gamma_p - \hat c_p} \le M_\zeta, \qquad \limsup_{p\to\infty}\frac{\hat c_p}{\hat\gamma_p} < 1 \tag{4.8}
\]
and
\[
\lim_{p\to\infty} n^{2/3}|\hat\mu_p - \mu_{p,0}| = 0, \qquad \lim_{p\to\infty}\frac{\hat\sigma_p}{\sigma_{p,0}} = 1 \tag{4.9}
\]
hold with high probability. Indeed, (4.8) and (4.9) hold on the event $S_\zeta$ defined by
\[
S_\zeta = \bigl\{\forall j, 1 \le j \le p,\ |\hat\gamma_j - \gamma_j| \le p^{\zeta}p^{-2/3}\tilde j^{-1/3}\bigr\}, \tag{4.10}
\]
where $\zeta$ is a sufficiently small positive constant and $\tilde j = \min\{\min\{m, p\} + 1 - j, j\}$.

The proofs of Lemmas 1 and 2 are given in the supplementary material [12].
4.2. Proof of part (i) of Theorem 2.1. Recall the definition of the matrices $A_p$ and $B_p$ above (4.3). Define the F matrix $F = A_p^{-1}B_p$, whose largest eigenvalue is $\lambda_1$ by the definition of $\lambda_1$ in Theorem 2.1. To prove Theorem 2.1, it then suffices to find the asymptotic distribution of $\lambda_1$.
Recalling the definition of the event $S_\zeta$ in (4.10), we may write
\[
P\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta\bigr) + P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta^c\bigr).
\]
This implies that (2.10) is equivalent to
\[
\lim_{p\to\infty} P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta\bigr) = F_1(s), \tag{4.11}
\]
where we use the fact that $P(S_\zeta^c) \le p^{-D}$ for any positive D by Theorem 3.3 of [25].
Write
\[
\sigma_p n^{2/3}(\lambda_1 - \mu_p) = \frac{\sigma_p}{\hat\sigma_p}\,\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) + \sigma_p n^{2/3}(\hat\mu_p - \mu_p) \tag{4.12}
\]
[see (4.3) and (4.4) for $\hat\sigma_p$ and $\hat\mu_p$]. Note that the eigenvalues of $A_p^{-1}$ are $\frac{1}{\hat\gamma_1} \le \frac{1}{\hat\gamma_2} \le \cdots \le \frac{1}{\hat\gamma_p}$. Rewrite (4.3) as
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr)^2 = \frac{n}{p}. \tag{4.13}
\]
Also recast (4.4) as
\[
\hat\mu_p = \frac{1}{\hat c_p}\Bigl(1 + \frac{p}{n}\cdot\frac{1}{p}\sum_{j=1}^{p}\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr), \qquad
\hat\sigma_p^3 = \frac{1}{\hat c_p^3}\Bigl(1 + \frac{p}{n}\cdot\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr)^3\Bigr). \tag{4.14}
\]
Up to this stage, the known results on the largest eigenvalue of sample covariance matrices $\Sigma^{1/2}ZZ^*\Sigma^{1/2}$, with $\Sigma$ being the population covariance matrix, come into play, where Z is of size $p \times n$ satisfying Condition 1 and $\Sigma$ is of size $p \times p$. A key condition ensuring the Tracy–Widom law for the largest eigenvalue is that, if $\rho \in (0, 1/\sigma_1)$ is the solution to the equation
\[
\int\Bigl(\frac{t\rho}{1 - t\rho}\Bigr)^2 dF(t) = \frac{n}{p}, \tag{4.15}
\]
then
\[
\limsup_{p\to\infty} \rho\sigma_1 < 1 \tag{4.16}
\]
(one may see [6], Conditions 1.2 and 1.4 and Theorem 1.3 of [3], and Definition 2.7(i) and Corollary 3.19 of [17]). Here, $F(t)$ denotes the empirical spectral distribution of $\Sigma$ and $\sigma_1$ denotes the largest eigenvalue of $\Sigma$. Now, given $A_p$, if we treat $A_p^{-1}$ as $\Sigma$, then (4.16) is satisfied on the event $S_\zeta$ due to (4.3) and (4.8) in Lemma 2. It follows from Theorem 1.3 of [3] and Corollary 3.19 of [17] that
\[
\lim_{p\to\infty} P\bigl(\bigl(\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) \le s\bigr) \cap S_\zeta \mid A_p\bigr) = F_1(s), \tag{4.17}
\]
which implies that
\[
\lim_{p\to\infty} P\bigl(\bigl(\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) \le s\bigr) \cap S_\zeta\bigr) = F_1(s). \tag{4.18}
\]
Moreover, by Lemmas 1 and 2 we obtain, on the event $S_\zeta$,
\[
\lim_{p\to\infty}\frac{\sigma_p}{\hat\sigma_p} = 1 \tag{4.19}
\]
and
\[
\lim_{p\to\infty}\sigma_p n^{2/3}(\hat\mu_p - \mu_p) = 0. \tag{4.20}
\]
Equation (4.11) then follows from (4.12), (4.17)–(4.20) and Slutsky's theorem. The proof is complete.
5. Proof of part (ii) of Theorem 2.1: Standard Gaussian distribution. This section considers the case where the $\{X_{ij}\}$ follow the normal distribution with mean zero and variance one. We first introduce more notation. Let $A = (A_{ij})$ be a matrix. We define the following norms:
\[
\|A\| = \max_{|x|=1}|Ax|, \qquad \|A\|_\infty = \max_{i,j}|A_{ij}|, \qquad \|A\|_F = \sqrt{\sum_{ij}|A_{ij}|^2},
\]
where $|x|$ denotes the Euclidean norm of a vector x. Notice that $\|\cdot\|_\infty$ is a “pseudo norm”, and we have the simple relationship
\[
\|A\|_\infty \le \|A\| \le \|A\|_F.
\]
We also need the following commonly used notion of stochastic domination to simplify the statements.

DEFINITION 2 (Stochastic domination). Let
\[
\xi = \bigl\{\xi^{(n)}(u) : n \in \mathbb{N}, u \in U^{(n)}\bigr\}, \qquad \zeta = \bigl\{\zeta^{(n)}(u) : n \in \mathbb{N}, u \in U^{(n)}\bigr\}
\]
be two families of random variables, where $U^{(n)}$ is an n-dependent parameter set (or independent of n). If for sufficiently small positive $\varepsilon$ and sufficiently large $\sigma$,
\[
\sup_{u \in U^{(n)}} P\bigl[\bigl|\xi^{(n)}(u)\bigr| > n^{\varepsilon}\bigl|\zeta^{(n)}(u)\bigr|\bigr] \le n^{-\sigma}
\]
for large enough $n \ge n_0(\varepsilon, \sigma)$, then we say that $\zeta$ stochastically dominates $\xi$ uniformly in u. We denote this relationship by $|\xi| \prec \zeta$ and also write it as $\xi = O_\prec(\zeta)$. Furthermore, we also write $|x| \prec y$ if x and y are both nonrandom and $|x| \le n^{\varepsilon}|y|$ for sufficiently small positive $\varepsilon$.
PROOF OF PART (ii) OF THEOREM 2.1. We start the proof by reminding readers that $m < p$. Since $m < p$, the limiting spectral distribution of $\frac{1}{p}Y^*Y$ is the M–P law, and we denote its density by $\rho_{pm}(x)$. We define $\gamma_{m,1} \ge \gamma_{m,2} \ge \cdots \ge \gamma_{m,m}$ to satisfy
\[
\int_{\gamma_{m,j}}^{+\infty}\rho_{pm}(x)\,dx = \frac{j}{m}, \tag{5.1}
\]
with $\gamma_{m,0} = (1 + \sqrt{m/p})^2$ and $\gamma_{m,m} = (1 - \sqrt{m/p})^2$. Correspondingly, denote the eigenvalues of $\frac{1}{p}Y^*Y$ by $\hat\gamma_{m,1} \ge \hat\gamma_{m,2} \ge \cdots \ge \hat\gamma_{m,m}$. Here we remind readers that $\rho_{pm}(x)$, $\gamma_{m,j}$ and $\hat\gamma_{m,j}$ are analogous to the quantities in (2.3), below (2.3) and above (4.3), except that the roles of p and m are interchanged because we consider $\frac{1}{p}Y^*Y$ rather than $\frac{1}{p}YY^*$. Moreover, by Theorem 3.3 of [25] and (4.10), for any sufficiently small $\zeta > 0$ and large $D > 0$ there exists an event $S_\zeta$ (with a slight abuse of notation) such that
\[
S_\zeta = \bigl\{\forall j, 1 \le j \le m,\ |\hat\gamma_{m,j} - \gamma_{m,j}| \le p^{\zeta - 2/3}\tilde j^{-1/3}\bigr\} \tag{5.2}
\]
and
\[
P\bigl(S_\zeta^c\bigr) \le p^{-D}. \tag{5.3}
\]
Note that $\frac{1}{p}YY^*$ and $\frac{1}{p}Y^*Y$ have the same nonzero eigenvalues. To simplify notation, let $m_p = m + n - p$. Write
\[
\frac{1}{p}YY^* = U^*\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}U, \tag{5.4}
\]
where $D = \operatorname{diag}\{\hat\gamma_{m,1}, \hat\gamma_{m,2}, \ldots, \hat\gamma_{m,m}\}$ and U is an orthogonal matrix. Then $\det(\lambda\frac{YY^*}{p} - \frac{XX^*}{m_p}) = 0$ is equivalent to
\[
\det\Bigl(\lambda\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{m_p}UXX^*U^*\Bigr) = 0.
\]
Moreover, since the $\{X_{ij}\}$ are independent standard normal random variables and U is an orthogonal matrix, we have $UX \overset{d}{=} X$, so it suffices to consider the determinantal equation
\[
\det\Bigl(\lambda\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{m_p}XX^*\Bigr) = 0. \tag{5.5}
\]
Here, $\overset{d}{=}$ means having the same distribution.
Now rewrite X as $X = \binom{X_1}{X_2}$, where $X_1$ is an $m \times n$ matrix and $X_2$ is a $(p-m) \times n$ matrix. It follows that
\[
XX^* = \begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix} \triangleq \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}. \tag{5.6}
\]
Equation (5.5) can be rewritten as
\[
\det\begin{pmatrix} \frac{1}{m_p}X_{11} - \lambda D & \frac{1}{m_p}X_{12} \\ \frac{1}{m_p}X_{21} & \frac{1}{m_p}X_{22} \end{pmatrix} = 0.
\]
Since $m + n > p$, $X_{22}$ is invertible. Equation (5.5) is thus further equivalent to
\[
\det\Bigl(\frac{1}{m_p}X_{11} - \lambda D - \frac{1}{m_p}X_{12}X_{22}^{-1}X_{21}\Bigr) = 0. \tag{5.7}
\]
Moreover,
\[
X_{11} - X_{12}X_{22}^{-1}X_{21} = X_1X_1^* - X_1X_2^*\bigl(X_2X_2^*\bigr)^{-1}X_2X_1^* = X_1\bigl(I_n - X_2^*(X_2X_2^*)^{-1}X_2\bigr)X_1^*.
\]
Since $\operatorname{rank}(I_n - X_2^*(X_2X_2^*)^{-1}X_2) = m + n - p = m_p$, we can write
\[
I_n - X_2^*\bigl(X_2X_2^*\bigr)^{-1}X_2 = V\begin{pmatrix} I_{m_p} & 0 \\ 0 & 0 \end{pmatrix}V^*,
\]
where V is an orthogonal matrix. In view of the above, we can construct an $m \times m_p$ matrix $Z = (Z_{ij})_{m,m_p}$ consisting of independent standard normal random variables such that
\[
X_{11} - X_{12}X_{22}^{-1}X_{21} \overset{d}{=} ZZ^*. \tag{5.8}
\]
It follows that (5.7), and hence (5.5), is equivalent to
\[
\det\Bigl(\frac{1}{m_p}ZZ^* - \lambda D\Bigr) = 0. \tag{5.9}
\]
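The algebra behind (5.6)–(5.8) can be checked numerically. A small sketch (our own illustration, not code from the paper):

```python
# Hypothetical numerical check of the Schur-complement identity behind
# (5.6)-(5.8) and of the rank count used for (5.8).
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 12, 8, 10                    # m < p < m + n
X1 = rng.standard_normal((m, n))
X2 = rng.standard_normal((p - m, n))

X11, X12 = X1 @ X1.T, X1 @ X2.T
X21, X22 = X2 @ X1.T, X2 @ X2.T

schur = X11 - X12 @ np.linalg.inv(X22) @ X21
proj = np.eye(n) - X2.T @ np.linalg.inv(X2 @ X2.T) @ X2   # I - P
assert np.allclose(schur, X1 @ proj @ X1.T)
assert np.linalg.matrix_rank(proj) == m + n - p           # = m_p
```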
It thus suffices to consider the largest eigenvalue of $\frac{1}{m_p}D^{-1}ZZ^*$; denote it by $\lambda_1$. As in (4.3) and (4.4), define $\hat c_m \in [0, \hat\gamma_{m,m})$ to satisfy
\[
\frac{1}{m}\sum_{j=1}^{m}\Bigl(\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr)^2 = \frac{m_p}{m} \tag{5.10}
\]
and define $\hat\mu_m$ and $\hat\sigma_m$ by
\[
\hat\mu_m = \frac{1}{\hat c_m}\Bigl(1 + \frac{1}{m_p}\sum_{j=1}^{m}\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr), \qquad
\hat\sigma_m^3 = \frac{1}{\hat c_m^3}\Bigl(1 + \frac{1}{m_p}\sum_{j=1}^{m}\Bigl(\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr)^3\Bigr).
\]
From Lemma 2, we have on the event $S_\zeta$
\[
\limsup_{p\to\infty}\frac{\hat c_m}{\hat\gamma_{m,m}} < 1, \tag{5.11}
\]
which implies condition (4.16). It follows from Theorem 1.3 of [3] and Corollary 3.19 of [17] that
\[
\lim_{p\to\infty} P\bigl(\hat\sigma_m(m+n-p)^{2/3}(\lambda_1 - \hat\mu_m) \le s\bigr) = F_1(s). \tag{5.12}
\]
As in the proof of part (i) of Theorem 2.1, by Lemmas 1 and 2 one may further conclude that
\[
\lim_{p\to\infty} P\bigl(\sigma_p(m+n-p)^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = F_1(s). \tag{5.13}
\]
6. Proof of part (ii) of Theorem 2.1: General distributions. The aim of this section is to relax the Gaussian assumption on X. We assume below that X and Y are real matrices; the complex case can be handled similarly and is omitted. In the sequel, we absorb $\frac{1}{\sqrt{m+n-p}}$ and $\frac{1}{\sqrt{p}}$ into X and Y, respectively [i.e., $\operatorname{Var}(X_{ij}) = \frac{1}{m+n-p}$ and $\operatorname{Var}(Y_{st}) = \frac{1}{p}$] for convenience.
In terms of the notation of this section [$\operatorname{Var}(Y_{st}) = \frac{1}{p}$], (5.4) can be rewritten as
\[
YY^* = U^*\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}U.
\]
Break U as $\binom{U_1}{U_2}$, where $U_1$ and $U_2$ are $m \times p$ and $(p-m) \times p$, respectively. By (5.4)–(5.7) (note that here we cannot remove U via $UX \overset{d}{=} X$), the largest root of $\det(\lambda YY^* - XX^*) = 0$ equals the largest eigenvalue of the matrix
\[
A = D^{-1/2}U_1X\bigl(I - X^TU_2^T\bigl(U_2XX^TU_2^T\bigr)^{-1}U_2X\bigr)X^TU_1^TD^{-1/2} = D^{-1/2}U_1X\bigl(I - P_{X^TU_2^T}\bigr)X^TU_1^TD^{-1/2}, \tag{6.1}
\]
where $P_{X^TU_2^T}$ is the corresponding projection matrix. It is not necessary to assume that $U_2XX^TU_2^T$ is invertible, since $P_{X^TU_2^T}$ is unique even when $(U_2XX^TU_2^T)^-$ is a generalized inverse of $U_2XX^TU_2^T$. Moreover, we in fact have the following lemma controlling the smallest eigenvalue of $U_2XX^TU_2^T$.
LEMMA 3. Suppose that $(m+n-p)^{1/2}X$ satisfies Condition 1. Then $U_2XX^TU_2^T$ is invertible and
\[
\bigl\|\bigl(U_2XX^TU_2^T\bigr)^{-1}\bigr\| \le M \tag{6.2}
\]
for a large constant M with high probability. Moreover,
\[
\|XX^*\| \le M \tag{6.3}
\]
with high probability under the conditions of Theorem 2.1.
PROOF. One may check that the conditions of Theorem 3.12 in [17] are satisfied for $U_2XX^TU_2^T$. Applying Theorem 3.12 in [17] then yields
\[
\Bigl|\lambda_{\min}\bigl(U_2XX^TU_2^T\bigr) - \frac{n}{m+n-p}\Bigl(1 - \sqrt{\frac{p-m}{n}}\Bigr)^2\Bigr| \prec n^{-2/3},
\]
where the centering can be identified by considering the special case in which the entries of X are Gaussian. As for (6.3), see Lemma 4.8 in [17]. □
Since the matrix in (6.1) is quite complicated, we construct a linearization matrix for it:
\[
H = H(X) = \begin{pmatrix} -zI & 0 & D^{-1/2}U_1X \\ 0 & 0 & U_2X \\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I \end{pmatrix}. \tag{6.4}
\]
The connection between H and the matrix in (6.1) is that, by a simple calculation, the upper-left block of the $3 \times 3$ block matrix $H^{-1}$ is the Stieltjes transform of (6.1).
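This connection can be verified numerically: the upper-left $m \times m$ block of $H^{-1}$ is the resolvent $(A - zI)^{-1}$ of the matrix A in (6.1), whose normalized trace is its Stieltjes transform. A small sketch (our own illustration):

```python
# Hypothetical check: the upper-left m x m block of H(X)^{-1} equals the
# resolvent (A - zI)^{-1} of the matrix A in (6.1).
import numpy as np

rng = np.random.default_rng(2)
p, m, n = 12, 8, 15
z = 0.5 + 0.3j

Y = rng.standard_normal((p, m)) / np.sqrt(p)     # Var(Y_st) = 1/p
w, V = np.linalg.eigh(Y @ Y.T)                   # YY* = V diag(w) V^T
U = V[:, ::-1].T                                 # rows = eigenvectors, descending
D = np.diag(w[::-1][:m])                         # nonzero eigenvalues of YY*
U1, U2 = U[:m], U[m:]
X = rng.standard_normal((p, n)) / np.sqrt(m + n - p)

B1 = np.linalg.inv(np.sqrt(D)) @ U1 @ X          # D^{-1/2} U_1 X
B2 = U2 @ X                                      # U_2 X
P = B2.T @ np.linalg.inv(B2 @ B2.T) @ B2         # P_{X^T U_2^T}
A = B1 @ (np.eye(n) - P) @ B1.T                  # the matrix (6.1)

H = np.block([
    [-z * np.eye(m), np.zeros((m, p - m)), B1],
    [np.zeros((p - m, m)), np.zeros((p - m, p - m)), B2],
    [B1.T, B2.T, -np.eye(n)],
])
G = np.linalg.inv(H)
assert np.allclose(G[:m, :m], np.linalg.inv(A - z * np.eye(m)))
```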
We next give the limit of the Stieltjes transform of (6.1), for which we need the following well-known result (see [1]): there exists a unique solution $\underline m(z) : \mathbb{C}^+ \to \mathbb{C}^+$ such that
\[
\frac{1}{\underline m(z)} = -z + \frac{m}{m+n-p}\int \frac{t}{1 + t\underline m(z)}\,dH_n(t), \tag{6.5}
\]
where $H_n$ is the empirical distribution function of $D^{-1}$. Moreover, we set
\[
m(z) = -\frac{1}{m}\operatorname{Tr}\bigl(z\bigl(1 + \underline m(z)D^{-1}\bigr)\bigr)^{-1}, \qquad \rho(x) = \lim_{z\in\mathbb{C}^+\to x}\frac{1}{\pi}\,\Im m(z).
\]
From the end of the last section, we see that in the Gaussian case (6.1) $\overset{d}{=} D^{-1/2}ZZ^*D^{-1/2}$. Hence it is easy to see that $\hat\mu_m$, defined above (5.11), is the rightmost end point of the support of $\rho(x)$.
For any small positive constant $\tau$, we define the domains
\[
E(\tau, n) = \bigl\{z = E + i\eta \in \mathbb{C}^+ : |z| \ge \tau, |E| \le \tau^{-1}, n^{-1+\tau} \le \eta \le \tau^{-1}\bigr\}, \tag{6.6}
\]
\[
E^+ = E^+(\tau, \tau', n) = \bigl\{z \in E(\tau, n) : E \ge \hat\mu_m - \tau'\bigr\}, \tag{6.7}
\]
where $\tau'$ is a sufficiently small positive constant.
Set
\[
\Psi = \Psi(z) = \sqrt{\frac{\Im m(z)}{n\eta}} + \frac{1}{n\eta}, \tag{6.8}
\]
\[
G(z) = H^{-1}, \qquad \Pi = \Pi(z) = z^{-1}\bigl(1 + \underline m(z)D^{-1}\bigr)^{-1}
\]
and
\[
F(z) = \begin{pmatrix}
\Pi & -\Pi D^{-1/2}U_1XX^TU_2^T\Omega & 0 \\
-\Omega U_2XX^TU_1^TD^{-1/2}\Pi & \Omega + \Omega U_2XX^TU_1^TD^{-1/2}\Pi D^{-1/2}U_1XX^TU_2^T\Omega & \Omega U_2X \\
0 & X^TU_2^T\Omega & \bigl(z\underline m(z)+1\bigr)\bigl(I - P_{X^TU_2^T}\bigr)
\end{pmatrix},
\]
where $\Omega = (U_2XX^TU_2^T)^{-1}$. In fact, F(z) is close to G(z) with high probability.
We are now in a position to state our main result on the local law near $\hat\mu_m$, the right end point of the support of the limit of the ESD of A in (6.1).
THEOREM 6.1 (Strong local law). Suppose that $(m+n-p)^{1/2}X$ and $p^{1/2}Y$ satisfy the conditions of Theorem 2.1. Then:
(i) for any deterministic unit vectors $v, w \in \mathbb{R}^{p+n}$,
\[
\bigl\langle v, \bigl(G(z) - F(z)\bigr)w\bigr\rangle \prec \Psi \tag{6.9}
\]
uniformly in $z \in E^+$; and
(ii)
\[
\bigl|m_n(z) - m(z)\bigr| \prec \frac{1}{n\eta} \tag{6.10}
\]
uniformly in $z \in E^+$, where $m_n(z) = \frac{1}{m}\sum_{i=1}^{m} G_{ii}$.
The proof of Theorem 6.1 is provided in the supplementary material [12]. 
6.1. Convergence rate at the right edge and universality.

6.1.1. Convergence rate at the right edge. We state the following lemma; its proof is given in the supplementary material [12].

LEMMA 4. Denote by $\lambda_1$ the largest eigenvalue of A in (6.1). Under the conditions of Theorem 2.1,
\[
\lambda_1 - \hat\mu_m = O_\prec\bigl(n^{-2/3}\bigr).
\]
6.1.2. Universality. The aim of this subsection is to prove part (ii) of Theorem 2.1. By (5.12) and (5.13), it suffices to prove edge universality at the rightmost edge $\hat\mu_m$ of the support; in other words, that the asymptotic distribution of $\lambda_1$ is not affected by the distribution of the entries of X under the third-order moment matching condition. Similarly to Theorem 6.4 of [9], we first show the following Green function comparison theorem.

THEOREM 6.2. There exists $\varepsilon_0 > 0$ such that the following holds for any $\varepsilon < \varepsilon_0$. Set $\eta = n^{-2/3-\varepsilon}$ and let $E_1, E_2 \in \mathbb{R}$ with $E_1 < E_2$ and $|E_1 - \hat\mu_m|, |E_2 - \hat\mu_m| \le n^{-2/3+\varepsilon}$. Suppose that $K : \mathbb{R} \to \mathbb{R}$ is a smooth function with bounded derivatives up to fifth order. Then there exists a constant $\phi > 0$ such that for large enough n,
\[
\Bigl|EK\Bigl(\frac{n}{\pi}\int_{E_1}^{E_2}\Im m_{X^1}(x + i\eta)\,dx\Bigr) - EK\Bigl(\frac{n}{\pi}\int_{E_1}^{E_2}\Im m_{X^0}(x + i\eta)\,dx\Bigr)\Bigr| \le n^{-\phi} \tag{6.11}
\]
[see Definition 1 or (2.8) for $X^1$ and $X^0$].
The proof of Theorem 6.2 is given in the supplementary material [12].
In order to prove the Tracy–Widom law, we need to connect the probability $P(\lambda_1 \le E)$ with Theorem 6.2. By Lemma 4, we can fix $E^* \prec n^{-2/3}$ such that it suffices to consider $\lambda_1 \le \hat\mu_m + E^*$. Choosing $|E - \hat\mu_m| \prec n^{-2/3}$, $\eta = n^{-2/3-9\varepsilon}$ and $l = \frac{1}{2}n^{-2/3-\varepsilon}$, for some sufficiently small constant $\varepsilon > 0$ and sufficiently large constant D there exists a constant $n_0(\varepsilon, D)$ such that
\[
EK\Bigl(\frac{n}{\pi}\int_{E-l}^{\hat\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Bigr) \le P(\lambda_1 \le E) \le EK\Bigl(\frac{n}{\pi}\int_{E+l}^{\hat\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Bigr) + n^{-D} \tag{6.12}
\]
for $n \ge n_0(\varepsilon, D)$, where K is a smooth cutoff function satisfying the conditions on K in Theorem 6.2. We omit the proof of (6.12), since it is a standard procedure; one can refer to [9] or Corollary 5.1 of [4], for instance. Combining (6.12) with Theorem 6.2, one can prove the Tracy–Widom law directly (see the proof of Theorem 1.3 of [3]).
SUPPLEMENTARY MATERIAL
Supplement: Proof of some lemmas and theorems (DOI: 10.1214/15-AOS1427SUPP; .pdf). In the supplementary file [12], we provide the proofs of (2.11), Lemmas 1, 2 and 4, and Theorems 6.1 and 6.2.
REFERENCES 
[1] BAI, Z. and SILVERSTEIN, J. W. (2006). Spectral Analysis of Large Dimensional Random 
Matrices, 1st ed. Springer, New York. 
[2] BAIK, J. and SILVERSTEIN, J. W. (2006). Eigenvalues of large sample covariance matrices of 
spiked population models. J. Multivariate Anal. 97 1382–1408. MR2279680 
[3] BAO, Z., PAN, G. and ZHOU, W. (2015). Universality for the largest eigenvalue of sample 
covariance matrices with general population. Ann. Statist. 43 382–421. MR3311864 
[4] BAO, Z. G., PAN, G. M. and ZHOU, W. (2013). Local density of the spectrum on the edge for sample covariance matrices with general population. Preprint. Available at http://www.ntu.edu.sg/home/gmpan/publications.html.
[5] DHARMAWANSA, P., JOHNSTONE, I. M. and ONATSKI, A. (2014). Local asymptotic normality of the spectrum of high-dimensional spiked F-ratios. Available at http://arxiv.org/pdf/1411.3875.pdf.
[6] EL KAROUI, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. Ann. Probab. 35 663–714. MR2308592
[7] ERDŐS, L., KNOWLES, A. and YAU, H. (2013). Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincaré 14 1837–1926. MR3119922
[8] ERDŐS, L., SCHLEIN, B. and YAU, H. (2009). Local semicircle law and complete delocalization for Wigner random matrices. Comm. Math. Phys. 287 641–655. MR2481753
[9] ERDŐS, L., YAU, H. and YIN, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229 1435–1515. MR2871147
[10] FÉRAL, D. and PÉCHÉ, S. (2009). The largest eigenvalues of sample covariance matrices for 
a spiked population: Diagonal case. J. Math. Phys. 50 073302, 33. MR2548630 
[11] FUJIKOSHI, Y., ULYANOV, V. V. and SHIMIZU, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. Wiley, Hoboken, NJ. MR2640807
[12] HAN, X., PAN, G. and ZHANG, B. (2016). Supplement to “The Tracy–Widom law for the 
largest eigenvalue of F type matrices.” DOI:10.1214/15-AOS1427SUPP. 
[13] JOHANSSON, K. (2000). Shape fluctuations and random matrices. Comm. Math. Phys. 209 
437–476. MR1737991 
[14] JOHNSTONE, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327. MR1863961
[15] JOHNSTONE, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, 
Tracy–Widom limits and rates of convergence. Ann. Statist. 36 2638–2716. MR2485010 
[16] JOHNSTONE, I. M. (2009). Approximate null distribution of the largest root in multivariate 
analysis. Ann. Appl. Stat. 3 1616–1633. MR2752150 
[17] KNOWLES, A. and YIN, J. (2015). Anisotropic local laws for random matrices. Available at 
arXiv:1410.3516v3. 
[18] LEDOIT, O. and WOLF, M. (2002). Some hypothesis tests for the covariance matrix when the 
dimension is large compared to the sample size. Ann. Statist. 30 1081–1102. MR1926169 
[19] LEE, J. O. and SCHNELLI, K. (2014). Tracy–Widom distribution for the largest eigenvalue of 
real sample covariance matrices with general population. Available at arXiv:1409.4979v1. 
[20] LEVANON, N. (1988). Radar Principles. Wiley, New York. 
[21] MARČENKO, V. A. and PASTUR, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Sb. Math. 4 457–483.
[22] MUIRHEAD, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. 
MR0652932 
[23] NADAKUDITI, R. R. and SILVERSTEIN, J. W. (2010). Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signal-bearing and noise-only samples. IEEE J. Sel. Top. Signal Process. 4 468–480.
[24] PAUL, D. and SILVERSTEIN, J. W. (2009). No eigenvalues outside the support of the limiting 
empirical spectral distribution of a separable covariance matrix. J. Multivariate Anal. 100 
37–57. MR2460475 
[25] PILLAI, N. S. and YIN, J. (2014). Universality of covariance matrices. Ann. Appl. Probab. 24 
935–1001. MR3199978 
[26] SOSHNIKOV, A. (2002). A note on universality of the distribution of the largest eigenvalues in 
certain sample covariance matrices. J. Stat. Phys. 108 1033–1056. MR1933444 
[27] TAO, T. and VU, V. (2011). Random matrices: Universality of local eigenvalue statistics. Acta 
Math. 206 127–204. MR2784665 
[28] TAO, T. and VU, V. (2012). Random covariance matrices: Universality of local statistics of 
eigenvalues. Ann. Probab. 40 1285–1315. MR2962092 
[29] TRACY, C. A. and WIDOM, H. (1994). Level-spacing distributions and the Airy kernel. Comm. 
Math. Phys. 159 151–174. MR1257246 
[30] TRACY, C. A. and WIDOM, H. (1996). On orthogonal and symplectic matrix ensembles. 
Comm. Math. Phys. 177 727–754. MR1385083 
[31] VINOGRADOVA, J., COUILLET, R. and HACHEM, W. (2013). Statistical inference in large 
antenna arrays under unknown noise pattern. IEEE Trans. Signal Process. 61 5633–5645. 
MR3130031 
[32] WACHTER, K. W. (1980). The limiting empirical measure of multiple discriminant ratios. Ann. 
Statist. 8 937–957. MR0585695 
[33] WANG, K. (2012). Random covariance matrices: Universality of local statistics of eigenvalues 
up to the edge. Random Matrices Theory Appl. 1 1150005, 24. MR2930383 
[34] WANG, Q. and YAO, J. (2015). Extreme eigenvalues of large-dimensional spiked Fisher matrices with application. Available at http://arxiv.org/pdf/1504.05087.pdf.
[35] YAO, J. F., ZHENG, S. R. and BAI, Z. D. (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Univ. Press, Cambridge.
[36] ZENG, Y. H. and LIANG, Y. C. (2009). Eigenvalue-based spectrum sensing algorithms for cognitive radio. IEEE Trans. Commun. 57 1784–1793.
[37] ZHANG, L. X. (2006). Spectral Analysis of Large Dimensional Random Matrices. Ph.D. Thesis, National University of Singapore.
[38] ZHENG, S. (2012). Central limit theorems for linear spectral statistics of large dimensional 
F -matrices. Ann. Inst. Henri Poincaré Probab. Stat. 48 444–476. MR2954263 
SCHOOL OF PHYSICAL AND MATHEMATICAL SCIENCES 
NANYANG TECHNOLOGICAL UNIVERSITY 
SINGAPORE 
E-MAIL: xhan011@e.ntu.edu.sg 
gmpan@ntu.edu.sg 
bzhang007@e.ntu.edu.sg 
