The Annals of Statistics 
2016, Vol. 44, No. 4, 1564–1592 
DOI: 10.1214/15-AOS1427 
© Institute of Mathematical Statistics, 2016 
THE TRACY–WIDOM LAW FOR THE LARGEST EIGENVALUE OF 
F TYPE MATRICES 
BY XIAO HAN, GUANGMING PAN1 AND BO ZHANG 
Nanyang Technological University 
Let $A_p = \frac{YY^*}{m}$ and $B_p = \frac{XX^*}{n}$ be two independent random matrices, where $X = (X_{ij})_{p\times n}$ and $Y = (Y_{ij})_{p\times m}$ respectively consist of real (or complex) independent random variables with $EX_{ij} = EY_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$. Denote by $\lambda_1$ the largest root of the determinantal equation $\det(\lambda A_p - B_p) = 0$. We establish Tracy–Widom type universality for $\lambda_1$ under some moment conditions on $X_{ij}$ and $Y_{ij}$ when $p/m$ and $p/n$ approach positive constants as $p \to \infty$.
1. Introduction. High-dimensional data now commonly arise in many scientific fields such as genomics, image processing, microarray analysis, proteomics and finance, to name but a few. It is well known that the classical theory of multivariate statistical analysis, developed for fixed dimension p and large sample size n, may lose its validity when handling high-dimensional data. A popular tool for analyzing large covariance matrices, and hence high-dimensional data, is random matrix theory. The spectral analysis of high-dimensional sample covariance matrices has attracted considerable interest among statisticians, probabilists and mathematicians since the seminal work of Marčenko and Pastur [21] on the limiting spectral distribution of a class of sample covariance matrices. One may refer to the monograph of Bai and Silverstein [1] for a comprehensive summary and further references.
The largest eigenvalue of covariance matrices plays an important role in multivariate statistical analysis, for example in principal component analysis (PCA), multivariate analysis of variance (MANOVA) and discriminant analysis. One may refer to [22] for more details. In this paper, we focus on the largest eigenvalue of F type matrices. Suppose that
\[
A_p = \frac{YY^*}{m}, \qquad B_p = \frac{XX^*}{n} \tag{1.1}
\]
are two independent random matrices, where $X = (X_{ij})_{p\times n}$ and $Y = (Y_{ij})_{p\times m}$ respectively consist of real (or complex) independent random variables with $EX_{ij} = EY_{ij} = 0$ and $E|X_{ij}|^2 = E|Y_{ij}|^2 = 1$.
Received June 2015; revised December 2015. 
1Supported in part by MOE Tier 2 Grant 2014-T2-2-060 and MOE Tier 1 Grant RG25/14 at Nanyang Technological University, Singapore.
MSC2010 subject classifications. Primary 60B20, 34K25; secondary 60F05, 62H10. 
Key words and phrases. Tracy–Widom distribution, largest eigenvalue, sample covariance matrix, 
F matrix. 
Consider the determinantal equation
\[
\det(\lambda A_p - B_p) = 0. \tag{1.2}
\]
When $A_p$ is invertible, the roots of (1.2) are the eigenvalues of the F matrix
\[
A_p^{-1}B_p, \tag{1.3}
\]
referred to as a Fisher matrix in the literature. The determinantal equation (1.2) is 
closely connected with the generalized eigenproblem
\[
\det\bigl(\lambda(A_p + B_p) - B_p\bigr) = 0. \tag{1.4}
\]
We illustrate this in the next section. Many classical multivariate statistical tests are based on the roots of (1.2) or (1.4). For instance, one may use them to test the equality of two covariance matrices or a general linear hypothesis. In the framework of multivariate analysis of variance (MANOVA), $A_p$ represents the within-group covariance matrix while $B_p$ represents the between-group covariance matrix. A one-way MANOVA can then be used to examine the hypothesis that the mean vectors of interest are equal.
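Numerically, the roots of (1.2) can be obtained by solving the generalized eigenvalue problem $B_p v = \lambda A_p v$. The following is a minimal sketch (not from the paper; the dimensions and the use of SciPy's generalized eigenvalue solver are our own illustrative choices):

```python
# Hypothetical illustration: the largest root of det(lambda*A_p - B_p) = 0
# computed as a generalized eigenvalue problem B_p v = lambda A_p v.
import numpy as np
from scipy.linalg import eigvals

rng = np.random.default_rng(0)
p, m, n = 30, 60, 50                  # p < m, so A_p is invertible
Y = rng.standard_normal((p, m))
X = rng.standard_normal((p, n))
Ap = Y @ Y.T / m                      # within-group covariance
Bp = X @ X.T / n                      # between-group covariance

lam = np.sort(eigvals(Bp, Ap).real)   # roots of det(lambda*Ap - Bp) = 0
lambda_1 = lam[-1]                    # the largest root studied below
```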
Tracy and Widom [29, 30] first discovered the limiting distributions of the largest eigenvalue of the large Gaussian Wigner ensemble, now known as the Tracy–Widom law. Since their pioneering work, the study of the largest eigenvalues of large random matrices has flourished; to name a few contributions, we mention [6, 10, 13, 14] and [26]. Among them we highlight El Karoui [6], which handled the largest eigenvalue of Wishart matrices with a nonnull population covariance matrix and provided a condition on the population covariance matrix that ensures the Tracy–Widom law [see (4.16) below].
A follow-up to the above results is to establish the so-called universality property for generally distributed large random matrices. Specifically, the universality property states that the limiting behavior of an eigenvalue statistic typically does not depend on the distribution of the matrix entries. Indeed, the Tracy–Widom law has been established for general sample covariance matrices under very general assumptions on the distributions of the entries of X. Readers can refer to [3, 7, 9, 17, 19, 25, 27, 28, 33] for some representative developments on this topic. An important tool in proving universality is the Lindeberg comparison strategy (see Tao and Vu [27] and Erdős, Yau and Yin [9]), and a key input when applying it is the strong local law developed by Erdős, Schlein and Yau [8] and Erdős, Yau and Yin [9].
Johnstone [15] proved that the largest root of (1.2) converges to the type-one Tracy–Widom distribution after appropriate centering and scaling when the dimension p of the matrices $A_p$ and $B_p$ is even, $\lim_{p\to\infty} p/m < 1$, and $B_p$ and $A_p$ are both Wishart matrices. It is believed that the limiting distribution should not be affected by the parity of the dimension p. Indeed, numerical investigations in both [15] and [16] suggest that the Tracy–Widom approximation works as well in the odd-dimension case as in the even-dimension case. Moreover, as one might guess, the Tracy–Widom approximation should not rely on the Gaussian assumption. However, theoretical support for these claims has remained open. Furthermore, when $A_p$ is not invertible, the limiting distribution of the largest root of (1.2) is still unknown, even under the Gaussian assumption.
In this paper, we prove universality of the largest root of (1.2) by imposing some moment conditions on $A_p$ and $B_p$. Specifically, we prove that the largest root of (1.2) converges in distribution to the Tracy–Widom law for general distributions of the entries of X and Y, regardless of whether the dimension p is even or odd. Moreover, the result holds both when $\lim_{p\to\infty} p/m < 1$ and when $\lim_{p\to\infty} p/m > 1$, corresponding to invertible $A_p$ and noninvertible $A_p$, respectively. This result also implies the asymptotic distribution of the largest root of (1.4).
At this point, it is also appropriate to mention some related work on the roots of (1.2). The limiting spectral distribution of the roots was derived by [32] and [1]. One may also find the limits of the largest root and the smallest root in [1]. A central limit theorem for linear spectral statistics was established in [38]. Very recently, the so-called spiked F model has been investigated by [5] and [34]. We would like to point out that these papers prove local asymptotic normality or asymptotic normality for the largest eigenvalue of the spiked F model, which is completely different from our setting.
We conclude this section by outlining some ideas of the proof and presenting the structure of the rest of the paper. When $A_p$ is invertible, the roots of (1.2) become the eigenvalues of the F matrix $A_p^{-1}B_p$, so we may work with $A_p^{-1}B_p$. Roughly speaking, conditioning on $A_p$, the matrix $A_p^{-1}B_p$ can be viewed as a kind of general sample covariance matrix $T_n^{1/2}XX^*T_n^{1/2}$, with $T_n$ being a population covariance matrix. Denote the largest root of (1.2) by $\lambda_1$. The key idea is to break $\lambda_1$ into a sum of two parts as follows:
\[
\lambda_1 - \mu_p = (\lambda_1 - \hat\mu_p) + (\hat\mu_p - \mu_p), \tag{1.5}
\]
where $\hat\mu_p$ is an appropriate value when $A_p$ is given and $\mu_p$ is an appropriate value when $A_p$ is not given (their definitions are given in later sections). However, we cannot condition on $A_p$ directly. Instead, we first construct an appropriate event on which we can handle the first term on the right-hand side of (1.5) by applying earlier results about $T_n^{1/2}XX^*T_n^{1/2}$. In particular, we need to verify condition (4.16) below. Once this is done, the next step is to prove that the second term on the right-hand side of (1.5), after scaling, converges to zero in probability. This approach differs from those used in the literature to prove universality of local eigenvalue statistics.
Unfortunately, when $A_p$ is not invertible we can no longer work with the F matrix $A_p^{-1}B_p$. To overcome this difficulty, we instead start from the determinantal equation (1.2). It turns out that the largest root $\lambda_1$ can then be linked to the largest root of some F matrix when X consists of Gaussian random variables, so that the result about F matrices $A_p^{-1}B_p$ is applicable. For general distributions, we find that it is equivalent to working with the “covariance-type” matrix
\[
D^{-1/2}U_1X\bigl(I - X^*U_2^*\bigl(U_2XX^*U_2^*\bigr)^{-1}U_2X\bigr)X^*U_1^*D^{-1/2}. \tag{1.6}
\]
The definitions of D and $U_j$, $j = 1, 2$, are given in a later section. This matrix is much more complicated than a general sample covariance matrix. To deal with (1.6), we construct a $3\times 3$ block linearization matrix
\[
H = H(X) = \begin{pmatrix} -zI & 0 & D^{-1/2}U_1X \\ 0 & 0 & U_2X \\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I \end{pmatrix}, \tag{1.7}
\]
where $z = E + i\eta$ is a complex number with positive imaginary part. A simple calculation shows that the upper-left block of the $3\times 3$ block matrix $H^{-1}$ is the Stieltjes transform of (1.6). We next develop a strong local law around the right end point $\mu_p$ of the support by using a type of Lindeberg comparison strategy developed in [17], and then use it to prove edge universality by adapting the approaches of [9] and [3].
The paper is organized as follows. Section 2 presents the main results. Statistical applications and the Tracy–Widom approximation are discussed in Section 3. Section 4 is devoted to proving the main result when $A_p$ is invertible. Sections 5 and 6 prove the main result when $A_p$ is not invertible. Some lemmas (theorems) and their proofs are provided in the supplementary material [12] (Sections 7–12).
2. The main results. Throughout the paper, we impose the following conditions.
CONDITION 1. Assume that $\{Z_{ij}\}$ are independent random variables with $EZ_{ij} = 0$ and $E|Z_{ij}|^2 = 1$. For each $k \in \mathbb{N}$, there is a constant $C_k$ such that $E|Z_{ij}|^k \le C_k$. In addition, if $\{Z_{ij}\}$ are complex, then $EZ_{ij}^2 = 0$.

We say that a random matrix $Z = (Z_{ij})$ satisfies Condition 1 if its entries $\{Z_{ij}\}$ satisfy Condition 1.
CONDITION 2. Assume that random matrices X = (Xij )p,n and Y = (Yij )p,m 
are independent. 
CONDITION 3. Set $m = m(p)$ and $n = n(p)$. Suppose that
\[
\lim_{p\to\infty}\frac{p}{m} = d_1 > 0, \qquad \lim_{p\to\infty}\frac{p}{n} = d_2 > 0, \qquad 0 < \lim_{p\to\infty}\frac{p}{m+n} < 1.
\]
To present the main results uniformly, we define $\breve m = \max\{m, p\}$, $\breve n = \min\{n, m+n-p\}$ and $\breve p = \min\{m, p\}$. Moreover, let
\[
\sin^2(\gamma/2) = \frac{\min\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \qquad \sin^2(\psi/2) = \frac{\max\{\breve p, \breve n\} - 1/2}{\breve m + \breve n - 1}, \tag{2.1}
\]
\[
\mu_{J,p} = \tan^2\Bigl(\frac{\gamma+\psi}{2}\Bigr), \qquad
\sigma_{J,p}^3 = \frac{16\,\mu_{J,p}^3}{(\breve m + \breve n - 1)^2\,\sin(\gamma)\sin(\psi)\sin^2(\gamma+\psi)}. \tag{2.2}
\]
Formulas (2.2) can be found in [15] when d1 < 1. 
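The quantities in (2.1)–(2.2) are straightforward to evaluate numerically. Below is a minimal sketch based on the formulas above (the helper name and the example triple are our own illustrative choices):

```python
# Hypothetical helper computing mu_{J,p} and sigma_{J,p} from (2.1)-(2.2).
import numpy as np

def jacobi_params(p, m, n):
    m_b = max(m, p)                    # m-breve
    n_b = min(n, m + n - p)            # n-breve
    p_b = min(m, p)                    # p-breve
    gamma = 2 * np.arcsin(np.sqrt((min(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    psi = 2 * np.arcsin(np.sqrt((max(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    mu = np.tan((gamma + psi) / 2) ** 2
    sigma3 = 16 * mu ** 3 / (
        (m_b + n_b - 1) ** 2 * np.sin(gamma) * np.sin(psi)
        * np.sin(gamma + psi) ** 2
    )
    return mu, sigma3 ** (1 / 3)

mu_J, sigma_J = jacobi_params(p=30, m=20, n=25)   # the triple M1 below
```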
We now present alternative expressions for $\mu_{J,p}$ and $\sigma_{J,p}$. To this end, define a modified density of the Marčenko–Pastur law [21] (M–P law) by
\[
p(x) = \frac{1}{2\pi x(\breve p/\breve m)}\sqrt{(b_p - x)(x - a_p)}\,I(a_p \le x \le b_p), \tag{2.3}
\]
where $a_p = (1 - \sqrt{\breve p/\breve m})^2$ and $b_p = (1 + \sqrt{\breve p/\breve m})^2$. Let $\gamma_1 \ge \gamma_2 \ge \cdots \ge \gamma_{\breve p}$ satisfy
\[
\int_{\gamma_j}^{+\infty} p(x)\,dx = \frac{j}{\breve p}, \tag{2.4}
\]
with $\gamma_0 = b_p$ and $\gamma_{\breve p} = a_p$. Moreover, suppose that $c_p \in [0, a_p)$ satisfies the equation
\[
\int_{-\infty}^{+\infty}\Bigl(\frac{c_p}{x - c_p}\Bigr)^2 p(x)\,dx = \frac{\breve n}{\breve p}. \tag{2.5}
\]
One may easily check the existence and uniqueness of $c_p$. Define
\[
\mu_p = \frac{1}{c_p}\Bigl(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\frac{c_p}{x - c_p}\,p(x)\,dx\Bigr) \tag{2.6}
\]
and
\[
\sigma_p^3 = \frac{1}{c_p^3}\Bigl(1 + \frac{\breve p}{\breve n}\int_{-\infty}^{+\infty}\Bigl(\frac{c_p}{x - c_p}\Bigr)^3 p(x)\,dx\Bigr). \tag{2.7}
\]
It turns out that (2.2) and (2.6)–(2.7) are equivalent subject to some scaling, which 
is verified in Section 7 in the supplementary material [12]. 
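The constant $c_p$ and the pair (2.6)–(2.7) can likewise be computed by one-dimensional root-finding and quadrature. A hedged numerical sketch (our own illustration, not code from the paper):

```python
# Hypothetical illustration of (2.5)-(2.7): solve for c_p by root-finding,
# then evaluate mu_p and sigma_p by quadrature against the density (2.3).
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def mp_centering_scaling(p, m, n):
    m_b, n_b, p_b = max(m, p), min(n, m + n - p), min(m, p)
    y = p_b / m_b
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    rho = lambda x: np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * y)

    def moment(c, k):   # integral of (c/(x-c))^k * rho(x) over [a, b]
        return quad(lambda x: (c / (x - c)) ** k * rho(x), a, b)[0]

    # The left side of (2.5) increases from 0 to infinity as c -> a-.
    c = brentq(lambda c: moment(c, 2) - n_b / p_b, 1e-8, a - 1e-6)
    mu = (1 + (p_b / n_b) * moment(c, 1)) / c                       # (2.6)
    sigma = ((1 + (p_b / n_b) * moment(c, 3)) / c ** 3) ** (1 / 3)  # (2.7)
    return mu, sigma
```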
We also need the following moment matching condition.
DEFINITION 1 (Moment matching). Let $X^1 = (x_{ij}^1)_{M\times N}$ and $X^0 = (x_{ij}^0)_{M\times N}$ be two matrices satisfying Condition 1. We say that $X^1$ matches $X^0$ to order q if, for all integers i, j, l and k satisfying $1 \le i \le M$, $1 \le j \le N$, $0 \le l, k$ and $l + k \le q$,
\[
E\bigl[(\Re x_{ij}^1)^l(\Im x_{ij}^1)^k\bigr] = E\bigl[(\Re x_{ij}^0)^l(\Im x_{ij}^0)^k\bigr] + O\bigl(\exp\bigl(-(\log p)^C\bigr)\bigr), \tag{2.8}
\]
where C is some positive constant bigger than one, and $\Re x$ and $\Im x$ denote the real and imaginary parts of x.
Throughout the paper, we use X0 to stand for the random matrix consisting of 
independent Gaussian random variables with mean zero and variance one. 
Denote the type-i Tracy–Widom distribution by $F_i$, $i = 1, 2$ (see [30]). Set $B_p = \frac{XX^*}{\breve n}$ and $A_p = \frac{YY^*}{\breve m}$. We are now in a position to state the main results about F type matrices.
THEOREM 2.1. Suppose that the real random matrices X and Y satisfy Conditions 1–3. Moreover, suppose that $0 < d_2 < \infty$. Denote the largest root of $\det(\lambda A_p - B_p) = 0$ by $\lambda_1$.

(i) If $0 < d_1 < 1$, then
\[
\lim_{p\to\infty} P\Bigl(\frac{(\breve n/\breve m)\lambda_1 - \mu_{J,p}}{\sigma_{J,p}} \le s\Bigr) = F_1(s). \tag{2.9}
\]

(ii) If $d_1 > 1$ and X matches the standard Gaussian matrix $X^0$ to order 3, then (2.9) still holds.
REMARK 1. When X and Y are complex random matrices, Theorem 2.1 still holds with the Tracy–Widom distribution $F_1(s)$ replaced by $F_2(s)$.

If $0 < d_1 < 1$, then $A_p$ is invertible, and the largest root $\lambda_1$ is the largest eigenvalue of the F matrix $A_p^{-1}B_p$. If $d_1 > 1$, then $A_p$ is not invertible.
REMARK 2. Theorem 2.1 immediately implies the distribution of the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. In fact, when $0 < d_1 < 1$ the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1+\lambda_1}$ if $\lambda_1$ is the largest root of the F matrix $B_pA_p^{-1}$ in Theorem 2.1.

When $d_1 > 1$, the largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is one, with multiplicity $(p - m)$. We instead consider the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$. It turns out that the $(p - m + 1)$th largest root of $\det(\lambda(B_p + A_p) - B_p) = 0$ is $\frac{\lambda_1}{1+\lambda_1}$ if $\lambda_1$ is the largest root of $\det(\lambda A_p - B_p) = 0$. Moreover, note the equality
\[
(B_p + A_p)^{-1}B_p + (B_p + A_p)^{-1}A_p = I.
\]
If Y matches $X^0$ to order 3, then the smallest positive root of $\det(\lambda(B_p + A_p) - B_p) = 0$ also tends to the type-1 Tracy–Widom distribution after appropriate centering and rescaling by Theorem 2.1 when $d_1 > 1$ and $d_2 > 1$.
We would like to point out that Johnstone [15] proved part (i) of Theorem 2.1 when p is even and $A_p$ and $B_p$ are both Wishart matrices. Part (ii) of Theorem 2.1 is new even when $A_p$ and $B_p$ are both Wishart matrices. In proving Theorem 2.1, we have in fact obtained a different asymptotic mean and variance. Precisely, we have proved that
\[
\lim_{p\to\infty} P\bigl(\sigma_p \breve n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = F_1(s) \tag{2.10}
\]
and that
\[
\Bigl|\frac{\breve m}{\breve n}\,\mu_{J,p} - \mu_p\Bigr| = O\bigl(p^{-1}\bigr), \qquad
\lim_{p\to\infty}\sigma_p\,\frac{\breve m}{\breve n^{1/3}}\,\sigma_{J,p} = 1. \tag{2.11}
\]
Equations (2.10) and (2.11) imply Theorem 2.1. The proof of (2.11) is provided in the supplementary material [12], and we prove (2.10) in the main paper.
3. Applications and simulations. This section explores some applications of our universality results in high-dimensional statistical inference and conducts simulations to check the quality of the limiting-law approximations.
3.1. Long-side spherical test (LSST) for separable covariance matrices. Consider a data matrix $Y = (y_1, \ldots, y_N)_{p\times N}$ of the form
\[
Y = \Sigma^{1/2}XT^{1/2}, \tag{3.1}
\]
where X is a $p \times N$ matrix satisfying Condition 1, X matches $X^0$ to order 3 (when $2p > N$), and $\Sigma$ and T are $p \times p$ and $N \times N$ positive definite matrices. We start from the special case $T = I$, in which the model becomes $Y = \Sigma^{1/2}X$. For such a simplified model, the spherical test $H_0: \Sigma = \sigma^2 I$ vs. $H_1: \Sigma \ne \sigma^2 I$ has been widely discussed in the literature; when p is comparable to N, there is considerable work on it, for example, [18] and Section 9.5 of [35]. We can extend this test to the more general model (3.1). In model (3.1), $YY^*$ is called a separable covariance matrix, which is used to model spatial-temporal data. For high-dimensional data, the spectral properties of $YY^*$ are studied in some recent papers such as [24] and [37]. In this section, we focus on testing whether $\Sigma$ or T is proportional to I. To be precise, we consider the following hypothesis testing problems:
if $\lim_{p\to\infty} p/N < 1$, we test $H_0: T = \sigma_1^2 I$ vs. $H_1: T \ne \sigma_1^2 I$;
or, if $\lim_{p\to\infty} p/N > 1$, we test $H_0: \Sigma = \sigma_2^2 I$ vs. $H_1: \Sigma \ne \sigma_2^2 I$.
In the sequel, we focus on the first testing problem, that is, $\lim_{p\to\infty} p/N < 1$; the second can be discussed similarly. We choose an index subset $S \subset \{1, \ldots, N\}$ such that the cardinality of S is $N/2$ (we also suggest an approach for selecting S in stationary time series models in the simulations). Moreover, we define $Z_2Z_2^* = \sum_{i\in S} y_iy_i^*$, $Z_1Z_1^* = \sum_{i\notin S} y_iy_i^*$, $n = N/2$ and $m = N - n$. We use the largest root $\lambda_1$ of $\det(\lambda\frac{Z_1Z_1^*}{\breve m} - \frac{Z_2Z_2^*}{\breve n}) = 0$ as a test statistic. Under the null hypothesis, $\lambda_1$ tends to the Tracy–Widom distribution after centering and rescaling, for any selection of S. The key observation is that $\Sigma$ can be eliminated in $\det(\lambda\frac{Z_1Z_1^*}{\breve m} - \frac{Z_2Z_2^*}{\breve n}) = 0$. Under the alternative hypothesis, however, for a suitably chosen S the correlation structure involved in $Z_2$ can be very different from that of $Z_1$, which makes the largest root of the above determinantal equation deviate substantially from $\mu_p$. This observation ensures that $n^{2/3}(\lambda_1 - \mu_p)$ will be very large when the null hypothesis does not hold. One can see that the restriction $\lim_{p\to\infty} p/N < 1$ comes from the conditions of Theorem 2.1.
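A hedged sketch of the LSST statistic described above (the helper name is our own; the index set follows the suggestion for stationary data in Section 3.5.2):

```python
# Hypothetical implementation of the LSST statistic of Section 3.1.
import numpy as np
from scipy.linalg import eigvals

def lsst_statistic(Y):
    p, N = Y.shape
    idx = np.arange(N)
    S = np.concatenate([idx[: N // 4], idx[3 * N // 4:]])   # suggested S
    Z2, Z1 = Y[:, S], Y[:, np.setdiff1d(idx, S)]
    n, m = N // 2, N - N // 2
    m_b, n_b = max(m, p), min(n, m + n - p)
    # Largest root of det(lambda * Z1 Z1^T / m_b - Z2 Z2^T / n_b) = 0.
    lam = eigvals(Z2 @ Z2.T / n_b, Z1 @ Z1.T / m_b).real
    lam = lam[np.isfinite(lam)]   # drop infinities if Z1 Z1^T is singular
    return np.max(lam)
```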
In addition, we would like to point out that the extreme eigenvalue of a sample covariance matrix is not a proper statistic for such a hypothesis test when there are no spiked eigenvalues, while our statistic does not depend on whether $\Sigma$ is spiked or not. We explain the reasons below. First, one cannot directly use the largest eigenvalue of $YY^*$ since $\Sigma$ is unknown. Therefore, one has to apply the statistic [one-side identity test (OSI)] in Section 2.1 of [3], that is,
\[
\frac{\lambda_1(YY^*) - \lambda_2(YY^*)}{\lambda_2(YY^*) - \lambda_3(YY^*)}.
\]
When T is a spiked matrix, this statistic works well (see Table 4 in [3]). However, when T is not spiked and $\Sigma = I$, the statistic tends to the same distribution for any T satisfying (1.4) of [3] (including $T = \sigma^2 I$), which means the statistic does not work in this case. Table 4 below confirms this phenomenon.
3.2. Equality of K covariance matrices (EOM). Consider the model
\[
Z_i = \Sigma_i^{1/2}X_i, \qquad i = 1, \ldots, K,
\]
where $\{X_i\}$ are $p \times n_i$ random matrices, $\{\Sigma_i\}$ are $p \times p$ invertible population covariance matrices and K is a positive integer. Moreover, we assume that there exists a $k_0$ such that $\lim_{p\to\infty}\frac{\sum_{i=1}^{k_0} n_i}{p} \in (0, \infty)$ and $\lim_{p\to\infty}\frac{\sum_{i=k_0+1}^{K} n_i}{p} \in (0, \infty)$. For simplicity, and to be consistent with the previous notation, we set $n = \sum_{i=1}^{k_0} n_i$, $m = \sum_{i=k_0+1}^{K} n_i$, $Y = (X_{k_0+1}, \ldots, X_K)$ and $X = (X_1, \ldots, X_{k_0})$. We also assume that X and Y satisfy the conditions of Theorem 2.1.

We are interested in the following hypothesis test:
\[
H_0: \Sigma_1 = \Sigma_2 = \cdots = \Sigma_K \quad\text{vs.}\quad H_1: \exists\, 1 \le i < j \le K \text{ such that } \Sigma_i \ne \Sigma_j.
\]
Under the null hypothesis, we have
\[
\det\Bigl(\lambda\,\frac{\sum_{i=k_0+1}^{K} Z_iZ_i^*}{\breve m} - \frac{\sum_{i=1}^{k_0} Z_iZ_i^*}{\breve n}\Bigr) = 0 \iff \det\Bigl(\lambda\,\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}\Bigr) = 0.
\]
In view of this, we propose the largest root $\lambda_1$ of $\det(\lambda\frac{\sum_{i=k_0+1}^{K} Z_iZ_i^*}{\breve m} - \frac{\sum_{i=1}^{k_0} Z_iZ_i^*}{\breve n}) = 0$ as a test statistic. By Theorem 2.1, $\lambda_1$ tends to the Tracy–Widom distribution after centering and rescaling.
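For K = 2, the statistic takes a particularly simple form. A minimal sketch (our own illustration):

```python
# Hypothetical EOM statistic for K = 2: under H0 the common covariance
# cancels inside the determinantal equation.
import numpy as np
from scipy.linalg import eigvals

def eom_statistic(Z1, Z2):
    p, n = Z1.shape            # group 1: p x n
    m = Z2.shape[1]            # group 2: p x m
    m_b, n_b = max(m, p), min(n, m + n - p)
    lam = eigvals(Z1 @ Z1.T / n_b, Z2 @ Z2.T / m_b).real
    lam = lam[np.isfinite(lam)]
    return np.max(lam)         # compare with TW1 quantiles via (3.9)-(3.10) below
```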
We now investigate the power of the test under a kind of sparse alternative hypothesis when K = 2. Specifically, we consider the alternative
\[
Z_1 = \Sigma^{1/2}X, \qquad Z_2 = Y. \tag{3.2}
\]
If $YY^*$ is invertible, we choose $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$ with
\[
r = \sqrt{\frac{p}{m} + \frac{p}{n} - \frac{p^2}{mn}}.
\]
The reason we choose the factor $\frac{p/m+r}{1-p/m}$ is that the resulting matrix is a spiked F matrix when $\tau > 1$; the largest eigenvalue $\lambda_1$ then converges weakly to a normal distribution by Proposition 11 of [5] and Theorem 4.1 of [34]. In fact, by Proposition 5 of [5] and Theorem 3.1 of [34] we immediately have the following proposition.
PROPOSITION 1. For the model (3.2), suppose $YY^*$ is invertible and $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$. Let $\phi(x) = \frac{x(x - 1 + p/n)}{x(1 - p/m) - 1}$. When $\tau > 1$, the largest eigenvalue of the spiked F matrix $\frac{m}{n}(YY^*)^{-1}\Sigma^{1/2}XX^*\Sigma^{1/2}$ (denoted by $\lambda_1$) almost surely converges to $\frac{m}{n}\phi(1 + \tau\frac{p/m+r}{1-p/m})$, and for any positive constant C we have
\[
\lim_{p\to\infty} P\Bigl(\frac{(n/m)\lambda_1 - \mu_{J,p}}{\sigma_{J,p}} > C\Bigr) = 1 \tag{3.3}
\]
(the power of the test goes to 1 as $p \to \infty$).
An interesting feature of this approach is that K can tend to infinity. Below, we compare our statistic with the corrected likelihood ratio test (CLRT) proposed in Chapter 9 of [35] for K = 2. First, unlike [35], we do not assume $E|X_{ij}|^4$ to be equal to some known constant $\beta$ for all i and j. Moreover, the fourth-moment assumption restricts the extension of their approach to the equality test of K matrices, since it is not reasonable to make such an assumption when K is large. The advantage of CLRT is that it uses all the information in the F matrix's spectrum, so their test is more powerful when the population eigenvalues are close to one another. But when $\Sigma_1^{-1}\Sigma_2$ is a spiked matrix, the largest eigenvalue of F type matrices works better; see Table 8 below.
3.3. Correlated noise detection. Let
\[
y_t = f(x_t) + \Sigma^{1/2}\varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{3.4}
\]
be the signal received at time t, where $y_t$ is a p-dimensional real or complex vector and $\varepsilon_t$ is a p-dimensional white noise vector (i.i.d.) satisfying Condition 1. Moreover, if $2p > T$, we assume that the third moments of the entries of $\varepsilon_t$ are 0; $\Sigma$ is an unknown $p \times p$ invertible matrix; $\lim_{p\to\infty} p/T \in (0, 1)$; x is a vector or matrix of arbitrary dimension (possibly correlated with y); and $f(x)$ is a given function, for example the regression model $f(x) = x^*\beta$. We are interested in whether there is a “real signal” contained in $y_t$. Our hypothesis testing problem is
\[
H_0: y_t = \Sigma^{1/2}\varepsilon_t \quad\text{vs.}\quad H_1: y_t \ne \Sigma^{1/2}\varepsilon_t. \tag{3.5}
\]
Let $X = (y_1, \ldots, y_{T/2})$, $Y = (y_{T/2+1}, \ldots, y_T)$, $n = T/2$ and $m = T - n$. As before, we use the largest root of $\det(\lambda\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}) = 0$ as a test statistic, which converges to the Tracy–Widom distribution after centering and rescaling by Theorem 2.1.
In engineering, we do not need to split the sample into X and Y. Specifically, in signal detection or cognitive radio, model (3.4) takes the form
\[
y_t = Ax_t + \Sigma^{1/2}\varepsilon_t, \qquad t = 1, 2, \ldots, T, \tag{3.6}
\]
where $x_t$ is a k-dimensional signal vector with covariance matrix S, A is a $p \times k$ deterministic matrix, and the other assumptions are the same as those in the previous model (3.4). We are again interested in the test (3.5). This is a widely discussed problem in cognitive radio; for the high-dimensional setting, one may see [36] and [23]. We also refer to the recent paper [31], which assumes correlated noise. In engineering, there exist methods to obtain another, signal-free sample, say $r_t = \Sigma^{1/2}z_t$, $t = 1, \ldots, T_1$, where $\{\varepsilon_t\}_{t=1}^T$ and $\{z_t\}_{t=1}^{T_1}$ are i.i.d. One can refer to [20, 34] and [23] for detailed discussions. Let $R_1 = (y_1, \ldots, y_T)$ and $R_2 = (r_1, \ldots, r_{T_1})$. We use the largest root of $\det(\lambda\frac{R_2R_2^*}{T_1} - \frac{R_1R_1^*}{T}) = 0$ as a statistic. The power is stated in Table 9 below.
3.4. Other applications under the Gaussian distribution. There are many other applications that can be connected with the largest eigenvalue of F matrices thanks to nice properties of the Gaussian distribution. We illustrate a multivariate ANOVA test below; one can refer to [15] for more applications. Consider the multivariate regression model
\[
Y = XB + Z,
\]
where Y is an $N \times p$ response matrix, X is a known $N \times q$ design matrix, B is a $q \times p$ unknown regression matrix, Z is an $N \times p$ random matrix whose rows are i.i.d. Gaussian with mean zero and covariance matrix $\Sigma$, and $\Sigma$ is an invertible deterministic matrix. We are interested in the following hypothesis test: given a $g \times q$ matrix C,
\[
H_0: CB = 0 \quad\text{vs.}\quad H_1: CB \ne 0.
\]
To explain the motivation behind the test, we consider a low-dimensional example. If $q = 3$, $p = 1$, $Y = (y_1, \ldots, y_N)^T$, $y_i \sim N(b_j, \sigma^2)$ for $N_{j-1} \le i \le N_j$, $1 = N_1 \le \cdots \le N_4 = N$ and
\[
C = \begin{pmatrix} 1 & 0 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\]
then the null hypothesis $CB = 0$ is equivalent to $b_1 = b_2 = b_3$, that is, an ANOVA test. Of course, we consider the high-dimensional setting in this paper. We assume $N > q$. The least squares estimator is $\hat B = (X^*X)^{-1}X^*Y$. Under the null hypothesis, it is easy to see that the matrices
\[
D = Y^*(I - P_X)Y \sim W_p(\Sigma, N - q), \qquad
E = (C\hat B)^*\bigl[C(X^*X)^{-1}C^*\bigr]^{-1}(C\hat B) \sim W_p(\Sigma, g)
\]
are independent, where $P_X$ is the projection matrix generated by X. Then the largest root of $\det(\lambda\frac{D}{N-q} - \frac{E}{g}) = 0$ can be used as a statistic for the test. One can further refer to pages 210–213 of [11] for constructing a confidence interval for linear combinations of the entries of B.
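Under the Gaussian assumption, this statistic is easy to assemble. A minimal sketch (our own illustration; here p must not exceed $N - q$ so that D is invertible):

```python
# Hypothetical MANOVA-type statistic of Section 3.4.
import numpy as np
from scipy.linalg import eigvals

def manova_largest_root(Y, X, C):
    N, q = X.shape
    g = C.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    B_hat = XtX_inv @ X.T @ Y                # least squares estimator
    P = X @ XtX_inv @ X.T                    # projection onto col(X)
    D = Y.T @ (np.eye(N) - P) @ Y            # ~ W_p(Sigma, N - q) under H0
    CB = C @ B_hat
    E = CB.T @ np.linalg.inv(C @ XtX_inv @ C.T) @ CB   # ~ W_p(Sigma, g)
    lam = eigvals(E / g, D / (N - q)).real   # det(lambda*D/(N-q) - E/g) = 0
    return np.max(lam[np.isfinite(lam)])
```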
3.5. Simulations. We conduct numerical simulations to check the accuracy of the distributional approximations in Theorem 2.1 under various settings of $(p, m, n)$ and of the distribution of X. We also study the power of the hypothesis tests in Sections 3.1–3.2.

As in [15], we use $\ln(\lambda_1)$ to run the simulations, so we first derive its asymptotic distribution. By [15] and (2.10), we can write
\[
\lambda_1 = \mu_p + \frac{Z}{\sigma_p\breve n^{2/3}} + o_p\bigl(\breve n^{-2/3}\bigr), \tag{3.7}
\]
where $Z = F_1^{-1}(U)$ and U is a $U(0,1)$ random variable. By Taylor expansion, we then have
\[
\ln(\lambda_1) = \ln(\mu_p) + \frac{Z}{\mu_p\sigma_p\breve n^{2/3}} + o_p\bigl(\breve n^{-2/3}\bigr). \tag{3.8}
\]
Recall from Section 2 that $|\frac{\breve m}{\breve n}\mu_{J,p} - \mu_p| = O(p^{-1})$ and $\lim_{p\to\infty}\sigma_p\frac{\breve m}{\breve n^{1/3}}\sigma_{J,p} = 1$. Summarizing the above, we find
\[
\lim_{p\to\infty} P\bigl(\sigma_p^{\ln}\bigl(\ln(\lambda_1) - \mu_p^{\ln}\bigr) \le s\bigr) = F_1(s), \tag{3.9}
\]
where
\[
\mu_p^{\ln} = \ln\Bigl(\frac{\breve m}{\breve n}\mu_{J,p}\Bigr), \qquad \sigma_p^{\ln} = \frac{\mu_{J,p}}{\sigma_{J,p}}. \tag{3.10}
\]
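A minimal self-contained sketch of the log-scale centering and scaling in (3.10), combining it with (2.1)–(2.2) (the helper name is our own):

```python
# Hypothetical helper for (3.10); the renormalized statistic
# sigma_ln * (log(lambda_1) - mu_ln) is compared with TW1 quantiles.
import numpy as np

def log_scale_params(p, m, n):
    m_b, n_b, p_b = max(m, p), min(n, m + n - p), min(m, p)
    gamma = 2 * np.arcsin(np.sqrt((min(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    psi = 2 * np.arcsin(np.sqrt((max(p_b, n_b) - 0.5) / (m_b + n_b - 1)))
    mu_J = np.tan((gamma + psi) / 2) ** 2                         # (2.2)
    sigma_J = (16 * mu_J ** 3 / (
        (m_b + n_b - 1) ** 2 * np.sin(gamma) * np.sin(psi)
        * np.sin(gamma + psi) ** 2
    )) ** (1 / 3)
    return np.log(m_b / n_b * mu_J), mu_J / sigma_J               # (3.10)
```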
3.5.1. Accuracy of approximations for TW laws and size. We conduct numerical simulations to check the accuracy of the distributional approximations in Theorem 2.1; these also give the sizes of the tests.
Table 1 was produced with the software R. We set two initial triples $(p, m, n)$, $M_0 = (5, 40, 10)$ and $M_1 = (30, 20, 25)$, and then consider $2M_i$, $3M_i$ and $4M_i$, $i = 0, 1$. The triples $M_0$ and $M_1$ correspond to invertible $YY^*$ and noninvertible $YY^*$, respectively. For each case, we generate 10,000 pairs (X, Y) whose entries follow the standard normal distribution. We calculate the largest root of $\det(\lambda\frac{YY^*}{\breve m} - \frac{XX^*}{\breve n}) = 0$ to obtain $\ln(\lambda_1)$ and renormalize it with $\mu_p^{\ln}$ and $\sigma_p^{\ln}$.
TABLE 1 
Standard quantiles for several triples (p,m,n): Gaussian case 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0208 0.0133 0.0124 0.0115 0.0017 0.0035 0.0048 0.0060 0.002 
−3.18 0.05 0.0680 0.0601 0.0562 0.0582 0.0210 0.0276 0.0327 0.0370 0.004 
−2.78 0.1 0.1176 0.1120 0.1088 0.1095 0.0608 0.0712 0.0808 0.0842 0.006 
−1.91 0.3 0.3154 0.3030 0.3080 0.3084 0.2641 0.2744 0.2864 0.2909 0.009 
−1.27 0.5 0.5139 0.5070 0.5051 0.5082 0.4839 0.4904 0.4960 0.4964 0.01 
−0.59 0.7 0.7073 0.7154 0.7012 0.7111 0.7055 0.7031 0.7019 0.7005 0.009 
0.45 0.9 0.9083 0.9058 0.9047 0.9090 0.9040 0.9010 0.9016 0.9003 0.006 
0.98 0.95 0.9561 0.9544 0.9517 0.9557 0.9489 0.9530 0.9504 0.9498 0.004 
2.02 0.99 0.9919 0.9909 0.9913 0.9919 0.9878 0.9887 0.9897 0.9901 0.002 
The “Percentile” column lists the quantiles of the TW1 law corresponding to the probabilities in the “TW” column. Columns 3–10 report the values of the empirical distributions of the renormalized $\lambda_1$ for the various triples at the corresponding quantiles, and the standard errors based on binomial sampling are listed in the last column. QQ-plots for the triples (20,160,40) and (120,80,100) are shown in Figure 1.
Tables 2 and 3 and Figures 2 and 3 parallel Table 1 and Figure 1, except that the Gaussian distribution is replaced by a discrete distribution and by a continuous uniform distribution, respectively.
For the tests in Sections 3.1–3.3, one may also refer to Tables 1–3 for their sizes at the nominal significance levels.
FIG. 1. QQ plots of the triples (20,160,40) and (120,80,100) corresponding to Table 1. 
TABLE 2 
Standard quantiles for several triples (p,m,n): Discrete distribution with the probability mass 
function P(x = √3) = P(x = −√3) = 1/6 and P(x = 0) = 2/3 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0192 0.0132 0.0136 0.0123 0.0006 0.0031 0.0046 0.0047 0.002 
−3.18 0.05 0.0637 0.0581 0.0571 0.0573 0.0216 0.0302 0.0321 0.0356 0.004 
−2.78 0.1 0.1147 0.1101 0.1099 0.1088 0.0626 0.0733 0.0757 0.0824 0.006 
−1.91 0.3 0.3100 0.2966 0.3060 0.3029 0.2665 0.2721 0.2808 0.2827 0.009 
−1.27 0.5 0.5000 0.4959 0.4969 0.4996 0.4841 0.4834 0.4985 0.4899 0.01 
−0.59 0.7 0.7025 0.7013 0.7099 0.7018 0.6990 0.6992 0.7109 0.6975 0.009 
0.45 0.9 0.9107 0.9061 0.9071 0.9036 0.9014 0.9040 0.9059 0.9001 0.006 
0.98 0.95 0.9566 0.9546 0.9538 0.9546 0.9503 0.9527 0.9526 0.9512 0.004 
2.02 0.99 0.9929 0.994 0.9903 0.9914 0.9890 0.9908 0.9901 0.9894 0.002 
FIG. 2. QQ plots of the triples (20,160,40) and (120,80,100) corresponding to Table 2. 
TABLE 3 
Standard quantiles for several triples (p,m,n): Continuous uniform distribution U(−√3,√3) 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
Percentile TW M0 2M0 3M0 4M0 M1 2M1 3M1 4M1 2*SE 
−3.9 0.01 0.0098 0.0117 0.0122 0.0120 0.0101 0.0087 0.0092 0.0096 0.002 
−3.18 0.05 0.0612 0.0632 0.0606 0.0592 0.0514 0.0462 0.0492 0.0482 0.004 
−2.78 0.1 0.1205 0.1243 0.1208 0.1197 0.1023 0.0942 0.1033 0.0992 0.006 
−1.91 0.3 0.3644 0.3542 0.351 0.3432 0.3132 0.2946 0.3101 0.3017 0.009 
−1.27 0.5 0.5767 0.5575 0.5563 0.5496 0.516 0.5073 0.5151 0.5069 0.01 
−0.59 0.7 0.7728 0.7540 0.7443 0.7440 0.7182 0.7123 0.714 0.7171 0.009 
0.45 0.9 0.9397 0.9243 0.9181 0.9202 0.9141 0.9068 0.9071 0.9059 0.006 
0.98 0.95 0.9722 0.9672 0.9599 0.9614 0.9584 0.9538 0.9556 0.9534 0.004 
2.02 0.99 0.9959 0.9941 0.993 0.9922 0.9932 0.9912 0.9919 0.9916 0.002 
FIG. 3. QQ plots of the triples (120,320,160) and (320,140,200) corresponding to Table 3. 
3.5.2. Power study of the “Long-side spherical test (LSST) for separable covariance matrices” (see Section 3.1). We consider the following alternative model:
\[
y_1 = z_1, \qquad y_t = ay_{t-1} + \sqrt{1 - a^2}\,z_t, \qquad t = 2, \ldots, N,
\]
where $\{y_t\}_{t=1}^N$ are p-dimensional vectors, $\{z_t\}_{t=1}^N$ are independent noise vectors satisfying Condition 1 and $a \in (0, 1)$. It is easy to see that $\{y_t\}_{t=1}^N$ form a stationary sequence, and hence the matrix T [see (3.1)] is a Toeplitz matrix. We then suggest choosing the set $S = \{1, \ldots, N/4\} \cup \{3N/4, \ldots, N\}$, so that the correlation structures involved in $Z_1$ and $Z_2$ are different. In this subsection, we also compare our approach (denoted by LSST) with the one proposed by Bao et al. in [3] (denoted by OSI).
The power of the tests is listed in Table 4 below; the nominal significance level of the tests is 5%.
From the table, we can see that the OSI approach does not gain power under the alternative hypothesis, which is consistent with our earlier analysis. The power of LSST increases as either the dimension or a becomes larger. When a is small, say 0.1, the power is very poor. This phenomenon is easy to understand: since $a^3 = 0.001 \approx 0$, that is, $\operatorname{Cov}(y_t, y_{t+3}) = 0.001 \approx 0$, the data $\{y_t\}_{t=1}^T$ look nearly independent.
3.5.3. Power of “Equality of K covariance matrices (EOM)” for K = 2. We study the power of the test under the alternative case (3.2). When $YY^*$ is invertible, we choose $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$ as mentioned below (3.2).
TABLE 4 
Power of several two-tuples (p,N): the entries of zt follow continuous uniform distribution 
U(−√3,√3) 
Initial two-tuples M0 = (5,40) and M1 = (30,40) 
Tuples Approach a = 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
M0 LSST 0.0198 0.0226 0.0251 0.0363 0.0608 0.1223 0.2510 0.4812 0.7789 
OSI 0.0593 0.0618 0.0645 0.0669 0.0782 0.0841 0.0935 0.1204 0.1845 
2M0 LSST 0.0324 0.0339 0.0389 0.0709 0.1669 0.3910 0.7120 0.9584 0.9996 
OSI 0.0590 0.0607 0.0622 0.0601 0.0631 0.0694 0.0741 0.0815 0.1028 
3M0 LSST 0.0388 0.0386 0.0474 0.1275 0.3062 0.6668 0.9516 0.9996 1.0000 
OSI 0.0616 0.0593 0.0574 0.0607 0.0590 0.0655 0.0665 0.0766 0.0854 
4M0 LSST 0.0389 0.0380 0.0688 0.1805 0.4707 0.8653 0.9961 1.0000 1.0000 
OSI 0.0579 0.0606 0.0583 0.0623 0.0598 0.0609 0.0669 0.0672 0.0753 
5M0 LSST 0.0390 0.0409 0.0862 0.2463 0.6321 0.9567 1.0000 1.0000 1.0000 
OSI 0.0551 0.0589 0.0550 0.0570 0.0641 0.0618 0.0651 0.0671 0.0704 
M1 LSST 0.0293 0.0321 0.0364 0.0507 0.0730 0.1025 0.1438 0.1868 0.2267 
OSI 0.0613 0.0615 0.0668 0.0676 0.0695 0.0678 0.0667 0.0705 0.0904 
2M1 LSST 0.0360 0.0369 0.0499 0.0878 0.1503 0.2467 0.3531 0.4604 0.5364 
OSI 0.0550 0.0605 0.0565 0.0601 0.0595 0.0613 0.0632 0.0578 0.0577 
3M1 LSST 0.0388 0.0453 0.0689 0.1333 0.2550 0.4188 0.5903 0.7249 0.7883 
OSI 0.0562 0.0602 0.0566 0.0552 0.0611 0.0558 0.0601 0.0566 0.0518 
4M1 LSST 0.0439 0.0478 0.0920 0.1871 0.3556 0.5922 0.7885 0.8914 0.9316 
OSI 0.0587 0.0562 0.0583 0.0608 0.0611 0.0577 0.0556 0.0559 0.0540 
5M1 LSST 0.0396 0.0504 0.1107 0.2458 0.4794 0.7570 0.9068 0.9659 0.9831 
OSI 0.0566 0.0585 0.0579 0.0607 0.0580 0.0622 0.0603 0.0576 0.0538 
When $YY^*$ is not invertible, by Theorem 1.2 of [2] we find that the smallest nonzero eigenvalue of $\Sigma^{-1/2}(YY^*/\breve m)\Sigma^{-1/2}$ is not spiked for the above $\Sigma$, so it is hard to obtain a spiked F matrix. We therefore use instead the matrix
\[
\Sigma(\omega) = \operatorname{diag}(1, \omega, 1, \omega, \ldots, 1, \omega),
\]
whose diagonal entries alternate between 1 and $\omega$.
In Tables 5–7, the data X and Y are generated as in Tables 1–3, and the nominal significance level of our test is 5%.
TABLE 5 
Power of several triples (p,m,n): Gaussian distribution 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0672 0.0585 0.0563 0.0593 0.3 0.2178 0.4934 0.7071 0.8419 
2 0.2763 0.3801 0.4551 0.5067 0.6 0.0574 0.1332 0.2241 0.3106 
4 0.6291 0.816 0.9072 0.9567 2 0.1037 0.2166 0.3463 0.5029 
6 0.8162 0.9543 0.988 0.9967 3 0.2242 0.5521 0.8156 0.9537 
TABLE 6 
Power of several triples (p,m,n): Discrete distribution with the probability mass function 
P(x = √3) = P(x = −√3) = 1/6 and P(x = 0) = 2/3 
Initial triple M0 = (5,40,10) Initial triple M1 = (30,20,25) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0674 0.0573 0.0576 0.0595 0.3 0.2101 0.4883 0.7024 0.8425 
2 0.3045 0.397 0.4561 0.5171 0.6 0.057 0.1382 0.2176 0.3078 
4 0.647 0.8137 0.8984 0.9478 2 0.1055 0.2232 0.3504 0.4974 
6 0.8147 0.943 0.9813 0.9936 3 0.2254 0.5487 0.8211 0.9529 
TABLE 7 
Power of several triples (p,m,n): Continuous uniform distribution U(−√3,√3) 
Initial triple M0 = (30,80,40) Initial triple M1 = (80,40,50) 
τ M0 2M0 3M0 4M0 ω M1 2M1 3M1 4M1 
0.5 0.0394 0.0452 0.0497 0.0440 0.3 0.8209 0.9865 0.9995 1.0000 
2 0.4495 0.6622 0.7989 0.8708 0.6 0.2712 0.5675 0.7669 0.8765 
4 0.9791 0.9998 1.0000 1.0000 2 0.3115 0.7163 0.9381 0.9937 
6 1.0000 1.0000 1.0000 1.0000 3 0.7285 0.9933 1.0000 1.0000 
From Tables 5–7, we can see that when $\tau = 0.5 < 1$, $(YY^*)^{-1}\Sigma^{1/2}XX^*\Sigma^{1/2}$ is not a spiked F matrix and the power is poor. When $\tau > 1$, it is a spiked F matrix and, by Proposition 1, the power increases with the dimension and with $\tau$. This phenomenon is due to the fact that a finite-rank perturbation that is too weak does not cause a significant change to the largest eigenvalue of an F matrix. This phenomenon has been widely discussed for sample covariance matrices; see [10] and [3]. For the spiked F matrix, one can refer to [5] and [34]. For the noninvertible case, when $\Sigma$ is far from I ($\omega = 0.3$ or 3), the power improves. This is because, when the empirical spectral distribution (ESD) of $\Sigma^{-1/2}YY^*\Sigma^{-1/2}$ is very different from the M–P law, $\lambda_1$ may tend to another point $\mu$ instead of $\mu_p$; we may then gain good power because $\breve n^{2/3}(\mu - \mu_p)$ may tend to infinity.
TABLE 8 
Power comparison of several triples (p,m,n): Gaussian distribution with M0 = (5,40,10) 
EOM CLRT 
τ M0 2M0 3M0 4M0 M0 2M0 3M0 4M0 
0.5 0.0672 0.0585 0.0563 0.0593 0.1051 0.0909 0.0777 0.0787 
2 0.2763 0.3801 0.4551 0.5067 0.2696 0.2614 0.2503 0.2500 
4 0.6291 0.816 0.9072 0.9567 0.5385 0.5867 0.6126 0.6282 
6 0.8162 0.9543 0.988 0.9967 0.7318 0.8157 0.8499 0.8676 
Table 8 lists the power comparison between our approach (denoted by EOM) and CLRT, where we only consider the alternative case (3.2) with $\Sigma = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$. The nominal significance level is again 5%.
The comparison shows that when $\Sigma$ is a spiked covariance matrix, our statistic performs better than CLRT, which is consistent with our discussion in Section 3.2.
3.5.4. Power of “correlated noise detection”. We consider the model (3.6), that is, $y_t = Ax_t + \Sigma^{1/2}\varepsilon_t$, $t = 1, 2, \ldots, T$. Here, we choose $k = 1$, $A = \sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1$, $\Sigma = I$, $x_t \sim U(-\sqrt3, \sqrt3)$, and the entries of $\varepsilon_t$ also follow the uniform distribution $U(-\sqrt3, \sqrt3)$. Then it is easy to see that $\operatorname{Cov}(y_t) = I + \tau\frac{p/m+r}{1-p/m}e_1e_1^T$, which is similar to Section 3.5.3. Since, as discussed in Section 3.3, in engineering we can generate an i.i.d. copy of $\varepsilon_t$, say $z_t$, $t = 1, \ldots, T_1$, we always choose $T_1 > p$ for simplicity.
Table 9 lists the power study of this model; the nominal significance level of our test is 5%.
TABLE 9
Power of several triples $(p, T_1, T)$: Continuous uniform distribution $U(-\sqrt3, \sqrt3)$
Initial triple M0 = (30,80,40) 
τ M0 2M0 3M0 4M0 
0.5 0.0412 0.0436 0.0485 0.0469 
2 0.4485 0.6399 0.7624 0.8401 
4 0.9486 0.9983 0.9998 1.0000 
6 0.9964 1.0000 1.0000 1.0000 
The simulation results in Table 9 are similar to those for the spiked case in Table 7 above, but we would mention one difference. Write $y_t = (\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)\binom{s_t}{\varepsilon_t}$, with $(\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)$ being of size $p \times (p+1)$. However, Theorem 3.1 of [34] did not consider such a rectangular matrix. This means we cannot directly apply Proposition 1 to conclude that the power tends to 1 when $\tau > 1$. But when the $(p+1)$-dimensional vector $\binom{s_t}{\varepsilon_t}$ follows the standard multivariate Gaussian distribution, we have $y_t \overset{d}{=} (\tau\frac{p/m+r}{1-p/m}e_1e_1^T + I)^{1/2}z_t$, where $z_t \overset{d}{=} \varepsilon_t$. This means that Proposition 1 holds in the Gaussian case. From the simulation results, we conjecture that Theorem 3.1 of [34] and Proposition 1 still hold for this model even when $(\sqrt{\tau\frac{p/m+r}{1-p/m}}\,e_1, I)$ is not a square matrix.
4. Proof of part (i) of Theorem 2.1. In this section, we focus on part (i) of Theorem 2.1, that is, $(\breve m, \breve n, \breve p) = (m, n, p)$. In fact, Lemmas 1 and 2 below hold if we replace $(m, n, p)$ by $(\breve m, \breve n, \breve p)$.
4.1. Two key lemmas. In this subsection we first establish two key lemmas used in the proof of part (i) of Theorem 2.1. We begin with some notation and definitions. Throughout the paper, we use $M, M_0, M_0', M_0'', M_1, M_1''$ to denote generic positive constants whose values may differ from line to line. We also use D to denote a sufficiently large positive constant whose value may differ from line to line. We say that an event $\Omega$ holds with high probability if, for any large positive constant D,
\[
P\bigl(\Omega^c\bigr) \le n^{-D}
\]
for sufficiently large n. Recall the definition of $\gamma_j$ in (2.4). Let $c_{p,0} \in [0, a_p)$ satisfy
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr)^2 = \frac{n}{p}. \tag{4.1}
\]
The existence of $c_{p,0}$ will be verified in Lemma 1 below. Moreover, define
\[
\mu_{p,0} = \frac{1}{c_{p,0}}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr), \qquad
\sigma_{p,0}^3 = \frac{1}{c_{p,0}^3}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\Bigl(\frac{c_{p,0}}{\gamma_j - c_{p,0}}\Bigr)^3\Bigr). \tag{4.2}
\]
Set $A_p = \frac{1}{m}YY^*$ and $B_p = \frac{1}{n}XX^*$. Rank the eigenvalues of the matrix $A_p$ as $\hat\gamma_1 \ge \hat\gamma_2 \ge \cdots \ge \hat\gamma_p$. Let $\hat c_p \in [0, \hat\gamma_p)$ satisfy
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr)^2 = \frac{n}{p}. \tag{4.3}
\]
The existence of $\hat c_p$ with high probability will be given in Lemma 2 below. Moreover, set
\[
\hat\mu_p = \frac{1}{\hat c_p}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr), \qquad
\hat\sigma_p^3 = \frac{1}{\hat c_p^3}\Bigl(1 + \frac{1}{n}\sum_{j=1}^{p}\Bigl(\frac{\hat c_p}{\hat\gamma_j - \hat c_p}\Bigr)^3\Bigr). \tag{4.4}
\]
We now discuss the properties of $c_p, c_{p,0}, \hat c_p, \mu_p, \mu_{p,0}, \hat\mu_p, \sigma_p, \sigma_{p,0}$, defined in (2.5)–(2.7) and (4.1)–(4.4), in the next two lemmas. These lemmas are crucial to the proof strategy, which transforms F matrices into an appropriate sample covariance matrix.
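Given the eigenvalues $\hat\gamma_j$ of $A_p$, the quantities in (4.3)–(4.4) reduce to one-dimensional root-finding. A hedged sketch (our own illustration, not code from the paper):

```python
# Hypothetical illustration of (4.3)-(4.4): solve for c_hat, then form
# mu_hat and sigma_hat from the observed eigenvalues of A_p.
import numpy as np
from scipy.optimize import brentq

def conditional_centering(gamma_hat, n):
    g = np.sort(gamma_hat)         # g[0] is the smallest eigenvalue of A_p
    p = g.size
    lhs = lambda c: np.sum((c / (g - c)) ** 2) / p - n / p       # (4.3)
    c = brentq(lhs, 1e-10, g[0] - 1e-10)  # left side grows to +inf at g[0]
    mu = (1 + np.sum(c / (g - c)) / n) / c                       # (4.4)
    sigma = ((1 + np.sum((c / (g - c)) ** 3) / n) / c ** 3) ** (1 / 3)
    return mu, sigma
```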
LEMMA 1. Under the conditions of Theorem 2.1, there exists a constant $M_0$ such that
\[
\sup_p \frac{c_p}{a_p - c_p} \le M_0, \qquad \sup_p \frac{c_{p,0}}{a_p - c_{p,0}} \le M_0, \tag{4.5}
\]
\[
\lim_{p\to\infty} n^{2/3}|\mu_p - \mu_{p,0}| = 0 \tag{4.6}
\]
and
\[
\lim_{p\to\infty}\frac{\sigma_p}{\sigma_{p,0}} = 1, \qquad \limsup_{p\to\infty}\frac{c_{p,0}}{a_p} < 1. \tag{4.7}
\]
LEMMA 2. Under the conditions of Theorem 2.1, for any $\zeta > 0$ there exists a constant $M_\zeta \ge M_0$ such that
\[
\sup_p \frac{\hat c_p}{\hat\gamma_p - \hat c_p} \le M_\zeta, \qquad \limsup_{p\to\infty}\frac{\hat c_p}{\hat\gamma_p} < 1 \tag{4.8}
\]
and
\[
\lim_{p\to\infty} n^{2/3}|\hat\mu_p - \mu_{p,0}| = 0, \qquad \lim_{p\to\infty}\frac{\hat\sigma_p}{\sigma_{p,0}} = 1 \tag{4.9}
\]
hold with high probability. Indeed, (4.8) and (4.9) hold on the event $S_\zeta$ defined by
\[
S_\zeta = \bigl\{\forall j, 1 \le j \le p,\ |\hat\gamma_j - \gamma_j| \le p^{\zeta}p^{-2/3}\tilde j^{-1/3}\bigr\}, \tag{4.10}
\]
where $\zeta$ is a sufficiently small positive constant and $\tilde j = \min\{\min\{m, p\} + 1 - j, j\}$.

The proofs of Lemmas 1 and 2 are given in the supplementary material [12].
4.2. Proof of part (i) of Theorem 2.1. Recall the definition of the matrices $A_p$ and $B_p$ above (4.3). Define the F matrix $F = A_p^{-1}B_p$, whose largest eigenvalue is $\lambda_1$ by the definition of $\lambda_1$ in Theorem 2.1. To prove Theorem 2.1, it then suffices to find the asymptotic distribution of $\lambda_1$.
Recalling the definition of the event $S_\zeta$ in (4.10), we may write
\[
P\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta\bigr) + P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta^c\bigr).
\]
This implies that (2.10) is equivalent to
\[
\lim_{p\to\infty} P\bigl(\bigl(\sigma_p n^{2/3}(\lambda_1 - \mu_p) \le s\bigr) \cap S_\zeta\bigr) = F_1(s), \tag{4.11}
\]
where we use the fact that $P(S_\zeta^c) \le p^{-D}$ for any positive D by Theorem 3.3 of [25].
Write
\[
\sigma_p n^{2/3}(\lambda_1 - \mu_p) = \frac{\sigma_p}{\hat\sigma_p}\,\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) + \sigma_p n^{2/3}(\hat\mu_p - \mu_p) \tag{4.12}
\]
[see (4.3) and (4.4) for $\hat\sigma_p$ and $\hat\mu_p$]. Note that the eigenvalues of $A_p^{-1}$ are $\frac{1}{\hat\gamma_1} \le \frac{1}{\hat\gamma_2} \le \cdots \le \frac{1}{\hat\gamma_p}$. Rewrite (4.3) as
\[
\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr)^2 = \frac{n}{p}. \tag{4.13}
\]
Also recast (4.4) as
\[
\hat\mu_p = \frac{1}{\hat c_p}\Bigl(1 + \frac{p}{n}\cdot\frac{1}{p}\sum_{j=1}^{p}\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr), \qquad
\hat\sigma_p^3 = \frac{1}{\hat c_p^3}\Bigl(1 + \frac{p}{n}\cdot\frac{1}{p}\sum_{j=1}^{p}\Bigl(\frac{(1/\hat\gamma_j)\hat c_p}{1 - (1/\hat\gamma_j)\hat c_p}\Bigr)^3\Bigr). \tag{4.14}
\]
Up to this stage, the known results on the largest eigenvalue of sample covariance matrices $\Sigma^{1/2}ZZ^*\Sigma^{1/2}$, with $\Sigma$ being the population covariance matrix, come into play, where Z is of size $p \times n$ satisfying Condition 1 and $\Sigma$ is of size $p \times p$. A key condition ensuring the Tracy–Widom law for the largest eigenvalue is that, if $\rho \in (0, 1/\sigma_1)$ is the solution to the equation
\[
\int\Bigl(\frac{t\rho}{1 - t\rho}\Bigr)^2 dF(t) = \frac{n}{p}, \tag{4.15}
\]
then
\[
\limsup_{p\to\infty} \rho\sigma_1 < 1 \tag{4.16}
\]
(one may see [6], Conditions 1.2 and 1.4 and Theorem 1.3 of [3], and Definition 2.7(i) and Corollary 3.19 of [17]). Here, $F(t)$ denotes the empirical spectral distribution of $\Sigma$ and $\sigma_1$ denotes the largest eigenvalue of $\Sigma$. Now, given $A_p$, if we treat $A_p^{-1}$ as $\Sigma$, then (4.16) is satisfied on the event $S_\zeta$ due to (4.3) and (4.8) in Lemma 2. It follows from Theorem 1.3 of [3] and Corollary 3.19 of [17] that
\[
\lim_{p\to\infty} P\bigl(\bigl(\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) \le s\bigr) \cap S_\zeta \mid A_p\bigr) = F_1(s), \tag{4.17}
\]
which implies that
\[
\lim_{p\to\infty} P\bigl(\bigl(\hat\sigma_p n^{2/3}(\lambda_1 - \hat\mu_p) \le s\bigr) \cap S_\zeta\bigr) = F_1(s). \tag{4.18}
\]
Moreover, by Lemmas 1 and 2 we obtain, on the event $S_\zeta$,
\[
\lim_{p\to\infty}\frac{\sigma_p}{\hat\sigma_p} = 1 \tag{4.19}
\]
and
\[
\lim_{p\to\infty}\sigma_p n^{2/3}(\hat\mu_p - \mu_p) = 0. \tag{4.20}
\]
Equation (4.11) then follows from (4.12), (4.17)–(4.20) and Slutsky's theorem. The proof is complete.
5. Proof of part (ii) of Theorem 2.1: Standard Gaussian distribution. This section considers the case where the $\{X_{ij}\}$ follow the normal distribution with mean zero and variance one. We first introduce more notation. Let $A = (A_{ij})$ be a matrix. We define the following norms:
\[
\|A\| = \max_{|x|=1}|Ax|, \qquad \|A\|_\infty = \max_{i,j}|A_{ij}|, \qquad \|A\|_F = \sqrt{\sum_{ij}|A_{ij}|^2},
\]
where $|x|$ denotes the Euclidean norm of a vector x. Notice that $\|\cdot\|_\infty$ is a “pseudo norm”, and we have the simple relationship
\[
\|A\|_\infty \le \|A\| \le \|A\|_F.
\]
We also need the following commonly used notion of stochastic domination to simplify the statements.

DEFINITION 2 (Stochastic domination). Let
\[
\xi = \bigl\{\xi^{(n)}(u) : n \in \mathbb{N}, u \in U^{(n)}\bigr\}, \qquad \zeta = \bigl\{\zeta^{(n)}(u) : n \in \mathbb{N}, u \in U^{(n)}\bigr\}
\]
be two families of random variables, where $U^{(n)}$ is an n-dependent parameter set (or independent of n). If for sufficiently small positive $\varepsilon$ and sufficiently large $\sigma$,
\[
\sup_{u \in U^{(n)}} P\bigl[\bigl|\xi^{(n)}(u)\bigr| > n^{\varepsilon}\bigl|\zeta^{(n)}(u)\bigr|\bigr] \le n^{-\sigma}
\]
for large enough $n \ge n_0(\varepsilon, \sigma)$, then we say that $\zeta$ stochastically dominates $\xi$ uniformly in u. We denote this relationship by $|\xi| \prec \zeta$ and also write it as $\xi = O_\prec(\zeta)$. Furthermore, we also write $|x| \prec y$ if x and y are both nonrandom and $|x| \le n^{\varepsilon}|y|$ for sufficiently small positive $\varepsilon$.
PROOF OF PART (ii) OF THEOREM 2.1. We start the proof by reminding readers that $m < p$. Since $m < p$, the limiting spectral distribution of $\frac{1}{p}Y^*Y$ is the M–P law, and we denote its density by $\rho_{pm}(x)$. We define $\gamma_{m,1} \ge \gamma_{m,2} \ge \cdots \ge \gamma_{m,m}$ to satisfy
\[
\int_{\gamma_{m,j}}^{+\infty}\rho_{pm}(x)\,dx = \frac{j}{m}, \tag{5.1}
\]
with $\gamma_{m,0} = (1 + \sqrt{m/p})^2$ and $\gamma_{m,m} = (1 - \sqrt{m/p})^2$. Correspondingly, denote the eigenvalues of $\frac{1}{p}Y^*Y$ by $\hat\gamma_{m,1} \ge \hat\gamma_{m,2} \ge \cdots \ge \hat\gamma_{m,m}$. Here we remind readers that $\rho_{pm}(x)$, $\gamma_{m,j}$ and $\hat\gamma_{m,j}$ are analogous to the quantities in (2.3), below (2.3) and above (4.3), except that the roles of p and m are interchanged because we consider $\frac{1}{p}Y^*Y$ rather than $\frac{1}{p}YY^*$. Moreover, by Theorem 3.3 of [25] and (4.10), for any sufficiently small $\zeta > 0$ and large $D > 0$ there exists an event $S_\zeta$ (with a slight abuse of notation) such that
\[
S_\zeta = \bigl\{\forall j, 1 \le j \le m,\ |\hat\gamma_{m,j} - \gamma_{m,j}| \le p^{\zeta - 2/3}\tilde j^{-1/3}\bigr\} \tag{5.2}
\]
and
\[
P\bigl(S_\zeta^c\bigr) \le p^{-D}. \tag{5.3}
\]
Note that $\frac{1}{p}YY^*$ and $\frac{1}{p}Y^*Y$ have the same nonzero eigenvalues. To simplify notation, let $m_p = m + n - p$. Write
\[
\frac{1}{p}YY^* = U^*\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}U, \tag{5.4}
\]
where $D = \operatorname{diag}\{\hat\gamma_{m,1}, \hat\gamma_{m,2}, \ldots, \hat\gamma_{m,m}\}$ and U is an orthogonal matrix. Then $\det(\lambda\frac{YY^*}{p} - \frac{XX^*}{m_p}) = 0$ is equivalent to
\[
\det\Bigl(\lambda\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{m_p}UXX^*U^*\Bigr) = 0.
\]
Moreover, since the $\{X_{ij}\}$ are independent standard normal random variables and U is an orthogonal matrix, we have $UX \overset{d}{=} X$, so it suffices to consider the determinantal equation
\[
\det\Bigl(\lambda\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix} - \frac{1}{m_p}XX^*\Bigr) = 0. \tag{5.5}
\]
Here, $\overset{d}{=}$ means having the same distribution.
Now rewrite X as $X = \binom{X_1}{X_2}$, where $X_1$ is an $m \times n$ matrix and $X_2$ is a $(p-m) \times n$ matrix. It follows that
\[
XX^* = \begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix} \triangleq \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix}. \tag{5.6}
\]
Equation (5.5) can be rewritten as
\[
\det\begin{pmatrix} \frac{1}{m_p}X_{11} - \lambda D & \frac{1}{m_p}X_{12} \\ \frac{1}{m_p}X_{21} & \frac{1}{m_p}X_{22} \end{pmatrix} = 0.
\]
Since $m + n > p$, $X_{22}$ is invertible. Equation (5.5) is thus further equivalent to
\[
\det\Bigl(\frac{1}{m_p}X_{11} - \lambda D - \frac{1}{m_p}X_{12}X_{22}^{-1}X_{21}\Bigr) = 0. \tag{5.7}
\]
Moreover,
\[
X_{11} - X_{12}X_{22}^{-1}X_{21} = X_1X_1^* - X_1X_2^*\bigl(X_2X_2^*\bigr)^{-1}X_2X_1^* = X_1\bigl(I_n - X_2^*(X_2X_2^*)^{-1}X_2\bigr)X_1^*.
\]
Since $\operatorname{rank}(I_n - X_2^*(X_2X_2^*)^{-1}X_2) = m + n - p = m_p$, we can write
\[
I_n - X_2^*\bigl(X_2X_2^*\bigr)^{-1}X_2 = V\begin{pmatrix} I_{m_p} & 0 \\ 0 & 0 \end{pmatrix}V^*,
\]
where V is an orthogonal matrix. In view of the above, we can construct an $m \times m_p$ matrix $Z = (Z_{ij})_{m,m_p}$ consisting of independent standard normal random variables such that
\[
X_{11} - X_{12}X_{22}^{-1}X_{21} \overset{d}{=} ZZ^*. \tag{5.8}
\]
It follows that (5.7), and hence (5.5), is equivalent to
\[
\det\Bigl(\frac{1}{m_p}ZZ^* - \lambda D\Bigr) = 0. \tag{5.9}
\]
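The algebra behind (5.6)–(5.8) can be checked numerically. A small sketch (our own illustration, not code from the paper):

```python
# Hypothetical numerical check of the Schur-complement identity behind
# (5.6)-(5.8) and of the rank count used for (5.8).
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 12, 8, 10                    # m < p < m + n
X1 = rng.standard_normal((m, n))
X2 = rng.standard_normal((p - m, n))

X11, X12 = X1 @ X1.T, X1 @ X2.T
X21, X22 = X2 @ X1.T, X2 @ X2.T

schur = X11 - X12 @ np.linalg.inv(X22) @ X21
proj = np.eye(n) - X2.T @ np.linalg.inv(X2 @ X2.T) @ X2   # I - P
assert np.allclose(schur, X1 @ proj @ X1.T)
assert np.linalg.matrix_rank(proj) == m + n - p           # = m_p
```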
It thus suffices to consider the largest eigenvalue of $\frac{1}{m_p}D^{-1}ZZ^*$; denote it by $\lambda_1$. As in (4.3) and (4.4), define $\hat c_m \in [0, \hat\gamma_{m,m})$ to satisfy
\[
\frac{1}{m}\sum_{j=1}^{m}\Bigl(\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr)^2 = \frac{m_p}{m} \tag{5.10}
\]
and define $\hat\mu_m$ and $\hat\sigma_m$ by
\[
\hat\mu_m = \frac{1}{\hat c_m}\Bigl(1 + \frac{1}{m_p}\sum_{j=1}^{m}\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr), \qquad
\hat\sigma_m^3 = \frac{1}{\hat c_m^3}\Bigl(1 + \frac{1}{m_p}\sum_{j=1}^{m}\Bigl(\frac{\hat c_m}{\hat\gamma_{m,j} - \hat c_m}\Bigr)^3\Bigr).
\]
From Lemma 2, we have on the event $S_\zeta$
\[
\limsup_{p\to\infty}\frac{\hat c_m}{\hat\gamma_{m,m}} < 1, \tag{5.11}
\]
which implies condition (4.16). It follows from Theorem 1.3 of [3] and Corollary 3.19 of [17] that
\[
\lim_{p\to\infty} P\bigl(\hat\sigma_m(m+n-p)^{2/3}(\lambda_1 - \hat\mu_m) \le s\bigr) = F_1(s). \tag{5.12}
\]
As in the proof of part (i) of Theorem 2.1, by Lemmas 1 and 2 one may further conclude that
\[
\lim_{p\to\infty} P\bigl(\sigma_p(m+n-p)^{2/3}(\lambda_1 - \mu_p) \le s\bigr) = F_1(s). \tag{5.13}
\]
6. Proof of part (ii) of Theorem 2.1: General distributions. The aim of this section is to relax the Gaussian assumption on X. We assume below that X and Y are real matrices; the complex case can be handled similarly and is omitted. In the sequel, we absorb $\frac{1}{\sqrt{m+n-p}}$ and $\frac{1}{\sqrt{p}}$ into X and Y, respectively [i.e., $\operatorname{Var}(X_{ij}) = \frac{1}{m+n-p}$ and $\operatorname{Var}(Y_{st}) = \frac{1}{p}$] for convenience.
In terms of the notation of this section [$\operatorname{Var}(Y_{st}) = \frac{1}{p}$], (5.4) can be rewritten as
\[
YY^* = U^*\begin{pmatrix} D & 0 \\ 0 & 0 \end{pmatrix}U.
\]
Break U as $\binom{U_1}{U_2}$, where $U_1$ and $U_2$ are $m \times p$ and $(p-m) \times p$, respectively. By (5.4)–(5.7) (note that here we cannot remove U via $UX \overset{d}{=} X$), the largest root of $\det(\lambda YY^* - XX^*) = 0$ equals the largest eigenvalue of the matrix
\[
A = D^{-1/2}U_1X\bigl(I - X^TU_2^T\bigl(U_2XX^TU_2^T\bigr)^{-1}U_2X\bigr)X^TU_1^TD^{-1/2} = D^{-1/2}U_1X\bigl(I - P_{X^TU_2^T}\bigr)X^TU_1^TD^{-1/2}, \tag{6.1}
\]
where $P_{X^TU_2^T}$ is the corresponding projection matrix. It is not necessary to assume that $U_2XX^TU_2^T$ is invertible, since $P_{X^TU_2^T}$ is unique even when $(U_2XX^TU_2^T)^-$ is a generalized inverse of $U_2XX^TU_2^T$. Moreover, we in fact have the following lemma controlling the smallest eigenvalue of $U_2XX^TU_2^T$.
LEMMA 3. Suppose that $(m+n-p)^{1/2}X$ satisfies Condition 1. Then $U_2XX^TU_2^T$ is invertible and
\[
\bigl\|\bigl(U_2XX^TU_2^T\bigr)^{-1}\bigr\| \le M \tag{6.2}
\]
for a large constant M with high probability. Moreover,
\[
\|XX^*\| \le M \tag{6.3}
\]
with high probability under the conditions of Theorem 2.1.
PROOF. One may check that the conditions of Theorem 3.12 in [17] are satisfied for $U_2XX^TU_2^T$. Applying Theorem 3.12 in [17] then yields
\[
\Bigl|\lambda_{\min}\bigl(U_2XX^TU_2^T\bigr) - \frac{n}{m+n-p}\Bigl(1 - \sqrt{\frac{p-m}{n}}\Bigr)^2\Bigr| \prec n^{-2/3},
\]
where the centering can be identified by considering the special case in which the entries of X are Gaussian. As for (6.3), see Lemma 4.8 in [17]. □
Since the matrix in (6.1) is quite complicated, we construct a linearization matrix for it:
\[
H = H(X) = \begin{pmatrix} -zI & 0 & D^{-1/2}U_1X \\ 0 & 0 & U_2X \\ X^TU_1^TD^{-1/2} & X^TU_2^T & -I \end{pmatrix}. \tag{6.4}
\]
The connection between H and the matrix in (6.1) is that, by a simple calculation, the upper-left block of the $3 \times 3$ block matrix $H^{-1}$ is the Stieltjes transform of (6.1).
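This connection can be verified numerically: the upper-left $m \times m$ block of $H^{-1}$ is the resolvent $(A - zI)^{-1}$ of the matrix A in (6.1), whose normalized trace is its Stieltjes transform. A small sketch (our own illustration):

```python
# Hypothetical check: the upper-left m x m block of H(X)^{-1} equals the
# resolvent (A - zI)^{-1} of the matrix A in (6.1).
import numpy as np

rng = np.random.default_rng(2)
p, m, n = 12, 8, 15
z = 0.5 + 0.3j

Y = rng.standard_normal((p, m)) / np.sqrt(p)     # Var(Y_st) = 1/p
w, V = np.linalg.eigh(Y @ Y.T)                   # YY* = V diag(w) V^T
U = V[:, ::-1].T                                 # rows = eigenvectors, descending
D = np.diag(w[::-1][:m])                         # nonzero eigenvalues of YY*
U1, U2 = U[:m], U[m:]
X = rng.standard_normal((p, n)) / np.sqrt(m + n - p)

B1 = np.linalg.inv(np.sqrt(D)) @ U1 @ X          # D^{-1/2} U_1 X
B2 = U2 @ X                                      # U_2 X
P = B2.T @ np.linalg.inv(B2 @ B2.T) @ B2         # P_{X^T U_2^T}
A = B1 @ (np.eye(n) - P) @ B1.T                  # the matrix (6.1)

H = np.block([
    [-z * np.eye(m), np.zeros((m, p - m)), B1],
    [np.zeros((p - m, m)), np.zeros((p - m, p - m)), B2],
    [B1.T, B2.T, -np.eye(n)],
])
G = np.linalg.inv(H)
assert np.allclose(G[:m, :m], np.linalg.inv(A - z * np.eye(m)))
```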
We next give the limit of the Stieltjes transform of (6.1), for which we need the following well-known result (see [1]): there exists a unique solution $\underline m(z) : \mathbb{C}^+ \to \mathbb{C}^+$ such that
\[
\frac{1}{\underline m(z)} = -z + \frac{m}{m+n-p}\int \frac{t}{1 + t\underline m(z)}\,dH_n(t), \tag{6.5}
\]
where $H_n$ is the empirical distribution function of $D^{-1}$. Moreover, we set
\[
m(z) = -\frac{1}{m}\operatorname{Tr}\bigl(z\bigl(1 + \underline m(z)D^{-1}\bigr)\bigr)^{-1}, \qquad \rho(x) = \lim_{z\in\mathbb{C}^+\to x}\frac{1}{\pi}\,\Im m(z).
\]
From the end of the last section, we see that in the Gaussian case (6.1) $\overset{d}{=} D^{-1/2}ZZ^*D^{-1/2}$. Hence it is easy to see that $\hat\mu_m$, defined above (5.11), is the rightmost end point of the support of $\rho(x)$.
For any small positive constant $\tau$, we define the domains
\[
E(\tau, n) = \bigl\{z = E + i\eta \in \mathbb{C}^+ : |z| \ge \tau, |E| \le \tau^{-1}, n^{-1+\tau} \le \eta \le \tau^{-1}\bigr\}, \tag{6.6}
\]
\[
E^+ = E^+(\tau, \tau', n) = \bigl\{z \in E(\tau, n) : E \ge \hat\mu_m - \tau'\bigr\}, \tag{6.7}
\]
where $\tau'$ is a sufficiently small positive constant.
Set
\[
\Psi = \Psi(z) = \sqrt{\frac{\Im m(z)}{n\eta}} + \frac{1}{n\eta}, \tag{6.8}
\]
\[
G(z) = H^{-1}, \qquad \Pi = \Pi(z) = z^{-1}\bigl(1 + \underline m(z)D^{-1}\bigr)^{-1}
\]
and
\[
F(z) = \begin{pmatrix}
\Pi & -\Pi D^{-1/2}U_1XX^TU_2^T\Omega & 0 \\
-\Omega U_2XX^TU_1^TD^{-1/2}\Pi & \Omega + \Omega U_2XX^TU_1^TD^{-1/2}\Pi D^{-1/2}U_1XX^TU_2^T\Omega & \Omega U_2X \\
0 & X^TU_2^T\Omega & \bigl(z\underline m(z)+1\bigr)\bigl(I - P_{X^TU_2^T}\bigr)
\end{pmatrix},
\]
where $\Omega = (U_2XX^TU_2^T)^{-1}$. In fact, F(z) is close to G(z) with high probability.
We are now in a position to state our main result on the local law near $\hat\mu_m$, the right end point of the support of the limit of the ESD of A in (6.1).
THEOREM 6.1 (Strong local law). Suppose that $(m+n-p)^{1/2}X$ and $p^{1/2}Y$ satisfy the conditions of Theorem 2.1. Then:
(i) for any deterministic unit vectors $v, w \in \mathbb{R}^{p+n}$,
\[
\bigl\langle v, \bigl(G(z) - F(z)\bigr)w\bigr\rangle \prec \Psi \tag{6.9}
\]
uniformly in $z \in E^+$; and
(ii)
\[
\bigl|m_n(z) - m(z)\bigr| \prec \frac{1}{n\eta} \tag{6.10}
\]
uniformly in $z \in E^+$, where $m_n(z) = \frac{1}{m}\sum_{i=1}^{m} G_{ii}$.
The proof of Theorem 6.1 is provided in the supplementary material [12]. 
6.1. Convergence rate at the right edge and universality.

6.1.1. Convergence rate at the right edge. We state the following lemma; its proof is given in the supplementary material [12].

LEMMA 4. Denote by $\lambda_1$ the largest eigenvalue of A in (6.1). Under the conditions of Theorem 2.1,
\[
\lambda_1 - \hat\mu_m = O_\prec\bigl(n^{-2/3}\bigr).
\]
6.1.2. Universality. The aim of this subsection is to prove part (ii) of Theorem 2.1. By (5.12) and (5.13), it suffices to prove edge universality at the rightmost edge $\hat\mu_m$ of the support; in other words, that the asymptotic distribution of $\lambda_1$ is not affected by the distribution of the entries of X under the third-order moment matching condition. Similarly to Theorem 6.4 of [9], we first show the following Green function comparison theorem.

THEOREM 6.2. There exists $\varepsilon_0 > 0$ such that the following holds for any $\varepsilon < \varepsilon_0$. Set $\eta = n^{-2/3-\varepsilon}$ and let $E_1, E_2 \in \mathbb{R}$ with $E_1 < E_2$ and $|E_1 - \hat\mu_m|, |E_2 - \hat\mu_m| \le n^{-2/3+\varepsilon}$. Suppose that $K : \mathbb{R} \to \mathbb{R}$ is a smooth function with bounded derivatives up to fifth order. Then there exists a constant $\phi > 0$ such that for large enough n,
\[
\Bigl|EK\Bigl(\frac{n}{\pi}\int_{E_1}^{E_2}\Im m_{X^1}(x + i\eta)\,dx\Bigr) - EK\Bigl(\frac{n}{\pi}\int_{E_1}^{E_2}\Im m_{X^0}(x + i\eta)\,dx\Bigr)\Bigr| \le n^{-\phi} \tag{6.11}
\]
[see Definition 1 or (2.8) for $X^1$ and $X^0$].
The proof of Theorem 6.2 is given in the supplementary material [12].
In order to prove the Tracy–Widom law, we need to connect the probability $P(\lambda_1 \le E)$ with Theorem 6.2. By Lemma 4, we can fix $E^* \prec n^{-2/3}$ such that it suffices to consider $\lambda_1 \le \hat\mu_m + E^*$. Choosing $|E - \hat\mu_m| \prec n^{-2/3}$, $\eta = n^{-2/3-9\varepsilon}$ and $l = \frac{1}{2}n^{-2/3-\varepsilon}$, for some sufficiently small constant $\varepsilon > 0$ and sufficiently large constant D there exists a constant $n_0(\varepsilon, D)$ such that
\[
EK\Bigl(\frac{n}{\pi}\int_{E-l}^{\hat\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Bigr) \le P(\lambda_1 \le E) \le EK\Bigl(\frac{n}{\pi}\int_{E+l}^{\hat\mu_m+E^*}\Im m_{X^1}(x + i\eta)\,dx\Bigr) + n^{-D} \tag{6.12}
\]
for $n \ge n_0(\varepsilon, D)$, where K is a smooth cutoff function satisfying the conditions on K in Theorem 6.2. We omit the proof of (6.12), since it is a standard procedure; one can refer to [9] or Corollary 5.1 of [4], for instance. Combining (6.12) with Theorem 6.2, one can prove the Tracy–Widom law directly (see the proof of Theorem 1.3 of [3]).
SUPPLEMENTARY MATERIAL
Supplement: Proof of some lemmas and theorems (DOI: 10.1214/15-AOS1427SUPP; .pdf). In the supplementary file [12], we provide the proofs of (2.11), Lemmas 1, 2 and 4, and Theorems 6.1 and 6.2.
REFERENCES 
[1] BAI, Z. and SILVERSTEIN, J. W. (2006). Spectral Analysis of Large Dimensional Random 
Matrices, 1st ed. Springer, New York. 
[2] BAIK, J. and SILVERSTEIN, J. W. (2006). Eigenvalues of large sample covariance matrices of 
spiked population models. J. Multivariate Anal. 97 1382–1408. MR2279680 
[3] BAO, Z., PAN, G. and ZHOU, W. (2015). Universality for the largest eigenvalue of sample 
covariance matrices with general population. Ann. Statist. 43 382–421. MR3311864 
[4] BAO, Z. G., PAN, G. M. and ZHOU, W. (2013). Local density of the spectrum on the edge for sample covariance matrices with general population. Preprint. Available at http://www.ntu.edu.sg/home/gmpan/publications.html.
[5] DHARMAWANSA, P., JOHNSTONE, I. M. and ONATSKI, A. (2014). Local asymptotic normality of the spectrum of high-dimensional spiked F-ratios. Available at http://arxiv.org/pdf/1411.3875.pdf.
[6] EL KAROUI, N. (2007). Tracy–Widom limit for the largest eigenvalue of a large class of complex sample covariance matrices. Ann. Probab. 35 663–714. MR2308592
[7] ERDŐS, L., KNOWLES, A. and YAU, H. (2013). Averaging fluctuations in resolvents of random band matrices. Ann. Henri Poincaré 14 1837–1926. MR3119922
[8] ERDŐS, L., SCHLEIN, B. and YAU, H. (2009). Local semicircle law and complete delocalization for Wigner random matrices. Comm. Math. Phys. 287 641–655. MR2481753
[9] ERDŐS, L., YAU, H. and YIN, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229 1435–1515. MR2871147
[10] FÉRAL, D. and PÉCHÉ, S. (2009). The largest eigenvalues of sample covariance matrices for 
a spiked population: Diagonal case. J. Math. Phys. 50 073302, 33. MR2548630 
[11] FUJIKOSHI, Y., ULYANOV, V. V. and SHIMIZU, R. (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. Wiley, Hoboken, NJ. MR2640807
[12] HAN, X., PAN, G. and ZHANG, B. (2016). Supplement to “The Tracy–Widom law for the 
largest eigenvalue of F type matrices.” DOI:10.1214/15-AOS1427SUPP. 
[13] JOHANSSON, K. (2000). Shape fluctuations and random matrices. Comm. Math. Phys. 209 
437–476. MR1737991 
[14] JOHNSTONE, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327. MR1863961
[15] JOHNSTONE, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, 
Tracy–Widom limits and rates of convergence. Ann. Statist. 36 2638–2716. MR2485010 
[16] JOHNSTONE, I. M. (2009). Approximate null distribution of the largest root in multivariate 
analysis. Ann. Appl. Stat. 3 1616–1633. MR2752150 
[17] KNOWLES, A. and YIN, J. (2015). Anisotropic local laws for random matrices. Available at 
arXiv:1410.3516v3. 
[18] LEDOIT, O. and WOLF, M. (2002). Some hypothesis tests for the covariance matrix when the 
dimension is large compared to the sample size. Ann. Statist. 30 1081–1102. MR1926169 
[19] LEE, J. O. and SCHNELLI, K. (2014). Tracy–Widom distribution for the largest eigenvalue of 
real sample covariance matrices with general population. Available at arXiv:1409.4979v1. 
[20] LEVANON, N. (1988). Radar Principles. Wiley, New York. 
[21] MARČENKO, V. A. and PASTUR, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Sb. Math. 4 457–483.
[22] MUIRHEAD, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. 
MR0652932 
[23] NADAKUDITI, R. R. and SILVERSTEIN, J. W. (2010). Fundamental limit of sample generalized eigenvalue based detection of signals in noise using relatively few signal-bearing and noise-only samples. IEEE J. Sel. Top. Signal Process. 4 468–480.
[24] PAUL, D. and SILVERSTEIN, J. W. (2009). No eigenvalues outside the support of the limiting 
empirical spectral distribution of a separable covariance matrix. J. Multivariate Anal. 100 
37–57. MR2460475 
[25] PILLAI, N. S. and YIN, J. (2014). Universality of covariance matrices. Ann. Appl. Probab. 24 
935–1001. MR3199978 
[26] SOSHNIKOV, A. (2002). A note on universality of the distribution of the largest eigenvalues in 
certain sample covariance matrices. J. Stat. Phys. 108 1033–1056. MR1933444 
[27] TAO, T. and VU, V. (2011). Random matrices: Universality of local eigenvalue statistics. Acta 
Math. 206 127–204. MR2784665 
[28] TAO, T. and VU, V. (2012). Random covariance matrices: Universality of local statistics of 
eigenvalues. Ann. Probab. 40 1285–1315. MR2962092 
[29] TRACY, C. A. and WIDOM, H. (1994). Level-spacing distributions and the Airy kernel. Comm. 
Math. Phys. 159 151–174. MR1257246 
[30] TRACY, C. A. and WIDOM, H. (1996). On orthogonal and symplectic matrix ensembles. 
Comm. Math. Phys. 177 727–754. MR1385083 
[31] VINOGRADOVA, J., COUILLET, R. and HACHEM, W. (2013). Statistical inference in large 
antenna arrays under unknown noise pattern. IEEE Trans. Signal Process. 61 5633–5645. 
MR3130031 
[32] WACHTER, K. W. (1980). The limiting empirical measure of multiple discriminant ratios. Ann. 
Statist. 8 937–957. MR0585695 
[33] WANG, K. (2012). Random covariance matrices: Universality of local statistics of eigenvalues 
up to the edge. Random Matrices Theory Appl. 1 1150005, 24. MR2930383 
[34] WANG, Q. and YAO, J. (2015). Extreme eigenvalues of large-dimensional spiked Fisher matrices with application. Available at http://arxiv.org/pdf/1504.05087.pdf.
[35] YAO, J. F., ZHENG, S. R. and BAI, Z. D. (2015). Large Sample Covariance Matrices and High-Dimensional Data Analysis. Cambridge Univ. Press, Cambridge.
[36] ZENG, Y. H. and LIANG, Y. C. (2009). Eigenvalue-based spectrum sensing algorithms for cognitive radio. IEEE Trans. Commun. 57 1784–1793.
[37] ZHANG, L. X. (2006). Spectral Analysis of Large Dimensional Random Matrices. Ph.D. Thesis, National University of Singapore.
[38] ZHENG, S. (2012). Central limit theorems for linear spectral statistics of large dimensional 
F -matrices. Ann. Inst. Henri Poincaré Probab. Stat. 48 444–476. MR2954263 
SCHOOL OF PHYSICAL AND MATHEMATICAL SCIENCES 
NANYANG TECHNOLOGICAL UNIVERSITY 
SINGAPORE 
E-MAIL: xhan011@e.ntu.edu.sg 
gmpan@ntu.edu.sg 
bzhang007@e.ntu.edu.sg 
