The Annals of Statistics 
2017, Vol. 45, No. 1, 415–460 
DOI: 10.1214/16-AOS1463 
© Institute of Mathematical Statistics, 2017 
EXTREME EIGENVALUES OF LARGE-DIMENSIONAL SPIKED 
FISHER MATRICES WITH APPLICATION 
BY QINWEN WANG AND JIANFENG YAO 
University of Hong Kong 
Consider two p-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. Let $S_1$ and $S_2$ be the corresponding sample covariance matrices with degrees of freedom $m$ and $n$. When the difference $\Delta$ between $\Sigma_1$ and $\Sigma_2$ is of small rank compared to $p$, $m$ and $n$, the Fisher matrix $S := S_2^{-1}S_1$ is called a spiked Fisher matrix. When $p$, $m$ and $n$ grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of the Fisher matrix: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of $S$ will converge to some point outside the support of the global limit (LSD) of the other eigenvalues (become outliers); otherwise, they will converge to the edge points of the LSD. Furthermore, we derive central limit theorems for those outlier eigenvalues of $S$. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are simple. Two applications are introduced. The first application uses the largest eigenvalue of the Fisher matrix to test the equality between two high-dimensional covariance matrices, and an explicit power function is found under the spiked alternative. The second application is in the field of signal detection, where an estimator for the number of signals is proposed while the covariance structure of the noise is arbitrary.
Received April 2015; revised March 2016.
MSC2010 subject classifications. Primary 62H12; secondary 60F05.
Key words and phrases. Large-dimensional Fisher matrices, spiked Fisher matrix, spiked population model, extreme eigenvalue, phase transition, central limit theorem, signal detection, high-dimensional data analysis.

1. Introduction. Consider two $p$-variate populations with covariance matrices $\Sigma_1$ and $\Sigma_2$, and let $S_1$ and $S_2$ be the sample covariance matrices from samples of the two populations with degrees of freedom $m$ and $n$, respectively. When the difference between $\Sigma_1$ and $\Sigma_2$ is of finite rank, the Fisher matrix $S := S_2^{-1}S_1$ is called a spiked Fisher matrix. In this paper, we derive three results related to the extreme eigenvalues of the spiked Fisher matrix for general populations in the large-dimensional regime, that is, the dimension $p$ grows to infinity together with the two sample sizes $m$ and $n$. Our first result is a phase transition phenomenon for the extreme eigenvalues of $S$: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of $S$ will converge to some point outside the support of the global limit (LSD) of the other eigenvalues (become outliers), and the location of this limit depends only on the corresponding population spike of $\Delta$ and
two dimension-to-sample-size ratios; otherwise, they will converge to the edge points of the LSD. The second result concerns the second-order behavior of those outlier eigenvalues of $S$. We show that, after proper normalization, a packet of outlier eigenvalues (corresponding to the same spike in $\Delta$) converges to the distribution of the eigenvalues of some structured Gaussian random matrix. In particular, the limiting distribution of an outlier eigenvalue of $S$ (after normalization) is Gaussian if and only if the corresponding spike in $\Delta$ is simple. Finally, as an extension, we consider the joint distribution of all those outlier eigenvalues (corresponding to different spikes in $\Delta$) as a whole, and it is shown that the outlier eigenvalues (after normalization) converge to the distribution of the eigenvalues of a block random matrix whose structure can be fully identified. As a special case, if all the spikes in $\Delta$ are simple, then the joint distribution of the outlier eigenvalues of $S$ is multivariate Gaussian.
There exists a vast literature on the spectral analysis of multivariate Fisher matrices under the assumption that both populations are Gaussian and share the same covariance matrix, that is, $\Sigma_1 = \Sigma_2$. The joint distribution of the eigenvalues of the corresponding Fisher matrix $S$ was first published, simultaneously and independently, in 1939 by R. A. Fisher, S. N. Roy, P. L. Hsu and M. A. Girshick. Later, Wachter (1980) found a deterministic limit, the celebrated Wachter distribution, for the empirical measure of these eigenvalues when the dimension $p$ grows to infinity proportionally with the degrees of freedom $m$ and $n$ (the large-dimensional regime). Wachter's result has later been extended to non-Gaussian populations using tools from random matrix theory; two early examples of such extensions are Silverstein (1985) and Bai, Yin and Krishnaiah (1987).
In this paper, we are also interested in the large-dimensional regime, while allowing $\Sigma_1$ and $\Sigma_2$ to be separated by a (finite) rank-$M$ matrix $\Delta$. Moreover, the two populations can have arbitrary, not necessarily Gaussian, distributions. By perturbation theory, when $M$ is a fixed integer while $p$, $m$ and $n$ grow to infinity proportionally, the empirical measure of the $p$ eigenvalues of $S$ is affected by a difference of order $M/p$ ($\to 0$), so that its limit remains the Wachter distribution. Therefore, our main concern is the local asymptotic behavior of the $M$ extreme eigenvalues of $S$ (beyond the global limit). In a recent preprint, Dharmawansa, Johnstone and Onatski (2014), assuming both populations are Gaussian and $M = 1$, show that when the norm of the rank-1 difference (spike) exceeds a phase transition threshold, the asymptotic behavior of the log-ratio of the joint density of the characteristic roots under a local deviation from the spike depends only on the largest eigenvalue $l_{p,1}$, and the statistical experiment of observing all the eigenvalues is locally asymptotically normal (LAN). As a by-product of their analysis, these authors also establish joint asymptotic normality of a few of the largest eigenvalues when the corresponding spikes in $\Delta$ (with $M > 1$) exceed the phase transition threshold. The analysis in this reference relies heavily on the Gaussian assumption, under which the joint density function of the eigenvalues has an explicit form, and the main results are obtained via an accurate asymptotic approximation of the log-ratio of these density functions. Therefore, one of the main objectives of our work is to develop a general theory without such a Gaussian assumption. The joint density of the eigenvalues of the Fisher matrix $S$ then no longer has an analytic formula, and new techniques are needed to address these questions.
Our approach relies on tools borrowed from the theory of random matrices. A methodology that has been particularly successful, both in theory and in applications, within this approach relies on the spiked population model coined in Johnstone (2001). This model assumes the population covariance matrix has the structure $\Sigma_p = I_p + \Delta$, where the rank of $\Delta$ is $M$ ($M$ a fixed integer). Again, for small rank $M$, the empirical eigenvalue distribution of the corresponding sample covariance matrix remains the standard Marčenko–Pastur law. What makes a difference is the local asymptotic behavior of the extreme sample eigenvalues. For example, the fluctuation of the largest eigenvalues of a sample covariance matrix from a complex spiked Gaussian population is studied in Baik, Ben Arous and Péché (2005), where the authors uncover a phase transition phenomenon: the weak limit and the scaling of these extreme eigenvalues differ according to whether the eigenvalues of $\Delta$ (spikes) are above, equal to or below a critical value, situations referred to as super-critical, critical and sub-critical, respectively. In Baik and Silverstein (2006), the authors consider the spiked population model with general (not necessarily Gaussian) populations. For the almost sure limits of the extreme sample eigenvalues, they find that if a population spike (in $\Delta$) is large or small enough, the corresponding spiked sample eigenvalues converge to a limit outside the support of the limiting spectrum (they become outliers). In Paul (2007), a CLT is established for these outliers, that is, in the super-critical case, under the Gaussian assumption and assuming that the population spikes are simple (multiplicity 1). The CLT for super-critical outliers with general populations and arbitrary multiplicities is developed in Bai and Yao (2008). Joint distributions for the outlier sample eigenvalues and eigenvectors can be found in Shi (2013) and Wang, Su and Yao (2014). A recent related application to high-dimensional regression can be found in Kargin (2015).
Within the theory of random matrices, the techniques we use in this paper for 
spiked models are closely connected to other random matrix ensembles through the 
concept of small-rank perturbations. Theories on perturbed Wigner matrices can be 
found in Péché (2006), Féral and Péché (2007), Capitaine, Donati-Martin and Féral 
(2009), Pizzo, Renfrew and Soshnikov (2013) and Renfrew and Soshnikov (2013). 
In a more general setting of finite-rank perturbations, including both the additive and the multiplicative cases, references include Benaych-Georges and Nadakuditi (2011), Benaych-Georges, Guionnet and Maida (2011) and Capitaine (2013).
Apart from the theoretical results, we also propose two applications, in high-dimensional hypothesis testing and in signal detection, respectively. The first application uses the largest eigenvalue of the Fisher matrix to test the hypotheses

(1.1)  $H_0: \Sigma_1 = \Sigma_2$ vs. $H_1: \Sigma_1 = \Sigma_2 + \Delta$,
where $\Delta$ is a nonnegative definite matrix of rank $M$. Under this spiked alternative $H_1$, an explicit formula for the power function is derived. Our second application proposes an estimator for the number of signals based on noisy observations. Unlike the existing approaches [see, e.g., Kritchman and Nadler (2008), Nadler (2010), Passemier and Yao (2012, 2014)], our method allows the covariance structure of the noise to be arbitrary.
The rest of the paper is organized as follows. First, in Section 2, the exact setting of the spiked Fisher matrix $S = S_2^{-1}S_1$ is introduced. Then, in Section 3, we establish the phase transition phenomenon for the extreme eigenvalues of $S$: a displacement formula is found and the transition boundary is explicitly obtained. Next, CLTs for the outlier eigenvalues fluctuating around their limits (i.e., in the super-critical case) are established, first in Section 4 for one group of sample eigenvalues corresponding to the same population spike, and then in Section 6 for all the groups jointly. Section 5 contains numerical illustrations that demonstrate the finite-sample performance of our results. In Section 7, we develop in detail two applications in high-dimensional statistics. Proofs of the main theorems (Theorems 3.1 and 4.1) are given in Section 8, while some technical lemmas are grouped in the Appendix.
2. Spiked Fisher matrix and preliminary results. In what follows, we will assume that $\Sigma_2 = I_p$. This assumption loses no generality since the eigenvalues of the Fisher matrix $S = S_2^{-1}S_1$ are invariant under the transformation

(2.1)  $S_1 \to \Sigma_2^{-1/2} S_1 \Sigma_2^{-1/2}, \qquad S_2 \to \Sigma_2^{-1/2} S_2 \Sigma_2^{-1/2}.$

Also, we will write $\Sigma_p$ for $\Sigma_1$ to signify the dependence on the dimension $p$. Let

(2.2)  $Z = (z_1, \ldots, z_n) = (z_{ij})_{1 \le i \le p,\, 1 \le j \le n}$

and

(2.3)  $W = (w_1, \ldots, w_m) = (w_{kl})_{1 \le k \le p,\, 1 \le l \le m}$

be two independent arrays, of respective sizes $p \times n$ and $p \times m$, of independent real-valued random variables with mean 0 and variance 1. Now suppose we have two samples $\{z_i\}_{1 \le i \le n}$ and $\{x_i = \Sigma_p^{1/2} w_i\}_{1 \le i \le m}$, where $\{z_i\}$ and $\{w_i\}$ are given by (2.2) and (2.3), and $\Sigma_p$ is a rank-$M$ ($M$ a fixed integer) perturbation of $I_p$, that is,

(2.4)  $\Sigma_p = \begin{pmatrix} \Sigma_M & 0 \\ 0 & I_{p-M} \end{pmatrix}.$

Here, $\Sigma_M$ is an $M \times M$ covariance matrix containing $k$ nonzero and nonunit eigenvalues $(a_i)$, with multiplicity numbers $(n_i)$ ($n_1 + \cdots + n_k = M$). That is, $\Sigma_M$ has the eigen-decomposition $\Sigma_M = U \operatorname{diag}(\underbrace{a_1, \ldots, a_1}_{n_1}, \ldots, \underbrace{a_k, \ldots, a_k}_{n_k})\, U^*$, where $U$ is an $M \times M$ orthogonal matrix.
The sample covariance matrices of the two samples $\{x_i\}$ and $\{z_i\}$ are

(2.5)  $S_1 = \frac{1}{m} \sum_{l=1}^{m} x_l x_l^* = \frac{1}{m} XX^* = \Sigma_p^{1/2} \Big( \frac{1}{m} WW^* \Big) \Sigma_p^{1/2}$

and

(2.6)  $S_2 = \frac{1}{n} \sum_{j=1}^{n} z_j z_j^* = \frac{1}{n} ZZ^*,$

respectively.
Throughout the paper, we consider an asymptotic regime of Marčenko–Pastur type, that is,

(2.7)  $p \wedge n \wedge m \to \infty, \qquad y_p := p/n \to y \in (0, 1) \qquad\text{and}\qquad c_p := p/m \to c > 0.$

Recall that the empirical spectral distribution (ESD) of a $p \times p$ matrix $A$ with eigenvalues $\{\lambda_j\}$ is the distribution $p^{-1}\sum_{j=1}^{p} \delta_{\lambda_j}$, where $\delta_a$ denotes the Dirac mass at $a$. Since the total rank $M$ generated by the $k$ spikes is fixed, the ESD of $S$ has the same limit (LSD) as if there were no spikes in $\Sigma_p$. This limiting spectral distribution, the celebrated Wachter distribution, has been known for a long time.
PROPOSITION 2.1. For the Fisher matrix $S = S_2^{-1}S_1$ with the sample covariance matrices $S_i$ given in (2.5)–(2.6), assume that the dimension $p$ and the two sample sizes $n, m$ grow to infinity proportionally as in (2.7). Then, almost surely, the ESD of $S$ weakly converges to a deterministic distribution $F_{c,y}$ with bounded support $[\alpha, \beta]$ and density function

(2.8)  $f_{c,y}(x) = \begin{cases} \dfrac{(1 - y)\sqrt{(\beta - x)(x - \alpha)}}{2\pi x (c + xy)}, & \alpha \le x \le \beta, \\ 0, & \text{otherwise}, \end{cases}$

where

(2.9)  $\alpha = \Big( \dfrac{1 - \sqrt{c + y - cy}}{1 - y} \Big)^2 \qquad\text{and}\qquad \beta = \Big( \dfrac{1 + \sqrt{c + y - cy}}{1 - y} \Big)^2.$

Furthermore, if $c > 1$, then $F_{c,y}$ has a point mass $1 - 1/c$ at the origin. Also, the Stieltjes transform $s(z)$ of $F_{c,y}$ equals

(2.10)  $s(z) = \frac{1}{zc} - \frac{1}{z} + \frac{-c(z(1-y) + 1 - c) - 2zy + c\sqrt{(1 - c + z(1-y))^2 - 4z}}{2zc(c + zy)}, \qquad z \notin [\alpha, \beta].$
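The limiting objects in Proposition 2.1 are straightforward to evaluate numerically. The following Python sketch, which we add purely for illustration (it is not part of the original article), computes the edges (2.9) and the density (2.8), and checks that the density integrates to one when $c < 1$ (so that there is no atom at the origin):

```python
import numpy as np
from scipy.integrate import quad

def wachter_edges(c, y):
    """Support edges alpha, beta of the Wachter distribution, formula (2.9)."""
    h = np.sqrt(c + y - c * y)
    return ((1 - h) / (1 - y)) ** 2, ((1 + h) / (1 - y)) ** 2

def wachter_density(x, c, y):
    """Density (2.8) of the LSD F_{c,y}; vanishes outside [alpha, beta]."""
    alpha, beta = wachter_edges(c, y)
    x = np.asarray(x, dtype=float)
    core = np.clip((beta - x) * (x - alpha), 0.0, None)  # zero outside support
    return (1 - y) * np.sqrt(core) / (2 * np.pi * x * (c + x * y))

c, y = 1 / 5, 1 / 2                      # the pair (c, y) used in Section 5
alpha, beta = wachter_edges(c, y)
print(alpha, beta)                       # ~0.2032 and ~12.5968
total, _ = quad(lambda t: wachter_density(t, c, y), alpha, beta)
print(total)                             # ~1.0: all mass in [alpha, beta] since c < 1
```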
REMARK 2.1. Assuming both populations are Gaussian, Wachter (1980), Theorem 3.1, derives the limiting distribution of the roots of the determinantal equation

$\big| mS_1 - x^2(mS_1 + nS_2) \big| = 0, \qquad x \in \mathbb{R}.$

The continuous component of the distribution has a compact support $[A^2, B^2]$ with density function proportional to $\{(x - A^2)(B^2 - x)\}^{1/2}/\{x(1 - x^2)\}$. It can be readily checked that, by the change of variable $z = cx^2/\{y(1 - x^2)\}$, the density of the continuous component of the LSD of $S$ is exactly (2.8). The validity of this limit for general (not necessarily Gaussian) populations is due to Silverstein (1985) and Bai, Yin and Krishnaiah (1987).
3. Phase transition of the extreme eigenvalues of $S = S_2^{-1}S_1$. In this section, we establish a phase transition phenomenon for the extreme eigenvalues of $S = S_2^{-1}S_1$: when a population spike $a_i$ with multiplicity $n_i$ is larger (or smaller) than a critical value, a packet of $n_i$ corresponding sample eigenvalues of $S$ jumps outside the support $[\alpha, \beta]$ of its LSD $F_{c,y}$, and these eigenvalues all converge to a fixed limit $\phi(a_i)$, called the displacement of the population spike $a_i$. Otherwise, the associated sample eigenvalues converge to one of the edges $\alpha$ and $\beta$.

By assumption, the $k$ population spike eigenvalues $\{a_i\}$ are all positive and nonunit. We order them with their multiplicities in descending order, together with the $p - M$ unit eigenvalues, as

(3.1)  $a_1 = \cdots = a_1 > a_2 = \cdots = a_2 > \cdots > a_{k_0} = \cdots = a_{k_0} > 1 = \cdots = 1 > a_{k_0+1} = \cdots = a_{k_0+1} > \cdots > a_k = \cdots = a_k.$

That is, $k_0$ of these population spike eigenvalues are larger than 1 while the other $k - k_0$ are smaller. Let

$J_i = \begin{cases} [\,n_1 + \cdots + n_{i-1} + 1,\ n_1 + \cdots + n_i\,], & 1 \le i \le k_0, \\ [\,p - (n_i + \cdots + n_k) + 1,\ p - (n_{i+1} + \cdots + n_k)\,], & k_0 < i \le k. \end{cases}$

Notice that the cardinality of each $J_i$ is $n_i$. Next, the sample eigenvalues $\{l_{p,j}\}$ of the Fisher matrix $S_2^{-1}S_1$ are also sorted in descending order as $l_{p,1} \ge l_{p,2} \ge \cdots \ge l_{p,p}$. Therefore, for each spike eigenvalue $a_i$, there are $n_i$ associated sample eigenvalues $\{l_{p,j}, j \in J_i\}$. The phase transition for these extreme eigenvalues is given in the following Theorem 3.1.
THEOREM 3.1. For the Fisher matrix $S = S_2^{-1}S_1$ with the sample covariance matrices $S_i$ given in (2.5)–(2.6), assume that the dimension $p$ and the two sample sizes $n, m$ grow to infinity proportionally as in (2.7). Then for any spike eigenvalue $a_i$ ($i = 1, \ldots, k$), it holds that for all $j \in J_i$, $l_{p,j}$ almost surely converges to a limit

(3.2)  $\lambda_i = \begin{cases} \phi(a_i), & |a_i - \gamma| > \gamma\sqrt{c + y - cy}, \\ \beta, & 1 < a_i \le \gamma\{1 + \sqrt{c + y - cy}\}, \\ \alpha, & \gamma\{1 - \sqrt{c + y - cy}\} \le a_i < 1, \end{cases}$

where $\gamma := 1/(1 - y) \in (1, \infty)$ and

(3.3)  $\phi(a_i) = \frac{a_i(a_i + c - 1)}{a_i - a_i y - 1}$

is the displacement of the population spike $a_i$.
The proof of this theorem is postponed to Section 8.1.
REMARK 3.1. Theorem 3.1 states that when the population spike $a_i$ is large enough ($a_i > \gamma\{1 + \sqrt{c + y - cy}\}$) or small enough ($a_i < \gamma\{1 - \sqrt{c + y - cy}\}$), the corresponding extreme sample eigenvalues of the spiked Fisher matrix converge to $\phi(a_i)$, which is located outside the support $[\alpha, \beta]$ of its LSD. Otherwise, they converge to one of the edges $\alpha$ and $\beta$. This phenomenon is depicted in Figure 1.
REMARK 3.2. Using the notation $\gamma = 1/(1 - y)$, the function $\phi(x)$ in (3.3) can be expressed as

(3.4)  $\phi(x) = \frac{\gamma x(x - 1 + c)}{x - \gamma}, \qquad x \ne \gamma,$

which is a rational function with a single pole at $x = \gamma$; asymptotically, $\phi(x)$ equals $g(x) = \gamma(x + c - 1 + \gamma)$ as $|x| \to \infty$. On the other hand, since $\phi(\gamma\{1 - \sqrt{c + y - cy}\}) = \alpha$ and $\phi(\gamma\{1 + \sqrt{c + y - cy}\}) = \beta$, it can be checked that the points $A = (\gamma\{1 - \sqrt{c + y - cy}\}, \alpha)$ and $B = (\gamma\{1 + \sqrt{c + y - cy}\}, \beta)$ are exactly the two extreme points of the function $\phi$. An example of $\phi(x)$ with parameters $(c, y) = (\frac{1}{5}, \frac{1}{2})$ is illustrated in Figure 2.

FIG. 2. Example of the function $\phi(x)$ with $(c, y) = (\frac{1}{5}, \frac{1}{2})$. Its pole is at $x = 2$. As $|x| \to \infty$, $\phi(x)$ approaches the line $g(x) = 2x + \frac{12}{5}$ (the red line in the figure). The two extreme points are at $A(0.450, 0.203)$ and $B(3.549, 12.597)$, meaning that the critical values for the spikes are $0.450$ and $3.549$, while the support of the LSD is $[0.203, 12.597]$.
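To make Remark 3.2 concrete, here is a small numerical companion (our addition, not in the original paper) that evaluates $\phi$ and recovers the critical values and extreme points quoted in Figure 2:

```python
import numpy as np

def phi(x, c, y):
    """Displacement function (3.4): phi(x) = gamma * x * (x - 1 + c) / (x - gamma)."""
    gamma = 1.0 / (1.0 - y)
    return gamma * x * (x - 1 + c) / (x - gamma)

c, y = 1 / 5, 1 / 2
gamma, h = 1 / (1 - y), np.sqrt(c + y - c * y)

x_low, x_high = gamma * (1 - h), gamma * (1 + h)     # critical spike values
print(x_low, x_high)                     # ~0.4508 and ~3.5492, as in Figure 2
print(phi(x_low, c, y), phi(x_high, c, y))   # ~0.2032 = alpha and ~12.5968 = beta

# A supercritical spike is displaced strictly outside the support [alpha, beta]:
print(phi(20.0, c, y))                   # 42.67, well above beta
```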
REMARK 3.3. It is worth observing that, as $y \to 0$, the function $\phi(x)$ tends to the function well known in the literature for the similar transition phenomenon of a spiked sample covariance matrix, that is,

(3.5)  $\lim_{y \to 0} \phi(x) = x + \frac{cx}{x - 1}, \qquad x \ne 1;$

see, for example, the $\psi$-function in Figure 4 of Bai and Yao (2012). These functions [(3.4) and (3.5)] share the same shape; however, the pole of (3.5) equals 1, which is smaller than the pole $\gamma = 1/(1 - y)$ of (3.4) arising for a spiked Fisher matrix.
FIG. 1. Phase transition of the extreme eigenvalues of the spiked Fisher matrix. Upper-left panel: when $1 < a_i \le \gamma\{1 + \sqrt{c+y-cy}\}$, the limit of the corresponding extreme sample eigenvalues $\{l_{p,j}, j \in J_i\}$ is $\beta$. Upper-right panel: when $a_i > \gamma\{1 + \sqrt{c+y-cy}\}$, the limit of $\{l_{p,j}, j \in J_i\}$ is larger than $\beta$ [located at $\lambda_i = \phi(a_i)$]. Lower-left panel: when $\gamma\{1 - \sqrt{c+y-cy}\} \le a_i < 1$, the limit of $\{l_{p,j}, j \in J_i\}$ is $\alpha$. Lower-right panel: when $0 < a_i < \gamma\{1 - \sqrt{c+y-cy}\}$, the limit of $\{l_{p,j}, j \in J_i\}$ is smaller than $\alpha$ [located at $\lambda_i = \phi(a_i)$].
REMARK 3.4. As mentioned in the Introduction, this phase transition phenomenon has already been established in the preprint Dharmawansa, Johnstone and Onatski (2014) (their Proposition 5), under the Gaussian assumption and using a completely different approach. Theorem 3.1 proves that this phase transition phenomenon is in fact universal.
4. Central limit theorem for the outlier eigenvalues of $S_2^{-1}S_1$. The aim of this section is to give a CLT for the $n_i$-packed outlier eigenvalues

$\sqrt{p}\,\{l_{p,j} - \phi(a_i),\ j \in J_i\}.$

Denote $U = (U_1\ U_2\ \cdots\ U_k)$, where each $U_i$ is a matrix of size $M \times n_i$ that corresponds to the spike eigenvalue $a_i$.
THEOREM 4.1. Assume the same assumptions as in Theorem 3.1 and, in addition, that the variables $(z_{ij})$ [in (2.2)] and $(w_{kl})$ [in (2.3)] have the same first four moments; denote by $v_4$ their common fourth moment:

$v_4 = \mathbb{E}|z_{ij}|^4 = \mathbb{E}|w_{kl}|^4, \qquad 1 \le i, k \le p,\ 1 \le j \le n,\ 1 \le l \le m.$

Then for any population spike $a_i$ satisfying $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$, the normalized $n_i$-packed outlier eigenvalues of $S_2^{-1}S_1$,

$\sqrt{p}\,\{l_{p,j} - \phi(a_i),\ j \in J_i\},$

converge weakly to the distribution of the eigenvalues of the random matrix $-U_i^* R(\lambda_i) U_i / \delta(\lambda_i)$. Here,

(4.1)  $\delta(\lambda_i) = \frac{(1 - a_i - c)(1 + a_i(y-1))^2}{(a_i - 1)(-1 + 2a_i + c + a_i^2(y-1))},$

and $R(\lambda_i) = (R_{mn})$ is an $M \times M$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

(4.2)  $\operatorname{Var}(R_{mn}) = \begin{cases} 2\theta_i + (v_4 - 3)\omega_i, & m = n, \\ \theta_i, & m \ne n, \end{cases}$

where

(4.3)  $\omega_i = \frac{a_i^2 (a_i + c - 1)^2 (c + y)}{(a_i - 1)^2},$

(4.4)  $\theta_i = \frac{a_i^2 (a_i + c - 1)^2 (cy - c - y)}{-1 + 2a_i + c + a_i^2(y-1)}.$
The proof of this theorem is postponed to Section 8.2. 
REMARK 4.1. Notice that the result above involves the $i$th block $U_i$ of the eigen-matrix $U$. When the spike $a_i$ is simple, $U_i$ is unique up to sign, so $U_i^* R(\lambda_i) U_i$ is uniquely determined. When $a_i$ has multiplicity greater than 1, however, $U_i$ is not unique: any rotation of $U_i$ yields eigenvectors corresponding to $a_i$. But, according to Lemma A.1 in the Appendix, such a rotation does not affect the eigenvalues of the matrix $U_i^* R(\lambda_i) U_i$.

Next, we consider a special case where $\Sigma_M$ is diagonal ($U = I_M$) with distinct eigenvalues $a_i$, that is, $M = k$ and $n_i = 1$ for all $1 \le i \le M$. Using Theorem 4.1, it can be shown that, after normalization, the outlier eigenvalues $l_{p,i}$ of $S_2^{-1}S_1$ are asymptotically Gaussian when $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$.
PROPOSITION 4.1. Under the same assumptions as in Theorem 3.1, with the additional conditions that $\Sigma_M$ is diagonal and all its eigenvalues $a_i$ ($1 \le i \le M$) are simple, we have, when $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$, that the outlier eigenvalue $l_{p,i}$ of $S_2^{-1}S_1$ is asymptotically Gaussian:

$\sqrt{p}\,\Big( l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y} \Big) \Longrightarrow N(0, \sigma_i^2),$

where

$\sigma_i^2 = \frac{2a_i^2(cy - c - y)(a_i - 1)^2(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} + (v_4 - 3) \cdot \frac{a_i^2(c + y)(-1 + 2a_i + c + a_i^2(y-1))^2}{(1 + a_i(y-1))^4}.$

PROOF. Under the above assumptions, the random matrix $-U_i^* R(\lambda_i) U_i$ reduces to $-R(\lambda_i)(i, i)$, which is a Gaussian random variable with mean zero and variance

$2\theta_i + (v_4 - 3)\omega_i = \frac{2a_i^2(a_i + c - 1)^2(cy - c - y)}{-1 + 2a_i + c + a_i^2(y-1)} + (v_4 - 3) \cdot \frac{a_i^2(a_i + c - 1)^2(c + y)}{(a_i - 1)^2}.$

Therefore, combining this with the value of $\delta(\lambda_i)$ in (4.1), we obtain

$\sqrt{p}\,\Big( l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y} \Big) \Longrightarrow N(0, \sigma_i^2)$

with $\sigma_i^2$ as displayed above. The proof of Proposition 4.1 is complete. □
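The variance formula of Proposition 4.1 is easy to evaluate numerically; the short sketch below (our addition) reproduces the constants used in the illustrations of Section 5, namely $\phi(a_1) = 42.67$ and $\sigma_1^2 = 4246.8 + 1103.5(v_4 - 3)$ for $a_1 = 20$ and $(c, y) = (\frac{1}{5}, \frac{1}{2})$:

```python
def outlier_clt_params(a, c, y, v4):
    """Centering phi(a) and variance sigma^2 from Proposition 4.1 (simple spike a)."""
    q = -1 + 2 * a + c + a * a * (y - 1)       # recurring quadratic factor
    d4 = (1 + a * (y - 1)) ** 4
    base = 2 * a * a * (c * y - c - y) * (a - 1) ** 2 * q / d4   # Gaussian part
    fourth = a * a * (c + y) * q * q / d4      # coefficient of (v4 - 3)
    return a * (a - 1 + c) / (a - 1 - a * y), base + (v4 - 3) * fourth

print(outlier_clt_params(20.0, 0.2, 0.5, v4=3.0))  # (42.67, ~4246.8): Gaussian data
print(outlier_clt_params(20.0, 0.2, 0.5, v4=1.0))  # (42.67, ~2039.8): binary data
```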
REMARK 4.2. Notice that when the observations are standard Gaussian, we have $v_4 = 3$, and the result above reduces to

$\sqrt{p}\,\Big( l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y} \Big) \Longrightarrow N\Big( 0,\ \frac{2a_i^2(a_i - 1)^2(cy - c - y)(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} \Big),$

which is exactly the result in Dharmawansa, Johnstone and Onatski (2014); see setting 1 in their Proposition 11.
5. Numerical illustrations. In this section, numerical results are provided to illustrate Theorem 4.1 and Proposition 4.1. We fix $p = 200$, $m = 1000$ and $n = 400$ with 1000 replications; thus $y = 1/2$ and $c = 1/5$. The critical interval is then $[\gamma - \gamma\sqrt{c+y-cy},\ \gamma + \gamma\sqrt{c+y-cy}] = [0.45, 3.55]$ and the limiting support is $[\alpha, \beta] = [0.2, 12.6]$. Consider $k = 3$ spike eigenvalues $(a_1, a_2, a_3) = (20, 0.2, 0.1)$ with respective multiplicities $(n_1, n_2, n_3) = (1, 2, 1)$. Let $l_1 \ge \cdots \ge l_p$ be the ordered eigenvalues of the Fisher matrix $S_2^{-1}S_1$. We are particularly interested in the distributions of $l_1$, $(l_{p-2}, l_{p-1})$ and $l_p$, which correspond to the spike eigenvalues $a_1$, $a_2$ and $a_3$, respectively.
5.1. Case of $U = I_4$. In this subsection, we consider the simple case $U = I_4$. Following Theorem 4.1, we have:

• for $j = 1, p$: $\sqrt{p}\,\{l_j - \phi(a_i)\} \to N(0, \sigma_i^2)$. Here, for $j = 1$ ($i = 1$), $\phi(a_1) = 42.67$ and $\sigma_1^2 = 4246.8 + 1103.5(v_4 - 3)$; for $j = p$ ($i = 3$), $\phi(a_3) = 0.07$ and $\sigma_3^2 = 7.2 \times 10^{-3} + 3.15 \times 10^{-3}(v_4 - 3)$;
• for $j = p-2, p-1$ ($i = 2$): the two-dimensional random vector $\sqrt{p}\,\{l_j - \phi(a_2)\}$ converges to the eigenvalues of the random matrix $-R(\lambda_2)/\delta(\lambda_2)$. Here, $\phi(a_2) = 0.13$, $\delta(\lambda_2) = 1.45$, and $R(\lambda_2) = (R_{mn})$ is a $2 \times 2$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

(5.1)  $\operatorname{Var}(R_{mn}) = \begin{cases} 2\theta_2 + (v_4 - 3)\omega_2\ (= 0.04 + 0.016(v_4 - 3)), & m = n, \\ \theta_2\ (= 0.02), & m \ne n. \end{cases}$
Simulations are conducted to compare the distributions of the empirical extreme 
eigenvalues with their limits. 
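For readers who wish to reproduce the experiment, the following Monte Carlo sketch (our addition; it uses a reduced number of replications, whereas the paper's figures use 1000) simulates the spiked Fisher matrix of Section 2 directly and compares the normalized largest eigenvalue with its Gaussian limit $N(0, 4246.8)$:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
p, n, m = 200, 400, 1000                    # y = 1/2, c = 1/5
spike_sqrt = np.ones(p)
spike_sqrt[:4] = np.sqrt([20.0, 0.2, 0.2, 0.1])   # Sigma_p^{1/2} with U = I_4

reps = 200
l1 = np.empty(reps)
for r in range(reps):
    Z = rng.standard_normal((p, n))                      # null sample, Sigma_2 = I_p
    X = spike_sqrt[:, None] * rng.standard_normal((p, m))
    S1, S2 = X @ X.T / m, Z @ Z.T / n
    # eigenvalues of S2^{-1} S1 = generalized eigenvalues of the pair (S1, S2)
    l1[r] = eigh(S1, S2, eigvals_only=True)[-1]

stat = np.sqrt(p) * (l1 - 42.67)
print(stat.mean(), stat.std())              # mean ~ 0, std ~ sqrt(4246.8) ~ 65.2
```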
5.1.1. Gaussian case. First, we assume all the $z_{ij}$ and $w_{ij}$ are i.i.d. standard Gaussian, so that $v_4 - 3 = 0$. According to (5.1), $R(\lambda_2)/\sqrt{0.04}$ is then a standard $2 \times 2$ Gaussian Wigner matrix (GOE). Therefore, we have:

• $\sqrt{p}\,\{l_1 - 42.67\} \to N(0, 4246.8)$;
• $\sqrt{p}\,\{l_p - 0.07\} \to N(0, 7.2 \times 10^{-3})$;
• the two-dimensional random vector $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-0.138 \cdot W$, where $W$ is a $2 \times 2$ GOE matrix.

We compare the empirical distributions with their limits in Figure 3. The upper panels show the empirical kernel density estimates (solid lines) of $\sqrt{p}\,\{l_1 - 42.67\}$ and $\sqrt{p}\,\{l_p - 0.07\}$ from 1000 independent replications, compared to their Gaussian limits $N(0, 4246.8)$ and $N(0, 7.2 \times 10^{-3})$, respectively (dashed lines). For the two-dimensional random vector $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$, we run a two-dimensional kernel density estimation from 1000 independent replications and display its contour lines (lower-left panel), while the lower-right panel shows the contour lines of the kernel density estimate of the eigenvalues of the $2 \times 2$ random matrix $-0.138 \cdot \mathrm{GOE}$ (their limit).

FIG. 3. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centering and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density of $(l_{p-2}, l_{p-1})$ (left plot, after centering and scaling) and contour plots of their limit (right plot). Both the empirical and limit joint densities are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. standard Gaussian distribution with $U = I_4$. The replication number is 1000.
5.1.2. Binary case. Second, we assume all the $z_{ij}$ and $w_{ij}$ are i.i.d. binary variables taking the values $\{1, -1\}$ with probability $1/2$ each; in this case $v_4 = 1$. Similarly, we have:

• $\sqrt{p}\,\{l_1 - 42.67\} \to N(0, 2039.8)$;
• $\sqrt{p}\,\{l_p - 0.07\} \to N(0, 9 \times 10^{-4})$;
• the two-dimensional random vector $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-R(\lambda_2)/1.45$, where $R(\lambda_2) = (R_{mn})$ is a $2 \times 2$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

$\operatorname{Var}(R_{mn}) = \begin{cases} 0.008, & m = n, \\ 0.02, & m \ne n. \end{cases}$
Figure 4 compares the empirical distributions with their limits in the binary case. The upper panels show the empirical kernel density estimates of $\sqrt{p}\,\{l_1 - 42.67\}$ and $\sqrt{p}\,\{l_p - 0.07\}$ from 1000 independent replications (solid lines), compared to their Gaussian limits (dashed lines). The lower-left panel shows the contour lines of the empirical joint density of $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$, and the lower-right panel shows the contour lines of their limit.
5.2. Case of a general $U$. In this subsection, we consider the following orthogonal matrix, different from the identity:

(5.2)  $U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1/\sqrt{2} & 1/\sqrt{2} \\ 0 & 0 & 1/\sqrt{2} & -1/\sqrt{2} \end{pmatrix},$

that is, we have

$U_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad U_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1/\sqrt{2} \\ 0 & 1/\sqrt{2} \end{pmatrix}, \qquad U_3 = \begin{pmatrix} 0 \\ 0 \\ 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix}.$
FIG. 4. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centering and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density of $(l_{p-2}, l_{p-1})$ (left plot, after centering and scaling) and contour plots of their limit (right plot). Both the empirical and limit joint densities are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. binary distribution with $U = I_4$. The replication number is 1000.
Since the Gaussian distribution is invariant under orthogonal transformations, we only consider the case where all the $z_{ij}$ and $w_{ij}$ are i.i.d. binary variables taking the values $\{1, -1\}$ with probability $1/2$ each, with all other settings the same as in Section 5.1. Then, according to Theorem 4.1, we have:

• $\sqrt{p}\,\{l_1 - 42.67\} \to N(0, 2039.8)$;
• $\sqrt{p}\,\{l_p - 0.07\} \to N(0, 0.004)$;
• the two-dimensional random vector $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-U_2^* R(\lambda_2) U_2 / 1.45$, where $R(\lambda_2)$ is a $4 \times 4$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

$\operatorname{Var}(R_{mn}) = \begin{cases} 0.008, & m = n, \\ 0.02, & m \ne n. \end{cases}$

Figure 5 compares the empirical distributions with their limits in this general-$U$ case. The upper panels show the empirical kernel density estimates of $\sqrt{p}\,\{l_1 - 42.67\}$ and $\sqrt{p}\,\{l_p - 0.07\}$ from 1000 independent replications (solid lines), compared to their Gaussian limits (dashed lines). The lower-left panel shows the contour lines of the empirical joint density of $\sqrt{p}\,\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$, and the lower-right panel shows the contour lines of their limit.

FIG. 5. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centering and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density of $(l_{p-2}, l_{p-1})$ (left plot, after centering and scaling) and contour plots of their limit (right plot). Both the empirical and limit joint densities are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. binary distribution with $U$ given by (5.2). The replication number is 1000.
6. Joint distribution of the outlier eigenvalues. In the previous section, we obtained the following result for the outlier eigenvalues: the $n_i$-dimensional real random vector $\sqrt{p}\,\{l_{p,j} - \lambda_i,\ j \in J_i\}$ converges to the distribution of the eigenvalues of a random matrix $-U_i^* R(\lambda_i) U_i/\delta(\lambda_i)$. It is in fact possible to derive their joint distribution, that is, the limit of the $M$-dimensional real random vector

(6.1)  $\begin{pmatrix} \sqrt{p}\,\{l_{p,j_1} - \lambda_1,\ j_1 \in J_1\} \\ \vdots \\ \sqrt{p}\,\{l_{p,j_k} - \lambda_k,\ j_k \in J_k\} \end{pmatrix},$

provided all the spike eigenvalues $a_i$ are above (or below) the phase transition thresholds. Such a joint convergence result is useful for inference procedures that use consecutive sample eigenvalues, for instance through their differences or ratios; see, for example, Onatski (2009) and Passemier and Yao (2014).
THEOREM 6.1. Assume that the same conditions as in Theorem 4.1 hold and that all the population spikes $a_i$ satisfy $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$. Then the $M$-dimensional random vector in (6.1) converges in distribution to the eigenvalues of the $M \times M$ block-diagonal random matrix

(6.2)  $\begin{pmatrix} -\dfrac{U_1^* R(\lambda_1) U_1}{\delta(\lambda_1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & -\dfrac{U_k^* R(\lambda_k) U_k}{\delta(\lambda_k)} \end{pmatrix},$

where the matrices $\{R(\lambda_i)\}$ are made of zero-mean Gaussian random variables, with the following covariance function between different blocks ($l \ne s$): for $1 \le i \le j \le M$,

$\operatorname{Cov}\big( R(\lambda_l)(i, j),\ R(\lambda_s)(i, j) \big) = \begin{cases} \theta(l, s), & i \ne j, \\ \omega(l, s)(v_4 - 3) + 2\theta(l, s), & i = j, \end{cases}$

where

$\theta(l, s) = \lim \frac{1}{n + T}\operatorname{tr} A_n(\lambda_l) A_n(\lambda_s), \qquad \omega(l, s) = \lim \frac{1}{n + T}\sum_{i=1}^{n+T} A_n(\lambda_l)(i, i)\, A_n(\lambda_s)(i, i),$

and $A_n(\lambda)$ is defined in (A.17).
The proof of this theorem is very close to that of Theorem 2.3 in Wang, Su and Yao (2014) and is therefore omitted.
In principle, the limiting parameters $\theta(l, s)$ and $\omega(l, s)$ can be completely specified for a given spiked structure. However, this leads to quite complex formulas. Here, we prefer to explain a simple case where $\Sigma_M$ is diagonal with simple eigenvalues $(a_i)$, all satisfying the condition $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$ ($i = 1, \ldots, M$). In this case, $U_i^* R(\lambda_i) U_i$ in (6.2) reduces to the $(i, i)$th element of $R(\lambda_i)$, which is a Gaussian random variable. Moreover, from Theorem 6.1, the random variables $\{R(\lambda_i)(i, i)\}_{i=1,\ldots,M}$ are jointly independent since the index pairs $(i, i)$ are disjoint. Finally, we have the following joint distribution of the $M$ outlier eigenvalues of $S_2^{-1}S_1$.
PROPOSITION 6.1. Under the same assumptions as in Theorem 4.1, if $\Sigma_M$ is diagonal with all its eigenvalues $(a_i)$ simple and satisfying $|a_i - \gamma| > \gamma\sqrt{c + y - cy}$, then the $M$ outlier eigenvalues $l_{p,j}$ ($j = 1, \ldots, M$) of $S_2^{-1}S_1$ are asymptotically independent, with joint distribution

$\begin{pmatrix} \sqrt{p}\,(l_{p,1} - \lambda_1) \\ \vdots \\ \sqrt{p}\,(l_{p,M} - \lambda_M) \end{pmatrix} \Longrightarrow N\left( 0_M,\ \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_M^2 \end{pmatrix} \right),$

where

$\sigma_i^2 = \frac{2a_i^2(cy - c - y)(a_i - 1)^2(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} + (v_4 - 3) \cdot \frac{a_i^2(c + y)(-1 + 2a_i + c + a_i^2(y-1))^2}{(1 + a_i(y-1))^4}.$
7. Applications. In this section, we present two applications of Theorem 4.1 and Proposition 4.1, in the areas of high-dimensional hypothesis testing and signal detection, respectively.

7.1. Application 1: Power of testing the equality of two high-dimensional covariance matrices. Let $(x_i)_{1 \le i \le m}$ and $(z_j)_{1 \le j \le n}$ be observations from two $p$-dimensional populations. This subsection considers the high-dimensional test of the equality between $\Sigma_1$ and $\Sigma_2$ against a specific alternative, namely that the difference between $\Sigma_1$ and $\Sigma_2$ is a finite-rank covariance matrix. Put another way, we are concerned with the following testing problem:

(7.1)  $H_0: \Sigma_1 = \Sigma_2$ vs. $H_1: \Sigma_1 = \Sigma_2 + \Delta$,

where $\operatorname{rank}(\Delta) = M$ (here $M$ is a finite integer).
There exists a wide literature on testing the equality of two covariance matrices. In the classical large-sample asymptotics, early works can be found in textbooks such as Muirhead (1982) and Anderson (1984), where the limiting distribution of the likelihood ratio statistic under the Gaussian assumption is found to be $\chi^2$ [with $p(p+1)/2$ degrees of freedom]. In recent years, this testing problem has been reconsidered in a different asymptotic regime, where both the dimension and the two sample sizes are allowed to grow to infinity together. For example, Bai et al. (2009) prove that, in the asymptotic regime of Marčenko–Pastur type, the limiting distribution of the likelihood ratio statistic is Gaussian under $H_0$. Li and Chen (2012) propose a test based on a $U$-statistic and derive its limiting distribution under both the null and the alternative hypotheses in the high-dimensional framework. Cai, Liu and Xia (2013) propose a test statistic based on the entries of the two sample covariance matrices and study both its limiting null distribution and its power; their statistic is shown to enjoy certain optimality and to be especially powerful against sparse alternatives.
In the following, we consider a statistic based on the largest eigenvalue of the Fisher matrix, and we will show that it is powerful against spiked alternatives. Denote the sample covariance matrices of the two populations by

(7.2)  $S_1 = \frac{1}{m}\sum_{j=1}^{m} x_j x_j^* = \frac{1}{m}XX^*$

and

(7.3)  $S_2 = \frac{1}{n}\sum_{j=1}^{n} z_j z_j^* = \frac{1}{n}ZZ^*,$

respectively. When $p$, $m$ and $n$ all grow to infinity proportionally while $M$ is a fixed integer, the empirical measure of the $p$ eigenvalues of $S_2^{-1}S_1$ (for simplicity, we assume $p < n$) is affected by a difference of order $M/p$, which vanishes, so that its limit remains the same as under the null hypothesis, that is, the Wachter distribution (see Proposition 2.1). In other words, the global limit formed by all the eigenvalues of $S_2^{-1}S_1$ is of little help in distinguishing the two hypotheses in (7.1). The useful information for detecting a small-rank alternative is actually encoded in a few of the largest eigenvalues of $S_2^{-1}S_1$.
Now denote by $l_1$ the largest eigenvalue of $S_2^{-1}S_1$. Notice that the eigenvalues of $S_2^{-1}S_1$ are invariant under the transformation (2.1), so without loss of generality we can assume that under $H_0$ we have $\Sigma_1 = \Sigma_2 = I_p$. Then, according to Han, Pan and Zhang (2016),

$\frac{l_1 - \beta}{s_p} \Longrightarrow F_1,$

where $s_p = \frac{1}{m}(\sqrt{m} + \sqrt{p})\big( \frac{1}{\sqrt{m}} + \frac{1}{\sqrt{p}} \big)^{1/3}$, which is of order $p^{-2/3}$, and $F_1$ denotes the type-1 Tracy–Widom distribution. Consequently, we adopt the following decision rule:

(7.4)  Reject $H_0$ if $l_1 > q_\alpha s_p + \beta$,

where $q_\alpha$ is the upper-$\alpha$ quantile of the Tracy–Widom distribution $F_1$: $F_1(q_\alpha, \infty) = \alpha$.
Once the largest spike eigenvalue $a_1$ induced by $\Delta$ is above the critical value for the phase transition, this test is able to detect the alternative hypothesis with power tending to one as the dimension tends to infinity.
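A minimal implementation of the decision rule (7.4) might look as follows (our sketch, not the authors' code; since SciPy does not ship the Tracy–Widom law, the upper-5% quantile of $F_1$ is hard-coded with its standard approximate value $q_{0.05} \approx 0.98$):

```python
import numpy as np
from scipy.linalg import eigh

def largest_eigenvalue_test(X, Z, q_alpha=0.98):
    """Test H0: Sigma1 = Sigma2 via rule (7.4); X is p x m, Z is p x n.

    Returns (l1, threshold, reject)."""
    p, m = X.shape
    n = Z.shape[1]
    c, y = p / m, p / n
    h = np.sqrt(c + y - c * y)
    beta = ((1 + h) / (1 - y)) ** 2                       # right edge (2.9)
    s_p = ((np.sqrt(m) + np.sqrt(p)) / m
           * (1 / np.sqrt(m) + 1 / np.sqrt(p)) ** (1 / 3))   # scaling s_p
    S1, S2 = X @ X.T / m, Z @ Z.T / n
    l1 = eigh(S1, S2, eigvals_only=True)[-1]
    threshold = q_alpha * s_p + beta
    return l1, threshold, l1 > threshold

# Under a spiked alternative with a supercritical spike, the test rejects:
rng = np.random.default_rng(1)
p, m, n = 200, 1000, 400
X = rng.standard_normal((p, m))
X[0] *= np.sqrt(20.0)                      # one spike a1 = 20 > 3.55
Z = rng.standard_normal((p, n))
print(largest_eigenvalue_test(X, Z))       # l1 near phi(20) = 42.67 >> threshold
```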
THEOREM 7.1. Under the asymptotic scheme (2.7), assume the largest spike eigenvalue $a_1$ is above the critical value $\frac{1 + \sqrt{c+y-cy}}{1 - y}$. Then the power function of the test procedure (7.4) equals

$\mathrm{Power} = 1 - \Phi\bigg( \frac{\sqrt{p}}{\sigma_1}\, s_p q_\alpha + \frac{\sqrt{p}}{\sigma_1}\Big( \beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y} \Big) \bigg) + o(1),$

which tends to one as the dimension tends to infinity.

PROOF. Under the alternative $H_1$, according to Proposition 4.1, the asymptotic distribution of $l_1$ is Gaussian:

$\sqrt{p}\,\Big( l_1 - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y} \Big) \Longrightarrow N(0, \sigma_1^2).$

Therefore, the power can be calculated as

(7.5)  $\mathrm{Power} = 1 - \Phi\bigg( \frac{\sqrt{p}}{\sigma_1}\, s_p q_\alpha + \frac{\sqrt{p}}{\sigma_1}\Big( \beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y} \Big) \bigg) + o(1),$

where $\Phi$ is the standard normal cumulative distribution function. Since $s_p$ is of order $p^{-2/3}$ as $p \to \infty$, the first term $\frac{\sqrt{p}}{\sigma_1} s_p q_\alpha \to 0$, while the second term $\frac{\sqrt{p}}{\sigma_1}\big( \beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y} \big) \to -\infty$ [when $a_1 > \frac{1 + \sqrt{c+y-cy}}{1 - y}$, the displacement $\frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}$ is always larger than the right edge point $\beta$]. Therefore, the right-hand side of (7.5) tends to one for any pre-given $\alpha$ as $p \to \infty$. The proof of Theorem 7.1 is complete. □
REMARK 7.1. In Li and Chen (2012), the authors use a $U$-statistic $T_{m,n}$ to test the hypothesis $H_0: \Sigma_1 = \Sigma_2$, and its power is shown to be

(7.6)  $\Phi\Big( -L_{m,n}(\Sigma_1, \Sigma_2)\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}}{\sigma_{m,n}} \Big),$

where $z_\alpha$ is the upper-$\alpha$ quantile of $N(0, 1)$ and

$L_{m,n}(\Sigma_1, \Sigma_2) = \sigma_{m,n}^{-1}\Big\{ \frac{2}{n}\operatorname{tr}\big(\Sigma_2^2\big) + \frac{2}{m}\operatorname{tr}\big(\Sigma_1^2\big) \Big\},$

$\sigma_{m,n}^2 = \frac{4}{n^2}\big\{\operatorname{tr}\big(\Sigma_2^2\big)\big\}^2 + \frac{8}{n}\operatorname{tr}\big(\Sigma_2^2 - \Sigma_1\Sigma_2\big)^2 + \frac{4}{m^2}\big\{\operatorname{tr}\big(\Sigma_1^2\big)\big\}^2 + \frac{8}{m}\operatorname{tr}\big(\Sigma_1^2 - \Sigma_1\Sigma_2\big)^2 + \frac{8}{mn}\big\{\operatorname{tr}(\Sigma_1\Sigma_2)\big\}^2.$

If we restrict attention to the specific alternative in (7.1), then all three quantities $L_{m,n}(\Sigma_1, \Sigma_2)$, $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$ and $\sigma_{m,n}$ in (7.6) are of constant order, so the power (7.6) stays bounded away from one. Therefore, against an alternative hypothesis of spiked type (7.1), our procedure is more powerful.
7.2. Application 2: Determining the number of signals. In this subsection, we consider an application of our results in the field of signal detection, where the spiked Fisher matrix arises naturally.

In a signal detection device, the records have the form

(7.7)  $x_i = A s_i + e_i, \qquad i = 1, \ldots, m,$

where $x_i$ is a $p$-dimensional observation, $s_i$ is a $k \times 1$ low-dimensional signal ($k \ll p$) with unit covariance matrix, $A$ is a $p \times k$ mixing matrix, and $(e_i)$ is an i.i.d. noise with covariance matrix $\Sigma_2$. Therefore, the covariance matrix of $x_i$ can be considered a $k$-dimensional (low-rank) perturbation of $\Sigma_2$, denoted $\Sigma_p$ in what follows. Notice that none of the quantities on the right-hand side of (7.7) is observed. One of the fundamental problems here is to estimate $k$, the number of signals present in the system, which is challenging when the dimension $p$ is large, say of magnitude comparable to the sample size $m$. When the noise has the simplest covariance structure, that is, $\Sigma_2 = \sigma_e^2 I_p$, this problem has been much investigated recently and several solutions have been proposed; see, for example, Kritchman and Nadler (2008), Nadler (2010), Passemier and Yao (2012, 2014). However, the problem with an arbitrary noise covariance matrix $\Sigma_2$, say diagonal to simplify, remains unsolved in the large-dimensional context (to the best of our knowledge).
Nevertheless, there exists an astute engineering device whereby the system can be tuned in a signal-free environment, for example, in a laboratory: one can directly record a sequence of pure-noise observations $z_j$, $j = 1, \ldots, n$, which have the same distribution as the $(e_i)$ above. These signal-free records can then be used to whiten the observations $(x_i)$, thanks to the invariance property in (2.1), which states that the eigenvalues of $S_2^{-1}S_1$ [$S_1$ and $S_2$ defined as in (7.2) and (7.3)] are in fact independent of $\Sigma_2$. Therefore, these eigenvalues can be treated as if $\Sigma_2 = I_p$; that is, $S_2^{-1}S_1$ becomes a spiked Fisher matrix as introduced in Section 2. This is precisely why the two-sample procedure developed here can deal with an arbitrary noise covariance matrix while the existing one-sample procedures cannot.
Based on Theorem 3.1, we propose to estimate the number of signals by the number of eigenvalues of $S_2^{-1}S_1$ that are larger than the right edge point of the support of its LSD:

(7.8)  $\hat{k} = \max\{i : l_i \ge \beta + d_n\},$

where $(d_n)$ is a sequence of vanishing constants.
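A direct implementation of $\hat{k}$ reads as follows (our sketch; it uses the tuning choice $d_n = \log p / p^{2/3}$ adopted in the simulations below, and the helper name is ours):

```python
import numpy as np
from scipy.linalg import eigh

def estimate_num_signals(X, Z):
    """Estimator (7.8): count the eigenvalues of S2^{-1} S1 above beta + d_n.

    X (p x m): signal-bearing records; Z (p x n): pure-noise records used
    for whitening, as described above."""
    p, m = X.shape
    n = Z.shape[1]
    c, y = p / m, p / n
    h = np.sqrt(c + y - c * y)
    beta = ((1 + h) / (1 - y)) ** 2             # right edge of the LSD
    d_n = np.log(p) / p ** (2 / 3)              # vanishing threshold sequence
    S1, S2 = X @ X.T / m, Z @ Z.T / n
    eigs = eigh(S1, S2, eigvals_only=True)      # ascending order
    return int(np.sum(eigs >= beta + d_n))

# One strong signal in noise with an arbitrary (here random diagonal) covariance:
rng = np.random.default_rng(2)
p, m, n = 200, 1000, 400
sd = rng.uniform(0.5, 2.0, size=p)              # noise standard deviations
a_col = np.zeros(p)
a_col[0] = 4.0                                  # mixing vector A (k = 1)
X = a_col[:, None] * rng.standard_normal(m) + sd[:, None] * rng.standard_normal((p, m))
Z = sd[:, None] * rng.standard_normal((p, n))
print(estimate_num_signals(X, Z))               # -> 1 (the spike is supercritical)
```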
THEOREM 7.2. Assume all the spike eigenvalues $a_i$ ($i = 1, \ldots, k$) satisfy $a_i > \gamma + \gamma\sqrt{c + y - cy}$. Let $d_n$ be a sequence of positive numbers such that $\sqrt{p}\, d_n \to 0$ and $p^{2/3} d_n \to +\infty$ as $p \to +\infty$. Then the estimator $\hat{k}$ in (7.8) is consistent, that is, $\hat{k} \to k$ in probability as $p \to +\infty$.
PROOF. Since

$\{\hat{k} = k\} = \big\{ k = \max\{i : l_i \ge \beta + d_n\} \big\} = \Big( \bigcap_{1 \le j \le k}\{l_j \ge \beta + d_n\} \Big) \cap \{l_{k+1} < \beta + d_n\},$

we have

(7.9)  $P\{\hat{k} = k\} = 1 - P\Big( \bigcup_{1 \le j \le k}\{l_j < \beta + d_n\} \cup \{l_{k+1} \ge \beta + d_n\} \Big) \ge 1 - \sum_{j=1}^{k} P(l_j < \beta + d_n) - P(l_{k+1} \ge \beta + d_n).$

For $j = 1, \ldots, k$,

(7.10)  $P(l_j < \beta + d_n) = P\big( \sqrt{p}(l_j - \phi(a_j)) < \sqrt{p}(\beta + d_n - \phi(a_j)) \big) \to P\big( \sqrt{p}(l_j - \phi(a_j)) < \sqrt{p}(\beta - \phi(a_j)) \big),$

due to the assumption $\sqrt{p}\, d_n \to 0$. The term $\sqrt{p}(\beta - \phi(a_j))$ in (7.10) tends to $-\infty$, since we always have $\phi(a_j) > \beta$ when $a_j > \gamma + \gamma\sqrt{c + y - cy}$. On the other hand, by Theorem 4.1, $\sqrt{p}(l_j - \phi(a_j))$ in (7.10) has a limiting distribution and is therefore bounded in probability. Hence,

(7.11)  $P(l_j < \beta + d_n) \to 0 \qquad\text{for } j = 1, \ldots, k.$

Also,

$P(l_{k+1} \ge \beta + d_n) = P\big( p^{2/3}(l_{k+1} - \beta) \ge p^{2/3} d_n \big),$

and $p^{2/3}(l_{k+1} - \beta)$ is asymptotically Tracy–Widom distributed [see Han, Pan and Zhang (2016)]. As $p^{2/3} d_n$ tends to infinity by assumption,

(7.12)  $P(l_{k+1} \ge \beta + d_n) \to 0.$

Combining (7.9), (7.11) and (7.12), we have $P\{\hat{k} = k\} \to 1$ as $p \to +\infty$. The proof of Theorem 7.2 is complete. □
REMARK 7.2. Notice that there is no need for the spikes $a_i$ to be simple. The only requirement is that they be strong enough ($a_i > \gamma + \gamma\sqrt{c + y - cy}$) for detection.
In the following, we conduct a short simulation to illustrate the performance of our estimator. For comparison, we also show the performance of another estimator $\bar{k}$ that treats the noise covariance as known (using a plug-in estimate of this quantity). The details are as follows. Recall the model in (7.7), where $\operatorname{Cov}(e_i) = \Sigma_2$ is arbitrary. Assume for a moment that $\Sigma_2$ is known; then we can multiply both sides of (7.7) by $\Sigma_2^{-1/2}$:

$\Sigma_2^{-1/2} x_i = \Sigma_2^{-1/2} A s_i + \Sigma_2^{-1/2} e_i, \qquad i = 1, \ldots, m,$

where the left-hand side is still observable (simply multiply the original observations $\{x_i\}$ by $\Sigma_2^{-1/2}$). Denote $\tilde{x}_i = \Sigma_2^{-1/2} x_i$ and $\tilde{e}_i = \Sigma_2^{-1/2} e_i$; then $\operatorname{Cov}(\tilde{e}_i) = I_p$. On the other hand, since the rank of $\Sigma_2^{-1/2} A s_i$ is still $k$, the covariance matrix of the new observations $\tilde{x}_i$ is a rank-$k$ perturbation of $I_p$. Therefore, the method of Kritchman and Nadler (2008) can be adopted. Their proposed estimator is

(7.13)  $\bar{k} = \max\{k : l_k > (1 + \sqrt{c})^2 + d_n\},$

where the $\{l_k\}$ in (7.13) are the eigenvalues of the sample covariance matrix of the observations $\tilde{x}_i$,

$\Sigma_2^{-1/2} \cdot \Big( \frac{1}{m}XX^* \Big) \cdot \Sigma_2^{-1/2},$

whose eigenvalues coincide with those of $\Sigma_2^{-1} S_1$. Since $\Sigma_2$ is actually unknown, we simply use its plug-in estimator $S_2$. The estimator used for comparison is therefore

(7.14)  $\bar{k} = \max\{k : l_k(S_2^{-1}S_1) > (1 + \sqrt{c})^2 + d_n\}.$

The parameters of the simulation are set as follows. We fix $y = 0.1$, $c = 0.9$ and let $p$ vary from 50 to 250; the critical value for $a_i$ in the model (2.4) (after whitening) is therefore $\gamma\{1 + \sqrt{c + y - cy}\} = 2.17$. For each given triple $(p, n, m)$ (taking the floor when the values of $n$ or $m$ are not integers), we run 1000 replications. The tuning parameter $d_n$ is chosen as $\log p / p^{2/3}$.
Next, suppose $k = 3$ and $A$ is a $p \times 3$ matrix of the form $A = (\sqrt{c_1}\, v_1,\ \sqrt{c_2}\, v_2)$, where $c_1 = 10$, $c_2 = 5$,

$v_1 = (1\ 0\ \cdots\ 0)^* \qquad\text{and}\qquad v_2 = \begin{pmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 & \cdots & 0 \end{pmatrix}^*.$

So we have two spike eigenvalues, $c_1 = 10$ and $c_2 = 5$ (before whitening), with multiplicities $n_1 = 1$ and $n_2 = 2$, respectively.
Besides, we assume $\operatorname{Cov}(s_i) = I_3$ and run both the Gaussian case ($s_i$ multivariate Gaussian) and the non-Gaussian case (the components of $s_i$ are i.i.d., taking the values $1$ or $-1$ with equal probability). Finally, we let $e_i$ be multivariate Gaussian with covariance matrix $\operatorname{Cov}(e_i)$ either diagonal or nondiagonal, as in the following two cases:

• Case 1: $\operatorname{Cov}(e_i) = \operatorname{diag}(\underbrace{1, \ldots, 1}_{p/2}, \underbrace{2, \ldots, 2}_{p/2})$. In this case, the three nonzero eigenvalues of $(c_1 v_1 v_1^* + c_2 v_2 v_2^*) \cdot [\operatorname{Cov}(e_i)]^{-1}$ equal $10, 5, 5$, respectively; they are all larger than the critical value $2.17 - 1$, so the number of detectable signals is three.
• Case 2: $\operatorname{Cov}(e_i)$ is compound symmetric, with all diagonal elements equal to 1 and all off-diagonal elements equal to 0.1. In this case, for each given $p$, the three nonzero eigenvalues of $(c_1 v_1 v_1^* + c_2 v_2 v_2^*) \cdot [\operatorname{Cov}(e_i)]^{-1}$ are all larger than $5.36$ ($> 2.17 - 1$). The number of detectable signals is again three.
Tables 1 and 2 report the empirical frequencies of our estimator $\hat{k}$ in Case 1 and Case 2. For comparison, we also report the frequencies of the plug-in estimator $\bar{k}$ defined in (7.14).
TABLE 1
Frequency of our estimator $\hat{k}$ and the plug-in estimator $\bar{k}$ defined in (7.14) for Case 1

                           Gaussian                          Non-Gaussian
p                 50     100    150    200    250     50     100    150    200    250
$\hat k = 2$     0.029  0.001  0      0      0       0.011  0      0      0      0
$\hat k = 3$     0.971  0.997  0.997  0.995  0.998   0.985  0.997  0.993  0.998  0.998
$\hat k = 4$     0      0.002  0.003  0.005  0.002   0.004  0.003  0.007  0.002  0.002
$\bar k = 3$     0.603  0.037  0      0      0       0.654  0.051  0      0      0
$\bar k = 4$     0.387  0.485  0.03   0      0       0.334  0.514  0.026  0      0
$\bar k = 5$     0.01   0.439  0.375  0.016  0       0.012  0.394  0.392  0.009  0
$\bar k = 6$     0      0.039  0.508  0.194  0.008   0      0.041  0.481  0.253  0.002
$\bar k = 7$     0      0      0.084  0.566  0.125   0      0      0.096  0.56   0.108
$\bar k = 8$     0      0      0.003  0.204  0.463   0      0      0.005  0.163  0.518
$\bar k = 9$     0      0      0      0.02   0.369   0      0      0      0.015  0.334
$\bar k = 10$    0      0      0      0      0.035   0      0      0      0      0.038
TABLE 2
Frequency of our estimator $\hat{k}$ and the plug-in estimator $\bar{k}$ defined in (7.14) for Case 2

                           Gaussian                          Non-Gaussian
p                 50     100    150    200    250     50     100    150    200    250
$\hat k = 2$     0.018  0      0      0      0       0.003  0      0      0      0
$\hat k = 3$     0.982  0.995  0.996  0.995  0.998   0.993  0.997  0.993  0.998  0.998
$\hat k = 4$     0      0.005  0.004  0.005  0.002   0.004  0.003  0.007  0.002  0.002
$\bar k = 3$     0.6    0.034  0      0      0       0.644  0.048  0.026  0      0
$\bar k = 4$     0.39   0.477  0.03   0      0       0.345  0.511  0.382  0.008  0
$\bar k = 5$     0.01   0.449  0.36   0.016  0       0.011  0.399  0.491  0.243  0
$\bar k = 6$     0      0.04   0.518  0.193  0.007   0      0.042  0.096  0.564  0.002
$\bar k = 7$     0      0      0.088  0.559  0.116   0      0      0.005  0.169  0.103
$\bar k = 8$     0      0      0.004  0.207  0.465   0      0      0      0.016  0.516
$\bar k = 9$     0      0      0      0.025  0.377   0      0      0      0      0.341
$\bar k = 10$    0      0      0      0      0.035   0      0      0      0      0.038
According to our setup, the true number of signals is $k = 3$. From these two tables, we see that the frequency of correct estimation by our estimator $\hat{k}$ (that is, $\hat{k} = 3$) stays close to 1 in both cases (for both Gaussian and non-Gaussian signals), which confirms the consistency of our estimator. In contrast, the plug-in estimator always overestimates the number of signals in both cases, and this overestimation becomes more and more striking as $p$ grows.
8. Proofs of the main results. 
8.1. Proof of Theorem 3.1. For notational convenience, we first define some integrals with respect to $F_{c,y}(x)$: for a complex number $z \notin [\alpha, \beta]$,

(8.1)  $s(z) := \int \frac{1}{x - z}\, dF_{c,y}(x), \qquad m_1(z) := \int \frac{1}{(z - x)^2}\, dF_{c,y}(x), \qquad m_2(z) := \int \frac{x}{z - x}\, dF_{c,y}(x),$
$\qquad m_3(z) := \int \frac{x}{(z - x)^2}\, dF_{c,y}(x), \qquad m_4(z) := \int \frac{x^2}{(z - x)^2}\, dF_{c,y}(x).$
PROOF. The proof is divided into the following three steps:

• Step 1: we derive the almost sure limit of an outlier eigenvalue of $S_2^{-1}S_1$;
• Step 2: we show that, in order for an extreme eigenvalue of $S_2^{-1}S_1$ to be an outlier, the population spike $a_i$ must be larger (or smaller) than a critical value;
• Step 3: if not, the extreme eigenvalue of $S_2^{-1}S_1$ converges to one of the edge points $\alpha$ and $\beta$.
Step 1: Let $l_{p,j}$ ($j \in J_i$) be an outlier eigenvalue of $S_2^{-1}S_1$ corresponding to the population spike $a_i$. Then $l_{p,j}$ must satisfy the equation $|l_{p,j} I_p - S_2^{-1}S_1| = 0$, which is equivalent to

(8.2)  $|l_{p,j} S_2 - S_1| = 0.$

Now we introduce some shorthand notation. Write $Z = \binom{Z_1}{Z_2}$, where $Z_1$ contains the $n$ observations of the first $M$ coordinates and $Z_2$ the remaining ones, and partition $X$ accordingly as $X = \binom{X_1}{X_2}$, where $X_1$ contains the $m$ observations of the first $M$ coordinates and $X_2$ the remaining ones. With this representation,

(8.3)  $S_1 = \frac{1}{m}XX^* = \frac{1}{m}\begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix}, \qquad S_2 = \frac{1}{n}ZZ^* = \frac{1}{n}\begin{pmatrix} Z_1Z_1^* & Z_1Z_2^* \\ Z_2Z_1^* & Z_2Z_2^* \end{pmatrix}.$

Then (8.2) can be written in block form:

(8.4)  $\begin{vmatrix} \dfrac{l_{p,j}}{n}Z_1Z_1^* - \dfrac{1}{m}X_1X_1^* & \dfrac{l_{p,j}}{n}Z_1Z_2^* - \dfrac{1}{m}X_1X_2^* \\ \dfrac{l_{p,j}}{n}Z_2Z_1^* - \dfrac{1}{m}X_2X_1^* & \dfrac{l_{p,j}}{n}Z_2Z_2^* - \dfrac{1}{m}X_2X_2^* \end{vmatrix} = 0.$

Since $l_{p,j}$ is an outlier, it holds that $|l_{p,j} \cdot \frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*| \ne 0$, and for a block matrix we have $\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det D \cdot \det(A - BD^{-1}C)$ when $D$ is invertible. Therefore, (8.4) reduces to

$\bigg| \frac{l_{p,j}}{n}Z_1Z_1^* - \frac{1}{m}X_1X_1^* - \Big( \frac{l_{p,j}}{n}Z_1Z_2^* - \frac{1}{m}X_1X_2^* \Big)\Big( \frac{l_{p,j}}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^* \Big)^{-1}\Big( \frac{l_{p,j}}{n}Z_2Z_1^* - \frac{1}{m}X_2X_1^* \Big) \bigg| = 0.$
More specifically, we have

(8.5)  $\det\big( (\mathrm{I}) + (\mathrm{II}) + (\mathrm{III}) + (\mathrm{IV}) \big) = 0,$

where

$(\mathrm{I}) = \frac{l_{p,j}}{n}Z_1\Big[ I_n - Z_2^*\Big( l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^* \Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2 \Big]Z_1^*,$

$(\mathrm{II}) = -\frac{1}{m}X_1\Big[ I_m + X_2^*\Big( l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^* \Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2 \Big]X_1^*,$

$(\mathrm{III}) = \frac{l_{p,j}}{n}Z_1Z_2^*\Big( l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^* \Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_1^*,$

$(\mathrm{IV}) = \frac{1}{m}X_1X_2^*\Big( l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^* \Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2Z_1^*.$

In all that follows, we denote by $S$ the Fisher matrix $(\frac{1}{n}Z_2Z_2^*)^{-1}\frac{1}{m}X_2X_2^*$, which has LSD $F_{c,y}(x)$. To find the limit of $l_{p,j}$, we simply take the limit of the left-hand side of (8.5); this generates an equation whose solution gives the value of the limit.
First, consider the terms (III) and (IV). Since $(Z_1, X_1)$ is independent of $(Z_2, X_2)$, Lemma A.2 shows that these two terms converge to some constant multiplied by the covariance matrix between $X_1$ and $Z_1$. On the other hand, $X_1$ is also independent of $Z_1$, so

$\operatorname{Cov}(X_1, Z_1) = \mathbb{E}X_1Z_1 - \mathbb{E}X_1\,\mathbb{E}Z_1 = \mathbb{E}X_1\,\mathbb{E}Z_1 - \mathbb{E}X_1\,\mathbb{E}Z_1 = 0_{M \times M}.$

Therefore, these two terms both tend to the zero matrix $0_{M \times M}$ almost surely. The remaining task is thus to find the limits of (I) and (II). Recall the structure of $X_1$ and $Z_1$:

$\operatorname{Cov}(X_1) = U \operatorname{diag}(\underbrace{a_1, \ldots, a_1}_{n_1}, \ldots, \underbrace{a_k, \ldots, a_k}_{n_k}) U^*, \qquad \operatorname{Cov}(Z_1) = I_M.$
According to Lemma A.2, we have

(8.6)  $(\mathrm{I}) = \frac{l_{p,j}}{n}Z_1\Big[ I_n - Z_2^*(l_{p,j} I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2 \Big]Z_1^* \to \frac{\lambda_i}{n}\Big\{ \mathbb{E}\operatorname{tr}\Big[ I_n - Z_2^*(\lambda_i I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2 \Big] \Big\} \cdot I_M = \lambda_i\big( 1 + y\lambda_i s(\lambda_i) \big) \cdot I_M;$

here, $\lambda_i$ denotes the limit of the outliers $\{l_{p,j},\ j \in J_i\}$. For the same reason,

(8.7)  $(\mathrm{II}) = -\frac{1}{m}X_1\Big[ I_m + X_2^*(l_{p,j} I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2 \Big]X_1^* \to -\frac{1}{m}\Big\{ \mathbb{E}\operatorname{tr}\Big[ I_m + X_2^*(\lambda_i I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2 \Big] \Big\} \cdot U \operatorname{diag}(a_1 I_{n_1}, \ldots, a_k I_{n_k}) U^*$
$\qquad = U\big( -1 + c + c\lambda_i s(\lambda_i) \big) \operatorname{diag}(a_1 I_{n_1}, \ldots, a_k I_{n_k})\, U^*.$
Therefore, combining (8.5), (8.6) and (8.7), the determinant of the $M \times M$ matrix

$U \operatorname{diag}\Big( \lambda_i\big(1 + y\lambda_i s(\lambda_i)\big) + \big(-1 + c + c\lambda_i s(\lambda_i)\big)a_1,\ \ldots,\ \lambda_i\big(1 + y\lambda_i s(\lambda_i)\big) + \big(-1 + c + c\lambda_i s(\lambda_i)\big)a_k \Big) U^*$

must equal zero, which is to say that $\lambda_i$ satisfies the equation

(8.8)  $\lambda_i\big( 1 + y\lambda_i s(\lambda_i) \big) + \big( -1 + c + c\lambda_i s(\lambda_i) \big)a_i = 0.$

Finally, together with the expression of the Stieltjes transform of the Fisher LSD in (2.10), we obtain

(8.9)  $\lambda_i = \frac{a_i(a_i + c - 1)}{a_i - a_i y - 1} = \phi(a_i).$
Step 2: Define $\underline{s}(z)$ as the Stieltjes transform of the LSD of $\frac{1}{m}X_2^*(\frac{1}{n}Z_2Z_2^*)^{-1}X_2$, which shares the same nonzero eigenvalues as the matrix $S$. Then we have the companion relationship

(8.10)  $\underline{s}(z) + \frac{1}{z}(1 - c) = c\, s(z).$

Recalling the expression of $s(z)$ in (2.10), we have

(8.11)  $\underline{s}(z) = \frac{-c(z(1-y) + 1 - c) - 2zy + c\sqrt{(1 - c + z(1-y))^2 - 4z}}{2z(c + zy)}.$

On the other hand, from (8.8) and (8.10) we obtain the value of $\underline{s}(\lambda_i)$:

(8.12)  $\underline{s}(\lambda_i) = \frac{yc - y - c}{y\lambda_i + a_i c}.$
Since $\lambda_i$ lies outside the support of the LSD, we must have

$\underline{s}^{-1}\Big( \frac{yc - y - c}{y\lambda_i + a_i c} \Big) = \lambda_i > \beta \qquad\text{or}\qquad \underline{s}^{-1}\Big( \frac{yc - y - c}{y\lambda_i + a_i c} \Big) = \lambda_i < \alpha,$

which is to say that

(8.13)  $\underline{s}(\beta) < \frac{yc - y - c}{y\lambda_i + a_i c}$

or

(8.14)  $\underline{s}(\alpha) > \frac{yc - y - c}{y\lambda_i + a_i c}.$

Inequality (8.13) says that $\underline{s}(\beta)$ must be smaller than the minimum of its right-hand side, which is attained at $\lambda_i = \beta$ [the right-hand side of (8.13) is an increasing function of $\lambda_i$]. Similarly, (8.14) says that $\underline{s}(\alpha)$ must be larger than the maximum of its right-hand side, attained at $\lambda_i = \alpha$. Therefore, the condition for $\lambda_i$ to be an outlier is

(8.15)  $\underline{s}(\beta) < \frac{yc - y - c}{y\beta + a_i c} \qquad\text{or}\qquad \underline{s}(\alpha) > \frac{yc - y - c}{y\alpha + a_i c}.$

Finally, using (8.11) together with the values of $\alpha$ and $\beta$, we obtain

$a_i > \frac{1 + \sqrt{c + y - cy}}{1 - y} \qquad\text{or}\qquad a_i < \frac{1 - \sqrt{c + y - cy}}{1 - y},$

which is equivalent to saying [recall that $\gamma = 1/(1 - y)$] that the condition allowing an outlier is

$|a_i - \gamma| > \gamma\sqrt{c + y - cy}.$
Step 3: In this step, we show that if the condition in Step 2 is not fulfilled, then the extreme eigenvalues of $S_2^{-1}S_1$ tend to one of the edge points $\alpha$ and $\beta$. For simplicity, we only show the convergence to the right edge $\beta$; the proof of convergence to the left edge $\alpha$ is similar. Thus, suppose $a_i > 1$ for $i = 1, \ldots, k$. Let

$S_1 = \frac{1}{m}XX^* = \frac{1}{m}\begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix} =: \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$

and

$S_2 = \frac{1}{n}ZZ^* = \frac{1}{n}\begin{pmatrix} Z_1Z_1^* & Z_1Z_2^* \\ Z_2Z_1^* & Z_2Z_2^* \end{pmatrix} =: \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$

where $B_{11}$ and $A_{11}$ are blocks of size $M \times M$. Using the inversion formula for block matrices, the $(p-M) \times (p-M)$ major submatrix of $S_2^{-1}S_1$ is

(8.16)  $C := -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12} + (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}B_{22}.$

The part

$-(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12} = -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1} \cdot \frac{1}{m}X_1X_2^*$

has rank $M$; besides,

$\operatorname{tr}\Big\{ (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}\frac{1}{m}X_1X_2^* \Big\} \to 0,$

since $X_1$ is independent of $X_2$. Therefore, the $M$ nonzero eigenvalues of the matrix $-(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12}$ all tend to zero (in particular, so does the largest one). Next, consider the second part of (8.16). We have

$A_{22} - A_{21}A_{11}^{-1}A_{12} = \frac{1}{n}Z_2\Big[ I_n - Z_1^*\Big(\frac{1}{n}Z_1Z_1^*\Big)^{-1}\frac{1}{n}Z_1 \Big]Z_2^* =: \frac{1}{n}Z_2 P Z_2^*.$

Since $P = I_n - Z_1^*(\frac{1}{n}Z_1Z_1^*)^{-1}\frac{1}{n}Z_1$ is a projection matrix of rank $n - M$, it has the spectral decomposition

$P = V\begin{pmatrix} 0_M & \\ & I_{n-M} \end{pmatrix}V^*,$

where $V$ is an $n \times n$ orthogonal matrix. Since $M$ is fixed, the ESD of $P$ tends to $\delta_1$, which implies that the LSD of the matrix $\frac{1}{n}Z_2PZ_2^*$ is the standard Marčenko–Pastur law. The matrix $(\frac{1}{n}Z_2PZ_2^*)^{-1}B_{22}$ is then a standard Fisher matrix, and the $M + 1$ largest eigenvalues $\alpha_1(C) \ge \cdots \ge \alpha_{M+1}(C)$ of $C$ all converge to the right edge $\beta$ of the limiting Wachter distribution. Meanwhile, since $C$ is the $(p-M) \times (p-M)$ major submatrix of $S_2^{-1}S_1$, the Cauchy interlacing theorem gives

$\alpha_{M+1}(C) \le l_{p,M+1} \le \alpha_1(C) \le l_{p,1}.$

Thus $l_{p,M+1} \to \beta$ as well. On the other hand,

$l_{p,1} = \big\| S_2^{-1}S_1 \big\|_{\mathrm{op}} \le \big\| S_2^{-1} \big\|_{\mathrm{op}} \cdot \| S_1 \|_{\mathrm{op}},$

so that for some positive constant $\theta$, $\limsup l_{p,1} \le \theta$. Consequently, almost surely,

$\beta \le \liminf l_{p,M} \le \cdots \le \limsup l_{p,1} \le \theta < \infty;$

in particular, the whole family $\{l_{p,j},\ 1 \le j \le M\}$ is bounded. Now let $1 \le j \le M$ be fixed and assume that a subsequence $(l_{p_k,j})_k$ converges to a limit $\tilde{\beta} \in [\beta, \theta]$. Either $\tilde{\beta} = \phi(a_i) > \beta$ or $\tilde{\beta} = \beta$. However, according to Step 2, $\tilde{\beta} > \beta$ implies that $a_i > \gamma\{1 + \sqrt{c+y-cy}\}$, whereas otherwise $a_i \le \gamma\{1 + \sqrt{c+y-cy}\}$. Therefore, according to which of these two conditions holds, all subsequences converge to the same limit, $\phi(a_i)$ or $\beta$, which is thus also the unique limit of the whole sequence $(l_{p,j})_p$. The proof of Theorem 3.1 is complete. □
444 Q. WANG AND J. YAO 
8.2. Proof of Theorem 4.1. Step 1: Convergence to the eigenvalues of the random matrix $-U_i^*R(\lambda_i)U_i/\Delta(\lambda_i)$. We start from (8.5). Define
$$A(\lambda) = I_n - Z_2^*\Bigl[\lambda I_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda}{n}Z_2,$$
$$B(\lambda) = I_m + X_2^*\Bigl[\lambda I_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2,$$
$$C(\lambda) = Z_2^*\Bigl[\lambda I_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2, \tag{8.17}$$
$$D(\lambda) = X_2^*\Bigl[\lambda I_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{n}Z_2;$$
then (8.5) can be written as
$$\det\Bigl(\underbrace{\frac{l_{p,j}}{n}Z_1A(l_{p,j})Z_1^*}_{\text{(i)}} - \underbrace{\frac{1}{m}X_1B(l_{p,j})X_1^*}_{\text{(ii)}} + \underbrace{\frac{l_{p,j}}{n}Z_1C(l_{p,j})X_1^*}_{\text{(iii)}} + \underbrace{\frac{l_{p,j}}{m}X_1D(l_{p,j})Z_1^*}_{\text{(iv)}}\Bigr) = 0. \tag{8.18}$$
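Because (8.18) arises from a Schur-complement factorization of $\det(l\,S_2 - S_1) = 0$, it holds exactly in finite samples and can be checked numerically. A minimal Python sketch, assuming Gaussian entries and a single spike ($M = 1$); all names and sizes are our illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, m, a = 100, 400, 400, 4.0
X = rng.standard_normal((p, m)); X[0] *= np.sqrt(a)   # spiked Sigma_1
Z = rng.standard_normal((p, n))
X1, X2, Z1, Z2 = X[:1], X[1:], Z[:1], Z[1:]
S1, S2 = X @ X.T / m, Z @ Z.T / n
l = np.linalg.eigvals(np.linalg.solve(S2, S1)).real.max()  # outlier l_{p,1}

G = l * (Z2 @ Z2.T) / n - (X2 @ X2.T) / m   # the (p-M) block of l*S2 - S1
Gi = np.linalg.inv(G)                       # invertible since l is an outlier
A = np.eye(n) - (l / n) * Z2.T @ Gi @ Z2
B = np.eye(m) + X2.T @ Gi @ X2 / m
C = Z2.T @ Gi @ X2 / m
D = X2.T @ Gi @ Z2 / n
Mmat = (l / n) * Z1 @ A @ Z1.T - X1 @ B @ X1.T / m \
     + (l / n) * Z1 @ C @ X1.T + (l / m) * X1 @ D @ Z1.T
print(Mmat)   # a 1x1 matrix; it vanishes up to rounding
```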
It remains to find second-order approximations of the four terms on the left-hand side of (8.18).
Using Lemma A.5 in the Appendix, we have
$$\begin{aligned}
\text{(i)} &= \mathbb{E}\frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^* + \Bigl[\frac{l_{p,j}}{n}Z_1A(l_{p,j})Z_1^* - \mathbb{E}\frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^*\Bigr] \\
&= \bigl(\lambda_i + y\lambda_i^2 s(\lambda_i)\bigr)\cdot I_M + \frac{l_{p,j}}{n}Z_1A(l_{p,j})Z_1^* - \frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^* + \frac{\lambda_i}{n}\bigl[Z_1A(\lambda_i)Z_1^* - \mathbb{E}Z_1A(\lambda_i)Z_1^*\bigr] \\
&= \bigl(\lambda_i + y\lambda_i^2 s(\lambda_i)\bigr)\cdot I_M + \frac{l_{p,j}-\lambda_i}{n}Z_1A(l_{p,j})Z_1^* + \frac{\lambda_i}{n}Z_1\bigl(A(l_{p,j})-A(\lambda_i)\bigr)Z_1^* \tag{8.19}\\
&\quad + \frac{\lambda_i}{\sqrt{n}}\Bigl[\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^* - \mathbb{E}\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^*\Bigr] \\
&\to \bigl(\lambda_i + y\lambda_i^2 s(\lambda_i)\bigr)\cdot I_M + (l_{p,j}-\lambda_i)\cdot\bigl(1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 y m_1(\lambda_i)\bigr)\cdot I_M \\
&\quad + \frac{\lambda_i}{\sqrt{n}}\Bigl[\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^* - \mathbb{E}\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^*\Bigr],
\end{aligned}$$
$$\begin{aligned}
\text{(ii)} &= \mathbb{E}\frac{1}{m}X_1B(\lambda_i)X_1^* + \Bigl[\frac{1}{m}X_1B(l_{p,j})X_1^* - \mathbb{E}\frac{1}{m}X_1B(\lambda_i)X_1^*\Bigr] \\
&= U\bigl(1-c-c\lambda_i s(\lambda_i)\bigr)\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^* + \frac{1}{m}X_1\bigl(B(l_{p,j})-B(\lambda_i)\bigr)X_1^* \\
&\quad + \frac{1}{\sqrt{m}}\Bigl[\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^* - \mathbb{E}\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^*\Bigr] \tag{8.20}\\
&\to U\bigl(1-c-c\lambda_i s(\lambda_i)\bigr)\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^* - U(l_{p,j}-\lambda_i)\,c m_3(\lambda_i)\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^* \\
&\quad + \frac{1}{\sqrt{m}}\Bigl[\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^* - \mathbb{E}\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^*\Bigr],
\end{aligned}$$
$$\begin{aligned}
\text{(iii)} &= \frac{l_{p,j}}{n}Z_1C(l_{p,j})X_1^* - \mathbb{E}\frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* \\
&= \frac{l_{p,j}}{n}Z_1C(l_{p,j})X_1^* - \frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* + \frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* - \mathbb{E}\frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* \\
&= \frac{l_{p,j}}{n}Z_1\bigl(C(l_{p,j})-C(\lambda_i)\bigr)X_1^* + \frac{l_{p,j}-\lambda_i}{n}Z_1C(\lambda_i)X_1^* \tag{8.21}\\
&\quad + \frac{\lambda_i}{n}\bigl[Z_1C(\lambda_i)X_1^* - \mathbb{E}Z_1C(\lambda_i)X_1^*\bigr] \\
&\to \frac{\lambda_i}{n}\bigl[Z_1C(\lambda_i)X_1^* - \mathbb{E}Z_1C(\lambda_i)X_1^*\bigr],
\end{aligned}$$
$$\begin{aligned}
\text{(iv)} &= \frac{l_{p,j}}{m}X_1D(l_{p,j})Z_1^* - \mathbb{E}\frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* \\
&= \frac{l_{p,j}}{m}X_1D(l_{p,j})Z_1^* - \frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* + \frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* - \mathbb{E}\frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* \\
&= \frac{l_{p,j}}{m}X_1\bigl(D(l_{p,j})-D(\lambda_i)\bigr)Z_1^* + \frac{l_{p,j}-\lambda_i}{m}X_1D(\lambda_i)Z_1^* \tag{8.22}\\
&\quad + \frac{\lambda_i}{m}\bigl[X_1D(\lambda_i)Z_1^* - \mathbb{E}X_1D(\lambda_i)Z_1^*\bigr] \\
&\to \frac{\lambda_i}{m}\bigl[X_1D(\lambda_i)Z_1^* - \mathbb{E}X_1D(\lambda_i)Z_1^*\bigr].
\end{aligned}$$
Denote
$$\begin{aligned}
R_n(\lambda_i) ={}& \frac{\lambda_i\sqrt{p}}{n}Z_1A(\lambda_i)Z_1^* - \frac{\sqrt{p}}{m}X_1B(\lambda_i)X_1^* \\
&+ \frac{\lambda_i\sqrt{p}}{n}Z_1C(\lambda_i)X_1^* + \frac{\lambda_i\sqrt{p}}{m}X_1D(\lambda_i)Z_1^* - \mathbb{E}[\,\cdot\,], \tag{8.23}
\end{aligned}$$
where $\mathbb{E}[\,\cdot\,]$ denotes the total expectation of all the preceding terms in the equation, and
$$\Delta(\lambda_i) = 1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 y\,m_1(\lambda_i) + a_i c\,m_3(\lambda_i).$$
Combining (8.18), (8.19), (8.20), (8.21) and (8.22), and considering the diagonal block corresponding to the row and column indices in $J_i\times J_i$, leads to
$$\bigl|\sqrt{p}\,(l_{p,j}-\lambda_i)\cdot\Delta(\lambda_i)\cdot I_{n_i} + U_i^*R_n(\lambda_i)U_i\bigr| \to 0. \tag{8.24}$$
Furthermore, it will be established in Step 2 below that
$$U_i^*R_n(\lambda_i)U_i \longrightarrow U_i^*R(\lambda_i)U_i \quad\text{in distribution} \tag{8.25}$$
for some random matrix $R(\lambda_i)$. Using the device of Skorokhod strong representation [Hu and Bai (2014), Skorokhod (1956)], we may assume that this convergence holds almost surely by considering an enlarged probability space. Under this device, (8.24) is equivalent to saying that $\sqrt{p}\,(l_{p,j}-\lambda_i)$ tends to an eigenvalue of the matrix $-U_i^*R(\lambda_i)U_i/\Delta(\lambda_i)$. Finally, as the index $j$ is arbitrary over the set $J_i$, all the $n_i$ random variables
$$\bigl\{\sqrt{p}\,(l_{p,j}-\lambda_i),\ j\in J_i\bigr\}$$
converge almost surely to the set of eigenvalues of the random matrix $-U_i^*R(\lambda_i)U_i/\Delta(\lambda_i)$. Besides, due to Lemma A.3, we have
$$\Delta(\lambda_i) = 1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 y\,m_1(\lambda_i) + a_i c\,m_3(\lambda_i) = \frac{(1-a_i-c)(1+a_i(y-1))^2}{(a_i-1)(-1+2a_i+c+a_i^2(y-1))}.$$
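This last identity can be verified symbolically. A minimal sympy sketch (ours), substituting the values of $s$, $m_1$ and $m_3$ from Lemma A.3 in the Appendix:

```python
import sympy as sp

a, c, y = sp.symbols('a c y', positive=True)
lam = a * (a + c - 1) / (a - a * y - 1)          # lambda = phi(a), see (8.9)
q = -1 + 2 * a + c + a ** 2 * (y - 1)
s = (a * (y - 1) + 1) / ((a - 1) * (a + c - 1))
m1 = (a * (y - 1) + 1) ** 2 * (-1 + 2 * a + a ** 2 * (y - 1) + y * (c - 1)) \
     / ((a - 1) ** 2 * (a + c - 1) ** 2 * q)
m3 = -(a * (y - 1) + 1) ** 2 / ((a - 1) ** 2 * q)
delta = 1 + 2 * y * lam * s + lam ** 2 * y * m1 + a * c * m3
closed = (1 - a - c) * (1 + a * (y - 1)) ** 2 / ((a - 1) * q)
print(sp.simplify(delta - closed))               # expect 0
```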

Step 2: Proof of the convergence (8.25) and structure of the random matrix $R(\lambda_i)$. In this second step, we aim to find the limit of the random matrix $U_i^*R_n(\lambda_i)U_i$. First, we show that $U_i^*R_n(\lambda_i)U_i$ equals another random matrix $U_i^*\tilde R_n(\lambda_i)U_i$, where $\tilde R_n(\lambda_i)$ is a random sesquilinear form. Then, using the results in Bai and Yao (2008) (Proposition 3.1 and Remark 1), we are able to find the limit of $\tilde R_n(\lambda_i)$.
By assumption (b), $x_i = \Sigma_p^{1/2}w_i$, so for the first $M$ components we have
$$X_1 = \Sigma_M^{1/2}W_1 = U\begin{pmatrix} \sqrt{a_1} & & \\ & \ddots & \\ & & \sqrt{a_k} \end{pmatrix}U^*W_1.$$
Recalling the definition of $R_n(\lambda_i)$ in (8.23) and noting that $U_i^*X_1 = \sqrt{a_i}\,U_i^*W_1$ (which follows from the display above, since the columns of $U_i$ are eigenvectors of $\Sigma_M$ associated with $a_i$), we have
$$\begin{aligned}
U_i^*R_n(\lambda_i)U_i
&= U_i^*\Bigl[\frac{\lambda_i\sqrt{p}}{n}Z_1A(\lambda_i)Z_1^* - \frac{a_i\sqrt{p}}{m}W_1B(\lambda_i)W_1^* \\
&\qquad\quad + \frac{\sqrt{a_i}\,\lambda_i\sqrt{p}}{n}Z_1C(\lambda_i)W_1^* + \frac{\sqrt{a_i}\,\lambda_i\sqrt{p}}{m}W_1D(\lambda_i)Z_1^*\Bigr]U_i - \mathbb{E}[\,\cdot\,] \tag{8.26}\\
&= U_i^*\,(Z_1\ W_1)\begin{pmatrix} \dfrac{\lambda_i\sqrt{p}}{n}A(\lambda_i) & \dfrac{\lambda_i\sqrt{a_ip}}{n}C(\lambda_i) \\[2mm] \dfrac{\lambda_i\sqrt{a_ip}}{m}D(\lambda_i) & -\dfrac{a_i\sqrt{p}}{m}B(\lambda_i) \end{pmatrix}\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix}U_i - \mathbb{E}[\,\cdot\,] \\
&:= U_i^*\tilde R_n(\lambda_i)U_i,
\end{aligned}$$
where
$$\tilde R_n(\lambda_i) := (Z_1\ W_1)\begin{pmatrix} \dfrac{\lambda_i\sqrt{p}}{n}A(\lambda_i) & \dfrac{\lambda_i\sqrt{a_ip}}{n}C(\lambda_i) \\[2mm] \dfrac{\lambda_i\sqrt{a_ip}}{m}D(\lambda_i) & -\dfrac{a_i\sqrt{p}}{m}B(\lambda_i) \end{pmatrix}\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix} - \mathbb{E}[\,\cdot\,].$$
Finally, using Lemma A.6 in the Appendix leads to the result. The proof of Theorem 4.1 is complete. □
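Theorem 4.1 can be illustrated by simulation in the simplest case of one simple spike ($M = n_i = 1$), where $\sqrt{p}(l_{p,1}-\lambda)$ should be roughly $N(0,\,2\theta_i/\Delta(\lambda_i)^2)$ for Gaussian entries (so that $v_4 = 3$). A minimal Monte Carlo sketch (all choices below are ours; agreement is only approximate at these sizes):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, m, a, B = 200, 800, 800, 4.0, 200
y, c = p / n, p / m
q = -1 + 2 * a + c + a * a * (y - 1)
lam = a * (a + c - 1) / (a - a * y - 1)                     # outlier limit phi(a)
theta = a * a * (a + c - 1) ** 2 * (c * y - c - y) / q      # theta_i of Lemma A.6
delta = (1 - a - c) * (1 + a * (y - 1)) ** 2 / ((a - 1) * q)

vals = []
for _ in range(B):
    X = rng.standard_normal((p, m)); X[0] *= np.sqrt(a)     # one spike a
    Z = rng.standard_normal((p, n))
    l1 = np.linalg.eigvals(np.linalg.solve(Z @ Z.T / n, X @ X.T / m)).real.max()
    vals.append(np.sqrt(p) * (l1 - lam))
# empirical mean ~ 0; empirical std comparable to the CLT prediction
print(np.mean(vals), np.std(vals), np.sqrt(2 * theta) / abs(delta))
```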
APPENDIX A: SOME LEMMAS 
LEMMA A.1. Let $R$ be an $M\times M$ real-valued matrix, and let $U = (U_1\ \cdots\ U_k)$ and $V = (V_1\ \cdots\ V_k)$ be two orthonormal bases of $\mathbb{R}^M$, where $U_i$ and $V_i$ are both of size $M\times n_i$ with $n_1+\cdots+n_k = M$, and $U_i$ and $V_i$ span the same subspace for each $i$. Then the eigenvalues of the two $n_i\times n_i$ matrices $U_i^*RU_i$ and $V_i^*RV_i$ are the same.

PROOF. It is sufficient to prove that there exists an $n_i\times n_i$ orthogonal matrix $A$ such that
$$V_i = U_i\cdot A. \tag{A.1}$$
If this is true, then $V_i^*RV_i = A^*(U_i^*RU_i)A$, and since $A$ is orthogonal, the eigenvalues of $V_i^*RV_i$ and $U_i^*RU_i$ are the same. Therefore, it only remains to show (A.1). Let $U_i = (u_1\ \cdots\ u_{n_i})$ and $V_i = (v_1\ \cdots\ v_{n_i})$. Define $A = (a_{ls})_{1\le l,s\le n_i}$ such that
$$\begin{cases} v_1 = a_{11}u_1 + \cdots + a_{n_i1}u_{n_i}, \\ \quad\vdots \\ v_{n_i} = a_{1n_i}u_1 + \cdots + a_{n_in_i}u_{n_i}. \end{cases}$$
Put in matrix form,
$$(v_1\ \cdots\ v_{n_i}) = (u_1\ \cdots\ u_{n_i})\begin{pmatrix} a_{11} & \cdots & a_{1n_i} \\ \vdots & \ddots & \vdots \\ a_{n_i1} & \cdots & a_{n_in_i} \end{pmatrix},$$
that is, $V_i = U_i\cdot A$. Since $\langle v_i, v_j\rangle = \langle a_{\cdot i}, a_{\cdot j}\rangle$ (by the orthonormality of $\{u_j\}$), where $a_{\cdot k} = (a_{lk})_{1\le l\le n_i}$, the matrix $A$ is orthogonal. □
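Lemma A.1 is easy to confirm numerically. A minimal sketch with a symmetric $R$ (as in our application) and arbitrary sizes of our choosing:

```python
import numpy as np

rng = np.random.default_rng(4)
M, ni = 6, 3
G = rng.standard_normal((M, M)); R = (G + G.T) / 2      # a symmetric R for simplicity
Ui = np.linalg.qr(rng.standard_normal((M, ni)))[0]      # one orthonormal basis
Q = np.linalg.qr(rng.standard_normal((ni, ni)))[0]      # an ni x ni rotation
Vi = Ui @ Q                                             # same subspace, another basis
e1 = np.linalg.eigvalsh(Ui.T @ R @ Ui)
e2 = np.linalg.eigvalsh(Vi.T @ R @ Vi)
print(np.allclose(e1, e2))                              # True: the spectra coincide
```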
LEMMA A.2. Suppose $X = (x_1,\ldots,x_n)$ is a $p\times n$ matrix whose columns $\{x_i\}$ are independent random vectors, and let $Y = (y_1,\ldots,y_n)$ be defined similarly. Let $\Sigma_p$ be the covariance matrix between $x_i$ and $y_i$, and let $A$ be a deterministic $n\times n$ matrix. Then we have
$$XAY^* \longrightarrow \operatorname{tr}A\cdot\Sigma_p.$$
Moreover, if $A$ is random but independent of $X$ and $Y$, then we have
$$XAY^* \longrightarrow \mathbb{E}\operatorname{tr}A\cdot\Sigma_p. \tag{A.2}$$

PROOF. We consider the $(i,j)$th entry of $XAY^*$:
$$XAY^*(i,j) = \sum_{k,l=1}^n X(i,k)A(k,l)Y^*(l,j) = \sum_{k,l=1}^n X_{ik}Y_{jl}A_{kl}. \tag{A.3}$$
Since $\mathbb{E}[X_{ik}Y_{jl}] = \Sigma_p(i,j)$ when $k = l$, while the terms with $k\ne l$ average out by independence, the right-hand side of (A.3) tends to $\Sigma_p(i,j)\cdot\sum_{k=1}^n A_{kk}$, which is equivalent to saying that
$$XAY^* \to \operatorname{tr}A\cdot\Sigma_p.$$
Equation (A.2) then follows simply by conditional expectation. The proof of Lemma A.2 is complete. □
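As an illustration of the lemma in the normalized form $XAY^*/\operatorname{tr}A \approx \Sigma_p$, here is a minimal sketch assuming jointly Gaussian pairs with cross-covariance $\rho I_p$ (the construction is ours):

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, rho = 3, 20000, 0.7
X = rng.standard_normal((p, n))
Y = rho * X + np.sqrt(1 - rho ** 2) * rng.standard_normal((p, n))  # Cov(x_i, y_i) = rho*I
w = rng.uniform(0.5, 1.5, size=n)     # A = diag(w), fixed and independent of X, Y
print((X * w) @ Y.T / w.sum())        # X A Y^* / tr(A): close to 0.7 * I_3
```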
In all the following, write $\lambda$ for the outlier limit $\phi(a)$ in (3.4), that is,
$$\lambda := \frac{a(a-1+c)}{a-1-ay}.$$
LEMMA A.3. With $s(z)$, $m_1(z)$-$m_4(z)$ defined in (8.1), we have
$$s(\lambda) = \frac{a(y-1)+1}{(a-1)(a+c-1)},$$
$$m_1(\lambda) = \frac{(a(y-1)+1)^2\bigl(-1+2a+a^2(y-1)+y(c-1)\bigr)}{(a-1)^2(a+c-1)^2\bigl(-1+2a+c+a^2(y-1)\bigr)},$$
$$m_2(\lambda) = \frac{1}{a-1},$$
$$m_3(\lambda) = \frac{-(a(y-1)+1)^2}{(a-1)^2\bigl(-1+2a+c+a^2(y-1)\bigr)},$$
$$m_4(\lambda) = \frac{-1+2a+c+a^2\bigl(c(y-1)-1\bigr)}{(a-1)^2\bigl(-1+2a+c+a^2(y-1)\bigr)}.$$
SKETCH OF THE PROOF OF LEMMA A.3. In this short proof, we skip all the detailed calculations. Recall the expression of $\underline{s}(z)$ in (8.11); evaluating it at $\lambda$ and using the relationship (8.10), we get
$$s(\lambda) = \frac{a(y-1)+1}{(a-1)(a+c-1)}. \tag{A.4}$$
Also, (8.11) says that $\underline{s}(z)$ is a solution of the following equation:
$$z(c+zy)\underline{s}^2(z) + \bigl(c(z(1-y)+1-c)+2zy\bigr)\underline{s}(z) + c+y-cy = 0. \tag{A.5}$$
Taking derivatives on both sides of (A.5) gives the value of $\underline{s}'(\lambda)$. On the other hand, according to (8.10), it holds that
$$\underline{s}(z) + \frac{1-c}{z} = c\,s(z); \tag{A.6}$$
taking derivatives on both sides again then gives the value of $s'(\lambda)$. Finally, the above five values are all linear combinations of $s(\lambda)$ and $s'(\lambda)$. The proof of Lemma A.3 is complete. □
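The five values are internally consistent with the integral identities $m_2 = -1-\lambda s$, $m_3 = s+\lambda m_1$ and $m_4 = 1+2\lambda s+\lambda^2 m_1$, assuming, as in the usage of (A.19)-(A.21) below, that $m_1 = \int(\lambda-x)^{-2}dF$, $m_2 = \int x(\lambda-x)^{-1}dF$, $m_3 = \int x(\lambda-x)^{-2}dF$ and $m_4 = \int x^2(\lambda-x)^{-2}dF$. A minimal sympy check (ours):

```python
import sympy as sp

a, c, y = sp.symbols('a c y', positive=True)
lam = a * (a + c - 1) / (a - a * y - 1)
q = -1 + 2 * a + c + a ** 2 * (y - 1)
s = (a * (y - 1) + 1) / ((a - 1) * (a + c - 1))
m1 = (a * (y - 1) + 1) ** 2 * (-1 + 2 * a + a ** 2 * (y - 1) + y * (c - 1)) \
     / ((a - 1) ** 2 * (a + c - 1) ** 2 * q)
m2 = 1 / (a - 1)
m3 = -(a * (y - 1) + 1) ** 2 / ((a - 1) ** 2 * q)
m4 = (-1 + 2 * a + c + a ** 2 * (c * (y - 1) - 1)) / ((a - 1) ** 2 * q)
for expr in (m2 + 1 + lam * s, m3 - s - lam * m1, m4 - 1 - 2 * lam * s - lam ** 2 * m1):
    print(sp.simplify(expr))   # expect 0, 0, 0
```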
LEMMA A.4. Under assumptions (a)-(d),
$$\frac{1}{p}\operatorname{tr}\Bigl\{\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\Bigr\} \xrightarrow{a.s.} \frac{1}{a+c-1}.$$

PROOF. We first condition on $Z_2$; then we can use the result in Zheng, Bai and Yao (2013) (Lemma 4.3), which says that
$$\frac{1}{p}\operatorname{tr}\Bigl(\frac{1}{z}\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} \to \tilde m(z) \quad\text{a.s.},$$
where $\tilde m(z)$ is the unique solution to the equation
$$\tilde m(z) = \int \frac{1}{\dfrac{x}{z} - \dfrac{1}{1-c\tilde m(z)}}\,dF_y(x) \tag{A.7}$$
satisfying
$$\Im(z)\cdot\Im\bigl(\tilde m(z)\bigr) \ge 0;$$
here, $F_y(x)$ is the LSD of $\frac{1}{n}Z_2Z_2^*$ (deterministic), which is the standard M-P law with parameter $y$. Besides, if we denote its Stieltjes transform by $s_y(z) := \int \frac{1}{x-z}\,dF_y(x)$, then (A.7) can be written as
$$\tilde m(z) = \int \frac{z}{x - \dfrac{z}{1-c\tilde m(z)}}\,dF_y(x) = z\cdot s_y\Bigl(\frac{z}{1-c\tilde m(z)}\Bigr). \tag{A.8}$$
Since the Stieltjes transform of the LSD of a standard sample covariance matrix satisfies
$$s_y(z) = \frac{1}{1-y-yzs_y(z)-z}, \tag{A.9}$$
bringing (A.8) into (A.9) leads to
$$\frac{\tilde m(z)}{z} = \frac{1}{1-y-y\cdot\dfrac{z}{1-c\tilde m(z)}\cdot\dfrac{\tilde m(z)}{z} - \dfrac{z}{1-c\tilde m(z)}},$$
whose nonnegative solution is unique, namely
$$\tilde m(z) = \frac{-1+y+z-zc+\sqrt{(1-y-z+zc)^2+4z(yc-y-c)}}{2(yc-y-c)}. \tag{A.10}$$
Therefore, we have for fixed $\frac{1}{n}Z_2Z_2^*$,
$$\frac{1}{p}\operatorname{tr}\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} \to \tilde m\Bigl(\frac{1}{\lambda}\Bigr) = \frac{1}{a+c-1}$$
almost surely. Finally, for each $\omega$, the ESD of $\frac{1}{n}Z_2Z_2^*(\omega)$ tends to the same limit (the standard M-P distribution), which is independent of the choice of $\omega$. Therefore, we have for all $\frac{1}{n}Z_2Z_2^*$ (not necessarily deterministic but only independent of $\frac{1}{m}X_2X_2^*$),
$$\frac{1}{p}\operatorname{tr}\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} \to \frac{1}{a+c-1}$$
almost surely. The proof of Lemma A.4 is complete. □
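The limit of Lemma A.4 can be checked by simulation. A minimal sketch, assuming Gaussian entries; the sizes and the spike value $a$ (which enters only through $\lambda = \phi(a)$) are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(6)
p, n, m, a = 400, 1600, 1600, 4.0
y, c = p / n, p / m
lam = a * (a + c - 1) / (a - a * y - 1)     # lambda = phi(a), the outlier limit
Z2 = rng.standard_normal((p, n))
X2 = rng.standard_normal((p, m))
G = lam * (Z2 @ Z2.T) / n - (X2 @ X2.T) / m
print(np.trace(np.linalg.inv(G)) / p, 1 / (a + c - 1))   # should nearly agree
```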
LEMMA A.5. With $A(\lambda)$, $B(\lambda)$, $C(\lambda)$ and $D(\lambda)$ defined in (8.17), we have
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\cdot\bigl(1+y\lambda s(\lambda)\bigr)\cdot I_M, \tag{A.11}$$
$$\frac{\lambda}{n}Z_1\bigl(A(l)-A(\lambda)\bigr)Z_1^* \to (l-\lambda)\cdot\bigl(\lambda ys(\lambda)+\lambda^2 ym_1(\lambda)\bigr)\cdot I_M, \tag{A.12}$$
$$\frac{1}{m}X_1\bigl(B(l)-B(\lambda)\bigr)X_1^* \to -(l-\lambda)\cdot cm_3(\lambda)\cdot U\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^*, \tag{A.13}$$
$$\frac{\lambda}{n}Z_1\bigl(C(l)-C(\lambda)\bigr)X_1^* + \frac{l-\lambda}{n}Z_1C(\lambda)X_1^* \to (l-\lambda)\cdot 0_{M\times M}, \tag{A.14}$$
$$\frac{\lambda}{m}X_1\bigl(D(l)-D(\lambda)\bigr)Z_1^* + \frac{l-\lambda}{m}X_1D(\lambda)Z_1^* \to (l-\lambda)\cdot 0_{M\times M}. \tag{A.15}$$
PROOF. Proof of (A.11): Since $Z_1$ is independent of $A$ and $\operatorname{Cov}(Z_1) = I_M$, we combine this fact with Lemma A.2:
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\cdot\frac{1}{n}\mathbb{E}\operatorname{tr}A(l)\cdot I_M. \tag{A.16}$$
Considering the expression of $A(l)$, we have
$$\begin{aligned}
\frac{1}{n}\mathbb{E}\operatorname{tr}A(\lambda) &= \frac{1}{n}\mathbb{E}\operatorname{tr}\Bigl[I_n - Z_2^*(\lambda I_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda}{n}Z_2\Bigr] \\
&= 1 - \frac{\lambda}{n}\mathbb{E}\operatorname{tr}(\lambda I_p-S)^{-1} \\
&= 1 - y\lambda\int\frac{1}{\lambda-x}\,dF_{c,y}(x) = 1 + y\lambda s(\lambda).
\end{aligned}$$
Therefore, combining with (A.16), we have
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\cdot\bigl(1+y\lambda s(\lambda)\bigr)\cdot I_M.$$
Proof of (A.12): Bringing in the expression of $A(l)$, we first have
$$\begin{aligned}
A(l)-A(\lambda) &= Z_2^*(\lambda I_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda}{n}Z_2 - Z_2^*(lI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{l}{n}Z_2 \\
&= Z_2^*(\lambda I_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda-l}{n}Z_2 + Z_2^*\bigl[(\lambda I_p-S)^{-1}-(lI_p-S)^{-1}\bigr]\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{l}{n}Z_2 \\
&= (l-\lambda)\cdot\Bigl[-Z_2^*(\lambda I_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{n}Z_2 + Z_2^*(\lambda I_p-S)^{-1}(lI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{l}{n}Z_2\Bigr].
\end{aligned}$$
Then, using Lemma A.2 for the same reason, we have
$$\frac{\lambda}{n}Z_1\bigl(A(l)-A(\lambda)\bigr)Z_1^* \to \frac{\lambda}{n}\bigl\{\mathbb{E}\operatorname{tr}\bigl(A(l)-A(\lambda)\bigr)\bigr\}\cdot I_M$$
and
$$\begin{aligned}
\frac{1}{n}\mathbb{E}\operatorname{tr}\bigl(A(l)-A(\lambda)\bigr)
&= (l-\lambda)\cdot\Bigl[-\frac{1}{n}\mathbb{E}\operatorname{tr}\Bigl\{Z_2^*(\lambda I_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{n}Z_2\Bigr\} \\
&\qquad\qquad + \frac{1}{n}\mathbb{E}\operatorname{tr}\Bigl\{Z_2^*(\lambda I_p-S)^{-1}(lI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{l}{n}Z_2\Bigr\}\Bigr] \\
&= (l-\lambda)\cdot\Bigl[-\frac{1}{n}\mathbb{E}\operatorname{tr}(\lambda I_p-S)^{-1} + \frac{\lambda}{n}\mathbb{E}\operatorname{tr}(\lambda I_p-S)^{-2} + o(1)\Bigr] \\
&= (l-\lambda)\cdot\Bigl[y\int\frac{1}{x-\lambda}\,dF_{c,y}(x) + \lambda y\int\frac{1}{(\lambda-x)^2}\,dF_{c,y}(x) + o(1)\Bigr] \\
&= (l-\lambda)\cdot\bigl[ys(\lambda)+\lambda ym_1(\lambda)+o(1)\bigr].
\end{aligned}$$
Therefore, we have
$$\frac{\lambda}{n}Z_1\bigl(A(l)-A(\lambda)\bigr)Z_1^* \to (l-\lambda)\cdot\bigl(y\lambda s(\lambda)+\lambda^2 ym_1(\lambda)\bigr)\cdot I_M.$$
Proof of (A.13): First, recall the fact that
$$\operatorname{Cov}(X_1) = U\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^*$$
and that $X_1$ is independent of $B$. Using Lemma A.2, we have
$$\frac{1}{m}X_1\bigl(B(l)-B(\lambda)\bigr)X_1^* \to \frac{1}{m}\mathbb{E}\operatorname{tr}\bigl(B(l)-B(\lambda)\bigr)\cdot U\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^*.$$
For the trace part,
$$\begin{aligned}
\frac{1}{m}\mathbb{E}\operatorname{tr}\bigl(B(l)-B(\lambda)\bigr)
&= \frac{1}{m}\mathbb{E}\operatorname{tr}\Bigl\{X_2^*\bigl[(lI_p-S)^{-1}-(\lambda I_p-S)^{-1}\bigr]\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2\Bigr\} \\
&= (l-\lambda)\cdot\Bigl[-\frac{1}{m}\mathbb{E}\operatorname{tr}\bigl\{(\lambda I_p-S)^{-2}S\bigr\} + o(1)\Bigr] \\
&= (l-\lambda)\cdot\Bigl[-c\int\frac{x}{(\lambda-x)^2}\,dF_{c,y}(x) + o(1)\Bigr] = (l-\lambda)\cdot\bigl(-cm_3(\lambda)+o(1)\bigr).
\end{aligned}$$
Therefore, we have
$$\frac{1}{m}X_1\bigl(B(l)-B(\lambda)\bigr)X_1^* \to -c(l-\lambda)m_3(\lambda)\cdot U\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^*.$$
Proof of (A.14) and (A.15): These follow simply from the fact that $\operatorname{Cov}(X_1, Z_1) = 0_{M\times M}$. The proof of Lemma A.5 is complete. □
LEMMA A.6. Define
$$\tilde R_n(\lambda_i) := (Z_1\ W_1)\begin{pmatrix} \dfrac{\lambda_i\sqrt{p}}{n}A(\lambda_i) & \dfrac{\lambda_i\sqrt{a_ip}}{n}C(\lambda_i) \\[2mm] \dfrac{\lambda_i\sqrt{a_ip}}{m}D(\lambda_i) & -\dfrac{a_i\sqrt{p}}{m}B(\lambda_i) \end{pmatrix}\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix} - \mathbb{E}[\,\cdot\,];$$
then $\tilde R_n(\lambda_i)$ weakly converges to an $M\times M$ symmetric random matrix $R(\lambda_i) = (R_{st})$, made of independent Gaussian entries with mean zero and variance
$$\operatorname{Var}(R_{st}) = \begin{cases} 2\theta_i + (v_4-3)\omega_i, & s = t, \\ \theta_i, & s \ne t, \end{cases}$$
where
$$\omega_i = \frac{a_i^2(a_i+c-1)^2(c+y)}{(a_i-1)^2}, \qquad \theta_i = \frac{a_i^2(a_i+c-1)^2(cy-c-y)}{-1+2a_i+c+a_i^2(y-1)}.$$
PROOF. Since $Z_1$ and $W_1$ are independent and both are made of i.i.d. components having the same first four moments, we can view $(Z_1\ W_1)$ as an $M\times(n+m)$ table $\xi$ made of i.i.d. elements of mean 0 and variance 1. Besides, we can rewrite the expressions of $A(\lambda)$, $B(\lambda)$, $C(\lambda)$ and $D(\lambda)$ as follows:
$$A(\lambda) = I_n - Z_2^*\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{\lambda}{n}Z_2,$$
$$B(\lambda) = I_m + X_2^*\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{1}{m}X_2,$$
$$C(\lambda) = Z_2^*\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{1}{m}X_2,$$
$$D(\lambda) = X_2^*\Bigl(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{1}{n}Z_2.$$
It holds that
$$A(\lambda)^* = A(\lambda), \qquad B(\lambda)^* = B(\lambda), \qquad m\cdot C(\lambda)^* = n\cdot D(\lambda);$$
therefore, the matrix
$$\begin{pmatrix} \dfrac{\lambda_i\sqrt{p}}{n}A(\lambda_i) & \dfrac{\lambda_i\sqrt{a_ip}}{n}C(\lambda_i) \\[2mm] \dfrac{\lambda_i\sqrt{a_ip}}{m}D(\lambda_i) & -\dfrac{a_i\sqrt{p}}{m}B(\lambda_i) \end{pmatrix}$$
is symmetric. Define
$$A_n(\lambda_i) = \sqrt{n+m}\cdot\begin{pmatrix} \dfrac{\lambda_i\sqrt{p}}{n}A(\lambda_i) & \dfrac{\lambda_i\sqrt{a_ip}}{n}C(\lambda_i) \\[2mm] \dfrac{\lambda_i\sqrt{a_ip}}{m}D(\lambda_i) & -\dfrac{a_i\sqrt{p}}{m}B(\lambda_i) \end{pmatrix}. \tag{A.17}$$
Now we can apply the results in Bai and Yao (2008) (Proposition 3.1 and Remark 1), which say that $\tilde R_n(\lambda_i)$ weakly converges to an $M\times M$ symmetric random matrix $R(\lambda_i) = (R_{st})$ made of independent Gaussian entries with mean zero and variance
$$\operatorname{Var}(R_{st}) = \begin{cases} 2\theta_i + (v_4-3)\omega_i, & s = t, \\ \theta_i, & s \ne t. \end{cases}$$
The following is devoted to the calculation of the values of $\theta_i$ and $\omega_i$.
Calculation of $\theta_i$: From the definition of $\theta$ [see Bai and Yao (2008) for details], we have
$$\begin{aligned}
\theta_i &= \lim \frac{1}{n+m}\operatorname{tr}A_n^2(\lambda_i) \\
&= \lim \operatorname{tr}\begin{pmatrix} \dfrac{p\lambda_i^2}{n^2}A^2(\lambda_i) + \dfrac{\lambda_i^2a_ip}{nm}C(\lambda_i)D(\lambda_i) & * \\[2mm] * & \dfrac{\lambda_i^2a_ip}{nm}D(\lambda_i)C(\lambda_i) + \dfrac{a_i^2p}{m^2}B^2(\lambda_i) \end{pmatrix} \tag{A.18}\\
&= \lim\Bigl[\frac{p\lambda_i^2}{n^2}\operatorname{tr}A^2(\lambda_i) + \frac{2\lambda_i^2a_ip}{nm}\operatorname{tr}C(\lambda_i)D(\lambda_i) + \frac{a_i^2p}{m^2}\operatorname{tr}B^2(\lambda_i)\Bigr],
\end{aligned}$$
where
$$\begin{aligned}
\operatorname{tr}A^2(\lambda_i) &= \operatorname{tr}\Bigl[I_n + Z_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda_i}{n}Z_2Z_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda_i}{n}Z_2 \\
&\qquad\quad - 2Z_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda_i}{n}Z_2\Bigr] \tag{A.19}\\
&= n + \lambda_i^2\operatorname{tr}(\lambda_iI_p-S)^{-2} - 2\lambda_i\operatorname{tr}(\lambda_iI_p-S)^{-1} \\
&= n + p\lambda_i^2m_1(\lambda_i) + 2p\lambda_is(\lambda_i),
\end{aligned}$$
$$\begin{aligned}
\operatorname{tr}C(\lambda_i)D(\lambda_i) &= \operatorname{tr}\Bigl[Z_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{n}Z_2\Bigr] \tag{A.20}\\
&= \operatorname{tr}\bigl[(\lambda_iI_p-S)^{-1}S(\lambda_iI_p-S)^{-1}\bigr] = pm_3(\lambda_i),
\end{aligned}$$
$$\begin{aligned}
\operatorname{tr}B^2(\lambda_i) &= \operatorname{tr}\Bigl[I_m + X_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2 \\
&\qquad\quad + 2X_2^*(\lambda_iI_p-S)^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2\Bigr] \tag{A.21}\\
&= m + \operatorname{tr}\bigl[(\lambda_iI_p-S)^{-1}S(\lambda_iI_p-S)^{-1}S\bigr] + 2\operatorname{tr}\bigl[(\lambda_iI_p-S)^{-1}S\bigr] \\
&= m + pm_4(\lambda_i) + 2pm_2(\lambda_i).
\end{aligned}$$
Combining (A.18), (A.19), (A.20) and (A.21), we have
$$\begin{aligned}
\theta_i &= \lambda_i^2y\bigl(1 + y\lambda_i^2m_1(\lambda_i) + 2y\lambda_is(\lambda_i)\bigr) + 2\lambda_i^2a_icym_3(\lambda_i) + a_i^2c\bigl(1 + cm_4(\lambda_i) + 2cm_2(\lambda_i)\bigr) \\
&= \frac{a_i^2(a_i+c-1)^2(cy-c-y)}{-1+2a_i+c+a_i^2(y-1)}.
\end{aligned}$$
Calculation of $\omega_i$: From the definition,
$$\omega_i = \lim \frac{1}{n+m}\sum_{j=1}^{n+m}\bigl(A_n(\lambda_i)(j,j)\bigr)^2 = \lim\Bigl[\sum_{j=1}^{n}\frac{\lambda_i^2p}{n^2}A(j,j)^2 + \sum_{j=1}^{m}\frac{a_i^2p}{m^2}B(j,j)^2\Bigr]. \tag{A.22}$$
In the following, we show that $A(j,j)$ and $B(j,j)$ both tend to limits that are independent of $j$:
$$\begin{aligned}
A(j,j) &= 1 - \Bigl(Z_2^*\Bigl[\lambda_iI_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{\lambda_i}{n}Z_2\Bigr)(j,j) \tag{A.23}\\
&= 1 - \frac{\lambda_i}{n}\Bigl(Z_2^*\Bigl[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr]^{-1}Z_2\Bigr)(j,j).
\end{aligned}$$
If we denote by $\eta_j$ the $j$th column of $Z_2$, we have
$$\frac{1}{n}Z_2Z_2^* = \frac{1}{n}(\eta_1\ \cdots\ \eta_n)\begin{pmatrix} \eta_1^* \\ \vdots \\ \eta_n^* \end{pmatrix} = \frac{1}{n}\eta_j\eta_j^* + \frac{1}{n}Z_{2j}Z_{2j}^*,$$
where $Z_{2j}$ is independent of $\eta_j$. Since
$$\begin{aligned}
&\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} - \Bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} \\
&\qquad = -\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{\lambda_i}{n}\eta_j\eta_j^*\Bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1},
\end{aligned}$$
we have
$$\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\eta_j = \frac{\bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\bigr)^{-1}\eta_j}{1 + \frac{\lambda_i}{n}\eta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\bigr)^{-1}\eta_j}. \tag{A.24}$$
Bringing (A.24) into (A.23),
$$\begin{aligned}
A(j,j) &= 1 - \frac{\lambda_i}{n}\eta_j^*\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\eta_j \\
&= 1 - \frac{\frac{\lambda_i}{n}\eta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\bigr)^{-1}\eta_j}{1 + \frac{\lambda_i}{n}\eta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\bigr)^{-1}\eta_j} \tag{A.25}\\
&= \frac{1}{1 + \frac{\lambda_i}{n}\eta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\bigr)^{-1}\eta_j},
\end{aligned}$$
and the denominator of (A.25) equals
$$1 + \frac{\lambda_i}{n}\operatorname{tr}\Bigl\{\Bigl(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\eta_j\eta_j^*\Bigr\}.$$
Since $\eta_j$ is independent of $(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}$, this denominator converges to $1 + \lambda_iy\cdot\frac{1}{a_i+c-1}$ according to Lemma A.4. Therefore, we have
$$A(j,j) \to \frac{1}{1 + \lambda_iy\cdot\frac{1}{a_i+c-1}}, \tag{A.26}$$
which is independent of the choice of $j$.
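This concentration is easy to observe numerically. A minimal sketch, assuming Gaussian entries (sizes are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, m, a = 300, 1200, 1200, 4.0
y, c = p / n, p / m
lam = a * (a + c - 1) / (a - a * y - 1)
Z2 = rng.standard_normal((p, n))
X2 = rng.standard_normal((p, m))
Gi = np.linalg.inv(lam * (Z2 @ Z2.T) / n - (X2 @ X2.T) / m)
Adiag = 1 - (lam / n) * np.sum(Z2 * (Gi @ Z2), axis=0)   # diagonal of A(lambda)
print(Adiag[:4], 1 / (1 + lam * y / (a + c - 1)))        # entries near the limit (A.26)
```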
For the same reason, we have
$$\begin{aligned}
B(j,j) &= 1 + \Bigl(X_2^*\Bigl[\lambda_iI_p - \Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2X_2^*\Bigr]^{-1}\Bigl(\frac{1}{n}Z_2Z_2^*\Bigr)^{-1}\frac{1}{m}X_2\Bigr)(j,j) \tag{A.27}\\
&= 1 + \frac{1}{m}\Bigl(X_2^*\Bigl[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr]^{-1}X_2\Bigr)(j,j).
\end{aligned}$$
If we denote by $\delta_j$ the $j$th column of $X_2$, then we have
$$\frac{1}{m}X_2X_2^* = \frac{1}{m}(\delta_1\ \cdots\ \delta_m)\begin{pmatrix} \delta_1^* \\ \vdots \\ \delta_m^* \end{pmatrix} = \frac{1}{m}\delta_j\delta_j^* + \frac{1}{m}X_{2j}X_{2j}^*$$
and
$$\begin{aligned}
&\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1} - \Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Bigr)^{-1} \\
&\qquad = \Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\frac{1}{m}\delta_j\delta_j^*\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Bigr)^{-1}.
\end{aligned}$$
So we have
$$\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\delta_j = \frac{\bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\bigr)^{-1}\delta_j}{1 - \frac{1}{m}\delta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\bigr)^{-1}\delta_j}. \tag{A.28}$$
Combining (A.27) and (A.28), we have
$$\begin{aligned}
B(j,j) &= 1 + \frac{1}{m}\delta_j^*\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Bigr)^{-1}\delta_j \\
&= 1 + \frac{\frac{1}{m}\delta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\bigr)^{-1}\delta_j}{1 - \frac{1}{m}\delta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\bigr)^{-1}\delta_j} \tag{A.29}\\
&= \frac{1}{1 - \frac{1}{m}\delta_j^*\bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\bigr)^{-1}\delta_j}.
\end{aligned}$$
Using the independence between $\delta_j$ and $(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}$ and Lemma A.4 again, we have
$$\frac{1}{m}\delta_j^*\Bigl(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Bigr)^{-1}\delta_j \to c\cdot\frac{1}{a_i+c-1}.$$
Therefore, we have
$$B(j,j) \to \frac{1}{1 - \frac{c}{a_i+c-1}},$$
which is also independent of the choice of $j$.
Finally, taking the definition of $\omega_i$ in (A.22) into consideration, we have
$$\omega_i = \frac{\lambda_i^2y}{\bigl(1+y\lambda_i\cdot\frac{1}{a_i+c-1}\bigr)^2} + \frac{a_i^2c}{\bigl(1-\frac{c}{a_i+c-1}\bigr)^2} = \frac{a_i^2(a_i+c-1)^2(c+y)}{(a_i-1)^2}. \tag{A.30}$$
The proof of Lemma A.6 is complete. □
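The simplification in (A.30) can be checked symbolically; a minimal sympy sketch (ours, and $\theta_i$ can be checked the same way):

```python
import sympy as sp

a, c, y = sp.symbols('a c y', positive=True)
lam = a * (a + c - 1) / (a - a * y - 1)
omega = lam ** 2 * y / (1 + y * lam / (a + c - 1)) ** 2 \
      + a ** 2 * c / (1 - c / (a + c - 1)) ** 2
closed = a ** 2 * (a + c - 1) ** 2 * (c + y) / (a - 1) ** 2
print(sp.simplify(omega - closed))   # expect 0
```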
REFERENCES 
ANDERSON, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New 
York. MR0771294 
BAI, Z. and YAO, J. (2008). Central limit theorems for eigenvalues in a spiked population model. 
Ann. Inst. Henri Poincaré Probab. Stat. 44 447–474. MR2451053 
BAI, Z. and YAO, J. (2012). On sample eigenvalues in a generalized spiked population model. J. Multivariate Anal. 106 167–177. MR2887686
BAI, Z. D., YIN, Y. Q. and KRISHNAIAH, P. R. (1987). On limiting empirical distribution function 
of the eigenvalues of a multivariate F matrix. Theory Probab. Appl. 32 490–500. 
BAI, Z., JIANG, D., YAO, J.-F. and ZHENG, S. (2009). Corrections to LRT on large-dimensional 
covariance matrix by RMT. Ann. Statist. 37 3822–3840. MR2572444 
BAIK, J., BEN AROUS, G. and PÉCHÉ, S. (2005). Phase transition of the largest eigenvalue for 
nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697. MR2165575 
BAIK, J. and SILVERSTEIN, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked 
population models. J. Multivariate Anal. 97 1382–1408. MR2279680 
BENAYCH-GEORGES, F., GUIONNET, A. and MAIDA, M. (2011). Fluctuations of the extreme 
eigenvalues of finite rank deformations of random matrices. Electron. J. Probab. 16 1621–1662. 
MR2835249 
BENAYCH-GEORGES, F. and NADAKUDITI, R. R. (2011). The eigenvalues and eigenvectors of 
finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521. MR2782201 
CAI, T., LIU, W. and XIA, Y. (2013). Two-sample covariance matrix testing and support recovery 
in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108 265–277. MR3174618 
CAPITAINE, M. (2013). Additive/multiplicative free subordination property and limiting eigenvectors of spiked additive deformations of Wigner matrices and spiked sample covariance matrices. J. Theoret. Probab. 26 595–648. MR3090543
CAPITAINE, M., DONATI-MARTIN, C. and FÉRAL, D. (2009). The largest eigenvalues of finite rank 
deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann. 
Probab. 37 1–47. MR2489158 
DHARMAWANSA, P., JOHNSTONE, I. M. and ONATSKI, A. (2014). Local asymptotic normality of 
the spectrum of high-dimensional spiked F-ratios. Preprint. Available at arXiv:1411.3875. 
FÉRAL, D. and PÉCHÉ, S. (2007). The largest eigenvalue of rank one deformation of large Wigner 
matrices. Comm. Math. Phys. 272 185–228. MR2291807 
HAN, X., PAN, G. and ZHANG, B. (2016). The Tracy–Widom law for the largest eigenvalue of F 
type matrices. Ann. Statist. 44 1564–1592. MR3519933 
HU, J. and BAI, Z. (2014). Strong representation of weak convergence. Sci. China Math. 57 2399–2406. MR3266500
JOHNSTONE, I. M. (2001). On the distribution of the largest eigenvalue in principal components 
analysis. Ann. Statist. 29 295–327. MR1863961 
KARGIN, V. (2015). On estimation in the reduced-rank regression with a large number of responses 
and predictors. J. Multivariate Anal. 140 377–394. MR3372575 
KRITCHMAN, S. and NADLER, B. (2008). Determining the number of components in a factor model 
from limited noisy data. Chemom. Intell. Lab. Syst. 94 19–32. 
LI, J. and CHEN, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. 
Statist. 40 908–940. MR2985938 
MUIRHEAD, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. MR0652932 
NADLER, B. (2010). Nonparametric detection of signals by information theoretic criteria: Performance analysis and an improved estimator. IEEE Trans. Signal Process. 58 2746–2756. MR2789420
ONATSKI, A. (2009). Testing hypotheses about the numbers of factors in large factor models. Econometrica 77 1447–1479. MR2561070
PASSEMIER, D. and YAO, J.-F. (2012). On determining the number of spikes in a high-dimensional 
spiked population model. Random Matrices Theory Appl. 1 1150002, 19. MR2930380 
PASSEMIER, D. and YAO, J. (2014). Estimation of the number of spikes, possibly equal, in the 
high-dimensional case. J. Multivariate Anal. 127 173–183. MR3188885 
PAUL, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance 
model. Statist. Sinica 17 1617–1642. MR2399865 
PÉCHÉ, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices. 
Probab. Theory Related Fields 134 127–173. MR2221787 
PIZZO, A., RENFREW, D. and SOSHNIKOV, A. (2013). On finite rank deformations of Wigner matrices. Ann. Inst. Henri Poincaré Probab. Stat. 49 64–94. MR3060148
RENFREW, D. and SOSHNIKOV, A. (2013). On finite rank deformations of Wigner matrices II: 
Delocalized perturbations. Random Matrices Theory Appl. 2 1250015, 36. MR3039820 
SHI, D. (2013). Asymptotic joint distribution of extreme sample eigenvalues and eigenvectors in the 
spiked population model. Preprint. Available at arXiv:1304.6113. 
SILVERSTEIN, J. W. (1985). The limiting eigenvalue distribution of a multivariate F matrix. SIAM 
J. Math. Anal. 16 641–646. MR0783987 
SKOROKHOD, A. V. (1956). Limit theorems for stochastic processes. Theory Probab. Appl. 1 261–290.
WACHTER, K. W. (1980). The limiting empirical measure of multiple discriminant ratios. Ann. 
Statist. 8 937–957. MR0585695 
WANG, Q., SU, Z. and YAO, J. (2014). Joint CLT for several random sesquilinear forms with 
applications to large-dimensional spiked population models. Electron. J. Probab. 19 1–28. 
MR3275855 
ZHENG, S. R., BAI, Z. D. and YAO, J. F. (2013). CLT for linear spectral statistics of random matrix $S^{-1}T$. Preprint. Available at arXiv:1305.1376.
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE 
UNIVERSITY OF HONG KONG 
POKFULAM 
HONG KONG 
E-MAIL: wqw8813@gmail.com 
jeffyao@hku.hk 

