The Annals of Statistics
2017, Vol. 45, No. 1, 415–460
DOI: 10.1214/16-AOS1463
© Institute of Mathematical Statistics, 2017
EXTREME EIGENVALUES OF LARGE-DIMENSIONAL SPIKED
FISHER MATRICES WITH APPLICATION
BY QINWEN WANG AND JIANFENG YAO
University of Hong Kong
Consider two p-variate populations, not necessarily Gaussian, with covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively. Let $S_1$ and $S_2$ be the corresponding sample covariance matrices with degrees of freedom m and n. When the difference $\Delta$ between $\Sigma_1$ and $\Sigma_2$ is of small rank compared to p, m and n, the Fisher matrix $S := S_2^{-1}S_1$ is called a spiked Fisher matrix. When p, m and n grow to infinity proportionally, we establish a phase transition for the extreme eigenvalues of the Fisher matrix: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of S converge to some point outside the support of the global limit (LSD) of the other eigenvalues (they become outliers); otherwise, they converge to the edge points of the LSD. Furthermore, we derive central limit theorems for these outlier eigenvalues of S. The limiting distributions are found to be Gaussian if and only if the corresponding population spike eigenvalues in $\Delta$ are simple. Two applications are introduced. The first application uses the largest eigenvalue of the Fisher matrix to test the equality of two high-dimensional covariance matrices, and an explicit power function is derived under the spiked alternative. The second application is in the field of signal detection, where an estimator for the number of signals is proposed while the covariance structure of the noise is arbitrary.
Received April 2015; revised March 2016.
MSC2010 subject classifications. Primary 62H12; secondary 60F05.
Key words and phrases. Large-dimensional Fisher matrices, spiked Fisher matrix, spiked population model, extreme eigenvalue, phase transition, central limit theorem, signal detection, high-dimensional data analysis.

1. Introduction. Consider two p-variate populations with covariance matrices $\Sigma_1$ and $\Sigma_2$, and let $S_1$ and $S_2$ be the sample covariance matrices from samples of the two populations with degrees of freedom m and n, respectively. When the difference between $\Sigma_1$ and $\Sigma_2$ is of finite rank, the Fisher matrix $S := S_2^{-1}S_1$ is called a spiked Fisher matrix. In this paper, we derive three results related to the extreme eigenvalues of the spiked Fisher matrix for general populations in the large-dimensional regime, that is, the dimension p grows to infinity together with the two sample sizes m and n. Our first result is a phase transition phenomenon for the extreme eigenvalues of S: a displacement formula showing that when the eigenvalues of $\Delta$ (spikes) are above (or under) a critical value, the associated extreme eigenvalues of S converge to some point outside the support of the global limit (LSD) of the other eigenvalues (they become outliers), and the location of this limit depends only on the corresponding population spike of $\Delta$ and the two dimension-to-sample-size ratios; otherwise, they converge to the edge points of the LSD. The second result concerns the second-order behavior of these outlier eigenvalues of S. We show that after proper normalization, a packet of outlier eigenvalues (corresponding to the same spike in $\Delta$) converges to the distribution of the eigenvalues of a structured Gaussian random matrix. In particular, the limiting distribution of an outlier eigenvalue of S (after normalization) is Gaussian if and only if the corresponding spike in $\Delta$ is simple. Finally, as an extension, we consider the joint distribution of all the outlier eigenvalues (corresponding to different spikes in $\Delta$) as a whole, and show that these outlier eigenvalues (after normalization) converge to the distribution of the eigenvalues of a block random matrix whose structure can be fully identified. As a special case, if all the spikes in $\Delta$ are simple, then the joint distribution of the outlier eigenvalues of S is multivariate Gaussian.
There exists a vast literature on the spectral analysis of multivariate Fisher matrices under the assumption that both populations are Gaussian and share the same covariance matrix, that is, $\Sigma_1 = \Sigma_2$. The joint distribution of the eigenvalues of the corresponding Fisher matrix S was first published, simultaneously and independently, in 1939 by R. A. Fisher, S. N. Roy, P. L. Hsu and M. A. Girshick. Later, Wachter (1980) found a deterministic limit, the celebrated Wachter distribution, for the empirical measure of these eigenvalues when the dimension p grows to infinity proportionally with the degrees of freedom m and n (the large-dimensional regime). Wachter's result was later extended to non-Gaussian populations using tools from random matrix theory; two early examples of such extensions are Silverstein (1985) and Bai, Yin and Krishnaiah (1987).
In this paper, we are also interested in the large-dimensional regime, while allowing $\Sigma_1$ and $\Sigma_2$ to be separated by a (finite) rank-M matrix $\Delta$. Moreover, the two populations can have arbitrary, not necessarily Gaussian, distributions. By perturbation theory, when M is a fixed integer while p, m and n grow to infinity proportionally, the empirical measure of the p eigenvalues of S is affected by a difference of order M/p ($\to 0$), so that its limit remains the Wachter distribution. Therefore, our main concern is the local asymptotic behavior of the M extreme eigenvalues of S (rather than the global limit). In a recent preprint, Dharmawansa, Johnstone and Onatski (2014), assuming both populations are Gaussian and M = 1, show that when the norm of the rank-1 difference (spike) exceeds a phase transition threshold, the asymptotic behavior of the log-ratio of the joint density of these characteristic roots under a local deviation from the spike depends only on the largest eigenvalue $l_{p,1}$, and the statistical experiment of observing all the eigenvalues is locally asymptotically normal (LAN). As a by-product of their analysis, these authors also establish joint asymptotic normality of a few of the largest eigenvalues when the corresponding spikes in $\Delta$ (with M > 1) exceed the phase transition threshold. The analysis given in this reference relies heavily on the Gaussian assumption, under which the joint density function of the eigenvalues has an explicit form, and the main results are obtained via an accurate asymptotic approximation of the log-ratio of these density functions. Therefore, one of the main objectives of our work is to develop a general theory without such a Gaussian assumption. The joint density of the eigenvalues of the Fisher matrix S then no longer has an analytic formula, and new techniques are needed.
Our approach relies on tools borrowed from the theory of random matrices. A methodology that has proved particularly successful, in both theory and applications, is the spiked population model coined in Johnstone (2001). This model assumes that the population covariance matrix has the structure $\Sigma_p = I_p + \Delta$, where the rank of $\Delta$ is M (a fixed integer). Again, for small rank M, the empirical eigenvalue distribution of the corresponding sample covariance matrix remains the standard Marčenko–Pastur law. What makes a difference is the local asymptotic behavior of the extreme sample eigenvalues. For example, the fluctuation of the largest eigenvalues of a sample covariance matrix from a complex spiked Gaussian population is studied in Baik, Ben Arous and Péché (2005), where the authors uncover a phase transition phenomenon: the weak limit and the scaling of these extreme eigenvalues differ depending on whether the eigenvalues of $\Delta$ (spikes) are above, equal to or below a critical value, situations referred to as super-critical, critical and sub-critical, respectively. In Baik and Silverstein (2006), the authors consider the spiked population model with general (not necessarily Gaussian) populations. For the almost sure limits of the extreme sample eigenvalues, they find that if a population spike (in $\Delta$) is large or small enough, the corresponding spiked sample eigenvalues converge to a limit outside the support of the limiting spectrum (they become outliers). In Paul (2007), a CLT is established for these outliers, that is, in the super-critical case, under the Gaussian assumption and assuming that the population spikes are simple (multiplicity 1). The CLT for super-critical outliers with general populations and arbitrary multiplicities is developed in Bai and Yao (2008). Joint distributions of the outlier sample eigenvalues and eigenvectors can be found in Shi (2013) and Wang, Su and Yao (2014). A recent related application to high-dimensional regression can be found in Kargin (2015).
Within the theory of random matrices, the techniques we use in this paper for spiked models are closely connected to other random matrix ensembles through the concept of small-rank perturbations. Theories on perturbed Wigner matrices can be found in Péché (2006), Féral and Péché (2007), Capitaine, Donati-Martin and Féral (2009), Pizzo, Renfrew and Soshnikov (2013) and Renfrew and Soshnikov (2013). In the more general setting of finite-rank perturbations, including both additive and multiplicative ones, references include Benaych-Georges and Nadakuditi (2011), Benaych-Georges, Guionnet and Maida (2011) and Capitaine (2013).
Apart from the theoretical results, we also develop two applications, in high-dimensional hypothesis testing and in signal detection, respectively. The first application uses the largest eigenvalue of the Fisher matrix to test the hypotheses

(1.1) $H_0: \Sigma_1 = \Sigma_2 \quad\text{vs.}\quad H_1: \Sigma_1 = \Sigma_2 + \Delta,$

where $\Delta$ is a nonnegative definite matrix of rank M. Under this spiked alternative $H_1$, an explicit formula for the power function is derived. Our second application proposes an estimator for the number of signals based on noisy observations. Unlike existing approaches [see, e.g., Kritchman and Nadler (2008), Nadler (2010), Passemier and Yao (2012, 2014)], our method allows the covariance structure of the noise to be arbitrary.
The rest of the paper is organized as follows. First, in Section 2, the exact setting of the spiked Fisher matrix $S = S_2^{-1}S_1$ is introduced. Then, in Section 3, we establish the phase transition phenomenon for the extreme eigenvalues of S: a displacement formula is found and the transition boundary is explicitly identified. Next, CLTs for the outlier eigenvalues fluctuating around their limits (i.e., in the super-critical case) are established, first in Section 4 for one group of sample eigenvalues corresponding to the same population spike, and then in Section 6 for all the groups jointly. Section 5 contains numerical illustrations that demonstrate the finite-sample performance of our results. In Section 7, we develop in detail two applications in high-dimensional statistics. Proofs of the main theorems (Theorems 3.1 and 4.1) are included in Section 8, while some technical lemmas are grouped in the Appendix.
2. Spiked Fisher matrix and preliminary results. In what follows, we will assume that $\Sigma_2 = I_p$. This assumption entails no loss of generality since the eigenvalues of the Fisher matrix $S = S_2^{-1}S_1$ are invariant under the transformation

(2.1) $S_1 \to \Sigma_2^{-1/2} S_1 \Sigma_2^{-1/2}, \qquad S_2 \to \Sigma_2^{-1/2} S_2 \Sigma_2^{-1/2}.$

Also, we will write $\Sigma_p$ for $\Sigma_1$ to signify the dependence on the dimension p. Let

(2.2) $Z = (z_1, \ldots, z_n) = (z_{ij})_{1\le i\le p,\,1\le j\le n}$

and

(2.3) $W = (w_1, \ldots, w_m) = (w_{kl})_{1\le k\le p,\,1\le l\le m}$

be two independent arrays, of respective sizes $p \times n$ and $p \times m$, of independent real-valued random variables with mean 0 and variance 1. Now suppose we have two samples $\{z_i\}_{1\le i\le n}$ and $\{x_i = \Sigma_p^{1/2} w_i\}_{1\le i\le m}$, where $\{z_i\}$ and $\{w_i\}$ are given by (2.2) and (2.3), and $\Sigma_p$ is a rank-M (M is a fixed integer) perturbation of $I_p$, that is,

(2.4) $\Sigma_p = \begin{pmatrix} \Sigma_M & 0 \\ 0 & I_{p-M} \end{pmatrix}.$

Here, $\Sigma_M$ is an $M \times M$ covariance matrix containing k nonzero and nonunit eigenvalues $(a_i)$, with multiplicity numbers $(n_i)$ ($n_1 + \cdots + n_k = M$). That is, $\Sigma_M$ has the eigen-decomposition $\Sigma_M = U \operatorname{diag}(\underbrace{a_1, \ldots, a_1}_{n_1}, \ldots, \underbrace{a_k, \ldots, a_k}_{n_k}) U^*$, where U is an $M \times M$ orthogonal matrix.
The sample covariance matrices of the two observations $\{x_i\}$ and $\{z_i\}$ are

(2.5) $S_1 = \frac{1}{m}\sum_{l=1}^m x_l x_l^* = \frac{1}{m} XX^* = \Sigma_p^{1/2}\Big(\frac{1}{m} WW^*\Big)\Sigma_p^{1/2}$

and

(2.6) $S_2 = \frac{1}{n}\sum_{j=1}^n z_j z_j^* = \frac{1}{n} ZZ^*,$

respectively.

Throughout the paper, we consider an asymptotic regime of Marčenko–Pastur type, that is,

(2.7) $p \wedge n \wedge m \to \infty, \quad y_p := p/n \to y \in (0,1) \quad\text{and}\quad c_p := p/m \to c > 0.$

Recall that the empirical spectral distribution (ESD) of a $p \times p$ matrix A with eigenvalues $\{\lambda_j\}$ is the distribution $p^{-1}\sum_{j=1}^p \delta_{\lambda_j}$, where $\delta_a$ denotes the Dirac mass at a. Since the total rank M generated by the k spikes is fixed, the ESD of S has the same limit (LSD) as if there were no spikes in $\Sigma_p$. This limiting spectral distribution, the celebrated Wachter distribution, has been known for a long time.
PROPOSITION 2.1. For the Fisher matrix $S = S_2^{-1}S_1$ with the sample covariance matrices $S_i$'s given in (2.5)–(2.6), assume that the dimension p and the two sample sizes n, m grow to infinity proportionally as in (2.7). Then almost surely, the ESD of S weakly converges to a deterministic distribution $F_{c,y}$ with bounded support $[\alpha, \beta]$ and density function

(2.8) $f_{c,y}(x) = \begin{cases} \dfrac{(1-y)\sqrt{(\beta-x)(x-\alpha)}}{2\pi x (c+xy)}, & \alpha \le x \le \beta, \\ 0, & \text{otherwise}, \end{cases}$

where

(2.9) $\alpha = \Big(\dfrac{1-\sqrt{c+y-cy}}{1-y}\Big)^2 \quad\text{and}\quad \beta = \Big(\dfrac{1+\sqrt{c+y-cy}}{1-y}\Big)^2.$

Furthermore, if $c > 1$, then $F_{c,y}$ has a point mass $1 - 1/c$ at the origin. Also, the Stieltjes transform $s(z)$ of $F_{c,y}$ equals

(2.10) $s(z) = \frac{1}{zc} - \frac{1}{z} - \frac{c(z(1-y)+1-c) + 2zy - c\sqrt{(1-c+z(1-y))^2 - 4z}}{2zc(c+zy)}, \quad z \notin [\alpha, \beta].$
REMARK 2.1. Assuming both populations are Gaussian, Wachter (1980), Theorem 3.1, derives the limiting distribution of the roots of the determinantal equation

$|m S_1 - x^2 (m S_1 + n S_2)| = 0, \quad x \in \mathbb{R}.$

The continuous component of this distribution has compact support $[A^2, B^2]$ with density proportional to $\{(x - A^2)(B^2 - x)\}^{1/2}/\{x(1-x^2)\}$. It can be readily checked that under the change of variable $z = cx^2/\{y(1-x^2)\}$, the density of the continuous component of the LSD of S is exactly (2.8). The validity of this limit for general (not necessarily Gaussian) populations is due to Silverstein (1985) and Bai, Yin and Krishnaiah (1987).
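As a quick numerical companion to Proposition 2.1 (a sketch we add here for illustration; it is not part of the original paper), the support edges (2.9) and the density (2.8) can be evaluated directly:

```python
import numpy as np

def wachter_edges(c, y):
    """Support edges [alpha, beta] of the Wachter LSD, from (2.9)."""
    r = np.sqrt(c + y - c * y)
    return ((1 - r) / (1 - y)) ** 2, ((1 + r) / (1 - y)) ** 2

def wachter_density(x, c, y):
    """Density f_{c,y}(x) of the Wachter LSD, from (2.8)."""
    a, b = wachter_edges(c, y)
    x = np.asarray(x, dtype=float)
    inside = (x >= a) & (x <= b)
    f = np.zeros_like(x)
    f[inside] = ((1 - y) * np.sqrt((b - x[inside]) * (x[inside] - a))
                 / (2 * np.pi * x[inside] * (c + x[inside] * y)))
    return f

# With (c, y) = (1/5, 1/2), as used in Section 5, the support is about [0.203, 12.597].
print(wachter_edges(0.2, 0.5))
```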
3. Phase transition of the extreme eigenvalues of $S = S_2^{-1}S_1$. In this section, we establish a phase transition phenomenon for the extreme eigenvalues of $S = S_2^{-1}S_1$: when a population spike $a_i$ with multiplicity $n_i$ is larger (or smaller) than a critical value, a packet of $n_i$ corresponding sample eigenvalues of S jumps outside the support $[\alpha, \beta]$ of its LSD $F_{c,y}$ and all converge to a fixed limit $\phi(a_i)$, called the displacement of the population spike $a_i$. Otherwise, these sample eigenvalues converge to one of the edges $\alpha$ and $\beta$.

By assumption, the k population spike eigenvalues $\{a_i\}$ are all positive and nonunit. We order them with their multiplicities in descending order, together with the $p - M$ unit eigenvalues, as

(3.1) $a_1 = \cdots = a_1 > a_2 = \cdots = a_2 > \cdots > a_{k_0} = \cdots = a_{k_0} > 1 = \cdots = 1 > a_{k_0+1} = \cdots = a_{k_0+1} > \cdots > a_k = \cdots = a_k.$

That is, $k_0$ of these population spike eigenvalues are larger than 1, while the other $k - k_0$ are smaller. Let

$J_i = \begin{cases} [n_1 + \cdots + n_{i-1} + 1,\; n_1 + \cdots + n_i], & 1 \le i \le k_0, \\ [p - (n_i + \cdots + n_k) + 1,\; p - (n_{i+1} + \cdots + n_k)], & k_0 < i \le k. \end{cases}$

Notice that the cardinality of each $J_i$ is $n_i$. Next, the sample eigenvalues $\{l_{p,j}\}$ of the Fisher matrix $S_2^{-1}S_1$ are also sorted in descending order as $l_{p,1} \ge l_{p,2} \ge \cdots \ge l_{p,p}$. Therefore, for each spike eigenvalue $a_i$, there are $n_i$ associated sample eigenvalues $\{l_{p,j}, j \in J_i\}$. The phase transition for these extreme eigenvalues is given in the following Theorem 3.1.
THEOREM 3.1. For the Fisher matrix $S = S_2^{-1}S_1$ with the sample covariance matrices $S_i$'s given in (2.5)–(2.6), assume that the dimension p and the two sample sizes n, m grow to infinity proportionally as in (2.7). Then for any spike eigenvalue $a_i$ ($i = 1, \ldots, k$), it holds that for all $j \in J_i$, $l_{p,j}$ almost surely converges to the limit

(3.2) $\lambda_i = \begin{cases} \phi(a_i), & |a_i - \gamma| > \gamma\sqrt{c+y-cy}, \\ \beta, & 1 < a_i \le \gamma\{1 + \sqrt{c+y-cy}\}, \\ \alpha, & \gamma\{1 - \sqrt{c+y-cy}\} \le a_i < 1, \end{cases}$

where $\gamma := 1/(1-y) \in (1, \infty)$ and

(3.3) $\phi(a_i) = \frac{a_i(a_i + c - 1)}{a_i - a_i y - 1}$

is the displacement of the population spike $a_i$.

The proof of this theorem is postponed to Section 8.1.
REMARK 3.1. Theorem 3.1 states that when the population spike $a_i$ is large enough ($a_i > \gamma\{1 + \sqrt{c+y-cy}\}$) or small enough ($a_i < \gamma\{1 - \sqrt{c+y-cy}\}$), the corresponding extreme sample eigenvalues of the spiked Fisher matrix converge to $\phi(a_i)$, which is located outside the support $[\alpha, \beta]$ of its LSD. Otherwise, they converge to one of the edges $\alpha$ and $\beta$. This phenomenon is depicted in Figure 1.
REMARK 3.2. Using the notation $\gamma = 1/(1-y)$, the function $\phi(x)$ in (3.3) can be expressed as

(3.4) $\phi(x) = \frac{\gamma x(x - 1 + c)}{x - \gamma}, \quad x \ne \gamma,$

which is a rational function with a single pole at $x = \gamma$; the function asymptotically equals $g(x) = \gamma(x + c - 1 + \gamma)$ as $|x| \to \infty$. On the other hand, since $\phi(\gamma\{1 - \sqrt{c+y-cy}\}) = \alpha$ and $\phi(\gamma\{1 + \sqrt{c+y-cy}\}) = \beta$, it can be checked that the points $A = (\gamma\{1 - \sqrt{c+y-cy}\}, \alpha)$ and $B = (\gamma\{1 + \sqrt{c+y-cy}\}, \beta)$ are exactly the two extreme points of the function $\phi$. An example of $\phi(x)$ with parameters $(c, y) = (\frac{1}{5}, \frac{1}{2})$ is illustrated in Figure 2.

FIG. 2. Example of the function $\phi(x)$ with $(c, y) = (\frac{1}{5}, \frac{1}{2})$. Its pole is at $x = 2$. When $|x| \to \infty$, $\phi(x)$ approaches the line $g(x) = 2x + \frac{12}{5}$ (the red line). The two extreme points are at A(0.450, 0.203) and B(3.549, 12.597), meaning that the critical values for the spikes are 0.450 and 3.549 while the support of the LSD is [0.203, 12.597].
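The displacement (3.4) and the two critical spike values of Theorem 3.1 are easy to evaluate numerically; the following sketch (our own illustration, not from the paper) reproduces the quantities shown in Figure 2:

```python
import numpy as np

def phi(x, c, y):
    """Displacement function (3.4): phi(x) = gamma*x*(x-1+c)/(x-gamma)."""
    gamma = 1.0 / (1.0 - y)
    return gamma * x * (x - 1 + c) / (x - gamma)

def critical_points(c, y):
    """Critical spike values gamma*(1 -/+ sqrt(c+y-cy)) from Theorem 3.1."""
    gamma = 1.0 / (1.0 - y)
    r = np.sqrt(c + y - c * y)
    return gamma * (1 - r), gamma * (1 + r)

c, y = 1 / 5, 1 / 2
lo, hi = critical_points(c, y)               # about (0.450, 3.549)
print(lo, hi, phi(lo, c, y), phi(hi, c, y))  # phi maps them to alpha and beta
```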
REMARK 3.3. It is worth observing that when $y \to 0$, the function $\phi(x)$ tends to the function well known in the literature for the analogous transition phenomenon of a spiked sample covariance matrix, that is,

(3.5) $\lim_{y\to 0} \phi(x) = x + \frac{cx}{x - 1}, \quad x \ne 1;$

see, for example, the $\psi$-function in Figure 4 of Bai and Yao (2012). These functions [(3.4) and (3.5)] share the same shape; however, the pole of (3.5) equals 1, which is smaller than the pole $\gamma = 1/(1-y)$ of (3.4) for the case of a spiked Fisher matrix.
FIG. 1. Phase transition of the extreme eigenvalues of the spiked Fisher matrix. Upper-left panel: when $1 < a_i \le \gamma\{1 + \sqrt{c+y-cy}\}$, the limit of the corresponding extreme sample eigenvalues $\{l_{p,j}, j \in J_i\}$ is $\beta$; upper-right panel: when $a_i > \gamma\{1 + \sqrt{c+y-cy}\}$, the limit of $\{l_{p,j}, j \in J_i\}$ is larger than $\beta$ [located at $\lambda_i = \phi(a_i)$]; lower-left panel: when $\gamma\{1 - \sqrt{c+y-cy}\} \le a_i < 1$, the limit of $\{l_{p,j}, j \in J_i\}$ is $\alpha$; lower-right panel: when $0 < a_i < \gamma\{1 - \sqrt{c+y-cy}\}$, the limit of $\{l_{p,j}, j \in J_i\}$ is smaller than $\alpha$ [located at $\lambda_i = \phi(a_i)$].
REMARK 3.4. As mentioned in the Introduction, this phase transition phenomenon had already been established in the preprint Dharmawansa, Johnstone and Onatski (2014) (their Proposition 5) under the Gaussian assumption and using a completely different approach. Theorem 3.1 proves that this phase transition phenomenon is in fact universal.
4. Central limit theorem for the outlier eigenvalues of $S_2^{-1}S_1$. The aim of this section is to give a CLT for the $n_i$-packed outlier eigenvalues

$\sqrt{p}\{l_{p,j} - \phi(a_i), j \in J_i\}.$

Denote $U = (U_1\ U_2\ \cdots\ U_k)$, where each $U_i$ is a matrix of size $M \times n_i$ that corresponds to the spike eigenvalue $a_i$.

THEOREM 4.1. Assume the same assumptions as in Theorem 3.1 and, in addition, that the variables $(z_{ij})$ [in (2.2)] and $(w_{kl})$ [in (2.3)] have the same first four moments; denote by $v_4$ their common fourth moment:

$v_4 = E|z_{ij}|^4 = E|w_{kl}|^4, \quad 1 \le i, k \le p,\ 1 \le j \le n,\ 1 \le l \le m.$
Then for any population spike $a_i$ satisfying $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$, the normalized $n_i$-packed outlier eigenvalues of $S_2^{-1}S_1$,

$\sqrt{p}\{l_{p,j} - \phi(a_i), j \in J_i\},$

converge weakly to the distribution of the eigenvalues of the random matrix $-U_i^* R(\lambda_i) U_i/\delta(\lambda_i)$. Here,

(4.1) $\delta(\lambda_i) = \frac{(1 - a_i - c)(1 + a_i(y-1))^2}{(a_i - 1)(-1 + 2a_i + c + a_i^2(y-1))},$

$R(\lambda_i) = (R_{mn})$ is an $M \times M$ symmetric random matrix, made of independent Gaussian entries with mean zero and variance

(4.2) $\operatorname{Var}(R_{mn}) = \begin{cases} 2\theta_i + (v_4 - 3)\omega_i, & m = n, \\ \theta_i, & m \ne n, \end{cases}$

where

(4.3) $\omega_i = \frac{a_i^2 (a_i + c - 1)^2 (c + y)}{(a_i - 1)^2},$

(4.4) $\theta_i = \frac{a_i^2 (a_i + c - 1)^2 (cy - c - y)}{-1 + 2a_i + c + a_i^2(y-1)}.$

The proof of this theorem is postponed to Section 8.2.
REMARK 4.1. Notice that the result above involves the ith block $U_i$ of the eigen-matrix U. When the spike $a_i$ is simple, $U_i$ is unique up to its sign, so $U_i^* R(\lambda_i) U_i$ is uniquely determined. But when $a_i$ has multiplicity greater than 1, $U_i$ is not unique; indeed, any rotation of $U_i$ yields eigenvectors corresponding to $a_i$. However, according to Lemma A.1 in the Appendix, such a rotation does not affect the eigenvalues of the matrix $U_i^* R(\lambda_i) U_i$.
Next, we consider the special case where $\Sigma_M$ is diagonal ($U = I_M$) with distinct eigenvalues $a_i$, that is, $M = k$ and $n_i = 1$ for all $1 \le i \le M$. Using Theorem 4.1, it can be shown that after normalization, the outlier eigenvalues $l_{p,i}$ of $S_2^{-1}S_1$ are asymptotically Gaussian when $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$.
PROPOSITION 4.1. Under the same assumptions as in Theorem 3.1, with the additional conditions that $\Sigma_M$ is diagonal and all its eigenvalues $a_i$ ($1 \le i \le M$) are simple, we have, when $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$, that the outlier eigenvalue $l_{p,i}$ of $S_2^{-1}S_1$ is asymptotically Gaussian:

$\sqrt{p}\Big(l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y}\Big) \Longrightarrow N(0, \sigma_i^2),$

where

$\sigma_i^2 = \frac{2a_i^2 (cy - c - y)(a_i - 1)^2(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} + (v_4 - 3)\cdot\frac{a_i^2 (c + y)(-1 + 2a_i + c + a_i^2(y-1))^2}{(1 + a_i(y-1))^4}.$
PROOF. Under the above assumptions, the random matrix $-U_i^* R(\lambda_i) U_i$ reduces to $-R(\lambda_i)(i,i)$, which is a Gaussian random variable with mean zero and variance

$2\theta_i + (v_4 - 3)\omega_i = \frac{2a_i^2(a_i + c - 1)^2(cy - c - y)}{-1 + 2a_i + c + a_i^2(y-1)} + (v_4 - 3)\cdot\frac{a_i^2(a_i + c - 1)^2(c + y)}{(a_i - 1)^2}.$

Therefore, combining with the value of $\delta(\lambda_i)$ in (4.1), we have

$\sqrt{p}\Big(l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y}\Big) \Longrightarrow N(0, \sigma_i^2),$

where

$\sigma_i^2 = \frac{2a_i^2 (cy - c - y)(a_i - 1)^2(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} + (v_4 - 3)\cdot\frac{a_i^2 (c + y)(-1 + 2a_i + c + a_i^2(y-1))^2}{(1 + a_i(y-1))^4}.$

The proof of Proposition 4.1 is complete.
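The variance formula of Proposition 4.1 is straightforward to evaluate; the following sketch (our own transcription, added for illustration) reproduces the numerical values used in Section 5:

```python
import numpy as np

def sigma2(a, c, y, v4):
    """Asymptotic variance of Proposition 4.1 for a simple spike a."""
    q = -1 + 2 * a + c + a ** 2 * (y - 1)
    denom = (1 + a * (y - 1)) ** 4
    term_gauss = 2 * a ** 2 * (c * y - c - y) * (a - 1) ** 2 * q / denom
    term_v4 = (v4 - 3) * a ** 2 * (c + y) * q ** 2 / denom
    return term_gauss + term_v4

# Section 5 setting: (c, y) = (1/5, 1/2), spike a1 = 20, Gaussian data (v4 = 3):
print(sigma2(20, 0.2, 0.5, 3))  # about 4246.8, matching sigma_1^2 there
```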
REMARK 4.2. Notice that when the observations are standard Gaussian, we have $v_4 = 3$, and the above result reduces to

$\sqrt{p}\Big(l_{p,i} - \frac{a_i(a_i - 1 + c)}{a_i - 1 - a_i y}\Big) \Longrightarrow N\Big(0, \frac{2a_i^2(a_i - 1)^2(cy - c - y)(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4}\Big),$

which is exactly the result in Dharmawansa, Johnstone and Onatski (2014); see setting 1 in their Proposition 11.
5. Numerical illustrations. In this section, numerical results are provided to illustrate Theorem 4.1 and Proposition 4.1. We fix $p = 200$, $T = 1000$, $n = 400$ with 1000 replications, so that $y = 1/2$ and $c = 1/5$. The critical interval is then $[\gamma - \gamma\sqrt{c+y-cy},\ \gamma + \gamma\sqrt{c+y-cy}] = [0.45, 3.55]$ and the limiting support is $[\alpha, \beta] = [0.2, 12.6]$. Consider $k = 3$ spike eigenvalues $(a_1, a_2, a_3) = (20, 0.2, 0.1)$ with respective multiplicities $(n_1, n_2, n_3) = (1, 2, 1)$. Let $l_1 \ge \cdots \ge l_p$ be the ordered eigenvalues of the Fisher matrix $S_2^{-1}S_1$. We are particularly interested in the distributions of $l_1$, $(l_{p-2}, l_{p-1})$ and $l_p$, which correspond to the spike eigenvalues $a_1$, $a_2$ and $a_3$, respectively.
5.1. Case of $U = I_4$. In this subsection, we consider the simple case $U = I_4$. Following Theorem 4.1, we have:

• for $j = 1$ and $j = p$: $\sqrt{p}\{l_j - \phi(a_i)\} \to N(0, \sigma_i^2)$. Here, for $j = 1$, $i = 1$: $\phi(a_1) = 42.67$ and $\sigma_1^2 = 4246.8 + 1103.5(v_4 - 3)$; and for $j = p$, $i = 3$: $\phi(a_3) = 0.07$ and $\sigma_3^2 = 7.2 \times 10^{-3} + 3.15 \times 10^{-3}(v_4 - 3)$;

• for $j = p-2, p-1$ and $i = 2$: the two-dimensional random vector $\sqrt{p}\{l_j - \phi(a_2)\}$ converges to the eigenvalues of the random matrix $-R_{mn}/\delta(\lambda_2)$. Here, $\phi(a_2) = 0.13$, $\delta(\lambda_2) = 1.45$, and $R_{mn}$ is the $2 \times 2$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

(5.1) $\operatorname{Var}(R_{mn}) = \begin{cases} 2\theta_2 + (v_4 - 3)\omega_2\ (= 0.04 + 0.016(v_4 - 3)), & m = n, \\ \theta_2\ (= 0.02), & m \ne n. \end{cases}$

Simulations are conducted to compare the distributions of the empirical extreme eigenvalues with their limits.
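For readers who wish to reproduce the experiments, the following minimal sketch (our own code, under the settings stated above) generates one replication of the spiked Fisher matrix and its extreme eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 200, 1000, 400                       # y = p/n = 1/2, c = p/m = 1/5
spikes = [20.0, 0.2, 0.2, 0.1]                 # (a1; a2 with multiplicity 2; a3)

# Sigma_p = diag(Sigma_M, I_{p-M}) with Sigma_M = diag(spikes), i.e., U = I_4.
sigma_half = np.ones(p)
sigma_half[:4] = np.sqrt(spikes)

W = rng.standard_normal((p, m))                # use +/-1 entries for the binary case
Z = rng.standard_normal((p, n))
S1 = (sigma_half[:, None] * W) @ (sigma_half[:, None] * W).T / m
S2 = Z @ Z.T / n

# Eigenvalues of S2^{-1} S1; l_1 should lie near phi(20) = 42.67,
# (l_{p-2}, l_{p-1}) near phi(0.2) = 0.13 and l_p near phi(0.1) = 0.07.
l = np.sort(np.linalg.eigvals(np.linalg.solve(S2, S1)).real)[::-1]
print(l[0], l[-3:])
```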
5.1.1. Gaussian case. First, we assume all the $z_{ij}$ and $w_{ij}$ are i.i.d. standard Gaussian, so that $v_4 - 3 = 0$. According to (5.1), $R_{mn}/\sqrt{0.04}$ is then the standard $2 \times 2$ Gaussian Wigner matrix (GOE). Therefore, we have:

• $\sqrt{p}\{l_1 - 42.67\} \to N(0, 4246.8)$,
• $\sqrt{p}\{l_p - 0.07\} \to N(0, 7.2 \times 10^{-3})$,
• the two-dimensional random vector $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-0.138 \cdot W$; here, W is a $2 \times 2$ GOE.

We compare the empirical distributions with their limits in Figure 3. The upper panels show the empirical kernel density estimates (solid lines) of $\sqrt{p}\{l_1 - 42.67\}$ and $\sqrt{p}\{l_p - 0.07\}$ from 1000 independent replications, compared to their Gaussian limits $N(0, 4246.8)$ and $N(0, 7.2 \times 10^{-3})$, respectively (dashed lines). For the empirical distribution of the two-dimensional random vector $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$, we run a two-dimensional kernel density estimation from 1000 independent replications and display its contour lines (lower-left panel of the figure), while the lower-right panel shows the contour lines of the kernel density estimate of the eigenvalues of the $2 \times 2$ random matrix $-0.138 \cdot \text{GOE}$ (their limit).

FIG. 3. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centralization and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density function of $(l_{p-2}, l_{p-1})$ (left plot, after centralization and scaling) and contour plots of their limits (right plot). Both the empirical and limit joint density functions are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. standard Gaussian distribution with $U = I_4$. The replication number is 1000.
5.1.2. Binary case. Second, we assume all the $z_{ij}$ and $w_{ij}$ are i.i.d. binary variables taking values $\{1, -1\}$ with probability 1/2 each, in which case $v_4 = 1$. Similarly, we have:

• $\sqrt{p}\{l_1 - 42.67\} \to N(0, 2039.8)$,
• $\sqrt{p}\{l_p - 0.07\} \to N(0, 9 \times 10^{-4})$,
• the two-dimensional random vector $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-R_{mn}/1.45$, where $R_{mn}$ is the $2 \times 2$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

$\operatorname{Var}(R_{mn}) = \begin{cases} 0.008, & m = n, \\ 0.02, & m \ne n. \end{cases}$

Figure 4 compares the empirical distributions with their limits in this binary case. The upper panels show the empirical kernel density estimates of $\sqrt{p}\{l_1 - 42.67\}$ and $\sqrt{p}\{l_p - 0.07\}$ from 1000 independent replications (solid lines), compared to their Gaussian limits (dashed lines). The lower panels show the contour lines of the empirical joint density of $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ (left plot) and the contour lines of their limit (right plot).

FIG. 4. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centralization and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density function of $(l_{p-2}, l_{p-1})$ (left plot, after centralization and scaling) and contour plots of their limits (right plot). Both the empirical and limit joint density functions are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. binary distribution with $U = I_4$. The replication number is 1000.
5.2. Case of general U. In this subsection, we consider the following nonunit orthogonal matrix:

(5.2) $U = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 0 & 0 & \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{2}} \end{pmatrix},$

that is, we have

$U_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad U_2 = \begin{pmatrix} 0 & 0 \\ 1 & 0 \\ 0 & \frac{1}{\sqrt{2}} \\ 0 & \frac{1}{\sqrt{2}} \end{pmatrix}, \quad U_3 = \begin{pmatrix} 0 \\ 0 \\ \frac{1}{\sqrt{2}} \\ \frac{-1}{\sqrt{2}} \end{pmatrix}.$
Since the Gaussian distribution is invariant under orthogonal transformations, we only consider the case where all the $z_{ij}$ and $w_{ij}$ are i.i.d. binary variables taking values $\{1, -1\}$ with probability 1/2 each, with all the other settings the same as in Section 5.1. Then according to Theorem 4.1, we have:

• $\sqrt{p}\{l_1 - 42.67\} \to N(0, 2039.8)$,
• $\sqrt{p}\{l_p - 0.07\} \to N(0, 0.004)$,
• the two-dimensional random vector $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ converges to the eigenvalues of the random matrix $-U_2^* R(\lambda_2) U_2/1.45$, where $R(\lambda_2)$ is the $4 \times 4$ symmetric random matrix made of independent Gaussian entries with mean zero and variance

$\operatorname{Var}(R_{mn}) = \begin{cases} 0.008, & m = n, \\ 0.02, & m \ne n. \end{cases}$
Figure 5 compares the empirical distributions with their limits in this general-U case. The upper panels show the empirical kernel density estimates of $\sqrt{p}\{l_1 - 42.67\}$ and $\sqrt{p}\{l_p - 0.07\}$ from 1000 independent replications (solid lines), compared to their Gaussian limits (dashed lines). The lower panels show the contour lines of the empirical joint density of $\sqrt{p}\{l_{p-2} - 0.13,\ l_{p-1} - 0.13\}$ (lower-left plot) and the contour lines of their limit (lower-right plot).

FIG. 5. Upper panels show the empirical densities of $l_1$ and $l_p$ (solid lines, after centralization and scaling) compared to their Gaussian limits (dashed lines). Lower panels show contour plots of the empirical joint density function of $(l_{p-2}, l_{p-1})$ (left plot, after centralization and scaling) and contour plots of their limits (right plot). Both the empirical and limit joint density functions are displayed using two-dimensional kernel density estimates. Samples are drawn from the i.i.d. binary distribution with U given by (5.2). The replication number is 1000.
6. Joint distribution of the outlier eigenvalues. In the previous sections, we obtained the following result for the outlier eigenvalues: the $n_i$-dimensional real random vector $\sqrt{p}\{l_{p,j} - \lambda_i, j \in J_i\}$ converges to the distribution of the eigenvalues of the random matrix $-U_i^* R(\lambda_i) U_i/\delta(\lambda_i)$. It is in fact possible to derive their joint distribution, that is, the limit of the M-dimensional real random vector

(6.1) $\begin{pmatrix} \sqrt{p}\{l_{p,j_1} - \lambda_1, j_1 \in J_1\} \\ \vdots \\ \sqrt{p}\{l_{p,j_k} - \lambda_k, j_k \in J_k\} \end{pmatrix}$

when all the spike eigenvalues $a_i$ are above (or below) the phase transition threshold. Such a joint convergence result is useful for inference procedures where consecutive sample eigenvalues are used, for example through their differences or ratios; see, for example, Onatski (2009) and Passemier and Yao (2014).
THEOREM 6.1. Assume that the same conditions as in Theorem 4.1 hold and that all the population spikes $a_i$ satisfy $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$. Then the M-dimensional random vector in (6.1) converges in distribution to the eigenvalues of the following $M \times M$ random matrix:

(6.2) $\begin{pmatrix} -\dfrac{U_1^* R(\lambda_1) U_1}{\delta(\lambda_1)} & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & -\dfrac{U_k^* R(\lambda_k) U_k}{\delta(\lambda_k)} \end{pmatrix},$

where the matrices $\{R(\lambda_i)\}$ are made of zero-mean independent Gaussian random variables, with the following covariance function between different blocks ($l \ne s$): for $1 \le i \le j \le M$,

$\operatorname{Cov}\big(R(\lambda_l)(i,j), R(\lambda_s)(i,j)\big) = \begin{cases} \theta(l,s), & i \ne j, \\ \omega(l,s)(v_4 - 3) + 2\theta(l,s), & i = j, \end{cases}$

where

$\theta(l,s) = \lim \frac{1}{n+T}\operatorname{tr} A_n(\lambda_l)A_n(\lambda_s), \qquad \omega(l,s) = \lim \frac{1}{n+T}\sum_{i=1}^{n+T} A_n(\lambda_l)(i,i)A_n(\lambda_s)(i,i),$

and $A_n(\lambda)$ is defined in (A.17).
The proof of this theorem is very close to that of Theorem 2.3 in Wang, Su and Yao (2014) and is thus omitted.

In principle, the limiting parameters $\theta(l,s)$ and $\omega(l,s)$ can be completely specified for a given spiked structure. However, this leads to quite complex formulas. Here, we prefer to explain a simple case where $\Sigma_M$ is diagonal with simple eigenvalues $(a_i)$, all satisfying the condition $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$ ($i = 1, \ldots, M$). Then $U_i^* R(\lambda_i) U_i$ in (6.2) reduces to the $(i,i)$th element of $R(\lambda_i)$, which is a Gaussian random variable. Besides, from Theorem 6.1, we see that the random variables $\{R(\lambda_i)(i,i)\}_{i=1,\ldots,M}$ are jointly independent since the index sets $(i,i)$ are disjoint. Finally, we have the following joint distribution of the M outlier eigenvalues of $S_2^{-1}S_1$.
PROPOSITION 6.1. Under the same assumptions as in Theorem 4.1, if $\Sigma_M$ is diagonal with all its eigenvalues $(a_i)$ simple and satisfying $|a_i - \gamma| > \gamma\sqrt{c+y-cy}$, then the M outlier eigenvalues $l_{p,j}$ ($j = 1, \ldots, M$) of $S_2^{-1}S_1$ are asymptotically independent, with joint distribution

$\begin{pmatrix} \sqrt{p}(l_{p,1} - \lambda_1) \\ \vdots \\ \sqrt{p}(l_{p,M} - \lambda_M) \end{pmatrix} \Longrightarrow N\Bigg(0_M,\ \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_M^2 \end{pmatrix}\Bigg),$

where

$\sigma_i^2 = \frac{2a_i^2 (cy - c - y)(a_i - 1)^2(-1 + 2a_i + c + a_i^2(y-1))}{(1 + a_i(y-1))^4} + (v_4 - 3)\cdot\frac{a_i^2 (c + y)(-1 + 2a_i + c + a_i^2(y-1))^2}{(1 + a_i(y-1))^4}.$
7. Applications. In this section, we present two applications of our previous results, Theorem 4.1 and Proposition 4.1, in the areas of high-dimensional hypothesis testing and signal detection.

7.1. Application 1: Power of testing the equality between two high-dimensional covariance matrices. Let $(x_i)_{1\le i\le m}$ and $(z_j)_{1\le j\le n}$ be two p-dimensional observations from populations 1 and 2. This subsection considers the high-dimensional test of the equality between $\Sigma_1$ and $\Sigma_2$ against a specific alternative, namely that the difference between $\Sigma_1$ and $\Sigma_2$ is a finite-rank covariance matrix. Put another way, we are concerned with the following testing problem:

(7.1) $H_0: \Sigma_1 = \Sigma_2 \quad\text{vs.}\quad H_1: \Sigma_1 = \Sigma_2 + \Delta,$

where $\operatorname{rank}(\Delta) = M$ (here M is a finite integer).
There exists a wide literature on testing the equality of two covariance matrices. In the classical large-sample asymptotics, early works can be found in textbooks like Muirhead (1982) and Anderson (1984), where the limiting distribution of the likelihood ratio statistic under the Gaussian assumption is shown to be $\chi^2$ [with $p(p+1)/2$ degrees of freedom]. In recent years, this testing problem has been reconsidered in a different asymptotic regime, where both the dimension and the two sample sizes are allowed to grow to infinity together. For example, in Bai et al. (2009), the authors prove that in an asymptotic regime of Marčenko–Pastur type, the limiting distribution of the likelihood ratio statistic is Gaussian under $H_0$. Li and Chen (2012) propose a test based on a U-statistic, and its limiting distribution is derived under both the null and the alternative hypotheses in the high-dimensional framework. Cai, Liu and Xia (2013) propose a test statistic based on the elements of the two sample covariance matrices; both its limiting distribution under the null hypothesis and its power are studied, and it is shown that their statistic enjoys certain optimality and is especially powerful against sparse alternatives.

In the following, we consider a statistic based on the largest eigenvalue of the Fisher matrix and show that it is powerful against spiked alternatives.
Now denote the sample covariance matrices of the two populations by

(7.2) $S_1 = \frac{1}{m}\sum_{j=1}^m x_j x_j^* = \frac{1}{m} XX^*$

and

(7.3) $S_2 = \frac{1}{n}\sum_{j=1}^n z_j z_j^* = \frac{1}{n} ZZ^*,$

respectively. When p, m and n all grow to infinity proportionally while M is a fixed integer, the empirical measure of the p eigenvalues of $S_2^{-1}S_1$ (for simplicity, we assume $p < n$) is affected by a difference of order M/p, which vanishes, so that its limit remains the same as under the null hypothesis, that is, the Wachter distribution (see Proposition 2.1). In other words, this global limit of all the eigenvalues of $S_2^{-1}S_1$ is of little help for distinguishing between the two hypotheses (7.1). It happens that the useful information for detecting a small-rank alternative is actually encoded in a few of the largest eigenvalues of $S_2^{-1}S_1$.
Now denote by $l_1$ the largest eigenvalue of $S_2^{-1}S_1$. Notice that the eigenvalues of $S_2^{-1}S_1$ are invariant under the transformation (2.1), so without loss of generality we can assume that under $H_0$ it holds that $\Sigma_1 = \Sigma_2 = I_p$. Then according to Han, Pan and Zhang (2016), we have

$\frac{l_1 - \beta}{s_p} \Longrightarrow F_1,$

where $s_p = \frac{1}{m}(\sqrt{m} + \sqrt{p})\big(\frac{1}{\sqrt{m}} + \frac{1}{\sqrt{p}}\big)^{1/3}$, which is of order $p^{-2/3}$, and $F_1$ denotes the type-1 Tracy–Widom distribution. Consequently, we adopt the following decision rule:

(7.4) Reject $H_0$ if $l_1 > q_\alpha s_p + \beta$,

where $q_\alpha$ is the upper quantile at level $\alpha$ of the Tracy–Widom distribution $F_1$: $F_1(q_\alpha, \infty) = \alpha$.
Once the largest spike eigenvalue $a_1$ is above the critical value for the phase transition, this test is able to detect the alternative hypothesis with power tending to one as the dimension tends to infinity.
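A hedged sketch of the decision rule (7.4) follows (our own illustration); the value 0.98 used for $q_{0.05}$ is a standard tabulated approximation of the upper-5% quantile of the type-1 Tracy–Widom law, and $s_p$ is the scaling displayed above:

```python
import numpy as np

def beta_edge(c, y):
    """Right edge of the Wachter support, from (2.9)."""
    return ((1 + np.sqrt(c + y - c * y)) / (1 - y)) ** 2

def reject_H0(l1, p, m, n, q_alpha=0.98):
    """Decision rule (7.4): reject H0 when l1 > q_alpha * s_p + beta.

    q_alpha = 0.98 approximates the upper-5% quantile of Tracy-Widom F_1
    (a tabulated value; replace it for other levels alpha)."""
    c, y = p / m, p / n
    s_p = (np.sqrt(m) + np.sqrt(p)) / m * (1 / np.sqrt(m) + 1 / np.sqrt(p)) ** (1 / 3)
    return l1 > q_alpha * s_p + beta_edge(c, y)
```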
THEOREM 7.1. Under the asymptotic scheme set in (2.7), assume that the largest spike eigenvalue $a_1$ is above the critical value $\frac{1+\sqrt{c+y-cy}}{1-y}$. Then the power function of the test procedure (7.4) equals

$\text{Power} = 1 - \Phi\bigg(\frac{\sqrt{p}}{\sigma_1}\, s_p q_\alpha + \frac{\sqrt{p}}{\sigma_1}\Big(\beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}\Big)\bigg) + o(1),$

which tends to one as the dimension tends to infinity.
PROOF. Under the alternative $H_1$ and according to Proposition 4.1, the asymptotic distribution of $l_1$ is Gaussian:

$\sqrt{p}\Big(l_1 - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}\Big) \Longrightarrow N(0, \sigma_1^2).$

Therefore, the power can be calculated as

(7.5) $\text{Power} = 1 - \Phi\bigg(\frac{\sqrt{p}}{\sigma_1}\, s_p q_\alpha + \frac{\sqrt{p}}{\sigma_1}\Big(\beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}\Big)\bigg) + o(1),$

where $\Phi$ is the standard normal cumulative distribution function. Since $s_p$ is of order $p^{-2/3}$ as $p \to \infty$, the first term $\frac{\sqrt{p}}{\sigma_1} s_p q_\alpha \to 0$ and the second term $\frac{\sqrt{p}}{\sigma_1}(\beta - \frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}) \to -\infty$ [when $a_1 > \frac{1+\sqrt{c+y-cy}}{1-y}$, $\frac{a_1(a_1 - 1 + c)}{a_1 - 1 - a_1 y}$ is always larger than the right edge point $\beta$]. Therefore, the right-hand side of (7.5) tends to one for any pregiven $\alpha$ as $p \to \infty$. The proof of Theorem 7.1 is complete.
REMARK 7.1. In Li and Chen (2012), the authors use a U-statistic $T_{m,n}$ to test the hypothesis $H_0: \Sigma_1 = \Sigma_2$. Its power is shown to be

(7.6) $\Phi\bigg(-L_{m,n}(\Sigma_1, \Sigma_2)\, z_\alpha + \frac{\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}}{\sigma_{m,n}}\bigg),$

where $z_\alpha$ is the upper-$\alpha$ quantile of $N(0,1)$ and

$L_{m,n}(\Sigma_1, \Sigma_2) = \sigma_{m,n}^{-1}\Big\{\frac{2}{m}\operatorname{tr}(\Sigma_2^2) + \frac{2}{n}\operatorname{tr}(\Sigma_1^2)\Big\},$

$\sigma_{m,n}^2 = \frac{4}{n^2}\{\operatorname{tr}(\Sigma_2^2)\}^2 + \frac{8}{n}\operatorname{tr}(\Sigma_2^2 - \Sigma_1\Sigma_2)^2 + \frac{4}{m^2}\{\operatorname{tr}(\Sigma_1^2)\}^2 + \frac{8}{m}\operatorname{tr}(\Sigma_1^2 - \Sigma_1\Sigma_2)^2 + \frac{8}{mn}\{\operatorname{tr}(\Sigma_1\Sigma_2)\}^2.$

If we restrict attention to the specific alternative in (7.1), then all three parameters $L_{m,n}(\Sigma_1, \Sigma_2)$, $\operatorname{tr}\{(\Sigma_1 - \Sigma_2)^2\}$ and $\sigma_{m,n}$ in (7.6) are of constant order. Therefore, against an alternative hypothesis of spiked type (7.1), our procedure is more powerful.
7.2. Application 2: Determining the number of signals. In this subsection, we consider an application of our results to the field of signal detection, where the spiked Fisher matrix arises naturally.

In signal detection equipment, records take the form

(7.7) $x_i = A s_i + e_i, \quad i = 1, \ldots, m,$

where $x_i$ is a p-dimensional observation, $s_i$ is a $k \times 1$ low-dimensional signal ($k \ll p$) with unit covariance matrix, A is a $p \times k$ mixing matrix, and $(e_i)$ is i.i.d. noise with covariance matrix $\Sigma_2$. Therefore, the covariance matrix of $x_i$ can be considered as a k-dimensional (low-rank) perturbation of $\Sigma_2$, denoted $\Sigma_p$ in what follows. Notice that none of the quantities on the right-hand side of (7.7) is observed. One of the fundamental problems here is to estimate k, the number of signals present in the system, which is challenging when the dimension p is large, say of comparable magnitude with the sample size m. When the noise has the simplest covariance structure, that is, $\Sigma_2 = \sigma_e^2 I_p$, this problem has been much investigated recently and several solutions have been proposed; see, for example, Kritchman and Nadler (2008), Nadler (2010), Passemier and Yao (2012, 2014). However, the problem with an arbitrary noise covariance matrix $\Sigma_2$, say diagonal to simplify, remains unsolved in the large-dimensional context (to the best of our knowledge).

Nevertheless, there exists an astute engineering device where the system can be tuned in a signal-free environment, for example, in a laboratory: we can directly record a sequence of pure-noise observations $z_j$, $j = 1, \ldots, n$, which have the same distribution as the $(e_i)$ above. These signal-free records can then be used to whiten the observations $(x_i)$ thanks to the invariance property (2.1), which implies that the eigenvalues of $S_2^{-1}S_1$ [$S_1$ and $S_2$ defined as in (7.2) and (7.3)] are in fact independent of $\Sigma_2$. Therefore, these eigenvalues can be treated as if $\Sigma_2 = I_p$, that is, $S_2^{-1}S_1$ becomes a spiked Fisher matrix as introduced in Section 2. This is the reason why the two-sample procedure developed here can deal with an arbitrary noise covariance matrix while the existing one-sample procedures cannot.
Based on Theorem 3.1, we propose as estimator of the number of signals the number of eigenvalues of $S_2^{-1}S_1$ that are larger than the right edge point of the support of its LSD:

(7.8) $\hat{k} = \max\{i : l_i \ge \beta + d_n\},$

where $(d_n)$ is a sequence of vanishing constants.

THEOREM 7.2. Assume that all the spike eigenvalues $a_i$ ($i = 1, \ldots, k$) satisfy $a_i > \gamma + \gamma\sqrt{c+y-cy}$. Let $d_n$ be a sequence of positive numbers such that $\sqrt{p}\cdot d_n \to 0$ and $p^{2/3}\cdot d_n \to +\infty$ as $p \to +\infty$. Then the estimator $\hat{k}$ in (7.8) is consistent, that is, $\hat{k} \to k$ in probability as $p \to +\infty$.
PROOF. Since

$\{\hat{k} = k\} = \{k = \max\{i : l_i \ge \beta + d_n\}\} = \{\forall j \in \{1, \ldots, k\},\ l_j \ge \beta + d_n\} \cap \{l_{k+1} < \beta + d_n\},$

we have

(7.9) $P\{\hat{k} = k\} = P\Big(\bigcap_{1\le j\le k}\{l_j \ge \beta + d_n\} \cap \{l_{k+1} < \beta + d_n\}\Big) = 1 - P\Big(\bigcup_{1\le j\le k}\{l_j < \beta + d_n\} \cup \{l_{k+1} \ge \beta + d_n\}\Big) \ge 1 - \sum_{j=1}^k P(l_j < \beta + d_n) - P(l_{k+1} \ge \beta + d_n).$

For $j = 1, \ldots, k$,

(7.10) $P(l_j < \beta + d_n) = P\big(\sqrt{p}(l_j - \phi(a_j)) < \sqrt{p}(\beta + d_n - \phi(a_j))\big) \to P\big(\sqrt{p}(l_j - \phi(a_j)) < \sqrt{p}(\beta - \phi(a_j))\big),$

where the convergence follows from the assumption $\sqrt{p}\cdot d_n \to 0$. The term $\sqrt{p}(\beta - \phi(a_j))$ in (7.10) tends to $-\infty$, since we always have $\phi(a_j) > \beta$ when $a_j > \gamma + \gamma\sqrt{c+y-cy}$. On the other hand, by Theorem 4.1, $\sqrt{p}(l_j - \phi(a_j))$ in (7.10) has a limiting distribution; it is therefore bounded in probability. Hence,

(7.11) $P(l_j < \beta + d_n) \to 0 \quad\text{for } j = 1, \ldots, k.$

Also,

$P(l_{k+1} \ge \beta + d_n) = P\big(p^{2/3}(l_{k+1} - \beta) \ge p^{2/3}\cdot d_n\big),$

and $p^{2/3}(l_{k+1} - \beta)$ is asymptotically Tracy–Widom distributed [see Han, Pan and Zhang (2016)]. As $p^{2/3}\cdot d_n$ tends to infinity by assumption, we have

(7.12) $P(l_{k+1} \ge \beta + d_n) \to 0.$

Combining (7.9), (7.11) and (7.12), we obtain $P\{\hat{k} = k\} \to 1$ as $p \to +\infty$. The proof of Theorem 7.2 is complete.
REMARK 7.2. Notice that there is no need for the spikes $a_i$ to be simple. The only requirement is that they be sufficiently strong ($a_i > \gamma + \gamma\sqrt{c+y-cy}$) for detection.
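A minimal sketch of the estimator (7.8) follows (our own illustration, not the authors' code); the tuning $d_n = \log p/p^{2/3}$, which satisfies both rate conditions of Theorem 7.2, is the one used in the simulations below:

```python
import numpy as np

def estimate_k(l, c, y, p):
    """Estimator (7.8): count eigenvalues of S2^{-1}S1 above beta + d_n.

    l : eigenvalues of S2^{-1}S1, in any order.
    d_n = log(p)/p^(2/3) satisfies sqrt(p)*d_n -> 0 and p^(2/3)*d_n -> infinity,
    as required by Theorem 7.2."""
    beta = ((1 + np.sqrt(c + y - c * y)) / (1 - y)) ** 2
    d_n = np.log(p) / p ** (2 / 3)
    return int(np.sum(np.asarray(l) >= beta + d_n))
```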
In the following, we conduct a short simulation to illustrate the performance of our estimator. For comparison, we also show the performance of another estimator $\bar{k}$ that treats the noise covariance as known (using a plug-in estimator for this quantity). The details are as follows. Recall the model (7.7), where $\operatorname{Cov}(e_i) = \Sigma_2$ is arbitrary. Assume for a moment that $\Sigma_2$ is known; then we can multiply both sides of (7.7) by $\Sigma_2^{-1/2}$:

$\Sigma_2^{-1/2} x_i = \Sigma_2^{-1/2} A s_i + \Sigma_2^{-1/2} e_i, \quad i = 1, \ldots, m,$

where the left-hand side is still observable (simply multiply the original observations $\{x_i\}$ by $\Sigma_2^{-1/2}$). Denote $\tilde{x}_i = \Sigma_2^{-1/2} x_i$ and $\tilde{e}_i = \Sigma_2^{-1/2} e_i$, so that $\operatorname{Cov}(\tilde{e}_i) = I_p$. On the other hand, since the rank of $\Sigma_2^{-1/2} A s_i$ is still k, the covariance matrix of the new observation $\tilde{x}_i$ is a rank-k perturbation of $I_p$. Therefore, the method in Kritchman and Nadler (2008) can be adopted. Their proposed estimator is

(7.13) $\bar{k} = \max\{k : l_k > (1 + \sqrt{c})^2 + d_n\},$

where the $\{l_k\}$ in (7.13) are the eigenvalues of the sample covariance matrix of the observations $\tilde{x}_i$:

$\Sigma_2^{-1/2}\cdot\Big(\frac{1}{m} XX^T\Big)\cdot\Sigma_2^{-1/2},$

whose eigenvalues are the same as those of $\Sigma_2^{-1} S_1$. Since $\Sigma_2$ is actually unknown, we simply use its plug-in estimator $S_2$. Therefore, the estimator used for comparison is

(7.14) $\bar{k} = \max\{k : l_k(S_2^{-1}S_1) > (1 + \sqrt{c})^2 + d_n\}.$

The parameters for the simulation are set as follows. We fix $y = 0.1$, $c = 0.9$ and let the value of p vary from 50 to 250; therefore, the critical value for $a_i$ in the model (2.4) (after whitening) is $a_i > \gamma\{1 + \sqrt{c+y-cy}\} = 2.17$. For each given triple (p, n, m) (we take the floor if the values of n or m are nonintegers), we run 1000 replications. The tuning parameter $d_n$ is chosen to be $\log p/p^{2/3}$.
Next, suppose $k = 3$ and A is a $p \times 3$ matrix of the form $A = (\sqrt{c_1}\, v_1, \sqrt{c_2}\, v_2)$, where $c_1 = 10$, $c_2 = 5$,

$v_1 = (1\ 0\ \cdots\ 0)^* \quad\text{and}\quad v_2 = \begin{pmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} & 0 & \cdots & 0 \\ 0 & 1/\sqrt{2} & -1/\sqrt{2} & 0 & \cdots & 0 \end{pmatrix}^*.$

So we have two spike eigenvalues $c_1 = 10$, $c_2 = 5$ (before whitening) with multiplicities $n_1 = 1$, $n_2 = 2$, respectively.

Besides, assume $\operatorname{Cov}(s_i) = I_3$, and we run both the Gaussian case ($s_i$ is multivariate Gaussian) and the non-Gaussian case (the components of $s_i$ are i.i.d., taking the values 1 or $-1$ with equal probability). Finally, we set $e_i$ to be multivariate Gaussian with covariance matrix $\operatorname{Cov}(e_i)$ either diagonal or nondiagonal, as in the following two cases:

• Case 1: $\operatorname{Cov}(e_i) = \operatorname{diag}(\underbrace{1, \ldots, 1}_{p/2}, \underbrace{2, \ldots, 2}_{p/2})$. In this case, the three nonzero eigenvalues of $(c_1 v_1 v_1^* + c_2 v_2 v_2^*)\cdot[\operatorname{Cov}(e_i)]^{-1}$ equal 10, 5, 5, respectively, which are all larger than the critical value $2.17 - 1$; therefore, the number of detectable signals is three (an end-to-end sketch of this case is given after this list);

• Case 2: $\operatorname{Cov}(e_i)$ is compound symmetric with all diagonal elements equal to 1 and all off-diagonal elements equal to 0.1. In this case, for each given p, the three nonzero eigenvalues of $(c_1 v_1 v_1^* + c_2 v_2 v_2^*)\cdot[\operatorname{Cov}(e_i)]^{-1}$ are all larger than 5.36 ($> 2.17 - 1$). The number of detectable signals is again three.
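To make the simulation design concrete, here is a hedged end-to-end sketch of Case 1 (our own illustration, not the authors' code), combining the model (7.7), the whitening by pure-noise records and the estimator (7.8):

```python
import numpy as np

rng = np.random.default_rng(1)
p, y, c = 200, 0.1, 0.9
n, m = int(p / y), int(p / c)

# Mixing matrix A = (sqrt(10) v1, sqrt(5) v2) as in the text (k = 3 signals).
A = np.zeros((p, 3))
A[0, 0] = np.sqrt(10)
A[1:3, 1] = np.sqrt(5) / np.sqrt(2)
A[1, 2] = np.sqrt(5) / np.sqrt(2)
A[2, 2] = -np.sqrt(5) / np.sqrt(2)

# Case 1 noise covariance: diag(1,...,1, 2,...,2).
noise_sd = np.sqrt(np.r_[np.ones(p // 2), 2 * np.ones(p // 2)])

S = rng.standard_normal((3, m))                        # Gaussian signals, Cov = I_3
E = noise_sd[:, None] * rng.standard_normal((p, m))
X = A @ S + E                                          # observations (7.7)
Z = noise_sd[:, None] * rng.standard_normal((p, n))    # pure-noise records

S1, S2 = X @ X.T / m, Z @ Z.T / n
l = np.linalg.eigvals(np.linalg.solve(S2, S1)).real

beta = ((1 + np.sqrt(c + y - c * y)) / (1 - y)) ** 2
d_n = np.log(p) / p ** (2 / 3)
k_hat = int(np.sum(l >= beta + d_n))                   # estimator (7.8)
print(k_hat)                                           # typically 3
```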
Tables 1 and 2 report the empirical frequencies of our estimator $\hat{k}$ in Case 1 and Case 2. For comparison, we also report the frequencies of the plug-in estimator defined in (7.14). According to our setup, the true number of signals is $k = 3$. From these two tables, we see that the frequency of correct estimation by our estimator $\hat{k}$ ($\hat{k} = 3$) is always close to 1 in both cases (for both Gaussian and non-Gaussian signals), which confirms the consistency of our estimator. In contrast, the plug-in estimator always overestimates the number of signals in both cases, and this overestimation becomes more and more striking as p grows.

TABLE 1
Frequency of our estimator and the plug-in estimator defined in (7.14) for Case 1

                     Gaussian                              Non-Gaussian
p          50     100    150    200    250      50     100    150    200    250
k^ = 2   0.029  0.001  0      0      0        0.011  0      0      0      0
k^ = 3   0.971  0.997  0.997  0.995  0.998    0.985  0.997  0.993  0.998  0.998
k^ = 4   0      0.002  0.003  0.005  0.002    0.004  0.003  0.007  0.002  0.002
k- = 3   0.603  0.037  0      0      0        0.654  0.051  0      0      0
k- = 4   0.387  0.485  0.03   0      0        0.334  0.514  0.026  0      0
k- = 5   0.01   0.439  0.375  0.016  0        0.012  0.394  0.392  0.009  0
k- = 6   0      0.039  0.508  0.194  0.008    0      0.041  0.481  0.253  0.002
k- = 7   0      0      0.084  0.566  0.125    0      0      0.096  0.56   0.108
k- = 8   0      0      0.003  0.204  0.463    0      0      0.005  0.163  0.518
k- = 9   0      0      0      0.02   0.369    0      0      0      0.015  0.334
k- = 10  0      0      0      0      0.035    0      0      0      0      0.038

TABLE 2
Frequency of our estimator and the plug-in estimator defined in (7.14) for Case 2

                     Gaussian                              Non-Gaussian
p          50     100    150    200    250      50     100    150    200    250
k^ = 2   0.018  0      0      0      0        0.003  0      0      0      0
k^ = 3   0.982  0.995  0.996  0.995  0.998    0.993  0.997  0.993  0.998  0.998
k^ = 4   0      0.005  0.004  0.005  0.002    0.004  0.003  0.007  0.002  0.002
k- = 3   0.6    0.034  0      0      0        0.644  0.048  0.026  0      0
k- = 4   0.39   0.477  0.03   0      0        0.345  0.511  0.382  0.008  0
k- = 5   0.01   0.449  0.36   0.016  0        0.011  0.399  0.491  0.243  0
k- = 6   0      0.04   0.518  0.193  0.007    0      0.042  0.096  0.564  0.002
k- = 7   0      0      0.088  0.559  0.116    0      0      0.005  0.169  0.103
k- = 8   0      0      0.004  0.207  0.465    0      0      0      0.016  0.516
k- = 9   0      0      0      0.025  0.377    0      0      0      0      0.341
k- = 10  0      0      0      0      0.035    0      0      0      0      0.038

(Here $\hat{k}$ is abbreviated k^ and $\bar{k}$ is abbreviated k-.)
8. Proofs of the main results.

8.1. Proof of Theorem 3.1. For notational convenience, we first define some integrals with respect to $F_{c,y}(x)$ as follows: for a complex number $z \notin [\alpha, \beta]$,

(8.1) $s(z) := \int \frac{1}{x - z}\,dF_{c,y}(x), \quad m_1(z) := \int \frac{1}{(z-x)^2}\,dF_{c,y}(x), \quad m_2(z) := \int \frac{x}{z-x}\,dF_{c,y}(x), \quad m_3(z) := \int \frac{x}{(z-x)^2}\,dF_{c,y}(x), \quad m_4(z) := \int \frac{x^2}{(z-x)^2}\,dF_{c,y}(x).$
PROOF. The proof is divided into the following three steps:

• Step 1: we derive the almost sure limit of an outlier eigenvalue of $S_2^{-1}S_1$;
• Step 2: we show that, in order for an extreme eigenvalue of $S_2^{-1}S_1$ to be an outlier, the population spike $a_i$ must be larger (or smaller) than a critical value;
• Step 3: if this is not the case, the extreme eigenvalue of $S_2^{-1}S_1$ converges to one of the edge points $\alpha$ and $\beta$.

Step 1: Let $l_{p,j}$ ($j \in J_i$) be the outlier eigenvalue of $S_2^{-1}S_1$ corresponding to the population spike $a_i$. Then $l_{p,j}$ must satisfy the equation

$|l_{p,j} I_p - S_2^{-1}S_1| = 0,$

which is equivalent to

(8.2) $|l_{p,j} S_2 - S_1| = 0.$
We now introduce some shorthand notation. Write $Z = \binom{Z_1}{Z_2}$, where $Z_1$ collects the n observations of the first M coordinates and $Z_2$ the remaining ones. We partition X accordingly as $X = \binom{X_1}{X_2}$, where $X_1$ collects the m observations of the first M coordinates and $X_2$ the remaining ones. Using this representation, we have

(8.3) $S_1 = \frac{1}{m} XX^* = \frac{1}{m}\begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix}, \qquad S_2 = \frac{1}{n} ZZ^* = \frac{1}{n}\begin{pmatrix} Z_1Z_1^* & Z_1Z_2^* \\ Z_2Z_1^* & Z_2Z_2^* \end{pmatrix}.$
Then (8.2) can be written in block form:

(8.4) $\begin{vmatrix} \dfrac{l_{p,j}}{n} Z_1Z_1^* - \dfrac{1}{m} X_1X_1^* & \dfrac{l_{p,j}}{n} Z_1Z_2^* - \dfrac{1}{m} X_1X_2^* \\ \dfrac{l_{p,j}}{n} Z_2Z_1^* - \dfrac{1}{m} X_2X_1^* & \dfrac{l_{p,j}}{n} Z_2Z_2^* - \dfrac{1}{m} X_2X_2^* \end{vmatrix} = 0.$

Since $l_{p,j}$ is an outlier, it holds that $|l_{p,j}\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*| \ne 0$, and for a block matrix we have $\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det D \cdot \det(A - BD^{-1}C)$ when D is invertible. Therefore, (8.4) reduces to

$\Big| \frac{l_{p,j}}{n} Z_1Z_1^* - \frac{1}{m} X_1X_1^* - \Big(\frac{l_{p,j}}{n} Z_1Z_2^* - \frac{1}{m} X_1X_2^*\Big)\Big(\frac{l_{p,j}}{n} Z_2Z_2^* - \frac{1}{m} X_2X_2^*\Big)^{-1}\Big(\frac{l_{p,j}}{n} Z_2Z_1^* - \frac{1}{m} X_2X_1^*\Big) \Big| = 0.$
More specifically, we have

(8.5) $\det\bigg(\underbrace{\frac{l_{p,j}}{n} Z_1\Big[I_n - Z_2^*\Big(l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2\Big]Z_1^*}_{(I)}$
$\quad - \underbrace{\frac{1}{m} X_1\Big[I_m + X_2^*\Big(l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big]X_1^*}_{(II)}$
$\quad + \underbrace{\frac{l_{p,j}}{n} Z_1Z_2^*\Big(l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_1^*}_{(III)}$
$\quad + \underbrace{\frac{1}{m} X_1X_2^*\Big(l_{p,j} I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2Z_1^*}_{(IV)}\bigg) = 0.$
In all that follows, we denote by S the Fisher matrix $(\frac{1}{n}Z_2Z_2^*)^{-1}\frac{1}{m}X_2X_2^*$, which has LSD $F_{c,y}(x)$. In order to find the limit of $l_{p,j}$, we simply find the limit of the left-hand side of (8.5); this generates an equation whose solution gives the value of the limit.

First, consider the terms (III) and (IV). Since $(Z_1, X_1)$ is independent of $(Z_2, X_2)$, Lemma A.2 shows that these two terms converge to some constant multiplied by the covariance matrix between $X_1$ and $Z_1$. On the other hand, $X_1$ is also independent of $Z_1$, so

$\operatorname{Cov}(X_1, Z_1) = E X_1 Z_1 - E X_1\, E Z_1 = E X_1\, E Z_1 - E X_1\, E Z_1 = 0_{M\times M}.$

Therefore, these two terms both tend to the zero matrix $0_{M\times M}$ almost surely. The remaining task is to find the limits of (I) and (II). Recall from the definitions of $X_1$ and $Z_1$ that

$\operatorname{Cov}(X_1) = U\operatorname{diag}(\underbrace{a_1, \ldots, a_1}_{n_1}, \ldots, \underbrace{a_k, \ldots, a_k}_{n_k})U^*, \qquad \operatorname{Cov}(Z_1) = I_M.$
According to Lemma A.2, we have

(8.6) $(I) = \frac{l_{p,j}}{n} Z_1\Big[I_n - Z_2^*(l_{p,j} I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l_{p,j}}{n}Z_2\Big]Z_1^* \to \frac{\lambda_i}{n}\Big\{E\operatorname{tr}\Big[I_n - Z_2^*(\lambda_i I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2\Big]\Big\}\cdot I_M = \lambda_i\big(1 + y\lambda_i s(\lambda_i)\big)\cdot I_M,$

where we denote by $\lambda_i$ the limit of the outliers $\{l_{p,j}, j \in J_i\}$. For the same reason,

(8.7) $(II) = -\frac{1}{m} X_1\Big[I_m + X_2^*(l_{p,j} I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big]X_1^* \to -\frac{1}{m}\Big\{E\operatorname{tr}\Big[I_m + X_2^*(\lambda_i I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big]\Big\}\times U\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^* = U\big(-1 + c + c\lambda_i s(\lambda_i)\big)\cdot\begin{pmatrix} a_1 & & \\ & \ddots & \\ & & a_k \end{pmatrix}U^*.$
Therefore, combining (8.5), (8.6) and (8.7), the determinant of the $M \times M$ matrix

$U\begin{pmatrix} \lambda_i(1 + y\lambda_i s(\lambda_i)) + (-1 + c + c\lambda_i s(\lambda_i))a_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_i(1 + y\lambda_i s(\lambda_i)) + (-1 + c + c\lambda_i s(\lambda_i))a_k \end{pmatrix}U^*$

equals zero, which is to say that $\lambda_i$ satisfies the equation

(8.8) $\lambda_i\big(1 + y\lambda_i s(\lambda_i)\big) + \big(-1 + c + c\lambda_i s(\lambda_i)\big)a_i = 0.$

Finally, together with the expression (2.10) for the Stieltjes transform of a Fisher matrix, we get

(8.9) $\lambda_i = \frac{a_i(a_i + c - 1)}{a_i - a_i y - 1} = \phi(a_i).$
Step 2: Define $\underline{s}(z)$ as the Stieltjes transform of the LSD of $\frac{1}{m}X_2^*(\frac{1}{n}Z_2Z_2^*)^{-1}X_2$, which shares the same nonzero eigenvalues as $S_2^{-1}S_1$. Then we have the relationship

(8.10) $\underline{s}(z) + \frac{1}{z}(1 - c) = c\, s(z).$

Recalling the expression (2.10) for $s(z)$, we have

(8.11) $\underline{s}(z) = \frac{-c(z(1-y) + 1 - c) + 2zy - c\sqrt{(1 - c + z(1-y))^2 - 4z}}{2z(c + zy)}.$

On the other hand, from (8.8) and (8.10), we obtain the value of $\underline{s}(\lambda_i)$:

(8.12) $\underline{s}(\lambda_i) = \frac{yc - y - c}{y\lambda_i + a_i c}.$

Since $\lambda_i$ lies outside the support of the LSD, we have

$\underline{s}^{-1}\Big(\frac{yc - y - c}{y\lambda_i + a_i c}\Big) = \lambda_i > \beta \quad\text{or}\quad \underline{s}^{-1}\Big(\frac{yc - y - c}{y\lambda_i + a_i c}\Big) = \lambda_i < \alpha,$

which is to say that

(8.13) $\underline{s}(\beta) < \frac{yc - y - c}{y\lambda_i + a_i c}$

or

(8.14) $\underline{s}(\alpha) > \frac{yc - y - c}{y\lambda_i + a_i c}.$

Inequality (8.13) says that $\underline{s}(\beta)$ must be smaller than the minimum value of its right-hand side, which is attained at $\lambda_i = \beta$ [the right-hand side of (8.13) is a decreasing function of $\lambda_i$]. Similarly, (8.14) says that $\underline{s}(\alpha)$ must be larger than the maximum value of its right-hand side, which is attained at $\lambda_i = \alpha$. Therefore, the condition for $\lambda_i$ to be an outlier is

(8.15) $\underline{s}(\beta) < \frac{yc - y - c}{y\beta + a_i c} \quad\text{or}\quad \underline{s}(\alpha) > \frac{yc - y - c}{y\alpha + a_i c}.$

Finally, using (8.11) together with the values of $\alpha$ and $\beta$, we obtain

$a_i > \frac{1 + \sqrt{c + y - cy}}{1 - y} \quad\text{or}\quad a_i < \frac{1 - \sqrt{c + y - cy}}{1 - y},$

which is equivalent to saying that [recall the expression $\gamma = 1/(1-y)$] the condition allowing for an outlier is

$|a_i - \gamma| > \gamma\sqrt{c + y - cy}.$
Step 3: In this step, we show that if the condition in Step 2 is not fulfilled, then the extreme eigenvalues of $S_2^{-1}S_1$ tend to one of the edge points $\alpha$ and $\beta$. For simplicity, we only show the convergence to the right edge $\beta$; the proof of convergence to the left edge $\alpha$ is similar. Thus, suppose $a_i > 1$ for all $i = 1, \ldots, k$. Let

$S_1 = \frac{1}{m} XX^* = \frac{1}{m}\begin{pmatrix} X_1X_1^* & X_1X_2^* \\ X_2X_1^* & X_2X_2^* \end{pmatrix} := \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}$

and

$S_2 = \frac{1}{n} ZZ^* = \frac{1}{n}\begin{pmatrix} Z_1Z_1^* & Z_1Z_2^* \\ Z_2Z_1^* & Z_2Z_2^* \end{pmatrix} := \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$

where $B_{11}$ and $A_{11}$ are blocks of size $M \times M$. Using the inversion formula for block matrices, the $(p-M)\times(p-M)$ major submatrix of $S_2^{-1}S_1$ is

(8.16) $C := -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12} + (A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}B_{22}.$
The part

$-(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12} = -(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}\cdot\frac{1}{m}X_1X_2^*$

is of rank M; besides, we have

$\operatorname{tr}\Big\{(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}\frac{1}{m}X_1X_2^*\Big\} \to 0,$

since $X_1$ is independent of $X_2$. Therefore, the M nonzero eigenvalues of the matrix $-(A_{22} - A_{21}A_{11}^{-1}A_{12})^{-1}A_{21}A_{11}^{-1}B_{12}$ all tend to zero (in particular, so does its largest one). Next, consider the second part of (8.16):

$A_{22} - A_{21}A_{11}^{-1}A_{12} = \frac{1}{n}Z_2\Big[I_n - Z_1^*\Big(\frac{1}{n}Z_1Z_1^*\Big)^{-1}\frac{1}{n}Z_1\Big]Z_2^* := \frac{1}{n}Z_2 P Z_2^*.$

Since $P = I_n - Z_1^*(\frac{1}{n}Z_1Z_1^*)^{-1}\frac{1}{n}Z_1$ is a projection matrix of rank $n - M$, it has the spectral decomposition

$P = V\begin{pmatrix} 0 & & & \\ & \ddots & & \\ & & 0 & \\ & & & I_{n-M} \end{pmatrix}V^*,$

where V is an $n \times n$ orthogonal matrix. Since M is fixed, the ESD of P tends to $\delta_1$, which implies that the LSD of the matrix $\frac{1}{n}Z_2 P Z_2^*$ is the standard Marčenko–Pastur law. Then the matrix $(\frac{1}{n}Z_2 P Z_2^*)^{-1}B_{22}$ is a standard Fisher matrix, and its $M + 1$ largest eigenvalues $\alpha_1(C) \ge \cdots \ge \alpha_{M+1}(C)$ all converge to the right edge $\beta$ of the limiting Wachter distribution. Meanwhile, since C is the $(p-M)\times(p-M)$ major submatrix of $S_2^{-1}S_1$, the Cauchy interlacing theorem gives

$\alpha_{M+1}(C) \le l_{p,M+1} \le \alpha_1(C) \le l_{p,1}.$

Thus $l_{p,M+1} \to \beta$ as well. On the other hand, we have

$l_{p,1} = \|S_2^{-1}S_1\|_{op} \le \|S_2^{-1}\|_{op}\cdot\|S_1\|_{op},$

so that for some positive constant $\theta$, $\limsup l_{p,1} \le \theta$. Consequently, almost surely,

$\beta \le \liminf l_{p,M} \le \cdots \le \limsup l_{p,1} \le \theta < \infty;$

in particular, the whole family $\{l_{p,j}, 1 \le j \le M\}$ is bounded. Now let $1 \le j \le M$ be fixed and assume that a subsequence $(l_{p_k,j})_k$ converges to a limit $\tilde{\beta} \in [\beta, \theta]$. Either $\tilde{\beta} = \phi(a_i) > \beta$ or $\tilde{\beta} = \beta$. According to Step 2, $\tilde{\beta} > \beta$ implies $a_i > \gamma\{1 + \sqrt{c+y-cy}\}$, and otherwise we have $a_i \le \gamma\{1 + \sqrt{c+y-cy}\}$. Therefore, according to which of these two conditions holds, all subsequences converge to the same limit, $\phi(a_i)$ or $\beta$, which is thus also the unique limit of the whole sequence $(l_{p,j})_p$. The proof of Theorem 3.1 is complete.
8.2. Proof of Theorem 4.1. Step 1: Convergence to the eigenvalues of the random matrix $-U_i^* R(\lambda_i) U_i/\delta(\lambda_i)$. We start from (8.5). Define

(8.17)
$A(\lambda) = I_n - Z_2^*\Big[\lambda I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda}{n}Z_2,$
$B(\lambda) = I_m + X_2^*\Big[\lambda I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2,$
$C(\lambda) = Z_2^*\Big[\lambda I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2,$
$D(\lambda) = X_2^*\Big[\lambda I_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{n}Z_2.$
Then (8.5) can be written as

(8.18) $\det\Big(\underbrace{\frac{l_{p,j}}{n}Z_1 A(l_{p,j})Z_1^*}_{(i)} - \underbrace{\frac{1}{m}X_1 B(l_{p,j})X_1^*}_{(ii)} + \underbrace{\frac{l_{p,j}}{n}Z_1 C(l_{p,j})X_1^*}_{(iii)} + \underbrace{\frac{l_{p,j}}{m}X_1 D(l_{p,j})Z_1^*}_{(iv)}\Big) = 0.$

It remains to find second-order approximations of the four terms on the left-hand side of (8.18).
Using Lemma A.5 in the Appendix, we have
$$
\begin{aligned}
{\rm (i)} &= \mathbb{E}\,\frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^* + \frac{l_{p,j}}{n}Z_1A(l_{p,j})Z_1^* - \mathbb{E}\,\frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^*\\
&= \big(\lambda_i + y\lambda_i^2 s(\lambda_i)\big)\cdot I_M + \frac{l_{p,j}}{n}Z_1A(l_{p,j})Z_1^* - \frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^*\\
&\quad + \frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^* - \mathbb{E}\,\frac{\lambda_i}{n}Z_1A(\lambda_i)Z_1^*\\
&= \big(\lambda_i + y\lambda_i^2 s(\lambda_i)\big)\cdot I_M + \frac{l_{p,j}-\lambda_i}{n}Z_1A(l_{p,j})Z_1^* + \frac{\lambda_i}{n}Z_1\big(A(l_{p,j}) - A(\lambda_i)\big)Z_1^*\\
&\quad + \frac{\lambda_i}{\sqrt{n}}\Big[\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^* - \mathbb{E}\,\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^*\Big]
\tag{8.19}\\
&\to \big(\lambda_i + y\lambda_i^2 s(\lambda_i)\big)\cdot I_M + (l_{p,j}-\lambda_i)\cdot\big(1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 ym_1(\lambda_i)\big)\cdot I_M\\
&\quad + \frac{\lambda_i}{\sqrt{n}}\Big[\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^* - \mathbb{E}\,\frac{1}{\sqrt{n}}Z_1A(\lambda_i)Z_1^*\Big],
\end{aligned}
$$
$$
\begin{aligned}
{\rm (ii)} &= \mathbb{E}\,\frac{1}{m}X_1B(\lambda_i)X_1^* + \frac{1}{m}X_1B(l_{p,j})X_1^* - \mathbb{E}\,\frac{1}{m}X_1B(\lambda_i)X_1^*\\
&= U\big(1 - c - c\lambda_i s(\lambda_i)\big)\cdot\operatorname{diag}(a_1,\dots,a_k)\,U^* + \frac{1}{m}X_1\big(B(l_{p,j}) - B(\lambda_i)\big)X_1^*\\
&\quad + \frac{1}{\sqrt{m}}\Big[\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^* - \mathbb{E}\,\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^*\Big]
\tag{8.20}\\
&\to U\big(1 - c - c\lambda_i s(\lambda_i)\big)\cdot\operatorname{diag}(a_1,\dots,a_k)\,U^* - (l_{p,j}-\lambda_i)\cdot cm_3(\lambda_i)\cdot U\operatorname{diag}(a_1,\dots,a_k)\,U^*\\
&\quad + \frac{1}{\sqrt{m}}\Big[\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^* - \mathbb{E}\,\frac{1}{\sqrt{m}}X_1B(\lambda_i)X_1^*\Big],
\end{aligned}
$$
$$
\begin{aligned}
{\rm (iii)} &= \frac{l_{p,j}}{n}Z_1C(l_{p,j})X_1^* - \mathbb{E}\,\frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^*\\
&= \frac{l_{p,j}}{n}Z_1C(l_{p,j})X_1^* - \frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* + \frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^* - \mathbb{E}\,\frac{\lambda_i}{n}Z_1C(\lambda_i)X_1^*
\tag{8.21}\\
&= \frac{l_{p,j}}{n}Z_1\big(C(l_{p,j}) - C(\lambda_i)\big)X_1^* + \frac{l_{p,j}-\lambda_i}{n}Z_1C(\lambda_i)X_1^*\\
&\quad + \frac{\lambda_i}{n}\big[Z_1C(\lambda_i)X_1^* - \mathbb{E}\,Z_1C(\lambda_i)X_1^*\big]\\
&\to \frac{\lambda_i}{n}\big[Z_1C(\lambda_i)X_1^* - \mathbb{E}\,Z_1C(\lambda_i)X_1^*\big],
\end{aligned}
$$
$$
\begin{aligned}
{\rm (iv)} &= \frac{l_{p,j}}{m}X_1D(l_{p,j})Z_1^* - \mathbb{E}\,\frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^*\\
&= \frac{l_{p,j}}{m}X_1D(l_{p,j})Z_1^* - \frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* + \frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^* - \mathbb{E}\,\frac{\lambda_i}{m}X_1D(\lambda_i)Z_1^*
\tag{8.22}\\
&= \frac{l_{p,j}}{m}X_1\big(D(l_{p,j}) - D(\lambda_i)\big)Z_1^* + \frac{l_{p,j}-\lambda_i}{m}X_1D(\lambda_i)Z_1^*\\
&\quad + \frac{\lambda_i}{m}\big[X_1D(\lambda_i)Z_1^* - \mathbb{E}\,X_1D(\lambda_i)Z_1^*\big]\\
&\to \frac{\lambda_i}{m}\big[X_1D(\lambda_i)Z_1^* - \mathbb{E}\,X_1D(\lambda_i)Z_1^*\big].
\end{aligned}
$$
Denote
$$R_n(\lambda_i) = \lambda_i\frac{\sqrt p}{n}Z_1A(\lambda_i)Z_1^* - \frac{\sqrt p}{m}X_1B(\lambda_i)X_1^* + \lambda_i\frac{\sqrt p}{n}Z_1C(\lambda_i)X_1^* + \lambda_i\frac{\sqrt p}{m}X_1D(\lambda_i)Z_1^* - \mathbb{E}[\cdot], \tag{8.23}$$
where $\mathbb{E}[\cdot]$ denotes the total expectation of all the preceding terms in the equation, and
$$\Delta(\lambda_i) = 1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 ym_1(\lambda_i) + a_i cm_3(\lambda_i).$$
Combining (8.18), (8.19), (8.20), (8.21), (8.22) and considering the diagonal block that corresponds to the row and column indices in $J_i \times J_i$ leads to
$$\big|\sqrt{p}(l_{p,j} - \lambda_i)\cdot\Delta(\lambda_i)\cdot I_{n_i} + U_i^*R_n(\lambda_i)U_i\big| \to 0. \tag{8.24}$$
Furthermore, it will be established in Step 2 below that
$$U_i^*R_n(\lambda_i)U_i \longrightarrow U_i^*R(\lambda_i)U_i \quad\text{in distribution}, \tag{8.25}$$
for some random matrix $R(\lambda_i)$. Using the device of Skorokhod strong representation [Hu and Bai (2014), Skorokhod (1956)], we may assume that this convergence holds almost surely by considering an enlarged probability space. Under this device, (8.24) says that $\sqrt{p}(l_{p,j} - \lambda_i)$ tends to an eigenvalue of the matrix $-U_i^*R(\lambda_i)U_i/\Delta(\lambda_i)$. Finally, as the index $j$ is arbitrary over the set $J_i$, all the $n_i$ random variables
$$\big\{\sqrt{p}(l_{p,j} - \lambda_i),\ j \in J_i\big\}$$
converge almost surely to the set of eigenvalues of the random matrix $-U_i^*R(\lambda_i)U_i/\Delta(\lambda_i)$.
Besides, due to Lemma A.3, we have
$$\Delta(\lambda_i) = 1 + 2y\lambda_i s(\lambda_i) + \lambda_i^2 ym_1(\lambda_i) + a_i cm_3(\lambda_i) = \frac{(1 - a_i - c)(1 + a_i(y-1))^2}{(a_i - 1)(-1 + 2a_i + c + a_i^2(y-1))}.$$
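This closed form can be double-checked symbolically from the explicit values of $s(\lambda)$, $m_1(\lambda)$ and $m_3(\lambda)$ in Lemma A.3 together with $\lambda = \phi(a)$; the following sympy sketch (our own verification, not part of the proof) confirms that the difference simplifies to zero.

```python
import sympy as sp

a, c, y = sp.symbols('a c y')
lam = a*(a - 1 + c)/(a - 1 - a*y)                       # outlier location (3.4)
E, D = a + c - 1, -1 + 2*a + c + a**2*(y - 1)
s = (a*(y - 1) + 1)/((a - 1)*E)                          # s(lambda), Lemma A.3
m1 = ((a*(y - 1) + 1)**2*(-1 + 2*a + a**2*(y - 1) + y*(c - 1))
      / ((a - 1)**2*E**2*D))                             # m1(lambda), Lemma A.3
m3 = -(a*(y - 1) + 1)**2/((a - 1)**2*D)                  # m3(lambda), Lemma A.3

delta = 1 + 2*y*lam*s + lam**2*y*m1 + a*c*m3
target = (1 - a - c)*(1 + a*(y - 1))**2/((a - 1)*D)
print(sp.cancel(sp.together(delta - target)))            # prints 0
```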
Step 2: Proof of the convergence (8.25) and structure of the random matrix $R(\lambda_i)$. In this second step, we aim to find the limit of the random matrix $U_i^*R_n(\lambda_i)U_i$. First, we show that $U_i^*R_n(\lambda_i)U_i$ equals another random matrix $U_i^*\tilde R_n(\lambda_i)U_i$, where $\tilde R_n(\lambda_i)$ is a random sesquilinear form. Then, using the results in Bai and Yao (2008) (Proposition 3.1 and Remark 1), we are able to find the limit of $\tilde R_n(\lambda_i)$.
By assumption (b), $x_i = \Sigma_p^{1/2}w_i$; hence its first $M$ components satisfy
$$X_1 = \Sigma_M^{1/2}W_1 = U\operatorname{diag}(\sqrt{a_1},\dots,\sqrt{a_k}\,)U^*W_1.$$
Recalling the definition of $R_n(\lambda_i)$ in (8.23), we have
$$
\begin{aligned}
U_i^*R_n(\lambda_i)U_i ={}& U_i^*\frac{\sqrt p\,\lambda_i}{n}Z_1A(\lambda_i)Z_1^*U_i\\
&- \frac{\sqrt p}{m}\operatorname{diag}(\sqrt{a_1},\dots,\sqrt{a_k}\,)\,U_i^*W_1B(\lambda_i)W_1^*U_i\operatorname{diag}(\sqrt{a_1},\dots,\sqrt{a_k}\,)\\
&+ U_i^*\frac{\sqrt p\,\lambda_i}{n}Z_1C(\lambda_i)W_1^*U_i\operatorname{diag}(\sqrt{a_1},\dots,\sqrt{a_k}\,)\\
&+ \lambda_i\frac{\sqrt p}{m}\operatorname{diag}(\sqrt{a_1},\dots,\sqrt{a_k}\,)\,U_i^*W_1D(\lambda_i)Z_1^*U_i - \mathbb{E}[\cdot]
\tag{8.26}\\
={}& U_i^*\Big\{\lambda_i\frac{\sqrt p}{n}Z_1A(\lambda_i)Z_1^* - a_i\frac{\sqrt p}{m}W_1B(\lambda_i)W_1^* + \sqrt{a_i}\,\lambda_i\frac{\sqrt p}{n}Z_1C(\lambda_i)W_1^*\\
&\qquad + \sqrt{a_i}\,\lambda_i\frac{\sqrt p}{m}W_1D(\lambda_i)Z_1^*\Big\}U_i - \mathbb{E}[\cdot]\\
={}& U_i^*\begin{pmatrix} Z_1 & W_1 \end{pmatrix}
\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}
\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix}U_i - \mathbb{E}[\cdot]\\
:={}& U_i^*\tilde R_n(\lambda_i)U_i,
\end{aligned}
$$
where
$$\tilde R_n(\lambda_i) := \begin{pmatrix} Z_1 & W_1 \end{pmatrix}
\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}
\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix} - \mathbb{E}[\cdot].$$
Finally, using Lemma A.6 in the Appendix leads to the result. The proof of Theo-
rem 4.1 is complete.
APPENDIX A: SOME LEMMAS
LEMMA A.1. Let $R$ be an $M \times M$ real-valued matrix, and let $U = (U_1\ \cdots\ U_k)$ and $V = (V_1\ \cdots\ V_k)$ be two orthogonal bases of some subspace $E \subseteq \mathbb{R}^M$ of dimension $M$, where both $U_i$ and $V_i$ are of size $M \times n_i$, with $n_1 + \cdots + n_k = M$. Then the eigenvalues of the two $n_i \times n_i$ matrices $U_i^*RU_i$ and $V_i^*RV_i$ are the same.
PROOF. It is sufficient to prove that there exists an $n_i \times n_i$ orthogonal matrix $A$ such that
$$V_i = U_i\cdot A. \tag{A.1}$$
Indeed, if this is true, then $V_i^*RV_i = A^*(U_i^*RU_i)A$, and since $A$ is orthogonal, the eigenvalues of $V_i^*RV_i$ and $U_i^*RU_i$ are the same. Therefore, it only remains to show (A.1). Let $U_i = (u_1\ \cdots\ u_{n_i})$ and $V_i = (v_1\ \cdots\ v_{n_i})$. Define $A = (a_{ls})_{1\le l,s\le n_i}$ such that
$$
\begin{cases}
v_1 = a_{11}u_1 + \cdots + a_{n_i1}u_{n_i},\\
\quad\vdots\\
v_{n_i} = a_{1n_i}u_1 + \cdots + a_{n_in_i}u_{n_i}.
\end{cases}
$$
Put in matrix form:
$$(v_1\ \cdots\ v_{n_i}) = (u_1\ \cdots\ u_{n_i})\begin{pmatrix} a_{11} & \cdots & a_{1n_i}\\ \vdots & \ddots & \vdots\\ a_{n_i1} & \cdots & a_{n_in_i} \end{pmatrix},$$
that is, $V_i = U_i\cdot A$. Since $\langle v_l, v_s\rangle = \langle a_{\cdot l}, a_{\cdot s}\rangle$ (by orthonormality of $\{u_j\}$), where $a_{\cdot l} = (a_{jl})_{1\le j\le n_i}$, the matrix $A$ is then orthogonal.
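A quick numerical illustration of Lemma A.1 (ours, with arbitrary sizes): rotating an orthonormal block $U_i$ by any orthogonal $A$ leaves the spectrum of the compression $U_i^*RU_i$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_i = 6, 3
R = rng.standard_normal((M, M))                        # generic M x M matrix
U_i = np.linalg.qr(rng.standard_normal((M, n_i)))[0]   # orthonormal M x n_i block
A = np.linalg.qr(rng.standard_normal((n_i, n_i)))[0]   # orthogonal rotation
V_i = U_i @ A                                          # same subspace, new basis

ev = lambda T: np.sort_complex(np.linalg.eigvals(T))
print(np.allclose(ev(U_i.T @ R @ U_i), ev(V_i.T @ R @ V_i)))   # True
```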
LEMMA A.2. Suppose that $X = (x_1,\dots,x_n)$ is a $p \times n$ matrix whose columns $\{x_i\}$ are independent random vectors, and that $Y = (y_1,\dots,y_n)$ is defined similarly. Let $\Sigma_p$ be the covariance matrix between $x_i$ and $y_i$, and let $A$ be a deterministic matrix. Then we have
$$XAY^* \longrightarrow \operatorname{tr}A\cdot\Sigma_p.$$
Moreover, if $A$ is random but independent of $X$ and $Y$, then we have
$$XAY^* \longrightarrow \mathbb{E}\operatorname{tr}A\cdot\Sigma_p. \tag{A.2}$$
PROOF. We consider the $(i,j)$th entry of $XAY^*$:
$$XAY^*(i,j) = \sum_{k,l=1}^n X(i,k)A(k,l)Y^*(l,j) = \sum_{k,l=1}^n X_{ik}Y_{jl}A_{kl}. \tag{A.3}$$
Since $X_{ik}Y_{jl}$ has mean $\Sigma_p(i,j)$ when $k = l$ and mean zero otherwise, the right-hand side of (A.3) tends to $\Sigma_p(i,j)\cdot\sum_{k=1}^n A_{kk}$, which is to say that
$$XAY^* \to \operatorname{tr}A\cdot\Sigma_p.$$
Equation (A.2) then follows by conditioning on $A$. The proof of Lemma A.2 is complete.
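As a small Monte Carlo illustration of Lemma A.2 (ours; the dimensions are arbitrary, $y_i = x_i$ so that $\Sigma_p = I_p$, and $A$ is taken diagonal with $\operatorname{tr}A = 1$):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 4, 200000
X = rng.standard_normal((p, n))
Y = X                          # y_i = x_i, so Cov(x_i, y_i) = Sigma_p = I_p
w = np.full(n, 1.0 / n)        # A = diag(w), deterministic, tr A = 1

print(np.round((X * w) @ Y.T, 2))   # ~ tr(A) * Sigma_p = I_4
```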
In all of the following, write $\lambda$ for the outlier limit $\phi(a)$ in (3.4), that is,
$$\lambda := \frac{a(a - 1 + c)}{a - 1 - ay}.$$
LEMMA A.3. With $s(z)$ and $m_1(z)$–$m_4(z)$ defined in (8.1), we have
$$
\begin{aligned}
s(\lambda) &= \frac{a(y-1)+1}{(a-1)(a+c-1)},\\
m_1(\lambda) &= \frac{(a(y-1)+1)^2(-1+2a+a^2(y-1)+y(c-1))}{(a-1)^2(a+c-1)^2(-1+2a+c+a^2(y-1))},\\
m_2(\lambda) &= \frac{1}{a-1},\\
m_3(\lambda) &= \frac{-(a(y-1)+1)^2}{(a-1)^2(-1+2a+c+a^2(y-1))},\\
m_4(\lambda) &= \frac{-1+2a+c+a^2(-1+c(y-1))}{(a-1)^2(-1+2a+c+a^2(y-1))}.
\end{aligned}
$$
SKETCH OF THE PROOF OF LEMMA A.3. In this short proof, we skip all the detailed calculations. Recall the definition of $s(z)$ in (8.11); its value at $\lambda$ is
$$s(\lambda) = \frac{a(y-1)+1}{(a-1)(a+c-1)}. \tag{A.4}$$
Also, (8.11) says that the companion transform $\underline{s}(z)$ is the solution of the following equation:
$$z(c+zy)\underline{s}^2(z) + \big(c(z(1-y)+1-c)+2zy\big)\underline{s}(z) + c + y - cy = 0. \tag{A.5}$$
Taking derivatives on both sides of (A.5) and combining with (A.4) will give the value of $\underline{s}'(\lambda)$. On the other hand, according to (8.10), it holds that
$$\underline{s}(z) + \frac{1-c}{z} = c\,s(z); \tag{A.6}$$
taking derivatives on both sides again will give the value of $s'(\lambda)$. Finally, the above five values are all linear combinations of $s(\lambda)$ and $s'(\lambda)$. The proof of Lemma A.3 is complete.
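The consistency of (A.4)–(A.6) can be checked symbolically. The sketch below is ours and relies on our reading $\underline{s}(z) = c\,s(z) - (1-c)/z$ of the companion relation (A.6); with that reading, substituting $z = \lambda$ makes the left-hand side of (A.5) vanish identically in $(a, c, y)$.

```python
import sympy as sp

a, c, y = sp.symbols('a c y')
lam = a*(a - 1 + c)/(a - 1 - a*y)                    # lambda = phi(a)
s = (a*(y - 1) + 1)/((a - 1)*(a + c - 1))            # s(lambda), (A.4)
s_u = c*s - (1 - c)/lam                              # companion transform, (A.6)

lhs = (lam*(c + lam*y)*s_u**2
       + (c*(lam*(1 - y) + 1 - c) + 2*lam*y)*s_u
       + c + y - c*y)                                # left-hand side of (A.5)
print(sp.cancel(sp.together(lhs)))                   # prints 0
```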
LEMMA A.4. Under assumptions (a)–(d),
$$\frac{1}{p}\operatorname{tr}\Big\{\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\Big\} \xrightarrow{a.s.} \frac{1}{a+c-1}.$$
PROOF. We first condition on $Z_2$; then we can use the result in Zheng, Bai and Yao (2013) (Lemma 4.3), which says that
$$\frac{1}{p}\operatorname{tr}\Big(\frac{1}{z}\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1} \to \tilde m(z) \quad a.s.,$$
where $\tilde m(z)$ is the unique solution to the equation
$$\tilde m(z) = \int \frac{1}{\frac{x}{z} - \frac{1}{1-c\tilde m(z)}}\,dF_y(x) \tag{A.7}$$
satisfying
$$\Im(z)\cdot\Im\big(\tilde m(z)\big) \ge 0;$$
here, $F_y(x)$ is the LSD of $\frac{1}{n}Z_2Z_2^*$ (deterministic), which is the standard M–P law with parameter $y$. Besides, if we denote its Stieltjes transform by $s(z) := \int \frac{1}{x-z}\,dF_y(x)$, then (A.7) can be written as
$$\tilde m(z) = \int \frac{z}{x - \frac{z}{1-c\tilde m(z)}}\,dF_y(x) = z\cdot s\Big(\frac{z}{1-c\tilde m(z)}\Big). \tag{A.8}$$
Since the Stieltjes transform of the LSD of a standard sample covariance matrix satisfies
$$s(z) = \frac{1}{1-y-yzs(z)-z}, \tag{A.9}$$
bringing (A.8) into (A.9) leads to
$$\frac{\tilde m(z)}{z} = \frac{1}{1-y-y\cdot\frac{z}{1-c\tilde m(z)}\cdot\frac{\tilde m(z)}{z} - \frac{z}{1-c\tilde m(z)}},$$
whose nonnegative solution is unique, namely
$$\tilde m(z) = \frac{-1+y+z-zc+\sqrt{(1-y-z+zc)^2+4z(yc-y-c)}}{2(yc-y-c)}. \tag{A.10}$$
Therefore, for fixed $\frac{1}{n}Z_2Z_2^*$, we have
$$\frac{1}{p}\operatorname{tr}\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1} \to \tilde m\Big(\frac{1}{\lambda}\Big) = \frac{1}{a+c-1}$$
almost surely. Finally, for each $\omega$, the ESD of $\frac{1}{n}Z_2Z_2^*(\omega)$ tends to the same limit (the standard M–P distribution), which is independent of the choice of $\omega$. Therefore, for all $\frac{1}{n}Z_2Z_2^*$ (not necessarily deterministic, but only independent of $\frac{1}{m}X_2X_2^*$), we have
$$\frac{1}{p}\operatorname{tr}\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1} \to \frac{1}{a+c-1}$$
almost surely. The proof of Lemma A.4 is complete.
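A direct Monte Carlo check of Lemma A.4 (our illustration; the choices $a = 10$, $c = 0.2$, $y = 0.5$ are arbitrary but satisfy the outlier condition, so that $\lambda = \phi(a)$ lies outside the bulk spectrum and the resolvent is well defined):

```python
import numpy as np

rng = np.random.default_rng(4)
p, m, n = 400, 2000, 800                 # c = 0.2, y = 0.5
a, c, y = 10.0, p / m, p / n
lam = a * (a - 1 + c) / (a - 1 - a * y)  # lambda = phi(a) = 23

X2 = rng.standard_normal((p, m))
Z2 = rng.standard_normal((p, n))
T = lam * (Z2 @ Z2.T) / n - (X2 @ X2.T) / m
print(np.trace(np.linalg.inv(T)) / p, 1 / (a + c - 1))   # both ~ 0.1087
```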
LEMMA A.5. Let $A(\lambda)$, $B(\lambda)$, $C(\lambda)$ and $D(\lambda)$ be defined as in (8.17). Then
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\cdot\big(1+y\lambda s(\lambda)\big)\cdot I_M, \tag{A.11}$$
$$\frac{\lambda}{n}Z_1\big[A(l)-A(\lambda)\big]Z_1^* \to (l-\lambda)\cdot\big(\lambda ys(\lambda)+\lambda^2 ym_1(\lambda)\big)\cdot I_M, \tag{A.12}$$
$$\frac{1}{m}X_1\big(B(l)-B(\lambda)\big)X_1^* \to -(l-\lambda)\cdot cm_3(\lambda)\cdot U\operatorname{diag}(a_1,\dots,a_k)U^*, \tag{A.13}$$
$$\frac{l}{n}Z_1\big(C(l)-C(\lambda)\big)X_1^* + \frac{l-\lambda}{n}Z_1C(\lambda)X_1^* \to (l-\lambda)\cdot 0_{M\times M}, \tag{A.14}$$
$$\frac{l}{m}X_1\big(D(l)-D(\lambda)\big)Z_1^* + \frac{l-\lambda}{m}X_1D(\lambda)Z_1^* \to (l-\lambda)\cdot 0_{M\times M}. \tag{A.15}$$
PROOF. Proof of (A.11): Since $Z_1$ is independent of $A$ and $\operatorname{Cov}(Z_1) = I_M$, we combine this fact with Lemma A.2:
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\cdot\frac{1}{n}\mathbb{E}\operatorname{tr}A(l)\cdot I_M. \tag{A.16}$$
Considering the expression of $A(l)$, we have
$$
\begin{aligned}
\frac{1}{n}\mathbb{E}\operatorname{tr}A(\lambda) &= \frac{1}{n}\mathbb{E}\operatorname{tr}\Big[I_n - Z_2^*(\lambda I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda}{n}Z_2\Big]\\
&= 1 - \frac{\lambda}{n}\mathbb{E}\operatorname{tr}(\lambda I_p - S)^{-1}\\
&= 1 - y\lambda\int\frac{1}{\lambda-x}\,dF_{c,y}(x)\\
&= 1 + y\lambda s(\lambda).
\end{aligned}
$$
Therefore, combining with (A.16), we have
$$(l-\lambda)\cdot\frac{1}{n}Z_1A(l)Z_1^* \to (l-\lambda)\big(1+y\lambda s(\lambda)\big)\cdot I_M.$$
Proof of (A.12): Bringing the expression of $A(l)$ into consideration, we first have
$$
\begin{aligned}
A(l) - A(\lambda) &= Z_2^*(\lambda I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda}{n}Z_2 - Z_2^*(lI_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l}{n}Z_2\\
&= Z_2^*(\lambda I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda - l}{n}Z_2\\
&\quad + Z_2^*\big[(\lambda I_p - S)^{-1} - (lI_p - S)^{-1}\big]\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l}{n}Z_2\\
&= (l-\lambda)\cdot\Big[-Z_2^*(\lambda I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{n}Z_2\\
&\qquad\qquad + Z_2^*(\lambda I_p - S)^{-1}(lI_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l}{n}Z_2\Big].
\end{aligned}
$$
Then, using Lemma A.2 for the same reason, we have
$$\frac{\lambda}{n}Z_1\big[A(l) - A(\lambda)\big]Z_1^* \to \frac{\lambda}{n}\big\{\mathbb{E}\operatorname{tr}\big(A(l) - A(\lambda)\big)\big\}\cdot I_M$$
and
$$
\begin{aligned}
\frac{1}{n}\mathbb{E}\operatorname{tr}\big(A(l) - A(\lambda)\big) &= (l-\lambda)\cdot\Big[-\frac{1}{n}\mathbb{E}\operatorname{tr}\Big\{Z_2^*(\lambda I_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{n}Z_2\Big\}\\
&\qquad\qquad + \frac{1}{n}\mathbb{E}\operatorname{tr}\Big\{Z_2^*(\lambda I_p - S)^{-1}(lI_p - S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{l}{n}Z_2\Big\}\Big]\\
&= (l-\lambda)\cdot\Big[-\frac{1}{n}\mathbb{E}\operatorname{tr}(\lambda I_p - S)^{-1} + \frac{\lambda}{n}\mathbb{E}\operatorname{tr}(\lambda I_p - S)^{-2} + o(1)\Big]\\
&= (l-\lambda)\cdot\Big[y\int\frac{1}{x-\lambda}\,dF_{c,y}(x) + \lambda y\int\frac{1}{(\lambda-x)^2}\,dF_{c,y}(x) + o(1)\Big]\\
&= (l-\lambda)\cdot\big[ys(\lambda) + \lambda ym_1(\lambda) + o(1)\big].
\end{aligned}
$$
Therefore, we have
$$\frac{\lambda}{n}Z_1\big[A(l) - A(\lambda)\big]Z_1^* \to (l-\lambda)\cdot\big(y\lambda s(\lambda) + \lambda^2 ym_1(\lambda)\big)\cdot I_M.$$
Proof of (A.13): First, recall the fact that
$$\operatorname{Cov}(X_1) = U\operatorname{diag}(a_1,\dots,a_k)U^*$$
and that $X_1$ is independent of $B$. Using Lemma A.2, we have
$$\frac{1}{m}X_1\big(B(l) - B(\lambda)\big)X_1^* \to \frac{1}{m}\mathbb{E}\operatorname{tr}\big(B(l) - B(\lambda)\big)\cdot U\operatorname{diag}(a_1,\dots,a_k)U^*.$$
The part
$$
\begin{aligned}
\frac{1}{m}\mathbb{E}\operatorname{tr}\big(B(l) - B(\lambda)\big) &= \frac{1}{m}\mathbb{E}\operatorname{tr}\Big\{X_2^*\big[(lI_p - S)^{-1} - (\lambda I_p - S)^{-1}\big]\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big\}\\
&= (l-\lambda)\cdot\Big[-\frac{1}{m}\mathbb{E}\operatorname{tr}\big\{(\lambda I_p - S)^{-2}S\big\} + o(1)\Big]\\
&= (l-\lambda)\cdot\Big[-c\int\frac{x}{(\lambda-x)^2}\,dF_{c,y}(x) + o(1)\Big]\\
&= (l-\lambda)\cdot\big(-cm_3(\lambda) + o(1)\big).
\end{aligned}
$$
Therefore, we have
$$\frac{1}{m}X_1\big(B(l) - B(\lambda)\big)X_1^* \to -c(l-\lambda)m_3(\lambda)\cdot U\operatorname{diag}(a_1,\dots,a_k)U^*.$$
Proof of (A.14) and (A.15): These two claims follow simply from the fact that $\operatorname{Cov}(X_1, Z_1) = 0_{M\times M}$. The proof of Lemma A.5 is complete.
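A Monte Carlo sanity check of the deterministic limits in Lemma A.5 (ours): using the closed form of $s(\lambda)$ from Lemma A.3 and the equivalent form of $A(\lambda)$ displayed in the proof of Lemma A.6 below, the quantity $\frac{1}{n}\operatorname{tr}A(\lambda)$ from the proof of (A.11) should be close to $1 + y\lambda s(\lambda)$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, m, n = 400, 2000, 800                   # c = 0.2, y = 0.5
a, c, y = 10.0, p / m, p / n
lam = a * (a - 1 + c) / (a - 1 - a * y)
s_lam = (a * (y - 1) + 1) / ((a - 1) * (a + c - 1))   # s(lambda), Lemma A.3

X2 = rng.standard_normal((p, m))
Z2 = rng.standard_normal((p, n))
T = lam * (Z2 @ Z2.T) / n - (X2 @ X2.T) / m
trA_over_n = 1 - lam / n**2 * np.trace(Z2.T @ np.linalg.solve(T, Z2))
print(trA_over_n, 1 + y * lam * s_lam)     # both ~ 0.444
```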
LEMMA A.6. Define
$$\tilde R_n(\lambda_i) := \begin{pmatrix} Z_1 & W_1 \end{pmatrix}
\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}
\begin{pmatrix} Z_1^* \\ W_1^* \end{pmatrix} - \mathbb{E}[\cdot];$$
then $\tilde R_n(\lambda_i)$ converges weakly to an $M \times M$ symmetric random matrix $R(\lambda_i) = (R_{st})$, made with independent Gaussian entries of mean zero and variance
$$\operatorname{Var}(R_{st}) = \begin{cases} 2\theta_i + (v_4-3)\omega_i, & s = t,\\ \theta_i, & s \ne t, \end{cases}$$
where
$$\omega_i = \frac{a_i^2(a_i+c-1)^2(c+y)}{(a_i-1)^2}, \qquad \theta_i = \frac{a_i^2(a_i+c-1)^2(cy-c-y)}{-1+2a_i+c+a_i^2(y-1)}.$$
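As a numerical illustration of the resulting CLT for a simple spike (our own sketch, assembled from (8.24) and Lemma A.6, not taken from the paper; Gaussian data so that $v_4 = 3$, a single spike $a_i = 10$ aligned with the first coordinate, and arbitrary moderate dimensions), the sample standard deviation of $\sqrt{p}(l_{p,1} - \lambda_i)$ should be close to $\sqrt{2\theta_i}/|\Delta(\lambda_i)|$:

```python
import numpy as np

rng = np.random.default_rng(6)
p, m, n, a, reps = 100, 500, 200, 10.0, 400
c, y = p / m, p / n
lam = a * (a - 1 + c) / (a - 1 - a * y)                # = 23
D = -1 + 2 * a + c + a**2 * (y - 1)
theta = a**2 * (a + c - 1)**2 * (c * y - c - y) / D    # theta_i, Lemma A.6
delta = (1 - a - c) * (1 + a * (y - 1))**2 / ((a - 1) * D)   # Delta(lambda_i)

sqrt_S1 = np.ones(p); sqrt_S1[0] = np.sqrt(a)          # Sigma_1 = diag(a,1,...,1)
vals = []
for _ in range(reps):
    X = sqrt_S1[:, None] * rng.standard_normal((p, m))
    Z = rng.standard_normal((p, n))
    l1 = np.linalg.eigvals(np.linalg.solve(Z @ Z.T / n, X @ X.T / m)).real.max()
    vals.append(np.sqrt(p) * (l1 - lam))

print(np.std(vals), np.sqrt(2 * theta) / abs(delta))   # both ~ 34
```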
PROOF. Since $Z_1$ and $W_1$ are independent and both are made with i.i.d. components having the same first four moments, we can view $(Z_1\ W_1)$ as an $M \times (n+m)$ table $\xi$ made with i.i.d. elements of mean 0 and variance 1. Besides, we can rewrite the expressions of $A(\lambda)$, $B(\lambda)$, $C(\lambda)$ and $D(\lambda)$ as follows:
$$
\begin{aligned}
A(\lambda) &= I_n - Z_2^*\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{\lambda}{n}Z_2,\\
B(\lambda) &= I_m + X_2^*\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{1}{m}X_2,\\
C(\lambda) &= Z_2^*\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{1}{m}X_2,\\
D(\lambda) &= X_2^*\Big(\lambda\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{1}{n}Z_2.
\end{aligned}
$$
It holds that
$$A(\lambda)^* = A(\lambda), \qquad B(\lambda)^* = B(\lambda), \qquad m\cdot C(\lambda)^* = n\cdot D(\lambda);$$
therefore, the matrix
$$\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}$$
is symmetric. Define
$$A_n(\lambda_i) = \sqrt{n+m}\cdot\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}. \tag{A.17}$$
Now we can apply the results in Bai and Yao (2008) (Proposition 3.1 and Remark 1), which say that $\tilde R_n(\lambda_i)$ converges weakly to an $M \times M$ symmetric random matrix $R(\lambda_i) = (R_{st})$ made with independent Gaussian entries of mean zero and variance
$$\operatorname{Var}(R_{st}) = \begin{cases} 2\theta_i + (v_4-3)\omega_i, & s = t,\\ \theta_i, & s \ne t. \end{cases}$$
The remainder of the proof is devoted to the calculation of the values of $\theta_i$ and $\omega_i$.
Calculation of $\theta_i$: From the definition of $\theta$ [see Bai and Yao (2008) for details], we have
$$
\begin{aligned}
\theta_i &= \lim\frac{1}{n+m}\operatorname{tr}A_n^2(\lambda_i)\\
&= \lim\operatorname{tr}\begin{pmatrix}
\lambda_i\dfrac{\sqrt p\,A(\lambda_i)}{n} & \lambda_i\dfrac{\sqrt{a_ip}\,C(\lambda_i)}{n}\\[4pt]
\lambda_i\dfrac{\sqrt{a_ip}\,D(\lambda_i)}{m} & -a_i\dfrac{\sqrt p\,B(\lambda_i)}{m}
\end{pmatrix}^2
\tag{A.18}\\
&= \lim\operatorname{tr}\begin{pmatrix}
\dfrac{p\lambda_i^2}{n^2}A^2(\lambda_i) + \dfrac{\lambda_i^2a_ip}{nm}C(\lambda_i)D(\lambda_i) & *\\[4pt]
* & \dfrac{\lambda_i^2a_ip}{nm}D(\lambda_i)C(\lambda_i) + \dfrac{a_i^2p}{m^2}B^2(\lambda_i)
\end{pmatrix}\\
&= \lim\Big[\frac{p\lambda_i^2}{n^2}\operatorname{tr}A^2(\lambda_i) + \frac{2\lambda_i^2a_ip}{nm}\operatorname{tr}C(\lambda_i)D(\lambda_i) + \frac{a_i^2p}{m^2}\operatorname{tr}B^2(\lambda_i)\Big],
\end{aligned}
$$
with
$$
\begin{aligned}
\operatorname{tr}A^2(\lambda_i) &= \operatorname{tr}\Big[I_n + Z_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2Z_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2\\
&\qquad\quad - 2Z_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2\Big]
\tag{A.19}\\
&= n + \lambda_i^2\operatorname{tr}(\lambda_iI_p-S)^{-2} - 2\lambda_i\operatorname{tr}(\lambda_iI_p-S)^{-1}\\
&= n + p\lambda_i^2m_1(\lambda_i) + 2p\lambda_is(\lambda_i),
\end{aligned}
$$
$$
\begin{aligned}
\operatorname{tr}C(\lambda_i)D(\lambda_i) &= \operatorname{tr}\Big\{Z_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{n}Z_2\Big\}
\tag{A.20}\\
&= \operatorname{tr}(\lambda_iI_p-S)^{-1}S(\lambda_iI_p-S)^{-1} = pm_3(\lambda_i),
\end{aligned}
$$
$$
\begin{aligned}
\operatorname{tr}B^2(\lambda_i) &= \operatorname{tr}\Big[I_m + X_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\\
&\qquad\quad + 2X_2^*(\lambda_iI_p-S)^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big]
\tag{A.21}\\
&= m + \operatorname{tr}(\lambda_iI_p-S)^{-1}S(\lambda_iI_p-S)^{-1}S + 2\operatorname{tr}(\lambda_iI_p-S)^{-1}S\\
&= m + pm_4(\lambda_i) + 2pm_2(\lambda_i).
\end{aligned}
$$
Combining (A.18), (A.19), (A.20) and (A.21), we obtain
$$
\begin{aligned}
\theta_i &= \lambda_i^2y\big(1 + y\lambda_i^2m_1(\lambda_i) + 2y\lambda_is(\lambda_i)\big) + 2\lambda_i^2a_icym_3(\lambda_i) + a_i^2c\big(1 + cm_4(\lambda_i) + 2cm_2(\lambda_i)\big)\\
&= \frac{a_i^2(a_i+c-1)^2(cy-c-y)}{-1+2a_i+c+a_i^2(y-1)}.
\end{aligned}
$$
Calculation of $\omega_i$:
$$\omega_i = \lim\frac{1}{n+m}\sum_{j=1}^{n+m}\big(A_n(\lambda_i)(j,j)\big)^2 = \lim\Big[\sum_{j=1}^{n}\frac{\lambda_i^2p}{n^2}A(j,j)^2 + \sum_{j=1}^{m}\frac{a_i^2p}{m^2}B(j,j)^2\Big]. \tag{A.22}$$
In the following, we show that $A(j,j)$ and $B(j,j)$ both tend to limits that are independent of $j$:
$$
\begin{aligned}
A(j,j) &= 1 - \Big[Z_2^*\Big[\lambda_iI_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{\lambda_i}{n}Z_2\Big](j,j)
\tag{A.23}\\
&= 1 - \frac{\lambda_i}{n}\Big[Z_2^*\Big[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big]^{-1}Z_2\Big](j,j).
\end{aligned}
$$
If we denote by $\eta_j$ the $j$th column of $Z_2$, we have
$$\frac{1}{n}Z_2Z_2^* = \frac{1}{n}(\eta_1\ \cdots\ \eta_n)\begin{pmatrix}\eta_1^*\\ \vdots\\ \eta_n^*\end{pmatrix} = \frac{1}{n}\eta_j\eta_j^* + \frac{1}{n}Z_{2j}Z_{2j}^*,$$
where $Z_{2j}$ is independent of $\eta_j$. Since
$$
\begin{aligned}
&\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1} - \Big(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\\
&\qquad = -\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{\lambda_i}{n}\eta_j\eta_j^*\Big(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Big)^{-1},
\end{aligned}
$$
we have
$$\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\eta_j = \frac{(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}\eta_j}{1 + \frac{\lambda_i}{n}\eta_j^*(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}\eta_j}. \tag{A.24}$$
Bringing (A.24) into (A.23),
$$
\begin{aligned}
A(j,j) &= 1 - \frac{\lambda_i}{n}\eta_j^*\Big[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big]^{-1}\eta_j\\
&= 1 - \frac{\frac{\lambda_i}{n}\eta_j^*(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}\eta_j}{1 + \frac{\lambda_i}{n}\eta_j^*(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}\eta_j}
\tag{A.25}\\
&= \frac{1}{1 + \frac{\lambda_i}{n}\eta_j^*(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}\eta_j},
\end{aligned}
$$
whose denominator equals
$$1 + \frac{\lambda_i}{n}\operatorname{tr}\Big(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\eta_j\eta_j^*.$$
Since $\eta_j$ is independent of $(\lambda_i\cdot\frac{1}{n}Z_{2j}Z_{2j}^* - \frac{1}{m}X_2X_2^*)^{-1}$, this denominator converges to $1 + \lambda_iy\cdot\frac{1}{a_i+c-1}$ according to Lemma A.4. Therefore, we have
$$A(j,j) \to \frac{1}{1 + \lambda_iy\cdot\frac{1}{a_i+c-1}}, \tag{A.26}$$
which is independent of the choice of $j$.
For the same reason, we have
$$
\begin{aligned}
B(j,j) &= 1 + \Big[X_2^*\Big[\lambda_iI_p - \Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2X_2^*\Big]^{-1}\Big(\frac{1}{n}Z_2Z_2^*\Big)^{-1}\frac{1}{m}X_2\Big](j,j)
\tag{A.27}\\
&= 1 + \Big[X_2^*\Big[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big]^{-1}\frac{1}{m}X_2\Big](j,j).
\end{aligned}
$$
If we denote by $\delta_j$ the $j$th column of $X_2$, then we have
$$\frac{1}{m}X_2X_2^* = \frac{1}{m}(\delta_1\ \cdots\ \delta_m)\begin{pmatrix}\delta_1^*\\ \vdots\\ \delta_m^*\end{pmatrix} = \frac{1}{m}\delta_j\delta_j^* + \frac{1}{m}X_{2j}X_{2j}^*$$
and
$$
\begin{aligned}
&\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1} - \Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Big)^{-1}\\
&\qquad = \Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\frac{1}{m}\delta_j\delta_j^*\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Big)^{-1}.
\end{aligned}
$$
So we have
$$\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big)^{-1}\delta_j = \frac{(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}\delta_j}{1 - \frac{1}{m}\delta_j^*(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}\delta_j}. \tag{A.28}$$
Combining (A.27) and (A.28), we have
$$
\begin{aligned}
B(j,j) &= 1 + \frac{1}{m}\delta_j^*\Big[\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_2X_2^*\Big]^{-1}\delta_j\\
&= 1 + \frac{\frac{1}{m}\delta_j^*(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}\delta_j}{1 - \frac{1}{m}\delta_j^*(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}\delta_j}
\tag{A.29}\\
&= \frac{1}{1 - \frac{1}{m}\delta_j^*(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}\delta_j}.
\end{aligned}
$$
Using the independence between $\delta_j$ and $(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*)^{-1}$ and Lemma A.4 again, we have
$$\frac{1}{m}\delta_j^*\Big(\lambda_i\cdot\frac{1}{n}Z_2Z_2^* - \frac{1}{m}X_{2j}X_{2j}^*\Big)^{-1}\delta_j \to c\cdot\frac{1}{a_i+c-1}.$$
Therefore, we have
$$B(j,j) \to \frac{1}{1 - \frac{c}{a_i+c-1}},$$
which is also independent of the choice of $j$.
Finally, taking the definition of $\omega_i$ in (A.22) into consideration, we have
$$\omega_i = \frac{\lambda_i^2y}{\big(1 + \frac{y\lambda_i}{a_i+c-1}\big)^2} + \frac{a_i^2c}{\big(1 - \frac{c}{a_i+c-1}\big)^2} = \frac{a_i^2(a_i+c-1)^2(c+y)}{(a_i-1)^2}. \tag{A.30}$$
The proof of Lemma A.6 is complete.
REFERENCES
ANDERSON, T. W. (1984). An Introduction to Multivariate Statistical Analysis, 2nd ed. Wiley, New
York. MR0771294
BAI, Z. and YAO, J. (2008). Central limit theorems for eigenvalues in a spiked population model.
Ann. Inst. Henri Poincaré Probab. Stat. 44 447–474. MR2451053
BAI, Z. and YAO, J. (2012). On sample eigenvalues in a generalized spiked population model. J. Mul-
tivariate Anal. 106 167–177. MR2887686
BAI, Z. D., YIN, Y. Q. and KRISHNAIAH, P. R. (1987). On limiting empirical distribution function
of the eigenvalues of a multivariate F matrix. Theory Probab. Appl. 32 490–500.
BAI, Z., JIANG, D., YAO, J.-F. and ZHENG, S. (2009). Corrections to LRT on large-dimensional
covariance matrix by RMT. Ann. Statist. 37 3822–3840. MR2572444
BAIK, J., BEN AROUS, G. and PÉCHÉ, S. (2005). Phase transition of the largest eigenvalue for
nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697. MR2165575
BAIK, J. and SILVERSTEIN, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked
population models. J. Multivariate Anal. 97 1382–1408. MR2279680
BENAYCH-GEORGES, F., GUIONNET, A. and MAIDA, M. (2011). Fluctuations of the extreme
eigenvalues of finite rank deformations of random matrices. Electron. J. Probab. 16 1621–1662.
MR2835249
BENAYCH-GEORGES, F. and NADAKUDITI, R. R. (2011). The eigenvalues and eigenvectors of
finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521. MR2782201
CAI, T., LIU, W. and XIA, Y. (2013). Two-sample covariance matrix testing and support recovery
in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108 265–277. MR3174618
CAPITAINE, M. (2013). Additive/multiplicative free subordination property and limiting eigenvec-
tors of spiked additive deformations of Wigner matrices and spiked sample covariance matrices.
J. Theoret. Probab. 26 595–648. MR3090543
CAPITAINE, M., DONATI-MARTIN, C. and FÉRAL, D. (2009). The largest eigenvalues of finite rank
deformation of large Wigner matrices: Convergence and nonuniversality of the fluctuations. Ann.
Probab. 37 1–47. MR2489158
DHARMAWANSA, P., JOHNSTONE, I. M. and ONATSKI, A. (2014). Local asymptotic normality of
the spectrum of high-dimensional spiked F-ratios. Preprint. Available at arXiv:1411.3875.
FÉRAL, D. and PÉCHÉ, S. (2007). The largest eigenvalue of rank one deformation of large Wigner
matrices. Comm. Math. Phys. 272 185–228. MR2291807
HAN, X., PAN, G. and ZHANG, B. (2016). The Tracy–Widom law for the largest eigenvalue of F
type matrices. Ann. Statist. 44 1564–1592. MR3519933
HU, J. and BAI, Z. (2014). Strong representation of weak convergence. Sci. China Math. 57 2399–
2406. MR3266500
JOHNSTONE, I. M. (2001). On the distribution of the largest eigenvalue in principal components
analysis. Ann. Statist. 29 295–327. MR1863961
KARGIN, V. (2015). On estimation in the reduced-rank regression with a large number of responses
and predictors. J. Multivariate Anal. 140 377–394. MR3372575
KRITCHMAN, S. and NADLER, B. (2008). Determining the number of components in a factor model
from limited noisy data. Chemom. Intell. Lab. Syst. 94 19–32.
LI, J. and CHEN, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann.
Statist. 40 908–940. MR2985938
MUIRHEAD, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York. MR0652932
NADLER, B. (2010). Nonparametric detection of signals by information theoretic criteria: Per-
formance analysis and an improved estimator. IEEE Trans. Signal Process. 58 2746–2756.
MR2789420
ONATSKI, A. (2009). Testing hypotheses about the numbers of factors in large factor models. Econo-
metrica 77 1447–1479. MR2561070
PASSEMIER, D. and YAO, J.-F. (2012). On determining the number of spikes in a high-dimensional
spiked population model. Random Matrices Theory Appl. 1 1150002, 19. MR2930380
PASSEMIER, D. and YAO, J. (2014). Estimation of the number of spikes, possibly equal, in the
high-dimensional case. J. Multivariate Anal. 127 173–183. MR3188885
PAUL, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance
model. Statist. Sinica 17 1617–1642. MR2399865
PÉCHÉ, S. (2006). The largest eigenvalue of small rank perturbations of Hermitian random matrices.
Probab. Theory Related Fields 134 127–173. MR2221787
PIZZO, A., RENFREW, D. and SOSHNIKOV, A. (2013). On finite rank deformations of Wigner ma-
trices. Ann. Inst. Henri Poincaré Probab. Stat. 49 64–94. MR3060148
RENFREW, D. and SOSHNIKOV, A. (2013). On finite rank deformations of Wigner matrices II:
Delocalized perturbations. Random Matrices Theory Appl. 2 1250015, 36. MR3039820
SHI, D. (2013). Asymptotic joint distribution of extreme sample eigenvalues and eigenvectors in the
spiked population model. Preprint. Available at arXiv:1304.6113.
SILVERSTEIN, J. W. (1985). The limiting eigenvalue distribution of a multivariate F matrix. SIAM
J. Math. Anal. 16 641–646. MR0783987
SKOROKHOD, A. V. (1956). Limit theorems for stochastic processes. Theory Probab. Appl. 1 261–
290.
WACHTER, K. W. (1980). The limiting empirical measure of multiple discriminant ratios. Ann.
Statist. 8 937–957. MR0585695
WANG, Q., SU, Z. and YAO, J. (2014). Joint CLT for several random sesquilinear forms with
applications to large-dimensional spiked population models. Electron. J. Probab. 19 1–28.
MR3275855
ZHENG, S. R., BAI, Z. D. and YAO, J. F. (2013). CLT for linear spectral statistics of random matrix $S^{-1}T$. Preprint. Available at arXiv:1305.1376.
DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
UNIVERSITY OF HONG KONG
POKFULAM
HONG KONG
E-MAIL: wqw8813@gmail.com
jeffyao@hku.hk