程序代写案例-13-AOS1094|学霸联盟

程序代写案例-13-AOS1094

时间：2022-03-26

The Annals of Statistics
2013, Vol. 41, No. 2, 670–692
DOI: 10.1214/13-AOS1094
© Institute of Mathematical Statistics, 2013
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION1
BY YAO XIE AND DAVID SIEGMUND
Duke University and Stanford University
We develop a mixture procedure to monitor parallel streams of data for
a change-point that affects only a subset of them, without assuming a spatial
structure relating the data streams to one another. Observations are assumed
initially to be independent standard normal random variables. After a change-
point the observations in a subset of the streams of data have nonzero mean
values. The subset and the post-change means are unknown. The procedure
we study uses stream specific generalized likelihood ratio statistics, which
are combined to form an overall detection statistic in a mixture model that
hypothesizes an assumed fraction p0 of affected data streams. An analytic
expression is obtained for the average run length (ARL) when there is no
change and is shown by simulations to be very accurate. Similarly, an ap-
proximation for the expected detection delay (EDD) after a change-point is
also obtained. Numerical examples are given to compare the suggested proce-
dure to other procedures for unstructured problems and in one case where the
problem is assumed to have a well-defined geometric structure. Finally we
discuss sensitivity of the procedure to the assumed value of p0 and suggest a
generalization.
1. Introduction. Single sequence problems of change-point detection have a
long history in industrial quality control, where an observed process is assumed
initially to be in control and at a change-point becomes out of control. It is desired
to detect the change-point with as little delay as possible, subject to the constraint
that false detections occurring before the true change-point are very rare. Outstand-
ing early contributions are due to Page [7, 8], Shiryaev [18] and Lorden [5].
We assume there are parallel streams of data subject to change-points. More pre-
cisely suppose that for each n = 1, . . . ,N , we make observations yn,t , t = 1,2, . . . .
The observations are mutually independent within and across data streams. At a
certain time κ , there are changes in the distributions of observations made at a
subset N ⊂ {1, . . . ,N} of cardinality |N | ≤ N . Also denote by N c the set of un-
affected data streams. The change-point κ , the subset N and its size, and the size of
the changes are all unknown. As in the case of a single sequence, N = 1, the goal
is to detect the change-point as soon as possible after it occurs, while keeping the
frequency of false alarms as low as possible. In the change-point detection litera-
ture, a surrogate for the frequency of false alarms is the average-run-length (ARL),
Received June 2012; revised November 2012.
1Supported in part by NSF Grant 1043204 and Stanford General Yao-Wu Wang Graduate
Fellowship.
MSC2010 subject classifications. 62L10.
Key words and phrases. Change-point detection, multi-sensor.
670
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 671
defined to be the expected time before incorrectly announcing a change of distri-
bution when none has occurred.
It may be convenient to imagine that the data streams represent observations at
a collection of N sensors and that the change-point is the onset of a localized sig-
nal that can be detected by sensors in the neighborhood of the signal. In this paper
we assume for the most part that the problem is unstructured in the sense that we
do not assume a model that relates the changes seen at the different sensors. An
example of an unstructured problem is the model for anomaly detection in com-
puter networks developed in [4]. For other discussions of unstructured problems
and applications, see [2, 6, 9, 19].
At the other extreme are structured problems where there exists a profile deter-
mining the relative magnitudes of the changes observed by different sensors, say,
according to their distance from the location of a signal (e.g., [12, 16]). A prob-
lem that is potentially structured, may behave more like an unstructured problem
if the number of sensors is small and/or they are irregularly placed, so their dis-
tances from one another are large compared to the point spread function of the
signals. Alternatively, local signals may be collected at the relatively widely dis-
persed hubs of small, star-shaped subnetworks, then condensed and transmitted to
a central processor, thus in effect removing the local structure.
The detection problem of particular interest in this paper involves the case that
N is large and |N | is relatively small. To achieve efficient detection, the detection
procedure should use insofar as possible only information from affected sensors
and suppress noise from the unaffected sensors.
In analogy to the well-known CUSUM statistic (e.g., Page [7, 8], Lorden [5]),
Mei [6] recently proposed a multi-sensor procedure based on sums of the CUSUM
statistic from individual sensors. He then compares the sum with a suitable thresh-
old to determine a stopping rule. While the distributions of the data, both before
and after the change-point, are completely general, they are also assumed to be
completely known. The method is shown to minimize asymptotically the expected
detection delay (EDD) for a given false alarm rate, when the threshold value (and
hence the constraint imposed by the ARL) becomes infinitely large. The procedure
fails to be asymptotically optimal when the specified distributions are incorrect.
Tartakovsky and Veeravalli proposed a different procedure [19] that sums the local
likelihood ratio statistic before forming CUSUM statistics. They also assume the
post-change distributions are completely prescribed. Moreover, both procedures
assume the change-point is observed by all sensors. When only a subset of sen-
sors observe the change-point, these procedures include noise from the unaffected
sensors in the detection statistic, which may lead to long detection delays.
In this paper, we develop a mixture procedure that achieves good detection per-
formance in the case of an unknown subset of affected sensors and incompletely
specified post-change distributions. The key feature of the proposed procedure is
that it incorporates an assumption about the fraction of affected sensors when com-
puting the detection statistic. We assume that the individual observations are inde-
pendent and normally distributed with unit variance, and that the changes occur
672 Y. XIE AND D. SIEGMUND
in their mean values. At the t th vector of observations (yn,t , n = 1, . . . ,N), the
mixture procedure first computes a generalized likelihood ratio (GLR) statistic for
each individual sensor under the assumption that a change-point has occurred at
k ≤ t . The local GLR statistics are combined via a mixture model that has the ef-
fect of soft thresholding the local statistics according to an hypothesized fraction of
affected sensors, p0. The resulting local statistics are summed and compared with
a detection threshold. To characterize the performance of our proposed procedure,
we derive analytic approximations for its ARL and EDD, which are evaluated by
comparing the approximations to simulations. Since simulation of the ARL is quite
time consuming, the analytic approximation to the ARL proves very useful in de-
termining a suitable detection threshold. The proposed procedure is then compared
numerically to competing procedures and is shown to be very competitive. It is also
shown to be reasonably robust to the choice of p0, and methods are suggested to
increase the robustness to mis-specification of p0.
Although we assume throughout that the observations are normally distributed,
the model can be generalized to an exponential family of distributions satisfying
some additional regularity conditions.
The remainder of the paper is organized as follows. In Section 2 we establish our
notation and formulate the problem more precisely. In Section 3 we review several
detection procedures and introduce the proposed mixture procedure. In Section 4
we derive approximations to the ARL and EDD of the mixture procedure, and we
demonstrate the accuracy of these approximations numerically. Section 5 demon-
strates by numerical examples that the mixture procedure performs well compared
to other procedures in the unstructured problem. In Section 6 we suggest a “par-
allel” procedure to increase robustness regarding the hypothesized fraction of af-
fected data streams, p0. In Section 7 we also compare the mixture procedure to that
suggested in [16] for a structured problem, under the assumption that the assumed
structure is correct. Finally Section 8 concludes the paper with some discussion.
2. Assumptions and formulation. Given N sensors, for each n = 1,2, . . . ,
N , the observations from the nth sensor are given by yn,t , t = 1,2, . . . . Assume
that different observations are mutually independent and normally distributed with
unit variances. Under the hypothesis of no change, they have zero means. Probabil-
ity and expectation in this case are denoted by P∞ and E∞, respectively. Alterna-
tively, there exists a change-point κ , 0 ≤ κ < ∞, and a subset N of {1,2, . . . ,N},
having cardinality |N |, of observations affected by the change-point. For each
n ∈ N , the observations yn,t have means equal to μn > 0 for all t > κ , while
observations from the unaffected sensors keep the same standard normal distri-
bution. The probability and expectation in this case are denoted by Pκ and Eκ ,
respectively. In particular, κ = 0 denotes an immediate change. Note that proba-
bilities and expectations depend on N and the values of μn, although this depen-
dence is suppressed in the notation. The fraction of affected sensors is given by
p = |N |/N .
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 673
Our goal is to define a stopping rule T such that for all sufficiently large pre-
scribed constants c > 0, E∞{T } ≥ c, while asymptotically Eκ{T − κ|T > κ} is a
minimum. Ideally, the minimization would hold uniformly in the various unknown
parameters: κ , N and the μn. Since this is clearly impossible, in Section 5 we will
compare different procedures through numerical examples computed under vari-
ous hypothetical conditions.
3. Detection procedures. Since the observations are independent, for an as-
sumed value of the change-point κ = k and sensor n ∈ N , the log-likelihood of
observations accumulated by time t > k is given by
n(t, k,μn) =
t∑
i=k+1
(
μnyn,i − μ2n/2
)
.(3.1)
We assume that each sensor is affected by the change with probability p0 (inde-
pendently from one sensor to the next). The global log likelihood of all N sensors
is
N∑
n=1
log
(
1 − p0 + p0 exp[n(t, k,μn)]).(3.2)
Expression (3.2) suggests several change-point detection rules.
One possibility is to set μn equal to a nominal change, say δ > 0, which would
be important to detect, and define the stopping rule
T1 = inf
{
t : max
0≤k≤t
N∑
n=1
log
(
1 − p0 + p0 exp[+n (t, k, δ)])≥ b
}
,(3.3)
where x+ denotes the positive part of x. Here thresholding by the positive part
plays the role of dimension reduction by limiting the current considerations only
to sequences that appear to be affected by the change-point.
Another possibility is to replace μn by its maximum likelihood estimator, as
follows. The maximum likelihood estimate of the post-change mean as a function
of the current number of observations t and putative change-point location k is
given by
μˆn =
(
t∑
i=k+1
yn,i
)+/
(t − k).(3.4)
Substitution into (3.1) gives the log generalized likelihood ratio (GLR) statistic.
Putting
Sn,t =
t∑
i=1
yn,i,
(3.5)
Un,k,t = (t − k)−1/2(Sn,t − Sn,k),
674 Y. XIE AND D. SIEGMUND
we can write the log GLR as
n(t, k, μˆn) = (U+n,k,t )2/2.(3.6)
We define the stopping rule
T2 = inf
{
t : max
0≤kN∑
n=1
log
(
1 − p0 + p0 exp[(U+n,k,t )2/2])≥ b
}
.(3.7)
REMARK. In what follows we use a window limited version of (3.7), where
the maximum is restricted to m0 ≤ t−k < m1 for suitable m0 < m1. The role of m1
is two-fold. On the one hand it reduces the memory requirements to implement the
stopping rule, and on the other it effectively establishes a minimum level of change
that we want to detect. For asymptotic theory given below, we assume that b → ∞,
with m1/b also diverging. More specific guidelines in selecting m1 are discussed
in [3]. In the numerical examples that follow, we take m0 = 1. In practice a slightly
larger value can be used to provide protection against outliers in the data, although
it may delay detection in cases involving very large changes.
The detection rule (3.7) is motivated by the suggestion of [17] for a similar fixed
sample change-point detection problem.
For the special case p0 = 1, (3.7) becomes the (global) GLR procedure, which
for N = 1 was studied by [14]. It is expected to be efficient if the change-point
affects a large fraction of the sensors. At the other extreme, if only one or a very
small number of sensors is affected by the change-point, a reasonable procedure
would be
Tmax = inf
{
t : max
0≤k(
U+n,k,t
)2
/2 ≥ b
}
.(3.8)
The stopping rule Tmax can also be window limited.
Still other possibilities are suggested by the observation that a function of y of
the form log[1 − p0 + p0 exp(y)] is large only if y is large, and then this function
is approximately equal to [y + log(p0)]+. This suggests the stopping rules
T3 = inf
{
t : max
0≤kN∑
n=1
[
n(t, k, δ) + log(p0)]+ ≥ b
}
(3.9)
and
T4 = inf
{
t : max
0≤kN∑
n=1
[(
U+n,k,t
)2
/2 + log(p0)]+ ≥ b
}
,(3.10)
or a suitably window limited version.
Mei [6] suggests the stopping rule
TMei = inf
{
t :
N∑
n=1
max
0≤k}
,(3.11)
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 675
which simply adds the classical CUSUM statistics for the different sensors. Note
that this procedure does not involve the assumption that all distributions affected by
the change-point change simultaneously. As we shall see below, this has a negative
impact on the efficiency of the procedure in our formulation, although it might
prove beneficial in differently formulated problems. For example, there may be
a time delay before the signal is perceived at different sensors, or there may be
different signals occurring at different times in the proximity of different sensors.
In these problems, Mei’s procedure, which allows changes to occur at different
times, could be useful.
The procedure suggested by Tartakovsky and Veeravalli [19] is defined by the
stopping rule
TTV inf
{
t : max
0≤kN∑
n=1
n(t, k, δ) ≥ b
}
.(3.12)
This stopping rule resembles T3(p0) with p0 = 1, but with one important differ-
ence. After a change-point the statistics of the unaffected sensors have negative
drifts that tend to cancel the positive drifts from the affected sensors. This can lead
to a large EDD. Use of the positive part, [n(t, k, δ)]+, in the definitions of our
stopping rules is designed to avoid this problem.
Different thresholds b are required for each of these detection procedures to
meet the ARL requirement.
4. Properties of the detection procedures. In this section we develop theo-
retical properties of the detection procedures T1 to T4, with emphasis on T2 and
the closely related T4. We use two standard performance metrics: (i) the expected
value of the stopping time when there is no change, the average run length or ARL;
(ii) the expected detection delay (EDD), defined to be the expected stopping time
in the extreme case where a change occurs immediately at κ = 0. The EDD pro-
vides an upper bound on the expected delay after a change-point until detection
occurs when the change occurs later in the sequence of observations. The approx-
imation to the ARL will be shown below to be very accurate, which is fortunate
since its simulation can be quite time consuming, especially for large N . Accu-
racy of our approximation for the EDD is variable, but fortunately this parameter
is usually easily simulated.
4.1. Average run length when there is no change. The ARL is the expected
value of the stopping time T when there is no change-point. It will be convenient
to use the following notation. Let g(x) denote a twice continuously differentiable
increasing function that is bounded below at −∞ and grows sub-exponentially at
+∞. In what follows we consider explicitly g(u). With an additional argument
discussed below the results also apply to g(u+). Let
ψ(θ) = logE{exp[θg(U)]},(4.1)
676 Y. XIE AND D. SIEGMUND
where U has a standard normal distribution. Also let
γ (θ) = 12θ2E
{[
g˙(U)
]2
exp
[
θg(U) − ψ(θ)]},(4.2)
where the dot denotes differentiation. Let
H(N, θ) = θ [2πψ¨(θ)]
1/2
γ (θ)N1/2
exp
{
N
[
θψ˙(θ) − ψ(θ)]}.(4.3)
Denote the standard normal density function by φ(x) and its distribution function
by
(x). Also let ν(x) = 2x−2 exp[−2∑∞1 n−1
(−|x|n1/2/2)]; cf. [13], page 82.
For numerical purposes a simple, accurate approximation is given by (cf. [15])
ν(x) ≈ (2/x)[
(x/2) − 0.5]
(x/2)
(x/2) + φ(x/2) .
THEOREM 1. Assume that N → ∞ and b → ∞ with b/N fixed. Let θ be
defined by ψ˙(θ) = b/N . For the window limited stopping rule
T = inf
{
t : max
0≤kN∑
n=1
g(Un,k,t ) ≥ b
}
(4.4)
with m1 = o(br) for some positive integer r , we have
E
∞{T } ∼ H(N, θ)
/∫ [2Nγ (θ)/m0]1/2
[2Nγ (θ)/m1]1/2
yν2(y) dy.(4.5)
REMARK. The integrand in the approximation is integrable at both 0 and ∞
by virtue of the relations ν(y) → 1 as y → 0, and ν(y) ∼ 2/y2 as y → ∞.
The following calculations illustrate the essential features of approxima-
tion (4.5). For detailed proofs in similar problems, see [14] (where additional
complications arise because the stopping rule there is not window limited) or [16].
From arguments similar to those used in [17], we can show that
P
∞{T ≤ m}
= P∞
{
max
t≤m,m0≤t−k≤m1
N∑
n=1
g(Un,k,t ) ≥ b
}
(4.6)
∼ N2e−N[θψ˙(θ)−ψ(θ)][2πNψ¨(θ)]−1/2|θ |−1γ 2(θ)
×
∫ m1/m
m0/m
ν2
([
2Nγ (θ)/(mt)
]1/2)
(1 − t) dt/t2.
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 677
Here it is assumed that m is large, but small enough that the right-hand side of (4.6)
converges to 0 when b → ∞. Changing variables in the integrand and using the
notation (4.3), we can re-write this approximation as
P
∞{T ≤ m} ∼ m
∫ [2Nγ (θ)/m0]1/2
[2Nγ (θ)/m1]1/2
yν2(y) dy/H(N, θ).(4.7)
From the arguments in [14] or [16] (see also [1]), we see that T is asymptotically
exponentially distributed and is uniformly integrable. Hence if λ denotes the factor
multiplying m on the right-hand side of (4.7), then for still larger m, in the range
where mλ is bounded away from 0 and ∞, P∞{T ≤ m} − [1 − exp(−λm)] → 0.
Consequently E∞{T } ∼ λ−1, which is equivalent to (4.5).
REMARKS. (i) The result we have used from [17] was motivated by a prob-
lem involving changes that could be positive, or negative, or both; and in that
paper it was assumed that the function g(u) is twice continuously differentiable.
The required smoothness is not satisfied by the composite functions of principal
interest here, of the form g(u+). However, (i) the required smoothness is required
only in the derivation of some of the constant factors, not the exponentially small
factor, and (ii) the second derivative that appears in the derivation in [17] can be
eliminated from the final approximation by an integration by parts. As a conse-
quence, we can approximate the indicator of u > 0 by
(ru) and use in place of u+
the smooth function
∫ u
−∞
(rv)dv = u
(ru) + r−1φ(ru), which converges uni-
formly to u+ as r → ∞. Letting r → ∞ and interchanging limits produce (4.6).
An alternative approach would be simply to define g(u) to be appropriate for
a one-sided change while having the required smoothness in u. An example is
g(u) = log[1 −p0 +p0 exp(u2
(ru)/2)], which sidesteps the technical issue, but
seems less easily motivated.
(ii) The fact that all the stopping times studied in this paper are asymptotically
exponentially distributed when there is no change can be very useful. (A simulation
illustrating this property in the case of T2 is given in Section 4.3.) To simulate the
ARL, it is not necessary to simulate the process until the stopping time T , which
can be computationally time consuming, but only until a time m when we are
able to estimate P∞{T ≤ m} with a reasonably small percentage error. For the
numerical examples given later, we have occasionally used this shortcut with the
value of m that makes this probability 0.1 or 0.05.
(iii) Although the mathematical assumptions involve large values of N , some
numerical experimentation for T2(p0) shows that (4.5) gives roughly the correct
values even for N = 1 or 2. For p0 = 1 (4.5) provides numerical results similar to
those given for the generalized likelihood ratio statistic in [14].
(iv) Theorem 1 allows us to approximate the ARL for T2 and T4. The stop-
ping rule Tmax is straightforward to handle, since the minimum of N independent
exponentially distributed random variables is itself exponentially distributed. The
678 Y. XIE AND D. SIEGMUND
stopping rules T1 and T3, where g is composed with +t,k,δ , can be handled by
a similar argument with one important difference. Now the cumulant generating
function ψ(θ) depends on w = t − k, so the equation defining θ must be solved
for each value of w, and the resulting approximation summed over possible values
of w. Fortunately only a few terms make a substantial contribution to the sum,
except when δ is very small. For the results reported below, the additional amount
of computation is negligible.
4.2. Expected detection delay. After a change-point occurs, we are interested
in the expected number of additional observations required for detection. For the
detection rules considered in this paper, the maximum expected detection delay
over κ ≥ 0 is attained at κ = 0. Hence we consider this case.
Here we are unable to consider stopping times defined by a general function g,
so we consider the specific functions involved in the definitions of T2 and T4.
Let g(u,p0) = log(1−p0 +p0 exp[(u+)2/2]) or [(u+)2/2+ log(p0)]+, and let U
denote a standard normal random variable. Recall that N denotes the set of sensors
at which there is a change, |N | is the cardinality of this set and p = |N |/N is the
true fraction of sensors that are affected by the change. For each n ∈ N the mean
value changes from 0 to μn > 0, and for n ∈ N c the distribution remains the same
as before the change-point. Let
=
(∑
n∈N
μ2n
)1/2
.(4.8)
Note that the Kullback–Leibler divergence of a vector of observations after the
change-point from a vector of observations before the change-point is 2/2, which
determines the asymptotic rate of growth of the detection statistic after the change-
point. Using Wald’s identity [13], we see to a first-order approximation that the
expected detection delay is 2b/ 2, provided that the maximum window size, m1,
is large compared to this quantity. In the following derivation we assume m1

2b/ 2.
In addition, let
S˜t
t∑
i=1
zi(4.9)
be a random walk where the increments zi are independent and identically dis-
tributed with mean 2/2 and variance 2. Let τ = min{t : S˜t > 0}. Our approxi-
mation to the expected detection delay given below depends on two related quan-
tities. The first is
ρ( ) = 12E
{
S˜2τ
}
/E{S˜τ }(4.10)
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 679
for which exact computational expressions and useful approximations are available
in [13]. In particular,
ρ( ) = E{z21}/(2E{z1})−
∞∑
i=1
i−1E
{
S˜−i
}= 2/4 + 1 − ∞∑
i=1
i−1E
{
S˜−i
}
,(4.11)
where (x)− = −min{x,0}. The second quantity is E{mint≥0 S˜t }, which according
to (Problem 8.14 in [13]) is given by
E
{
min
t≥0 S˜t
}
= ρ( ) − 1 − 2/4.(4.12)
The following approximation refines the first-order result for the expected de-
tection delay. Recall that E0 denotes expectation when the change-point κ = 0.
THEOREM 2. Suppose b → ∞, with other parameters held fixed. Then for
T = T2 or T4,
E
0{T } = 2 −2
[
b + ρ( ) − |N | logp0 − |N |/2 + E
{
min
t≥0 S˜t
}
(4.13)
− (N − |N |)E{g(U,p0)}+ o(1)].
The following calculation provides the ingredients for a proof of (4.13). For de-
tails in similar problems involving a single sequence, see [10] and [14]. For conve-
nience we assume that T = T2, but there is almost no difference in the calculations
when T = T4. Let k0 = b1/2. For k < T − k0, we can write the detection statistic
at the stopping time T as follows, up to a term that tends to zero exponentially fast
in probability:
Zk,T =
N∑
n=1
g(Un,k,T ,p0)
= ∑
n∈N
g(Un,k,T ,p0) +
∑
n∈N c
g(Un,k,T ,p0)
= ∑
n∈N
log
(
p0 exp
{(
U+n,k,T
)2
/2
}[
1 + 1 − p0
p0
exp
{−(U+n,k,T )2/2}
])
+ ∑
n∈N c
g(Un,k,T ,p0)
= ∑
n∈N
[
logp0 + (U+n,k,T )2/2]+ ∑
n∈N c
g(Un,k,T ,p0)(4.14)
+ ∑
n∈N
log
(
1 + 1 − p0
p0
exp
{−(U+n,k,T )2/2}
)
680 Y. XIE AND D. SIEGMUND
= |N | logp0 +
∑
n∈N
(
U+n,k,T
)2
/2 + ∑
n∈N c
g(Un,k,T ,p0) + o(1)
= |N | logp0 +
∑
n∈N
[
(Sn,T − Sn,k)+]2/2(T − k)
+ ∑
n∈N c
g(Un,k,T ,p0) + o(1).
The residual term
∑
n∈N log(1 + (1 − p0) exp{−(U+n,k,T )2/2}/p0) tends to zero
exponentially fast when b → ∞ because when b → ∞, T → b/ , and n ∈ N ,
(U+n,k,T )2 grows on the order of μ2n(T − k) > μ2nk0 = μ2n
√
b.
We then use the following simple identity to decompose the second term
in (4.14) for the affected sensors into two parts:
(
S+n,t
)2
/2t = S2n,t /2t −
(
S−n,t
)2
/2t
(4.15)
= μn(Sn,t − μnt/2) + (Sn,t − μnt)2/2t − (S−n,t )2/2t.
From the preceding discussion, we see that max0≤kwhile maxT−k0≤kprobability the max over all k is attained for k < T − k0, so from (4.15) and (4.14)
we have
max
0≤k= max
0≤kN∑
n=1
g(Un,k,T ,p0) + o(1)
= |N | logp0
+ max
0≤k[∑
n∈N
μn
[
(Sn,T − Sn,k) − (T − k)μn/2]
+ ∑
n∈N
[
(Sn,T − Sn,k) − (T − k)μn]2/[2(T − k)]
− [(Sn,T − Sn,k)−]2/2(T − k)
(4.16)
+ ∑
n∈N c
g(Un,k,T ,p0)
]
+ o(1)
= |N | logp0 +
∑
n∈N
μn(Sn,T − T μn/2)
+ max
0≤k[
− ∑
n∈N
μn(Sn,k − kμn/2)
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 681
+ ∑
n∈N
[
(Sn,T − Sn,k) − (T − k)μn]2/[2(T − k)]
− ∑
n∈N
[
(Sn,T − Sn,k)−]2/[2(T − k)]
+ ∑
n∈N c
g(Un,k,T ,p0)
]
+ o(1).
The following lemma forms the basis for the rest of the derivation. The proof is
omitted here; for details see [20] (or [14] for the special case N = 1).
LEMMA 4.1. For k0 = b1/2, asymptotically as b → ∞
max
0≤k[
− ∑
n∈N
μn(Sn,k − kμn/2) +
∑
n∈N
[(Sn,T − Sn,k) − (T − k)μn]2
2(T − k)
− ∑
n∈N
[(Sn,T − Sn,k)−]2
2(T − k) +
∑
n∈N c
g(Un,k,T ,p0)
]
= ∑
n∈N
(Sn,T − T μn)2/2T +
∑
n∈N c
g(Un,0,T ,p0)
+ max
0≤k[
− ∑
n∈N
μn(Sn,k − kμn/2)
]
+ op(1),
where op(1) converges to 0 in probability.
By taking expectations in (4.16), letting b → ∞ and using Lemma 4.1, we have
E
0
{
max
0≤kN∑
n=1
g(Un,k,T ,p0)
}
= E0
{
|N | logp0 +
∑
n∈N
μn(Sn,T − T μn/2) +
∑
n∈N
(Sn,T − T μn)2
2T
(4.17)
+ ∑
n∈N c
g(Un,0,T ,p0) + max
0≤k[
− ∑
n∈N
μn(Sn,k − kμn/2)
]}
+ o(1).
We will compute each term on the right-hand side of (4.17) separately. We will
need the lemma due to Anscombe and Doeblin (see Theorem 2.40 in [13]), which
states that the standardized randomly stopped sum of random variables are asymp-
totically normally distributed under quite general conditions.
682 Y. XIE AND D. SIEGMUND
(i) By Wald’s identity [13],
E
0
{∑
n∈N
μn(Sn,T − T μn/2)
}
= E0{T } 2/2.(4.18)
(ii) By the Anscombe–Doeblin lemma, (Sn,T − T μn)/T 1/2 is asymptotically
normally distributed with zero mean and unit variance. Hence
∑
n∈N (Sn,T −
T μn)
2/T is asymptotically a sum of independent χ21 random variables, so
E
0
{∑
n∈N
(Sn,T − T μn)2/2T
}
= |N |/2 + o(1).(4.19)
(iii) Similarly,
E
0
{ ∑
n∈N c
g(Un,0,T ,p0)
}
→ (N − |N |)E0{g(U,p0)}.(4.20)
(iv) The term −∑n∈N μn(Sn,k − μnk/2) (k ≥ 0) is a random walk with neg-
ative drift − 2/2 and variance 2. Hence E0{max0≤k∑
n∈N μn(Sn,k −
kμn/2)]} converges to the expected minimum of this random walk, which has
the same distribution as mint≥0 S˜t defined above.
Having evaluated the right-hand side of (4.17), we now consider the left-hand
side, to which we will apply a nonlinear renewal theorem. This requires that we
write the process of interest as a random walk and a relatively slowly varying
remainder, and follows standard lines by using a Taylor series approximation to
show that for large values of t and bounded values of k (cf. [10, 14], and the
argument already given above) the asymptotic growth of ∑Nn=1 g(Un,k,t , p0) for
t > κ is governed by the random walk
∑
n∈N μn(Sn,t − tμn/2), which has mean
value t 2/2 and variance t 2. By writing
E
0
{
max
0≤kN∑
n=1
g(Un,k,T ,p0)
}
= b + E0
{
max
0≤kN∑
n=1
g(Un,k,T ,p0) − b
}
,(4.21)
and using nonlinear renewal theory to evaluate the expected overshoot of the pro-
cess of (4.9) over the boundary ([13], Chapter IX), we obtain
E
0
{
max
0≤kN∑
n=1
g(Un,k,T ,p0) − b
}
→ ρ( ).(4.22)
REMARKS. (i) Although the proof of Theorem 2 follows the pattern of argu-
ments given previously in the case N = 1, unlike that case where the asymptotic
approximation is surprisingly accurate even when the EDD is relatively small, here
the accuracy is quite variable. The key element in the derivation is the asymptotic
linearization of g(U+n,k,t , p0) for each n ∈ N into a term involving a random walk
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 683
and a remainder. A simple test for conditions when the approximation will be rea-
sonably accurate is to compare the exact value of E0{Z0,t }, which is easily evalu-
ated by numerical integration, to the expectation of the linearized approximation,
then take t large enough to make these two expectations approximately equal. If
such a value of t makes the expectations less than or equal to b, the approximation
of the theorem will be reasonably accurate. Indeed the preceding argument is sim-
ply an elaboration of these equalities at t = T combined with Wald’s identity to
extract E0{T } from the random walk, and numerous technical steps to approximate
the nonnegligible terms in the remainders. For a crude, but quite reliable approx-
imation that has no mathematically precise justification that we can see, choose t
to satisfy E0{Z0,t } = b. Fortunately the EDD is easily simulated when it is small,
which is where problems with the analytic approximation arise.
(ii) In principle the same method can be used to approximate the expected de-
tection delay of T1 or T3. In some places the analysis is substantially simpler, but
in one important respect it is more complicated. In the preceding argument, for
n ∈ N c the term involving the expected value of g(Un,0,T ,p0) is very simple,
since U2n,0,T has asymptotically a χ2 distribution. For the stopping rules T1 and
T3, the term g(n,0,T ,p0) does not have a limiting distribution, and in fact for
n ∈ N c it converges to 0 as b → ∞. However, there are typically a large number
of these terms, and in many cases T is relatively small, nowhere near its asymp-
totic limit. Hence it would be unwise simply to replace this expectation by 0. To a
crude first-order approximation T ∼ b/[δ0(∑n∈N μn − δ0/2)] = t0, say. Although
it is not correct mathematically speaking, an often reasonable approximation can
be obtained by using the term −(N − |N |)E0{n,0,t0} to account for the statis-
tics associated with sequences unaffected by the change-point. Some examples are
included in the numerical examples in Table 5.
4.3. Accuracy of the approximations. We start with examining the accuracy
of our approximations for the ARL and the EDD in (4.5) and (4.13). For a Monte
Carlo experiment we use N = 100 sensors, m1 = 200 and μn = 1 for all affected
data streams. The comparisons for different values of p0 between the theoretical
and Monte Carlo ARLs obtained from 500 Monte Carlo trials are given in Tables 1
and 2, which show that the approximation in (4.5) is quite accurate.
Figure 1 illustrates the fact that T2(0.1) is approximately exponentially dis-
tributed.
Results for the EDD obtained from 500 Monte Carlo trials are given in Table 3.
Although the approximation seems reasonable, it does not appear to be as accurate
as the approximation for the ARL. Since the EDD requires considerably less com-
putational effort to simulate and needs to be known only roughly when we choose
design parameters for a particular problem, there is less value to an accurate ana-
lytic approximation.
We have performed considerably more extensive simulations that yield results
consistent with the small experiments reported in Tables 1, 2 and 3. Since the
684 Y. XIE AND D. SIEGMUND
TABLE 1
ARL of T2(p0),m1 = 200
p0 b Theory Monte Carlo
0.3 31.2 5001 5504
0.3 32.3 10,002 10,221
0.1 19.5 5000 4968
0.1 20.4 10,001 10,093
0.03 12.7 5001 4830
0.03 13.5 10,001 9948
TABLE 2
ARL of T4(p0), m1 = 200
p0 b Theory Monte Carlo
0.3 24.0 5000 5514
0.1 15.1 5000 5062
0.03 10.8 5000 5600
parameter p0 defining T2 must be chosen subjectively, it is interesting to observe
that Table 3 suggests these procedures are reasonably robust with respect to the
choice of p0, and choosing p0 somewhat too large seems less costly than choosing
FIG. 1. The tail probability P{T2(0.1) > m}. Approximate theoretical values are obtained
from (4.5); numerical values are obtained from 500 Monte Carlo trials.
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 685
TABLE 3
EDDs of T2(p0) and T4(p0) with ARL ≈ 5000, μ = 1, and m1 = 200
p p0 Theory T2(p0) Monte Carlo T2(p0) Theory T4(p0) Monte Carlo T4(p0)
0.3 0.3 3.5 3.2 4.2 3.5
0.1 0.3 6.2 6.5 7.1 6.6
0.3 0.1 5.2 3.6 5.1 4.1
0.1 0.1 7.2 6.7 7.0 7.1
0.03 0.1 13.9 14.3 13.5 14.3
0.03 0.03 13.9 14.2 13.7 14.6
p0 too small. More extensive calculations bear out this observation. We return to
the problem of choosing p0 in Section 6.
5. Numerical comparisons. In this section, we compare the expected detec-
tion delays for several procedures when their ARLs are all approximately 5000.
The thresholds are given in Table 4, where we assume N = 100, and m1 = 200 for
those procedures for which a limited window size is appropriate. Procedure (3.7)
is denoted by T2(p0). For Mei’s procedure we put δn = 1. The procedures in (3.9)
are denoted by T3(p0, δ). Recall that T2(1) uses the generalized likelihood ra-
tio statistic and T3(1, δ) is similar to the procedure proposed by Tartakovsky and
Veeravalli [19], but we have inserted the positive part to avoid the problems men-
tioned in Section 3. The expected detection delays are obtained from 500 Monte
Carlo trials and are listed in Table 5. For some entries, values from our asymptotic
approximation are given in parentheses.
Note that the max procedure (3.8) has the smallest detection delay when p =
0.01, but it has the largest delay for p greater than 0.1. The procedures defined by
T2 and by T3 are comparable. Mei’s procedure performs well when p is large, but
poorly when p is small.
TABLE 4
Thresholds for ARL ≈ 5000, m1 = 200
Procedure b Monte Carlo ARL
Max 12.8 5041
T2(1) 53.5 4978
T2(0.1) 19.5 5000
Mei 88.5 4997
T3(0.1,1) 12.4 4948
T3(1,1) 41.6 4993
686 Y. XIE AND D. SIEGMUND
TABLE 5
EDD with N = 100 obtained from 500 Monte Carlo trials. Thresholds for ARL 5000 are listed in
Table 4. Theoretical approximations for EDD are in parentheses
p Method EDD, μ= 1 EDD, μ= 0.7 EDD, μ= 1.3
0.01 max 25.5 49.6 16.3
T2(1) 52.3 (56.9) 105.5 (114.6) 32.9 (34.1)
T2(0.1) 31.6 (32.5) 59.4 (64.9) 20.3 (19.7)
Mei 53.2 103.8 38.1
T3(0.1,1) 29.1 (29.3) 63.3 (59.0) 19.1 (19.1)
T3(1,1) 82.0 (83.6) 213.7 (193.5) 53.3 (53.5)
0.03 max 18.1 33.3 11.6
T2(1) 18.7 (19.3) 35.8 (38.4) 12.6 (11.7)
T2(0.1) 14.2 (13.9) 26.7 (27.5) 9.3 (8.5)
Mei 23.0 41.6 16.4
T3(0.1) 13.4 26.9 9.2
T3(1) 27.2 66.0 16.3
0.05 max 15.5 28.4 9.7
T2(1) 12.2 (11.6) 21.8 (23.0) 7.9 (7.1)
T2(0.1) 10.4 (10.1) 18.9 (19.9) 6.9 (6.2)
Mei 15.7 26.9 11.4
T3(0.1,1) 9.8 (9.8) 18.6 (21.4) 7.0 (6.8)
T3(1,1) 15.5 (16.2) 38.8 (39.8) 9.0 (9.7)
0.1 max 12.6 23.0 8.4
T2(1) 6.7 (5.9) 11.8 (11.3) 4.7 (3.7)
T2(0.1) 6.7 (7.2) 11.6 (14.1) 4.6 (4.5)
Mei 9.6 15.4 7.4
T3(0.1,1) 7.1 (7.6) 11.9 (16.7) 5.3 (5.3)
T3(1,1) 6.8 (7.3) 15.7 (19.6) 4.6 (4.5)
0.3 max 9.6 16.7 6.6
T2(1) 3.0 (2.0) 4.4 (3.5) 2.4 (1.4)
T2(0.1) 3.5 (5.2) 5.6 (10.1) 2.7 (3.3)
Mei 4.9 7.0 4.0
T3(0.1,1) 4.6 6.7 3.9
T3(1,1) 3.0 4.3 2.5
0.5 max 8.6 14.4 5.8
T2(1) 2.3 3.0 2.0
T2(0.1) 2.8 4.0 2.1
Mei 3.8 5.0 3.0
T3(0.1,1) 4.0 5.4 3.3
T3(1,1) 2.3 3.0 2.0
1 max 7.2 12.1 5.1
T2(1) 2.0 2.0 2.0
T2(0.1) 2.0 2.6 2.0
Mei 3.0 3.4 2.3
T3(0.1,1) 3.4 4.3 3.0
T3(1,1) 2.0 2.1 2.0
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 687
TABLE 6
Comparison of EDD, parallel and simple procedures
p μ T2(0.1), EDD Parallel, EDD
0.1 0.7 6.5 6.4
0.005 1.0 27.1 22.9
0.005 0.7 54.5 45.8
0.25 0.3 12.0 10.5
0.4 0.2 14.4 12.3
0.0025 1.5 23.3 17.8
6. Parallel mixture procedure. The procedures considered above depend on
a parameter p0, which presumably should be chosen to be close to the unknown
true fraction p. While Table 5 suggests that the value of p0 is fairly robust when p0
does not deviate much from the true p, to achieve robustness over a wider range of
the unknown parameter p, we consider a parallel procedure that combines several
procedures, each using a different p0 to monitor a different range of p values. The
thresholds of these individual procedures will be chosen so that they have the same
ARL. For example, we can use two different values of p0, say a small p0 = p1 and
a large p0 = p2, and then choose thresholds b1 and b2 to obtain the same ARL for
these two procedures. The parallel procedure claims a detection if at least one of
the component procedures reaches its threshold, specifically
Tparallel min
{
T2(p1), T2(p2)
}
.(6.1)
To compare the performance of the parallel procedure with that of a single T2,
we consider a case with N = 400 and m1 = 200. For the single mixture pro-
cedure we use the intermediate value p0 = 0.10 and threshold value b = 44.7,
so P∞{T2 ≤ 1000} ≈ 0.10 and hence the ARL ≈ 10,000. For the parallel pro-
cedure we consider the values p1 = 0.02 and p2 = 0.33. For the threshold val-
ues b1 = 21.2 and b2 = 87.7, respectively, we have P∞{T2(pi) ≤ 1000} ≈ 0.05,
i = 1,2. By the Bonferroni inequality P∞{min[T2(p1), T2(p2)] ≤ 1000} ≤ 0.1, so
conservatively E∞{Tparallel} ≥ 10,000. Table 6 shows that the expected detection
delays of the parallel procedure are usually smaller than those of the single proce-
dure, particularly for very small or very large p. Presumably these differences are
magnified in problems involving larger values of N , which have the possibility of
still smaller values of p.
Simulations indicate that because of dependence between the two statistics used
to define the parallel procedure, the ARL is actually somewhat larger than the Bon-
ferroni approximation suggested. Since the parallel procedure becomes increas-
ingly attractive in larger problems, which provide more room for improvement
over a single choice of p0, but which are also increasingly difficult to simulate, it
would be interesting to develop a more accurate theoretical approximation for the
ARL.
688 Y. XIE AND D. SIEGMUND
An attractive alternative to the parallel procedure would be to use a weighted
linear combination for different values of p0 of the statistics used to define T2
or T3. Our approximation for the ARL can be easily adapted, but some modest
numerical exploration suggests that the expected detection delay is not improved
as much as for the parallel procedure.
7. Profile-based procedure for structured problems. Up to now we have
assumed there is no spatial structure relating the change-point amplitudes at differ-
ence sensors. In this section we will consider briefly a structured problem, where
there is a parameterized profile of the amplitude of the signal seen at each sensor
that is based on the distance of the sensor to the source of the signal. Assuming
we have some knowledge about such a profile, we can incorporate this knowledge
into the definition of an appropriate detection statistic. Our developments follow
closely the analysis in [16].
Assume the location of the nth sensor is given by its coordinates xn, n =
1, . . . ,N at points in Euclidean space, which for simplicity we take to be on an
equi-spaced grid. We assume that the source is located in a region D, which is
a subset of the ambient Euclidean space. In our example below we consider two
dimensional space, but three dimensions would also be quite reasonable. Assume
the change-point amplitude at the nth sensor is determined by the expression
μn =
M∑
m=1
rmαzm(xn),(7.1)
where M is the number of sources, zm ∈ D is the (unknown) spatial location of the
mth source, αz(x) is the profile function, and the scalar rm is an unknown param-
eter that measures the strength of the mth signal. The profile function describes
how the signal strength of the mth point source has decayed at the nth sensor.
We assume some knowledge about this profile function is available. For example,
αz(x) is often taken to be a decreasing function of the Euclidean distance between
z and x. The profile may also depend on finitely many parameters, such as the rate
of decay of the function. See [11] or [12] for examples in a fixed sample context.
If the parameters rm are multiplied by a positive constant and the profile αzm(xn)
divided by the same constant, the values of μn do not change. To avoid this lack
of identifiability, it is convenient to assume that for all z the profiles have been
standardized to have unit Euclidean norm, that is,
∑
x α
2
z (x) = 1 for all z.
7.1. Profile-based procedure. Under the assumption that there is at most one
source, say at z, for observations up to time t with a change-point assumed to
equal k, the log likelihood function for observations from all sensors (3.1) is
(t, k, r, z) =
N∑
n=1
[
rαz(xn)(Sn,t − Sn,k) − r2(t − k)α2z (xn)/2
]
.(7.2)
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 689
When maximized with respect to r this becomes
1
2
[{∑
n
αz(xn)Un,k,t
}+]2
.(7.3)
Maximizing the function (7.3) with respect to the putative change-point k and the
source location z, we obtain the log GLR statistic and a profile-based stopping rule
of the form
Tprofile = inf
{
t : max
0≤k[{∑
n
αz(xn)Un,k,t
}+]2
≥ b
}
.(7.4)
If the model is correct, (7.4) is a matched-filter type of statistic.
7.2. Theoretical ARL of profile-based procedure. Using the result presented
in [16], we can derive an approximation for the ARL of the profile-based proce-
dure. We consider in detail a special case where d = 2 and the profile is given by
a Gaussian function
αz(x) = 1√2πβ e
−(1/(4β))‖x−z‖2, x ∈ R2, β > 0.(7.5)
The parameter β > 0 controls of rate of profile decay and is assumed known. With
minor modifications one could also maximize with respect to a range of values
of β .
Although the sensors have been assumed to be located on the integer lattice of
two-dimensional Euclidean space, it will be convenient as a very rough approxi-
mation to assume that summation over sensor locations x can be approximated by
integration over the entire Euclidean space. With this approximation,
∑
x α
2
z (x),
which we have assumed equals 1 for all z, becomes
∫
R2 α
2
z (x) dx, which by (7.5)
is readily seen to be identically 1. The approximation is reasonable if β is large, so
the effective distance between points of the grid is small, and the space D, assumed
to contain the signal, is well within the set of sensor locations (so edge effects can
be ignored and the integration extended over all of R2).
It will be convenient to use the notation
〈f,g〉 =
∫
R2
f (x)g(x) dx.(7.6)
Let α˙z denote the gradient of αz with respect to z. Then according to [16],
P
∞{Tprofile ≤ m}
∼ m exp(−b/2)(b/4π)3/221/2(7.7)
×
∫ (b/m0)1/2
(b/m1)1/2
uν2(u) du
∫
D
∣∣det (〈α˙z, α˙z 〉)∣∣1/2 dz.
690 Y. XIE AND D. SIEGMUND
To evaluate the last integral in (7.7), we see from (7.5) that α˙z satisfies
α˙z(x) = αz(x)(x − z)/(2β).(7.8)
Hence by (7.6) 〈α˙z, α˙z 〉 is a 2 × 2 matrix of integrals, which can be easily evalu-
ated, and its determinant equals 1/(16β4). Hence the last integral in (7.7) equals
|D|/(4β2) where |D| denotes the area of D. Arguing as above from the asymptotic
exponentiality of Tprofile, we find that an asymptotic approximation for the average
run length is given by
E
∞{Tprofile}
(7.9)
∼ 16(2π3)1/2β2b−3/2 exp(b/2)/[∫ (b/m0)1/2
(b/m1)1/2
uν2(u) du · |D|
]
.
7.3. Numerical examples. In this section we briefly compare the unstructured
detection procedure based on T2 with the profile-based procedure in the special
case that the assumed profile is correct.
Assume that the profile is given by the Gaussian function (7.5) with parameter
β = 1 and both procedures are window-truncated with m0 = 1, m1 = 100. The
number of sensors is N = 625 distributed over a 25× 25 square grid with center at
the origin. In this situation, approximately p = 0.016 sensors are affected. In the
specification of T2, we take p0 = 0.05.
The thresholds are chosen so that the average run lengths when there is no
change-point are approximately 5000. Using (7.7), we obtain P∞{Tprofile ≤ 250} =
0.050 for b = 29.5. From 500 Monte Carlo trials we obtained the threshold 26.3,
so the theoretical approximation appears to be slightly conservative.
To deal with a failure to know the true rate of decay of the signal with distance,
we could maximize over β , say, for β ∈ [0.5,5]. A suitable version of (7.7) indi-
cates the threshold would be 33.8. This slight increase to the threshold suggests
that failure to know the appropriate rate of decay of the signal with distance leads
to a relatively moderate loss of detection efficiency.
For comparisons of the EDD, we used for the profile-based procedure the
threshold 26.3, given by simulation, while for T2(0.05) we used the analytic ap-
proximation, which our studies have shown to be very accurate. Table 7 compares
TABLE 7
Comparison of EDD, profile-based and unstructured procedures
b EDD r = 1 EDD, r = 1.5
Profile-based procedure 26.3 25.6 12.3
Unstructured procedure 39.7 78.3 35.8
SEQUENTIAL MULTI-SENSOR CHANGE-POINT DETECTION 691
the expected detection delay of the profile-based procedure with that of the mix-
ture procedure. As one would expect from the precise modeling assumptions, the
profile-based procedure is substantially more powerful.
In many cases there will be only a modest scientific basis for the assumed pro-
file, especially in multidimensional problems. The distance between sensors rela-
tive to the decay rate of the signal is also an important consideration. It would be
interesting to compare the structured and the unstructured problems when the as-
sumed profile differs moderately or substantially from the true profile, perhaps in
the number of sources of the signals, their shape, the rate of decay, or the locations
of the sensors.
8. Discussion. For an unstructured multi-sensor change-point problem we
have suggested and compared a number of sequential detection procedures. We
assume that the pre- and post-change samples are normally distributed with known
variance and that both the post-change mean and the set of affected sensors are un-
known. For performance analysis, we have derived approximations for the average
run length (ARL) and the expected detection delay (EDD), and have shown that
these approximations have reasonable accuracy. Our principal procedure depends
on the assumption that a known fraction of sensors are affected by the change-
point. We show numerically that the procedures are fairly robust with respect to
discrepancies between the actual and the hypothesized fractions, and we suggest
a parallel procedure based on two or more hypothesized fractions to increase this
robustness.
In a structured problem, we have shown that knowledge of the correct structure
can be implemented to achieve large improvements in the EDD. Since the assumed
structure is usually at best only approximately correct, an interesting open ques-
tion is the extent to which failure to hypothesize the appropriate structure com-
promises these improvements. One possible method to achieve robustness against
inadequacy of the structured model would be a parallel version of structured and
unstructured detection.
REFERENCES
[1] ALDOUS, D. (1989). Probability Approximations Via the Poisson Clumping Heuristic. Applied
Mathematical Sciences 77. Springer, New York. MR0969362
[2] CHEN, M., GONZALEZ, S., VASILAKOS, A., CAO, H. and LEUNG, V. C. M. (2010). Body
area networks: A survey. Mobile Netw. Appl. 16 171–193.
[3] LAI, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems.
J. Roy. Statist. Soc. Ser. B 57 613–658. MR1354072
[4] LÉVY-LEDUC, C. and ROUEFF, F. (2009). Detection and localization of change-points in high-
dimensional network traffic data. Ann. Appl. Stat. 3 637–662. MR2750676
[5] LORDEN, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Statist. 42
1897–1908. MR0309251
[6] MEI, Y. (2010). Efficient scalable schemes for monitoring a large number of data streams.
Biometrika 97 419–433. MR2650748
692 Y. XIE AND D. SIEGMUND
[7] PAGE, E. S. (1954). Continuous inspection schemes. Biometrika 41 100–115. MR0088850
[8] PAGE, E. S. (1955). A test for a change in a parameter occurring at an unknown point.
Biometrika 42 523–527. MR0072412
[9] PETROV, A., ROZOVSKII, B. L. and TARTAKOVSKY, A. G. (2003). Efficient Nonlinear Fil-
tering Methods for Detection of Dim Targets by Passive Systems, Vol. IV. Artech House,
Boston, MA.
[10] POLLAK, M. and SIEGMUND, D. (1975). Approximations to the expected sample size of cer-
tain sequential tests. Ann. Statist. 3 1267–1282. MR0403114
[11] RABINOWITZ, D. (1994). Detecting clusters in disease incidence. In Change-point Prob-
lems (South Hadley, MA, 1992). Institute of Mathematical Statistics Lecture Notes—
Monograph Series 23 255–275. IMS, Hayward, CA. MR1477929
[12] SHAFIE, K., SIGAL, B., SIEGMUND, D. and WORSLEY, K. J. (2003). Rotation space random
fields with an application to fMRI data. Ann. Statist. 31 1732–1771. MR2036389
[13] SIEGMUND, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New
York. MR0799155
[14] SIEGMUND, D. and VENKATRAMAN, E. S. (1995). Using the generalized likelihood ratio
statistic for sequential detection of a change-point. Ann. Statist. 23 255–271. MR1331667
[15] SIEGMUND, D. and YAKIR, B. (2007). The Statistics of Gene Mapping. Springer, New York.
MR2301277
[16] SIEGMUND, D. and YAKIR, B. (2008). Detecting the emergence of a signal in a noisy image.
Stat. Interface 1 3–12. MR2425340
[17] SIEGMUND, D., YAKIR, B. and ZHANG, N. R. (2011). Detecting simultaneous variant inter-
vals in aligned sequences. Ann. Appl. Stat. 5 645–668. MR2840169
[18] ŠIRJAEV, A. N. (1963). Optimal methods in quickest detection problems. Theory Probab. Appl.
8 22–46.
[19] TARTAKOVSKY, A. G. and VEERAVALLI, V. V. (2008). Asymptotically optimal quick-
est change detection in distributed sensor systems. Sequential Anal. 27 441–475.
MR2460208
[20] XIE, Y. (2011). Statistical signal detection with multi-sensor and sparsity. Ph.D. thesis, Stan-
ford Univ.
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
DUKE UNIVERSITY
DURHAM, NORTH CAROLINA, 27705
USA
E-MAIL: yao.c.xie@gmail.com
DEPARTMENT OF STATISTICS
STANFORD UNIVERSITY
STANFORD, CALIFORNIA, 94305
USA
E-MAIL: siegmund@stanford.edu