For: Acta Numerica
Preconditioning
A. J. Wathen
Mathematical Institute, University of Oxford,
Andrew Wiles Building, Radcliffe Observatory Quarter,
Woodstock Road, Oxford OX2 6GG, UK
E-mail: wathen@maths.ox.ac.uk
The computational solution of problems can be restricted by the availability
of solution methods for linear(ized) systems of equations. In conjunction with
iterative methods, preconditioning is often the vital component in enabling
the solution of such systems when the dimension is large. We attempt a broad
review of preconditioning methods.
CONTENTS
1 Introduction
2 Krylov subspace iterative methods
3 Symmetric and positive definite matrices
4 Semidefinite matrices
5 Symmetric and indefinite matrices
6 Nonsymmetric matrices
7 Some comments on symmetry
8 Some comments on practical computing
9 Summary
References
1. Introduction
This review aims to give a comprehensive view of the current state of pre-
conditioning for general linear(ized) systems of equations. However we start
with a specific example.

Consider a symmetric matrix with constant diagonals:
$$
A = \begin{bmatrix}
a_0 & a_1 & \cdots & \cdots & a_{n-1} \\
a_1 & a_0 & a_1 & & \vdots \\
\vdots & a_1 & a_0 & \ddots & \vdots \\
\vdots & & \ddots & \ddots & a_1 \\
a_{n-1} & \cdots & \cdots & a_1 & a_0
\end{bmatrix} \in \mathbb{R}^{n \times n}.
$$
Such a matrix is called a symmetric Toeplitz matrix. It is clearly defined
by just n real numbers, but what is the best solution method for linear sys-
tems involving A? The fastest known stable factorisation (direct) methods
for solving a linear system $Ax = b$ for $x \in \mathbb{R}^n$ given $b \in \mathbb{R}^n$ require $O(n^2)$
floating point computations—this is still better than the $O(n^3)$ computations
required for a general real $n \times n$ matrix using the well-known Gauss
Elimination algorithm.
Strang (1986) observed that one could approximate A by the circulant
matrix
$$
C = \begin{bmatrix}
a_0 & a_1 & \cdots & a_{\lfloor n/2 \rfloor} & a_{\lfloor (n-1)/2 \rfloor} & \cdots & a_2 & a_1 \\
a_1 & a_0 & a_1 & \ddots & a_{\lfloor n/2 \rfloor} & \ddots & & a_2 \\
\vdots & a_1 & a_0 & \ddots & & \ddots & \ddots & \vdots \\
a_{\lfloor n/2 \rfloor} & \ddots & \ddots & \ddots & \ddots & & \ddots & a_{\lfloor (n-1)/2 \rfloor} \\
a_{\lfloor (n-1)/2 \rfloor} & \ddots & & \ddots & \ddots & \ddots & & a_{\lfloor n/2 \rfloor} \\
\vdots & \ddots & \ddots & & \ddots & \ddots & a_1 & \vdots \\
a_2 & & \ddots & a_{\lfloor n/2 \rfloor} & \cdots & a_1 & a_0 & a_1 \\
a_1 & a_2 & \cdots & a_{\lfloor (n-1)/2 \rfloor} & a_{\lfloor n/2 \rfloor} & \cdots & a_1 & a_0
\end{bmatrix} \in \mathbb{R}^{n \times n}
$$
where the outer diagonals of A have been overwritten by the central di-
agonals which are ‘wrapped around’ in a periodic manner. Circulant ma-
trices are diagonalised by a Fourier basis (Van Loan 1992) and it follows
that for any given $b \in \mathbb{R}^n$, the fast Fourier transform (FFT) (Cooley and
Tukey 1965) enables the computation of the matrix-vector product Cb and
of the solution, $y \in \mathbb{R}^n$, of $Cy = b$ in $O(n \log n)$ arithmetic operations. The
importance of Strang's observation is that C is close to A if the discarded
entries $a_{\lceil n/2 \rceil}, \ldots, a_{n-1}$ are small; precise statements will describe this in terms
of decay in the entries of A along rows/columns moving away from the diagonal.
With such a property, one can show that not only are the matrices
A and C close entrywise, but that they are spectrally close, i.e. that the
eigenvalues of $C^{-1}A$ (equivalently the eigenvalues $\lambda$ of the matrix 'pencil'
$Av = \lambda Cv$) are mostly close to 1. As a consequence, an appropriate iterative
method (it would be the conjugate gradient method if A is positive definite:
see below) will compute a sequence of vectors $x_1, x_2, \ldots$ which converge
rapidly from any starting guess, $x_0$, to the solution x of Ax = b. At each
iteration only a matrix-vector product with A and the solution of a single
system with C as well as vector operations need be computed. For any de-
sired accuracy, a fixed number of iterations not dependent on the dimension
n will be required. The outcome is that use of the FFT for solving systems
with C and for doing multiplications of vectors by A (since A can in an ob-
vious way be embedded in a circulant matrix of dimension $(2n-1) \times (2n-1)$)
guarantees that the iterative method will require only O(n log n) operations
to compute the solution.
Strang’s ‘proposal for Toeplitz matrix calculations’ is thus to employ the
circulant matrix C as a preconditioner for linear systems involving A. One
can naively think of the preconditioning as converting a linear system
Ax = b
which is not so readily solved into the system
$$C^{-1}Ax = C^{-1}b$$
for which there is a fast solution method. This is not exactly what is done:
generally $C^{-1}A$ will be nonsymmetric even though A and C are symmetric,
but it is possible to precondition with C whilst preserving symmetry of the
preconditioned system matrix; see Saad (2003, Algorithm 9.1).
This example captures the essence of preconditioning. Not always does
one have matrices which are so obviously structured (nor with potentially
so many non-zero entries), nor does one always have a clever and efficient
tool like the FFT, but, especially for matrices of large dimension, it can
be extremely beneficial to have a good matrix approximation, i.e. a good
preconditioner, when solving linear systems of equations. Indeed, often such
a preconditioner is necessary in order to enable practical computation at all
of large-scale problems within reasonable time on any given computational
platform. The reduction from O(n2) to O(n log n) in the above example
brings some large-dimensional Toeplitz systems into the range of practical
computation.
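To make the above concrete, here is a minimal computational sketch of Strang's idea, assuming Python with NumPy and SciPy are available; the Toeplitz entries $a_k = 1/(1+k)^2$ are purely illustrative (they decay away from the diagonal, so both A and C are positive definite here), and the dimension is arbitrary.

```python
# Sketch: Strang's circulant preconditioner for a symmetric positive definite
# Toeplitz system, used with preconditioned conjugate gradients.
# Illustrative data only: a_k = 1/(1+k)^2.
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 4096
a = 1.0 / (1.0 + np.arange(n)) ** 2          # first column a_0, a_1, ..., a_{n-1}

# Matrix-vector products with A via embedding in a circulant of dimension 2n-1,
# so each product costs O(n log n) using the FFT.
emb = np.concatenate([a, a[-1:0:-1]])        # (a_0, ..., a_{n-1}, a_{n-1}, ..., a_1)
emb_eig = np.fft.fft(emb)

def toeplitz_matvec(v):
    return np.real(np.fft.ifft(emb_eig * np.fft.fft(v, 2 * n - 1))[:n])

# Strang's circulant C: keep the central diagonals of A and wrap them around.
c = a.copy()
m = n // 2
c[m + 1:] = a[n - np.arange(m + 1, n)]       # c_k = a_{n-k} for k > n/2
c_eig = np.real(np.fft.fft(c))               # eigenvalues of C (real by symmetry)

def circulant_solve(v):                      # action of C^{-1}, O(n log n) per call
    return np.real(np.fft.ifft(np.fft.fft(v) / c_eig))

A_op = LinearOperator((n, n), matvec=toeplitz_matvec, dtype=float)
P_op = LinearOperator((n, n), matvec=circulant_solve, dtype=float)

b = np.ones(n)
x, info = cg(A_op, b, M=P_op)                # converges in a handful of iterations
```

The point of the sketch is that neither A nor C is ever formed explicitly: only their actions (and the action of $C^{-1}$) are applied, each at $O(n \log n)$ cost.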
One does not necessarily need to have a preconditioner as a matrix; a
linear operator (i.e. a computational procedure which applies a linear operation
to a vector) is all that is required.
This article is a personal perspective on the subject of preconditioning.
It addresses preconditioning only in the most common context of the so-
lution of linear systems of equations. Much research on this problem has
been directed at sparse matrices coming from the discretization of partial
differential equations (PDEs) by finite element, finite volume or finite differ-
ence methods. This is certainly one major source for such problems, but far
from the only area where large dimensional linear systems arise. It is true,
however, that underlying properties of a PDE problem are often reflected in
the matrices which arise from discretization, so that matrices arising from
PDEs can inherit structural properties which allow them to not simply be
considered as arrays of numbers. Indeed there are many PDE analysts who
consider PDE matrices as simply members of families of discrete operators
with many of the properties of the underlying continuous operators from
which they are derived through discretization, see for example Mardal and
Winther (2011), or the recent book by Málek and Strakoš (2014). This can
be a very valuable viewpoint. Indeed from this perspective, not applying any
preconditioning is seen to coincide with rather an unnatural choice of map-
pings on the underlying spaces (Günnel et al. 2014). Sometimes one comes
across the perception that multigrid must be the most efficient method to
solve any PDE problem: this is far from the truth, however multigrid meth-
ods and ideas remain very important both as solvers for certain simpler
problems, and, in particular, as components of preconditioners for a wider
class of problems as we shall describe.
Whether or not a matrix system arises from PDEs, knowledge of the
source problem can be helpful in designing a good preconditioner. Purely
algebraic approaches which simply take the numerical matrix entries as
input can have merit in situations where little is known about the underlying
problem—they certainly have the useful property that it is usually possible
to apply such a method—but the generated preconditioners can be poor.
This division between approaches to preconditioning essentially comes down
to whether one is ‘given a problem’ or ‘given a matrix’ (Neytcheva 1999).
Though we emphasise much about the former, we also give pointers to and
brief description of the latter.
Preconditioners are overwhelmingly used together with Krylov subspace
iterative methods (van der Vorst 2003, Greenbaum 1997, Saad 2003, Meurant
1999, Hackbusch 1994, Elman et al. 2014b, Liesen and Strakoš 2013, Olshan-
skii and Tyrtyshnikov 2014), and the applicable iterative methods make
different requirements of a preconditioner: for example, one could employ
a nonsymmetric preconditioner for a symmetric matrix, but in general one
would then lose the possibility to use the very effective conjugate gradient or
minres Krylov subspace iterative methods. Therefore, we have structured
this paper into sections based on the symmetry and definiteness proper-
ties of the presented coefficient matrix. In essence, this subdivision can
also be thought of as being in terms of the Krylov subspace method of choice.
The coefficient matrix will usually be denoted by A as in the example above
(though different fonts will be used). For any invertible preconditioning ma-
trix or linear operator, P , there is an iterative method that can be used, but
it is usually desirable not to ignore structural properties such as symmetry
when choosing/deriving a preconditioner.
We have generally restricted consideration to real square matrices, A, but
it will be evident that some of the different preconditioning ideas could be
applied to matrices over different algebraic fields, in particular the complex
numbers (for which 'symmetric' $\rightarrow$ 'Hermitian' etc.). It is also true that
some of the ideas presented apply to rectangular matrices such as arise in
least squares problems, but, except tangentially, this paper does not consider
such situations.
Our general strategy/intention is to describe most preconditioning ap-
proaches that we are aware of, and to guide the reader towards selecting an
appropriate strategy for any particular problem. Inevitably, some precon-
ditioning ideas have arisen in the specific context of particular problems, so
brief review of such problems is included.
Over ten years ago, Benzi (2002) wrote an excellent survey of precon-
ditioning techniques which contains much wisdom and remains an excel-
lent source today, in particular with regard to incomplete factorization and
sparse approximate inverse methods. The text by Chen (2005) describes
several approaches and includes many example applications.
In section 3 preconditioning approaches for symmetric and positive defi-
nite matrices are discussed. There follows in section 4 some brief comments
on symmetric semidefinite matrices. This is followed in section 5 by precon-
ditioning for symmetric and indefinite matrices and finally for nonsymmetric
matrices in section 6. We start, however, with a brief review of iterative
methods in order to orient ourselves.
2. Krylov subspace iterative methods
For any vector $r \in \mathbb{R}^n$, if it is easy to compute the matrix-vector product
Ar, then it is easy to generate vectors in the Krylov subspaces
$$\mathcal{K}_k(A, r) = \operatorname{span}\{r, Ar, A^2 r, \ldots, A^{k-1} r\}, \qquad k = 1, 2, \ldots$$
by forming A(Ar) etc.. In particular, when A is sparse (i.e. has few non-zero
entries in any row), then such matrix-vector products are readily computed.
For general dense matrices, a matrix-vector multiplication takes $O(n^2)$ op-
erations, hence there is less possibility for preconditioning to reduce this
complexity unless some other structure can be used such as in the Toeplitz
example above.
Krylov subspace iterative methods for finding $x \in \mathbb{R}^n$ which satisfies
$Ax = b$ compute iterates $x_k$ for which
$$x_k - x_0 \in \mathcal{K}_k(A, r_0), \qquad k = 1, 2, \ldots \qquad (2.1)$$
from some initial guess $x_0$ and thus require just one matrix-vector product
computation at each iteration. The vector $r_0 = b - Ax_0$ is the residual
associated with $x_0$; in general $r_j = b - Ax_j$, $j = 0, 1, 2, \ldots$. If $x_0 = 0$ then
$x_k \in \mathcal{K}_k(A, b)$, $k = 1, 2, \ldots$.
Thus the iterates and residuals of every Krylov subspace method satisfy
$$x_k - x_0 = \sum_{j=0}^{k-1} \alpha_j A^j r_0$$
for some coefficients $\alpha_j$, $j = 0, 1, 2, \ldots, k-1$. Hence
$$x_k = x_0 + q(A) r_0 \qquad (2.2)$$
where $q$ is the polynomial of degree $k-1$ with $q(z) = \sum_{j=0}^{k-1} \alpha_j z^j$. Premultiplying
(2.2) by A and subtracting from b we obtain
$$b - Ax_k = b - Ax_0 - A\, q(A)\, r_0,$$
thus
$$r_k = r_0 - A\, q(A)\, r_0 = p(A)\, r_0 \qquad (2.3)$$
where
$$p(z) = 1 - z \sum_{j=0}^{k-1} \alpha_j z^j = 1 - \sum_{j=1}^{k} \alpha_{j-1} z^j$$
is a polynomial of degree k which satisfies $p(0) = 1$.
All Krylov subspace methods are thus described by (2.3) and each dif-
ferent Krylov subspace method is characterised by an optimality or (bi-)or-
thogonality property.
When A is symmetric and positive definite (so that $\|v\|_A = (v^T A v)^{1/2}$ defines
a vector norm), the conjugate gradient method (cg) (Hestenes and
Stiefel 1952), (Saad 2003, algorithm 6.18) requires only the one matrix-vector
product with A as well as five simple vector operations at each iteration
to compute iterates satisfying (2.1) which minimise $\|x - x_k\|_A$ over the
Krylov subspace.
When A is symmetric but indefinite, $r^T A r$ takes both positive and negative
values so that a norm can not be defined as for the conjugate gradient
method. The Krylov subspace method of choice for symmetric indefinite systems,
the minimum residual (minres) method (Paige and Saunders 1975),
(Fischer 2011, page 187) takes one matrix-vector product with A as well
as seven simple vector operations at each iteration to compute iterates satisfying
(2.1) which minimise the Euclidean norm of the residual, $\|r_k\|_I = (r_k^T r_k)^{1/2}$.
When A is nonsymmetric, there is not so obvious a method of choice,
hence several Krylov subspace methods are widely used. The most popular
is gmres (Saad and Schultz 1986), (Saad 2003, algorithm 6.9) which, sim-
ilarly to minres, computes iterates which minimise the Euclidean norm of
the residual, but by contrast to minres requires an increasing ammount
of computation and storage at each successive iteration to achieve this.
Thus gmres can be a good method if only few iterations are needed
to achieve acceptable convergence—this might be the case if one has a
good preconditioner—but it is not practical if many iterations are required.
Restarting gmres at (regular) intervals reduces the computational burden,
but also in most cases severely slows convergence. Thus use is made of
Krylov subspace methods for nonsymmetric A which require only a fixed
amount of work at each iteration, but which necessarily therefore have
more erratic convergence (Faber and Manteuffel 1984). These are gener-
ally based on the nonsymmetric Lanczos method: we mention bicgstab
(van der Vorst 1992), bicgstab($\ell$) (Sleijpen and Fokkema 1993), qmr
(Freund and Nachtigal 1991), idr(s) (Sonneveld and van Gijzen 2008),
though there are many others. There are also several variants, see Simoncini
and Szyld (2007).
For any nonsingular nonsymmetric matrix, it is always possible to attack
the symmetric and positive definite system of normal equations
$$A^T A x = A^T b, \qquad (2.4)$$
(or $A A^T y = b$, $x = A^T y$) and in some instances this has been proposed;
see for example Laird and Giles (2002). A related approach is to solve a
symmetric indefinite system in a form such as
$$\begin{bmatrix} I & A \\ A^T & 0 \end{bmatrix} \begin{bmatrix} r \\ x \end{bmatrix} = \begin{bmatrix} b \\ 0 \end{bmatrix}. \qquad (2.5)$$
However, whichever form is employed, convergence can be rather slow; $A^T$
is not generally such a good preconditioner for A! To give an idea of the
convergence that might be expected: if A was already symmetric and posi-
tive definite, then generally the number of cg iterations required for (2.4)
would be approximately the square of the number of cg iterations for the
system Ax = b. There are situations in which this may not be such a difficulty;
however, it is generally harder to find good preconditioners for $A^T A$
(or $A A^T$) than for A and this may be the main reason that more attention
and usage is made of the nonsymmetric Krylov subspace iterative meth-
ods above. An important and underappreciated general observation in this
regard is due to Braess and Peisker (1986):
Remark 1. $P^T P$ can be an arbitrarily bad preconditioner for $A^T A$ regardless
of the quality of P as a preconditioner for A.
The above remark applies even if A, P are symmetric as is shown by the
elementary example in Braess and Peisker (1986). We will see in section 5
situations where one can build a particular preconditioner P for A so that
$P^T P$ is also a good preconditioner for $A^T A$, indeed the main result in Braess
and Peisker (1986) provides sufficient conditions for this to occur.
Now, cg for (2.4) will generate a Krylov space of the form $\mathcal{K}(A^T A, A^T b)$
for example, when x0 = 0. Perhaps the most suitable implementation alge-
braically equivalent to a symmetric Krylov subspace method for the normal
equations is the lsqr algorithm of Paige and Saunders (1982) which also
computes least squares solutions for rectangular A. An advantage of these
methods is that the convergence theory of cg and minres apply.
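As a small, hedged illustration of this normal-equations route (not taken from the paper): SciPy's lsqr can be applied directly to a square nonsymmetric system and is algebraically equivalent to cg applied to (2.4), without ever forming $A^T A$; the random, well-conditioned test matrix below is purely for demonstration.

```python
# Sketch: solving a nonsymmetric system via the normal equations using LSQR,
# which works with products by A and A^T only and never forms A^T A.
import numpy as np
from scipy.sparse.linalg import lsqr

rng = np.random.default_rng(0)
n = 500
A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)  # illustrative, well conditioned
b = rng.standard_normal(n)

x = lsqr(A, b, atol=1e-12, btol=1e-12)[0]
print(np.linalg.norm(A @ x - b))                                 # small residual
```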
The reason that there are methods of choice (cg, minres) for symmetric
matrices, but a whole range of possibilities for nonsymmetric matrices arises
because of convergence guarantees. In the case of symmetric matrices there
are descriptive convergence bounds which depend only on the eigenvalues
of the coefficient matrix, thus the number of iterations which will be needed
for convergence to a given tolerance can be estimated and bounded a priori;
a good preconditioner will ensure fast convergence. These comments apply
also for normal equation approaches for nonsymmetric matrices.
By contrast, to date there are no generally applicable and descriptive
convergence bounds even for gmres; for any of the nonsymmetric methods
without a minimisation property, convergence theory is extremely limited.
The convergence of gmres has attracted a lot of attention, but the absence
of any reliable way to guarantee (or bound) the number of iterations needed
to achieve convergence to any given tolerance in most situations means that
there is currently no good a priori way to identify what are the desired
qualities of a preconditioner. This is a major theoretical difficulty, but
heuristic ideas abound.
All Krylov subspace methods are to an extent affected by rounding errors
in a computer if run for a large enough number of iterations, see Liesen and
Strakoš (2013) or the discussion in Greenbaum (1997, Chapter 4). In this
paper we ignore any such effects since an important consequence of good
preconditioning is that few iterations should be necessary for acceptable
convergence, thus round-off effects are not usually able to build up and
become a significant issue. Similarly, there is theory which describes the
eventual (i.e. asymptotic) convergence of Krylov subspace methods such as
cg, but this is also not generally relevant to preconditioned iterations—it is
rapid error or residual reduction in early iterations that is typically desired.
3. Symmetric and positive definite matrices
The standard cg convergence bound for iterative solution of Ax = b when
A is symmetric and positive definite comes straight from (2.3) and the min-
imisation condition described above and is
$$\frac{\|x - x_k\|_A}{\|x - x_0\|_A} \le \min_{p_k \in \Pi_k,\, p_k(0)=1} \ \max_{\lambda \in \lambda(A)} |p_k(\lambda)|,$$
where $\|y\|_A^2 = y^T A y$, $\Pi_k$ is the set of real polynomials of degree less or
equal to k and $\lambda(A)$ is the eigenvalue spectrum of A (Greenbaum 1997,
section 3.1). One immediately deduces that if A has a small number of
distinct eigenvalues, say s, then cg will terminate at the sth iteration with
the correct solution since there is a degree s polynomial which has these
eigenvalues as its roots. Iteration error can already be acceptably small
after few iterations when the eigenvalues are clustered. In particular if one
considers all eigenvalues to lie in a single interval $\lambda_{\min}^A \le \lambda \le \lambda_{\max}^A$, then use
of Chebyshev polynomials yields the convergence bound
$$\frac{\|x - x_k\|_A}{\|x - x_0\|_A} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k, \qquad (3.1)$$
where $\kappa = \kappa_A = \lambda_{\max}^A / \lambda_{\min}^A$ is the Euclidean norm (or 2-norm) condition
number for such a symmetric positive definite matrix; see again Greenbaum
(1997, section 3.1). An equivalent statement is that it takes approximately
$\tfrac{1}{2}\sqrt{\kappa}\,|\log(\epsilon/2)|$ iterations to ensure that $\|x - x_k\|_A / \|x - x_0\|_A \le \epsilon$.
The origin of the name preconditioning is generally now associated with
this convergence bound in the sense that reduction of the condition number,
$\kappa$, guarantees faster convergence (though for historical accuracy, see
Benzi (2002)); what we now understand as preconditioning is, however, a
much broader concept. If P is also symmetric and positive definite and the
extreme generalised eigenvalues of the 'pencil' $Av = \lambda P v$ (equivalently the
extreme eigenvalues of $P^{-1}A$) satisfy
$$\kappa_A = \frac{\lambda_{\max}^A}{\lambda_{\min}^A} \ \gg\ \frac{\lambda_{\max}^{P^{-1}A}}{\lambda_{\min}^{P^{-1}A}} \ =\ \kappa_{P^{-1}A},$$
then cg convergence for the preconditioned system is guaranteed in many
fewer iterations than for the original unpreconditioned system since (3.1)
holds now with $\kappa = \kappa_{P^{-1}A}$. Note that for cg even with preconditioning, it
remains the relative magnitude of the error in $\|\cdot\|_A$ which arises in the bound
(3.1); a symmetric positive definite preconditioner affects the eigenvalues
and thus the speed of convergence, but not the norm in which convergence
occurs. Use of a nonsymmetric preconditioner, P, would prevent use of cg;
a nonsymmetric Krylov subspace method would in that case have to be used
in general (but see Section 7).
To establish the quality of P as a preconditioner for A, it is sufficient
to find (and a priori prove if possible!) upper and lower bounds on the
eigenvalues of the form
$$\ell \le \lambda_{\min}^{P^{-1}A}, \qquad \lambda_{\max}^{P^{-1}A} \le \Upsilon \qquad (3.2)$$
or equivalently—and usually more tractably—upper and lower bounds on
the generalised Rayleigh quotient
$$\ell \le \frac{u^T A u}{u^T P u} \le \Upsilon, \qquad \text{for all } u \in \mathbb{R}^n \setminus \{0\}. \qquad (3.3)$$
In either case P
1A  ⌥/` leads directly to a convergence bound through
(3.1) since the right hand side varies monotonically with .
One does not actually replace A with $P^{-1}A$ in the cg algorithm to implement
preconditioning in the case of symmetric A since this would generally
destroy symmetry, but even when symmetry is preserved (as in Saad (2003,
Algorithm 9.1)), the action of $P^{-1}$ is required on a different vector at each
iteration.
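The following bare-bones sketch of preconditioned cg, in the spirit of Saad (2003, Algorithm 9.1) but not taken from it, is written to make explicit that only one product with A and one application of $P^{-1}$ (supplied here as a routine solve_P) are needed per iteration; the Jacobi choice in the usage comment is just a placeholder.

```python
# Sketch of preconditioned conjugate gradients: solve_P applies the action of
# P^{-1}; only one A-product and one P^{-1}-application occur per iteration.
import numpy as np

def pcg(A, b, solve_P, x0=None, tol=1e-8, maxit=1000):
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - A @ x
    z = solve_P(r)                       # P^{-1} r
    p = z.copy()
    rz = r @ z
    for k in range(maxit):
        Ap = A @ p                       # the single matrix-vector product
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = solve_P(r)                   # the single preconditioner application
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxit

# Example usage with Jacobi (diagonal) preconditioning, P = diag(A):
#   x, its = pcg(A, b, solve_P=lambda r: r / np.diag(A))
```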
One thus has clear sufficient criteria for identifying a good preconditioner,
P, in this symmetric positive definite case:
• it should be easy to solve systems of equations with the preconditioner
P as coefficient matrix, that is, the action of $P^{-1}$ on a vector should
be readily computed
• $\kappa_{P^{-1}A}$ should be small.
These are competing aims; P = I satisfies the first but not the second unless
A is already well conditioned and P = A satisfies the second but not the
first unless solution of systems with A is already easy!
3.1. Diagonal scaling
The earliest preconditioning ideas came from consideration of the first of
these criteria and are largely in the 'given a matrix' category. Taking P
to be a diagonal matrix is very simple; if in fact P = diag(A) then this is
widely called Jacobi preconditioning. For a well conditioned matrix which is
just badly scaled, this is generally all that is needed, but in general diagonal
preconditioning is likely to achieve very little in terms of reducing iterations
or computation time except in specific situations as below. There are two
related types of problem where Jacobi preconditioning (or just diagonal
scaling) is important.
Firstly, the ‘identity’ matrix for finite element computation is the mass
matrix, Q, which is the Gram matrix of the finite element basis functions
$\{\phi_i,\ i = 1, \ldots, n\}$, i.e. it has entries
$$Q = \{q_{i,j},\ i, j = 1, \ldots, n\}, \qquad q_{i,j} = \int_\Omega \rho\, \phi_i(x)\, \phi_j(x) \qquad (3.4)$$
for a domain $\Omega$. Here the potentially variable coefficient $\rho$ might, for exam-
ple, represent material density in a structure, and if this is highly variable
or if the discretization has a large variation in the size (length, area, vol-
ume) of the elements—as is inevitable on an irregular or adapted mesh—
then this variability is also represented in the diagonal entries of Q. Thus
P = diag(Q) is provably a good preconditioner in this case (Wathen 1986).
For example the gallery(’wathen’,n,n) matrix in matlab is a finite el-
ement mass matrix with randomly variable density for a so-called serendipity
finite element. The precise eigenvalue bounds of the form (3.2) or (3.3), to-
gether with tabulated values for $\ell$ and $\Upsilon$ for many types of finite elements,
are derived in Wathen (1986).
We briefly comment that in problems as above with analytic a priori
eigenvalue bounds $\ell$ and $\Upsilon$ which are (reasonably) tight, the possibility also
arises to use a fixed but small number of Chebyshev (semi-)iterations as a
(linear) preconditioner—see Wathen and Rees (2009).
The second important situation arises also through highly variable coef-
ficients; the classic example arises from the operator
$$-\nabla \cdot (K\, \nabla u)$$
where the permeability K has large jumps. For continuous finite element
approximation of such terms, Graham and Hagger (1999) were the first to
identify the utility of diagonal scaling to render the preconditioned prob-
lem more homogeneous. For such an elliptic PDE, Jacobi preconditioning
is highly unlikely to be an adequate preconditioner on its own, but when
used together with an appropriate preconditioner for a homogeneous elliptic
PDE operator—see below—the combination can be much better than just
the elliptic preconditioner by itself. In the literature there are several de-
velopments of the idea of Graham and Hagger, including for non-isotropic
permeabilities.
3.2. Incomplete Cholesky factorization
It is well known that Cholesky factorization ($LL^T$ or $LDL^T$ factorization
where L is lower triangular and D is diagonal) of a symmetric and positive
definite matrix leads immediately via forward and backward substitution to
direct solution of a matrix system. For sparse matrices, unfortunately, the
factor L is usually much less sparse (i.e. has many more non-zeros) than
A. Such fill-in is usually unavoidable, though widely used sparse direct
methods use sophisticated algorithms in attempting to minimize it (HSL
2013, Davis 2004, Amestoy et al. 2000, Li 2005). In the context of PDE
problems, the essential rule is that a sparse direct method can be a good
solver for a problem on a 2-dimensional spatial domain, but is generally
computationally infeasible for a 3-dimensional problem. If one preselects
(or dynamically selects) the sparsity pattern of L and drops terms which
would fill outside of this pattern, then one can form an incomplete Cholesky
factorization
$$A = LL^T + E$$
where E represents the error due to the dropped terms. One can additionally
threshold so that small entries are not kept in the incomplete factors. With
$P = LL^T$, it is clear that forward and backwards substitution still enable
easy solution of systems with the preconditioner, and the main remaining
question is whether the preconditioner leads to fast convergence of cg.
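A minimal sketch of the zero-fill variant just described may help fix ideas; dense storage is used purely for clarity, and no diagonal modification or shifting is included, so the factorization can break down for matrices that are far from diagonally dominant, as discussed below.

```python
# Sketch: incomplete Cholesky with zero fill-in, IC(0). L keeps exactly the
# sparsity pattern of the lower triangle of A; the preconditioner is P = L L^T
# and its action P^{-1} r is a forward and a backward triangular solve.
import numpy as np
from scipy.linalg import solve_triangular

def ichol0(A):
    n = A.shape[0]
    L = np.tril(A).astype(float)
    for k in range(n):
        L[k, k] = np.sqrt(L[k, k])               # can fail without modification/shifting
        for i in range(k + 1, n):
            if L[i, k] != 0.0:
                L[i, k] /= L[k, k]
        for j in range(k + 1, n):
            for i in range(j, n):
                if L[i, j] != 0.0:               # zero fill-in: only existing entries updated
                    L[i, j] -= L[i, k] * L[j, k]
    return L

def ic_solve(L, r):
    # apply P^{-1} = (L L^T)^{-1} by forward then backward substitution
    y = solve_triangular(L, r, lower=True)
    return solve_triangular(L.T, y, lower=False)
```

Used with the pcg sketch given earlier, solve_P = lambda r: ic_solve(L, r) supplies the preconditioner action; a practical implementation would of course work throughout with sparse storage.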
In general many different variants of this basic idea exist and software for
various incomplete factorization-based algorithms is widely available. This
is certainly a plus point, but unless A is diagonally dominant, incomplete
Cholesky algorithms do not always work well, indeed they do not always
exist without modification, see Meijerink and van der Vorst (1977), and
see Benzi (2002) for discussion of this as well as many other aspects of
incomplete factorization preconditioning. With regard to robustness, there
are certain advantages to computing an incomplete triangular factorization
not using Cholesky but via orthogonalization in the inner product generated
by A; see Benzi and Tuma (2003). This approach is called RIF (robust
incomplete factorization) and shares some similarity with the AINV sparse
approximate inverse discussed below.
Let us make some of these statements precise in the context of solving
a simple discretized elliptic boundary value problem, namely the common
5-point finite difference approximation of the Dirichlet problem for the (negative)
Laplacian
$$-\nabla^2 u = f \ \text{ in } \Omega, \qquad u \ \text{ given on } \partial\Omega,$$
which arises from second order central differencing on a regular grid with
spacing h in each of the two spatial dimensions. The choice of sign ensures
that the operator and thus the arising matrix are positive rather than negative
definite. If the domain, $\Omega$, is the unit square (so that the boundary
$\partial\Omega$ is four coordinate-aligned line segments) then the arising matrix A is
of dimension $N \approx h^{-2}$ for small h and also $\kappa_A = O(h^{-2})$, see Elman et al.
(2014b, section 1.6). Thus (3.1) predicts that $O(h^{-1})$ cg iterations would
be required for convergence of unpreconditioned cg. The simplest incomplete
Cholesky preconditioner for which L has the same sparsity pattern
as the lower triangular part of A also leads to $\kappa_{P^{-1}A} = O(h^{-2})$, though
there is significant clustering of a large number of the eigenvalues around 1;
the bound (3.1) based only on the extreme eigenvalues is pessimistic in this
situation. A variant, modified incomplete factorization (due to Gustafsson
(1978) but related to rather earlier work by Dupont et al. (1968)) reduces
this to P
1A = O(h1), and generally leads to slightly faster cg conver-
gence. Applying (3.1) in this case predicts that O(h1/2) preconditioned
cg iterations would be required for convergence, and this is close to what
is seen in practical computation. This implies that for a sequence of such
problems with smaller and smaller h, it will take an increasing number of
iterations to achieve acceptable convergence as h gets smaller (and so the
matrix dimension gets larger) – a rather undesirable feature.
Fortunately, there are much better preconditioners for this and related
elliptic PDE problems which guarantee $\kappa_{P^{-1}A} = O(1)$, so that the number of
required cg iterations is essentially constant for larger dimensional discrete
problems coming from approximation on finer grids. Such preconditioning
methods are said to have P spectrally equivalent to A. The most important
class of such methods arise from multigrid ideas which we consider next.
Of course, even in the more general context of problems not coming from
PDEs, incomplete Cholesky and its variants can be of considerable utility; in
particular, algebraic preconditioners of this form can very often be applied.
Before moving on, we comment that preconditioning techniques have been
introduced that turn out to be related to incomplete factorization. A par-
ticular example is the ‘support graph’ preconditioner introduced originally
in a conference talk by Vaidya who wrote commercial software but did not
publish on the subject. The support graph idea comes out of graph-theoretic
notions and has a somewhat discrete mathematical flavour. Later analysis
both clarified the scope of the ideas—they are now essentially extended to
cover most symmetric and diagonally dominant matrices—and made con-
nections with incomplete factorization: see Bern et al. (2006), Spielman and
Teng (2014) and references therein.
3.3. Multigrid
Since the groundbreaking paper by Brandt (1977), multigrid methods have
been at the core of scientific computing. Earlier contributions, notably by
Fedorenko (1964) and Bakhvalov (1966), are now considered to have in-
troduced the key ideas of combining a simple smoothing iteration (such
as Jacobi or Gauss-Seidel iteration) with coarse grid correction on succes-
sively coarser grids for the finite difference Laplacian, but Brandt's vision
was much broader: he showed just how fast multigrid solvers could be and
introduced many ideas to keep researchers busy for decades! For a posi-
tive definite elliptic PDE operator, it has been clear for a long time that
an appropriate multigrid method will provide an optimal solver, that is, a
solver such that the work to compute the solution scales linearly with the
dimension of the discretized problem.
Classically, multigrid is a stationary iteration (albeit a non-trivial one) for
solving Ax = b when A is a discrete Laplacian matrix (arising from finite
difference or finite elements). For V- or W-cycles, iterates satisfy single
cycle contraction of the form
$$\|x - x_k\|_A \le \eta\, \|x - x_{k-1}\|_A \qquad (3.5)$$
where the contraction factor, $\eta$, is typically around 0.1 (see, for example
Elman et al. (2014b, section 2.5)). It follows that the linear multigrid operator
(represented by a complicated and fairly dense matrix which one
would never want to use in computations) is an excellent candidate for a
preconditioner since from (3.5) it readily follows that
$$1 - \eta \le \frac{x^T A x}{x^T P x} \le 1 + \eta,$$
see Elman et al. (2014b, Lemma 4.2). That is, applying one multigrid V-cycle,
say, is equivalent to solution of a preconditioner system with preconditioner
P which guarantees rapid cg convergence because $\kappa_{P^{-1}A} \le (1 + \eta)/(1 - \eta)$.
There is little point in using multigrid cycles as a precon-
ditioner for cg in this way for simple problems; one might as well simply
use multigrid iteration to solve, but for more complicated problems this ob-
servation that a single multigrid cycle is an efficient spectrally equivalent
approximation of an elliptic PDE operator like the Laplacian is very useful;
see section 5. Note also the important point that arises here for the first
time in this review: a preconditioner can be a linear operation rather than
specifically a known matrix. Thus {3 V-cycles}, for example, is a suitable
preconditioner. On the other hand, iterating multigrid (or any other itera-
tion) to some residual tolerance will likely ensure that different numbers of
V-cycles are used at different Krylov subspace iterations and so the precon-
ditioner is not a fixed linear operator and the above convergence analysis
can not apply.
Multigrid – and multilevel methodology more generally – has now become
a huge subject (see for example Trottenberg et al. (2000)), with specialized
techniques for the treatment of several specific features which might arise in
a PDE (or other) problem. In the past couple of decades, algebraic multigrid
methods (AMG), which try to mimic a geometric multigrid approach but
use only information from the entries of a matrix rather than geometric
information regarding grids, boundary conditions, etc. have attracted much
attention and there is now widely available software: e.g. HSL MI20 (Boyle
et al. 2010), ML (Gee et al. 2006), BoomerAMG (Henson and Yang 2000). In
particular, in situations where a problem is fundamentally of the character
of an elliptic di↵erential operator but there are complicating features, an
AMG cycle can be an excellent preconditioner. This can be true even if
AMG is non-convergent since it may contract on all but a few modes; the
few corresponding outlier eigenvalues are easily dealt with by cg. It can
simply be much more convenient to employ an AMG cycle as (part of) a
preconditioner for a Krylov subspace method than to have to ‘hard code’
bespoke techniques for each different variant problem that comes up. So
common is this combined use of AMG and a Krylov subspace method that
it is rarely even mentioned that AMG is used as a preconditioner, but this
is almost always the case in practice.
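As an indication of how routine this pattern is in practice, the following sketch uses the pyamg package (assumed installed) to build a classical AMG hierarchy and pass one V-cycle per iteration to SciPy's cg as the preconditioner; the 2-D Poisson test matrix is merely a convenient stand-in for a 'Laplacian-like' problem.

```python
# Sketch: one AMG V-cycle per Krylov iteration, the combination described in
# the text, using pyamg (assumed available) with SciPy's conjugate gradients.
import numpy as np
import pyamg
from scipy.sparse.linalg import cg

A = pyamg.gallery.poisson((200, 200), format='csr')   # 2-D discrete Laplacian test matrix
b = np.ones(A.shape[0])

ml = pyamg.ruge_stuben_solver(A)                       # classical AMG setup phase
M = ml.aspreconditioner(cycle='V')                     # LinearOperator applying one V-cycle

x, info = cg(A, b, M=M)                                # iteration counts essentially mesh independent
```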
We should mention that multigrid ideas have been tried in other contexts
than just for PDE problems; see for example Giles (2008), Livne and Brandt
(2012).
It also needs mention in this section that a multigrid V- or W-cycle can
usually be quite easily made into a symmetric linear operator—see Elman
et al. (2014b, page 98)—though the so-called full multigrid cycle, for exam-
ple, can not. Positive definiteness is not usually an issue for an underlying
positive definite problem.
3.4. Domain Decomposition
For a very large dimensional problem, it might be a fairly natural idea to
split into separate subproblems and solve each separately (and usually in
parallel) to give an approximation to the solution of the overall problem.
In the original context of solving an elliptic boundary value problem on a
domain $\Omega$, this is the basic idea of domain decomposition: the subproblems
being problems expressed on subdomains $\Omega_k$, $k = 1, \ldots, N$ with $\bigcup_k \Omega_k = \Omega$.
In its simplest form, it corresponds to a block diagonal preconditioner
with each separate subproblem represented by a single diagonal block which
may or may not be overlapping, that is, which may or may not share
some variables with other subproblems. An important issue is how to treat
the variables either in the overlap regions or on the introduced boundaries
between subdomains. Domain decomposition was first employed abstractly
by Hermann Schwarz in the nineteenth century to prove an existence result
for an elliptic PDE, hence methods of this type are often called Schwarz
methods.
Like multigrid, domain decomposition is a huge subject which we have
no intention of covering with any depth here; see for example the books
(Quarteroni and Valli 1999), (Smith et al. 1996), (Toselli and Widlund
2004), (Mathew 2008) or the earlier survey (Chan and Mathew 1994). What
is clear, however, even from this brief description is that domain decompo-
sition gives rise to preconditioners which are well suited to parallel compu-
tation.
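In its simplest non-overlapping, one-level form the idea can be sketched as a block Jacobi preconditioner: the unknowns are partitioned into contiguous blocks standing in for subdomains, each diagonal block is factorized once, and applying $P^{-1}$ amounts to independent subdomain solves which could run in parallel. Dense storage and a trivial partition are used purely for illustration; this sketch lacks the coarse grid component discussed below.

```python
# Sketch: a one-level, non-overlapping (block Jacobi) Schwarz preconditioner.
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator

def block_jacobi(A, nblocks):
    n = A.shape[0]
    edges = np.linspace(0, n, nblocks + 1, dtype=int)     # contiguous "subdomains"
    blocks = list(zip(edges[:-1], edges[1:]))
    factors = [cho_factor(A[i0:i1, i0:i1]) for i0, i1 in blocks]

    def solve(r):                                          # independent subdomain solves
        z = np.empty_like(r, dtype=float)
        for (i0, i1), f in zip(blocks, factors):
            z[i0:i1] = cho_solve(f, r[i0:i1])
        return z

    return LinearOperator((n, n), matvec=solve, dtype=float)

# Usage: M = block_jacobi(A, nblocks=8); pass M as the preconditioner to cg.
```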
To give a brief idea of what can be achieved on a simple problem, we con-
sider the Poisson equation on a domain $\Omega \subset \mathbb{R}^d$ approximated by standard
(conforming) finite elements or finite differences with a mesh spacing h. If
$\Omega$ is split into non-overlapping subdomains of width H, then there are basic
block diagonal preconditioners without coarse grid solve for which one can
establish that
$$\kappa_{P^{-1}A} \le C H^{-2} (1 + \log^2(H/h)).$$
The more sophisticated domain decomposition schemes which include also
a coarse grid (or 'wire basket') solve can achieve
$$\kappa_{P^{-1}A} \le C (1 + \log^2(H/h)).$$
In either case, C is a generic constant not dependent on H,h and the bounds
translate into convergence estimates via (3.1). The size H can be considered
to be the coarse grid size in essence; the bounds above assume exact solution
of the problems on subdomains.
With the importance of parallel and distributed computing, the domain
decomposition concept remains very important and much is known for more
advanced methods; see the regular Domain Decomposition conference
proceedings available at ddm.org (2014).
3.5. Sparse approximate inverses
For any matrix norm, minimisation of
$$\|I - AR\|$$
over all matrices $R \in \mathbb{R}^{n \times n}$ will clearly yield $R = A^{-1}$. Minimization with
restrictions on the entries where R may have nonzero values leads to sparse
approximate inverses. That is,
$$R = \operatorname*{argmin}_{R \in \mathcal{S}_R} \|I - AR\|,$$
where $\mathcal{S}_R$ describes the subset of entries of R which are allowed to be nonzero,
is a sparse approximate inverse for A. To be consistent with the
nomenclature in this article, $P = R^{-1}$ is the preconditioner here; solution
of systems with P is, of course, simply multiplication by R in this situation.
$\mathcal{S}_R$ is strictly a subspace of matrices with restricted sparsity pattern.
The Frobenius norm
$$\|B\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} b_{i,j}^2$$
is by far the most widely considered since the minimisation problem then
reduces to least squares problems for the columns of R separately (thus
computable in parallel). Further, these least squares problems are all of
small dimension when the specification of SR ensures that R is a sparse
matrix. We again refer to Benzi (2002) for a thorough description.
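A small sketch (dense storage, fixed pattern) may help indicate how the column-by-column least squares computation goes; the choice of the pattern of A itself as $\mathcal{S}_R$ is simply the most basic option, mentioned again below.

```python
# Sketch: Frobenius-norm sparse approximate inverse with the sparsity pattern
# of A itself. Each column of R is an independent small least squares problem
# (so the loop parallelises trivially); dense storage is used for clarity.
import numpy as np

def spai_pattern_of_A(A):
    n = A.shape[0]
    R = np.zeros((n, n))
    for j in range(n):
        cols = np.nonzero(A[:, j])[0]                    # allowed nonzeros in column j of R
        rows = np.nonzero((A[:, cols] != 0.0).any(axis=1))[0]
        ej = (rows == j).astype(float)                   # e_j restricted to the relevant rows
        coef, *_ = np.linalg.lstsq(A[np.ix_(rows, cols)], ej, rcond=None)
        R[cols, j] = coef
    return R

# R approximates A^{-1}; in the notation of the text P = R^{-1}, and applying
# the "preconditioner solve" is simply the product R @ r.
```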
As posed in this simple form, the sparse approximate inverse, R is not
generally symmetric when A is symmetric, however we introduce this type
of preconditioner in this section because symmetry can be enforced by com-
puting R in factored form $R = LL^T$ where L is sparse and lower triangular;
see Kolotilina and Yeremin (1993). Symmetry of a sparse approximate in-
verse can also be achieved by the AINV algorithm of Benzi et al. (1996)
and the more robust (stabilized) SAINV variant (Benzi et al. 2000) which
is guaranteed not to break down if A is positive definite. See again Benzi
(2002).
The main issue in the construction of any sparse approximate inverse is
the choice of the sparsity pattern, SR. The simplest idea is to use the spar-
sity of A, whilst there have been several attempts to dynamically define SR,
most notably in the SPAI algorithm of Grote and Huckle (1997). An extension
which can be used in a number of different scenarios is to use a target
matrix, T, and hence minimise $\|T - AR\|_F$; see Holland et al. (2005). A
further comment is that sparse approximate inverse algorithms are clearly
quite widely applicable, but generally do not give spectrally equivalent preconditioners
for elliptic PDEs. They do however make excellent smoothers
for use with multigrid (Tang and Wan 2000, Bröker et al. 2001).
3.6. Hierarchical representations
For symmetric and positive definite matrices arising via low order finite
element approximation from elliptic PDEs, Yserentant (1986) introduced
nearly three decades ago the hierarchical basis preconditioner. This uses
the idea that the coefficient matrix would have a much smaller condition
number if a di↵erent (but less computationally convenient) basis for the
approximation space was employed. By using a hierarchical basis—which
would give rise to a fairly dense matrix—both this good conditioning and
a rapid change of basis matrix based on a simple tree structure allow the
sparse matrix arising from the usual locally defined finite element basis
functions to be effectively preconditioned.
For the discrete Laplacian on a two-dimensional domain, Yserentant's
hierarchical preconditioner reduces $\kappa = O(h^{-2})$ to $\kappa = O(\log h^{-1})$ (see
Yserentant (1986) or Elman and Zhang (1995) for a more linear algebraic
proof). Unfortunately, for three-dimensional domains
only $\kappa = O(h^{-1})$ is possible with the straightforward use of the hierarchical
basis (Ong 1997), though an improved variant of this idea due to Bramble
et al. (1990) can even yield $\kappa = O(1)$ in three dimensions as well. These
ideas, originally introduced as based on the multilevel splitting of finite
element spaces, were the precursor to a whole abstract theory based on such
splittings; the review article by Xu (1992) remains a largely comprehensive
description of this abstract theory which covers many approaches including
multigrid and domain decomposition discussed above for symmetric and
positive definite matrices arising from elliptic PDEs.
A di↵erent hierarchical structure can also be of utility. We wish to men-
tion some ideas and the associated computational technology which en-
ables certain dense matrices—such as the inverse matrices of discrete pos-
itive definite elliptic PDE operators—to be represented to machine accu-
racy in a highly ecient format. There are a number of related but dif-
fering approaches ( Chandrasekaran et al. (2005), Greengard and Rokhlin
(1987), Hackbusch (1999), Grasedyck and Hackbusch (2003), Hackbusch
et al. (2000), Bebendorf (2008)), but the essential observation can be thought
of in terms of the Green’s function for an elliptic operator: nearby entries
need accurate representation, but variables distant from a given entry can
be treated together. This notion is most directly represented in the H-
matrix representation of Hackbusch: off-diagonal blocks are represented as
low rank blocks using outer products of vectors or matrices with very few
columns. By employing a blocking hierarchy, the computational complexity
of storing (and also multiplying with) such matrix approximations can be
as low as O(n log n).
We mention these ideas here since such approximate representations of
dense matrices have been suggested as preconditioners rather in the same
vein as sparse approximate inverses. They also can be utilised in other more
demanding situations as we describe later.
4. Semidefinite matrices
It might seem rather odd to be interested in preconditioners for singular
matrices; one might believe that only well-posed problems were worthy of
attention. However, there are di↵erent situations in which approximations
to certain singular operators/matrices are valuable and we mention some
ideas in this brief section.
The simplest PDE problem which gives rise to a singular matrix is proba-
bly the Neumann problem for a simple operator like the Laplacian; specify-
ing only derivative boundary conditions for a potential-like variable leaves
a 1-dimensional kernel comprising the constants as is well known. Such
a small dimensional null space generally causes little significant diculty
in terms of iterative methods or preconditioning and methods for definite
matrices generally apply: see (Elman et al. 2014b, section 2.3). Similarly,
graph analysis and Markov chain problems respectively lead to symmetric
and nonsymmetric singular systems with simple, low-dimensional kernels;
again, methods for definite matrices should generally be applied without
explicit ‘fixes’ to ensure nonsingularity for such problems. For problems
arising in contexts other than PDEs, singularity tends to imply that a de-
scription has not been completely specified.
The issue of semidefiniteness is certainly serious, however, for equations
involving the curl or div operators applied to vector fields as arise, for exam-
ple, in the Maxwell equations describing electromagnetics. In this context,
the operator curl curl has all gradients of scalar fields in its kernel, thus
under most discretizations one would expect that the matrices arising will
be semidefinite with a kernel of some significant dimension. Similarly, the
curl of any smooth enough vector field is in the kernel of the div opera-
tor. On the relevant spaces of functions, the curl curl operator is however
positive definite and self-adjoint (symmetric), and it seems that it may be
possible to use convenient preconditioners like standard AMG implemen-
tations if appropriate account is taken of the kernel. This is the purpose of
‘Auxiliary Space’ preconditioning which aims to take account of the natural
decompositions of function spaces that arise. The ideas have evolved over
some years; see Hiptmair and Xu (2007) and references therein. The main
practical issue is representing mapping operators between the various spaces
in the discrete setting.
5. Symmetric and indefinite matrices
For real symmetric matrices which do not have to be definite, the Krylov
subspace method of choice is the minres algorithm due to Paige and Saun-
ders (1975). In the same paper they introduced the alternative but less used
symmlq method. As its name implies, minres minimises the norm of the
residual vector for xk in the k-dimensional Krylov subspace at each itera-
tion. Thus since the symmetry of A ensures that the eigenvalues are real and
the eigenvectors are orthogonal, the convergence of minres is described by
$$\frac{\|r_k\|_I}{\|r_0\|_I} \le \min_{p_k \in \Pi_k,\, p_k(0)=1} \ \max_{\lambda \in \lambda(A)} |p_k(\lambda)|, \qquad (5.1)$$
where the notation is as used before in the positive definite case. The
difference from that case is not only that the residual rather than the error
is minimised (it is really a difference of norm since $r_k = A(x - x_k)$ implies
that $\|r_k\|_I = \|x - x_k\|_{A^2}$), but the spectrum, $\lambda(A)$, now contains both
positive and negative eigenvalues.
As in the symmetric and positive definite case, (5.1) immediately indicates
that termination of the iteration will occur with the solution at the sth iter-
ation if A has just s distinct eigenvalues. Correspondingly, the eigenvalues
lying in a small number of clusters should also lead to rapid convergence
if none of the clusters is too far from, nor too close to the origin. If the
eigenvalues of A are all contained in two intervals
$$[-a, -b] \cup [c, d],$$
where $a, b, c, d$ are all positive, then it is less straightforward in general
than in the positive definite case to derive a simple convergence bound (see
Wathen et al. (1995)). One situation where a simple expression is possible,
however, is when $d - c = a - b$; the minres residuals then satisfy the
convergence bound
$$\frac{\|r_{2k}\|_I}{\|r_0\|_I} \le 2 \left( \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \right)^k, \qquad (5.2)$$
where $\kappa = ad/(bc)$ (see (Elman et al. 2014b, section 4.2.4.)). Note that $\kappa$
here plays a similar role to $\lambda_{\max}/\lambda_{\min}$ in the positive definite case, though it
is not the matrix condition number here; since the Euclidean norm condition
number, $\|A\|\,\|A^{-1}\|$, is $\max\{a, d\}/\min\{b, c\}$ in this situation, $\kappa$ can be as
large as the square of this condition number when $a = d$, $b = c$, but rather
less when the two intervals are not symmetrically placed about the origin.
When $a = c$, for example, $\kappa = d/b$ is again exactly the Euclidean norm
condition number.
When A is symmetric and indefinite, any preconditioner for minres must
be symmetric and positive definite. This is necessary since otherwise there is
no equivalent symmetric system for the preconditioned matrix. Thus a non-
symmetric iterative method must be used even if a symmetric and indefinite
preconditioner is employed for a symmetric and indefinite matrix in general.
An indefinite preconditioner is therefore not usually desirable, though there
are exceptions: see below. A preconditioner for a symmetric indefinite
matrix A for use with minres therefore can not be an approximation of
the inverse of A, since this is also indefinite. This simple observation rules
out several approaches described above in the positive definite case.
With a symmetric and positive definite preconditioner, the preconditioned
minres convergence bounds as above become
$$\frac{\|r_k\|_{P^{-1}}}{\|r_0\|_{P^{-1}}} \le \min_{p_k \in \Pi_k,\, p_k(0)=1} \ \max_{\lambda \in \lambda(P^{-1}A)} |p_k(\lambda)| \le \min_{p_k \in \Pi_k,\, p_k(0)=1} \ \max_{\lambda \in [-a,-b]\cup[c,d]} |p_k(\lambda)| \qquad (5.3)$$
when $[-a, -b] \cup [c, d]$ is now an inclusion set for all of the eigenvalues of
$P^{-1}A$ rather than A. A further aspect introduced by preconditioning for
minres which does not arise for cg in the positive definite case is now
apparent: the norm in which convergence naturally occurs is affected by
the preconditioner; it is not just the eigenvalue distribution and hence the
speed of convergence which is affected.
The only reason we bring up this issue is because one now has the undesirable
possibility that a preconditioner which apparently gives rapid minres
convergence might give inaccurate solution iterates simply because $\|\cdot\|_{P^{-1}}$
is a highly distorted norm. The following trivial example will make the
point:
Let A be any indefinite diagonal matrix and suppose we precondition with
$$P = \operatorname{diag}(\epsilon, 1/\epsilon, \ldots, 1/\epsilon)$$
for some small positive $\epsilon$. Convergence to some small tolerance will appear
to occur after only 1 iteration since there is only 1 large entry of $P^{-1}$ and the
remaining entries are very small. Because of this distorted weighting, the
first component of the first solution iterate will be accurate, but the rest will
generally not be. For further consideration of this issue see Wathen (2007).
These different issues, which do not arise when A is symmetric and positive
definite, lead some authors to the conclusion that preconditioning for
symmetric indefinite matrices is much more difficult than the definite case.
However it is still much more tractable than the nonsymmetric case! For
a PDE problem, for example, the convergence bound (5.3) guarantees that
preconditioned minres convergence will be in a number of iterations in-
dependent of any mesh size parameter, h, given a preconditioner which
ensures that a, b, c, d are bounded, bounded away from zero and indepen-
dent of h. This is like the situation for spectrally equivalent preconditioners
in the symmetric positive definite case. Such guarantees rarely exist for
non-symmetric iterative methods; see section 6 below.
We mention that the sqmr Krylov subspace iterative method (Freund
and Nachtigal 1994) is designed for indefinite matrices with indefinite pre-
conditioning; this method, however, has no convergence guarantees, being
a variant of the nonsymmetric qmr iterative method for which symmetry
allows a more streamlined implementation.
General preconditioning strategies for symmetric indefinite matrices are
not so available as in the positive definite case. Amongst the ‘given a ma-
trix’ approaches, simple application of scaling and incomplete factorization
are not contenders and there is little significant work on sparse approxi-
mate inverses since these appear unlikely to provide much benefit except
possibly when A has only a few negative eigenvalues. ‘Given a problem’ is
more promising and there are two general classes of problem which give rise
to symmetric indefinite matrices for which helpful—sometimes excellent—
preconditioning approaches are known.
5.1. Shifted elliptic operators: Helmholtz problems
It might seem that taking a symmetric and positive definite matrix, K, and
simply shifting it by subtracting some multiple of the identity should lead
to matrices for which easy solution algorithms exist. In general, however,
this is not the case. Certainly the matrix $K - \sigma I$ can have resonances (and
thus be singular) when $\sigma$ is an eigenvalue of K, but the difficulty lies deeper
than just this observation.
The classic (and important) PDE problem here is the Helmholtz equation
$-\nabla^2 u - \sigma u = f$ which arises when looking for periodic (wave-like) solutions
of the linear wave equation; the parameter $\sigma$ is then related to the frequency.
Supposing that K is the discretised form of the positive definite operator
$-\nabla^2$ (i.e. the negative Laplacian), then for $\sigma > \lambda_{\min}(K)$ it is clear that
$$A = K - \sigma I \qquad (5.4)$$
is indefinite and symmetric. Many of the important physical applications
involve high frequencies and thus large values of $\sigma$. Having a good preconditioner
for K—we have identified many in the sections above—is of little help
in constructing a preconditioner for A. For finite element discretization, the
identity operator becomes the mass matrix, Q (3.4), but via diagonal ap-
proximation or since the grids employed are generally fairly regular for such
wave problems, there is little lost in description by considering (5.4).
In fact for Helmholtz problems, there is usually a radiation boundary con-
dition for outgoing waves or an absorbing boundary condition (or ‘perfectly
matched layer’) is applied so that there is little reflection of outgoing waves
back into the computational domain. These add some complex entries in the
matrix which generally render it complex and symmetric (not Hermitian).
Nevertheless, it is the indefiniteness of Helmholtz problems which remains
the most important feature, which is why we consider them in this section.
This aspect can, however, mean that nonsymmetric Krylov subspace itera-
tive methods are required—sqmr is quite popular in this context.
Magolu Monga Made et al. (2000) and Laird and Giles (2002) were per-
haps the first to recognize the potential of preconditioning (5.4) with a
matrix of the form K + µI where |µ| is positive. In the case of real µ,
such a positive definite matrix is readily approximated by standard multi-
grid cycles which can be used as a positive definite preconditioner for (5.4).
Complex values for µ were also explored (see also van Gijzen et al. (2007),
for example), and good choices of the parameter can be found from Fourier
analysis of the spectrum of the preconditioned operator/matrix. As well as
indicating the speed of Krylov subspace iteration convergence, such analysis
can also assist in the choice of multigrid components (Erlangga et al. 2006).
Elman et al. (2001) also propose a multigrid approach for Helmholtz prob-
lem employing gmres as a smoother. Domain decomposition approaches
have also been suggested (Gander et al. 2002).
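To indicate the flavour of this shifted preconditioning, here is a hedged one-dimensional sketch: the preconditioner takes the form $K + \mu I$ with a complex, purely illustrative value of $\mu$ (no claim is made that this choice is good; see the cited references for parameter choices), and its action is applied by a sparse direct factorization as a stand-in for the multigrid cycle one would use in practice. Dirichlet boundary conditions are used for simplicity, whereas genuine Helmholtz applications would include the radiation or absorbing conditions discussed above.

```python
# Sketch: shifted-Laplacian preconditioning of a 1-D Helmholtz model problem,
# A = K - sigma*I preconditioned by P = K + mu*I with an illustrative complex
# shift mu. P is applied by a direct LU factorization here as a stand-in for
# the multigrid cycle one would use in practice.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu, LinearOperator, gmres

n = 1000
h = 1.0 / (n + 1)
K = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2   # discrete -d^2/dx^2
I = sp.identity(n)
sigma = 400.0                                                      # illustrative shift

A = (K - sigma * I).tocsc().astype(complex)
mu = (1.0 + 0.5j) * sigma                                          # illustrative complex shift
P_lu = splu((K + mu * I).tocsc())                                  # direct solve for P

M = LinearOperator(A.shape, matvec=P_lu.solve, dtype=complex)
b = np.zeros(n, dtype=complex)
b[n // 2] = 1.0                                                    # point source
x, info = gmres(A, b, M=M)
```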
A more recent proposal by Engquist and Ying (2011a) for variable and
high-frequency regimes involves 'sweeping' from the absorbing boundary,
eliminating the variables locally in layers. The resulting preconditioner is
thus a block incomplete $LDL^T$ factorization. A key aspect of the implementation
is that hierarchical (H-)matrix compression techniques as described
in section 3 above can be employed for the intermediate layers; this leads to
significant reductions in storage and computational work and leads to excellent
overall complexities like $O(n \log^2 n)$ for computation of $P = LDL^T$
and application of $P^{-1}$ to a vector for a two-dimensional domain problem.
Krylov subspace iterative convergence is observed to occur in few iterations
making this a very competitive technique. Unfortunately, though the idea—
and a closely related one employing frontal sparse factorization (Engquist
and Ying 2011b)—is observed to work well for three-dimensional domains
as well, the H-matrix compression does not appear to yield the same small
computational complexity: algebraic factors in n replace logarithmic factors
in estimates of the complexity for setting up the preconditioner (Poulson
et al. 2013). That is, for a three-dimensional domain problem, the computation
of $P = LDL^T$ takes time which grows like $n^\gamma$ for some constant
$\gamma > 1$ whereas, once P has been computed, the action of $P^{-1}$ can be computed
in $O(n \log n)$ work at each iteration (similarly to the FFT). For a
problem with many different right hand side vectors there could thus be
some amelioration in the work required to set up the preconditioner.
We should alert the reader to the article by Ernst and Gander (2012)
which explains the general difficulty of iterative solution of Helmholtz prob-
lems.
The Maxwell equations describing electromagnetics have wave-like solu-
tions and hence some of the character of Helmholtz problems. However,
they also (in various formulations) have additional semidefinite features as
discussed in the section above, and additional indefinite saddle point struc-
ture, and are therefore considered below.
The idea of sweeping through the discrete variables for a problem in a par-
ticular order is also important for preconditioner construction for nonsym-
metric matrices which derive from PDEs that involve advection/convection
(or transport) as described below.
5.2. Saddle point systems
The second main class of symmetric and indefinite problems come from
minimization of quadratic functionals subject to linear constraints. In fact,
linearizations of much more general constrained optimization problems quite
generally result in linear systems of the same form

\[
\mathcal{A}x \;=\; \begin{bmatrix} A & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} u \\ p \end{bmatrix}
\;=\; \begin{bmatrix} f \\ g \end{bmatrix} \qquad (5.5)
\]
which are called saddle point systems; see Benzi et al. (2005) for a com-
prehensive treatment of such problems. The block A ∈ ℝ^{n×n} is usually
positive definite on the kernel of B ∈ ℝ^{m×n} with m < n. This is a
sufficient condition for invertibility of 𝒜 provided B is of full rank. Often A
is positive definite, though in the context of sequential quadratic approximations
in Optimization (and possibly elsewhere), indefiniteness can arise;
see Gould and Simoncini (2009) for spectral bounds in this less usual case.
The variables u are the primary variables and p are the Lagrange multipliers.
Sometimes the zero block is replaced with a regularization matrix
C ∈ ℝ^{m×m} where C is symmetric and positive semidefinite; if C ≠ 0 then
some authors refer to a regularized or generalized saddle point problem.
Many problems give rise to matrices of this type: as well as the ubiqui-
tous appearance of such problems in Optimization, several important PDE
problems including the Stokes problem describing slow viscous incompress-
ible flow, are of this form. The incompressible Navier-Stokes equations give
matrices of similar form but in which A is nonsymmetric. For these ap-
plications, the primary variables are the fluid velocity and the Lagrange
multiplier variables are the fluid pressure. For other applications and refer-
ences, see again Benzi et al. (2005) or Pestana and Wathen (2015).
The appearance of the zero diagonal block in (5.5) causes problems for
most of the preconditioning approaches identified in section 3 above. This is
also generally true with regularisation, C ≠ 0. Multigrid approaches, for ex-
ample, require more complex smoothers; see for example John and Tobiska
(2000) and Manservisi (2006). Domain decomposition can be applied, but
then the subdomain problems remain indefinite. Incomplete factorization
ideas can sometimes be employed if, for example, A and C are diagonal or
nearly so (Bergamaschi et al. 2004).
A simple observation in the case that A is nonsingular is that if

\[
P \;=\; \begin{bmatrix} A & 0 \\ 0 & S \end{bmatrix}, \qquad S = BA^{-1}B^T, \qquad (5.6)
\]

then the preconditioned matrix P^{-1}𝒜 has only three distinct eigenvalues,
independently of m and n; thus minres (or gmres in the nonsymmetric case)
will converge in 3 iterations to the solution (Murphy et al. 2000, Korzak
1999). P in (5.6) is rarely a practical preconditioner for 𝒜 in its exact
form, but this observation does give a paradigm for the construction of
preconditioners and, in particular, highlights the utility of approximations to
the Schur complement, S. An important point to note is that P is nothing
like an inverse of 𝒜 (nor an approximation thereof). Nevertheless, good
approximations (preconditioners) Â for the block A and Ŝ for the Schur
complement, S, lead to block diagonal preconditioners
\[
P \;=\; \begin{bmatrix} \hat{A} & 0 \\ 0 & \hat{S} \end{bmatrix} \qquad (5.7)
\]

for 𝒜 for which minres converges in few iterations. Precisely, if

\[
\theta \;\le\; \frac{u^T A u}{u^T \hat{A} u} \;\le\; \Theta
\qquad\text{and}\qquad
\phi \;\le\; \frac{p^T S p}{p^T \hat{S} p} \;\le\; \Phi
\qquad\text{for all } u \in \mathbb{R}^n\setminus\{0\},\; p \in \mathbb{R}^m\setminus\{0\},
\]

for positive constants θ, Θ, φ and Φ, then there are positive constants a, b, c
and d satisfying the preconditioned minres convergence bound (5.3). In
the context of PDE problems, for example, this means that preconditioned
minres is a solver of optimal computational complexity provided that Â is
spectrally equivalent to A and Ŝ is spectrally equivalent to S. A multigrid
cycle can be an excellent choice for Â. See Pestana and Wathen (2015) for
a general proof which highlights the role of inf-sup stability associated with
the minimal singular value of B (in appropriate norms).
For many problems there are natural Schur complement approximations
which arise from problem formulation and do not require separate approx-
imation of the matrices B and A—see Pestana and Wathen (2015). In the
case of the Stokes equations, the natural Schur complement approximation
is particularly simple: it is the mass matrix (3.4) (Verfürth 1984, Silvester
and Wathen 1994). For any such problem with a natural Schur complement
approximation, (5.7) is an excellent preconditioning approach since Â
can usually be any of the approximations in section 3. It should be noted
that once such approximations Â and Ŝ are identified, then there are more
involved ways to make use of them than the simple block diagonal preconditioner
(5.7). When matrix symmetry is not an issue, a block triangular
preconditioner can be employed—see section 6.
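To make the block diagonal paradigm concrete, the following Python sketch (purely illustrative; the Laplacian block, random constraint block and problem sizes are hypothetical choices, not taken from any of the applications above) assembles a small saddle point matrix and applies minres with the 'ideal' preconditioner (5.6) built from exact solves with A and S.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve
    from scipy.sparse import diags, bmat, csc_matrix
    from scipy.sparse.linalg import LinearOperator, minres, splu

    rng = np.random.default_rng(0)
    n, m = 100, 20
    A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')  # SPD (1,1) block (1D Laplacian)
    B = csc_matrix(rng.standard_normal((m, n)))                           # full-rank constraint block
    K = bmat([[A, B.T], [B, None]], format='csc')                         # saddle point matrix of the form (5.5)
    b = rng.standard_normal(n + m)

    # block diagonal preconditioner (5.6): exact solves with A and with S = B A^{-1} B^T
    A_lu = splu(A)
    S = B.toarray() @ A_lu.solve(B.toarray().T)
    S_cf = cho_factor(S)

    def apply_Pinv(v):
        # apply [A 0; 0 S]^{-1} blockwise
        return np.concatenate([A_lu.solve(v[:n]), cho_solve(S_cf, v[n:])])

    Pinv = LinearOperator((n + m, n + m), matvec=apply_Pinv)

    its = []
    x, info = minres(K, b, M=Pinv, callback=lambda xk: its.append(1))
    # with the exact blocks, convergence occurs in very few iterations (three in exact arithmetic)
    print(info, len(its))

With practical approximations Â and Ŝ in place of the exact solves, the same code structure applies; only apply_Pinv changes.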
Sometimes it is actually useful to directly approximate the matrices B
and A in constructing Ŝ, for example in the context of optimization with
constraints which are PDEs (Rees et al. 2010a), though care must be exercised
to ensure that the sufficient conditions of Braess and Peisker (1986)
are satisfied.
For saddle point problems arising particularly in the more general context
of optimization, constraint preconditioning has attracted significant atten-
tion (Keller et al. 2000). Here
\[
P \;=\; \begin{bmatrix} H & B^T \\ B & 0 \end{bmatrix}
\]
is used as a preconditioner for 𝒜. Note that the blocks B and B^T which come
from the constraints are kept, whilst A is replaced by another symmetric and
(usually) positive definite matrix H. It might appear that solution of systems
with P may not be significantly easier than with 𝒜; however, by leaving
H implicitly defined, this can be achieved (Dollar and Wathen 2006, Dollar
et al. 2006). One significant advantage of keeping the constraint blocks in
the preconditioner is that the preconditioned problem is now equivalent to
a positive definite problem on the constraint manifold, hence a projected
cg algorithm can be used (Gould et al. 2001); this is rather useful since
a constraint preconditioner is symmetric and indefinite, precluding simple
use of minres. Keller et al. (2000) prove that when H is positive definite
then P^{-1}𝒜 has 2m eigenvalues equal to 1, though with fewer corresponding
independent eigenvectors (thus the minimum polynomial generically has a
factor (λ − 1)^2), and that the remaining n − m eigenvalues are real and
interlace those of H^{-1}A. Thus if H is a good preconditioner for A and
systems with P can be readily solved, then this can be an attractive strategy.
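This eigenvalue structure is straightforward to observe numerically; the following sketch (with arbitrary random matrices chosen only for illustration, and H taken simply as the diagonal of A) computes the generalized eigenvalues of the constraint preconditioned matrix and counts those equal to 1.

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(1)
    n, m = 30, 10
    G = rng.standard_normal((n, n))
    A = G @ G.T + n * np.eye(n)                      # an SPD (1,1) block, for illustration
    B = rng.standard_normal((m, n))                  # full-rank constraint block
    H = np.diag(np.diag(A))                          # a simple symmetric positive definite choice of H

    K = np.block([[A, B.T], [B, np.zeros((m, m))]])  # saddle point matrix
    P = np.block([[H, B.T], [B, np.zeros((m, m))]])  # constraint preconditioner

    # generalized eigenvalues of K x = lambda P x, i.e. the spectrum of P^{-1} K
    lam = eig(K, P, right=False)
    print(np.sum(np.abs(lam - 1.0) < 1e-5))  # 2m eigenvalues at 1, up to the perturbation of a defective eigenvalue
    print(np.max(np.abs(lam.imag)))          # imaginary parts are tiny; the remaining eigenvalues are real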
Different challenges arise and different strategies are needed for saddle
point problems (5.5) with semidefinite block, A. The constraint preconditioning
paradigm can be applied, in which case finding a matrix H for the
semidefinite matrix A may require ideas as in section 4 above. The matrix
𝒜 is still invertible provided A is positive definite on the kernel of B and B
is of full rank, as has been mentioned. One possible approach is therefore
via an augmentation or augmented system preconditioner
\[
P \;=\; \begin{bmatrix} A + B^T W^{-1} B & 0 \\ 0 & W \end{bmatrix}
\]
for some appropriate symmetric and positive definite matrix W. This approach
is described and analysed by Greif and Schötzau (2006) who were
motivated by the solution of Maxwell equations describing electromagnetic
phenomena—a key PDE problem with this structure. For related but earlier
ideas associated with the semidefiniteness of the div operator, see Arnold
et al. (1997).
For the various Maxwell equation formulations, semidefiniteness arises
through curl-curl terms, indefiniteness because of a div constraint and wave-
like solutions are always expected, which means that fine computational
grids must be used especially for high frequency problems. Fortunately,
Maxwell’s equations are linear and self-adjoint in most formulations which
is why we bring them up in this section. Mixed finite element approxi-
mation is widely used for these PDEs. Representative of the large and
still growing literature on preconditioning for such problems are Greif and
Schötzau (2007), who develop block preconditioners, and Tsuji et al. (2012),
who extend sweeping preconditioner ideas to these problems.
A slightly more detailed review of preconditioning for saddle point prob-
lems can be found in Benzi and Wathen (2009).
5.3. A word of caution
As has already been mentioned in section 2, for any nonsingular square ma-
trix A, the solution of a linear system can be achieved by using an indefinite
symmetric system of twice the dimension of the form (2.5) or even
\[
\begin{bmatrix} 0 & A \\ A^T & 0 \end{bmatrix}
\begin{bmatrix} r \\ x \end{bmatrix}
\;=\;
\begin{bmatrix} b \\ 0 \end{bmatrix}.
\]
Since the eigenvalues of the coefficient matrix here are exactly symmetric
about the origin (they are ±σ_i(A), where σ_i(A) are the singular values of
A), generically the convergence of minres would stagnate at every other
iteration, so the computational effort in solution is exactly the same as for
the normal equations (2.4) (Fischer et al. 1998). For (2.5) a similar issue
arises.
The artificial construction of indefinite symmetric systems is therefore
unlikely to achieve other than equivalent problem statements, except possi-
bly in the solution of least squares problems by augmented matrix methods
(Bjo¨rck 1996, Chapter 7). To go further, a symmetric indefinite matrix
of any form which has eigenvalues symmetrically arranged about the origin
will generically lead to staircasing behaviour where progress is made only on
alternate minres iterations; cg for (2.4) generally requires the same computational
effort for equation solution in such circumstances. In this sense,
it is highly unfortunate that the tractable minres convergence bound (5.2)
applies only when the positive and negative eigenvalues extend over intervals
of equal length. If in fact a = d and b = c, so that these intervals are
centred on the origin, then the undesirable staircasing effect is always implied!
In general, it is advantageous in terms of convergence speed if there
is some asymmetry in the distribution of positive and negative eigenvalues
of preconditioned indefinite symmetric matrices. However, it is then con-
siderably more complicated to take account of this asymmetry in a minres
convergence bound (but see Wathen et al. (1995)).
6. Nonsymmetric matrices
As already mentioned, all preconditioning for nonsymmetric problems is
heuristic, since descriptive convergence bounds for gmres or any of the
other applicable nonsymmetric Krylov subspace iterative methods do not
presently exist (but see Pestana and Wathen (2013b)). The most straight-
forward convergence bound for gmres for diagonalisable matrices, A =
XΛX^{-1} where Λ is a diagonal matrix of eigenvalues, comes directly from
(2.3) and the residual minimization property:

\[
\begin{aligned}
\|r_k\| &= \min_{p\in\Pi_k,\,p(0)=1} \|p(A)r_0\| \\
&\le \min_{p\in\Pi_k,\,p(0)=1} \|p(X\Lambda X^{-1})\|\,\|r_0\| \\
&= \min_{p\in\Pi_k,\,p(0)=1} \|X p(\Lambda) X^{-1}\|\,\|r_0\| \\
&\le \|X\|\,\|X^{-1}\| \min_{p\in\Pi_k,\,p(0)=1} \|p(\Lambda)\|\,\|r_0\| \\
&\le \|X\|\,\|X^{-1}\| \min_{p\in\Pi_k,\,p(0)=1} \max_{z\in R} |p(z)|\,\|r_0\|, \qquad (6.1)
\end{aligned}
\]
where R is any region containing the eigenvalues. Since it is sometimes possible
to find regions R which contain all of the eigenvalues, but rarely possible
to obtain any reasonable estimates of the condition number ‖X‖‖X^{-1}‖
of the eigenvector matrix, X, very often in the literature the eigenvalues
of P^{-1}A are estimated or bounded as a guide to the quality of a preconditioner.
It is certainly usually true that if the eigenvalues are widely spread
through a large region of ℂ then slow convergence can follow!
There are alternative approaches for obtaining bounds for gmres con-
vergence; see for example (Greenbaum 1997, Section 3.2). One can generate
such bounds, for example, with use of the field of values or numerical range
F(A) = {x^*Ax : x ∈ ℂ^n, x^*x = 1},

provided 0 ∉ F(A). The obtained bounds can be rather pessimistic, but
field-of-values convergence bounds have been used, for example, to rigor-
ously establish the convergence of gmres in a number of iterations in-
dependent of the mesh size, h, for certain preconditioned PDE problems;
see for example Klawonn and Starke (1999), Loghin and Wathen (2004),
Benzi and Olshanskii (2011). gmres convergence bounds can be based on
other sets in the complex plane; ε-pseudospectral sets, for example, have
been used to analyse initial stagnation of (unpreconditioned) gmres for
convection-diffusion problems (Trefethen and Embree 2005, Chapter 26).
One aspect which arises in the nonsymmetric but not the symmetric case
is the possibility to left precondition
P^{-1}Ax = P^{-1}b,

to right precondition

AP^{-1}y = b,    x = P^{-1}y,

or, if P is available in split form P = P_1 P_2, to use

P_1^{-1} A P_2^{-1} y = P_1^{-1} b,    x = P_2^{-1} y.
In every case the coefficient matrix is generally again nonsymmetric, and
obvious similarity transformations involving P or the P_i show that all three
coefficient matrices are mathematically similar and so have the same eigenvalues.
There is little evidence, however, that any one of these is better
than the others in terms of its effect on the convergence rate for a Krylov
subspace method, even though, theoretically, this could be the case. Traditionally,
right preconditioning is favoured since then gmres, for example,
minimizes the residual r_k = b − Ax_k for the original linear system; left
preconditioning would lead to minimization of the preconditioned residual
P^{-1}r_k. It is certainly not always clear which is preferable; a good preconditioner
fortunately seems to give fast convergence for all three forms in
practice.
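The mechanics of right preconditioning are easily set up with an operator wrapper; in the following sketch (an illustration only, with an arbitrary sparse test matrix and simple diagonal scaling standing in for P), gmres is applied to AP^{-1}y = b and the solution of the original system is recovered as x = P^{-1}y.

    import numpy as np
    from scipy.sparse import diags, random as sparse_random
    from scipy.sparse.linalg import LinearOperator, gmres

    rng = np.random.default_rng(2)
    n = 500
    # an arbitrary nonsymmetric sparse test matrix (hypothetical)
    A = sparse_random(n, n, density=0.01, format='csr', random_state=42) + diags(np.full(n, 4.0))
    b = rng.standard_normal(n)

    d = A.diagonal()
    apply_Pinv = lambda v: v / d          # a simple stand-in for P^{-1}: diagonal scaling

    # right preconditioning: solve (A P^{-1}) y = b, then recover x = P^{-1} y
    APinv = LinearOperator((n, n), matvec=lambda v: A @ apply_Pinv(v))
    y, info = gmres(APinv, b)
    x = apply_Pinv(y)
    print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # residual of the original system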
In the context of PDEs, convection-diffusion problems present significant
challenges for stable discretization—without excessive numerical diffusion
or non-physical oscillations—as well as for preconditioning and iterative
solvers. A key aspect is simply equation ordering. Even such a simple
method as stationary Gauss-Seidel iteration can be good for such problems
if the variables are ordered in the direction of the convection, but it can be
very poor otherwise (see Elman and Chernesky (1993), Elman et al. (2014b,
chapter 7)).
We mention also that there is a ‘flexible’ version of gmres which allows
for a preconditioner that varies between iterations (Saad 1993). This can be
useful, for example, when employing another iteration as a preconditioner,
but this flexible gmres method does not necessarily work as well as regular
gmres. It can sometimes be better to just use a fixed number of inner
iterations so that the preconditioner is a fixed operator for every iteration;
if such inner iterative steps represent a fixed linear process, then at least
the limited guarantees that come with gmres remain in place.
6.1. Incomplete Factorization
Just as a Cholesky factorization for a symmetric and positive definite ma-
trix can be incompletely formed, an LU factorization (equivalent to Gauss
elimination) for a non-symmetric matrix can be performed incompletely
by prespecifying allowable fill or thresholding or both. Perhaps such ILU
preconditioning methods and their variants are the most widely used for
non-symmetric problems and software is widely available.
Because the product of triangular matrix factors is computed for ILU, for
problems with an implied directionality like convection-diffusion, ordering
of the variables can lead to one of the factors being less important (and
perhaps more nearly diagonal).
Although, in general, there is little we can say about ILU precondition-
ing for non-symmetric matrices, gmres with ILU is a ‘go-to’ technique for
very many practical problems, and so a great deal of largely empirical research
continues to explore every aspect of the various possibilities: multilevel
variants, block variants and parallel implementation issues, for example. We again refer
to Benzi (2002) as the huge literature is fairly evenly cited. Because of
its significance, particularly in the important area of petroleum reservoir
modelling, we specifically mention the nested ILU methods originally de-
rived by Appleyard and Cheshire (1983), which use partial eliminations and
are designed to take advantage of parallel computation; related ideas were
developed, for example, by Saad (1996). See also Vassilevski (2008) for a
comprehensive treatment of these methods in a more general framework.
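As an indication of typical usage, the following sketch (a minimal illustration; the convection-diffusion-like test matrix and the drop tolerance and fill parameters are arbitrary choices) computes an incomplete LU factorization and passes it to gmres as a preconditioner.

    import numpy as np
    from scipy.sparse import diags, kron, identity, csc_matrix
    from scipy.sparse.linalg import LinearOperator, gmres, spilu

    N = 60                               # interior grid points in each direction (hypothetical problem)
    h = 1.0 / (N + 1)
    wind = 20.0
    D2 = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(N, N)) / h**2   # 1D second difference
    D1 = diags([-1.0, 0.0, 1.0], [-1, 0, 1], shape=(N, N)) / (2 * h) # 1D centred first difference
    I1 = identity(N)
    # 2D convection-diffusion-like operator: -Laplacian plus convection in one direction
    A = csc_matrix(kron(I1, D2) + kron(D2, I1) + wind * kron(I1, D1))
    b = np.ones(N * N)

    ilu = spilu(A, drop_tol=1e-3, fill_factor=20)      # incomplete LU with thresholding
    M = LinearOperator(A.shape, matvec=ilu.solve)      # preconditioner applies the ILU solve
    x, info = gmres(A, b, M=M, restart=50)
    print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))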
6.2. Multigrid
There is no inherent obstacle to applying multigrid or multilevel ideas for
nonsymmetric problems. However, smoothing is generally not so easy and
requires more work than for symmetric problems and care must be taken so
that coarse grid correction does not reintroduce high frequencies. Without
attention to these aspects, the methods simply do not work at all well.
Since such issues vary with differing applications, we briefly discuss only
the important problem associated with the convection-diffusion equation.
Smoothing – or perhaps more appropriately in this setting, ‘sweeping’ –
should ideally be something like a Gauss-Seidel iteration with the variables
ordered in the direction of the convection. For problems with recirculation,
one can try multidirectional sweeping, so that all parts of the flow have
at least one of the directional sweeps that is approximately in the convec-
tion direction. Coarse grid correction then requires care (in grid transfers
and/or coarse grid operator). In the convection-diffusion geometric multigrid
method due to Ramage (1999), for example, the coarse grid operator
is recomputed on each coarse grid with—crucially—streamline upwinding
appropriate to that grid size. The point is that when a grid under-resolves
any layer-like phenomenon which arises with convection—this must happen
at some level if one has a true multigrid concept—then oscillatory solutions
are commonly the result. The reintroduction of such high frequencies would
require more significant sweeping or smoothing, and it is preferable to pro-
vide stabilisation appropriate to each level of grid so that smooth coarse
grid corrections result.
Multigrid for hyperbolic problems is thought about rather differently
than for elliptic problems and the idea of directional sweeping rather than
smoothing is key. The coarse grids allow one to sweep errors out of the
boundary of the domain more rapidly than using just a single grid. Again,
transport is the main issue, not smoothing.
Algebraic multigrid ideas have been tried and can work well for some
problems. The computation of ‘coarse grids’ (or in reality, spaces of coarse
variables) by purely algebraic means can identify layer phenomena, for ex-
ample, and automatically semi-coarsen. In particular, widely used coars-
ening algorithms seem to detect anisotropy and lead to appropriate coarse
spaces/grids. All is not rosy, however, and AMG software can quite simply
fail. There is a particular danger that simply does not arise with geometric
multigrid ideas, namely that the coarse grid can have a significant propor-
tion of the variables of the fine grid; that is, coarsening does not lead to a
sufficiently significant reduction in the number of variables. If this occurs,
then many levels are required and efficiency can be severely compromised.
Active research continues on AMG for nonsymmetric problems; see for ex-
ample Notay (2012).
Multigrid methodology is generally applied as a preconditioner for non-
symmetric problems.
6.3. Sparse Approximate Inverses
Norm minimization readily leads to sparse approximate inverses similarly
to the symmetric case. A biconjugation based method (AINV) and its
more robust stabilized form (SAINV) are, however, perhaps the most widely
considered of such methods in the non-symmetric case (Benzi and Tuma
1998, Benzi et al. 2000).
6.4. Symmetric/Skew splitting
Every nonsymmetric real matrix can be uniquely split into its symmetric
and skew-symmetric part:
\[
A \;=\; \underbrace{\tfrac{1}{2}(A + A^T)}_{H} \;+\; \underbrace{\tfrac{1}{2}(A - A^T)}_{S}.
\]
(The H stands for Hermitian, since the same idea obviously applies in the
complex case when the transpose is replaced by the conjugate transpose.) When H is
positive definite, preconditioning of A by H gives a shifted skew-symmetric
matrix for which a special three-term recurrence Krylov subspace method
exists (Concus and Golub 1976, Widlund 1978) and convergence depends on
the size of 1 + |λ_max(H^{-1}S)| (Widlund 1978, Freund and Ruscheweyh 1986).
Although this approach has been considered for some applications—see for
example Elman and Schultz (1986) in the context of convection-diffusion
problems—the so-called HSS (Hermitian Skew-Hermitian Splitting) meth-
ods originally introduced by Bai et al. (2003) have been rather more widely
investigated. They proved the convergence of an alternate iteration using
shifted Hermitian and shifted skew-Hermitian parts of the form
\[
\begin{aligned}
(\alpha I + H)\,x^{(k+\frac12)} &= (\alpha I - S)\,x^{(k)} + b,\\
(\alpha I + S)\,x^{(k+1)} &= (\alpha I - H)\,x^{(k+\frac12)} + b,
\end{aligned}
\]
for Ax = b. Though convergence is guaranteed independently of the choice
of the parameter α > 0, it can be slow. Nevertheless, many variants of
using this idea as a preconditioner have been suggested, as have related
structural splitting ideas, and good results are reported for some; see for
example Benzi et al. (2011a).
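The alternating iteration itself is simple to implement; the sketch below (for a small, hypothetical dense matrix whose symmetric part is positive definite, and with the arbitrary choice α = 1) performs a fixed number of HSS sweeps directly.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 200
    # hypothetical test matrix whose symmetric part is positive definite for this construction
    A = 2 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)
    b = rng.standard_normal(n)

    H = 0.5 * (A + A.T)            # symmetric (Hermitian) part
    S = 0.5 * (A - A.T)            # skew-symmetric (skew-Hermitian) part
    alpha = 1.0                    # an arbitrary positive shift; convergence holds for any alpha > 0 when H is SPD
    I = np.eye(n)

    x = np.zeros(n)
    for k in range(50):
        x_half = np.linalg.solve(alpha * I + H, (alpha * I - S) @ x + b)
        x = np.linalg.solve(alpha * I + S, (alpha * I - H) @ x_half + b)

    print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))   # residual after 50 HSS sweeps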
6.5. Nonsymmetric saddle point systems
Saddle point problems as such are essentially symmetric, as in section 5.2, but
matrices of the structure (5.5) also arise in which the block A is non-symmetric.
A key example is provided by linearizations of the incompressible Navier–
Stokes equations (see Elman et al. (2014b, Chapters 8 and 9)). It is these
problems that we refer to as non-symmetric saddle point problems.
For a nonsymmetric saddle point matrix, the results of Murphy et al.
(2000) still hold and block diagonal preconditioners can still be applied.
However, since there is no symmetry to lose, the block triangular precondi-
tioners
\[
P \;=\; \begin{bmatrix} \hat{A} & 0 \\ B & \pm\hat{S} \end{bmatrix}
\qquad\text{or}\qquad
P \;=\; \begin{bmatrix} \hat{A} & B^T \\ 0 & \pm\hat{S} \end{bmatrix}
\]
will give, in the ‘ideal’ case Â = A, Ŝ = S = BA^{-1}B^T, a preconditioned
system with either just the two distinct eigenvalues ±1 or even just the
single eigenvalue 1, depending on whether the + or the − sign is selected,
respectively (Murphy et al. 2000). Even with the single eigenvalue 1, P is
not the inverse, so something has to give! In fact it is diagonalisability that
is lost in selecting the − sign; the minimum polynomial is then (λ − 1)^2, so
two iterations would be required, exactly as for the + case. However, in the
practical case when the approximations Â, Ŝ are not ideal, it is usually the
− sign which is chosen so that all eigenvalues are clustered around 1 rather
than the 2 clusters around ±1 which arise with the + choice, though it is
not clear that this is to be preferred (Ymbert et al. 2015).
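The eigenvalue clustering described here is easy to observe numerically; the sketch below (a small hypothetical example with a random nonsymmetric block A and the exact Schur complement) compares the spectra of the preconditioned matrix for the + and − choices of the ‘ideal’ block upper triangular preconditioner.

    import numpy as np
    from scipy.linalg import eig

    rng = np.random.default_rng(4)
    n, m = 40, 12
    A = 10 * np.eye(n) + rng.standard_normal((n, n))    # nonsymmetric, nonsingular (1,1) block
    B = rng.standard_normal((m, n))                      # full-rank constraint block
    S = B @ np.linalg.solve(A, B.T)                      # exact Schur complement B A^{-1} B^T
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])      # nonsymmetric saddle point matrix

    for sign in (+1.0, -1.0):
        P = np.block([[A, B.T], [np.zeros((m, n)), sign * S]])   # 'ideal' block upper triangular preconditioner
        lam = eig(np.linalg.solve(P, K), right=False)
        dist = np.minimum(np.abs(lam - 1.0), np.abs(lam + 1.0)) if sign > 0 else np.abs(lam - 1.0)
        # + sign: eigenvalues cluster at +1 and -1;  - sign: all eigenvalues equal 1
        # (distances are zero up to the perturbation expected for a defective eigenvalue)
        print(sign, dist.max())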
Approximations of the Schur complement remain a key aspect of any
such block preconditioning approach. For the common Oseen linearization
of the incompressible Navier-Stokes problem, the matrix block A comes
from a vector convection-diffusion operator, so that an appropriate multigrid
approach as outlined above, or any other such good approximation of
(preconditioner for) convection-diffusion, is required as Â. To identify an
approximate Schur complement requires more in depth consideration, but
there are now two excellent candidates: the pressure convection-diffusion
(PCD) (Kay et al. 2002) and the least-squares commutator (LSC) precon-
ditioners. Both are derived, described and analysed in Elman et al. (2014b,
Chapter 9). The PCD method is in some ways simpler, but it requires the
construction of a convection-diffusion operator on the pressure space (the
space of Lagrange multipliers), which is not required for the problem for-
mulation itself. A purely algebraic way to compute this operator based on
sparse approximate inverse technology has, however, been described by El-
man et al. (2006). The LSC method is derived via a commutator argument
and is defined purely in terms of the matrix blocks which arise naturally
for the Oseen problem. For a comparison and full consideration of advan-
tages and disadvantages of these two approaches, see Elman et al. (2014b,
Chapter 9).
The augmentation or augmented Lagrangian approach has also been
applied to non-symmetric saddle point problems. Although some concerns
have been expressed regarding possible distortion of the norm in which
convergence occurs, good results have been reported even for large Reynolds
number (Benzi et al. 2011b).
Earlier approaches for this problem involved segregated treatment of pressure
and velocity and predate the preconditioned Krylov subspace iteration
technology. It is perhaps worth commenting here that this example is one
of the first where block preconditioners comprising blocks designed for
different sets of variables allow treatment of different ‘physics’ within the
preconditioner whilst gmres or some other Krylov subspace method is ap-
plied to the complete coupled problem. This is an important paradigm.
Such is its importance that it is in the context of nonsymmetric saddle
point systems arising from approximation of the Oseen problem that per-
haps the most refined analysis of preconditioned gmres convergence has
been pursued. There are several eigenvalue analyses for different preconditioners,
see for example Elman and Silvester (1996), although these do not
lead to rigorous estimates as explained above. It is in the context of this
problem, however, that field-of-values convergence analysis has been successfully
employed to rigorously bound preconditioned gmres convergence
and to elucidate its dependence on or independence of the mesh size, h, and
of the Reynolds number; see for example the analysis of Loghin and Wathen
(2004) for PCD, and Benzi and Olshanskii (2011) for block triangular
preconditioning of an augmented Lagrangian formulation.
6.6. Circulant and semi-circulant preconditioning
Our opening example already indicates the potential for circulant precondi-
tioning, at least for symmetric Toeplitz matrices. In fact the applicability
of such ideas is rather wider, though the theoretical justification is less com-
plete for non-symmetric problems.
The analysis for symmetric and positive definite Toeplitz matrices is well
described in the books by Chan and Jin (2007) and Ng (2004) who also in-
dicate how similar ideas follow for block Toeplitz matrices with Toeplitz
blocks. Indeed, the use of circulant preconditioners for non-symmetric
Toeplitz matrices does not really depend on symmetry. However, unless the
normal equations are used, convergence guarantees are not readily avail-
able, simply because of the general lack of descriptive convergence bounds
for non-symmetric matrices. Good computational results have been ob-
served in many situations, and in fact even proved for certain problems via
bounding of the eigenvector condition number as well as the eigenvalues
and using (6.1) (Otto 1996). A recent observation of Pestana and Wathen
(2014), that guaranteed convergence bounds can be achieved for minres
for circulant preconditioned non-symmetric Toeplitz matrices via a variable
reordering, is also relevant.
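To indicate the mechanics, the following sketch (with a hypothetical symmetric Toeplitz matrix whose entries decay away from the diagonal) forms the Strang circulant approximation described in the introduction and applies its inverse via the FFT as a preconditioner; gmres is used here simply because no symmetry is assumed.

    import numpy as np
    from scipy.linalg import toeplitz
    from scipy.sparse.linalg import LinearOperator, gmres

    n = 1024
    k = np.arange(n)
    col = 1.0 / (1.0 + k) ** 2          # hypothetical decaying Toeplitz entries a_0, a_1, ...
    A = toeplitz(col)                    # symmetric Toeplitz matrix
    b = np.ones(n)

    # Strang circulant: keep the central diagonals and wrap them around periodically
    c = col.copy()
    c[n // 2 + 1:] = col[n - np.arange(n // 2 + 1, n)]   # c_k = a_{n-k} for k > n/2
    eigs = np.fft.fft(c)                 # eigenvalues of the circulant (real up to round-off here)

    def apply_C_inv(v):
        # solve C y = v using the FFT diagonalization of the circulant C
        return np.real(np.fft.ifft(np.fft.fft(v) / eigs))

    M = LinearOperator((n, n), matvec=apply_C_inv)
    x, info = gmres(A, b, M=M)
    print(info, np.linalg.norm(b - A @ x) / np.linalg.norm(b))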
7. Some comments on symmetry
We have emphasised the importance of matrix symmetry in this review,
in particular that it is rarely sensible to throw away such a property by
preconditioning. In the reverse direction, it would be highly advantageous
for non-symmetric matrices if it were easily possible to precondition to ob-
tain symmetry of the preconditioned problem without recourse to the nor-
mal equations. This possibility seems unlikely without the computation
of expensive matrix factorizations. However, there are situations in which
a broader consideration of self-adjointness allows many of the attractive
properties of symmetric matrices, such as the use of the symmetric Krylov
subspace iterative methods cg and minres.
A symmetric matrix, A = AT , is simply one which is self-adjoint in the
usual Euclidean inner product, that is,

⟨Au, v⟩ = v^T A u = u^T A v = ⟨Av, u⟩ = ⟨u, Av⟩.
In fact given any inner product

⟨u, v⟩_H = v^T H u                                   (7.1)

defined by a symmetric and positive definite matrix H (all inner products on
ℝ^n arise in this way), the distinction between symmetric and non-symmetric
matrices is really the distinction between matrices which are self-adjoint and
those which are non-self-adjoint with respect to the inner product. Indeed,
in the standard situation when A and P are both symmetric and P is
positive definite, then the left-preconditioned matrix P^{-1}A is self-adjoint in
⟨u, v⟩_P, since

⟨P^{-1}Au, v⟩_P = v^T P P^{-1}A u = v^T A u = v^T A^T u = v^T A^T P^{-T} P u = ⟨u, P^{-1}Av⟩_P.
This description of symmetric preconditioning for symmetric matrices with-
out destroying self-adjointness provides an alternative to the approach in
section 3. The cg method, for example, can be employed in such a gen-
eral inner product (7.1) for any matrix which is self-adjoint in this inner
product and satisfies the positive definiteness condition hAu, uiH > 0 for all
u 2 Rn {0}.
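This self-adjointness is easily verified numerically; the small sketch below (with arbitrary random symmetric positive definite matrices A and P, purely for illustration) checks the matrix identity P(P^{-1}A) = (P^{-1}A)^T P and the corresponding inner product identity on a pair of random vectors.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 50
    G = rng.standard_normal((n, n)); A = G @ G.T + np.eye(n)   # symmetric (indeed SPD) A
    F = rng.standard_normal((n, n)); P = F @ F.T + np.eye(n)   # SPD preconditioner P

    M = np.linalg.solve(P, A)          # the (nonsymmetric) preconditioned matrix P^{-1} A

    # self-adjointness of M in <u, v>_P = v^T P u is the matrix identity P M = M^T P
    print(np.allclose(P @ M, M.T @ P))                         # True

    # equivalently, <M u, v>_P = <u, M v>_P for arbitrary vectors
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    print(np.isclose(v @ P @ (M @ u), (M @ v) @ P @ u))        # True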
In general, any matrix A is self-adjoint in (7.1) if and only if, for all
u, v ∈ ℝ^n,

v^T H A u = ⟨Au, v⟩_H = ⟨u, Av⟩_H = v^T A^T H u,                      (7.2)

which is the same condition as HA = A^T H, so we must have A^T = HAH^{-1}
for some positive definite symmetric matrix H. It can be shown that such
a matrix H exists if and only if the real matrix A is diagonalisable and has
real eigenvalues (Taussky 1972). Thus the use of self-adjointness in non-
standard inner products is limited, but interesting examples do exist where,
via left- or right-preconditioning, non-symmetric preconditioned matrices
arise which are nevertheless self-adjoint in some convenient non-standard
inner product; see Stoll and Wathen (2008). Since the inner product must
be used for computation of scalar products (dot products) arising in Krylov
subspace methods, there is additional arithmetic overhead in the use of
H ≠ I unless other structures mitigate this.
We present just one example—perhaps the most well-known—of the use
of non-standard inner products; exceptionally, it allows the robust appli-
cation of cg in a non-standard inner product for an indefinite symmetric
saddle point problem by creating a self-adjoint and positive definite matrix
in that inner product (Bramble and Pasciak 1988). For other examples, see
Klawonn (1998) and Benzi and Simoncini (2006), and see Stoll and Wathen
(2008) and Pestana and Wathen (2013a) for how examples of this struc-
ture can be combined to yield further cases of possible utility (so-called
combination preconditioning).
In the Bramble and Pasciak method one preconditions the symmetric and
indefinite saddle point matrix as given in (5.5) on the left with the block
lower triangular matrix

\[
P \;=\; \begin{bmatrix} \hat{A} & 0 \\ B & -I \end{bmatrix},
\]

resulting in the non-symmetric matrix

\[
P^{-1}\mathcal{A} \;=\; \begin{bmatrix} \hat{A}^{-1}A & \hat{A}^{-1}B^T \\ B\hat{A}^{-1}A - B & B\hat{A}^{-1}B^T \end{bmatrix}.
\]

This turns out to be self-adjoint in the inner product (7.1) (i.e. it satisfies
the condition (7.2)) where

\[
H \;=\; \begin{bmatrix} A - \hat{A} & 0 \\ 0 & I \end{bmatrix}.
\]
Moreover P^{-1}𝒜 is positive definite in this inner product. Note that it is
necessary for A − Â to be positive definite in order to have an inner product
(theoretically – and quite often in practice – this can be achieved by scaling
of Â), and in fact the positive definiteness also requires this condition.
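Both properties are easy to check numerically; the sketch below (a small hypothetical example in which the simple scaling Â = A/2 guarantees that A − Â is positive definite) verifies self-adjointness and positive definiteness of P^{-1}𝒜 in the inner product defined by H.

    import numpy as np

    rng = np.random.default_rng(6)
    n, m = 30, 10
    G = rng.standard_normal((n, n)); A = G @ G.T + np.eye(n)     # SPD (1,1) block
    B = rng.standard_normal((m, n))                              # full-rank constraint block
    Ahat = 0.5 * A                                               # scaling so that A - Ahat is positive definite

    K = np.block([[A, B.T], [B, np.zeros((m, m))]])              # the saddle point matrix (5.5)
    P = np.block([[Ahat, np.zeros((n, m))], [B, -np.eye(m)]])    # block lower triangular preconditioner
    H = np.block([[A - Ahat, np.zeros((n, m))], [np.zeros((m, n)), np.eye(m)]])

    M = np.linalg.solve(P, K)                                    # the preconditioned matrix

    print(np.allclose(H @ M, M.T @ H))                           # self-adjoint in the H inner product
    sym = 0.5 * (H @ M + (H @ M).T)
    print(np.min(np.linalg.eigvalsh(sym)))                       # strictly positive: positive definite in that inner product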
Perhaps this appears to be a rather exotic structure, but much practical
use has been made of this particular example; the possibility for other really
useful examples remains unrealised but rather intriguing.
A final comment concerns the possibility of non-symmetric precondition-
ing of symmetric systems. We have earlier noted that such preconditioning
would destroy symmetry in general and thus is undesirable. However, in
the conceivable, but presumably rare, situation that A = A^T, P ≠ P^T but
P^{-1}A = AP^{-T}—that is, P^{-T} is one of the matrices that is self-adjoint
in the (possibly indefinite) inner product related to the symmetric matrix
A—then it would be possible to use cg for P^{-1}Ax = P^{-1}b when P^{-1}A
is positive definite, or minres if P^{-1}A is indefinite. The cg error would
converge in ‖·‖_{P^{-1}A} and for minres it would be the residual P^{-1}(b − Ax), which would
converge in ‖·‖_I. We do not know of any instance of this structure for non-symmetric
P; however, the symmetric preconditioner P^{-1} = q(A) for any
polynomial, q, clearly does commute with A, and there is some literature
on such polynomial preconditioning (see for example O'Leary (1991)).
8. Some comments on practical computing
Some important general ideas remain to be mentioned that do not in them-
selves comprise a particular preconditioning approach, but which have broad
significance.
First we mention deflation. It can be helpful – sometimes extremely help-
ful – to remove certain subspaces from a Krylov subspace. For example, if
one has a symmetric and positive definite matrix with a reasonable spread of
eigenvalues except for a few very small eigenvalues about which one has some
knowledge of the corresponding eigenspace, then cg convergence could be
much improved if the effect of this eigenspace could be removed or ignored.
This is the idea of deflation; if the eigenspace spanned by the eigenvectors
corresponding to the m smallest eigenvalues λ_1 ≤ λ_2 ≤ · · · ≤ λ_m can be
deflated, then the effective condition number is κ = λ_max/λ_{m+1} and it is
this which appears in the cg convergence bound (3.1). The crucial step is
to identify the deflating subspace.
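The effect of deflation on the effective condition number can be illustrated directly; the following sketch (with a hypothetical matrix constructed to have m very small eigenvalues, and with the troublesome eigenspace known exactly) applies a deflation projector of the usual form I − AW(WᵀAW)^{-1}Wᵀ and compares condition numbers.

    import numpy as np

    rng = np.random.default_rng(7)
    n, m = 200, 3
    # hypothetical SPD matrix with m very small eigenvalues and the rest well behaved
    evals = np.concatenate([1e-5 * (1.0 + np.arange(m)), np.linspace(1.0, 10.0, n - m)])
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = (Q * evals) @ Q.T

    W = Q[:, :m]                                   # (here exact) basis for the troublesome eigenspace
    # deflation projector: removes span(W) from the operator
    P = np.eye(n) - A @ W @ np.linalg.solve(W.T @ A @ W, W.T)

    mu = np.sort(np.linalg.eigvalsh(0.5 * (P @ A + (P @ A).T)))
    print(evals.max() / evals.min())               # condition number of A itself
    print(mu[-1] / mu[m])                          # effective condition number lambda_max / lambda_{m+1}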
The deflation idea is most effective when the problem naturally presents
such a space. For example, see Brandt and Ta’asan (1986), who deflate the
low frequency eigenspaces which correspond to negative eigenvalues for a
Helmholtz problem, and Jonsthovel et al. (2012) who describe a prototyp-
ical application involving composite materials. However, various subspaces
which arise from iterations of a previous linear system solve, or other sources,
can be used. In particular, when restarting gmres, one can use informa-
tion from previous iterates via deflation. There are many papers describing
various approaches: for symmetric and positive definite matrices, see for
example Frank and Vuik (2001), and for nonsymmetric matrices, see for
example Erhel et al. (1995), Sifuentes et al. (2013) and Giraud et al. (2010).
When solving sequences of linear systems, the related idea of ‘recycling’
can be considered (Parks et al. 2006). Deflation is central to the iterative
computation of eigenvalues (Lehoucq and Sorensen 1996).
As parallel computing becomes more widespread, preconditioners which
are parallelizable or inherently parallel become more important. After all, it
is large scale computation for which preconditioned iterative solution tech-
nology is generally being developed. Very simple preconditioning such as
diagonal scaling is obviously parallel, but the utility of such an approach is
limited, as we have described. Methods based on incomplete factorization
are more sequential in design, and though there are parallel variants, often
speed of convergence is compromised by ordering and separating variables,
etc. to obtain the necessary independence of calculations. Such is the im-
portance of multigrid and AMG methods that parallel forms which give up
the least of the potency of sequential implementation are critical. There are,
however, compromises to be made in going parallel (Worley 1991). There is
much to be considered here, and this is not an article on parallel multigrid.
The paradigm of domain decomposition is clearly ideal for parallel compu-
tation and this is precisely what it was invented for. Sparse approximate
inverses were precisely developed because they appeared to be a tractable
way to compute a purely algebraic (‘given a matrix’) preconditioner in par-
allel. Their effectiveness, however, remains unclear. One beneficial use of
these ideas appears to be in contexts where one is not seeking to approxi-
mate an inverse, but rather seeking some more limited goal like minimising
a commutator—see, for example Elman et al. (2006).
Since preconditioning is used with iterative methods, sensible criteria for
stopping an iteration must be chosen. In the ‘given a matrix’ context,
some fairly arbitrary small tolerance for a residual norm is usually specified.
When one is ‘given a problem’, however, there can be criteria appropriate
to the problem. In particular for PDE problems, reducing iteration error to
orders of magnitude less than discretization error is untenable and rather
wasteful; practical criteria are given by Arioli (2004) and Arioli et al. (2005).
An important point is that a larger-dimensional matrix for a given problem
comes from a more accurate discretization, so any stopping tolerance should
also in general be smaller on finer grids; see Elman et al. (2014b, Section
2.1.2).
In the general context of solution of linear systems of equations, it is often
regarded as desirable to reduce the number of variables/degrees of freedom
if this is easy; such thinking is associated completely with direct solution
methods. In the context of iterative methods, it can be preferable to have a
larger dimensional problem if it is then easier to identify and apply an ap-
propriate preconditioner. That is, it can be convenient even to enlarge the
dimension of given systems if the resulting matrices are more readily precon-
ditioned, as long as the storage requirement for the enlarged matrix and the
associated vectors does not become unacceptable. In finite element analysis,
for example, the idea of ‘static condensation’ is precisely the elimination of
variables with only local connections, but although it slightly reduces the
number of free variables, and hence the matrix dimension, the resulting
smaller matrices can be less easy to precondition even though they often
possess a slightly reduced condition number. More radical is the idea of
the first-order system least-squares (FOSLS) approach, where extra depen-
dent variables are explicitly introduced making a rather larger-dimensional
system in general, but a system for which multigrid solvers can be readily
applied. For an application of this idea, see de Sterck et al. (2004), for
example.
A further example of the freedom and possibilities is the ‘all-at-once’, ‘one-
shot’ or ‘monolithic’ approach, where preconditioning for separate parts of a
problem – perhaps separate physical processes – are employed together with
a Krylov subspace method for the whole of a coupled problem for which the
required matrix-vector products are computable in a matrix-free way, that
is, without explicit storage and operation on the complete two-dimensional
array. See, for example Heil et al. (2008), Haber and Ascher (2001) or Rees
et al. (2010b). When matrix-vector products are readily computed, there
is always a Krylov subspace method that can be applied to the monolithic
problem; the representation of the component parts can very conveniently
be the job of a block-structured preconditioner. Good examples of this
kind of approach are recent ideas for preconditioning problems coming from
magnetohydrodynamics (Phillips et al. 2014, Wathen 2014) and from geo-
dynamics (Rhebergen et al. 2014).
A final comment is aimed at anyone who would like to experiment and
see the potential of preconditioning in an easy computational setting. The
freely downloadable ifiss software (Elman et al. 2007, Elman et al. 2014a),
which runs under basic matlab or octave, enables such experiments.
9. Summary
We conclude with a few general guidelines on preconditioner design and
selection.
• Keep to the original structure when preconditioning (e.g., preserve
symmetry, partitioning into blocks, etc.)
• The more knowledge about a problem that can be represented in a
preconditioner, the better.
• The work in applying the preconditioner should ideally be commen-
surate with the work in matrix–vector multiplication: that would be
O(n) for a sparse matrix and O(n^2) for a general full matrix.
• It may be easier to precondition a matrix of larger dimension, for which
structures are clearer, than to eliminate variables, giving a less struc-
tured matrix of smaller dimension.
There is, of course, no such concept as a best preconditioner: the only
two candidates for this would be P = I, for which the preconditioning takes
no time at all, and P = A, for which only one iteration would be required
for solution by any iterative method. However, every practitioner knows
when they have a good preconditioner which enables feasible computation
and solution of problems. In this sense, preconditioning will always be an
art rather than a science.
Acknowledgements
I am grateful to Michele Benzi, Howard Elman and David Silvester for read-
ing an early draft; their comments and suggestions have led to significant
improvements.
Parts of this survey were written whilst visiting CERFACS, Toulouse,
France and the Memorial University of Newfoundland, Canada. I am grate-
ful to Serge Gratton, Xavier Vasseur, Hermann Brunner and Scott MacLach-
lan for their support and encouragement.
REFERENCES
P.R. Amestoy, I.S. Duff, J-Y. L’Excellent and J. Koster (2000), ’MUMPS: a general
purpose distributed memory sparse solver’, In Proc. PARA2000, 5th Interna-
tional Workshop on Applied Parallel Computing, Springer-Verlag, 122–131.
J.R. Appleyard and I.M. Cheshire (1983), ’Nested factorization’, Society of Petroleum
Engineers, SPE 12264, presented at the Seventh SPE Symposium on Reservoir
Simulation, San Francisco, 1983.
M. Arioli (2004), ’A stopping criterion for the conjugate gradient algorithm in a
finite element method framework’, Numer. Math. 97, 1–24.
M. Arioli, D. Loghin and A.J. Wathen (2005), ’Stopping criteria for iterations in
finite element methods’, Numer. Math. 99, 381–410.
D.N. Arnold, R.S. Falk and R. Winther (1997), ’Preconditioning in H(div) and
applications’, Math. Comput. 66, 957–984.
Z.-Z. Bai, G.H. Golub and M.K. Ng (2003), ’Hermitian and skew-Hermitian split-
ting methods for non-Hermitian positive definite linear systems’, SIAM J.
Matrix Anal. Appl. 24, 603–626.
N.S. Bakhvalov (1966), ’On the convergence of a relaxation method with natural
constraints on the elliptic operator’, USSR Comp. Math. and Math. Phys.
6, 101–135.
M. Bebendorf (2008), Hierarchical Matrices: A Means to Efficiently Solve Ellip-
tic Boundary Value Problems, Lecture Notes in Computational Science and
Engineering, vol. 63, Springer-Verlag.
M. Benzi (2002), ’Preconditioning techniques for large linear systems: A survey’,
J. Comput. Phys. 182, 418–477.
M. Benzi, J.K. Cullum and M. Tuma (2000), ’Robust approximate inverse precon-
ditioning for the conjugate gradient method’, SIAM J. Sci. Comput. 22,1318–
1332.
M. Benzi, G.H. Golub and J. Liesen (2005), ’Numerical solution of saddle point
problems’, Acta Numerica 14, 1–137.
M. Benzi, C.D. Meyer and M. Tuma (1996), ’A sparse approximate inverse precon-
ditioner for the conjugate gradient method’, SIAM J. Sci. Comput. 17, 1135–
1149.
M. Benzi, M. Ng, Q. Niu and Z. Wang (2011a), ’A relaxed dimensional factorization
preconditioner for the incompressible Navier-Stokes equations’, J. Comput.
Phys. 230, 6185–6202.
M. Benzi and M.A. Olshanskii (2011), ’Field-of-values convergence analysis of aug-
mented Lagrangian preconditioners for the linearized Navier-Stokes problem’,
SIAM J. Numer. Anal. 49, 770–788.
M. Benzi, M.A. Olshanskii and Z. Wang (2011b), ’Modified augmented Lagrangian
preconditioners for the incompressible Navier-Stokes equations’, Int. J. Nu-
mer. Meth. Fluids 66, 486–508.
M. Benzi and V. Simoncini (2006), ’On the eigenvalues of a class of saddle point
matrices’, Numer. Math. 103, 173–196.
M. Benzi and M. Tuma (1998), ’A sparse approximate inverse preconditioner for
nonsymmetric linear systems’, SIAM J. Sci. Comput. 19, 968–994.
M. Benzi and M. Tuma (2003), ’A robust incomplete factorization preconditioner
for positive definite matrices’, Numer. Linear Algebra Appl. 10, 385–400.
M. Benzi and A.J. Wathen (2008), ’Some preconditioning techniques for saddle
point problems’, in Model order reduction: theory, research aspects and ap-
plications, 195–211.
L. Bergamaschi, J. Gondzio and G. Zilli (2004), ’Preconditioning indefinite systems
in interior point methods for optimization’, Comput. Optim. Appl. 28, 149–
171.
M. Bern, J.R. Gilbert, B. Hendrickson, N. Nguyen and S. Toledo (2006), ’Support-
Graph preconditioners’, SIAM J. Matrix Anal. Appl. 27, 930–951.
A. Björck (1996), Numerical Methods for Least Squares Problems, Society for In-
dustrial and Applied Mathematics.
J.W. Boyle, M.D. Mihajlovic and J.A. Scott (2010), ’HSL MI20: an efficient AMG
preconditioner for finite element problems in 3D’, Int. J. Numer. Meth. En-
gnrg 82, 64–98.
D. Braess and P. Peisker (1986), ’On the numerical solution of the biharmonic equa-
tion and the role of squaring matrices for preconditioning’, IMA J. Numer.
Anal. 6, 393–404.
J. Bramble and J. Pasciak (1988), ’A preconditioning technique for indefinite sys-
tem resulting from mixed approximations of elliptic problems’, Math. Com-
put. 50, 1–17.
J.H. Bramble, J.E. Pasciak and J. Xu (1990), ’Parallel multilevel preconditioners’,
Math. Comput. 55,1–22.
A. Brandt (1977), ’Multilevel adaptive solutions to boundary-value problems’,
Math. Comput. 31,333–390.
A. Brandt and S. Ta’asan (1986), ’Multigrid methods for nearly singular and
slightly indefinite problems’, in Multigrid Methods II, W. Hackbusch and
U. Trottenberg, eds., Lecture Notes in Math. 1228, Springer-Verlag, Berlin,
pp. 99–121.
O. Bröker, M.J. Grote, C. Mayer and A. Reusken (2001), ’Robust parallel smooth-
ing for multigrid via sparse approximate inverses’, SIAM J. Sci Comp.
23, 1395–1416.
R.H.-F. Chan and X.-Q. Jin (2007), An Introduction to Iterative Toeplitz Solvers,
Society for Industrial and Applied Mathematics.
T.F. Chan and T.P. Mathew (1994), ’Domain decomposition algorithms’, Acta
Numerica 3, 61–143.
S. Chandrasekaran, P. Dewilde, M. Gu, T. Pals, X. Sun, A.J. van der Veen and
D. White (2005), ’Some fast algorithms for sequentially semiseparable repre-
sentation’, SIAM J. Matrix Anal. Appl. 27, 341–364.
K. Chen (2005), Matrix preconditioning techniques and applications, Cambridge
University Press.
P. Concus and G.H. Golub (1976), ’A generalized conjugate gradient method for
nonsymmetric systems of linear equations’, in R. Glowinski and P.L. Lions
eds., Lecture Notes in Economics and Mathematical Systems 134, 56–65,
Springer.
J.W. Cooley and J.W. Tukey (1965), ’An algorithm for the machine calculation of
complex Fourier series’, Math. Comput. 19, 297–301.
T.A. Davis (2004), ’Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern
multifrontal method’, ACM Trans. Math. Softw. 30, 196–199.
’International Conference on Domain Decomposition Methods’,
http://www.ddm.org/conferences.html .
H. de Sterck, T. Manteuffel, S. McCormick and L. Olson (2004), ’Least-squares
finite element methods and algebraic multigrid solvers for linear hyperbolic
PDEs’, SIAM J. Sci Comp. 26, 31–54.
H.S. Dollar, N.I.M. Gould, W.H.A. Schilders and A.J. Wathen (2006), ’Implicit-
factorization preconditioning and iterative solvers for regularized saddle-point
systems’, SIAM J. Matrix Anal. Appl. 28, 170–189.
H.S. Dollar and A.J. Wathen (2006), ’Approximate factorization constraint precon-
ditioners for saddle-point matrices’, SIAM J. Sci. Comput. 27, 1555–1572.
T.F. Dupont, R.P. Kendall and H.H. Rachford (1968), ’An approximate factoriza-
tion procedure for solving self-adjoint elliptic difference equations’, SIAM J.
Numer. Anal. 5, 554–573.
H.C. Elman and M.P. Chernesky (1993), ’Ordering effects on relaxation methods
applied to the discrete one-dimensional convection-diffusion equation’, SIAM
J. Numer. Anal. 30, 1268–1290.
H.C. Elman, O.G. Ernst and D.P. O’Leary (2001), ’A multigrid method enhanced
by Krylov subspace iteration for discrete Helmholtz equations’, SIAM J. Sci.
Comput. 23, 1291–1315.
H.C. Elman, V.E. Howle, J.N. Shadid, R. Shuttleworth and R. Tuminaro (2006),
’Block preconditioners based on approximate commutators’, SIAM J. Sci.
Comput. 27, 1651–1668.
H.C. Elman, A. Ramage and D.J. Silvester (2007), ’Algorithm 886: IFISS, A Mat-
lab toolbox for modelling incompressible flow’, ACM Trans. Math. Softw.
33(2), article no. 14.
H.C. Elman, A. Ramage and D.J. Silvester (2014a), ’IFISS: A computational lab-
oratory for investigating incompressible flow problems’, SIAM Rev. 56, 261–
273.
H.C. Elman and M.H. Schultz (1986), ’Preconditioning by fast direct methods
for non-self-adjoint nonseparable elliptic equations’, SIAM J. Numer. Anal.
23, 44–57.
H.C. Elman and D.J. Silvester (1996), Fast nonsymmetric iterations and precon-
ditioning for the Navier-Stokes equations’, SIAM J. Sci. Comput. 17, 33–46.
H.C. Elman, D.J. Silvester and A.J. Wathen (2014b), Finite Elements and Fast
Iterative Solvers: with applications in incompressible fluid dynamics, Second
Edition, Oxford University Press.
H.C. Elman, D.J. Silvester and A.J. Wathen (2005), Finite Elements and Fast
Iterative Solvers: with applications in incompressible fluid dynamics, Oxford
University Press.
H.C. Elman and X. Zhang (1995), ’Algebraic analysis of the hierarchical basis
preconditioner’, SIAM J. Matrix Anal. Appl. 16, 192–206.
B. Engquist and L. Ying (2011a), ’Sweeping preconditioner for the Helmholtz equa-
tion: Hierarchical matrix representation’, Comm. Pure Appl. Math. 64, 697–
735.
B. Engquist and L. Ying (2011b), ’Sweeping preconditioner for the Helmholtz equa-
tion: moving perfectly matched layers’, Multiscale Model. Simul. 9, 686–710.
J. Erhel, K. Burrage and B. Pohl (1995), ’Restarted GMRES preconditioned by
deflation’, J. Comput. Appl. Math. 69, 303–318.
Y.A. Erlangga, C.W. Oosterlee and C. Vuik (2006), ’A novel multigrid based pre-
conditioner for heterogeneous Helmholtz problems’, SIAM J. Sci. Comput.
27, 1471–1492.
O.G. Ernst and M.J. Gander (2012), ’Why it is difficult to solve Helmholtz prob-
lems with classical iterative methods’, in Numerical analysis of multiscale
problems, 325–363, Springer.
V. Faber and T. Manteuffel (1984), ’Necessary and sufficient conditions for the
existence of a conjugate gradient method’, SIAM J. Numer. Anal. 21, 315–
339.
R. P. Federenko (1964), ’The speed of convergence of one iterative process’, USSR
Comp. Math. and Math. Phys. 4, 227–235.
B. Fischer (2011), Polynomial Based Iteration Methods for Symmetric Linear Sys-
tems, Society for Industrial and Applied Mathematics.
B. Fischer, A. Ramage, D.J. Silvester and A.J. Wathen (1998), ’Minimum residual
methods for augmented systems’, BIT 38, 527–543.
J. Frank and C. Vuik (2001), ’On the construction of deflation-based precondition-
ers’, SIAM J. Sci. Comput. 23, 442–462.
R.W. Freund and N.M. Nachtigal (1991), ’QMR: a quasi-minimal residual method
for non-Hermitian linear systems’, Numer. Math. 60, 315–339.
R.W. Freund and N.M. Nachtigal (1994), ’A new Krylov-subspace method for
symmetric indefinite linear systems’, Proceedings of the 14th IMACS World
Congress on Computational and Applied Mathematics, W.F. Ames(Ed.),
IMACS, 1253–1256.
R.W. Freund and S. Ruscheweyh (1986), ’On a class of Chebyshev approximation
problems which arise in connection with a conjugate gradient type method’,
Numer. Math. 48, 525–542.
M.J. Gander, F. Magoulès and F. Nataf (2002), ’Optimized Schwarz methods with-
out overlap for the Helmholtz equation’, SIAM J. Sci. Comput. 24, 38–60.
M.W. Gee, C.M. Siefert, J.J. Hu, R.S. Tuminaro and M.G. Sala (2006), ’ML 5.0
Smoothed Aggregation User’s Guide’, Sandia National Laboratories, Report
number SAND2006-2649.
M.B. Giles (2008), ’Multilevel Monte Carlo path simulation’, Oper. Res. 56, 607–
617.
L. Giraud, S. Gratton, X. Pinel and X. Vasseur (2010), ’Flexible GMRES with
deflated restarting’, SIAM J. Sci. Comput. 32, 1858–1878.
N.I.M. Gould, M.E. Hribar and J. Nocedal (2001), ’On the solution of equality
constrained quadratic programming problems arising in optimization’, SIAM
J. Sci. Comput. 23, 1376–1395.
N.I.M. Gould and V. Simoncini (2009), ’Spectral analysis of saddle point matrices
with indefinite leading blocks’, SIAM J. Matrix Anal. Appl. 31, 1152–1171.
I.G. Graham and M.J. Hagger (1999), ’Unstructured additive Schwarz-conjugate
gradient method for elliptic problems with highly discontinuous coecients’,
SIAM J. Sci. Comput. 20, 2041–2066.
L. Grasedyck and W. Hackbusch (2003), ’Construction and arithmetics of H-
matrices’, Computing 70, 295–334.
A. Greenbaum (1997), Iterative Methods for Solving Linear Systems, Society for
Industrial and Applied Mathematics.
L. Greengard and V. Rokhlin (1987), ’A fast algorithm for particle simulations’, J.
Comput. Phys. 73, 325–348.
C. Greif and D. Schötzau (2006), ’Preconditioners for saddle point linear systems
with highly singular (1,1) blocks’, Elect. Trans. Numer. Anal. 22, 114–121.
C. Greif and D. Schötzau (2007), ’Preconditioners for the discretized time-harmonic
Maxwell equations in mixed form’, Numer. Lin. Alg. Appl. 14, 281–297.
M.J. Grote and T. Huckle (1997), ’Parallel preconditioning with sparse approxi-
mate inverses’, SIAM J. Sci. Comput. 18, 838–853.
A. Günnel, R. Herzog and E. Sachs (2014), ’A note on preconditioners and scalar
products in Krylov subspace methods for self-adjoint problems in Hilbert
space’, Elect. Trans. Numer. Anal. 41, 13–20.
I. Gustafsson (1978), ’A class of first order factorization methods’, BIT 18, 142–
156.
E. Haber and U.M. Ascher (2001), ’Preconditioned all-at-once methods for large,
sparse parameter estimation problems’, Inverse Probl. 17, 1847–1864.
W. Hackbusch (1994), Iterative Solution of Large Sparse Systems of Equations,
Springer-Verlag.
W. Hackbusch (1999), ’A sparse matrix arithmetic based on H-matrices. I. Intro-
duction to H-matrices’, Computing 62, 89–108
W. Hackbusch, B.K. Khoromskij and S. Sauter (2000), ’On H^2-matrices’, in Lec-
tures on Applied Mathematics, 9–29, H. Bungartz, R. Hoppe and C. Zenger,
eds..
M. Heil, A.L. Hazel and J. Boyle (2008), ’Solvers for large-displacement fluid-
structure interaction problems: segregated versus monolithic approaches’,
Comput. Mech. 43, 91–101.
V.E. Henson and U.M. Yang (2000), ’BoomerAMG: a parallel algebraic multigrid
solver and preconditioner’, Appl. Numer. Math. 41, 155–177.
M.R. Hestenes and E. Stiefel (1952), ’Methods of conjugate gradients for solving
linear systems’, J. Res. Nat. Bur. Stand. 49, 409–436.
R. Hiptmair and J. Xu (2007), ’Nodal auxiliary space preconditioning in H(curl)
and H(div) spaces’, SIAM J. Numer. Anal. 45, 2483–2509.
R.M. Holland, A.J. Wathen and G.J. Shaw (2005), ’Sparse approximate inverses
and target matrices’, SIAM J. Sci. Comput. 26, 1000–1011.
’HSL(2013). A collection of fortran codes for large scale scientific computation’,
http://www.hsl.rl.ac.uk .
V. John and L. Tobiska (2000), ’Numerical performance of smoothers in coupled
multigrid methods for the parallel solution of the incompressible Navier-
Stokes equations’, Int. J. Numer. Meth. Fluids 33, 453–473.
T.B. Jonsthovel, M.B. van Gijzen, S.C. Maclachlan, C. Vuik and A. Scarpas (2012),
’Comparison of the deflated preconditioned conjugate gradient method and
algebraic multigrid for composite materials’, Comput. Mech. 50, 321–333.
D. Kay, D. Loghin and A.J. Wathen (2002), ’A preconditioner for the steady-state
Navier-Stokes equations’, SIAM J. Sci. Comput. 24, 237–256.
C. Keller, N.I.M. Gould and A.J. Wathen (2000), ’Constraint preconditioning for
indefinite linear systems’, SIAM J. Matrix Anal. Appl. 21, 1300–1317.
A. Klawonn (1998), ’Block-triangular preconditioners for saddle point problems
with a penalty term’, SIAM J. Sci. Comput. 19, 172–184.
A. Klawonn and G. Starke (1999), ’Block triangular preconditioners for nonsym-
metric saddle point problems: field-of-value analysis’, Numer. Math. 81, 577–
594.
L.Y. Kolotilina and A.Y. Yeremin (1993), ’Factorized sparse approximate inverse
preconditioning. I. Theory’, SIAM J. Matrix Anal. Appl. 14, 45–58.
J. Korzak (1999), ’Eigenvalue relations and conditions of matrices arising in linear
programming’, Computing 62, 45–54.
A.L. Laird and M.B. Giles (2002), ’Preconditioned iterative solution of the 2D
Helmholtz equation’, report NA-02/12, Oxford University Computer Labora-
tory, Oxford, UK.
R.B. Lehoucq and D.C. Sorensen (1996), ’Deflation techniques for an implicitly
restarted Arnoldi iteration’, SIAM J. Matrix Anal. Appl. 17, 789–821.
X.S. Li (2005), ’An overview of SuperLU: Algorithms, implementation and user
interface’, ACM Trans. Math. Software 31, 302–325.
J. Liesen and Z. Strakoš (2013), Krylov Subspace Methods: Principles and Analysis,
Oxford University Press.
O.E. Livne and A. Brandt (2012), ’Lean Algebraic Multigrid (LAMG): Fast graph
Laplacian linear solver’, SIAM J. Sci. Comput. 34, B499–B522.
D. Loghin and A.J. Wathen (2004), ’Analysis of preconditioners for saddle-point
problems’, SIAM J. Sci. Comput. 25, 2029–2049.
J. Málek and Z. Strakoš (2014), Preconditioning and the Conjugate Gradient
Method in the Context of Solving PDEs, Society for Industrial and Applied
Mathematics.
S. Manservisi (2006), ’Numerical analysis of Vanka-type solvers for steady Stokes
and Navier-Stokes flows’, SIAM J. Numer. Anal. 44, 2025–2056.
K.-A. Mardal and R. Winther (2011), ’Preconditioning discretizations of systems
of partial differential equations’, Numer. Linear Algebra Appl. 18, 1–40.
T.P. Mathew (2008), Domain Decomposition Methods for the Numerical Solution
of Partial Differential Equations, Springer.
J.A. Meijerink and H.A. van der Vorst (1977), ’An iterative solution method for
linear systems of which the coefficient matrix is a symmetric M-matrix’, Math.
Comput. 31, 148–162.
G.A. Meurant (1999), Computer Solution of Large Linear Systems, North-Holland.
M. Magolu Monga Made, R. Beauwens and G. Warzée (2000), ’Preconditioning
of discrete Helmholtz operators perturbed by a diagonal complex matrix’,
Commun. Numer. Methods Engrg. 16, 801–817.
M.F. Murphy, G.H. Golub and A.J. Wathen (2000), ’A note on preconditioning for
indefinite linear systems’, SIAM J. Sci. Comput. 21, 1969–1972.
M. Neytcheva (1999), talk presented at the first International Conference on Pre-
conditioning, Minneapolis, 1999.
M.K. Ng (2004), Iterative Methods for Toeplitz Systems, Oxford University Press.
Y. Notay (2012), ’Aggregation-based algebraic multigrid for convection-diffusion
equations’, SIAM J. Sci. Comput. 34, A2288–A2316.
D.P. O’Leary (1991), ’Yet another polynomial preconditioner for the conjugate
gradient algorithm’, Linear Algebra Appl. 154–156, 377–388.
M.A. Olshanskii and E.E. Tyrtyshnikov (2014), Iterative Methods for Linear Sys-
tems: Theory and Applications, Society for Industrial and Applied Mathe-
matics.
M.E.G. Ong (1997), ’Hierarchical basis preconditioners in three dimensions’, SIAM
J. Sci. Comput. 18, 479–498.
K. Otto (1996), ’Analysis of preconditioners for hyperbolic partial differential
equations’, SIAM J. Numer. Anal. 33, 2131–2165.
C.C. Paige and M.A. Saunders (1975), ’Solution of sparse indefinite systems of
linear equations’, SIAM J. Numer. Anal. 12, 617–629.
C.C. Paige and M.A. Saunders (1982), ’LSQR: An algorithm for sparse linear
equations and sparse least squares’, ACM Trans. Math. Software 8, 43–71.
M.L. Parks, E. de Sturler, G. Mackey, D.D. Johnson and S. Maiti (2006), ’Recycling
Krylov subspaces for sequences of linear systems’, SIAM J. Sci. Comput.
28, 1651–1674.
J. Pestana and A.J. Wathen (2014), ’A preconditioned MINRES method for non-
symmetric Toeplitz matrices’, Numerical Analysis report 14/11, Oxford Uni-
versity Mathematical Institute, Oxford, UK.
J. Pestana and A.J. Wathen (2013a), ’Combination preconditioning of saddle point
systems for positive definiteness’, Numer. Linear Algebra Appl. 20, 785–808.
J. Pestana and A.J. Wathen (2013b), ’On choice of preconditioner for mini-
mum residual methods for non-Hermitian matrices’, J. Comput. Appl. Math.
249, 57–68.
J. Pestana and A.J. Wathen (2015), ’Natural preconditioning and iterative methods
for saddle point systems’, SIAM Rev., to appear.
E.G. Phillips, H.C. Elman, E.C. Cyr, J.N. Shadid and R.P. Pawlowski (2014), ’A
block preconditioner for an exact penalty formulation for stationary MHD’,
University of Maryland, Computer Science report CS-TR-5032.
J. Poulson, B. Engquist, S. Li and L. Ying (2013), ’A parallel sweeping precon-
ditioner for heterogeneous 3D Helmholtz equations’, SIAM J. Sci. Comput.
35, C194–C212.
A. Quarteroni and A. Valli (1999), Domain Decomposition Methods for Partial
Differential Equations, Oxford University Press.
A. Ramage (1999), ’A multigrid preconditioner for stabilised discretisations of
advection-di↵usion problems’, J. Comput. Appl. Math. 101, 187–203.
T. Rees, H.S. Dollar and A.J. Wathen (2010a), ’Optimal solvers for PDE-
constrained optimization’, SIAM J. Sci. Comput. 32, 271–298.
T. Rees, M. Stoll and A.J. Wathen (2010b), ’All-at-once preconditioning in PDE-
constrained optimization’, Kybernetika 46, 341–360.
S. Rhebergen, G.N. Wells, R.F. Katz and A.J. Wathen (2014), ’Analysis of block-
preconditioners for models of coupled magma/mantle dynamics’, SIAM J.
Sci. Comput. 36, A1960–A1977.
Y. Saad (1993), ’A flexible inner-outer preconditioned GMRES algorithm’, SIAM
J. Sci. Comput. 14, 461–469.
Y. Saad (1996), ’ILUM: a multi-elimination ILU preconditioner for general sparse
matrices’, SIAM J. Sci. Comput. 17, 830–847.
Y. Saad (2003), Iterative Methods for Sparse Linear Systems, second edition, So-
ciety for Industrial and Applied Mathematics.
Y. Saad and M.H. Schultz (1986), ’GMRES: A generalized minimal residual algo-
rithm for solving nonsymmetric linear systems’, SIAM J. Sci. Statist. Com-
put. 7, 856–869.
J.A. Sifuentes, M. Embree and R.B. Morgan (2013), ’GMRES convergence for per-
turbed coefficient matrices, with application to approximate deflation precon-
ditioning’, SIAM. J. Matrix Anal. Appl. 34, 1066–1088.
D.J. Silvester and A.J. Wathen (1994), ’Fast iterative solution of stabilised Stokes
systems Part II: Using general block preconditioners’, SIAM J. Numer. Anal.
31, 1352–1367.
V. Simoncini and D.B. Szyld (2007), ’Recent computational developments in
Krylov subspace methods for linear systems’, Numer. Linear Algebra Appl.
14, 1–59.
G.L.G. Sleijpen and D.R. Fokkema (1993), ’BiCGstab(ℓ) for linear equations in-
volving unsymmetric matrices with complex spectrum’, Elect. Trans. Numer.
Anal. 1, 11–32.
B. Smith, P. Bjørstad and W. Gropp (1996), Domain Decomposition: Parallel Mul-
tilevel Methods for Elliptic Partial Differential Equations, Cambridge Univer-
sity Press.
P. Sonneveld and M.B. van Gijzen (2008), ’IDR(s): A family of simple and fast
algorithms for solving large nonsymmetric systems of linear equations’, SIAM
J. Sci. Comput. 31, 1035–1062.
D.A. Spielman and S.-H. Teng (2014), ’Nearly-linear time algorithms for precon-
ditioning and solving symmetric, diagonally dominant linear systems’, SIAM
J. Matrix Anal. Appl. 35, 835–885.
M. Stoll and A.J. Wathen (2008), ’Combination preconditioning and the Bramble-
Pasciak+ preconditioner’, SIAM J. Matrix Anal. Appl. 30, 582–608.
G. Strang (1986), ’A proposal for Toeplitz matrix calculations’, Stud. Appl. Math.
74, 171–176.
W-P. Tang and W.L. Wan (2000), ’Sparse approximate inverse smoother for multi-
grid’, SIAM J. Matrix Anal. Appl. 21, 1236–1252.
O. Taussky (1972), ’The role of symmetric matrices in the study of general matri-
ces’, Linear Algebra Appl. 5, 147–154.
A. Toselli and O. Widlund (2004), Domain Decomposition Methods – Algorithms
and Theory, Springer Series in Computational Mathematics, Vol. 34.
L.N. Trefethen and M. Embree (2005), Spectra and Pseudospectra, Princeton Uni-
versity Press.
U. Trottenberg, C.W. Oosterlee and A. Schüller (2000), Multigrid, Academic Press.
P. Tsuji, B. Engquist and L. Ying (2012), ’A sweeping preconditioner for time-
harmonic Maxwell’s equations with finite elements’, J. Comput. Phys.
231, 3770–3783.
M.B. van Gijzen, Y.A. Erlangga and C. Vuik (2007), ’Spectral analysis of the
discrete Helmholtz operator preconditioned with a shifted Laplacian’, SIAM
J. Sci. Comput. 29, 1942–1958.
H.A. van der Vorst (1992), ’Bi-CGSTAB: A fast and smoothly converging variant of
Bi-CG for the solution of nonsymmetric linear systems’, SIAM J. Sci. Statist.
Comput. 13, 631–644.
H.A. van der Vorst (2003), Iterative Krylov Methods for Large Linear Systems,
Cambridge University Press.
C.F. Van Loan (1992), Computational Frameworks for the Fast Fourier Transform,
Society for Industrial and Applied Mathematics.
P.S. Vassilevski (2008), Multilevel Block Factorization Preconditioners, Matrix-
based Analysis and Algorithms for Solving Finite Element Equations,
Springer.
R. Verfürth (1984), ’A combined conjugate gradient-multigrid algorithm for the
numerical solution of the Stokes problem’, IMA J. Numer. Anal. 4, 441–455.
A.J. Wathen (1986), ’Realistic eigenvalue bounds for the Galerkin mass matrix’,
IMA J. Numer. Anal. 7, 449–457.
A.J. Wathen (2007), ’Preconditioning and convergence in the right norm’, Int. J.
Comput. Math. 84, 1199–1209.
A.J. Wathen, B. Fischer and D.J. Silvester (1995), ’The convergence rate of the
minimum residual method for the Stokes problem’, Numer. Math. 71, 121–
134.
A.J. Wathen and T. Rees (2009), ’Chebyshev semi-iteration in preconditioning for
problems including the mass matrix’, Elect. Trans. Numer. Anal. 34, 125–135.
M.P. Wathen (2014), ’Iterative solution of a mixed finite element discretisation of
an incompressible magnetohydrodynamics problem’, MSc dissertation, De-
partment of Computer Science, University of British Columbia.
O. Widlund (1978), ’A Lanczos method for a class of nonsymmetric systems of
linear equations’, SIAM J. Numer. Anal. 15, 801–812.
P.H. Worley (1991), ’Limits on parallelism in the numerical solution of linear partial
differential equations’, SIAM J. Sci. Statist. Comput. 12, 1–35.
J. Xu (1992), ’Iterative methods by space decomposition and subspace correction’,
SIAM Rev. 34, 581–613.
G. Ymbert, M. Embree and J. Sifuentes (2015), ’Approximate Murphy-Golub-
Wathen preconditioning for saddle point problems’, in preparation.
H. Yserentant (1986), ’On the multi-level splitting of finite element spaces’, Numer.
Math. 49, 379–412.
