xuebaunion@vip.163.com

3551 Trousdale Rkwy, University Park, Los Angeles, CA

留学生论文指导和课程辅导

无忧GPA：https://www.essaygpa.com

工作时间：全年无休-早上8点到凌晨3点

微信客服：xiaoxionga100

微信客服：ITCS521

程序代写案例-OX2

时间：2021-05-05

For: Acta Numerica

Preconditioning

A. J. Wathen

Mathematical Institute, University of Oxford,

Andrew Wiles Building, Radcli↵e Observatory Quarter,

Woodstock Road, Oxford OX2 6GG, UK

E-mail: wathen@maths.ox.ac.uk

The computational solution of problems can be restricted by the availability

of solution methods for linear(ized) systems of equations. In conjunction with

iterative methods, preconditioning is often the vital component in enabling

the solution of such systems when the dimension is large. We attempt a broad

review of preconditioning methods.

CONTENTS

1 Introduction 1

2 Krylov subspace iterative methods 5

3 Symmetric and positive definite matrices 8

4 Semidefinite matrices 18

5 Symmetric and indefinite matrices 19

6 Nonsymmetric matrices 27

7 Some comments on symmetry 33

8 Some comments on practical computing 35

9 Summary 38

References 39

1. Introduction

This review aims to give a comprehensive view of the current state of pre-

conditioning for general linear(ized) systems of equations. However we start

with a specific example.

2 A. J. Wathen

Consider a symmetric matrix with constant diagonals:

A =

266664

a0 a1 · · an1

a1 a0 a1 · ·

· a1 a0 · ·

· · · · a1

an1 · · a1 a0

377775 2 Rn⇥n.

Such a matrix is called a symmetric Toeplitz matrix. It is clearly defined

by just n real numbers, but what is the best solution method for linear sys-

tems involving A? The fastest known stable factorisation (direct) methods

for solving a linear system Ax = b for x 2 Rn given b 2 Rn require O(n2)

floating point computations—this is still better than the O(n3) computa-

tions required for a general real n ⇥ n matrix using the well-known Gauss

Elimination algorithm.

Strang (1986) observed that one could approximate A by the circulant

matrix

C =

2666666666666666664

a0 a1 . . . abn2 c abn12 c . . . a2 a1

a1 a0 a1 . . . abn2 c abn12 c . . . a2

. . . a1 a0

. . . . . . abn2 c

. . . . . .

abn2 c . . .

. . .

. . .

. . . . . .

. . . abn12 c

abn12 c

. . . . . .

. . .

. . .

. . . . . . abn2 c

. . .

. . . abn2 c . . .

. . . a0 a1 . . .

a2 . . . abn12 c abn2 c . . . a1 a0 a1

a1 a2 . . . abn12 c abn2 c . . . a1 a0

3777777777777777775

2 Rn⇥n

where the outer diagonals of A have been overwritten by the central di-

agonals which are ‘wrapped around’ in a periodic manner. Circulant ma-

trices are diagonalised by a Fourier basis (Van Loan 1992) and it follows

that for any given b 2 Rn, the fast Fourier transform (FFT) (Cooley and

Tukey 1965) enables the computation of the matrix-vector product Cb and

of the solution, y 2 Rn of Cy = b in O(n log n) arithmetic operations. The

importance of Strang’s observation is that C is close to A if the discarded

entries adn2 e, . . . , an are small; precise statements will describe this in terms

of decay in the entries of A along rows/columns moving away from the di-

agonal. With such a property, one can show that not only are the matrices

A and C close entrywise, but that they are spectrally close, i.e. that the

eigenvalues of C1A (equivalently the eigenvalues of the matrix ‘pencil’

A C) are mostly close to 1. As a consequence, an appropriate iterative

method (it would be the conjugate gradient method if A is positive definite:

Preconditioning 3

see below) will compute a sequence of vectors x1, x2, . . . which converge

rapidly from any starting guess, x0 to the solution x of Ax = b. At each

iteration only a matrix-vector product with A and the solution of a single

system with C as well as vector operations need be computed. For any de-

sired accuracy, a fixed number of iterations not dependent on the dimension

n will be required. The outcome is that use of the FFT for solving systems

with C and for doing multiplications of vectors by A (since A can in an ob-

vious way be embedded in a circulant matrix of dimension 2n 1⇥ 2n 1)

guarantees that the iterative method will require only O(n log n) operations

to compute the solution.

Strang’s ‘proposal for Toeplitz matrix calculations’ is thus to employ the

circulant matrix C as a preconditioner for linear systems involving A. One

can naively think of the preconditioning as converting a linear system

Ax = b

which is not so readily solved into the system

C1Ax = C1b

for which there is a fast solution method. This is not exactly what is done:

generally C1A will be nonsymmetric even though A and C are symmetric,

but it is possible to precondition with C whilst preserving symmetry of the

preconditioned system matrix; see Saad (2003, Algorithm 9.1).

This example captures the essence of preconditioning. Not always does

one have matrices which are so obviously structured (nor with potentially

so many non-zero entries), nor does one always have a clever and ecient

tool like the FFT, but, especially for matrices of large dimension, it can

be extremely beneficial to have a good matrix approximation, i.e. a good

preconditioner, when solving linear systems of equations. Indeed, often such

a preconditioner is necessary in order to enable practical computation at all

of large-scale problems within reasonable time on any given computational

platform. The reduction from O(n2) to O(n log n) in the above example

brings some large-dimensional Toeplitz systems into the range of practical

computation.

One does not need to necessarily have a preconditioner as a matrix; a

linear operation (i.e. a computational procedure which applies a linear op-

eration to a vector) is what is required.

This article is a personal perspective on the subject of preconditioning.

It addresses preconditioning only in the most common context of the so-

lution of linear systems of equations. Much research on this problem has

been directed at sparse matrices coming from the discretization of partial

di↵erential equations (PDEs) by finite element, finite volume or finite di↵er-

ence methods. This is certainly one major source for such problems, but far

from the only area where large dimensional linear systems arise. It is true,

4 A. J. Wathen

however, that underlying properties of a PDE problem are often reflected in

the matrices which arise from discretization, so that matrices arising from

PDEs can inherit structural properties which allow them to not simply be

considered as arrays of numbers. Indeed there are many PDE analysts who

consider PDE matrices as simply members of families of discrete operators

with many of the properties of the underlying continuous operators from

which they are derived through discretization, see for example Mardal and

Winther (2011), or the recent book by Ma´lek and Strakosˇ (2014). This can

be a very valuable viewpoint. Indeed from this perspective, not applying any

preconditioning is seen to coincide with rather an unnatural choice of map-

pings on the underlying spaces (Gu¨nnel et al. 2014). Sometimes one comes

across the perception that multigrid must be the most ecient method to

solve any PDE problem: this is far from the truth, however multigrid meth-

ods and ideas remain very important both as solvers for certain simpler

problems, and, in particular, as components of preconditioners for a wider

class of problems as we shall describe.

Whether or not a matrix system arises from PDEs, knowledge of the

source problem can be helpful in designing a good preconditioner. Purely

algebraic approaches which simply take the numerical matrix entries as

input can have merit in situations where little is known about the underlying

problem—they certainly have the useful property that it is usually possible

to apply such a method—but the generated preconditioners can be poor.

This division between approaches to preconditioning essentially comes down

to whether one is ‘given a problem’ or ‘given a matrix’ (Neytcheva 1999).

Though we emphasise much about the former, we also give pointers to and

brief description of the latter.

Preconditioners are overwhelmingly used together with Krylov subspace

iterative methods (van der Vorst 2003, Greenbaum 1997, Saad 2003, Meurant

1999, Hackbusch 1994, Elman et al. 2014b, Liesen and Strakosˇ 2013, Olshan-

skii and Tyrtyshnikov 2014), and the applicable iterative methods make

di↵erent requirements of a preconditioner: for example, one could employ

a nonsymmetric preconditioner for a symmetric matrix, but in general one

would then lose the possibility to use the very e↵ective conjugate gradient or

minres Krylov subspace iterative methods. Therefore, we have structured

this paper into sections based on the symmetry and definiteness proper-

ties of the presented coecient matrix. In essence, this subdivision can

be thought of also as in terms of the Krylov subspace method of choice.

The coecient matrix will usually be denoted by A as in the example above

(though di↵erent fonts will be used). For any invertible preconditioning ma-

trix or linear operator, P , there is an iterative method that can be used, but

it is usually desirable not to ignore structural properties such as symmetry

when choosing/deriving a preconditioner.

We have generally restricted consideration to real square matrices, A, but

Preconditioning 5

it will be evident that some of the di↵erent preconditioning ideas could be

applied to matrices over di↵erent algebraic fields, in particular the complex

numbers (for which ‘symmetric’ ! ‘Hermitian’ etc.). It is also true that

some of the ideas presented apply to rectangular matrices such as arise in

least squares problems, but, except tangentially, this paper does not consider

such situations.

Our general strategy/intention is to describe most preconditioning ap-

proaches that we are aware of, and to guide the reader towards selecting an

appropriate strategy for any particular problem. Inevitably, some precon-

ditioning ideas have arisen in the specific context of particular problems, so

brief review of such problems is included.

Over ten years ago, Benzi (2002) wrote an excellent survey of precon-

ditioning techniques which contains much wisdom and remains an excel-

lent source today, in particular with regard to incomplete factorization and

sparse approximate inverse methods. The text by Chen (2005) describes

several approaches and includes many example applications.

In section 3 preconditioning approaches for symmetric and positive defi-

nite matrices are discussed. There follows in section 4 some brief comments

on symmetric semidefinite matrices. This is followed in section 5 by precon-

ditioning for symmetric and indefinite matrices and finally for nonsymmetric

matrices in section 6. We start, however, with a brief review of on iterative

methods in order to orient ourselves.

2. Krylov subspace iterative methods

For any vector r 2 Rn, if it is easy to compute the matrix-vector product

Ar, then it is easy to generate vectors in the Krylov subspaces

Kk(A, r) = span{r, Ar,A2r, . . . , Ak1r}, k = 1, 2, . . .

by forming A(Ar) etc.. In particular, when A is sparse (i.e. has few non-zero

entries in any row), then such matrix-vector products are readily computed.

For general dense matrices, a matrix-vector multiplication takes O(n2) op-

erations, hence there is less possibility for preconditioning to reduce this

complexity unless some other structure can be used such as in the Toeplitz

example above.

Krylov subspace iterative methods for finding x 2 Rn which satisfies

Ax = b compute iterates xk for which

xk x0 2 Kk(A, r0), k = 1, 2, . . . (2.1)

from some initial guess x0 and thus require just one matrix-vector product

computation at each iteration. The vector r0 = b Ax0 is the residual

associated with x0; in general rj = bAxj , j = 0, 1, 2, . . .. If x0 = 0 then

xk 2 Kk(A, b), k = 1, 2, . . . .

6 A. J. Wathen

Thus the iterates and residuals of every Krylov subspace method satisfy

xk x0 =

k1X

j=0

↵jA

jr0

for some coecients ↵j , j = 0, 1, 2, . . . , k 1. Hence

xk = x0 + q(A)r0 (2.2)

where q is the polynomial of degree k 1 with q(z) =Pk1j=0 ↵jzj . Premul-

tiplying (2.2) by A and subtracting from b we obtain

bAxk = bAx0 Aq(A)r0

thus

rk = r0 Aq(A)r0 = p(A)r0 (2.3)

where

p(z) = 1 z

k1X

j=0

↵jz

j = 1

kX

j=1

↵j1zj

is a polynomial of degree k which satisfies p(0) = 1.

All Krylov subspace methods are thus described by (2.3) and each dif-

ferent Krylov subspace method is characterised by an optimality or (bi-)or-

thogonality property.

When A is symmetric and positive definite (so that kvkA = (vtAv)

1

2 de-

fines a vector norm), the conjugate gradient method (cg) (Hestenes and

Stiefel 1952), (Saad 2003, algorithm 6.18) requires only the one matrix-

vector product with A as well as five simple vector operations at each itera-

tion to compute iterates satisfying (2.1) which minimise kxxkkA over the

Krylov subspace.

When A is symmetric but indefinite, rTAr takes both positive and nega-

tive values so that a norm can not be defined as for the conjugate gradient

method. The Krylov subspace method of choice for symmetric indefinite sys-

tems, the minimum residual (minres) method (Paige and Saunders 1975),

(Fischer 2011, page 187) takes one matrix-vector product with A as well

as seven simple vector operations at each iteration to compute iterates sat-

isfying (2.1) which minimise the Euclidean norm of the residual, krkkI =

(rTk rk)

1

2 .

When A is nonsymmetric, there is not so obvious a method of choice,

hence several Krylov subspace methods are widely used. The most popular

is gmres (Saad and Schultz 1986), (Saad 2003, algorithm 6.9) which, sim-

ilarly to minres, computes iterates which minimise the Euclidean norm of

the residual, but by contrast to minres requires an increasing ammount

Preconditioning 7

of computation and storage at each successive iteration to achieve this.

Thus gmres can be a good method if only few iterations are needed

to achieve acceptable convergence—this might be the case if one has a

good preconditioner—but it is not practical if many iterations are required.

Restarting gmres at (regular) intervals reduces the computational burden,

but also in most cases severely slows convergence. Thus use is made of

Krylov subspace methods for nonsymmetric A which require only a fixed

ammount of work at each iteration, but which necessarily therefore have

more erratic convergence (Faber and Manteu↵el 1984). These are gener-

ally based on the nonsymmetric Lanczos method: we mention bicgstab

(van der Vorst 1992), bicgstab(`) (Sleijpen and Fokkema 1993), qmr

(Freund and Nachtigal 1991), idr(s) (Sonneveld and van Gijzen 2008),

though there are many others. There are also several variants, see Simoncini

and Szyld (2007).

For any nonsingular nonsymmetric matrix, it is always possible to attack

the symmetric and positive definite system of normal equations

ATAx = AT b, (2.4)

(or AAT y = b, x = AT y) and in some instances this has been proposed;

see for example Laird and Giles (2002). A related approach is to solve a

symmetric indefinite system in a form such as

I A

AT 0

r

x

=

b

0

. (2.5)

However, whichever form is employed, convergence can be rather slow; AT

is not generally such a good preconditioner for A! To give an idea of the

convergence that might be expected: if A was already symmetric and posi-

tive definite, then generally the number of cg iterations required for (2.4)

would be approximately the square of the number of cg iterations for the

system Ax = b. There are situations in which this may not be such a di-

culty, however, it is generally harder to find good preconditioners for ATA

(or AAT ) than for A and this may be the main reason that more attention

and usage is made of the nonsymmetric Krylov subspace iterative meth-

ods above. An important and underappreciated general observation in this

regard is due to Braess and Peisker (1986):

Remark 1. P TP can be an arbitrarily bad preconditioner for ATA re-

gardless of the quality of P as a preconditioner for A.

The above remark applies even if A, P are symmetric as is shown by the

elementary example in Braess and Peisker (1986). We will see in section 5

situations where one can build a particular preconditioner P for A so that

P TP is also a good preconditioner for ATA, indeed the main result in Braess

and Peisker (1986) provides sucient conditions for this to occur.

8 A. J. Wathen

Now, cg for (2.4) will generate a Krylov space of the form K(ATA,AT b)

for example, when x0 = 0. Perhaps the most suitable implementation alge-

braically equivalent to a symmetric Krylov subspace method for the normal

equations is the lsqr algorithm of Paige and Saunders (1982) which also

computes least squares solutions for rectangular A. An advantage of these

methods is that the convergence theory of cg and minres apply.

The reason that there are methods of choice (cg , minres) for symmetric

matrices, but a whole range of possibilities for nonsymmetric matrices arises

because of convergence guarantees. In the case of symmetric matrices there

are descriptive convergence bounds which depend only on the eigenvalues

of the coecient matrix, thus the number of iterations which will be needed

for convergence to a given tolerance can be estimated and bounded a priori;

a good preconditioner will ensure fast convergence. These comments apply

also for normal equation approaches for nonsymmetric matrices.

By contrast, to date there are no generally applicable and descriptive

convergence bounds even for gmres; for any of the nonsymmetric methods

without a minimisation property, convergence theory is extremely limited.

The convergence of gmres has attracted a lot of attention, but the absence

of any reliable way to guarantee (or bound) the number of iterations needed

to achieve convergence to any given tolerance in most situations means that

there is currently no good a priori way to identify what are the desired

qualities of a preconditioner. This is a major theoretical diculty, but

heuristic ideas abound.

All Krylov subspace methods are to an extent a↵ected by rounding errors

in a computer if run for a large enough number of iterations, see Liesen and

Strakosˇ (2013) or the discussion in Greenbaum (1997, Chapter 4). In this

paper we ignore any such e↵ects since an important consequence of good

preconditoning is that few iterations should be necessary for acceptable

convergence, thus round-o↵ e↵ects are not usually able to build up and

become a significant issue. Similarly, there is theory which describes the

eventual (i.e. asymptotic) convergence of Krylov subspace methods such as

cg, but this is also not generally relevant to preconditioned iterations—it is

rapid error or residual reduction in early iterations that is typically desired.

3. Symmetric and positive definite matrices

The standard cg convergence bound for iterative solution of Ax = b when

A is symmetric and positive definite comes straight from (2.3) and the min-

imisation condition described above and is

kx xkkA

kx x0kA minpk2⇧k,pk(0)=1 max2(A) |pk()|,

where kyk2A = yTAy, ⇧k is the set of real polynomials of degree less or

equal to k and (A) is the eigenvalue spectrum of A (Greenbaum 1997,

Preconditioning 9

section 3.1). One immediately deduces that if A has a small number of

distinct eigenvalues, say s, then cg will terminate at the sth iteration with

the correct solution since there is a degree s polynomial which has these

eigenvalues as its roots. Iteration error can already be acceptably small

after few iterations when the eigenvalues are clustered. In particular if one

considers all eigenvalues to lie in a single interval Amin Amax then use

of Chebyshev polynomials yields the convergence bound

kx xkkA

kx x0kA 2

✓p

1p

+ 1

◆k

, (3.1)

where = A = Amax/

A

min is the Euclidean norm (or 2-norm) condition

number for such a symmetric positive definite matrix; see again Greenbaum

(1997, section 3.1). An equivalent statement is that it takes approximately

1

2

p

| log ✏/2| iterations to ensure that kx xkkA/kx x0kA ✏.

The origin of the name preconditioning is generally now associated with

this convergence bound in the sense that reduction of the condition num-

ber, , guarantees faster convergence (though for historical accuracy, see

Benzi (2002)); what we now understand as preconditioning is, however, a

much broader concept. If P is also symmetric and positive definite and the

extreme generalised eigenvalues of the ‘pencil’ Av = Pv (equivalently the

extreme eigenvalues of P1A) satisfy

A =

Amax

Amin

P1A

max

P

1A

min

= P

1A,

then cg convergence for the preconditioned system is guaranteed in many

fewer iterations than for the original unpreconditioned system since (3.1)

holds now with = P

1A. Note that for cg even with preconditioning, it

remains the relative magnitude of the error in k·kA which arises in the bound

(3.1); a symmetric positive definite preconditioner a↵ects the eigenvalues

and thus the speed of convergence, but not the norm in which convergence

occurs. Use of a nonsymmetric preconditioner, P , would prevent use of cg;

a nonsymmetric Krylov subspace method would in that case have to be used

in general (but see Section 7).

To establish the quality of P as a preconditioner for A, it is sucient

to find (and a priori prove if possible!) upper and lower bounds on the

eigenvalues of the form

` P1Amin , P

1A

max ⌥ (3.2)

or equivalently—and usually more tractably—upper and lower bounds on

the generalised Rayleigh quotient

` u

TAu

uTPu

⌥, for all u 2 Rn {0}. (3.3)

10 A. J. Wathen

In either case P

1A ⌥/` leads directly to a convergence bound through

(3.1) since the right hand side varies monotonically with .

One does not actually replace A with P1A in the cg algorithm to imple-

ment preconditioning in the case of symmetric A since this would generally

destroy symmetry, but even when symmetry is preserved (as in Saad (2003,

Algorithm 9.1)), the action of P1 is required on a di↵erent vector at each

iteration.

One thus has clear sucient criteria for identifying a good preconditioner,

P , in this symmetric positive definite case:

• it should be easy to solve systems of equations with the preconditioner

P as coecient matrix, that is, the action of P1 on a vector should

be readily computed

• P1A should be small.

These are competing aims; P = I satisfies the first but not the second unless

A is already well conditioned and P = A satisfies the second but not the

first unless solution of systems with A is already easy!

3.1. Diagonal scaling

The earliest preconditioning ideas came from consideration of the first of

these criterion and are largely in the ‘given a matrix’ category. Taking P

to be a diagonal matrix is very simple; if in fact P = diag(A) then this is

widely called Jacobi preconditioning. For a well conditioned matrix which is

just badly scaled, this is generally all that is needed, but in general diagonal

preconditioning is likely to achieve very little in terms of reducing iterations

or computation time except in specific situations as below. There are two

related types of problem where Jacobi preconditioning (or just diagonal

scaling) is important.

Firstly, the ‘identity’ matrix for finite element computation is the mass

matrix, Q, which is the Gram matrix of the finite element basis functions

{i, i = 1, . . . , n}, i.e. it has entries

Q = {qi,j , i, j = 1, . . . , n}, qi,j =

Z

⌦

⇢i(x)j(x) (3.4)

for a domain ⌦. Here the potentially variable coecient ⇢ might, for exam-

ple, represent material density in a structure, and if this is highly variable

or if the discretization has a large variation in the size (length, area, vol-

ume) of the elements—as is inevitable on an irregular or adapted mesh—

then this variability is also represented in the diagonal entries of Q. Thus

P = diag(Q) is provably a good preconditioner in this case (Wathen 1986).

For example the gallery(’wathen’,n,n) matrix in matlab is a finite el-

ement mass matrix with randomly variable density for a so-called seredipity

Preconditioning 11

finite element. The precise eigenvalue bounds of the form (3.2) or (3.3), to-

gether with tabulated values for ` and ⌥ for many types of finite elements,

are derived in Wathen (1986).

We briefly comment that in problems as above with analytic a priori

eigenvalue bounds ` and ⌥ which are (reasonably) tight, the possibility also

arises to use a fixed but small number of Chebyshev (semi-)iterations as a

(linear) preconditioner—see Wathen and Rees (2009).

The second important situation arises also through highly variable coef-

ficients; the classic example arises from the operator

r · (Kru)

where the permeability K has large jumps. For continuous finite element

approximation of such terms, Graham and Hagger (1999) were the first to

identify the utility of diagonal scaling to render the preconditioned prob-

lem more homogeneous. For such an elliptic PDE, Jacobi preconditioning

is highly unlikely to be an adequate preconditioner on its own, but when

used together with an appropriate preconditioner for a homogeneous elliptic

PDE operator—see below—the combination can be much better than just

the elliptic preconditioner by itself. In the literature there are several de-

velopments of the idea of Graham and Hagger, including for non-isotropic

permeabilities.

3.2. Incomplete Cholesky factorization

It is well known that Cholesky factorization (LLT or LDLT factorization

where L is lower triangular and D is diagonal) of a symmetric and positive

definite matrix leads immediately via forward and backward substitution to

direct solution of a matrix system. For sparse matrices, unfortunately, the

factor L is usually much less sparse (i.e. has many more non-zeros) than

A. Such fill-in is usually unavoidable, though widely used sparse direct

methods use sophisticated algorithms in attempting to minimize it (HSL

2013, Davis 2004, Amestoy et al. 2000, Li 2005). In the context of PDE

problems, the essential rule is that a sparse direct method can be a good

solver for a problem on a 2-dimensional spatial domain, but is generally

computationally infeasible for a 3-dimensional problem. If one preselects

(or dynamically selects) the sparsity pattern of L and drops terms which

would fill outside of this pattern, then one can form an incomplete Cholesky

factorization

A = LLT + E

where E represents the error due to the dropped terms. One can additionally

threshold so that small entries are not kept in the incomplete factors. With

P = LLT , it is clear that forward and backwards substitution still enable

12 A. J. Wathen

easy solution of systems with the preconditioner, and the main remaining

question is whether the preconditioner leads to fast convergence of cg.

In general many di↵erent variants of this basic idea exist and software for

various incomplete factorization-based algorithms is widely available. This

is certainly a plus point, but unless A is diagonally dominant, incomplete

Cholesky algorithms do not always work well, indeed they do not always

exist without modification, see Meijerink and van der Vorst (1977), and

see Benzi (2002) for discussion of this as well as many other aspects of

incomplete factorization preconditioning. With regard to robustness, there

are certain advantages to computing an incomplete triangular factorization

not using Cholesky but via orthogonalization in the inner product generated

by A; see Benzi and Tuma (2003). This approach is called RIF (robust

incomplete factorization) and shares some similarity with the AINV sparse

approximate inverse discussed below.

Let us make some of these statements precise in the context of solving

a simple discretized elliptic boundary value problem, namely the common

5-point finite di↵erence approximation of the Dirichlet problem for the (neg-

ative) Laplacian

r2u = f in ⌦, u given on @⌦

which arises from second order central di↵erencing on a regular grid with

spacing h in each of the two spatial dimensions. The choice of sign ensures

that the operator and thus the arising matrix are positive rather than neg-

ative definite. If the domain, ⌦, is the unit square (so that the boundary

@⌦ is four coordinate-aligned line segments) then the arising matrix A is

of dimension N ⇡ h2 for small h and also A = O(h2), see Elman et al.

(2014b, section 1.6). Thus (3.1) predicts that O(h1) cg iterations would

be required for convergence of unpreconditioned cg. The simplest incom-

plete Cholesky preconditioner for which L has the same sparsity pattern

as the lower trianglular part of A also leads to P

1A = O(h2) though

there is significant clustering of a large number of the eigenvalues around 1;

the bound (3.1) based only on the extreme eigenvalues is pessimistic in this

situation. A variant, modified incomplete factorization (due to Gustafsson

(1978) but related to rather earlier work by Dupont et al. (1968)) reduces

this to P

1A = O(h1), and generally leads to slightly faster cg conver-

gence. Applying (3.1) in this case predicts that O(h1/2) preconditioned

cg iterations would be required for convergence, and this is close to what

is seen in practical computation. This implies that for a sequence of such

problems with smaller and smaller h, it will take an increasing number of

iterations to achieve acceptable convergence as h gets smaller (and so the

matrix dimension gets larger) – a rather undesirable feature.

Fortunately, there are much better preconditioners for this and related

elliptic PDE problems which guarantee P

1A = O(1), so that the number of

Preconditioning 13

required cg iterations is essentially constant for larger dimensional discrete

problems coming from approximation on finer grids. Such preconditioning

methods are said to have P spectrally equivalent to A. The most important

class of such methods arise from multigrid ideas which we consider next.

Of course, even in the more general context of problems not coming from

PDEs, incomplete Cholesky and its variants can be of considerable utility; in

particular, algebraic preconditioners of this form can very often be applied.

Before moving on, we comment that preconditioning techniques have been

introduced that turn out to be related to incomplete factorization. A par-

ticular example is the ‘support graph’ preconditioner introduced originally

in a conference talk by Vaidya who wrote commercial software but did not

publish on the subject. The support graph idea comes out of graph-theoretic

notions and has a somewhat discrete mathematical flavour. Later analysis

both clarified the scope of the ideas—they are now essentially extended to

cover most symmetric and diagonally dominant matrices—and made con-

nections with incomplete factorization: see Bern et al. (2006), Spielman and

Teng (2014) and references therein.

3.3. Multigrid

Since the groundbreaking paper by Brandt (1977), multigrid methods have

been at the core of scientific computing. Earlier contributions, notably by

Fedorenko (1964) and Bakhvalov (1966), are now considered to have in-

troduced the key ideas of combining a simple smoothing iteration (such

as Jacobi or Gauss-Seidel iteration) with coarse grid correction on succes-

sively coarser grids for the finite di↵erence Laplacian, but Brandt’s vision

was much broader: he showed just how fast multigrid solvers could be and

introduced many ideas to keep reseachers busy for decades! For a posi-

tive definite elliptic PDE operator, it has been clear for a long time that

an appropriate multigrid method will provide an optimal solver, that is, a

solver such that the work to compute the solution scales linearly with the

dimension of the discretized problem.

Classically, multigrid is a stationary iteration (albeit a non-trivial one) for

solving Ax = b when A is a discrete Laplacian matrix (arising from finite

di↵erence or finite elements). For V- or W-cycles, iterates satisfy single

cycle contraction of the form

kx xkkA ⌘ kx xk1kA (3.5)

where the contraction factor, ⌘ is typically around 0.1 (see, for example

Elman et al. (2014b, section 2.5)). It follows that the linear multigrid op-

erator (represented by a complicated and fairly dense matrix which one

would never want to use in computations) is an excellent candidate for a

14 A. J. Wathen

preconditioner since from (3.5) it readily follows that

1 ⌘ x

TAx

xTPx

1 + ⌘,

see Elman et al. (2014b, Lemma 4.2). That is, applying one multigrid V-

cycle, say, is equivalent to solution of a preconditioner system with pre-

conditioner P which guarantees rapid cg convergence because P

1A

(1 + ⌘)/(1 ⌘). There is little point in using multigrid cycles as a precon-

ditioner for cg in this way for simple problems; one might as well simply

use multigrid iteration to solve, but for more complicated problems this ob-

servation that a single multigrid cycle is an ecient spectrally equivalent

approximation of an elliptic PDE operator like the Laplacian is very useful;

see section 5. Note also the important point that arises here for the first

time in this review: a preconditioner can be a linear operation rather than

specifically a known matrix. Thus {3 V-cycles}, for example, is a suitable

preconditioner. On the other hand, iterating multigrid (or any other itera-

tion) to some residual tolerance will likely ensure that di↵erent numbers of

V-cycles are used at di↵erent Krylov subspace iterations and so the precon-

ditioner is not a fixed linear operator and the above convergence analysis

can not apply.

Multigrid – and multilevel methodology more generally – has now become

a huge subject (see for example Trottenberg et al. (2000)), with specialized

techniques for the treatment of several specific features which might arise in

a PDE (or other) problem. In the past couple of decades, algebraic multigrid

methods (AMG), which try to mimic a geometric multigrid approach but

use only information from the entries of a matrix rather than geometric

information regarding grids, boundary conditions, etc. have attracted much

attention and there is now widely available software: e.g. HSL MI20 (Boyle

et al. 2010), ML (Gee et al. 2006), BoomerAMG (Henson and Yang 2000). In

particular, in situations where a problem is fundamentally of the character

of an elliptic di↵erential operator but there are complicating features, an

AMG cycle can be an excellent preconditioner. This can be true even if

AMG is non-convergent since it may contract on all but a few modes; the

few corresponding outlier eigenvalues are easily dealt with by cg. It can

simply be much more convenient to employ an AMG cycle as (part of) a

preconditioner for a Krylov subspace method than to have to ‘hard code’

bespoke techniques for each di↵erent variant problem that comes up. So

common is this combined use of AMG and a Krylov subspace method that

it is rarely even mentioned that AMG is used as a preconditioner, but this

is almost always the case in practice.

We should mention that multigrid ideas have been tried in other contexts

than just for PDE problems; see for example Giles (2008), Livne and Brandt

(2012).

Preconditioning 15

It also needs mention in this section that a multigrid V- or W-cycle can

usually be quite easily made into a symmetric linear operator—see Elman

et al. (2014b, page 98)—though the so-called full multigrid cycle, for exam-

ple, can not. Positive definiteness is not usually an issue for an underlying

positive definite problem.

3.4. Domain Decomposition

For a very large dimensional problem, it might be a fairly natural idea to

split into separate subproblems and solve each separately (and usually in

parallel) to give an approximation to the solution of the overall problem.

In the original context of solving an elliptic boundary value problem on a

domain ⌦, this is the basic idea of domain decomposition: the subproblems

being problems expressed on subdomains ⌦k, k = 1, . . . , N with [⌦k =

⌦. In its simplest form, it correponds to a block diagonal preconditioner

with each separate subproblem represented by a single diagonal block which

may or may not be overlapping, thats is, which may or may not share

some variables with other subproblems. An important issue is how to treat

the variables either in the overlap regions or on the introduced boundaries

between subdomains. Domain decomposition was first employed abstractly

by Hermann Schwarz in the nineteenth century to prove an existence result

for an elliptic PDE, hence methods of this type are often called Schwarz

methods.

Like multigrid, domain decomposition is a huge subject which we have

no intention of covering with any depth here; see for example the books

(Quarteroni and Valli 1999), (Smith et al. 1996), (Toselli and Widlund

2004), (Mathew 2008) or the earlier survey (Chan and Mathew 1994). What

is clear, however, even from this brief description is that domain decompo-

sition gives rise to preconditioners which are well suited to parallel compu-

tation.

To give a brief idea of what can be achieved on a simple problem, we con-

sider the Poisson equation on a domain ⌦ ⇢ Rd approximated by standard

(conforming) finite elements or finite di↵erences with a mesh spacing h. If

⌦ is split into non-overlapping subdomains of width H, then there are basic

block diagonal preconditioners without coarse grid solve for which one can

establish that

P

1A CH2(1 + log2(H/h)).

The more sophisticated domain decomposition schemes which include also

a coarse grid (or ‘wire basket’ solve) can achieve

P

1A C(1 + log2(H/h)).

In either case, C is a generic constant not dependent on H,h and the bounds

translate into convergence estimates via (3.1). The size H can be considered

16 A. J. Wathen

to be the coarse grid size in essence; the bounds above assume exact solution

of the problems on subdomains.

With the importance of parallel and distributed computing, the domain

decomposition concept remains very important and much is known for more

advanced methods and; see the regular Domain Decomposition conference

proceeding available at ddm.org (2014).

3.5. Sparse approximate inverses

For any matrix norm, minimisation of

kI ARk

over all matrices R 2 Rn⇥n will clearly yield R = A1. Minimization with

restrictions on the entries where R may have nonzero values leads to sparse

approximate inverses. That is

R = argminR2SR kI ARk,

where SR describes the subset of entries of R which are allowed to be non-

zero, is a sparse approximate inverse for A. To be consistent with the

nomenclature in this article, P = R1 is the preconditioner here; solution

of systems with P is, of course, simply multiplication by R in this situation.

SR is strictly a subspace of matrices with restricted sparsity pattern.

The Frobenius norm

kBk2F =

X

i=1,...,n

X

j=1,...,n

b2i,j

is by far the most widely considered since the minimisation problem then

reduces to least squares problems for the columns of R separately (thus

computable in parallel). Further, these least squares problems are all of

small dimension when the specification of SR ensures that R is a sparse

matrix. We again refer to Benzi (2002) for a thorough description.

As posed in this simple form, the sparse approximate inverse, R is not

generally symmetric when A is symmetric, however we introduce this type

of preconditioner in this section because symmetry can be enforced by com-

puting R in factored form R = LLT where L is sparse and lower triangular;

see Kolotilina and Yeremin (1993). Symmetry of a sparse approximate in-

verse can also be achieved by the AINV algorithm of Benzi et al. (1996)

and the more robust (stabilized) SAINV variant (Benzi et al. 2000) which

is guaranteed not to break down if A is positive definite. See again Benzi

(2002).

The main issue in the construction of any sparse approximate inverse is

the choice of the sparsity pattern, SR. The simplest idea is to use the spar-

sity of A, whilst there have been several attempts to dynamically define SR,

Preconditioning 17

most notably in the SPAI algorithm of Grote and Huckle (1997). An exten-

sion which can be used in a number of di↵erent scenarios is to use a target

matrix, T , and hence minimise kT ARkF ; see Holland et al. (2005). A

further comment is that sparse approximate inverse algorithms are clearly

quite widely applicable, but generally do not give spectrally equivalent pre-

conditioners for elliptic PDEs. They do however make excellent smoothers

for use with multigrid (Tang and Wan 2000, Bro¨ker et al. 2001).

3.6. Hierarchical representations

For symmetric and positive definite matrices arising via low order finite

element approximation from elliptic PDEs, Yserentant (1986) introduced

nearly three decades ago the hierarchical basis preconditioner. This uses

the idea that the coecient matrix would have a much smaller condition

number if a di↵erent (but less computationally convenient) basis for the

approximation space was employed. By using a hierarchical basis—which

would give rise to a fairly dense matrix—both this good conditioning and

a rapid change of basis matrix based on a simple tree structure allow the

sparse matrix arising from the usual locally defined finite element basis

functions to be e↵ectively preconditioned.

For the discrete Laplacian on a two-dimensional domain, Yserentant’s

hierarchical preconditoner reduces = O(h2) to = O(log h1) for the

discrete Laplacian (see Yserentant (1986) or Elman and Zhang (1995) for a

more linear algebraic proof). Unfortunately, for three-dimensional domains

only = O(h1) is possible with the straightforward use of the hierarchical

basis (Ong 1997), though an improved variant of this idea due to Bramble

et al. (1990) can even yield = O(1) in three-dimensions as well. These

ideas, originally introduced as based on the multilevel splitting of finite

element spaces, were the precursor to a whole abstract theory based on such

splittings; the review article by Xu (1992) remains a largely comprehensive

description of this abstract theory which covers many approaches including

multigrid and domain decomposition discussed above for symmetric and

positive definite matrices arising from elliptic PDEs.

A di↵erent hierarchical structure can also be of utility. We wish to men-

tion some ideas and the associated computational technology which en-

ables certain dense matrices—such as the inverse matrices of discrete pos-

itive definite elliptic PDE operators—to be represented to machine accu-

racy in a highly ecient format. There are a number of related but dif-

fering approaches ( Chandrasekaran et al. (2005), Greengard and Rokhlin

(1987), Hackbusch (1999), Grasedyck and Hackbusch (2003), Hackbusch

et al. (2000), Bebendorf (2008)), but the essential observation can be thought

of in terms of the Green’s function for an elliptic operator: nearby entries

need accurate representation, but variables distant from a given entry can

18 A. J. Wathen

be treated together. This notion is most directly represented in the H-

matrix representation of Hackbusch: o↵-diagonal blocks are represented as

low rank blocks using outer products of vectors or matrices with very few

columns. By employing a blocking hierarchy, the computational complexity

of storing (and also multiplying with) such matrix approximations can be

as low as O(n log n).

We mention these ideas here since such approximate representations of

dense matrices have been suggested as preconditioners rather in the same

vein as sparse approximate inverses. They also can be utilised in other more

demanding situations as we describe later.

4. Semidefinite matrices

It might seem rather odd to be interested in preconditioners for singular

matrices; one might believe that only well-posed problems were worthy of

attention. However, there are di↵erent situations in which approximations

to certain singular operators/matrices are valuable and we mention some

ideas in this brief section.

The simplest PDE probem which gives rise to a singular matrix is proba-

bly the Neumann problem for a simple operator like the Laplacian; specify-

ing only derivative boundary conditions for a potential-like variable leaves

a 1-dimensional kernel comprising the constants as is well known. Such

a small dimensional null space generally causes little significant diculty

in terms of iterative methods or preconditioning and methods for definite

matrices generally apply: see (Elman et al. 2014b, section 2.3). Similarly,

graph analysis and Markov chain problems respectively lead to symmetric

and nonsymmetric singular systems with simple, low-dimensional kernels;

again, methods for definite matrices should generally be applied without

explicit ‘fixes’ to ensure nonsingularity for such problems. For problems

arising in contexts other than PDEs, singularity tends to imply that a de-

scription has not been completely specified.

The issue of semidefiniteness is certainly serious, however, for equations

involving the curl or div operators applied to vector fields as arise, for exam-

ple, in the Maxwell equations describing electromagnetics. In this context,

the operator curlcurl has all gradients of scalar fields in its kernel, thus

under most discretizations one would expect that the matrices arising will

be semidefinite with a kernel of some significant dimension. Similarly, the

curl of any smooth enough vector field is in the kernel of the div opera-

tor. On the relevant spaces of functions, the curlcurl operator is however

positive definite and self-adjoint (symmetric), and it seems that it may be

possibile to use convenient preconditioners like standard AMG implemen-

tations if appropriate account is taken of the kernel. This is the purpose of

‘Auxiliary Space’ preconditioning which aims to take account of the natural

Preconditioning 19

decompositions of function spaces that arise. The ideas have evolved over

some years; see Hiptmair and Xu (2007) and references therein. The main

practical issue is representing mapping operators between the various spaces

in the discrete setting.

5. Symmetric and indefinite matrices

For real symmetric matrices which do not have to be definite, the Krylov

subspace method of choice is the minres algorithm due to Paige and Saun-

ders (1975). In the same paper they introduced the alternative but less used

symmlq method. As its name implies, minres minimises the norm of the

residual vector for xk in the k-dimensional Krylov subspace at each itera-

tion. Thus since the symmetry of A ensures that the eigenvalues are real and

the eigenvectors are orthogonal, the convergence of minres is described by

krkkI

kr0kI minpk2⇧k,pk(0)=1 max2(A) |pk()|, (5.1)

where the notation is as used before in the positive definite case. The

di↵erence from that case is not only that the residual rather than the error

is minimised (it is really a di↵erence of norm since rk = A(x xk) implies

that krkkI = kx xkkA2), but the spectrum, (A), now contains both

positive and negative eigenvalues.

As in the symmetric and positive definite case, (5.1) immediately indicates

that termination of the iteration will occur with the solution at the sth iter-

ation if A has just s distinct eigenvalues. Correspondingly, the eigenvalues

lying in a small number of clusters should also lead to rapid convergence

if none of the clusters is too far from, nor too close to the origin. If the

eigenvalues of A are all contained in two intervals

[a,b] [ [c, d],

where a, b, c, d are all positive, then it is less straightforward in general

than in the positive definite case to derive a simple convergence bound (see

Wathen et al. (1995)). One situation where a simple expression is possible,

however, is when d c = a b; the minres residuals then satisfy the

convergence bound

kr2kkI

kr0kI 2

✓p

1p

+ 1

◆k

, (5.2)

where = ad/(bc) (see (Elman et al. 2014b, section 4.2.4.)). Note that

here plays a similar role to max/min in the positive definite case, though it

is not the matrix condition number here; since the Euclidean norm condition

number, kAkkA1k is max{a, d}/min{b, c} in this situation, can be as

large as the square of this condition number when a = d, b = c, but rather

20 A. J. Wathen

less when the two intervals are not symmetrically placed about the origin.

When a = c, for example, = d/b is again exactly the Euclidean norm

condition number.

When A is symmetric and indefinite, any preconditioner for minresmust

be symmetric and positive definite. This is necessary since otherwise there is

no equivalent symmetric system for the preconditioned matrix. Thus a non-

symmetric iterative method must be used even if a symmetric and indefinite

preconditioner is employed for a symmetric and indefinite matrix in general.

An indefinite preconditioner is therefore not usually desirable, though there

are exceptions: see below. A preconditioner for a symmetric indefinite

matrix A for use with minres therefore can not be an approximation of

the inverse of A, since this is also indefinite. This simple observation rules

out several approaches described above in the positive definite case.

With a symmetric and positive definite preconditioner, the preconditioned

minres convergence bounds as above become

krkkP1

kr0kP1

min

pk2⇧k,pk(0)=1

max

2(P1A)

|pk()|

min

pk2⇧k,pk(0)=1

max

2[a,b][[c,d]

|pk()| (5.3)

when [a,b] [ [c, d] is now an inclusion set for all of the eigenvalues of

P1A rather than A. A further aspect introduced by preconditioning for

minres which does not arise for cg in the positive definite case is now

apparent: the norm in which convergence naturally occurs is a↵ected by

the preconditioner; it is not just the eigenvalue distribution and hence the

speed of convergence which is a↵ected.

The only reason we bring up this issue is because one now has the undesir-

able possibility that a preconditioner which apparently gives rapid minres

convergence might give inaccurate solution iterates simply because k · kP1

is a highly distorted norm. The following trivial example will make the

point:

Let A be any indefinite diagonal matrix and suppose we precondition with

P = diag(✏, 1/✏, . . . , 1/✏)

for some small positive ✏. Convergence to some small tolerance will appear

to occur after only 1 iteration since there is only 1 large entry of P1 and the

remaining entries are very small. Because of this distorted weighting, the

first component of the first solution iterate will be accurate, but the rest will

generally not be. For further consideration of this issue see Wathen (2007).

These di↵erent issues, which do not arise when A is symmetric and pos-

itive definite, lead some authors to the conclusion that preconditioning for

symmetric indefinite matrices is much more dicult than the definite case.

However it is still much more tractable than the nonsymmetric case! For

Preconditioning 21

a PDE problem, for example, the convergence bound (5.3) guarantees that

preconditioned minres convergence will be in a number of iterations in-

dependent of any mesh size parameter, h, given a preconditioner which

ensures that a, b, c, d are bounded, bounded away from zero and indepen-

dent of h. This is like the situation for spectrally equivalent preconditioners

in the symmetric positive definite case. Such guarantees rarely exist for

non-symmetric iterative methods; see section 6 below.

We mention that the sqmr Krylov subspace iterative method (Freund

and Nachtigal 1994) is designed for indefinite matrices with indefinite pre-

conditioning; this method, however, has no convergence guarantees, being

a variant of the nonsymmetric qmr iterative method for which symmetry

allows a more streamlined implementation.

General preconditioning strategies for symmetric indefinite matrices are

not so available as in the positive definite case. Amongst the ‘given a ma-

trix’ approaches, simple application of scaling and incomplete factorization

are not contenders and there is little significant work on sparse approxi-

mate inverses since these appear unlikely to provide much benefit except

possibly when A has only a few negative eigenvalues. ‘Given a problem’ is

more promising and there are two general classes of problem which give rise

to symmetric indefinite matrices for which helpful—sometimes excellent—

preconditioning approaches are known.

5.1. Shifted elliptic operators: Helmholtz problems

It might seem that taking a symmetric and positive definite matrix, K, and

simply shifting it by subtracting some multiple of the identity should lead

to matrices for which easy solution algorithms exist. In general, however,

this is not the case. Certainly the matrix K I can have resonances (and

thus be singular) when is an eigenvalue of K, but the diculty lies deeper

than just this observation.

The classic (and important) PDE problem here is the Helmholtz equation

r2uu = f which arises when looking for periodic (wave-like) solutions

of the linear wave equation; the parameter is then related to the frequency.

Supposing that K is the discretised form of the positive definite operator

r2 (i.e. the negative Laplacian), then for > min(K) it is clear that

A = K I (5.4)

is indefinite and symmetric. Many of the important physical applications

involve high frequencies and thus large values of . Having a good precondi-

tioner forK—we have identified many in the sections above—is of little help

in constructing a preconditioner for A. For finite element discretization, the

identity operator becomes the mass matrix, Q (3.4), but via diagonal ap-

proximation or since the grids employed are generally fairly regular for such

wave problems, there is little lost in description by considering (5.4).

22 A. J. Wathen

In fact for Helmholtz problems, there is usually a radiation boundary con-

dition for outgoing waves or an absorbing boundary condition (or ‘perfectly

matched layer’) is applied so that there is little reflection of outgoing waves

back into the computational domain. These add some complex entries in the

matrix which generally render it complex and symmetric (not Hermitian).

Nevertheless, it is the indefiniteness of Helmholtz problems which remains

the most important feature, hence why we consider them in this section.

This aspect can, however, mean that nonsymmetric Krylov subspace itera-

tive methods are required—sqmr is quite popular in this context.

Magolu Monga Made et al. (2000) and Laird and Giles (2002) were per-

haps the first to recognize the potential of preconditioning (5.4) with a

matrix of the form K + µI where |µ| is positive. In the case of real µ,

such a positive definite matrix is readily approximated by standard multi-

grid cycles which can be used as a positive definite preconditioner for (5.4).

Complex values for µ were also explored (see also van Gijzen et al. (2007),

for example), and good choices of the parameter can be found from Fourier

analysis of the spectrum of the preconditioned operator/matrix. As well as

indicating the speed of Krylov subspace iteration convergence, such analysis

can also assist in the choice of multigrid components (Erlangga et al. 2006).

Elman et al. (2001) also propose a multigrid approach for Helmholtz prob-

lem employing gmres as a smoother. Domain decomposition approaches

have also been suggested (Gander et al. 2002).

A more recent prosposal by Engquist and Ying (2011a) for variable and

high-frequency regimes involves ‘sweeping’ from the absorbing boundary,

eliminating the variables locally in layers. The resulting preconditioner is

thus a block incomplete LDLT factorzation. A key aspect of the implemen-

tation is that hierarchical (H-)matrix compression techniques as described

in section 3 above can be employed for the intermediate layers; this leads to

significant reductions in storage and computational work and leads to ex-

cellent overall complexities like O(n log2 n) for computation of P = LDLT

and application of P1 to a vector for a two-dimensional domain problem.

Krylov subspace iterative convergence is observed to occur in few iterations

making this a very competitive technique. Unfortunately, though the idea—

and a closely related one employing frontal sparse factorization (Engquist

and Ying 2011b)—is observed to work well for three-dimensional domains

as well, the H-matrix compression does not appear to yield the same small

computational complexity: algebraic factors in n replace logarithmic factors

in estimates of the complexity for setting up the preconditioner (Poulson

et al. 2013). That is, for a three-dimensional domain problem, the com-

putation of P = LDLT takes time which grows like n for some constant

> 1 whereas, once P has been computed, the action of P1 can be com-

puted in O(n log n) work at each iteration (similarly to the FFT). For a

Preconditioning 23

problem with many di↵erent right hand side vectors there could thus be

some amelioration in the work required to set up the preconditioner.

We should alert the reader to the article by Ernst and Gander (2012)

which explains the general diculty of iterative solution of Helmholtz prob-

lems.

The Maxwell equations describing electromagnetics have wave-like solu-

tions and hence some of the character of Helmholtz problems. However,

they also (in various formulations) have additional semidefinite features as

discussed in the section above, and additional indefinite saddle point struc-

ture, and are therefore considered below.

The idea of sweeping through the discrete variables for a problem in a par-

ticular order is also important for preconditioner construction for nonsym-

metric matrices which derive from PDEs that involve advection/convection

(or transport) as described below.

5.2. Saddle point systems

The second main class of symmetric and indefinite problems come from

minimization of quadratic functionals subject to linear constraints. In fact,

linearizations of much more general constrained optimization problems quite

generally result in linear systems of the same form

Ax =

A BT

B 0

u

p

=

f

g

(5.5)

which are called saddle point systems; see Benzi et al. (2005) for a com-

prehensive treatment of such problems. The block A 2 Rn⇥n is usually

positive definite on the kernel of B 2 Rm⇥n with m < n. This is a su-

cient condition for invertibility of A provided B is of full rank. Often A

is positive definite, though in the context of sequential quadratic approxi-

mations in Optimization (and possibly elsewhere), indefiniteness can arise;

see Gould and Simoncini (2009) for spectral bounds in this less usual case.

The variables u are the primary variables and p are the Lagrange multi-

pliers. Sometimes the zero block is replaced with a regularization matrix

C 2 Rm⇥m where C is symmetric and positive semidefinite; if C 6= 0 then

some authors refer to a regularized or generalized saddle point problem.

Many problems give rise to matrices of this type: as well as the ubiqui-

tous appearance of such problems in Optimization, several important PDE

problems including the Stokes problem describing slow viscous incompress-

ible flow, are of this form. The incompressible Navier-Stokes equations give

matrices of similar form but in which A is nonsymmetric. For these ap-

plications, the primary variables are the fluid velocity and the Lagrange

multiplier variables are the fluid pressure. For other applications and refer-

ences, see again Benzi et al. (2005) or Pestana and Wathen (2015).

The appearance of the zero diagonal block in (5.5) causes problems for

24 A. J. Wathen

most of the preconditoning approaches identified in section 3 above. This is

also generally true with regularisation, C 6= 0. Multigrid approaches, for ex-

ample, require more complex smoothers; see for example John and Tobiska

(2000) and Manservisi (2006). Domain decomposition can be applied, but

then the subdomain problems remain indefinite. Incomplete factorization

ideas can sometimes be employed if, for example, A and C are diagonal or

nearly so (Bergamaschi et al. 2004).

A simple observation in the case that A is nonsingular is that if

P =

A 0

0 S

, S = BA1BT , (5.6)

then the preconditioned matrix P1A has only three distinct eigenvalues

independently of m,n; thus minres (or gmres in the nonsymmetric case)

will converge in 3 iterations to the solution (Murphy et al. 2000, Korzak

1999). P in (5.6) is rarely a practical preconditioner for A in its exact

form, but this observation does give a paradigm for the construction of

preconditioners and, in particular, highlight the utility of approximations to

the Schur complement, S. An important point to note is that P is nothing

like an inverse of A (nor an approximation thereof). Nevertheless, good

approximations (preconditioners) bA for the block A and bS for the Schur

complement, S, lead to block diagonal preconditioners

P =

bA 0

0 bS

(5.7)

for A for which minres converges in few iterations. Precisely, if

` u

TAu

uT bAu ⌥ and p

TSp

pT bSp for all u 2 Rn{0} and p 2 Rm{0}

for positive constants `,⌥, and , then there are positive constants a, b, c

and d satisfying the preconditioned minres convergence bound (5.3). In

the context of PDE problems, for example, this means that preconditioned

minres is a solver of optimal computational complexity provided that bA is

spectrally equivalent to A and bS is spectrally equivalent to S. A multigrid

cycle can be an excellent choice for bA. See Pestana and Wathen (2015) for

a general proof which highlights the role of inf-sup stability associated with

the minimal singular value of B (in appropriate norms).

For many problems there are natural Schur complement approximations

which arise from problem formulation and do not require separate approx-

imation of the matrices B and A—see Pestana and Wathen (2015). In the

case of the Stokes equations, the natural Schur complement approximation

is particularly simple: it is the mass matrix (3.4) (Verfu¨rth 1984, Silvester

and Wathen 1994). For any such problem with a natural Schur comple-

ment approximation, (5.7) is an excellent preconditioning approach since bA

Preconditioning 25

can usually be any of the approximations in section 3. It should be noted

that once such approximations bA and bS are identified, then there are more

involved ways to make use of them than the simple block diagonal precon-

ditioner (5.7). When matrix symmetry is not an issue, a block triangular

preconditioner can be employed—see section 6.

Sometimes it is actually useful to directly approximate the matrices B

and A in constructing bS, for example in the context of optimization with

constraints which are PDEs (Rees et al. 2010a), though care must be exer-

cised to ensure that the sucient conditions of Braess and Peisker (1986)

are satisfied.

For saddle point problems arising particularly in the more general context

of optimization, constraint preconditioning has attracted significant atten-

tion (Keller et al. 2000). Here

P =

H BT

B 0

is used as a preconditioner forA. Note that the blocksB andBT which come

from the constraints are kept, whilst A is replaced by another symmetric and

(usually) positive definite matrix H. It might appear that solution of sys-

tems with P may not be signficantly easier than with A, however, by leaving

H implicitly defined, this can be achieved (Dollar and Wathen 2006, Dollar

et al. 2006). One significant advantage of keeping the constraint blocks in

the preconditioner is that the preconditioned problem is now equivalent to

a positive definite problem on the constraint manifold, hence a projected

cg algorithm can be used (Gould et al. 2001); this is rather useful since

a constraint preconditioner is symmetric and indefinite precluding simple

use of minres. Keller et al. (2000) prove that when H is positive definite

then P1A has 2m eigenvalues of 1, each of geometric multiplicity 2 (thus

the minimum polynomial generically has a factor ( 1)2), and that the

remaining nm eigenvalues are real and interlace those of H1A. Thus if

H is a good preconditioner for A and systems with P can be readily solved,

then this can be an attractive strategy.

Di↵erent challenges arise and di↵erent strategies are needed for saddle

point problems (5.5) with semidefinite block, A. The constraint precondi-

tioning paradigm can be applied, in which case finding a matrix H for the

semidefinite matrix A may require ideas as in section 4 above. The matrix

A is still invertible provided A is positive definite on the kernel of B and B

is of full rank, as has been mentioned. One possible approach is therefore

via an augmentation or augmented system preconditioner

P =

A+BTW1B 0

0 W

for some appropriate symmetric and positive definite matrix W . This ap-

26 A. J. Wathen

proach is described and analysed by Greif and Scho¨tzau (2006) who were

motivated by the solution of Maxwell equations describing electromagnetic

phenomena—a key PDE problem with this structure. For related but earlier

ideas associated with the semidefiniteness of the div operator, see Arnold

et al. (1997).

For the various Maxwell equation formulations, semidefiniteness arises

through curlcurl terms, indefiniteness because of a div constraint and wave-

like solutions are always expected, which means that fine computational

grids must be used especially for high frequency problems. Fortunately,

Maxwell’s equations are linear and self-adjoint in most formulations which

is why we bring them up in this section. Mixed finite element approxi-

mation is widely used for these PDEs. Representative of the large and

still growing literature on preconditioning for such problems are Greif and

Scho¨tzau (2007), who develop block preconditioners and Tsuji et al. (2012),

who extend sweeping preconditioner ideas to these problems.

A slightly more detailed review of preconditioning for saddle point prob-

lems can be found in Benzi and Wathen (2009).

5.3. A word of caution

As has already been mentioned in section 2, for any nonsingular square ma-

trix A, the solution of a linear system can be achieved by using an indefinite

symmetric system of twice the dimension of the form (2.5) or even

0 A

AT 0

r

x

=

b

0

.

Since the eigenvalues of the coecient matrix here are exactly symmetric

about the origin (they are ±i(A) where i(A) are the singular values of

A), generically, the convergence of minres would stagnate at every other

iteration, so the computational e↵ort in solution is exactly the same as for

the normal equations (2.4) (Fischer et al. 1998). For (2.5) a similar issue

arises.

The artificial construction of indefinite symmetric systems is therefore

unlikely to achieve other than equivalent problem statements, except possi-

bly in the solution of least squares problems by augmented matrix methods

(Bjo¨rck 1996, Chapter 7). To go further, a symmetric indefinite matrix

of any form which has eigenvalues symmetrically arranged about the origin

will generically lead to staircasing behaviour where progress is made only on

alternate minres iterations; cg for (2.4) generally requires the same com-

putational e↵ort for equation solution in such circumstances. In this sense,

it is highly unfortunate that the tractable minres convergence bound (5.2)

applies only when the positive and negative eigenvalues extend over inter-

vals of equal length. If in fact a = d and b = c so that these intervals are

Preconditioning 27

centred on the origin, then the undesirable staircasing e↵ect is always im-

plied! In general, it is advantageous in terms of convergence speed if there

is some asymmetry in the distribution of positive and negative eigenvalues

of preconditioned indefinite symmetric matrices. However, it is then con-

siderably more complicated to take account of this asymmetry in a minres

convergence bound (but see Wathen et al. (1995)).

6. Nonsymmetric matrices

As already mentioned, all preconditioning for nonsymmetric problems is

heuristic, since descriptive convergence bounds for gmres or any of the

other applicable nonsymmetric Krylov subspace iterative methods do not

presently exist (but see Pestana and Wathen (2013b)). The most straight-

forward convergence bound for gmres for diagonalisable matrices, A =

X⇤X1 where ⇤ is a diagonal matrix of eigenvalues, comes directly from

(2.3) and the residual minimization property:

krkk = min

p2⇧k,p(0)=1

kp(A)r0k

min

p2⇧k,p(0)=1

kp(X⇤X1)k kr0k

= min

p2⇧k,p(0)=1

kXp(⇤)X1k kr0k

kXk kX1k min

p2⇧k,p(0)=1

kp(⇤)k kr0k

kXk kX1k min

p2⇧k,p(0)=1

max

z2R

|p(z)| kr0k, (6.1)

where R is any region containing the eigenvalues. Since it is sometimes pos-

sible to find regions R which contain all of the eigenvalues, but rarely possi-

ble to obtain any reasonable estimates of the condition number kXkkX1k

of the eigenvector matrices, X, very often in the literature, the eigenvalues

of P1A are estimated or bounded as a guide to the quality of a precondi-

tioner. It is certainly usually true that if the eigenvalues are widely spread

through a large region of C then slow convergence can follow!

There are alternative approaches for obtaining bounds for gmres con-

vergence; see for example (Greenbaum 1997, Section 3.2). One can generate

such bounds, for example, with use of the field of values or numerical range

F(A) = {xTAx, x 2 Cn, xTx = 1},

provided 0 /2 F(A). The obtained bounds can be rather pessimistic, but

field-of-values convergence bounds have been used, for example, to rigor-

ously establish the convergence of gmres in a number of iterations in-

dependent of the mesh size, h, for certain preconditioned PDE problems;

see for example Klawonn and Starke (1999), Loghin and Wathen (2004),

28 A. J. Wathen

Benzi and Olshanskii (2011). gmres convergence bounds can be based on

other sets in the complex plane; ✏pseudospectral sets, for example, have

been used to analyse initial stagnation of (unpreconditioned) gmres for

convection-di↵usion problems (Trefethen and Embree 2005, Chapter 26).

One aspect which arises in the nonsymmetric but not the symmetric case

is the possibility to left precondition

P1Ax = P1b,

to right precondition

AP1y = b, x = P1y,

or, if P is available in split form P = P1P2, to use

P11 AP

1

2 y = P

1

1 b, x = P

1

2 y.

In every case the coecient matrix is generally again nonsymmetric and

obvious similarity tranformations involving P or the Pi show that all three

coecient matrices are mathematically similar and so have the same eigen-

values. There is little evidence, however, that any one of these is better

than the others, in terms of its e↵ect on the convergence rate for a Krylov

subspace method, even though, theoretically, this could be the case. Tradi-

tionally, right preconditioning is favoured since then gmres, for example,

minimizes the residual rk = b Axk for the original linear system; left

preconditioning would lead to minimization of the preconditioned residual

P1rk. It is certainly not always clear which is preferable; a good precon-

ditioner fortunately seems to give fast convergence for all three forms in

practice.

In the context of PDEs, convection-di↵usion problems present significant

challenges for stable discretization—without excessive numerical di↵usion

or non-physical oscillations—as well as for preconditioning and iterative

solvers. A key aspect is simply equation ordering. Even such a simple

method as stationary Gauss-Seidel iteration can be good for such problems

if the variables are ordered in the direction of the convection, but it can be

very poor otherwise (see Elman and Chernesky (1993), Elman et al. (2014b,

chapter 7)).

We mention also that there is a ‘flexible’ version of gmres which allows

for a preconditioner that varies between iterations (Saad 1993). This can be

useful, for example, when employing another iteration as a preconditioner,

but this flexible gmres method does not necessarily work as well as regular

gmres. It can sometimes be better to just use a fixed number of inner

iterations so that the preconditioner is a fixed operator for every iteration;

if such inner iterative steps represent a fixed linear process, then at least

the limited guarantees that come with gmres then remain in place.

Preconditioning 29

6.1. Incomplete Factorization

Just as a Cholesky factorization for a symmetric and positive definite ma-

trix can be incompletely formed, an LU factorization (equivalent to Gauss

elimination) for a non-symmetric matrix can be performed incompletely

by prespecifying allowable fill or thresholding or both. Perhaps such ILU

preconditioning methods and their variants are the most widely used for

non-symmetric problems and software is widely available.

Because the product of triangular matrix factors is computed for ILU , for

problems with an implied directionality like convection–di↵usion, ordering

of the variables can lead to one of the factors being less important (and

perhaps more nearly diagonal).

Although, in general, there is little we can say about ILU precondition-

ing for non-symmetric matrices, gmres with ILU is a ‘go-to’ technique for

very many practical problems, so much largely empirical research continues

to explore every aspect of the various possibilities, for example, multilevel

variants, block variants and parallel implementation issues. We again refer

to Benzi (2002) as the huge literature is fairly evenly cited. Because of

its significance, particularly in the important area of petroleum reservoir

modelling, we specifically mention the nested ILU methods originally de-

rived by Appleyard and Cheshire (1983), which use partial eliminations and

are designed to take advantage of parallel computation; related ideas were

developed, for example, by Saad (1996). See also Vassilevski (2008) for a

comprehensive treatment of these methods in a more general framework.

6.2. Multigrid

There is no inherent obstacle to applying multigrid or multilevel ideas for

nonsymmetric problems. However, smoothing is generally not so easy and

requires more work than for symmetric problems and care must be taken so

that coarse grid correction does not reintroduce high frequencies. Without

attention to these aspects, the methods simply do not work at all well.

Since such issues vary with di↵ering applications, we briefly discuss only

the important problem associated with the convection-di↵usion equation.

Smoothing – or perhaps more appropriately in this setting, ‘sweeping’ –

should ideally be something like a Gauss-Seidel iteration with the variables

ordered in the direction of the convection. For problems with recirculation,

one can try multidirectional sweeping, so that all parts of the flow have

at least one of the directional sweeps that is approximately in the convec-

tion direction. Coarse grid correction then requires care (in grid transfers

and/or coarse grid operator). In the convection-di↵usion geometric multi-

grid method due to Ramage (1999), for example, the coarse grid operator

is recomputed on each coarse grid with—crucially—streamline upwinding

appropriate to that grid size. The point is that when a grid under-resolves

30 A. J. Wathen

any layer-like phenomenon which arises with convection—this must happen

at some level if one has a true multigrid concept—then oscillatory solutions

are commonly the result. The reintroduction of such high frequencies would

require more significant sweeping or smoothing, and it is preferable to pro-

vide stabilisation appropriate to each level of grid so that smooth coarse

grid corrections result.

Multigrid for hyperbolic problems is thought about rather di↵erently

than for elliptic problems and the idea of directional sweeping rather than

smoothing is key. The coarse grids allow one to sweep errors out of the

boundary of the domain more rapidly than using just a single grid. Again,

transport is the main issue, not smoothing.

Algebraic multigrid ideas have been tried and can work well for some

problems. The computation of ‘coarse grids’ (or in reality, spaces of coarse

variables) by purely algebraic means can identify layer phenomena, for ex-

ample, and automatically semi-coarsen. In particular, widely used coars-

ening algorithms seem to detect anisotropy and lead to appropriate coarse

spaces/grids. All is not rosy, however, and AMG software can quite simply

fail. There is a particular danger that simply does not arise with geometric

multigrid ideas, namely that the coarse grid can have a significant propor-

tion of the variables of the fine grid; that is, coarsening does not lead to a

suciently significant reduction in the number of variables. If this occurs,

then many levels are required and eciency can be severely compromised.

Active research continues on AMG for nonsymmetric problems; see for ex-

ample Notay (2012).

Multigrid methodology is generally applied as a preconditioner for non-

symmetric problems.

6.3. Sparse Approximate Inverses

Norm minimization readily leads to sparse approximate inverses similarly

to the symmetric case. A biconjugation based method (AINV) and its

more robust stabilized form (SAINV) are, however, perhaps the most widely

considered of such methods in the non-symmetric case (Benzi and Tuma

1998, Benzi et al. 2000).

6.4. Symmetric/Skew splitting

Every nonsymmetric real matrix can be uniquely split into its symmetric

and skew-symmetric part:

A = 12(A+A

T )| {z }

H

+ 12(AAT )| {z }

S

.

(The H stands for Hermitian, since the same idea obviously applies in the

complex case when transpose is replace by conjugate transpose.) When H is

Preconditioning 31

positive definite, preconditioning of A by H gives a shifted skew-symmetric

matrix for which a special three-term recurrence Krylov subspace method

exists (Concus and Golub 1976, Widlund 1978) and convergence depends on

the size of 1+|max(H1S)| (Widlund 1978, Freund and Ruscheweyh 1986).

Although this approach has been considered for some applications—see for

example Elman and Schultz (1986) in the context of convection–di↵usion

problems—the so-called HSS (Hermitian Skew-Hermitian Splitting) meth-

ods originally introduced by Bai et al. (2003) have been rather more widely

investigated. They proved the convergence of an alternate iteration using

shifted Hermitian and shifted skew-Hermitian parts of the form

(↵I +H)x(k+

1

2) = (↵I S)x(k) + b

(↵I + S)x(k+1) = (↵I H)x(k+12) + b

for Ax = b. Though convergence is guaranteed independently of the choice

of the parameter ↵ > 0, it can be slow. Nevertheless, many variants of

using this idea as a preconditioner have been suggested, as have related

structural splitting ideas, and good results are reported for some; see for

example Benzi et al. (2011a).

6.5. Nonsymmetric saddle point systems

Saddle point problems must essentially be symmetric as in section 5.2, but

matrices of the structure (5.5) arise in which the block A is non-symmetric.

A key example is provided by linearizations of the incompressible Navier–

Stokes equations (see Elman et al. (2014b, Chapters 8 and 9)). It is these

problems that we refer to as non-symmetric saddle point problems.

For a nonsymmetric saddle point matrix, the results of Murphy et al.

(2000) still hold and block diagonal preconditioners can still be applied.

However, since there is no symmetry to lose, the block triangular precondi-

tioners

P =

bA 0

B ±bS

or P =

bA BT

0 ±bS

will give in the ‘ideal’ case bA = A, bS = S = BA1BT , a preconditioned

system with either just the two distinct eigenvalues ±1 or even just the

single eigenvalue of 1 depending on whether the + or the sign is selected,

respectively (Murphy et al. 2000). Even with the single eigenvalue 1, P is

not the inverse, so something has to give! In fact it is diagonalisability that

is lost in selecting the sign; the minimium polynomial is then ( 1)2, so

two iterations would be required exactly as for the + case. However, in the

practical case when the approximations bA, bS are not ideal, it is usually the

32 A. J. Wathen

sign which is chosen so that all eigenvalues are clustered around 1 rather

than the 2 clusters around ±1 which arise with the + choice, though it is

not clear that this is to be preferred (Ymbert et al. 2015).

Approximations of the Schur complement remain a key aspect of any

such block preconditioning approach. For the common Oseen linearization

of the incompressible Navier-Stokes problem, the matrix block A comes

from a vector convection–di↵usion operator so that an appropriate multi-

grid approach as outlined above or any other such good approximation of

(preconditioner for) convection-di↵usion is required as bA. To identify an

approximate Schur complement requires more in depth consideration, but

there are now two excellent candidates: the pressure convection-di↵usion

(PCD) (Kay et al. 2002) and the least-squares commutator (LSC) precon-

ditioners. Both are derived, described and analysed in Elman et al. (2014b,

Chapter 9). The PCD method is in some ways simpler, but it requires the

construction of a convection–di↵usion operator on the pressure space (the

space of Lagrange multipliers), which is not required for the problem for-

mulation itself. A purely algebraic way to compute this operator based on

sparse approximate inverse technology was, however, been described by El-

man et al. (2006). The LSC method is derived via a commutator argument

and is defined purely in terms of the matrix blocks which arise naturally

for the Oseen problem. For a comparison and full consideration of advan-

tages and disadvantages of these two approaches, see Elman et al. (2014b,

Chapter 9).

The augmentation or augumented Lagrangian approach has also been

applied to non-symmetric saddle point problems. Although some concerns

have been expressed regarding possible distortion of the norm in which

convergence occurs, good results have been reported even for large Reynolds

number (Benzi et al. 2011b).

Earlier approaches for this problem involved segregated treatment of pres-

sure and velocity and predate the preconditioned Krylov subspce iteration

technology. It is perhaps worth commenting here that this example is one

of the first where block preconditioners comprising of blocks designed for

di↵erent sets of variables allow treatment of di↵erent ‘physics’ within the

preconditioner whilst gmres or some other Krylov subspace method is ap-

plied to the complete coupled problem. This is an important paradigm.

Such is its importance that it is in the context of nonsymmetric saddle

point systems arising from approximation of the Oseen problem that per-

haps the most refined analysis of preconditioned gmres convergence has

been pursued. There are several eigenvalue analyses for di↵erent precondi-

tioners, see for example Elman and Silvester (1996), although these do not

lead to rigorous estimates as explained above. It is in the context of this

problem, however, that field-of-values convergence analysis has been suc-

cessfully employed to rigorously bound preconditioned gmres convergence

Preconditioning 33

and to elucidate its dependence on or independence of the mesh size, h, and

of the Reynolds number; see for example the analysis of Loghin and Wa-

then (2004) for PCD, and Benzi and Olshanskii (2011) for block triangular

preconditioning of an augmented Lagrangian formulation.

6.6. Circulant and semi-circulant preconditioning

Our opening example already indicates the potential for circulant precondi-

tioning, at least for symmetric Toeplitz matrices. In fact the applicability

of such ideas is rather wider, though the theoretical justification is less com-

plete for non-symmetric problems.

The analysis for symmetric and positive definite Toeplitz matrices is well

described in the books by Chan and Jin (2007) and Ng (2004) who also in-

dicate how similar ideas follow for block Toeplitz matrices with Toeplitz

blocks. Indeed, the use of circulant preconditioners for non-symmetric

Toeplitz matrices does not really depend on symmetry. However, unless the

normal equations are used, convergence guarantees are not readily avail-

able, simply because of the general lack of descriptive convergence bounds

for non-symmetric matrices. Good computational results have been ob-

served in many situations, and in fact even proved for certain problems via

bounding of the eigenvector condition number as well as the eigenvalues

and using (6.1) (Otto 1996). A recent observation of Pestana and Wathen

(2014), that guaranteed convergence bounds can be achieved for minres

for circulant preconditioned non-symmetric Toeplitz matrices via a variable

reordering, is also relevant.

7. Some comments on symmetry

We have emphasised the importance of matrix symmetry in this review,

in particular that it is rarely sensible to throw away such a property by

preconditioning. In the reverse direction, it would be highly advantageous

for non-symmetric matrices if it were easily possible to precondition to ob-

tain symmetry of the preconditioned problem without recourse to the nor-

mal equations. This possibility seems unlikely without the computation

of expensive matrix factorizations. However, there are situations in which

a broader consideration of self-adjointness allows many of the attractive

properties of symmetric matrices, such as the use of the symmetric Krylov

subspace iterative methods cg and minres.

A symmetric matrix, A = AT , is simply one which is self-adjoint in the

usual Eulidean inner product, that is,

hAu, vi = vTAu = uTAv = hAv, ui = hu,Avi.

In fact given any inner product

hu, viH = vTHu (7.1)

34 A. J. Wathen

defined by a symmetric and positive definite matrixH (all inner products on

Rn arise in this way), the distinction between symmetric and non-symmetric

matrices is really the distinction between matrices which are self-adjoint and

those which are non-self-adjoint with respect to the inner product. Indeed,

in the standard situation when A and P are both symmetric and P is

positive definite, then the left-preconditioned matrix P1A is self-adjoint in

hu, viP , since

hP1Au, viP = vTPP1Au = vTAu = vTATPTPu = hu, P1AviP .

This description of symmetric preconditioning for symmetric matrices with-

out destroying self-adjointness provides an alternative to the approach in

section 3. The cg method, for example, can be employed in such a gen-

eral inner product (7.1) for any matrix which is self-adjoint in this inner

product and satisfies the positive definiteness condition hAu, uiH > 0 for all

u 2 Rn {0}.

In general, any matrix A is self-adjoint in (7.1) if and only if, for all

u, v 2 Rn

vTHAu = hAu, viH = hu,AviH = vTATHu, (7.2)

which is the same condition as HA = ATH, so we must have AT = HAH1

for some positive definite symmetric matrix H. It can be shown that such

a matrix H exists if and only if the real matrix A is diagonalisable and has

real eigenvalues (Taussky 1972). Thus the use of self-adjointness in non-

standard inner products is limited, but interesting examples do exist where,

via left- or right-preconditioning, non-symmetric preconditioned matrices

arise which are nevertheless self-adjoint in some convenient non-standard

inner product; see Stoll and Wathen (2008). Since the inner product must

be used for computation of scalar products (dot products) arising in Krylov

subspace methods, there is additional arithmetic overhead in the use of

H 6= I unless other structures mitigate this.

We present just one example—perhaps the most well-known—of the use

of non-standard inner products; exceptionally, it allows the robust appli-

cation of cg in a non-standard inner product for an indefinite symmetric

saddle point problem by creating a self-adjoint and positive definite matrix

in that inner product (Bramble and Pasciak 1988). For other examples, see

Klawonn (1998) and Benzi and Simoncini (2006), and see Stoll and Wathen

(2008) and Pestana and Wathen (2013a) for how examples of this struc-

ture can be combined to yield further cases of possible utility (so-called

combination preconditioning).

In the Bramble and Pasciak method one preconditions the symmetric and

indefinite saddle point matrix as given in (5.5) on the left with the block

Preconditioning 35

lower triangular matrix

P =

bA 0

B I

,

resulting in the non-symmetric matrix

P1A =

bA1A bA1BT

B bA1AB B bA1BT

.

This turns out to be self-adjoint in the inner product (7.1) (i.e. it satisfies

the condition (7.2)) where

H =

A bA 0

0 I

.

Moreover P1A is positive definite in this inner product. Note that it is

necessary for A bA to be positive definite in order to have an inner product

(theoretically – and quite often in practice – this can be achieved by scaling

of bA), and in fact the positive definiteness also requires this condition.

Perhaps this appears to be a rather exotic structure, but much practical

use has been made of this particular example; the possibility for other really

useful examples remains unrealised but rather intriguing.

A final comment concerns the possibility of non-symmetric precondition-

ing of symmetric systems. We have earlier noted that such preconditioning

would destroy symmetry in general and thus is undesirable. However, in

the conceivable, but presumably rare, situation that A = AT , P 6= P T but

P1A = APT—that is, PT is one of the matrices that is self-adjoint

in the (possibly indefinite) inner product related to the symmetric matrix

A—then it would be possible to use cg for P1Ax = P1b when P1A

is positive definite or minres if P1A is indefinite. The cg error would

converge in k ·kP1A and for minres it would be P1(bAx), which would

converge in k ·kI . We do not know of any instance of this structure for non-

symmetric P , however the symmetric preconditioner P1 = q(A) for any

polynomial, q, clearly does commute with A, and there is some literature

on such polynomial preconditioning (see for example O’Leary (1991)).

8. Some comments on practical computing

Some important general ideas remain to be mentioned that do not in them-

selves comprise a particular preconditioning approach, but which have broad

significance.

First we mention deflation. It can be helpful – sometimes extremely help-

ful – to remove certain subspaces from a Krylov subspace. For example, if

one has a symmetric and positive definite matrix with a reasonable spread of

eigenvalues except for a few very small eigenvalues about which one has some

36 A. J. Wathen

knowledge of the corresponding eigenspace, then cg convergence could be

much improved if the e↵ect of this eigenspace could be removed or ignored.

This is the idea of deflation; if the eigenspace spanned by the eigenvectors

corresponding to the m smallest eigenvalues 1 2 . . . m can be

deflated, then the e↵ective condition number is = max/m+1 and it is

this which appears in the cg convergence bound (3.1). The crucial step is

to identify the deflating subspace.

The deflation idea is most e↵ective when the problem naturally presents

such a space. For example, see Brandt and Ta’asan (1986), who deflate the

low frequency eigenspaces which correspond to negative eigenvalues for a

Helmholtz problem, and Jonsthovel et al. (2012) who describe a prototyp-

ical application involving composite materials. However, various subspaces

which arise from iterations of a previous linear system solve, or other sources,

can be used. In particular, when restarting gmres, one can use informa-

tion from previous iterates via deflation. There are many papers describing

various approaches: for symmetric and positive definite matrices, see for

example Frank and Vuik (2001), and for nonsymmetric matrices, see for

example Erhel et al. (1995), Sifuentes et al. (2013) and Giraud et al. (2010).

When solving sequences of linear systems, the related idea of ‘recycling’

can be considered (Parks et al. 2006). Deflation is central to the iterative

computation of eigenvalues (Lehoucq and Sorensen 1996).

As parallel computing becomes more widespread, preconditioners which

are parallelizable or inherently parallel become more important. After all, it

is large scale computation for which preconditioned iterative solution tech-

nology is generally being developed. Very simple preconditioning such as

diagonal scaling is obviously parallel, but the utility of such an approach is

limited, as we have described. Methods based on incomplete factorization

are more sequential in design, and though there are parallel variants, often

speed of convergence is compromised by ordering and separating variables,

etc. to obtain the necessary independence of calculations. Such is the im-

portance of multigrid and AMG methods that parallel forms which give up

the least of the potency of sequential implementation are critical. There are,

however, compromises to be made in going parallel (Worley 1991). There is

much to be considered here, and this is not an article on parallel multigrid.

The paradigm of domain decomposition is clearly ideal for parallel compu-

tation and this is precisely what it was invented for. Sparse approximate

inverses were precisely developed because they appeared to be a tractable

way to compute a purely algebraic (‘given a matrix’) preconditioner in par-

allel. Their e↵ectivity, however, remains unclear. One beneficial use of

these ideas appears to be in contexts where one is not seeking to approxi-

mate an inverse, but rather seeking some more limited goal like minimising

a commutator—see, for example Elman et al. (2006).

Since preconditioning is used with iterative methods, sensible criteria for

Preconditioning 37

stopping an iteration must be chosen. In the ‘given a matrix’ context,

some fairly arbitrary small tolerance for a residual norm is usually specified.

When one is ‘given a problem’, however, there can be criteria appropriate

to the problem. In particular for PDE problems, reducing iteration error to

orders of magnitude less than discretization error is untenable and rather

wasteful; practical criteria are given by Arioli (2004) and Arioli et al. (2005).

An important point is that a larger-dimensional matrix for a given problem

comes from a more accurate discretization, so any stopping tolerance should

also in general be smaller on finer grids; see Elman et al. (2014b, Section

2.1.2).

In the general context of solution of linear systems of equations, it is often

regarded as desirable to reduce the number of variables/degrees of freedom

if this is easy; such thinking is associated completely with direct solution

methods. In the context of iterative methods, it can be preferable to have a

larger dimensional problem if it is then easier to identify and apply an ap-

propriate preconditioner. That is, it can be convenient even to enlarge the

dimension of given systems if the resulting matrices are more readily precon-

ditioned, as long as the storage requirement for the enlarged matrix and the

associated vectors does not become unacceptable. In finite element analysis,

for example, the idea of ‘static condensation’ is precisely the elimination of

variables with only local connections, but although it slightly reduces the

number of free variables, and hence the matrix dimension, the resulting

smaller matrices can be less easy to precondition even though they often

possess a slightly reduced condition number. More radical is the idea of

the first-order system least-squares (FOSLS) approach, where extra depen-

dent variables are explicitly introduced making a rather larger-dimensional

system in general, but a system for which multigrid solvers can be readily

applied. For an application of this idea, see de Sterck et al. (2004), for

example.

A further example of the freedom and possibilities is the ‘all-at-once’, ‘one-

shot’ or ‘monolithic’ approach, where preconditioning for separate parts of a

problem – perhaps separate physical processes – are employed together with

a Krylov subspace method for the whole of a coupled problem for which the

required matrix-vector products are computable in a matrix-free way, that

is, without explicit storage and operation on the complete two-dimensional

array. See, for example Heil et al. (2008), Haber and Ascher (2001) or Rees

et al. (2010b). When matrix-vector products are readily computed, there

is always a Krylov subspace method that can be applied to the monolithic

problem; the representation of the component parts can very conveniently

be the job of a block-structured preconditioner. Good examples of this

kind of approach are recent ideas for preconditioning problems coming from

magnetohydrodynamics (Phillips et al. 2014, Wathen 2014) and from geo-

dynamics (Rhebergen et al. 2014).

38 A. J. Wathen

A final comment is aimed at anyone who would like to experiment and

see the potential of preconditioning in an easy computational setting. The

freely downloadable ifiss software (Elman et al. 2007, Elman et al. 2014a),

which runs under basic matlab or octave enables such experiments.

9. Summary

We conclude with a few general guidelines on preconditioner design and

selection.

• Keep to the original structure when preconditioning (e.g., preserve

symmetry, partitioning into blocks, etc.)

• The more knowledge about a problem that can be represented in a

preconditioner, the better.

• The work in applying the preconditioner should ideally be commen-

surate with the work in matrix–vector multiplication: that would be

O(n) for a sparse matrix and O(n2) for a general full matrix.

• It may be easier to precondition a matrix of larger dimension, for which

structures are clearer, than to eliminate variables, giving a less struc-

tured matrix of smaller dimension.

There is, of course, no such concept as a best preconditioner: the only

two candidates for this would be P = I, for which the preconditioning takes

no time at all, and P = A, for which only one iteration would be required

for solution by any iterative method. However, every practitioner knows

when they have a good preconditioner which enables feasible computation

and solution of problems. In this sense, preconditioning will always be an

art rather than a science.

Acknowledgements

I am grateful to Michele Benzi, Howard Elman and David Silvester for read-

ing an early draft; their comments and suggestions have led to significant

improvements.

Parts of this survey were written whilst visiting CERFACS, Toulouse,

France and the Memorial University of Newfoundland, Canada. I am grate-

ful to Serge Gratton, Xavier Vasseur, Hermann Brunner and Scott MacLach-

lan for their support and encouragement.

Preconditioning 39

REFERENCES

P.R. Amestoy, I.S. Du↵, J-Y. L’Excellent and J. Koster (2000), ’MUMPS: a general

purpose distributed memory sparse solver’, In Proc. PARA2000, 5th Interna-

tional Workshop on Applied Parallel Computing, Springer-Verlag, 122–131.

J.R. Appleyard and I.M. Cheshire ’Nested factorization’, Society of Petroleum

Engineers, SPE 12264, presented at the Seventh SPE Symposium on Reservoir

Simulation, San Francisco, 1983.

M. Arioli (2004), ’A stopping criterion for the conjugate gradient algorithm in a

finite element method framework’, Numer. Math. 97, 1–24.

M. Arioli, D. Loghin and A.J. Wathen (2005), ’Stopping criteria for iterations in

finite element methods’, Numer. Math. 99, 381–410.

D.N. Arnold, R.S. Falk and R. Winther (1997), ’Preconditioning in H( div) and

applications’, Math. Comput. 66, 957–984.

Z.-Z. Bai, G.H. Golub and M.K. Ng (2003), ’Hermitian and skew-Hermitian split-

ting methods for non-Hermitian positive definite linear systems’, SIAM J.

Matrix Anal. Appl. 24, 603–626.

N.S. Bakhvalov (1966), ’On the convergence of a relaxation method with natural

constraints on the elliptic operator’, USSR Comp. Math. and Math. Phys.

6, 101–135.

M. Bebendorf (2008), Hierarchical Matrices: A Means to Eciently Solve Ellip-

tic Boundary Value Problems, Lecture Notes in Computational Science and

Engineering, vol. 63, Springer-Verlag.

M. Benzi (2002), ’Preconditioning techniques for large linear systems: A survey’,

J. Comput. Phys. 182, 418–477.

M. Benzi, J.K. Cullum and M. Tuma (2000), ’Robust approximate inverse precon-

ditioning for the conjugate gradient method’, SIAM J. Sci. Comput. 22,1318–

1332.

M. Benzi, G.H. Golub and J. Liesen (2005), ’Numerical solution of saddle point

problems’, Acta Numerica 14, 1–137.

M. Benzi, C.D. Meyer and M. Tuma (1996), ’A sparse approximate inverse precon-

ditioner for the conjugate gradient method’, SIAM J. Sci. Comput. 17, 1135–

1149.

M. Benzi, M. Ng, Q. Niu and Z. Wang (2011a), ’A relaxed dimensional factorization

preconditioner for the incompressible Navier-Stokes equations’, J. Comput.

Phys. 230, 6185–6202.

M. Benzi and M.A. Olshanskii (2011), ’Field-of-values convergence analysis of aug-

mented Lagrangian preconditioners for the linearized Navier-Stokes problem’,

SIAM J. Numer. Anal. 49, 770–788.

M. Benzi, M.A. Olshanskii and Z. Wang (2011b), ’Modified augmented Lagrangian

preconditioners for the incompressible Navier-Stokes equations’, Int. J. Nu-

mer. Meth. Fluids 66, 486–508.

M. Benzi and V. Simoncini (2006), ’On the eigenvalues of a class of saddle point

matrices’, Numer. Math. 103, 173–196.

M. Benzi and M. Tuma (1998), ’A sparse approximate inverse preconditioner for

nonsymmetric linear systems’, SIAM J. Sci. Comput. 19, 968–994.

M. Benzi and M. Tuma (2003), ’A robust incomplete factorization preconditioner

for positive definite matrices’, Numer. Linear Algebra Appl. 10, 385–400.

40 A. J. Wathen

M. Benzi and A.J. Wathen (2008), ’Some preconditioning techniques for saddle

point problems’, in Model order reduction: theory, research aspects and ap-

plications, 195–211.

L. Bergamaschi, J. Gondzio and G. Zilli (2004), ’Preconditioning indefinite systems

in interior point methods for optimization’, Comput. Optim. Appl. 28, 149–

171.

M. Bern, J.R. Gilbert, B. Hendrickson, N. Nguyen and S. Toledo (2006), ’Support-

Graph preconditioners’, SIAM J. Matrix Anal. Appl. 27, 930–951.

A. Bjo¨rck (1996), Numerical Methods for Least Squares Problems, Society for In-

dustrial and Applied Mathematics.

J.W. Boyle, M.D. Mihajlovic and J.A. Scott (2010), ’HSL MI20: an ecient AMG

preconditioner for finite element problems in 3D’, Int. J. Numer. Meth. En-

gnrg 82, 64–98.

D. Braess and P. Peisker (1986), ’On the numerical solution of the biharmonic equa-

tion and the role of squaring matrices for preconditioning’, IMA J. Numer.

Anal. 6, 393–404.

J. Bramble and J. Pasciak (1988), ’A preconditioning technique for indefinite sys-

tem resulting from mixed approximations of elliptic problems’, Math. Com-

put. 50, 1–17.

J.H. Bramble, J.E. Pasciak and J. Xu (1990), ’Parallel multilevel preconditioners’,

Math. Comput. 55,1–22.

A. Brandt (1977), ’Multilevel adaptive solutions to boundary-value problems’,

Math. Comput. 31,333–390.

A. Brandt and S. Ta’asan (1986), ’Multigrid methods for nearly singular and

slightly indefinite problems’, in Multigrid Methods II, W. Hackbusch and

U. Trottenberg, eds., Lecture Notes in Math. 1228, Springer-Verlag, Berlin,

pp. 99–121.

O. Bro¨ker, M.J. Grote, C. Mayer and A. Reusken (2001), ’Robust parallel smooth-

ing for multigrid via sparse approximate inverses’, SIAM J. Sci Comp.

23, 1395–1416.

R.H.-F. Chan and X.-Q. Jin (2007), An Introduction to Iterative Toeplitz Solvers,

Society for Industrial and Applied Mathematics.

T.F. Chan and T.P. Mathew (1994), ’Domain decomposition algorithms’, Acta

Numerica 3, 61–143.

S. Chandrasekaran, P. Dewilde, M. Gu, T. Pals, X. Sun, A.J. van der Veen and

D. White (2005), ’Some fast algorithms for sequentially semiseparable repre-

sentation’, SIAM J. Matrix Anal. Appl. 27, 341–364.

K. Chen (2005), Matrix preconditioning techniques and applications, Cambridge

University Press.

P. Concus and G.H. Golub (1976), ’A generalized conjugate gradient method for

nonsymmetric systems of linear equations’, in R. Glowinski and P.L. Lions

eds., Lecture Notes in Economics and Mathematical Systems 134, 56–65,

Springer.

J.W. Cooley and J.W. Tukey (1965), ’An algorithm for the machine calculation of

complex Fourier series’, Math. Comput. 19, 297–301.

T.A. Davis (2004), ’Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern

multifrontal method’, ACM Trans. Math. Softw. 30, 196–199.

Preconditioning 41

’International Conference on Domain Decomposition Methods’,

http://www.ddm.org/conferences.html .

H. de Sterck, T. Manteu↵el, S. McCormick and L. Olson (2004), ’Least-squares

finite element methods and algebraic multigrid solvers for linear hyperbolic

PDEs’, SIAM J. Sci Comp. 26, 31–54.

H.S. Dollar, N.I.M. Gould, W.H.A. Schilders and A.J. Wathen (2006), ’Implicit-

factorization preconditioning and iterative solvers for regularized saddle-point

systems’, SIAM J. Matrix Anal. Appl. 28, 170–189.

H.S. Dollar and A.J. Wathen (2006), ’Approximate factorization constraint precon-

ditioners for saddle-point matrices’, SIAM J. Sci. Comput. 27, 1555–1572.

T.F. Dupont, R.P. Kendall and H.H. Rachford (1968), ’An approximate factoriza-

tion procedure for solving self-adjoint elliptic di↵erence equations’, SIAM J.

Numer. Anal. 5, 554–573.

H.C. Elman and M.P. Chernesky (1993), ’Ordering e↵ects on relaxation methods

applied to the discrete one-dimensional convection-di↵usion equation’, SIAM

J. Numer. Anal. 30, 1268–1290.

H.C. Elman, O.G. Ernst and D.P. O’Leary (2001), ’A multigrid method enhanced

by Krylov subspace iteration for discrete Helmholtz equations’, SIAM J. Sci.

Comput. 23, 1291–1315.

H.C. Elman, V.E. Howle, J.N. Shadid, R. Shuttleworth and R. Tuminaro (2006),

’Block preconditioners based on approximate commutators’, SIAM J. Sci.

Comput. 27, 1651–1668.

H.C. Elman, A. Ramage and D.J. Silvester (2007), ’Algorithm 886: IFISS, A Mat-

lab toolbox for modelling incompressible flow’, ACM Trans. Math. Softw.

33(2), article no. 14.

H.C. Elman, A. Ramage and D.J. Silvester (2014a), ’IFISS: A computational lab-

oratory for investigating incompressible flow problems’, SIAM Rev. 56, 261–

273.

H.C. Elman and M.H. Schultz (1986), ’Preconditioning by fast direct methods

for non-self-adjoint nonseparable elliptic equations’, SIAM J. Numer. Anal.

23, 44–57.

H.C. Elman and D.J. Silvester (1996), Fast nonsymmetric iterations and precon-

ditioning for the Navier-Stokes equations’, SIAM J. Sci. Comput. 17, 33–46.

H.C. Elman, D.J. Silvester and A.J. Wathen (2014b), Finite Elements and Fast

Iterative Solvers: with applications in incompressible fluid dynamics, Second

Edition, Oxford University Press.

H.C. Elman, D.J. Silvester and A.J. Wathen (2005), Finite Elements and Fast

Iterative Solvers: with applications in incompressible fluid dynamics, Oxford

University Press.

H.C. Elman and X. Zhang (1995), ’Algebraic analysis of the hierarchical basis

preconditioner’, SIAM J. Matrix Anal. Appl. 16, 192–206.

B. Engquist and L. Ying (2011a), ’Sweeping preconditioner for the Helmholtz equa-

tion: Hierarchical matrix representation’, Comm. Pure Appl. Math. 64, 697–

735.

B. Engquist and L. Ying (2011b), ’Sweeping preconditioner for the Helmholtz equa-

tion: moving perfectly matched layers’, Multiscale Model. Simul. 9, 686–710.

42 A. J. Wathen

J. Erhel, K. Burrage and B. Pohl (1995), ’Restarted GMRES preconditioned by

deflation’, J. Comput. Appl. Math. 69, 303–318.

Y.A. Erlangga, C.W. Oosterlee and C, Vuik (2006), ’A novel multigrid based pre-

conditioner for heterogeneous Helmholtz problems’, SIAM J. Sci. Comput.

27, 1471–1492.

O.G. Ernst and M.J. Gander (2012), ’Why it is dicult to solve Helmholtz prob-

lems with classical iterative methods’, in Numerical analysis of multiscale

problems, 325–363, Springer.

V. Faber and T. Manteu↵el (1984), ’Necessary and sucient conditions for the

existence of a conjugate gradient method’, SIAM J. Numer. Anal. 21, 315–

339.

R. P. Federenko (1964), ’The speed of convergence of one iterative process’, USSR

Comp. Math. and Math. Phys. 4, 227–235.

B. Fischer (2011), Polynomial Based Iteration Methods for Symmetric Linear Sys-

tems, Society for Industrial and Applied Mathematics.

B. Fischer, A. Ramage, D.J. Silvester and A.J. Wathen (1998), ’Minimum residual

methods for augmented systems’, BIT 38, 527–543.

J. Frank and C. Vuik (2001), ’On the construction of deflation-based precondition-

ers’, SIAM J. Sci. Comput. 23, 442–462.

R.W. Freund and N.M. Nachtigal (1991), ’QMR a quasi-minimal residual method

for non-Hermitian linear systems’, Numer. Math. 60, 315–339.

R.W. Freund and N.M. Nachtigal (1994), ’A new Krylov-subspace method for

symmetric indefinite linear systems’, Proceedings of the 14th IMACS World

Congress on Computational and Applied Mathematics, W.F. Ames(Ed.),

IMACS, 1253–1256.

R.W. Freund and S. Ruscheweyh (1986), ’On a class of Chebyshev approximation

problems which arise in connection with a conjugate gradient type method’,

Numer. Math. 48, 525–542.

M.J. Gander, F. Magoule`s and F. Nataf (2002), ’Optimized Schwarz methods with-

out overlap for the Helmholtz equation’, SIAM J. Sci. Comput. 24, 38–60.

M.W. Gee, C.M. Siefert, J.J. Hu, R.S. Tuminaro and M.G. Sala (2006), ’ML 5.0

Smoothed Aggregation User’s Guide’, Sandia National Laboratories, Report

number SAND2006-2649.

M.B. Giles (2008), ’Multilevel Monte Carlo path simulation’, Oper. Res. 56, 607–

617.

L. Giraud, S. Gratton, X. Pinel and X. Vasseur (2010), ’Flexible GMRES with

deflated restarting’, SIAM J. Sci. Comput. 32, 1858–1878.

N.I.M. Gould, M.E. Hribar and J. Nocedal (2001), ’On the solution of equality

constrained quadratic programming problems arising in optimization’, SIAM

J. Sci. Comput. 23, 1376–1395.

N.I.M. Gould and V. Simoncini (2009), ’Spectral analysis of saddle point matrices

with indefinite leading blocks’, SIAM J. Matrix Anal. Appl. 31, 1152–1171.

I.G. Graham and M.J. Hagger (1999), ’Unstructured additive Schwarz-conjugate

gradient method for elliptic problems with highly discontinuous coecients’,

SIAM J. Sci. Comput. 20, 2041–2066.

L. Grasedyck and W. Hackbusch (2003), ’Construction and arithmetics of H-

matrices’, Computing 70, 295–334.

Preconditioning 43

A. Greenbaum (1997), Iterative Methods for Solving Linear Systems, Society for

Industrial and Applied Mathematics.

L, Greengard and V. Rokhlin (1987), ’A fast algorithm for particle simulations’, J.

Comput. Phys. 73, 325–348.

C. Greif and D. Scho¨tzau (2006), ’Preconditioners for saddle point linear systems

with highly singular (1,1) blocks’, Elect. Trans. Numer. Anal. 22, 114–121.

C. Greif and D. Scho¨tzau (2007), ’Preconditioners for the discretized time-harmonic

Maxwell equations in mixed form’, Numer. Lin. Alg. Appl. 14, 281–297.

M.J. Grote and T. Huckle (1997), ’Parallel preconditioning with sparse approxi-

mate inverses’, SIAM J. Sci. Comput. 18, 83–853.

A. Gu¨nnel, R. Herzog and E. Sachs (2014), ’A note on preconditioners and scalar

products in Krylov subspace methods for self-adjoint problems in Hilbert

space’, Elect. Trans. Numer. Anal. 41, 13–20.

I. Gustafsson (1978), ’A class of first order factorization methods’, BIT 18, 142–

156.

E. Haber and U.M. Ascher (2001), ’Preconditioned all-at-once methods for large,

sparse parameter estimation problems’, Inverse Probl. 17, 1847–1864.

W. Hackbusch (1994), Iterative Solution of Large Sparse Systems of Equations,

Springer-Verlag.

W. Hackbusch (1999), ’A sparse matrix arithmetic based on H-matrices. I. Intro-

duction to H-matrices’, Computing 62, 89–108

W. Hackbusch, B.K. Khoromskij and S. Sauter (2000), ’On H2-matrices’, in Lec-

tures on Applied Mathematics, 9–29, H. Bungartz, R. Hoppe and C. Zenger,

eds..

M. Heil, A.L. Hazel and J. Boyle (2008), ’Solvers for large-displacement fluid-

structure interaction problems: segregated versus monolithic approaches’,

Comput. Mech. 43, 91–101.

V.E. Henson and U.M. Yang (2000), ’BoomerAMG: a parallel algebraic multigrid

solver and preconditioner’, Appl. Numer. Math. 41, 155–177.

M.R. Hestenes and E. Stiefel (1952), ’Methods of conjugate gradients for solving

linear problems’, J. Res. Nat. Bur. Stand. 49, 409–436.

R. Hiptmair and J. Xu (2007), ’Nodal auxiliary space preconditioning in H(curl)

and H(div) spaces’, SIAM J. Numer. Anal. 45, 2483–2509.

R.M. Holland, A.J. Wathen and G.J. Shaw, (2005), ’Sparse approximate inverses

and target matrices’, SIAM J. Sci. Comput. 26, 1000–1011.

’HSL(2013). A collection of fortran codes for large scale scientific computation’,

http://www.hsl.rl.ac.uk .

V. John and L. Tobiska (2000), ’Numerical performance of smoothers in coupled

multigrid methods for the parallel solution of the incompressible Navier-

Stokes equations’, Int. J. Numer. Meth. Fluids 33, 453–473.

T.B. Jonsthovel, M.B. van Gijzen, S.C. Maclachlan, C. Vuik and A. Scarpas (2012),

’Comparison of the deflated preconditioned conjugate gradient method and

algebraic multigrid for composite materials’, Comput. Mech. 50, 321–333.

D. Kay, D. Loghin and A.J. Wathen (2002), ’A preconditioner for the steady-state

Navier-Stokes equations’, SIAM J. Sci. Comput. 24, 237–256.

C. Keller, N.I.M. Gould and A.J. Wathen (2000), ’Constraint preconditioning for

indefinite linear systems’, SIAM J. Matrix Anal. Appl. 21, 1300–1317.

44 A. J. Wathen

A. Klawonn (1998), ’Block-triangular preconditioners for saddle point problems

with a penalty term’, SIAM J. Sci. Comput. 19, 172–184.

A. Klawonn and G. Starke (1999), ’Block triangular preconditioners for nonsym-

metric saddle point problems: field-of-value analysis’, Numer. Math. 81, 577–

594.

L.Y. Kolotilina and A.Y. Yeremin (1993), ’Factorized sparse approximate inverse

preconditioning. I. Theory’, SIAM J. Matrix Anal. Appl. 14, 45–58.

J. Korzak (1999), ’Eigenvalue relations and conditions of matrices arising in linear

programming’, Computing 62, 45–54.

A.L. Laird and M.B. Giles (2002), ’Preconditioned iterative solution of the 2D

Helmholtz equation’, report NA-02/12, Oxford University Computer Labora-

tory, Oxford, UK.

R.B. Lehoucq and D.C. Sorensen (1996), ’Deflation techniques for an implicitly

restarted Arnoldi iteration’, SIAM J. Matrix Anal. Appl. 17, 789–821.

X.S. Li (2005), ’An overview of SuperLU: Algorithms, implementation and user

interface’, ACM Trans. Math. Software 31, 302–325.

J. Liesen and Z. Strakosˇ (2013), Krylov subspace methods : principles and analysis,

Oxford University Press.

O.E. Livne and A. Brandt (2012), ’Lean Algebraic Multigrid (LAMG): Fast graph

Laplacian linear solver’, SIAM J. Sci. Comput. 34, B499–B522.

D. Loghin and A.J. Wathen (2004), ’Analysis of preconditioners for saddle-point

problems’, SIAM J. Sci. Comput. 25, 2029–2049.

J. Ma´lek and Z. Strakosˇ (2014), Preconditioning and the Conjugate Gradient

Method in the Context of Solving PDEs, Society for Industrial and Applied

Mathematics.

S. Manservisi (2006), ’Numerical analysis of Vanka-type solvers for steady Stokes

and Navier-Stokes flows’, SIAM J. Numer. Anal. 44, 2025–2056.

K.-A. Mardal and R. Winther (2011), ’Preconditioning discretizations of systems

of partial di↵erential equations’, Numer. Linear Algebra Appl. 18, 1–40.

T.P. Mathew (2008), Domain Decomposition Methods for the Numerical Solution

of Partial Di↵erential Equations, Springer.

J.A. Meijerink and H.A. van der Vorst (1977), ’An iterative solution method for

linear systems of which the coecient matrix is a symmetric M-matrix’,Math.

Comput. 31, 148–162.

G.A. Meurant (1999), Computer solution of large linear systems, North-Holland.

M. Magolu Monga Made, R. Beauwens and G. Warze´e (2000), ’Preconditioning

of discrete Helmholtz operators perturbed by a diagonal complex matrix’,

Commun. Numer. Methods Engrg. 16, 801–817.

M.F. Murphy, G.H. Golub and A.J. Wathen (2000), ’A note on preconditioning for

indefinite linear systems’, SIAM J. Sci. Comput. 21, 1969–1972.

M. Neytcheva (1999), talk presented at the first International Conference on Pre-

conditioning, Minneapolis, 1999.

M.K. Ng (2004), Iterative Methods for Toeplitz Systems, Oxford University Press.

Y. Notay (2012), ’Aggregation-based algebraic multigrid for convection-di↵usion

equations’, SIAM J. Sci. Comput. 34, A2288–A2316.

D.P. O’Leary ’Yet another polynomial preconditioner for the conjugate gradient

algorithm’, Linear Algebra Appl. 154–156, 377–388.

Preconditioning 45

M.A. Olshanskii and E.E. Tyrtyshnikov (2014), Iterative Methods for Linear Sys-

tems: Theory and Applications, Society for Industrial and Applied Mathe-

matics.

M.E.G. Ong (1997), ’Hierarchical basis preconditioners in three dimensions’, SIAM

J. Sci. Comput. 18, 479–498.

K. Otto (1996), ’Analysis of preconditioners for hyperbolic partial di↵erential equa-

tions’, SIAM J. Numer. Anal. 33, 2131–2165.

C.C. Paige and M.A. Saunders (1975), ’Solution of sparse indefinite systems of

linear equations’, SIAM J. Num. Anal. 12, 617–629.

C.C. Paige and M.A. Saunders (1982), ’LSQR: An algorithm for sparse linear

equations and sparse least squares’, ACM Trans. Math. Software 8, 43–71.

M.L. Parks, E. de Sturler, G. Mackey, D.D. Johnson and S. Maiti (2006), ’Recycling

Krylov subspaces for sequences of linear systems’, SIAM J. Sci. Comput.

28, 1651–1674.

J. Pestana and A.J. Wathen (2014), ’A preconditioned MINRES method for non-

symmetric Toeplitz matrices’, Numerical Analysis report 14/11, Oxford Uni-

versity Mathematical Institute, Oxford, UK.

J. Pestana and A.J. Wathen (2013a), ’Combination preconditioning of saddle point

systems for positive definiteness’, Numer. Linear Algebra Appl. 20, 785–808.

J. Pestana and A.J. Wathen (2013b), ’On choice of preconditioner for mini-

mum residual methods for non-Hermitian matrices’, J. Comput. Appl. Math.

249, 57–68.

J. Pestana and A.J. Wathen (2015), ’Natural preconditioning and iterative methods

for saddle point systems’, SIAM Rev. to appear.

E.G. Phillips, H.C. Elman, E.C. Cyr, J.N. Shadid and R.P. Pawlowski (2014), ’A

block preconditioner for an exact penalty formulation for stationary MHD’,

University of Maryland, Computer Science report CS-TR-5032.

J. Poulson, B. Engquist, S. Li and L. Ying (2013), ’A parallel sweeping precon-

ditioner for heterogeneous 3D Helmholtz equations’, SIAM J. Sci. Comput.

35, C194–C212.

A. Quarteroni and A. Valli (1999), Domain Decomposition Methods for Partial

Di↵erential Equations, Oxford University Press.

A. Ramage (1999), ’A multigrid preconditioner for stabilised discretisations of

advection-di↵usion problems’, J. Comput. Appl. Math. 101, 187–203.

T. Rees, H.S. Dollar and A.J. Wathen (2010a), ’Optimal solvers for PDE-

constrained optimization’, SIAM J. Sci. Comput. 32, 271–298.

T. Rees, M. Stoll and A.J. Wathen (2010b), ’All-at-once preconditioning in PDE-

constrained optimization’, Kybernetika 46, 341–360.

S. Rhebergen, G.N. Wells, R.F. Katz and A.J. Wathen (2014), ’Analysis of block-

preconditioners for models of coupled magma/mantle dynamics’, SIAM J.

Sci. Comput. 36, A1960–A1977.

Y. Saad (1993), ’A flexible inner-outer preconditioned GMRES algorithm’, SIAM

J. Sci. Comput. 14, 461–469.

Y. Saad (1996), ’ILUM: a multi-elimination ILU preconditioner for general sparse

matrices’, SIAM J. Sci. Comput. 17, 830–847.

Y. Saad (2003), Iterative Methods for Sparse Linear Systems, second edition, So-

ciety for Industrial and Applied Mathematics.

46 A. J. Wathen

Y. Saad and M.H. Schultz (1986), ’GMRES: A generalized minimal residual algo-

rithm for solving nonsymmetric linear systems’, SIAM J. Sci. Statist. Com-

put. 7, 856–869.

J.A. Sifuentes, M. Embree and R.B. Morgan (2013), ’GMRES convergence for per-

turbed coecient matrices, with application to approximate deflation precon-

ditioning’, SIAM. J. Matrix Anal. Appl. 34, 1066–1088.

D.J. Silvester and A.J. Wathen (1994), ’Fast iterative solution of stabilised Stokes

systems Part II: Using general block preconditioners’, SIAM J. Numer Anal.

31, 1352–1367.

V. Simoncini and D.B. Szyld (2007), ’Recent computational developments in

Krylov subspace methods for linear systems’, Numer. Linear Algebra Appl.

14, 1–59.

G.L.G. Sleijpen and D.R. Fokkema (1993), ’BiCGstab(ell) for linear equations in-

volving unsymmetric matrices with complex spectrum’, Elect. Trans. Numer.

Anal. 1, 11–32.

B. Smith, P. Bjørstad and W. Gropp (1996), Domain Decomposition, Parallel Mul-

tilevel Methods for Elliptic Partial Di↵erential Equations, Cambridge Univer-

sity Press.

P. Sonneveld and M.B. van Gijzen (2008), ’IDR(s): A family of simple and fast

algorithms for solving large nonsymmetric systems of linear equations’, SIAM

J. Sci. Comput. 31, 1035–1062.

D.A. Spielman and S.-H. Teng (2014), ’Nearly-linear time algorithms for precon-

ditioning and solving symmetric, diagonally dominant linear systems’, SIAM

J Matrix Anal. Appl. 35, 835–885.

M. Stoll and A.J. Wathen, (2008), ’Combination preconditioning and the Bramble-

Pasciak+ preconditioner’, SIAM J Matrix Anal. Appl. 30, 582–608.

G. Strang (1986), ’A proposal for Toeplitz matrix calculations’, Stud. Appl. Math.

74, 171–176.

W-P. Tang and W.L. Wan (2000), ’Sparse approximate inverse smoother for multi-

grid’, SIAM J. Matrix Anal. Appl. 21, 1236–1252.

O. Taussky (1972), ’The role of symmetric matrices in the study of general matri-

ces’, Linear Algebra Appl. 5, 147–154.

A. Toselli and O. Widlund (2004), Domain Decompposition Methods - Algorithms

and Theory, Springer Series in Computational Mathematics, Vol. 34.

L.N. Trefethen and M. Embree (2005), Spectra and Pseudospectra, Princeton Uni-

versity Press.

U. Trottenberg, C.W. Oosterlee and A. Schu¨ler (2000), Multigrid, Academic Press.

P. Tsuji, B. Engquist and L.Ying (2012), ’A sweeping preconditioner for time-

harmonic Maxwells equations with finite elements’, J. Comput. Phys.

231, 3770–3783.

M.B. van Gijzen, Y.A. Erlangga and C. Vuik (2007), ’Spectral analysis of the

discrete Helmholtz operator preconditioned with a shifted Laplacian’, SIAM

J. Sci. Comput. 29, 1942–1958.

H.A. van der Vorst (1992), ’Bi-CGSTAB: A fast and smoothly converging variant of

Bi-CG for the solution of nonsymmetric linear systems’, SIAM J. Sci. Statist.

Comput. 13. 631–644.

Preconditioning 47

H.A. van der Vorst (2003), Iterative Krylov Methods for Large Linear Systems,

Cambridge University Press.

C.F. Van Loan (1992), Computational Frameworks for the Fast Fourier Transform,

Society for Industrial and Applied Mathematics.

P.S. Vassilevski (2008), Multilevel Block Factorization Preconditioners, Matrix-

based Analysis and Algorithms for Solving Finite Element Equations,

Springer.

R. Verfu¨rth (1984), ’A combined conjugate gradient-multigrid algorithm for the

numerical solution of the Stokes problem’, IMA J. Numer. Anal. 4, 441–455.

A.J. Wathen (1986), ’Realistic eigenvalue bounds for the Galerkin mass matrix’,

IMA J. Numer. Anal. 7 449–457.

A.J. Wathen (2007), ’Preconditioning and convergence in the right norm’, Int. J.

Comput. Math. 84, 1199–1209.

A.J. Wathen, B. Fischer and D.J. Silvester (1995), ’The convergence rate of the

minimum residual method for the Stokes problem’, Numer. Math. 71, 121–

134.

A.J. Wathen and T. Rees (2009), ’Chebyshev semi-iteration in preconditioning for

problems including the mass matrix’, Elect. Trans. Numer. Anal. 34, 125–135.

M.P. Wathen (2014), ’Iterative solution of a mixed finite element discretisation of

an incompressible magnetohydrodynamics problem’, MSc dissertation, De-

partment of Computer Science, University of British Columbia.

O. Widlund (1978), ’A Lanczos method for a class of nonsymmetric systems of

linear equations’, SIAM J. Numer. Anal. 15, 801–812.

P.H. Worley (1991), ’Limits on parallelism in the numerical solution of linear partial

di↵erential equations’, SIAM J. Sci. Stat. Comput. 12, 1–35.

J. Xu (1992), ’Iterative methods by space decomposition and subspace correction’,

SIAM Rev. 34, 581–613

G. Ymbert, M. Embree and J. Sifuentes (2015), ’Approximate Murphy-Golub-

Wathen preconditioning for saddle point problems’, In Preparation.

H. Yserentant (1986), ’On the multi-level splitting of finite element spaces’, Numer.

Math. 49, 379–412.

Preconditioning

A. J. Wathen

Mathematical Institute, University of Oxford,

Andrew Wiles Building, Radcli↵e Observatory Quarter,

Woodstock Road, Oxford OX2 6GG, UK

E-mail: wathen@maths.ox.ac.uk

The computational solution of problems can be restricted by the availability

of solution methods for linear(ized) systems of equations. In conjunction with

iterative methods, preconditioning is often the vital component in enabling

the solution of such systems when the dimension is large. We attempt a broad

review of preconditioning methods.

CONTENTS

1 Introduction 1

2 Krylov subspace iterative methods 5

3 Symmetric and positive definite matrices 8

4 Semidefinite matrices 18

5 Symmetric and indefinite matrices 19

6 Nonsymmetric matrices 27

7 Some comments on symmetry 33

8 Some comments on practical computing 35

9 Summary 38

References 39

1. Introduction

This review aims to give a comprehensive view of the current state of pre-

conditioning for general linear(ized) systems of equations. However we start

with a specific example.

2 A. J. Wathen

Consider a symmetric matrix with constant diagonals:

A =

266664

a0 a1 · · an1

a1 a0 a1 · ·

· a1 a0 · ·

· · · · a1

an1 · · a1 a0

377775 2 Rn⇥n.

Such a matrix is called a symmetric Toeplitz matrix. It is clearly defined

by just n real numbers, but what is the best solution method for linear sys-

tems involving A? The fastest known stable factorisation (direct) methods

for solving a linear system Ax = b for x 2 Rn given b 2 Rn require O(n2)

floating point computations—this is still better than the O(n3) computa-

tions required for a general real n ⇥ n matrix using the well-known Gauss

Elimination algorithm.

Strang (1986) observed that one could approximate A by the circulant

matrix

C =

2666666666666666664

a0 a1 . . . abn2 c abn12 c . . . a2 a1

a1 a0 a1 . . . abn2 c abn12 c . . . a2

. . . a1 a0

. . . . . . abn2 c

. . . . . .

abn2 c . . .

. . .

. . .

. . . . . .

. . . abn12 c

abn12 c

. . . . . .

. . .

. . .

. . . . . . abn2 c

. . .

. . . abn2 c . . .

. . . a0 a1 . . .

a2 . . . abn12 c abn2 c . . . a1 a0 a1

a1 a2 . . . abn12 c abn2 c . . . a1 a0

3777777777777777775

2 Rn⇥n

where the outer diagonals of A have been overwritten by the central di-

agonals which are ‘wrapped around’ in a periodic manner. Circulant ma-

trices are diagonalised by a Fourier basis (Van Loan 1992) and it follows

that for any given b 2 Rn, the fast Fourier transform (FFT) (Cooley and

Tukey 1965) enables the computation of the matrix-vector product Cb and

of the solution, y 2 Rn of Cy = b in O(n log n) arithmetic operations. The

importance of Strang’s observation is that C is close to A if the discarded

entries adn2 e, . . . , an are small; precise statements will describe this in terms

of decay in the entries of A along rows/columns moving away from the di-

agonal. With such a property, one can show that not only are the matrices

A and C close entrywise, but that they are spectrally close, i.e. that the

eigenvalues of C1A (equivalently the eigenvalues of the matrix ‘pencil’

A C) are mostly close to 1. As a consequence, an appropriate iterative

method (it would be the conjugate gradient method if A is positive definite:

Preconditioning 3

see below) will compute a sequence of vectors x1, x2, . . . which converge

rapidly from any starting guess, x0 to the solution x of Ax = b. At each

iteration only a matrix-vector product with A and the solution of a single

system with C as well as vector operations need be computed. For any de-

sired accuracy, a fixed number of iterations not dependent on the dimension

n will be required. The outcome is that use of the FFT for solving systems

with C and for doing multiplications of vectors by A (since A can in an ob-

vious way be embedded in a circulant matrix of dimension 2n 1⇥ 2n 1)

guarantees that the iterative method will require only O(n log n) operations

to compute the solution.

Strang’s ‘proposal for Toeplitz matrix calculations’ is thus to employ the

circulant matrix C as a preconditioner for linear systems involving A. One

can naively think of the preconditioning as converting a linear system

Ax = b

which is not so readily solved into the system

C1Ax = C1b

for which there is a fast solution method. This is not exactly what is done:

generally C1A will be nonsymmetric even though A and C are symmetric,

but it is possible to precondition with C whilst preserving symmetry of the

preconditioned system matrix; see Saad (2003, Algorithm 9.1).

This example captures the essence of preconditioning. Not always does

one have matrices which are so obviously structured (nor with potentially

so many non-zero entries), nor does one always have a clever and ecient

tool like the FFT, but, especially for matrices of large dimension, it can

be extremely beneficial to have a good matrix approximation, i.e. a good

preconditioner, when solving linear systems of equations. Indeed, often such

a preconditioner is necessary in order to enable practical computation at all

of large-scale problems within reasonable time on any given computational

platform. The reduction from O(n2) to O(n log n) in the above example

brings some large-dimensional Toeplitz systems into the range of practical

computation.

One does not need to necessarily have a preconditioner as a matrix; a

linear operation (i.e. a computational procedure which applies a linear op-

eration to a vector) is what is required.

This article is a personal perspective on the subject of preconditioning.

It addresses preconditioning only in the most common context of the so-

lution of linear systems of equations. Much research on this problem has

been directed at sparse matrices coming from the discretization of partial

di↵erential equations (PDEs) by finite element, finite volume or finite di↵er-

ence methods. This is certainly one major source for such problems, but far

from the only area where large dimensional linear systems arise. It is true,

4 A. J. Wathen

however, that underlying properties of a PDE problem are often reflected in

the matrices which arise from discretization, so that matrices arising from

PDEs can inherit structural properties which allow them to not simply be

considered as arrays of numbers. Indeed there are many PDE analysts who

consider PDE matrices as simply members of families of discrete operators

with many of the properties of the underlying continuous operators from

which they are derived through discretization, see for example Mardal and

Winther (2011), or the recent book by Ma´lek and Strakosˇ (2014). This can

be a very valuable viewpoint. Indeed from this perspective, not applying any

preconditioning is seen to coincide with rather an unnatural choice of map-

pings on the underlying spaces (Gu¨nnel et al. 2014). Sometimes one comes

across the perception that multigrid must be the most ecient method to

solve any PDE problem: this is far from the truth, however multigrid meth-

ods and ideas remain very important both as solvers for certain simpler

problems, and, in particular, as components of preconditioners for a wider

class of problems as we shall describe.

Whether or not a matrix system arises from PDEs, knowledge of the

source problem can be helpful in designing a good preconditioner. Purely

algebraic approaches which simply take the numerical matrix entries as

input can have merit in situations where little is known about the underlying

problem—they certainly have the useful property that it is usually possible

to apply such a method—but the generated preconditioners can be poor.

This division between approaches to preconditioning essentially comes down

to whether one is ‘given a problem’ or ‘given a matrix’ (Neytcheva 1999).

Though we emphasise much about the former, we also give pointers to and

brief description of the latter.

Preconditioners are overwhelmingly used together with Krylov subspace

iterative methods (van der Vorst 2003, Greenbaum 1997, Saad 2003, Meurant

1999, Hackbusch 1994, Elman et al. 2014b, Liesen and Strakosˇ 2013, Olshan-

skii and Tyrtyshnikov 2014), and the applicable iterative methods make

di↵erent requirements of a preconditioner: for example, one could employ

a nonsymmetric preconditioner for a symmetric matrix, but in general one

would then lose the possibility to use the very e↵ective conjugate gradient or

minres Krylov subspace iterative methods. Therefore, we have structured

this paper into sections based on the symmetry and definiteness proper-

ties of the presented coecient matrix. In essence, this subdivision can

be thought of also as in terms of the Krylov subspace method of choice.

The coecient matrix will usually be denoted by A as in the example above

(though di↵erent fonts will be used). For any invertible preconditioning ma-

trix or linear operator, P , there is an iterative method that can be used, but

it is usually desirable not to ignore structural properties such as symmetry

when choosing/deriving a preconditioner.

We have generally restricted consideration to real square matrices, A, but

Preconditioning 5

it will be evident that some of the di↵erent preconditioning ideas could be

applied to matrices over di↵erent algebraic fields, in particular the complex

numbers (for which ‘symmetric’ ! ‘Hermitian’ etc.). It is also true that

some of the ideas presented apply to rectangular matrices such as arise in

least squares problems, but, except tangentially, this paper does not consider

such situations.

Our general strategy/intention is to describe most preconditioning ap-

proaches that we are aware of, and to guide the reader towards selecting an

appropriate strategy for any particular problem. Inevitably, some precon-

ditioning ideas have arisen in the specific context of particular problems, so

brief review of such problems is included.

Over ten years ago, Benzi (2002) wrote an excellent survey of precon-

ditioning techniques which contains much wisdom and remains an excel-

lent source today, in particular with regard to incomplete factorization and

sparse approximate inverse methods. The text by Chen (2005) describes

several approaches and includes many example applications.

In section 3 preconditioning approaches for symmetric and positive defi-

nite matrices are discussed. There follows in section 4 some brief comments

on symmetric semidefinite matrices. This is followed in section 5 by precon-

ditioning for symmetric and indefinite matrices and finally for nonsymmetric

matrices in section 6. We start, however, with a brief review of on iterative

methods in order to orient ourselves.

2. Krylov subspace iterative methods

For any vector r 2 Rn, if it is easy to compute the matrix-vector product

Ar, then it is easy to generate vectors in the Krylov subspaces

Kk(A, r) = span{r, Ar,A2r, . . . , Ak1r}, k = 1, 2, . . .

by forming A(Ar) etc.. In particular, when A is sparse (i.e. has few non-zero

entries in any row), then such matrix-vector products are readily computed.

For general dense matrices, a matrix-vector multiplication takes O(n2) op-

erations, hence there is less possibility for preconditioning to reduce this

complexity unless some other structure can be used such as in the Toeplitz

example above.

Krylov subspace iterative methods for finding x 2 Rn which satisfies

Ax = b compute iterates xk for which

xk x0 2 Kk(A, r0), k = 1, 2, . . . (2.1)

from some initial guess x0 and thus require just one matrix-vector product

computation at each iteration. The vector r0 = b Ax0 is the residual

associated with x0; in general rj = bAxj , j = 0, 1, 2, . . .. If x0 = 0 then

xk 2 Kk(A, b), k = 1, 2, . . . .

6 A. J. Wathen

Thus the iterates and residuals of every Krylov subspace method satisfy

xk x0 =

k1X

j=0

↵jA

jr0

for some coecients ↵j , j = 0, 1, 2, . . . , k 1. Hence

xk = x0 + q(A)r0 (2.2)

where q is the polynomial of degree k 1 with q(z) =Pk1j=0 ↵jzj . Premul-

tiplying (2.2) by A and subtracting from b we obtain

bAxk = bAx0 Aq(A)r0

thus

rk = r0 Aq(A)r0 = p(A)r0 (2.3)

where

p(z) = 1 z

k1X

j=0

↵jz

j = 1

kX

j=1

↵j1zj

is a polynomial of degree k which satisfies p(0) = 1.

All Krylov subspace methods are thus described by (2.3) and each dif-

ferent Krylov subspace method is characterised by an optimality or (bi-)or-

thogonality property.

When A is symmetric and positive definite (so that kvkA = (vtAv)

1

2 de-

fines a vector norm), the conjugate gradient method (cg) (Hestenes and

Stiefel 1952), (Saad 2003, algorithm 6.18) requires only the one matrix-

vector product with A as well as five simple vector operations at each itera-

tion to compute iterates satisfying (2.1) which minimise kxxkkA over the

Krylov subspace.

When A is symmetric but indefinite, rTAr takes both positive and nega-

tive values so that a norm can not be defined as for the conjugate gradient

method. The Krylov subspace method of choice for symmetric indefinite sys-

tems, the minimum residual (minres) method (Paige and Saunders 1975),

(Fischer 2011, page 187) takes one matrix-vector product with A as well

as seven simple vector operations at each iteration to compute iterates sat-

isfying (2.1) which minimise the Euclidean norm of the residual, krkkI =

(rTk rk)

1

2 .

When A is nonsymmetric, there is not so obvious a method of choice,

hence several Krylov subspace methods are widely used. The most popular

is gmres (Saad and Schultz 1986), (Saad 2003, algorithm 6.9) which, sim-

ilarly to minres, computes iterates which minimise the Euclidean norm of

the residual, but by contrast to minres requires an increasing ammount

Preconditioning 7

of computation and storage at each successive iteration to achieve this.

Thus gmres can be a good method if only few iterations are needed

to achieve acceptable convergence—this might be the case if one has a

good preconditioner—but it is not practical if many iterations are required.

Restarting gmres at (regular) intervals reduces the computational burden,

but also in most cases severely slows convergence. Thus use is made of

Krylov subspace methods for nonsymmetric A which require only a fixed

ammount of work at each iteration, but which necessarily therefore have

more erratic convergence (Faber and Manteu↵el 1984). These are gener-

ally based on the nonsymmetric Lanczos method: we mention bicgstab

(van der Vorst 1992), bicgstab(`) (Sleijpen and Fokkema 1993), qmr

(Freund and Nachtigal 1991), idr(s) (Sonneveld and van Gijzen 2008),

though there are many others. There are also several variants, see Simoncini

and Szyld (2007).

For any nonsingular nonsymmetric matrix, it is always possible to attack

the symmetric and positive definite system of normal equations

ATAx = AT b, (2.4)

(or AAT y = b, x = AT y) and in some instances this has been proposed;

see for example Laird and Giles (2002). A related approach is to solve a

symmetric indefinite system in a form such as

I A

AT 0

r

x

=

b

0

. (2.5)

However, whichever form is employed, convergence can be rather slow; AT

is not generally such a good preconditioner for A! To give an idea of the

convergence that might be expected: if A was already symmetric and posi-

tive definite, then generally the number of cg iterations required for (2.4)

would be approximately the square of the number of cg iterations for the

system Ax = b. There are situations in which this may not be such a di-

culty, however, it is generally harder to find good preconditioners for ATA

(or AAT ) than for A and this may be the main reason that more attention

and usage is made of the nonsymmetric Krylov subspace iterative meth-

ods above. An important and underappreciated general observation in this

regard is due to Braess and Peisker (1986):

Remark 1. P TP can be an arbitrarily bad preconditioner for ATA re-

gardless of the quality of P as a preconditioner for A.

The above remark applies even if A, P are symmetric as is shown by the

elementary example in Braess and Peisker (1986). We will see in section 5

situations where one can build a particular preconditioner P for A so that

P TP is also a good preconditioner for ATA, indeed the main result in Braess

and Peisker (1986) provides sucient conditions for this to occur.

8 A. J. Wathen

Now, cg for (2.4) will generate a Krylov space of the form K(ATA,AT b)

for example, when x0 = 0. Perhaps the most suitable implementation alge-

braically equivalent to a symmetric Krylov subspace method for the normal

equations is the lsqr algorithm of Paige and Saunders (1982) which also

computes least squares solutions for rectangular A. An advantage of these

methods is that the convergence theory of cg and minres apply.

The reason that there are methods of choice (cg , minres) for symmetric

matrices, but a whole range of possibilities for nonsymmetric matrices arises

because of convergence guarantees. In the case of symmetric matrices there

are descriptive convergence bounds which depend only on the eigenvalues

of the coecient matrix, thus the number of iterations which will be needed

for convergence to a given tolerance can be estimated and bounded a priori;

a good preconditioner will ensure fast convergence. These comments apply

also for normal equation approaches for nonsymmetric matrices.

By contrast, to date there are no generally applicable and descriptive

convergence bounds even for gmres; for any of the nonsymmetric methods

without a minimisation property, convergence theory is extremely limited.

The convergence of gmres has attracted a lot of attention, but the absence

of any reliable way to guarantee (or bound) the number of iterations needed

to achieve convergence to any given tolerance in most situations means that

there is currently no good a priori way to identify what are the desired

qualities of a preconditioner. This is a major theoretical diculty, but

heuristic ideas abound.

All Krylov subspace methods are to an extent a↵ected by rounding errors

in a computer if run for a large enough number of iterations, see Liesen and

Strakosˇ (2013) or the discussion in Greenbaum (1997, Chapter 4). In this

paper we ignore any such e↵ects since an important consequence of good

preconditoning is that few iterations should be necessary for acceptable

convergence, thus round-o↵ e↵ects are not usually able to build up and

become a significant issue. Similarly, there is theory which describes the

eventual (i.e. asymptotic) convergence of Krylov subspace methods such as

cg, but this is also not generally relevant to preconditioned iterations—it is

rapid error or residual reduction in early iterations that is typically desired.

3. Symmetric and positive definite matrices

The standard cg convergence bound for iterative solution of Ax = b when

A is symmetric and positive definite comes straight from (2.3) and the min-

imisation condition described above and is

kx xkkA

kx x0kA minpk2⇧k,pk(0)=1 max2(A) |pk()|,

where kyk2A = yTAy, ⇧k is the set of real polynomials of degree less or

equal to k and (A) is the eigenvalue spectrum of A (Greenbaum 1997,

Preconditioning 9

section 3.1). One immediately deduces that if A has a small number of

distinct eigenvalues, say s, then cg will terminate at the sth iteration with

the correct solution since there is a degree s polynomial which has these

eigenvalues as its roots. Iteration error can already be acceptably small

after few iterations when the eigenvalues are clustered. In particular if one

considers all eigenvalues to lie in a single interval Amin Amax then use

of Chebyshev polynomials yields the convergence bound

kx xkkA

kx x0kA 2

✓p

1p

+ 1

◆k

, (3.1)

where = A = Amax/

A

min is the Euclidean norm (or 2-norm) condition

number for such a symmetric positive definite matrix; see again Greenbaum

(1997, section 3.1). An equivalent statement is that it takes approximately

1

2

p

| log ✏/2| iterations to ensure that kx xkkA/kx x0kA ✏.

The origin of the name preconditioning is generally now associated with

this convergence bound in the sense that reduction of the condition num-

ber, , guarantees faster convergence (though for historical accuracy, see

Benzi (2002)); what we now understand as preconditioning is, however, a

much broader concept. If P is also symmetric and positive definite and the

extreme generalised eigenvalues of the ‘pencil’ Av = Pv (equivalently the

extreme eigenvalues of P1A) satisfy

A =

Amax

Amin

P1A

max

P

1A

min

= P

1A,

then cg convergence for the preconditioned system is guaranteed in many

fewer iterations than for the original unpreconditioned system since (3.1)

holds now with = P

1A. Note that for cg even with preconditioning, it

remains the relative magnitude of the error in k·kA which arises in the bound

(3.1); a symmetric positive definite preconditioner a↵ects the eigenvalues

and thus the speed of convergence, but not the norm in which convergence

occurs. Use of a nonsymmetric preconditioner, P , would prevent use of cg;

a nonsymmetric Krylov subspace method would in that case have to be used

in general (but see Section 7).

To establish the quality of P as a preconditioner for A, it is sucient

to find (and a priori prove if possible!) upper and lower bounds on the

eigenvalues of the form

` P1Amin , P

1A

max ⌥ (3.2)

or equivalently—and usually more tractably—upper and lower bounds on

the generalised Rayleigh quotient

` u

TAu

uTPu

⌥, for all u 2 Rn {0}. (3.3)

10 A. J. Wathen

In either case P

1A ⌥/` leads directly to a convergence bound through

(3.1) since the right hand side varies monotonically with .

One does not actually replace A with P1A in the cg algorithm to imple-

ment preconditioning in the case of symmetric A since this would generally

destroy symmetry, but even when symmetry is preserved (as in Saad (2003,

Algorithm 9.1)), the action of P1 is required on a di↵erent vector at each

iteration.

One thus has clear sucient criteria for identifying a good preconditioner,

P , in this symmetric positive definite case:

• it should be easy to solve systems of equations with the preconditioner

P as coecient matrix, that is, the action of P1 on a vector should

be readily computed

• P1A should be small.

These are competing aims; P = I satisfies the first but not the second unless

A is already well conditioned and P = A satisfies the second but not the

first unless solution of systems with A is already easy!

3.1. Diagonal scaling

The earliest preconditioning ideas came from consideration of the first of

these criterion and are largely in the ‘given a matrix’ category. Taking P

to be a diagonal matrix is very simple; if in fact P = diag(A) then this is

widely called Jacobi preconditioning. For a well conditioned matrix which is

just badly scaled, this is generally all that is needed, but in general diagonal

preconditioning is likely to achieve very little in terms of reducing iterations

or computation time except in specific situations as below. There are two

related types of problem where Jacobi preconditioning (or just diagonal

scaling) is important.

Firstly, the ‘identity’ matrix for finite element computation is the mass

matrix, Q, which is the Gram matrix of the finite element basis functions

{i, i = 1, . . . , n}, i.e. it has entries

Q = {qi,j , i, j = 1, . . . , n}, qi,j =

Z

⌦

⇢i(x)j(x) (3.4)

for a domain ⌦. Here the potentially variable coecient ⇢ might, for exam-

ple, represent material density in a structure, and if this is highly variable

or if the discretization has a large variation in the size (length, area, vol-

ume) of the elements—as is inevitable on an irregular or adapted mesh—

then this variability is also represented in the diagonal entries of Q. Thus

P = diag(Q) is provably a good preconditioner in this case (Wathen 1986).

For example the gallery(’wathen’,n,n) matrix in matlab is a finite el-

ement mass matrix with randomly variable density for a so-called seredipity

Preconditioning 11

finite element. The precise eigenvalue bounds of the form (3.2) or (3.3), to-

gether with tabulated values for ` and ⌥ for many types of finite elements,

are derived in Wathen (1986).

We briefly comment that in problems as above with analytic a priori

eigenvalue bounds ` and ⌥ which are (reasonably) tight, the possibility also

arises to use a fixed but small number of Chebyshev (semi-)iterations as a

(linear) preconditioner—see Wathen and Rees (2009).

The second important situation arises also through highly variable coef-

ficients; the classic example arises from the operator

r · (Kru)

where the permeability K has large jumps. For continuous finite element

approximation of such terms, Graham and Hagger (1999) were the first to

identify the utility of diagonal scaling to render the preconditioned prob-

lem more homogeneous. For such an elliptic PDE, Jacobi preconditioning

is highly unlikely to be an adequate preconditioner on its own, but when

used together with an appropriate preconditioner for a homogeneous elliptic

PDE operator—see below—the combination can be much better than just

the elliptic preconditioner by itself. In the literature there are several de-

velopments of the idea of Graham and Hagger, including for non-isotropic

permeabilities.

3.2. Incomplete Cholesky factorization

It is well known that Cholesky factorization (LLT or LDLT factorization

where L is lower triangular and D is diagonal) of a symmetric and positive

definite matrix leads immediately via forward and backward substitution to

direct solution of a matrix system. For sparse matrices, unfortunately, the

factor L is usually much less sparse (i.e. has many more non-zeros) than

A. Such fill-in is usually unavoidable, though widely used sparse direct

methods use sophisticated algorithms in attempting to minimize it (HSL

2013, Davis 2004, Amestoy et al. 2000, Li 2005). In the context of PDE

problems, the essential rule is that a sparse direct method can be a good

solver for a problem on a 2-dimensional spatial domain, but is generally

computationally infeasible for a 3-dimensional problem. If one preselects

(or dynamically selects) the sparsity pattern of L and drops terms which

would fill outside of this pattern, then one can form an incomplete Cholesky

factorization

A = LLT + E

where E represents the error due to the dropped terms. One can additionally

threshold so that small entries are not kept in the incomplete factors. With

P = LLT , it is clear that forward and backwards substitution still enable

12 A. J. Wathen

easy solution of systems with the preconditioner, and the main remaining

question is whether the preconditioner leads to fast convergence of cg.

In general many di↵erent variants of this basic idea exist and software for

various incomplete factorization-based algorithms is widely available. This

is certainly a plus point, but unless A is diagonally dominant, incomplete

Cholesky algorithms do not always work well, indeed they do not always

exist without modification, see Meijerink and van der Vorst (1977), and

see Benzi (2002) for discussion of this as well as many other aspects of

incomplete factorization preconditioning. With regard to robustness, there

are certain advantages to computing an incomplete triangular factorization

not using Cholesky but via orthogonalization in the inner product generated

by A; see Benzi and Tuma (2003). This approach is called RIF (robust

incomplete factorization) and shares some similarity with the AINV sparse

approximate inverse discussed below.

Let us make some of these statements precise in the context of solving

a simple discretized elliptic boundary value problem, namely the common

5-point finite di↵erence approximation of the Dirichlet problem for the (neg-

ative) Laplacian

r2u = f in ⌦, u given on @⌦

which arises from second order central di↵erencing on a regular grid with

spacing h in each of the two spatial dimensions. The choice of sign ensures

that the operator and thus the arising matrix are positive rather than neg-

ative definite. If the domain, ⌦, is the unit square (so that the boundary

@⌦ is four coordinate-aligned line segments) then the arising matrix A is

of dimension N ⇡ h2 for small h and also A = O(h2), see Elman et al.

(2014b, section 1.6). Thus (3.1) predicts that O(h1) cg iterations would

be required for convergence of unpreconditioned cg. The simplest incom-

plete Cholesky preconditioner for which L has the same sparsity pattern

as the lower trianglular part of A also leads to P

1A = O(h2) though

there is significant clustering of a large number of the eigenvalues around 1;

the bound (3.1) based only on the extreme eigenvalues is pessimistic in this

situation. A variant, modified incomplete factorization (due to Gustafsson

(1978) but related to rather earlier work by Dupont et al. (1968)) reduces

this to P

1A = O(h1), and generally leads to slightly faster cg conver-

gence. Applying (3.1) in this case predicts that O(h1/2) preconditioned

cg iterations would be required for convergence, and this is close to what

is seen in practical computation. This implies that for a sequence of such

problems with smaller and smaller h, it will take an increasing number of

iterations to achieve acceptable convergence as h gets smaller (and so the

matrix dimension gets larger) – a rather undesirable feature.

Fortunately, there are much better preconditioners for this and related

elliptic PDE problems which guarantee P

1A = O(1), so that the number of

Preconditioning 13

required cg iterations is essentially constant for larger dimensional discrete

problems coming from approximation on finer grids. Such preconditioning

methods are said to have P spectrally equivalent to A. The most important

class of such methods arise from multigrid ideas which we consider next.

Of course, even in the more general context of problems not coming from

PDEs, incomplete Cholesky and its variants can be of considerable utility; in

particular, algebraic preconditioners of this form can very often be applied.

Before moving on, we comment that preconditioning techniques have been

introduced that turn out to be related to incomplete factorization. A par-

ticular example is the ‘support graph’ preconditioner introduced originally

in a conference talk by Vaidya who wrote commercial software but did not

publish on the subject. The support graph idea comes out of graph-theoretic

notions and has a somewhat discrete mathematical flavour. Later analysis

both clarified the scope of the ideas—they are now essentially extended to

cover most symmetric and diagonally dominant matrices—and made con-

nections with incomplete factorization: see Bern et al. (2006), Spielman and

Teng (2014) and references therein.

3.3. Multigrid

Since the groundbreaking paper by Brandt (1977), multigrid methods have

been at the core of scientific computing. Earlier contributions, notably by

Fedorenko (1964) and Bakhvalov (1966), are now considered to have in-

troduced the key ideas of combining a simple smoothing iteration (such

as Jacobi or Gauss-Seidel iteration) with coarse grid correction on succes-

sively coarser grids for the finite di↵erence Laplacian, but Brandt’s vision

was much broader: he showed just how fast multigrid solvers could be and

introduced many ideas to keep reseachers busy for decades! For a posi-

tive definite elliptic PDE operator, it has been clear for a long time that

an appropriate multigrid method will provide an optimal solver, that is, a

solver such that the work to compute the solution scales linearly with the

dimension of the discretized problem.

Classically, multigrid is a stationary iteration (albeit a non-trivial one) for

solving Ax = b when A is a discrete Laplacian matrix (arising from finite

di↵erence or finite elements). For V- or W-cycles, iterates satisfy single

cycle contraction of the form

kx xkkA ⌘ kx xk1kA (3.5)

where the contraction factor, ⌘ is typically around 0.1 (see, for example

Elman et al. (2014b, section 2.5)). It follows that the linear multigrid op-

erator (represented by a complicated and fairly dense matrix which one

would never want to use in computations) is an excellent candidate for a

14 A. J. Wathen

preconditioner since from (3.5) it readily follows that

1 ⌘ x

TAx

xTPx

1 + ⌘,

see Elman et al. (2014b, Lemma 4.2). That is, applying one multigrid V-

cycle, say, is equivalent to solution of a preconditioner system with pre-

conditioner P which guarantees rapid cg convergence because P

1A

(1 + ⌘)/(1 ⌘). There is little point in using multigrid cycles as a precon-

ditioner for cg in this way for simple problems; one might as well simply

use multigrid iteration to solve, but for more complicated problems this ob-

servation that a single multigrid cycle is an ecient spectrally equivalent

approximation of an elliptic PDE operator like the Laplacian is very useful;

see section 5. Note also the important point that arises here for the first

time in this review: a preconditioner can be a linear operation rather than

specifically a known matrix. Thus {3 V-cycles}, for example, is a suitable

preconditioner. On the other hand, iterating multigrid (or any other itera-

tion) to some residual tolerance will likely ensure that di↵erent numbers of

V-cycles are used at di↵erent Krylov subspace iterations and so the precon-

ditioner is not a fixed linear operator and the above convergence analysis

can not apply.

Multigrid – and multilevel methodology more generally – has now become

a huge subject (see for example Trottenberg et al. (2000)), with specialized

techniques for the treatment of several specific features which might arise in

a PDE (or other) problem. In the past couple of decades, algebraic multigrid

methods (AMG), which try to mimic a geometric multigrid approach but

use only information from the entries of a matrix rather than geometric

information regarding grids, boundary conditions, etc. have attracted much

attention and there is now widely available software: e.g. HSL MI20 (Boyle

et al. 2010), ML (Gee et al. 2006), BoomerAMG (Henson and Yang 2000). In

particular, in situations where a problem is fundamentally of the character

of an elliptic di↵erential operator but there are complicating features, an

AMG cycle can be an excellent preconditioner. This can be true even if

AMG is non-convergent since it may contract on all but a few modes; the

few corresponding outlier eigenvalues are easily dealt with by cg. It can

simply be much more convenient to employ an AMG cycle as (part of) a

preconditioner for a Krylov subspace method than to have to ‘hard code’

bespoke techniques for each di↵erent variant problem that comes up. So

common is this combined use of AMG and a Krylov subspace method that

it is rarely even mentioned that AMG is used as a preconditioner, but this

is almost always the case in practice.

We should mention that multigrid ideas have been tried in other contexts

than just for PDE problems; see for example Giles (2008), Livne and Brandt

(2012).

Preconditioning 15

It also needs mention in this section that a multigrid V- or W-cycle can

usually be quite easily made into a symmetric linear operator—see Elman

et al. (2014b, page 98)—though the so-called full multigrid cycle, for exam-

ple, can not. Positive definiteness is not usually an issue for an underlying

positive definite problem.

3.4. Domain Decomposition

For a very large dimensional problem, it might be a fairly natural idea to

split into separate subproblems and solve each separately (and usually in

parallel) to give an approximation to the solution of the overall problem.

In the original context of solving an elliptic boundary value problem on a

domain ⌦, this is the basic idea of domain decomposition: the subproblems

being problems expressed on subdomains ⌦k, k = 1, . . . , N with [⌦k =

⌦. In its simplest form, it correponds to a block diagonal preconditioner

with each separate subproblem represented by a single diagonal block which

may or may not be overlapping, thats is, which may or may not share

some variables with other subproblems. An important issue is how to treat

the variables either in the overlap regions or on the introduced boundaries

between subdomains. Domain decomposition was first employed abstractly

by Hermann Schwarz in the nineteenth century to prove an existence result

for an elliptic PDE, hence methods of this type are often called Schwarz

methods.

Like multigrid, domain decomposition is a huge subject which we have

no intention of covering with any depth here; see for example the books

(Quarteroni and Valli 1999), (Smith et al. 1996), (Toselli and Widlund

2004), (Mathew 2008) or the earlier survey (Chan and Mathew 1994). What

is clear, however, even from this brief description is that domain decompo-

sition gives rise to preconditioners which are well suited to parallel compu-

tation.

To give a brief idea of what can be achieved on a simple problem, we con-

sider the Poisson equation on a domain ⌦ ⇢ Rd approximated by standard

(conforming) finite elements or finite di↵erences with a mesh spacing h. If

⌦ is split into non-overlapping subdomains of width H, then there are basic

block diagonal preconditioners without coarse grid solve for which one can

establish that

P

1A CH2(1 + log2(H/h)).

The more sophisticated domain decomposition schemes which include also

a coarse grid (or ‘wire basket’ solve) can achieve

P

1A C(1 + log2(H/h)).

In either case, C is a generic constant not dependent on H,h and the bounds

translate into convergence estimates via (3.1). The size H can be considered

16 A. J. Wathen

to be the coarse grid size in essence; the bounds above assume exact solution

of the problems on subdomains.

With the importance of parallel and distributed computing, the domain

decomposition concept remains very important and much is known for more

advanced methods and; see the regular Domain Decomposition conference

proceeding available at ddm.org (2014).

3.5. Sparse approximate inverses

For any matrix norm, minimisation of

kI ARk

over all matrices R 2 Rn⇥n will clearly yield R = A1. Minimization with

restrictions on the entries where R may have nonzero values leads to sparse

approximate inverses. That is

R = argminR2SR kI ARk,

where SR describes the subset of entries of R which are allowed to be non-

zero, is a sparse approximate inverse for A. To be consistent with the

nomenclature in this article, P = R1 is the preconditioner here; solution

of systems with P is, of course, simply multiplication by R in this situation.

SR is strictly a subspace of matrices with restricted sparsity pattern.

The Frobenius norm

kBk2F =

X

i=1,...,n

X

j=1,...,n

b2i,j

is by far the most widely considered since the minimisation problem then

reduces to least squares problems for the columns of R separately (thus

computable in parallel). Further, these least squares problems are all of

small dimension when the specification of SR ensures that R is a sparse

matrix. We again refer to Benzi (2002) for a thorough description.

As posed in this simple form, the sparse approximate inverse, R is not

generally symmetric when A is symmetric, however we introduce this type

of preconditioner in this section because symmetry can be enforced by com-

puting R in factored form R = LLT where L is sparse and lower triangular;

see Kolotilina and Yeremin (1993). Symmetry of a sparse approximate in-

verse can also be achieved by the AINV algorithm of Benzi et al. (1996)

and the more robust (stabilized) SAINV variant (Benzi et al. 2000) which

is guaranteed not to break down if A is positive definite. See again Benzi

(2002).

The main issue in the construction of any sparse approximate inverse is

the choice of the sparsity pattern, SR. The simplest idea is to use the spar-

sity of A, whilst there have been several attempts to dynamically define SR,

Preconditioning 17

most notably in the SPAI algorithm of Grote and Huckle (1997). An exten-

sion which can be used in a number of di↵erent scenarios is to use a target

matrix, T , and hence minimise kT ARkF ; see Holland et al. (2005). A

further comment is that sparse approximate inverse algorithms are clearly

quite widely applicable, but generally do not give spectrally equivalent pre-

conditioners for elliptic PDEs. They do however make excellent smoothers

for use with multigrid (Tang and Wan 2000, Bro¨ker et al. 2001).

3.6. Hierarchical representations

For symmetric and positive definite matrices arising via low order finite

element approximation from elliptic PDEs, Yserentant (1986) introduced

nearly three decades ago the hierarchical basis preconditioner. This uses

the idea that the coecient matrix would have a much smaller condition

number if a di↵erent (but less computationally convenient) basis for the

approximation space was employed. By using a hierarchical basis—which

would give rise to a fairly dense matrix—both this good conditioning and

a rapid change of basis matrix based on a simple tree structure allow the

sparse matrix arising from the usual locally defined finite element basis

functions to be e↵ectively preconditioned.

For the discrete Laplacian on a two-dimensional domain, Yserentant’s

hierarchical preconditoner reduces = O(h2) to = O(log h1) for the

discrete Laplacian (see Yserentant (1986) or Elman and Zhang (1995) for a

more linear algebraic proof). Unfortunately, for three-dimensional domains

only = O(h1) is possible with the straightforward use of the hierarchical

basis (Ong 1997), though an improved variant of this idea due to Bramble

et al. (1990) can even yield = O(1) in three-dimensions as well. These

ideas, originally introduced as based on the multilevel splitting of finite

element spaces, were the precursor to a whole abstract theory based on such

splittings; the review article by Xu (1992) remains a largely comprehensive

description of this abstract theory which covers many approaches including

multigrid and domain decomposition discussed above for symmetric and

positive definite matrices arising from elliptic PDEs.

A di↵erent hierarchical structure can also be of utility. We wish to men-

tion some ideas and the associated computational technology which en-

ables certain dense matrices—such as the inverse matrices of discrete pos-

itive definite elliptic PDE operators—to be represented to machine accu-

racy in a highly ecient format. There are a number of related but dif-

fering approaches ( Chandrasekaran et al. (2005), Greengard and Rokhlin

(1987), Hackbusch (1999), Grasedyck and Hackbusch (2003), Hackbusch

et al. (2000), Bebendorf (2008)), but the essential observation can be thought

of in terms of the Green’s function for an elliptic operator: nearby entries

need accurate representation, but variables distant from a given entry can

18 A. J. Wathen

be treated together. This notion is most directly represented in the H-

matrix representation of Hackbusch: o↵-diagonal blocks are represented as

low rank blocks using outer products of vectors or matrices with very few

columns. By employing a blocking hierarchy, the computational complexity

of storing (and also multiplying with) such matrix approximations can be

as low as O(n log n).

We mention these ideas here since such approximate representations of

dense matrices have been suggested as preconditioners rather in the same

vein as sparse approximate inverses. They also can be utilised in other more

demanding situations as we describe later.

4. Semidefinite matrices

It might seem rather odd to be interested in preconditioners for singular

matrices; one might believe that only well-posed problems were worthy of

attention. However, there are di↵erent situations in which approximations

to certain singular operators/matrices are valuable and we mention some

ideas in this brief section.

The simplest PDE probem which gives rise to a singular matrix is proba-

bly the Neumann problem for a simple operator like the Laplacian; specify-

ing only derivative boundary conditions for a potential-like variable leaves

a 1-dimensional kernel comprising the constants as is well known. Such

a small dimensional null space generally causes little significant diculty

in terms of iterative methods or preconditioning and methods for definite

matrices generally apply: see (Elman et al. 2014b, section 2.3). Similarly,

graph analysis and Markov chain problems respectively lead to symmetric

and nonsymmetric singular systems with simple, low-dimensional kernels;

again, methods for definite matrices should generally be applied without

explicit ‘fixes’ to ensure nonsingularity for such problems. For problems

arising in contexts other than PDEs, singularity tends to imply that a de-

scription has not been completely specified.

The issue of semidefiniteness is certainly serious, however, for equations

involving the curl or div operators applied to vector fields as arise, for exam-

ple, in the Maxwell equations describing electromagnetics. In this context,

the operator curlcurl has all gradients of scalar fields in its kernel, thus

under most discretizations one would expect that the matrices arising will

be semidefinite with a kernel of some significant dimension. Similarly, the

curl of any smooth enough vector field is in the kernel of the div opera-

tor. On the relevant spaces of functions, the curlcurl operator is however

positive definite and self-adjoint (symmetric), and it seems that it may be

possibile to use convenient preconditioners like standard AMG implemen-

tations if appropriate account is taken of the kernel. This is the purpose of

‘Auxiliary Space’ preconditioning which aims to take account of the natural

Preconditioning 19

decompositions of function spaces that arise. The ideas have evolved over

some years; see Hiptmair and Xu (2007) and references therein. The main

practical issue is representing mapping operators between the various spaces

in the discrete setting.

5. Symmetric and indefinite matrices

For real symmetric matrices which do not have to be definite, the Krylov

subspace method of choice is the minres algorithm due to Paige and Saun-

ders (1975). In the same paper they introduced the alternative but less used

symmlq method. As its name implies, minres minimises the norm of the

residual vector for xk in the k-dimensional Krylov subspace at each itera-

tion. Thus since the symmetry of A ensures that the eigenvalues are real and

the eigenvectors are orthogonal, the convergence of minres is described by

krkkI

kr0kI minpk2⇧k,pk(0)=1 max2(A) |pk()|, (5.1)

where the notation is as used before in the positive definite case. The

di↵erence from that case is not only that the residual rather than the error

is minimised (it is really a di↵erence of norm since rk = A(x xk) implies

that krkkI = kx xkkA2), but the spectrum, (A), now contains both

positive and negative eigenvalues.

As in the symmetric and positive definite case, (5.1) immediately indicates

that termination of the iteration will occur with the solution at the sth iter-

ation if A has just s distinct eigenvalues. Correspondingly, the eigenvalues

lying in a small number of clusters should also lead to rapid convergence

if none of the clusters is too far from, nor too close to the origin. If the

eigenvalues of A are all contained in two intervals

[a,b] [ [c, d],

where a, b, c, d are all positive, then it is less straightforward in general

than in the positive definite case to derive a simple convergence bound (see

Wathen et al. (1995)). One situation where a simple expression is possible,

however, is when d c = a b; the minres residuals then satisfy the

convergence bound

kr2kkI

kr0kI 2

✓p

1p

+ 1

◆k

, (5.2)

where = ad/(bc) (see (Elman et al. 2014b, section 4.2.4.)). Note that

here plays a similar role to max/min in the positive definite case, though it

is not the matrix condition number here; since the Euclidean norm condition

number, kAkkA1k is max{a, d}/min{b, c} in this situation, can be as

large as the square of this condition number when a = d, b = c, but rather

20 A. J. Wathen

less when the two intervals are not symmetrically placed about the origin.

When a = c, for example, = d/b is again exactly the Euclidean norm

condition number.

When A is symmetric and indefinite, any preconditioner for minresmust

be symmetric and positive definite. This is necessary since otherwise there is

no equivalent symmetric system for the preconditioned matrix. Thus a non-

symmetric iterative method must be used even if a symmetric and indefinite

preconditioner is employed for a symmetric and indefinite matrix in general.

An indefinite preconditioner is therefore not usually desirable, though there

are exceptions: see below. A preconditioner for a symmetric indefinite

matrix A for use with minres therefore can not be an approximation of

the inverse of A, since this is also indefinite. This simple observation rules

out several approaches described above in the positive definite case.

With a symmetric and positive definite preconditioner, the preconditioned

minres convergence bounds as above become

krkkP1

kr0kP1

min

pk2⇧k,pk(0)=1

max

2(P1A)

|pk()|

min

pk2⇧k,pk(0)=1

max

2[a,b][[c,d]

|pk()| (5.3)

when [a,b] [ [c, d] is now an inclusion set for all of the eigenvalues of

P1A rather than A. A further aspect introduced by preconditioning for

minres which does not arise for cg in the positive definite case is now

apparent: the norm in which convergence naturally occurs is a↵ected by

the preconditioner; it is not just the eigenvalue distribution and hence the

speed of convergence which is a↵ected.

The only reason we bring up this issue is because one now has the undesir-

able possibility that a preconditioner which apparently gives rapid minres

convergence might give inaccurate solution iterates simply because k · kP1

is a highly distorted norm. The following trivial example will make the

point:

Let A be any indefinite diagonal matrix and suppose we precondition with

P = diag(✏, 1/✏, . . . , 1/✏)

for some small positive ✏. Convergence to some small tolerance will appear

to occur after only 1 iteration since there is only 1 large entry of P1 and the

remaining entries are very small. Because of this distorted weighting, the

first component of the first solution iterate will be accurate, but the rest will

generally not be. For further consideration of this issue see Wathen (2007).

These di↵erent issues, which do not arise when A is symmetric and pos-

itive definite, lead some authors to the conclusion that preconditioning for

symmetric indefinite matrices is much more dicult than the definite case.

However it is still much more tractable than the nonsymmetric case! For

Preconditioning 21

a PDE problem, for example, the convergence bound (5.3) guarantees that

preconditioned minres convergence will be in a number of iterations in-

dependent of any mesh size parameter, h, given a preconditioner which

ensures that a, b, c, d are bounded, bounded away from zero and indepen-

dent of h. This is like the situation for spectrally equivalent preconditioners

in the symmetric positive definite case. Such guarantees rarely exist for

non-symmetric iterative methods; see section 6 below.

We mention that the sqmr Krylov subspace iterative method (Freund

and Nachtigal 1994) is designed for indefinite matrices with indefinite pre-

conditioning; this method, however, has no convergence guarantees, being

a variant of the nonsymmetric qmr iterative method for which symmetry

allows a more streamlined implementation.

General preconditioning strategies for symmetric indefinite matrices are

not so available as in the positive definite case. Amongst the ‘given a ma-

trix’ approaches, simple application of scaling and incomplete factorization

are not contenders and there is little significant work on sparse approxi-

mate inverses since these appear unlikely to provide much benefit except

possibly when A has only a few negative eigenvalues. ‘Given a problem’ is

more promising and there are two general classes of problem which give rise

to symmetric indefinite matrices for which helpful—sometimes excellent—

preconditioning approaches are known.

5.1. Shifted elliptic operators: Helmholtz problems

It might seem that taking a symmetric and positive definite matrix, K, and

simply shifting it by subtracting some multiple of the identity should lead

to matrices for which easy solution algorithms exist. In general, however,

this is not the case. Certainly the matrix K I can have resonances (and

thus be singular) when is an eigenvalue of K, but the diculty lies deeper

than just this observation.

The classic (and important) PDE problem here is the Helmholtz equation

r2uu = f which arises when looking for periodic (wave-like) solutions

of the linear wave equation; the parameter is then related to the frequency.

Supposing that K is the discretised form of the positive definite operator

r2 (i.e. the negative Laplacian), then for > min(K) it is clear that

A = K I (5.4)

is indefinite and symmetric. Many of the important physical applications

involve high frequencies and thus large values of . Having a good precondi-

tioner forK—we have identified many in the sections above—is of little help

in constructing a preconditioner for A. For finite element discretization, the

identity operator becomes the mass matrix, Q (3.4), but via diagonal ap-

proximation or since the grids employed are generally fairly regular for such

wave problems, there is little lost in description by considering (5.4).

22 A. J. Wathen

In fact for Helmholtz problems, there is usually a radiation boundary con-

dition for outgoing waves or an absorbing boundary condition (or ‘perfectly

matched layer’) is applied so that there is little reflection of outgoing waves

back into the computational domain. These add some complex entries in the

matrix which generally render it complex and symmetric (not Hermitian).

Nevertheless, it is the indefiniteness of Helmholtz problems which remains

the most important feature, hence why we consider them in this section.

This aspect can, however, mean that nonsymmetric Krylov subspace itera-

tive methods are required—sqmr is quite popular in this context.

Magolu Monga Made et al. (2000) and Laird and Giles (2002) were per-

haps the first to recognize the potential of preconditioning (5.4) with a

matrix of the form K + µI where |µ| is positive. In the case of real µ,

such a positive definite matrix is readily approximated by standard multi-

grid cycles which can be used as a positive definite preconditioner for (5.4).

Complex values for µ were also explored (see also van Gijzen et al. (2007),

for example), and good choices of the parameter can be found from Fourier

analysis of the spectrum of the preconditioned operator/matrix. As well as

indicating the speed of Krylov subspace iteration convergence, such analysis

can also assist in the choice of multigrid components (Erlangga et al. 2006).

Elman et al. (2001) also propose a multigrid approach for Helmholtz prob-

lem employing gmres as a smoother. Domain decomposition approaches

have also been suggested (Gander et al. 2002).

A more recent prosposal by Engquist and Ying (2011a) for variable and

high-frequency regimes involves ‘sweeping’ from the absorbing boundary,

eliminating the variables locally in layers. The resulting preconditioner is

thus a block incomplete LDLT factorzation. A key aspect of the implemen-

tation is that hierarchical (H-)matrix compression techniques as described

in section 3 above can be employed for the intermediate layers; this leads to

significant reductions in storage and computational work and leads to ex-

cellent overall complexities like O(n log2 n) for computation of P = LDLT

and application of P1 to a vector for a two-dimensional domain problem.

Krylov subspace iterative convergence is observed to occur in few iterations

making this a very competitive technique. Unfortunately, though the idea—

and a closely related one employing frontal sparse factorization (Engquist

and Ying 2011b)—is observed to work well for three-dimensional domains

as well, the H-matrix compression does not appear to yield the same small

computational complexity: algebraic factors in n replace logarithmic factors

in estimates of the complexity for setting up the preconditioner (Poulson

et al. 2013). That is, for a three-dimensional domain problem, the com-

putation of P = LDLT takes time which grows like n for some constant

> 1 whereas, once P has been computed, the action of P1 can be com-

puted in O(n log n) work at each iteration (similarly to the FFT). For a

Preconditioning 23

problem with many di↵erent right hand side vectors there could thus be

some amelioration in the work required to set up the preconditioner.

We should alert the reader to the article by Ernst and Gander (2012)

which explains the general diculty of iterative solution of Helmholtz prob-

lems.

The Maxwell equations describing electromagnetics have wave-like solu-

tions and hence some of the character of Helmholtz problems. However,

they also (in various formulations) have additional semidefinite features as

discussed in the section above, and additional indefinite saddle point struc-

ture, and are therefore considered below.

The idea of sweeping through the discrete variables for a problem in a par-

ticular order is also important for preconditioner construction for nonsym-

metric matrices which derive from PDEs that involve advection/convection

(or transport) as described below.

5.2. Saddle point systems

The second main class of symmetric and indefinite problems come from

minimization of quadratic functionals subject to linear constraints. In fact,

linearizations of much more general constrained optimization problems quite

generally result in linear systems of the same form

Ax =

A BT

B 0

u

p

=

f

g

(5.5)

which are called saddle point systems; see Benzi et al. (2005) for a com-

prehensive treatment of such problems. The block A 2 Rn⇥n is usually

positive definite on the kernel of B 2 Rm⇥n with m < n. This is a su-

cient condition for invertibility of A provided B is of full rank. Often A

is positive definite, though in the context of sequential quadratic approxi-

mations in Optimization (and possibly elsewhere), indefiniteness can arise;

see Gould and Simoncini (2009) for spectral bounds in this less usual case.

The variables u are the primary variables and p are the Lagrange multi-

pliers. Sometimes the zero block is replaced with a regularization matrix

C 2 Rm⇥m where C is symmetric and positive semidefinite; if C 6= 0 then

some authors refer to a regularized or generalized saddle point problem.

Many problems give rise to matrices of this type: as well as the ubiqui-

tous appearance of such problems in Optimization, several important PDE

problems including the Stokes problem describing slow viscous incompress-

ible flow, are of this form. The incompressible Navier-Stokes equations give

matrices of similar form but in which A is nonsymmetric. For these ap-

plications, the primary variables are the fluid velocity and the Lagrange

multiplier variables are the fluid pressure. For other applications and refer-

ences, see again Benzi et al. (2005) or Pestana and Wathen (2015).

The appearance of the zero diagonal block in (5.5) causes problems for

24 A. J. Wathen

most of the preconditoning approaches identified in section 3 above. This is

also generally true with regularisation, C 6= 0. Multigrid approaches, for ex-

ample, require more complex smoothers; see for example John and Tobiska

(2000) and Manservisi (2006). Domain decomposition can be applied, but

then the subdomain problems remain indefinite. Incomplete factorization

ideas can sometimes be employed if, for example, A and C are diagonal or

nearly so (Bergamaschi et al. 2004).

A simple observation in the case that A is nonsingular is that if

P =

A 0

0 S

, S = BA1BT , (5.6)

then the preconditioned matrix P1A has only three distinct eigenvalues

independently of m,n; thus minres (or gmres in the nonsymmetric case)

will converge in 3 iterations to the solution (Murphy et al. 2000, Korzak

1999). P in (5.6) is rarely a practical preconditioner for A in its exact

form, but this observation does give a paradigm for the construction of

preconditioners and, in particular, highlight the utility of approximations to

the Schur complement, S. An important point to note is that P is nothing

like an inverse of A (nor an approximation thereof). Nevertheless, good

approximations (preconditioners) bA for the block A and bS for the Schur

complement, S, lead to block diagonal preconditioners

P =

bA 0

0 bS

(5.7)

for A for which minres converges in few iterations. Precisely, if

` u

TAu

uT bAu ⌥ and p

TSp

pT bSp for all u 2 Rn{0} and p 2 Rm{0}

for positive constants `,⌥, and , then there are positive constants a, b, c

and d satisfying the preconditioned minres convergence bound (5.3). In

the context of PDE problems, for example, this means that preconditioned

minres is a solver of optimal computational complexity provided that bA is

spectrally equivalent to A and bS is spectrally equivalent to S. A multigrid

cycle can be an excellent choice for bA. See Pestana and Wathen (2015) for

a general proof which highlights the role of inf-sup stability associated with

the minimal singular value of B (in appropriate norms).

For many problems there are natural Schur complement approximations

which arise from problem formulation and do not require separate approx-

imation of the matrices B and A—see Pestana and Wathen (2015). In the

case of the Stokes equations, the natural Schur complement approximation

is particularly simple: it is the mass matrix (3.4) (Verfu¨rth 1984, Silvester

and Wathen 1994). For any such problem with a natural Schur comple-

ment approximation, (5.7) is an excellent preconditioning approach since bA

Preconditioning 25

can usually be any of the approximations in section 3. It should be noted

that once such approximations bA and bS are identified, then there are more

involved ways to make use of them than the simple block diagonal precon-

ditioner (5.7). When matrix symmetry is not an issue, a block triangular

preconditioner can be employed—see section 6.

Sometimes it is actually useful to directly approximate the matrices B

and A in constructing bS, for example in the context of optimization with

constraints which are PDEs (Rees et al. 2010a), though care must be exer-

cised to ensure that the sucient conditions of Braess and Peisker (1986)

are satisfied.

For saddle point problems arising particularly in the more general context

of optimization, constraint preconditioning has attracted significant atten-

tion (Keller et al. 2000). Here

P =

H BT

B 0

is used as a preconditioner forA. Note that the blocksB andBT which come

from the constraints are kept, whilst A is replaced by another symmetric and

(usually) positive definite matrix H. It might appear that solution of sys-

tems with P may not be signficantly easier than with A, however, by leaving

H implicitly defined, this can be achieved (Dollar and Wathen 2006, Dollar

et al. 2006). One significant advantage of keeping the constraint blocks in

the preconditioner is that the preconditioned problem is now equivalent to

a positive definite problem on the constraint manifold, hence a projected

cg algorithm can be used (Gould et al. 2001); this is rather useful since

a constraint preconditioner is symmetric and indefinite precluding simple

use of minres. Keller et al. (2000) prove that when H is positive definite

then P1A has 2m eigenvalues of 1, each of geometric multiplicity 2 (thus

the minimum polynomial generically has a factor ( 1)2), and that the

remaining nm eigenvalues are real and interlace those of H1A. Thus if

H is a good preconditioner for A and systems with P can be readily solved,

then this can be an attractive strategy.

Di↵erent challenges arise and di↵erent strategies are needed for saddle

point problems (5.5) with semidefinite block, A. The constraint precondi-

tioning paradigm can be applied, in which case finding a matrix H for the

semidefinite matrix A may require ideas as in section 4 above. The matrix

A is still invertible provided A is positive definite on the kernel of B and B

is of full rank, as has been mentioned. One possible approach is therefore

via an augmentation or augmented system preconditioner

P =

A+BTW1B 0

0 W

for some appropriate symmetric and positive definite matrix W . This ap-

26 A. J. Wathen

proach is described and analysed by Greif and Scho¨tzau (2006) who were

motivated by the solution of Maxwell equations describing electromagnetic

phenomena—a key PDE problem with this structure. For related but earlier

ideas associated with the semidefiniteness of the div operator, see Arnold

et al. (1997).

For the various Maxwell equation formulations, semidefiniteness arises

through curlcurl terms, indefiniteness because of a div constraint and wave-

like solutions are always expected, which means that fine computational

grids must be used especially for high frequency problems. Fortunately,

Maxwell’s equations are linear and self-adjoint in most formulations which

is why we bring them up in this section. Mixed finite element approxi-

mation is widely used for these PDEs. Representative of the large and

still growing literature on preconditioning for such problems are Greif and

Scho¨tzau (2007), who develop block preconditioners and Tsuji et al. (2012),

who extend sweeping preconditioner ideas to these problems.

A slightly more detailed review of preconditioning for saddle point prob-

lems can be found in Benzi and Wathen (2009).

5.3. A word of caution

As has already been mentioned in section 2, for any nonsingular square ma-

trix A, the solution of a linear system can be achieved by using an indefinite

symmetric system of twice the dimension of the form (2.5) or even

0 A

AT 0

r

x

=

b

0

.

Since the eigenvalues of the coecient matrix here are exactly symmetric

about the origin (they are ±i(A) where i(A) are the singular values of

A), generically, the convergence of minres would stagnate at every other

iteration, so the computational e↵ort in solution is exactly the same as for

the normal equations (2.4) (Fischer et al. 1998). For (2.5) a similar issue

arises.

The artificial construction of indefinite symmetric systems is therefore

unlikely to achieve other than equivalent problem statements, except possi-

bly in the solution of least squares problems by augmented matrix methods

(Bjo¨rck 1996, Chapter 7). To go further, a symmetric indefinite matrix

of any form which has eigenvalues symmetrically arranged about the origin

will generically lead to staircasing behaviour where progress is made only on

alternate minres iterations; cg for (2.4) generally requires the same com-

putational e↵ort for equation solution in such circumstances. In this sense,

it is highly unfortunate that the tractable minres convergence bound (5.2)

applies only when the positive and negative eigenvalues extend over inter-

vals of equal length. If in fact a = d and b = c so that these intervals are

Preconditioning 27

centred on the origin, then the undesirable staircasing e↵ect is always im-

plied! In general, it is advantageous in terms of convergence speed if there

is some asymmetry in the distribution of positive and negative eigenvalues

of preconditioned indefinite symmetric matrices. However, it is then con-

siderably more complicated to take account of this asymmetry in a minres

convergence bound (but see Wathen et al. (1995)).

6. Nonsymmetric matrices

As already mentioned, all preconditioning for nonsymmetric problems is

heuristic, since descriptive convergence bounds for gmres or any of the

other applicable nonsymmetric Krylov subspace iterative methods do not

presently exist (but see Pestana and Wathen (2013b)). The most straight-

forward convergence bound for gmres for diagonalisable matrices, A =

X⇤X1 where ⇤ is a diagonal matrix of eigenvalues, comes directly from

(2.3) and the residual minimization property:

krkk = min

p2⇧k,p(0)=1

kp(A)r0k

min

p2⇧k,p(0)=1

kp(X⇤X1)k kr0k

= min

p2⇧k,p(0)=1

kXp(⇤)X1k kr0k

kXk kX1k min

p2⇧k,p(0)=1

kp(⇤)k kr0k

kXk kX1k min

p2⇧k,p(0)=1

max

z2R

|p(z)| kr0k, (6.1)

where R is any region containing the eigenvalues. Since it is sometimes pos-

sible to find regions R which contain all of the eigenvalues, but rarely possi-

ble to obtain any reasonable estimates of the condition number kXkkX1k

of the eigenvector matrices, X, very often in the literature, the eigenvalues

of P1A are estimated or bounded as a guide to the quality of a precondi-

tioner. It is certainly usually true that if the eigenvalues are widely spread

through a large region of C then slow convergence can follow!

There are alternative approaches for obtaining bounds for gmres con-

vergence; see for example (Greenbaum 1997, Section 3.2). One can generate

such bounds, for example, with use of the field of values or numerical range

F(A) = {xTAx, x 2 Cn, xTx = 1},

provided 0 /2 F(A). The obtained bounds can be rather pessimistic, but

field-of-values convergence bounds have been used, for example, to rigor-

ously establish the convergence of gmres in a number of iterations in-

dependent of the mesh size, h, for certain preconditioned PDE problems;

see for example Klawonn and Starke (1999), Loghin and Wathen (2004),

28 A. J. Wathen

Benzi and Olshanskii (2011). gmres convergence bounds can be based on

other sets in the complex plane; ✏pseudospectral sets, for example, have

been used to analyse initial stagnation of (unpreconditioned) gmres for

convection-di↵usion problems (Trefethen and Embree 2005, Chapter 26).

One aspect which arises in the nonsymmetric but not the symmetric case

is the possibility to left precondition

P1Ax = P1b,

to right precondition

AP1y = b, x = P1y,

or, if P is available in split form P = P1P2, to use

P11 AP

1

2 y = P

1

1 b, x = P

1

2 y.

In every case the coecient matrix is generally again nonsymmetric and

obvious similarity tranformations involving P or the Pi show that all three

coecient matrices are mathematically similar and so have the same eigen-

values. There is little evidence, however, that any one of these is better

than the others, in terms of its e↵ect on the convergence rate for a Krylov

subspace method, even though, theoretically, this could be the case. Tradi-

tionally, right preconditioning is favoured since then gmres, for example,

minimizes the residual rk = b Axk for the original linear system; left

preconditioning would lead to minimization of the preconditioned residual

P1rk. It is certainly not always clear which is preferable; a good precon-

ditioner fortunately seems to give fast convergence for all three forms in

practice.

In the context of PDEs, convection-di↵usion problems present significant

challenges for stable discretization—without excessive numerical di↵usion

or non-physical oscillations—as well as for preconditioning and iterative

solvers. A key aspect is simply equation ordering. Even such a simple

method as stationary Gauss-Seidel iteration can be good for such problems

if the variables are ordered in the direction of the convection, but it can be

very poor otherwise (see Elman and Chernesky (1993), Elman et al. (2014b,

chapter 7)).

We mention also that there is a ‘flexible’ version of gmres which allows

for a preconditioner that varies between iterations (Saad 1993). This can be

useful, for example, when employing another iteration as a preconditioner,

but this flexible gmres method does not necessarily work as well as regular

gmres. It can sometimes be better to just use a fixed number of inner

iterations so that the preconditioner is a fixed operator for every iteration;

if such inner iterative steps represent a fixed linear process, then at least

the limited guarantees that come with gmres then remain in place.

Preconditioning 29

6.1. Incomplete Factorization

Just as a Cholesky factorization for a symmetric and positive definite ma-

trix can be incompletely formed, an LU factorization (equivalent to Gauss

elimination) for a non-symmetric matrix can be performed incompletely

by prespecifying allowable fill or thresholding or both. Perhaps such ILU

preconditioning methods and their variants are the most widely used for

non-symmetric problems and software is widely available.

Because the product of triangular matrix factors is computed for ILU , for

problems with an implied directionality like convection–di↵usion, ordering

of the variables can lead to one of the factors being less important (and

perhaps more nearly diagonal).

Although, in general, there is little we can say about ILU precondition-

ing for non-symmetric matrices, gmres with ILU is a ‘go-to’ technique for

very many practical problems, so much largely empirical research continues

to explore every aspect of the various possibilities, for example, multilevel

variants, block variants and parallel implementation issues. We again refer

to Benzi (2002) as the huge literature is fairly evenly cited. Because of

its significance, particularly in the important area of petroleum reservoir

modelling, we specifically mention the nested ILU methods originally de-

rived by Appleyard and Cheshire (1983), which use partial eliminations and

are designed to take advantage of parallel computation; related ideas were

developed, for example, by Saad (1996). See also Vassilevski (2008) for a

comprehensive treatment of these methods in a more general framework.

6.2. Multigrid

There is no inherent obstacle to applying multigrid or multilevel ideas for

nonsymmetric problems. However, smoothing is generally not so easy and

requires more work than for symmetric problems and care must be taken so

that coarse grid correction does not reintroduce high frequencies. Without

attention to these aspects, the methods simply do not work at all well.

Since such issues vary with di↵ering applications, we briefly discuss only

the important problem associated with the convection-di↵usion equation.

Smoothing – or perhaps more appropriately in this setting, ‘sweeping’ –

should ideally be something like a Gauss-Seidel iteration with the variables

ordered in the direction of the convection. For problems with recirculation,

one can try multidirectional sweeping, so that all parts of the flow have

at least one of the directional sweeps that is approximately in the convec-

tion direction. Coarse grid correction then requires care (in grid transfers

and/or coarse grid operator). In the convection-di↵usion geometric multi-

grid method due to Ramage (1999), for example, the coarse grid operator

is recomputed on each coarse grid with—crucially—streamline upwinding

appropriate to that grid size. The point is that when a grid under-resolves

30 A. J. Wathen

any layer-like phenomenon which arises with convection—this must happen

at some level if one has a true multigrid concept—then oscillatory solutions

are commonly the result. The reintroduction of such high frequencies would

require more significant sweeping or smoothing, and it is preferable to pro-

vide stabilisation appropriate to each level of grid so that smooth coarse

grid corrections result.

Multigrid for hyperbolic problems is thought about rather di↵erently

than for elliptic problems and the idea of directional sweeping rather than

smoothing is key. The coarse grids allow one to sweep errors out of the

boundary of the domain more rapidly than using just a single grid. Again,

transport is the main issue, not smoothing.

Algebraic multigrid ideas have been tried and can work well for some

problems. The computation of ‘coarse grids’ (or in reality, spaces of coarse

variables) by purely algebraic means can identify layer phenomena, for ex-

ample, and automatically semi-coarsen. In particular, widely used coars-

ening algorithms seem to detect anisotropy and lead to appropriate coarse

spaces/grids. All is not rosy, however, and AMG software can quite simply

fail. There is a particular danger that simply does not arise with geometric

multigrid ideas, namely that the coarse grid can have a significant propor-

tion of the variables of the fine grid; that is, coarsening does not lead to a

suciently significant reduction in the number of variables. If this occurs,

then many levels are required and eciency can be severely compromised.

Active research continues on AMG for nonsymmetric problems; see for ex-

ample Notay (2012).

Multigrid methodology is generally applied as a preconditioner for non-

symmetric problems.

6.3. Sparse Approximate Inverses

Norm minimization readily leads to sparse approximate inverses similarly

to the symmetric case. A biconjugation based method (AINV) and its

more robust stabilized form (SAINV) are, however, perhaps the most widely

considered of such methods in the non-symmetric case (Benzi and Tuma

1998, Benzi et al. 2000).

6.4. Symmetric/Skew splitting

Every nonsymmetric real matrix can be uniquely split into its symmetric

and skew-symmetric part:

A = 12(A+A

T )| {z }

H

+ 12(AAT )| {z }

S

.

(The H stands for Hermitian, since the same idea obviously applies in the

complex case when transpose is replace by conjugate transpose.) When H is

Preconditioning 31

positive definite, preconditioning of A by H gives a shifted skew-symmetric

matrix for which a special three-term recurrence Krylov subspace method

exists (Concus and Golub 1976, Widlund 1978) and convergence depends on

the size of 1+|max(H1S)| (Widlund 1978, Freund and Ruscheweyh 1986).

Although this approach has been considered for some applications—see for

example Elman and Schultz (1986) in the context of convection–di↵usion

problems—the so-called HSS (Hermitian Skew-Hermitian Splitting) meth-

ods originally introduced by Bai et al. (2003) have been rather more widely

investigated. They proved the convergence of an alternate iteration using

shifted Hermitian and shifted skew-Hermitian parts of the form

(↵I +H)x(k+

1

2) = (↵I S)x(k) + b

(↵I + S)x(k+1) = (↵I H)x(k+12) + b

for Ax = b. Though convergence is guaranteed independently of the choice

of the parameter ↵ > 0, it can be slow. Nevertheless, many variants of

using this idea as a preconditioner have been suggested, as have related

structural splitting ideas, and good results are reported for some; see for

example Benzi et al. (2011a).

6.5. Nonsymmetric saddle point systems

Saddle point problems must essentially be symmetric as in section 5.2, but

matrices of the structure (5.5) arise in which the block A is non-symmetric.

A key example is provided by linearizations of the incompressible Navier–

Stokes equations (see Elman et al. (2014b, Chapters 8 and 9)). It is these

problems that we refer to as non-symmetric saddle point problems.

For a nonsymmetric saddle point matrix, the results of Murphy et al.

(2000) still hold and block diagonal preconditioners can still be applied.

However, since there is no symmetry to lose, the block triangular precondi-

tioners

P =

bA 0

B ±bS

or P =

bA BT

0 ±bS

will give in the ‘ideal’ case bA = A, bS = S = BA1BT , a preconditioned

system with either just the two distinct eigenvalues ±1 or even just the

single eigenvalue of 1 depending on whether the + or the sign is selected,

respectively (Murphy et al. 2000). Even with the single eigenvalue 1, P is

not the inverse, so something has to give! In fact it is diagonalisability that

is lost in selecting the sign; the minimium polynomial is then ( 1)2, so

two iterations would be required exactly as for the + case. However, in the

practical case when the approximations bA, bS are not ideal, it is usually the

32 A. J. Wathen

sign which is chosen so that all eigenvalues are clustered around 1 rather

than the 2 clusters around ±1 which arise with the + choice, though it is

not clear that this is to be preferred (Ymbert et al. 2015).

Approximations of the Schur complement remain a key aspect of any

such block preconditioning approach. For the common Oseen linearization

of the incompressible Navier-Stokes problem, the matrix block A comes

from a vector convection–di↵usion operator so that an appropriate multi-

grid approach as outlined above or any other such good approximation of

(preconditioner for) convection-di↵usion is required as bA. To identify an

approximate Schur complement requires more in depth consideration, but

there are now two excellent candidates: the pressure convection-di↵usion

(PCD) (Kay et al. 2002) and the least-squares commutator (LSC) precon-

ditioners. Both are derived, described and analysed in Elman et al. (2014b,

Chapter 9). The PCD method is in some ways simpler, but it requires the

construction of a convection–di↵usion operator on the pressure space (the

space of Lagrange multipliers), which is not required for the problem for-

mulation itself. A purely algebraic way to compute this operator based on

sparse approximate inverse technology was, however, been described by El-

man et al. (2006). The LSC method is derived via a commutator argument

and is defined purely in terms of the matrix blocks which arise naturally

for the Oseen problem. For a comparison and full consideration of advan-

tages and disadvantages of these two approaches, see Elman et al. (2014b,

Chapter 9).

The augmentation or augumented Lagrangian approach has also been

applied to non-symmetric saddle point problems. Although some concerns

have been expressed regarding possible distortion of the norm in which

convergence occurs, good results have been reported even for large Reynolds

number (Benzi et al. 2011b).

Earlier approaches for this problem involved segregated treatment of pres-

sure and velocity and predate the preconditioned Krylov subspce iteration

technology. It is perhaps worth commenting here that this example is one

of the first where block preconditioners comprising of blocks designed for

di↵erent sets of variables allow treatment of di↵erent ‘physics’ within the

preconditioner whilst gmres or some other Krylov subspace method is ap-

plied to the complete coupled problem. This is an important paradigm.

Such is its importance that it is in the context of nonsymmetric saddle

point systems arising from approximation of the Oseen problem that per-

haps the most refined analysis of preconditioned gmres convergence has

been pursued. There are several eigenvalue analyses for di↵erent precondi-

tioners, see for example Elman and Silvester (1996), although these do not

lead to rigorous estimates as explained above. It is in the context of this

problem, however, that field-of-values convergence analysis has been suc-

cessfully employed to rigorously bound preconditioned gmres convergence

Preconditioning 33

and to elucidate its dependence on or independence of the mesh size, h, and

of the Reynolds number; see for example the analysis of Loghin and Wa-

then (2004) for PCD, and Benzi and Olshanskii (2011) for block triangular

preconditioning of an augmented Lagrangian formulation.

6.6. Circulant and semi-circulant preconditioning

Our opening example already indicates the potential for circulant precondi-

tioning, at least for symmetric Toeplitz matrices. In fact the applicability

of such ideas is rather wider, though the theoretical justification is less com-

plete for non-symmetric problems.

The analysis for symmetric and positive definite Toeplitz matrices is well

described in the books by Chan and Jin (2007) and Ng (2004) who also in-

dicate how similar ideas follow for block Toeplitz matrices with Toeplitz

blocks. Indeed, the use of circulant preconditioners for non-symmetric

Toeplitz matrices does not really depend on symmetry. However, unless the

normal equations are used, convergence guarantees are not readily avail-

able, simply because of the general lack of descriptive convergence bounds

for non-symmetric matrices. Good computational results have been ob-

served in many situations, and in fact even proved for certain problems via

bounding of the eigenvector condition number as well as the eigenvalues

and using (6.1) (Otto 1996). A recent observation of Pestana and Wathen

(2014), that guaranteed convergence bounds can be achieved for minres

for circulant preconditioned non-symmetric Toeplitz matrices via a variable

reordering, is also relevant.

7. Some comments on symmetry

We have emphasised the importance of matrix symmetry in this review,

in particular that it is rarely sensible to throw away such a property by

preconditioning. In the reverse direction, it would be highly advantageous

for non-symmetric matrices if it were easily possible to precondition to ob-

tain symmetry of the preconditioned problem without recourse to the nor-

mal equations. This possibility seems unlikely without the computation

of expensive matrix factorizations. However, there are situations in which

a broader consideration of self-adjointness allows many of the attractive

properties of symmetric matrices, such as the use of the symmetric Krylov

subspace iterative methods cg and minres.

A symmetric matrix, A = AT , is simply one which is self-adjoint in the

usual Eulidean inner product, that is,

hAu, vi = vTAu = uTAv = hAv, ui = hu,Avi.

In fact given any inner product

hu, viH = vTHu (7.1)

34 A. J. Wathen

defined by a symmetric and positive definite matrixH (all inner products on

Rn arise in this way), the distinction between symmetric and non-symmetric

matrices is really the distinction between matrices which are self-adjoint and

those which are non-self-adjoint with respect to the inner product. Indeed,

in the standard situation when A and P are both symmetric and P is

positive definite, then the left-preconditioned matrix P1A is self-adjoint in

hu, viP , since

hP1Au, viP = vTPP1Au = vTAu = vTATPTPu = hu, P1AviP .

This description of symmetric preconditioning for symmetric matrices with-

out destroying self-adjointness provides an alternative to the approach in

section 3. The cg method, for example, can be employed in such a gen-

eral inner product (7.1) for any matrix which is self-adjoint in this inner

product and satisfies the positive definiteness condition hAu, uiH > 0 for all

u 2 Rn {0}.

In general, any matrix A is self-adjoint in (7.1) if and only if, for all

u, v 2 Rn

vTHAu = hAu, viH = hu,AviH = vTATHu, (7.2)

which is the same condition as HA = ATH, so we must have AT = HAH1

for some positive definite symmetric matrix H. It can be shown that such

a matrix H exists if and only if the real matrix A is diagonalisable and has

real eigenvalues (Taussky 1972). Thus the use of self-adjointness in non-

standard inner products is limited, but interesting examples do exist where,

via left- or right-preconditioning, non-symmetric preconditioned matrices

arise which are nevertheless self-adjoint in some convenient non-standard

inner product; see Stoll and Wathen (2008). Since the inner product must

be used for computation of scalar products (dot products) arising in Krylov

subspace methods, there is additional arithmetic overhead in the use of

H 6= I unless other structures mitigate this.

We present just one example—perhaps the most well-known—of the use

of non-standard inner products; exceptionally, it allows the robust appli-

cation of cg in a non-standard inner product for an indefinite symmetric

saddle point problem by creating a self-adjoint and positive definite matrix

in that inner product (Bramble and Pasciak 1988). For other examples, see

Klawonn (1998) and Benzi and Simoncini (2006), and see Stoll and Wathen

(2008) and Pestana and Wathen (2013a) for how examples of this struc-

ture can be combined to yield further cases of possible utility (so-called

combination preconditioning).

In the Bramble and Pasciak method one preconditions the symmetric and

indefinite saddle point matrix as given in (5.5) on the left with the block

Preconditioning 35

lower triangular matrix

P =

bA 0

B I

,

resulting in the non-symmetric matrix

P1A =

bA1A bA1BT

B bA1AB B bA1BT

.

This turns out to be self-adjoint in the inner product (7.1) (i.e. it satisfies

the condition (7.2)) where

H =

A bA 0

0 I

.

Moreover P1A is positive definite in this inner product. Note that it is

necessary for A bA to be positive definite in order to have an inner product

(theoretically – and quite often in practice – this can be achieved by scaling

of bA), and in fact the positive definiteness also requires this condition.

Perhaps this appears to be a rather exotic structure, but much practical

use has been made of this particular example; the possibility for other really

useful examples remains unrealised but rather intriguing.

A final comment concerns the possibility of non-symmetric precondition-

ing of symmetric systems. We have earlier noted that such preconditioning

would destroy symmetry in general and thus is undesirable. However, in

the conceivable, but presumably rare, situation that A = AT , P 6= P T but

P1A = APT—that is, PT is one of the matrices that is self-adjoint

in the (possibly indefinite) inner product related to the symmetric matrix

A—then it would be possible to use cg for P1Ax = P1b when P1A

is positive definite or minres if P1A is indefinite. The cg error would

converge in k ·kP1A and for minres it would be P1(bAx), which would

converge in k ·kI . We do not know of any instance of this structure for non-

symmetric P , however the symmetric preconditioner P1 = q(A) for any

polynomial, q, clearly does commute with A, and there is some literature

on such polynomial preconditioning (see for example O’Leary (1991)).

8. Some comments on practical computing

Some important general ideas remain to be mentioned that do not in them-

selves comprise a particular preconditioning approach, but which have broad

significance.

First we mention deflation. It can be helpful – sometimes extremely help-

ful – to remove certain subspaces from a Krylov subspace. For example, if

one has a symmetric and positive definite matrix with a reasonable spread of

eigenvalues except for a few very small eigenvalues about which one has some

36 A. J. Wathen

knowledge of the corresponding eigenspace, then cg convergence could be

much improved if the e↵ect of this eigenspace could be removed or ignored.

This is the idea of deflation; if the eigenspace spanned by the eigenvectors

corresponding to the m smallest eigenvalues 1 2 . . . m can be

deflated, then the e↵ective condition number is = max/m+1 and it is

this which appears in the cg convergence bound (3.1). The crucial step is

to identify the deflating subspace.

The deflation idea is most e↵ective when the problem naturally presents

such a space. For example, see Brandt and Ta’asan (1986), who deflate the

low frequency eigenspaces which correspond to negative eigenvalues for a

Helmholtz problem, and Jonsthovel et al. (2012) who describe a prototyp-

ical application involving composite materials. However, various subspaces

which arise from iterations of a previous linear system solve, or other sources,

can be used. In particular, when restarting gmres, one can use informa-

tion from previous iterates via deflation. There are many papers describing

various approaches: for symmetric and positive definite matrices, see for

example Frank and Vuik (2001), and for nonsymmetric matrices, see for

example Erhel et al. (1995), Sifuentes et al. (2013) and Giraud et al. (2010).

When solving sequences of linear systems, the related idea of ‘recycling’

can be considered (Parks et al. 2006). Deflation is central to the iterative

computation of eigenvalues (Lehoucq and Sorensen 1996).

As parallel computing becomes more widespread, preconditioners which

are parallelizable or inherently parallel become more important. After all, it

is large scale computation for which preconditioned iterative solution tech-

nology is generally being developed. Very simple preconditioning such as

diagonal scaling is obviously parallel, but the utility of such an approach is

limited, as we have described. Methods based on incomplete factorization

are more sequential in design, and though there are parallel variants, often

speed of convergence is compromised by ordering and separating variables,

etc. to obtain the necessary independence of calculations. Such is the im-

portance of multigrid and AMG methods that parallel forms which give up

the least of the potency of sequential implementation are critical. There are,

however, compromises to be made in going parallel (Worley 1991). There is

much to be considered here, and this is not an article on parallel multigrid.

The paradigm of domain decomposition is clearly ideal for parallel compu-

tation and this is precisely what it was invented for. Sparse approximate

inverses were precisely developed because they appeared to be a tractable

way to compute a purely algebraic (‘given a matrix’) preconditioner in par-

allel. Their e↵ectivity, however, remains unclear. One beneficial use of

these ideas appears to be in contexts where one is not seeking to approxi-

mate an inverse, but rather seeking some more limited goal like minimising

a commutator—see, for example Elman et al. (2006).

Since preconditioning is used with iterative methods, sensible criteria for

Preconditioning 37

stopping an iteration must be chosen. In the ‘given a matrix’ context,

some fairly arbitrary small tolerance for a residual norm is usually specified.

When one is ‘given a problem’, however, there can be criteria appropriate

to the problem. In particular for PDE problems, reducing iteration error to

orders of magnitude less than discretization error is untenable and rather

wasteful; practical criteria are given by Arioli (2004) and Arioli et al. (2005).

An important point is that a larger-dimensional matrix for a given problem

comes from a more accurate discretization, so any stopping tolerance should

also in general be smaller on finer grids; see Elman et al. (2014b, Section

2.1.2).

In the general context of solution of linear systems of equations, it is often

regarded as desirable to reduce the number of variables/degrees of freedom

if this is easy; such thinking is associated completely with direct solution

methods. In the context of iterative methods, it can be preferable to have a

larger dimensional problem if it is then easier to identify and apply an ap-

propriate preconditioner. That is, it can be convenient even to enlarge the

dimension of given systems if the resulting matrices are more readily precon-

ditioned, as long as the storage requirement for the enlarged matrix and the

associated vectors does not become unacceptable. In finite element analysis,

for example, the idea of ‘static condensation’ is precisely the elimination of

variables with only local connections, but although it slightly reduces the

number of free variables, and hence the matrix dimension, the resulting

smaller matrices can be less easy to precondition even though they often

possess a slightly reduced condition number. More radical is the idea of

the first-order system least-squares (FOSLS) approach, where extra depen-

dent variables are explicitly introduced making a rather larger-dimensional

system in general, but a system for which multigrid solvers can be readily

applied. For an application of this idea, see de Sterck et al. (2004), for

example.

A further example of the freedom and possibilities is the ‘all-at-once’, ‘one-

shot’ or ‘monolithic’ approach, where preconditioning for separate parts of a

problem – perhaps separate physical processes – are employed together with

a Krylov subspace method for the whole of a coupled problem for which the

required matrix-vector products are computable in a matrix-free way, that

is, without explicit storage and operation on the complete two-dimensional

array. See, for example Heil et al. (2008), Haber and Ascher (2001) or Rees

et al. (2010b). When matrix-vector products are readily computed, there

is always a Krylov subspace method that can be applied to the monolithic

problem; the representation of the component parts can very conveniently

be the job of a block-structured preconditioner. Good examples of this

kind of approach are recent ideas for preconditioning problems coming from

magnetohydrodynamics (Phillips et al. 2014, Wathen 2014) and from geo-

dynamics (Rhebergen et al. 2014).

38 A. J. Wathen

A final comment is aimed at anyone who would like to experiment and

see the potential of preconditioning in an easy computational setting. The

freely downloadable ifiss software (Elman et al. 2007, Elman et al. 2014a),

which runs under basic matlab or octave enables such experiments.

9. Summary

We conclude with a few general guidelines on preconditioner design and

selection.

• Keep to the original structure when preconditioning (e.g., preserve

symmetry, partitioning into blocks, etc.)

• The more knowledge about a problem that can be represented in a

preconditioner, the better.

• The work in applying the preconditioner should ideally be commen-

surate with the work in matrix–vector multiplication: that would be

O(n) for a sparse matrix and O(n2) for a general full matrix.

• It may be easier to precondition a matrix of larger dimension, for which

structures are clearer, than to eliminate variables, giving a less struc-

tured matrix of smaller dimension.

There is, of course, no such concept as a best preconditioner: the only

two candidates for this would be P = I, for which the preconditioning takes

no time at all, and P = A, for which only one iteration would be required

for solution by any iterative method. However, every practitioner knows

when they have a good preconditioner which enables feasible computation

and solution of problems. In this sense, preconditioning will always be an

art rather than a science.

Acknowledgements

I am grateful to Michele Benzi, Howard Elman and David Silvester for read-

ing an early draft; their comments and suggestions have led to significant

improvements.

Parts of this survey were written whilst visiting CERFACS, Toulouse,

France and the Memorial University of Newfoundland, Canada. I am grate-

ful to Serge Gratton, Xavier Vasseur, Hermann Brunner and Scott MacLach-

lan for their support and encouragement.

Preconditioning 39

REFERENCES

P.R. Amestoy, I.S. Du↵, J-Y. L’Excellent and J. Koster (2000), ’MUMPS: a general

purpose distributed memory sparse solver’, In Proc. PARA2000, 5th Interna-

tional Workshop on Applied Parallel Computing, Springer-Verlag, 122–131.

J.R. Appleyard and I.M. Cheshire ’Nested factorization’, Society of Petroleum

Engineers, SPE 12264, presented at the Seventh SPE Symposium on Reservoir

Simulation, San Francisco, 1983.

M. Arioli (2004), ’A stopping criterion for the conjugate gradient algorithm in a

finite element method framework’, Numer. Math. 97, 1–24.

M. Arioli, D. Loghin and A.J. Wathen (2005), ’Stopping criteria for iterations in

finite element methods’, Numer. Math. 99, 381–410.

D.N. Arnold, R.S. Falk and R. Winther (1997), ’Preconditioning in H( div) and

applications’, Math. Comput. 66, 957–984.

Z.-Z. Bai, G.H. Golub and M.K. Ng (2003), ’Hermitian and skew-Hermitian split-

ting methods for non-Hermitian positive definite linear systems’, SIAM J.

Matrix Anal. Appl. 24, 603–626.

N.S. Bakhvalov (1966), ’On the convergence of a relaxation method with natural

constraints on the elliptic operator’, USSR Comp. Math. and Math. Phys.

6, 101–135.

M. Bebendorf (2008), Hierarchical Matrices: A Means to Eciently Solve Ellip-

tic Boundary Value Problems, Lecture Notes in Computational Science and

Engineering, vol. 63, Springer-Verlag.

M. Benzi (2002), ’Preconditioning techniques for large linear systems: A survey’,

J. Comput. Phys. 182, 418–477.

M. Benzi, J.K. Cullum and M. Tuma (2000), ’Robust approximate inverse precon-

ditioning for the conjugate gradient method’, SIAM J. Sci. Comput. 22,1318–

1332.

M. Benzi, G.H. Golub and J. Liesen (2005), ’Numerical solution of saddle point

problems’, Acta Numerica 14, 1–137.

M. Benzi, C.D. Meyer and M. Tuma (1996), ’A sparse approximate inverse precon-

ditioner for the conjugate gradient method’, SIAM J. Sci. Comput. 17, 1135–

1149.

M. Benzi, M. Ng, Q. Niu and Z. Wang (2011a), ’A relaxed dimensional factorization

preconditioner for the incompressible Navier-Stokes equations’, J. Comput.

Phys. 230, 6185–6202.

M. Benzi and M.A. Olshanskii (2011), ’Field-of-values convergence analysis of aug-

mented Lagrangian preconditioners for the linearized Navier-Stokes problem’,

SIAM J. Numer. Anal. 49, 770–788.

M. Benzi, M.A. Olshanskii and Z. Wang (2011b), ’Modified augmented Lagrangian

preconditioners for the incompressible Navier-Stokes equations’, Int. J. Nu-

mer. Meth. Fluids 66, 486–508.

M. Benzi and V. Simoncini (2006), ’On the eigenvalues of a class of saddle point

matrices’, Numer. Math. 103, 173–196.

M. Benzi and M. Tuma (1998), ’A sparse approximate inverse preconditioner for

nonsymmetric linear systems’, SIAM J. Sci. Comput. 19, 968–994.

M. Benzi and M. Tuma (2003), ’A robust incomplete factorization preconditioner

for positive definite matrices’, Numer. Linear Algebra Appl. 10, 385–400.

40 A. J. Wathen

M. Benzi and A.J. Wathen (2008), ’Some preconditioning techniques for saddle

point problems’, in Model order reduction: theory, research aspects and ap-

plications, 195–211.

L. Bergamaschi, J. Gondzio and G. Zilli (2004), ’Preconditioning indefinite systems

in interior point methods for optimization’, Comput. Optim. Appl. 28, 149–

171.

M. Bern, J.R. Gilbert, B. Hendrickson, N. Nguyen and S. Toledo (2006), ’Support-

Graph preconditioners’, SIAM J. Matrix Anal. Appl. 27, 930–951.

A. Bjo¨rck (1996), Numerical Methods for Least Squares Problems, Society for In-

dustrial and Applied Mathematics.

J.W. Boyle, M.D. Mihajlovic and J.A. Scott (2010), ’HSL MI20: an ecient AMG

preconditioner for finite element problems in 3D’, Int. J. Numer. Meth. En-

gnrg 82, 64–98.

D. Braess and P. Peisker (1986), ’On the numerical solution of the biharmonic equa-

tion and the role of squaring matrices for preconditioning’, IMA J. Numer.

Anal. 6, 393–404.

J. Bramble and J. Pasciak (1988), ’A preconditioning technique for indefinite sys-

tem resulting from mixed approximations of elliptic problems’, Math. Com-

put. 50, 1–17.

J.H. Bramble, J.E. Pasciak and J. Xu (1990), ’Parallel multilevel preconditioners’,

Math. Comput. 55,1–22.

A. Brandt (1977), ’Multilevel adaptive solutions to boundary-value problems’,

Math. Comput. 31,333–390.

A. Brandt and S. Ta’asan (1986), ’Multigrid methods for nearly singular and

slightly indefinite problems’, in Multigrid Methods II, W. Hackbusch and

U. Trottenberg, eds., Lecture Notes in Math. 1228, Springer-Verlag, Berlin,

pp. 99–121.

O. Bro¨ker, M.J. Grote, C. Mayer and A. Reusken (2001), ’Robust parallel smooth-

ing for multigrid via sparse approximate inverses’, SIAM J. Sci Comp.

23, 1395–1416.

R.H.-F. Chan and X.-Q. Jin (2007), An Introduction to Iterative Toeplitz Solvers,

Society for Industrial and Applied Mathematics.

T.F. Chan and T.P. Mathew (1994), ’Domain decomposition algorithms’, Acta

Numerica 3, 61–143.

S. Chandrasekaran, P. Dewilde, M. Gu, T. Pals, X. Sun, A.J. van der Veen and

D. White (2005), ’Some fast algorithms for sequentially semiseparable repre-

sentation’, SIAM J. Matrix Anal. Appl. 27, 341–364.

K. Chen (2005), Matrix preconditioning techniques and applications, Cambridge

University Press.

P. Concus and G.H. Golub (1976), ’A generalized conjugate gradient method for

nonsymmetric systems of linear equations’, in R. Glowinski and P.L. Lions

eds., Lecture Notes in Economics and Mathematical Systems 134, 56–65,

Springer.

J.W. Cooley and J.W. Tukey (1965), ’An algorithm for the machine calculation of

complex Fourier series’, Math. Comput. 19, 297–301.

T.A. Davis (2004), ’Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern

multifrontal method’, ACM Trans. Math. Softw. 30, 196–199.

Preconditioning 41

’International Conference on Domain Decomposition Methods’,

http://www.ddm.org/conferences.html .

H. de Sterck, T. Manteu↵el, S. McCormick and L. Olson (2004), ’Least-squares

finite element methods and algebraic multigrid solvers for linear hyperbolic

PDEs’, SIAM J. Sci Comp. 26, 31–54.

H.S. Dollar, N.I.M. Gould, W.H.A. Schilders and A.J. Wathen (2006), ’Implicit-

factorization preconditioning and iterative solvers for regularized saddle-point

systems’, SIAM J. Matrix Anal. Appl. 28, 170–189.

H.S. Dollar and A.J. Wathen (2006), ’Approximate factorization constraint precon-

ditioners for saddle-point matrices’, SIAM J. Sci. Comput. 27, 1555–1572.

T.F. Dupont, R.P. Kendall and H.H. Rachford (1968), ’An approximate factoriza-

tion procedure for solving self-adjoint elliptic di↵erence equations’, SIAM J.

Numer. Anal. 5, 554–573.

H.C. Elman and M.P. Chernesky (1993), ’Ordering e↵ects on relaxation methods

applied to the discrete one-dimensional convection-di↵usion equation’, SIAM

J. Numer. Anal. 30, 1268–1290.

H.C. Elman, O.G. Ernst and D.P. O’Leary (2001), ’A multigrid method enhanced

by Krylov subspace iteration for discrete Helmholtz equations’, SIAM J. Sci.

Comput. 23, 1291–1315.

H.C. Elman, V.E. Howle, J.N. Shadid, R. Shuttleworth and R. Tuminaro (2006),

’Block preconditioners based on approximate commutators’, SIAM J. Sci.

Comput. 27, 1651–1668.

H.C. Elman, A. Ramage and D.J. Silvester (2007), ’Algorithm 886: IFISS, A Mat-

lab toolbox for modelling incompressible flow’, ACM Trans. Math. Softw.

33(2), article no. 14.

H.C. Elman, A. Ramage and D.J. Silvester (2014a), ’IFISS: A computational lab-

oratory for investigating incompressible flow problems’, SIAM Rev. 56, 261–

273.

H.C. Elman and M.H. Schultz (1986), ’Preconditioning by fast direct methods

for non-self-adjoint nonseparable elliptic equations’, SIAM J. Numer. Anal.

23, 44–57.

H.C. Elman and D.J. Silvester (1996), Fast nonsymmetric iterations and precon-

ditioning for the Navier-Stokes equations’, SIAM J. Sci. Comput. 17, 33–46.

H.C. Elman, D.J. Silvester and A.J. Wathen (2014b), Finite Elements and Fast

Iterative Solvers: with applications in incompressible fluid dynamics, Second

Edition, Oxford University Press.

H.C. Elman, D.J. Silvester and A.J. Wathen (2005), Finite Elements and Fast

Iterative Solvers: with applications in incompressible fluid dynamics, Oxford

University Press.

H.C. Elman and X. Zhang (1995), ’Algebraic analysis of the hierarchical basis

preconditioner’, SIAM J. Matrix Anal. Appl. 16, 192–206.

B. Engquist and L. Ying (2011a), ’Sweeping preconditioner for the Helmholtz equa-

tion: Hierarchical matrix representation’, Comm. Pure Appl. Math. 64, 697–

735.

B. Engquist and L. Ying (2011b), ’Sweeping preconditioner for the Helmholtz equa-

tion: moving perfectly matched layers’, Multiscale Model. Simul. 9, 686–710.

42 A. J. Wathen

J. Erhel, K. Burrage and B. Pohl (1995), ’Restarted GMRES preconditioned by

deflation’, J. Comput. Appl. Math. 69, 303–318.

Y.A. Erlangga, C.W. Oosterlee and C, Vuik (2006), ’A novel multigrid based pre-

conditioner for heterogeneous Helmholtz problems’, SIAM J. Sci. Comput.

27, 1471–1492.

O.G. Ernst and M.J. Gander (2012), ’Why it is dicult to solve Helmholtz prob-

lems with classical iterative methods’, in Numerical analysis of multiscale

problems, 325–363, Springer.

V. Faber and T. Manteu↵el (1984), ’Necessary and sucient conditions for the

existence of a conjugate gradient method’, SIAM J. Numer. Anal. 21, 315–

339.

R. P. Federenko (1964), ’The speed of convergence of one iterative process’, USSR

Comp. Math. and Math. Phys. 4, 227–235.

B. Fischer (2011), Polynomial Based Iteration Methods for Symmetric Linear Sys-

tems, Society for Industrial and Applied Mathematics.

B. Fischer, A. Ramage, D.J. Silvester and A.J. Wathen (1998), ’Minimum residual

methods for augmented systems’, BIT 38, 527–543.

J. Frank and C. Vuik (2001), ’On the construction of deflation-based precondition-

ers’, SIAM J. Sci. Comput. 23, 442–462.

R.W. Freund and N.M. Nachtigal (1991), ’QMR a quasi-minimal residual method

for non-Hermitian linear systems’, Numer. Math. 60, 315–339.

R.W. Freund and N.M. Nachtigal (1994), ’A new Krylov-subspace method for

symmetric indefinite linear systems’, Proceedings of the 14th IMACS World

Congress on Computational and Applied Mathematics, W.F. Ames(Ed.),

IMACS, 1253–1256.

R.W. Freund and S. Ruscheweyh (1986), ’On a class of Chebyshev approximation

problems which arise in connection with a conjugate gradient type method’,

Numer. Math. 48, 525–542.

M.J. Gander, F. Magoule`s and F. Nataf (2002), ’Optimized Schwarz methods with-

out overlap for the Helmholtz equation’, SIAM J. Sci. Comput. 24, 38–60.

M.W. Gee, C.M. Siefert, J.J. Hu, R.S. Tuminaro and M.G. Sala (2006), ’ML 5.0

Smoothed Aggregation User’s Guide’, Sandia National Laboratories, Report

number SAND2006-2649.

M.B. Giles (2008), ’Multilevel Monte Carlo path simulation’, Oper. Res. 56, 607–

617.

L. Giraud, S. Gratton, X. Pinel and X. Vasseur (2010), ’Flexible GMRES with

deflated restarting’, SIAM J. Sci. Comput. 32, 1858–1878.

N.I.M. Gould, M.E. Hribar and J. Nocedal (2001), ’On the solution of equality

constrained quadratic programming problems arising in optimization’, SIAM

J. Sci. Comput. 23, 1376–1395.

N.I.M. Gould and V. Simoncini (2009), ’Spectral analysis of saddle point matrices

with indefinite leading blocks’, SIAM J. Matrix Anal. Appl. 31, 1152–1171.

I.G. Graham and M.J. Hagger (1999), ’Unstructured additive Schwarz-conjugate

gradient method for elliptic problems with highly discontinuous coecients’,

SIAM J. Sci. Comput. 20, 2041–2066.

L. Grasedyck and W. Hackbusch (2003), ’Construction and arithmetics of H-

matrices’, Computing 70, 295–334.

Preconditioning 43

A. Greenbaum (1997), Iterative Methods for Solving Linear Systems, Society for

Industrial and Applied Mathematics.

L, Greengard and V. Rokhlin (1987), ’A fast algorithm for particle simulations’, J.

Comput. Phys. 73, 325–348.

C. Greif and D. Scho¨tzau (2006), ’Preconditioners for saddle point linear systems

with highly singular (1,1) blocks’, Elect. Trans. Numer. Anal. 22, 114–121.

C. Greif and D. Scho¨tzau (2007), ’Preconditioners for the discretized time-harmonic

Maxwell equations in mixed form’, Numer. Lin. Alg. Appl. 14, 281–297.

M.J. Grote and T. Huckle (1997), ’Parallel preconditioning with sparse approxi-

mate inverses’, SIAM J. Sci. Comput. 18, 83–853.

A. Gu¨nnel, R. Herzog and E. Sachs (2014), ’A note on preconditioners and scalar

products in Krylov subspace methods for self-adjoint problems in Hilbert

space’, Elect. Trans. Numer. Anal. 41, 13–20.

I. Gustafsson (1978), ’A class of first order factorization methods’, BIT 18, 142–

156.

E. Haber and U.M. Ascher (2001), ’Preconditioned all-at-once methods for large,

sparse parameter estimation problems’, Inverse Probl. 17, 1847–1864.

W. Hackbusch (1994), Iterative Solution of Large Sparse Systems of Equations,

Springer-Verlag.

W. Hackbusch (1999), ’A sparse matrix arithmetic based on H-matrices. I. Intro-

duction to H-matrices’, Computing 62, 89–108

W. Hackbusch, B.K. Khoromskij and S. Sauter (2000), ’On H2-matrices’, in Lec-

tures on Applied Mathematics, 9–29, H. Bungartz, R. Hoppe and C. Zenger,

eds..

M. Heil, A.L. Hazel and J. Boyle (2008), ’Solvers for large-displacement fluid-

structure interaction problems: segregated versus monolithic approaches’,

Comput. Mech. 43, 91–101.

V.E. Henson and U.M. Yang (2000), ’BoomerAMG: a parallel algebraic multigrid

solver and preconditioner’, Appl. Numer. Math. 41, 155–177.

M.R. Hestenes and E. Stiefel (1952), ’Methods of conjugate gradients for solving

linear problems’, J. Res. Nat. Bur. Stand. 49, 409–436.

R. Hiptmair and J. Xu (2007), ’Nodal auxiliary space preconditioning in H(curl)

and H(div) spaces’, SIAM J. Numer. Anal. 45, 2483–2509.

R.M. Holland, A.J. Wathen and G.J. Shaw, (2005), ’Sparse approximate inverses

and target matrices’, SIAM J. Sci. Comput. 26, 1000–1011.

’HSL(2013). A collection of fortran codes for large scale scientific computation’,

http://www.hsl.rl.ac.uk .

V. John and L. Tobiska (2000), ’Numerical performance of smoothers in coupled

multigrid methods for the parallel solution of the incompressible Navier-

Stokes equations’, Int. J. Numer. Meth. Fluids 33, 453–473.

T.B. Jonsthovel, M.B. van Gijzen, S.C. Maclachlan, C. Vuik and A. Scarpas (2012),

’Comparison of the deflated preconditioned conjugate gradient method and

algebraic multigrid for composite materials’, Comput. Mech. 50, 321–333.

D. Kay, D. Loghin and A.J. Wathen (2002), ’A preconditioner for the steady-state

Navier-Stokes equations’, SIAM J. Sci. Comput. 24, 237–256.

C. Keller, N.I.M. Gould and A.J. Wathen (2000), ’Constraint preconditioning for

indefinite linear systems’, SIAM J. Matrix Anal. Appl. 21, 1300–1317.

44 A. J. Wathen

A. Klawonn (1998), ’Block-triangular preconditioners for saddle point problems

with a penalty term’, SIAM J. Sci. Comput. 19, 172–184.

A. Klawonn and G. Starke (1999), ’Block triangular preconditioners for nonsym-

metric saddle point problems: field-of-value analysis’, Numer. Math. 81, 577–

594.

L.Y. Kolotilina and A.Y. Yeremin (1993), ’Factorized sparse approximate inverse

preconditioning. I. Theory’, SIAM J. Matrix Anal. Appl. 14, 45–58.

J. Korzak (1999), ’Eigenvalue relations and conditions of matrices arising in linear

programming’, Computing 62, 45–54.

A.L. Laird and M.B. Giles (2002), ’Preconditioned iterative solution of the 2D

Helmholtz equation’, report NA-02/12, Oxford University Computer Labora-

tory, Oxford, UK.

R.B. Lehoucq and D.C. Sorensen (1996), ’Deflation techniques for an implicitly

restarted Arnoldi iteration’, SIAM J. Matrix Anal. Appl. 17, 789–821.

X.S. Li (2005), ’An overview of SuperLU: Algorithms, implementation and user

interface’, ACM Trans. Math. Software 31, 302–325.

J. Liesen and Z. Strakosˇ (2013), Krylov subspace methods : principles and analysis,

Oxford University Press.

O.E. Livne and A. Brandt (2012), ’Lean Algebraic Multigrid (LAMG): Fast graph

Laplacian linear solver’, SIAM J. Sci. Comput. 34, B499–B522.

D. Loghin and A.J. Wathen (2004), ’Analysis of preconditioners for saddle-point

problems’, SIAM J. Sci. Comput. 25, 2029–2049.

J. Ma´lek and Z. Strakosˇ (2014), Preconditioning and the Conjugate Gradient

Method in the Context of Solving PDEs, Society for Industrial and Applied

Mathematics.

S. Manservisi (2006), ’Numerical analysis of Vanka-type solvers for steady Stokes

and Navier-Stokes flows’, SIAM J. Numer. Anal. 44, 2025–2056.

K.-A. Mardal and R. Winther (2011), ’Preconditioning discretizations of systems

of partial di↵erential equations’, Numer. Linear Algebra Appl. 18, 1–40.

T.P. Mathew (2008), Domain Decomposition Methods for the Numerical Solution

of Partial Di↵erential Equations, Springer.

J.A. Meijerink and H.A. van der Vorst (1977), ’An iterative solution method for

linear systems of which the coecient matrix is a symmetric M-matrix’,Math.

Comput. 31, 148–162.

G.A. Meurant (1999), Computer solution of large linear systems, North-Holland.

M. Magolu Monga Made, R. Beauwens and G. Warze´e (2000), ’Preconditioning

of discrete Helmholtz operators perturbed by a diagonal complex matrix’,

Commun. Numer. Methods Engrg. 16, 801–817.

M.F. Murphy, G.H. Golub and A.J. Wathen (2000), ’A note on preconditioning for

indefinite linear systems’, SIAM J. Sci. Comput. 21, 1969–1972.

M. Neytcheva (1999), talk presented at the first International Conference on Pre-

conditioning, Minneapolis, 1999.

M.K. Ng (2004), Iterative Methods for Toeplitz Systems, Oxford University Press.

Y. Notay (2012), ’Aggregation-based algebraic multigrid for convection-di↵usion

equations’, SIAM J. Sci. Comput. 34, A2288–A2316.

D.P. O’Leary ’Yet another polynomial preconditioner for the conjugate gradient

algorithm’, Linear Algebra Appl. 154–156, 377–388.

Preconditioning 45

M.A. Olshanskii and E.E. Tyrtyshnikov (2014), Iterative Methods for Linear Sys-

tems: Theory and Applications, Society for Industrial and Applied Mathe-

matics.

M.E.G. Ong (1997), ’Hierarchical basis preconditioners in three dimensions’, SIAM

J. Sci. Comput. 18, 479–498.

K. Otto (1996), ’Analysis of preconditioners for hyperbolic partial di↵erential equa-

tions’, SIAM J. Numer. Anal. 33, 2131–2165.

C.C. Paige and M.A. Saunders (1975), ’Solution of sparse indefinite systems of

linear equations’, SIAM J. Num. Anal. 12, 617–629.

C.C. Paige and M.A. Saunders (1982), ’LSQR: An algorithm for sparse linear

equations and sparse least squares’, ACM Trans. Math. Software 8, 43–71.

M.L. Parks, E. de Sturler, G. Mackey, D.D. Johnson and S. Maiti (2006), ’Recycling

Krylov subspaces for sequences of linear systems’, SIAM J. Sci. Comput.

28, 1651–1674.

J. Pestana and A.J. Wathen (2014), ’A preconditioned MINRES method for non-

symmetric Toeplitz matrices’, Numerical Analysis report 14/11, Oxford Uni-

versity Mathematical Institute, Oxford, UK.

J. Pestana and A.J. Wathen (2013a), ’Combination preconditioning of saddle point

systems for positive definiteness’, Numer. Linear Algebra Appl. 20, 785–808.

J. Pestana and A.J. Wathen (2013b), ’On choice of preconditioner for mini-

mum residual methods for non-Hermitian matrices’, J. Comput. Appl. Math.

249, 57–68.

J. Pestana and A.J. Wathen (2015), ’Natural preconditioning and iterative methods

for saddle point systems’, SIAM Rev. to appear.

E.G. Phillips, H.C. Elman, E.C. Cyr, J.N. Shadid and R.P. Pawlowski (2014), ’A

block preconditioner for an exact penalty formulation for stationary MHD’,

University of Maryland, Computer Science report CS-TR-5032.

J. Poulson, B. Engquist, S. Li and L. Ying (2013), ’A parallel sweeping precon-

ditioner for heterogeneous 3D Helmholtz equations’, SIAM J. Sci. Comput.

35, C194–C212.

A. Quarteroni and A. Valli (1999), Domain Decomposition Methods for Partial

Di↵erential Equations, Oxford University Press.

A. Ramage (1999), ’A multigrid preconditioner for stabilised discretisations of

advection-di↵usion problems’, J. Comput. Appl. Math. 101, 187–203.

T. Rees, H.S. Dollar and A.J. Wathen (2010a), ’Optimal solvers for PDE-

constrained optimization’, SIAM J. Sci. Comput. 32, 271–298.

T. Rees, M. Stoll and A.J. Wathen (2010b), ’All-at-once preconditioning in PDE-

constrained optimization’, Kybernetika 46, 341–360.

S. Rhebergen, G.N. Wells, R.F. Katz and A.J. Wathen (2014), ’Analysis of block-

preconditioners for models of coupled magma/mantle dynamics’, SIAM J.

Sci. Comput. 36, A1960–A1977.

Y. Saad (1993), ’A flexible inner-outer preconditioned GMRES algorithm’, SIAM

J. Sci. Comput. 14, 461–469.

Y. Saad (1996), ’ILUM: a multi-elimination ILU preconditioner for general sparse

matrices’, SIAM J. Sci. Comput. 17, 830–847.

Y. Saad (2003), Iterative Methods for Sparse Linear Systems, second edition, So-

ciety for Industrial and Applied Mathematics.

46 A. J. Wathen

Y. Saad and M.H. Schultz (1986), ’GMRES: A generalized minimal residual algo-

rithm for solving nonsymmetric linear systems’, SIAM J. Sci. Statist. Com-

put. 7, 856–869.

J.A. Sifuentes, M. Embree and R.B. Morgan (2013), ’GMRES convergence for per-

turbed coecient matrices, with application to approximate deflation precon-

ditioning’, SIAM. J. Matrix Anal. Appl. 34, 1066–1088.

D.J. Silvester and A.J. Wathen (1994), ’Fast iterative solution of stabilised Stokes

systems Part II: Using general block preconditioners’, SIAM J. Numer Anal.

31, 1352–1367.

V. Simoncini and D.B. Szyld (2007), ’Recent computational developments in

Krylov subspace methods for linear systems’, Numer. Linear Algebra Appl.

14, 1–59.

G.L.G. Sleijpen and D.R. Fokkema (1993), ’BiCGstab(ell) for linear equations in-

volving unsymmetric matrices with complex spectrum’, Elect. Trans. Numer.

Anal. 1, 11–32.

B. Smith, P. Bjørstad and W. Gropp (1996), Domain Decomposition, Parallel Mul-

tilevel Methods for Elliptic Partial Di↵erential Equations, Cambridge Univer-

sity Press.

P. Sonneveld and M.B. van Gijzen (2008), ’IDR(s): A family of simple and fast

algorithms for solving large nonsymmetric systems of linear equations’, SIAM

J. Sci. Comput. 31, 1035–1062.

D.A. Spielman and S.-H. Teng (2014), ’Nearly-linear time algorithms for precon-

ditioning and solving symmetric, diagonally dominant linear systems’, SIAM

J Matrix Anal. Appl. 35, 835–885.

M. Stoll and A.J. Wathen, (2008), ’Combination preconditioning and the Bramble-

Pasciak+ preconditioner’, SIAM J Matrix Anal. Appl. 30, 582–608.

G. Strang (1986), ’A proposal for Toeplitz matrix calculations’, Stud. Appl. Math.

74, 171–176.

W-P. Tang and W.L. Wan (2000), ’Sparse approximate inverse smoother for multi-

grid’, SIAM J. Matrix Anal. Appl. 21, 1236–1252.

O. Taussky (1972), ’The role of symmetric matrices in the study of general matri-

ces’, Linear Algebra Appl. 5, 147–154.

A. Toselli and O. Widlund (2004), Domain Decompposition Methods - Algorithms

and Theory, Springer Series in Computational Mathematics, Vol. 34.

L.N. Trefethen and M. Embree (2005), Spectra and Pseudospectra, Princeton Uni-

versity Press.

U. Trottenberg, C.W. Oosterlee and A. Schu¨ler (2000), Multigrid, Academic Press.

P. Tsuji, B. Engquist and L.Ying (2012), ’A sweeping preconditioner for time-

harmonic Maxwells equations with finite elements’, J. Comput. Phys.

231, 3770–3783.

M.B. van Gijzen, Y.A. Erlangga and C. Vuik (2007), ’Spectral analysis of the

discrete Helmholtz operator preconditioned with a shifted Laplacian’, SIAM

J. Sci. Comput. 29, 1942–1958.

H.A. van der Vorst (1992), ’Bi-CGSTAB: A fast and smoothly converging variant of

Bi-CG for the solution of nonsymmetric linear systems’, SIAM J. Sci. Statist.

Comput. 13. 631–644.

Preconditioning 47

H.A. van der Vorst (2003), Iterative Krylov Methods for Large Linear Systems,

Cambridge University Press.

C.F. Van Loan (1992), Computational Frameworks for the Fast Fourier Transform,

Society for Industrial and Applied Mathematics.

P.S. Vassilevski (2008), Multilevel Block Factorization Preconditioners, Matrix-

based Analysis and Algorithms for Solving Finite Element Equations,

Springer.

R. Verfu¨rth (1984), ’A combined conjugate gradient-multigrid algorithm for the

numerical solution of the Stokes problem’, IMA J. Numer. Anal. 4, 441–455.

A.J. Wathen (1986), ’Realistic eigenvalue bounds for the Galerkin mass matrix’,

IMA J. Numer. Anal. 7 449–457.

A.J. Wathen (2007), ’Preconditioning and convergence in the right norm’, Int. J.

Comput. Math. 84, 1199–1209.

A.J. Wathen, B. Fischer and D.J. Silvester (1995), ’The convergence rate of the

minimum residual method for the Stokes problem’, Numer. Math. 71, 121–

134.

A.J. Wathen and T. Rees (2009), ’Chebyshev semi-iteration in preconditioning for

problems including the mass matrix’, Elect. Trans. Numer. Anal. 34, 125–135.

M.P. Wathen (2014), ’Iterative solution of a mixed finite element discretisation of

an incompressible magnetohydrodynamics problem’, MSc dissertation, De-

partment of Computer Science, University of British Columbia.

O. Widlund (1978), ’A Lanczos method for a class of nonsymmetric systems of

linear equations’, SIAM J. Numer. Anal. 15, 801–812.

P.H. Worley (1991), ’Limits on parallelism in the numerical solution of linear partial

di↵erential equations’, SIAM J. Sci. Stat. Comput. 12, 1–35.

J. Xu (1992), ’Iterative methods by space decomposition and subspace correction’,

SIAM Rev. 34, 581–613

G. Ymbert, M. Embree and J. Sifuentes (2015), ’Approximate Murphy-Golub-

Wathen preconditioning for saddle point problems’, In Preparation.

H. Yserentant (1986), ’On the multi-level splitting of finite element spaces’, Numer.

Math. 49, 379–412.