MATH256 2021–22 Numerical Methods (5 Parts)
Contents
2 Errors and rounding
2.1 Computer storage and correct rounding
2.2 Absolute and relative error
2.2.1 Example: decimal places
2.2.2 Example: significant figures
2.2.3 Example: approximation of e
2.2.4 Series summation condition
2.3 Cancellation
2.4 Error propagation
2.4.1 Example: square root
2.4.2 Example: addition
2.4.3 Example: subtraction
2.4.4 Example: approximation of x − y
2.4.5 Example: roots of a quadratic equation
2.5 Avoiding cancellation
2.5.1 Example: reformulation of roots formula
2.5.2 Example: a Maple test
2.6 Complex arithmetic
2.6.1 Example
2.7 Recurrence relations
2.7.1 Example
2.7.2 Miller's algorithm
2 Errors and rounding
Main message in one sentence:
If all numbers are restricted to a fixed number of digits (the rest being thrown away),
e.g. x = 0.x1x2 . . . x10 × 10^n for 10 digits, what happens to the accuracy of subsequent operations?
In Part 1 (or chapter 1), we encountered the situation in which we needed to break a loop to obtain an
approximate value for an infinite series. We used the condition
|tn+1| < 0.5 × 10^−10 × |Sn|, (∗)
where Sn is the partial sum up to term n and tn+1 is the first term omitted. In this chapter, we will examine
errors and approximations in more detail. In the process, we will see the justification for (∗).
2.1 Computer storage and correct rounding
Any nonzero x ∈ R can be written in scientific notation; that is
x = a × 10^p, where 0.1 ≤ |a| < 1 and p ∈ Z.
The first nonzero digit of a always appears immediately after the decimal point. For example,
0.01206 = 0.1206 × 10^−1,
203.002 = 0.203002 × 10^3,
and
−π = −0.31415 . . . × 10^1.
Computers store numerical values in a similar way, using a fixed number of significant digits. Consequently,
most operations are subject to rounding error. With the default settings, Maple uses ten digits, so a typical
number ã in Maple has the form (omitting the exponent)
ã = 0.a1a2 . . . a10 = a1 × 10^−1 + a2 × 10^−2 + . . . + a10 × 10^−10.
Here the digits aj are integers between 0 and 9 inclusive, except a1, which cannot be zero (unless ã = 0
exactly). We care about how a nearby exact number a = 0.a1a2 . . . a10a11 . . . is approximated by ã
through rounding. The correctly rounded approximation to a is the most accurate approximation stopping
at a10. The true value a then satisfies
|a − ã| ≤ 5 × 10^−11 = 0.5 × 10^−10.
Note that numbers ending in 5 can be rounded in two equidistant ways, though traditional rounding
chooses one of them.
For example, to round the 11-digit number a = 0.12345678905 to only 10 digits, we can take
a˜ = 0.1234567890 or a˜ = 0.1234567891
with |a− a˜| = 5× 10−11 in both cases. This is called a tie. The elementary rule ‘5 rounds up’ is usually not
used to resolve these because always rounding up can cause errors to accumulate. By default, Maple resolves
ties by requiring that the final stored digit is even. Each error caused by a tie then has a 50% chance of
being positive and a 50% chance of being negative, so they are unlikely to accumulate. Of course, numbers
not ending in 5 never encounter such a tie; e.g. a = 0.12345678914 → ã = 0.1234567891, with no other
choice.
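Maple's tie-breaking rule can be imitated outside Maple; the sketch below uses Python's decimal module (an illustration only, since the course software is Maple):

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

# Round the tie a = 0.12345678905 to ten digits under two tie-breaking rules.
a = Decimal("0.12345678905")
ten_places = Decimal("0.0000000001")   # quantize to 10 decimal places

half_even = a.quantize(ten_places, rounding=ROUND_HALF_EVEN)  # Maple-style
half_up = a.quantize(ten_places, rounding=ROUND_HALF_UP)      # '5 rounds up'

print(half_even)   # 0.1234567890 (last stored digit forced to be even)
print(half_up)     # 0.1234567891
```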
Remarks:
• We may visualise why the digit 5 is special in rounding from the number line below:
|(10)| --- 9 --- 8 --- 7 --- 6 --- [5] --- 4 --- 3 --- 2 --- 1 --- |(0)|
• If x and y are stored exactly (and y ≠ 0), then the result of x ± y, xy and x/y will be correctly
rounded, as will √x and √y. However, this applies for a single operation only. Maintaining correct
rounding throughout a long calculation is more or less impossible.
• In other words, rounding error cannot be avoided, so we must accept that the last digit in a result is
likely to be wrong (perhaps more in long calculations).
• Although it is not our intention to dig deep into how these elementary operations are implemented,
it is not hard to see that one must take care of the exponents p first and then operate on the
decimal numbers a (often called mantissas) as below. Assume p2 ≥ p1 for
x1 = a1 × 10^p1 and x2 = a2 × 10^p2:
x2 ± x1 = (a2 × 10^(p2−p1) ± a1) × 10^p1 ≈ ã3 × 10^p̃3,
x1/x2 = (a1/a2) × 10^(−(p2−p1)) ≈ ã4 × 10^p̃4,
x1 × x2 = (a1 × a2) × 10^(p1+p2) ≈ ã5 × 10^p̃5,
where possibly p̃4 = −(p2−p1) or 1−(p2−p1), and p̃5 = p1+p2 or p1+p2−1, since 0.1 < |a1/a2| < 10
and |a1 × a2| < 1.
• The number of digits stored by Maple can be increased by changing the parameter Digits. Remember:
this controls the accuracy to which numbers are stored at each stage of a calculation. It does not
guarantee the number of correct significant figures in the final answer.
• Rounding error places a bound on accuracy. Trying to beat this (e.g. by including more terms in a
series) will never succeed.
Example
Suppose that x = 0.13, and consider the two approximations
cos(x) ≈ 1 − x^2/2! + x^4/4!   and   cos(x) ≈ 1 − x^2/2! + x^4/4! − x^6/6!.
There are two sources of error here: rounding error and the truncation of an infinite series. Using ten digit
arithmetic, we obtain
cos(x) ≈ 0.9915619004 and cos(x) ≈ 0.9915618937.
The second approximation is more accurate, so we have made an improvement by including an extra term
in the series. In fact, the second result is correctly rounded. Adding more terms cannot further improve the
accuracy unless we also decrease rounding error by including more digits.
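The two ten-digit values can be reproduced outside Maple; this sketch uses Python's decimal module to emulate Digits = 10 (an illustration, not the course's Maple code):

```python
from decimal import Decimal, getcontext
from math import factorial

getcontext().prec = 10          # emulate Maple's default ten-digit arithmetic
x = Decimal("0.13")

def cos_partial(x, n_terms):
    """Partial sum of cos(x) = sum_{j>=0} (-1)^j x^(2j) / (2j)!."""
    s = Decimal(0)
    for j in range(n_terms):
        s += (-1) ** j * x ** (2 * j) / Decimal(factorial(2 * j))
    return s

print(cos_partial(x, 3))        # 0.9915619004 (up to x^4/4!)
print(cos_partial(x, 4))        # 0.9915618937 (up to x^6/6!, correctly rounded)
```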
2.2 Absolute and relative error
Notation: For any exact quantity x, x̃ represents an approximate value.
The absolute error is given by
Ex = x̃ − x.
The significance of absolute error depends on the scale of the numbers in question.
• A 5mm error in measuring out a football pitch is not important.
• A 5mm error in eye surgery is likely to be disastrous.
To take the magnitude of x into account, we can use the relative error:
εx = (x̃ − x)/x = x̃/x − 1.
The percentage error is 100× |εx|. Relative and percentage error are undefined if x = 0.
Remark: In general, the sign of the error (absolute or relative) does not matter, only the magnitude is
important.
2.2.1 Example: decimal places
This is a measure of absolute error; it takes no account of the magnitude of the numbers we are dealing
with. We must define exactly what we mean when we say an approximation is accurate to n decimal places.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Attempt 1: Count matching digits after the decimal point.
Counterexample: x = 1.00001, x̃1 = 1.00200, x̃2 = 0.99991.
Clearly, x̃1 approximates x to two places, but the more accurate approximation is x̃2, in which none of the
digits after the decimal point are the same as those of x.
Attempt 2: Round x̃ to n decimal places, then check if the results are the same.
This fixes the above problem; x̃2 = 1.000 to three places and 0.9999 to four, so x̃2 = x to 3 d.p.
Counterexample: x = 1.530, x̃1 = 1.535, x̃2 = 1.525.
Here we have a tie (see §2.1): x̃1 and x̃2 are equidistant from x. It makes no sense to claim that
either approximation is 'better.' It also makes no sense to allow the number of correct d.p. to be influenced
by the mechanism used for resolving ties.
Attempt 3: Consider the difference between x and x̃. If |x − x̃| < 10^−n, then there are n zeros after the
decimal point and x̃ is accurate to n places.
Counterexample: x = 1.560, x̃ = 1.55001.
Clearly, x̃ is accurate to one place, but x − x̃ = 0.00999, which has two zeros after the decimal point. To
prevent this, we require the (n+1)th digit after the decimal point in x̃ − x to be less than 5, so that it does
not round up. Hence (finally), we have the following definition.
To state that x̃ is an approximation to x, accurate to n decimal places, means that
|x − x̃| < 5 × 10^−(n+1) = (1/2) × 10^−n.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
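The definition translates directly into a short checker; a minimal sketch in Python (the helper name is invented for illustration):

```python
def correct_decimal_places(x, x_approx):
    """Largest n with |x - x_approx| < 0.5 * 10^(-n); returns 0 when the
    approximation is not even accurate to one decimal place."""
    n = 0
    while abs(x - x_approx) < 0.5 * 10 ** -(n + 1):
        n += 1
    return n

print(correct_decimal_places(1.00001, 0.99991))   # 3 (Attempt 2's example)
print(correct_decimal_places(1.560, 1.55001))     # 1 (Attempt 3's example)
```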
2.2.2 Example: significant figures
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
This is a measure of relative error. Again, we must make a precise definition. We begin by writing x using
scientific notation, that is
x = a × 10^p, where 0.1 ≤ |a| < 1 and p ∈ Z.
Since the first nonzero digit in a comes immediately after the decimal point, n decimal places in a is
equivalent to n significant figures in x. Thus, if
x̃ = ã × 10^p
then x̃ approximates x to n significant figures if and only if
|a − ã| < (1/2) × 10^−n. (∗)
Now
(a − ã)/a = (10^p × (a − ã))/(10^p × a) = (x − x̃)/x,
so (∗) yields
|(x − x̃)/x| × |a| < (1/2) × 10^−n.
Finally, since |a| < 1, we have the following:
If |(x − x̃)/x| ≤ (1/2) × 10^−n then x̃ approximates x to n significant figures.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Important remark: This is a sufficient but not necessary condition. It is stronger than necessary
because we have dropped |a| from the left-hand side. If n is the largest integer for which it holds, then we
definitely have n s.f. and we may have n+ 1 (a could be as small as 0.1). We definitely do not have n+ 2
s.f.
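The sufficient test can likewise be written as a short routine; a sketch in Python (the function name is invented for illustration):

```python
import math

def sig_figs_guaranteed(x, x_approx):
    """Largest n with |(x - x_approx)/x| <= 0.5 * 10^(-n): the number of
    significant figures guaranteed by the sufficient test (we may have n+1)."""
    rel = abs((x - x_approx) / x)
    if rel == 0:
        return math.inf        # exact: every significant figure is correct
    n = 0
    while rel <= 0.5 * 10 ** -(n + 1):
        n += 1
    return n

print(sig_figs_guaranteed(math.e, 2721 / 1001))   # 7
print(sig_figs_guaranteed(1.0, 1.049))            # 1
```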
2.2.3 Example: approximation of e
• If we approximate e using
ẽ = 2721/1001,
then
εe = (ẽ − e)/e = 2721/(1001e) − 1 ≈ −0.405 × 10^−7.
The test shows that we have 7 s.f. but we may have 8. We don't, because
e ≈ 2.718281828 whereas ẽ ≈ 2.718281718.
• If x = 1 and x̃ = 1.049, then
εx = (1.049 − 1)/1 = 0.049 < 0.5 × 10^−1.
The test shows that we have 1 s.f. but we may have 2 (we do).
• If x = 1 and x̃ = 1.051, then
εx = (1.051 − 1)/1 = 0.051 < 0.5 × 10^0.
The test shows that we may have 1 s.f. (we do).
• If ε > 0.5, then the largest possible value for n is actually −1, so we definitely don’t have 1 s.f. An
approximation with a relative error greater than 0.5 is largely useless.
2.2.4 Series summation condition
Consider a series in which the sum up to term n is Sn, and tn+1 is the first term omitted.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Let x = Sn+1 and x̃ = Sn. Then
(x − x̃)/x = (Sn+1 − Sn)/Sn+1 = tn+1/(Sn + tn+1) = (tn+1/Sn) × 1/(1 + tn+1/Sn).
If |tn+1| < 0.5 × 10^−10 × |Sn| then the last factor is very close to 1, so Sn+1 and Sn are (almost certainly)
the same to 10 s.f.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
This explains the condition we use for breaking a loop in a series summation: nothing is achieved by
including tn+1 in the sum.
Remarks: (i) Careless use of this condition can produce incorrect results (see Practice Problem 3.2 for an
example). However, it is a good practical choice in many cases.
(ii) The above study of Sn and Sn+1 is not a complete analysis, because it does not ensure
that the remainder |Rn| = |Σ_{k=n+1}^∞ tk| is less than 0.5 × 10^−10. It is not trivial to give a general formula.
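In code, the condition says: stop as soon as the next term could not change the stored sum. A sketch for the exponential series in Python (floats carry about 16 digits, so tol here plays the role of Maple's ten-digit threshold):

```python
import math

def exp_series(x, tol=0.5e-10):
    """Sum exp(x) = 1 + x + x^2/2! + ... until the first omitted term
    satisfies |t_{n+1}| < tol * |S_n|, the condition (*) from the text."""
    s, term, n = 1.0, 1.0, 0
    while True:
        n += 1
        term *= x / n                 # next term x^n / n!
        if abs(term) < tol * abs(s):  # nothing is achieved by adding it
            break
        s += term
    return s

print(exp_series(1.0))                # agrees with math.e to ~10 digits
```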
2.3 Cancellation
Rounding error alone is not usually problematic, but certain operations can magnify existing errors, which is
much more dangerous. Consider the example shown in quadratic.mw. Here, the quadratic equation
t^2 − 42t + 1 = 0
is solved using the standard formula, with the result
t1 = 41.97617696 and t2 = 0.02382304.
The second root has only seven significant digits; three have been lost. A few such operations will produce
results that are completely wrong, so we need to investigate what has happened.
2.4 Error propagation
We can use the fact that εx = x̃/x − 1, and so
x̃ = (1 + εx)x,
to study how errors propagate through calculations. For a given operation f , we need to find an expression
for the relative error in f(x̃); that is
εf = (f(x̃) − f(x))/f(x),
in terms of x and εx. Note that we are not concerned here with the error generated by computing f (which
should be small). Rather, we are concerned with the effect that f has on the pre-existing error in x̃. As we
will see, this can be significant.
2.4.1 Example: square root
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Assuming that x > 0, we have
√x̃ = √((1 + εx) x),
and so
ε√x = (√x̃ − √x)/√x = √(1 + εx) − 1.
Now we expand √(1 + εx) as a Taylor series; thus
√(1 + εx) = 1 + εx/2 − εx^2/8 + · · · .
Since εx is small, we neglect terms in εx^2, εx^3, etc. Hence,
ε√x ≈ εx/2,
meaning the relative error in the result is approximately half the original value. Taking a square root does
not amplify existing errors.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
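This halving effect is easy to observe numerically; a small sketch (the values of x and the planted error are arbitrary):

```python
import math

x = 2.0
eps_x = 1e-6                       # pre-existing relative error in x
x_tilde = (1 + eps_x) * x          # contaminated value

eps_sqrt = math.sqrt(x_tilde) / math.sqrt(x) - 1
print(eps_sqrt / eps_x)            # close to 0.5, as predicted
```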
2.4.2 Example: addition
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Let x > 0 and y > 0. We have x̃ = (1 + εx)x and ỹ = (1 + εy)y, so that
εx+y = ((x̃ + ỹ) − (x + y))/(x + y) = ((x + xεx + y + yεy) − (x + y))/(x + y)
     = εx × x/(x + y) + εy × y/(x + y).
Now if we let
ε = max{|εx|, |εy|},
then we observe that
|εx+y| ≤ ε × x/(x + y) + ε × y/(x + y) = ε,
meaning that addition of positive quantities does not magnify existing errors. The same argument works if
x < 0 and y < 0.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
2.4.3 Example: subtraction
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Again, we assume that x and y have the same sign. We then have
εx−y = ((x̃ − ỹ) − (x − y))/(x − y) = ((x + xεx − y − yεy) − (x − y))/(x − y),
which simplifies to
εx−y = εx × x/(x − y) − εy × y/(x − y). (∗)
A significant magnification of rounding error may occur when x ≈ y, because the coefficients multiplying εx
and εy can be large in this case.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
The reason for this is simple: if x ≈ y, the leading digits of x and y must be the same, and these will
cancel when we compute x − y, reducing the number of correct significant figures. This effect is called
catastrophic cancellation.
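The magnification factor x/(x − y) can be seen directly; a sketch with an artificial relative error of 10^−10 planted in x:

```python
# Subtracting nearby values magnifies a pre-existing relative error.
x, y = 1.000001, 1.0
eps_x = 1e-10                      # planted relative error in x; y is exact
x_tilde = (1 + eps_x) * x

eps_diff = ((x_tilde - y) - (x - y)) / (x - y)
print(eps_diff)                    # ~1e-4: magnified by x/(x - y) ~ 1e6
```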
2.4.4 Example: approximation of x− y
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Let x = π and let y = 3.14 (exactly). Suppose we use x̃ = 22/7 and ỹ = y to approximate x − y. Then
εy = 0 and
εx = (22/7 − π)/π ≈ 0.40 × 10^−3,
which shows that the initial approximation to π is accurate to three or four significant figures (it's three).
Substituting into (∗), we find that
εx−y ≈ 0.40 × 10^−3 × π/(π − 3.14) ≈ 0.79,
so we won't get any correct digits if we use x̃ − ỹ to calculate x − y. It is easy to check that this is indeed
the case:
x − y = π − 3.14 = 0.001592654
is quite different from
x̃ − ỹ = 22/7 − 3.14 = 0.002857143.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
2.4.5 Example: roots of a quadratic equation
We can now explain why solving the quadratic equation t^2 − 42t + 1 = 0 produced an inaccurate result (see
§2.3). The roots are given by
t1 = (42 + √1760)/2   and   t2 = (42 − √1760)/2.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
There is no problem in calculating t1. For t2, let x = 42 and y = √1760 and substitute these into (∗) to
see the effect of the subtraction. We obtain
εt2 = εx × 42/(42 − √1760) − εy × √1760/(42 − √1760).
Now εx = 0, because there is no error in representing 42 using 10 s.f., but
εy = (√1760 rounded to 10 s.f. − √1760)/√1760 ≈ −1.6 × 10^−10,
meaning that
|εt2| ≈ 1.6 × 10^−10 × √1760/(42 − √1760) ≈ 1.6 × 10^−10 × 880.5 ≈ 1.4 × 10^−7.
The relative error has been magnified by almost 1000; hence the loss of three significant digits.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Remarks:
• Subtraction always causes some cancellation, but the effect is not catastrophic if the values are
reasonably far apart. In general, there is no need to be concerned about it if y ≥ 2x or y ≤ x/2. (Try
putting y = 2x or y = x/2 into example 2.4.3 to see why.)
• A calculation that can be carried out without catastrophic cancellation is said to be safe.
• To prove that a calculation is safe, we need only show that it does not involve subtraction of nearby
values.
2.5 Avoiding cancellation
It is sometimes possible to avoid catastrophic cancellation by using a different method to compute a result.
2.5.1 Example: reformulation of roots formula
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Consider again the problem of solving the quadratic equation
at^2 + bt + c = 0,
in the case where b^2 > 4ac, so that there are two real roots. The larger root can be safely calculated, so
we define
t1 = (−b − √(b^2 − 4ac))/(2a) if b ≥ 0,
t1 = (−b + √(b^2 − 4ac))/(2a) if b < 0.
Now we find t2 by writing the quadratic in terms of its roots; thus
a(t − t1)(t − t2) = at^2 + bt + c,
i.e.
at^2 − a(t1 + t2)t + at1t2 = at^2 + bt + c,
so that matching coefficients yields
b = −a(t1 + t2)   and   c = at1t2.
Rearranging the last formula, we obtain
t2 = c/(at1).
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
This can be used to safely compute t2, as shown in the improved procedure solve_quadratic_v2 (see
quadratic.mw).
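A sketch of this reformulation in Python (the course's version is the Maple procedure solve_quadratic_v2 in quadratic.mw; this is only an imitation):

```python
import math

def solve_quadratic(a, b, c):
    """Real roots of a*t^2 + b*t + c = 0, assuming b^2 > 4ac.

    The larger-magnitude root t1 is computed without cancellation by
    matching the sign of b; t2 is then recovered from c = a*t1*t2."""
    d = math.sqrt(b * b - 4 * a * c)
    t1 = (-b - d) / (2 * a) if b >= 0 else (-b + d) / (2 * a)
    t2 = c / (a * t1)
    return t1, t2

t1, t2 = solve_quadratic(1, -42, 1)
print(t1, t2)    # t1 ~ 41.976177, t2 ~ 0.0238230366 (all digits reliable)
```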
2.5.2 Example: a Maple test
Consider the function
g(x) = (sin(x) − x)/x^3.

g := x -> ( sin( x ) - x ) / x^3 :
g( 0.0002 ) ;
                        -0.1625000000
evalf[32]( g( 0.0002 ) ) ;
                        -0.16666666633333333365079375000000
Using Maple, we see that calculating g(0.0002) using ten digit arithmetic produces only two correct digits in
the answer. To see why this happens, assume x is stored exactly, so that εx = 0, but that sin(x) is subject
to rounding error.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Then the formula from §2.4.3 yields
εx−sinx = −εsinx × sin x/(x − sin x)
        = −εsinx × sin 0.0002/(0.0002 − sin 0.0002)
        ≈ −εsinx × 1.5 × 10^8.
The error is magnified by approximately 108, so we can expect to lose eight digits; starting with ten leaves
only two. To correct this problem, we use the Taylor series for sin x. Thus
g(x) = (1/x^3) [ −x + Σ_{j=1}^∞ (−1)^(j−1) x^(2j−1)/(2j − 1)! ]
     = Σ_{j=2}^∞ (−1)^(j−1) x^(2j−4)/(2j − 1)!
     = Σ_{j=1}^∞ (−1)^j x^(2j−2)/(2j + 1)!.
Using only one term from this expansion yields g(0.0002) ≈ −0.1666666667, which has nine correct digits.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Remarks:
• The best way to deal with a function such as g(x) is to write a procedure that uses the series if |x| is
small and the definition g(x) = (sin(x) − x)/x^3 otherwise.
• Although it is sometimes possible to counter the effect of cancellation by calculating with more
significant figures than the problem requires, this is a crude, inefficient and unreliable approach. It
is certainly no substitute for a sound numerical method. Errors in an unsound program can grow
exponentially and will contaminate results no matter how many digits are used.
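A sketch of such a hybrid procedure in Python (the threshold 0.01 is an arbitrary illustrative choice; the course would write this in Maple):

```python
import math

def g(x, switch=1e-2):
    """(sin(x) - x) / x^3 without catastrophic cancellation near 0.

    For |x| < switch we use the first few series terms
    -1/3! + x^2/5! - x^4/7!; otherwise the direct formula is safe."""
    if abs(x) < switch:
        return -1.0 / 6 + x**2 / 120 - x**4 / 5040
    return (math.sin(x) - x) / x**3

print(g(0.0002))   # -0.166666666333..., free of cancellation
print(g(1.0))      # direct formula, safe for x of moderate size
```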
2.6 Complex arithmetic
Errors in complex arithmetic work in much the same way as errors in real arithmetic, provided we measure
the relative error in the whole quantity.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
That is, if z = x + iy, then
εz = (z̃ − z)/z = ((x̃ − x) + i(ỹ − y))/(x + iy),
and hence
|εz| = √((x̃ − x)^2 + (ỹ − y)^2) / √(x^2 + y^2).
Since
εx = (x̃ − x)/x   and   εy = (ỹ − y)/y,
we rewrite this in the form
|εz| = √( ((x̃ − x)^2/x^2) × x^2/(x^2 + y^2) + ((ỹ − y)^2/y^2) × y^2/(x^2 + y^2) )
     = √( εx^2 × x^2/(x^2 + y^2) + εy^2 × y^2/(x^2 + y^2) ).
Now, if ε = max{|εx|, |εy|} then
|εz| ≤ √( ε^2 × x^2/(x^2 + y^2) + ε^2 × y^2/(x^2 + y^2) ) = ε.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Remarks
• The above calculation shows that if the relative errors in the real and imaginary parts are small then
|εz| is also small, which is not surprising.
• The converse is not necessarily true if |y| ≪ |x| or |x| ≪ |y|. To see this, write y = αx to obtain
|εz| = √( (εx^2 + α^2 εy^2)/(1 + α^2) ).
Then, if |y| ≪ |x| so that |α| ≪ 1 and 1 + α^2 ≈ 1, we have
|εy| ≈ √(|εz|^2 − εx^2)/|α|,
which could be much larger than |εz|.
For example, if |y| ≪ |x| then y^2/(x^2 + y^2) ≪ 1, so in this case |εy| could be much larger than |εz|.
More precisely, let 0 ≈ r = |y|/|x| ≪ 1. Rewrite the above formula as
|εz| = √( ( (x̃ − x)^2/x^2 + ((ỹ − y)^2/y^2) × y^2/x^2 ) × x^2/(x^2 + y^2) )
     = √( (|εx|^2 + |εy|^2 r^2) × 1/(1 + r^2) ).
Define the weights w1 = 1/√(1 + r^2) and w2 = r/√(1 + r^2), with w2 quite small. Then
max(|εx|w1, |εy|w2) ≤ |εz| ≤ √2 × max(|εx|w1, |εy|w2).
Therefore if max(|εx|w1, |εy|w2) = |εx|w1, then |εx| ≈ |εz|.
But if max(|εx|w1, |εy|w2) = |εy|w2, then |εz| ≈ |εy|w2, i.e. |εy| ≫ |εz| as claimed, because w2 ≈ 0.
• If one part (real or imaginary) of a complex result is small relative to the other, it will often have
fewer correct significant figures. Unless we can calculate the real and imaginary parts separately, there
is usually nothing we can do about this.
2.6.1 Example
Suppose that a = 1, c = 9 and b = 6 − 0.1 × 10^−4. The roots of the quadratic equation
ax^2 + bx + c = 0
are given by
(−b ± (b^2 − 4ac)^(1/2))/(2a) = (−5.99999 ± (5.99999^2 − 36)^(1/2))/2 ≈ −2.999995 ± 0.005477223293i.
However, if b̃ = 6 − 0.2 × 10^−4 (which has five correct s.f.) then we obtain
(−b̃ ± (b̃^2 − 4ac)^(1/2))/(2a) = (−5.99998 ± (5.99998^2 − 36)^(1/2))/2 ≈ −2.99999 ± 0.007745960237i.
The imaginary part (which is smaller than the real part) has no correct significant figures.
Now look at the '+' case in detail (to verify the analysis in the previous section):
z = x + yi = −2.999995 + 0.005477223i, approximated by z̃ = x̃ + ỹi = −2.99999 + 0.007745960237i.
We compute the relative errors
εx = (x̃ − x)/x ≈ −0.1666669444 × 10^−5,   εy = (ỹ − y)/y ≈ 0.4142130487,
εz = (z̃ − z)/z = ((x̃ − x) + i(ỹ − y))/(x + iy) ≈ −0.2859550249 × 10^−6 − 0.7562475282 × 10^−3 i,
|εz| ≈ 0.07562475823 × 10^−2,   0 ≈ r = |y|/|x| = 0.005477223/2.999995 = 0.001825744043 ≪ 1.
Here w1 = 1/√(1 + r^2) = 0.999998334 and w2 = r/√(1 + r^2) = 0.001825741001. This suggests that x̃ has
about six s.f., ỹ has no s.f., while z̃ has two s.f., because
max(|εx|w1, |εy|w2) = max(0.167 × 10^−5 × 0.999998, 0.4142130487 × 0.001825741001) = |εy|w2 ≈ 0.07562457462 × 10^−2
and |εz| ≈ |εy|w2, so |εy| = |εz|/w2 ≈ 547|εz| ≫ |εz|. Thus |εy| ≫ |εz| due to r = |y|/|x| ≪ 1.
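The whole example can be checked numerically; a sketch using Python's cmath (double precision carries about 16 digits, so the comparison is between the exact and perturbed values of b, as above):

```python
import cmath

a, c = 1, 9
b_exact = 6 - 0.1e-4     # 5.99999
b_approx = 6 - 0.2e-4    # 5.99998, five correct s.f.

def root_plus(b):
    # '+' root of a*x^2 + b*x + c = 0 (complex here, since b^2 < 4ac)
    return (-b + cmath.sqrt(b * b - 4 * a * c)) / (2 * a)

z, z_tilde = root_plus(b_exact), root_plus(b_approx)

eps_z = abs((z_tilde - z) / z)
eps_y = abs((z_tilde.imag - z.imag) / z.imag)
print(eps_z, eps_y)      # ~7.6e-4 and ~0.41: the imaginary part is far worse
```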
2.7 Recurrence relations
A recurrence relation defines the members of a sequence in terms of one or more earlier members. Perhaps
the best known example is the Fibonacci sequence
f0 = 0, f1 = 1, fj = fj−2 + fj−1, j = 2, 3, . . .
Recurrence relations are very useful in numerical analysis, but they can be subject to catastrophic cancellation,
and so must be used with care.
2.7.1 Example
Consider the sequence of integrals
Jn = ∫_0^1 t^n/(t + 8) dt,   n = 0, 1, . . .
We can easily calculate
J0 = ∫_0^1 1/(t + 8) dt = [ln(t + 8)]_0^1 = ln(9/8),
J1 = ∫_0^1 t/(t + 8) dt = ∫_0^1 (1 − 8/(t + 8)) dt = [t − 8 ln(t + 8)]_0^1 = 1 − 8 ln(9/8),
. . .
but the integrations become increasingly complicated as we progress through the sequence. Instead, we
may observe that
Jn+1 + 8Jn = ∫_0^1 t^(n+1)/(t + 8) dt + 8 ∫_0^1 t^n/(t + 8) dt
           = ∫_0^1 t^n (t + 8)/(t + 8) dt
           = [t^(n+1)/(n + 1)]_0^1
           = 1/(n + 1).
Therefore we might try finding J1, J2, etc. from J0 using the recurrence relation
Jn+1 = 1/(n + 1) − 8Jn.
However, the starting value J0 = ln(9/8) can only be stored to the fixed computer precision. Calculating
with a do loop in Maple (miller_example.mw), we obtain J20 = 5.036163635 × 10^7, and the problem is
noticeable well before that: the computed values soon start to increase, and shortly afterwards one becomes
negative. This is nonsense;
t^n/(t + 8) < 1/8 for 0 < t < 1,
so the sequence is bounded and positive, i.e. 0 < Jn < 1/8 for every n; the correct values are J8 ≈ 0.0125,
J9 ≈ 0.0112 and J20 ≈ 0.0053. To see why this happens, we examine the effect of the
recurrence relation on an existing relative error.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
We start by writing
J̃n = Jn(1 + εn),
where εn is a small relative error. Putting this into the recurrence relation, we find that
J̃n+1 = 1/(n + 1) − 8J̃n = 1/(n + 1) − 8Jn(1 + εn) = Jn+1 − 8Jnεn.
Rearranging yields
εn+1 = (J̃n+1 − Jn+1)/Jn+1 = −8εn × Jn/Jn+1.
Finally, since
t^(n+1)/(t + 8) < t^n/(t + 8) for 0 < t < 1,
it follows that Jn > Jn+1 and so
|εn+1| > 8|εn|.
· · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
Remarks
• The relative error is multiplied by at least 8 in each step.
• The actual factor is larger than 8 because Jn/Jn+1 > 1, but finding a more accurate bound for this
ratio is difficult.
• Using ten digit arithmetic, we should expect no accuracy at all in J12, J13, . . ., because
8^12 ≈ 6.87 × 10^10. In fact this estimate is conservative; the Maple worksheet shows that the last
term with even one correct s.f. is J8.
• [Optional material] To improve the prediction, we have to replace the bound Jn/Jn+1 > 1 by something more
precise. By the mean value theorem for definite integrals, i.e.
∫_a^b f(x)g(x) dx = f(c) ∫_a^b g(x) dx,   c ∈ (a, b),   g(x) ≥ 0 (or no sign change in g),
we see that for some ξn+1 ∈ (0, 1),
∫_0^1 t^(n+1)/(t + 8) dt = ξn+1 ∫_0^1 t^n/(t + 8) dt,   i.e.   Jn/Jn+1 = 1/ξn+1 > 1,   |εn| = (8^n/(ξnξn−1 · · · ξ1)) |ε0|.
Though this is very precise, we do not know these ξk. We may guess them from estimates, e.g.
assuming 0.3 ≤ ξk ≤ 0.7 leads to ξnξn−1 · · · ξ1 ∈ [0.3^n, 0.7^n]. Taking |ε0| = 0.49 × 10^−10, we see that
|ε6| ∈ [0.11 × 10^−5, 0.17 × 10^−3], |ε7| ∈ [0.13 × 10^−4, 0.47 × 10^−2], |ε8| ∈ [0.14 × 10^−3, 12.5], |ε9| ∈ [0.16 × 10^−2, 334],
which means that J7 will still have some accuracy while J8 and J9 might not have any correct s.f.; a bit closer to the truth!
Another way to arrive at the result below is to mimic a first order differential equation: y′(x) + 8y(x) = 1/(x + 1),
which has the solution y(x) = yp(x) + A exp(−8x).
Note that, using the particular solution Jn, the most general solution to the recurrence relation
yn+1 + 8yn = 1/(n + 1)
is
yn = Jn + A(−8)^n,
since A(−8)^n satisfies yn+1 + 8yn = 0 when A is a constant, as with an ODE. We can verify that
yn+1 + 8yn = Jn+1 + A(−8)^(n+1) + 8Jn + 8A(−8)^n = Jn+1 + 8Jn = 1/(n + 1).
This is similar to a first order differential equation: Jn plays the role of the particular integral, and A(−8)^n
is the homogeneous solution (complementary function). We want the solution with A = 0, but we never
get this (numerically) because the exact values involve logarithms so there is always some rounding error. If
J˜0 = J0(1+ε0), then A = J0ε0. This is small, but its contribution grows rapidly with n and soon dominates
the solution.
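The runaway homogeneous solution is easy to reproduce; a sketch using Python's decimal module to emulate ten-digit arithmetic (the course uses the worksheet miller_example.mw):

```python
from decimal import Decimal, getcontext

getcontext().prec = 10                   # ten-digit arithmetic, as in Maple
J = (Decimal(9) / Decimal(8)).ln()       # J0 = ln(9/8), correctly rounded

for n in range(20):
    J = 1 / Decimal(n + 1) - 8 * J       # J_{n+1} = 1/(n+1) - 8*J_n

print(J)   # about 5e7: pure garbage, since the true J20 is about 0.0053
```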
Important remark: If we can use a recurrence relation in such a way that the relative error does not
grow, the calculation is said to be stable. Otherwise it is unstable.
2.7.2 Miller’s algorithm
Miller’s algorithm works on the basis that a recurrence relation that is unstable in one direction may be
stable in the opposite direction. Consider the recurrence from example 2.7.1 as an equation for Jn:
Jn = (1/8) × (1/(n + 1) − Jn+1).
Reversing the error analysis of example 2.7.1, an error εn+1 in Jn+1 satisfies |εn+1| > 8|εn|, i.e.
|εn| < (1/8) |εn+1|,
meaning the errors will actually become smaller if we step downwards! The only question is how to start
the recurrence: computing JN directly is awkward if N is large. Noting that J∞ = 0 (i.e. JN+P ≈ 0 for large
P), Miller's algorithm gets around this by starting far into the sequence, at n = N + P say. Since
|εN+P| = |(J̃N+P − JN+P)/JN+P|,
there is one initial estimate for which we know the relative error: if J̃N+P = 0, then the above formula
gives |εN+P| = 1. Since
1/8^12 ≈ 1.45 × 10^−11 < 0.5 × 10^−10,
choosing P = 12 is guaranteed to produce accurate results for n = 0, . . . , N (see miller_example.mw).
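A sketch of Miller's algorithm for this sequence, again using Python's decimal module in place of Maple:

```python
from decimal import Decimal, getcontext

getcontext().prec = 10                    # ten-digit arithmetic

def miller(N, P=12):
    """J_N via the backward recurrence J_n = (1/(n+1) - J_{n+1}) / 8,
    seeded with the crude guess J_{N+P} = 0 (relative error 1)."""
    J = Decimal(0)                        # J_{N+P}
    for n in range(N + P - 1, N - 1, -1):
        J = (1 / Decimal(n + 1) - J) / 8  # step downwards: errors shrink by 8
    return J

print(miller(0))     # ln(9/8) correct to ~10 digits
print(miller(20))    # ~0.0053, where the forward recurrence gave ~5e7
```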