matlab代写-CSCC43|学霸联盟

matlab代写-CSCC43

时间：2022-10-20

CSCC43 - Summary 2 FLOATING POINT ARITHMETIC
1 What is Numerical Analysis?
Physical Systems are typically transformed into Mathematical Models by engineers, chemists, physi-
cists...
Mathematical Models can then have a closed form solution (rarely!), however most times there will only
be a numerical solution which requires Numerical Analysis
2 Floating Point Arithmetic
2.1 Representation of non-negative integers
Decimal System (Base 10): 350 = (350)10 = 3 · 102 + 5 · 101 + 0 · 100
Binary System (Base 2): 350 = (101011110)2 = 1 · 28 + 0 · 27 + · · ·+ 1 · 21 + 0 · 20
In general, we have Base b System, for b > 0, b 2 N
x = (dndn1 . . . d0)b = dn · bn + dn1 + · · ·+ d0 · d0
where x 0, x 2 N, 0  di < b, i 2 [1, n]
Hexadecimal System (Base 16): Symbols 0, 1, . . . , 8, 9, A,B,C,D,E, F
(8FA)16 = 8 · 162 + F · 161 +A · 160 = 8 · 162 + 15 · 161 + 10 · 160 = (2298)10
Converting decimal to base b
Example, (350)10 in base b = 2
Numerator Denominator Quotient Remainder
350 2 175 0
175 2 87 1
87 2 43 1
43 2 21 1
21 2 10 1
10 2 5 0
5 2 2 1
2 2 1 0
1 2 0 1
Once we hit 0 in the Quotients column, we simply read the Remainder column from bottom up and we
have our conversion. Thus (350)10 = (101011110)2.
This algorithm is safe (will always terminate). Where overflow is the only potential problem.
2.2 Representation of real numbers
if x 2 R then x = ±(xI .xF )b = ±(dndn1 . . . d0.d1d2 . . . )b
Where xI is the integral part and xF is the fractional part. The sign (+ or ) is a single bit 0 or 1
xI is the non-negative integer talked about above, and xF can have infinite digits.
Example: (0.77 . . . )10 = (0.7¯)10 = 7 · 101 + 7 · 102 + · · ·
Binary System (Base 2): (0.77 . . . )10 = (.000110011 . . . )2 = (0.0 ¯0011)2 = 0·21+0·22+0·23+1·24+· · ·
In general, we have Base b System, for b > 0, b 2 N
xF = (.d1d2d3 . . . )b = d1 · b1 + d2 · b2 + · · · =
1X
i=1
di · bi
We say that a Binary Fraction with n digits is terminating if it does not have any cycling and is finite,
furthermore it has a terminating decimal representation.
Page 1

CSCC43 - Summary 2 FLOATING POINT ARITHMETIC
Converting Decimal Fractions to base b
Example, (0.625)10 in base b = 2
Multiplier Base Product Integral Fraction
0.625 2 1.25 1 0.25
.25 2 0.5 0 0.5
0.5 2 1 1 0
Once we hit 0 in the fractions column, we simply read the Integral column from top down and we have our
conversions. Thus (0.625)10 = (.101)2
However you may not always hit 0 as this algorithm is not safe (won’t always terminate)!
For example: What is (0.1)10 in binary?
Multiplier Base Product Integral Fraction
0.1 2 0.2 0 0.2
0.2 2 0.4 0 0.4
0.4 2 0.8 0 0.8
0.8 2 1.6 1 0.6
0.6 2 1.2 1 0.2
We see that 0.2 occurs again as a fraction. This means that there must be a cycle! Thus (0.1)10 = (0.0 ¯0011)2
2.3 Machine Representation of Reals
Reals represented in computers as Floating Point numbers (FP for short).
A FP x in base b has the form: x = (F )b ·b(e)b , where F is the fraction and has the form F = ±(d1, d2 · · · dt)b
such F is also called the mantissa.
and e = ±(cscs1 . . . c1)b is called the exponent
A FP number is normalized if d1 6= 0 unless d1 = d2 = · · · = dt = 0
Significant Digits: (Of a nonzero FP) are the digits following and including the first nonzero digit. In the
case of a normalized FP, all digits of the mantissa is significant (storing useful information).
Absolute value of mantissa is always 0 and < 1
Exponent is limited: e 2 [M,M ] where M = (asas1 · · · a1)b with each ai = (b 1)
Largest FP number in absolute value defined by this system is (.aa . . . a)b · b(aa...a)b where each a = (b 1)
Smallest nonzero normalized FP number in absolute value is (0.10 . . . 0)b · b(aa...a)b , where each a = (b 1)
Non-normalized FPs allow us to get very very close to 0, consider (.00 . . . 01)b · b(aa...a)b
Rb(t, s) denotes the set of all floating point numbers with t digit mantissa and s digit exponent.
And any Rb(t, s) is finite, while R is infinite.
Overflow orUnderflow occurs whenever a nonzero floating point number with absolute value outside these
ranges must be stored on the computer.
Note that when the number is too close to zero it’s called underflow. not when its largely negative.
Furthermore, R is compact, where as Rb(t, s) is not. This is to say that between any two real numbers, there
is an infinite number of reals in between. (infinite density).
A real number x = ±(xI .xF )b = ±(dkdk1 . . . d0.d1d2 . . . )b
We say we convert a real x to a fp number with the following notation, x 2 R! Fl(x) 2 Rb(t, s)
Page 2
f砻
0 1X26
1xzz
以27
1Xp Nz8
惢
CSCC43 - Summary 2 FLOATING POINT ARITHMETIC
We can represent this in Rb(t, s) by the following algorithm:
1. Normalizing the mantissa.
We shift the decimal point, and readjust with the exponent
x = ±(dkdk1 . . . d0.d1d2 . . . )b = ±(.D1D2 . . . )b · bk+1
2. chop or round the mantissa
(a) chopping - chop after digit t of the mantissa.
(b) rounding - chop after digit t then round Dt depending on whether Dt+1 b/2 or Dt+1 < b/2.
A more ecient way to round is to add b/2 to Dt+1, then simply chop.
Example
Consider Fl(3/2) 2 R10(2, 4) =
(
+0.66 if chop
+0.67 if round
Round o↵ Error
Precisely, this is the di↵erence between an x 2 R and FL(x) 2 Rb(t, s).
Typically measured relative to x as
x FL(x)
x
= or FL(x) = x(1 ) where is the relative round o↵.
can be bound independently of x, < b1t for chopping normalized FPs.
|| < 1
2
b1t for rounding normalized FPs.
2.4 Machine Arithmetic
Let x, y 2 R and FL(x), FL(y) 2 Rb(t, s)
Consider operations 2 {+,,⇥, /}, the computer gives x y ⇡ FL(FL(x) FL(y))
Example in R10(2, 4)
let x = 2, y = 0.0000058 2 R we have x+ y = 2.0000058
then FL(x) = (+0.20 · 101)10 and FL(y) = (+0.58 · 105)10 Note that no RROs in FL(x), FL(y)
then FL(x) + FL(y) = 0.20 · 101 + 0.58 · 105 = 0.20000058 · 101 stored temporarily.
finally, FL(FL(x) + FL(y)) = (+0.20 · 101)10
2.5 Machine Precision
Definition: The smallest non-normalized FP number eps such that 1 + eps > 1. This number (eps) is
referred to as machine epsilon.
This is important as this number eps =
8<:b
1t chopping
1
2
b1t rounding
recall =
x FL(x)
x
is the RRO where can be bound independently.
0   eps for chopping, and ||  eps for rounding.
How exactly is error propagated in each of 2 {+,,⇥, /}?
Let x, y 2 R, then Fl(x) = x(1 ), F l(y) = y(1 )
Multiplication: x · y, where computer gives FL(FL(x) · FL(y))
We have [x(x x) · y(1 y)](1 xy) = x · y(1 x)(1 y)(1 xy) ⇡ x · y(1 x y xy) = x · y(1 .)
We say this is a good approximation as each is a small fraction bound by machine epsilon, there’s a good
chance that products of s will be so small that they are lower than such machine epsilon.
Where ||  3 · eps, this is the worst case scenario.
Page 3
CSCC43 - Summary 3 LINEAR SYSTEMS
Addition: x+ y, where computer gives FL(FL(x) + FL(y))
We have (x(1x)+y(1y))(1xy) = x(1x)(1xy)+y(1y)(1xy) ⇡ x(1xxy)+y(1yxy)
= (x+ y)

1 x(1 x xy)
x+ y
+
y(1 y xy)
x+ y

= (x+ y)[1 +]
Where |+| 
xx+ y
2 · eps+ yx+ y
2 · eps = |x|+ |y||x+ y| 2 · eps
i.e. |+| 
8<:2 · eps x and y are the same sign2 · eps |x y||x+ y| x and y are di↵erent signs
We see that when x ⇡ y, the error bound could approach infinity. This is called subtractive cancellation.
Example of Subtractive Cancellation in R10(3, 1) with rounding.
Compute a2 2ab+ b2 with a = 15.6, b = 15.7 2 R
first we have fl(a) = +(0.156 · 102) and fl(b) = +(0.157 · 102) with no initial round o↵.
In R10(3, 1), f l(a2)! fl(243.36) = +(0.243 · 103)
fl(2ab)! fl(489.84) = +(0.490 · 103)
fl(b2)! fl(246.49) = +(0.246 · 103)
fl(a2 2ab+ b2)! fl(243 490 + 246) = (0.100.101)
But this is a problem as a2 2ab+ b2 = (a b)2 which is positive.
3 Linear Systems
A~x = ~b, A 2 Rn⇥n, ~x,~b 2 Rn, Given A,~b, calculate ~x
General Solution Technique:
1. Reduce the problem to an equivalent one that is easier to solve.
2. Solve the reduced problem.
Ex, given
3x1 5x2 = 1
6x1 7x2 = 5 ,

3 5
6 7

x1
x2

=

1
5

,

3 5
0 3

x1
x2

=

1
3

) x1 = 1, x1 = 2
Generalize this to n equations in n unknowns.
Let matrix A = [aij ] where a11 refers to the top left element.
We have:
Equation 1: a11x1 + a12x2 + · · ·+ a1nxn = b1
Equation 2: a21x1 + a22x2 + · · ·+ a2nxn = b2
...
Equation n: an1x1 + an2x2 + · · ·+ annxn = bn
Reduce system to triangular form.
1. Assume that a11 6= 0
multiply Equation 1 by
a21
a11
and subtract that from Equation 2.
multiply Equation 1 by
a31
a11
and subtract that from Equation 3.
...
multiply Equation 1 by
an1
a11
and subtract that from Equation n.
This results in a equivalent system, where x1 has been eliminated from Equations 2 to n.
Page 4
CSCC43 - Summary 3 LINEAR SYSTEMS
Now we have:
Equation 1: a11x1 + a12x2 + · · ·+ a1nxn = b1
Equation 2ˆ: 0 + aˆ22x2 + · · ·+ aˆ2nxn = bˆ2
...
Equation nˆ: 0 + aˆn2x2 + · · ·+ aˆnnxn = bˆn
2. Assume that aˆ22 6= 0
multiply Equation 2ˆ by
aˆ32
aˆ22
and subtract that from Equation 3ˆ.
multiply Equation 2ˆ by
aˆn2
aˆ22
and subtract that from Equation nˆ.
...
finally, at step n 1, we assume a˜n1,n1 6= 0, multiply equation ˜n 1 by a˜n,n1
a˜n1.n1
and subtract from
Equation n˜
this will result in an upper triangular system.
Equation 1: a11x1 + a12x2 + · · ·+ a1nxn = b1
Equation 2ˆ: 0 + aˆ22x2 + · · ·+ aˆ2nxn = bˆ2
...
Equation n˜: 0 + 0 + · · ·+ a˜nnxn = b˜n
Another way of looking at this problem is using matrix vector notation, looking to solve A~x = ~b
where A 2 Rn⇥n,~b, ~x 2 Rn
Triangulating such A requires the following steps:
1. Eliminate the first column of A below a11
L1A~x = L1~b, where L1 =
2666664
1 0 . . . 0
a21/a11 1 . . . 0
a31/a11 0 . . . 0
...
...
...
...
an1/a11 0 . . . 1
3777775
This is very similar to the identity matrix, except the first column is filled with the multipliers used in
the first step.
2. Eliminate the second column of L1A below aˆ22
L2(L1A)~x = L2(L1~b) =
2666664
a11 a12 . . . a1n
0 aˆ22 . . . aˆ2n
0 0 ˆˆa33 ˆˆa3n
...
...
...
...
0 0 ˆˆan3 ˆˆann
3777775, where L2 =
2666664
1 0 . . . 0
0 1 . . . 0
0 aˆ32/aˆ22 . . . 0
...
...
...
...
0 aˆn2/aˆ22 . . . 1
3777775
...
We continue until we have Ln1Ln2 . . . L1A~x = Ln1Ln2 . . . L1~b
let Ln1Ln2 . . . L1A = U , where U is an upper triangular matrix. This becomes very easy to solve.
Page 5
CSCC43 - Summary 3 LINEAR SYSTEMS
3.1 LU Factorization
We have Ln1Ln2 . . . L1A = U , A = L11 L12 . . . L1n1U
Lemma 1: If Li is a Gauss Transform. Then L
1
i exists and is also a Gauss Transform.
Lemma 2: If Li and Lj are Gauss Transforms, and i < j, LiLj = Li + Lj I
Then A = L11 L
1
2 . . . L
1
n1U = LU
Using A = LU to solve A~x = ~b
A~x = ~b, LU~x = ~b , let ~d = ~b for a lower triangular L~d
Then L~d = ~b, U~x = ~d for an upper triangular U~x
We use LU factorization because its far cheaper, and if we have several linear systems with the same
coecient matrix A, but di↵erent right hand side, i.e. A~x = ~b = ~c = ~d . . . , then we can use the same LU .
Example of LU factorization
let A =
24 2 1 12 2 2
2 4 1
35
L1 =
24 1 0 01 1 0
1 0 1
35 , L1A =
242 1 10 3 1
0 3 2
35
L2 =
241 0 00 1 0
0 1 1
35 , L2L1A =
242 1 10 3 1
0 0 1
35, L2L1A = U , A = L11 L12 U
L11 =
24 1 0 01 1 0
1 0 1
35 , L12 =
241 0 00 1 0
0 1 1
35 , L = L11 L12 = L11 + L12 I =
24 1 0 01 1 0
1 1 1
35
3.2 Gaussian Elimination with Pivoting
From last lecture, we have L2P2L1P1A = U
L2P2L1P1A ⌘ L2P2L1P2P2P1A, as P2P2 ⌘ I
Then L2P2L1P2P2P1A ⌘ L2(P2L1P2)P2P1A
Let L˜1 = P2L1P2
Claim: L˜1 is a Gauss Transform.
Example P2 =
241 0 00 0 1
0 1 0
35 , L1 =
24 1 0 0l21 1 0
l31 0 1
35, then P2L1P2 =
24 1 0 0l31 1 0
l21 0 1
35 (swapped multipliers)
Them L2L˜1P2P1A = U , P2P1A = L˜11 L12 U , PA = LU
Now, to solve A~x = ~b, given PA = LU
A~x = ~b, PA~x = P~b, LU~x = P~b, note that P~b is ~b with 2 elements swapped.
Then LU~x = ~˜b, let ~d = U~x
L~d = ~b is easy to solve because L is lower triangular
U~x = ~d is easy to solve because U is upper triangular.
What happens if at some k-th stage, the diagonal and everything below it is 0?
For example Lk1 . . . L2L1A, we cannot divide by the k-th row, k-th column element because its 0, further-
more every element below it is 0.
Remember our goal originally is to make every element in the k-th column under k zero, we just continue.
Page 6
CSCC43 - Summary 3 LINEAR SYSTEMS
This will result in a matrix U with a 0 along one of the diagonal entries.
Then we have a singular matrix U , but the factorization still stays.
While its possible for U to be singular, it’s not for the matrix L.
As a combination of gauss transforms, they must be invertible and thus non-singular.
In the last step, we have U~x = ~d, but U is singular. We could have no solution, or infinitely many solutions.
Example: U~x = ~d!
242 5 40 0 1
0 0 2
3524x1x2
x3
35 =
24b1b2
b3
35) 2x3 = b3 and x2 = b2
Now if b3 = 2b2, we have a free parameter x2. Otherwise, there are no solutions.
Now suppose during a numerical factorization of FPS, if all elements below the diagonal in column k and
[akk] at the k-th stage and are of magnitude  eps ·maxj2[1,k]|ujj |, we call this Numerical Singularity
(Near Singularity).
3.3 Complexity of Gaussian Elimination
Let matrix A be of order n⇥ n
We will count multiplication/addition pairs, i.e. mx+ b as a Flop (Floating Point Operation).
Lets begin with factorization. Computing the LU factorization,
1st stage is the zeroing out the first column, we have (n 1)2 flops as we are adding multiples of row 1 to
rows 2 to n.
2nd stage is the zeroing out the first column, we have (n 2)2 flops as we are adding multiples of row 2 to
rows 3 to n.
...
(n 1)-th stage is the zeroing out the first column, we have 1 flop.
Finally, we have (n 1)2 + (n 2)2 + · · ·+ 1 total flops.
nX
k=1
k2 =
n(n+ 1)(2n+ 1)
6
, then (n 1)2 + (n 2)2 + · · ·+ 1 = n(n 1)(2n)
6
=
n3
3
+O(n2)
Computing the forward solve, L~d = ~b where L is a lower triangular matrix.26664
1 0 . . . 0
l21 1 o. ts 0
...
...
. . . 0
ln1 ln2 . . . 1
37775
26664
d1
d2
...
dn
37775 =
26664
b1
b2
...
bn
37775, d1 = b1, d2 = b2 l21d1, d3 = b3 l31d1 l32d2 . . .
Total = 0 + 1 + 2 + · · ·+ (n 1) = n
2
2
+O(n) Flops
Back-solving is similar, resulting in another
n2
2
+O(n) Flops
Total Forward/Backward solve: n2 +O(n) Flops
3.4 Round O↵ Error of Gaussian Elimination
Recall we have a factorization PA = LU computed in a floating point system.
Because of machine round o↵ error, we actually get Pˆ (A+E) = LˆUˆ , where Pˆ , Lˆ, Uˆ are the computed factors.
Then solving A~x = ~b becomes solving (A+ E)~ˆx = ~b, where ~ˆx is the computed solution.
Equivalently, let E~ˆx = ~r, then (A+ E)~ˆx = ~b becomes A~ˆx+ ~r = ~b, or ~r = ~bA~ˆx
Where ~r is the residual. (We would like this to be ~0).
Page 7
zzrrffifǐ
必
ye Lexi
st_s ii d.ie
A th
AFC to factorA.lu器
week5
CSCC43 - Summary 3 LINEAR SYSTEMS
We can show that if row partial pivoting is used,
kEk / k · eps · kAk where k is not too large and depends on n.
And that k~rk / k · eps · kbk , krkkbk / k · eps
However, this does not mean that k~x ~ˆxk or k~x
~ˆxk
kxk is small.
Example:

0.780 0.563
0.913 0.656

~x =

0.217
0.254

, we know true solution is ~x =

1
1

Consider two computed solutions, xˆ1 =

0.999
1.001

, xˆ2 =

0.341
0.087

~r1 = ~bAxˆ1 =
0.001243
0.001572

,~r2 = ~bAxˆ2 =
0.000001
0

We see that
k~r2k
k~bk is much smaller than
k~r1k
~b
, yet we see that xˆ2 is a terrible solution.
But why is
k~x ~ˆx1k
kxk so much smaller than
k~x ~ˆx2k
kxk ?
We need the relationship between relative error and relative residual.
We have Axˆ = ~b ~r and A~x = ~b, subtract the top from the bottom yields:
A(~x xˆ) = ~r , ~x xˆ = A1~r
Taking the norm of both sides yields: k~x xˆk = kA1~rk  kA1kk~rk (1)
We can take the original ~b = A~x, k~bk = kA~xk  kAkk~xk (2)
Combining (1) and (2):
k~x xˆk
kAkk~xk 
kA1kk~rk
k~bk ,
k~x xˆk
k~xk 
kAkkA1kk~rk
k~bk
Where kAkkA1k = cond(A)
We can try to derive a lower bound for this, again we have equations A~x = ~b and Axˆ = ~b ~r,
Subtracting the two yields A(~x xˆ) = ~r , kA(~x xˆ)k = k~rk
By triangular inequality, kAkk~x xˆk k~rk (1)
We take the original A~x = ~b, rearrange: ~x = A1~b) ~x  kA1kk~bk (2)
Combining equations (1) and (2) yields
k~x xˆk
k~xk
k~rk
k~bkkAkkA1k
Finally, combining two bounds, we have
k~rk
k~bkcond(A) 
k~x xˆk
k~xk  cond(A)
k~rk
k~bk
If cond(A) is very large, problem is poorly conditioned. Small relative residual does not mean small relative
error.
If the cond(A) is not too large, the problem is well conditioned and the small relative residual is a reliable
indicator of small relative error.
Conditioning is a continuous spectrum, how large is ”very large” depends on context.
Recall in the previous example: A =

0.780 0.563
0.913 0.656

, A1 = 106

0.656 0.565
0.913 0.780

, where 106 =
1
det(A)
Determinant formula: A =

a b
c d

, A1 =
1
ad bc

d b
c a

where det(A) = 0) A1 DNE.
Page 8
⼀
CSCC43 - Summary 4 NON LINEAR EQUATIONS
kAk1 = 1.572, kA1k = 1.693 · 106 ) cond1(A) = kAk1kA1k1 = 2.66 · 106
k~x xˆk
k~xk  2.66 · 10
6 krk
kbk , i.e. relative error in x could be as big as 2.66 · 10
6 times the relative residual.
Thus this A is a poorly conditioned matrix. and relative residual is not a reliable indicator of relative error.
3.5 Iterative Refinement/Iterative Improvement
Can we improve xˆ? One easy way is to improve mantissa length.
Want to solve A~x = ~b, having already solved (A+ E)xˆ = ~b
Subtracting the two equations yields A(~x xˆ) = ~r, let ~z = ~x xˆ, and we solve A~z = ~r
Then ~x = xˆ+ ~z, however this is a fallacy since we can not solve for ~z exactly. thus we get zˆ, and not ~z.
But if we were to get 2 leading digits correct in xˆ, then we will get 2 leading digits correct in zˆ, and then
you will have 4 leading digits correct in xˆ+ zˆ
The Algorithm
Compute xˆ(0) by solving A~x = ~b in a floating point system.
For i = 0, 1, 2, . . . until solution is good enough.
Compute the residual: r(i) = b axˆ(i)
Solve A~z(i) = r(i) For some zˆ(i)
Update xˆ(i+1) = xˆ(i) + zˆ(i)
4 Non Linear Equations
Problem: F : R! R
We need to some F (x˜) = 0 for some x˜ 2 R (root finding problem)
Typically, we cannot find a ”closed form” (analytic) solution to non linear F , which is why we need numerical
techniques.
We can find iterative methods which generate an approximation xˆk, k = 0, 1, 2, . . . such that xˆk ! x˜ as
k !1
4.1 Fixed Point Methods
Given a root finding problem, F (x˜) = 0, this is equivalent to a fixed point problem x˜ = g(x˜)
where x˜ = g(x˜) is obtaining a fixed point from g.
Example: F (x) = x ex, this is equivalent to finding the fixed point of x = ex
We can always use g(x) = xF (x) to obtain a working g(x). Alternatively, we can use g(x) = xh(x)F (x)
we refer to x F (x) to be the first form, x h(x)F (X) as the second form.
We often use the algebraically equivalent fixed point problem to solve the root finding problem because there
is an obvious iteration algorithm to do so.
First form, F (x˜) = 0, x˜ = g(x˜)
Second form, F (x˜) = 0) x˜ = g(x˜), however we can have a fixed point x˜ = g(x˜) such that F (x˜) 6= 0.
(i.e., h(x˜) = 0 but F (x˜) 6= 0)
The advantage of the second form is that there is flexibility in designing g(x), to make iteration converge
faster.
Page 9
CSCC43 - Summary 4 NON LINEAR EQUATIONS
4.2 Fixed Point Iteration
Start with an approximate solution xˆ0 then iterate:
ˆxk+1 = g(xˆk), k = 0, 1, 2, . . . until convergence/failure.
Example: Let F (x) = x2 + 2x 3 with exact roots x˜ = 1,3
Consider the fixed point iteration (FPI): xk+1 = xk +
x2k + 2xk 3
x2x 5
for the fixed point problem x = g(x) = x+
x2 + 2x 3
x2 5
We see that this is the second form of the fixed point problem, where h(x) = 1
x2 5
notice using this h(x) 6= 0, 8x 2 R, this means we don’t need to check after convergence.
Fixed Point Iteration of xˆ0 = 5! xˆks converges to -3.
However, Fixed Point Iteration of xˆ0 = +5! xˆks do not converge.
We can keep trying, but that would be a significant waste of time.
4.3 Fixed Point Theorem
If there is an interval [a, b] such that g(x) 2 [a, b] for all x 2 [a, b]
And if kg0(x)k  L < 1, 8x 2 [a, b]
Then g(x) has a unique fixed point in [a, b]
Proof, start with any xˆ0 2 [a, b] and iterate with ˆxk+1 = g(xˆk), k = 0, 1, 2 . . . then all xˆk 2 [a, b]
Moreover, xk+1 xk = g(xk) g(xk1) = g0(⌘k)g(xk xk1)
For some ⌘k 2 [xk1, xk] ⇢ [a, b] By MVT (Mean Value Theorem)
Thus, |xk+1 xk| < |g0(⌘k)||xk xk1|  L|xk xk1|
Then, L|xk xk1|  · · ·  Lk|x1 x0|
Since L < 1, |xk+1 xk|! 0 as k !1
This means the fixed points are converging to some x˜ 2 [a, b]
to complete this proof, we must show:
1. x˜ = g(x˜) (i.e., x˜ is a fixed point!)
2. x˜ is unique in [a, b].
Using the Fixed Pint Theorem, Example: F (x) = x ex = 0 and g(x) = ex for x = g(x)
What is the maximum interval of guaranteed convergence of xk?
4.4 Rate of Convergence
Definition: If lim
x˜k!x˜
|x˜ xk+1|
|x˜ xk|p = c 6= 0, we have p-th order convergence to fixed point x˜
Example: Table of absolute error of iterates, |x˜ xk|
k p = 1, c = 1/2 p = 2, c = 1
0 101 101
1 5 · 102 102
2 2.5 · 102 104
3 1.25 · 102 108
4 6.125 · 103 1016
We see that despite the second column having c = 1, p = 2 converges much much faster.
Page 10
CSCC43 - Summary 4 NON LINEAR EQUATIONS
4.5 Rate of Convergence Theorem
For the Fixed Point Iteration xk+1 = g(xk), if g0(xˆ) = g00(xˆ) · · · = g(p1)(xˆ) = 0
but gp(xˆ) 6= 0, then we have p-th order convergence.
Proof let xk+1 = g(xk) = g(x˜+ (xk x˜))
= g(x˜) + g(xk x˜)g0(x˜) + (xk x˜)
2
2!
g00(x˜) + · · ·+ (xk x˜)
p1
(p 1)! g
(p1)(x˜) +
(xk x˜)p
p!
g(p)(⌘k)
Where ⌘k 2 [x˜, xk]
However, if we have g0(xˆ) = g00(xˆ) = · · · = g(p1)(xˆ) = 0
The Taylor expansion yields xk+1 = g(xk) = g(xˆ) +
(xk xˆ)p
p!
g(p)(⌘k), xk+1 xˆ
(xk xˆ)p =
1
p!
g(p)(⌘k)
We see that as the p!1, xk ! xˆ, ⌘k 2 [xˆ, xk]) ⌘k = xˆ
We can rewrite this as lim
xk!xˆ
|xk+1 xˆ|
|xk xˆ|p =
1
p!
g(p)(xˆ)
Thus we see that when the p-th derivative of g hat xˆ is not zero, we reach p-th order convergence.
We see that now we can use the second form of the fixed point iteration g(x) = x h(x)F (x), and pick such
h(x) so that the p-th derivative of g is not zero at xˆ to accelerate iteration.
4.6 Newton’s Method
Recall Newton’s Method of F (x) = 0 where xk+1 = xk F (xk)
F 0(xk)
Where it takes the root finding problem of F (x) = 0, x = g(x) with g(x) = x F (x)
F 0(x)
This is the second form of the fixed point problem with h(x) =
1
F 0(x)
Speed of Newton’s Method
Suppose that F 0(x˜) = 0 but F 00(x˜) 6= 0
g0(x) = 1 F
0(x)F 0(x) F (x)F 00(x)
(F 0(x))2
=
F (x)F 00(x)
(F 0(x))2
i.e., g0(x˜) = 0 For any function F .
By rate of convergence theorem, the newtons method has at-least quadratic convergence. since g0(x˜) = 0 for
any function F .
Geometric Interpretation of Newton’s Method
Want to solve F (x) = 0, at an initial guess xk approximate (or model) F (x) by a linear polynomial p(x)
that satisfies conditions: p(xk) = F (xk) and p0(xk) = F 0(xk)) pk(x) = F (xk) + (x xk)F 0(xk)
Then xk+1 is the root of pk(x), i.e. pk(xk+1) = 0) F (xk) + (xk+1 xk)F 0(xk) = 0
) xk+1 = xk F (xk)
F 0(xk)
Will Newton’s Method always Converge? No.
For example Newton’s method on F (x) = x ex
xk+1 = xk F (xk)
F 0(xk)
= xk xk e
xk
1 + exk
=
(1 + xk)exk
1 + exk
4.7 Secant Method
Recall NM: xk+1 = xk F (xk)
F 0(xk)
, notice that we have to supply the first derivative of F
Page 11
CSCC43 - Summary 4 NON LINEAR EQUATIONS
We can approximate F 0(xk) by
F (xk) F (xk1)
xk xk1
Then xk+1 = xk F (xk) (xk xk1)
F (xk) F (xk1)
Another way to derive this is using the geometric interpretation.
However This is not a fixed point iteration of the form xk+1 = g(xk), as no function g could support two
values.
We cannot directly use FPT or RCT to analyze this method. However, with some adjustment can prove
rate of convergence is p =
1 +
p
5
2
⇡ 1.62
Super Linear Convergence, perhaps not as fast as Newton (p = 2) ?
Newton’s Method: xk+1 = xk F (xk)
F 0(xk)
Secant Method: xk+1 = xk F (xk)F (xk) F (xk1)
xk xk1
= xk F (xk) (xk xk1)
F (xk) F (xk1)
But we see that newtons method requires 2 Function evaluations per step as F, F 0 are not the same.
And the secant method requires only 1 function evaluation. Although F (x) needs to be evaluated, F (xk1)
has been computed during the previous iteration!
E↵ect rate of convergence takes cost per iteration into account, as well as speed. Thus secant method is
”e↵ectively” faster than Newton’s method.
4.8 Bisection Method
Oftentimes, if we start Newton’s/Secant method with the wrong initial guess, they will not converge. (Wrong
side of the hill)
Bisection Method however has guaranteed convergence, this is to say that it always finds a root, but slowly.
We need to find an a  b such that F (a)  0  F (b) or F (b)  0  F (a)
This means there is at least one root of F in [a, b]
Assume F (a)  0  F (b)
Loop until b a is small enough.
let m = (a+ b)/2
If F (m)  0 then let a = m, else b = m
Repeat for the interval [m, b] or [a,m]
Then we have linear rate of convergence p = 1, c =
1
2
with guaranteed convergence.
4.9 Hybrid Method
We can start o↵ with bisection, then switch to a faster one later on.
Example: Combine Bisection and Newton’s Method
4.10 System of Non-Linear Equations
Problem: Solve F (~x) = ~0
Where F (~x) =

F1(x1, x2)
F2(x1, x2)

=

4(x1)2 + 9(x2)2 16x1 54x2 + 61
x1x2 2x1 1

=

0
0

Extending Newton’s Method to Systems. Recall Newton’s Method for a single non linear equation where
F : R! R and xk+1 = xk F (xk)
F 0(xk)
Page 12
CSCC43 - Summary 5 APPROXIMATION/INTERPOLATION
Then for F : Rn ! Rn then ~xk+1 = ~xk F ( ~xk)
F 0( ~xk)
Need to Know: F 0 is the Jacobian matrix of F , i.e. F 0 2 Rn⇥n
then ~xk+1 = ~xk [F 0( ~xk)]1F ( ~xk) or [F 0( ~xk)]( ~xk+1 ~xk) = F ( ~xk)
This is simply in the form of some A~x = ~b, a linear system!
This is very expensive! We can use the Pseudo Newton Method by holding the Jacobian matrix fixed for a
few iterations, this means we can reuse the PA = LU factorization. This is alright so long as we are still
converging.
5 Approximation/Interpolation
Truncated Taylor Series: p(x) = F (a) + F 0(a)(x a) + · · ·+ F
(n)(a)
n!
(x a)n
This is a polynomial because of the (x a)i for i = 0, 1, 2 . . .
Then the error e(x) = p(x) F (x) = F
(n+1)(⌘)
(n+ 1)!
(x a)n+1
Other approximations:
1. Interpolation: Find a polynomial p such that p(xi) = F (xi), i = 0, 1, 2, . . .
F is the function we are trying to approximate, could simply be a set of data.
2. Least Squares: Finding a polynomial p such that p(x) minimize kF pk2 =
Z b
a
(F (x) p(x))2dx
!1/2
Other norms we can use for Least Squares approximation,
kF pk1 = maxaxb|F (x) p(x)|
kF pk1 =
Z b
a
|F (x) p(x)|dx
5.1 Polynomial Interpolation
Consider Pn : the set of all polynomials of degree  n
This is a function space, and requires a basis of n+1 functions. (Pn is a function space of dimension n+1)
The most common basis is the monomial basis {xi}ni=0
Weierstrass’ Theorem (Existence)
If a function F is continuous on an interval [a, b], then for any ✏ > 0, 9p✏ 3: kF p✏k1 < ✏
This is to say that for any continuous function on a closed interval [a, b] there exists some polynomial that
is as close to it as it can be.
Numerical Methods for Polynomial Interpolation
1. Vandermonde: Method of Undetermined Coecients
Theorem For any sets: {xi : i = 0, 1, . . . n}, {yi : i = 0, 1, . . . , n} for distinct xis and undistinct yis
Then 9!p 2 Pn,3: p(xi) = yi, i = 0, 1, . . . , n
Proof : If such p(x) exists, it must be possible to write it as a monomial space polynomial:
p(x) =
nX
i=0
aix
i, Then we have n+ 1 unknowns, {ai}, i = 0, 1, . . . , n
Page 13
CSCC43 - Summary 5 APPROXIMATION/INTERPOLATION
This can be converted into a matrix problem with p(xi) = yi, i = 0, 1, . . . n
We can solve for the ais by using the Vandermonde matrix:26664
(x0)0 (x0)1 (x0)2 . . . (x0)n
(x1)0 (x1)1 (x1)2 . . . (x1)n
...
(xn)0 (xn)1 (xn)2 . . . (xn)n
37775
26664
a0
a1
...
an
37775 =
26664
y0
y1
...
yn
37775
Now is the Vandermonde matrix non-singular?
We can return to polynomial problem, we are asking if its possible
for p 2 Pn,3: p(xi) = 0, i = 0, 1, 2, . . . , n
But this is impossible as no polynomial in Pn have n+ 1 roots
Thus the polynomial is the x-axis with 0 coecients.
Or det(V ) = (1)n+1
nY
i=1
i1Y
j=0
(xi xj)
The Vandermonde however does not lead to the best algorithm, it can be poorly conditioned, PV = LU
factorization then forward/backward solve will cost.
This will give the monomial basis p(x)
2. Lagrange Basis
Consider a simple interpolation problem p(xi) = yi, i = 0, 1, . . . , n
Consider the basis {li(x)} where li(x) =
nY
j=0,j 6=i
x xj
xi xj for i = 0, 1, . . . , n
Note, li(x) 2 Pn, 8i and li(xj) =
(
1 i = j
0 i 6= j
With this basis function, we can write out the interpolating polynomial for free.
Where p(x) =
nX
i=0
li(x)yi ) p(x) 2 Pn and p(xi) = yi, i = 0, 1, . . . , n
3. Newton (Divided Di↵erence) Basis
For simple interpolation p(xi) = yi, i = 0, 1, . . . , n
We look for an interpolating p(x) of the form:
p(x) = a0 + a1(x x0) + a2(x x0)(x x1) + · · ·+ an(x x0)(x x1)(x x2) . . . (x xn1)
Converting into a matrix,
2666666664
1 0 0 . . . 0
1 (x1 x0) 0 . . . 0
1 (x2 x0) (x2 x0)(x2 x1) . . . 0
...
1 (xn x0) . . .
n1Y
i=0
(xn xi)
3777777775
2666664
a0
a1
a2
...
an
3777775 =
2666664
y0
y1
y2
...
yn
3777775
This is a lower triangular system, which means there is no factorization involved.
Page 14
CSCC43 - Summary 5 APPROXIMATION/INTERPOLATION
5.2 Divided Di↵erences
Definition: y[x1] = y(x1) = y1
y[xi+k, . . . , xi] =
y[xi+k, . . . , xi+1] y[xi+k+1, . . . , xi]
xi+k xi
example: y[x2, x1, x0] =
y[x2, x1] y[x1, x0]
x2 x0
Newton’s Polynomial: p(x) = y[x0]+ (xx0)y[x1, x0]+ · · ·+(xx0)(xx1) . . . (xxn1)y[xn, , . . . , x0]
Then, p(x) 2 Pn and p(xi) = yi, i = 0, 1, . . . , n
Proof by Induction, let p(x) ⌘ pn(x)
Basis: n = 0, n = 1
p0(x) = y[x0], p0(x0) = y[x0] = y0
p1(x) = y[x0] + (x1 x0)y[x1, x0]
p1(x0) = y[x0] = y0
p1(x1) = y[x0] + (x1 x0)
✓
y1 y0
x1 x0
◆
= y[x0] + y1 y0 = y1
Induction Hypothesis:
SKIPPING THIS FOR NOW NO TIME LMFAO
How are divided di↵erences and derivatives related?
Example: y[x1, x0] =
y(x1) y(x0)
x1 x0 , but what happens when x1 ! x0
Then lim
x1!x0
y[x1, x0] = lim
x1!x0
y(x1) y(x0)
x1 x0 = y
0(x0)
Provided the y0 exists.
Example: y[x2, x1, x0] =
y[x2, x1] y[x1, x0]
x2 x0
Then lim
x2!x0
x1!x0
y[x2, x1, x0] = lim
x1!x0
y[x2, x1] y[x1, x0]
x2 x0 =
y00(x0)
2!
We can show that in general, lim xk!x0
xk1!x0
xk2!x0
...
x1!x0
y[xk, xk1, . . . x0] =
y(k)(x0)
k!
This helps with Osculatory Interpolation (i.e., interpolation with derivatives).
Page 15
CSCC43 - Summary 5 APPROXIMATION/INTERPOLATION
Example: Find p 2 P4 such that.
p(0) = 0
p(1) = 1, p0(1) = 1, p00(1) = 2
p(2) = 6
Page 16
CSCC43 - Summary 6 NUMERICAL INTEGRATION
6 Numerical Integration
Problem: Approximate some I(F ) =
Z b
a
F (x)dx
recall Riemann Sums of a function F , R(F ) =
u1X
i=0
F (⌘i)(xi+1 xi)
Composite Rule: Apply a simple format on many short integrals, i.e.
Z x2
x1
F (x)dx
Basic Formulas: Based o↵ interpolation, instead of integrating we can use the interpolated polynomial.
We approximate F (x) by some p(x) on [a, b]
Then I(F ) ⇡ Q(F ) = I(P ) =
Z b
a
p(x)dx =
Z b
a
[
nX
i=0
F (xi)li(x)] =
nX
i=0
F (xi)
Z b
a
li(x)dx
We arrive at the general form of quadrature: Q(F ) =
nX
i=0
AiF (xi), Ai =
Z b
a
li(x)dx
6.1 Theorem: Existence of Interpolatory Quadrature
Given any set of n + 1 distinct nodes, we can choose the weights Ai such that the quadrature rule is exact
for all polynomials of degree  qn, moreover Ais are unique.
Proof: Since Q(F ) is linear, then for {xi}ni=0 the polynomials can be divided into basis form.
Then for each xk, Q(xk) =
nX
i=0
Ai(xi)
k =
1
k + 1
(bk+1 ak+1)
This results in n+ 1 equations with n+ 1 unknowns.2666664
1 1 1 . . . 1
x0 x1 x2 . . . xn
(x0)2 (x1)2 (x2)2 . . . (xn)2
...
...
... . . .
...
(x0)n (x1)n (x2)n . . . (xn)n
3777775
2666664
A0
A1
A2
...
An
3777775 =
2666664
b a
(b2 a2)/2
(b3 a3)/3
...
(bn+1 an+1)/(n+ 1)
3777775
We see that the first matrix is simply the transposed Vandermonde Matrix, thus by invertability, weights
are unique.
6.2 Definition of Precision
If m is the highest natural number such that Q integrates all polynomials of degree  m, we say the precision
of Q is m.
Midpoint Rule: I(F ) ⇡ Q(F ) =M(F ) = I(p0) = (b a)F (a+ b
2
)
Trapezoid Rule: I(F ) ⇡ Q(F ) = T (F ) = I(p1) = b a
2
[F (a) + F (b)]
Simpsons Rule: I(F ) ⇡ Q(F ) = T (F ) = I(p2) = b a
6
[F (a) + F (
b+ a
2
) + F (b)]
Page 17
CSCC43 - Summary 6 NUMERICAL INTEGRATION
6.3 Error in Interpolatory Quadrature
En = I(F )Qn(F ), where Qn will integrate all polynomials if degree n
=
Z b
a
[F (x) pn(x)]dx =
Z b
a

F (n+1)(⌘)
(n+ 1)!
nY
i=0
(x xi)
!
dx
Will interpolatory quadrature rules converge to I(F ) as n!1?
Theorem: Let F be continuous on [a, b] an Qn(F ) =
nX
i=0
A(n)i F ((xi)
(n))
Then: lim
n!1Qn(F ) = I(F )() 9k 2 R,3:
X
i = 0n|A(n)i |  k, 8n 2 N
Page 18

学霸联盟