Chapter 10: Jacobian Matrix and Inverse Functions
10.1 Vector Valued Functions

So far we have looked at functions from $\mathbb{R}$ to $\mathbb{R}^n$ (curves) and from $\mathbb{R}^2$ or $\mathbb{R}^3$ to $\mathbb{R}$; in other words, either the domain or the codomain, but not both, has been vector valued.
A function $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ ($q > 1$) is a vector valued function of $p$ variables.

Example 10.1
$$\tilde{f}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + y + z \\ xyz \end{pmatrix}$$
defines a function from $\mathbb{R}^3$ to $\mathbb{R}^2$.
10.1 Vector Valued Functions (continued)
When working with these vector valued functions, we must write vectors as column vectors.
In example 10.1, the real-valued functions
$$f_1\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x + y + z \qquad\text{and}\qquad f_2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = xyz$$
are called the co-ordinate or component functions of $\tilde{f}$, and we may write $\tilde{f} = \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}$.

Generally, any $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ is determined by $q$ co-ordinate functions $f_1, \dots, f_q$, and we write
$$\tilde{f} = \begin{pmatrix} f_1(x_1, \dots, x_p) \\ \vdots \\ f_q(x_1, \dots, x_p) \end{pmatrix} \tag{1}$$
10.2 Jacobian Matrix
How do we define the derivative of a vector valued function?
Recall that if $f : \mathbb{R}^2 \to \mathbb{R}$ then we can form the directional derivative, i.e.,
$$f'_{\tilde{u}} = u_1 \frac{\partial f}{\partial x} + u_2 \frac{\partial f}{\partial y} = \nabla f \cdot \tilde{u}, \qquad \text{where } \tilde{u} = (u_1, u_2),$$
so that knowledge of the gradient of $f$ gives information about all directional derivatives.
Therefore it is reasonable to take
$$\nabla_{\tilde{p}} f = \left( \frac{\partial f}{\partial x}(\tilde{p}),\ \frac{\partial f}{\partial y}(\tilde{p}) \right)$$
to be the derivative of $f$ at $\tilde{p}$. (The story is more complicated than this, but if $f$ is "differentiable" then $\nabla f$ represents the derivative; see later.)
Jacobian Matrix (continued)
More generally, if $f : \mathbb{R}^p \to \mathbb{R}$, we take the derivative at $\tilde{a}$ to be the row vector
$$\left( \frac{\partial f}{\partial x_1}(\tilde{a}),\ \frac{\partial f}{\partial x_2}(\tilde{a}),\ \dots,\ \frac{\partial f}{\partial x_p}(\tilde{a}) \right) = \nabla_{\tilde{a}} f.$$
Now take $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ where $\tilde{f}$ is as in equation (1). Then the natural candidate for the derivative of $\tilde{f}$ at $\tilde{a}$ is
$$J_{\tilde{a}}\tilde{f} = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_p} \\[1ex] \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_p} \\[1ex] \vdots & \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial f_q}{\partial x_1} & \dfrac{\partial f_q}{\partial x_2} & \cdots & \dfrac{\partial f_q}{\partial x_p} \end{pmatrix}$$
where the partial derivatives are evaluated at $\tilde{a}$. This $q \times p$ matrix is called the Jacobian matrix of $\tilde{f}$.
Example 10.2
Writing the function $\tilde{f}$ as a column helps us to get the rows and columns of the Jacobian matrix the right way round.
Note that the word "Jacobian" on its own usually refers to the determinant of this matrix, which exists when the matrix is square, i.e. when $p = q$.
Example 10.2 Find the Jacobian matrix of $\tilde{f}$ from example 10.1, where
$$\tilde{f}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + y + z \\ xyz \end{pmatrix},$$
and evaluate it at the point $\tilde{a} = (1, 2, 3)$.
$$J\tilde{f} = \begin{pmatrix} 1 & 1 & 1 \\ yz & xz & xy \end{pmatrix} \qquad\text{and}\qquad J_{\tilde{a}}\tilde{f} = \begin{pmatrix} 1 & 1 & 1 \\ 6 & 3 & 2 \end{pmatrix}.$$
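Such computations are easy to check by machine. The following is a minimal sketch, assuming Python with the sympy library is available (not part of the course material):

```python
# A symbolic check of example 10.2, assuming sympy is available.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Matrix([x + y + z, x*y*z])      # the map f : R^3 -> R^2 from example 10.1
J = f.jacobian([x, y, z])              # the 2 x 3 Jacobian matrix
print(J)                               # Matrix([[1, 1, 1], [y*z, x*z, x*y]])
print(J.subs({x: 1, y: 2, z: 3}))      # Matrix([[1, 1, 1], [6, 3, 2]])
```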
Jacobian Determinant
Most of the cases we will be looking at have $p = q = 2$ or $p = q = 3$. Suppose $u = u(x, y)$ and $v = v(x, y)$. If we define $\tilde{f} : \mathbb{R}^2 \to \mathbb{R}^2$ by
$$\tilde{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix} \equiv \begin{pmatrix} f_1 \\ f_2 \end{pmatrix}$$
then the Jacobian matrix is
$$J\tilde{f} = \begin{pmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[1ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{pmatrix}$$
and the Jacobian (determinant) is
$$\det(J\tilde{f}) = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[1ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix} = \frac{\partial u}{\partial x}\frac{\partial v}{\partial y} - \frac{\partial v}{\partial x}\frac{\partial u}{\partial y}.$$
Example 10.3
We often denote $\det(J\tilde{f})$ by $\dfrac{\partial(u, v)}{\partial(x, y)}$.

Example 10.3 Polar to Cartesian co-ordinates, where $x = r\cos\theta$ and $y = r\sin\theta$. Then
$$\frac{\partial(x, y)}{\partial(r, \theta)} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$
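As a quick sanity check (again a sketch assuming sympy), the determinant can be computed symbolically:

```python
# Verifying example 10.3: the polar-to-Cartesian Jacobian determinant is r.
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
f = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)])
J = f.jacobian([r, theta])
print(sp.simplify(J.det()))   # r
```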
10.3 Derivatives
We have already noted that if $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ then the Jacobian matrix at each point $\tilde{a} \in \mathbb{R}^p$ is a $q \times p$ matrix.
Such a matrix $J_{\tilde{a}}\tilde{f}$ gives us a linear map $D_{\tilde{a}}\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ by
$$(D_{\tilde{a}}\tilde{f})(\tilde{x}) = J_{\tilde{a}}\tilde{f} \cdot \tilde{x} \qquad \text{for all } \tilde{x} \in \mathbb{R}^p.$$
Note that $\tilde{x}$ is a column vector.
Definition 10.1 Formally, $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^q$ is differentiable at $\tilde{a}$ if, for small $\tilde{h}$, $D_{\tilde{a}}\tilde{f}(\tilde{h})$ is a "good" approximation to $\tilde{f}(\tilde{a} + \tilde{h}) - \tilde{f}(\tilde{a})$, in the sense that
$$\lim_{\tilde{h} \to \tilde{0}} \frac{\|\tilde{f}(\tilde{a} + \tilde{h}) - \tilde{f}(\tilde{a}) - D_{\tilde{a}}\tilde{f}(\tilde{h})\|}{\|\tilde{h}\|} = 0,$$
where $\|\tilde{h}\| = \sqrt{h_1^2 + h_2^2 + \dots + h_p^2}$.
Derivatives (continued)
You should compare this to the one variable case: a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $a$ if
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$$
exists, and we call this limit $f'(a)$.
But we could equally well say that $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $a$ if there is a number, written $f'(a)$, for which
$$\lim_{h \to 0} \frac{|f(a + h) - f(a) - f'(a) \cdot h|}{|h|} = 0,$$
because a linear map $T : \mathbb{R} \to \mathbb{R}$ can only be multiplication by a number.
Example 10.4
Example 10.4 Write the derivative of the function in example 10.1 at $(1, 2, 3)$ as a linear map.
$$T(\tilde{x}) = \begin{pmatrix} 1 & 1 & 1 \\ 6 & 3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + y + z \\ 6x + 3y + 2z \end{pmatrix}.$$
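Definition 10.1 can also be probed numerically. Below is a minimal sketch, assuming numpy is available; the direction and step sizes are illustrative choices, not from the notes:

```python
# A numerical probe of definition 10.1 for example 10.4, assuming numpy:
# the ratio ||f(a+h) - f(a) - J_a f . h|| / ||h|| should tend to 0 as h -> 0.
import numpy as np

def f(v):
    x, y, z = v
    return np.array([x + y + z, x * y * z])

a = np.array([1.0, 2.0, 3.0])
J = np.array([[1.0, 1.0, 1.0],
              [6.0, 3.0, 2.0]])         # J_a f from example 10.2

direction = np.array([0.3, -0.5, 0.8])  # an arbitrary fixed direction
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    ratio = np.linalg.norm(f(a + h) - f(a) - J @ h) / np.linalg.norm(h)
    print(t, ratio)                     # the ratio shrinks roughly in proportion to t
```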
Suppose $\tilde{f}$ and $\tilde{g}$ are two differentiable functions from $\mathbb{R}^p$ to $\mathbb{R}^q$.
It is easy to see that the derivative of $\tilde{f} + \tilde{g}$ is the sum of the derivatives of $\tilde{f}$ and $\tilde{g}$.
We can take the dot product of $\tilde{f}$ and $\tilde{g}$ to get a function from $\mathbb{R}^p$ to $\mathbb{R}$, and then differentiate that. The result is a sort of product rule, but I'll leave you to work out what happens.
Since we cannot divide vectors, there cannot be a quotient rule; so of the standard differentiation rules, that leaves the chain rule.
10.4 The Chain Rule
Now suppose that $\tilde{g} : \mathbb{R}^p \to \mathbb{R}^s$ and $\tilde{f} : \mathbb{R}^s \to \mathbb{R}^q$.
We can now form the composition $\tilde{f} \circ \tilde{g}$ by mapping with $\tilde{g}$ first and then following with $\tilde{f}$:
$$\mathbb{R}^p \xrightarrow{\ \tilde{g}\ } \mathbb{R}^s \xrightarrow{\ \tilde{f}\ } \mathbb{R}^q, \qquad \tilde{x} \longmapsto \tilde{g}(\tilde{x}) \longmapsto \tilde{f}(\tilde{g}(\tilde{x})) \tag{2}$$
$$(\tilde{f} \circ \tilde{g})(\tilde{x}) = \tilde{f}(\tilde{g}(\tilde{x})) \qquad \forall\, \tilde{x} \in \mathbb{R}^p.$$
Example 10.5
Example 10.5 Let $\tilde{g} : \mathbb{R}^2 \to \mathbb{R}^2$ and $\tilde{f} : \mathbb{R}^2 \to \mathbb{R}^3$ be defined, respectively, by
$$\tilde{g}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ xy \end{pmatrix} \qquad\text{and}\qquad \tilde{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \tan^{-1} x \\ x - y \\ xy \end{pmatrix}.$$
Then $\tilde{f} \circ \tilde{g}$ is defined by
$$(\tilde{f} \circ \tilde{g})\begin{pmatrix} x \\ y \end{pmatrix} = \tilde{f}\left(\tilde{g}\begin{pmatrix} x \\ y \end{pmatrix}\right) = \tilde{f}\begin{pmatrix} x + y \\ xy \end{pmatrix} = \begin{pmatrix} \tan^{-1}(x + y) \\ x + y - xy \\ (x + y)(xy) \end{pmatrix}.$$
The chain rule
Now suppose $\tilde{f}$ and $\tilde{g}$ in equation (2) above, where
$$(\tilde{f} \circ \tilde{g})(\tilde{x}) = \tilde{f}(\tilde{g}(\tilde{x})) \qquad \forall\, \tilde{x} \in \mathbb{R}^p,$$
are differentiable. Then if $\tilde{b} = \tilde{g}(\tilde{a}) \in \mathbb{R}^s$, the maps $J_{\tilde{a}}\tilde{g} : \mathbb{R}^p \to \mathbb{R}^s$ and $J_{\tilde{b}}\tilde{f} : \mathbb{R}^s \to \mathbb{R}^q$ are defined, and we have:

Theorem 10.1 (The Chain Rule) Suppose that $\tilde{g} : \mathbb{R}^p \to \mathbb{R}^s$ and $\tilde{f} : \mathbb{R}^s \to \mathbb{R}^q$ are differentiable. Then
$$J_{\tilde{a}}(\tilde{f} \circ \tilde{g}) = J_{\tilde{g}(\tilde{a})}\tilde{f} \cdot J_{\tilde{a}}\tilde{g}.$$
This is again just like the one variable case, except that now we are multiplying matrices (see below).
Example 10.6
Example 10.6 Consider example 10.5:
$$\tilde{g}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ xy \end{pmatrix} \qquad\text{and}\qquad \tilde{f}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \tan^{-1} x \\ x - y \\ xy \end{pmatrix}$$
Find $J_{\tilde{a}}(\tilde{f} \circ \tilde{g})$ where $\tilde{a} = \begin{pmatrix} 1 \\ -2 \end{pmatrix}$.
Now
$$J_{\tilde{a}}\tilde{g} = \begin{pmatrix} 1 & 1 \\ y & x \end{pmatrix}_{\tilde{a}} = \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix}, \qquad \tilde{g}(\tilde{a}) = (-1, -2),$$
so
$$J_{\tilde{g}(\tilde{a})}\tilde{f} = \begin{pmatrix} (1 + x^2)^{-1} & 0 \\ 1 & -1 \\ y & x \end{pmatrix}_{x=-1,\ y=-2} = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 1 & -1 \\ -2 & -1 \end{pmatrix}$$
Example 10.6 (continued)
As
$$\tilde{f} \circ \tilde{g} = \begin{pmatrix} \tan^{-1}(x + y) \\ x + y - xy \\ (x + y)(xy) \end{pmatrix},$$
we have
$$J_{\tilde{a}}(\tilde{f} \circ \tilde{g}) = \begin{pmatrix} \dfrac{1}{1 + (x + y)^2} & \dfrac{1}{1 + (x + y)^2} \\[1ex] 1 - y & 1 - x \\ 2xy + y^2 & x^2 + 2xy \end{pmatrix}_{(1, -2)} = \begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ 3 & 0 \\ 0 & -3 \end{pmatrix}$$
We see that
$$\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ 3 & 0 \\ 0 & -3 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 1 & -1 \\ -2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ -2 & 1 \end{pmatrix},$$
as required.
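Theorem 10.1 can be verified on this example by machine; a minimal sketch, assuming sympy:

```python
# A symbolic check of theorem 10.1 on example 10.6, assuming sympy.
import sympy as sp

x, y = sp.symbols('x y')
g = sp.Matrix([x + y, x*y])
f = sp.Matrix([sp.atan(x), x - y, x*y])

a = {x: 1, y: -2}
Jg_a = g.jacobian([x, y]).subs(a)                    # J_a g
b = g.subs(a)                                        # g(a) = (-1, -2)
Jf_b = f.jacobian([x, y]).subs({x: b[0], y: b[1]})   # J_{g(a)} f

# f o g written out as in example 10.5, then differentiated directly:
fog = sp.Matrix([sp.atan(x + y), x + y - x*y, (x + y)*(x*y)])
Jfog_a = fog.jacobian([x, y]).subs(a)

print(sp.simplify(Jfog_a - Jf_b * Jg_a))             # zero matrix
```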
Revisiting the one-variable chain rule
The one variable chain rule is a special case of the one we have just met; the same can be said for the chain rules we saw in chapter 5.
Let $x : \mathbb{R} \to \mathbb{R}$ be a differentiable function of $t$ and $u : \mathbb{R} \to \mathbb{R}$ a differentiable function of $x$. Then $(u \circ x) : \mathbb{R} \to \mathbb{R}$ is given by $(u \circ x)(t) = u(x(t))$.
In the notation of this chapter,
$$J_t(u \circ x) = J_{x(t)}u \cdot J_t x, \qquad\text{i.e.}\qquad \left[\frac{d}{dt}(u \circ x)\right]_t = \left[\frac{du}{dx}\right]_{x(t)} \left[\frac{dx}{dt}\right]_t.$$
We usually write this as
$$\frac{du}{dt} = \frac{du}{dx}\frac{dx}{dt},$$
keeping in mind that when we write $\dfrac{du}{dt}$ we are thinking of $u$ as a function of $t$, i.e., $u(x(t))$, and when we write $\dfrac{du}{dx}$ we are thinking of $u$ as a function of $x$.
Revisiting the one-variable chain rule (continued)
Now suppose we have $x = x(t)$, $y = y(t)$ and $z = f(x, y)$. Then
$$\mathbb{R} \xrightarrow{\ (x,\,y)\ } \mathbb{R}^2 \xrightarrow{\ f\ } \mathbb{R}$$
and
$$J_t(f \circ \tilde{x}) = J_{\tilde{x}(t)} f \cdot J_t \tilde{x}.$$
Therefore
$$\frac{d}{dt} f(x(t), y(t)) = \begin{pmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \end{pmatrix} \cdot \begin{pmatrix} \dfrac{dx}{dt} \\[1ex] \dfrac{dy}{dt} \end{pmatrix},$$
so that, as we saw in chapter 5,
$$\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
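This identity is easy to confirm symbolically on a concrete case. A sketch, assuming sympy; the curve and function below are illustrative assumptions, not taken from the notes:

```python
# Confirming df/dt = f_x dx/dt + f_y dy/dt on an assumed example.
import sympy as sp

t, x, y = sp.symbols('t x y')
x_t, y_t = sp.cos(t), sp.sin(t)      # an example curve (assumption)
f = x**2 * y                          # an example function (assumption)

lhs = sp.diff(f.subs({x: x_t, y: y_t}), t)    # d/dt of f(x(t), y(t))
rhs = (sp.diff(f, x).subs({x: x_t, y: y_t}) * sp.diff(x_t, t)
       + sp.diff(f, y).subs({x: x_t, y: y_t}) * sp.diff(y_t, t))
print(sp.simplify(lhs - rhs))        # 0
```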
10.5 Inverse Functions
One-variable inverse function theorem
The inverse function theorem says that if $f'(a)$ is not zero, there is a differentiable inverse function $f^{-1}$ defined near $f(a)$ with
$$\left[\frac{d}{dt}(f^{-1})\right]_{f(a)} = \frac{1}{f'(a)}.$$
But with, for example, polar coordinates ($x = r\cos\theta$, $r = \sqrt{x^2 + y^2}$), if we differentiate we get
$$\frac{\partial r}{\partial x} = \frac{x}{\sqrt{x^2 + y^2}} = \frac{r\cos\theta}{r} = \cos\theta \qquad\text{and}\qquad \frac{\partial x}{\partial r} = \cos\theta,$$
i.e.,
$$\frac{\partial r}{\partial x} \neq \frac{1}{\partial x / \partial r}.$$
So the one variable inverse function theorem does not apply to partial derivatives.
Polar coordinates
However, there is a simple generalisation if we use the multivariable derivative, that is, the Jacobian matrix.
To continue with the polar coordinate example, define
$$\tilde{f}\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} x(r, \theta) \\ y(r, \theta) \end{pmatrix} \qquad\text{and}\qquad \tilde{g}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r(x, y) \\ \theta(x, y) \end{pmatrix}$$
Consider
$$(\tilde{f} \circ \tilde{g})\begin{pmatrix} x \\ y \end{pmatrix} = \tilde{f}\left(\tilde{g}\begin{pmatrix} x \\ y \end{pmatrix}\right) = \tilde{f}\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} = \mathrm{Id}\begin{pmatrix} x \\ y \end{pmatrix}$$
Therefore $\tilde{f} \circ \tilde{g} = \mathrm{Id}$, the identity operator on $\mathbb{R}^2$. Similarly $\tilde{g} \circ \tilde{f} = \mathrm{Id}$.
Polar coordinates (continued)
Recall that
$$\mathrm{Id}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad\text{so}\qquad J(\mathrm{Id}) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
the $2 \times 2$ identity matrix.
Thus by the chain rule (omitting the points of evaluation)
$$J\tilde{f} \cdot J\tilde{g} = J(\mathrm{Id}) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = J\tilde{g} \cdot J\tilde{f}, \qquad\text{so that}\qquad (J\tilde{f})^{-1} = J\tilde{g}.$$
So
$$\begin{pmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[1ex] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{pmatrix}^{-1} = \begin{pmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{pmatrix}$$
We can check this directly using $\dfrac{\partial r}{\partial x} = \dfrac{x}{\sqrt{x^2 + y^2}} = \cos\theta$, etc.
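The check can also be done by machine. A sketch assuming sympy; $\theta(x, y)$ is represented by atan2, which is a choice of branch, not something fixed by the notes:

```python
# A symbolic check that (J f)^(-1) = J g for polar coordinates, assuming sympy.
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)
x, y = sp.symbols('x y')

# f : (r, theta) -> (x, y) and g : (x, y) -> (r, theta)
Jf = sp.Matrix([r*sp.cos(theta), r*sp.sin(theta)]).jacobian([r, theta])
Jg = sp.Matrix([sp.sqrt(x**2 + y**2), sp.atan2(y, x)]).jacobian([x, y])

# Express J g at the point (x, y) = (r cos(theta), r sin(theta)) and compare.
Jg_polar = Jg.subs({x: r*sp.cos(theta), y: r*sp.sin(theta)})
print(sp.simplify(Jf.inv() - Jg_polar))   # zero matrix
```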
Multivariable inverse function theorem
The same idea works in general:

Theorem 10.2 (The Inverse Function Theorem) Let $\tilde{f} : \mathbb{R}^p \to \mathbb{R}^p$ be differentiable at $\tilde{a}$.
Then if $J_{\tilde{a}}\tilde{f}$ is an invertible matrix, there is a differentiable inverse function $\tilde{f}^{-1} : \mathbb{R}^p \to \mathbb{R}^p$ defined in some neighbourhood of $\tilde{b} = \tilde{f}(\tilde{a})$, and
$$J_{\tilde{b}}(\tilde{f}^{-1}) = (J_{\tilde{a}}\tilde{f})^{-1}$$
Note that the inverse function may only exist in a small region around $\tilde{b} = \tilde{f}(\tilde{a})$.
Example 10.7
Example 10.7 We saw earlier that for polar coordinates,
$$J\tilde{f} = \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix},$$
with determinant $r$.
So it follows from the inverse function theorem that the inverse function $\tilde{g}$ is differentiable wherever $r \neq 0$.
Example 10.8
Example 10.8 The function $\tilde{g} : \mathbb{R}^2 \to \mathbb{R}^2$ is given by
$$\tilde{g}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} xy \\ x + y \end{pmatrix}.$$
Where is $\tilde{g}^{-1}$ differentiable? Find the Jacobian matrix of $\tilde{g}^{-1}$ where it exists.
Firstly,
$$J\tilde{g} = \begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix} \qquad\text{and}\qquad \det J\tilde{g} = y - x,$$
so $\tilde{g}$ has a differentiable inverse everywhere except on the line $y = x$.
For the Jacobian of the inverse, we invert $J\tilde{g}$:
$$J\tilde{g}^{-1} = \begin{pmatrix} y & x \\ 1 & 1 \end{pmatrix}^{-1} = \frac{1}{y - x}\begin{pmatrix} 1 & -x \\ -1 & y \end{pmatrix}$$
Example 10.8 (continued)
Translating to $(u, v)$ coordinates, we have (for $\varepsilon = \pm 1$)
$$x = \frac{1}{2}\left(v + \varepsilon\sqrt{v^2 - 4u}\right), \qquad y = \frac{1}{2}\left(v - \varepsilon\sqrt{v^2 - 4u}\right).$$
(Note that $v^2 \geq 4u \iff (x - y)^2 \geq 0$, so the roots are real.)
How do we choose the sign of $\varepsilon$?
A function needs to be one-to-one to be invertible.
Here $\tilde{g}(a, b) = \tilde{g}(\alpha, \beta)$ implies that $(a, b) = (\alpha, \beta)$ or $(\beta, \alpha)$.
So if we restrict $\tilde{g}$ to either $x > y$ or to $y > x$ (two regions separated by $x = y$, of course), it is one-to-one, fixing $\varepsilon$.
Assuming we just look at $\tilde{g}$ defined on $x > y$ (so $\varepsilon = +1$), we get
$$J\tilde{g}^{-1} = \frac{1}{y - x}\begin{pmatrix} 1 & -x \\ -1 & y \end{pmatrix} = \frac{-1}{\sqrt{v^2 - 4u}}\begin{pmatrix} 1 & -\tfrac{1}{2}\left(v + \sqrt{v^2 - 4u}\right) \\ -1 & \tfrac{1}{2}\left(v - \sqrt{v^2 - 4u}\right) \end{pmatrix}.$$
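This final formula can be double-checked symbolically. A sketch assuming sympy, working on the $\varepsilon = +1$ (i.e. $x > y$) branch:

```python
# A symbolic check of example 10.8, assuming sympy: on the branch x > y,
# the Jacobian of the explicit inverse agrees with (J g)^(-1).
import sympy as sp

u, v = sp.symbols('u v')
root = sp.sqrt(v**2 - 4*u)
x_uv = (v + root) / 2            # x as a function of (u, v), branch x > y
y_uv = (v - root) / 2

# Jacobian of the explicit inverse, differentiated directly:
J_direct = sp.Matrix([x_uv, y_uv]).jacobian([u, v])

# Inverse of J g, translated to (u, v) coordinates:
x, y = sp.symbols('x y')
Jg = sp.Matrix([x*y, x + y]).jacobian([x, y])
J_formula = Jg.inv().subs({x: x_uv, y: y_uv})

print(sp.simplify(J_direct - J_formula))   # zero matrix
```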
Jacobians
Now let us apply the inverse function theorem to the Jacobian determinants. We recall that
$$\frac{\partial(r, \theta)}{\partial(x, y)} = \det J\tilde{g} = \begin{vmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[1ex] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{vmatrix} \qquad\text{and}\qquad \frac{\partial(x, y)}{\partial(r, \theta)} = \det J\tilde{f} = \begin{vmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{vmatrix}.$$
Since $J\tilde{g}$ and $J\tilde{f}$ are inverse matrices, their determinants are reciprocals (because $\det(J\tilde{g})\det(J\tilde{f}) = \det(J\tilde{g} \cdot J\tilde{f}) = \det(I) = 1$):
$$\frac{\partial(r, \theta)}{\partial(x, y)} = \frac{1}{\ \dfrac{\partial(x, y)}{\partial(r, \theta)}\ }.$$
This sort of result is true for any change of variable, in any number of dimensions, and will prove vitally important later.