MAST10007 Linear Algebra
THE UNIVERSITY OF MELBOURNE
SCHOOL OF MATHEMATICS AND STATISTICS
Summer 2022 Lecture Notes
These notes have been made in accordance with the provisions of Part VB of the Copyright Act for teaching purposes of the
University. These notes are for the use of students of the University of Melbourne enrolled in MAST10007 Linear Algebra.
Topic 1: Linear equations [AR 1.1 and 1.2]
1.1. Systems of linear equations. Row operations.
1.2. Reduction of systems to row-echelon form.
1.3. Reduction of systems to reduced row-echelon form.
1.4. Consistent and inconsistent systems.
1.1 Systems of linear equations. Row operations
Linear equations
Definition (Linear equation and linear system)
A linear equation in n variables, x1, x2, . . . , xn, is an equation of the form
a1x1 + a2x2 + · · ·+ anxn = b,
where a1, . . . , an and b are constants.
A finite collection of linear equations in the variables x1, x2, . . . , xn is
called a system of linear equations or a linear system.
Example
x1 + 5x2 + 6x3 = 100
x2 − x3 = −1
−x1 + x3 = 11
Definition (Homogeneous linear system)
A system of linear equations is called homogeneous if all the constants
on the right hand side are zero.
Example
x1 + 5x2 + 6x3 = 0
x2 − x3 = 0
−x1 + x3 = 0
Definition (Solution of a system of linear equations)
A solution to a system of linear equations in the variables x1, . . . , xn is a
set of values of these variables which satisfy every equation in the
system.
Example 1. Data fitting using a polynomial
Find the coefficients c , α, β, γ such that the cubic polynomial
y = c + αx + βx2 + γx3 passes through the points specified below.
x y
−0.1 0.90483
0 1
0.1 1.10517
0.2 1.2214
Solution:
Substituting the points (x , y) into the cubic polynomial gives:
0.90483 = c − 0.1α + 0.01β − 0.001γ
1 = c
1.10517 = c + 0.1α + 0.01β + 0.001γ
1.2214 = c + 0.2α + 0.04β + 0.008γ
So c = 1.
We can rewrite the system as 3 equations in 3 unknowns.
−0.1α+ 0.01β − 0.001γ = −0.09517
0.1α + 0.01β + 0.001γ = 0.10517
0.2α + 0.04β + 0.008γ = 0.2214
Note:
To solve these equations by hand is tedious, so we can use a computer
package such as MATLAB.
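For instance, a short NumPy sketch (Python shown here purely for illustration; MATLAB is the package used in this subject) that solves the 3 × 3 system above numerically:

import numpy as np

# Coefficients of alpha, beta, gamma in the three equations above
A = np.array([[-0.1, 0.01, -0.001],
              [ 0.1, 0.01,  0.001],
              [ 0.2, 0.04,  0.008]])
b = np.array([-0.09517, 0.10517, 0.2214])

alpha, beta, gamma = np.linalg.solve(A, b)
print(alpha, beta, gamma)  # approximately 1.00, 0.50, 0.18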
Example 2. (Revision)
Solve the following linear system.
2x − y = 3
x + y = 0
Solution:
Graphically
Solution is the intersection of the lines: (x , y) = (1,−1).
Note:
◮ Need accurate sketch.
◮ Not practical for three or more variables.
Elimination
2x − y = 3 (1)
x + y = 0 ⇒ y = −x (2)
Substituting (2) in (1) gives
2x + x = 3 ⇒ 3x = 3 ⇒ x = 1
Substituting into (2) gives y = −1.
Note:
Elimination of variables will always give a solution, but for three or
more variables we need to do this systematically, not in an ad hoc
manner.
Definition (Coefficient matrix and augmented matrix)
The coefficient matrix for a linear system is the matrix formed from the
coefficients in the equations.
The augmented matrix for a linear system is the matrix formed from the
coefficients in the equations and the constant terms. These are
separated by a vertical line.
Example 3.
Write down an augmented matrix for the following linear system:
2x − y = 3
x + y = 0
Solution:
[ 2 −1 | 3 ]
[ 1  1 | 0 ]
Row Operations
To find the solutions of a linear system, we perform row operations to
simplify the augmented matrix. An essential condition is that whichever
row operations we perform, we can recover the solutions to the original
system from the new matrix.
Definition (Elementary row operations)
The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another row.
Note:
The matrices produced after each row operation are not equal but are
equivalent, meaning that the solution is the same for the system
represented by each augmented matrix. We use the symbol ∼ to denote
equivalence of matrices.
Example 4.
Solve the following system using elementary row operations:
2x − y = 3
x + y = 0
Solution:
[ 2 −1 | 3 ]   R2 ↔ R1
[ 1  1 | 0 ]
∼
[ 1  1 | 0 ]   R2 → R2 − 2R1
[ 2 −1 | 3 ]
∼
[ 1  1 | 0 ]   R2 → −(1/3)R2
[ 0 −3 | 3 ]
∼
[ 1  1 | 0 ]   R1 → R1 − R2
[ 0  1 | −1 ]
∼
[ 1  0 | 1 ]
[ 0  1 | −1 ]
So x = 1, y = −1.
1.2 Reduction of systems to row-echelon form
Definition (Leading entry)
The leftmost non-zero entry in each row of the matrix is called the
leading entry.
Definition (Row-echelon form)
A matrix is in row-echelon form if:
1. For any row with a leading entry, all elements below that entry and
in the same column as it, are zero.
2. For any two non-zero rows, the leading entry of the lower row is
further to the right than the leading entry of the higher row.
3. Any row that consists solely of zeros is lower than any row with
non-zero entries.
Examples
[ 1 −2 3 4 5 ]   row-echelon form

[ 1 0 0 3 ]
[ 0 4 1 2 ]   row-echelon form
[ 0 0 0 3 ]

[ 0  0 0  2 4 ]
[ 0  0 3  1 6 ]
[ 0  0 0  0 0 ]   not row-echelon form
[ 2 −3 6 −4 9 ]
Gaussian elimination
Gaussian elimination is a systematic way to reduce a matrix to
row-echelon form using row operations.
Note:
The row-echelon form obtained is not unique.
Gaussian elimination
1. Interchange rows, if necessary, to bring a non-zero number to the
top of the first column with a non-zero entry.
2. Add suitable multiples of the top row to lower rows so that all
entries below the leading entry are zero. (Multiplying a row by a
constant is also allowed and is often useful.)
3. Start again at Step 1 applied to the matrix without the first row.
To solve a linear system we read off the equations from the row-echelon
matrix and then solve the equations to find the unknowns, starting with
the last equation. This final step is called back substitution.
Example 5.
Solve the following system of linear equations using Gaussian
elimination:
x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31
Solution:
Step 1. Write the system in augmented matrix form.
[ 1 −3  2 | 11 ]
[ 2 −3 −2 | 13 ]
[ 4 −2  5 | 31 ]
Step 2. Use elementary row operations to reduce the augmented matrix
to row-echelon form.
[ 1 −3  2 | 11 ]   R2 → R2 − 2R1
[ 2 −3 −2 | 13 ]   R3 → R3 − 4R1
[ 4 −2  5 | 31 ]
∼
[ 1 −3  2 | 11 ]   R2 → (1/3)R2
[ 0  3 −6 | −9 ]
[ 0 10 −3 | −13 ]
∼
[ 1 −3  2 | 11 ]   R3 → R3 − 10R2
[ 0  1 −2 | −3 ]
[ 0 10 −3 | −13 ]
∼
[ 1 −3  2 | 11 ]
[ 0  1 −2 | −3 ]
[ 0  0 17 | 17 ]
Step 3. Read off the equations from the augmented matrix and use back
substitution to solve for the unknowns, starting with the last equation.
This gives us three equations:
17z = 17 ⇒ z = 1
y − 2z = −3 ⇒ y = 2− 3 = −1
x − 3y + 2z = 11 ⇒ x = −3− 2 + 11 = 6
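As a quick check, a small NumPy sketch (illustrative, not part of the original notes) confirms this solution:

import numpy as np

A = np.array([[1, -3,  2],
              [2, -3, -2],
              [4, -2,  5]])
b = np.array([11, 13, 31])

print(np.linalg.solve(A, b))  # [ 6. -1.  1.]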
1.3 Reduction of systems to reduced row-echelon form
Definition (Reduced row-echelon form)
A matrix is in reduced row-echelon form if the following three conditions
are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.
Examples
[ 1 −2 3 −4 5 ]   reduced row-echelon form

[ 1 2 0 ]
[ 0 0 1 ]   reduced row-echelon form

[ 1 0 0 3 ]
[ 0 1 1 2 ]   is not in reduced row-echelon form
[ 0 0 0 3 ]

[ 1 0 0  2 4 ]
[ 0 1 0  1 6 ]
[ 0 0 0  0 0 ]   is not in reduced row-echelon form
[ 0 0 1 −4 9 ]
Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to
reduced row-echelon form using row operations.
Note:
The reduced row-echelon form obtained is unique.
Gauss-Jordan elimination
1. Use Gaussian elimination to reduce matrix to row-echelon form.
2. Multiply each non-zero row by an appropriate number to create a
leading 1 (type 2 row operations).
3. Use row operations (of type 3) to create zeros above the leading
entries.
Example 6.
Solve the following system of linear equations using Gauss-Jordan
elimination:
x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31
Solution:
Step 1. Use Gaussian elimination to reduce the augmented matrix to
row-echelon form.
From example 5, we have:
[ 1 −3  2 | 11 ]     [ 1 −3  2 | 11 ]
[ 2 −3 −2 | 13 ]  ∼  [ 0  1 −2 | −3 ]
[ 4 −2  5 | 31 ]     [ 0  0 17 | 17 ]
Step 2. Divide rows by their leading entry.
[ 1 −3  2 | 11 ]   R3 → (1/17)R3
[ 0  1 −2 | −3 ]
[ 0  0 17 | 17 ]
∼
[ 1 −3  2 | 11 ]
[ 0  1 −2 | −3 ]
[ 0  0  1 |  1 ]
Step 3. Add multiples of the last row to the rows above to make the
entries above its leading 1 equal to zero.
[ 1 −3  2 | 11 ]   R1 → R1 − 2R3
[ 0  1 −2 | −3 ]   R2 → R2 + 2R3
[ 0  0  1 |  1 ]
∼
[ 1 −3  0 |  9 ]
[ 0  1  0 | −1 ]
[ 0  0  1 |  1 ]
Step 4. Add multiples of the penultimate (2nd last) row to the rows
above to make the entries above its leading 1 equal to zero.
[ 1 −3  0 |  9 ]   R1 → R1 + 3R2
[ 0  1  0 | −1 ]
[ 0  0  1 |  1 ]
∼
[ 1  0  0 |  6 ]
[ 0  1  0 | −1 ]
[ 0  0  1 |  1 ]
Step 5. Read off the answers.
The solution is x = 6, y = −1, z = 1.
1.4 Consistent and inconsistent systems
Theorem
A system of linear equations has zero solutions, one solution, or
infinitely many solutions. There are no other possibilities.
We say that the system is:
◮ inconsistent if it has no solutions.
◮ consistent if it has at least one solution.
Note:
Every homogeneous linear system is consistent, since it always has at
least one solution: x1 = 0, . . . , xn = 0.
Inconsistent systems
We can determine whether a system is consistent or inconsistent by
reducing its augmented matrix to row-echelon form.
Theorem
The system is inconsistent if and only if there is at least one row in the
row-echelon matrix having all entries equal to zero except for a non-zero
final entry.
Example 7.
Solve the system with row-echelon form:
[ 1 0 1 | −2 ]
[ 0 2 2 |  4 ]
[ 0 0 0 | −3 ]
Solution:
The third equation says 0x1 + 0x2 + 0x3 = −3.
So the system is inconsistent with no solution.
Geometrically, an inconsistent system in 3 variables has no common
point of intersection for the planes determined by the system.
Consistent systems
Unique solution
For a consistent system with n variables, a unique solution exists when
the row-reduced coefficient matrix and augmented matrix have n
non-zero rows. We can read off the solution from the reduced
row-echelon form of the matrix.
Example 8.
Solve the system with reduced row-echelon form:
[ 1 0 0 |  2 ]
[ 0 1 0 | −4 ]
[ 0 0 1 | 15 ]
Solution:
Reading off the answers, we get
x1 = 2, x2 = −4, x3 = 15.
Infinitely many solutions
Example 9. Calculating flows in networks
At each node • we require
sum of flows in = sum of flows out.
Determine a, b, c and d in the following network.
Solution:
Equating the flow in and out of each node gives:
10 = a+ b (1)
a = 3 + c (2)
c + d = 6 (3)
b = 1 + d (4)
Rearranging the equations gives 4 equations in 4 unknowns:
a + b = 10 (1)
a− c = 3 (2)
c + d = 6 (3)
b − d = 1 (4)
Writing the equations as an augmented matrix gives:
[ 1 1  0  0 | 10 ]   R2 → R2 − R1
[ 1 0 −1  0 |  3 ]
[ 0 0  1  1 |  6 ]
[ 0 1  0 −1 |  1 ]
Using Gauss-Jordan elimination gives:
∼
[ 1  1  0  0 | 10 ]   R4 → R4 + R2
[ 0 −1 −1  0 | −7 ]
[ 0  0  1  1 |  6 ]
[ 0  1  0 −1 |  1 ]
∼
[ 1  1  0  0 | 10 ]   R2 → −R2
[ 0 −1 −1  0 | −7 ]   R4 → R4 + R3
[ 0  0  1  1 |  6 ]
[ 0  0 −1 −1 | −6 ]
∼
[ 1 1 0  0 | 10 ]   R2 → R2 − R3 (row-echelon form)
[ 0 1 1  0 |  7 ]
[ 0 0 1  1 |  6 ]
[ 0 0 0  0 |  0 ]
∼
[ 1 1 0  0 | 10 ]   R1 → R1 − R2
[ 0 1 0 −1 |  1 ]
[ 0 0 1  1 |  6 ]
[ 0 0 0  0 |  0 ]
∼
[ 1 0 0  1 |  9 ]   (reduced row-echelon form)
[ 0 1 0 −1 |  1 ]
[ 0 0 1  1 |  6 ]
[ 0 0 0  0 |  0 ]
The corresponding (non-zero) equations are
a + d = 9
b − d = 1
c + d = 6
The variable d can take on any value, so we let d = t ∈ R.
Then the other variables are given by
a = 9− t
b = 1 + t
c = 6− t
where t ∈ R.
There are an infinite number of solutions with the form
(a, b, c , d) = (9− t, 1 + t, 6− t, t), t ∈ R
Two examples of solutions for (a, b, c , d) are (9, 1, 6, 0) and (8, 2, 5, 1).
Example 10.
Solve the system with reduced row-echelon form:
[ 1 2 0 0 5 | 1 ]
[ 0 0 1 0 6 | 2 ]
[ 0 0 0 1 7 | 3 ]
[ 0 0 0 0 0 | 0 ]
Solution:
The corresponding (non-zero) equations are
x1 + 2x2 + 5x5 = 1
x3 + 6x5 = 2
x4 + 7x5 = 3
The variables x2 and x5 corresponding to columns with no leading entry
can take any value, so we let
x2 = s ∈ R and x5 = t ∈ R.
Then the other variables are given by
x1 = 1− 2s − 5t
x3 = 2− 6t
x4 = 3− 7t
where s, t ∈ R.
Therefore
(x1, x2, x3, x4, x5) = (1− 2s − 5t, s, 2 − 6t, 3− 7t, t), s, t ∈ R
Theorem
Suppose we have a consistent linear system with n variables.
◮ If the row-reduced augmented matrix has < n non-zero rows,
then the system has infinitely many solutions.
◮ If r is the number of non-zero rows in the row-echelon form,
then n − r parameters are needed to specify the solution set.
More precisely, in the row-echelon form of the matrix, there will be
n − r columns which contain no leading entry. We can choose the
variable corresponding to such a column arbitrarily. The values for the
remaining variables are then determined.
Geometrically, a consistent system in 3 variables has a common point,
line or plane of intersection for the planes determined by the system.
Example 11.
Find the values of a, b ∈ R for which the system
x − 2y + z = 4
2x − 3y + z = 7
3x − 6y + az = b
has
(a) no solution,
(b) a unique solution,
(c) an infinite number of solutions.
Solution:
Using row operations on the augmented matrix gives:
[ 1 −2 1 | 4 ]   R2 → R2 − 2R1
[ 2 −3 1 | 7 ]   R3 → R3 − 3R1
[ 3 −6 a | b ]
∼
[ 1 −2   1   | 4 ]
[ 0  1  −1   | −1 ]
[ 0  0  a−3  | b−12 ]
Looking at the last row of the augmented matrix, we find there is:
(a) no solution if a − 3 = 0 and b − 12 ≠ 0. Hence a = 3, b ≠ 12.
(b) a unique solution if a − 3 ≠ 0 and b − 12 is arbitrary. Hence
a ≠ 3, b ∈ R.
(c) an infinite number of solutions if a − 3 = 0 and b − 12 = 0. Hence
a = 3, b = 12.
Topic 2: Matrices and Determinants [AR 1.3–1.7, 2.1–2.4]
2.1 Matrix notation
2.2 Matrix operations
2.3 Matrix inverses
2.4 Elementary matrices
2.5 Linear systems revisited
2.6 Determinants
2.1 Matrix notation
Definition (Matrix)
A matrix of size m × n is a rectangular array of numbers with m rows
and n columns.
A =
[ A11 A12 . . . A1n ]
[ A21 A22 . . . A2n ]
[  ⋮    ⋮    ⋱   ⋮  ]
[ Am1 Am2 . . . Amn ]
or A = [Aij]
We denote by Aij the entry in the i-th row and j-th column of A, where
i = 1, 2, . . . , m and j = 1, 2, . . . , n.
Special matrices
◮ A matrix with the same number of rows as columns is called a
square matrix.
◮ A matrix with one row is called a row matrix or row vector.
◮ A matrix with one column is called a column matrix or column
vector.
◮ A square matrix with Aij = 0 for i ≠ j is called a diagonal matrix.
e.g. [2 0 0; 0 −1 0; 0 0 5]
◮ A square matrix with Aij = 0 for i > j is called an upper triangular
matrix.
e.g. [2 1 3; 0 −1 4; 0 0 5]
◮ A square matrix with Aij = 0 for i < j is called a lower triangular
matrix.
e.g. [2 0 0; 1 −1 0; 5 3 5]
◮ A matrix with all zero entries is called a zero matrix. We denote
the zero matrix by 0 or, if size is important, we write 0m,n to
denote the zero matrix of size m × n.
e.g. 02,2 = [0 0; 0 0], 02,3 = [0 0 0; 0 0 0]
◮ A square matrix A satisfying Aij = 1 if i = j and Aij = 0 if i ≠ j
is called an identity matrix. We write I to denote an identity matrix
or, if size is important, we write In to denote the identity matrix of
size n × n.
e.g. I2 = [1 0; 0 1], I3 = [1 0 0; 0 1 0; 0 0 1]
2.2 Matrix operations
Recall that we can multiply a matrix by a scalar, add matrices of the
same size, and multiply matrices if the sizes are compatible.
Definition (Scalar multiplication)
Let A be a matrix and c be any scalar. The product cA is the matrix
obtained by multiplying all entries of A by c . It is called a scalar
multiple of A:
(cA)ij = c(Aij)
Definition (Addition of matrices)
Let A and B be m × n matrices. The sum A+ B is an m × n matrix
obtained by adding corresponding entries of A and B .
(A+ B)ij = Aij + Bij
Definition (Matrix multiplication)
Let A be an m × n matrix and B be an n × q matrix. The product AB is
a matrix of size m × q.
The entry in position (i, j) of the matrix product is obtained by taking
row i of A and column j of B, then multiplying together the entries in
order and adding:
(AB)ij = Σ_{k=1}^{n} Aik Bkj
Note:
◮ The product AB is only defined if the number of columns in A is
equal to the number of rows in B.
◮ We can think of matrix multiplication in terms of dot products:
(AB)ij = (row i of A) · (column j of B).
Application of matrix multiplication
Example 1. Graphs and adjacency matrices
A graph is a set of points called vertices together with a set of lines or
curves called edges which connect certain vertices.
Every graph has a corresponding adjacency matrix that specifies which
vertices Vi are connected. If the graph has n vertices then the
adjacency matrix is n× n and the (i , j) entry Aij is equal to the number
of edges from Vi to Vj .
Determine the adjacency matrix for the above graph.
Solution:
The adjacency matrix is
A =
[ 0 1 0 0 1 ]
[ 1 0 1 1 1 ]
[ 0 1 0 1 0 ]
[ 0 1 1 0 1 ]
[ 1 1 0 1 0 ]
Example 2. Walks in a graph
A walk in a graph is a sequence of edges linking one vertex to another.
The length of a walk is equal to the number of edges it traverses.
Theorem
If A is an n × n adjacency matrix of a graph and (A^k)ij denotes the
(i, j) entry of A^k, then (A^k)ij is equal to the number of walks of
length k from Vi to Vj, for each k ≥ 1.
In particular, the entry Aij of A gives the number of walks of length 1
from Vi to Vj.
Problem:
Determine the number of walks of length 3 between V1 and V4 for the
graph in example 1.
Solution:
To find walks of length 3, we compute A3.
A^3 =
[ 2 6 3 3 5 ]
[ 6 6 6 7 7 ]
[ 3 6 2 5 3 ]
[ 3 7 5 4 7 ]
[ 5 7 3 7 4 ]
The (1, 4) entry of A^3 equals 3. Hence there are 3 walks of length 3
from V1 to V4.
The walks are:
V1 → V2 → V3 → V4
V1 → V2 → V5 → V4
V1 → V5 → V2 → V4
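A NumPy sketch (illustrative) reproducing this count by raising the adjacency matrix to the third power:

import numpy as np

# Adjacency matrix of the graph from example 1
A = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 1, 1],
              [0, 1, 0, 1, 0],
              [0, 1, 1, 0, 1],
              [1, 1, 0, 1, 0]])

A3 = np.linalg.matrix_power(A, 3)
print(A3[0, 3])  # 3 walks of length 3 from V1 to V4 (0-based indexing)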
Definition (Transpose of a matrix)
Let A be an m × n matrix. The transpose of A, denoted by A^T, is the
n × m matrix whose entries are given by interchanging the rows and
columns of A:
(A^T)ij = Aji
Example 3.
Let A = [1 2 3; 4 5 6]. Find A^T.
Solution:
A^T = [1 4; 2 5; 3 6]
Properties of the transpose
Let α be a scalar. The following properties hold whenever the matrix
products and sums exist:
1. (A^T)^T = A
2. (A + B)^T = A^T + B^T
3. (αA)^T = αA^T
4. (AB)^T = B^T A^T
Proof: Property 4.
(1) Check that (AB)^T and B^T A^T are the same size. If A is an m × n
matrix and B is an n × q matrix then
◮ AB is an m × q matrix
◮ (AB)^T is a q × m matrix
◮ B^T is a q × n matrix
◮ A^T is an n × m matrix
◮ B^T A^T is a q × m matrix
(2) Show that the corresponding entries are equal for all i and j .
((AB)^T)ij = (AB)ji
= Σ_{k=1}^{n} Ajk Bki
= Σ_{k=1}^{n} Bki Ajk
= Σ_{k=1}^{n} (B^T)ik (A^T)kj
= (B^T A^T)ij
2.3 Matrix inverses
Definition (Matrix inverse)
An n× n matrix A is called invertible if there exists a matrix B such that
AB = BA = In.
The matrix B is called the inverse of A and is denoted by A^-1. If A is
not invertible, we say that A is singular.
We can prove from the definition that:
◮ (If it exists) A^-1 has the same size as A.
◮ (If it exists) A^-1 is unique.
◮ If A is invertible, then A^-1 is invertible and (A^-1)^-1 = A.
◮ In^-1 = In
◮ 0 is singular.
◮ A matrix that has a row or column consisting entirely of zeros is
singular.
Properties of the matrix inverse
If A and B are n × n invertible matrices and α is a non-zero scalar, then
1. (αA)^-1 = (1/α) A^-1
2. (AB)^-1 = B^-1 A^-1
3. (A^n)^-1 = (A^-1)^n (for all integers n ≥ 1)
4. (A^T)^-1 = (A^-1)^T
Proof: Property 2.
Since A and B are n × n invertible matrices, then
AA^-1 = A^-1A = In and BB^-1 = B^-1B = In
So
(AB)(B^-1A^-1) = A(BB^-1)A^-1 = A In A^-1 = AA^-1 = In
Similarly,
(B^-1A^-1)(AB) = B^-1(A^-1A)B = B^-1 In B = B^-1B = In
Hence, AB is invertible and (AB)^-1 = B^-1A^-1.
Theorem (Inverse of a 2 × 2 matrix)
Let A = [a b; c d]. Then
1. A is invertible ⇐⇒ ad − bc ≠ 0.
2. If ad − bc ≠ 0, then A^-1 = (1/(ad − bc)) [d −b; −c a].
Example 4.
Find the inverse of A = [2 −1; 1 1].
Solution:
ad − bc = 2 + 1 = 3
A^-1 = (1/3) [1 1; −1 2]
Finding the inverse of a square matrix via Gauss-Jordan elimination
Calculating the inverse of a matrix
Algorithm: input: n × n matrix A
output: A−1 or “A is not invertible”.
1. Construct the augmented matrix [A | In].
2. Apply row operations to [A | In] to get the block corresponding to
A into reduced row-echelon form. This gives
[A | In] ∼ [R | B]
where R is in reduced row-echelon form.
3. If R = In, then A is invertible and A^-1 = B.
If R ≠ In, then A is singular, i.e. A^-1 does not exist.
Example 5.
Find the inverse of A = [1 2 1; −1 −1 1; 0 1 3].
Solution:
Form the augmented matrix [A | I3] and perform row operations so that
A is in reduced row-echelon form.
[  1  2 1 | 1 0 0 ]   R2 → R2 + R1
[ −1 −1 1 | 0 1 0 ]
[  0  1 3 | 0 0 1 ]
∼
[ 1 2 1 | 1 0 0 ]   R3 → R3 − R2
[ 0 1 2 | 1 1 0 ]
[ 0 1 3 | 0 0 1 ]
∼
[ 1 2 1 |  1  0 0 ]   R1 → R1 − R3
[ 0 1 2 |  1  1 0 ]   R2 → R2 − 2R3
[ 0 0 1 | −1 −1 1 ]
∼
[ 1 2 0 |  2  1 −1 ]   R1 → R1 − 2R2
[ 0 1 0 |  3  3 −2 ]
[ 0 0 1 | −1 −1  1 ]
∼
[ 1 0 0 | −4 −5  3 ]
[ 0 1 0 |  3  3 −2 ]
[ 0 0 1 | −1 −1  1 ]
Hence
A^-1 = [−4 −5 3; 3 3 −2; −1 −1 1]
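A NumPy sketch (illustrative) confirming this inverse:

import numpy as np

A = np.array([[ 1,  2, 1],
              [-1, -1, 1],
              [ 0,  1, 3]])

Ainv = np.linalg.inv(A)
print(Ainv)                               # matches the matrix above
print(np.allclose(A @ Ainv, np.eye(3)))  # True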
Example 6. Encrypting messages
A common way of sending an encrypted message is to assign an integer
value to each letter of the alphabet and send the message as a string of
integers.
For example the message
SEND MONEY
might be sent as
5, 8, 10, 21, 7, 2, 10, 8, 3
Here the S is represented by a 5, the E by an 8, and so on. Ciphers like
this are generally easy to break.
A more secure way is to use matrix multiplication and matrix inverses.
Step 1: Encrypting the message
We put the message (SEND MONEY) in a matrix by entering the
integers corresponding to each letter down the columns:
B = [5 21 10; 8 7 8; 10 2 3]
For simplicity, choose an invertible 3 × 3 matrix whose inverse has
integer entries, e.g.
A = [1 2 1; 2 5 3; 2 3 2]
The product
AB = [1 2 1; 2 5 3; 2 3 2][5 21 10; 8 7 8; 10 2 3]
= [31 37 29; 80 83 69; 54 67 50]
gives the encrypted message to be sent:
31, 80, 54, 37, 83, 67, 29, 69, 50
Step 2: Decrypting the message
The person receiving the encrypted message can decrypt it by
premultiplying the received message by A^-1:
A^-1(AB) = [1 −1 1; 2 0 −1; −4 1 1][31 37 29; 80 83 69; 54 67 50]
= [5 21 10; 8 7 8; 10 2 3]
which gives the sequence of integers
5, 8, 10, 21, 7, 2, 10, 8, 3
corresponding to
SEND MONEY
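A NumPy sketch (illustrative) of the whole encrypt/decrypt round trip:

import numpy as np

A = np.array([[1, 2, 1],
              [2, 5, 3],
              [2, 3, 2]])
B = np.array([[ 5, 21, 10],
              [ 8,  7,  8],
              [10,  2,  3]])  # message entered down the columns

C = A @ B                                      # encrypted message
Ainv = np.round(np.linalg.inv(A)).astype(int)  # this inverse has integer entries
print(Ainv @ C)                                # recovers B exactly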
2.4 Elementary matrices
The effect of an elementary row operation can be achieved by
multiplication on the left by an elementary matrix.
Definition (Elementary matrix)
An n× n matrix is an elementary matrix if it can be obtained from In by
performing a single elementary row operation.
Theorem
Let Ep be the elementary matrix obtained by applying a row operation p
to In.
If A is a matrix such that EpA is defined, then EpA is equal to the result
of performing p on A.
We can perform a sequence of elementary row operations using a
corresponding sequence of elementary matrices.
Example 7.
Find the elementary matrices obtained by applying the following row
operations on I3.
(a) R2 ↔ R3 (b) R1 → 2R1
(c) R3 → R3 + 3R1
Solution:
(a) [1 0 0; 0 0 1; 0 1 0]
(b) [2 0 0; 0 1 0; 0 0 1]
(c) [1 0 0; 0 1 0; 3 0 1]
Example 8.
Let A = [1 2; 3 4] and B = [1 2; 0 1]. Given that
[1 2; 3 4]   R2 → R2 − 3R1   ∼   [1 2; 0 −2]   R2 → −(1/2)R2   ∼   [1 2; 0 1]
write B as a product of elementary matrices and A.
Solution:
The elementary matrices for each row operation are:
R2 → R2 − 3R1 : E1 = [1 0; −3 1]
R2 → −(1/2)R2 : E2 = [1 0; 0 −1/2]
Then
B = E2E1A = [1 0; 0 −1/2][1 0; −3 1][1 2; 3 4]
Example 8 (extended).
Extend the row operations in the previous question so that A ∼ I2:
[1 2; 3 4] ∼ [1 2; 0 −2] ∼ [1 2; 0 1]   R1 → R1 − 2R2   ∼   [1 0; 0 1]
Write I2 as a product of elementary matrices and A.
Solution:
The elementary matrices for each row operation are:
R2 → R2 − 3R1 : E1 = [1 0; −3 1]
R2 → −(1/2)R2 : E2 = [1 0; 0 −1/2]
R1 → R1 − 2R2 : E3 = [1 −2; 0 1]
Then
I2 = E3E2E1A
If A ∼ In by a sequence of elementary row operations, then there is a
sequence of elementary matrices E1,E2, . . . ,Ek such that
EkEk−1 · · ·E2E1A = In.
Each of the elementary matrices Ep (p = 1, . . . , k) is invertible.
Theorem
1. Let A be an n × n matrix. Then
A is invertible ⇐⇒ A ∼ In.
2. Every invertible matrix can be written as a product of elementary
matrices.
2.5 Linear systems revisited
A linear system in the variables x1, . . . , xn can be written in the form
Ax = b
where A is an m × n matrix, b is an m × 1 matrix and x is the n × 1
matrix [x1, . . . , xn]^T.
Theorem
If A is an invertible n × n matrix, then any linear system of the form
Ax = b has a unique solution, given by x = A^-1b.
Proof:
Since A is an n × n invertible matrix, then AA^-1 = A^-1A = In. So
Ax = b
⇒ A^-1Ax = A^-1b
⇒ In x = A^-1b
⇒ x = A^-1b
Example 9.
Use a matrix inverse to solve the linear system
x + 2y + z = −3
−x − y + z = 11
y + 3z = 21
Solution:
Step 1. Write the system in the form Ax = b:
[1 2 1; −1 −1 1; 0 1 3][x; y; z] = [−3; 11; 21]
Step 2. Find A^-1.
From example 5:
A^-1 = [−4 −5 3; 3 3 −2; −1 −1 1]
Step 3.
The solution is x = A^-1b.
Hence
[x; y; z] = [−4 −5 3; 3 3 −2; −1 −1 1][−3; 11; 21]
= [12 − 55 + 63; −9 + 33 − 42; 3 − 11 + 21]
= [20; −18; 13]
General solutions of linear systems
The general solution (set of all solutions) of a linear system Ax = b is
related to the general solution of the homogeneous system Ax = 0.
Theorem (Superposition of solutions)
If Ax = b is consistent, then the general solution is obtained by taking
one particular solution p of Ax = b, and adding the general solution h
of Ax = 0. That is,
x = p+ h.
Proof:
If Ap = b and Ah = 0 then x = p+ h is a solution to Ax = b since
Ax = A(p+ h) = Ap+ Ah = b+ 0 = b.
Conversely, if Ap = b and Ax = b then h = x− p satisfies Ax = 0 since
Ah = A(x− p) = Ax− Ap = b− b = 0.
Example 10.
In Example 9 of topic 1, we found the general solution
(a, b, c , d) = (9− t, 1 + t, 6− t, t), t ∈ R.
Determine a particular solution and the general solution to the
homogeneous system.
Solution:
(a, b, c , d) = (9− t, 1 + t, 6− t, t)
= (9, 1, 6, 0) + t(−1, 1,−1, 1), t ∈ R
So a particular solution is
p = (9, 1, 6, 0).
The general solution to the homogeneous system is
h = t(−1, 1,−1, 1), t ∈ R.
Rank of a matrix
Definition (Rank of a matrix)
The rank of a matrix A is the number of non-zero rows in the reduced
row-echelon form of A.
Note:
◮ This is the same as the number of non-zero rows in any
row-echelon form of A.
◮ If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n.
Example 11.
Find the rank of
A = [1 −1 2 1; 0 1 1 −2; 1 −3 0 5]
Solution:
[ 1 −1 2  1 ]   R3 → R3 − R1
[ 0  1 1 −2 ]
[ 1 −3 0  5 ]
∼
[ 1 −1  2  1 ]   R3 → R3 + 2R2
[ 0  1  1 −2 ]
[ 0 −2 −2  4 ]
∼
[ 1 −1 2  1 ]
[ 0  1 1 −2 ]
[ 0  0 0  0 ]
Since there are 2 non-zero rows, rank(A) = 2.
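A NumPy sketch (illustrative) confirming the rank:

import numpy as np

A = np.array([[1, -1, 2,  1],
              [0,  1, 1, -2],
              [1, -3, 0,  5]])

print(np.linalg.matrix_rank(A))  # 2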
Rank and solutions of linear systems
We can rewrite the results of section 1.4 for the solutions of a linear
system in terms of rank.
Theorem
The linear system Ax = b, where A is an m × n matrix, has
1. No solution if rank(A) < rank([A |b]).
2. A unique solution if rank(A) = rank([A |b]) and rank(A) = n.
3. Infinitely many solutions if rank(A) = rank([A |b]) and
rank(A) < n.
Example 12.
Determine the number of solutions of the system with row-echelon form:
[ 1 −1 2 |  1 ]     [ 1 −1 2 |  1 ]
[ 0  1 1 | −2 ]  ∼  [ 0  1 1 | −2 ]
[ 1 −3 0 |  5 ]     [ 0  0 0 |  0 ]
Solution:
Now
◮ rank(A) = rank([A |b]) = 2, so the system is consistent
◮ the number of columns (variables) is 3
As rank(A) < 3, the system has infinitely many solutions by part (3) of
the theorem.
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible.
2. Ax = b has a unique solution for any b.
3. The rank of A is n.
4. The reduced row-echelon form of A is In.
2.6 Determinants
The inverse of A = [a b; c d] exists if and only if ad − bc ≠ 0. We call
ad − bc the determinant of A, and write this as det(A) or |A|.
Geometrically, |det(A)| gives the area of the parallelogram spanned by
the vectors v1 = (a, b), v2 = (c, d) of A.
For example, with v1 = (3, 0) and v2 = (0, 2) taken anticlockwise,
det(A) = |3 0; 0 2| = 6 and the area is 6;
with the rows swapped (clockwise orientation),
det(A) = |0 2; 3 0| = −6 and the area is again 6.
Extracting the key properties of the determinant for 2× 2 matrices
allows us to define the determinant for n × n matrices.
Definition (Determinant)
The determinant is a function that maps each n × n matrix A to a
number, denoted det(A) or |A|, that satisfies the following conditions:
D1. Adding one row to another row does not change the determinant.
D2. Multiplying any row by a scalar α multiplies the determinant by α.
D3. The determinant of the identity matrix In equals 1.
Note:
These three conditions specify the determinant uniquely.
We can show that the effects of performing elementary row operations
on the determinant are as follows.
Theorem
Let A be an n× n matrix.
1. If B is obtained from A by replacing a row of A by itself plus a
multiple of another row, then det(B) = det(A).
2. If B is obtained from A by multiplying a row of A by the scalar α,
then det(B) = α det(A).
3. If B is obtained from A by swapping two rows of A, then
det(B) = − det(A).
Note:
We can also perform elementary column operations on the matrix A
which have the same effect as the corresponding elementary row
operations on the determinant.
Example 13.
Compute the determinant of
A = [2 6 9; 0 3 8; 0 0 −1]
using row operations.
Solution:
Use row operations of type 3 to convert A to diagonal form. By
definition D1, this does not change the determinant.
|2 6 9; 0 3 8; 0 0 −1|   (R1 → R1 + 9R3, R2 → R2 + 8R3)
= |2 6 0; 0 3 0; 0 0 −1|   (R1 → R1 − 2R2)
= |2 0 0; 0 3 0; 0 0 −1|
Using definition D2 to take out the factors 2, 3, −1 from the rows, and
definition D3, gives
|A| = 2 × 3 × (−1) × |1 0 0; 0 1 0; 0 0 1| = −6
A similar argument proves the following:
Theorem
If A is an upper triangular or lower triangular matrix, then det(A) is the
product of the elements on the main diagonal.
Note:
We can use row operations to transform a matrix into upper triangular
form to make computation of the determinant easier.
Example 14.
Compute the determinant of
A = [1 2 1; −1 1 1; 0 1 3]
using row operations.
Solution:
|A| = |1 2 1; −1 1 1; 0 1 3|   (R2 → R2 + R1)
= |1 2 1; 0 3 2; 0 1 3|   (R3 → R3 − (1/3)R2)
= |1 2 1; 0 3 2; 0 0 7/3|
= 1 × 3 × (7/3) = 7
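A NumPy sketch (illustrative) checking the determinants in examples 13 and 14:

import numpy as np

A13 = np.array([[2, 6, 9], [0, 3, 8], [0, 0, -1]])
A14 = np.array([[1, 2, 1], [-1, 1, 1], [0, 1, 3]])

print(np.linalg.det(A13))  # -6.0 (up to floating-point rounding)
print(np.linalg.det(A14))  #  7.0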
Properties of the determinant
Theorem
Let A, B be n × n matrices and let α be a scalar. Then
1. If A has a row (or column) of zeros, then det(A) = 0.
2. If A has a row (or column) which is a scalar multiple of another
row (or column), then det(A) = 0.
3. det(αA) = α^n det(A)
4. det(AB) = det(A) det(B)
5. det(A^T) = det(A)
6. A is invertible if and only if det(A) ≠ 0.
7. If A is invertible, then det(A^-1) = 1/det(A).
Note:
Property 5 means that we can replace rows by columns in all the
previous properties of determinants.
Example 15.
Prove property 7 for the determinant.
Solution:
Since A is an n × n invertible matrix, then
AA^-1 = In
Taking determinants of both sides gives
det(AA^-1) = det(In)
⇒ det(A) det(A^-1) = det(In)   (property 4)
⇒ det(A) det(A^-1) = 1   (definition D3)
⇒ det(A^-1) = 1/det(A)   (property 6)
Cofactor expansion
Another way to calculate determinants is to use cofactor expansion.
Definition (Cofactor)
Let A be a square matrix. The (i, j)-cofactor of A, denoted by Cij, is
the number given by
Cij = (−1)^(i+j) det(A(i, j))
where A(i, j) is the matrix obtained from A by deleting the i-th row and
j-th column.
Example 16.
If A = [1 2 1; −1 −1 1; 0 1 3], calculate C23.
Solution:
Now
A(2, 3) = [1 2; 0 1]
So C23 = (−1)^5 det(A(2, 3)) = −1
Theorem (Cofactor expansion)
The determinant of an n × n matrix A can be computed by choosing
any row (or column) of A and multiplying the entries in that row (or
column) by their cofactors and then adding the resulting products.
That is, for each 1 ≤ i ≤ n,
det(A) = Ai1Ci1 + Ai2Ci2 + · · · + AinCin = Σ_{k=1}^{n} Aik Cik
(cofactor expansion along the i-th row)
and for each 1 ≤ j ≤ n,
det(A) = A1jC1j + A2jC2j + · · · + AnjCnj = Σ_{k=1}^{n} Akj Ckj
(cofactor expansion along the j-th column).
How do you remember the sign of the cofactor?
The (1, 1) cofactor always has sign +. Starting from there, imagine
walking to the square you want using either horizontal or vertical steps.
The appropriate sign will change at each step.
We can visualise this arrangement with the following matrix:
[ + − + − . . . ]
[ − + − + . . . ]
[ + − + − . . . ]
[ − + − + . . . ]
[ ⋮  ⋮  ⋮  ⋮  ⋱ ]
So, for example, C13 is assigned + but C32 is assigned −.
Example 17.
Compute the determinant of
A = [1 2 1; −1 1 1; 0 1 3]
using cofactor expansion.
Solution:
Expand along the third row since there is a zero.
det(A) = +0 × |2 1; 1 1| − 1 × |1 1; −1 1| + 3 × |1 2; −1 1|
= 0 − 1(1 × 1 − 1 × (−1)) + 3(1 × 1 − 2 × (−1))
= −2 + 9
= 7
as before.
Example 18.
Calculate |A| = |1 −2 0 1; 3 2 2 0; 1 0 1 0; 0 −4 2 4| using cofactors.
Solution:
Expand along the fourth column since there are 2 zeros.
|A| = (−1) × |3 2 2; 1 0 1; 0 −4 2| + 4 × |1 −2 0; 3 2 2; 1 0 1|
= −(3 × |0 1; −4 2| − 1 × |2 2; −4 2|)   (1st column)
+ 4 × (1 × |2 2; 0 1| − (−2) × |3 2; 1 1|)   (1st row)
So
|A| = −3(0 + 4) + (4 + 8) + 4[(2 − 0) + 2(3 − 2)]
= −12 + 12 + 8 + 8
= 16
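A short recursive sketch (illustrative) of cofactor expansion along the first row; it reproduces |A| = 16 for this example:

import numpy as np

def det_cofactor(A):
    # Determinant by cofactor expansion along the first row.
    # O(n!) work, so for illustration only; use np.linalg.det in practice.
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # delete row 1 and column j+1
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

A = np.array([[1, -2, 0, 1],
              [3,  2, 2, 0],
              [1,  0, 1, 0],
              [0, -4, 2, 4]])
print(det_cofactor(A))  # 16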
Topic 3: Euclidean Vector Spaces [AR 3.1-3.5]
3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Lines and planes
3.1 Vectors in Rn
Definition (Addition and scalar multiplication of vectors in Rn)
The set of all n-tuples of real numbers is denoted by
Rn = {(v1, v2, . . . , vn) | vi ∈ R for i = 1, 2, . . . , n}
Let v = (v1, v2, . . . , vn) ∈ Rn, u = (u1, u2, . . . , un) ∈ Rn and α ∈ R.
Then addition and scalar multiplication are defined by:
v + u = (v1, . . . , vn) + (u1, . . . , un) = (v1 + u1, . . . , vn + un)
αv = α(v1, v2, . . . , vn) = (αv1, αv2, . . . , αvn)
In R3, the unit vectors in the x, y, z directions are often denoted by
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
so any vector in R3 can be written in the form
v = (v1, v2, v3) = v1i + v2j + v3k.
Definition (Length of a vector)
The length (norm, magnitude) of a vector u = (u1, u2, . . . , un) ∈ Rn is
given by
‖u‖ = √(u1² + u2² + · · · + un²).
A vector of length 1 is called a unit vector.
Note:
Every non-zero vector u is parallel to a unit vector û, where
û = u/‖u‖.
Distance between two points
Let u, v ∈ Rn. The distance between u and v is
d(u, v) = ||v − u||.
Example 1.
Find the distance between the points P(1, 3,−1, 2) and Q(2, 1,−1, 3).
Solution:
Let u = (1, 3,−1, 2) and v = (2, 1,−1, 3).
The distance is
‖PQ‖ = ‖v − u‖ = ‖(1,−2, 0, 1)‖ = √(1 + 4 + 0 + 1) = √6
3.2 Dot product
Definition (Dot product)
Let u = (u1, u2, . . . , un) ∈ Rn and v = (v1, v2, . . . , vn) ∈ Rn.
We define the dot product (scalar product, Euclidean inner product) by
u · v = Σ_{k=1}^{n} uk vk = u1v1 + u2v2 + · · · + unvn.
Note:
If we write u and v as column matrices U = [u1, . . . , un]^T and
V = [v1, . . . , vn]^T, then
u · v = U^T V.
Properties of the dot product
Let u, v,w ∈ Rn and let α ∈ R be a scalar. Then
1. u · v is a scalar
2. u · v = v · u
3. (u+ v) · w = u · w + v ·w
4. u · (v + w) = u · v + u ·w
5. u · u = ‖u‖²
6. (αu) · v = α(u · v)
Geometry of the dot product
We can also define the dot product geometrically by
u · v = ‖u‖‖v‖ cos θ
where 0 ≤ θ ≤ π is the angle between the vectors u and v.
Note:
◮ Two non-zero vectors are perpendicular or orthogonal when
u · v = 0, so that θ = π/2.
◮ Since |cos θ| ≤ 1, we have
|u · v| ≤ ‖u‖ ‖v‖.
This is known as the Cauchy-Schwarz inequality.
Example 2.
Verify that the Cauchy-Schwarz inequality holds for the vectors
u = (0, 2, 2,−1) and v = (−1, 1, 1,−1).
Solution:
Now
u · v = (0, 2, 2,−1) · (−1, 1, 1,−1) = 0 + 2 + 2 + 1 = 5
and
‖u‖ = √(0 + 4 + 4 + 1) = √9 = 3
‖v‖ = √(1 + 1 + 1 + 1) = √4 = 2
⇒ ‖u‖‖v‖ = 6
Hence
|u · v| = 5 ≤ 6 = ‖u‖ ‖v‖.
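A NumPy sketch (illustrative) of the same check:

import numpy as np

u = np.array([0, 2, 2, -1])
v = np.array([-1, 1, 1, -1])

lhs = abs(np.dot(u, v))                      # 5
rhs = np.linalg.norm(u) * np.linalg.norm(v)  # 6.0
print(lhs <= rhs)  # True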
Example 3. Correlation
Correlation is a measure of how closely two variables are dependent.
Suppose we want to compare how closely 7 students’ assignment marks
correlate with their exam marks.
Student Assignment Mark Exam Mark
S1 99 100
S2 80 82.5
S3 79 79
S4 75.5 82.5
S5 87.5 91
S6 67 67.5
S7 76 68
Average 80.5 81.5
To measure the correlation we first need to adjust the scores so that
they have mean zero. This can be done by subtracting the average
score X¯ from each students’ score X .
Let's store the information in a matrix.
X − X̄ =
[  18.5   18.5 ]
[  −0.5    1   ]
[  −1.5   −2.5 ]
[  −5      1   ]
[   7      9.5 ]
[ −13.5  −14   ]
[  −4.5  −13.5 ]
To compare the exam and assignment scores, we compute the cosine of
the angle between the two columns of X − X̄ (denoted x1, x2):
cos θ = (x1 · x2)/(‖x1‖‖x2‖) = 656.75/((24.92)(28.62)) ≈ 0.92
A cosine value near 1 indicates that the scores are highly correlated.
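A NumPy sketch (illustrative) of this correlation computation:

import numpy as np

assignment = np.array([99, 80, 79, 75.5, 87.5, 67, 76])
exam       = np.array([100, 82.5, 79, 82.5, 91, 67.5, 68])

x1 = assignment - assignment.mean()  # centre each column at mean zero
x2 = exam - exam.mean()

cos_theta = np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
print(round(cos_theta, 2))  # 0.92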
Vector projection
Definition (Projection of v onto u)
Assuming u ≠ 0, the projection of v onto u is given by
proj_u v = ((u · v)/‖u‖²) u = (û · v)û
Geometrically, writing θ for the angle between u and v:
proj_u v = ‖v‖ cos θ û
= ‖û‖‖v‖ cos θ û
= (û · v)û
Note:
u · (v − proj_u v) = 0
Example 4.
Let u = (2,−1,−2) and v = (2, 1, 3). Find vectors v1 and v2 such that
v = v1 + v2
where v1 is parallel to u and v2 is perpendicular to u.
Solution:
Now v1 is the projection of v onto u.
u · v = (2,−1,−2) · (2, 1, 3) = 4 − 1 − 6 = −3
‖u‖² = 2² + (−1)² + (−2)² = 9
⇒ v1 = ((u · v)/‖u‖²) u = −(3/9)(2,−1,−2) = −(1/3)(2,−1,−2)
Hence
v2 = v − v1 = (2, 1, 3) + (1/3)(2,−1,−2) = (1/3)(8, 2, 7)
3.3 Cross product of vectors in R3
Definition (Cross product)
Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors in R3. We define
the cross product (or vector product) of u and v by
u × v = (u2v3 − u3v2)i + (u3v1 − u1v3)j + (u1v2 − u2v1)k.
It is easy to remember the cross product as a determinant expanded
along the first row:
u × v = |i j k; u1 u2 u3; v1 v2 v3|
= |u2 u3; v2 v3| i − |u1 u3; v1 v3| j + |u1 u2; v1 v2| k
Properties of the cross product
Let u, v,w ∈ R3 and α ∈ R be a scalar. Then
1. v × u = −(u× v)
2. (u+ v)× w = u× w + v × w
3. u× (v + w) = u× v + u× w
4. (αu)× v = α(u× v)
5. u× 0 = 0
6. u× u = 0
Note:
The cross product is defined only for R3.
Geometry of the cross product
We can also define the cross product geometrically by
u × v = ‖u‖ ‖v‖ sin(θ) n̂
where
◮ 0 ≤ θ ≤ π is the angle between u and v
◮ n̂ is a unit vector
◮ n̂ is perpendicular to both u and v
◮ n̂ points in the direction given by the right-hand rule. The
direction of u × v is determined by curling the fingers of your
right hand following the angle from u to v and noting the direction
your thumb is pointing.
Example 5.
Find a vector perpendicular to both (1, 1, 1) and (1,−1,−2).
Solution:
(1, 1, 1) × (1,−1,−2) = |i j k; 1 1 1; 1 −1 −2|
= i |1 1; −1 −2| − j |1 1; 1 −2| + k |1 1; 1 −1|
= i(−2 + 1) − j(−2 − 1) + k(−1 − 1)
= −i + 3j − 2k
So (−1, 3,−2) is perpendicular to both (1, 1, 1) and (1,−1,−2).
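A NumPy sketch (illustrative) of the same computation:

import numpy as np

u = np.array([1, 1, 1])
v = np.array([1, -1, -2])

w = np.cross(u, v)
print(w)                           # [-1  3 -2]
print(np.dot(w, u), np.dot(w, v))  # 0 0, so w is perpendicular to both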
Area of a parallelogram
Suppose that u, v ∈ R3. The area of the parallelogram defined by u and
v is equal to ‖u× v‖.
Writing θ for the angle between u and v:
Area of parallelogram = (base)(height)
= ‖u‖ ‖v‖ sin θ
= ‖u × v‖
Example 6.
Find the area of the triangle with vertices (2,−5, 4), (3,−4, 5) and
(3,−6, 2).
Solution:
u = (3,−4, 5) − (2,−5, 4) = (1, 1, 1)
v = (3,−6, 2) − (2,−5, 4) = (1,−1,−2)
Area of triangle = (1/2)‖u × v‖
= (1/2)‖−i + 3j − 2k‖   (example 5)
= (1/2)√((−1)² + 3² + (−2)²)
= √14 / 2
Definition (Scalar triple product)
Let u = (u1, u2, u3), v = (v1, v2, v3) and w = (w1, w2, w3) be vectors in
R3. The scalar triple product is the real number u · (v × w) given by
u · (v × w) = |u1 u2 u3; v1 v2 v3; w1 w2 w3|.
Properties of the scalar triple product
1. u · (u × v) = u · (v × u) = v · (u × u) = 0
2. u · (v × w) = w · (u × v) = v · (w × u)
3. u · (v × w) = −u · (w × v) = −v · (u × w) = −w · (v × u)
Volume of a parallelepiped
Suppose that u, v, w ∈ R3. Then the parallelepiped defined by the
vectors u, v and w has volume equal to the absolute value of the scalar
triple product of u, v and w:
Volume of parallelepiped = |u · (v × w)|
Writing θ for the angle between u and v × w:
Volume of parallelepiped = (area of base)(height)
= ‖v × w‖ ‖u‖ |cos θ|
= |u · (v × w)|
Example 7.
Find the volume of the parallelepiped with adjacent edges PQ, PR, PS,
where the points are P(2, 0,−1), Q(4, 1, 0), R(3,−1, 1) and
S(2,−2, 2).
Solution:
Now
PQ = (2, 1, 1), PR = (1,−1, 2), PS = (0,−2, 3)
Volume = |PQ · (PR × PS)|
= |det[2 1 1; 1 −1 2; 0 −2 3]|
= |2 × |−1 2; −2 3| − 1 × |1 1; −2 3||   (expand along the 1st column)
= |2(−3 + 4) − (3 + 2)|
= |−3|
= 3
3.4 Lines and planes
Equations of a straight line
The vector equation of a line through a point P0 in the direction of a
vector v is
r = r0 + tv, t ∈ R
where r0 = OP0 is the position vector of P0.
Each point on the line is given by a unique value of t.
For example, in R3, if r = (x, y, z), r0 = (x0, y0, z0) and v = (a, b, c),
the vector equation of the line is
(x, y, z) = (x0, y0, z0) + t(a, b, c), t ∈ R
Equating components gives the parametric equations for the line:
x = x0 + ta
y = y0 + tb
z = z0 + tc
t ∈ R
If a ≠ 0, b ≠ 0 and c ≠ 0, we can solve the parametric equations for t
and equate. This gives the Cartesian form of the line:
(x − x0)/a = (y − y0)/b = (z − z0)/c
Example 8.
Determine vector, parametric and Cartesian equations of the line
passing through the points P(−1, 2, 3) and Q(4,−2, 5).
Solution:
The direction of the line is
PQ = (4,−2, 5) − (−1, 2, 3) = (5,−4, 2)
Since (−1, 2, 3) is on the line, the vector equation of the line is:
(x, y, z) = (−1, 2, 3) + t(5,−4, 2), t ∈ R
Alternatively, since (4,−2, 5) is on the line, the vector equation can be
written as:
(x, y, z) = (4,−2, 5) + s(5,−4, 2), s ∈ R
The parameters are related by
s = t − 1
Writing the first vector equation in component form gives the
parametric equations of the line:
x = −1 + 5t
y = 2 − 4t
z = 3 + 2t
t ∈ R
Eliminating t we get the Cartesian form of the line:
(x + 1)/5 = (y − 2)/(−4) = (z − 3)/2
Example 9.
Find a vector equation of the line through the point (2, 0, 1) that is
parallel to the line
(x − 1)/1 = (y + 2)/(−2) = (z − 6)/2.
Does the point (0, 4,−3) lie on the line?
Solution:
The direction of the line is v = (1,−2, 2).
Since (2, 0, 1) is on the line, the vector equation of the line is:
(x, y, z) = (2, 0, 1) + t(1,−2, 2), t ∈ R
Since
(2, 0, 1) + t(1,−2, 2) = (0, 4,−3)
when t = −2, the point (0, 4,−3) lies on the line.
Definition
Two lines are said to
◮ intersect if there is a point lying on both lines.
◮ be parallel if their direction vectors are parallel.
◮ be skew if they do not intersect and are not parallel.
The angle between two lines is the angle θ between their direction
vectors. (We take the smaller of the two angles θ and π − θ.)
Equations of planes
The vector equation of a plane through a point P0 that contains two
non-parallel vectors in the directions u and v is
r = r0 + su + tv, s, t ∈ R,
where r0 = OP0 is the position vector of P0.
Every value of s and t determines a point on the plane.
In R3, a vector perpendicular to the plane is
n = u × v
We say that n is a normal vector to the plane.
If P is a point with position vector r, then
P lies on the plane ⇔ r − r0 is parallel to the plane
⇔ r − r0 is perpendicular to n
Hence the equation of the plane in R3 can also be written in
point-normal form:
(r − r0) · n = 0
Putting r = (x, y, z), r0 = (x0, y0, z0) and n = (a, b, c) in the
point-normal form, we derive the Cartesian equation of a plane:
r · n = r0 · n
⇒ (x, y, z) · (a, b, c) = (x0, y0, z0) · (a, b, c)
⇒ ax + by + cz = ax0 + by0 + cz0
⇒ ax + by + cz = d
where d is the constant given by
d = ax0 + by0 + cz0.
The angle between two planes in R3 is given by the angle θ between
their normal vectors. (We take the smaller of the two angles θ and
π − θ.)
Example 10.
Find a Cartesian equation of the plane with vector form
(x , y , z) = s(1,−1, 0) + t(2, 0, 1) + (−1, 1, 1), s, t ∈ R
Solution:
Two vectors in the plane are u = (1,−1, 0) and v = (2, 0, 1).
So a normal to the plane is given by
n = u × v = |i j k; 1 −1 0; 2 0 1| = −i − j + 2k
Hence
n = (a, b, c) = (−1,−1, 2).
Therefore, the Cartesian equation of the plane has the form
ax + by + cz = d
⇒ −x − y + 2z = d
Since (−1, 1, 1) is in the plane,
d = 1 − 1 + 2 = 2
and so the plane is
−x − y + 2z = 2
Example 11.
Find the vector form of the plane containing the points P(1, 0, 2),
Q(1, 2, 3) and R(4, 5, 6).
Solution:
Two vectors parallel to the plane are PQ = (0, 2, 1) and PR = (3, 5, 4).
Since (1, 0, 2) is in the plane, the vector equation can be written as:
(x, y, z) = (1, 0, 2) + s(0, 2, 1) + t(3, 5, 4), s, t ∈ R
Alternatively, since (4, 5, 6) is in the plane, the vector equation can be
written as:
(x, y, z) = (4, 5, 6) + p(0, 2, 1) + q(3, 5, 4), p, q ∈ R
The parameters are related by:
p = s, q = t − 1
Intersection of a line and a plane
Example 12.
Where does the line
(x − 1)/1 = (y − 2)/2 = (z − 3)/3
intersect the plane 3x + 2y + z = 20?
Solution:
Writing the line in parametric form gives
x = t + 1, y = 2t + 2, z = 3t + 3, t ∈ R
Substituting into the plane equation gives
3(t + 1) + 2(2t + 2) + 3t + 3 = 20
⇒ 10t = 10
⇒ t = 1
The line and plane intersect at (x, y, z) = (2, 4, 6)
Intersection of two planes
Example 13.
Find the vector form for the line of intersection of the two planes
x + 3y + 2z = 6 and 3x + 2y − z = 11.
Solution:
Solving the two equations for the planes simultaneously gives
[ 1 3  2 |  6 ]   R2 → R2 − 3R1
[ 3 2 −1 | 11 ]
∼
[ 1  3  2 |  6 ]   R2 → −(1/7)R2
[ 0 −7 −7 | −7 ]
∼
[ 1 3 2 | 6 ]   R1 → R1 − 3R2
[ 0 1 1 | 1 ]
∼
[ 1 0 −1 | 3 ]
[ 0 1  1 | 1 ]
Since there are no leading entries in column 3, let z = t ∈ R. Then
y = 1− t
x = 3 + t
The line of intersection is
(x , y , z) = (3 + t, 1− t, t), t ∈ R
⇒ (x , y , z) = (3, 1, 0) + t(1,−1, 1), t ∈ R
Topic 4: General Vector Spaces [AR 4.1 - 4.8]
4.1 General vector spaces
4.2 Subspaces
4.3 Linear dependence and linear independence
4.4 Spanning sets
4.5 Bases and dimension
4.6 Solution space, column space, row space
4.7 Coordinates relative to a basis
4.1 General vector spaces
Vector spaces
We want to extend the properties of vectors in Rn to more general sets
of objects. A vector space is a set V with two operations defined:
1. addition,
2. scalar multiplication.
We want these two operations to satisfy the basic algebraic properties
of vectors in Rn. This leads to a list of ten axioms (properties) that we
use as our definition.
Fields
The scalars are members of a number system F called a field, in which
addition, subtraction, multiplication, and division by a non-zero scalar
are defined.
Every field F contains a:
◮ zero scalar 0 such that α+ 0 = α for all α ∈ F.
◮ unit scalar 1 such that 1α = α for all α ∈ F.
In this subject, we focus on the fields F = R and F = C.
We will also look at F = F2, the integers modulo 2.
Definition (Vector space)
A vector space is a non-empty set V with two operations: addition and
scalar multiplication. These operations are required to satisfy the
following 10 rules (axioms).
For any u, v,w ∈ V :
Addition behaves well:
A1 u+ v ∈ V . (closure of vector addition)
A2 (u+ v) + w = u+ (v + w). (associativity)
A3 u+ v = v + u. (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V . (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0. (additive inverse)
Definition (Vector space continued)
For all u, v ∈ V and α, β ∈ F:
Scalar multiplication behaves well:
M1 αv ∈ V . (closure under scalar multiplication)
M2 α(βv) = (αβ)v. (associativity of scalar multiplication)
M3 1v = v. (multiplication by unit scalar)
Addition and scalar multiplication combine well :
D1 α(u+ v) = αu+ αv. (distributivity 1)
D2 (α+ β)v = αv + βv. (distributivity 2)
Note:
It follows from the axioms that for all v ∈ V :
1. 0v = 0.
2. (−1)v = −v.
Examples of vector spaces
Definition
A vector space that has R as the scalars is called a real vector space.
1. Rn
Our definition of vector spaces was based on algebraic properties of Rn,
so it is easy to check that Rn is a vector space with scalars R.
2. Vector space of matrices
Definition
Denote by Mm,n (or Mm,n(R)) the set of all m × n matrices with real
entries.
Mm,n is a real vector space with the usual matrix addition and scalar
multiplication operations.
Example 1.
Show that M2,2 is a real vector space.
Solution:
Define two vectors in V and a scalar in F.
Let u = [a11 a12; a21 a22] ∈ M2,2, v = [b11 b12; b21 b22] ∈ M2,2 and α ∈ R.
A1. Closure under vector addition
u + v = [a11 a12; a21 a22] + [b11 b12; b21 b22]
= [a11+b11 a12+b12; a21+b21 a22+b22] ∈ M2,2
A4. Existence of zero vector
Let the zero vector be 0 = [0 0; 0 0] ∈ M2,2, then
u + 0 = [a11 a12; a21 a22] + [0 0; 0 0] = [a11 a12; a21 a22] = u
A5. Additive inverse
Let the additive inverse be −u = [−a11 −a12; −a21 −a22] ∈ M2,2, then
u + (−u) = [a11 a12; a21 a22] + [−a11 −a12; −a21 −a22] = [0 0; 0 0] = 0
M1. Closure under scalar multiplication
αu = α[a11 a12; a21 a22] = [αa11 αa12; αa21 αa22] ∈ M2,2
M3. Multiplication by unit scalar
1u = 1[a11 a12; a21 a22] = [a11 a12; a21 a22] = u
Axioms A2, A3, M2, D1, D2
These axioms follow directly from properties of scalar multiplication and
matrix addition.
As all 10 axioms are satisfied, M2,2 is a real vector space.
3. Vector space of polynomials
Definition
For a fixed integer n ≥ 0, denote by Pn (or Pn(R)) the set of all
polynomials with degree at most n:
Pn = {a0 + a1x + a2x^2 + · · · + anx^n | a0, a1, . . . , an ∈ R}
Pn is a real vector space with the usual polynomial addition and scalar
multiplication operations. That is, if
a0 + · · · + anx^n ∈ Pn, b0 + · · · + bnx^n ∈ Pn and α ∈ R, then
(a0 + · · · + anx^n) + (b0 + · · · + bnx^n) = (a0 + b0) + · · · + (an + bn)x^n
α(a0 + a1x + · · · + anx^n) = (αa0) + (αa1)x + · · · + (αan)x^n.
4. Vector space of functions
Definition
Let
F(S ,R) = { all functions f : S → R }
denote the set of all functions from S to R, where S is a non-empty set.
Let f , g ∈ F(S ,R) and α ∈ R.
If we define vector addition and scalar multiplication by the functions
(f + g)(x) = f (x) + g(x)
(αf )(x) = αf (x)
for all x ∈ S , then F(S ,R) is a real vector space.
Example 2.
If f , g ∈ F(R,R) are given by
f (x) = sin x and g(x) = x2
then
(f + g)(x) = sin x + x2
and
(3f )(x) = 3 sin x .
Note:
Function spaces are important in quantum mechanics and the study of
differential equations.
5. Vector space F2^n
The field F2
Definition
The set containing just two elements
F2 = {0, 1},
equipped with the operations of addition and multiplication defined
below, is a field called the integers modulo 2.
Addition
0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 0
Multiplication
0× 0 = 0
1× 0 = 0
0× 1 = 0
1× 1 = 1
We call the arithmetic in F2, modular or mod 2 arithmetic.
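In code, mod 2 addition is the exclusive-or of bits; a Python sketch (illustrative):

# Mod 2 arithmetic: addition and multiplication reduce the result modulo 2
def add_f2(a, b):
    return (a + b) % 2  # same as a XOR b for bits

def mul_f2(a, b):
    return (a * b) % 2  # same as a AND b for bits

print(add_f2(1, 1), mul_f2(1, 1))  # 0 1

# Componentwise addition of two vectors in F2^4
u = [1, 0, 1, 1]
v = [1, 1, 0, 1]
print([add_f2(a, b) for a, b in zip(u, v)])  # [0, 1, 1, 0]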
The vector space F2^n
Definition
Denote by F2^n the set of vectors with integers modulo 2 as scalars:
F2^n = {(a1, a2, . . . , an) | ai ∈ F2, i = 1, . . . , n}
Let (a1, a2, . . . , an) ∈ F2^n, (b1, b2, . . . , bn) ∈ F2^n and α ∈ F2.
If we define addition and scalar multiplication by
(a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn)
α(a1, a2, . . . , an) = (αa1, αa2, . . . , αan)
where on the right-hand side the addition and scalar multiplication are
as for mod 2 arithmetic,
then F2^n is a vector space.
6. Complex vector spaces
Definition
A vector space that has C as the scalars is called a complex vector
space.
Example 3.
C2 = {(a1, a2) | a1, a2 ∈ C}
Let (a1, a2) ∈ C2, (b1, b2) ∈ C2 and α ∈ C.
If we define addition and scalar multiplication by
(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, αa2)
then C2 is a complex vector space.
Note:
The real vector spaces Rn, Mm,n(R) and Pn(R) have complex analogues
denoted by Cn, Mm,n(C) and Pn(C).
Example 4.
Consider V = R2. Let u = (a1, a2) ∈ R2, v = (b1, b2) ∈ R2 and
α ∈ R. Define addition and scalar multiplication as follows:
(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, 0)
Show that V is not a vector space with these operations.
Solution:
Find a specific counterexample:
Let v = (1, 1) ∈ V. Then
1v = 1(1, 1) = (1, 0) ≠ v
Since axiom M3 is not true, V is not a vector space with the stated
operations.
Applications of general vector spaces
Regarding matrices and polynomials as vector spaces allows geometrical
notions such as distance, angles and projection to be extended to these
objects. (See Topic 7.)
Example 5.
Let A = [a11 a12; a21 a22] and B = [b11 b12; b21 b22], so that A, B ∈ M2,2.
The distance between A and B can be defined as
√((a11 − b11)² + (a12 − b12)² + (a21 − b21)² + (a22 − b22)²)
This is useful in comparing digital photographs since the data is
naturally a matrix (each pixel gives an entry).
Example 6.
The vector space F2^n is used in coding theory applications to reduce
errors occurring in data storage and message transmission.
One way of coding data is to use a Hamming code so that errors can
not only be detected but also corrected efficiently.
1. Data is stored in the form of a string of 1s and 0s. These 1s and 0s
are called binary numbers.
2. Mod 2 arithmetic matches this binary representation of digital
information.
The Hamming code is a cleverly chosen subspace of F2^8. It has 16
elements. Declaring the elements of the Hamming code as the valid
words of length 8 is what makes error correction work.
4.2 Subspaces
Definition (Subspace)
A subspace W of a vector space V is a subset W ⊆ V that is itself a
vector space using the addition and scalar multiplication operations
defined on V .
The following theorem allows us to avoid checking all 10 vector space
axioms.
Theorem (Subspace theorem)
Let V be a vector space with scalars F. A subset W ⊆ V is a subspace
of V if and only if
0. W is non-empty.
1. W is closed under vector addition:
u, v ∈W ⇒ u+ v ∈W.
2. W is closed under scalar multiplication:
u ∈W , α ∈ F⇒ αu ∈W.
Example 7.
Show that S = {(x , y , z) ∈ R3 | x + y + z = 0} is a subspace of R3.
Solution:
Define two vectors in S and a scalar in R
Let u = (x1, y1, z1) ∈ S , v = (x2, y2, z2) ∈ S and α ∈ R.
Then
x1 + y1 + z1 = 0
x2 + y2 + z2 = 0
0. S is non-empty
(0, 0, 0) ∈ S as 0 + 0 + 0 = 0
So S is not empty.
1. Closure under vector addition
u+ v = (x1, y1, z1) + (x2, y2, z2) = (x1 + x2, y1 + y2, z1 + z2) ∈ R3
Now
(x1 + x2) + (y1 + y2) + (z1 + z2) = (x1 + y1 + z1) + (x2 + y2 + z2)
= 0 + 0
= 0
⇒ u+ v ∈ S
2. Closure under scalar multiplication
αu = α(x1, y1, z1) = (αx1, αy1, αz1) ∈ R3
Now
αx1 + αy1 + αz1 = α(x1 + y1 + z1) = α(0) = 0
⇒ αu ∈ S
Since S is a non-empty set that is closed under vector addition and
scalar multiplication, S is a subspace of R3.
Example 8.
Let S ⊂ P2 be the set of polynomials defined by
S = {a1x + a2x^2 | a1, a2 ∈ R}
Is S a subspace of P2?
Solution:
Define two vectors in S and a scalar in R.
Let u = a1x + a2x^2 ∈ S, v = b1x + b2x^2 ∈ S and α ∈ R.
0. S is non-empty
0x + 0x^2 ∈ S
So S is not empty.
1. Closure under vector addition
u + v = a1x + a2x^2 + b1x + b2x^2
= (a1 + b1)x + (a2 + b2)x^2
= c1x + c2x^2, c1, c2 ∈ R
⇒ u + v ∈ S
2. Closure under scalar multiplication
αu = α(a1x + a2x^2)
= (αa1)x + (αa2)x^2
= c1x + c2x^2, c1, c2 ∈ R
⇒ αu ∈ S
Since S is a non-empty set that is closed under vector addition and
scalar multiplication, S is a subspace of P2.
Definition (Trace)
If A is an n × n matrix, then the trace of A is the sum of the diagonal
entries, namely:
Tr(A) = A11 + A22 + ...+ Ann
Example 9.
Let W be the real 2× 2 matrices whose trace is equal to 0. Show that
W is a subspace of M2,2.
Solution:
Define two vectors in W and a scalar in R.
Let u = [a11 a12; a21 a22] ∈ W, v = [b11 b12; b21 b22] ∈ W and α ∈ R.
Then
Tr(u) = a11 + a22 = 0
Tr(v) = b11 + b22 = 0
0. W is non-empty
[0 0; 0 0] ∈ W as Tr([0 0; 0 0]) = 0 + 0 = 0
So W is not empty.
1. Closure under vector addition
u + v = [a11 a12; a21 a22] + [b11 b12; b21 b22]
= [a11+b11 a12+b12; a21+b21 a22+b22] ∈ M2,2
So
Tr(u + v) = (a11 + b11) + (a22 + b22)
= (a11 + a22) + (b11 + b22)
= 0 + 0
= 0
⇒ u + v ∈ W
2. Closure under scalar multiplication
αu = α[a11 a12; a21 a22] = [αa11 αa12; αa21 αa22] ∈ M2,2
So
Tr(αu) = αa11 + αa22
= α(a11 + a22)
= α(0)
= 0
⇒ αu ∈ W
Since W is a non-empty set that is closed under vector addition and
scalar multiplication, W is a subspace of M2,2.
Example 10.
Let W be the set of real 2 × 2 matrices with determinant equal to 0:
W = {[a b; c d] | a, b, c, d ∈ R and ad − bc = 0}
Show that W is not a subspace of M2,2.
Solution:
Find a specific counterexample
Let u = [1 0; 0 0] and v = [0 0; 0 1].
Since det(u) = 0 and det(v) = 0, then u, v ∈ W.
Now u + v = [1 0; 0 1], so det(u + v) = 1 ≠ 0.
Since u + v ∉ W, W is not a subspace of M2,2.
Example 11.
Consider the set
H = {(x, y) ∈ R2 | x ≥ 0, y ≥ 0}
Is H a subspace of R2?
Solution:
We suspect H is not a subspace, so find a specific counterexample.
Let v = (1, 1) ∈ H and α = −1.
⇒ αv = −1(1, 1) = (−1,−1) ∉ H
H is not closed under scalar multiplication, so H is not a subspace of
R2.
Useful facts about subspaces
1. A subspace of V must contain the zero vector 0 ∈ V .
2. The only subspaces of R2 are
• {0}
• Lines through the origin
• R2
3. The only subspaces of R3 are
• {0}
• Lines through the origin
• Planes through the origin
• R3
Example 12.
Is the line {(x , y) ∈ R2 | y = 2x + 1} a subspace of R2?
Solution:
No, since the line does not pass through the origin.
4.3 Linear dependence and linear independence
Linear combinations
Let V be a vector space with scalars F.
Definition (Linear combination)
A linear combination of vectors v1, v2, . . . , vk ∈ V is the sum
Σ_{i=1}^{k} αi vi = α1v1 + · · · + αk vk,
where each αi ∈ F.
Example 13.
Show that (−1, 4, 2) ∈ R3 is a linear combination of
(a) i, j and k,
(b) u1 = (1, 2, 1) and u2 = (6, 0, 0).
Solution:
(a) (−1, 4, 2) = −1i + 4j + 2k
(b) (−1, 4, 2) = (2, 4, 2) − (3, 0, 0) = 2u1 − (1/2)u2
Example 14.
Show that (1, 2, 3) ∈ R3 is not a linear combination of (1,−1, 2) and
(−1, 1, 2)
Solution:
We need to show that there are no values of α1 and α2 in R such that
(1, 2, 3) = α1(1,−1, 2) + α2(−1, 1, 2)
= (α1 − α2,−α1 + α2, 2α1 + 2α2)
This gives a system of linear equations:
α1 − α2 = 1
−α1 + α2 = 2
2α1 + 2α2 = 3
1 −1 1−1 1 2
2 2 3
R2 → R2 + R1
R3 → R3 − 2R1
∼
1 −1 10 0 3
0 4 1
The system is inconsistent so there is no solution for α1 and α2.
Hence (1, 2, 3) is not a linear combination of (1,−1, 2) and (−1, 1, 2).
160
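A computer algebra system reaches the same conclusion. A minimal SymPy sketch (the symbol names are arbitrary):

    from sympy import Matrix, symbols, linsolve

    a1, a2 = symbols('a1 a2')
    A = Matrix([[1, -1], [-1, 1], [2, 2]])  # columns are (1,-1,2) and (-1,1,2)
    b = Matrix([1, 2, 3])
    print(linsolve((A, b), a1, a2))         # EmptySet: no solution, so no linear combination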
Example 15. Computer graphics application
Linear combinations are used to generate colours in computer graphics.
Three primary colours are represented as unit vectors in R3.
Red := (1, 0, 0) Green := (0, 1, 0) Blue := (0, 0, 1)
To generate a general colour we form a vector of the coefficients
α(1, 0, 0) + β(0, 1, 0) + γ(0, 0, 1)
with α, β, γ chosen between 0 and 1.
Examples
One example colour has α = 0.8, β = 0.47, γ = 0.25.
Another has α = 0.8, β = 0.12, γ = 0.5.
Note:
These primary colours are different from the standard primary colours
red, blue, yellow.
161
Linear dependence and linear independence
Definition (Linear dependence)
A set of vectors {v1, . . . , vk} ⊆ V is linearly dependent if there are
scalars α1, . . . , αk ∈ F, not all equal to zero, such that
α1v1 + α2v2 + · · · + αkvk = 0.
Theorem
A set of vectors v1, . . . , vk with k ≥ 2 is linearly dependent if and only
if one vector is a linear combination of the others.
In particular:
Two vectors are linearly dependent if and only if one is a multiple of the
other.
162
Note:
Any set of vectors containing the zero vector 0 is linearly dependent.
Definition (Linear independence)
A set of vectors {v1, . . . , vk} ⊆ V is linearly independent if it is not
linearly dependent. Equivalently, if
α1v1 + α2v2 + · · · + αkvk = 0,
where α1, . . . , αk ∈ F are scalars, then the only solution is
α1 = 0, α2 = 0, . . . , αk = 0.
163
Example 16.
Are the following vectors linearly dependent or linearly independent?
(a) u = (2,−1, 1), v = (−6, 3,−3)
(b) u = (2,−1, 1), v = (4, 0, 2)
Solution:
(a) u and v are linearly dependent since
v = −3(2,−1, 1) = −3u ⇒ 3u+ v = 0
(b) u and v are linearly independent since neither vector is a multiple of
the other.
(4, 0, 2) ≠ α(2,−1, 1) for any α ∈ R.
164
Example 17.
Are the vectors (2, 0, 0), (6, 1, 7), (2,−1, 2) linearly dependent or linearly
independent?
Solution:
Suppose that
α1(2, 0, 0) + α2(6, 1, 7) + α3(2,−1, 2) = 0
for some α1, α2, α3 ∈ R.
This gives a system of linear equations:
2α1 + 6α2 + 2α3 = 0
α2 − α3 = 0
7α2 + 2α3 = 0
165
Reducing the augmented matrix to row echelon form gives
[ 2 6  2 | 0 ]  R1 → (1/2)R1       [ 1 3  1 | 0 ]
[ 0 1 −1 | 0 ]  R3 → R3 − 7R2  ∼   [ 0 1 −1 | 0 ]
[ 0 7  2 | 0 ]                     [ 0 0  9 | 0 ]
The only solution is α1 = α2 = α3 = 0.
Hence (2, 0, 0), (6, 1, 7), (2,−1, 2) are linearly independent.
Note:
The augmented matrix has the original vectors as its columns.
166
Example 18.
Are the vectors (1, 2, 0), (1, 5, 3), (0, 1, 1) linearly dependent or linearly
independent?
Solution:
Suppose that
α1(1, 2, 0) + α2(1, 5, 3) + α3(0, 1, 1) = 0
for some α1, α2, α3 ∈ R.
This gives a system of linear equations:
α1 + α2 = 0
2α1 + 5α2 + α3 = 0
3α2 + α3 = 0
167
Reducing the augmented matrix to row echelon form gives:
[ 1 1 0 | 0 ]                     [ 1 1 0 | 0 ]                   [ 1 1 0 | 0 ]
[ 2 5 1 | 0 ]  R2 → R2 − 2R1  ∼   [ 0 3 1 | 0 ]  R3 → R3 − R2 ∼   [ 0 3 1 | 0 ]
[ 0 3 1 | 0 ]                     [ 0 3 1 | 0 ]                   [ 0 0 0 | 0 ]
There are infinitely many solutions, so there is a solution with
(α1, α2, α3) ≠ (0, 0, 0).
Hence (1, 2, 0), (1, 5, 3), (0, 1, 1) are linearly dependent.
168
How to determine if vectors in Rn are linearly independent
The previous two examples showed that if A = [v1 · · · vk ] where
v1, · · · , vk ∈ Rn, then:
1. v1, · · · , vk ∈ Rn are linearly independent ⇔ the linear system for
the augmented matrix [A|0] has a unique solution.
2. v1, . . . , vk ∈ Rn are linearly independent ⇔ rank(A) = k
3. If k = n then
v1, . . . , vn ∈ Rn are linearly independent ⇔ det(A) ≠ 0
169
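Criteria 2 and 3 are easy to test numerically. An illustrative NumPy sketch using the vectors from example 17:

    import numpy as np

    A = np.array([[2, 6, 2],
                  [0, 1, -1],
                  [0, 7, 2]])  # columns are (2,0,0), (6,1,7), (2,-1,2)
    print(np.linalg.matrix_rank(A))  # 3 = k -> linearly independent
    print(np.linalg.det(A))          # 18.0, non-zero, consistent with k = n = 3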
Example 19.
Determine if the following polynomials
2 + 2x + 5x², 1 + x + x², 1 + 2x + 3x²
are linearly independent.
Solution:
Suppose that
α1(2 + 2x + 5x²) + α2(1 + x + x²) + α3(1 + 2x + 3x²) = 0 + 0x + 0x².
for some α1, α2, α3 ∈ R.
Equating coefficients of 1, x , x² gives a system of 3 linear equations:
2α1 + α2 + α3 = 0
2α1 + α2 + 2α3 = 0
5α1 + α2 + 3α3 = 0
170
We can omit the last column of zeros in the augmented matrix. Solving
gives:
[ 2 1 1 ]  R2 → R2 − R1             [ 2   1    1  ]               [ 2   1    1  ]
[ 2 1 2 ]  R3 → R3 − (5/2)R1   ∼    [ 0   0    1  ]   R2 ↔ R3 ∼   [ 0 −3/2  1/2 ]
[ 5 1 3 ]                           [ 0 −3/2  1/2 ]               [ 0   0    1  ]
The matrix is of rank 3 which is equal to the number of unknowns. So
the only solution is α1 = α2 = α3 = 0.
Therefore, the polynomials are linearly independent.
171
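The same test can be run in SymPy by representing each polynomial by its coefficient vector with respect to {1, x, x²}; a sketch:

    from sympy import Matrix

    A = Matrix([[2, 1, 1],   # constant coefficients
                [2, 1, 2],   # coefficients of x
                [5, 1, 3]])  # coefficients of x²
    print(A.rank())          # 3 = number of unknowns -> linearly independent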
Example 20.
Are the following matrices linearly independent?
[ 1 3 ]   [ −2  1 ]   [ 1 10 ]
[ 1 1 ] , [  1 −1 ] , [ 4  2 ] .
Solution:
Suppose that
α1 [ 1 3 ] + α2 [ −2  1 ] + α3 [ 1 10 ]  =  [ 0 0 ]
   [ 1 1 ]      [  1 −1 ]      [ 4  2 ]     [ 0 0 ]
for some α1, α2, α3 ∈ R.
Equating components in the matrices gives a system of 4 linear
equations:
α1 − 2α2 + α3 = 0
3α1 + α2 + 10α3 = 0
α1 + α2 + 4α3 = 0
α1 − α2 + 2α3 = 0
172
We can omit the last column of zeros in the augmented matrix. Solving
gives:
[ 1 −2  1 ]  R2 → R2 − 3R1      [ 1 −2 1 ]
[ 3  1 10 ]  R3 → R3 − R1   ∼   [ 0  7 7 ]
[ 1  1  4 ]  R4 → R4 − R1       [ 0  3 3 ]
[ 1 −1  2 ]                     [ 0  1 1 ]

            [ 1 −2 1 ]  R3 → R3 − 3R2      [ 1 −2 1 ]
R2 ↔ R4 ∼   [ 0  1 1 ]  R4 → R4 − 7R2  ∼   [ 0  1 1 ]
            [ 0  3 3 ]                     [ 0  0 0 ]
            [ 0  7 7 ]                     [ 0  0 0 ]
The matrix has rank 2, which is less than the number of unknowns, so
there are infinitely many solutions.
As there are non-zero solutions for α1, α2, α3, the matrices are not
linearly independent.
173
Theorem
Let {v1, . . . , vk} be a set of vectors in Rn. If k > n, this set is linearly
dependent.
Proof:
Let A = [v1 · · · vk ] where vi ∈ Rn.
Then rank(A) ≤ n.
Since k > n this implies that rank(A) ≠ k.
Hence the vectors are linearly dependent.
174
Linear combinations via reduced row-echelon form
Let A = [v1 · · · vk ] and B = [u1 · · · uk ].
If A and B are row-equivalent matrices (i.e., A ∼ B), then the columns
of A satisfy the same linear relations as the columns of B :
α1v1 + · · ·+ αkvk = 0 ⇔ α1u1 + · · · + αkuk = 0
If A ∼ B and B is in reduced row-echelon form, then the relations
between the vectors v1, . . . , vk can be read off from B .
175
Example 21.
Consider the vectors v1 = (1, 2, 3), v2 = (3, 6, 9), v3 = (−1, 0,−2),
v4 = (1, 4, 4) and v5 = (−2, 3, 5).
(a) Are the vectors linearly independent?
(b) Express v2 and v4 as linear combinations of v1, v3 and v5.
(c) Are the vectors v1, v3, v5 linearly independent?
Solution:
Using row operations
A = [ 1 3 −1 1 −2 ]       [ 1 3 0 2 0 ]
    [ 2 6  0 4  3 ]   ∼   [ 0 0 1 1 0 ]  = B
    [ 3 9 −2 4  5 ]       [ 0 0 0 0 1 ]
(a) Since there are 5 vectors in R3, the vectors are linearly dependent.
176
(b) Columns 2 and 4 in the reduced row-echelon form give:
v2 = 3v1
v4 = 2v1 + v3
(c) Columns 1, 3 and 5 in the reduced row-echelon form are clearly
linearly independent. So v1, v3, v5 are also linearly independent.
177
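SymPy's rref method returns both the reduced row-echelon form and the pivot columns, so these relations can be read off programmatically; an illustrative sketch:

    from sympy import Matrix

    A = Matrix([[1, 3, -1, 1, -2],
                [2, 6,  0, 4,  3],
                [3, 9, -2, 4,  5]])
    B, pivots = A.rref()
    print(pivots)  # (0, 2, 4): columns 1, 3, 5 are linearly independent
    print(B)       # column 2 is 3*(column 1); column 4 is 2*(column 1) + (column 3)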
4.4 Spanning sets
Let S = {v1, . . . , vk} be a set of vectors in a vector space V defined
over the field F.
Definition (Span of a set)
The span of the set S = {v1, . . . , vk} is the set of all linear
combinations of vectors from S :
span(S) = {α1v1 + · · ·+ αkvk | α1, . . . , αk ∈ F}.
[Diagram: vectors u and v, and several of their linear combinations, lying in a plane.]
In this diagram, the plane is the span of the vectors u and v.
178
Theorem
Let V be a vector space and v1, . . . , vk ∈ V . Then
1. span{v1, . . . , vk} is a subspace of V .
2. Any subspace of V that contains the vectors v1, . . . , vk contains
span{v1, . . . , vk}.
179
What does the span of a set of vectors look like?
Consider 3 vectors v1, v2, v3 in R3. Then
span{v1, v2, v3}
could be:
1. {0}
2. A line through the origin
3. A plane through the origin
4. R3
Note:
The span of 4 or more vectors in R3 will also be one of the above sets,
because these are the only possible subspaces of R3.
180
Example 22.
Let v1 = (1, 1, 1), v2 = (2, 2, 2), v3 = (3, 3, 3). In R3, what is
span{v1, v2, v3}?
Solution:
span{v1, v2, v3}
= {α1(1, 1, 1) + α2(2, 2, 2) + α3(3, 3, 3) | α1, α2, α3 ∈ R}
= {α1(1, 1, 1) + 2α2(1, 1, 1) + 3α3(1, 1, 1) | α1, α2, α3 ∈ R}
Let t = α1 + 2α2 + 3α3. Then
span{v1, v2, v3} = {t(1, 1, 1) | t ∈ R}
This is the line through the origin in the direction of (1, 1, 1) or
x = y = z .
181
Example 23.
In R3, what is span{(1, 1, 1), (3, 2, 1)}?
Solution:
span{(1, 1, 1), (3, 2, 1)} = {α1(1, 1, 1) + α2(3, 2, 1) | α1, α2 ∈ R}
This is the plane through the origin parallel to the vectors
(1, 1, 1), (3, 2, 1).
182
Example 24.
In R3, what is span{(1, 0, 0), (0, 1, 0), (0, 0, 1)}?
Solution:
span{(1, 0, 0), (0, 1, 0), (0, 0, 1)}
= {α1(1, 0, 0) + α2(0, 1, 0) + α3(0, 0, 1) | α1, α2, α3 ∈ R}
Since the vectors all have different directions, this is the whole of R3.
183
Definition (Vectors spanning a space)
Let V be a vector space. Vectors v1, . . . , vk ∈ V are said to span V if
span{v1, . . . , vk} = V .
Then we also say that {v1, . . . , vk} is a spanning set for V .
Note:
If the vectors v1, . . . , vk span V , then
(1) V contains all the vectors v1, . . . , vk , and
(2) all vectors in V are linear combinations of the vectors v1, . . . , vk .
184
Example 25.
Show that the vectors (1,−1), (2, 4) span R2.
Solution:
We need to show that for any (x , y) ∈ R2, we can find α1, α2 ∈ R such
that
α1(1,−1) + α2(2, 4) = (x , y)
This gives the linear system
α1 + 2α2 = x
−α1 + 4α2 = y
185
[  1 2 | x ]                    [ 1 2 |   x   ]
[ −1 4 | y ]  R2 → R2 + R1  ∼   [ 0 6 | y + x ]
Since the rank of the coefficient matrix is 2 and there are 2 unknowns
α1, α2, there is a unique solution for all (x , y).
So the vectors span R2.
186
Example 26.
Do the vectors (1, 2, 0), (1, 5, 3), (0, 1, 1) span R3?
Solution:
If the vectors span R3, then for any (x , y , z) ∈ R3 we can find
α1, α2, α3 ∈ R such that
α1(1, 2, 0) + α2(1, 5, 3) + α3(0, 1, 1) = (x , y , z)
This gives a system of linear equations:
α1 + α2 = x
2α1 + 5α2 + α3 = y
3α2 + α3 = z
187
From example 18,
[ 1 1 0 | x ]       [ 1 1 0 |     x      ]
[ 2 5 1 | y ]   ∼   [ 0 3 1 |   y − 2x   ]
[ 0 3 1 | z ]       [ 0 0 0 | z − y + 2x ]
The system only has a solution if z − y + 2x = 0, so the vectors do not
span R3.
In fact, they span the plane {(x , y , z) ∈ R3 | z − y + 2x = 0}.
Alternatively:
Since the vectors are in R3 and the rank of the coefficient matrix is
2 < 3, the vectors do not span R3.
188
How to determine if vectors span Rn
The previous two examples show that if A = [v1 · · · vk ] and
X = [x1 · · · xn]T where vi ∈ Rn, xi ∈ R, then:
1. v1, . . . , vk span Rn if the linear system for the augmented matrix
[A|X ] has a solution for all X ∈ Rn.
2. v1, . . . , vk span Rn ⇔ rank(A) = n.
Theorem
If k < n, then the vectors v1, . . . , vk ∈ Rn do not span Rn.
Proof:
Since A = [v1 · · · vk ] has k columns, rank(A) ≤ k.
Since k < n this implies that rank(A) ≠ n.
Hence v1, . . . , vk do not span Rn.
189
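The rank criterion is straightforward to check numerically; a NumPy sketch using the vectors from example 26:

    import numpy as np

    A = np.array([[1, 1, 0],
                  [2, 5, 1],
                  [0, 3, 1]])  # columns are (1,2,0), (1,5,3), (0,1,1)
    print(np.linalg.matrix_rank(A))  # 2 < n = 3 -> the vectors do not span R^3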
Example 27.
Do the polynomials p1(x) = 1 + x + x² and p2(x) = x² span P2?
Solution:
If p1, p2 span P2, then for any a, b, c ∈ R we can find α1, α2 ∈ R such
that
α1(1 + x + x²) + α2(x²) = a + bx + cx²
Equating coefficients of 1, x , x² gives:
α1 = a
α1 = b
α1 + α2 = c
The system only has a solution for α1 and α2 if a = b.
Hence p1(x) = 1 + x + x² and p2(x) = x² do not span P2.
190
4.5 Bases and dimension
Sometimes spanning sets can be too big. To get an efficient description
of a vector space V , we seek a set with the minimum number of vectors
needed to span V .
Bases
Definition (Basis)
A basis for a vector space V is a set of vectors {v1, . . . , vk} ⊆ V which:
1. span V .
2. are linearly independent.
Note:
A vector space V can have infinitely many different bases.
e.g. Any two linearly independent vectors in R2 will give a basis for R2.
191
Example 28.
Is {(1,−1), (2, 4)} a basis for R2?
Solution:
In example 25, we found
span{(1,−1), (2, 4)} = R2.
Since
(1,−1) ≠ α(2, 4) for α ∈ R,
then (1,−1) and (2, 4) are linearly independent.
Hence {(1,−1), (2, 4)} is a basis for R2.
192
Example 29.
Is {(2,−1,−1), (1, 2,−3)} a basis for the plane
W = {(x , y , z) ∈ R3|x + y + z = 0} in R3?
Solution:
Do the vectors lie in the plane?
Let v1 = (2,−1,−1), v2 = (1, 2,−3).
Then v1, v2 ∈W since 2− 1− 1 = 0 and 1 + 2− 3 = 0.
Do the vectors span the plane?
Given (x , y , z) ∈W we want to find α1, α2 ∈ R such that
α1(2,−1,−1) + α2(1, 2,−3) = (x , y , z)
This gives the system:
2α1 + α2 = x
−α1 + 2α2 = y
−α1 − 3α2 = z
193
Solving gives:
2 1 x−1 2 y
−1 −3 z
R1 ↔ R2 ∼
−1 2 y2 1 x
−1 −3 z
R2 → R2 + 2R1
R3 → R3 − R1
∼
−1 2 y0 5 x + 2y
0 −5 z − y
R3 → R3 + R2
∼
−1 2 y0 5 x + 2y
0 0 x + y + z
So the system is consistent provided x + y + z = 0, i.e. (x , y , z) ∈W .
Thus the vectors span W .
194
Are the vectors linearly independent?
Since
(2,−1,−1) ≠ α(1, 2,−3) for α ∈ R,
then (2,−1,−1) and (1, 2,−3) are linearly independent.
Hence
{(2,−1,−1), (1, 2,−3)}
is a basis for W .
195
Theorem
Let V be a vector space and let {v1, . . . , vk} be a basis for V .
1. A subset of V with more than k vectors is linearly dependent.
2. A subset of V with fewer than k vectors does not span V .
3. Any two bases of V have the same number of elements, k.
Dimension
Definition (Dimension)
The dimension of a vector space V is the number of vectors in a basis
for V . This is denoted by dim(V ) or dimV .
We call V finite dimensional if it has a basis with a finite number of
elements, and infinite dimensional otherwise.
196
Standard Bases
◮ The standard basis for Rn is
{(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, 0, . . . , 0, 1)} .
These vectors are typically denoted as {e1, e2, · · · , en}.
(In R3 the standard basis is also denoted by {i, j, k})
The dimension of Rn is n.
◮ The standard basis for Mm,n is
[ 1 0 · · · 0 ]   [ 0 1 · · · 0 ]               [ 0 0 · · · 0 ]
[ 0 0 · · · 0 ]   [ 0 0 · · · 0 ]               [ 0 0 · · · 0 ]
[ ⋮         ⋮ ] , [ ⋮         ⋮ ] ,  . . .  ,   [ ⋮         ⋮ ]
[ 0 0 · · · 0 ]   [ 0 0 · · · 0 ]               [ 0 0 · · · 1 ]
The dimension of Mm,n is mn.
197
◮ The standard basis for Pn is
{1, x , x², . . . , xⁿ}.
The dimension of Pn is n + 1.
◮ Let P be the set of all polynomials of all possible degrees:
P = {a0 + a1x + · · · + anxⁿ | n ∈ N, a0, a1, . . . , an ∈ R}
where N denotes the set of natural numbers.
P is an infinite dimensional vector space, since for each integer
k ≥ 1, we can find a linearly independent set {1, x , . . . , xᵏ} with
more than k elements.
The standard basis for P is
{1, x , x2, . . .}.
Note:
If V = {0}, we say that dim(V ) = 0.
198
Example 30.
Is W = { [ 1  0 ]   [ 0 1 ]   [ 0 0 ] }
       { [ 0 −1 ] , [ 0 0 ] , [ 1 0 ] }
a basis for M2,2?
Solution:
Since dim(M2,2) = 2× 2 = 4, and there are only 3 matrices in W , W
does not span M2,2.
Hence W is a not a basis for M2,2.
199
How to find a basis for the span of a set of vectors in Rn
Recall that if A ∼ B , then the columns of A and B satisfy the same
linear relationships.
Method 1: Using columns
To find a basis for V = span{v1, . . . , vk} where v1 · · · vk ∈ Rn.
1. Form the matrix A = [v1 · · · vk ] with columns v1, . . . , vk .
2. Reduce A to row-echelon form B .
3. The columns of A corresponding to the leading entries in B give a
basis for V .
Note:
This method gives a basis that is a subset of the original set of vectors.
200
Example 31.
Let S = {v1, v2, v3} = {(1,−1, 2, 1), (1,−3, 0, 5), (1, 0, 3,−1)}.
(a) Find a subset of S that is a basis for span(S).
(b) What is the dimension of span(S)?
Solution:
(a) Reduce [v1 v2 v3] to row-echelon form

A = [  1  1  1 ]  R2 → R2 + R1       [ 1  1  1 ]
    [ −1 −3  0 ]  R3 → R3 − 2R1  ∼   [ 0 −2  1 ]
    [  2  0  3 ]  R4 → R4 − R1       [ 0 −2  1 ]
    [  1  5 −1 ]                     [ 0  4 −2 ]

R3 → R3 − R2       [ 1  1  1 ]
R4 → R4 + 2R2  ∼   [ 0 −2  1 ]  = B
                   [ 0  0  0 ]
                   [ 0  0  0 ]
201
The leading entries are in columns 1 and 2 of B , so columns 1 and 2 of
A are linearly independent.
Hence, a basis for span(S) is the first 2 columns of A
{(1,−1, 2, 1), (1,−3, 0, 5)}.
(b) As there are 2 elements in the basis,
dim(span(S)) = 2.
202
How to find a basis for the span of a set of vectors in Rn
Method 2: Using rows
To find a basis for V = span{v1, . . . , vk} where v1 · · · vk ∈ Rn.
1. Form the matrix A = [v1 · · · vk ]T with rows v1, . . . , vk .
2. Reduce A to row-echelon form B .
3. The non-zero rows of B give a basis for V .
Note:
This method gives a basis that will usually not be a subset of the
original set of vectors.
The method works because elementary row operations do not change
the subspace spanned by the rows, and in the row echelon form the
non-zero rows are linearly independent.
203
Example 32.
Let S = {v1, v2, v3} = {(1,−1, 2, 1), (1,−3, 0, 5), (1, 0, 3,−1)}.
Find a basis for span(S) using the row method.
Solution:
Reduce [v1 v2 v3]T to row-echelon form

[ 1 −1 2  1 ]  R2 → R2 − R1      [ 1 −1  2  1 ]                         [ 1 −1  2 1 ]
[ 1 −3 0  5 ]  R3 → R3 − R1  ∼   [ 0 −2 −2  4 ]  R3 → R3 + (1/2)R2  ∼   [ 0 −2 −2 4 ]
[ 1  0 3 −1 ]                    [ 0  1  1 −2 ]                         [ 0  0  0 0 ]

The non-zero rows in the row-echelon matrix give a basis for span(S):
{(1,−1, 2, 1), (0,−2,−2, 4)}.
Note:
This basis is different to that found in the previous example.
204
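A sketch of the row method in SymPy. Note that rref produces the reduced row-echelon form, so the non-zero rows it returns may differ again from the basis above, while spanning the same row space:

    from sympy import Matrix

    A = Matrix([[1, -1, 2,  1],
                [1, -3, 0,  5],
                [1,  0, 3, -1]])
    R, _ = A.rref()
    basis = [R.row(i) for i in range(R.rows) if any(R.row(i))]
    print(basis)  # the two non-zero rows: a basis for span(S)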
Theorem
Let V be a vector space.
1. Any set that spans V contains a basis for V .
2. Any linearly independent set in V can be extended to a basis of V .
Theorem
Let V be a vector space, and suppose that dim(V ) = n is finite.
1. If a set with exactly n elements spans V , then it is a basis for V .
2. If a linearly independent subset of V has exactly n elements, then
it is a basis for V .
205
Example 33.
Let B = {(2, 0, 0), (6, 1, 7), (2,−1, 2)}. Is B a basis for R3?
Solution:
In example 17, we found that (2, 0, 0), (6, 1, 7), (2,−1, 2) are linearly
independent.
Since dim(R3) = 3 and B contains 3 linearly independent vectors, then
B is a basis for R3.
206
Example 34.
(a) Show that
W = {(a + b, 0, a − b, 0) | a, b ∈ R}
is a subspace of R4 and find a basis B for W .
(b) Extend B to a basis for R4.
Solution:
(a) Now
W = {a(1, 0, 1, 0) + b(1, 0,−1, 0) | a, b ∈ R}
= span{(1, 0, 1, 0), (1, 0,−1, 0)}
Hence W is a subspace of R4.
Since
(1, 0, 1, 0) ≠ α(1, 0,−1, 0) for any α ∈ R,
then (1, 0, 1, 0) and (1, 0,−1, 0) are linearly independent.
Hence a basis for W is
B = {(1, 0, 1, 0), (1, 0,−1, 0)}
207
(b) Since B is a linearly independent set in R4, we can extend it to a
basis for R4.
Extend B to a spanning set S for R4 by adding the standard basis for R4:
S = {(1, 0, 1, 0), (1, 0,−1, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)}
Reduce S to a basis using the column method:
208
A = [ 1  1  1 0 0 0 ]                    [ 1  1  1 0 0 0 ]
    [ 0  0  0 1 0 0 ]  R3 → R3 − R1  ∼   [ 0  0  0 1 0 0 ]
    [ 1 −1  0 0 1 0 ]                    [ 0 −2 −1 0 1 0 ]
    [ 0  0  0 0 0 1 ]                    [ 0  0  0 0 0 1 ]

            [ 1  1  1 0 0 0 ]
R2 ↔ R3 ∼   [ 0 −2 −1 0 1 0 ]
            [ 0  0  0 1 0 0 ]
            [ 0  0  0 0 0 1 ]
The leading entries occur in columns 1,2,4,6, so these columns are
linearly independent. Hence a basis for R4 is
{(1, 0, 1, 0), (1, 0,−1, 0), (0, 1, 0, 0), (0, 0, 0, 1)}
which contains the basis B for W .
209
4.6 Solution space, column space, row space
Solution space of a matrix
Theorem (Solution Space)
Let A be a real m × n matrix. If (x1, · · · , xn) ∈ Rn and
x = [x1, · · · , xn]T , then the solution set
S = {x |Ax = 0}
is a subspace of Rn.
This is called the solution space or nullspace of A, and its dimension is
called the nullity of A, denoted by nullity(A).
Proof:
Define two vectors in S and a scalar in R
Let u ∈ S , v ∈ S and α ∈ R.
Then Au = 0 and Av = 0.
210
0. S is non-empty
0 ∈ S as A0 = 0. So S is not empty.
1. Closure under vector addition
A(u+ v) = Au+ Av = 0+ 0 = 0
⇒ u+ v ∈ S
2. Closure under scalar multiplication
A(αu) = αAu = α0 = 0
⇒ αu ∈ S
Since S is a non-empty set that is closed under vector addition and
scalar multiplication, S is a subspace of Rn.
211
How to find a basis for the solution space
To find a basis for the solution space of the linear system Ax = 0:
1. Reduce the augmented matrix [A|0] to row-echelon form [B |0].
2. Solve the system and write the solution in the form
x = t1v1 + · · · + tkvk
where t1, . . . , tk ∈ R and v1, . . . , vk ∈ Rn.
Then {v1, · · · , vk} is a basis for the solution space.
212
Example 35.
Find a basis for the solution space of the linear system given by
x1 + 2x2 + x3 + x4 = 0
3x1 + 6x2 + 4x3 + x4 = 0
What is the dimension of the solution space?
Solution:
Solve the linear system
[ 1 2 1 1 | 0 ]  R2 → R2 − 3R1   ∼   [ 1 2 1  1 | 0 ]
[ 3 6 4 1 | 0 ]                      [ 0 0 1 −2 | 0 ]
There are no leading entries for x2 and x4, so we let
x2 = s, x4 = t, s, t ∈ R.
Back substitution gives
x3 = 2x4 = 2t
x1 = −2x2 − x3 − x4 = −2s − 3t
213
So the general solution is
(x1, x2, x3, x4) = (−2s − 3t, s, 2t, t)
= s(−2, 1, 0, 0) + t(−3, 0, 2, 1), s, t ∈ R.
Hence
{(−2, 1, 0, 0), (−3, 0, 2, 1)}
spans the solution space.
Further, these vectors are linearly independent as
(−2, 1, 0, 0) 6= α(−3, 0, 2, 1) for any α ∈ R,
so
{(−2, 1, 0, 0), (−3, 0, 2, 1)}
is a basis for the solution space.
214
Since there are 2 vectors in the basis, the dimension of the solution
space (nullity) is 2.
Note:
The nullity(A) is the number of columns in the row-echelon form of A
not containing a leading entry.
215
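SymPy computes a basis for the solution space directly; an illustrative sketch using the system from example 35:

    from sympy import Matrix

    A = Matrix([[1, 2, 1, 1],
                [3, 6, 4, 1]])
    for v in A.nullspace():
        print(v.T)  # (-2, 1, 0, 0) and (-3, 0, 2, 1); nullity(A) = 2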
Column space and row space of a matrix
Definition (Column space)
Let A be a real m × n matrix. The subspace of Rm spanned by the
columns of A is called the column space of A.
Definition (Row Space)
Let A be a real m × n matrix. The subspace of Rn spanned by the rows
of A is called the row space of A.
Note:
◮ The column space and row space are subspaces, since the span of
any set of vectors is a subspace.
◮ To find a basis for the column space use the column method.
◮ To find a basis for the row space use the row method.
216
Example 36.
Let
A = [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]
    [ 5 −3  7 −6 ]
    [ 1  1 −1  3 ] .
Find a basis for the
(a) column space of A.
(b) row space of A.
What are the dimensions of the row space and the column space?
Solution:
Reducing A to row-echelon form gives

A = [ 1 −1  2 −2 ]       [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]   ∼   [ 0  2 −3  4 ]  = B
    [ 5 −3  7 −6 ]       [ 0  0  0  1 ]
    [ 1  1 −1  3 ]       [ 0  0  0  0 ]
217
(a) The leading entries in B are in columns 1, 2 and 4.
So columns 1, 2 and 4 of A form a basis for the column space of A
{(1, 2, 5, 1), (−1, 0,−3, 1), (−2, 0,−6, 3)}.
As there are 3 elements in the basis,
dim(Column space(A)) = 3.
(b) A basis for the row space of A is the 3 non-zero rows in B
{(1,−1, 2,−2), (0, 2,−3, 4), (0, 0, 0, 1)}.
As there are 3 elements in the basis,
dim(Row space(A)) = 3.
218
Effect of row operations
Assume that A ∼ B by a sequence of elementary row operations. Then
◮ solution space of A = solution space of B ,
◮ row space of A = row space of B ,
◮ column space of A ≠ column space of B in general.
219
Example 37.
Given

A = [ 1 −1  2 −2 ]       [ 1 −1  2 −2 ]
    [ 2  0  1  0 ]   ∼   [ 0  2 −3  4 ]  = B
    [ 5 −3  7 −6 ]       [ 0  0  0  1 ]
    [ 1  1 −1  3 ]       [ 0  0  0  0 ]
Are the column spaces of A and B the same?
Solution:
From example 36, the column space of A is
span{(1, 2, 5, 1), (−1, 0,−3, 1), (−2, 0,−6, 3)}
which contains vectors with last coordinate non-zero.
The column space of B is
span{(1, 0, 0, 0), (−1, 2, 0, 0), (−2, 4, 1, 0)}
where all vectors have last coordinate zero.
So the column space of A and B are not equal.
220
Row rank equals column rank
Theorem
For any m × n matrix A:
rank(A) = dim(row space of A) = dim(column space of A).
This is because the number of non-zero rows in the row-echelon form is
equal to the number of leading entries in the row-echelon form. This
determines the number of basis vectors.
221
Rank-nullity theorem
Theorem
Suppose A is an m × n matrix. Then
rank(A) + nullity(A) = n = number of columns in A.
Proof:
Suppose A has row echelon form B .
The rank of A equals the number of columns in B containing a leading
entry.
The nullity of A equals the number of columns in B which do not
contain a leading entry.
Therefore, rank(A) + nullity(A) is equal to the number of columns in B ,
which is equal to the number of columns in A.
222
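The theorem is easy to confirm numerically on the matrix from example 36; a SymPy sketch:

    from sympy import Matrix

    A = Matrix([[1, -1,  2, -2],
                [2,  0,  1,  0],
                [5, -3,  7, -6],
                [1,  1, -1,  3]])
    rank, nullity = A.rank(), len(A.nullspace())
    print(rank, nullity, rank + nullity)  # 3 1 4 = number of columns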
Other fields of scalars
Our work on vector spaces applies using any field of scalars F instead of
the real numbers R. For example, we could use F = C or F = F2.
Then
Fn = {(x1, . . . , xn) | x1, . . . , xn ∈ F}
is a vector space using F as scalars.
Given any m × n matrix A ∈ Mm,n(F) with entries in F, we have:
◮ solution space of A is a subspace of Fn
◮ column space of A is a subspace of Fm
◮ row space of A is a subspace of Fn
The previous theorems also apply in this context.
223
Example 38.
Consider the matrix
A = [ 1 1 0 ]
    [ 1 0 1 ]  ∈ M2,3(F2).
Determine the column space, row space and solution space for A.
Note: Since the field is F2 all matrix entries are in the set {0, 1} and
the operations of vector addition and scalar multiplication are done
using mod 2 arithmetic.
Solution:
Reduce A to reduced row-echelon form using mod 2 arithmetic.
A = [ 1 1 0 ]  R2 → R2 + R1   ∼   [ 1 1 0 ]  R1 → R1 + R2   ∼   [ 1 0 1 ]  = B
    [ 1 0 1 ]                     [ 0 1 1 ]                     [ 0 1 1 ]
224
Column space of A
Since the leading entries are in columns 1 and 2 of B , columns 1 and 2
of A form a basis for the column space.
Hence the column space of A is
span{(1, 1), (1, 0)} = {α1(1, 1) + α2(1, 0) | α1, α2 ∈ F2}
There are 4 elements:
α1 = α2 = 0 : (0, 0)
α1 = 1, α2 = 0 : (1, 1)
α1 = 0, α2 = 1 : (1, 0)
α1 = α2 = 1 : (1, 1) + (1, 0) = (0, 1)
So the column space of A is the whole of F2².
225
Row space of A
The non-zero rows in B form a basis for the row space of A.
Hence the row space of A is
span{(1, 0, 1), (0, 1, 1)} = {α1(1, 0, 1) + α2(0, 1, 1) | α1, α2 ∈ F2}
There are 4 elements:
α1 = α2 = 0 : (0, 0, 0)
α1 = 1, α2 = 0 : (1, 0, 1)
α1 = 0, α2 = 1 : (0, 1, 1)
α1 = α2 = 1 : (1, 0, 1) + (0, 1, 1) = (1, 1, 0)
226
Solution space of A
We need to solve the following system using mod 2 arithmetic
[ 1 1 0 ] [ x1 ]   [ 0 ]
[ 1 0 1 ] [ x2 ] = [ 0 ] .
          [ x3 ]
The reduced row-echelon form gives
x1 + x3 = 0
x2 + x3 = 0
This requires x1 = x2 = x3.
Let x1 = x2 = x3 = t, t ∈ F2.
The solution space of A is
{(x1, x2, x3) = t(1, 1, 1) | t ∈ F2}.
227
There are 2 elements:
t = 0 : (0, 0, 0)
t = 1 : (1, 1, 1)
Note:
The rank-nullity theorem holds as rank(A) = 2, nullity(A) = 1 and A
has 3 columns.
228
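Because there are only 2³ = 8 candidate vectors in F2³, the solution space can also be found by brute force; an illustrative NumPy sketch using mod-2 arithmetic:

    import itertools
    import numpy as np

    A = np.array([[1, 1, 0],
                  [1, 0, 1]])
    solutions = [v for v in itertools.product([0, 1], repeat=3)
                 if not (A @ np.array(v) % 2).any()]
    print(solutions)  # [(0, 0, 0), (1, 1, 1)]: the 2 elements of the solution space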
4.7 Coordinates relative to a basis
Theorem
If {v1, . . . , vn} is a basis for a vector space V , then every vector v ∈ V
can be written uniquely in the form
v = α1v1 + . . .+ αnvn,
where α1, . . . , αn ∈ F.
Proof:
Since {v1, . . . , vn} is a basis, it spans V and so for some scalars
α1, . . . , αn ∈ F
v = α1v1 + . . .+ αnvn (1)
Are these scalars unique? Suppose
v = β1v1 + . . .+ βnvn (2)
for some other scalars β1, . . . , βn ∈ F.
229
Taking (2)− (1) gives
0 = (β1 − α1)v1 + . . . + (βn − αn)vn.
Since {v1, . . . , vn} is a basis, v1, . . . , vn are linearly independent, and
the only solution is
β1 − α1 = 0, . . . , βn − αn = 0
⇒ α1 = β1, . . . , αn = βn
showing the uniqueness of the scalars α1, . . . , αn.
Note:
Conversely, if v1, . . . , vn ∈ V and every vector v ∈ V can be written
uniquely in the form
v = α1v1 + . . . + αnvn,
where α1, . . . , αn ∈ F, then {v1, . . . , vn} is a basis for V .
230
Definition (Coordinates)
Suppose B = {v1, . . . , vn} is an ordered basis for a vector space V ,
i.e. a basis arranged in the order v1 first, v2 second and so on.
For v ∈ V we can write
v = α1v1 + · · ·+ αnvn for some scalars α1, . . . , αn ∈ F.
The scalars α1, . . . , αn are uniquely determined by v and are called the
coordinates of v relative to B.
Then
[v]B = [ α1 ]
       [ ⋮  ]
       [ αn ]
is called the coordinate vector of v with respect to B.
231
Example 39.
Consider the vector v = (1, 5). Determine the coordinates of v with
respect to the ordered bases:
(a) B = {(1, 0), (0, 1)}
(b) B′ = {a,b} = {(2, 1), (−1, 1)}
Solution:
(a) As v = 1(1, 0) + 5(0, 1), then [v]B = [ 1 ]
                                          [ 5 ]
(b) We seek α1, α2 ∈ R such that
α1(2, 1) + α2(−1, 1) = (1, 5)
Equating components gives
2α1 − α2 = 1
α1 + α2 = 5
232
Reducing the augmented matrix to reduced row-echelon form gives
[ 2 −1 | 1 ]       [ 1 0 | 2 ]
[ 1  1 | 5 ]   ∼   [ 0 1 | 3 ]

Since α1 = 2 and α2 = 3, then [v]B′ = [ 2 ]
                                      [ 3 ] .
[Figure: the point (1, 5) shown on a grid formed by a and b, illustrating v = 1i + 5j = 2a + 3b.]
233
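Finding coordinates with respect to a basis amounts to solving a linear system whose coefficient matrix has the basis vectors as columns; a NumPy sketch for part (b):

    import numpy as np

    P = np.array([[2, -1],
                  [1,  1]])  # columns are the basis vectors a and b
    v = np.array([1, 5])
    print(np.linalg.solve(P, v))  # [2. 3.] -> v = 2a + 3b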
Example 40.
Let
p(x) = 2 + 7x − 9x² ∈ P2.
Determine the coordinates of p with respect to the basis
B = {2, (1/2)x ,−3x²}.
Solution:
Since
p(x) = 1(2) + 14((1/2)x) + 3(−3x²)
then
[p]B = [  1 ]
       [ 14 ]
       [  3 ] .
234
Example 41.
Let
A = [ 1 2 ]
    [ 3 4 ]  ∈ M2,2.

Determine the coordinates of A with respect to the standard basis

B = { [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ] }
    { [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] } .
Solution:
Since
[ 1 2 ] = 1 [ 1 0 ] + 2 [ 0 1 ] + 3 [ 0 0 ] + 4 [ 0 0 ]
[ 3 4 ]     [ 0 0 ]     [ 0 0 ]     [ 1 0 ]     [ 0 1 ]

then [A]B = [ 1 ]
            [ 2 ]
            [ 3 ]
            [ 4 ] .
235
Having fixed a finite ordered basis B for a vector space V , there is a
one-to-one correspondence between vectors and their coordinate
vectors.
Theorem
Let V be a finite dimensional vector space over a field of scalars F. Let
B be an ordered basis of V .
If u, v ∈ V and α ∈ F is a scalar, then
[u+ v]B = [u]B + [v]B
[αv]B = α[v]B .
This theorem tells us that taking coordinates is a linear transformation
between the vector spaces V and Fn.
236
Topic 5: Linear Transformations [AR 4.9-4.11, 8.1-8.5]
5.1 General linear transformations
5.2 Geometric linear transformations from R2 to R2
5.3 Matrix representations for general linear transformations
5.4 Image, kernel, rank and nullity
5.5 Invertible linear transformations
5.6 Change of basis
237
5.1 General linear transformations
We next consider functions or mappings between general vector spaces
that preserve the vector space structure of vector addition and scalar
multiplication.
Definition (Linear transformation)
Let V and W be vector spaces over the same field of scalars F.
A linear transformation from V to W is a function T : V →W such
that for each u, v ∈ V and each α ∈ F:
1. T (u+ v) = T (u) + T (v) (T preserves vector addition)
2. T (αu) = αT (u) (T preserves scalar multiplication)
Loosely speaking, linear transformations are functions between vector
spaces that preserve the vector space structure.
238
Example 1.
Show that the cross product T : R3 → R3 given by
T (u) = u× (1, 1,−1)
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
Let u ∈ R3, v ∈ R3 and α ∈ R.
1. T preserves vector addition
T (u+ v) = (u+ v)× (1, 1,−1)
= u× (1, 1,−1) + v × (1, 1,−1)
= T (u) + T (v)
239
2. T preserves scalar multiplication
T (αu) = (αu)× (1, 1,−1)
= α(u× (1, 1,−1))
= αT (u)
Since T preserves vector addition and scalar multiplication, T is a linear
transformation.
240
Example 2.
Show that the function D : P3 → P2 defined by differentiation with
respect to x ,
D(a0 + a1x + a2x² + a3x³) = a1 + 2a2x + 3a3x²
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
Let p1,p2 ∈ P3 and α ∈ R.
1. D preserves vector addition
D(p1 + p2) = d/dx (p1 + p2)
           = d/dx (p1) + d/dx (p2)
           = D(p1) + D(p2)
241
2. D preserves scalar multiplication
D(αp1) = d/dx (αp1)
       = α d/dx (p1)
       = αD(p1)
Since D preserves vector addition and scalar multiplication, D is a linear
transformation.
242
Example 3.
Show that the function T : R2 → R2 given by
T (x1, x2) = (x1 − x2, x1 + 1)
is not a linear transformation.
Solution:
Find a specific counterexample
Method 1 - vector addition
Let u = (1, 1) and v = (2, 2). Then
T (u+ v) = T (3, 3) = (0, 4)
and
T (u) + T (v) = T (1, 1) + T (2, 2) = (0, 2) + (0, 3) = (0, 5).
Since
T (u+ v) ≠ T (u) + T (v),
T does not preserve vector addition and it is not a linear transformation.
243
Method 2 - scalar multiplication
Let α = 2 and u = (1, 1). Then
T (αu) = T (2, 2) = (0, 3)
and
αT (u) = 2T (1, 1) = 2(0, 2) = (0, 4).
Since
T (αu) ≠ αT (u),
T does not preserve scalar multiplication and it is not a linear
transformation.
244
Example 4.
Prove that the function T : R3 → R2 given by
T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3)
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
Let u = (u1, u2, u3) ∈ R3, v = (v1, v2, v3) ∈ R3 and α ∈ R.
1. T preserves vector addition
T (u+ v) = T (u1 + v1, u2 + v2, u3 + v3)
= (u2 + v2 − 2u3 − 2v3, 3u1 + 3v1 + u3 + v3)
= (u2 − 2u3, 3u1 + u3) + (v2 − 2v3, 3v1 + v3)
= T (u) + T (v)
245
2. T preserves scalar multiplication
T (αu) = T (αu1, αu2, αu3)
= (αu2 − 2αu3, 3αu1 + αu3)
= α(u2 − 2u3, 3u1 + u3)
= αT (u)
Since T preserves vector addition and scalar multiplication, T is a linear
transformation.
246
Note:
◮ We can write T in matrix form using coordinate vectors:

[T (x1, x2, x3)] = [ x2 − 2x3 ]  =  [ 0 1 −2 ] [ x1 ]
                   [ 3x1 + x3 ]     [ 3 0  1 ] [ x2 ]
                                               [ x3 ]

The matrix
[T ] = [ 0 1 −2 ]
       [ 3 0  1 ]
is called the standard matrix representation for T .
◮ All linear transformations from Rn to Rm can be represented by
matrices.
247
Theorem
Let S = {e1, . . . , en} and S ′ = {e′1, . . . , e′m} be the standard bases for
Rn and Rm, and let v ∈ Rn.
Every linear transformation T : Rn → Rm has a standard matrix
representation [T ] specified by the m × n matrix
[T ] = [ [T (e1)] [T (e2)] · · · [T (en)] ]
where each [T (ei )] denotes the coordinate vector of T applied to ei .
Then
[T (v)] = [T ] [v] for all v ∈ Rn,
where [v] is the coordinate vector for v with respect to S, and
[T (v)] is the coordinate vector for T (v) with respect to S ′
248
Proof:
Let v ∈ Rn and write
v = α1e1 + α2e2 + · · ·+ αnen.
The coordinate vector of v with respect to the standard basis is
[v] = [ α1 ]
      [ ⋮  ]
      [ αn ]
Since T is a linear transformation
[T (v)] = [T (α1e1 + α2e2 + · · ·+ αnen)]
        = [T (α1e1)] + [T (α2e2)] + · · ·+ [T (αnen)]
        = α1[T (e1)] + α2[T (e2)] + · · ·+ αn[T (en)]
        = [ [T (e1)] [T (e2)] · · · [T (en)] ] [ α1 ]
                                              [ ⋮  ]
                                              [ αn ]
        = [T ] [v]
249
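The theorem gives a direct recipe: apply T to each standard basis vector and assemble the images as columns. An illustrative NumPy sketch using the transformation from example 4:

    import numpy as np

    def T(x):
        x1, x2, x3 = x
        return np.array([x2 - 2*x3, 3*x1 + x3])

    T_std = np.column_stack([T(e) for e in np.eye(3)])
    print(T_std)                        # [[0. 1. -2.], [3. 0. 1.]]
    print(T_std @ np.array([1, 2, 3]))  # [-4. 6.], the same as T((1, 2, 3))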
5.2 Geometric linear transformations from R2 to R2
Many familiar geometric transformations are linear transformations from
R2 to R2.
Recall that the standard basis in R2 is
B = {e1, e2} = {(1, 0), (0, 1)}
and the coordinate vector of v = (x , y) ∈ R2 with respect to B is
[v] = [ x ]
      [ y ] .
Then
[T (x , y)] = [T ] [ x ]
                   [ y ]
where
[T ] = [ [T (e1)] [T (e2)] ]
is the standard matrix for T .
250
Example 5.
Determine the standard matrix representation in R2 for reflection across
the y -axis.
[Figure: a point and its mirror image across the y-axis.]
Solution:
[T (x , y)] = [ −x ]  =  [ −1 0 ] [ x ]
              [  y ]     [  0 1 ] [ y ]

The standard matrix is
[T ] = [ −1 0 ]
       [  0 1 ]
251
Example 6.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x .
Solution:
Let v = (x , y) and u = (1, 5) be a vector in the direction of y = 5x . Let
p = proj_u v = (u · v / ||u||²) u
252
Now
u · v = (1, 5) · (x , y) = x + 5y
||u||² = 1 + 25 = 26
⇒ p = ((x + 5y)/26) (1, 5)
Also, writing w = p− v for the vector from v to its projection p,
T (v) = p+ w = p+ (p− v) = 2p− v.
Hence
T (v) = ((x + 5y)/13) (1, 5) − (x , y)
      = ((x + 5y)/13 − x , (5x + 25y)/13 − y)
      = ((−12x + 5y)/13, (5x + 12y)/13)
253
Now
[T (x , y)] = [ x′ ]  =  [ −12/13   5/13 ] [ x ]
              [ y′ ]     [   5/13  12/13 ] [ y ]
Hence the standard matrix is
[T ] = [ −12/13   5/13 ]
       [   5/13  12/13 ]
254
Example 7.
Determine the standard matrix representation in R2 for the rotation
around the origin anticlockwise by an angle of θ.
[Figure: the point (x , y) at polar angle φ rotated anticlockwise by θ to (x′, y′).]
Solution:
Using polar coordinates
x = r cosφ, y = r sinφ
So
x ′ = r cos(φ+ θ)
= r cosφ cos θ − r sinφ sin θ
= x cos θ − y sin θ
255
y ′ = r sin(φ+ θ)
= r sinφ cos θ + r cosφ sin θ
= x sin θ + y cos θ
Hence
[T (x , y)] = [ x′ ]  =  [ cos θ  −sin θ ] [ x ]
              [ y′ ]     [ sin θ   cos θ ] [ y ]

The standard matrix is
[T ] = [ cos θ  −sin θ ]
       [ sin θ   cos θ ]
256
Example 8.
Determine the standard matrix representation in R2 for the
compression/expansion along the x-axis by a factor c .
[Figure: the unit square, with the x-direction scaled by a factor c .]
Solution:
Consider the effect of the transformation on the standard basis vectors.
T (1, 0) = (c , 0)  ⇒  [T (e1)] = [ c ]
                                  [ 0 ]
T (0, 1) = (0, 1)   ⇒  [T (e2)] = [ 0 ]
                                  [ 1 ]
257
The standard matrix is
[T ] = [ [T (e1)] [T (e2)] ] = [ c 0 ]
                               [ 0 1 ]
258
Example 9.
Determine the standard matrix representation in R2 for the shear by a
factor c along the x-axis.
[Figure: the unit square sheared along the x-axis; the point (x , y) maps to (x + cy , y).]
Solution:
Consider the effect of the transformation on the standard basis vectors.
T (1, 0) = (1, 0)   ⇒  [T (e1)] = [ 1 ]
                                  [ 0 ]
T (0, 1) = (c , 1)  ⇒  [T (e2)] = [ c ]
                                  [ 1 ]
259
The standard matrix is
[T ] = [ [T (e1)] [T (e2)] ] = [ 1 c ]
                               [ 0 1 ]
260
Definition (Composition of linear transformations)
Let U,V ,W be vector spaces over the same field of scalars F.
If S : U → V and R : V →W are linear transformations, then the
composition
R ◦ S : U → W
is defined by doing S followed by R :
(R ◦ S)(u) = R(S(u)) for all u ∈ U.
Then R ◦ S is a linear transformation.
Further, if
◮ R : Rk → Rm has standard matrix [R ],
◮ S : Rn → Rk has standard matrix [S ],
then the composition R ◦ S has standard matrix
[R ◦ S ] = [R ] [S ].
261
Example 10.
Find the image of (x , y) after a shear along the x-axis with c = 1
followed by a reflection across the y -axis.
Solution:
From examples 5 and 9, the image of (x , y) is

[ x′ ]   [ −1 0 ] [ 1 1 ] [ x ]
[ y′ ] = [  0 1 ] [ 0 1 ] [ y ]

       = [ −1 −1 ] [ x ]
         [  0  1 ] [ y ]

       = [ −x − y ]
         [    y   ]
262
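The same composition in NumPy, as a sketch (matrices multiply right-to-left, so the shear matrix sits on the right):

    import numpy as np

    shear = np.array([[1, 1], [0, 1]])     # shear along the x-axis with c = 1
    reflect = np.array([[-1, 0], [0, 1]])  # reflection across the y-axis

    composite = reflect @ shear            # shear first, then reflect
    print(composite)                       # [[-1 -1], [0 1]]
    print(composite @ np.array([2, 3]))    # [-5 3] = (-x - y, y) at (x, y) = (2, 3)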
Geometric properties of linear transformations
Let T : V →W be a linear transformation between real vector spaces
V and W . Then
1. T maps the zero vector in V to the zero vector in W .
Proof:
Since T is a linear transformation then for u ∈ V
T (αu) = αT (u) for all scalars α ∈ R.
Choosing α = 0 shows that
T (0V ) = T (0u) = 0T (u) = 0W
263
2. T maps each line in V to a line (or a point) in W .
Proof:
Any line has a vector equation
{v0 + tv | t ∈ R}.
Then
T (v0 + tv) = T (v0) + T (tv)
= T (v0) + tT (v)
This is the line through T (v0) in the direction T (v), or a point if
T (v) = 0.
264
3. T maps each parallelogram in V to a (possibly degenerate)
parallelogram in W .
Proof:
For example, the parallelogram with vertices
0,u, v,u + v
is mapped to the quadrilateral with vertices
T (0) = 0,T (u),T (v),T (u + v).
Since
T (u+ v) = T (u) + T (v)
these are the vertices of a (possibly degenerate) parallelogram.
265
Example 11.
Show that the function T : R2 → R2 given by
T (x1, x2) = (x1 − x2, x1 + 1)
is not a linear transformation.
Solution:
Since
T (0, 0) = (0, 1),
the zero vector does not map to the zero vector.
Hence, T is not a linear transformation.
266
5.3 Matrix representations for general linear
transformations
Let U,V be finite-dimensional vector spaces with the same scalars F.
Suppose that
◮ T : U → V is a linear transformation.
◮ B = {b1, . . . ,bn} is an ordered basis for U.
◮ C = {c1, . . . , cm} is an ordered basis for V .
Then for any u ∈ U, the coordinate vector for the image of u under T
with respect to basis C , is given by
[T (u)]C = [T ]C,B[u]B
where [T ]C,B is a suitable m × n matrix.
267
Theorem
There exists a unique matrix satisfying
[T (u)]C = [T ]C,B[u]B for all u ∈ U
which is given by
[T ]C,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ] .
Note:
◮ The proof is the same as for the case of the standard basis.
◮ The matrix [T ]C,B is called the matrix of T with respect to the
bases B and C.
◮ For the case U = V and B = C, we often write [T ]B instead of
[T ]B,B.
268
Example 12.
Consider the linear transformation T : M2,2 → M2,2 where T is defined
by
T (Q) = QT .
Find the matrix representation of T with respect to the basis
B = { [ 1 0 ]   [ 0 1 ]   [ 0 0 ]   [ 0 0 ] }
    { [ 0 0 ] , [ 0 0 ] , [ 1 0 ] , [ 0 1 ] } .
Solution:
Find the image of each basis vector in B under T , and express it as a
coordinate vector using the basis B.
T ([ 1 0 ]) = [ 1 0 ]   ⇒  [T (b1)] = (1, 0, 0, 0)T
   [ 0 0 ]    [ 0 0 ]
269
T ([ 0 1 ]) = [ 0 0 ]   ⇒  [T (b2)] = (0, 0, 1, 0)T
   [ 0 0 ]    [ 1 0 ]

T ([ 0 0 ]) = [ 0 1 ]   ⇒  [T (b3)] = (0, 1, 0, 0)T
   [ 1 0 ]    [ 0 0 ]

T ([ 0 0 ]) = [ 0 0 ]   ⇒  [T (b4)] = (0, 0, 0, 1)T
   [ 0 1 ]    [ 0 1 ]

Hence
[T ]B = [ 1 0 0 0 ]
        [ 0 0 1 0 ]
        [ 0 1 0 0 ]
        [ 0 0 0 1 ]
270
Example 13.
Find [T ]C,B for the linear transformation T : P2 → P1 given by
T (a0 + a1x + a2x²) = (a0 + a2) + a0x
using the basis B = {1, x , x²} for P2 and basis
(a) C = {1, x} for P1.
(b) C = {2, 3x} for P1.
Solution:
1. Find the image of each basis vector in B under T .
For both (a) and (b)
T (1) = 1 + x
T (x) = 0
T (x²) = 1
271
2. Express each image as a coordinate vector with respect to basis C.
(a)
T (1) = 1 + x = 1(1) + 1(x)  ⇒  [T (1)]C = [ 1 ]
                                           [ 1 ]
T (x) = 0                    ⇒  [T (x)]C = [ 0 ]
                                           [ 0 ]
T (x²) = 1 = 1(1) + 0(x)     ⇒  [T (x²)]C = [ 1 ]
                                            [ 0 ]
Hence
[T ]C,B = [ 1 0 1 ]
          [ 1 0 0 ]
272
(b)
T (1) = 1 + x = (1/2)(2) + (1/3)(3x)  ⇒  [T (1)]C = [ 1/2 ]
                                                    [ 1/3 ]
T (x) = 0                             ⇒  [T (x)]C = [ 0 ]
                                                    [ 0 ]
T (x²) = 1 = (1/2)(2) + 0(3x)         ⇒  [T (x²)]C = [ 1/2 ]
                                                     [  0  ]
Hence
[T ]C,B = [ 1/2  0  1/2 ]
          [ 1/3  0   0  ]
273
Example 14.
A linear transformation T : R3 → R2 has matrix
[T ] = [ 5 1  0 ]
       [ 1 5 −2 ]
with respect to the standard bases of R3 and R2.
What is the transformation matrix with respect to the bases
B = {(1, 1, 0), (1,−1, 0), (1,−1,−2)} of R3
and
C = {(1, 1), (1,−1)} of R2?
Solution:
1. Find the coordinates of the image of each vector in B with respect to
the standard basis.
[T (b1)] = [ 5 1  0 ] [ 1 ]   [ 6 ]
           [ 1 5 −2 ] [ 1 ] = [ 6 ]
                      [ 0 ]
274
[T (b2)] = [ 5 1  0 ] [  1 ]   [  4 ]
           [ 1 5 −2 ] [ −1 ] = [ −4 ]
                      [  0 ]

[T (b3)] = [ 5 1  0 ] [  1 ]   [ 4 ]
           [ 1 5 −2 ] [ −1 ] = [ 0 ]
                      [ −2 ]
2. Express each image as a coordinate vector with respect to basis C.
[T (b1)] = 6 [ 1 ] + 0 [  1 ]  ⇒  [T (b1)]C = [ 6 ]
             [ 1 ]     [ −1 ]                 [ 0 ]

[T (b2)] = 0 [ 1 ] + 4 [  1 ]  ⇒  [T (b2)]C = [ 0 ]
             [ 1 ]     [ −1 ]                 [ 4 ]
275
[T (b3)] = 2 [ 1 ] + 2 [  1 ]  ⇒  [T (b3)]C = [ 2 ]
             [ 1 ]     [ −1 ]                 [ 2 ]

Hence
[T ]C,B = [ 6 0 2 ]
          [ 0 4 2 ]
276
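The whole computation can be condensed in NumPy: apply [T] to the columns of B, then convert to C-coordinates by solving against the matrix whose columns are the vectors of C; an illustrative sketch:

    import numpy as np

    T_std = np.array([[5, 1, 0], [1, 5, -2]])
    B = np.array([[1, 1, 1], [1, -1, -1], [0, 0, -2]])  # columns are b1, b2, b3
    C = np.array([[1, 1], [1, -1]])                     # columns are c1, c2
    print(np.linalg.solve(C, T_std @ B))  # [[6. 0. 2.], [0. 4. 2.]] = [T]_{C,B}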
5.4 Image, kernel, rank and nullity
Definition (Kernel and Image)
Let T : U → V be a linear transformation.
The kernel (or nullspace) of T is defined to be
ker(T ) = {u ∈ U | T (u) = 0}.
The image (or range) of T is defined to be
Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}.
Note:
◮ The kernel is a subspace of U. Its dimension is denoted by
nullity(T ).
◮ The image is a subspace of V . Its dimension is denoted by
rank(T ).
277
Example 15.
Find the kernel and image of the linear transformation D : P3 → P2
given by differentiation with respect to x :
D(a0 + a1x + a2x² + a3x³) = a1 + 2a2x + 3a3x².
Solution:
The kernel is
ker(D) = {p ∈ P3 | D(p) = 0}
       = {a0 + a1x + a2x² + a3x³ | a1 + 2a2x + 3a3x² = 0 + 0x + 0x²}
       = {a0 | a0 ∈ R}
These are just the constant polynomials in P3.
The image is
Im(D) = {a1 + 2a2x + 3a3x² | a1, a2, a3 ∈ R} = P2
278
Image and kernel from matrix representations
Given a linear transformation T : U → V and choices of ordered bases
B for U and C for V , we can represent T by the matrix
[T ] = [T ]C,B.
◮ To find ker(T ) we solve equations of the form [T ][x] = [0], so the
solution space for [T ] gives the coordinate vectors of the elements
of ker(T ) with respect to the basis B.
◮ If B is a basis for U then T (B) spans T (U). The columns of [T ]
give the C-coordinates of the elements of T (B), so the column
space of [T ] gives the coordinate vectors of the elements of Im(T )
with respect to the basis C.
◮ Hence, nullity(T ) = nullity([T ]) and rank(T ) = rank([T ]).
279
Using the rank-nullity theorem for matrices gives the following:
Theorem (Rank-Nullity Theorem)
For a linear transformation T : U → V with dim(U) = n,
rank(T ) + nullity(T ) = n.
280
Example 16.
Consider the linear transformation T : R3 → R2 defined by
T (x , y , z) = (2x − y , y + z).
Find bases for Im(T ) and ker(T ). Verify the rank-nullity theorem.
Solution:
1. Image of T
Using the standard bases for R3 and R2:
[T (x , y , z)] =
[
2 −1 0
0 1 1
]xy
z
Note that
[T ] =
[
2 −1 0
0 1 1
]
is in row-echelon form.
As the leading entries are in columns 1 and 2, then a basis for the
column space of [T ] is
{(2, 0), (−1, 1)}.
281
A basis for the Im(T ) is
{(2, 0), (−1, 1)}.
2. Kernel of T
Solve [T ][x] = [0].
[ 2 −1 0 | 0 ]
[ 0  1 1 | 0 ]
Since there is no leading entry in column 3, we let
z = t, t ∈ R.
Then
y + z = 0   =⇒  y = −t
2x − y = 0  =⇒  x = −t/2 .
282
So the solution space of [T ] is
(x , y , z) = (−t/2, −t, t) = t (−1/2, −1, 1), t ∈ R
A basis for ker(T ) is
{(−1/2, −1, 1)} .
3. Verify rank-nullity theorem
rank(T ) + nullity(T ) = 2 + 1 = 3 = dim(R3)
283
Example 17.
Consider the linear transformation T : P2 → R2 defined by
T (a0 + a1x + a2x²) = (a0 − a1 + a2, 2a0 − 2a1 + 2a2).
Find bases for Im(T ) and ker(T ).
Solution:
1. Image of T
Using the standard bases for P2 and R2:
[T (a0 + a1x + a2x²)] = [ 1 −1 1 ] [ a0 ]
                        [ 2 −2 2 ] [ a1 ]
                                   [ a2 ]
Now
[ 1 −1 1 ]  R2 → R2 − 2R1   ∼   [ 1 −1 1 ]
[ 2 −2 2 ]                      [ 0  0 0 ]
As the leading entry is in column 1, then a basis for the column space
of [T ] is
{(1, 2)}
284
A basis for Im(T ) is
{(1, 2)}
2. Kernel of T
Solve [T ][x] = [0]. From Im(T ), we have
[ 1 −1 1 | 0 ]
[ 0  0 0 | 0 ]   ⇒  a0 − a1 + a2 = 0
Let a1 = s, a2 = t for s, t ∈ R.
Then a0 = s − t and
(a0, a1, a2) = (s − t, s, t)
             = s(1, 1, 0) + t(−1, 0, 1)
285
So a basis for the solution space of [T ] is
{(1, 1, 0), (−1, 0, 1)}.
This gives the coordinates of a basis for ker(T ).
Hence a basis for ker(T ) is
{1 + x ,−1 + x²}
286
5.5 Invertible linear transformations
Definition (Injective, surjective)
A function T : U → V is
1. injective or one-to-one if T (x) = T (y) ⇒ x = y.
2. surjective or onto if T (U) = V .
For linear transformations, injectivity simplifies to the following:
Theorem
A linear transformation T : U → V is injective iff ker(T ) = {0}.
287
Example 18.
Consider the linear transformation T : P2 → R2 defined by
T (a0 + a1x + a2x²) = (a0 − a1 + a2, 2a0 − 2a1 + 2a2).
(a) Is T injective?
(b) Is T surjective?
Solution:
From example 17:
(a) Since
ker(T ) = span{1 + x ,−1 + x²} ≠ {0},
T is not injective.
(b) Since
Im(T ) = span{(1, 2)} ≠ R2,
T is not surjective.
288
Definition (Invertible)
A function T : U → V is invertible if there exists a function S : V → U
such that
◮ (S ◦ T )(u) = u for all u ∈ U, and
◮ (T ◦ S)(v) = v for all v ∈ V .
Then S is called the inverse of T and is denoted by T−1.
289
Theorem
Let T : U → V be a linear transformation.
1. T is invertible iff it is both injective and surjective.
2. If T is invertible, then its inverse T−1 is also a linear
transformation.
3. Assume that U and V have finite ordered bases B and C,
respectively. Then T is invertible iff [T ]C,B is invertible, and
[T−1]B,C = ([T ]C,B)−1.
This requires dim(U) = dim(V ).
Note:
An invertible linear transformation T is also called an isomorphism.
290
Example 19.
Is the rotation T : R2 → R2 around the origin anticlockwise by an angle
θ, with standard matrix
[T ] = [ cos θ  −sin θ ]
       [ sin θ   cos θ ] ,
an invertible linear transformation? If it is, find [T−1].
Solution:
We have
det[T ] = cos²θ + sin²θ = 1
⇒ [T ]−1 = [  cos θ  sin θ ]
           [ −sin θ  cos θ ]
So T is invertible and
[T−1] = [  cos θ  sin θ ]
        [ −sin θ  cos θ ]
291
Example 20.
Consider the linear transformation T : R2 → R2 for projection onto the
x-axis, with standard matrix
[T ] =
[
1 0
0 0
]
.
(a) Is T surjective?
(b) Is T injective?
(c) Is T invertible?
Solution:
(a)
[T (x , y)] = [ 1 0 ] [ x ]   [ x ]        [ 1 ]
              [ 0 0 ] [ y ] = [ 0 ]  =  x [ 0 ]
So
Im(T ) = span{(1, 0)}
which is the x-axis.
Since Im(T ) ≠ R2, T is not surjective.
292
(b) Solve [T ][x] = [0]. Then
x [ 1 ]   [ 0 ]
  [ 0 ] = [ 0 ]   ⇒  x = 0, y ∈ R
So
ker(T ) = span{(0, 1)}
which is the y-axis.
Since ker(T ) ≠ {0}, T is not injective.
(c) T is not invertible for any of the following reasons:
◮ T is not injective
◮ T is not surjective
◮ det([T ]) = 0 so [T ]−1 does not exist
293
5.6 Change of basis
Transition Matrices
Let B = {b1, . . . ,bn} and C = {c1, . . . , cn} be ordered bases for the
same vector space V . Consider v ∈ V .
How are [v]B and [v]C related?
Theorem
There exists a unique matrix PC,B such that for any vector v ∈ V ,
[v]C = PC,B[v]B.
The matrix PC,B is given by
PC,B = [ [b1]C · · · [bn]C]
and is called the transition matrix from B to C.
294
Proof:
We want to find a matrix PC,B such that for all vectors v ∈ V ,
[v]C = PC,B[v]B.
If T : V → V is a linear transformation, then
[T (v)]C = [T ]C,B[v]B
where
[T ]C,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ]
is the matrix representation of T with respect to bases B and C.
295
For the special case of the identity function
T (v) = v for all v ∈ V ,
this gives
[T ]C,B = [ [b1]C [b2]C . . . [bn]C ]
and
[v]C = [T ]C,B[v]B.
So
PC,B = [T ]C,B.
Note:
The transition matrix PC,B is invertible since T is invertible, and
(PC,B)−1 = PB,C .
296
Example 21.
Consider the following bases for R2
B = {(1, 1), (1,−1)} and S = {(1, 0), (0, 1)} .
(a) Determine the transition matrix from B to S.
(b) If [v]B = [ 1 ]
              [ 1 ] , compute [v]S .
Solution:
(a)
PS,B = [ [b1]S [b2]S ] = [ 1  1 ]
                         [ 1 −1 ]
(b)
[v]S = PS,B[v]B = [ 1  1 ] [ 1 ]   [ 2 ]
                  [ 1 −1 ] [ 1 ] = [ 0 ]
297
Example 22.
Consider the following bases for R2
B = {(1, 1), (1,−1)} and S = {(1, 0), (0, 1)} .
(a) Determine the transition matrix from S to B.
(b) If [v]S = [ 2 ]
              [ 0 ] , compute [v]B.
Solution:
In example 21, we found PS,B = [ 1  1 ]
                               [ 1 −1 ] .
(a)
PB,S = (PS,B)−1 = (1/(−2)) [ −1 −1 ]  =  (1/2) [ 1  1 ]
                           [ −1  1 ]           [ 1 −1 ]
(b)
[v]B = PB,S [v]S = (1/2) [ 1  1 ] [ 2 ]   [ 1 ]
                         [ 1 −1 ] [ 0 ] = [ 1 ]
298
Calculating a general transition matrix from basis B to basis C
We have
[v]S = PS,B[v]B and [v]C = PC,S [v]S .
Combining these, we get
[v]C = PC,S [v]S = PC,SPS,B[v]B .
Since the transition matrix is unique then
PC,B = PC,SPS,B = (PS,C)−1PS,B.
299
Example 23.
Consider the following bases for R2
B = {(1, 2), (1, 1)} and C = {(−3, 4), (1,−1)}.
Find the transition matrix from B to C.
Solution:
Now
PS,B = [ 1 1 ]      PS,C = [ −3  1 ]
       [ 2 1 ]             [  4 −1 ]
so
PC,S = (PS,C)−1 = − [ −1 −1 ]   [ 1 1 ]
                    [ −4 −3 ] = [ 4 3 ]
300
Hence the transition matrix from B to C is
PC,B = PC,S PS,B = [ 1 1 ] [ 1 1 ]   [  3 2 ]
                   [ 4 3 ] [ 2 1 ] = [ 10 7 ]
Note:
Writing vectors in B as linear combinations of vectors in C gives:
(1, 2) = 3(−3, 4) + 10(1,−1)
(1, 1) = 2(−3, 4) + 7(1,−1)
where the coefficients correspond to the coordinate vectors in the
columns of PC,B.
301
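A NumPy sketch of the same computation (using solve rather than an explicit inverse):

    import numpy as np

    P_SB = np.array([[1, 1], [2, 1]])    # columns are the vectors of B
    P_SC = np.array([[-3, 1], [4, -1]])  # columns are the vectors of C
    P_CB = np.linalg.solve(P_SC, P_SB)   # (P_SC)^(-1) P_SB
    print(P_CB)                          # [[3. 2.], [10. 7.]]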
How are [T ]C and [T ]B related?
Theorem
The matrix representations of T : V → V with respect to two ordered
bases C and B for V are related by the following equation:
[T ]B = PB,C[T ]CPC,B.
Proof:
We need to show that for all v ∈ V ,
[T (v)]B = PB,C[T ]CPC,B[v]B
Starting with the right-hand side we obtain
PB,C[T ]CPC,B[v]B = PB,C[T ]C [v]C (property of PC,B)
= PB,C[T (v)]C (property of [T ]C)
= [T (v)]B (property of PB,C)
302
Example 24.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x
using a change of basis from the standard basis S = {(1, 0), (0, 1)} to
the basis B = {u, v}, where
◮ u = (1, 5) is a vector in the direction of y = 5x
◮ v = (5,−1) is a vector perpendicular to y = 5x
303
Solution:
1. Find the image of the vectors in B and write in terms of B to give
[T ]B.
T (1, 5) = (1, 5) = 1(1, 5) + 0(5,−1)    ⇒  [T (u)]B = [ 1 ]
                                                       [ 0 ]
T (5,−1) = (−5, 1) = 0(1, 5) − 1(5,−1)   ⇒  [T (v)]B = [  0 ]
                                                       [ −1 ]
The matrix representation of T with respect to basis B is
[T ]B = [ [T (u)]B [T (v)]B ] = [ 1  0 ]
                                [ 0 −1 ] .
304
2. Find the transition matrices PS,B and PB,S .
The transition matrix from B to S is
PS,B =
[
1 5
5 −1
]
The transition matrix from S to B is
PB,S = (PS,B)−1
= − 1
26
[−1 −5
−5 1
]
=
1
26
[
1 5
5 −1
]
305
3. Find the standard matrix representation of T .
[T ]S = PS,B [T ]B PB,S

      = (1/26) [ 1  5 ] [ 1  0 ] [ 1  5 ]
               [ 5 −1 ] [ 0 −1 ] [ 5 −1 ]

      = (1/26) [ 1  5 ] [  1 5 ]
               [ 5 −1 ] [ −5 1 ]

      = (1/26) [ −24 10 ]
               [  10 24 ]

      = [ −12/13   5/13 ]
        [   5/13  12/13 ]
as found in example 6.
306
Example 25.
Consider the linear transformation T : R2 → R2 given by
T (x , y) = (3x − y ,−x + 3y).
Let B = {(1, 1), (1,−1)} be a basis for R2.
(a) Find the standard matrix representation of T .
(b) Find the matrix representation of T with respect to the basis B.
(c) Interpret T geometrically.
Solution:
(a) Now
[T (x , y)] = [  3 −1 ] [ x ]
              [ −1  3 ] [ y ] .
The standard matrix is
[T ] = [  3 −1 ]
       [ −1  3 ] .
307
(b) From examples 21 and 22, we have
[T ]B = PB,S [T ] PS,B

      = (1/2) [ 1  1 ] [  3 −1 ] [ 1  1 ]
              [ 1 −1 ] [ −1  3 ] [ 1 −1 ]

      = (1/2) [ 1  1 ] [ 2  4 ]
              [ 1 −1 ] [ 2 −4 ]

      = (1/2) [ 4 0 ]
              [ 0 8 ]

      = [ 2 0 ]
        [ 0 4 ]
308
(c) Note [T ]B is a diagonal matrix.
T stretches by a factor of 2 in the direction b1 = (1, 1)
T stretches by a factor of 4 in the direction b2 = (1,−1)
309
Topic 6: Eigenvalues and Eigenvectors [AR Chap 5]
6.1 Definition of eigenvalues and eigenvectors
6.2 Finding eigenvalues
6.3 Finding eigenvectors
6.4 Diagonalisation
6.5 Matrix powers
310
6.1 Definition of eigenvalues and eigenvectors
We wish to identify subspaces W that are invariant under a linear
transformation T : V → V , i.e. with T (W ) ⊆W .
Example 1.
Let T : R2 → R2 be defined by a reflection in the line y = 5x .
(a) Identify two lines through the origin that are invariant under T .
(b) Find the image of the direction vectors for each line in (a).
311
Solution:
(a) By inspection, the line y = 5x , and the perpendicular line y = −x/5,
are invariant under T .
(b) The invariant lines have direction vectors (1, 5) and (5,−1).
Now
T (1, 5) = (1, 5) = 1(1, 5)
T (5,−1) = −1(5,−1)
312
Algebraically, we seek non-zero vectors v such that
T (v) = λv
for some scalar λ. Then the 1-dimensional subspace W = span{v} is
invariant under T .
Geometrically,
◮ If λ = 0, then T maps v to 0.
◮ If λ is real and positive, then T rescales v by a factor λ.
◮ If λ is real and negative, then T rescales v by a factor λ and also
reverses the direction of v.
313
Definition (Eigenvalues and eigenvectors of transformations)
Let T : V → V be a linear transformation.
A scalar λ is an eigenvalue of T if there is a non-zero vector v ∈ V
such that
T (v) = λv.
Then v is called an eigenvector of T with corresponding eigenvalue λ.
For each eigenvalue λ, the set
{v ∈ V | T (v) = λv}.
is a subspace of V , called the eigenspace of λ.
Note:
Each non-zero vector in this eigenspace is an eigenvector of T with
eigenvalue λ.
314
If we represent T by its matrix [T ] with respect to an ordered basis B
for V , then we can find eigenvalues and eigenvectors by solving
[T ][v] = λ[v], where v ≠ 0.
This leads to the following definition.
Definition (Eigenvalues and eigenvectors of matrices)
Let A be an n× n matrix with real or complex entries. A scalar λ ∈ C is
an eigenvalue of A if there is a non-zero column vector v ∈ Cn such
that
Av = λv.
Then v is called an eigenvector of A, with corresponding eigenvalue λ.
Note:
For a real matrix, eigenvalues and eigenvectors can be real or complex .
315
6.2 Finding eigenvalues
Let A be an n× n matrix and v be an n × 1 column vector.
If I denotes the n × n identity matrix, then the equation
Av = λv
can be rewritten in the form
Av− λv = 0
⇒ Av − λIv = 0
⇒ (A − λI )v = 0
The values of λ for which this equation has non-zero solutions are the
eigenvalues.
316
Theorem
The homogeneous linear system
(A − λI )v = 0
has a non-zero solution if and only if
det(A− λI ) = 0.
The eigenvalues of A are the values of λ which satisfy the characteristic
equation
det(A− λI ) = 0.
317
Example 2.
Find the eigenvalues of A = [ 1 4 ]
                            [ 1 1 ] .
.
Solution:
Solve det(A− λI ) = 0
0 = det [ 1− λ    4   ]
        [   1   1− λ  ]
  = (1− λ)² − 4
  = 1− 2λ+ λ² − 4
  = λ² − 2λ− 3
  = (λ+ 1)(λ− 3)
The eigenvalues are λ = −1, 3.
318
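Eigenvalues can be computed numerically; a minimal NumPy sketch for this matrix:

    import numpy as np

    A = np.array([[1, 4], [1, 1]])
    print(np.sort(np.linalg.eigvals(A)))  # [-1. 3.]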
Definition (Characteristic polynomial)
Let A be an n× n matrix. The determinant det(A− λI ) is a polynomial
in λ of degree n called the characteristic polynomial of A.
Definition (Multiplicity of eigenvalues)
The algebraic multiplicity of an eigenvalue λ0 is the power of λ− λ0 in
the characteristic polynomial.
The geometric multiplicity of an eigenvalue is the number of linearly
independent eigenvectors associated with it.
By the fundamental theorem of algebra, we can always find at least one
complex eigenvalue for any real or complex n × n matrix. Further, the
sum of the algebraic multiplicities must equal n.
319
Example 3.
Find the eigenvalues of A = [ 2 −3  6 ]
                            [ 0  5 −6 ]
                            [ 0  1  0 ] .
What is the algebraic multiplicity of each eigenvalue?
Solution:
Solve det(A − λI ) = 0

0 = det [ 2− λ   −3     6 ]
        [   0   5− λ   −6 ]
        [   0     1    −λ ]

  = (2− λ) det [ 5− λ  −6 ]
               [   1   −λ ]

  = (2− λ)((5− λ)(−λ) + 6)
  = (2− λ)(λ² − 5λ+ 6)
  = (2− λ)(λ− 2)(λ− 3)
320
The eigenvalues are λ = 2, 3.
The algebraic multiplicity of λ = 2 is 2.
The algebraic multiplicity of λ = 3 is 1.
321
Useful properties of eigenvalues
The eigenvalues λi (i = 1, · · · , n) of an n × n matrix A satisfy the
following properties:
1. Tr(A) = λ1 + λ2 + · · ·+ λn = sum of the eigenvalues
2. det(A) = λ1λ2 · · ·λn = product of the eigenvalues
These properties are useful for checking that you have calculated the
eigenvalues correctly.
322
Example 4.
Check these properties for the eigenvalues λ = 2, 2, 3 of the matrix
A = [ 2 −3  6 ]
    [ 0  5 −6 ]
    [ 0  1  0 ] .
Solution:
The sum of the eigenvalues is 2 + 2 + 3 = 7.
Tr(A) = 2 + 5 + 0 = 7
The product of the eigenvalues is (2)(2)(3) = 12.
det(A) = 2 det [ 5 −6 ]
               [ 1  0 ]  = 2(6) = 12
323
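Both properties are easy to confirm numerically; an illustrative NumPy sketch:

    import numpy as np

    A = np.array([[2, -3, 6], [0, 5, -6], [0, 1, 0]])
    lam = np.linalg.eigvals(A)
    print(lam.sum(), np.trace(A))        # both 7 (up to rounding)
    print(lam.prod(), np.linalg.det(A))  # both 12 (up to rounding)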
Example 5.
Consider the matrix A = [  0 1 ]
                        [ −1 0 ] .
.
(a) Find the eigenvalues of A.
(b) Interpret the matrix and eigenvalues geometrically.
Solution:
(a) Solve det(A − λI ) = 0
0 = det [ −λ   1 ]
        [ −1  −λ ]  = λ² + 1
The eigenvalues are λ = ±i .
(b) The matrix corresponds to an anticlockwise rotation of R2 by 3π/2.
324
This rotation leaves no lines in R2 unchanged, so there are no real
eigenvalues and no real eigenvectors.
Note:
The matrix A defines a linear transformation T : C2 → C2 where
T (v) = Av
for v ∈ C2.
Therefore, T has complex eigenvalues.
325
Eigenvalues of matrices versus eigenvalues of linear transformations
Let A be an n× n matrix with complex entries. Then the eigenvalues of
A are the eigenvalues of the linear transformation T : Cn → Cn with
standard matrix A. For each eigenvalue λ,
{v ∈ Cn | Av = λv}
is a subspace of Cn, called the (complex) eigenspace of λ.
Let A be an n× n matrix with real entries. Then the real eigenvalues of
A are the eigenvalues of the linear transformation T : Rn → Rn with
standard matrix A. For each real eigenvalue λ,
{v ∈ Rn : Av = λv}
is a subspace of Rn called the (real) eigenspace of λ.
326
6.3 Finding eigenvectors
To find the eigenvectors of a matrix
Let A be an n× n matrix and I be the n × n identity matrix.
◮ For each eigenvalue λ, solve the linear system
(A− λI )v = 0.
The solution space is the eigenspace of λ.
◮ You always obtain at least one row of zeros in row-echelon form as
rank(A− λI ) < n. So there are non-zero solutions.
◮ Choose a basis for the eigenspace. The vectors in the basis are the
linearly independent eigenvectors.
Note:
The eigenvectors of a matrix are not unique. An eigenvector v can be
multiplied by any non-zero scalar and it will still be an eigenvector.
327
Example 6.
Find the eigenvectors of the matrix A = [ 1 4 ]
                                        [ 1 1 ] .
.
Solution:
From example 2, the eigenvalues are λ = −1, 3.
For each eigenvalue λ, solve (A− λI )v = 0
For λ = −1, solve (A + I )v = 0 where v = (v1, v2).

[ 2 4 | 0 ]  R2 → R2 − (1/2)R1   ∼   [ 2 4 | 0 ]  R1 → (1/2)R1   ∼   [ 1 2 | 0 ]
[ 1 2 | 0 ]                          [ 0 0 | 0 ]                     [ 0 0 | 0 ]
Let v2 = t for t ∈ R. Then v1 = −2v2 = −2t.
328
The eigenspace is
{ t(−2, 1) | t ∈ R }
One eigenvector corresponding to λ = −1 is v = (−2, 1).

For λ = 3, solve (A − 3I )v = 0 where v = (v1, v2).

[ −2  4 | 0 ]  R2 → R2 + (1/2)R1   ∼   [ −2 4 | 0 ]  R1 → (1/2)R1   ∼   [ −1 2 | 0 ]
[  1 −2 | 0 ]                          [  0 0 | 0 ]                     [  0 0 | 0 ]
329
Let v2 = t for t ∈ R. Then v1 = 2v2 = 2t.
The eigenspace is
{ t(2, 1) | t ∈ R }
One eigenvector corresponding to λ = 3 is v = (2, 1).
330
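NumPy returns eigenvectors as unit-length columns, so they agree with (−2, 1) and (2, 1) only up to a scalar multiple; an illustrative sketch:

    import numpy as np

    A = np.array([[1, 4], [1, 1]])
    vals, vecs = np.linalg.eig(A)
    print(vals)                     # [3. -1.] (the order may vary)
    # Rescale each column so its second component is 1:
    print(vecs[:, 0] / vecs[1, 0])  # [2. 1.]  -> a multiple of (2, 1)
    print(vecs[:, 1] / vecs[1, 1])  # [-2. 1.] -> a multiple of (-2, 1)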
Example 7.
Find the eigenvectors of the matrix A = [ 2 −3  6 ]
                                        [ 0  5 −6 ]
                                        [ 0  1  0 ] .
What is the geometric multiplicity of each eigenvalue?
Solution:
From example 3, the eigenvalues are λ = 2, 3.
For each eigenvalue λ, solve (A− λI )v = 0
For λ = 2, solve (A − 2I)v = 0 where v = (v1, v2, v3).

[ 0  −3   6 | 0 ]
[ 0   3  −6 | 0 ]   R1 ↔ R3
[ 0   1  −2 | 0 ]

∼ [ 0   1  −2 | 0 ]   R2 → R2 − 3R1
  [ 0   3  −6 | 0 ]   R3 → R3 + 3R1
  [ 0  −3   6 | 0 ]

∼ [ 0  1  −2 | 0 ]
  [ 0  0   0 | 0 ]
  [ 0  0   0 | 0 ]
331
Let v1 = s and v3 = t for s, t ∈ R.
Then v2 = 2v3 = 2t.
The eigenspace is
{ (s, 2t, t) = s(1, 0, 0) + t(0, 2, 1) : s, t ∈ R }.

Two linearly independent eigenvectors corresponding to λ = 2 are
v1 = (1, 0, 0) and v2 = (0, 2, 1).
The geometric multiplicity of λ = 2 is 2.
332
For λ = 3, solve (A − 3I)v = 0 where v = (v1, v2, v3).

[ −1  −3   6 | 0 ]
[  0   2  −6 | 0 ]   R2 ↔ R3
[  0   1  −3 | 0 ]

∼ [ −1  −3   6 | 0 ]   R3 → R3 − 2R2
  [  0   1  −3 | 0 ]
  [  0   2  −6 | 0 ]

∼ [ −1  −3   6 | 0 ]   R1 → R1 + 3R2
  [  0   1  −3 | 0 ]
  [  0   0   0 | 0 ]

∼ [ −1  0  −3 | 0 ]
  [  0  1  −3 | 0 ]
  [  0  0   0 | 0 ]
333
Let v3 = t for t ∈ R.
Then v1 = −3v3 = −3t, v2 = 3v3 = 3t.
The eigenspace is { t(−3, 3, 1) : t ∈ R }.

An eigenvector corresponding to λ = 3 is v = (−3, 3, 1).
The geometric multiplicity of λ = 3 is 1.
Note:
The geometric multiplicity is always less than or equal to the algebraic
multiplicity.
334
Example 8.
Find the eigenvectors of the matrix A = [  0  1 ]
                                        [ −1  0 ] .
Solution:
From example 5, the eigenvalues are λ = ±i .
For each eigenvalue λ, solve (A− λI )v = 0
For λ = i, solve (A − iI)v = 0 where v = (v1, v2).

[ −i   1 | 0 ]   R1 → iR1
[ −1  −i | 0 ]

∼ [  1   i | 0 ]   R2 → R2 + R1
  [ −1  −i | 0 ]

∼ [ 1  i | 0 ]
  [ 0  0 | 0 ]
Let v2 = t for t ∈ C. Then v1 = −iv2 = −it.
335
The (complex) eigenspace is { t(−i, 1) : t ∈ C }.

An eigenvector corresponding to λ = i is v = (−i, 1).
For λ = −i, solve (A + iI)v = 0 where v = (v1, v2).

[  i  1 | 0 ]   R1 → iR1
[ −1  i | 0 ]

∼ [ −1  i | 0 ]   R2 → R2 − R1
  [ −1  i | 0 ]

∼ [ −1  i | 0 ]
  [  0  0 | 0 ]
336
Let v2 = t for t ∈ C. Then v1 = iv2 = it.
The (complex) eigenspace is { t(i, 1) : t ∈ C }.

An eigenvector corresponding to λ = −i is v = (i, 1).
Note:
◮ The eigenvalues are complex conjugates.
◮ The eigenvectors are complex conjugates.
In general, the non-real eigenvalues and eigenvectors of a real matrix
occur in complex conjugate pairs.
337
6.4 Diagonalisation
Definition
A linear transformation T : V → V is diagonalisable if there is a basis
B = {b1, . . . ,bn} for V such that the matrix [T ]B is diagonal.
Definition
A complex n × n matrix A is diagonalisable over C if the linear
transformation
T : Cn → Cn with T(v) = Av
is diagonalisable.
A real n × n matrix A is diagonalisable over R if the linear
transformation
T : Rn → Rn with T(v) = Av
is diagonalisable.
338
Theorem
A linear transformation T : V → V is diagonalisable if and only if there
is a basis B for V consisting of eigenvectors of T .
An n× n matrix A is diagonalisable if and only if it has n linearly
independent eigenvectors.
Proof:
Let B = {b1,b2, · · · ,bn}. The matrix [T ]B is diagonal if and only if
each of the basis vectors bi satisfies
T (bi ) = λibi ,
where λi is the corresponding diagonal entry of the matrix [T ]B.
339
How to diagonalise a matrix
Theorem
An n× n matrix A is diagonalisable if and only if there is an n× n
invertible matrix P and an n × n diagonal matrix D such that
P⁻¹AP = D
or, equivalently,
A = PDP⁻¹.
The matrix P is said to diagonalise A.
If v1, . . . , vn are linearly independent eigenvectors of A with
corresponding eigenvalues λ1, . . . , λn, then we can take
P = [ v1 · · · vn ]   and   D = diag(λ1, λ2, . . . , λn).
If A is diagonalisable over R we can choose P and D with real entries.
340
Example 9.
Let A = [ 1  4 ]
        [ 1  1 ] .  Suppose T : R2 → R2 is such that [T] = A.
(a) Write A in the form PDP⁻¹. Check your answer.
(b) Give a geometric description of T .
Solution:
(a) From example 6:
An eigenvector corresponding to λ = −1 is v1 = (−2, 1).
An eigenvector corresponding to λ = 3 is v2 = (2, 1).
Hence A is diagonalisable with

P = [ −2  2 ]          D = [ −1  0 ]
    [  1  1 ]   and        [  0  3 ]
341
Check A = PDP−1 by multiplying the matrices.
PDP⁻¹ = [ −2  2 ] [ −1  0 ]  1/4 [ −1  2 ]
        [  1  1 ] [  0  3 ]      [  1  2 ]

      = 1/4 [ −2  2 ] [ 1  −2 ]
            [  1  1 ] [ 3   6 ]

      = 1/4 [ 4  16 ]
            [ 4   4 ]

      = [ 1  4 ]
        [ 1  1 ]
Note: To avoid calculating P⁻¹, verify instead that AP = PD.
(b) T reverses vectors in the direction of (−2, 1) and scales vectors in
the direction of (2, 1) by a factor of 3.
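The check suggested in the note can be scripted. A sketch for this example (any non-zero scaling of the eigenvector columns of P would do equally well):

import numpy as np

A = np.array([[1.0, 4.0], [1.0, 1.0]])
P = np.array([[-2.0, 2.0], [1.0, 1.0]])   # eigenvector columns
D = np.diag([-1.0, 3.0])                  # matching eigenvalues

print(np.allclose(A @ P, P @ D))                  # True: AP = PD
print(np.allclose(P @ D @ np.linalg.inv(P), A))   # True: A = P D P^(-1)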
342
Theorem
Eigenvectors v1, . . . , vk corresponding to distinct eigenvalues λ1, . . . , λk
are linearly independent.
Proof:
If k = 2.
Let v1, v2 be eigenvectors of A corresponding to λ1 and λ2 respectively,
such that λ1 ≠ λ2. Then
Av1 = λ1v1 and Av2 = λ2v2.
Suppose that
α1v1 + α2v2 = 0 (1)
Then
A(α1v1 + α2v2) = A0 = 0
⇒ α1Av1 + α2Av2 = 0
⇒ α1λ1v1 + α2λ2v2 = 0 (2)
343
Equation (2)− λ1(1) gives
α2(λ2 − λ1)v2 = 0
Since λ1 ≠ λ2 and v2 ≠ 0, we must have α2 = 0.
Since v1 ≠ 0 and α2 = 0, (1) gives α1 = 0.
Since the only solution of (1) is
α1 = α2 = 0
then v1 and v2 are linearly independent eigenvectors.
Note:
Similar arguments can be used for the cases k > 2.
344
Theorem
If an n × n matrix A has n distinct eigenvalues, then A has n linearly
independent eigenvectors. Hence, A will be diagonalisable.
Note:
If A has one or more repeated eigenvalues, then A may or may not be
diagonalisable depending on the number of linearly independent
eigenvectors associated with each eigenvalue.
◮ If the algebraic and geometric multiplicities for each eigenvalue are
equal, A will be diagonalisable. In this case, the dimensions of the
eigenspaces add up to n.
◮ If the geometric multiplicity is less than the algebraic multiplicity
for some eigenvalue, A will not be diagonalisable.
345
Example 10.
Is A = [ 2  −3   6 ]
       [ 0   5  −6 ]
       [ 0   1   0 ]
diagonalisable?
If it is, specify the matrices P and D such that A = PDP⁻¹.
Solution: From example 7:
Two linearly independent eigenvectors corresponding to λ = 2 are
v1 = (1, 0, 0) and v2 = (0, 2, 1).
An eigenvector corresponding to λ = 3 is
v3 = (−3, 3, 1).
346
Hence A is diagonalisable with
P = [ 1  0  −3 ]          D = [ 2  0  0 ]
    [ 0  2   3 ]   and        [ 0  2  0 ]
    [ 0  1   1 ]              [ 0  0  3 ]
347
Example 11.
Is A = [  0  1 ]
       [ −1  0 ]
diagonalisable?
If it is, specify the matrices P and D such that A = PDP⁻¹.
Solution: From example 8:
An eigenvector corresponding to λ = i is
v1 = (−i, 1).
An eigenvector corresponding to λ = −i is
v2 = (i, 1).
Hence A is diagonalisable over C with
P = [ −i  i ]          D = [ i   0 ]
    [  1  1 ]   and        [ 0  −i ]
348
Example 12.
Show that the matrix A = [ 1  2 ]
                         [ 0  1 ]
is not diagonalisable.
Solution:
Find the eigenvalues of A
0 = det [ 1−λ    2  ]
        [  0    1−λ ]
  = (1−λ)²
The eigenvalue is λ = 1.
Find the eigenvectors of A
Solve (A − I)v = 0 where v = (v1, v2).

[ 0  2 | 0 ]
[ 0  0 | 0 ]
Then v2 = 0. Let v1 = t for t ∈ R.
349
The eigenspace is { t(1, 0) : t ∈ R }.

An eigenvector corresponding to λ = 1 is v = (1, 0).
Since there is only one linearly independent eigenvector corresponding
to λ = 1, we cannot diagonalise A.
Note:
The eigenvalue λ = 1 has algebraic multiplicity 2 but geometric
multiplicity of 1, so we cannot diagonalise A.
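Numerically, the geometric multiplicity can be found as n − rank(A − λI). A small sketch:

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
lam = 1.0

# geometric multiplicity = nullity of A - lambda I
geo = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))
print(geo)   # 1 < 2 = algebraic multiplicity, so A is not diagonalisable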
350
6.5 Matrix powers
In applications, one often needs to apply a transformation many times.
If the transformation can be represented by a diagonal matrix D, it is
easy to compute Dk and thus the action of the k-th application of the
transformation.
Example 13.
If D = [ −4  0  0 ]
       [  0  3  0 ]
       [  0  0  2 ] , write down D¹⁰.

Solution:

D¹⁰ = [ (−4)¹⁰   0     0  ]   [ 4¹⁰   0    0  ]
      [   0     3¹⁰    0  ] = [  0   3¹⁰   0  ]
      [   0      0    2¹⁰ ]   [  0    0   2¹⁰ ]
351
Theorem
Suppose A is diagonalisable, so that
A = PDP−1
for some invertible matrix P and a diagonal matrix D.
Then for all integers k ≥ 1, we have
Aᵏ = PDᵏP⁻¹.
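A sketch comparing the two sides of this formula for the matrix of example 9 (k = 10 is an arbitrary choice):

import numpy as np

P = np.array([[-2.0, 2.0], [1.0, 1.0]])
D = np.diag([-1.0, 3.0])
A = P @ D @ np.linalg.inv(P)               # this is [[1, 4], [1, 1]]

k = 10
lhs = np.linalg.matrix_power(A, k)         # A multiplied by itself k times
rhs = P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)   # P D^k P^(-1)
print(np.allclose(lhs, rhs))               # True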
352
Example 14.
From example 9, A = [ 1  4 ]
                    [ 1  1 ]
can be written in the form A = PDP⁻¹ where

D = [ −1  0 ]        P = [ −2  2 ]                     1/4 [ −1  2 ]
    [  0  3 ] ,          [  1  1 ]   and   P⁻¹ =           [  1  2 ] .
Calculate Aᵏ for all integers k ≥ 1.
Solution:
Since Aᵏ = PDᵏP⁻¹, we have

Aᵏ = [ −2  2 ] [ (−1)ᵏ   0  ]  1/4 [ −1  2 ]
     [  1  1 ] [   0    3ᵏ  ]      [  1  2 ]

   = 1/4 [ −2  2 ] [ (−1)ᵏ⁺¹   2(−1)ᵏ ]
         [  1  1 ] [   3ᵏ      2(3ᵏ)  ]

   = 1/4 [ 2((−1)ᵏ + 3ᵏ)     4((−1)ᵏ⁺¹ + 3ᵏ) ]
         [ (−1)ᵏ⁺¹ + 3ᵏ      2((−1)ᵏ + 3ᵏ)   ]
353
Example 15. An application from genetics
A snapdragon plant has either red, pink or white flowers, depending on
its genotype (with alleles RR, RW or WW respectively). We want to
cross pink plants with red, pink and white plants. On average,
◮ crossing pink plants with red plants produces:
  1/2 red plants and 1/2 pink plants as offspring.
◮ crossing pink plants with pink plants produces:
  1/4 red plants, 1/2 pink plants and 1/4 white plants as offspring.
◮ crossing pink plants with white plants produces:
  1/2 pink plants and 1/2 white plants as offspring.
This is an example of incomplete dominance in genetics.
If you continue crossing with only pink plants, what fractions of the
three types of flowers would you eventually expect to see in your garden?
354
Solution:
Let
xn = (rn, pn, wn)ᵀ,
where rn, pn, wn are the fractions of red, pink and white flowers after n generations, for n ≥ 0. Note that these satisfy
rn + pn + wn = 1.
Now for each n ≥ 0 we have

[ rn+1 ]   [ (1/2)rn + (1/4)pn + 0wn      ]   [ 1/2  1/4   0  ] [ rn ]
[ pn+1 ] = [ (1/2)rn + (1/2)pn + (1/2)wn ] = [ 1/2  1/2  1/2 ] [ pn ]
[ wn+1 ]   [ 0rn + (1/4)pn + (1/2)wn     ]   [  0   1/4  1/2 ] [ wn ]
355
This gives
xn+1 = Txn,
where T is the transition matrix

T = [ 1/2  1/4   0  ]
    [ 1/2  1/2  1/2 ]
    [  0   1/4  1/2 ] .
Note that T has non-negative entries and the sum of each column is 1.
Then
x1 = Tx0,   x2 = Tx1 = T(Tx0) = T²x0,
and in general
xn = Tⁿx0
gives the distribution after n generations for n ≥ 1.
356
Now T has eigenvalues
λ = 1, 1/2, 0
with corresponding eigenvectors
v1 = (1, 2, 1),   v2 = (−1, 0, 1),   v3 = (1, −2, 1).
Hence we can diagonalise T and
T = PDP⁻¹
where

P = [ 1  −1   1 ]        D = [ 1   0   0 ]          P⁻¹ = [  1/4   1/4  1/4 ]
    [ 2   0  −2 ] ,          [ 0  1/2  0 ] ,              [ −1/2    0   1/2 ]
    [ 1   1   1 ]            [ 0   0   0 ]                [  1/4  −1/4  1/4 ] .
357
Now
Tⁿ = PDⁿP⁻¹.
Taking the limit as n → ∞ we get

lim Tⁿ = [ 1  −1   1 ] [ 1  0  0 ] [  1/4   1/4  1/4 ]
n→∞      [ 2   0  −2 ] [ 0  0  0 ] [ −1/2    0   1/2 ]
         [ 1   1   1 ] [ 0  0  0 ] [  1/4  −1/4  1/4 ]

       = [ 1  −1   1 ] [ 1/4  1/4  1/4 ]
         [ 2   0  −2 ] [  0    0    0  ]
         [ 1   1   1 ] [  0    0    0  ]

       = [ 1/4  1/4  1/4 ]
         [ 1/2  1/2  1/2 ]
         [ 1/4  1/4  1/4 ]
and
x∞ = lim xn = lim Tⁿx0   (limits taken as n → ∞).
358
Therefore

x∞ = [ 1/4  1/4  1/4 ] [ r0 ]   [ (1/4)(r0 + p0 + w0) ]   [ 1/4 ]
     [ 1/2  1/2  1/2 ] [ p0 ] = [ (1/2)(r0 + p0 + w0) ] = [ 1/2 ]
     [ 1/4  1/4  1/4 ] [ w0 ]   [ (1/4)(r0 + p0 + w0) ]   [ 1/4 ]

since for any initial values r0, p0, w0, we have r0 + p0 + w0 = 1.

So in the long run, the expected distribution of flower colours is:
1/4 red, 1/2 pink, and 1/4 white.
Note:
◮ These fractions are proportional to the entries of the eigenvector of T with eigenvalue 1. This is expected, since the limiting proportions x∞ should satisfy Tx∞ = x∞.
◮ This is an example of a Markov chain.
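The convergence can also be seen directly by iterating. A sketch (the all-red starting population is an arbitrary choice; any distribution summing to 1 gives the same limit):

import numpy as np

T = np.array([[0.5, 0.25, 0.0],
              [0.5, 0.50, 0.5],
              [0.0, 0.25, 0.5]])

x = np.array([1.0, 0.0, 0.0])   # start with all red plants
for _ in range(30):             # 30 generations is ample here
    x = T @ x
print(x)                        # approx [0.25, 0.5, 0.25]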
359
Topic 7: Inner Product Spaces [AR 6.1-6.5,5.3,7.1-7.2,7.5]
7.1 Definition of inner products
7.2 Geometry from inner products
7.3 Orthogonal sets and Gram-Schmidt procedure
7.4 Least squares curve fitting
7.5 Orthogonal matrices and symmetric matrices
7.6 Unitary matrices and Hermitian matrices
360
7.1 Definition of inner products
Definition (Inner product over R)
An inner product on a real vector space V is a function that associates
to every pair of vectors u, v ∈ V a real number, denoted by 〈u, v〉,
satisfying the following properties.
For all u, v,w ∈ V and α ∈ R:
1. 〈u, v〉 = 〈v,u〉 (symmetry)
2. 〈αu, v〉 = α〈u, v〉 (linearity of scalar multiplication)
3. 〈u+ v,w〉 = 〈u,w〉+ 〈v,w〉 (linearity of vector addition)
4. a. 〈u, u〉 ≥ 0 (positivity)
b. 〈u, u〉 = 0 ⇒ u = 0
A real vector space V together with an inner product is called a real
inner product space.
For V = Rn, the dot product
〈u, v〉 = u · v
defines an inner product.
361
Example 1.
Let u = (u1, u2) ∈ R2 and v = (v1, v2) ∈ R2. Show that
〈u, v〉 = u1v1 + 2u2v2
defines an inner product on R2.
Solution:
Define a third vector and a scalar
Let w = (w1,w2) ∈ R2 and α ∈ R.
1. Symmetry
〈u, v〉 = u1v1 + 2u2v2
= v1u1 + 2v2u2
= 〈v,u〉
362
2. Linearity of scalar multiplication
〈αu, v〉 = (αu1)v1 + 2(αu2)v2
= α(u1v1 + 2u2v2)
= α〈u, v〉
3. Linearity of vector addition
〈u+ v,w〉 = (u1 + v1)w1 + 2(u2 + v2)w2
= u1w1 + v1w1 + 2u2w2 + 2v2w2
= (u1w1 + 2u2w2) + (v1w1 + 2v2w2)
= 〈u,w〉 + 〈v,w〉
363
4. Positivity
(a) Now
〈u,u〉 = u1² + 2u2² ≥ 0
(b) Further,
〈u,u〉 = 0
⇒ u1² + 2u2² = 0
⇒ u1² = 0 and u2² = 0
⇒ u1 = u2 = 0
⇒ u = 0
Since all 4 axioms hold, 〈u, v〉 is an inner product over R2.
364
Example 2.
Let u = (u1, u2, u3) ∈ R3 and v = (v1, v2, v3) ∈ R3. Show that
〈u, v〉 = u1v1 − u2v2 + u3v3
does not define an inner product on R3.
Solution:
Give a specific counterexample
Let u = (0, 1, 0). Then
〈u,u〉 = 0 − 1 + 0 = −1 < 0
Since axiom 4a does not hold, 〈u, v〉 is not an inner product over R3.
365
Note:
Let u = (u1, · · · , un) ∈ Rn and v = (v1, · · · , vn) ∈ Rn.
◮ The definition of 〈u, v〉 in examples 1 and 2 can be written in the
form
〈u, v〉 = [u]ᵀ A [v]
where A is a real symmetric n × n matrix, so A = Aᵀ.
◮ This is true for all inner products 〈u, v〉 defined on Rn.
◮ When written in this matrix form, axioms 1, 2 and 3 follow
immediately from properties of matrices. We only need to check
axiom 4.
366
Example 3.
Let u = (u1, u2) ∈ R2, v = (v1, v2) ∈ R2 and
〈u, v〉 = 2u1v1 − 2u1v2 − 2u2v1 + 3u2v2.
(a) Write 〈u, v〉 in terms of a symmetric matrix.
(b) Show that 〈u, v〉 defines an inner product on R2.
Solution:
(a) By inspection,
〈u, v〉 = [ u1  u2 ] [  2  −2 ] [ v1 ]
                    [ −2   3 ] [ v2 ]

where A = [  2  −2 ]
          [ −2   3 ]
is a symmetric matrix.
Putting u = v gives
367
〈u,u〉 = 2u1² − 4u1u2 + 3u2²
      = 2(u1² − 2u1u2 + (3/2)u2²)
      = 2((u1 − u2)² + (1/2)u2²)

This is a sum of squares with positive coefficients, so
〈u,u〉 ≥ 0 for all u ∈ R2.
Further,
〈u,u〉 = 0
⇒ u1 − u2 = 0 and u2 = 0
⇒ u = 0
Hence axiom 4 is satisfied. Since A is a symmetric matrix and axiom 4
is satisfied, then 〈u, v〉 defines an inner product on R2.
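Checking axiom 4 for 〈u, v〉 = [u]ᵀA[v] amounts to checking that the symmetric matrix A is positive definite, i.e. that all its eigenvalues are positive (this uses the fact, stated in Section 7.5, that real symmetric matrices have real eigenvalues). A numerical sketch:

import numpy as np

A = np.array([[ 2.0, -2.0],
              [-2.0,  3.0]])

# eigvalsh is the eigenvalue routine for symmetric/Hermitian matrices;
# it returns real eigenvalues in ascending order.
print(np.linalg.eigvalsh(A))               # approx [0.44, 4.56], both > 0
print(np.all(np.linalg.eigvalsh(A) > 0))   # True, so axiom 4 holds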
368
Example 4.
Let p = p(x) ∈ Pn and q = q(x) ∈ Pn. Show that
〈p,q〉 = ∫₀¹ p(x)q(x) dx

defines an inner product on Pn.
Solution:
Define a third polynomial and a scalar
Let r = r(x) ∈ Pn and α ∈ R.
1. Symmetry
〈p,q〉 = ∫₀¹ p(x)q(x) dx
      = ∫₀¹ q(x)p(x) dx
      = 〈q,p〉
369
2. Linearity of scalar multiplication
〈αp,q〉 = ∫₀¹ αp(x)q(x) dx
        = α ∫₀¹ p(x)q(x) dx
        = α〈p,q〉
3. Linearity of vector addition
〈p + q, r〉 = ∫₀¹ (p(x) + q(x))r(x) dx
           = ∫₀¹ p(x)r(x) + q(x)r(x) dx
           = ∫₀¹ p(x)r(x) dx + ∫₀¹ q(x)r(x) dx
           = 〈p, r〉 + 〈q, r〉
370
4. Positivity
(a) Now
〈p,p〉 = ∫₀¹ p(x)p(x) dx = ∫₀¹ (p(x))² dx ≥ 0

(b) Since p(x) is continuous on [0, 1] and (p(x))² ≥ 0 for all x, then
∫₀¹ (p(x))² dx = 0 ⇔ p(x) = 0

Hence
〈p,p〉 = 0 ⇒ p = 0
Since all 4 axioms hold, 〈p,q〉 is an inner product on Pn.
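A sketch of this inner product for two concrete polynomials, using numpy.polynomial.Polynomial so that the integral is computed exactly from coefficients rather than by quadrature:

import numpy as np
from numpy.polynomial import Polynomial

def inner(p, q):
    # <p, q> = integral from 0 to 1 of p(x) q(x) dx
    anti = (p * q).integ()        # an antiderivative of the product
    return anti(1.0) - anti(0.0)

p = Polynomial([0.0, 1.0])        # p(x) = x
q = Polynomial([1.0, 0.0, 1.0])   # q(x) = 1 + x^2
print(inner(p, q))                # 0.75 = 1/2 + 1/4
print(inner(p, p) >= 0)           # True (axiom 4a)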
371
Complex vector spaces
An inner product on the complex vector space Cn is defined slightly
differently.
Definition (Hermitian dot product on Cn)
The Hermitian dot product (or complex dot product) of u, v ∈ Cn is
defined by
u · v = u1v̄1 + u2v̄2 + · · · + unv̄n,
where a bar indicates the complex conjugate.
Taking the complex conjugate is necessary to have real (and positive)
lengths:
v · v = v1v̄1 + v2v̄2 + · · · + vnv̄n = |v1|² + |v2|² + · · · + |vn|² ≥ 0.
So we can define the length of v by √(v · v).
372
Example 5.
Let u = (1 + i , 1− i) and v = (i , 1). Compute the Hermitian dot
products u · v, v · u and u · u.
Solution:
Taking conjugates, v̄ = (−i, 1) and ū = (1 − i, 1 + i).
Then
u · v = (1 + i)(−i) + (1− i)(1)
= −i + 1 + 1− i
= 2− 2i
v · u = (i)(1− i) + (1)(1 + i)
= i + 1 + 1 + i
= 2 + 2i
u · u = (1 + i)(1− i) + (1− i)(1 + i)
= 2 + 2 = 4
373
Definition (Inner product over C)
A Hermitian inner product on a complex vector space V is a function
that associates to every pair of vectors u, v ∈ V a complex number,
denoted by 〈u, v〉, satisfying the following properties.
For all u, v,w ∈ V and α ∈ C:
1. 〈u, v〉 = 〈v,u〉̄ (Hermitian symmetry; the bar denotes the complex conjugate)
2. 〈αu, v〉 = α〈u, v〉 (linearity of scalar multiplication)
3. 〈u+ v,w〉 = 〈u,w〉+ 〈v,w〉 (linearity of vector addition)
4. a. 〈u, u〉 ≥ 0 (positivity)
b. 〈u, u〉 = 0 ⇒ u = 0
A complex vector space V together with an Hermitian inner product is
called a complex inner product space.
374
Example
The Hermitian dot product on Cn satisfies these properties.
Note:
Properties 2,3 give linearity in the first position:
〈αu+ βv,w〉 = α〈u,w〉 + β〈v,w〉 for α, β ∈ C
Then property 1 gives conjugate linearity in the second position:
〈u, αv + βw〉 = ᾱ〈u, v〉 + β̄〈u,w〉 for α, β ∈ C.
375
Example
The inner product on Pn(R) defined in example 4 can be generalised to
Pn(C) by replacing the second polynomial in the definition by its
complex conjugate:
〈p,q〉 = ∫₀¹ p(x) q̄(x) dx.
Note:
Inner products defined by integration are important in many
applications, for example in quantum mechanics, electrical engineering,
and in the study of differential equations.
376
Example 6.
Let u = (u1, u2) ∈ C2 and v = (v1, v2) ∈ C2. Show that
〈u, v〉 = iu1v1 − iu2v2
does not define an inner product on C2.
Solution:
Give a specific counterexample
Let u = (1, 0). Then
〈u,u〉 = iu1u1 − iu2u2
= i(1)(1) − i(0)(0)
= i
which is not real and positive.
Since axiom 4a does not hold, 〈u, v〉 is not an inner product over C2.
377
7.2 Geometry from inner products
In a general vector space, how can we find “lengths” of vectors and
“angles” between vectors?
Definition (Length, distance)
For a real or complex inner product space, the length (or norm) of a
vector v is defined by
‖v‖ = √〈v, v〉.
The distance between u and v is given by
d(u, v) = ‖v − u‖.
378
Definition (Angle)
For a real inner product space, the angle θ between two non-zero
vectors u and v is defined by
cos θ = 〈u, v〉 / (‖u‖‖v‖),   0 ≤ θ ≤ π.
For a complex inner product space, the angle θ between two non-zero
vectors u and v is defined by
cos θ = Re〈u, v〉 / (‖u‖‖v‖),   0 ≤ θ ≤ π,
where Re〈u, v〉 denotes the real part of the inner product.
379
Note:
In order for these definitions of angle to make sense, we need
−1 ≤ 〈v,u〉/(‖v‖‖u‖) ≤ 1   and   −1 ≤ Re〈v,u〉/(‖v‖‖u‖) ≤ 1.
These results follow immediately from the Cauchy-Schwarz inequality.
Theorem (Cauchy-Schwarz inequality)
Let V be an inner product space. Then for all u, v ∈ V
|〈u, v〉| ≤ ‖u‖‖v‖.
Equality holds if and only if one vector is a multiple of the other.
380
Example 7.
Let u = (3, 1) and v = (−2, 3). Using the inner product on R2:
〈(u1, u2), (v1, v2)〉 = u1v1 + 2u2v2
find the
(a) length of u
(b) distance between u and v
(c) angle between u and v
Solution:
(a) The length of u is
‖u‖ = ‖(3, 1)‖ = √〈(3, 1), (3, 1)〉 = √(9 + 2) = √11
381
(b) The distance between u and v is
‖v − u‖ = ‖(−5, 2)‖ = √〈(−5, 2), (−5, 2)〉 = √(25 + 8) = √33

(c) Now
〈u, v〉 = 〈(3, 1), (−2, 3)〉 = −6 + 6 = 0
Hence
cos θ = 0 ⇒ θ = π/2
Note:
u and v are orthogonal with respect to this inner product.
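These quantities are easy to compute in software once the inner product is written as 〈u, v〉 = uᵀWv with W = diag(1, 2). A sketch:

import numpy as np

W = np.diag([1.0, 2.0])            # <u, v> = u1 v1 + 2 u2 v2

def ip(u, v):
    return u @ W @ v

u = np.array([3.0, 1.0])
v = np.array([-2.0, 3.0])

print(np.sqrt(ip(u, u)))           # sqrt(11), the length of u
print(np.sqrt(ip(v - u, v - u)))   # sqrt(33), the distance
cos_theta = ip(u, v) / np.sqrt(ip(u, u) * ip(v, v))
print(np.arccos(cos_theta))        # pi/2, the angle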
382
7.3 Orthogonal sets and the Gram-Schmidt procedure
Definition (Orthogonal vectors)
Let V be an inner product space. Two vectors u ∈ V and v ∈ V are
orthogonal if
〈v,u〉 = 0.
Definition (Orthogonal set)
A set of vectors {v1, . . . , vk} in an inner product space V is orthogonal
if
〈vi , vj〉 = 0, i ≠ j.
That is, the inner product of each distinct pair of vectors is zero.
383
Theorem
Every orthogonal set of non-zero vectors {v1, . . . , vk} in an inner
product space V is linearly independent.
Proof:
Since {v1, . . . , vk} is an orthogonal set, then
〈vi , vj〉 = 0, i ≠ j
Also
〈0, vi 〉 = 0, i = 1, · · · , k
Suppose that
α1v1 + · · · + αkvk = 0 (*)
for some scalars αi .
384
Then, for each i = 1, · · · , k , we have
〈α1v1 + · · ·+ αkvk , vi 〉 = 0
⇒ 〈α1v1, vi 〉+ · · · + 〈αivi , vi 〉+ · · ·+ 〈αkvk , vi 〉 = 0
⇒ α1〈v1, vi 〉+ · · · + αi〈vi , vi 〉+ · · ·+ αk〈vk , vi 〉 = 0
⇒ αi 〈vi , vi 〉 = 0
Since 〈vi , vi〉 ≠ 0, then αi = 0.
Hence the only solution of (∗) is
αi = 0, for all i = 1, · · · , k
so {v1, . . . , vk} are linearly independent.
385
Example 8.
Show that, with respect to the dot product,
{v1, v2, v3} = {(1, 1, 1), (1,−1, 0), (1, 1,−2)}
is an orthogonal set of vectors in R3.
Solution:
〈(1, 1, 1), (1,−1, 0)〉 = (1, 1, 1) · (1,−1, 0) = 1− 1 + 0 = 0
〈(1, 1, 1), (1, 1,−2)〉 = (1, 1, 1) · (1, 1,−2) = 1 + 1− 2 = 0
〈(1,−1, 0), (1, 1,−2)〉 = (1,−1, 0) · (1, 1,−2) = 1 − 1 + 0 = 0
Hence the set {v1, v2, v3} is orthogonal with respect to the dot product.
386
Definition (Orthonormal set)
A set of vectors {v1, . . . , vk} in an inner product space V is
orthonormal if it is orthogonal and each vector has length one.
That is
{v1, . . . , vk} is orthonormal ⇔ 〈vi , vj〉 = 0 for i ≠ j, and 〈vi , vi〉 = 1.
Note:
Any orthogonal set of non-zero vectors can be made orthonormal by
dividing each vector by its length.
387
Example 9.
Using the orthogonal set
{v1, v2, v3} = {(1, 1, 1), (1,−1, 0), (1, 1,−2)},
construct a set of vectors in R3 that is orthonormal with respect to the dot product.
Solution:
The length of each vector is
||v1|| = ||(1, 1, 1)|| = √(1 + 1 + 1) = √3
||v2|| = ||(1,−1, 0)|| = √(1 + 1 + 0) = √2
||v3|| = ||(1, 1,−2)|| = √(1 + 1 + 4) = √6
Hence an orthonormal set is
{ (1/√3)(1, 1, 1), (1/√2)(1,−1, 0), (1/√6)(1, 1,−2) }
388
Orthonormal bases
Theorem
Let V be an inner product space. If {v1, . . . , vn} is an orthonormal
basis for V and x ∈ V , then
x = 〈x, v1〉v1 + · · ·+ 〈x, vn〉vn.
Proof:
Since {v1, . . . , vn} is an orthonormal set,
〈vi , vj〉 = 0 for i ≠ j, and 〈vi , vi〉 = 1.
Since {v1, . . . , vn} is a basis for V , then there is a unique set of scalars
αi (i = 1, · · · , n) such that
x = α1v1 + · · ·+ αnvn.
389
Then, for each i = 1, · · · , n, we have
〈x, vi 〉 = 〈α1v1 + · · ·+ αnvn, vi 〉
= 〈α1v1, vi 〉+ · · · + 〈αivi , vi 〉+ · · · + 〈αnvn, vi 〉
= α1〈v1, vi 〉+ · · · + αi〈vi , vi 〉+ · · · + αn〈vn, vi 〉
= αi
by properties 2 and 3 of inner products.
Hence
x = 〈x, v1〉v1 + · · ·+ 〈x, vn〉vn.
390
Example 10.
Using the dot product, R3 has an orthonormal basis
{b1, b2, b3} = { (1/√3)(1, 1, 1), (1/√2)(1,−1, 0), (1/√6)(1, 1,−2) }.
Express x = (1, 1,−1) as a linear combination of b1,b2,b3.
Solution:
Since the set is orthonormal
x = 〈x,b1〉b1 + 〈x,b2〉b2 + 〈x,b3〉b3.
Now
〈x, b1〉 = (1, 1,−1) · (1/√3)(1, 1, 1) = (1/√3)(1 + 1 − 1) = 1/√3
〈x, b2〉 = (1, 1,−1) · (1/√2)(1,−1, 0) = (1/√2)(1 − 1) = 0
391
〈x, b3〉 = (1, 1,−1) · (1/√6)(1, 1,−2) = (1/√6)(1 + 1 + 2) = 4/√6
Hence
x = (1/√3)b1 + (4/√6)b3
392
How to produce an orthonormal basis from a general basis
Gram-Schmidt procedure
Suppose {v1, . . . , vk} is a basis for V. Define:
1.  u1 = v1/‖v1‖
2a. w2 = v2 − 〈v2,u1〉u1
2b. u2 = w2/‖w2‖
3a. w3 = v3 − 〈v3,u1〉u1 − 〈v3,u2〉u2
3b. u3 = w3/‖w3‖
...
ka. wk = vk − 〈vk,u1〉u1 − · · · − 〈vk,uk−1〉uk−1
kb. uk = wk/‖wk‖
Then {u1, . . . ,uk} is an orthonormal basis for V.
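The procedure translates directly into code. A sketch for the dot product (replace np.dot by another inner product function to generalise); it is checked against Example 11 below:

import numpy as np

def gram_schmidt(vectors):
    # Orthonormalise a list of linearly independent vectors.
    us = []
    for v in vectors:
        w = v.astype(float)
        for u in us:
            w = w - np.dot(v, u) * u       # subtract the projection onto u
        us.append(w / np.linalg.norm(w))   # normalise to length 1
    return us

vs = [np.array([1.0, 1.0, 1.0, 1.0]),
      np.array([0.0, 1.0, 1.0, 1.0]),
      np.array([0.0, 0.0, 1.0, 1.0])]
for u in gram_schmidt(vs):
    print(np.round(u, 4))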
393
Example 11.
Let v1 = (1, 1, 1, 1), v2 = (0, 1, 1, 1), v3 = (0, 0, 1, 1) be a basis for a
subspace V of R4. Use the Gram-Schmidt procedure to find an
orthonormal basis with respect to the dot product for V .
Solution:
Step 1: Find u1 = v1/‖v1‖

‖v1‖ = √(1² + 1² + 1² + 1²) = 2   ⇒   u1 = (1/2)(1, 1, 1, 1)
394
Step 2a: Find w2 = v2 − (v2 · u1)u1

v2 · u1 = (1/2)(1 + 1 + 1) = 3/2
⇒ w2 = (0, 1, 1, 1) − (3/4)(1, 1, 1, 1) = (1/4)(−3, 1, 1, 1)

Step 2b: Find u2 = w2/‖w2‖

‖w2‖ = (1/4)((−3)² + 1² + 1² + 1²)^(1/2) = √12/4 = √3/2
⇒ u2 = (1/(2√3))(−3, 1, 1, 1).
395
Step 3a: Find w3 = v3 − (v3 · u1)u1 − (v3 · u2)u2

v3 · u1 = (1/2)(1 + 1) = 1,   v3 · u2 = (1/(2√3))(1 + 1) = 1/√3
⇒ w3 = (0, 0, 1, 1) − (1/2)(1, 1, 1, 1) − (1/6)(−3, 1, 1, 1)
      = (1/6)(0,−4, 2, 2) = (1/3)(0,−2, 1, 1)

Step 3b: Find u3 = w3/‖w3‖

‖w3‖ = (1/3)((−2)² + 1² + 1²)^(1/2) = √6/3
⇒ u3 = (1/√6)(0,−2, 1, 1)
396
The orthogonal projection of v ∈ Rn onto u ∈ Rn can be generalised to
a projection onto a subspace W of V .
Definition (Orthogonal projection onto a vector)
Let V be an inner product space. Let u ∈ V be a unit vector.
The orthogonal projection of v onto u is
proju(v) = 〈v,u〉u.
Definition (Orthogonal projection onto a subspace)
Let {u1, . . . ,uk} be an orthonormal basis for the subspace W ⊆ V and
v ∈ V .
The orthogonal projection of v onto W is
projW (v) = 〈v,u1〉u1 + · · · + 〈v,uk 〉uk .
397
Properties of the projection of v onto W
1. projW (v) ∈W
2. If w ∈W , then
projW (w) = w
3. v− projW (v) is orthogonal to W , i.e. orthogonal to every vector in
W .
4. projW (v) is the vector in W that is closest to v. Thus, for all
w ∈W ,
‖v − w‖ ≥ ‖v − projW(v)‖.
5. projW (v) does not depend on the choice of orthonormal basis for
W .
398
Example 12.
Let W = {(x , y , z) | x + y + z = 0}. The set
{u1, u2} = { (1/√2)(1,−1, 0), (1/√6)(1, 1,−2) }
is an orthonormal basis for W with respect to the dot product.
(a) Find the projection of the vector v = (1, 2, 3) onto W
(b) Find the (shortest) distance from v to W .
Solution:
(a)
〈v,u1〉 = (1, 2, 3) · (1/√2)(1,−1, 0) = (1/√2)(1 − 2 + 0) = −1/√2
〈v,u2〉 = (1, 2, 3) · (1/√6)(1, 1,−2) = (1/√6)(1 + 2 − 6) = −3/√6
399
Then
projW(v) = 〈v,u1〉u1 + 〈v,u2〉u2
         = (−1/√2)(1/√2)(1,−1, 0) + (−3/√6)(1/√6)(1, 1,−2)
         = (−1/2)(1,−1, 0) − (1/2)(1, 1,−2)
         = (−1, 0, 1)
(b) The distance from v to W is
‖v − projW(v)‖ = ‖(2, 2, 2)‖ = √12 = 2√3.
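A sketch of the same computation in NumPy:

import numpy as np

u1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
u2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6)
v = np.array([1.0, 2.0, 3.0])

# proj_W(v) = <v, u1> u1 + <v, u2> u2
proj = (v @ u1) * u1 + (v @ u2) * u2
print(proj)                         # [-1.  0.  1.]
print(np.linalg.norm(v - proj))     # 2 sqrt(3), approx 3.4641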
400
7.4 Least squares curve fitting
Given a set of data points (x1, y1), (x2, y2),. . . , (xn, yn) we want to find
the line y = a + bx that best approximates the data.
A common approach is to minimise the least squares error E .
E² = sum of the squares of vertical errors
   = δ1² + · · · + δn²
   = (y1 − (a + bx1))² + · · · + (yn − (a + bxn))²

[Figure: the data points (xi, yi) and the line y = a + bx, with δi the vertical error at each point.]
401
This can be written as
E² = ‖y − Au‖²
where y = (y1, . . . , yn)ᵀ, u = (a, b)ᵀ,

A = [ 1  x1 ]                [ a + bx1 ]
    [ 1  x2 ]                [ a + bx2 ]
    [ ⋮   ⋮ ]   and   Au =   [    ⋮    ]
    [ 1  xn ]                [ a + bxn ] .
To minimise ‖y − Au‖2, we want Au to be as close as possible to y.
To find the closest point, we project y onto W = {Av | v ∈ R2}, which
is the column space of A.
So we want u ∈ R2 such that Au = projW y.
We can calculate u directly (without finding an orthonormal basis for
W ) by noting that y − projW y is orthogonal to W .
402
Thus
〈w, y − projW y〉 = 0 for all w ∈ W
⇒ 〈Av, y − Au〉 = 0 for all v ∈ R2
⇒ (Av)ᵀ(y − Au) = 0 for all v ∈ R2
⇒ vᵀAᵀ(y − Au) = 0 for all v ∈ R2
⇒ Aᵀ(y − Au) = 0
⇒ Aᵀy − AᵀAu = 0
⇒ AᵀAu = Aᵀy
From this we can calculate u, given that we know A and y.
403
Theorem (Least squares line of best fit)
The least squares solution for the line y = a+ bx of best fit for the
points (x1, y1), . . . , (xn, yn) is specified by
AᵀA u = Aᵀy,
or, if AᵀA is invertible,
u = (AᵀA)⁻¹Aᵀy,
where y = (y1, . . . , yn)ᵀ, u = (a, b)ᵀ, and

A = [ 1  x1 ]
    [ 1  x2 ]
    [ ⋮   ⋮ ]
    [ 1  xn ] .
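A sketch solving the normal equations for the data of the next example, cross-checked against NumPy's built-in least squares routine:

import numpy as np

xs = np.array([-1.0, 1.0, 2.0])
ys = np.array([ 1.0, 1.0, 3.0])

A = np.column_stack([np.ones_like(xs), xs])   # rows are (1, x_i)
u = np.linalg.solve(A.T @ A, A.T @ ys)        # solves A^T A u = A^T y
print(u)                                      # [9/7, 4/7]

u2 = np.linalg.lstsq(A, ys, rcond=None)[0]    # minimises ||y - Au|| directly
print(np.allclose(u, u2))                     # True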
404
Example 13.
Find the line of best fit for the data points (−1, 1), (1, 1), (2, 3) using
the method of least squares.
Solution:
Let

A = [ 1  −1 ]               [ 1 ]
    [ 1   1 ]   and   y =   [ 1 ]
    [ 1   2 ]               [ 3 ]

Then

AᵀA = [  1  1  1 ] [ 1  −1 ]   [ 3  2 ]
      [ −1  1  2 ] [ 1   1 ] = [ 2  6 ]
                   [ 1   2 ]

and

Aᵀy = [  1  1  1 ] [ 1 ]   [ 5 ]
      [ −1  1  2 ] [ 1 ] = [ 6 ]
                   [ 3 ]
405
We must solve
AᵀA u = Aᵀy,
or
[ 3  2 ] [ a ]   [ 5 ]
[ 2  6 ] [ b ] = [ 6 ]

⇒ [ a ]   [ 3  2 ]⁻¹ [ 5 ]
  [ b ] = [ 2  6 ]   [ 6 ]

        = 1/14 [  6  −2 ] [ 5 ]
               [ −2   3 ] [ 6 ]

        = 1/14 [ 18 ]   =   1/7 [ 9 ]
               [  8 ]           [ 4 ]
406
The line of best fit is
y = 9/7 + (4/7)x
407
7.5 Orthogonal matrices and symmetric matrices
Definition (Orthogonal matrix)
An orthogonal matrix Q is a real n × n matrix such that Q⁻¹ = Qᵀ.
Note: It is enough to show that QᵀQ = In or QQᵀ = In.
Example 14.
Show that the following rotation matrix is orthogonal:
Q = [ cos θ   − sin θ ]
    [ sin θ    cos θ  ]
Solution:
QᵀQ = [  cos θ   sin θ ] [ cos θ   − sin θ ]
      [ − sin θ  cos θ ] [ sin θ    cos θ  ]

    = [ cos²θ + sin²θ                  − cos θ sin θ + sin θ cos θ ]
      [ − sin θ cos θ + cos θ sin θ      sin²θ + cos²θ             ]

    = [ 1  0 ]
      [ 0  1 ]
408
Geometric properties of orthogonal matrices
Theorem
If Q is an orthogonal n × n matrix then
1. Qu ·Qv = u · v for all u, v ∈ Rn,
2. det(Q) = ±1.
Note:
◮ Property 1 says that orthogonal matrices preserve dot products.
Hence they also preserve Euclidean lengths and angles.
◮ Geometrically, orthogonal matrices represent rotations when
det(Q) = 1, and reflections composed with rotations when
det(Q) = −1.
409
Example 15.
Prove property 1.
Solution:
Recall that the dot product can be written in matrix form:
x · y = xᵀy
for column vectors x, y ∈ Rn.
Hence, if Q is orthogonal and u, v ∈ Rn:
Qu · Qv = (Qu)ᵀ(Qv)
        = uᵀQᵀQv
        = uᵀ In v
        = uᵀv
        = u · v.
410
Theorem
A real n × n matrix is orthogonal if and only if its columns form an
orthonormal basis for Rn using the dot product.
Proof:
Let Q = [ v1 . . . vn ] where v1, . . . , vn ∈ Rn are column vectors. Then

QᵀQ = [ v1ᵀ ]                     [ v1ᵀv1  . . .  v1ᵀvn ]
      [  ⋮  ] [ v1 . . . vn ]  =  [   ⋮             ⋮   ]
      [ vnᵀ ]                     [ vnᵀv1  . . .  vnᵀvn ]

Thus Q is orthogonal iff QᵀQ = In iff
vi · vj = viᵀvj = 1 if i = j, and 0 if i ≠ j,
iff v1, . . . , vn are orthonormal.
411
Example 16.
Show that the following matrix is orthogonal:
A = [ 1/√3    1/√2    1/√6 ]
    [ 1/√3   −1/√2    1/√6 ]
    [ 1/√3     0     −2/√6 ]
Solution:
From example 10, the columns{
1√
3
(1, 1, 1),
1√
2
(1,−1, 0), 1√
6
(1, 1,−2)
}
form an orthonormal basis of R3 with respect to the dot product.
Hence A is orthogonal.
412
Real symmetric matrices
Theorem (Orthogonal Diagonalisation)
Let A be an n× n real symmetric matrix. Then using the dot product
as the inner product on Rn:
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1, · · · , vn},
4. A is diagonalisable with
A = QDQ⁻¹ = QDQᵀ
where D is the diagonal matrix
diag(λ1, . . . , λn)
and Q is the orthogonal matrix [ v1 · · · vn ].
413
Example 17.
Write the symmetric matrix
A = [  1  −1 ]
    [ −1   1 ]
in the form A = QDQᵀ where D is a diagonal matrix.
Solution:
Step 1. Find the eigenvalues of A
0 = det [ 1−λ   −1  ]
        [ −1    1−λ ]
  = (1−λ)² − 1
  = λ² − 2λ
  = λ(λ − 2)
The eigenvalues are λ = 0 and λ = 2.
414
Step 2. Find a unit eigenvector corresponding to each eigenvalue
For λ = 0:

[  1  −1 | 0 ]   R2 → R2 + R1   ∼   [ 1  −1 | 0 ]
[ −1   1 | 0 ]                      [ 0   0 | 0 ]

Let v2 = t for t ∈ R. Then v1 = v2 = t.
The eigenspace is { t(1, 1) : t ∈ R }.
An eigenvector is v1 = (1, 1).
Now ||v1||² = (1, 1) · (1, 1) = 2.
So an eigenvector with length 1 corresponding to λ = 0 is
v1 = (1/√2)(1, 1)
415
For λ = 2:

[ −1  −1 | 0 ]   R2 → R2 − R1   ∼   [ −1  −1 | 0 ]
[ −1  −1 | 0 ]                      [  0   0 | 0 ]

Let v2 = t for t ∈ R. Then v1 = −v2 = −t.
The eigenspace is { t(−1, 1) : t ∈ R }.
An eigenvector is v2 = (−1, 1).
Now ||v2||² = (−1, 1) · (−1, 1) = 2.
So an eigenvector with length 1 corresponding to λ = 2 is
v2 = (1/√2)(−1, 1)
416
Step 3. Form D and Q.
Hence A = QDQᵀ with

D = [ 0  0 ]                        [ 1  −1 ]
    [ 0  2 ]   and   Q = (1/√2)     [ 1   1 ] .
Note:
In the case of repeated eigenvalues, you may need to use the
Gram-Schmidt procedure in each eigenspace to find an orthonormal
basis of eigenvectors.
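Numerically, this orthogonal diagonalisation is exactly what np.linalg.eigh computes for a symmetric matrix. A sketch with the matrix of Example 17:

import numpy as np

A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

# eigh assumes a symmetric (or Hermitian) input; it returns real
# eigenvalues in ascending order and orthonormal eigenvector columns.
lam, Q = np.linalg.eigh(A)
print(lam)                                     # [0. 2.]
print(np.allclose(Q @ np.diag(lam) @ Q.T, A))  # True: A = Q D Q^T
print(np.allclose(Q.T @ Q, np.eye(2)))         # True: Q is orthogonal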
417
Singular value decomposition
For a general real m × n matrix A, the diagonalisation formula for
symmetric matrices can be extended to find orthogonal matrices Q1 and
Q2 such that
A = Q1DQ2ᵀ
where D is a matrix with Dii ≥ 0 and Dij = 0 whenever i ≠ j.
These non-negative values σi = Dii are called the singular values of A,
and the above equation is called the singular value decomposition of A.
The number of non-zero singular values σi is equal to the rank of A.
418
Theorem
Every real m × n matrix A has a singular value decomposition
A = Q1DQ2ᵀ
using the dot product as the inner product, where
◮ Q1 is an m × m real orthogonal matrix,
◮ Q2 is an n × n real orthogonal matrix,
◮ D is an m × n matrix with Dii = σi ≥ 0 and Dij = 0, i ≠ j.
Further,
1. the σi² are the eigenvalues of AAᵀ and AᵀA,
2. the columns of Q1 are eigenvectors ui of AAᵀ,
3. the columns of Q2 are eigenvectors vi of AᵀA,
4. Avi = σiui.
419
Note:
◮ AAᵀ is an m × m matrix and AᵀA is an n × n matrix.
◮ AAᵀ and AᵀA are real symmetric matrices.
◮ There are orthonormal bases of Rm and Rn consisting of eigenvectors of AAᵀ and AᵀA respectively; these form the columns of Q1 and Q2.
◮ The eigenvalues of AAᵀ and AᵀA are both real and non-negative. They both have r positive eigenvalues, where r = rank(A).
Note:
Geometrically, this theorem says that every linear transformation from Rn to Rm factors into a rotation, followed by stretching, followed by another rotation.
The singular value decomposition is important in many applications,
including digital image compression and stable numerical computations.
420
Example 18.
Let

A = [ 0  −1 ]               D = [ σ1   0 ]
    [ 0   0 ]   and             [  0  σ2 ] .

Find orthogonal matrices Q1 and Q2 and scalars σ1, σ2 ≥ 0 such that
A = Q1DQ2ᵀ
Solution:
Step 1. Find a matrix Q2
AᵀA = [  0  0 ] [ 0  −1 ]   [ 0  0 ]
      [ −1  0 ] [ 0   0 ] = [ 0  1 ]

which has eigenvalues 0 and 1.

For λ = 0, solve
[ 0  0 | 0 ]
[ 0  1 | 0 ]
Let v1 = t for t ∈ R and v2 = 0.
An eigenvector with length 1 is v1 = (1, 0).
421
For λ = 1, solve
[ −1  0 | 0 ]
[  0  0 | 0 ]
Let v2 = t for t ∈ R and v1 = 0.
An eigenvector with length 1 is v2 = (0, 1).
Then
Q2 = [ 1  0 ]
     [ 0  1 ]
The singular values are σ1 = 0, σ2 = 1.
Step 2. Find a matrix Q1 compatible with Q2 using property 4
Now Av2 = σ2u2 and

Av2 = [ 0  −1 ] [ 0 ]   [ −1 ]         [ −1 ]
      [ 0   0 ] [ 1 ] = [  0 ] = (1)   [  0 ]   ⇒   u2 = (−1, 0)
422
Since σ1 = 0, the equation Av1 = σ1u1 does not determine u1.
However, u1 must be orthogonal to u2, so u1 · u2 = 0.
Hence we can choose the unit vector u1 = (0, 1).

Hence A = Q1DQ2ᵀ with

D = [ 0  0 ]        Q1 = [ 0  −1 ]        Q2 = [ 1  0 ]
    [ 0  1 ]             [ 1   0 ]             [ 0  1 ]
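NumPy's np.linalg.svd computes this factorisation; note that it returns the singular values in descending order, so for this A it gives σ = (1, 0), the reverse of the ordering chosen above. A sketch:

import numpy as np

A = np.array([[0.0, -1.0],
              [0.0,  0.0]])

U, s, Vt = np.linalg.svd(A)      # A = U diag(s) Vt, with Vt = Q2 transposed
print(s)                         # [1. 0.]
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True
print(np.linalg.matrix_rank(A))              # 1 = number of non-zero sigma_i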
423
7.6 Unitary matrices and Hermitian matrices
We now consider the complex analogues of orthogonal and real
symmetric matrices.
Definition (Conjugate transpose of a matrix)
Let A be a complex m × n matrix. We define its conjugate transpose (or adjoint or Hermitian transpose) A∗ = Āᵀ to be the n × m matrix whose entries are
(A∗)ij = (Āᵀ)ij = Āji (the complex conjugate of the ji entry of A).
The conjugate transpose of A is sometimes denoted by A†.
Note:
If α ∈ C is a scalar, then the following properties hold whenever the
matrix products and sums exist:
(A∗)∗ = A,   (A + B)∗ = A∗ + B∗,
(AB)∗ = B∗A∗,   (αA)∗ = ᾱA∗.
424
Definition (Unitary matrix)
A unitary matrix is a complex n × n matrix U such that
U⁻¹ = U∗ = Ūᵀ.
Note:
◮ To check if a matrix is unitary it suffices to check that U∗U = In or UU∗ = In.
◮ Any real orthogonal matrix is unitary if viewed as a complex matrix.
425
Example 19.
Show that
U = (1/√2) [ −i  i ]
           [  1  1 ]
is a unitary matrix.
Solution:
Now

U∗ = Ūᵀ = ( (1/√2) [ i  −i ] )ᵀ = (1/√2) [  i  1 ]
                   [ 1   1 ]             [ −i  1 ]

so

U∗U = (1/√2) [  i  1 ]  (1/√2) [ −i  i ]
             [ −i  1 ]         [  1  1 ]

    = 1/2 [ 1 + 1    −1 + 1 ]
          [ −1 + 1    1 + 1 ]

    = [ 1  0 ]
      [ 0  1 ] .
Hence U is a unitary matrix.
426
The Hermitian dot product on Cn can be written in matrix form:
〈u, v〉 = u1v̄1 + u2v̄2 + · · · + unv̄n = v̄ᵀu = v∗u,
where u, v ∈ Cn are regarded as column vectors.
This leads to the following analogues of our previous results.
Theorem
If U is a unitary n × n matrix then
1. 〈Uu,Uv〉 = 〈u, v〉 for all u, v ∈ Cn,
2. | det(U)| = 1,
where 〈u, v〉 is the Hermitian dot product on Cn.
427
Theorem
A complex n × n matrix is unitary if and only if its columns form an orthonormal basis for Cn using the Hermitian dot product.
Definition (Hermitian matrix)
A complex n × n matrix A is Hermitian or self-adjoint if
A∗ = A.
Note:
Any real symmetric matrix is Hermitian if viewed as a complex matrix.
428
Example 20.
Are the following matrices Hermitian?
(a) A = [  1  i ]          (b) B = [ i  0 ]
        [ −i  1 ]                  [ 0  i ]
Solution:
(a) Since
A∗ = Āᵀ = [ 1  −i ]ᵀ = [  1  i ] = A
          [ i   1 ]    [ −i  1 ]
then A is Hermitian.
(b) Since
B∗ = B̄ᵀ = [ −i   0 ]
          [  0  −i ]  ≠ B
then B is not Hermitian.
429
Theorem
Let A be an n× n Hermitian matrix. Then using the Hermitian dot
product as inner product on Cn:
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1, · · · , vn},
4. A is diagonalisable with
A = UDU⁻¹ = UDU∗
where D is the diagonal matrix
diag(λ1, λ2, . . . , λn)
and U is the unitary matrix [ v1 · · · vn ].
.
430
Example 21.
Write the Hermitian matrix
A = [  1  i ]
    [ −i  1 ]
in the form
A = UDU⁻¹
where D is a diagonal matrix and U is a unitary matrix.
Solution:
Step 1. Find the eigenvalues of A
0 = det [ 1−λ    i  ]
        [ −i    1−λ ]
  = (1−λ)² − 1
  = λ² − 2λ
  = λ(λ − 2)
The eigenvalues are λ = 0 and λ = 2.
431
Step 2. Find a unit eigenvector corresponding to each eigenvalue
For λ = 0:

[  1  i | 0 ]   R2 → R2 + iR1   ∼   [ 1  i | 0 ]
[ −i  1 | 0 ]                       [ 0  0 | 0 ]

Let v2 = t for t ∈ C. Then v1 = −it.
The eigenspace is { t(−i, 1) : t ∈ C }.
An eigenvector is v1 = (−i, 1).
Now ||v1||² = 〈(−i, 1), (−i, 1)〉 = 2.
So an eigenvector with length 1 corresponding to λ = 0 is
v1 = (1/√2)(−i, 1)
432
For λ = 2:

[ −1   i | 0 ]   R2 → R2 − iR1   ∼   [ −1  i | 0 ]
[ −i  −1 | 0 ]                       [  0  0 | 0 ]

Let v2 = t for t ∈ C. Then v1 = it.
The eigenspace is { t(i, 1) : t ∈ C }.
An eigenvector is v2 = (i, 1).
Now ||v2||² = 〈(i, 1), (i, 1)〉 = 2.
So an eigenvector with length 1 corresponding to λ = 2 is
v2 = (1/√2)(i, 1)
433
Step 3. Form D and U.

Hence
A = UDU⁻¹
with

D = [ 0  0 ]                        [ −i  i ]
    [ 0  2 ]   and   U = (1/√2)     [  1  1 ] .
Note:
U is the unitary matrix from example 19.
434