MAST10007 Linear Algebra
THE UNIVERSITY OF MELBOURNE
SCHOOL OF MATHEMATICS AND STATISTICS
Semester 2, 2023
Lecture Notes
These notes have been made in accordance with the provisions of Part VB of the Copyright Act for teaching purposes of the
University. These notes are for the use of students of the University of Melbourne enrolled in MAST10007 Linear Algebra.
1
Topic 1: Linear equations [AR 1.1 and 1.2]
1.1. Systems of linear equations. Row operations.
1.2. Reduction of systems to row-echelon form.
1.3. Reduction of systems to reduced row-echelon form.
1.4. Consistent and inconsistent systems.
2
1.1 Systems of linear equations. Row operations
Linear equations
Definition (Linear equation and linear system)
A linear equation in n variables, x1, x2, . . . , xn, is an equation of the form
a1x1 + a2x2 + · · ·+ anxn = b,
where the coefficients a1, . . . , an and the constant term b are constants.
A finite collection of linear equations in the variables x1, x2, . . . , xn is
called a system of linear equations or a linear system.
Example
x1 + 5x2 + 6x3 = 100
x2 − x3 = −1
−x1 + x3 = 11
3
Definition (Homogeneous linear system)
A system of linear equations is called homogeneous if all the constants
on the right hand side are zero.
Example
x1 + 5x2 + 6x3 = 0
x2 − x3 = 0
−x1 + x3 = 0
Definition (Solution of a system of linear equations)
A solution to a system of linear equations in the variables x1, . . . , xn is a
set of values of these variables which satisfy every equation in the
system.
4
Example 1. Data fitting using a polynomial
Find the coefficients c , α, β, γ such that the cubic polynomial
y = c + αx + βx2 + γx3 passes through the points specified below.
x y
−0.1 0.90483
0 1
0.1 1.10517
0.2 1.2214
Solution:
Substituting the points (x , y) into the cubic polynomial gives:
5
Note:
Solving such equations by hand can be tedious, so we can use a
computer package such as MATLAB.
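For instance, here is a minimal sketch of our own (not part of the official materials) showing how the system from Example 1 could be solved with NumPy rather than by hand; the variable names are ours.

    import numpy as np

    # Data points from Example 1
    xs = np.array([-0.1, 0.0, 0.1, 0.2])
    ys = np.array([0.90483, 1.0, 1.10517, 1.2214])

    # Row i of the coefficient matrix is (1, x_i, x_i^2, x_i^3)
    A = np.vander(xs, 4, increasing=True)
    c, alpha, beta, gamma = np.linalg.solve(A, ys)
    print(c, alpha, beta, gamma)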
6
Example 2. (Revision)
Solve the following linear system.
2x − y = 3
x + y = 0
Solution:
Graphically
Note:
◮ Need an accurate drawing.
◮ Not practical for three or more variables.
7
Elimination
Note:
Elimination of variables will always give a solution, but for three or
more variables we need to do this systematically, not in an ad hoc
manner.
8
Definition (Matrix)
A matrix is a rectangular array of numbers. An m × n matrix has m rows
and n columns.
Definition (Coefficient matrix and augmented matrix)
The coefficient matrix for a linear system is the matrix formed from the
coefficients in the equations.
The augmented matrix for a linear system is the matrix formed from the
coefficients in the equations and the constant terms. These are usually
separated by a vertical line.
9
Example 3.
Write down a coefficient matrix and augmented matrix for the following
linear system:
2x − y = 3
x + y = 0
Solution:
10
Row Operations
To find the solutions of a linear system, we perform row operations to
simplify the augmented matrix. An essential condition is that the row
operations we perform do not change the set of solutions to the system.
Definition (Elementary row operations)
The elementary row operations are:
1. Interchanging two rows.
2. Multiplying a row by a non-zero constant.
3. Adding a multiple of one row to another row.
Note:
The matrices produced after each row operation are not equal but are
equivalent, meaning that the solution set is the same for the system
represented by each augmented matrix. We use the symbol ∼ to denote
equivalence of matrices.
11
Example 4.
Solve the following system using elementary row operations:
2x − y = 3
x + y = 0
Solution:
12
1.2 Reduction of systems to row-echelon form
Definition (Leading entry)
The first non-zero entry in each row of a matrix is called the leading
entry.
Definition (Row-echelon form)
A matrix is in row-echelon form if:
1. For any two non-zero rows, the leading entry of the lower row is
further to the right than the leading entry of the higher row.
2. Any rows that consist entirely of zeros are grouped together at the
bottom of the matrix.
Note:
These conditions imply that in each column containing a leading entry,
all entries below the leading entry are zero.
13
Examples

[ 1 −2 3 4 5 ]    row-echelon form

[ 1 0 0 3 ]
[ 0 4 1 2 ]    row-echelon form
[ 0 0 0 3 ]

[ 0 0 0  2 4 ]
[ 0 0 3  1 6 ]
[ 0 0 0  0 0 ]    not row-echelon form
[ 2 −3 6 −4 9 ]
14
Gaussian elimination
Gaussian elimination is a systematic way to reduce a matrix to
row-echelon form using row operations.
Note: The row-echelon form obtained is not unique.
Gaussian elimination
1. Interchange rows, if necessary, to bring a non-zero number to the
top of the first column with a non-zero entry.
2. (Optional, but often useful.) Divide the first row by its leading
entry to create a new leading entry 1.
3. Add suitable multiples of the top row to lower rows so that all
entries below the leading entry are zero.
4. Start again at Step 1 applied to the matrix without the first row.
To solve a linear system we read off the equations from the row-echelon
matrix and then solve the equations to find the unknowns, starting with
the last equation. This final step is called back substitution.
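As an illustration, the following is a rough NumPy sketch of our own (with partial pivoting added for numerical safety) of Gaussian elimination followed by back substitution, applied to the augmented matrix of Example 5 below.

    import numpy as np

    # Augmented matrix of Example 5
    aug = np.array([[1., -3.,  2., 11.],
                    [2., -3., -2., 13.],
                    [4., -2.,  5., 31.]])
    n = 3

    # Forward elimination: create zeros below each leading entry
    for i in range(n):
        p = i + np.argmax(np.abs(aug[i:, i]))  # bring a non-zero entry to the top
        aug[[i, p]] = aug[[p, i]]
        for r in range(i + 1, n):
            aug[r] -= (aug[r, i] / aug[i, i]) * aug[i]

    # Back substitution, starting with the last equation
    x = np.zeros(n)
    for i in reversed(range(n)):
        x[i] = (aug[i, -1] - aug[i, i + 1:n] @ x[i + 1:]) / aug[i, i]
    print(x)  # [ 6. -1.  1.]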
15
Example 5.
Solve the following system of linear equations using Gaussian
elimination:
x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31
Solution:
Step 1. Write the system in augmented matrix form.
16
Step 2. Use elementary row operations to reduce the augmented matrix
to row-echelon form.
17
Step 3. Read off the equations from the augmented matrix and use back
substitution to solve for the unknowns, starting with the last equation.
18
1.3 Reduction of systems to reduced row-echelon form
Definition (Reduced row-echelon form)
A matrix is in reduced row-echelon form if the following three conditions
are satisfied:
1. It is in row-echelon form.
2. Each leading entry is equal to 1 (called a leading 1).
3. In each column containing a leading 1, all other entries are zero.
19
Examples

[ 1 −2 3 −4 5 ]    reduced row-echelon form

[ 1 2 0 ]
[ 0 0 1 ]    reduced row-echelon form

[ 1 0 0 3 ]
[ 0 1 1 2 ]    is not in reduced row-echelon form
[ 0 0 0 3 ]

[ 1 0 0  2 4 ]
[ 0 1 0  1 6 ]
[ 0 0 0  0 0 ]    is not in reduced row-echelon form
[ 0 0 1 −4 9 ]
20
Gauss-Jordan elimination
Gauss-Jordan elimination is a systematic way to reduce a matrix to
reduced row-echelon form using row operations.
Note: The reduced row-echelon form obtained is unique.
Gauss-Jordan elimination
1. Use Gaussian elimination to reduce the matrix to row-echelon form.
2. Multiply each non-zero row by an appropriate number to create a
leading 1 (type 2 row operations).
3. Use row operations (of type 3) to create zeros above the leading
entries.
21
Example 6.
Solve the following system of linear equations using Gauss-Jordan
elimination:
x − 3y + 2z = 11
2x − 3y − 2z = 13
4x − 2y + 5z = 31
Solution:
Step 1. Use Gaussian elimination to reduce the augmented matrix to
row-echelon form.
22
Step 2. Divide rows by their leading entry.
Step 3. Add multiples of the last row to the rows above to make the
entries above its leading 1 equal to zero.
23
Step 4. Add multiples of the penultimate (2nd last) row to the rows
above to make the entries above its leading 1 equal to zero.
Step 5. Read off the answers.
24
1.4 Consistent and inconsistent systems
Theorem
A system of linear equations has zero solutions, one solution, or
infinitely many solutions. There are no other possibilities.
We say that the system is:
◮ inconsistent if it has no solutions.
◮ consistent if it has at least one solution.
We can determine whether a system is consistent or inconsistent by
reducing its augmented matrix to row-echelon form.
Note:
Every homogeneous linear system is consistent, since it always has at
least one solution: x1 = 0, . . . , xn = 0.
25
Inconsistent systems
Theorem
The system is inconsistent if and only if there is at least one row in the
row-echelon matrix having all entries equal to zero except for a non-zero
final entry.
Example 7.
Solve the system with row-echelon form:
[ 1 0 1 −2 ]
[ 0 2 2  4 ]
[ 0 0 0 −3 ]
Solution:
26
Geometrically, an inconsistent system in 3 variables has no common
point of intersection for the planes determined by the system.
27
Consistent systems
Theorem
Suppose we have a consistent linear system with n variables.
◮ If the row-reduced augmented matrix has exactly n non-zero rows,
then the system has a unique solution.
◮ If the row-reduced augmented matrix has < n non-zero rows,
then the system has infinitely many solutions.
◮ If r is the number of non-zero rows in the row-echelon form,
then n − r parameters are needed to specify the solution set.
More precisely, in the row-echelon form of the augmented matrix, there
will be n− r columns which contain no leading entry. We can choose
the variable corresponding to such a column arbitrarily. The values for
the remaining variables are then uniquely determined.
Note: r is called the rank of the augmented matrix.
28
Example 8.
Solve the system with reduced row-echelon form:
[ 1 0 0  2 ]
[ 0 1 0 −4 ]
[ 0 0 1 15 ]
Solution:
29
Example 9.
Solve the system with reduced row-echelon form:
[ 1 2 0 0 5 1 ]
[ 0 0 1 0 6 2 ]
[ 0 0 0 1 7 3 ]
[ 0 0 0 0 0 0 ]
Solution:
30
31
Geometrically, a consistent system in 3 variables has a common point,
line or plane of intersection for the planes determined by the system.
32
Example 10. Calculating flows in networks
At each node • we require
sum of flows in = sum of flows out.
Determine a, b, c and d in the following network.
33
Solution:
34
35
36
Note:
◮ x = t(−1, 1,−1, 1), t ∈ R is the general solution of the
homogeneous linear system.
◮ x = (9, 1, 6, 0) is called a particular solution of the linear system.
37
Example 11.
Find the values of a, b ∈ R for which the system
x − 2y + z = 4
2x − 3y + z = 7
3x − 6y + az = b
has
(a) no solution,
(b) a unique solution,
(c) an infinite number of solutions.
Solution:
38
39
Topic 2: Matrices and Determinants [AR 1.3–1.7, 2.1–2.4]
2.1 Matrix notation
2.2 Matrix operations
2.3 Matrix inverses
2.4 Elementary matrices
2.5 Linear systems revisited
2.6 Determinants
40
2.1 Matrix notation
Definition (Matrix)
A matrix of size m × n is a rectangular array of numbers with m rows
and n columns.
A =
[ A11 A12 . . . A1n ]
[ A21 A22 . . . A2n ]
[  ⋮   ⋮   ⋱    ⋮  ]
[ Am1 Am2 . . . Amn ]
or A = [Aij]
We denote by Aij the entry in the i-th row and j-th column of A, where
i = 1, 2, . . . ,m and j = 1, 2, . . . , n.
41
Special matrices
◮ A matrix with the same number of rows as columns is called a
square matrix.
◮ A matrix with one row is called a row matrix or row vector.
◮ A matrix with one column is called a column matrix or column
vector.
◮ A square matrix with Aij = 0 for i ≠ j is called a diagonal matrix.
e.g.
[ 2  0 0 ]
[ 0 −1 0 ]
[ 0  0 5 ]
◮ A square matrix with Aij = 0 for i > j is called an upper triangular
matrix.
e.g.
[ 2  1 3 ]
[ 0 −1 4 ]
[ 0  0 5 ]
◮ A square matrix with Aij = 0 for i < j is called a lower triangular
matrix.
e.g.
[ 2  0 0 ]
[ 1 −1 0 ]
[ 5  3 5 ]
42
◮ A matrix with all zero entries is called a zero matrix. We denote
the zero matrix by 0 or, if size is important, we write 0m,n to
denote the zero matrix of size m × n.
e.g.
02,2 =
[ 0 0 ]
[ 0 0 ]
02,3 =
[ 0 0 0 ]
[ 0 0 0 ]
◮ A square matrix A satisfying Aij = 1 if i = j and Aij = 0 if i ≠ j is
called an identity matrix. We write I to denote an identity matrix or, if
size is important, we write In to denote the identity matrix of size n × n.
e.g.
I2 =
[ 1 0 ]
[ 0 1 ]
I3 =
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
43
2.2 Matrix operations
Recall that we can multiply a matrix by a scalar, add matrices of the
same size, and multiply matrices if the sizes are compatible.
Definition (Scalar multiplication)
Let A be a matrix and c be any scalar (real or complex number). The
product cA is the matrix obtained by multiplying all entries of A by c .
It is called a scalar multiple of A:
(cA)ij = c(Aij)
Definition (Addition of matrices)
Let A and B be m × n matrices. The sum A+ B is an m × n matrix
obtained by adding corresponding entries of A and B .
(A+ B)ij = Aij + Bij
44
Definition (Matrix multiplication)
Let A be an m × n matrix and B be an n × q matrix. The product AB is
a matrix of size m × q.
The entry in position (i, j) of the matrix product is obtained by taking
row i of A and column j of B, then multiplying together the entries in
order and adding:
(AB)ij = Σ_{k=1}^{n} AikBkj
Note:
◮ The product AB is only defined if the number of columns in A is
equal to the number of rows in B .
◮ We can think of matrix multiplication in terms of dot products:
(AB)ij = (row i of A) · (column j of B).
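As a quick sanity check (a sketch of ours, not from the notes), the entry formula can be compared against NumPy's built-in product:

    import numpy as np

    A = np.array([[1, 2, 3],
                  [4, 5, 6]])      # 2 x 3
    B = np.array([[1, 0],
                  [2, 1],
                  [0, 3]])         # 3 x 2

    AB = A @ B                     # 2 x 2 product
    # (AB)_{12} computed directly from the definition
    entry = sum(A[0, k] * B[k, 1] for k in range(3))
    assert entry == AB[0, 1]
    print(AB)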
45
Properties of matrix multiplication
Whenever the matrix products and sums are defined:
1. A(B + C ) = AB + AC (left distributivity)
2. (A+ B)C = AC + BC (right distributivity)
3. A(BC ) = (AB)C (associativity)
4. A(αB) = α(AB)
5. AIn = ImA = A (where A has size m × n)
6. A0 = 0 and 0A = 0
where α is a scalar and 0 denotes the zero matrix of appropriate size.
Definition (Matrix powers)
If A is a square matrix and n ≥ 1 is an integer, we define
A^n = AA · · ·A,
the product of n copies of A.
46
Application of matrix powers
Example 1. Graphs and adjacency matrices
A graph is a set of points called vertices together with a set of lines or
curves called edges which connect certain vertices.
Every graph has a corresponding adjacency matrix that specifies which
vertices Vi are connected. If the graph has n vertices then the
adjacency matrix is n× n and the (i , j) entry Aij is equal to the number
of edges from Vi to Vj .
47
Determine the adjacency matrix for the above graph.
Solution:
48
Example 2. Walks in a graph
A walk in a graph is a sequence of edges linking one vertex to another.
The length of a walk is equal to the number of edges it traverses.
Theorem
If A is an n × n adjacency matrix of a graph and (A^k)ij denotes the
(i, j) entry of A^k, then (A^k)ij is equal to the number of walks of
length k from Vi to Vj, for each k ≥ 1.
In particular, the entry Aij of A gives the number of walks of length 1
from Vi to Vj.
Problem:
Determine the number of walks of length 3 between V1 and V4 for the
graph in example 1.
49
Solution:
To find walks of length 3, we compute A3.
A^3 =
[ 2 6 3 3 5 ]
[ 6 6 6 7 7 ]
[ 3 6 2 5 3 ]
[ 3 7 5 4 7 ]
[ 5 7 3 7 4 ]
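A sketch of the computation (ours; the adjacency matrix below is a placeholder, since the graph of Example 1 is given as a figure and is not reproduced here):

    import numpy as np

    # Placeholder adjacency matrix of a small graph, NOT the one in Example 1
    A = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 1],
                  [0, 1, 0, 1],
                  [1, 1, 1, 0]])

    A3 = np.linalg.matrix_power(A, 3)
    print(A3[0, 3])  # number of walks of length 3 from V1 to V4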
50
Definition (Transpose of a matrix)
Let A be an m × n matrix. The transpose of A, denoted by AT , is the
n ×m matrix whose entries are given by interchanging the rows and
columns of A:
(AT )ij = Aji
Example 3.
Let A =
[ 1 2 3 ]
[ 4 5 6 ]
Find AT.
Solution:
51
Properties of the transpose
Let α be a scalar. The following properties hold whenever the matrix
products and sums exist:
1. (AT)T = A
2. (A+ B)T = AT + BT
3. (αA)T = αAT
4. (AB)T = BTAT
Proof: Property 4.
(1) Check that (AB)T and BTAT are the same size.
52
(2) Show that the corresponding entries are equal for all i and j .
53
2.3 Matrix inverses
Definition (Matrix inverse)
An n× n matrix A is called invertible if there exists a matrix B such that
AB = BA = In.
The matrix B is called the inverse of A and is denoted by A−1. If A is
not invertible, we say that A is singular.
We can prove from the definition that:
◮ (If it exists) A−1 has the same size as A.
◮ (If it exists) A−1 is unique.
◮ If A is invertible, then A−1 is invertible and (A−1)−1 = A.
◮ (In)−1 = In
◮ 0 is singular.
◮ A matrix that has a row or column consisting entirely of zeros is
singular.
54
Properties of the matrix inverse
If A and B are n× n invertible matrices and α is a non-zero scalar, then
1. (αA)−1 = (1/α)A−1
2. (AB)−1 = B−1A−1
3. (A^k)−1 = (A−1)^k (for all integers k ≥ 1)
4. (AT)−1 = (A−1)T
Proof: Property 2.
55
56
Theorem (Inverse of a 2 × 2 matrix)
Let A =
[ a b ]
[ c d ]
Then
1. A is invertible ⇐⇒ ad − bc ≠ 0.
2. If ad − bc ≠ 0, then
A−1 = 1/(ad − bc) ·
[  d −b ]
[ −c  a ]
Example 4.
Find the inverse of A =
[ 2 −1 ]
[ 1  1 ]
Solution:
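A small sketch of ours applying the 2 × 2 formula to this matrix:

    import numpy as np

    A = np.array([[2., -1.],
                  [1.,  1.]])
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # ad - bc = 3
    Ainv = (1 / det) * np.array([[ A[1, 1], -A[0, 1]],
                                 [-A[1, 0],  A[0, 0]]])
    print(Ainv)   # agrees with np.linalg.inv(A)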
57
Finding the inverse of a square matrix via Gauss-Jordan elimination
Calculating the inverse of a matrix
Algorithm: input: n × n matrix A
output: A−1 or “A is not invertible”.
1. Construct the augmented matrix [A | In ].
2. Apply row operations to [A | In ] to get the block corresponding to
A into reduced row-echelon form. This gives
[A | In ] ∼ [R | B ]
where R is in reduced row-echelon form.
3. If R = In, then A is invertible and A−1 = B.
If R ≠ In, then A is singular, i.e. A−1 does not exist.
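A sketch of this algorithm using SymPy's rref() (an assumption on our part that SymPy is available; it is not prescribed by the notes), applied to the matrix of Example 5 below:

    import sympy as sp

    A = sp.Matrix([[ 1,  2, 1],
                   [-1, -1, 1],
                   [ 0,  1, 3]])
    aug = A.row_join(sp.eye(3))    # form [A | I3]
    R, _ = aug.rref()
    if R[:, :3] == sp.eye(3):
        print(R[:, 3:])            # this block is the inverse of A
    else:
        print("A is singular")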
58
Example 5.
Find the inverse of A =
[  1  2 1 ]
[ −1 −1 1 ]
[  0  1 3 ]
Solution:
Form the augmented matrix [A | I3 ] and perform row operations so that
A is in reduced row-echelon form.
59
60
2.4 Elementary matrices
The effect of an elementary row operation can be achieved by
multiplication on the left by an elementary matrix.
Definition (Elementary matrix)
An n× n matrix is an elementary matrix if it can be obtained from In by
performing a single elementary row operation.
Theorem
Let Ep be the elementary matrix obtained by applying a row operation p
to In.
If A is a matrix such that EpA is defined, then EpA is equal to the result
of performing p on A.
Thus, we can perform a sequence of elementary row operations using a
corresponding sequence of elementary matrices.
61
Example 6.
Find the elementary matrices obtained by applying the following row
operations on I3.
(a) R2 ↔ R3 (b) R1 → 2R1
(c) R3 → R3 + 3R1
Solution:
62
Example 7.
Let A =
[ 1 2 ]
[ 3 4 ]
Assume that
[ 1 2 ]   [ 1  2 ]   [ 1 2 ]   [ 1 0 ]
[ 3 4 ] ∼ [ 0 −2 ] ∼ [ 0 1 ] ∼ [ 0 1 ]
(a) Write I2 as a product of elementary matrices and A.
(b) Write A−1 as a product of elementary matrices.
(c) Write A as a product of elementary matrices.
Solution:
63
64
In general, if A ∼ In by a sequence of elementary row operations, then
there is a sequence of elementary matrices E1,E2, . . . ,Ek such that
EkEk−1 · · ·E2E1A = In.
Each of the elementary matrices Ep (p = 1, . . . k) is invertible.
Theorem
1. Let A be an n × n matrix. Then
A is invertible ⇐⇒ A ∼ In.
2. Every invertible matrix A can be written as a product of elementary
matrices:
A = (E1)−1(E2)−1 · · · (Ek)−1.
3. If A and B are n × n matrices such that AB = In, then A and B
are invertible with A−1 = B and B−1 = A.
65
2.5 Linear systems revisited
A linear system in the variables x1, . . . , xn can be written in the form
Ax = b
where A is an m × n matrix, b is an m × 1 matrix and x is the n × 1
matrix [x1, . . . , xn]T.
Theorem
If A is an invertible n × n matrix, then every linear system of the form
Ax = b has a unique solution, given by x = A−1b.
Proof:
Since A is an n × n invertible matrix, then AA−1 = A−1A = In. So
Ax = b
⇒A−1Ax = A−1b
⇒Inx = A−1b
⇒x = A−1b
66
Example 8.
Use a matrix inverse to solve the linear system
x + 2y + z = −4
−x − y + z = 11
y + 3z = 21
Solution:
Step 1. Write the system in the form Ax = b
Step 2. Find A−1.
67
Step 3.
The solution is x = A−1b.
Note:
Once we know A−1 we can solve the linear system Ax = b for any
choice of b.
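A sketch of these three steps in NumPy (our own illustration):

    import numpy as np

    A = np.array([[ 1.,  2., 1.],
                  [-1., -1., 1.],
                  [ 0.,  1., 3.]])
    b = np.array([-4., 11., 21.])

    x = np.linalg.inv(A) @ b   # x = A^{-1} b
    print(x)
    # The same inverse solves Ax = b for any other right-hand side b.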
68
Rank of a matrix
Definition (Rank of a matrix)
The rank of a matrix A is the number of non-zero rows in the reduced
row-echelon form of A.
Note:
◮ This is the same as the number of non-zero rows in any
row-echelon form of A.
◮ If A has size m × n, then rank(A) ≤ m and rank(A) ≤ n.
69
Example 9.
Find the rank of
A =
[ 1 −1 2  1 ]
[ 0  1 1 −2 ]
[ 1 −3 0  5 ]
Solution:
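A quick check of the answer with SymPy (a sketch of ours, assuming SymPy is available):

    import sympy as sp

    A = sp.Matrix([[1, -1, 2,  1],
                   [0,  1, 1, -2],
                   [1, -3, 0,  5]])
    print(A.rank())   # prints 2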
70
Rank and solutions of linear systems
We can rewrite the results of section 1.4 for the solutions of a linear
system in terms of rank.
Theorem
The linear system Ax = b, where A is an m × n matrix, has
1. No solution if rank(A) < rank([A |b]).
2. A unique solution if rank(A) = rank([A |b]) and rank(A) = n.
3. Infinitely many solutions if rank(A) = rank([A |b]) and
rank(A) < n.
Note:
rank(A) ≤ rank([A | b])
71
Example 10.
Determine the number of solutions of the system with row-echelon form:
[ 1 −1 2  1 ]   [ 1 −1 2  1 ]
[ 0  1 1 −2 ] ∼ [ 0  1 1 −2 ]
[ 1 −3 0  5 ]   [ 0  0 0  0 ]
Solution:
72
Theorem
If A is an n × n matrix, the following conditions are equivalent:
1. A is invertible.
2. Ax = b has a unique solution for any b.
3. The rank of A is n.
4. The reduced row-echelon form of A is In.
73
2.6 Determinants
The inverse of
A =
[ a b ]
[ c d ]
exists if and only if ad − bc ≠ 0.
We call ad − bc the determinant of A, and write this as det(A) or |A|.
Geometrically, | det(A)| gives the area of the parallelogram spanned by
the row vectors v1 = (a, b), v2 = (c , d) of A.
The sign of det(A) depends on the orientation of v1, v2.
74
Examples of area and determinants
Multiplying one row by a positive scalar
The parallelograms of the row vectors of
A =
[ 1 0 ]
[ 0 2 ]
and B =
[ 3 0 ]
[ 0 2 ]
det(A) = 2, area equals 2; det(B) = 6, area equals 6.
75
Adding a multiple of one row to another row
The parallelograms of the row vectors of
A =
[ 3 0 ]
[ 2 2 ]
and B =
[ 3 0 ]
[ 0 2 ]
det(A) = 6, area equals 6; det(B) = 6, area equals 6.
76
Multiplying a row by −1
The parallelograms of the row vectors of
A =
[ 3 0 ]
[ 2 2 ]
and B =
[ −3 0 ]
[  2 2 ]
A: anti-clockwise orientation, det(A) = 6, area equals 6.
B: clockwise orientation, det(B) = −6, area equals 6.
77
Interchanging two rows
The parallelograms of the row vectors of
A =
[ 1 0 ]
[ 0 1 ]
and B =
[ 0 1 ]
[ 1 0 ]
A: anti-clockwise orientation, det(A) = 1, area equals 1.
B: clockwise orientation, det(B) = −1, area equals 1.
78
Defining determinants
For n × n matrices the determinant can be interpreted as the volume of
an n-dimensional parallelepiped.
For n × n matrices we can define and compute the determinant by
understanding how it changes under elementary row operations:
Definition (Determinant)
The determinant is a function that maps each n × n matrix A to a
number, denoted det(A) or |A|, that satisfies the following conditions:
D1. Adding one row to another row does not change the determinant.
D2. Multiplying any row by a scalar α multiplies the determinant by α.
D3. Swapping two rows multiplies the determinant by −1.
D4. The determinant of the identity matrix In equals 1.
79
Theorem
These four conditions specify the determinant uniquely.
Note:
◮ Definitions D1 and D2 imply that adding any multiple of one row
to another row does not change the determinant.
◮ All invertible n × n matrices have a non-zero determinant.
80
Example 11.
Compute the determinant of
A =
[ 2 6  9 ]
[ 0 3  8 ]
[ 0 0 −1 ]
using row operations.
Solution:
81
82
A similar argument proves the following:
Theorem
If A is an upper triangular or lower triangular matrix, then det(A) is the
product of the elements on the main diagonal.
Calculating determinants
For a general matrix, we use Gaussian elimination to transform the
matrix into triangular form, keeping track of what each step does to the
determinant. Then we can read off the determinant using the theorem.
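For example, a quick numerical check of ours of the triangular case, using the matrix of Example 11, whose determinant is the diagonal product 2 · 3 · (−1) = −6:

    import numpy as np

    A = np.array([[2., 6.,  9.],
                  [0., 3.,  8.],
                  [0., 0., -1.]])
    print(np.linalg.det(A))   # approximately -6.0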
83
Example 12.
Compute the determinant of
A =
[  1 2 1 ]
[ −1 1 1 ]
[  0 1 3 ]
using row operations.
Solution:
84
Properties of the determinant
Theorem
Let A,B be n × n matrices and let α be a scalar. Then
1. If A has a row (or column) of zeros, then det(A) = 0.
2. If A has a row (or column) which is a scalar multiple of another
row (or column) then det(A) = 0.
3. det(αA) = α^n det(A)
4. det(AB) = det(A) det(B)
5. det(AT) = det(A)
6. A is invertible if and only if det(A) ≠ 0.
7. If A is invertible then det(A−1) = 1/det(A).
Note:
Property 5 means that we can replace rows by columns in all the
previous properties of determinants, including D1-D3.
85
Example 13.
Prove property 7 for the determinant, using the definition of
determinant and the previous properties.
Solution:
86
Cofactor expansion
Another way to calculate determinants is to use cofactor expansion.
Definition (Cofactor)
Let A be a square matrix. The (i , j)-cofactor of A, denoted by Cij , is
the number given by
Cij = (−1)^(i+j) det(A(i, j))
where A(i , j) is the matrix obtained from A by deleting the ith row and
jth column.
87
Example 14.
If A =
[  1  2 1 ]
[ −1 −1 1 ]
[  0  1 3 ]
calculate C23.
Solution:
88
Theorem (Cofactor expansion)
The determinant of an n × n matrix A can be computed by choosing
any row (or column) of A and multiplying the entries in that row (or
column) by their cofactors and then adding the resulting products.
That is, for each 1 ≤ i ≤ n,
det(A) = Ai1Ci1 + Ai2Ci2 + · · · + AinCin = Σ_{k=1}^{n} AikCik
(cofactor expansion along the ith row)
and for each 1 ≤ j ≤ n,
det(A) = A1jC1j + A2jC2j + · · · + AnjCnj = Σ_{k=1}^{n} AkjCkj
(cofactor expansion along the jth column).
89
How do you remember the sign of the cofactor?
The (1, 1) cofactor always has sign +. Starting from there, imagine
walking to the square you want using either horizontal or vertical steps.
The appropriate sign will change at each step.
We can visualise this arrangement with the following matrix:
[ + − + − . . . ]
[ − + − + . . . ]
[ + − + − . . . ]
[ − + − + . . . ]
[ ⋮ ⋮ ⋮ ⋮  ⋱ ]
So, for example, C13 is assigned + but C32 is assigned −.
Note:
The cofactor method is particularly useful for matrices with many zeros.
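A recursive sketch of cofactor expansion along the first row (our own illustration; row reduction is faster for large matrices):

    def det(A):
        """Determinant by cofactor expansion along the first row."""
        n = len(A)
        if n == 1:
            return A[0][0]
        total = 0
        for j in range(n):
            # A(1, j): delete row 1 and column j
            minor = [row[:j] + row[j + 1:] for row in A[1:]]
            total += (-1) ** j * A[0][j] * det(minor)
        return total

    print(det([[1, 2, 1], [-1, 1, 1], [0, 1, 3]]))   # Example 15's matrix: 7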
90
Example 15.
Compute the determinant of
A =
[  1 2 1 ]
[ −1 1 1 ]
[  0 1 3 ]
using cofactor expansion.
Solution:
91
Example 16.
Calculate |A| =
| 1 −2 0 1 |
| 3  2 2 0 |
| 1  0 1 0 |
| 0 −4 2 4 |
using cofactors.
Solution:
92
93
Topic 3: Euclidean Vector Spaces [AR 3.1-3.5]
3.1 Vectors in Rn
3.2 Dot product
3.3 Cross product of vectors in R3
3.4 Lines and planes
94
3.1 Vectors in Rn
Definition (Addition and scalar multiplication of vectors in Rn)
We define Rn as the set of all ordered n-tuples of real numbers and
write these as row vectors:
v = (v1, v2, . . . , vn), where vi ∈ R for i = 1, 2, . . . , n.
Let v = (v1, v2, . . . , vn) ∈ Rn, u = (u1, u2, . . . , un) ∈ Rn and α ∈ R.
Then addition and scalar multiplication are defined by:
v + u = (v1, . . . , vn) + (u1, . . . , un) = (v1 + u1, . . . , vn + un)
αv = α(v1, v2, . . . , vn) = (αv1, αv2, . . . , αvn)
Note:
In handwritten work, we denote vectors using the notation ~v or ṽ or v̲.
95
In R3, the unit vectors in the x , y , z directions are often denoted by
i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1)
so any vector in R3 can be written in the form
v = (v1, v2, v3) = v1i+ v2 j+ v3k.
Definition (Length of a vector)
The length (norm, magnitude) of a vector u = (u1, u2, . . . , un) ∈ Rn is
given by
‖u‖ = √(u1^2 + u2^2 + · · · + un^2).
A vector of length 1 is called a unit vector.
Note:
We say a vector u is parallel to a vector v if u is a scalar multiple of v.
Then every non-zero vector u is parallel to a unit vector û, where
û = u/‖u‖.
96
Distance between two points
Let u, v ∈ Rn. The distance between u and v is
d(u, v) = ||v − u||.
Example 1.
Find the distance between the points P(1, 3,−1, 2) and Q(2, 1,−1, 3).
Solution:
97
3.2 Dot product
Definition (Dot product)
Let u = (u1, u2, . . . , un) ∈ Rn and v = (v1, v2, . . . , vn) ∈ Rn.
We define the dot product (scalar product, Euclidean inner product) by
u · v = Σ_{k=1}^{n} ukvk = u1v1 + u2v2 + · · · + unvn.
Note:
If we write u and v as column matrices U = [u1, . . . , un]T and
V = [v1, . . . , vn]T, then
u · v = UTV.
98
Properties of the dot product
Let u, v,w ∈ Rn and let α ∈ R be a scalar. Then
1. u · v is a scalar
2. u · v = v · u
3. (u+ v) · w = u · w + v ·w
4. u · (v + w) = u · v + u ·w
5. (αu) · v = u · (αv) = α(u · v)
6. u · u = ‖u‖2
99
Geometry of the dot product
We can also interpret the dot product geometrically by
u · v = ‖u‖‖v‖ cos θ
where 0 ≤ θ ≤ π is the angle between the vectors u and v.
Note:
◮ Two non-zero vectors are perpendicular or orthogonal when
u · v = 0, so that θ = π/2.
◮ Since |cos θ| ≤ 1, we have
|u · v| ≤ ‖u‖ ‖v‖.
This is known as the Cauchy-Schwarz inequality.
100
Example 2.
Verify that the Cauchy-Schwarz inequality holds for the vectors
u = (0, 2, 2,−1) and v = (−1, 1, 1,−1).
Solution:
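A sketch of the check in NumPy (ours):

    import numpy as np

    u = np.array([0., 2., 2., -1.])
    v = np.array([-1., 1., 1., -1.])
    lhs = abs(u @ v)                              # |u . v| = 5
    rhs = np.linalg.norm(u) * np.linalg.norm(v)   # ||u|| ||v|| = 3 * 2 = 6
    print(lhs <= rhs)                             # True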
101
Example 3. Correlation
Correlation is a measure of how closely two variables are dependent.
Suppose we want to compare how closely 7 students’ assignment marks
correlate with their exam marks.
Student Assignment Mark Exam Mark
S1 99 100
S2 80 82.5
S3 79 79
S4 75.5 82.5
S5 87.5 91
S6 67 67.5
S7 76 68
Average 80.5 81.5
To measure the correlation we first need to adjust the scores so that
they have mean zero. This can be done by subtracting the average
score X̄ from each student's score X.
102
Let’s store the differences X − X¯ in a matrix:
X − X¯ =
18.5 18.5
−0.5 1
−1.5 −2.5
−5 1
7 9.5
−13.5 −14
−4.5 −13.5
To compare the exam and assignment scores, we compute the cosine of
the angle between the two columns x1, x2 of this matrix:
cos θ = (x1 · x2)/(‖x1‖ ‖x2‖) = 656.75/((24.92)(28.62)) ≈ 0.92
A cosine value near 1 indicates that the scores are highly correlated.
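The calculation above can be reproduced with a short NumPy sketch (our own):

    import numpy as np

    assign = np.array([99, 80, 79, 75.5, 87.5, 67, 76])
    exam = np.array([100, 82.5, 79, 82.5, 91, 67.5, 68])

    x1 = assign - assign.mean()   # mean-zero assignment scores
    x2 = exam - exam.mean()       # mean-zero exam scores
    cos_theta = (x1 @ x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
    print(round(cos_theta, 2))    # 0.92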
103
Vector projection
Definition (Projection of v onto u)
Assuming u ≠ 0, the projection of v onto u is given by
proju v = ((u · v)/‖u‖^2) u = (û · v) û
Indeed,
proju v = ‖v‖ cos θ û
        = ‖û‖ ‖v‖ cos θ û
        = (û · v) û
Note:
u · (v − proju v) = 0
104
Example 4.
Let u = (2,−1,−2) and v = (2, 1, 3). Find vectors v1 and v2 such that
v = v1 + v2
where v1 is parallel to u and v2 is perpendicular to u.
Solution:
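A sketch of the splitting in NumPy (ours):

    import numpy as np

    u = np.array([2., -1., -2.])
    v = np.array([2.,  1.,  3.])

    v1 = (u @ v) / (u @ u) * u    # proj_u v, parallel to u
    v2 = v - v1                   # the perpendicular part
    print(v1, v2, u @ v2)         # u . v2 is 0 (up to rounding)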
105
3.3 Cross product of vectors in R3
Definition (Cross product)
Let u = (u1, u2, u3) and v = (v1, v2, v3) be two vectors in R3. We define
the cross product (or vector product) of u and v by
u × v = (u2v3 − u3v2) i + (u3v1 − u1v3) j + (u1v2 − u2v1) k.
It is easy to remember the cross product as a determinant expanded
along the first row:
u × v =
| i  j  k  |
| u1 u2 u3 |
| v1 v2 v3 |
= | u2 u3 | i − | u1 u3 | j + | u1 u2 | k
  | v2 v3 |     | v1 v3 |     | v1 v2 |
Note:
The cross product is defined only for R3.
106
Properties of the cross product
Let u, v,w ∈ R3 and α ∈ R be a scalar. Then
1. v × u = −(u× v)
2. (u+ v)× w = u× w + v × w
3. u× (v + w) = u× v + u× w
4. (αu)× v = u× (αv) = α(u× v)
5. u× u = 0
Example
i× j = k, j× k = i, k× i = j.
107
Geometry of the cross product
We can also interpret the cross product geometrically by
u × v = ‖u‖ ‖v‖ sin(θ) n̂
where
◮ 0 ≤ θ ≤ π is the angle between u and v
◮ n̂ is a unit vector
◮ n̂ is perpendicular to both u and v
◮ n̂ points in the direction given by the right-hand rule. The
direction of u × v is determined by curling the fingers of your
right hand following the angle from u to v and noting the direction
your thumb is pointing.
108
Example 5.
Find a vector perpendicular to both (1, 1, 1) and (1,−1,−2).
Solution:
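A sketch with NumPy's cross product (ours):

    import numpy as np

    u = np.array([1.,  1.,  1.])
    v = np.array([1., -1., -2.])

    n = np.cross(u, v)            # (-1, 3, -2)
    print(n, n @ u, n @ v)        # both dot products are 0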
109
Application: Area of a parallelogram
Suppose that u, v ∈ R3. The area of the parallelogram defined by u and
v is equal to ‖u× v‖.
Area of parallelogram = (base) (height)
                      = ‖u‖ ‖v‖ sin θ
                      = ‖u × v‖
110
Example 6.
Find the area of the triangle with vertices (2,−5, 4), (3,−4, 5) and
(3,−6, 2).
Solution:
111
Definition (Scalar triple product)
Let u = (u1, u2, u3), v = (v1, v2, v3) and w = (w1, w2, w3) be vectors in
R3. The scalar triple product is the real number u · (v × w) given by
u · (v × w) =
| u1 u2 u3 |
| v1 v2 v3 |
| w1 w2 w3 |
Properties of the scalar triple product
1. v · (v × w) = w · (v × w) = 0
2. u · (v × w) = v · (w × u) = w · (u× v)
3. u · (v × w) = −u · (w × v) = −v · (u× w) = −w · (v × u)
Note:
Property 1 says that v × w is orthogonal to both v and w.
112
Application: Volume of a parallelepiped
Suppose that u, v,w ∈ R3. Then, the parallelepiped defined by the
vectors u, v and w has volume equal to the absolute value of the scalar
triple product of u, v and w:
Volume of parallelepiped = |u · (v × w)|
Volume of parallelepiped = (area of base) (height)
                         = ‖v × w‖ ‖u‖ |cos θ|
                         = |u · (v × w)|
Note:
This is the absolute value of the determinant of the matrix with rows
given by u, v,w.
113
Example 7.
Find the volume of the parallelepiped with adjacent edges given by the
vectors PQ, PR, PS, where the points are P(2, 0,−1), Q(4, 1, 0),
R(3,−1, 1) and S(2,−2, 2).
Solution:
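A sketch of ours: the volume is the absolute determinant of the matrix whose rows are the edge vectors.

    import numpy as np

    P = np.array([2.,  0., -1.])
    Q = np.array([4.,  1.,  0.])
    R = np.array([3., -1.,  1.])
    S = np.array([2., -2.,  2.])

    M = np.array([Q - P, R - P, S - P])   # rows PQ, PR, PS
    print(abs(np.linalg.det(M)))          # 3.0 (up to rounding)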
114
3.4 Lines and planes
Equations of a straight line
A vector equation of a line through a point P0 in the direction of a
vector v is given by
r = r0 + tv, t ∈ R
where r0 = OP0 is the position vector of P0.
Each point on the line is given by a unique value of t.
115
For example, in R3, if r = (x , y , z), r0 = (x0, y0, z0) and v = (a, b, c),
the vector equation of the line is
(x , y , z) = (x0, y0, z0) + t(a, b, c), t ∈ R
Equating components gives parametric equations for the line:
x = x0 + ta
y = y0 + tb,
z = z0 + tc
t ∈ R
If a ≠ 0, b ≠ 0 and c ≠ 0, we can solve the parametric equations for t
and equate. This gives a Cartesian form of the line:
(x − x0)/a = (y − y0)/b = (z − z0)/c
Note:
The equations for a line are not unique.
116
Example 8.
Determine vector, parametric and Cartesian equations of the line
passing through the points P(−1, 2, 3) and Q(4,−2, 5).
Solution:
117
118
Example 9.
Find a vector equation of the line through the point (2, 0, 1) that is
parallel to the line
(x − 1)/1 = (y + 2)/(−2) = (z − 6)/2.
Does the point (0, 4,−3) lie on the line?
Solution:
119
Definition
Two lines are said to
◮ intersect if there is a point lying on both lines.
◮ be parallel if their direction vectors are parallel.
◮ be skew if they do not intersect and are not parallel.
The angle between two lines is given by the angle θ between their
direction vectors. (We take the acute angle, which is the smaller of the
two possible angles θ and π − θ.)
120
Equations of planes
A vector equation of a plane through a point P0 that contains two
non-parallel vectors in the directions u and v is
r = r0 + su+ tv, s, t ∈ R,
where r0 = OP0 is the position vector of P0.
Every value of s and t determines a point on the plane.
In R3, a vector perpendicular to the plane is
n = u× v
We say that n is a normal vector to the plane.
121
If P is a point with position vector r, then
P lies on the plane⇔ r − r0 is parallel to plane
⇔ r − r0 is perpendicular to n
Hence the equation of the plane in R3 can also be written in
point-normal form:
(r − r0) · n = 0
122
Putting r = (x , y , z), r0 = (x0, y0, z0) and n = (a, b, c) in the
point-normal form, we derive the Cartesian equation of a plane:
r · n = r0 · n
⇒ (x , y , z) · (a, b, c) = (x0, y0, z0) · (a, b, c)
⇒ ax + by + cz = ax0 + by0 + cz0
⇒ ax + by + cz = d
where d is the constant given by
d = ax0 + by0 + cz0.
Note:
◮ The equations for a plane are not unique.
◮ The angle between two planes in R3 is given by the angle θ
between their normal vectors. (We take the acute angle, which is
the smaller of the two possible angles θ and π − θ.)
123
Example 10.
Find a Cartesian equation of the plane with vector form
(x , y , z) = s(1,−1, 0) + t(2, 0, 1) + (−1, 1, 1), s, t ∈ R
Solution:
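A sketch of the computation (ours): a normal vector is the cross product of the two direction vectors, and d = r0 · n.

    import numpy as np

    u = np.array([1., -1., 0.])
    v = np.array([2.,  0., 1.])
    r0 = np.array([-1., 1., 1.])

    n = np.cross(u, v)    # a normal vector (a, b, c)
    d = r0 @ n
    print(n, d)           # the plane is a x + b y + c z = d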
124
125
Example 11.
Find a vector equation for the plane containing the points P(1, 0, 2),
Q(1, 2, 3) and R(4, 5, 6).
Solution:
126
Intersection of a line and a plane
Example 12.
Where does the line
(x − 1)/1 = (y − 2)/2 = (z − 3)/3
intersect the plane 3x + 2y + z = 20?
Solution:
127
Intersection of two planes
Example 13.
Find a vector form for the line of intersection of the two planes
x + 3y + 2z = 6 and 3x + 2y − z = 11.
Solution:
128
129
Topic 4: General Vector Spaces [AR 4.1 - 4.8]
4.1 General vector spaces
4.2 Subspaces
4.3 Linear combinations and spans
4.4 Linear dependence and linear independence
4.5 Bases and dimension
4.6 Solution space, column space, row space
4.7 Coordinates relative to a basis
130
4.1 General vector spaces
Vector spaces
We want to extend the properties of vectors in Rn to more general sets
of objects.
A vector space is a set V with two operations defined:
1. vector addition,
2. scalar multiplication.
We want these two operations to satisfy the basic algebraic properties
of vectors in Rn. This leads to a list of ten axioms (properties) that we
use as our definition.
131
Fields
The scalars are members of a number system F called a field, where
addition, subtraction, multiplication, and division by a non-zero scalar
are defined.
Every field F contains a:
◮ zero scalar 0 such that α+ 0 = α for all α ∈ F.
◮ unit scalar 1 (with 1 6= 0) such that 1α = α for all α ∈ F.
In this subject, we focus on the fields F = R and F = C.
We will also look at F = F2 = {0, 1}, the integers modulo 2.
132
Definition (Vector space)
Fix a field of scalars F. Then a vector space over F is a non-empty set
V with two operations: vector addition and scalar multiplication. These
operations are required to satisfy the following 10 rules (axioms).
For all u, v,w ∈ V :
Vector addition behaves well:
A1 u+ v ∈ V . (closure of vector addition)
A2 (u+ v) + w = u+ (v + w). (associativity)
A3 u+ v = v + u. (commutativity)
There must be a zero and inverses:
A4 There exists a vector 0 ∈ V such that
v + 0 = v for all v ∈ V . (existence of zero vector)
A5 For all v ∈ V , there exists a vector −v
such that v + (−v) = 0. (additive inverse)
133
Definition (Vector space continued)
For all u, v ∈ V and α, β ∈ F:
Scalar multiplication behaves well:
M1 αv ∈ V . (closure under scalar multiplication)
M2 α(βv) = (αβ)v. (associativity of scalar multiplication)
M3 1v = v. (multiplication by unit scalar)
Vector addition and scalar multiplication combine well:
D1 α(u+ v) = αu+ αv. (distributivity 1)
D2 (α+ β)v = αv + βv. (distributivity 2)
Note:
It follows from the axioms that for all v ∈ V :
1. 0v = 0.
2. (−1)v = −v.
134
Examples of vector spaces
Definition
A vector space that has R as the scalars is called a real vector space.
1. Rn
Our definition of vector spaces was based on algebraic properties of Rn,
so it is easy to check that Rn is a vector space with scalars R.
135
2. Vector space of matrices
Definition
Denote by Mm,n (or Mm,n(R)) the set of all m × n matrices with real
entries.
Mm,n is a real vector space with the usual matrix addition and scalar
multiplication operations.
Example 1.
Prove axioms A1, A4, A5 and M1 for M2,2.
Solution:
Define two vectors in V and a scalar in R
136
A1. Closure under vector addition
A4. Existence of zero vector
137
A5. Additive inverse
M1. Closure under scalar multiplication
138
3. Vector space of polynomials
Definition
For a fixed integer n ≥ 0, denote by Pn (or Pn(R)) the set of all
polynomials with degree at most n with real coefficients:
Pn = {a0 + a1x + a2x2 + · · ·+ anxn | a0, a1, . . . , an ∈ R}
Note:
Two polynomials are equal if and only if their coefficients are equal.
Pn is a real vector space with the usual operations of polynomial
addition and scalar multiplication. That is, if
a0 + · · ·+ anxn ∈ Pn, b0 + · · ·+ bnxn ∈ Pn and α ∈ R, then
(a0 + · · ·+ anxn) + (b0 + · · ·+ bnxn) = (a0 + b0) + · · · + (an + bn)xn
α(a0 + a1x + · · ·+ anxn) = (αa0) + (αa1)x + · · · + (αan)xn
139
4. Vector space of functions
Definition
Let
F(S ,R) = { all functions f : S → R }
denote the set of all functions from S to R, where S is a non-empty set.
If f , g ∈ F(S ,R) and α ∈ R, then f + g and αf are the functions
defined by
(f + g)(x) = f (x) + g(x)
(αf )(x) = αf (x)
for all x ∈ S .
Then F(S ,R) is a real vector space.
Note:
The zero vector is the function which is identically zero, i.e.
0 : S → R such that 0(x) = 0 for all x ∈ S .
140
Example 2.
If f , g ∈ F(R,R) are given by
f (x) = sin(x) and g(x) = x for all x ∈ R,
then
(f + g)(x) = sin(x) + x and (3f )(x) = 3 sin(x) for all x ∈ R.
Note:
Function spaces are important in analysis of sound and light waves,
quantum mechanics and the study of differential equations.
141
5. Vector space Fn2
The field F2
Definition
The set containing just two elements
F2 = {0, 1},
equipped with the operations of addition and multiplication defined
below, is a field called the integers modulo 2.
Addition
0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 0
Multiplication
0× 0 = 0
1× 0 = 0
0× 1 = 0
1× 1 = 1
We call the arithmetic in F2, modular or mod 2 arithmetic.
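In Python, mod 2 arithmetic is just % 2; a sketch of ours adding two vectors in Fn2 entrywise:

    a = (1, 0, 1, 1)
    b = (1, 1, 0, 1)
    print(tuple((x + y) % 2 for x, y in zip(a, b)))   # (0, 1, 1, 0)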
142
The vector space Fn2
Definition
Denote by Fn2 the set of vectors with integers modulo 2 as scalars:
Fn2 = {(a1, a2, . . . , an) | ai ∈ F2, i = 1, . . . , n}
For (a1, a2, . . . , an) ∈ Fn2, (b1, b2, . . . , bn) ∈ Fn2 and α ∈ F2, we define
addition and scalar multiplication by
(a1, a2, . . . , an) + (b1, b2, . . . , bn) = (a1 + b1, a2 + b2, . . . , an + bn)
α(a1, a2, . . . , an) = (αa1, αa2, . . . , αan),
where on the right hand side the addition and scalar multiplication are
as for mod 2 arithmetic.
Then Fn2 is a vector space over F2.
143
6. Complex vector spaces
Definition
A vector space that has C as the scalars is called a complex vector
space.
Example 3.
C2 = {(a1, a2) | a1, a2 ∈ C}
For (a1, a2) ∈ C2, (b1, b2) ∈ C2 and α ∈ C, we define addition and
scalar multiplication by
(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, αa2)
Then C2 is a complex vector space.
Note:
The real vector spaces Rn, Mm,n(R), Pn(R) and F(S ,R) have complex
analogues denoted by Cn, Mm,n(C), Pn(C) and F(S ,C).
144
Example 4.
Consider V = R2. Let u = (a1, a2) ∈ R2, v = (b1, b2) ∈ R2 and
α ∈ R. Define addition and scalar multiplication as follows:
(a1, a2) + (b1, b2) = (a1 + b1, a2 + b2)
α(a1, a2) = (αa1, 0)
Show that V is not a vector space with these operations.
Solution:
Find a specific counterexample:
145
Applications of general vector spaces
Regarding matrices and functions as vector spaces allows geometrical
notions such as distance, angles and projection to be extended to these
objects. (See Topic 7.)
Example 5.
Let A =
[ a11 a12 ]
[ a21 a22 ]
and B =
[ b11 b12 ]
[ b21 b22 ]
so that A, B ∈ M2,2.
The distance between A and B can be defined as
√((a11 − b11)^2 + (a12 − b12)^2 + (a21 − b21)^2 + (a22 − b22)^2)
This is useful in comparing digital photographs since the data is
naturally a matrix (each pixel gives an entry). Image compression using
singular value decomposition is an application of these ideas.
146
Example 6.
The vector space Fn2 over F2 is used in coding theory applications to
reduce errors occurring in data storage and message transmission.
One way of coding data is to use a Hamming code so that errors can
not only be detected but also corrected efficiently.
1. Data is stored in the form of a string of 1s and 0s. These 1s and 0s
are called binary numbers.
2. Using mod 2 arithmetic enables checking and correction of
digital information.
147
4.2 Subspaces
Definition (Subspace)
A subspace W of a vector space V is a subset W ⊆ V that is itself a
vector space using the addition and scalar multiplication operations
defined on V .
The following theorem allows us to avoid checking all 10 vector space
axioms.
Theorem (Subspace theorem)
Let V be a vector space with scalars F. A subset W ⊆ V is a subspace
of V if and only if
0. W contains the zero vector 0 from V .
1. W is closed under vector addition:
u, v ∈W ⇒ u+ v ∈W.
2. W is closed under scalar multiplication:
u ∈W , α ∈ F⇒ αu ∈W.
148
Example 7.
Show that S = {(x , y , z) ∈ R3 | x + y + z = 0} is a subspace of R3.
Solution:
Define two vectors in S and a scalar in R
0. S contains the zero vector from R3
149
1. Closure under vector addition
2. Closure under scalar multiplication
150
Some geometric examples
1. The only subspaces of R2 are
• {0}
• Lines through the origin
• R2
2. The only subspaces of R3 are
• {0}
• Lines through the origin
• Planes through the origin
• R3
Example 8.
Is the line {(x , y) ∈ R2 | y = 2x + 1} a subspace of R2?
Solution:
151
Example 9.
Let S ⊂ P2 be the set of polynomials defined by
S = {a1x + a2x2 | a1, a2 ∈ R}
Is S a subspace of P2?
Solution:
Define two vectors in S and a scalar in R
0. S contains the zero vector from P2
152
1. Closure under vector addition
2. Closure under scalar multiplication
153
Definition (Trace)
If A is an n × n matrix, then the trace of A is the sum of the diagonal
entries, namely:
Tr(A) = A11 + A22 + ...+ Ann
Example 10.
Let W be the set of real 2 × 2 matrices whose trace is equal to 0. Show
that W is a subspace of M2,2.
Solution:
Define two vectors in W and a scalar in R
154
0. W contains the zero vector
1. Closure under vector addition
155
2. Closure under scalar multiplication
156
Example 11.
Let W be the set of real 2× 2 matrices with determinant equal to 0:
W = { [ a b ; c d ] | a, b, c, d ∈ R and ad − bc = 0 }
Show that W is not a subspace of M2,2.
Solution:
Find a specific counterexample
157
4.3 Linear combinations and spans
Linear combinations
Let V be a vector space with scalars F.
Definition (Linear combination)
A linear combination of vectors v1, v2, . . . , vk ∈ V is the sum of scalar
multiples of the vectors:
Σ_{i=1}^{k} αivi = α1v1 + · · · + αkvk,
where each αi ∈ F.
Example
(5, 1) is a linear combination of (1,−1) and (1, 2) since
3(1,−1) + 2(1, 2) = (5, 1)
158
Example 12. Computer graphics application
Linear combinations are used to generate colours in computer graphics.
Three primary colours are represented as unit vectors in R3.
Red := (1, 0, 0) Green := (0, 1, 0) Blue := (0, 0, 1)
To generate a general colour we form a linear combination
α(1, 0, 0) + β(0, 1, 0) + γ(0, 0, 1)
with α, β, γ chosen between 0 and 1.
Examples
(colour swatch) has α = 0.8, β = 0.47, γ = 0.25
(colour swatch) has α = 0.8, β = 0.12, γ = 0.5
Note:
These primary colours are different from the standard primary colours
red, blue, yellow.
159
Example 13.
Determine whether (1, 2, 3) ∈ R3 is a linear combination of (1,−1, 2)
and (−1, 1, 2).
Solution:
160
161
Example 14.
In P2, determine whether q(x) = 1 − 2x − x2 is a linear combination of
p1(x) = 1 + x + x2 and p2(x) = 3 + x2.
Solution:
162
163
Spans
We can use linear combinations to produce vector equations for lines
and planes.
Definition (Span of a set of vectors)
Let V be a vector space defined over the field F. The span of vectors
v1, . . . , vk ∈ V is the set of all linear combinations of these vectors:
span{v1, . . . , vk} = {α1v1 + · · ·+ αkvk | α1, . . . , αk ∈ F}.
Theorem
Let V be a vector space and v1, . . . , vk ∈ V . Then span{v1, . . . , vk} is
a subspace of V .
Proof:
Check the conditions of the subspace theorem.
164
What does the span of a set of vectors look like?
Consider 3 vectors v1, v2, v3 in R3. Then span{v1, v2, v3} could be:
1. {0}
2. A line through the origin
3. A plane through the origin
4. R3
Note:
A span of 4 or more vectors in R3 will also give one of the above sets
because these are the only possible subspaces of R3.
Example
The span of the vectors u and v is a plane through the origin.
165
Example 15.
Let v1 = (1, 1, 1), v2 = (2, 2, 2), v3 = (3, 3, 3). In R3, what is
span{v1, v2, v3}?
Solution:
166
We can also use the word “span” as a verb.
Definition (Vectors spanning a space)
Let V be a vector space. The vectors v1, . . . , vk ∈ V span V if
span{v1, . . . , vk} = V .
Then we also say that {v1, . . . , vk} is a spanning set for V .
Note:
The vectors v1, . . . , vk span V if and only if
(1) V contains all the vectors v1, . . . , vk , and
(2) all vectors in V are linear combinations of the vectors v1, . . . , vk .
167
Example 16.
Show that the vectors (1,−1), (2, 4) span R2.
Solution:
168
169
Example 17.
Do the vectors (1, 2, 0), (1, 5, 3), (0, 1, 1) span R3?
Solution:
170
171
General method to determine if vectors span Rn
If A = [v1 · · · vk ] where v1, · · · , vk ∈ Rn are column vectors, then:
1. v1, · · · , vk span Rn ⇔ the linear system with augmented matrix
[A | w] has a solution for all w ∈ Rn.
2. v1, . . . , vk span Rn ⇔ rank(A) = n.
Theorem
If k < n, then vectors v1, . . . , vk ∈ Rn do not span Rn.
Proof:
Since A = [v1 · · · vk] has k columns, rank(A) ≤ k.
Since k < n, this implies that rank(A) ≠ n.
Hence v1, . . . , vk do not span Rn.
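A sketch of the rank test (ours, using SymPy) on the vectors of Example 17:

    import sympy as sp

    # Columns are (1, 2, 0), (1, 5, 3), (0, 1, 1)
    A = sp.Matrix([[1, 1, 0],
                   [2, 5, 1],
                   [0, 3, 1]])
    print(A.rank() == 3)   # True exactly when the vectors span R^3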
172
Example 18.
Do the polynomials p1(x) = 1 + x + x2 and p2(x) = x2 span P2?
Solution:
173
4.4 Linear dependence and linear independence
A vector space can have many spanning sets.
For example, R2 is spanned by any of the following sets:
1. {(1, 0), (0, 1)}
2. {(1, 1), (1, 2)}
3. {(1, 1), (1, 2), (3, 5)}
4. {(1, 0), (0, 1), (0, 0), (1, 1), (1, 2), (3, 5)}
To get a small spanning set, we want to remove “redundant” vectors.
This leads to the ideas of linear dependence and linear independence.
174
Let V be a vector space with scalars F.
Definition (Linear dependence)
The vectors v1, . . . , vk ∈ V are linearly dependent if there exist scalars
α1, . . . , αk ∈ F, with at least one αi non-zero, such that
Σ_{i=1}^{k} αivi = α1v1 + · · · + αkvk = 0.
Definition (Linear independence)
The vectors v1, . . . , vk ∈ V are linearly independent if they are not
linearly dependent. Equivalently, if α1, . . . , αk ∈ F are scalars such that
Σ_{i=1}^{k} αivi = α1v1 + · · · + αkvk = 0,
then the only solution is α1 = 0, α2 = 0, . . . , αk = 0.
175
Note:
Any set of vectors containing the zero vector 0 is linearly dependent.
Theorem
The vectors v1, . . . , vk (k ≥ 2) are linearly dependent if and only if one
vector is a linear combination of the others.
In particular:
Two vectors are linearly dependent if and only if one is a multiple of the
other.
176
Example 19.
Are the following vectors linearly dependent or linearly independent?
(a) u = (2,−1, 1), v = (−6, 3,−3)
(b) u = (2,−1, 1), v = (4, 0, 2)
Solution:
177
Example 20.
Are the vectors (2, 0, 0), (6, 1, 7), (2,−1, 2) linearly dependent or linearly
independent?
Solution:
178
Note:
The augmented matrix has the original vectors as its columns.
179
General method to determine if vectors in Rn are linearly independent
If A = [v1 · · · vk ] where v1, · · · , vk ∈ Rn are column vectors, then:
1. v1, · · · , vk ∈ Rn are linearly independent ⇔ the linear system with
augmented matrix [A | 0] has a unique solution.
2. v1, . . . , vk ∈ Rn are linearly independent ⇔ rank(A) = k.
3. If k = n, then v1, . . . , vn ∈ Rn are linearly independent ⇔ det(A) ≠ 0.
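A sketch of the determinant test in case 3 (ours, using SymPy) on the vectors of Example 20:

    import sympy as sp

    # Columns are (2, 0, 0), (6, 1, 7), (2, -1, 2)
    A = sp.Matrix([[2, 6,  2],
                   [0, 1, -1],
                   [0, 7,  2]])
    print(A.det() != 0)   # True exactly when the vectors are independent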
180
Theorem
Let v1, . . . , vk be vectors in Rn. If k > n, then the vectors are linearly
dependent.
Proof:
Let A = [v1 · · · vk ] where vi ∈ Rn.
Then rank(A) ≤ n.
Since k > n, this implies that rank(A) ≠ k.
Hence the vectors are linearly dependent.
Note:
The results and methods on linear combinations and linear dependence
for vectors in Rn also apply to vectors in Fn for any field F, e.g. C, F2.
181
Example 21.
Determine if the following polynomials
2 + 2x + 5x2, 1 + x + x2, 1 + 2x + 3x2
are linearly independent.
Solution:
182
183
Example 22.
Are the following matrices linearly independent or linearly dependent?
[ 1 3 ]   [ −2  1 ]   [ 1 10 ]
[ 1 1 ],  [  1 −1 ],  [ 4  2 ].
.
Solution:
184
185
Linear combinations via reduced row-echelon form
Let A = [v1 · · · vk ] and B = [u1 · · · uk ] be matrices with vectors in
columns.
If A and B are row-equivalent (i.e., A ∼ B), then the columns of A
satisfy the same linear relations as the columns of B :
α1v1 + · · ·+ αkvk = 0 ⇔ α1u1 + · · · + αkuk = 0
If A ∼ B and B is in reduced row-echelon form, then the relations
between the vectors v1, . . . , vk can be read off from B .
186
Example 23.
Consider the vectors v1 = (1, 2, 3), v2 = (3, 6, 9), v3 = (−1, 0,−2) and
v4 = (1, 4, 4) in R3.
Given that:
A =
[ 1 3 −1 1 ]   [ 1 3 0 2 ]
[ 2 6  0 4 ] ∼ [ 0 0 1 1 ] = B.
[ 3 9 −2 4 ]   [ 0 0 0 0 ]
(a) Are the vectors linearly independent?
(b) Express v2 and v4 as a linear combination of v1 and v3.
(c) Are the vectors v1, v3 linearly independent?
Solution:
187
188
4.5 Bases and dimension
Sometimes spanning sets can be too big. To get an efficient description
of a vector space V , we seek a set with the minimum number of vectors
needed to span V .
Definition (Basis)
A basis for a vector space V is a set of vectors in V which:
1. spans V , and
2. is linearly independent.
Note:
A vector space V can have many different bases.
e.g. Any two linearly independent vectors in R2 will give a basis for R2.
189
Example 24.
Is {(1,−1), (2, 4)} a basis for R2?
Solution:
190
Example 25.
Recall that
W = {A ∈ M2,2 | Tr(A) = 0}
is a subspace of M2,2.
Is
{ [ 1 0 ; 0 −1 ], [ 0 1 ; 0 0 ], [ 0 0 ; 1 0 ] }
a basis for W?
Solution:
1. Are the vectors in W ?
191
2. Do the vectors span W ?
192
3. Are the vectors linearly independent?
193
Theorem (Existence of bases)
Let V be a vector space. Then
1. V has a basis.
2. Any set that spans V contains a basis for V .
3. Any linearly independent set in V can be extended to a basis of V .
Theorem
Let V be a vector space and let {v1, . . . , vk} be a basis for V .
1. A subset of V with more than k vectors is linearly dependent.
2. A subset of V with fewer than k vectors does not span V .
3. Any two bases of V have the same number of elements, k.
194
This allows us to define the dimension of a vector space.
Definition (Dimension)
The dimension of a vector space V is the number of vectors in any basis
for V . This is denoted by dim(V ).
We call V finite dimensional if it has a basis with a finite number of
elements, and infinite dimensional otherwise.
Note:
If V = {0}, we say that dim(V ) = 0.
195
Standard Bases
◮ The standard basis for Rn is
{(1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, 0, . . . , 0, 1)} .
These vectors are typically denoted as {e1, e2, · · · , en}.
(In R3 the standard basis is also denoted by {i, j, k})
The dimension of Rn is n.
◮ The standard basis for Mm,n is
[ 1 0 . . . 0 ]   [ 0 1 . . . 0 ]            [ 0 0 . . . 0 ]
[ 0 0 . . . 0 ]   [ 0 0 . . . 0 ]            [ 0 0 . . . 0 ]
[ ⋮ ⋮     ⋮ ] ,  [ ⋮ ⋮     ⋮ ] , . . . ,   [ ⋮ ⋮     ⋮ ]
[ 0 0 . . . 0 ]   [ 0 0 . . . 0 ]            [ 0 0 . . . 1 ]
The dimension of Mm,n is mn.
196
◮ The standard basis for Pn is {1, x, x2, . . . , xn}.
The dimension of Pn is n + 1.
◮ Let P be the set of all polynomials of all possible degrees:
P = {a0 + a1x + · · · + anxn | n ∈ N, a0, a1, . . . , an ∈ R}
where N denotes the set of natural numbers.
P is an infinite dimensional vector space, since for each integer
k ≥ 1 we can find a linearly independent set {1, x, . . . , xk} with
more than k elements.
The standard basis for P is
{1, x , x2, . . .}.
197
Theorem
Let V be a vector space, and suppose that dim(V ) = n is finite.
1. If a set with exactly n elements spans V , then it is a basis for V .
2. If a linearly independent subset of V has exactly n elements, then
it is a basis for V .
198
Example 26.
Is S = { [ 1 0 ; 0 −1 ], [ 0 1 ; 0 0 ], [ 0 0 ; 1 0 ] } a basis for M2,2?
Solution:
199
Example 27.
Let B = {2 + 2x + 5x2, 1 + x + x2, 1 + 2x + 3x2}. Is B a basis for P2?
Solution:
200
How to find a basis for the span of a set of vectors in Rn
Recall that if A ∼ B , then the columns of A and B satisfy the same
linear relationships.
Method 1: Using columns
To find a basis for V = span{v1, . . . , vk} where v1 · · · vk ∈ Rn.
1. Form the matrix A with columns v1, . . . , vk .
2. Reduce A to row-echelon form B .
3. The columns of A corresponding to the leading entries in B give a
basis for V .
Note:
This method gives a basis that is a subset of the original set of vectors.
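A sketch of the column method (ours, using SymPy) on the vectors of Example 28 below; rref() reports which columns carry leading entries:

    import sympy as sp

    # Columns are (1, 3, -1, 1), (2, 6, 0, 4), (3, 9, -2, 4)
    A = sp.Matrix([[ 1, 2,  3],
                   [ 3, 6,  9],
                   [-1, 0, -2],
                   [ 1, 4,  4]])
    _, pivots = A.rref()
    print(pivots)   # the columns of A at these indices form a basis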
201
Example 28.
Let S = {v1, v2, v3} = {(1, 3,−1, 1), (2, 6, 0, 4), (3, 9,−2, 4)}.
(a) Find a subset of S that is a basis for span(S).
(b) What is the dimension of span(S)?
Solution:
(a) We write the vectors as columns of a matrix and reduce to row
echelon form.
202
203
How to find a basis for the span of a set of vectors in Rn
Method 2: Using rows
To find a basis for V = span{v1, . . . , vk} where v1 · · · vk ∈ Rn.
1. Form the matrix A with rows v1, . . . , vk .
2. Reduce A to row-echelon form B .
3. The non-zero rows of B give a basis for V .
Note:
◮ This method gives a basis that will usually not be a subset of the
original set of vectors.
◮ The method works because elementary row operations do not
change the subspace spanned by the rows, and in the row echelon
form the non-zero rows are linearly independent.
204
Example 29.
Let
S = {(1, 3,−1, 1), (2, 6, 0, 4), (3, 9,−2, 4)} ⊂ R4.
Find a basis for W = span(S) and dim(W ) using the row method.
Solution:
We write the vectors as rows of a matrix and reduce to row echelon
form. Now
A =
[ 1 3 −1 1 ]   [ 1 3 0 2 ]
[ 2 6  0 4 ] ∼ [ 0 0 1 1 ] = B
[ 3 9 −2 4 ]   [ 0 0 0 0 ]
205
Example 30.
(a) Show that
W = {(a + b, 0, a − b, 0) | a, b ∈ R}
is a subspace of R4 and find a basis B for W .
(b) Extend B to a basis for R4.
Solution:
206
207
208
4.6 Solution space, column space, row space
Solution space of a matrix
Theorem (Solution Space)
Let A be a real m × n matrix. Then the solution set
S = {x ∈ Rn | Ax = 0}
is a subspace of Rn, with x regarded as a column vector.
This is called the solution space or nullspace of A, and its dimension is
called the nullity of A, denoted by nullity(A).
Proof:
Check the conditions of the subspace theorem.
209
Example 31.
Find a basis for the solution space of
x1 + 2x2 + x3 + x4 = 0
3x1 + 6x2 + 4x3 + x4 = 0
What is the dimension of the solution space?
Solution:
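A sketch with SymPy's nullspace() (ours), which returns a basis for the solution space directly:

    import sympy as sp

    A = sp.Matrix([[1, 2, 1, 1],
                   [3, 6, 4, 1]])
    basis = A.nullspace()     # basis vectors of {x | Ax = 0}
    for v in basis:
        print(v.T)
    print(len(basis))         # the nullity; rank-nullity gives 4 - rank(A)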
210
211
Note:
nullity(A) is the number of columns in the row-echelon form of A
not containing a leading entry.
212
Column space and row space of a matrix
Definition (Column space)
Let A be a real m × n matrix. The subspace of Rm spanned by the
columns of A is called the column space of A.
Definition (Row Space)
Let A be a real m × n matrix. The subspace of Rn spanned by the rows
of A is called the row space of A.
Note:
◮ The column space and row space are subspaces, since the span of
any set of vectors is a subspace.
◮ To find a basis for the column space use the column method.
◮ To find a basis for the row space use the row method.
213
Example 32.
Given that
A =
[ 1 −1  2 −2 ]   [ 1 −1  2 −2 ]
[ 2  0  1  0 ] ∼ [ 0  2 −3  4 ] = B,
[ 5 −3  7 −6 ]   [ 0  0  0  1 ]
[ 1  1 −1  3 ]   [ 0  0  0  0 ]
find a basis for the
(a) column space of A,
(b) row space of A.
What are the dimensions of the row space and the column space?
Solution:
214
215
Effect of row operations
Assume that A ∼ B by a sequence of elementary row operations. Then
◮ solution space of A = solution space of B ,
◮ row space of A = row space of B ,
◮ column space of A ≠ column space of B in general.
Theorem
For any m × n matrix A:
rank(A) = dim(row space of A) = dim(column space of A).
This is because in both cases the number of leading entries in
row-echelon form determines the number of basis vectors.
216
Rank-nullity theorem
Theorem
Suppose A is an m × n matrix. Then
rank(A) + nullity(A) = n = number of columns in A.
Proof:
Suppose A has row echelon form B .
The rank of A equals the number of columns in B containing a leading
entry.
The nullity of A equals the number of columns in B which do not
contain a leading entry.
Therefore, rank(A) + nullity(A) is equal to the number of columns in B ,
which is equal to the number of columns in A.
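A one-line numerical check of this theorem, sketched with SymPy on the coefficient matrix of Example 31:

    import sympy as sp

    A = sp.Matrix([[1, 2, 1, 1],
                   [3, 6, 4, 1]])
    # rank(A) + nullity(A) equals the number of columns of A
    assert A.rank() + len(A.nullspace()) == A.cols   # 2 + 2 == 4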
217
Other fields of scalars
Our work on vector spaces applies using any field of scalars F instead of
the real numbers R. For example, we could use F = C or F = F2.
Then
Fn = {(x1, . . . , xn) | x1, . . . , xn ∈ F}
is a vector space using F as scalars.
Given any m × n matrix A ∈ Mm,n(F) with entries in F, we have:
◮ the solution space of A is a subspace of Fn
◮ the column space of A is a subspace of Fm
◮ the row space of A is a subspace of Fn
The previous theorems also apply in this context.
218
Example 33.
Consider the matrix
A = [ 1 1 0 ; 1 0 1 ] ∈ M2,3(F2).
Determine the column space, row space and solution space for A.
Note:
Since the field is F2 all matrix entries are in {0, 1} and the operations of
vector addition and scalar multiplication are done using mod 2
arithmetic.
Solution:
Reduce A to reduced row-echelon form using mod 2 arithmetic.
219
Column space of A
220
Row space of A
221
Solution space of A
We need to solve the following system using mod 2 arithmetic
[ 1 1 0 ; 1 0 1 ] [ x1 ; x2 ; x3 ] = [ 0 ; 0 ].
222
Note:
The rank-nullity theorem holds as rank(A) = 2, nullity(A) = 1 and A
has 3 columns.
223
4.7 Coordinates relative to a basis
Theorem
If {v1, . . . , vn} is a basis for a vector space V , then every vector v ∈ V
can be written uniquely in the form
v = α1v1 + . . .+ αnvn,
where α1, . . . , αn ∈ F.
Proof:
Since {v1, . . . , vn} is a basis, it spans V and so for some scalars
α1, . . . , αn ∈ F
v = α1v1 + . . .+ αnvn (1)
Are these scalars unique? Suppose
v = β1v1 + . . .+ βnvn (2)
for some other scalars β1, . . . , βn ∈ F.
224
Taking (2)− (1) gives
0 = (β1 − α1)v1 + . . . + (βn − αn)vn.
Since {v1, . . . , vn} is a basis, v1, . . . , vn are linearly independent, and
the only solution is
β1 − α1 = 0, . . . , βn − αn = 0
⇒ α1 = β1, . . . , αn = βn
showing the uniqueness of the scalars α1, . . . , αn.
225
Definition (Coordinates)
Suppose B = {v1, . . . , vn} is an ordered basis for a vector space V ,
i.e. a basis arranged in the order v1 first, v2 second and so on.
For v ∈ V we can write
v = α1v1 + · · ·+ αnvn for some scalars α1, . . . , αn ∈ F.
The scalars α1, . . . , αn are uniquely determined by v and are called the
coordinates of v relative to B.
Then
[v]B =
α1
...
αn
is called the coordinate vector of v with respect to B.
226
Example 34.
Consider the vector v = (1, 5) ∈ R2. Determine the coordinate vector of
v with respect to the ordered bases:
(a) B = {(1, 0), (0, 1)}
(b) B′ = {a,b} = {(2, 1), (−1, 1)}
Solution:
227
[Figure: the vector (1, 5) drawn in the (x , y)-plane in two ways: as 1i + 5j on the standard grid, and as 2a + 3b on the grid formed by a and b.]
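The coordinates in the figure can be checked numerically: finding the coordinate vector amounts to solving a linear system whose coefficient matrix has the basis vectors as columns. A minimal NumPy sketch:

    import numpy as np

    # Coordinates of v = (1, 5) relative to B' = {(2, 1), (-1, 1)}:
    # solve alpha*(2, 1) + beta*(-1, 1) = (1, 5).
    P = np.array([[2.0, -1.0],
                  [1.0,  1.0]])        # basis vectors as columns
    v = np.array([1.0, 5.0])
    print(np.linalg.solve(P, v))       # [2. 3.], so v = 2a + 3b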
228
Example 35.
Find the coordinate vector of p(x) = 2 + 7x − 9x2 with respect to the
ordered basis for P2 given by
B = {2, (1/2)x ,−3x2}.
Solution:
229
Having fixed a finite ordered basis B for a vector space V , there is a
one-to-one correspondence between vectors and their coordinate
vectors.
Theorem
Let V be a finite dimensional vector space over a field of scalars F. Let
B be an ordered basis of V .
If u, v ∈ V and α ∈ F is a scalar, then
[u+ v]B = [u]B + [v]B
[αv]B = α[v]B .
This theorem tells us that taking coordinates is a linear transformation
between the vector spaces V and Fn.
230
Topic 5: Linear Transformations [AR 4.9-4.11, 8.1-8.5]
5.1 General linear transformations
5.2 Geometric linear transformations from R2 to R2
5.3 Matrix representations for general linear transformations
5.4 Image, kernel, rank and nullity
5.5 Invertible linear transformations
5.6 Change of basis
231
5.1 General linear transformations
We now consider functions or mappings between general vector spaces
that preserve the vector space structure of vector addition and scalar
multiplication.
Definition (Linear transformation)
Let V and W be vector spaces over the same field of scalars F.
A linear transformation from V to W is a function T : V →W such
that for all u, v ∈ V and all α ∈ F:
1. T (u+ v) = T (u) + T (v) (T preserves vector addition)
2. T (αu) = αT (u) (T preserves scalar multiplication)
Note:
Taking α = 0 in condition 2 shows that
T (0V ) = 0W
i.e. T maps the zero vector in V to the zero vector in W .
232
Example 1.
Show that the cross product T : R3 → R3 given by
T (u) = u× (1, 1,−1)
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
1. T preserves vector addition
233
2. T preserves scalar multiplication
234
Example 2.
Show that the function D : P3 → P2 defined by differentiation with
respect to x ,
D(a0 + a1x + a2x2 + a3x3) = a1 + 2a2x + 3a3x2
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
1. D preserves vector addition
235
2. D preserves scalar multiplication
236
Example 3.
Show that the determinant T : M2,2 → R given by
T ([ a b ; c d ]) = | a b ; c d | = ad − bc
is not a linear transformation.
Solution:
Find a specific counterexample
Method 1 - vector addition
237
Method 2 - scalar multiplication
238
Example 4.
Prove that the function T : R3 → R2 given by
T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3)
is a linear transformation.
Solution:
Define two vectors in the domain and a scalar
1. T preserves vector addition
239
2. T preserves scalar multiplication
240
Note:
◮ We can write T in matrix form using coordinate vectors:
[T (x1, x2, x3)] = [ x2 − 2x3 ; 3x1 + x3 ] = [ 0 1 −2 ; 3 0 1 ] [ x1 ; x2 ; x3 ]
The matrix
[T ] = [ 0 1 −2 ; 3 0 1 ]
is called the standard matrix representation for T .
◮ All linear transformations from Rn to Rm can be represented by
matrices.
241
Theorem
Let S = {e1, . . . , en} and S ′ = {e1, . . . , em} be the ordered standard
bases for Rn and Rm, and let v ∈ Rn.
Every linear transformation T : Rn → Rm has a standard matrix
representation [T ] specified by the m × n matrix
[T ] = [ [T (e1)] [T (e2)] · · · [T (en)] ]
where [T (ei )] denotes the coordinate vector of T applied to ei . Then
[T (v)] = [T ] [v] for all v ∈ Rn,
where [v] is the coordinate vector for v with respect to S, and
[T (v)] is the coordinate vector for T (v) with respect to S ′
242
Proof:
Let v ∈ Rn and write
v = α1e1 + α2e2 + · · ·+ αnen.
The coordinate vector of v with respect to the standard basis is
[v] = [ α1 ; · · · ; αn ]
Since T is a linear transformation
[T (v)] = [T (α1e1 + α2e2 + · · ·+ αnen)]
= [T (α1e1)] + [T (α2e2)] + · · ·+ [T (αnen)]
= α1[T (e1)] + α2[T (e2)] + · · ·+ αn[T (en)]
= [ [T (e1)] [T (e2)] · · · [T (en)] ] [ α1 ; · · · ; αn ]
= [T ] [v]
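A short NumPy sketch of this construction, using the map T (x1, x2, x3) = (x2 − 2x3, 3x1 + x3) from Example 4: the standard matrix is assembled column by column from the images of the standard basis vectors.

    import numpy as np

    def T(x):
        x1, x2, x3 = x
        return np.array([x2 - 2*x3, 3*x1 + x3])

    E = np.eye(3)
    M = np.column_stack([T(E[:, j]) for j in range(3)])
    print(M)                           # [[0. 1. -2.], [3. 0. 1.]]
    v = np.array([1.0, 2.0, 3.0])
    np.testing.assert_allclose(M @ v, T(v))   # [T][v] = [T(v)]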
243
5.2 Geometric linear transformations from R2 to R2
Many familiar geometric transformations are linear transformations
T : R2 → R2.
Two geometric properties of linear transformations are:
1. T maps every line in R2 to a line (or a point) in R2.
Proof:
A line has a vector equation
{v0 + tv | t ∈ R}.
Then
T (v0 + tv) = T (v0) + T (tv)
= T (v0) + tT (v)
This is the line through T (v0) in the direction T (v), or a point if
T (v) = 0.
244
2. T maps every parallelogram in R2 with vertices
0, v,w, v + w
to a (possibly degenerate) parallelogram with vertices
0,T (v),T (w),T (v + w).
[Figure: the parallelogram with vertices 0, v, w, v + w and its image, the parallelogram with vertices 0, T (v), T (w), T (v + w).]
245
Example 5.
Determine the standard matrix representation in R2 for reflection across
the y -axis.
[Figure: a shape in the (x , y)-plane and its mirror image under reflection across the y -axis.]
Solution:
246
Example 6.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x .
Solution:
247
248
249
Example 7.
Determine the standard matrix representation in R2 for rotation around
the origin anticlockwise by an angle of θ.
[Figure: the standard basis vectors e1 and e2 rotated anticlockwise by θ to T (e1) and T (e2).]
250
Solution:
Consider the effect of the transformation on the standard basis vectors.
251
Example 8.
Determine the standard matrix representation in R2 for the
compression/expansion along the x-axis by a factor c > 0.
[Figure: the unit square mapped to the rectangle with width c and height 1.]
Solution:
Consider the effect of the transformation on the standard basis vectors.
252
Example 9.
Determine the standard matrix representation in R2 for the shear by a
factor c along the x-axis.
[Figure: the unit square sheared along the x-axis; the point (x , y) moves to (x + cy , y), so the top edge shifts by c.]
Solution:
Consider the effect of the transformation on the standard basis vectors.
253
Composition of linear transformations
Recall:
If S : U → V and R : V →W are functions, then the composition
R ◦ S : U → W
is defined by doing S followed by R :
(R ◦ S)(u) = R(S(u)) for all u ∈ U.
Theorem
Suppose R : Rk → Rm and S : Rn → Rk are linear transformations with
standard matrices [R ] and [S ] respectively.
Then the composition R ◦ S : Rn → Rm is a linear transformation with
standard matrix
[R ◦ S ] = [R ] [S ].
254
Example 10.
Find the image of (x , y) after a shear along the x-axis with c = 1
followed by a reflection across the y -axis.
Solution:
From examples 5 and 9
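A sketch of the computation with NumPy, multiplying the two standard matrices in the correct order (the shear is applied first, so its matrix sits on the right):

    import numpy as np

    shear   = np.array([[1.0, 1.0],
                        [0.0, 1.0]])   # example 9 with c = 1
    reflect = np.array([[-1.0, 0.0],
                        [ 0.0, 1.0]])  # example 5
    composed = reflect @ shear         # [R o S] = [R][S]
    print(composed @ np.array([1.0, 2.0]))    # image of (1, 2) is (-3, 2)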
255
5.3 Matrix representations for general linear
transformations
Let U,V be finite-dimensional vector spaces with the same scalars F.
Suppose that
◮ T : U → V is a linear transformation.
◮ B = {b1, . . . ,bn} is an ordered basis for U.
◮ C = {c1, . . . , cm} is an ordered basis for V .
Then for any u ∈ U, the coordinate vector for the image of u under T
with respect to basis C , is given by
[T (u)]C = [T ]C,B[u]B
where [T ]C,B is a suitable m × n matrix.
256
Theorem
There exists a unique matrix satisfying
[T (u)]C = [T ]C,B[u]B for all u ∈ U
which is given by
[T ]C,B = [ [T (b1)]C [T (b2)]C · · · [T (bn)]C ] .
Note:
◮ The matrix [T ]C,B is called the matrix of T with respect to the
bases B and C.
◮ For the case U = V and B = C, we often write [T ]B instead of
[T ]B,B.
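For instance, the differentiation map D : P3 → P2 of Example 2 has an easily written matrix with respect to the monomial bases B = {1, x , x2, x3} and C = {1, x , x2}; a minimal NumPy sketch:

    import numpy as np

    # Column j holds the C-coordinates of D applied to the j-th vector of B:
    # D(1) = 0, D(x) = 1, D(x^2) = 2x, D(x^3) = 3x^2.
    D = np.array([[0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 0.0],
                  [0.0, 0.0, 0.0, 3.0]])
    p = np.array([5.0, 1.0, 4.0, 2.0]) # coordinates of 5 + x + 4x^2 + 2x^3
    print(D @ p)                       # [1. 8. 6.], i.e. 1 + 8x + 6x^2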
257
Example 11.
Consider the linear transformation T : M2,2 → M2,2 where T is defined
by
T (Q) = QT .
Find the matrix representation of T with respect to the ordered basis
B = { [ 1 0 ; 0 0 ], [ 0 1 ; 0 0 ], [ 0 0 ; 1 0 ], [ 0 0 ; 0 1 ] }.
Solution:
Find the image of each basis vector in B under T , and express it as a
coordinate vector using the basis B.
258
259
Example 12.
Find [T ]C,B for the linear transformation T : P2 → P1 given by
T (a0 + a1x + a2x2) = (a0 + a2) + a0x
using the ordered basis B = {1, x , x2} for P2 and the ordered basis
(a) C = {1, x} for P1.
(b) C = {2, 3x} for P1.
Solution:
1. Find the image of each basis vector in B under T .
260
2. Express each image as a coordinate vector with respect to basis C.
261
262
Example 13.
A linear transformation T : R3 → R2 has matrix
[T ] = [ 5 1 0 ; 1 5 −2 ]
with respect to the ordered standard bases of R3 and R2.
What is the transformation matrix with respect to the ordered bases
B = {(1, 1, 0), (1,−1, 0), (1,−1,−2)} of R3
and
C = {(1, 1), (1,−1)} of R2?
Solution:
1. Find the coordinates of the image of each vector in B with respect to
the standard basis.
263
2. Express each image as a coordinate vector with respect to basis C.
264
265
5.4 Image, kernel, rank and nullity
Definition (Kernel and Image)
Let T : U → V be a linear transformation.
The kernel (or nullspace) of T is defined to be
ker(T ) = {u ∈ U | T (u) = 0}.
The image (or range) of T is defined to be
Im(T ) = {v ∈ V | v = T (u) for some u ∈ U}.
Note:
◮ The kernel is a subspace of U. Its dimension is denoted by
nullity(T ).
◮ The image is a subspace of V . Its dimension is denoted by
rank(T ).
266
Example 14.
Find the kernel and image of the linear transformation D : P3 → P2
given by differentiation with respect to x :
D(a0 + a1x + a2x2 + a3x3) = a1 + 2a2x + 3a3x2.
Solution:
267
Image and kernel from matrix representations
Given a linear transformation T : U → V and choices of ordered bases
B for U and C for V , we can represent T by the matrix
[T ] = [T ]C,B.
◮ To find ker(T ) we solve equations of the form [T ][x] = [0], so the
solution space for [T ] gives the coordinate vectors of the elements
of ker(T ) with respect to the basis B.
◮ If B is a basis for U then T (B) spans T (U). The columns of [T ]
give the C-coordinates of the elements of T (B), so the column
space of [T ] gives the coordinate vectors of the elements of Im(T )
with respect to the basis C.
◮ Hence, nullity(T ) = nullity([T ]) and rank(T ) = rank([T ]).
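As an illustration, a sketch with SymPy, anticipating Example 15 below, where T (x , y , z) = (2x − y , y + z) and [T ] is the standard matrix:

    import sympy as sp

    M = sp.Matrix([[2, -1, 0],
                   [0,  1, 1]])
    print(M.nullspace())               # one vector, a multiple of (1, 2, -2)
    print(M.columnspace())             # two vectors spanning R^2 = Im(T)
    print(M.rank() + len(M.nullspace()))      # 3 = number of columns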
268
Using the rank-nullity theorem for matrices gives the following:
Theorem (Rank-Nullity Theorem)
For a linear transformation T : U → V with dim(U) = n,
rank(T ) + nullity(T ) = n.
269
Example 15.
Consider the linear transformation T : R3 → R2 defined by
T (x , y , z) = (2x − y , y + z).
Find bases for Im(T ) and ker(T ). Verify the rank-nullity theorem.
Solution:
1. Image of T
270
2. Kernel of T
271
3. Verify rank-nullity theorem
272
Example 16.
Consider the linear transformation T : P2 → P1 defined by
T (a0 + a1x + a2x2) = (a0 − a1 + a2)(1 + 2x)
Find bases for Im(T ) and ker(T ).
Solution:
1. Image of T
273
2. Kernel of T
274
275
5.5 Invertible linear transformations
Definition (Injective, surjective)
A function T : U → V is
1. injective or one-to-one if T (x) = T (y) ⇒ x = y.
2. surjective or onto if T (U) = V .
For linear transformations, injectivity simplifies to the following:
Theorem
A linear transformation T : U → V is injective if and only if
ker(T ) = {0}.
276
Example 17.
Consider the linear transformation T : P2 → P1 defined by
T (a0 + a1x + a2x2) = (a0 − a1 + a2)(1 + 2x).
(a) Is T injective?
(b) Is T surjective?
Solution:
From example 16:
277
Definition (Invertible)
A function T : U → V is invertible if there exists a function S : V → U
such that
◮ (S ◦ T )(u) = u for all u ∈ U, and
◮ (T ◦ S)(v) = v for all v ∈ V .
Then S is called the inverse of T and is denoted by T−1.
278
Theorem
Let T : U → V be a linear transformation.
1. T is invertible if and only if it is both injective and surjective.
2. If T is invertible, then its inverse T−1 is also a linear
transformation.
3. Assume that U and V have finite ordered bases B and C,
respectively. Then T is invertible if and only if [T ]C,B is invertible,
and
[T−1]B,C = ([T ]C,B)−1.
This requires dim(U) = dim(V ).
Note:
An invertible linear transformation T is also called an isomorphism.
279
Example 18.
Is the rotation T : R2 → R2 around the origin anticlockwise by an angle
θ, with standard matrix
[T ] = [ cos θ − sin θ ; sin θ cos θ ],
an invertible linear transformation? If it is, find [T−1].
Solution:
280
Example 19.
Consider the linear transformation T : R2 → R2 for orthogonal
projection onto the x-axis, with standard matrix
[T ] = [ 1 0 ; 0 0 ].
[Figure: the unit square and its image under orthogonal projection onto the x-axis.]
(a) Is T surjective?
(b) Is T injective?
(c) Is T invertible?
Solution:
281
282
5.6 Change of basis
Transition Matrices
Let B = {b1, . . . ,bn} and C = {c1, . . . , cn} be ordered bases for the
same vector space V with field of scalars F.
Given v ∈ V , how are [v]B and [v]C related?
Theorem
There exists a unique matrix PC,B such that for any vector v ∈ V ,
[v]C = PC,B[v]B.
The matrix PC,B is given by
PC,B = [ [b1]C · · · [bn]C]
and is called the transition matrix from B to C.
283
Proof:
We want to find a matrix PC,B such that for all vectors v ∈ V ,
[v]C = PC,B[v]B.
If T : V → V is a linear transformation, then
[T (v)]C = [T ]C,B[v]B
where
[T ]C,B =
[
[T (b1)]C [T (b2)]C · · · [T (bn)]C
]
is the matrix representation of T with respect to bases B and C.
284
For the special case of the identity function
T (v) = v for all v ∈ V ,
this gives
[T ]C,B =
[
[b1]C [b2]C . . . [bn]C
]
and
[v]C = [T ]C,B[v]B.
So
PC,B = [T ]C,B.
Note:
The transition matrix PC,B is invertible since T is invertible, and
(PC,B)−1 = PB,C .
285
Example 20.
Consider the following ordered bases for R2
B = {(1, 1), (−1, 1)} and S = {(1, 0), (0, 1)} .
(a) Determine the transition matrix from B to S.
(b) If [v]B = [ 1 ; 1 ], compute [v]S .
Solution:
286
Example 21.
Consider the following ordered bases for R2
B = {(1, 1), (−1, 1)} and S = {(1, 0), (0, 1)} .
(a) Determine the transition matrix from S to B.
(b) If [v]S = [ 0 ; 2 ], compute [v]B.
Solution:
287
Calculating a general transition matrix from basis B to basis C
We have
[v]S = PS,B[v]B and [v]C = PC,S [v]S .
Combining these, we get
[v]C = PC,S [v]S = PC,SPS,B[v]B .
Since the transition matrix is unique, we get
PC,B = PC,SPS,B = (PS,C)−1PS,B.
288
Example 22.
Consider the following ordered bases for R2
B = {(1, 2), (1, 1)} and C = {(−3, 4), (1,−1)}.
Find the transition matrix from B to C.
Solution:
289
Note:
Writing vectors in B as linear combinations of vectors in C gives:
(1, 2) = 3(−3, 4) + 10(1,−1)
(1, 1) = 2(−3, 4) + 7(1,−1)
where the coefficients correspond to the coordinate vectors in the
columns of PC,B.
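A sketch of the same computation in NumPy, writing each basis as a matrix of columns relative to the standard basis:

    import numpy as np

    P_SB = np.array([[1.0, 1.0],
                     [2.0, 1.0]])      # columns: (1, 2) and (1, 1)
    P_SC = np.array([[-3.0,  1.0],
                     [ 4.0, -1.0]])    # columns: (-3, 4) and (1, -1)
    P_CB = np.linalg.inv(P_SC) @ P_SB  # P_{C,B} = (P_{S,C})^{-1} P_{S,B}
    print(P_CB)                        # [[3. 2.], [10. 7.]]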
290
How are [T ]C and [T ]B related?
Theorem
The matrix representations of T : V → V with respect to two ordered
bases C and B for V are related by the following equation:
[T ]B = PB,C[T ]CPC,B.
Proof:
We need to show that for all v ∈ V ,
[T (v)]B = PB,C[T ]CPC,B[v]B
Starting with the right-hand side we obtain
PB,C[T ]CPC,B[v]B = PB,C[T ]C [v]C (property of PC,B)
= PB,C[T (v)]C (property of [T ]C)
= [T (v)]B (property of PB,C)
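A numerical sketch of this relation, using the transformation of Example 24 later in this topic (T (x , y) = (3x − y ,−x + 3y) with B = {(1, 1), (−1, 1)}):

    import numpy as np

    T_S  = np.array([[ 3.0, -1.0],
                     [-1.0,  3.0]])    # standard matrix of T
    P_SB = np.array([[1.0, -1.0],
                     [1.0,  1.0]])     # columns: the vectors of B
    T_B  = np.linalg.inv(P_SB) @ T_S @ P_SB   # [T]_B = P_{B,S} [T]_S P_{S,B}
    print(T_B)                         # diag(2, 4)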
291
Example 23.
Determine the standard matrix representation in R2 for reflection in the
line y = 5x
using a change of basis from the ordered standard basis
S = {(1, 0), (0, 1)} to the ordered basis B = {u, v}, where
◮ u = (1, 5) is a vector in the direction of y = 5x
◮ v = (5,−1) is a vector perpendicular to y = 5x
292
Solution:
1. Find image of the vectors in B and write in terms of B to give [T ]B.
293
2. Find the transition matrices PS,B and PB,S .
294
3. Find the standard matrix representation of T .
295
Example 24.
Consider the linear transformation T : R2 → R2 given by
T (x , y) = (3x − y ,−x + 3y).
Let B = {(1, 1), (−1, 1)} be a basis for R2.
(a) Find the standard matrix representation of T .
(b) Find the matrix representation of T with respect to the basis B.
(c) Find [T (v)]B if [v]B = [ v1 ; v2 ]. Hence interpret T geometrically.
Solution:
296
297
298
Topic 6: Eigenvalues and Eigenvectors [AR Chap 5]
6.1 Definition of eigenvalues and eigenvectors
6.2 Finding eigenvalues
6.3 Finding eigenvectors
6.4 Diagonalisation
6.5 Matrix powers
299
6.1 Definition of eigenvalues and eigenvectors
We wish to identify subspaces W that are invariant under a linear
transformation T : V → V , i.e. with T (W ) ⊆W .
Example 1.
Let T : R2 → R2 be defined by a reflection in the line y = 5x .
(a) Identify two lines through the origin that are invariant under T .
(b) Find the image of the direction vectors for each line in (a).
300
Solution:
301
Algebraically, we seek non-zero vectors v such that
T (v) = λv
for some scalar λ. Then the 1-dimensional subspace W = span{v} is
invariant under T .
Geometrically,
◮ If λ = 0, then T maps v to 0.
◮ If λ is real and positive, then T rescales v by a factor λ.
◮ If λ is real and negative, then T rescales v by a factor λ and also
reverses the direction of v.
302
Let V be a vector space over a field of scalars F.
Definition (Eigenvalues and eigenvectors of transformations)
Let T : V → V be a linear transformation.
A scalar λ ∈ F is an eigenvalue of T if there is a non-zero vector v ∈ V
such that
T (v) = λv.
Then v is called an eigenvector of T with corresponding eigenvalue λ.
For each eigenvalue λ, the set
{v ∈ V | T (v) = λv}.
is a subspace of V , called the eigenspace of λ.
Note:
Each non-zero vector in this eigenspace is an eigenvector of T with
eigenvalue λ.
303
If we represent T by its matrix [T ]B with respect to an ordered basis B
for V , then we can find eigenvalues and eigenvectors by solving
[T ]B[v]B = λ[v]B, where v ≠ 0.
This leads to the following definition.
Definition (Eigenvalues and eigenvectors of matrices)
Let A be an n× n matrix with real or complex entries. A scalar λ ∈ C is
an eigenvalue of A if there is a non-zero column vector v ∈ Cn such that
Av = λv.
Then v is called an eigenvector of A, with corresponding eigenvalue λ.
Note:
For a real matrix, eigenvalues and eigenvectors can be real or complex.
304
6.2 Finding eigenvalues
Let A be an n× n matrix and v be a column vector in Rn or Cn.
If I denotes the n × n identity matrix, then the equation
Av = λv
can be rewritten in the form
Av− λv = 0
⇒ Av − λIv = 0
⇒ (A − λI )v = 0
The values of λ for which this equation has non-zero solutions are the
eigenvalues.
305
Theorem
The homogeneous linear system
(A − λI )v = 0
has a non-zero solution if and only if
det(A− λI ) = 0.
The eigenvalues of A are the values of λ which satisfy the characteristic
equation
det(A− λI ) = 0.
306
Example 2.
Find the eigenvalues of A = [ 1 4 ; 1 1 ].
Solution:
Solve det(A− λI ) = 0
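As a cross-check, a sketch of the same computation numerically (the characteristic polynomial here is λ2 − 2λ− 3):

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [1.0, 1.0]])
    # det(A - l*I) = (1 - l)^2 - 4 = l^2 - 2l - 3
    print(np.roots([1.0, -2.0, -3.0])) # [ 3. -1.]
    print(np.linalg.eigvals(A))        # the same eigenvalues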
307
Definition (Characteristic polynomial)
Let A be an n× n matrix. The determinant det(A− λI ) is a polynomial
in λ of degree n called the characteristic polynomial of A.
Definition (Multiplicity of eigenvalues)
The algebraic multiplicity of an eigenvalue λ0 is the power of λ− λ0 in
the characteristic polynomial.
The geometric multiplicity of an eigenvalue is the number of linearly
independent eigenvectors associated with it.
By the fundamental theorem of algebra, we can always find at least one
complex eigenvalue for any real or complex n × n matrix. Further, the
sum of the algebraic multiplicities must equal n.
308
Example 3.
Find the eigenvalues of A = [ 2 −3 6 ; 0 5 −6 ; 0 1 0 ].
What is the algebraic multiplicity of each eigenvalue?
What is the algebraic multiplicity of each eigenvalue?
Solution:
Solve det(A− λI ) = 0
309
310
Useful properties of eigenvalues
The eigenvalues λi (i = 1, · · · , n) of an n × n matrix A satisfy the
following properties:
1. Tr(A) = λ1 + · · ·+ λn = sum of the eigenvalues
2. det(A) = λ1 · · ·λn = product of the eigenvalues
These properties are useful for checking that you have calculated the
eigenvalues correctly.
311
Example 4.
Check these properties for the eigenvalues λ = 2, 2, 3 of the matrix
A = [ 2 −3 6 ; 0 5 −6 ; 0 1 0 ].
Solution:
312
Example 5.
Consider the matrix A = [ 0 1 ; −1 0 ].
(a) Find the eigenvalues of A.
(b) Interpret the matrix and eigenvalues geometrically.
Solution:
313
Note:
The matrix A also defines a linear transformation T : C2 → C2 where
T (v) = Av
for v ∈ C2.
Then T has complex eigenvalues.
314
Eigenvalues of matrices versus eigenvalues of linear transformations
Let A be an n× n matrix with complex entries. Then the eigenvalues of
A are the eigenvalues of the linear transformation T : Cn → Cn with
standard matrix A. For each eigenvalue λ,
{v ∈ Cn | Av = λv}
is a subspace of Cn, called the (complex) eigenspace of λ.
Let A be an n× n matrix with real entries. Then the real eigenvalues of
A are the eigenvalues of the linear transformation T : Rn → Rn with
standard matrix A. For each real eigenvalue λ,
{v ∈ Rn : Av = λv}
is a subspace of Rn called the (real) eigenspace of λ.
315
6.3 Finding eigenvectors
To find the eigenvectors of a matrix
Let A be an n× n matrix and I be the n × n identity matrix.
◮ For each eigenvalue λ, solve the linear system
(A− λI )v = 0.
The solution space is the eigenspace of λ.
◮ You always obtain at least one row of zeros in row-echelon form as
rank(A− λI ) < n. So there are non-zero solutions.
◮ Choose a basis for the eigenspace. The vectors in the basis are
linearly independent eigenvectors with eigenvalue λ.
Note:
The eigenvectors of a matrix are not unique. An eigenvector v can be
multiplied by any non-zero scalar and it will still be an eigenvector.
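A sketch of the eigenvector computation in NumPy, for the matrix of Examples 2 and 6. NumPy returns unit-length eigenvectors as the columns of the second output; rescaling them shows they are multiples of the hand-computed ones.

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [1.0, 1.0]])
    vals, vecs = np.linalg.eig(A)
    print(vals)                        # eigenvalues -1 and 3 (in some order)
    print(vecs / vecs[1, :])           # columns rescaled to (-2, 1) and (2, 1)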
316
Example 6.
Find the eigenvectors of the matrix A = [ 1 4 ; 1 1 ].
Solution:
From example 2, the eigenvalues are λ = −1, 3.
For each eigenvalue λ, solve (A− λI )v = 0
317
318
319
Example 7.
Find the eigenvectors of the matrix A = [ 2 −3 6 ; 0 5 −6 ; 0 1 0 ].
What is the geometric multiplicity of each eigenvalue?
Solution:
From example 3, the eigenvalues are λ = 2, 3.
For each eigenvalue λ, solve (A− λI )v = 0
320
321
322
Note:
The geometric multiplicity is always less than or equal to the algebraic
multiplicity.
323
Example 8.
Find the eigenvectors of the matrix A = [ 0 1 ; −1 0 ].
Solution:
From example 5, the eigenvalues are λ = ±i .
For each eigenvalue λ, solve (A− λI )v = 0
324
325
Note:
◮ The eigenvalues are complex conjugates.
◮ The eigenvectors are complex conjugates.
In general, the non-real eigenvalues and eigenvectors of a real matrix
occur in complex conjugate pairs.
326
6.4 Diagonalisation
Definition
A linear transformation T : V → V is diagonalisable if there is a basis
B = {b1, . . . ,bn} for V such that the matrix [T ]B is diagonal.
Definition
A complex n × n matrix A is diagonalisable (or diagonalisable over C) if
the linear transformation
T : Cn → Cn with T (v) = Av
is diagonalisable.
A real n × n matrix A is diagonalisable over R if the linear
transformation
T : Rn → Rn with T (v) = Av
is diagonalisable.
327
Assume V is a finite dimensional vector space over a field F = R or C.
Theorem
A linear transformation T : V → V is diagonalisable if and only if there
is a basis B for V consisting of eigenvectors of T .
A matrix A ∈ Mn,n(F) is diagonalisable over F if and only if it has n
linearly independent eigenvectors in Fn.
Proof:
Let B = {b1,b2, · · · ,bn} be a basis for V . The matrix [T ]B is diagonal
if and only if each of the basis vectors bi satisfies
T (bi ) = λibi ,
where λi is the corresponding diagonal entry of the matrix [T ]B.
328
How to diagonalise a matrix
Theorem
An n× n matrix A is diagonalisable if and only if there is an n× n
invertible matrix P and an n × n diagonal matrix D such that
P−1AP = D
or, equivalently,
A = PDP−1.
The matrix P is said to diagonalise A.
If v1, . . . , vn are linearly independent eigenvectors of A with
corresponding eigenvalues λ1, . . . , λn, then we can take
P = [ v1 · · · vn ] and D = diag(λ1, λ2, . . . , λn).
Note:
If A is diagonalisable over F then P and D must have entries in F.
329
Example 9.
Let A = [ 1 4 ; 1 1 ]. Suppose T : R2 → R2 is such that [T ] = A.
(a) Write A in the form PDP−1. Check your answer.
(b) Give a geometric description of T .
Solution:
(a) From example 6:
An eigenvector corresponding to λ = −1 is v1 = [ −2 ; 1 ].
An eigenvector corresponding to λ = 3 is v2 = [ 2 ; 1 ].
330
Check A = PDP−1 or AP = PD by multiplying the matrices.
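A sketch of this check with NumPy:

    import numpy as np

    A = np.array([[1.0, 4.0],
                  [1.0, 1.0]])
    P = np.array([[-2.0, 2.0],
                  [ 1.0, 1.0]])        # eigenvector columns
    D = np.diag([-1.0, 3.0])
    np.testing.assert_allclose(A @ P, P @ D)            # AP = PD
    np.testing.assert_allclose(P @ D @ np.linalg.inv(P), A)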
331
Theorem
Eigenvectors v1, . . . , vk corresponding to distinct eigenvalues λ1, . . . , λk
are linearly independent.
Proof:
If k = 2.
Let v1, v2 be eigenvectors of A corresponding to λ1 and λ2 respectively,
such that λ1 6= λ2. Then
Av1 = λ1v1 and Av2 = λ2v2.
Suppose that
α1v1 + α2v2 = 0 (1)
Then
A(α1v1 + α2v2) = A0 = 0
⇒ α1Av1 + α2Av2 = 0
⇒ α1λ1v1 + α2λ2v2 = 0 (2)
332
Equation (2)− λ1(1) gives
α2(λ2 − λ1)v2 = 0
Since λ1 6= λ2 and v2 6= 0, then α2 = 0.
Since v1 6= 0 and α2 = 0, then (1) gives α1 = 0.
Since the only solution of (1) is
α1 = α2 = 0
then v1 and v2 are linearly independent eigenvectors.
Note:
Similar arguments can be used for the cases k > 2.
333
Theorem
If an n × n matrix A has n distinct eigenvalues, then A has n linearly
independent eigenvectors. Hence, A will be diagonalisable.
Note:
If A has one or more repeated eigenvalues, then A may or may not be
diagonalisable depending on the number of linearly independent
eigenvectors associated with each eigenvalue.
◮ If the algebraic and geometric multiplicities for each eigenvalue are
equal, A will be diagonalisable. In this case, the dimensions of the
eigenspaces add up to n.
◮ If the geometric multiplicity is less than the algebraic multiplicity
for some eigenvalue, A will not be diagonalisable.
334
Example 10.
Is A = [ 2 −3 6 ; 0 5 −6 ; 0 1 0 ] diagonalisable?
If it is, specify the matrices P and D such that A = PDP−1.
Solution:
From example 7:
Two linearly independent eigenvectors corresponding to λ = 2 are
v1 = [ 1 ; 0 ; 0 ] and v2 = [ 0 ; 2 ; 1 ].
An eigenvector corresponding to λ = 3 is
v3 = [ −3 ; 3 ; 1 ].
335
336
Example 11.
Show that the matrix A = [ 1 2 ; 0 1 ] is not diagonalisable.
Solution:
Find the eigenvalues of A
Find the eigenvectors of A
337
Note:
The eigenvalue λ = 1 has algebraic multiplicity 2 but geometric
multiplicity of 1, so we cannot diagonalise A.
338
Example 12.
Is A = [ 0 1 ; −1 0 ] diagonalisable?
If it is, specify the matrices P and D such that A = PDP−1.
Solution:
From example 8:
An eigenvector corresponding to λ = i is v1 = [ −i ; 1 ].
An eigenvector corresponding to λ = −i is v2 = [ i ; 1 ].
339
6.5 Matrix powers
In applications, one often needs to apply a transformation many times.
If the transformation can be represented by a diagonal matrix D, it is
easy to compute Dk and thus the action of the k-th application of the
transformation.
Example 13.
If D = [ −4 0 0 ; 0 3 0 ; 0 0 2 ], write down D10.
Solution:
340
Theorem
Suppose A is diagonalisable, so that
A = PDP−1
for some invertible matrix P and a diagonal matrix D.
Then for all integers k ≥ 1, we have
Ak = PDkP−1.
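A sketch of this with NumPy, using the factorisation from Examples 9 and 14 below; Dk is just the diagonal matrix of k-th powers.

    import numpy as np

    P    = np.array([[-2.0, 2.0], [1.0, 1.0]])
    Pinv = np.array([[-1.0, 2.0], [1.0, 2.0]]) / 4.0
    D    = np.diag([-1.0, 3.0])
    k = 5
    Ak = P @ np.linalg.matrix_power(D, k) @ Pinv
    A  = P @ D @ Pinv
    np.testing.assert_allclose(Ak, np.linalg.matrix_power(A, k))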
341
Example 14.
From example 9, A = [ 1 4 ; 1 1 ] can be written in the form A = PDP−1 where
D = [ −1 0 ; 0 3 ], P = [ −2 2 ; 1 1 ] and P−1 = (1/4) [ −1 2 ; 1 2 ].
Calculate Ak , for all integers k ≥ 1.
Solution:
342
Example 15. An application from genetics
A snapdragon plant has either red, pink or white flowers, depending on
its genotype (with alleles RR, RW or WW respectively). We want to
cross pink plants with red, pink and white plants. On average,
◮ crossing pink plants with red plants produces:
1/2 red plants and 1/2 pink plants as offspring.
◮ crossing pink plants with pink plants produces:
1/4 red plants, 1/2 pink plants and 1/4 white plants as offspring.
◮ crossing pink plants with white plants produces:
1/2 pink plants and 1/2 white plants as offspring.
This is an example of incomplete dominance in genetics.
If you continue crossing with only pink plants, what fractions of the
three types of flowers would you eventually expect to see in your garden?
343
Solution:
Let
xn = [ rn ; pn ; wn ]
where rn, pn,wn are the fractions of red, pink and white flowers after n
generations, for n ≥ 0. Note that these satisfy
rn + pn + wn = 1.
Now for each n > 0 we have
344
This gives
xn+1 = Txn,
where T is the transition matrix
Note that T has non-negative entries and the sum of each column is 1.
Then
x1 = Tx0, x2 = Tx1 = T (Tx0) = T 2x0,
and in general
xn = T nx0
gives the distribution after n generations for n ≥ 1.
345
Now T has eigenvalues
λ = 1, 1/2, 0
with corresponding eigenvectors
v1 = [ 1 ; 2 ; 1 ], v2 = [ −1 ; 0 ; 1 ], v3 = [ 1 ; −2 ; 1 ].
Hence we can diagonalise T and
T = PDP−1
where
P = [ 1 −1 1 ; 2 0 −2 ; 1 1 1 ], D = [ 1 0 0 ; 0 1/2 0 ; 0 0 0 ],
P−1 = [ 1/4 1/4 1/4 ; −1/2 0 1/2 ; 1/4 −1/4 1/4 ].
346
Now
T n = PDnP−1
Taking the limit as n → ∞ we get
347
So in the long run, the expected distribution of flower colours is:
Note:
◮ These fractions are proportional to the entries of the eigenvector of
T with eigenvalue 1. This is expected since the limiting
proportions x∞ satisfy Tx∞ = x∞.
◮ This is an example of a Markov chain.
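A sketch of the convergence in NumPy (columns of T are the offspring distributions for crossing pink with red, pink and white respectively):

    import numpy as np

    T = np.array([[0.5, 0.25, 0.0],
                  [0.5, 0.50, 0.5],
                  [0.0, 0.25, 0.5]])
    x = np.array([1.0, 0.0, 0.0])      # start from all red, say
    for _ in range(50):
        x = T @ x
    print(x)                           # -> [0.25 0.5 0.25]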
348
Topic 7: Inner Product Spaces [AR 6.1-6.5,5.3,7.1-7.2,7.5]
7.1 Definition of inner products
7.2 Geometry from inner products
7.3 Orthogonal sets and Gram-Schmidt procedure
7.4 Least squares curve fitting
7.5 Orthogonal matrices and symmetric matrices
7.6 Unitary matrices and Hermitian matrices
349
7.1 Definition of inner products
We now generalise the dot product to define inner products, angles and
lengths on general vector spaces.
Definition (Inner product over R)
An inner product on a real vector space V is a function that associates
to every pair of vectors u, v ∈ V a real number, denoted by 〈u, v〉,
satisfying the following properties.
For all u, v,w ∈ V and α ∈ R:
1. 〈u, v〉 = 〈v,u〉 (symmetry)
2. 〈αu, v〉 = α〈u, v〉 (linearity of scalar multiplication)
3. 〈u+ v,w〉 = 〈u,w〉+ 〈v,w〉 (linearity of vector addition)
4. a. 〈u, u〉 ≥ 0 (positivity)
b. 〈u, u〉 = 0 ⇒ u = 0
A real vector space V together with an inner product is called a real
inner product space.
350
Example
For V = Rn, the dot product
〈u, v〉 = u · v
defines an inner product.
351
Example 1.
For u = (u1, u2) ∈ R2 and v = (v1, v2) ∈ R2, define
〈u, v〉 = u1v1 + 2u2v2.
Show that this gives an inner product on R2.
Solution:
Define a third vector and a scalar
1. Symmetry
352
2. Linearity of scalar multiplication
3. Linearity of vector addition
353
4. Positivity
354
Example 2.
For u = (u1, u2, u3) ∈ R3 and v = (v1, v2, v3) ∈ R3, define
〈u, v〉 = u1v1 − u2v2 + u3v3.
Show that this does not define an inner product on R3.
Solution:
Give a specific counterexample
355
Note:
Let u = (u1, · · · , un) ∈ Rn and v = (v1, · · · , vn) ∈ Rn.
◮ All inner products defined on Rn can be written in the form
〈u, v〉 = [u]TA [v]
where A is a real symmetric n× n matrix, so
A = AT .
◮ When written in this matrix form, axioms 1, 2 and 3 follow
immediately from properties of matrices. We only need to check
axiom 4.
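A sketch of this matrix form in NumPy, using the inner product of Example 3 below, whose matrix is A = [ 2 −2 ; −2 3 ]:

    import numpy as np

    A = np.array([[ 2.0, -2.0],
                  [-2.0,  3.0]])
    u = np.array([1.0, 2.0])
    v = np.array([3.0, 1.0])
    print(u @ A @ v)                   # <u, v> = -2.0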
356
Example 3.
For u = (u1, u2) ∈ R2 and v = (v1, v2) ∈ R2, define
〈u, v〉 = 2u1v1 − 2u1v2 − 2u2v1 + 3u2v2.
(a) Write 〈u, v〉 in terms of a symmetric matrix.
(b) Show that 〈u, v〉 defines an inner product on R2.
Solution:
357
358
Example 4.
For p = p(x) ∈ Pn and q = q(x) ∈ Pn, define
〈p,q〉 = ∫₀¹ p(x)q(x) dx .
Show that this gives an inner product on Pn.
Solution:
Define a third polynomial and a scalar
1. Symmetry
359
2. Linearity of scalar multiplication
3. Linearity of vector addition
360
4. Positivity
361
Complex vector spaces
Definition (Hermitian dot product on Cn)
The Hermitian dot product (or complex dot product) of u, v ∈ Cn is
defined by
u · v = u1v̄1 + u2v̄2 + · · ·+ unv̄n,
where a bar indicates the complex conjugate.
Taking the complex conjugate is necessary to have real and positive
lengths:
v · v = v1v̄1 + v2v̄2 + · · ·+ vnv̄n = |v1|2 + |v2|2 + · · ·+ |vn|2 ≥ 0.
So we can define the length of v by √(v · v).
362
Example 5.
Let u = (1 + i , 1− i), v = (i , 1) ∈ C2.
Compute the Hermitian dot products u · v, v · u and u · u.
Solution:
363
Definition (Inner product over C)
A Hermitian inner product on a complex vector space V is a function
that associates to every pair of vectors u, v ∈ V a complex number,
denoted by 〈u, v〉, satisfying the following properties.
For all u, v,w ∈ V and α ∈ C:
1. 〈u, v〉 equals the complex conjugate of 〈v,u〉 (Hermitian symmetry)
2. 〈αu, v〉 = α〈u, v〉 (linearity of scalar multiplication)
3. 〈u+ v,w〉 = 〈u,w〉+ 〈v,w〉 (linearity of vector addition)
4. a. 〈u, u〉 ≥ 0 (positivity)
b. 〈u, u〉 = 0 ⇒ u = 0
A complex vector space V together with an Hermitian inner product is
called a complex inner product space.
364
Example
The Hermitian dot product on Cn satisfies these properties.
Example
The inner product on Pn(R) defined in example 4 can be generalised to
Pn(C) by replacing the second polynomial in the definition by its
complex conjugate:
〈p,q〉 = ∫₀¹ p(x) q̄(x) dx .
Note:
Inner products defined by integration are important in many
applications, for example in quantum mechanics, electrical engineering,
and in the study of differential equations.
365
Example 6.
For u = (u1, u2) ∈ C2 and v = (v1, v2) ∈ C2, define
〈u, v〉 = iu1v1 − iu2v2.
Show that this does not define an inner product on C2.
Solution:
Give a specific counterexample
366
7.2 Geometry from inner products
In a general vector space, how can we find “lengths” of vectors and
“angles” between vectors?
Definition (Length, distance)
For a real or complex inner product space, the length (or norm) of a
vector v is defined by
‖v‖ = √〈v, v〉.
The distance between u and v is given by
d(u, v) = ‖v − u‖.
367
Example (Frobenius inner product and digital pictures)
Given two matrices A,B ∈ Mm,n, we define
〈A,B〉 = Tr(ATB).
This is an inner product on Mm,n (see tutorial 12), so defines a distance
function d on Mm,n.
If the matrix entries represent the brightness of pixels in grayscale
digital photos, then
d(A,B) = ‖A− B‖ = ( Tr((A− B)T (A− B)) )1/2
measures how “close together” two photos are.
368
Definition (Angle)
For a real inner product space, the angle θ between two non-zero
vectors u and v is defined by
cos θ = 〈u, v〉 / (‖u‖‖v‖), 0 ≤ θ ≤ π.
For a complex inner product space, the angle θ between two non-zero
vectors u and v is defined by
cos θ = Re〈u, v〉 / (‖u‖‖v‖), 0 ≤ θ ≤ π,
where Re〈u, v〉 denotes the real part of the inner product.
369
Note:
In order for these definitions of angle to make sense, we need
−1 ≤ 〈u, v〉/(‖u‖‖v‖) ≤ 1 and −1 ≤ Re〈u, v〉/(‖u‖‖v‖) ≤ 1.
These results follow immediately from the Cauchy-Schwarz inequality.
Theorem (Cauchy-Schwarz inequality)
Let V be an inner product space. Then for all u, v ∈ V
|〈u, v〉| ≤ ‖u‖‖v‖.
Equality holds if and only if one vector is a multiple of the other i.e. the
vectors are linearly dependent.
370
Example 7.
Let u = 1 and v = x . Using the inner product
〈p,q〉 = ∫₀¹ p(x)q(x) dx ,
find the
(a) lengths of u and v
(b) distance between u and v
(c) angle between u and v
Solution:
371
372
7.3 Orthogonal sets and the Gram-Schmidt procedure
Definition (Orthogonal vectors)
Let V be an inner product space. Two vectors u, v ∈ V are orthogonal
if
〈u, v〉 = 0.
Note:
For complex inner products this is stronger than saying the angle
between u and v is π/2 which only requires Re〈u, v〉 = 0. It also implies
that the angle between u and iv is π/2.
Definition (Orthogonal set)
A set of vectors {v1, . . . , vk} in an inner product space V is orthogonal
if
〈vi , vj 〉 = 0 for i ≠ j .
That is, the inner product of each distinct pair of vectors is zero.
373
Example 8. (Pythagoras’ Theorem)
If V is a real inner product space and u, v are orthogonal vectors in V ,
show that
‖u+ v‖2 = ‖u‖2 + ‖v‖2.
Solution:
374
Example 9.
Show that
{v1, v2, v3} = {(1, 1, 1), (1,−1, 1), (1, 0,−1)}
is an orthogonal set of vectors in R3 with respect to the inner product
〈u, v〉 = u1v1 + 2u2v2 + u3v3.
Solution:
375
Theorem
Every orthogonal set of non-zero vectors {v1, . . . , vk} in an inner
product space V is linearly independent.
Proof:
Since {v1, . . . , vk} is an orthogonal set, then
〈vi , vj 〉 = 0 for i ≠ j
Also
〈0, vi 〉 = 0, i = 1, · · · , k
Suppose that
α1v1 + · · · + αkvk = 0 (*)
for some scalars αi .
376
Then, for each i = 1, · · · , k :
〈α1v1 + · · ·+ αkvk , vi 〉 = 0
⇒ 〈α1v1, vi 〉+ · · · + 〈αivi , vi 〉+ · · ·+ 〈αkvk , vi 〉 = 0
⇒ α1〈v1, vi 〉+ · · · + αi〈vi , vi 〉+ · · ·+ αk〈vk , vi 〉 = 0
⇒ αi 〈vi , vi 〉 = 0
Since 〈vi , vi 〉 ≠ 0, we get αi = 0.
Hence the only solution of (∗) is
αi = 0, for all i = 1, · · · , k
so {v1, . . . , vk} are linearly independent.
377
Definition (Orthonormal set)
A set of vectors {v1, . . . , vk} in an inner product space V is
orthonormal if it is orthogonal and each vector has length 1.
That is
{v1, . . . , vk} is orthonormal ⇔ 〈vi , vj 〉 = 0 for i ≠ j and 〈vi , vi 〉 = 1 for all i .
Note:
Any orthogonal set of non-zero vectors can be made orthonormal by
dividing each vector by its length.
378
Example 10.
Using the orthogonal set
{v1, v2, v3} = {(1, 1, 1), (1,−1, 1), (1, 0,−1)}
construct a set of R3 that is orthonormal with respect to the inner
product
〈u, v〉 = u1v1 + 2u2v2 + u3v3.
Solution:
379
Orthonormal bases
Theorem
Let V be an inner product space. If {v1, . . . , vn} is an orthonormal
basis for V and x ∈ V , then
x = 〈x, v1〉v1 + · · ·+ 〈x, vn〉vn.
Proof:
Since {v1, . . . , vn} is an orthonormal set
〈vi , vj 〉 = 0 for i ≠ j and 〈vi , vi 〉 = 1 for all i .
Since {v1, . . . , vn} is a basis for V , then there is a unique set of scalars
αi (i = 1, · · · , n) such that
x = α1v1 + · · ·+ αnvn.
380
Then, for each i = 1, · · · , n, we have
〈x, vi 〉 = 〈α1v1 + · · ·+ αnvn, vi 〉
= 〈α1v1, vi 〉+ · · · + 〈αivi , vi 〉+ · · · + 〈αnvn, vi 〉
= α1〈v1, vi 〉+ · · · + αi〈vi , vi 〉+ · · · + αn〈vn, vi 〉
= αi
by properties 2 and 3 of inner products.
Hence
x = 〈x, v1〉v1 + · · ·+ 〈x, vn〉vn.
381
Example 11.
Recall R3 has an orthonormal basis
{b1,b2,b3} = { (1/2)(1, 1, 1), (1/2)(1,−1, 1), (1/√2)(1, 0,−1) }
with respect to the inner product
〈u, v〉 = u1v1 + 2u2v2 + u3v3.
Express x = (1, 1,−1) as a linear combination of b1,b2,b3.
Solution:
382
383
Orthogonal projection
Orthogonal projection in Rn can be generalised to any inner product
space.
Definition (Orthogonal projection onto a vector)
Let V be an inner product space. Let u ∈ V be a non-zero vector.
The orthogonal projection of v onto u is
proju v = ( 〈v,u〉 / 〈u,u〉 ) u.
If u is a unit vector then
proju v = 〈v,u〉u.
Note:
If p = proju v then v − p is orthogonal to u since:
〈v − p,u〉 = 〈v,u〉 − 〈p,u〉 = 〈v,u〉 − ( 〈v,u〉 / 〈u,u〉 ) 〈u,u〉 = 0.
384
We can also generalise to projection onto a subspace W of V .
Definition (Orthogonal projection onto a subspace)
Let {u1, . . . ,uk} be an orthonormal basis for the subspace W ⊆ V and
v ∈ V .
The orthogonal projection of v onto W is
projW v = 〈v,u1〉u1 + · · · + 〈v,uk〉uk .
If p = projW v then:
[Figure: v, its orthogonal projection p = projW v lying in the subspace W , and the component v − p perpendicular to W .]
385
Properties of orthogonal projection of v onto W
1. projW v ∈W
2. If w ∈W , then
projW w = w
3. v − projW v is orthogonal to W , i.e. orthogonal to every vector in
W .
4. projW v is the unique vector in W that is closest to v. Thus,
‖v − w‖ > ‖v − projW v‖
for all w ∈W with w 6= projW v.
5. projW v does not depend on the choice of orthonormal basis for W .
386
Example 12.
Let W = {(x , y , z) | x + y + z = 0}. The set
{u1, u2} = { (1/√2)(1,−1, 0), (1/√6)(1, 1,−2) }
is an orthonormal basis for W with respect to the dot product.
(a) Find the orthogonal projection of the vector v = (1, 2, 3) onto W .
(b) Find the (shortest) distance from v to W .
Solution:
387
388
How to produce an orthonormal basis from a general basis
Gram-Schmidt procedure
Suppose {v1, . . . , vk} is a basis for an inner product space V . Define:
1. u1 = (1/‖v1‖) v1
2a. w2 = v2 − 〈v2,u1〉u1
2b. u2 = (1/‖w2‖) w2
3a. w3 = v3 − 〈v3,u1〉u1 − 〈v3,u2〉u2
3b. u3 = (1/‖w3‖) w3
...
ka. wk = vk − 〈vk ,u1〉u1 − · · · − 〈vk ,uk−1〉uk−1
kb. uk = (1/‖wk‖) wk
Then {u1, . . . ,uk} is an orthonormal basis for V .
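A minimal NumPy sketch of the procedure for the dot product on Rn, applied to the ordered basis of Example 13 below:

    import numpy as np

    def gram_schmidt(vectors):
        basis = []
        for v in vectors:
            w = v - sum(np.dot(v, u) * u for u in basis)  # remove projections
            basis.append(w / np.linalg.norm(w))           # normalise
        return basis

    vs = [np.array([1.0, 1.0, 1.0]),
          np.array([0.0, 1.0, 1.0]),
          np.array([0.0, 0.0, 1.0])]
    for u in gram_schmidt(vs):
        print(u)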
389
Example 13.
Using Gram-Schmidt, construct an orthonormal basis for R3 with
respect to the dot product, starting with the ordered basis
{v1, v2, v3} = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}.
Solution:
Step 1: Find u1 = v1/‖v1‖
390
Step 2a: Find w2 = v2 − 〈v2,u1〉u1
Step 2b: Find u2 = w2/‖w2‖
391
Step 3a: Find w3 = v3 − 〈v3,u1〉u1 − 〈v3,u2〉u2
392
Step 3b: Find u3 = w3/‖w3‖
393
7.4 Least squares curve fitting
Given a set of data points (x1, y1), (x2, y2),. . . , (xn, yn) we want to find
the line y = a + bx that best approximates the data.
A common approach is to minimise the least squares error E .
E 2 = sum of the squares of the vertical errors
= δ1² + · · ·+ δn² = (y1 − (a + bx1))² + · · ·+ (yn − (a + bxn))²
[Figure: the data points and the line y = a + bx , with vertical errors δi .]
394
This can be written as
E 2 = ‖y − Au‖2
using the Euclidean dot product, where
y = [ y1 ; · · · ; yn ], A = [ 1 x1 ; 1 x2 ; · · · ; 1 xn ], u = [ a ; b ],
Au = [ a + bx1 ; a + bx2 ; · · · ; a + bxn ].
To minimise ‖y − Au‖2, we want to choose u so that Au is as close as
possible to y.
To find the closest point, we project y onto
W = {Av | v ∈ R2} = column space of A.
So we want u ∈ R2 such that Au = projW y.
We can calculate u directly (without finding an orthonormal basis for
W ) by noting that y − projW y is orthogonal to W .
395
Thus
〈w, y − projW y〉 = 0 for all w ∈W
⇒ 〈Av, y − Au〉 = 0 for all v ∈ R2
⇒ (Av)T (y − Au) = 0 for all v ∈ R2
⇒ vTAT (y − Au) = 0 for all v ∈ R2
⇒ AT (y − Au) = 0
⇒ ATy − ATAu = 0
⇒ ATAu = ATy
From this we can calculate u, given that we know A and y.
396
Theorem (Least squares line of best fit)
The least squares solution for the line y = a+ bx of best fit for the
points (x1, y1), . . . , (xn, yn) is determined by
ATA u = ATy,
or, if ATA is invertible,
u = (ATA)−1ATy,
where
y = [ y1 ; · · · ; yn ], A = [ 1 x1 ; 1 x2 ; · · · ; 1 xn ], u = [ a ; b ].
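A sketch of the normal equations in NumPy, using the data of Example 14 below:

    import numpy as np

    xs = np.array([-1.0, 1.0, 2.0])
    y  = np.array([ 1.0, 1.0, 3.0])
    A  = np.column_stack([np.ones_like(xs), xs])
    a, b = np.linalg.solve(A.T @ A, A.T @ y)  # solve A^T A u = A^T y
    print(a, b)                        # 9/7 and 4/7, so y = 9/7 + (4/7)x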
397
Example 14.
Find the line of best fit for the data points (−1, 1), (1, 1), (2, 3) using
the method of least squares.
Solution:
398
399
400
7.5 Orthogonal matrices and symmetric matrices
Definition (Orthogonal matrix)
An orthogonal matrix Q is a real n × n matrix such that Q−1 = QT .
Note:
It is enough to show that
QTQ = In or QQT = In.
Theorem
A real n × n matrix is orthogonal if and only if its columns form an
orthonormal basis for Rn using the dot product.
401
Example 15.
Show that the following rotation matrix is orthogonal:
Q = [ cos θ − sin θ ; sin θ cos θ ]
Solution:
402
Geometric properties of orthogonal matrices
Theorem
If Q is an orthogonal n × n matrix then
1. Qu ·Qv = u · v for all u, v ∈ Rn (with u, v regarded as column
vectors),
2. det(Q) = ±1.
Note:
◮ Property 1 says that orthogonal matrices preserve dot products.
Hence they also preserve Euclidean lengths and angles.
◮ Geometrically, orthogonal matrices represent rotations when
det(Q) = 1, and reflections composed with rotations when
det(Q) = −1.
403
Example 16.
Prove property 1: If Q is an orthogonal n× n matrix then
Qu ·Qv = u · v for all u, v ∈ Rn regarded as column vectors.
Solution:
404
Real symmetric matrices
Theorem (Orthogonal Diagonalisation)
Let A be an n× n real symmetric matrix. Then using the dot product
as the inner product on Rn:
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1, · · · , vn},
4. A is diagonalisable with
A = QDQ−1 = QDQT
where D is the diagonal matrix diag(λ1, . . . , λn) and Q is the
orthogonal matrix [ v1 · · · vn ].
405
Example 17.
Write the symmetric matrix
A = [ 1 −1 ; −1 1 ]
in the form A = QDQT where D is a diagonal matrix.
Solution:
Step 1. Find the eigenvalues of A
406
Step 2. Find a unit eigenvector corresponding to each eigenvalue
407
408
Step 3. Form D and Q.
Note:
In the case of repeated eigenvalues, you may need to use the
Gram-Schmidt procedure in each eigenspace to find an orthonormal
basis of eigenvectors.
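A sketch of the whole factorisation in NumPy: numpy.linalg.eigh is designed for symmetric matrices and returns real eigenvalues together with orthonormal eigenvector columns.

    import numpy as np

    A = np.array([[ 1.0, -1.0],
                  [-1.0,  1.0]])
    vals, Q = np.linalg.eigh(A)        # eigenvalues in ascending order: [0. 2.]
    D = np.diag(vals)
    np.testing.assert_allclose(Q @ D @ Q.T, A, atol=1e-12)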
409
Singular value decomposition (SVD)
For a general real m × n matrix A, the diagonalisation formula for
symmetric matrices can be extended to find orthogonal matrices U and
V such that
A = USV T
where S is an m × n matrix with Sii ≥ 0 and Sij = 0 whenever i ≠ j .
The above equation is called the singular value decomposition of A.
The singular value decomposition is important in many applications,
including digital image compression and stable numerical computations.
410
Theorem
Every real m × n matrix A has a singular value decomposition
A = USV T
where
◮ U is an m ×m real orthogonal matrix,
◮ V is an n × n real orthogonal matrix,
◮ S is an m × n matrix with Sii = σi ≥ 0 and Sij = 0 for i ≠ j ,
◮ σi for 1 ≤ i ≤ min(m, n) are called the singular values of A.
Further, using the dot product as the inner product,
1. the σi2 are the eigenvalues of AAT and ATA,
2. the columns of U are orthonormal eigenvectors ui of AAT ,
3. the columns of V are orthonormal eigenvectors vi of ATA,
4. Avi = σiui .
411
Note:
◮ AAT is an m ×m matrix and ATA is an n × n matrix.
◮ AAT and ATA are real symmetric matrices.
◮ Geometrically, U and V correspond to rotations (or rotations
followed by reflections), so only S changes the lengths of vectors,
stretching by the factors σi ≥ 0 together with an orthogonal
projection when m ≠ n.
412
How to find the singular value decomposition of A ∈ Mm,n
1. Find an orthonormal basis v1, . . . , vn of eigenvectors for ATA, with
corresponding eigenvalues λ1, . . . , λn.
2. The singular values of A are σi = √λi ≥ 0.
3. Let ui = (1/σi )Avi for σi ≠ 0, say i = 1, . . . , r = rank(A).
4. Extend u1, . . . ,ur to an orthonormal basis {u1, . . . ,um} for Rm.
5. Then A = USV T where V = [ v1 . . . vn ], U = [ u1 . . . um ] and S has
diagonal entries Sii = σi for i = 1, . . . ,min(m, n).
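In practice the decomposition is computed directly; a sketch with NumPy for the matrix of Example 18 below:

    import numpy as np

    A = np.array([[0.0, -1.0],
                  [0.0,  0.0]])
    U, s, Vt = np.linalg.svd(A)        # returns U, the singular values, V^T
    print(s)                           # [1. 0.]
    np.testing.assert_allclose(U @ np.diag(s) @ Vt, A, atol=1e-12)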
413
Example 18.
Let
A = [ 0 −1 ; 0 0 ] and S = [ σ1 0 ; 0 σ2 ].
Find orthogonal matrices U and V and scalars σ1, σ2 ≥ 0 such that
A = USV T .
Solution:
Step 1. Find a matrix V
414
415
Step 2. Find a matrix U compatible with V using property 4 of theorem
416
417
7.6 Unitary matrices and Hermitian matrices
We now consider the complex analogues of orthogonal and real
symmetric matrices.
Definition (Conjugate transpose of a matrix)
Let A be a complex m × n matrix. We define its conjugate transpose
(or adjoint or Hermitian transpose) A∗ = ĀT to be the n ×m matrix
whose entries are
(A∗)ij = (ĀT )ij = Āji .
The conjugate transpose of A is sometimes denoted by A†.
Note:
If α ∈ C is a scalar, then the following properties hold whenever the
matrix products and sums exist:
(A∗)∗ = A, (A+ B)∗ = A∗ + B∗,
(AB)∗ = B∗A∗, (αA)∗ = ᾱA∗.
418
Definition (Unitary matrix)
A unitary matrix is a complex n × n matrix U such that
U−1 = U∗ = ŪT .
Note:
◮ To check if a matrix is unitary it suffices to check that U∗U = In or
UU∗ = In.
◮ Any real orthogonal matrix is unitary if viewed as a complex matrix.
419
Example 19.
Show that
U = (1/√2) [ −i i ; 1 1 ]
is a unitary matrix.
Solution:
420
The Hermitian dot product on Cn can be written in matrix form:
〈u, v〉 = u1v̄1 + u2v̄2 + · · ·+ unv̄n = v̄Tu = v∗u,
where u, v ∈ Cn are regarded as column vectors.
This leads to the following analogues of our previous results.
Theorem
If U is a unitary n × n matrix then
1. 〈Uu,Uv〉 = 〈u, v〉 for all u, v ∈ Cn,
2. | det(U)| = 1,
where 〈u, v〉 is the Hermitian dot product on Cn.
421
Theorem
A complex n × n matrix is unitary if and only if its columns form an
orthonormal basis for Cn using the Hermitian dot product.
Definition (Hermitian matrix)
A complex n × n matrix A is Hermitian or self-adjoint if
A∗ = A.
Note:
Any real symmetric matrix is Hermitian if viewed as a complex matrix.
422
Example 20.
Are the following matrices Hermitian?
(a) A = [ 1 i ; −i 1 ]
(b) B = [ i 0 ; 0 i ]
Solution:
423
Theorem
Let A be an n× n Hermitian matrix. Then using the Hermitian dot
product as inner product on Cn:
1. all eigenvalues λi (i = 1, · · · , n) of A are real,
2. eigenvectors corresponding to distinct eigenvalues are orthogonal,
3. there is an orthonormal basis of eigenvectors {v1, · · · , vn},
4. A is diagonalisable with
A = UDU−1 = UDU∗
where D is the diagonal matrix diag(λ1, λ2, . . . , λn) and U is the
unitary matrix [ v1 · · · vn ].
424
Example 21.
Write the Hermitian matrix
A = [ 1 i ; −i 1 ]
in the form
A = UDU−1
where D is a diagonal matrix and U is a unitary matrix.
Solution:
Step 1. Find the eigenvalues of A
425
Step 2. Find a unit eigenvector corresponding to each eigenvalue
426
427
Step 3. Form D and U.
428