Linear Algebra 1
Course Notes for MATH 136
Edition 1.0
D. Wolczuk
Copyright: D. Wolczuk, 1st Edition, 2011
Contents

A Note to Students - READ THIS!
Acknowledgments

1 Vectors in Euclidean Space
1.1 Vector Addition and Scalar Multiplication
1.2 Subspaces
1.3 Dot Product
1.4 Projections

2 Systems of Linear Equations
2.1 Systems of Linear Equations
2.2 Solving Systems of Linear Equations

3 Matrices and Linear Mappings
3.1 Operations on Matrices
3.2 Linear Mappings
3.3 Special Subspaces
3.4 Operations on Linear Mappings

4 Vector Spaces
4.1 Vector Spaces
4.2 Bases and Dimension
4.3 Coordinates

5 Inverses and Determinants
5.1 Matrix Inverses
5.2 Elementary Matrices
5.3 Determinants
5.4 Determinants and Systems of Equations
5.5 Area and Volume

6 Diagonalization
6.1 Matrix of a Linear Mapping and Similar Matrices
6.2 Eigenvalues and Eigenvectors
6.3 Diagonalization
6.4 Powers of Matrices
A NOTE TO STUDENTS - READ THIS!
Best Selling Novel: “My Favourite Math,” by Al G. Braw
Welcome to Linear Algebra!
In this course, you will be introduced to basic elements in linear algebra, including
vectors, vector spaces, matrices, linear mappings, and solving systems of linear equa-
tions. The strategy used in this course is to help you understand an example of a
concept before generalizing it. For example, in Chapter 1, you will learn about vectors
in Rn, and then in Chapter 4, we will extend part of what we did with vectors in Rn
to the abstract setting of a vector space. Of course, the better you understand the
particular example, the easier it will be for you to understand the more complicated
concepts introduced later.
As you will quickly see, this course contains both computational and theoretical
elements. Most students, with enough practice, will find the computational questions
(about 70% of the tests) fairly easy, but may have difficulty with the theory portion
(definitions, theorems and proofs). While studying this course, you should keep in
mind that when applying the material in the future, problems will likely be far too
large to compute by hand. Hence, the computational work will be done by a computer,
and you will use your knowledge and understanding of the theory to interpret the
results. Why do we make you compute things by hand if they can be done by a
computer? Because solving problems by hand can often help you understand the
theory. Thus, when you first learn how to solve problems in this course, pay close
attention to how solving the problems applies to the concepts in the course (definitions
and theorems). In doing so, not only will you find the computational problems easier,
but it will also help you greatly when doing proofs.
Prerequisites:
Teacher: Recall from your last course that...
Student: What?!? You mean we were supposed to remember that?
We expect that you know:
- How to solve a system of equations using substitution and elimination
- The basics of vectors in R2 and R3, including the dot product and cross product
- The basics of how to write proofs
- Basic operations on complex numbers
If you are unsure of any of these, we recommend that you do some self study to
learn/remember these topics.
What is linear algebra used for?
Student: Will we ever use this in real life?
Professor: Not if you get a job flipping hamburgers.
Linear algebra is used in the social sciences, the natural sciences, engineering, busi-
ness, computer science, statistics, and all branches of mathematics. A small sample
of courses at UW that have linear algebra prerequisites include: AFM 272, ACTSC
291, AMATH 333, CM 271, CO 250, CS 371, ECON 301, PHYS 234, PMATH 330,
STAT 331.
Note that although we will mention applications of linear algebra in this course,
we will not discuss them in depth. This is because in most cases it would require
knowledge of the various applications that we do not expect you to have at this point.
Additionally, you will have a lot of time to apply your linear algebra knowledge and
skills in future courses.
How to do well in this course!
Student: Is there life after death?
Teacher: Why do you ask?
Student: I’ll need the extra time to finish all the homework you gave us.
1. Attend all the lectures.
Although these course notes have been written to help you learn the course,
the lectures are your primary resource to the material covered in the course.
The lectures will provide examples and explanations to help you understand the
course material. Additionally, the lectures will be interactive. That is, when
you do not understand something or need clarification, you can ask. Course
notes do not provide this feature (yet). You will likely find it very difficult to
get a good grade in the course without attending all the lectures.
2. Read these course notes.
It is generally recommended that students read course notes/textbooks prior to
class. If you are able to teach yourself some of the material before it is taught
in class, you will find the lectures twice as effective. The lecturers will then be
there for you to clarify what you taught yourself and to help with the areas that
you had difficulty with. Trust me, you will enjoy your classes much more if you
understand most of what is being taught in the lectures. Additionally, you will
likely find it considerably easier to take notes and to listen to the professor at
the same time.
3. Do and check the homework assignments.
Homework assignments are designed to help you learn the material and prepare
for the tests. You should always give a serious effort to solve a problem before
seeking help. Spending the time required to solve a hard problem is not only
very satisfying, but it will greatly increase your knowledge of the course and
your problem solving skills. Naturally, if you are unable to figure out how to
solve a problem, it is very important to get help so that you can learn how to
do it. Make sure that you always indicate on each assignment where you have
received help. Never just read/copy solutions. On a test, you are not going
to have a solution to read/copy from. You need to make sure that you are able
to solve problems without referring to similar problems. Also, the assignments
are only worth 5% of your overall mark; the tests, 95%. Struggling on the
assignments and learning from your mistakes so that you do much better on
the tests is a much better way to go. For this reason, it is highly recommended
that you collect your assignments and compare them with the online solutions.
Even if you have the correct answer, it is worth checking if there is another way
to solve the problem.
4. Study!
This might sound a little obvious, but most students do not do nearly enough
of this. Moreover, you need to study properly. If you stay up all night studying
and drinking Red Bull, the only thing that is going to grow wings and fly away is
your knowledge. Learning math, and most other subjects, is like building a
house. If you do not have a firm foundation, then it will be very difficult for
you to build on top of it. It is extremely important that you know and do not
forget the basics.
For study skills check this out:
http://www.adm.uwaterloo.ca/infocs/study/index.html
5. Use learning resources.
There are many places you can get help outside of the lectures. These include
the Tutorial Centre, your lecturer’s office hours, the UW-ACE website, and
others. They are there to assist you... make use of them!
Proofs
Student 1: What is your favorite part of mathematics?
Student 2: Knot Theory.
Student 1: Me neither.
Do you have difficulty with proof questions? Here are some suggestions to help you.
1. Know and understand all definitions and theorems.
Proofs are essentially puzzles that you have to put together. In class and in
these course notes we will give you all of the pieces and it is up to you to figure
out how they are to fit together to give the desired picture. Of course, if you
are missing or do not understand a required piece, then you are not going to be
able to get the correct result.
2. Learn proof techniques.
In class and in the course notes, you will see numerous proofs. You should use
these to learn how to select the correct puzzle pieces and to learn the ways
of putting the puzzle pieces together. In particular, I always recommend that
students ask the following two questions for every step in each proof:
(a) “Why is the step true?”
(b) “What was the purpose of the step?”
Answering the first question should really help your understanding of the def-
initions and theorems from class (the puzzle pieces). Answering the second
question, which is generally more difficult, will help you learn how to think
about creating proofs.
3. Practice!
As with everything else, you will get better at proofs with experience. Many
proofs in the course notes are left as exercises to give you practice in prov-
ing things. It is strongly recommended that you do (try) these proofs before
continuing further.
ACKNOWLEDGMENTS
Thanks are expressed to:
Mike La Croix: for formatting, LaTeX consulting, and all of the amazing figures.
MEF: for their generous financial support.
Kate Schreiner and Joseph Horan: for proof-reading, editing, and suggestions.
All of my many colleagues who have given many useful suggestions on how to teach
linear algebra.
Chapter 1
Vectors in Euclidean Space
1.1 Vector Addition and Scalar Multiplication
DEFINITION
Rn
For any positive integer n, the set of all elements of the form (x1, . . . , xn), where xi ∈ R
for 1 ≤ i ≤ n, is called n-dimensional Euclidean space and is denoted Rn. The elements
of Rn are called points and are usually denoted P(x1, . . . , xn).
2-dimensional and 3-dimensional Euclidean space were originally studied by ancient
Greek mathematicians. In Euclid's Elements, Euclid defined two- and three-
dimensional space with a set of postulates, definitions, and common notions. How-
ever, in modern mathematics, we not only want to make the concepts in Euclidean
space more mathematically precise, but we want to be able to easily generalize the
concepts to allow us to use the same ideas to solve problems in other areas.
In Linear Algebra, we will view Rn as a set of vectors rather than as a set of points.
In particular, we will write an element of Rn as a column vector and denote it with
the usual vector symbol ~x (note that some textbooks denote vectors in boldface,
i.e., x). That is, ~x ∈ Rn can be represented as the column vector

~x = (x1, . . . , xn)^T, xi ∈ R for 1 ≤ i ≤ n

where the superscript T indicates that the n-tuple is to be read as a column. For
example, in 3-dimensional Euclidean space, we will denote the origin (0, 0, 0) by the
vector ~0 = (0, 0, 0)^T.
REMARKS

1. Although we are going to view Rn as a set of vectors, it is important to understand
that we really are still referring to n-dimensional Euclidean space. In some cases it
will be useful to think of the vector (x1, . . . , xn)^T as the point (x1, . . . , xn). In
particular, we will sometimes interpret a set of vectors as a set of points to get a
geometric object (such as a line or plane).

2. In Linear Algebra, all vectors in Rn should be written as column vectors; however,
we will sometimes write them as n-tuples to match the notation that is commonly
used in other areas. In particular, when we are dealing with functions of vectors, we
will write f(x1, . . . , xn) rather than f((x1, . . . , xn)^T).
One advantage in viewing the elements of Rn as vectors instead of as points is that
we can perform operations on vectors. You likely have seen vectors in R2 and R3
in Physics being used to represent motion or force. From these physical examples
we can observe that we add vectors by summing their components and multiply a
vector by a scalar by multiplying each entry of the vector by the scalar. We keep this
definition for vectors in Rn.
DEFINITION
Vector Addition
Scalar
Multiplication
Let ~x = (x1, . . . , xn)^T and ~y = (y1, . . . , yn)^T be two vectors in Rn and let c ∈ R.
We define

~x + ~y = (x1 + y1, . . . , xn + yn)^T and c~x = (cx1, . . . , cxn)^T
EXAMPLE 1 Let ~x = (1, 2, −3)^T and ~y = (−1, 3, 0)^T be vectors in R3. Then,

~x + ~y = (1 + (−1), 2 + 3, −3 + 0)^T = (0, 5, −3)^T
√2~y = (−√2, 3√2, 0)^T
2~x − 3~y = (2, 4, −6)^T + (3, −9, 0)^T = (5, −5, −6)^T
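These componentwise operations are exactly what numerical libraries implement, so small examples are easy to check by computer. Here is a minimal sketch, assuming Python with the NumPy library (a tool of our choosing, not part of the course material):

    import numpy as np

    x = np.array([1, 2, -3])
    y = np.array([-1, 3, 0])
    print(x + y)           # [ 0  5 -3], componentwise addition
    print(np.sqrt(2) * y)  # scalar multiplication of each entry
    print(2*x - 3*y)       # [ 5 -5 -6], a linear combination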
Since we will often look at the sums of scalar multiples of vectors, we make the
following definition.
DEFINITION
Linear
Combination
Let ~v1, . . . , ~vk ∈ Rn. Then the sum
c1~v1 + c2~v2 + · · ·+ ck~vk
where ci ∈ R for 1 ≤ i ≤ k is called a linear combination of ~v1, . . . , ~vk.
Although it is intuitively obvious that for any ~v1, . . . , ~vk ∈ Rn any linear combination

c1~v1 + c2~v2 + · · · + ck~vk

is also a vector in Rn, it is instructive to prove this along with several other important
properties. We will see throughout Math 136 that these properties are extremely
important.
THEOREM 1 Let ~x, ~y, ~w ∈ Rn and c, d ∈ R. Then:
V1 ~x+ ~y ∈ Rn;
V2 (~x+ ~y) + ~w = ~x+ (~y + ~w);
V3 ~x+ ~y = ~y + ~x;
V4 There exists a vector ~0 ∈ Rn such that ~x+~0 = ~x;
V5 For each ~x ∈ Rn there exists a vector −~x ∈ Rn such that ~x+ (−~x) = ~0;
V6 c~x ∈ Rn;
V7 c(d~x) = (cd)~x;
V8 (c+ d)~x = c~x+ d~x;
V9 c(~x+ ~y) = c~x+ c~y;
V10 1~x = ~x.
Proof: We will prove V1 and V3 and leave the others as an exercise.
For V1, by definition we have

~x + ~y = (x1 + y1, . . . , xn + yn)^T ∈ Rn

since xi + yi ∈ R.
For V3, we have

~x + ~y = (x1 + y1, . . . , xn + yn)^T = (y1 + x1, . . . , yn + xn)^T = ~y + ~x
EXERCISE 1 Finish the proof of Theorem 1.
REMARKS

1. Observe that properties V2, V3, V7, V8, V9, and V10 only refer to the operations
of addition and scalar multiplication, while the other properties V1, V4, V5, and V6
are about the relationship between the operations and the set Rn. These facts should
be clear in the proof of the theorem. Moreover, we see that the zero vector of Rn is
the vector ~0 = (0, . . . , 0)^T, and the additive inverse of ~x = (x1, . . . , xn)^T is
−~x = (−1)~x = (−x1, . . . , −xn)^T.
2. Properties V1 and V6 show that Rn is closed under linear combinations.
That is, if ~v1, . . . , ~vk ∈ Rn, then c1~v1 + · · · + ck~vk ∈ Rn for any c1, . . . , ck ∈ R.
This fact might seem rather obvious in Rn; however, we will soon see that there
are sets which are not closed under linear combinations. In Linear Algebra, it
will be important and useful to identify which sets have this nice property and,
in fact, all the properties V1 - V10.
It can be useful to look at the geometric interpretation of sets of linear combinations
of vectors.
EXAMPLE 2 Let ~v = (1, 1)^T and consider the set S of all scalar multiples of ~v. That is,

S = {~x ∈ R2 | ~x = c~v for some c ∈ R}

What does S represent geometrically?

Solution: Every vector in S has the form

(x1, x2)^T = t(1, 1)^T = (t, t)^T, t ∈ R

Hence, we see this is all vectors in R2 where x1 = x2. Alternately, we can view this
as the set of all points (x1, x1) in R2, which we recognize as the line x2 = x1.

[Figure 1.1.1: Geometric representation of S: the line through the origin O with
direction vector ~v = (1, 1)^T.]
EXAMPLE 3 Let ~e1 = (1, 0)^T and ~e2 = (0, 1)^T. How would you describe geometrically the set S
of all possible linear combinations of ~e1 and ~e2?

Solution: Every vector ~x in S has the form

~x = c1~e1 + c2~e2 = c1(1, 0)^T + c2(0, 1)^T = (c1, c2)^T

for any c1, c2 ∈ R. Therefore, any point in R2 can be represented as a linear
combination of ~e1 and ~e2. Hence, S is the x1x2-plane.
Instead of using set notation to indicate a set of vectors, we often instead use the
vector equation of the set as we did in the example above.
EXAMPLE 4 Let ~m = (−1, 1, 1)^T and ~b = (1, 0, 1)^T. Describe the set with vector equation
~x = t~m + ~b, t ∈ R geometrically.

Solution: We have all possible scalar multiples of ~m being translated by ~b. Hence,
this is the line in R3 that passes through the point (1, 0, 1) and has direction vector ~m.
We will often look at the set of all possible linear combinations of a set of vectors.
Thus, we make the following definition.
DEFINITION
Span
Let B = {~v1, . . . , ~vk} be a set of vectors in Rn. Then we define the span of B by
SpanB = {c1~v1 + c2~v2 + · · ·+ ck~vk | c1, . . . , ck ∈ R}
We say that SpanB is spanned by B and that B is a spanning set for SpanB.
DEFINITION
Vector Equation
If the set S is spanned by vectors {~v1, . . . , ~vk}, then a vector equation for S is
~x = c1~v1 + c2~v2 + · · ·+ ck~vk, c1, . . . , ck ∈ R
EXAMPLE 5 Using the definition above, we can write the line in Example 2 as Span{(1, 1)^T}.
The set S in Example 3 can be written as S = Span{(1, 0)^T, (0, 1)^T} = Span{~e1, ~e2}.
EXAMPLE 6 Let S = Span{(1, 0, 0)^T, (0, 1, 0)^T}, T = Span{(1, 0, 1)^T, (2, 0, 2)^T}, and
U = {(1, −1, 2)^T}. Describe each set geometrically and determine a vector equation
for each.

Solution: By definition of span we have

S = {c1(1, 0, 0)^T + c2(0, 1, 0)^T | c1, c2 ∈ R}

So, we see that S is the set of all vectors of the form (c1, c2, 0)^T, and so represents
the x1x2-plane in R3. A vector equation for S is ~x = c1(1, 0, 0)^T + c2(0, 1, 0)^T,
c1, c2 ∈ R.

For T, we have

T = {c1(1, 0, 1)^T + c2(2, 0, 2)^T | c1, c2 ∈ R}

We see that T is the set of all vectors of the form

(c1 + 2c2, 0, c1 + 2c2)^T = (c, 0, c)^T

with c = c1 + 2c2. Hence, T is the line in R3 with vector equation ~x = c(1, 0, 1)^T,
c ∈ R.

Observe that U only contains a single vector. Hence, U is just a single point in R3
and has vector equation ~x = (1, −1, 2)^T.
We could have also written the vector equation of T as ~x = c1(1, 0, 1)^T + c2(2, 0, 2)^T.
However, we were able to simplify the vector equation since the second vector is a
scalar multiple of the first. In general, when writing a vector equation for a set (or
writing the span of a set of vectors), we want to simplify the equation (set) as much
as possible. That is, we want to remove any unnecessary vectors.
Let's first consider the simple case of two vectors ~v1, ~v2 in R3. Then Span{~v1, ~v2}
is a plane in R3 if and only if neither ~v1 nor ~v2 is a scalar multiple of the other.
What if ~v2 is a scalar multiple of ~v1? Then we have Span{~v1, ~v2} = Span{~v1}, and
so it represents either a line, or just the origin if ~v1 = ~v2 = ~0.

If we have k vectors in Rn, when do we have

Span{~v1, . . . , ~vk} = Span{~v1, . . . , ~vk−1}?

For this to be true, the vector ~vk must be in the set Span{~v1, . . . , ~vk−1}. Hence, ~vk
must be able to be written as a linear combination of ~v1, . . . , ~vk−1. We now prove the
converse.
THEOREM 2 If ~vk can be written as a linear combination of ~v1, . . . , ~vk−1, then
Span{~v1, . . . , ~vk} = Span{~v1, . . . , ~vk−1}
Proof: We are assuming that there exist c1, . . . , ck−1 ∈ R such that

c1~v1 + · · · + ck−1~vk−1 = ~vk

Let ~x ∈ Span{~v1, . . . , ~vk}. Then there exist d1, . . . , dk ∈ R such that

~x = d1~v1 + · · · + dk−1~vk−1 + dk~vk
  = d1~v1 + · · · + dk−1~vk−1 + dk(c1~v1 + · · · + ck−1~vk−1)
  = (d1 + dkc1)~v1 + · · · + (dk−1 + dkck−1)~vk−1

Thus, ~x ∈ Span{~v1, . . . , ~vk−1}. Hence, Span{~v1, . . . , ~vk} ⊆ Span{~v1, . . . , ~vk−1}.
Clearly, we have Span{~v1, . . . , ~vk−1} ⊆ Span{~v1, . . . , ~vk}, and so

Span{~v1, . . . , ~vk} = Span{~v1, . . . , ~vk−1}

as required.
Of course, by rearranging the vectors in the set we see that any vector that can be
written as a linear combination of the other vectors can be removed from the set
without changing the set it spans.
EXAMPLE 7 Describe geometrically the following sets and write a simplified vector equation of
each.

a) S = Span{(1, 1)^T, (2, 2)^T, (3, 3)^T}

Solution: By Theorem 2, we have

Span{(1, 1)^T, (2, 2)^T, (3, 3)^T} = Span{(1, 1)^T, (2, 2)^T} = Span{(1, 1)^T}

Hence, S is a line in R2 with vector equation ~x = c1(1, 1)^T, c1 ∈ R.
b) T = Span{(1, 0)^T, (2, 0)^T, (1, 3)^T, (0, 1)^T}

Solution: Since (2, 0)^T is a scalar multiple of (1, 0)^T, Theorem 2 gives

Span{(1, 0)^T, (2, 0)^T, (1, 3)^T, (0, 1)^T} = Span{(1, 0)^T, (1, 3)^T, (0, 1)^T}

We also have that (1, 3)^T can be written as a linear combination of (1, 0)^T and
(0, 1)^T, hence

Span{(1, 0)^T, (1, 3)^T, (0, 1)^T} = Span{(1, 0)^T, (0, 1)^T}

Hence, T is all of R2 and has vector equation ~x = c1(1, 0)^T + c2(0, 1)^T, c1, c2 ∈ R.
c) U = Span{(1, 1, 2)^T, (0, 1, −1)^T, (0, 0, 0)^T}

Solution: Since (0, 0, 0)^T is a scalar multiple of any vector, we get

Span{(1, 1, 2)^T, (0, 1, −1)^T, (0, 0, 0)^T} = Span{(1, 1, 2)^T, (0, 1, −1)^T}

Hence, U is a plane in R3 with vector equation ~x = c1(1, 1, 2)^T + c2(0, 1, −1)^T,
c1, c2 ∈ R.
REMARK

Observe that for T any of the answers

~x = c1(1, 0)^T + c2(1, 3)^T
~x = c1(2, 0)^T + c2(0, 1)^T
~x = c1(2, 0)^T + c2(1, 3)^T

would also be correct.
From the examples we see that it is very important to detect when one or more vectors
in the set can be written as linear combinations of some (or all) of the other vectors.
We call such sets linearly dependent sets. For example, {(1, 0)^T, (2, 0)^T, (1, 3)^T, (0, 1)^T}
is linearly dependent and {(1, 0, 0)^T, (0, 1, 0)^T, (0, 0, 1)^T} is linearly independent.
In the examples, it was quite easy to see when we had this situation; however, in
real-world applications we may get hundreds of vectors each having hundreds of
entries. So, we need to develop a method for detecting these. To do this, we first
need a mathematical definition of linear dependence.

If a set {~v1, . . . , ~vk} is linearly dependent, then one of the vectors, say ~vi, can be
written as a linear combination of some (or all) of the other vectors. That is, there
exist c1, . . . , ck ∈ R with ci ≠ 0 such that

−ci~vi = c1~v1 + · · · + ci−1~vi−1 + ci+1~vi+1 + · · · + ck~vk

We can rewrite this as

~0 = c1~v1 + · · · + ci−1~vi−1 + ci~vi + ci+1~vi+1 + · · · + ck~vk
Hence, we get our mathematical definition.
DEFINITION
Linearly Dependent
Linearly
Independent
A set of vectors {~v1, . . . , ~vk} in Rn is said to be linearly dependent if there exist
coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · ·+ ck~vk
Thus, a set of vectors {~v1, . . . , ~vk} is said to be linearly independent if the only
solution to
~0 = c1~v1 + · · ·+ ck~vk
is c1 = c2 = · · · = ck = 0 (called the trivial solution).
THEOREM 3 If a set of vectors {~v1, . . . , ~vk} contains the zero vector then it is linearly dependent.
Proof: Without loss of generality, assume ~vk = ~0. Then we have

0~v1 + 0~v2 + · · · + 0~vk−1 + 1~vk = ~0

Hence, the equation ~0 = c1~v1 + · · · + ck~vk has a solution with one coefficient, namely
ck, that is non-zero. So, by definition, the set is linearly dependent.
EXAMPLE 8 Determine whether {(7, −14, 6)^T, (−10, 15, 15/14)^T, (−1, 0, 3)^T} is linearly
dependent or linearly independent.

Solution: We consider

c1(7, −14, 6)^T + c2(−10, 15, 15/14)^T + c3(−1, 0, 3)^T = (0, 0, 0)^T

Using operations on vectors we get

(7c1 − 10c2 − c3, −14c1 + 15c2, 6c1 + (15/14)c2 + 3c3)^T = (0, 0, 0)^T

Thus, we get 3 equations in 3 unknowns:

7c1 − 10c2 − c3 = 0
−14c1 + 15c2 = 0
6c1 + (15/14)c2 + 3c3 = 0

Solving, we find that there are in fact infinitely many possible solutions. One is
c1 = 3/7, c2 = 2/5, and c3 = −1. Hence, the set is linearly dependent.
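A computer can detect this situation quickly: the set is linearly independent exactly when the matrix whose columns are the given vectors has rank equal to the number of vectors. A sketch, again assuming NumPy:

    import numpy as np

    # Columns are the three vectors from the example.
    A = np.array([[7.0,   -10.0,      -1.0],
                  [-14.0,  15.0,       0.0],
                  [6.0,    15.0/14.0,  3.0]])
    # Rank 2 < 3 columns, so the set is linearly dependent.
    print(np.linalg.matrix_rank(A))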
REMARK
Observe that determining whether a set {~v1, . . . , ~vk} in Rn is linearly dependent
or linearly independent requires finding the solutions of the vector equation c1~v1 +
· · · + ck~vk = ~0. However, this equation actually represents n equations (one for each
entry of the vectors) in k unknowns c1, . . . , ck. In the next chapter, we will look at
how to solve such systems of equations efficiently.
What we have derived above is that the simplest spanning set for a given set is one
that is linearly independent. Hence, we make the following definition.
DEFINITION
Basis
If a subset S of Rn can be written as a span of vectors ~v1, . . . , ~vk where {~v1, . . . , ~vk}
is linearly independent, then we say that {~v1, . . . , ~vk} is a basis for S.
This definition and its generalization will be extremely important throughout the
remainder of the book. At this point, however, we just define the following very
important bases.
DEFINITION
Standard Basis
In Rn, let ~ei represent the vector whose i-th component is 1 and all other components
are 0. The set {~e1, . . . , ~en} is called the standard basis for Rn.
EXAMPLE 9 The standard basis for R2 is {~e1, ~e2} = {(1, 0)^T, (0, 1)^T}.
EXERCISE 2 Write the standard basis for R4.
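One way to generate standard basis vectors programmatically is to take the rows (equivalently, the columns) of an identity matrix; a one-line sketch assuming NumPy:

    import numpy as np

    e1, e2, e3, e4 = np.eye(4)  # rows of the 4 x 4 identity matrix
    print(e3)                   # [0. 0. 1. 0.], the third standard basis vector of R4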
Surfaces in Higher Dimensions
We can extend our geometrical concepts of lines and planes to Rn for n > 3.
DEFINITION
Line in Rn
Let ~v, ~b ∈ Rn with ~v ≠ ~0. Then we call the set with vector equation ~x = c1~v + ~b,
c1 ∈ R a line in Rn which passes through ~b.
DEFINITION
Plane in Rn
Let ~v1, ~v2,~b ∈ Rn with {~v1, ~v2} being a linearly independent set. Then we call the
set with vector equation ~x = c1~v1 + c2~v2 +~b, c1, c2 ∈ R a plane in Rn which passes
through ~b.
DEFINITION
k-plane in Rn
Let ~v1, . . . , ~vk,~b ∈ Rn with {~v1, . . . , ~vk} being a linearly independent set. Then we
call the set with vector equation ~x = c1~v1 + · · ·+ ck~vk +~b, c1, . . . , ck ∈ R a k-plane
in Rn which passes through ~b.
DEFINITION
Hyperplane in Rn
Let ~v1, . . . , ~vn−1,~b ∈ Rn with {~v1, . . . , ~vn−1} being linearly independent. Then the set
with vector equation ~x = c1~v1 + · · ·+ cn−1~vn−1 +~b, ci ∈ R is called a hyperplane in
Rn which passes through ~b.
REMARK
Geometrically we see that a line in Rn has dimension 1, a k-plane in Rn has dimension
k, and a hyperplane in Rn has dimension n−1. We will make a precise mathematical
definition of dimension in Chapter 4.
EXAMPLE 10 The set Span{(1, 0, 0, 1)^T, (0, 1, 0, −2)^T, (0, 1, 1, −1)^T} defines a hyperplane
in R4, since the set {(1, 0, 0, 1)^T, (0, 1, 0, −2)^T, (0, 1, 1, −1)^T} is linearly
independent.
EXERCISE 3 Give a vector equation for a hyperplane in R5.
1.2 Subspaces
As we mentioned when we first looked at the ten properties of vector addition and
scalar multiplication (Theorem 1), properties V1 and V6 seem rather obvious. However,
it is easy to see that not all subsets of Rn have these properties. For example, the
set S = {(1, 1)^T} clearly does not satisfy properties V1 and V6, since

(1, 1)^T + (1, 1)^T = (2, 2)^T ∉ S and c(1, 1)^T = (c, c)^T ∉ S if c ≠ 1
A set which satisfies property V1 is said to be closed under addition. A set which
satisfies property V6 is said to be closed under scalar multiplication.
You might be wondering why we should be so interested in whether a set has properties
V1 and V6 or not. One reason is that a non-empty subset of Rn which does not have
these properties is not closed under linear combinations. Another reason is that a
non-empty subset of Rn which does have these properties will in fact satisfy all the
same ten properties as Rn. That is, it is a “space” which is a subset of the larger
space Rn.
DEFINITION
Subspace of Rn
A non-empty subset S of Rn is called a subspace of Rn if it satisfies all ten properties
V1 - V10 of Theorem 1.
It, of course, would be tedious to have to check all ten properties each time we want
to prove a non-empty subset S of Rn is a subspace. The good news is that this is
not necessary. Observe that properties V2, V3, V7, V8, V9, V10 are only about the
operations of addition and scalar multiplication. Thus, we already know that these
properties hold. Also, if V1 and V6 hold, then for any ~v ∈ S we have ~0 = 0~v ∈ S and
(−1)~v = (−~v) ∈ S, so V4 and V5 hold. Thus, to check if a non-empty subset of Rn
is a subspace, we can use the following result.
THEOREM 1 (Subspace Test)
Let S be a non-empty subset of Rn. If ~x+~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and c ∈ R,
then S is a subspace of Rn.
REMARK
If S is non-empty and closed under scalar multiplication, then our work above
shows that S must contain ~0. Thus, the standard way of checking that S is non-empty
is to determine whether ~0 satisfies the conditions of the set. If ~0 is not in S, then we
automatically know that it is not a subspace of Rn as it would not satisfy V4 or V6.
EXAMPLE 1 Determine, with proof, which of the following are subspaces of R3. Describe each
subspace geometrically and write a basis for the subspace.

a) S1 = {(x1, x2, x3)^T | x1 + x2 = 0, x1 − x3 = 0}.

Solution: By definition, S1 is a subset of R3, and ~0 satisfies the conditions of the set
(0 + 0 = 0 and 0 − 0 = 0), so ~0 ∈ S1. Thus, S1 is a non-empty subset of R3, so we
can apply the Subspace Test.

To show that S1 is closed under addition, we pick two vectors ~x, ~y ∈ S1 and show that
~x + ~y satisfies the conditions of S1. If ~x = (x1, x2, x3)^T and ~y = (y1, y2, y3)^T are
in S1, then we have x1 + x2 = 0, x1 − x3 = 0, y1 + y2 = 0, and y1 − y3 = 0. By
definition of addition, we get

~x + ~y = (x1 + y1, x2 + y2, x3 + y3)^T

Then we see that

(x1 + y1) + (x2 + y2) = (x1 + x2) + (y1 + y2) = 0 + 0 = 0
(x1 + y1) − (x3 + y3) = (x1 − x3) + (y1 − y3) = 0 + 0 = 0

Hence, ~x + ~y satisfies the conditions of S1, so S1 is closed under addition.

Similarly, for any c ∈ R we have c~x = (cx1, cx2, cx3)^T and

cx1 + cx2 = c(x1 + x2) = c(0) = 0, cx1 − cx3 = c(x1 − x3) = c(0) = 0

Hence, S1 is also closed under scalar multiplication.

Therefore, by the Subspace Test, S1 is a subspace of R3.

To describe S1 geometrically, we see that since x1 + x2 = 0 and x1 − x3 = 0, we have
x2 = −x1 and x3 = x1. Thus, a general vector in the set can be represented as

(x1, x2, x3)^T = (x1, −x1, x1)^T = x1(1, −1, 1)^T

Thus, S1 is a line in R3 and B1 = {(1, −1, 1)^T} is a basis for S1.
b) S2 = {(x1, x2, x3)^T | x1 + x2 = 1}.

Solution: By definition, S2 is a subset of R3, but we see that ~0 does not satisfy the
condition of S2 (0 + 0 ≠ 1). Hence, S2 is not a subspace of R3.
c) S3 = {(x1, x2, x3)^T | x1 − 2x2 = 0}.

Solution: By definition, S3 is a subset of R3, and ~0 satisfies the condition of the set
(0 − 2(0) = 0), so ~0 ∈ S3. Thus, S3 is a non-empty subset of R3, so we can apply the
Subspace Test.

Let ~x = (x1, x2, x3)^T and ~y = (y1, y2, y3)^T be vectors in S3. Then we have
x1 − 2x2 = 0 and y1 − 2y2 = 0. Then

~x + ~y = (x1 + y1, x2 + y2, x3 + y3)^T

and

(x1 + y1) − 2(x2 + y2) = (x1 − 2x2) + (y1 − 2y2) = 0 + 0 = 0

Hence, S3 is closed under addition.

Similarly, for any c ∈ R we have c~x = (cx1, cx2, cx3)^T and

cx1 − 2(cx2) = c(x1 − 2x2) = c(0) = 0

Hence, S3 is also closed under scalar multiplication.

Therefore, S3 is a subspace of R3 by the Subspace Test.

Since x1 − 2x2 = 0, we have x1 = 2x2. So, a general vector in S3 can be represented
as

(x1, x2, x3)^T = (2x2, x2, x3)^T = x2(2, 1, 0)^T + x3(0, 0, 1)^T

Thus, S3 is a plane in R3 since B2 = {(2, 1, 0)^T, (0, 0, 1)^T} is a basis for S3.
EXERCISE 1 Let ~v,~b ∈ Rn. Show that the line S = {t~v+~b | t ∈ R} is a subspace of Rn if and only
if ~b is a scalar multiple of ~v.
THEOREM 2 Let ~v1, . . . , ~vk ∈ Rn. Then S = Span{~v1, . . . , ~vk} is a subspace of Rn.
Proof: The proof is left as an exercise.
1.3 Dot Product
In high school you saw the dot product in R2 and R3. Recall that

(x1, x2)^T · (y1, y2)^T = x1y1 + x2y2
(x1, x2, x3)^T · (y1, y2, y3)^T = x1y1 + x2y2 + x3y3
You saw that two useful aspects of the dot product were in calculating the length of
a vector and in determining whether two vectors are orthogonal. In particular, the
length of a vector ~v is ‖~v‖ = √(~v · ~v), and two vectors ~x and ~y are orthogonal if and
only if ~x · ~y = 0. In relation to the latter, the dot product also determines the angle
between two vectors in R2.
THEOREM 1 Let ~x, ~y ∈ R2 and let θ be the angle between ~x and ~y. Then
~x · ~y = ‖~x‖‖~y‖ cos θ
[Figure 1.3.1: A figure illustrating the relationship between the dot product and
cosines: the angle θ between ~x and ~y sits at the origin O in the triangle with vertices
O, (x1, x2), and (y1, y2), whose side lengths are ‖~x‖, ‖~y‖, and
√((x1 − y1)² + (x2 − y2)²).]
Since the dot product is so useful in R2 and R3, it makes sense to extend it to Rn.
DEFINITION
Dot Product
Let ~x = (x1, . . . , xn)^T and ~y = (y1, . . . , yn)^T be vectors in Rn. Then the dot
product of ~x and ~y is

~x · ~y = x1y1 + · · · + xnyn = Σ_{i=1}^{n} xiyi
REMARK
The dot product is also called the standard inner product or the scalar product
of Rn.
EXAMPLE 1 Let ~x = (1, 2, −1, 1)^T, ~y = (0, 0, 1, 1)^T, and ~z = (3, 1, −2, −2)^T. Calculate
~x · ~y, ~y · ~z, ~z · ~z, and (~x · ~z)~z.

Solution: We have

~x · ~y = 1(0) + 2(0) + (−1)(1) + 1(1) = 0
~y · ~z = 0(3) + 0(1) + 1(−2) + 1(−2) = −4
~z · ~z = 3(3) + 1(1) + (−2)(−2) + (−2)(−2) = 18
(~x · ~z)~z = (1(3) + 2(1) + (−1)(−2) + 1(−2))~z = 5(3, 1, −2, −2)^T = (15, 5, −10, −10)^T
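Each of these is a one-line computation in numerical software; a sketch assuming NumPy, whose @ operator computes the dot product of two vectors:

    import numpy as np

    x = np.array([1, 2, -1, 1])
    y = np.array([0, 0, 1, 1])
    z = np.array([3, 1, -2, -2])
    print(x @ y)         # 0
    print(y @ z)         # -4
    print(z @ z)         # 18
    print((x @ z) * z)   # [ 15   5 -10 -10]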
THEOREM 2 Let ~x, ~y, ~z ∈ Rn and let s, t ∈ R. Then
(1) ~x · ~x ≥ 0, and ~x · ~x = 0 if and only if ~x = ~0.
(2) ~x · ~y = ~y · ~x
(3) ~x · (s~y + t~z) = s(~x · ~y) + t(~x · ~z)

Proof: Let ~x = (x1, . . . , xn)^T, ~y = (y1, . . . , yn)^T, and ~z = (z1, . . . , zn)^T.
Then for (1),

~x · ~x = x1² + · · · + xn² ≥ 0

and ~x · ~x = 0 if and only if xi = 0 for 1 ≤ i ≤ n.

For (2), we have

~x · ~y = x1y1 + · · · + xnyn = y1x1 + · · · + ynxn = ~y · ~x

For (3), we have

~x · (s~y + t~z) = x1(sy1 + tz1) + · · · + xn(syn + tzn)
= sx1y1 + tx1z1 + · · · + sxnyn + txnzn
= s(x1y1 + · · · + xnyn) + t(x1z1 + · · · + xnzn)
= s(~x · ~y) + t(~x · ~z)
Observe that property (1) allows us to define the length of a vector in Rn to match
our formula in R2 and R3.

DEFINITION
Length
Let ~x ∈ Rn. The length or norm of ~x is defined to be ‖~x‖ = √(~x · ~x).
EXAMPLE 2 Find the lengths of ~x = (1, 2, −1, 0)^T and ~y = (1, −2, 3, 1, 1)^T.

Solution: We have

‖~x‖ = √(~x · ~x) = √(1² + 2² + (−1)² + 0²) = √6
‖~y‖ = √(~y · ~y) = √(1² + (−2)² + 3² + 1² + 1²) = √16 = 4
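The norm can be computed either from the definition or with a library routine; a sketch assuming NumPy:

    import numpy as np

    x = np.array([1, 2, -1, 0])
    print(np.sqrt(x @ x))     # 2.449..., i.e. sqrt(6), straight from the definition
    print(np.linalg.norm(x))  # the same value via the built-in norm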
DEFINITION
Unit Vector
A vector ~x ∈ Rn such that ‖~x‖ = 1 is called a unit vector.
THEOREM 3 Let ~x, ~y ∈ Rn and c ∈ R. Then
(1) ‖~x‖ ≥ 0 and ‖~x‖ = 0 if and only if ~x = ~0.
(2) ‖c~x‖ = |c|‖~x‖
(3) |~x · ~y| ≤ ‖~x‖‖~y‖ (Cauchy-Schwarz-Buniakowski Inequality)
(4) ‖~x+ ~y‖ ≤ ‖~x‖+ ‖~y‖ (Triangle Inequality)
Proof: We prove (3) and leave the rest as exercises.

If ~x = ~0, the result is obvious. For ~x ≠ ~0, we use (1) and properties of dot products
to get

0 ≤ (t~x + ~y) · (t~x + ~y) = (~x · ~x)t² + 2(~x · ~y)t + (~y · ~y)

for any ~y ∈ Rn and t ∈ R. This is a quadratic polynomial in t whose t² coefficient
~x · ~x is positive. Thus, for the polynomial to be non-negative for all t, it must have
non-positive discriminant. Hence,

(2(~x · ~y))² − 4(~x · ~x)(~y · ~y) ≤ 0
4(~x · ~y)² ≤ 4‖~x‖²‖~y‖²
(~x · ~y)² ≤ ‖~x‖²‖~y‖²

Taking square roots of both sides gives the desired result.
We now also extend the concept of angles, and in particular orthogonality, to vectors
in Rn.
DEFINITION
Angle in Rn
Let ~x, ~y ∈ Rn. Then we define the angle between ~x and ~y to be the angle θ such that
~x · ~y = ‖~x‖‖~y‖ cos θ
REMARK
Observe that property (3) of Theorem 3 guarantees that such an angle θ exists for
each pair ~x, ~y.
DEFINITION
Orthogonal
Two vectors ~x, ~y ∈ Rn are said to be orthogonal if and only if ~x · ~y = 0.
EXAMPLE 3 The vectors ~x = (1, 0, 1, 2)^T and ~y = (2, 3, −4, 1)^T are orthogonal because

~x · ~y = 1(2) + 0(3) + 1(−4) + 2(1) = 0

The vectors ~u = (1, 2, 1, −1, 1)^T and ~v = (1, 1, 2, 1, 5)^T are not orthogonal because

~u · ~v = 1(1) + 2(1) + 1(2) + (−1)(1) + 1(5) = 9

The zero vector ~0 in Rn is orthogonal to every vector ~x ∈ Rn since ~x · ~0 = 0.
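The angle from the definition above can be computed the same way; a sketch assuming NumPy, using the vectors ~x and ~y from this example:

    import numpy as np

    x = np.array([1, 0, 1, 2])
    y = np.array([2, 3, -4, 1])
    cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
    print(np.degrees(np.arccos(cos_theta)))  # 90.0, so x and y are orthogonal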
EXAMPLE 4 The standard basis vectors for R3 form an orthogonal set, since every pair of vectors
in the set is orthogonal. That is,

(1, 0, 0)^T · (0, 1, 0)^T = 0, (1, 0, 0)^T · (0, 0, 1)^T = 0, (0, 0, 1)^T · (0, 1, 0)^T = 0
REMARK
In fact, one reason the standard basis vectors for Rn are so easy to work with is
because they form an orthogonal set of unit vectors. This will be examined in
Math 235.
Scalar Equation of a Plane

We can use the dot product to write the equation of a plane in R3 in a useful form.
Let ~n = (n1, n2, n3)^T be a vector in R3 that is orthogonal to every vector in the
plane (called the normal vector), and let A(a1, a2, a3) be any point on the plane.
Then for any other point X(x1, x2, x3) in the plane we can form the directed line
segment from A to X. We define

AX = (x1 − a1, x2 − a2, x3 − a3)^T

By definition, this is a vector which lies in the plane and hence is orthogonal to ~n.
Thus we have ~n · AX = 0. Simplifying this we get

n1(x1 − a1) + n2(x2 − a2) + n3(x3 − a3) = 0
n1x1 + n2x2 + n3x3 = n1a1 + n2a2 + n3a3

Observe that this equation is valid for every point X(x1, x2, x3) in the plane, and
hence is an equation of the plane. It is called the scalar equation of the plane in
R3.
EXAMPLE 5 Find the scalar equation of the plane which passes through A(1, 2, 3) with normal
vector ~n = (1, −1, 2)^T.

Solution: From our work above, we have that the scalar equation is

x1 − x2 + 2x3 = 1(1) + (−1)(2) + 2(3) = 5
EXAMPLE 6 Find the scalar equation of the plane which passes through A(−1, 0, 2) and is parallel
to the plane x1 − 2x2 + 3x3 = −2.

Solution: For two planes to be parallel, their normal vectors must also be parallel.
That is, a normal vector for the required plane is also ~n = (1, −2, 3)^T. Thus, the
scalar equation of the required plane is

x1 − 2x2 + 3x3 = 1(−1) + (−2)(0) + 3(2) = 5
We can extend this to Rm. Let ~n = (n1, . . . , nm)^T. The set of all vectors which are
orthogonal to ~n forms a hyperplane of Rm. Thus, the standard form for the scalar
equation of a hyperplane in Rm is

n1x1 + · · · + nmxm = d

where d is a constant.

Notice that our derivation of the scalar equation required us to know a normal
vector of the plane in advance. However, as we saw in the previous section, typically
in Linear Algebra we will be given the vector equation of a plane. Therefore, we need
to determine how to find a normal vector of a plane given its vector equation.
Let P be the plane with vector equation

~x = c1~v + c2~w + ~b = c1(v1, v2, v3)^T + c2(w1, w2, w3)^T + (b1, b2, b3)^T, c1, c2 ∈ R

We want to find a vector ~n that is orthogonal to every vector which lies in the plane.
We first find directed line segments which lie in the plane and originate from the same
point. Taking c1 = 0 = c2, we get that B(b1, b2, b3) lies in the plane; taking c1 = 1,
c2 = 0, we get that V(v1 + b1, v2 + b2, v3 + b3) lies in the plane; and taking c1 = 0,
c2 = 1, we get that W(w1 + b1, w2 + b2, w3 + b3) lies in the plane. Hence, the directed
line segments

BV = (v1, v2, v3)^T = ~v and BW = (w1, w2, w3)^T = ~w

are vectors in the plane which originate from the same point ~b. We now want to find
a vector ~n such that

~n · ~v = 0 = ~n · ~w

This gives a system of two equations in three unknowns:

v1n1 + v2n2 + v3n3 = 0
w1n1 + w2n2 + w3n3 = 0

Solving the system, we find that one solution is

~n = (v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1)^T
DEFINITION
Cross Product
Let ~v = (v1, v2, v3)^T and ~w = (w1, w2, w3)^T be vectors in R3. Then the cross
product of ~v and ~w is

~v × ~w = (v2w3 − v3w2, v3w1 − v1w3, v1w2 − v2w1)^T
EXAMPLE 7 We have

(1, 2, 3)^T × (6, 5, 4)^T = (2(4) − 3(5), 3(6) − 1(4), 1(5) − 2(6))^T = (−7, 14, −7)^T
(0, 1, 0)^T × (1, 0, 0)^T = (1(0) − 0(0), 0(1) − 0(0), 0(0) − 1(1))^T = (0, 0, −1)^T
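This componentwise formula is implemented directly in NumPy as np.cross; a quick sanity check (a sketch, assuming NumPy):

    import numpy as np

    v = np.array([1, 2, 3])
    w = np.array([6, 5, 4])
    n = np.cross(v, w)
    print(n)             # [-7 14 -7]
    print(n @ v, n @ w)  # 0 0, so n is orthogonal to both v and w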
By definition, the cross product of ~v and ~w gives a vector ~n that is orthogonal to both
~v and ~w. Additionally, the cross product has the following properties.
THEOREM 4 For any vectors ~v, ~w, ~x ∈ R3 and c ∈ R we have
(1) If ~n = ~v × ~w, then for any ~y ∈ Span{~v, ~w} we have ~y · ~n = 0.
(2) ~v × ~w = −(~w × ~v)
(3) ~v × ~v = ~0
(4) If ~v × ~w = ~0, then either one of ~v or ~w is the zero vector or ~w is a scalar multiple
of ~v.
(5) ~v × (~w + ~x) = ~v × ~w + ~v × ~x
(6) (c~v) × ~w = c(~v × ~w)
(7) ‖~v × ~w‖ = ‖~v‖‖~w‖ sin θ, where θ is the angle between ~v and ~w.
EXAMPLE 8 Find a scalar equation for the plane with vector equation
~x = s(3, 1, −1)^T + t(−1, −1, 5)^T + (1, 2, 3)^T, s, t ∈ R.

Solution: From our work above, we see that a normal vector to the plane is

~n = (3, 1, −1)^T × (−1, −1, 5)^T = (4, −14, −2)^T

Thus, a scalar equation of the plane is

4x1 − 14x2 − 2x3 = 4(1) + (−14)(2) + (−2)(3) = −30
1.4 Projections
So far we have seen how to combine vectors (via linear combinations) to create other
vectors. We now look at this procedure in reverse: we want to take a vector and
write it as a sum of two other vectors. In elementary physics, this is used to split a
force into its components along certain directions (usually horizontal and vertical).

In particular, given vectors ~u, ~v ∈ Rn with ~v ≠ ~0, we want to write ~u as a sum of a
scalar multiple of ~v and another vector ~w which is orthogonal to ~v, as in Figure 1.4.1.
To do this, we first need to find how much of ~u is in the direction of ~v.

Consider ~u = c~v + ~w, where ~w · ~v = 0. Then

~u · ~v = (c~v + ~w) · ~v = c(~v · ~v) + ~w · ~v = c‖~v‖² + 0

Hence, c = (~u · ~v)/‖~v‖².
[Figure 1.4.1: Examples of projections. In each panel, ~b is written as the sum of
proj~a ~b, which lies along ~a, and perp~a ~b, which is orthogonal to ~a.]
DEFINITION
Projection in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The projection of ~u onto ~v is defined by

proj~v ~u = ((~u · ~v)/‖~v‖²) ~v
Now that we have the projection of ~u onto ~v, it is easy to find the vector ~w that is
orthogonal to ~v such that ~u = proj~v ~u + ~w.

DEFINITION
Perpendicular in Rn
Let ~u, ~v ∈ Rn with ~v ≠ ~0. The perpendicular of ~u onto ~v is defined by

perp~v ~u = ~u − proj~v ~u
EXERCISE 1 Show that proj~v ~u · perp~v ~u = 0.
EXAMPLE 1 Let ~u = (2, 3)^T and ~v = (3, 4)^T. Find proj~v ~u and perp~v ~u.

Solution: We have

proj~v ~u = ((~u · ~v)/‖~v‖²) ~v = (18/25)(3, 4)^T = (54/25, 72/25)^T
perp~v ~u = ~u − proj~v ~u = (2, 3)^T − (54/25, 72/25)^T = (−4/25, 3/25)^T
EXAMPLE 2 Let ~v = (1, 2, 2)^T and ~u = (1, 0, 1)^T. Find proj~v ~u and perp~v ~u.

Solution: We have

proj~v ~u = ((~u · ~v)/‖~v‖²) ~v = (3/9)(1, 2, 2)^T = (1/3, 2/3, 2/3)^T
perp~v ~u = ~u − proj~v ~u = (1, 0, 1)^T − (1/3, 2/3, 2/3)^T = (2/3, −2/3, 1/3)^T
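Both formulas translate directly into code. A small sketch assuming NumPy; the helper names proj and perp are our own:

    import numpy as np

    def proj(u, v):
        # Projection of u onto a non-zero vector v: ((u . v) / ||v||^2) v
        return (u @ v) / (v @ v) * v

    def perp(u, v):
        # Component of u orthogonal to v.
        return u - proj(u, v)

    u = np.array([1.0, 0.0, 1.0])
    v = np.array([1.0, 2.0, 2.0])
    print(proj(u, v))  # [0.333... 0.666... 0.666...]
    print(perp(u, v))  # [ 0.666... -0.666...  0.333...]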
How do we find the projection of a vector onto a plane? The problem is that the
plane is spanned by two linearly independent vectors, so we don't really know onto
which vector in the plane we are projecting. From Figure 1.4.2 we see that we can
instead use the normal vector of the plane. In particular, the projection of the vector
onto the plane is the perpendicular of the projection of the vector onto the normal
vector of the plane.
EXAMPLE 3 Find the projection of ~u = (2, 3, 1)^T onto the plane 3x1 − x2 + 4x3 = 0.

Solution: We see that the normal vector is ~n = (3, −1, 4)^T. From our work above,
the projection of ~u onto the plane is the perpendicular of the projection of ~u onto ~n.
That is,

perp~n ~u = ~u − ((~u · ~n)/‖~n‖²) ~n = (2, 3, 1)^T − (7/26)(3, −1, 4)^T = (31/26, 85/26, −1/13)^T
[Figure 1.4.2: Projection of a vector onto a plane. The projection of ~v onto the plane
is perp~n ~v, where ~n is a normal vector of the plane.]
EXAMPLE 4 Find the projection of ~u = (−1, 1, 3)^T onto the plane with vector equation
~x = s(1, 2, 1)^T + t(−2, 2, 0)^T, s, t ∈ R.

Solution: To find the projection, we first need to find a normal vector of the plane.
We have

~n = (1, 2, 1)^T × (−2, 2, 0)^T = (−2, −2, 6)^T

Thus, the projection of ~u onto the plane is

perp~n ~u = ~u − ((~u · ~n)/‖~n‖²) ~n = (−1, 1, 3)^T − (18/44)(−2, −2, 6)^T = (−2/11, 20/11, 6/11)^T
EXAMPLE 5 Find the projection of ~u = (2, 3, 5)^T onto the plane P = Span{(1, −1, 0)^T, (1, 2, 0)^T}.

Solution: A vector equation of the plane is ~x = s(1, −1, 0)^T + t(1, 2, 0)^T, s, t ∈ R.
Thus, a normal vector of the plane is

~n = (1, −1, 0)^T × (1, 2, 0)^T = (0, 0, 3)^T

Thus, the projection of ~u onto P is

perp~n ~u = ~u − ((~u · ~n)/‖~n‖²) ~n = (2, 3, 5)^T − (15/9)(0, 0, 3)^T = (2, 3, 0)^T
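Chaining the cross product with the perpendicular reproduces these plane projections; a sketch assuming NumPy:

    import numpy as np

    u = np.array([2.0, 3.0, 5.0])
    n = np.cross([1.0, -1.0, 0.0], [1.0, 2.0, 0.0])  # normal vector (0, 0, 3)
    print(u - (u @ n) / (n @ n) * n)                 # [2. 3. 0.], the projection onto P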
REMARK
In the last two examples we could have used any scalar multiple of ~n as a normal
vector.
Chapter 2
Systems of Linear Equations
We have already seen in the last chapter many cases where we needed to simultane-
ously solve a set of equations: when determining whether a set was linearly indepen-
dent, when determining if a vector was in the span of a set, and when determining the
formula for the cross product. Each of these examples could be solved, with some
effort, using the high school methods of substitution and elimination. However, in
the real world, we rarely get a set of just 3 linear equations in 3 unknowns.
Problems can have hundreds of thousands of equations and hundreds of thousands
of unknowns. Hence, we want to develop some theory for solving equations that will
allow us to solve such problems.
It is important to remember while studying this chapter that in typical applications
the computations are performed by computer. Thus, although we expect that you
can solve small systems by hand, it is more important that you understand how to
set up an appropriate system of equations and to interpret the results. In particular,
the theory presented in this chapter will be extremely important throughout Math
136 and Math 235.
2.1 Systems of Linear Equations
DEFINITION
Linear Equation
An equation of the form
a1x1 + · · ·+ anxn = b
where a1, . . . , an, b are constants is called a linear equation. The constants ai are
called the coefficients of the equation and b is called the right-hand side.
REMARKS
1. In this course the coefficients and right-hand side will typically be real numbers;
however, we will occasionally present linear equations where the coefficients and
right-hand side are complex.
2. From our work in the last chapter, we know that a linear equation where
a1, . . . , an, b ∈ R in n variables geometrically represents a hyperplane in Rn.
EXAMPLE 1 The equation x1 + 2x3 = 4 is linear. Note that the coefficient of x2 in this equation
is 0.
The equation x2 = −4x1 is also linear as it can be written as 4x1 + x2 = 0 (called
standard form).
The equations x1² + x1x2 = 5 and x sin x = 0 are not linear equations.
DEFINITION
System of Linear
Equations
A set of m linear equations in the same variables x1, . . . , xn is called a system of
linear equations.
A general system of m linear equations in n variables has the form

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm
Observe that the coefficient aij represents the coefficient of xj in the i-th equation.
DEFINITION
Solution of a
System
A vector ~s = (s1, . . . , sn)^T ∈ Rn is called a solution of a system of m linear
equations in n unknowns (variables) if all m equations are satisfied when we set
xi = si for 1 ≤ i ≤ n.
DEFINITION
Consistent
Inconsistent
If a system of linear equations has at least one solution, then it is said to be consis-
tent. Otherwise, it is said to be inconsistent.
EXAMPLE 2 A solution of the system of 3 linear equations in 2 unknowns

x1 + x2 = 1
2x1 − 3x2 = 12
−2x1 − 3x2 = 0

is ~s = (3, −2)^T, since if we take x1 = 3 and x2 = −2, then we get

3 + (−2) = 1
2(3) − 3(−2) = 12
−2(3) − 3(−2) = 0

Therefore, this system is consistent.
EXAMPLE 3 Two of the solutions of the system of 2 equations in 3 unknowns

x1 − 2x2 = 3
x1 + x2 + 3x3 = 9

are ~s1 = (7, 2, 0)^T and ~s2 = (5, 1, 1)^T. Therefore, this system is consistent.
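Checking a candidate solution is a single matrix-vector product: the i-th entry of A @ s below is the left-hand side of the i-th equation. A sketch assuming NumPy:

    import numpy as np

    A = np.array([[1, -2, 0],
                  [1,  1, 3]])   # coefficients of the two equations
    b = np.array([3, 9])         # right-hand sides
    for s in (np.array([7, 2, 0]), np.array([5, 1, 1])):
        print(np.array_equal(A @ s, b))  # True for both candidate solutions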
EXAMPLE 4 The system of 3 equations in 3 unknowns
2x2 + x3 = 1
−x1 + x2 − x3 = 2
x1 + x2 + 2x3 = 3
does not have any solutions. Hence, the system is inconsistent.
REMARK

Observe that geometrically a system of m linear equations in n variables represents
m hyperplanes in Rn, and so a solution of the system is a vector in Rn which lies on
all m hyperplanes. The system is inconsistent if the m hyperplanes do not share a
common point of intersection.
It is easy to show geometrically that a system of linear equations in 2 variables is
either inconsistent, consistent with a unique solution, or consistent with infinitely
many solutions (you should sketch all 3 possibilities). Of course, it is clear that we
can make a system of linear equations that is inconsistent, or that is consistent with
a unique solution. We now prove that if a system of equations is consistent with more
than one solution, then it must in fact have infinitely many solutions.
THEOREM 1 Assume the system of linear equations with aij, bi ∈ R

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm

has two distinct solutions ~s = (s1, . . . , sn)^T and ~t = (t1, . . . , tn)^T. Then
~x = ~s + c(~s − ~t) is a distinct solution for each c ∈ R.
Proof: For each i, 1 ≤ i ≤ m, we have

ai1(s1 + c(s1 − t1)) + · · · + ain(sn + c(sn − tn))
= ai1s1 + cai1s1 − cai1t1 + · · · + ainsn + cainsn − caintn
= (ai1s1 + · · · + ainsn) + c(ai1s1 + · · · + ainsn) − c(ai1t1 + · · · + aintn)
= bi + cbi − cbi
= bi

Thus, ~s + c(~s − ~t) is a solution for each c ∈ R.

We now show that for any c1 ≠ c2, the solutions ~s + c1(~s − ~t) and ~s + c2(~s − ~t) are
distinct. Assume that they are not distinct. Then

~s + c1(~s − ~t) = ~s + c2(~s − ~t)
c1(~s − ~t) = c2(~s − ~t)
(c1 − c2)(~s − ~t) = ~0

Since c1 ≠ c2, this implies that ~s − ~t = ~0 and hence ~s = ~t. But this contradicts our
assumption that ~s ≠ ~t.
DEFINITION
Solution Set
The set of all solutions of a system of linear equations is called the solution set of
the system.
In the next section, we will derive a couple of methods for finding the solution set of
a system of linear equations.
2.2 Solving Systems of Linear Equations
Given a system of 2 linear equations in 2 unknowns, we can use high school methods
of substitution and elimination to solve it.
EXAMPLE 1 Solve the system

2x1 + 3x2 = 11
3x1 + 6x2 = 7

Solution: We first want to solve for one variable, say x2. To do this, we need to
eliminate x1 from one of the equations. If we multiply the first equation by −3, the
system becomes

−6x1 − 9x2 = −33
3x1 + 6x2 = 7

Now, if we multiply the second equation by 2 we get

−6x1 − 9x2 = −33
6x1 + 12x2 = 14

Adding the first equation to the second we get

−6x1 − 9x2 = −33
0x1 + 3x2 = −19

Multiplying the second equation by 1/3 gives

−6x1 − 9x2 = −33
0x1 + x2 = −19/3

We can now add 9 times the second equation to the first equation to get

−6x1 + 0x2 = −90
0x1 + x2 = −19/3

Finally, multiplying the top equation by −1/6 we get

x1 + 0x2 = 15
0x1 + x2 = −19/3

Hence, the system is consistent with unique solution (15, −19/3)^T.
Observe that after each step we actually have a new system of linear equations to
solve. Of course, the idea is that each new system has the same solution set as the
original and is easier to solve.
DEFINITION
Equivalent
Two systems of equations are said to be equivalent if they have the same solution
set.
Of course, this method can be used to solve larger systems as well. To invent a
method for solving very large systems, it is important that you really understand
how the method works for smaller systems.
EXERCISE 1 Solve the system by using substitution and elimination. Clearly show/explain all of
your steps used in solving the system and describe your general procedure.
x1 − 2x2 − 3x3 = 0
2x1 + x2 + 5x3 = −1
−x1 + x2 − x3 = 2
Now imagine that you need to solve a system of a hundred thousand equations with a
hundred thousand variables. Certainly you are not going to want to do this by hand
(it would be environmentally unfriendly to use that much paper). So, being clever,
you decide that you want to program a computer to solve the system for you. To do
this, you need to ask yourself a few questions: How is the computer going to store
the system? What operations is the computer allowed to perform on the equations?
How will the computer know which operations it should use to solve the system?
To answer the first question, we observe that when using substitution and elimination,
we do not really need to write down the variables xi each time as we are only modifying
the coefficients and right-hand side. Thus, it makes sense to store just the coefficients
and right-hand side of each equation in the system in a rectangular array.
DEFINITION
Augmented Matrix
Coefficient Matrix
The augmented matrix for the system of linear equations

a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
⋮
am1x1 + am2x2 + · · · + amnxn = bm

is the rectangular array

[ a11 a12 · · · a1n | b1 ]
[ a21 a22 · · · a2n | b2 ]
[  ⋮   ⋮         ⋮  |  ⋮ ]
[ am1 am2 · · · amn | bm ]
The coefficient matrix of the system is

[ a11 a12 · · · a1n ]
[ a21 a22 · · · a2n ]
[  ⋮   ⋮         ⋮  ]
[ am1 am2 · · · amn ]
EXAMPLE 2 The system of linear equations

3x1 + x2 + x3 = −1
6x1 + x3 = −2
−3x1 + 2x2 = 3

has coefficient matrix

[  3 1 1 ]
[  6 0 1 ]
[ −3 2 0 ]

and augmented matrix

[  3 1 1 | −1 ]
[  6 0 1 | −2 ]
[ −3 2 0 |  3 ]
REMARKS
1. It is very important to observe that we can look at the coefficient matrix in
two ways. First, the i-th row of the coefficient matrix represents the coefficients
of the i-th equation in the system. Second, the j-th column of the coefficient
matrix represents the coefficients of xj in all equations.
2. We will denote a system of linear equations with coefficient matrix A and
right-hand side ~b = (b1, . . . , bm)^T by [A | ~b]. In the next chapter, we will introduce
another way of denoting a system of equations.
To answer the question about what operations the computer is allowed to perform,
it is helpful to write each step of our example in matrix form.
EXAMPLE 3 Write each step of Example 1 in matrix form and describe how the rows of the matrix
were modified in each step.

Solution: The initial system was

[ 2 3 | 11 ]
[ 3 6 |  7 ]

We then multiplied the first row of the augmented matrix by −3 to get

[ −6 −9 | −33 ]
[  3  6 |   7 ]

We then multiplied the second row by 2 to get

[ −6 −9 | −33 ]
[  6 12 |  14 ]

Adding the first row to the second row gave

[ −6 −9 | −33 ]
[  0  3 | −19 ]

Next we multiplied the second row by 1/3 to get

[ −6 −9 |   −33 ]
[  0  1 | −19/3 ]

Then we added 9 times the second row to the first:

[ −6 0 |   −90 ]
[  0 1 | −19/3 ]

The last step was to multiply the first row by −1/6:

[ 1 0 |    15 ]
[ 0 1 | −19/3 ]

This is the augmented matrix for the system

x1 = 15
x2 = −19/3

which gives us the solution.
EXERCISE 2 Write each step of Exercise 1 in matrix form and describe how the rows of the matrix
were modified in each step.
Our method of solving has involved two types of operations. One was to multiply
a row by a non-zero number, and the second was to add a multiple of one row to
another. However, to create a nice method for a computer to solve a system, we will
add one more type of operation: swapping rows.
DEFINITION
Elementary Row
Operations
The three elementary row operations (EROs) for solving a system of linear
equations are:
1. multiplying a row by a non-zero scalar,
2. adding a multiple of one row to another,
3. swapping two rows.
When applying EROs to a matrix, a process called row reducing the matrix, we often
use shorthand notation to indicate which operations we are using:
1. cRi indicates multiplying the i-th row by c ≠ 0.
2. Ri + cRj indicates adding c times the j-th row to the i-th row.
3. Ri ↔ Rj indicates swapping the i-th row and the j-th row.
For now you are required to always indicate which EROs you use in each step. Not
only will this help you in row reducing a matrix, and finding/fixing computational
errors, but later in the course, we will find a case where this is necessary.
Of course, our goal is to row reduce the augmented matrix of a system into the
augmented matrix of an equivalent system that is easier to solve.
DEFINITION
Row Equivalent
Two matrices A and B are said to be row equivalent if there exists a sequence of
elementary row operations that transform A into B.
REMARK
Observe that EROs are reversible (we can always perform an ERO that undoes what
an ERO did), hence if there exists a sequence of elementary row operations that
transform A into B, then there also exists a sequence of elementary row operations
that transforms B into A.
THEOREM 1 If the augmented matrices [A1 | ~b1] and [A2 | ~b2] are row equivalent, then the systems
of linear equations associated with the two augmented matrices are equivalent.
Proof: The proof is left as an exercise.
Finally, to answer the third question about which operations a computer should use
to solve the system, we need to figure out an algorithm for the computer to follow.
The algorithm should apply elementary row operations to the augmented matrix of
a system until we have an equivalent system which is easy to solve.
DEFINITION
Reduced Row
Echelon Form
A matrix R is said to be in reduced row echelon form (RREF) if:
1. All rows containing a non-zero entry are above any rows that contain only zeros.
2. The first non-zero entry in each non-zero row is 1, called a leading one.
3. The leading one in each non-zero row is to the right of the leading one in any
row above it.
4. A leading one is the only non-zero entry in its column.
If R is row equivalent to a matrix A, then we say that R is the reduced row echelon
form of A.
REMARK
The reduced row echelon form is sometimes called the row canonical form.
EXAMPLE 4 Determine which of the following matrices are in RREF.

a)
[ 1 0 2 0 ]
[ 0 1 1 0 ]
[ 0 0 0 1 ]

b)
[ 0 1 0 0 ]
[ 0 0 0 1 ]
[ 0 0 0 0 ]

c)
[ 1 0 0 2 ]
[ 0 1 1 1 ]
[ 0 0 1 1 ]

d)
[ 0 1 0 ]
[ 1 0 0 ]

Solution: The matrices in a) and b) are in RREF. The matrix in c) is not in RREF,
since the column containing the leading one in the third row has another non-zero
entry. The matrix in d) is not in RREF, since the leading one in the second row is
to the left of the leading one in the row above it.
THEOREM 2 The RREF of a matrix is unique.
Proof: The proof requires some additional tools which will be presented in Chapter
4.
EXAMPLE 5 Solve the following system of equations by row reducing the augmented matrix of the
system to RREF.
x1 + x2 = −7
2x1 + 4x2 + x3 = −16
x1 + 2x2 + x3 = 9
Solution: We get

[ 1 1 0 | -7  ]
[ 2 4 1 | -16 ]  R2 - 2R1
[ 1 2 1 | 9   ]  R3 - R1

~
[ 1 1 0 | -7 ]
[ 0 2 1 | -2 ]  R2 ↔ R3
[ 0 1 1 | 16 ]

~
[ 1 1 0 | -7 ]  R1 - R2
[ 0 1 1 | 16 ]
[ 0 2 1 | -2 ]  R3 - 2R2

~
[ 1 0 -1 | -23 ]
[ 0 1  1 | 16  ]
[ 0 0 -1 | -34 ]  (-1)R3

~
[ 1 0 -1 | -23 ]  R1 + R3
[ 0 1  1 | 16  ]  R2 - R3
[ 0 0  1 | 34  ]

~
[ 1 0 0 | 11  ]
[ 0 1 0 | -18 ]
[ 0 0 1 | 34  ]

Hence the solution is

~x = [ 11  ]
     [ -18 ]
     [ 34  ]
EXAMPLE 6 Find the solution set of the following system of equations.
x1 + x2 + 2x3 + x4 = 3
x1 + 2x2 + 4x3 + x4 = 7
x1 + x4 = −21
Solution: We write the augmented matrix of the system and row reduce it to RREF.

[ 1 1 2 1 | 3   ]
[ 1 2 4 1 | 7   ]  R2 - R1
[ 1 0 0 1 | -21 ]  R3 - R1

~
[ 1 1  2  1 | 3   ]  R1 - R2
[ 0 1  2  0 | 4   ]
[ 0 -1 -2 0 | -24 ]  R3 + R2

~
[ 1 0 0 1 | -1  ]
[ 0 1 2 0 | 4   ]
[ 0 0 0 0 | -20 ]
Observe that the last row represents the equation 0x1 + 0x2 + 0x3 + 0x4 = −20.
Clearly this is impossible, so the system is inconsistent.
It is clear that if we ever get a row of the form [ 0 · · · 0 | b ] with b ≠ 0, then the system of equations is inconsistent, as this corresponds to the equation 0 = b.
Infinitely Many Solutions
As we discussed above, a consistent system of linear equations either has a unique
solution or infinitely many. We now look at how to determine all solutions of a system
with infinitely many solutions. We start by considering an example.
EXAMPLE 7 Show the following system of equations has infinitely many solutions and find the
solution set.
x1 + x2 + x3 = 4
x2 + x3 = 3
Solution: We row reduce the augmented matrix to RREF.

[ 1 1 1 | 4 ]  R1 - R2
[ 0 1 1 | 3 ]

~
[ 1 0 0 | 1 ]
[ 0 1 1 | 3 ]
Observe that the RREF of the augmented matrix corresponds to the system of equa-
tions
x1 = 1
x2 + x3 = 3
There are infinitely many choices for x2 and x3 that will satisfy x2 + x3 = 3. Hence,
the system has infinitely many solutions.
Observe that by choosing any x3 = t ∈ R, we get a unique value for x2; in particular,
x2 = 3− t. Thus, the solution set of the system is
~x = [ x1 ]   [ 1     ]   [ 1 ]     [ 0  ]
     [ x2 ] = [ 3 - t ] = [ 3 ] + t [ -1 ]
     [ x3 ]   [ t     ]   [ 0 ]     [ 1  ]

for any t ∈ R.
REMARK
The geometric interpretation of the example above is that the planes x1 + x2 + x3 = 4 and x2 + x3 = 3 in R3 intersect in the line

~x = [ 1 ]     [ 0  ]
     [ 3 ] + t [ -1 ],   t ∈ R.
     [ 0 ]     [ 1  ]
DEFINITION
Free Variable
Let R be the RREF of a coefficient matrix of a system of linear equations. If the j-th
column of R does not contain a leading one, then we call xj a free variable.
Using the examples above, we can give an algorithm for solving a system of linear
equations.
ALGORITHM
To solve a system of linear equations:
1. Write the augmented matrix for the system.
2. Use elementary row operations to row reduce the augmented matrix into RREF.
3. Write the system of linear equations corresponding to the RREF.
4. If the system contains an equation of the form 0 = b where b 6= 0, then stop as
the system is inconsistent.
5. Otherwise, move each free variable (if any) to the right hand side of each equa-
tion and assign each free variable a parameter.
6. Determine the solution set by using vector operations to write the system as a
linear combination of vectors.
REMARK
The typical method for row reducing an augmented matrix to its RREF is called
Gauss-Jordan Elimination. However, although this algorithm for row reducing a
matrix always works, it is not always the most efficient way to reduce a matrix by
hand (or by computer). In individual cases, you may want to perform various “tricks”
to make the row reductions/computations easier. For example, you probably want to
avoid fractions as long as possible. These “tricks” are best learned from experience.
That is, you should row reduce lots of matrices.
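To make the algorithm concrete, the following is a minimal Python sketch of Gauss-Jordan elimination using only the three EROs. The code is an illustration only; the function name rref and the use of exact fractions (to avoid round-off) are incidental choices of ours, not anything fixed by the notes.

    from fractions import Fraction

    def rref(M):
        # Row reduce a matrix (a list of rows) to reduced row echelon form
        # using only the three elementary row operations.
        A = [[Fraction(x) for x in row] for row in M]
        m, n = len(A), len(A[0])
        lead = 0                                   # column to search for the next leading one
        for i in range(m):
            pivot = None
            while lead < n and pivot is None:
                pivot = next((r for r in range(i, m) if A[r][lead] != 0), None)
                if pivot is None:
                    lead += 1                      # no pivot in this column; move right
            if pivot is None:
                break                              # the remaining rows are all zero
            A[i], A[pivot] = A[pivot], A[i]        # ERO 3: swap rows
            A[i] = [x / A[i][lead] for x in A[i]]  # ERO 1: scale row i to get a leading one
            for r in range(m):
                if r != i and A[r][lead] != 0:     # ERO 2: clear the rest of the column
                    A[r] = [x - A[r][lead] * y for x, y in zip(A[r], A[i])]
            lead += 1
        return A

    # The augmented matrix of Example 5 reduces to rows (1,0,0,11), (0,1,0,-18), (0,0,1,34).
    for row in rref([[1, 1, 0, -7], [2, 4, 1, -16], [1, 2, 1, 9]]):
        print([str(x) for x in row])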
EXAMPLE 8 Solve the following system of linear equations.
2x1 + x2 + x3 + x4 = −2
x1 + x3 + x4 = 1
−2x2 + 2x3 + 2x4 = 8
Solution: We have

[ 2  1 1 1 | -2 ]  R1 ↔ R2
[ 1  0 1 1 | 1  ]
[ 0 -2 2 2 | 8  ]  (1/2)R3

~
[ 1  0 1 1 | 1  ]
[ 2  1 1 1 | -2 ]  R2 - 2R1
[ 0 -1 1 1 | 4  ]

~
[ 1  0  1  1 | 1  ]
[ 0  1 -1 -1 | -4 ]
[ 0 -1  1  1 | 4  ]  R3 + R2

~
[ 1 0  1  1 | 1  ]
[ 0 1 -1 -1 | -4 ]
[ 0 0  0  0 | 0  ]
The corresponding system of equations is
x1 + x3 + x4 = 1
x2 − x3 − x4 = −4
0 = 0
Since x3 and x4 are free variables we let x3 = s ∈ R and x4 = t ∈ R. Thus, we have
x1 = 1 − x3 − x4 = 1 − s − t and x2 = −4 + x3 + x4 = −4 + s + t. Therefore all
solutions have the form

~x = [ x1 ]   [ 1 - s - t  ]   [ 1  ]     [ -1 ]     [ -1 ]
     [ x2 ] = [ -4 + s + t ] = [ -4 ] + s [ 1  ] + t [ 1  ]
     [ x3 ]   [ s          ]   [ 0  ]     [ 1  ]     [ 0  ]
     [ x4 ]   [ t          ]   [ 0  ]     [ 0  ]     [ 1  ]
We can solve systems of linear equations with complex coefficients over the complex
numbers C using exactly the same method.
EXAMPLE 9 Solve the following system over C.

z2 - iz3 = 1 + 3i
iz1 - z2 + (-1 + i)z3 = 1 + 2i
2z1 + 2iz2 + (3 + 2i)z3 = 4

Solution: We have

[ 0 1  -i   | 1+3i ]  R1 ↔ R2
[ i -1 -1+i | 1+2i ]
[ 2 2i 3+2i | 4    ]

~
[ i -1 -1+i | 1+2i ]  (-i)R1
[ 0 1  -i   | 1+3i ]
[ 2 2i 3+2i | 4    ]

~
[ 1 i  1+i  | 2-i  ]
[ 0 1  -i   | 1+3i ]
[ 2 2i 3+2i | 4    ]  R3 - 2R1

~
[ 1 i 1+i | 2-i  ]  R1 - iR2
[ 0 1 -i  | 1+3i ]
[ 0 0 1   | 2i   ]

~
[ 1 0 i  | 5-2i ]  R1 - iR3
[ 0 1 -i | 1+3i ]  R2 + iR3
[ 0 0 1  | 2i   ]

~
[ 1 0 0 | 7-2i  ]
[ 0 1 0 | -1+3i ]
[ 0 0 1 | 2i    ]

Hence z1 = 7 - 2i, z2 = -1 + 3i, and z3 = 2i.
EXAMPLE 10 Solve the following system over C.

(1 + i)z1 + 2iz2 - iz3 = i
2iz2 + z3 = i
z1 - z3 = 0

Solution:

[ 1+i 2i -i | i ]  R1 ↔ R3
[ 0   2i 1  | i ]
[ 1   0  -1 | 0 ]

~
[ 1   0  -1 | 0 ]
[ 0   2i 1  | i ]  (-i/2)R2
[ 1+i 2i -i | i ]  R3 - (1+i)R1

~
[ 1 0  -1   | 0   ]
[ 0 1  -i/2 | 1/2 ]
[ 0 2i 1    | i   ]  R3 - 2iR2

~
[ 1 0 -1   | 0   ]
[ 0 1 -i/2 | 1/2 ]
[ 0 0 0    | 0   ]

The corresponding system of equations is

z1 - z3 = 0
z2 - (i/2)z3 = 1/2

Since z3 is a complex free variable we let z3 = t ∈ C. Thus, we get z1 = t and z2 = 1/2 + (i/2)t, hence the solution is

~z = [ z1 ]   [ t            ]   [ 0   ]     [ 1   ]
     [ z2 ] = [ 1/2 + (i/2)t ] = [ 1/2 ] + t [ i/2 ]
     [ z3 ]   [ t            ]   [ 0   ]     [ 1   ]
Rank
It is clear that the number of free variables in a system of linear equations is the
number of columns in the RREF of the coefficient matrix which do not have leading
ones. It is therefore not surprising that counting the number of leading ones can be
very useful.
DEFINITION
Rank
The rank of a matrix is the number of leading ones in the RREF of the matrix.
EXAMPLE 11 Determine the rank of the following matrices.

a) A = [ 1 0 0 1 0  ]
       [ 0 0 1 0 -2 ]
       [ 0 0 0 0 0  ]

Solution: Since A is in RREF, we see that A has two leading ones, so rank A = 2.

b) B = [ 2  1 1 1 -2 ]
       [ 1  0 1 1 1  ]
       [ 0 -2 2 2 8  ]

Solution: Observe that the matrix B is just the augmented matrix in Example 8. Hence, performing the same row operations we find that the RREF of B is

R = [ 1 0  1  1 1  ]
    [ 0 1 -1 -1 -4 ]
    [ 0 0  0  0 0  ]

Therefore, rank B = 2.
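Counting leading ones is then a one-line computation on top of a routine like the rref sketch given earlier; the helper below is our own illustration (it reuses that sketch, and the name rank is ours).

    def rank(M):
        # The rank is the number of leading ones, i.e. of non-zero rows in the RREF.
        return sum(any(x != 0 for x in row) for row in rref(M))

    print(rank([[2, 1, 1, 1, -2], [1, 0, 1, 1, 1], [0, -2, 2, 2, 8]]))  # prints 2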
We now get the following extremely important theorem, which summarizes all of our results on systems of linear equations in terms of rank. This theorem will be referenced many times throughout the rest of these notes.
THEOREM 3 Let A be the m × n coefficient matrix of a system of linear equations.
(1) If the rank of A is less than the rank of the augmented matrix [ A | ~b ], then the system is inconsistent.
(2) If the system [ A | ~b ] is consistent, then the system contains n - rank A free variables (parameters). In particular, a consistent system has a unique solution if and only if rank A = n.
(3) rank A = m if and only if the system [ A | ~b ] is consistent for every ~b ∈ Rm.
Proof: (1) If the rank of A is less than the rank of the augmented matrix, then there is a row in the RREF of the augmented matrix of the form [ 0 · · · 0 | 1 ]. This corresponds to the equation 0 = 1, which implies the system is inconsistent.

(2) This follows immediately from the definitions of rank and free variables.

(3) Assume that the RREF of [ A | ~b ] is [ R | ~c ]. If [ A | ~b ] is inconsistent for some ~b ∈ Rm, then the RREF of the augmented matrix must contain a row of the form [ 0 · · · 0 | 1 ], as otherwise we could find a solution to [ A | ~b ] by picking all free variables to be 0 and setting the variable corresponding to the leading one in the i-th row equal to ci. Therefore, rank A < m.

On the other hand, if rank A < m, then the RREF R of A has a row of all zeros. Thus, [ R | ~em ] is inconsistent as it has a row of the form [ 0 · · · 0 | 1 ]. Since elementary row operations are reversible, we can apply the reverse of the operations needed to row reduce A to R on [ R | ~em ] to get [ A | ~b ] for some ~b ∈ Rm. Then, this system is inconsistent since elementary row operations do not change the solution set. Thus, there exists some ~b ∈ Rm such that [ A | ~b ] is inconsistent.
Homogeneous Systems
DEFINITION
Homogeneous
System
A system of linear equations is said to be a homogeneous system if the right-hand side only contains zeros. That is, it has the form [ A | ~0 ].
We will soon see that homogeneous systems arise frequently in linear algebra. Of
course, all of our theory above regarding systems of linear equations applies to homo-
geneous systems. But, the fact that the right-hand side only contains 0s simplifies a
few things.
Observe that the last column of any matrix that is row equivalent to the augmented matrix of a homogeneous system will only contain zeros. Thus, it is standard practice to omit this column when solving a homogeneous system; we just row reduce the coefficient matrix. From this, we immediately observe that it is impossible to get a row of the form [ 0 · · · 0 | 1 ], hence a homogeneous system is always consistent. In particular, it is easy to see that the zero vector ~0 is always a solution of a homogeneous system. This is called the trivial solution.
EXAMPLE 12 Find the solution set of the homogeneous system
x1 + 2x2 + 3x3 = 0
−2x1 − 3x2 − 6x3 = 0
x1 + 3x2 + 4x3 = 0
x1 + 4x2 + 5x3 = 0
Solution: We row reduce the coefficient matrix to get

[ 1  2  3  ]
[ -2 -3 -6 ]  R2 + 2R1
[ 1  3  4  ]  R3 - R1
[ 1  4  5  ]  R4 - R1

~
[ 1 2 3 ]  R1 - 2R2
[ 0 1 0 ]
[ 0 1 1 ]  R3 - R2
[ 0 2 2 ]  R4 - 2R2

~
[ 1 0 3 ]  R1 - 3R3
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 2 ]  R4 - 2R3

~
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ]
[ 0 0 0 ]
This gives x1 = 0, x2 = 0, and x3 = 0. Therefore, the only solution is the trivial
solution.
EXAMPLE 13 Find the solution set of the homogeneous system
x1 + 2x2 = 0
x1 + 3x2 + x3 = 0
x2 + x3 = 0
Solution: We row reduce the coefficient matrix to get

[ 1 2 0 ]
[ 1 3 1 ]  R2 - R1
[ 0 1 1 ]

~
[ 1 2 0 ]  R1 - 2R2
[ 0 1 1 ]
[ 0 1 1 ]  R3 - R2

~
[ 1 0 -2 ]
[ 0 1 1  ]
[ 0 0 0  ]
The corresponding system is
x1 − 2x3 = 0
x2 + x3 = 0
Since x3 is a free variable, we let x3 = t ∈ R. Thus, the solution set has vector equation

~x = [ x1 ]     [ 2  ]
     [ x2 ] = t [ -1 ]
     [ x3 ]     [ 1  ]
We see in both of the examples above that we could have written the solution set as a spanning set, which, by Theorem 2, indicates that both solution sets are subspaces of R3.
THEOREM 4 The solution set of a homogeneous system of m linear equations in n variables is a
subspace of Rn.
Proof: The proof is left as an exercise.
REMARK
Because the solution set of a homogeneous system is a subspace, we often call it the
solution space of the system.
Chapter 3
Matrices and Linear Mappings
3.1 Operations on Matrices
Addition and Scalar Multiplication of Matrices
In the last chapter we used matrix notation to help us solve systems of linear equa-
tions. However, matrices can also be used in other settings as well. In particular, we
can treat matrices like vectors.
DEFINITION
Matrix
An m× n matrix A is a rectangular array with m rows and n columns. We denote
the entry in the i-th row and j-th column by aij. That is,
A = [ a11 a12 · · · a1j · · · a1n ]
    [ a21 a22 · · · a2j · · · a2n ]
    [  :   :         :        :  ]
    [ ai1 ai2 · · · aij · · · ain ]
    [  :   :         :        :  ]
    [ am1 am2 · · · amj · · · amn ]
REMARKS
1. Two matrices A and B are equal if and only if they have the same size and
have corresponding entries equal. That is, if aij = bij for 1 ≤ i ≤ m, 1 ≤ j ≤ n.
2. When working with multiple matrices we sometimes denote the ij-th entry of
a matrix A by (A)ij = aij .
3. The set of all m× n matrices with real entries is denoted Mm×n(R).
We define addition and scalar multiplication of matrices to match what we did with
vectors in Rn.
DEFINITION
Addition and
Scalar
Multiplication
Let A,B ∈Mm×n(R) and c ∈ R. Then we define A +B and cA by
(A +B)ij = (A)ij + (B)ij
(cA)ij = c(A)ij
EXAMPLE 1

a) [ 1 2  ]   [ 2  5 ]   [ 3 7 ]
   [ 3 -3 ] + [ -1 3 ] = [ 2 0 ]

b) √2 [ 1   -2 ]   [ √2 -2√2 ]
      [ -√2 √3 ] = [ -2  √6  ]

c) 3 [ 1 -1 ]     [ 2  2 ]   [ 3-4  -3-4 ]   [ -1 -7 ]
     [ 2  4 ] - 2 [ -2 3 ] = [ 6+4  12-6 ] = [ 10  6 ]
As we did with vectors in Rn, we will call a sum of scalar multiples of matrices a
linear combination. Note that this only makes sense if all of the matrices are of the
same size.
Properties of Matrix Addition and Multiplication by Scalars
THEOREM 1 Let A, B, and C be m× n matrices and let c1, c2 be real scalars. Then
V1 A+B is an m× n matrix;
V2 (A+B) + C = A + (B + C);
V3 A+B = B + A;
V4 There exists a matrix, denoted by Om,n, such that A+Om,n = A. In particular,
Om,n is the m× n matrix with all entries zero and is called the zero matrix.
V5 For each matrix A, there exists an m× n matrix (−A), with the property that
A+ (−A) = Om,n. In particular, (−A) is defined by (−A)ij = −(A)ij ;
V6 c1A is an m× n matrix;
V7 c1(c2A) = (c1c2)A;
V8 (c1 + c2)A = c1A+ c2A;
V9 c1(A+ B) = c1A+ c1B;
V10 1A = A.
Proof: These properties follow easily from the definitions of addition and multipli-
cation by scalars. The proof is left to the reader.
REMARK
Observe that these are exactly the same ten properties we had in Section 1.1 for
addition and multiplication by scalars of vectors in Rn.
The Transpose of a Matrix
As we will soon see, we will sometimes wish to work with the rows of an m×n matrix
as vectors in Rn. To preserve our convention of writing vectors in Rn as column
vectors, we invent some notation for turning rows into columns and vice versa.
DEFINITION
Transpose
The transpose of an m × n matrix A is the n × m matrix whose ij-th entry is the ji-th entry of A. We denote the transpose of A by A^T. Hence, we have

(A^T)ij = (A)ji
EXAMPLE 2 Determine the transpose of

A = [ 1 2 5 ],   B = [ -3 ]        C = [ 3 -2 ]
                     [ 0  ],  and      [ 4 1  ]
                     [ 1  ]            [ 7 -6 ]

Solution: We have

A^T = [ 1 2 5 ]^T = [ 1 ]
                    [ 2 ]
                    [ 5 ]

B^T = [ -3 0 1 ]

C^T = [ 3  4 7  ]
      [ -2 1 -6 ]
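Computationally, transposing is just an index swap; a one-line Python sketch (our own illustration) is:

    def transpose(A):
        # (A^T)ij = (A)ji: the rows of A become the columns of A^T.
        return [list(col) for col in zip(*A)]

    print(transpose([[3, -2], [4, 1], [7, -6]]))  # [[3, 4, 7], [-2, 1, -6]]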
THEOREM 2 For any m× n matrices A and B and scalar c ∈ R we have
(1) (A^T)^T = A.
(2) (A + B)^T = A^T + B^T.
(3) (cA)^T = cA^T.
Proof: (1) By definition of the transpose we have ((A^T)^T)ij = (A^T)ji = (A)ij.

(2) ((A + B)^T)ij = (A + B)ji = (A)ji + (B)ji = (A^T)ij + (B^T)ij = (A^T + B^T)ij.

(3) ((cA)^T)ij = (cA)ji = c(A)ji = c(A^T)ij = (cA^T)ij.
As mentioned above, one of the main uses of the transpose is to give us notation for the rows of a matrix. We demonstrate this in the next example.
EXAMPLE 3 Let ~a1 = [ 1 ]   and   ~a2 = [ 2  ]
                   [ 3 ]                [ -1 ]
                   [ 1 ]                [ 4  ]

Then the matrix A = [ ~a1^T ]   is   [ 1 3  1 ]
                    [ ~a2^T ]        [ 2 -1 4 ]

EXERCISE 1 If

[ ~a1^T ]   [ 1  3 1  ]
[ ~a2^T ] = [ -4 0 2  ]
[ ~a3^T ]   [ 5  9 -3 ]

then what are ~a1, ~a2, and ~a3?
Matrix-Vector Multiplication

We saw in the last chapter that we have two ways of viewing the coefficient matrix A of a system of linear equations [ A | ~b ]: the coefficients of the i-th equation are the entries in the i-th row of A, or the coefficients of xi are the entries in the i-th column of A. We will now use both of these interpretations to derive equivalent definitions of matrix-vector multiplication so that we can represent the system [ A | ~b ] by the equation A~x = ~b, where

~x = [ x1 ]
     [ :  ]
     [ xn ]

Note that both interpretations are going to be very useful throughout this course and in Linear Algebra 2.
Coefficients are the Rows of A

Let A be an m × n matrix. Using the transpose, we can write

A = [ ~a1^T ]
    [   :   ]
    [ ~am^T ]

where ~ai ∈ Rn. Observe that the i-th equation of the system [ A | ~b ] can be written ~ai · ~x = bi since

[ a1 ]   [ x1 ]
[ :  ] · [ :  ] = a1x1 + · · · + anxn
[ an ]   [ xn ]

Therefore, the system [ A | ~b ] can be written as

[ ~a1 · ~x ]   [ b1 ]
[    :     ] = [ :  ]
[ ~am · ~x ]   [ bm ]
To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector
Multiplication
Let A be an m × n matrix whose rows are denoted ~ai^T for 1 ≤ i ≤ m. Then, for any ~x ∈ Rn, we define

A~x = [ ~a1 · ~x ]
      [    :     ]
      [ ~am · ~x ]
REMARK
If A is an m×n matrix, then A~x is only defined if ~x ∈ Rn. Moreover, if ~x ∈ Rn, then
A~x ∈ Rm.
EXAMPLE 4 Calculate

[ 1 3  ]
[ 2 -4 ] [ x1 ]
[ 9 -1 ] [ x2 ]

Solution: Let

[ 1 3  ]          [ b1 ]
[ 2 -4 ] [ x1 ] = [ b2 ]
[ 9 -1 ] [ x2 ]   [ b3 ]

Then, we have

b1 = [ 1 ] · [ x1 ] = x1 + 3x2
     [ 3 ]   [ x2 ]

b2 = [ 2  ] · [ x1 ] = 2x1 - 4x2
     [ -4 ]   [ x2 ]

b3 = [ 9  ] · [ x1 ] = 9x1 - x2
     [ -1 ]   [ x2 ]

Hence,

[ 1 3  ]          [ x1 + 3x2  ]
[ 2 -4 ] [ x1 ] = [ 2x1 - 4x2 ]
[ 9 -1 ] [ x2 ]   [ 9x1 - x2  ]
EXAMPLE 5 Let A = [ 3 4 -5 ]. Calculate A~x where:
                  [ 1 0 2  ]

(a) ~x = [ 2  ]    (b) ~x = [ 1 ]    (c) ~x = [ 0 ]
         [ -1 ],            [ 0 ],            [ 0 ].
         [ 6  ]             [ 0 ]             [ 1 ]

Solution: We have

A~x = [ 3 4 -5 ] [ 2  ]   [ 3(2) + 4(-1) + (-5)(6) ]   [ -28 ]
      [ 1 0 2  ] [ -1 ] = [ 1(2) + 0(-1) + 2(6)    ] = [ 14  ]
                 [ 6  ]

A~x = [ 3 4 -5 ] [ 1 ]   [ 3(1) + 4(0) + (-5)(0) ]   [ 3 ]
      [ 1 0 2  ] [ 0 ] = [ 1(1) + 0(0) + 2(0)    ] = [ 1 ]
                 [ 0 ]

A~x = [ 3 4 -5 ] [ 0 ]   [ 3(0) + 4(0) + (-5)(1) ]   [ -5 ]
      [ 1 0 2  ] [ 0 ] = [ 1(0) + 0(0) + 2(1)    ] = [ 2  ]
                 [ 1 ]
Coefficients are the Columns of A

Observe that by using operations on vectors in Rm, the system of linear equations

a11x1 + · · · + a1nxn = b1
          :
am1x1 + · · · + amnxn = bm

can be written as

x1 [ a11 ]               [ a1n ]   [ b1 ]
   [  :  ] + · · · + xn [  :  ] = [ :  ]
   [ am1 ]               [ amn ]   [ bm ]

To write the left-hand side in the form A~x, we make the following definition.
DEFINITION
Matrix-Vector
Multiplication
Let A be an m × n matrix whose columns are denoted ~ai for 1 ≤ i ≤ n. Then, for any

~x = [ x1 ]
     [ :  ] ∈ Rn
     [ xn ]

we define

A~x = x1 [ a11 ]               [ a1n ]
         [  :  ] + · · · + xn [  :  ]
         [ am1 ]               [ amn ]
REMARKS
1. As above, if A is an m × n matrix, then A~x only makes sense if ~x ∈ Rn.
Moreover, if ~x ∈ Rn, then A~x ∈ Rm.
2. Observe that this definition shows that for any ~x ∈ Rn, A~x is a linear combina-
tion of the columns of A.
EXAMPLE 6 Let A = [ 3 4 -5 ]. Calculate A~x where:
                  [ 1 0 2  ]

(a) ~x = [ 2  ]    (b) ~x = [ 0 ]
         [ -1 ],            [ 1 ].
         [ 6  ]             [ 0 ]

Solution: We have

A [ 2  ]     [ 3 ]        [ 4 ]     [ -5 ]   [ 2(3) + (-1)(4) + 6(-5) ]   [ -28 ]
  [ -1 ] = 2 [ 1 ] + (-1) [ 0 ] + 6 [ 2  ] = [ 2(1) + (-1)(0) + 6(2)  ] = [ 14  ]
  [ 6  ]

A [ 0 ]     [ 3 ]     [ 4 ]     [ -5 ]   [ 4 ]
  [ 1 ] = 0 [ 1 ] + 1 [ 0 ] + 0 [ 2  ] = [ 0 ]
  [ 0 ]
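Either interpretation can be used to program matrix-vector multiplication. The sketch below is our own illustration (the function names are ours); it computes A~x both ways and checks that they agree on Example 5(a).

    def matvec_rows(A, x):
        # Row picture: entry i of Ax is the dot product of row i of A with x.
        return [sum(a * xi for a, xi in zip(row, x)) for row in A]

    def matvec_cols(A, x):
        # Column picture: Ax is the linear combination x1*a1 + ... + xn*an
        # of the columns a1, ..., an of A.
        result = [0] * len(A)
        for j, xj in enumerate(x):
            for i in range(len(A)):
                result[i] += xj * A[i][j]   # add xj times column j
        return result

    A = [[3, 4, -5], [1, 0, 2]]
    print(matvec_rows(A, [2, -1, 6]))  # [-28, 14]
    print(matvec_cols(A, [2, -1, 6]))  # [-28, 14]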
Two Important Facts:
1. Observe that if ~x, ~y ∈ Rn, then

~x^T ~y = [ x1 · · · xn ] [ y1 ]
                          [ :  ] = [ x1y1 + · · · + xnyn ] = ~x · ~y
                          [ yn ]

2. In most cases, we will now represent a system of linear equations [ A | ~b ] as A~x = ~b. This will allow us to use properties of matrix-vector multiplication when dealing with systems of linear equations.
Matrix Multiplication
DEFINITION
Matrix
Multiplication
Let A be an m × n matrix and let B = [ ~b1 · · · ~bp ] be an n × p matrix. Then we define AB to be the m × p matrix

AB = A [ ~b1 · · · ~bp ] = [ A~b1 · · · A~bp ]
We can compute the entries of AB easily using our first interpretation of matrix-vector multiplication. In particular, if ~ai^T are the rows of A, then the ij-th entry of AB is the i-th entry of A~bj. That is,

(AB)ij = ~ai · ~bj = ~ai^T ~bj = ai1b1j + ai2b2j + · · · + ainbnj = Σ_{k=1}^{n} (A)ik(B)kj
Observe that the number of columns of A must equal the number of rows of B for
this to be defined.
EXAMPLE 7

[ 1  3 2  ] [ -2 4 ]   [ 1(-2) + 3(-4) + 2(0)        1(4) + 3(5) + 2(0)        ]   [ -14 19 ]
[ -1 0 -3 ] [ -4 5 ] = [ (-1)(-2) + 0(-4) + (-3)(0)  (-1)(4) + 0(5) + (-3)(0)  ] = [ 2   -4 ]
            [ 0  0 ]

[ -2 4 ]               [ (-2)(1) + 4(-1)  (-2)(3) + 4(0)  (-2)(2) + 4(-3) ]   [ -6 -6  -16 ]
[ -4 5 ] [ 1  3 2  ] = [ (-4)(1) + 5(-1)  (-4)(3) + 5(0)  (-4)(2) + 5(-3) ] = [ -9 -12 -23 ]
[ 0  0 ] [ -1 0 -3 ]   [ 0(1) + 0(-1)     0(3) + 0(0)     0(2) + 0(-3)    ]   [ 0  0   0   ]
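The entry formula (AB)ij = Σ (A)ik(B)kj translates directly into code; the following sketch (our own illustration, with function name matmul of our choosing) reproduces the first product of Example 7.

    def matmul(A, B):
        # (AB)ij = sum over k of (A)ik (B)kj; the number of columns of A
        # must equal the number of rows of B.
        n, p = len(B), len(B[0])
        assert len(A[0]) == n, "inner dimensions must agree"
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
                for i in range(len(A))]

    A = [[1, 3, 2], [-1, 0, -3]]
    B = [[-2, 4], [-4, 5], [0, 0]]
    print(matmul(A, B))  # [[-14, 19], [2, -4]]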
EXERCISE 2 Calculate the following matrix products, or state why they don't exist.

(a) [ 1 3  ] [ 2 3 0  1 ]
    [ 2 -1 ] [ 0 1 -2 1 ]
    [ 0 1  ]

(b) [ 2 3 0  1 ] [ 1 3  ]
    [ 0 1 -2 1 ] [ 2 -1 ]
                 [ 0 1  ]

(c) [ 1 2 3 ] [ -1 ]
              [ 2  ]
              [ 1  ]

(d) [ -1 ] [ 1 2 3 ]
    [ 2  ]
    [ 1  ]
THEOREM 3 If A, B, and C are matrices of the correct size so that the required products are
defined, and t ∈ R, then:
(1) A(B + C) = AB + AC
(2) t(AB) = (tA)B = A(tB)
(3) A(BC) = (AB)C
(4) (AB)^T = B^T A^T
Proof: We prove (1) and (3) and leave (2) and (4) as exercises.
For (1) we have

(A(B + C))ij = Σ_{k=1}^{n} (A)ik (B + C)kj
             = Σ_{k=1}^{n} (A)ik ((B)kj + (C)kj)
             = Σ_{k=1}^{n} (A)ik(B)kj + Σ_{k=1}^{n} (A)ik(C)kj
             = (AB)ij + (AC)ij
             = (AB + AC)ij

For (3), say B is n × p. We have

(A(BC))ij = Σ_{k=1}^{n} (A)ik (BC)kj
          = Σ_{k=1}^{n} (A)ik [ Σ_{ℓ=1}^{p} (B)kℓ (C)ℓj ]
          = Σ_{k=1}^{n} Σ_{ℓ=1}^{p} (A)ik(B)kℓ(C)ℓj
          = Σ_{ℓ=1}^{p} Σ_{k=1}^{n} (A)ik(B)kℓ(C)ℓj
          = Σ_{ℓ=1}^{p} [ Σ_{k=1}^{n} (A)ik(B)kℓ ] (C)ℓj
          = Σ_{ℓ=1}^{p} (AB)iℓ (C)ℓj
          = ((AB)C)ij
Observe that Example 7 shows that in general AB ≠ BA. When multiplying an equation by a matrix we have to keep this in mind. That is, if A = B and we multiply both sides by a matrix D, then we get either DA = DB or AD = BD; in general AD ≠ DB. Similarly, we do not have a cancellation law for matrix multiplication: if AC = BC, then we cannot guarantee that A = B. This is demonstrated in the next example.
EXAMPLE 8 We have

[ 1 3 ] [ 0 1 ]   [ 0 1 ]   [ 1 15 ] [ 0 1 ]
[ 0 0 ] [ 0 0 ] = [ 0 0 ] = [ 0 -5 ] [ 0 0 ]

but

[ 1 3 ]   [ 1 15 ]
[ 0 0 ] ≠ [ 0 -5 ]
However, in many cases we would like to prove that two matrices are equal. To do this, we often use the following theorem.
THEOREM 4 Suppose that A and B are m × n matrices such that A~x = B~x for every ~x ∈ Rn.
Then A = B.
Proof: Let A = [ ~a1 · · · ~an ] and B = [ ~b1 · · · ~bn ]. Let ~ei denote the i-th standard basis vector. Then, since A~x = B~x for all ~x ∈ Rn, we have that

~ai = A~ei = B~ei = ~bi

for 1 ≤ i ≤ n. Hence A = B.
Identity Matrix
When considering multiplication, we are often interested in finding the multiplicative
identity; the element I such that A× I = A = I ×A for every element A. For n× n
matrices, the multiplicative identity is the n× n identity matrix.
DEFINITION
Identity Matrix
The n × n identity matrix, denoted by I or In, is the matrix such that (I)ii = 1
for 1 ≤ i ≤ n and (I)ij = 0 whenever i 6= j. In particular, the columns of In are the
standard basis vectors of Rn.
EXAMPLE 9 For any 2 × 2 matrix A = [ a b ] we have
                                   [ c d ]

AI = [ a b ] [ 1 0 ]   [ a b ]   [ 1 0 ] [ a b ]
     [ c d ] [ 0 1 ] = [ c d ] = [ 0 1 ] [ c d ]
REMARK
Observe that the set Mm×n(R) with m 6= n does not have a multiplicative identity
since if AB = A, then B must be an n× n matrix, while if CA = A, then C must be
an m×m matrix.
THEOREM 5 Let A be an m × n matrix. Then ImA = A and AIn = A.
Proof: The proof is left as an exercise.
This theorem shows that In is the multiplicative identity of Mn×n(R).
Block Multiplication
So far in this section, we have often made use of the fact that we could represent
a matrix as a list of columns (n × 1 matrices) or as a list of rows (1 × n matrices).
Sometimes it can also be very useful to represent a matrix as a group of submatrices
called blocks.
DEFINITION
Block Matrix
If A is an m × n matrix, then we can write A as the k × ℓ block matrix

A = [ A11 · · · A1ℓ ]
    [  :         :  ]
    [ Ak1 · · · Akℓ ]

where Aij is a block such that all blocks in the i-th row have the same number of rows and all blocks in the j-th column have the same number of columns.
EXAMPLE 10 Let A = [ 1 -1 3 4 ]
                   [ 0 0  0 0 ]
                   [ 0 3  1 2 ]

Then, we can write A as

A = [ A11 A12 ]
    [ A21 A22 ]

where A11 = [ 1 -1 ], A12 = [ 3 4 ], A21 = [ 0 3 ], and A22 = [ 1 2 ].
            [ 0 0  ]        [ 0 0 ]

Or, we could write A as

A = [ A11 A12 ]
    [ A21 A22 ]
    [ A31 A32 ]

where A11 = [ 1 -1 3 ], A12 = [ 4 ], A21 = [ 0 0 0 ], A22 = [ 0 ], A31 = [ 0 3 1 ], and A32 = [ 2 ].
Let A be an m × n matrix and let B be an n × p matrix. If we write A as a k × ℓ block matrix and B as an ℓ × s block matrix, then we can calculate AB by multiplying the block matrices together using the definition of matrix multiplication. For example, if

A = [ A11 A12 ]               B = [ B11 B12 ]
    [ A21 A22 ]     and           [ B21 B22 ]
    [ A31 A32 ]

then

AB = [ A11B11 + A12B21   A11B12 + A12B22 ]
     [ A21B11 + A22B21   A21B12 + A22B22 ]
     [ A31B11 + A32B21   A31B12 + A32B22 ]

assuming that we have divided A and B into blocks such that all of the required products of the blocks are defined.
EXAMPLE 11 Let

A = [ 1 2 -3 ]         B = [ 2 3 1  ]
    [ 0 3 1  ]   and       [ 0 1 -2 ]
    [ 1 0 2  ]             [ 0 0 3  ]

Let A11 = [ 1 ], A12 = [ 2 -3 ], A21 = [ 0 ], and A22 = [ 3 1 ]
                                       [ 1 ]            [ 0 2 ]

so that A = [ A11 A12 ]
            [ A21 A22 ]

and let B11 = [ 2 ], B12 = [ 3 1 ], B21 = [ 0 ], and B22 = [ 1 -2 ]
                                          [ 0 ]            [ 0 3  ]

so that B = [ B11 B12 ]
            [ B21 B22 ]

Then we have

AB = [ A11 A12 ] [ B11 B12 ]   [ A11B11 + A12B21   A11B12 + A12B22 ]
     [ A21 A22 ] [ B21 B22 ] = [ A21B11 + A22B21   A21B12 + A22B22 ]

Computing each entry we get

A11B11 + A12B21 = [ 1 ][ 2 ] + [ 2 -3 ] [ 0 ] = [ 2 ] + [ 0 ] = [ 2 ]
                                        [ 0 ]

A11B12 + A12B22 = [ 1 ][ 3 1 ] + [ 2 -3 ] [ 1 -2 ] = [ 3 1 ] + [ 2 -13 ] = [ 5 -12 ]
                                          [ 0 3  ]

A21B11 + A22B21 = [ 0 ] [ 2 ] + [ 3 1 ] [ 0 ]   [ 0 ]   [ 0 ]   [ 0 ]
                  [ 1 ]         [ 0 2 ] [ 0 ] = [ 2 ] + [ 0 ] = [ 2 ]

A21B12 + A22B22 = [ 0 ] [ 3 1 ] + [ 3 1 ] [ 1 -2 ]   [ 0 0 ]   [ 3 -3 ]   [ 3 -3 ]
                  [ 1 ]           [ 0 2 ] [ 0 3  ] = [ 3 1 ] + [ 0 6  ] = [ 3 7  ]

Hence,

AB = [ 2 5 -12 ]
     [ 0 3 -3  ]
     [ 2 3 7   ]
REMARKS
1. If you multiply out AB in the usual way and compare this to the calculations above, it becomes clear why block multiplication works.
2. Block multiplication is used to distribute matrix multiplication over multiple
computers, and in a variety of applications.
3.2 Linear Mappings
Recall that a function f : A → B is a rule that associates with each element a ∈ A
one element f(a) ∈ B called the image of a under f . The subset of A for which f(a)
is defined is called the domain of f . The set B is called the codomain of f while
the subset of B consisting of all f(a) is called the range of f and is denoted R(f).
Matrix Mappings
Observe that if A is an m× n matrix, then we can define a function f : Rn → Rm by
f(~x) = A~x called a matrix mapping.
REMARK
As we indicated in the previous section, if f : Rn → Rm, then we should write

f [ x1 ]   [ y1 ]
  [ :  ] = [ :  ]
  [ xn ]   [ ym ]

However, we often write f(x1, . . . , xn) = (y1, . . . , ym) as is typically done in other areas.
EXAMPLE 1 Let A = [ 1  3 2 ] and define f(~x) = A~x.
                  [ -1 3 1 ]

Observe that for A~x to be defined, ~x must be a vector in R3. Moreover, A~x is defined for every ~x ∈ R3, so the domain of f is R3. Also, we see that A~x ∈ R2 and so the codomain of f is R2. We can easily compute the images of some vectors in R3 under f using either definition of matrix-vector multiplication.

f(4, -2, 5) = [ 1  3 2 ] [ 4  ]
              [ -1 3 1 ] [ -2 ] = (8, -5)
                         [ 5  ]

f(-1, -1, 2) = [ 1  3 2 ] [ -1 ]
               [ -1 3 1 ] [ -1 ] = (0, 0)
                          [ 2  ]

f(1, 0, 0) = [ 1  3 2 ] [ 1 ]
             [ -1 3 1 ] [ 0 ] = (1, -1)
                        [ 0 ]

f(x1, x2, x3) = [ 1  3 2 ] [ x1 ]
                [ -1 3 1 ] [ x2 ] = (x1 + 3x2 + 2x3, -x1 + 3x2 + x3)
                           [ x3 ]
Using the properties of matrix multiplication, we prove the following important prop-
erties of matrix mappings.
THEOREM 1 Let A be an m× n matrix, and let f : Rn → Rm be defined by f(~x) = A~x. Then for
all ~x, ~y ∈ Rn and b, c ∈ R we have
f(b~x+ c~y) = bf(~x) + cf(~y)
Proof: Using Theorem 3.1.1 we get
f(b~x+ c~y) = A(b~x+ c~y) = A(b~x) + A(c~y) = bA~x + cA~y = bf(~x) + cf(~y)
Any function which has the property that f(b~x + c~y) = bf(~x) + cf(~y) is said to
preserve linear combinations. This property, called the linearity property, is
extremely important and useful in Linear Algebra. We now look at functions which
have this property.
Linear Mappings
DEFINITION
Linear Mapping
A function L : Rn → Rm is said to be a linear mapping if for every ~x, ~y ∈ Rn and
b, c ∈ R we have
L(b~x+ c~y) = bL(~x) + cL(~y)
REMARKS
1. A linear mapping may also be called a linear transformation.
2. A linear mapping L : Rn → Rn is sometimes called a linear operator. This is often done when we wish to stress the fact that the linear mapping has the same domain and codomain.
Observe that if we have a linear mapping L : Rn → Rm and ~x can be written as a linear combination of vectors ~v1, . . . , ~vk, say ~x = c1~v1 + · · · + ck~vk, then the image of ~x under L can be written as a linear combination of the images of the vectors ~v1, . . . , ~vk under L. That is,

L(~x) = L(c1~v1 + · · · + ck~vk) = c1L(~v1) + · · · + ckL(~vk)

since L preserves linear combinations. This is the most important property of linear mappings; it will be used frequently!
EXAMPLE 2 Prove that the function L : R3 → R2 defined by L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3)
is a linear mapping.
Solution: Let ~x, ~y ∈ R3 and b, c ∈ R. Then

L(b~x + c~y) = L(b(x1, x2, x3) + c(y1, y2, y3))
            = L(bx1 + cy1, bx2 + cy2, bx3 + cy3)
            = (3(bx1 + cy1) - (bx2 + cy2), 2(bx1 + cy1) + 2(bx3 + cy3))
            = b(3x1 - x2, 2x1 + 2x3) + c(3y1 - y2, 2y1 + 2y3)
            = bL(~x) + cL(~y)
EXAMPLE 3 Prove that L : R2 → R2 defined by L(x1, x2) = (x1^2 - x2^2, x1x2) is not linear.
Solution: To prove L is not linear we just need to find one example to show that L does not preserve linear combinations. Observe that L(1, 2) = (-3, 2) and L(2, 4) = (-12, 8), but 2L(1, 2) = (-6, 4) ≠ L(2(1, 2)) = L(2, 4). Hence, L is not linear.
EXAMPLE 4 Let ~a ∈ Rn. Show that proj~a : Rn → Rn is a linear mapping.
Solution: Let ~x, ~y ∈ Rn and b, c ∈ R. Then
proj~a(b~x + c~y) = ((b~x + c~y) · ~a / ‖~a‖²) ~a
                 = (b(~x · ~a) / ‖~a‖²) ~a + (c(~y · ~a) / ‖~a‖²) ~a
                 = b proj~a ~x + c proj~a ~y

Hence, proj~a is linear.
EXERCISE 1 Prove that the mapping L : R3 → R2 defined by L(x1, x2, x3) = (x1 − x2, x1 − 2x3) is
linear.
Theorem 3.1.1 shows that every matrix mapping is a linear mapping. It is natural
to ask whether the converse is true: “Is every linear mapping a matrix mapping?”
To answer this question, we will see that our second interpretation of matrix vector
multiplication is very important.
Let L : Rn → Rm be a linear mapping and let ~x ∈ Rn. We know that ~x can be
written as a unique linear combination of the standard basis vectors, say
~x = x1~e1 + · · ·+ xn~en
Hence, using the linearity property we get
L(~x) = L(x1~e1 + · · ·+ xn~en) = x1L(~e1) + · · ·+ xnL(~en)
Our second interpretation of matrix vector multiplication says that a matrix times
a vector gives us a linear combination of the columns of the matrix. Using this in
reverse, we see that
L(~x) = [ L(~e1) · · · L(~en) ] [ x1 ]
                               [ :  ]
                               [ xn ]
We have proven the following theorem.
THEOREM 2 Every linear mapping L : Rn → Rm can be represented as a matrix mapping with
matrix whose columns are the images of the standard basis vectors of Rn under L.
That is, L(~x) = [L]~x where

[L] = [ L(~e1) · · · L(~en) ]
DEFINITION
Standard Matrix
Let L : Rn → Rm be a linear mapping. Then the matrix [L] = [L(~e1) · · · L(~en)]
is called the standard matrix of L.
REMARK
We are calling this the standard matrix of L since it is based on the standard basis. In
Math 235, we will see that we can find the matrix representation of a linear mapping
with respect to different bases.
62
EXAMPLE 5 Determine the standard matrix of L(x1, x2, x3) = (3x1 − x2, 2x1 + 2x3).
Solution: By definition, the columns of the standard matrix [L] are the images of
the standard basis vectors of R3 under L. We have
L(1, 0, 0) = (3, 2)
L(0, 1, 0) = (−1, 0)
L(0, 0, 1) = (0, 2)
Thus,

[L] = [ L(~e1) L(~e2) L(~e3) ] = [ 3 -1 0 ]
                                 [ 2 0  2 ]
REMARK
Be careful to remember that when we write L(1, 0, 0) = (3, 2), we really mean

L(~e1) = L [ 1 ]   [ 3 ]
           [ 0 ] = [ 2 ]
           [ 0 ]

so that L(~e1) is in fact a column of [L].
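Because the columns of [L] are the images of the standard basis vectors, the standard matrix can be built mechanically. The sketch below is our own illustration (the helper name standard_matrix is ours); it recovers [L] for the mapping of Example 5.

    def standard_matrix(L, n):
        # Column j of [L] is L(e_j), the image of the j-th standard basis vector.
        cols = [L([1 if i == j else 0 for i in range(n)]) for j in range(n)]
        m = len(cols[0])
        return [[cols[j][i] for j in range(n)] for i in range(m)]

    L = lambda x: [3 * x[0] - x[1], 2 * x[0] + 2 * x[2]]
    print(standard_matrix(L, 3))  # [[3, -1, 0], [2, 0, 2]]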
EXAMPLE 6 Let ~a = [ 1 ]. Determine the standard matrix of proj~a : R2 → R2.
                  [ 2 ]

Solution: We need to find the projection of the standard basis vectors of R2 onto ~a.

proj~a(~e1) = (~e1 · ~a / ‖~a‖²) ~a = (1/5) [ 1 ] = [ 1/5 ]
                                           [ 2 ]   [ 2/5 ]

proj~a(~e2) = (~e2 · ~a / ‖~a‖²) ~a = (2/5) [ 1 ] = [ 2/5 ]
                                           [ 2 ]   [ 4/5 ]

Hence, [proj~a] = [ 1/5 2/5 ]
                  [ 2/5 4/5 ]
EXERCISE 2 Determine the standard matrix of the linear mapping L(x1, x2, x3) = (x1 − x2, x1 −
2x3).
We now look at two important examples of linear mappings which will be used
throughout Math 136 and Math 235.
Rotations in R2

Let Rθ : R2 → R2 denote the function that rotates a vector ~x ∈ R2 about the origin through an angle θ. Using basic trigonometric identities, it is easy to show that

Rθ(x1, x2) = (x1 cos θ - x2 sin θ, x1 sin θ + x2 cos θ)

and that Rθ is linear. We then find that the standard matrix of Rθ is

[Rθ] = [ cos θ  -sin θ ]
       [ sin θ   cos θ ]
EXAMPLE 7 Determine the standard matrix of R_{π/4}.

Solution: We have

[R_{π/4}] = [ cos π/4  -sin π/4 ]   [ 1/√2  -1/√2 ]
            [ sin π/4   cos π/4 ] = [ 1/√2   1/√2 ]
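As a quick numerical check (our own illustration, using Python's standard math module), the rotation matrix can be generated directly from the formula for [Rθ]:

    import math

    def rotation_matrix(theta):
        # The standard matrix [R_theta] of the rotation of R^2 through angle theta.
        return [[math.cos(theta), -math.sin(theta)],
                [math.sin(theta),  math.cos(theta)]]

    print(rotation_matrix(math.pi / 4))
    # [[0.7071..., -0.7071...], [0.7071..., 0.7071...]], i.e. the 1/sqrt(2) entries above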
We call Rθ a rotation in R2 and we call its standard matrix [Rθ] a rotation matrix. The following theorem shows some important properties of a rotation matrix.
THEOREM 3 Let Rθ : R2 → R2 be a rotation with rotation matrix A = [Rθ]. Then the columns of
A are orthogonal unit vectors.
Proof: The columns of A are

~a1 = [ cos θ ]   and   ~a2 = [ -sin θ ]
      [ sin θ ]               [ cos θ  ]

Hence,

~a1 · ~a2 = -cos θ sin θ + cos θ sin θ = 0
‖~a1‖² = cos²θ + sin²θ = 1
‖~a2‖² = sin²θ + cos²θ = 1
Reflections
Let refl~n : Rn → Rn denote the linear mapping which maps a vector ~x to its mirror image in the hyperplane with normal vector ~n. Figure 3.2.1 below shows a reflection over a line in R2. From the figure, we see that the reflection of ~x over the line with normal vector ~n is given by

refl~n ~x = ~x - 2 proj~n ~x

We extend this formula directly to the n-dimensional case.
Figure 3.2.1: Reflection of ~x across the line with normal vector ~n (the original figure, which shows ~x, proj~n ~x, ~n, and refl~n ~x, is omitted here).
EXAMPLE 8 Determine the reflection of

~x = [ 1  ]
     [ -1 ]
     [ 1  ]
     [ 3  ]

over the hyperplane in R4 with normal vector

~n = [ 1  ]
     [ 2  ]
     [ -2 ]
     [ 0  ]

Solution: We have

refl~n ~x = ~x - 2 proj~n ~x = ~x - 2 (~x · ~n / ‖~n‖²) ~n

          [ 1  ]          [ 1  ]   [ 5/3  ]
        = [ -1 ] - (-6/9) [ 2  ] = [ 1/3  ]
          [ 1  ]          [ -2 ]   [ -1/3 ]
          [ 3  ]          [ 0  ]   [ 3    ]
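The formula refl~n ~x = ~x - 2 proj~n ~x translates directly into code; the sketch below (our own illustration, with the helper names proj and refl chosen by us) reproduces Example 8 in floating point.

    def proj(n, x):
        # Projection of x onto n: ((x . n) / ||n||^2) n.
        c = sum(xi * ni for xi, ni in zip(x, n)) / sum(ni * ni for ni in n)
        return [c * ni for ni in n]

    def refl(n, x):
        # Reflection of x in the hyperplane with normal vector n.
        return [xi - 2 * pi for xi, pi in zip(x, proj(n, x))]

    print(refl([1, 2, -2, 0], [1, -1, 1, 3]))
    # [1.666..., 0.333..., -0.333..., 3.0], i.e. (5/3, 1/3, -1/3, 3)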
EXAMPLE 9 Determine the standard matrix of refl~n : R3 → R3 where ~n = [ 1  ]
                                                                      [ 2  ]
                                                                      [ -3 ]

Solution: We have

refl~n ~e1 = [ 1 ]          [ 1  ]   [ 6/7  ]
             [ 0 ] - (2/14) [ 2  ] = [ -2/7 ]
             [ 0 ]          [ -3 ]   [ 3/7  ]

refl~n ~e2 = [ 0 ]          [ 1  ]   [ -2/7 ]
             [ 1 ] - (4/14) [ 2  ] = [ 3/7  ]
             [ 0 ]          [ -3 ]   [ 6/7  ]

refl~n ~e3 = [ 0 ]          [ 1  ]   [ 3/7  ]
             [ 0 ] + (6/14) [ 2  ] = [ 6/7  ]
             [ 1 ]          [ -3 ]   [ -2/7 ]

Hence,

[refl~n] = [ 6/7  -2/7 3/7  ]
           [ -2/7 3/7  6/7  ]
           [ 3/7  6/7  -2/7 ]
3.3 Special Subspaces
When dealing with functions, it is often important and useful to look at their domain
and range. Thus, we now look at the domain and range of linear mappings.
Kernel
We have seen that unlike many functions you deal with in Calculus, linear mappings
L : Rn → Rm never have restricted domain. That is, the domain of L is always all of
Rn. However, there is an important subset of the domain which we will use frequently.
DEFINITION
Kernel
Let L : Rn → Rm be a linear mapping. Then the kernel of L is defined to be
ker(L) = {~x ∈ Rn | L(~x) = ~0}
EXAMPLE 1 Let L : R2 → R2 be defined by L(x1, x2) = (2x1 - 2x2, -x1 + x2). Determine which of the following vectors is in the kernel of L:

~x1 = [ 0 ],   ~x2 = [ 2 ],   ~x3 = [ 3  ].
      [ 0 ]          [ 2 ]          [ -1 ]

Solution: We have

L(~x1) = L(0, 0) = (0, 0)
L(~x2) = L(2, 2) = (0, 0)
L(~x3) = L(3, -1) = (8, -4)

Hence, ~x1, ~x2 ∈ ker(L) while ~x3 ∉ ker(L).
EXAMPLE 2 Determine the kernel of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3).

Solution: By definition, the kernel of L is the set of all ~x ∈ R3 such that L(~x) = ~0. That is, ~x ∈ ker(L) if

(0, 0) = L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3)

Hence, every ~x = (x1, x2, x3) in the kernel of L must satisfy

3x1 - x2 = 0
2x1 + 2x3 = 0

We find the general solution of the homogeneous system is

~x = t [ -1 ],   t ∈ R.
       [ -3 ]
       [ 1  ]

Thus,

ker(L) = Span{ [ -1 ] }
               [ -3 ]
               [ 1  ]
EXAMPLE 3 Determine the kernel of Rθ : R2 → R2.
Solution: We need to find all ~x ∈ R2 such that Rθ(~x) = ~0. Recall that

[Rθ] = [ cos θ  -sin θ ]
       [ sin θ   cos θ ]

Thus, we need to solve

[ 0 ]                     [ cos θ  -sin θ ]
[ 0 ] = Rθ(~x) = [Rθ]~x = [ sin θ   cos θ ] ~x

This is a homogeneous system, so we row reduce the coefficient matrix.

[ cos θ  -sin θ ]     [ 1 0 ]
[ sin θ   cos θ ]  ~  [ 0 1 ]

Hence, the system has the unique solution ~x = ~0. So,

ker(Rθ) = { [ 0 ] }
            [ 0 ]

Note that geometrically this result is obvious.
EXERCISE 1 Determine the kernel of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
Observe that in both of these examples the kernel of the linear mapping is a subspace
of the appropriate Rn. To prove that this is always the case, we will use the following
lemma.
LEMMA 1 Let L : Rn → Rm be a linear mapping. Then L(~0) = ~0.
Proof: The proof is left as an exercise.
THEOREM 2 Let L : Rn → Rm be a linear mapping. Then ker(L) is a subspace of Rn.
Proof: We show that ker(L) satisfies the definition of a subspace of Rn. We first
observe that by definition ker(L) is a subset of Rn. Also, we have that ~0 ∈ ker(L) by
Lemma 1. Let ~x, ~y ∈ ker(L). Then L(~x) = ~0 and L(~y) = ~0 so
L(~x+ ~y) = L(~x) + L(~y) = ~0 +~0 = ~0
so ~x+ ~y ∈ ker(L). Thus, ker(L) is closed under addition. Similarly, for any c ∈ R we
have
L(c~x) = cL(~x) = c~0 = ~0
so c~x ∈ ker(L). Therefore, ker(L) is also closed under scalar multiplication. Hence,
we have shown that ker(L) is a subspace of Rn.
Range
DEFINITION
Range
Let L : Rn → Rm be a linear mapping. The range of L is
R(L) = {L(~x) ∈ Rm | ~x ∈ Rn}
EXAMPLE 4 Let L : R2 → R3 be defined by L(x1, x2) = (x1 + x2, x1− x2, x2). Determine which of
the following vectors is in the range of L:

~y1 = [ 0 ]    ~y2 = [ 2 ]    ~y3 = [ 3  ]
      [ 0 ],         [ 3 ],         [ -1 ].
      [ 0 ]          [ 1 ]          [ 2  ]
Solution: By Lemma 1, we have L(0, 0) = (0, 0, 0), so ~y1 ∈ R(L).
For ~y2, we need to find if there exist x1 and x2 such that
(2, 3, 1) = L(x1, x2) = (x1 + x2, x1 − x2, x2)
That is, we need to solve the system of equations
x1 + x2 = 2
x1 − x2 = 3
x2 = 1
This system is inconsistent so ~y2 6∈ R(L).
For ~y3, we need to find if there exist x1 and x2 such that
(3,−1, 2) = L(x1, x2) = (x1 + x2, x1 − x2, x2)
That is, we need to solve the system of equations
x1 + x2 = 3
x1 − x2 = −1
x2 = 2
We find that the system is consistent with unique solution x1 = 1, x2 = 2. Hence, L(1, 2) = (3, -1, 2), so ~y3 ∈ R(L).
EXAMPLE 5 Let ~a = [ 2  ]. Determine the range of proj~a : R2 → R2.
                  [ -1 ]

Solution: We see that every vector proj~a ~x in the range of proj~a has the form

proj~a ~x = (~x · ~a / ‖~a‖²) ~a = c~a

for some c ∈ R. Moreover, for any c ∈ R, we can find some ~x ∈ R2 such that c = (~x · ~a)/‖~a‖². Thus,

R(proj~a) = Span{~a}
EXAMPLE 6 Determine the range of L(x1, x2, x3) = (3x1 - x2, 2x1 + 2x3).

Solution: We see that every vector L(~x) has the form

[ 3x1 - x2  ]      [ 3 ]      [ -1 ]      [ 0 ]
[ 2x1 + 2x3 ] = x1 [ 2 ] + x2 [ 0  ] + x3 [ 2 ]

Hence,

R(L) = Span{ [ 3 ], [ -1 ], [ 0 ] }
             [ 2 ]  [ 0  ]  [ 2 ]

Observe that the spanning set for R(L) is linearly dependent and so we could simplify this to be, for example,

R(L) = Span{ [ -1 ], [ 0 ] }
             [ 0  ]  [ 2 ]
EXERCISE 2 Determine the range of L(x1, x2, x3) = (x1 − x2, x1 − 2x3).
Observe in the examples above that the range of a linear mapping is a subspace of
the codomain. Of course, this is true for all linear mappings.
THEOREM 3 Let L : Rn → Rm be a linear mapping. Then, R(L) is a subspace of Rm.
Proof: The proof is left as an exercise.
In many cases we are interested in whether the range of a linear mapping L equals
its codomain. That is, for every vector ~y in the codomain, can we find a vector ~x in
the domain such that L(~x) = ~y?
REMARK
A function whose range equals its codomain is said to be onto. We will look more
closely at onto mappings in Math 235.
EXAMPLE 7 Let L : R3 → R3 be defined by L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3).
Prove that R(L) = R3.
Solution: To prove that R(L) = R3, we need to show that if we pick any vector ~y ∈ R3, then we can find a vector ~x ∈ R3 such that L(~x) = ~y. Let ~y = (y1, y2, y3) ∈ R3.
Consider
(y1, y2, y3) = L(x1, x2, x3) = (x1 + x2 + x3, 2x1 − 2x3, x2 + 3x3)
This gives the system of linear equations
x1 + x2 + x3 = y1
2x1 − 2x3 = y2
x2 + 3x3 = y3
Row reducing the corresponding augmented matrix gives

[ 1 1 1  | y1 ]    [ 1 1  1  | y1                 ]
[ 2 0 -2 | y2 ]  ~ [ 0 -2 -4 | -2y1 + y2          ]
[ 0 1 3  | y3 ]    [ 0 0  1  | -y1 + (1/2)y2 + y3 ]

Therefore R(L) = R3 since the system is consistent for all y1, y2, and y3. In particular, solving the system we find that by taking x1 = -y1 + y2 + y3, x2 = 3y1 - (3/2)y2 - 2y3, and x3 = -y1 + (1/2)y2 + y3 we get

L(x1, x2, x3) = (y1, y2, y3)
EXAMPLE 8 Let A be an n× n matrix and let L : Rn → Rn be defined by L(~x) = A~x. Prove that
R(L) = Rn if and only if rankA = n.
Solution: Assume that R(L) = Rn. Then, for every ~y ∈ Rn there exists a vector ~x
such that ~y = L(~x) = A~x. Hence, the system of equations A~x = ~y with coefficient
matrix A is consistent for every ~y ∈ Rn. By Theorem 2.2.3 this implies that rank(A) =
n.
On the other hand, if rankA = n, then L(~x) = A~x = ~y is consistent for every ~y ∈ Rn,
so R(L) = Rn.
The Four Fundamental Subspaces of a Matrix
We now look at the relationship of the standard matrix of a linear mapping to its
range and kernel. In doing so, we will derive the four fundamental subspaces of
a matrix.
THEOREM 4 Let L : Rn → Rm be a linear mapping and let A = [L] be the standard matrix of L.
Then, ~x ∈ ker(L) if and only if A~x = ~0.
Proof: If ~x ∈ ker(L), then ~0 = L(~x) = A~x. On the other hand, if A~x = ~0, then
~0 = A~x = L(~x), so ~x ∈ ker(L).
Since ker(L) is a subspace of Rn, this theorem shows that for any m × n matrix A
the set of all vectors ~x ∈ Rn such that A~x = ~0 is a subspace of Rn. This motivates
the following definition.
DEFINITION
Nullspace
Let A be an m × n matrix. The set of all ~x ∈ Rn such that A~x = ~0 is called the
nullspace of A. We write
Null(A) = {~x ∈ Rn | A~x = ~0}
EXAMPLE 9 Let A = [ 1  3  1 ]. Determine a spanning set for the nullspace of A.
                  [ -1 -2 2 ]

Solution: We need to find all ~x such that A~x = ~0. We observe that this is a homogeneous system with coefficient matrix A. Row reducing gives

[ 1  3  1 ]     [ 1 0 -8 ]
[ -1 -2 2 ]  ~  [ 0 1 3  ]

Thus, the general solution to the system is

~x = t [ 8  ]
       [ -3 ]
       [ 1  ]

Hence,

Null(A) = Span{ [ 8  ] }
                [ -3 ]
                [ 1  ]
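Reading a spanning set for Null(A) off the RREF, as in this example, can also be mechanized. The sketch below is our own illustration (it reuses the rref function sketched in Chapter 2, and the name nullspace_basis is ours); it gives each free variable a parameter and solves for the pivot variables.

    def nullspace_basis(A):
        # One spanning vector per free variable: set that free variable to 1,
        # the other free variables to 0, and read the pivot variables off the RREF.
        R = rref(A)
        m, n = len(R), len(R[0])
        pivots = [next(j for j in range(n) if R[i][j] != 0)
                  for i in range(m) if any(x != 0 for x in R[i])]
        basis = []
        for free in (j for j in range(n) if j not in pivots):
            v = [0] * n
            v[free] = 1
            for i, p in enumerate(pivots):
                v[p] = -R[i][free]
            basis.append(v)
        return basis

    # Example 9: one spanning vector, (8, -3, 1), with entries printed as Fractions.
    print(nullspace_basis([[1, 3, 1], [-1, -2, 2]]))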
EXAMPLE 10 Let A = [ 1  2 -1 ]. Show that Null(A) = {~0}.
                   [ 5  0 -4 ]
                   [ -2 6 4  ]

Solution: We need to show that the only solution to the homogeneous system A~x = ~0 is ~x = ~0. Row reducing the associated coefficient matrix we get

[ 1  2 -1 ]     [ 1 2   -1 ]
[ 5  0 -4 ]  ~  [ 0 -10 1  ]
[ -2 6 4  ]     [ 0 0   3  ]

Hence, the rank of the coefficient matrix equals the number of columns, so by Theorem 2.2.3 there is a unique solution. Therefore, since the system is homogeneous, the only solution is ~x = ~0. Thus, Null(A) = {~0}.
THEOREM 5 Let A be an m × n matrix. A consistent system of linear equations A~x = ~b has a
unique solution if and only if Null(A) = {~0}.
Proof: If A~x = ~b has a unique solution, then by Theorem 2.2.3, rankA = n. Hence,
A~x = ~0 has a unique solution, so Null(A) = {~0}.
If Null(A) = {~0}, then A~x = ~0 has a unique solution, so by Theorem 2.2.3, rankA =
n. Thus, A~x = ~b has a unique solution.
The connection between the range of a linear mapping and its standard matrix follows
easily from our second interpretation of matrix vector multiplication.
THEOREM 6 Let L : Rn → Rm be a linear mapping with standard matrix [L] = A = [~a1 · · · ~an].
Then,
R(L) = Span{~a1, . . . ,~an}
Proof: We have

L(~x) = A~x = [ ~a1 · · · ~an ] [ x1 ]
                               [ :  ] = x1~a1 + · · · + xn~an
                               [ xn ]

for all ~x ∈ Rn, and the result follows.
Therefore, the range of a linear mapping is the subspace spanned by the columns of
its standard matrix.
DEFINITION
Columnspace
Let A = [ ~a1 · · · ~an ]. The columnspace of A is the subspace of Rm defined by

Col(A) = Span{~a1, . . . , ~an} = {A~x ∈ Rm | ~x ∈ Rn}
EXAMPLE 11 Let A = [ 1 5 -3 ]         B = [ 1 -3 ]
                   [ 9 2 1  ]   and       [ 0 2  ]
                                          [ 0 1  ]

Then

Col(A) = Span{ [ 1 ], [ 5 ], [ -3 ] }   and   Col(B) = Span{ [ 1 ], [ -3 ] }
               [ 9 ]  [ 2 ]  [ 1  ]                          [ 0 ]  [ 2  ]
                                                             [ 0 ]  [ 1  ]
THEOREM 7 Let A be an m× n matrix. Then Col(A) = Rm if and only if rank(A) = m.
Proof: The proof is left as an exercise.
Recall that the transpose of a matrix changes the columns of a matrix into rows and vice versa. Hence, the subspace of Rn spanned by the rows of an m × n matrix A is just the columnspace of A^T.

DEFINITION
Rowspace
Let A be an m × n matrix. The rowspace of A is the subspace of Rn defined by

Row(A) = {A^T~x ∈ Rn | ~x ∈ Rm}
EXAMPLE 12 Let A = [ 1 5 -3 ]         B = [ 1 -3 ]
                   [ 9 2 1  ]   and       [ 0 2  ]
                                          [ 0 1  ]

Then

Row(A) = Span{ [ 1  ], [ 9 ] }   and   Row(B) = Span{ [ 1  ], [ 0 ], [ 0 ] }
               [ 5  ]  [ 2 ]                          [ -3 ]  [ 2 ]  [ 1 ]
               [ -3 ]  [ 1 ]
Lastly, since we have looked at the columnspace of A^T, it makes sense to also consider the nullspace of A^T.

DEFINITION
Left Nullspace
Let A be an m × n matrix. The left nullspace of A is the subspace of Rm defined by

Null(A^T) = {~x ∈ Rm | A^T~x = ~0}
EXAMPLE 13 Let A = [ 1 -3 ]. Then A^T = [ 1  0 0 ].
                   [ 0 2  ]             [ -3 2 1 ]
                   [ 0 1  ]

We find that the general solution to the homogeneous system A^T~x = ~0 is

~x = t [ 0  ]
       [ -1 ]
       [ 2  ]

Therefore,

Null(A^T) = Span{ [ 0  ] }
                  [ -1 ]
                  [ 2  ]
For any m × n matrix A, we call the nullspace, columnspace, rowspace, and left nullspace the four fundamental subspaces of A. The next two theorems give you a brief glimpse of why these subspaces are useful. This will be explored further in Math 235.
EXAMPLE 14 Find the four fundamental subspaces of A = [ 1 3 2 ].
                                                      [ 2 5 2 ]

Solution: To find Null(A) we solve A~x = ~0. Row reducing A gives

[ 1 3 2 ]     [ 1 0 -4 ]
[ 2 5 2 ]  ~  [ 0 1 2  ]

Hence, Null(A) = Span{ [ 4  ] }.
                       [ -2 ]
                       [ 1  ]

For the columnspace of A we have

Col(A) = Span{ [ 1 ], [ 3 ], [ 2 ] } = Span{ [ 1 ], [ 3 ] }
               [ 2 ]  [ 5 ]  [ 2 ]           [ 2 ]  [ 5 ]

since

-4 [ 1 ] + 2 [ 3 ]   [ 2 ]
   [ 2 ]     [ 5 ] = [ 2 ]

The rowspace of A is

Row(A) = Span{ [ 1 ], [ 2 ] }
               [ 3 ]  [ 5 ]
               [ 2 ]  [ 2 ]

To determine the left nullspace of A, we solve A^T~x = ~0. Row reducing A^T gives

[ 1 2 ]     [ 1 0 ]
[ 3 5 ]  ~  [ 0 1 ]
[ 2 2 ]     [ 0 0 ]

Hence, Null(A^T) = {~0}.
EXERCISE 3 Find the four fundamental subspaces of

A = [ 1  2  1  1 ]
    [ -2 -5 -2 1 ]
    [ -1 -3 -1 2 ]
THEOREM 8 Let A be an m× n matrix. If ~a ∈ Row(A) and ~x ∈ Null(A), then ~a · ~x = 0.
Proof: Let

A = [ ~a1^T ]
    [   :   ]
    [ ~am^T ]

If ~x ∈ Null(A), then

~0 = A~x = [ ~a1^T ]        [ ~a1 · ~x ]
           [   :   ] ~x  =  [    :     ]
           [ ~am^T ]        [ ~am · ~x ]

So, ~ai · ~x = 0 for all 1 ≤ i ≤ m. Now, if ~a ∈ Row(A), then ~a = c1~a1 + · · · + cm~am. But then

~a · ~x = (c1~a1 + · · · + cm~am) · ~x = c1(~a1 · ~x) + · · · + cm(~am · ~x) = 0

as required.
THEOREM 9 Let A be an m× n matrix. If ~a ∈ Col(A) and ~x ∈ Null(AT ), then ~a · ~x = 0.
Proof: The proof is left as an exercise.
3.4 Operations on Linear Mappings
Addition and Scalar Multiplication of Mappings
Since linear mappings are functions, we can perform operations on linear mappings
in the same way we do any other function.
DEFINITION
Addition and
Scalar
Multiplication
Let L : Rn → Rm and M : Rn → Rm be linear mappings and let c ∈ R. We define
L+M : Rn → Rm and cL : Rn → Rm by
(L+M)(~x) = L(~x) +M(~x)
(cL)(~x) = cL(~x)
REMARK
Two linear mappings L and M are equal if and only if they have the same domain,
the same range, and L(~x) = M(~x) for all ~x in the domain. We write L =M .
EXAMPLE 1 Let L : R2 → R3 and M : R2 → R3 be defined by L(x1, x2) = (x1, x1 + x2, -x2) and M(x1, x2) = (x1, x1, x2). Then, L + M is the mapping defined by

(L + M)(~x) = L(~x) + M(~x) = (x1, x1 + x2, -x2) + (x1, x1, x2) = (2x1, 2x1 + x2, 0)

and 3L is the mapping defined by

(3L)(~x) = 3L(~x) = 3(x1, x1 + x2, -x2) = (3x1, 3x1 + 3x2, -3x2)
If we let L denote the set of all possible linear mappings L : Rn → Rm, then we
can show that L satisfies the same 10 properties as vectors in Rn and matrices in
Mm×n(R).
THEOREM 1 Let L, M, N ∈ L and let c1, c2 be real scalars. Then
V1 L+M ∈ L;
V2 (L+M) +N = L+ (M +N);
V3 L+M =M + L;
V4 There exists a linear mapping O : Rn → Rm, such that L+O = L. In particular,
O is the linear mapping defined by O(~x) = ~0 for all ~x ∈ Rn;
V5 For each linear mapping L, there exists a linear mapping (−L) with the property
that L + (−L) = O. In particular, (−L)(~x) is the linear mapping defined by
(−L)(~x) = −L(~x);
V6 c1L ∈ L;
V7 c1(c2L) = (c1c2)L;
V8 (c1 + c2)L = c1L+ c2L;
V9 c1(L+M) = c1L+ c1M ;
V10 1L = L;
Proof: We will prove V1, V3, and V9 and leave the rest as exercises.

For V1, we need to show that L + M is also a linear mapping. Let ~x, ~y ∈ Rn and b, c ∈ R. Then

(L + M)(b~x + c~y) = L(b~x + c~y) + M(b~x + c~y)            by definition of L + M
                  = bL(~x) + cL(~y) + bM(~x) + cM(~y)      since L and M are linear
                  = b(L(~x) + M(~x)) + c(L(~y) + M(~y))    by properties of vectors in Rm
                  = b(L + M)(~x) + c(L + M)(~y)            by definition of L + M

Hence, L + M is linear.

For V3, we need to show (L + M)(~x) = (M + L)(~x) for all ~x ∈ Rn. We have

(L + M)(~x) = L(~x) + M(~x)    by definition of L + M
            = M(~x) + L(~x)    using properties of vectors in Rm
            = (M + L)(~x)      by definition of M + L

For V9, we need to show that (c1(L + M))(~x) = (c1L + c1M)(~x) for all ~x ∈ Rn. We have

(c1(L + M))(~x) = c1(L + M)(~x)           by definition of scalar multiplication
               = c1(L(~x) + M(~x))        by definition of L + M
               = c1L(~x) + c1M(~x)        using properties of vectors in Rm
               = (c1L)(~x) + (c1M)(~x)    by definition of scalar multiplication
               = (c1L + c1M)(~x)          by definition of L + M
Properties V1 and V6 show that the sum of linear mappings is a linear mapping as is
the scalar multiple of a linear mapping. Thus, it makes sense to consider the standard
matrix of these new linear mappings.
THEOREM 2 Let L : Rn → Rm and M : Rn → Rm be linear mappings and let c ∈ R. Then
[L+M ] = [L] + [M ]
[cL] = c[L]
Proof: We have
[L+M ](~x) = (L+M)(~x) = L(~x) +M(~x) = [L]~x+ [M ]~x = ([L] + [M ])(~x)
The result follows from Theorem 1. Similarly,
[cL](~x) = (cL)(~x) = cL(~x) = c[L]~x
Composition
One of the most important operations on functions is composition.
DEFINITION
Composition
Let L : Rn → Rm and M : Rm → Rp be linear mappings. Then the composition of M with L is the function M ◦ L : Rn → Rp defined by

(M ◦ L)(~x) = M(L(~x))
REMARK
Observe that it is necessary that the range of L is a subset of the domain of M for
M ◦ L to be defined.
THEOREM 3 Let L : Rn → Rm and M : Rm → Rp be linear mappings. Then M ◦ L is a linear
mapping and
[M ◦ L] = [M ][L]
Proof: The proof is left as an exercise.
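A quick computational check of Theorem 3 (our own illustration, reusing the matmul and standard_matrix sketches above, with a made-up mapping M) is:

    # [L] is the mapping of Example 5 in Section 3.2; M is a hypothetical map R^2 -> R^3.
    L_map = lambda x: [3 * x[0] - x[1], 2 * x[0] + 2 * x[2]]
    M_map = lambda y: [y[0], y[0] + y[1], 2 * y[1]]

    ML = lambda x: M_map(L_map(x))                 # the composition M o L : R^3 -> R^3
    print(standard_matrix(ML, 3))                  # [M o L]
    print(matmul(standard_matrix(M_map, 2),
                 standard_matrix(L_map, 3)))       # [M][L] -- the same 3 x 3 matrix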
Chapter 4
Vector Spaces
4.1 Vector Spaces
We have seen that Rn, subspaces of Rn, the set Mm×n(R) of m× n matrices, and the
set L of all possible linear mappings L : Rn → Rm satisfy the same ten properties. In
fact, as we will see, there are many other sets with operations of addition and scalar
multiplication that satisfy these same properties. Instead of studying each of these
separately, it makes sense to define and study one abstract concept which contains
all of them.
DEFINITION
Vector Space
Let V be a set. We will call the elements of V vectors and denote them with the usual
vector symbol ~x. V is then called a vector space over R if there is an operation of
addition, denoted ~x + ~y, and an operation of scalar multiplication, denoted c~x, such
that for any ~v, ~x, ~y ∈ V and a, b ∈ R we have:
V1 ~x+ ~y ∈ V
V2 (~x+ ~y) + ~v = ~x+ (~y + ~v)
V3 ~x+ ~y = ~y + ~x
V4 There is a vector denoted ~0 in V such that ~x + ~0 = ~x. It is called the zero
vector.
V5 For each ~x ∈ V there exists an element −~x such that ~x+(−~x) = ~0. −~x is called
the additive inverse of ~x.
V6 a~x ∈ V
V7 a(b~x) = (ab)~x
V8 (a+ b)~x = a~x+ b~x
V9 a(~x+ ~y) = a~x+ a~y
V10 1~x = ~x.
REMARKS
1. We sometimes denote addition by ~x ⊕ ~y and scalar multiplication by c ⊙ ~x to distinguish these from “normal” operations of addition and scalar multiplication. See Example 7 below.
2. Sometimes, when working with multiple vector spaces, we denote the zero vector
in a vector space V by ~0V.
3. The ten vector space axioms define a “structure” for the set based on the op-
erations of addition and scalar multiplication. The study of vector spaces is
the study of this structure. Note that individual vector spaces may have ad-
ditional operations that are not included in this structure and hence may be
different than what is included in other vector spaces. For example, matrix
multiplication.
4. The definition of a vector space makes sense if the scalars are allowed to be
taken from any field, for example, Q or C. In Math 235 we will consider the
case where the scalars are allowed to be complex. In this course, when we say
vector space, we mean a vector space over R.
EXAMPLE 1 In Chapter 1 we saw that Rn and any subspace of Rn under standard addition and
scalar multiplication of vectors satisfy all ten vector space axioms, and hence are
vector spaces.
EXAMPLE 2 The set Mm×n(R) is a vector space under standard addition and scalar multiplication
of matrices.
EXAMPLE 3 The set L of all linear mappings L : Rn → Rm, is a vector space under standard
addition and scalar multiplication of linear mappings.
EXAMPLE 4 Denote the set of all polynomials of degree at most n with real coefficients by Pn(R).
That is,

Pn(R) = {a0 + a1x + · · · + an x^n | a0, a1, . . . , an ∈ R}

Define standard addition and scalar multiplication of polynomials by

(a0 + · · · + an x^n) + (b0 + · · · + bn x^n) = (a0 + b0) + · · · + (an + bn) x^n
c(a0 + · · · + an x^n) = (c a0) + · · · + (c an) x^n

for all c ∈ R. It is easy to verify that Pn(R) is a vector space under these operations. We find that the zero vector in Pn(R) is the polynomial z(x) = 0 for all x, and the additive inverse of p(x) = a0 + a1x + · · · + an x^n is (-p)(x) = -a0 - a1x - · · · - an x^n.
EXAMPLE 5 The set C[a, b] of all continuous functions defined on the interval [a, b] is a vector
space under standard addition and scalar multiplication of functions.
EXAMPLE 6 Let V = {~0} and define addition by ~0 + ~0 = ~0, and scalar multiplication by c~0 = ~0
for all c ∈ R. Show that V is a vector space under these operations.
Solution: To show that V is a vector space, we need to show that it satisfies all ten
axioms.
V1 The only element in V is ~0 and ~0 +~0 = ~0 ∈ V.
V2 (~0 +~0) +~0 = ~0 +~0 = ~0 + (~0 +~0)
V3 ~0 +~0 = ~0 = ~0 +~0
V4 ~0 satisfies ~0 +~0 = ~0, so ~0 is the zero vector in the set.
V5 The additive inverse of ~0 is ~0.
V6 a~0 = ~0 ∈ V.
V7 a(b~0) = a~0 = ~0 = (ab)~0.
V8 (a+ b)~0 = ~0 = ~0 +~0 = a~0 + b~0 .
V9 a(~0 +~0) = a~0 = ~0 = ~0 +~0 = a~0 + a~0
V10 1~0 = ~0.
for all a, b ∈ R. Thus, V is a vector space. It is called the trivial vector space.
EXAMPLE 7 Let S = {x ∈ R | x > 0}. Define addition in S by x ⊕ y = xy, and define scalar multiplication by c ⊙ x = x^c for all x, y ∈ S and all c ∈ R. Prove that S is a vector space under these operations.

Solution: We will show that S satisfies all ten vector space axioms. For any x, y, z ∈ S and a, b ∈ R we have

V1 x ⊕ y = xy > 0 since x > 0 and y > 0. Hence, x ⊕ y ∈ S.
V2 (x ⊕ y) ⊕ z = (xy) ⊕ z = (xy)z = x(yz) = x ⊕ (yz) = x ⊕ (y ⊕ z)
V3 x ⊕ y = xy = yx = y ⊕ x
V4 The zero vector is 1 since 1 ∈ S and x ⊕ 1 = x · 1 = x.
V5 If x ∈ S, then (1/x) ⊕ x = (1/x)x = 1. Hence 1/x is the additive inverse of x. Also, 1/x > 0 since x > 0 and hence 1/x ∈ S.
V6 a ⊙ x = x^a > 0 since x > 0. Hence a ⊙ x ∈ S.
V7 a ⊙ (b ⊙ x) = a ⊙ x^b = (x^b)^a = x^(ab) = (ab) ⊙ x.
V8 (a + b) ⊙ x = x^(a+b) = x^a x^b = x^a ⊕ x^b = a ⊙ x ⊕ b ⊙ x.
V9 a ⊙ (x ⊕ y) = a ⊙ (xy) = (xy)^a = x^a y^a = x^a ⊕ y^a = a ⊙ x ⊕ a ⊙ y.
V10 1 ⊙ x = x^1 = x.

Therefore S is a vector space.
EXAMPLE 8 Let D = { [ a1 0  ] | a1, a2 ∈ R }.
                    [ 0  a2 ]

Show that D is a vector space under standard addition and scalar multiplication of matrices.

Solution: We show that D satisfies all ten vector space axioms. For any

A = [ a1 0  ],   B = [ b1 0  ],   and   C = [ c1 0  ]
    [ 0  a2 ]        [ 0  b2 ]              [ 0  c2 ]

in D and s, t ∈ R we have

V1 A + B = [ a1 + b1  0       ] ∈ D.
           [ 0        a2 + b2 ]

V2 (A + B) + C = A + (B + C) by Theorem 3.1.1.
V3 A + B = B + A by Theorem 3.1.1.
V4 Clearly O2,2 ∈ D, and O2,2 satisfies A + O2,2 = A for all A ∈ D by Theorem 3.1.1. Hence, O2,2 is the zero vector in D.

V5 By Theorem 3.1.1, the additive inverse of A is [ -a1 0   ] ∈ D for all A ∈ D.
                                                  [ 0   -a2 ]

V6 sA = [ sa1 0   ] ∈ D.
        [ 0   sa2 ]

V7 s(t(A)) = (st)A by Theorem 3.1.1.
V8 (s + t)A = sA + tA by Theorem 3.1.1.
V9 s(A + B) = sA + sB by Theorem 3.1.1.
V10 1A = A by Theorem 3.1.1.

Hence, D is a vector space.
REMARK
Notice that we did not really need to check axioms V2, V3, V5, V7, V8, V9, and
V10 since we were using the addition and scalar multiplication operations of a vector
space. We will look more at this below.
EXAMPLE 9 Show the set S = {(x, y) | x, y ∈ R} with addition defined by (x1, y1) ⊕ (x2, y2) = (x1 + y2, x2 + y1) and scalar multiplication defined by c ⊙ (x1, y1) = (cx1, cy1) is not a vector space.

Solution: To show that S is not a vector space, we only need to find one counter-example to one of the vector space axioms. Observe that (1, 2) ∈ S and that

(1 + 1) ⊙ (1, 2) = 2 ⊙ (1, 2) = (2, 4)
1 ⊙ (1, 2) ⊕ 1 ⊙ (1, 2) = (1, 2) ⊕ (1, 2) = (1 + 2, 2 + 1) = (3, 3)

Hence, axiom V8 is not satisfied. Thus, S is not a vector space.
EXAMPLE 10 Show the set Z^2 = { (x1, x2) | x1, x2 ∈ Z } is not a vector space under standard addition and scalar multiplication of vectors.

Solution: Observe that [ 1 ] ∈ Z^2, but √2 [ 1 ] = [ √2  ] ∉ Z^2.
                       [ 2 ]               [ 2 ]   [ 2√2 ]

Hence, Z^2 does not satisfy axiom V6, so it is not a vector space.
We have now seen numerous different vector spaces. The advantage of having all of these belong to the one abstract concept of a vector space is that we can study all of them at the same time. That is, when we prove a result for a vector space, it is immediately true for all vector spaces. We now demonstrate this.
THEOREM 1 Let V be a vector space with addition defined by ~x + ~y and scalar multiplication
defined by c~x for all ~x, ~y ∈ V, and c ∈ R. Then,
(1) 0~x = ~0 for all ~x ∈ V,
(2) −~x = (−1)~x for all ~x ∈ V.
Proof: We prove (1) and leave (2) as an exercise. For all ~x ∈ V we have
0~x = 0~x+~0 by V4
= 0~x+ [~x+ (−~x)] by V5
= 0~x+ [1~x+ (−~x)] by V10
= [0~x+ 1~x] + (−~x) by V2
= (0 + 1)~x+ (−~x) by V8
= 1~x+ (−~x) operation of numbers in R
= ~x+ (−~x) by V10
= ~0 by V5
We can now use this result to find the zero vector in any vector space V by simply
finding 0~x for any ~x ∈ V. Similarly, we can find the additive inverse of any ~x ∈ V by
calculating (−1)~x.
EXAMPLE 11 Let S be defined as in Example 7. Observe that we indeed have ~0 = 0 ⊙ x = x^0 = 1, and (-x) = (-1) ⊙ x = x^(-1) = 1/x for any x ∈ S.
Subspaces
Observe that the set D in Example 8 is a vector space that is contained in the larger
vector spaceM2×2(R). This is exactly the same as we saw in Chapter 1 with subspaces
of Rn.
DEFINITION
Subspace
Let V be a vector space. If S is a subset of V and S is a vector space under the same
operations as V, then S is called a subspace of V.
As in Example 8, we see that we do not need to test all ten vector space axioms to
show that a set S is a subspace of a vector space V. In particular, we get the following
theorem which matches what we saw in Chapter 1.
THEOREM 2 (Subspace Test)
Let S be a non-empty subset of V. If ~x+ ~y ∈ S and c~x ∈ S for all ~x, ~y ∈ S and c ∈ R
under the operations of V, then S is a subspace of V.
Proof: The proof is left as an exercise.
REMARKS
1. It is very important to always remember to check that S is a subset of V and is
non-empty before applying the subspace test.
2. As before, we will say that S is closed under addition if ~x + ~y ∈ S for all
~x, ~y ∈ S, and we will say that S is closed under scalar multiplication if
c~x ∈ S for all ~x ∈ S and c ∈ R.
EXAMPLE 12 Determine which of the following are subspaces of the given vector space.
a) S1 = {x p(x) | p(x) ∈ P2(R)} of P4(R).

Solution: Recall that P2(R) is the set of all polynomials of degree at most 2. That is, polynomials of the form p(x) = a0 + a1x + a2x^2. Similarly, P4(R) is the set of polynomials of degree at most 4. Then every vector q(x) ∈ S1 has the form

q(x) = x(a0 + a1x + a2x^2) = a0x + a1x^2 + a2x^3

which is a polynomial of degree at most 4 (actually, of degree at most 3). Hence, S1 is a subset of P4(R).

S1 is clearly non-empty since x(1 + x + x^2) ∈ S1.

Let q1(x), q2(x) ∈ S1. Then there exist p1(x), p2(x) ∈ P2(R) such that q1(x) = xp1(x) and q2(x) = xp2(x). We have

q1(x) + q2(x) = xp1(x) + xp2(x) = x(p1(x) + p2(x))

and p1(x) + p2(x) ∈ P2(R) as P2(R) is closed under addition. So, q1(x) + q2(x) ∈ S1. Similarly, for any c ∈ R we get

cq1(x) = c(xp1(x)) = x(cp1(x)) ∈ S1

since cp1(x) ∈ P2(R) as P2(R) is closed under scalar multiplication.

Therefore, S1 is a subspace of P4(R) by the Subspace Test.
b) S2 = \left\{ \begin{bmatrix} a1 & a2 \\ 0 & a3 \end{bmatrix} \;\middle|\; a1a2a3 = 0 \right\} of M2×2(R).
Solution: Observe that A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix} and B = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} are both in S2, but
A + B = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} ∉ S2
So, S2 is not closed under addition and hence is not a subspace.
c) S3 = {p(x) ∈ P2(R) | p(2) = 0} of P2(R).
Solution: By definition S3 is a subset of P2(R).
In P2(R) the zero vector is the polynomial that satisfies z(x) = 0 for all x. Hence,
z(x) ∈ S3 since z(2) = 0. Therefore, S3 is non-empty.
Let p(x), q(x) ∈ S3. Then p(2) = 0 and q(2) = 0, so (p+q)(2) = p(2)+q(2) = 0+0 = 0.
Hence, (p+ q) ∈ S3. Thus, S3 is closed under addition.
Similarly, (cp)(2) = cp(2) = c(0) = 0 for all c ∈ R, so (cp) ∈ S3. Thus, it is also closed
under scalar multiplication. Therefore S3 is a subspace of P2(R) by the Subspace Test.
d) S4 = \left\{ \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \right\} of M2×2(R).
Solution: Observe that
0 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} ∉ S4
Therefore, S4 is not closed under scalar multiplication and hence is not a subspace.
REMARK
As with subspaces in Chapter 1, normally the easiest way to verify that the set S is non-empty is to check whether the zero vector of V, ~0_V, is in S. Moreover, if ~0_V ∉ S, then, as we saw in the example above, S is not closed under scalar multiplication and hence is not a subspace.
EXAMPLE 13 A matrix A is called skew-symmetric if AT = −A. Determine if the set of all n×n
skew-symmetric matrices with real entries is a vector space under standard addition
and scalar multiplication of matrices.
Solution: To prove the set S of skew-symmetric matrices is a vector space, we will
prove that it is a subspace of Mn×n(R).
By definition S is a subset of Mn×n(R). Also, the n × n zero matrix On,n clearly
satisfies AT = −A, thus On,n ∈ S.
Let A and B be any two skew-symmetric matrices. Then
(A + B)^T = A^T + B^T = (−A) + (−B) = −A − B = −(A + B)
so A + B is skew-symmetric and hence S is closed under addition.
Let c ∈ R. Then (cA)^T = c(A^T) = c(−A) = −(cA), so cA is skew-symmetric and therefore S is closed under scalar multiplication.
Thus, S is a subspace of Mn×n(R) by the Subspace Test and therefore is a vector
space.
Axioms V1 and V6 imply that every vector space is closed under linear combinations.
Therefore, it makes sense to look at the concepts of spanning and linear independence
as we did in Chapter 1.
Spanning
DEFINITION
Span
Let B = {~v1, . . . , ~vk} be a set of vectors in a vector space V. Then we define the span
of B by
SpanB = {c1~v1 + c2~v2 + · · ·+ ck~vk | c1, . . . , ck ∈ R}
We say that SpanB is spanned by B and that B is a spanning set for SpanB.
EXAMPLE 14 Let B = {1 + 2x + 3x^2, 3 + 2x + x^2, 5 + 6x + 7x^2}. Determine which of the following vectors is in SpanB.
a) p(x) = 1 + x + 2x^2.
Solution: To determine whether p ∈ SpanB, we need to find whether there exist coefficients c1, c2, and c3 such that
1 + x + 2x^2 = c1(1 + 2x + 3x^2) + c2(3 + 2x + x^2) + c3(5 + 6x + 7x^2)
            = (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x^2
Comparing coefficients of powers of x on both sides gives us the system of linear equations
c1 + 3c2 + 5c3 = 1
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 2
Row reducing the augmented matrix of the system we get
\left[\begin{array}{ccc|c} 1 & 3 & 5 & 1 \\ 2 & 2 & 6 & 1 \\ 3 & 1 & 7 & 2 \end{array}\right] ∼ \left[\begin{array}{ccc|c} 1 & 3 & 5 & 1 \\ 0 & -4 & -4 & -1 \\ 0 & 0 & 0 & 1 \end{array}\right]
Hence, the system is inconsistent, so p ∉ SpanB.
b) q(x) = 2 + x.
Solution: To determine whether q ∈ SpanB, we need to find whether there exist coefficients c1, c2, and c3 such that
2 + x = c1(1 + 2x + 3x^2) + c2(3 + 2x + x^2) + c3(5 + 6x + 7x^2)
      = (c1 + 3c2 + 5c3) + (2c1 + 2c2 + 6c3)x + (3c1 + c2 + 7c3)x^2
Comparing coefficients of powers of x on both sides gives us the system of linear equations
c1 + 3c2 + 5c3 = 2
2c1 + 2c2 + 6c3 = 1
3c1 + c2 + 7c3 = 0
Row reducing the augmented matrix of the system we get
\left[\begin{array}{ccc|c} 1 & 3 & 5 & 2 \\ 2 & 2 & 6 & 1 \\ 3 & 1 & 7 & 0 \end{array}\right] ∼ \left[\begin{array}{ccc|c} 1 & 0 & 2 & -1/4 \\ 0 & 1 & 1 & 3/4 \\ 0 & 0 & 0 & 0 \end{array}\right]
Hence, the system is consistent, so q ∈ SpanB. In particular, if we solve the system, we get
~c = \begin{bmatrix} c1 \\ c2 \\ c3 \end{bmatrix} = \begin{bmatrix} -1/4 \\ 3/4 \\ 0 \end{bmatrix} + t \begin{bmatrix} -2 \\ -1 \\ 1 \end{bmatrix}
for any t ∈ R. Therefore, there are infinitely many different ways to write q as a linear combination of the vectors in B.
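The hand computation in Example 14 is easy to check by machine. The following is a minimal sketch (our own illustration, not part of the notes) that uses Python's sympy library to row reduce the augmented matrix exactly; the variable names are ours.

```python
# A hedged sketch: checking whether q(x) = 2 + x lies in Span B from
# Example 14 by row reducing the augmented matrix with exact arithmetic.
from sympy import Matrix

# Columns are the coefficient vectors of 1+2x+3x^2, 3+2x+x^2, 5+6x+7x^2.
A = Matrix([[1, 3, 5],
            [2, 2, 6],
            [3, 1, 7]])
q = Matrix([2, 1, 0])           # coefficient vector of q(x) = 2 + x

R, pivots = A.row_join(q).rref()
print(R)                        # last column gives c1 = -1/4, c2 = 3/4 when c3 = 0
print(3 not in pivots)          # True: no pivot in the augmented column, so consistent
```

If the last column had contained a pivot, the system would be inconsistent, which is exactly what happens for p(x) in part a).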
EXAMPLE 15 Let B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix} \right\}. Prove that
SpanB = \left\{ \begin{bmatrix} a1 & a2 \\ 0 & a3 \end{bmatrix} \;\middle|\; a1, a2, a3 ∈ R \right\}
Solution: We need to show that every vector of the form \begin{bmatrix} a1 & a2 \\ 0 & a3 \end{bmatrix} can be written as a linear combination of the vectors in B. That is, we need to show that the equation
c1 \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + c2 \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} + c3 \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} a1 & a2 \\ 0 & a3 \end{bmatrix}
is consistent for all a1, a2, a3 ∈ R.
Using operations on matrices on the left hand side, we get
\begin{bmatrix} c1 + c2 + c3 & c1 + 2c2 + 3c3 \\ 0 & c1 - c2 + 2c3 \end{bmatrix} = \begin{bmatrix} a1 & a2 \\ 0 & a3 \end{bmatrix}
Comparing entries gives the system of linear equations
c1 + c2 + c3 = a1
c1 + 2c2 + 3c3 = a2
c1 − c2 + 2c3 = a3
We row reduce the coefficient matrix to get
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & -1 & 2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
By Theorem 2.2.3, the system is consistent for all a1, a2, a3 ∈ R.
REMARK
Notice in the last example that to prove the system was consistent for all a1, a2, a3 ∈ R
we only needed to row reduce the coefficient matrix. However, if we had row reduced
the augmented matrix, we would have found equations for c1, c2, and c3 in terms of
a1, a2, and a3.
THEOREM 3 Let B = {~v1, . . . , ~vk} be a set of vectors in a vector space V. Then, SpanB is a
subspace of V.
Proof: The proof is left as an exercise.
Linear Independence
DEFINITION
Linearly Dependent
Linearly
Independent
A set of vectors {~v1, . . . , ~vk} in a vector space V is said to be linearly dependent if
there exist coefficients c1, . . . , ck not all zero such that
~0 = c1~v1 + · · ·+ ck~vk
A set of vectors {~v1, . . . , ~vk} is said to be linearly independent if the only solution
to
~0 = c1~v1 + · · ·+ ck~vk
is the trivial solution c1 = c2 = · · · = ck = 0.
EXAMPLE 16 Determine which of the following sets are linearly independent. If a set is linearly dependent, find all linear combinations of the vectors that equal ~0.
a) {1 + x + 2x^2, x − x^2, −2 + x^2}.
Solution: By definition, the set is linearly independent if and only if the only solution to the equation
0 = c1(1 + x + 2x^2) + c2(x − x^2) + c3(−2 + x^2) = (c1 − 2c3) + (c1 + c2)x + (2c1 − c2 + c3)x^2
is c1 = c2 = c3 = 0. Comparing coefficients of powers of x gives the homogeneous system of equations
c1 − 2c3 = 0
c1 + c2 = 0
2c1 − c2 + c3 = 0
Row reducing the coefficient matrix gives
\begin{bmatrix} 1 & 0 & -2 \\ 1 & 1 & 0 \\ 2 & -1 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
By Theorem 2.2.3, the system has a unique solution. Thus, the set is linearly independent.
b) \left\{ \begin{bmatrix} 1 & 2 \\ 1 & -2 \end{bmatrix}, \begin{bmatrix} 1 & -2 \\ 2 & 2 \end{bmatrix}, \begin{bmatrix} 2 & 0 \\ 3 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \right\}.
Solution: Consider
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = c1 \begin{bmatrix} 1 & 2 \\ 1 & -2 \end{bmatrix} + c2 \begin{bmatrix} 1 & -2 \\ 2 & 2 \end{bmatrix} + c3 \begin{bmatrix} 2 & 0 \\ 3 & 0 \end{bmatrix} + c4 \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
Simplifying the right hand side gives
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} c1 + c2 + 2c3 + c4 & 2c1 - 2c2 + c4 \\ c1 + 2c2 + 3c3 + c4 & -2c1 + 2c2 + c4 \end{bmatrix}
Comparing entries we get the homogeneous system
c1 + c2 + 2c3 + c4 = 0
2c1 − 2c2 + c4 = 0
c1 + 2c2 + 3c3 + c4 = 0
−2c1 + 2c2 + c4 = 0
We row reduce the coefficient matrix to get
\begin{bmatrix} 1 & 1 & 2 & 1 \\ 2 & -2 & 0 & 1 \\ 1 & 2 & 3 & 1 \\ -2 & 2 & 0 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}
Therefore, by Theorem 2.2.3, the system has infinitely many solutions and hence the set is linearly dependent. In particular, finding the general solution to the system shows us that
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = (−t) \begin{bmatrix} 1 & 2 \\ 1 & -2 \end{bmatrix} + (−t) \begin{bmatrix} 1 & -2 \\ 2 & 2 \end{bmatrix} + t \begin{bmatrix} 2 & 0 \\ 3 & 0 \end{bmatrix} + 0 \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}
for all t ∈ R.
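Linear independence can also be tested numerically by comparing the rank of the coefficient matrix with the number of vectors. The sketch below (ours, not from the notes) does this for the four matrices of part b); the column ordering of the vectorized matrices is an assumption of the illustration.

```python
# A hedged sketch: testing linear independence of the four 2x2 matrices in
# Example 16 b). Each column of `cols` is one matrix, flattened entry by entry.
import numpy as np

cols = np.array([[1, 1, 2, 1],     # (1,1) entries
                 [2, -2, 0, 1],    # (1,2) entries
                 [1, 2, 3, 1],     # (2,1) entries
                 [-2, 2, 0, 1]])   # (2,2) entries

rank = np.linalg.matrix_rank(cols)
print(rank)                    # 3 < 4, so the set is linearly dependent
print(rank == cols.shape[1])   # False: rank equal to the number of vectors
                               # would mean linear independence
```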
THEOREM 4 Any set of vectors {~v1, . . . , ~vk} in a vector space V such that ~vi = ~0 for some i is
linearly dependent.
Proof: The proof is left as an exercise.
4.2 Bases and Dimension
Bases
Theorem 4.1.3 shows that every set spanned by a set of vectors is a vector space. It is easy to see that the converse is also true. However, as we saw in Chapter 1, most of the time we wish to have a minimal spanning set for a vector space, that is, a set that spans the vector space and contains as few vectors as possible. We get the following results.
THEOREM 1 If ~vi ∈ Span{~v1, . . . , ~vi−1, ~vi+1, . . . , ~vk}, then
Span{~v1, . . . , ~vk} = Span{~v1, . . . , ~vi−1, ~vi+1, . . . , ~vk}
Proof: The proof is left as an exercise.
THEOREM 2 Let B = {~v1, . . . , ~vk}. If ~vi ∉ Span{~v1, . . . , ~vi−1, ~vi+1, . . . , ~vk} for 1 ≤ i ≤ k, then B is linearly independent.
Proof: We prove the contrapositive. Assume that {~v1, . . . , ~vk} is linearly dependent.
Then, there exist c1, . . . , ck not all zero such that
c1~v1 + · · ·+ ck~vk = ~0
Pick one of the coefficients that is non-zero, say ci. Then, we get
ci~vi = −c1~v1 − · · · − ci−1~vi−1 − ci+1~vi+1 − · · · − ck~vk
~vi = −(c1/ci)~v1 − · · · − (ci−1/ci)~vi−1 − (ci+1/ci)~vi+1 − · · · − (ck/ci)~vk
Hence, ~vi ∈ Span{~v1, . . . , ~vi−1, ~vi+1, . . . , ~vk}.
These theorems show that a minimal spanning set for a vector space is one that is
linearly independent. Thus, we make the following definition.
DEFINITION
Basis
Let V be a vector space. The set B is called a basis for V if B is a linearly independent
spanning set for V.
EXAMPLE 1 Prove that B = \left\{ \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\} is a basis for R3.
Solution: For B to be a basis it must be a linearly independent spanning set. Consider
\begin{bmatrix} b1 \\ b2 \\ b3 \end{bmatrix} = c1 \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix} + c2 \begin{bmatrix} 2 \\ 1 \\ 5 \end{bmatrix} + c3 \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3c1 + 2c2 + c3 \\ 2c1 + c2 \\ c1 + 5c2 + c3 \end{bmatrix}
Comparing corresponding entries we get the system of linear equations
3c1 + 2c2 + c3 = b1
2c1 + c2 = b2
c1 + 5c2 + c3 = b3
Row reducing the corresponding coefficient matrix we get
\begin{bmatrix} 3 & 2 & 1 \\ 2 & 1 & 0 \\ 1 & 5 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Since there is a leading 1 in each row, the system is consistent for every \begin{bmatrix} b1 \\ b2 \\ b3 \end{bmatrix} ∈ R3 by Theorem 2.2.3. Hence B spans R3.
Similarly, since there is a leading 1 in each column, the homogeneous system (b1 = b2 = b3 = 0) has a unique solution by Theorem 2.2.3. Thus B is also linearly independent.
Therefore, B is a basis for R3.
EXAMPLE 2 The set {1, x, x^2, . . . , x^n} is a basis for Pn(R) since it is clearly linearly independent and a spanning set for Pn(R). It is called the standard basis for Pn(R).
EXAMPLE 3 The set \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\} is the standard basis for M2×2(R).
EXAMPLE 4 Prove that B = {1 + 2x + x^2, 2 + 9x, 3 + 3x + 4x^2} is a basis for P2(R).
Solution: First, we need to show that for any polynomial p(x) = c + bx + ax^2 we can find c1, c2, c3 such that
c + bx + ax^2 = c1(1 + 2x + x^2) + c2(2 + 9x) + c3(3 + 3x + 4x^2)
             = (c1 + 2c2 + 3c3) + (2c1 + 9c2 + 3c3)x + (c1 + 4c3)x^2.
Row reducing the coefficient matrix of the corresponding system gives
\begin{bmatrix} 1 & 2 & 3 \\ 2 & 9 & 3 \\ 1 & 0 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
By Theorem 2.2.3, SpanB = P2(R) since the system is consistent for every right hand side \begin{bmatrix} c \\ b \\ a \end{bmatrix}. Moreover, if we let a = b = c = 0, we see that we get the unique solution c1 = c2 = c3 = 0, and hence B is also linearly independent. Therefore, it is a basis for P2(R).
EXAMPLE 5 Determine whether B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix} \right\} is a basis for the vector space
U = \left\{ \begin{bmatrix} a & b \\ 0 & c \end{bmatrix} \;\middle|\; a, b, c ∈ R \right\}
Solution: Consider
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = c1 \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + c2 \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} + c3 \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix} = \begin{bmatrix} c1 + c2 + c3 & c1 + 2c2 + 3c3 \\ 0 & c1 - c2 + 2c3 \end{bmatrix}.
This gives us the homogeneous system of linear equations
c1 + c2 + c3 = 0
c1 + 2c2 + 3c3 = 0
c1 − c2 + 2c3 = 0
Row reducing the coefficient matrix of the homogeneous system we get
\begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & -1 & 2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
Hence, the only solution is c1 = c2 = c3 = 0, so the set is linearly independent.
Now, consider
\begin{bmatrix} a & b \\ 0 & c \end{bmatrix} = c1 \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + c2 \begin{bmatrix} 1 & 2 \\ 0 & -1 \end{bmatrix} + c3 \begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}.
This gives a system of linear equations with the same coefficient matrix as above. So, using the same row operations we did above, we see that the matrix has a leading one in each row of its REF; hence the system is consistent by Theorem 2.2.3 for all matrices \begin{bmatrix} a & b \\ 0 & c \end{bmatrix}, so the set spans U.
Therefore, B is a basis for U.
EXAMPLE 6 Find a basis for the vector space S of 2× 2 skew-symmetric matrices.
Solution: We first want to find a spanning set for S. If A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} is skew-symmetric, then A^T = −A, so
\begin{bmatrix} a & c \\ b & d \end{bmatrix} = \begin{bmatrix} -a & -b \\ -c & -d \end{bmatrix}
For this to be true, we must have a = −a, c = −b, b = −c, and d = −d. Thus, a = 0 = d and b = −c. Hence, every vector in S can be written in the form
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 0 & b \\ -b & 0 \end{bmatrix} = b \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}
Thus, S = Span\left\{ \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\}, so B = \left\{ \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} is a spanning set for S. Moreover, B contains one non-zero vector, so it is clearly linearly independent. Therefore, B is a basis for S.
In Linear Algebra we often wish to find a basis for a vector space. The example
above shows the general procedure for finding a basis for a vector space which can be
spanned by a finite number of vectors.
ALGORITHM
Let V be a vector space. To find a basis for V:
1. Find the general form of a vector ~x ∈ V.
2. Write the general form of ~x as a linear combination of vectors ~v1, . . . , ~vk in V.
3. Check if B = {~v1, . . . , ~vk} is linearly independent. If it is, then stop as B is a
basis.
4. If B is linearly dependent, pick some vector ~vi ∈ B which can be written as a
linear combination of the other vectors in B and remove it from the set using
Theorem 1. Repeat this until B is linearly independent.
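This algorithm translates directly into a short computation: place the spanning vectors as the columns of a matrix, row reduce, and keep the pivot columns. The following is a minimal sketch (our own illustration, not part of the notes), applied to the spanning set that will appear in Example 7 below.

```python
# A hedged sketch of the basis-finding algorithm: the pivot columns of the
# RREF identify a linearly independent subset with the same span.
from sympy import Matrix

# Spanning set for V from Example 7: (1,1,0), (-1,0,-1), (0,-1,1) as columns.
A = Matrix([[1, -1, 0],
            [1, 0, -1],
            [0, -1, 1]])

_, pivots = A.rref()
basis = [A.col(j) for j in pivots]
print(pivots)   # (0, 1): the first two columns form a basis, matching Example 7
```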
EXAMPLE 7 Find a basis for the vector space V = \left\{ \begin{bmatrix} a-b \\ a-c \\ c-b \end{bmatrix} \;\middle|\; a, b, c ∈ R \right\}.
Solution: Any vector ~v ∈ V has the form
~v = \begin{bmatrix} a-b \\ a-c \\ c-b \end{bmatrix} = a \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + b \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} + c \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}
Thus, V = Span\left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}. Consider
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = c1 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c2 \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} + c3 \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} c1 - c2 \\ c1 - c3 \\ -c2 + c3 \end{bmatrix}
Row reducing the coefficient matrix of the corresponding system of linear equations gives
\begin{bmatrix} 1 & -1 & 0 \\ 1 & 0 & -1 \\ 0 & -1 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
Hence, the set is linearly dependent. However, interpreting the third column of this row reduction as the augmented part of an augmented matrix, we get that
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} = − \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} − \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix}
So, we can apply Theorem 1 to get that the set B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\} also spans V.
Now consider
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = c1 \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + c2 \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} c1 - c2 \\ c1 \\ -c2 \end{bmatrix}
Using exactly the same row operations as above gives
\begin{bmatrix} 1 & -1 \\ 1 & 0 \\ 0 & -1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
Hence, the system has a unique solution, so B is linearly independent. Thus, B is a basis for V.
In general, it would be very inefficient to remove one vector at a time to reduce a spanning set to a basis. Moreover, notice that when we removed a vector from B in the example above, all we did was remove the corresponding column from the matrix representation of the corresponding system of equations. So, we can often use our knowledge and understanding of solving systems of equations to remove all linearly dependent vectors at once. This is demonstrated in the example below.
EXAMPLE 8 Consider the vectors p1, p2, p3, p4, and p5 given by
p1(x) = 4 + 7x − 12x^3 + 9x^4,   p2(x) = −2 + 12x − 8x^2 + 4x^4,
p3(x) = −16 + 3x − 16x^2 + 36x^3 − 19x^4,   p4(x) = −6 + 19x^2 − 6x^3,
p5(x) = 82 − 49x − 96x^2 − 12x^3 + 17x^4,
in P4(R), and let V = Span{p1, p2, p3, p4, p5}. Determine a basis for V given that the following two matrices are row equivalent:
\begin{bmatrix} 4 & -2 & -16 & -6 & 82 \\ 7 & 12 & 3 & 0 & -49 \\ 0 & -8 & -16 & 19 & -96 \\ -12 & 0 & 36 & -6 & -12 \\ 9 & 4 & -19 & 0 & 17 \end{bmatrix} and \begin{bmatrix} 1 & 0 & -3 & 0 & 5 \\ 0 & 1 & 2 & 0 & -7 \\ 0 & 0 & 0 & 1 & -8 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
Solution: Observe that the first matrix is the coefficient matrix of the homogeneous
system corresponding to the equation
0 = c1p1 + c2p2 + c3p3 + c4p4 + c5p5
The RREF shows us that the set {p1, p2, p3, p4, p5} is linearly dependent. However,
observe that if we ignore the last two columns, and interpret the first three columns
as an augmented matrix, then this shows that
p3 = −3p1 + 2p2
Similarly, interpreting all the columns as an augmented matrix and taking c3 = 0, we
find that
p5 = 5p1 − 7p2 − 8p4
Hence, by Theorem 1, we get that V = Span{p1, p2, p4}. Moreover, if we ignore the
third and fifth column of the matrices, we see that the equation
0 = c1p1 + c2p2 + c4p4
has a unique solution, hence {p1, p2, p4} is linearly independent. Thus, {p1, p2, p4} is
a basis for V.
Dimension
In Chapter 1, we geometrically understood that Rn is n-dimensional and, for example,
a plane in Rn is two-dimensional. We now generalize the concept of dimension to
vector spaces. We begin with the following important theorems.
THEOREM 3 Let B = {~v1, . . . , ~vn} be a basis for V, and let {~w1, . . . , ~wk} be a linearly independent
set in V. Then k ≤ n.
Proof: Since ~w1, . . . , ~wk ∈ V and B is a basis for V, we can write every vector
~w1, . . . , ~wk as a linear combination of the vectors in B. Say,
~w1 = a11~v1 + a12~v2 + · · ·+ a1n~vn
~w2 = a21~v1 + a22~v2 + · · ·+ a2n~vn
⋮
~wk = ak1~v1 + ak2~v2 + · · ·+ akn~vn
Consider,
~0 = c1 ~w1 + · · ·+ ck ~wk
= c1(a11~v1 + a12~v2 + · · ·+ a1n~vn) + · · ·+ ck(ak1~v1 + ak2~v2 + · · ·+ akn~vn)
= (c1a11 + · · ·+ ckak1)~v1 + · · ·+ (c1a1n + · · ·+ ckakn)~vn
Since B is a basis, it is linearly independent, and hence the only solution is
c1a11 + · · ·+ ckak1 = 0
⋮
c1a1n + · · ·+ ckakn = 0
If k > n, then the system has infinitely many solutions by Theorem 2.2.3. But, this
would imply that ~0 = c1 ~w1+· · ·+ck ~wk has infinitely many solutions which contradicts
our assumption that {~w1, . . . , ~wk} is linearly independent. Hence, k ≤ n.
REMARK
Recall that we showed that we can think of a basis as a minimal spanning set. This
theorem shows that we can also think of a basis as a maximal linearly independent
set. That is, a basis B is a linearly independent set such that if we add any other
vector from the vector space to B, then the resulting set would be linearly dependent.
THEOREM 4 Let {~v1, . . . , ~vn} and {~w1, . . . , ~wk} be bases for a vector space V. Then, k = n.
Proof: The proof is left as an exercise.
Thus, if a vector space V has a basis containing a finite number of elements, then
every basis of V has the same number of elements. This implies that every basis for
Rn contains exactly n vectors, and every basis for a plane in Rn contains exactly two
vectors which matches our geometric understanding of dimension. Thus, we make
the following definition.
DEFINITION
Dimension
Let {~v1, . . . , ~vn} be a basis for a vector space V. Then, we say the dimension of V
is n and we write
dim V = n
REMARK
If V does not have a basis with a finite number of vectors in it, then V is said to be
infinite-dimensional. We define a basis for the trivial vector space to be the empty
set, so that the dimension of the trivial vector space is 0.
EXAMPLE 9 The dimension of Rn is n since the standard basis contains n vectors.
The dimension of Mm×n(R) is mn since the standard basis contains mn vectors.
The dimension of Pn(R) is n+ 1 since the standard basis contains n+ 1 vectors.
EXAMPLE 10 Since \left\{ \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} \right\} is a basis for the vector space S of 2 × 2 skew-symmetric matrices, we have that the dimension of S is 1.
EXAMPLE 11 In Example 7, we found that a basis for V = \left\{ \begin{bmatrix} a-b \\ a-c \\ c-b \end{bmatrix} \;\middle|\; a, b, c ∈ R \right\} is B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ -1 \end{bmatrix} \right\}. Thus, dimV = 2. So, geometrically, V is a plane in R3.
The concept of dimension can be very useful in many cases. This is demonstrated in
the following theorem.
THEOREM 5 Let V be an n-dimensional vector space. Then
(1) A set of more than n vectors in V must be linearly dependent.
(2) A set of fewer than n vectors in V cannot span V.
(3) A set of n vectors in V is linearly independent if and only if it spans V.
Proof: The proof is left as an exercise.
REMARK
Observe that (3) shows that if we have a set B of n linearly independent vectors in
an n-dimensional vector space V, then B is a basis for V. Similarly, if C contains n
vectors and Span C = V, then C is a basis for V.
EXAMPLE 12 Find a basis for the hyperplane x1 + x2 + 2x3 + x4 = 0 in R4 and then extend the basis to obtain a basis for R4.
Solution: By definition of a hyperplane, we know that a hyperplane in R4 has dimension 3. Hence, by Theorem 5, any three linearly independent vectors in the hyperplane will form a basis for the hyperplane. Observe that the vectors
~v1 = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}, ~v2 = \begin{bmatrix} 0 \\ 2 \\ -1 \\ 0 \end{bmatrix}, ~v3 = \begin{bmatrix} 0 \\ 0 \\ 1 \\ -2 \end{bmatrix}
satisfy the equation of the hyperplane and hence are in the hyperplane. Now, consider
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} = c1~v1 + c2~v2 + c3~v3 = \begin{bmatrix} c1 \\ -c1 + 2c2 \\ -c2 + c3 \\ -2c3 \end{bmatrix}
The only solution is c1 = c2 = c3 = 0, so the vectors are linearly independent. Thus, B = {~v1, ~v2, ~v3} is a basis for the hyperplane.
To extend the basis B to a basis for R4, we just need to add a vector ~v4 so that {~v1, ~v2, ~v3, ~v4} is linearly independent. Since ~v4 = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} does not satisfy the equation of the hyperplane, it is not in Span{~v1, ~v2, ~v3} and so is not a linear combination of ~v1, ~v2, ~v3. Thus, {~v1, ~v2, ~v3, ~v4} is linearly independent and therefore a basis for R4.
THEOREM 6 Let V be an n-dimensional vector space, and let {~v1, . . . , ~vk} be a linearly indepen-
dent set in V with k < n. Then, there exist vectors ~wk+1, . . . , ~wn in V such that
{~v1, . . . , ~vk, ~wk+1, . . . , ~wn} is a basis for V.
Proof: The proof is left as an exercise.
We will use the fact that we can always extend a linearly independent set to a basis of a finite dimensional vector space in a few proofs in Math 235.
4.3 Coordinates
Observe that when we write a vector ~x = \begin{bmatrix} x1 \\ ⋮ \\ xn \end{bmatrix} in Rn, we really mean
~x = x1~e1 + · · · + xn~en
where {~e1, . . . , ~en} is the standard basis for Rn. For example, when you originally
learned to plot the point (1, 2) in the xy-plane, you were taught that this means you
move 1 in the x-direction (~e1) and 2 in the y-direction (~e2). Of course, we can extend
this to general vector spaces.
Let B = {~v1, . . . , ~vn} be a basis for a vector space V. Then every vector ~v in V can
be written as a linear combination of the vectors in B. Say, ~v = b1~v1 + · · · + bn~vn.
Thus, we can think of ~v as being b1 in the ~v1 direction, b2 in the ~v2 direction, etc.
Therefore, with respect to the basis B, it is the coefficients of the linear combination
of the basis vectors which gives ~v that defines the vector ~v. This is demonstrated in
the next example.
EXAMPLE 1 Let ~x = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}, ~y = \begin{bmatrix} 0 \\ 5 \\ -4 \end{bmatrix}, p(x) = 1 − 2x + 3x^2, q(x) = 5x − 4x^2. Also, let {~v1, ~v2, ~v3} be a basis for V = Span{~v1, ~v2, ~v3} and let ~v = ~v1 − 2~v2 + 3~v3 and ~w = 0~v1 + 5~v2 − 4~v3. Then we have
~x + ~y = \begin{bmatrix} 1 + 0 \\ -2 + 5 \\ 3 - 4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}
p(x) + q(x) = (1 + 0) + (−2 + 5)x + (3 − 4)x^2 = 1 + 3x − x^2
~v + ~w = (1 + 0)~v1 + (−2 + 5)~v2 + (3 − 4)~v3 = 1~v1 + 3~v2 − ~v3
From this example we notice that once we have written out the vectors with respect
to a basis, performing operations on vectors in any finite dimensional vector space
is exactly the same as operations on vectors in Rn. However, before we proceed any
further, we show that this concept is well defined by showing that if we have a basis
for a vector space, then every vector can be written as a unique linear combination
of the basis vectors.
THEOREM 1 Let B = {~v1, . . . , ~vn} be a basis for V. Then, every vector ~v ∈ V can be represented
as a unique linear combination of ~v1, . . . , ~vn.
Proof: Since B is a basis, it is a spanning set. Thus, for every ~v ∈ V there exist
c1, . . . , cn such that
c1~v1 + · · ·+ cn~vn = ~v
Assume that there also exist d1, . . . , dn such that d1~v1 + · · · + dn~vn = ~v. Then
c1~v1 + · · ·+ cn~vn = d1~v1 + · · ·+ dn~vn ⇒ (c1 − d1)~v1 + · · ·+ (cn − dn)~vn = ~0
But, this implies that ci = di for 1 ≤ i ≤ n since B is linearly independent. Thus,
there is only one linear combination of the vectors in B that equals ~v.
This allows us to make the following definition.
DEFINITION
Coordinate Vector
Let V be a vector space with basis B = {~v1, . . . , ~vn}. For any ~v ∈ V we define the coordinate vector of ~v with respect to B by
[~v]B = \begin{bmatrix} b1 \\ ⋮ \\ bn \end{bmatrix}
where ~v = b1~v1 + · · · + bn~vn.
REMARK
We will often say the B-coordinates of a vector rather than saying the coordinate
vector of ~v with respect to B.
EXAMPLE 2 Given that the set B = \left\{ \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\} is a basis for the subspace of M2×2(R) which it spans, find the B-coordinates of ~u = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix} and ~v = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}.
Solution: We need to find c1, c2, c3 and d1, d2, d3 such that
c1 \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + c2 \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + c3 \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & -3 \\ 2 & 3 \end{bmatrix}
d1 \begin{bmatrix} 1 & 1 \\ 0 & -1 \end{bmatrix} + d2 \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} + d3 \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 3 & 7 \end{bmatrix}
Observe that each corresponding system will have the same coefficient matrix, but a different augmented part. Thus, we can solve both systems simultaneously by row reducing a doubly augmented matrix. We get
\left[\begin{array}{ccc|cc} 1 & 1 & 0 & 1 & -1 \\ 1 & 0 & 1 & -3 & 0 \\ 0 & 1 & 1 & 2 & 3 \\ -1 & 1 & 2 & 3 & 7 \end{array}\right] ∼ \left[\begin{array}{ccc|cc} 1 & 0 & 0 & -2 & -2 \\ 0 & 1 & 0 & 3 & 1 \\ 0 & 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
For the first system we have c1 = −2, c2 = 3, and c3 = −1, and for the second system we have d1 = −2, d2 = 1, and d3 = 2. Thus,
[~u]B = \begin{bmatrix} -2 \\ 3 \\ -1 \end{bmatrix} and [~v]B = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}
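The doubly augmented computation above is exactly a row reduction with two right-hand sides at once. The following is a minimal sketch (ours, not from the notes) that reproduces it with sympy; the flattening of the basis matrices into columns is an assumption of the illustration.

```python
# A hedged sketch: finding the B-coordinates of u and v from Example 2 by
# row reducing one matrix augmented with both right-hand sides.
from sympy import Matrix

# First three columns: the vectorized basis matrices; last two: u and v.
M = Matrix([[1, 1, 0, 1, -1],
            [1, 0, 1, -3, 0],
            [0, 1, 1, 2, 3],
            [-1, 1, 2, 3, 7]])

R, _ = M.rref()
print(R[:3, 3])   # [u]_B = (-2, 3, -1)
print(R[:3, 4])   # [v]_B = (-2, 1, 2)
```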
REMARK
It is important to notice that the coordinates of a vector depend on which basis is being used for the vector space. Moreover, they also depend on the order of the vectors in the basis. Thus, to prevent confusion, when we speak of a basis, we mean an ordered basis.
As intended, coordinate vectors allow us to change from working with vectors in some
vector space with respect to some basis to working with vectors in Rn. In particular,
observe that taking coordinates of a vector ~v with respect to a basis B of a vector
space V is really a function which takes a vector in V and outputs a vector in Rn.
Not surprisingly, this function is linear.
THEOREM 2 Let V be a vector space with basis B = {~v1, . . . , ~vn}. Then, for any ~v, ~w ∈ V and
s, t ∈ R we have
[s~v + t~w]B = s[~v]B + t[~w]B
Proof: Let ~v = b1~v1 + · · · + bn~vn and ~w = c1~v1 + · · · + cn~vn. Then we have
s~v + t~w = (sb1 + tc1)~v1 + · · · + (sbn + tcn)~vn
Therefore,
[s~v + t~w]B = \begin{bmatrix} sb1 + tc1 \\ ⋮ \\ sbn + tcn \end{bmatrix} = s \begin{bmatrix} b1 \\ ⋮ \\ bn \end{bmatrix} + t \begin{bmatrix} c1 \\ ⋮ \\ cn \end{bmatrix} = s[~v]B + t[~w]B
This function also satisfies two other important properties: it is one-to-one and onto.
As we will see in Math 235, this will allow us to show that, as vector spaces, any
n-dimensional vector space is essentially the same as Rn.
Change of Coordinates
We have seen how to find different bases for a vector space V and how to find the
coordinates of a vector ~v ∈ V with respect to any basis B of V. In some cases, it is
useful to have a quick way of determining the coordinates of ~v with respect to some
basis C for V given the coordinates of ~v with respect to the basis B.
For example, in some applications it is useful to write polynomials in terms of powers of x − c. That is, given any polynomial p(x) = a0 + a1x + · · · + anx^n, you want to write it as
p(x) = b0 + b1(x − c) + b2(x − c)^2 + · · · + bn(x − c)^n
Such a situation may arise if the values of x you are working with are very close to c. If you are working with many polynomials, then it would be very helpful to have a fast way of converting each polynomial. We can rephrase this problem in terms of linear algebra.
Let S = {1, x, . . . , x^n} be the standard basis for Pn(R). Then, given [p(x)]S = \begin{bmatrix} a0 \\ ⋮ \\ an \end{bmatrix}, we want to determine [p(x)]B where B is the basis B = {1, x − c, (x − c)^2, . . . , (x − c)^n} for Pn(R).
Observe that since taking coordinates is a linear operation by Theorem 2, we get
[p(x)]B = [a0·1 + a1x + · · · + anx^n]B
        = a0[1]B + a1[x]B + · · · + an[x^n]B
        = \begin{bmatrix} [1]B & [x]B & · · · & [x^n]B \end{bmatrix} \begin{bmatrix} a0 \\ ⋮ \\ an \end{bmatrix}
So, multiplying this matrix by the coordinate vector of p(x) with respect to the standard basis gives us the coordinates of p(x) with respect to the new basis B. We call this matrix the change of coordinates matrix from S-coordinates to B-coordinates and denote it by BPS.
EXAMPLE 3 Consider the basis B = \left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix} \right\} for R3. Find the change of coordinates matrix from B to the standard basis S.
Solution: The columns of the desired change of coordinates matrix are the S-coordinates of the vectors in B. We have
\left[ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right]_S = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \left[ \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right]_S = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, \left[ \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix} \right]_S = \begin{bmatrix} -2 \\ 0 \\ -3 \end{bmatrix}
So, the change of coordinates matrix SPB is
SPB = \begin{bmatrix} 1 & -1 & -2 \\ 2 & 0 & 0 \\ 3 & 1 & -3 \end{bmatrix}
Of course, we can repeat the argument above in the case where B and C are any two bases of a vector space V. In particular, if B = {~v1, . . . , ~vn} and [~x]B = \begin{bmatrix} b1 \\ ⋮ \\ bn \end{bmatrix}, then
[~x]C = [b1~v1 + · · · + bn~vn]C
      = b1[~v1]C + · · · + bn[~vn]C
      = \begin{bmatrix} [~v1]C & · · · & [~vn]C \end{bmatrix} \begin{bmatrix} b1 \\ ⋮ \\ bn \end{bmatrix}
Hence, we make the following definition.
DEFINITION
Change of Coordinates Matrix
Let B = {~v1, . . . , ~vn} and C both be bases for a vector space V. Then, the change of coordinates matrix from B-coordinates to C-coordinates is defined by
CPB = \begin{bmatrix} [~v1]C & · · · & [~vn]C \end{bmatrix}
and for any ~x ∈ V we have
[~x]C = CPB [~x]B
EXAMPLE 4 Let B = \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\} and C = \left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \begin{bmatrix} 5 \\ -4 \end{bmatrix} \right\} be bases for R2. Find CPB and BPC.
Solution: To find CPB we need to find the C-coordinates of the vectors in B. Thus, we need to solve the systems
c11 \begin{bmatrix} -1 \\ 1 \end{bmatrix} + c12 \begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} and c21 \begin{bmatrix} -1 \\ 1 \end{bmatrix} + c22 \begin{bmatrix} 5 \\ -4 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
Observe that since each system has the same coefficient matrix, the same row operations will solve each system. Therefore, to make this shorter, we form the multiple augmented matrix
\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right]
Row reducing this matrix gives
\left[\begin{array}{cc|cc} -1 & 5 & 1 & 2 \\ 1 & -4 & 3 & 1 \end{array}\right] ∼ \left[\begin{array}{cc|cc} 1 & 0 & 19 & 13 \\ 0 & 1 & 4 & 3 \end{array}\right]
Hence, CPB = \begin{bmatrix} 19 & 13 \\ 4 & 3 \end{bmatrix}. Similarly, for BPC we need to find the B-coordinates of the vectors in C. We again form a multiple augmented matrix and row reduce to get
\left[\begin{array}{cc|cc} 1 & 2 & -1 & 5 \\ 3 & 1 & 1 & -4 \end{array}\right] ∼ \left[\begin{array}{cc|cc} 1 & 0 & 3/5 & -13/5 \\ 0 & 1 & -4/5 & 19/5 \end{array}\right]
So BPC = \begin{bmatrix} 3/5 & -13/5 \\ -4/5 & 19/5 \end{bmatrix}.
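Since the columns of a change of coordinates matrix are found by solving linear systems whose coefficient matrix holds the target basis, the whole computation is one matrix solve. The sketch below (ours, not from the notes) reproduces Example 4 numerically.

```python
# A hedged sketch: computing the change of coordinates matrices of Example 4.
# Solving C X = B column-by-column gives the C-coordinates of the B vectors.
import numpy as np

B = np.array([[1.0, 2.0], [3.0, 1.0]])     # basis vectors of B as columns
C = np.array([[-1.0, 5.0], [1.0, -4.0]])   # basis vectors of C as columns

CPB = np.linalg.solve(C, B)    # C-coordinates of the B basis vectors
BPC = np.linalg.solve(B, C)    # B-coordinates of the C basis vectors
print(CPB)                     # [[19. 13.], [ 4.  3.]]
print(CPB @ BPC)               # the identity matrix, as Theorem 3 below predicts
```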
EXAMPLE 5 Let B = {1 + x, 1 + 2x + x^2, x − x^2} be a basis for P2(R). Determine the change of coordinates matrix from B to the standard basis S = {1, x, x^2} for P2(R).
Solution: We need to find the coordinates of the vectors in B with respect to the basis S. We have
[1 + x]S = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, [1 + 2x + x^2]S = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, [x − x^2]S = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
Hence, the change of coordinates matrix is
SPB = \begin{bmatrix} [1 + x]S & [1 + 2x + x^2]S & [x − x^2]S \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & -1 \end{bmatrix}
EXAMPLE 6 Consider the basis B = \left\{ \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix} \right\} for R3. Find the change of coordinates matrix from the standard basis S to B and hence find \left[ \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} \right]_B.
Solution: The columns of the desired change of coordinates matrix are the B-coordinates of the standard basis vectors. To find these we need to solve the augmented systems
\left[\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right], \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 1 \\ -1 & 1 & 1 & 0 \end{array}\right], \left[\begin{array}{ccc|c} 1 & 2 & 3 & 0 \\ 3 & 1 & 4 & 0 \\ -1 & 1 & 1 & 1 \end{array}\right]
Again, we row reduce the multiple augmented matrix to get
\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 3 & 1 & 4 & 0 & 1 & 0 \\ -1 & 1 & 1 & 0 & 0 & 1 \end{array}\right] ∼ \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 3/5 & -1/5 & -1 \\ 0 & 1 & 0 & 7/5 & -4/5 & -1 \\ 0 & 0 & 1 & -4/5 & 3/5 & 1 \end{array}\right]
Thus, the change of coordinates matrix from S-coordinates to B-coordinates is
BPS = \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix}
Therefore, we have
\left[ \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} \right]_B = BPS \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = \begin{bmatrix} (3/5)x1 - (1/5)x2 - x3 \\ (7/5)x1 - (4/5)x2 - x3 \\ -(4/5)x1 + (3/5)x2 + x3 \end{bmatrix}
We can check that the answer in Example 6 is correct by verifying that these are indeed the B-coordinates of ~x = \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix}. Observe that if
\left[ \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} \right]_B = \begin{bmatrix} (3/5)x1 - (1/5)x2 - x3 \\ (7/5)x1 - (4/5)x2 - x3 \\ -(4/5)x1 + (3/5)x2 + x3 \end{bmatrix},
then by definition of coordinates we have
((3/5)x1 − (1/5)x2 − x3) \begin{bmatrix} 1 \\ 3 \\ -1 \end{bmatrix} + ((7/5)x1 − (4/5)x2 − x3) \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} + (−(4/5)x1 + (3/5)x2 + x3) \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} (3/5)x1 - (1/5)x2 - x3 \\ (7/5)x1 - (4/5)x2 - x3 \\ -(4/5)x1 + (3/5)x2 + x3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -1/5 & -1 \\ 7/5 & -4/5 & -1 \\ -4/5 & 3/5 & 1 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix} = \begin{bmatrix} x1 \\ x2 \\ x3 \end{bmatrix}
Hence, we not only see that our calculations were correct, but we observe that since
SPB = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 1 & 4 \\ -1 & 1 & 1 \end{bmatrix}
we have shown that
SPB BPS = I
Of course, we can prove this in general.
THEOREM 3 Let B and C be bases for an n-dimensional vector space V. Then the change of
coordinate matrices CPB and BPC satisfy
CPB BPC = I =BPC CPB
Proof: By definition of the change of coordinates matrix we have [~x]C =CPB[~x]B and
[~x]B =BPC[~x]C. Combining these gives
[~x]C =CPB[~x]B =CPB BPC[~x]C
Hence I[~x]C =CPB BPC[~x]C for all [~x]C ∈ Rn, so CPB BPC = I by Theorem 4. Similarly,
[~x]B =BPC[~x]C =BPC CPB[~x]B
for all [~x]B ∈ Rn, hence BPC CPB = I.
Chapter 5
Inverses and Determinants
5.1 Matrix Inverses
In the last chapter we saw that if B and C are bases for a finite dimensional vector
space V, then the change of coordinates matrix BPC and the change of coordinates
matrix CPB satisfy
BPCCPB = I
Since the product of these matrices gives the identity matrix (the multiplicative iden-
tity), these matrices should be inverses of each other. We now develop and explore
the idea of inverse matrices.
DEFINITION
Left and Right
Inverse
Let A be an m × n matrix. If B is an n ×m matrix such that AB = Im, then B is
called the right inverse of A. If C is an n×m matrix such that CA = In, then C
is called the left inverse of A.
EXAMPLE 1 Let A = \begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix}. A right inverse of A is B = \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix} since
\begin{bmatrix} 1 & 2 & 2 \\ -1 & 3 & 1 \end{bmatrix} \begin{bmatrix} 3/5 & -2/5 \\ 1/5 & 1/5 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
Note that this also shows that A is a left inverse of B.
THEOREM 1 If A is an m× n matrix with m > n, then A cannot have a right inverse.
Proof: The proof is left as an exercise.
COROLLARY 2 If A is an m× n matrix with m < n, then A cannot have a left inverse.
Proof: If A has a left inverse C, then C is an n × m matrix with n > m that has a right inverse (namely A), which contradicts the previous theorem.
Thus we see that only an n × n matrix can have both a left and a right inverse. We now show that if a matrix has a left and a right inverse, then the left and right inverses must be equal.
THEOREM 3 Let A, B and C be n× n matrices such that AB = I = CA, then B = C.
Proof: We have
B = IB = (CA)B = C(AB) = CI = C
DEFINITION
Matrix Inverse
Invertible
Let A be an n×n matrix. If B is a matrix such that AB = I = BA, then B is called
the inverse of A. We write B = A−1 and we say that A is invertible.
REMARK
Observe from the definition that if B = A−1, then A = B−1.
The following theorem is extremely important. It not only shows that the right (left)
inverse of an n × n matrix is necessarily the inverse, but also that the rank of an
invertible matrix is n.
THEOREM 4 Let A be an n×n matrix. If there exists an n×n matrix B such that AB = I, then
rankA = n = rankB and BA = I. Hence, A is invertible.
Proof: The proof is left as an exercise.
THEOREM 5 Let A and B be invertible matrices and let c ∈ R, c ≠ 0. Then
(1) (cA)^{−1} = (1/c)A^{−1},
(2) (A^T)^{−1} = (A^{−1})^T,
(3) (AB)^{−1} = B^{−1}A^{−1}.
Proof: We just prove (1) and leave (2) and (3) as exercises.
By Theorem 4, to show that B = A^{−1} we just need to show that AB = I. Thus, to show that (cA)^{−1} = (1/c)A^{−1} we just need to show that (cA)((1/c)A^{−1}) = I. We have
(cA)((1/c)A^{−1}) = (c/c)AA^{−1} = 1I = I
as required.
THEOREM 6 Let A be an n × n matrix with rank n. Then A is invertible.
Proof: Observe that if rankA = n, then the systems of equations A~bi = ~ei, 1 ≤ i ≤ n, are all consistent by Theorem 2.2.3. Hence, if we let B = \begin{bmatrix} ~b1 & · · · & ~bn \end{bmatrix}, then
AB = A \begin{bmatrix} ~b1 & · · · & ~bn \end{bmatrix} = \begin{bmatrix} A~b1 & · · · & A~bn \end{bmatrix} = \begin{bmatrix} ~e1 & · · · & ~en \end{bmatrix} = I
Thus, A is invertible by Theorem 4.
Observe that the proof of this theorem teaches us one way of finding the inverse of a matrix A. The columns of the inverse are the vectors ~bi such that A~bi = ~ei for 1 ≤ i ≤ n. Since each of these systems has the same coefficient matrix A, we can solve them by making one multiple augmented matrix and row reducing. In particular, assuming that A is invertible we have
[A | I] ∼ [I | A^{−1}]
REMARK
Compare this with the method for finding the change of coordinates matrix from
standard coordinates in Rn to any other basis B of Rn.
EXAMPLE 2 Find the inverse of A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}.
Solution: We row reduce the multiple augmented matrix [A | I]:
\left[\begin{array}{ccc|ccc} 1 & -2 & 3 & 1 & 0 & 0 \\ -1 & 4 & 1 & 0 & 1 & 0 \\ 1 & -1 & 4 & 0 & 0 & 1 \end{array}\right] ∼ \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -17/2 & -5/2 & 7 \\ 0 & 1 & 0 & -5/2 & -1/2 & 2 \\ 0 & 0 & 1 & 3/2 & 1/2 & -1 \end{array}\right]
Thus, A^{−1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}.
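The [A | I] procedure is easy to reproduce by machine. The following is a minimal sketch (ours, not from the notes) that row reduces the multiple augmented matrix of Example 2 exactly and then verifies the result.

```python
# A hedged sketch: reproducing Example 2 by row reducing [A | I] with sympy.
from sympy import Matrix, eye

A = Matrix([[1, -2, 3],
            [-1, 4, 1],
            [1, -1, 4]])

R, _ = A.row_join(eye(3)).rref()
A_inv = R[:, 3:]               # right half of [I | A^{-1}]
print(A_inv)                   # matches [-17/2 -5/2 7; -5/2 -1/2 2; 3/2 1/2 -1]
print(A * A_inv == eye(3))     # True
```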
EXAMPLE 3 Let A ∈ M2×2(R). Find a condition on the entries of A that guarantees that A is invertible and then, assuming that A is invertible, find A^{−1}.
Solution: Let A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. For A to be invertible, we require that rankA = 2. Thus, we cannot have a = 0 = c. So, we assume that a ≠ 0. Row reducing [A | I], first with R2 − (c/a)R1 and then with aR2, we get
\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right] ∼ \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & d - bc/a & -c/a & 1 \end{array}\right] ∼ \left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right]
Since we need rankA = 2, we now require that ad − bc ≠ 0. Assuming this, we continue row reducing to get
\left[\begin{array}{cc|cc} a & b & 1 & 0 \\ 0 & ad - bc & -c & a \end{array}\right] ∼ \left[\begin{array}{cc|cc} 1 & 0 & d/(ad-bc) & -b/(ad-bc) \\ 0 & 1 & -c/(ad-bc) & a/(ad-bc) \end{array}\right]
Thus, A is invertible if and only if ad − bc ≠ 0, and if A is invertible, then
A^{−1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
REMARK
This quantity ad− bc which determines whether a 2× 2 matrix is invertible or not is
called the determinant of the matrix. We will look more at this later in the chapter.
Invertible Matrix Theorem
The following theorem ties together many of the concepts we have looked at through-
out the course. Additionally, we will find it useful throughout the rest of the course.
THEOREM 7 (Invertible Matrix Theorem)
For an n× n matrix A, the following are equivalent:
(1) A is invertible,
(2) The RREF of A is I,
(3) rankA = n,
(4) The system of equations A~x = ~b is consistent with a unique solution for all
~b ∈ Rn,
(5) The nullspace of A is {~0},
(6) The columns of A form a basis for Rn,
(7) The rows of A form a basis for Rn,
(8) AT is invertible.
This theorem says that if any one of the eight items is true for an n × n matrix, then all of the other items are also true. We leave the proof of this theorem as an
important exercise. This is an excellent way of testing that you understand much of
the material we have done in the course up to this point.
The Invertible Matrix Theorem tells us that if A is invertible, then the system of
equations A~x = ~b is consistent with a unique solution. What is the solution? Observe
that since A is invertible, we have
A−1A~x = A−1~b
I~x = A−1~b
~x = A−1~b
Hence, if A is invertible, we can solve the system A~x = ~b by simply matrix multiplying
A−1~b.
EXAMPLE 4 Solve the system of linear equations
3x1 − 2x2 = 4
x1 + 5x2 = 7
Solution: We can represent the system as A~x = ~b where A = \begin{bmatrix} 3 & -2 \\ 1 & 5 \end{bmatrix} and ~b = \begin{bmatrix} 4 \\ 7 \end{bmatrix}. Then, from our work in Example 3, we see that A is invertible with
A^{−1} = \frac{1}{17} \begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix}
Hence, the solution of the system is
~x = A^{−1}~b = \frac{1}{17} \begin{bmatrix} 5 & 2 \\ -1 & 3 \end{bmatrix} \begin{bmatrix} 4 \\ 7 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}
EXAMPLE 5 Let A = \begin{bmatrix} 1 & -2 & 3 \\ -1 & 4 & 1 \\ 1 & -1 & 4 \end{bmatrix}. Solve the following systems of linear equations.
a) A~x = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
Solution: In Example 2 we found that A^{−1} = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix}. Hence, the solution of the system is
~x = A^{−1}~b = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -3/2 \\ -1/2 \\ 1/2 \end{bmatrix}
b) A~x = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
Solution: The solution is
~x = A^{−1}~b = \begin{bmatrix} -17/2 & -5/2 & 7 \\ -5/2 & -1/2 & 2 \\ 3/2 & 1/2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} = \begin{bmatrix} -13/2 \\ -3/2 \\ 3/2 \end{bmatrix}
Observe from the example above that once we have found A−1 we can solve any
system of linear equations with coefficient matrix A very quickly. Moreover, at first
glance, it might seem like we are solving the system of linear equations without row
reducing. However, this is an illusion. The row operations are in fact “stored” inside
of the inverse of the matrix (we used row operations to find the inverse). We now
look at how to discover those stored row operations, and the connection between row
operations and matrix multiplication.
5.2 Elementary Matrices
Consider the system of linear equations A~x = ~b where A is invertible. We compare
our two methods of solving this system.
First, we can solve this system using the methods of Chapter 2 by row reducing the
augmented matrix:
[A | ~b] ∼ [I | ~x]
Alternately, we can solve it by using the fact that A is invertible. We find A^{−1} using the method in the previous section:
[A | I] ∼ [I | A^{−1}]
We then solve the system by computing ~x = A−1~b.
Observe that in both methods we can use exactly the same row operations to row
reduce A to I. In the first method, we are applying those row operations directly to
~b to determine ~x. Thus, we see that in the second method the row operations used
are being “stored” inside of A−1 and the matrix vector product A−1~b is “performing”
those row operations on ~b so that we get the same answer as in the first method.
This not only shows us that A−1 is made of elementary row operations, but also that
there is a close connection between matrix multiplication and performing elementary
row operations.
DEFINITION
Elementary Matrix
An n× n matrix E is called an elementary matrix if it can be obtained from the
identity matrix by performing exactly one elementary row operation.
EXAMPLE 1 Determine which of the following matrices are elementary. For each elementary ma-
trix, indicate the associated elementary row operation.
a) \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Solution: This matrix is elementary as it can be obtained from the identity matrix by performing the row operation R1 + 2R3.
b) \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
Solution: This matrix is elementary as it can be obtained from the identity matrix by performing the row operation R1 ↔ R2.
c) \begin{bmatrix} 3 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}
Solution: This matrix is not elementary as it would require three elementary row operations to get this matrix from the identity matrix.
d) \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Solution: This matrix is elementary as it can be obtained from the identity matrix by performing the row operation (−1)R2.
It is clear that the RREF of every elementary matrix is I; we just need to perform the
inverse (opposite) row operation to turn the elementary matrix E back to I. Thus,
we not only have that every elementary matrix is invertible, but the inverse is the
elementary matrix associated with the opposite row operation.
EXAMPLE 2 Find the inverse of each of the following elementary matrices. Check your answer by
multiplying the matrices together.
a) E1 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}
Solution: The inverse matrix E1^{−1} is the elementary matrix associated with the row operation required to bring E1 back to I, that is, R1 − 2R2. Therefore,
E1^{−1} = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix}
Checking, we get
E1^{−1}E1 = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
b) E2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
Solution: The inverse matrix E2^{−1} is the elementary matrix associated with the row operation required to bring E2 back to I, that is, R1 ↔ R3. Therefore,
E2^{−1} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
Checking, we get
E2^{−1}E2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
c) E3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix}
Solution: The inverse matrix E3^{−1} is the elementary matrix associated with the row operation required to bring E3 back to I, that is, (−1/3)R3. Therefore,
E3^{−1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix}
Checking, we get
E3^{−1}E3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1/3 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
If we look at our calculations in the example above we see something interesting.
Consider an elementary matrix E. We saw that the product EI is the matrix obtained
from I by performing the row operations associated with E on I, and that E−1E is
the matrix obtained from E by performing the row operation associated with E−1 on
E. We demonstrate this further with another example.
EXAMPLE 3 Let E1 = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, E2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}, and A = \begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix}. Calculate E1A and E2E1A. Describe the products in terms of matrices obtained from A by elementary row operations.
Solution: By matrix multiplication, we get
E1A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 & 3 \\ 0 & -1 & 2 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix}
Observe that E1A is the matrix obtained from A by performing the row operation R2 + 2R1, that is, by performing the row operation associated with E1.
E2E1A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 5 & 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 3 \\ 2 & 1 & 8 \\ 0 & 15 & 18 \end{bmatrix}
Thus, E2E1A is the matrix obtained from A by first performing the row operation R2 + 2R1, and then performing the row operation associated with E2, namely 3R3.
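This correspondence between left multiplication and row operations is easy to verify numerically. The following is a minimal sketch (ours, not from the notes) that checks the products in Example 3.

```python
# A hedged sketch: verifying Example 3, i.e. that multiplying by E1 on the
# left performs R2 + 2R1 on A, and multiplying by E2 then performs 3R3.
import numpy as np

E1 = np.array([[1, 0, 0], [2, 1, 0], [0, 0, 1]])
E2 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 3]])
A = np.array([[1, 1, 3], [0, -1, 2], [0, 5, 6]])

print(E1 @ A)         # second row becomes [2, 1, 8]
print(E2 @ (E1 @ A))  # third row is then tripled to [0, 15, 18]
```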
The following theorems prove that we have this useful fact for all elementary matrices.
THEOREM 1 Let A be an m×n matrix and let E be the m×m elementary matrix corresponding
to the row operation Ri + cRj, for i 6= j. Then EA is the matrix obtained from A by
performing the row operation Ri + cRj on A.
Proof: Observe that the i-th row of E satisfies e_{ii} = 1, e_{ij} = c, and e_{ik} = 0 for all k ≠ i, j. So, the entries in the i-th row of EA are
(EA)_{ik} = \sum_{\ell=1}^{m} e_{i\ell}a_{\ell k} = e_{ij}a_{jk} + e_{ii}a_{ik} = ca_{jk} + a_{ik}
for 1 ≤ k ≤ n. Therefore, the i-th row of EA is equal to Ri + cRj. The remaining rows of E are rows from the identity matrix, and hence the remaining rows of EA are just the corresponding rows of A.
THEOREM 2 Let A be an m×n matrix and let E be the m×m elementary matrix corresponding
to the row operation cRi. Then EA is the matrix obtained from A by performing the
row operation cRi on A.
Proof: The proof is left as an exercise.
THEOREM 3 Let A be an m×n matrix and let E be the m×m elementary matrix corresponding
to the row operation Ri ↔ Rj , for i 6= j. Then EA is the matrix obtained from A by
performing the row operation Ri ↔ Rj on A.
Proof: The proof is left as an exercise.
Since elementary row operations do not change the rank of a matrix, we immediately
get the following result.
COROLLARY 4 Let A be an m× n matrix and let E be an m×m elementary matrix. Then,
rank(EA) = rankA
We began this section by asking about how the inverse of a matrix stores elementary
row operations. We can now use our work with elementary matrices to decompose a
matrix into its elementary components.
THEOREM 5 Let A be an m × n matrix. Then there exists a sequence E1, . . . , Ek of m × m
elementary matrices such that
EkEk−1 · · ·E2E1A = R
where R is the reduced row echelon form of A.
Proof: Using the methods of Chapter 2, we know that we can row reduce a matrix A to its RREF with a sequence of elementary row operations. Let E1 denote the elementary matrix corresponding to the first row operation performed, E2 the one corresponding to the second row operation, etc. The result now follows from Theorems 1, 2, and 3, since E1A is the matrix obtained from A by performing the first row operation on A, E2E1A is the matrix obtained from A by performing the second row operation on E1A, etc.
EXAMPLE 4 Let A = \begin{bmatrix} 1 & 3 & 0 & 1 \\ -1 & -3 & 3 & 2 \\ 2 & 7 & 0 & 2 \end{bmatrix}. Write the reduced row echelon form R of A as a product of elementary matrices and A.
Solution: We first row reduce A to R, keeping track of the row operations used. Performing R2 + R1 and R3 − 2R1 gives
\begin{bmatrix} 1 & 3 & 0 & 1 \\ -1 & -3 & 3 & 2 \\ 2 & 7 & 0 & 2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 0 & 3 & 3 \\ 0 & 1 & 0 & 0 \end{bmatrix}
Then performing R2 ↔ R3, followed by (1/3)R3, and finally R1 − 3R2, gives
\begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 3 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}
Next, we write the elementary matrix for each row operation we used. We have
E1 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, E2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}, E3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}, E4 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1/3 \end{bmatrix}, E5 = \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
Hence, we have
E5E4E3E2E1A = R
COROLLARY 6 If A is an n × n invertible matrix, then A and A^{−1} can be written as products of elementary matrices.
Proof: Since A is invertible, the RREF of A is I by the Invertible Matrix Theorem. Hence, by Theorem 5, there exist elementary matrices E1, . . . , Ek such that
Ek · · · E1A = I
Thus, by Theorem 5.1.4 we have that
A^{−1} = Ek · · · E1
So, A^{−1} can be written as a product of elementary matrices. Moreover, by Theorem 5.1.5 (3), we get that
A = (Ek · · · E1)^{−1} = E1^{−1}E2^{−1} · · · Ek^{−1}
as required.
EXAMPLE 5 Let A = \begin{bmatrix} 1 & 3 \\ -3 & -11 \end{bmatrix}. Write A and A^{−1} as products of elementary matrices.
Solution: Row reducing A to RREF, using R2 + 3R1, then (−1/2)R2, then R1 − 3R2, gives
\begin{bmatrix} 1 & 3 \\ -3 & -11 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 \\ 0 & -2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
Hence,
E1 = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}, E2 = \begin{bmatrix} 1 & 0 \\ 0 & -1/2 \end{bmatrix}, E3 = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}
and E3E2E1A = I, so
A^{−1} = E3E2E1
Moreover, we have
A = E1^{−1}E2^{−1}E3^{−1} = \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix}
We end this section with one additional result which will be used in a couple of proofs
later in the course.
THEOREM 7 Let E be an m×m elementary matrix. Then ET is an elementary matrix.
Proof: If E is obtained from I by multiplying a row by a non-zero constant c, then
ET = E.
If E is obtained from I by swapping row i and row j, then e_{ij} = 1 = e_{ji}, so E^T = E.
Finally, if E is obtained from I by adding c times row i to row j, then ET is obtained
from I by adding c times row j to row i.
Hence, in all cases, ET is elementary.
5.3 Determinants
We have seen that a system of linear equations with coefficient matrix A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} has a unique solution if and only if ad − bc ≠ 0. Therefore, by the Invertible Matrix Theorem, A is invertible if and only if ad − bc ≠ 0. Since this quantity determines whether the matrix is invertible or not, we make the following definition.
DEFINITION
2 × 2 Determinant
Let A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. The determinant of A is
detA = ad − bc = \begin{vmatrix} a & b \\ c & d \end{vmatrix}
EXAMPLE 1 Calculate the following determinants.
a) det \begin{bmatrix} 3 & 5 \\ -1 & 2 \end{bmatrix}
Solution: det \begin{bmatrix} 3 & 5 \\ -1 & 2 \end{bmatrix} = 3(2) − 5(−1) = 11
b) \begin{vmatrix} 3 & 6 \\ 2 & 4 \end{vmatrix}
Solution: \begin{vmatrix} 3 & 6 \\ 2 & 4 \end{vmatrix} = 3(4) − 2(6) = 0
We, of course, want to extend the definition of the determinant to n × n matrices. So, we next consider the 3 × 3 case. Let A = \begin{bmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{bmatrix}. Then it can be shown (with some effort) that A is invertible if and only if
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31 ≠ 0.
Thus, we define this to be the determinant of the 3 × 3 matrix A. However, we would like to write the formula in a nicer form. Moreover, we would like to find some sort of pattern relating this to the 2 × 2 determinant so that we can figure out how to define the determinant for general n × n matrices. We observe that
a11a22a33 − a11a23a32 − a12a21a33 + a13a21a32 + a12a23a31 − a13a22a31
= a11(a22a33 − a23a32) − a21(a12a33 − a13a32) + a31(a12a23 − a13a22)
= a11 \begin{vmatrix} a22 & a23 \\ a32 & a33 \end{vmatrix} − a21 \begin{vmatrix} a12 & a13 \\ a32 & a33 \end{vmatrix} + a31 \begin{vmatrix} a12 & a13 \\ a22 & a23 \end{vmatrix}   (5.1)
Therefore, the 3 × 3 determinant can be written as a linear combination of 2 × 2 determinants of submatrices of A, where the coefficients are (plus or minus) the entries from the first column of A.
To extend all of this to n× n matrices we will recursively use the same pattern. We
first make some definitions.
DEFINITION
Cofactor
Let A be an n × n matrix with n ≥ 2. Let A(i, j) denote the (n−1) × (n−1) matrix obtained from A by deleting the i-th row and the j-th column. The cofactor of aij is
Cij = (−1)^{i+j} detA(i, j)
EXAMPLE 2 Find all nine of the cofactors of A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & -1 & 3 \\ -2 & -3 & 4 \end{bmatrix}.
Solution: We have
C11 = (−1)^{1+1} \begin{vmatrix} -1 & 3 \\ -3 & 4 \end{vmatrix} = 5,   C12 = (−1)^{1+2} \begin{vmatrix} 0 & 3 \\ -2 & 4 \end{vmatrix} = −6,   C13 = (−1)^{1+3} \begin{vmatrix} 0 & -1 \\ -2 & -3 \end{vmatrix} = −2,
C21 = (−1)^{2+1} \begin{vmatrix} 0 & 2 \\ -3 & 4 \end{vmatrix} = −6,   C22 = (−1)^{2+2} \begin{vmatrix} 1 & 2 \\ -2 & 4 \end{vmatrix} = 8,   C23 = (−1)^{2+3} \begin{vmatrix} 1 & 0 \\ -2 & -3 \end{vmatrix} = 3,
C31 = (−1)^{3+1} \begin{vmatrix} 0 & 2 \\ -1 & 3 \end{vmatrix} = 2,   C32 = (−1)^{3+2} \begin{vmatrix} 1 & 2 \\ 0 & 3 \end{vmatrix} = −3,   C33 = (−1)^{3+3} \begin{vmatrix} 1 & 0 \\ 0 & -1 \end{vmatrix} = −1
Using this and equation (5.1) we get
detA = a11C11 + a21C21 + a31C31 = \sum_{i=1}^{3} ai1Ci1.
Then, to define the determinant of an n × n matrix, we repeat this recursive pattern.
DEFINITION
n × n Determinant
Let A be an n × n matrix with n ≥ 2. Then, the determinant of A is defined to be
detA = \sum_{i=1}^{n} ai1Ci1
where the determinant of a 1 × 1 matrix is defined to be det[c] = c.
REMARK
As we did with 2 × 2 matrices, we often represent the determinant of a matrix by | |. For example, we write
det \begin{bmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{bmatrix} = \begin{vmatrix} a11 & a12 & a13 \\ a21 & a22 & a23 \\ a31 & a32 & a33 \end{vmatrix}
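The recursive definition above translates directly into code. The following is a minimal sketch (ours, not from the notes) that expands along the first column exactly as in the definition; it is far too slow for large matrices, which is the point made later in this section.

```python
# A hedged sketch: the recursive determinant, expanding along column 1.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        # A(i, 1): delete row i and the first column (0-indexed here,
        # so the sign (-1)**i matches (-1)^{i+1} in 1-based notation).
        minor = [row[1:] for k, row in enumerate(A) if k != i]
        total += A[i][0] * (-1) ** i * det(minor)
    return total

print(det([[1, 0, 2], [0, 1, 1], [-2, 2, 3]]))   # 5, matching Example 3 below
```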
EXAMPLE 3 Find the determinant of A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \\ -2 & 2 & 3 \end{bmatrix}.
Solution: By definition, we have
detA = a11C11 + a21C21 + a31C31
     = 1(−1)^{1+1} \begin{vmatrix} 1 & 1 \\ 2 & 3 \end{vmatrix} + 0(−1)^{2+1} \begin{vmatrix} 0 & 2 \\ 2 & 3 \end{vmatrix} + (−2)(−1)^{3+1} \begin{vmatrix} 0 & 2 \\ 1 & 1 \end{vmatrix}
     = 1(1)(1(3) − 1(2)) + 0(−1)(0(3) − 2(2)) + (−2)(1)(0(1) − 2(1))
     = 1 + 0 + 4 = 5
In the example above we did not actually need to evaluate the cofactor C21 since the coefficient a21 = 0. Normally, such steps are omitted.
EXAMPLE 4 Find the determinant of B = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ -2 & 0 & 2 & 3 \end{bmatrix}.
Solution: By definition, we have
detB = b11C11 + b21C21 + b31C31 + b41C41
     = (−1)^{2+1} \begin{vmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 2 & 3 \end{vmatrix} + (−2)(−1)^{4+1} \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{vmatrix}
     = (−1) \begin{vmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 2 & 3 \end{vmatrix} + 2 \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{vmatrix}
We now need to evaluate these 3 × 3 determinants. By definition, we get
detB = (−1)\left( (−1)^{1+1} \begin{vmatrix} 0 & 1 \\ 2 & 3 \end{vmatrix} + (−1)^{2+1} \begin{vmatrix} 0 & 0 \\ 2 & 3 \end{vmatrix} \right) + 2\left( (−1)^{1+1} \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + (−1)^{3+1} \begin{vmatrix} 0 & 0 \\ 1 & 0 \end{vmatrix} \right)
     = (−1)\left( (0(3) − 1(2)) + (−1)(0(3) − 0(2)) \right) + 2\left( (1(1) − 0(0)) + (0(0) − 0(1)) \right)
     = 2 + 2 = 4
This example clearly demonstrates the recursive nature of the definition of the de-
terminant. In particular, notice that we were quite lucky to have lots of zeros in the
matrix as otherwise we would have had many more cofactors to evaluate. In general,
if we use this formula to find the determinant of an n×n matrix, we need to calculate
the value of n cofactors, each of which is the determinant of an (n−1)×(n−1) matrix.
To calculate each cofactor we need to calculate the value of n − 1 determinants of
(n−2) × (n−2) matrices, and so on. For even a small value of n, say n = 1000, you can imagine that this is going to take a lot of work. In some modern applications, it
is necessary to calculate determinants of much, much larger matrices. Therefore, we
want to search for faster ways to evaluate a determinant.
The Cofactor Expansion
Observe that our way of factoring the 3 × 3 determinant in equation (5.1) was not
the only way it could be factored. We also could get
detA = a12C12 + a22C22 + a32C32
detA = a13C13 + a23C23 + a33C33
detA = a11C11 + a12C12 + a13C13
detA = a21C21 + a22C22 + a23C23
detA = a31C31 + a32C32 + a33C33
Hence, we did not have to use the coefficients and cofactors from the first column of
the matrix. We can in fact use the coefficients and cofactors from any row or column
of the matrix. For n× n matrices we have the following theorem.
THEOREM 1 Let A be an n × n matrix. Then
detA = \sum_{k=1}^{n} aikCik
called the cofactor expansion across the i-th row, OR
detA = \sum_{k=1}^{n} akjCkj
called the cofactor expansion across the j-th column.
REMARK
The proof of this theorem is fairly lengthy and so is omitted. However, the proof is
actually quite interesting, so you may wish to search for it, or, if you are interested
in a challenge, try proving it yourself.
This theorem allows us to expand the determinant along the row or column with the
most zeros.
EXAMPLE 5 Find the determinant of each of the following matrices.
a) A = \begin{bmatrix} 1 & 2 & 0 \\ -3 & 4 & 6 \\ -2 & 5 & 0 \end{bmatrix}.
Solution: Using the cofactor expansion along the third column gives
detA = a13C13 + a23C23 + a33C33 = 6(−1)^{2+3} \begin{vmatrix} 1 & 2 \\ -2 & 5 \end{vmatrix} = −6(1(5) − 2(−2)) = −54
b) B = \begin{bmatrix} 1 & 4 & 2 & 5 \\ 0 & 1 & 0 & 0 \\ -1 & 3 & 0 & 1 \\ 0 & -2 & 3 & 0 \end{bmatrix}
Solution: Using the cofactor expansion along the second row gives
detB = b21C21 + b22C22 + b23C23 + b24C24 = 1(−1)^{2+2} \begin{vmatrix} 1 & 2 & 5 \\ -1 & 0 & 1 \\ 0 & 3 & 0 \end{vmatrix}
We now expand the 3 × 3 determinant along the third row to get
detB = 1(3)(−1)^{3+2} \begin{vmatrix} 1 & 5 \\ -1 & 1 \end{vmatrix} = −3(1(1) − 5(−1)) = −18
c) C = \begin{bmatrix} 1 & 51 & -73 & 29 \\ 0 & 2 & -16 & 15 \\ 0 & 0 & 3 & -99 \\ 0 & 0 & 0 & 4 \end{bmatrix}.
Solution: Using the cofactor expansion along the first column in each step gives
detC = 1(−1)^{1+1} \begin{vmatrix} 2 & -16 & 15 \\ 0 & 3 & -99 \\ 0 & 0 & 4 \end{vmatrix} = 1(2)(−1)^{1+1} \begin{vmatrix} 3 & -99 \\ 0 & 4 \end{vmatrix} = 1(2)(3(4) − (−99)(0)) = 1(2)(3)(4) = 24
d) D = \begin{bmatrix} 3 & -3 & 4 & 0 & 5 \\ -6 & 8 & 9 & 0 & 3 \\ -3 & 5 & 2 & 0 & 1 \\ -5 & -5 & 3 & 0 & 1 \\ 3 & 7 & 3 & 0 & 1 \end{bmatrix}.
Solution: Using the cofactor expansion along the fourth column gives detD = 0.
Part c) in the last example motivates the following definitions, which lead to an important result.
DEFINITION
Upper Triangular
Lower Triangular
An m × n matrix U is said to be upper triangular if uij = 0 whenever i > j. An m × n matrix L is said to be lower triangular if lij = 0 whenever i < j.
EXAMPLE 6 The matrices \begin{bmatrix} 1 & 2 & 3 & 1 \\ 0 & 0 & 2 & 5 \\ 0 & 0 & -1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, and [−4] are all upper triangular.
The matrices \begin{bmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 0 & -2 & 0 \end{bmatrix}, \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}, and \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} are all lower triangular.
THEOREM 2 If an n × n matrix A is upper triangular or lower triangular, then
detA = a11a22 · · · ann
Proof: We prove this by induction on n for upper triangular matrices. The proof for lower triangular matrices is similar.
If A is a 2 × 2 upper triangular matrix, say A = \begin{bmatrix} a11 & a12 \\ 0 & a22 \end{bmatrix}, then detA = a11a22 as required. Assume that if B is an (n−1) × (n−1) upper triangular matrix, then detB = b11 · · · b(n−1)(n−1). Let A be an n × n upper triangular matrix. Expanding the determinant along the first column gives
detA = a11C11 + 0 + · · · + 0 = a11 detA(1, 1)
But, A(1, 1) is the (n−1) × (n−1) upper triangular matrix formed by deleting the first row and first column of A. Thus, detA(1, 1) = a22a33 · · · ann by the inductive hypothesis. Thus, detA = a11a22 · · · ann as required.
Determinants and Row Operations
We have seen that it is very difficult to calculate the determinant of a large matrix
with very few zeros, but very easy to calculate the determinant of an upper or lower
triangular matrix. Thus, it makes sense to look at how applying elementary row
operations to a matrix changes the determinant.
THEOREM 3 Let A be an n × n matrix and let B be the matrix obtained from A by multiplying one row of A by c ∈ R. Then detB = c detA.
Proof: We will prove the theorem by induction on n. For n = 2, the result is easily verified. Assume the result holds for all (n−1) × (n−1) matrices, with n ≥ 3. Let B be the matrix obtained from A by multiplying the ℓ-th row of A by c, and let B(i, j) denote the (n−1) × (n−1) submatrix of B obtained by deleting the i-th row and j-th column of B. Then
bij = aij if i ≠ ℓ, and bij = caij if i = ℓ,
and B(i, j) is the matrix obtained from A(i, j) by multiplying some row of A(i, j) by c. Hence, by the inductive hypothesis, detB(i, j) = c detA(i, j). Therefore, using the cofactor expansion along the i-th row, with i ≠ ℓ, gives
detB = \sum_{k=1}^{n} bik(−1)^{i+k} detB(i, k) = \sum_{k=1}^{n} aik(−1)^{i+k}[c detA(i, k)] = c \sum_{k=1}^{n} aik(−1)^{i+k} detA(i, k) = c detA
THEOREM 4 Let A be an n×n matrix and let B be the matrix obtained from A by swapping two
rows of A. Then detB = − detA.
Proof: The proof is left as an exercise.
COROLLARY 5 If an n× n matrix A has two identical rows, then detA = 0.
Proof: If the i-th and j-th rows of A are identical, then let B be the matrix obtained from A by adding (-1) times the i-th row of A to the j-th row. Then B has a row of zeroes, so performing a cofactor expansion along that row gives det B = 0. Since adding a multiple of one row to another does not change the determinant (Theorem 6 below), we get
det A = det B = 0
THEOREM 6 Let A be an n × n matrix and let B be the matrix obtained from A by adding a
multiple of one row of A to another row. Then detB = detA.
Proof: The proof is left as an exercise.
We now see that we can find a determinant quite efficiently by row reducing the
matrix to upper triangular form, keeping track of how the row operations change
the determinant. Since adding a multiple of one row to another does not change the
determinant, it reduces the chance of errors if this is the only row operation used.
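To make this procedure concrete, here is a minimal Python sketch (our own illustration, not from the original notes). It reduces a matrix to upper triangular form using only the row operation of adding a multiple of one row to another, plus row swaps when a pivot is needed, and then multiplies the diagonal; the helper name det_by_reduction is our own.

```python
import numpy as np

def det_by_reduction(A):
    """Determinant via reduction to upper triangular form.

    Adding a multiple of one row to another leaves the determinant
    unchanged; each row swap multiplies it by -1.
    """
    A = A.astype(float)
    n = A.shape[0]
    sign = 1.0
    for j in range(n):
        # Find a nonzero pivot in column j at or below row j.
        p = next((i for i in range(j, n) if A[i, j] != 0), None)
        if p is None:
            return 0.0                   # zero column: determinant is 0
        if p != j:
            A[[j, p]] = A[[p, j]]        # row swap flips the sign
            sign = -sign
        for i in range(j + 1, n):
            A[i] -= (A[i, j] / A[j, j]) * A[j]   # R_i - c R_j
    return sign * np.prod(np.diag(A))

A = np.array([[1, 2, 3, 1], [-1, -1, -1, 2], [1, 3, 1, 1], [-2, -2, 0, -1]])
print(det_by_reduction(A))   # 26.0, matching Example 7 a) below
```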
EXAMPLE 7 Evaluate the following determinants.
a) \begin{vmatrix} 1 & 2 & 3 & 1 \\ -1 & -1 & -1 & 2 \\ 1 & 3 & 1 & 1 \\ -2 & -2 & 0 & -1 \end{vmatrix}.
Solution: Since adding a multiple of one row to another does not change the determinant, we get
\begin{vmatrix} 1 & 2 & 3 & 1 \\ -1 & -1 & -1 & 2 \\ 1 & 3 & 1 & 1 \\ -2 & -2 & 0 & -1 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 1 & -2 & 0 \\ 0 & 2 & 6 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & -4 & -3 \\ 0 & 0 & 2 & -5 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & 3 \\ 0 & 0 & -4 & -3 \\ 0 & 0 & 0 & -13/2 \end{vmatrix} = 1(1)(-4)(-13/2) = 26
b) \begin{vmatrix} 3 & 2 & -1 & 1 \\ 2 & -1 & 0 & -3 \\ 5 & 2 & 3 & -2 \\ 1 & 3 & -1 & 4 \end{vmatrix}
Solution: Using the row operation R_4 + R_2 and then R_4 - R_1, we get
\begin{vmatrix} 3 & 2 & -1 & 1 \\ 2 & -1 & 0 & -3 \\ 5 & 2 & 3 & -2 \\ 1 & 3 & -1 & 4 \end{vmatrix} = \begin{vmatrix} 3 & 2 & -1 & 1 \\ 2 & -1 & 0 & -3 \\ 5 & 2 & 3 & -2 \\ 3 & 2 & -1 & 1 \end{vmatrix} = \begin{vmatrix} 3 & 2 & -1 & 1 \\ 2 & -1 & 0 & -3 \\ 5 & 2 & 3 & -2 \\ 0 & 0 & 0 & 0 \end{vmatrix} = 0
Determinants and Elementary Matrices
Since multiplying a matrix A on the left by an elementary matrix E performs the same
elementary row operation on A as the corresponding row operation for E, Theorems
3, 4, and 6 give the following result.
COROLLARY 7 Let A be an n× n matrix and let E be an n× n elementary matrix. Then detEA =
detE detA.
Proof: Observe from Theorems 3, 4, and 6 that performing an elementary row operation multiplies the determinant of the original matrix by a non-zero real number c (in the case of swapping rows or adding a multiple of one row to another, we have c = -1 or c = 1, respectively). Since E is obtained from I by performing a single row operation, we get that det E = c, where c depends on the row operation. Similarly, since EA is the matrix obtained from A by performing the same row operation, we have that det EA = c det A = det E det A.
This corollary allows us to prove many other important results.
THEOREM 8 (addition to the Invertible Matrix Theorem)
An n × n matrix A is invertible if and only if det A ≠ 0.
Proof: By Corollary 6 there exists a sequence of elementary matrices E_1, . . . , E_k such that E_k · · · E_1 A = R, where R is the reduced row echelon form of A. Then
det R = det(E_k · · · E_1 A) = det(E_k) · · · det(E_1) det A
Since the determinant of an elementary matrix is non-zero, we get that det A ≠ 0 if and only if det R ≠ 0. But det R ≠ 0 if and only if R = I. Hence, det A ≠ 0 if and only if A is invertible.
THEOREM 9 If A and B are n× n matrices then det(AB) = detA detB.
Proof: If det A = 0, then A is not invertible by the Invertible Matrix Theorem. We claim that AB is then not invertible either: if AB had an inverse C, then A(BC) = (AB)C = I, so A would have a right inverse and hence be invertible, a contradiction. Therefore AB is not invertible, which gives
0 = det(AB) = det A det B
If detA 6= 0, then A is invertible, so there exists a sequence of elementary matrices
such that A = E1 · · ·Ek. Then
det(AB) = det(E1 · · ·EkB)
= det(E1) · · ·det(Ek) detB
= det(E1 · · ·Ek) detB
= detA detB
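A quick numerical illustration (our own addition, not from the notes): numpy agrees with the multiplicativity of the determinant on randomly chosen matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(4, 4)).astype(float)
B = rng.integers(-5, 5, size=(4, 4)).astype(float)

# det(AB) and det(A)det(B) agree up to floating-point rounding.
print(np.linalg.det(A @ B))
print(np.linalg.det(A) * np.linalg.det(B))
```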
COROLLARY 10 If A is an invertible matrix, then det A^{-1} = \frac{1}{det A}.
Proof: The proof is left as an exercise.
THEOREM 11 Let A be an n × n matrix. Then det A = det A^T.
Proof: The proof is left as an exercise.
Since detA = detAT we can use column operations instead of row operations when
evaluating a determinant. In particular, adding a multiple of one column to another
does not change the determinant, swapping two columns multiplies the determinant
by (-1), and multiplying a column by a scalar c multiplies the determinant by c.
Furthermore, combining row operations, column operations, and cofactor expansions
can make evaluating determinants quite efficient.
EXAMPLE 8 Evaluate the determinant of D = \begin{bmatrix} 1 & 3 & -1 & 1 \\ -3 & 2 & 1 & 2 \\ 2 & -1 & 1 & 1 \\ 2 & -3 & 2 & -3 \end{bmatrix}.
Solution: Using the row operation R_3 + R_1 gives
det D = \begin{vmatrix} 1 & 3 & -1 & 1 \\ -3 & 2 & 1 & 2 \\ 3 & 2 & 0 & 2 \\ 2 & -3 & 2 & -3 \end{vmatrix}
We now use the column operation C_2 - C_4 to get
det D = \begin{vmatrix} 1 & 2 & -1 & 1 \\ -3 & 0 & 1 & 2 \\ 3 & 0 & 0 & 2 \\ 2 & 0 & 2 & -3 \end{vmatrix}
Using the cofactor expansion along the second column gives
det D = 2(-1)^{1+2} \begin{vmatrix} -3 & 1 & 2 \\ 3 & 0 & 2 \\ 2 & 2 & -3 \end{vmatrix}
Now R_3 - 2R_1 gives
det D = -2 \begin{vmatrix} -3 & 1 & 2 \\ 3 & 0 & 2 \\ 8 & 0 & -7 \end{vmatrix}
Hence, by using the cofactor expansion along the second column we get
det D = (-2)(1)(-1)^{1+2} \begin{vmatrix} 3 & 2 \\ 8 & -7 \end{vmatrix} = -74
5.4 Determinants and Systems of Equations
The Invertible Matrix Theorem tells us that an n × n matrix A is invertible if and only if the system of equations A~x = ~b is consistent with a unique solution for all ~b, and that these are both equivalent to det A ≠ 0. Thus, it is clear that all three of these concepts are closely related. We now look a little further at this relationship.
Inverse by Cofactors
Observe that we can write the inverse of a matrix A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} in the form
A^{-1} = \frac{1}{det A} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} = \frac{1}{det A} \begin{bmatrix} C_{11} & C_{21} \\ C_{12} & C_{22} \end{bmatrix}
That is, the entries of A^{-1} are the cofactors of A. In particular,
(A^{-1})_{ij} = \frac{1}{det A} C_{ji}
It is important to take careful notice of the change of order in the subscripts in the line above. At first, this result may seem surprising, but consider the product
AA^{-1} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \frac{1}{det A} \begin{bmatrix} C_{11} & C_{21} \\ C_{12} & C_{22} \end{bmatrix}
        = \frac{1}{det A} \begin{bmatrix} a_{11}C_{11} + a_{12}C_{12} & a_{11}C_{21} + a_{12}C_{22} \\ a_{21}C_{11} + a_{22}C_{12} & a_{21}C_{21} + a_{22}C_{22} \end{bmatrix}
        = \frac{1}{det A} \begin{bmatrix} det A & a_{11}(-a_{12}) + a_{12}a_{11} \\ a_{21}a_{22} + a_{22}(-a_{21}) & det A \end{bmatrix}
        = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
We now prove that this formula works in the n × n case. We begin by proving a
lemma.
LEMMA 1 Let A be an n × n matrix with cofactors C_{ij}. Then, for i ≠ j,
\sum_{k=1}^{n} (A)_{ik} C_{jk} = 0
Proof: Let B be the matrix obtained from A by replacing the j-th row of A by the i-th row of A. That is, b_{ℓk} = a_{ℓk} for all ℓ ≠ j, and b_{jk} = a_{ik}. Then B has two identical rows (the i-th and the j-th), so det B = 0. Moreover, since the cofactors of the j-th row of B are the same as the cofactors of the j-th row of A, we get
0 = det B = \sum_{k=1}^{n} b_{jk} C_{jk} = \sum_{k=1}^{n} a_{ik} C_{jk}
In words, this lemma says that if we try to do a cofactor expansion of the determinant, but use the coefficients from one row and the cofactors from another row, then we will always get 0.
THEOREM 2 Let A be an invertible n × n matrix. Then (A^{-1})_{ij} = \frac{1}{det A} C_{ji}.
Proof: Let B be the n × n matrix defined by (B)_{ij} = \frac{1}{det A} C_{ji}. Then, for 1 ≤ i ≤ n we have that
(AB)_{ii} = \sum_{k=1}^{n} (A)_{ik}(B)_{ki} = \frac{1}{det A} \sum_{k=1}^{n} (A)_{ik} C_{ik} = 1
For (AB)_{ij} with i ≠ j, we have
(AB)_{ij} = \sum_{k=1}^{n} (A)_{ik}(B)_{kj} = \frac{1}{det A} \sum_{k=1}^{n} (A)_{ik} C_{jk} = 0
by Lemma 1. Therefore, AB = I, so B = A^{-1}.
DEFINITION
Cofactor Matrix
Let A be an n × n matrix. Then the cofactor matrix, cof A, of A is the matrix whose ij-th entry is the ij-th cofactor of A. That is,
(cof A)_{ij} = C_{ij}
DEFINITION
Adjugate
Let A be an n × n matrix. The adjugate of A is the matrix defined by
(adj A)_{ij} = C_{ji}
In particular, adj A = (cof A)^T.
REMARK
By Theorem 2 we have that A^{-1} = \frac{1}{det A} adj A.
EXAMPLE 1 Determine the adjugate of A = \begin{bmatrix} 1 & 0 & 2 \\ -2 & 1 & 0 \\ 3 & -1 & 1 \end{bmatrix} and verify that A(adj A) = (det A)I.
Solution: We find that the cofactor matrix of A is
C = \begin{bmatrix} 1 & 2 & -1 \\ -2 & -5 & 1 \\ -2 & -4 & 1 \end{bmatrix}
So
adj A = C^T = \begin{bmatrix} 1 & -2 & -2 \\ 2 & -5 & -4 \\ -1 & 1 & 1 \end{bmatrix}
Multiplying we find
A(adj A) = \begin{bmatrix} 1 & 0 & 2 \\ -2 & 1 & 0 \\ 3 & -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & -2 & -2 \\ 2 & -5 & -4 \\ -1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}
Observe that
(A(adj A))_{ii} = a_{i1}C_{i1} + a_{i2}C_{i2} + a_{i3}C_{i3} = det A
Thus, we see that det A = -1 and A(adj A) = (det A)I. Moreover, we have that
A^{-1} = \frac{1}{det A} adj A = \begin{bmatrix} -1 & 2 & 2 \\ -2 & 5 & 4 \\ 1 & -1 & -1 \end{bmatrix}
Notice that this is not a very efficient way of calculating the inverse. However, it can be useful in some theoretical applications, as it gives an explicit formula for the entries of the inverse.
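To illustrate the formula computationally, here is a minimal numpy sketch (our own addition, not part of the notes); the helper names cofactor_matrix and adjugate are hypothetical, and the code simply recomputes Example 1.

```python
import numpy as np

def cofactor_matrix(A):
    """Build cof A entrywise: C_ij = (-1)^(i+j) det A(i, j)."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # A(i, j): delete row i and column j.
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

def adjugate(A):
    return cofactor_matrix(A).T   # adj A = (cof A)^T

A = np.array([[1.0, 0.0, 2.0], [-2.0, 1.0, 0.0], [3.0, -1.0, 1.0]])
print(np.round(A @ adjugate(A), 10))   # (det A) I = -I, as in Example 1
```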
Cramer's Rule
Consider the system of linear equations A~x = ~b where A is an n × n matrix. If A is invertible, then we know this has the unique solution
~x = A^{-1}~b = \frac{1}{det A} (adj A)~b = \frac{1}{det A} \begin{bmatrix} C_{11} & \cdots & C_{n1} \\ \vdots & & \vdots \\ C_{1n} & \cdots & C_{nn} \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}.
Then the i-th component of ~x is
x_i = \frac{b_1 C_{1i} + \cdots + b_n C_{ni}}{det A}.
Observe that the quantity b_1 C_{1i} + \cdots + b_n C_{ni} is the determinant of the matrix where we have replaced the i-th column of A by ~b. If we let A_i denote this matrix, then
det A_i = b_1 C_{1i} + \cdots + b_n C_{ni} = \begin{vmatrix} a_{11} & \cdots & a_{1(i-1)} & b_1 & a_{1(i+1)} & \cdots & a_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n(i-1)} & b_n & a_{n(i+1)} & \cdots & a_{nn} \end{vmatrix}
and then x_i = \frac{det A_i}{det A}.
THEOREM 3 (Cramer's Rule)
If A is an invertible matrix, then the solution ~x of A~x = ~b is given by
x_i = \frac{det A_i}{det A}
where A_i is the matrix obtained from A by replacing the i-th column of A by ~b.
EXAMPLE 2 Solve the system of linear equations using Cramer's Rule.
x_1 + 2x_2 + x_3 = 2
-4x_1 - x_2 + 3x_3 = 6
-2x_1 + x_2 + 2x_3 = 3
Solution: We have A = \begin{bmatrix} 1 & 2 & 1 \\ -4 & -1 & 3 \\ -2 & 1 & 2 \end{bmatrix} and ~b = \begin{bmatrix} 2 \\ 6 \\ 3 \end{bmatrix}. We find that
det A = \begin{vmatrix} 1 & 2 & 1 \\ -4 & -1 & 3 \\ -2 & 1 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 2 & 1 \\ 0 & 7 & 7 \\ 0 & 5 & 4 \end{vmatrix} = -7
Therefore, A is invertible and we can apply Cramer's Rule to get
x_1 = \frac{det A_1}{det A} = -\frac{1}{7} \begin{vmatrix} 2 & 2 & 1 \\ 6 & -1 & 3 \\ 3 & 1 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 0 & 2 & 1 \\ 0 & -1 & 3 \\ -1 & 1 & 2 \end{vmatrix} = \frac{-7}{-7} = 1
x_2 = \frac{det A_2}{det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 1 \\ -4 & 6 & 3 \\ -2 & 3 & 2 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 0 & 1 \\ -4 & 0 & 3 \\ -2 & -1 & 2 \end{vmatrix} = \frac{7}{-7} = -1
x_3 = \frac{det A_3}{det A} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ -4 & -1 & 6 \\ -2 & 1 & 3 \end{vmatrix} = -\frac{1}{7} \begin{vmatrix} 1 & 2 & 2 \\ 0 & 7 & 14 \\ 0 & 5 & 7 \end{vmatrix} = \frac{-21}{-7} = 3
So, the solution is ~x = \begin{bmatrix} 1 \\ -1 \\ 3 \end{bmatrix}.
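Here is a minimal numpy sketch of Cramer's Rule (our own addition), cross-checked against Example 2; the function name cramer is hypothetical.

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule (assumes det A != 0)."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b           # replace the i-th column of A by b
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[1.0, 2.0, 1.0], [-4.0, -1.0, 3.0], [-2.0, 1.0, 2.0]])
b = np.array([2.0, 6.0, 3.0])
print(np.round(cramer(A, b), 10))   # [ 1. -1.  3.], as in Example 2
```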
EXAMPLE 3 Let A = \begin{bmatrix} a & 1 & 2 \\ 0 & b & -1 \\ c & 1 & d \end{bmatrix}. Assuming that det A ≠ 0, solve A~x = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}.
Solution: We have det A = abd - 2cb + a - c. Thus,
x_1 = \frac{1}{det A} \begin{vmatrix} 1 & 1 & 2 \\ -1 & b & -1 \\ 1 & 1 & d \end{vmatrix} = \frac{1}{det A} \begin{vmatrix} 1 & 1 & 2 \\ 0 & b+1 & 1 \\ 0 & 0 & d-2 \end{vmatrix} = \frac{(b+1)(d-2)}{abd - 2cb + a - c}
x_2 = \frac{1}{det A} \begin{vmatrix} a & 1 & 2 \\ 0 & -1 & -1 \\ c & 1 & d \end{vmatrix} = \frac{1}{det A} \begin{vmatrix} a & 0 & 1 \\ 0 & -1 & -1 \\ c & 0 & d-1 \end{vmatrix} = \frac{-ad + a + c}{abd - 2cb + a - c}
x_3 = \frac{1}{det A} \begin{vmatrix} a & 1 & 1 \\ 0 & b & -1 \\ c & 1 & 1 \end{vmatrix} = \frac{1}{det A} \begin{vmatrix} a & 0 & 1 \\ 0 & b+1 & -1 \\ c & 0 & 1 \end{vmatrix} = \frac{(b+1)(a-c)}{abd - 2cb + a - c}
That is, the solution is ~x = \frac{1}{abd - 2cb + a - c} \begin{bmatrix} (b+1)(d-2) \\ -ad + a + c \\ (b+1)(a-c) \end{bmatrix}.
5.5 Area and Volume
Area of a Parallelogram in R2
Let ~u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} and ~v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} be two non-zero vectors in R2. We can then form a parallelogram in R2 with corner points (0,0), (u_1, u_2), (v_1, v_2), and (u_1 + v_1, u_2 + v_2). This is called the parallelogram induced by ~u and ~v. See Figure 5.5.1.
[Figure 5.5.1: The parallelogram induced by vectors ~u and ~v, with corner points (0,0), (u_1, u_2), (v_1, v_2), and (u_1 + v_1, u_2 + v_2)]
We know the area A of a parallelogram is base × height. As in Figure 5.5.2, we get
A = ‖~u‖ ‖perp_{~u} ~v‖
Using trigonometry, we see that ‖perp_{~u} ~v‖ = ‖~v‖ |sin θ|. Now, recall from Section 1.3 that cos θ = \frac{~u · ~v}{‖~u‖‖~v‖}. Using this gives
A = ‖~u‖‖~v‖|sin θ|
A^2 = ‖~u‖^2 ‖~v‖^2 sin^2 θ
    = ‖~u‖^2 ‖~v‖^2 (1 - cos^2 θ)
    = ‖~u‖^2 ‖~v‖^2 - (~u · ~v)^2
    = (u_1^2 + u_2^2)(v_1^2 + v_2^2) - (u_1v_1 + u_2v_2)^2
    = u_1^2 v_2^2 + u_2^2 v_1^2 - 2u_1v_2u_2v_1
    = (u_1v_2 - u_2v_1)^2
    = \begin{vmatrix} u_1 & v_1 \\ u_2 & v_2 \end{vmatrix}^2
Hence, the area of the parallelogram is given by A = |det [~u ~v]|.
[Figure 5.5.2: The area of the parallelogram induced by ~u and ~v, showing the angle θ between ~u and ~v and the height ‖perp_{~u} ~v‖]
EXAMPLE 1 Determine the area of the parallelogram induced by ~u = \begin{bmatrix} 1 \\ 3 \end{bmatrix} and ~v = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.
Solution: We have
A = \left| det \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} \right| = |-5| = 5
REMARK
Observe that the determinant in the example above gave a negative answer because the angle θ from ~u to ~v is greater than π radians. In particular, if we swap ~u and ~v, that is, swap the columns of the matrix [~u ~v], then we will multiply the determinant by -1. In a similar way, we can use the result above to get geometric confirmation of many of the properties of determinants.
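As a numerical aside (our own, not part of the notes), the signed determinant and its absolute value are easy to inspect in numpy; swapping the columns flips the sign, exactly as the remark describes.

```python
import numpy as np

u = np.array([1.0, 3.0])
v = np.array([2.0, 1.0])

M = np.column_stack([u, v])
print(np.linalg.det(M))         # -5.0 (signed determinant)
print(abs(np.linalg.det(M)))    # 5.0, the area from Example 1
# Swapping the columns multiplies the determinant by -1:
print(np.linalg.det(np.column_stack([v, u])))   # 5.0
```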
Volume of a Parallelepiped
We can repeat what we did above to find the volume of a parallelepiped induced by three vectors ~u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \end{bmatrix}, ~v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}, and ~w = \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} in R3.
[Figure 5.5.3: The volume of a parallelepiped is determined by the area of its base and by its height; the normal is ~n = ~u × ~v and the height is ‖proj_{~n} ~w‖]
The volume of a parallelepiped is (area of base) × height. The base is the parallelogram induced by the vectors ~u and ~v. Thus, using our work above and Theorem 1.3.4, we have that
area of base = |‖~u‖‖~v‖ sin θ| = ‖~u × ~v‖
Also, the height is the length of the part of ~w perpendicular to the base; this is just the length of the projection of ~w onto the normal vector ~n of the base. Since ~u and ~v lie in the base, we can take ~n = ~u × ~v. Therefore,
height = ‖proj_{~n} ~w‖ = \frac{|~w · ~n|}{‖~n‖} = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖}
Hence, the volume V of the parallelepiped is given by
V = \frac{|~w · (~u × ~v)|}{‖~u × ~v‖} × ‖~u × ~v‖ = |~w · (~u × ~v)|
  = |w_1(u_2v_3 - u_3v_2) - w_2(u_1v_3 - u_3v_1) + w_3(u_1v_2 - u_2v_1)|
  = \left| det \begin{bmatrix} u_1 & v_1 & w_1 \\ u_2 & v_2 & w_2 \\ u_3 & v_3 & w_3 \end{bmatrix} \right|
EXAMPLE 2 Determine the volume of the parallelepiped determined by ~u = \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}, ~v = \begin{bmatrix} 1 \\ 3 \\ 4 \end{bmatrix}, and ~w = \begin{bmatrix} -2 \\ 1 \\ -5 \end{bmatrix}.
Solution: The volume is
V = |det [~u ~v ~w]| = \left| det \begin{bmatrix} 1 & 1 & -2 \\ 1 & 3 & 1 \\ 4 & 4 & -5 \end{bmatrix} \right| = \left| det \begin{bmatrix} 1 & 1 & -2 \\ 0 & 2 & 3 \\ 0 & 0 & 3 \end{bmatrix} \right| = 6
REMARK
Of course, this can be extended to higher dimensions. If ~v_1, . . . , ~v_n ∈ R^n, then we say that they induce an n-dimensional parallelotope (the n-dimensional version of a parallelogram or parallelepiped). The n-volume V of the parallelotope is
V = |det [~v_1 · · · ~v_n]|
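A short numerical illustration (ours): the parallelepiped of Example 2, computed as an absolute determinant in numpy; the same one-line formula gives the n-volume in R^n.

```python
import numpy as np

u = np.array([1.0, 1.0, 4.0])
v = np.array([1.0, 3.0, 4.0])
w = np.array([-2.0, 1.0, -5.0])

# Columns of the matrix are the inducing vectors; volume is |det|.
V = abs(np.linalg.det(np.column_stack([u, v, w])))
print(V)   # 6.0, as in Example 2
```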
Chapter 6
Diagonalization
6.1 Matrix of a Linear Mapping and Similar Matrices
Matrix of a Linear Operator
In Chapter 3 we saw how to find the standard matrix of a linear mapping by finding
the image of the standard basis vectors under the linear mapping. However, if we look
at some particular examples, we see that the standard matrix of a linear mapping
is not always nice to work with. We found in Example 3.2.6 that for ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix} the standard matrix of proj~a is
[proj~a] = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}
Even though we can use the standard matrix to calculate the projection of a vector
~x onto ~a, the matrix itself does not seem to provide us with any information about
the action of the projection. This is because we are trying to use the standard basis
vectors to determine how a vector is projected onto ~a. It would make more sense to
instead use a basis for R2 which contains ~a and a vector orthogonal to ~a. Of course,
to do this we first need to determine how to find the matrix of a linear operator with
respect to a basis other than the standard basis.
Recall that the standard matrix [L] of a linear operator L : Rn → Rn satisfies L(~x) =
[L]~x. To define the matrix of L with respect to any basis B of Rn, we mimic this
formula in B-coordinates instead of standard coordinates. That is, we want
[L(~x)]B = [L]B[~x]B
If B = {~v1, . . . , ~vn}, then we can write ~x = b1~v1 + · · · + bn~vn. Hence, we have
[~x]_B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}, and
[L(~x)]_B = [L(b_1~v_1 + · · · + b_n~v_n)]_B
         = [b_1 L(~v_1) + · · · + b_n L(~v_n)]_B
         = b_1 [L(~v_1)]_B + · · · + b_n [L(~v_n)]_B
         = \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix} \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}
Thus, we make the following definition.
DEFINITION
B-Matrix
Let B = {~v1, . . . , ~vn} be a basis for Rn and let L : Rn → Rn be a linear operator.
Then the B-matrix of L is defined to be
[L]_B = \begin{bmatrix} [L(~v_1)]_B & \cdots & [L(~v_n)]_B \end{bmatrix}
EXAMPLE 1 Let L : R3 → R3 be a linear mapping with standard matrix [L] = \begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} and let L(~x) = [L]~x. Find the B-matrix of L where B = \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} \right\}.
Solution: By definition, the columns of the B-matrix are the B-coordinates of the images of the vectors in B under L. We find that
\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = 1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 0\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
\begin{bmatrix} -6 & -2 & 9 \\ -5 & -1 & 7 \\ -7 & -2 & 10 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 6 \\ 6 \\ 7 \end{bmatrix} = 3\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + 2\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + 1\begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
Hence,
[L]_B = \begin{bmatrix} [L(1,1,1)]_B & [L(1,0,1)]_B & [L(1,3,2)]_B \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}
In the example above we had to use the methods of Section 4.3 to find the coordinates
of the image vectors with respect to the basis B. However, if we can pick a basis for
the linear mapping which is geometrically suited to the mapping, then we can often
find the coordinates of the image vectors with respect to the basis by observation.
EXAMPLE 2 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix}. Determine a geometrically natural basis B for proj~a : R2 → R2, and then determine [proj~a]_B.
Solution: As discussed above, it makes sense to include ~a in our geometrically natural basis as well as a vector orthogonal to ~a. So, we take the other basis vector to be ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix} and let B = {~a, ~n}.
By definition, the columns of the B-matrix are the B-coordinates of the images of the vectors in B under proj~a. We have
proj~a ~a = ~a = 1\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}
proj~a ~n = ~0 = 0\begin{bmatrix} 1 \\ 2 \end{bmatrix} + 0\begin{bmatrix} 2 \\ -1 \end{bmatrix}
Therefore,
[proj~a]_B = \begin{bmatrix} [proj~a ~a]_B & [proj~a ~n]_B \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}
Let us compare the standard matrix of proj~a to its B-matrix found in the example above. First, consider its standard matrix [proj~a] = \begin{bmatrix} 1/5 & 2/5 \\ 2/5 & 4/5 \end{bmatrix}. Does this matrix give you any useful information about the mapping? It does show us that the projection of \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} onto ~a is \begin{bmatrix} \frac{1}{5}x_1 + \frac{2}{5}x_2 \\ \frac{2}{5}x_1 + \frac{4}{5}x_2 \end{bmatrix}. But this does not tell us much about the mapping.
On the other hand, consider its B-matrix [proj~a]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}. What information does this give us about the mapping? First, this matrix clearly shows the action of the mapping. In particular, for any ~x = b_1~a + b_2~n ∈ R2 we have
[proj~a ~x]_B = [proj~a]_B [~x]_B = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ 0 \end{bmatrix}
That is, the image of ~x under proj~a is the amount of ~x in the direction of ~a. Also,
it is clear from the form of the matrix that the range of the mapping is all scalar
multiples of the first vector in B and the kernel is all scalar multiples of the second
vector in B.
The form of the matrix in Example 2 is obviously important. Therefore, we make the
following definition.
DEFINITION
Diagonal Matrix
An n × n matrix D is said to be a diagonal matrix if d_{ij} = 0 for all i ≠ j. We
denote a diagonal matrix by
diag(d11, d22, . . . , dnn)
EXAMPLE 3 The diagonal matrix D = diag(1, 2, 0, 1) is D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
Let's consider another example. Let L : R3 → R3 be the linear mapping with standard matrix [L] = \begin{bmatrix} 1 & 0 & -1 \\ -1/3 & 2/3 & -1/3 \\ -2/3 & -2/3 & 4/3 \end{bmatrix}. Without doing any further work, what can you determine about L from [L]? Not much. However, it can be shown that there exists a basis B = {~v_1, ~v_2, ~v_3} for R3 such that [L]_B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}. What can you say about L by looking at [L]_B? We can see that the action of L is to map a vector ~x = b_1~v_1 + b_2~v_2 + b_3~v_3 onto b_1~v_1 + 2b_2~v_2 + 0b_3~v_3, the range of L is Span{~v_1, ~v_2}, and the kernel of L is Span{~v_3}.
Clearly it is very useful to have a B-matrix of a linear operator L : Rn → Rn in
diagonal form. This leads to a couple of questions. How do we determine if there
exists a basis B such that [L]B has this form? If there is such a basis, how do we find
it? We start answering these questions by looking at our method for calculating the
B-matrix a little more closely.
Similar Matrices
Let L : Rn → Rn be a linear operator with standard matrix A = [L] and let B = {~v_1, . . . , ~v_n} be a basis for Rn. Then, by definition, we have
[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix}    (6.1)
From our work in Section 4.3, we know that the change of coordinates matrix from B-coordinates to S-coordinates is P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}. Hence, we have [~x]_B = P^{-1}~x for any ~x ∈ Rn. Since A~v_i ∈ Rn for 1 ≤ i ≤ n, we can apply this to (6.1) to get
[L]_B = \begin{bmatrix} [A~v_1]_B & \cdots & [A~v_n]_B \end{bmatrix}
      = \begin{bmatrix} P^{-1}A~v_1 & \cdots & P^{-1}A~v_n \end{bmatrix}
      = P^{-1}A \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}
      = P^{-1}AP
      = P^{-1}[L]P
REMARK
You can also show that [L]_B = P^{-1}[L]P by showing that [L]_B[~x]_B = P^{-1}[L]P[~x]_B for all ~x ∈ Rn by simplifying the right-hand side of the equation. This is an excellent exercise.
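The change-of-basis formula is also easy to check numerically. The following minimal numpy sketch (our own, not part of the notes) recomputes the B-matrix of Example 1 as P^{-1}[L]P.

```python
import numpy as np

L = np.array([[-6.0, -2.0, 9.0], [-5.0, -1.0, 7.0], [-7.0, -2.0, 10.0]])
# Columns of P are the basis vectors of B from Example 1.
P = np.array([[1.0, 1.0, 1.0],
              [1.0, 0.0, 3.0],
              [1.0, 1.0, 2.0]])

L_B = np.linalg.inv(P) @ L @ P
print(np.round(L_B, 10))   # the upper triangular B-matrix found in Example 1
```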
Since [L]B and [L] represent the same linear mapping, we would expect that these
two matrices have many of the same properties.
THEOREM 1 Let A and B be n × n matrices such that P^{-1}AP = B for some invertible matrix P. Then
(1) rank A = rank B
(2) det A = det B
(3) tr A = tr B, where the trace of a matrix A is defined by tr A = \sum_{i=1}^{n} a_{ii}.
Proof: For (1), we multiply both sides of P^{-1}AP = B on the left by P to get AP = PB. Then, since P is invertible, it can be written as a product of elementary matrices, say P = E_1 · · · E_k. Hence, by Corollary 5.2.4, we get
rank(PB) = rank(E_1 · · · E_k B) = rank(B)
Similarly,
rank(AP) = rank((AP)^T) = rank(E_k^T · · · E_1^T A^T) = rank(A^T) = rank(A)
Thus, rank(A) = rank(AP) = rank(PB) = rank(B) as required.
For (2), we have det P ≠ 0, so
det(B) = det(P^{-1}AP) = det(P^{-1}) det A det P = \frac{1}{det P} det A det P = det A
For (3), we first observe that
tr(AB) = \sum_{i=1}^{n} \sum_{k=1}^{n} a_{ik}b_{ki} = \sum_{k=1}^{n} \sum_{i=1}^{n} b_{ki}a_{ik} = tr(BA)
Hence,
tr(B) = tr(P^{-1}AP) = tr(P^{-1}(AP)) = tr((AP)P^{-1}) = tr(APP^{-1}) = tr(A)
We see that such matrices are similar in many ways. This motivates the following
definition.
DEFINITION
Similar Matrices
Let A and B be n × n matrices such that P^{-1}AP = B for some invertible matrix P.
Then A and B are said to be similar.
REMARK
Observe that if P^{-1}AP = B, then taking Q = P^{-1} we get that Q is an invertible matrix such that A = PBP^{-1} = Q^{-1}BQ. So, the similarity property is symmetric.
We can now rephrase our earlier questions. Given a linear operator L : Rn → Rn, is the standard matrix of L similar to a diagonal matrix? If so, how do we find an invertible matrix P such that P^{-1}[L]P is diagonal?
6.2 Eigenvalues and Eigenvectors
Let A = [L] be the standard matrix of a linear operator L : Rn → Rn. To determine how to construct an invertible matrix P such that P^{-1}AP = D is diagonal, we work in reverse. That is, we will assume that such a matrix P exists and use this to find what properties P must have.
Let P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} and let D = diag(λ_1, . . . , λ_n) = \begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix} such that P^{-1}AP = D, or alternately AP = PD. This gives
A\begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix} = P\begin{bmatrix} λ_1~e_1 & \cdots & λ_n~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1P~e_1 & \cdots & λ_nP~e_n \end{bmatrix}
\begin{bmatrix} A~v_1 & \cdots & A~v_n \end{bmatrix} = \begin{bmatrix} λ_1~v_1 & \cdots & λ_n~v_n \end{bmatrix}
Thus, we see that we must have A~v_i = λ_i~v_i for 1 ≤ i ≤ n. Moreover, since P is invertible, the columns of P must be linearly independent. In particular, ~v_i ≠ ~0 for 1 ≤ i ≤ n.
DEFINITION
Eigenvalues
Eigenvectors
Eigenpair
Let A be an n × n matrix. If there exists a vector ~v ≠ ~0 such that A~v = λ~v, then λ
is called an eigenvalue of A and ~v is called an eigenvector of A corresponding to
λ. The pair (λ,~v) is called an eigenpair.
If L : Rn → Rn is a linear operator with standard matrix A = [L], and if (λ,~v) is an
eigenpair of A, then observe that we have
L(~v) = A~v = λ~v
Hence, it makes sense to also define eigenvalues and eigenvectors for linear operators.
DEFINITION
Eigenvalues
Eigenvectors
Let L : Rn → Rn be a linear operator. If there exists a vector ~v ≠ ~0 such that
L(~v) = λ~v, then λ is called an eigenvalue of L and ~v is called an eigenvector of L
corresponding to λ.
EXAMPLE 1 Let ~a = \begin{bmatrix} 1 \\ 2 \end{bmatrix} and ~n = \begin{bmatrix} 2 \\ -1 \end{bmatrix}. Then, we have
proj~a ~a = 1~a
proj~a ~n = ~0 = 0~n
Hence, ~a is an eigenvector of proj~a with eigenvalue 1, and ~n is an eigenvector of proj~a with eigenvalue 0.
Observe that it is easy to determine whether a given vector ~v is an eigenvector of
a given matrix A. We just need to determine whether the image of ~v under A is a
scalar multiple of ~v.
EXAMPLE 2 Determine which of the following vectors are eigenvectors of A = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix}. Determine the eigenvalue associated with each eigenvector.
a) ~v_1 = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}
Solution: We have
A~v_1 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix}
So, ~v_1 is an eigenvector with eigenvalue λ_1 = 1.
b) ~v_2 = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}
Solution: We have
A~v_2 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} -7 \\ -4 \\ -7 \end{bmatrix}
Since A~v_2 is not a scalar multiple of ~v_2, the vector ~v_2 is not an eigenvector.
c) ~v_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
Solution: We have
A~v_3 = \begin{bmatrix} 2 & -3 & -1 \\ 1 & -2 & -1 \\ 1 & -3 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ -2 \\ -2 \end{bmatrix} = (-2)\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
So, ~v_3 is an eigenvector with eigenvalue λ_2 = -2.
However, how do we determine whether a given scalar λ is an eigenvalue of a given matrix A? To do this, we would need to determine whether there exists a vector ~v ≠ ~0 such that A~v = λ~v. At first glance this equation might look like a system of linear equations, but we observe that the unknown vector ~v is actually on both sides of the equation. To turn this into a system of linear equations, we need to move λ~v to the other side. We get
A~v - λ~v = ~0
(A - λI)~v = ~0
Hence, we have a homogeneous system of linear equations with coefficient matrix A - λI. Since the definition requires that ~v ≠ ~0, we see that λ is an eigenvalue of A if and only if this system has infinitely many solutions. Thus, since A - λI is n × n, we know by the Invertible Matrix Theorem that we require det(A - λI) = 0. Moreover, we see that to find all the eigenvectors associated with the eigenvalue λ, we just need to determine all non-zero vectors ~v such that (A - λI)~v = ~0.
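In practice these computations are delegated to a library. The following minimal numpy sketch (our own, not from the notes) finds eigenvalues and eigenvectors of the matrix from Example 2 above.

```python
import numpy as np

A = np.array([[2.0, -3.0, -1.0], [1.0, -2.0, -1.0], [1.0, -3.0, 0.0]])

# Each eigenvalue eigvals[i] pairs with the unit eigenvector vecs[:, i].
eigvals, vecs = np.linalg.eig(A)
print(np.round(eigvals, 10))                     # eigenvalues, e.g. 1 and -2
print(np.round(A @ vecs - vecs * eigvals, 10))   # A v = lambda v, columnwise
```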
EXAMPLE 3 Find all of the eigenvalues of A = \begin{bmatrix} 1 & 6 & 3 \\ 0 & -2 & 0 \\ 3 & 6 & 1 \end{bmatrix}. Determine all eigenvectors associated with each eigenvalue.
Solution: Our work above shows that all eigenvalues of A must satisfy det(A - λI) = 0. Thus, to find all eigenvalues of A, we solve this equation for λ. We have
0 = det(A - λI) = \begin{vmatrix} 1-λ & 6 & 3 \\ 0 & -2-λ & 0 \\ 3 & 6 & 1-λ \end{vmatrix} = (-2-λ)\begin{vmatrix} 1-λ & 3 \\ 3 & 1-λ \end{vmatrix} = -(λ+2)(λ^2 - 2λ - 8) = -(λ+2)(λ+2)(λ-4)
Hence, the eigenvalues are λ_1 = -2, λ_2 = -2, and λ_3 = 4.
To find the eigenvectors associated with the eigenvalue λ_1 = -2, we solve the homogeneous system (A - λ_1I)~v = ~0. Row reducing the coefficient matrix gives
A - λ_1I = \begin{bmatrix} 3 & 6 & 3 \\ 0 & 0 & 0 \\ 3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
The solution of the homogeneous system is ~v = s\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}, s, t ∈ R. Hence, the eigenvectors for λ_1 are all non-zero vectors of this form. Of course, these are also the eigenvectors for λ_2.
Similarly, for λ_3 = 4 we row reduce the coefficient matrix of the homogeneous system (A - λ_3I)~v = ~0 to get
A - λ_3I = \begin{bmatrix} -3 & 6 & 3 \\ 0 & -6 & 0 \\ 3 & 6 & -3 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
The solution of the homogeneous system is ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, t ∈ R. Hence, the eigenvectors for λ_3 are ~v = t\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} with t ≠ 0.
Observe that for any n × n matrix A, det(A - λI) is always a polynomial of degree n in λ, and the roots of this polynomial are the eigenvalues of A.
DEFINITION
Characteristic
Polynomial
Let A be an n× n matrix. The characteristic polynomial of A is the n-th degree
polynomial
C(λ) = det(A− λI)
THEOREM 1 A scalar λ is an eigenvalue of an n× n matrix A if and only if C(λ) = 0.
Proof: The proof is left as an exercise.
We also see that the set of all eigenvectors associated with an eigenvalue λ is the
nullspace of A− λI not including the zero vector.
DEFINITION
Eigenspace
Let A be an n × n matrix with eigenvalue λ. We call the nullspace of A − λI the
eigenspace of λ. The eigenspace is denoted Eλ.
REMARKS
1. It is important to remember that the set of all eigenvectors for the eigenvalue
λ of A is all vectors in Eλ excluding the zero vector.
2. Since the eigenspace is just the nullspace of A− λI, it is also a subspace of Rn.
EXAMPLE 4 Let A = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. Find all eigenvalues of A and determine a basis for the eigenspace of each eigenvalue.
Solution: To find all of the eigenvalues, we find the roots of the characteristic polynomial. We have
0 = C(λ) = det(A - λI) = \begin{vmatrix} 1-λ & 0 & 1 \\ 0 & 1-λ & 0 \\ 0 & 0 & 1-λ \end{vmatrix} = (1-λ)^3
Hence, all three eigenvalues of A are 1. For λ_1 = 1, we have
A - λ_1I = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
Hence, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}.
Observe in the examples above that the dimension of the eigenspace of an eigenvalue
is always less than or equal to the number of times the eigenvalue is repeated as a
root of the characteristic polynomial. Before we prove this is always the case, we
make the following definitions and prove a lemma.
DEFINITION
Algebraic Multiplicity
Geometric Multiplicity
Let A be an n × n matrix with eigenvalue λ_1. The algebraic multiplicity of λ_1, denoted a_{λ_1}, is the number of times that λ_1 is a root of the characteristic polynomial C(λ). That is, if C(λ) = (λ - λ_1)^k C_1(λ), where C_1(λ_1) ≠ 0, then a_{λ_1} = k. The geometric multiplicity of λ_1, denoted g_{λ_1}, is the dimension of its eigenspace. So, g_{λ_1} = dim(E_{λ_1}).
LEMMA 2 If A and B are similar matrices, then A and B have the same characteristic polynomial, and hence the same eigenvalues.
Proof: If A and B are similar, then there exists an invertible matrix P such that P^{-1}AP = B. Hence
det(B - λI) = det(P^{-1}AP - λP^{-1}P)
            = det(P^{-1}(A - λI)P)
            = det P^{-1} det(A - λI) det P
            = det(A - λI)
THEOREM 3 Let A be an n × n matrix with eigenvalue λ_1. Then
1 ≤ g_{λ_1} ≤ a_{λ_1}
Proof: First, by definition of an eigenvalue we have that g_{λ_1} = dim(E_{λ_1}) ≥ 1.
Let {~v_1, . . . , ~v_k} be a basis for E_{λ_1}. Extend this to a basis {~v_1, . . . , ~v_k, ~w_{k+1}, . . . , ~w_n} for Rn and let P = \begin{bmatrix} ~v_1 & \cdots & ~v_k & ~w_{k+1} & \cdots & ~w_n \end{bmatrix}. Since the columns of P form a basis for Rn, we have that P is invertible by the Invertible Matrix Theorem. Let F = \begin{bmatrix} ~v_1 & \cdots & ~v_k \end{bmatrix} and G = \begin{bmatrix} ~w_{k+1} & \cdots & ~w_n \end{bmatrix} so that P can be written as the block matrix P = \begin{bmatrix} F & G \end{bmatrix}. Similarly, write P^{-1} as the block matrix P^{-1} = \begin{bmatrix} H \\ J \end{bmatrix} so that
I = P^{-1}P = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} HF & HG \\ JF & JG \end{bmatrix}
In particular, this gives that HF = I_k and JF = O, the (n-k) × k zero matrix. Define the matrix B by B = P^{-1}AP, so that B is similar to A. Since the columns of F are eigenvectors for λ_1, we have AF = λ_1F, and so
B = P^{-1}AP = \begin{bmatrix} H \\ J \end{bmatrix} A \begin{bmatrix} F & G \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} AF & AG \end{bmatrix} = \begin{bmatrix} H \\ J \end{bmatrix} \begin{bmatrix} λ_1F & AG \end{bmatrix} = \begin{bmatrix} λ_1HF & HAG \\ λ_1JF & JAG \end{bmatrix} = \begin{bmatrix} λ_1I_k & X \\ O & Y \end{bmatrix}
where X = HAG is k × (n-k) and Y = JAG is (n-k) × (n-k). Then, by Lemma 2, the characteristic polynomial C(λ) of A equals the characteristic polynomial of B. In particular,
C(λ) = det(B - λI) = \begin{vmatrix} (λ_1 - λ)I_k & X \\ O & Y - λI \end{vmatrix}
Expanding the determinant we get
C(λ) = (-1)^k (λ - λ_1)^k det(Y - λI)
Since λ_1 may or may not be a root of det(Y - λI), we get that a_{λ_1} ≥ k = g_{λ_1}.
EXAMPLE 5 In Example 4 we saw that the matrix A has only one eigenvalue λ1 = 1 which is a
triple root of the characteristic polynomial. Thus, the algebraic multiplicity of λ1 is
aλ1 = 3. The geometric multiplicity of λ1 is gλ1 = dimEλ1 = 2.
EXAMPLE 6 Let A = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix}. Determine the geometric and algebraic multiplicity of all eigenvalues of A.
Solution: We have
0 = C(λ) = \begin{vmatrix} -1-λ & 6 & 3 \\ 1 & -λ & -1 \\ -3 & 6 & 5-λ \end{vmatrix}
= (-1-λ)\begin{vmatrix} -λ & -1 \\ 6 & 5-λ \end{vmatrix} + 6(-1)\begin{vmatrix} 1 & -1 \\ -3 & 5-λ \end{vmatrix} + 3\begin{vmatrix} 1 & -λ \\ -3 & 6 \end{vmatrix}
= (-1-λ)(λ^2 - 5λ + 6) - 6(-λ + 2) + 3(6 - 3λ)
= -λ^3 + 4λ^2 - 4λ
= -λ(λ^2 - 4λ + 4)
= -λ(λ - 2)^2
Thus, the eigenvalues of A are λ_1 = 0 and λ_2 = 2. Since λ_1 is a single root of C(λ) we have that a_{λ_1} = 1. We have a_{λ_2} = 2 since λ_2 is a double root of C(λ).
For λ_1 = 0, we have
A - λ_1I = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 1/3 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ -1/3 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_1} = dim E_{λ_1} = 1.
For λ_2 = 2, we have
A - λ_2I = \begin{bmatrix} -3 & 6 & 3 \\ 1 & -2 & -1 \\ -3 & 6 & 3 \end{bmatrix} ∼ \begin{bmatrix} 1 & -2 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = dim E_{λ_2} = 2.
Observe in the example above that we needed to factor a cubic polynomial. For
purposes of this course, we typically have two options to make this easier. First, we
can try to use row and/or column operations when finding det(A−λI) so that we do
not have to factor a cubic polynomial, or alternately we can use the Rational Roots
Theorem and synthetic division to factor the cubic. Of course, in the real world we
generally have polynomials of degree much larger than 3 and are not so lucky as to
have rational roots. In these cases techniques for approximating roots of polynomials
or more specialized methods for approximating eigenvalues are employed. These are
beyond the scope of this course; however, you should keep in mind that the problems
we do here are not really representative of what you may see in the future.
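For instance (our own aside, not part of the notes), a numerical library computes eigenvalues directly by iterative numerical routines, never factoring the characteristic polynomial at all; here numpy handles the matrix of Example 7 below.

```python
import numpy as np

# The matrix from Example 7; np.linalg.eigvals returns approximate
# eigenvalues without any symbolic polynomial factoring.
A = np.array([[-8.0, -3.0, -1.0], [9.0, 4.0, -2.0], [-8.0, -8.0, -1.0]])
print(np.round(np.linalg.eigvals(A), 8))   # approximately -5, -5, 5
```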
EXAMPLE 7 Let A = \begin{bmatrix} -8 & -3 & -1 \\ 9 & 4 & -2 \\ -8 & -8 & -1 \end{bmatrix}. Determine the geometric and algebraic multiplicity of all eigenvalues of A.
Solution: Since adding a multiple of one column to another or adding a multiple of one row to another does not change the determinant, we get
0 = C(λ) = \begin{vmatrix} -8-λ & -3 & -1 \\ 9 & 4-λ & -2 \\ -8 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 5+λ & 4-λ & -2 \\ 0 & -8 & -1-λ \end{vmatrix} = \begin{vmatrix} -5-λ & -3 & -1 \\ 0 & 1-λ & -3 \\ 0 & -8 & -1-λ \end{vmatrix} = (-5-λ)(λ^2 - 25) = -(λ+5)^2(λ-5)
Hence, the eigenvalues of A are λ_1 = -5 and λ_2 = 5 with a_{λ_1} = 2 and a_{λ_2} = 1.
For λ_1 = -5, we have
A - λ_1I = \begin{bmatrix} -3 & -3 & -1 \\ 9 & 9 & -2 \\ -8 & -8 & 4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \right\}. Hence, g_{λ_1} = dim E_{λ_1} = 1.
For λ_2 = 5, we have
A - λ_2I = \begin{bmatrix} -13 & -3 & -1 \\ 9 & -1 & -2 \\ -8 & -8 & -6 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1/8 \\ 0 & 1 & 7/8 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1/8 \\ -7/8 \\ 1 \end{bmatrix} \right\}. Hence, g_{λ_2} = dim E_{λ_2} = 1.
6.3 Diagonalization
DEFINITION
Diagonalizable
An n× n matrix A is said to be diagonalizable if A is similar to a diagonal matrix
D. If P^{-1}AP = D, then we say that P diagonalizes A.
REMARK
In this course, we will restrict ourselves to diagonalizing matrices with real entries
over R. That is, if A has a complex eigenvalue, then we will say that A is not
diagonalizable over R.
Our work in the last section showed that if an n×n matrix A is diagonalizable, then
the columns of P must be linearly independent eigenvectors of A and the diagonal
entries of the diagonal matrix D are the eigenvalues of A corresponding columnwise
to the eigenvectors in P. In particular, if {~v_1, . . . , ~v_n} is a basis for Rn of eigenvectors of A corresponding to eigenvalues λ_1, . . . , λ_n, then taking P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}, we get that
P^{-1}AP = diag(λ_1, . . . , λ_n)
We know how to find all of the eigenvalues of A, but how do we find a set of n linearly
independent eigenvectors of A?
LEMMA 1 Let A be an n × n matrix with eigenpairs (λ_1, ~v_1), (λ_2, ~v_2), . . . , (λ_k, ~v_k), where λ_i ≠ λ_j for i ≠ j. Then {~v_1, . . . , ~v_k} is linearly independent.
Proof: We will prove this by induction. If k = 1, then the result is trivial, since by definition of an eigenvector ~v_1 ≠ ~0. Assume that the result is true for some k ≥ 1. To show {~v_1, . . . , ~v_k, ~v_{k+1}} is linearly independent we consider
c_1~v_1 + · · · + c_k~v_k + c_{k+1}~v_{k+1} = ~0    (6.2)
Observe that since A~v_i = λ_i~v_i we have
(A - λ_iI)~v_j = A~v_j - λ_i~v_j = λ_j~v_j - λ_i~v_j = (λ_j - λ_i)~v_j
for all 1 ≤ i, j ≤ k+1. Thus, multiplying both sides of (6.2) by A - λ_{k+1}I gives
~0 = c_1(λ_1 - λ_{k+1})~v_1 + · · · + c_k(λ_k - λ_{k+1})~v_k + c_{k+1}(λ_{k+1} - λ_{k+1})~v_{k+1}
   = c_1(λ_1 - λ_{k+1})~v_1 + · · · + c_k(λ_k - λ_{k+1})~v_k + ~0
By our induction hypothesis, {~v_1, . . . , ~v_k} is linearly independent, so all the coefficients must be 0. But λ_i ≠ λ_{k+1} for all 1 ≤ i ≤ k, so we must have c_1 = · · · = c_k = 0. Thus (6.2) becomes
~0 + c_{k+1}~v_{k+1} = ~0
But ~v_{k+1} ≠ ~0 since it is an eigenvector, hence c_{k+1} = 0. Therefore, the set is linearly independent.
THEOREM 2 Let A be an n × n matrix with distinct eigenvalues λ_1, . . . , λ_k, and let B_i = {~v_{i,1}, . . . , ~v_{i,g_{λ_i}}} denote a basis for the eigenspace of λ_i for 1 ≤ i ≤ k. Then B_1 ∪ B_2 ∪ · · · ∪ B_k is a linearly independent set.
Proof: We prove the result by induction on k. If k = 1, then B_1 is linearly independent since it is a basis for E_{λ_1}. Assume the result is true for some j. That is, assume that B_1 ∪ · · · ∪ B_j is linearly independent and consider B_1 ∪ · · · ∪ B_j ∪ B_{j+1}. For simplicity, we will denote the vectors in B_1 ∪ · · · ∪ B_j by ~w_1, . . . , ~w_ℓ. Assume that there exists a non-trivial solution to the equation
c_1~w_1 + · · · + c_ℓ~w_ℓ + d_1~v_{j+1,1} + · · · + d_{g_{λ_{j+1}}}~v_{j+1,g_{λ_{j+1}}} = ~0
Observe that c_1, . . . , c_ℓ cannot all be zero, as otherwise this would contradict the fact that B_{j+1} is a basis. Similarly, d_1, . . . , d_{g_{λ_{j+1}}} cannot all be zero, as this would contradict the inductive hypothesis. Thus, we must have at least one c_i non-zero and at least one d_i non-zero. But then we would have a linear combination of some vectors in B_{j+1} equaling a linear combination of vectors in B_1, . . . , B_j, which would contradict Lemma 1. Hence, the solution must be the trivial solution. Therefore, B_1 ∪ · · · ∪ B_j ∪ B_{j+1} is linearly independent. The result now follows by induction.
The theorem shows us how to form a linearly independent set of eigenvectors of a
matrix A. We first find a basis for the eigenspace of every eigenvalue of A. Then the
set containing all the vectors in every basis is linearly independent by Theorem 2.
However, for the matrix P to be invertible, we actually need n linearly independent
eigenvectors of A. The next result gives us the condition required for there to exist
n linearly independent eigenvectors.
COROLLARY 3 Let A be an n × n matrix with distinct eigenvalues λ_1, . . . , λ_k. Then A is diagonalizable if and only if g_{λ_i} = a_{λ_i} for 1 ≤ i ≤ k.
Proof: The proof is left as an exercise.
If λ is an eigenvalue of A such that g_λ < a_λ, then λ is said to be deficient. Observe that since g_λ ≥ 1 for every eigenvalue λ, any eigenvalue λ with a_λ = 1 cannot be deficient. This gives the following result.
COROLLARY 4 Let A be an n× n matrix with n distinct eigenvalues. Then A is diagonalizable.
Proof: The proof is left as an exercise.
We now have an algorithm for diagonalizing a matrix A or showing that it is not
diagonalizable.
ALGORITHM
Let A be an n × n matrix.
1. Find and factor the characteristic polynomial C(λ) = det(A - λI).
2. Let λ_1, . . . , λ_n denote the n roots of C(λ) (repeated according to multiplicity). If any of the roots of C(λ) are complex, then A is not diagonalizable over R.
3. Find a basis for the eigenspace of each distinct eigenvalue λ by finding the nullspace of A - λI.
4. If g_λ < a_λ for any eigenvalue λ of A, then A is not diagonalizable. Otherwise, form a basis {~v_1, . . . , ~v_n} for Rn of eigenvectors of A by using Theorem 2. Let P = \begin{bmatrix} ~v_1 & \cdots & ~v_n \end{bmatrix}. Then P^{-1}AP = diag(λ_1, . . . , λ_n), where λ_i is the eigenvalue corresponding to the eigenvector ~v_i for 1 ≤ i ≤ n.
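Here is a minimal numpy sketch (our own) of this algorithm's conclusion. Rather than factoring C(λ) by hand, it takes the eigenvector matrix from np.linalg.eig as P and checks the two failure modes; the helper name diagonalize is hypothetical.

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Return (P, D) with P^{-1} A P = D, or None if A is not
    diagonalizable over R (complex eigenvalues or a deficient eigenvalue)."""
    eigvals, P = np.linalg.eig(A)
    if np.iscomplexobj(eigvals) and np.max(np.abs(eigvals.imag)) > tol:
        return None               # complex eigenvalues: not diagonalizable over R
    if abs(np.linalg.det(P)) < tol:
        return None               # dependent eigenvectors: A is defective
    D = np.linalg.inv(P) @ A @ P
    return P, np.round(D.real, 10)

A = np.array([[-1.0, 6.0, 3.0], [1.0, 0.0, -1.0], [-3.0, 6.0, 5.0]])
P, D = diagonalize(A)
print(D)   # diagonal with entries 0, 2, 2 in some order (Example 1 below)
```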
EXAMPLE 1 Show that A = \begin{bmatrix} -1 & 6 & 3 \\ 1 & 0 & -1 \\ -3 & 6 & 5 \end{bmatrix} is diagonalizable and find the invertible matrix P and diagonal matrix D such that P^{-1}AP = D.
Solution: In Example 6.2.6 we found that the eigenvalues of A are λ_1 = 0 and λ_2 = 2, and that a_{λ_1} = g_{λ_1} and a_{λ_2} = g_{λ_2}. Hence, A is diagonalizable by Corollary 3. Moreover, the columns of P are the basis vectors from each of the eigenspaces. That is,
P = \begin{bmatrix} 1 & 2 & 1 \\ -1/3 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
The diagonal entries in D are the eigenvalues of A corresponding columnwise to the columns in P. Hence,
P^{-1}AP = D = diag(0, 2, 2) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}
REMARK
It is important to remember that the answer is not unique. We can arrange the linearly independent eigenvectors of A as the columns of P in any order. We only have to ensure that the entry d_{ii} in D is the eigenvalue of A corresponding to the eigenvector in the i-th column of P.
EXAMPLE 2 Show that A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} is not diagonalizable.
Solution: We have
0 = C(λ) = \begin{vmatrix} 1-λ & 1 \\ 0 & 1-λ \end{vmatrix} = (λ - 1)^2
Hence, λ_1 = 1 is the only eigenvalue and a_{λ_1} = 2. We then get
A - λ_1I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
Therefore, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}. Thus g_{λ_1} = 1 < a_{λ_1}, and hence A is not diagonalizable.
EXAMPLE 3 Show that A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} is not diagonalizable over R.
Solution: We have
0 = C(λ) = \begin{vmatrix} -λ & 1 \\ -1 & -λ \end{vmatrix} = λ^2 + 1
So, A does not have any real eigenvalues, and hence is not diagonalizable over R.
REMARK
In Math 235 we will show that A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} is diagonalizable over C.
EXAMPLE 4 Show that A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} is diagonalizable and find the invertible matrix P and diagonal matrix D such that P^{-1}AP = D.
Solution: We have
0 = C(λ) = \begin{vmatrix} 1-λ & 2 \\ 2 & 1-λ \end{vmatrix} = λ^2 - 2λ - 3 = (λ - 3)(λ + 1)
Thus, the eigenvalues of A are λ_1 = 3 and λ_2 = -1. Since A has two distinct eigenvalues, it is diagonalizable by Corollary 4.
For λ_1 = 3, we have
A - λ_1I = \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}.
For λ_2 = -1, we have
A - λ_2I = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}.
We now let P = \begin{bmatrix} -1 & 1 \\ 1 & 1 \end{bmatrix} and get
D = P^{-1}AP = \begin{bmatrix} -1 & 0 \\ 0 & 3 \end{bmatrix}
EXAMPLE 5 Show that A = \begin{bmatrix} 1 & 2 & 2 \\ 2 & 1 & 2 \\ 2 & 2 & 1 \end{bmatrix} is diagonalizable and find the invertible matrix P and diagonal matrix D such that P^{-1}AP = D.
Solution: Using column and row operations on the determinant, we get
0 = C(λ) = \begin{vmatrix} 1-λ & 2 & 2 \\ 2 & 1-λ & 2 \\ 2 & 2 & 1-λ \end{vmatrix} = \begin{vmatrix} 1-λ & 2 & 0 \\ 2 & 1-λ & 1+λ \\ 2 & 2 & -1-λ \end{vmatrix} = \begin{vmatrix} 1-λ & 2 & 0 \\ 4 & 3-λ & 0 \\ 2 & 2 & -1-λ \end{vmatrix} = (-1-λ)\begin{vmatrix} 1-λ & 2 \\ 4 & 3-λ \end{vmatrix} = -(λ+1)(λ^2 - 4λ - 5) = -(λ+1)^2(λ-5)
Thus, the eigenvalues of A are λ_1 = -1 and λ_2 = 5, where a_{λ_1} = 2 and a_{λ_2} = 1.
For λ_1 = -1, we have
A - λ_1I = \begin{bmatrix} 2 & 2 & 2 \\ 2 & 2 & 2 \\ 2 & 2 & 2 \end{bmatrix} ∼ \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_1} is \left\{ \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right\}, so g_{λ_1} = a_{λ_1}.
For λ_2 = 5, we have
A - λ_2I = \begin{bmatrix} -4 & 2 & 2 \\ 2 & -4 & 2 \\ 2 & 2 & -4 \end{bmatrix} ∼ \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
Thus, a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}, so g_{λ_2} = a_{λ_2}.
Therefore, A is diagonalizable by Corollary 3. Taking P = \begin{bmatrix} -1 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}, we get
D = P^{-1}AP = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 5 \end{bmatrix}
6.4 Powers of Matrices
We now briefly look at one of the many applications of diagonalization. In particular,
we will look at how to use diagonalization to easily compute powers of a diagonalizable
matrix.
Let A be an n× n matrix. Then, we define powers of a matrix in the expected way.
That is, for any positive integer k, A^k is defined to be A multiplied by itself k times. Thus, for A = \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix} we obviously have
A^2 = \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix} = \begin{bmatrix} -1 & 10 \\ -5 & 14 \end{bmatrix}
A^3 = A^2A = \begin{bmatrix} -1 & 10 \\ -5 & 14 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix} = \begin{bmatrix} -11 & 38 \\ -19 & 46 \end{bmatrix}
A^{1000} = \begin{bmatrix} 2^{1001} - 3^{1000} & -2^{1001} + 2 \cdot 3^{1000} \\ 2^{1000} - 3^{1000} & -2^{1000} + 2 \cdot 3^{1000} \end{bmatrix}
Perhaps the value of A^{1000} is not so obvious. Moreover, we certainly did not multiply a thousand copies of A together. Instead, we again use the power of the diagonal form of a matrix.
Observe that multiplying a diagonal matrix D by itself is easy. In particular, if D = diag(d_1, . . . , d_n), then D^k = diag(d_1^k, . . . , d_n^k). The following theorem shows us how we can use this fact with diagonalization to quickly take powers of a diagonalizable matrix A.
THEOREM 1 Let A be an n × n matrix. If there exists an invertible matrix P and diagonal matrix D such that P^{-1}AP = D, then
A^k = PD^kP^{-1}
Proof: We prove the result by induction on k. If k = 1, then P^{-1}AP = D implies A = PDP^{-1} and so the result holds. Assume the result is true for some k. We then have
A^{k+1} = A^kA = (PD^kP^{-1})(PDP^{-1}) = PD^k(P^{-1}P)DP^{-1} = PD^kIDP^{-1} = PD^{k+1}P^{-1}
as required.
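Here is a minimal Python sketch of this computation (our own addition). It uses exact integer arithmetic, so a quantity like 3^1000 does not overflow as it would in floating point; the diagonalization P, D, P^{-1} is the one found in Example 1 below.

```python
# Diagonalization of A = [[1, 2], [-1, 4]] from Example 1 below:
# P = [[2, 1], [1, 1]], D = diag(2, 3), and P^{-1} = [[1, -1], [-1, 2]].
P = [[2, 1], [1, 1]]
Pinv = [[1, -1], [-1, 2]]

def power(k):
    """Compute A^k = P D^k P^{-1} exactly, using D^k = diag(2^k, 3^k)."""
    Dk = [[2**k, 0], [0, 3**k]]
    PDk = [[sum(P[i][m] * Dk[m][j] for m in range(2)) for j in range(2)]
           for i in range(2)]
    return [[sum(PDk[i][m] * Pinv[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

print(power(3))                            # [[-11, 38], [-19, 46]], matching A^3
print(power(1000)[0][0] == 2**1001 - 3**1000)   # True: matches the closed form
```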
EXAMPLE 1 Let A = \begin{bmatrix} 1 & 2 \\ -1 & 4 \end{bmatrix}. Show that A^{1000} = \begin{bmatrix} 2^{1001} - 3^{1000} & -2^{1001} + 2 \cdot 3^{1000} \\ 2^{1000} - 3^{1000} & -2^{1000} + 2 \cdot 3^{1000} \end{bmatrix}.
Solution: We first diagonalize A. We have
0 = det(A - λI) = \begin{vmatrix} 1-λ & 2 \\ -1 & 4-λ \end{vmatrix} = λ^2 - 5λ + 6 = (λ - 2)(λ - 3)
Thus, the eigenvalues of A are λ_1 = 2 and λ_2 = 3.
For λ_1 = 2 we have
A - λ_1I = \begin{bmatrix} -1 & 2 \\ -1 & 2 \end{bmatrix} ∼ \begin{bmatrix} -1 & 2 \\ 0 & 0 \end{bmatrix}
so a basis for E_{λ_1} is \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}.
For λ_2 = 3 we have
A - λ_2I = \begin{bmatrix} -2 & 2 \\ -1 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}
so a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}.
Therefore, P = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} and D = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}. We compute P^{-1} = \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}, and so
A^{1000} = PD^{1000}P^{-1}
         = \begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 2^{1000} & 0 \\ 0 & 3^{1000} \end{bmatrix} \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}
         = \begin{bmatrix} 2^{1001} & 3^{1000} \\ 2^{1000} & 3^{1000} \end{bmatrix} \begin{bmatrix} 1 & -1 \\ -1 & 2 \end{bmatrix}
         = \begin{bmatrix} 2^{1001} - 3^{1000} & -2^{1001} + 2 \cdot 3^{1000} \\ 2^{1000} - 3^{1000} & -2^{1000} + 2 \cdot 3^{1000} \end{bmatrix}
EXAMPLE 2 Let A = \begin{bmatrix} -2 & 2 \\ -3 & 5 \end{bmatrix}. Calculate A^{200}.
Solution: We have
0 = det(A - λI) = \begin{vmatrix} -2-λ & 2 \\ -3 & 5-λ \end{vmatrix} = λ^2 - 3λ - 4 = (λ + 1)(λ - 4)
Thus, the eigenvalues of A are λ_1 = -1 and λ_2 = 4.
For λ_1 = -1 we have A - λ_1I = \begin{bmatrix} -1 & 2 \\ -3 & 6 \end{bmatrix} ∼ \begin{bmatrix} 1 & -2 \\ 0 & 0 \end{bmatrix}, so a basis for E_{λ_1} is \left\{ \begin{bmatrix} 2 \\ 1 \end{bmatrix} \right\}.
For λ_2 = 4 we have A - λ_2I = \begin{bmatrix} -6 & 2 \\ -3 & 1 \end{bmatrix} ∼ \begin{bmatrix} 1 & -1/3 \\ 0 & 0 \end{bmatrix}, so a basis for E_{λ_2} is \left\{ \begin{bmatrix} 1 \\ 3 \end{bmatrix} \right\}.
Therefore, P = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} and D = \begin{bmatrix} -1 & 0 \\ 0 & 4 \end{bmatrix}. We compute P^{-1} = \frac{1}{5}\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix}, and so
A^{200} = PD^{200}P^{-1} = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 4^{200} \end{bmatrix} \frac{1}{5}\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 6 - 4^{200} & -2 + 2 \cdot 4^{200} \\ 3 - 3 \cdot 4^{200} & -1 + 6 \cdot 4^{200} \end{bmatrix}
Index
B-coordinates, 99
B-matrix, 137
k-plane, 11
Rn, 1
adjugate, 129
algebraic multiplicity, 145
angle, 18
augmented matrix, 32
basis, 11, 89
basis algorithm, 92
block matrix, 55
characteristic polynomial, 144
coefficient matrix, 33
coefficients, 27
cofactor, 119
cofactor matrix, 129
columnspace, 71
composition of mappings, 76
consistent, 28
coordinate vector, 99
Cramer’s Rule, 130
cross product, 22
determinant, 109, 118, 119
diagonal matrix, 139
diagonalizable, 149
dimension, 96
directed line segment, 20
dot product, 16
eigenpair, 141
eigenspace, 144
eigenvalue, 141, 142
eigenvector, 141, 142
elementary matrix, 112
elementary row operations, 35
equivalent, 32
four fundamental subspaces, 69
free variable, 38
geometric multiplicity, 145
homogeneous system, 43
hyperplane, 11
identity matrix, 54
inconsistent, 28
inverse, 107
invertible, 107
Invertible Matrix Theorem, 109, 126
kernel, 64
left inverse, 106
left nullspace, 71
length, 18
line, 11
linear combination, 3
linear equation, 27
linear mapping, 58
linearly independent, 9, 87
lower triangular, 123
matrix, 45
matrix multiplication, 51
matrix of a linear operator, 137
matrix-vector multiplication, 48, 50
norm, 18
normal vector, 20
nullspace, 69
orthogonal, 19
orthogonal set, 19
parallelotope, 135
perpendicular, 24
plane, 11
points, 1
projection, 24
range, 66
rank, 41
reduced row echelon form, 36
right inverse, 106
right-hand side, 27
rotation matrix, 62
row equivalent, 35
row reducing algorithm, 39
rowspace, 71
scalar equation, 20
scalar product, 16
similar, 141
skew-symmetric, 84
solution, 28
solution set, 30
solution space, 44
span, 6, 85
standard basis, 11
standard inner product, 16
standard matrix, 60
subspace, 82
subspace of Rn, 13
subspace test, 13, 82
system of linear equations, 28
trace, 140
transpose, 47
trivial solution, 9, 43
trivial vector space, 79
unit vector, 18
upper triangular, 123
vector equation, 5
vector space over R, 77
vectors, 1, 77