David Liu
UTM edits by Daniel Zingaro
Introduction to the Theory of
Computation
Lecture Notes and Exercises for CSC236
Department of Computer Science
University of Toronto
Contents
Introduction
Induction
Recursion
Program Correctness
Regular Languages & Finite Automata
In Which We Say Goodbye
Appendix: Runtime Analysis
Introduction
There is a common misconception held by our students, the students of
other disciplines, and the public at large about the nature of computer
science. So before we begin our study this semester, let us clear up
exactly what we will be learning: computer science is not the study of
programming, any more than chemistry is the study of test tubes or
math the study of calculators. To be sure, programming ability is a
vital tool in any computer scientist’s repertoire, but it is still a tool in
service to a higher goal.
Computer science is the study of problem-solving. Unlike other disci-
plines, where researchers use their skill, experience, and luck to solve
problems at the frontier of human knowledge, computer science asks:
What is problem-solving? How are problems solved? Why are some
problems easier to solve than others? How can we measure the quality
of a problem’s solution?
(Even the many who go into industry confront these questions on a
daily basis in their work!)
It should come as no surprise that the field of computer science
predates the invention of computers, since humans have been solving
problems for millennia. Our English word algorithm, a sequence of
steps taken to solve a problem, is named after the Persian mathemati-
cian Muhammad ibn Musa al-Khwarizmi, whose mathematics texts
were compendia of computational procedures. (The word algebra is
derived from the word al-jabr, appearing in the title of one of his
books, describing the operation of subtracting a number from both
sides of an equation.)

In 1936, Alan Turing, one of the fathers of modern computer science,
developed the Turing Machine, a theoretical model of computation which
is widely believed to be just as powerful as all programming languages
in existence today. (A little earlier, Alonzo Church, who would later
supervise Turing during the latter's graduate studies, developed the
lambda calculus, an alternative model of computation that forms the
philosophical basis for functional programming languages like Scheme,
Haskell, and ML.) In one of the earliest and most fundamental results
in computer science, Turing proved that there are some problems that
cannot be solved by any computer that has ever or will ever be built –
before computers had been invented at all!
But Why Do I Care?
A programmer’s value lies not in her ability to write code, but to un-
derstand problems and design solutions – a much harder task. Be-
ginning programmers often write code by trial and error (“Does this
compile? What if I add this line?”), which indicates not a lack of pro-
gramming experience, but a lack of design experience. When presented
with a problem, many students often jump straight to the computer,
even if they have no idea what they are going to write! And when the
code is complete, they are at a loss when asked the two fundamental
questions: Why is your code correct, and is it a good solution?
“My code is correct because it passed
all of the tests” is reasonable but unsat-
isfying. What I really want to know is
how your code works.
In this course, you will learn the skills necessary to answer both of
these questions, improving both your ability to reason about the code
you write and your ability to communicate your thinking with others.
These skills will help you design cleaner and more efficient programs,
and clearly document and present your code. Of course, like all skills,
you will practice and refine these throughout your university educa-
tion and in your careers.
Overview of this Course
The first section of the course introduces the powerful proof technique
of induction. We will see how inductive arguments can be used in
many different mathematical settings; you will master the structure
and style of inductive proofs, so that later in the course you will not
even blink when asked to read or write a “proof by induction.”
From induction, we turn our attention to the runtime analysis of
recursive programs. You have done this already for non-recursive pro-
grams, but did not have the tools necessary to handle recursion. We
will see that (mathematical) induction and (programming) recursion
are two sides of the same coin, so we use induction to make analysing
recursive programs easy as cake. (Some might even say, chocolate cake.)
After these lessons, you will always
be able to evaluate your recursive code based on its runtime, a very
important consideration!
We next turn our attention to the correctness of both recursive and
non-recursive programs. You already have some intuition about why
your programs are correct; we will teach you how to formalize this
intuition into mathematically rigorous arguments, so that you may
reason about the code you write and determine errors without the use
of testing.
This is not to say tests are unnecessary!
The methods we’ll teach you in this
course are quite tricky for larger soft-
ware systems. However, a more mature
understanding of your own code cer-
tainly facilitates finding and debugging
errors.
Finally, we will turn our attention to the simplest model of computa-
tion, the finite automaton. This serves as both an introduction to more
complex computational models like Turing Machines, and also formal
language theory through the intimate connection between finite au-
tomata and regular languages. Regular languages and automata have
many other applications in computer science, from text-based pattern
matching to modelling biological processes.
Prerequisite Knowledge
CSC236 is mainly a theoretical course, the successor to MAT102. This
is fundamentally a computer science course, though, so while math-
ematics will play an important role in our thinking, we will mainly
draw motivation from computer science concepts. Also, you will be
expected to both read and write Python code at the level of CSC148.
Here are some brief reminders about things you need to know – and
if you find that you don’t remember something, review early!
Concepts from MAT102
In MAT102, you learned how to write proofs. This is the main object
of interest in CSC236, so you should be comfortable with this style
of writing. However, one key difference is that we will not expect
(nor award marks for) a particular proof structure – indentation is
no longer required, and your proofs can be mixtures of mathematics,
English paragraphs, pseudocode, and diagrams! Of course, we will
still greatly value clear, well-justified arguments, especially since the
content will be more complex.
So a technically correct solution that is
extremely difficult to understand will
not receive full marks. Conversely, an
incomplete solution which explains
clearly partial results (and possibly
even what is left to do to complete
the solution) will be marked more
generously.
Concepts from CSC148
Recursion, recursion, recursion. If you liked using recursion in CSC148,
you’re in luck: induction, the central proof structure in this course, is
the abstract thinking behind designing recursive functions. And if you
didn’t like recursion or found it confusing, don’t worry! This course
will give you a great opportunity to develop a better feel for recursive
functions in general, and even give you programming opportunities to
get practical experience.
This is not to say you should forget everything you have done with
iterative programs; loops will be present in our code throughout this
course, and will be the central object of study for a week or two when
we discuss program correctness. In particular, you should be very
comfortable with the central design pattern of first-year Python:
computing on a list by processing its elements one at a time using a
for or while loop. (A design pattern is a common coding template which
can be used to solve a variety of different problems. "Looping through
a list" is arguably the simplest one.)
You should also be comfortable with terminology associated with
trees, which will come up occasionally throughout the course when we
discuss induction proofs.
You will also have to remember the fundamentals of Big-O algo-
rithm analysis, and how to determine tight asymptotic bounds for
common functions.
Finally, the last part of the course deals with regular languages;
you should be familiar with the terminology associated with strings,
including length, reversal, concatenation, and the empty string.
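For instance, all of these concepts correspond to one-liners in Python
(a quick refresher of ours, not new material):

s = "0010"
empty = ""             # the empty string, often written as epsilon

print(len(s))          # length: 4
print(s[::-1])         # reversal: "0100"
print(s + "11")        # concatenation: "001011"
print(len(empty))      # the empty string has length 0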
Induction
What is the sum of the numbers from 0 to n? This is a well-known
identity you’ve probably seen before:
$$\sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$
A “proof” of this is attributed to Gauss:
$$\begin{aligned}
1 + 2 + 3 + \cdots + (n-1) + n &= (1 + n) + (2 + n - 1) + (3 + n - 2) + \cdots \\
&= (n+1) + (n+1) + (n+1) + \cdots \\
&= \frac{n}{2}(n+1) \qquad \text{(since there are $\tfrac{n}{2}$ pairs)}
\end{aligned}$$
This isn't exactly a formal proof – what if n is odd? – and although it
could be made into one, this proof is based on a mathematical "trick"
that doesn't work for, say, $\sum_{i=0}^{n} i^2$. (We ignore the 0 in
the summation, since this doesn't change the sum.) And while mathematical tricks are
often useful, they’re hard to come up with in the first place! Induction
gives us a different way to tackle this problem that is astonishingly
straightforward.
A predicate is a parametrized logical statement. Another way to view
a predicate is as a function that takes in one or more arguments, and
outputs either True or False. Some examples of predicates are:
EV(n) : n is even
GR(x, y) : x > y
FROSH(a) : a is a first-year university student
Every predicate has a domain, the set of its possible input values. For
example, the above predicates could have domains N, R, and "the set
of all UofT students," respectively. (We will always use the convention
that 0 ∈ N unless otherwise specified.) Predicates give us a precise way
of formulating English problems; the predicate that is relevant to our
example is
$$P(n):\ \sum_{i=0}^{n} i = \frac{n(n+1)}{2}.$$
You might be thinking right now: "Okay, now we're going to prove
that P(n) is true." But this is wrong, because we haven't yet defined
n! So in fact we want to prove that P(n) is true for all natural numbers
n, or written symbolically, ∀n ∈ N, P(n). (A common mistake: defining
the predicate to be something like $P(n): \frac{n(n+1)}{2}$. Such an
expression is wrong and misleading because it isn't a True/False value,
and so fails to capture precisely what we want to prove.) Here is how a formal proof
might go if we were not using mathematical induction:
Proof of ∀n ∈ N, $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$:
Let n ∈N.
# Want to prove that P(n) is true.
Case 1: Assume n is even.
# Gauss' trick
...
Then P(n) is true.
Case 2: Assume n is odd.
# Gauss' trick, with a twist?
...
Then P(n) is true.
Then in all cases, P(n) is true.
Then ∀n ∈N, P(n). ■
Instead, we’re going to see how induction gives us a different, easier
way of proving the same thing.
The Induction Idea
Suppose we want to create a viral Youtube video featuring “The World’s
Longest Domino Chain!!! (like plz)".
Of course, a static image like the one featured on the right is no
good for video; instead, once we have set it up we plan on recording
all of the dominoes falling in one continuous, epic take. It took a lot
of effort to set up the chain, so we would like to make sure that it will
work; that is, that once we tip over the first domino, all the rest will
fall. Of course, with dominoes the idea is rather straightforward, since
we have arranged the dominoes precisely enough that any one falling
will trigger the next one to fall. We can express this thinking a bit more
formally:
(1) The first domino will fall (when we push it).
(2) For each domino, if it falls, then the next one in the chain will fall
(because each domino is close enough to the next one).
From these two thoughts, we can conclude that
(3) Every domino in the chain will fall.
We can apply the same reasoning to the set of natural numbers. In-
stead of “every domino in the chain will fall,” suppose we want to
prove that “for all n ∈ N, P(n) is true”, where P(n) is some predi-
cate. The analogues of the above statements in the context of natural
numbers are
(1) P(0) is true (0 is the "first" natural number). (The "is true" is
redundant, but we will often include these words for clarity.)
(2) ∀k ∈N, P(k)⇒ P(k + 1)
(3) ∀n ∈N, P(n) is true
Putting these together yields the Principle of Simple Induction (also
known as Mathematical Induction):

$$\big(P(0) \land \forall k \in \mathbb{N},\ P(k) \Rightarrow P(k+1)\big) \Rightarrow \forall n \in \mathbb{N},\ P(n)$$
A different, slightly more mathematical intuition for what induction
says is that “P(0) is true, and P(1) is true because P(0) is true, and P(2)
is true because P(1) is true, and P(3) is true because. . . ” However, it
turns out that a more rigorous proof of simple induction doesn’t ex-
ist from the basic arithmetic properties of the natural numbers alone.
Therefore mathematicians accept the principle of induction as an ax-
iom, a statement as fundamentally true as 1+ 1 = 2.
It certainly makes sense intuitively, and
turns out to be equivalent to another
fundamental math fact called the Well-
Ordering Principle.
This gives us a new way of proving a statement is true for all natural
numbers: instead of proving P(n) for an arbitrary n, just prove P(0),
and then prove the link P(k)⇒ P(k+ 1) for an arbitrary k. The former
step is called the base case, while the latter is called the induction step.
We’ll see exactly how such a proof goes by illustrating it with the
opening example.
Example. Prove that for every natural number n, $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$.
The first few induction examples
in this chapter have a great deal of
structure; this is only to help you learn
the necessary ingredients of induction
proofs. We will not be marking for a
particular structure in this course, but
you will probably find it helpful to use
our keywords to organize your proofs.
Proof. First, we define the predicate associated with this question. This
lets us determine exactly what it is we’re going to use in the induction
proof.
Step 1 (Define the Predicate): $P(n):\ \sum_{i=0}^{n} i = \frac{n(n+1)}{2}$
It’s easy to miss this step, but without it, often you’ll have trouble
deciding precisely what to write in your proofs.
Step 2 (Base Case): n = 0. We would like to prove that P(0) is true.
Recall the meaning of P:
$$P(0):\ \sum_{i=0}^{0} i = \frac{0(0+1)}{2}.$$
This statement is trivially true, because both sides of the equation are
equal to 0.
(For induction proofs, the base case is usually a very straightforward
proof. In fact, if you find yourself stuck on the base case, then it is
likely that you've misunderstood the question and/or are trying to
prove the wrong predicate.)

Step 3 (Induction Step): the goal is to prove that ∀k ∈ N, P(k) ⇒
P(k + 1). Let k ∈ N be some arbitrary natural number, and assume
P(k) is true. This antecedent assumption has a special name: the In-
duction Hypothesis. Explicitly, we assume that
$$\sum_{i=0}^{k} i = \frac{k(k+1)}{2}.$$
Now, we want to prove that P(k + 1) is true, i.e., that

$$\sum_{i=0}^{k+1} i = \frac{(k+1)(k+2)}{2}.$$
This can be done with a simple calculation:
$$\begin{aligned}
\sum_{i=0}^{k+1} i &= \left(\sum_{i=0}^{k} i\right) + (k+1) && \text{(we break up the sum by removing the last element)} \\
&= \frac{k(k+1)}{2} + (k+1) && \text{(by the Induction Hypothesis)} \\
&= (k+1)\left(\frac{k}{2} + 1\right) \\
&= \frac{(k+1)(k+2)}{2}
\end{aligned}$$
Therefore P(k + 1) holds. This completes the proof of the induction
step: ∀k ∈ N, P(k) ⇒ P(k + 1). (The one structural requirement we do
have for this course is that you must always state exactly where you
use the induction hypothesis. We expect to see the words "by the
induction hypothesis" at least once in each of your proofs.)
Finally, by the Principle of Simple Induction, we can conclude that
∀n ∈N, P(n). ■
In our next example, we look at a geometric problem – notice how
our proof will use no algebra at all, but instead constructs an argu-
ment from English statements and diagrams. This example is also
interesting because it shows how to apply simple induction starting at
a number other than 0.
Example. A triomino is a three-square L-shaped figure. To the right,
we show a 4-by-4 chessboard with one corner missing that has been
tiled with triominoes.
Prove that for all n ≥ 1, any $2^n$-by-$2^n$ chessboard with one corner
missing can be tiled with triominoes.
Proof. Predicate: P(n): Any $2^n$-by-$2^n$ chessboard with one corner
missing can be tiled with triominoes.
Base Case: This is slightly different, because we only want to prove
the claim for n ≥ 1 (and ignore n = 0). Therefore our base case is
n = 1, i.e., this is the “start” of our induction chain. When n = 1,
we consider a 2-by-2 chessboard with one corner missing. But such a
chessboard is exactly the same shape as a triomino, so of course it can
be tiled by triominoes!
(Again, a rather trivial base case. Keep in mind that even though it
was simple, the proof would have been incomplete without it!)

Induction Step: Let k ≥ 1 and suppose that P(k) holds; that is,
that every $2^k$-by-$2^k$ chessboard with one corner missing can be tiled by
triominoes. (This is the Induction Hypothesis.) The goal is to show that
any $2^{k+1}$-by-$2^{k+1}$ chessboard with one corner missing can be tiled by
triominoes.

Consider an arbitrary $2^{k+1}$-by-$2^{k+1}$ chessboard with one corner
missing. Divide it into quarters, each quarter a $2^k$-by-$2^k$ chessboard.
Exactly one of these has one corner missing; by the Induction Hy-
pothesis, this quarter can be tiled by triominoes. Next, place a single
triomino in the middle that covers one corner in each of the three re-
maining quarters.
Each of these quarters now has one corner covered, and by the I.H.
again, they can each be tiled by triominoes. This completes the tiling
of the $2^{k+1}$-by-$2^{k+1}$ chessboard. ■ (Note that in this proof, we used
the induction hypothesis twice! Or technically, 4 times, one for each
$2^k$-by-$2^k$ quarter.)

Before moving on, here is some intuition behind what we did in the
previous two examples. Given a problem of a $2^n$-by-$2^n$ chessboard,
we repeatedly broke it up into smaller and smaller parts, until we
reached the 2-by-2 size, which we could tile using just a single tri-
omino. This idea of breaking down the problem into smaller ones
“again and again” was a clear sign that a formal proof by induction
was the way to go. Be on the lookout for phrases like “repeat over and
over” in your own thinking to signal that you should be using induc-
tion. (In your programming, this is the same sign that points to using
recursive solutions as the easiest approach.) In the opening example,
we used an even more specific approach:
in the induction step, we took the sum of size k+ 1 and reduced it to a
sum of size k, and evaluated that using the induction hypothesis. The
cornerstone of simple induction is this link between problem instances
of size k and size k + 1, and this ability to break down a problem into
something exactly one size smaller.
Example. Consider the sequence of natural numbers satisfying the fol-
lowing properties: $a_0 = 1$, and for all n ≥ 1, $a_n = 2a_{n-1} + 1$. Prove
that for all n ∈ N, $a_n = 2^{n+1} - 1$.
We will see in the next chapter one way
of discovering this expression for an.
Proof. The predicate we will prove is

$$P(n):\ a_n = 2^{n+1} - 1.$$

The base case is n = 0. By the definition of the sequence, $a_0 = 1$, and
$2^{0+1} - 1 = 2 - 1 = 1$, so P(0) holds.

For the induction step, let k ∈ N and suppose $a_k = 2^{k+1} - 1$. Our
goal is to prove that P(k + 1) holds. By the recursive property of the
sequence,

$$\begin{aligned}
a_{k+1} &= 2a_k + 1 \\
&= 2(2^{k+1} - 1) + 1 && \text{(by the I.H.)} \\
&= 2^{k+2} - 2 + 1 \\
&= 2^{k+2} - 1 \qquad \blacksquare
\end{aligned}$$
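As a quick sanity check – not a proof, of course – we can iterate the
recurrence in Python and compare it against the closed form (this
snippet is ours, not part of the original notes):

a = 1  # a_0 = 1
for n in range(20):
    assert a == 2 ** (n + 1) - 1, f"closed form fails at n = {n}"
    a = 2 * a + 1  # a_{n+1} = 2 * a_n + 1
print("closed form matches for n = 0, ..., 19")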
When Simple Induction Isn’t Enough
By this point, you have done several examples using simple induc-
tion. Recall that the intuition behind this proof technique is to reduce
problems of size k+ 1 to problems of size k (where “size” might mean
the value of a number, or the size of a set, or the length of a string,
etc.). However, for many problems there is no natural way to reduce
problem sizes just by 1. Consider, for example, the following problem:
Prove that every natural number greater than 1 has a prime
factorization, i.e., can be written as a product of primes.

(Every prime can be written as a product of just one number: itself!)
How would you go about proving the induction step, using the
method we’ve used so far? That is, how would you prove P(k) ⇒
P(k + 1)? This is a very hard question to answer, because even the
prime factorizations of consecutive numbers can be completely differ-
ent!
E.g., 210 = 2 · 3 · 5 · 7, but 211 is prime.
But if I asked you to solve this question by “breaking the problem
down,” you would come up with the idea that if k + 1 is not prime,
then we can write k+ 1 = a · b, where a, b < k+ 1, and we can “do this
recursively” until we’re left with a product of primes. Since we always
identify recursion with induction, this hints at a more general form of
induction that we can use to prove this statement.
Complete Induction
Recall the intuitive “chain of reasoning” that we do with simple in-
duction: first we prove P(0), and then use P(0) to prove P(1), then
use P(1) to prove P(2), etc. So when we get to k + 1, we try to prove
P(k + 1) using P(k), but we have already gone through proving P(0),
P(1), . . . , and P(k − 1), in addition to P(k)! In some sense, in Sim-
ple Induction we’re throwing away all of our previous work except for
P(k). In Complete Induction, we keep this work and use it in our proof
of the induction step. Here is the formal statement of the Principle
of Complete Induction:

$$\big(P(0) \land \forall k,\ (P(0) \land P(1) \land \cdots \land P(k)) \Rightarrow P(k+1)\big) \Rightarrow \forall n,\ P(n)$$
The only difference between Complete and Simple Induction is in
the antecedent of the inductive part: instead of assuming just P(k), we
now assume all of P(0), P(1), . . . , P(k). Since these are assumptions we
get to make in our proofs, Complete Induction proofs are often more
flexible than Simple Induction proofs — intuitively, because we have
“more to work with.”
Somewhat surprisingly, everything we
can prove with Complete Induction
we can also prove with Simple Induc-
tion, and vice versa. So these proof
techniques are equally powerful.
Azadeh Farzan gave a great analogy
for the two types of induction. Simple
Induction is like a person climbing
stairs step by step; Complete Induction
is like a robot with multiple giant legs,
capable of jumping from any lower step
to a higher step.
Let’s illustrate this (slightly different) technique by proving the ear-
lier claim about prime factorizations.
Example. Prove that every natural number greater than 1 has a prime
factorization.
Proof. Predicate: P(n): "There are primes $p_1, p_2, \ldots, p_m$ (for some m ≥
1) such that $n = p_1 p_2 \cdots p_m$." We will show that ∀n ≥ 2, P(n).

Base Case: n = 2. Since 2 is prime, we can let $p_1 = 2$ and say that
$n = p_1$, so P(2) holds.
Induction Step: Here is the only structural difference for Complete
Induction proofs. We let k ≥ 2, and our induction hypothesis is now
to assume that for all 2 ≤ i ≤ k, P(i) holds. (That is, we’re assuming
P(2), P(3), P(4), . . . , P(k) are all true.) The goal is still the same: prove
that P(k + 1) is true.
There are two cases. In the first case, assume k + 1 is prime. Then
of course k + 1 can be written as a product of primes, so P(k + 1) is
true. (The product contains a single number, k + 1.)
In the second case, k + 1 is composite. But then by the definition of
compositeness, there exist a, b ∈N such that k+ 1 = ab and 2 ≤ a, b ≤
k; that is, k + 1 has factors other than 1 and itself. This is the intuition
from earlier. And here is the “recursive thinking”: by the induction
hypothesis, P(a) and P(b) hold. (We can only use the induction
hypothesis because a and b are at least 2 and less than k + 1.)
Therefore we can write

$$a = q_1 \cdots q_{l_1} \quad \text{and} \quad b = r_1 \cdots r_{l_2},$$
where each of the q’s and r’s is prime. But then
$$k + 1 = ab = q_1 \cdots q_{l_1} r_1 \cdots r_{l_2},$$
and this is the prime factorization of k + 1. So P(k + 1) holds. ■
Note that we used inductive thinking to break down the problem;
but unlike Simple Induction where the size of the subproblem is one
less than the current problem size, we didn’t know much about the
sizes of the resulting problems (only that they were smaller than the
original problem). Complete Induction allows us to handle this sort of
structure.
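This "recursive thinking" translates directly into a recursive
algorithm. Here is a minimal Python sketch of ours (the name
prime_factors and the use of trial division to find a factor are our
own choices, not from the notes):

def prime_factors(n):
    """Return a list of primes whose product is n, for n >= 2.

    Mirrors the induction step: if n has a factor a with 2 <= a < n,
    split n = a * (n // a) and recurse on both (smaller) parts;
    otherwise n is prime and is its own factorization.
    """
    for a in range(2, int(n ** 0.5) + 1):
        if n % a == 0:
            return prime_factors(a) + prime_factors(n // a)
    return [n]

print(prime_factors(210))  # [2, 3, 5, 7]
print(prime_factors(211))  # [211], since 211 is prime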
Example. The Fibonacci sequence is a sequence of natural numbers de-
fined recursively as $f_1 = f_2 = 1$, and for all n ≥ 3, $f_n = f_{n-1} + f_{n-2}$.
Prove that for all n ≥ 1,

$$f_n = \frac{\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n}{\sqrt{5}}.$$
Proof. Note that we really need complete induction here (and not just
simple induction) because $f_n$ is defined in terms of both $f_{n-1}$ and
$f_{n-2}$, and not just $f_{n-1}$ only.

The predicate we will prove is

$$P(n):\ f_n = \frac{\left(\frac{1+\sqrt{5}}{2}\right)^n - \left(\frac{1-\sqrt{5}}{2}\right)^n}{\sqrt{5}}.$$

We require two base cases: one for n = 1, and one for n = 2. These can
be checked by simple calculations:

$$\frac{\left(\frac{1+\sqrt{5}}{2}\right)^1 - \left(\frac{1-\sqrt{5}}{2}\right)^1}{\sqrt{5}} = \frac{\frac{1+\sqrt{5}}{2} - \frac{1-\sqrt{5}}{2}}{\sqrt{5}} = \frac{\sqrt{5}}{\sqrt{5}} = 1 = f_1$$

$$\frac{\left(\frac{1+\sqrt{5}}{2}\right)^2 - \left(\frac{1-\sqrt{5}}{2}\right)^2}{\sqrt{5}} = \frac{\frac{6+2\sqrt{5}}{4} - \frac{6-2\sqrt{5}}{4}}{\sqrt{5}} = \frac{\sqrt{5}}{\sqrt{5}} = 1 = f_2$$
For the induction step, let k ≥ 2 and assume P(1), P(2), . . . , P(k) hold.
Consider $f_{k+1}$. By the recursive definition, we have

$$\begin{aligned}
f_{k+1} &= f_k + f_{k-1} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^k - \left(\frac{1-\sqrt{5}}{2}\right)^k}{\sqrt{5}} + \frac{\left(\frac{1+\sqrt{5}}{2}\right)^{k-1} - \left(\frac{1-\sqrt{5}}{2}\right)^{k-1}}{\sqrt{5}} && \text{(by I.H.)} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^k + \left(\frac{1+\sqrt{5}}{2}\right)^{k-1}}{\sqrt{5}} - \frac{\left(\frac{1-\sqrt{5}}{2}\right)^k + \left(\frac{1-\sqrt{5}}{2}\right)^{k-1}}{\sqrt{5}} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^{k-1}\left(\frac{1+\sqrt{5}}{2} + 1\right)}{\sqrt{5}} - \frac{\left(\frac{1-\sqrt{5}}{2}\right)^{k-1}\left(\frac{1-\sqrt{5}}{2} + 1\right)}{\sqrt{5}} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^{k-1} \cdot \frac{6+2\sqrt{5}}{4}}{\sqrt{5}} - \frac{\left(\frac{1-\sqrt{5}}{2}\right)^{k-1} \cdot \frac{6-2\sqrt{5}}{4}}{\sqrt{5}} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^{k-1}\left(\frac{1+\sqrt{5}}{2}\right)^2}{\sqrt{5}} - \frac{\left(\frac{1-\sqrt{5}}{2}\right)^{k-1}\left(\frac{1-\sqrt{5}}{2}\right)^2}{\sqrt{5}} \\
&= \frac{\left(\frac{1+\sqrt{5}}{2}\right)^{k+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{k+1}}{\sqrt{5}} \qquad \blacksquare
\end{aligned}$$
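Once again, purely as an empirical check (and only approximate, since
floating-point arithmetic is involved – this sketch is ours, not the
notes'), we can compare the recursive definition against the closed
form:

import math

PHI = (1 + math.sqrt(5)) / 2
PSI = (1 - math.sqrt(5)) / 2

def fib(n):
    """The Fibonacci sequence, directly from the recursive definition."""
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)

for n in range(1, 20):
    closed_form = (PHI ** n - PSI ** n) / math.sqrt(5)
    assert abs(closed_form - fib(n)) < 1e-6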
Beyond Numbers
So far, our proofs have all been centred on natural numbers. Even in
situations where we have proved statements about other objects — like
sets and chessboards — our proofs have always required associating
these objects with natural numbers. Consider the following problem:
Prove that any non-empty binary tree has exactly one more
node than edge.
We could use either simple or complete induction on this problem
by associating every tree with a natural number (height and number
of nodes are two of the most common). But this is not the most “nat-
ural” way of approaching this problem (though it’s perfectly valid!)
because binary trees already have a lot of nice recursive structure that
we should be able to use directly, without shoehorning in natural num-
bers. What we want is a way of proving statements about objects other
than numbers. Thus we move away from N (the set of natural num-
bers), to more general sets (such as the set of all non-empty binary
trees).
Recursive Definitions of Sets
You are already familiar with many descriptions of sets: $\{2, \pi, \sqrt{10}\}$,
{x ∈ R | x ≥ 4}, and "the set of all non-empty binary trees" are all
perfectly valid descriptions of sets. Unfortunately, these set descrip-
tions don’t lend themselves very well to induction, because induction
is recursion and it isn’t clear how to apply recursive thinking to any
of these descriptions. However, for some objects – like binary trees
– it is relatively straightforward to define them recursively. Here’s a
warm-up.
Example. Suppose we want to construct a recursive definition of N.
Here is one way. Define N to be the (smallest) set such that:

• 0 ∈ N
• If k ∈ N, then k + 1 ∈ N

(The "smallest" means that nothing else is in N. This is an important
point to make; for example, the set of integers Z also satisfies the
given properties, but includes more than N. In the recursive
definitions below, we omit "smallest" but it is always implicitly
there.)
Notice how similar this definition looks to the Principle of Simple
Induction! This isn’t a coincidence: induction fundamentally makes
use of this recursive structure of N. We’ll refer to the first rule as the
base of the definition, and the second as the recursive rule. In general, a
recursive definition can have multiple base and recursive rules!
Example. Construct a recursive definition of “the set of all non-empty
binary trees.”
Intuitively, the base rule(s) always capture the smallest or simplest
elements of a set. Certainly the smallest non-empty binary tree is a
single node.
What about larger trees? This is where “breaking down” problems
into smaller subproblems makes the most sense. You should know
from CSC148 that we really store binary trees in a recursive manner:
every tree has a root node and links to the roots of the left and right
subtrees (the suggestive word here is “subtree.”) One slight subtlety
is that one or both of these subtrees could be empty. Here is a for-
mal recursive definition (before you read it, try coming up with one
yourself!):
• A single node is a non-empty binary tree.
• If T1, T2 are two non-empty binary trees, then the tree with a new
root r connected to the roots of T1 and T2 is a non-empty binary
tree.
• If T1 is a non-empty binary tree, then the tree with a new root r
connected to the root of T1 to the left or to the right is a non-empty
binary tree.
Notice that this definition has two recursive rules, not one!
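To make the definition concrete, here is one way it might be mirrored
in Python – our own illustrative sketch (the class and method names are
not from the notes), with node and edge counts anticipating the claim
proved in the next section:

class BinaryTree:
    """A non-empty binary tree, built exactly by the rules above:
    BinaryTree() is a single node (the base rule); passing one or two
    subtrees applies one of the two recursive rules."""
    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right

    def num_nodes(self):
        return 1 + sum(t.num_nodes() for t in (self.left, self.right) if t)

    def num_edges(self):
        # one edge from the root to each existing subtree
        return sum(1 + t.num_edges() for t in (self.left, self.right) if t)

# A root with a left leaf, and a right subtree that itself has one child.
t = BinaryTree(BinaryTree(), BinaryTree(left=BinaryTree()))
assert t.num_nodes() == t.num_edges() + 1  # the claim proved below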
Structural Induction
Now, we mimic the format of our induction proofs, but with the recur-
sive definition of non-empty binary trees rather than natural numbers.
The similarity of form is why this type of proof is called structural
induction. In particular, notice the identical terminology.
Example. Prove that every non-empty binary tree has one more node
than edge.
Proof. As before, we need to define a predicate to nail down exactly
what it is we’d like to prove. However, unlike all of the previous
predicates we’ve seen, which have been boolean functions on natural
numbers, now the domain of the predicate is the set of all non-empty
binary trees.
Predicate: P(T): T has one more node than edge.
We will use structural induction to prove that for every non-empty
binary tree T, P(T) holds.
Note that here the domain of the pred-
icate is NOT N, but instead the set of
non-empty binary trees.
Base Case: Our base case is determined by the first rule. Suppose
T is a single node. Then it has one node and no edges, so P(T) holds.
Induction Step: We’ll divide our proof into two parts, one for each
recursive rule.
• Let $T_1$ and $T_2$ be two non-empty binary trees, and assume $P(T_1)$
and $P(T_2)$ hold. (This is the induction hypothesis.) Let T be the tree
constructed by attaching a node r to the roots of $T_1$ and $T_2$. Let
V(G) and E(G) denote the number of nodes and edges in a tree G,
respectively. Then we have the equations

$$V(T) = V(T_1) + V(T_2) + 1$$
$$E(T) = E(T_1) + E(T_2) + 2$$
since one extra node (new root r) and two extra edges (from r to
the roots of $T_1$ and $T_2$) were added to form T. By the induction
hypothesis, $V(T_1) = E(T_1) + 1$ and $V(T_2) = E(T_2) + 1$, and so

$$\begin{aligned}
V(T) &= E(T_1) + 1 + E(T_2) + 1 + 1 \\
&= E(T_1) + E(T_2) + 2 + 1 \\
&= E(T) + 1
\end{aligned}$$

Therefore P(T) holds.

• Let $T_1$ be a non-empty binary tree, and suppose $P(T_1)$ holds. Let T
be the tree formed by taking a new node r and adding an edge to
the root of $T_1$. Then $V(T) = V(T_1) + 1$ and $E(T) = E(T_1) + 1$, and
since $V(T_1) = E(T_1) + 1$ (by the induction hypothesis), we have
$V(T) = E(T_1) + 2 = E(T) + 1$. ■
In structural induction, we identify some property that is satisfied
by the simplest (base) elements of the set, and then show that the
property is preserved under each of the recursive construction rules.
(We say that such a property is invariant under the recursive rules,
meaning it isn't affected when the rules are applied. The term
"invariant" will reappear throughout this course in different contexts.)

Here is some intuition: imagine you have a set of Lego blocks. Start-
ing with individual Lego pieces, there are certain “rules” that you can
use to combine Lego objects to build larger and larger structures, cor-
responding to (say) different ways of attaching Lego pieces together.
This is a recursive way of describing the (infinite!) set of all possible
Lego creations.
Now suppose you’d like to make a perfectly spherical object, like
a soccer ball or the Death Star. Unfortunately, you look in your Lego
kit and all you see are rectangular pieces! Naturally, you complain
to your mother (who bought the kit for you) that you’ll be unable to
make a perfect sphere using the kit. But she remains unconvinced:
maybe you should try doing it, she suggests, and if you’re lucky you’ll
come up with a clever way of arranging the pieces to make a sphere.
Aha! This is impossible, since you’re starting with non-spherical pieces,
and you (being a Lego expert) know that no matter which way you
combine Lego objects together, starting with rectangular objects yields
only other rectangular objects as results. So even though there are
many, many different rectangular structures you could build, none of
them could ever be perfect spheres.
A Larger Example
Let us turn our attention to another useful example of induction: prov-
ing the equivalence of recursive and non-recursive definitions. We
know from our study of Python that often problems can be solved
using either recursive or iterative programs (although often a
particular problem lends itself more to one technique than the other),
but we've taken it for granted that these programs really can
accomplish the same task. We'll
look later in this course at proving things about what programs do, but
for a warm-up in this section, we’ll step back from programs and prove
a similar type of mathematical result.
Example. Consider the following recursively defined set S ⊆ N × N:

• (0, 0) ∈ S
• If (a, b) ∈ S, then both (a + 1, b + 1) ∈ S and (a + 3, b) ∈ S

(Again, there are two recursive rules here.)

Also, define the set S′ = {(x, y) ∈ N × N | x ≥ y ∧ 3 | x − y}, where
3 | x − y means that x − y is divisible by 3. Prove that these two
definitions are equivalent, i.e., S = S′.
Proof. We divide our solution into two parts. First, we show using
structural induction that S ⊆ S′; that is, every element of S satisfies
the property of S′. Then, we prove using complete induction that
S′ ⊆ S; that is, every element of S′ can be constructed from the base
and recursive rules of S.
Part 1: S ⊆ S′. In this part, we show that the base case of S is in S′,
and that all elements generated using the recursive rules of S are also
in S′. For clarity, we define the predicate
P(x, y) : x ≥ y ∧ 3 | x− y
The only base element of S is (0, 0). Clearly, P(0, 0) is true, as 0 ≥ 0
and 3 | 0.
Now for the induction step. There are two recursive rules for S. Let
(a, b) ∈ S, and suppose P(a, b) holds. Consider (a + 1, b + 1). By the
induction hypothesis, a ≥ b, and so a+ 1 ≥ b+ 1. Also, (a+ 1)− (b+
1) = a− b, which is divisible by 3 (again by the I.H.). So P(a+ 1, b+ 1)
also holds.
Finally, consider (a + 3, b). Since a ≥ b (by the I.H.), a + 3 ≥ b.
Also, since 3 | a− b (again by the I.H.), we can write a− b = 3k. Then
(a + 3)− b = 3(k + 1), so 3 | (a + 3)− b. Therefore, P(a + 3, b) holds.
Part 2: S′ ⊆ S. We would like to use complete induction, but we can
only apply that technique to natural numbers, and not pairs of natural
numbers. So we need to associate each pair (a, b) with a single natural
number. We can do this by considering the sum of the pair. We define
the following predicate:
P(n) : for every (x, y) ∈ S′ such that x + y = n, (x, y) ∈ S.
It should be clear that proving ∀n ∈ N, P(n) is equivalent to proving
that S′ ⊆ S. We will prove the former using complete induction.
The base case is n = 0. The only element of S′ whose (x, y) sums
to 0 is (0, 0), which is certainly in S by the base rule of the recursive
definition.
Now let k ∈ N, and suppose P(0), P(1), . . . , P(k) all hold. Let
(x, y) ∈ S′ such that x + y = k + 1. We will prove that (x, y) ∈ S.
There are two cases to consider:
• y > 0. Then since x ≥ y, x > 0. (The > 0 checks ensure that
x − 1, y − 1 ∈ N.) Then (x − 1, y − 1) ∈ S′, and (x − 1) + (y − 1) =
k − 1. By the Induction Hypothesis (in particular, P(k − 1)),
(x − 1, y − 1) ∈ S. Then (x, y) ∈ S by applying the first
recursive rule in the definition of S.
• y = 0. Since k + 1 > 0, it must be the case that x > 0. Then since
x− y = x, x must be divisible by 3, and so x ≥ 3. Then (x− 3, y) ∈
S′ and (x − 3) + y = k − 2, so by the Induction Hypothesis (in
particular, P(k− 2)), (x− 3, y) ∈ S. Applying the second recursive
rule in the definition of S shows that (x, y) ∈ S. ■
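For the skeptical reader, here is a small Python experiment of ours
(not part of the notes) that generates elements of S by applying its
two recursive rules, and confirms that each one satisfies the defining
property of S′:

from collections import deque

def in_S_prime(x, y):
    return x >= y and (x - y) % 3 == 0

# Generate all elements of S with x + y <= 15 by applying the two rules.
found = {(0, 0)}
queue = deque(found)
while queue:
    a, b = queue.popleft()
    for pair in ((a + 1, b + 1), (a + 3, b)):
        if pair not in found and sum(pair) <= 15:
            found.add(pair)
            queue.append(pair)

assert all(in_S_prime(x, y) for (x, y) in found)  # S ⊆ S′, empirically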
Exercises
1. Prove that for all n ∈ N, $\sum_{i=0}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}$.
2. Let a ∈ R, a ≠ 1. Prove that for all n ∈ N, $\sum_{i=0}^{n} a^i = \frac{a^{n+1} - 1}{a - 1}$.
3. Prove that for all n ≥ 1, $\sum_{k=1}^{n} \frac{1}{k(k+1)} = \frac{n}{n+1}$.
4. Prove that ∀n ∈ N, the units digit of $4^n$ is either 1, 4, or 6.
5. Prove that ∀n ∈ N, $3 \mid 4^n - 1$, where "m | n" means that m divides
n, or equivalently that n is a multiple of m. This can be expressed
algebraically as ∃k ∈ N, n = mk.
6. Prove that for all n ≥ 2, $2^n + 3^n < 4^n$.
7. Let m ∈ N. Prove that for all n ∈ N, $m \mid (m+1)^n - 1$.
8. Prove that ∀n ∈ N, $n^2 \le 2^n + 1$. Hint: first prove, without using
induction, that $2n + 1 \le n^2 - 1$ for n ≥ 3.
9. Find a natural number k ∈ N such that for all n ≥ k, $n^3 + n < 3^n$.
Then, prove this result using simple induction.
10. Prove that $3^n < n!$ for all n > 6.
11. Prove that for every n ∈ N, every set of size n has exactly $2^n$ subsets.
12. Find formulas for the number of even-sized subsets and odd-sized
subsets of a set of size n. Prove that your formulas are correct in a
single induction argument. (So your predicate should be something like
"every set of size n has ... even-sized subsets and ... odd-sized
subsets.")
13. Prove, using either simple or complete induction, that any binary
string begins and ends with the same character if and only if it
contains an even number of occurrences of substrings from {01, 10}.
(A binary string is a string containing only 0's and 1's.)
14. A ternary tree is a tree where each node has at most 3 children. Prove
that for every n ≥ 1, every non-empty ternary tree of height n has
at most $(3^n - 1)/2$ nodes.
15. Let a ∈ R, a ≠ 1. Prove that for all n ∈ N,

$$\sum_{i=0}^{n} i \cdot a^i = \frac{n \cdot a^{n+2} - (n+1) \cdot a^{n+1} + a}{(a-1)^2}.$$

Challenge: can you mathematically derive this formula by starting
from the standard geometric identity?
16. Recall two standard trigonometric identities:
cos(x + y) = cos(x) cos(y)− sin(x) sin(y)
sin(x + y) = sin(x) cos(y) + cos(x) sin(y)
Also recall the definition of the imaginary number $i = \sqrt{-1}$. Prove,
using induction, that

$$(\cos(x) + i \sin(x))^n = \cos(nx) + i \sin(nx).$$
17. The Fibonacci sequence is an infinite sequence of natural numbers
f1, f2, . . . with the following recursive definition:
$$f_i = \begin{cases} 1, & \text{if } i = 1, 2 \\ f_{i-1} + f_{i-2}, & \text{if } i > 2 \end{cases}$$
(a) Prove that for all n ≥ 1, $\sum_{i=1}^{n} f_i = f_{n+2} - 1$.
(b) Prove that for all n ≥ 1, $\sum_{i=1}^{n} f_{2i-1} = f_{2n}$.
(c) Prove that for all n ≥ 2, $f_n^2 - f_{n+1} f_{n-1} = (-1)^{n-1}$.
(d) Prove that for all n ≥ 1, $\gcd(f_n, f_{n+1}) = 1$. (You may use the
fact that for all a < b, gcd(a, b) = gcd(a, b − a).)
(e) Prove that for all n ≥ 1, $\sum_{i=1}^{n} f_i^2 = f_n f_{n+1}$.
18. A full binary tree is a non-empty binary tree where every node has
exactly 0 or 2 children. Equivalently, every internal node (non-leaf)
has exactly two children.
(a) Prove using complete induction that every full binary tree has an
odd number of nodes. (You can choose to do induction on either the
height or number of nodes in the tree. A solution with simple
induction is also possible, but less generalizable.)
(b) Prove using complete induction that every full binary tree has
exactly one more leaf than internal nodes.
(c) Give a recursive definition for the set of all full binary trees.
(d) Reprove parts (a) & (b) using structural induction instead of com-
plete induction.
19. Consider the set of binary trees with the following property: for
each node, the heights of its left and right children differ by at most
1. Prove that every binary tree with this property of height n has at
least $(1.5)^n - 1$ nodes.
20. Let k > 1. Prove that for all n ∈ N,

$$\left(1 - \frac{1}{k}\right)^n \ge 1 - \frac{n}{k}.$$
21. Consider the following recursively defined function f : N → N.

$$f(n) = \begin{cases} 2, & \text{if } n = 0 \\ 7, & \text{if } n = 1 \\ (f(n-1))^2 - f(n-2), & \text{if } n \ge 2 \end{cases}$$

Prove that for all n ∈ N, 3 | f(n) − 2. (It will be helpful to phrase
your predicate here as ∃k ∈ N, f(n) = 3k + 2.)
22. Prove that every natural number greater than 1 can be written as
the sum of prime numbers.
23. We define the set S of strings over the alphabet {[, ]} recursively by
• ϵ, the empty string, is in S
• If w ∈ S, then so is [w]
• If x, y ∈ S, then so is xy
Prove that every string in S is balanced, i.e., the number of left brack-
ets equals the number of right brackets.
24. The Fibonacci trees $T_n$ are a special set of binary trees defined
recursively as follows.
• $T_1$ and $T_2$ are binary trees with only a single node.
• For n > 2, $T_n$ consists of a root node whose left subtree is $T_{n-1}$,
and whose right subtree is $T_{n-2}$.
(a) Prove that for all n ≥ 2, the height of $T_n$ is n − 2.
(b) Prove that for all n ≥ 1, $T_n$ has $f_n$ leaves, where $f_n$ is the n-th
Fibonacci number.
25. Consider the following recursively defined set S ⊆ N.
• 2 ∈ S
• If k ∈ S, then $k^2$ ∈ S
• If k ∈ S, and k ≥ 2, then $\frac{k}{2}$ ∈ S
(a) Prove that every element of S is a power of 2, i.e., can be written
in the form $2^m$ for some m ∈ N.
(b) Prove that every power of 2 (including $2^0$) is in S.
26. Consider the set S ⊂ N² of ordered pairs of integers defined by the
following recursive definition:
• (3, 2) ∈ S
• If (x, y) ∈ S, then (3x− 2y, x) ∈ S
Also consider the set S′ ⊂ N² with the following non-recursive
definition:

$$S' = \{(2^{k+1} + 1,\ 2^k + 1) \mid k \in \mathbb{N}\}.$$
Prove that S = S′, or in other words, that the recursive and non-
recursive definitions agree.
27. We define the set of propositional formulas PF as follows:
• Any proposition P is a propositional formula.
• If F is a propositional formula, then so is ¬F.
• If F1 and F2 are propositional formulas, then so are F1∧ F2, F1∨ F2,
F1 ⇒ F2, and F1 ⇔ F2.
Prove that for all propositional formulas F, F has a logically equiv-
alent formula G such that G only has negations applied to proposi-
tions. For example, we have the equivalence
¬(¬(P ∧Q)⇒ R)⇐⇒ (¬P ∨ ¬Q) ∧ ¬R
Hint: you won’t have much luck applying induction directly to the
statement in the question. (Try it!) Instead, prove the stronger state-
ment: “F and ¬F have equivalent formulas that only have negations
applied to propositions.”
28. It is well-known that Facebook friendships are the most important
relationships you will have in your lifetime. For a person x on Face-
book, let $f_x$ denote the number of friends x has. Find a relationship
between the total number of Facebook friendships in the world, and
the sum of all of the $f_x$'s (over every person on Facebook). Prove
your relationship using induction.
29. Consider the following 1-player game. We start with n pebbles in
a pile, where n ≥ 1. A valid move is the following: pick a pile with
more than 1 pebble, and divide it into two smaller piles. When this
happens, add to your score the product of the sizes of the two new
piles. Continue making moves until no more can be made, i.e., there
are n piles each containing a single pebble.
Prove using complete induction that no matter how the player makes
her moves, she will always score $\frac{n(n-1)}{2}$ points when playing
this game with n pebbles. (So this game is completely determined by the
starting conditions, and not at all by the player's choices. Sounds
fun.)
30. A certain summer game is played with n people, each carrying one
water balloon. The players walk around randomly on a field until
a buzzer sounds, at which point they stop. You may assume that
when the buzzer sounds, each player has a unique closest neigh-
bour. After stopping, each player then throws his water balloon at
their closest neighbour. The winners of the game are the players
who are dry after the water balloons have been thrown (assume
everyone has perfect aim).
Prove that for every odd n, this game always has at least one winner.
The following problems are for the more mathematically-inclined
students.
1. The Principle of Double Induction is as follows. Suppose that P(x, y)
is a predicate with domain N2 satisfying the following properties:
(1) P(0, y) holds for all y ∈N
(2) For all (x, y) ∈N2, if P(x, y) holds, then so does P(x + 1, y).
Then we may conclude that for all x, y ∈N, P(x, y).
Prove that the Principle of Double Induction is equivalent to the
Principle of Simple Induction.
2. Prove that for all n ≥ 1, and positive real numbers $x_1, \ldots, x_n \in \mathbb{R}^+$,

$$\frac{1 - x_1}{1 + x_1} \times \frac{1 - x_2}{1 + x_2} \times \cdots \times \frac{1 - x_n}{1 + x_n} \ge \frac{1 - S}{1 + S},$$

where $S = \sum_{i=1}^{n} x_i$.
3. A unit fraction is a fraction of the form $\frac{1}{n}$, n ∈ Z⁺. Prove that
every rational number $0 < \frac{p}{q} < 1$ can be written as the sum of
distinct unit fractions.
Recursion
Now, programming! In this chapter, we will apply what we’ve learned
about induction to study recursive algorithms. In particular, we will
learn how to analyse the time complexity of recursive programs, for
which the runtime on an input of size n depends on the runtime on
smaller inputs. Unsurprisingly, this is tightly connected to the study
of recursively defined (mathematical) functions; we will discuss how
to go from a recurrence relation like f (n + 1) = f (n) + f (n − 1) to
a closed form expression like $f(n) = 2^n + n^2$. For recurrences of a
special form, we will see how the Master Theorem gives us immediate,
tight asymptotic bounds. (Recall that asymptotic bounds involve Big-O,
and are less precise than exact expressions.) These recurrences will be
used for divide-and-conquer algorithms; you will gain experience with
this common algorithmic paradigm and even design algorithms of your
own.
Measuring Runtime
Recall that one of the most important properties of an algorithm is how
long it takes to run. We can use the number of steps as a measurement
of running time; but reporting an absolute number of steps an algorithm
takes, like "10 steps" or "1 000 000 steps," is pretty meaningless unless we
know how “big” the input was, since of course we’d expect algorithms
to take longer on larger inputs. So a more meaningful measure of run-
time is “10 steps when the input has size 2” or “1 000 000 steps when
the input has size 300” or even better, “n2 + 2 steps when the input
has size n.” But as you probably remember from CSC148, counting an
exact number of steps is often tedious and arbitrary, so we care more
about the Big-O (asymptotic) analysis of an algorithm. (In this course,
we will mainly care about the upper bound on the worst-case runtime of
algorithms; that is, the absolute longest an algorithm could run on a
given input size n.)
In CSC148 and earlier in this course, you analysed the runtime of
iterative algorithms. As we’ve mentioned several times by now, induc-
tion is very similar to recursion; since induction has been the key idea
of the course so far, it should come as no surprise that we’ll turn our
attention to recursive algorithms now!
A Simple Recursive Function
Consider the following simple recursive function, which you probably
saw in CSC148:
All code in this course will be in
Python-like pseudocode. The syntax
and methods will be mostly Python,
with some English making the code
more readable and/or intuitive. We’ll
expect you to follow a similar style.
def fact(n):
    if n == 1:
        return 1
    else:
        return n * fact(n-1)
In CSC148, it was surely claimed that
this function computes n! and we’ll
see later in this course how to formally
prove that this is what the function does.
How would you informally analyse the runtime of this algorithm?
One might say that the recursion depth is n, and at each call there is
just one step other than the recursive call, so the runtime is O(n), i.e.,
linear time. But in performing a more thorough step-by-step analy-
sis, we reach a stumbling block with the recursive call fact(n-1): the
runtime of fact on input n depends on its runtime on input n − 1!
Let’s see how to deal with this relationship, using mathematics and
induction.
Recursively Defined Functions
You should all be familiar with standard function notation: f (n) = n2,
f (n) = n log n, or the slightly more unusual (but no less meaningful)
f (n) = “the number of distinct prime factors of n.” There is a second
way of defining functions using recursion, e.g.,
$$f(n) = \begin{cases} 0, & \text{if } n = 0 \\ f(n-1) + 2n - 1, & \text{if } n \ge 1 \end{cases}$$

Recursive definitions allow us to capture marginal or relative differ-
ence between function values, even when we don’t know their exact
values. But recursive definitions have a significant downside: we can
only calculate large values of f by computing smaller values first. For
calculating extremely large, or symbolic, values of f (like f (n2 + 3n)), a
recursive definition is inadequate; what we would really like is a closed
form expression for f , one that doesn’t depend on other values of f . In
this case, the closed form expression for f is $f(n) = n^2$. (You will
prove this in the Exercises.)
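The downside is easy to see if we compute with both forms in Python (a
small sketch of ours; the function names are arbitrary):

def f_recursive(n):
    """Compute f from the recursive definition: every call must first
    compute all smaller values of f."""
    return 0 if n == 0 else f_recursive(n - 1) + 2 * n - 1

def f_closed(n):
    """The closed form: a single expression, no smaller values needed."""
    return n ** 2

assert all(f_recursive(n) == f_closed(n) for n in range(50))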
Before returning to our earlier factorial example, let us see how to
apply this to a more concrete example.
Example. There are exactly two ways of tiling a 3-by-2 grid using tri-
ominoes, shown to the right:
Develop a recursive definition for f (n), the number of ways of tiling
a 3-by-n grid using triominoes for n ≥ 1. Then, find a closed form
expression for f .
Solution:
Note that if n = 1, there are no possible tilings, since no triomino
will fit in a 3-by-1 board. We have already observed that there are 2
tilings for n = 2. Suppose n > 2. The key idea to get a recurrence
is that for a 3-by-n block, first consider the upper-left square. In any
tiling, there are only two possible triomino placements that can cover
it (these orientations are shown in the diagram above). Once we have
fixed one of these orientations, there is only one possible triomino
orientation that can cover the bottom-left square (again, these are the
two orientations shown in the figure).
So there are exactly two possibilities for covering both the bottom-
left and top-left squares. But once we’ve put these two down, we’ve
tiled the leftmost 3-by-2 part of the grid, and the remainder of the tiling
really just tiles the remaining 3-by-(n − 2) part of the grid; there are
f (n − 2) such tilings. Since these two parts are independent of each
other, we get the total number of tilings by multiplying the number of
possibilities for each. Therefore the recurrence relation is:

$$f(n) = \begin{cases} 0, & \text{if } n = 1 \\ 2, & \text{if } n = 2 \\ 2f(n-2), & \text{if } n > 2 \end{cases}$$

(Because we've expressed f(n) in terms of f(n − 2), we need two base
cases – otherwise, at n = 2 we would be stuck, as f(0) is undefined.)
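This recurrence transcribes directly into Python, which is handy for
generating data from which to guess the closed form (a quick sketch of
ours; the name num_tilings is arbitrary):

def num_tilings(n):
    """Number of triomino tilings of a 3-by-n grid, per the recurrence."""
    if n == 1:
        return 0
    elif n == 2:
        return 2
    else:
        return 2 * num_tilings(n - 2)

print([num_tilings(n) for n in range(1, 9)])  # [0, 2, 0, 4, 0, 8, 0, 16]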
Now that we have the recursive definition of f , we would like to find
its closed form expression. The first step is to guess the closed form
expression, by a “brute force” approach known as repeated substitution.
Intuitively, we’ll expand out the recursive definition until we find a
pattern. (So much of mathematics is finding patterns.)

$$\begin{aligned}
f(n) &= 2f(n-2) \\
&= 4f(n-4) \\
&= 8f(n-6) \\
&\ \ \vdots \\
&= 2^k f(n-2k)
\end{aligned}$$
There are two possibilities. If n is odd, say n = 2m + 1, then we have
$f(n) = 2^m f(1) = 0$, since f(1) = 0. If n is even, say n = 2m, then
$f(n) = 2^{m-1} f(2) = 2^{m-1} \cdot 2 = 2^m$. Writing our final answer in terms
of n only:

$$f(n) = \begin{cases} 0, & \text{if } n \text{ is odd} \\ 2^{n/2}, & \text{if } n \text{ is even} \end{cases}$$
Thus we've obtained a closed form formula for f(n) – except the "⋮" in
our repeated substitution does not constitute a formal proof! When you
saw the "⋮", you probably interpreted it as "repeat over and over again
until. . . " and we already know how to make this thinking formal:
induction! That is, given the recursive definition of f, we can prove
using complete induction that f(n) has the closed form given above.
(Why complete and not simple induction? We need the induction
hypothesis to work for n − 2, and not just n − 1.) This is a rather
straightforward argument, and we leave it for the Exercises.
We will now apply this technique to our earlier example.
Example. Analyse the asymptotic worst-case running time of fact(n),
in terms of n.
Solution:
Let T(n) denote the worst-case running time of fact on input n. In
this course, we will ignore exact step counts entirely, replacing these
counts with constants.
The base case of this method is when n = 1; in this case, the if block
executes and the method returns 1. This is done in constant time
("constant" always means "independent of input size"), and so we can
say that T(1) = c for some constant c.
What if n > 1? Then fact makes a recursive call, and to analyse the
runtime we consider the recursive and non-recursive parts separately.
The non-recursive part is simple: only a constant number of steps
occur (the if check, multiplication by n, and the return), so let’s say
the non-recursive part takes d steps. What about the recursive part?
The recursive call is fact(n-1), which has worst-case runtime T(n−
1), by definition! Therefore when n > 1 we get the recurrence relation
T(n) = T(n− 1) + d. Putting this together with the base case, we get
the full recursive definition of T:
$$T(n) = \begin{cases} c, & \text{if } n = 1 \\ T(n-1) + d, & \text{if } n > 1 \end{cases}$$
Now, we would like to say that T(n) = O(??), but to do so, we
really need a closed form definition of T. Once again, we use repeated
substitution.
$$\begin{aligned}
T(n) &= T(n-1) + d \\
&= \big(T(n-2) + d\big) + d = T(n-2) + 2d \\
&= T(n-3) + 3d \\
&\ \ \vdots \\
&= T(1) + (n-1)d \\
&= c + (n-1)d && \text{(since } T(1) = c\text{)}
\end{aligned}$$
Thus we’ve obtained the closed form formula T(n) = c + (n − 1)d,
modulo the "⋮". As in the previous example, we leave proving this closed
form as an exercise.
After proving this closed form, the final step is simply to convert
this closed form into an asymptotic bound on T. Since c and d are
constants with respect to n, we have that T(n) = O(n).
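As a crude empirical illustration of this linear growth – counting
recursive calls rather than exact steps, with a helper of our own –
consider:

def fact_calls(n):
    """Number of calls fact(n) makes in total, including the initial one."""
    return 1 if n == 1 else 1 + fact_calls(n - 1)

print([fact_calls(n) for n in (1, 10, 100)])  # [1, 10, 100] -- linear in n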
Now let’s see a more complicated recursive function.
Example. Consider the following code for binary search.
def bin_search(A, x):
    '''
    Pre: A is a sorted list (non-decreasing order).
    Post: Returns True if and only if x is in A.
    '''
    if len(A) == 0:
        return False
    else if len(A) == 1:
        return A[0] == x
    else:
        m = len(A) // 2  # Rounds down, like floor
        if x <= A[m-1]:
            return bin_search(A[0..m-1], x)
        else:
            return bin_search(A[m..len(A)-1], x)
One notable difference from Python is
how we’ll denote sublists. Here, we use
the notation A[i..j] to mean the slice
of the list A from index i to index j,
including A[i] and A[j].
We analyse the runtime of bin_search in terms of n, the length of
the input list A. If n = 0 or n = 1, bin_search(A,x) takes constant
time (note that it doesn’t matter whether the constant is the same or
different for 0 and 1).
What about when n > 1? Then some recursive calls are made, and
we again look at the recursive and non-recursive steps separately. We
include the computation of A[0..m-1] and A[m..len(A)-1] in the non-
recursive part, since argument evaluation happens before the recursive
call begins. (Interestingly, this is not the case in some programming
languages – an alternative is "lazy evaluation.")
IMPORTANT ANNOUNCEMENT 1: We will interpret all
list slicing operations A[i..j] as constant time, even when i
and j depend on the length of the list. See the discussion in
the following section.
Interpreting list slicing as constant time, the non-recursive cost of
bin_search is constant time. What about the recursive calls? In all
possible cases, only one recursive call occurs. What is the size of the
list of the recursive call? When either recursive call happens,
$m = \lfloor \frac{n}{2} \rfloor$, meaning the recursive call is on a list
of size either $\lfloor \frac{n}{2} \rfloor$ or $\lceil \frac{n}{2} \rceil$.
IMPORTANT ANNOUNCEMENT 2: In this course, we
won’t care about floors and ceilings. We’ll always assume
that the input sizes are “nice” so that the recursive calls al-
ways divide the list evenly. In the case of binary search,
we’ll assume that n is a power of 2.
You may look in Vassos Hadzilacos’
course notes for a complete handling
of floors and ceilings. The algebra is
a little more involved, but the bottom
line is that the asymptotic analysis is
unchanged.
With this in mind, we conclude that the recurrence relation for T(n) is T(n) = T(n/2) + d when n > 1. Therefore the full recursive definition of T is (again, we omit floors and ceilings)

T(n) = c,            if n ≤ 1
       T(n/2) + d,   if n > 1
Let us use repeated substitution to guess a closed form. Assume
that n = 2k for some natural number k.
T(n) = T(n/2) + d
     = (T(n/4) + d) + d = T(n/4) + 2d
     = T(n/8) + 3d
     ...
     = T(n/2^k) + kd
     = T(1) + kd      (Since n = 2^k)
     = c + kd         (Since T(1) = c)
Once again, we’ll leave proving this closed form to the Exercises. So
T(n) = c + kd. This expression is quite misleading, because it seems
to not involve an n, and hence be constant time – which we know is
not the case for binary search! The key is to remember that n = 2k,
so k = log2 n. Therefore we have T(n) = c + d log2 n, and so T(n) =
O(log n).
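If you'd like to experiment with this analysis, here is a direct Python translation of the bin_search pseudocode (a sketch only; note that it uses real Python slices, which copy their elements, foreshadowing the aside below):

def bin_search(A, x):
    # Return True iff x occurs in the sorted list A.
    if len(A) == 0:
        return False
    elif len(A) == 1:
        return A[0] == x
    else:
        m = len(A) // 2                     # rounds down, like floor
        if x <= A[m - 1]:
            return bin_search(A[0:m], x)    # our A[0..m-1]
        else:
            return bin_search(A[m:], x)     # our A[m..len(A)-1]

assert bin_search([1, 3, 5, 7], 5)
assert not bin_search([1, 3, 5, 7], 4)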
Aside: List Slicing vs. Indexing
In our analysis of binary search, we assumed that the list slicing op-
eration A[0..m-1] took constant time. However, this is not the case
in Python and many other programming languages, which implement
this operation by copying the sliced elements into a new list. Depend-
ing on the scale of your application, this can be undesirable for two
reasons: this copying takes time linear in the size of the slice, and uses
linear additional memory.
While we are not so concerned in this course about the second issue, the first can drastically change our runtime analysis (e.g., in our analysis of binary search). However, there is always a way to implement these algorithms without this sort of slicing, so that each step runs in constant time and no new lists are created. The key idea is to use variables to keep track of the start and end points of the section of the list we are interested in, while keeping the whole list intact throughout the computation. We illustrate this technique in our modified binary search:
1 def indexed_bin_search(A, x, first, last):
2 if first > last:
3 return False
4 else if first == last:
5 return A[first] == x
6 else:
7 m = (first + last + 1) // 2
8 if x <= A[m-1]:
9 return indexed_bin_search(A, x, first, m - 1)
10 else:
11 return indexed_bin_search(A, x, m, last)
In this code, the same list A is passed to each recursive call; the range of searched values, on the other hand, indeed gets smaller, as the first and last parameters change. More technically, the size of the range, last - first + 1, decreases by a (multiplicative) factor of two at each recursive call.
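To search an entire list with this version, we call it on the full index range; a small wrapper (our own, hypothetical addition) makes this explicit:

def bin_search_v2(A, x):
    # Search all of A: the initial range is indices 0 through len(A) - 1.
    return indexed_bin_search(A, x, 0, len(A) - 1)

This wrapper preserves the original interface, while the helper does its "slicing" in constant time by index arithmetic.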
Passing indices as arguments works well for recursive functions that
work on smaller and smaller segments of a list. We’ve just introduced
the most basic version of this technique. However, many other algo-
rithms involve making new lists in more complex ways, and it is usu-
ally possible to make these algorithms in-place, i.e., to use a constant
amount of extra memory, and do operations by changing the elements
of the original list.
Because we aren’t very concerned with this level of implementation
detail in this course, we’ll use the shortcut of interpreting list slicing
as taking constant time, keeping in mind that actual naïve implementations take
linear time. This will allow us to perform our runtime analyses without
getting bogged down in clever implementation tricks.
A Special Recurrence Form
In general, finding exact closed forms for recurrences can be tedious or
even impossible. Luckily, we are often not really looking for a closed
form solution to a recurrence, but an asymptotic bound. But even for
this relaxed goal, the only method we have so far is to find a closed
form and turn it into an asymptotic bound. In this section, we’ll look
at a powerful technique with the caveat that it works only for a special
recurrence form.
We can motivate this recurrence form by considering a style of re-
cursive algorithm called divide-and-conquer. We’ll discuss this in detail
in the next section, but for now consider the mergesort algorithm, which can roughly be outlined in three steps:
1. Divide the list into two equal halves.
2. Sort each half separately, using recursive calls to mergesort.
3. Merge each of the sorted halves.
1 def mergesort(A):
2 if len(A) == 1:
3 return A
4 else:
5 m = len(A) // 2
6 L1 = mergesort(A[0..m-1])
7 L2 = mergesort(A[m..len(A)-1])
8 return merge(L1, L2)
9
10 def merge(A, B):
11 i = 0
12 j = 0
13 C = []
14 while i < len(A) and j < len(B):
15 if A[i] <= B[j]:
16 C.append(A[i])
17 i += 1
18 else:
19 C.append(B[j])
20 j += 1
21 return C + A[i..len(A)-1] + B[j..len(B)-1] # List concatenation
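A quick translation note: if you want to run this code in actual Python, our inclusive A[i..j] corresponds to the Python slice A[i:j+1], since Python slices exclude their right endpoint. For example:

A = [5, 1, 4, 2, 3]
assert A[1:4] == [1, 4, 2]    # our notation: A[1..3]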
Consider the analysis of T(n), the runtime of mergesort(A) where len(A) = n. Steps 1 and 3 together take linear time. (Careful implementations of mergesort can do Step 1 in constant time, but merging always takes linear time.) What about Step 2? Since this is the recursive step, we'd expect T to appear here. There are two fundamental questions:

• What is the number of recursive calls?

• What is the size of the lists passed to the recursive calls?

From the written description, you should be able to intuit that there are two recursive calls, each on a list of size n/2. (For the last time, we'll point out that we ignore floors and ceilings.) So the cost of Step 2 is 2T(n/2). Putting all three steps together, we get the recurrence relation

T(n) = 2T(n/2) + cn.
This is an example of our special recurrence form:

T(n) = aT(n/b) + f(n),

where a ∈ Z+, b > 1 are constants and f : N → R+ is some arbitrary function. (Though we'll soon restrict f to ease the analysis of this recursive form.)
Before we get to the Master Theorem, which gives us an immediate
asymptotic bound for recurrences of this form, let’s discuss some in-
tuition. The special recurrence form has three parameters: a, b, and f .
Changing how big they are affects the overall runtime:
• a is the “number of recursive calls”; the bigger a is, the more recur-
sive calls, and the bigger we expect T(n) to be.
• b determines the rate of decrease of the problem size; the larger b
is, the faster the problem size goes down to 1, and the smaller T(n)
is.
• f (n) is the cost of the non-recursive part; the bigger f (n) is, the
bigger T(n) is.
We can further quantify this relationship by considering the following even more specific form:

f(n) = c,                if n = 1
       a·f(n/b) + n^k,   if n > 1
Suppose n is a power of b, say n = b^r. Using repeated substitution,

f(n) = a·f(n/b) + n^k
     = a(a·f(n/b^2) + n^k/b^k) + n^k = a^2·f(n/b^2) + n^k(1 + a/b^k)
     = a^3·f(n/b^3) + n^k(1 + a/b^k + (a/b^k)^2)
     ...
     = a^r·f(n/b^r) + n^k · ∑_{i=0}^{r−1} (a/b^k)^i
     = a^r·f(1) + n^k · ∑_{i=0}^{r−1} (a/b^k)^i      (Since n = b^r)
     = c·a^r + n^k · ∑_{i=0}^{r−1} (a/b^k)^i
     = c·n^(log_b a) + n^k · ∑_{i=0}^{r−1} (a/b^k)^i

(Note that r = log_b n, and so a^r = a^(log_b n) = b^(log_b a · log_b n) = n^(log_b a).)

The latter term looks like a geometric series, for which we may use our geometric series formula. However, this only applies when the common ratio a/b^k is not equal to 1. Therefore, there are two cases.
• Case 1: a/b^k = 1, so a = b^k. Taking logs, we have log_b a = k. In this case, the expression becomes

  f(n) = c·n^k + n^k · ∑_{i=0}^{r−1} 1^i
       = c·n^k + n^k·r
       = c·n^k + n^k·log_b n
       = O(n^k log n)
• Case 2: a ≠ b^k. Then by the geometric series formula,

  f(n) = c·n^(log_b a) + n^k · (1 − (a/b^k)^r) / (1 − a/b^k)
       = c·n^(log_b a) + n^k · (1 − n^(log_b a)/n^k) / (1 − a/b^k)
       = (c − 1/(1 − a/b^k))·n^(log_b a) + (1/(1 − a/b^k))·n^k

  There are two occurrences of n in this expression: n^(log_b a) and n^k. Asymptotically, the higher exponent dominates; so if log_b a > k, then f(n) = O(n^(log_b a)), and if log_b a < k, then f(n) = O(n^k).
With this intuition in mind, let us now state the Master Theorem.
Theorem (Master Theorem). Let T : N → R+ be a recursively defined function with recurrence relation T(n) = aT(n/b) + f(n), for some constants a ∈ Z+, b > 1, and f : N → R+. Furthermore, suppose f(n) = Θ(n^k) for some k ≥ 0. Then we can conclude the following about the asymptotic complexity of T:

(1) If k = log_b a, then T(n) = O(n^k log n).
(2) If k < log_b a, then T(n) = O(n^(log_b a)).
(3) If k > log_b a, then T(n) = O(n^k).
Let’s see some examples of the Master Theorem in action.
Example. Consider the recurrence for mergesort: T(n) = 2T(n/2) + dn. Here a = b = 2, so log_2 2 = 1, while dn = Θ(n^1). Therefore Case 1 of the Master Theorem applies, and T(n) = O(n log n), as we expected.
Example. Consider the recurrence T(n) = 49T(n/7) + 50n^π. Here, log_7 49 = 2 and 2 < π, so by Case 3 of the Master Theorem, T(n) = O(n^π).
Even though the Master Theorem is useful in a lot of situations, be
sure you understand the statement of the theorem to see exactly when
it applies (see Exercises for some questions investigating this).
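As a quick sanity check when applying the theorem, here is a small helper (our own sketch, not part of the course code) that reports which case applies for given a, b, and k:

import math

def master_bound(a, b, k):
    # Reports the bound given by the Master Theorem for
    # T(n) = a*T(n/b) + Theta(n^k).
    crit = math.log(a, b)               # the critical exponent log_b(a)
    if math.isclose(k, crit):
        return f"Case 1: O(n^{k:.4g} log n)"
    elif k < crit:
        return f"Case 2: O(n^{crit:.4g})"
    else:
        return f"Case 3: O(n^{k:.4g})"

print(master_bound(2, 2, 1))            # mergesort: Case 1, O(n^1 log n)
print(master_bound(49, 7, math.pi))     # Case 3, O(n^3.142)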
Divide-and-Conquer Algorithms
Now that we have seen the Master Theorem, let's discuss some algorithms for which it can help us analyse the runtime! A key feature of the recurrence form aT(n/b) + f(n) is that each of the recursive calls has the same size. This naturally leads to the divide-and-conquer paradigm, which can be summarized as follows. (An algorithmic paradigm is a general strategy for designing algorithms to solve problems. You will see many more such strategies in CSC373.)
1 def divide_and_conquer(P):
2 if P has "small enough" size:
3 return solve_directly(P)
4 else:
5 divide P into smaller problems P_1, ..., P_k (same size)
6 for i from 1 to k:
7 # Solve each subproblem recursively
8 s_i = divide_and_conquer(P_i)
9 # combine the s_1 ... s_k to solve P
10 return combine(s_1 ... s_k)
This is a very general template — in fact, it may seem exactly like
your mental model of recursion so far, and certainly it is a recursive
strategy. What distinguishes divide-and-conquer algorithms from a lot
of other recursive procedures is that we divide the problem into two or
more parts and solve the subproblems for each part, whereas recursive
functions in general may make only a single recursive call, like in fact
or bin_search.
Another common non-divide-and-
conquer recursive design pattern
is taking a list, processing the first
element, then recursively processing
the rest of the list (and combining the
results).
This introduction to the divide-and-conquer paradigm was deliber-
ately abstract. However, we have already discussed one divide-and-
conquer algorithm: mergesort! Let us now see two more examples of
divide-and-conquer algorithms: fast multiplication and quicksort.
Fast Multiplication
Consider an algorithm for multiplying two numbers: 1234 × 5678. We might start by writing this as

(1·1000 + 2·100 + 3·10 + 4) · (5·1000 + 6·100 + 7·10 + 8)

Expanding this product requires 16 one-digit multiplications (1·5, 2·5, 3·5, 4·5, 1·6, . . . , 4·8), and then some one-digit additions to add everything up. In general, multiplying two n-digit numbers this way requires O(n^2) one-digit operations.
Now, let’s see a different way of making this faster. Using a divide-
and-conquer approach, we want to split 1234 and 5678 into smaller
introduction to the theory of computation 41
numbers:
1234 = 12 · 100+ 34, 5678 = 56 · 100+ 78.
Now we use some algebra to write the product 1234 · 5678 as the com-
bination of some smaller products:
1234 · 5678 = (12·100 + 34)(56·100 + 78)
            = (12·56)·10000 + (12·78 + 34·56)·100 + 34·78
So now instead of multiplying 4-digit numbers, we have shown how to
find the solution by multiplying some 2-digit numbers, a much easier
problem! Note that we aren’t counting multiplication by powers of
10, since that amounts to just adding some zeroes to the end of the
numbers.
On a computer, we would use base-2
instead of base-10 to take advantage
of the “adding zeros,” which corre-
sponds to (very fast) bit-shift machine
operations.
Reducing 4-digit multiplication to 2-digit multiplication may not
seem that impressive; but now, we’ll generalize this to arbitrary n-digit
numbers (the difference in multiplying 100-digit vs. 50-digit numbers
may be more impressive).
Let x and y be n-digit numbers. For simplicity, assume n is a power
of 2. Then we can divide x and y each into two halves:

x = 10^(n/2)·a + b
y = 10^(n/2)·c + d

where a, b, c, d are (n/2)-digit numbers. Then

x · y = (ac)·10^n + (ad + bc)·10^(n/2) + bd.
We have found a mathematical identity that seems useful, and we can use this to develop a multiplication algorithm. Let's see some pseudocode. (The length of a number here refers to the number of digits in its decimal representation.)
1 def rec_mult(x,y):
2 n = length of x # Assume x and y have the same length
3 if n == 1:
4 return x * y
5 else:
6 a = x // 10^(n//2)
7 b = x % 10^(n//2)
8 c = y // 10^(n//2)
9 d = y % 10^(n//2)
10 r = rec_mult(a, c)
11 s = rec_mult(a, d)
12 t = rec_mult(b, c)
13 u = rec_mult(b, d)
14 return r * 10^n + (s + t) * 10^(n//2) + u
Now, let’s talk about the running time of this algorithm, in terms of
the size n of the two numbers. Note that there are four recursive calls;
each call multiplies two numbers of size
n
2
, so the cost of the recursive
calls is 4T
(n
2
)
. What about the non-recursive parts? Note that the
final return step involves addition of 2n-digit numbers, which takes
Θ(n) time. Therefore we have the recurrence
T(n) = 4T
(n
2
)
+ cn.
By the Master Theorem (here a = 4, b = 2, and k = 1, so k < log_2 4 = 2 and Case 2 applies), we have T(n) = O(n^2).
So, this approach didn’t help! We had an arguably more compli-
cated algorithm that achieved the same asymptotic runtime as what
we learned in elementary school! Moral of the story: Divide-and-
conquer, like all algorithmic paradigms, doesn’t always lead to “bet-
ter” solutions!
This is a serious lesson. It is not the case
that everything we teach you works for
every situation. It is up to you to care-
fully put together your knowledge to
figure out how to approach problems!
In the case of fast multiplication, though, we can use more math to improve the running time. Note that computing the "cross term" ad + bc naïvely required two multiplications; however, it is related to the values of ac and bd by the following straightforward identity:

(a + b)(c + d) = ac + (ad + bc) + bd
(a + b)(c + d) − ac − bd = ad + bc

So we can compute ad + bc by calculating just one additional product, (a + b)(c + d) (together with ac and bd, which we are calculating anyway).
This trick underlies the multiplication algorithm of Karatsuba (1960).
1 def fast_rec_mult(x,y):
2 n = length of x # Assume x and y have the same length
3 if n == 1:
4 return x * y
5 else:
6 a = x // 10^(n//2)
7 b = x % 10^(n//2)
8 c = y // 10^(n//2)
9 d = y % 10^(n//2)
10 p = fast_rec_mult(a + b, c + d)
11 r = fast_rec_mult(a, c)
12 u = fast_rec_mult(b, d)
13 return r * 10^n + (p - r - u) * 10^(n//2) + u
You can study the (improved!) runtime of this algorithm in the
exercises.
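If you'd like to experiment before doing that exercise, here is one runnable Python rendering of fast_rec_mult (a sketch of Karatsuba's algorithm for non-negative integers; the base case and digit handling are our own choices, not taken from the pseudocode above):

def karatsuba(x, y):
    # Multiply non-negative integers x and y recursively.
    if x < 10 or y < 10:
        return x * y
    n = max(len(str(x)), len(str(y)))
    half = 10 ** (n // 2)
    a, b = divmod(x, half)            # x = a * half + b
    c, d = divmod(y, half)            # y = c * half + d
    p = karatsuba(a + b, c + d)       # (a+b)(c+d) = ac + (ad+bc) + bd
    r = karatsuba(a, c)               # ac
    u = karatsuba(b, d)               # bd
    return r * half * half + (p - r - u) * half + u

assert karatsuba(1234, 5678) == 1234 * 5678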
Quicksort
In this section, we explore the divide-and-conquer sorting algorithm known as quicksort, which in practice is one of the most commonly used sorting algorithms. First, we give the pseudocode for this algorithm; note that this follows a very clear divide-and-conquer pattern. Unlike fast_rec_mult and mergesort, the hard work is done in the divide (partition) step, not the combine step. The combine step for quicksort can be made to take a constant amount of work. (Our naïve implementation below does list concatenation in linear time because of list slicing, but a more clever implementation using indexing accomplishes this in constant time.)
1 def quicksort(A):
2 if len(A) <= 1:
3 # do nothing (A is already sorted)
4 else:
5 choose some element x of A (the "pivot")
6 partition (divide) the rest of the elements of A into two lists:
7 - L, the elements of A <= x
8 - G, the elements of A > x
9 sort L and G recursively
10 combine the sorted lists in the order L + [x] + G
11 set A equal to the new list
Before moving on, an excellent exercise is to take the above pseu-
docode and implement quicksort yourself. As we will discuss again
and again, implementing algorithms yourself is the best way to un-
derstand them. Remember that the only way to improve your coding
abilities is to code lots — even something as simple and common as
sorting algorithms offers great practice. See the Exercises for more
examples.
1 def quicksort(A):
2 if len(A) <= 1:
3 pass
4 else:
5 # Choose the final element as the pivot
6 pivot = A[-1]
7 # Partition the rest of A with respect to the pivot
8 L, G = partition(A[0:-1], pivot)
9 # Sort each list recursively
10 quicksort(L)
11 quicksort(G)
12 # Combine
13 sorted = L + [pivot] + G
14 # Set A equal to the sorted list
15 for i in range(len(A)):
16 A[i] = sorted[i]
17 def partition(A, pivot):
18 L = []
19 G = []
20 for x in A:
21 if x <= pivot:
22 L.append(x)
23 else:
24 G.append(x)
25 return L, G
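For the curious, here is what the "more clever implementation using indexing" mentioned above might look like: a sketch of an in-place partition (the well-known Lomuto scheme, our choice rather than code from this course) that rearranges A[first..last] around the pivot without building new lists:

def partition_in_place(A, first, last):
    # Partition A[first..last] around the pivot A[last], in place.
    # Returns the final index of the pivot.
    pivot = A[last]
    i = first                          # next slot for an element <= pivot
    for j in range(first, last):
        if A[j] <= pivot:
            A[i], A[j] = A[j], A[i]
            i += 1
    A[i], A[last] = A[last], A[i]      # put the pivot between the two parts
    return i

An in-place quicksort would then recurse on the index ranges first..i-1 and i+1..last, exactly as indexed_bin_search passed indices instead of slices.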
Let us try to analyse the running time T(n) of this algorithm, where
n is the length of the input list A. First, the base case n ≤ 1 takes
constant time. The partition method takes linear time, since it is
called on a list of length n − 1 and contains a for loop that loops
through all n − 1 elements. The Python list methods in the rest of
the code also take linear time, though a more careful implementation
could reduce this. But because partitioning the list always takes linear
time, the non-recursive cost of quicksort is linear.
What about the costs of the recursive steps? There are two of them:
quicksort(L) and quicksort(G), so the recursive cost is T(|L|) + T(|G|), where |A| denotes the length of the list A. Therefore a potential recurrence is:

T(n) = c,                      if n ≤ 1
       T(|L|) + T(|G|) + dn,   if n > 1
What’s the problem with this recurrence? It depends on what L and G
introduction to the theory of computation 45
are, which in turn depends on the input array and the chosen pivot! In
particular, we can’t use either repeated substitution or the Master The-
orem to analyse this function. In fact, the asymptotic running time of
this algorithm can range from Θ(n log n) to Θ(n2), the latter of which
is just as bad as bubblesort! See the Exercises for details.
This begs the question: why is quicksort used so much in practice? Two reasons: quicksort takes Θ(n log n) time "on average", and careful implementations of quicksort yield better constants than other Θ(n log n) algorithms like mergesort. These two facts together imply that quicksort often outperforms other sorting algorithms in practice! (Average-case analysis is slightly more sophisticated than what we do in this course, but you can take this to mean that most of the time, on randomly selected inputs, quicksort takes Θ(n log n) time.)
Exercises
1. Let f : N → N be defined as

   f(n) = 0,                 if n = 0
          f(n−1) + 2n − 1,   if n ≥ 1

   Prove using induction that the closed form for f is f(n) = n^2.
2. Recall the recursively defined function
   f(n) = 0,         if n = 1
          2,         if n = 2
          2f(n−2),   if n > 2

   Prove that the closed form for f is

   f(n) = 0,         if n is odd
          2^(n/2),   if n is even
3. Prove that the closed form expression for the runtime of fact is
T(n) = c + (n− 1)d.
4. Prove that the closed form expression for the runtime of bin_search
is T(n) = c + d log2 n.
5. Let T(n) be the number of binary strings of length n in which there
are no consecutive 1’s. So T(0) = 1, T(1) = 2, T(2) = 3, etc.
(a) Develop a recurrence for T(n). Hint: think about the two possible
cases for the last character.
(b) Find a closed form expression for T(n).
(c) Prove that your closed form expression is correct using induction.
6. Repeat the steps of the previous question, except with binary strings
where every 1 is immediately preceded by a 0.
7. It is known that every full binary tree has an odd number of nodes. (A full binary tree is a binary tree where every node has either 0 or 2 children.) Let T(n) denote the number of distinct full binary trees with n nodes. For example, T(1) = 1, T(3) = 1, and T(7) = 5. Give a recurrence for T(n), justifying why it is correct. Then, use induction to prove that T(n) ≥ (1/n)·2^((n−1)/2).
8. Consider the following recursively defined function
   f(n) = 3,                    if n = 0
          7,                    if n = 1
          3f(n−1) − 2f(n−2),    if n ≥ 2
Find a closed form expression for f , and prove that it is correct
using induction.
9. Consider the following recursively defined function:
   f(n) = 1/5,               if n = 0
          (1 + f(n−1))/2,    if n ≥ 1

   (a) Prove that for all n ≥ 1, f(n + 1) − f(n) < f(n) − f(n − 1).
   (b) Prove that for all n ∈ N, f(n) = 1 − 4/(5·2^n).
10. A block in a binary string is a maximal substring consisting of the
same symbol. For example, the string 0100011 has four blocks: 0, 1,
000, and 11. Let H(n) denote the number of binary strings of length
n that have no odd length blocks of 1’s. For example, H(4) = 5:
0000 1100 0110 0011 1111
Develop a recursive definition for H(n), and justify why it is correct.
Then find a closed form for H using repeated substitution.
11. Consider the following recursively defined function.
    T(n) = 1,                    if n = 1
           4T(n/2) + log_2 n,    otherwise

    Use repeated substitution to come up with a closed form expression for T(n), when n = 2^k; i.e., n is a power of 2. You will need to use the following identity:

    ∑_{i=0}^{n} i·a^i = (n·a^(n+2) − (n + 1)·a^(n+1) + a) / (a − 1)^2.
12. Analyse the worst-case runtime of fast_rec_mult.
13. Analyse the runtime of each of the following recursive algorithms.
It’s up to you to decide whether you should use repeated substitu-
tion or the Master Theorem to find the asymptotic bound.
(a)
1 def sum(A):
2 if len(A) == 0:
3 return 1
4 else:
5 return A[0] + sum(A[1..len(A)-1])
(b)
1 def fun(A):
2 if len(A) < 2:
3 return len(A) == 0
4 else:
5 return fun(A[2..len(A)-1])
(c)
1 def double_fun(A):
2 n = len(A)
3 if n < 2:
4 return n
5 else:
6 return double_fun(A[0..n-2]) + double_fun(A[1..n-1])
(d)
1 def mystery(A):
2 if len(A) <= 1:
3 return 1
4 else:
5 d = len(A) // 4
6 s = mystery(A[0..d-1])
7 i = d
8 while i < 3 * d:
9 s += A[i]
10 i += 1
11 s += mystery(A[3*d..len(A)-1])
12 return s
14. Recall the recurrence for the worst-case runtime of quicksort:
    T(n) = c,                      if n ≤ 1
           T(|L|) + T(|G|) + dn,   if n > 1

    where L and G are the partitions of the list. Clearly, how the list is partitioned matters a great deal for the runtime of quicksort.

    (a) Suppose the lists are always evenly split; that is, |L| = |G| = n/2 at each recursive call. (For simplicity, we'll ignore the fact that each list really would have size (n − 1)/2.) Find a tight asymptotic bound on the runtime of quicksort using this assumption.
(b) Now suppose that the lists are always very unevenly split: |L| =
n− 2 and |G| = 1 at each recursive call. Find a tight asymptotic
bound on the runtime of quicksort using this assumption.
Program Correctness
In our study of algorithms so far, we have mainly been concerned with
their worst-case running time. While this is an important considera-
tion of any program, there is arguably a much larger one: program
correctness! That is, while it is important for our algorithms to run
quickly, it is more important that they work! You are used to testing
your programs to demonstrate their correctness, but your confidence
depends on the quality of your testing.
Frankly, developing high-quality tests
takes a huge amount of time – much
longer than you probably spent on it in
CSC148!
In this chapter, we’ll discuss methods of formally proving program
correctness, without writing any tests at all. We cannot overstate the
importance of this technique: a test suite cannot possibly test a pro-
gram on all possible inputs (unless it is a very restricted program), and
so a proof is the only way we can ensure that our programs are actu-
ally correct on all inputs. Even for larger software systems, which are
far too complex to formally prove their correctness, the skills you will
learn in this chapter will enable you to reason more effectively about
your code; essentially, what we will teach you is the art of semantically
tracing through code.
By “semantically” we mean your ability
to derive meaning from code, i.e.,
identify exactly what the program does.
This contrasts with program syntax,
things like punctuation and (in Python)
indentation.
What is Correctness?
You may be familiar with the most common tools used to specify program correctness: preconditions and postconditions. Formally, a precondition of a function is a property that an input to the function must satisfy in order to guarantee that the function will work properly. A postcondition of a function is a property that must be satisfied after the function completes. Most commonly, this refers to properties of a return value, though it can also refer to changes to the variables passed in, as with the implementation of quicksort from the previous chapter (which didn't return anything but instead changed the input list A). A single function can have several pre- and postconditions. (As a program designer, it is up to you to specify preconditions. This often balances the desire for flexibility (allowing a broad range of inputs/usages) with feasibility (how much code you want or are able to write).)
Example. Consider the following code for calculating the greatest com-
mon divisor of two natural numbers. Its pre- and postconditions are
shown.
1 def gcd_rec(a, b):
2 '''
3 Pre: a and b are positive integers, and a >= b
4 Post: returns the greatest common divisor of a and b
5 '''
6 if a == 1 or b == 1:
7 return 1
8 else if a mod b == 0:
9 return b
10 else:
11 return gcd_rec(b, a mod b)
We’ll use mod rather than % in our code,
for clarity.
So preconditions tell us what must be true before the program
starts, and postconditions tell us what must be true after the program
terminates (assuming it ends at all). We have the following formal
statement of correctness. Though it is written in more formal lan-
guage, note that it really captures what we mean when we say that a
program is “correct.”
Definition (Program Correctness). Let f be a function with a set of preconditions and postconditions. Then f is correct (with respect to the pre- and postconditions) if the following holds:

For every input I to f, if I satisfies the preconditions, then f(I) terminates, and all the postconditions hold after it terminates.
Correctness of Recursive Programs
Consider the code for gcd_rec shown in the previous example. Here
is its statement of correctness:
For all a, b ∈ Z+ such that a ≥ b, gcd_rec(a,b) terminates
and returns gcd(a, b).
How do we prove that gcd_rec is correct? It should come as no sur-
prise that we use induction, since you know by now that induction and
recursion are closely related concepts. At this point in the course, you
should be extremely comfortable with the following informal reason-
ing: “If a and b are ’small’ then we can prove directly using the base
case of gcd_rec that it is correct. Otherwise, the recursive call happens
on ’smaller’ inputs, and by induction, the recursive call is correct, and
hence because of some {math, logic, etc.}, the program also returns the
correct value.”
Writing full induction proofs that formalise the above logic is te-
dious, so instead we use the fundamental idea in a looser template.
For each program path from the first line to a return statement, we
show that it terminates and that, when it does, the postconditions are
satisfied. We do this as follows:
• If the path contains no recursive calls or loops, analyse the code line
by line until the return statement.
• For each recursive call on the path (if there are any), argue why the
preconditions are satisfied at the time of the recursive call, and that
the recursive call occurs on a “smaller” input than the original call.
There is some ambiguity around what
is meant by “smaller.” We will discuss
this shortly.
Then you may assume that the postconditions for the recursive call
are satisfied when the recursive call terminates.
Finally, argue from the last recursive call to the end of the function
why the postconditions of the original function call will hold.
• For each loop, use a “loop invariant.” We will deal with this in the
next section.
We have described the content of an induction proof; by focusing
on tracing values in the code, we are isolating the most important
thinking involved. But recall that the core of induction is proving a
predicate for all natural numbers; thus the final ingredient we need to
account for is associating each input with a natural number: its "size." (So in essence, our "predicate" would be "for all inputs I of size n that satisfy the preconditions of f, f is correct on I.") Usually, this will be very simple: the length of a list, or the value of an input number. Let us illustrate this technique using gcd_rec as our example.
Example. We will show that gcd_rec is correct. There are three pro-
gram paths (this is easy to see, because of the if statements). Let’s
look at each one separately.
• Path 1: the program terminates at line 7. If the program goes into
this block, then a = 1 or b = 1. But in these cases, gcd(a, b) = 1,
because gcd(x, 1) = 1 for all x. Then the postcondition holds, since
at line 7 the program returns 1.
• Path 2: the program terminates at line 9. If the program goes into
this block, b divides a. Since b is the greatest possible divisor of
itself, this means that gcd(a, b) = b, and b is returned at line 9.
• Path 3: the program terminates at line 11. We need to check that the
recursive call satisfies its preconditions and is called on a smaller
instance. Note that b and (a mod b) are both at least 1, and (a
mod b) < b, so the preconditions are satisfied. Since a + b > (a
mod b) + b, the sum of the inputs decreases, and so the recursive call
is made on a smaller instance. If you recall the example of using
complete induction on ordered pairs,
taking the sum of the two components
was the size measure we used there,
too.
Therefore when the call completes, it returns gcd(b, a mod b). Now
we use the identity that gcd(a, b) = gcd(b, a mod b) to conclude
that the original call returns the correct answer.
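As an informal complement to this proof, we can also spot-check a Python translation of gcd_rec against the standard library (a sanity check only; no amount of testing replaces the proof):

from math import gcd

def gcd_rec(a, b):
    if a == 1 or b == 1:
        return 1
    elif a % b == 0:
        return b
    else:
        return gcd_rec(b, a % b)

for a in range(1, 50):
    for b in range(1, a + 1):          # respect the precondition a >= b
        assert gcd_rec(a, b) == gcd(a, b)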
Example. We now look at a recursive example on lists. Here we consider a randomized binary search. This is worse than regular binary search in practice, but is useful for our purposes because it shows that the correctness of binary search doesn't depend on the size of the recursive calls. (How is it similar to quicksort?)
1 def rand_bin_search(A, x):
2 '''
3 Pre: A is a sorted list of numbers, and x is a number
4 Post: Returns true if and only if x is an element of A
5 '''
6 if len(A) == 0:
7 return false
8 else if len(A) == 1:
9 return A[0] == x
10 else:
11 guess = a random number from 0 to len(A) - 1, inclusive
12 if A[guess] == x:
13 return true
14 else if A[guess] > x:
15 return rand_bin_search(A[0..guess-1], x)
16 else:
17 return rand_bin_search(A[guess+1..len(A)-1], x)
Proof of correctness. Here there are five different program paths. We’ll
check three of them, and leave the others as an exercise:
• Path 1: the program terminates at line 7. This happens when A is
empty; if this happens, x is certainly not in A, so the program returns
the correct value (false).
• Path 2: the program terminates at line 9. We compare A[0] to x,
returning true if they are equal, and false otherwise. Note that if
they aren’t equal, then x cannot be in A, since A has only one element
(its length is one).
• Path 4: the program terminates at line 15. This happens when
len(A) > 1, and A[guess] > x. Because A is sorted, and A[guess]
> x, for every index i ≥ guess, A[i] > x. Therefore the only way x
could appear in A is if it appeared at an index smaller than guess.
Now, let us handle the recursive call. Since guess ≤ len(A)− 1, we
have that guess− 1 ≤ len(A)− 2, and so the length of the list in the
recursive call is at most len(A)− 1; so the recursive call happens on
a smaller instance. Therefore, when the recursive call returns, the
postcondition is satisfied: it returns true if and only if x appears in
A[0..guess-1]. The original function call then returns this value;
by the discussion in the previous paragraph, this is the correct value
to return, so the postcondition is satisfied.
■
Iterative Programs
In this section, we’ll discuss how to handle loops in our code. So far,
we have been able to determine the exact sequence of steps in each
program path (e.g., “Lines 1, 2, 4, and 6 execute, and then the program
returns”). However, this is not the case when we are presented with a
We’ve treated recursive calls as “black
boxes” that behave nicely (i.e., can be
treated as a single step) as long as their
preconditions are satisfied and they are
called on smaller inputs.
loop, because the sequence of steps depends on the number of times
the loop iterates, which in turn depends on the input (e.g., the length
of an input list). Thus our argument for correctness cannot possibly
go step by step!
Instead, we treat the entire loop as a single unit, and give a correct-
ness argument specifically for it separately. But what do we mean for
a loop to be “correct”? Consider the following function.
1 def avg(A):
2 '''
3 Pre: A is a non-empty list of numbers
4 Post: Returns the average of the numbers in A
5 '''
6 sum = 0
7 i = 0
8 while i < len(A):
9 sum += A[i]
10 i += 1
11 return sum / len(A)
This is the sort of program you wrote in CSC108. Intuitively, it is
certainly correct — why? The “hard” work is done by the loop, which
calculates the sum of the elements in A. The key is to prove that this
is what the loop does. Coming out of first-year programming, most
of you would be comfortable saying that the variable sum starts with
value 0 before the loop, and after the loop ends it contains the value
of the sum of all elements in A. But what can we say about the value
of sum while the loop is running?
Clearly, the loop calculates the sum of the elements in A one element
at a time. After some thought, we determine that the variable sum
starts with value 0 and in the loop takes on the values A[0], then A[0]
+ A[1], then A[0] + A[1] + A[2], etc. We formalize this by defining
a loop invariant for this loop. A loop invariant is a predicate that is true every time the loop condition is checked (including the check that terminates the loop). Usually, the predicate will depend on which iteration the loop is on, or more generally, the value(s) of the program variable(s) associated with the loop. For example, in avg, the loop invariant corresponding to our previous intuition is

P(i, sum) : sum = ∑_{k=0}^{i−1} A[k]

(By convention, the empty sum ∑_{k=0}^{−1} A[k] evaluates to 0.) The i and sum in the predicate really correspond to the values of those variables in the code for avg. That is, this predicate is stating a property of these variables in the code.
Unfortunately, this loop invariant isn’t quite right; what if i >
len(A)? Then the sum is not well-defined, since, for example, A[len(A)]
is undefined. This can be solved with a common technique for loop
invariants: putting bounds on “loop counter” variables, as follows:
Inv(i, sum) : 0 ≤ i ≤ len(A) ∧ sum = ∑_{k=0}^{i−1} A[k].
This ensures that the sum is always well-defined, and has the added
benefit of explicitly defining a possible range of values on i.
In general, the fewer possible values a
variable takes on, the fewer cases you
have to worry about in your code.
A loop invariant is correct if it is always true at the beginning of
every loop iteration, including the loop check that fails, causing the loop
to terminate. This is why we allowed i ≤ len(A) rather than just i <
len(A) in the invariant.
How do we prove that loop invariants are correct? The argument is
yet another application of induction:
• First, we argue that the loop invariant is satisfied when the loop is
reached. (This is arrow (1) in the diagram)
• Then, we argue that if the loop invariant is satisfied at the beginning
of an iteration, then after the loop body executes once (i.e., one loop
iteration occurs), the loop invariant still holds. (Arrow (2))
• Finally, after proving that the loop invariant is correct, we show that
if the invariant holds when the loop ends, then the postcondition
will be satisfied when the program returns. (Arrow (3))
[Diagram: arrow (1) leads from Pre to Inv, arrow (2) loops from Inv back to Inv, and arrow (3) leads from Inv to Post.]
Though this is basically an inductive proof, as was the case for recur-
sive programs, we won’t hold to the formal induction structure here. If we wanted to be precise, we would
do induction on the number of loop
iterations executed.
Example. Let us formally prove that avg is correct. The main portion
of the proof will be the proof that our loop invariant Inv(i, sum) is
correct.
Proof. When the program first reaches the loop, i = 0 and sum = 0.
Plugging this into the predicate yields
Inv(0, 0) : 0 ≤ 0 ≤ len(A) ∧ 0 = ∑_{k=0}^{−1} A[k],
which is true (recall the note about the empty sum from earlier).
Now suppose the loop invariant holds when i = i0, at the beginning
of a loop iteration. Let sum0 be the value of the variable sum at this
time. The loop invariant we are assuming is the following:
Inv(i0, sum0) : 0 ≤ i0 ≤ len(A) ∧ sum0 = ∑_{k=0}^{i0−1} A[k].
What happens next? The obvious answer is “the loop body runs,” but
this misses one subtle point: if i0 = len(A) (which is allowed by the
loop invariant), the body of the loop doesn’t run, and we don’t need to
worry about this case, since we only care about checking what happens
to the invariant when the loop actually runs.
Assume that i0 < len(A), so that the loop body runs. What happens
in one iteration? Two things: sum increases by A[i0], and i increases
by 1. Let sum1 and i1 be the values of sum and i at the end of the loop
iteration. We have sum1 = sum0 + A[i0] and i1 = i0 + 1. Our goal is to
prove that the loop invariant holds for i1 and sum1, i.e.,
Inv(i1, sum1) : 0 ≤ i1 ≤ len(A) ∧ sum1 = ∑_{k=0}^{i1−1} A[k].
Let us check that the loop invariant is still satisfied by sum1 and i1.
First, 0 ≤ i0 < i0 + 1 = i1 ≤ len(A), where the first inequality came
from the loop invariant holding at the beginning of the loop, and the
last inequality came from the assumption that i0 < len(A). The second
part of Inv can be checked by a simple calculation:
sum1 = sum0 + A[i0]
     = (∑_{k=0}^{i0−1} A[k]) + A[i0]      (by Inv(i0, sum0))
     = ∑_{k=0}^{i0} A[k]
     = ∑_{k=0}^{i1−1} A[k]                (Since i1 = i0 + 1)
Therefore the loop invariant always holds.
The next key idea is that when the loop ends, variable i has value len(A): by the loop invariant it always has value ≤ len(A), and if it were strictly less than len(A), another iteration of the loop would run. Then by the loop invariant, the value of sum is ∑_{k=0}^{len(A)−1} A[k], i.e., the sum of all the elements in A! The final step is to continue tracing until the program returns, which in this case takes just a single step: the program returns sum / len(A). But this is exactly the average of the numbers in A, because the variable sum is equal to their sum! Therefore the postcondition is satisfied. (That might seem like a lot of writing to get to what we said paragraphs ago, but this is a formal argument that confirms our intuition. We also implicitly use here the mathematical definition of "average" as the sum of the numbers divided by how many there are.) ■
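We can even ask Python to check Inv(i, sum) at every loop check, by turning the invariant into an assertion (our own instrumentation of avg, purely for illustration; the local variable is renamed total to avoid shadowing Python's built-in sum):

def avg(A):
    total = 0                          # plays the role of sum above
    i = 0
    while True:
        # Inv(i, total): 0 <= i <= len(A) and total = A[0] + ... + A[i-1]
        assert 0 <= i <= len(A) and total == sum(A[:i])
        if i >= len(A):                # the check that ends the loop
            break
        total += A[i]
        i += 1
    return total / len(A)

assert avg([1, 2, 3, 4]) == 2.5

Note that the while True / break structure makes explicit that the invariant is checked even on the final, failing loop check.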
Note the deep connection between the loop invariant and the post-
condition. There are many other loop invariants we could have tried to
prove: for example, Inv(i, sum) : i + sum ≥ 0. But this wouldn’t have
helped at all in proving the postcondition! When confronting more
problems on your own, it will be up to you to determine the right loop
invariants for the job. Keep in mind that choosing loop invariants can
usually be done by either taking a high-level approach and mimick-
ing something in the postcondition or taking a low-level approach by
carefully tracing through the code on test inputs to try to find patterns
in the variable values.
One final warning before we move on: loop invariants describe relationships between variables at a specific moment in time (specifically, at the beginning of a particular loop check). Students often try to use loop invariants to capture how variables change over time (e.g., "i will increase by 1 when the loop runs"), which creates massive headaches, because determining how the code works is the meat of a proof, and shouldn't be shoehorned into a single predicate! When
working with more complex code, we take the view that loop invari-
ants are properties that are preserved, even if they don’t describe ex-
actly how the code works or exactly what happens! This flexibility is
what makes correctness proofs manageable.
Example. Here we consider a numerical example: a low-level imple-
mentation of multiplication.
1 def mult(a,b):
2 '''
3 Pre: a and b are natural numbers
4 Post: returns a * b
5 '''
6 m = 0
7 count = 0
8 while count < b:
9 m += a
10 count += 1
11 return m
Proof of correctness. The key thing to figure out here is how the loop
accomplishes the multiplication. It’s clear what it’s supposed to do:
the variable m changes from 0 at the beginning of the loop to a*b at
the end. How does this change happen? One simple thing we could
do is make a table of values of the variables m and count as the loop
progresses:
m      count
0      0
a      1
2a     2
3a     3
...    ...
Aha: m seems to always contain the product of a and count; and, when
the loop ends, count = b! This leads directly to the following loop
invariant (including the bound on count):
Inv(m, count) : m = a× count ∧ count ≤ b
Consider an execution of the code, with the preconditions satisfied
by the inputs. Then when the loop is first encountered, m = 0 and
count = 0, so m = a× count, and count ≤ b (since b ∈N).
Now suppose the loop invariant holds at the beginning of some iteration, with m = m0 and count = count0 (explicitly, we assume that Inv(m0, count0) holds). Furthermore, suppose count0 < b, so the loop runs. When the loop runs, m increases by a and
count increases by 1. Let m1 and count1 denote the new values of m and
count; so m1 = m0 + a and count1 = count0 + 1. Since Inv(m0, count0)
holds, we have
m1 = m0 + a
   = a × count0 + a      (by invariant)
   = a × (count0 + 1)
   = a × count1
Moreover, since we’ve assumed count0 < b, we have that count1 =
count0 + 1 ≤ b. So Inv(m1, count1) holds.
Finally, when the loop terminates, we must have count = b, since by
the loop invariant count ≤ b, and if count < b another iteration would
occur. Then by the loop invariant again, when the loop terminates,
m = ab, and the function returns m, satisfying the postcondition. ■
Termination
Unfortunately, there is a slight problem with all the correctness proofs
we have done so far. We’ve used phrases like “when the recursive call
ends” and “when the loop terminates”. But how do we know that the
recursive calls and loops end at all? That is, how do we know that a
program does not contain an infinite loop or infinite recursion? This is a serious issue: what beginning
programmer hasn’t been foiled by either
of these errors?
The case for recursion is actually already handled by our implicit
induction proof structure. Recall that the predicate in our induction
proofs is that f is correct on inputs of size n; part of the definition of
correctness is that the program terminates. Therefore as long as the
induction structure holds — i.e., that the recursive calls are getting
smaller and smaller — termination comes for free.
The case is a little trickier for loops. As an example, consider the
loop in avg. Because the loop invariant Inv(i, sum) doesn’t say any-
thing about how i changes, we can’t use it to prove that the loop
terminates. But for most loops, including this one, it is “obvious” why
they terminate, because they typically have counter variables that iter-
ate through a fixed range of values.
The counter role is played by the
variable i, which goes through the
range {0, 1, . . . , len(A)}.
This argument would certainly convince us that the avg loop termi-
nates, and in general all loops with this form of loop counter terminate.
However, not all loops you will see or write will have such an obvious
loop counter. Here’s an example:
1 def collatz(n):
2 ''' Pre: n is a natural number '''
3 curr = n
4 while curr > 1:
5 if curr is even:
6 curr = curr // 2
7 else:
8 curr = 3 * curr + 1
In fact, it is an open question in math-
ematics whether this function halts
on all inputs or not. If only we had a
computer program that could tell us
whether it does!
Therefore we’ll now introduce a formal way of proving loop termi-
nation. Recall that our correctness proofs of recursive functions hinged loop termination
on the fact that the recursive calls were made on smaller and smaller
inputs, until some base case was reached. Our strategy for loops will
draw inspiration from this: we associate with the loop a loop variant loop variant
v that has the following two properties:
(1) v decreases with each iteration of the loop
(2) v is always a natural number at the beginning of each loop iteration
While this is not the only strategy for
proving termination, it turns out to
be suitable for most loops involving
numbers and lists. When you study
more advanced data structures and
algorithms, you will discuss more
complex arguments for both correctness
and termination.If such a v exists, then at some point v won’t be able to decrease any
further (because 0 is the smallest natural number), and therefore the
loop cannot have any more iterations. This is analogous to inputs to
recursive calls getting smaller and smaller until a base case is reached.
Let us illustrate this technique on our avg loop.
Example. Proof of termination of avg. Even though we’ve already ob-
served that the loop has a natural loop counter variable i, this vari-
able increases with each iteration. Instead, our loop variant will be
v = len(A)− i. Let us check that v satisfies the properties (1) and (2):
(1) Since at each iteration i increases by 1, and len(A) stays the same,
v = len(A)− i decreases by 1 on each iteration.
(2) Note that i and len(A) are both always natural numbers. But this alone is not enough to conclude that v ∈ N; for example, 3, 5 ∈ N but (3 − 5) ∉ N. But the loop invariant we proved included the predicate 0 ≤ i ≤ len(A), and because of this we can conclude that len(A) − i ≥ 0, so len(A) − i ∈ N.
This is a major reason we include
such loop counter bounds on the loop
invariant.
Since we have established that v is a decreasing, bounded variant for
the loop, this loop terminates, and therefore avg terminates (since ev-
ery other line of code is a simple step that certainly terminates). ■
Notice that the above termination proof relied on i increasing by 1
on each iteration, and that i never exceeds len(A). That is, we basically
just used the fact that i was a standard loop counter. Here is a more
complex example where there is no obvious loop counter.
Example. Prove that the following function terminates:
1 def term_ex(x,y):
2 ''' Pre: x and y are natural numbers. '''
3 a = x
4 b = y
5 while a > 0 or b > 0:
6 if a > 0:
7 a -= 1
8 else:
9 b -= 1
10 return x * y
Proof. Intuitively, the loop terminates because when the loop runs, ei-
ther a or b decreases, and will stop when a and b reach 0. To make
this argument formal, we need the following loop invariant: a, b ≥ 0,
whose proof we leave as an exercise.
The loop variant we define is v = a + b. Let us prove the necessary
properties for v:
• In the loop, either a decreases by 1, or b decreases by 1. In either
case, v = a + b decreases by 1. Therefore v is decreasing.
• Since our loop invariant says that a, b ≥ 0, we have that v ≥ 0 as
well. Therefore v ∈N. ■
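Here too we can instrument the code to watch the variant (our own sketch: we record v = a + b at each check and assert properties (1) and (2) as the loop runs):

def term_ex(x, y):
    a, b = x, y
    prev_v = None
    while a > 0 or b > 0:
        v = a + b                              # the loop variant
        assert v >= 0                          # property (2): v is in N
        assert prev_v is None or v < prev_v    # property (1): v decreases
        prev_v = v
        if a > 0:
            a -= 1
        else:
            b -= 1
    return x * y

term_ex(3, 5)    # runs 8 iterations, then terminates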
Exercises
1. Here is some code that recursively determines the smallest element
of a list. Give pre- and postconditions for this function, then prove
it is correct according to your specifications.
1 def recmin(A):
2 if len(A) == 1:
3 return A[0]
4 else:
5 m = len(A) // 2
6 min1 = recmin(A[0..m-1])
7 min2 = recmin(A[m..len(A)-1])
8 return min(min1, min2)
2. Prove that the following code is correct, according to its specifica-
tions.
1 def sort_colours(A):
2 '''
3 Pre: A is a list whose elements are either 'red' or 'blue'
4 Post: All red elements in A appear before all blue ones
5 '''
6 i = 0
7 j = 0
8 while i < len(A):
9 if A[i] is red:
10 swap A[i], A[j]
11 j += 1
12 i += 1
3. Prove the following loop invariant for the loop in term_ex: Inv(a, b) :
a, b ≥ 0.
4. Consider the following modification of the term_ex example.
1 def term_ex_2(x,y):
2 ''' Pre: x and y are natural numbers '''
3 a = x
4 b = y
5 while a >= 0 or b >= 0:
6 if a > 0:
7 a -= 1
8 else:
9 b -= 1
10 return x * y
(a) Demonstrate via example that this doesn’t always terminate.
(b) Show why the proof of termination given for term_ex fails.
5. For each of the following, state pre- and postconditions that capture
what the program is designed to do, then prove that it is correct
according to your specifications.
Don’t forget to prove termination (even though this is pretty sim-
ple). It’s easy to forget about this if you aren’t paying attention.
(a)
1 def mod(n, d):
2 r = n
3 while r >= d:
4 r -= d
5 return r
(b)
1 def div(n, d):
2 r = n
3 q = 0
4 while r >= d:
5 r -= d
6 q += 1
7 return q
(c)
1 def lcm(a,b):
2 x = a
3 y = b
4 while x != y:
5 if x < y:
6 x += a
7 else:
8 y += b
9 return x
(d)
1 def div3(s):
2 r = 0
3 t = 1
4 i = 0
5 while i < len(s):
6 r += t * s[i]
7 t *= -1
8 i += 1
9 return r mod 3 == 0
(e)
1 def count_zeroes(L):
2 z = 0
3 i = 0
4 while i < len(L):
5 if L[i] == 0:
6 z += 1
7 i += 1
8 return z
(f)
1 def f(n):
2 r = 2
3 i = n
4 while i > 0:
5 r = 3*r - 2
6 i -= 1
7 return r
6. Consider the following code.
1 def f(x):
2 ''' Pre: x is a natural number '''
3 a = x
4 y = 10
5 while a > 0:
6 a -= y
7 y -= 1
8 return a * y
(a) Give a loop invariant that characterizes the values of a and y.
(b) Show that sometimes this code fails to terminate.
7. In this question, we study two different algorithms for exponen-
tiation: a recursive and iterative algorithm. First, state pre- and
postconditions that the algorithms must satisfy (they’re the same
for the two). Then, prove that each algorithm is correct according to
the specifications.
(a)
1 def exp_rec(a, b):
2 if b == 0:
3 return 1
4 else if b mod 2 == 0:
5 x = exp_rec(a, b / 2)
6 return x * x
7 else:
8 x = exp_rec(a, (b - 1)/2)
9 return x * x * a
(b)
1 def exp_iter(a, b):
2 ans = 1
3 mult = a
4 exp = b
5 while exp > 0:
6 if exp mod 2 == 1:
7 ans *= mult
8 mult = mult * mult
9 exp = exp // 2
10 return ans
8. Prove that the following function is correct. Warning: this one is
probably the most difficult of these exercises. But, it runs in linear
time – pretty amazing!
1 def majority(A):
2 '''
3 Pre: A is a list with more than half its entries equal to x
4 Post: Returns the majority element x
5 '''
6 c = 1
7 m = A[0]
8 i = 1
9 while i <= len(A) - 1:
10 if c == 0:
11 m = A[i]
12 c = 1
13 else if A[i] == m:
14 c += 1
15 else:
16 c -= 1
17 i += 1
18 return m
9. Here we study yet another sorting algorithm, bubblesort.
1 def bubblesort(L):
2 '''
3 Pre: L is a list of numbers
4 Post: L is sorted
5 '''
6 k = 0
7 while k < len(L):
8 i = 0
9 while i < len(L) - k - 1:
10 if L[i] > L[i+1]:
11 swap L[i] and L[i+1]
12 i += 1
13 k += 1
(a) State and prove an invariant for the inner loop.
(b) State and prove an invariant for the outer loop.
(c) Prove that bubblesort is correct, according to its specifications.
10. Consider the following generalization of the min function.
1 def extract(A, k):
2 pivot = A[0]
3 # Use partition from quicksort
4 L, G = partition(A[1..len(A) - 1], pivot)
5 if len(L) == k - 1:
6 return pivot
7 else if len(L) >= k:
8 return extract(L, k)
9 else:
10 return extract(G, k - len(L) - 1)
(a) Prove that this algorithm is correct.
(b) Analyse the worst-case running time of this algorithm. Hint: this
algorithm is known as quickselect, and is pretty obviously related
to quicksort.
Regular Languages & Finite Automata
In this final chapter, we turn our attention to the study of finite au-
tomata, a simple model of computation with surprisingly deep ap-
plications ranging from vending machines to neurological systems.
We focus on one particular application: matching regular languages,
which are the foundation of natural language processing, including
text searching and parsing. This application alone makes automata
an invaluable computational tool, one with which you are probably
already familiar in the guise of regular expressions.
Definitions
We open with some definitions related to strings. An alphabet Σ is a finite set of symbols, e.g., {0, 1}, {a, b, . . . , z}, or {0, 1, . . . , 9, +, −, ×, ÷}.
A string over an alphabet Σ is a finite sequence of symbols from Σ. Therefore "0110" is a string over {0, 1}, and "abba" and "cdbaaaa" are strings over {a, b, c, d}. The empty string "", denoted by ϵ, consists of a sequence of zero symbols from the alphabet. (Σ is the Greek letter "Sigma"; ϵ is "epsilon".) We use the notation Σ∗ to denote the set of all strings over the alphabet Σ.
The length of a string w ∈ Σ∗ is the number of symbols appearing in the string, and is denoted |w|. For example, |ϵ| = 0, |aab| = 3, and |11101010101| = 11. We use Σ^n to denote the set of strings over Σ of length n. For example, if Σ = {0, 1}, then Σ^0 = {ϵ} and Σ^2 = {00, 01, 10, 11}. So Σ∗ = Σ^0 ∪ Σ^1 ∪ Σ^2 ∪ · · ·
A language L over an alphabet Σ is a subset of strings over Σ: L ⊆ Σ∗. Unlike alphabets and strings, languages may be either finite or infinite. (The "English language" is a subset of the strings that can be formed by all possible combinations of the usual 26 letters.) Here are some examples of languages over the alphabet
{a, b, c}:
{ϵ, a, b, ccc}
{w ∈ {a, b, c}∗ | |w| ≤ 3}
{w ∈ {a, b, c}∗ | w has the same number of a’s and c’s}
{w ∈ {a, b, c}∗ | w can be found in an English dictionary}
These are pretty mundane examples. Somewhat surprisingly, how-
ever, this notion of languages also captures solutions to computational
problems. Consider the following languages over the alphabet of all
standard ASCII characters. (For example, "[1, 2, 76]" ∈ L1.)

L1 = {A | A is a string representation of a sorted list of numbers}
L2 = {(A, x) | A is a list of numbers and x is the minimum of A}
L3 = {(a, b, c) | a, b, c ∈ N and gcd(a, b) = c}
L4 = {(P, x) | P is a Python program that halts when given input x}
So we can interpret many computer programs as deciding membership
in a particular language. For example, a program that decides whether
an input string is in L1 is essentially a program that decides whether
an input list is sorted.
Membership in L4 cannot be computed
by any program at all — this is the
famous Halting Problem!
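For instance, a decider for L1 might look like the following sketch (we make the simplifying assumption that the string uses Python's own list syntax, so it can be parsed with ast.literal_eval):

import ast

def in_L1(s):
    # Return True iff s is a string representation of a sorted list of numbers.
    try:
        A = ast.literal_eval(s)
    except (ValueError, SyntaxError):
        return False
    if not isinstance(A, list):
        return False
    if not all(isinstance(x, (int, float)) for x in A):
        return False
    return all(A[i] <= A[i + 1] for i in range(len(A) - 1))

assert in_L1("[1, 2, 76]")
assert not in_L1("[3, 1]")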
Solutions to computational problems are just one face of the coin;
what about the programs we create to solve them? One of the great
achievements of the early computer scientist Alan Turing was the de-
velopment of the Turing Machine, an abstract model of computation
that is just as powerful as the physical computers we use today. Unfor-
tunately, Turing Machines are a far more powerful and complex model than is appropriate for this course. (You will study Turing Machines to your heart's content in CSC363/CSC463.) Instead, we will study the simpler computational model of finite automata, and the class of languages they compute: regular languages.
Regular Languages
Regular languages are the most basic kind of languages, and are de-
rived from rather simple language operations. In particular, we define
the following three operations for languages L, M ⊆ Σ∗:

• Union: The union of L and M is the language L ∪ M = {x ∈ Σ∗ | x ∈ L or x ∈ M}.

• Concatenation: The concatenation of L and M is the language LM = {xy ∈ Σ∗ | x ∈ L, y ∈ M}.

• Kleene Star: The (Kleene) star of L is the language L∗ = {ϵ} ∪ {x ∈ Σ∗ | ∃w1, w2, . . . , wn ∈ L such that x = w1w2 · · · wn, for some n}. That is, L∗ contains the strings that can be broken down into 0 or more smaller strings, each of which is in L. (The star operation can be thought of in terms of union and concatenation as L∗ = {ϵ} ∪ L ∪ LL ∪ LLL ∪ · · ·)
Example. Consider the languages L = {a, bb} and M = {a, c}. Then we have
L ∪ M = {a, bb, c}
LM = {aa, ac, bba, bbc}
L∗ = {ϵ, a, aa, bb, aaa, abb, bba, . . . }
M∗ = {ϵ, a, c, aa, ac, ca, cc, aaa, aac, . . . }
Note that M∗ = {a, c}∗ is exactly the strings made up of only a's and c's; this explains the notation Σ∗ to denote the set of all strings over the alphabet Σ.
We can now give the following recursive definition of regular lan-
guages:
Definition (Regular Language). The set of regular languages over an regular language
alphabet Σ is defined recursively as follows:
• ∅, the empty set, is a regular language.
• {ϵ}, the language consisting of only the empty string, is a regular
language.
• For any symbol a ∈ Σ, {a} is a regular language.
• If L, M are regular languages, then so are L ∪ M, LM, and L∗.
Students often confuse the notation ∅,
ϵ, and {ϵ}. First, ϵ is a string, while ∅
and {ϵ} are sets of strings. The set ∅
contains no strings (and so has size 0),
while {ϵ} contains the single string ϵ
(and so has size 1).
Regular languages are sets of strings, and are often infinite. As humans, we are able to leverage our language processing and logical abilities to represent languages; for example, "strings that start and end with the same character" and "strings that have an even number of zeroes" are both simple descriptions of regular languages. What about computers? We could certainly write simple programs that compute either of the above languages (exercise: write these programs), but we have grander ambitions. Specifically, we would like a simple, computer-friendly representation of regular languages so that we could input an arbitrary regular language and a string, and the computer would determine whether the string is in the language or not.
This is precisely the idea behind the regular expression (regex), a pattern-based string representation of a regular language. Given a regular expression r, we use L(r) to denote the language matched (represented) by r. Here are the elements of regular expressions (note the almost identical structure to the definition of regular languages themselves):
• ∅ is a regex, with L(∅) = ∅ (matches no string). (We follow the unfortunate overloading of symbols of previous course instructors: when we write L(∅) = ∅, the first ∅ is a regular expression consisting of the single character ∅, while the second ∅ is the standard mathematical notation for the empty set.)
• ϵ is a regex, with L(ϵ) = {ϵ} (matches only the empty string)
• For all symbols a ∈ Σ, a is a regex with L(a) = {a} (matches only the single string a)
• Let r1, r2 be regexes. Then r1 + r2, r1r2, and r1∗ are regexes, with L(r1 + r2) = L(r1) ∪ L(r2), L(r1r2) = L(r1)L(r2), and L(r1∗) = (L(r1))∗ (matches union, concatenation, and star, respectively)
It’s an easy exercise to prove by structural induction that every reg-
ular language can be matched by a regular expression, and every regular
expression matches a language that is regular. Another way to put it is
that a language L is regular if and only if there is a regular expression
r such that L = L(r).
Example. Let Σ = {0, 1}. Describe the language of the regex 01 + 1(0 + 1)∗. To interpret this regex, we need to understand precedence rules. By convention, these are identical to the standard arithmetic precedences: star has the highest precedence, followed by concatenation, and finally union. (So star is like power, concatenation is like multiplication, and union is like addition.) Therefore the complete bracketing of this regex is (01) + (1((0 + 1)∗)), but with these precedence rules in place, we only need the brackets around the (0 + 1).
Let us proceed part by part. The “01” component matches the string
01. The (0 + 1)∗ matches all binary strings, because it contains all
strings resulting from adding a 0 or 1 at each step. This means that
1(0 + 1)∗ matches a 1, followed by any binary string. Finally, we take
the union of these two: L(01 + 1(0 + 1)∗) is the set of strings that are
either 01 or start with a 1.
Example. Let’s go the other way, and develop a regular expression
given a description of the following regular language:
L = {w ∈ {a, b}∗ | w has length at most 2}.
Solution:
Note that L is finite, so in fact we can simply list out all of the strings
in our regex:
ϵ + a + b + aa + ab + ba + bb
Another strategy is to divide up the regex into cases depending on
length:
ϵ + (a + b) + (a + b)(a + b)
The three parts capture the strings of length 0, 1, and 2, respectively.
A final representation would be to say that we'll match two characters, each of which could be empty, a, or b:
(ϵ + a + b)(ϵ + a + b).
All three regexes we gave are correct! In general, there is more than one
regular expression that matches any given regular language.
Example. A regular expression matching the language L = {w ∈ {0, 1}∗ | w has 11 as a substring} is rather straightforward:
(0 + 1)∗11(0 + 1)∗.
But what about the complement of L, i.e., the language L̄ = {w ∈ {0, 1}∗ | w does not have 11 as a substring}? It is more difficult to find a regex for this language because regular expressions specify patterns that should be matched, not avoided. Let's approach this problem by interpreting the definition as "every 1 must be preceded by a 0."
Here is our first attempt:
(00∗1)∗.
Inside the brackets, 00∗1 matches a non-empty block of 0's followed by a 1, and this certainly ensures that there are no consecutive 1's. Unfortunately, this regular expression matches only a subset of L̄, and not L̄ entirely. For example, the string 1001 is in L̄, but can't be matched by this regular expression because the first 1 isn't preceded by any 0's. We can fix this by saying that the first character could possibly be a 1:
(ϵ + 1)(00∗1)∗.
There is one last problem: this fails to match any string that ends with
0’s, e.g., 0100. We can apply a similar fix as the previous one to allow
a block of 0’s to be matched at the end:
(ϵ + 1)(00∗1)∗0∗.
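As a quick sanity check, we can translate this regex into Python's re syntax and test it against sample strings (an illustration only: Python writes union as | rather than +, an empty alternative plays the role of ϵ, and fullmatch anchors the pattern to the whole string):

import re

# (epsilon + 1)(00*1)*0* in Python's syntax; "1?" plays the role of (epsilon + 1).
no_11 = re.compile(r"(1?)(00*1)*0*")

for w in ["", "1", "1001", "0100", "0110", "11"]:
    # The regex should match exactly the strings with no "11" substring.
    assert (no_11.fullmatch(w) is not None) == ("11" not in w)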
Some interesting meta-remarks arise naturally from the last example. One is a rather straightforward way of showing that a language and regex are inequivalent: simply produce a string that is in one but not the other. (Keep this in mind when you and your friends are arguing about whose regular expressions are correct.) On the other hand, convincing ourselves that a regular expression correctly matches a language can be quite difficult; how did we know that (ϵ + 1)(00∗1)∗0∗ correctly matches L̄? Proving that a regex matches a particular language L is much harder, because we need to show that every string in L is matched by the regex, and every string not in L is not.
(Note that this is a universally quantified statement, while its negation — the regex doesn't match L — is existentially quantified. This explains the difference in difficulty.)
Our intuition only takes us so far — it is precisely this gap between our gut and true understanding that proofs were created to fill. This is, however, just a little beyond the scope of the course; you have all the necessary ingredients — surprise: induction — but the arguments for all but the simplest languages are quite involved. (We will see later that reasoning about the correctness of deterministic finite automata is just as powerful, and a little simpler.)
Example. We finish off this section with a few corner cases to illustrate some of the subtleties in our definition, and extend the arithmetic metaphor we hinted at earlier. First, the following equalities show that ∅ plays the role of the "zero" in regular expressions. Let r be an arbitrary regular expression. (By equality between regular expressions we mean that they match the same language; that is, r1 = r2 ⇔ L(r1) = L(r2).)
∅ + r = r + ∅ = r
∅r = r∅ = ∅
The first is obvious because taking the union of any set with the empty set doesn't change the set. What about concatenation? Recall that concatenation of languages involves taking combinations of strings from the first language and strings from the second; if one of the two languages is empty, then no combinations are possible.
We can use similar arguments to show that ϵ plays the role of the "one":
ϵr = rϵ = r
ϵ∗ = ϵ
A Suggestive Flowchart
You might be wondering how computers actually match strings to reg-
ular expressions. It turns out that regular languages are rather easy to
match because of the following (non-obvious!) property:
You can determine membership in a regular language by
reading symbols of the string one at a time, left to right.
[Figure: a two-state automaton. States q0 and q1 each loop to themselves on 0, and reading a 1 moves between q0 and q1; q1 is drawn with a double border.]
Consider the flowchart-type object shown in the figure on the right.
Suppose we start at the state marked q0. Now consider the string
0110, reading the symbols one at a time, left to right, and following
the arrows marked by the symbols we read in. It is not hard to see
that we end up at state q0. On the other hand, if the string is 111,
we end up at state q1. The state q1 is marked as special by a double-
border; we say that a string ending up at q1 is accepted, while a string
ending at q0 is rejected.
After some thought you may realise that the accepted strings are
exactly the ones with an odd number of 1’s. A more suggestive way
of saying this is that the language accepted by this flowchart is the set of
strings with an odd number of 1’s.
Deterministic Finite Automata
We now use the notion of “flowcharts” to define a simple model of
computation. Each one acts as a simple computer program: starting
at a particular point, it receives inputs from the user, updates its in-
ternal memory, and when the inputs are finished, it outputs True or
False. The following definition is likely one of the most technical
you’ve encountered to date, in this course and others. Make sure you
fully understand the prior example, and try to match it to the formal
definition as you read it.
Definition (Deterministic Finite Automaton). A deterministic finite automaton (DFA), denoted D, is a quintuple D = (Q, Σ, δ, s, F) where the components define the various parts of the automaton:
• Q is the (finite) set of states in D
• Σ is the alphabet of symbols used by D
• δ : Q × Σ → Q is the transition function (represents the "arrows")
• s ∈ Q is the initial state of D
• F ⊆ Q is the set of accepting (final) states of D
Example. In the introductory example, the state set is Q = {q0, q1}, the alphabet is {0, 1}, the initial state is q0, and the set of final states is {q1}. (Note that there's only one final state in this example, but in general there may be several final states.) We can represent the transition function as a table of values:

Old State   Symbol   New State
q0          0        q0
q0          1        q1
q1          0        q1
q1          1        q0
In general, the number of rows in the transition table is |Q| · |Σ|;
each state must have exactly |Σ| transitions leading out of it, each la-
belled with a unique symbol in Σ.
Before proceeding, make sure you understand each of the following
statements about how DFAs work:
• DFAs read strings one symbol at a time, from left to right
• DFAs cannot “go back” and reread previous symbols
• At a particular state, once you have read a symbol there is only one
arrow (transition) you can follow
• DFAs have a finite amount of memory, since they have a finite num-
ber of states
• Inputs to DFAs can be any length
• Because of these last two points, it is impossible for DFAs to always
remember every symbol they have read so far
A quick note about notation before proceeding with a few more
examples. Technically, δ takes as its second argument a single symbol;
e.g., δ(q0, 1) = q1 (from the previous example). But we can just as
easily extend this definition to arbitrary-length strings in the second
argument. For example, we can say δ(q0, 11) = q0, δ(q1, 1000111) = q1,
and δ(q0, ϵ) = q0. For every state q, δ(q, ϵ) = q.
With this "extended" transition function, it is very easy to symbolically represent the language L(D) accepted by the DFA D. (Recall that the language accepted by a DFA is the set of strings accepted by it.)
L(D) = {w ∈ Σ∗ | δ(s, w) ∈ F}
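Here is a minimal sketch of a DFA simulator in Python, representing δ as a dictionary keyed by (state, symbol) pairs (the function and variable names are our own):

def run_dfa(delta, start, final, w):
    """Return True iff the DFA (delta, start, final) accepts the string w."""
    state = start
    for symbol in w:                 # read left to right, one symbol at a time
        state = delta[(state, symbol)]
    return state in final            # accept iff we end at a final state

# The "odd number of 1's" DFA from the introductory example:
delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q1", ("q1", "1"): "q0"}
assert not run_dfa(delta, "q0", {"q1"}, "0110")   # even number of 1's: reject
assert run_dfa(delta, "q0", {"q1"}, "111")        # odd number of 1's: accept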
Example. Let us design a DFA that accepts the following language:
L = {w ∈ {a, b}∗ | w starts with a and ends with b}.
Solution:
Consider starting at an initial state q0.

[Figure: three snapshots of a DFA under construction. First: q0 with a b-transition to a non-accepting "trap" state q1, which loops on both a and b, and an a-transition to q2. Next: q2 loops on a and moves to an accepting state q3 on b. Finally: q3 loops on b and returns to q2 on a.]
What happens when you read in an a or a b? Reading in a b should
lead to a state q1 where it’s impossible to accept. On the other hand,
reading in an a should lead to a state q2 where we need to “end with a
b.” The simple way to achieve this is to have q2 loop on a’s, then move
to an accepting state q3 on a b.
What about the transitions for q3? As long as it’s reading b’s, it can
accept, so it should loop to itself. On the other hand, if it reads an a, it
should go back to q2, and continue reading until it sees another b.
Correctness of DFAs
Like regular expressions, arguing that DFAs are incorrect is generally easier than arguing that they are correct. However, because of the rather restricted form DFAs must take, reasoning about their behaviour is a little more tractable than it is for regular expressions.
The simple strategy of “pick an arbitrary string in L, and show that
it is accepted by the DFA” is hard to accomplish, because the paths
taken through the DFA by each accepted string can be quite differ-
ent; i.e., different strings will probably require substantially different
proofs. Therefore we adopt a different strategy. We know that DFAs
consist of states and transitions between states; the term state suggests
that if a string reaches that point, the DFA “knows” something about
that string, or it is “expecting” what will come next. We formalize
this notion by characterizing for each state precisely what must be true
about the strings that reach it.
Definition (State invariant). Let D = (Q, Σ, δ, s, F) be a DFA, and let q ∈ Q be a state of the DFA. We define a state invariant for q as a predicate P(x) (over domain Σ∗) such that for every string w ∈ Σ∗, δ(s, w) = q if and only if P(w) is true.
Note that the definition of state invariant uses an if and only if. We
aren’t just giving properties that the strings reaching q must satisfy;
we are defining precisely which strings reach q. Let us see how state
invariants can help us prove that DFAs are correct.
Example. Consider the following language over the alphabet {0, 1}:
L = {w | w has an odd number of 1’s}, and the DFA shown. Prove
that the DFA accepts precisely the language L.
[Figure: the DFA for this language: states q0 (initial) and q1 (accepting) each loop on 0 and swap on 1.]
Proof. It is fairly intuitive why this DFA is correct: strings with an even
number of 1’s go to q0, and transition to q1 upon reading a 1 (where the
string now has an odd number of 1’s). Here are some state invariants
for the two states:
δ(q0, w) = q0 ⇔ w has an even number of 1’s
δ(q0, w) = q1 ⇔ w has an odd number of 1’s
Here are two important properties to keep in mind when designing
state invariants:
• The state invariants should be mutually exclusive. That is, there
should be no overlap between them; no string should satisfy two
different state invariants. Otherwise, to which state would the string
go?
• The state invariants should be exhaustive. That is, they should cover
all possible cases; every string in Σ∗, including ϵ, should satisfy one
of the state invariants. Otherwise, the string goes nowhere.
These conditions are definitely satisfied by our two invariants above,
since every string has either an even or odd number of 1’s.
Next, we want to prove that the state invariants are correct. We do
this in two steps.
• Show that the empty string ϵ satisfies the state invariant of the
initial state. In our case, the initial state is q0; ϵ has zero 1’s, which
is even; therefore the state invariant is satisfied by ϵ.
• For each transition from a state q to a state r on a symbol a, show that if a string w satisfies the invariant of state q, then the string wa satisfies the invariant of r. (That is, each transition respects the invariants.) There are four transitions in our DFA. For the two 0-loops, appending a 0 to a string doesn't change the number of 1's in the string, and hence if w contains an even (odd) number of 1's, then w0 contains an even (odd) number of 1's as well, so these two transitions are correct. On the other hand, appending a 1 increases the number of 1's in a string by one. So if w contains an even (odd) number of 1's, w1 contains an odd (even) number of 1's, so the transitions between q0 and q1 labelled 1 preserve the invariants. (The astute reader will note that we are basically doing a proof by induction on the length of the strings. Like our proofs of program correctness, we "hide" the formalities of induction proofs and focus only on the content.)
Thus we have proved that the state invariants are correct. The final step is to show that the state invariants of the accepting state(s) precisely describe the target language. (Remember that in general, there can be more than one accepting state!) This is very obvious in this case, because the only accepting state is q1, and its state invariant is exactly the defining characteristic of the target language L. ■
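A proof is a proof, but we can also mechanically spot-check state invariants by enumerating every short string and comparing where it lands against each predicate. A throwaway sketch (the names and the length bound are our own choices):

from itertools import product

delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q1", ("q1", "1"): "q0"}

def delta_star(state, w):
    """The extended transition function: follow w symbol by symbol."""
    for symbol in w:
        state = delta[(state, symbol)]
    return state

invariants = {"q0": lambda w: w.count("1") % 2 == 0,   # even number of 1's
              "q1": lambda w: w.count("1") % 2 == 1}   # odd number of 1's

# Check every binary string of length at most 8 against both invariants.
for n in range(9):
    for symbols in product("01", repeat=n):
        w = "".join(symbols)
        for q, P in invariants.items():
            assert (delta_star("q0", w) == q) == P(w)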
Limitations of DFAs
The simplicity of the DFA model enables proofs of correctness, as
shown above. This simplicity is also useful for reasoning about the
model’s limitations. In this section, we’ll cover two examples. First,
we prove a lower bound on the number of states required in a DFA ac-
cepting a particular language. Then, we’ll show that certain languages
cannot be accepted by DFAs of any size!
Example. Consider the language
L = {w ∈ {0, 1}∗ | w has at least three 1's}.
We'll prove that any DFA accepting this language has at least 4 states.
Proof. Suppose there exists a DFA D = (Q, Σ, δ, s, F) that accepts L and has fewer than four states. Consider the four strings w0 = ϵ, w1 = 1, w2 = 11, and w3 = 111. By the Pigeonhole Principle, two of these strings reach the same state in D from the initial state. (The Pigeonhole Principle states that if m > n, and you are putting m pigeons into n holes, then two pigeons will go into the same hole.) Suppose 0 ≤ i < j ≤ 3, and δ(s, wi) = δ(s, wj) = q for some state q ∈ Q. (That is, wi and wj both reach the same state q.)
Now, since wi and wj reach the same state, they are indistinguishable prefixes to the DFA; this means that any strings of the form wix and wjx will end up at the same state in the DFA, and hence are both accepted or both rejected. However, suppose x = w3−j. Then wix = wi+3−j and wjx = wj+3−j = w3. Then wix contains 3 − j + i 1's, and wjx contains three 1's. But since i < j, 3 − j + i < 3, so wix ∉ L, while wjx ∈ L. Therefore wix and wjx cannot end up at the same state in D, a contradiction! ■
The key idea in the above proof was that the four different strings ϵ, 1, 11, 111 all had to reach different states, because there were suffixes that could distinguish any pair of them. (These suffixes were the x = w3−j in the proof.) In general, to prove that a language L requires at least k states in a DFA to accept it, it suffices to give a set of k strings, each of which is distinguishable from the others with respect to L. (Two strings w1 and w2 are "distinguishable with respect to L" if there is a suffix x such that w1x ∈ L and w2x ∉ L, or vice versa.)
Now we turn to a harder problem: proving that some languages
cannot be accepted by DFAs of any size.
Example. Consider the language L = {0ⁿ1ⁿ | n ∈ N}. Prove that no DFA accepts L.
Proof. We'll prove this by contradiction again. Suppose there is a DFA D = (Q, Σ, δ, s, F) that accepts L. Let k = |Q|, the number of states in D. Now here is the key idea: D has only k states of "memory," whereas to accept L, you really have to remember the exact number of 0's you've read in so far, so that you can match that number of 1's later. So we should be able to "overload" the memory of D.
Here's how: consider the string w = 0ᵏ⁺¹, i.e., the string consisting of k + 1 0's. Since there are only k states in D, the path that w takes through D must involve a loop starting at some state q. That is, we can break up w into three parts: w = 0ᵃ0ᵇ0ᶜ, where b ≥ 1, δ(s, 0ᵃ) = q, and δ(q, 0ᵇ) = q.
This loop is dangerous for the DFA! Because reading 0ᵇ causes a loop that begins and ends at q, the DFA forgets whether it has read 0ᵇ or not; thus the strings 0ᵃ0ᶜ and 0ᵃ0ᵇ0ᶜ reach the same state, and are now indistinguishable to D. (The DFA has lost track of the number of 0's.) But of course these two strings are distinguishable with respect to L: 0ᵃ0ᶜ1ᵃ⁺ᶜ ∈ L, but 0ᵃ0ᵇ0ᶜ1ᵃ⁺ᶜ ∉ L. ■
Nondeterminism
Consider the following language over the alphabet {0, 1}: L = {w | the third last character of w is 1}. You'll see in the Exercises that a DFA takes at least 8 states to accept L, using the techniques we developed in the previous section. Yet there is a very short regular expression that matches this language: (0 + 1)∗1(0 + 1)(0 + 1). Contrast this with the regular expression (0 + 1)(0 + 1)1(0 + 1)∗, matching strings whose third character is 1; this has a simpler 5-state DFA. (You can prove this in the Exercises.)
Why is it hard for DFAs to "implement" the former regex, but easy to implement the latter? The fundamental problem is the uncertainty associated with the Kleene star. In the former case, a DFA cannot tell how many characters to match with the initial (0 + 1)∗ segment before moving on to the 1! This is not an issue in the latter case, because DFAs read left to right, and so have no problem reading the first three characters, and then matching the rest of the string to the (0 + 1)∗.
[Figure: an automaton with states q0, q1, q2, q3. State q0 loops on 0 and 1, and on a 1 can also move to q1; q1 moves to q2 and q2 moves to q3, each on either 0 or 1; q3 is accepting and has no outgoing transitions.]
On the other hand, consider the automaton to the right. This is not
deterministic because it has a “choice”: reading in a 1 at q0 can loop
back to q0 or move to q1. Moreover, reading in a 0 or a 1 at q3 leads
nowhere! But consider this: for any string whose third last character is
indeed a 1, there is a “correct path” that leads to the final state q3. For
example, for the string 0101110, the correct path would continuously
loop at q0 for the prefix 0101, then read the next 1, transition to q1,
then read the remaining 10 to end at q3, and accept.
BUT WAIT, you say, isn’t this like cheating? How did the automa-
ton “know” to loop at the first two 1’s, then transition to q1 on the
third 1? We will define our model in this way first, so bear with us.
The remarkable fact we’ll show later is that, while this model seems
more powerful than plain old DFAs, in fact every language that we can
accept in this model can also be accepted with a DFA. Cheating doesn’t help.
A Nondeterministic Finite Automaton (NFA) is defined in a similar fashion to a DFA: it is a quintuple N = (Q, Σ, δ, s, F), with Q, Σ, s, and F playing the same roles as before. The transition function δ now maps to sets of states rather than individual states; that is, δ : Q × Σ → 2^Q, where 2^Q represents the set of all subsets of Q. For instance, in the previous example, δ(q0, 1) = {q0, q1}, and δ(q3, 0) = ∅.
We think of δ(q, a) here as representing the set of possible states reach-
able from q by reading in the symbol a. This is extended in the natural
way to δ(q, w) for arbitrary length strings w to mean all states reach-
able from q by reading in the string w. Note that for state q and next
symbol a, if δ(q, a) = ∅ then this path “aborts,” i.e., the NFA does not
continue reading more characters for this path.
We say that a string w is accepted by NFA N if δ(s, w) ∩ F ≠ ∅; that is, if there exists a final state reachable from the initial state by reading the string w. Note that the existential quantifier used in the definition of acceptance is an integral part of the definition: we don't require that every path leading out of s along w reach a final state, only one. This formalizes the notion that it be possible to choose the correct path, even from among many rejecting or aborted paths. (On the other hand, if a string is rejected by an NFA, this means all possible paths that string could take either aborted or ended at a rejecting state.)
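Tracking the set of all states reachable so far makes nondeterminism straightforward to simulate. A minimal sketch (our own names; missing dictionary entries stand for ∅, i.e., aborted paths):

def run_nfa(delta, start, final, w):
    """Return True iff the NFA accepts w; delta maps (state, symbol)
    to a set of states."""
    current = {start}
    for symbol in w:
        # All states reachable from any current state on this symbol.
        current = set().union(*(delta.get((q, symbol), set()) for q in current))
    return bool(current & final)   # accepted iff some path reaches a final state

# The "third last character is 1" NFA sketched above:
delta = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
         ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"},
         ("q2", "0"): {"q3"}, ("q2", "1"): {"q3"}}
assert run_nfa(delta, "q0", {"q3"}, "0101110")
assert not run_nfa(delta, "q0", {"q3"}, "0000")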
ϵ-transitions
We can augment NFAs even further through the use of ϵ-transitions. These are nondeterministic transitions that do not require reading in a symbol to activate. That is, if you are currently at state q on an NFA, and there is an ϵ-transition from q to another state r, then you can transition to r without reading the next symbol in the string.

[Figure: a generic ϵ-transition from a state q to a state r.]
For instance, the NFA on the right accepts the string 0 by taking
the ϵ-transition to q1, reading the 0 to reach q2, then taking another
ϵ-transition to q3.
[Figure: an NFA with states q0 through q5, whose transitions include ϵ-moves and moves on 0 and 1; in particular, q0 has an ϵ-transition to q1, q1 reads a 0 to reach q2, and q2 has an ϵ-transition to q3.]
As was the case for nondeterminism, ϵ-transitions do not actually
add any power to the model, although they can be useful in certain
constructions that we’ll use in the next section.
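Simulating an NFA with ϵ-transitions needs just one extra ingredient: the ϵ-closure of a set of states, i.e., everything reachable by ϵ-transitions alone. A sketch extending the simulator above (again, the names are ours):

def epsilon_closure(eps, states):
    """All states reachable from `states` using only epsilon-transitions.
    eps maps a state to the set of its epsilon-successors."""
    closure, frontier = set(states), list(states)
    while frontier:
        q = frontier.pop()
        for r in eps.get(q, set()):
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure

def run_nfa_eps(delta, eps, start, final, w):
    """Accept iff some path, possibly using epsilon-moves, ends in a final state."""
    current = epsilon_closure(eps, {start})
    for symbol in w:
        moved = set().union(*(delta.get((q, symbol), set()) for q in current))
        current = epsilon_closure(eps, moved)
    return bool(current & final)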
Equivalence of Definitions
So far in this chapter, we have used both DFAs and regular expressions
to represent the class of regular languages. We have taken for granted
that DFAs are sufficient to represent regular languages; in this section,
we will prove this formally. There is also the question of nondeter-
minism: do NFAs accept a larger class of languages than DFAs? In
this section, we’ll show that the answer, somewhat surprisingly, is no.
Specifically, we will sketch a proof of the following theorem.
Theorem (Equivalence of Representations of Regular Languages). Let
L be a language over an alphabet Σ. Then the following are equivalent:
(1) There is a regular expression that matches L.
(2) There is a deterministic finite automaton that accepts L.
(3) There is a nondeterministic finite automaton (possibly with ϵ-transitions)
that accepts L.
Proof. If you aren’t familiar with theorems asserting the equivalence of
multiple statements, what we need to prove is that any one of the state-
ments being true implies that all of the others must also be true. We
are going to prove this by showing the following chain of implications:
(3)⇒ (2)⇒ (1)⇒ (3).
(We only sketch the main ideas of the proof. For a more formal treatment, see Sections 7.4.2 and 7.6 of Vassos Hadzilacos' course notes.)
(3) ⇒ (2). Given an NFA, we’ll show how to construct a DFA
that accepts the same language. Here is the high-level idea: nondeter-
minism allows you to “choose” different paths to take through an au-
tomaton. After reading in some characters, the possible path choices
can be interpreted as the NFA being simultaneously in some set of
states. Therefore we can model the NFA as transitioning between sets
of states each time a symbol is read. Rather than formally defining this
construction, we’ll illustrate this on an example NFA (shown right).
[Figure: an example NFA with states 0 (initial), 1 (accepting), and 2; its transitions include moves on a and b and an ϵ-transition from 1 to 2.]
There are 2³ = 8 possible subsets of states in this NFA; to construct a DFA, we start with one state for each subset (notice that they are labelled according to which states of the NFA they contain, so state 02 corresponds to the subset {0, 2}).
[Figure: the subset construction, drawn in stages: first a state for each of the eight subsets ∅, 0, 1, 2, 01, 02, 12, 012; then the six states left after discarding 1 and 01, with their a- and b-transitions filled in; finally the trimmed DFA on the reachable states 0, 12, 02, and 012.]
But, we notice that the ϵ-transition between 1 and 2 ensures that
every time we could be in state 1 of the NFA, we could also be in state
2. Therefore we can ignore states 01 and 1 in our construction, leaving
us with just six states.
Next, we put in transitions between the states of the DFA. Consider the subset {0, 2} of states in the NFA. Upon reading the symbol a, we could end up at all three states, {0, 1, 2} (notice that to reach 2, we must transition from 2 to 1 by reading the a, then use the ϵ-transition from 1 back to 2). In our constructed DFA, there is therefore a transition between {0, 2} and {0, 1, 2} on symbol a. So in general, we look at all possible outcomes starting from a state in subset S and reading symbol a. Repeating this for all subsets yields the transitions shown in the figure.
Next, we need to identify initial and final states. The initial state of the NFA is 0, and since there are no ϵ-transitions leaving 0, the initial state of the constructed DFA is {0}. The final states of the DFA are exactly the subsets containing the final state 1 of the NFA. Finally, we can simplify the DFA considerably by removing the states ∅ and {2}, which cannot be reached from the initial state.
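The subset construction is short to write down in code. A sketch under the same conventions as the simulators above (delta maps (state, symbol) to a set of states, eps maps a state to its ϵ-successors; all names are ours):

def subset_construction(delta, eps, start, final, alphabet):
    """Build a DFA, whose states are frozensets of NFA states,
    accepting the same language as the given NFA."""
    def closure(states):
        out, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for r in eps.get(q, set()):
                if r not in out:
                    out.add(r)
                    stack.append(r)
        return frozenset(out)

    dfa_start = closure({start})
    dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
    # Exploring only reachable subsets automatically drops unreachable
    # states (like the states ∅ and {2} in the example above).
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = closure(set().union(*(delta.get((q, a), set()) for q in S)))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    dfa_final = {S for S in seen if S & final}
    return dfa_delta, dfa_start, dfa_final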
(1) ⇒ (3). In this part, we show how to construct NFAs from
regular expressions. Note that regular expressions have a recursive
definition, so we can actually prove this part using structural induc-
tion. First, we show standard NFAs for accepting ∅, {ϵ}, and {a} (for
a generic symbol a).
[Figure: three tiny NFAs accepting ∅, {ϵ}, and {a}, respectively.]
Next, we show how to construct NFAs for union, concatenation, and
star, the three recursive operations used to define regular expressions.
Note that because we’re using structural induction, it suffices to show
how to perform these operations on NFAs; that is, given two NFAs
N1 and N2, construct NFAs accepting L(N1) ∪ L(N2), L(N1)L(N2),
and (L(N1))∗. We use the notation on the right to denote generic
NFAs; the two accepting states on the right side of each box symbolize
all accepting states of the NFAs, and their start states are s1 and s2,
respectively.
[Figure: two generic NFAs N1 and N2 drawn as boxes, with start states s1 and s2 and accepting states on the right side of each box.]
First consider union. This can be accepted by the NFA shown to the right. Essentially, the idea is that starting in a new start state s, we "guess" whether the word will be accepted by N1 or N2 by ϵ-transitioning to either s1 or s2, and then see if the word is actually accepted by running the corresponding NFA.

[Figure: the union construction: a new start state s with ϵ-transitions to s1 and s2.]

For concatenation, we start with the first NFA N1, and then every time we reach a final state, we "guess" that the matched string from L(N1) is complete, and ϵ-transition to the start state of N2.

[Figure: the concatenation construction: N1 followed by N2, with an ϵ-transition from each final state of N1 to s2.]
Finally, for the Kleene star we perform a similar construction, except that the final states of N1 ϵ-transition to s1 rather than s2. To possibly accept ϵ, we add a new initial state that is also accepting.

[Figure: the star construction: a new accepting start state s with an ϵ-transition to s1, and ϵ-transitions from each final state of N1 back to s1.]
(2) ⇒ (1). Finally, we show how to construct regular expressions
from DFAs. This is the hardest construction to prove, so our sketch
here will be especially vague. The key idea is the following:
For any two states q, r in a DFA, there is a regular expression
that matches precisely the strings w such that δ(q, w) = r;
i.e., the strings that induce a path from q to r.
Let D = (Q, Σ, δ, s, F) be a DFA with n states, Q = {1, . . . , n}. For each i, j ∈ Q, we define the sets Lij = {w | δ(i, w) = j}; our ultimate goal is to show that every Lij can be matched by a regular expression. (This part was first proved by Stephen Kleene, the inventor of regular expressions and after whom the Kleene star is named.) To do this, we use a clever induction argument. For every 0 ≤ k ≤ n, let
Lij(k) = {w | δ(i, w) = j, and only states ≤ k are passed between i and j}
Note that Lij(0) is the set of strings where there must be no intermediate states, i.e., w is a symbol labelling a transition directly from i to j. Also, Lij(n) = Lij: no restrictions are placed on the states that can be passed. We will show how to inductively build up regular expressions matching each of the Lij(k), where the induction is done on k. (Formally, our predicate is P(k): "For all states i and j, the set Lij(k) can be matched by a regex.") First, the base case, which we've already described intuitively:

Lij(0) = {a ∈ Σ | δ(i, a) = j},          if i ≠ j
Lij(0) = {a ∈ Σ | δ(i, a) = j} ∪ {ϵ},   if i = j

Note that when i = j, we need to include ϵ, as this indicates the trivial act of following no transition at all. Since the Lij(0) are finite sets (of symbols), we can write regular expressions for them; e.g., if Lij(0) = {a, c, f}, the regex would be a + c + f. (You can prove this fact in the Exercises.)
Finally, here is the recursive definition of the sets that will allow us to construct regular expressions: it defines Lij(k + 1) in terms of some L...(k), using only the operations of union, concatenation, and star:

Lij(k + 1) = Lij(k) ∪ ( Li,k+1(k) Lk+1,k+1(k)∗ Lk+1,j(k) )

Therefore, given regular expressions for Lij(k), Li,k+1(k), Lk+1,k+1(k), and Lk+1,j(k), we can construct a regular expression for Lij(k + 1).
■
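This inductive construction, often called Kleene's algorithm, translates directly into code. A sketch that builds regex strings in the notation of this chapter, with our own helper names and no simplification of the output (the expressions it produces are correct but grow very quickly):

def dfa_to_regex(n, delta, start, finals, alphabet):
    """Return a regex (a string, using '+' for union) for the language of a
    DFA with states 1..n, transition dict delta, and final states finals."""
    # Base case: L[i][j] is a regex for L_ij(0).
    L = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            parts = [a for a in alphabet if delta[(i, a)] == j]
            if i == j:
                parts.append("ϵ")     # the trivial empty path
            L[i][j] = "(" + " + ".join(parts) + ")" if parts else "∅"

    # Inductive step: allow intermediate states 1, 2, ..., n in turn.
    for k in range(1, n + 1):
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                new[i][j] = f"({L[i][j]} + {L[i][k]}{L[k][k]}*{L[k][j]})"
        L = new

    # The DFA's language is the union of L_{start,f}(n) over final states f.
    return " + ".join(L[start][f] for f in finals)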
Exercises
1. For each of the following regular languages over the alphabet Σ =
{0, 1}, design a regular expression and DFA which accepts that lan-
guage. For which languages can you design an NFA that is substan-
tially smaller than your DFA?
(a) {w | w contains an odd number of 1’s}
(b) {w | w contains exactly two 1’s}
(c) {w | w contains 011}
(d) {w | w ends with 00}
(e) {w | |w| is a multiple of 3}
(f) {w | every 0 in w is eventually followed by a 1}
(g) {w | w does not begin with 10}
(h) {w | w is the binary representation of a multiple of three}
(i) {w | w contains both 00 and 11 as substrings}
(j) {0ⁿ1ᵐ | m, n ∈ N, m + n is even}
(k) {w | w contains a substring of length at most 5 that has at least three 1’s}
(l) {w | w has an even number of zeroes, or exactly 2 ones}
2. Let L = {w ∈ {0, 1}∗ | the third character of w is a 1}. Prove that
every DFA accepting L has at least 5 states.
3. Let L = {w ∈ {0, 1}∗ | the third last character of w is a 1}. Prove that every DFA accepting L has at least 8 states. Hint: Consider the 8 binary strings of length 3. For a bonus, what is the smallest DFA you can find that accepts L? (It will have at least 8 states!)
4. Prove by induction that every finite language can be represented by a regular expression. (This shows that all finite languages are regular.)
5. Prove that the following languages are not regular.
(a) {aⁿ² | n ∈ N}
(b) {xx | x ∈ {0, 1}∗}
(c) {w ∈ {a, b}∗ | w has more a's than b's}
(d) {w ∈ {0, 1}∗ | w has two blocks of 0's with the same length} (A block is a maximal substring containing the same character; for example, the string 00111000001 has four blocks: 00, 111, 00000, and 1.)
(e) {aⁿbᵐcⁿ⁻ᵐ | n ≥ m ≥ 0}
6. Recall that the complement of a language L ⊆ Σ∗ is the set L̄ = {w ∈ Σ∗ | w ∉ L}.
(a) Given a DFA D = (Q, Σ, δ, s, F) that accepts a language L, describe how you can construct a DFA that accepts L̄.
(b) Let L be a language over an alphabet Σ. Prove that if L is not regular, then the complement of L is not regular.
7. A regular expression r is star-free if it doesn’t contain any star oper-
ations. Prove that for every star-free regex r, L(r) is finite.
8. The suffix operator Suff takes as input a regular language, and outputs all possible suffixes of strings in the language. For example, if L = {aa, baab} then
Suff(L) = {ϵ, a, aa, b, ab, aab, baab}.
Prove that if L is a regular language, then so is Suff(L). (Hint: recall the definition of regular languages, and use structural induction!)
9. The prefix operator Pre takes as input a regular language, and outputs
all possible prefixes of strings in the language. For example, if L =
{aa, baab} then
Pre(L) = {ϵ, a, aa, b, ba, baa, baab}.
Prove that if L is a regular language, then so is Pre(L). (Hint: recall
the definition of regular languages, and use structural induction!)
In Which We Say Goodbye
With CSC236 complete, you have now mastered the basic concepts and
reasoning techniques vital to your computer science career both at this
university and beyond. You have learned how to analyse the efficiency
of your programs, both iterative and recursive. You have also learned
how to argue formally that they are correct by using program spec-
ifications (pre- and postconditions) and loop invariants. You studied
the finite automaton, a simple model of computation with far-reaching
consequences.
Where to from here? Most obviously, you will use your skills in
CSC263 and CSC373, where you will study more complex data struc-
tures and algorithms. You will see first-hand the real tools with which
computers store and compute with large amounts of data, facing real-
world problems as ubiquitous as sorting – but whose solutions are not
nearly as straightforward. If you were intrigued by the idea of prov-
ably correct programs, you may want to check out CSC410; if you liked
the formal logic you studied in CSC165, and are interested in more of
its computer science applications (of which there are many!), CSC330,
CSC438, CSC465, and CSC486 would be good courses to consider. If
you’d like to learn about more powerful kinds of automata and more
complex languages, CSC448 is the course for you. Finally, CSC463
tackles computability and complexity theory, the fascinating study of the
inherent hardness of problems.
For more information on the above, or other courses, or any other
matter academic, professional, or personal, come talk to any one of us
in the Department of Computer Science! Our {doors, ears, minds} are
always open.
Be the person your dog thinks you are.
Appendix: Runtime Analysis
(Written by Dr. Daniel Zingaro, influenced by discussions with Dr.
Cusack and David Liu.)
This appendix introduces three formal notations for analyzing the
runtime of algorithms: O, Ω, and Θ. We use these notations in this
course to characterize the running times of algorithms in ways that
are unrelated to some particular computer, programming language, or
operating system.
Here are two Python functions of the sort that you saw in CSC108
and CSC148.
def slow(n):
    x = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = x + 1
    return x

def faster(n):
    x = 0
    for i in range(n):
        for j in range(n):
            x = x + 1
    return x
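As a quick sanity check, note that these functions literally count their own steps: each returns the number of times its assignment statement ran.

assert slow(10) == 10 ** 3      # the inner assignment runs n^3 times
assert faster(10) == 10 ** 2    # ... and n^2 times here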
You’ll recall from earlier courses that faster is a faster algorithm
than slow. Why? The intuition is that the triply-nested for-loops of
slow are slower than the doubly-nested loops of faster. In slow, the
x = x + 1 assignment statement happens n³ times, whereas it happens only n² times in faster. In this course, we won't be able to rely
on such informal ad-hoc analyses, because:
• Some of our functions will be recursive, not iterative, and so there
are no loops to count anyway! We want a technique that “works”
for recursive functions, too.
• Sometimes, the runtime complexity is not apparent from the loop structure. Instead, we'll want a technique that takes an expression for the number of steps, and returns a bound like n² or n³.
The notations introduced in this Appendix solve both of these prob-
lems.
Big O
We’ll start with the O notation (“big oh”), which is used to give an
upper-bound to an expression.
We say that f(n) is Big-O of g(n), written as f(n) ∈ O(g(n)), iff there are positive constants c and n0 such that
f(n) ≤ c g(n) for all n ≥ n0.
This definition captures the idea that g(n) is an upper-bound of f(n). It doesn't say anything about being a tight upper-bound, though. For example, it is true that 3n = O(n), but it is also true that 3n = O(n²) or 3n = O(2ⁿ). Sometimes, people informally use O to mean "tight bound", but this is not how it is formally defined.
Note that sometimes you will see f(n) ∈ O(g(n)) written with an equals sign, like f(n) = O(g(n)). There's really no equality going on here, so you should continue to think of it as set membership: f is a function in the set of functions that grow no faster than g.
Here’s how to think about the two constants n0 and c:
• n0 is a point at which g is at least as large as f , and g stays at least as
large as f at larger values of n > n0. That is, at n0, n0 + 1, n0 + 2, . . .,
g is at least as large as f . This captures the idea that we don’t care
about the relative sizes of f and g when n is small. We just have to
find a point, eventually, where g is at least as large as f .
• If we didn’t have the c constant, then we’d be stuck saying stuff
like 3n = O(3n). But we don’t care about the 3 here: what really
matters is the n. The constant c is what allows us to “throw away”
coefficients and smaller terms, focusing us only on the largest term
and ignoring its coefficient.
To prove that f (n) = O(g(n)), we start with f (n) and use a chain of
≤ inequalities or = equalities to arrive at cg(n) for some c > 0. Along
the way, we’ll have to keep track of some constant n0 for which the
inequalities are true for all values of n at least as large as n0. Let’s see
how such a proof goes.
Example. Prove that 100n + 10000 = O(n²).
Proof. Intuitively, you shouldn't be surprised that this is true: n² on the right grows more quickly than the n and constant terms on the left. To begin the formal proof, think about how 100n compares to 100n². As long as n ≥ 1, we have that 100n ≤ 100n², because 100n² has an extra multiplicative n term that makes the expression larger. So, we've got an upper-bound for 100n of 100n², and this is good news, because we're trying to get the left side 100n + 10000 to be a bunch of n² terms.
Let's continue by comparing 10000 to 10000n². Again, when n ≥ 1, we have that 10000 ≤ 10000n². So, we have an upper-bound for 10000 of 10000n². Let's write out what we've got so far, in an equational style:
100n + 10000 ≤ 100n² + 10000n²
= 10100n²
We already know our n0: it's 1. Additionally, from this chain of inequalities, we also get our c: it's 10100. This completes the proof, as we have found n0 = 1 and c = 10100 such that 100n + 10000 ≤ cn² for all n ≥ n0. ■
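Although no substitute for the proof, it is easy to spot-check the chosen constants numerically (a throwaway snippet of our own):

# Check 100n + 10000 <= 10100 n^2 for a large range of n >= n0 = 1.
assert all(100 * n + 10000 <= 10100 * n * n for n in range(1, 10000))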
Let’s continue with an example that has negative terms. To remove
a negative term, simply ensure that it really is negative through a
suitable choice of n0, and then remove the term. This is correct because
removing a negative term from x provides an upper-bound for x.
Example. Prove that 5n² − 3n + 20 = O(n²).
Proof. We have to show for all n ≥ n0 and some constant c > 0 that
5n² − 3n + 20 ≤ cn²
For n ≥ 1, we have
5n² − 3n + 20 ≤ 5n² + 20
≤ 5n² + 20n²
= 25n²
So, we choose n0 = 1 and c = 25. ■
Note that we’re not forced to find the smallest suitable values of n0
and c. As long as the math is correct, any suitable values of n0 and
c will do. For example, in the previous proof, we could have used
n0 = 34 and c = 50, and the proof would still be valid. (Confusing, for
sure, but valid!)
Big Omega
We now know what O does: it gives an upper-bound for an expression. But remember that the O upper-bound isn't required to be tight. So if someone says that an algorithm is O(n³), you really don't know its runtime. It could be n³, yes, but it could also be faster, like n² or n. So upper-bounding isn't sufficient; we'll also want to lower-bound. To lower-bound, we use a different but related notation, Ω (big Omega).
We say that f(n) is Big-Omega of g(n), written as f(n) ∈ Ω(g(n)), iff there are positive constants c and n0 such that
c g(n) ≤ f(n) for all n ≥ n0.
So, instead of upper-bounding f by g as in an O proof, we upper-bound g by f. Here's a quick example.
Example. Prove that n³ + 4n² = Ω(n²).
Proof. We have to show for all n ≥ n0 and some positive c that
cn² ≤ n³ + 4n²
For n ≥ 0, we have
n² ≤ 4n²
≤ n³ + 4n²
So, we choose n0 = 0 and c = 1. ■
It’s a common belief that O is used for worst-case analysis and Ω is
used for best-case analysis. This is not true. Both notations can bound
any function — best-case, worst-case, average-case, whatever.
Big Theta
One more notation, and then we'll be done. So far, we have a way to bound a function from above (O) and a way to bound a function from below (Ω). Suppose we show that f(n) = O(n²) and also that f(n) = Ω(n²). This means that the upper-bound and lower-bound are both n². We then want a way to say "f(n) is exactly proportional to n²". The way to state this exact bound is using big Theta, Θ. It's a combination of O and Ω:
We say that f(n) is Big-Theta of g(n), written as f(n) ∈ Θ(g(n)), iff f(n) = O(g(n)) and f(n) = Ω(g(n)).
Therefore, to do a Θ proof, we do two separate proofs: the O proof
and the Ω proof. If the O and Ω bounds are the same, then the Θ proof
is complete.
We’ll start with a polynomial Θ example and then move on to a
nonpolynomial example to help you generalize what you’ve learned
about the three notations.
Example. Find a Θ bound on f(n) = n⁸ + 7n⁷ − 10n⁵ − 2n⁴ + 3n² − 17.
Proof. The highest power is n⁸, so it's reasonable to try proving an n⁸ Θ bound.
First, the big O proof. For n ≥ 1:
n⁸ + 7n⁷ − 10n⁵ − 2n⁴ + 3n² − 17 ≤ n⁸ + 7n⁷ + 3n²
≤ n⁸ + 7n⁸ + 3n⁸
= 11n⁸
So, we choose n0 = 1 and c = 11 to show that f(n) = O(n⁸).
Second, the big Omega proof. For n ≥ 1:
n⁸ + 7n⁷ − 10n⁵ − 2n⁴ + 3n² − 17 ≥ n⁸ − 10n⁵ − 2n⁴ − 17
≥ n⁸ − 10n⁷ − 2n⁷ − 17n⁷
= n⁸ − 29n⁷
To continue, we'll want to show that n⁸ − 29n⁷ ≥ cn⁸ for some constant c > 0. As long as n ≥ 1, dividing through by n⁸ gives us something equivalent that we can prove:
n⁸ − 29n⁷ ≥ cn⁸
1 − 29/n ≥ c (if n ≥ 1)
How can we choose c so that this inequality is true? The key is to understand the behaviour of the −29/n term. −29/n is increasing (substitute large values of n to observe its behaviour), which is a good thing as we want to lower-bound 1 − 29/n. If we substitute 58 for n, 1 − 29/n = 1 − 29/58 = 0.5, so 1 − 29/n ≥ 0.5 for n ≥ 58. We therefore choose n0 = 58 and c = 0.5 to complete the Ω proof. (Don't get too hung up on that choice of n0 = 58: other values are possible, too. For example, we could choose n0 = 40, c = 0.275.)
As the O and Ω proofs are now complete, the overall Θ proof is complete.
You might wonder why we used n⁸ − 10n⁷ − 2n⁷ − 17n⁷ in the proof. Why not just use n⁸ − 10n⁸ − 2n⁸ − 17n⁸ = −28n⁸? The reason is that then we'd be stuck proving that −28n⁸ ≥ cn⁸ or, equivalently for n ≥ 1, −28 ≥ c. We can satisfy this with something like c = −28 or c = −29, but remember that we must find a positive value for c, and that is impossible here. So using that 7 exponent is what allows a positive value of c to be found! ■
Example. Prove that n² ∈ O(1.1ⁿ), but that n² ∉ Θ(1.1ⁿ).
What are we doing here? We're being asked to prove that c · 1.1ⁿ is an upper-bound of n² (that's the O part), but that d · 1.1ⁿ is not also a lower-bound for n², for any constant d. If we can do this, then we'll have proved that n² ∈ O(1.1ⁿ), but also that n² ∉ Ω(1.1ⁿ). Putting those together, it follows that n² ∉ Θ(1.1ⁿ). This kind of proof shows that some g(n) is an upper-bound, but not a tight upper-bound, for some function f(n). (There's actually a notation for this case, f(n) = o(g(n)) (that's a little o), but we won't use it in this course.)
Proof. The O part of the proof is more annoying than previous ones, because one function is a polynomial while the other is an exponential. Informally, we could argue that an exponential grows more quickly than a polynomial, but that isn't a formal proof! Instead, you can use mathematical induction to prove that for all n ≥ 100, n² ≤ 1.1ⁿ. Then, using n0 = 100 and c = 1, the O proof will be complete. This induction proof is a good exercise — please try it!
Now for the Ω part of the proof. Suppose for contradiction that, for all n ≥ n0 and some c > 0, we have
c · 1.1ⁿ ≤ n²
Taking logs and doing some algebra, the following lines are equivalent:
ln(c · 1.1ⁿ) ≤ ln(n²)
ln(c · 1.1ⁿ) ≤ 2 ln(n)
ln(c) + ln(1.1ⁿ) ≤ 2 ln(n)
ln(c) + n ln(1.1) ≤ 2 ln(n)
ln(c) ≤ 2 ln(n) − n ln(1.1)
But this is a contradiction: no c satisfies this inequality for all n ≥ n0. The reason is that the right side decreases to negative infinity as n grows. No matter what value we choose for c, the right side can always be made to be less than ln(c) through a suitably-large choice of n. ■
Properties of the Notations
The O, Ω, and Θ notations each have many properties that can be
proved. These properties often follow naturally from their intuitive
definitions. For example, suppose that f(n) = O(g(n)) and g(n) = O(h(n)). This says that g(n) is an upper-bound for f(n), and that h(n) is an upper-bound for g(n). You'd then expect h(n) to be an upper-bound for f(n) and, indeed, this can be proven using the definition of O. (In this way, O is similar to the ≤ operator.)
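For the record, here is that transitivity argument spelled out, a short derivation with constants we name ourselves. Suppose f(n) ≤ c1 g(n) for all n ≥ n1, and g(n) ≤ c2 h(n) for all n ≥ n2. Then for all n ≥ max(n1, n2),
f(n) ≤ c1 g(n) ≤ c1 c2 h(n),
so the constants c = c1 c2 and n0 = max(n1, n2) witness that f(n) = O(h(n)).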
Here, we’ll prove a different result: that Θ is symmetric (and, in this
way, acts like the = operator).
Example. Prove that if f(n) = Θ(g(n)), then g(n) = Θ(f(n)).
Proof. The assumption is that f(n) = Θ(g(n)). This tells us two things: that f(n) = O(g(n)) and that f(n) = Ω(g(n)). Let's associate c1, n1 > 0 with the O assumption and c2, n2 > 0 with the Ω assumption. Let n0 be the larger of n1 and n2.
Then, we have as assumptions that c2 g(n) ≤ f(n) ≤ c1 g(n) for all n ≥ n0. Therefore, we have g(n) ≤ (1/c2) f(n) and g(n) ≥ (1/c1) f(n), for all n ≥ n0. Equivalently:
(1/c1) f(n) ≤ g(n) ≤ (1/c2) f(n), for all n ≥ n0.
Using constants 1/c1, 1/c2, and n0, we see that g(n) = Θ(f(n)), as required. ■