Math40002: Analysis I
We will pick up where MATH40002 left off in the Autumn term, rigorously developing
the basic tools of calculus: continuity, differentiation, and integration.
We will continue to have infinite fun.
Syllabus
Continuity: Review of continuity. Sequential continuity. Uniform continuity.
Intermediate and extreme value theorems. Inverse function theorem for monotonic
functions.
Differentiation: Definitions, examples, and properties. Mean value theorem. Higher
derivatives and convexity. Differentiation of series.
Integration: Definition, examples, and properties of the Riemann–Darboux integral.
Fundamental theorem of calculus. Techniques: integration by parts, substitution.
Indefinite integration.
Books
Martin Liebeck, A Concise Introduction to Pure Mathematics.
Mary Hart, Guide to Analysis.
KG Binmore, Mathematical Analysis, A Straightforward Approach.
David Brannan, A first course in mathematical analysis.
Steven Lay, Analysis: with an introduction to proof.
Stephen Abbott, Understanding analysis.
I will post gappy notes on Blackboard, and a complete version at the end of term.
We will continue to use edStem for online discussion of the lecture material.
Assessment
20% Autumn term assessments, including the January test
3% Mini Blackboard quizzes every few weeks (due 1 Feb, 1 Mar, 22 Mar)
5% Midterm test (1 hour) – week of 14-18 February
1% Formal write-up of a problem sheet problem, due 4 February
1% Formal write-up of a midterm problem, due 4 March
70% May exam
Contents
1 Continuity
1.1 The intermediate value theorem
1.2 The extreme value theorem
1.3 Open, closed, and compact sets
1.4 Uniform continuity and convergence
2 Differentiation
2.1 Basic properties
2.2 The mean value theorem
2.3 L’Hôpital’s rule
2.4 Higher derivatives
2.5 Second derivatives and convexity
2.6 Limits of differentiable functions
3 Integration
3.1 Darboux sums
3.2 The Darboux integral
3.3 Basic properties
3.4 The fundamental theorem of calculus
3.5 More properties of integrals
3.6 Limits of integrable functions
3.7 Improper integrals
3.8 Lebesgue’s criterion for integrability
3.9 The irrationality of π
1 Continuity
We begin by recalling what it means for a function to be continuous.
Definition. Given a function f : R→ R, we say that f is continuous at a ∈ R
if and only if
∀ε > 0 ∃δ > 0 such that |x − a| < δ ⟹ |f(x) − f(a)| < ε.
We say that f is continuous on R (or just “continuous”) if it is continuous at
all a ∈ R.
We also saw the equivalent notion of sequential continuity, which is sometimes more
convenient.
Theorem 1.1
f : R → R is continuous at a ∈ R ⟺ f(x_n) → f(a) for all sequences x_n → a.
Let’s warm up by proving some basic properties of continuity.
Proposition 1.2. Let f, g : R → R be functions which are both continuous at
a ∈ R. Then the functions f + g, f − g, and f · g are all continuous at a, and if
g(a) ≠ 0 then f/g is also continuous at a.
Proof. We’ll make use of sequential continuity here. For any sequence x_n → a, we
have f(x_n) → f(a) and g(x_n) → g(a) by hypothesis. The algebra of limits tells us
that
\[ \lim_{n\to\infty} \bigl( f(x_n) + g(x_n) \bigr) = \lim_{n\to\infty} f(x_n) + \lim_{n\to\infty} g(x_n) = f(a) + g(a). \]
Since this works for every sequence x_n → a, we conclude that f + g is continuous at a.
The same argument works for f − g and f · g, and also for f/g if g(a) ≠ 0.
Now we can start to assemble a big collection of continuous functions almost for free
– no need for epsilons and deltas! We know that linear functions f(x) = mx+ c are
continuous, so we get
• any function f(x) = x^n is continuous, where n ≥ 0 is an integer;
• any polynomial p(x) = a_n x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0 is continuous, where
the a_i are real numbers;
• any rational function p(x)/q(x), where p and q are polynomials, is continuous at
all a ∈ R where q(a) ≠ 0.
The first two can be proved by induction on n (try it!), and the third follows from
the second.
Similarly, we saw last term that E : C → C defined by E(z) = ∑_{n=0}^∞ z^n/n! is
continuous on C. Since x ↦ sin(x) and x ↦ cos(x) can be defined in terms of E (how?),
this tells us where the trigonometric functions are continuous:
• sin(x) and cos(x) are continuous on R;
• tan(x) = sin(x)/cos(x) is continuous wherever cos(x) ≠ 0, meaning at all x except
x = (2k + 1)π/2, k ∈ Z;
• likewise, sec(x), csc(x), cot(x) are continuous at any x where cos(x) ≠ 0,
sin(x) ≠ 0, and sin(x) ≠ 0 respectively.
Proposition 1.3. If f : R → R is continuous at a, and g : R → R is continuous
at f(a), then the composition g ◦ f defined by x ↦ g(f(x)) is continuous at a.
Proof. Let x_n be any sequence such that x_n → a. Since f is continuous at a, we
have f(x_n) → f(a). Now we can plug the convergent sequence y_n = f(x_n), with
limit y = f(a), into g: since g is continuous at y, we have g(y_n) → g(y). But this
is just another way of saying that g(f(x_n)) → g(f(a)). Again, this works for any
sequence x_n → a, so g ◦ f must be continuous at a.
Example 1.4. We know that E : C → C defined by E(z) = ∑_{n=0}^∞ z^n/n! is
continuous on C, and we saw that this implies that sin(x) is continuous on R.
So now
\[ f(z) = \sin\left( \frac{z^2 + 1}{z - 2} \right) \]
is continuous at all a ∈ R where (z² + 1)/(z − 2) is continuous, meaning at all a ≠ 2.
Example 1.5. Consider the function f : R → R defined by
\[ f(x) = \begin{cases} x \sin(1/x), & x \neq 0 \\ 0, & x = 0. \end{cases} \]
Since 1/x is a rational function, it is continuous at all x ≠ 0. Thus
• sin(1/x) is continuous at all x ≠ 0, since x ↦ sin(x) is continuous;
• x sin(1/x) is continuous at all x ≠ 0, since both x and sin(1/x) are;
and finally we saw last term that f is continuous at x = 0 as well, so f is
continuous on all of R.
Question 1. Let f, g : R→ R be functions, and let h(x) = g(f(x)). Which of
the following is always true?
1. If f and g are not continuous, then h is not continuous.
2. If f is continuous and g is not, then h is not.
3. If f is not continuous but g is, then h is.
4. More than one of choices 1, 2, 3.
5. None of choices 1, 2, 3. ✓
The easiest thing to try would be functions that are constant, or maybe piecewise
constant, to see what happens. For item 1, we might try something like
\[ f(x) = \begin{cases} 0, & x < 0 \\ 1, & x \geq 0 \end{cases} \quad\text{and}\quad g(x) = \begin{cases} 0, & -2 \leq x \leq 2 \\ 1, & |x| > 2 \end{cases} \]
so that g(f(x)) = 0 is constant. For 2, we take f = 0 and any discontinuous g, and
then h = g(0) is constant. For 3, we take g(x) = x and any discontinuous f, and
then h = f is discontinuous.
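The three counterexamples can be checked on a few sample points. The following is only a sketch: the names f1, g1, g2, and compose are illustrative, and g2 is one arbitrary choice of "any discontinuous g". Evaluating at finitely many points does not prove continuity or discontinuity; it only confirms the values claimed above.

```python
def f1(x):
    # Step function from item 1: discontinuous at x = 0.
    return 0 if x < 0 else 1

def g1(x):
    # Function g from item 1: discontinuous at x = -2 and x = 2.
    return 0 if -2 <= x <= 2 else 1

def g2(x):
    # An arbitrary discontinuous g for item 2 (assumed choice).
    return 0 if x < 0 else 1

def compose(g, f):
    # h(x) = g(f(x))
    return lambda x: g(f(x))

h1 = compose(g1, f1)             # item 1: both discontinuous, h is constant 0
h2 = compose(g2, lambda x: 0)    # item 2: f = 0, so h = g(0) is constant
h3 = compose(lambda y: y, f1)    # item 3: g(x) = x, so h = f

samples = [-3, -1, -0.001, 0, 0.001, 1, 3]
print([h1(x) for x in samples])
print([h2(x) for x in samples])
print([h3(x) for x in samples])
```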
1.1 The intermediate value theorem
We now come to the first major application of continuity.
Theorem 1.6: Intermediate value theorem
Let f : [a, b] → R be a continuous function, and pick a value c between f(a)
and f(b). Then there is some x ∈ [a, b] such that f(x) = c.
Here’s a picture of what this might look like:
[Figure: graph of a continuous function on [a, b], with the labels a, b, f(a), f(b), c, and a point x where f(x) = c.]
Note that there might be several possible values of x; in this picture there are three.
This theorem certainly sounds obvious – a continuous function can’t jump, so it
can’t skip any values – but that doesn’t mean it’s easy to prove! We wouldn’t
even know where to begin if we hadn’t already built up the notions of limits and
continuity from scratch.
Proof. We can assume without loss of generality that f(a) < c < f(b). Indeed:
• if f(a) = c we take x = a, and if f(b) = c then we take x = b;
• if f(a) = f(b) then c = f(a) and we are already done; and
• if f(a) > f(b) then we can take g(x) = −f(x), so that g(a) < g(b), and ask
for x with g(x) = −c instead.
So now we consider the set
S_c = {y ∈ [a, b] | f(y) < c}.
This set is nonempty, because a ∈ S_c, and it is bounded above by b. Thus if we let
x = sup S_c then we have a ≤ x ≤ b.
Claim 1: f(x) ≥ c.
Let’s suppose this claim is false, so f(x) < c, and take ε = c − f(x) > 0. Note that
if f(x) < c then x ≠ b, so we must have x < b here. Since f is continuous at x, we
have
∃δ > 0 such that |f(y) − f(x)| < ε ∀y ∈ (x − δ, x + δ) ∩ [a, b].
In particular, this means that
f(y) < f(x) + ε = c ∀y ∈ (x − δ, x + δ) ∩ [a, b].
So all of these y belong to S_c, and if we choose y ∈ (x, x + δ) ∩ [a, b], say
\[ y = x + \tfrac{1}{2} \min(\delta, b - x), \]
then we have y > x, so x isn’t actually an upper bound for S_c, contradiction.
Claim 2: f(x) ≤ c.
Supposing again that this claim is false, so that f(x) > c (and hence x ≠ a), we
take ε = f(x) − c > 0 and use the continuity of f at x to find δ > 0 such that
|f(y) − f(x)| < ε ∀y ∈ (x − δ, x + δ) ∩ [a, b].
For any such y we have f(y) > f(x) − ε = c, so none of these y are in S_c. But then
\[ m = \max\left( x - \frac{\delta}{2},\ a \right) \]
is strictly less than x, and it is also an upper bound for S_c, since x is an upper
bound and no y ∈ (m, x] belongs to S_c. This contradicts the fact that x is the least
upper bound of S_c.
We have now shown that f(x) ≥ c and f(x) ≤ c, so together these tell us that
f(x) = c and we are done.
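The proof above is not constructive as written, but a point x with f(x) = c can be approximated numerically. The sketch below uses bisection, which is a different technique from the supremum argument in the proof: it repeatedly halves an interval on which f − c changes sign. The function name bisect_ivt, the tolerance, and the test function x³ − x are all illustrative assumptions.

```python
def bisect_ivt(f, a, b, c, tol=1e-12):
    """Approximate some x in [a, b] with f(x) = c.

    Assumes f is continuous on [a, b] and c lies between f(a) and f(b).
    """
    fa, fb = f(a) - c, f(b) - c
    if fa == 0:
        return a
    if fb == 0:
        return b
    if fa * fb > 0:
        raise ValueError("c is not between f(a) and f(b)")
    lo, hi, flo = a, b, fa
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid) - c
        if fmid == 0:
            return mid
        if (fmid < 0) == (flo < 0):
            lo, flo = mid, fmid   # sign change is in the right half
        else:
            hi = mid              # sign change is in the left half
    return (lo + hi) / 2

# Hypothetical example: solve x^3 - x = 3 on [0.5, 2].
root = bisect_ivt(lambda x: x**3 - x, 0.5, 2.0, 3.0)
print(root)
```

Bisection returns one solution only; as noted above, there may be several.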
Question 2. Let f : R→ R. Which of the following is true?
1. If f is continuous, then ∀c ∈ R ∃x such that f(x) = c.
2. If f is not continuous, then ∃c ∈ R such that ∀x f(x) ≠ c.
3. If f is continuous and f(−1) = −1, f(0) = 2, and f(3) = −5, then there
is more than one x ∈ R such that f(x) = 0. ✓
4. If f is continuous and lim_{x→∞} f(x) = 0, then ∃x ∈ R such that f(x) = 0.
Here are pictures of some counterexamples to choices 1, 2, and 4:
[Figure: graphs of the counterexamples; the surviving panel labels are f(x) = exp(x) and f(x) = 1/(x² + 1).]
As an application, the fundamental theorem of algebra says that every
non-constant polynomial
p(x) = a_d x^d + a_{d−1} x^{d−1} + · · · + a_1 x + a_0, d ≥ 1, a_d ≠ 0
has a root: there is some x ∈ C such that p(x) = 0. We won’t prove this here (it’s one
of the highlights of complex analysis!), but we can prove a special case.
Proposition 1.7. If p(x) is a polynomial of odd degree, then p(x) has a root.
Proof. We can divide both sides of p(x) = 0 by a_d, so we might as well assume that
a_d = 1 and then try to solve
p(x) = x^d + a_{d−1} x^{d−1} + · · · + a_1 x + a_0 = 0.
Polynomials are continuous, so we just need to find some a, b ∈ R such that p(a) < 0
and p(b) > 0, and then we can apply the intermediate value theorem to the interval
[a, b], taking c = 0, to find x such that p(x) = c = 0.
We’ll look for b first. The key idea is that for x very large, the x^d term above is
much bigger than the others, so that p(x) behaves like x^d in some sense. We can
factor
\[ p(x) = x^d \left( 1 + \frac{a_{d-1}}{x} + \cdots + \frac{a_1}{x^{d-1}} + \frac{a_0}{x^d} \right), \]
and if we call the part in parentheses q(x), then lim_{x→∞} q(x) = 1. This means that
there’s some N > 0 such that q(x) > 1/2 for all x ≥ N, and then
\[ p(x) = x^d q(x) > \frac{x^d}{2} > 0 \quad \forall x \geq N. \]
So we can just take b = N, and then p(b) > 0.
We use the same idea to look for a. We also have lim_{x→−∞} q(x) = 1, so there’s some
M < 0 such that q(x) > 1/2 for all x ≤ M, and then since x^d < 0 for x ≤ M (because d is odd) we have
\[ p(x) = x^d q(x) < \frac{x^d}{2} < 0 \quad \forall x \leq M. \]
So if we take a = M then we have p(a) < 0, and we’re done.
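The two steps of this proof (find a and b with p(a) < 0 < p(b), then apply the intermediate value theorem) can be sketched numerically. In the code below, the search for a and b simply doubles a candidate n until p(−n) < 0 < p(n), which the proof guarantees will eventually happen for a monic polynomial of odd degree; the root is then approximated by bisection rather than by the supremum argument. The names poly and odd_root and the sample polynomial are illustrative assumptions.

```python
def poly(coeffs, x):
    """Evaluate a_d x^d + ... + a_0 by Horner's rule; coeffs = [a_d, ..., a_0]."""
    value = 0.0
    for a in coeffs:
        value = value * x + a
    return value

def odd_root(coeffs, tol=1e-12):
    """Approximate a real root of a polynomial of odd degree."""
    d = len(coeffs) - 1
    if d % 2 == 0 or coeffs[0] == 0:
        raise ValueError("need odd degree with nonzero leading coefficient")
    # Divide through by a_d so that the polynomial is monic, as in the proof.
    monic = [a / coeffs[0] for a in coeffs]
    # Double n until p(-n) < 0 < p(n); this plays the role of M and N.
    n = 1.0
    while not (poly(monic, -n) < 0 < poly(monic, n)):
        n *= 2
    lo, hi = -n, n
    # Bisection, keeping p(lo) < 0 <= p(hi) throughout.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if poly(monic, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: 2x^3 - 4x + 7.
r = odd_root([2, 0, -4, 7])
print(r, poly([2, 0, -4, 7], r))
```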
Here’s another application of the intermediate value theorem.
Proposition 1.8. If f : [0, 1] → [0, 1] is continuous, then it has a fixed point:
there is an x ∈ [0, 1] such that f(x) = x.
[Figure: graph of f(x) on [0, 1], with both axes marked at 1 and a point x marked.]
Proof. Rearranging the conclusion we want a little bit, we get x − f(x) = 0. So
we’ll define g : [0, 1] → R by g(x) = x − f(x), which is also continuous, and try to
prove that there’s some x ∈ [0, 1] for which g(x) = 0.
[Figure: graph of g(x) on [0, 1], with the vertical axis marked at −1 and 1.]
If we have f(0) = 0 or f(1) = 1 then there’s nothing left to prove, so we can assume
from now on that f(0) > 0 and f(1) < 1. In this case we have
g(0) = 0− f(0) < 0 and g(1) = 1− f(1) > 0.
Since 0 lies between g(0) and g(1), the intermediate value theorem tells us that
there’s some x ∈ [0, 1] for which g(x) = 0, and this means that f(x) = x.
1.2 The extreme value theorem
We wish to study the maxima and minima of continuous functions, so we’ll begin
with some terminology.
Definition. Let S be a subset of R, and let f : S → R be a function. We say
that f is bounded above if
∃M ∈ R such that f(x) ≤M ∀x ∈ S.
Similarly, f is bounded below if
∃m ∈ R such that f(x) ≥ m ∀x ∈ S.
If f is both bounded above and bounded below, we say that f is bounded.
A function can be continuous but not bounded, and vice versa; we have to pay close
attention to the domain where f is defined. Here are some examples:
Example 1.9. The function f : {x > 0} → R given by f(x) = 1/x is bounded
below by 0, but not bounded above.
[Figure: graph of f(x) = 1/x for x > 0, with both axes marked from 1 to 4.]
Supposing f were bounded above by some M ∈ R, we could take
\[ x = \frac{1}{|M| + 1} \implies f(x) = |M| + 1 > M, \]
a contradiction.
Example 1.10. The function f : R \ {0} → R given by f(x) = 1/x is neither
bounded above nor bounded below. If it were bounded below by some m ∈ R,
we could take x = −1/(|m| + 1) and get the same contradiction as before.
Example 1.11. Define f : R → R by
\[ f(x) = \begin{cases} 0, & x \in \mathbb{Q} \\ 1, & x \notin \mathbb{Q}. \end{cases} \]
Then f isn’t continuous anywhere, but it’s bounded below by m = 0 and above by M = 1.
The above examples all seem to involve discontinuities, if not on their domains then
right at the boundary. But even continuous functions on all of R can be unbounded:
Example 1.12. The function f : R → R defined by f(x) = x³ is neither
bounded above nor bounded below. If it were bounded above by M, then we
could take x = ∛(M + 1) and we’d get f(x) = M + 1 > M, and the bounded below case is
similar.
On the other hand, things look a little better for continuous functions on closed
intervals of finite length.
Theorem 1.13
If f : [a, b]→ R is continuous, then it is bounded.
Proof. Let’s prove that f is bounded above. If not, then we can define a sequence
x_1, x_2, x_3, . . .
by taking x_n ∈ [a, b] which satisfies f(x_n) > n for all n. The sequence (x_n) is
bounded below by a and bounded above by b, so the Bolzano–Weierstrass theorem
says that it has a convergent subsequence
x_{n_1}, x_{n_2}, x_{n_3}, . . .
for some integers 1 ≤ n_1 < n_2 < n_3 < . . . which go to infinity. We have a ≤ x_{n_i} ≤ b
for all i, so if we call the limit of this subsequence x, then a ≤ x ≤ b as well. (Why?)
Now we use the (sequential) continuity of f and the fact that x ∈ [a, b] to say that
f(x_{n_i}) → f(x).
But by construction we have f(x_{n_i}) > n_i, and n_i → ∞, so f(x_{n_i}) → ∞ and this is
a contradiction.
The proof that f is bounded below is essentially the same. Or we could reduce it
to the case we’ve already done: the function g(x) = −f(x) is continuous on [a, b],
hence bounded above by some M , and since g(x) ≤ M for all x ∈ [a, b] we must
have f(x) ≥ −M for all x ∈ [a, b].
Theorem 1.14: Extreme value theorem
Let f : [a, b] → R be a continuous function. Then f is bounded, and it attains
its lower and upper bounds: there are c, d ∈ [a, b] such that
f(c) ≤ f(x) ≤ f(d) ∀x ∈ [a, b].
Equivalently, we have
\[ f(c) = \inf_{x \in [a,b]} f(x) \quad\text{and}\quad f(d) = \sup_{x \in [a,b]} f(x). \]
Proof 1. We just proved that f is bounded, so let S = sup_{x∈[a,b]} f(x). Since S is the
least upper bound of {f(x) | x ∈ [a, b]}, we know that S − 1/n is not an upper bound
for any n ≥ 1, so for each n we know that
\[ \exists x_n \in [a, b] \ \text{such that}\ S - \frac{1}{n} < f(x_n) \leq S. \]
Again by Bolzano–Weierstrass, the x_n have a convergent subsequence x_{n_i}, with some
limit d ∈ [a, b]. Since f is continuous we have
f(x_{n_i}) → f(d),
and the sequence f(x_{n_i}) is squeezed between S − 1/n_i → S and S, so its limit f(d)
must be S.
Likewise, if we let I = inf_{x∈[a,b]} f(x), then for all n we can find elements x_n ∈ [a, b]
such that I ≤ f(x_n) < I + 1/n. We take c to be the limit of a convergent subsequence,
and it follows just as before that f(c) = I.
Proof 2. Once again f is bounded above, so we let S = sup_{x∈[a,b]} f(x). If we assume
that f(x) < S for all x ∈ [a, b], then we have a well-defined function
\[ g : [a, b] \to \mathbb{R}, \qquad g(x) = \frac{1}{S - f(x)} \]
which takes only positive values since S − f(x) > 0 for all x. We know that g is
continuous, so g is bounded as well.
Let M > 0 be an upper bound for g. Then we do some rearranging to get
\[ g(x) = \frac{1}{S - f(x)} \leq M \iff \frac{1}{M} \leq S - f(x) \iff f(x) \leq S - \frac{1}{M}. \]
So S − 1/M is also an upper bound for {f(x) | a ≤ x ≤ b}, contradicting the
assumption that S was the least upper bound. It must have been true that f(x) = S
for some x ∈ [a, b] after all, and we take d to be that value of x. (And again, the
proof that f attains its infimum is basically the same.)
We can combine the intermediate and extreme value theorems to understand the
images of continuous functions.
Proposition 1.15. Let f : [a, b] → R be a continuous function. Then there are
c, d ∈ [a, b] such that the image f([a, b]) is the closed interval [f(c), f(d)].
Proof. By the extreme value theorem, there are c, d ∈ [a, b] such that
\[ f(c) = \inf_{x \in [a,b]} f(x), \qquad f(d) = \sup_{x \in [a,b]} f(x). \]
So the image f([a, b]) is at least a subset of the interval [f(c), f(d)]. In fact, for any
y ∈ [f(c), f(d)], the intermediate value theorem says that we can find e ∈ [c, d] (or
e ∈ [d, c], in case d < c) such that f(e) = y. Such an e must belong to the interval
[a, b], since it lies between c and d, so then y = f(e) ∈ f([a, b]) and we conclude that
the image is the whole interval.
We have shown that the image of a closed, bounded interval under a continuous
map is closed and bounded. This map need not be one-to-one, and in fact we can
tell exactly when it is.
Proposition 1.16. If f : [a, b]→ R or f : R→ R is continuous, then f is injective
if and only if it is strictly monotonic.
Proof. The direction ⇐ is easy: if f is strictly monotonic, then for any distinct
points x < y in the domain we have f(x) < f(y) if f is monotone increasing, or
f(x) > f(y) if f is monotone decreasing. In any case, f(x) ≠ f(y).
In order to prove ⇒, let’s first consider the case where f is defined on [a, b]. If
f is injective then f(a) ≠ f(b), so we can assume without loss of generality that
f(a) < f(b).
Claim 1: f(a) = inf_{x∈[a,b]} f(x) and f(b) = sup_{x∈[a,b]} f(x).
The extreme value theorem says that there are some c, d ∈ [a, b] such that
\[ f(c) = \inf_{x \in [a,b]} f(x) \quad\text{and}\quad f(d) = \sup_{x \in [a,b]} f(x). \]
If f(c) < f(a) then we have f(c) < f(a) < f(b), so
by the intermediate value theorem we can find some
x ∈ [c, b] such that f(x) = f(a). We have a ∉ [c, b],
so this says that a ≠ x, contradicting the assumption
that f is injective. Therefore f(a) = f(c). Likewise,
if f(b) < f(d) then we can find y ∈ [a, d] with f(y) =
f(b), and again this is impossible, so f(b) = f(d).
[Figure: illustration of Claim 1, with the points a, c, x, b marked.]
Claim 2: f is strictly monotone increasing on [a, b].
Supposing otherwise, there are c, d ∈ [a, b] such that
c < d but f(c) ≥ f(d); since f is injective we can
sharpen this to f(c) > f(d). Since
\[ f(a) = \inf_{x \in [a,b]} f(x) \leq f(d) < f(c), \]
we can use the intermediate value theorem to find
x ∈ [a, c] such that f(x) = f(d). But then x ≤ c < d,
so this contradicts the fact that f is injective.
[Figure: illustration of Claim 2, with the points a, x, c, d, b marked.]
These claims prove that an injective, continuous function f : [a, b] → R must be
strictly monotone. If instead we have an injective, continuous function f : R → R
which is not strictly monotonic, then there are
• a, b ∈ R such that a < b and f(a) ≥ f(b), since f is not strictly increasing;
• c, d ∈ R such that c < d and f(c) ≤ f(d), since f is not strictly decreasing.
We pick some constant N > max(|a|, |b|, |c|, |d|) and consider the restriction
g = f|_{[−N,N]} : [−N, N] → R.
On the one hand, g is injective since f is. On the other hand, its domain contains the
intervals [a, b] and [c, d], and it is neither strictly monotone increasing on the first nor
strictly monotone decreasing on the second, so g cannot be strictly monotonic. But
this contradicts the case of the proposition that we already proved, so we conclude
that f must have been strictly monotonic as well.
Thus for a continuous function f : [a, b] → R, we have
f injective ⟺ f strictly monotonic
⟺ f is a bijection [a, b] → [f(a), f(b)]
(with the endpoints swapped if f is decreasing).
This means that we have an inverse function f⁻¹ : [f(a), f(b)] → R, where f⁻¹(x)
is defined to be the unique value of y such that f(y) = x.
Similarly, any continuous injective function f : R → R also has an inverse. Here we
have to be a little more careful about the domain of f⁻¹, because there are several
possibilities for the image of f.
Question 3. Let f : R → R be continuous and injective. Which of the
following cannot be the image of f?
1. R
2. (a, ∞) for some a ∈ R
3. (−∞, b] for some b ∈ R ✓
4. (a, b) for some a, b ∈ R
In fact, f(R) has to be an open interval. If it contained an endpoint, say if f(R) =
(−∞, b], then we’d have f(x) = b for some x ∈ R. But f is strictly monotonic, so
either
f(x− 1) > f(x) = b or f(x+ 1) > f(x) = b,
and this contradicts the assertion that b is an upper bound for f(R).
Theorem 1.17
If f : R → R is continuous and injective, then f⁻¹ : f(R) → R is continuous.
Proof. Fix a point y ∈ f(R), say with y = f(x). We need to show that g = f⁻¹ is
continuous at y, meaning that
∀ε > 0 ∃δ > 0 such that |z − y| < δ ⟹ |g(z) − g(y)| < ε.
By definition we have x = g(y).
Since f is continuous and injective, it is strictly monotonic; let’s say without loss
of generality that it’s increasing. Then g is also strictly monotonically increasing.
Indeed, for any y_1 < y_2 in f(R), we have
f(g(y_1)) = y_1 < y_2 = f(g(y_2)),
and since f is strictly increasing this must mean that g(y_1) < g(y_2).
[Figure: graph of y = f(x), with the labels x, y, a, b, and δ.]
Fix ε > 0 and let a = f(x − ε) and b = f(x + ε). Then a < y < b, since y = f(x)
and f is strictly increasing. We also have g(a) = x − ε and g(b) = x + ε, and since
g is strictly increasing we have
x − ε < g(z) < x + ε ∀z ∈ (a, b).
Since g(y) = x, we can subtract it from each side and rewrite this as
−ε < g(z) − g(y) < ε ∀z ∈ (a, b).
We take δ = min(b − y, y − a). Then y + δ ≤ y + (b − y) = b, and y − δ ≥ y − (y − a) = a,
so
|z − y| < δ ⟹ a < z < b ⟹ |g(z) − g(y)| < ε.
So this value of δ suffices to prove that g is continuous at y.
Example 1.18. We saw last term that the function
\[ E(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!} \]
is continuous and strictly increasing, so it must be injective. We also recall that it satisfies
\[ E(x) \neq 0 \quad\text{and}\quad E(-x) = \frac{1}{E(x)} \quad\text{for all } x \in \mathbb{R}. \]
For x ≥ 0 every term in the series is nonnegative, so we throw them out for
n ≥ 2 and get E(x) ≥ 1 + x for all x ≥ 0. Thus E(x) is not bounded above.
On the other hand, E(x) is bounded below by 0: we already know that E(0) =
1 is positive, so supposing there were some a such that E(a) < 0, then the
intermediate value theorem would say that E(c) = 0 for some c between a and
0, which is impossible since E is never zero. In fact 0 must be the infimum of E(R),
because for every ε > 0 we have
\[ E\left( -\frac{1}{\varepsilon} \right) = \frac{1}{E\left( \frac{1}{\varepsilon} \right)} \leq \frac{1}{1 + \frac{1}{\varepsilon}} = \frac{\varepsilon}{\varepsilon + 1} < \varepsilon. \]
So the intermediate value theorem says that every positive real number is in the
image of E, hence E(R) = (0, ∞). In other words, E gives a continuous bijection
\[ E : \mathbb{R} \xrightarrow{\ \sim\ } (0, \infty). \]
The inverse function E⁻¹, which we write as
log : (0, ∞) → R,
is therefore also a continuous bijection.
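The inequalities used in this example can be checked numerically on a truncated series. The sketch below is an approximation only: the function E here sums finitely many terms (the parameter terms is an assumption made for illustration), so it agrees with the true E only up to truncation and rounding error.

```python
import math

def E(x, terms=60):
    """Partial sum of sum_{n>=0} x^n / n! with the given number of terms."""
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= x / (n + 1)   # turns x^n/n! into x^(n+1)/(n+1)!
    return total

# E(x) >= 1 + x for x >= 0, so E is not bounded above.
for x in [0.0, 0.5, 3.0]:
    print(x, E(x), E(x) >= 1 + x)

# E(-1/eps) < eps, so 0 is the infimum of the image.
for eps in [1.0, 0.5, 0.1]:
    print(eps, E(-1 / eps), E(-1 / eps) < eps)
```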
1.3 Open, closed, and compact sets
We’d like to state some upcoming theorems about continuity in fairly general terms.
In order to do this, we’ll first need to take a detour and introduce some language
for subsets of R. You’ll get to see much more general versions of this if you take a
topology module, which of course you should.
Definition. A set S ⊂ R is open if and only if
∀x ∈ S ∃δ > 0 such that (x− δ, x+ δ) ⊂ S.
In other words, if x is a point of an open set S, then S has to contain every other
point within a small neighborhood of x.
Example 1.19. An “open interval” (a, b) is open.
Proof: for any x ∈ (a, b), we take δ = min(b− x, x− a). Then
x < y < x+ δ =⇒ a < y < x+ (b− x) = b =⇒ y ∈ (a, b),
and similarly if x− δ < y < x then
a = x− (x− a) ≤ x− δ < y < b.
Essentially the same proof works for unbounded open intervals like (a,∞), (−∞, b),
or R itself.
Proposition 1.20. Given a collection {S_α} of open subsets of R, which may or
may not be finite, the union S = ⋃_α S_α is open.
Proof. For any x ∈ S we must have x ∈ S_α for some α, and since S_α is open,
∃δ > 0 such that (x − δ, x + δ) ⊂ S_α.
But then this entire neighborhood of x belongs to S = ⋃_α S_α as well.
Proposition 1.21. Given finitely many open sets S_1, . . . , S_n ⊂ R, the intersection
S = ⋂_{i=1}^n S_i is open.
Proof. For any x ∈ S we must have x ∈ S_i for all i, and since each S_i is open,
∀i ∈ {1, . . . , n} ∃δ_i > 0 such that (x − δ_i, x + δ_i) ⊂ S_i.
Let δ = min(δ_1, . . . , δ_n). Then
∀i ∈ {1, . . . , n} (x − δ, x + δ) ⊂ (x − δ_i, x + δ_i) ⊂ S_i,
so (x − δ, x + δ) is also a subset of ⋂ S_i = S.
Example 1.22. To see that finiteness is necessary in this proposition, we define
open sets S_n ⊂ R for n ∈ N by
\[ S_n = \left( -\frac{1}{n}, \frac{1}{n} \right). \]
To compute the intersection S = ⋂_{n=1}^∞ S_n, we check that
• we have 0 ∈ S_n for all n, so 0 ∈ S;
• if x ≠ 0, then x ∉ S_n for n large enough, so x ∉ S.
For example, we could take n = ⌈|1/x|⌉ + 1, so that n > |1/x|, and then
\[ |x| > \frac{1}{n} \implies x \notin S_n = \left( -\frac{1}{n}, \frac{1}{n} \right). \]
It follows that S = {0}, which is not open because it does not contain any
neighborhood (−δ, δ) of 0 where δ > 0.
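The choice n = ⌈|1/x|⌉ + 1 can be tested with exact rational arithmetic. This is a sketch only; the function names witness_n and in_S are illustrative, and the sample values of x are arbitrary.

```python
from fractions import Fraction
import math

def witness_n(x):
    """For x != 0, return n = ceil(|1/x|) + 1, which satisfies n > |1/x|."""
    return math.ceil(abs(1 / x)) + 1

def in_S(x, n):
    """Membership test for S_n = (-1/n, 1/n)."""
    return -Fraction(1, n) < x < Fraction(1, n)

for x in [Fraction(1, 3), Fraction(-2, 7), Fraction(5)]:
    n = witness_n(x)
    # x is excluded from S_n, hence from the intersection S.
    print(x, n, in_S(x, n))

# 0 lies in every S_n that we test.
print(all(in_S(Fraction(0), n) for n in range(1, 100)))
```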
Definition. A set S ⊂ R is closed if and only if
∀ sequences (x_n) ⊂ S, x_n → x ∈ R ⟹ x ∈ S.
In other words, the limit of any convergent sequence of points of S must also be in
S.
A set S ⊂ R is compact if and only if it is closed and bounded.
Example 1.23. A “closed interval” [a, b] is compact.
Proof: It is bounded, so we just need to see that it is closed.
Given a convergent sequence x_n → x, with a ≤ x_n ≤ b for all n, we must have
\[ a \leq \inf_n x_n \leq x \leq \sup_n x_n \leq b, \]
so x ∈ [a, b] as well.
Warning. “Open” and “closed” are opposites in English, but not in maths! A
subset of R which is not open is not necessarily closed either. Convince yourself
of the following:
• that the half-open interval (0, 1] is neither open nor closed;
• that R is both open and closed;
• that the empty set ∅ is both open and closed.
Question 4. Which of the following sets are closed?
1. {1/n | n ∈ N}
2. [3, ∞) ✓
3. {40002} ✓
4. Q
5. R ✓
Proposition 1.24. A set S ⊂ R is open if and only if its complement T = R \S is
closed.
Proof. ⟹: Suppose that S is open and let (x_n) ⊂ T converge to some x ∈ R. This
means that
∀ε > 0 ∃N ∈ N such that n ≥ N ⟹ |x − x_n| < ε.
So any open neighborhood (x − ε, x + ε) contains some (in fact,
infinitely many) of the x_n, which belong to T. But then no such neighborhood of
x lies entirely in S, and since S is open we must have x ∉ S, i.e., x ∈ T. So T is
closed.
⟸: Suppose that T is closed and fix x ∈ S. If S does not contain any
δ-neighborhood of x, then
∀δ > 0 ∃y ∉ S such that |x − y| < δ.
So for each n ≥ 1 we can find x_n ∈ T such that |x − x_n| < 1/n. Then x_n → x, and
since T is closed we must have x ∈ T, contradiction. So S must be open.
Proposition 1.25. A union of finitely many closed sets is closed. An intersection
of arbitrarily many closed sets is also closed.
Proof. Let S_1, . . . , S_n be finitely many closed sets. Their complements T_i = R \ S_i
are open, and we have
\[ \mathbb{R} \setminus \bigcup_{i=1}^{n} S_i = \bigcap_{i=1}^{n} \left( \mathbb{R} \setminus S_i \right) = \bigcap_{i=1}^{n} T_i, \]
which as an intersection of finitely many open sets is also open. The union ⋃_{i=1}^n S_i
must therefore be closed, since its complement is open.
Exercise: prove the second part of this proposition.
We can also describe compactness in terms of sequences. This is often useful, just
as sequential continuity was a very useful reframing of continuity.
Proposition 1.26. A subset S ⊂ R is compact if and only if every sequence (x_n) ⊂ S
has a convergent subsequence x_{n_i} → x, with x ∈ S.
Proof. Suppose first that S is compact, and that (x_n) ⊂ S. Since S is bounded,
there is a convergent subsequence (x_{n_i}) by Bolzano–Weierstrass, and then
lim_{i→∞} x_{n_i} ∈ S since S is closed. So we have found a convergent subsequence with limit in S.
Conversely, suppose instead that every sequence (x_n) ⊂ S has a convergent
subsequence whose limit lies in S. If such a sequence already converges, say x_n → L, then
we proved last term that any subsequence x_{n_i} converges to the same L. But some
subsequence converges to a limit x ∈ S, so L = x must lie in S. Thus S is closed.
In addition, if S were not bounded, then it would contain a sequence (x_n) ⊂ S with
|x_n| → ∞, and no subsequence of this would be convergent because it wouldn’t even
be bounded, giving a contradiction. So S is bounded and hence compact.
We can revisit some of our earlier theorems with these definitions in mind. The
extreme value theorem actually holds for any continuous f : S → R whose domain
S is compact, because the proof only used the claims that the domain (originally
[a, b]) was closed and bounded. Why isn’t this true of the intermediate value theorem
as well?
1.4 Uniform continuity and convergence
Suppose we have a sequence of functions defined on a set S ⊂ R,
f_1, f_2, · · · : S → R.
In what sense can we say these converge? If lim_{n→∞} f_n(x) exists for all x ∈ S, then
we could define f : S → R so that f(x) is this limit. But it’s possible that f is not
continuous, even if all of the f_n are.
Example 1.27. Define f_n : [0, 1] → R by f_n(x) = x^n for all n ≥ 1. Let
\[ f(x) = \lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} x^n. \]
[Figure: graphs of the functions f_n, approaching the graph of f.]
Then f(x) = 0 for 0 ≤ x < 1, but f(1) = 1, so f is not continuous at 1.
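A quick numerical look at this example may help. The sketch below evaluates f_n at a fixed point x < 1, where the values shrink to 0, and then at the moving point x = (1/2)^{1/n}, where f_n(x) is always about 1/2 while f(x) = 0. The helper names f_n, f_limit, and gap_at_half_height are illustrative, and floating-point arithmetic makes the values approximate.

```python
def f_n(n, x):
    return x ** n

def f_limit(x):
    # Pointwise limit on [0, 1]: 0 for x < 1 and 1 at x = 1.
    return 1.0 if x == 1 else 0.0

# At a fixed x < 1 the values f_n(x) tend to 0.
print([f_n(n, 0.9) for n in (1, 10, 100, 1000)])

def gap_at_half_height(n):
    """|f_n(x) - f(x)| at x = (1/2)^(1/n), where x^n is about 1/2."""
    x = 0.5 ** (1.0 / n)
    return abs(f_n(n, x) - f_limit(x))

# The gap does not shrink as n grows, even though each x here is < 1.
print([gap_at_half_height(n) for n in (1, 10, 100, 1000)])
```

So although f_n(x) → f(x) at each individual point, for every n there are points near 1 where f_n is still far from f.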
We would like to have a stronger notion of continuity which does guarantee conver-
gence to a continuous function.
Recall that given a function f : S → R, where S ⊂ R, we say that f is continuous
if and only if
∀a ∈ S ∀ε > 0 ∃δ > 0 such that |x − a| < δ ⟹ |f(x) − f(a)| < ε.
Reading this carefully, the value of δ is allowed to depend on our choice of a. But
sometimes this isn’t necessary:
Definition. A function f : S → R is said to be uniformly continuous if and
only if
∀ε > 0 ∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ ⟹ |f(x) − f(y)| < ε.
Note that this definition has its quantifiers in a different order. Before, for each
a ∈ S individually we examined whether f was continuous at a by checking over all
x ∈ S. Now, for uniform continuity, we don’t fix a; instead, we fix ε and choose δ,
and then check the same condition for all pairs of points x, y simultaneously. This
is certainly a stronger condition! In particular, it doesn’t make sense to talk about
being uniformly continuous at a point; rather, it’s a property of the whole domain.
Proposition 1.28. If f : S → R is uniformly continuous, then it is continuous.
Proof. We need to check that f is continuous at each point a ∈ S. By uniform
continuity,
∀ε > 0 ∃δ > 0 such that ∀x, y ∈ S, |x − y| < δ ⟹ |f(x) − f(y)| < ε.
If we specialize to y = a, then we have
∀ε > 0 ∃δ > 0 such that ∀x ∈ S, |x − a| < δ ⟹ |f(x) − f(a)| < ε.
But this is exactly what it means to be continuous at a.
Example 1.29. Let f(x) = ax+ b. Then for any x, y ∈ R we have
|f(x)− f(y)| = |a| · |x− y| ≤ (|a|+ 1)|x− y|,
so if |x − y| < δ then |f(x) − f(y)| < (|a| + 1)δ. Thus if we are given any ε > 0
and we set δ = ε/(|a| + 1), then it follows that
\[ |x - y| < \delta \implies |f(x) - f(y)| < (|a| + 1) \cdot \frac{\varepsilon}{|a| + 1} = \varepsilon. \]
This didn’t depend on x and y, so f is uniformly continuous.
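The choice δ = ε/(|a| + 1) can be spot-checked with exact rational arithmetic. The sketch below picks arbitrary values of a, b, and ε (all assumptions made for illustration) and tests randomly chosen pairs x, y with |x − y| < δ. A random test is of course no substitute for the proof.

```python
from fractions import Fraction
import random

def delta_for(a, eps):
    """The delta from the example: eps / (|a| + 1), independent of x and y."""
    return eps / (abs(a) + 1)

a, b = Fraction(-7, 2), Fraction(3)      # hypothetical coefficients
eps = Fraction(1, 100)
delta = delta_for(a, eps)

def f(x):
    return a * x + b

random.seed(0)
ok = True
for _ in range(1000):
    x = Fraction(random.randint(-10**6, 10**6), 1000)
    # y is chosen so that |x - y| <= 0.999 * delta < delta.
    y = x + delta * Fraction(random.randint(-999, 999), 1000)
    ok = ok and abs(f(x) - f(y)) < eps
print(delta, ok)
```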
Example 1.30. Define f : R → R by f(x) = x^2, and fix ε > 0. For any δ > 0,
if we set y = x + δ/2 then we have
\[ |f(y) - f(x)| = \left| \left( x + \frac{\delta}{2} \right)^2 - x^2 \right| = \frac{\delta}{2} \cdot \left| 2x + \frac{\delta}{2} \right|. \]
We can then choose x = ε/δ, and we have |y − x| = δ/2 < δ but
|f(y) − f(x)| > (δ/2) · 2x = ε. This works for any δ, so f cannot be uniformly continuous.
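The witness pair in Example 1.30 can be checked numerically. The following is a minimal sketch, not part of the notes; the function name `witness_pair` is a hypothetical helper chosen for illustration.

```python
def witness_pair(eps, delta):
    """Return (x, y) as in Example 1.30: x = eps/delta and y = x + delta/2."""
    x = eps / delta
    y = x + delta / 2
    return x, y

eps = 1.0
for delta in (1.0, 0.1, 0.001):
    x, y = witness_pair(eps, delta)
    gap = abs(y * y - x * x)  # equals eps + delta^2/4 up to rounding
    print(f"delta={delta}: |x-y|={abs(x - y):.4g}, |f(x)-f(y)|={gap:.8g}")
```

However small δ is made, the pair stays within δ of each other while the values of x^2 differ by more than ε, which is exactly the failure of uniform continuity on all of R.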
Example 1.31. Define f : (0, 1] → R by f(x) = 1/x. If f were uniformly
continuous, then we could set ε = 1 and know that
∃δ > 0 such that |x − y| < δ ⟹ |f(x) − f(y)| < 1.
We should expect this to be false, because f(x) grows much too quickly as x ↓ 0.
In order to disprove it, let’s take x = 1/n and y = 1/(n + 1) for some really large n ≥ 1.
We’ve chosen these because they satisfy
|f(x) − f(y)| = |n − (n + 1)| = 1,
while still being as close together as we could ask for (for large enough n). In
particular, we have
\[ |x - y| = \left| \frac{1}{n} - \frac{1}{n+1} \right| = \frac{1}{n^2 + n} < \frac{1}{n^2}, \]
so no matter what value of δ > 0 we were given, we could let n = 1/√δ and we’d
have |x − y| < δ. (Here n need not be an integer, and we may shrink δ so that δ ≤ 1,
which keeps x in (0, 1].) In other words, we have
\[ (x, y) = \left( \sqrt{\delta}, \frac{\sqrt{\delta}}{\sqrt{\delta} + 1} \right) \implies |x - y| = \frac{\delta}{1 + \sqrt{\delta}} < \delta, \text{ but } |f(x) - f(y)| = 1. \]
So f is not uniformly continuous.
[Figure: two panels, “Uniformly continuous” (f(x) = 1/(x^2 + 1)) and “Not uniformly continuous”.]
In some sense, the reason for the lack of uniform continuity in this last example is
that f(x) approaches infinity in finite time, say if we walk backwards from x = 1 to
x = 0. This can’t happen if f satisfies the extreme value theorem, because then it’s
bounded. So this suggests that in this situation we should get uniform continuity
for free.
Proposition 1.32. If S is compact and f : S → R is continuous, then f is uni-
formly continuous.
Proof. Suppose that f is not uniformly continuous. Then there must be an ε > 0
such that
∀δ > 0 ∃x, y ∈ S such that |x − y| < δ and |f(x) − f(y)| ≥ ε.
We’ll take a sequence of such points x_i, y_i with |x_i − y_i| < 1/i for all i. By the
compactness of S, there is a convergent subsequence (x_{i_j}), with limit x ∈ S. Then
for all j we have
|x − y_{i_j}| ≤ |x − x_{i_j}| + |x_{i_j} − y_{i_j}|
by the triangle inequality, and both terms on the right go to 0 as j → ∞, so y_{i_j} → x
as well.
Since f is sequentially continuous at x, we know that
\[ \lim_{j \to \infty} f(x_{i_j}) = f(x) = \lim_{j \to \infty} f(y_{i_j}). \]
So on the one hand we have |f(x_{i_j}) − f(y_{i_j})| ≥ ε for all j, but on the other hand we
have
\[ \lim_{j \to \infty} \left( f(x_{i_j}) - f(y_{i_j}) \right) = 0 \]
by the algebra of limits, and this is a contradiction.
We saw that the function f : R → R given by f(x) = x^2 is not uniformly continuous,
and neither is f : (0, 1] → R given by f(x) = 1/x. But this last proposition says that
when restricted to [1, 2], for example, both of these become uniformly continuous.
In other words, unlike continuity at a point, uniform continuity can depend very
much on the domain of the function.
Question 5. Which of the following is uniformly continuous?
1. f : (−∞, 0) → R given by f(x) = e^x
2. f : (0, 1) → R given by f(x) = e^x
3. f : [1, ∞) → R given by f(x) = e^x
4. 1 and 2 (correct)
5. 2 and 3
6. 1, 2, and 3
7. None of these
For any real numbers y ≤ x ≤ 1 with x − y < δ, we have
e^x − e^y = e^y(e^{x−y} − 1) < e^1(e^δ − 1).
So given ε > 0 and any x, y ≤ 1 with |x − y| < δ, we have |e^x − e^y| < ε as long as
e(e^δ − 1) ≤ ε, so we set
δ = log(1 + ε/e).
Then |x − y| < δ implies |e^x − e^y| < ε for any x, y ≤ 1, and this proves uniform
continuity on both (−∞, 0) and (0, 1).
On the other hand, f can’t be uniformly continuous on [1, ∞). Suppose that for a
given ε > 0 we have δ > 0 such that |x − y| < δ implies |e^x − e^y| < ε for all x, y ≥ 1.
Assuming y ≤ x, we have
e^x − e^y = e^y(e^{x−y} − 1) ≥ e^y((1 + (x − y)) − 1) = e^y(x − y).
So we take y = max(log(2ε/δ), 1) and x = y + δ/2, and we have
|x − y| = δ/2 < δ but |e^x − e^y| ≥ e^y(x − y) ≥ (2ε/δ) · (δ/2) = ε,
a contradiction.
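Both halves of this argument can be tested numerically. The sketch below is an illustration only: the names `delta_for`, `worst`, and `gap_bad` are hypothetical, and the sampling range [−20, 1] is an arbitrary stand-in for (−∞, 1].

```python
import math
import random

def delta_for(eps):
    # delta from the argument above, valid for all x, y <= 1
    return math.log(1 + eps / math.e)

eps = 0.5
delta = delta_for(eps)

# Sample pairs y <= x <= 1 with x - y < delta and record the largest gap seen.
random.seed(0)
worst = 0.0
for _ in range(10_000):
    x = random.uniform(-20.0, 1.0)
    y = max(-20.0, x - 0.999 * random.uniform(0.0, delta))
    worst = max(worst, abs(math.exp(x) - math.exp(y)))

# On [1, oo) the same delta fails: use the witness pair from the text.
y_bad = max(math.log(2 * eps / delta), 1.0)
x_bad = y_bad + delta / 2
gap_bad = math.exp(x_bad) - math.exp(y_bad)

print(f"delta={delta:.5f}, worst gap on (-oo, 1]: {worst:.5f}, gap on [1, oo): {gap_bad:.5f}")
```

The sampled gaps stay below ε on (−∞, 1], while the witness pair on [1, ∞) is closer than δ and still has a gap of at least ε.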
We now discuss what it means for a sequence of functions to converge, generalizing
the earlier notions of convergence we had for sequences of real numbers.
Definition. Let f1, f2, · · · : S → R be a sequence of functions defined on
S ⊂ R. We say that fn converges pointwise to f : S → R if
∀x ∈ S ∀ε > 0 ∃N ∈ N such that n ≥ N ⟹ |f_n(x) − f(x)| < ε.
We say that fn converges uniformly to f if
∀ε > 0 ∃N ∈ N such that ∀x ∈ S, n ≥ N ⟹ |f_n(x) − f(x)| < ε.
Note that uniform convergence of functions is almost the same as pointwise convergence,
but we’ve rearranged the quantifiers to make it a stronger condition: for each
ε > 0, there has to be an N such that for all n ≥ N, the supremum of the function
x ↦ |f_n(x) − f(x)| is at most ε.
[Figure: graphs of f and f_n.]
Example 1.33. Recall the sequence of functions f_n(x) = x^n on [0, 1].
[Figure: graphs of f_n for several values of n.]
These converge pointwise to the discontinuous function
\[ f(x) = \begin{cases} 0, & 0 \le x < 1 \\ 1, & x = 1, \end{cases} \]
but not uniformly. The problem is that for all n ∈ N, we have
\[ \lim_{x \uparrow 1} |f_n(x) - f(x)| = \lim_{x \uparrow 1} |x^n - 0| = 1 \]
and so the supremum of |f_n(x) − f(x)| is at least 1 for all n.
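The failure of uniform convergence in Example 1.33 can be seen numerically. This is a small sketch with hypothetical helper names `f_n` and `f_limit`; for each n it evaluates the error at the point x = 2^(−1/n) in [0, 1), where x^n is about 1/2.

```python
def f_n(n, x):
    return x ** n

def f_limit(x):
    # pointwise limit on [0, 1]
    return 1.0 if x == 1.0 else 0.0

for n in (1, 10, 100, 1000):
    x = 0.5 ** (1.0 / n)  # a point in [0, 1) chosen so that x^n is about 1/2
    err = abs(f_n(n, x) - f_limit(x))
    print(f"n={n}: x={x:.6f}, |f_n(x) - f(x)|={err:.6f}")
```

No matter how large n is, some point of [0, 1) has error about 1/2, so no single N works for every x at once when ε < 1/2.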
Theorem 1.34
If a sequence of continuous functions fn : S → R converges uniformly to f : S →
R, then f is continuous. Moreover, if in fact the fn are uniformly continuous,
then so is f .
Proof. Fixing ε > 0 and y ∈ S, we want to find δ > 0 such that
|x − y| < δ ⟹ |f(x) − f(y)| < ε.
We achieve this by trying to make sense of the following picture, which is an attempt
to estimate |f(x) − f(y)| by breaking it into several steps.
[Figure: graphs of f and f_n near the points x and y, with three gaps each marked < ε/3.]
Since f_n converges uniformly, we can find N ∈ N such that
∀x ∈ S, n ≥ N ⟹ |f_n(x) − f(x)| < ε/3.
For a fixed n ≥ N, the continuity of f_n at y says that we can also find δ > 0 such
that
|x − y| < δ ⟹ |f_n(x) − f_n(y)| < ε/3.
We apply the triangle inequality to see that if |x − y| < δ, then
|f(x) − f(y)| ≤ |f(x) − f_n(x)| + |f_n(x) − f_n(y)| + |f_n(y) − f(y)|
and each of the terms on the right is less than ε/3, so |f(x) − f(y)| < ε.
The above argument shows that f is continuous at any y ∈ S, but the choice of δ
may have depended on y. If the specific fn we used had been uniformly continuous,
then we could have chosen δ independently of y. Thus if all of the fn are uniformly
continuous then this actually proves that f is uniformly continuous as well.
This theorem gives another proof that the sequence f_n : [0, 1] → R given by
f_n(x) = x^n does not converge uniformly, since if it did then its limit would have
been continuous. More importantly, it gives us a criterion for when we can exchange
the order of two limits. (Note that this isn’t automatic from the definition of limits,
and it isn’t even true in general!)
Corollary 1.35. Let fn : S → R be a uniformly convergent sequence of continuous
functions. If S contains an open interval (a− δ, a+ δ) for some δ > 0, then
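\[ \lim_{n \to \infty} \lim_{x \to a} f_n(x) = \lim_{x \to a} \lim_{n \to \infty} f_n(x). \]
Proof. The pointwise limit $f(x) = \lim_{n \to \infty} f_n(x)$ is continuous because f_n converges
uniformly, and then by two applications of sequential continuity we have
\[ \lim_{n \to \infty} \left( \lim_{x \to a} f_n(x) \right) = \lim_{n \to \infty} f_n(a) = f(a) \]
and also
\[ f(a) = \lim_{x \to a} f(x) = \lim_{x \to a} \left( \lim_{n \to \infty} f_n(x) \right). \]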
Finally, we might also be interested in whether a series of functions converges. Just
as with real numbers, we say that a series $\sum_{i=1}^{\infty} f_i(x)$
converges if and only if the sequence of partial sums
\[ S_n(x) = \sum_{i=1}^{n} f_i(x) \]
converges, and it converges uniformly if and only if the sequence S_n(x) converges
uniformly. The following is a useful criterion.
Theorem 1.36: Weierstrass M-test
Let f1, f2, · · · : S → R be a sequence of continuous functions, and suppose there
are constants M_1, M_2, . . . such that
∀i ∀x ∈ S, |f_i(x)| ≤ M_i.
If $\sum_{i=1}^{\infty} M_i$ converges, then the series $\sum_{i=1}^{\infty} f_i(x)$ converges uniformly to a
continuous function g : S → R.
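Proof. For each n ∈ N we define the nth partial sum of this series by
\[ S_n(x) = \sum_{i=1}^{n} f_i(x). \]
The S_n are all continuous. For any x ∈ S, the comparison test for series tells us
that
\[ 0 \le |f_i(x)| \le M_i \implies \sum_{i=1}^{\infty} f_i(x) \text{ converges absolutely,} \]
so we can define g : S → R by
\[ g(x) = \sum_{i=1}^{\infty} f_i(x) = \lim_{n \to \infty} S_n(x). \]
The sequence $\left( \sum_{i=1}^{n} M_i \right)$ of partial sums is Cauchy because it converges, so given
any ε > 0:
\[ \exists N \in \mathbb{N} \text{ such that } N \le m \le n \implies \left| \sum_{i=m+1}^{n} M_i \right| < \frac{\varepsilon}{2}. \]
For the same ε and N, we use the triangle inequality to show that ∀x ∈ S,
\[ |S_n(x) - S_m(x)| = \left| \sum_{i=m+1}^{n} f_i(x) \right| \le \sum_{i=m+1}^{n} |f_i(x)| \le \sum_{i=m+1}^{n} M_i < \frac{\varepsilon}{2}, \]
and this bound is independent of x ∈ S. Taking limits as n → ∞ gives
\[ \forall m \ge N \ \forall x \in S, \quad |g(x) - S_m(x)| \le \frac{\varepsilon}{2} < \varepsilon. \]
Since we can find such an N ∈ N for any ε > 0, the sequence S_n(x) converges
uniformly, and this means that the series $\sum_{i=1}^{\infty} f_i(x)$ does as well. Finally, each S_n is
continuous, so Theorem 1.34 shows that the uniform limit g is continuous.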
Example 1.37. Suppose for some r > 0 that the series $\sum_{i=0}^{\infty} a_i r^i$ converges
absolutely, where the a_i are a sequence of real numbers. For all i ≥ 0, we take
\[ f_i(x) = a_i x^i, \quad M_i = |a_i| r^i \implies \forall x \in [-r, r], \ |f_i(x)| \le M_i. \]
Since $\sum_i M_i$ converges, the Weierstrass M-test then tells us that the power series
\[ \sum_{i=0}^{\infty} a_i x^i = \sum_{i=0}^{\infty} f_i(x) \]
converges uniformly to a continuous function on the interval [−r, r].
This last example tells us the following important fact about power series.
Theorem 1.38
Let $f(x) = \sum_{i=0}^{\infty} a_i x^i$ be a power series with radius of convergence R > 0. Then
f is continuous on the open interval (−R, R).
We have to be a little careful in proving this: we’d like to just claim that f(x) is
absolutely convergent at x = R and apply the Weierstrass M-test, but f may not
converge there at all! We thus use a trick to prove that f is continuous at a fixed
x ∈ (−R,R): we find a smaller, compact subinterval of (−R,R) containing x where
we know that f does converge, and we prove that it’s continuous there.
Proof of Theorem 1.38. Fix x ∈ (−R,R); we wish to show that f is continuous at
x. Let t = (R + |x|)/2:
[Figure: number line showing −R < −t < 0 < x < t < R.]
so that we have a chain of inclusions
x ∈ (−t, t) ⊂ [−t, t] ⊂ (−R, R).
Then $\sum_{i=0}^{\infty} a_i t^i$ converges absolutely, so f is continuous on [−t, t] by Example 1.37.
And since x ∈ (−t, t), it follows that f is continuous at x.
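Example 1.39. The series
\[ f(x) = \sum_{i=0}^{\infty} \frac{\cos(13^i \pi x)}{2^i} \]
converges uniformly on all of R, since if we take M_i = 1/2^i for all i then
\[ \left| \frac{\cos(13^i \pi x)}{2^i} \right| \le M_i \quad \text{and} \quad \sum_{i=0}^{\infty} M_i \text{ converges.} \]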
This last example is one of a family of functions constructed by Weierstrass which
are famously continuous on all of R but not differentiable anywhere.
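The uniform bound from the M-test can be illustrated on this series. The following sketch uses hypothetical helper names `partial_sum` and `tail_bound`, and relies on the geometric-series fact that the sum of 2^(−i) over i > n equals 2^(−n). Floating-point evaluation of the cosine at very large arguments is imprecise, but every term is still bounded by M_i, which is all the bound needs.

```python
import math

def partial_sum(x, n):
    """S_n(x) = sum_{i=0}^{n} cos(13^i * pi * x) / 2^i."""
    return sum(math.cos(13 ** i * math.pi * x) / 2 ** i for i in range(n + 1))

def tail_bound(n):
    """Sum of M_i = 2^(-i) over i > n, which equals 2^(-n)."""
    return 2.0 ** (-n)

# The difference between two partial sums is bounded independently of x.
xs = [k / 97 for k in range(-200, 201)]
largest = max(abs(partial_sum(x, 20) - partial_sum(x, 5)) for x in xs)
print(f"max |S_20 - S_5| over samples: {largest:.6f} <= {tail_bound(5):.6f}")
```

The same bound 2^(−n) holds at every real x, which is the uniformity that makes the limit continuous.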
2 Differentiation Math40002: Analysis I
2 Differentiation
Definition. A function f : R → R is differentiable at a ∈ R, with derivative
f ′(a) ∈ R, iff
\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]
exists and is equal to f′(a). This is equivalent to
\[ \forall \varepsilon > 0 \ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \Rightarrow \left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| < \varepsilon. \]
The quotient (f(x) − f(a))/(x − a) is the slope of the line segment through (x, f(x)) and (a, f(a)),
so being differentiable at a means that these slopes get arbitrarily close to f′(a) as
x gets close to a.
Question 6. Which of the following is equivalent to the definition of f ′(a)?
1. $\lim_{h \to 0} \frac{f(a + h) - f(a)}{h}$ (correct)
2. $\lim_{h \to 0} \frac{f(a + h) - f(a - h)}{2h}$
3. $\lim_{x \downarrow a} \frac{f(x) - f(a)}{x - a}$
4. More than one of these.
The second option is not equivalent because this limit may exist when
$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}$ does not. Consider the function
\[ f(x) = \begin{cases} 0, & x \ne 0 \\ 1, & x = 0 \end{cases} \]
at a = 0; then
\[ \lim_{h \to 0} \frac{f(h) - f(-h)}{2h} = 0, \]
but (f(x) − f(0))/x = −1/x is unbounded as x → 0, so $\lim_{x \to 0} \frac{f(x) - f(0)}{x}$ does not exist. For
the third option, we note that the definition of derivative requires the limit to exist
as x → a from either side, and so
\[ f(x) = \begin{cases} 0, & x \ge 0 \\ 1, & x < 0 \end{cases} \]
at a = 0 satisfies $\lim_{x \downarrow 0} \frac{f(x) - f(a)}{x - a} = 0$ even though the same limit as x → 0 does
not exist.
In both cases, we constructed a counterexample out of a function that wasn’t con-
tinuous or differentiable at x = a. We’ll see soon that if f was differentiable, then
it would have been continuous as well.
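The first counterexample can be made concrete with a few evaluations. This is a minimal sketch; `symmetric_quotient` and `ordinary_quotient` are hypothetical names for the two difference quotients at a = 0.

```python
def f(x):
    # the function from the text: 0 away from the origin, 1 at the origin
    return 1.0 if x == 0 else 0.0

def symmetric_quotient(h):
    return (f(h) - f(-h)) / (2 * h)

def ordinary_quotient(h):
    return (f(h) - f(0)) / h

for h in (1e-1, 1e-3, -1e-3):
    print(h, symmetric_quotient(h), ordinary_quotient(h))
```

The symmetric quotient is identically zero, while the ordinary quotient equals −1/h and blows up with opposite signs on the two sides of 0.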
Example 2.1. Let f(x) = x^2. Then for any a ∈ R,
\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{x^2 - a^2}{x - a} = \lim_{x \to a} (x + a) = 2a. \]
So f(x) is differentiable at a with derivative f ′(a) = 2a.
We will also write f ′(x) to denote the function whose value at x = a is f ′(a),
assuming one exists.
Example 2.2. Let f(x) = |x|, and fix a ∈ R. If a > 0 then
\[ \lim_{x \to a} \frac{|x| - |a|}{x - a} = \lim_{x \to a} \frac{x - a}{x - a} = 1, \]
because we have |x| = x for all x sufficiently close to a (say, within a/2 of it).
Similarly, if a < 0 then
\[ \lim_{x \to a} \frac{|x| - |a|}{x - a} = \lim_{x \to a} \frac{-x - (-a)}{x - a} = -1. \]
On the other hand, at a = 0 we have
\[ \frac{|x| - |0|}{x - 0} = \begin{cases} -1, & x < 0 \\ 1, & x > 0 \end{cases} \]
and so $\lim_{x \to 0} \frac{|x| - 0}{x - 0}$ does not exist. So f(x) = |x| is not differentiable at 0, and
otherwise we have
\[ f'(x) = \begin{cases} 1, & x > 0 \\ -1, & x < 0. \end{cases} \]
We can prove that some common functions are differentiable, and compute their
derivatives.
Proposition 2.3. For any integer n ≥ 0, the function f(x) = x^n has derivative
f′(x) = n x^{n−1}.
Proof. When n = 0, we have f(x) = 1 and so
\[ f'(a) = \lim_{x \to a} \frac{1 - 1}{x - a} = \lim_{x \to a} 0 = 0. \]
Assuming now that n ≥ 1, we use the quotient
\[ \frac{x^n - a^n}{x - a} = \sum_{i=0}^{n-1} x^{n-1-i} a^i = x^{n-1} + x^{n-2} a + \dots + x a^{n-2} + a^{n-1}. \]
As x → a, each of the n terms on the right side approaches a^{n−1}, so we have
\[ \lim_{x \to a} \frac{x^n - a^n}{x - a} = n a^{n-1}. \]
Proposition 2.4. The function f(x) = e^x has derivative f′(x) = e^x.
Proof. We recall the identity f(x + y) = f(x)f(y), which implies that
\[ \frac{e^x - e^a}{x - a} = e^a \left( \frac{e^{x-a} - 1}{x - a} \right). \]
Now we set h = x − a, and note that x → a is the same as h → 0, so that
\[ \lim_{x \to a} \frac{e^x - e^a}{x - a} = e^a \lim_{h \to 0} \frac{e^h - 1}{h}. \]
But we have
\[ \frac{e^h - 1}{h} = \frac{\left( \sum_{n=0}^{\infty} \frac{h^n}{n!} \right) - 1}{h} = \sum_{n=1}^{\infty} \frac{h^{n-1}}{n!} = 1 + \frac{h}{2!} + \frac{h^2}{3!} + \frac{h^3}{4!} + \dots, \]
and this is a convergent series with limit 1 as h → 0, since for |h| < 2 we have
\[ \left| \frac{e^h - 1}{h} - 1 \right| = \left| \sum_{n=2}^{\infty} \frac{h^{n-1}}{n!} \right| \le \sum_{m=1}^{\infty} \frac{|h|^m}{(m+1)!} \le \sum_{m=1}^{\infty} \frac{|h|^m}{2^m} = \frac{|h|/2}{1 - |h|/2} \]
and the right side goes to 0 as h does. Putting this all together, we have
\[ \lim_{x \to a} \frac{e^x - e^a}{x - a} = e^a \lim_{h \to 0} \frac{e^h - 1}{h} = e^a, \]
exactly as claimed.
Being differentiable is in fact a stronger condition than being continuous. It’s cer-
tainly true that continuous functions need not be differentiable – we’ve already seen
f(x) = |x| at x = 0 – but differentiable functions are always continuous.
Proposition 2.5. If f(x) is differentiable at a, then it is continuous at a.
Proof. If f is differentiable at a then we have
\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = f'(a), \]
which means that
\[ \forall \varepsilon > 0 \ \exists \delta > 0 \text{ such that } 0 < |x - a| < \delta \Rightarrow \left| \frac{f(x) - f(a)}{x - a} - f'(a) \right| < \varepsilon. \]
The triangle inequality says that
|f(x) − f(a)| ≤ |f(x) − f(a) − (x − a)f′(a)| + |(x − a)f′(a)|,
so given ε and δ as above and |x − a| < δ, we have
|f(x) − f(a)| ≤ ε|x − a| + |x − a||f′(a)| = (ε + |f′(a)|)|x − a|.
In particular, if we let δ′ = min(δ, (ε/2)/(ε + |f′(a)|)) then it follows that
|x − a| < δ′ ⇒ |f(x) − f(a)| ≤ (ε + |f′(a)|) · (ε/2)/(ε + |f′(a)|) < ε,
and so f must be continuous at a.
We could have also proved this using the algebra of limits: we have
\[ f(x) = f(a) + (x - a) \left( \frac{f(x) - f(a)}{x - a} \right), \]
and as x → a the right side converges to f(a) + 0 · f′(a) = f(a), so then $\lim_{x \to a} f(x)$
exists and is equal to f(a) as well.
Note that even if f is continuous and differentiable everywhere, its derivative need
not be continuous.
Example 2.6. Let
\[ f(x) = \begin{cases} x^2 \sin(1/x), & x \ne 0 \\ 0, & x = 0. \end{cases} \]
We’ll eventually be able to show that
f′(x) = 2x sin(1/x) − cos(1/x) for all x ≠ 0,
and this does not converge as x → 0. On the other hand, for nonzero x we have
\[ \left| \frac{f(x) - f(0)}{x - 0} \right| = \left| \frac{x^2 \sin(1/x)}{x} \right| = |x \sin(1/x)| \le |x|, \]
so |(f(x) − f(0))/(x − 0)| → 0 as x → 0, hence f is differentiable at 0 and f′(0) = 0. It
follows that f(x) is differentiable everywhere, but that f′(x) is not continuous.
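A short numerical sketch makes both claims visible. It takes the quoted formula for f′ on trust (the notes only promise to prove it later), and the names `f_prime`, `samples`, and `quotients` are illustrative. At the points x_k = 1/(kπ) the sine term vanishes, so f′(x_k) = −cos(kπ) = (−1)^(k+1).

```python
import math

def f(x):
    return x * x * math.sin(1 / x) if x != 0 else 0.0

def f_prime(x):
    # formula quoted in the example, valid only for x != 0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# f' alternates between values near +1 and -1 along x_k = 1/(k*pi) -> 0.
samples = [(k, f_prime(1 / (k * math.pi))) for k in range(1, 7)]

# The difference quotient at 0 is bounded by |h|, so it tends to 0.
steps = (1e-1, 1e-3, 1e-5)
quotients = [abs(f(h) / h) for h in steps]

print(samples)
print(quotients)
```

Since f′ keeps returning to values near ±1 arbitrarily close to 0 while f′(0) = 0, the derivative cannot be continuous at 0.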
2.1 Basic properties
Now we will see the effect of some common operations on derivatives.
Proposition 2.7. If f(x) and g(x) are both differentiable at x = a, then h(x) =
f(x) + g(x) is also differentiable at x = a, and
h′(a) = f ′(a) + g′(a).
Proof. We use the algebra of limits to compute that
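\[ \begin{aligned} \lim_{x \to a} \frac{h(x) - h(a)}{x - a} &= \lim_{x \to a} \frac{(f(x) + g(x)) - (f(a) + g(a))}{x - a} \\ &= \lim_{x \to a} \frac{f(x) - f(a)}{x - a} + \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \\ &= f'(a) + g'(a). \end{aligned} \]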
Theorem 2.8: Product rule
Suppose that f(x) and g(x) are both differentiable at x = a. Then h(x) =
f(x)g(x) is also differentiable at a, and
h′(a) = f ′(a)g(a) + f(a)g′(a).
Proof. We can break the usual limit up into several pieces and evaluate them sepa-
rately, after we first add and subtract the same term from it:
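\[ \begin{aligned} \lim_{x \to a} \frac{f(x)g(x) - f(a)g(a)}{x - a} &= \lim_{x \to a} \frac{(f(x)g(x) - f(a)g(x)) + (f(a)g(x) - f(a)g(a))}{x - a} \\ &= \lim_{x \to a} \left( \left( \frac{f(x) - f(a)}{x - a} \right) g(x) + f(a) \left( \frac{g(x) - g(a)}{x - a} \right) \right) \\ &= \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \left( \lim_{x \to a} g(x) \right) + f(a) \left( \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right) \\ &= f'(a)g(a) + f(a)g'(a). \end{aligned} \]
Here $\lim_{x \to a} g(x) = g(a)$ because g is differentiable, hence continuous, at a.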
Proposition 2.9. If f(x) is differentiable at x = a and f(a) ≠ 0, then g(x) = 1/f(x)
is differentiable at x = a, and
\[ g'(a) = -\frac{f'(a)}{(f(a))^2}. \]
Proof. We evaluate the usual limit:
\[ \begin{aligned} \lim_{x \to a} \frac{g(x) - g(a)}{x - a} &= \lim_{x \to a} \frac{\frac{1}{f(x)} - \frac{1}{f(a)}}{x - a} \\ &= \lim_{x \to a} \frac{f(a) - f(x)}{f(x) f(a) (x - a)} \\ &= \lim_{x \to a} \left( \frac{-1}{f(x) f(a)} \right) \left( \frac{f(x) - f(a)}{x - a} \right) \\ &= \frac{-f'(a)}{(f(a))^2}. \end{aligned} \]
The last step is mostly algebra of limits, except for one subtle detail: we need to
know that f being differentiable at a makes it continuous at a, so that $\lim_{x \to a} f(x)$
exists and is equal to f(a).
Example 2.10. For any n ∈ N we have seen that if f(x) = x^n then f′(x) =
n x^{n−1}. Letting g(x) = 1/f(x) = x^{−n}, if x ≠ 0 then g is differentiable at x and we
have
\[ g'(x) = \frac{-f'(x)}{(f(x))^2} = \frac{-n x^{n-1}}{x^{2n}} = (-n) x^{(-n)-1}. \]
So in fact the claim that x^n has derivative n x^{n−1} holds for all integers n.
Theorem 2.11: Quotient rule
Suppose that f(x) and g(x) are both differentiable at x = a, and that g(a) 6= 0.
Then h(x) = f(x)/g(x) is differentiable at a, and
\[ h'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2}. \]
Proof. We write h(x) = f(x)r(x), where r(x) = 1/g(x), and apply the previous proposition:
r(x) is differentiable at x = a with derivative r′(a) = −g′(a)/(g(a))^2, so by the
product rule, h is differentiable at a and
\[ \begin{aligned} h'(a) &= f'(a)r(a) + f(a)r'(a) \\ &= f'(a) \left( \frac{1}{g(a)} \right) - f(a) \left( \frac{g'(a)}{(g(a))^2} \right) \\ &= \frac{f'(a)g(a) - f(a)g'(a)}{(g(a))^2}. \end{aligned} \]
Theorem 2.12: Chain rule
Let f(x) and g(x) be functions such that g is differentiable at x = a and f is
differentiable at x = g(a). Then h(x) = f(g(x)) is differentiable at x = a, and
h′(a) = f ′(g(a))g′(a).
Proof. We can define functions r and s such that
\[ \begin{aligned} f(y) - f(g(a)) &= (y - g(a))(f'(g(a)) + r(y)) \\ g(x) - g(a) &= (x - a)(g'(a) + s(x)), \end{aligned} \]
satisfying r(g(a)) = s(a) = 0, and the definition of the derivative tells us that
\[ \lim_{y \to g(a)} r(y) = \lim_{x \to a} s(x) = 0. \]
Then we compute from the definitions of r and s that
\[ \begin{aligned} f(g(x)) - f(g(a)) &= (g(x) - g(a))(f'(g(a)) + r(g(x))) \\ &= (x - a)(g'(a) + s(x))(f'(g(a)) + r(g(x))). \end{aligned} \]
Rearranging this slightly, when x ≠ a we have
\[ \frac{f(g(x)) - f(g(a))}{x - a} = (f'(g(a)) + r(g(x)))(g'(a) + s(x)). \]
As x → a we have s(x) → 0; and g(x) → g(a) since g is continuous at a, so
r(g(x)) → 0 as well. Thus
\[ \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} = f'(g(a))g'(a), \]
and this is by definition the value of h′(a).
Remark 2.13. What we’d really like to have done is take the usual limit definition,
and multiply and divide the top and bottom by the same term:
\[ \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} = \lim_{x \to a} \left( \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \right) \left( \frac{g(x) - g(a)}{x - a} \right). \]
Certainly (g(x) − g(a))/(x − a) → g′(a) as x → a, and with a little more work we would hope
to prove that the first factor converges to f′(g(a)) as well. This is nearly true, but
it requires us to know that g(x) approaches g(a) without being equal to it. Since we
can’t guarantee this – g(x) might be constant near x = a, for example – we either
have to treat that case separately, or find a different proof like the one we gave
above.
2.2 The mean value theorem
Derivatives give us a way to identify extreme values of functions.
Definition. Let f : S → R be a function. We say that f has a local minimum
at x ∈ S if and only if
∃δ > 0 such that |y − x| < δ ⇒ f(y) ≥ f(x).
Likewise, we say that f has a local maximum at x ∈ S if and only if
∃δ > 0 such that |y − x| < δ ⇒ f(y) ≤ f(x).
The following picture shows a function with a local maximum at x_0 and a local
minimum at x_1.
[Figure: graph of a function with the points x_0 and x_1 marked.]
The “local” part means that f does not have to take its minimum or maximum
value over the whole domain at x, just the minimum or maximum value on some
small neighborhood of x.
Question 7. Let
\[ f(x) = \begin{cases} 0, & x \notin \mathbb{Q} \\ x, & x \in \mathbb{Q}. \end{cases} \]
Where are the local maxima of f?
1. Negative irrational numbers. (correct)
2. Negative rational numbers.
3. Positive irrational numbers.
4. Positive rational numbers.
5. 1 and 4.
6. 2 and 3.
7. Nowhere.
Proposition 2.14. Let f : [a, b] → R be a function. If f has a local minimum
or a local maximum at some point x ∈ (a, b), and if f is differentiable at x, then
f ′(x) = 0.
Proof. Suppose f has a local minimum at x; the local maximum case is nearly
identical. Then there is some δ > 0 such that (x− δ, x+ δ) ⊂ (a, b) and
|y − x| < δ ⇒ f(y) ≥ f(x).
(Once the second condition is satisfied, we can take an even smaller δ if needed to
satisfy the first one as well.) Using this, we compute that
\[ \frac{f(y) - f(x)}{y - x} \le 0 \text{ for } x - \delta < y < x \implies \lim_{y \uparrow x} \frac{f(y) - f(x)}{y - x} \le 0, \]
because the numerator is nonnegative while the denominator is negative. Similarly,
we have
\[ \frac{f(y) - f(x)}{y - x} \ge 0 \text{ for } x < y < x + \delta \implies \lim_{y \downarrow x} \frac{f(y) - f(x)}{y - x} \ge 0, \]
because the numerator is nonnegative while the denominator is positive. Now by
assumption $f'(x) = \lim_{y \to x} \frac{f(y) - f(x)}{y - x}$ exists, so it must be equal to both the limit as
y ↑ x and the limit as y ↓ x, and this is only possible if f′(x) = 0.
The converse is not true: a function f can be differentiable and satisfy f ′(x) = 0
at a point x which is neither a local minimum nor a local maximum. For example,
when f(x) = x^3 we know that f′(x) = 3x^2, so f′(0) = 0. But f(x) cannot have a
local maximum at 0, because f(y) > 0 for all positive y, and similarly it does not
have a local minimum at 0 because f(y) < 0 for all negative y.
Theorem 2.15: Rolle’s theorem
Let f be a function which is continuous on [a, b] and differentiable on (a, b). If
f(a) = f(b), then there is some c ∈ (a, b) such that f ′(c) = 0.
A sketch of some typical f suggests that we should try to prove this by looking for
local minima or maxima, and that’s exactly what we’ll do.
[Figure: graph of f over [a, b], with a point c marked between a and b.]
Proof. The extreme value theorem says that f attains both a maximum value and
a minimum value on the interval [a, b]. If either of these happen at some point
c ∈ (a, b), then the previous proposition says that f ′(c) = 0 and we are done.
This gives us the desired c in all cases except when f attains its minimum and
maximum values at a and b in some order. But if that happens, then since f(a) =
f(b), the minimum and maximum values of f are the same, and so f must be
constant on [a, b], in which case f ′(c) = 0 for all c ∈ (a, b) anyway.
Theorem 2.16: Mean value theorem
Let f be continuous on [a, b] and differentiable on (a, b). Then there is a point
c ∈ (a, b) such that
\[ f'(c) = \frac{f(b) - f(a)}{b - a}. \]
The statement of this theorem says roughly that there’s a point c where the slope
of the line tangent to the graph of f is the same as the slope of the line through
(a, f(a)) and (b, f(b)):
Our strategy will be to change the function f so that we shear the graph vertically,
turning these lines into horizontal ones, and then apply Rolle’s theorem.
Proof. We define a function g : [a, b] → R by
\[ g(x) = f(x) - \frac{f(b) - f(a)}{b - a} (x - a). \]
This is designed to be continuous on [a, b] and differentiable on (a, b), since the
same is true of f(x) and x − a, and to satisfy g(a) = g(b) = f(a). Rolle’s theorem
tells us that there is a point c ∈ (a, b) such that g′(c) = 0, and we have
\[ g'(c) = f'(c) - \frac{f(b) - f(a)}{b - a} = 0 \implies f'(c) = \frac{f(b) - f(a)}{b - a}. \]
So this is the desired c.
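The theorem only asserts that c exists, but in simple cases it can be located numerically. The sketch below is not from the notes: `mvt_point` is a hypothetical helper that uses bisection, and it assumes that f′(x) minus the secant slope changes sign between a and b, which holds in the example f(x) = x^3 on [0, 2] (derivative 3x^2 by Proposition 2.3).

```python
def mvt_point(f, f_prime, a, b, tol=1e-12):
    """Bisection for c in (a, b) with f_prime(c) equal to the secant slope.

    Assumes f_prime(x) - slope changes sign between a and b; the theorem
    itself needs no such assumption.
    """
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    if f_prime(lo) - slope > 0:
        lo, hi = hi, lo  # keep f_prime(lo) - slope <= 0 at the 'lo' end
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f_prime(mid) - slope <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = mvt_point(lambda x: x ** 3, lambda x: 3 * x * x, 0.0, 2.0)
print(c)  # approximately 2/sqrt(3), where 3c^2 equals the secant slope 4
```

For this example the secant slope is (8 − 0)/(2 − 0) = 4, and solving 3c^2 = 4 by hand gives c = 2/√3, which the bisection reproduces.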
Proposition 2.17. Let f be continuous on [a, b] and differentiable on (a, b). If
f ′(x) ≥ 0 for all x ∈ (a, b), then f is monotone increasing on [a, b].
Moreover, if the stronger inequality f ′(x) > 0 holds for all x ∈ (a, b), then f is
strictly monotone increasing on [a, b].
Proof. For any x, y ∈ [a, b], with x < y, we can apply the mean value theorem to f
on the interval [x, y] to find c ∈ (x, y) such that
\[ \frac{f(y) - f(x)}{y - x} = f'(c) \ge 0. \]
We multiply both sides by y−x > 0 to conclude that f(y)−f(x) ≥ 0, or f(y) ≥ f(x).
And if f ′ were strictly positive, then we would have f ′(c) > 0 and so the same
inequality becomes f(y)− f(x) = f ′(c)(y − x) > 0, hence f(y) > f(x).
Of course, the same proof shows that if f ′(x) ≤ 0 or f ′(x) < 0 for all x ∈ (a, b),
then f is monotone decreasing or strictly monotone decreasing on [a, b]. And then
we can conclude the following:
Proposition 2.18. Let f be continuous on [a, b] and differentiable on (a, b). If
f ′(x) = 0 for all x ∈ (a, b), then f is constant on [a, b].
Proof. Since f ′(x) ≥ 0 we know that f is monotone increasing, and since f ′(x) ≤ 0
it’s also monotone decreasing. So for every x < y in the interval [a, b] we have both
f(x) ≤ f(y) and f(x) ≥ f(y), and therefore f(x) = f(y).
And this can be applied to tell us that if a function is differentiable, then it’s
determined up to an additive constant by its derivative.
Proposition 2.19. Let f and g be continuous on [a, b] and differentiable on (a, b). If
f ′(x) = g′(x) for all x ∈ (a, b), then there exists some c ∈ R such that f(x) = g(x)+c
for all x ∈ [a, b].
Proof. The continuous function h(x) = f(x) − g(x) on [a, b] satisfies h′(x) = 0 on
(a, b), so h(x) = c for some constant c and then f(x) = g(x) + c on all of [a, b].
2.3 L’Hôpital’s rule
Occasionally we come across a limit of the form
\[ \lim_{x \to a} \frac{f(x)}{g(x)}, \quad f(a) = g(a) = 0. \]
The algebra of limits doesn’t help with this indeterminate form, but if f and g are
differentiable near a then the following is often useful.
Theorem 2.20: L’Hôpital’s rule, one-sided version
Suppose that f and g are differentiable on an interval (a, b), with g′(x) ≠ 0 on
this interval. If
\[ \lim_{x \downarrow a} f(x) = \lim_{x \downarrow a} g(x) = 0 \quad \text{and} \quad \lim_{x \downarrow a} \frac{f'(x)}{g'(x)} = L, \]
then $\lim_{x \downarrow a} \frac{f(x)}{g(x)} = L$.
The proof of l’Hôpital’s rule will make use of a stronger version of the mean value
theorem, which we present first.
Proposition 2.21. Let f, g : [a, b] → R be continuous functions which are both
differentiable on (a, b). Then there is some c ∈ (a, b) such that
(f(b)− f(a))g′(c) = (g(b)− g(a))f ′(c).
Proof. Consider h(x) = (f(b) − f(a))g(x) − (g(b) − g(a))f(x). This is continuous
on [a, b] and differentiable on (a, b), and we can compute that
h(a) = f(b)g(a)− f(a)g(b) = h(b),
so Rolle’s theorem tells us that h′(c) = 0 for some c ∈ (a, b), and h′(c) = 0 is
equivalent to the desired condition.
Note that by taking g(x) = x in this proposition, we recover the original mean
value theorem: there is some c ∈ (a, b) such that f(b) − f(a) = (b − a)f′(c), or
f′(c) = (f(b) − f(a))/(b − a).
Proof of l’Hôpital’s rule, one-sided version. By the definition of a one-sided limit,
given any ε > 0 there is a δ > 0 such that
\[ a < x < a + \delta \Rightarrow \left| \frac{f'(x)}{g'(x)} - L \right| < \frac{\varepsilon}{2}. \]
If we pick x, y ∈ (a, a + δ), say with a < y < x < a + δ, then the generalized mean
value theorem says that there is c ∈ (y, x) such that
\[ \frac{f(x) - f(y)}{g(x) - g(y)} = \frac{f'(c)}{g'(c)} \in \left( L - \frac{\varepsilon}{2}, L + \frac{\varepsilon}{2} \right). \]
Here we use the hypothesis that g′ ≠ 0 on all of (a, b) in order to divide by g(x) − g(y),
because if g(x) = g(y) then the mean value theorem would provide z ∈ (y, x) such
that g′(z) = (g(x) − g(y))/(x − y) = 0 and this is assumed not to happen.
Since $\left| \frac{f(x) - f(y)}{g(x) - g(y)} - L \right| < \frac{\varepsilon}{2}$ whenever a < y < x < a + δ, we take limits as y ↓ a to
get
\[ a < x < a + \delta \Rightarrow \left| \frac{f(x)}{g(x)} - L \right| \le \frac{\varepsilon}{2} < \varepsilon. \]
We can find such a δ > 0 for any ε > 0, so we have
\[ \lim_{x \downarrow a} \frac{f(x)}{g(x)} = L \]
by the definition of a one-sided limit.
Of course, the same proof works if everything is defined on an interval (b, a) with
b < a and we take limits as x ↑ a instead. So if we combine the two one-sided limits,
we get:
Theorem 2.22: L’Hôpital’s rule
Suppose that f and g are differentiable on an interval (c, d), except possibly at
some point a ∈ (c, d), and that g′(x) ≠ 0 on (c, d) \ {a}. If
\[ \lim_{x \to a} f(x) = \lim_{x \to a} g(x) = 0 \quad \text{and} \quad \lim_{x \to a} \frac{f'(x)}{g'(x)} = L, \]
then $\lim_{x \to a} \frac{f(x)}{g(x)} = L$.
Question 8. In which of the following situations does l’Hôpital’s rule also
apply, meaning that we can evaluate the limit by replacing f and g with f′ and
g′? There may be several correct answers.
1. $\lim_{x \to a} \frac{f(x)}{g(x)}$, with f, g → ∞ as x → a (correct)
2. $\lim_{x \to \infty} \frac{f(x)}{g(x)}$, with f, g → 0 as x → ∞ (correct)
3. $\lim_{x \to \infty} (f(x) - g(x))$, with f, g → ∞ as x → ∞
4. $\lim_{x \to \infty} f(x)^{g(x)}$, with f, g → 0 as x → ∞
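The first choice is another form of l’Hôpital’s rule whose proof is very similar but
just a little bit trickier. For the second one, note that
\[ \lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{y \downarrow 0} \frac{f(1/y)}{g(1/y)} = \lim_{y \downarrow 0} \frac{f'(1/y)(-1/y^2)}{g'(1/y)(-1/y^2)} = \lim_{y \downarrow 0} \frac{f'(1/y)}{g'(1/y)} = \lim_{x \to \infty} \frac{f'(x)}{g'(x)}. \]
The third one cannot be true, because $\lim_{x \to \infty} (2x - x) = \infty$ is not the same as
$\lim_{x \to \infty} (2 - 1) = 1$. And for a counterexample to the fourth option, we have
\[ \lim_{x \to \infty} \left( -e^{-x} \right)^{2/x} = \lim_{x \to \infty} e^{-2} = \frac{1}{e^2}, \qquad \lim_{x \to \infty} \left( e^{-x} \right)^{-2/x^2} = \lim_{x \to \infty} e^{2/x} = 1. \]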
Example 2.23. We use l’Hôpital’s rule to evaluate
\[ \lim_{x \to 0} \frac{\sin^2(x)}{1 - \cos(x)}. \]
Both the numerator and denominator approach zero as x → 0, and they are
differentiable on R with derivatives 2 sin(x) cos(x) (via the chain rule) and sin(x),
respectively, according to a problem sheet. We have
\[ \lim_{x \to 0} \frac{2 \sin(x) \cos(x)}{\sin(x)} = \lim_{x \to 0} 2 \cos(x) = 2, \]
since cos(0) = 1, so it follows that
\[ \lim_{x \to 0} \frac{\sin^2(x)}{1 - \cos(x)} = 2 \]
as well.
We can apply l’Hôpital’s rule multiple times if needed to evaluate a limit.
Example 2.24. We wish to evaluate $\lim_{x \to 0} \frac{e^x - 1 - x}{x^2}$. By l’Hôpital’s rule, we
have
\[ \lim_{x \to 0} \frac{e^x - 1 - x}{x^2} = \lim_{x \to 0} \frac{e^x - 1}{2x} \]
provided that the limit on the right exists. Again by l’Hôpital’s rule, we compute
that
\[ \lim_{x \to 0} \frac{e^x - 1}{2x} = \lim_{x \to 0} \frac{e^x}{2} = \frac{e^0}{2} = \frac{1}{2}, \]
so the desired limit does exist and thus $\lim_{x \to 0} \frac{e^x - 1 - x}{x^2} = \frac{1}{2}$.
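A quick numerical check of Example 2.24 is sketched below. The helper name `quotient` is hypothetical; it uses `math.expm1`, which computes e^x − 1 accurately for small x, to limit cancellation error in the numerator.

```python
import math

def quotient(x):
    # (e^x - 1 - x) / x^2, written with expm1 to limit cancellation
    return (math.expm1(x) - x) / (x * x)

for x in (1e-1, 1e-2, 1e-3, -1e-3):
    print(f"x={x:+.0e}: quotient={quotient(x):.8f}")
```

The printed values approach 1/2 from both sides, in agreement with the two applications of l’Hôpital’s rule. Such a table is only a sanity check, of course, and not a substitute for the proof.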
In general, if f ′(x) exists and is differentiable then we call its derivative f ′′(x). Note
that in this case, f ′(x) must be continuous because it is differentiable.
Proposition 2.25. If f′′(x) exists on a neighborhood of x = a and is continuous at
x = a, then
\[ f''(a) = \lim_{h \to 0} \frac{f(a + h) - 2f(a) + f(a - h)}{h^2}. \]
Proof. The numerator and denominator are differentiable functions of h which both
approach 0 as h → 0, and the derivative 2h of h^2 is nonzero away from h = 0, so
l’Hôpital’s rule says that
\[ \lim_{h \to 0} \frac{f(a + h) - 2f(a) + f(a - h)}{h^2} = \lim_{h \to 0} \frac{\frac{d}{dh}(f(a + h) - 2f(a) + f(a - h))}{\frac{d}{dh}(h^2)} = \lim_{h \to 0} \frac{f'(a + h) - f'(a - h)}{2h} \]
if the limit on the right exists. But f′(a + h) − f′(a − h) is differentiable when |h|
is small, and it approaches 0 as h → 0 since f′ is continuous at a, so another
application of l’Hôpital’s rule says that
\[ \lim_{h \to 0} \frac{f'(a + h) - f'(a - h)}{2h} = \lim_{h \to 0} \frac{\frac{d}{dh}(f'(a + h) - f'(a - h))}{\frac{d}{dh}(2h)} = \lim_{h \to 0} \frac{f''(a + h) + f''(a - h)}{2} = f''(a) \]
by the fact that f′′ is continuous at x = a, hence f′′(a ± h) → f′′(a) as h → 0. Since
this limit exists, the original limit must exist as well, and it is equal to f′′(a).
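The formula in Proposition 2.25 is the symmetric second difference quotient, and it can be tried on a concrete function. The sketch below uses the hypothetical helper `second_difference` with f = sin at a = 1, where f′′(a) = −sin(1) because sin has derivative cos and cos has derivative −sin.

```python
import math

def second_difference(f, a, h):
    """Symmetric second difference quotient from Proposition 2.25."""
    return (f(a + h) - 2 * f(a) + f(a - h)) / (h * h)

a = 1.0
for h in (1e-1, 1e-2, 1e-3):
    approx = second_difference(math.sin, a, h)
    print(f"h={h:.0e}: quotient={approx:.8f}, -sin(a)={-math.sin(a):.8f}")
```

Taking h much smaller than about 1e-4 in floating point makes the result worse rather than better, because the numerator suffers from cancellation; this is a feature of the arithmetic, not of the proposition.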
2.4 Higher derivatives
If a function f(x) is differentiable, then its derivative f′(x) may be differentiable, and
we call the derivative of f′(x) the second derivative of f, denoted f′′(x) or f^{(2)}(x).
If f′′(x) is differentiable then its derivative is called the third derivative and written
f′′′(x) or f^{(3)}(x). We can repeat this as often as we like; these higher derivatives
f^{(n)}(x), also written as d^n f/dx^n, carry important information about the original function
f.
Note that in order for the nth derivative f^{(n)}(x) to exist at x = a, the (n − 1)st
derivative f^{(n−1)}(x) must exist in a neighborhood of a and be differentiable at x = a.
Theorem 2.26: Taylor’s theorem
Suppose that f : [c, d] → R has continuous derivatives f^{(i)}(x) for all i ≤ n,
and that f^{(n+1)}(x) exists for all x ∈ (c, d). For a ∈ [c, d], define the Taylor
polynomial of order n at x = a by
\[ P_n(x) = f(a) + \frac{f'(a)}{1!}(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \dots + \frac{f^{(n)}(a)}{n!}(x - a)^n = \sum_{i=0}^{n} \frac{f^{(i)}(a)}{i!}(x - a)^i. \]
Then for any b ∈ [c, d] with b ≠ a, there is a point t between a and b such that
\[ f(b) = P_n(b) + \frac{f^{(n+1)}(t)}{(n+1)!}(b - a)^{n+1}. \]
When we take $n = 0$, Taylor’s theorem asserts that there is a $t$ between $a$ and $b$ such that
\[
f(b) = f(a) + f'(t)(b-a) \iff f'(t) = \frac{f(b) - f(a)}{b - a},
\]
which is exactly the mean value theorem. So we can consider Taylor’s theorem a
massive generalization of the mean value theorem.
In the proof of Taylor’s theorem we will repeatedly use the fact that if $k$ is a non-negative integer, then the $i$th derivative of $(x-a)^k$ at $x = a$ is
\[
k(k-1)\cdots(k+1-i)\,(x-a)^{k-i}\Big|_{x=a} =
\begin{cases} k!, & i = k \\ 0, & i \neq k. \end{cases}
\]
Try to prove this by induction on $i$.
Proof of Taylor’s theorem. We let $k = \frac{f(b) - P_n(b)}{(b-a)^{n+1}}$, and we define an auxiliary function $g : [c, d] \to \mathbb{R}$ by
\[
g(x) = f(x) - P_n(x) - k(x-a)^{n+1},
\]
noting by our choice of $k$ that
\[
g(b) = f(b) - P_n(b) - \left(\frac{f(b) - P_n(b)}{(b-a)^{n+1}}\right)(b-a)^{n+1} = 0.
\]
(This may seem unmotivated, but up to a constant term, the n = 0 version of this
is exactly the same polynomial we used to prove the mean value theorem!)
Since $P_n(x)$ is a polynomial of degree $n$, we have $P_n^{(n+1)}(x) = 0$, and so
\[
g^{(n+1)}(x) = f^{(n+1)}(x) - (n+1)!\,k.
\]
Thus we want to find some $t$ between $a$ and $b$ such that $g^{(n+1)}(t) = 0$.
We observe that for $0 \le i \le n$ we have
\[
g^{(i)}(a) = f^{(i)}(a) - P_n^{(i)}(a) - \frac{d^i}{dx^i}\, k(x-a)^{n+1}\Big|_{x=a} = f^{(i)}(a) - f^{(i)}(a) = 0.
\]
So to summarize, $g(a) = g'(a) = g''(a) = \dots = g^{(n)}(a) = 0$ and $g(b) = 0$.
We now apply Rolle’s theorem as many times as possible. Since $g(a) = g(b) = 0$, there is some $b_1$ strictly between $a$ and $b_0 = b$ such that
\[ g'(b_1) = 0. \]
Then, since $g'(a) = 0$, there is some $b_2$ strictly between $a$ and $b_1$ such that
\[ g''(b_2) = 0. \]
We repeat this process $n+1$ times, getting $b_1, b_2, \dots, b_n, b_{n+1}$ such that $b_i$ is strictly between $a$ and $b_{i-1}$ and such that $g^{(i)}(b_i) = 0$ for each $i \le n+1$.
[Figure: a number line from a to b with the points b_{n+1}, b_n, ..., b_2, b_1 marked in that order between them.]
(Note that this requires $g^{(i-1)}$ to be continuous on $[a, b_{i-1}]$ and $g^{(i)}$ to exist on $(a, b_{i-1})$ for $1 \le i \le n+1$, and this is guaranteed by the hypotheses of the theorem.)
Since $g^{(n+1)}(b_{n+1}) = 0$, we take $t = b_{n+1}$ and we are done.
Example 2.27. We saw in a problem sheet that $\sin(x)$ has derivative $\cos(x)$, and $\cos(x)$ has derivative $-\sin(x)$, so that
\[
\frac{d^n}{dx^n}\sin(x) =
\begin{cases}
\sin(x), & n = 4k \\
\cos(x), & n = 4k+1 \\
-\sin(x), & n = 4k+2 \\
-\cos(x), & n = 4k+3.
\end{cases}
\]
The fourth order Taylor polynomial for $\sin(x)$ at $x = 0$ is then
\[
P_4(x) = \sin(0) + \frac{\cos(0)}{1!}x + \frac{-\sin(0)}{2!}x^2 + \frac{-\cos(0)}{3!}x^3 + \frac{\sin(0)}{4!}x^4 = x - \frac{x^3}{6}.
\]
Taylor’s theorem says that there is some $t \in (0, x)$ such that
\[
\sin(x) = P_4(x) + \frac{\frac{d^5}{dx^5}\sin(x)\big|_{x=t}}{5!}\,x^5 = x - \frac{x^3}{6} + \frac{\cos(t)}{120}x^5.
\]
But $|\cos(t)| \le 1$, and if $0 < x \le \frac{1}{3}$ then $|x^5| \le \frac{1}{243}$, so for $0 < x \le \frac{1}{3}$ the approximation $\sin(x) \approx x - \frac{x^3}{6}$ is accurate to within $\frac{1}{243} \cdot \frac{1}{120} = \frac{1}{29160} < 0.000035$.
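The error bound in Example 2.27 can be sanity-checked on sample points. This is a rough sketch under stated assumptions: the names `p4`, `bound`, and `worst` and the grid of 1000 sample points are choices made here, and a finite sample illustrates the bound rather than proving it.

```python
import math

def p4(x):
    """Fourth order Taylor polynomial of sin at 0, namely x - x^3/6."""
    return x - x**3 / 6.0

# Bound from Taylor's theorem on (0, 1/3]: (1/243) * (1/120) = 1/29160.
bound = 1.0 / 29160.0

# Sample points k/3000 for k = 1, ..., 1000, all lying in (0, 1/3].
xs = [k / 3000.0 for k in range(1, 1001)]
worst = max(abs(math.sin(x) - p4(x)) for x in xs)

print(f"largest sampled error = {worst:.3e}, bound = {bound:.3e}")
```

The largest sampled error occurs at the right endpoint, which matches the fact that the remainder term grows like $x^5$.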
Of course, there’s no reason why we have to stop a Taylor polynomial at some finite
order n.
Definition. Suppose that $f^{(n)}(a)$ exists for all $n \ge 0$. The Taylor series for $f$ at $x = a$ is
\[
\sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n = f(a) + \frac{f'(a)}{1!}(x-a) + \dots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \dots.
\]
Question 9. Let $P_n(x)$ be the $n$th order Taylor polynomial for $f(x)$ at $a = 0$, and $P(x)$ the Taylor series at $a = 0$. Which of the following is true?
1. $P(x)$ has infinite radius of convergence.
2. $f(x) = P(x)$ on $(-1, 1)$.
3. The error $|P_{n+1}(x) - f(x)|$ is strictly smaller than $|P_n(x) - f(x)|$.
4. More than one of these.
5. None of these. ✓
None of the first three options is true: (1) If we take $f(x) = \frac{1}{1-x}$, then $f^{(n)}(x) = n!\,(1-x)^{-n-1}$, so the Taylor series is
\[
\sum_{n=0}^{\infty} \frac{n!}{n!}x^n = \sum_{n=0}^{\infty} x^n,
\]
whose radius of convergence is only 1. (2) We cannot expect that $f(x) = P(x)$ on a fixed interval $(-1, 1)$, because we could start with $f = 0$ and add a little “bump” to the graph on the interval $(\frac{1}{3}, \frac{2}{3})$, and the new function would still have $f^{(n)}(0) = 0$ for all $n$, so its Taylor series would be $P(x) = 0$. (3) We may have $|P_{n+1}(x) - f(x)| = |P_n(x) - f(x)|$ if $f^{(n+1)}(0) = 0$, since then $P_{n+1} = P_n$.
Example 2.28. Since $e^x$ has derivative $e^x$, we have $\frac{d^n}{dx^n}e^x = e^x$ for all $n \ge 0$. The Taylor series for $e^x$ at $x = 0$ is
\[
\sum_{n=0}^{\infty} \frac{\frac{d^n}{dx^n}e^x\big|_{x=0}}{n!}\,x^n = \sum_{n=0}^{\infty} \frac{e^0}{n!}x^n = \sum_{n=0}^{\infty} \frac{x^n}{n!},
\]
which we recognize as not only a convergent series but the very definition of $e^x$. So $e^x$ is equal to its own Taylor series as a function $\mathbb{R} \to \mathbb{R}$.
Example 2.29. Let $f : \mathbb{R} \to \mathbb{R}$ be a function such that $f^{(n+1)}(x) = 0$ for all $x$. If $P_n(x)$ is its $n$th order Taylor polynomial, say at $a = 0$, then Taylor’s theorem says for any $x \neq 0$ that for some $t$ between 0 and $x$ we have
\[
f(x) = P_n(x) + \frac{f^{(n+1)}(t)}{(n+1)!}x^{n+1} = P_n(x).
\]
So in this case we have
\[
f(x) = \sum_{i=0}^{n} \frac{f^{(i)}(0)}{i!}x^i,
\]
and in particular $f$ must be a polynomial of degree at most $n$.
Despite the last few examples, it is definitely not true that every function is equal
to its own Taylor series at a point. When this is true, we say that f is analytic.
Example 2.30. Let
\[
f(x) = \begin{cases} e^{-1/x^2}, & x \neq 0 \\ 0, & x = 0. \end{cases}
\]
The chain rule says that
\[
f'(x) = \frac{2}{x^3}e^{-1/x^2} \quad \text{for all } x \neq 0,
\]
and we work directly with the definition of the derivative to compute $f'(0)$:
\[
\lim_{x\to 0}\left|\frac{e^{-1/x^2} - 0}{x - 0}\right| = \lim_{y\to\infty}\frac{e^{-y^2}}{1/y} = \lim_{y\to\infty}\frac{y}{e^{y^2}}
\]
via the substitution $y = \frac{1}{x}$. (We can just take $y \to \infty$ and not worry about $y \to -\infty$, which happens as $x \uparrow 0$, because $\lim_{y\to-\infty}\left|\frac{e^{-y^2}}{1/y}\right|$ is the same.) Since $e^t \ge 1 + t$ for all $t \ge 0$, we have
\[
0 < \frac{y}{e^{y^2}} \le \frac{y}{y^2 + 1}
\]
and the right side goes to zero as $y \to \infty$, so $\frac{y}{e^{y^2}}$ does as well and hence
\[
\lim_{x\to 0}\left|\frac{e^{-1/x^2} - 0}{x - 0}\right| = 0 \implies f'(0) = 0.
\]
Therefore
\[
f'(x) = \begin{cases} \frac{2}{x^3}e^{-1/x^2}, & x \neq 0 \\ 0, & x = 0. \end{cases}
\]
Similar but more complicated arguments show that $f^{(n)}(x)$ exists for all $n$, and $f^{(n)}(0) = 0$ for all $n$, so that $f(x)$ has Taylor series
\[
\sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}x^n = 0.
\]
But clearly f(x) is not actually zero anywhere except at x = 0.
2.5 Second derivatives and convexity
The first derivative f ′(a) can be thought of as the slope of a tangent line to the
graph of f(x) at x = a. What do the higher derivatives mean? Here we’ll try to at
least understand the second derivative a little better.
To start, remember that if f(x) has a local maximum or a local minimum at x = a,
and if f is differentiable at a, then f ′(a) = 0. The second derivative gives us a
converse to this statement in many situations.
Proposition 2.31. If f ′(a) = 0 and f ′′(a) > 0, then f(x) has a local minimum at
x = a. If f ′(a) = 0 and f ′′(a) < 0, then f(x) has a local maximum at x = a.
Proof. We only prove the case f ′′(a) > 0, since the other one is nearly identical.
By definition, we have
\[
0 < f''(a) = \lim_{x\to a}\frac{f'(x) - f'(a)}{x - a} = \lim_{x\to a}\frac{f'(x)}{x - a}.
\]
Since the limit is strictly positive, we have $\frac{f'(x)}{x-a} > 0$ for all $x \neq a$ in a neighborhood $(a-\delta, a+\delta)$, where $\delta > 0$. This means that
\[
f'(x) < 0 \text{ for all } x \in (a-\delta, a), \qquad f'(x) > 0 \text{ for all } x \in (a, a+\delta).
\]
So $f$ is strictly monotone decreasing on $(a-\delta, a]$ and strictly monotone increasing on $[a, a+\delta)$, and together these imply that
\[
f(x) \ge f(a) \text{ for all } a - \delta < x < a + \delta,
\]
with equality if and only if $x = a$. In other words, $f(x)$ has a local minimum at $x = a$.
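Example 2.32. Let $f(x) = x^3 - x$. Then $f'(x) = 3x^2 - 1$ is zero at $x = \pm\frac{1}{\sqrt{3}}$, and since $f''(x) = 6x$ we have
\[
f''(-1/\sqrt{3}) = -2\sqrt{3} < 0, \qquad f''(1/\sqrt{3}) = 2\sqrt{3} > 0.
\]
So $f(x)$ has a local maximum at $x = -\frac{1}{\sqrt{3}}$ and a local minimum at $x = \frac{1}{\sqrt{3}}$.
[Figure: graph of f(x) = x^3 − x, with the x-axis marked at −1, −1/√3, 1/√3, and 1.]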
Example 2.33. If a function f(x) satisfies f ′(a) = f ′′(a) = 0, then the second
derivative test is inconclusive: it does not say whether f has a local minimum
or a local maximum at x = a.
• The function $f(x) = x^3$ satisfies $f'(0) = f''(0) = 0$ (since $f'(x) = 3x^2$ and $f''(x) = 6x$), and $f$ has neither a local minimum nor a local maximum at $x = 0$.
• The function $g(x) = x^4$ satisfies $g'(0) = g''(0) = 0$ (since $g'(x) = 4x^3$ and $g''(x) = 12x^2$), and $g$ has a local minimum at $x = 0$.
Finally, we apply Taylor’s theorem to understand when a function is convex.
Definition. We say a function $f : [a, b] \to \mathbb{R}$ is convex if for all $c < t < d$ in the domain, we have
\[
f(c) + \frac{f(d) - f(c)}{d - c}(t - c) \ge f(t).
\]
In other words, the height of the line through $(c, f(c))$ and $(d, f(d))$ at $x = t$ is at least $f(t)$.
[Figure: two example graphs on [a, b], one labelled "Convex" and one labelled "Not convex", the latter with points c < t < d marked.]
An equivalent way of stating this is that the region
\[
S = \{(x, y) \mid x \in [a, b],\ y \ge f(x)\}
\]
lying above the graph of $f$ is a convex subset of $\mathbb{R}^2$: the line segment between any two points of $S$ lies entirely within $S$.
We can rearrange the definition by writing $t = sc + (1-s)d$ for some $s \in [0, 1]$. Then the left side becomes
\[
f(c) + \frac{f(d) - f(c)}{d - c}\bigl((1-s)(d-c)\bigr) = f(c) + (1-s)(f(d) - f(c)) = sf(c) + (1-s)f(d)
\]
and so $f$ is convex if and only if for all $c < d$ in the domain and all $s \in (0, 1)$, we have
\[
sf(c) + (1-s)f(d) \ge f(sc + (1-s)d).
\]
Question 10. Which of the following functions $f : \mathbb{R} \to \mathbb{R}$ are convex? Select all that apply.
1. $f(x) = x$ ✓
2. $f(x) = x^2$ ✓
3. $f(x) = x^3$
4. $f(x) = e^x$ ✓
The line through any two points on the graph of $f(x) = x$ coincides with the graph, so it is convex. For $f(x) = x^2$ and $f(x) = e^x$ the convexity should be clear from looking at their graphs, though we can actually prove it using the next proposition. And for $f(x) = x^3$, the line through $(-1, -1)$ and $(1, 1)$ is $y = x$, and the point $(-\frac{1}{2}, -\frac{1}{8})$ on the graph sits above this line, so it isn’t convex.
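The inequality $sf(c) + (1-s)f(d) \ge f(sc + (1-s)d)$ can be spot-checked for the four functions in Question 10. The sketch below is only a sampling test on one chord; the helper name `violates_convexity`, the number of sample values of $s$, and the small floating-point tolerance are all assumptions of this illustration. Finding a violation does disprove convexity, but finding none proves nothing.

```python
import math

def violates_convexity(f, c, d, steps=100):
    """Return True if some sampled s in (0, 1) breaks the inequality
    s*f(c) + (1-s)*f(d) >= f(s*c + (1-s)*d) on the chord from c to d."""
    for k in range(1, steps):
        s = k / steps
        chord = s * f(c) + (1 - s) * f(d)
        # Allow a tiny tolerance for floating-point rounding.
        if chord < f(s * c + (1 - s) * d) - 1e-12:
            return True
    return False

candidates = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "e^x": math.exp,
}
results = {name: violates_convexity(f, -1.0, 1.0) for name, f in candidates.items()}
print(results)
```

On the chord from $-1$ to $1$ only $x^3$ fails, for instance at $s = \frac{3}{4}$, which is the point $(-\frac{1}{2}, -\frac{1}{8})$ discussed above.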
Proposition 2.34. Let f : [a, b] → R be continuous on [a, b] and have a second
derivative f ′′(x) on (a, b). Then f is convex if and only if f ′′(x) ≥ 0 for all x ∈ (a, b).
Proof. We prove the direction (=⇒) first. Suppose f is convex and take points x < y
in (a, b). We choose another two points c and d with
x < c < d < y,
and argue directly from the definition of convexity that
\[
\frac{f(c) - f(x)}{c - x} \le \frac{f(y) - f(x)}{y - x} \le \frac{f(y) - f(d)}{y - d}
\]
since these are the slopes of line segments from $(x, f(x))$ to $(c, f(c))$, from $(x, f(x))$ to $(y, f(y))$, and from $(d, f(d))$ to $(y, f(y))$ respectively.
[Figure: graph of f with the points x < c < d < y marked and the three chords whose slopes are compared.]
We take limits as $c \downarrow x$ to get $f'(x) \le \frac{f(y) - f(d)}{y - d}$, and then as $d \uparrow y$ to get
\[
f'(x) \le f'(y).
\]
But this works for any $x$ and $y$ in $(a, b)$ with $x < y$, so $f'$ is monotone increasing on $(a, b)$, and thus $f''(z) \ge 0$ for all $z \in (a, b)$.
In order to prove the direction (⇐=), we now suppose that f ′′(x) ≥ 0 for all x ∈
(a, b). Fix c < t < d in the interval [a, b] and write t = sc+(1−s)d, where 0 < s < 1.
By Taylor’s theorem, we can write $f(c)$ in terms of the first-order Taylor polynomial of $f$ at $x = t$: there is some $z \in (c, t)$ such that
\[
f(c) = f(t) + f'(t)(c - t) + \frac{f''(z)}{2}(c - t)^2,
\]
and since $f''(z) \ge 0$ this gives us an inequality
\[
f(c) \ge f(t) + f'(t)(c - t) = f(t) + f'(t)\bigl((1-s)(c-d)\bigr).
\]
Likewise for $f(d)$, there is some $w \in (t, d)$ such that
\[
f(d) = f(t) + f'(t)(d - t) + \frac{f''(w)}{2}(d - t)^2 \ge f(t) + f'(t)(d - t),
\]
so that
\[
f(d) \ge f(t) + f'(t)\bigl(s(d-c)\bigr).
\]
Combining these inequalities, we have
\begin{align*}
sf(c) + (1-s)f(d) &\ge s\Bigl(f(t) + f'(t)\bigl((1-s)(c-d)\bigr)\Bigr) + (1-s)\Bigl(f(t) + f'(t)\bigl(s(d-c)\bigr)\Bigr) \\
&= \bigl(s + (1-s)\bigr)f(t) + f'(t)\bigl(s(1-s)(c-d) + s(1-s)(d-c)\bigr) \\
&= f(t) \\
&= f(sc + (1-s)d),
\end{align*}
and this works for any $t \in (c, d)$, hence for any $s \in (0, 1)$, so $f$ is convex.
2.6 Limits of differentiable functions
We have already seen that a sequence $(f_n)$ of continuous functions can converge
pointwise to a discontinuous function; introducing the notion of uniform convergence
gave us a way to ensure that the limit is continuous. We can ask similar questions
about the derivative of a pointwise limit.
Example 2.35. Define a sequence $f_n : [0, 1] \to \mathbb{R}$ for all $n \in \mathbb{N}$ by
\[
f_n(x) = \frac{x^n}{n}.
\]
Then $f_n \to 0$ uniformly, because for any $\varepsilon > 0$ and all $x \in [0, 1]$ we have
\[
|f_n(x) - 0| = \left|\frac{x^n}{n}\right| \le \left|\frac{1}{n}\right| < \varepsilon
\]
as long as $n > \frac{1}{\varepsilon}$. But the $f_n$ are all differentiable, and the familiar sequence
\[
f_n'(x) = x^{n-1}
\]
does not converge to a continuous function on $[0, 1]$. In particular, we have
\[
\lim_{n\to\infty} f_n'(1) = 1
\]
even though the derivative of the limiting function $f(x) = 0$ satisfies $f'(1) = 0$.
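A small numerical sketch of Example 2.35 may help: the functions shrink uniformly while their derivatives at $x = 1$ stay put. The helper names `f_n`, `df_n`, `sup_norm` and the 101-point grid are assumptions of this illustration; the grid maximum equals the true supremum here only because $x^n/n$ is increasing and the grid contains $x = 1$.

```python
def f_n(n, x):
    """The function f_n(x) = x^n / n from Example 2.35."""
    return x ** n / n

def df_n(n, x):
    """Its derivative f_n'(x) = x^(n-1)."""
    return x ** (n - 1)

grid = [k / 100 for k in range(101)]  # sample points in [0, 1], including 1

def sup_norm(n):
    """Largest value of |f_n| over the sample grid."""
    return max(abs(f_n(n, x)) for x in grid)

for n in (1, 10, 50):
    print(n, sup_norm(n), df_n(n, 1.0), df_n(n, 0.5))
```

The printed sup norms are $1/n$, while `df_n(n, 1.0)` is always 1 and `df_n(n, 0.5)` tends to 0, matching the discontinuous pointwise limit of the derivatives.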
The problem in this example is that the derivatives f ′n(x) do not converge uniformly.
When they do, the outcome is much nicer.
Theorem 2.36
Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of differentiable functions, and suppose there is some $c \in [a, b]$ such that $\lim_{n\to\infty} f_n(c)$ exists. If the sequence $(f_n'(x))$ converges uniformly on $[a, b]$, then $(f_n)$ converges uniformly to a function $f : [a, b] \to \mathbb{R}$, and for all $x \in (a, b)$ we have
\[
f'(x) = \lim_{n\to\infty} f_n'(x).
\]
The proof is a bit long, so we’ll break it up into two steps.
Step 1: the functions $f_n$ converge uniformly. Fixing some $\varepsilon > 0$, the convergence of $(f_n(c))$ and the uniform convergence of the functions $(f_n'(x))$ guarantee that there is some $N \ge 1$ such that for all $m, n \ge N$,
\[
|f_n(c) - f_m(c)| < \frac{\varepsilon}{2} \quad\text{and}\quad |f_n'(x) - f_m'(x)| < \frac{\varepsilon}{2} \cdot \frac{1}{b-a}
\]
for all $x \in [a, b]$. The left side is simply the fact that $(f_n(c))$ is a Cauchy sequence; for the right side, if $f_n'(x) \to g(x)$ then we take $n$ (and likewise $m$) large enough such that $|f_n'(x) - g(x)| < \frac{\varepsilon}{4(b-a)}$ for all $x$, and apply the triangle inequality
\[
|f_n'(x) - f_m'(x)| \le |f_n'(x) - g(x)| + |g(x) - f_m'(x)| < \frac{\varepsilon}{4(b-a)} + \frac{\varepsilon}{4(b-a)}.
\]
With $m, n \ge N$ as above, we apply the mean value theorem to the function $f_n(x) - f_m(x)$ to see that for $x \neq c$, there is some $t$ between $x$ and $c$ such that
\[
\frac{(f_n(x) - f_m(x)) - (f_n(c) - f_m(c))}{x - c} = f_n'(t) - f_m'(t).
\]
We have $|f_n'(t) - f_m'(t)| \cdot |x - c| \le \frac{\varepsilon}{2(b-a)}(b-a) = \frac{\varepsilon}{2}$, so by the triangle inequality
\[
|f_n(x) - f_m(x)| \le |(f_n(x) - f_m(x)) - (f_n(c) - f_m(c))| + |f_n(c) - f_m(c)| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
This says that for any $x \in [a, b]$, the sequence $(f_n(x))$ is Cauchy, hence convergent. We define $f : [a, b] \to \mathbb{R}$ to be the pointwise limit of the $f_n$, and taking $m \to \infty$ above gives
\[
|f_n(x) - f(x)| \le \varepsilon \quad \text{for all } n \ge N.
\]
So in fact the functions $f_n$ converge uniformly to $f$.
Step 2: the derivatives $f_n'$ converge to $f'$. We now fix a point $y \in (a, b)$ and consider the continuous functions
\[
g_n(x) = \begin{cases} \frac{f_n(x) - f_n(y)}{x - y}, & x \neq y \\ f_n'(y), & x = y \end{cases}
\]
defined on $[a, b]$. We fix $\varepsilon > 0$ and then take $N \ge 1$ and $m, n \ge N$ just as before, and then for $x \neq y$, the mean value theorem once again gives us $t$ between $x$ and $y$ such that
\[
\left|\frac{(f_n(x) - f_m(x)) - (f_n(y) - f_m(y))}{x - y}\right| = |f_n'(t) - f_m'(t)| < \frac{\varepsilon}{2(b-a)}.
\]
Taking $m \to \infty$, this becomes
\[
\left|\frac{(f_n(x) - f_n(y)) - (f(x) - f(y))}{x - y}\right| \le \frac{\varepsilon}{2(b-a)},
\]
hence $|g_n(x) - g(x)| \le \frac{\varepsilon}{2(b-a)}$ for all $x \neq y$. Since $\lim_{n\to\infty} g_n(y)$ also exists by hypothesis, it follows that $g_n$ converges uniformly to a function $g$, with
\[
g(x) = \frac{f(x) - f(y)}{x - y} \quad \text{for all } x \neq y.
\]
Since the convergence is uniform, Corollary 1.35 tells us that we can exchange limits:
\[
\lim_{n\to\infty} f_n'(y) = \lim_{n\to\infty}\Bigl(\lim_{x\to y} g_n(x)\Bigr) = \lim_{x\to y}\Bigl(\lim_{n\to\infty} g_n(x)\Bigr) = \lim_{x\to y} g(x) = \lim_{x\to y}\frac{f(x) - f(y)}{x - y} = f'(y),
\]
and this is exactly what we wanted to show.
Theorem 2.37: Differentiation of power series
Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of convergence $R > 0$. Then $f$ has a continuous derivative on $(-R, R)$, and
\[
f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}
\]
for all $|x| < R$.
Proof. We recall that $f(x)$ converges absolutely for any $|x| < R$. Given any positive $r < R$, and letting $t = \frac{r+R}{2}$, we also proved by the Weierstrass M-test that the partial sums
\[
f_n(x) = \sum_{i=0}^{n} a_i x^i
\]
converge uniformly to $f(x)$ on the interval $[-t, t] \subset (-R, R)$. Thus $f_n(0) \to f(0)$.
We have $f_n'(x) = \sum_{i=1}^{n} i a_i x^{i-1}$. If $|x| < t$, then for all sufficiently large $i$ we have
\[
\bigl|i a_i x^{i-1}\bigr| = \frac{|a_i t^i|}{t}\left|i\left(\frac{x}{t}\right)^{i-1}\right| \le \frac{|a_i t^i|}{t}.
\]
The key observation here, left as an exercise, is that if $0 \le s < 1$ then $i s^{i-1} \le 1$ for $i$ large enough. So we apply the Weierstrass M-test with
\[
|i a_i x^{i-1}| \le M_i = \frac{|a_i t^i|}{t} \quad \text{for all large enough } i,
\]
using the fact that $\frac{1}{t}\sum_{i=0}^{\infty} a_i t^i$ converges absolutely, to see that
\[
f_n'(x) \to \sum_{i=1}^{\infty} i a_i x^{i-1} \quad \text{uniformly on } (-t, t),
\]
with the limit series being continuous. (Why is it not a problem that we may have $|i a_i x^{i-1}| > M_i$ for finitely many $i$?) Since the derivatives $f_n'(x)$ converge uniformly on $[-r, r] \subset (-t, t)$, and $f_n(0) \to f(0)$, the above theorem tells us that
\[
f'(x) = \lim_{n\to\infty} f_n'(x) = \sum_{i=1}^{\infty} i a_i x^{i-1}
\]
on the interval $[-r, r]$, and this works for any $r < R$.
Example 2.38. Let $f(x) = \frac{1}{1-x}$ for $|x| < 1$. Then $f$ is equal to the power series $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$, which has radius of convergence 1. We can differentiate both sides to get
\[
\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n x^{n-1}
\]
and then multiply by $x$ to deduce for all $x \in (-1, 1)$ the identity
\[
\frac{x}{(1-x)^2} = \sum_{n=1}^{\infty} n x^n = x + 2x^2 + 3x^3 + 4x^4 + \dots.
\]
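The identity in Example 2.38 is easy to test numerically by comparing a long partial sum with the closed form. This is a sketch only: the function names `partial_sum` and `closed_form`, the cutoff of 500 terms, and the sample points are assumptions chosen for illustration.

```python
def partial_sum(x, terms):
    """Partial sum of n * x^n for n = 1, ..., terms."""
    return sum(n * x ** n for n in range(1, terms + 1))

def closed_form(x):
    """The claimed limit x / (1 - x)^2, valid for |x| < 1."""
    return x / (1 - x) ** 2

for x in (-0.5, 0.3, 0.9):
    print(x, partial_sum(x, 500), closed_form(x))
```

Points closer to $\pm 1$ need many more terms before the partial sums settle, which reflects the radius of convergence being exactly 1.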
Example 2.39. We defined $\cos : \mathbb{R} \to \mathbb{R}$ and $\sin : \mathbb{R} \to \mathbb{R}$ by
\[
\cos(x) = \operatorname{Re} E(ix), \qquad \sin(x) = \operatorname{Im} E(ix).
\]
Using the power series $E(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$, we have
\[
E(ix) = \left(\sum_{n \text{ even}} \frac{(-1)^{n/2}x^n}{n!}\right) + i\sum_{n \text{ odd}} \frac{(-1)^{(n-1)/2}x^n}{n!},
\]
so substituting $n = 2k$ in the first sum and $n = 2k+1$ in the second gives
\[
\cos(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots,
\]
\[
\sin(x) = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots.
\]
Both of these are power series with infinite radius of convergence, so we can differentiate term by term to get
\[
\frac{d}{dx}\sin(x) = \sum_{k=0}^{\infty} (2k+1)\frac{(-1)^k x^{2k}}{(2k+1)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = \cos(x)
\]
and
\[
\frac{d}{dx}\cos(x) = \sum_{k=1}^{\infty} (2k)\frac{(-1)^k x^{2k-1}}{(2k)!} = \sum_{m=0}^{\infty} \frac{-(-1)^m x^{2m+1}}{(2m+1)!} = -\sin(x).
\]
(Here in the second step we substituted $m = k - 1$.) So we conclude that $\sin(x)$ and $\cos(x)$ are differentiable on all of $\mathbb{R}$, with derivatives $\cos(x)$ and $-\sin(x)$ respectively.
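The two power series from Example 2.39 can be summed directly and compared against a library implementation. The sketch below makes some assumptions of its own: the function names `sin_series` and `cos_series`, the cutoff of 30 terms, and the trick of updating each term from the previous one instead of recomputing factorials.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of sum_k (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x  # term holds (-1)^k x^(2k+1) / (2k+1)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def cos_series(x, terms=30):
    """Partial sum of sum_k (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0  # term holds (-1)^k x^(2k) / (2k)!
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

for x in (-2.0, 0.5, 3.0):
    print(x, sin_series(x) - math.sin(x), cos_series(x) - math.cos(x))
```

For moderate $x$ the differences are at the level of rounding error; for very large $|x|$ the alternating terms grow before they shrink, so a direct sum like this loses accuracy even though the series still converges.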
We can use these computations to understand more about $\sin(x)$ and $\cos(x)$, and in particular to define the constant $\pi$.
Lemma 2.40. There is a positive real number y > 0 such that sin(y) = 0.
Proof. Suppose not. Then it’s also true that $\cos(x) \neq 0$ for all $x > 0$, because if $\cos(x) = 0$ for some $x > 0$ then
\[
\sin(2x) = 2\sin(x)\cos(x) = 0
\]
and we can take $y = 2x$. Since $\cos(0) = 1$ is positive, the intermediate value theorem implies that $\cos(x) > 0$ for all $x > 0$. But then the derivative of $\sin(x)$ is positive on $(0, \infty)$, so $\sin(x)$ is strictly monotone increasing on $[0, \infty)$. In particular, $\sin(x) > 0$ for $x > 0$ as well.
We now apply the mean value theorem to $\cos(x)$: for any $x > 1$, there is some $t \in (1, x)$ such that
\[
\frac{\cos(x) - \cos(1)}{x - 1} = -\sin(t) \implies \cos(x) = \cos(1) - (x-1)\sin(t).
\]
Since $\sin$ is a positive, increasing function on $(0, \infty)$, it follows that
\[
\cos(x) < \cos(1) - (x-1)\sin(1) \quad \text{for all } x > 1,
\]
but if we let $x = 1 + \frac{\cos(1)}{\sin(1)} > 1$ then we have $\cos(x) < 0$ and this is a contradiction.
Proposition 2.41. Define $\pi = \inf S$, where
\[
S = \{y > 0 \mid \sin(y) = 0\}.
\]
Then $\pi > 0$, and $\sin(0) = \sin(\pi) = 0$ while $\sin(x) > 0$ for $0 < x < \pi$.
Proof. We have shown that $S$ is nonempty, and it is bounded below by 0, so $\pi \ge 0$. We also have $\sin(\pi) = 0$, because if we take a sequence $(x_n) \subset S$ with $x_n \to \pi$ then the sequential continuity of $\sin(x)$ says that
\[
\sin(\pi) = \lim_{n\to\infty}\sin(x_n) = \lim_{n\to\infty} 0 = 0.
\]
In order to see that $\pi > 0$, we note that $\frac{d}{dx}\sin(x) = \cos(x)$ is positive on some interval $[0, \delta)$ with $\delta > 0$, since $\cos(0) = 1$, and so $\sin(x)$ is strictly monotone increasing on $[0, \delta]$. It follows that $\sin(x) > \sin(0) = 0$ for all $x \in (0, \delta)$, and hence $\delta$ is a lower bound for $S$ as well, so $\pi \ge \delta > 0$. And then $\sin(\frac{\delta}{2}) > 0$, so if $\sin(x) \le 0$ for some $x \in (0, \pi)$ then the intermediate value theorem would give us an element of $S$ in $(0, \pi)$, and this is impossible, so $\sin(x) > 0$ on $(0, \pi)$ as claimed.
In fact, it is possible to show that $S = \{n\pi \mid n \in \mathbb{N}\}$, though we will not do this here; we merely observe that
\[
\sin(2\pi) = 2\sin(\pi)\cos(\pi) = 0,
\]
and one can show that $\sin(n\pi) = 0$ for all $n \ge 1$ by induction.
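The definition of $\pi$ as the smallest positive zero of $\sin$ suggests a crude way to approximate it from the power series alone. The following is a heuristic sketch, not a rigorous computation: the names `sin_series` and `smallest_positive_zero`, the scan step of 0.1, and the tolerance are assumptions, and a coarse scan like this would miss a zero if the function dipped below zero and came back within one step.

```python
import math

def sin_series(x, terms=40):
    """Partial sum of the power series sum_k (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def smallest_positive_zero(step=0.1, tol=1e-12):
    """Scan right from `step` until sin stops being positive, then bisect."""
    lo = step
    while sin_series(lo + step) > 0:
        lo += step
    hi = lo + step  # now sin_series(lo) > 0 >= sin_series(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if sin_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

approx_pi = smallest_positive_zero()
print(approx_pi, math.pi)
```

The bisection step mirrors the intermediate value theorem argument in the proof: a sign change of a continuous function on an interval traps a zero inside it.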
We now observe from their power series that
\[
\cos(-x) = \sum_{k=0}^{\infty} \frac{(-1)^k (-x)^{2k}}{(2k)!} = \sum_{k=0}^{\infty} \frac{(-1)^k x^{2k}}{(2k)!} = \cos(x)
\]
and
\[
\sin(-x) = \sum_{k=0}^{\infty} \frac{(-1)^k (-x)^{2k+1}}{(2k+1)!} = -\sum_{k=0}^{\infty} \frac{(-1)^k x^{2k+1}}{(2k+1)!} = -\sin(x),
\]
meaning that $\cos(x)$ and $\sin(x)$ are even and odd functions, respectively. Then
\begin{align*}
\cos^2(x) + \sin^2(x) &= \bigl(\cos(x) + i\sin(x)\bigr)\bigl(\cos(x) - i\sin(x)\bigr) \\
&= \bigl(\cos(x) + i\sin(x)\bigr)\bigl(\cos(-x) + i\sin(-x)\bigr) \\
&= E(ix)E(-ix) \\
&= E(ix + (-ix)) = E(0) = 1.
\end{align*}
Theorem 2.42
We have $\sin(x + 2\pi) = \sin(x)$ and $\cos(x + 2\pi) = \cos(x)$ for all $x \in \mathbb{R}$. The same identities do not hold if we replace $2\pi$ by any smaller $p > 0$.
Proof. It follows from the angle addition formula for $\cos(x)$ that
\[
\cos(2x) = \cos^2(x) - \sin^2(x) = 1 - 2\sin^2(x) = 2\cos^2(x) - 1,
\]
and since $\sin(x)$ is strictly positive on the interval $(0, \pi)$ and $\sin(\pi) = 0$, we have
\[
\cos(\pi) = 1 - 2\sin^2\left(\frac{\pi}{2}\right) < 1.
\]
But $\cos^2(\pi) = 1 - \sin^2(\pi) = 1$, so we must have $\cos(\pi) = -1$ and
\[
\cos(2\pi) = 2\cos^2(\pi) - 1 = 1.
\]
We use this to conclude that
\begin{align*}
\sin(x + 2\pi) &= \sin(x)\cos(2\pi) + \cos(x)\sin(2\pi) = \sin(x) \\
\cos(x + 2\pi) &= \cos(x)\cos(2\pi) - \sin(x)\sin(2\pi) = \cos(x)
\end{align*}
for all $x \in \mathbb{R}$.
Finally, if $\sin(x + p) = \sin(x)$ and $\cos(x + p) = \cos(x)$ where $0 < p < 2\pi$, then we have $\sin(p) = \sin(0) = 0$, so $p \ge \pi$ by our definition of $\pi$, and likewise $\cos(p) = \cos(0) = 1$ implies that $p \neq \pi$. But then
\[
\sin(p - \pi) = \sin(p)\cos(\pi) - \cos(p)\sin(\pi) = 0 \cdot \cos(\pi) - \cos(p) \cdot 0 = 0
\]
while $0 < p - \pi < \pi$, contradicting the fact that $\sin(x)$ is nonzero on $(0, \pi)$. Thus we cannot replace $2\pi$ with any smaller positive value of $p$, as claimed.
3 Integration
No one’s fast like Gaston
Good at maths like Gaston
Finds the area under a graph like Gaston
- “Gaston Darboux”, definitely not from Beauty and the Beast
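A definite integral $\int_a^b f(x)\,dx$ is supposed to measure the area between the x-axis and the graph of a function $f : [a, b] \to \mathbb{R}$.
[Figure: the region between the x-axis and the graph of f(x) over [a, b], labelled "Area = $\int_a^b f(x)\,dx$".]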
In the next few sections we’ll see how to make this precise, using the Darboux integral
– this is equivalent to the more familiar Riemann integral, so we may also call it
“Riemann–Darboux integration” – and develop some key properties.
3.1 Darboux sums
The rough idea behind the Darboux integral is that we can estimate the area under
the graph of f(x) by approximating it with rectangles, since we know the area of
a rectangle. If the rectangles all lie under the graph of f(x), then we’ll get a lower
bound; if they cover the whole area between the x-axis and the graph, then we’ll
get an upper bound.
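[Figure: rectangles lying under the graph over [a, b] (left) and rectangles covering the region under the graph (right), illustrating lower sum ≤ Area ≤ upper sum.]
In order to do this, we split the interval $[a, b]$ into finitely many pieces as follows.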
Definition. A partition of the interval $[a, b]$ is a finite sequence of real numbers $P = (x_0, x_1, x_2, \dots, x_k)$ such that
\[
a = x_0 < x_1 < x_2 < \dots < x_k = b.
\]
We can view a partition $P = (x_0, \dots, x_k)$ of $[a, b]$ as splitting it up into closed intervals
\[
[x_0, x_1],\ [x_1, x_2],\ [x_2, x_3],\ \dots,\ [x_{k-1}, x_k],
\]
and we write $\Delta x_i = x_{i+1} - x_i > 0$ for the length of the interval $[x_i, x_{i+1}]$. If $f : [a, b] \to \mathbb{R}$ is a bounded function (but not necessarily continuous!), then we can define
\[
m_i = \inf_{x_i \le t \le x_{i+1}} f(t), \qquad M_i = \sup_{x_i \le t \le x_{i+1}} f(t)
\]
for $0 \le i < k$. We define the lower Darboux sum of $f$ with respect to $P$ as
\[
L(f, P) = \sum_{i=0}^{k-1} m_i \Delta x_i
\]
and the upper Darboux sum of $f$ with respect to $P$ is similarly
\[
U(f, P) = \sum_{i=0}^{k-1} M_i \Delta x_i.
\]
Note that the $m_i \Delta x_i$ terms are the areas of rectangles lying just under the graph of $f$, as pictured above at left, and the $M_i \Delta x_i$ terms are areas of rectangles lying just above the graph, as shown above at right.
Question 11. Which of the following is true for any function $f : [a, b] \to \mathbb{R}$ and any partition $P = (x_0, x_1, \dots, x_n)$ of $[a, b]$?
1. At least one of $L(f, P)$ and $U(f, P)$ exists.
2. If $f$ is continuous then both $L(f, P)$ and $U(f, P)$ exist. ✓
3. We always have $L(f, P) < U(f, P)$ if both are defined.
4. The value of $L(f, P)$ does not depend on $P$.
If $f$ is continuous then the extreme value theorem says that it is bounded on $[a, b]$, so both $L(f, P)$ and $U(f, P)$ exist. It is possible for $f$ to not be bounded above or below on any interval, and then $L(f, P)$ and $U(f, P)$ are not defined because $\inf f$ and $\sup f$ are not defined on any interval: for example, consider
\[
f(x) = \begin{cases} 0, & x \notin \mathbb{Q} \\ (-1)^p q, & x = \frac{p}{q} \in \mathbb{Q}. \end{cases}
\]
Also, if $f$ is constant then we have $L(f, P) = U(f, P)$, and nearly any example will show that $L(f, P)$ depends on $P$ (take $f(x) = x$ on $[0, 1]$ and $P = (0, 1)$ but $P' = (0, \frac{1}{2}, 1)$).
Example 3.1. Suppose that $f(x) = c$ is constant on $[a, b]$. Then for any partition $P = (x_0, \dots, x_k)$ of $[a, b]$, we have
\[
m_i = \inf_{x_i \le t \le x_{i+1}} f(t) = c, \qquad M_i = \sup_{x_i \le t \le x_{i+1}} f(t) = c
\]
for all $i$. The corresponding lower and upper Darboux sums are
\[
L(f, P) = \sum_{i=0}^{k-1} m_i \Delta x_i = c(x_1 - x_0) + c(x_2 - x_1) + \dots + c(x_k - x_{k-1}) = c(x_k - x_0) = c(b - a)
\]
and similarly
\[
U(f, P) = \sum_{i=0}^{k-1} M_i \Delta x_i = c(x_k - x_0) = c(b - a).
\]
So $L(f, P) = U(f, P) = c(b - a)$ for all $P$.
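Example 3.2. Let $f(x) = \begin{cases} 0, & x \in \mathbb{Q} \\ 1, & x \notin \mathbb{Q}, \end{cases}$ and fix a partition $P = (x_0, \dots, x_k)$ of $[a, b]$. Then any interval $[x_i, x_{i+1}]$ contains both rational and irrational numbers, so we have
\[
m_i = \inf_{x_i \le t \le x_{i+1}} f(t) = 0, \qquad M_i = \sup_{x_i \le t \le x_{i+1}} f(t) = 1.
\]
Thus for any $P$ we have
\[
L(f, P) = \sum_{i=0}^{k-1} 0 \cdot \Delta x_i = 0, \qquad U(f, P) = \sum_{i=0}^{k-1} 1 \cdot \Delta x_i = b - a.
\]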
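Example 3.3. Let $f(x) = x$ on the interval $[0, 1]$, and consider the partition $P_n = (0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1)$ with $x_i = \frac{i}{n}$ for $0 \le i \le n$. We have $\Delta x_i = \frac{1}{n}$ for all $i < n$, and
\[
m_i = \inf_{\frac{i}{n} \le t \le \frac{i+1}{n}} t = \frac{i}{n}, \qquad M_i = \sup_{\frac{i}{n} \le t \le \frac{i+1}{n}} t = \frac{i+1}{n},
\]
so we compute that
\[
L(f, P_n) = \sum_{i=0}^{n-1} \frac{i}{n} \cdot \frac{1}{n} = \frac{1}{n^2}\sum_{i=0}^{n-1} i = \frac{1}{n^2} \cdot \frac{n(n-1)}{2} = \frac{n-1}{2n}
\]
and similarly
\[
U(f, P_n) = \sum_{i=0}^{n-1} \frac{i+1}{n} \cdot \frac{1}{n} = \frac{1}{n^2}\sum_{i=0}^{n-1} (i+1) = \frac{1}{n^2} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2n}.
\]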
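The closed forms in Example 3.3 can be reproduced with exact rational arithmetic. This is a minimal sketch under an explicit assumption: the helper `darboux_sums_increasing` is only valid for functions that are increasing on the partition, because it reads the infimum and supremum on each piece off the endpoints. The names and the use of `fractions.Fraction` are choices made for this illustration.

```python
from fractions import Fraction

def darboux_sums_increasing(f, points):
    """Lower and upper Darboux sums for an INCREASING function f.

    For increasing f, the infimum on [x_i, x_{i+1}] is f(x_i) and the
    supremum is f(x_{i+1}); this shortcut is wrong for general f.
    """
    pieces = range(len(points) - 1)
    lower = sum(f(points[i]) * (points[i + 1] - points[i]) for i in pieces)
    upper = sum(f(points[i + 1]) * (points[i + 1] - points[i]) for i in pieces)
    return lower, upper

def uniform_partition(n):
    """The partition (0, 1/n, 2/n, ..., 1) of [0, 1] as exact fractions."""
    return [Fraction(i, n) for i in range(n + 1)]

for n in (1, 2, 10):
    lower, upper = darboux_sums_increasing(lambda t: t, uniform_partition(n))
    print(n, lower, upper)
```

Using exact fractions avoids any rounding, so the output can be compared symbol for symbol with $\frac{n-1}{2n}$ and $\frac{n+1}{2n}$.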
Lemma 3.4. Given a bounded function f : [a, b] → R and a partition P of [a, b],
we have
L(f, P ) ≤ U(f, P ).
Proof. For all $i = 0, 1, \dots, k-1$ we have $\Delta x_i > 0$ and
\[
m_i = \inf_{x_i \le t \le x_{i+1}} f(t) \le \sup_{x_i \le t \le x_{i+1}} f(t) = M_i,
\]
and so
\[
L(f, P) = \sum_{i=0}^{k-1} m_i \Delta x_i \le \sum_{i=0}^{k-1} M_i \Delta x_i = U(f, P).
\]
Of course, there’s no reason to think that L(f, P ) and U(f, P ) should be equal, but
we might hope that the more points we add to our partition, the closer they get to
the actual area we want to measure, and this turns out to be true.
Definition. A partition Q is a refinement of P , written P ≺ Q, if and only if
every point of P is also a point of Q.
The common refinement R of any two partitions P and Q is the partition whose
points are precisely those points belonging to either of $P$ and $Q$. It satisfies $P \prec R$ and $Q \prec R$.
[Figure: partitions P and Q of [a, b] and their common refinement R.]
Question 12. Let $P, Q, R, S$ be partitions of $[a, b]$. Which of the following is not true?
1. If $P \prec Q$ and $Q \prec R$, then $P \prec R$.
2. Either $P \prec Q$ or $Q \prec P$. ✓
3. If $R$ is the common refinement of $P$ and $Q$, then $P \prec S$ and $Q \prec S$ implies $R \prec S$.
4. $\forall \varepsilon > 0$, $P$ has a refinement $Q = (x_0, \dots, x_n)$ with $\max_i (x_{i+1} - x_i) < \varepsilon$.
The second option is false: for example, if $P = (0, \frac{1}{3}, 1)$ and $Q = (0, \frac{2}{3}, 1)$ then $\frac{1}{3} \in P$ does not belong to $Q$, and $\frac{2}{3} \in Q$ does not belong to $P$. The first and third options follow from the definitions, and the fourth is less obvious but we can first take
\[
R = \left(a,\ a + \frac{1}{n}(b-a),\ a + \frac{2}{n}(b-a),\ \dots,\ a + \frac{n-1}{n}(b-a),\ b\right)
\]
for $n$ large enough (here we have $\max_i (x_{i+1} - x_i) = \frac{b-a}{n} < \varepsilon$ if $n > \frac{b-a}{\varepsilon}$) and then let $Q$ be the common refinement of $P$ and $R$.
Proposition 3.5. If Q is a refinement of P , then we have
L(f, P ) ≤ L(f,Q) ≤ U(f,Q) ≤ U(f, P ).
Proof. We’ll reduce this to the case where Q \ P is a single point. In general, if
Q \ P consists of m points a1, . . . , am, then we can define a sequence of partitions
P = P0 ≺ P1 ≺ P2 ≺ · · · ≺ Pm = Q,
where for $1 \le k \le m$ we build $P_k$ by adding the point $a_k$ to $P_{k-1}$. Since $|P_k \setminus P_{k-1}| = 1$ for all $k$, we repeatedly apply the $m = 1$ case of the proposition to get
\[
L(f, P) \le L(f, P_1) \le L(f, P_2) \le \dots \le L(f, P_m) = L(f, Q)
\]
and likewise
\[
U(f, Q) = U(f, P_m) \le U(f, P_{m-1}) \le \dots \le U(f, P_1) \le U(f, P).
\]
Putting these together, we conclude the general case from the case where $m = 1$.
To prove the $m = 1$ case, suppose that $P = (x_0, x_1, \dots, x_n)$ and that
\[
Q = (x_0, \dots, x_k, y, x_{k+1}, \dots, x_n)
\]
for some $k$ and some $y \in (x_k, x_{k+1})$.
[Figure: two copies of the interval [a, b], the first with x_k and x_{k+1} marked and the second with the added point y.]
Then almost all terms in the lower Darboux sums for $P$ and $Q$ are the same, because they mostly compute the areas of the same rectangles: the difference happens entirely on the interval $[x_k, x_{k+1}]$, and so we compute that
\[
L(f, Q) - L(f, P) = \left(\Bigl(\inf_{x_k \le t \le y} f(t)\Bigr)(y - x_k) + \Bigl(\inf_{y \le t \le x_{k+1}} f(t)\Bigr)(x_{k+1} - y)\right) - \Bigl(\inf_{x_k \le t \le x_{k+1}} f(t)\Bigr)(x_{k+1} - x_k).
\]
Some rearranging, using the fact that $x_{k+1} - x_k = (x_{k+1} - y) + (y - x_k)$, gives us
\begin{align*}
L(f, Q) - L(f, P) &= \left(\Bigl(\inf_{x_k \le t \le y} f(t)\Bigr) - \Bigl(\inf_{x_k \le t \le x_{k+1}} f(t)\Bigr)\right)(y - x_k) \\
&\quad + \left(\Bigl(\inf_{y \le t \le x_{k+1}} f(t)\Bigr) - \Bigl(\inf_{x_k \le t \le x_{k+1}} f(t)\Bigr)\right)(x_{k+1} - y).
\end{align*}
Both $y - x_k$ and $x_{k+1} - y$ are positive, and we have
\[
\Bigl(\inf_{x_k \le t \le y} f(t)\Bigr),\ \Bigl(\inf_{y \le t \le x_{k+1}} f(t)\Bigr) \ \ge\ \Bigl(\inf_{x_k \le t \le x_{k+1}} f(t)\Bigr)
\]
because a lower bound for $f(x)$ on $[x_k, x_{k+1}]$ is certainly also a lower bound for $f(x)$ on each of $[x_k, y]$ and $[y, x_{k+1}]$, so each difference of infima above is nonnegative and we conclude that
\[
L(f, Q) - L(f, P) \ge 0.
\]
The same argument with sup instead of inf shows that $U(f, Q) - U(f, P) \le 0$.
Using these inequalities, we can prove that any lower Darboux sum for f is less than
or equal to any upper Darboux sum of f , regardless of the partitions we use.
Proposition 3.6. If f : [a, b] → R is bounded, and P and Q are any partitions of
[a, b], then
L(f, P ) ≤ U(f,Q).
Proof. We let R be the common refinement of P and Q. Then P ≺ R and Q ≺ R,
so
L(f, P ) ≤ L(f,R) ≤ U(f,R) ≤ U(f,Q)
by applying the last proposition twice. These imply that L(f, P ) ≤ U(f,Q).
3.2 The Darboux integral
The inequality L(f, P ) ≤ U(f,Q) for any P and Q says that the set
{L(f, P ) | P is a partition of [a, b]}
is bounded above by any upper Darboux sum U(f,Q), and likewise the set
{U(f, P ) | P is a partition of [a, b]}
is bounded below by any lower Darboux sum L(f,Q).
Definition. Let $f : [a, b] \to \mathbb{R}$ be a bounded function. We define the lower and upper Darboux integrals of $f$ on $[a, b]$ by
\[
\underline{\int_a^b} f(x)\,dx = \sup_P L(f, P), \qquad \overline{\int_a^b} f(x)\,dx = \inf_P U(f, P).
\]
Lemma 3.7. If $f : [a, b] \to \mathbb{R}$ is bounded, then $\underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx$.
Proof. For any partitions $P$ and $Q$, we have $L(f, P) \le U(f, Q)$, and so $U(f, Q)$ is an upper bound for the set of all lower Darboux sums $L(f, P)$, or
\[
\underline{\int_a^b} f(x)\,dx = \sup_P L(f, P) \le U(f, Q).
\]
Then $\underline{\int_a^b} f(x)\,dx$ is a lower bound for the set of all upper Darboux sums $U(f, Q)$, so
\[
\underline{\int_a^b} f(x)\,dx \le \inf_Q U(f, Q) = \overline{\int_a^b} f(x)\,dx
\]
as claimed.
Definition. If the upper and lower Darboux integrals of $f$ on $[a, b]$ are equal, then we say that $f$ is (Darboux) integrable on $[a, b]$, and we define
\[
\int_a^b f(x)\,dx \overset{\text{def}}{=} \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx.
\]
Remark 3.8. The “dx” part of the integral is a bit of notation that says we’re
integrating a function of x. We could change the name of the variable, and the
definition would still be the same: it makes perfect sense to say that
\[
\int_a^b f(t)\,dt = \int_a^b f(x)\,dx.
\]
Example 3.9. When $f(x) = c$ is constant on $[a, b]$, we computed that $L(f, P) = U(f, P) = c(b - a)$ for all $P$. Thus $f(x) = c$ is integrable, with
\[
\int_a^b c\,dx = c(b - a).
\]
Example 3.10. We computed for $f(x) = \begin{cases} 0, & x \in \mathbb{Q} \\ 1, & x \notin \mathbb{Q} \end{cases}$ on the interval $[a, b]$ that $L(f, P) = 0$ and $U(f, P) = b - a$ for all $P$. Thus
\[
\underline{\int_a^b} f(x)\,dx = 0 < b - a = \overline{\int_a^b} f(x)\,dx
\]
and since the lower and upper Darboux integrals are not equal, $f(x)$ is not integrable on $[a, b]$.
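Example 3.11. Let $f(x) = x$ on the interval $[0, 1]$, and let $P_n = (0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1)$. We computed that
\[
L(f, P_n) = \frac{n-1}{2n} \text{ for all } n \implies \underline{\int_0^1} x\,dx \ge \sup_{n \in \mathbb{N}} \frac{n-1}{2n} = \frac{1}{2},
\]
and similarly
\[
U(f, P_n) = \frac{n+1}{2n} \text{ for all } n \implies \overline{\int_0^1} x\,dx \le \inf_{n \in \mathbb{N}} \frac{n+1}{2n} = \frac{1}{2}.
\]
Since the lower Darboux integral is less than or equal to the upper one, we have
\[
\frac{1}{2} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2},
\]
and hence both of them must equal $\frac{1}{2}$. So $f(x) = x$ is integrable on $[0, 1]$, with
\[
\int_0^1 x\,dx = \frac{1}{2}.
\]
Note that this coincides with the area of the triangle $0 \le y \le x$, $0 \le x \le 1$.
[Figure: the triangle under the graph of y = x over [0, 1].]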
This last example illustrates an important principle: if we want to show that f(x)
is integrable on [a, b] then we don’t really need to consider all partitions of [a, b],
just some well-chosen sequence of partitions for which the lower and upper Darboux
sums converge to the same value. To make this precise:
Proposition 3.12. A bounded function f : [a, b]→ R is integrable if and only if for
every > 0, there is a partition P of [a, b] such that
U(f, P )− L(f, P ) < .
Proof. (=⇒) Since f is integrable we have
\[ \sup_P L(f, P) = \underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx = \overline{\int_a^b} f(x)\,dx = \inf_P U(f, P), \]
so given $\varepsilon > 0$ we can find partitions Q and R of [a, b] such that
\[ L(f, Q) > \int_a^b f(x)\,dx - \frac{\varepsilon}{2}, \qquad U(f, R) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}. \]
Let P be the common refinement of Q and R; then Q ≺ P and R ≺ P , so we have
\[ L(f, Q) \le L(f, P) \le U(f, P) \le U(f, R) \]
and therefore
\begin{align*}
U(f, P) - L(f, P) &\le U(f, R) - L(f, Q) \\
&< \left(\int_a^b f(x)\,dx + \frac{\varepsilon}{2}\right) - \left(\int_a^b f(x)\,dx - \frac{\varepsilon}{2}\right) = \varepsilon.
\end{align*}
(⇐=) Take $\varepsilon > 0$ and a partition P with $U(f, P) - L(f, P) < \varepsilon$. Then we have
\[ 0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx = \left(\inf_Q U(f, Q)\right) - \left(\sup_Q L(f, Q)\right) \le U(f, P) - L(f, P) < \varepsilon. \]
Since the difference between the upper and lower Darboux integrals lies in $[0, \varepsilon)$ for
all $\varepsilon > 0$, it must be 0, and so f is integrable.
By thinking a little harder about this argument, we can also extract the value of
$\int_a^b f(x)\,dx$ from the Darboux sums of any sequence of partitions that were used to
prove the integrability of f(x).
Proposition 3.13. Given a sequence $(P_n)$ of partitions of the interval [a, b] such
that $\lim_{n\to\infty} \big( U(f, P_n) - L(f, P_n) \big) = 0$, we have
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} L(f, P_n) = \lim_{n\to\infty} U(f, P_n). \]
Proof. The previous proposition shows that f(x) is integrable on [a, b]. We have a
sequence of inequalities
\[ L(f, P_n) \le \underline{\int_a^b} f(x)\,dx = \int_a^b f(x)\,dx = \overline{\int_a^b} f(x)\,dx \le U(f, P_n) \]
for any n, so from $L(f, P_n) \le \int_a^b f(x)\,dx \le U(f, P_n)$ it follows immediately that
\[ 0 \le \int_a^b f(x)\,dx - L(f, P_n) \le U(f, P_n) - L(f, P_n). \]
The right side goes to zero as n → ∞, hence so does $\int_a^b f(x)\,dx - L(f, P_n)$. The
same argument shows that $U(f, P_n) - \int_a^b f(x)\,dx \to 0$.
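Example 3.14. Let $f(x) = x^2$ on [0, 1], and consider the partitions
$P_n = \left(0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1\right)$. Then
\begin{align*}
L(f, P_n) &= \sum_{i=0}^{n-1} \left( \inf_{\frac{i}{n} \le t \le \frac{i+1}{n}} t^2 \right) \Delta x_i = \sum_{i=0}^{n-1} \left(\frac{i}{n}\right)^2 \frac{1}{n} = \frac{1}{n^3} \sum_{i=0}^{n-1} i^2 \\
U(f, P_n) &= \sum_{i=0}^{n-1} \left( \sup_{\frac{i}{n} \le t \le \frac{i+1}{n}} t^2 \right) \Delta x_i = \sum_{i=0}^{n-1} \left(\frac{i+1}{n}\right)^2 \frac{1}{n} = \frac{1}{n^3} \sum_{j=1}^{n} j^2
\end{align*}
and so $U(f, P_n) - L(f, P_n) = \frac{1}{n^3}(n^2 - 0^2) = \frac{1}{n} \to 0$. Thus
\[ \int_0^1 x^2\,dx = \lim_{n\to\infty} U(f, P_n) = \lim_{n\to\infty} \frac{1}{n^3} \cdot \frac{n(n+1)(2n+1)}{6} = \frac{1}{3}. \]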
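The computation in Example 3.14 can be checked with exact rational arithmetic. The following is only a sketch: the helper name darboux_sums_x_squared is our own, and it relies on $x^2$ being increasing on [0, 1], so that the infimum and supremum on each subinterval sit at the left and right endpoints.

```python
from fractions import Fraction

def darboux_sums_x_squared(n):
    """Return (L, U) for f(x) = x^2 on the uniform partition of [0, 1] into n pieces."""
    width = Fraction(1, n)
    # x^2 is increasing on [0, 1]: inf at the left endpoint, sup at the right endpoint.
    lower = sum(Fraction(i, n) ** 2 * width for i in range(n))
    upper = sum(Fraction(i + 1, n) ** 2 * width for i in range(n))
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = darboux_sums_x_squared(n)
    # The gap upper - lower should be exactly 1/n, and 1/3 lies between the two sums.
    print(n, float(lower), float(upper), upper - lower)
```

Both sums squeeze towards 1/3 as n grows, as Proposition 3.13 predicts.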
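Example 3.15. Let $f(x) = \frac{1}{x}$ on [1, b] for some integer b > 1, and let
\[ P_n = \left( 1, 1 + \frac{1}{n}, 1 + \frac{2}{n}, \dots, b - \frac{1}{n}, b \right). \]
Then we compute that
\begin{align*}
L(f, P_n) &= \sum_{i=0}^{(b-1)n-1} \left( \inf_{1+\frac{i}{n} \le t \le 1+\frac{i+1}{n}} \frac{1}{t} \right) \Delta x_i \\
&= \sum_{i=0}^{(b-1)n-1} \frac{1}{1 + \frac{i+1}{n}} \cdot \frac{1}{n} = \sum_{i=0}^{(b-1)n-1} \frac{1}{n + i + 1} \\
&= \frac{1}{n+1} + \frac{1}{n+2} + \dots + \frac{1}{bn} = H_{bn} - H_n,
\end{align*}
where $H_k = \frac{1}{1} + \frac{1}{2} + \dots + \frac{1}{k}$ is the kth harmonic sum. The same computation
shows that
\[ U(f, P_n) = \frac{1}{n} + \dots + \frac{1}{bn-1} \quad\Rightarrow\quad U(f, P_n) - L(f, P_n) = \frac{1}{n} - \frac{1}{bn} \to 0, \]
so f(x) is integrable, and
\[ \int_1^b \frac{1}{x}\,dx = \lim_{n\to\infty} L(f, P_n) = \lim_{n\to\infty} (H_{bn} - H_n). \]
On a problem sheet we showed that $\gamma = \lim_{k\to\infty} \big( H_k - \log(k) \big)$ exists, so we have
\begin{align*}
\lim_{n\to\infty} (H_{bn} - H_n) &= \lim_{n\to\infty} \Big( \big(H_{bn} - \log(bn)\big) - \big(H_n - \log(n)\big) + \log(b) \Big) \\
&= \lim_{n\to\infty} \big(H_{bn} - \log(bn)\big) - \lim_{n\to\infty} \big(H_n - \log(n)\big) + \log(b) \\
&= \gamma - \gamma + \log(b) = \log(b)
\end{align*}
by the algebra of limits, and thus $\int_1^b \frac{1}{x}\,dx = \log(b)$.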
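As a numerical sanity check of Example 3.15, we can evaluate the lower sum $H_{bn} - H_n$ and the corresponding upper sum in floating point and compare them with log(b). This is a sketch only; the function names and the sample value b = 3 are our own choices.

```python
import math

def harmonic(k):
    """k-th harmonic sum H_k = 1/1 + 1/2 + ... + 1/k."""
    return math.fsum(1.0 / j for j in range(1, k + 1))

def lower_sum_reciprocal(b, n):
    """L(f, P_n) for f(x) = 1/x on [1, b]: 1/(n+1) + ... + 1/(bn) = H_{bn} - H_n."""
    return math.fsum(1.0 / j for j in range(n + 1, b * n + 1))

b = 3  # hypothetical sample endpoint
for n in (10, 100, 1000, 10000):
    lower = lower_sum_reciprocal(b, n)
    upper = lower + 1.0 / n - 1.0 / (b * n)  # U - L = 1/n - 1/(bn)
    print(n, lower, upper, math.log(b))
```

The value log(b) should sit between the two sums for every n, with the gap shrinking like 1/n.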
The same criterion for integrability lets us prove the following important theorem,
showing that the vast majority of familiar functions are integrable.
Theorem 3.16
Let f : [a, b]→ R be a continuous function. Then f is integrable.
Proof. We know that f is bounded by the extreme value theorem, and it is uniformly
continuous since [a, b] is compact. The latter says that given any $\varepsilon > 0$, there is a
δ > 0 such that for all x, y ∈ [a, b],
\[ |x - y| < \delta \quad\Rightarrow\quad |f(x) - f(y)| < \frac{\varepsilon}{b-a}. \]
We can choose a partition $P = (x_0, x_1, \dots, x_n)$ of [a, b] such that $\Delta x_i = x_{i+1} - x_i < \delta$
for all i, say by taking $x_0 = a$ and $x_n = b$ where $n = \left\lfloor \frac{2(b-a)}{\delta} \right\rfloor$, and then letting
$x_i = a + \frac{i\delta}{2}$ for 1 ≤ i ≤ n − 1. Then for all i we have
\[ y, z \in [x_i, x_{i+1}] \quad\Rightarrow\quad |y - z| \le \Delta x_i < \delta \quad\Rightarrow\quad |f(y) - f(z)| < \frac{\varepsilon}{b-a}. \]
By the extreme value theorem, we can find $y, z \in [x_i, x_{i+1}]$ such that
\[ M_i - m_i = \left( \sup_{x_i \le t \le x_{i+1}} f(t) \right) - \left( \inf_{x_i \le t \le x_{i+1}} f(t) \right) = f(y) - f(z), \]
and then |y − z| < δ implies that
\[ 0 \le M_i - m_i < \frac{\varepsilon}{b-a}. \]
We now estimate the difference between the upper and lower Darboux sums as
\begin{align*}
U(f, P) - L(f, P) &= \sum_{i=0}^{n-1} (M_i - m_i)\,\Delta x_i \\
&< \sum_{i=0}^{n-1} \frac{\varepsilon}{b-a} (x_{i+1} - x_i) \\
&= \frac{\varepsilon}{b-a} \sum_{i=0}^{n-1} (x_{i+1} - x_i) \\
&= \frac{\varepsilon}{b-a} (x_n - x_0) = \frac{\varepsilon}{b-a} (b - a) = \varepsilon.
\end{align*}
Since such a P exists for any $\varepsilon > 0$, we conclude that f(x) is integrable on [a, b].
The converse to this theorem is false, though: an integrable function need not be
continuous.
Example 3.17. Pick some c ∈ [a, b] and define f : [a, b]→ R by
\[ f(x) = \begin{cases} 0, & x \ne c \\ 1, & x = c. \end{cases} \]
Then for any partition P of [a, b], we have L(f, P ) = 0, because every interval
$[x_i, x_{i+1}]$ contains a point t ≠ c, where f(t) = 0. Thus $\underline{\int_a^b} f(x)\,dx = 0$. We also
have
\[ U(f, P) = \sum_{c \in [x_i, x_{i+1}]} 1 \cdot \Delta x_i, \]
and either one or two such closed intervals contain c: it’s one if c is in the
interior of an interval, and two if it’s a common endpoint of two of them.
[Figure: two graphs of f on [a, b], each taking the value 1 only at c.]
We choose a partition $P_n$ such that $\Delta x_i < \frac{1}{2n}$ for all i, and then we have
$U(f, P_n) < 2\left(\frac{1}{2n}\right) = \frac{1}{n}$. It follows that $\overline{\int_a^b} f(x)\,dx \le 0$, so in fact f(x) is
integrable and $\int_a^b f(x)\,dx = 0$.
Question 13. Under which of the following circumstances must a bounded
function f : [0, 1]→ R be integrable?
1. f is differentiable on (0, 1).
2. f is monotone increasing.
3. f is discontinuous at finitely many points.
4. All of these. ✓
5. None of these.
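If f is differentiable on (0, 1) then it is continuous there, so its only possible
discontinuities are at the endpoints 0 and 1; since f is bounded, it is integrable by the
argument for the third option below. For monotone increasing functions, we take a partition
\[ P_n = \left( 0, \frac{1}{n}, \frac{2}{n}, \dots, \frac{n-1}{n}, 1 \right) \]
and then we compute that
\[ U(f, P_n) - L(f, P_n) = \sum_{i=0}^{n-1} \left( f\left(\frac{i+1}{n}\right) - f\left(\frac{i}{n}\right) \right) \frac{1}{n} = \frac{f(1) - f(0)}{n} \]
which goes to 0 as n→∞, so f is integrable. If f has finitely many discontinuities,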
say k of them, then we can try to repeat the proof that continuous functions are
integrable, and the additional contribution to U(f, P )−L(f, P ) from the subintervals
where f is discontinuous will be at most k ·(sup f− inf f)δ, which vanishes as δ → 0.
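The telescoping identity for monotone functions can be tested on a concrete case. The sketch below uses a hypothetical increasing function with a jump at x = 1/2 (our own choice, not taken from the notes) and exact fractions, so the gap between the Darboux sums should come out as exactly (f(1) − f(0))/n even though f is discontinuous.

```python
from fractions import Fraction

def f(x):
    # Hypothetical monotone increasing function with a jump at x = 1/2.
    return x if x < Fraction(1, 2) else x + 1

def darboux_gap(f, n):
    """U(f, P_n) - L(f, P_n) on the uniform partition of [0, 1], for increasing f."""
    width = Fraction(1, n)
    # For increasing f the inf is f(left endpoint) and the sup is f(right endpoint).
    return sum((f(Fraction(i + 1, n)) - f(Fraction(i, n))) * width for i in range(n))

for n in (1, 7, 100):
    print(n, darboux_gap(f, n), (f(Fraction(1)) - f(Fraction(0))) / n)
```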
3.3 Basic properties
In this section we’ll establish some basic properties of the Darboux integral.
Proposition 3.18. If f, g : [a, b] → R are integrable and f(x) ≤ g(x) for all
x ∈ [a, b], then
\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]
Proof. The inequality f(x) ≤ g(x) implies that L(f, P ) ≤ L(g, P ) for all partitions
P of [a, b], so then
\[ \underline{\int_a^b} f(x)\,dx = \sup_P L(f, P) \le \sup_P L(g, P) = \underline{\int_a^b} g(x)\,dx. \]
The lower Darboux integrals on either side are equal to $\int_a^b f(x)\,dx$ and $\int_a^b g(x)\,dx$
respectively, so the proof is complete.
The next theorem asserts that integration is a linear operator.
Theorem 3.19
If f and g are integrable on [a, b], then
\[ \int_a^b \big( c f(x) + d g(x) \big)\,dx = c \int_a^b f(x)\,dx + d \int_a^b g(x)\,dx \]
for any constants c, d ∈ R.
The proof follows immediately from combining the next two propositions. Their
proofs are a bit tedious, but they each follow the same general outline: take some-
thing we already know to be integrable, find a sequence of partitions so that the
lower and upper Darboux sums converge to the integral, and then manipulate these
Darboux sums to show that something else of interest is also integrable.
Proposition 3.20. Let f : [a, b] → R be integrable. Then cf(x) is integrable on
[a, b] for any c ∈ R, and
\[ \int_a^b c f(x)\,dx = c \int_a^b f(x)\,dx. \]
Proof. We pick partitions $P_n$ of [a, b] with $U(f, P_n) - L(f, P_n) < \frac{1}{n}$ for all n. If c ≥ 0
then we have
\[ L(cf, P_n) = c L(f, P_n), \qquad U(cf, P_n) = c U(f, P_n) \]
and so $U(cf, P_n) - L(cf, P_n) \le \frac{c}{n}$ for all n. Since this difference goes to zero as
n→∞, we see that cf(x) is integrable and
\[ \int_a^b c f(x)\,dx = \lim_{n\to\infty} L(cf, P_n) = \lim_{n\to\infty} c L(f, P_n) = c \int_a^b f(x)\,dx. \]
If c < 0 then nearly the same argument applies, except we notice that
\[ \inf_{x_i \le t \le x_{i+1}} c f(t) = c \left( \sup_{x_i \le t \le x_{i+1}} f(t) \right) \]
for all i, and this implies that $L(cf, P_n) = c U(f, P_n)$; similarly $U(cf, P_n) = c L(f, P_n)$.
But we still have
\[ U(cf, P_n) - L(cf, P_n) = -c \big( U(f, P_n) - L(f, P_n) \big) \to 0, \]
so cf(x) is still integrable, and
\[ \int_a^b c f(x)\,dx = \lim_{n\to\infty} L(cf, P_n) = \lim_{n\to\infty} c U(f, P_n) = c \int_a^b f(x)\,dx. \]
Proposition 3.21. Let f, g : [a, b] → R be integrable. Then f + g is integrable on
[a, b], and
\[ \int_a^b \big( f(x) + g(x) \big)\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]
Proof. We check the inequalities
\begin{align*}
\inf_{x\in S} \big( f(x) + g(x) \big) &\ge \left( \inf_{x\in S} f(x) \right) + \left( \inf_{x\in S} g(x) \right) \\
\sup_{x\in S} \big( f(x) + g(x) \big) &\le \left( \sup_{x\in S} f(x) \right) + \left( \sup_{x\in S} g(x) \right),
\end{align*}
which immediately imply for any partition P of [a, b] that
\[ L(f, P) + L(g, P) \le L(f + g, P), \qquad U(f + g, P) \le U(f, P) + U(g, P). \]
For any n > 0 there are partitions $P_n$ and $Q_n$ of [a, b] such that
\[ U(f, P_n) - L(f, P_n) < \frac{1}{2n}, \qquad U(g, Q_n) - L(g, Q_n) < \frac{1}{2n}, \]
and if $R_n$ is a common refinement of both $P_n$ and $Q_n$ then it follows that
\begin{align*}
U(f, R_n) - L(f, R_n) &\le U(f, P_n) - L(f, P_n) < \frac{1}{2n}, \\
U(g, R_n) - L(g, R_n) &\le U(g, Q_n) - L(g, Q_n) < \frac{1}{2n}.
\end{align*}
So then
\begin{align*}
U(f + g, R_n) - L(f + g, R_n) &\le \big( U(f, R_n) + U(g, R_n) \big) - \big( L(f, R_n) + L(g, R_n) \big) \\
&= \big( U(f, R_n) - L(f, R_n) \big) + \big( U(g, R_n) - L(g, R_n) \big) \\
&< \frac{1}{2n} + \frac{1}{2n} = \frac{1}{n}.
\end{align*}
Since we can do this for any n, it proves that f + g is integrable. We have
\begin{align*}
\int_a^b \big( f(x) + g(x) \big)\,dx &= \lim_{n\to\infty} L(f + g, R_n) \\
&\ge \lim_{n\to\infty} L(f, R_n) + \lim_{n\to\infty} L(g, R_n) \\
&= \int_a^b f(x)\,dx + \int_a^b g(x)\,dx,
\end{align*}
and the same argument with upper Darboux sums instead of lower sums shows that
\[ \int_a^b \big( f(x) + g(x) \big)\,dx \le \int_a^b f(x)\,dx + \int_a^b g(x)\,dx, \]
so the two sides are equal.
Theorem 3.22
Let f : [a, b]→ R be integrable, and choose c ∈ (a, b). Then f is integrable on
each of [a, c] and [c, b], and
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \]
Proof. Since f(x) is integrable on [a, b], given any n > 0, we can find a partition $P_n$
of [a, b] such that
\[ U(f, P_n) - L(f, P_n) < \frac{1}{n}. \]
We refine $P_n$ to a partition $Q_n$ which also contains c, and then
\[ U(f, Q_n) - L(f, Q_n) \le U(f, P_n) - L(f, P_n) < \frac{1}{n}. \]
Now $Q_n = (a, x_1, x_2, \dots, x_{k-1}, c, x_{k+1}, \dots, x_{m-1}, b)$ gives us partitions
\begin{align*}
Q_{1,n} &= (a, x_1, \dots, x_{k-1}, c), \\
Q_{2,n} &= (c, x_{k+1}, \dots, x_{m-1}, b)
\end{align*}
of [a, c] and [c, b] respectively, and by definition we have
\begin{align*}
L(f, Q_n) &= L(f, Q_{1,n}) + L(f, Q_{2,n}), \\
U(f, Q_n) &= U(f, Q_{1,n}) + U(f, Q_{2,n}).
\end{align*}
Each $U(f, Q_{i,n}) - L(f, Q_{i,n})$ is nonnegative, and their sum over i = 1, 2 is equal to
$U(f, Q_n) - L(f, Q_n) < \frac{1}{n}$, so we must have
\[ 0 \le U(f, Q_{i,n}) - L(f, Q_{i,n}) < \frac{1}{n}, \qquad i = 1, 2. \]
Since we can do this for any n > 0, it follows that f(x) is integrable on each interval.
We also have
\[ \lim_{n\to\infty} L(f, Q_{1,n}) = \int_a^c f(x)\,dx, \qquad \lim_{n\to\infty} L(f, Q_{2,n}) = \int_c^b f(x)\,dx \]
and so the algebra of limits says that
\begin{align*}
\int_a^b f(x)\,dx &= \lim_{n\to\infty} L(f, Q_n) \\
&= \lim_{n\to\infty} \big( L(f, Q_{1,n}) + L(f, Q_{2,n}) \big) \\
&= \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.
\end{align*}
We can also compose continuous functions with integrable ones and the result will
be integrable, even if we can’t say much about the actual value of the integral.
Theorem 3.23
Let f : [a, b] → R be integrable, with m ≤ f(x) ≤ M for all x ∈ [a, b], and let
g : [m,M ]→ R be a continuous function. Then
h(x) = g(f(x))
is also integrable on [a, b].
Question 14. Let f : [a, b]→ [m,M ] be integrable and let g : [m,M ]→ R be
continuous. Which must be true of h(x) = g(f(x))?
1. h(x) is bounded. ✓
2. h(x) is continuous.
3. h(x) is monotone if g(x) is.
4. More than one of these.
5. None of these.
We know that g is bounded because it is continuous on a compact interval, hence so
is h. But if we take g(x) = x then h won’t be continuous or monotone unless f is.
Proof. Fix $\varepsilon > 0$. Since g(x) is continuous on a compact interval, we know the
following:
• The extreme value theorem provides $x_{\min}, x_{\max} \in [m, M]$ which satisfy
$g(x_{\min}) \le g(x) \le g(x_{\max})$ for all x ∈ [m,M ], and we set
\[ C = g(x_{\max}) - g(x_{\min}) + 1 > 0; \]
• g(x) is uniformly continuous, so there is a δ > 0 such that
\[ |x - y| < \delta \quad\Rightarrow\quad |g(x) - g(y)| < \frac{\varepsilon}{2(b-a)}. \]
We are allowed to replace δ with a smaller value, so we insist that $\delta < \frac{\varepsilon}{2C}$.
Since f(x) is integrable, we pick a partition $P = (x_0, x_1, \dots, x_n)$ of [a, b] such that
\[ U(f, P) - L(f, P) < \delta^2. \]
Letting $m_i = \inf_{x_i \le t \le x_{i+1}} f(t)$ and $M_i = \sup_{x_i \le t \le x_{i+1}} f(t)$, this means that
\[ \sum_{i=0}^{n-1} (M_i - m_i)\,\Delta x_i < \delta^2. \]
We call an index i good if Mi −mi < δ and bad otherwise. Our goal is to bound
U(h, P ) − L(h, P ) by separating the contributions from good and bad indices into
two different sums. Each good interval will contribute a small amount by itself; we
can’t say this about the bad intervals, but we’ll show instead that their total length
is very small.
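First, if i is good then for all $x, y \in [x_i, x_{i+1}]$ we have
\[ |f(y) - f(x)| < \delta \quad\Rightarrow\quad |g(f(y)) - g(f(x))| < \frac{\varepsilon}{2(b-a)}, \]
and so
\[ \left( \sup_{x_i \le t \le x_{i+1}} h(t) \right) - \left( \inf_{x_i \le t \le x_{i+1}} h(t) \right) \le \frac{\varepsilon}{2(b-a)}. \]
On the other hand, if i is bad then we only know that
\[ \big( \sup h(t) \big) - \big( \inf h(t) \big) < C, \]
but summing over all bad i gives
\begin{align*}
\sum_{i \text{ bad}} \delta \cdot \Delta x_i &\le \sum_{i \text{ bad}} (M_i - m_i)\,\Delta x_i \\
&\le \sum_{i=0}^{n-1} (M_i - m_i)\,\Delta x_i = U(f, P) - L(f, P) < \delta^2
\end{align*}
and so
\[ \sum_{i \text{ bad}} \Delta x_i < \delta < \frac{\varepsilon}{2C}. \]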
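Combining these two bounds, we get
\begin{align*}
U(h, P) - L(h, P) &= \sum_{i \text{ good}} \left( \Big( \sup_{x_i \le t \le x_{i+1}} h(t) \Big) - \Big( \inf_{x_i \le t \le x_{i+1}} h(t) \Big) \right) \Delta x_i \\
&\quad + \sum_{i \text{ bad}} \left( \Big( \sup_{x_i \le t \le x_{i+1}} h(t) \Big) - \Big( \inf_{x_i \le t \le x_{i+1}} h(t) \Big) \right) \Delta x_i \\
&\le \sum_{i \text{ good}} \left( \frac{\varepsilon}{2(b-a)} \right) \Delta x_i + \sum_{i \text{ bad}} C \cdot \Delta x_i \\
&< \left( \frac{\varepsilon}{2(b-a)} \right) (b - a) + C \left( \frac{\varepsilon}{2C} \right) = \varepsilon.
\end{align*}
We have thus found for any $\varepsilon > 0$ a partition P with $U(h, P) - L(h, P) < \varepsilon$, and it
follows that h(x) is integrable on [a, b].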
This is a very general result, but the following special cases are interesting.
Proposition 3.24. If f : [a, b]→ R is integrable, then |f | is also integrable on [a, b],
and we have a triangle inequality for integrals:
\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]
Proof. The previous theorem tells us that |f | is integrable, since we get it by
composing f with the continuous function |x|. The inequality
\[ -|f(x)| \le f(x) \le |f(x)| \]
for all x ∈ [a, b] then immediately implies that
\[ -\int_a^b |f(x)|\,dx = \int_a^b \big( -|f(x)| \big)\,dx \le \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx, \]
hence $\left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx$.
Proposition 3.25. If f, g : [a, b]→ R are integrable, then so is fg : [a, b]→ R.
Proof. We note that if a function h(x) is integrable on a given domain then so is
$h(x)^2$, since it is the composition of h(x) with the continuous function $x^2$. Now
both $\frac{f+g}{2}$ and $\frac{f-g}{2}$ are integrable on [a, b], as linear combinations of the integrable
functions f(x) and g(x), and so we use the identity
\[ fg = \left( \frac{f+g}{2} \right)^2 - \left( \frac{f-g}{2} \right)^2 \]
to see that f(x)g(x) is integrable as well.
3.4 The fundamental theorem of calculus
We’ve developed a long list of basic properties of integrals, but we haven’t managed
to compute that many so far. The fundamental theorem of calculus is our best tool
for doing so, and it also illustrates the strong relationship between derivatives and
integrals.
Theorem 3.26: Fundamental theorem of calculus, first version
Let f : [a, b]→ R be a continuous function, and define F : [a, b]→ R by
\[ F(x) = \int_a^x f(t)\,dt. \]
Then F is continuous on [a, b] and differentiable on (a, b), and F ′(x) = f(x) for
all x ∈ (a, b).
Proof. For any x ∈ (a, b) and h > 0, we use basic properties of the Darboux integral
to see that
\[ \frac{F(x+h) - F(x)}{h} = \frac{1}{h} \left( \int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt \right) = \frac{1}{h} \int_x^{x+h} f(t)\,dt. \]
Similarly, if h < 0 then $\frac{F(x+h) - F(x)}{h} = -\frac{1}{h} \int_{x+h}^{x} f(t)\,dt$.
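Fixing $\varepsilon > 0$, we know since f is continuous at x that there is δ > 0 such that
\[ |t - x| < \delta \quad\Rightarrow\quad |f(t) - f(x)| < \varepsilon, \]
or equivalently
\[ |t - x| < \delta \quad\Rightarrow\quad f(x) - \varepsilon < f(t) < f(x) + \varepsilon. \]
If 0 < h < δ then this holds for all t ∈ [x, x+ h], and so
\[ \int_x^{x+h} \big( f(x) - \varepsilon \big)\,dt < \int_x^{x+h} f(t)\,dt < \int_x^{x+h} \big( f(x) + \varepsilon \big)\,dt, \]
and since $f(x) \pm \varepsilon$ is constant as a function of t this simplifies to
\[ h\big( f(x) - \varepsilon \big) < \int_x^{x+h} f(t)\,dt < h\big( f(x) + \varepsilon \big). \]
The middle term is F (x + h) − F (x), so upon dividing by h and subtracting f(x)
from each side, we have shown that
\[ 0 < h < \delta \quad\Rightarrow\quad \left| \frac{F(x+h) - F(x)}{h} - f(x) \right| < \varepsilon. \]
The same argument applies when −δ < h < 0, and this works for any $\varepsilon > 0$, so
\[ F'(x) = \lim_{h\to 0} \frac{F(x+h) - F(x)}{h} = f(x). \]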
Since F is differentiable on (a, b) it is continuous there, though we still need to prove
continuity at a and b. Using the continuity of f at a, the same argument as before
shows us that for any $\varepsilon > 0$ there is a δ > 0 such that
\begin{align*}
0 < h < \delta \quad &\Rightarrow\quad \left| \frac{F(a+h) - F(a)}{h} - f(a) \right| < \varepsilon \\
&\Rightarrow\quad |F(a+h) - F(a) - h f(a)| < \varepsilon h.
\end{align*}
If we take $0 < h < \delta' = \min\left( \delta, \frac{\varepsilon}{|f(a)| + \varepsilon} \right)$ and apply the triangle inequality then
\[ |F(a+h) - F(a)| < h\big( |f(a)| + \varepsilon \big) < \varepsilon, \]
and we can find such a δ′ > 0 for every $\varepsilon > 0$, so F is continuous at a; continuity at
b follows from the same argument.
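The statement F′(x) = f(x) can be illustrated numerically. In the sketch below we assume the sample choice f = cos with a = 0 (ours, not from the notes), approximate F by a midpoint rule, and compare difference quotients of F at x = 1 with f(1); the helper names are hypothetical.

```python
import math

def integral(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

def F(x):
    # F(x) = integral of cos from 0 to x (sample choice f = cos, a = 0).
    return integral(math.cos, 0.0, x)

x = 1.0
for h in (0.1, 0.01, 0.001):
    quotient = (F(x + h) - F(x)) / h
    # The quotient should approach f(x) = cos(1) as h shrinks.
    print(h, quotient, math.cos(x))
```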
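The assumption that f is continuous is crucial here. For example, consider the
function f : [−1, 1]→ R given by
\[ f(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0. \end{cases} \]
[Figure: graph of f(x) on [−1, 1].]
If we let $F(x) = \int_{-1}^{x} f(t)\,dt$, then we have
\begin{align*}
x < 0 \quad &\Rightarrow\quad F(x) = \int_{-1}^{x} 0\,dt = 0 \\
x \ge 0 \quad &\Rightarrow\quad F(x) = \int_{-1}^{0} 0\,dt + \int_0^x 1\,dt = x.
\end{align*}
[Figure: graph of F (x) on [−1, 1].]
Here f(x) is not continuous at 0, and as a result F is not differentiable at 0 since
\[ \lim_{x \uparrow 0} \frac{F(x) - F(0)}{x - 0} = 0 \quad\text{but}\quad \lim_{x \downarrow 0} \frac{F(x) - F(0)}{x - 0} = \lim_{x \downarrow 0} \frac{x - 0}{x - 0} = 1. \]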
Theorem 3.27: Fundamental theorem of calculus, second version
Let f : [a, b] → R be a continuous function which has a continuous derivative
on (a, b). Then
\[ \int_a^b f'(x)\,dx = f(b) - f(a). \]
Proof. Let $F(x) = \int_a^x f'(t)\,dt$. Then the first version of the fundamental theorem
of calculus says that
\[ F'(x) = f'(x) \text{ for all } x \in (a, b), \]
and both F and f are continuous on [a, b], so there is some constant c such that
\[ f(x) = F(x) + c \]
on [a, b]. Setting x = a shows us that c = f(a), since F (a) = 0, and so
\[ \int_a^b f'(x)\,dx = F(b) = f(b) - f(a). \]
Question 15. Let f, g : [a, b] → R be continuous. Which of the following is a
consequence of the fundamental theorem of calculus?
1. There is an h : [a, b]→ R such that f = h′′ on (a, b). ✓
2. If f ′(x) exists on (a, b), then f ′ is integrable.
3. If f and g are continuously differentiable on (a, b) and f ′(x) = g′(x) for
all x, then f(x) = g(x) for all x.
4. More than one of these.
For the first option, we can take $F(x) = \int_a^x f(t)\,dt$ and $h(x) = \int_a^x F(t)\,dt$, and then
h′′ = F ′ = f on (a, b). The second is false because we need f ′(x) to be continuous:
if we take
\[ f(x) = \begin{cases} x^2 \sin\left(\frac{1}{x^2}\right), & x \ne 0 \\ 0, & x = 0 \end{cases} \quad\Rightarrow\quad f'(x) = \begin{cases} 2x \sin\left(\frac{1}{x^2}\right) - \frac{2}{x} \cos\left(\frac{1}{x^2}\right), & x \ne 0 \\ 0, & x = 0 \end{cases} \]
then f ′ exists on (−1, 1) but is discontinuous at x = 0, and it is not integrable on
[−1, 1] because it is unbounded. The third option is false because we could have
g = f + 1.
Example 3.28. Suppose we wish to compute $\int_a^b x^n\,dx$ for some integer n ≠ −1,
and that if n < 0 then 0 ∉ [a, b]. Since $\frac{x^{n+1}}{n+1}$ has derivative $x^n$, the fundamental
theorem of calculus tells us that
\[ \int_a^b x^n\,dx = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1}. \]
For the case n = −1, assuming that 0 < a < b, we note that log(x) has derivative
$\frac{1}{x}$, and so
\[ \int_a^b \frac{1}{x}\,dx = \log(b) - \log(a). \]
The first version of the fundamental theorem of calculus gives us the following new
version of the mean value theorem.
Theorem 3.29: Mean value theorem for integrals
Let f : [a, b]→ R be a continuous function. Then there is some c ∈ (a, b) such
that
\[ \int_a^b f(x)\,dx = f(c)(b - a). \]
In other words, the two overlapping shaded regions below have the same area:
[Figure: the region under the graph of f over [a, b], overlapping the rectangle of height f(c) over [a, b].]
Proof. Let $F(x) = \int_a^x f(t)\,dt$. Then F (x) is continuous on [a, b] and differentiable
on (a, b), so the original mean value theorem gives us some c ∈ (a, b) such that
\[ F'(c) = \frac{F(b) - F(a)}{b - a}. \]
But we know that F ′(c) = f(c), so we conclude that
\[ \int_a^b f(x)\,dx = F(b) - F(a) = F'(c)(b - a) = f(c)(b - a). \]
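To make the theorem concrete, we can locate such a point c for $f(x) = x^2$ on [0, 1], whose integral is 1/3 by Example 3.14. The sketch below uses bisection, which is our own choice of method and assumes f is continuous and increasing; the helper name mean_value_point is hypothetical.

```python
import math

def mean_value_point(f, a, b, average, tol=1e-12):
    """Bisection for c in [a, b] with f(c) = average, assuming f is continuous and increasing."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < average:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

f = lambda x: x * x
a, b = 0.0, 1.0
integral_value = 1.0 / 3.0  # value of the integral of x^2 over [0, 1]
c = mean_value_point(f, a, b, integral_value / (b - a))
# For this f we expect c = 1/sqrt(3), since c^2 = 1/3.
print(c, 1 / math.sqrt(3))
```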
3.5 More properties of integrals
The fundamental theorem of calculus allows us to develop some new tools for eval-
uating integrals.
Theorem 3.30: Integration by parts
If f, g : [a, b]→ R are continuous with continuous first derivatives, then
\[ \int_a^b f(x) g'(x)\,dx = f(b)g(b) - f(a)g(a) - \int_a^b f'(x) g(x)\,dx. \]
Proof. We start with the product rule for derivatives:
\[ \frac{d}{dx} \big( f(x) g(x) \big) = f'(x) g(x) + f(x) g'(x). \]
Since f ′ and g′ are continuous, so are f ′g, fg′, and (fg)′ = f ′g+ fg′, so they’re all
integrable and the second fundamental theorem of calculus says that
\[ \int_a^b \big( f(x) g(x) \big)'\,dx = f(b)g(b) - f(a)g(a), \]
and hence
\[ f(b)g(b) - f(a)g(a) = \int_a^b f'(x) g(x)\,dx + \int_a^b f(x) g'(x)\,dx. \]
We subtract $\int_a^b f'(x) g(x)\,dx$ from both sides to complete the proof.
In practice, given a function f : [a, b]→ R we may write $f(x)\big|_{x=a}^{x=b}$ as shorthand for
f(b)− f(a), as in
\[ \int_a^b f'(x)\,dx = f(x)\Big|_{x=a}^{x=b}. \]
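Example 3.31. We can integrate log(x) by parts, using f(x) = log(x) and
g(x) = x:
\begin{align*}
\int_a^b \log(x) \cdot 1\,dx &= x \log(x)\Big|_{x=a}^{x=b} - \int_a^b \frac{1}{x} \cdot x\,dx \\
&= \big( b \log(b) - a \log(a) \big) - (b - a) \\
&= \big( b \log(b) - b \big) - \big( a \log(a) - a \big).
\end{align*}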
This computation shows that x log(x)−x is an antiderivative of log(x), meaning
that its derivative is log(x).
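As a rough check of this antiderivative, we can compare a numerical integral of log(x) with the difference of x log(x) − x at the endpoints. This is only a sketch: the midpoint rule and the sample interval [1, 3] are our own assumptions.

```python
import math

def midpoint_integral(f, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

def antiderivative(x):
    # Candidate antiderivative of log(x) from the example.
    return x * math.log(x) - x

a, b = 1.0, 3.0  # hypothetical sample interval with 0 < a < b
numeric = midpoint_integral(math.log, a, b)
exact = antiderivative(b) - antiderivative(a)
print(numeric, exact)
```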
From here on we may sometimes accidentally write down an integral of the form
$\int_a^b f(x)\,dx$ where a > b. In this case we make sense of it by defining
\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx, \]
and then it is not hard to check that all of the properties we have developed for the
Darboux integral remain true even when a > b. For example, this can happen in
the following theorem statement when φ is monotone decreasing.
Theorem 3.32: Integration by substitution
Let f : [a, b]→ R be a continuous function, and suppose that φ : [c, d]→ [a, b]
has a continuous derivative on (c, d). Then
\[ \int_{\varphi(c)}^{\varphi(d)} f(x)\,dx = \int_c^d f(\varphi(t))\,\varphi'(t)\,dt. \]
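Proof. Let $F(x) = \int_a^x f(t)\,dt$, which is an antiderivative of f . Then
\[ \frac{d}{dx} \big( F(\varphi(x)) \big) = F'(\varphi(x))\,\varphi'(x) = f(\varphi(x))\,\varphi'(x), \]
which is continuous on (c, d), so by the fundamental theorem of calculus we have
\begin{align*}
\int_c^d f(\varphi(t))\,\varphi'(t)\,dt &= F(\varphi(d)) - F(\varphi(c)) \\
&= \int_a^{\varphi(d)} f(t)\,dt - \int_a^{\varphi(c)} f(t)\,dt \\
&= \int_{\varphi(c)}^{\varphi(d)} f(t)\,dt.
\end{align*}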
We can use substitution to find antiderivatives of several new functions.
Example 3.33. We evaluate $\int_e^x \frac{1}{t \log(t)}\,dt$ by substituting $t = e^s$. Then
\[ \int_e^x \frac{1}{t \log(t)}\,dt = \int_1^{\log(x)} \frac{1}{e^s \log(e^s)} \cdot e^s\,ds = \int_1^{\log(x)} \frac{1}{s}\,ds = \log(s)\Big|_{s=1}^{s=\log(x)}, \]
so log(log(x)) is an antiderivative of $\frac{1}{x \log(x)}$.
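Example 3.34. We evaluate $\int_0^x \frac{1}{\sqrt{1-t^2}}\,dt$ for x ∈ (−1, 1) by substituting
t = sin(θ). Then we have
\begin{align*}
\int_0^x \frac{1}{\sqrt{1-t^2}}\,dt &= \int_0^{\sin^{-1}(x)} \frac{1}{\sqrt{1 - \sin^2(\theta)}} \cdot \cos(\theta)\,d\theta \\
&= \int_0^{\sin^{-1}(x)} \frac{\cos(\theta)}{\cos(\theta)}\,d\theta \\
&= \int_0^{\sin^{-1}(x)} 1\,d\theta = \sin^{-1}(x).
\end{align*}
So $\sin^{-1} : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ is differentiable on (−1, 1), with derivative $\frac{1}{\sqrt{1-x^2}}$.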
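Example 3.35. We evaluate $\int_0^x \frac{1}{1+t^2}\,dt$ by substituting t = tan(θ). Then
\[ \frac{dt}{d\theta} = \frac{d}{d\theta} \left( \frac{\sin(\theta)}{\cos(\theta)} \right) = \frac{\cos^2(\theta) + \sin^2(\theta)}{\cos^2(\theta)} = \frac{1}{\cos^2(\theta)} \]
by the quotient rule for derivatives, so we have
\begin{align*}
\int_0^x \frac{1}{1+t^2}\,dt &= \int_0^{\tan^{-1}(x)} \frac{1}{1 + \tan^2(\theta)} \cdot \frac{1}{\cos^2(\theta)}\,d\theta \\
&= \int_0^{\tan^{-1}(x)} \frac{1}{\cos^2(\theta) + \sin^2(\theta)}\,d\theta \\
&= \int_0^{\tan^{-1}(x)} d\theta = \tan^{-1}(x),
\end{align*}
and so $\tan^{-1} : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$ is differentiable, with derivative $\frac{1}{1+x^2}$.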
Our last result in this section tells us how to integrate the inverse of a strictly
monotone function. Surprisingly, this was not proved until 1905, thirty years after
the introduction of the Darboux integral!
Theorem 3.36
Let f : [a, b]→ R be a strictly monotone increasing function. Then both f and
its inverse $f^{-1}$ are integrable, and
\[ \int_a^b f(x)\,dx + \int_{f(a)}^{f(b)} f^{-1}(x)\,dx = b f(b) - a f(a). \]
Proof. Both f and f−1 are monotone, and we leave it as an exercise to show that
monotone functions are integrable. (Hint: take partitions in which all subintervals
have the same length, and then the difference between the upper and lower Darboux
sums is proportional to the reciprocal of the number of subintervals.)
We use f to give a bijection between partitions of [a, b] and of [f(a), f(b)]:
\[ P = (x_0, \dots, x_n) \iff f(P) = \big( f(x_0), f(x_1), \dots, f(x_n) \big). \]
Then we can prove by picture that $U(f, P) + L(f^{-1}, f(P)) = b f(b) - a f(a)$, and
that $L(f, P) + U(f^{-1}, f(P)) = b f(b) - a f(a)$:
[Figure: two copies of the graph of f over [a, b], with x_1, x_2, . . . marked on the horizontal axis and f(a), f(x_1), f(x_2), . . . , f(b) on the vertical axis, covered by blue and red rectangles.]
In the picture at left:
• the total area of the blue rectangles is by definition U(f, P );
• the total area of the red rectangles is $L(f^{-1}, f(P))$, as can be seen by reflecting
the picture across the line y = x;
• the blue and red rectangles do not overlap, except along their edges;
• and the total area of all the rectangles is bf(b)−af(a), because the shape they
form is built from a rectangle of area bf(b) by carving out a rectangle of area
af(a) from the bottom left corner.
So if we combine all of these facts then we get
\[ U(f, P) + L(f^{-1}, f(P)) = b f(b) - a f(a), \]
hence
\begin{align*}
\underline{\int_{f(a)}^{f(b)}} f^{-1}(x)\,dx &= \sup_{f(P)} L(f^{-1}, f(P)) \\
&= \sup_P \big( b f(b) - a f(a) - U(f, P) \big) \\
&= b f(b) - a f(a) - \inf_P U(f, P) \\
&= b f(b) - a f(a) - \int_a^b f(x)\,dx.
\end{align*}
The same analysis applied to the picture on the right gives
\[ \overline{\int_{f(a)}^{f(b)}} f^{-1}(x)\,dx = b f(b) - a f(a) - \int_a^b f(x)\,dx, \]
so $f^{-1}$ is integrable on the domain [f(a), f(b)] and its integral has the claimed
value.
Example 3.37. We wish to find an antiderivative of $\sin^{-1}(x)$, so we fix t ∈ (0, 1)
and set f(x) = sin(x), a = 0, and $b = \sin^{-1}(t)$ to get the identity
\[ \int_0^{\sin^{-1}(t)} \sin(x)\,dx + \int_0^t \sin^{-1}(x)\,dx = t \sin^{-1}(t). \]
By the fundamental theorem of calculus, the first integral on the left is
\[ \int_0^{\sin^{-1}(t)} \sin(x)\,dx = \big[ -\cos(x) \big]_{x=0}^{x=\sin^{-1}(t)} = 1 - \cos\big( \sin^{-1}(t) \big). \]
Now plugging $\theta = \sin^{-1}(t)$ into $\cos^2(\theta) + \sin^2(\theta) = 1$ gives us
\[ \cos^2(\sin^{-1}(t)) = 1 - \sin^2(\sin^{-1}(t)) = 1 - t^2, \]
and $\sin^{-1}(t) \in (0, \frac{\pi}{2})$ implies that $\cos(\sin^{-1}(t)) > 0$, so we must have
\[ \cos(\sin^{-1}(t)) = \sqrt{1 - t^2}. \]
We conclude that
\begin{align*}
\int_0^t \sin^{-1}(x)\,dx &= t \sin^{-1}(t) - \big( 1 - \cos( \sin^{-1}(t) ) \big) \\
&= t \sin^{-1}(t) + \sqrt{1 - t^2} - 1.
\end{align*}
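Both the identity of Theorem 3.36 and the closed form of Example 3.37 can be checked numerically at a sample point. The sketch below assumes t = 0.5 and a midpoint rule for the integrals; both are our own choices, and the helper name is hypothetical.

```python
import math

def midpoint_integral(f, a, b, steps=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width

t = 0.5               # hypothetical sample point in (0, 1)
b = math.asin(t)      # upper limit for the integral of sin
integral_of_sin = midpoint_integral(math.sin, 0.0, b)
integral_of_asin = midpoint_integral(math.asin, 0.0, t)

# Theorem 3.36 with f = sin, a = 0: the two integrals should add up to b * f(b) = t * asin(t).
print(integral_of_sin + integral_of_asin, t * math.asin(t))
# Closed form from the example.
print(integral_of_asin, t * math.asin(t) + math.sqrt(1 - t * t) - 1)
```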
3.6 Limits of integrable functions
We have seen that very few things commute with pointwise convergence of functions:
a sequence of continuous functions need not converge to something continuous, and
even if a sequence of differentiable functions fn converges pointwise to a differentiable
f then f ′(x) can be different from the pointwise limit $\lim_{n\to\infty} f_n'(x)$. The same is true
for integration.
Example 3.38. Define a function $f_n : [0, 1] \to \mathbb{R}$ by
\[ f_n(x) = \begin{cases} n, & 0 < x < \frac{1}{n} \\ 0, & \text{otherwise.} \end{cases} \]
[Figure: graphs of f_1, f_2, f_3, f_4, f_5 on [0, 1].]
Then each $f_n$ is integrable, with
\[ \int_0^1 f_n(x)\,dx = \int_0^{1/n} f_n(x)\,dx = 1. \]
But the functions $f_n$ converge pointwise to f(x) = 0, since $f_n(0) = 0$ for all n
and
\[ 0 < x \le 1 \implies f_n(x) = 0 \text{ for all } n > \frac{1}{x}, \]
and we have $\int_0^1 \left( \lim_{n\to\infty} f_n(x) \right) dx = \int_0^1 f(x)\,dx = 0$ while
\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1. \]
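The two halves of Example 3.38 can be seen side by side in a short computation. This is a sketch with hypothetical helper names; it uses the fact that each $f_n$ is constant on the interior of each piece of the partition (0, 1/n, 1), so sampling at midpoints gives the exact integral.

```python
from fractions import Fraction

def f_n(n, x):
    """The spike function of the example: n on (0, 1/n), 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n):
    # f_n is constant on the interior of each piece of the partition (0, 1/n, 1),
    # so sampling at the midpoints gives the exact integral.
    points = [Fraction(0), Fraction(1, n), Fraction(1)]
    return sum(f_n(n, (left + right) / 2) * (right - left)
               for left, right in zip(points, points[1:]))

x = Fraction(1, 10)  # hypothetical sample point in (0, 1]
print([f_n(n, x) for n in range(1, 16)])            # zero once n >= 10
print([integral_f_n(n) for n in (1, 2, 10, 1000)])  # every integral equals 1
```

At a fixed x the values eventually vanish, while the integrals stay at 1 for every n.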
Example 3.39. We will construct a sequence of functions $f_n : [0, 1] \to \mathbb{R}$ such
that
\[ \lim_{n\to\infty} \int_0^1 f_n(x)\,dx \ne \int_0^1 \left( \lim_{n\to\infty} f_n(x) \right) dx, \]
even though the functions $f_n$ and their pointwise limit $f(x) = \lim_{n\to\infty} f_n(x)$ are all
not just continuous but infinitely differentiable.
We showed in a problem sheet that the function
\[ \varphi(x) = \begin{cases} e^{-\frac{1}{x^2} - \frac{1}{(1-x)^2}}, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
is infinitely differentiable, and it is nonzero iff 0 < x < 1.
[Figure: graph of φ on [0, 1].]
Let $c = \int_0^1 \varphi(x)\,dx$. For each n ∈ N we define a function $f_n : [0, 1] \to \mathbb{R}$ by
\[ f_n(x) = \frac{n}{c} \cdot \varphi(nx), \]
so $f_n(x) = 0$ except on the interval $(0, \frac{1}{n})$.
[Figure: graphs of f_1, f_2, f_3, f_4, f_5 on [0, 1].]
We compute by substitution that for all n,
\begin{align*}
\int_0^1 f_n(x)\,dx &= \frac{1}{c} \int_0^1 \varphi(nx) \cdot n\,dx \\
&= \frac{1}{c} \int_0^n \varphi(y)\,dy \\
&= \frac{1}{c} \int_0^1 \varphi(y)\,dy = 1.
\end{align*}
But at the same time, the pointwise limit $f(x) = \lim_{n\to\infty} f_n(x)$ is identically zero:
this is clear for x = 0 since $f_n(0) = 0$ for all n, and if x > 0 instead then we have
$f_n(x) = 0$ for all $n > \frac{1}{x}$. So
\[ \int_0^1 f(x)\,dx = 0 \ne 1 = \lim_{n\to\infty} \int_0^1 f_n(x)\,dx. \]
In fact, we can do even worse: the functions $n f_n(x)$ converge pointwise to f(x) =
0 for the same reason, but $\int_0^1 n f_n(x)\,dx = n \to \infty$.
Just as with continuity and differentiation, however, it turns out that uniform con-
vergence saves the day.
Theorem 3.40
Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of integrable functions which converges
uniformly to f : [a, b]→ R. Then f is integrable, and
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} \int_a^b f_n(x)\,dx. \]
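Proof. The key idea is that if $f_n$ is very close to f when n is large, then the upper
and lower Darboux sums for f must be very close to those for $f_n$, and so the upper
and lower Darboux integrals for f get arbitrarily close to $\int_a^b f_n(x)\,dx$ as n→∞.
Fix $\varepsilon > 0$. Uniform convergence gives us some N > 0 such that if n ≥ N , then
\[ |f_n(x) - f(x)| < \frac{\varepsilon}{2(b-a)} \quad \forall x \in [a, b]. \]
For any n ≥ N and any partition $P = (x_0, \dots, x_k)$ of [a, b], we have
\[ \inf_{x_i \le t \le x_{i+1}} f(t) \ge \left( \inf_{x_i \le t \le x_{i+1}} f_n(t) \right) - \frac{\varepsilon}{2(b-a)}, \]
for all i, so then the lower Darboux sums of f and $f_n$ satisfy
\begin{align*}
L(f, P) &= \sum_{i=0}^{k-1} \left( \inf_{x_i \le t \le x_{i+1}} f(t) \right) \Delta x_i \\
&\ge \sum_{i=0}^{k-1} \left( \Big( \inf_{x_i \le t \le x_{i+1}} f_n(t) \Big) - \frac{\varepsilon}{2(b-a)} \right) \Delta x_i \\
&= L(f_n, P) - \frac{\varepsilon}{2(b-a)} \sum_{i=0}^{k-1} \Delta x_i \\
&= L(f_n, P) - \frac{\varepsilon}{2}.
\end{align*}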
Taking suprema over all P gives us
\[ \underline{\int_a^b} f(x)\,dx \ge \underline{\int_a^b} f_n(x)\,dx - \frac{\varepsilon}{2} \]
for all n ≥ N . We can also apply the same argument with the bound
\[ \sup_{x_i \le t \le x_{i+1}} f(t) \le \left( \sup_{x_i \le t \le x_{i+1}} f_n(t) \right) + \frac{\varepsilon}{2(b-a)}, \]
taking infima of the corresponding upper Darboux sums over all P , to conclude that
\[ \overline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f_n(x)\,dx + \frac{\varepsilon}{2} \]
for all n ≥ N . But since the $f_n$ are integrable, their lower and upper Darboux
integrals are equal, and we now have
\[ \int_a^b f_n(x)\,dx - \frac{\varepsilon}{2} \le \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le \int_a^b f_n(x)\,dx + \frac{\varepsilon}{2} \]
for all n ≥ N .
This last chain of inequalities lets us conclude two things. First, we have
\[ 0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le \varepsilon, \]
and we can show this for any $\varepsilon > 0$, so in fact the upper and lower Darboux integrals
must be equal. This proves that f is integrable. Second, given $\varepsilon > 0$ we have found
N such that
\[ n \ge N \quad\Rightarrow\quad \left| \int_a^b f(x)\,dx - \int_a^b f_n(x)\,dx \right| \le \frac{\varepsilon}{2} < \varepsilon. \]
Since we can do this for any $\varepsilon > 0$, this is precisely what is needed to show that
\[ \int_a^b f_n(x)\,dx \to \int_a^b f(x)\,dx \]
as n→∞.
One useful application of this theorem is to finding antiderivatives of power series.
Proposition 3.41. Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of
convergence R > 0. Then f is integrable on any closed subinterval of (−R,R), with
\[ \int_0^x f(t)\,dt = \sum_{n=0}^{\infty} \frac{a_n}{n+1} x^{n+1} \]
for any x ∈ (−R,R).
Proof. Fix x ∈ (−R,R). We know that the partial sums
\[ f_n(t) = \sum_{i=0}^{n} a_i t^i \]
converge uniformly to f on the interval [−|x|, |x|] and thus on the subinterval with
endpoints 0 and x, so it follows that f is integrable on that subinterval and that
\[ \sum_{i=0}^{\infty} \frac{a_i}{i+1} x^{i+1} = \lim_{n\to\infty} \sum_{i=0}^{n} \frac{a_i}{i+1} x^{i+1} = \lim_{n\to\infty} \int_0^x f_n(t)\,dt = \int_0^x f(t)\,dt, \]
where the last equality uses the uniform convergence of $f_n \to f$ on that subinterval.
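Example 3.42. We integrate the power series $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$, with radius of
convergence 1, to get
\[ \log\left( \frac{1}{1-x} \right) = \sum_{n=0}^{\infty} \frac{x^{n+1}}{n+1} = \frac{x}{1} + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \dots \]
for all x ∈ (−1, 1). Taking $x = \frac{1}{2}$ gives us
\[ \log(2) = \sum_{n=1}^{\infty} \frac{1}{n \cdot 2^n}, \]
which converges very quickly compared to the alternating harmonic series
$\log(2) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$, since the error after the first k terms is
\[ \sum_{n=k+1}^{\infty} \frac{1}{n \cdot 2^n} \le \sum_{n=k+1}^{\infty} \frac{1}{(k+1) 2^{k+1}} \cdot \frac{1}{2^{n-(k+1)}} = \frac{1}{(k+1) 2^k}. \]
Taking just the first five terms gives
\[ \log(2) \approx \frac{1}{2} + \frac{1}{8} + \frac{1}{24} + \frac{1}{64} + \frac{1}{160} = \frac{661}{960} = 0.6885416\ldots \]
to within at most $\frac{1}{6 \cdot 2^5} = \frac{1}{192} = 0.0052083\ldots$. (The actual value is 0.6931 . . . .)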
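The arithmetic in Example 3.42 is easy to reproduce exactly. The following sketch (with hypothetical helper names) computes the partial sums of $\sum 1/(n \cdot 2^n)$ as fractions and compares the true error with the bound $1/((k+1)2^k)$.

```python
from fractions import Fraction
import math

def partial_sum(k):
    """Sum of the first k terms 1/(n * 2^n) of the series for log(2)."""
    return sum(Fraction(1, n * 2**n) for n in range(1, k + 1))

def error_bound(k):
    """The bound 1/((k+1) * 2^k) on the error after k terms."""
    return Fraction(1, (k + 1) * 2**k)

k = 5
approx = partial_sum(k)
print(approx, float(approx), float(error_bound(k)))
print(math.log(2) - float(approx))  # actual error, which should not exceed the bound
```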
3.7 Improper integrals
Occasionally we would like to evaluate integrals $\int_a^b f(x)\,dx$ which don’t quite have
the form we’re used to: either f is unbounded on [a, b], or [a, b] has infinite length
because either a = −∞ or b = ∞ (or both). In this case, we can still try to make
sense of this using limits.
Definition. Suppose f : (a, b] → R is integrable on every subinterval [c, b] ⊂
(a, b]. Then we define the improper integral
\[ \int_a^b f(x)\,dx = \lim_{c \downarrow a} \int_c^b f(x)\,dx \]
if the limit exists, and otherwise we say that it diverges or fails to exist.
Likewise, if f : [a, b)→ R is integrable on every [a, c] ⊂ [a, b) then we define
\[ \int_a^b f(x)\,dx = \lim_{c \uparrow b} \int_a^c f(x)\,dx. \]
If a, b ∈ R and f extends to an integrable function $\tilde{f} : [a, b] \to \mathbb{R}$, then this is
no different from the usual integral. To see this, we take an M > 0 such that
$|\tilde{f}(x)| \le M$ for all x ∈ [a, b], and then
\[ \left| \int_a^c \tilde{f}(x)\,dx \right| \le \int_a^c |\tilde{f}(x)|\,dx \le \int_a^c M\,dx = M(c - a), \]
which implies that $\lim_{c \downarrow a} \int_a^c \tilde{f}(x)\,dx = 0$. Then we have
\begin{align*}
\lim_{c \downarrow a} \int_c^b f(x)\,dx &= \int_a^b \tilde{f}(x)\,dx - \lim_{c \downarrow a} \int_a^c \tilde{f}(x)\,dx \\
&= \int_a^b \tilde{f}(x)\,dx.
\end{align*}
Improper integrals are more interesting when $\tilde{f}$ doesn’t exist, such as when either f
or its domain is unbounded.
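Example 3.43. If 0 < r < 1 then $\frac{1}{x^r}$ is unbounded as x ↓ 0 and undefined at
x = 0. But on any interval $[\varepsilon, 1]$ with $\varepsilon > 0$, we have
\[ \int_\varepsilon^1 \frac{1}{x^r}\,dx = \frac{x^{1-r}}{1-r} \bigg|_{x=\varepsilon}^{x=1} = \frac{1 - \varepsilon^{1-r}}{1 - r}, \]
and this converges as $\varepsilon \downarrow 0$, so we have the improper integral
\[ \int_0^1 \frac{1}{x^r}\,dx = \lim_{\varepsilon \downarrow 0} \int_\varepsilon^1 \frac{1}{x^r}\,dx = \frac{1}{1-r}. \]
On the other hand, if r > 1 then the limit as $\varepsilon \downarrow 0$ does not exist, and likewise
if r = 1 then
\[ \lim_{\varepsilon \downarrow 0} \int_\varepsilon^1 \frac{1}{x}\,dx = \lim_{\varepsilon \downarrow 0} \big( -\log(\varepsilon) \big) = \infty. \]
So $\int_0^1 \frac{1}{x^r}\,dx$ exists for r > 0 if and only if 0 < r < 1.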
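The dichotomy in Example 3.43 shows up clearly if we evaluate the closed form of the truncated integral for shrinking $\varepsilon$. This is a sketch only; the function name and the sample exponents r = 0.5 and r = 1.5 are our own.

```python
def truncated_integral(r, eps):
    """Closed form of the integral of x**(-r) over [eps, 1], valid for r != 1."""
    return (1 - eps ** (1 - r)) / (1 - r)

for r in (0.5, 1.5):
    # For r = 0.5 the values should settle near 1/(1 - r) = 2;
    # for r = 1.5 they should grow without bound as eps shrinks.
    print(r, [truncated_integral(r, 10.0 ** -k) for k in (2, 4, 6, 8)])
```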
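Example 3.44. We define the improper integral $\int_0^\infty \frac{1}{1+x^2}\,dx$ as a limit:
\[
\int_0^\infty \frac{1}{1+x^2}\,dx = \lim_{b \to \infty} \int_0^b \frac{1}{1+x^2}\,dx
= \lim_{b \to \infty} \tan^{-1}(x)\Big|_{x=0}^{x=b}
= \lim_{b \to \infty} \tan^{-1}(b).
\]
Since the limit exists and is equal to $\frac{\pi}{2}$, we have
\[
\int_0^\infty \frac{1}{1+x^2}\,dx = \frac{\pi}{2}.
\]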
Question 16. For which of the following $f : [1,\infty) \to \mathbb{R}$ does $\int_1^\infty f(x)\,dx$
exist? Choose all that apply.
1. $f(x) = e^{-x}$ ✓
2. $f(x) = \cos(x)$
3. $f(x) = x^{-3}$ ✓
4. $f(x) = \frac{1}{x \log(x)}$
Likewise, we may be interested in an integral $\int_a^b f(x)\,dx$, where $f$ is unbounded at
some interior point $c \in (a, b)$ or where both $a = -\infty$ and $b = \infty$. In this case we
define the improper integral
\[
\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx,
\]
as a sum of two improper integrals, and we require that both of them exist; we leave
it as an exercise to check that the answer does not depend on $c$.
If $f$ has multiple improprieties (not an official term), meaning points in the domain
where $f$ is unbounded or limits of integration at $\pm\infty$, then we similarly split it into
a sum of improper integrals $\int_c^d f(x)\,dx$ with at most one impropriety each, and then
we take the sum of these, provided that each of them is defined.
For example, we could take $f(x) = \frac{1}{x^2-1}$ on $[0,\infty)$:
[Figure: graph of $f$, with $x = 1$ and $x = 2$ marked on the horizontal axis.]
Here we would split the domain into intervals $[0, 1]$, $[1, c]$, and $[c,\infty)$ for some $c > 1$,
so that $f$ is unbounded on the first two, and $f$ is bounded but the domain is
unbounded on the third. We compute that for a proper integral of this form, we
have
\[
\int_a^b f(x)\,dx = \left( \frac{1}{2} \log\left|\frac{x-1}{x+1}\right| \right)\Bigg|_{x=a}^{x=b},
\]
so the improper integral is by definition (taking $c = 2$)
\[
\int_0^\infty f(x)\,dx = \underbrace{\int_0^1 f(x)\,dx}_{\text{divergent}} + \underbrace{\int_1^2 f(x)\,dx}_{\text{divergent}} + \underbrace{\int_2^\infty f(x)\,dx}_{=\,\frac{1}{2}\log(3)}
\]
and this does not exist.
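As a rough numerical illustration of this splitting, the sketch below evaluates the antiderivative $\frac{1}{2}\log\left|\frac{x-1}{x+1}\right|$ on intervals approaching the two improprieties on $[1, \infty)$. The helper names and the sample endpoints are our own choices.

```python
from math import log

def antiderivative(x):
    """F(x) = (1/2) log|(x - 1)/(x + 1)|, valid away from x = 1."""
    return 0.5 * log(abs((x - 1) / (x + 1)))

def proper_integral(a, b):
    """Integral of 1/(x^2 - 1) over [a, b], assuming 1 is not in [a, b]."""
    return antiderivative(b) - antiderivative(a)

# Upper limit tending to infinity: the values approach log(3)/2.
tail = [proper_integral(2, 10.0 ** k) for k in (1, 3, 6)]
# Lower limit tending to 1 from above: the values grow without bound.
near_one = [proper_integral(1 + 10.0 ** -k, 2) for k in (1, 3, 6)]
print(tail, 0.5 * log(3))
print(near_one)
```

Only the third piece settles down, which matches the labels under the three integrals above.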
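Example 3.45. Consider the integral
\[
\int_{-1}^1 \frac{1}{x}\,dx = \int_{-1}^0 \frac{1}{x}\,dx + \int_0^1 \frac{1}{x}\,dx.
\]
This does not exist, because neither improper integral on the right converges.
Note that if we tried to evaluate this using a single limit for both summands,
the answer would depend on how we approached $0$ from either side. We have
\[
\lim_{\varepsilon \downarrow 0} \left( \int_{-1}^{-\varepsilon} \frac{1}{x}\,dx + \int_\varepsilon^1 \frac{1}{x}\,dx \right)
= \lim_{\varepsilon \downarrow 0} \bigl( \log(\varepsilon) + (-\log(\varepsilon)) \bigr) = 0,
\]
but
\[
\lim_{\varepsilon \downarrow 0} \left( \int_{-1}^{-2\varepsilon} \frac{1}{x}\,dx + \int_\varepsilon^1 \frac{1}{x}\,dx \right)
= \lim_{\varepsilon \downarrow 0} \bigl( \log(2\varepsilon) + (-\log(\varepsilon)) \bigr)
= \lim_{\varepsilon \downarrow 0} \log(2) = \log(2)
\]
and similarly
\[
\lim_{\varepsilon \downarrow 0} \left( \int_{-1}^{-\varepsilon^2} \frac{1}{x}\,dx + \int_\varepsilon^1 \frac{1}{x}\,dx \right)
= \lim_{\varepsilon \downarrow 0} \bigl( \log(\varepsilon^2) + (-\log(\varepsilon)) \bigr)
= \lim_{\varepsilon \downarrow 0} \log(\varepsilon) = -\infty.
\]
Thus the only sensible thing to do if we want the original improper integral
$\int_{-1}^1 \frac{1}{x}\,dx$ to exist and be well-defined is to insist that each individual limit exists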
and then take their sum. In this case, neither $\int_{-1}^0 \frac{1}{x}\,dx$ nor $\int_0^1 \frac{1}{x}\,dx$ exists, so we
can’t hope to make sense of $\int_{-1}^1 \frac{1}{x}\,dx$.
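The dependence on how the two cut-offs approach $0$ can be seen directly. This is a minimal sketch using the antiderivative $\log|x|$; the function `two_sided` and the particular cut-offs are our own illustrative choices.

```python
from math import log

def two_sided(left_cut, right_cut):
    """Integral of 1/x over [-1, -left_cut] plus the integral over [right_cut, 1]."""
    return log(left_cut) + (-log(right_cut))

for eps in (1e-2, 1e-4, 1e-8):
    symmetric = two_sided(eps, eps)       # cut-offs -eps and eps
    doubled = two_sided(2 * eps, eps)     # cut-offs -2*eps and eps
    squared = two_sided(eps ** 2, eps)    # cut-offs -eps^2 and eps
    print(eps, symmetric, doubled, squared)
```

The three columns behave differently as `eps` shrinks, which is exactly why a single combined limit cannot be used to define the integral.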
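Example 3.46. The function $f(x) = \frac{1}{\sqrt{|x|}}$ is unbounded as $x \to 0$, so we define
\[
\int_{-1}^1 \frac{1}{\sqrt{|x|}}\,dx = \int_{-1}^0 \frac{1}{\sqrt{|x|}}\,dx + \int_0^1 \frac{1}{\sqrt{|x|}}\,dx.
\]
The second of these summands is
\[
\lim_{a \downarrow 0} \int_a^1 \frac{1}{\sqrt{x}}\,dx = \lim_{a \downarrow 0} 2\sqrt{x}\,\Big|_{x=a}^{x=1} = \lim_{a \downarrow 0} \bigl( 2 - 2\sqrt{a} \bigr) = 2,
\]
and substituting $y = -x$ shows that the first summand is the limit as $a \downarrow 0$ of
\[
\int_{-1}^{-a} \frac{1}{\sqrt{-x}}\,dx = -\int_1^a \frac{1}{\sqrt{y}}\,dy = \int_a^1 \frac{1}{\sqrt{y}}\,dy \to 2,
\]
so the original improper integral exists and we have
\[
\int_{-1}^1 \frac{1}{\sqrt{|x|}}\,dx = 4.
\]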
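Example 3.47. We have
\[
\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \int_{-\infty}^0 \frac{1}{1+x^2}\,dx + \int_0^\infty \frac{1}{1+x^2}\,dx.
\]
We have already evaluated the second summand as $\frac{\pi}{2}$, and the first is also $\frac{\pi}{2}$ by
essentially the same argument, so
\[
\int_{-\infty}^\infty \frac{1}{1+x^2}\,dx = \frac{\pi}{2} + \frac{\pi}{2} = \pi.
\]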
Virtually all of our properties of Darboux integrals apply to improper integrals as
well; we just need to check that all of the limits involved are well-defined. For
example, we have the second fundamental theorem of calculus: if $f : [a,\infty) \to \mathbb{R}$ is
continuous and has a continuous derivative on $(a,\infty)$, then for all $b > a$ we have
\[
\int_a^b f'(x)\,dx = f(b) - f(a),
\]
so $\int_a^\infty f'(x)\,dx$ exists if and only if $\lim_{b \to \infty} f(b)$ exists, and then
\[
\int_a^\infty f'(x)\,dx = f(x)\Big|_{x=a}^{x=\infty} = \left( \lim_{x \to \infty} f(x) \right) - f(a).
\]
Here we introduce the notation “$\big|^{x=\infty}$” to mean that we take the limit as $x \to \infty$.
We also have the following useful criterion for integrability.
Proposition 3.48. Let $f, g : [a,\infty) \to \mathbb{R}$ be functions satisfying $0 \le f(x) \le g(x)$
for all $x \ge a$, and which are both integrable on any interval $[a, b]$. If $\int_a^\infty g(x)\,dx$
exists, then $\int_a^\infty f(x)\,dx$ also exists and
\[
0 \le \int_a^\infty f(x)\,dx \le \int_a^\infty g(x)\,dx.
\]
Proof. We know that $\int_a^b g(x)\,dx$ is a monotone increasing function of $b$, and its limit
as $b \to \infty$ exists, so for all $b \ge a$ we have
\[
0 \le \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx \le \int_a^\infty g(x)\,dx.
\]
But $\int_a^b f(x)\,dx$ is also increasing as a function of $b$, and it is bounded above by
$\int_a^\infty g(x)\,dx$, so it converges as $b \to \infty$ and
\[
0 \le \lim_{b \to \infty} \int_a^b f(x)\,dx \le \int_a^\infty g(x)\,dx.
\]
The middle term is equal to $\int_a^\infty f(x)\,dx$, so this completes the proof.
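Example 3.49. The integral $\int_2^\infty \frac{1}{x^2+\sin(x)}\,dx$ exists, because the integrand satisfies
\[
0 \le \frac{1}{x^2 + \sin(x)} \le \frac{1}{x^2 - 1} < \frac{2}{x^2} \quad \text{for all } x \ge 2
\]
and the upper bound is integrable on $[2,\infty)$:
\[
\int_2^\infty \frac{2}{x^2}\,dx = -\frac{2}{x}\bigg|_{x=2}^{x=\infty} = 1.
\]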
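The comparison in this example can be spot-checked numerically. The sketch below samples the inequality $0 \le f \le g$ at finitely many points and approximates a few partial integrals with a midpoint rule; the sampling grid, step count, and function names are our own assumptions, and a finite check like this illustrates the proposition rather than proving anything.

```python
from math import sin

def f(x):
    return 1.0 / (x * x + sin(x))

def g(x):
    return 2.0 / (x * x)

def midpoint_integral(func, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Pointwise comparison 0 <= f <= g on a sample of points x >= 2.
sample = [2 + 0.01 * i for i in range(10_000)]
comparison_holds = all(0 <= f(x) <= g(x) for x in sample)

# Partial integrals of f over [2, b]: increasing in b and bounded by 1.
partial = [midpoint_integral(f, 2, b) for b in (10, 100, 1000)]
print(comparison_holds, partial)
```

The partial integrals increase with the upper limit but stay below the value $1$ of the comparison integral, as the proof of Proposition 3.48 predicts.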
As another example of a property which continues to work for improper integrals,
we still have integration by parts: letting $b \to \infty$ in the formula
\[
\int_a^b f(x)g'(x)\,dx = f(x)g(x)\Big|_{x=a}^{x=b} - \int_a^b f'(x)g(x)\,dx
\]
and applying the algebra of limits, we see that if both
\[
\int_a^\infty f'(x)g(x)\,dx \quad \text{and} \quad \lim_{x \to \infty} f(x)g(x)
\]
exist, then $f(x)g'(x)$ is integrable over $[a,\infty)$ as well, and
\[
\int_a^\infty f(x)g'(x)\,dx = f(x)g(x)\Big|_{x=a}^{x=\infty} - \int_a^\infty f'(x)g(x)\,dx
\]
where $x = \infty$ on the right means that we take the limit as $x \to \infty$.
3.8 Lebesgue’s criterion for integrability
We have seen that continuous functions f : [a, b]→ R are integrable, and some but
not all discontinuous, bounded functions are as well. In this non-examinable section
we’ll determine exactly which functions are integrable.
Theorem 3.50: Lebesgue’s criterion for integrability
A bounded function f : [a, b]→ R is integrable if and only if the set
D(f) = {x ∈ [a, b] | f is not continuous at x}
has measure zero.
We will define “measure zero” momentarily, but for now we claim that
• countable sets have measure zero, and
• a set which contains an interval [c, d] with c < d does not have measure zero.
So a bounded function which is continuous at all irrational x ∈ [a, b] is integrable, and if f is
integrable then the set of points where f is continuous must be dense in [a, b].
The measure zero condition says roughly that a given set is contained in the union of
some open intervals whose total length can be made arbitrarily small. We introduce
some terminology to make this precise:
Definition. An open cover of $S \subset \mathbb{R}$ is a collection of open intervals $\{U_\alpha = (a_\alpha, b_\alpha)\}$ such that
\[
S \subset \bigcup_\alpha U_\alpha.
\]
Lemma 3.51. Let $\{U_\alpha = (a_\alpha, b_\alpha)\}$ be an open cover of $S \subset \mathbb{R}$. Then
1. $\{U_\alpha\}$ has a countable subcover, meaning a collection of countably many $U_i \in \{U_\alpha\}$ such that $\{U_1, U_2, U_3, \ldots\}$ also covers $S$; and
2. if $S$ is compact, then in fact $\{U_\alpha\}$ has a finite subcover.
Proof. (1) The set of intervals of the form $(p, q)$ with $p, q$ both rational is countable,
since there is an injection
\[
\{(p, q) \mid p, q \in \mathbb{Q}\} \hookrightarrow \mathbb{Q} \times \mathbb{Q}
\]
by sending the interval $(p, q)$ to the ordered pair $(p, q)$. Every point $x \in S$ belongs
to some open set $U_x$, which by definition contains some open interval $(x - \delta, x + \delta)$
with $\delta > 0$, and if we pick rational $p_x, q_x$ with
\[
x - \delta < p_x < x < q_x < x + \delta
\]
then $x$ belongs to $(p_x, q_x)$, which is in turn a subset of $U_x$.
[Figure: the interval $U_x$ on the real line, with the points $x - \delta < p_x < x < q_x < x + \delta$ marked inside it.]
The collection
\[
\{(p_x, q_x) \mid x \in S\}
\]
is then an open cover of $S$, and it only has countably many distinct intervals, so we
enumerate them as $(p_i, q_i)$ where $i \in \mathbb{N}$. Each $(p_i, q_i)$ was a subset of some $U_{\alpha_i}$ by
construction, and we have
\[
S \subset \bigcup_i (p_i, q_i) \subset \bigcup_i U_{\alpha_i}
\]
so the countable subcollection $\{U_{\alpha_i}\}$ of our original $\{U_\alpha\}$ also covers $S$.
(2) Now suppose that $S$ is compact, and that we have an infinite cover of $S$; by the
above argument we can pass to a countable subcover $\{U_i = (a_i, b_i) \mid i \in \mathbb{N}\}$. We
define a function $f : S \to \mathbb{N}$ by
\[
f(x) = \min\{n \mid x \in U_n\};
\]
this is well defined because $x$ is in $\bigcup_i U_i$, and so it must belong to some $U_i$. If no
finite union $\bigcup_{i=1}^m U_i$ contains all of $S$, then for every $n$ we can find an $x_n \in S$ such
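that $f(x_n) \ge n$. Since $S$ is compact, the Bolzano–Weierstrass theorem says that the
sequence $(x_n)$ has a convergent subsequence whose limit lies in $S$; we write this as
\[
y_n \to y \in S, \qquad \lim_{n \to \infty} f(y_n) = \infty.
\]
But we have $y \in U_{f(y)}$, so $U_{f(y)}$ contains some open interval $(y - \varepsilon, y + \varepsilon)$. Since
$y_n \to y$, there is an $N > 0$ such that
\[
n \ge N \;\Rightarrow\; |y_n - y| < \varepsilon \;\Rightarrow\; y_n \in (y - \varepsilon, y + \varepsilon) \subset U_{f(y)}.
\]
[Figure: the real line with the interval $U_{f(y)}$ containing $(y - \varepsilon, y + \varepsilon)$, and the points $y_{N-1}, y_N, y_{N+1}, y_{N+2}, y_{N+3}, \ldots$ accumulating at $y$.]
This says that $f(y_n) \le f(y)$ for all $n \ge N$, contradicting $f(y_n) \to \infty$, and so it must
be the case that some finite union $\bigcup_{i=1}^m U_i$ contains all of $S$ after all.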
Definition. We say that a set $S \subset \mathbb{R}$ has (outer) measure zero if for every
$\varepsilon > 0$, there is a finite or countable open cover $\{U_i = (a_i, b_i)\}$ of $S$ with
\[
\sum_i (b_i - a_i) < \varepsilon.
\]
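Example 3.52. If $S = \{x\}$ is a single point, then for any $\varepsilon > 0$ it can be covered
by the single open interval $\left( x - \frac{\varepsilon}{3}, x + \frac{\varepsilon}{3} \right)$ of length $\frac{2\varepsilon}{3} < \varepsilon$. Thus a point has measure zero.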
Example 3.53. If $S$ has measure zero, then any subset $T \subset S$ also has measure
zero, since any open cover of $S$ with total length less than $\varepsilon$ is also an open cover of $T$.
Example 3.54. If $S_1, S_2, \ldots$ are countably many sets of measure zero and we
are given $\varepsilon > 0$, then each $S_i$ admits a countable open cover
\[
\{U_{i,j} = (a_{i,j}, b_{i,j}) \mid j \in \mathbb{N}\} \quad \text{such that} \quad \sum_j (b_{i,j} - a_{i,j}) < \frac{\varepsilon}{2^i}.
\]
The collection of all open intervals $U_{i,j}$ is countable, since it is a countable union
of countable collections of intervals, and it covers $\bigcup_i S_i$, with
\[
\sum_{i,j} (b_{i,j} - a_{i,j}) = \sum_{i=1}^\infty \sum_j (b_{i,j} - a_{i,j}) < \sum_{i=1}^\infty \frac{\varepsilon}{2^i} = \varepsilon.
\]
Thus a countable union of sets of measure zero also has measure zero. In particular, $\mathbb{Q}$ has measure zero.
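The construction in Examples 3.52 and 3.54 can be carried out explicitly for an initial segment of an enumeration of the rationals in $[0, 1]$. The sketch below is only an illustration under our own choices: the enumeration by increasing denominator, the interval lengths $\varepsilon / 2^{i+1}$, and the names `rationals_in_unit_interval` and `cover` are not from the notes.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` distinct rationals p/q in [0, 1], ordered by denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Open interval of length eps / 2^(i+1) centred on the i-th point, i = 1, 2, ..."""
    intervals = []
    for i, x in enumerate(points, start=1):
        half = eps / 2 ** (i + 2)
        intervals.append((x - half, x + half))
    return intervals

eps = Fraction(1, 100)
points = rationals_in_unit_interval(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
print(float(total_length))
```

However many points are taken, the total length stays below $\varepsilon / 2$, because the lengths form a geometric series; this is the same bookkeeping as the sum $\sum_i \varepsilon / 2^i$ above.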
Proposition 3.55. If S = [a, b] with a < b, then S does not have measure zero.
Proof. Let $\{U_i\}$ be a countable open cover of $[a, b]$, and use the compactness of $S$
to pass to a finite subcover
\[
\{U_i \mid 1 \le i \le n\}
\]
whose total length does not exceed that of the original cover. If any two intervals
$U_i$ and $U_j$ overlap then we can replace the two of them with their union,
[Figure: two overlapping intervals $U_i$ and $U_j$ being replaced by the single interval $U_i \cup U_j$.]
which decreases the number of intervals without changing $\bigcup U_i$ or increasing the
length of the cover, and we repeat until all of the $U_i$ are disjoint.
We now assume the $U_i$ are labeled so that $U_1 = (c_1, d_1)$ contains $a$. Then no other
$U_j = (c_j, d_j)$ can contain $d_1$, because as an open set it would have to contain a whole
neighborhood $(d_1 - \delta, d_1 + \delta)$ for some $\delta > 0$, and if we take $\delta$ small enough then
\[
(d_1 - \delta, d_1) \subset U_1 \cap U_j,
\]
contradicting the fact that $U_1$ and $U_j$ are disjoint. Since the $U_i$ cover $[a, b]$, it follows
that $d_1 \notin [a, b]$, and since $c_1 < a < d_1$ we must have $d_1 > b$. So in fact $[a, b] \subset U_1$,
and hence the total length of the cover $\{U_1, \ldots, U_n\}$ is at least $d_1 - c_1 > b - a$. We
conclude that $[a, b]$ cannot have measure zero.
Corollary 3.56. If S contains an interval [a, b] with a < b, then it does not have
measure zero.
We now want to figure out what it takes for a function f : [a, b]→ R to be integrable,
so we’ll begin by quantifying how discontinuous it can be at a point. We measure
how much $f$ “jumps” near $x$ by defining
\[
j_f(x) = \inf_{\delta > 0} \left( \sup_{|y-x|<\delta} f(y) - \inf_{|y-x|<\delta} f(y) \right).
\]
[Figure: graph of $f$ near a point $x$, with the size of the jump $j_f(x)$ marked.]
Given $x \in [a, b]$ and any fixed $\delta > 0$, we have
\[
\sup_{|y-x|<\delta} f(y) \ge f(x) \ge \inf_{|y-x|<\delta} f(y),
\]
and $f$ is continuous at $x$ if and only if the left and right sides both approach $f(x)$
as $\delta \to 0$. So $j_f(x) \ge 0$, with equality if and only if $f$ is continuous at $x$.
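To get a feel for $j_f$, one can estimate the oscillation of $f$ on shrinking windows around $x$. The sketch below is only a sampled approximation: it replaces the supremum and infimum by a maximum and minimum over finitely many sample points and uses a fixed list of window sizes, all of which are our own assumptions, so it can underestimate the true oscillation.

```python
def oscillation(f, x, delta, samples=2001):
    """Sampled estimate of sup f - inf f over (x - delta, x + delta)."""
    values = [f(x - delta + 2 * delta * (i + 0.5) / samples) for i in range(samples)]
    return max(values) - min(values)

def jump_estimate(f, x, deltas=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Smallest sampled oscillation over a few shrinking windows around x."""
    return min(oscillation(f, x, d) for d in deltas)

def step(x):
    return 1.0 if x >= 0 else 0.0

def square(x):
    return x * x

print(jump_estimate(step, 0.0))    # the step function jumps at 0
print(jump_estimate(square, 1.0))  # small, since x^2 is continuous at 1
```

For the step function every window around $0$ contains both values $0$ and $1$, so the estimate does not shrink; for $x^2$ it shrinks with the window.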
Given the function $f : [a, b] \to \mathbb{R}$, we label its set of discontinuous points by
\[
D(f) = \{x \in [a, b] \mid f \text{ is not continuous at } x\} = \{x \mid j_f(x) > 0\},
\]
and for every $c > 0$ we define
\[
D_c(f) = \{x \in [a, b] \mid j_f(x) \ge c\}.
\]
Then we can write $D(f)$ as a countable union
\[
D(f) = \bigcup_{n \in \mathbb{N}} D_{1/n}(f),
\]
so $D(f)$ has measure zero if and only if $D_{1/n}(f)$ has measure zero for all $n \in \mathbb{N}$.
Proposition 3.57. If $f$ is integrable, then $D_c(f)$ has measure zero for all $c > 0$,
and so $D(f) = \bigcup_{n \in \mathbb{N}} D_{1/n}(f)$ has measure zero.
Proof. Since $f$ is integrable, given $c > 0$ and any $\varepsilon > 0$ we can pick a partition
$P = (x_0, \ldots, x_n)$ of $[a, b]$ such that $U(f, P) - L(f, P) < \frac{c\varepsilon}{2}$, or equivalently
\[
\sum_{i=0}^{n-1} \left( \sup_{x_i \le t \le x_{i+1}} f(t) - \inf_{x_i \le t \le x_{i+1}} f(t) \right) \Delta x_i < \frac{c\varepsilon}{2}.
\]
If the open interval $(x_i, x_{i+1})$ contains a point of $D_c(f)$, then the corresponding term
contributes at least $c\,\Delta x_i$ to the sum on the left, and the sum of all such terms is
less than $\frac{c\varepsilon}{2}$, so the sum of the lengths of the open intervals $(x_i, x_{i+1})$ which contain
points of $D_c(f)$ is less than $\frac{\varepsilon}{2}$. These intervals provide an open cover of
\[
D_c(f) \setminus \{x_0, x_1, \ldots, x_n\}
\]
with total length less than $\frac{\varepsilon}{2}$, and we can add small open intervals
$\left( x_i - \frac{\varepsilon}{4(n+1)},\, x_i + \frac{\varepsilon}{4(n+1)} \right)$
around each of $x_0, \ldots, x_n$ to cover all of $D_c(f)$ by sets of total length less
than $\varepsilon$. Since we can do this for any $\varepsilon > 0$, we conclude that $D_c(f)$ has measure
zero.
This proves one direction of Lebesgue’s criterion. For the converse, we first need to
understand a little more about the sets Dc(f).
Lemma 3.58. Each set $D_c(f)$ is compact.
Proof. We know that $D_c(f)$ is bounded since it is a subset of $[a, b]$, so we only need
to prove that it is closed. Let $(x_n) \subset D_c(f)$ be a sequence which converges to some
$x \in \mathbb{R}$. Given any $\delta > 0$, there is some $x_n$ such that $|x_n - x| < \frac{\delta}{2}$, and then since
$j_f(x_n) \ge c$ we have
\[
\sup_{|y - x_n| < \frac{\delta}{2}} f(y) - \inf_{|y - x_n| < \frac{\delta}{2}} f(y) \ge c.
\]
But if $|y - x_n| < \frac{\delta}{2}$ then by the triangle inequality
\[
|y - x| \le |y - x_n| + |x_n - x| < \frac{\delta}{2} + \frac{\delta}{2} = \delta,
\]
so we have
\[
\sup_{|y-x|<\delta} f(y) - \inf_{|y-x|<\delta} f(y) \ge \sup_{|y - x_n| < \frac{\delta}{2}} f(y) - \inf_{|y - x_n| < \frac{\delta}{2}} f(y) \ge c.
\]
Since this works for any $\delta > 0$, we have $j_f(x) \ge c$ and so $x \in D_c(f)$ after all, proving
that $D_c(f)$ is closed.
Proposition 3.59. If f is bounded and D(f) has measure zero, then f is integrable.
Proof. Choose $M$ such that $|f(x)| \le M$ for all $x \in [a, b]$, and pick $\varepsilon > 0$. Let
$c = \frac{\varepsilon}{2(b-a)}$; since $D_c(f) \subset D(f)$ is compact and has measure zero, we can cover it
by finitely many open intervals
\[
U_i = (c_i, d_i), \quad 1 \le i \le n
\]
of total length
\[
\sum_{i=1}^n (d_i - c_i) < \frac{\varepsilon}{4M}.
\]
Let $S = [a, b] \setminus \bigcup_{i=1}^n U_i$; this is closed, as the intersection of the closed sets $[a, b]$ and
$\mathbb{R} \setminus \bigcup_{i=1}^n U_i$, and it is bounded as a subset of $[a, b]$, so $S$ is compact. For every $x \in S$
we have $j_f(x) < c$, so there is some $\delta_x > 0$ such that
\[
\sup_{|y-x|<\delta_x} f(y) - \inf_{|y-x|<\delta_x} f(y) < c,
\]
and in particular
\[
\sup_{x - \frac{\delta_x}{2} \le t \le x + \frac{\delta_x}{2}} f(t) - \inf_{x - \frac{\delta_x}{2} \le t \le x + \frac{\delta_x}{2}} f(t) < c.
\]
The intervals $\left(x - \frac{\delta_x}{2}, x + \frac{\delta_x}{2}\right)$ for all $x \in S$ form an open cover of $S$, so by compactness
there is some finite subcollection
\[
V_j = (e_j, f_j), \quad 1 \le j \le m
\]
which suffices to cover $S$.
We now form a partition $P = (x_0, x_1, \ldots, x_k)$ of $[a, b]$ which contains all of the
numbers $c_i, d_i, e_j, f_j$. Every interval $[x_i, x_{i+1}]$ either contains a point of $D_c(f)$, in
which case it lies in some $[c_i, d_i]$ and we call it bad, or it is disjoint from $D_c(f)$ but
lies in some $\left[x - \frac{\delta_x}{2}, x + \frac{\delta_x}{2}\right] \subset (x - \delta_x, x + \delta_x)$, in which case we call it good. Then
\[
U(f, P) - L(f, P) = \sum_{[x_i, x_{i+1}] \text{ good}} \left( \sup_{x_i \le t \le x_{i+1}} f(t) - \inf_{x_i \le t \le x_{i+1}} f(t) \right) \Delta x_i
+ \sum_{[x_i, x_{i+1}] \text{ bad}} \left( \sup_{x_i \le t \le x_{i+1}} f(t) - \inf_{x_i \le t \le x_{i+1}} f(t) \right) \Delta x_i.
\]
On the good intervals we have $\sup f(t) - \inf f(t) < c$, and on the bad intervals we
only have $\sup f(t) - \inf f(t) \le 2M$ but their total length is less than $\frac{\varepsilon}{4M}$. Thus
\[
U(f, P) - L(f, P) \le \sum_{[x_i, x_{i+1}] \text{ good}} c\,\Delta x_i + \sum_{[x_i, x_{i+1}] \text{ bad}} 2M\,\Delta x_i
< c(b - a) + 2M \left( \frac{\varepsilon}{4M} \right) = \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\]
Since we can find such a $P$ for any $\varepsilon > 0$, we conclude that $f$ is integrable.
We have now shown that if $f$ is integrable then $D(f)$ has measure zero, and conversely
that if $f$ is bounded and $D(f)$ has measure zero then $f$ is integrable, so this
completes the proof of Lebesgue’s criterion.
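A tiny worked instance of the criterion: a step function on $[0, 1]$ is bounded with a single discontinuity, so it should be integrable. The sketch below computes $U(f, P) - L(f, P)$ exactly for uniform partitions. It assumes $f$ is nondecreasing so that the supremum and infimum on each subinterval sit at the endpoints; the names `step` and `darboux_gap` are ours.

```python
from fractions import Fraction

def step(x):
    """Bounded on [0, 1], with a single discontinuity at x = 1/2."""
    return 1 if x >= Fraction(1, 2) else 0

def darboux_gap(f, n):
    """U(f, P) - L(f, P) for the uniform partition of [0, 1] into n pieces.

    Assumes f is nondecreasing, so that on each [x_i, x_{i+1}] the supremum
    is f(x_{i+1}) and the infimum is f(x_i).
    """
    dx = Fraction(1, n)
    return sum((f((i + 1) * dx) - f(i * dx)) * dx for i in range(n))

gaps = [darboux_gap(step, n) for n in (10, 100, 1000)]
print(gaps)
```

Only the single "bad" subinterval touching the discontinuity contributes to the gap, and its length shrinks with the mesh, mirroring the good/bad split in the proof of Proposition 3.59.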
3.9 The irrationality of π
We know that irrational numbers exist by a simple cardinality argument (there are
uncountably many real numbers, and only countably many of them are rational),
but it is usually much harder to prove that specific numbers are irrational. In this
subsection we will use integration to prove that $\pi$ is irrational, following a one-page
argument by Ivan Niven in 1946. We recall that we have defined $\pi$ as the least $x > 0$
such that $\sin(x) = 0$.
Given a rational number $\frac{a}{b}$, with $a, b \in \mathbb{N}$, we can define a polynomial
\[
f_n(x) = \frac{x^n (a - bx)^n}{n!}
\]
of degree $2n$. (Eventually we will suppose that $\pi = \frac{a}{b}$.)
Lemma 3.60. The function $f_n(x)$ satisfies
\[
f_n^{(i)}(0) \in \mathbb{Z} \quad \text{and} \quad f_n^{(i)}\!\left(\tfrac{a}{b}\right) \in \mathbb{Z}
\]
for all integers $i \ge 0$.
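Proof. Using the binomial theorem, we can write
\[
f_n(x) = \frac{x^n}{n!} \sum_{i=0}^n \binom{n}{i} a^i (-b)^{n-i} x^{n-i} = \frac{1}{n!} \sum_{j=n}^{2n} c_j x^j,
\]
where $c_j = \binom{n}{j-n} a^{2n-j} (-b)^{j-n} \in \mathbb{Z}$ for $n \le j \le 2n$. We use this to compute that
\[
f_n^{(i)}(0) = \begin{cases} 0, & 0 \le i < n \\ c_i \cdot \frac{i!}{n!}, & n \le i \le 2n \\ 0, & i > 2n, \end{cases}
\]
and so $f_n^{(i)}(0) \in \mathbb{Z}$ for all $i \ge 0$. For the derivatives at $x = \frac{a}{b}$, we note that
\[
f_n(x) = \frac{b^n}{n!} x^n \left( \tfrac{a}{b} - x \right)^n \implies f_n(x) = f_n\!\left( \tfrac{a}{b} - x \right),
\]
so it follows using the chain rule that $f_n^{(i)}(x) = (-1)^i f_n^{(i)}\!\left( \tfrac{a}{b} - x \right)$ for all $i$, hence
$f_n^{(i)}\!\left( \tfrac{a}{b} \right) \in \mathbb{Z}$ for all $i \ge 0$ as well.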
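The lemma can be sanity-checked with exact rational arithmetic. In the sketch below, polynomials are coefficient lists with the lowest degree first; the helper names and the sample values $n = 3$, $a = 22$, $b = 7$ are illustrative assumptions and play no role in the proof.

```python
from fractions import Fraction
from math import comb, factorial

def niven_coeffs(n, a, b):
    """Coefficients of f_n(x) = x^n (a - b x)^n / n!, lowest degree first."""
    coeffs = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        # (a - b x)^n contributes C(n, k) a^(n-k) (-b)^k x^k, shifted up by x^n.
        coeffs[n + k] = Fraction(comb(n, k) * a ** (n - k) * (-b) ** k, factorial(n))
    return coeffs

def derivative(coeffs):
    """Derivative of a polynomial given by its coefficient list."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Horner evaluation; an empty list is the zero polynomial."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * x + c
    return result

def derivative_values(n, a, b, x):
    """Values f_n^(i)(x) for i = 0, 1, ..., 2n + 1."""
    values, coeffs = [], niven_coeffs(n, a, b)
    for _ in range(2 * n + 2):
        values.append(evaluate(coeffs, x))
        coeffs = derivative(coeffs)
    return values

at_zero = derivative_values(3, 22, 7, Fraction(0))
at_ab = derivative_values(3, 22, 7, Fraction(22, 7))
print(at_zero)
print(at_ab)
```

Every printed value is a fraction with denominator $1$, and the two lists agree up to the sign $(-1)^i$, as the symmetry $f_n(x) = f_n(\frac{a}{b} - x)$ predicts.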
Proposition 3.61. Supposing that $\pi = \frac{a}{b}$, we have $\int_0^\pi f_n(x) \sin(x)\,dx \in \mathbb{Z}$.
Proof. We would like to use the second fundamental theorem of calculus to evaluate
this integral, so we must find an antiderivative for $f_n(x) \sin(x)$. We use $f_n$ to define
an auxiliary polynomial by
\[
g_n(x) = \sum_{i=0}^n (-1)^i f_n^{(2i)}(x)
= f_n(x) - f_n^{(2)}(x) + f_n^{(4)}(x) - \cdots + (-1)^n f_n^{(2n)}(x),
\]
and by the previous lemma we have $g_n(0) \in \mathbb{Z}$ and $g_n(\pi) \in \mathbb{Z}$. The function $g_n$ is
designed to satisfy a nice differential equation in terms of $f_n$:
\[
g_n''(x) + g_n(x) = \left( f_n^{(2)}(x) - f_n^{(4)}(x) + \cdots - (-1)^n f_n^{(2n)}(x) \right)
+ \left( f_n(x) - f_n^{(2)}(x) + f_n^{(4)}(x) - \cdots + (-1)^n f_n^{(2n)}(x) \right)
= f_n(x),
\]
where in the first row we use the fact that $f_n$ is a polynomial of degree $2n$ to see
that $f_n^{(2n+2)}(x) = 0$. We now compute that
\[
\frac{d}{dx} \bigl( g_n'(x) \sin(x) - g_n(x) \cos(x) \bigr)
= \bigl( g_n''(x) \sin(x) + g_n'(x) \cos(x) \bigr) - \bigl( g_n'(x) \cos(x) - g_n(x) \sin(x) \bigr)
= \bigl( g_n''(x) + g_n(x) \bigr) \sin(x)
= f_n(x) \sin(x),
\]
so we have our antiderivative and we can apply the fundamental theorem of calculus:
\[
\int_0^\pi f_n(x) \sin(x)\,dx = \bigl( g_n'(x) \sin(x) - g_n(x) \cos(x) \bigr)\Big|_{x=0}^{x=\pi}
= -g_n(\pi) \cos(\pi) + g_n(0) \cos(0)
= g_n(\pi) + g_n(0),
\]
which is an integer because each of the summands on the right is an integer.
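The identity $g_n'' + g_n = f_n$ is a purely algebraic statement about polynomials, so it can be verified exactly for sample parameters. The sketch below uses coefficient lists (lowest degree first) and exact fractions; the helper names and the values $n = 3$, $a = 22$, $b = 7$ are our own illustrative choices.

```python
from fractions import Fraction
from math import comb, factorial

def f_coeffs(n, a, b):
    """Coefficients of f_n(x) = x^n (a - b x)^n / n!, lowest degree first."""
    coeffs = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        coeffs[n + k] = Fraction(comb(n, k) * a ** (n - k) * (-b) ** k, factorial(n))
    return coeffs

def diff(coeffs):
    """Derivative of a polynomial given by its coefficient list."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def add(p, q):
    """Sum of two polynomials, padding the shorter list with zeros."""
    size = max(len(p), len(q))
    p = p + [Fraction(0)] * (size - len(p))
    q = q + [Fraction(0)] * (size - len(q))
    return [x + y for x, y in zip(p, q)]

def g_coeffs(n, a, b):
    """Coefficients of g_n = f_n - f_n'' + f_n^(4) - ... + (-1)^n f_n^(2n)."""
    term, total, sign = f_coeffs(n, a, b), [], 1
    for _ in range(n + 1):
        total = add(total, [sign * c for c in term])
        term = diff(diff(term))
        sign = -sign
    return total

n, a, b = 3, 22, 7           # illustrative values only
f = f_coeffs(n, a, b)
g = g_coeffs(n, a, b)
lhs = add(diff(diff(g)), g)  # coefficients of g_n'' + g_n
print(lhs == f)
```

The alternating sum telescopes when a second derivative is added to it, which is the cancellation displayed in the two rows of the computation above.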
Where did the function $g_n(x)$ come from? It obscures a simpler idea, whose proof
is only slightly longer: that if we integrate by parts we get
\[
\int_0^\pi f_n(x) \sin(x)\,dx = f_n(x)(-\cos(x))\Big|_{x=0}^{x=\pi} - \int_0^\pi f_n'(x)(-\cos(x))\,dx
= \bigl( f_n(\pi) + f_n(0) \bigr) + \int_0^\pi f_n'(x) \cos(x)\,dx
\]
and then another integration by parts gives us
\[
\int_0^\pi f_n'(x) \cos(x)\,dx = f_n'(x) \sin(x)\Big|_{x=0}^{x=\pi} - \int_0^\pi f_n''(x) \sin(x)\,dx
= -\int_0^\pi f_n''(x) \sin(x)\,dx,
\]
so combined we have
\[
\int_0^\pi f_n(x) \sin(x)\,dx = \bigl( f_n(\pi) + f_n(0) \bigr) - \int_0^\pi f_n''(x) \sin(x)\,dx.
\]
Repeated application of this last identity to $f_n''(x), f_n^{(4)}(x), \ldots, f_n^{(2n)}(x)$ gives
\[
\int_0^\pi f_n(x) \sin(x)\,dx = \sum_{i=0}^n (-1)^i \bigl( f_n^{(2i)}(\pi) + f_n^{(2i)}(0) \bigr)
+ (-1)^{n+1} \int_0^\pi f_n^{(2n+2)}(x) \sin(x)\,dx,
\]
and the integral on the right is zero because $f_n^{(2n+2)}(x) = 0$, so
\[
\int_0^\pi f_n(x) \sin(x)\,dx = \sum_{i=0}^n (-1)^i \bigl( f_n^{(2i)}(\pi) + f_n^{(2i)}(0) \bigr) \in \mathbb{Z}
\]
as claimed.
Theorem 3.62
$\pi$ is irrational.
Proof. Suppose that $\pi = \frac{a}{b}$, with $a, b \in \mathbb{N}$. For $0 < x < \frac{a}{b}$ we have the inequality
\[
0 < x^n (a - bx)^n < \left( \frac{a}{b} \right)^n a^n = (a\pi)^n,
\]
and $0 < \sin(x) \le 1$ on the same interval, so combining these bounds gives
\[
0 < f_n(x) \sin(x) \le \frac{(a\pi)^n}{n!} \quad \text{for } 0 < x < \pi.
\]
Integrating from $0$ to $\pi$, we see that
0 <