COMP4141
Theory of Computation
Lecture 7: Algorithms for CFLs
Administrivia
Assignment 1 now overdue
Assignment 2 available: due Monday, 21 March at 12 noon
(AEDST)
2
Outline
Decision Problems
Algorithms for Regular Languages
Algorithms for Context-Free Languages
3
Common questions about languages
We have seen a number of ways of representing languages in a
finite way (automata, regular expresions, grammars). This lets us
address several questions algorithmically:
Emptiness: Given a representation, is the associated language
empty?
Universality: Given a representation, is the associated
language universal?
Word/Acceptance problem: Given a word w and a
representation of a language L, is w ∈ L?
Language equivalence: Given representations of two
languages, L1 and L2, is L1 = L2?
Language inclusion: Given representations of two languages,
L1 and L2, is L1 ⊆ L2?
4
Decision Problems
When we can phrase a question we want answered as a Yes/No
question this is known as a Decision Problem.
For example:
NFA Emptiness
Input: An NFA N
Question: Is L(N) = ∅
5
Outline
Decision Problems
Algorithms for Regular Languages
Algorithms for Context-Free Languages
6
Deciding emptiness of a regular language
How to decide emptiness of a regular language L depends on its
representation, of which we’ve met a few.
NFA Emptiness
Input: An NFA N
Question: Is L(N) = ∅?
RegExp Emptiness
Input: A regular expression R
Question: Is L(R) = ∅?
7
NFA Emptiness
When L is given as L(A) of an NFA (Q,Σ, δ, q0,F ) (or DFA,
-NFA) is an exercise in graph reachability: Is there a final state
that can be reached from the initial state?
This can be done by a depth-first search in time linear in the
number of edges and vertices. (But note there could be |Q|2
edges.)
8
RegExp Emptiness
When L is given as L(R) of a regular expression R then we can
abstract inductively as follows:
Base: L(∅) = ∅, L() 6= ∅, and L(a) 6= ∅.
Induction:
L(R1 ∪ R2) = ∅ iff L(R1) = L(R2) = ∅.
L(R1 · R2) = ∅ iff L(R1) = ∅ or L(R2) = ∅.
L(R∗1 ) 6= ∅.
L((R1)) = ∅ iff L(R1) = ∅.
This implies that L(R) = ∅ can be decided in O(|R|) time.
9
The word problem for regular languages
Like the emptiness problem, the word problem depends on the
representation.
DFA Acceptance
Input: A DFA D and a word w
Question: Is w ∈ L(D)?
NFA Acceptance
Input: An NFA N and a word w
Question: Is w ∈ L(N)?
10
Word problems
DFA: easy, just feed the word to the automaton.
NFA: marking algorithm:
On input w = a1 . . . a|w |
1 Set of marks M := {q0}.
2 For i = 1 to |w | do
M :=
⋃
q∈M δ(q, ai )
3 Return whether F ∩M 6= ∅.
Others: translate to an equivalent DFA.
11
The inclusion problem for regular languages
DFA Language Inclusion
Input: Two DFAs A1 and A2
Question: Is L(A1) ⊆ L(A2)?
Algorithm:
1 Construct a DFA recognising L(A1) ∩ L(A2)
2 Check for emptiness
Other represntations: Translate to equivalent DFAs.
12
The inclusion problem for regular languages
DFA Language Inclusion
Input: Two DFAs A1 and A2
Question: Is L(A1) ⊆ L(A2)?
Algorithm:
1 Construct a DFA recognising L(A1) ∩ L(A2)
2 Check for emptiness
Other represntations: Translate to equivalent DFAs.
12
The inclusion problem for regular languages
DFA Language Inclusion
Input: Two DFAs A1 and A2
Question: Is L(A1) ⊆ L(A2)?
Algorithm:
1 Construct a DFA recognising L(A1) ∩ L(A2)
2 Check for emptiness
Other represntations: Translate to equivalent DFAs.
12
The equivalence problem for regular languages
DFA Language Equivalence
Input: Two DFAs A1 and A2
Question: Is L(A1) = L(A2)?
Algorithm:
1 Check if L(A1) ⊆ L(A2)
2 Check if L(A2) ⊆ L(A1)
Other representations: translate to equivalent DFAs.
13
The equivalence problem for regular languages
DFA Language Equivalence
Input: Two DFAs A1 and A2
Question: Is L(A1) = L(A2)?
Algorithm:
1 Check if L(A1) ⊆ L(A2)
2 Check if L(A2) ⊆ L(A1)
Other representations: translate to equivalent DFAs.
13
The equivalence problem for regular languages
DFA Language Equivalence
Input: Two DFAs A1 and A2
Question: Is L(A1) = L(A2)?
Algorithm:
1 Check if L(A1) ⊆ L(A2)
2 Check if L(A2) ⊆ L(A1)
Other representations: translate to equivalent DFAs.
13
Outline
Decision Problems
Algorithms for Regular Languages
Algorithms for Context-Free Languages
14
Emptiness of CFLs
CFG Emptiness
Input: A CFG G = (V ,Σ,P,S)
Question: Is L(G ) = ∅?
The emptiness problem for CFGs can be decided as follows:
1 Mark the terminals and , as generating
2 Mark as generating all those non-terminals that have a
production with only generating symbols in the RHS.
3 Repeat step 2 until nothing new is marked generating.
4 Check whether the start symbol is marked as generating.
15
Emptiness of CFLs
CFG Emptiness
Input: A CFG G = (V ,Σ,P,S)
Question: Is L(G ) = ∅?
The emptiness problem for CFGs can be decided as follows:
1 Mark the terminals and , as generating
2 Mark as generating all those non-terminals that have a
production with only generating symbols in the RHS.
3 Repeat step 2 until nothing new is marked generating.
4 Check whether the start symbol is marked as generating.
15
Chomsky normal form
For many of the remaining questions, it is convenient to introduce
a particularly simple class of CFGs that is still as powerful as CFGs
in general.
Definition
A context free (type 2) grammar is in Chomsky normal form if
every production is of one of the forms
S → , or
A→ BC where B and C are not S , or
A→ a.
Theorem
Any CFL is generated by a CFG in Chomsky normal form.
16
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
Step 0: Define an equivalent CFG G0 = (V ∪ {S0},Σ,P0,S0) with a fresh start
variable S0 and the production S0 → S . We also remove all productions of
the form A → and patch this up by introducing new productions for every
occurrence of A in a body with that occurrence removed.
(E.g., A→ | aAbA becomes A→ ab | abA | aAb | aAbA.)
Productions B → A are replaced by B → unless that is one we had removed
earlier.
Repeat until occurs at most in S0 → .
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
Step 1: Define an equivalent CFG G1 = (V1,Σ,P1,S0) with fresh non-terminals
for terminals V1 = V ∪ {S0} ∪ { Xa | a ∈ Σ }. Here P1 is derived from P0 by
replacing all occurrences of a terminal a ∈ Σ in the body of a production by the
corresponding non-terminal Xa and then adding productions Xa → a.
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
Step 2: Define an equivalent CFG G2 = (V1,Σ,P2, S0). To generate P2 from
P1, eliminate productions of the form A→ B by
1 determining all derivations A1 ⇒G1 . . .⇒G1 Ak ⇒G1 α /∈ V1 not
containing repetitions of non-terminals,
2 drop all productions A→ B, and
3 introduce A1 → α for each of the derivations determined before.
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
Step 3: Define an equivalent CFG G3 = (V3,Σ,P3, S0). To generate P3 from
P2, replace all productions A→ B1 . . .Bn with n > 2 by A→ B1C1, C1 → B2C2,
. . . , Cn−2 → Bn−1Bn for fresh non-terminals Ci . (These are distinct for each
such production.)
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S → ASA | aB
A → B | S
B → b |
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S0 → S
S → ASA | aB | a
A → B | S | b |
B → b
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S0 → S
S → ASA | aB | a | SA | AS | S
A → B | S | b
B → b
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S0 → S
S → ASA | XaB | a | SA | AS | S
A → B | S | b
B → b
Xa → a
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S0 → ASA | XaB | a | SA | AS
S → ASA | XaB | a | SA | AS
A → b | ASA | XaB | a | SA | AS
B → b
Xa → a
17
Constructing the Chomsky normal form
Let G = (V ,Σ,P, S) be a CFG.
Step 0 Remove S and from RHS
Step 1 Remove terminals from RHS
Step 2 Remove singleton rules (A→ B)
Step 3 Reduce RHS to pairs of non-terminals
S0 → AA1 | XaB | a | SA | AS
S → AA1 | XaB | a | SA | AS
A → b | AA1 | XaB | a | SA | AS
B → b
Xa → a
A1 → SA
17
The word problem for CFLs
CFG Acceptance
Input: A CFG G and a word w
Question: Is w ∈ L(G )?
Naıve algorithm:
1 Convert to Chomsky normal form
2 Explore all derivations (each derivation has length 2|w |)
(Running time O(2n))
A better approach:
Theorem
Let G = (V ,Σ,P,S) be a CFG in Chomsky normal form. Then
the CYK-algorithm decides w ∈ L(G ) in time O(|w |3) for w ∈ Σ∗.
18
The word problem for CFLs
CFG Acceptance
Input: A CFG G and a word w
Question: Is w ∈ L(G )?
Naıve algorithm:
1 Convert to Chomsky normal form
2 Explore all derivations (each derivation has length 2|w |)
(Running time O(2n))
A better approach:
Theorem
Let G = (V ,Σ,P,S) be a CFG in Chomsky normal form. Then
the CYK-algorithm decides w ∈ L(G ) in time O(|w |3) for w ∈ Σ∗.
18
The word problem for CFLs
CFG Acceptance
Input: A CFG G and a word w
Question: Is w ∈ L(G )?
Naıve algorithm:
1 Convert to Chomsky normal form
2 Explore all derivations (each derivation has length 2|w |)
(Running time O(2n))
A better approach:
Theorem
Let G = (V ,Σ,P,S) be a CFG in Chomsky normal form. Then
the CYK-algorithm decides w ∈ L(G ) in time O(|w |3) for w ∈ Σ∗.
18
The CYK-Algorithm
The CYK-algorithm (for Cocke-Younger-Kasami) works as follows.
Given w = a1 . . . an compute for i , j ∈ {1, . . . , n} the set
Nij = { A ∈ V | A⇒∗G ai . . . aj } of non-terminals generating
ai . . . aj . Then
w ∈ L(G ) iff S ∈ V1n .
The details are not in [Sipser2006].
19
The CYK-Algorithm
for CFG G = (V ,Σ,P, S) in Chomsky normal form.
“On input w = a1 . . . an
1 for i , j := 1 to n do
Ni ,j :=
{
{ A ∈ V | A→ ai ∈ P } if i = j
∅ otherwise
2 for d := 1 to n − 1 and i := 1 to n − d do
1 j := i + d
2 for k := i to j − 1 do
Ni,j := Ni,j ∪
{
A ∈ V
∣∣∣∣ ∃B,C ( A→ BC ∈ P ∧B ∈ Ni,k ∧ C ∈ Nk+1,j
) }
3 Return whether S ∈ N1,n.
20
CYK-Algorithm Example
Example
Decide if accb ∈ L(G )
where G is defined by the rules:
S →
AB
A →
CA | a | b
B →
BB | c
C →
CB | AB | c
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G )
where G is defined by the rules:
S →
AB
A →
CA | a | b
B →
BB | c
C →
CB | AB | c
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G )
where G is defined by the rules:
S →
AB
A →
CA | a | b
B →
BB | c
C →
CB | AB | c
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S →
AB
A →
CA |
a | b
B →
BB |
c
C →
CB | AB |
c
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA
| a | b
B → BB
| c
C → CB | AB
| c
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA
| a | b
B → BB
| c
C → CB | AB
| c
A C
B,C A
A B
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA
| a | b
B → BB
| c
C → CB | AB
| c
A
, S ,C
A C
B,C A
A B
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA
| a | b
B → BB
| c
C → CB | AB
| c
A, S ,C
A C
B,C A
A B
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A B,C B,C A
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,C A
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A
S ,C A
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A
S ,C A
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA
⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A
S ,C A
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA ⇒∗G ABCA
⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if accb ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A
S ,C A
S ,C B,C A
A B,C B,C A
a c c b
A ⇒G CA ⇒∗G ABCA ⇒∗G accb
21
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
B,C B,C A B,C
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,B,C
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,B,C
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB
⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,B,C
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB ⇒G CAB
⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,B,C
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB ⇒G CAB ⇒G CCAB
⇒∗G ccac
22
CYK-Algorithm Example
Example
Decide if ccac ∈ L(G ) where G is defined by the rules:
S → AB
A → CA | a | b
B → BB | c
C → CB | AB | c
S ,B,C
A,B,C S ,C
B,C A S ,C
B,C B,C A B,C
c c a c
S ⇒G AB ⇒G CAB ⇒G CCAB ⇒∗G ccac
22
Preview of undecidable CFL problems
Later we’ll develop a theory that allows us to prove rigorously that
there are problems that cannot be solved by any algorithm that
can be implemented as a computer program.
Such problems are called undecidable.
Some simple decision problems in the realm of CFLs turn out to be
undecidable:
Is a given CFG ambiguous?
Is any CFG for a given CFL necessarily ambiguous?
Is the intersection of two given CFLs empty?
Are two given CFGs/PDAs equivalent?
Does a given CFG generate all strings Σ∗?
23