
… Bicknell, John Banks and the Department of Mathematics and Statistics, La Trobe University. As such, reproduction of this material may only be undertaken with their express permission.

Machines and Languages

Subject Notes Part A for MAT2ALC

Algebra, Linear Codes and Automata

This text was developed by Peter Stacey and subsequently revised by Kevin Bicknell and John Banks. The 2012 full edition was typeset by John Banks.

Contents

1.1. Cartesian products 3

1.2. Relations 4

1.3. Functions 6

1.4. An alternative view of Cartesian products 8

1.5. Combining relations and functions 10

2.1. Directed graphs of relations 13

2.2. Properties of binary relations 14

2.3. Closures of binary relations 18

3.1. Deterministic finite state machines 21

3.2. Finite state machines without output 24

3.3. Recognition machines 24

3.4. Notations for words and languages 26

3.5. Extended transition function and suffix sets 27

4.1. Grammars and regular languages 29

4.2. Recognition machines for regular grammars 33

4.3. Nondeterministic machines 35

4.4. Regular Expressions 36

5.1. Equivalent deterministic and nondeterministic machines 39

5.2. Simplifying deterministic machines 41

5.3. An algorithm for finding suffix equivalence classes 43

5.4. Designing machines from language descriptions 46


6.1. Non-Regular Languages 49

6.2. Stacks 51

6.3. Push down automata 54

6.4. Nondeterministic push down automata 59

7.1. Context free grammars 63

7.2. Greibach normal form 67

7.3. PDA and context free grammars 70

7.4. Deterministic context free languages 73

8.1. How big is a set? 75

8.2. Countable sets 79

8.3. How big is a language? 82

8.4. Uncountable sets 86

9.1. Not all languages are context free 91

9.2. Turing machines 94

9.3. The power of Turing machines 102

Part 1

Chapter One

Relations and Functions

Relations play an important role in computer science, for example in conceptual models for databases. A mathematical description of relations can be based on the notion of Cartesian products of sets. Functions and binary operations can be defined as special sorts of relations.

Applied Discrete Structures for Computer Science by Alan Doerr and Kenneth Levasseur (SRA, 1985).

1.1. Cartesian products

If D1, D2, . . . , Dn are sets then the Cartesian product D1 × D2 × · · · × Dn is usually defined to be the set of all ordered n-tuples (d1, d2, . . . , dn) where d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn. An ordered n-tuple is just a list of n objects in a particular order so that, for example, the 3-tuple (1, 2, 3) is different from the 3-tuple (3, 2, 1).

Although the word product and the multiplication sign × are used, Cartesian products have nothing to do with ordinary multiplication. It is conventional to use round brackets to enclose the elements of an n-tuple, in which the order matters, and curly brackets to enclose the elements of a set, in which the order doesn't matter.

Cartesian products are useful for describing collections of objects which can be linked by various relationships. For example D1 could consist of names, such as D1 = {Andrew, Michelle, Tracey}, and D2 could consist of suburbs, for example D2 = {Bundoora, Greensborough, Heidelberg}. Then D1 × D2 consists of all pairings of names with suburbs (with names listed first). So

D1 × D2 = {(Andrew, Bundoora), (Andrew, Greensborough), (Andrew, Heidelberg),
(Michelle, Bundoora), (Michelle, Greensborough), (Michelle, Heidelberg),
(Tracey, Bundoora), (Tracey, Greensborough), (Tracey, Heidelberg)}.



Example 1.1.2. When D1, D2 are sets of numbers we can draw a picture of their Cartesian product. We represent the ordered pair (d1, d2) by the point in the plane at a horizontal distance d1 and a vertical distance d2 from a chosen origin. For example, if D1 = {1, 2} and D2 = {2, 3, 4} then

D1 × D2 = {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)}.

(Figure: the six points plotted in the plane, with x-coordinates 1, 2 and y-coordinates 2, 3, 4.)

In Example 1.1.2, D1 has m1 = 2 elements, D2 has m2 = 3 elements and D1 × D2 has m1 m2 = 6 elements. This is one reason why the Cartesian product is called a product. More generally, if D1 has m1 elements, D2 has m2 elements and so on, then D1 × D2 × · · · × Dn has m1 m2 · · · mn elements. To see why this is true, notice that for each of the m1 choices for the first coordinate there are m2 choices for the second coordinate, giving a total of m1 m2 choices for the first two coordinates. For each of these there are m3 choices for the third coordinate, giving a total of m1 m2 m3 choices for the first three coordinates, and so on.
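To make the counting argument concrete, here is a small Python sketch (Python and the variable names are ours, not part of the notes) that builds the product from Example 1.1.2 with `itertools.product` and checks the count:

```python
from itertools import product

D1 = {1, 2}
D2 = {2, 3, 4}

# The Cartesian product as a set of ordered pairs: order inside each
# pair matters, order of the pairs within the set does not.
D1xD2 = set(product(D1, D2))

assert D1xD2 == {(1, 2), (1, 3), (1, 4), (2, 2), (2, 3), (2, 4)}
# The counting rule: |D1 x D2| = m1 * m2 = 2 * 3 = 6.
assert len(D1xD2) == len(D1) * len(D2)
```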

1.2. Relations

If D1, . . . , Dn are any sets, then a relation between elements of these sets is a subset of the Cartesian product D1 × D2 × · · · × Dn.

For example, let D1 = {Andrew, Michelle, Tracey} and D2 = {Bundoora, Greensborough, Heidelberg}. If Andrew and Tracey live in Greensborough and Michelle lives in Bundoora, then the 'lives in' relation between D1 and D2 is given by the set L = {(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)}. If Andrew and Michelle work in Heidelberg and Tracey works in Bundoora, the 'works in' relation between D1 and D2 is W = {(Andrew, Heidelberg), (Michelle, Heidelberg), (Tracey, Bundoora)}.


On the natural numbers N, the addition relation A is given by all the 3-tuples (d1, d2, d3) for which d3 = d1 + d2, so

A = {(1, 1, 2), (1, 2, 3), (2, 1, 3), (1, 3, 4), (2, 2, 4), (3, 1, 4), . . . }.

Similarly, the 'less than or equal to' relation on N can be described by the set of all ordered pairs (d1, d2) for which d1 ≤ d2, so

B = {(1, 1), (1, 2), (1, 3), (2, 2), (1, 4), (2, 3), . . . }.

In general, then, a relation R is a subset of a Cartesian product D1 × D2 × · · · × Dn and hence we use (d1, . . . , dn) ∈ R to specify that d1, . . . , dn are related. The simplest non-trivial case of relations, when the Cartesian product involves just two sets, is particularly important. Such relations are called binary relations. In the special case of binary relations it is common to write d1 R d2 instead of (d1, d2) ∈ R when d1 is related to d2. Sometimes we create some sort of special symbol ∼ associated with the relation R and write d1 ∼ d2 instead of (d1, d2) ∈ R. Thus, for example, we always write

d1 = d2 instead of (d1, d2) ∈ R = {(d1, d2) : d1 is equal to d2}

and

d1 < d2 instead of (d1, d2) ∈ R = {(d1, d2) : d1 is less than d2}.

The inverse of a binary relation R ⊆ D1 × D2 is defined to be the set

R⁻¹ = {(y, x) : (x, y) ∈ R} ⊆ D2 × D1.

For example, the inverse of the 'lives in' relation

{(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)}

is the 'has a resident' relation

{(Greensborough, Andrew), (Bundoora, Michelle), (Greensborough, Tracey)}.

If R ⊆ D1 × D2 and S ⊆ D2 × D3, we can define a relation S ∘ R on D1 × D3, known as the composite of R and S, by²

S ∘ R = {(d1, d3) : there exists d2 ∈ D2 with (d1, d2) ∈ R and (d2, d3) ∈ S}.

For example, the composite of L = {(Andrew, Greensborough), (Michelle, Bundoora), (Tracey, Greensborough)} and T = {(Greensborough, train), (Bundoora, tram), (Heidelberg, train)} is T ∘ L = {(Andrew, train), (Michelle, tram), (Tracey, train)}. Notice that Heidelberg does not appear in L, but that this does not matter. In fact a composite of non-empty relations could easily turn out to be empty.

² Fenced off sections of the text can be omitted by students focussing on the basics.
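The definition of the composite translates directly into a set comprehension. A minimal Python sketch (our own notation, not from the notes), reproducing the example above:

```python
def compose(R, S):
    # S . R = {(d1, d3) : (d1, d2) in R and (d2, d3) in S for some d2}
    return {(d1, d3) for (d1, d2) in R for (e, d3) in S if d2 == e}

L = {("Andrew", "Greensborough"), ("Michelle", "Bundoora"),
     ("Tracey", "Greensborough")}
T = {("Greensborough", "train"), ("Bundoora", "tram"),
     ("Heidelberg", "train")}

# Heidelberg never occurs as a second coordinate in L, and that is fine.
assert compose(L, T) == {("Andrew", "train"), ("Michelle", "tram"),
                         ("Tracey", "train")}
```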


A relation between elements of three sets is called a ternary relation. In general the number n of places in a relation R ⊆ D1 × D2 × · · · × Dn is called the arity of R.

1.3. Functions

A function is a binary relation R ⊆ D1 × D2 with the special property that, for each d1 ∈ D1 there is exactly one d2 ∈ D2 with (d1, d2) ∈ R. The set D1 is usually called the domain of the function and D2 is called its codomain, and we say that R is a function from D1 to D2.

For example, the 'greater than' relation R = {(d1, d2) ∈ N × N : d1 > d2} is not a function because for the choice d1 = 10, for instance, there are lots of different elements d2 ∈ N with (10, d2) ∈ R. For example (10, 1) ∈ R and (10, 2) ∈ R. Another reason R fails to be a function is that for d1 = 1 there is no d2 such that (d1, d2) ∈ R, because 1 is less than or equal to every natural number.

Then F is a function from D1 to D2. Note however that if we let D1 = N then F would fail to be a function because there would be no d2 ∈ D2 for which (1, d2) ∈ F.

The squaring relation G = {(d1, d2) ∈ N × N : d2 = d1 · d1} is a function: whichever d1 we pick in N there is exactly one choice of d2 (namely d2 = d1 · d1) for which (d1, d2) ∈ G.
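The "exactly one d2 for each d1" condition is easy to test mechanically. A small Python sketch, using a finite stand-in for the sets in the examples above (the function name and sets are our own choices):

```python
def is_function(R, D1):
    # R is a set of pairs; R is a function from D1 when every d1 in D1
    # has exactly one partner d2 with (d1, d2) in R.
    return all(len({y for (x, y) in R if x == d1}) == 1 for d1 in D1)

D = {1, 2, 3, 4}
G = {(x, x * x) for x in D}                   # squaring: a function on D
R = {(x, y) for x in D for y in D if x > y}   # 'greater than': not a function

assert is_function(G, D)
# R fails twice over: 1 has no partner, while 3 has the partners 1 and 2.
assert not is_function(R, D)
```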

In Example 1.3.3 there was a rule associated with the relation, namely square the first element to get the second one. Similarly, all functions can be thought of as rules. Given an element d1 in the domain, the rule produces the unique element d2 in the codomain for which (d1, d2) ∈ R. If we call the function f, then we use the notation f(d1) = d2 to describe the fact that d2 depends on d1. When discussing functions we will frequently use the traditional notation

f : D1 → D2

and say that f is a function from D1 to D2 or, more briefly, f maps D1 to D2. In this notation, the subset of D1 × D2 which we have used to define the function is called the graph of the function. As shown in calculus or precalculus courses, a function specified as a rule determines a unique graph. As outlined above, the graph also determines the rule. Hence either the rule or the subset of D1 × D2 characterises the function.

Since a function f is just a binary relation, it always has an inverse f⁻¹ as defined in 1.2.1, but there is no guarantee that the inverse will be a function.


For example, the squaring function on the real numbers R can be written in ordered pair notation as f = {(x, x²) : x ∈ R}. Its inverse,

f⁻¹ = {(x², x) : x ∈ R},

the square root relation, is not a function because, for example, it contains both the pairs (4, 2) and (4, −2).

When the domain of a function f is finite (as is often the case in computer science applications) we can represent it using a table by simply tabulating all of the x and f(x) values. For example, a function f mapping {0, 1, 2, 3} to {0, 1, 2, 3} may be defined by the following table.

x      0  1  2  3
f(x)   3  2  1  0

We could even list all possible functions mapping {0, 1, 2, 3} to {0, 1, 2, 3} in a single table as follows.

x         0  1  2  3
f1(x)     0  0  0  0
f2(x)     0  0  0  1
f3(x)     0  0  0  2
...
f256(x)   3  3  3  3

Since there are 4⁴ = 256 such functions, we have omitted most of them!

If f : D1 → D2 is a function and S is a subset of D1, there is an associated function with domain S and codomain D2 which takes exactly the same values as f, but only for elements of S. This restriction of f to S is sometimes written as f|S, so f|S(x) = f(x) for all x ∈ S. In the ordered pair notation for a function,

f|S = {(x, y) ∈ f : x ∈ S}.

1.3.1. Partial functions. A partial function is a relation R ⊆ D1 × D2 with the property that, for each d1 ∈ D1 there is at most one d2 ∈ D2 with (d1, d2) ∈ R. Unlike a function, we don't insist that a partial function be defined at every d1 ∈ D1. Every function f : D1 → D2 is a partial function satisfying the additional condition that for every d1 ∈ D1 there is at least one d2 ∈ D2 such that (d1, d2) ∈ f, so functions are a special case of partial functions.

The 'greater than' relation R considered earlier is not a partial function because when d1 = 10, for instance, there are lots of different elements d2 ∈ N with (10, d2) ∈ R. For example (10, 1) ∈ R and (10, 2) ∈ R.


On the other hand, F is a partial function from D1 to D2 even when D1 = N. It doesn't matter that there is no d2 ∈ D2 for which (1, d2) ∈ F. In Example 1.3.2 we observed that F is not a function.

Partial functions arise in many settings in computer science. We will see many examples of them in our study of automata. As for functions, we can specify a partial function on a finite set using a table. The only difference is that we use the symbol ⊥ to indicate that the partial function is not defined for certain input values.

Example 1.3.8. The following table gives the values of the square root relation

S = {(x, y) ∈ D × D : x = y²}

on the set D = {0, 1, 2, 3, 4, 5} and illustrates the fact that S is indeed a partial function.

x      0  1  2  3  4  5
f(x)   0  1  ⊥  ⊥  2  ⊥
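The same check as for functions, with "exactly one" relaxed to "at most one", tests for a partial function. A short Python sketch of Example 1.3.8 (the names are ours):

```python
def is_partial_function(R, D1):
    # At most one partner per element of D1 (a function demands exactly one).
    return all(len({y for (x, y) in R if x == d1}) <= 1 for d1 in D1)

D = {0, 1, 2, 3, 4, 5}
# The square root relation of Example 1.3.8: pairs (y*y, y) within D.
S = {(x, y) for x in D for y in D if x == y * y}

# S is a partial function: 2, 3 and 5 simply have no partner in D.
assert is_partial_function(S, D)
# Two partners for the same input rules a relation out.
assert not is_partial_function({(1, 2), (1, 3)}, {1})
```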

1.4. An alternative view of Cartesian products

In accordance with usual practice in elementary mathematics, we introduced the Cartesian product D1 × D2 × · · · × Dn as the set of all ordered n-tuples (d1, d2, . . . , dn) where d1 ∈ D1, d2 ∈ D2, . . . , dn ∈ Dn. Another equivalent way of viewing this product is as the set of all functions f with domain³

Nn = {1, 2, . . . , n}

that have the property that f(i) ∈ Di for each i ∈ Nn. Although the equivalence of this approach may not seem obvious at first sight, notice that:

(a) For each function f having the above property, it is true by definition that the n-tuple (f(1), f(2), . . . , f(n)) is an element of D1 × D2 × · · · × Dn as traditionally defined.

(b) For each (d1, d2, . . . , dn) ∈ D1 × D2 × · · · × Dn the function

f : Nn → D1 ∪ D2 ∪ · · · ∪ Dn : i ↦ di

has the above property.

Together (a) and (b) show that there is a one to one correspondence between the Cartesian product as traditionally defined and the set of functions defined above.

³ In case you are worried about it, D1 ∪ D2 ∪ · · · ∪ Dn provides a suitable codomain for these functions.

In the context of this representation of Cartesian products, the set Nn which is the domain of all of the functions in the product is called the index set for the product. One of the advantages of this representation is that we can actually use any finite set in place of the standard index set Nn. This allows meaningful names to be used for the 'coordinates' or 'places' in the product. In the theory of relational databases, this corresponds to the named perspective on the concept of Cartesian product and it is the perspective we will adopt when discussing the theory of relational databases in the next few chapters.

We normally represent a point in the plane by a pair of coordinates where the horizontal or x coordinate comes first and the vertical or y coordinate comes second. Upon reflection, it should be clear that the order of these coordinates is not important. What really matters is that we don't mix up the horizontal and vertical coordinates. Listing them in a particular order is merely one way of doing that. We could just as easily represent the plane as the set of all functions p : {h, v} → R. In this representation, our index set is {h, v}, the horizontal coordinate of the point represented by p is p(h) and p(v) is the vertical coordinate. For example, the point we normally write as (1, 2) would be represented by the function p : {h, v} → R where p(h) = 1 and p(v) = 2.
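In a programming language the function view of tuples is very natural: a Python dict is literally a finite function from its keys to its values. A small sketch of the {h, v} example (the names are ours):

```python
# A point in the plane as a function from the index set {"h", "v"} to R,
# modelled as a dict.
p = {"h": 1, "v": 2}        # the point traditionally written (1, 2)

# The tuple view and the function view carry the same information:
assert (p["h"], p["v"]) == (1, 2)

# Named coordinates need no fixed order: {"v": 2, "h": 1} is the
# same function, whereas the tuples (1, 2) and (2, 1) differ.
assert p == {"v": 2, "h": 1}
assert (1, 2) != (2, 1)
```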

We call this the function representation of Cartesian products, as opposed to the n-tuple representation. This representation of Cartesian products turns out to be particularly useful in the mathematical description of relational databases because we often want to associate meaningful names or labels with the places in a Cartesian product or a relation. It also avoids the need to order the places, a significant theoretical advantage from the relational database point of view.

For example, the 'lives in' relation can be viewed as a set of functions f with domain {name, residence} such that f(name) ∈ D1 = {Andrew, Michelle, Tracey} and f(residence) ∈ D2 = {Bundoora, Greensborough, Heidelberg}. Its three elements f1, f2, f3 are given by the following table.


x name residence

f1 (x) Andrew Greensborough

f2 (x) Michelle Bundoora

f3 (x) Tracey Greensborough

The fact that the function representation of the Cartesian product allows us to

describe this relation using a table will be particularly relevant for description of

relations in relational databases, as discussed in the next chapter. In this context,

we typically omit the names of the functions (unless we need them for some

reason), so the table becomes a bit simpler:

name residence

Andrew Greensborough

Michelle Bundoora

Tracey Greensborough

The function representation of Cartesian products makes it easy to define the product of infinitely many sets. For example, the set of all infinite sequences of natural numbers is an infinite product of copies of N. This just means the set of all possible functions S : N → N. The index set is now N. Similarly, the set of all infinite sequences of real numbers is an infinite product of copies of R, which just means the set of all functions S : N → R. Such products are of vital importance in many branches of mathematics.

1.5. Combining relations and functions

Recall that, given two sets R and S, the union of R and S, denoted by R ∪ S, is the set of all elements which are either in R or S (possibly both), and the intersection of R and S, denoted by R ∩ S, is the set of all elements which are both in R and S. Thus

R ∪ S = {x : x ∈ R or x ∈ S}

and

R ∩ S = {x : x ∈ R and x ∈ S}.

The set difference (often denoted by R − S but here by R \ S to avoid any confusion with subtraction) consists of the elements of R which are not in S, i.e.

R \ S = {x ∈ R : x ∉ S}.

Venn diagrams for these three operations are given in Figure 1. It is fairly easy to see that if R and S are both subsets of the same Cartesian product D1 × D2 × · · · × Dn, i.e. are both relations between elements of D1, D2, . . . , Dn, then so are R ∪ S, R ∩ S and R \ S.

(Figure 1: Venn diagrams for R ∪ S, R ∩ S and R \ S.)

Example 1.5.1.

(a) If R is the 'less than' relation {(m, n) : m, n ∈ N, m < n} on the natural numbers N and S is the 'equals' relation {(m, m) : m ∈ N} on N, then R ∪ S is the 'less than or equal to' relation on N.

(b) If R is the 'less than or equal to' relation {(m, n) : m, n ∈ N, m ≤ n} on N and S is the 'greater than or equal to' relation {(m, n) : m, n ∈ N, m ≥ n} on N, then R ∩ S is the 'equals' relation on N.

(c) If R is the 'less than or equal to' relation {(m, n) : m, n ∈ N, m ≤ n} on N and S is the 'equals' relation {(m, m) : m ∈ N} on the natural numbers, then R \ S is the 'less than' relation on N.
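Parts (a)-(c) can be checked mechanically on a finite slice of N. A small Python sketch (the slice and the names are our own choices):

```python
N = range(0, 10)  # a finite stand-in for the natural numbers

LT = {(m, n) for m in N for n in N if m < n}    # 'less than'
EQ = {(m, m) for m in N}                        # 'equals'
LE = {(m, n) for m in N for n in N if m <= n}   # 'less than or equal to'
GE = {(m, n) for m in N for n in N if m >= n}   # 'greater than or equal to'

assert LT | EQ == LE   # (a): the union of < and = is <=
assert LE & GE == EQ   # (b): the intersection of <= and >= is =
assert LE - EQ == LT   # (c): removing = from <= leaves <
```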

We can also take unions, intersections and differences of pairs of relations that are subsets of different Cartesian products, but this only yields a relation in cases where the arities (the numbers of sets in the two products) are the same. If R ⊆ D1 × D2 × · · · × Dn and S ⊆ E1 × E2 × · · · × En it turns out that

R ∪ S ⊆ (D1 ∪ E1) × (D2 ∪ E2) × · · · × (Dn ∪ En)

and

R \ S ⊆ D1 × D2 × · · · × Dn.

Example 1.5.2.

(a) Let F be the set of female mathematics students and M be the set of male mathematics students at La Trobe University. The union of the relations S = {(x, y) : x, y ∈ F, x is a sister of y} and B = {(x, y) : x, y ∈ M, x is a brother of y} is the 'same sex sibling' relation on the set F ∪ M of all mathematics students. Note that S ∪ B is not the same as the full sibling relation on F ∪ M, since it does not contain any (brother, sister) or (sister, brother) pairs.

(b) It makes no sense to take the union of the (binary) 'less than' relation {(a, b) : a < b} ⊆ N × N with the (ternary) addition relation {(a, b, c) : c = a + b} ⊆ N × N × N, because the union of these two sets would be a mixture of 3-tuples and 2-tuples and therefore not a relation. The intersection or difference of these relations would not be a relation either.

Since a function is by definition a subset of a Cartesian product of two sets, the domain and codomain, we can always take the union of two functions. The resulting union may or may not turn out to be a function.

Example 1.5.3.

(a) The union of the functions f = {(x, √x) : x ∈ R, x ≥ 0} and g = {(x, −√x) : x ∈ R, x ≥ 0} is the square root relation pictured in Figure 2. Although f ∪ g is the inverse of the function {(x, x²) : x ∈ R}, it is not a function itself.

(b) The union of f = {(x, x) : x ∈ R, x ≥ 0} and g = {(x, −x) : x ∈ R, x ≤ 0} is a function, known as the absolute value function, usually written |x|.

(Figure 2: the square root relation and the graph of the absolute value function.)

In general, if f : D1 → D2 and g : D3 → D4 are functions, then f ∪ g is a function precisely when f and g agree on D1 ∩ D3, i.e. when f(x) = g(x) for all x-values in D1 ∩ D3. In particular, when D1 ∩ D3 is empty, f ∪ g will indeed be a function. The Lemma still holds in these cases because, from a logician's point of view, it is (vacuously) true that f(x) = g(x) for all x-values in D1 ∩ D3.
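The agreement condition is easy to test when functions are stored as dicts. A small Python sketch of Example 1.5.3(b) on finite slices of R (our own names):

```python
def union_is_function(f, g):
    # f and g as dicts; f union g is a function exactly when f and g
    # agree at every element of the common domain.
    return all(f[x] == g[x] for x in f.keys() & g.keys())

f = {x: x for x in range(0, 5)}      # f = {(x, x) : x >= 0}, finite slice
g = {x: -x for x in range(-4, 1)}    # g = {(x, -x) : x <= 0}, finite slice

# The domains overlap only at x = 0, where both functions give 0,
# so the union is a function: the absolute value function.
assert union_is_function(f, g)
absval = {**f, **g}
assert absval[-3] == 3 and absval[3] == 3
```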

Chapter Two

Properties of Binary Relations

Binary relations on finite sets can be represented using directed graphs. These can be used to understand important special properties like reflexivity, symmetry and transitivity possessed by many binary relations. When a relation does not have one of these properties, we can often find a larger relation which does. The smallest such relation is the closure of the original relation with respect to the given property.

2.1. Directed graphs of relations

There are two quite different ways of picturing a binary relation R ⊆ D × D for some set D. The first of these was introduced in Example 1.1.2 and only makes sense if D is a set of numbers. The second, using directed graphs, can be applied to any (finite) set D and is particularly helpful when we are trying to understand various special properties of binary relations. The idea is very simple.

For each element d ∈ D we draw a vertex (a dot) and label it d. For each (d, e) ∈ R we draw a directed edge (an arrow) from d to e (so we must draw a loop in the case where d = e).

Example 2.1.1. Let D = {1, 2, 3} and let R be the 'less than or equal to' relation R = {(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)}. The directed graph has vertices 1, 2, 3 with a loop at each vertex and edges from 1 to 2, from 1 to 3 and from 2 to 3.

For the same set, the directed graph representing the 'strictly less than' relation R = {(1, 2), (1, 3), (2, 3)} has the same three edges but no loops.



The equality relation R = {(1, 1), (2, 2), (3, 3)} has a directed graph consisting of just a loop at each of the three vertices.

We have just seen how easy it is to draw the directed graph of a relation written as a finite set of ordered pairs. In the opposite direction, it is just as easy to write down an ordered pair description of the relation represented by a given directed graph. In fact, from a mathematical point of view, a binary relation on a set S is essentially the same thing as a directed graph with vertex set S.

2.2. Properties of binary relations

We return for now to some ideas about binary relations that are not particularly relevant to the theory of relational databases. It will be convenient to use the ordered pair representation for these relations. A relation R ⊆ X × X is said to be symmetric if whenever x is related to y then y is related to x (i.e. if whenever (x, y) ∈ R then (y, x) ∈ R). It is called antisymmetric if whenever (x, y) ∈ R and x ≠ y then (y, x) ∉ R (alternatively, whenever both (x, y) ∈ R and (y, x) ∈ R then x = y). It is said to be reflexive if every element is related to itself (i.e. if (x, x) ∈ R for each x ∈ X), irreflexive if no element is related to itself, and transitive if whenever (x, y) ∈ R and (y, z) ∈ R then (x, z) ∈ R.

It can be helpful to picture these properties using a notation ∼ for binary relations instead of (x, y) ∈ R. Then we have

(a) Symmetry: x ∼ y implies y ∼ x
(b) Antisymmetry: x ∼ y and y ∼ x implies x = y
(c) Reflexivity: x ∼ x for each x
(d) Irreflexivity: x ∼ x is false for each x
(e) Transitivity: x ∼ y and y ∼ z implies x ∼ z.
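Each of these properties is a one-line test when the relation is stored as a set of pairs. A small Python sketch (the function names are our own), checked against the 'less than or equal to' relation on {1, 2, 3}:

```python
def is_reflexive(R, X):
    return all((x, x) in R for x in X)

def is_symmetric(R):
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

X = {1, 2, 3}
LE = {(x, y) for x in X for y in X if x <= y}

assert is_reflexive(LE, X) and is_antisymmetric(LE) and is_transitive(LE)
assert not is_symmetric(LE)   # (1, 2) is in LE but (2, 1) is not
```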

Stated so abstractly, these definitions can be difficult to understand. This is where directed graphs come to the rescue. Each of these properties translates directly into a property of the directed graph representing the binary relation. In all cases (with the possible exception of transitivity) it is very easy to decide whether a graph has the required property.

(a) Symmetry: If there is a directed edge from a to b there must also be one from b to a. Edges between distinct vertices always occur in oppositely directed pairs.

(b) Antisymmetry: There are no pairs of distinct vertices a and b with both an edge from a to b and an edge from b to a.

(c) Reflexivity: There is a loop at every vertex.

(d) Irreflexivity: There are no loops at all.

(e) Transitivity: Wherever you find an edge from a to b and an edge from b to c, there must also be an edge from a to c.

For example, if a relation is both symmetric and transitive, then wherever there is an edge from a to b there is also an edge from b to a, so there must be a loop at a and therefore also a loop at b.

In addition to the above properties, note that concepts like being a partial function and being a function are also properties of binary relations. These too admit simple interpretations in terms of directed graphs.

(a) Partial Function: There is at most one edge coming out of every vertex; you can never have two edges leaving the same vertex.

(b) Function: There is exactly one edge coming out of every vertex; you can never have two edges leaving the same vertex, nor a vertex with no out edge at all.

Of course, drawing a directed graph is only practical when the set D on which the relation is defined is finite. Nonetheless, the above graphical properties can assist us to understand the meaning of properties of binary relations.

2.2.2. Partial orders and equivalences. Two types of binary relations are particularly important in computer science:

A relation which is reflexive, transitive and antisymmetric is known as a partial order.

A relation which is symmetric, transitive and reflexive is called an equivalence relation. For an equivalence relation R it is customary to write x ∼R y (or just x ∼ y when the relation is clear) rather than (x, y) ∈ R.

Example 2.2.1.

(a) Let R = {(n, n) : n ∈ Z} (the equality relation on Z). Then R is symmetric (because x = y implies y = x), reflexive and transitive, so is an equivalence relation. It is also antisymmetric in a subtle way (because we never have (x, y) ∈ R and x ≠ y), so is also a partial order.

(b) Let R = {(x, y) ∈ Z × Z : x ≤ y} (the 'less than or equal to' relation on Z). Then R is reflexive, transitive and antisymmetric (because when x ≤ y and x ≠ y then y ≰ x). It is not symmetric (because, for example, (1, 2) ∈ R but (2, 1) ∉ R). Hence it is a partial order (as we would hope) but not an equivalence relation.

(c) Let R = {(x, y) ∈ Z × Z : x < y} (the 'strictly less than' relation on Z). Then R is transitive and antisymmetric but neither reflexive nor symmetric, so it is not a partial order.

(d) Let R = {(x, y) ∈ Z × Z : x − y is a multiple of 2}. Then R is symmetric (because if x − y = 2m then y − x = 2(−m)), reflexive (because x − x = 2 · 0) and transitive (because if x − y = 2m and y − z = 2n then x − z = 2(m + n)). It is therefore an equivalence relation. It is not antisymmetric because (1, 3) ∈ R with 1 ≠ 3 and (3, 1) ∈ R.

(e) Let R = {(x, y) ∈ Z × Z : x is a factor of y}. Then R is reflexive (because x = 1 · x for each x), transitive (because if y = mx and z = ny then z = (nm)x) and not symmetric (because 2 is a factor of 4 but 4 is not a factor of 2). It is also not antisymmetric, because both (2, −2) ∈ R and (−2, 2) ∈ R although 2 ≠ −2.

Suppose that a set S is a union of a finite number S1, S2, . . . , Sn of disjoint sets; such a collection of disjoint sets covering S is called a partition of S. Define x R y if x and y belong to the same set of the partition. Then R is reflexive, symmetric and transitive, so is an equivalence relation. The same argument applies when S is a union of an infinite number of disjoint sets. Hence every partition gives rise to an equivalence relation.

It is also true that every equivalence relation gives a partition, so that partitions and equivalence relations are just two different names for essentially the same thing. Given x ∈ S let [x] be the set of all elements related to x, i.e.

[x] = {y ∈ S : (x, y) ∈ R} = {y ∈ S : x ∼ y}

where R is the equivalence relation. [x] is called the equivalence class containing x. Note that if x ∈ S, then x ∈ [x]. It turns out that the set of all equivalence classes forms a partition of S, as we will now see.

Since R is reflexive, it is clear that every element x ∈ S belongs to [x], so S is the union of all the equivalence classes. It is not quite so clear that if [x] ≠ [y] then [x] and [y] are disjoint. Suppose that there exists z ∈ [x] ∩ [y]. Then (x, z) ∈ R and (y, z) ∈ R, i.e. x ∼ z and y ∼ z. From symmetry (z, y) ∈ R, i.e. z ∼ y, and then by transitivity, (x, y) ∈ R, i.e. x ∼ y. Hence y ∈ [x]. Then, if s ∈ [y], y ∼ s so, by transitivity and x ∼ y, x ∼ s. Hence s ∈ [x]. It follows that [y] ⊆ [x]. Similarly [x] ⊆ [y] and so [x] = [y]. Thus if [x] ≠ [y] then [x] and [y] are disjoint.
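The construction of equivalence classes can be carried out mechanically on a finite set. A small Python sketch using the 'difference is a multiple of 2' relation on a finite slice of Z (the names are ours):

```python
def equivalence_classes(R, S):
    # The classes [x] = {y in S : (x, y) in R}; collecting them in a set
    # of frozensets discards duplicates, leaving the partition.
    return {frozenset(y for y in S if (x, y) in R) for x in S}

Z = range(-5, 6)   # a finite slice of the integers
R = {(x, y) for x in Z for y in Z if (x - y) % 2 == 0}

classes = equivalence_classes(R, Z)
# Exactly two classes: the even numbers and the odd numbers in the slice.
assert len(classes) == 2
assert frozenset({-4, -2, 0, 2, 4}) in classes
```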


Example 2.2.2.

(a) For the example R = {(n, n) : n ∈ Z}, each element is related only to itself, i.e. [n] = {n} for each n. Hence each equivalence class contains exactly one number. There are infinitely many equivalence classes, giving a partition of Z into infinitely many sets.

(b) For the example R = {(x, y) ∈ Z × Z : x − y is a multiple of 2}, the equivalence class containing 0 consists of the even numbers and the equivalence class containing 1 consists of the odd numbers, i.e.

[0] = {y : y − 0 is a multiple of 2} = {. . . , −2, 0, 2, 4, . . . }

and

[1] = {y : y − 1 is a multiple of 2} = {. . . , −3, −1, 1, 3, . . . }.

In this example the associated partition contains just two sets, each with infinitely many elements.

2.3. Closures of binary relations

When a binary relation R ⊆ D × D lacks a certain desirable property we can often extend it to one that has the property by adding some new ordered pairs.

When R is not reflexive, we can make it reflexive by adding all pairs (x, x), i.e. by replacing R by ρ(R) = R ∪ {(x, x) : x ∈ D}. Not only is ρ(R) reflexive, but it is also the smallest reflexive relation containing R (since a reflexive relation containing R is forced to contain both R and each (x, x) where x ∈ D). We call ρ(R) the reflexive closure of R. We obtain the directed graph representation of ρ(R) by starting with the directed graph representation of R and adding a loop at any vertex where none exists.

In a similar way we can define the symmetric closure σ(R) of R by σ(R) = R ∪ {(y, x) : (x, y) ∈ R}. We obtain the directed graph representation of σ(R) from that of R by adding an edge from b to a wherever we see one from a to b but none from b to a.

It is less obvious how to construct the transitive closure of R. Certainly, it must contain all pairs (x, z) where (x, y) ∈ R and (y, z) ∈ R. However it must also contain all pairs (x, z) for which there exist y1, y2 with (x, y1) ∈ R, (y1, y2) ∈ R and (y2, z) ∈ R, i.e. for which there is a three step process x → y1 → y2 → z. In a similar way the transitive closure of R must contain each pair (x, z) obtained in n steps x → y1 → y2 → · · · → yn−1 → z for any n. In fact this is all we need, and the resulting relation is the smallest transitive relation containing R, called the transitive closure of R and denoted τ(R). The next example illustrates its construction using directed graph representations.


Consider the 'successor' relation R = {(1, 2), (2, 3), (3, 4), (4, 5)} on the set D = {1, 2, 3, 4, 5}, represented by a directed graph with an edge from each number to the one after it. Adding edges from a to c wherever we see edges from a to b and from b to c yields a graph with the extra edges (1, 3), (2, 4) and (3, 5). But we are not yet done. The relation represented by this graph is still not transitive because, for example, there is an edge from 1 to 3 and an edge from 3 to 4, but no edge from 1 to 4. So, repeating the procedure used above gives a graph which does in fact represent a transitive relation (in fact it represents the relation < on D). Here we repeated the procedure twice to obtain the graph of τ(R). In general, we may need to repeat this procedure several times to obtain the transitive closure.
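The repeat-until-stable procedure just described can be written down directly. A small Python sketch (the names are our own), checked against the successor relation on D = {1, 2, 3, 4, 5}:

```python
def transitive_closure(R):
    # Repeatedly add (x, z) whenever (x, y) and (y, z) are both present,
    # stopping once a pass adds nothing new.
    T = set(R)
    while True:
        new = {(x, z) for (x, y) in T for (w, z) in T if y == w} - T
        if not new:
            return T
        T |= new

D = {1, 2, 3, 4, 5}
R = {(d, d + 1) for d in D if d + 1 in D}   # successor edges 1->2->...->5

# The closure of the successor relation is 'strictly less than' on D.
assert transitive_closure(R) == {(x, y) for x in D for y in D if x < y}
```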

Applying the reflexive closure, then the symmetric closure and then the transitive closure to any relation R produces a relation which is reflexive, symmetric and transitive on any set (this is not quite obvious) and is the smallest such relation containing R. It is therefore the equivalence relation generated by R.

(a) Let R = {(x, y) ∈ Z × Z : x ≤ y}, as considered in Example 2.2.1(b). Recall that R is reflexive and transitive, but not symmetric. Its symmetric closure σ(R) is R ∪ {(x, y) : (y, x) ∈ R} = R ∪ {(x, y) : y ≤ x} = {(x, y) : x ≤ y or y ≤ x} = Z × Z. Thus Z × Z is the smallest equivalence relation containing R.

If x ∈ Z then the equivalence class [x] containing x is {y ∈ Z : (x, y) ∈ Z × Z} = Z. Hence the partition corresponding to σ(R) = Z × Z contains only one equivalence class (the whole of Z).

(b) Let R = {(x, y) ∈ Z × Z : x < y}, as considered in Example 2.2.1(c). R is transitive but neither reflexive nor symmetric. Its reflexive closure ρ(R) is R ∪ {(x, x) : x ∈ Z} = {(x, y) ∈ Z × Z : x < y} ∪ {(x, y) ∈ Z × Z : x = y} = {(x, y) ∈ Z × Z : x ≤ y}. From (a) it follows that the smallest equivalence relation containing R must also be Z × Z.


(c) Let R = {(x, y) ∈ Z × Z : x is a factor of y}, as considered in Example 2.2.1(e), which is reflexive and transitive but not symmetric. Its symmetric closure is σ(R) = R ∪ {(x, y) : (y, x) ∈ R} = R ∪ {(x, y) : x is a multiple of y} = {(x, y) : x is a factor of y or x is a multiple of y}. This is no longer transitive since, for example, (8, 24) ∈ σ(R) and (24, 6) ∈ σ(R) (because 8 is a factor of 24 and 24 is a multiple of 6) but (8, 6) ∉ σ(R). It can be shown that the equivalence relation generated by R is Z × Z. (Notice that [1] = {y : 1 is a factor of y} = Z.)

A related problem is the following: given a collection P0 of (possibly overlapping) subsets of a set S, produce a partition P such that every element of P0 is contained in some element of P. This is really just the idea of finding the equivalence relation generated by some relation, but it is a little easier because we don't have to worry about symmetry.

Algorithm 2.1. Starting from P0, define a sequence of collections P1, P2, . . . as follows:

For each set A in Pi let A′ be the union of A and all the sets in Pi that
intersect A, and define Pi+1 = {A′ : A ∈ Pi}.

It can be shown that Pn = Pn+1 for some n.

Let P = Pn ∪ W, where W is the collection of all singletons {x} such that
x ∈ S but x ∉ A for any A ∈ Pn.

P is then a partition and each element of P0 is contained in some element of P.

With S = {1, 2, . . . , 10} and

P0 = {{1, 2}, {3, 2}, {3, 5}, {4, 7}, {7, 9}},

applying Algorithm 2.1 gives

P1 = {{1, 2, 3}, {3, 2, 5}, {4, 7, 9}}
P2 = {{1, 2, 3, 5}, {4, 7, 9}}
P3 = {{1, 2, 3, 5}, {4, 7, 9}}

so we stop calculating the Pi and let P = {{1, 2, 3, 5}, {4, 7, 9}, {6}, {8}, {10}}.

As you can see from the example, Algorithm 2.1 could easily be applied for small
P0 by simple inspection.
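The merging step is easy to mechanise. The following sketch (in Python; the function name and representation are ours, not from the notes) repeatedly applies the merging step of Algorithm 2.1 until the collection stabilises, then adds the singleton sets:

```python
def coarsen(p0, s):
    """Apply Algorithm 2.1: merge intersecting sets of p0 until the
    collection stabilises, then add a singleton {x} for each x in s
    not covered by any set."""
    p, nxt = [], [set(a) for a in p0]
    while p != nxt:
        p = nxt
        nxt = []
        for a in p:
            merged = set(a)          # A' = A union all sets of Pi meeting A
            for b in p:
                if a & b:
                    merged |= b
            if merged not in nxt:    # duplicates collapse, as in a set of sets
                nxt.append(merged)
    covered = set().union(*p) if p else set()
    return p + [{x} for x in sorted(s - covered)]
```

For the example above, coarsen([{1, 2}, {3, 2}, {3, 5}, {4, 7}, {7, 9}], set(range(1, 11))) returns the partition {{1, 2, 3, 5}, {4, 7, 9}, {6}, {8}, {10}} (as a list of sets), although the intermediate collections it builds may differ slightly from the Pi listed above.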

Chapter Three

Finite State Machines

A finite state machine is an abstract way of modelling a range

of mechanical or electronic devices such as vending machines,

simple flip-flops or entire computers. It can also be used to

model certain computer programs and non-hardware systems,

such as network protocols.

The classic reference for this material is Introduction to Automata Theory, Languages and Computation by John Hopcroft and Jeffrey
Ullman (Addison-Wesley, 1979), but this book is quite technical and difficult as

a first account. More readable extended accounts are given in An Introduction

to Formal Languages and Automata by Peter Linz (D. C. Heath & co., 1990)

and Automata and Formal Languages: An Introduction by Dean Kelley (Prentice-Hall, 1995). Shorter, more elementary treatments appear in chapters of Discrete

Mathematics by Richard Johnsonbaugh (Macmillan, 3rd edition, 1993), Discrete

Mathematical Structures by B. Kolman, R. Busby and S. Ross (Prentice Hall,

1996) and in Doerr and Levasseur.

There are several, slightly different, ways of describing a finite state machine or

finite state automaton. We will consider a machine which can be in any of finitely

many internal states {q1 , . . . , qr }, can process a finite set {a1 , . . . , am } of allowable

inputs and can deliver a finite set {z1 , . . . , zn } of defined outputs. When the

system processes an input it may change states, so there is a transition function
δ which describes exactly how it does that. This function maps a pair (x, s)
representing the current input and the current state to the next state δ(x, s).
There may also be an output function f which maps (x, s) to an output f(x, s).

In summary, our definition of a (deterministic) finite state machine consists of 5

parts:

Q = {q1, . . . , qr} is the set of states
Σ = {a1, . . . , am} is the input alphabet
Z = {z1, . . . , zn} is the output alphabet
δ : Σ × Q → Q is the transition function
f : Σ × Q → Z is the output function.


Many mechanical devices, such as vending machines and electrical circuits, can

be modeled as finite state machines.

Example 3.1.1. For simplicity, consider a machine which sells two items A and

B, both costing $2. The set of inputs is {select A, select B, deposit $2}, the set

of outputs is {release A, release B, release nothing }, and the set Q of states is

{permit release, forbid release}. The transition function δ has, for example,

δ(deposit $2, forbid release) = permit release
δ(select A, forbid release) = forbid release
δ(select A, permit release) = forbid release

and the output function f has, for example,

f(deposit $2, forbid release) = release nothing
f(deposit $2, permit release) = release nothing
f(select A, permit release) = release A
f(select A, forbid release) = release nothing.

Notice that when A is selected and the machine permits release then A is released

and the state is moved to forbid release, so a further $2 is needed before more

items can be released. The alert student might spot that this machine is biased in

favour of the owner: it allows the purchaser to pay more than $2 for an item!

There are two convenient ways to describe a finite state machine. The first is to

use a pair of function tables to describe the transition and output functions. For

instance, in Example 3.1.1 with the obvious notation1, the transition function is

given by

x          sA    sB    d$2
δ(x, per)  for   for   per
δ(x, for)  for   for   per

while the output function is given by

x          sA    sB    d$2
f(x, per)  rA    rB    rn
f(x, for)  rn    rn    rn

The other way to describe the machine is by a directed graph with the vertices

labelled by the states and the edges, representing possible transitions between

states, labelled by the corresponding inputs and outputs. The graph for Example

3.1.1 is shown in Figure 1.

1. Since δ and f are functions of two variables, the function tables are slightly more complicated than the ones we have seen so far.


[Figure 1: the directed graph of the vending machine, with vertices per and for and each edge labelled input/output (for example sB/rB at per and sB/rn at for).]

Example 3.1.2. A finite state machine with the states carry and don't carry can
be used to add a pair of binary numbers which are input as a sequence of pairs of
binary digits. For example, to add 1101 and 11 then (starting from the right) the
pairs (1,1), (0,1), (1,0), (1,0), (0,0) are entered in turn. (The final (0,0) allows for
carry overs.) The transition function is

x          (0,0)  (0,1)  (1,0)  (1,1)
δ(x, c)     dc     c      c      c
δ(x, dc)    dc     dc     dc     c

To see this, note that we must carry if the total of the two inputs and any currently
carried digit is at least 2. The output function would be

x          (0,0)  (0,1)  (1,0)  (1,1)
f(x, c)     1      0      0      1
f(x, dc)    0      1      1      0

The output from a given string of inputs can be calculated, once we know the

initial state, the transition function and the output function. For example, to add

the numbers 1101 and 11 using the machine of Example 3.1.2 we can record the
output as follows.

Inputs    (1,1)  (0,1)  (1,0)  (1,0)  (0,0)
States     dc     c      c      c      c
Outputs    0      0      0      0      1

Reading the outputs from last to first gives 1101 + 11 = 10000.

Notice that we started in the state don't carry. This is known as the initial
state (or starting state).
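This calculation is easy to simulate. Here is a sketch in Python (the dictionary encoding and the function name are ours), using the transition and output tables of Example 3.1.2:

```python
# Transition and output tables of the serial binary adder (states 'c', 'dc').
DELTA = {('c', (0, 0)): 'dc', ('c', (0, 1)): 'c', ('c', (1, 0)): 'c', ('c', (1, 1)): 'c',
         ('dc', (0, 0)): 'dc', ('dc', (0, 1)): 'dc', ('dc', (1, 0)): 'dc', ('dc', (1, 1)): 'c'}
OUT = {('c', (0, 0)): 1, ('c', (0, 1)): 0, ('c', (1, 0)): 0, ('c', (1, 1)): 1,
       ('dc', (0, 0)): 0, ('dc', (0, 1)): 1, ('dc', (1, 0)): 1, ('dc', (1, 1)): 0}

def add_binary(x, y):
    """Feed digit pairs (least significant first) through the machine."""
    n = max(len(x), len(y)) + 1                  # extra step for a final carry
    xs = [int(d) for d in x.zfill(n)][::-1]
    ys = [int(d) for d in y.zfill(n)][::-1]
    state, out = 'dc', []
    for pair in zip(xs, ys):
        out.append(OUT[(state, pair)])           # record the output digit
        state = DELTA[(state, pair)]             # then move to the next state
    return ''.join(str(d) for d in reversed(out)).lstrip('0') or '0'
```

Calling add_binary('1101', '11') reproduces the trace above and returns '10000'.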


In some cases the states of a machine themselves govern the output, so that the

machine is determined by just the inputs, the states and the transition function2.

Example 3.2.1. A flip-flop is a device with two states:
in state one a positive voltage difference is maintained across a pair of terminals
and in state two the voltage difference is negative.

[Diagram: state s1 (low voltage) and state s2 (high voltage), each responding to an input signal.]

The inputs are 0 (no impulse) and 1 (an impulse). The machine is then completely specified
by the transition function in the table below. Notice that the input 0 leaves the system unchanged,
whereas the input 1 causes a change of state.

x          0    1
δ(x, s1)   s1   s2
δ(x, s2)   s2   s1

When we draw the directed graph of a finite state machine without output each

edge has just one label, giving the corresponding input. For example, the flip-flop

of Example 3.2.1 has the following diagram.

[Diagram: s1 and s2 each with a 0-labelled loop, and 1-labelled edges from s1 to s2 and from s2 to s1.]

A particularly useful class of finite state machines are designed to accept or reject

a given string of inputs. These recognition machines or finite state automata have

two additional features: an initial state q0 and a set F of accepting states (also
known as final states). The machine accepts a given input string (or word) w
if, starting in the initial state, it finishes in an accepting state. In the directed

2. Machines with outputs are sometimes called transducers to distinguish them from those
without outputs. In fact the machines we described in Section 3.1 are a particular type of
transducer called a Mealy machine.


graph of a recognition machine, the accepting states are marked by double circles and the initial state q0 is marked by an incoming arrow as shown below.

[An accepting state q is drawn with a double circle; the initial state q0 is marked by an incoming arrow.]

Example 3.3.1. Consider the machine with states q0 and q1, input alphabet {a, b},
a b-labelled loop at each state and a-labelled edges from q0 to q1 and from q1 to
q0; q0 is the initial state and q1 is the only accepting state. Processing the input
word aabbab gives

inputs:    a   a   b   b   a   b
states:  q0  q1  q0  q0  q0  q1  q1

Since the final state q1 is accepting, the word is accepted. On the other hand, if
the input had been aabb (with the final ab deleted) then the final state would have
been q0 and the input word would have been rejected.

The set of input words accepted by a recognition machine is called the language of
the machine. For example, the language of the machine in Example 3.3.1 consists
of all words with an odd number of a's. To see this, note that the input b never

changes the state but the input a always does. Hence to move from q0 to q1

requires an odd number of inputs a.

Although recognition machines as defined here have no outputs, some authors prefer to say that they have outputs 0 and 1, with output 0
when the current state is not accepting and output 1 when the state is accepting.
The input word is accepted if the final output is 1.

The purpose of a recognition machine is to accept or reject the finite words from a

given alphabet. Two machines which accept precisely the same words are therefore

effectively the same and we say that they are equivalent. This gives an equivalence

relation on the class of all recognition machines.


[A machine with states q0, q1 and q2, a b-labelled loop at each state and a-labelled edges between the states; q0 is initial.]

The machine shown above accepts the same language as that of Example 3.3.1.

From here on, we will be studying a lot of recognition machines, as well as other

types of automata that either accept or reject a given input word or string w. It

will be useful to have some notations for describing the languages of such machines.

Where repeated symbols occur in a word, as in the above example, a power

notation is often used. Using this notation

0^4 = 0000,    1^2 0^3 = 11000,    ab^n = ab···b (with n b's)

and so on. This notation extends easily to cases where more than one symbol is

repeated. The language

{0101 · · · 01 (n repeats of 01) : n ≥ 0}

consisting of an arbitrary number of repeats of the word 01, for example, can be
written more briefly as

{(01)^n : n ≥ 0}.

Note that in this case the parentheses around the 01 are not part of the language

itself. They are used as a notation that shows the scope of the power. Without

them the expression

{01^n : n ≥ 0}

would be taken to mean

{0 11 · · · 1 (n 1's) : n ≥ 0}.

Another convenient notation is n_a(w), which counts the number of occurrences
of a particular alphabet symbol a in a given word w. Using this notation we can
write the language of the recognition machine of Examples 3.3.1 and 3.3.2 as

{w : n_a(w) is odd}.

The only problem with this notation is that it doesn't tell us what letters are
allowed to be in w apart from a. The star notation allows us to rectify this. For
a given alphabet Σ the set of all possible words or input strings made from the
symbols in Σ is denoted by Σ*. With this notation we can describe the language
of Examples 3.3.1 and 3.3.2 unambiguously as

{w ∈ {a, b}* : n_a(w) is odd}.

We will discuss the star notation further in the next chapter. Finally it is convenient
to have a notation for the null or empty word, the word with no symbols
at all. We write this as λ.

Recall that the transition function δ of a finite state machine acts on a pair (x, s)
of an input x and a state s to produce a new state δ(x, s). If we have a finite
word x1 . . . xn of inputs and a state S then we can produce a state by repeatedly
applying the transition function as follows.

δ*(x1, S) = δ(x1, S) is the state to which the state S
moves after the input x1.
δ*(x1 x2, S) = δ(x2, δ(x1, S)) is the state to which the state S
moves after the sequence of inputs x1 x2.
. . .
δ*(x1 x2 . . . xn, S) is the state to which the state S
moves after the sequence of inputs x1 x2 . . . xn.

Formally, the extended transition function δ* is defined recursively by

δ*(x1, S) = δ(x1, S)
δ*(x1 . . . xn+1, S) = δ(xn+1, δ*(x1 . . . xn, S))

For a machine with output, a similar function F giving the final output can be defined by

F(x1, S) = f(x1, S)
F(x1 . . . xn+1, S) = f(xn+1, δ*(x1 . . . xn, S))

where f(x, s) is the output when the machine is in state s and receives input x.


Example 3.5.1. For the machine of Example 3.3.1:

δ*(aaa, q0) = q1 (which shows aaa is accepted).
δ*(aaba, q0) = q1 (which shows aaba is accepted).
δ*(aa, q0) = q0 (which shows aa is not accepted).
δ*(aa, q1) = q1.
δ*(aaba, q1) = q0.

In general the word w ∈ {a, b}* is accepted provided δ*(w, q0) = q1.
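The recursive definition of δ* amounts to a simple loop over the input word. Here is a sketch in Python for the machine of Example 3.3.1 (the dictionary encoding is ours):

```python
# Transition function of Example 3.3.1: b never changes the state, a always does.
DELTA = {('q0', 'a'): 'q1', ('q0', 'b'): 'q0',
         ('q1', 'a'): 'q0', ('q1', 'b'): 'q1'}

def delta_star(word, state):
    """delta*(x1...xn, s): apply delta one input symbol at a time."""
    for x in word:
        state = DELTA[(state, x)]
    return state

def accepts(word):
    return delta_star(word, 'q0') == 'q1'   # q1 is the only accepting state
```

For instance accepts('aaa') and accepts('aaba') hold, while accepts('aa') does not, matching the calculations above.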

This example illustrates how δ* can be used to write down a definition of the
language of a recognition machine:

L = {w ∈ Σ* : δ*(w, q0) ∈ F}

where q0 is the initial state and F is the set of accepting states. In fact, a similar
idea applies to any state qi. We define the set of words that take us to an
accepting state if we start processing at qi. This set

S(qi) = {w ∈ Σ* : δ*(w, qi) ∈ F}

is called the suffix set of qi. The concept of a suffix set can make the task of
calculating the language of a machine easier by breaking up the calculation into
smaller steps.

Example 3.5.2. We calculate some suffix sets for the recognition machine shown.

[A machine with states q0, . . . , q5 and input alphabet {0, 1}; q2 and q4 are accepting and q5 is a sink.]

S(q5) = ∅.
S(q2) = {λ}.
S(q1) = {0}.
S(q4) = {1^n : n ≥ 0}.
S(q3) = {1^m 0 1^n : m, n ≥ 0}.

Using the above calculations of S(q1) and S(q3), the language of this machine is

S(q0) = {00} ∪ {1^m 0 1^n : m ≥ 1, n ≥ 0}.

Note that λ ∈ S(q2) and λ ∈ S(q4). This always happens for accepting states.

Once we reach q5, we are stuck there. A non-accepting state with this
property is called a sink or black hole.

The suffix set of a sink is always ∅.

Chapter Four

Regular Languages and Recognition Machines

Formal language theory underlies many aspects of computer sci-

ence including the specification of programming languages. Languages
may be specified by their grammars and, if these grammars
are of a particularly simple form, the languages produced
are each associated with a recognition machine.

are each associated with a recognition machine.

This material is covered in Chapters 2 and 3 of Linz and in Chapter 2 of Kelley. Shorter treatments are given
in Chapter 10 of Johnsonbaugh, Chapter 10 of Kolman, Busby and Ross and 14.2
and 14.3 of Doerr and Levasseur.

A formal language L over an alphabet Σ is just a set of finite strings (or words)
of elements of Σ, possibly including the empty string λ. Another way of saying
this is that L is any subset of Σ*. Although any set of strings is by definition
a language, for particular applications we usually want the construction of valid
strings to be governed by rules (as in ordinary language). To specify these rules,
called production rules, we use two disjoint sets, a set N of non-terminal symbols
and a set T of terminal symbols. The terminal symbols are really just the symbols
in our alphabet, while the non-terminal symbols never appear in any string in our
language. They are used along the way as we gradually build up valid strings using
the production rules. The non-terminal symbols include a special one, σ, known
as the starting symbol. We will obtain the words in the language by starting
from σ and repeatedly applying the production rules. The set of production rules
used to specify a language in this way is called a grammar for that language. A
regular grammar is one where all of the production rules take one of the following
two very simple forms:

(RG1) Replace a symbol A ∈ N by symbols tB where t ∈ T and B ∈ N. The
notation for a production rule of this type is A → tB. (It may help to
read this type of rule as replace A with tB or A goes to tB.)


(RG2) Replace a symbol A ∈ N by the null expression, i.e. delete it. The notation for a production rule of this type is A → λ. (It may
help to read this type of rule as delete A.)

A regular language is one that is generated by a regular grammar. This means that

every string or word in the language can be built up by the following algorithm.

Algorithm 4.1.

(a) Start with the string σ (and observe that there is precisely one non-terminal
symbol to begin with).

(b) While a non-terminal symbol remains in the string, either:

(i) use an appropriate production rule of type (RG1) to replace the non-

terminal symbol with a terminal symbol followed by a non-terminal

one (and observe that each time we do this there will be precisely

one non-terminal symbol present in the string).

(ii) Use a production rule of type (RG2) to delete the non-terminal
symbol from the string.

Once we perform step (ii), there will be no non-terminal symbols left, so

we are forced to stop.

There is a subtle point about the definition of a regular language that is easily

missed. It is sometimes possible to generate a regular language using a grammar

that is not regular. Example 4.1.3 illustrates this. The definition of a regular

language requires only that there exists at least one regular grammar for the
language, not that all grammars for that language are regular.

Example 4.1.1. Consider the regular grammar with terminal symbols {0, 1},
non-terminal symbols {σ, A, B} and starting symbol σ, consisting of the following production rules.

σ → 0A (rule of type (RG1))
A → 0A (rule of type (RG1))
A → 1B (rule of type (RG1))
A → λ (rule of type (RG2))
B → λ (rule of type (RG2))

Starting with the string σ we have only one choice since there is only one production
rule with σ on the left hand side (LHS). We must therefore replace σ by 0A.
We may now replace the A by 0A as many times as we like (including none), thus

building up a sequence of 0s:

0A, 00A, 000A, 0000A, . . .


As soon as we choose to use the rule A → 1B we get the string 0^n 1B for some
n ≥ 1 and at the next move we are forced to use the rule B → λ, so we end up
with 0^n 1. Alternatively, we may choose to use the rule A → λ and hence end
up with 0^n. The language generated by this grammar is therefore the set of all
strings of binary digits beginning with a positive number of 0s which may or may
not be followed by a single 1. We can write this language as the set

{0^n 1 : n ≥ 1} ∪ {0^n : n ≥ 1}.
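Algorithm 4.1 can be mechanised directly. The following Python sketch (the representation is ours; 'S' plays the role of the starting symbol σ) generates all words of the grammar of Example 4.1.1 up to a given length by breadth-first application of the production rules:

```python
# Production rules of Example 4.1.1; '' encodes lambda on the right-hand side.
RULES = {'S': ['0A'], 'A': ['0A', '1B', ''], 'B': ['']}

def words_up_to(max_len):
    """All words derivable from 'S' whose length is at most max_len."""
    done, frontier = set(), {'S'}
    while frontier:
        nxt = set()
        for s in frontier:
            nt = next((c for c in s if c in RULES), None)  # the one non-terminal, if any
            if nt is None:
                if len(s) <= max_len:
                    done.add(s)                            # fully terminal: a word
            else:
                for rhs in RULES[nt]:
                    t = s.replace(nt, rhs, 1)
                    # prune strings whose terminal part is already too long
                    if sum(c not in RULES for c in t) <= max_len:
                        nxt.add(t)
        frontier = nxt
    return done
```

Since every (RG1) step adds one terminal symbol, the pruning guarantees termination; words_up_to(3) yields {'0', '00', '000', '01', '001'}, agreeing with the set displayed above.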

The step-by-step process of using Algorithm 4.1 to obtain a word in the language
generated by a regular grammar is called derivation. We represent the steps in
a specific derivation using the symbol =⇒. For instance, the derivation of the
word 00001 from the grammar in Example 4.1.1 is

σ =⇒ 0A =⇒ 00A =⇒ 000A =⇒ 0000A =⇒ 00001B =⇒ 00001.

Regular languages correspond closely to the languages accepted by recognition machines. We start out by considering the easy direction of this correspondence.
Given a recognition machine, we can construct a regular grammar with the same language as follows.

(a) The non-terminal symbols are the labels of the states of the machine.
(b) The starting symbol is the initial state of the machine.
(c) The terminal symbols are the input alphabet of the machine.
(d) Construct a set of production rules by:
(i) adding an (RG1) rule A → tB for each arrow from A to B labelled
t in the graph;
(ii) adding an (RG2) rule A → λ for each accepting state A.
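The construction can be written out as a few lines of code. This sketch (in Python; the representation is ours) produces the rule set for the machine of Example 3.3.1, writing an (RG2) rule with an empty right-hand side:

```python
def machine_to_grammar(delta, accepting):
    """delta maps (state, input) -> next state.  Each transition gives an
    (RG1) rule; each accepting state gives an (RG2) rule (empty RHS)."""
    rules = [f"{a} -> {t}{b}" for (a, t), b in delta.items()]
    rules += [f"{a} -> " for a in accepting]
    return rules

# The machine of Example 3.3.1, with states relabelled A and B:
DELTA = {('A', 'a'): 'B', ('A', 'b'): 'A', ('B', 'a'): 'A', ('B', 'b'): 'B'}
RULES = machine_to_grammar(DELTA, {'B'})
```

Here RULES contains the five productions A → aB, A → bA, B → aA, B → bB and B → λ.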

Example 4.1.2. Consider again the recognition machine of Example 3.3.1. It has input alphabet {a, b}, state set {A, B}, initial state A and
accepting state set F = {B}.

[The machine has a b-labelled loop at each of A and B, and a-labelled edges from A to B and from B to A.]

The construction gives N = {A, B}, the starting symbol is A and the production
rules are

A → aB,  A → bA,  B → aA,  B → bB,  B → λ.


Repeated application of these rules starting with A allows the production of certain
strings, but does not allow the production of others. For example

A =⇒ bA =⇒ bbA =⇒ bbaB =⇒ bbabB =⇒ bbab

produces the word bbab. On the other hand, you might like to convince yourself
that there is no way to produce the word aba using the above grammar.

It can be shown that the grammar constructed in this way generates exactly the language accepted by
the given finite state machine. We now illustrate why this is so by considering
some examples. If we start with an input string, such as bbab, which is clearly

accepted by the machine then (because there is only one accepting state B) the

production rules corresponding to the given inputs b, b, a, b lead from A to bbabB.

The production rule B → λ then enables bbab to be obtained in the regular
language generated by the rules.

On the other hand, if bbab is obtained by a sequence of production rules then,

working backwards, it must have come in turn from bbabB, bbaB, bbA, bA, A

via inputs bbab. Thus the input bbab leads from state A to B, which is accepting,

so bbab is accepted by the recognition machine.

The next example illustrates how a grammar can be used to specify the syntax of

part of a programming language.

Example 4.1.3. Let the set of terminal symbols be {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, −}, the
non-terminal symbols be {⟨digit⟩, ⟨integer⟩, ⟨signed integer⟩, ⟨unsigned integer⟩},
let the starting symbol be ⟨integer⟩ and let the production rules be

⟨integer⟩ → ⟨signed integer⟩,  ⟨integer⟩ → ⟨unsigned integer⟩
⟨signed integer⟩ → +⟨unsigned integer⟩
⟨signed integer⟩ → −⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩
⟨unsigned integer⟩ → ⟨digit⟩
⟨digit⟩ → 0, ⟨digit⟩ → 1, . . . , ⟨digit⟩ → 9.

One example of a valid derivation is:

⟨integer⟩ =⇒ ⟨unsigned integer⟩ =⇒ ⟨digit⟩⟨unsigned integer⟩ =⇒
⟨digit⟩⟨digit⟩ =⇒ ⟨digit⟩2 =⇒ 12

Another is:

⟨integer⟩ =⇒ ⟨signed integer⟩ =⇒ −⟨unsigned integer⟩
=⇒ −⟨digit⟩ =⇒ −7


In this way any integer can be obtained. This is not a regular grammar because,
for example, the production rule ⟨unsigned integer⟩ → ⟨digit⟩⟨unsigned integer⟩
is not of the form (RG1) or (RG2). The language generated by this grammar

is regular, however, because there is a regular grammar that generates the same

language (Exercise: Try to write one).

Given a regular grammar, we can reverse the process used in the last section

and produce a recognition machine which will accept precisely the words in the

language. Based on our experience of going from a recognition machine to a

regular grammar, we expect that the recognition machine based on the language

should have the following features.

Given a regular grammar, we construct a recognition machine with the same language as follows.

(a) The states are the non-terminal symbols

(b) The initial state is the starting symbol

(c) The input alphabet is the set of terminal symbols

(d) The accepting states are the non-terminal symbols A for which there is
an (RG2) rule A → λ.
(e) Each (RG1) rule A → tB gives an arrow from A to B labelled t.

As an example, you should check that applying this construction to the grammar

of Example 4.1.2, takes you back to the graph from which the grammar was

generated. All goes smoothly because the grammar was derived from a recognition

machine. If we start with an arbitrary regular grammar, however, problems can

arise.

Example 4.2.1. Suppose T = {a, b}, N = {σ, β, γ} and the production rules are

σ → bσ,  σ → aβ,  β → bσ,  β → bγ,  γ → λ.

Our construction gives the recognition machine shown in Figure 1.

[Figure 1: states σ, β and γ, with σ initial and γ accepting.]

This is not a (deterministic) recognition machine because:

(a) the state δ(b, β) is not uniquely defined


(b) states such as δ(a, β) and δ(a, γ) are not defined at all.

The latter problem with the machine of Example 4.2.1 is easily fixed. We simply

add a new non-accepting state to which we move from states where there is no

definition of how to handle a particular input. The transition function is then

completed by taking all inputs at the new state to the new state. In Section 3.5,

we described such a state as a sink or sometimes a black hole. Adding a sink

to the machine of Example 4.2.1 gives the complete machine (i.e. one where all

transitions are defined) of Figure 2.

[Figure 2: the machine of Figure 1 with a sink added; every previously undefined transition now leads to the sink, and both inputs at the sink lead back to the sink.]

Although we have fixed one of the problems mentioned above, the problem that
δ(b, β) is not uniquely defined remains. We will see how to fix this later. A

commonly used convention, which we will sometimes employ, is to leave out the

sink. Under this convention, when we reach a state for which the next transition

is not defined, the word we are processing is rejected. This is usually done to keep

the diagram for the machine simpler. It can be made formal by:

(a) allowing the transition map to be a partial function (as defined in Section

1.3) rather than a function and

(b) adopting the above convention of rejecting any word that requires the use

of a transition that is not defined.

When we discuss push down automata in Chapter 6, we will nearly always use a

partial function to define transitions and employ the latter convention.

An Equivalent Definition

The more common way of defining regular grammars allows production rules
of the form

A → t        (RG3)

(where t is a terminal symbol) in addition to (RG1) and (RG2) rules. We have

avoided this because in our construction of a recognition machine from a regular

grammar it is not immediately obvious how to handle such rules. It turns out,

however, that from the point of view of the language generated by a grammar

(which is really all that matters) these two definitions are equivalent. A grammar
consisting of rules of the forms (RG1), (RG2) and (RG3) can always be
converted to a grammar containing only (RG1) and (RG2) rules by:

adding one new non-terminal symbol E;
adding the (RG2) rule E → λ;
replacing each (RG3) rule A → t by the (RG1) rule A → tE.

You should convince yourself that this new grammar generates exactly the same

language as the original did.

The remaining problem with the machine in Figure 2, the fact that one of the
transitions is not uniquely defined, is more fundamental. In fact, the diagram

in Figure 2 describes a more general type of machine called a nondeterministic

recognition machine or nondeterministic finite state automaton. The difference

between a nondeterministic recognition machine and a deterministic one is that

in the former the transition function maps a pair of an input and a state to a set

of possible states to which we are allowed to move rather than a single state to

which we must move. For the machine of Figure 2 we have the following (partial)

transition function given by the table

x a b

(x, ) { } {}

(x, ) {, }

(x, )

Recall that ∅ is used to denote the undefined entries in the table of a partial
function. Here, we can either think of ∅ in the same way or in its usual interpretation
as the empty set. For a nondeterministic recognition machine, an input

string does not lead to a unique finishing state, so we have to decide what it ac-

tually means for the machine to accept an input. This is the key idea in defining

nondeterministic machines:

We say that the input is accepted if there is at least one possible choice of

transitions corresponding to the input string which ends in an accepting

state (even if there are other choices that do not end in an accepting state).
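This acceptance rule can be checked without any backtracking by tracking the whole set of states reachable after each input, an idea taken up again in the next chapter. Here is a sketch in Python; the example machine is a hypothetical one of our own, accepting exactly the words over {a, b} that end in ab:

```python
def nfa_accepts(delta, start, accepting, word):
    """delta maps (state, input) -> set of possible next states;
    a missing entry means the transition is undefined."""
    current = {start}
    for x in word:
        # all states reachable from any current state on input x
        current = set().union(*(delta.get((s, x), set()) for s in current))
    return bool(current & accepting)

# Hypothetical nondeterministic machine: accepts words ending in ab.
DELTA = {('p', 'a'): {'p', 'q'}, ('p', 'b'): {'p'}, ('q', 'b'): {'r'}}
```

For example nfa_accepts(DELTA, 'p', {'r'}, 'bbab') is True: one choice of transitions (staying at p until the final ab) reaches the accepting state r, even though other choices die out.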


For instance, in Example 4.2.1 the input string ab could either correspond to the
state sequence σ, β, σ or to the sequence σ, β, γ. Even though the first of these does
not end in an accepting state, the input string is accepted by the machine.

To summarise, for each (deterministic) recognition machine there is a regular

language consisting precisely of the words accepted by the machine and for each

regular language there is a nondeterministic recognition machine which accepts

precisely the words in the language. In fact, as we will see in the next chapter,

every nondeterministic machine has an equivalent deterministic one, so regular

languages correspond exactly to recognition machines.

We now generalise the star notation introduced in Section 3.4. For any set B
of words, the notation B* denotes the set of all words (of any finite length including
zero) that can be made from the words in B by simply placing words in B after
one another in any order we please. The technical name for the operation of
making a word w by putting together two words u and v is concatenation and we
simply write w = uv. For example if u = 011 and v = 1001 then uv = 0111001, which
is not the same word as vu = 1001011. By definition B* always includes the empty word λ.
If we want to exclude the empty word we use the notation B+. Here are some
examples using * and +:

{11}* = {1^(2n) : n ≥ 0} = {λ, 11, 1111, 111111, . . . },
{1}+ = {1^n : n ≥ 1} = {1, 11, 111, . . . },
{1, 00}* = {λ, 1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . },
{1, 00}+ = {1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . }.
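The definition of B* can be explored by brute force: concatenate members of B in every possible order and keep the short results. A small Python sketch (the names are ours):

```python
from itertools import product

def star_up_to(b, max_len):
    """All words of B* of length at most max_len ('' encodes lambda)."""
    words = {''}                            # the empty word is always in B*
    for n in range(1, max_len + 1):
        for combo in product(b, repeat=n):  # every way of placing n words of B in a row
            w = ''.join(combo)
            if len(w) <= max_len:
                words.add(w)
    return words
```

For instance star_up_to({'1', '00'}, 2) gives {'', '1', '11', '00'}, matching the listing above; for B+ simply discard the empty word from the result.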

Taken together with set union (which we used earlier in this section) and a rather

obvious notation for concatenation, the notation gives a remarkably powerful way

of describing languages. In fact, once they have been properly defined, expressions

using these few operations can be used to define any regular language. They are

called regular expressions and provide a third standard way to describe regular

languages (in addition to recognition machines and regular grammars).

Example 4.4.1.

(a) The language L = {0^n 1 : n ≥ 1} ∪ {0^n : n ≥ 1} of Example 4.1.1 is
described by the regular expression 0{0}*1 ∪ 0{0}* or {0}+1 ∪ {0}+.

(b) L = {(01)^n : n ≥ 0} is described by the regular expression {01}*.

(c) The language of all words on the alphabet {a, b, c} containing precisely
two a's and commencing with b is described by the regular expression
b{b, c}*a{b, c}*a{b, c}*.

Regular expressions are used for many purposes in computer science. Although

we have used set theoretic notations here to emphasise the fact that a regular

expression really denotes a set of words, the versions of regular expressions used

in computer science have been adapted to need only standard computer keyboard

characters, so for example:

+ is used instead of ∪.
Parentheses ( and ) are used in place of { and } for indicating the scope
of *'s and +'s.

Here are the regular expressions of Example 4.4.1 written in computer
science style notation (which may be more familiar to you).

(a) The regular expression 0{0}*1 ∪ 0{0}* would be written as 0+1 + 0+.
(b) The regular expression {01}* would be written as (01)*.
(c) The regular expression b{b, c}*a{b, c}*a{b, c}* would be written as
b(b + c)*a(b + c)*a(b + c)*.

The notations may also have various abbreviations added for frequently needed

items like digits, white space and alphabetic characters. One well known notation

for regular expressions is the advanced text searching system known as GREP
(Global Regular Expression Print) searching. This allows for much more sophisticated
search and replace patterns than simple text strings or text strings with

wildcards. In principle, one can use GREP patterns to search for any set of text

strings that constitutes a regular language. GREP based searching is available in

many text editors and command line applications.
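For instance, Python's re module understands the computer science style of regular expression directly (writing | rather than + for union, and [bc] for (b + c)). A sketch encoding Example 4.4.1(c):

```python
import re

# b(b+c)*a(b+c)*a(b+c)* in Python syntax: words over {a, b, c} that
# start with b and contain precisely two a's.
two_as = re.compile(r'^b[bc]*a[bc]*a[bc]*$')
```

Here two_as.match('bcaca') succeeds, while strings with a different number of a's, or not commencing with b, are rejected.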

Chapter Five

Deterministic Machines

Although non-deterministic recognition machines appear to be

more general, we will see that every such machine accepts the

same inputs as some deterministic machine. This machine might

be quite complicated, but there is a method of simplifying a

machine while still keeping the same accepted inputs.

This material is covered in Chapters 2 and 3 of Linz and in Chapter 2 of Kelley. The construction of a
deterministic recognition machine from a non-deterministic one is also discussed
in 10.5 of Johnsonbaugh. Simplification of machines using suffix sets is covered
in 3.4 of Hopcroft and Ullman.

A nondeterministic machine appears to be a more general concept than a deter-

ministic one. Nevertheless, for every nondeterministic recognition machine there

is a deterministic one which accepts precisely the same input strings. To see this

we focus on an example.

[Figure: a nondeterministic recognition machine with states σ, β and γ.]

Since the given machine has {σ, β, γ} as its
set of states, we let the states of a new machine be all the subsets of the state set of
the given one, i.e. ∅, {σ}, {β}, {γ}, {σ, β}, {σ, γ}, {β, γ}, {σ, β, γ}. The accepting
states are all those subsets containing an accepting state of the original machine,
the initial state is {σ} and the input alphabet is the same as originally. For any
subset S of original states and any input x, the new transition function δ′ takes
a non-empty set S to the set consisting of all the original states obtained from
elements of S under the input x. More formally,

δ′(x, S) = ∪_{s ∈ S} δ(x, s).

Also δ′ maps the empty set to itself after any input. Hence the transition table is

[The transition table for δ′, with one column for each of the eight subset-states, and the corresponding directed graph (Figure 2); each entry is computed using the union formula above.]

The inputs accepted by the original nondeterministic machine are all accepted by
the new machine. For example, bbabb is accepted by the original machine through
a path of states labelled by the inputs b, b, a, b, b. This gives rise to a corresponding
path of subset-states in the new machine, so that bbabb is also accepted by the
new machine. (Note that at each step of the path, the subset-state in the new
machine contains the corresponding original state, so that the final subset-state
contains an accepting state and is therefore accepting.)

It is also true that any input accepted by the new machine will also be accepted
by the old one. For example, the new machine accepts abbb through a path of
subset-states ending in an accepting subset. Choosing an accepting state of the
original machine in that final subset and working backwards, we can pick at each
step a state in the preceding subset-state which leads to the state already chosen;
this yields a path through which abbb is accepted in the old machine.

5.2. SIMPLIFYING DETERMINISTIC MACHINES 41

The arguments used in this example can be applied generally to show that every nondeterministic recognition machine has an associated deterministic one which accepts precisely the same inputs. For later reference we can now summarise the results of the previous chapter and this section in the form of a theorem:

Theorem 5.1. A language is regular precisely when it is accepted by some deterministic recognition machine.
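The subset construction described above is easy to mechanise. The following Python sketch (the function and variable names are our own, not from the notes) builds only the accessible part of the deterministic machine, which also anticipates the removal of inaccessible states discussed in the next section.

```python
from itertools import chain

def subset_construction(states, alphabet, delta, start, accepting):
    """delta maps (symbol, state) -> set of states of a nondeterministic
    machine. Returns the accessible part of the equivalent deterministic
    machine, whose states are frozensets of original states."""
    start_set = frozenset([start])
    new_delta = {}
    todo, seen = [start_set], {start_set}
    while todo:
        S = todo.pop()
        for x in alphabet:
            # delta'(x, S) is the union of delta(x, s) over all s in S
            T = frozenset(chain.from_iterable(delta.get((x, s), ()) for s in S))
            new_delta[(x, S)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # a subset is accepting when it contains an accepting original state
    new_accepting = {S for S in seen if S & accepting}
    return seen, new_delta, start_set, new_accepting
```

Running this on a three-state machine matching our reading of the example in Figure 1 yields five accessible subset states, in agreement with Figure 3 below.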

5.2. Simplifying deterministic machines

If we are given a regular language, specified by production rules, then we can use the theory of Section 4.2 to construct a (possibly) nondeterministic machine accepting the given language and can then (if necessary) use the theory of Section 5.1 to construct a deterministic machine doing the same job. However, the answer might be very complicated, with many states. We need techniques for making a simpler machine accepting the same inputs. The techniques we discuss in the rest of this chapter are, of course, useful in many other situations as well.

The first step is to remove any inaccessible states from the machine (and any transitions to or from them). These are states that can never be reached from the initial state. For the machine illustrated in Figure 2, for example, the states ∅, {σ1, σ3} and {σ1, σ2, σ3} are clearly inaccessible. Removing them gives the machine shown in Figure 3, which clearly has the same language as the original machine.

Figure 3. [The machine of Figure 2 with its inaccessible states removed. Its five states are {σ1} (initial), {σ1,σ2}, {σ2}, {σ2,σ3} and the sink {σ3}; all states except {σ3} are accepting.]

Suppose now that M is a deterministic recognition machine with input alphabet Σ. Calculating the suffix sets of all of the states of M gives us a way to decide whether there is a smaller machine equivalent to M. If there are fewer suffix sets than states, we can construct a smaller machine M′ that is equivalent to M.

Example 5.2.1. Although our example from Figure 3 has five vertices, you can check that it only has three distinct suffix sets:

    S({σ3}) = ∅
    S({σ2}) = S({σ2, σ3}) = {b^n : n ≥ 0}   (accepting)
    S({σ1}) = S({σ1, σ2}) = {b^m a b^n : m, n ≥ 0} ∪ {b^n : n ≥ 0}   (initial & accepting)

Observe that the suffix set S({σ2}) of the accepting state {σ2} contains the empty string ε. A moment's thought should convince you that ε ∈ S(q) precisely in the case where q is accepting.

We now construct a new machine M′ as follows:

(a) The states of M′ are the suffix sets themselves.
(b) The initial state of M′ is S(q0) where q0 is the initial state of the original machine.
(c) The accepting states of M′ are the states S(q) where q is an accepting state in the original machine.
(d) The transition function δ′ of M′ is defined for states q and inputs a by

    δ′(a, S(q)) = S(δ(a, q)).

Although this construction may seem a little abstract, it is really no worse than the construction of the previous section. There we used subsets of the set of states of the original machine to construct our new machine. Here we use the suffix sets of the states of the original machine.

Theorem 5.2. The machine M′ is equivalent to M and has the least possible number of states for such a machine (we say M′ is minimal).

Applying this construction to our example from Figure 3, we see that the states are the three distinct suffix sets S({σ3}), S({σ2}) = S({σ2, σ3}) and S({σ1}) = S({σ1, σ2}) calculated in Example 5.2.1, the initial state is S({σ1}), the accepting states are S({σ1}) and S({σ2}) and, according to (d) above, the new transition function δ′ may be calculated using the transition function δ for the machine of Figure 3:

    δ′(a, S({σ1})) = S(δ(a, {σ1})) = S({σ2})
    δ′(b, S({σ1})) = S(δ(b, {σ1})) = S({σ1, σ2}) = S({σ1})
    δ′(a, S({σ2})) = S(δ(a, {σ2})) = S({σ3})


    δ′(b, S({σ2})) = S(δ(b, {σ2})) = S({σ2})

giving the minimal deterministic machine shown in Figure 4.

Figure 4. [The minimal machine: S({σ1}) (initial, accepting) has a b-loop and an a-edge to S({σ2}); S({σ2}) (accepting) has a b-loop and an a-edge to S({σ3}); the sink S({σ3}) has an a, b-loop.]

We may also construct our new machine M′ by partitioning the states of the machine M (without inaccessible states) into equivalence classes. We regard states as equivalent if they have the same suffix set and say that they are suffix equivalent. The equivalence class of the initial state is the initial state of M′, the accepting states of M′ are the equivalence classes of the accepting states of M and the transition function δ′ is given by the rule

    δ′(a, [q]) = [δ(a, q)]

for states q and inputs a. This is obviously just another way of carrying out the above construction, but it emphasizes the key problem that needs to be solved:

    Which vertices are suffix equivalent?

5.3. An algorithm for finding suffix equivalence classes

Although the method of the previous section is theoretically valid, it is not always practical. For a large, complicated machine it is not as easy to calculate the suffix sets as our example might suggest. Fortunately, there are algorithms that calculate the suffix equivalence classes without having to calculate the suffix sets explicitly. We now present an algorithm for this purpose which actually works by determining which pairs of states are not suffix equivalent.

The algorithm records its progress in a triangular table with a cell for each pair {p, q} of distinct states. An example is shown below for a machine with states A, B, C, D, E and F. The dashes indicate cells we don't need (because we only need one cell for each pair of distinct vertices). Our algorithm will place an X in the cell for every pair of distinct states that are not suffix equivalent.

        A B C D E
    B     - - - -
    C       - - -
    D         - -
    E           -
    F


Algorithm 5.1.

Initialization: We observed in Example 5.2.1 that ε ∈ S(q) precisely in the case where q is accepting. This shows an accepting state never has the same suffix set as a non-accepting one. Therefore:

    Initialize the table by placing an X in the cell for every pair {p, q} where q is accepting and p is not.

Loop stage: Suppose at some stage we have states p and q and an input a for which δ(a, p) and δ(a, q) are distinct and the cell for the pair {δ(a, p), δ(a, q)} already has an X. This means δ(a, p) and δ(a, q) are not suffix equivalent, so there must be either a word w ∈ S(δ(a, p)) that is not in S(δ(a, q)), or vice-versa. If such a w exists then aw ∈ S(p), but aw can't be in S(q) because this would mean w ∈ S(δ(a, q)). Thus S(p) ≠ S(q). In the vice-versa case, we again have S(p) ≠ S(q) for a similar reason. Therefore:

    Make repeated passes through all table cells that do not yet contain an X, placing an X in the cell for {p, q} if there is an input a such that the cell for the pair {δ(a, p), δ(a, q)} already has an X.

Stopping criterion: We may need to go through the table many times, because a pass that adds at least one X may be setting up the scene for adding more X's on the next pass. Therefore:

    Stop after the first pass that adds no new X's.

Calculating the equivalence relation: If the cell for {p, q} does not have an X after the loop has finished, p and q must be suffix equivalent. Since suffix equivalence is an equivalence relation, we can easily find the equivalence classes using Algorithm 2.1. Therefore:

    Let P0 be the set of pairs {p, q} for which the cell does not contain an X. Apply Algorithm 2.1 to obtain a partition P. This is the set of suffix equivalence classes.

Once the equivalence classes are calculated the simplified machine is defined in

the manner discussed at the end of the previous section.
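Algorithm 5.1 is straightforward to implement. The sketch below (naming is our own) marks the non-equivalent pairs exactly as the algorithm describes, and then reads off the classes by merging unmarked pairs directly rather than invoking Algorithm 2.1.

```python
from itertools import combinations

def suffix_equivalence_classes(states, alphabet, delta, accepting):
    """delta maps (symbol, state) -> state for a deterministic machine.
    Returns the suffix equivalence classes as a list of sets of states."""
    # Initialization: mark every pair with exactly one accepting state.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in accepting) != (p[1] in accepting)}
    changed = True
    while changed:  # stop after the first pass that adds no new marks
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in alphabet:
                image = frozenset((delta[(a, p)], delta[(a, q)]))
                # mark {p, q} if input a sends it to an already-marked pair
                if len(image) == 2 and image in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    # Unmarked pairs are suffix equivalent; merge them into classes.
    classes = {s: {s} for s in states}
    for p, q in combinations(states, 2):
        if frozenset((p, q)) not in marked and classes[p] is not classes[q]:
            classes[p] |= classes[q]
            for s in classes[p]:
                classes[s] = classes[p]
    return list({id(c): c for c in classes.values()}.values())
```

On a small five-state machine of our own with the shape of Figure 3 (a chain of two accepting loop states feeding a sink), the classes come out pairing the two equivalent loop states, as Theorem 5.2 predicts.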

Example 5.3.1. The initialization stage and loop passes for the deterministic

recognition machine in Figure 5 are shown in the following tables.

Figure 5. [A deterministic recognition machine with states A, B, C, D, E, F, input alphabet {0, 1}, initial state F and accepting states A and D.]

Initialization:

        A B C D E
    B   X - - - -
    C   X   - - -
    D     X X - -
    E   X     X -
    F   X     X

First loop pass:

        A B C D E
    B   X - - - -
    C   X X - - -
    D     X X - -
    E   X   X X -
    F   X X   X X

The second loop pass produces a table identical to that of the first pass.

Initialization is straightforward. The first loop pass adds four new X's. For example, an X is added for the pair {B, F} since δ(0, B) = A, δ(0, F) = E and the cell for {A, E} already has an X. Since the second loop pass adds no new X's, no further passes are made.

The final table shows that the pairs {A, D}, {B, E} and {C, F} are suffix equivalent. This is already a partition (although it might not be for some machines). The equivalence classes are therefore [A] = {A, D}, [B] = {B, E} and [C] = {C, F}. These are the states of our simplified machine. Since F ∈ [C], the initial state is [C]. Since [A] contains all accepting states from the original machine, it is the only accepting state in the new machine. Using the formula

    δ′(a, [q]) = [δ(a, q)]

from the previous section to calculate the transition function δ′, we obtain the new recognition machine shown in Figure 6.

Figure 6. [The simplified version of the machine of Figure 5: its states are [A] (accepting), [B] and [C] (initial).]

5.4. Designing machines from language descriptions

The ideas of Section 5.2 can also be used to design minimal machines that accept a language described using the notations introduced in Section 3.4. For any language L with alphabet Σ, we can define the suffix set of an element w of Σ* (recall that this means the set of all words made from Σ) by

    S_L(w) = {z ∈ Σ* : wz ∈ L}

so that S_L(w) is the set of all possible words made from Σ that can be concatenated with w to give a word in L. In many cases S_L(w) will be empty. In many cases, S_L(w) will be infinite. For any language L we have S_L(ε) = L (can you see why?).

Example 5.4.1. In the language

    L = {1, 00}* = {ε, 1, 00, 11, 100, 001, 111, 1111, 1100, 1001, 0011, 0000, . . . }

of Example 4.4.1, blocks of 0s must be of even length. Hence no word in L can begin with 01 and therefore S_L(01) = ∅. On the other hand, one may check that S_L(1) = L, which is of course infinite.

Even though the individual suffix sets for a language are often infinite, it could still be the case that there are only finitely many of them (i.e. only finitely many distinct suffix sets), because there may be many, many different words in Σ* that have the same suffix set.

Example 5.4.2. The language L = {0 1^(2n) : n ≥ 0} has just the four distinct suffix sets S_L(1) = ∅, S_L(ε) = L, S_L(0) and S_L(01) since:

    S_L(w) = S_L(1) = ∅  ⟺  w starts with 1 or has a 0 anywhere other than in first place.
    S_L(w) = L  ⟺  w = ε.
    S_L(w) = S_L(0) = {1^(2n) : n ≥ 0}  ⟺  w = 01^n for some even n ≥ 0.
    S_L(w) = S_L(01) = {1^(2n+1) : n ≥ 0}  ⟺  w = 01^n for some odd n ≥ 1.


Since any word in {0, 1}* that does not start with 1 or have a 0 anywhere other than in first place must be empty or of the form 01^k for some k ≥ 0, these cases cover all possibilities.

It is no coincidence that the language L of the previous example only has finitely

many distinct suffix sets. In fact it is a consequence of the following theorem

which gives yet another characterization of regular languages.

Theorem 5.3. A language is regular precisely if it has finitely many suffix sets.

Part of the proof of this theorem involves showing how to construct a recognition machine for a language L with finitely many suffix sets. The construction is simple and is very similar to the simplification construction discussed in Section 5.2:

    The states of the machine are the suffix sets of L.
    The initial state is S_L(ε) = L.
    The accepting states are the states of the form S_L(w) where w ∈ L.
    The transition function is defined for w ∈ Σ* and a ∈ Σ by

        δ(a, S_L(w)) = S_L(wa).

According to a theorem similar to Theorem 5.2, the machine constructed in this way is guaranteed to be minimal and deterministic.

Example 5.4.3. Applying this construction to the language L = {0 1^(2n) : n ≥ 0} of Example 5.4.2 gives:

    Our calculations from Example 5.4.2 show that the states of the machine are S_L(1) = ∅, S_L(ε) = L, S_L(0) and S_L(01).
    The initial state is S_L(ε) = L, as always.
    The only accepting state is S_L(0).
    We calculate the values of the transition function using the formula:

        δ(0, S_L(ε)) = S_L(0)
        δ(1, S_L(ε)) = S_L(1)
        δ(0, S_L(1)) = S_L(10) = S_L(1)    since 10 starts with 1.
        δ(1, S_L(1)) = S_L(11) = S_L(1)    since 11 starts with 1.
        δ(0, S_L(0)) = S_L(00) = S_L(1)    since 00 has a 0 in second place.
        δ(1, S_L(0)) = S_L(01).
        δ(0, S_L(01)) = S_L(010) = S_L(1)  since 010 has a 0 in third place.
        δ(1, S_L(01)) = S_L(011) = S_L(0).

The diagram for this machine is shown in Figure 7. You should check that its language is in fact L. We could simplify the machine by omitting the sink S_L(1).

Figure 7. [The machine of Example 5.4.3: S_L(ε) (initial) has edges 0 → S_L(0) and 1 → S_L(1); S_L(0) (accepting) has edges 1 → S_L(01) and 0 → S_L(1); S_L(01) has edges 1 → S_L(0) and 0 → S_L(1); the sink S_L(1) has a 0, 1-loop.]
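We can check this machine mechanically. The sketch below (names are our own, and it assumes the language of the example is L = {0 1^(2n) : n ≥ 0}, as the transition calculations indicate) encodes the four states and the transition function above, then compares the machine's verdict with the defining condition for L on all short words.

```python
from itertools import product

# States named after their suffix sets; transitions from Example 5.4.3.
DELTA = {
    ('e', '0'): 's0',  ('e', '1'): 's1',    # S_L(eps): 0 -> S_L(0), 1 -> sink
    ('s0', '0'): 's1', ('s0', '1'): 's01',  # S_L(0): even number of 1s read
    ('s01', '0'): 's1', ('s01', '1'): 's0', # S_L(01): odd number of 1s read
    ('s1', '0'): 's1', ('s1', '1'): 's1',   # S_L(1): the sink
}

def accepts(word):
    state = 'e'                      # initial state S_L(eps)
    for ch in word:
        state = DELTA[(state, ch)]
    return state == 's0'             # S_L(0) is the only accepting state

def in_language(word):
    # w is in L exactly when w = 0 1^(2n) for some n >= 0
    return (word.startswith('0') and set(word[1:]) <= {'1'}
            and len(word[1:]) % 2 == 0)

# The two definitions agree on every word of length at most 8.
for n in range(9):
    for w in map(''.join, product('01', repeat=n)):
        assert accepts(w) == in_language(w)
```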

Chapter Six

Machines with Memory

Although regular languages can be powerful and useful, not all languages are regular. In fact, most modern programming languages are not regular. First we will consider why this is so and then see how to obtain a more powerful class of machines, capable of recognising a larger class of languages, by adding a limited type of memory in the form of a stack.

Reading: Chapter 7 of Linz, Chapter 3 of Kelley and Chapter 5 of Hopcroft and Ullman.

6.1. Non-Regular Languages

We have seen that regular languages are quite useful and powerful. For example, in Section 3.4 we saw that they may be used for advanced text searching. Given the ease with which they may be described, it would be a wonderful world indeed if all useful languages turned out to be regular. Unfortunately this is not the case. For reasons we will discuss shortly, most modern programming languages are not regular. Let's first consider why even the very simple language

    L = {0^n 1^n : n ≥ 0}

fails to be regular. Many textbooks use a pumping lemma to prove this, but Theorem 5.3 allows us to take an easier approach. We simply show that L has infinitely many distinct suffix sets (as defined in Section 5.4), so L cannot be regular by Theorem 5.3. For each n ≥ 1 it is easy to see that the only word in {0, 1}* that can follow 0^n 1 is 1^(n−1), so S_L(0^n 1) = {1^(n−1)}. Since these sets are different for each n ≥ 1, we have found an infinite number of distinct suffix sets, proving L is not regular. Notice that we didn't need to find all of the suffix sets (there are others) to do this. It was enough to find an infinite set of distinct ones.

Example 6.1.1. (a) To show that the language L = {0^n 1^m : n > m ≥ 1} is not regular, observe that S_L(0^m 1) = {ε} ∪ {1^n : n ≤ m − 2} for each m ≥ 2, so the sets

    S_L(001) = {ε},  S_L(0001) = {ε, 1},  S_L(00001) = {ε, 1, 11},  . . .

are all distinct. The fact that these suffix sets are not disjoint does not matter. We only need to know that they are distinct.


(b) Recall from Section 3.4 that we denote the number of 0s in a word w in {0, 1}* by n0(w) and the number of 1s by n1(w). To show that

    L = {w ∈ {0, 1}* : n0(w) = n1(w)}

is not regular, observe that

    S_L(1^m) = {w ∈ {0, 1}* : n0(w) = n1(w) + m}

for each m ≥ 0, so the sets S_L(1), S_L(11), S_L(111), . . . are all disjoint.

(c) Showing the language M = {(^m )^m : m ≥ 0} of matched parentheses is not regular is much the same as for the language L = {0^n 1^n : n ≥ 0} discussed above. Here the sets S_M((^n)) = {)^(n−1)} are disjoint for each n ≥ 1.

Example 6.1.1(c) suggests one reason why most modern programming languages are not regular. They generally allow arithmetic expressions like

    a + b,   (a + b) * 4   and   ((a * b) + (b/(a * a)))

where the parentheses must match. To illustrate this, the next example will give a grammar for a limited language of arithmetic expressions of this type and show that it is not regular. We first introduce the alternation convention for writing the production rules in a grammar. This convention is designed to reduce the amount of writing and allows us to combine production rules like

    A → tB
    A → rS
    A → ε

with the same left hand side into a single expression as

    A → tB | rS | ε.

This is just a more compact way of writing several similar rules at once and means exactly the same thing. It may be read as "replace A with either tB or rS or ε".

Example 6.1.2. To keep things manageable, we limit ourselves to just four single-letter variables and two operations, and allow no numerical constants. The grammar for L is defined as follows:

    Terminal symbols: T = {a, b, c, d, +, *, (, )}
    Non-terminal symbols: N = {S, E}
    Starting symbol: S
    Production rules:
        S → E + E | E * E
        E → (E + E) | (E * E) | a | b | c | d


The language L generated by this grammar consists of arithmetic expressions involving the variables a, b, c and d like

    a + c,   (a + b) * c,   d + (a * a)

and so on. Derivations for these expressions illustrate this:

    S ⇒ E + E ⇒ a + E ⇒ a + c
    S ⇒ E * E ⇒ (E + E) * E ⇒ (a + E) * E ⇒ (a + b) * E ⇒ (a + b) * c
    S ⇒ E + E ⇒ E + (E * E) ⇒ d + (E * E) ⇒ d + (a * E) ⇒ d + (a * a)
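The recursive shape of these production rules translates directly into a recursive-descent recognizer. The sketch below (function names and structure are our own, not from the notes) accepts exactly the words generated by the grammar of Example 6.1.2.

```python
def in_L(word):
    """Recognise the expression language of Example 6.1.2."""
    def parse_E(s):
        # E -> (E + E) | (E * E) | a | b | c | d
        # Returns the remainder of s after one E, or None on failure.
        if s[:1] in ('a', 'b', 'c', 'd'):
            return s[1:]
        if s[:1] == '(':
            rest = parse_E(s[1:])
            if rest and rest[0] in '+*':
                rest = parse_E(rest[1:])
                if rest is not None and rest[:1] == ')':
                    return rest[1:]
        return None
    # S -> E + E | E * E
    rest = parse_E(word)
    if rest and rest[0] in '+*':
        rest = parse_E(rest[1:])
        return rest == ''
    return False

# A derivable word, and two words that the grammar cannot produce:
assert in_L('(a+b)*c')
assert not in_L('(a+b')
assert not in_L('a')
```

Note how the nested call to parse_E mirrors the arbitrary nesting of E discussed below.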

The form of the productions E → (E + E) | (E * E) guarantees that the parentheses will be balanced. To show that this language is not regular, observe first that for each m ≥ 1 the suffix set S_L((^m) is not empty since, for example, it contains the word a followed by m copies of +a), i.e.

    a +a) +a) · · · +a).

Second, observe that for any w ∈ S_L((^m), the fact that left and right parentheses must match means that

    n)(w) = n((w) + m,

where n((w) and n)(w) count the left and right parentheses in w. But this means the sets S_L((^m) are disjoint for different m, giving infinitely many distinct suffix sets, so L cannot be regular.

The unbounded matching of parentheses is not the only reason why the language L of Example 6.1.2 cannot be regular. Another fundamental feature of L that cannot be realised in a regular grammar is the arbitrary nesting of expressions. We can expand any E in a derivation to obtain any expression in L and thus nest expressions to arbitrary depth. Regular languages do not possess this kind of recursive structure. This is a further reason why most modern programming languages are not regular. They almost always allow arbitrary nesting of control constructs like if-then-else statements and loops inside each other.

6.2. Stacks

The reason recognition machines are unable to recognise languages like

    L = {0^n 1^n : n ≥ 0}

is that they only have a very limited form of memory. To process elements of L, we would need some way of keeping track of how many 0s we have encountered so that we can make sure the number of 1s is the same. In general, there is no way of doing this with a finite state recognition machine. Since the language L we have discussed in this section is not regular, it cannot be generated by any regular grammar. It is very easy, however, to give a non-regular grammar that generates it. The terminal symbols are of course 0 and 1 and we need only one non-terminal symbol S (which therefore must be the starting symbol). The production rules are

    S → 0S1 | ε.

It is easy to check that this simple little grammar generates L.

To overcome this limitation we add memory to the machine in the form of a stack. A stack is a simple data structure that permits only the simple operations of adding elements at one end and removing them at the same end. Like a machine, the stack has an alphabet of allowed symbols. We picture a stack containing the elements a1, a2, . . . , an as a column with a1 at the bottom and an at the top.

The defining feature of a stack is that we only have access to one element of the stack at any point in time, namely the element most recently added. This element is called the top of the stack. Once this element is removed, an action called popping, the one below it becomes the top of the stack and hence becomes available to be popped.

The action of popping makes the top of the stack available for use in computation and removes it from the stack. The reverse action, putting items on the stack, is called pushing. The item at the other end of the stack is called the bottom. Stacks are used for a huge range of purposes in computer science. We may think of a stack as the most rudimentary form of memory or storage available in a computation. In the next section, we will use them to define a more powerful type of machine capable of recognising all of the examples of non-regular languages from the previous section. Let's see how we can use a stack to solve the problem of deciding whether an input word is in one of these languages.

Example 6.2.1. Suppose we want to decide whether a word is in the language

    L = {w ∈ {0, 1}* : n0(w) = n1(w)}

of Example 6.1.1(b). Our stack alphabet in this case is {0, 1, z} and we start with a stack containing a single z. The purpose of the initial stack symbol z is to alert us if we reach the bottom of the stack. The strategy is to add 0s and 1s to the stack in such a way that they cancel each other out, so that if n0(w) = n1(w), we should be left with just z on the stack after the input word is processed.

Figure 1. [Evolution of the stack as the word w = 0110000111 is processed, one column per input symbol.]

More precisely, as each input symbol a is read, the top element s is popped from the stack and:

    If a ≠ s and s ≠ z we do nothing, since 0s and 1s should cancel out (remember that popping has already removed s from the stack).

    If s = z we push z and then a onto the stack, because there is nothing to cancel and we must replace the popped element z and then add a to the stack for future cancellation.

    If a = s (and therefore s ≠ z) we push two copies of a onto the stack, because we must replace the popped element and add a new one to account for the input symbol.

If at the end of this processing the top element of the stack is z, the word is accepted. If not, the word is rejected.

To see how this works, consider the processing of the word w = 0110000111.

Figure 1 shows how the stack evolves as each input symbol is processed. This

word is accepted since the top element of the stack is z after the last input symbol

has been processed.
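The stack strategy of Example 6.2.1 can be sketched in a few lines of Python (the function name is our own). The list holds the stack with its top at the end, and the three cases mirror the bullet points above.

```python
def balanced_01(word):
    """Accept w over {0,1} exactly when n0(w) = n1(w), using a stack."""
    stack = ['z']                    # initial stack symbol z at the bottom
    for a in word:
        s = stack.pop()              # pop the top of the stack
        if s == 'z':
            stack += ['z', a]        # nothing to cancel: restore z, push a
        elif a == s:
            stack += [s, a]          # same symbol: restore it and add another
        # otherwise a != s and s != z: the pop cancels the input, do nothing
    return stack[-1] == 'z'          # accept when z is back on top

assert balanced_01('0110000111')     # the worked example is accepted
assert not balanced_01('0010')
```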

Although the vertical picture in Figure 1 is a natural way to visualize how a stack works, we will find a horizontal representation much more convenient and compact. In fact, we simply write the contents of the stack as a string of symbols. We adopt the convention of writing the top element of the stack at the left. Using this convention we can write down the evolution of the stack shown in Figure 1 as:

    z → 0z → z → 1z → z → 0z → 00z → 000z → 00z → 0z → z

and, if we want to emphasize how the input symbols drive the evolution of the stack:

    z -0→ 0z -1→ z -1→ 1z -0→ z -0→ 0z -0→ 00z -0→ 000z -1→ 00z -1→ 0z -1→ z.

6.3. Push Down Automata

We obtain a new and more powerful class of machines by adding a stack to a finite state machine in such a way that as each input symbol is processed, the top of the stack is popped and may be used, as well as the input symbol and the current state, to decide:

(a) which state to move to next,
(b) what to push onto the stack.

Machines of this kind are called push down automata (PDA), so named because stacks are often referred to as push-down data structures. Formally speaking, a (deterministic) push down automaton is specified as follows:

    Q = {q1, . . . , qr} is the set of states.
    Σ = {a1, . . . , am} is the input alphabet.
    Γ = {z, b1, . . . , bn} is the stack alphabet.
    δ : Σ × Q × Γ → Q × Γ* is the partial transition function.
    F ⊆ Q is the set of accepting states.
    q0 ∈ Q is the initial state.
    z ∈ Γ is the initial stack symbol.

Most items in this description are familiar from our study of finite state recognition machines and we already met the stack alphabet and the initial stack symbol in the previous section. The difference that requires explanation is the more complicated form of the partial transition function δ. We will explain shortly why it is always a partial function. For a PDA, it needs to take three arguments:

    the current input symbol (as always),
    the current state (as always),
    the symbol popped from the stack (which may now influence the result).

Instead of a single state, the partial transition function for a PDA returns an ordered pair consisting of:

    the state to move to next,
    the word to be pushed onto the stack.

For reasons already apparent in Example 6.2.1, we typically need the ability to push more than one symbol onto the stack at each move. There we often needed to replace the element that had just been popped (remember that the action of popping removed it from the stack) and then add a new one. Sometimes we may even need to push more than two elements. It is also convenient in some cases to start with more than just the initial stack symbol on the stack.

Example 6.3.1. Suppose we want our PDA to move from state p to state q when the input is 0 and the stack top is 1. If we wish to replace the popped 1 and then add the input symbol 0 to the top of the stack, the transition formula would be δ(0, p, 1) = (q, 01). (Since the top of the stack is written on the left, we push 01, not 10.) In the notation introduced in the previous section, we can write this stack transition as:

    1 bn . . . b1 z -0→ 0 1 bn . . . b1 z

where bn . . . b1 z is the prior contents of the stack at this point. We extend this notation to show state and stack transitions using ordered pair notation:

    (p, 1 bn . . . b1 z) -0→ (q, 0 1 bn . . . b1 z).

To erase the top of the stack, the transition formula would be δ(0, p, 1) = (q, ε) and the effect of the transition would be written as

    (p, 1 bn . . . b1 z) -0→ (q, bn . . . b1 z).

This notation shows how the internal configuration, that is, the state and stack contents of the PDA, is changing during a computation. We consequently call this configuration notation.

We define the transitions for a PDA using a partial function. This is because most PDAs we construct would otherwise need a sink to cope with all of the unwanted combinations of input symbol, state and stack top, making their description unnecessarily complicated. With so many such combinations to consider, the description of a full transition function is tedious and typically contains lots of rows where all entries go to a sink. For these reasons, the description of PDAs using either a directed graph or transition function table is greatly simplified by omitting sinks and using a partial function.

The fact that the transition function now takes three arguments instead of two is inconvenient when writing down a transition table. We are forced to combine two of the arguments on one axis. We adopt the convention of writing the input symbols along the top of the table and all possible combinations of the state and stack symbol down the side. Entries in the transition table are ordered pairs of the form (state, word). We use the symbol ⊥ to indicate, where necessary, places where the transition is undefined. The reasons for this notation will become clearer in Section 6.4.

Example 6.3.2. The following PDA accepts L = {0^n 1^n : n ≥ 0}, our first example of a non-regular language.

    Q = {q0, q1, q2, q3}.
    Σ = {0, 1}.
    Γ = {z, 0}.
    F = {q0, q3}.
    Initial state q0.
    Initial stack symbol z.

    x              0           1
    δ(x, q0, z)   (q1, z)     ⊥
    δ(x, q1, z)   (q1, 0z)    (q3, z)
    δ(x, q1, 0)   (q1, 00)    (q2, ε)
    δ(x, q2, z)   ⊥           (q3, z)
    δ(x, q2, 0)   ⊥           (q2, ε)


Since L contains the empty word, q0 is an accepting state. If the first input is 0 we move to q1 and leave the stack unchanged. While the input consists of consecutive 0s, we stay at q1, adding a 0 to the stack for each input 0. Since we didn't push the first 0, the stack always contains one fewer 0 than the number we have processed.

When an input 1 occurs, we move to q2, erasing the 0 on the top of the stack, and remain there while the input word contains consecutive 1s, erasing a 0 for each input 1. If we are at q1 or q2, the next input is 1 and the stack top is z, we are at a point where the input processed so far is of the form 0^n 1^(n−1), because we have erased precisely n − 1 zeros from the stack, so we move to the accepting state q3. The next input 1 then completes the word 0^n 1^n ∈ L. Any further input gives a word that is not in L, so there are no transitions defined from q3, which means that such a word is rejected.

We illustrate the operation of this PDA using the configuration notation introduced in Example 6.3.1 for various words (as in the transition table, ⊥ indicates an undefined transition):

    w = 01:    (q0, z) -0→ (q1, z) -1→ (q3, z)   (accept)
    w = 001:   (q0, z) -0→ (q1, z) -0→ (q1, 0z) -1→ (q2, z)   (reject)
    w = 0011:  (q0, z) -0→ (q1, z) -0→ (q1, 0z) -1→ (q2, z) -1→ (q3, z)   (accept)
    w = 0010:  (q0, z) -0→ (q1, z) -0→ (q1, 0z) -1→ (q2, z) -0→ ⊥   (reject)
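A deterministic PDA is easy to simulate directly from its transition table. The sketch below (our own naming) runs the table of Example 6.3.2, with the stack written as a string, top first, as in the configuration notation.

```python
# Transition table of Example 6.3.2: (input, state, stack top) -> (state, push)
DELTA = {
    ('0', 'q0', 'z'): ('q1', 'z'),
    ('0', 'q1', 'z'): ('q1', '0z'), ('1', 'q1', 'z'): ('q3', 'z'),
    ('0', 'q1', '0'): ('q1', '00'), ('1', 'q1', '0'): ('q2', ''),
    ('1', 'q2', 'z'): ('q3', 'z'),
    ('1', 'q2', '0'): ('q2', ''),
}
ACCEPTING = {'q0', 'q3'}

def accepts(word):
    state, stack = 'q0', 'z'         # stack as a string, top at the left
    for a in word:
        key = (a, state, stack[0])   # pop the top symbol
        if key not in DELTA:
            return False             # undefined transition: reject
        state, push = DELTA[key]
        stack = push + stack[1:]     # push the given word onto the rest
    return state in ACCEPTING

# The accepted and rejected words from the configuration traces above:
assert accepts('') and accepts('01') and accepts('0011')
assert not accepts('001') and not accepts('0010')
```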

We can also draw directed graph representations of push down automata in a similar way to those for finite state automata. The difference is that when we label the edges representing the transitions, we also need to show

(a) how the popped element determines the transition,
(b) what gets pushed at the transition.

We can take care of (a) by drawing an edge coming out of every state for every possible ordered pair of the form (input symbol, stack symbol), i.e., for every (a, b) ∈ Σ × Γ. As usual, we don't draw multiple edges between the same two states, instead labeling one edge with the information for all of the transitions between the states. This makes the diagram simpler.

Figure 2. [An edge p → q labelled "(a, b) ↦ u, (c, d) ↦ v" and an edge r → s labelled "(e, f) ↦ w".]

We can take care of (b) above using a similar notation to that for stack evolution introduced at the end of Section 6.2. For example, edge labellings for the transitions δ(a, p, b) = (q, u), δ(c, p, d) = (q, v) and δ(e, r, f) = (s, w) are shown in Figure 2. The directed graph for the PDA of Example 6.3.2 is shown in Figure 3.

Figure 3. [The PDA of Example 6.3.2: the edge q0 → q1 is labelled (0, z) ↦ z; the loop at q1 is labelled (0, z) ↦ 0z, (0, 0) ↦ 00; the edge q1 → q2 is labelled (1, 0) ↦ ε; the loop at q2 is labelled (1, 0) ↦ ε; and the edges q1 → q3 and q2 → q3 are labelled (1, z) ↦ z.]

So far in our discussion of PDAs we have applied the same criterion for acceptance of words as we used for finite state automata, namely, that processing the word takes us to an accepting state. Another criterion often used to determine whether a PDA accepts an input word is that the stack should be empty after the word is processed. By saying that the stack is empty, we really mean that the stack contains only the initial stack symbol z (or, more accurately, since we are only allowed to look at the top of the stack, that z is on the top of the stack). In fact, we used this criterion in Example 6.2.1. We call this criterion acceptance on empty stack and when defining a machine we will adopt the convention that this criterion is used whenever the set F of accepting states for a PDA is empty. Acceptance on empty stack frequently leads to simpler machines.

Example 6.3.3. Using acceptance on empty stack, we can give a PDA that accepts the same language as that of Example 6.3.2, but has only two states:

    Q = {q0, q1}.
    Σ = {0, 1}.
    Γ = {z, 0}.
    F = ∅.
    Initial state q0.
    Initial stack symbol z.

    x              0           1
    δ(x, q0, z)   (q0, 0z)    ⊥
    δ(x, q0, 0)   (q0, 00)    (q1, ε)
    δ(x, q1, 0)   ⊥           (q1, ε)

The directed graph for this PDA is shown in Figure 4. Not only does it have fewer states, but it also avoids the annoying requirement that the number of 0s placed on the stack be one fewer than the number processed.


Figure 4. [The PDA of Example 6.3.3: the loop at q0 is labelled (0, z) ↦ 0z, (0, 0) ↦ 00; the edge q0 → q1 is labelled (1, 0) ↦ ε; and the loop at q1 is labelled (1, 0) ↦ ε.]

Example 6.3.4. The PDA described below implements the stack algorithm used in Example 6.2.1. The language is

    L = {w ∈ {0, 1}* : n0(w) = n1(w)}.

    Q = {q0}.
    Σ = {0, 1}.
    Γ = {z, 0, 1}.
    F = ∅.
    Initial state q0.
    Initial stack symbol z.

    x              0           1
    δ(x, q0, z)   (q0, 0z)    (q0, 1z)
    δ(x, q0, 0)   (q0, 00)    (q0, ε)
    δ(x, q0, 1)   (q0, ε)     (q0, 11)

Using acceptance on empty stack has enabled us to construct a PDA with only one state! Although we could draw a directed graph for this PDA, a moment's thought should convince you that directed graphs for single state machines are not very informative. We can express the processing of the word w = 0110000111 of Section 6.2 in configuration notation:

    (q0, z) -0→ (q0, 0z) -1→ (q0, z) -1→ (q0, 1z) -0→ (q0, z) -0→ (q0, 0z) -0→ (q0, 00z)
    -0→ (q0, 000z) -1→ (q0, 00z) -1→ (q0, 0z) -1→ (q0, z).
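Acceptance on empty stack changes only the final test in a simulation. A minimal sketch of the one-state PDA of Example 6.3.4 (names are our own):

```python
DELTA = {  # (input, stack top) -> word pushed; the only state is q0
    ('0', 'z'): '0z', ('1', 'z'): '1z',
    ('0', '0'): '00', ('1', '0'): '',
    ('0', '1'): '',   ('1', '1'): '11',
}

def accepts_empty_stack(word):
    stack = 'z'                       # top of the stack at the left
    for a in word:
        if (a, stack[0]) not in DELTA:
            return False
        stack = DELTA[(a, stack[0])] + stack[1:]
    return stack == 'z'               # accept on empty stack: only z remains

assert accepts_empty_stack('0110000111')
assert not accepts_empty_stack('011')
```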

Although the fact that the PDA of Example 6.3.4 has only a single state may seem rather strange at first sight, it can be shown that for any PDA there is an equivalent one using acceptance on empty stack with only one state. In our final example of this section, we use the notation w^R to denote the reverse of a word w. This is just the word w written in reverse order. For example:

    (001)^R = 100,   (0101)^R = 1010,   (1001)^R = 1001.

Words like 1001 that satisfy the condition w = w^R are known as palindromes. Palindromes of odd length are always of the form u c u^R for some word u and symbol c. Palindromes of even length are of the form v v^R for some word v.

Example 6.3.5. The language

    L = {w 2 w^R : w ∈ {0, 1}*}

consists of palindromes in {0, 1, 2}* of a special form. They have odd length and the middle symbol 2 does not appear anywhere else, so we can easily detect when we have reached the middle of the word. The following PDA accepts this language:

    Q = {q0, q1}.
    Σ = {0, 1, 2}.
    Γ = {z, 0, 1}.
    F = ∅.
    Initial state q0.
    Initial stack symbol z.

    x              0           1           2
    δ(x, q0, z)   (q0, 0z)    (q0, 1z)    (q1, z)
    δ(x, q0, 0)   (q0, 00)    (q0, 10)    (q1, 0)
    δ(x, q0, 1)   (q0, 01)    (q0, 11)    (q1, 1)
    δ(x, q1, 0)   (q1, ε)     ⊥           ⊥
    δ(x, q1, 1)   ⊥           (q1, ε)     ⊥

While at q0, each input 0 or 1 is pushed onto the stack. When a 2 occurs we move to q1. For an input word w 2 w^R ∈ L the stack should now contain the w part. We then compare each new input symbol to the stack top. For an input word in L, they should be the same because they are coming off the stack in reverse order. If they are not the same, the transition is undefined, so the word is rejected. If we get to an empty stack stage, we have processed a word in L. If there are any further input symbols, the word is rejected because no transitions are defined for stack top z at q1. For similar reasons any word containing a second 2 is rejected. The graph of this PDA is shown in Figure 5.

[Figure 5: transition graph of this PDA. A loop at q0 carries the push transitions (0, z) ↦ 0z, (1, z) ↦ 1z, (0, 0) ↦ 00, (1, 0) ↦ 10, (0, 1) ↦ 01, (1, 1) ↦ 11; the edge from q0 to q1 carries (2, z) ↦ z, (2, 0) ↦ 0, (2, 1) ↦ 1; a loop at q1 carries the pop transitions (0, 0) ↦ ε and (1, 1) ↦ ε.]

w = 2 :     (q0, z) --2--> (q1, z) (accept)

w = 0121 :  (q0, z) --0--> (q0, 0z) --1--> (q0, 10z) --2--> (q1, 10z) --1--> (q1, 0z) (reject)

w = 01210 : (q0, z) --0--> (q0, 0z) --1--> (q0, 10z) --2--> (q1, 10z) --1--> (q1, 0z) --0--> (q1, z) (accept)

w = 01201 : (q0, z) --0--> (q0, 0z) --1--> (q0, 10z) --2--> (q1, 10z) (reject: no transition for input 0 with stack top 1)
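These runs can also be checked mechanically. The following sketch (our own Python illustration, not part of the notes; make_pda and delta are hypothetical names) simulates a deterministic PDA that accepts when only the initial symbol z remains on the stack, using the transition table of this example.

```python
# Sketch of a deterministic PDA simulator. A transition maps
# (input symbol, state, stack top) to (new state, push word); the push word
# replaces the stack top, so '' means pop. Acceptance: only z remains.
def make_pda(delta, start_state, start_stack="z"):
    def accepts(word):
        state, stack = start_state, start_stack   # stack top is stack[0]
        for c in word:
            if not stack or (c, state, stack[0]) not in delta:
                return False                      # undefined move: reject
            state, push = delta[(c, state, stack[0])]
            stack = push + stack[1:]              # replace top by push word
        return stack == start_stack               # "empty" stack: only z left
    return accepts

delta = {
    ('0', 'q0', 'z'): ('q0', '0z'), ('1', 'q0', 'z'): ('q0', '1z'),
    ('2', 'q0', 'z'): ('q1', 'z'),
    ('0', 'q0', '0'): ('q0', '00'), ('1', 'q0', '0'): ('q0', '10'),
    ('2', 'q0', '0'): ('q1', '0'),
    ('0', 'q0', '1'): ('q0', '01'), ('1', 'q0', '1'): ('q0', '11'),
    ('2', 'q0', '1'): ('q1', '1'),
    ('0', 'q1', '0'): ('q1', ''),   ('1', 'q1', '1'): ('q1', ''),
}
accepts = make_pda(delta, 'q0')
print([accepts(w) for w in ['2', '01210', '0121', '01201']])
# -> [True, True, False, False]
```

Each entry of delta corresponds to one cell of the transition table above, so undefined cells simply have no dictionary entry.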

6.4. Nondeterministic Push Down Automata

Just as finite state machines come in deterministic and nondeterministic varieties, so do PDAs. As was the case for finite state machines, the difference is in the transition function. Instead of returning a single ordered pair (state, word), the partial transition function returns a set of such ordered pairs. This is the set of all possible moves that can be made for a given input symbol, state and stack top.

As was the case for finite state machines, we can define nondeterministic PDAs

using transition tables or directed graphs. As was the case for finite state recogni-

tion machines, a nondeterministic PDA is said to accept a word w if there is some

possible evolution of the machine starting from the initial state that processes the

word and ends at an accepting state or with an empty stack (depending on the

acceptance criterion).

There is a very important difference between nondeterministic PDAs and nonde-

terministic finite state machines. We saw in Chapters 4 and 5 that deterministic

and non-deterministic finite state machines are capable of recognizing exactly the

same languages, namely, the regular ones. For PDAs this is not true. Nondeter-

ministic PDAs are capable of recognizing a strictly larger class of languages than

deterministic PDAs.

Example 6.4.1. Despite its similarity to the language of Example 6.3.5, it can
be shown that there is no deterministic PDA that accepts the language

L = {wwᴿ : w ∈ {0, 1}*}

of even length palindromes. There is, however, a nondeterministic PDA that accepts this language:

Q = {q0, q1}.             x            0                     1
Σ = {0, 1}.               δ(x, q0, z)  {(q0, 0z), (q1, 0z)}  {(q0, 1z), (q1, 1z)}
Γ = {z, 0, 1}.            δ(x, q0, 0)  {(q0, 00), (q1, 00)}  {(q0, 10), (q1, 10)}
F = ∅.                    δ(x, q0, 1)  {(q0, 01), (q1, 01)}  {(q0, 11), (q1, 11)}
Initial state q0.         δ(x, q1, 0)  {(q1, ε)}
Initial stack symbol z.   δ(x, q1, 1)                        {(q1, ε)}

The operation of this PDA is similar to that of Example 6.3.5, except that while

it is at q0 pushing input symbols onto the stack, it has the ability to jump for

no particular reason to q1 and start removing matching items from the stack. If

it happens to do this at the right point for a word wwR in L just as it pushes

the last symbol of w it will remove all of the symbols in w from the stack in

reverse order and empty the stack. This means that there is a possible evolution

of the machine starting from the initial state that processes the word and ends

with an empty stack, which is precisely how we define which words are accepted

by a nondeterministic PDA. We illustrate the operation of this PDA for various

words:


For w = 0 there are two possible evolutions

(q0, z) --0--> (q0, 0z) and (q0, z) --0--> (q1, 0z)

neither of which ends with an empty stack, so w is rejected, correctly,
since w is not an even length palindrome.

For w = 00 there are three possible evolutions

(q0, z) --0--> (q0, 0z) --0--> (q0, 00z),
(q0, z) --0--> (q0, 0z) --0--> (q1, 00z),
(q0, z) --0--> (q1, 0z) --0--> (q1, z)

precisely one of which ends with an empty stack, so w is accepted.

For w = 1001 there are five possible evolutions

(q0, z) --1--> (q0, 1z) --0--> (q0, 01z) --0--> (q0, 001z) --1--> (q0, 1001z),
(q0, z) --1--> (q0, 1z) --0--> (q0, 01z) --0--> (q0, 001z) --1--> (q1, 1001z),
(q0, z) --1--> (q0, 1z) --0--> (q0, 01z) --0--> (q1, 001z),
(q0, z) --1--> (q0, 1z) --0--> (q1, 01z) --0--> (q1, 1z) --1--> (q1, z),
(q0, z) --1--> (q1, 1z)

precisely one of which ends with an empty stack, so w is accepted.

Notice that once this machine reaches q1 , its operation becomes deterministic,

ensuring that it exactly matches the part of the word pushed to the stack while

it was at q0 .

[Figure: transition graph of this PDA. A loop at q0 carries the push transitions (0, z) ↦ 0z, (1, z) ↦ 1z, (0, 0) ↦ 00, (1, 0) ↦ 10, (0, 1) ↦ 01, (1, 1) ↦ 11; the edge from q0 to q1 carries the same push transitions (pushing while jumping to q1); a loop at q1 carries the pop transitions (0, 0) ↦ ε and (1, 1) ↦ ε.]

The fact that the machine in Example 6.4.1 can decide somewhat arbitrarily to

move to an alternative state may seem strange in the study of computation, but

it is precisely this feature that gives the machine its power to detect the middle

of an even length palindrome. The machine's ability to guess where the middle
is comes from the way we define which words are accepted by a nondeterministic

PDA. You might say that it only guesses correctly in the sense that at least one

correct guess is possible.
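The guessing can be simulated by exhaustively exploring every possible evolution. The sketch below (our own Python illustration, not part of the notes; npda_accepts is a hypothetical name) performs a depth-first search over the transition sets of this machine and accepts a word exactly when some evolution ends with only z on the stack.

```python
# Depth-first search over all evolutions of a nondeterministic PDA that
# accepts on "empty" stack (only the initial symbol z remaining).
def npda_accepts(delta, word, state='q0', stack='z'):
    if not word:
        return stack == 'z'
    if not stack:
        return False
    for nstate, push in delta.get((word[0], state, stack[0]), ()):
        if npda_accepts(delta, word[1:], nstate, push + stack[1:]):
            return True
    return False

# Transition sets for the even length palindrome language: while at q0 each
# symbol is pushed, either staying at q0 or jumping to q1; at q1 matching
# symbols are popped.
delta = {}
for c in '01':
    for top in 'z01':
        delta[(c, 'q0', top)] = [('q0', c + top), ('q1', c + top)]
for c in '01':
    delta[(c, 'q1', c)] = [('q1', '')]

print([npda_accepts(delta, w) for w in ['', '00', '1001', '0', '10']])
# -> [True, True, True, False, False]
```

The recursion tries the "keep pushing" branch first and backtracks to the "start matching" branch, which is exactly the backtracking view of nondeterminism discussed below.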


If we use a PDA to specify the desired operation of a piece of hardware or software, for

example, we typically require determinism. The task of simulating a deterministic

PDA in hardware or software is typically much easier than for a non-deterministic

one.

On the other hand, nondeterministic PDAs have turned out to be useful models

for analysing the behaviour of backtracking algorithms that explore trees of pos-

sibilities in search of a solution to a problem. We may think of the machine in

Example 6.4.1 in this way. In a certain sense it tries switching from "pushing
symbols" mode to "matching symbols" mode at every possible point in the input
word, looking for a switch point that gives a match. Conversely, the task of
simulating a non-deterministic PDA in hardware or software typically involves
some sort of backtracking algorithm.

In the case of a non-deterministic PDA, using the empty set symbol ∅ to mark
undefined entries in the transition table is completely consistent with its usual
usage in set theory. It simply indicates that the set of possible transitions for
the table entry is empty. The transition table in Example 6.4.1 illustrates this.

Chapter Seven

Context Free Languages

We now show how to write grammars for the languages we stud-

ied in Chapter 6. Just as the languages generated by regular

grammars are those accepted by finite state recognition ma-

chines, there is a class of grammars which generate the languages

accepted by PDA.

Further reading: Chapters 5, 6 and 7 of Linz, Chapter 3 of Kelley and Chapter 4 of Hopcroft
and Ullman. An algorithm for transforming context free grammars to Greibach
Normal Form is given in Greibach Normal Form Transformation Revisited by
Norbert Blum and Robert Koch (Information and Computation, 150 (1999), 112-118).

7.1. Context Free Grammars

In Chapter 6 we saw some reasons why most programming languages are not regular. Although we can give grammars for them, we cannot give regular grammars. Equivalently, it is impossible to design a finite state machine that recognises them.

In this chapter we consider a larger class of grammars called context free gram-

mars. These grammars are powerful enough to describe most1 of the syntax of

modern programming languages.

Just like a regular grammar, a context free grammar has a set N of non-terminal
symbols, a set T of terminal symbols and a special starting symbol σ ∈ N. The
difference is in the production rules. Recall that all production rules in a regular
grammar must be of type (RG1) or (RG2). In a context free grammar, the
only restriction on production rules is that the part of the rule to the left of the
arrow (→) must be a single non-terminal symbol. Any finite string of terminal
or non-terminal symbols can appear to the right of the arrow. In other words the
production rules must all be of the form

A → α₁α₂ . . . αₙ

where A ∈ N and each αᵢ ∈ N ∪ T.

1It is known that certain features of programming languages, particularly the requirement
that variables be declared before use, cannot be described using context free grammars. Nonetheless,
the standard method of describing programming languages, BNF, is context free (see
page 65). Problems like pre-declaration of variables are dealt with separately.


Just as we say a language is regular if it is generated by some regular grammar, we say L is a context free language if there

exists some context free grammar that generates L. As was the case for regular

languages, there may be other non-context free grammars that generate L, but

provided there is at least one context free one that also generates L, we call L

context free.

Recall that the production rules of a context free grammar always have just a single
non-terminal to the left of the arrow. Since regular grammars also have this
property, they are always context free and hence regular languages are always

context free. This idea is illustrated in the Venn diagram in Figure 1.

[Figure 1: Venn diagram showing the regular languages as a subset of the context free languages.]

Because the production rules of a context free grammar are much less restricted

than those in a regular grammar, they allow us to describe more complicated

languages. For instance, the grammar for simple arithmetic expressions presented

in Example 6.1.2 is context free and therefore the language consisting of all such

expressions is context free. All of the non-regular languages for which we con-

structed PDAs in Chapter 6 are context free. In most cases, it is easy to give

context free grammars that generate them.

For example, the language L = {0ⁿ1ⁿ : n ≥ 0} is generated by the grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ1 | ε.

For instance, σ =⇒ 0σ1 =⇒ 00σ11 =⇒ 000σ111 =⇒ 000111.

Similarly, the language L = {w2wᴿ : w ∈ {0, 1}*} of Example 6.3.5 is generated by the grammar with T = {0, 1, 2}, N = {σ} and rules

σ → 0σ0 | 1σ1 | 2.


The even length palindrome language L = {wwᴿ : w ∈ {0, 1}*} is generated by the grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ0 | 1σ1 | ε.
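As a quick sanity check (our own Python sketch, not part of the notes; grammar_words is a hypothetical name), we can enumerate the words this grammar derives and compare them with the even length palindromes of bounded length.

```python
from itertools import product

# Enumerate the words derivable from S using S -> 0S0 | 1S1 | eps,
# up to a given maximum length.
def grammar_words(max_len):
    result = set()
    def go(u):                      # current sentential form is u S reverse(u)
        if 2 * len(u) > max_len:
            return
        result.add(u + u[::-1])     # apply S -> eps
        for c in '01':
            go(u + c)               # apply S -> 0S0 or S -> 1S1
    go('')
    return result

palindromes = {w for n in range(5) if n % 2 == 0
               for w in (''.join(p) for p in product('01', repeat=n))
               if w == w[::-1]}
print(grammar_words(4) == palindromes)  # True
```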

Students familiar with the Backus Naur Form (BNF ) notation for specifying

the syntax of programming languages may have already noticed the similarity

between BNF and the way we write context free grammars. In fact, BNF is just

an alternative way of writing context free grammars that has some short cuts

useful for describing program syntax. In BNF:

- The arrow → is written as ::=.
- Non-terminal symbols are written as names enclosed by < and >, for example <expression>.
- Terminals are just written as themselves.

The grammar of Example 6.1.2 could be written in BNF as:

<expr> ::= <subexp> + <subexp> | <subexp> * <subexp>
<subexp> ::= (<subexp> + <subexp>) | (<subexp> * <subexp>)
<subexp> ::= a | b | c | d

BNF also has shortcuts for optional elements and for repetitions of elements.

These can be expressed in our notation for context free grammars, although it

is necessary to use more production rules to do so. The notation of Example
4.1.3 was inspired by BNF. Indeed, changing each → to ::= converts this
notation to valid BNF.

In order to understand the reason context free grammars are so called, it is in-

structive to consider an even more powerful class of grammars called the context

sensitive grammars. In these grammars the production rules are of the form:

αAβ → αγβ (∗)

where A is a non-terminal and α, β and γ are strings (α and β possibly empty) of non-terminals and terminals (so α, β ∈ (N ∪ T)* and γ ∈ (N ∪ T)+). For example, we might have a production rule like

tAu → taaBu


The idea expressed by such a rule is that the non-terminal A can be replaced by

the string aaB in a derivation, but only if it has a t to its left and a u to its right.

In other words, it can only be replaced if it appears in the context tAu. In this

sense, the production rules are sensitive to the context in which items appear.

Observe also that since α and β in (∗) may be empty, every context free production

rule is of the form required for a context sensitive grammar. Thus every context

free grammar is also context sensitive and hence every context free language is

also context sensitive. Not all grammars and languages are context sensitive.

Even this powerful class of grammars has its limitations because it does not allow

certain types of production rules. For example, none of the rules

AB → CD or t → tAB or AtB → AB

where A, B, C, D ∈ N and t ∈ T are valid in a context sensitive grammar. The
broadest possible class of grammars is the class of unrestricted grammars. For
these grammars, almost any kind of production rules are allowed, the only
restriction being that the left hand side of a rule cannot be ε. The four classes of
grammars and languages we have discussed (regular, context free, context
sensitive and unrestricted) make up the Chomsky hierarchy2, the most fundamental
classification in formal language theory.

[Figure: the Chomsky hierarchy as nested sets: the regular languages inside the context free languages, inside the context sensitive languages, inside the unrestricted languages.]

2Named after Noam Chomsky, a pioneer of the study of formal languages and their application to both computer science and linguistics.

7.2. Greibach Normal Form

In Chapter 4 we saw that the regular languages are precisely those accepted by

finite state recognition machines. It turns out that there is a similar correspon-

dence between context free languages and (not necessarily deterministic) PDA.

In the next section, we show how to construct a PDA that accepts the language

generated by a context free grammar. Before we can do this, however, we must

first transform the grammar into an equivalent one that is suitable for this pur-

pose. By an equivalent grammar, we mean one that generates exactly the same

language as the one we started with. We need to convert our grammar into an

equivalent one in which every production rule takes one of the following forms3.

(GNF1) A → tB₁B₂ . . . Bₙ
(GNF2) A → t
(GNF3) σ → ε

(where t ∈ T, each Bᵢ ∈ N and the starting symbol σ does not appear on the right
hand side of any production rule). In other words (apart from the special rule
σ → ε), the right hand side of every production rule must consist of a terminal
symbol followed by a (possibly empty) string of non-terminal symbols. Context

free grammars where every production rule takes one of these forms are said to

be in Greibach normal form 4.

Although it is far from obvious, every context free grammar is equivalent to a

grammar in Greibach normal form. There are various algorithms that take a

general context free grammar and produce an equivalent one in Greibach normal

form. They are all rather technical and complicated and we will not investigate

them in detail. Instead we will give an idea of how they work by considering how

to find equivalent grammars in Greibach normal form in the case of some very

simple context free grammars.

Example 7.2.1. The language L = {0ⁿ1ⁿ : n ≥ 0} is generated by the grammar with T = {0, 1}, N = {σ} and rules

σ → 0σ1 | 01 | ε.

A very simple trick enables us to get rid of the troublesome terminal 1 on the
right hand side of each of the rules σ → 0σ1 and σ → 01. We just replace it with
a completely new non-terminal symbol B. This works provided we add a new
(GNF2) production rule B → 1 which ensures that B is eventually replaced

3Definitions of Greibach normal form given in various texts vary quite a bit. We adopt the

definition used by Blum and Koch in the paper listed at the start of this chapter.

4Named after its inventor, Sheila Greibach, now Professor of Computer Science at UCLA.


by a 1, since B appears in no other rule on the left hand side. This gives the grammar

σ → 0σB | 0B | ε
B → 1

which is not quite in Greibach normal form yet, because there is a σ on the right
hand side of σ → 0σB. We can replace this σ with another entirely new non-terminal
symbol V, but we must then duplicate every (GNF1) or (GNF2) rule
that has σ on the left hand side by one that has every σ replaced by V. This
ensures that all derivations involving σ will still be possible in our new grammar.
We thus obtain two new rules from σ → 0σB and σ → 0B by replacing σ's by
V's, giving the grammar

σ → 0V B | 0B | ε
V → 0V B | 0B
B → 1

which is in Greibach normal form. You should convince yourself that this really

is equivalent to the grammar we started with. In other words, you should check

that both grammars generate L.

Simple tricks like these for removing unwanted terminals and σ's from
the right hand side of a context free production rule always work. In the worst
case we are forced to add one new non-terminal and one new rule for each terminal,
plus one new non-terminal replacing σ and as many new rules as there are rules with
σ on the left hand side.

Thus for any context free grammar, we can always find an equivalent one where

terminals only appear as the first symbol on the right hand side (if they appear at

all) and there are no σ's on the right hand sides of the productions. The remaining
problem, where there are production rules with only non-terminals on the right
hand side, is somewhat trickier.

Example 7.2.2. For the grammar with T = {0, 1}, N = {σ, U, V} and rules

σ → UV | V U
U → 0V | 0
V → 1U | 1

the rules for U and V are already consistent with Greibach normal form. By
making the two substitutions allowed by the rules for U into the troublesome
rule σ → UV we obtain two new (GNF1) rules σ → 0V V and σ → 0V. These
new rules replace the existing rule σ → UV. Using a similar approach, we can
replace the rule σ → V U with two new (GNF1) rules, giving a grammar

σ → 0V V | 0V | 1UU | 1U
U → 0V | 0
V → 1U | 1

in Greibach normal form. Again, you should convince yourself this is equivalent

to the original grammar.

The method used in Example 7.2.2 doesn't always work. This is the reason a

more sophisticated algorithm is sometimes needed to find an equivalent grammar

in Greibach normal form.

Example 7.2.3. Consider the grammar with T = {0, 1}, N = {σ, U} and rules

σ → σU | 1U
U → 0σ | 0

If we were to apply the method of Example 7.2.2 to replace the troublesome rule σ → σU, we would need to make both of the substitutions allowed by the rules for σ. This would give the rules

σ → σUU and σ → 1UU.

The rule σ → σUU is just as troublesome as the one we started with, and
it is easy to see that further applications of the method of Example 7.2.2 to this
rule will not improve matters. They will just give rules of the form

σ → σUUU, σ → 1UUU, σ → σUUUU, σ → 1UUUU

and so on.

Example 7.2.3 illustrates just one of the problems that can be encountered when

attempting to apply a naive approach to transforming a context free grammar

into Greibach normal form. In view of such examples, it is not obvious that there

is always a way to carry out this transformation. The paper by Blum and Koch

appearing in the references at the start of this chapter contains a discussion and

proof of the following theorem.

Theorem 7.1. Every context free grammar is equivalent to a grammar in Greibach normal form.


7.3. PDA and Context Free Grammars

A context free grammar in Greibach normal form looks a bit like a regular grammar in the sense that the right hand side of every production rule except σ → ε (if it is present) begins with a terminal which may or may not be followed by

non-terminals. The biggest difference in the regular case is that there can be

at most one non-terminal. This lets us use a modified version of the technique

of Section 4.2 to construct a (typically non-deterministic) PDA recognising the

language of the grammar. As in Section 4.2, the input alphabet Σ consists of the
terminal symbols (so Σ = T). Instead of the non-terminals labeling the states,
they (together with the initial stack symbol z) constitute the stack alphabet Γ.
We need only one state q (which must therefore be the initial state) and our PDA
accepts on empty stack. The transition function δ is built up using the following
algorithm (remember that we are constructing a non-deterministic PDA, so δ
maps (t, q, A) to a set).

Algorithm 7.1.

(a) For each (GNF1) rule A → tB₁B₂ . . . Bₙ add an element (q, B₁B₂ . . . Bₙ) to the set δ(t, q, A).
(b) For each (GNF2) rule A → t add an element (q, ε) to the set δ(t, q, A).
(c) If there is a (GNF3) rule σ → ε, find all other rules with σ on the left hand side and:
    - For each such (GNF1) rule σ → tB₁B₂ . . . Bₙ add an element (q, B₁B₂ . . . Bₙz) to δ(t, q, z).
    - For each such (GNF2) rule σ → t add an element (q, z) to δ(t, q, z).

Note that if there is no (GNF3) rule, we need not bother adding any rules for the

initial stack symbol z. We will come back to the reasons for this shortly. The

construction is best understood by means of an example.

Example 7.3.1. Let L = {0ⁿ1ⁿ : n ≥ 0}. In Example 7.2.1 we observed that the
Greibach normal form grammar with T = {0, 1}, N = {σ, V, B} and rules

σ → 0V B | 0B | ε
V → 0V B | 0B
B → 1

generates L. Algorithm 7.1 gives the following PDA.


Q = {q}.                  x           0                     1
Σ = {0, 1}.               δ(x, q, z)  {(q, Bz), (q, V Bz)}
Γ = {z, σ, V, B}.         δ(x, q, σ)  {(q, B), (q, V B)}
F = ∅.                    δ(x, q, V)  {(q, B), (q, V B)}
Initial state q.          δ(x, q, B)                        {(q, ε)}
Initial stack symbol z.

The production rules σ → 0V B and σ → 0B add (q, V B) and (q, B) respectively
to the set δ(0, q, σ). Similarly, the production rules V → 0V B and V → 0B add
(q, V B) and (q, B) respectively to the set δ(0, q, V). The production rule B → 1
adds (q, ε) to the set δ(1, q, B). Finally, since the (GNF3) rule σ → ε is present,
the production rules σ → 0V B and σ → 0B add (q, V Bz) and (q, Bz) respectively
to the set δ(0, q, z).

In order to accept a word w = 0ⁿ1ⁿ ∈ L, this non-deterministic PDA must guess

when it reaches the last 0 in w and switch from pushing V B to pushing B only.

After that it has no choice but to pop a B for each 1 in the input string. We

illustrate this with some configuration evolutions for some elements of L.

w = ε :      (q, z)
w = 01 :     (q, z) --0--> (q, Bz) --1--> (q, z)
w = 0011 :   (q, z) --0--> (q, V Bz) --0--> (q, BBz) --1--> (q, Bz) --1--> (q, z)
w = 000111 : (q, z) --0--> (q, V Bz) --0--> (q, V BBz) --0--> (q, BBBz) --1--> (q, BBz) --1--> (q, Bz) --1--> (q, z)

In all cases, the evolution ends with the configuration (q, z) so each word is ac-

cepted. You should convince yourself that L really is the language accepted by

this PDA by considering what happens when a word not in L is processed.
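Algorithm 7.1 and the acceptance test are easy to mechanise. The following sketch (our own Python illustration; gnf_to_pda and accepts are hypothetical names, with 'S' standing for the starting symbol σ) builds the one-state transition sets from the rules of this example and tests words by searching over the possible evolutions.

```python
# Build the transition sets of a one-state PDA from a grammar in Greibach
# normal form. Rules are (non-terminal, right hand side) pairs, e.g.
# ('S', '0VB') for S -> 0VB; ('S', '') is the (GNF3) rule S -> epsilon.
def gnf_to_pda(rules, start='S'):
    delta = {}
    has_eps = (start, '') in rules
    for a, rhs in rules:
        if not rhs:
            continue                                 # the (GNF3) rule itself
        t, tail = rhs[0], rhs[1:]                    # terminal, non-terminals
        delta.setdefault((t, a), set()).add(tail)                 # (a), (b)
        if has_eps and a == start:
            delta.setdefault((t, 'z'), set()).add(tail + 'z')     # (c)
    return delta

def accepts(delta, word, stack='z'):
    # Nondeterministic acceptance: some evolution leaves only z on the stack.
    if not word:
        return stack == 'z'
    if not stack:
        return False
    return any(accepts(delta, word[1:], push + stack[1:])
               for push in delta.get((word[0], stack[0]), ()))

rules = [('S', '0VB'), ('S', '0B'), ('S', ''),
         ('V', '0VB'), ('V', '0B'), ('B', '1')]
delta = gnf_to_pda(rules)
print([accepts(delta, w) for w in ['', '01', '0011', '000111', '001', '10']])
# -> [True, True, True, True, False, False]
```

Since the PDA has only one state, the state is omitted from the configuration, matching the simplified notation adopted for practice classes.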

This construction always gives a PDA with one state q. To simplify notation in

practice classes and assignments, we will leave q out of the transition tables and

configuration notation, giving only the push word and the stack contents.

Example 7.3.1 illustrates an important fact about a single state PDA that accepts

on empty stack. Namely, if we start with just a z on the stack, the empty word

will always be accepted. This is not good news if we have a grammar whose

language does not contain ε. By the definition of Greibach normal form, rules of the
form A → ε where A ≠ σ are not allowed. In fact, the only way a Greibach
normal form grammar can generate ε is if it contains the (GNF3) production
σ → ε, because applying rules of type (GNF1) and (GNF2) always adds one
terminal symbol at each step in a derivation.


This is convenient because it means we can easily tell whether or not the language

L generated by the grammar contains ε. When ε ∉ L, there is a simple trick we
can use to prevent the PDA from accepting ε. Instead of starting with just the
initial stack symbol z on the stack, we start with σz on the stack. This guarantees
that the initial configuration (q, σz) is not accepting, so ε is not accepted. As a
bonus, we no longer need to define any transitions for the case where z is on the
top of the stack, simplifying the transition table a bit.

Example 7.3.2. Let L = {0ⁿ1ⁿ : n ≥ 1}. From Example 7.3.1 and the preceding
paragraph, it should be clear that the Greibach normal form grammar with T =
{0, 1}, N = {σ, V, B} and rules

σ → 0V B | 0B
V → 0V B | 0B
B → 1

generates L and that the following PDA, started with σz on the stack, accepts L.

Q = {q}.                  x           0                   1
Σ = {0, 1}.               δ(x, q, z)
Γ = {z, σ, V, B}.         δ(x, q, σ)  {(q, B), (q, V B)}
F = ∅.                    δ(x, q, V)  {(q, B), (q, V B)}
Initial state q.          δ(x, q, B)                      {(q, ε)}
Initial stack symbol z.

The transitions for σ, V and B are the same as in Example 7.3.1 and since there
is no (GNF3) rule, we don't need to define any transitions for z. As in Example

7.3.1, the PDA must guess when the last 0 is reached. We illustrate this by

giving configuration evolutions for some elements of L.

w = 01 :     (q, σz) --0--> (q, Bz) --1--> (q, z)
w = 0011 :   (q, σz) --0--> (q, V Bz) --0--> (q, BBz) --1--> (q, Bz) --1--> (q, z)
w = 000111 : (q, σz) --0--> (q, V Bz) --0--> (q, V BBz) --0--> (q, BBBz) --1--> (q, BBz) --1--> (q, Bz) --1--> (q, z)

Since the evolutions all end with (q, z), the words are accepted. Even though the

initial symbol z wasn't marking the stack top when we started, we still need it to

detect the fact that the stack was empty after processing the input.
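The effect of starting with σz on the stack can be seen in a short sketch (our own Python illustration, not part of the notes; 'S' stands for σ): with initial stack Sz the machine of this example rejects the empty word but accepts 0ⁿ1ⁿ for n ≥ 1.

```python
# delta maps (input symbol, stack top) to the set of push words that may
# replace the top; acceptance means only z remains after the whole input.
def accepts(delta, word, stack):
    if not word:
        return stack == 'z'
    if not stack:
        return False
    return any(accepts(delta, word[1:], push + stack[1:])
               for push in delta.get((word[0], stack[0]), ()))

delta = {('0', 'S'): {'VB', 'B'}, ('0', 'V'): {'VB', 'B'}, ('1', 'B'): {''}}
print([accepts(delta, w, 'Sz') for w in ['', '01', '0011', '000111', '010']])
# -> [False, True, True, True, False]
```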

It is also possible to show that for any PDA with language L there is a context

free grammar that generates L (and hence L is a context free language). As this

is rather complicated, we will not consider the details, but putting this result


together with the PDA construction of Algorithm 7.1 and Theorem 7.1 yields the

following theorem.

Theorem 7.2. A language is context free if and only if it is accepted by some (possibly non-deterministic) PDA.

7.4. Deterministic Context Free Languages

Recall from Section 6.4 that, in contrast to the situation with finite state machines, non-deterministic PDA are genuinely more powerful than deterministic ones. They are capable of recognising a larger class of languages. This gives rise to

a very important distinction in the theory of context free languages. A determin-

istic context free language is one that is recognised by some deterministic PDA.

Example 6.4.1 gave a non-deterministic PDA recognising the even palindrome

language

L = {wwᴿ : w ∈ {0, 1}*}.

It was noted there that there is no deterministic PDA recognising this language,

illustrating the fact that not all context free languages are deterministic.

From the point of view of compiler or interpreter design, deterministic context free

languages are highly desirable. A compiler or interpreter for a non-deterministic

language would typically need to use some kind of backtracking algorithm.
Such algorithms are slow because the number of steps needed to process an input
typically grows exponentially as a function of the length of the input.

The problem of determining whether or not the language generated by a given

context free grammar is deterministic is a large and difficult topic. We will not

discuss it in detail. However, there is one particular class of context free languages

that is worth discussing because they can be very neatly shown to be deterministic

using ideas we have already developed.

A context free grammar is called an s-grammar or simple grammar if it is in

Greibach normal form and has the additional property that

for any non-terminal symbol A and any terminal symbol t there is at most

one production rule with A on the left and t on the right.

Of course, a rule with A on the left and t on the right is either (GNF1) or (GNF2).

Example 7.4.1. The Greibach normal form grammar of Example 7.3.2 for L =
{0ⁿ1ⁿ : n ≥ 1} has T = {0, 1}, N = {σ, V, B} and rules

σ → 0V B | 0B
V → 0V B | 0B
B → 1


and is not an s-grammar because, for example, there are two production rules
V → 0V B and V → 0B having the same non-terminal V on the left and terminal
0 on the right. There is, however, an equivalent s-grammar. One may check that
the grammar with T = {0, 1}, N = {σ, V, B} and rules

σ → 0V
V → 0V B | 1
B → 1

also generates L (you should convince yourself of this). The two rules V → 0V B
and V → 1 for the non-terminal V are acceptable in an s-grammar, because the

terminal symbols on the right are different. This is therefore an s-grammar.
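The s-grammar condition is easy to test mechanically. A minimal sketch (our own Python illustration; is_s_grammar is a hypothetical name, rules written as (non-terminal, right hand side) pairs with 'S' for σ):

```python
def is_s_grammar(rules):
    # In an s-grammar, each (non-terminal, leading terminal) pair determines
    # at most one production rule.
    keys = [(a, rhs[0]) for a, rhs in rules if rhs]
    return len(keys) == len(set(keys))

old = [('S', '0VB'), ('S', '0B'), ('V', '0VB'), ('V', '0B'), ('B', '1')]
new = [('S', '0V'), ('V', '0VB'), ('V', '1'), ('B', '1')]
print(is_s_grammar(old), is_s_grammar(new))  # False True
```

Here old is the grammar of Example 7.3.2 and new the equivalent s-grammar above.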

When we carry out the construction of Algorithm 7.1 for an s-grammar, we obtain

a deterministic PDA. This happens because the defining property of an s-grammar

guarantees we can only ever add at most one element to δ(t, q, A) for each t ∈ T
and A ∈ N.

Example 7.4.2. Applying the construction of Section 7.3 to the s-grammar we

found in Example 7.4.1 for the language L = {0ⁿ1ⁿ : n ≥ 1} yields the following

PDA.

Q = {q}.                  x           0           1
Σ = {0, 1}.               δ(x, q, σ)  {(q, V)}
Γ = {z, σ, V, B}.         δ(x, q, V)  {(q, V B)}  {(q, ε)}
F = ∅.                    δ(x, q, B)              {(q, ε)}
Initial state q.
Initial stack symbol z.

As in Example 7.3.2, we start with σz on the stack and need not define transitions for z. Since all defined entries in the table are one element sets, the PDA is really deterministic, so we could rewrite its transition table in the deterministic form shown below.

x           0         1
δ(x, q, σ)  (q, V)
δ(x, q, V)  (q, V B)  (q, ε)
δ(x, q, B)            (q, ε)

S-grammars are very easy to work with. It is easy to write fast and efficient

programs that can recognise the language generated by an s-grammar. Although

they are not sufficiently powerful to describe modern programming languages, they

can be used for some tasks and they can be generalised to more powerful types of

grammars appropriate to the task of compiling or interpreting real programming

languages. In the theory of compilers and programming language design, the most

widely studied generalisations of s-grammars are the LL grammars and the LR

grammars.

Chapter Eight

Countability and Uncountability

Resolution of many issues in theoretical computer science requires a consideration of the relative sizes of sets. For finite sets

A and B, questions like "Is A larger than B?" or "Do A and B
have the same number of elements?" are simple to interpret and

usually fairly easy to answer. For infinite sets, the meaning of

such questions is not immediately obvious. In this chapter, we

consider how to compare the sizes of infinite sets.

The central ideas we need are cardinality and countability.

8.1. How Big is a Set?

8.1.1. When sets are the same size. Before attempting to analyse the

situation for infinite sets, we first consider what it means for one finite set to be

the same size as another. Provided we resolve these questions in the right way,

our definition extends to the infinite case in a straightforward way.

It does not take mathematical genius to calculate that the set A = {a, b, c, d} has

4 elements. One simply counts the elements: a is the 1st element, b is the 2nd , c is

the 3rd and d is the 4th . Another way of viewing this counting process is that we

are matching up the elements of A with the numbers 1, 2, 3 and 4 in such a way

that each element of A corresponds to precisely one of these numbers. In other

words, we are showing there is a one-to-one correspondence

a   b   c   d
↕   ↕   ↕   ↕
1   2   3   4

between A and N4 = {1, 2, 3, 4}. This correspondence is only possible because the

sets A and N4 are the same size. A similar correspondence between

F = {Hawthorn, Essendon, Footscray, Geelong, Carlton}

and N5 = {1, 2, 3, 4, 5} shows that F has five elements or, equivalently, that F

and N5 have the same size. This line of thinking leads to the definition that

A has n elements precisely if there is a one-to-one correspondence between A and

Nn = {1, 2, . . . , n} and more generally, that two finite sets are the same size if

there is a one-to-one correspondence between them. We may think of the sets Nn


as reference sets for the sizes of finite sets. It should be obvious that every finite

set is in one-to-one correspondence with precisely one of them.

If we wish to use one-to-one correspondences to frame a definition of sets having

the same number of elements, we need to state in a precise mathematical way

what is meant by a one-to-one correspondence. Changing the correspondence

diagram slightly

x      1  2  3  4
f(x)   a  b  c  d

makes it clear that a one-to-one correspondence between sets is in fact a function

between the sets, not just any function, but one that matches up the elements

of its domain and codomain with each other in a one-to-one fashion. A function

between the sets A and B has this matching property precisely if it satisfies two

requirements:

- It must be a one-to-one function. This means it must send different elements of its domain to different elements in the codomain. In quantifiers, this may be expressed as

  (∀x, y ∈ A) x ≠ y =⇒ f(x) ≠ f(y).

- It must be an onto function. This means that every element of the codomain has some element of the domain mapping to it. In quantifiers, this may be expressed as

  (∀y ∈ B)(∃x ∈ A) f(x) = y.

The quantified definition of a one-to-one function given above captures the motivation of the definition nicely, namely, that distinct points in the first set cannot

correspond to the same point in the other. When checking whether a particular

function is one-to-one, however, it is often easier to use the equivalent contrapos-

itive form

(∀x, y ∈ A) f(x) = f(y) =⇒ x = y

of the definition. This is particularly so in cases where f is defined using a formula

rather than a table.

Example 8.1.1. Let A = {−1, 0, 1, 2} and B = {−2, 0, 2, 4}. To check that the
function f : A → B defined by f(x) = 2x is a one-to-one correspondence, we
check that the contrapositive form of the definition of one-to-one holds

f(x) = f(y) =⇒ 2x = 2y =⇒ x = y

and that f is onto. Since B is finite, we can check this by checking exhaustively

that every element of B has something that maps to it:


f(−1) = −2, f(0) = 0, f(1) = 2 and f(2) = 4. For infinite sets, exhaustive checking is impossible and we will have to work smarter.
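For small finite sets, both conditions can be checked exhaustively in a few lines (our own Python sketch, using the sets of Example 8.1.1):

```python
# Exhaustive check that f(x) = 2x is a one-to-one correspondence between
# A = {-1, 0, 1, 2} and B = {-2, 0, 2, 4}.
A = {-1, 0, 1, 2}
B = {-2, 0, 2, 4}
f = lambda x: 2 * x
image = {f(x) for x in A}
one_to_one = len(image) == len(A)   # no two elements of A share an image
onto = image == B                   # every element of B is an image
print(one_to_one and onto)          # True
```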

So far we have only considered finite sets A and B, but this idea of saying A and B have the same number of elements (or the same size) precisely if there is a one-to-one and onto function f : A → B still makes perfectly good sense in the case where A and B are infinite. In fact, this is the universally accepted mathematical definition of the idea that two sets are the same size.

Definition 8.1.1. We say sets A and B have the same size or cardinality if there exists a one-to-one and onto function f : A → B. We express this by writing |A| = |B|.

Example 8.1.2. For the language L = {0^n 1^n : n ≥ 1}, it is easy to see that the function f : N → L defined by f(n) = 0^n 1^n is a one-to-one correspondence. Hence |N| = |L|.

All of this may seem like child's play in the case of finite sets, but in the infinite case there are a few surprises in store.

Example 8.1.3. Let A = N and let B = {2, 4, 6, 8, . . . } be the set of even natural numbers. We now check that the function f : A → B defined by f(x) = 2x is a one-to-one correspondence. As in Example 8.1.1, the contrapositive form of the definition of one-to-one holds since

f(x) = f(y) ⇒ 2x = 2y ⇒ x = y.

To check that f is onto, we observe that every element y ∈ B takes the form y = 2n for some n ∈ N, so y = f(n).

Example 8.1.3 may seem a bit strange at first sight. Despite the fact that B is a

proper subset of A, it is the same size as A according to our definition. This kind

of thing seems strange because we are so used to thinking about the size of finite

sets, for which such behaviour is impossible1. There are stranger things to come.

There is another point about Definition 8.1.1 that may be bothering some readers. It seems to be asymmetrical. Surely |A| = |B| should imply |B| = |A|, and yet the definition is expressed using a one-to-one and onto function f : A → B. Recall from Example 1.3.4 that although a function f always has an inverse relation f⁻¹, there is no guarantee in general that f⁻¹ is a function. The following theorem, however, shows that if f : A → B is one-to-one and onto, then so is f⁻¹ : B → A, and hence |B| = |A|. This shows that the asymmetry in Definition 8.1.1 is apparent rather than real. In other words, when proving set cardinalities are equal, it doesn't matter which direction the function goes.

Theorem 8.1. A function f : A → B has an inverse function precisely if f is one-to-one and onto. Moreover, if f is one-to-one and onto, so is f⁻¹.

1In fact, there is an alternative definition of a finite set which says that a set is finite precisely if it does not have the same cardinality as any of its proper subsets. A set having this property is called Dedekind finite.

Theorem 8.1 is also useful for checking that a function is one-to-one and onto,

because it says we can do so by finding a formula for the inverse. And in order

to establish that a function g is the inverse of a function f , all we need to do is

check that the following two conditions hold.

(INV1) g(f (x)) = x for every x in the domain of f .

(INV2) f (g(x)) = x for every x in the domain of g.

Example 8.1.4. The set Z of integers consists of all whole numbers, positive, negative and zero. The one-to-one correspondence suggested by

1  2   3  4   5  6   7  8   9  ...
↕  ↕   ↕  ↕   ↕  ↕   ↕  ↕   ↕
0  1  −1  2  −2  3  −3  4  −4  ...

can be turned into a formula for a function f : N → Z defined by

f(n) = (1 − n)/2  if n is odd,
f(n) = n/2        if n is even.

In view of Theorem 8.1, we can show that f is one-to-one and onto by showing it has an inverse. The function g : Z → N defined by

g(n) = 1 − 2n  if n ≤ 0,
g(n) = 2n      if n > 0

is the inverse of f (you should check this, by checking conditions (INV1) and (INV2)), so f is one-to-one and onto and hence |Z| = |N|.
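The formulas for f and g are easy to check mechanically on an initial range of values. The sketch below assumes, as in the notes, that N starts at 1.

```python
def f(n):
    """The function N -> Z of Example 8.1.4."""
    return (1 - n) // 2 if n % 2 == 1 else n // 2

def g(n):
    """The proposed inverse g : Z -> N."""
    return 1 - 2 * n if n <= 0 else 2 * n

print([f(n) for n in range(1, 10)])  # → [0, 1, -1, 2, -2, 3, -3, 4, -4]

# Conditions (INV1) and (INV2) on a finite range of values.
assert all(g(f(n)) == n for n in range(1, 1000))
assert all(f(g(z)) == z for z in range(-500, 500))
```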

8.1.2. Large and small. When trying to decide when one set is larger than

another, it is again sensible to start with finite examples. Although there is no

one-to-one correspondence between A = {a, b, c, d} and B = {a, c, e}, we can get

a one-to-one correspondence between B and a subset of A.

a  c  e
↕  ↕  ↕
a  b  c  d

Such a correspondence gives a one-to-one function from B to A that is not onto.

This gives a way of expressing the fact that A is larger than B, but we have to be careful. We could attempt to define |B| < |A| to mean that there is a one-to-one function from B to A that is not onto. This would work in the finite case, but it would give a questionable definition in the case where the sets are infinite. The

function f(x) = 2x of Example 8.1.3(a) illustrates this, because it is a one-to-one but not onto function from N to N, showing that |N| < |N| according to this tentative definition!

To avoid these difficulties, we instead define |B| ≤ |A| to mean that there is a one-to-one function from B to A. This function could be onto, in which case the sets would be equal in size, as suggested by the notation. The statement |B| < |A| is then defined to mean that |B| ≤ |A| is true, but |B| = |A| is false. In other words, there is a one-to-one function from B to A, but there is no one-to-one and onto function. This definition lets us prove the following (rather obvious, but very useful) theorem.

Theorem 8.2. If B ⊆ A then |B| ≤ |A|.

To see this, consider the inclusion function f : B → A defined by f(x) = x. This function is one-to-one since f(x) = f(y) ⇒ x = y.

Example 8.1.5.
(a) Since Z is contained in the set Q of rational numbers (the numbers we can write as fractions m/n where m and n are integers and n ≠ 0), we have |Z| ≤ |Q|.

(b) Since Q is contained in the set R of real numbers, we have |Q| ≤ |R|.
(c) If F is a finite set then for some m ∈ N there is a one-to-one and onto function f : F → Nm, and Nm ⊆ N. Hence |F| = |Nm| ≤ |N|, so |F| ≤ |N|. Since it is obvious that there can be no one-to-one and onto function g : F → N, we have |F| < |N|.

We saw that |Z| = |N| in Example 8.1.3(c) and we now know that |Z| ≤ |Q| and |Q| ≤ |R|. There is no obvious way of deciding whether |Z| < |Q| or |Q| < |R| at

this stage. In fact, some real surprises are in store here.

It may be shown that the smallest possible infinite sets are those having the same cardinality as N. A precise meaning for this slightly mysterious statement is given in the following theorem.

Theorem 8.3. If a set A is infinite and |A| ≤ |N| then |A| = |N|.

An equivalent way of putting this: if |A| < |N| then A is finite. (∗)

Sets of the same cardinality as N are extremely important in mathematics and theoretical computer science because they have the very useful property that their elements can be written as an infinite list

x1, x2, x3, x4, . . .

in which no element is ever repeated. This follows directly from the definition of cardinality, for if |A| = |N| there is a one-to-one and onto function f : N → A, and putting

x1 = f(1), x2 = f(2), x3 = f(3), x4 = f(4), . . .

gives the desired listing of the elements of A. The fact that f is one-to-one guarantees no element is ever repeated. Moreover, this argument can easily be reversed to show that any set A whose elements can be written as an infinite list

x1, x2, x3, x4, . . .

with no element ever repeated has the same cardinality as N. All we need to do is define f : A → N by f(xi) = i. Sets of this cardinality are so important, they have a special name. Just as the sets Nn of Section 8.1 are used to represent the sizes of finite sets, the set N is used as the standard representative of the size of these sets. This is the idea behind the following definition.

Definition 8.2.1. A set A is called countably infinite if |A| = |N|, and A is called countable if |A| ≤ |N|.

Example 8.1.5(c) shows immediately that every finite set is countable. In fact, by the version of Theorem 8.3 given above in (∗), the countable sets that are not countably infinite are precisely the finite sets. Example 8.1.3 shows, somewhat surprisingly, that the set of even natural numbers and the set Z of integers are both countably infinite. Much more surprising is the fact that the Cartesian product N × N, pictured in Figure 1, is countable.

[Figure 1: the Cartesian product N × N pictured as an infinite grid of ordered pairs.]


Even by itself, the bottom row is a copy of N, so N N appears to have far more

elements than N. However we can use the idea of writing the elements of N N

as an infinite list to demonstrate that N N is countably infinite. The trick is to

count following a diagonal pattern, as follows:

(1, 1),   (1, 2), (2, 1),   (1, 3), (2, 2), (3, 1),   . . .
 sum 2        sum 3               sum 4

We first count the ordered pairs with sum 2, then those with sum 3 (in increasing

order of the first coordinate), then those with sum 4, and so on. Pairs with the

same sum lie on a diagonal. This process is illustrated in Figure 2.

[Figure 2: the diagonal counting pattern traced through the grid N × N.]

The counting gives the correspondence

1       2       3       4       5       6
↕       ↕       ↕       ↕       ↕       ↕       . . .
(1, 1)  (1, 2)  (2, 1)  (1, 3)  (2, 2)  (3, 1)

This makes it plausible that N × N is countably infinite, but to convince a skeptic we need to produce a formula for a one-to-one and onto function f : N × N → N. It turns out that

f(m, n) = (1/2)(m + n − 2)(m + n − 1) + m

does the trick. It is left as a challenge to the interested student to show that

f really is one-to-one and onto. Notice, however, that it works for the first six

ordered pairs listed before:

f(1, 1) = (1/2)(0)(1) + 1 = 1,          f(1, 2) = (1/2)(1)(2) + 1 = 1 + 1 = 2,
f(2, 1) = (1/2)(1)(2) + 2 = 1 + 2 = 3,  f(1, 3) = (1/2)(2)(3) + 1 = 3 + 1 = 4,
f(2, 2) = (1/2)(2)(3) + 2 = 3 + 2 = 5,  f(3, 1) = (1/2)(2)(3) + 3 = 3 + 3 = 6.
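The interested student can at least let a computer gather evidence: the sketch below implements the formula and checks that the pairs on the first k diagonals are sent, without repetition, to exactly the numbers 1, 2, . . . , k(k + 1)/2.

```python
def pair(m, n):
    """The diagonal pairing formula f(m, n) = (1/2)(m + n - 2)(m + n - 1) + m."""
    return (m + n - 2) * (m + n - 1) // 2 + m

# The six values computed above.
diag = [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
print([pair(m, n) for m, n in diag])  # → [1, 2, 3, 4, 5, 6]

# The pairs with coordinate sum at most k + 1 hit 1 .. k(k+1)/2 exactly once each.
k = 50
values = sorted(pair(m, n) for m in range(1, k + 1) for n in range(1, k + 2 - m))
assert values == list(range(1, k * (k + 1) // 2 + 1))
```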


The fact that N N is countably infinite has other surprising consequences. Let

Q+ denote the set of positive rational numbers. Recall that we can write any

positive rational number q ∈ Q+ in lowest positive terms. This means q can be written in the form q = mq/nq where mq and nq have no common factors (because we have cancelled as far as possible) and mq and nq are also both positive (there is no point writing q = m/n where m and n are both negative). The fact that we have written q in its lowest terms means mq and nq are unique, so the function f : Q+ → N × N defined by f(q) = (mq, nq) is one-to-one and hence

|Q+| ≤ |N × N| = |N|

which shows that |Q+| ≤ |N|. Since Q+ is infinite, Theorem 8.3 shows that |Q+| = |N|. A construction similar to the one used in Example 8.1.4 can be used to show that |Q+| = |Q|, giving the following theorem.

Theorem 8.4. The set Q of rational numbers is countably infinite, that is, |Q| = |N|.
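Composing q ↦ (mq, nq) with the pairing function above gives an explicit one-to-one map from Q+ into N. A sketch (the function names are ours; Python's Fraction reduces to lowest terms automatically):

```python
from fractions import Fraction

def lowest_terms_pair(q):
    """Send a positive rational q to (m_q, n_q) in lowest terms."""
    return (q.numerator, q.denominator)

def pair(m, n):
    """The diagonal pairing function N x N -> N."""
    return (m + n - 2) * (m + n - 1) // 2 + m

def encode(q):
    """A one-to-one map Q+ -> N."""
    return pair(*lowest_terms_pair(q))

# Equal rationals get equal codes, distinct rationals distinct codes.
assert encode(Fraction(2, 4)) == encode(Fraction(1, 2))
sample = {Fraction(a, b) for a in range(1, 40) for b in range(1, 40)}
assert len({encode(q) for q in sample}) == len(sample)  # one-to-one on the sample
```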

In view of the very generous way in which the rational numbers are scattered among the real numbers (between any pair of distinct real numbers, there is always a rational number), this seems very surprising. It looks as though there should be a lot more rational numbers than there are natural numbers!

To see why Theorem 8.3 holds, suppose A is infinite and |A| ≤ |N|, so there is a one-to-one function f : A → N. Since f is one-to-one, the set

B = {f(x) : x ∈ A}

is infinite and since B ⊆ N, we can list its elements in increasing order

k1 < k2 < k3 < k4 < . . .

For each n ∈ N, the fact that kn ∈ B means there is an x ∈ A such that f(x) = kn. Since f is one-to-one, there is only one such element, so we can call it xn and define g : N → A by g(n) = xn. It may be shown that g is one-to-one and onto, which shows |A| = |N|; checking this is left as a challenge to the interested student.

Recall that the union A1 ∪ A2 ∪ · · · ∪ An of a finite sequence A1, A2, . . . , An of sets is the set obtained by throwing in all of the elements of all of the Ai. Using quantifiers,

⋃_{i=1}^{n} Ai = A1 ∪ A2 ∪ · · · ∪ An = {x : (∃i ∈ Nn) x ∈ Ai}.

The same idea applies to an infinite sequence

A1, A2, A3, A4, . . .

of sets. The aim is the same. We seek the set obtained by throwing in all of the

elements of all of the Ai . The union of an infinite sequence of sets A1 , A2 , A3 , . . .

is defined using quantifiers by

⋃_{i=1}^{∞} Ai = A1 ∪ A2 ∪ A3 ∪ · · · = {x : (∃i ∈ N) x ∈ Ai}.

Example 8.3.1.
(a) If Ai = {i} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = N.
(b) If Ai = {i, −i, 0} for each i ∈ N, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · = Z.
(c) If Ai = {0^i 1^i} for each i ∈ N, where 0^i 1^i denotes a word in {0, 1}*, it is easy to see that A1 ∪ A2 ∪ A3 ∪ · · · is the language L = {0^n 1^n : n ∈ N}.

Now suppose we have an infinite sequence

A1, A2, A3, A4, . . .

of disjoint, finite, non-empty sets, say |Ai| = ki for each i ∈ N. To label the elements of these sets carefully, we need two subscripts (one to tell which set Ai the element is in and one to tell precisely which element it is) giving a labeling

A1 = {a11, a12, a13, . . . , a1k1}
A2 = {a21, a22, a23, . . . , a2k2}
A3 = {a31, a32, a33, . . . , a3k3}
. . .

for all of the elements of all of the Ai. This labeling allows us to write the elements of A1 ∪ A2 ∪ A3 ∪ · · · as a list

a11, a12, . . . , a1k1, a21, . . . , a2k2, a31, a32, . . . , a3k3, a41, . . .

Since the Ai are disjoint, each element of A1 ∪ A2 ∪ A3 ∪ · · · appears exactly once in this list. This listing shows us that |A1 ∪ A2 ∪ A3 ∪ · · · | = |N|.

For the skeptics we note that this list order is given by the function

f : ⋃_{i=1}^{∞} Ai → N

defined by f(a1j) = j and by

f(aij) = j + Σ_{r=1}^{i−1} kr

in the case where i > 1. It is left as a challenge to the interested student to show

f is one-to-one and onto.
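The index formula can be tested numerically. The sketch below computes the position of aij in the combined list from the sizes k1, k2, . . . of the earlier sets (the sizes chosen here are made up for illustration).

```python
def index_in_list(i, j, sizes):
    """Position of a_ij in the combined list: j plus the sizes of A_1 .. A_{i-1}."""
    return j + sum(sizes[:i - 1])

sizes = [3, 1, 4, 2]  # |A_1| = 3, |A_2| = 1, |A_3| = 4, |A_4| = 2
print(index_in_list(1, 1, sizes))  # → 1 : a_11 comes first
print(index_in_list(2, 1, sizes))  # → 4 : a_21 follows the three elements of A_1
print(index_in_list(3, 4, sizes))  # → 8 : a_34 is the eighth element listed
```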

In fact, it is possible to drop the requirement that the Ai be disjoint. This requires a modification to the above list in the case where the Ai are not disjoint: as we construct the list, whenever we encounter an aij we have seen before, we just leave it out. This yields the following theorem.

Theorem 8.5. If A1, A2, A3, . . . is an infinite sequence of finite sets, then the union ⋃_{i=1}^{∞} Ai is countable.

In fact, we can use Theorem 8.5 to see why every language must be countable. It is implicit in our definition of a language L that the alphabet Σ of L is finite, say |Σ| = m.

For each i ≥ 0, let Li denote the set of words in L of length i. We can use a counting argument of the type you met in MAT1DM to show that each Li is finite. For any word

w = w1 w2 . . . wi ∈ Li

where i > 0, we have w1 ∈ Σ, so there are at most m possible choices for the symbol w1. Similarly, there are at most m possible choices for each of w2, w3, . . . , wi−1 and wi. These choices may or may not be independent, depending on L, but in the case where they are independent, we obtain the total number of words in Li by multiplying, so

|Li| = m · m · · · m = m^i.

If the choices are not independent, the number of possibilities will actually be less and we will have |Li| < m^i. In either case Li is finite. Finally, there are only two possibilities for L0: if L contains the empty word then L0 consists of just the empty word, and if not, L0 = ∅.

Example 8.3.2.
(a) If L = {0, 1}*, there are no restrictions on the choice of 0s and 1s. In other words, the choices are independent of one another, so |Li| = 2^i.

(b) If L = {0^n 1^n : n ≥ 0}, then Li = {0^{i/2} 1^{i/2}} for each even i ≥ 0 and hence |Li| = 1. For odd i ≥ 0, we have Li = ∅ so |Li| = 0.


odd i ≥ 0. For even i ≥ 0, the choices of 0s and 1s are independent, so |Li| = 2^i.
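Counts like these can be checked by brute force for small i. The sketch below (helper names are ours) enumerates all words of length i over the alphabet and keeps those in L.

```python
from itertools import product

def length_slice(in_language, alphabet, i):
    """The set L_i: all words of length i over the alphabet that lie in L."""
    return ["".join(w) for w in product(alphabet, repeat=i) if in_language("".join(w))]

# L = {0^n 1^n : n >= 0}: one word of each even length, none of odd length.
def in_L(w):
    half = len(w) // 2
    return len(w) % 2 == 0 and w == "0" * half + "1" * half

print([len(length_slice(in_L, "01", i)) for i in range(7)])  # → [1, 0, 1, 0, 1, 0, 1]

# With no restrictions (L = {0,1}*), the choices are independent and |L_i| = 2^i.
print([len(length_slice(lambda w: True, "01", i)) for i in range(5)])  # → [1, 2, 4, 8, 16]
```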

Since every word in L lies in one of the Li, we have

L = ⋃_{i=1}^{∞} L_{i−1} = L0 ∪ L1 ∪ L2 ∪ L3 ∪ · · ·

and Theorem 8.5 shows L is countable. This gives the following theorem.

Theorem 8.6. Every language is countable.

We can obtain an even better result than Theorem 8.5 using the fact that N × N is countably infinite. If A1, A2, A3, . . . is an infinite sequence of disjoint countable sets, then for each i ∈ N there is a one-to-one function gi : Ai → N (not necessarily onto because some of the Ai might be finite). Because the Ai are disjoint, we can define

f : ⋃_{i=1}^{∞} Ai → N × N

by f(aij) = (i, gi(aij)). (Can you see why we require that the Ai be disjoint here?) It can be shown that f is one-to-one and hence

|⋃_{i=1}^{∞} Ai| ≤ |N × N| = |N|

so |⋃_{i=1}^{∞} Ai| ≤ |N|. Since ⋃_{i=1}^{∞} Ai is infinite, Theorem 8.3 shows that |⋃_{i=1}^{∞} Ai| = |N|, so it is countably infinite. Here again it is possible to drop the requirement that the Ai be disjoint using the same approach as for Theorem 8.5. This gives a stronger theorem:

Theorem 8.7. The union of an infinite sequence of countable sets is countable.

2Notice that a language need not be infinite. It may be the case that only finitely many of the Li are non-empty, in which case the union of the Li would be finite.


A set that is not countable is (not surprisingly) called uncountable. In view of the theorems of the previous section, some readers may be wondering by now whether all sets are countable. Of course, they are not. Otherwise, we wouldn't waste your time by defining uncountable sets! A key idea we will need in order to demonstrate the existence of uncountable sets is that of the set of all subsets of a given set.

Definition 8.4.1. Let A be a set. The power set of A is the set of all subsets of A (never forget that this always includes ∅ and A itself) and is denoted P(A).

Example 8.4.1.
(a) P(∅) = {∅}. Notice that even though ∅ is empty, P(∅) is not, since it contains ∅ itself.
(b) P(N1) = P({1}) = {∅, {1}} so |P({1})| = 2.
(c) P(N2) = {∅, {1}, {2}, {1, 2}} so |P(N2)| = 4.
(d) P(N3) = {∅, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}} so |P(N3)| = 8.
(e) P(N) contains ∅, N, all of the finite subsets of N, for example,

{1}, {1, 2}, {4, 7}, {100, 1000}, {10, 11, . . . , 20}

and so on as well as all of the infinite subsets like the even numbers, the

odd numbers, the perfect squares {1, 4, 9, 16, . . . }, and countless others

(in fact, uncountably many, as we shall soon see).

You may have already spotted a pattern in Example 8.4.1. It turns out that

|P(Nn)| = 2^n for every n ∈ N

and from this it may be shown3 that if |A| = n then |P(A)| = 2^n. It follows that |A| < |P(A)| for any finite set A. Remarkably, this is true even for infinite sets.
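The pattern |P(A)| = 2^n is easy to verify for small sets; a minimal sketch:

```python
from itertools import combinations

def power_set(A):
    """All subsets of A, including the empty set and A itself."""
    elems = list(A)
    return [set(c) for r in range(len(elems) + 1) for c in combinations(elems, r)]

# |P(N_n)| = 2^n for the small cases of Example 8.4.1.
for n in range(6):
    assert len(power_set(range(1, n + 1))) == 2 ** n
print(power_set({1, 2}))  # → [set(), {1}, {2}, {1, 2}]
```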

Theorem 8.8 (Cantor's Theorem). |A| < |P(A)| for any set A.

Thus |N| < |P(N)|, so P(N) is our first example of an uncountable set. By applying Cantor's Theorem repeatedly, we can construct a sequence of infinite sets of strictly increasing cardinality

|N| < |P(N)| < |P(P(N))| < |P(P(P(N)))| < · · ·

3Because of this, many books use the notation 2^A instead of P(A) for the power set. In fact, this is the reason for the name power set.


The sets P(N), P(P(N)), P(P(P(N))), . . . are all uncountable, so we have not just shown how to construct an uncountable set, we have shown how to construct uncountable sets as big as we like. These sets may seem a little unfamiliar, but we can use the fact that P(N) is uncountable to show that more familiar sets are also uncountable.

Example 8.4.2.

(a) Let B denote the set of all infinite binary sequences4, so B is the set of all sequences of the form

x1, x2, x3, . . .

where each xi is either 0 or 1. We can define a function f : B → P(N) in a very natural way by letting f(x1, x2, x3, . . . ) = {i ∈ N : xi = 1}. For example

f(0, 0, 0, 0, . . . ) = ∅
f(1, 0, 0, 0, . . . ) = {1}
f(0, 1, 0, 1, 0, 1, . . . ) = {n ∈ N : n is even}
f(1, 1, 1, 1, . . . ) = N.

It may be shown that f is one-to-one and onto, so |B| = |P(N)| and hence B is uncountable.

(b) Define a function f : B → R (where B is defined as in (a) above) by sending x1, x2, x3, . . . ∈ B to the number with decimal representation 0.x1x2x3 . . . , so every f(x1, x2, x3, . . . ) = 0.x1x2x3 . . . is contained in the interval [0, 1). Thus for example

f(0, 0, 0, 0, . . . ) = 0
f(1, 0, 0, 0, . . . ) = 0.1000 · · · = 1/10
f(1, 1, 1, 1, . . . ) = 0.1111 · · · = 1/9.

It is well known that any two distinct decimal representations give distinct real numbers except in the important case where one representation ends in an infinite string of 9s, as in 0.0999 · · · = 0.1000 · · · . Since our representations contain only the digits 0 and 1, this case never arises, so distinct elements of B are always sent to different numbers by f. This means f is one-to-one and hence |B| ≤ |R|. It follows that R is uncountable by (a).

4The usual mathematical notation for B is {0, 1}^N.


Recall from the previous section that Q is countable, but that between any pair of distinct real numbers, there is always a rational number. Nonetheless, there are a lot more real numbers than there are rational numbers.

The most important result from our point of view concerns the set of all languages made from a (non-empty) alphabet Σ. Since Σ contains some symbol a, Σ* contains the word a^n for each n ∈ N, so Σ* is infinite. Since Σ* is a language, Theorem 8.6 tells us that |Σ*| ≤ |N| and hence |Σ*| = |N| by Theorem 8.3. By our definition of a language with alphabet Σ, every subset of Σ* is a language. The set of all possible languages made from Σ is therefore P(Σ*). But now Cantor's Theorem gives

|P(Σ*)| > |Σ*| = |N|

which proves the following theorem.

Theorem 8.9. The set of all possible languages made from any non-empty alphabet is uncountable.

We end this section by proving Cantor's Theorem. The fact that |P(A)| ≥ |A| follows from Theorem 8.2 since the subset

{{x} : x ∈ A}

of P(A) clearly has the same cardinality as A (you should prove this).

To show |P(A)| > |A| we need to establish that |P(A)| = |A| is false. Here we

need to use proof by contradiction. This means we assume the negation of what

we really want to prove and show that this leads to a contradiction: a statement that is clearly absurd or false.

In this case, we assume that |A| = |P(A)|. This means there is a one-to-one and onto function f : A → P(A). Now here comes the fiendishly clever trick! Define

E = {x ∈ A : x ∉ f(x)}.

Although this definition looks a bit strange, it makes sense because the codomain

of f consists of subsets of A, so f(x) is a set for each x ∈ A. This means we can always ask whether or not x ∈ f(x) and define E to be the set of x values for which this is false. It may be that E is empty, but this doesn't matter.

Since E ⊆ A it is an element of the codomain P(A) of f. Since f is onto, there must be a z ∈ A such that f(z) = E. We get a contradiction from the fact that z must either be an element of E or not:


If z ∈ E then z ∉ f(z) by definition of E, but f(z) = E so this means z ∉ E. But this is a contradiction because we have now shown that z ∈ E ⇒ z ∉ E.

If z ∉ E then z ∈ f(z) by definition of E, but f(z) = E so this means z ∈ E. This is also a contradiction because we have shown that z ∉ E ⇒ z ∈ E.

Since both cases lead to a contradiction, we conclude that there can be no one-to-one and onto map A → P(A), so |P(A)| = |A| must be false, as we wanted to prove. Thus |A| < |P(A)|.
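The diagonal trick can be watched in action on a small finite set: whatever f : A → P(A) we try, the set E = {x ∈ A : x ∉ f(x)} is never a value of f. A sketch with a hypothetical f of our own choosing:

```python
def diagonal_set(A, f):
    """Cantor's set E = {x in A : x not in f(x)} for a map f : A -> P(A)."""
    return {x for x in A if x not in f(x)}

A = {1, 2, 3}
f = {1: {1, 2}, 2: set(), 3: {1, 3}}.get   # one attempt at mapping A onto P(A)
E = diagonal_set(A, f)
print(E)  # → {2}
assert all(f(x) != E for x in A)  # E is missed by f, so f is not onto
```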

Chapter Nine

More Powerful Machines

In Chapters 3 to 5 we considered finite state machines and their

correspondence with regular languages. In Chapters 6 and 7 we

added more power to finite state machines to obtain PDA and

considered their correspondence with context free languages. In

this chapter, we extend these ideas to define an even more powerful class of automata and consider the languages they are capable of recognising. In fact these automata, known as Turing machines, are so powerful that they have been universally regarded for over 60 years as the gold standard of what can theoretically be calculated by an algorithm.

References. The material in this chapter is covered in much greater detail in

Chapters 9 to 12 of Linz, Chapters 4, 5 and 6 of Kelley and Chapters 7 and 8 of

Hopcroft and Ullman.

9.1. Not All Languages Are Context Free

Earlier we saw that not all languages are regular. We now observe that even the larger class of context free languages does not contain everything. In fact, some rather innocent looking languages like

L1 = {0^n 1^n 2^n : n ≥ 1}

and

L2 = {ww : w ∈ {0, 1}+}

are not context free. In view of the construction of Section 7.3 this means there

is no PDA that recognizes them. This seems plausible for both of the above

languages. For L1 it seems likely that the only way a PDA can test whether the number of 0s is the same as the number of 1s is to use a method similar to that of Examples 6.3.2 and 6.3.3, adding a symbol to the stack as each 0 is read and then deleting a symbol as each 1 is read. The problem is that this method destroys our count of 0s, so we have no way to check that the number of 2s is correct. For L2 we would need a PDA similar to the one of Example 6.4.1, which remembers the first half of a palindrome by writing it to the stack and then compares. For L2 a problem arises with this method because items retrieved from a stack come out in reverse order. There is no way to correct this. We can give


a rigorous demonstration of the fact that these languages are not context free using the following theorem.

Theorem 9.1 (Pumping Lemma for Context Free Languages). For any context

free language L, there is a special number p N (called a pumping length) that

has the following property:

Any word w ∈ L such that |w| ≥ p can be written as w = uvxyz where
(a) |vxy| ≤ p,
(b) at least one of v and y is non-empty,
(c) the word u v^i x y^i z is in L for every i ≥ 0.

The name of Theorem 9.1 comes from (c). It tells us that the words

uvvxyyz, uvvvxyyyz, . . . , u v^i x y^i z, . . .

and so on are all in L. Provided w ∈ L is long enough to begin with, we can pump it up in the way described by (c) to get longer and longer words that are also in L. We will not attempt to prove this result in MAT2MFC. A proof would be at least a week's work in itself. We will simply demonstrate how it can be used to show that languages are not context free. To do this we need to use the technique of proof by contradiction. This means we assume the negation of what we really want to prove and show that this leads to a contradiction: a statement that is clearly false. We begin by assuming the language is context free and show (using the pumping lemma) that this leads to a false conclusion.

Example 9.1.1. To show that L1 = {0^n 1^n 2^n : n ≥ 1} is not context free, assume it is context free and let n be any integer greater than p, where p is the pumping length for L1. The word w = 0^n 1^n 2^n is in L1 and |w| = 3n > p, so the pumping lemma lets us write

w = 0^n 1^n 2^n = uvxyz

where |vxy| ≤ p < n and at least one of the words v and y is non-empty. The fact that |vxy| < n means that vxy isn't long enough to contain all three symbols 0, 1 and 2, so it must be a sub-word of one of the five words

0^n,  0^n 1^n,  1^n,  1^n 2^n  and  2^n.

Let's consider these possibilities:

(i) If vxy were a sub-word of 0^n we would have vxy = 0^j for some j ≤ n, u would be a sub-word of 0^n and 1^n 2^n would be a sub-word of z. We would also have v = 0^k and y = 0^m for some k, m ≥ 0. Since at least one of v and y is non-empty, this would give vvxyy = 0^{q+j} for some q > 0 (i.e. vvxyy would have q more 0s than vxy). This would mean s = uvvxyyz = 0^{q+n} 1^n 2^n, so s would have q more 0s than w = uvxyz but the same number of 1s and 2s. This would show s ∉ L1. But Theorem 9.1 tells us that s ∈ L1. This shows vxy can't have been a sub-word of 0^n after all.

(ii) By similar reasoning vxy can't be a sub-word of 1^n or 2^n either. The only possibilities left are that vxy is a sub-word of 0^n 1^n or 1^n 2^n.

(iii) If vxy were a sub-word of 0^n 1^n, a similar (although slightly more complicated) argument to the one in (i) would show that vvxyy would have to contain either more 0s than vxy or more 1s than vxy (or possibly both). But this in turn would mean that s = uvvxyyz would have to contain either more 0s than 2s or more 1s than 2s. In either case, we have s ∉ L1, but as before, Theorem 9.1 tells us that s ∈ L1. As before, vxy can't have been a sub-word of 0^n 1^n after all.

(iv) By similar reasoning vxy can't be a sub-word of 1^n 2^n either. There are no possibilities left.

All possibilities ended in tears! We must conclude that L1 is not context free.

A similar argument works for a suitable word in L2 where n is any number greater than the pumping length. Believe it or not, Example 9.1.1 is a relatively easy application of the pumping lemma! Even for the relatively simple languages L1 and L2 in the introduction to this section, applying the pumping lemma can be quite hard. There is a much easier way to show languages on a one-letter alphabet are not context free.

Theorem 9.2. Let a be any symbol. A language L ⊆ {a}* is context free precisely if it is regular.

We already know that languages like {0^n 1^n : n ≥ 0} are context free but not regular, so even for a two letter alphabet, this theorem no longer holds. For L ⊆ {a}*, however, if a suffix set argument shows L is not regular, then L cannot be context free either.

For instance, we can use Theorem 9.2 to show that the language

L = {0^{2^n} : n ≥ 0} = {w ∈ {0}* : (∃m ≥ 0) |w| = 2^m}

is not context free. We will show that the suffix sets S(0^{2^n}) are distinct for all n ≥ 0. First note that 0^{2^n} 0^{2^n} = 0^{2 · 2^n} = 0^{2^{n+1}} ∈ L, so 0^{2^n} ∈ S(0^{2^n}) for all n ≥ 0. But 0^{2^n} ∉ S(0^{2^m}) for any m > n. This is because

0^{2^m} 0^{2^n} = 0^{2^m + 2^n} = 0^{2^n (2^{m−n} + 1)}

cannot be in L, since 2^n (2^{m−n} + 1) has an odd factor 2^{m−n} + 1 greater than 1 and therefore cannot be a power of 2. (You should convince yourself that powers of 2 never have odd factors greater than 1.) This shows that the suffix sets S(0^{2^n}) are all distinct, so L is not regular by Theorem 5.3 and therefore not context free by Theorem 9.2.


9.2. Turing Machines

We have seen that there are correspondences between regular languages and finite

state recognition machines and between context free languages and push down

automata. We now know that there are languages, some quite simple to describe,

that are not context free. It is natural, therefore, to seek a more powerful class of

machines capable of recognizing an even larger class of languages. As illustrated

at the start of Section 9.1, the power of a PDA seems to be limited by the way

in which it accesses the stack. We can define a more powerful type of machine

called a Turing machine (abbreviated as TM) by giving more flexible access to

memory. Instead of only ever reading from and writing to the stack top, we allow a

kind of sequential access memory that allows us to move back and forth between

locations, reading and writing as we go. We picture this kind of memory as a

tape extending infinitely in both directions that can store a symbol at each

location or cell.

... a b a 1 a b 0 ...

At any particular stage in processing, most of the tape is blank. In fact, there

are only ever finitely many cells that are not blank. We represent this situation

using a special blank symbol □.

. . . □ □ a b a 1 a b 0 □ □ . . .

Since we can now read from and write to any cell, we need to know which cell

is currently the active one, usually called the read/write cell (or sometimes the

read/write head ). After each step in processing, a TM moves the read/write cell

either one cell to the left or one cell to the right. We represent the position of the

read/write cell by underlining the symbol in that cell.

... a b a 1 a b 0 ...

Just like finite state machines and PDA, Turing machines have a set Q of states, an

initial state q0 and a set F of accepting states. A major difference from finite state

machines and PDA, however, is that there is no separate input string. Instead,

the input is placed directly on the tape initially. As you will see, our new class of machines can do more than just pass through the input symbols one at a time.

They can pass back and forth along the input word (or parts of it) as many times

as necessary. This means we need the input right there on the tape where we can

process it.

This mode of operation also means that all Turing machines have output, because

we can regard the contents of the tape after processing has ceased as output. To

make all of this work, there must be an alphabet of symbols (including the

blank symbol ) that can appear on the tape. It makes things easier to also have

an input alphabet \ { }. This is not strictly necessary, but it gives a


convenient separation between input symbols and the extra tape symbols, which are typically used as temporary markers during computations. This idea will become clear in the examples. To avoid confusion, we never use □ as an input symbol or as a marker.

Finally, just like finite state machines and PDA, Turing machines have a partial transition function δ which specifies how they operate. Given the current state and the current read/write symbol, δ must tell us:

which state to move to (as it would in a finite state machine or PDA).
what to replace the symbol in the read/write cell with (we can replace it with the same symbol if we like).
whether to move left (represented by L) or right (represented by R) after processing the symbol in the read/write cell.

So δ has domain Q × Γ and codomain Q × Γ × {L, R}. Putting all of this together gives a (standard) Turing machine. In summary, a standard Turing machine1

consists of the following.

a set Q of states.
a set F of accepting states.
an initial state q0.
a tape alphabet Γ containing the blank symbol □.
an input alphabet Σ ⊆ Γ \ {□}.
a partial transition function δ : Q × Γ → Q × Γ × {L, R}.

Because TM do not simply process their input string one symbol at a time, it is

not immediately obvious when processing ceases. There is no sensible criterion

analogous to the empty stack criterion for PDA (especially if we want to regard

the contents of the tape after processing as output). The convention we adopt

is that processing stops whenever no further transitions are defined (remember

that is a partial function). This makes it easy to use a TM as a recognition

machine. We say that an input word is accepted precisely if processing stops in an

accepting state. If it stops in a non-accepting state (because the next transition is

undefined), the word is rejected. To keep things simple, we adopt the convention

in MAT2MFC that transitions will never be defined for any accepting state.
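These conventions are easy to turn into a small simulator. The sketch below stores the partial function δ as a dictionary keyed by (symbol, state) and runs until no transition applies. The sample machine recognises {0^n 1^n : n ≥ 1} in the style of the example that follows; the particular transition table is our own reading of that example, so treat its details as an assumption.

```python
BLANK = " "  # stands for the blank tape symbol

def run_tm(delta, accepting, word, max_steps=100_000):
    """Run a standard TM on `word`, starting in state q0 with the head on the
    first input symbol; return True precisely if it halts in an accepting state.
    `delta` maps (symbol, state) -> (write, new_state, move)."""
    tape = dict(enumerate(word))  # sparse tape: position -> symbol
    pos, state = 0, "q0"
    for _ in range(max_steps):
        symbol = tape.get(pos, BLANK)
        if (symbol, state) not in delta:   # no transition defined: processing stops
            return state in accepting
        write, state, move = delta[(symbol, state)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("machine did not halt within max_steps")

# A recogniser for {0^n 1^n : n >= 1} (table reconstructed, hence an assumption).
delta = {
    ("0", "q0"): ("a", "q1", "R"), ("a", "q0"): ("1", "q4", "L"),
    ("0", "q1"): ("0", "q1", "R"), ("1", "q1"): ("1", "q1", "R"),
    ("a", "q1"): ("1", "q2", "L"), (BLANK, "q1"): (BLANK, "q2", "L"),
    ("1", "q2"): ("a", "q3", "L"),
    ("0", "q3"): ("0", "q3", "L"), ("1", "q3"): ("1", "q3", "L"),
    ("a", "q3"): ("0", "q0", "R"),
}
print(run_tm(delta, {"q4"}, "000111"))  # → True
print(run_tm(delta, {"q4"}, "0101"))    # → False
```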

The only way to really understand TM is to study some examples. To do so, we

need informative ways to represent TM and their operations. Transition tables

can certainly be given, but as for FSM, directed graphs are usually easier to use.

Transition tables differ from those for PDA because the domain of δ is slightly simpler while the codomain is slightly more complicated. In a directed graph, edge labellings look a little different. For example, edge labellings for the transitions δ(a, p) = (u, q, R), δ(c, q) = (v, r, L) and δ(□, r) = (□, s, L) appear in the figure below.

¹Named after the mathematical logician and cryptanalyst Alan Turing, who proposed what we now call Turing machines in the 1930s. Many consider him the founder of computer science.


[Figure: states p, q, r, s in a row, with edge p → q labelled a ↦ (u, R), edge q → r labelled c ↦ (v, L), and edge r → s labelled □ ↦ (□, L).]

We will see why transitions like δ(□, r) = (□, s, L) are necessary in the examples.

Example 9.2.1. Consider the Turing machine with:

- states Q = {q0, q1, q2, q3, q4}.
- initial state q0.
- tape alphabet Γ = {0, 1, a, □}.
- input alphabet Σ = {0, 1}.
- accepting states F = {q4}.

Its transition table is the following.

x           0           1           a           □
δ(x, q0)    (a, q1, R)              (1, q4, L)
δ(x, q1)    (0, q1, R)  (1, q1, R)  (1, q2, L)  (□, q2, L)
δ(x, q2)                (a, q3, L)
δ(x, q3)    (0, q3, L)  (1, q3, L)  (0, q0, R)
δ(x, q4)

[Figure: the directed graph of this machine. States q3, q0, q4 in the top row and q2, q1 in the bottom row; edge q0 → q1 labelled 0 ↦ (a, R); a loop on q1 labelled 1 ↦ (1, R), 0 ↦ (0, R); edge q1 → q2 labelled a ↦ (1, L), □ ↦ (□, L); edge q2 → q3 labelled 1 ↦ (a, L); a loop on q3 labelled 1 ↦ (1, L), 0 ↦ (0, L); edge q3 → q0 labelled a ↦ (0, R); edge q0 → q4 labelled a ↦ (1, L).]


This machine accepts the language L = {0^n 1^n : n ≥ 1}. After marking the first 0 and the last 1 with the marker a, it then repeatedly moves the left a to the next 0 on its right and the right a to the next 1 on its left. For a word w = 0^n 1^n ∈ L this strategy should give a word of the form 0^{n−1} a a 1^{n−1} after n − 1 repeats and leave the machine in state q0. It then moves to accepting state q4. In brief, the strategy is to progressively transform the input word in this way. We already know how to build a PDA that recognizes L, but the contrast of methods will

be illuminating. We will also use this machine to build other, more powerful

machines. The details of the operation of this (or any) TM are best understood by

examining how some example inputs are processed using a configuration notation

for Turing machines which we now describe. At each step in processing we need

to know the state and the contents of the tape, which we write as an ordered pair.

We definitely do not want to write the contents of the tape in the cumbersome way we did on page 94. Instead, we give the word consisting of all of the non-blank symbols, including a blank at one end of this word when it is necessary because the read/write cell contains a blank; we also enclose the symbol currently in the read/write cell in square brackets. Here is the configuration notation representing the processing of w = 0011 ∈ L.

(q0, [0]011) ⊢ (q1, a[0]11) ⊢ (q1, a0[1]1) ⊢ (q1, a01[1]) ⊢ (q1, a011[□])
⊢ (q2, a01[1]) ⊢ (q3, a0[1]a) ⊢ (q3, a[0]1a) ⊢ (q3, [a]01a) ⊢ (q0, 0[0]1a)
⊢ (q1, 0a[1]a) ⊢ (q1, 0a1[a]) ⊢ (q2, 0a[1]1) ⊢ (q3, 0[a]a1) ⊢ (q0, 00[a]1)
⊢ (q4, 0[0]11) (accept)

We can also use configuration notation to illustrate why the machine rejects words

that are not in L.

w = 001 : (q0, [0]01) ⊢ (q1, a[0]1) ⊢ (q1, a0[1]) ⊢ (q1, a01[□]) ⊢ (q2, a0[1]) ⊢ (q3, a[0]a)
⊢ (q3, [a]0a) ⊢ (q0, 0[0]a) ⊢ (q1, 0a[a]) ⊢ (q2, 0[a]1) (reject)

w = 011 : (q0, [0]11) ⊢ (q1, a[1]1) ⊢ (q1, a1[1]) ⊢ (q1, a11[□]) ⊢ (q2, a1[1]) ⊢ (q3, a[1]a)
⊢ (q3, [a]1a) ⊢ (q0, 0[1]a) (reject)

w = 010 : (q0, [0]10) ⊢ (q1, a[1]0) ⊢ (q1, a1[0]) ⊢ (q1, a10[□]) ⊢ (q2, a1[0]) (reject)
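These traces can be checked mechanically. Below is our own Python transcription (not code from the notes) of the transition table of Example 9.2.1, with "_" standing in for the blank symbol, together with a small step loop.

```python
# Transition table of Example 9.2.1: (symbol, state) -> (write, state, move).
# "_" stands in for the blank symbol; missing keys are undefined transitions.
DELTA = {
    ("0", "q0"): ("a", "q1", "R"), ("a", "q0"): ("1", "q4", "L"),
    ("0", "q1"): ("0", "q1", "R"), ("1", "q1"): ("1", "q1", "R"),
    ("a", "q1"): ("1", "q2", "L"), ("_", "q1"): ("_", "q2", "L"),
    ("1", "q2"): ("a", "q3", "L"),
    ("0", "q3"): ("0", "q3", "L"), ("1", "q3"): ("1", "q3", "L"),
    ("a", "q3"): ("0", "q0", "R"),
}

def accepts(word, max_steps=10_000):
    tape, pos, state = dict(enumerate(word)), 0, "q0"
    for _ in range(max_steps):
        sym = tape.get(pos, "_")
        if (sym, state) not in DELTA:       # no transition defined: halt
            return state == "q4"            # accepting states F = {q4}
        write, state, move = DELTA[(sym, state)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("no halt within step limit")
```

Running it confirms the traces above: accepts("0011") is True, while accepts("001"), accepts("011") and accepts("010") are all False.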

We have already observed that TM always have output because the contents of the

tape at the end of processing may be viewed as output. We now consider examples

that make deliberate use of this idea to carry out some string calculations.


Example 9.2.2. Given input 0^n with n ≥ 1, the machine shown below halts in an accepting state with 0^{2n} on the tape. It starts by replacing the first symbol in 0^n with the marker a. It then repeatedly shifts the a one cell to the right, and each time it does so writes an extra zero at the left hand end of the tape contents (the transition δ(□, q1) = (0, q2, R)). It stops after the a reaches the right end of the tape contents. This strategy has the effect of doubling 0^n. When the tape contents after processing are regarded as output, a TM may be thought of as calculating a string function that converts one string into another. This machine computes the string function f : 0* → 0* defined by f(0^n) = 0^{2n}. It doubles the string.

[Figure: the machine of Example 9.2.2, with states q0, q1, q2, q3 and edge labels including 0 ↦ (a, L), 0 ↦ (0, L) and 0 ↦ (0, R).]
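The doubling strategy can be made concrete with a small sketch. The transition table below is our own reconstruction of a machine following the strategy just described: only the transition δ(□, q1) = (0, q2, R) is taken from the text, while the remaining transitions and the extra accepting state q4 are ours, and "_" again stands in for the blank symbol.

```python
# Our own sketch of a doubling machine following the strategy in the text:
# mark a 0 with a, walk left past the 0s, write an extra 0 on a blank cell,
# walk right back to the marker, shift the marker one cell right, repeat.
DELTA = {
    ("0", "q0"): ("a", "q1", "L"),  # mark the first 0
    ("0", "q1"): ("0", "q1", "L"),  # walk left over 0s
    ("_", "q1"): ("0", "q2", "R"),  # write an extra 0 at the left end
    ("0", "q2"): ("0", "q2", "R"),  # walk right back to the marker
    ("a", "q2"): ("0", "q3", "R"),  # erase the marker, look one cell right
    ("0", "q3"): ("a", "q1", "L"),  # more input: shift the marker here
    ("_", "q3"): ("_", "q4", "L"),  # marker passed the right end: accept
}

def run(word, max_steps=100_000):
    tape, pos, state = dict(enumerate(word)), 0, "q0"
    for _ in range(max_steps):
        sym = tape.get(pos, "_")
        if (sym, state) not in DELTA:
            contents = "".join(tape.get(i, "_")
                               for i in range(min(tape, default=0),
                                              max(tape, default=0) + 1))
            return state == "q4", contents.strip("_")
        write, state, move = DELTA[(sym, state)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    raise RuntimeError("no halt within step limit")
```

For each n ≥ 1, run("0" * n) halts in the accepting state with 0^{2n} on the tape.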

The string function of the machine of Example 9.2.2 can be computed much more easily and efficiently by the FSM (with output) shown in Figure 4. Indeed, this is a trivial task for an FSM. This illustrates just how much harder TM are to program. However, the machine of Example 9.2.2 can be expanded to perform a task impossible for an FSM or PDA: computing the string function f : 0* → 0* defined by f(0^n) = 0^{2^n}, as we shall see in Example 9.2.5.

[Figure 4: an FSM with output consisting of the single state q0 with a loop labelled 0/00.]

Example 9.2.3. A modification of the machine of Example 9.2.1 gives us a machine with input language 0+ = {0^n : n ≥ 1} that halves strings of even length and rejects strings of odd length. The strategy is again to place the marker a at each end of the input word and move them toward the middle, this time erasing the right hand end of the word as the right hand marker moves toward the centre. Similar to Example 9.2.1, provided the a's meet up in the middle, the word must be of even length. Notice that this strategy could also be used to design a machine that checks whether a given input word is of even length without erasing it. The machine implements the string function

f(0^{2n}) = 0^n

with domain {0^{2n} : n ≥ 1}. It is shown in Figure 5.

[Figure 5: the machine of Example 9.2.3. States q3, q0, q4 in the top row and q2, q1 in the bottom row; edge q0 → q1 labelled 0 ↦ (a, R); a loop on q1 labelled 0 ↦ (0, R); edge q1 → q2 labelled a ↦ (□, L), □ ↦ (□, L); edge q2 → q3 labelled 0 ↦ (a, L); a loop on q3 labelled 0 ↦ (0, L); edge q3 → q0 labelled a ↦ (0, R); edge q0 → q4 labelled a ↦ (□, L).]

Comparing the processing of a word by this machine with the processing of the same word on page 56 illustrates the fact that TM are typically less efficient than PDA. They also require more care and subtlety in design and are harder to program: it is harder to devise the strategy and work out the transitions. So we would like to know that this is all worthwhile. So far we haven't shown that TM have any more power than PDA. We rectify this by exhibiting a machine that accepts the language

L = {0^{2^n} : n ≥ 0} = {w ∈ {0}* : (∃ m ≥ 0) |w| = 2^m}

which is not context free, as shown in Example 9.1.2. The strategy is to add some further states to the machine of Example 9.2.3 that allow it to repeatedly halve the length of the input string. If the length of the original input string was a power of two, this should eventually yield a string of length one as follows:

0^{2^n} ↦ 0^{2^{n−1}} ↦ · · · ↦ 0^{2^1} ↦ 0^{2^0} = 0.

If not, a string of odd length greater than one will eventually be obtained. The trick is to accept the string 0 if it appears as the tape contents at any stage of the computation, but to reject any string 0^n where n > 1 and n is odd.
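The algorithmic content of this strategy is easy to state in Python (our own illustration; the TM erases the right half of the word, which we mimic here by slicing):

```python
def power_of_two_length(w):
    """Decide whether w (a string of 0s) has length a power of two, by the
    repeated-halving strategy: accept on "0", reject on odd length > 1."""
    while True:
        if w == "0":
            return True                  # length 2^0 reached: accept
        if len(w) == 0 or len(w) % 2 == 1:
            return False                 # odd length > 1 (or empty): reject
        w = w[: len(w) // 2]             # halve, as the TM does
```

For example, power_of_two_length("0" * 8) is True, while for length 12 the halvings give lengths 12, 6, 3 and the word is rejected.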


Example 9.2.4. Adding further states to the machine of Example 9.2.3 gives a machine with input language 0+ = {0^n : n ≥ 1} that repeatedly halves the input string. If an attempt to halve the string currently on the tape reveals that the string is of odd length, the original string is rejected. Before attempting to halve the string, however, the machine must first check that it is not just 0. If this condition is detected (at state q1), the original string is accepted. The state q5 is used to move the head back to the start of the string after each halving. The machine is shown in Figure 6.

[Figure 6: the machine of Example 9.2.4, with states q0 to q6 and edge labels including a ↦ (□, L), a ↦ (0, R), 0 ↦ (0, L), □ ↦ (□, R), 0 ↦ (a, L), 0 ↦ (a, R), 0 ↦ (0, R) and □ ↦ (□, L).]

Recall that the machine of Example 9.2.2 implements the string function f(0^n) = 0^{2n}. We can use this function repeatedly to calculate the string function f(0^n) = 0^{2^n}. We can design a TM that does this by starting with the machine M of Example 9.2.2 and building a supervisor machine that causes M to run n times. This is done in Example 9.2.5. It illustrates the modularity of Turing machines. We frequently build up machines that perform more complicated tasks by joining simpler machines together or plugging one into another in various ways. Joining together two slightly modified versions of the machine of Example 9.2.1 gives a machine that accepts the non-context-free language L1 = {0^n 1^n 2^n : n ≥ 1} of Example 9.1.1.

Example 9.2.5. We embed the machine of Example 9.2.2 (states q0, q1, q2, q3) in a supervisor machine (states p0, p1, p2, p3, p4) which causes it to run once for each 0 in the input word. The strategy is to first convert the input string 0^n to 1^n, so we can use the 1s to count how many times we have doubled without mixing up the counters with the output. Then each 1 (there should be n of them to begin with) is deleted and the number of 0s is doubled using a copy of the machine of Example 9.2.2. When all 1s have been deleted, the machine halts.

[Figure 7. Turing machine that computes f(0^n) = 0^{2^n}. It combines the states q0, q1, q2, q3 of the machine of Example 9.2.2 with the supervisor states p0, . . . , p4; edge labels include 0 ↦ (1, R), 1 ↦ (1, R), 0 ↦ (a, L), 1 ↦ (□, L), □ ↦ (0, L), 0 ↦ (0, L), 1 ↦ (1, L), □ ↦ (□, L) and 0 ↦ (0, R).]

In view of the result of Example 9.1.2, the machine constructed in Example 9.2.4

proves conclusively that TM are strictly more powerful than PDA. In fact they

are very powerful indeed, even if somewhat clumsy and inefficient. We will come

back to the issue of just how powerful in Section 9.3. This increase in power

is not surprising given the substantially greater flexibility of memory access we

grant to TM. However, this greatly expanded power has its down side. We have

already observed that TM are typically less efficient than PDA and they tend to be

harder to program, but there is a far more serious problem. TM sometimes fail

to stop processing. Given certain input words, they may just continue processing

indefinitely, neither reaching an accepting state nor halting in a non-accepting

state due to an undefined transition. This is analogous to a program entering an

infinite loop.

Example 9.2.6. The very simple TM in Figure 8 was designed to accept the regular language L = {(01)^n 0 : n ≥ 0} = {01}* 0 by simulating a simple finite state recognition machine. Suppose we mistakenly (a typical programming


[Figure 8: states q0, q1, q2; edge q0 → q1 labelled 0 ↦ (0, R), edge q1 → q0 labelled 1 ↦ (1, R), and edge q1 → q2 labelled □ ↦ (□, L), with q2 accepting.]

error) include the transition 1 ↦ (1, L) in place of 1 ↦ (1, R), giving the almost identical machine of Figure 9. This new machine still correctly accepts the word 0 ∈ L and it still correctly rejects, for example, words beginning with 1 or 00 (you should check these claims). However, using configuration notation to analyse the

[Figure 9: states q0, q1, q2; edge q0 → q1 labelled 0 ↦ (0, R), edge q1 → q0 labelled 1 ↦ (1, L), and edge q1 → q2 labelled □ ↦ (□, L), with q2 accepting.]

processing of the word 010 ∈ L gives

(q0, [0]10) ⊢ (q1, 0[1]0) ⊢ (q0, [0]10) ⊢ (q1, 0[1]0) ⊢ (q0, [0]10) ⊢ (q1, 0[1]0) ⊢ · · ·

The machine cycles between the two configurations (q0, [0]10) and (q1, 0[1]0) forever!

In the next section we will see that TM provide the most widely accepted model of what it means to be computable by an algorithm. Given that they enjoy such power, it is not surprising that they run the risk, well known to every programmer, of going into an infinite loop.
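A simulator with a step limit makes the looping behaviour easy to observe. Here is our own transcription (not code from the notes) of the buggy machine of Figure 9, with "_" standing in for the blank symbol:

```python
# The buggy machine of Figure 9: the transition on 1 moves Left, not Right.
DELTA = {("0", "q0"): ("0", "q1", "R"),
         ("1", "q1"): ("1", "q0", "L"),   # the programming error
         ("_", "q1"): ("_", "q2", "L")}   # q2 is the accepting state

def run(word, max_steps=100):
    tape, pos, state = dict(enumerate(word)), 0, "q0"
    for _ in range(max_steps):
        sym = tape.get(pos, "_")
        if (sym, state) not in DELTA:     # undefined transition: halt
            return "accept" if state == "q2" else "reject"
        write, state, move = DELTA[(sym, state)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return "no halt within step limit"
```

run("0") gives "accept" and run("1") gives "reject", but run("010") never halts: it hits the step limit, cycling between two configurations exactly as shown above.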

9.3.1. The gold standard. Standard TM of the type presented in Section

9.2 have underpinned the accepted model of what can and cannot be computed

by an algorithm for about sixty years. Many variations on the standard TM have

been proposed:

(a) Non-deterministic Turing machines allow for a non-deterministic transi-

tion function in much the same way as for finite state machines or PDA.

(b) Multiple tape Turing machines have more than one tape and can read

from and write to all of them at each move.

(c) Turing machines with a stay put option have a third option S in addition to L and R, which means "don't move the read/write cell".


(d) Turing machines with a one sided tape have a tape with a fixed starting

point that only extends infinitely in one direction.

None of these have yielded anything different. Just as non-deterministic finite

state recognition machines turn out to be equivalent in power to deterministic

ones, so non-deterministic TM are equivalent in power to deterministic ones.

Similarly, the other variations have all turned out to yield classes of machines

equivalent in power to standard TM. This is not to say that these variations are

of no interest. Some of them allow for easier coding of algorithms, easier proofs

and so on. For example, multiple tapes allow for many words to be processed simultaneously and compared with one another, or for distinct tapes to be used for input and output. Various other alternative models of computation have also been proposed, but they have all turned out to be equivalent to (or in some cases weaker

than) standard TM.

This history has led to a widespread acceptance of the idea that TM provide a definition of what can be computed by an algorithm. A function is said to be computable by an algorithm precisely if there is a TM that computes it and

halts on all possible inputs. We want our machine to halt on all possible inputs

because we feel that something worthy of the name algorithm should not go

into an infinite loop and should eventually give an output in all cases. Roughly

speaking, this idea of equating algorithmic computability with computability by

a TM (or with some equivalent system) is known as the Church-Turing thesis².

As a definition of computability it can neither be proved nor disproved, but the

fact that it has stood the test of time gives us confidence that it is a sensible

definition. Not only are halting TM the gold standard of what can and cannot

be computed by an algorithm, they also underpin the theory of computational

complexity, which analyses how efficiently (both in terms of time and memory)

computations can be done.

9.3.2. The limits of computation. Notwithstanding the above comments,

TM are not omnipotent. There are some things they can't do. One consequence of the following theorem is that there are even languages they cannot recognize.

Theorem 9.3. There are countably many Turing recognition machines with a given input alphabet Σ = {x1, x2, . . . , xm}.

Proof. To keep things simple, we assume that the set of states of an n-state TM is Qn = {q0, q1, . . . , qn−1}, the initial state is q0 (it should be clear that it doesn't

²The mathematical logician Alonzo Church was Alan Turing's PhD supervisor.


really matter how we label the states) and that the set F of accepting states is not empty (since the language of a recognition machine with no accepting states is empty). We also assume that the set of marker elements of the tape alphabet, that is, the set Γ \ (Σ ∪ {□}), is of the form {a1, . . . , ak} for some k ≥ 0 (because it doesn't really matter what the markers are called, as long as we have enough of them). The transition table of a machine with n states and k markers has the following structure.

x             x1   · · ·   xm   a1   · · ·   ak   □
δ(x, q0)      ∗
δ(x, q1)
  ⋮
δ(x, qn−1)

The table entry for δ(x1, q0) marked by ∗ is either left blank or is of the form (b, q, D) for some q ∈ Qn, b ∈ {x1, . . . , xm, a1, . . . , ak, □} and D ∈ {L, R}. There are m + k + 1 possibilities for b, multiplied by n possibilities for q, multiplied by 2 possibilities for D, giving 2n(m + k + 1) + 1 possible ways of completing the entry for (x1, q0). But this argument works exactly the same for each of the n(m + k + 1) entries in the table. By the type of counting argument familiar from MAT1DM, this means that the total number of ways to fill in the table entries is

(2n(m + k + 1) + 1)^{n(m+k+1)}.   (∗)

Since we have agreed that q0 is the initial state, the only remaining issue is which states are accepting. Now F ⊆ Q, so the number of possible ways of choosing F is |P(Q)| = 2^n, and since we usually want our machine to have at least one accepting state we can rule out F = ∅, giving 2^n − 1 choices for F. Putting this together with (∗), there are

(2^n − 1)(2n(m + k + 1) + 1)^{n(m+k+1)}   (∗∗)

elements in the set M(n,k) of machines with n states and k markers.

In particular M(n,k) is finite. We saw in Section 8.2 that the set N × N is countable. It is easy to extend this argument to show (N ∪ {0}) × N is countable (you should check this), so there are countably many distinct sets M(n,k), which we may list as M1, M2, M3, . . . Since each of these sets is finite, Theorem 8.5 shows that their union

M = ⋃_{i=1}^∞ Mi

is countable. But M is clearly the set of all Turing recognition machines with input alphabet Σ.
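The count of machines with n states and k markers obtained in the proof is easy to evaluate for small parameters (a quick sanity check of our own):

```python
def num_machines(n, m, k):
    """Number of Turing recognition machines with n states, m input symbols
    and k markers: (2^n - 1) * (2n(m+k+1) + 1)^(n(m+k+1))."""
    cells = n * (m + k + 1)             # entries in the transition table
    per_cell = 2 * n * (m + k + 1) + 1  # fillings of one entry (incl. blank)
    return (2 ** n - 1) * per_cell ** cells
```

For example, with one state, a one-symbol input alphabet and no markers there are already 5^2 = 25 machines: each of the two table entries can be left blank or filled in one of four ways.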


In view of Theorems 8.9 and 9.3, there must be languages with alphabet Σ = {0, 1} that are not recognized by any TM. Remarkably, there must even be languages with one letter input alphabet Σ = {0} that are not recognized by any TM. Notice that the problem here is not that we lack the cunning to construct recognition machines for some languages. It is that there aren't enough recognition machines to recognize all of the possible languages with input alphabet Σ. Given that some languages are not recognized by TM, there is a name for those that are. They are called recursively enumerable. It even turns out that there are some languages that can be recognised by a TM but not by any TM that halts on all possible inputs. Because of this situation, there is a name for the class of languages recognised by some TM that is guaranteed to halt on all inputs. Such languages are called recursive. In view of the above discussion, we may think of the recursive languages as those that can be recognized by some kind of algorithm.

This result is just the tip of the iceberg. There are many, many things that simply

cannot be done by TM. Perhaps the most famous is the halting problem for TM

themselves. This is the problem of deciding whether a given Turing machine M with input alphabet Σ fails to halt when it attempts to process a given word w.

Before we can even think about asking a TM to solve this problem, we need to

be able to represent our machine M in some form that can be used as input to

another TM. This always turns out to be possible because of the finite nature of

a TM:

- The transition table has finitely many entries.
- There are finitely many states, one of which is initial, and finitely many of which are accepting.

We can represent this information using a finite string. The main difficulty is that if we want to feed this string to a TM, it must be based on a finite alphabet. This means we can't have infinitely many symbols q0, q1, q2, . . . for our states. We can avoid this problem by using the words q, qq, qqq, . . . to represent the states. This means we only need one symbol to represent all of the states. A similar trick can be used for marker symbols, which we represent as a, aa, aaa, . . . and so on.

Example 9.3.1. The machine of Figure 9 has the transition table shown below and:

- states Q = {q0, q1, q2}.
- initial state q0.
- accepting states F = {q2}.
- tape alphabet Γ = {0, 1, □}.
- input alphabet Σ = {0, 1}.

x           0            1            □
δ(x, q0)    (0, q1, R)
δ(x, q1)                 (1, q0, L)   (□, q2, L)
δ(x, q2)


We adopt the conventions that the first named state is the initial state and that we don't really need to mention the symbol □, because every machine has □ in its tape alphabet. With these conventions, we could encode this TM in a single string S, using the symbol | as a delimiter separating the rows in the transition table, which we just write down in order. Many other encodings are possible, some no doubt more efficient and easier to process. The point is that our string is made from a fixed finite alphabet Σ0, containing q, the delimiter |, the parentheses ( and ), the separator / and a few further symbols, and it completely defines the operation of the machine, because from the string S you could easily write down the transition table and hence draw the graph. Moreover, Σ0 could be used to describe any machine with input alphabet Σ. (We used a forward slash (/) in place of a comma (,) here to avoid some very confusing set notation.) It is now a simple matter to add a word w ∈ {0, 1}* to this encoding by simply adding w to the end of S. We are now ready to feed this string Sw to our very clever TM, in the hope that it can decide whether this machine would halt if given the input w.
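One concrete encoding along these lines can be sketched as follows. The exact encoding string used in the notes is not reproduced here; the row delimiter |, the separator / and the q-repetition trick for states are as described above, while the token format inside the parentheses is our own.

```python
# A sketch of encoding a transition table as a single string.
# State qi is written as q repeated i+1 times; each defined transition
# delta(x, qi) = (b, qj, D) becomes the token (x/q..q/b/q..q/D);
# rows of the table are separated by |.

def enc_state(i):
    return "q" * (i + 1)

def encode(delta, n_states, symbols):
    rows = []
    for i in range(n_states):
        tokens = []
        for x in symbols:
            if (x, i) in delta:
                b, j, d = delta[(x, i)]
                tokens.append(f"({x}/{enc_state(i)}/{b}/{enc_state(j)}/{d})")
        rows.append("".join(tokens))
    return "|".join(rows)

# The machine of Example 9.3.1 ("_" stands in for the blank symbol):
DELTA = {("0", 0): ("0", 1, "R"), ("1", 1): ("1", 0, "L"), ("_", 1): ("_", 2, "L")}
S = encode(DELTA, 3, "01_")
# S == "(0/q/0/qq/R)|(1/qq/1/q/L)(_/qq/_/qqq/L)|"
```

From such a string the transition table, and hence the machine, can clearly be recovered.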

We would like a single TM that can decide whether the machine halts, given any Turing machine M with input alphabet Σ and any input word w ∈ Σ*. But it can't be done! A rigorous mathematical proof can be given to show no such machine exists (no matter how cleverly we encode the machines). This doesn't mean that we can never decide whether a particular machine halts for a particular input. It means there is no TM that can decide this question for all possible TM with input alphabet Σ and words in Σ*. This has immensely important consequences in computer science. It means, for example, that we cannot write a program that can check whether a program given to it as input will halt on all possible inputs. A similar proof shows there is no general algorithmic way of checking whether a given program will go into an infinite loop on some input.

There are many other famous decision problems for which there is no TM that works in all cases. Such problems are called undecidable. Many of them are difficult even to describe, let alone attempt to solve. On the other hand, some of the undecidable problems concerning context free grammars are easy to describe. Here it is very easy to see how to encode the objects we wish to study. A set of production rules for a grammar is, after all, just a string of symbols, and the terminal and non-terminal symbols can be coded as in


Example 9.3.1. Among the many decision problems known to be undecidable are

the following surprisingly simple ones.

- Is the language of a given context free grammar G a regular language?
- For a given context free grammar G with set Σ of terminal symbols, is the language of G the whole of Σ*?
- Do a pair of context free grammars G1 and G2 give the same language?

Here again, the claim is not that we can never decide whether a particular context

free grammar actually generates a regular language. The claim is that there is no

TM that can decide this question for all possible context free grammars.
