28

(6) Let = {a, b, c}.

(a) Give a regular expression for the set of all elements of — containing

exactly two bs

(b) Give a regular expression for the set of all elements of — containing

exactly two bs and two cs

(c) Give a regular expression for the set of all elements of — containing

two or more bs

(d) Give a regular expression for the set of all elements of — beginning

and ending with a and containing at least one b and one c.

(e) Give a regular expression for the set of all elements of — consisting

of one or more as, followed by one or more bs and then one or more

cs.

(7) Let = {a, b}.

(a) Give a regular expression for the set of all elements of — containing

exactly two bs or exactly two as.

(b) Give a regular expression for the set of all elements of — containing

an even number of bs.

(c) Give a regular expression for the set of all elements of — beginning

and ending with a and containing at least one b.

(d) Give a regular expression for the set of all elements of — such that

the number of as in each string is divisible by 3 or the number of bs

is divisible by 5.

(e) Give a regular expression for the set of all elements of — such that

the length of each string is divisible by 3.

(8) Which of the following are uniquely decipherable codes?

(a) {ab, ba, a, b}

(b) {ab, acb, accb, acccb, . . .}

(c) {a, b, c, bd}

(d) {ab, ba, a}

(e) {a, ab, ac, ad}.

(9) Which of the following expressions describe uniquely decipherable codes?

(a) ab—

(b) ab— ∨ baaa

(c) ab— c ∨ baaac

(d) (a ∨ b)(b ∨ a)

(e) (a ∨ b ∨ »)(b ∨ a ∨ »).

(10) Which of the following are uniquely decipherable codes? Which are suf¬x

codes?

(a) {ab, ba}

(b) {ab, abc, bc}

2.2 Retracts (Optional) 29

(c) {a, b, c, bd}

(d) {aba, ba, c}

(e) {ab, acb, accb, acccb}.

(11) Which of the following expressions describe pre¬x codes? Which describe

suf¬x codes?

(a) ab—

(b) ab— c

(c) a— bc—

(d) (a ∨ b)(b ∨ a)

(e) a— b.

Show that the intersection of the monoids {ac, bc, d}— and {a, cb, cd}—

(12)

is the monoid generated by the code described by the expression

ac(bc)— d.

(13) Prove Theorem 2.1. If a code is a suf¬x, pre¬x, in¬x, bipre¬x, or block

code, then it is uniquely decipherable.

(14) Prove Theorem 2.2. A block code is a suf¬x, pre¬x, in¬x, and bipre¬x

code.

(15) Prove Theorem 2.3. An in¬x code is a bipre¬x code.

2.2 Retracts (Optional)

In this section we discuss an additional source of examples of regular lan-

guages: the ¬xed languages of endomorphisms of free monoids A— . Each such

language is necessarily a submonoid of A— and is the image of a special type of

endomorphism called a retraction. Such images are called retracts and they are

characterized among submonoids as those submonoids that are generated by a

special class of codes called key codes.

De¬nition 2.9 Let X be a set and let f : X ’ X be a function having the

property that f ( f (x)) = f (x) for all x in X . A function with this property is

called a retraction of X and its image is called a retract of X .

Notice that the restriction of a retraction f : X ’ X to the image of f ,

{ f (x) : x ∈ X }, is the identity mapping of the image of f onto itself.

Example 2.4 For the real numbers R, the absolute value function f : R ’ R,

de¬ned by f (r ) = |r |, is a retraction and {x ∈ R : x>0} is its associated retract.

The ¬‚oor and ceiling functions, when regarded as functions from R into R

provide two additional examples of retractions which determine the same retract

{r ∈ R : r is an integer}.

Languages and codes

30

Notice that, when X is merely a set having no speci¬ed structure, every

nonempty subset S of X is a retract of S since S is the image of the retraction

r : X ’ X de¬ned by r (x) = x if x is in S and otherwise r (x) = s where s

is in S. When structures are speci¬ed on a set, its retracts may become quite

interesting. In this section we study the retracts of free monoids A— , where A

is a ¬nite set. Several of our results hold also when A is in¬nite but we leave

these extensions to the interested reader. We will assume in this section and the

next that alphabets are always ¬nite.

De¬nition 2.10 The ¬xed language of a homomorphism h : A— ’ A— is the

set L = {w ∈ A— : h(w) = w}.

Note that the ¬xed language of each homomorphism is a submonoid of A— .

Example 2.5 Let A = {a, b, c, d} and let f be the homomorphism f : A— ’

A— de¬ned by f (a) = dad, f (b) = bc, f (c) = d, f (d) = ». The ¬xed lan-

guage of f is the submonoid generated by the set {dad, bcd}. Notice that

f ( f (b)) = f (bc) = f (b) f (c) = bcd which is not equal to f (b) = bc. Con-

sequently f is not a retraction. However, notice also that the homomorphism

r : A— ’ A— , de¬ned by r (a) = dad, r (b) = bcd, r (c) = r (d) = » is a retrac-

tion and has the same image as f . Thus the image of f is a retract even though f

itself is not a retraction. Finally, note that {dad, bcd} is a uniquely decipherable

code.

The behavior of a, b, and c in Example 2.5 will provide an illustration of the

classi¬cation of alphabetical symbols that will be necessary for understanding

retracts.

De¬nition 2.11 Let A be a set and let f be a homomorphism f : A— ’ A— .

A symbol a in A is said to be mortal, with respect to f , if there is a positive

integer n for which f n (a) = »; otherwise a is said to be vital.

For each homomorphism f , the mortal/vital dichotomy of the symbols of

A may be determined as follows. For each nonnegative integer j let A j be

de¬ned inductively by: A0 is empty; A1 = {a ∈ A : h(a) = »}; and for j ≥ 2,

A j = {a ∈ A : h(a) ∈ A— }. Since A is ¬nite there will be a least nonnegative

j’1

integer m for which Am = Am+1 . The set Am is the set of all mortal symbols

and its complement in A is the set of vital symbols.

Notice that in Example 2.5 the symbols d and c are mortal and the symbols a

and b are vital. Note also that the ¬xed language is the submonoid generated by

a set of words in each of which there is exactly one occurrence of a vital symbol.

Further, each of these vital symbols occurs in only one of the generators. In

this section we show that the simple observations concerning ¬xed languages,

2.2 Retracts (Optional) 31

retractions, and codes, made for Example 2.5, are completely typical of ¬xed

languages of homomorphisms.

De¬nition 2.12 Any symbol k in A that occurs exactly once in a word w in

A— is called a key of w. A word w for which there is at least one key symbol is

called a key word.

Note that for the word dad the symbol a is the unique key. For the word bcd

each of the three symbols b, c, and d is a key. Consequently both dad and bcd

are key words. Finally, the word abcbac is not a key word since it has no key.

De¬nition 2.13 A set X of words is called a key code if each word in X is a

key word and a key for each word in X can be chosen that does not occur in

any other word in X .

Note that the set of generators given in Example 2.5, namely X =

{dad, bcd}, is a key code. The word dad allows only the unique symbol a

to be chosen as its key. The word bcd allows each of b, c, and d to be chosen

as a key. To con¬rm that X is a key code we cannot use d as a key for the word,

but if either b or c is chosen as the key for bcd, then it is con¬rmed that X is a

key code. Each key code X is uniquely decipherable since, given any string that

is the concatenation of words chosen from X , simply noting the key symbols

that occur in the string provides the unique segmentation of X into code words.

The key codes constitute a very restricted subclass of the uniquely decipher-

able codes. Such simple codes as {aa, bb, cc, dd} are uniquely decipherable,

but contain no key word at all.

Example 2.6 Let A = {a, b, c, d}. The following are key codes: {a, b, c, d},

{a, bcc, dcc}, {abcbc, bbd}, {ababcd}, and the empty set. Note the crucial fact

that {a, b, c, d} is the only key code in A— that consists of exactly four words,

since each four-word key code must use each of the four symbols in A as

a key. The following are not key codes: {abba}, {abcd, c}, {abc, bcd, cda},

{a, b, c, dd}. Note that a subset of A— that contains ¬ve or more words cannot

be a key code, since there are only four possible keys.

The following technical result is the basis for the theorem that establishes

the ¬rm relationships that hold among the concepts: ¬xed language, retract, and

key code.

The following proposition was discovered by Tom Head [16].

Proposition 2.1 Let A be an alphabet and h : A— ’ A— be a homomorphism.

Let X = {a ∈ A : h(a) = uav where only mortal symbols occur in u and v}.

For each a in X , let Na be the least nonnegative integer for which h Na (uv) = ».

Let H = {h Na (a) : a ∈ X }. The ¬xed language L of h is the submonoid of A—

Languages and codes

32

generated by H . The correspondence a”h Na (a) is a one-to-one correspon-

dence between X and H .

Proof (1) H — ⊆ L: Since h is a homomorphism it is suf¬cient to verify that

each element h N (a) (a) of H is in L, which is con¬rmed by the calculation:

h(h N (a) (a)) = h N (a) (h(a))

= h N (a) (uav)

= h N (a) (u)h N (a) (a)h N (a) (v)

= »h N (a) (a)»

= h N (a) (a).

(2) L ⊆ H — : Let w be in L. Let a1 , a2 , a3 , . . . , an be the subsequence consisting

of the occurrences of the vital elements of w. We use now the principle: a

vital symbol can come only from a vital symbol and only a mortal symbol

can eventually be erased. Since h(w) = w, each ai must occur exactly once

in h(ai ). It follows that, for each ai , we must have h(ai ) = u i ai vi where each

u i , vi must consist entirely of mortal symbols. Thus for each i there is a least

nonnegative integer N (i) for which h N (i) (u i vi ) = ». Let N be the largest of

the N (i). Note that, for each ai , h N (ai ) = h N (i) (ai ) is in H . Then w = h(w) =

h N (w) = h N (a1 )h N (a2 )h N (a3 ) . . . h N (an ) is in H — .

From (1) and (2) we have L = H — . Note that H = {h N (i) (a) : a ∈ X } is a

key code since the set X is a set of keys for H .

Theorem 2.4 Let A be a ¬nite alphabet and L be a language contained in

A— . The following three conditions on L are equivalent:

(1) L is the ¬xed language of a homomorphism of A— into A— ;

(2) L is a submonoid of A— that is generated by a key code; and

(3) L is a retract of A— .

Proof (1 ’ 2): Let h : A— ’ A— be a homomorphism. Proposition 2.1 pro-

vides us with the key code H for which L = H — .

(2 ’ 3): Let L be a submonoid of A— that is generated by a keycode X and

let K be a set of keys for X . For each k in K there are strings xk and yk for

which xk kyk is the key word in X having k as its key. De¬ne a homomorphism

r : A— ’ A— by r (k) = xk kyk for each k in K and r (a) = » for each a not in

K . Note that r is a retraction of A— having X — = L as its image, hence L is a

retract.

(3 ’ 1): Let L be a retract of A— . Then there is a retraction r : A—

’ A— that has L as its image. Then L is the ¬xed language of the the

homomorphism r .

2.2 Retracts (Optional) 33

Theorem 2.4 has several valuable corollaries the proofs of which will be

relegated to the exercises.

Corollary 2.1 A retract of a free monoid is free.

Note that this property of retracts does not hold for arbitrary submonoids of

a free monoid since, for any nonempty alphabet A, the submonoid consisting

of all words of length >2 is not free.

Corollary 2.2 If A is an alphabet having exactly n symbols, then no inclusion

chain of distinct retracts of A— has more than n + 1 retracts even when the

retract {»} is included.

Corollary 2.3 If X is a key code and x n lies in X — , then so does x.

Corollary 2.4 If X is a key code and both uv and vu lie in X — , then so do u

and v.

Let A = {a1 , a2 , a3 , . . . , an }. A simple example of a longest possible inclu-

sion chain of retracts in A— is

{a1 , a2 , a3 , . . . , an }— , {a2 , a3 , . . . , an }— , {a3 , . . . , an }— , . . . , {an }— , {»}.

Each of these retracts, except the ¬rst, is maximal among the retracts contained

in its predecessor. In each case the number of generators of the subretract is

one less than the number of generators of its predecessor. However, maximal

proper subretracts of a retract can have many fewer generators:

Proposition 2.2 Let n be a positive integer and A = {a1 , a2 , a3 , . . . , an } an

alphabet of n symbols. Let m be any positive integer less than n. Then A—

contains a maximal proper retract generated by exactly m words.

Proof The set of m words

K = a1 , a2 , a3 , . . . , am’1 , am am+1 am+2 am+3 . . .

22 2 2

an’1 an am am+1 am+2 am+3 . . . an’1 an

2

is a key code for which K — is a maximal proper retract of A— .

The veri¬cation of the maximality is left as an exercise.

The retracts of a free monoid and the the partially ordered set they form

under set inclusion have been studied previously in [16],[10], and [9].

Languages and codes

34

Exercises

(1) Which of the following are sets of key codes?

(a) {a, ab, ac, d}

(b) {ab, ac, ad, ae}

(c) {aabaa, aacaa, ddeda, dada f }

(d) {abba, acca, adda, aeea}.

(2) De¬ne the retraction maps with the following retracts on A— where A =

{a, b, c, d, x, y, z}

(a) {aabaa, acaax, daax y}

(b) {ax, bx, cx, d x}

(c) {abcd}.

(3) Prove that the restriction of a retraction f : X ’ X to the image of f , is

the identity mapping of the image of f onto itself.

(4) Prove Corollary 2.1 that a retract of a free monoid is free.

(5) Prove Corollary 2.2 “ if A is an alphabet having exactly n symbols, then no

inclusion chain of distinct retracts of A— has more than n + 1 retracts even

when the retract {»} is included.

(6) Prove Corollary 2.3 “ if X is a key code and x n lies in X — , then so does x.

(7) Prove Corollary 2.4 “ if X is a key code and both uv and vu lie in X — , then

so do u and v.

2.3 Semiretracts and lattices (Optional)

The intersection of two retracts of the free monoid on a ¬nite set A need not be

a retract if A contains four or more symbols. Possibly the simplest example is

the following one adapted from [7]: Let A = {a, b, c, d}. The sets {ab, ac, d}

and {ba, c, da} are key codes and consequently the submonoids R and R that

they generated are retracts of A— . However, their intersection

R © R = (d(ab)— ac)—

is not only not a retract; it is not even ¬nitely generated. The set d(ab)— ac is a

uniquely decipherable code since simply noting the locations of the occurrences

of the symbol d in any string that is a concatenation of these words provides

the unique segmentation of the string into generators. Thus R © R is a free

submonoid, although not a retract of A— . In fact, the intersection of any family,

whether ¬nite or in¬nite, of free submonoids of a free monoid is free [37].

Consequently the family of free submonoids of a free monoid is always a

complete lattice. By broadening our attention slightly we obtain a similarly

attractive stability result for what we call semiretracts of free monoids:

2.3 Semiretracts and lattices (Optional) 35

De¬nition 2.14 By a semiretract of A— , we mean an intersection of a ¬nite

number of retracts of A— .

Each retract of A— is also a semiretract. The clearest example of a semiretract

that is not a retract is the example given previously: R © R = (d(ab)— ac)— .

Some pairs of retracts have as their intersection a retract:

{abc, d}— © {a, bcd}— = {abcd}— .

As stated above, but not to be demonstrated here, if fewer than four alpha-

bet symbols appear in the keycodes that generate a class of retracts, then the

intersection of this collection must also be a retract.

Since every retract of A— is a regular language, every semiretract is also

a regular language. Thus this section continues to yield examples of regular

languages.

The de¬nition of a semiretract provides one closure property, “the inter-

section of a ¬nite number of semiretracts is semiretract.” A stronger result is

true, but not obvious, “the intersection of any ¬nite or in¬nite collection of

semiretracts of a free monoid A— on a ¬nite alphabet A is again a semiretract.”

This is an immediate consequence of a co-compactness property of the family

of semiretracts of A— which is included in the appendices. Every collection of

retracts of A— has a ¬nite sub-collection that has the same intersection as the

original collection. The intersection of the ¬nite sub-collection is a semiretract

of A— by the de¬nition of semiretract.

The elementary set theoretic union of two semiretracts need not be a semire-

tract nor even a submonoid: a — and b— are retracts, but a — ∪ b— is not a sub-

monoid. However, given any collection C of semiretracts, whether ¬nite or

in¬nite, let M be the intersection of the class of all semiretracts of A— , each

of which contains every semiretract in C. There is at least one semiretract that

contains them all, namely A— itself. The resulting intersection M is a semiretract

of A— as explained in the previous paragraph. For each such C we denote M by

∨C. We summarize the discussions of this section as:

Theorem 2.5 Let A be a ¬nite alphabet. The set of all semiretracts of A— is a

complete lattice with binary operations © and ∨ having A— as maximal element

and {»} as minimal element.

The semiretracts of a free monoid and the lattice they form have been studied

previously in [1], [2], and [3].

Languages and codes

36

Exercises

(1) Find the code of the semiretract which is the intersection of retracts with

key codes {ab, cb, cd} and {a, bc, d}.

(2) Find the code of the semiretract which is the intersection of retracts with

key codes {ab, st, sd, e f, eg} and {a, bs, ts, de, f e, g}.

(3) Find the code of the semiretract which is the intersection of retracts with

key codes {ba, st, sd, e f, eg} and {as, bs, ts, de, f e, g}.

(4) Find the key codes of two retracts whose intersection has the basis

ab(de f )— dgh.

(5) Find the key codes of two retracts whose intersection has the basis

ab(de)— d f g(hk)— hm.

(6) Prove that a key code of a retract is a pre¬x code.

(7) Prove that a key code of a retract is an in¬x code.

(8) Find a semiretract that is the intersection of three retracts, but not two

retracts.

3

Automata

3.1 Deterministic and nondeterministic automata

An automaton is a device which recognizes or accepts certain elements of — ,

where is a ¬nite alphabet. Since the elements accepted by the automaton are

a subset of — , they form a language. Therefore each automaton will recognize

or accept a language contained in — . The language of — consisting of the

words accepted by an automaton M is the language over — accepted by M

and denoted M(L). We will be interested in the types of language an automaton

accepts.

De¬nition 3.1 A deterministic automaton, denoted by ( , Q, s0 , ’, F), con-

sists of a ¬nite alphabet , a ¬nite set Q of states, and a function ’ : Q — ’

Q, called the transition function and a set F of acceptance states. The set Q

contains an element s0 and a subset F, the set of acceptance states.

The input of ’ is a letter of and a state belonging to Q. The output is a

state of Q (possibly the same one). If the automaton is in state s and “reads”

the letter a, then (s, a) is the input for ’ and ’(s, a) is the next state. Given a

string in — the automaton “reads” the string or word as follows. Beginning at

the initial state s0 , and beginning with the ¬rst letter in the string (if the string is

nonempty), it reads the ¬rst letter of the string. If the ¬rst letter is the letter a of

, then it “moves” to state s = ’(s0 , a). The automaton next “reads” the second

letter of the string, say b, and then moves to state s = ’(s, b). Therefore, as the

automaton continues to “read” a string of letters from the alphabet it “moves”

from one state to another. Eventually the automaton “reads” every letter in the

string and then stops. If the state the automaton is in after reading the last letter

belongs to the set of acceptance states, then the automaton accepts the string.

Let M be the automaton with alphabet = {a, b}, set of states Q = {s0 , s1 , s2 },

37

Automata

38

and ’ de¬ned by the table

’ s0 s1 s2

a s1 s2 s2

b s0 s0 s1

Suppose M “reads” the string aba. Since the automaton begins in state s0 , and

the letter read is a, and ’(s0 , a) = s1 , the automaton is now in state s1 . The

next letter read is b and ’(s1 , b) = s0 . Finally the last letter a is read and, since

’(s0 , a) = s1 , the automaton remains in state s1 . We may also state ’ as a set

of rules as follows:

If in state s0 and a is read go to state s1 .

If in state s1 and a is read go to state s2 .

If in state s2 and a is read go to state s2 .

If in state s0 and b is read go to state s0 .

If in state s1 and b is read go to state s0 .

If in state s2 and b is read go to state s1 .

Let s0 and s2 be the acceptance states.

This deterministic automaton is best shown pictorially by a state diagram

which is a directed graph where the states are represented by the vertices and

each edge from s to s is labeled with a letter, say a, of the alphabet if

’(s, a) = s . A directed arrow from s to s labeled with the letter a will be

called an a-arrow from s to s . If s is a starting state, then its vertex is denoted

by the diagram

s

If s is an acceptance state, its vertex is denoted by the diagram

s

Therefore the deterministic automaton above may be represented pictorially

as seen in Fig. 3.1. More speci¬cally, an automaton “reads” a word or string

a0 a1 a2 . . . an of — by ¬rst reading a0 , then reading a1 and continuing until it

has read an . If an automaton is in state s1 and reads the word w and is then

in s2 , then w is a path from s1 to s2 . A deterministic automaton accepts or

recognizes a0 a1 a2 . . . an if after beginning with a0 in state s0 and continuing

until reading an , the automaton stops in an acceptance state. Thus the automaton

3.1 Deterministic and nondeterministic automata 39

a

b

a

a

s2

s0 s1

b

b

Figure 3.1

above would not accept aba since s1 is not an acceptance state. It would however

accept bbaaa and bab, since s0 and s2 are acceptance states.

The automaton with the state diagram

a b

b

s4

a

a s1 a

s0

ba

s3 b

b

s2

has initial state s0 and acceptance state s3 . It accepts the word aba since after

reading a, it is in state s1 . After reading b, it is still in state s1 . After reading

the second a, it is in state s3 , which is an acceptance state. One can see that

it also accepts abbba and bb, so they are in the language accepted by the

given automaton. However bbb, abab, and abb are not. Notice that any string

beginning with two as or two bs is accepted only if the string is not extended.

Also, if three as occur in the string, the string is not accepted. The state s4 is an

example of a sink state. Once the automaton is in the sink state, it can never

leave this state again, regardless of the letter read.

Since ’ is a function, a deterministic automaton can always read the entire

string. We shall later de¬ne a nondeterministic automaton which may not always

be able to read the entire string. In such a case the word cannot be accepted.

Example 3.1 Consider the automaton with state diagram

a

b

b b

s0 s1 s3

s2

b

a a

a

having = {a, b}, starting state s0 , and acceptance states s0 , s1 , and s2 . It

obviously accepts the word bb. In each state, there is a loop for a so that

Automata

40

if a is read then the state does not change. This enables us to read as many

as as desired without changing states, before reading another b. Thus the

automaton reads aababaaa, baabab, baaab, babaaa, aabaabaa, and in fact

we can read any word in the language described by the regular expression

(a— ba— ba— ) ∨ (a— ba— ) ∨ a— . This language can also be described as the set of all

words containing at most two bs. Notice that s3 is a sink state.

Example 3.2 Consider the automaton with state diagram

b

a s3

s1

s0

b ca

a

b

s2 c

c

which we simplify as

b,c

a s3

s1

s0

a

s2

b,c a,b,c

a,b,c

to decrease the number of arrows. This automaton obviously accepts only

the words ab and ac. This language may be described by the regular expres-

sion a(b ∨ c). Notice that the sink state s2 eliminates all other words from the

language.

Example 3.3 Consider the automaton with state diagram

b

a b c

s1 s3

s0 s4

a,c

a,b

c s2 a,b,c

a,b,c

The only words accepted are b and abc. Therefore the expression for the lan-

guage accepted is b ∨ abc.

3.1 Deterministic and nondeterministic automata 41

Example 3.4 Consider the automaton with state diagram

a a,b

a

b b

s0 s1 s2 s3

b

a

In this automaton, if three consecutive bs are read, then the automaton is in state

s3 , which is a sink state and is not an acceptance state. This is the only way to

get to s3 and every other state is an acceptance state. Thus the language accepted

by this automaton consists of all words which do not have three consecutive bs.

An expression for this language is

(a ∨ (ba) ∨ (bba))— (» ∨ b ∨ (bb)).

As previously mentioned, the automata that we have been discussing are

called deterministic automata since in every state and for every value of the

alphabet that is read, there is one and only one state in which the automata can

be. In other words, ’ : Q — ’ Q is a function. It is often convenient to relax

the rules so that ’ is no longer a function, but a relation. If we again consider

’ as a set of rules, given a ∈ and s ∈ Q, the rules may allow advancement

to each of several states or there may not be a rule which does not allow it to

go to any state after reading a in state s. In the latter case, the automaton is

“hung up” and can proceed no further. This cannot occur with a deterministic

automaton.

Although the de¬nition of a nondeterministic automaton varies, we shall use

the following de¬nition:

De¬nition 3.2 A nondeterministic automaton, denoted by

( , Q, s0 , ’, F)

consists of a ¬nite alphabet , a ¬nite set Q of states, and a function

’ :Q— ’ P(Q)

called the transition function. The set Q contains an element s0 and a subset

F containing one or more acceptance states. (Note that P(Q) is the power set

of Q.)

Thus given a ∈ and s ∈ Q, there may be a-arrows from s to several dif-

ferent states or to no state at all. By de¬nition, a deterministic automaton is also

Automata

42

considered to be a nondeterministic automaton. A nondeterministic automaton

often simpli¬es the state diagram and eliminates the need for a sink state. In

Example 3.2, the state diagram can be simpli¬ed to

a b,c

s1

s0 s2

Note that in reading aa, after reading the ¬rst a, the automaton is in state s3 ,

and when the second a is read the automaton “hangs up”, since there is no a

arrow out of state s3 .

Example 3.5 The deterministic automaton represented by

a b

b

s4

a

a s1 a

s0

ba

s3 b

b

s2

can be simpli¬ed using a nondeterministic automaton by simply eliminating

state s4 and all arrows into or out of this state.

Example 3.6 It is easily seen that the automaton with state diagram

b

a c

s0 s1 s2

accepts the language with regular expression ab— c.

Example 3.7 The automaton with state diagram

s1

a,b

s0

accepts the language with regular expression a ∨ b.

3.1 Deterministic and nondeterministic automata 43

Example 3.8 The automaton with state diagram

a b

a b

s0 s1 s2

accepts the language with regular expression aa— bb— .

Example 3.9 The automaton with state diagram

a,b

a,b

a,b a

s1 s2

a

s0

accepts the language consisting of strings with at least two as and so may be

written as (a ∨ b)— a(a ∨ b)— a(a ∨ b)— .

Obviously any language accepted by a deterministic automaton is accepted

by a nondeterministic automaton since the set of deterministic automata is a

subset of the set of nondeterministic automata. In the following theorem, how-

ever, we shall see that any language accepted by a nondeterministic automaton

is also accepted by a deterministic automaton.

Theorem 3.1 For each nondeterministic automaton, there is an equivalent

deterministic automaton that accepts the same language.

We demonstrate how to construct a deterministic automaton which accepts

the language accepted by a nondeterministic automaton. We shall later give a

formal proof that a language is accepted by a deterministic automaton if and

only if it is accepted by a nondeterministic automaton. If Q is the set of states

for the nondeterministic automaton, we shall use elements of P(Q), i.e. the

set of subsets of Q, as states for the deterministic automaton which we are

constructing. Some of these states may not be used since they do not occur

on any path which leads to acceptance state. Hence they could be removed

and greatly simplify the deterministic automaton created. However, for our

purpose, we are only interested in showing that a deterministic automaton can be

created.

Automata

44

In general we have the following procedure for constructing a deterministic

automaton

M = ( , Q , {s0 }, ’ , F )

from a nondeterministic automaton.

N = ( , Q, s0 , ’, F).

(1) Begin with the state {s0 } where s0 is the start state of the nondeterministic

automaton.

(2) For each ai ∈ , construct an ai arrow from {s0 } to the set consisting of all

states such that there is an ai -arrow from s0 to that state.

(3) For each newly constructed set of states s j and for each ai ∈ construct

an ai arrow from s j to the set consisting of all states such that there is an ai

arrow from an element of s j to that state.

(4) Continue this process until no new states are created.

(5) Make each set of states s j , that contains an element of the acceptance set

of the nondeterministic automaton, into an acceptance state.

Example 3.10 Consider the nondeterministic automaton N

a b

a b

s0 s1 s2

Construct an a-arrow from {s0 } to the set of all states so that there is an a-arrow

from s0 to that state. Since there is an a-arrow from s0 to s0 and an a-arrow from

s0 to s1 , we construct an a-arrow from {s0 } to {s0 , s1 }. There is no b-arrow from

s0 to any state. Hence the set of all states such that there is a b-arrow to one of

these states is empty and we construct a b-arrow from {s0 } to the empty set ….

We now consider the state {s0 , s1 }. We construct an a-arrow from {s0 , s1 } to the

set of all states such that there is an a-arrow from either s0 or s1 to that state.

Thus we construct an a-arrow from {s0 , s1 } to itself. We construct a b-arrow

from {s0 , s1 } to the set of all states such that there is a b-arrow from either s0

or s1 to that state. Thus construct a b-arrow from {s0 , s1 } to {s2 }. Since there

are no a-arrows or b-arrows from any state in the empty set to any other state,

we construct an a-arrow and a b-arrow from the empty set to itself. Consider

{s2 }. Since there is no a-arrow from s2 to any other state, we construct an a-

arrow from {s2 } to the empty set. Since the only b-arrow from s2 is to itself, we

construct a b-arrow from {s2 } to itself. The acceptance states consist of all sets

3.1 Deterministic and nondeterministic automata 45

which contain an element of the terminal set of N . In this case {s2 } is the only

acceptance state. We have now completed the state diagram

a

{s0 ,s1}

b

a

b

{s0} {s2}

a

b

˜

a b

which is easily seen to be the state diagram of a deterministic automaton. This

automaton also reads the same language as N , namely the language described

by the expression aa— bb— .

Example 3.11 Given the nondeterministic automaton

b b

s2

s0 s1

a

a

a

s3 a

using the same method as above we complete the deterministic automaton

a

{s0} {s1}

b

b

{s0 ,s2}

˜

a b a

b

a,b

{s3}

a {s1 ,s3}

At this point we introduce a new notation. The ordered pair (si , w) indicates

that the automaton is in state si and still has input w left to read. For example,

Automata

46

(s2 , abbb) indicates that the automaton is in state s2 and must still read abbb.

Assume that we have (si , aw) w ∈ + . Thus the automaton is in state si and

must still read a followed by w. The notation (si , aw) (s j , w) means that the

automaton has read a and moved from state si to state s j . Therefore ’(si , a) =

s j . In the automaton

a a,b

a

b b

s0 s1 s2 s3

b

a

we have (s2 , bab) (s3 , ab). We also have

(s0 , babba) (s1 , abba) (s0 , bba) (s1 , ba) (s2 , a) (s0 , »).

If we have (si , wi ) (s j , w j ) · · · (sm , wm ), we denote this by (si , wi ) —

(sm , wm ). We also let (s, w) — (s, w). Thus a word w is accepted by an automa-

ton if and only if (s0 , w) — (s, ») where s is an acceptance state. In our example

(s0 , bababb) — (s0 , »), so bababb is accepted by the automaton.

We shall now prove that a language is accepted by a deterministic automa-

ton if and only if it is accepted by a nondeterministic automaton. We begin

with two lemmas. The ¬rst is obvious since every deterministic automaton is a

nondeterministic automaton.

Lemma 3.1 Every language accepted by a deterministic automaton is

accepted by a nondeterministic automaton.

Lemma 3.2 Let N = ( , Q, s0 , ’, F) be a nondeterministic automaton and

M = ( , Q , {s0 }, ’ , F ) be the deterministic automaton derived from N using

the above process. Then (s0 , w) — (s, ») in N if and only if there exists X such

that ({s0 }, w) — (X, ») in M where s ∈ X .

Proof We ¬rst show that if (s0 , w) — (s, ») in N , then ({s0 }, w) — (X, »)

where s ∈ X. The proof uses induction on the length n of w. If n = 0, we

have (s0 , ») — (s0 , ») in N , ({s0 }, ») — ({s0 }, ») in M, and s0 ∈ {s0 }, so the

statement is true if n = 0. Assume w = va ∈ + has length k + 1, so v has

length n. Since (s0 , va) — (s, »), then (s0 , va) — (t, a) (s, ») for some t ∈ Q

and (s0 , v) — (t, »). Therefore by induction, there exist Y so that t ∈ Y and

({s0 }, v) — (Y, »). Since t ∈ Y and (t, a) (s, ») in N , (Y, a) (X, ») for some

X where s ∈ X . Therefore ({s0 }, va) — (Y, a) (X, ») or ({s0 }, va) — (X, »)

where s ∈ X .

3.1 Deterministic and nondeterministic automata 47

Conversely, we show that if ({s0 }, w) — (X, ») in M, then (s0 , w) — (s, »)

in N where s ∈ X . We again use induction on n, the length of the word w.

Assume there exists X such that ({s0 }, w) — (X, ») in M where s ∈ X . If

n = 0, we have ({s0 }, ») — ({s0 }, ») in M, (s0 , ») — (s0 , ») in N , and s0 ∈ {s0 },

so the statement is true if n = 0. Given ({s0 }, va) with length k + 1, so v has

length n. Assume ({s0 }, va) — (Y, a) (X, »). Therefore ({s0 }, v) — (Y, »).

By induction, (s0 , v) — (t, ») for all t in Y and hence (s0 , va) — (t, a) for all t in

Y . By de¬nition, since (Y, a) (X, ») and (t, a) (s, »),then (s0 , w) — (s, »)

in N for s ∈ X .

We are now able to prove the desired Theorem 3.1.

Theorem 3.2 A language is accepted by a deterministic automaton if and

only if it is accepted by a nondeterministic automaton.

Proof To show this we need only show that a word is accepted by a non-

deterministic automaton if and only if it is accepted by the corresponding

deterministic automaton. If (s0 , w) — (s, ») where s is an acceptance state

in the nondeterministic automaton, then ({s0 }, w) — (X, ») where X contains

an acceptance state. Hence X is an acceptance state. Assume X is an accep-

tance state, then it contains an acceptance state r from the nondeterministic

automaton. But by the previous lemma, if ({s0 }, w) — (X, ») and r ∈ X then

(s0 , w) — (r, »). Therefore r is an acceptance state.

At this point we shall de¬ne an extended nondeterministic automaton and

prove that a language is accepted by an extended nondeterministic automaton

if and only if it is accepted by a nondeterministic automaton (and hence a

deterministic automaton).

Using a nondeterministic automaton, we can extend the automaton so that

( + , Q, s0 , ’, F) consists of + , a ¬nite set Q of states, and a function

’ : + — Q ’ P(Q), called the transition function. Thus ’ reads words

instead of letters. This can be changed back to reading letters by adding new

nonterminal states. If ’ reads the word w = a1 a2 · · · ak, and moves from state

s to state s , add states σ2 σ3 · · · σk , and let ’(s, a1 ) = σ2 , ’(σ2 , a2 ) = σ3 ,

’(σ3 , a3 ) = σ4 , . . . , ’(σk’1 , ak’1 ) = σk , and ’(σk , ak ) = s . This forms a

nondeterministic automata, but we can form a deterministic automata with

the same language as shown above.

If we allow the automaton to pass from one state si to another state s j without

reading a letter of the alphabet, this may be shown as the automaton having an

edge from si to s j with label ». Thus paths may contain one or more » s. Such

an automaton is said to have »-moves. We can then have an automaton with the

form ( — , Q, s0 , ’, F).

Automata

48

Formally a ¬nite automaton M = ( , Q, s0 , ’, F) with »-moves has the

property that ’ maps Q ∪ {»} to Q. We wish to create a deterministic automata

M = ( , Q , s0 , ’ , F ) containing no »-moves with the same language. Thus

M(L) = M (L). Given a letter q in , de¬ne E(q) to be all the states that

are reachable from q without reading a letter in the alphabet. Thus E(q) =

{ p : (q, w) ( p, w). In our construction, the set of states of M is a subset of

P(Q). The state s0 = E(s0 ), and F is a set containing an element of F. For

each element a of , de¬ne ’ by ’ (P, a) = p∈P E(’( p, a)).

We ¬rst show that M is deterministic. It is certainly single valued. Further

’ (P, a) will always have a value even if it is the empty set.

We must now show that M(L) = M (L). To do this we show that for any

states p and q in Q, and any word w in —

— —

( p, w) (q, ») in M if and only if (E( p), w) (P, ») in M

for some P containing q. From this it will follow that

— —

(s0 , w) ( f, ») in M if and only if (E(s0 ), w) (P, ») in M

for some P containing f , where f ∈ F.

We prove this using induction of the length of w. If |w| = 0, then w = »,

and it must be shown that

— —

( p, ») (q, ») in M if and only if (E( p), ») (P, ») in M

for some P containing q. Now ( p, ») — (q, ») if and only if q ∈ E( p);

but since M is deterministic and no letter is read, then P = E( p) and

p ∈ E( p). Therefore the statement is true if |w| = 0.

Assume the statement is true for all strings having nonnegative length k. We

now have to prove the statement is true for any string w with length k + 1.

’: Assume w = va for some letter a and w and ( p, w) — (q, ») so that

— —

( p, va) (q1 , a) (q2 , ») (q, »)

where at the end, possibly no letters of the alphabet are read. Since ( p, va) —

(q1 , a) then ( p, v) — (q1 , ») and, by induction, (E( p), v) — (R, ») for some R

containing q1 . But since (q1 , a) (q2 , »), by construction, E(q2 ) ⊆ ’ (R, a),

and since (q2 , ») — (q, »), q ∈ E(q2 ) by de¬nition of E, and hence q ∈

’ (R, a). Therefore (R, a) ((P, ») for some P containing q by de¬nition

of ’ and (E( p), va) — (R, a) ((P, ») for some P containing q.

In M , assume (E( p, va)) — (R, a) (P, ») where q ∈ P and ’ (R, a) =

P. By de¬nition ’ (R, a) = r ∈R E(’(r, a)). There exists some state r ∈ R

such that ’(r, a) = s and q ∈ E(s). Therefore (s, ») — (q, ») by de¬nition

3.1 Deterministic and nondeterministic automata 49

— —

of E(s). By the induction hypotheses ( p, v) (r, »). Therefore ( p, va)

(r, a) (s, ») — (q, »).

Given the automaton (M = ( , Q, s0 , ’, F)

Example 3.12

c

b

a

l

l

s0 s2

s1

c

b

l

a

s3

which has »-moves, we construct M = ( , Q , s0 , ’ , F ) containing no

»-moves: E(s0 ) = {s0 , s1 , s2 }, E(s1 ) = {s1 , s2 }, E(s2 ) = {s2 }, and E(s3 ) =

{s0 , s1 , s2 , s3 }. Denote these sets by s0 , s1 , s2 , and s3 respectively. Then ’

is given by the following table

a b c

s0 s3 s1 s2

…

s1 s1 s2

… …

s2 s2

s3 s3 s1 s2

giving the »- free automaton

b

s'1 c

b

c c

s'0 s'2

a

b

a a,b

c

˜

s'3

a,b,c

a

Both automata generate the language a— b— c— .

Automata

50

Exercises

(1) Which of the following words are accepted by the automaton?

a

b

a

a

s2

s0 s1

b

b

(a) abba.

(b) aabbb.

(c) babab.

(d) aaabbb.

(e) bbaab.

(2) Which of the following words are accepted by the automaton?

a

b

b

b

s2

s0 s1

a

a

(a) aaabb.

(b) abbbabbb.

(c) bababa.

(d) aaabab.

(e) bbbabab.

(3) Write an expression for the language accepted by the automaton

a,b

a

b

b

s2

s0 s1

a

(4) Write an expression for the language accepted by the automaton

a

a

b

b s3

s0 s2

a

b

s1

3.1 Deterministic and nondeterministic automata 51

(5) Write an expression for the language accepted by the automaton

a

a

b

b b

s2

s1

s0

a

(6) Write an expression for the language accepted by the automaton

b

a a

a

a,b

s1

s0 s2 s3

a

b

b

(7) Find a deterministic automaton which accepts the language expressed by

aa— bb— cc— .

(8) Find a deterministic automaton which accepts the language expressed by

(a— ba— ba— b)— .

(9) Find a deterministic automaton which accepts the language expressed by

(a— (ba)— bb— a)— .

(10) Find a deterministic automaton which accepts the language expressed by

(a— b) ∨ (b— a)— .

(11) Find a nondeterministic automaton which accepts the language expressed

by aa— bb— cc— .

(12) Find a nondeterministic automaton which accepts the language expressed

by (a— b) ∨ (c— b) ∨ (ac)— .

(13) Find a nondeterministic automaton which accepts the language expressed

by (a ∨ b)— (aa ∨ bb)(a ∨ b)— .

(14) Find a nondeterministic automaton which accepts the language expressed

by ((aa— b) ∨ bb— a)ac— .

(15) Find a deterministic automaton which accepts the same language as the

nondeterministic automaton

a,b

a

b

a s1

s0 s2

b

(16) Find a deterministic automaton which accepts the same language as the

nondeterministic automaton

Automata

52

a

b a,b

s0 s2

a

b

a

s1

a,b

(17) Find a deterministic automaton which accepts the same language as the

nondeterministic automaton

b a,b

a

s0 s2

s1 a

b

(18) Find a deterministic automaton which accepts the same language as the

nondeterministic automaton

b

a,b

b

s1

a

s3

s0 a

a a

s2

3.2 Kleene™s Theorem

In this section we show Kleene™s Theorem which may be stated as follows:

Theorem 3.3 A language is regular if and only if it is accepted by an auto-

maton.

We begin by showing that the rules de¬ning a regular language can be

duplicated by an automaton. First it is shown that there are automata which

accept subsets of the ¬nite set . Then it is shown that if there are automata

that accept languages L 1 and L 2 , we can construct automata that accept L 1 L 2 ,

L — , and L 1 ∪ L 2 . Thus we show that for every regular language over a ¬nite

1

set , there is a nondeterministic automaton that accepts that language.

3.2 Kleene™s Theorem 53

The set is »; it is accepted by the automaton with state diagram.

s

It may be preferable to create a state s1 , which is not an acceptance state, and

have all arrows from both s0 and s1 go to s1 .

We next show that for every ¬nite subset of , there is an automaton that reads

that subset. If the automaton has no acceptance state, then the language accepted

is the empty set. The elements of the set {a1 , a2 , a3 , . . . , an } are accepted by

the automaton with state diagram

s1

s2

a1

s0 a2

a3 s3

¦

an

sn

In particular if a ∈ , it is accepted by the automaton with state diagram

a

s0 s1

We next show that if regular languages with expression L 1 and L 2 are both

accepted by automata, then their concatenation L 1 L 2 is also accepted by an

automaton. Assume L 1 is accepted by the automaton M1 = ( , Q, s0 , ’, F),

and L 2 is accepted by the automaton M2 = ( , Q , s0 , ’ , F ). We shall con-

sider both automata to be deterministic without loss of generality. We now

de¬ne a new automaton M = ( , Q , s0 , ’ , F ) which is essentially the ¬rst

automaton followed by the second automaton. Put simply, place the state dia-

gram for M2 after the state diagram for M1 . If, for a ∈ , there is an a-arrow

from any state s in the state diagram for M1 to an acceptance state in the state

diagram for M1 , then change the acceptance state into a nonacceptance state

and also place an a-arrow from s to the starting state in the state diagram for

M2 . This is the state diagram for M. Thus the set of states Q = Q ∪ Q , so that

Q consists of all the states in M1 and M2 . We shall assume that M1 and M2

Automata

54

have no common states. If they do, we can always relabel them. Since we want

to begin in M1 , we let s0 be the starting state of M so that s0 = s0 . Since we

want to ¬nish in M2 , we let the set of acceptance states be F so that F = F .

We de¬ne the rules for ’ as follows.

If the rule

If in state si and a is read, go to state s j

is in ’ and s j is not an acceptance state then include this rule in ’ . If s j is an

acceptance state then include this rule in ’ and also include the rule

If in state si and a is read, go to state s0 .

Hence there is the option of going to the state s j or skipping over to s0 in the

second automaton. Again recall that s j now ceases to be an acceptance state.

If the rule is in ’ then it is included in ’ . As a result, if the automaton M

has read a word in L 1 , it may then skip over and read a word in L 2 . As a special

case, consider the possibility of » being a word in L 1 . Include the rule

If state s0 is an acceptance state, go to state s0 .

When these rules are followed, a word in L 1 L 2 ends up in an acceptance state of

M2 and hence ’ so that it is accepted by M. Therefore every string consisting

of a word in L 1 followed by a word in L 2 is accepted by M, and L 1 L 2 is

accepted by M.

Example 3.13 Let L 1 be the language described by the language (ab)— c and

having automaton M1 with state diagram

a

s2

s0 s1

b

c

Let L 2 be the language described by the language ab— c— and having automaton

M2 with state diagram

b c

a c

s'0 s'1 s'2

To ¬nd the state diagram for the language L 1 L 2 , place the state diagram for M2

after the state diagram for M1 . Since there is a c-arrow from s0 to s2 , and s2 is

3.2 Kleene™s Theorem 55

an acceptance state, add a c-arrow from s0 to s0 . The state diagram

b c

a

a c

s'1

s0 s'0

s1 s'2

s2

b

c

c

is the state diagram for M, the automaton for L 1 L 2 .

Example 3.14 Let L 1 and L 2 and their respective automata be the same as

those in the previous example. To ¬nd the automaton for the language for L 2 L 1

is slightly more complicated. First we place the state diagram for M1 after the

state diagram for M2 . There is an a-arrow from s0 to s1 and s1 is an acceptance

state, so place an a-arrow from s0 to s0 . There is a b-arrow from s1 to s1 and s1

is an acceptance state, so place a b-arrow from s1 to s0 . There is a c-arrow from

s1 to s2 and s2 is an acceptance state, so place a c-arrow from s1 to s0 . There is

a c-arrow from s2 to s2 and s2 is an acceptance state, so place a c-arrow from s2

to s0 . Then change s1 , s2 so that they are not acceptance states. Thus we have

the state diagram

c

b

a

c

c

a

s0 s1

s'0 s2

s'2

s'1 b

b,c c

a

which is the state diagram for M, the automaton for L 2 L 1 .

Similarly we show that if L is a language accepted by an automaton M1 =

( , Q, s0 , ’, F) then L — is also accepted by an automaton. We now de¬ne a

new automaton M = ( , Q , s0 , ’ , F ) which is essentially the same as M1

except M is looped to itself. Let M be de¬ned as follows: Create a new state

s0 , and make it an acceptance state. We include state s0 so M will accept the

empty word. For each rule

If in state s0 and a is read, go to state s j

for a ∈ , add the rule

If in state s0 and a is read, go to state s j .

Thus if there is an a-arrow from s0 to s j , there is an a-arrow from s0 to s j .

Include all of the current rules for M1 in M. In addition, for a ∈ , if there is

Automata

56

an a-arrow from any state s in the state diagram for M1 to an acceptance state

in the state diagram for M1 , then also place an a-arrow from s to s0 in the state

diagram for M. This is the state diagram for M. The set of states Q = Q ∪ {s0 }.

The set of acceptance states for M is F ∪ {s0 }.

We thus de¬ne the rules for ’ as follows:

If in state s0 and a is read, go to state s j

for a ∈ , add the rule

If in state s0 and a is read, go to state s j .

If the rule is in ’, then include this rule in ’ . If s j is an acceptance state

and

If in state si and a is read, go to state s j

then also include the rule

If in state si and a is read, go to state s0 .

Hence there is the option of going to the acceptance state s j or skipping over

to s0 .

Example 3.15 Let the diagram

b

s3

s1

a a

s0 b

b

s4

s2

b

a

be the state diagram for the automaton accepting the language L, then the

diagram

b

b

a

s3

s0 s1

b a

a

b b

b

b

s'0 s2 s4

b

a

is the state diagram for the automaton accepting the language L — .

3.2 Kleene™s Theorem 57

Since the relationship between an automaton and its state diagram should

be evident by now, in the future we will identify an automaton with its state

diagram. Given automata M and M accepting languages L and L respectively,

we now wish to construct an automaton M which accepts L ∪ L . We wish

to read a word simultaneously in M and M , and accept it if it is accepted by

either M or M .

We now show how to construct M which accepts L ∪ L given automata

M = ( , Q, s0 , ’, F) and M = ( , Q , s0 , ’ , F ) If s0 and s0 are the initial

states of M and M respectively then construct a new initial state s0 , which is an

acceptance state if either s0 or s0 is, and let M = ( , Q , s0 , ’ , F ) where

Q = Q ∪ Q ∪ {s0 } and F = F ∪ F . Let ’ = ’ ∪ ’ together with the

following rules: If there is a rule

If in state s0 and a is read, go to state s j

for a ∈ in ’ include the rule

If in state s0 and a is read, go to state s j

in ’ .

If there is a rule

If in state s0 and a is read, go to state s j