<<

. 2
( 9)



>>

Languages and codes
28


(6) Let = {a, b, c}.
(a) Give a regular expression for the set of all elements of — containing
exactly two bs
(b) Give a regular expression for the set of all elements of — containing
exactly two bs and two cs
(c) Give a regular expression for the set of all elements of — containing
two or more bs
(d) Give a regular expression for the set of all elements of — beginning
and ending with a and containing at least one b and one c.
(e) Give a regular expression for the set of all elements of — consisting
of one or more as, followed by one or more bs and then one or more
cs.
(7) Let = {a, b}.
(a) Give a regular expression for the set of all elements of — containing
exactly two bs or exactly two as.
(b) Give a regular expression for the set of all elements of — containing
an even number of bs.
(c) Give a regular expression for the set of all elements of — beginning
and ending with a and containing at least one b.
(d) Give a regular expression for the set of all elements of — such that
the number of as in each string is divisible by 3 or the number of bs
is divisible by 5.
(e) Give a regular expression for the set of all elements of — such that
the length of each string is divisible by 3.
(8) Which of the following are uniquely decipherable codes?
(a) {ab, ba, a, b}
(b) {ab, acb, accb, acccb, . . .}
(c) {a, b, c, bd}
(d) {ab, ba, a}
(e) {a, ab, ac, ad}.
(9) Which of the following expressions describe uniquely decipherable codes?
(a) ab—
(b) ab— ∨ baaa
(c) ab— c ∨ baaac
(d) (a ∨ b)(b ∨ a)
(e) (a ∨ b ∨ »)(b ∨ a ∨ »).
(10) Which of the following are uniquely decipherable codes? Which are suf¬x
codes?
(a) {ab, ba}
(b) {ab, abc, bc}
2.2 Retracts (Optional) 29


(c) {a, b, c, bd}
(d) {aba, ba, c}
(e) {ab, acb, accb, acccb}.
(11) Which of the following expressions describe pre¬x codes? Which describe
suf¬x codes?
(a) ab—
(b) ab— c
(c) a— bc—
(d) (a ∨ b)(b ∨ a)
(e) a— b.
Show that the intersection of the monoids {ac, bc, d}— and {a, cb, cd}—
(12)
is the monoid generated by the code described by the expression
ac(bc)— d.
(13) Prove Theorem 2.1. If a code is a suf¬x, pre¬x, in¬x, bipre¬x, or block
code, then it is uniquely decipherable.
(14) Prove Theorem 2.2. A block code is a suf¬x, pre¬x, in¬x, and bipre¬x
code.
(15) Prove Theorem 2.3. An in¬x code is a bipre¬x code.



2.2 Retracts (Optional)
In this section we discuss an additional source of examples of regular lan-
guages: the ¬xed languages of endomorphisms of free monoids A— . Each such
language is necessarily a submonoid of A— and is the image of a special type of
endomorphism called a retraction. Such images are called retracts and they are
characterized among submonoids as those submonoids that are generated by a
special class of codes called key codes.

De¬nition 2.9 Let X be a set and let f : X ’ X be a function having the
property that f ( f (x)) = f (x) for all x in X . A function with this property is
called a retraction of X and its image is called a retract of X .

Notice that the restriction of a retraction f : X ’ X to the image of f ,
{ f (x) : x ∈ X }, is the identity mapping of the image of f onto itself.

Example 2.4 For the real numbers R, the absolute value function f : R ’ R,
de¬ned by f (r ) = |r |, is a retraction and {x ∈ R : x>0} is its associated retract.
The ¬‚oor and ceiling functions, when regarded as functions from R into R
provide two additional examples of retractions which determine the same retract
{r ∈ R : r is an integer}.
Languages and codes
30


Notice that, when X is merely a set having no speci¬ed structure, every
nonempty subset S of X is a retract of S since S is the image of the retraction
r : X ’ X de¬ned by r (x) = x if x is in S and otherwise r (x) = s where s
is in S. When structures are speci¬ed on a set, its retracts may become quite
interesting. In this section we study the retracts of free monoids A— , where A
is a ¬nite set. Several of our results hold also when A is in¬nite but we leave
these extensions to the interested reader. We will assume in this section and the
next that alphabets are always ¬nite.
De¬nition 2.10 The ¬xed language of a homomorphism h : A— ’ A— is the
set L = {w ∈ A— : h(w) = w}.
Note that the ¬xed language of each homomorphism is a submonoid of A— .
Example 2.5 Let A = {a, b, c, d} and let f be the homomorphism f : A— ’
A— de¬ned by f (a) = dad, f (b) = bc, f (c) = d, f (d) = ». The ¬xed lan-
guage of f is the submonoid generated by the set {dad, bcd}. Notice that
f ( f (b)) = f (bc) = f (b) f (c) = bcd which is not equal to f (b) = bc. Con-
sequently f is not a retraction. However, notice also that the homomorphism
r : A— ’ A— , de¬ned by r (a) = dad, r (b) = bcd, r (c) = r (d) = » is a retrac-
tion and has the same image as f . Thus the image of f is a retract even though f
itself is not a retraction. Finally, note that {dad, bcd} is a uniquely decipherable
code.
The behavior of a, b, and c in Example 2.5 will provide an illustration of the
classi¬cation of alphabetical symbols that will be necessary for understanding
retracts.
De¬nition 2.11 Let A be a set and let f be a homomorphism f : A— ’ A— .
A symbol a in A is said to be mortal, with respect to f , if there is a positive
integer n for which f n (a) = »; otherwise a is said to be vital.
For each homomorphism f , the mortal/vital dichotomy of the symbols of
A may be determined as follows. For each nonnegative integer j let A j be
de¬ned inductively by: A0 is empty; A1 = {a ∈ A : h(a) = »}; and for j ≥ 2,
A j = {a ∈ A : h(a) ∈ A— }. Since A is ¬nite there will be a least nonnegative
j’1
integer m for which Am = Am+1 . The set Am is the set of all mortal symbols
and its complement in A is the set of vital symbols.
Notice that in Example 2.5 the symbols d and c are mortal and the symbols a
and b are vital. Note also that the ¬xed language is the submonoid generated by
a set of words in each of which there is exactly one occurrence of a vital symbol.
Further, each of these vital symbols occurs in only one of the generators. In
this section we show that the simple observations concerning ¬xed languages,
2.2 Retracts (Optional) 31


retractions, and codes, made for Example 2.5, are completely typical of ¬xed
languages of homomorphisms.
De¬nition 2.12 Any symbol k in A that occurs exactly once in a word w in
A— is called a key of w. A word w for which there is at least one key symbol is
called a key word.
Note that for the word dad the symbol a is the unique key. For the word bcd
each of the three symbols b, c, and d is a key. Consequently both dad and bcd
are key words. Finally, the word abcbac is not a key word since it has no key.
De¬nition 2.13 A set X of words is called a key code if each word in X is a
key word and a key for each word in X can be chosen that does not occur in
any other word in X .
Note that the set of generators given in Example 2.5, namely X =
{dad, bcd}, is a key code. The word dad allows only the unique symbol a
to be chosen as its key. The word bcd allows each of b, c, and d to be chosen
as a key. To con¬rm that X is a key code we cannot use d as a key for the word,
but if either b or c is chosen as the key for bcd, then it is con¬rmed that X is a
key code. Each key code X is uniquely decipherable since, given any string that
is the concatenation of words chosen from X , simply noting the key symbols
that occur in the string provides the unique segmentation of X into code words.
The key codes constitute a very restricted subclass of the uniquely decipher-
able codes. Such simple codes as {aa, bb, cc, dd} are uniquely decipherable,
but contain no key word at all.
Example 2.6 Let A = {a, b, c, d}. The following are key codes: {a, b, c, d},
{a, bcc, dcc}, {abcbc, bbd}, {ababcd}, and the empty set. Note the crucial fact
that {a, b, c, d} is the only key code in A— that consists of exactly four words,
since each four-word key code must use each of the four symbols in A as
a key. The following are not key codes: {abba}, {abcd, c}, {abc, bcd, cda},
{a, b, c, dd}. Note that a subset of A— that contains ¬ve or more words cannot
be a key code, since there are only four possible keys.
The following technical result is the basis for the theorem that establishes
the ¬rm relationships that hold among the concepts: ¬xed language, retract, and
key code.
The following proposition was discovered by Tom Head [16].
Proposition 2.1 Let A be an alphabet and h : A— ’ A— be a homomorphism.
Let X = {a ∈ A : h(a) = uav where only mortal symbols occur in u and v}.
For each a in X , let Na be the least nonnegative integer for which h Na (uv) = ».
Let H = {h Na (a) : a ∈ X }. The ¬xed language L of h is the submonoid of A—
Languages and codes
32


generated by H . The correspondence a”h Na (a) is a one-to-one correspon-
dence between X and H .

Proof (1) H — ⊆ L: Since h is a homomorphism it is suf¬cient to verify that
each element h N (a) (a) of H is in L, which is con¬rmed by the calculation:
h(h N (a) (a)) = h N (a) (h(a))
= h N (a) (uav)
= h N (a) (u)h N (a) (a)h N (a) (v)
= »h N (a) (a)»
= h N (a) (a).

(2) L ⊆ H — : Let w be in L. Let a1 , a2 , a3 , . . . , an be the subsequence consisting
of the occurrences of the vital elements of w. We use now the principle: a
vital symbol can come only from a vital symbol and only a mortal symbol
can eventually be erased. Since h(w) = w, each ai must occur exactly once
in h(ai ). It follows that, for each ai , we must have h(ai ) = u i ai vi where each
u i , vi must consist entirely of mortal symbols. Thus for each i there is a least
nonnegative integer N (i) for which h N (i) (u i vi ) = ». Let N be the largest of
the N (i). Note that, for each ai , h N (ai ) = h N (i) (ai ) is in H . Then w = h(w) =
h N (w) = h N (a1 )h N (a2 )h N (a3 ) . . . h N (an ) is in H — .
From (1) and (2) we have L = H — . Note that H = {h N (i) (a) : a ∈ X } is a
key code since the set X is a set of keys for H .

Theorem 2.4 Let A be a ¬nite alphabet and L be a language contained in
A— . The following three conditions on L are equivalent:

(1) L is the ¬xed language of a homomorphism of A— into A— ;
(2) L is a submonoid of A— that is generated by a key code; and
(3) L is a retract of A— .

Proof (1 ’ 2): Let h : A— ’ A— be a homomorphism. Proposition 2.1 pro-
vides us with the key code H for which L = H — .
(2 ’ 3): Let L be a submonoid of A— that is generated by a keycode X and
let K be a set of keys for X . For each k in K there are strings xk and yk for
which xk kyk is the key word in X having k as its key. De¬ne a homomorphism
r : A— ’ A— by r (k) = xk kyk for each k in K and r (a) = » for each a not in
K . Note that r is a retraction of A— having X — = L as its image, hence L is a
retract.
(3 ’ 1): Let L be a retract of A— . Then there is a retraction r : A—
’ A— that has L as its image. Then L is the ¬xed language of the the
homomorphism r .
2.2 Retracts (Optional) 33


Theorem 2.4 has several valuable corollaries the proofs of which will be
relegated to the exercises.

Corollary 2.1 A retract of a free monoid is free.

Note that this property of retracts does not hold for arbitrary submonoids of
a free monoid since, for any nonempty alphabet A, the submonoid consisting
of all words of length >2 is not free.

Corollary 2.2 If A is an alphabet having exactly n symbols, then no inclusion
chain of distinct retracts of A— has more than n + 1 retracts even when the
retract {»} is included.

Corollary 2.3 If X is a key code and x n lies in X — , then so does x.

Corollary 2.4 If X is a key code and both uv and vu lie in X — , then so do u
and v.

Let A = {a1 , a2 , a3 , . . . , an }. A simple example of a longest possible inclu-
sion chain of retracts in A— is

{a1 , a2 , a3 , . . . , an }— , {a2 , a3 , . . . , an }— , {a3 , . . . , an }— , . . . , {an }— , {»}.

Each of these retracts, except the ¬rst, is maximal among the retracts contained
in its predecessor. In each case the number of generators of the subretract is
one less than the number of generators of its predecessor. However, maximal
proper subretracts of a retract can have many fewer generators:

Proposition 2.2 Let n be a positive integer and A = {a1 , a2 , a3 , . . . , an } an
alphabet of n symbols. Let m be any positive integer less than n. Then A—
contains a maximal proper retract generated by exactly m words.

Proof The set of m words

K = a1 , a2 , a3 , . . . , am’1 , am am+1 am+2 am+3 . . .
22 2 2

an’1 an am am+1 am+2 am+3 . . . an’1 an
2



is a key code for which K — is a maximal proper retract of A— .
The veri¬cation of the maximality is left as an exercise.

The retracts of a free monoid and the the partially ordered set they form
under set inclusion have been studied previously in [16],[10], and [9].
Languages and codes
34


Exercises
(1) Which of the following are sets of key codes?
(a) {a, ab, ac, d}
(b) {ab, ac, ad, ae}
(c) {aabaa, aacaa, ddeda, dada f }
(d) {abba, acca, adda, aeea}.
(2) De¬ne the retraction maps with the following retracts on A— where A =
{a, b, c, d, x, y, z}
(a) {aabaa, acaax, daax y}
(b) {ax, bx, cx, d x}
(c) {abcd}.
(3) Prove that the restriction of a retraction f : X ’ X to the image of f , is
the identity mapping of the image of f onto itself.
(4) Prove Corollary 2.1 that a retract of a free monoid is free.
(5) Prove Corollary 2.2 “ if A is an alphabet having exactly n symbols, then no
inclusion chain of distinct retracts of A— has more than n + 1 retracts even
when the retract {»} is included.
(6) Prove Corollary 2.3 “ if X is a key code and x n lies in X — , then so does x.
(7) Prove Corollary 2.4 “ if X is a key code and both uv and vu lie in X — , then
so do u and v.


2.3 Semiretracts and lattices (Optional)
The intersection of two retracts of the free monoid on a ¬nite set A need not be
a retract if A contains four or more symbols. Possibly the simplest example is
the following one adapted from [7]: Let A = {a, b, c, d}. The sets {ab, ac, d}
and {ba, c, da} are key codes and consequently the submonoids R and R that
they generated are retracts of A— . However, their intersection
R © R = (d(ab)— ac)—
is not only not a retract; it is not even ¬nitely generated. The set d(ab)— ac is a
uniquely decipherable code since simply noting the locations of the occurrences
of the symbol d in any string that is a concatenation of these words provides
the unique segmentation of the string into generators. Thus R © R is a free
submonoid, although not a retract of A— . In fact, the intersection of any family,
whether ¬nite or in¬nite, of free submonoids of a free monoid is free [37].
Consequently the family of free submonoids of a free monoid is always a
complete lattice. By broadening our attention slightly we obtain a similarly
attractive stability result for what we call semiretracts of free monoids:
2.3 Semiretracts and lattices (Optional) 35


De¬nition 2.14 By a semiretract of A— , we mean an intersection of a ¬nite
number of retracts of A— .

Each retract of A— is also a semiretract. The clearest example of a semiretract
that is not a retract is the example given previously: R © R = (d(ab)— ac)— .
Some pairs of retracts have as their intersection a retract:

{abc, d}— © {a, bcd}— = {abcd}— .

As stated above, but not to be demonstrated here, if fewer than four alpha-
bet symbols appear in the keycodes that generate a class of retracts, then the
intersection of this collection must also be a retract.
Since every retract of A— is a regular language, every semiretract is also
a regular language. Thus this section continues to yield examples of regular
languages.
The de¬nition of a semiretract provides one closure property, “the inter-
section of a ¬nite number of semiretracts is semiretract.” A stronger result is
true, but not obvious, “the intersection of any ¬nite or in¬nite collection of
semiretracts of a free monoid A— on a ¬nite alphabet A is again a semiretract.”
This is an immediate consequence of a co-compactness property of the family
of semiretracts of A— which is included in the appendices. Every collection of
retracts of A— has a ¬nite sub-collection that has the same intersection as the
original collection. The intersection of the ¬nite sub-collection is a semiretract
of A— by the de¬nition of semiretract.
The elementary set theoretic union of two semiretracts need not be a semire-
tract nor even a submonoid: a — and b— are retracts, but a — ∪ b— is not a sub-
monoid. However, given any collection C of semiretracts, whether ¬nite or
in¬nite, let M be the intersection of the class of all semiretracts of A— , each
of which contains every semiretract in C. There is at least one semiretract that
contains them all, namely A— itself. The resulting intersection M is a semiretract
of A— as explained in the previous paragraph. For each such C we denote M by
∨C. We summarize the discussions of this section as:

Theorem 2.5 Let A be a ¬nite alphabet. The set of all semiretracts of A— is a
complete lattice with binary operations © and ∨ having A— as maximal element
and {»} as minimal element.

The semiretracts of a free monoid and the lattice they form have been studied
previously in [1], [2], and [3].
Languages and codes
36


Exercises
(1) Find the code of the semiretract which is the intersection of retracts with
key codes {ab, cb, cd} and {a, bc, d}.
(2) Find the code of the semiretract which is the intersection of retracts with
key codes {ab, st, sd, e f, eg} and {a, bs, ts, de, f e, g}.
(3) Find the code of the semiretract which is the intersection of retracts with
key codes {ba, st, sd, e f, eg} and {as, bs, ts, de, f e, g}.
(4) Find the key codes of two retracts whose intersection has the basis
ab(de f )— dgh.
(5) Find the key codes of two retracts whose intersection has the basis
ab(de)— d f g(hk)— hm.
(6) Prove that a key code of a retract is a pre¬x code.
(7) Prove that a key code of a retract is an in¬x code.
(8) Find a semiretract that is the intersection of three retracts, but not two
retracts.
3
Automata




3.1 Deterministic and nondeterministic automata
An automaton is a device which recognizes or accepts certain elements of — ,
where is a ¬nite alphabet. Since the elements accepted by the automaton are
a subset of — , they form a language. Therefore each automaton will recognize
or accept a language contained in — . The language of — consisting of the
words accepted by an automaton M is the language over — accepted by M
and denoted M(L). We will be interested in the types of language an automaton
accepts.

De¬nition 3.1 A deterministic automaton, denoted by ( , Q, s0 , ’, F), con-
sists of a ¬nite alphabet , a ¬nite set Q of states, and a function ’ : Q — ’
Q, called the transition function and a set F of acceptance states. The set Q
contains an element s0 and a subset F, the set of acceptance states.

The input of ’ is a letter of and a state belonging to Q. The output is a
state of Q (possibly the same one). If the automaton is in state s and “reads”
the letter a, then (s, a) is the input for ’ and ’(s, a) is the next state. Given a
string in — the automaton “reads” the string or word as follows. Beginning at
the initial state s0 , and beginning with the ¬rst letter in the string (if the string is
nonempty), it reads the ¬rst letter of the string. If the ¬rst letter is the letter a of
, then it “moves” to state s = ’(s0 , a). The automaton next “reads” the second
letter of the string, say b, and then moves to state s = ’(s, b). Therefore, as the
automaton continues to “read” a string of letters from the alphabet it “moves”
from one state to another. Eventually the automaton “reads” every letter in the
string and then stops. If the state the automaton is in after reading the last letter
belongs to the set of acceptance states, then the automaton accepts the string.
Let M be the automaton with alphabet = {a, b}, set of states Q = {s0 , s1 , s2 },

37
Automata
38


and ’ de¬ned by the table

’ s0 s1 s2
a s1 s2 s2
b s0 s0 s1

Suppose M “reads” the string aba. Since the automaton begins in state s0 , and
the letter read is a, and ’(s0 , a) = s1 , the automaton is now in state s1 . The
next letter read is b and ’(s1 , b) = s0 . Finally the last letter a is read and, since
’(s0 , a) = s1 , the automaton remains in state s1 . We may also state ’ as a set
of rules as follows:
If in state s0 and a is read go to state s1 .
If in state s1 and a is read go to state s2 .
If in state s2 and a is read go to state s2 .
If in state s0 and b is read go to state s0 .
If in state s1 and b is read go to state s0 .
If in state s2 and b is read go to state s1 .
Let s0 and s2 be the acceptance states.
This deterministic automaton is best shown pictorially by a state diagram
which is a directed graph where the states are represented by the vertices and
each edge from s to s is labeled with a letter, say a, of the alphabet if
’(s, a) = s . A directed arrow from s to s labeled with the letter a will be
called an a-arrow from s to s . If s is a starting state, then its vertex is denoted
by the diagram

s


If s is an acceptance state, its vertex is denoted by the diagram


s


Therefore the deterministic automaton above may be represented pictorially
as seen in Fig. 3.1. More speci¬cally, an automaton “reads” a word or string
a0 a1 a2 . . . an of — by ¬rst reading a0 , then reading a1 and continuing until it
has read an . If an automaton is in state s1 and reads the word w and is then
in s2 , then w is a path from s1 to s2 . A deterministic automaton accepts or
recognizes a0 a1 a2 . . . an if after beginning with a0 in state s0 and continuing
until reading an , the automaton stops in an acceptance state. Thus the automaton
3.1 Deterministic and nondeterministic automata 39


a
b

a
a
s2
s0 s1
b
b
Figure 3.1

above would not accept aba since s1 is not an acceptance state. It would however
accept bbaaa and bab, since s0 and s2 are acceptance states.
The automaton with the state diagram

a b
b

s4
a
a s1 a
s0
ba
s3 b
b
s2


has initial state s0 and acceptance state s3 . It accepts the word aba since after
reading a, it is in state s1 . After reading b, it is still in state s1 . After reading
the second a, it is in state s3 , which is an acceptance state. One can see that
it also accepts abbba and bb, so they are in the language accepted by the
given automaton. However bbb, abab, and abb are not. Notice that any string
beginning with two as or two bs is accepted only if the string is not extended.
Also, if three as occur in the string, the string is not accepted. The state s4 is an
example of a sink state. Once the automaton is in the sink state, it can never
leave this state again, regardless of the letter read.
Since ’ is a function, a deterministic automaton can always read the entire
string. We shall later de¬ne a nondeterministic automaton which may not always
be able to read the entire string. In such a case the word cannot be accepted.
Example 3.1 Consider the automaton with state diagram

a
b
b b
s0 s1 s3
s2

b
a a
a

having = {a, b}, starting state s0 , and acceptance states s0 , s1 , and s2 . It
obviously accepts the word bb. In each state, there is a loop for a so that
Automata
40


if a is read then the state does not change. This enables us to read as many
as as desired without changing states, before reading another b. Thus the
automaton reads aababaaa, baabab, baaab, babaaa, aabaabaa, and in fact
we can read any word in the language described by the regular expression
(a— ba— ba— ) ∨ (a— ba— ) ∨ a— . This language can also be described as the set of all
words containing at most two bs. Notice that s3 is a sink state.

Example 3.2 Consider the automaton with state diagram

b
a s3
s1
s0
b ca
a
b
s2 c
c


which we simplify as

b,c
a s3
s1
s0
a
s2
b,c a,b,c


a,b,c

to decrease the number of arrows. This automaton obviously accepts only
the words ab and ac. This language may be described by the regular expres-
sion a(b ∨ c). Notice that the sink state s2 eliminates all other words from the
language.

Example 3.3 Consider the automaton with state diagram

b

a b c
s1 s3
s0 s4
a,c
a,b
c s2 a,b,c


a,b,c

The only words accepted are b and abc. Therefore the expression for the lan-
guage accepted is b ∨ abc.
3.1 Deterministic and nondeterministic automata 41


Example 3.4 Consider the automaton with state diagram

a a,b

a
b b
s0 s1 s2 s3
b
a


In this automaton, if three consecutive bs are read, then the automaton is in state
s3 , which is a sink state and is not an acceptance state. This is the only way to
get to s3 and every other state is an acceptance state. Thus the language accepted
by this automaton consists of all words which do not have three consecutive bs.
An expression for this language is

(a ∨ (ba) ∨ (bba))— (» ∨ b ∨ (bb)).

As previously mentioned, the automata that we have been discussing are
called deterministic automata since in every state and for every value of the
alphabet that is read, there is one and only one state in which the automata can
be. In other words, ’ : Q — ’ Q is a function. It is often convenient to relax
the rules so that ’ is no longer a function, but a relation. If we again consider
’ as a set of rules, given a ∈ and s ∈ Q, the rules may allow advancement
to each of several states or there may not be a rule which does not allow it to
go to any state after reading a in state s. In the latter case, the automaton is
“hung up” and can proceed no further. This cannot occur with a deterministic
automaton.
Although the de¬nition of a nondeterministic automaton varies, we shall use
the following de¬nition:

De¬nition 3.2 A nondeterministic automaton, denoted by

( , Q, s0 , ’, F)

consists of a ¬nite alphabet , a ¬nite set Q of states, and a function

’ :Q— ’ P(Q)

called the transition function. The set Q contains an element s0 and a subset
F containing one or more acceptance states. (Note that P(Q) is the power set
of Q.)

Thus given a ∈ and s ∈ Q, there may be a-arrows from s to several dif-
ferent states or to no state at all. By de¬nition, a deterministic automaton is also
Automata
42


considered to be a nondeterministic automaton. A nondeterministic automaton
often simpli¬es the state diagram and eliminates the need for a sink state. In
Example 3.2, the state diagram can be simpli¬ed to

a b,c
s1
s0 s2



Note that in reading aa, after reading the ¬rst a, the automaton is in state s3 ,
and when the second a is read the automaton “hangs up”, since there is no a
arrow out of state s3 .

Example 3.5 The deterministic automaton represented by

a b
b

s4
a
a s1 a
s0
ba
s3 b
b
s2


can be simpli¬ed using a nondeterministic automaton by simply eliminating
state s4 and all arrows into or out of this state.

Example 3.6 It is easily seen that the automaton with state diagram

b
a c
s0 s1 s2



accepts the language with regular expression ab— c.

Example 3.7 The automaton with state diagram


s1
a,b
s0


accepts the language with regular expression a ∨ b.
3.1 Deterministic and nondeterministic automata 43


Example 3.8 The automaton with state diagram


a b

a b
s0 s1 s2



accepts the language with regular expression aa— bb— .

Example 3.9 The automaton with state diagram

a,b
a,b
a,b a
s1 s2
a
s0


accepts the language consisting of strings with at least two as and so may be
written as (a ∨ b)— a(a ∨ b)— a(a ∨ b)— .

Obviously any language accepted by a deterministic automaton is accepted
by a nondeterministic automaton since the set of deterministic automata is a
subset of the set of nondeterministic automata. In the following theorem, how-
ever, we shall see that any language accepted by a nondeterministic automaton
is also accepted by a deterministic automaton.

Theorem 3.1 For each nondeterministic automaton, there is an equivalent
deterministic automaton that accepts the same language.

We demonstrate how to construct a deterministic automaton which accepts
the language accepted by a nondeterministic automaton. We shall later give a
formal proof that a language is accepted by a deterministic automaton if and
only if it is accepted by a nondeterministic automaton. If Q is the set of states
for the nondeterministic automaton, we shall use elements of P(Q), i.e. the
set of subsets of Q, as states for the deterministic automaton which we are
constructing. Some of these states may not be used since they do not occur
on any path which leads to acceptance state. Hence they could be removed
and greatly simplify the deterministic automaton created. However, for our
purpose, we are only interested in showing that a deterministic automaton can be
created.
Automata
44


In general we have the following procedure for constructing a deterministic
automaton

M = ( , Q , {s0 }, ’ , F )

from a nondeterministic automaton.

N = ( , Q, s0 , ’, F).

(1) Begin with the state {s0 } where s0 is the start state of the nondeterministic
automaton.
(2) For each ai ∈ , construct an ai arrow from {s0 } to the set consisting of all
states such that there is an ai -arrow from s0 to that state.
(3) For each newly constructed set of states s j and for each ai ∈ construct
an ai arrow from s j to the set consisting of all states such that there is an ai
arrow from an element of s j to that state.
(4) Continue this process until no new states are created.
(5) Make each set of states s j , that contains an element of the acceptance set
of the nondeterministic automaton, into an acceptance state.

Example 3.10 Consider the nondeterministic automaton N

a b

a b
s0 s1 s2



Construct an a-arrow from {s0 } to the set of all states so that there is an a-arrow
from s0 to that state. Since there is an a-arrow from s0 to s0 and an a-arrow from
s0 to s1 , we construct an a-arrow from {s0 } to {s0 , s1 }. There is no b-arrow from
s0 to any state. Hence the set of all states such that there is a b-arrow to one of
these states is empty and we construct a b-arrow from {s0 } to the empty set ….
We now consider the state {s0 , s1 }. We construct an a-arrow from {s0 , s1 } to the
set of all states such that there is an a-arrow from either s0 or s1 to that state.
Thus we construct an a-arrow from {s0 , s1 } to itself. We construct a b-arrow
from {s0 , s1 } to the set of all states such that there is a b-arrow from either s0
or s1 to that state. Thus construct a b-arrow from {s0 , s1 } to {s2 }. Since there
are no a-arrows or b-arrows from any state in the empty set to any other state,
we construct an a-arrow and a b-arrow from the empty set to itself. Consider
{s2 }. Since there is no a-arrow from s2 to any other state, we construct an a-
arrow from {s2 } to the empty set. Since the only b-arrow from s2 is to itself, we
construct a b-arrow from {s2 } to itself. The acceptance states consist of all sets
3.1 Deterministic and nondeterministic automata 45


which contain an element of the terminal set of N . In this case {s2 } is the only
acceptance state. We have now completed the state diagram

a
{s0 ,s1}
b
a
b
{s0} {s2}

a
b
˜
a b


which is easily seen to be the state diagram of a deterministic automaton. This
automaton also reads the same language as N , namely the language described
by the expression aa— bb— .
Example 3.11 Given the nondeterministic automaton

b b
s2
s0 s1
a
a
a
s3 a


using the same method as above we complete the deterministic automaton

a
{s0} {s1}
b
b
{s0 ,s2}
˜
a b a
b
a,b
{s3}

a {s1 ,s3}




At this point we introduce a new notation. The ordered pair (si , w) indicates
that the automaton is in state si and still has input w left to read. For example,
Automata
46


(s2 , abbb) indicates that the automaton is in state s2 and must still read abbb.
Assume that we have (si , aw) w ∈ + . Thus the automaton is in state si and
must still read a followed by w. The notation (si , aw) (s j , w) means that the
automaton has read a and moved from state si to state s j . Therefore ’(si , a) =
s j . In the automaton

a a,b

a
b b
s0 s1 s2 s3
b
a

we have (s2 , bab) (s3 , ab). We also have

(s0 , babba) (s1 , abba) (s0 , bba) (s1 , ba) (s2 , a) (s0 , »).

If we have (si , wi ) (s j , w j ) · · · (sm , wm ), we denote this by (si , wi ) —
(sm , wm ). We also let (s, w) — (s, w). Thus a word w is accepted by an automa-
ton if and only if (s0 , w) — (s, ») where s is an acceptance state. In our example
(s0 , bababb) — (s0 , »), so bababb is accepted by the automaton.
We shall now prove that a language is accepted by a deterministic automa-
ton if and only if it is accepted by a nondeterministic automaton. We begin
with two lemmas. The ¬rst is obvious since every deterministic automaton is a
nondeterministic automaton.
Lemma 3.1 Every language accepted by a deterministic automaton is
accepted by a nondeterministic automaton.
Lemma 3.2 Let N = ( , Q, s0 , ’, F) be a nondeterministic automaton and
M = ( , Q , {s0 }, ’ , F ) be the deterministic automaton derived from N using
the above process. Then (s0 , w) — (s, ») in N if and only if there exists X such
that ({s0 }, w) — (X, ») in M where s ∈ X .

Proof We ¬rst show that if (s0 , w) — (s, ») in N , then ({s0 }, w) — (X, »)
where s ∈ X. The proof uses induction on the length n of w. If n = 0, we
have (s0 , ») — (s0 , ») in N , ({s0 }, ») — ({s0 }, ») in M, and s0 ∈ {s0 }, so the
statement is true if n = 0. Assume w = va ∈ + has length k + 1, so v has
length n. Since (s0 , va) — (s, »), then (s0 , va) — (t, a) (s, ») for some t ∈ Q
and (s0 , v) — (t, »). Therefore by induction, there exist Y so that t ∈ Y and
({s0 }, v) — (Y, »). Since t ∈ Y and (t, a) (s, ») in N , (Y, a) (X, ») for some
X where s ∈ X . Therefore ({s0 }, va) — (Y, a) (X, ») or ({s0 }, va) — (X, »)
where s ∈ X .
3.1 Deterministic and nondeterministic automata 47


Conversely, we show that if ({s0 }, w) — (X, ») in M, then (s0 , w) — (s, »)
in N where s ∈ X . We again use induction on n, the length of the word w.
Assume there exists X such that ({s0 }, w) — (X, ») in M where s ∈ X . If
n = 0, we have ({s0 }, ») — ({s0 }, ») in M, (s0 , ») — (s0 , ») in N , and s0 ∈ {s0 },
so the statement is true if n = 0. Given ({s0 }, va) with length k + 1, so v has
length n. Assume ({s0 }, va) — (Y, a) (X, »). Therefore ({s0 }, v) — (Y, »).
By induction, (s0 , v) — (t, ») for all t in Y and hence (s0 , va) — (t, a) for all t in
Y . By de¬nition, since (Y, a) (X, ») and (t, a) (s, »),then (s0 , w) — (s, »)
in N for s ∈ X .
We are now able to prove the desired Theorem 3.1.
Theorem 3.2 A language is accepted by a deterministic automaton if and
only if it is accepted by a nondeterministic automaton.
Proof To show this we need only show that a word is accepted by a non-
deterministic automaton if and only if it is accepted by the corresponding
deterministic automaton. If (s0 , w) — (s, ») where s is an acceptance state
in the nondeterministic automaton, then ({s0 }, w) — (X, ») where X contains
an acceptance state. Hence X is an acceptance state. Assume X is an accep-
tance state, then it contains an acceptance state r from the nondeterministic
automaton. But by the previous lemma, if ({s0 }, w) — (X, ») and r ∈ X then
(s0 , w) — (r, »). Therefore r is an acceptance state.
At this point we shall de¬ne an extended nondeterministic automaton and
prove that a language is accepted by an extended nondeterministic automaton
if and only if it is accepted by a nondeterministic automaton (and hence a
deterministic automaton).
Using a nondeterministic automaton, we can extend the automaton so that
( + , Q, s0 , ’, F) consists of + , a ¬nite set Q of states, and a function
’ : + — Q ’ P(Q), called the transition function. Thus ’ reads words
instead of letters. This can be changed back to reading letters by adding new
nonterminal states. If ’ reads the word w = a1 a2 · · · ak, and moves from state
s to state s , add states σ2 σ3 · · · σk , and let ’(s, a1 ) = σ2 , ’(σ2 , a2 ) = σ3 ,
’(σ3 , a3 ) = σ4 , . . . , ’(σk’1 , ak’1 ) = σk , and ’(σk , ak ) = s . This forms a
nondeterministic automata, but we can form a deterministic automata with
the same language as shown above.
If we allow the automaton to pass from one state si to another state s j without
reading a letter of the alphabet, this may be shown as the automaton having an
edge from si to s j with label ». Thus paths may contain one or more » s. Such
an automaton is said to have »-moves. We can then have an automaton with the
form ( — , Q, s0 , ’, F).
Automata
48


Formally a ¬nite automaton M = ( , Q, s0 , ’, F) with »-moves has the
property that ’ maps Q ∪ {»} to Q. We wish to create a deterministic automata
M = ( , Q , s0 , ’ , F ) containing no »-moves with the same language. Thus
M(L) = M (L). Given a letter q in , de¬ne E(q) to be all the states that
are reachable from q without reading a letter in the alphabet. Thus E(q) =
{ p : (q, w) ( p, w). In our construction, the set of states of M is a subset of
P(Q). The state s0 = E(s0 ), and F is a set containing an element of F. For
each element a of , de¬ne ’ by ’ (P, a) = p∈P E(’( p, a)).
We ¬rst show that M is deterministic. It is certainly single valued. Further
’ (P, a) will always have a value even if it is the empty set.
We must now show that M(L) = M (L). To do this we show that for any
states p and q in Q, and any word w in —
— —
( p, w) (q, ») in M if and only if (E( p), w) (P, ») in M

for some P containing q. From this it will follow that
— —
(s0 , w) ( f, ») in M if and only if (E(s0 ), w) (P, ») in M

for some P containing f , where f ∈ F.
We prove this using induction of the length of w. If |w| = 0, then w = »,
and it must be shown that
— —
( p, ») (q, ») in M if and only if (E( p), ») (P, ») in M

for some P containing q. Now ( p, ») — (q, ») if and only if q ∈ E( p);
but since M is deterministic and no letter is read, then P = E( p) and
p ∈ E( p). Therefore the statement is true if |w| = 0.
Assume the statement is true for all strings having nonnegative length k. We
now have to prove the statement is true for any string w with length k + 1.
’: Assume w = va for some letter a and w and ( p, w) — (q, ») so that
— —
( p, va) (q1 , a) (q2 , ») (q, »)

where at the end, possibly no letters of the alphabet are read. Since ( p, va) —
(q1 , a) then ( p, v) — (q1 , ») and, by induction, (E( p), v) — (R, ») for some R
containing q1 . But since (q1 , a) (q2 , »), by construction, E(q2 ) ⊆ ’ (R, a),
and since (q2 , ») — (q, »), q ∈ E(q2 ) by de¬nition of E, and hence q ∈
’ (R, a). Therefore (R, a) ((P, ») for some P containing q by de¬nition
of ’ and (E( p), va) — (R, a) ((P, ») for some P containing q.
In M , assume (E( p, va)) — (R, a) (P, ») where q ∈ P and ’ (R, a) =
P. By de¬nition ’ (R, a) = r ∈R E(’(r, a)). There exists some state r ∈ R
such that ’(r, a) = s and q ∈ E(s). Therefore (s, ») — (q, ») by de¬nition
3.1 Deterministic and nondeterministic automata 49


— —
of E(s). By the induction hypotheses ( p, v) (r, »). Therefore ( p, va)
(r, a) (s, ») — (q, »).

Given the automaton (M = ( , Q, s0 , ’, F)
Example 3.12

c
b
a
l
l
s0 s2
s1
c
b
l
a

s3


which has »-moves, we construct M = ( , Q , s0 , ’ , F ) containing no
»-moves: E(s0 ) = {s0 , s1 , s2 }, E(s1 ) = {s1 , s2 }, E(s2 ) = {s2 }, and E(s3 ) =
{s0 , s1 , s2 , s3 }. Denote these sets by s0 , s1 , s2 , and s3 respectively. Then ’
is given by the following table

a b c
s0 s3 s1 s2

s1 s1 s2
… …
s2 s2
s3 s3 s1 s2


giving the »- free automaton

b

s'1 c
b

c c
s'0 s'2
a
b

a a,b
c
˜
s'3

a,b,c
a

Both automata generate the language a— b— c— .
Automata
50


Exercises
(1) Which of the following words are accepted by the automaton?
a
b

a
a
s2
s0 s1
b
b
(a) abba.
(b) aabbb.
(c) babab.
(d) aaabbb.
(e) bbaab.
(2) Which of the following words are accepted by the automaton?
a


b
b
b
s2
s0 s1
a

a

(a) aaabb.
(b) abbbabbb.
(c) bababa.
(d) aaabab.
(e) bbbabab.
(3) Write an expression for the language accepted by the automaton
a,b
a
b
b
s2
s0 s1
a

(4) Write an expression for the language accepted by the automaton
a
a
b
b s3
s0 s2
a
b
s1
3.1 Deterministic and nondeterministic automata 51


(5) Write an expression for the language accepted by the automaton

a
a
b
b b
s2
s1
s0
a


(6) Write an expression for the language accepted by the automaton

b
a a
a
a,b
s1
s0 s2 s3
a
b
b

(7) Find a deterministic automaton which accepts the language expressed by
aa— bb— cc— .
(8) Find a deterministic automaton which accepts the language expressed by
(a— ba— ba— b)— .
(9) Find a deterministic automaton which accepts the language expressed by
(a— (ba)— bb— a)— .
(10) Find a deterministic automaton which accepts the language expressed by
(a— b) ∨ (b— a)— .
(11) Find a nondeterministic automaton which accepts the language expressed
by aa— bb— cc— .
(12) Find a nondeterministic automaton which accepts the language expressed
by (a— b) ∨ (c— b) ∨ (ac)— .
(13) Find a nondeterministic automaton which accepts the language expressed
by (a ∨ b)— (aa ∨ bb)(a ∨ b)— .
(14) Find a nondeterministic automaton which accepts the language expressed
by ((aa— b) ∨ bb— a)ac— .
(15) Find a deterministic automaton which accepts the same language as the
nondeterministic automaton

a,b
a
b
a s1
s0 s2
b


(16) Find a deterministic automaton which accepts the same language as the
nondeterministic automaton
Automata
52


a
b a,b
s0 s2
a
b
a
s1


a,b

(17) Find a deterministic automaton which accepts the same language as the
nondeterministic automaton

b a,b
a
s0 s2
s1 a
b


(18) Find a deterministic automaton which accepts the same language as the
nondeterministic automaton

b

a,b
b
s1
a
s3
s0 a
a a
s2




3.2 Kleene™s Theorem
In this section we show Kleene™s Theorem which may be stated as follows:
Theorem 3.3 A language is regular if and only if it is accepted by an auto-
maton.
We begin by showing that the rules de¬ning a regular language can be
duplicated by an automaton. First it is shown that there are automata which
accept subsets of the ¬nite set . Then it is shown that if there are automata
that accept languages L 1 and L 2 , we can construct automata that accept L 1 L 2 ,
L — , and L 1 ∪ L 2 . Thus we show that for every regular language over a ¬nite
1
set , there is a nondeterministic automaton that accepts that language.
3.2 Kleene™s Theorem 53


The set is »; it is accepted by the automaton with state diagram.

s


It may be preferable to create a state s1 , which is not an acceptance state, and
have all arrows from both s0 and s1 go to s1 .
We next show that for every ¬nite subset of , there is an automaton that reads
that subset. If the automaton has no acceptance state, then the language accepted
is the empty set. The elements of the set {a1 , a2 , a3 , . . . , an } are accepted by
the automaton with state diagram

s1


s2
a1
s0 a2
a3 s3
¦




an

sn


In particular if a ∈ , it is accepted by the automaton with state diagram

a
s0 s1


We next show that if regular languages with expression L 1 and L 2 are both
accepted by automata, then their concatenation L 1 L 2 is also accepted by an
automaton. Assume L 1 is accepted by the automaton M1 = ( , Q, s0 , ’, F),
and L 2 is accepted by the automaton M2 = ( , Q , s0 , ’ , F ). We shall con-
sider both automata to be deterministic without loss of generality. We now
de¬ne a new automaton M = ( , Q , s0 , ’ , F ) which is essentially the ¬rst
automaton followed by the second automaton. Put simply, place the state dia-
gram for M2 after the state diagram for M1 . If, for a ∈ , there is an a-arrow
from any state s in the state diagram for M1 to an acceptance state in the state
diagram for M1 , then change the acceptance state into a nonacceptance state
and also place an a-arrow from s to the starting state in the state diagram for
M2 . This is the state diagram for M. Thus the set of states Q = Q ∪ Q , so that
Q consists of all the states in M1 and M2 . We shall assume that M1 and M2
Automata
54


have no common states. If they do, we can always relabel them. Since we want
to begin in M1 , we let s0 be the starting state of M so that s0 = s0 . Since we
want to ¬nish in M2 , we let the set of acceptance states be F so that F = F .
We de¬ne the rules for ’ as follows.
If the rule
If in state si and a is read, go to state s j
is in ’ and s j is not an acceptance state then include this rule in ’ . If s j is an
acceptance state then include this rule in ’ and also include the rule
If in state si and a is read, go to state s0 .
Hence there is the option of going to the state s j or skipping over to s0 in the
second automaton. Again recall that s j now ceases to be an acceptance state.
If the rule is in ’ then it is included in ’ . As a result, if the automaton M
has read a word in L 1 , it may then skip over and read a word in L 2 . As a special
case, consider the possibility of » being a word in L 1 . Include the rule
If state s0 is an acceptance state, go to state s0 .
When these rules are followed, a word in L 1 L 2 ends up in an acceptance state of
M2 and hence ’ so that it is accepted by M. Therefore every string consisting
of a word in L 1 followed by a word in L 2 is accepted by M, and L 1 L 2 is
accepted by M.
Example 3.13 Let L 1 be the language described by the language (ab)— c and
having automaton M1 with state diagram

a
s2
s0 s1
b

c

Let L 2 be the language described by the language ab— c— and having automaton
M2 with state diagram

b c


a c
s'0 s'1 s'2


To ¬nd the state diagram for the language L 1 L 2 , place the state diagram for M2
after the state diagram for M1 . Since there is a c-arrow from s0 to s2 , and s2 is
3.2 Kleene™s Theorem 55


an acceptance state, add a c-arrow from s0 to s0 . The state diagram
b c

a
a c
s'1
s0 s'0
s1 s'2
s2
b
c
c

is the state diagram for M, the automaton for L 1 L 2 .
Example 3.14 Let L 1 and L 2 and their respective automata be the same as
those in the previous example. To ¬nd the automaton for the language for L 2 L 1
is slightly more complicated. First we place the state diagram for M1 after the
state diagram for M2 . There is an a-arrow from s0 to s1 and s1 is an acceptance
state, so place an a-arrow from s0 to s0 . There is a b-arrow from s1 to s1 and s1
is an acceptance state, so place a b-arrow from s1 to s0 . There is a c-arrow from
s1 to s2 and s2 is an acceptance state, so place a c-arrow from s1 to s0 . There is
a c-arrow from s2 to s2 and s2 is an acceptance state, so place a c-arrow from s2
to s0 . Then change s1 , s2 so that they are not acceptance states. Thus we have
the state diagram
c
b
a
c
c
a
s0 s1
s'0 s2
s'2
s'1 b
b,c c
a

which is the state diagram for M, the automaton for L 2 L 1 .
Similarly we show that if L is a language accepted by an automaton M1 =
( , Q, s0 , ’, F) then L — is also accepted by an automaton. We now de¬ne a
new automaton M = ( , Q , s0 , ’ , F ) which is essentially the same as M1
except M is looped to itself. Let M be de¬ned as follows: Create a new state
s0 , and make it an acceptance state. We include state s0 so M will accept the
empty word. For each rule
If in state s0 and a is read, go to state s j
for a ∈ , add the rule
If in state s0 and a is read, go to state s j .
Thus if there is an a-arrow from s0 to s j , there is an a-arrow from s0 to s j .
Include all of the current rules for M1 in M. In addition, for a ∈ , if there is
Automata
56


an a-arrow from any state s in the state diagram for M1 to an acceptance state
in the state diagram for M1 , then also place an a-arrow from s to s0 in the state
diagram for M. This is the state diagram for M. The set of states Q = Q ∪ {s0 }.
The set of acceptance states for M is F ∪ {s0 }.
We thus de¬ne the rules for ’ as follows:
If in state s0 and a is read, go to state s j
for a ∈ , add the rule
If in state s0 and a is read, go to state s j .
If the rule is in ’, then include this rule in ’ . If s j is an acceptance state
and
If in state si and a is read, go to state s j
then also include the rule
If in state si and a is read, go to state s0 .
Hence there is the option of going to the acceptance state s j or skipping over
to s0 .
Example 3.15 Let the diagram

b
s3
s1
a a
s0 b
b
s4
s2
b
a

be the state diagram for the automaton accepting the language L, then the
diagram

b
b
a
s3
s0 s1
b a
a
b b
b
b
s'0 s2 s4
b
a

is the state diagram for the automaton accepting the language L — .
3.2 Kleene™s Theorem 57


Since the relationship between an automaton and its state diagram should
be evident by now, in the future we will identify an automaton with its state
diagram. Given automata M and M accepting languages L and L respectively,
we now wish to construct an automaton M which accepts L ∪ L . We wish
to read a word simultaneously in M and M , and accept it if it is accepted by
either M or M .
We now show how to construct M which accepts L ∪ L given automata
M = ( , Q, s0 , ’, F) and M = ( , Q , s0 , ’ , F ) If s0 and s0 are the initial
states of M and M respectively then construct a new initial state s0 , which is an
acceptance state if either s0 or s0 is, and let M = ( , Q , s0 , ’ , F ) where
Q = Q ∪ Q ∪ {s0 } and F = F ∪ F . Let ’ = ’ ∪ ’ together with the
following rules: If there is a rule

If in state s0 and a is read, go to state s j

for a ∈ in ’ include the rule

If in state s0 and a is read, go to state s j

in ’ .
If there is a rule

If in state s0 and a is read, go to state s j

<<

. 2
( 9)



>>