Next Page

Prev. page

Next Page Next Chapter

CHAPTER 1 ................................................. 1

Foundations ............................................. 1

TERMINOLOGY ..................................... 1

Sender and Receiver ................................... 1

Messages and Encryption ........................... 1

Original Plaintext b Decryption ................. 1

Authentication, Integrity, and Nonrepudiati ... 2

Algorithms and Keys .................................... 2

Figure 1.2 Encryption and Decryption .......... 3

Figure 1.3 Encryption Decryption Key Key . 4

Symmetric Algorithms .................................. 4

Public-Key Algorithms ................................. 4

Cryptanalysis ............................................... 5

Security Algorithms ...................................... 8

Historical Terms ........................................... 9

STEGANOGRAPHY ............................... 9

SUBSTITUTION CIPHERS AND ............. 10

Substitution Ciphers .................................... 10

Transposition Ciphers .................................. 12

Rotor Machines ........................................... 12

Figure 1.4 Columnar transposition cipher. .... 12

Further Reading ........................................... 13

SIMPLE ................................................... 13

ONE-TIME PADS .................................... 15

COMPUTER ALGORITHMS ................... 17

LARGE NUMBERS ................................. 17

TABLE 1.1 Large Numbers ........................ 18

Page 1

Prev. Chapter Home Next Page

Prev. Page Next Chapter

1

CHAPTER

Foundations

1.1 TERMINOLOGY

Sender and Receiver

Suppose a sender wants to send a message to a receiver. Moreover, this sender

wants to send the message securely: She wants to make sure an eavesdropper can-

not read the message.

Messages and Encryption

A message is plaintext (sometimes called cleartext). The process of disguising a

message in such a way as to hide its substance is encryption. An encrypted message

is ciphertext. The process of turning ciphertext back into plaintext is decryption.

This is all shown in Figure 1.1.

(If you want to follow the IS0 7498-2 standard, use the terms â€śencipherâ€ť and

â€śdecipher.â€ť It seems that some cultures find the terms â€śencryptâ€ť and â€śdecryptâ€ť

offensive, as they refer to dead bodies.)

The art and science of keeping messages secure is cryptography, and it is practiced

by cryptographers. Cryptanalysts are practitioners of cryptanalysis, the art and sci-

ence of breaking ciphertext; that is, seeing through the disguise. The branch of

mathematics encompassing both cryptography and cryptanalysis is cryptology and

its practitioners are cryptologists. Modern cryptologists are generally trained in the-

oretical mathematics-they have to be.

1

Original

Plaintext

b

Decryption

Figure 1.1 Encryption and Decryption.

Page 2

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

Plaintext is denoted by M, for message, or P, for plaintext. It can be a stream of

bits, a text file, a bitmap, a stream of digitized voice, a digital video image . . . what-

ever. As far as a computer is concerned, M is simply binary data. (After this chapter,

this book concerns itself with binary data and computer cryptography.) The plain-

text can be intended for either transmission or storage. In any case, M is the message

to be encrypted.

Ciphertext is denoted by C. It is also binary data: sometimes the same size as M,

sometimes larger. (By combining encryption with compression, C may be smaller

than M. However, encryption does not accomplish this.) The encryption function E,

operates on M to produce C. Or, in mathematical notation:

E(M) = C

In the reverse process, the decryption function D operates on C to produce M:

D(C) = M

Since the whole point of encrypting and then decrypting a message is to recover

the original plaintext, the following identity must hold true:

D(E(M)) = M

Authentication, Integrity, and Nonrepudiation

In addition to providing confidentiality, cryptography is often asked to do other

jobs:

- Authentication. It should be possible for the receiver of a message to

ascertain its origin; an intruder should not be able to masquerade as

someone else.

- Integrity. It should be possible for the receiver of a message to verify

that it has not been modified in transit; an intruder should not be able

to substitute a false message for a legitimate one.

- Nonrepudiation. A sender should not be able to falsely deny later that

he sent a message.

These are vital requirements for social interaction on computers, and are analo-

gous to face-to-face interactions. That someone is who he says he is . . . that some-

oneâ€™s credentials-whether a driverâ€™s license, a medical degree, or a passport-are

valid . . . that a document purporting to come from a person actually came from that

person. . . . These are the things that authentication, integrity, and nonrepudiation

provide.

Algorithms and Keys

A cryptographic algorithm, also called a cipher, is the mathematical function used

for encryption and decryption. (Generally, there are two related functions: one for

encryption and the other for decryption.)

Page 3

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

1.1 Terminology

If the security of an algorithm is based on keeping the way that algorithm works

a secret, it is a restricted algorithm. Restricted algorithms have historical interest,

but are woefully inadequate by todayâ€™s standards. A large or changing group of users

cannot use them, because every time a user leaves the group everyone else must

switch to a different algorithm. If someone accidentally reveals the secret, everyone

must change their algorithm.

Even more damning, restricted algorithms allow no quality control or standard-

ization. Every group of users must have their own unique algorithm. Such a group

canâ€™t use off-the-shelf hardware or software products; an eavesdropper can buy the

same product and learn the algorithm. They have to write their own algorithms and

implementations. If no one in the group is a good cryptographer, then they wonâ€™t

know if they have a secure algorithm.

Despite these major drawbacks, restricted algorithms are enormously popular for

low-security applications. Users either donâ€™t realize or donâ€™t care about the security

problems inherent in their system.

Modern cryptography solves this problem with a key, denoted by K. This key might

be any one of a large number of values. The range of possible values of the key is called

the keyspace. Both the encryption and decryption operations use this key (i.e., they

are dependent on the key and this fact is denoted by the K subscript), so the functions

now become:

E,(M) = C

DK(C) = M

Those functions have the property that (see Figure 1.2):

DA&(M)) = M

Some algorithms use a different encryption key and decryption key (see Figure

1.3). That is, the encryption key, K1, is different from the corresponding decryption

key, Kz. In this case:

E,,(M) = C

D&l =M

b2h1 (Ml = M

All of the security in these algorithms is based in the key (or keys); none is based

in the details of the algorithm. This means that the algorithm can be published and

analyzed. Products using the algorithm can be mass-produced. It doesnâ€™t matter if an

Key

Original

Plaintext

+

Figure 1.2 Encryption and decryption with a key.

Page 4

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

Encryption Decryption

Key Key

Original

Plaintext

w

Figure 1.3 Encryption and decryption with two different keys.

eavesdropper knows your algorithm; if she doesnâ€™t know your particular key, she

canâ€™t read your messages.

A cryptosystem is an algorithm, plus all possible plaintexts, ciphertexts, and keys.

Symmetric Algorithms

There are two general types of key-based algorithms: symmetric and public-key.

Symmetric algorithms, sometimes called conventional algorithms, are algorithms

where the encryption key can be calculated from the decryption key and vice versa.

In most symmetric algorithms, the encryption key and the decryption key are the

same. These algorithms, also called secret-key algorithms, single-key algorithms, or

one-key algorithms, require that the sender and receiver agree on a key before they

can communicate securely. The security of a symmetric algorithm rests in the key;

divulging the key means that anyone could encrypt and decrypt messages. As long

as the communication needs to remain secret, the key must remain secret.

Encryption and decryption with a symmetric algorithm are denoted by:

E,(M) = C

DK(C) =M

Symmetric algorithms can be divided into two categories. Some operate on the

plaintext a single bit (or sometimes byte) at a time; these are called stream algo-

rithms or stream ciphers. Others operate on the plaintext in groups of bits. The

groups of bits are called blocks, and the algorithms are called block algorithms or

block ciphers. For modern computer algorithms, a typical block size is 64 bits-

large enough to preclude analysis and small enough to be workable. (Before com-

puters, algorithms generally operated on plaintext one character at a time. You can

think of this as a stream algorithm operating on a stream of characters.)

Public-Key Algorithms

Public-key algorithms (also called asymmetric algorithms) are designed so that

the key used for encryption is different from the key used for decryption. Further-

more, the decryption key cannot (at least in any reasonable amount of time) be cal-

culated from the encryption key. The algorithms are called â€śpublic-keyâ€ť because

the encryption key can be made public: A complete stranger can use the encryption

key to encrypt a message, but only a specific person with the corresponding decryp-

Page 5

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

1.1 Terminology

tion key can decrypt the message. In these systems, the encryption key is often

called the public key, and the decryption key is often called the private key. The pri-

vate key is sometimes also called the secret key, but to avoid confusion with sym-

metric algorithms, that tag wonâ€™t be used here.

Encryption using public key K is denoted by:

E,(M) = C

Even though the public key and private key are different, decryption with the cor-

responding private key is denoted by:

DK(C) = M

Sometimes, messages will be encrypted with the private key and decrypted with

the public key; this is used in digital signatures (see Section 2.6). Despite the possi-

ble confusion, these operations are denoted by, respectively:

E,(M) = C

DK(C) = M

Cryptanalysis

The whole point of cryptography is to keep the plaintext (or the key, or both)

secret from eavesdroppers (also called adversaries, attackers, interceptors, interlop-

ers, intruders, opponents, or simply the enemy). Eavesdroppers are assumed to have

complete access to the communications between the sender and receiver.

Cryptanalysis is the science of recovering the plaintext of a message without

access to the key. Successful cryptanalysis may recover the plaintext or the key. It

also may find weaknesses in a cryptosystem that eventually lead to the previous

results. (The loss of a key through noncryptanalytic means is called a compromise.)

An attempted cryptanalysis is called an attack. A fundamental assumption in

cryptanalysis, first enunciated by the Dutchman A. Kerckhoffs in the nineteenth

century, is that the secrecy must reside entirely in the key [794]. Kerckhoffs

assumes that the cryptanalyst has complete details of the cryptographic algorithm

and implementation. (Of course, one would assume that the CIA does not make a

habit of telling Mossad about its cryptographic algorithms, but Mossad probably

finds out anyway.) While real-world cryptanalysts donâ€™t always have such detailed

information, itâ€™s a good assumption to make. If others canâ€™t break an algorithm,

even with knowledge of how it works, then they certainly wonâ€™t be able to break it

without that knowledge.

There are four general types of cryptanalytic attacks. Of course, each of them

assumes that the cryptanalyst has complete knowledge of the encryption algo-

rithm used:

1. Ciphertext-only attack. The cryptanalyst has the ciphertext of several

messages, all of which have been encrypted using the same encryption

algorithm. The cryptanalystâ€™s job is to recover the plaintext of as many

messages as possible, or better yet to deduce the key (or keys) used to

Page 6

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

encrypt the messages, in order to decrypt other messages encrypted with

the same keys.

Given: Ci = Ek(Pl), Cz = Ek(P,), . . . Ci = Ek,(Pi)

Deduce: Either P1, P2, . . . PI,;k; or an algorithm

to infer Pi + 1 from Ci + 1= Ek(Pl + 1)

2. Known-plaintext attack. The cryptanalyst has access not only to the

ciphertext of several messages, but also to the plaintext of those messages.

His job is to deduce the key (or keys) used to encrypt the messages or an

algorithm to decrypt any new messages encrypted with the same key (or

keys).

Given: Pi, Cl = Ek(Pl), Pz, Cl = Ek(Pz),. . . PI, C, = Ek(Pi)

Deduce: Either k, or an algorithm

to infer Pi + i from Ci + 1= Ek(Pi + 1)

3. Chosen-plaintext attack. The cryptanalyst not only has access to the

ciphertext and associated plaintext for several messages, but he also

chooses the plaintext that gets encrypted. This is more powerful than a

known-plaintext attack, because the cryptanalyst can choose specific

plaintext blocks to encrypt, ones that might yield more information about

the key. His job is to deduce the key (or keys) used to encrypt the messages

or an algorithm to decrypt any new messages encrypted with the same key

(or keys).

Given: PI, Cl = Ek(Pl), P2, Cl = Ek(Pz),. . . Pi, Ci = Ek(Pi),

where the cryptanalyst gets to choose PI, P2, . . . Pi

Deduce: Either k, or an algorithm to infer Pi + 1from Ci + 1= Ek(P,+ J

4. Adaptive-chosen-plaintext attack. This is a special case of a chosen-

plaintext attack. Not only can the cryptanalyst choose the plaintext that is

encrypted, but he can also modify his choice based on the results of previ-

ous encryption. In a chosen-plaintext attack, a cryptanalyst might just be

able to choose one large block of plaintext to be encrypted; in an adaptive-

chosen-plaintext attack he can choose a smaller block of plaintext and

then choose another based on the results of the first, and so forth.

There are at least three other types of cryptanalytic attack.

5. Chosen-ciphertext attack. The cryptanalyst can choose different cipher-

texts to be decrypted and has access to the decrypted plaintext. For exam-

ple, the cryptanalyst has access to a tamperproof box that does automatic

decryption. His job is to deduce the key.

Given: Cl, PI = Dk( Cl), Cx, Pz = Dk(Cz), . . . Cip Pi = Dk( Ci)

Deduce: k

Page 7

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

This attack is primarily applicable to public-key algorithms and will be

discussed in Section 19.3. A chosen-ciphertext attack is sometimes effec-

tive against a symmetric algorithm as well. (Sometimes a chosen-plaintext

attack and a chosen-ciphertext attack are together known as a chosen-text

attack.)

6. Chosen-key attack. This attack doesnâ€™t mean that the cryptanalyst can

choose the key; it means that he has some knowledge about the relation-

ship between different keys. Itâ€™s strange and obscure, not very practical,

and discussed in Section 12.4.

7. Rubber-hose cryptanalysis. The cryptanalyst threatens, blackmails, or tor-

tures someone until they give him the key. Bribery is sometimes referred

to as a purchase-key attack. These are all very powerful attacks and often

the best way to break an algorithm.

Known-plaintext attacks and chosen-plaintext attacks are more common than

you might think. It is not unheard-of for a cryptanalyst to get a plaintext message

that has been encrypted or to bribe someone to encrypt a chosen message. You may

not even have to bribe someone; if you give a message to an ambassador, you will

probably find that it gets encrypted and sent back to his country for consideration.

Many messages have standard beginnings and endings that might be known to the

cryptanalyst. Encrypted source code is especially vulnerable because of the regular

appearance of keywords: #define, struct, else, return. Encrypted executable code has

the same kinds of problems: functions, loop structures, and so on. Known-plaintext

attacks (and even chosen-plaintext attacks) were successfully used against both the

Germans and the Japanese during World War II. David Kahnâ€™s books [794,795,796]

have historical examples of these kinds of attacks.

And donâ€™t forget Kerckhoffsâ€™s assumption: If the strength of your new cryptosys-

tern relies on the fact that the attacker does not know the algorithmâ€™s inner work-

ings, youâ€™re sunk. If you believe that keeping the algorithmâ€™s insides secret

improves the security of your cryptosystem more than letting the academic com-

munity analyze it, youâ€™re wrong. And if you think that someone wonâ€™t disassemble

your code and reverse-engineer your algorithm, youâ€™re naive. (In 1994 this hap-

pened with the RC4 algorithm-see Section 17.1.) The best algorithms we have are

the ones that have been made public, have been attacked by the worldâ€™s best cryp-

tographers for years, and are still unbreakable. (The National Security Agency

keeps their algorithms secret from outsiders, but they have the best cryptographers

in the world working within their walls-you donâ€™t. Additionally, they discuss

their algorithms with one another, relying on peer review to uncover any weak-

nesses in their work.)

Cryptanalysts donâ€™t always have access to the algorithms, as when the United

States broke the Japanese diplomatic code PURPLE during World War II [794]-but

they often do. If the algorithm is being used in a commercial security program, it is

simply a matter of time and money to disassemble the program and recover the algo-

rithm. If the algorithm is being used in a military communications system, it is sim-

Page 8

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

ply a matter of time and money to buy (or steal) the equipment and reverse-engineer

the algorithm.

Those who claim to have an unbreakable cipher simply because they canâ€™t break

it are either geniuses or fools. Unfortunately, there are more of the latter in the

world. Beware of people who extol the virtues of their algorithms, but refuse to

make them public; trusting their algorithms is like trusting snake oil.

Good cryptographers rely on peer review to separate the good algorithms from

the bad.

of Algorithms

Security

Different algorithms offer different degrees of security; it depends on how hard

they are to break. If the cost required to break an algorithm is greater than the value

of the encrypted data, then youâ€™re probably safe. If the time required to break an

algorithm is longer than the time the encrypted data must remain secret, then

youâ€™re probably safe. If the amount of data encrypted with a single key is less than

the amount of data necessary to break the algorithm, then youâ€™re probably safe.

I say â€śprobablyâ€ť because there is always a chance of new breakthroughs in crypt-

analysis. On the other hand, the value of most data decreases over time. It is impor-

tant that the value of the data always remain less than the cost to break the security

protecting it.

Lars Knudsen classified these different categories of breaking an algorithm. In

decreasing order of severity [858]:

1. Total break. A cryptanalyst finds the key, K, such that DK(C) = l?

2. Global deduction. A cryptanalyst finds an alternate algorithm, A, equiva-

lent to DK(C), without knowing K.

3. Instance (or local) deduction. A cryptanalyst finds the plaintext of an inter-

cepted ciphertext.

4. Information deduction. A cryptanalyst gains some information about the

key or plaintext. This information could be a few bits of the key, some

information about the form of the plaintext, and so forth.

An algorithm is unconditionally secure if, no matter how much ciphertext a

cryptanalyst has, there is not enough information to recover the plaintext. In point

of fact, only a one-time pad (see Section 1S) is unbreakable given infinite resources.

All other cryptosystems are breakable in a ciphertext-only attack, simply by trying

every possible key one by one and checking whether the resulting plaintext is mean-

ingful. This is called a brute-force attack (see Section 7.1).

Cryptography is more concerned with cryptosystems that are computationally

infeasible to break. An algorithm is considered computationally secure (sometimes

called strong) if it cannot be broken with available resources, either current or

future. Exactly what constitutes â€śavailable resourcesâ€ť is open to interpretation.

You can measure the complexity (see Section 11.1) of an attack in different ways:

Page 9

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

1. Data complexity. The amount of data needed as input to the attack.

2. Processing complexity. The time needed to perform the attack. This is

often called the work factor.

3. Storage requirements. The amount of memory needed to do the attack.

As a rule of thumb, the complexity of an attack is taken to be the minimum of

these three factors. Some attacks involve trading off the three complexities: A faster

attack might be possible at the expense of a greater storage requirement.

Complexities are expressed as orders of magnitude. If an algorithm has a process-

ing complexity of 2â€ť*, then 21z8operations are required to break the algorithm.

(These operations may be complex and time-consuming.) Still, if you assume that

you have enough computing speed to perform a million operations every second and

you set a million parallel processors against the task, it will still take over 1019years

to recover the key. Thatâ€™s a billion times the age of the universe.

While the complexity of an attack is constant (until some cryptanalyst finds a bet-

ter attack, of course), computing power is anything but. There have been phenome-

nal advances in computing power during the last half-century and there is no reason

to think this trend wonâ€™t continue. Many cryptanalytic attacks are perfect for paral-

lel machines: The task can be broken down into billions of tiny pieces and none of

the processors need to interact with each other. Pronouncing an algorithm secure

simply because it is infeasible to break, given current technology, is dicey at best.

Good cryptosystems are designed to be infeasible to break with the computing

power that is expected to evolve many years in the future.

Historical Terms

Historically, a code refers to a cryptosystem that deals with linguistic units:

words, phrases, sentences, and so forth. For example, the word â€śOCELOTâ€ť might be

the ciphertext for the entire phrase â€śTURN LEFT 90 DEGREES,â€ť the word â€śLOL-

LIPOPâ€ť might be the ciphertext for â€śTURN RIGHT 90 DEGREES,â€ť and the words

â€śBENT EARâ€ť might be the ciphertext for â€śHOWITZER.â€ť Codes of this type are not

discussed in this book; see [794,795]. Codes are only useful for specialized circum-

stances. Ciphers are useful for any circumstance. If your code has no entry for

â€śANTEATERS,â€ť then you canâ€™t say it. You can say anything with a cipher.

1.2 STEGANOGFWPHY

Steganography serves to hide secret messages in other messages, such that the

secretâ€™s very existence is concealed. Generally the sender writes an innocuous mes-

sage and then conceals a secret message on the same piece of paper. Historical tricks

include invisible inks, tiny pin punctures on selected characters, minute differences

between handwritten characters, pencil marks on typewritten characters, grilles

which cover most of the message except for a few characters, and so on.

Page 10

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

More recently, people are hiding secret messages in graphic images. Replace the

least significant bit of each byte of the image with the bits of the message. The

graphical image wonâ€™t change appreciably-most graphics standards specify more

gradations of color than the human eye can notice-and the message can be stripped

out at the receiving end. You can store a 64-kilobyte message in a 1024 x 1024 grey-

scale picture this way. Several public-domain programs do this sort of thing.

Peter Waynerâ€™s mimic functions obfuscate messages. These functions modify a

message so that its statistical profile resembles that of something else: the classi-

fieds section of The New York Times, a play by Shakespeare, or a newsgroup on the

Internet [1584,1585]. This type of steganography wonâ€™t fool a person, but it might

fool some big computers scanning the Internet for interesting messages.

1.3 SUBSTITUTION CIPHERS AND TRANSPOSITION CIPHERS

Before computers, cryptography consisted of character-based algorithms. Different

cryptographic algorithms either substituted characters for one another or transposed

characters with one another. The better algorithms did both, many times each.

Things are more complex these days, but the philosophy remains the same. The

primary change is that algorithms work on bits instead of characters. This is actu-

ally just a change in the alphabet size: from 26 elements to two elements. Most good

cryptographic algorithms still combine elements of substitution and transposition.

Substitution Ciphers

A substitution cipher is one in which each character in the plaintext is substi-

tuted for another character in the ciphertext. The receiver inverts the substitution

on the ciphertext to recover the plaintext.

In classical cryptography, there are four types of substitution ciphers:

- A simple substitution cipher, or monoalphabetic cipher, is one in

which each character of the plaintext is replaced with a correspond-

ing character of ciphertext. The cryptograms in newspapers are sim-

ple substitution ciphers.

- A homophonic substitution cipher is like a simple substitution cryp-

tosystem, except a single character of plaintext can map to one of sev-

eral characters of ciphertext. For example, â€śAâ€ť could correspond to

either 5, 13, 25, or 56, â€śBâ€ť could correspond to either 7, 19,31, or 42,

and so on.

- A polygram substitution cipher is one in which blocks of characters

are encrypted in groups. For example, â€śABAâ€ť could correspond to

â€śRTQ, â€ť â€śABBâ€ť could correspond to â€śSLL,â€ť and so on.

- A polyalphabetic substitution cipher is made up of multiple simple

substitution ciphers. For example, there might be five different sim-

ple substitution ciphers used; the particular one used changes with

the position of each character of the plaintext.

Page 11

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

The famous Caesar Cipher, in which each plaintext character is replaced by the

character three to the right modulo 26 (â€śAâ€ť is replaced by â€śD,â€ť â€śBâ€ť is replaced by

â€śE,â€ť . . . , â€śWâ€ť is replaced by â€śZ, â€ť â€śXâ€ť is replaced by â€śA,â€ť â€śYâ€ť is replaced by â€śB,â€ť

and â€śZâ€ť is replaced by â€śCâ€ť) is a simple substitution cipher. Itâ€™s actually even sim-

pler, because the ciphertext alphabet is a rotation of the plaintext alphabet and not

an arbitrary permutation.

ROT13 is a simple encryption program commonly found on UNIX systems; it is

also a simple substitution cipher. In this cipher, â€śAâ€ť is replaced by â€śN,â€ť â€śBâ€ť is

replaced by â€ś0,â€ť and so on. Every letter is rotated 13 places.

Encrypting a file twice with ROT13 restores the original file.

P = ROT13 (ROT13 (P))

ROT13 is not intended for security; it is often used in Usenet posts to hide poten-

tially offensive text, to avoid giving away the solution to a puzzle, and so forth.

Simple substitution ciphers can be easily broken because the cipher does not hide

the underlying frequencies of the different letters of the plaintext. All it takes is

about 25 English characters before a good cryptanalyst can reconstruct the plaintext

[1434]. An algorithm for solving these sorts of ciphers can be found in [578,587,

1600,78,1475,1236,880]. A good computer algorithm is [703].

Homophonic substitution ciphers were used as early as 1401 by the Duchy of Man-

tua [794]. They are much more complicated to break than simple substitution ciphers,

but still do not obscure all of the statistical properties of the plaintext language. With

a known-plaintext attack, the ciphers are trivial to break. A ciphertext-only attack is

harder, but only takes a few seconds on a computer. Details are in [ 126 11.

Polygram substitution ciphers are ciphers in which groups of letters are encrypted

together. The Playfair cipher, invented in 1854, was used by the British during

World War I [794]. It encrypts pairs of letters together. Its cryptanalysis is discussed

in [587,1475,880]. The Hill cipher is another example of a polygram substitution

cipher [732]. Sometimes you see Huffman coding used as a cipher; this is an insecure

polygram substitution cipher.

Polyalphabetic substitution ciphers were invented by Leon Battista in 1568 [794].

They were used by the Union army during the American Civil War. Despite the fact

that they can be broken easily [819,577,587,794] (especially with the help of com-

puters), many commercial computer security products use ciphers of this form

[ 1387,1390,1502]. (Details on how to break this encryption scheme, as used in Word-

Perfect, can be found in [ 135,139l.j The Vigenere cipher, first published in 1586, and

the Beaufort cipher are also examples of polyalphabetic substitution ciphers.

Polyalphabetic substitution ciphers have multiple one-letter keys, each of which

is used to encrypt one letter of the plaintext. The first key encrypts the first letter of

the plaintext, the second key encrypts the second letter of the plaintext, and so on.

After all the keys are used, the keys are recycled. If there were 20 one-letter keys,

then every twentieth letter would be encrypted with the same key. This is called the

period of the cipher. In classical cryptography, ciphers with longer periods were sig-

nificantly harder to break than ciphers with short periods. There are computer tech-

niques that can easily break substitution ciphers with very long periods.

Page 12

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

A running-key cipher-sometimes called a book cipher-in which one text is

used to encrypt another text, is another example of this sort of cipher. Even though

this cipher has a period the length of the text, it can also be broken easily [576,794].

Transposition Ciphers

In a transposition cipher the plaintext remains the same, but the order of charac-

ters is shuffled around. In a simple columnar transposition cipher, the plaintext is

written horizontally onto a piece of graph paper of fixed width and the ciphertext is

read off vertically (see Figure 1.4). Decryption is a matter of writing the ciphertext

vertically onto a piece of graph paper of identical width and then reading the plain-

text off horizontally.

Cryptanalysis of these ciphers is discussed in [587,1475]. Since the letters of the

ciphertext are the same as those of the plaintext, a frequency analysis on the cipher-

text would reveal that each letter has approximately the same likelihood as in

English. This gives a very good clue to a cryptanalyst, who can then use a variety of

techniques to determine the right ordering of the letters to obtain the plaintext.

Putting the ciphertext through a second transposition cipher greatly enhances secu-

rity. There are even more complicated transposition ciphers, but computers can

break almost all of them.

The German ADFGVX cipher, used during World War I, is a transposition cipher

combined with a simple substitution. It was a very complex algorithm for its day

but was broken by Georges Painvin, a French cryptanalyst [794].

Although many modern algorithms use transposition, it is troublesome because it

requires a lot of memory and sometimes requires messages to be only certain

lengths. Substitution is far more common.

Rotor Machines

In the 192Os, various mechanical encryption devices were invented to automate

the process of encryption. Most were based on the concept of a rotor, a mechanical

wheel wired to perform a general substitution.

A rotor machine has a keyboard and a series of rotors, and implements a version

of the Vigenkre cipher. Each rotor is an arbitrary permutation of the alphabet, has 26

positions, and performs a simple substitution. For example, a rotor might be wired

PtainfeXf:COMPUTERGRAPHlCSMAYBESLOWBUTATLEASTITâ€™SEXPENSIVE.

COMPUTERGR

APHICSMAYB

ESLOWBUTAT

LEASTITSEX

PENSIVE

Cipher-text: CAELPOPSEE MHLAN PIOSS UCWTI TSBIVEMUTE RATSG YAERBTX

Figure 1.4 Columnar transposition cipher.

Page 13

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

1.4 Simple XOR

to substitute â€śFâ€ť for â€śA,â€ť â€śUâ€ť for â€śB, â€ť â€śLâ€ť for â€śC,â€ť and so on. And the output pins

of one rotor are connected to the input pins of the next.

For example, in a 4-rotor machine the first rotor might substitute â€śFâ€ť for â€śA,â€ť the

second might substitute â€śYâ€ť for â€śF, â€ť the third might substitute â€śEâ€ť for â€śY,â€ť and the

fourth might substitute â€śCâ€ť for â€śEâ€ť; â€śCâ€ť would be the output ciphertext. Then

some of the rotors shift, so next time the substitutions will be different.

It is the combination of several rotors and the gears moving them that makes the

machine secure. Because the rotors all move at different rates, the period for an n-

rotor machine is 26â€ť. Some rotor machines can also have a different number of posi-

tions on each rotor, further frustrating cryptanalysis.

The best-known rotor device is the Enigma. The Enigma was used by the Ger-

mans during World War II. The idea was invented by Arthur Scherbius and Arvid

Gerhard Damm in Europe. It was patented in the United States by Arthur Scherbius

[ 13831. The Germans beefed up the basic design considerably for wartime use.

The German Enigma had three rotors, chosen from a set of five, a plugboard that

slightly permuted the plaintext, and a reflecting rotor that caused each rotor to oper-

ate on each plaintext letter twice. As complicated as the Enigma was, it was broken

during World War II. First, a team of Polish cryptographers broke the German

Enigma and explained their attack to the British. The Germans modified their

Enigma as the war progressed, and the British continued to cryptanalyze the new

versions. For explanations of how rotor ciphers work and how they were broken, see

[794,86,448,498,446,880,1315,1587,690]. Two fascinating accounts of how the

Enigma was broken are [ 735,796].

Further Reading

This is not a book about classical cryptography, so I will not dwell further on these

subjects. Two excellent precomputer cryptology books are [587,1475]; [448] presents

some modern cryptanalysis of cipher machines. Dorothy Denning discusses many of

these ciphers in [456] and [880] h as some fairly complex mathematical analysis of the

same ciphers. Another older cryptography text, which discusses analog cryptogra-

phy, is (991. An article that presents a good overview of the subject is [579]. David

Kahnâ€™s historical cryptography books are also excellent [794,795,796].

1.4 XOR

SIMPLE

XOR is exclusive-or operation: â€śâ€™ in C or 0 in mathematical notation. Itâ€™s a stan-

dard operation on bits:

ooo=o

OOl=l

lOO=l

lOl=O

Also note that:

aOa=O

Ll@b@b=Ll

Page 14

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

The simple-XOR algorithm is really an embarrassment; itâ€™s nothing more than a

Vigenere polyalphabetic cipher. Itâ€™s here only because of its prevalence in commer-

cial software packages, at least those in the MS-DOS and Macintosh worlds

[1502,1387]. Unfortunately, if a software security program proclaims that it has a

â€śproprietaryâ€ť encryption algorithm-significantly faster than DES-the odds are

that it is some variant of this.

Crypto key input-file output-file */

I* Usage:

void main (int argc, char *argv[l)

I

FILE *fi, *fo;

char *cp;

int c;

if ((cp = argv[ll) && *Cp!='\O') 1

if ((fi = fopen(argv[Zl, "t-b")) != NULL) I

if ((fo = fopen(argv[31, "wb")) != NULL) (

while (Cc = getc(fi)) != EOF) I

if (!*cp) cp = argv[ll;

c A= *ccp++j;

putc(c,fo);

fclose(f0);

fclose(fi);

This is a symmetric algorithm. The plaintext is being XORed with a keyword to

generate the ciphertext. Since XORing the same value twice restores the original,

encryption and decryption use exactly the same program:

POK=C

COK=P

Thereâ€™s no real security here. This kind of encryption is trivial to break, even

without computers [587,1475]. It will only take a few seconds with a computer.

Assume the plaintext is English. Furthermore, assume the key length is any small

number of bytes. Hereâ€™s how to break it:

1. Discover the length of the key by a procedure known as counting coinci-

dences [577]. XOR the ciphertext against itself shifted various numbers of

bytes, and count those bytes that are equal. If the displacement is a multi-

ple of the key length, then something over 6 percent of the bytes will be

equal. If it is not, then less than 0.4 percent will be equal (assuming a ran-

dom key encrypting normal ASCII text; other plaintext will have different

numbers). This is called the index of coincidence. The smallest displace-

ment that indicates a multiple of the key length is the length of the key.

,

Page 15

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

2. Shift the ciphertext by that length and XOR it with itself. This removes

the key and leaves you with plaintext XORed with the plaintext shifted

the length of the key. Since English has 1.3 bits of real information per byte

(see Section 11.l), there is plenty of redundancy for determining a unique

decryption.

Despite this, the list of software vendors that tout this toy algorithm as being

â€śalmost as secure as DESâ€ť is staggering [1387]. It is the algorithm (with a 160-bit

repeated â€śkeyâ€ť) that the NSA finally allowed the U.S. digital cellular phone indus-

try to use for voice privacy. An XOR might keep your kid sister from reading your

files, but it wonâ€™t stop a cryptanalyst for more than a few minutes.

1.5 ONE-TIME PADS

Believe it or not, there is a perfect encryption scheme. Itâ€™s called a one-time pad, and

was invented in 1917 by Major Joseph Mauborgne and AT&Tâ€™s Gilbert Vernam

[794]. (Actually, a one-time pad is a special case of a threshold scheme; see Section

3.7.) Classically, a one-time pad is nothing more than a large nonrepeating set of

truly random key letters, written on sheets of paper, and glued together in a pad. In

its original form, it was a one-time tape for teletypewriters. The sender uses each

key letter on the pad to encrypt exactly one plaintext character. Encryption is the

addition modulo 26 of the plaintext character and the one-time pad key character.

Each key letter is used exactly once, for only one message. The sender encrypts

the message and then destroys the used pages of the pad or used section of the tape.

The receiver has an identical pad and uses each key on the pad, in turn, to decrypt

each letter of the ciphertext. The receiver destroys the same pad pages or tape sec-

tion after decrypting the message. New message-new key letters. For example, if

the message is:

ONETIMEPAD

and the key sequence from the pad is

TBFRGFARFM

then the ciphertext is

IPKLPSFHGQ

because

O+Tmod26=1

N+Bmod26=P

E+Fmod26=K

etc.

Page 16

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER Foundations

1

Assuming an eavesdropper canâ€™t get access to the one-time pad used to encrypt

the message, this scheme is perfectly secure. A given ciphertext message is equally

likely to correspond to any possible plaintext message of equal size.

Since every key sequence is equally likely (remember, the key letters are gener-

ated randomly), an adversary has no information with which to cryptanalyze the

ciphertext. The key sequence could just as likely be:

POYYAEAAZX

which would decrypt to:

SALMONEGGS

or

BXFGBMTMXM

which would decrypt to:

GREENFLUID

This point bears repeating: Since every plaintext message is equally possible,

there is no way for the cryptanalyst to determine which plaintext message is the

correct one. A random key sequence added to a nonrandom plaintext message pro-

duces a completely random ciphertext message and no amount of computing power

can change that.

The caveat, and this is a big one, is that the key letters have to be generated ran-

domly. Any attacks against this scheme will be against the method used to generate

the key letters. Using a pseudo-random number generator doesnâ€™t count; they

always have nonrandom properties. If you use a real random source-this is much

harder than it might first appear, see Section 17.14-itâ€™s secure.

The other important point is that you can never use the key sequence again, ever.

Even if you use a multiple-gigabyte pad, if a cryptanalyst has multiple ciphertexts

whose keys overlap, he can reconstruct the plaintext. He slides each pair of cipher-

texts against each other and counts the number of matches at each position. If they

are aligned right, the proportion of matches jumps suddenly-the exact percentages

depend on the plaintext language. From this point cryptanalysis is easy. Itâ€™s like the

index of coincidence, but with just two â€śperiodsâ€ť to compare [904]. Donâ€™t do it.

The idea of a one-time pad can be easily extended to binary data. Instead of a one-

time pad consisting of letters, use a one-time pad of bits. Instead of adding the plain-

text to the one-time pad, use an XOR. To decrypt, XOR the ciphertext with the same

one-time pad. Everything else remains the same and the security is just as perfect.

This all sounds good, but there are a few problems. Since the key bits must be ran-

dom and can never be used again, the length of the key sequence must be equal to

the length of the message. A one-time pad might be suitable for a few short mes-

sages, but it will never work for a 1.544 Mbps communications channel. You can

store 650 megabytes worth of random bits on a CD-ROM, but there are problems.

First, you want exactly two copies of the random bits, but CD-ROMs are economi-

Page 17

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

1.7 Large Numbers

cal only for large quantities. And second, you want to be able to destroy the bits

already used. CD-ROM has no erase facilities except for physically destroying the

entire disk. Digital tape is a much better medium for this sort of thing.

Even if you solve the key distribution and storage problem, you have to make sure

the sender and receiver are perfectly synchronized. If the receiver is off by a bit (or if

some bits are dropped during the transmission), the message wonâ€™t make any sense.

On the other hand, if some bits are altered during transmission (without any bits

being added or removed-something far more likely to happen due to random noise],

only those bits will be decrypted incorrectly. But on the other hand, a one-time pad

provides no authenticity.

One-time pads have applications in todayâ€™s world, primarily for ultra-secure low-

bandwidth channels. The hotline between the United States and the former Soviet

Union was (is it still active?) rumored to be encrypted with a one-time pad. Many

Soviet spy messages to agents were encrypted using one-time pads. These messages

are still secure today and will remain that way forever. It doesnâ€™t matter how long

the supercomputers work on the problem. Even after the aliens from Andromeda

land with their massive spaceships and undreamed-of computing power, they will

not be able to read the Soviet spy messages encrypted with one-time pads (unless

they can also go back in time and get the one-time pads).

1.6 COMPUTER ALGORITHMS

There are many cryptographic algorithms. These are three of the most common:

- DES (Data Encryption Standard) is the most popular computer encryp-

tion algorithm. DES is a U.S. and international standard. It is a sym-

metric algorithm; the same key is used for encryption and decryption.

- RSA (named for its creators-Rivest, Shamir, and Adleman) is the

most popular public-key algorithm. It can be used for both encryption

and digital signatures.

- DSA (Digital Signature Algorithm, used as part of the Digital Signa-

ture Standard) is another public-key algorithm. It cannot be used for

encryption, but only for digital signatures.

These are the kinds of stuff this book is about.

1.7 LARGENUMBERS

Throughout this book, I use various large numbers to describe different things in

cryptography. Because it is so easy to lose sight of these numbers and what they sig-

nify, Table 1.1 gives physical analogues for some of them.

These numbers are order-of-magnitude estimates, and have been culled from a

variety of sources. Many of the astrophysics numbers are explained in Freeman

Page 18

Prev. Chapter Home Previous Page

Next Page

Prev. page

Next Page Next Chapter

CHAPTER1 Foundations

TABLE 1.1

Large Numbers

Physical Analogue Number

Odds of being killed by lightning (per day) 1 in 9 billion (233)

Odds of winning the top prize in a U.S. state lottery 1 in 4,000,OOO(222)

Odds of winning the top prize in a U.S. state lottery

and being killed by lightning in the same day 1 in 255

Odds of drowning (in the U.S. per year) 1 in 59,000 (21h)

Odds of being killed in an automobile accident

(in the U.S. in 1993) 1 in 6100 (2lâ€ť)

Odds of being killed in an automobile accident

(in the U.S. per lifetime) 1 in 88 (2â€™)

Time until the next ice age 14,000 (214)years

Time until the sun goes nova IO9 (2â€ťâ€ś) years

Age of the planet lo9 (230)years

Age of the Universe 10â€ť (2â€ťâ€ś) years

Number of atoms in the planet 105l (2'70)

Number of atoms in the sun 105â€™ (2'90)

Number of atoms in the galaxy 106â€™ (2223)

Number of atoms in the Universe (dark matter excluded) 10â€ť (2265)

Volume of the Universe 1O84 (2280)cm3

If the Universe is Closed:

Total lifetime of the Universe IOâ€ť (237)years

10â€ť (261)seconds

If the Universe is Open:

Time until low-mass stars cool off 1014(2â€ťâ€˜) years

Time until planets detach from stars 1015(250)years

Time until stars detach from galaxies 1019(2â€ťâ€ś) years

Time until orbits decay by gravitational radiation 1020(267)years

Time until black holes decay by the Hawking process 1O64(2â€™13)years

Time until all matter is liquid at zero temperature 1O65(2â€™16)years

Time until all matter decays to iron 101026years

Time until all matter collapses to black holes 101076years

Dysonâ€™s paper, â€śTime Without End: Physics and Biology in an Open Universe,â€ť in

Reviews of Modern Physics, v. 52, n. 3, July 1979, pp. 447-460. Automobile accident

deaths are calculated from the Department of Transportationâ€™s statistic of 163

deaths per million people in 1993 and an average lifespan of 69.7 years.