
encryption/decryption boxes plugged into a communications line and encrypted all the data going

across that line. Although software encryption is becoming more prevalent today, hardware is still the

embodiment of choice for military and serious commercial applications. The NSA, for example, only

authorizes encryption in hardware. There are several reasons why this is so.

Table 10.5
Comparing File-Level and Driver-Level Encryption

File-Level Encryption

Benefits:
- Ease of implementation and use.
- Flexible.
- Relatively small performance penalty.
- Users can move files between different machines without problems.
- Users can back files up without problems.

Security Issues:
- Potential leakage through security-unconscious programs. (Program may write file to disk for temporary storage, for example.)
- Bad implementations may always re-encrypt with same key for same password.

Usability Problems:
- User has to figure out what to do.
- There may be different passwords for different files.
- Manual encryption of selected files is the only access control.

Driver-Level Encryption

Benefits:
- Temporary files, work files, and so forth can be kept on the secure drive.
- It's harder to forget to re-encrypt something on this kind of system.

Security Issues:
- Lots of things can go wrong with a device-driver or memory-resident program.
- Bad implementations will allow chosen-plaintext, or even chosen-ciphertext attacks.
- If whole system is master-keyed under one password, loss of that password means that the attacker gets everything.
- A more limited set of ciphers can reasonably be used for this kind of application. For example, OFB stream ciphers would not work.

Usability Problems:
- There will be a performance penalty.
- The driver may interact in weird ways with Windows, OS/2 DOS emulation, device drivers, and so on.

The first is speed. As we will see in Part III, encryption algorithms consist of many complicated

operations on plaintext bits. These are not the sorts of operations that are built into your run-of-the-

mill computer. The two most common encryption algorithms, DES and RSA, run inefficiently on

general-purpose processors. While some cryptographers have tried to make their algorithms more


suitable for software implementation, specialized hardware will always win a speed race.

Additionally, encryption is often a computation-intensive task. Tying up the computer's primary

processor for this is inefficient. Moving encryption to another chip, even if that chip is just another

processor, makes the whole system faster.

The second reason is security. An encryption algorithm running on a generalized computer has no

physical protection. Mallory can go in with various debugging tools and surreptitiously modify the

algorithm without anyone ever realizing it. Hardware encryption devices can be securely

encapsulated to prevent this. Tamperproof boxes can prevent someone from modifying a hardware

encryption device. Special-purpose VLSI chips can be coated with a chemical such that any attempt to

access their interior will result in the destruction of the chip's logic. The U.S. government's Clipper

and Capstone chips (see Sections 24.16 and 24.17) are designed to be tamperproof. The chips can be

designed so that it is impossible for Mallory to read the unencrypted key.

IBM developed a cryptographic system for encrypting data and communications on mainframe

computers [515,1027]. It includes tamper-resistant modules to hold keys. This system is discussed in

Section 24.1.

Electromagnetic radiation can sometimes reveal what is going on inside a piece of electronic

equipment. Dedicated encryption boxes can be shielded, so that they leak no compromising

information. General-purpose computers can be shielded as well, but it is a far more complex

problem. The U.S. military calls this TEMPEST; it's a subject well beyond the scope of this book.

The final reason for the prevalence of hardware is the ease of installation. Most encryption

applications don't involve general-purpose computers. People may wish to encrypt their telephone

conversations, facsimile transmissions, or data links. It is cheaper to put special-purpose encryption

hardware in the telephones, facsimile machines, and modems than it is to put in a microprocessor and

software.

Even when the encrypted data comes from a computer, it is easier to install a dedicated hardware

encryption device than it is to modify the computer's system software. Encryption should be invisible; it should not hamper the user. The only way to do this in software is to write encryption deep into the operating system. This isn't easy. On the other hand, even a computer neophyte can plug an

encryption box between his computer and his external modem.

The three basic kinds of encryption hardware on the market today are: self-contained encryption

modules (that perform functions such as password verification and key management for banks),

dedicated encryption boxes for communications links, and boards that plug into personal computers.

Some encryption boxes are designed for certain types of communications links, such as T-1 encryption

boxes that are designed not to encrypt synchronization bits. There are different boxes for

synchronous and asynchronous communications lines. Newer boxes tend to accept higher bit rates

and are more versatile.

Even so, many of these devices have some incompatibilities. Buyers should be aware of this and be

well-versed in their particular needs, lest they find themselves the owners of encryption equipment

unable to perform the task at hand. Pay attention to restrictions in hardware type, operating system,

applications software, network, and so forth.

PC-board encryptors usually encrypt everything written to the hard disk and can be configured to

encrypt everything sent to the floppy disk and serial port as well. These boards are not shielded

against electromagnetic radiation or physical interference, since there would be no benefit in


protecting the boards if the computer remained unaffected.

More companies are starting to put encryption hardware into their communications equipment.

Secure telephones, facsimile machines, and modems are all available.

Internal key management for these devices is generally secure, although there are as many different

schemes as there are equipment vendors. Some schemes are more suited for one situation than

another, and buyers should know what kind of key management is incorporated into the encryption

box and what they are expected to provide themselves.

Software

Any encryption algorithm can be implemented in software. The disadvantages are in speed, cost, and

ease of modification (or manipulation). The advantages are in flexibility and portability, ease of use,

and ease of upgrade. The algorithms written in C at the end of this book can be implemented, with

little modification, on any computer. They can be inexpensively copied and installed on many

machines. They can be incorporated into larger applications, such as communications programs or

word processors.

Software encryption programs are popular and are available for all major operating systems. These

are meant to protect individual files; the user generally has to manually encrypt and decrypt specific

files. It is important that the key management scheme be secure: The keys should not be stored on

disk anywhere (or even written to a place in memory from where the processor swaps out to disk).

Keys and unencrypted files should be erased after encryption. Many programs are sloppy in this

regard, and a user has to choose carefully.

Of course, Mallory can always replace the software encryption algorithm with something lousy. But

for most users, that isn't a problem. If Mallory can break into our office and modify our encryption

program, he can also put a hidden camera on the wall, a wiretap on the telephone, and a TEMPEST

detector down the street. If Mallory is that much more powerful than the user, the user has lost the

game before it starts.

10.6 Compression, Encoding, and Encryption

Using a data compression algorithm together with an encryption algorithm makes sense for two

reasons:

Cryptanalysis relies on exploiting redundancies in the plaintext; compressing a file before

encryption reduces these redundancies.

Encryption is time-consuming; compressing a file before encryption speeds up the entire

process.

The important thing to remember is to compress before encryption. If the encryption algorithm is any

good, the ciphertext will not be compressible; it will look like random data. (This makes a reasonable

test of an encryption algorithm; if the ciphertext can be compressed, then the algorithm probably isn't

very good.)

If you are going to add any type of transmission encoding or error detection and recovery, remember

to add that after encryption. If there is noise in the communications path, decryption's error-extension properties will only make that noise worse. Figure 10.3 summarizes these steps.


10.7 Detecting Encryption

How does Eve detect an encrypted file? Eve is in the spy business, so this is an important question.

Imagine that she's eavesdropping on a network where messages are flying in all directions at high

speeds; she has to pick out the interesting ones. Encrypted files are certainly interesting, but how does

she know they are encrypted?

Generally, she relies on the fact that most popular encryption programs have well-defined headers.

Electronic-mail messages encrypted with either PEM or PGP (see Sections 24.10 and 24.12) are easy

to identify for that reason.

Other file encryptors just produce a ciphertext file of seemingly random bits. How can she distinguish

it from any other file of seemingly random bits? There is no sure way, but Eve can try a number of

things:

- Examine the file. ASCII text is easy to spot. Other file formats, such as TIFF, TeX, C, PostScript, G3 facsimile, or Microsoft Excel, have standard identifying characteristics. Executable code is detectable, as well. UNIX files often have "magic numbers" that can be detected.

Figure 10.3 Encryption with compression and error control.

- Try to uncompress the file, using the major compression algorithms. If the file is

compressed (and not encrypted), this should yield the original file.

- Try to compress the file. If the file is ciphertext (and the algorithm is good), then the

probability that the file can be appreciably compressed by a general-purpose compression

routine is small. (By appreciably, I mean more than 1 or 2 percent.) If it is something else (a

binary image or a binary data file, for example) it probably can be compressed.

Any file that cannot be compressed and is not already compressed is probably ciphertext. (Of course,

it is possible to specifically make ciphertext that is compressible.) Identifying the algorithm is a whole

lot harder. If the algorithm is good, you can't. If the algorithm has some slight biases, it might be

possible to recognize those biases in the file. However, the biases have to be pretty significant or the

file has to be pretty big in order for this to work.
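Eve can automate the compressibility test. The sketch below is not from the book; it estimates the byte-level entropy of a buffer, a rough proxy for how much a general-purpose compressor could squeeze it. Values close to 8 bits per byte are what good ciphertext (or already-compressed data) produces; plain ASCII text scores well below that.

/* A minimal sketch (not from the book): estimate bits-per-byte entropy of a
 * buffer as a cheap stand-in for "try to compress it." */
#include <math.h>
#include <stddef.h>

double bits_per_byte(const unsigned char *buf, size_t len)
{
    unsigned long count[256] = {0};
    double h = 0.0;
    size_t i;

    if (len == 0)
        return 0.0;
    for (i = 0; i < len; i++)
        count[buf[i]]++;
    for (i = 0; i < 256; i++) {
        if (count[i]) {
            double p = (double)count[i] / (double)len;
            h -= p * log(p) / log(2.0);   /* Shannon entropy in bits */
        }
    }
    return h;   /* maximum possible value is 8.0 */
}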

10.8 Hiding Ciphertext in Ciphertext

Alice and Bob have been sending encrypted messages to each other for the past year. Eve has been

collecting them all, but she cannot decrypt any of them. Finally, the secret police tire of all this

unreadable ciphertext and arrest the pair. "Give us your encryption keys," they demand. Alice and

Bob refuse, but then they notice the thumbscrews. What can they do?

Wouldn't it be nice to be able to encrypt a file such that there are two possible decryptions, each with a different key? Alice could encrypt a real message to Bob in one of the keys and some innocuous

message in the other key. If Alice were caught, she could surrender the key to the innocuous message

and keep the real key secret.

The easiest way to do this is with one-time pads. Let P be the plaintext, D the dummy plaintext, C the


ciphertext, K the real key, and K' the dummy key. Alice encrypts P:

P ⊕ K = C

Alice and Bob share K, so Bob can decrypt C:

C ⊕ K = P

If the secret police ever force them to surrender their key, they don't surrender K, but instead surrender:

K' = C ⊕ D

The police then recover the dummy plaintext:

C ⊕ K' = D

Since these are one-time pads and K is completely random, there is no way to prove that K' was not

the real key. To make matters more convincing, Alice and Bob should concoct some mildly

incriminating dummy messages to take the place of the really incriminating real messages. A pair of

Israeli spies once did this.
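In code, the dummy key is just a byte-by-byte XOR. A minimal sketch (not from the book), assuming C, D, and K' are equal-length byte buffers:

/* A minimal sketch (not from the book): compute the dummy one-time-pad key
 * K' = C XOR D, so that "decrypting" C with K' yields the dummy plaintext D. */
#include <stddef.h>

void dummy_key(const unsigned char *c, const unsigned char *d,
               unsigned char *k_prime, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        k_prime[i] = c[i] ^ d[i];   /* K'[i] = C[i] XOR D[i] */
}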

Alice could take P and encrypt it with her favorite algorithm and key K to get C. Then she takes C and

XORs it with some piece of mundane plaintext, Pride and Prejudice for example, to get K'. She stores both C and the XOR on her hard disk. Now, when the secret police interrogate her, she can explain that she is an amateur cryptographer and that K' is merely a one-time pad for C. The secret police might suspect something, but unless they know K they cannot prove that Alice's explanation isn't

valid.

Another method is to encrypt P with a symmetric algorithm and K, and D with K'. Intertwine bits (or bytes) of the two ciphertexts to make the final ciphertext. If the secret police demand the key, Alice gives them K' and says that the alternating bits (or bytes) are random noise designed to frustrate cryptanalysis. The trouble is that the explanation is so implausible that the secret police will probably not believe her (especially considering it is suggested in this book).

A better way is for Alice to create a dummy message, D, such that the concatenation of P and D, compressed, is about the same size as D. Call this concatenation P'. Alice then encrypts P' with whatever algorithm she and Bob share to get C. Then she sends C to Bob. Bob decrypts C to get P', and then P and D. Then they both compute C ⊕ D = K'. This K' becomes the dummy one-time pad they use in case the secret police break their doors down. Alice has to transmit D so that her and Bob's alibis match.

Another method is for Alice to take an innocuous message and run it through some error-correcting

code. Then she can introduce errors that correspond to the secret encrypted message. On the

receiving end, Bob can extract the errors to reconstruct the secret message and decrypt it. He can also

use the error-correcting code to recover the innocuous message. Alice and Bob might be hard pressed

to explain to the secret police why they consistently get a 30 percent bit-error rate on an otherwise

noise-free computer network, but in some circumstances this scheme can work.

Finally, Alice and Bob can use the subliminal channels in their digital signature algorithms (see

Sections 4.2 and 23.3). This is undetectable, works great, but has the drawback of only allowing 20 or

so characters of subliminal text to be sent per signed innocuous message. It really isn't good for much


more than sending keys.

10.9 Destroying Information

When you delete a file on most computers, the file isn't really deleted. The only thing deleted is an entry in the disk's index file, telling the machine that the file is there. Many software vendors have

made a fortune selling file-recovery software that recovers files after they have been deleted.

And there's yet another worry: Virtual memory means your computer can read and write memory to disk any time. Even if you don't save it, you never know when a sensitive document you are working

on is shipped off to disk. This means that even if you never save your plaintext data, your computer

might do it for you. And driver-level compression programs like Stacker and DoubleSpace can make

it even harder to predict how and where information is stored on a disk.

To erase a file so that file-recovery software cannot read it, you have to physically write over all of the

file's bits on the disk. According to the National Computer Security Center [1148]:

Overwriting is a process by which unclassified data are written to storage locations that

previously held sensitive data.... To purge the...storage media, the DoD requires

overwriting with a pattern, then its complement, and finally with another pattern; e.g.,

overwrite first with 0011 0101, followed by 1100 1010, then 1001 0111. The number of

times an overwrite must be accomplished depends on the storage media, sometimes on its

sensitivity, and sometimes on different DoD component requirements. In any case, a purge

is not complete until a final overwrite is made using unclassified data.

You may have to erase files or you may have to erase entire drives. You should also erase all unused

space on your hard disk.

Most commercial programs that claim to implement the DoD standard overwrite three times: first

with all ones, then with all zeros, and finally with a repeating one-zero pattern. Given my general level

of paranoia, I recommend overwriting a deleted file seven times: the first time with all ones, the

second time with all zeros, and five times with a cryptographically secure pseudo-random sequence.
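A minimal sketch of the idea (not from the book, and not a certified sanitization tool): it overwrites a file in place with the three example patterns from the NCSC quotation above, and it assumes the filesystem really writes in place. Journaling filesystems, wear-leveled flash, and bad-block remapping can all defeat it, and it does not include the pseudo-random passes recommended above.

/* A minimal sketch: overwrite a file in place with fixed patterns.
 * Error handling is minimal; this is an illustration, not a sanitizer. */
#include <stdio.h>

static void overwrite_pass(FILE *f, long size, unsigned char pattern)
{
    long i;
    fseek(f, 0L, SEEK_SET);
    for (i = 0; i < size; i++)
        fputc(pattern, f);
    fflush(f);
}

int wipe_file(const char *path)
{
    /* 0011 0101, 1100 1010, 1001 0111 -- the NCSC example patterns */
    unsigned char pattern[3] = { 0x35, 0xCA, 0x97 };
    long size;
    int i;
    FILE *f = fopen(path, "r+b");

    if (f == NULL)
        return -1;
    fseek(f, 0L, SEEK_END);
    size = ftell(f);
    for (i = 0; i < 3; i++)
        overwrite_pass(f, size, pattern[i]);
    fclose(f);
    return 0;
}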

Recent developments at the National Institute of Standards and Technology with electron-tunneling

microscopes suggest even that might not be enough. Honestly, if your data is sufficiently valuable,

assume that it is impossible to erase data completely off magnetic media. Burn or shred the media; it's

cheaper to buy media new than to lose your secrets.

Part III

Cryptographic Algorithms

Chapter 11

Mathematical Background

11.1 Information Theory

Modern information theory was first published in 1948 by Claude Elwood Shannon [1431, 1432].

(His papers have been reprinted by the IEEE Press [1433].) For a good mathematical treatment of the

topic, consult [593]. In this section, I will just sketch some important ideas.


Entropy and Uncertainty

Information theory defines the amount of information in a message as the minimum number of bits

needed to encode all possible meanings of that message, assuming all messages are equally likely. For

example, the day-of-the-week field in a database contains no more than 3 bits of information, because

the information can be encoded with 3 bits:

000 = Sunday

001 = Monday

010 = Tuesday

011 = Wednesday

100 = Thursday

101 = Friday

110 = Saturday

111 is unused

If this information were represented by corresponding ASCII character strings, it would take up

more memory space but would not contain any more information. Similarly, the "sex" field of a database contains only 1 bit of information, even though it might be stored as one of two 6-byte ASCII strings: "MALE" or "FEMALE."

Formally, the amount of information in a message M is measured by the entropy of a message,

denoted by H(M). The entropy of a message indicating sex is 1 bit; the entropy of a message indicating

the day of the week is slightly less than 3 bits. In general, the entropy of a message measured in bits is

log2 n, in which n is the number of possible meanings. This assumes that each meaning is equally

likely.
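As a quick check of the day-of-the-week example (my arithmetic, not the book's), the entropy of a field with seven equally likely values is

H(M) = \log_2 7 \approx 2.81 \text{ bits}

which is why the field is said to carry slightly less than the 3 bits used to encode it.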

The entropy of a message also measures its uncertainty. This is the number of plaintext bits that must be recovered when the message is scrambled in ciphertext in order to learn the plaintext. For example, if the ciphertext block "QHP*5M" is either "MALE" or "FEMALE," then the uncertainty of the message is 1. A cryptanalyst has to learn only one well-chosen bit to recover the message.

Rate of a Language

For a given language, the rate of the language is

r = H(M)/N

in which N is the length of the message. The rate of normal English takes various values between 1.0

bits/letter and 1.5 bits/letter, for large values of N. Shannon, in [1434], said that the entropy depends

on the length of the text. Specifically he indicated a rate of 2.3 bits/letter for 8-letter chunks, but the

rate drops to between 1.3 and 1.5 for 16-letter chunks. Thomas Cover used a gambling estimating

technique and found an entropy of 1.3 bits/character [386]. (I'll use 1.3 in this book.) The absolute

rate of a language is the maximum number of bits that can be coded in each character, assuming each

character sequence is equally likely. If there are L characters in a language, the absolute rate is:

R = log2 L

This is the maximum entropy of the individual characters.

For English, with 26 letters, the absolute rate is log2 26, or about 4.7 bits/letter. It should come as no

surprise to anyone that the actual rate of English is much less than the absolute rate; natural language


is highly redundant.

The redundancy of a language, denoted D, is defined by:

D = R - r

Given that the rate of English is 1.3, the redundancy is 3.4 bits/letter. This means that each English

character carries 3.4 bits of redundant information.

An ASCII message that is nothing more than printed English has 1.3 bits of information per byte of

message. This means it has 6.7 bits of redundant information, giving it an overall redundancy of 0.84

bits of information per bit of ASCII text, and an entropy of 0.16 bits of information per bit of ASCII

text. The same message in BAUDOT, at 5 bits per character, has a redundancy of 0.74 bits per bit and

an entropy of 0.26 bits per bit. Spacing, punctuation, numbers, and formatting modify these results.

Security of a Cryptosystem

Shannon defined a precise mathematical model of what it means for a cryptosystem to be secure. The

goal of a cryptanalyst is to determine the key K, the plaintext P, or both. However, he may be satisfied

with some probabilistic information about P: whether it is digitized audio, German text, spreadsheet

data, or something else.

In most real-world cryptanalysis, the cryptanalyst has some probabilistic information about P before

he even starts. He probably knows the language of the plaintext. This language has a certain

redundancy associated with it. If it is a message to Bob, it probably begins with "Dear Bob." Certainly "Dear Bob" is more probable than "e8T&g [, m." The purpose of cryptanalysis is to modify the

probabilities associated with each possible plaintext. Eventually one plaintext will emerge from the

pile of possible plaintexts as certain (or at least, very probable).

There is such a thing as a cryptosystem that achieves perfect secrecy: a cryptosystem in which the

ciphertext yields no possible information about the plaintext (except possibly its length). Shannon

theorized that it is only possible if the number of possible keys is at least as large as the number of

possible messages. In other words, the key must be at least as long as the message itself, and no key

can be reused. In still other words, the one-time pad (see Section 1.5) is the only cryptosystem that

achieves perfect secrecy.

Perfect secrecy aside, the ciphertext unavoidably yields some information about the corresponding

plaintext. A good cryptographic algorithm keeps this information to a minimum; a good cryptanalyst

exploits this information to determine the plaintext.

Cryptanalysts use the natural redundancy of language to reduce the number of possible plaintexts.

The more redundant the language, the easier it is to cryptanalyze. This is the reason that many real-

world cryptographic implementations use a compression program to reduce the size of the text before

encrypting it. Compression reduces the redundancy of a message as well as the work required to

encrypt and decrypt.

The entropy of a cryptosystem is a measure of the size of the keyspace, K. It is approximated by the

base two logarithm of the number of keys:

H(K) = log2 K

A cryptosystem with a 64-bit key has an entropy of 64 bits; a cryptosystem with a 56-bit key has an


entropy of 56 bits. In general, the greater the entropy, the harder it is to break a cryptosystem.

Unicity Distance

For a message of length n, the number of different keys that will decipher a ciphertext message to

some intelligible plaintext in the same language as the original plaintext (such as an English text

string) is given by the following formula [712,95]:

2^(H(K) - nD) - 1

Shannon [1432] defined the unicity distance, U, also called the unicity point, as an approximation of

the amount of ciphertext such that the sum of the real information (entropy) in the corresponding

plaintext plus the entropy of the encryption key equals the number of ciphertext bits used. He then

went on to show that ciphertexts longer than this distance are reasonably certain to have only one

meaningful decryption. Ciphertexts significantly shorter than this are likely to have multiple, equally

valid decryptions and therefore gain security from the opponent's difficulty in choosing the correct

one.

For most symmetric cryptosystems, the unicity distance is defined as the entropy of the cryptosystem

divided by the redundancy of the language.

U = H(K)/D

Unicity distance does not make deterministic predictions, but gives probabilistic results. Unicity

distance estimates the minimum amount of ciphertext for which it is likely that there is only a single

intelligible plaintext decryption when a brute-force attack is attempted. Generally, the longer the

unicity distance, the better the cryptosystem. For DES, with a 56-bit key, and an ASCII English

message, the unicity distance is about 8.2 ASCII characters or 66 bits. Table 11.1 gives the unicity

distances for varying key lengths. The unicity distances for some classical cryptosystems are found in

[445].
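For the record, here is the arithmetic behind the 8.2-character figure (my calculation; it assumes a redundancy of roughly 6.8 bits per 8-bit ASCII character, which is what Table 11.1 implies):

U = \frac{H(K)}{D} \approx \frac{56}{6.8} \approx 8.2 \text{ characters} \approx 66 \text{ bits}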

Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but how much

ciphertext is required for there to be only one reasonable solution for cryptanalysis. A cryptosystem

may be computationally infeasible to break even if it is theoretically possible to break it with a small

amount of ciphertext. (The largely esoteric theory of relativized cryptography is relevant here [230,

231, 232, 233, 234, 235].) The unicity distance is inversely proportional to the redundancy. As

redundancy approaches zero, even a trivial cipher can be unbreakable with a ciphertext-only attack.

Shannon defined a cryptosystem whose unicity distance is infinite as one that has ideal secrecy. Note

that an ideal cryptosystem is not necessarily a perfect cryptosystem, although a perfect cryptosystem

would necessarily be an ideal cryptosystem. If a cryptosystem has ideal secrecy, even successful

cryptanalysis will leave some uncertainty about whether the recovered plaintext is the real plaintext.

Information Theory in Practice

While these concepts have great theoretical value, actual cryptanalysis seldom proceeds along these

lines. Unicity distance guarantees insecurity if it's too small but does not guarantee security if it's

high. Few practical algorithms are absolutely impervious to analysis; all manner of characteristics

might serve as entering wedges to crack some encrypted messages. However, similar information

theory considerations are occasionally useful, for example, to determine a recommended key change

interval for a particular algorithm. Cryptanalysts also employ a variety of statistical and information


theory tests to help guide the analysis in the most promising directions. Unfortunately, most literature

on applying information theory to cryptanalysis remains classified, including the seminal 1940 work

of Alan Turing.

Table 11.1

Unicity Distances of ASCII Text Encrypted

with Algorithms with Varying Key Lengths

Key Length (in bits) Unicity Distance (in characters)

40 5.9

56 8.2

64 9.4

80 11.8

128 18.8

256 37.6

Confusion and Diffusion

The two basic techniques for obscuring the redundancies in a plaintext message are, according to

Shannon, confusion and diffusion [1432].

Confusion obscures the relationship between the plaintext and the ciphertext. This frustrates attempts

to study the ciphertext looking for redundancies and statistical patterns. The easiest way to do this is

through substitution. A simple substitution cipher, like the Caesar Cipher, is one in which every

identical letter of plaintext is substituted for a single letter of ciphertext. Modern substitution ciphers

are more complex: A long block of plaintext is substituted for a different block of ciphertext, and the

mechanics of the substitution change with each bit in the plaintext or key. This type of substitution is

not necessarily enough; the German Enigma is a complex substitution algorithm that was broken

during World War II.

Diffusion dissipates the redundancy of the plaintext by spreading it out over the ciphertext. A

cryptanalyst looking for those redundancies will have a harder time finding them. The simplest way

to cause diffusion is through transposition (also called permutation). A simple transposition cipher,

like columnar transposition, simply rearranges the letters of the plaintext. Modern ciphers do this

type of permutation, but they also employ other forms of diffusion that can diffuse parts of the

message throughout the entire message.

Stream ciphers rely on confusion alone, although some feedback schemes add diffusion. Block

algorithms use both confusion and diffusion. As a general rule, diffusion alone is easily cracked

(although double transposition ciphers hold up better than many other pencil-and-paper systems).

11.2 Complexity Theory

Complexity theory provides a methodology for analyzing the computational complexity of different

cryptographic techniques and algorithms. It compares cryptographic algorithms and techniques and

determines their security. Information theory tells us that all cryptographic algorithms (except one-

time pads) can be broken. Complexity theory tells us whether they can be broken before the heat

death of the universe.


Complexity of Algorithms

An algorithm's complexity is determined by the computational power needed to execute it. The

computational complexity of an algorithm is often measured by two variables: T (for time complexity)

and S (for space complexity, or memory requirement). Both T and S are commonly expressed as

functions of n, where n is the size of the input. (There are other measures of complexity: the number

of random bits, the communications bandwidth, the amount of data, and so on.)

Generally, the computational complexity of an algorithm is expressed in what is called "big O" notation: the order of magnitude of the computational complexity. It's just the term of the complexity function which grows the fastest as n gets larger; all lower-order terms are ignored. For example, if the time complexity of a given algorithm is 4n^2 + 7n + 12, then the computational complexity is on the order of n^2, expressed O(n^2).

Measuring time complexity this way is system-independent. You don't have to know the exact timings

of various instructions or the number of bits used to represent different variables or even the speed of

the processor. One computer might be 50 percent faster than another and a third might have a data

path twice as wide, but the order-of-magnitude complexity of an algorithm remains the same. This

isn't cheating; when you're dealing with algorithms as complex as the ones presented here, the other stuff is negligible (is a constant factor) compared to the order-of-magnitude complexity.

This notation allows you to see how the input size affects the time and space requirements. For

example, if T = O(n), then doubling the input size doubles the running time of the algorithm. If T = O(2^n), then adding one bit to the input size doubles the running time of the algorithm (within a constant factor).

Generally, algorithms are classified according to their time or space complexities. An algorithm is constant if its complexity is independent of n: O(1). An algorithm is linear if its time complexity is O(n). Algorithms can also be quadratic, cubic, and so on. All these algorithms are polynomial; their complexity is O(n^m), where m is a constant. The class of algorithms that have a polynomial time complexity are called polynomial-time algorithms.

Algorithms whose complexities are O(t^f(n)), where t is a constant greater than 1 and f(n) is some polynomial function of n, are called exponential. The subset of exponential algorithms whose complexities are O(c^f(n)), where c is a constant and f(n) is more than constant but less than linear, is called superpolynomial.

Ideally, a cryptographer would like to be able to say that the best algorithm to break this encryption

algorithm is of exponential-time complexity. In practice, the strongest statements that can be made,

given the current state of the art of computational complexity theory, are of the form "all known cracking algorithms for this cryptosystem are of superpolynomial-time complexity." That is, the

cracking algorithms that we know are of superpolynomial-time complexity, but it is not yet possible to

prove that no polynomial-time cracking algorithm could ever be discovered. Advances in

computational complexity may some day make it possible to design algorithms for which the existence

of polynomial-time cracking algorithms can be ruled out with mathematical certainty.

As n grows, the time complexity of an algorithm can make an enormous difference in whether the

algorithm is practical. Table 11.2 shows the running times for different algorithm classes in which n

equals one million. The table ignores constants, but also shows why ignoring constants is reasonable.


Table 11.2

Running Times of Different Classes of Algorithms

Class        Complexity   # of Operations for n = 10^6   Time at 10^6 O/S
Constant     O(1)         1                              1 µsec.
Linear       O(n)         10^6                           1 sec.
Quadratic    O(n^2)       10^12                          11.6 days
Cubic        O(n^3)       10^18                          32,000 yrs.
Exponential  O(2^n)       10^301,030                     10^301,006 times the age of the universe

Assuming that the unit of â€śtimeâ€ť for our computer is a microsecond, the computer can complete a

constant algorithm in a microsecond, a linear algorithm in a second, and a quadratic algorithm in

11.6 days. It would take 32,000 years to complete a cubic algorithm; not terribly practical, but a

computer built to withstand the next ice age would deliver a solution eventually. Performing the

exponential algorithm is futile, no matter how well you extrapolate computing power, parallel

processing, or contact with superintelligent aliens.

Look at the problem of a brute-force attack against an encryption algorithm. The time complexity of

this attack is proportional to the number of possible keys, which is an exponential function of the key

length. If n is the length of the key, then the complexity of a brute-force attack is O(2n). Section 12.3

discusses the controversy surrounding a 56-bit key for DES instead of a 112-bit key. The complexity

of a brute-force attack against a 56-bit key is 256; against a 112-bit key the complexity is 2112. The

former is possible; the latter isnâ€™t.
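For a sense of scale (my arithmetic, using the same illustrative rate of one operation per microsecond as Table 11.2):

2^{56} \approx 7.2 \times 10^{16} \text{ keys} \approx 2300 \text{ years at } 10^6 \text{ keys/second}

2^{112} \approx 5.2 \times 10^{33} \text{ keys} \approx 1.6 \times 10^{20} \text{ years}

Faster or massively parallel hardware shrinks the first number into practicality; nothing plausible touches the second.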

Complexity of Problems

Complexity theory also classifies the inherent complexity of problems, not just the complexity of

particular algorithms used to solve problems. (Excellent introductions to this topic are [600, 211,

1226]; see also [1096, 27, 739].) The theory looks at the minimum time and space required to solve the

hardest instance of a problem on a theoretical computer known as a Turing machine. A Turing

machine is a finite-state machine with an infinite read-write memory tape. It turns out that a Turing

machine is a realistic model of computation.

Problems that can be solved with polynomial-time algorithms are called tractable, because they can

usually be solved in a reasonable amount of time for reasonable-sized inputs. (The exact definition of "reasonable" depends on the circumstance.) Problems that cannot be solved in polynomial time are

called intractable, because calculating their solution quickly becomes infeasible. Intractable problems

are sometimes just called hard. Problems that can only be solved with algorithms that are

superpolynomial are computationally intractable, even for relatively small values of n.

It gets worse. Alan Turing proved that some problems are undecidable. It is impossible to devise any algorithm to solve them, regardless of the algorithm's time complexity.

Problems can be divided into complexity classes, which depend on the complexity of their solutions.

Figure 11.1 shows the more important complexity classes and their presumed relationships.

(Unfortunately, not much about this material has been proved mathematically.)


On the bottom, the class P consists of all problems that can be solved in polynomial time. The class NP

consists of all problems that can be solved in polynomial time only on a nondeterministic Turing

machine: a variant of a normal Turing machine that can make guesses. The machine guesses the

solution to the problem (either by making "lucky guesses" or by trying all guesses in parallel) and

checks its guess in polynomial time.

NP's relevance to cryptography is this: Many symmetric algorithms and all public-key algorithms

can be cracked in nondeterministic polynomial time. Given a ciphertext C, the cryptanalyst simply

guesses a plaintext, X, and a key, k, and in polynomial time runs the encryption algorithm on inputs X

and k and checks whether the result is equal to C. This is important theoretically, because it puts an

upper bound on the complexity of cryptanalysis for these algorithms. In practice, of course, it is a

deterministic polynomial-time algorithm that the cryptanalyst seeks. Furthermore, this argument is

not applicable to all classes of ciphers; in particular, it is not applicable to one-time pads: for any C, there are many X, k pairs that yield C when run through the encryption algorithm, but most of these Xs are nonsense, not legitimate plaintexts.

Figure 11.1 Complexity classes.

The class NP includes the class P, because any problem solvable in polynomial time on a deterministic

Turing machine is also solvable in polynomial time on a nondeterministic Turing machine; the

guessing stage can simply be omitted.

If all NP problems are solvable in polynomial time on a deterministic machine, then P = NP. Although

it seems obvious that some NP problems are much harder than others (a brute-force attack against an

encryption algorithm versus encrypting a random block of plaintext), it has never been proven that P ≠ NP (or that P = NP). However, most people working in complexity theory believe that they are unequal.

Stranger still, specific problems in NP can be proven to be as difficult as any problem in the class.

Stephen Cook [365] proved that the Satisfiability problem (given a propositional Boolean formula, is there a way to assign truth values to the variables that makes the formula true?) is NP-complete. This

means that, if Satisfiability is solvable in polynomial time, then P = NP. Conversely, if any problem in

NP can be proven not to have a deterministic polynomial-time algorithm, the proof will show that

Satisfiability does not have a deterministic polynomial-time algorithm either. No problem is harder

than Satisfiability in NP.

Since Cookâ€™s seminal paper was published, a huge number of problems have been shown to be

equivalent to Satisfiability; hundreds are listed in [600], and some examples follow. By equivalent, I

mean that these problems are also NP-complete; they are in NP and also as hard as any problem in NP. If their solvability in deterministic polynomial time were resolved, the P versus NP question

would be solved. The question of whether P = NP is the central unsolved question of computational

complexity theory, and no one expects it to be solved anytime soon. If someone showed that P = NP,


then most of this book would be irrelevant: As previously explained, many classes of ciphers are

trivially breakable in nondeterministic polynomial time. If P = NP, they are breakable by feasible,

deterministic algorithms.

Further out in the complexity hierarchy is PSPACE. Problems in PSPACE can be solved in polynomial space, but not necessarily polynomial time. PSPACE includes NP, but some problems in PSPACE are thought to be harder than NP. Of course, this isn't proven either. There is a class of problems, the so-called PSPACE-complete problems, with the property that, if any one of them is in NP then PSPACE = NP, and if any one of them is in P then PSPACE = P.

And finally, there is the class of problems called EXPTIME. These problems are solvable in exponential time. The EXPTIME-complete problems can actually be proven not to be solvable in deterministic polynomial time. It has been shown that P does not equal EXPTIME.

NP-Complete Problems

Michael Garey and David Johnson compiled a list of over 300 NP-complete problems [600]. Here are

just a few of them:

- Traveling Salesman Problem. A traveling salesman has to visit n different cities using only one tank of gas (there is a maximum distance he can travel). Is there a route that allows him to visit each city exactly once on that single tank of gas? (This is a generalization of the Hamiltonian Cycle problem; see Section 5.1.)

- Three-Way Marriage Problem. In a room are n men, n women, and n clergymen (priests, rabbis, whatever). There is also a list of acceptable marriages, which consists of one man, one woman, and one clergyman willing to officiate. Given this list of possible triples, is it possible to arrange n marriages such that everyone is either marrying one person or officiating at one marriage?

- Three-Satisfiability. There is a list of n logical statements, each with three variables. For example: if (x and y) then z, (x and w) or (not z), if ((not u and not x) or (z and (u or not x))) then ((not z and u) or x), and so on. Is there a truth assignment for all the variables that satisfies all the statements? (This is a special case of the Satisfiability problem previously mentioned.)

11.3 Number Theory

This isn't a book on number theory, so I'm just going to sketch a few ideas that apply to

cryptography. If you want a detailed mathematical text on number theory, consult one of these books:

[1430, 72, 1171, 12, 959, 681, 742, 420]. My two favorite books on the mathematics of finite fields are

[971, 1042]. See also [88, 1157, 1158, 1060].

Modular Arithmetic

You all learned modular arithmetic in school; it was called "clock arithmetic." Remember these word problems? If Mildred says she'll be home by 10:00, and she's 13 hours late, what time does she get home and for how many years does her father ground her? That's arithmetic modulo 12. Twenty-three modulo 12 equals 11.

(10 + 13) mod 12 = 23 mod 12 = 11 mod 12

Another way of writing this is to say that 23 and 11 are equivalent, modulo 12:


23 ≡ 11 (mod 12)

Basically, a ≡ b (mod n) if a = b + kn for some integer k. If a is non-negative and b is between 0 and n, you can think of b as the remainder of a when divided by n. Sometimes, b is called the residue of a, modulo n. Sometimes a is called congruent to b, modulo n (the triple equals sign, ≡, denotes congruence). These are just different ways of saying the same thing.

The set of integers from 0 to n - 1 form what is called a complete set of residues modulo n. This means

that, for every integer a, its residue modulo n is some number from 0 to n - 1.

The operation a mod n denotes the residue of a, such that the residue is some integer from 0 to n - 1.

This operation is modular reduction. For example, 5 mod 3 = 2.

This definition of mod may be different from the definition used in some programming languages. For

example, PASCAL's modulo operator sometimes returns a negative number. It returns a number

between -(n - 1) and n - 1. In C, the % operator returns the remainder from the division of the first

expression by the second; this can be a negative number if either operand is negative. For all the

algorithms in this book, make sure you add n to the result of the modulo operator if it returns a

negative number.
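A small sketch of that adjustment (not from the book), wrapping the C % operator:

/* A small sketch: a modular reduction that always returns a value in the
 * range 0..n-1, even when a is negative. Assumes n > 0. */
int mod_nonneg(int a, int n)
{
    int r = a % n;
    return (r < 0) ? r + n : r;
}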

Modular arithmetic is just like normal arithmetic: It's commutative, associative, and distributive.

Also, reducing each intermediate result modulo n yields the same result as doing the whole calculation

and then reducing the end result modulo n.

(a + b) mod n = ((a mod n) + (b mod n)) mod n

(a - b) mod n = ((a mod n) - (b mod n)) mod n

(a*b) mod n = ((a mod n)*(b mod n)) mod n

(a*(b + c)) mod n = (((a*b) mod n) + ((a*c) mod n)) mod n

Cryptography uses computation mod n a lot, because calculating discrete logarithms and square roots

mod n can be hard problems. Modular arithmetic is also easier to work with on computers, because it

restricts the range of all intermediate values and the result. For a k-bit modulus, n, the intermediate results of any addition, subtraction, or multiplication will not be more than 2k bits long. So we can perform exponentiation in modular arithmetic without generating huge intermediate results.

Calculating the power of some number modulo some number,

a^x mod n,

is just a series of multiplications and divisions, but there are speedups. One kind of speedup aims to

minimize the number of modular multiplications; another kind aims to optimize the individual

modular multiplications. Because the operations are distributive, it is faster to do the exponentiation

as a stream of successive multiplications, taking the modulus every time. It doesn't make much difference now, but it will when you're working with 200-bit numbers.

For example, if you want to calculate a^8 mod n, don't use the naïve approach and perform seven

multiplications and one huge modular reduction:

(a*a*a*a*a*a*a*a) mod n

Instead, perform three smaller multiplications and three smaller modular reductions:


((a^2 mod n)^2 mod n)^2 mod n

By the same token,

a^16 mod n = (((a^2 mod n)^2 mod n)^2 mod n)^2 mod n

Computing a^x mod n, where x is not a power of 2, is only slightly harder. Binary notation expresses x as a sum of powers of 2: 25 is 11001 in binary, so 25 = 2^4 + 2^3 + 2^0. So

a^25 mod n = (a*a^24) mod n = (a*a^8*a^16) mod n

= (a*((a^2)^2)^2*(((a^2)^2)^2)^2) mod n = ((((a^2*a)^2)^2)^2*a) mod n

With judicious storing of intermediate results, you only need six multiplications:

(((((((a^2 mod n)*a) mod n)^2 mod n)^2 mod n)^2 mod n)*a) mod n

This is called addition chaining [863], or the binary square and multiply method. It uses a simple and

obvious addition chain based on the binary representation. In C, it looks like:

unsigned long qe2(unsigned long x, unsigned long y, unsigned long n) {
    unsigned long s, t, u;
    int i;

    s = 1; t = x; u = y;
    while (u) {
        if (u & 1) s = (s*t) % n;
        u >>= 1;
        t = (t*t) % n;
    }
    return (s);
}

Another, recursive, algorithm is:

unsigned long fast_exp(unsigned long x, unsigned long y, unsigned long N) {
    unsigned long tmp;

    if (y == 1) return (x % N);
    if ((y & 1) == 0) {
        tmp = fast_exp(x, y/2, N);
        return ((tmp*tmp) % N);
    }
    else {
        tmp = fast_exp(x, (y-1)/2, N);
        tmp = (tmp*tmp) % N;
        tmp = (tmp*x) % N;
        return (tmp);
    }
}
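A quick sanity check of either routine (my example, not from the book): 5^25 mod 7 is 5, which you can verify by hand since 5^6 mod 7 = 1 and 25 = 4*6 + 1.

#include <stdio.h>

/* assumes the qe2() and fast_exp() definitions above are in the same file */
int main(void)
{
    printf("%lu\n", qe2(5, 25, 7));       /* prints 5 */
    printf("%lu\n", fast_exp(5, 25, 7));  /* prints 5 */
    return 0;
}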

This technique reduces the operation to, on the average, 1.5*k operations, if k is the length of the exponent in bits. Finding the calculation with the fewest operations is a hard problem (it has been proven that the sequence must contain at least k - 1 operations), but it is not too hard to get the number of operations down to 1.1*k or better, as k grows.

An efficient way to do modular reductions many times using the same n is Montgomery's method [1111]. Another method is called Barrett's algorithm [87]. The software performance of these two


algorithms and the algorithm previously discussed is in [210]: The algorithm I've discussed is the best choice for singular modular reductions; Barrett's algorithm is the best choice for small arguments; and Montgomery's method is the best choice for general modular exponentiations. (Montgomery's method can also take advantage of small exponents, using something called mixed arithmetic.)

The inverse of exponentiation modulo n is calculating a discrete logarithm. I'll discuss this shortly.

Prime Numbers

A prime number is an integer greater than 1 whose only factors are 1 and itself: No other number

evenly divides it. Two is a prime number. So are 73, 2521, 2365347734339, and 2^756839 - 1. There are an infinite number of primes. Cryptography, especially public-key cryptography, uses large primes (512 bits and even larger) often.

Evangelos Kranakis wrote an excellent book on number theory, prime numbers, and their

applications to cryptography [896]. Paulo Ribenboim wrote two excellent references on prime

numbers in general [1307, 1308].

Greatest Common Divisor

Two numbers are relatively prime when they share no factors in common other than 1; in other words, when the greatest common divisor of a and n is equal to 1. This is written:

gcd(a,n) = 1

The numbers 15 and 28 are relatively prime, 15 and 27 are not, and 13 and 500 are. A prime number

is relatively prime to all other numbers except its multiples.

One way to compute the greatest common divisor of two numbers is with Euclid's algorithm. Euclid described the algorithm in his book, Elements, written around 300 B.C. He didn't invent it. Historians

believe the algorithm could be 200 years older. It is the oldest nontrivial algorithm that has survived

to the present day, and it is still a good one. Knuth describes the algorithm and some modern

modifications [863].

In C:

/* returns gcd of x and y */
int gcd (int x, int y)
{
    int g;

    if (x < 0)
        x = -x;
    if (y < 0)
        y = -y;
    if (x + y == 0)
        ERROR;
    g = y;
    while (x > 0) {
        g = x;
        x = y % x;
        y = g;
    }
    return g;
}

This algorithm can be generalized to return the gcd of an array of m numbers:


/* returns the gcd of x1, x2...xm */
int multiple_gcd (int m, int *x)
{
    size_t i;
    int g;

    if (m < 1)
        return 0;
    g = x[0];
    for (i = 1; i < m; ++i) {
        g = gcd(g, x[i]);
        /* optimization, since for random x[i], g==1 60% of the time: */
        if (g == 1)
            return 1;
    }
    return g;
}

Inverses Modulo a Number

Remember inverses? The multiplicative inverse of 4 is 1/4, because 4*1/4 = 1. In the modulo world, the

problem is more complicated:

4*x ≡ 1 (mod 7)

This equation is equivalent to finding an x and k such that

4x = 7k + 1

where both x and k are integers.

The general problem is finding an x such that

1 = (a*x) mod n

This is also written as

a^-1 ≡ x (mod n)

The modular inverse problem is a lot more difficult to solve. Sometimes it has a solution, sometimes

not. For example, the inverse of 5, modulo 14, is 3. On the other hand, 2 has no inverse modulo 14.

In general, a^-1 ≡ x (mod n) has a unique solution if a and n are relatively prime. If a and n are not relatively prime, then a^-1 ≡ x (mod n) has no solution. If n is a prime number, then every number from 1 to n - 1 is relatively prime to n and has exactly one inverse modulo n in that range.

So far, so good. Now, how do you go about finding the inverse of a modulo n? There are a couple of

ways. Euclid's algorithm can also compute the inverse of a number modulo n. Sometimes this is called

the extended Euclidean algorithm.

Here's the algorithm in C++:

#include <iostream>   // for cout, cerr (not in the original listing)
#include <cstdlib>    // for atoi (not in the original listing)
using namespace std;

#define isEven(x) ((x & 0x01) == 0)
#define isOdd(x)  (x & 0x01)
#define swap(x,y) (x ^= y, y ^= x, x ^= y)

void ExtBinEuclid(int *u, int *v, int *u1, int *u2, int *u3)
{
    // warning: u and v will be swapped if u < v
    int k, t1, t2, t3;

    if (*u < *v) swap(*u,*v);
    for (k = 0; isEven(*u) && isEven(*v); ++k) {
        *u >>= 1; *v >>= 1;
    }
    *u1 = 1; *u2 = 0; *u3 = *u; t1 = *v; t2 = *u-1; t3 = *v;
    do {
        do {
            if (isEven(*u3)) {
                if (isOdd(*u1) || isOdd(*u2)) {
                    *u1 += *v; *u2 += *u;
                }
                *u1 >>= 1; *u2 >>= 1; *u3 >>= 1;
            }
            if (isEven(t3) || *u3 < t3) {
                swap(*u1,t1); swap(*u2,t2); swap(*u3,t3);
            }
        } while (isEven(*u3));
        while (*u1 < t1 || *u2 < t2) {
            *u1 += *v; *u2 += *u;
        }
        *u1 -= t1; *u2 -= t2; *u3 -= t3;
    } while (t3 > 0);
    while (*u1 >= *v && *u2 >= *u) {
        *u1 -= *v; *u2 -= *u;
    }
    *u1 <<= k; *u2 <<= k; *u3 <<= k;
}

int main(int argc, char **argv) {
    int a, b, gcd;

    if (argc < 3) {
        cerr << "Usage: xeuclid u v" << endl;
        return -1;
    }
    int u = atoi(argv[1]);
    int v = atoi(argv[2]);
    if (u <= 0 || v <= 0) {
        cerr << "Arguments must be positive!" << endl;
        return -2;
    }
    // warning: u and v will be swapped if u < v
    ExtBinEuclid(&u, &v, &a, &b, &gcd);
    cout << a << " * " << u << " + (-"
         << b << ") * " << v << " = " << gcd << endl;
    if (gcd == 1)
        cout << "the inverse of " << v << " mod " << u << " is: "
             << u - b << endl;
    return 0;
}

I'm not going to prove that it works or give the theory behind it. Details can be found in [863], or in

any of the number theory texts previously listed.

The algorithm is iterative and can be slow for large numbers. Knuth showed that the average number

of divisions performed by the algorithm is:


0.843*log2(n) + 1.47

Solving for Coefficients

Euclid's algorithm can be used to solve this class of problems: Given an array of m variables x1,

x2,...xm, find an array of m coefficients, u1, u2...um, such that

u1*x1 + ...+ um*xm = 1

Fermat's Little Theorem

If m is a prime, and a is not a multiple of m, then Fermat's little theorem says

a^(m-1) ≡ 1 (mod m)

(Pierre de Fermat, pronounced "Fair-ma," was a French mathematician who lived from 1601 to 1665. This theorem has nothing to do with his last theorem.)
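A quick numerical check of the theorem (my arithmetic): with m = 7 and a = 2,

2^{7-1} = 2^6 = 64 = 9 \cdot 7 + 1 \equiv 1 \pmod{7}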

The Euler Totient Function

There is another method for calculating the inverse modulo n, but it's not always possible to use it. The reduced set of residues mod n is the subset of the complete set of residues that is relatively prime to n. For example, the reduced set of residues mod 12 is {1, 5, 7, 11}. If n is prime, then the reduced set of residues mod n is the set of all numbers from 1 to n - 1. The number 0 is never part of the reduced set of residues for any n not equal to 1.

The Euler totient function, also called the Euler phi function and written as φ(n), is the number of elements in the reduced set of residues modulo n. In other words, φ(n) is the number of positive integers less than n that are relatively prime to n (for any n greater than 1). (Leonhard Euler, pronounced "Oiler," was a Swiss mathematician who lived from 1707 to 1783.)

If n is prime, then φ(n) = n - 1. If n = pq, where p and q are prime, then φ(n) = (p - 1)(q - 1). These numbers appear in some public-key algorithms; this is why.

According to Euler's generalization of Fermat's little theorem, if gcd(a,n) = 1, then

a^φ(n) mod n = 1

Now it is easy to compute a^-1 mod n:

x = a^(φ(n)-1) mod n

For example, what is the inverse of 5, modulo 7? Since 7 is prime, φ(7) = 7 - 1 = 6. So, the inverse of 5, modulo 7, is

5^(6-1) mod 7 = 5^5 mod 7 = 3
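For a prime modulus this method is easy to code. A minimal sketch (not from the book), reusing the qe2() modular-exponentiation routine from earlier; since φ(p) = p - 1 when p is prime, the inverse of a is a^(p-2) mod p.

/* A minimal sketch: inverse of a modulo a prime p, computed as
 * a^(phi(p)-1) = a^(p-2) mod p. Assumes gcd(a,p) = 1 and the qe2()
 * routine shown earlier in this chapter. */
unsigned long inverse_mod_prime(unsigned long a, unsigned long p)
{
    return qe2(a, p - 2, p);
}

For example, inverse_mod_prime(5, 7) returns 3, matching the hand calculation above.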

Both methods for calculating inverses can be extended to solve for x in the general problem (if gcd

(a,n) = 1):


(a*x) mod n = b

Using Euler's generalization, solve

x = (b*a^(φ(n)-1)) mod n

Using Euclid's algorithm, solve

x = (b*(a^-1 mod n)) mod n

In general, Euclid's algorithm is faster than Euler's generalization for calculating inverses, especially for numbers in the 500-bit range. If gcd(a,n) ≠ 1, all is not lost. In this general case, (a*x) mod n = b can have multiple solutions or no solution.

Chinese Remainder Theorem

If you know the prime factorization of n, then you can use something called the Chinese remainder

theorem to solve a whole system of equations. The basic version of this theorem was discovered by the

first-century Chinese mathematician, Sun Tse.

In general, if the prime factorization of n is p1*p2*...*pt, then the system of equations

(x mod pi) = ai, where i = 1, 2,..., t

has a unique solution, x, where x is less than n. (Note that some primes can appear more than once.

For example, p1 might be equal to p2.) In other words, a number (less than the product of some

primes) is uniquely identified by its residues mod those primes.

For example, use 3 and 5 as primes, and 14 as the number. 14 mod 3 = 2, and 14 mod 5 = 4. There is

only one number less than 3*5 = 15 which has those residues: 14. The two residues uniquely

determine the number.

So, for an arbitrary a < p and b < q (where p and q are prime), there exists a unique x, where x is less

than pq, such that

x ≡ a (mod p), and x ≡ b (mod q)

To find this x, first use Euclid's algorithm to find u, such that

u*q ≡ 1 (mod p)

Then compute:

x = (((a - b)*u) mod p)*q + b
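A worked check of this formula (my arithmetic), using the earlier example with p = 3, q = 5, and x = 14, so that a = 2 and b = 4: Euclid's algorithm gives u = 2, since 2*5 = 10 ≡ 1 (mod 3), and then

x = (((2 - 4) \cdot 2) \bmod 3) \cdot 5 + 4 = ((-4) \bmod 3) \cdot 5 + 4 = 2 \cdot 5 + 4 = 14

taking the non-negative residue for the mod operation, as discussed earlier.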

Here is the Chinese remainder theorem in C:

/* r is the number of elements in arrays m and u;
   m is the array of (pairwise relatively prime) moduli
   u is the array of coefficients
   return value is n such that n == u[k]%m[k] (k=0..r-1) and
       n < m[0]*m[1]*...*m[r-1]
*/
/* totient() is left as an exercise to the reader. */
int chinese_remainder (size_t r, int *m, int *u)
{
    size_t i;
    int modulus;
    int n;

    modulus = 1;
    for (i = 0; i < r; ++i)
        modulus *= m[i];
    n = 0;
    for (i = 0; i < r; ++i) {
        n += u[i] * modexp(modulus / m[i], totient(m[i]), m[i]);
        n %= modulus;
    }
    return n;
}

The converse of the Chinese remainder theorem can also be used to find the solution to the problem:

if p and q are primes, and p is less than q, then there exists a unique x less than pq, such that

a ≡ x (mod p), and b ≡ x (mod q)

If a ≥ (b mod p), then

x = (((a - (b mod p))*u) mod p)*q + b

If a < (b mod p), then

x = (((a + p - (b mod p))*u) mod p)*q + b

Quadratic Residues

If p is prime, and a is greater than 0 and less than p, then a is a quadratic residue mod p if

x^2 ≡ a (mod p), for some x

Not all values of a satisfy this property. For a to be a quadratic residue modulo n, it must be a

quadratic residue modulo all the prime factors of n. For example, if p = 7, the quadratic residues are

1, 2, and 4:

1^2 = 1 ≡ 1 (mod 7)
2^2 = 4 ≡ 4 (mod 7)
3^2 = 9 ≡ 2 (mod 7)
4^2 = 16 ≡ 2 (mod 7)
5^2 = 25 ≡ 4 (mod 7)
6^2 = 36 ≡ 1 (mod 7)

Note that each quadratic residue appears twice on this list.


There are no values of x which satisfy any of these equations:

x^2 ≡ 3 (mod 7)
x^2 ≡ 5 (mod 7)
x^2 ≡ 6 (mod 7)

The quadratic nonresidues modulo 7, the numbers that are not quadratic residues, are 3, 5, and 6.

Although I will not do so here, it is easy to prove that, when p is odd, there are exactly (p - 1)/2 quadratic residues mod p and the same number of quadratic nonresidues mod p. Also, if a is a quadratic residue mod p, then a has exactly two square roots, one of them between 0 and (p - 1)/2, and the other between (p - 1)/2 and (p - 1). One of these square roots is also a quadratic residue mod p; this is called the principal square root.

If n is the product of two primes, p and q, there are exactly (p - 1)(q - 1)/4 quadratic residues mod n. A

quadratic residue mod n is a perfect square modulo n. This is because to be a square mod n, the

residue must be a square mod p and a square mod q. For example, there are 11 quadratic residues

mod 35: 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, and 30. Each quadratic residue has exactly four square roots.
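As a concrete check of the four-square-roots claim (my arithmetic), the quadratic residue 4 modulo 35 has square roots 2, 12, 23, and 33:

2^2 \equiv 12^2 \equiv 23^2 \equiv 33^2 \equiv 4 \pmod{35}

since 12^2 = 144 = 4 \cdot 35 + 4, 23^2 = 529 = 15 \cdot 35 + 4, and 33^2 = 1089 = 31 \cdot 35 + 4.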

Legendre Symbol

The Legendre symbol, written L(a,p), is defined when a is any integer and p is a prime greater than 2.

It is equal to 0, 1, or -1.

L(a,p) = 0 if a is divisible by p.

L(a,p) = 1 if a is a quadratic residue mod p.

L(a,p) = -1 if a is a quadratic nonresidue mod p.

One way to calculate L(a,p) is:

L(a,p) = a^((p-1)/2) mod p
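A minimal sketch of that formula in C (not from the book), reusing the qe2() modular-exponentiation routine from earlier. For an odd prime p the exponentiation yields 0, 1, or p - 1, and p - 1 is reported as -1:

/* A minimal sketch: Legendre symbol L(a,p) for an odd prime p, computed
 * as a^((p-1)/2) mod p using the qe2() routine shown earlier. */
int legendre(unsigned long a, unsigned long p)
{
    unsigned long r = qe2(a % p, (p - 1) / 2, p);

    if (r == 0) return 0;   /* a is divisible by p */
    if (r == 1) return 1;   /* a is a quadratic residue mod p */
    return -1;              /* a is a quadratic nonresidue mod p */
}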

Or you can use the following algorithm:

1. If a = 1, then L(a,p) = 1
