. 9
( 29)


Until very recently, all encryption products were in the form of specialized hardware. These
encryption/decryption boxes plugged into a communications line and encrypted all the data going
across that line. Although software encryption is becoming more prevalent today, hardware is still the
embodiment of choice for military and serious commercial applications. The NSA, for example, only
authorizes encryption in hardware. There are several reasons why this is so.

Table 10.5
Comparing File-Level and Driver-Level Encryption
File-Level Encryption Driver-Level Encryption
Ease of implementation and use. Temporary files, work files, and so forth can be kept
Flexible. on the secure drive.
Relatively small performance penalty. It™s harder to forget to re-encrypt something on this
Users can move files between different machines kind of system.
without problems.
Users can back files up without problems.
Security Issues:
Potential leakage through security-unconscious Lots of things can go wrong with a device-driver or
programs. (Program may write file to disk for memory-resident program.
temporary storage, for example.) Bad implementations will allow chosen-plaintext, or
Bad implementations may always re-encrypt with even chosen-ciphertext attacks.
same key for same password. If whole system is master-keyed under one
password, loss of that password means that the
attacker gets everything.
A more limited set of ciphers can reasonably be
used for this kind of application. For example, OFB
stream ciphers would not work.
Usability Problems:
User has to figure out what to do. There will be a performance penalty.
There may be different passwords for different files. The driver may interact in weird ways with
Manual encryption of selected files is the only access Windows, OS/2 DOS emulation, device drivers, and
control. so on.

The first is speed. As we will see in Part III, encryption algorithms consist of many complicated
operations on plaintext bits. These are not the sorts of operations that are built into your run-of-the-
mill computer. The two most common encryption algorithms, DES and RSA, run inefficiently on
general-purpose processors. While some cryptographers have tried to make their algorithms more

Page 189 of 666
Applied Cryptography: Second Edition - Bruce Schneier

suitable for software implementation, specialized hardware will always win a speed race.

Additionally, encryption is often a computation-intensive task. Tying up the computer™s primary
processor for this is inefficient. Moving encryption to another chip, even if that chip is just another
processor, makes the whole system faster.

The second reason is security. An encryption algorithm running on a generalized computer has no
physical protection. Mallory can go in with various debugging tools and surreptitiously modify the
algorithm without anyone ever realizing it. Hardware encryption devices can be securely
encapsulated to prevent this. Tamperproof boxes can prevent someone from modifying a hardware
encryption device. Special-purpose VLSI chips can be coated with a chemical such that any attempt to
access their interior will result in the destruction of the chip™s logic. The U.S. government™s Clipper
and Capstone chips (see Sections 24.16 and 24.17) are designed to be tamperproof. The chips can be
designed so that it is impossible for Mallory to read the unencrypted key.

IBM developed a cryptographic system for encrypting data and communications on mainframe
computers [515,1027]. It includes tamper-resistant modules to hold keys. This system is discussed in
Section 24.1.

Electromagnetic radiation can sometimes reveal what is going on inside a piece of electronic
equipment. Dedicated encryption boxes can be shielded, so that they leak no compromising
information. General-purpose computers can be shielded as well, but it is a far more complex
problem. The U.S. military calls this TEMPEST; it™s a subject well beyond the scope of this book.

The final reason for the prevalence of hardware is the ease of installation. Most encryption
applications don™t involve general-purpose computers. People may wish to encrypt their telephone
conversations, facsimile transmissions, or data links. It is cheaper to put special-purpose encryption
hardware in the telephones, facsimile machines, and modems than it is to put in a microprocessor and

Even when the encrypted data comes from a computer, it is easier to install a dedicated hardware
encryption device than it is to modify the computer™s system software. Encryption should be invisible;
it should not hamper the user. The only way to do this in software is to write encryption deep into the
operating system. This isn™t easy. On the other hand, even a computer neophyte can plug an
encryption box between his computer and his external modem.

The three basic kinds of encryption hardware on the market today are: self-contained encryption
modules (that perform functions such as password verification and key management for banks),
dedicated encryption boxes for communications links, and boards that plug into personal computers.

Some encryption boxes are designed for certain types of communications links, such as T-1 encryption
boxes that are designed not to encrypt synchronization bits. There are different boxes for
synchronous and asynchronous communications lines. Newer boxes tend to accept higher bit rates
and are more versatile.

Even so, many of these devices have some incompatibilities. Buyers should be aware of this and be
well-versed in their particular needs, lest they find themselves the owners of encryption equipment
unable to perform the task at hand. Pay attention to restrictions in hardware type, operating system,
applications software, network, and so forth.

PC-board encryptors usually encrypt everything written to the hard disk and can be configured to
encrypt everything sent to the floppy disk and serial port as well. These boards are not shielded
against electromagnetic radiation or physical interference, since there would be no benefit in

Page 190 of 666
Applied Cryptography: Second Edition - Bruce Schneier

protecting the boards if the computer remained unaffected.

More companies are starting to put encryption hardware into their communications equipment.
Secure telephones, facsimile machines, and modems are all available.

Internal key management for these devices is generally secure, although there are as many different
schemes as there are equipment vendors. Some schemes are more suited for one situation than
another, and buyers should know what kind of key management is incorporated into the encryption
box and what they are expected to provide themselves.


Any encryption algorithm can be implemented in software. The disadvantages are in speed, cost, and
ease of modification (or manipulation). The advantages are in flexibility and portability, ease of use,
and ease of upgrade. The algorithms written in C at the end of this book can be implemented, with
little modification, on any computer. They can be inexpensively copied and installed on many
machines. They can be incorporated into larger applications, such as communications programs or
word processors.

Software encryption programs are popular and are available for all major operating systems. These
are meant to protect individual files; the user generally has to manually encrypt and decrypt specific
files. It is important that the key management scheme be secure: The keys should not be stored on
disk anywhere (or even written to a place in memory from where the processor swaps out to disk).
Keys and unencrypted files should be erased after encryption. Many programs are sloppy in this
regard, and a user has to choose carefully.

Of course, Mallory can always replace the software encryption algorithm with something lousy. But
for most users, that isn™t a problem. If Mallory can break into our office and modify our encryption
program, he can also put a hidden camera on the wall, a wiretap on the telephone, and a TEMPEST
detector down the street. If Mallory is that much more powerful than the user, the user has lost the
game before it starts.

10.6 Compression, Encoding, and Encryption

Using a data compression algorithm together with an encryption algorithm makes sense for two

Cryptanalysis relies on exploiting redundancies in the plaintext; compressing a file before
encryption reduces these redundancies.
Encryption is time-consuming; compressing a file before encryption speeds up the entire

The important thing to remember is to compress before encryption. If the encryption algorithm is any
good, the ciphertext will not be compressible; it will look like random data. (This makes a reasonable
test of an encryption algorithm; if the ciphertext can be compressed, then the algorithm probably isn™t
very good.)

If you are going to add any type of transmission encoding or error detection and recovery, remember
to add that after encryption. If there is noise in the communications path, decryption™s error-
extension properties will only make that noise worse. Figure 10.3 summarizes these steps.

Page 191 of 666
Applied Cryptography: Second Edition - Bruce Schneier

10.7 Detecting Encryption

How does Eve detect an encrypted file? Eve is in the spy business, so this is an important question.
Imagine that she™s eavesdropping on a network where messages are flying in all directions at high
speeds; she has to pick out the interesting ones. Encrypted files are certainly interesting, but how does
she know they are encrypted?

Generally, she relies on the fact that most popular encryption programs have well-defined headers.
Electronic-mail messages encrypted with either PEM or PGP (see Sections 24.10 and 24.12) are easy
to identify for that reason.

Other file encryptors just produce a ciphertext file of seemingly random bits. How can she distinguish
it from any other file of seemingly random bits? There is no sure way, but Eve can try a number of

” Examine the file. ASCII text is easy to spot. Other file formats, such as TIFF, TeX, C,
Postscript, G3 facsimile, or Microsoft Excel, have standard identifying characteristics.
Executable code is detectable, as well. UNIX files often have “magic numbers” that can be

Figure 10.3 Encryption with compression and error control.

” Try to uncompress the file, using the major compression algorithms. If the file is
compressed (and not encrypted), this should yield the original file.
” Try to compress the file. If the file is ciphertext (and the algorithm is good), then the
probability that the file can be appreciably compressed by a general-purpose compression
routine is small. (By appreciably, I mean more than 1 or 2 percent.) If it is something else (a
binary image or a binary data file, for example) it probably can be compressed.

Any file that cannot be compressed and is not already compressed is probably ciphertext. (Of course,
it is possible to specifically make ciphertext that is compressible.) Identifying the algorithm is a whole
lot harder. If the algorithm is good, you can™t. If the algorithm has some slight biases, it might be
possible to recognize those biases in the file. However, the biases have to be pretty significant or the
file has to be pretty big in order for this to work.

10.8 Hiding Ciphertext in Ciphertext

Alice and Bob have been sending encrypted messages to each other for the past year. Eve has been
collecting them all, but she cannot decrypt any of them. Finally, the secret police tire of all this
unreadable ciphertext and arrest the pair. “Give us your encryption keys,” they demand. Alice and
Bob refuse, but then they notice the thumbscrews. What can they do?

Wouldn™t it be nice to be able to encrypt a file such that there are two possible decryptions, each with
a different key. Alice could encrypt a real message to Bob in one of the keys and some innocuous
message in the other key. If Alice were caught, she could surrender the key to the innocuous message
and keep the real key secret.

The easiest way to do this is with one-time pads. Let P be the plaintext, D the dummy plaintext, C the

Page 192 of 666
Applied Cryptography: Second Edition - Bruce Schneier

ciphertext, K the real key, and K™ the dummy key. Alice encrypts P:


Alice and Bob share K, so Bob can decrypt C:


If the secret police ever force them to surrender their key, they don™t surrender K, but instead

K™ = C D

The police then recover the dummy plaintext:

C K™ = D

Since these are one-time pads and K is completely random, there is no way to prove that K™ was not
the real key. To make matters more convincing, Alice and Bob should concoct some mildly
incriminating dummy messages to take the place of the really incriminating real messages. A pair of
Israeli spies once did this.

Alice could take P and encrypt it with her favorite algorithm and key K to get C. Then she takes C and
XORs it with some piece of mundane plaintext”Pride and Prejudice for example, to get K™. She stores
both C and the XOR on her hard disk. Now, when the secret police interrogate her, she can explain
that she is an amateur cryptographer and that K™ is a merely one-time pad for C. The secret police
might suspect something, but unless they know K they cannot prove that Alice™s explanation isn™t

Another method is to encrypt P with a symmetric algorithm and K, and D with K™. Intertwine bits (or
bytes) of the ciphertext to make the final ciphertexts. If the secret police demand the key, Alice gives
them K™ and says that the alternating bits (or bytes) are random noise designed to frustrate
cryptanalysis. The trouble is the explanation is so implausible that the secret police will probably not
believe her (especially considering it is suggested in this book).

A better way is for Alice to create a dummy message, D, such that the concatenation of P and D,
compressed, is about the same size as D. Call this concatenation P™. Alice then encrypts P™ with
whatever algorithm she and Bob share to get C. Then she sends C to Bob. Bob decrypts C to get P™,
and then P and D. Then they both compute C D = K™. This K™ becomes the dummy one-time pad
they use in case the secret police break their doors down. Alice has to transmit D so that hers and
Bob™s alibis match.

Another method is for Alice to take an innocuous message and run it through some error-correcting
code. Then she can introduce errors that correspond to the secret encrypted message. On the
receiving end, Bob can extract the errors to reconstruct the secret message and decrypt it. He can also
use the error-correcting code to recover the innocuous message. Alice and Bob might be hard pressed
to explain to the secret police why they consistently get a 30 percent bit-error rate on an otherwise
noise-free computer network, but in some circumstances this scheme can work.

Finally, Alice and Bob can use the subliminal channels in their digital signature algorithms (see
Sections 4.2 and 23.3). This is undetectable, works great, but has the drawback of only allowing 20 or
so characters of subliminal text to be sent per signed innocuous message. It really isn™t good for much

Page 193 of 666
Applied Cryptography: Second Edition - Bruce Schneier

more than sending keys.

10.9 Destroying Information

When you delete a file on most computers, the file isn™t really deleted. The only thing deleted is an
entry in the disk™s index file, telling the machine that the file is there. Many software vendors have
made a fortune selling file-recovery software that recovers files after they have been deleted.

And there™s yet another worry: Virtual memory means your computer can read and write memory to
disk any time. Even if you don™t save it, you never know when a sensitive document you are working
on is shipped off to disk. This means that even if you never save your plaintext data, your computer
might do it for you. And driver-level compression programs like Stacker and DoubleSpace can make
it even harder to predict how and where information is stored on a disk.

To erase a file so that file-recovery software cannot read it, you have to physically write over all of the
file™s bits on the disk. According to the National Computer Security Center [1148]:

Overwriting is a process by which unclassified data are written to storage locations that
previously held sensitive data.... To purge the...storage media, the DoD requires
overwriting with a pattern, then its complement, and finally with another pattern; e.g.,
overwrite first with 0011 0101, followed by 1100 1010, then 1001 0111. The number of
times an overwrite must be accomplished depends on the storage media, sometimes on its
sensitivity, and sometimes on different DoD component requirements. In any case, a purge
is not complete until a final overwrite is made using unclassified data.

You may have to erase files or you may have to erase entire drives. You should also erase all unused
space on your hard disk.

Most commercial programs that claim to implement the DoD standard overwrite three times: first
with all ones, then with all zeros, and finally with a repeating one-zero pattern. Given my general level
of paranoia, I recommend overwriting a deleted file seven times: the first time with all ones, the
second time with all zeros, and five times with a cryptographically secure pseudo-random sequence.
Recent developments at the National Institute of Standards and Technology with electron-tunneling
microscopes suggest even that might not be enough. Honestly, if your data is sufficiently valuable,
assume that it is impossible to erase data completely off magnetic media. Burn or shred the media; it™s
cheaper to buy media new than to lose your secrets.

Part III
Cryptographic Algorithms
Chapter 11
Mathematical Background
11.1 Information Theory

Modern information theory was first published in 1948 by Claude Elmwood Shannon [1431, 1432].
(His papers have been reprinted by the IEEE Press [1433].) For a good mathematical treatment of the
topic, consult [593]. In this section, I will just sketch some important ideas.

Page 194 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Entropy and Uncertainty

Information theory defines the amount of information in a message as the minimum number of bits
needed to encode all possible meanings of that message, assuming all messages are equally likely. For
example, the day-of-the-week field in a database contains no more than 3 bits of information, because
the information can be encoded with 3 bits:

000 = Sunday
001 = Monday
010 = Tuesday
011 = Wednesday
100 = Thursday
101 = Friday
110 = Saturday
111 is unused

If this information were represented by corresponding ASCII character strings, it would take up
more memory space but would not contain any more information. Similarly, the “sex” field of a
database contains only 1 bit of information, even though it might be stored as one of two 6-byte ASCII
strings: “MALE” or “FEMALE.”

Formally, the amount of information in a message M is measured by the entropy of a message,
denoted by H(M). The entropy of a message indicating sex is 1 bit; the entropy of a message indicating
the day of the week is slightly less than 3 bits. In general, the entropy of a message measured in bits is
log2 n, in which n is the number of possible meanings. This assumes that each meaning is equally

The entropy of a message also measures its uncertainty. This is the number of plaintext bits needed to
be recovered when the message is scrambled in ciphertext in order to learn the plaintext. For example,
if the ciphertext block “QHP*5M” is either “MALE” or “FEMALE, ” then the uncertainty of the
message is 1. A cryptanalyst has to learn only one well-chosen bit to recover the message.

Rate of a Language

For a given language, the rate of the language is

r = H(M)/N

in which N is the length of the message. The rate of normal English takes various values between 1.0
bits/letter and 1.5 bits/letter, for large values of N. Shannon, in [1434], said that the entropy depends
on the length of the text. Specifically he indicated a rate of 2.3 bits/letter for 8-letter chunks, but the
rate drops to between 1.3 and 1.5 for 16-letter chunks. Thomas Cover used a gambling estimating
technique and found an entropy of 1.3 bits/character [386]. (I™ll use 1.3 in this book.) The absolute
rate of a language is the maximum number of bits that can be coded in each character, assuming each
character sequence is equally likely. If there are L characters in a language, the absolute rate is:

R = log2 L

This is the maximum entropy of the individual characters.

For English, with 26 letters, the absolute rate is log2 26, or about 4.7 bits/letter. It should come as no
surprise to anyone that the actual rate of English is much less than the absolute rate; natural language

Page 195 of 666
Applied Cryptography: Second Edition - Bruce Schneier

is highly redundant.

The redundancy of a language, denoted D, is defined by:


Given that the rate of English is 1.3, the redundancy is 3.4 bits/letter. This means that each English
character carries 3.4 bits of redundant information.

An ASCII message that is nothing more than printed English has 1.3 bits of information per byte of
message. This means it has 6.7 bits of redundant information, giving it an overall redundancy of 0.84
bits of information per bit of ASCII text, and an entropy of 0.16 bits of information per bit of ASCII
text. The same message in BAUDOT, at 5 bits per character, has a redundancy of 0.74 bits per bit and
an entropy of 0.26 bits per bit. Spacing, punctuation, numbers, and formatting modify these results.

Security of a Cryptosystem

Shannon defined a precise mathematical model of what it means for a cryptosystem to be secure. The
goal of a cryptanalyst is to determine the key K, the plaintext P, or both. However, he may be satisfied
with some probabilistic information about P: whether it is digitized audio, German text, spreadsheet
data, or something else.

In most real-world cryptanalysis, the cryptanalyst has some probabilistic information about P before
he even starts. He probably knows the language of the plaintext. This language has a certain
redundancy associated with it. If it is a message to Bob, it probably begins with “Dear Bob.” Certainly
“Dear Bob” is more probable than “e8T&g [, m.” The purpose of cryptanalysis is to modify the
probabilities associated with each possible plaintext. Eventually one plaintext will emerge from the
pile of possible plaintexts as certain (or at least, very probable).

There is such a thing as a cryptosystem that achieves perfect secrecy: a cryptosystem in which the
ciphertext yields no possible information about the plaintext (except possibly its length). Shannon
theorized that it is only possible if the number of possible keys is at least as large as the number of
possible messages. In other words, the key must be at least as long as the message itself, and no key
can be reused. In still other words, the one-time pad (see Section 1.5) is the only cryptosystem that
achieves perfect secrecy.

Perfect secrecy aside, the ciphertext unavoidably yields some information about the corresponding
plaintext. A good cryptographic algorithm keeps this information to a minimum; a good cryptanalyst
exploits this information to determine the plaintext.

Cryptanalysts use the natural redundancy of language to reduce the number of possible plaintexts.
The more redundant the language, the easier it is to cryptanalyze. This is the reason that many real-
world cryptographic implementations use a compression program to reduce the size of the text before
encrypting it. Compression reduces the redundancy of a message as well as the work required to
encrypt and decrypt.

The entropy of a cryptosystem is a measure of the size of the keyspace, K. It is approximated by the
base two logarithm of the number of keys:

H(K) = log2 K

A cryptosystem with a 64-bit key has an entropy of 64 bits; a cryptosystem with a 56-bit key has an

Page 196 of 666
Applied Cryptography: Second Edition - Bruce Schneier

entropy of 56 bits. In general, the greater the entropy, the harder it is to break a cryptosystem.

Unicity Distance

For a message of length n, the number of different keys that will decipher a ciphertext message to
some intelligible plaintext in the same language as the original plaintext (such as an English text
string) is given by the following formula [712,95]:

2H(K)- nD - 1

Shannon [1432] defined the unicity distance, U, also called the unicity point, as an approximation of
the amount of ciphertext such that the sum of the real information (entropy) in the corresponding
plaintext plus the entropy of the encryption key equals the number of ciphertext bits used. He then
went on to show that ciphertexts longer than this distance are reasonably certain to have only one
meaningful decryption. Ciphertexts significantly shorter than this are likely to have multiple, equally
valid decryptions and therefore gain security from the opponent™s difficulty in choosing the correct

For most symmetric cryptosystems, the unicity distance is defined as the entropy of the cryptosystem
divided by the redundancy of the language.

U = H(K)/D

Unicity distance does not make deterministic predictions, but gives probabilistic results. Unicity
distance estimates the minimum amount of ciphertext for which it is likely that there is only a single
intelligible plaintext decryption when a brute-force attack is attempted. Generally, the longer the
unicity distance, the better the cryptosystem. For DES, with a 56-bit key, and an ASCII English
message, the unicity distance is about 8.2 ASCII characters or 66 bits. Table 11.1 gives the unicity
distances for varying key lengths. The unicity distances for some classical cryptosystems are found in

Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but how much
ciphertext is required for there to be only one reasonable solution for cryptanalysis. A cryptosystem
may be computationally infeasible to break even if it is theoretically possible to break it with a small
amount of ciphertext. (The largely esoteric theory of relativized cryptography is relevant here [230,
231, 232, 233, 234, 235].) The unicity distance is inversely proportional to the redundancy. As
redundancy approaches zero, even a trivial cipher can be unbreakable with a ciphertext-only attack.

Shannon defined a cryptosystem whose unicity distance is infinite as one that has ideal secrecy. Note
that an ideal cryptosystem is not necessarily a perfect cryptosystem, although a perfect cryptosystem
would necessarily be an ideal cryptosystem. If a cryptosystem has ideal secrecy, even successful
cryptanalysis will leave some uncertainty about whether the recovered plaintext is the real plaintext.

Information Theory in Practice

While these concepts have great theoretical value, actual cryptanalysis seldom proceeds along these
lines. Unicity distance guarantees insecurity if it™s too small but does not guarantee security if it™s
high. Few practical algorithms are absolutely impervious to analysis; all manner of characteristics
might serve as entering wedges to crack some encrypted messages. However, similar information
theory considerations are occasionally useful, for example, to determine a recommended key change
interval for a particular algorithm. Cryptanalysts also employ a variety of statistical and information

Page 197 of 666
Applied Cryptography: Second Edition - Bruce Schneier

theory tests to help guide the analysis in the most promising directions. Unfortunately, most literature
on applying information theory to cryptanalysis remains classified, including the seminal 1940 work
of Alan Turing.

Table 11.1
Unicity Distances of ASCII Text Encrypted
with Algorithms with Varying Key Lengths
Key Length (in bits) Unicity Distance (in characters)
40 5.9
56 8.2
64 9.4
80 11.8
128 18.8
256 37.6

Confusion and Diffusion

The two basic techniques for obscuring the redundancies in a plaintext message are, according to
Shannon, confusion and diffusion [1432].

Confusion obscures the relationship between the plaintext and the ciphertext. This frustrates attempts
to study the ciphertext looking for redundancies and statistical patterns. The easiest way to do this is
through substitution. A simple substitution cipher, like the Caesar Cipher, is one in which every
identical letter of plaintext is substituted for a single letter of ciphertext. Modern substitution ciphers
are more complex: A long block of plaintext is substituted for a different block of ciphertext, and the
mechanics of the substitution change with each bit in the plaintext or key. This type of substitution is
not necessarily enough; the German Enigma is a complex substitution algorithm that was broken
during World War II.

Diffusion dissipates the redundancy of the plaintext by spreading it out over the ciphertext. A
cryptanalyst looking for those redundancies will have a harder time finding them. The simplest way
to cause diffusion is through transposition (also called permutation). A simple transposition cipher,
like columnar transposition, simply rearranges the letters of the plaintext. Modern ciphers do this
type of permutation, but they also employ other forms of diffusion that can diffuse parts of the
message throughout the entire message.

Stream ciphers rely on confusion alone, although some feedback schemes add diffusion. Block
algorithms use both confusion and diffusion. As a general rule, diffusion alone is easily cracked
(although double transposition ciphers hold up better than many other pencil-and-paper systems).

11.2 Complexity Theory

Complexity theory provides a methodology for analyzing the computational complexity of different
cryptographic techniques and algorithms. It compares cryptographic algorithms and techniques and
determines their security. Information theory tells us that all cryptographic algorithms (except one-
time pads) can be broken. Complexity theory tells us whether they can be broken before the heat
death of the universe.

Page 198 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Complexity of Algorithms

An algorithm™s complexity is determined by the computational power needed to execute it. The
computational complexity of an algorithm is often measured by two variables: T (for time complexity)
and S (for space complexity, or memory requirement). Both T and S are commonly expressed as
functions of n, where n is the size of the input. (There are other measures of complexity: the number
of random bits, the communications bandwidth, the amount of data, and so on.)

Generally, the computational complexity of an algorithm is expressed in what is called “big O”
notation: the order of magnitude of the computational complexity. It™s just the term of the complexity
function which grows the fastest as n gets larger; all lower-order terms are ignored. For example, if
the time complexity of a given algorithm is 4n2 + 7n + 12, then the computational complexity is on the
order of n2, expressed O(n2).

Measuring time complexity this way is system-independent. You don™t have to know the exact timings
of various instructions or the number of bits used to represent different variables or even the speed of
the processor. One computer might be 50 percent faster than another and a third might have a data
path twice as wide, but the order-of-magnitude complexity of an algorithm remains the same. This
isn™t cheating; when you™re dealing with algorithms as complex as the ones presented here, the other
stuff is negligible (is a constant factor) compared to the order-of-magnitude complexity.

This notation allows you to see how the input size affects the time and space requirements. For
example, if T = O(n), then doubling the input size doubles the running time of the algorithm. If T = O
(2n), then adding one bit to the input size doubles the running time of the algorithm (within a constant

Generally, algorithms are classified according to their time or space complexities. An algorithm is
constant if its complexity is independent of n: O(1). An algorithm is linear, if its time complexity is O
(n). Algorithms can also be quadratic, cubic, and so on. All these algorithms are polynomial; their
complexity is O(nm), when m is a constant. The class of algorithms that have a polynomial time
complexity are called polynomial-time algorithms.

Algorithms whose complexities are O(t f(n)), where t is a constant greater than 1 and f (n) is some
polynomial function of n, are called exponential. The subset of exponential algorithms whose
complexities are O(c f(n)), where c is a constant and f (n) is more than constant but less than linear, is
called superpolynomial.

Ideally, a cryptographer would like to be able to say that the best algorithm to break this encryption
algorithm is of exponential-time complexity. In practice, the strongest statements that can be made,
given the current state of the art of computational complexity theory, are of the form “all known
cracking algorithms for this cryptosystem are of superpolynomial-time complexity.” That is, the
cracking algorithms that we know are of superpolynomial-time complexity, but it is not yet possible to
prove that no polynomial-time cracking algorithm could ever be discovered. Advances in
computational complexity may some day make it possible to design algorithms for which the existence
of polynomial-time cracking algorithms can be ruled out with mathematical certainty.

As n grows, the time complexity of an algorithm can make an enormous difference in whether the
algorithm is practical. Table 11.2 shows the running times for different algorithm classes in which n
equals one million. The table ignores constants, but also shows why ignoring constants is reasonable.

Page 199 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Table 11.2
Running Times of Different Classes of Algorithms
# of Operations for Time at
n = 106
Class Complexity 106 O/S
Constant O(1) 1 1 µsec.
Linear O(n) 106 1 sec.
Quadratic O(n2) 1012 11.6 days
Cubic O(n3) 1018 32, 000 yrs.
Exponential O(2n) 10301,030 10301,006 times the age of the universe

Assuming that the unit of “time” for our computer is a microsecond, the computer can complete a
constant algorithm in a microsecond, a linear algorithm in a second, and a quadratic algorithm in
11.6 days. It would take 32, 000 years to complete a cubic algorithm; not terribly practical, but a
computer built to withstand the next ice age would deliver a solution eventually. Performing the
exponential algorithm is futile, no matter how well you extrapolate computing power, parallel
processing, or contact with superintelligent aliens.

Look at the problem of a brute-force attack against an encryption algorithm. The time complexity of
this attack is proportional to the number of possible keys, which is an exponential function of the key
length. If n is the length of the key, then the complexity of a brute-force attack is O(2n). Section 12.3
discusses the controversy surrounding a 56-bit key for DES instead of a 112-bit key. The complexity
of a brute-force attack against a 56-bit key is 256; against a 112-bit key the complexity is 2112. The
former is possible; the latter isn™t.

Complexity of Problems

Complexity theory also classifies the inherent complexity of problems, not just the complexity of
particular algorithms used to solve problems. (Excellent introductions to this topic are [600, 211,
1226]; see also [1096, 27, 739].) The theory looks at the minimum time and space required to solve the
hardest instance of a problem on a theoretical computer known as a Turing machine . A Turing
machine is a finite-state machine with an infinite read-write memory tape. It turns out that a Turing
machine is a realistic model of computation.

Problems that can be solved with polynomial-time algorithms are called tractable, because they can
usually be solved in a reasonable amount of time for reasonable-sized inputs. (The exact definition of
“reasonable” depends on the circumstance.) Problems that cannot be solved in polynomial time are
called intractable, because calculating their solution quickly becomes infeasible. Intractable problems
are sometimes just called hard. Problems that can only be solved with algorithms that are
superpolynomial are computationally intractable, even for relatively small values of n.

It gets worse. Alan Turing proved that some problems are undecidable . It is impossible to devise any
algorithm to solve them, regardless of the algorithm™s time complexity.

Problems can be divided into complexity classes, which depend on the complexity of their solutions.
Figure 11.1 shows the more important complexity classes and their presumed relationships.
(Unfortunately, not much about this material has been proved mathematically.)

Page 200 of 666
Applied Cryptography: Second Edition - Bruce Schneier

On the bottom, the class P consists of all problems that can be solved in polynomial time. The class NP
consists of all problems that can be solved in polynomial time only on a nondeterministic Turing
machine: a variant of a normal Turing machine that can make guesses. The machine guesses the
solution to the problem”either by making “lucky guesses” or by trying all guesses in parallel”and
checks its guess in polynomial time.

NP ™s relevance to cryptography is this: Many symmetric algorithms and all public-key algorithms
can be cracked in nondeterministic polynomial time. Given a ciphertext C, the cryptanalyst simply
guesses a plaintext, X, and a key, k, and in polynomial time runs the encryption algorithm on inputs X
and k and checks whether the result is equal to C. This is important theoretically, because it puts an
upper bound on the complexity of cryptanalysis for these algorithms. In practice, of course, it is a
deterministic polynomial-time algorithm that the cryptanalyst seeks. Furthermore, this argument is
not applicable to all classes of ciphers; in particular, it is not applicable to one-time pads”for any C,
there are many X, k pairs that yield C when run through the encryption algorithm, but most of these
X s are nonsense, not legitimate plaintexts.

Figure 11.1 Complexity classes.

The class NP includes the class P, because any problem solvable in polynomial time on a deterministic
Turing machine is also solvable in polynomial time on a nondeterministic Turing machine; the
guessing stage can simply be omitted.

If all NP problems are solvable in polynomial time on a deterministic machine, then P = NP. Although
it seems obvious that some NP problems are much harder than others (a brute-force attack against an
encryption algorithm versus encrypting a random block of plaintext), it has never been proven that P
G (or that P = NP). However, most people working in complexity theory believe that they are

Stranger still, specific problems in NP can be proven to be as difficult as any problem in the class.
Steven Cook [365] proved that the Satisfiability problem (given a propositional Boolean formula, is
there a way to assign truth values to the variables that makes the formula true?) is NP-complete . This
means that, if Satisfiability is solvable in polynomial time, then P = NP. Conversely, if any problem in
NP can be proven not to have a deterministic polynomial-time algorithm, the proof will show that
Satisfiability does not have a deterministic polynomial-time algorithm either. No problem is harder
than Satisfiability in NP.

Since Cook™s seminal paper was published, a huge number of problems have been shown to be
equivalent to Satisfiability; hundreds are listed in [600], and some examples follow. By equivalent, I
mean that these problems are also NP-complete; they are in NP and also as hard as any problem in
NP . If their solvability in deterministic polynomial time were resolved, the P versus NP question
would be solved. The question of whether P = NP is the central unsolved question of computational
complexity theory, and no one expects it to be solved anytime soon. If someone showed that P = NP,

Page 201 of 666
Applied Cryptography: Second Edition - Bruce Schneier

then most of this book would be irrelevant: As previously explained, many classes of ciphers are
trivially breakable in nondeterministic polynomial time. If P = NP, they are breakable by feasible,
deterministic algorithms.

Further out in the complexity hierarchy is PSPACE . Problems in PSPACE can be solved in
polynomial space, but not necessarily polynomial time. PSPACE includes NP, but some problems in
PSPACE are thought to be harder than NP. Of course, this isn™t proven either. There is a class of
problems, the so-called PSPACE-complete problems, with the property that, if any one of them is in
NP then PSPACE = NP and if any one of them is in P then PSPACE = P .

And finally, there is the class of problems called EXPTIME . These problems are solvable in
exponential time. The EXPTIME-complete problems can actually be proven not to be solvable in
deterministic polynomial time. It has been shown that P does not equal EXPTIME .

NP-Complete Problems

Michael Garey and David Johnson compiled a list of over 300 NP-complete problems [600]. Here are
just a few of them:

” Traveling Salesman Problem. A traveling salesman has to visit n different cities using
only one tank of gas (there is a maximum distance he can travel). Is there a route that allows
him to visit each city exactly once on that single tank of gas? (This is a generalization of the
Hamiltonian Cycle problem”see Section 5.1.)
” Three-Way Marriage Problem. In a room are n men, n women, and n clergymen
(priests, rabbis, whatever). There is also a list of acceptable marriages, which consists of one
man, one woman, and one clergyman willing to officiate. Given this list of possible triples, is it
possible to arrange n marriages such that everyone is either marrying one person or officiating
at one marriage?
” Three-Satisfiability. There is a list of n logical statements, each with three variables.
For example: if (x and y) then z, (x and w) or (not z), if ((not u and not x) or (z and (u or not x)))
then (not z and u) or x), and so on. Is there a truth assignment for all the variables that satisfies
all the statements? (This is a special case of the Satisfiability problem previously mentioned.)

11.3 Number Theory

This isn™t a book on number theory, so I™m just going to sketch a few ideas that apply to
cryptography. If you want a detailed mathematical text on number theory, consult one of these books:
[1430, 72, 1171, 12, 959, 681, 742, 420]. My two favorite books on the mathematics of finite fields are
[971, 1042]. See also [88, 1157, 1158, 1060].

Modular Arithmetic

You all learned modular arithmetic in school; it was called “clock arithmetic.” Remember these word
problems? If Mildred says she™ll be home by 10:00, and she™s 13 hours late, what time does she get
home and for how many years does her father ground her? That™s arithmetic modulo 12. Twenty-
three modulo 12 equals 11.

(10 + 13) mod 12 = 23 mod 12 = 11 mod 12

Another way of writing this is to say that 23 and 11 are equivalent, modulo 12:

Page 202 of 666
Applied Cryptography: Second Edition - Bruce Schneier

G (mod 12)

Basically, a G (mod n) if a = b + kn for some integer k. If a is non-negative and b is between 0 and n,
you can think of b as the remainder of a when divided by n. Sometimes, b is called the residue of a,
modulo n. Sometimes a is called congruent to b, modulo n (the triple equals sign, , denotes
congruence). These are just different ways of saying the same thing.

The set of integers from 0 to n - 1 form what is called a complete set of residues modulo n. This means
that, for every integer a, its residue modulo n is some number from 0 to n - 1.

The operation a mod n denotes the residue of a, such that the residue is some integer from 0 to n - 1.
This operation is modular reduction. For example, 5 mod 3 = 2.

This definition of mod may be different from the definition used in some programming languages. For
example, PASCAL™s modulo operator sometimes returns a negative number. It returns a number
between -(n - 1) and n - 1. In C, the % operator returns the remainder from the division of the first
expression by the second; this can be a negative number if either operand is negative. For all the
algorithms in this book, make sure you add n to the result of the modulo operator if it returns a
negative number.

Modular arithmetic is just like normal arithmetic: It™s commutative, associative, and distributive.
Also, reducing each intermediate result modulo n yields the same result as doing the whole calculation
and then reducing the end result modulo n.

a + b) mod n = ((a mod n) + (b mod n)) mod n
(a - b) mod n = ((a mod n) - (b mod n)) mod n
(a*b) mod n = ((a mod n)*(b mod n)) mod n
(a*(b + c)) mod n = (((a*b) mod n) + ((a*c) mod n)) mod n

Cryptography uses computation mod n a lot, because calculating discrete logarithms and square roots
mod n can be hard problems. Modular arithmetic is also easier to work with on computers, because it
restricts the range of all intermediate values and the result. For a k- bit modulus, n, the intermediate
results of any addition, subtraction, or multiplication will not be more than 2k- bits long. So we can
perform exponentiation in modular arithmetic without generating huge intermediate results.
Calculating the power of some number modulo some number,

ax mod n,

is just a series of multiplications and divisions, but there are speedups. One kind of speedup aims to
minimize the number of modular multiplications; another kind aims to optimize the individual
modular multiplications. Because the operations are distributive, it is faster to do the exponentiation
as a stream of successive multiplications, taking the modulus every time. It doesn™t make much
difference now, but it will when you™re working with 200-bit numbers.

For example, if you want to calculate a8 mod n, don™t use the naïve approach and perform seven
multiplications and one huge modular reduction:

(a*a*a*a*a*a*a*a) mod n

Instead, perform three smaller multiplications and three smaller modular reductions:

Page 203 of 666
Applied Cryptography: Second Edition - Bruce Schneier

((a2 mod n)2 mod n)2 mod n

By the same token,

a16 mod n = (((a2 mod n)2 mod n)2 mod n)2 mod n

Computing ax mod n, where x is not a power of 2, is only slightly harder. Binary notation expresses x
as a sum of powers of 2: 25 is 11001 in binary, so 25 = 24 + 23 + 20. So

a25 mod n = (a*a24) mod n = (a*a8*a16) mod n
= (a*((a2)2)2*(((a2)2)2)2) mod n = ((((a2*a)2)2)2*a) mod n

With judicious storing of intermediate results, you only need six multiplications:

(((((((a2 mod n)*a) mod n)2 mod n)2 mod n)2 mod n)*a) mod n

This is called addition chaining [863], or the binary square and multiply method. It uses a simple and
obvious addition chain based on the binary representation. In C, it looks like:

unsigned long qe2(unsigned long x, unsigned long y, unsigned long n) {
unsigned long s,t,u;
int i;

s = 1; t = x; u = y;
while(u) {
if(u&1) s = (s* t)%n;
t = (t* t)%n;

Another, recursive, algorithm is:

unsigned long fast_exp(unsigned long x, unsigned long y, unsigned long N) {
unsigned long tmp;
if(y==1) return(x % N);
if ((y&1)==0) {
tmp = fast_exp(x,y/2,N);
return ((tmp* tmp)%N);
else {
tmp = fast_exp(x,(y-1)/2,N);
tmp = (tmp* tmp)%N;
tmp = (tmp* x)%N;
return (tmp);

This technique reduces the operation to, on the average, 1.5*k operations, if k is the length of the
number x in bits. Finding the calculation with the fewest operations is a hard problem (it has been
proven that the sequence must contain at least k- 1 operations), but it is not too hard to get the
number of operations down to 1.1*k or better, as k grows.

An efficient way to do modular reductions many times using the same n is Montgomery™s method
[1111]. Another method is called Barrett™s algorithm [87]. The software performance of these two

Page 204 of 666
Applied Cryptography: Second Edition - Bruce Schneier

algorithms and the algorithm previously discussed is in [210]: The algorithm I™ve discussed is the best
choice for singular modular reductions; Barrett™s algorithm is the best choice for small arguments;
and Montgomery™s method is the best choice for general modular exponentiations. (Montgomery™s
method can also take advantage of small exponents, using something called mixed arithmetic.)

The inverse of exponentiation modulo n is calculating a discrete logarithm . I™ll discuss this shortly.

Prime Numbers

A prime number is an integer greater than 1 whose only factors are 1 and itself: No other number
evenly divides it. Two is a prime number. So are 73, 2521, 2365347734339, and 2756839 - 1. There are an
infinite number of primes. Cryptography, especially public-key cryptography, uses large primes (512
bits and even larger) often.

Evangelos Kranakis wrote an excellent book on number theory, prime numbers, and their
applications to cryptography [896]. Paulo Ribenboim wrote two excellent references on prime
numbers in general [1307, 1308].

Greatest Common Divisor

Two numbers are relatively prime when they share no factors in common other than 1. In other
words, if the greatest common divisor of a and n is equal to 1. This is written:

gcd(a,n) = 1

The numbers 15 and 28 are relatively prime, 15 and 27 are not, and 13 and 500 are. A prime number
is relatively prime to all other numbers except its multiples.

One way to compute the greatest common divisor of two numbers is with Euclid™s algorithm . Euclid
described the algorithm in his book, Elements, written around 300 B.C. He didn™t invent it. Historians
believe the algorithm could be 200 years older. It is the oldest nontrivial algorithm that has survived
to the present day, and it is still a good one. Knuth describes the algorithm and some modern
modifications [863].

In C:

/* returns gcd of x and y */
int gcd (int x, int y)
int g;
if (x < 0)
x = -x;
if (y < 0)
y = -y;
if (x + y == 0)
g = y;
while (x > 0) {
g = x;
x = y % x;
y = g;
return g;

This algorithm can be generalized to return the gcd of an array of m numbers:

Page 205 of 666
Applied Cryptography: Second Edition - Bruce Schneier

/* returns the gcd of x1, x2...xm */
int multiple_gcd (int m, int *x)
size_t i;
int g;

if (m < 1)
return 0;
g = x[0];
for (i=1; i<m; ++i) {
g = gcd(g, x[i]);
/* optimization, since for random x[i], g==1 60% of the time: */
if (g == 1)
return 1;
return g;

Inverses Modulo a Number

Remember inverses? The multiplicative inverse of 4 is 1/4, because 4*1/4 = 1. In the modulo world, the
problem is more complicated:

G (mod 7)

This equation is equivalent to finding an x and k such that

4x = 7k + 1

where both x and k are integers.

The general problem is finding an x such that

1 = (a*x) mod n

This is also written as

G (mod n)
a -1

The modular inverse problem is a lot more difficult to solve. Sometimes it has a solution, sometimes
not. For example, the inverse of 5, modulo 14, is 3. On the other hand, 2 has no inverse modulo 14.

In general, a-1 G (mod n) has a unique solution if a and n are relatively prime. If a and n are not
relatively prime, then a-1 G (mod n) has no solution. If n is a prime number, then every number
from 1 to n- 1 is relatively prime to n and has exactly one inverse modulo n in that range.

So far, so good. Now, how do you go about finding the inverse of a modulo n? There are a couple of
ways. Euclid™s algorithm can also compute the inverse of a number modulo n. Sometimes this is called
the extended Euclidean algorithm.

Here™s the algorithm in C++:

#define isEven(x) ((x & 0x01) == 0)
#define isOdd(x) (x & 0x01)

Page 206 of 666
Applied Cryptography: Second Edition - Bruce Schneier

#define swap(x,y) (x ^= y, y ^= x, x ^= y)

void ExtBinEuclid(int *u, int *v, int *u1, int *u2, int *u3)
// warning: u and v will be swapped if u < v
int k, t1, t2, t3;

if (*u < *v) swap(*u,*v);
for (k = 0; isEven(*u) && isEven(*v); ++k) {
*u >>= 1; *v >>= 1;
*u1 = 1; *u2 = 0; *u3 = *u; t1 = *v; t2 = *u-1; t3 = *v;
do {
do {
if (isEven(*u3)) {
if (isOdd(*u1) || isOdd(*u2)) {
*u1 += *v; *u2 += *u;
*u1 >>= 1; *u2 >>= 1; *u3 >>= 1;
if (isEven(t3) || *u3 < t3) {
swap(*u1,t1); swap(*u2,t2); swap(*u3,t3);
} while (isEven(*u3));
while (*u1 < t1 || *u2 < t2) {
*u1 += *v; *u2 += *u;
*u1 -= t1; *u2 -= t2; *u3 -= t3;
} while (t3 > 0);
while (*u1 >= *v && *u2 >= *u) {
*u1 -= *v; *u2 -= *u;
*u1 <<= k; *u2 <<= k; *u3 <<= k;

main(int argc, char **argv) {
int a, b, gcd;

if (argc < 3) {
cerr << "Usage: xeuclid u v" << endl;
return -1;
int u = atoi(argv[1]);
int v = atoi(argv[2]);
if (u <= 0 || v <= 0) {
cerr << "Arguments must be positive!" << endl;
return -2;
// warning: u and v will be swapped if u < v
ExtBinEuclid(&u, &v, &a, &b, &gcd);
cout << a << " * " << u << " + (-"
<< b << ") * " << v << " = " << gcd << endl;
if (gcd == 1)
cout << "the inverse of " << v << " mod " << u << " is: "
<< u - b << endl;
return 0;

I™m not going to prove that it works or give the theory behind it. Details can be found in [863], or in
any of the number theory texts previously listed.

The algorithm is iterative and can be slow for large numbers. Knuth showed that the average number
of divisions performed by the algorithm is:

Page 207 of 666
Applied Cryptography: Second Edition - Bruce Schneier

.843*log2(n) + 1.47

Solving for Coefficients

Euclid™s algorithm can be used to solve this class of problems: Given an array of m variables x1,
x2,...xm, find an array of m coefficients, u1, u2...um, such that

u1*x1 + ...+ um*xm = 1

Fermat™s Little Theorem

If m is a prime, and a is not a multiple of m, then Fermat™s little theorem says

G1 (mod m)
am -1

(Pierre de Fermat, pronounced “Fair-ma, ” was a French mathematician who lived from 1601 to 1665.
This theorem has nothing to do with his last theorem.)

The Euler Totient Function

There is another method for calculating the inverse modulo n, but it™s not always possible to use it.
The reduced set of residues mod n is the subset of the complete set of residues that is relatively prime
to n. For example, the reduced set of residues mod 12 is {1, 5, 7, 11}. If n is prime, then the reduced set
of residues mod n is the set of all numbers from 1 to n- 1. The number 0 is never part of the reduced
set of residues for any n not equal to 1.

The Euler totient function, also called the Euler phi function and written as φ(n), is the number of
elements in the reduced set of residues modulo n. In other words, φ(n) is the number of positive
integers less than n that are relatively prime to n (for any n greater than 1). (Leonhard Euler,
pronounced “Oiler, ” was a Swiss mathematician who lived from 1707 to 1783.)

If n is prime, then φ(n) = n- 1. If n = pq, where p and q are prime, then φ(n) = (p- 1)(q- 1). These
numbers appear in some public-key algorithms; this is why.

According to Euler™s generalization of Fermat™s little theorem, if gcd(a,n) = 1, then

aφ(n) mod n = 1

Now it is easy to compute a-1 mod n:

x = aφ(n)-1 mod n

For example, what is the inverse of 5, modulo 7? Since 7 is prime, φ(7) = 7- 1 = 6. So, the inverse of 5,
modulo 7, is

56-1 mod 7 = 55 mod 7 = 3

Both methods for calculating inverses can be extended to solve for x in the general problem (if gcd
(a,n) = 1):

Page 208 of 666
Applied Cryptography: Second Edition - Bruce Schneier

(a*x) mod n = b

Using Euler™s generalization, solve

x = (b*a φ(n)-1) mod n

Using Euclid™s algorithm, solve

x = (b*(a-1 mod n)) mod n

In general, Euclid™s algorithm is faster than Euler™s generalization for calculating inverses, especially
for numbers in the 500-bit range. If gcd(a,n) G all is not lost. In this general case, (a*x) mod n = b,
can have multiple solutions or no solution.

Chinese Remainder Theorem

If you know the prime factorization of n, then you can use something called the Chinese remainder
theorem to solve a whole system of equations. The basic version of this theorem was discovered by the
first-century Chinese mathematician, Sun Tse.

In general, if the prime factorization of n is p1*p2*...*pt, then the system of equations

(x mod pi) = ai, where i = 1, 2,..., t

has a unique solution, x, where x is less than n. (Note that some primes can appear more than once.
For example, p1 might be equal to p2 .) In other words, a number (less than the product of some
primes) is uniquely identified by its residues mod those primes.

For example, use 3 and 5 as primes, and 14 as the number. 14 mod 3 = 2, and 14 mod 5 = 4. There is
only one number less than 3*5 = 15 which has those residues: 14. The two residues uniquely
determine the number.

So, for an arbitrary a < p and b < q (where p and q are prime), there exists a unique x, where x is less
than pq, such that

Ga (mod p), and x
G Gb (mod q)

To find this x, first use Euclid™s algorithm to find u, such that

G (mod p)

Then compute:

x = (((a - b)*u) mod p)*q + b

Here is the Chinese remainder theorem in C:

/* r is the number of elements in arrays m and u;
m is the array of (pairwise relatively prime) moduli
u is the array of coefficients

Page 209 of 666
Applied Cryptography: Second Edition - Bruce Schneier

return value is n such than n == u[k]%m[k] (k=0..r-1) and
n < m[0]*m[1]*...*m[r-1]

/* totient() is left as an exercise to the reader. */
int chinese_remainder (size_t r, int *m, int *u)
size_t i;
int modulus;
int n;
modulus = 1;
for (i=0; i<r; ++i)
modulus *= m[i];

n = 0;
for (i=0; i<r; ++i) {
n += u[i] * modexp(modulus / m[i], totient(m[i]),
n %= modulus;

return n;

The converse of the Chinese remainder theorem can also be used to find the solution to the problem:
if p and q are primes, and p is less than q, then there exists a unique x less than pq, such that

Gx Gx
G (mod p), and b G (mod q)

G mod p, then
If a

x = (((a - (b mod p))*u) mod p)*q + b

If a < b mod p, then

x = (((a + p - (b mod p))*u) mod p)*q + b

Quadratic Residues

If p is prime, and a is greater than 0 and less than p, then a is a quadratic residue mod p if

G (mod p), for some x

Not all values of a satisfy this property. For a to be a quadratic residue modulo n, it must be a
quadratic residue modulo all the prime factors of n. For example, if p = 7, the quadratic residues are
1, 2, and 4:

12 = 1 G (mod 7)
22 = 4 G (mod 7)
32 = 9 G (mod 7)
42 = 16 G (mod 7)
52 = 25 G (mod 7)
62 = 36 G1 (mod 7)

Note that each quadratic residue appears twice on this list.

Page 210 of 666
Applied Cryptography: Second Edition - Bruce Schneier

There are no values of x which satisfy any of these equations:

G (mod 7)
G (mod 7)
G (mod 7)

The quadratic nonresidues modulo 7, the numbers that are not quadratic residues, are 3, 5, and 6.

Although I will not do so here, it is easy to prove that, when p is odd, there are exactly (p- 1)/2
quadratic residues mod p and the same number of quadratic nonresidues mod p. Also, if a is a
quadratic residue mod p, then a has exactly two square roots, one of them between 0 and (p- 1)/2, and
the other between (p- 1)/2 and (p- 1). One of these square roots is also a quadratic residue mod p; this
is called the principal square root.

If n is the product of two primes, p and q, there are exactly (p- 1)(q- 1)/4 quadratic residues mod n. A
quadratic residue mod n is a perfect square modulo n. This is because to be a square mod n, the
residue must be a square mod p and a square mod q. For example, there are 11 quadratic residues
mod 35: 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, and 30. Each quadratic residue has exactly four square roots.

Legendre Symbol

The Legendre symbol, written L(a,p), is defined when a is any integer and p is a prime greater than 2.
It is equal to 0, 1, or -1.

L(a,p) = 0 if a is divisible by p.
L(a,p) = 1 if a is a quadratic residue mod p.
L(a,p) = - 1 is a is a quadratic nonresidue mod p.

One way to calculate L(a,p) is:

L(a,p) = a(p- 1)/2 mod p

Or you can use the following algorithm:

1. If a = 1, then L(a,p) = 1


. 9
( 29)