CHAPTER 11 ................................................ 233
Mathematical Background ........................ 233
INFORMATION THEORY ........................ 233
Entropy and Uncertainty ................................. 233
Rate of a Language ....................................... 234
Security of a Cryptosystem ........................... 234
Unicity Distance ............................................ 235
Information Theory in Practice ...................... 236
Table 11.1 Unicity Distances of ASCII .......... 236
Confusion and Diffusion ................................. 237
COMPLEXITY THEORY ........................... 237
Complexity of Algorithms .............................. 237
Table 11.2 Running Times of Different ......... 239
Complexity of Problems ................................ 239
Figure 11.1 Complexity classes. .................... 240
NP-Complete Problems ................................ 241
NUMBER THEORY .................................. 242
Modular Arithmetic ........................................ 242
Prime Numbers ............................................. 245
Greatest Common Divisor ............................. 245
Inverses Modulo a Number ........................... 246
Solving for Coefficients ................................. 248
Fermat's Little Theorem ................................ 248
The Euler Totient Function ............................ 248
Chinese Remainder Theorem ....................... 249
Quadratic Residues ....................................... 250
Legendre Symbol ........................................... 251
Jacobi Symbol ............................................... 252
Blum Integers ................................................ 253
Generators .................................................... 253
Computing in a Galois Field .......................... 254
FACTORING ............................................ 255
Square Roots Modulo ................................... 258
PRIME NUMBER GENERATION ............. 258
Solovay-Strassen .......................................... 259
Lehmann ....................................................... 259
Rabin-Miller ................................................... 259
Practical Considerations ............................... 260
Strong Primes ............................................... 261
DISCRETE LOGARITHMS IN A FINITE ... 261
Calculating Discrete Logarithms in a .............. 262
CHAPTER 11
Mathematical Background
11.1 INFORMATION THEORY
Modern information theory was first published in 1948 by Claude Elwood Shannon [1431,1432]. (His papers have been reprinted by the IEEE Press [1433].) For a good mathematical treatment of the topic, consult [593]. In this section, I will just sketch some important ideas.
Entropy and Uncertainty
Information theory defines the amount of information in a message as the minimum number of bits needed to encode all possible meanings of that message, assuming all messages are equally likely. For example, the day-of-the-week field in a database contains no more than 3 bits of information, because the information can be encoded with 3 bits:
000 = Sunday
001 = Monday
010 = Tuesday
011 = Wednesday
100 = Thursday
101 = Friday
110 = Saturday
111 is unused
If this information were represented by corresponding ASCII character strings, it
would take up more memory space but would not contain any more information.
Similarly, the "sex" field of a database contains only 1 bit of information, even though it might be stored as one of two 6-byte ASCII strings: "MALE" or "FEMALE."
Formally, the amount of information in a message M is measured by the entropy of a message, denoted by H(M). The entropy of a message indicating sex is 1 bit; the entropy of a message indicating the day of the week is slightly less than 3 bits. In
general, the entropy of a message measured in bits is log2 n, in which n is the number of possible meanings. This assumes that each meaning is equally likely.
The entropy of a message also measures its uncertainty. This is the number of plaintext bits needed to be recovered when the message is scrambled in ciphertext in order to learn the plaintext. For example, if the ciphertext block "QHP*5M" is either "MALE" or "FEMALE," then the uncertainty of the message is 1. A cryptanalyst has to learn only one well-chosen bit to recover the message.
Rate of a Language
For a given language, the rate of the language is
r = H(M)/N
in which N is the length of the message. The rate of normal English takes various values between 1.0 bits/letter and 1.5 bits/letter, for large values of N. Shannon, in [1434], said that the entropy depends on the length of the text. Specifically he indicated a rate of 2.3 bits/letter for 8-letter chunks, but the rate drops to between 1.3 and 1.5 for 16-letter chunks. Thomas Cover used a gambling estimating technique and found an entropy of 1.3 bits/character [386]. (I'll use 1.3 in this book.) The absolute rate of a language is the maximum number of bits that can be coded in each character, assuming each character sequence is equally likely. If there are L characters in a language, the absolute rate is:
R = log2 L
This is the maximum entropy of the individual characters.
For English, with 26 letters, the absolute rate is log2 26, or about 4.7 bits/letter. It
should come as no surprise to anyone that the actual rate of English is much less
than the absolute rate; natural language is highly redundant.
The redundancy of a language, denoted D, is defined by:
D = R - r
Given that the rate of English is 1.3, the redundancy is 3.4 bits/letter. This means
that each English character carries 3.4 bits of redundant information.
An ASCII message that is nothing more than printed English has 1.3 bits of information per byte of message. This means it has 6.7 bits of redundant information, giving it an overall redundancy of 0.84 bits of information per bit of ASCII text, and an entropy of 0.16 bits of information per bit of ASCII text. The same message in BAUDOT, at 5 bits per character, has a redundancy of 0.74 bits per bit and an entropy of 0.26 bits per bit. Spacing, punctuation, numbers, and formatting modify these results.
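The bookkeeping in the last paragraph is easy to check. A sketch, assuming the values used in the text: a rate of 1.3 bits of information per character, 8 bits per ASCII character, and 5 bits per BAUDOT character.

```c
/* Redundancy per transmitted bit, given the information rate of the
 * language (in bits per character) and the width of the character
 * encoding.  The constants in the usage note are the text's, not mine. */
double redundancy_per_bit(double rate_bits_per_char, double bits_per_char)
{
    double redundant_bits = bits_per_char - rate_bits_per_char;
    return redundant_bits / bits_per_char;
}
```

For ASCII, redundancy_per_bit(1.3, 8.0) is 6.7/8, or about 0.84; for BAUDOT, redundancy_per_bit(1.3, 5.0) is 3.7/5, or 0.74, matching the figures above.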
Security of a Cryptosystem
Shannon defined a precise mathematical model of what it means for a cryptosystem to be secure. The goal of a cryptanalyst is to determine the key K, the plaintext P, or both. However, he may be satisfied with some probabilistic information about P: whether it is digitized audio, German text, spreadsheet data, or something else.
In most real-world cryptanalysis, the cryptanalyst has some probabilistic information about P before he even starts. He probably knows the language of the plaintext. This language has a certain redundancy associated with it. If it is a message to Bob, it probably begins with "Dear Bob." Certainly "Dear Bob" is more probable than "e8T&g [,m." The purpose of cryptanalysis is to modify the probabilities associated with each possible plaintext. Eventually one plaintext will emerge from the pile of possible plaintexts as certain (or at least, very probable).
There is such a thing as a cryptosystem that achieves perfect secrecy: a cryptosystem in which the ciphertext yields no possible information about the plaintext (except possibly its length). Shannon theorized that it is only possible if the number of possible keys is at least as large as the number of possible messages. In other words, the key must be at least as long as the message itself, and no key can be reused. In still other words, the one-time pad (see Section 1.5) is the only cryptosystem that achieves perfect secrecy.
Perfect secrecy aside, the ciphertext unavoidably yields some information about the
corresponding plaintext. A good cryptographic algorithm keeps this information to a
minimum; a good cryptanalyst exploits this information to determine the plaintext.
Cryptanalysts use the natural redundancy of language to reduce the number of possible plaintexts. The more redundant the language, the easier it is to cryptanalyze. This is the reason that many real-world cryptographic implementations use a compression program to reduce the size of the text before encrypting it. Compression reduces the redundancy of a message as well as the work required to encrypt and decrypt.
The entropy of a cryptosystem is a measure of the size of the keyspace, K. It is
approximated by the base two logarithm of the number of keys:
H(K) = log2 K
A cryptosystem with a 64-bit key has an entropy of 64 bits; a cryptosystem with a 56-bit key has an entropy of 56 bits. In general, the greater the entropy, the harder it is to break a cryptosystem.
Unicity Distance
For a message of length n, the number of different keys that will decipher a ciphertext message to some intelligible plaintext in the same language as the original plaintext (such as an English text string) is given by the following formula [712,95]:

2^(H(K) - nD) - 1
Shannon [1432] defined the unicity distance, U, also called the unicity point, as an approximation of the amount of ciphertext such that the sum of the real information (entropy) in the corresponding plaintext plus the entropy of the encryption key equals the number of ciphertext bits used. He then went on to show that ciphertexts longer than this distance are reasonably certain to have only one meaningful decryption. Ciphertexts significantly shorter than this are likely to have multiple, equally valid decryptions and therefore gain security from the opponent's difficulty in choosing the correct one.
For most symmetric cryptosystems, the unicity distance is defined as the entropy
of the cryptosystem divided by the redundancy of the language.
U = H(K)/D
Unicity distance does not make deterministic predictions, but gives probabilistic
results. Unicity distance estimates the minimum amount of ciphertext for which it
is likely that there is only a single intelligible plaintext decryption when a brute-force attack is attempted. Generally, the longer the unicity distance, the better the cryptosystem. For DES, with a 56-bit key, and an ASCII English message, the unicity distance is about 8.2 ASCII characters or 66 bits. Table 11.1 gives the unicity distances for varying key lengths. The unicity distances for some classical cryptosystems are found in [445].
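The entries in Table 11.1 follow directly from U = H(K)/D. A sketch, assuming the redundancy value the table itself implies, roughly 6.8 bits per 8-bit ASCII character (the text's rate of 1.3 bits/letter would give 6.7 and slightly different entries):

```c
/* Unicity distance in characters: U = H(K)/D, with H(K) the key length
 * in bits and D the redundancy per character.  D = 6.8 bits per 8-bit
 * ASCII character is the value implied by Table 11.1, an assumption here. */
double unicity_chars(double key_bits)
{
    const double D = 6.8;
    return key_bits / D;
}
```

For example, unicity_chars(56) is about 8.2 characters, matching the DES figure in the text.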
Unicity distance is not a measure of how much ciphertext is required for cryptanalysis, but how much ciphertext is required for there to be only one reasonable solution for cryptanalysis. A cryptosystem may be computationally infeasible to break even if it is theoretically possible to break it with a small amount of ciphertext. (The largely esoteric theory of relativized cryptography is relevant here [230,231,232,233,234,235].) The unicity distance is inversely proportional to the redundancy. As redundancy approaches zero, even a trivial cipher can be unbreakable with a ciphertext-only attack.
Shannon defined a cryptosystem whose unicity distance is infinite as one that has ideal secrecy. Note that an ideal cryptosystem is not necessarily a perfect cryptosystem, although a perfect cryptosystem would necessarily be an ideal cryptosystem. If a cryptosystem has ideal secrecy, even successful cryptanalysis will leave some uncertainty about whether the recovered plaintext is the real plaintext.
Information Theory in Practice
While these concepts have great theoretical value, actual cryptanalysis seldom proceeds along these lines. Unicity distance guarantees insecurity if it's too small but does not guarantee security if it's high. Few practical algorithms are absolutely impervious to analysis; all manner of characteristics might serve as entering wedges
Table 11.1
Unicity Distances of ASCII Text Encrypted with Algorithms with Varying Key Lengths

Key Length (in bits)    Unicity Distance (in characters)
40                      5.9
56                      8.2
64                      9.4
80                      11.8
128                     18.8
256                     37.6
to crack some encrypted messages. However, similar information theory considerations are occasionally useful, for example, to determine a recommended key change interval for a particular algorithm. Cryptanalysts also employ a variety of statistical and information theory tests to help guide the analysis in the most promising directions. Unfortunately, most literature on applying information theory to cryptanalysis remains classified, including the seminal 1940 work of Alan Turing.
Confusion and Diffusion
The two basic techniques for obscuring the redundancies in a plaintext message are, according to Shannon, confusion and diffusion [1432].
Confusion obscures the relationship between the plaintext and the ciphertext. This frustrates attempts to study the ciphertext looking for redundancies and statistical patterns. The easiest way to do this is through substitution. A simple substitution cipher, like the Caesar Cipher, is one in which every identical letter of plaintext is substituted for a single letter of ciphertext. Modern substitution ciphers are more complex: A long block of plaintext is substituted for a different block of ciphertext, and the mechanics of the substitution change with each bit in the plaintext or key. This type of substitution is not necessarily enough; the German Enigma is a complex substitution algorithm that was broken during World War II.
Diffusion dissipates the redundancy of the plaintext by spreading it out over the ciphertext. A cryptanalyst looking for those redundancies will have a harder time finding them. The simplest way to cause diffusion is through transposition (also called permutation). A simple transposition cipher, like columnar transposition, simply rearranges the letters of the plaintext. Modern ciphers do this type of permutation, but they also employ other forms of diffusion that can diffuse parts of the message throughout the entire message.
Stream ciphers rely on confusion alone, although some feedback schemes add diffusion. Block algorithms use both confusion and diffusion. As a general rule, diffusion alone is easily cracked (although double transposition ciphers hold up better than many other pencil-and-paper systems).
11.2 COMPLEXITY THEORY
Complexity theory provides a methodology for analyzing the computational complexity of different cryptographic techniques and algorithms. It compares cryptographic algorithms and techniques and determines their security. Information theory tells us that all cryptographic algorithms (except one-time pads) can be broken. Complexity theory tells us whether they can be broken before the heat death of the universe.
Complexity of Algorithms
An algorithm's complexity is determined by the computational power needed to execute it. The computational complexity of an algorithm is often measured by two variables: T (for time complexity) and S (for space complexity, or memory requirement). Both T and S are commonly expressed as functions of n, where n is the size
of the input. (There are other measures of complexity: the number of random bits,
the communications bandwidth, the amount of data, and so on.)
Generally, the computational complexity of an algorithm is expressed in what is called "big O" notation: the order of magnitude of the computational complexity. It's just the term of the complexity function which grows the fastest as n gets larger; all lower-order terms are ignored. For example, if the time complexity of a given algorithm is 4n^2 + 7n + 12, then the computational complexity is on the order of n^2, expressed O(n^2).
Measuring time complexity this way is system-independent. You don't have to know the exact timings of various instructions or the number of bits used to represent different variables or even the speed of the processor. One computer might be 50 percent faster than another and a third might have a data path twice as wide, but the order-of-magnitude complexity of an algorithm remains the same. This isn't cheating; when you're dealing with algorithms as complex as the ones presented here, the other stuff is negligible (is a constant factor) compared to the order-of-magnitude complexity.
This notation allows you to see how the input size affects the time and space requirements. For example, if T = O(n), then doubling the input size doubles the running time of the algorithm. If T = O(2^n), then adding one bit to the input size doubles the running time of the algorithm (within a constant factor).
Generally, algorithms are classified according to their time or space complexities. An algorithm is constant if its complexity is independent of n: O(1). An algorithm is linear if its time complexity is O(n). Algorithms can also be quadratic, cubic, and so on. All these algorithms are polynomial; their complexity is O(n^m), where m is a constant. The class of algorithms that have a polynomial time complexity are called polynomial-time algorithms.
Algorithms whose complexities are O(t^f(n)), where t is a constant greater than 1 and f(n) is some polynomial function of n, are called exponential. The subset of exponential algorithms whose complexities are O(c^f(n)), where c is a constant and f(n) is more than constant but less than linear, is called superpolynomial.
Ideally, a cryptographer would like to be able to say that the best algorithm to break this encryption algorithm is of exponential-time complexity. In practice, the strongest statements that can be made, given the current state of the art of computational complexity theory, are of the form "all known cracking algorithms for this cryptosystem are of superpolynomial-time complexity." That is, the cracking algorithms that we know are of superpolynomial-time complexity, but it is not yet possible to prove that no polynomial-time cracking algorithm could ever be discovered. Advances in computational complexity may some day make it possible to design algorithms for which the existence of polynomial-time cracking algorithms can be ruled out with mathematical certainty.
As n grows, the time complexity of an algorithm can make an enormous difference in whether the algorithm is practical. Table 11.2 shows the running times for different algorithm classes in which n equals one million. The table ignores constants, but also shows why ignoring constants is reasonable.
Table 11.2
Running Times of Different Classes of Algorithms

Class        Complexity   # of Operations for n = 10^6   Time at 10^6 O/S
Constant     O(1)         1                              1 μsec.
Linear       O(n)         10^6                           1 sec.
Quadratic    O(n^2)       10^12                          11.6 days
Cubic        O(n^3)       10^18                          32,000 yrs.
Exponential  O(2^n)       10^301,030                     10^301,006 times the age
                                                         of the universe
Assuming that the unit of "time" for our computer is a microsecond, the computer can complete a constant algorithm in a microsecond, a linear algorithm in a second, and a quadratic algorithm in 11.6 days. It would take 32,000 years to complete a cubic algorithm; not terribly practical, but a computer built to withstand the next ice age would deliver a solution eventually. Performing the exponential algorithm is futile, no matter how well you extrapolate computing power, parallel processing, or contact with superintelligent aliens.
Look at the problem of a brute-force attack against an encryption algorithm. The time complexity of this attack is proportional to the number of possible keys, which is an exponential function of the key length. If n is the length of the key, then the complexity of a brute-force attack is O(2^n). Section 12.3 discusses the controversy surrounding a 56-bit key for DES instead of a 112-bit key. The complexity of a brute-force attack against a 56-bit key is 2^56; against a 112-bit key the complexity is 2^112. The former is possible; the latter isn't.
Complexity of Problems
Complexity theory also classifies the inherent complexity of problems, not just the complexity of particular algorithms used to solve problems. (Excellent introductions to this topic are [600,211,1226]; see also [1096,27,739].) The theory looks at the minimum time and space required to solve the hardest instance of a problem on a theoretical computer known as a Turing machine. A Turing machine is a finite-state machine with an infinite read-write memory tape. It turns out that a Turing machine is a realistic model of computation.
Problems that can be solved with polynomial-time algorithms are called tractable, because they can usually be solved in a reasonable amount of time for reasonable-sized inputs. (The exact definition of "reasonable" depends on the circumstance.) Problems that cannot be solved in polynomial time are called intractable, because calculating their solution quickly becomes infeasible. Intractable problems are sometimes just called hard. Problems that can only be solved with algorithms that are superpolynomial are computationally intractable, even for relatively small values of n.
It gets worse. Alan Turing proved that some problems are undecidable. It is impossible to devise any algorithm to solve them, regardless of the algorithm's time complexity.
Problems can be divided into complexity classes, which depend on the complexity of their solutions. Figure 11.1 shows the more important complexity classes and their presumed relationships. (Unfortunately, not much about this material has been proved mathematically.)
On the bottom, the class P consists of all problems that can be solved in polynomial time. The class NP consists of all problems that can be solved in polynomial time only on a nondeterministic Turing machine: a variant of a normal Turing machine that can make guesses. The machine guesses the solution to the problem, either by making "lucky guesses" or by trying all guesses in parallel, and checks its guess in polynomial time.
NP's relevance to cryptography is this: Many symmetric algorithms and all public-key algorithms can be cracked in nondeterministic polynomial time. Given a ciphertext C, the cryptanalyst simply guesses a plaintext, X, and a key, k, and in polynomial time runs the encryption algorithm on inputs X and k and checks whether the result is equal to C. This is important theoretically, because it puts an upper bound on the complexity of cryptanalysis for these algorithms. In practice, of course, it is a deterministic polynomial-time algorithm that the cryptanalyst seeks. Furthermore, this argument is not applicable to all classes of ciphers; in particular, it is not applicable to one-time pads: for any C, there are many X, k pairs that yield C when run through the encryption algorithm, but most of these Xs are nonsense, not legitimate plaintexts.
Figure 11.1 Complexity classes.
The class NP includes the class P, because any problem solvable in polynomial
time on a deterministic Turing machine is also solvable in polynomial time on a
nondeterministic Turing machine; the guessing stage can simply be omitted.
If all NP problems are solvable in polynomial time on a deterministic machine, then P = NP. Although it seems obvious that some NP problems are much harder than others (a brute-force attack against an encryption algorithm versus encrypting a random block of plaintext), it has never been proven that P ≠ NP (or that P = NP). However, most people working in complexity theory believe that they are unequal.
Stranger still, specific problems in NP can be proven to be as difficult as any problem in the class. Stephen Cook [365] proved that the Satisfiability problem (given a propositional Boolean formula, is there a way to assign truth values to the variables that makes the formula true?) is NP-complete. This means that, if Satisfiability is solvable in polynomial time, then P = NP. Conversely, if any problem in NP can be proven not to have a deterministic polynomial-time algorithm, the proof will show that Satisfiability does not have a deterministic polynomial-time algorithm either. No problem is harder than Satisfiability in NP.
Since Cook's seminal paper was published, a huge number of problems have been shown to be equivalent to Satisfiability; hundreds are listed in [600], and some examples follow. By equivalent, I mean that these problems are also NP-complete; they are in NP and also as hard as any problem in NP. If their solvability in deterministic polynomial time were resolved, the P versus NP question would be solved.
The question of whether P = NP is the central unsolved question of computational complexity theory, and no one expects it to be solved anytime soon. If someone showed that P = NP, then most of this book would be irrelevant: As previously explained, many classes of ciphers are trivially breakable in nondeterministic polynomial time. If P = NP, they are breakable by feasible, deterministic algorithms.
Further out in the complexity hierarchy is PSPACE. Problems in PSPACE can be solved in polynomial space, but not necessarily polynomial time. PSPACE includes NP, but some problems in PSPACE are thought to be harder than NP. Of course, this isn't proven either. There is a class of problems, the so-called PSPACE-complete problems, with the property that, if any one of them is in NP then PSPACE = NP, and if any one of them is in P then PSPACE = P.
And finally, there is the class of problems called EXPTIME. These problems are solvable in exponential time. The EXPTIME-complete problems can actually be proven not to be solvable in deterministic polynomial time. It has been shown that P does not equal EXPTIME.
NP-Complete Problems
Michael Garey and David Johnson compiled a list of over 300 NP-complete problems [600]. Here are just a few of them:

- Traveling Salesman Problem. A traveling salesman has to visit n different cities using only one tank of gas (there is a maximum distance he can travel). Is there a route that allows him to visit each city
exactly once on that single tank of gas? (This is a generalization of the Hamiltonian Cycle problem; see Section 5.1.)
- Three-Way Marriage Problem. In a room are n men, n women, and n clergymen (priests, rabbis, whatever). There is also a list of acceptable marriages, which consists of one man, one woman, and one clergyman willing to officiate. Given this list of possible triples, is it possible to arrange n marriages such that everyone is either marrying one person or officiating at one marriage?
- Three-Satisfiability. There is a list of n logical statements, each with three variables. For example: if (x and y) then z; (x and w) or (not z); if ((not u and not x) or (z and (u or not x))) then ((not z and u) or x); and so on. Is there a truth assignment for all the variables that satisfies all the statements? (This is a special case of the Satisfiability problem previously mentioned.)
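The obvious way to answer such a question is to try every truth assignment, which takes 2^n steps for n variables; that exponential blowup is exactly what makes the problem interesting. A sketch, using a made-up three-variable formula rather than the statements above:

```c
#include <stdbool.h>

/* A made-up formula in three variables (not one from the text):
 * (x or y or z) and ((not x) or y) and ((not y) or (not z)) */
static bool formula(bool x, bool y, bool z)
{
    return (x || y || z) && (!x || y) && (!y || !z);
}

/* Exhaustive search over all 2^3 truth assignments; the low three
 * bits of a encode the truth values of x, y, and z. */
bool satisfiable(void)
{
    unsigned a;
    for (a = 0; a < 8; a++)
        if (formula(a & 1, (a >> 1) & 1, (a >> 2) & 1))
            return true;
    return false;
}
```

This particular formula is satisfied by, among others, x = false, y = true, z = false; for a formula with hundreds of variables, the same loop would run longer than the lifetimes in Table 11.2 suggest is reasonable.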
11.3 NUMBER THEORY
This isn't a book on number theory, so I'm just going to sketch a few ideas that apply to cryptography. If you want a detailed mathematical text on number theory, consult one of these books: [1430,72,1171,12,959,681,742,420]. My two favorite books on the mathematics of finite fields are [971,1042]. See also [88,1157,1158,1060].
Modular Arithmetic
You all learned modular arithmetic in school; it was called "clock arithmetic." Remember these word problems? If Mildred says she'll be home by 10:00, and she's 13 hours late, what time does she get home and for how many years does her father ground her? That's arithmetic modulo 12. Twenty-three modulo 12 equals 11.

(10 + 13) mod 12 = 23 mod 12 = 11 mod 12
Another way of writing this is to say that 23 and 11 are equivalent, modulo 12:
23 ≡ 11 (mod 12)
Basically, a ≡ b (mod n) if a = b + kn for some integer k. If a is nonnegative and b is between 0 and n, you can think of b as the remainder of a when divided by n. Sometimes, b is called the residue of a, modulo n. Sometimes a is called congruent to b, modulo n (the triple equals sign, ≡, denotes congruence). These are just different ways of saying the same thing.
The set of integers from 0 to n - 1 form what is called a complete set of residues modulo n. This means that, for every integer a, its residue modulo n is some number from 0 to n - 1.
The operation a mod n denotes the residue of a, such that the residue is some integer from 0 to n - 1. This operation is modular reduction. For example, 5 mod 3 = 2.
This definition of mod may be different from the definition used in some programming languages. For example, PASCAL's modulo operator sometimes returns a
negative number. It returns a number between -(n - 1) and n - 1. In C, the % operator returns the remainder from the division of the first expression by the second; this can be a negative number if either operand is negative. For all the algorithms in this book, make sure you add n to the result of the modulo operator if it returns a negative number.
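That adjustment can be packaged once and for all. A sketch of the fix just described; the function name is mine:

```c
/* Reduce a modulo n into the range 0 to n-1, even when C's % operator
 * hands back a negative remainder.  Assumes n > 0. */
long mod(long a, long n)
{
    long r = a % n;
    return (r < 0) ? r + n : r;
}
```

For example, mod(-7, 3) is 2, whereas -7 % 3 in C yields -1 on most implementations; mod(23, 12) is 11, matching the clock-arithmetic example above.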
Modular arithmetic is just like normal arithmetic: It's commutative, associative, and distributive. Also, reducing each intermediate result modulo n yields the same result as doing the whole calculation and then reducing the end result modulo n.
(a + b) mod n = ((a mod n) + (b mod n)) mod n
(a - b) mod n = ((a mod n) - (b mod n)) mod n
(a*b) mod n = ((a mod n)*(b mod n)) mod n
(a*(b + c)) mod n = (((a*b) mod n) + ((a*c) mod n)) mod n
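These identities can be spot-checked numerically. A sketch; the subtraction case adds n before the final reduction so that both sides land in the range 0 to n-1, in line with the earlier caveat about C's % operator:

```c
/* Check the four identities above for one triple of nonnegative values.
 * Returns 1 if every identity holds.  A spot check, not a proof. */
int identities_hold(long a, long b, long c, long n)
{
    int add  = (a + b) % n == ((a % n) + (b % n)) % n;
    int sub  = ((a - b) % n + n) % n == (((a % n) - (b % n)) % n + n) % n;
    int mul  = (a * b) % n == ((a % n) * (b % n)) % n;
    int dist = (a * (b + c)) % n == (((a * b) % n) + ((a * c) % n)) % n;
    return add && sub && mul && dist;
}
```

For instance, identities_hold(17, 5, 9, 7) returns 1.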
Cryptography uses computation mod n a lot, because calculating discrete logarithms and square roots mod n can be hard problems. Modular arithmetic is also easier to work with on computers, because it restricts the range of all intermediate values and the result. For a k-bit modulus, n, the intermediate results of any addition, subtraction, or multiplication will not be more than 2k bits long. So we can perform exponentiation in modular arithmetic without generating huge intermediate results. Calculating the power of some number modulo some number,
a^x mod n,
is just a series of multiplications and divisions, but there are speedups. One kind of speedup aims to minimize the number of modular multiplications; another kind aims to optimize the individual modular multiplications. Because the operations are distributive, it is faster to do the exponentiation as a stream of successive multiplications, taking the modulus every time. It doesn't make much difference now, but it will when you're working with 200-bit numbers.
For example, if you want to calculate a^8 mod n, don't use the naive approach and perform seven multiplications and one huge modular reduction:

(a*a*a*a*a*a*a*a) mod n

Instead, perform three smaller multiplications and three smaller modular reductions:

((a^2 mod n)^2 mod n)^2 mod n
By the same token,

a^16 mod n = (((a^2 mod n)^2 mod n)^2 mod n)^2 mod n

Computing a^x mod n, where x is not a power of 2, is only slightly harder. Binary notation expresses x as a sum of powers of 2: 25 is 11001 in binary, so 25 = 2^4 + 2^3 + 2^0. So

a^25 mod n = (a*a^24) mod n = (a*a^8*a^16) mod n
= (a*((a^2)^2)^2*(((a^2)^2)^2)^2) mod n = ((((a^2*a)^2)^2)^2*a) mod n
With judicious storing of intermediate results, you only need six multiplications:

(((((((a^2 mod n)*a) mod n)^2 mod n)^2 mod n)^2 mod n)*a) mod n

This is called addition chaining [863], or the binary square and multiply method. It uses a simple and obvious addition chain based on the binary representation. In C, it looks like:
unsigned long qe2(unsigned long x, unsigned long y, unsigned long n)
{
    unsigned long s, t, u;

    s = 1; t = x; u = y;
    while (u) {
        if (u & 1) s = (s*t) % n;
        u >>= 1;
        t = (t*t) % n;
    }
    return s;
}
Another, recursive, algorithm is:

unsigned long fastexp(unsigned long x, unsigned long y, unsigned long N)
{
    unsigned long tmp;

    if (y == 1) return (x % N);
    if ((y & 1) == 0) {
        tmp = fastexp(x, y/2, N);
        return ((tmp*tmp) % N);
    } else {
        tmp = fastexp(x, (y-1)/2, N);
        tmp = (tmp*tmp) % N;
        tmp = (tmp*x) % N;
        return tmp;
    }
}
This technique reduces the operation to, on the average, 1.5*k operations, if k is the length of the exponent x in bits. Finding the calculation with the fewest operations is a hard problem (it has been proven that the sequence must contain at least k - 1 operations), but it is not too hard to get the number of operations down to 1.1*k or better, as k grows.
An efficient way to do modular reductions many times using the same n is Montgomery's method [1111]. Another method is called Barrett's algorithm [87]. The software performance of these two algorithms and the algorithm previously discussed is in [210]: The algorithm I've discussed is the best choice for single modular reductions; Barrett's algorithm is the best choice for small arguments; and Montgomery's method is the best choice for general modular exponentiations. (Montgomery's method can also take advantage of small exponents, using something called mixed arithmetic.)
The inverse of exponentiation modulo n is calculating a discrete logarithm. I'll discuss this shortly.
Prime Numbers
A prime number is an integer greater than 1 whose only factors are 1 and itself: No other number evenly divides it. Two is a prime number. So are 73, 2521, 2365347734339, and 2^756839 - 1. There are an infinite number of primes. Cryptography, especially public-key cryptography, uses large primes (512 bits and even larger) often.

Evangelos Kranakis wrote an excellent book on number theory, prime numbers, and their applications to cryptography [896]. Paulo Ribenboim wrote two excellent references on prime numbers in general [1307,1308].
Greatest Common Divisor
Two numbers are relatively prime when they share no factors in common other than 1; in other words, the greatest common divisor of a and n is equal to 1. This is written:

gcd(a,n) = 1

The numbers 15 and 28 are relatively prime, 15 and 27 are not, and 13 and 500 are. A prime number is relatively prime to all other numbers except its multiples.

One way to compute the greatest common divisor of two numbers is with Euclid's algorithm. Euclid described the algorithm in his book, Elements, written around 300 B.C. He didn't invent it. Historians believe the algorithm could be 200 years older. It is the oldest nontrivial algorithm that has survived to the present day, and it is still a good one. Knuth describes the algorithm and some modern modifications [863].
In C:

/* returns gcd of x and y */
int gcd (int x, int y)
{
    int g;

    if (x < 0)
        x = -x;
    if (y < 0)
        y = -y;
    if (x + y == 0)
        ERROR;
    g = y;
    while (x > 0) {
        g = x;
        x = y % x;
        y = g;
    }
    return g;
}
This algorithm can be generalized to return the gcd of an array of m numbers:

/* returns the gcd of x1, x2...xm */
int multiple_gcd (int m, int *x)
{
    size_t i;
    int g;

    if (m < 1)
        return 0;
    g = x[0];
    for (i=1; i<m; ++i) {
        g = gcd(g, x[i]);
        /* optimization, since for random x[i], g==1 60% of the time: */
        if (g == 1)
            return 1;
    }
    return g;
}
Inverses Modulo a Number

Remember inverses? The multiplicative inverse of 4 is 1/4, because 4*1/4 = 1. In the modulo world, the problem is more complicated:

4*x = 1 (mod 7)

This equation is equivalent to finding an x and k such that

4x = 7k + 1

where both x and k are integers.

The general problem is finding an x such that

1 = (a*x) mod n

This is also written as

a^-1 = x (mod n)

The modular inverse problem is a lot more difficult to solve. Sometimes it has a solution, sometimes not. For example, the inverse of 5, modulo 14, is 3. On the other hand, 2 has no inverse modulo 14.

In general, a^-1 = x (mod n) has a unique solution if a and n are relatively prime. If a and n are not relatively prime, then a^-1 = x (mod n) has no solution. If n is a prime number, then every number from 1 to n - 1 is relatively prime to n and has exactly one inverse modulo n in that range.

So far, so good. Now, how do you go about finding the inverse of a modulo n? There are a couple of ways. Euclid's algorithm can also compute the inverse of a number modulo n. Sometimes this is called the extended Euclidean algorithm.
Here's the algorithm in C++:

#define isEven(x) ((x & 0x01) == 0)
#define isOdd(x)  (x & 0x01)
#define swap(x,y) (x ^= y, y ^= x, x ^= y)

void ExtBinEuclid(int *u, int *v, int *u1, int *u2, int *u3)
{
    // warning: u and v will be swapped if u < v
    int k, t1, t2, t3;

    if (*u < *v) swap(*u,*v);
    for (k = 0; isEven(*u) && isEven(*v); ++k) {
        *u >>= 1; *v >>= 1;
    }
    *u1 = 1; *u2 = 0; *u3 = *u; t1 = *v; t2 = *u - 1; t3 = *v;
    do {
        do {
            if (isEven(*u3)) {
                if (isOdd(*u1) || isOdd(*u2)) {
                    *u1 += *v; *u2 += *u;
                }
                *u1 >>= 1; *u2 >>= 1; *u3 >>= 1;
            }
            if (isEven(t3) || *u3 < t3) {
                swap(*u1,t1); swap(*u2,t2); swap(*u3,t3);
            }
        } while (isEven(*u3));
        while (*u1 < t1 || *u2 < t2) {
            *u1 += *v; *u2 += *u;
        }
        *u1 -= t1; *u2 -= t2; *u3 -= t3;
    } while (t3 > 0);
    while (*u1 >= *v && *u2 >= *u) {
        *u1 -= *v; *u2 -= *u;
    }
    *u1 <<= k; *u2 <<= k; *u3 <<= k;
}

main(int argc, char **argv) {
    int a, b, gcd;

    if (argc < 3) {
        cerr << "Usage: xeuclid u v" << endl;
        return 1;
    }
    int u = atoi(argv[1]);
    int v = atoi(argv[2]);
    if (u <= 0 || v <= 0) {
        cerr << "Arguments must be positive!" << endl;
        return 2;
    }
    // warning: u and v will be swapped if u < v
    ExtBinEuclid(&u, &v, &a, &b, &gcd);
    cout << a << " * " << u << " + ("
         << b << ") * " << v << " = " << gcd << endl;
    if (gcd == 1)
        cout << "the inverse of " << v << " mod " << u << " is: "
             << u + b << endl;
    return 0;
}
I'm not going to prove that it works or give the theory behind it. Details can be found in [863], or in any of the number theory texts previously listed.

The algorithm is iterative and can be slow for large numbers. Knuth showed that the average number of divisions performed by the algorithm is:

0.843*log2(n) + 1.47
Solving for Coefficients

Euclid's algorithm can be used to solve this class of problems: Given an array of m variables x1, x2, ..., xm, find an array of m coefficients, u1, u2, ..., um, such that

u1*x1 + ... + um*xm = 1
Fermat's Little Theorem

If m is a prime, and a is not a multiple of m, then Fermat's little theorem says

a^(m-1) = 1 (mod m)

(Pierre de Fermat, pronounced "Fair-mah," was a French mathematician who lived from 1601 to 1665. This theorem has nothing to do with his last theorem.)
The Euler Totient Function

There is another method for calculating the inverse modulo n, but it's not always possible to use it. The reduced set of residues mod n is the subset of the complete set of residues that is relatively prime to n. For example, the reduced set of residues mod 12 is {1, 5, 7, 11}. If n is prime, then the reduced set of residues mod n is the set of all numbers from 1 to n - 1. The number 0 is never part of the reduced set of residues for any n not equal to 1.

The Euler totient function, also called the Euler phi function and written as φ(n), is the number of elements in the reduced set of residues modulo n. In other words, φ(n) is the number of positive integers less than n that are relatively prime to n (for any n greater than 1). (Leonhard Euler, pronounced "Oiler," was a Swiss mathematician who lived from 1707 to 1783.)

If n is prime, then φ(n) = n - 1. If n = pq, where p and q are prime, then φ(n) = (p - 1)(q - 1). These numbers appear in some public-key algorithms; this is why.

According to Euler's generalization of Fermat's little theorem, if gcd(a,n) = 1, then

a^φ(n) mod n = 1

Now it is easy to compute a^-1 mod n:

x = a^(φ(n)-1) mod n

For example, what is the inverse of 5, modulo 7? Since 7 is prime, φ(7) = 7 - 1 = 6. So, the inverse of 5, modulo 7, is

5^(6-1) mod 7 = 5^5 mod 7 = 3
Both methods for calculating inverses can be extended to solve for x in the general problem (if gcd(a,n) = 1):

(a*x) mod n = b

Using Euler's generalization, solve

x = (b*a^(φ(n)-1)) mod n

Using Euclid's algorithm, solve

x = (b*(a^-1 mod n)) mod n

In general, Euclid's algorithm is faster than Euler's generalization for calculating inverses, especially for numbers in the 500-bit range. If gcd(a,n) ≠ 1, all is not lost. In this general case, (a*x) mod n = b can have multiple solutions or no solution.
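As a sketch of the Euler approach (the function names are mine, not the book's): for a prime modulus p, φ(p) = p - 1, so the inverse is a^(p-2) mod p, computed with the same square-and-multiply loop as qe2 earlier:

```c
#include <assert.h>

/* square-and-multiply, as in qe2 earlier */
static unsigned long modexp(unsigned long x, unsigned long y, unsigned long n)
{
    unsigned long s = 1, t = x % n;

    while (y) {
        if (y & 1) s = (s * t) % n;
        y >>= 1;
        t = (t * t) % n;
    }
    return s;
}

/* inverse of a modulo a prime p, via Euler: a^(phi(p)-1) = a^(p-2) mod p */
unsigned long inverse_mod_prime(unsigned long a, unsigned long p)
{
    return modexp(a, p - 2, p);
}
```

For example, inverse_mod_prime(5, 7) returns 3, matching the worked example above. For a composite modulus you would need φ(n), which requires knowing the factorization of n; that is why Euclid's algorithm is usually preferred in practice.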
Chinese Remainder Theorem

If you know the prime factorization of n, then you can use something called the Chinese remainder theorem to solve a whole system of equations. The basic version of this theorem was discovered by the first-century Chinese mathematician, Sun Tse.

In general, if the prime factorization of n is p1*p2*...*pt, then the system of equations

(x mod pi) = ai, where i = 1, 2, ..., t

has a unique solution, x, where x is less than n. (Note that some primes can appear more than once. For example, p1 might be equal to p2.) In other words, a number (less than the product of some primes) is uniquely identified by its residues mod those primes.

For example, use 3 and 5 as primes, and 14 as the number. 14 mod 3 = 2, and 14 mod 5 = 4. There is only one number less than 3*5 = 15 which has those residues: 14. The two residues uniquely determine the number.

So, for an arbitrary a < p and b < q (where p and q are prime), there exists a unique x, where x is less than pq, such that

x = a (mod p), and x = b (mod q)

To find this x, first use Euclid's algorithm to find u, such that

u*q = 1 (mod p)

Then compute:

x = (((a - b)*u) mod p)*q + b
Here is the Chinese remainder theorem in C:

/* r is the number of elements in arrays m and u;
m is the array of (pairwise relatively prime) moduli
u is the array of coefficients
return value is n such that n == u[k]%m[k] (k=0..r-1) and
n < m[0]*m[1]*...*m[r-1]
*/
/* totient() is left as an exercise to the reader. */
int chinese_remainder (size_t r, int *m, int *u)
{
    size_t i;
    int modulus;
    int n;

    modulus = 1;
    for (i=0; i<r; ++i)
        modulus *= m[i];
    n = 0;
    for (i=0; i<r; ++i) {
        n += u[i] * modexp(modulus / m[i], totient(m[i]), modulus);
        n %= modulus;
    }
    return n;
}
The converse of the Chinese remainder theorem can also be used to find the solution to the problem: if p and q are primes, and p is less than q, then there exists a unique x less than pq, such that

a = x (mod p), and b = x (mod q)

If a ≥ b mod p, then

x = (((a - (b mod p))*u) mod p)*q + b

If a < b mod p, then

x = (((a + p - (b mod p))*u) mod p)*q + b
Quadratic Residues

If p is prime, and a is greater than 0 and less than p, then a is a quadratic residue mod p if

x^2 = a (mod p), for some x

Not all values of a satisfy this property. For a to be a quadratic residue modulo n, it must be a quadratic residue modulo all the prime factors of n. For example, if p = 7, the quadratic residues are 1, 2, and 4:
1^2 = 1 = 1 (mod 7)
2^2 = 4 = 4 (mod 7)
3^2 = 9 = 2 (mod 7)
4^2 = 16 = 2 (mod 7)
5^2 = 25 = 4 (mod 7)
6^2 = 36 = 1 (mod 7)
Note that each quadratic residue appears twice on this list.
There are no values of x which satisfy any of these equations:
x^2 = 3 (mod 7)
x^2 = 5 (mod 7)
x^2 = 6 (mod 7)
The quadratic nonresidues modulo 7, the numbers that are not quadratic residues,
are 3, 5, and 6.
Although I will not do so here, it is easy to prove that, when p is odd, there are exactly (p - 1)/2 quadratic residues mod p and the same number of quadratic nonresidues mod p. Also, if a is a quadratic residue mod p, then a has exactly two square roots, one of them between 0 and (p - 1)/2, and the other between (p - 1)/2 and (p - 1). One of these square roots is also a quadratic residue mod p; this is called the principal square root.

If n is the product of two primes, p and q, there are exactly (p - 1)(q - 1)/4 quadratic residues mod n. A quadratic residue mod n is a perfect square modulo n. This is because to be a square mod n, the residue must be a square mod p and a square mod q. For example, there are 11 quadratic residues mod 35: 1, 4, 9, 11, 14, 15, 16, 21, 25, 29, and 30. Each quadratic residue has exactly four square roots.
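These counts are easy to verify by brute force for small moduli. A minimal sketch (the function name is mine, not from the text):

```c
#include <assert.h>

/* counts the a in 1..n-1 for which x*x = a (mod n) has a solution */
int count_quadratic_residues(int n)
{
    int a, x, count = 0;

    for (a = 1; a < n; ++a)
        for (x = 1; x < n; ++x)
            if ((x * x) % n == a) {
                ++count;
                break;
            }
    return count;
}
```

For example, count_quadratic_residues(7) returns 3 and count_quadratic_residues(35) returns 11, matching the examples above.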
Legendre Symbol
The Legendre symbol, written L(a,p), is defined when a is any integer and p is a prime greater than 2. It is equal to 0, 1, or -1.

L(a,p) = 0 if a is divisible by p.
L(a,p) = 1 if a is a quadratic residue mod p.
L(a,p) = -1 if a is a quadratic nonresidue mod p.

One way to calculate L(a,p) is:

L(a,p) = a^((p-1)/2) mod p

Or you can use the following algorithm:

1. If a = 1, then L(a,p) = 1
2. If a is even, then L(a,p) = L(a/2,p)*(-1)^((p^2-1)/8)
3. If a is odd (and ≠ 1), then L(a,p) = L(p mod a,a)*(-1)^((a-1)(p-1)/4)

Note that this is also an efficient way to determine whether a is a quadratic residue mod p (when p is prime).
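The first method (Euler's criterion) can be sketched in C as follows (function names are mine); a result of p - 1 corresponds to -1:

```c
#include <assert.h>

/* square-and-multiply modular exponentiation */
static unsigned long modexp(unsigned long x, unsigned long y, unsigned long n)
{
    unsigned long s = 1, t = x % n;

    while (y) {
        if (y & 1) s = (s * t) % n;
        y >>= 1;
        t = (t * t) % n;
    }
    return s;
}

/* Legendre symbol via a^((p-1)/2) mod p; p must be an odd prime */
int legendre(unsigned long a, unsigned long p)
{
    unsigned long r = modexp(a, (p - 1) / 2, p);

    if (r == 0) return 0;
    return (r == 1) ? 1 : -1;   /* r == p-1 means a is a nonresidue */
}
```

For p = 7 this returns 1 exactly for the quadratic residues 1, 2, and 4 listed earlier, and -1 for 3, 5, and 6.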
Jacobi Symbol

The Jacobi symbol, written J(a,n), is a generalization of the Legendre symbol to composite moduli; it is defined for any integer a and any odd integer n. The function shows up in primality testing. The Jacobi symbol is a function on the set of reduced residues of the divisors of n and can be calculated by several formulas [1412]. This is one method:

Definition 1: J(a,n) is only defined if n is odd.
Definition 2: J(0,n) = 0.
Definition 3: If n is prime, then the Jacobi symbol J(a,n) = 0 if n divides a.
Definition 4: If n is prime, then the Jacobi symbol J(a,n) = 1 if a is a quadratic residue modulo n.
Definition 5: If n is prime, then the Jacobi symbol J(a,n) = -1 if a is a quadratic nonresidue modulo n.
Definition 6: If n is composite, then the Jacobi symbol J(a,n) = J(a,p1)*...*J(a,pm), where p1...pm is the prime factorization of n.

The following algorithm computes the Jacobi symbol recursively:

Rule 1: J(1,n) = 1
Rule 2: J(a*b,n) = J(a,n)*J(b,n)
Rule 3: J(2,n) = 1 if (n^2 - 1)/8 is even, and -1 otherwise
Rule 4: J(a,n) = J((a mod n),n)
Rule 5: J(a,b1*b2) = J(a,b1)*J(a,b2)
Rule 6: If the greatest common divisor of a and b = 1, and a and b are odd:
    Rule 6a: J(a,b) = J(b,a) if (a - 1)(b - 1)/4 is even
    Rule 6b: J(a,b) = -J(b,a) if (a - 1)(b - 1)/4 is odd
Here is the algorithm in C:

/* This algorithm computes the Jacobi symbol recursively */
int jacobi(int a, int b)
{
    int g;

    assert(odd(b));
    if (a >= b) a %= b; /* by Rule 4 */
    if (a == 0) return 0; /* by Definition 2 */
    if (a == 1) return 1; /* by Rule 1 */
    if (a < 0)
        if (((b-1)/2) % 2 == 0)
            return jacobi(-a,b);
        else
            return -jacobi(-a,b);
    if (a % 2 == 0) /* a is even */
        if (((b*b - 1)/8) % 2 == 0)
            return +jacobi(a/2, b);
        else
            return -jacobi(a/2, b); /* by Rule 3 and Rule 2 */
    g = gcd(a,b);
    assert(odd(a)); /* this is guaranteed by the (a % 2 == 0) test */
    if (g == a) /* a exactly divides b */
        return 0; /* by Rules 5 and 4, and Definition 2 */
    else if (g != 1)
        return jacobi(g,b) * jacobi(a/g, b); /* by Rule 2 */
    else if (((a-1)*(b-1)/4) % 2 == 0)
        return +jacobi(b,a); /* by Rule 6a */
    else
        return -jacobi(b,a); /* by Rule 6b */
}
If n is known to be prime beforehand, simply compute a^((n-1)/2) mod n instead of running the previous algorithm; in this case J(a,n) is equivalent to the Legendre symbol.

The Jacobi symbol cannot be used to determine whether a is a quadratic residue mod n (unless n is prime, of course). Note that, if J(a,n) = 1 and n is composite, it is not necessarily true that a is a quadratic residue modulo n. For example:

J(7,143) = J(7,11)*J(7,13) = (-1)(-1) = 1

However, there is no integer x such that x^2 = 7 (mod 143).
Blum Integers

If p and q are two primes, and both are congruent to 3 modulo 4, then n = pq is sometimes called a Blum integer. If n is a Blum integer, each quadratic residue has exactly four square roots, one of which is also a square; this is the principal square root. For example, the principal square root of 139 mod 437 is 24. The other three square roots are 185, 252, and 413.
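For a modulus this small, the four square roots can be found by exhaustive search. A sketch (the function name is mine):

```c
#include <assert.h>

/* collects every x in 1..n-1 with x*x = a (mod n); returns the count.
   Brute force, so only sensible for small n. */
int square_roots(int a, int n, int *roots)
{
    int x, count = 0;

    for (x = 1; x < n; ++x)
        if ((x * x) % n == a)
            roots[count++] = x;
    return count;
}
```

Running square_roots(139, 437, roots) finds exactly the four roots above: 24, 185, 252, and 413.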
Generators

If p is a prime, and g is less than p, then g is a generator mod p if

for each b from 1 to p - 1, there exists some a where g^a = b (mod p).

Another way of saying this is that g is primitive with respect to p.

For example, if p = 11, 2 is a generator mod 11:
2^10 = 1024 = 1 (mod 11)
2^1 = 2 = 2 (mod 11)
2^8 = 256 = 3 (mod 11)
2^2 = 4 = 4 (mod 11)
2^4 = 16 = 5 (mod 11)
2^9 = 512 = 6 (mod 11)
2^7 = 128 = 7 (mod 11)
2^3 = 8 = 8 (mod 11)
2^6 = 64 = 9 (mod 11)
2^5 = 32 = 10 (mod 11)

Every number from 1 to 10 can be expressed as 2^a (mod p).
For p = 11, the generators are 2, 6, 7, and 8. The other numbers are not generators. For example, 3 is not a generator because there is no solution to

3^a = 2 (mod 11)
In general, testing whether a given number is a generator is not an easy problem. It is easy, however, if you know the factorization of p - 1. Let q1, q2, ..., qn be the distinct prime factors of p - 1. To test whether a number g is a generator mod p, calculate

g^((p-1)/q) mod p

for all values of q = q1, q2, ..., qn.

If that number equals 1 for some value of q, then g is not a generator. If that value does not equal 1 for any values of q, then g is a generator.

For example, let p = 11. The prime factors of p - 1 = 10 are 2 and 5. To test whether 2 is a generator:

2^((11-1)/5) (mod 11) = 4
2^((11-1)/2) (mod 11) = 10

Neither result is 1, so 2 is a generator.
To test whether 3 is a generator:

3^((11-1)/5) (mod 11) = 9
3^((11-1)/2) (mod 11) = 1

Therefore, 3 is not a generator.

If you need to find a generator mod p, simply choose a random number from 1 to p - 1 and test whether it is a generator. Enough of them will be, so you'll probably find one fast.
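The test just described is easy to code (a sketch; the function names are mine), with the caller supplying the distinct prime factors of p - 1:

```c
#include <assert.h>
#include <stddef.h>

/* square-and-multiply modular exponentiation */
static unsigned long modexp(unsigned long x, unsigned long y, unsigned long p)
{
    unsigned long s = 1, t = x % p;

    while (y) {
        if (y & 1) s = (s * t) % p;
        y >>= 1;
        t = (t * t) % p;
    }
    return s;
}

/* g is a generator mod prime p iff g^((p-1)/q) != 1 (mod p)
   for every distinct prime factor q of p - 1 */
int is_generator(unsigned long g, unsigned long p,
                 const unsigned long *q, size_t nq)
{
    size_t i;

    for (i = 0; i < nq; ++i)
        if (modexp(g, (p - 1) / q[i], p) == 1)
            return 0;
    return 1;
}
```

For p = 11, with factors {2, 5} of p - 1, this reports 2, 6, 7, and 8 as generators and rejects 3, matching the worked examples above.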
Computing in a Galois Field

Don't be alarmed; that's what we were just doing. If n is prime or the power of a large prime, then we have what mathematicians call a finite field. In honor of that fact, we use p instead of n. In fact, this type of finite field is so exciting that mathematicians gave it its own name: a Galois field, denoted as GF(p). (Évariste Galois was a French mathematician who lived in the early nineteenth century and did a lot of work in number theory before he was killed at age 20 in a duel.)
In a Galois field, addition, subtraction, multiplication, and division by nonzero elements are all well-defined. There is an additive identity, 0, and a multiplicative identity, 1. Every nonzero number has a unique inverse (this would not be true if p were not prime). The commutative, associative, and distributive laws are true.

Arithmetic in a Galois field is used a great deal in cryptography. All of the number theory works; it keeps numbers a finite size, and division doesn't have any rounding errors. Many cryptosystems are based on GF(p), where p is a large prime.

To make matters even more complicated, cryptographers also use arithmetic modulo irreducible polynomials of degree n whose coefficients are integers modulo q, where q is prime. These fields are called GF(q^n). All arithmetic is done modulo p(x), where p(x) is an irreducible polynomial of degree n.

The mathematical theory behind this is far beyond the scope of the book, although I will describe some cryptosystems that use it. If you want to try to work more with this, GF(2^3) has the following elements: 0, 1, x, x + 1, x^2, x^2 + 1, x^2 + x, x^2 + x + 1. There is an algorithm for computing inverses in GF(2^n) that is suitable for parallel implementation [421].
When talking about polynomials, the term "prime" is replaced by "irreducible." A polynomial is irreducible if it cannot be expressed as the product of two other polynomials (except for 1 and itself, of course). The polynomial x^2 + 1 is irreducible over the integers. The polynomial x^3 + 2x^2 + x is not; it can be expressed as x(x + 1)(x + 1).

A polynomial that is a generator in a given field is called primitive; all its coefficients are relatively prime. We'll see primitive polynomials again when we talk about linear-feedback shift registers (see Section 16.2).

Computation in GF(2^n) can be quickly implemented in hardware with linear-feedback shift registers. For that reason, computation over GF(2^n) is often quicker than computation over GF(p). Just as exponentiation is much more efficient in GF(2^n), so is calculating discrete logarithms [180,181,368,379]. If you want to learn more about this, read [140].
For a Galois field GF(2^n), cryptographers like to use the trinomial p(x) = x^n + x + 1 as the modulus, because the long string of zeros between the x^n and x coefficients makes it easy to implement a fast modular multiplication [183]. The trinomial must be primitive, otherwise the math does not work. Values of n less than 1000 [1649,1648] for which x^n + x + 1 is primitive are:

1, 3, 4, 6, 9, 15, 22, 28, 30, 46, 60, 63, 127, 153, 172, 303, 471, 532, 865, 900

There exists a hardware implementation of GF(2^127) where p(x) = x^127 + x + 1 [1631,1632,1129]. Efficient hardware architectures for implementing exponentiation in GF(2^n) are discussed in [147].
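As a small illustration of why this is hardware-friendly, here is multiplication in GF(2^4) modulo the primitive trinomial x^4 + x + 1 (n = 4 is on the list above). Elements are 4-bit words, addition is XOR, and the reduction is a single shift-and-XOR, exactly what an LFSR step does. This is a sketch of mine, not code from the text:

```c
#include <assert.h>

/* multiplies two elements of GF(2^4) represented as 4-bit words,
   reducing modulo p(x) = x^4 + x + 1 (bit pattern 0x13) */
unsigned gf16_mul(unsigned a, unsigned b)
{
    unsigned r = 0;

    while (b) {
        if (b & 1) r ^= a;        /* addition in GF(2^n) is XOR */
        b >>= 1;
        a <<= 1;                  /* multiply a by x */
        if (a & 0x10) a ^= 0x13;  /* reduce: x^4 = x + 1 */
    }
    return r;
}
```

For example, gf16_mul(8, 2) returns 3, reflecting x^3 * x = x^4 = x + 1. Since x is a generator here, repeatedly multiplying by 2 cycles through all 15 nonzero elements.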
11.4 FACTORING
Factoring a number means finding its prime factors.

10 = 2*5
60 = 2*2*3*5
252601 = 41*61*101

The factoring problem is one of the oldest in number theory. It's simple to factor a number, but it's time-consuming. This is still true, but there have been some major advances in the state of the art.
Currently, the best factoring algorithm is:

Number field sieve (NFS) [953] (see also [952,16,279]). The general number field sieve is the fastest-known factoring algorithm for numbers larger than 110 digits or so [472,635]. It was impractical when originally proposed, but that has changed due to a series of improvements over the last few years [953]. The NFS is still too new to have broken any factoring records, but this will change soon. An early version was used to factor the ninth Fermat number: 2^512 + 1 [955,954].

Other factoring algorithms have been supplanted by the NFS:

Quadratic sieve (QS) [1257,1617,1259]. This is the fastest-known algorithm for numbers less than 110 decimal digits long and has been used extensively [440]. A faster version of this algorithm is called the multiple polynomial quadratic sieve [1453,302]. The fastest version of this algorithm is called the double large prime variation of the multiple polynomial quadratic sieve.

Elliptic curve method (ECM) [957,1112,1113]. This method has been used to find 43-digit factors, but nothing larger.

Pollard's Monte Carlo algorithm [1254,248]. (This algorithm also appears in volume 2, page 370 of Knuth [863].)

Continued fraction algorithm. See [1123,1252,863]. This algorithm isn't even in the running.

Trial division. This is the oldest factoring algorithm and consists of testing every prime number less than or equal to the square root of the candidate number.
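Trial division is trivial to implement for small numbers. A minimal sketch (the function name is mine; for simplicity it tries every candidate divisor, not just primes, which gives the same result):

```c
#include <assert.h>

/* trial division: stores the prime factors of n (with multiplicity)
   in factors[] and returns how many were found */
int trial_division(int n, int *factors)
{
    int d, count = 0;

    for (d = 2; (long)d * d <= n; ++d)
        while (n % d == 0) {
            factors[count++] = d;
            n /= d;
        }
    if (n > 1)
        factors[count++] = n;   /* the remaining cofactor is prime */
    return count;
}
```

Running trial_division(252601, factors) recovers 41, 61, and 101, matching the example above; the method becomes hopeless once the smallest factor has more than a dozen or so digits.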
See [251] for a good introduction to these different factoring algorithms, except for the NFS. The best discussion of the NFS is [953]. Other, older references are [505,1602,1258]. Information on parallel factoring can be found in [250].

If n is the number being factored, the fastest QS variants have a heuristic asymptotic run time of:

e^((1 + O(1))(ln(n))^(1/2)(ln(ln(n)))^(1/2))

The NFS is much faster, with a heuristic asymptotic time estimate of:

e^((1.923 + O(1))(ln(n))^(1/3)(ln(ln(n)))^(2/3))
In 1970, the big news was the factoring of a 41-digit hard number [1123]. (A "hard" number is one that does not have any small factors and is not of a special form that allows it to be factored more easily.) Ten years later, factoring hard numbers twice that size took a Cray computer just a few hours [440].
In 1988, Carl Pomerance designed a modular factoring machine, using custom VLSI chips [1259]. The size of the number you would be able to factor depends on how large a machine you can afford to build. He never built it.

In 1993, a 120-digit hard number was factored using the quadratic sieve; the calculation took 825 mips-years and was completed in three months real time [463]. Other results are [504].
Today's factoring attempts use computer networks [302,955]. In factoring a 116-digit number, Arjen Lenstra and Mark Manasse used 400 mips-years, the spare time on an array of computers around the world for a few months.

In March 1994, a 129-digit (428-bit) number was factored using the double large prime variation of the multiple polynomial QS [66] by a team of mathematicians led by Lenstra. Volunteers on the Internet carried out the computation: 600 people and 1600 machines over the course of eight months, probably the largest ad hoc multiprocessor ever assembled. The calculation was the equivalent of 4000 to 6000 mips-years. The machines communicated via electronic mail, sending their individual results to a central repository where the final steps of analysis took place. This computation used the QS and five-year-old theory; it would have taken one-tenth the time using the NFS [949]. According to [66]: "We conclude that commonly used 512-bit RSA moduli are vulnerable to any organization prepared to spend a few million dollars and to wait a few months." They estimate that factoring a 512-bit number would be 100 times harder using the same technology, and only 10 times harder using the NFS and current technology [949].
To keep up on the state of the art of factoring, RSA Data Security, Inc. set up the RSA Factoring Challenge in March 1991 [532]. The challenge consists of a list of hard numbers, each the product of two primes of roughly equal size. Each prime was chosen to be congruent to 2 modulo 3. There are 42 numbers in the challenge, one each of length 100 digits through 500 digits in steps of 10 digits (plus one additional number, 129 digits long). At the time of writing, RSA-100, RSA-110, RSA-120, and RSA-129 have been factored, all using the QS. RSA-130 might be next (using the NFS), or the factoring champions might skip directly to RSA-140.

This is a fast-moving field. It is difficult to extrapolate factoring technology because no one can predict advances in mathematical theory. Before the NFS was discovered, many people conjectured that the QS was asymptotically as fast as any factoring method could be. They were wrong.
Near-term advances in the NFS are likely to come in the form of bringing down the constant: 1.923. Some numbers of a special form, like Fermat numbers, have a constant more along the lines of 1.5 [955,954]. If the hard numbers used in public-key cryptography had that kind of constant, 1024-bit numbers could be factored today. One way to lower the constant is to find better ways of representing numbers as polynomials with small coefficients. The problem hasn't been studied very extensively yet, but it is probable that advances are coming [949].

For the most current results from the RSA Factoring Challenge, send e-mail to challenge-info@rsa.com.
Square Roots Modulo n

If n is the product of two primes, then the ability to calculate square roots mod n is computationally equivalent to the ability to factor n [1283,35,36,193]. In other words, someone who knows the prime factors of n can easily compute the square roots of a number mod n, but for everyone else the computation has been proven to be as hard as computing the prime factors of n.
11.5 PRIME NUMBER GENERATION

Public-key algorithms need prime numbers. Any reasonably sized network needs lots of them. Before discussing the mathematics of prime number generation, I will answer a few obvious questions.

1. If everyone needs a different prime number, won't we run out? No. In fact, there are approximately 10^151 primes 512 bits in length or less. For numbers near n, the probability that a random number is prime is approximately one in ln n. So the total number of primes less than n is n/(ln n). There are only 10^77 atoms in the universe. If every atom in the universe needed a billion new primes every microsecond from the beginning of time until now, you would only need 10^109 primes; there would still be approximately 10^151 512-bit primes left.

2. What if two people accidentally pick the same prime number? It won't happen. With over 10^151 prime numbers to choose from, the odds of that happening are significantly less than the odds of your computer spontaneously combusting at the exact moment you win the lottery.

3. If someone creates a database of all primes, won't he be able to use that database to break public-key algorithms? Yes, but he can't do it. If you could store one gigabyte of information on a drive weighing one gram, then a list of just the 512-bit primes would weigh so much that it would exceed the Chandrasekhar limit and collapse into a black hole . . . so you couldn't retrieve the data anyway.
But if factoring numbers is so hard, how can generating prime numbers be easy? The trick is that the yes/no question, "Is n prime?" is a much easier question to answer than the more complicated question, "What are the factors of n?"

The wrong way to find primes is to generate random numbers and then try to factor them. The right way is to generate random numbers and test if they are prime. There are several probabilistic primality tests; tests that determine whether a number is prime with a given degree of confidence. Assuming this "degree of confidence" is large enough, these sorts of tests are good enough. I've heard primes generated in this manner called "industrial-grade primes": These are numbers that