. 14
( 29)


It™s easy to turn this into a maximal-period LFSR. The first number is the length of the LFSR. The
last number is always 0 and can be ignored. All the numbers, except the 0, specify the tap sequence,
counting from the left of the shift register. That is, low degree terms in the polynomial correspond to
taps near the left-hand side of the register.

To continue the example, the listing (32, 7, 5, 3, 2, 1, 0) means that if you take a 32-bit shift register
and generate the new bit by XORing the thirty-second, seventh, fifth, third, second, and first bits
together (see Figure 16.4), the resultant LFSR will be maximal length; it will cycle through 232 - 1
values before repeating.

The C code for this LFSR looks like:

int LFSR () {
static unsigned long ShiftRegister = 1;
/* Anything but 0. */
ShiftRegister = ((((ShiftRegister >> 31)
^ (ShiftRegister >> 6)
^ (ShiftRegister >> 4)
^ (ShiftRegister >> 2)
^ (ShiftRegister >> 1)
^ ShiftRegister))
& 0—00000001)
<< 31)
| (ShiftRegister >> 1) ;
return ShiftRegister & 0—00000001;

Figure 16.4 32-bit long maximal-length LFSR.

Table 16.2
Some Primitive Plynomials Mod 2
(1, 0) (36, 11, 0) (68, 9, 0) (97, 6, 0)
(2, 1, 0) (36, 6, 5, 4, 2, 1, 0) (68, 7, 5, 1, 0) (98, 11, 0)
(3, 1, 0) (37, 6, 4, 1, 0) (69, 6, 5, 2, 0) (98, 7, 4, 3, 1, 0)
(4, 1, 0) (37, 5, 4, 3, 2, 1, 0) (70, 5, 3, 1, 0) (99, 7, 5, 4, 0)
(5, 2, 0) (38, 6, 5, 1, 0) (71, 6, 0) (100, 37, 0)
(6, 1, 0) (39, 4, 0) (71, 5, 3, 1, 0) (100, 8, 7, 2, 0)
(7, 1, 0) (40, 5, 4, 3, 0) (72, 10, 9, 3, 0) (101, 7, 6, 1, 0)
(7, 3, 0) (41, 3, 0) (72, 6, 4, 3, 2, 1, 0) (102, 6 5 3 0)
(8, 4, 3, 2, 0) (42, 7, 4, 3, 0) (73, 25, 0) (103, 9, 9)
(9, 4, 0) (42, 5, 4, 3, 2, 1, 0) (73, 4, 3, 2, 0) (104, 11, 10, 1, 0)
(10, 3, 0) (43, 6, 4, 3, 0) (74, 7, 4, 3, 0) (105, 16, 0)
(11, 2, 0) (44, 6, 5, 2, 0) (75, 6, 3, 1, 0) (106, 15, 0)
(12, 6, 4, 1, 0) (45, 4, 3, 1, 0) (76, 5, 4, 2, 0) (107, 9, 7, 4, 0)
(13, 4, 3, 1, 0) (46, 8, 7, 6, 0) (77, 6, 5, 2, 0) (108, 31, 0)
(14, 5, 3, 1, 0) (46, 8, 5, 3, 2, 1, 0) (78, 7, 2, 1, 0) (109, 5, 4, 2, 0)

Page 312 of 666
Applied Cryptography: Second Edition - Bruce Schneier

(15, 1, 0) (47, 5, 0) (79, 9, 0) (110, 6, 4, 1, 0)
(16, 5, 3, 2, 0) (48, 9, 7, 4, 0) (79, 4, 3, 2, 0) (111, 10, 0)
(17, 3, 0) (48, 7, 5, 4, 2, 1, 0) (80, 9, 4, 2, 0) (111, 49, 0)
(17, 5, 0) (49, 9, 0) (80, 7, 5, 3, 2, 1, 0) (113, 9, 0)
(17, 6, 0) (49, 6, 5, 4, 0) (81, 4, 0) (113, 15, 0)
(18, 7, 0) (50, 4, 3, 2, 0) (82, 9, 6, 4, 0) (113, 30, 0)
(18, 5, 2, 1, 0) (51, 6, 3, 1, 0) (82, 8, 7, 6, 1, 0) (114, 11, 2, 1, 0)
(19, 5, 2, 1, 0) (52, 3, 0) (83, 7, 4, 2, 0) (115, 8, 7, 5, 0)
(20, 3, 0) (53, 6, 2, 1, 0) (84, 13, 0) (116, 6, 5, 2, 0)
(21, 2, 0) (54, 8, 6, 3, 0) (84, 8, 7, 5, 3, 1, 0) (117, 5, 2, 1, 0)
(22, 1, 0) (54, 6, 5, 4, 3, 2, 0) (85, 8, 2, 1, 0) (118, 33, 0)
(23, 5, 0) (55, 24, 0) (86, 6, 5, 2, 0) (119, 8, 0)
(24, 4, 3, 1, 0) (55, 6, 2, 1, 0) (87, 13, 0) (119, 45, 0)
(25, 3, 0) (56, 7, 4, 2, 0) (87, 7, 5, 1, 0) (120, 9, 6, 2, 0)
(26, 6, 2, 1, 0) (57, 7, 0) (88, 11, 9, 8, 0) (121, 18, 0)
(27, 5, 2, 1, 0) (57, 5, 3, 2, 0) (88, 8, 5, 4, 3, 1, 0) (122, 6, 2, 1, 0)
(28, 3, 0) (58, 19, 0) (89, 38, 0) (123, 2, 0)
(29, 2, 0) (58, 6, 5, 1, 0) (89, 51, 0) (124, 37, 0)
(30, 6, 4, 1, 0) (59, 7, 4, 2, 0) (89, 6, 5, 3, 0) (125, 7, 6, 5, 0)
(31, 3, 0) (59, 6, 5, 4, 3, 1, 0) (90, 5, 3, 2, 0) (126, 7, 4, 2, 0)
(31, 6, 0) (60, 1, 0) (91, 8, 5, 1, 0) (127, 1, 0)
(31, 7, 0) (61, 5, 2, 1, 0) (91, 7, 6, 5, 3, 2, 0) (127, 7, 0)
(31, 13, 0) (62, 6, 5, 3, 0) (92, 6, 5, 2, 0) (127, 63, 0)
(32, 7, 6, 2, 0) (63, 1, 0) (93, 2, 0) (128, 7, 2, 1, 0)
(32, 7, 5, 3, 2, 1, 0) (64, 4, 3, 1, 0) (94, 21, 0) (129, 5, 0)
(33, 13, 0) (65, 18, 0) (94, 6, 5, 1, 0) (130, 3, 0)
(33, 16, 4, 1, 0) (65, 4, 3, 1, 0) (95, 11, 0) (131, 8, 3, 2, 0)
(34, 8, 4, 3, 0) (66, 9, 8, 6, 0) (95, 6, 5, 4, 2, 1, 0) (132, 29, 0)
(34, 7, 6, 5, 2, 1, 0) (66, 8, 6, 5, 3, 2, 0) (96, 10, 9, 6, 0) (133, 9, 8, 2, 0)
(35, 2, 0) (67, 5, 2, 1, 0) (96, 7, 6, 4, 3, 2, 0) (134, 57, 0)
(135, 11, 0) (152, 6, 3, 2, 0) (178, 87, 0) (270, 133, 0)
(135, 16, 0) (153, 1, 0) (183, 56, 0) (282, 35, 0)
(135, 22, 0) (153, 8, 0) (194, 87, 0) (282, 43, 0)
(136, 8, 3, 2, 0) (154, 9, 5, 1, 0) (198, 65, 0) (286, 69, 0)
(137, 21, 0) (155, 7, 5, 4, 0) (201, 14, 0) (286, 73, 0)
(138, 8, 7, 1, 0) (156, 9, 5, 3, 0) (201, 17, 0) (294, 61, 0)
(139, 8, 5, 3, 0) (157, 6, 5, 2, 0) (201, 59, 0) (322, 67, 0)
(140, 29, 0) (158, 8, 6, 5, 0) (201, 79, 0) (333, 2, 0)
(141, 13, 6, 1, 0) (159, 31, 0) (202, 55, 0) (350, 53, 0)
(142, 21, 0) (159, 34, 0) (207, 43, 0) (366, 29, 0)
(143, 5, 3, 2, 0) (159, 40, 0) (212, 105, 0) (378, 43, 0)
(144, 7, 4, 2, 0) (160, 5, 3, 2, 0) (218, 11, 0) (378, 107, 0)
(145, 52, 0) (161, 18, 0) (218, 15, 0) (390, 89, 0)
(145, 69, 0) (161, 39, 0) (218, 71, 0) (462, 73, 0)

Page 313 of 666
Applied Cryptography: Second Edition - Bruce Schneier

(146, 5, 3, 2, 0) (161, 60, 0) (218, 83, 0) (521, 32, 0)
(147, 11, 4, 2, 0) (162, 8, 7, 4, 0) (225, 32, 0) (521, 48, 0)
(148, 27, 0) (163, 7, 6, 3, 0) (225, 74, 0) (521, 158, 0)
(149, 10, 9, 7, 0) (164, 12, 6, 5, 0) (225, 88, 0) (521, 168, 0)
(150, 53, 0) (165, 9, 8, 3, 0) (225, 97, 0) (607, 105, 0)
(151, 3, 0) (166, 10, 3, 2, 0) (225, 109, 0) (607, 147, 0)
(151, 9, 0) (167, 6, 0) (231, 26, 0) (607, 273, 0)
(151, 15, 0) (170, 23, 0) (231, 34, 0) (1279, 216, 0)
(151, 31, 0) (172, 2, 0) (234, 31, 0) (1279, 418, 0)
(151, 39, 0) (174, 13, 0) (234, 103, 0) (2281, 715, 0)
(151, 43, 0) (175, 6, 0) (236, 5, 0) (2281, 915, 0)
(151, 46, 0) (175, 16, 0) (250, 103, 0) (2281, 1029, 0)
(151, 51, 0) (175, 18, 0) (255, 52, 0) (3217, 67, 0)
(151, 63, 0) (175, 57, 0) (255, 56, 0) (3217, 576, 0)
(151, 66, 0) (177, 8, 0) (255, 82, 0) (4423, 271, 0)
(151, 67, 0) (177, 22, 0) (258, 83, 0) (9689, 84, 0)
(151, 70, 0) (177, 88, 0) (266, 47, 0)

The code is a little more complicated when the shift register is longer than the computer™s word size,
but not significantly so.

Note that all of these listings have an odd number of coefficients. I have provided such a large table
because LFSRs are often used for stream-cipher cryptography and I wanted many examples so that
different people would pick different primitive polynomials. Since, if p(x) is primitive, then so is xnp
(1/x); each entry on the table is actually two primitive polynomials.

For example, if (a, b, 0) is primitive, then (a, a - b, 0) is also primitive. If (a, b, c, d, 0) is primitive, then
(a, a - d, a - c, a - b, 0) is also primitive. Mathematically:

if xa + xb + 1 is primitive, so is xa + xa-b + 1
if xa + xb + xc + xd + 1 is primitive, so is xa + xa-d + xa-c + xa-b + 1

Primitive trinomials are fastest in software, because only two bits of the shift register have to be
XORed to generate each new bit. Actually, all the feedback polynomials listed in Table 16.2 are
sparse, meaning that they only have a few coefficients. Sparseness is always a source of weakness,
sometimes enough to break the algorithm. It is far better to use dense primitive polynomials, those
with a lot of coefficients, for cryptographic applications. If you use dense polynomials, and especially
if you make them part of the key, you can live with much shorter LFSRs.

Generating dense primitive polynomials modulo 2 is not easy. In general, to generate primitive
polynomials of degree k you need to know the factorization of 2k - 1. Three good references for
finding primitive polynomials are [652,1285,1287].

LFSRs are competent pseudo-random-sequence generators all by themselves, but they have some
annoying nonrandom properties. Sequential bits are linear, which makes them useless for encryption.
For an LFSR of length n, the internal state is the next n output bits of the generator. Even if the
feedback scheme is unknown, it can be determined from only 2n output bits of the generator, by using

Page 314 of 666
Applied Cryptography: Second Edition - Bruce Schneier

the highly efficient Berlekamp-Massey algorithm [1082,1083]: see Section 16.3.

Also, large random numbers generated from sequential bits of this sequence are highly correlated
and, for certain types of applications, not very random at all. Even so, LFSRs are often used as
building blocks in encryption algorithms.

LFSRs in Software

LFSRs are slow in software, but they™re faster in assembly language than in C. One solution is to run
16 LFSRs (or 32, depending on your computer™s word size) in parallel. This scheme uses an array of
words that is the length of the LFSR, with each bit position in the words representing a different
LFSR. Assuming all the feedback polynomials are the same, this can run pretty quickly. In general,
the best way to update shift registers is to multiply the current state by suitable binary matrices [901].

It is also possible to modify the LFSR™s feedback scheme. The resultant generator is no better
cryptographically, but it still has a maximal period and is easy to implement in software [1272].
Instead of using the bits in the tap sequence to generate the new left-most bit, each bit in the tap
sequence is XORed with the output of the generator and replaced; then the output of the generator
becomes the new left-most bit (see Figure 16.5). This is sometimes called a Galois configuration.

In C, this looks like:

#define mask 0—80000057

static unsigned long ShiftRegister=1;
void seed_LFSR (unsigned long seed)
if (seed == 0) /* avoid calamity */
seed = 1;
ShiftRegister = seed;

int modified_LFSR (void)
if (ShiftRegister & 0—00000001) {
ShiftRegister = ((ShiftRegister ^ mask >> 1) |
return 1;
} else {
ShiftRegister >>= 1;
return 0;

Figure 16.5 Galois LFSR.

The savings here is that all the XORs can be done as a single operation. This can also be parallelized,
and the different feedback polynomials can be different. The Galois configuration can also be faster
in hardware, especially in custom VLSI implementations. In general, if you are using hardware that is
good at shifts, use a Fibonacci configuration; if you can exploit parallelism, use a Galois

16.3 Design and Analysis of Stream Ciphers

Page 315 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Most practical stream-cipher designs center around LFSRs. In the early days of electronics, they were
very easy to build. A shift register is nothing more than an array of bit memories and the feedback
sequence is just a series of XOR gates. Even in VLSI circuitry, a LFSR-based stream cipher can give
you a lot of security with only a few logic gates.

The problem with LFSRs is that they are very inefficient in software. You want to avoid sparse
feedback polynomials”they facilitate correlation attacks [1051,1090,350]”and dense feedback
polynomials are inefficient. Any stream cipher outputs a bit at a time; you have to iterate the
algorithm 64 times to encrypt what a single iteration of DES can encrypt. In fact, a simple LFSR
algorithm like the shrinking generator described later is no faster in software than DES.

This branch of cryptography is fast-paced and very politically charged. Most designs are secret; a
majority of military encryptions systems in use today are based on LFSRs. In fact, most Cray
computers (Cray 1, Cray X-MP, Cray Y-MP) have a rather curious instruction generally known as
“population count.” It counts the 1 bits in a register and can be used both to efficiently calculate the
Hamming distance between two binary words and to implement a vectorized version of a LFSR. I™ve
heard this called the canonical NSA instruction, demanded by almost all computer contracts.

On the other hand, an astonishingly large number of seemingly complex shift-register-based
generators have been cracked. And certainly military cryptanalysis institutions such as the NSA have
cracked a lot more. Sometimes it™s amazing to see the simple ones proposed again and again.

Linear Complexity

Analyzing stream ciphers is often easier than analyzing block ciphers. For example, one important
metric used to analyze LFSR-based generators is linear complexity, or linear span. This is defined as
the length, n, of the shortest LFSR that can mimic the generator output. Any sequence generated by a
finite-state machine over a finite field has a finite linear complexity [1006]. Linear complexity is
important because a simple algorithm, called the Berlekamp-Massey algorithm, can generate this
LFSR after examining only 2n bits of the keystream [1005]. Once you™ve generated this LFSR, you™ve
broken the stream cipher.

This idea has extensions from fields to rings [1298], and when the output sequence is viewed as
numbers over fields of odd characteristic [842]. A further enhancement is the notion of a linear
complexity profile, which measures the linear complexity of the sequence as it gets longer and longer
[1357,1168,411,1582]. Another algorithm for computing linear complexity is useful only in very
specialized circumstances [597,595,596,1333]. A generalization of linear complexity is in [776]. There
is also the notion of sphere complexity [502] and 2-adic complexity [844].

In any case, remember that a high linear complexity does not necessarily indicate a secure generator,
but a low linear complexity indicates an insecure one [1357,1249].

Correlation Immunity

Cryptographers try to get a high linear complexity by combining the output of several output
sequences in some nonlinear manner. The danger here is that one or more of the internal output
sequences”often just outputs of individual LFSRs”can be correlated with the combined keystream
and attacked using linear algebra. Often this is called a correlation attack or a divide-and-conquer
attack. Thomas Siegenthaler has shown that correlation immunity can be precisely defined, and that
there is a trade-off between correlation immunity and linear complexity [1450].

Page 316 of 666
Applied Cryptography: Second Edition - Bruce Schneier

The basic idea behind a correlation attack is to identify some correlation between the output of the
generator and the output of one of its internal pieces. Then, by observing the output sequence, you
can obtain information about that internal output. Using that information and other correlations,
collect information about the other internal outputs until the entire generator is broken.

Correlation attacks and variations such as fast correlation attacks”these offer a trade-off between
computational complexity and effectiveness”have been successfully applied to a number of LFSR-
based keystream generators [1451,278,1452,572,1636,1051,1090,350,633,1054,1089,995]. Some
interesting new ideas along these lines are in [46,1641].

Other Attacks

There are other general attacks against keystream generators. The linear consistency test attempts to
identify some subset of the encryption key using matrix techniques [1638]. There is also the meet-in-
the-middle consistency attack [39,41]. The linear syndrome algorithm relies on being able to write a
fragment of the output sequence as a linear equation [1636,1637]. There is the best affine
approximation attack [502] and the derived sequence attack [42]. The techniques of differential
cryptanalysis have even been applied to stream ciphers [501], as has linear cryptanalysis [631].

16.4 Stream Ciphers Using LFSRs

The basic approach to designing a keystream generator using LFSRs is simple. First you take one or
more LFSRs, generally of different lengths and with different feedback polynomials. (If the lengths
are all relatively prime and the feedback polynomials are all primitive, the whole generator is
maximal length.) The key is the initial state of the LFSRs. Every time you want a bit, shift the LFSRs
once (this is sometimes called clocking). The output bit is a function, preferably a nonlinear function,
of some of the bits of the LFSRs. This function is called the combining function, and the whole
generator is called a combination generator. (If the output bit is a function of a single LFSR, the
generator is called a filter generator.) Much of the theoretical background for this kind of thing was
laid down by Selmer and Neal Zierler [1647].

Complications have been added. Some generators have LFSRs clocked at different rates; sometimes
the clocking of one generator depends on the output of another. These are all electronic versions of
pre-WWII cipher machine ideas, and are called clock-controlled generators [641]. Clock control can
be feedforward, where the output of one LFSR controls the clocking of another, or feedback, where
the output of one LFSR controls its own clocking.

Although these generators are, at least in theory, susceptible to embedding and probabilistic
correlation attacks [634,632], many are secure for now. Additional theory on clock-controlled shift
registers is in [89].

Ian Cassells, once the head of pure mathematics at Cambridge and a former Bletchly Park
cryptanalyst, said that “cryptography is a mixture of mathematics and muddle, and without the
muddle the mathematics can be used against you.” What he meant was that in stream ciphers, you
need some kind of mathematical structure”such as a LFSR”to guarantee maximal-length and other
properties, and then some complicated nonlinear muddle to stop someone from getting at the register
and solving it. This advice also holds true for block algorithms.

What follows is a smattering of LFSR-based keystream generators that have appeared in the
literature. I don™t know if any of them have been used in actual cryptographic products. Most of them
are of theoretical interest only. Some have been broken; some may still be secure.

Page 317 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Since LFSR-based ciphers are generally implemented in hardware, electronics logic symbols will be
used in the figures. In the text, is XOR, ^ is AND, is OR, and ¬ is NOT.

Geffe Generator

This keystream generator uses three LFSRs, combined in a nonlinear manner (see Figure 16.6) [606].
Two of the LFSRs are inputs into a multiplexer, and the third LFSR controls the output of the
multiplexer. If a1, a2, and a3 are the outputs of the three LFSRs, the output of the Geffe generator can
be described by:

b = (a1 ^ a2) ((¬ a1) ^ a3)

If the LFSRs have lengths n1, n2, and n3, respectively, then the linear complexity of the generator is

(n1 + 1)n2 + n1n3

The period of the generator is the least common multiple of the periods of the three generators.
Assuming the degrees of the three primitive feedback polynomials are relatively prime, the period of
this generator is the product of the periods of the three LFSRs.

Although this generator looks good on paper, it is cryptographically weak and falls to a correlation
attack [829,1638]. The output of the generator equals the output of LFSR-2 75 percent of the time. So,
if the feedback taps are known, you can guess the initial value for LFSR-2 and generate the output
sequence of that register. Then you can count the number of times the output of the LFSR-2 agrees
with the output of the generator. If you guessed wrong, the two sequences will agree about 50 percent
of the time; if you guessed right, the two sequences will agree about 75 percent of the time.

Similarly, the output of the generator equals the output of LFSR-3 about 75 percent of the time. With
those correlations, the keystream generator can be easily cracked. For example, if the primitive
polynomials only have three terms each, and the largest LFSR is of length n, it only takes a segment of
the output sequence 37n-bits long to reconstruct the internal states of all three LFSRs [1639].

Generalized Geffe Generator

Instead of choosing between two LFSRs, this scheme chooses between k LFSRs, as long as k is a power
of 2. There are k + 1 LFSRs total (see Figure 16.7). LFSR-1 must be clocked log2k times faster than
the other k LFSRs.

Figure 16.6 Geffe generator.

Even though this scheme is more complex than the Geffe generator, the same kind of correlation
attack is possible. I don™t recommend this generator.

Page 318 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Jennings Generator

This scheme uses a multiplexer to combine two LFSRs [778,779,780]. The multiplexer, controlled by
LFSR-1, selects 1 bit of LFSR-2 for each output bit. There is also a function that maps the output of
LFSR-2 to the input of the multiplexer (see Figure 16.8).

The key is the initial state of the two LFSRs and the mapping function. Although this generator has
great statistical properties, it fell to Ross Anderson™s meet-in-the-middle consistency attack [39] and
the linear consistency attack [1638,442]. Don™t use this generator.

Beth-Piper Stop-and-Go Generator

This generator, shown in Figure 16.9, uses the output of one LFSR to control the clock of another
LFSR [151]. The clock input of LFSR-2 is controlled by the output of LFSR-1, so that LFSR-2 can
change its state at time t only if the output of LFSR-1 was 1 at time t - 1.

No one has been able to prove results about this generator™s linear complexity in the general case.
However, it falls to a correlation attack [1639].

Alternating Stop-and-Go Generator

This generator uses three LFSRs of different length. LFSR-2 is clocked when the output of LFSR-1 is
1; LFSR-3 is clocked when the output of LFSR-1 is 0. The output of the generator is the XOR of
LFSR-2 and LFSR-3 (see Figure 16.10) [673].

This generator has a long period and large linear complexity. The authors found a correlation attack
against LFSR-1, but it does not substantially weaken the generator. There have been other attempts
at keystream generators along these lines [1534,1574,1477].

Figure 16.7 Generalized Geffe generator.

Figure 16.8 Jennings generator.

Bilateral Stop-and-Go Generator

This generator uses two LFSRs, both of length n (see Figure 16.11) [1638]. The output of the
generator is the XOR of the outputs of each LFSR. If the output of LFSR-2 at time t “ 1 is 0 and the

Page 319 of 666
Applied Cryptography: Second Edition - Bruce Schneier

output at time t “ 2 is 1, then LFSR-2 does not clock at time t. Conversely, if the output of LFSR-1 at
time t “ 1 is 0 and the output at t “ 2 is 1, and if LFSR-1 clocked at time t, then LFSR-2 does not clock
at time t.

The linear complexity of this system is roughly equal to the period. According to [1638], “no evident
key redundancy has been observed in this system.”

Threshold Generator

This generator tries to get around the security problems of the previous generators by using a
variable number of LFSRs [277]. The theory is that if you use a lot of LFSRs, it™s harder to break the

This generator is illustrated in Figure 16.12. Take the output of a large number of LFSRs (use an odd
number of them). Make sure the lengths of all the LFSRs are relatively prime and all the feedback
polynomials are primitive: maximize the period. If more than half the output bits are 1, then the
output of the generator is 1. If more than half the output bits are 0, then the output of the generator is

Figure 16.9 Beth-Piper stop-and-go generator.

Figure 16.10 Alternating stop-and-go generator.

With three LFSRs, the output generator can be written as:

b = (a1 ^ a2) (a1 ^ a3) (a2 ^ a3)

This is very similar to the Geffe generator, except that it has a larger linear complexity of

n1n2 + n1n3 + n2n3

where n1, n2, and n3 are the lengths of the first, second, and third LFSRs.

This generator isn™t great. Each output bit of the generator yields some information about the state of
the LFSRs”0.189 bit to be exact”and the whole thing falls to a correlation attack. I don™t
recommend using it.

Self-Decimated Generators

Page 320 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Self-decimated generators are generators that control their own clock. Two have been proposed, one
by Rainer Rueppel (see Figure 16.13) [1359] and another by Bill Chambers and Dieter Gollmann
[308] (see Figure 16.14). In Rueppel™s generator, when the output of the LFSR is 0, the LFSR is
clocked d times. When the output of the LFSR is 1, the LFSR is clocked k times. Chambers™s and
Gollmann™s generator is more complicated, but the idea is the same. Unfortunately, both generators
are insecure [1639], although some modifications have been proposed that may correct the problems

Figure 16.11 Bilateral stop-and-go generator.

Figure 16.12 Threshold generator.

Multispeed Inner-Product Generator

This generator, by Massey and Rueppel [1014], uses two LFSRs clocked at two different speeds (see
Figure 16.15). LFSR-2 is clocked d times as fast as LFSR-1. The individual bits of the two LFSRs are
ANDed together and then XORed with each other to produce the final output bit of the generator.

Although this generator has high linear complexity and it possesses excellent statistical properties, it
still falls to a linear consistency attack [1639]. If n1 is the length of LFSR-1, n2 is the length of the
LFSR-2, and d is the speed multiple between the two, then the internal state of the generator can be
recovered from an output sequence of length

n1 + n2 + log2d

Summation Generator

More work by Rainer Rueppel, this generator adds the output of two LFSRs (with carry) [1358,1357].
This operation is highly nonlinear. Through the late 1980s, this generator was the security front-
runner, but it fell to a correlation attack [1053,1054,1091]. And it has been shown that this is an
example of a feedback with carry shift register (see Section 17.4), and can be broken [844].

Figure 16.13 Rueppel™s self-decimated generator.

Page 321 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Figure 16.14 Chambers™s and Gollmann™s self-decimated generator.


That stands for “dynamic random-sequence generator” [1117]. The idea is to have two different filter
generators”threshold, summation, or whatever”fed by a single set of LFSRs and controlled by
another LFSR.

First clock all the LFSRs. If the output of LFSR-0 is 1, then compute the output of the first filter
generator. If the output of LFSR-0 is 0, then compute the output of the second filter generator. The
final output is the first output XOR the second.

Gollmann Cascade

The Gollmann cascade (see Figure 16.16), described in [636,309], is a strengthened version of a stop-
and-go generator. It consists of a series of LFSRs, with the clock of each controlled by the previous
LFSR. If the output of LFSR-1 is 1 at time t - 1, then LFSR-2 clocks. If the output of LFSR-2 is 1 at
time t - 1, then LFSR-3 clocks, and so on. The output of the final LFSR is the output of the generator.
If all the LFSRs have the same length, n, the linear complexity of a system with k LFSRs is

n(2n - 1)k-1

Figure 16.15 Multispeed inner-product generator.

Cascades are a cool idea: They are conceptually very simple and they can be used to generate
sequences with huge periods, huge linear complexities, and good statistical properties. They are
vulnerable to an attack called lock-in [640]. This is a technique by which a cryptanalyst reconstructs
the input to the last shift register in the cascade, then proceeds to break the cascade register by
register. This is a serious problem in some situations and weakens the effective key length of the
algorithm, but precautions can be taken to minimize the attack.

Further analysis has indicated that the sequence approaches random as k gets larger
[637,638,642,639]. Based on recent attacks on short Gollmann cascades [1063], I recommend using a k
of at least 15. You™re better off using more LFSRs of shorter length than fewer LFSRs of longer

Shrinking Generator

The shrinking generator [378] uses a different form of clock control than the previous generators.

Page 322 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Take two LFSRs: LFSR-1 and LFSR-2. Clock both of them. If the output of LFSR-1 is 1, then the
output of the generator is LFSR-2. If the output of LFSR-1 is 0, discard the two bits, clock both
LFSRs, and try again.

This idea is simple, reasonably efficient, and looks secure. If the feedback polynomials are sparse, the
generator is vulnerable, but no other problems have been found. Even so, it™s new. One
implementation problem is that the output rate is not regular; if LFSR-1 has a long string of zeros
then the generator outputs nothing. The authors suggest buffering to solve this problem [378].
Practical implementation of the shrinking generator is discussed in [901].

Self-Shrinking Generator

The self-shrinking generator [1050] is a variant of the shrinking generator. Instead of using two
LFSRs, use pairs of bits from a single LFSR. Clock a LFSR twice. If the first bit in the pair is 1, the
output of the generator is the second bit. If the first bit is 0, discard both bits and try again. While the
self-shrinking generator requires about half the memory space as the shrinking generator, it is also
half the speed.

While the self-shrinking generator also seems secure, it still has some unexplained behavior and
unknown properties. This is a very new generator; give it some time.

Figure 16.16 Gollmann cascade.

16.5 A5

A5 is the stream cipher used to encrypt GSM (Group Special Mobile). That™s the non-American
standard for digital cellular mobile telephones. It is used to encrypt the link from the telephone to the
base station. The rest of the link is unencrypted; the telephone company can easily eavesdrop on your

A lot of strange politics surrounds this one. Originally it was thought that GSM™s cryptography would
prohibit export of the phones to some countries. Now some officials are discussing whether A5 might
harm export sales, implying that it is so weak as to be an embarrassment. Rumor has it that the
various NATO intelligence agencies had a catfight in the mid-1980s over whether GSM encryption
should be strong or weak. The Germans wanted strong cryptography, as they were sitting near the
Soviet Union. The other countries overruled them, and A5 is a French design.

We know most of the details. A British telephone company gave all the documentation to Bradford
University without remembering to get them to sign a nondisclosure agreement. It leaked here and
there, and was eventually posted to the Internet. A paper describing A5 is [1622]; there is also code at
the back of this book.

A5 consists of three LFSRs; the register lengths are 19, 22, and 23; all the feedback polynomials are
sparse. The output is the XOR of the three LFSRs. A5 uses variable clock control. Each register is
clocked based on its own middle bit, XORed with the inverse threshold function of the middle bits of
all three registers. Usually, two of the LFSRs clock in each round.

There is a trivial attack requiring 240 encryptions: Guess the contents of the first two LFSRs, then try

Page 323 of 666
Applied Cryptography: Second Edition - Bruce Schneier

to determine the third LFSR from the keystream. (Whether this attack is actually feasible is under
debate, but a hardware key search machine currently under design should resolve the matter soon

Nonetheless, it is becoming clear that the basic ideas behind A5 are good. It is very efficient. It passes
all known statistical tests; its only known weakness is that its registers are short enough to make
exhaustive search feasible. Variants of A5 with longer shift registers and denser feedback polynomials
should be secure.

16.6 Hughes XPD/KPD

This algorithm is brought to you by Hughes Aircraft Corp. They put it in army tactical radios and
direction-finding equipment for sale to foreign militaries. It was designed in 1986 and called XPD, for
Exportable Protection Device. Later it was renamed KPD”Kinetic Protection Device”and
declassified [1037,1036].

The algorithm uses a 61-bit LFSR. There are 210 different primitive feedback polynomials, which
were approved by the NSA. The key selects one of these polynomials (they are all stored in ROM
somewhere), as well as the initial state of the LFSR.

It has eight different nonlinear filters, each of which has six taps from the LFSR and which produces
1 bit. The bits combine to generate a byte, which is used to encrypt or decrypt the datastream.

This algorithm looks pretty impressive, but I doubt it is. The NSA allows export, so there must be
some attack on the order of 240 or less. What is it?

16.7 Nanoteq

Nanoteq is a South African electronics company. This is their algorithm that has been fielded by the
South African police to encrypt their fax transmissions, and presumably for other uses as well.

The algorithm is described, more or less, in [902,903]. It uses a 127-bit LFSR with a fixed feedback
polynomial; the key is the initial state of the feedback register. The 127 bits of the register are reduced
to a single keystream bit using 25 primitive cells. Each cell has five inputs and one output:

f(x1,x2,x3,x4,x5) = x1 + x2 + (x1 + x3) (x2 + x4 + x5) + (x1 + x4) (x2 + x3) + x5

Each input of the function is XORed with some bit of the key. There is also a secret permutation that
depends on the particular implementation, and is not detailed in the papers. This algorithm is only
available in hardware.

Is this algorithm secure? I doubt it. During the transition to majority rule, embarrassing faxes from
one police station to another would sometimes turn up in the liberal newspapers. These could easily
have been the results of U.S., U.K., or Soviet intelligence efforts. Ross Anderson took some initial steps
towards cryptanalyzing this algorithm in [46]; I expect more results to come soon.

16.8 Rambutan

Rambutan is a British algorithm, designed by the Communications Electronics Security Group (one
of the aliases used by GCHQ). It is only sold as a hardware module and is approved for the protection

Page 324 of 666
Applied Cryptography: Second Edition - Bruce Schneier

of classified material up to “Confidential.” The algorithm itself is secret, and the chip is not generally
commercially available.

Rambutan has a 112-bit key (plus parity bits) and can operate in three modes: ECB, CBC, and 8-bit
CFB. This strongly indicates that it is a block algorithm, but rumors point elsewhere. Supposedly, it is
a LFSR stream cipher. It has five shift registers, each one of a different length around 80 bits. The
feedback polynomials are fairly sparse, with only about 10 taps each. Each shift register provides four
inputs to a very large and complex nonlinear function which eventually spits out a single bit.

Why call it Rambutan? Perhaps, like the fruit, it™s spiny and forbidding on the outside but soft and
yielding inside. On the other hand, maybe that™s not the reason.

16.9 Additive Generators

Additive generators (sometimes called lagged Fibonacci generators) are extremely efficient because
they produce random words instead of random bits [863]. They are not secure on their own, but can
be used as building blocks for secure generators.

The initial state of the generator is an array of n-bit words: 8-bit words, 16-bit words, 32-bit words,
whatever: X1, X2, X3,..., Xm. This initial state is the key. The ith word of the generator is

Xi = (Xi-a + Xi-b + Xi-c +...+ Xi-m) mod 2n

If the coefficients a, b, c,..., m are chosen right, the period of this generator is at least 2n - 1. One of the
requirements on the coefficients is that the least significant bit forms a maximal-length LFSR.

For example, (55,24,0) is a primitive polynomial mod 2 from Table 16.2. This means that the following
additive generator is maximal length.

Xi = (Xi-55 + Xi-24) mod 2n

This works because the primitive polynomial has three coefficients. If it has more than three, you need
some additional requirements to make it maximal length. See [249] for details.


Fish is an additive generator based on techniques used in the shrinking generator [190]. It produces a
stream of 32-bit words which can be XORed with a plaintext stream to produce ciphertext, or XORed
with a ciphertext stream to produce plaintext. The algorithm is named as it is because it is a Fibonacci
shrinking generator.

First, use these two additive generators. The key is the initial values of these generators.

Ai = (Ai-55 + Ai-24) mod 232
Bi = (Bi-52 + Bi-19) mod 232

These sequences are shrunk, as a pair, depending on the least significant bit of Bi: if it is 1, use the
pair; if it is 0, ignore the pair. Cj is the sequence of used words from Ai, and Dj is the sequence of used
words from Bi. These words are used in pairs”C2j, C2j+1, D2j, and D2j+1”to generate two 32-bit

Page 325 of 666
Applied Cryptography: Second Edition - Bruce Schneier

output words: K2j and K2j+1.

E2j = C2j (D2j ^ D2j+1)
F2j = D2j+1 ^ (E2j ^ C2j+1)
K2j = E2j F2j
K2i+1 = C2i+1 F2j

This algorithm is fast. On a 33 megahertz 486, a C implementation of Fish encrypts data at 15
megabits per second. Unfortunately, it is also insecure; an attack has a work factor of about 240 [45].


Pike is a leaner, meaner version of Fish, brought to you by Ross Anderson, the man who broke Fish
[45]. It uses three additive generators. For example:

Ai = (Ai-55 + Ai-24) mod 232
Bi = (Bi-57 + Bi-7) mod 232
Ci = (Ci-58 + Ci-19) mod 232

To generate the keystream word, look at the addition carry bits. If all three agree (all are 0 or all are
1), then clock all three generators. If they do not, just clock the two generators that agree. Save the
carry bits for next time. The final output is the XOR of the three generators.

Pike is faster than Fish, since on the average 2.75 steps will be required per output rather than 3. It is
far too new to trust, but looks good so far.


Mush is a mutual shrinking generator. It™s easy to explain [1590]. Take two additive generators: A
and B. If the carry bit of A is set, clock B. If the carry bit of B is set, clock A. Clock A, and set the
carry bit if there is a carry. Clock B, and set the carry bit if there is a carry. The final output is the
XOR of the output of A and B.

The easiest generators to use are the ones from Fish:

Ai = (Ai-55 + Ai-24) mod 232
Bi = (Bi-52 + Bi-19) mod 232

On the average, three generator iterations are required to produce one output word. And if the
coefficients of the additive generators are chosen correctly and are relatively prime, the output
sequence will be maximal length. I know of no successful attacks, but remember that this algorithm is
very new.

16.10 Gifford

David Gifford invented a stream cipher and used it to encrypt news wire reports in the Boston area
from 1984 until 1988 [608,607,609]. The algorithm has a single 8-byte register: b0, b1,..., b7. The key is

Page 326 of 666
Applied Cryptography: Second Edition - Bruce Schneier

the initial state of the register. The algorithm works in OFB; the plaintext does not affect the
algorithm at all. (See Figure 16.17).

To generate a key byte ki, concatenate b0 and b2 and concatenate b4 and b7. Multiply the two together
to get a 32-bit number. The third byte from the left is ki.

To update the register, take b1 and sticky right shift it 1 bit. This means the left-most bit is both
shifted and also remains in place. Take b7 and shift it 1 bit to the left; there should be a 0 in the right-
most bit position. Take the XOR of the modified b1, the modified b7, and b0. Shift the original register
1 byte to the right and put this byte in the left-most position.

This algorithm remained secure throughout its life, but was broken in 1994 [287]. It turns out that the
feedback polynomial isn™t primitive and can be attacked that way”oops.

Figure 16.17 Gifford.

16.11 Algorithm M

The name is from Knuth [863]. It™s a method for combining multiple pseudo-random streams that
increases their security. One generator™s output is used to select a delayed output from the other
generator [996,1003]. In C:

#define ARR_SIZE (8192) /* for example ” the larger the better

static unsigned char delay[ ARR_SIZE ] ;

unsigned char prngA( void ) ;
long prngB( void ) ;

void init_algM( void )
long i ;

for ( i = 0 ; i < ARR_SIZE ; i++ )
delay = prngA() ;

} /* init_algM */

unsigned char algM( void )
long j,v ;
j = prngB() % ARR_SIZE ; /* get the delay[] index */
v = delay[j] ; /* get the value to return */
delay[j] = prngA() ; /* replace it */

return ( v ) ;

Page 327 of 666
Applied Cryptography: Second Edition - Bruce Schneier

} /* algM */

This has strength in that if prngA were truly random, one could not learn anything about prngB (and
could therefore not cryptanalyze it). If prngA were of the form that it could be cryptanalyzed only if
its output were available in order (i.e., only if prngB were cryptanalyzed first) and otherwise it was
effectively truly random, then the combination would be secure.

16.12 PKZIP

Roger Schlafly designed the encryption algorithm built into the PKZIP data compression program.
It™s a stream cipher that encrypts data one byte at a time. At least, this is the algorithm in version
2.04g. I can™t speak for later versions, but unless there is some announcement you can probably
assume that they are identical.

The algorithm uses three 32-bit variables, initialized as follows:

K0 = 305419896
K1 = 591751049
K2 = 878082192

It has an 8-bit key, K3, derived from K2. Here is the algorithm (all symbols are standard C notation):

Ci = Pi ^ K3
K0 = crc32 (K0, Pi)
K1 = K1 + (K0 & 0—000000ff)
K1 = K1 * 134775813 + 1
K2 = crc32 (K2, K1 >> 24)
K3 = ((K2 | 2) * ((K2 | 2) ^ 1)) >> 8

The function crc32 takes the previous value and a byte, XORs them, and calculates the next value by
the CRC polynomial denoted by 0—edb88320. In practice, a 256-entry table can be precomputed and
the crc32 calculation becomes:

crc32 (a, b) = (a >> 8) ^ table [(a & 0—ff) b]

The table is precomputed by the original definition of crc32:

table [i] = crc32 (i, 0)

To encrypt a plaintext stream, first loop the key bytes through the encryption algorithm to update the
keys. Ignore the ciphertext output in this step. Then encrypt the plaintext, one byte at a time. Twelve
random bytes are prepended to the plaintext, but that™s not really important. Decryption is similar to
encryption, except that Ci is used in the second step of the algorithm instead of Pi.

Security of PKZIP

Unfortunately, it™s not that great. An attack requires 40 to 200 bytes of known plaintext and has a
time complexity of about 227 [166]. You can do it in a few hours on your personal computer. If the
compressed file has any standard headers, getting the known plaintext is no problem. Don™t use the

Page 328 of 666
Applied Cryptography: Second Edition - Bruce Schneier

built-in encryption in PKZIP.

Chapter 17
Other Stream Ciphers and Real Random-Sequence Generators
17.1 RC4

RC4 is a variable-key-size stream cipher developed in 1987 by Ron Rivest for RSA Data Security, Inc.
For seven years it was proprietary, and details of the algorithm were only available after signing a
nondisclosure agreement.

In September, 1994 someone posted source code to the Cypherpunks mailing list”anonymously. It
quickly spread to the Usenet newsgroup sci.crypt, and via the Internet to ftp sites around the world.
Readers with legal copies of RC4 confirmed compatibility. RSA Data Security, Inc. tried to put the
genie back into the bottle, claiming that it was still a trade secret even though it was public; it was too
late. It has since been discussed and dissected on Usenet, distributed at conferences, and taught in
cryptography courses.

RC4 is simple to describe. The algorithm works in OFB: The keystream is independent of the
plaintext. It has a 8 * 8 S-box: S0, S1,..., S255. The entries are a permutation of the numbers 0 through
255, and the permutation is a function of the variable-length key. It has two counters, i and j,
initialized to zero.

To generate a random byte, do the following:

i = (i + 1) mod 256
j = (j + Si) mod 256
swap Si and Sj
t = (Si + Sj) mod 256
K = St

The byte K is XORed with the plaintext to produce ciphertext or XORed with the ciphertext to
produce plaintext. Encryption is fast”about 10 times faster than DES.

Initializing the S-box is also easy. First, fill it linearly: S0 = 0, S1 = 1,..., S255 = 255. Then fill another
256-byte array with the key, repeating the key as necessary to fill the entire array: K0, K1,..., K255. Set
the index j to zero. Then:

for i = 0 to 255:
j = (j + Si + Ki) mod 256
swap Si and Sj

And that™s it. RSADSI claims that the algorithm is immune to differential and linear cryptanalysis,
doesn™t seem to have any small cycles, and is highly nonlinear. (There are no public cryptanalytic
results. RC4 can be in about 21700 (256! — 2562) possible states: an enormous number.) The S-box
slowly evolves with use: i ensures that every element changes and j ensures that the elements change
randomly. The algorithm is simple enough that most programmers can quickly code it from memory.

Page 329 of 666
Applied Cryptography: Second Edition - Bruce Schneier

It should be possible to generalize this idea to larger S-boxes and word sizes. The previous version is
8-bit RC4. There™s no reason why you can™t define 16-bit RC4 with a 16 * 16 S-box (100K of memory)
and a 16-bit word. You™d have to iterate the initial setup a lot more times”65,536 to keep with the
stated design”but the resulting algorithm should be faster.

RC4 has special export status if its key length is 40 bits or under (see Section 13.8). This special export
status has nothing to do with the secrecy of the algorithm, although RSA Data Security, Inc. has
hinted for years that it does. The name is trademarked, so anyone who writes his own code has to call
it something else. Various internal documents by RSA Data Security, Inc. have not yet been made
public [1320,1337].

So, what™s the deal with RC4? It™s no longer a trade secret, so presumably anyone can use it.
However, RSA Data Security, Inc. will almost certainly sue anyone who uses unlicensed RC4 in a
commercial product. They probably won™t win, but they will certainly make it cheaper for a company
to license than fight.

RC4 is in dozens of commercial cryptography products, including Lotus Notes, Apple Computer™s
AOCE, and Oracle Secure SQL. It is part of the Cellular Digital Packet Data specification [37].

17.2 SEAL

SEAL is a software-efficient stream cipher designed at IBM by Phil Rogaway and Don Coppersmith
[1340]. The algorithm was optimized for 32-bit processors: To run well it needs eight 32-bit registers
and a cache of a few kilobytes. Using a relatively slow operation, SEAL preprocesses the key
operation into a set of tables. These tables are then used to speed up encryption and decryption.

Pseudo-random Function Family

One novel feature of SEAL is that is isn™t really a traditional stream cipher: it is a pseudo-random
function family. Given a 160-bit key k, and a 32-bit n, SEAL stretches n into an L-bit string k(n). L
can take any value less than 64 kilobytes. SEAL is supposed to enjoy the property that if k is selected
at random, then k(n) should be computationally indistinguishable from a random L-bit function of n.

The practical effect of SEAL being a pseudo-random function family is that it is useful in applications
where traditional stream ciphers are not. With most stream ciphers you generate a sequence of bits in
one direction: Knowing the key and a position i, the only way to determine the ith bit generated is to
generate all the bits up until the ith one. But a pseudo-random function family is different: You get
easy access at any desired position in the key stream. This is very useful.

Imagine you need to secure a hard drive. You want to encrypt each and every 512-byte sector. With a
pseudo-random function family like SEAL, you can encrypt the contents of sector n by XORing it
with k(n). It is as though the entire disk is XORed with a long pseudo-random string, where any piece
of that long string can be computed without any trouble.

A pseudo-random function family also simplifies the synchronization problem encountered with
standard stream ciphers. Suppose you send encrypted messages over a channel that sometimes drops
messages. With a pseudo-random function family, you can encrypt under k the nth message you
transmit, xn, as n together with the XOR of xn and k(n). The receiver doesn™t need to store any state to
recover xn, nor does he need to worry about lost messages affecting the message decryption process.

Description of SEAL

Page 330 of 666
Applied Cryptography: Second Edition - Bruce Schneier

The inner loop of SEAL is shown by Figure 17.1. Three key-derived tables, called R, S, and T, drive
the algorithm. The preprocessing step maps the key k, to these tables using a procedure based on SHA
(see Section 18.7). The 2-kilobyte table, T, is a 9 * 32 bit S-box.

SEAL also uses four 32-bit registers, A, B, C, and D, whose initial values are determined by n and the
k-derived tables R and T. These registers get modified over several iterations, each one involving 8
rounds. In each round 9 bits of a first register (either A, B, C, or D) are used to index into table T. The
value retrieved from T is then added to or XORed with the contents of a second register: again one of
A, B, C, or D. The first register is then circularly shifted by nine positions. In some rounds the second
register is further modified by adding or XORing it with the (now shifted) first register. After 8
rounds of this, A, B, C, and D are added to the keystream, each masked first by adding or XORing it
with a certain word from S. The iteration is completed by adding to A and C additional values
dependent on n, n1, n2, n3, n4; exactly which one depends on the parity of the iteration number.

The important ideas in this design seem to be:

1. Use a large, secret, key-derived S-box (T).
2. Alternate arithmetic operations which don™t commute (addition and XOR).
3. Use an internal state maintained by the cipher which is not directly manifest in the
data stream (the ni values which modify A and C at the end of each iteration).
4. Vary the round function according to the round number, and vary the iteration
function according to the iteration number.

Figure 17.1 The inner loop of SEAL.

SEAL requires about five elementary machine operations to encrypt each byte of text. It runs at 58
megabits per second on a 50 megahertz 486 machine. This is probably the fastest software algorithm
in the book.

On the other hand, SEAL must preprocess its key into internal tables. These tables total roughly 3
kilobytes in size, and their calculation takes about 200 SHA computations. Thus, SEAL is not
appropriate to use in situations where you don™t have the time to perform the key setup or you don™t
have the memory to store the tables.

Security of SEAL

SEAL is a new algorithm and has yet to be subjected to any published cryptanalysis. This suggests

Page 331 of 666
Applied Cryptography: Second Edition - Bruce Schneier

caution. However, SEAL seems to be well thought through. Its peculiarities do, in the end, make a
good deal of sense. And Don Coppersmith is generally regarded as the world™s cleverest cryptanalyst.

Patents and Licenses

SEAL is being patented [380]. Anyone wishing to license SEAL should contact the Director of
Licenses, IBM Corporation, 500 Columbus Ave., Thurnwood, NY, 10594.

17.3 WAKE

WAKE is the Word Auto Key Encryption algorithm, invented by David Wheeler [1589]. It produces a
stream of 32-bit words which can be XORed with a plaintext stream to produce ciphertext, or XORed
with a ciphertext stream to produce plaintext. And it™s fast.

WAKE works in CFB; the previous ciphertext word is used to generate the next key word. It also uses
an S-box of 256 32-bit values. This S-box has a special property: The high-order byte of all the entries
is a permutation of all possible bytes, and the low-order 3 bytes are random.

First, generate the S-box entries, Si, from the key. Then initialize four registers with the key (or with
another key): a0, b0, c0, and d0. To generate a 32-bit keystream word, Ki:

Ki = di

The ciphertext word Ci, is the plaintext word, Pi XORed with Ki.

Then, update the four registers:

ai+1 = M(ai,di)
bi+1 = M(bi,ai+1)
ci+1 = M(ci,bi+1)
di+1 = M(di,ci+1)

Function M is

M(x,y) = (x + y) >> 8 S(x+y) ^ 255

This is shown in Figure 17.2. The operation >> is a right shift, not a rotation. The low-order 8 bits of x
+ y are the input into the S-box. Wheeler gives a procedure for generating the S-box, but it isn™t really
complete. Any algorithm to generate random bytes and a random permutation will work.

Page 332 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Figure 17.2 Wake.

WAKE™s biggest asset is that it is fast. However, it™s insecure against a chosen-plaintext or chosen-
ciphertext attack. It is being used in the current version of Dr. Solomon™s Anti-Virus program.

17.4 Feedback with Carry Shift Registers

A feedback with carry shift register, or FCSR, is similar to a LFSR. Both have a shift register and a
feedback function; the difference is that a FCSR also has a carry register (see Figure 17.3). Instead of
XORing all the bits in the tap sequence, add the bits together and add in the contents of the carry
register. The result mod 2 becomes the new bit. The result divided by 2 becomes the new content of
the carry register.

Figure 17.4 is an example of a 3-bit FCSR tapped at the first and second bit. Its initial value is 001,
and the initial contents of the carry register is 0. The output bit is the right-most bit of the shift

Shift Register Carry Register
001 0
100 0
010 0
101 0
110 0
111 0
011 1
101 1
010 1
001 1
000 1
100 0

Figure 17.3 Feedback with carry shift register.

Note that the final internal state (including the contents of the carry register) is the same as the second
internal state. The sequence cycles at this point, and has a period of 10.

There are a few things to note here. First, the carry register is not a single bit; it is a number. The size
of the carry register must be at least log2t, where t is the number of taps. There are only two taps in
the previous example, so the carry register only has to be 1 bit wide. If there were four taps, the carry
register would have to be 2 bits wide, and could be either 0, 1, 2, or 3.

Page 333 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Second, there is an initial transient before the FCSR settles down into its repeating period. In the
previous example, only one state never repeated. For larger and more complicated FCSRs, there may
be more.

Third, the maximum period of a FCSR is not 2n - 1, where n is the length of the shift register. The
maximum period is q - 1, where q is the connection integer. This number gives the taps and is defined

q = 2q1 + 22q2 + 24q4 +...+ 2nqn - 1

(Yes, the qis are numbered from left to right.) And even worse, q has to be a prime for which 2 is a
primitive root. The rest of this discussion assumes q is of this form.

In this example, q = 2*0 + 4*1 + 8*1 - 1 = 11. And 11 is a prime with 2 as a primitive root. So the
maximum period is 10.

Not all initial states give you the maximum period. For example, look at the FCSR when the initial
value is 101 and the carry register is set to 4.

Figure 17.4 3-bit FCSR.

Shift Register Carry Register
101 4
110 2
111 1
111 1

At this point the register spits out a neverending stream of 1s.

Any initial state will result in one of four things. First, it is part of the maximum period. Second, it
will fall into the maximum period after an initial transient. Third, it will fall into a sequence of all
zeros after an initial transient. Fourth, it will fall into a sequence of all ones after an initial transient.

There is a mathematical formula for determining what will happen to a given initial state, but it™s
much easier to just test it. Run the FCSR for a while. (If m is the initial memory, and t is the number
of taps, then log2(t) + log2(m) + 1 steps are enough.) If it degenerates into a neverending stream of 0s
or 1s within n bits, where n is the length of the FCSR, don™t use it. If it doesn™t, then use it. Since the
initial state of a FCSR corresponds to the key of the stream cipher, this means that a FCSR-based
generator will have a set of weak keys.

Table 17.1 lists all connection integers less than 10,000 for which 2 is a primitive root. These all have
maximum period q - 1. To turn one of these numbers into a tap sequence, calculate the binary

Page 334 of 666
Applied Cryptography: Second Edition - Bruce Schneier

expansion of q + 1. For example, 9949 would translate to taps on bits 1, 2, 3, 4, 6, 7, 9, 10, and 13,

9950 = 213 + 210 + 29 + 27 + 26 + 24 + 23 + 22 + 21

Table 17.2 lists all the 4-tap tap sequences that result in a maximal-length FCSR for shift register
lengths of 32 bits, 64 bits, and 128 bits. Each of the four values, a, b, c, and d, combine to generate q, a
prime for which 2 is primitive.

q = 2a + 2b + 2c + 2d - 1

Any of these tap sequences can be used to create a FCSR with period q - 1.

The idea of using FCSRs for cryptography is still very new; it is being pioneered by Andy Klapper
and Mark Goresky [844,845,654,843,846]. Just as the analysis of LFSRs is based on the addition of
primitive polynomials mod 2, analysis of FCSRs is based on addition over something called the 2-adic
numbers. The theory is well beyond the scope of this book, but there seems to be a 2-adic analog for
everything. Just as you can define linear complexity, you can define 2-adic complexity. There is even a
2-adic analog to the Berlekamp-Massey algorithm. What this means is that the list of potential stream
ciphers has just doubled”at least. Anything you can do with a LFSR you can do with a FCSR.

There are further enhancements to this sort of idea, ones that involve multiple carry registers. The
analysis of these sequence generators is based on addition over the ramified extensions of the 2-adic
numbers [845,846].

17.5 Stream Ciphers Using FCSRs

There aren™t any FCSR stream ciphers in the literature; the theory is still too new. In the interests of
getting the ball rolling, I propose some here. I am taking two different tacks: I am proposing FCSR
stream ciphers that are identical to previously proposed LFSR generators, and I am proposing stream
ciphers that use both FCSRs and LFSRs. The security of the former can probably be analyzed using
2-adic numbers; the latter cannot be analyzed using algebraic techniques”they can probably only be
analyzed indirectly. In any case, it is important to choose LFSRs and FCSRs whose periods are
relatively prime.

All this will come later. Right now I know of no implementation or analysis of any of these ideas. Wait
some years and scan the literature before you trust any of them.

Cascade Generators

There are two ways to use FCSRs in a cascade generator:

” FCSR Cascade. The Gollmann cascade with FCSRs instead of LFSRs.
” LFSR/FCSR Cascade. The Gollmann cascade with the generators alternating between
LFSRs and FCSRs.

FCSR Combining Generators

These generators use a variable number of LFSRs and/or FCSRs, and a variety of functions to
combine them. The XOR operation destroys the algebraic properties of FCSRs, so it makes sense to
use it to combine them. The generator, shown in Figure 17.5, uses a variable number of FCSRs. Its
output is the XOR of the outputs of the individual FCSRs.

Page 335 of 666
Applied Cryptography: Second Edition - Bruce Schneier

Other generators along similar lines are:

” FCSR Parity Generator. All registers are FCSRs and the combining function is XOR.
” LFSR/FCSR Parity Generator. Registers are a mix of LFSRs and FCSRs and the
combining function is XOR.
” FCSR Threshold Generator. All registers are FCSRs and the combining function is the
majority function.
” LFSR/FCSR Threshold Generator. Registers are a mix of LFSRs and FCSRs and the
combining function is the majority function.
” FCSR Summation Generator. All registers are FCSRs and the combining function is
addition with carry.
” LFSR/FCSR Summation Generator. Registers are a mix of LFSRs and FCSRs and the
combining function is addition with carry.

Table 17.1
Connection Integers for Maximal-period FCSRs
2 653 1549 2477 3539
5 659 1571 2531 3547
11 661 1619 2539 3557
13 677 1621 2549 3571
19 701 1637 2557 3581
29 709 1667 2579 3613
37 757 1669 2621 3637
53 773 1693 2659 3643
59 787 1733 2677 3659
61 797 1741 2683 3677
67 821 1747 2693 3691
83 827 1787 2699 3701
101 829 1861 2707 3709
107 853 1867 2741 3733
131 859 1877 2789 3779
139 877 1901 2797 3797
149 883 1907 2803 3803
163 907 1931 2819 3851
173 941 1949 2837 3853
179 947 1973 2843 3877
181 1019 1979 2851 3907
197 1061 1987 2861 3917
211 1091 1997 2909 3923
227 1109 2027 2939 3931
269 1117 2029 2957 3947
293 1123 2053 2963 3989
317 1171 2069 3011 4003
347 1187 2083 3019 4013


. 14
( 29)