ńňđ. 14 |

Itâ€™s easy to turn this into a maximal-period LFSR. The first number is the length of the LFSR. The

last number is always 0 and can be ignored. All the numbers, except the 0, specify the tap sequence,

counting from the left of the shift register. That is, low degree terms in the polynomial correspond to

taps near the left-hand side of the register.

To continue the example, the listing (32, 7, 5, 3, 2, 1, 0) means that if you take a 32-bit shift register

and generate the new bit by XORing the thirty-second, seventh, fifth, third, second, and first bits

together (see Figure 16.4), the resultant LFSR will be maximal length; it will cycle through 232 - 1

values before repeating.

The C code for this LFSR looks like:

int LFSR () {

static unsigned long ShiftRegister = 1;

/* Anything but 0. */

ShiftRegister = ((((ShiftRegister >> 31)

^ (ShiftRegister >> 6)

^ (ShiftRegister >> 4)

^ (ShiftRegister >> 2)

^ (ShiftRegister >> 1)

^ ShiftRegister))

& 0Ă—00000001)

<< 31)

| (ShiftRegister >> 1) ;

return ShiftRegister & 0Ă—00000001;

}

Figure 16.4 32-bit long maximal-length LFSR.

Table 16.2

Some Primitive Plynomials Mod 2

(1, 0) (36, 11, 0) (68, 9, 0) (97, 6, 0)

(2, 1, 0) (36, 6, 5, 4, 2, 1, 0) (68, 7, 5, 1, 0) (98, 11, 0)

(3, 1, 0) (37, 6, 4, 1, 0) (69, 6, 5, 2, 0) (98, 7, 4, 3, 1, 0)

(4, 1, 0) (37, 5, 4, 3, 2, 1, 0) (70, 5, 3, 1, 0) (99, 7, 5, 4, 0)

(5, 2, 0) (38, 6, 5, 1, 0) (71, 6, 0) (100, 37, 0)

(6, 1, 0) (39, 4, 0) (71, 5, 3, 1, 0) (100, 8, 7, 2, 0)

(7, 1, 0) (40, 5, 4, 3, 0) (72, 10, 9, 3, 0) (101, 7, 6, 1, 0)

(7, 3, 0) (41, 3, 0) (72, 6, 4, 3, 2, 1, 0) (102, 6 5 3 0)

(8, 4, 3, 2, 0) (42, 7, 4, 3, 0) (73, 25, 0) (103, 9, 9)

(9, 4, 0) (42, 5, 4, 3, 2, 1, 0) (73, 4, 3, 2, 0) (104, 11, 10, 1, 0)

(10, 3, 0) (43, 6, 4, 3, 0) (74, 7, 4, 3, 0) (105, 16, 0)

(11, 2, 0) (44, 6, 5, 2, 0) (75, 6, 3, 1, 0) (106, 15, 0)

(12, 6, 4, 1, 0) (45, 4, 3, 1, 0) (76, 5, 4, 2, 0) (107, 9, 7, 4, 0)

(13, 4, 3, 1, 0) (46, 8, 7, 6, 0) (77, 6, 5, 2, 0) (108, 31, 0)

(14, 5, 3, 1, 0) (46, 8, 5, 3, 2, 1, 0) (78, 7, 2, 1, 0) (109, 5, 4, 2, 0)

Page 312 of 666

Applied Cryptography: Second Edition - Bruce Schneier

(15, 1, 0) (47, 5, 0) (79, 9, 0) (110, 6, 4, 1, 0)

(16, 5, 3, 2, 0) (48, 9, 7, 4, 0) (79, 4, 3, 2, 0) (111, 10, 0)

(17, 3, 0) (48, 7, 5, 4, 2, 1, 0) (80, 9, 4, 2, 0) (111, 49, 0)

(17, 5, 0) (49, 9, 0) (80, 7, 5, 3, 2, 1, 0) (113, 9, 0)

(17, 6, 0) (49, 6, 5, 4, 0) (81, 4, 0) (113, 15, 0)

(18, 7, 0) (50, 4, 3, 2, 0) (82, 9, 6, 4, 0) (113, 30, 0)

(18, 5, 2, 1, 0) (51, 6, 3, 1, 0) (82, 8, 7, 6, 1, 0) (114, 11, 2, 1, 0)

(19, 5, 2, 1, 0) (52, 3, 0) (83, 7, 4, 2, 0) (115, 8, 7, 5, 0)

(20, 3, 0) (53, 6, 2, 1, 0) (84, 13, 0) (116, 6, 5, 2, 0)

(21, 2, 0) (54, 8, 6, 3, 0) (84, 8, 7, 5, 3, 1, 0) (117, 5, 2, 1, 0)

(22, 1, 0) (54, 6, 5, 4, 3, 2, 0) (85, 8, 2, 1, 0) (118, 33, 0)

(23, 5, 0) (55, 24, 0) (86, 6, 5, 2, 0) (119, 8, 0)

(24, 4, 3, 1, 0) (55, 6, 2, 1, 0) (87, 13, 0) (119, 45, 0)

(25, 3, 0) (56, 7, 4, 2, 0) (87, 7, 5, 1, 0) (120, 9, 6, 2, 0)

(26, 6, 2, 1, 0) (57, 7, 0) (88, 11, 9, 8, 0) (121, 18, 0)

(27, 5, 2, 1, 0) (57, 5, 3, 2, 0) (88, 8, 5, 4, 3, 1, 0) (122, 6, 2, 1, 0)

(28, 3, 0) (58, 19, 0) (89, 38, 0) (123, 2, 0)

(29, 2, 0) (58, 6, 5, 1, 0) (89, 51, 0) (124, 37, 0)

(30, 6, 4, 1, 0) (59, 7, 4, 2, 0) (89, 6, 5, 3, 0) (125, 7, 6, 5, 0)

(31, 3, 0) (59, 6, 5, 4, 3, 1, 0) (90, 5, 3, 2, 0) (126, 7, 4, 2, 0)

(31, 6, 0) (60, 1, 0) (91, 8, 5, 1, 0) (127, 1, 0)

(31, 7, 0) (61, 5, 2, 1, 0) (91, 7, 6, 5, 3, 2, 0) (127, 7, 0)

(31, 13, 0) (62, 6, 5, 3, 0) (92, 6, 5, 2, 0) (127, 63, 0)

(32, 7, 6, 2, 0) (63, 1, 0) (93, 2, 0) (128, 7, 2, 1, 0)

(32, 7, 5, 3, 2, 1, 0) (64, 4, 3, 1, 0) (94, 21, 0) (129, 5, 0)

(33, 13, 0) (65, 18, 0) (94, 6, 5, 1, 0) (130, 3, 0)

(33, 16, 4, 1, 0) (65, 4, 3, 1, 0) (95, 11, 0) (131, 8, 3, 2, 0)

(34, 8, 4, 3, 0) (66, 9, 8, 6, 0) (95, 6, 5, 4, 2, 1, 0) (132, 29, 0)

(34, 7, 6, 5, 2, 1, 0) (66, 8, 6, 5, 3, 2, 0) (96, 10, 9, 6, 0) (133, 9, 8, 2, 0)

(35, 2, 0) (67, 5, 2, 1, 0) (96, 7, 6, 4, 3, 2, 0) (134, 57, 0)

(135, 11, 0) (152, 6, 3, 2, 0) (178, 87, 0) (270, 133, 0)

(135, 16, 0) (153, 1, 0) (183, 56, 0) (282, 35, 0)

(135, 22, 0) (153, 8, 0) (194, 87, 0) (282, 43, 0)

(136, 8, 3, 2, 0) (154, 9, 5, 1, 0) (198, 65, 0) (286, 69, 0)

(137, 21, 0) (155, 7, 5, 4, 0) (201, 14, 0) (286, 73, 0)

(138, 8, 7, 1, 0) (156, 9, 5, 3, 0) (201, 17, 0) (294, 61, 0)

(139, 8, 5, 3, 0) (157, 6, 5, 2, 0) (201, 59, 0) (322, 67, 0)

(140, 29, 0) (158, 8, 6, 5, 0) (201, 79, 0) (333, 2, 0)

(141, 13, 6, 1, 0) (159, 31, 0) (202, 55, 0) (350, 53, 0)

(142, 21, 0) (159, 34, 0) (207, 43, 0) (366, 29, 0)

(143, 5, 3, 2, 0) (159, 40, 0) (212, 105, 0) (378, 43, 0)

(144, 7, 4, 2, 0) (160, 5, 3, 2, 0) (218, 11, 0) (378, 107, 0)

(145, 52, 0) (161, 18, 0) (218, 15, 0) (390, 89, 0)

(145, 69, 0) (161, 39, 0) (218, 71, 0) (462, 73, 0)

Page 313 of 666

Applied Cryptography: Second Edition - Bruce Schneier

(146, 5, 3, 2, 0) (161, 60, 0) (218, 83, 0) (521, 32, 0)

(147, 11, 4, 2, 0) (162, 8, 7, 4, 0) (225, 32, 0) (521, 48, 0)

(148, 27, 0) (163, 7, 6, 3, 0) (225, 74, 0) (521, 158, 0)

(149, 10, 9, 7, 0) (164, 12, 6, 5, 0) (225, 88, 0) (521, 168, 0)

(150, 53, 0) (165, 9, 8, 3, 0) (225, 97, 0) (607, 105, 0)

(151, 3, 0) (166, 10, 3, 2, 0) (225, 109, 0) (607, 147, 0)

(151, 9, 0) (167, 6, 0) (231, 26, 0) (607, 273, 0)

(151, 15, 0) (170, 23, 0) (231, 34, 0) (1279, 216, 0)

(151, 31, 0) (172, 2, 0) (234, 31, 0) (1279, 418, 0)

(151, 39, 0) (174, 13, 0) (234, 103, 0) (2281, 715, 0)

(151, 43, 0) (175, 6, 0) (236, 5, 0) (2281, 915, 0)

(151, 46, 0) (175, 16, 0) (250, 103, 0) (2281, 1029, 0)

(151, 51, 0) (175, 18, 0) (255, 52, 0) (3217, 67, 0)

(151, 63, 0) (175, 57, 0) (255, 56, 0) (3217, 576, 0)

(151, 66, 0) (177, 8, 0) (255, 82, 0) (4423, 271, 0)

(151, 67, 0) (177, 22, 0) (258, 83, 0) (9689, 84, 0)

(151, 70, 0) (177, 88, 0) (266, 47, 0)

The code is a little more complicated when the shift register is longer than the computerâ€™s word size,

but not significantly so.

Note that all of these listings have an odd number of coefficients. I have provided such a large table

because LFSRs are often used for stream-cipher cryptography and I wanted many examples so that

different people would pick different primitive polynomials. Since, if p(x) is primitive, then so is xnp

(1/x); each entry on the table is actually two primitive polynomials.

For example, if (a, b, 0) is primitive, then (a, a - b, 0) is also primitive. If (a, b, c, d, 0) is primitive, then

(a, a - d, a - c, a - b, 0) is also primitive. Mathematically:

if xa + xb + 1 is primitive, so is xa + xa-b + 1

if xa + xb + xc + xd + 1 is primitive, so is xa + xa-d + xa-c + xa-b + 1

Primitive trinomials are fastest in software, because only two bits of the shift register have to be

XORed to generate each new bit. Actually, all the feedback polynomials listed in Table 16.2 are

sparse, meaning that they only have a few coefficients. Sparseness is always a source of weakness,

sometimes enough to break the algorithm. It is far better to use dense primitive polynomials, those

with a lot of coefficients, for cryptographic applications. If you use dense polynomials, and especially

if you make them part of the key, you can live with much shorter LFSRs.

Generating dense primitive polynomials modulo 2 is not easy. In general, to generate primitive

polynomials of degree k you need to know the factorization of 2k - 1. Three good references for

finding primitive polynomials are [652,1285,1287].

LFSRs are competent pseudo-random-sequence generators all by themselves, but they have some

annoying nonrandom properties. Sequential bits are linear, which makes them useless for encryption.

For an LFSR of length n, the internal state is the next n output bits of the generator. Even if the

feedback scheme is unknown, it can be determined from only 2n output bits of the generator, by using

Page 314 of 666

Applied Cryptography: Second Edition - Bruce Schneier

the highly efficient Berlekamp-Massey algorithm [1082,1083]: see Section 16.3.

Also, large random numbers generated from sequential bits of this sequence are highly correlated

and, for certain types of applications, not very random at all. Even so, LFSRs are often used as

building blocks in encryption algorithms.

LFSRs in Software

LFSRs are slow in software, but theyâ€™re faster in assembly language than in C. One solution is to run

16 LFSRs (or 32, depending on your computerâ€™s word size) in parallel. This scheme uses an array of

words that is the length of the LFSR, with each bit position in the words representing a different

LFSR. Assuming all the feedback polynomials are the same, this can run pretty quickly. In general,

the best way to update shift registers is to multiply the current state by suitable binary matrices [901].

It is also possible to modify the LFSRâ€™s feedback scheme. The resultant generator is no better

cryptographically, but it still has a maximal period and is easy to implement in software [1272].

Instead of using the bits in the tap sequence to generate the new left-most bit, each bit in the tap

sequence is XORed with the output of the generator and replaced; then the output of the generator

becomes the new left-most bit (see Figure 16.5). This is sometimes called a Galois configuration.

In C, this looks like:

#define mask 0Ă—80000057

static unsigned long ShiftRegister=1;

void seed_LFSR (unsigned long seed)

{

if (seed == 0) /* avoid calamity */

seed = 1;

ShiftRegister = seed;

}

int modified_LFSR (void)

{

if (ShiftRegister & 0Ă—00000001) {

ShiftRegister = ((ShiftRegister ^ mask >> 1) |

0Ă—8000000;

return 1;

} else {

ShiftRegister >>= 1;

return 0;

}

}

Figure 16.5 Galois LFSR.

The savings here is that all the XORs can be done as a single operation. This can also be parallelized,

and the different feedback polynomials can be different. The Galois configuration can also be faster

in hardware, especially in custom VLSI implementations. In general, if you are using hardware that is

good at shifts, use a Fibonacci configuration; if you can exploit parallelism, use a Galois

configuration.

16.3 Design and Analysis of Stream Ciphers

Page 315 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Most practical stream-cipher designs center around LFSRs. In the early days of electronics, they were

very easy to build. A shift register is nothing more than an array of bit memories and the feedback

sequence is just a series of XOR gates. Even in VLSI circuitry, a LFSR-based stream cipher can give

you a lot of security with only a few logic gates.

The problem with LFSRs is that they are very inefficient in software. You want to avoid sparse

feedback polynomialsâ€”they facilitate correlation attacks [1051,1090,350]â€”and dense feedback

polynomials are inefficient. Any stream cipher outputs a bit at a time; you have to iterate the

algorithm 64 times to encrypt what a single iteration of DES can encrypt. In fact, a simple LFSR

algorithm like the shrinking generator described later is no faster in software than DES.

This branch of cryptography is fast-paced and very politically charged. Most designs are secret; a

majority of military encryptions systems in use today are based on LFSRs. In fact, most Cray

computers (Cray 1, Cray X-MP, Cray Y-MP) have a rather curious instruction generally known as

â€śpopulation count.â€ť It counts the 1 bits in a register and can be used both to efficiently calculate the

Hamming distance between two binary words and to implement a vectorized version of a LFSR. Iâ€™ve

heard this called the canonical NSA instruction, demanded by almost all computer contracts.

On the other hand, an astonishingly large number of seemingly complex shift-register-based

generators have been cracked. And certainly military cryptanalysis institutions such as the NSA have

cracked a lot more. Sometimes itâ€™s amazing to see the simple ones proposed again and again.

Linear Complexity

Analyzing stream ciphers is often easier than analyzing block ciphers. For example, one important

metric used to analyze LFSR-based generators is linear complexity, or linear span. This is defined as

the length, n, of the shortest LFSR that can mimic the generator output. Any sequence generated by a

finite-state machine over a finite field has a finite linear complexity [1006]. Linear complexity is

important because a simple algorithm, called the Berlekamp-Massey algorithm, can generate this

LFSR after examining only 2n bits of the keystream [1005]. Once youâ€™ve generated this LFSR, youâ€™ve

broken the stream cipher.

This idea has extensions from fields to rings [1298], and when the output sequence is viewed as

numbers over fields of odd characteristic [842]. A further enhancement is the notion of a linear

complexity profile, which measures the linear complexity of the sequence as it gets longer and longer

[1357,1168,411,1582]. Another algorithm for computing linear complexity is useful only in very

specialized circumstances [597,595,596,1333]. A generalization of linear complexity is in [776]. There

is also the notion of sphere complexity [502] and 2-adic complexity [844].

In any case, remember that a high linear complexity does not necessarily indicate a secure generator,

but a low linear complexity indicates an insecure one [1357,1249].

Correlation Immunity

Cryptographers try to get a high linear complexity by combining the output of several output

sequences in some nonlinear manner. The danger here is that one or more of the internal output

sequencesâ€”often just outputs of individual LFSRsâ€”can be correlated with the combined keystream

and attacked using linear algebra. Often this is called a correlation attack or a divide-and-conquer

attack. Thomas Siegenthaler has shown that correlation immunity can be precisely defined, and that

there is a trade-off between correlation immunity and linear complexity [1450].

Page 316 of 666

Applied Cryptography: Second Edition - Bruce Schneier

The basic idea behind a correlation attack is to identify some correlation between the output of the

generator and the output of one of its internal pieces. Then, by observing the output sequence, you

can obtain information about that internal output. Using that information and other correlations,

collect information about the other internal outputs until the entire generator is broken.

Correlation attacks and variations such as fast correlation attacksâ€”these offer a trade-off between

computational complexity and effectivenessâ€”have been successfully applied to a number of LFSR-

based keystream generators [1451,278,1452,572,1636,1051,1090,350,633,1054,1089,995]. Some

interesting new ideas along these lines are in [46,1641].

Other Attacks

There are other general attacks against keystream generators. The linear consistency test attempts to

identify some subset of the encryption key using matrix techniques [1638]. There is also the meet-in-

the-middle consistency attack [39,41]. The linear syndrome algorithm relies on being able to write a

fragment of the output sequence as a linear equation [1636,1637]. There is the best affine

approximation attack [502] and the derived sequence attack [42]. The techniques of differential

cryptanalysis have even been applied to stream ciphers [501], as has linear cryptanalysis [631].

16.4 Stream Ciphers Using LFSRs

The basic approach to designing a keystream generator using LFSRs is simple. First you take one or

more LFSRs, generally of different lengths and with different feedback polynomials. (If the lengths

are all relatively prime and the feedback polynomials are all primitive, the whole generator is

maximal length.) The key is the initial state of the LFSRs. Every time you want a bit, shift the LFSRs

once (this is sometimes called clocking). The output bit is a function, preferably a nonlinear function,

of some of the bits of the LFSRs. This function is called the combining function, and the whole

generator is called a combination generator. (If the output bit is a function of a single LFSR, the

generator is called a filter generator.) Much of the theoretical background for this kind of thing was

laid down by Selmer and Neal Zierler [1647].

Complications have been added. Some generators have LFSRs clocked at different rates; sometimes

the clocking of one generator depends on the output of another. These are all electronic versions of

pre-WWII cipher machine ideas, and are called clock-controlled generators [641]. Clock control can

be feedforward, where the output of one LFSR controls the clocking of another, or feedback, where

the output of one LFSR controls its own clocking.

Although these generators are, at least in theory, susceptible to embedding and probabilistic

correlation attacks [634,632], many are secure for now. Additional theory on clock-controlled shift

registers is in [89].

Ian Cassells, once the head of pure mathematics at Cambridge and a former Bletchly Park

cryptanalyst, said that â€ścryptography is a mixture of mathematics and muddle, and without the

muddle the mathematics can be used against you.â€ť What he meant was that in stream ciphers, you

need some kind of mathematical structureâ€”such as a LFSRâ€”to guarantee maximal-length and other

properties, and then some complicated nonlinear muddle to stop someone from getting at the register

and solving it. This advice also holds true for block algorithms.

What follows is a smattering of LFSR-based keystream generators that have appeared in the

literature. I donâ€™t know if any of them have been used in actual cryptographic products. Most of them

are of theoretical interest only. Some have been broken; some may still be secure.

Page 317 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Since LFSR-based ciphers are generally implemented in hardware, electronics logic symbols will be

used in the figures. In the text, is XOR, ^ is AND, is OR, and Â¬ is NOT.

Geffe Generator

This keystream generator uses three LFSRs, combined in a nonlinear manner (see Figure 16.6) [606].

Two of the LFSRs are inputs into a multiplexer, and the third LFSR controls the output of the

multiplexer. If a1, a2, and a3 are the outputs of the three LFSRs, the output of the Geffe generator can

be described by:

b = (a1 ^ a2) ((Â¬ a1) ^ a3)

If the LFSRs have lengths n1, n2, and n3, respectively, then the linear complexity of the generator is

(n1 + 1)n2 + n1n3

The period of the generator is the least common multiple of the periods of the three generators.

Assuming the degrees of the three primitive feedback polynomials are relatively prime, the period of

this generator is the product of the periods of the three LFSRs.

Although this generator looks good on paper, it is cryptographically weak and falls to a correlation

attack [829,1638]. The output of the generator equals the output of LFSR-2 75 percent of the time. So,

if the feedback taps are known, you can guess the initial value for LFSR-2 and generate the output

sequence of that register. Then you can count the number of times the output of the LFSR-2 agrees

with the output of the generator. If you guessed wrong, the two sequences will agree about 50 percent

of the time; if you guessed right, the two sequences will agree about 75 percent of the time.

Similarly, the output of the generator equals the output of LFSR-3 about 75 percent of the time. With

those correlations, the keystream generator can be easily cracked. For example, if the primitive

polynomials only have three terms each, and the largest LFSR is of length n, it only takes a segment of

the output sequence 37n-bits long to reconstruct the internal states of all three LFSRs [1639].

Generalized Geffe Generator

Instead of choosing between two LFSRs, this scheme chooses between k LFSRs, as long as k is a power

of 2. There are k + 1 LFSRs total (see Figure 16.7). LFSR-1 must be clocked log2k times faster than

the other k LFSRs.

Figure 16.6 Geffe generator.

Even though this scheme is more complex than the Geffe generator, the same kind of correlation

attack is possible. I donâ€™t recommend this generator.

Page 318 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Jennings Generator

This scheme uses a multiplexer to combine two LFSRs [778,779,780]. The multiplexer, controlled by

LFSR-1, selects 1 bit of LFSR-2 for each output bit. There is also a function that maps the output of

LFSR-2 to the input of the multiplexer (see Figure 16.8).

The key is the initial state of the two LFSRs and the mapping function. Although this generator has

great statistical properties, it fell to Ross Andersonâ€™s meet-in-the-middle consistency attack [39] and

the linear consistency attack [1638,442]. Donâ€™t use this generator.

Beth-Piper Stop-and-Go Generator

This generator, shown in Figure 16.9, uses the output of one LFSR to control the clock of another

LFSR [151]. The clock input of LFSR-2 is controlled by the output of LFSR-1, so that LFSR-2 can

change its state at time t only if the output of LFSR-1 was 1 at time t - 1.

No one has been able to prove results about this generatorâ€™s linear complexity in the general case.

However, it falls to a correlation attack [1639].

Alternating Stop-and-Go Generator

This generator uses three LFSRs of different length. LFSR-2 is clocked when the output of LFSR-1 is

1; LFSR-3 is clocked when the output of LFSR-1 is 0. The output of the generator is the XOR of

LFSR-2 and LFSR-3 (see Figure 16.10) [673].

This generator has a long period and large linear complexity. The authors found a correlation attack

against LFSR-1, but it does not substantially weaken the generator. There have been other attempts

at keystream generators along these lines [1534,1574,1477].

Figure 16.7 Generalized Geffe generator.

Figure 16.8 Jennings generator.

Bilateral Stop-and-Go Generator

This generator uses two LFSRs, both of length n (see Figure 16.11) [1638]. The output of the

generator is the XOR of the outputs of each LFSR. If the output of LFSR-2 at time t â€“ 1 is 0 and the

Page 319 of 666

Applied Cryptography: Second Edition - Bruce Schneier

output at time t â€“ 2 is 1, then LFSR-2 does not clock at time t. Conversely, if the output of LFSR-1 at

time t â€“ 1 is 0 and the output at t â€“ 2 is 1, and if LFSR-1 clocked at time t, then LFSR-2 does not clock

at time t.

The linear complexity of this system is roughly equal to the period. According to [1638], â€śno evident

key redundancy has been observed in this system.â€ť

Threshold Generator

This generator tries to get around the security problems of the previous generators by using a

variable number of LFSRs [277]. The theory is that if you use a lot of LFSRs, itâ€™s harder to break the

cipher.

This generator is illustrated in Figure 16.12. Take the output of a large number of LFSRs (use an odd

number of them). Make sure the lengths of all the LFSRs are relatively prime and all the feedback

polynomials are primitive: maximize the period. If more than half the output bits are 1, then the

output of the generator is 1. If more than half the output bits are 0, then the output of the generator is

0.

Figure 16.9 Beth-Piper stop-and-go generator.

Figure 16.10 Alternating stop-and-go generator.

With three LFSRs, the output generator can be written as:

b = (a1 ^ a2) (a1 ^ a3) (a2 ^ a3)

This is very similar to the Geffe generator, except that it has a larger linear complexity of

n1n2 + n1n3 + n2n3

where n1, n2, and n3 are the lengths of the first, second, and third LFSRs.

This generator isnâ€™t great. Each output bit of the generator yields some information about the state of

the LFSRsâ€”0.189 bit to be exactâ€”and the whole thing falls to a correlation attack. I donâ€™t

recommend using it.

Self-Decimated Generators

Page 320 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Self-decimated generators are generators that control their own clock. Two have been proposed, one

by Rainer Rueppel (see Figure 16.13) [1359] and another by Bill Chambers and Dieter Gollmann

[308] (see Figure 16.14). In Rueppelâ€™s generator, when the output of the LFSR is 0, the LFSR is

clocked d times. When the output of the LFSR is 1, the LFSR is clocked k times. Chambersâ€™s and

Gollmannâ€™s generator is more complicated, but the idea is the same. Unfortunately, both generators

are insecure [1639], although some modifications have been proposed that may correct the problems

[1362].

Figure 16.11 Bilateral stop-and-go generator.

Figure 16.12 Threshold generator.

Multispeed Inner-Product Generator

This generator, by Massey and Rueppel [1014], uses two LFSRs clocked at two different speeds (see

Figure 16.15). LFSR-2 is clocked d times as fast as LFSR-1. The individual bits of the two LFSRs are

ANDed together and then XORed with each other to produce the final output bit of the generator.

Although this generator has high linear complexity and it possesses excellent statistical properties, it

still falls to a linear consistency attack [1639]. If n1 is the length of LFSR-1, n2 is the length of the

LFSR-2, and d is the speed multiple between the two, then the internal state of the generator can be

recovered from an output sequence of length

n1 + n2 + log2d

Summation Generator

More work by Rainer Rueppel, this generator adds the output of two LFSRs (with carry) [1358,1357].

This operation is highly nonlinear. Through the late 1980s, this generator was the security front-

runner, but it fell to a correlation attack [1053,1054,1091]. And it has been shown that this is an

example of a feedback with carry shift register (see Section 17.4), and can be broken [844].

Figure 16.13 Rueppelâ€™s self-decimated generator.

Page 321 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Figure 16.14 Chambersâ€™s and Gollmannâ€™s self-decimated generator.

DNRSG

That stands for â€śdynamic random-sequence generatorâ€ť [1117]. The idea is to have two different filter

generatorsâ€”threshold, summation, or whateverâ€”fed by a single set of LFSRs and controlled by

another LFSR.

First clock all the LFSRs. If the output of LFSR-0 is 1, then compute the output of the first filter

generator. If the output of LFSR-0 is 0, then compute the output of the second filter generator. The

final output is the first output XOR the second.

Gollmann Cascade

The Gollmann cascade (see Figure 16.16), described in [636,309], is a strengthened version of a stop-

and-go generator. It consists of a series of LFSRs, with the clock of each controlled by the previous

LFSR. If the output of LFSR-1 is 1 at time t - 1, then LFSR-2 clocks. If the output of LFSR-2 is 1 at

time t - 1, then LFSR-3 clocks, and so on. The output of the final LFSR is the output of the generator.

If all the LFSRs have the same length, n, the linear complexity of a system with k LFSRs is

n(2n - 1)k-1

Figure 16.15 Multispeed inner-product generator.

Cascades are a cool idea: They are conceptually very simple and they can be used to generate

sequences with huge periods, huge linear complexities, and good statistical properties. They are

vulnerable to an attack called lock-in [640]. This is a technique by which a cryptanalyst reconstructs

the input to the last shift register in the cascade, then proceeds to break the cascade register by

register. This is a serious problem in some situations and weakens the effective key length of the

algorithm, but precautions can be taken to minimize the attack.

Further analysis has indicated that the sequence approaches random as k gets larger

[637,638,642,639]. Based on recent attacks on short Gollmann cascades [1063], I recommend using a k

of at least 15. Youâ€™re better off using more LFSRs of shorter length than fewer LFSRs of longer

length.

Shrinking Generator

The shrinking generator [378] uses a different form of clock control than the previous generators.

Page 322 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Take two LFSRs: LFSR-1 and LFSR-2. Clock both of them. If the output of LFSR-1 is 1, then the

output of the generator is LFSR-2. If the output of LFSR-1 is 0, discard the two bits, clock both

LFSRs, and try again.

This idea is simple, reasonably efficient, and looks secure. If the feedback polynomials are sparse, the

generator is vulnerable, but no other problems have been found. Even so, itâ€™s new. One

implementation problem is that the output rate is not regular; if LFSR-1 has a long string of zeros

then the generator outputs nothing. The authors suggest buffering to solve this problem [378].

Practical implementation of the shrinking generator is discussed in [901].

Self-Shrinking Generator

The self-shrinking generator [1050] is a variant of the shrinking generator. Instead of using two

LFSRs, use pairs of bits from a single LFSR. Clock a LFSR twice. If the first bit in the pair is 1, the

output of the generator is the second bit. If the first bit is 0, discard both bits and try again. While the

self-shrinking generator requires about half the memory space as the shrinking generator, it is also

half the speed.

While the self-shrinking generator also seems secure, it still has some unexplained behavior and

unknown properties. This is a very new generator; give it some time.

Figure 16.16 Gollmann cascade.

16.5 A5

A5 is the stream cipher used to encrypt GSM (Group Special Mobile). Thatâ€™s the non-American

standard for digital cellular mobile telephones. It is used to encrypt the link from the telephone to the

base station. The rest of the link is unencrypted; the telephone company can easily eavesdrop on your

conversations.

A lot of strange politics surrounds this one. Originally it was thought that GSMâ€™s cryptography would

prohibit export of the phones to some countries. Now some officials are discussing whether A5 might

harm export sales, implying that it is so weak as to be an embarrassment. Rumor has it that the

various NATO intelligence agencies had a catfight in the mid-1980s over whether GSM encryption

should be strong or weak. The Germans wanted strong cryptography, as they were sitting near the

Soviet Union. The other countries overruled them, and A5 is a French design.

We know most of the details. A British telephone company gave all the documentation to Bradford

University without remembering to get them to sign a nondisclosure agreement. It leaked here and

there, and was eventually posted to the Internet. A paper describing A5 is [1622]; there is also code at

the back of this book.

A5 consists of three LFSRs; the register lengths are 19, 22, and 23; all the feedback polynomials are

sparse. The output is the XOR of the three LFSRs. A5 uses variable clock control. Each register is

clocked based on its own middle bit, XORed with the inverse threshold function of the middle bits of

all three registers. Usually, two of the LFSRs clock in each round.

There is a trivial attack requiring 240 encryptions: Guess the contents of the first two LFSRs, then try

Page 323 of 666

Applied Cryptography: Second Edition - Bruce Schneier

to determine the third LFSR from the keystream. (Whether this attack is actually feasible is under

debate, but a hardware key search machine currently under design should resolve the matter soon

[45].)

Nonetheless, it is becoming clear that the basic ideas behind A5 are good. It is very efficient. It passes

all known statistical tests; its only known weakness is that its registers are short enough to make

exhaustive search feasible. Variants of A5 with longer shift registers and denser feedback polynomials

should be secure.

16.6 Hughes XPD/KPD

This algorithm is brought to you by Hughes Aircraft Corp. They put it in army tactical radios and

direction-finding equipment for sale to foreign militaries. It was designed in 1986 and called XPD, for

Exportable Protection Device. Later it was renamed KPDâ€”Kinetic Protection Deviceâ€”and

declassified [1037,1036].

The algorithm uses a 61-bit LFSR. There are 210 different primitive feedback polynomials, which

were approved by the NSA. The key selects one of these polynomials (they are all stored in ROM

somewhere), as well as the initial state of the LFSR.

It has eight different nonlinear filters, each of which has six taps from the LFSR and which produces

1 bit. The bits combine to generate a byte, which is used to encrypt or decrypt the datastream.

This algorithm looks pretty impressive, but I doubt it is. The NSA allows export, so there must be

some attack on the order of 240 or less. What is it?

16.7 Nanoteq

Nanoteq is a South African electronics company. This is their algorithm that has been fielded by the

South African police to encrypt their fax transmissions, and presumably for other uses as well.

The algorithm is described, more or less, in [902,903]. It uses a 127-bit LFSR with a fixed feedback

polynomial; the key is the initial state of the feedback register. The 127 bits of the register are reduced

to a single keystream bit using 25 primitive cells. Each cell has five inputs and one output:

f(x1,x2,x3,x4,x5) = x1 + x2 + (x1 + x3) (x2 + x4 + x5) + (x1 + x4) (x2 + x3) + x5

Each input of the function is XORed with some bit of the key. There is also a secret permutation that

depends on the particular implementation, and is not detailed in the papers. This algorithm is only

available in hardware.

Is this algorithm secure? I doubt it. During the transition to majority rule, embarrassing faxes from

one police station to another would sometimes turn up in the liberal newspapers. These could easily

have been the results of U.S., U.K., or Soviet intelligence efforts. Ross Anderson took some initial steps

towards cryptanalyzing this algorithm in [46]; I expect more results to come soon.

16.8 Rambutan

Rambutan is a British algorithm, designed by the Communications Electronics Security Group (one

of the aliases used by GCHQ). It is only sold as a hardware module and is approved for the protection

Page 324 of 666

Applied Cryptography: Second Edition - Bruce Schneier

of classified material up to â€śConfidential.â€ť The algorithm itself is secret, and the chip is not generally

commercially available.

Rambutan has a 112-bit key (plus parity bits) and can operate in three modes: ECB, CBC, and 8-bit

CFB. This strongly indicates that it is a block algorithm, but rumors point elsewhere. Supposedly, it is

a LFSR stream cipher. It has five shift registers, each one of a different length around 80 bits. The

feedback polynomials are fairly sparse, with only about 10 taps each. Each shift register provides four

inputs to a very large and complex nonlinear function which eventually spits out a single bit.

Why call it Rambutan? Perhaps, like the fruit, itâ€™s spiny and forbidding on the outside but soft and

yielding inside. On the other hand, maybe thatâ€™s not the reason.

16.9 Additive Generators

Additive generators (sometimes called lagged Fibonacci generators) are extremely efficient because

they produce random words instead of random bits [863]. They are not secure on their own, but can

be used as building blocks for secure generators.

The initial state of the generator is an array of n-bit words: 8-bit words, 16-bit words, 32-bit words,

whatever: X1, X2, X3,..., Xm. This initial state is the key. The ith word of the generator is

Xi = (Xi-a + Xi-b + Xi-c +...+ Xi-m) mod 2n

If the coefficients a, b, c,..., m are chosen right, the period of this generator is at least 2n - 1. One of the

requirements on the coefficients is that the least significant bit forms a maximal-length LFSR.

For example, (55,24,0) is a primitive polynomial mod 2 from Table 16.2. This means that the following

additive generator is maximal length.

Xi = (Xi-55 + Xi-24) mod 2n

This works because the primitive polynomial has three coefficients. If it has more than three, you need

some additional requirements to make it maximal length. See [249] for details.

Fish

Fish is an additive generator based on techniques used in the shrinking generator [190]. It produces a

stream of 32-bit words which can be XORed with a plaintext stream to produce ciphertext, or XORed

with a ciphertext stream to produce plaintext. The algorithm is named as it is because it is a Fibonacci

shrinking generator.

First, use these two additive generators. The key is the initial values of these generators.

Ai = (Ai-55 + Ai-24) mod 232

Bi = (Bi-52 + Bi-19) mod 232

These sequences are shrunk, as a pair, depending on the least significant bit of Bi: if it is 1, use the

pair; if it is 0, ignore the pair. Cj is the sequence of used words from Ai, and Dj is the sequence of used

words from Bi. These words are used in pairsâ€”C2j, C2j+1, D2j, and D2j+1â€”to generate two 32-bit

Page 325 of 666

Applied Cryptography: Second Edition - Bruce Schneier

output words: K2j and K2j+1.

E2j = C2j (D2j ^ D2j+1)

F2j = D2j+1 ^ (E2j ^ C2j+1)

K2j = E2j F2j

K2i+1 = C2i+1 F2j

This algorithm is fast. On a 33 megahertz 486, a C implementation of Fish encrypts data at 15

megabits per second. Unfortunately, it is also insecure; an attack has a work factor of about 240 [45].

Pike

Pike is a leaner, meaner version of Fish, brought to you by Ross Anderson, the man who broke Fish

[45]. It uses three additive generators. For example:

Ai = (Ai-55 + Ai-24) mod 232

Bi = (Bi-57 + Bi-7) mod 232

Ci = (Ci-58 + Ci-19) mod 232

To generate the keystream word, look at the addition carry bits. If all three agree (all are 0 or all are

1), then clock all three generators. If they do not, just clock the two generators that agree. Save the

carry bits for next time. The final output is the XOR of the three generators.

Pike is faster than Fish, since on the average 2.75 steps will be required per output rather than 3. It is

far too new to trust, but looks good so far.

Mush

Mush is a mutual shrinking generator. Itâ€™s easy to explain [1590]. Take two additive generators: A

and B. If the carry bit of A is set, clock B. If the carry bit of B is set, clock A. Clock A, and set the

carry bit if there is a carry. Clock B, and set the carry bit if there is a carry. The final output is the

XOR of the output of A and B.

The easiest generators to use are the ones from Fish:

Ai = (Ai-55 + Ai-24) mod 232

Bi = (Bi-52 + Bi-19) mod 232

On the average, three generator iterations are required to produce one output word. And if the

coefficients of the additive generators are chosen correctly and are relatively prime, the output

sequence will be maximal length. I know of no successful attacks, but remember that this algorithm is

very new.

16.10 Gifford

David Gifford invented a stream cipher and used it to encrypt news wire reports in the Boston area

from 1984 until 1988 [608,607,609]. The algorithm has a single 8-byte register: b0, b1,..., b7. The key is

Page 326 of 666

Applied Cryptography: Second Edition - Bruce Schneier

the initial state of the register. The algorithm works in OFB; the plaintext does not affect the

algorithm at all. (See Figure 16.17).

To generate a key byte ki, concatenate b0 and b2 and concatenate b4 and b7. Multiply the two together

to get a 32-bit number. The third byte from the left is ki.

To update the register, take b1 and sticky right shift it 1 bit. This means the left-most bit is both

shifted and also remains in place. Take b7 and shift it 1 bit to the left; there should be a 0 in the right-

most bit position. Take the XOR of the modified b1, the modified b7, and b0. Shift the original register

1 byte to the right and put this byte in the left-most position.

This algorithm remained secure throughout its life, but was broken in 1994 [287]. It turns out that the

feedback polynomial isnâ€™t primitive and can be attacked that wayâ€”oops.

Figure 16.17 Gifford.

16.11 Algorithm M

The name is from Knuth [863]. Itâ€™s a method for combining multiple pseudo-random streams that

increases their security. One generatorâ€™s output is used to select a delayed output from the other

generator [996,1003]. In C:

#define ARR_SIZE (8192) /* for example â€” the larger the better

*/

static unsigned char delay[ ARR_SIZE ] ;

unsigned char prngA( void ) ;

long prngB( void ) ;

void init_algM( void )

{

long i ;

for ( i = 0 ; i < ARR_SIZE ; i++ )

delay = prngA() ;

} /* init_algM */

unsigned char algM( void )

{

long j,v ;

j = prngB() % ARR_SIZE ; /* get the delay[] index */

v = delay[j] ; /* get the value to return */

delay[j] = prngA() ; /* replace it */

return ( v ) ;

Page 327 of 666

Applied Cryptography: Second Edition - Bruce Schneier

} /* algM */

This has strength in that if prngA were truly random, one could not learn anything about prngB (and

could therefore not cryptanalyze it). If prngA were of the form that it could be cryptanalyzed only if

its output were available in order (i.e., only if prngB were cryptanalyzed first) and otherwise it was

effectively truly random, then the combination would be secure.

16.12 PKZIP

Roger Schlafly designed the encryption algorithm built into the PKZIP data compression program.

Itâ€™s a stream cipher that encrypts data one byte at a time. At least, this is the algorithm in version

2.04g. I canâ€™t speak for later versions, but unless there is some announcement you can probably

assume that they are identical.

The algorithm uses three 32-bit variables, initialized as follows:

K0 = 305419896

K1 = 591751049

K2 = 878082192

It has an 8-bit key, K3, derived from K2. Here is the algorithm (all symbols are standard C notation):

Ci = Pi ^ K3

K0 = crc32 (K0, Pi)

K1 = K1 + (K0 & 0Ă—000000ff)

K1 = K1 * 134775813 + 1

K2 = crc32 (K2, K1 >> 24)

K3 = ((K2 | 2) * ((K2 | 2) ^ 1)) >> 8

The function crc32 takes the previous value and a byte, XORs them, and calculates the next value by

the CRC polynomial denoted by 0Ă—edb88320. In practice, a 256-entry table can be precomputed and

the crc32 calculation becomes:

crc32 (a, b) = (a >> 8) ^ table [(a & 0Ă—ff) b]

The table is precomputed by the original definition of crc32:

table [i] = crc32 (i, 0)

To encrypt a plaintext stream, first loop the key bytes through the encryption algorithm to update the

keys. Ignore the ciphertext output in this step. Then encrypt the plaintext, one byte at a time. Twelve

random bytes are prepended to the plaintext, but thatâ€™s not really important. Decryption is similar to

encryption, except that Ci is used in the second step of the algorithm instead of Pi.

Security of PKZIP

Unfortunately, itâ€™s not that great. An attack requires 40 to 200 bytes of known plaintext and has a

time complexity of about 227 [166]. You can do it in a few hours on your personal computer. If the

compressed file has any standard headers, getting the known plaintext is no problem. Donâ€™t use the

Page 328 of 666

Applied Cryptography: Second Edition - Bruce Schneier

built-in encryption in PKZIP.

Chapter 17

Other Stream Ciphers and Real Random-Sequence Generators

17.1 RC4

RC4 is a variable-key-size stream cipher developed in 1987 by Ron Rivest for RSA Data Security, Inc.

For seven years it was proprietary, and details of the algorithm were only available after signing a

nondisclosure agreement.

In September, 1994 someone posted source code to the Cypherpunks mailing listâ€”anonymously. It

quickly spread to the Usenet newsgroup sci.crypt, and via the Internet to ftp sites around the world.

Readers with legal copies of RC4 confirmed compatibility. RSA Data Security, Inc. tried to put the

genie back into the bottle, claiming that it was still a trade secret even though it was public; it was too

late. It has since been discussed and dissected on Usenet, distributed at conferences, and taught in

cryptography courses.

RC4 is simple to describe. The algorithm works in OFB: The keystream is independent of the

plaintext. It has a 8 * 8 S-box: S0, S1,..., S255. The entries are a permutation of the numbers 0 through

255, and the permutation is a function of the variable-length key. It has two counters, i and j,

initialized to zero.

To generate a random byte, do the following:

i = (i + 1) mod 256

j = (j + Si) mod 256

swap Si and Sj

t = (Si + Sj) mod 256

K = St

The byte K is XORed with the plaintext to produce ciphertext or XORed with the ciphertext to

produce plaintext. Encryption is fastâ€”about 10 times faster than DES.

Initializing the S-box is also easy. First, fill it linearly: S0 = 0, S1 = 1,..., S255 = 255. Then fill another

256-byte array with the key, repeating the key as necessary to fill the entire array: K0, K1,..., K255. Set

the index j to zero. Then:

for i = 0 to 255:

j = (j + Si + Ki) mod 256

swap Si and Sj

And thatâ€™s it. RSADSI claims that the algorithm is immune to differential and linear cryptanalysis,

doesnâ€™t seem to have any small cycles, and is highly nonlinear. (There are no public cryptanalytic

results. RC4 can be in about 21700 (256! Ă— 2562) possible states: an enormous number.) The S-box

slowly evolves with use: i ensures that every element changes and j ensures that the elements change

randomly. The algorithm is simple enough that most programmers can quickly code it from memory.

Page 329 of 666

Applied Cryptography: Second Edition - Bruce Schneier

It should be possible to generalize this idea to larger S-boxes and word sizes. The previous version is

8-bit RC4. Thereâ€™s no reason why you canâ€™t define 16-bit RC4 with a 16 * 16 S-box (100K of memory)

and a 16-bit word. Youâ€™d have to iterate the initial setup a lot more timesâ€”65,536 to keep with the

stated designâ€”but the resulting algorithm should be faster.

RC4 has special export status if its key length is 40 bits or under (see Section 13.8). This special export

status has nothing to do with the secrecy of the algorithm, although RSA Data Security, Inc. has

hinted for years that it does. The name is trademarked, so anyone who writes his own code has to call

it something else. Various internal documents by RSA Data Security, Inc. have not yet been made

public [1320,1337].

So, whatâ€™s the deal with RC4? Itâ€™s no longer a trade secret, so presumably anyone can use it.

However, RSA Data Security, Inc. will almost certainly sue anyone who uses unlicensed RC4 in a

commercial product. They probably wonâ€™t win, but they will certainly make it cheaper for a company

to license than fight.

RC4 is in dozens of commercial cryptography products, including Lotus Notes, Apple Computerâ€™s

AOCE, and Oracle Secure SQL. It is part of the Cellular Digital Packet Data specification [37].

17.2 SEAL

SEAL is a software-efficient stream cipher designed at IBM by Phil Rogaway and Don Coppersmith

[1340]. The algorithm was optimized for 32-bit processors: To run well it needs eight 32-bit registers

and a cache of a few kilobytes. Using a relatively slow operation, SEAL preprocesses the key

operation into a set of tables. These tables are then used to speed up encryption and decryption.

Pseudo-random Function Family

One novel feature of SEAL is that is isnâ€™t really a traditional stream cipher: it is a pseudo-random

function family. Given a 160-bit key k, and a 32-bit n, SEAL stretches n into an L-bit string k(n). L

can take any value less than 64 kilobytes. SEAL is supposed to enjoy the property that if k is selected

at random, then k(n) should be computationally indistinguishable from a random L-bit function of n.

The practical effect of SEAL being a pseudo-random function family is that it is useful in applications

where traditional stream ciphers are not. With most stream ciphers you generate a sequence of bits in

one direction: Knowing the key and a position i, the only way to determine the ith bit generated is to

generate all the bits up until the ith one. But a pseudo-random function family is different: You get

easy access at any desired position in the key stream. This is very useful.

Imagine you need to secure a hard drive. You want to encrypt each and every 512-byte sector. With a

pseudo-random function family like SEAL, you can encrypt the contents of sector n by XORing it

with k(n). It is as though the entire disk is XORed with a long pseudo-random string, where any piece

of that long string can be computed without any trouble.

A pseudo-random function family also simplifies the synchronization problem encountered with

standard stream ciphers. Suppose you send encrypted messages over a channel that sometimes drops

messages. With a pseudo-random function family, you can encrypt under k the nth message you

transmit, xn, as n together with the XOR of xn and k(n). The receiver doesnâ€™t need to store any state to

recover xn, nor does he need to worry about lost messages affecting the message decryption process.

Description of SEAL

Page 330 of 666

Applied Cryptography: Second Edition - Bruce Schneier

The inner loop of SEAL is shown by Figure 17.1. Three key-derived tables, called R, S, and T, drive

the algorithm. The preprocessing step maps the key k, to these tables using a procedure based on SHA

(see Section 18.7). The 2-kilobyte table, T, is a 9 * 32 bit S-box.

SEAL also uses four 32-bit registers, A, B, C, and D, whose initial values are determined by n and the

k-derived tables R and T. These registers get modified over several iterations, each one involving 8

rounds. In each round 9 bits of a first register (either A, B, C, or D) are used to index into table T. The

value retrieved from T is then added to or XORed with the contents of a second register: again one of

A, B, C, or D. The first register is then circularly shifted by nine positions. In some rounds the second

register is further modified by adding or XORing it with the (now shifted) first register. After 8

rounds of this, A, B, C, and D are added to the keystream, each masked first by adding or XORing it

with a certain word from S. The iteration is completed by adding to A and C additional values

dependent on n, n1, n2, n3, n4; exactly which one depends on the parity of the iteration number.

The important ideas in this design seem to be:

1. Use a large, secret, key-derived S-box (T).

2. Alternate arithmetic operations which donâ€™t commute (addition and XOR).

3. Use an internal state maintained by the cipher which is not directly manifest in the

data stream (the ni values which modify A and C at the end of each iteration).

4. Vary the round function according to the round number, and vary the iteration

function according to the iteration number.

Figure 17.1 The inner loop of SEAL.

SEAL requires about five elementary machine operations to encrypt each byte of text. It runs at 58

megabits per second on a 50 megahertz 486 machine. This is probably the fastest software algorithm

in the book.

On the other hand, SEAL must preprocess its key into internal tables. These tables total roughly 3

kilobytes in size, and their calculation takes about 200 SHA computations. Thus, SEAL is not

appropriate to use in situations where you donâ€™t have the time to perform the key setup or you donâ€™t

have the memory to store the tables.

Security of SEAL

SEAL is a new algorithm and has yet to be subjected to any published cryptanalysis. This suggests

Page 331 of 666

Applied Cryptography: Second Edition - Bruce Schneier

caution. However, SEAL seems to be well thought through. Its peculiarities do, in the end, make a

good deal of sense. And Don Coppersmith is generally regarded as the worldâ€™s cleverest cryptanalyst.

Patents and Licenses

SEAL is being patented [380]. Anyone wishing to license SEAL should contact the Director of

Licenses, IBM Corporation, 500 Columbus Ave., Thurnwood, NY, 10594.

17.3 WAKE

WAKE is the Word Auto Key Encryption algorithm, invented by David Wheeler [1589]. It produces a

stream of 32-bit words which can be XORed with a plaintext stream to produce ciphertext, or XORed

with a ciphertext stream to produce plaintext. And itâ€™s fast.

WAKE works in CFB; the previous ciphertext word is used to generate the next key word. It also uses

an S-box of 256 32-bit values. This S-box has a special property: The high-order byte of all the entries

is a permutation of all possible bytes, and the low-order 3 bytes are random.

First, generate the S-box entries, Si, from the key. Then initialize four registers with the key (or with

another key): a0, b0, c0, and d0. To generate a 32-bit keystream word, Ki:

Ki = di

The ciphertext word Ci, is the plaintext word, Pi XORed with Ki.

Then, update the four registers:

ai+1 = M(ai,di)

bi+1 = M(bi,ai+1)

ci+1 = M(ci,bi+1)

di+1 = M(di,ci+1)

Function M is

M(x,y) = (x + y) >> 8 S(x+y) ^ 255

This is shown in Figure 17.2. The operation >> is a right shift, not a rotation. The low-order 8 bits of x

+ y are the input into the S-box. Wheeler gives a procedure for generating the S-box, but it isnâ€™t really

complete. Any algorithm to generate random bytes and a random permutation will work.

Page 332 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Figure 17.2 Wake.

WAKEâ€™s biggest asset is that it is fast. However, itâ€™s insecure against a chosen-plaintext or chosen-

ciphertext attack. It is being used in the current version of Dr. Solomonâ€™s Anti-Virus program.

17.4 Feedback with Carry Shift Registers

A feedback with carry shift register, or FCSR, is similar to a LFSR. Both have a shift register and a

feedback function; the difference is that a FCSR also has a carry register (see Figure 17.3). Instead of

XORing all the bits in the tap sequence, add the bits together and add in the contents of the carry

register. The result mod 2 becomes the new bit. The result divided by 2 becomes the new content of

the carry register.

Figure 17.4 is an example of a 3-bit FCSR tapped at the first and second bit. Its initial value is 001,

and the initial contents of the carry register is 0. The output bit is the right-most bit of the shift

register.

Shift Register Carry Register

001 0

100 0

010 0

101 0

110 0

111 0

011 1

101 1

010 1

001 1

000 1

100 0

Figure 17.3 Feedback with carry shift register.

Note that the final internal state (including the contents of the carry register) is the same as the second

internal state. The sequence cycles at this point, and has a period of 10.

There are a few things to note here. First, the carry register is not a single bit; it is a number. The size

of the carry register must be at least log2t, where t is the number of taps. There are only two taps in

the previous example, so the carry register only has to be 1 bit wide. If there were four taps, the carry

register would have to be 2 bits wide, and could be either 0, 1, 2, or 3.

Page 333 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Second, there is an initial transient before the FCSR settles down into its repeating period. In the

previous example, only one state never repeated. For larger and more complicated FCSRs, there may

be more.

Third, the maximum period of a FCSR is not 2n - 1, where n is the length of the shift register. The

maximum period is q - 1, where q is the connection integer. This number gives the taps and is defined

by:

q = 2q1 + 22q2 + 24q4 +...+ 2nqn - 1

(Yes, the qis are numbered from left to right.) And even worse, q has to be a prime for which 2 is a

primitive root. The rest of this discussion assumes q is of this form.

In this example, q = 2*0 + 4*1 + 8*1 - 1 = 11. And 11 is a prime with 2 as a primitive root. So the

maximum period is 10.

Not all initial states give you the maximum period. For example, look at the FCSR when the initial

value is 101 and the carry register is set to 4.

Figure 17.4 3-bit FCSR.

Shift Register Carry Register

101 4

110 2

111 1

111 1

At this point the register spits out a neverending stream of 1s.

Any initial state will result in one of four things. First, it is part of the maximum period. Second, it

will fall into the maximum period after an initial transient. Third, it will fall into a sequence of all

zeros after an initial transient. Fourth, it will fall into a sequence of all ones after an initial transient.

There is a mathematical formula for determining what will happen to a given initial state, but itâ€™s

much easier to just test it. Run the FCSR for a while. (If m is the initial memory, and t is the number

of taps, then log2(t) + log2(m) + 1 steps are enough.) If it degenerates into a neverending stream of 0s

or 1s within n bits, where n is the length of the FCSR, donâ€™t use it. If it doesnâ€™t, then use it. Since the

initial state of a FCSR corresponds to the key of the stream cipher, this means that a FCSR-based

generator will have a set of weak keys.

Table 17.1 lists all connection integers less than 10,000 for which 2 is a primitive root. These all have

maximum period q - 1. To turn one of these numbers into a tap sequence, calculate the binary

Page 334 of 666

Applied Cryptography: Second Edition - Bruce Schneier

expansion of q + 1. For example, 9949 would translate to taps on bits 1, 2, 3, 4, 6, 7, 9, 10, and 13,

because

9950 = 213 + 210 + 29 + 27 + 26 + 24 + 23 + 22 + 21

Table 17.2 lists all the 4-tap tap sequences that result in a maximal-length FCSR for shift register

lengths of 32 bits, 64 bits, and 128 bits. Each of the four values, a, b, c, and d, combine to generate q, a

prime for which 2 is primitive.

q = 2a + 2b + 2c + 2d - 1

Any of these tap sequences can be used to create a FCSR with period q - 1.

The idea of using FCSRs for cryptography is still very new; it is being pioneered by Andy Klapper

and Mark Goresky [844,845,654,843,846]. Just as the analysis of LFSRs is based on the addition of

primitive polynomials mod 2, analysis of FCSRs is based on addition over something called the 2-adic

numbers. The theory is well beyond the scope of this book, but there seems to be a 2-adic analog for

everything. Just as you can define linear complexity, you can define 2-adic complexity. There is even a

2-adic analog to the Berlekamp-Massey algorithm. What this means is that the list of potential stream

ciphers has just doubledâ€”at least. Anything you can do with a LFSR you can do with a FCSR.

There are further enhancements to this sort of idea, ones that involve multiple carry registers. The

analysis of these sequence generators is based on addition over the ramified extensions of the 2-adic

numbers [845,846].

17.5 Stream Ciphers Using FCSRs

There arenâ€™t any FCSR stream ciphers in the literature; the theory is still too new. In the interests of

getting the ball rolling, I propose some here. I am taking two different tacks: I am proposing FCSR

stream ciphers that are identical to previously proposed LFSR generators, and I am proposing stream

ciphers that use both FCSRs and LFSRs. The security of the former can probably be analyzed using

2-adic numbers; the latter cannot be analyzed using algebraic techniquesâ€”they can probably only be

analyzed indirectly. In any case, it is important to choose LFSRs and FCSRs whose periods are

relatively prime.

All this will come later. Right now I know of no implementation or analysis of any of these ideas. Wait

some years and scan the literature before you trust any of them.

Cascade Generators

There are two ways to use FCSRs in a cascade generator:

â€” FCSR Cascade. The Gollmann cascade with FCSRs instead of LFSRs.

â€” LFSR/FCSR Cascade. The Gollmann cascade with the generators alternating between

LFSRs and FCSRs.

FCSR Combining Generators

These generators use a variable number of LFSRs and/or FCSRs, and a variety of functions to

combine them. The XOR operation destroys the algebraic properties of FCSRs, so it makes sense to

use it to combine them. The generator, shown in Figure 17.5, uses a variable number of FCSRs. Its

output is the XOR of the outputs of the individual FCSRs.

Page 335 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Other generators along similar lines are:

â€” FCSR Parity Generator. All registers are FCSRs and the combining function is XOR.

â€” LFSR/FCSR Parity Generator. Registers are a mix of LFSRs and FCSRs and the

combining function is XOR.

â€” FCSR Threshold Generator. All registers are FCSRs and the combining function is the

majority function.

â€” LFSR/FCSR Threshold Generator. Registers are a mix of LFSRs and FCSRs and the

combining function is the majority function.

â€” FCSR Summation Generator. All registers are FCSRs and the combining function is

addition with carry.

â€” LFSR/FCSR Summation Generator. Registers are a mix of LFSRs and FCSRs and the

combining function is addition with carry.

Table 17.1

Connection Integers for Maximal-period FCSRs

2 653 1549 2477 3539

5 659 1571 2531 3547

11 661 1619 2539 3557

13 677 1621 2549 3571

19 701 1637 2557 3581

29 709 1667 2579 3613

37 757 1669 2621 3637

53 773 1693 2659 3643

59 787 1733 2677 3659

61 797 1741 2683 3677

67 821 1747 2693 3691

83 827 1787 2699 3701

101 829 1861 2707 3709

107 853 1867 2741 3733

131 859 1877 2789 3779

139 877 1901 2797 3797

149 883 1907 2803 3803

163 907 1931 2819 3851

173 941 1949 2837 3853

179 947 1973 2843 3877

181 1019 1979 2851 3907

197 1061 1987 2861 3917

211 1091 1997 2909 3923

227 1109 2027 2939 3931

269 1117 2029 2957 3947

293 1123 2053 2963 3989

317 1171 2069 3011 4003

347 1187 2083 3019 4013

ńňđ. 14 |