ńňđ. 12 |

Page 261 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Description of LOKI91

The mechanics of LOKI91 are similar to DES (see Figure 13.8). The data block is then divided into a

left half and a right half and goes through 16 rounds, much like DES. In each round, the right half is

first XORed with a piece of the key, then sent through an expansion permutation (see Table 13.1).

The 48-bit output is divided into four 12-bit blocks, and each block is sent through an S-box

substitution. The S-box substitution is as follows: Take each 12-bit input; use the 2 left-most bits and

the 2 right-most bits to form the number r, and the 8 innermost bits and form the number c. The

output of the S-box, O, is as follows:

0xff) & 0xff)31 mod Pr

O(r,c) = (c + ((r*17)

Figure 13.8 LOKI91.

Pr is given in Table 13.2.

Then, the four 8-bit outputs are recombined to form a single 32-bit number and sent through the

permutation described in Table 13.3. Finally, the right half is XORed with the left half to become the

new left half, and the left half becomes the new right half. After 16 rounds, the block is again XORed

with the key to produce the ciphertext.

The subkeys are generated from the key in a straightforward manner. The 64-bit key is split into a

left half and a right half. In each round, the subkey is the left half. This left half is then rotated 12 or

13 bits to the left, and then every two rounds the left and right halves are exchanged. As with DES,

the same algorithm can be used for both encryption and decryption, with some modification in how

the subkeys are used.

Table 13.1

Expansion Permutation

4, 3, 2, 1, 32, 31, 20, 29, 28, 27, 26, 25,

28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17,

20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9,

12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1

Page 262 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Table 13.2

Pr

r: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16

Pr: 375, 379, 391, 395, 397, 415, 419, 425, 433, 445, 451, 463, 471, 477, 487, 499

Cryptanalysis of LOKI91

Knudsen attempted to cryptanalyze LOKI91 [854, 858], but found it secure against differential

cryptanalysis. However, he found a related-key chosen-plaintext attack that reduces the complexity of

a brute-force search by almost a factor of four. This attack exploits a weakness in the key schedule

and may also apply if the algorithm is used as a one-way hash function (see Section 18.11).

Another attack on related keys can break LOKI91 with 232 chosen-key chosen plaintexts, or 248

chosen-key known plaintexts [158]. The attack is independent of the number of rounds of the

algorithm. (In the same paper, Biham breaks LOKI89 with 217 chosen-key chosen plaintexts or 233

known-key known plaintexts using related-key cryptanalysis.) Itâ€™s easy to make LOKI91 resistant to

this attack; avoid the simple key schedule.

Patents and Licenses

LOKI is not patented. Anyone can implement the algorithm and use it. The source code

implementation in this book is copyrighted by the University of New South Wales. Anyone interested

in using this implementation (or their other implementation, which is several orders of magnitude

faster) in a commercial product should contact Director CITRAD, Department of Computer Science,

University College, UNSW, Australian Defense Force Academy, Canberra ACT 2600, Australia;

FAX: +61 6 268 8581.

13.7 Khufu and Khafre

In 1990 Ralph Merkle proposed two algorithms. The basic design principles behind them are [1071]:

1. DESâ€™s 56-bit key size is too small. Considering the negligible cost of increasing the key

size (computer memory is cheap and plentiful), it should be increased.

2. DESâ€™s extensive use of permutations, while suitable for hardware implementations, is

very difficult to implement in software. The faster software implementations of DES implement

the permutations by table lookup. Table lookup can provide the same â€śdiffusionâ€ť

characteristics as permutation and can be much more flexible.

Table 13.3

P-Box Permutation

32, 24, 16, 8, 31, 23, 15, 7, 30, 22, 14, 6, 29, 21, 13, 5,

28, 20, 12, 4, 27, 19, 11, 3, 26, 18, 10, 2, 25, 17, 9, 1

3. The S-boxes in DES are small, with only 64 4-bit entries per box. Now that memory is

larger, S-boxes should grow. Moreover, all eight S-boxes are used simultaneously. While this is

suitable for hardware, it seems like an unreasonable restriction in software. A larger S-box size

Page 263 of 666

Applied Cryptography: Second Edition - Bruce Schneier

and sequential (rather than parallel) S-box usage should be employed.

4. The initial and final permutations in DES are widely viewed as cryptographically

pointless and should be discarded.

5. All the faster implementations of DES precompute the keys for each round. Given this

fact, there is no reason not to make this computation more complicated.

6. Unlike DES, the S-box design criteria should be public.

To this list, Merkle would probably now add â€śresistant to differential cryptanalysis and to linear

attacks, â€ť but those attacks were still unknown at the time.

Khufu

Khufu is a 64-bit block cipher. The 64-bit plaintext is first divided into two 32-bit halves, L and R.

First, both halves are XORed with some key material. Then, they are subjected to a series of rounds

similar to DES. In each round, the least significant byte of L is used as the input to an S-box. Each S-

box has 8 input bits and 32 output bits. The selected 32-bit entry in the S-box is then XORed with R. L

is then rotated some multiple of 8 bits, L and R are swapped, and the round ends. The S-box itself is

not static, but changes every 8 rounds. Finally, after the last round, L and R are XORed with more

key material, and then combined to form the ciphertext block.

Although parts of the key are XORed with the encryption block at the beginning and end of the

algorithm, the primary purpose of the key is to generate the S-boxes. These S-boxes are secret and, in

essence, part of the key. Khufu calls for a total key size of 512 bits (64 bytes) and gives an algorithm

for generating S-boxes from the key. The number of rounds for the algorithm is left open. Merkle

mentioned that 8-round Khufu is susceptible to a chosen-plaintext attack and recommended 16, 24, or

32 rounds [1071]. (He restricted the choice of rounds to a multiple of eight.)

Because Khufu has key-dependent and secret S-boxes, it is resistant to differential cryptanalysis.

There is a differential attack against 16-round Khufu that recovers the key after 231 chosen plaintexts

[611], but it cannot be extended to more rounds. If brute-force is the best way to attack Khufu, it is

impressively secure. A 512-bit key gives a complexity of 2512â€”inconceivable under any circumstances.

Khafre

Khafre is the second of two cryptosystems proposed by Merkle [1071]. (Khufu and Khafre are names

of Egyptian pharaohs.) It is similar in design to Khufu, except that it was designed for applications

without precomputation time. The S-boxes are not key-dependent. Instead, Khafre uses fixed S-boxes.

And the key is XORed with the encryption block not only before the first round and after the last

round, but also after every 8 rounds of encryption.

Merkle speculated that key sizes of 64- or 128-bits would be used for Khafre and that more rounds of

encryption would be required for Khafre than for Khufu. This, combined with the fact that each

round of Khafre is more complex than for Khufu, makes Khafre slower. In compensation, Khafre

does not require any precomputation and will encrypt small amounts of data more quickly.

In 1990 Biham and Shamir turned their differential cryptanalysis techniques against Khafre [170].

They were able to break 16-round Khafre with a chosen-plaintext attack using about 1500 different

encryptions. It took about an hour, using their personal computer. Converting that to a known-

plaintext attack would require about 238 encryptions. Khafre with 24 rounds can be broken by a

chosen-plaintext attack using 253 encryptions, and a known-plaintext attack using 259 encryptions.

Patents

Page 264 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Both Khufu and Khafre are patented [1072]. Source code for the algorithms are in the patent. Anyone

interested in licensing either or both algorithms should contact Director of Licensing, Xerox

Corporation, P.O. Box 1600, Stamford, CT, 06904-1600.

13.8 RC2

RC2 is a variable-key-size encryption algorithm designed by Ron Rivest for RSA Data Security, Inc.

(RSADSI). Apparently, â€śRCâ€ť stands for â€śRonâ€™s Code, â€ť although it officially stands for â€śRivest

Cipher.â€ť (RC3 was broken at RSADSI during development; RC1 never got further than Rivestâ€™s

notebook.) It is proprietary, and its details have not been published. Donâ€™t think for a minute that this

helps security. RC2 has already appeared in commercial products. As far as I know, RC2 has not

been patented and is only protected as a trade secret.

RC2 is a variable-key-size 64-bit block cipher, designed to be a replacement for DES. According to the

company, software implementations of RC2 are three times faster than DES. The algorithm accepts a

variable-length key, from 0 bytes to the maximum string length the computer system supports;

encryption speed is independent of key size. This key is preprocessed to yield a key-dependent table of

128 bytes. So the number of effectively different keys is 21024. RC2 has no S-boxes [805]; the two

operations are â€śmixâ€ť and â€śmash, â€ť and one is chosen in each round. According to their literature

[1334]:

... RC2 is not an iterative block cipher. This suggests that RC2 offers more protection

against differential and linear cryptanalysis than other block ciphers which have relied

for their security on copying the design of DES.

RSADSIâ€™s refusal to make RC2 public casts doubt on their claims. They are willing to provide details

of the algorithm to most anyone willing to sign a nondisclosure agreement, and have claimed to allow

cryptanalysts to publish any negative results they find. I donâ€™t know of any cryptanalyst outside the

employ of the company who studied it, since it would amount to doing their analysis work for them.

Still, Ron Rivest is not the usual snake-oil peddler. Heâ€™s a respected and competent cryptographer. I

would put a fair degree of trust in the algorithm, even though I havenâ€™t personally inspected the code.

RC4, once the proprietary intellectual property of RSADSI, was posted to the Internet (see Section

17.1), and itâ€™s probably just a matter of time before RC2 is posted as well.

An agreement between the Software Publishers Association (SPA) and the U.S. government gave RC2

and RC4 (see Section 17.1) special export status (see Section 25.14). Products that implement one of

these two algorithms have a much simpler export approval process, provided that the keys are no

more than 40 bits long.

Is a 40-bit key enough? There are a total of one trillion possible keys. Assuming that brute force is the

most efficient method of cryptanalysis (a big assumption, considering that the algorithm has never

been published), and assuming that a brute-force cryptanalysis chip can test one million keys per

second, it will take him 12.7 days to find the correct key. One thousand machines working in parallel

can produce the key in twenty minutes.

RSA Data Security, Inc., maintains that while encryption and decryption are quick, exhaustive key

search is not. A significant amount of time is spent setting up the key schedule. While this time is

negligible when encrypting and decrypting messages, it is not when trying every possible key.

Page 265 of 666

Applied Cryptography: Second Edition - Bruce Schneier

The U.S. government would never allow export of any algorithm it couldnâ€™t, at least in theory, break.

They could create a magnetic tape or CD of a specific plaintext block encrypted with every possible

key. To break a given message, they could just run the tape and compare the ciphertext blocks in the

message with the ciphertext blocks on the tape. If there is a match, they could try the candidate key

and see if the message makes any sense. If they choose a common plaintext block (all zeros, the ASCII

characters for a space, etc.), this method should work. The storage requirement for a 64-bit plaintext

block encrypted with all 1012 possible keys is 8 terabytesâ€”certainly possible.

For information on licensing RC2, contact RSADSI (see Section 25.4).

13.9 IDEA

The first incarnation of the IDEA cipher, by Xuejia Lai and James Massey, surfaced in 1990 [929]. It

was called PES (Proposed Encryption Standard). The next year, after Biham and Shamirâ€™s

demonstrated differential cryptanalysis, the authors strengthened their cipher against the attack and

called the new algorithm IPES (Improved Proposed Encryption Standard) [931,924]. IPES changed

its name to IDEA (International Data Encryption Algorithm) in 1992 [925].

IDEA is based on some impressive theoretical foundations and, although cryptanalysis has made some

progress against reduced-round variants, the algorithm still seems strong. In my opinion, it is the best

and most secure block algorithm available to the public at this time.

The future of IDEA is not yet clear. There has been no rush to adopt it as a replacement to DES,

partly because it is patented and must be licensed for commercial applications, and partly because

people are still waiting to see how well the algorithm fares during the coming years of cryptanalysis.

Its current claim to fame is that it is part of PGP (see Section 24.12).

Overview of IDEA

IDEA is a block cipher; it operates on 64-bit plaintext blocks. The key is 128 bits long. The same

algorithm is used for both encryption and decryption.

As with all the other block ciphers weâ€™ve seen, IDEA uses both confusion and diffusion. The design

philosophy behind the algorithm is one of â€śmixing operations from different algebraic groups.â€ť Three

algebraic groups are being mixed, and they are all easily implemented in both hardware and

software:

â€” XOR

â€” Addition modulo 216

â€” Multiplication modulo 216 + 1. (This operation can be viewed as IDEAâ€™s S-box.)

All these operations (and these are the only operations in the algorithmâ€”there are no bit-level

permutations) operate on 16-bit sub-blocks. This algorithm is even efficient on 16-bit processors.

Description of IDEA

Figure 13.9 is an overview of IDEA. The 64-bit data block is divided into four 16-bit sub-blocks: X1,

X2, X3, and X4. These four sub-blocks become the input to the first round of the algorithm. There are

eight rounds total. In each round the four sub-blocks are XORed, added, and multiplied with one

another and with six 16-bit subkeys. Between rounds, the second and third sub-blocks are swapped.

Page 266 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Finally, the four sub-blocks are combined with four subkeys in an output transformation.

In each round, the sequence of events is as follows:

(1) Multiply X1 and the first subkey.

(2) Add X2 and the second subkey.

(3) Add X3 and the third subkey.

(4) Multiply X4 and the fourth subkey.

(5) XOR the results of steps (1) and (3).

(6) XOR the results of steps (2) and (4).

(7) Multiply the results of step (5) with the fifth subkey.

(8) Add the results of steps (6) and (7).

(9) Multiply the results of step (8) with the sixth subkey.

(10) Add the results of steps (7) and (9).

Figure 13.9 IDEA.

(11) XOR the results of steps (1) and (9).

(12) XOR the results of steps (3) and (9).

(13) XOR the results of steps (2) and (10).

(14) XOR the results of steps (4) and (10).

The output of the round is the four sub-blocks that are the results of steps (11), (12), (13), and (14).

Swap the two inner blocks (except for the last round) and thatâ€™s the input to the next round.

After the eighth round, there is a final output transformation:

(1) Multiply X1 and the first subkey.

(2) Add X2 and the second subkey.

(3) Add X3 and the third subkey.

(4) Multiply X4 and the fourth subkey.

Page 267 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Finally, the four sub-blocks are reattached to produce the ciphertext.

Creating the subkeys is also easy. The algorithm uses 52 of them (six for each of the eight rounds and

four more for the output transformation). First, the 128-bit key is divided into eight 16-bit subkeys.

These are the first eight subkeys for the algorithm (the six for the first round, and the first two for the

second round). Then, the key is rotated 25 bits to the left and again divided into eight subkeys. The

first four are used in round 2; the last four are used in round 3. The key is rotated another 25 bits to

the left for the next eight subkeys, and so on until the end of the algorithm.

Decryption is exactly the same, except that the subkeys are reversed and slightly different. The

decryption subkeys are either the additive or multiplicative inverses of the encryption subkeys. (For

the purposes of IDEA, the all-zero sub-block is considered to represent 216 = â€“ 1 for multiplication

modulo 216 + 1; thus the multiplicative inverse of 0 is 0.) Calculating these takes some doing, but you

only have to do it once for each decryption key. Table 13.4 shows the encryption subkeys and the

corresponding decryption subkeys.

Speed of IDEA

Current software implementations of IDEA are about twice as fast as DES. IDEA on a 33 megahertz

386 machine encrypts data at 880 kilobits per second, and 2400 kilobits per second on a 66 megahertz

486 machine. You might think IDEA should be faster, but multiplications arenâ€™t cheap. To multiply

two 32-bit numbers on a 486 requires 40 clock cycles (10 on a Pentium).

A VLSI implementation of PES encrypts data at 55 megabits per second at 25 megahertz [208, 398].

Another VLSI chip developed at ETH Zurich, consisting of 251, 000 transistors on a chip 107.8

square millimeters, encrypts data using the IDEA algorithm at a 177 megabit-per-second data rate

when clocked at 25 megahertz [926, 207, 397].

Table 13.4

IDEA Encryption and Decryption Subkeys

Round Encryption Subkeys Decryption Subkeys

Z1(1) Z2(1) Z3(1) Z4(1) Z5(1) Z6(1) Z1(9) - 1 â€“Z2(9) â€“Z3(9) Z4(9) - 1 Z5(8) Z6(8)

1st

Z1(2) Z2(2) Z3(2) Z4(2) Z5(2) Z6(2) Z1(8) - 1 â€“Z3(8) â€“Z2(8) Z4(8) - 1 Z5(7) Z6(7)

2nd

Z1(3) Z2(3) Z3(3) Z4(3) Z5(3) Z6(3) Z1(7) - 1 â€“Z3(7) â€“Z2(7) Z4(7) - 1 Z5(6) Z6(6)

3rd

Z1(4) Z2(4) Z3(4) Z4(4) Z5(4) Z6(4) Z1(6) - 1 â€“Z3(6) â€“Z2(6) Z4(6) - 1 Z5(5) Z6(5)

4th

Z1(5) Z2(5) Z3(5) Z4(5) Z5(5) Z6(5) Z1(5) - 1 â€“Z3(5) â€“Z2(5) Z4(5) - 1 Z55(4) Z6(4)

5th

Z1(6) Z2(6) Z3(6) Z4(6) Z5(6) Z6(6) Z1(4) - 1 â€“Z3(4) â€“Z2(4) Z4(4) - 1 Z5(3) Z6(3)

6th

Z1(7) Z2(7) Z3(7) Z4(7) Z5(7) Z6(7) Z1(3) - 1 â€“Z3(3) â€“Z2(3) Z4(3) - 1 Z5(2) Z6(2)

7th

Z1(8) Z2(8) Z3(8) Z4(8) Z5(8) Z6(8) Z1(2) - 1 â€“Z3(2) â€“Z2(2) Z4(2) - 1 Z5(1) Z6(1)

8th

Z1(9) Z2(9) Z3(9) Z4(9) Z1(1) - 1 â€“Z2(1) â€“Z3(1) Z4(1) - 1

output

transformation

Page 268 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Cryptanalysis of IDEA

IDEAâ€™s key length is 128 bitsâ€”over twice as long as DES. Assuming that a brute-force attack is the

most efficient, it would require 2128(1038) encryptions to recover the key. Design a chip that can test a

billion keys per second and throw a billion of them at the problem, and it will still take 10 13 yearsâ€”

thatâ€™s longer than the age of the universe. An array of 1024 such chips can find the key in a day, but

there arenâ€™t enough silicon atoms in the universe to build such a machine. Now weâ€™re getting

somewhereâ€”although Iâ€™d keep my eye on the dark matter debate.

Perhaps brute force isnâ€™t the best way to attack IDEA. The algorithm is still too new for any definitive

cryptanalytic results. The designers have done their best to make the algorithm immune to differential

cryptanalysis; they defined the concept of a Markov cipher and showed that resistance to differential

cryptanalysis can be modeled and quantified [931, 925]. (Figure 13.10 shows the original PES

algorithm to be contrasted with the IDEA algorithm of Figure 13.9 which was strengthened against

differential cryptanalysis. Itâ€™s amazing how a few subtle changes can make such a big difference.) In

[925], Lai argued (he gave evidence, not a proof) that IDEA is immune to differential cryptanalysis

after only 4 of its 8 rounds. According to Biham, his related-key cryptanalytic attack doesnâ€™t work

against IDEA, either [160].

Willi Meier examined the three algebraic operations of IDEA, and pointed out that while they are

incompatible, there are instances where they can be simplified in such a way as to facilitate

cryptanalysis some percentage of the time [1050]. His attack is more efficient than brute-force for 2-

round IDEA (242 operations), but less efficient for 3-round IDEA or higher. Normal IDEA, with 8

rounds, is safe.

Joan Daemen discovered a class of weak keys for IDEA [406, 409]. These are not weak keys in the

sense of the DES weak keys; that is, the encryption function is self-inverse. They are weak in the sense

that if they are used, an attacker can easily identify them in a chosen-plaintext attack. For example, a

weak key is (in hex):

0000, 0000, 0x 00, 0000, 0000, 000x,xxxx,x000

The number at the positions of â€śxâ€ť can be any number. If this key is used, the bit-wise XOR of certain

plaintext pairs guarantees the bit-wise XOR of the resultant ciphertext pairs.

In any case, the chance of accidentally generating one of these weak keys is very small: one in 296.

There is no danger if you choose keys at random. And it is easy to modify IDEA so that it doesnâ€™t have

any weak keys: XOR every subkey with the value 0x0dae [409].

I know of no other cryptanalytic results against IDEA, although many people have tried.

IDEA Modes of Operation and Variants

IDEA can work within any block cipher mode discussed in Chapter 9. Any double-IDEA

implementation would be susceptible to the same meet-in-the-middle attack as DES (see Section 15.1).

However, because IDEAâ€™s key length is more than double DESâ€™s, the attack is impractical. It would

require a storage space of 64*2128 bits, or 1039 bytes. Maybe thereâ€™s enough matter in the universe to

create a memory device that large, but I doubt it.

Page 269 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Figure 13.10 PES.

If youâ€™re worried about parallel universes as well, use a triple-IDEA implementation (see Section

15.2):

C = EK3(DK2(EK1(P)))

It is immune to the meet-in-the-middle attack.

Thereâ€™s also no reason why you canâ€™t implement IDEA with independent subkeys, especially if you

have key-management tools to handle the longer key. IDEA needs a total of 52 16-bit keys, for a total

key length of 832 bits. This variant is definitely more secure, but no one knows by how much.

A naĂŻve variation might double the block size. The algorithm would work just as well with 32-bit sub-

blocks instead of 16-bit sub-blocks, and a 256-bit key. Encryption would be quicker and security

would increase 232 times. Or would it? The theory behind the algorithm hinges on the fact that 216 + 1

is prime; 232 + 1 is not. Perhaps the algorithm could be modified to work, but it would have very

different security properties. Lai says it would be difficult to make it work [926].

While IDEA appears to be significantly more secure than DES, it isnâ€™t always easy to substitute one

for the other in an existing application. If your database and message templates are hardwired to

accept a 64-bit key, it may be impossible to implement IDEAâ€™s 128-bit key.

For those applications, generate a 128-bit key by concatenating the 64-bit key with itself. Remember

that IDEA is weakened considerably by this modification.

If you are more concerned with speed than security, you might consider a variant of IDEA with fewer

rounds. Currently the best attack against IDEA is faster than brute force only for 2.5 rounds or less

[1050]; 4 round IDEA would be twice as fast and, as far as I know, just as secure.

Caveat Emptor

Page 270 of 666

Applied Cryptography: Second Edition - Bruce Schneier

IDEA is a relatively new algorithm, and many questions remain. Is IDEA a group? (Lai thinks not

[926].) Are there any still-undiscovered ways of breaking this cipher? IDEA has a firm theoretical

basis, but time and time again secure-looking algorithms have fallen to new forms of cryptanalysis.

Several academic and military groups have cryptanalyzed IDEA. None of them has gone public about

any successes they might have had. One mightâ€”someday.

Patents and Licenses

IDEA is patented in Europe and the United States [1012, 1013]. The patent is held by Ascom-Tech

AG. No license fee is required for non-commercial use. Commercial users interested in licensing the

algorithm should contact Ascom Systec AG, Dept CMVV, Gewerbepark, CH-5506, MĂ¤genwil,

Switzerland; +41 64 56 59 83; Fax:+41 64 56 59 90; idea@ascom.ch.

13.10 MMB

A complaint against IDEA, that it uses a 64-bit encryption block, was addressed by Joan Daemen in

an algorithm called MMB (Modular Multiplication-based Block cipher) [385, 405, 406]. MMB is

based on the same basic theory as IDEA: mixing operations of different algebraic groups. MMB is an

iterative algorithm that mainly consists of linear steps (XOR and key applications) and the parallel

applications of four large nonlinear invertible substitutions. These substitutions are determined by a

multiplication modulo 232 â€“ 1 with constant factors. The result is an algorithm that has both a 128-bit

key and a 128-bit block size.

MMB operates on 32-bit sub-blocks of text (x0, x1, x2, x3) and 32-bit sub-blocks of key (k0, k1, k2, k3).

This makes the algorithm well suited for implementation on modern, 32-bit processors. A nonlinear

function, f, is applied six times alternating with XORing. Here it is (all index operations are mod 4):

xi = xi ki, for i = 0 to 3

f(x0,x1,x2,x3)

xi = xi ki + 1, for i = 0 to 3

f(x0,x1,x2,x3)

xi = xi ki + 2, for i = 0 to 3

f(x0,x1,x2,x3)

xi = xi ki, for i = 0 to 3

f(x0,x1,x2,x3)

xi = xi ki + 1, for i = 0 to 3

f(x0,x1,x2,x3)

xi = xi ki + 2, for i = 0 to 3

f(x0,x1,x2,x3)

The function f has three steps:

(1) xi = ci * xi, for i = 0 to 3 (If the input to the multiplication is all 1s, the output is also all

1s.)

(2) If the least significant bit of x0 = 1, then x0 = x0 C. If the least significant byte of x3 =

0, then x3 = x3 C.

Page 271 of 666

Applied Cryptography: Second Edition - Bruce Schneier

(3) xi = xi â€“ 1 xi xi + 1, for i = 0 to 3

All index operations are mod 4. The multiplication operation in step (1) is modulo 232 â€“ 1. For

the purposes of the algorithm, if the second operand is 232 â€“ 1, then the result is 232 â€“ 1. The

various constants are:

C = 2aaaaaaa

c0 = 025f1cdb

c1 = 2 * c0

c2 = 23 * c0

c3 = 27 * c0

The constant C is the â€śsimplestâ€ť constant with a high ternary weight, a least-significant bit of zero,

and no circular symmetry. The constant c0 has certain other characteristics. The constants c1, c2, and

c3 are shifted versions of c0, preventing attacks based on symmetry. See [405] for more details.

Decryption is the reverse process. Steps (2) and (3) are their own inverse. Step (1) uses ci-1 instead of

ci. The value of c0-1 is 0dad4694.

Security of MMB

The design of MMB ensures that each round has considerable diffusion independent of the key. In

IDEA, the amount of diffusion is to some extent dependent on the particular subkeys. MMB was also

designed not to have any weak keys as IDEA has.

MMB is dead [402]. Although no cryptanalysis has been published, this is true for several reasons.

First, it was not designed to be resistant to linear cryptanalysis. The multiplication factors were

chosen to be resistant to differential cryptanalysis, but the algorithmâ€™s authors were unaware of linear

cryptanalysis.

Second, Eli Biham has an effective chosen-key attack [160], which exploits the fact that all rounds are

identical and that the key schedule is just a cyclic shift by 32 bits. Third, even though MMB would be

very efficient in software, the algorithm would be less efficient than DES in hardware.

Daemen suggests that anyone interested in improving MMB should first do an analysis of modular

multiplication with respect to linear cryptanalysis and choose a new multiplication factor, and then

make the constant C different for each round [402]. Then, improve the key scheduling by adding

constants to the round keys to remove the bias. Heâ€™s not going to do it; he designed 3-Way instead (see

Section 14.5).

13.11 CA-1.1

CA is a block cipher built on cellular automata, designed by Howard Gutowitz [677, 678, 679]. It

encrypts plaintext in 384-bit blocks and has a 1088-bit key (itâ€™s really two keys, a 1024-bit key and a

64-bit key). Because of the nature of cellular automata, the algorithm is most efficient when

implemented in massively parallel integrated circuits.

Page 272 of 666

Applied Cryptography: Second Edition - Bruce Schneier

CA-1.1 uses both reversible and irreversible cellular automaton rules. Under a reversible rule, each

state of the lattice comes from a unique predecessor state, while under an irreversible rule, each state

can have many predecessor states. During encryption, irreversible rules are iterated backward in

time. To go backward from a given state, one of the possible predecessor states is selected at random.

This process can be repeated many times. Backward iteration thus serves to mix random information

with the message information. CA-1.1 uses a particular kind of partially linear irreversible rule,

which is such that a random predecessor state for any given state can be rapidly built. Reversible

rules are also used for some stages of encryption.

The reversible rules (simple parallel permutations on sub-blocks of the state) are nonlinear. The

irreversible rules are derived entirely from information in the key, while the reversible rules depend

both on key information and on the random information inserted during the stages of encryption with

irreversible rules.

CA-1.1 is built around a block-link structure. That is, the processing of the message block is partially

segregated from the processing of the stream of random information inserted during encryption. This

random information serves to link stages of encryption together. It can also be used to chain together

a ciphertext stream. The information in the link is generated as part of encryption.

Because CA-1.1 is a new algorithm, it is too early to make any pronouncements on its security.

Gutowitz discusses some possible attacks, including differential cryptanalysis, but is unable to break

the algorithm. As an incentive, Gutowitz has offered a $1000 prize to â€śthe first person who develops a

tractable procedure to break CA-1.1.â€ť

CA-1.1 is patented [678], but is available free for non-commercial use. Anyone interested in either

licensing the algorithm or in the cryptanalysis prize should contact Howard Gutowitz, ESPCI,

Laboratoire dâ€™Ć’lectronique, 10 rue Vauquelin, 75005 Paris, France.

13.12 Skipjack

Skipjack is the NSA-developed encryption algorithm for the Clipper and Capstone chips (see Sections

24.16 and 24.17). Since the algorithm is classified Secret, its details have never been published. It will

only be implemented in tamperproof hardware.

The algorithm is classified Secret, not because that enhances its security, but because the NSA doesnâ€™t

want Skipjack being used without the Clipper key-escrow mechanism. They donâ€™t want the algorithm

implemented in software and spread around the world.

Is Skipjack secure? If the NSA wants to produce a secure algorithm, they presumably can. On the

other hand, if the NSA wants to design an algorithm with a trapdoor, they can do that as well.

Hereâ€™s what has been published [1154, 462].

â€” Itâ€™s an iterative block cipher.

â€” The block size is 64 bits.

â€” It has an 80-bit key.

â€” It can be used in ECB, CBC, 64-bit OFB, or 1-, 8-, 16-, 32- or 64-bit CFB modes.

â€” There are 32 rounds of processing per single encrypt or decrypt operation.

â€” NSA started the design in 1985 and completed the evaluation in 1990.

The documentation for the Mykotronx Clipper chip says that the latency for the Skipjack algorithm is

64 clock cycles. This means that each round consists of two clock cycles: presumably one for the S-box

Page 273 of 666

Applied Cryptography: Second Edition - Bruce Schneier

substitution and another for the final XOR at the end of the round. (Remember: permutations take no

time in hardware.) The Mykotronx documentation calls this two-clock-cycle operation a â€śG-box, â€ť

and the whole thing a â€śshift.â€ť (Some part of the G-box is called an â€śF-table, â€ť probably a table of

constants but maybe a table of functions.)

I heard a rumor that Skipjack uses 16 S-boxes, and another that the total memory requirement for

storing the S-boxes is 128 bytes. It is unlikely that both of these rumors are true.

Another rumor implies that Skipjackâ€™s rounds, unlike DESâ€™s, do not operate on half of the block size.

This, combined with the notion of â€śshifts, â€ť an inadvertent statement made at Crypto â€™94 that

Skipjack has â€śa 48-bit internal structure, â€ť implies that it is similar in design to SHA (see Section

18.7) but with four 16-bit sub-blocks: three sub-blocks go through a key-dependent one-way function

to produce 16 bits, which are XORed with the remaining sub-block; then the whole block is circularly

shifted 16 bits to become the input to the next round, or shift. This also implies 128 bytes of S-box

data. I suspect that the S-boxes are key-dependent.

The structure of Skipjack is probably similar to DES. The NSA realizes that their tamperproof

hardware will be reverse-engineered eventually; they wonâ€™t risk any advanced cryptographic

techniques.

The fact that the NSA is planning to use the Skipjack algorithm to encrypt their Defense Messaging

System (DMS) implies that the algorithm is secure. To convince the skeptics, NIST allowed a panel of

â€śrespected experts from outside the government...access to the confidential details of the algorithm to

assess its capabilities and publicly report its findingsâ€ť [812].

The preliminary report of these experts [262] (there never was a final report, and probably never will

be) concluded that:

Under an assumption that the cost of processing power is halved every 18 months, it will

be 36 years before the difficulty of breaking Skipjack by exhaustive search will be equal to

the difficulty of breaking DES today. Thus, there is no significant risk that Skipjack will

be broken by exhaustive search in the next 30â€“40 years.

There is no significant risk that Skipjack can be broken through a shortcut method of

attack, including differential cryptanalysis. There are no weak keys; there is no

complementation property. The experts, not having time to evaluate the algorithm to any

great extent, instead evaluated NSAâ€™s own design and evaluation process.

The strength of Skipjack against a cryptanalytic attack does not depend on the secrecy of

the algorithm.

Of course, the panelists did not look at the algorithm long enough to come to any conclusions

themselves. All they could do was to look at the results that the NSA showed to them.

One unanswered question is whether the Skipjack keyspace is flat (see Section 8.2). Even if Skipjack

has no weak keys in the DES sense, some artifact of the key-scheduling process could make some keys

stronger than others. Skipjack could have 270 strong keys, far more than DES; the odds of choosing

one of those strong keys at random would still be about 1 in 1000. Personally, I think the Skipjack

keyspace is flat, but the fact that no one has ever said this publicly is worrisome.

Skipjack is patented, but the patent is being withheld from distribution by a patent secrecy agreement

[1122]. The patent will only be issued when and if the Skipjack algorithm is successfully reverse-

Page 274 of 666

Applied Cryptography: Second Edition - Bruce Schneier

engineered. This gives the government the best of both worlds: the protection of a patent and the

confidentiality of a trade secret.

Chapter 14

Still Other Block Ciphers

14.1 GOST

GOST is a block algorithm from the former Soviet Union [655,1393]. â€śGOSTâ€ť is an acronym for

â€śGosudarstvennyi Standard,â€ť or Government Standard, sort of similar to a FIPS, except that it can

(and does) refer to just about any kind of standard. (Actually, the full name is Gosudarstvennyi

Standard Soyuza SSR, or Government Standard of the Union of Soviet Socialist Republics.) This

standard is number 28147-89. The Government Committee for Standards of the USSR authorized the

standard, whoever they were.

I donâ€™t know whether GOST 28147-89 was used for classified traffic or just for civilian encryption. A

remark at its beginning states that the algorithm â€śsatisfies all cryptographic requirements and not

limits the grade of information to be protected.â€ť I have heard claims that it was initially used for very

high-grade communications, including classified military communications, but I have no

confirmation.

Description of GOST

GOST is a 64-bit block algorithm with a 256-bit key. GOST also has some additional key material

that will be discussed later. The algorithm iterates a simple encryption algorithm for 32 rounds.

To encrypt, first break the text up into a left half, L. and a right half, R. The subkey for round i is Ki.

A round, i, of GOST is:

Li = Ri - 1

Ri = Li -1 f(Ri - 1, Ki)

Figure 14.1 is a single round of GOST. Function f is straightforward. First, the right half and the ith

subkey are added modulo 232. The result is broken into eight 4-bit chunks, and each chunk becomes

the input to a different S-box. There are eight different S-boxes in GOST; the first 4 bits go into the

first S-box, the second 4 bits go into the second S-box, and so on. Each S-box is a permutation of the

numbers 0 through 15. For example, an S-box might be:

Page 275 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Figure 14.1 One round of GOST.

7, 10, 2, 4, 15, 9, 0, 3, 6, 12, 5, 13, 1, 8, 11

In this case, if the input to the S-box is 0, the output is 7. If the input is 1, the output is 10, and so on.

All eight S-boxes are different; these are considered additional key material. The S-boxes are to be

kept secret.

The outputs of the eight S-boxes are recombined into a 32-bit word, then the entire word undergoes

an 11-bit left circular shift. Finally, the result XORed to the left half to become the new right half, and

the right half becomes the new left half. Do this 32 times and youâ€™re done.

The subkeys are generated simply. The 256-bit key is divided into eight 32-bit blocks: k1, k2,..., k8.

Each round uses a different subkey, as shown in Table 14.1. Decryption is the same as encryption

with the order of the kis reversed.

The GOST standard does not discuss how to generate the S-boxes, only that they are somehow

supplied [655]. This has led to speculation that some Soviet organization would supply good S-boxes

to those organizations it liked and bad S-boxes to those organizations it wished to eavesdrop on. This

may very well be true, but further conversations with a GOST chip manufacturer within Russia

offered another alternative. He generated the S-box permutations himself, using a random-number

generator.

Table 14.1

Use of GOST Subkeys in Different Rounds

Round: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Subkey: 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8

Round: 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

Subkey: 1 2 3 4 5 6 7 8 8 7 6 5 4 3 2 1

More recently, a set of S-boxes used in an application for the Central Bank of the Russian Federation

surfaced. These S-boxes are also used in the GOSTone-way hash function (see section 18.11)

[657].They are listed in Table 14.2.

Cryptanalysis of GOST

These are the major differences between DES and GOST.

â€” DES has a complicated procedure for generating the subkeys from the keys. GOST has

a very simple procedure.

â€” DES has a 56-bit key; GOST has a 256-bit key. If you add in the secret S-box

permutations, GOST has a total of about 610 bits of secret information.

Page 276 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Table 14.2

GOST S-Boxes

S-box 1:

4 10 9 2 13 8 0 14 6 11 1 12 7 15 5 3

S-box 2:

14 11 4 12 6 13 15 10 2 3 8 1 0 7 5 9

S-box 3:

5 8 1 13 10 3 4 2 14 15 12 7 6 0 9 11

S-box 4:

7 13 10 1 0 8 9 15 14 4 6 12 11 2 5 3

S-box 5:

6 12 7 1 5 15 13 8 4 10 9 14 0 3 11 2

S-box 6:

4 11 10 0 7 2 1 13 3 6 8 5 9 12 15 14

S-box 7:

13 11 4 1 3 15 5 9 0 10 14 7 6 8 2 12

S-box 8:

1 15 13 0 5 7 10 4 9 2 3 14 6 11 8 12

â€” The S-boxes in DES have 6-bit inputs and 4-bit outputs; the S-boxes in GOST have 4-

bit inputs and outputs. Both algorithms have eight S-boxes, but an S-box in GOST is one-fourth

the size of an S-box in DES.

â€” DES has an irregular permutation, called a P-box; GOST uses an 11-bit left circular

shift.

â€” DES has 16 rounds; GOST has 32 rounds.

If there is no better way to break GOST other than brute force, it is a very secure algorithm. GOST

has a 256-bit keyâ€”longer if you count the secret S-boxes. Against differential and linear

cryptanalysis, GOST is probably stronger than DES. Although the random S-boxes in GOST are

probably weaker than the fixed S-boxes in DES, their secrecy adds to GOSTâ€™s resistance against

differential and linear attacks. Also, both of these attacks depend on the number of rounds: the more

rounds, the more difficult the attack. GOST has twice as many rounds as DES; this alone probably

makes both differential and linear cryptanalysis infeasible.

The other parts of GOST are either on par or worse than DES. GOST doesnâ€™t have the same

expansion permutation that DES has. Deleting this permutation from DES weakens it by reducing the

avalanche effect; it is reasonable to believe that GOST is weaker for not having it. GOSTâ€™s use of

addition instead is no less secure than DESâ€™s XOR.

The greatest difference between them seems to be GOSTâ€™s cyclic shift instead of a permutation. The

DES permutation increases the avalanche effect. In GOST a change in one input bit affects one S-box

in one round, which then affects two S-boxes in the next round, three the round after that, and so on.

GOST requires 8 rounds before a single change in an input affects every output bit; DES only

requires 5 rounds. This is certainly a weakness. But remember: GOST has 32 rounds to DESâ€™s 16.

GOSTâ€™s designers tried to achieve a balance between efficiency and security. They modified DESâ€™s

Page 277 of 666

Applied Cryptography: Second Edition - Bruce Schneier

basic design to create an algorithm that is better suited for software implementation. They seem to

have been less sure of their algorithmâ€™s security, and have tried to compensate by making the key

length very large, keeping the S-boxes secret, and doubling the number of iterations. Whether their

efforts have resulted in an algorithm more secure than DES remains to be seen.

14.2 CAST

CAST was designed in Canada by Carlisle Adams and Stafford Tavares [10,7]. They claim that the

name refers to their design procedure and should conjure up images of randomness, but note the

authorsâ€™ initials. The example CAST algorithm uses a 64-bit block size and a 64-bit key.

The structure of CAST should be familiar. The algorithm uses six S-boxes with an 8-bit input and a

32-bit output. Construction of these S-boxes is implementation-dependent and complicated; see the

references for details.

To encrypt, first divide the plaintext block into a left half and a right half. The algorithm has 8

rounds. In each round the right half is combined with some key material using function f and then

XORed with the left half to form the new right half. The original right half (before the round)

becomes the new left half. After 8 rounds (donâ€™t switch the left and right halves after the eighth

round), the two halves are concatenated to form the ciphertext.

Function f is simple:

(1) Divide the 32-bit input into four 8-bit quarters: a, b, c, d.

(2) Divide the 16-bit subkey into two 8-bit halves: e, f.

(3) Process a through S-box 1, b through S-box 2, c through S-box 3, d through S-box 4, e

through S-box 5, and f through S-box 6.

(4) XOR the six S-box outputs together to get the final 32-bit output.

Alternatively, the 32-bit input can be XORed with 32 bits of key, divided into four 8-bit quarters,

processed through the S-boxes, and then XORed together [7]. N rounds of this appears to be as secure

as N + 2 rounds of the other option.

The 16-bit subkey for each round is easily calculated from the 64-bit key. If k1, k2,..., k8 are the 8 bytes

of the key, then the subkeys for each round are:

Round 1: k1, k2

Round 2: k3, k4

Round 3: k5, k6

Round 4: k7, k8

Round 5: k4, k3

Round 6: k2, k1

Round 7: k8, k7

Round 8: k6, k5

The strength of this algorithm lies in its S-boxes. CAST does not have fixed S-boxes; new ones are

constructed for each application. Design criteria are in [10]; bent functions are the S-box columns,

selected for a number of desirable S-box properties (see Section 14.10). Once a set of S-boxes has been

Page 278 of 666

Applied Cryptography: Second Edition - Bruce Schneier

constructed for a given implementation of CAST, they are fixed for all time. The S-boxes are

implementation-dependent, but not key-dependent.

It was shown in [10] that CAST is resistant to differential cryptanalysis and in [728] that CAST is

resistant to linear cryptanalysis. There is no known way to break CAST other than brute force.

Northern Telecom is using CAST in their Entrust security software package for Macintoshes, PCs,

and UNIX workstations. The particular S-boxes they chose are not public. The Canadian government

is evaluating CAST as a new encryption standard. CAST is patent-pending.

14.3 Blowfish

Blowfish is an algorithm of my own design, intended for implementation on large microprocessors

[1388,1389]. The algorithm is unpatented, and the C code in the back of this book is in the public

domain. I designed Blowfish to meet the following design criteria.

1. Fast. Blowfish encrypts data on 32-bit microprocessors at a rate of 26 clock cycles per

byte.

2. Compact. Blowfish can run in less than 5K of memory.

3. Simple. Blowfish uses only simple operations: addition, XORs, and table lookups on

32-bit operands. Its design is easy to analyze which makes it resistant to implementation errors

[1391].

4. Variably Secure. Blowfishâ€™s key length is variable and can be as long as 448 bits.

Blowfish is optimized for applications where the key does not change often, like a communications

link or an automatic file encryptor. It is significantly faster than DES when implemented on 32-bit

microprocessors with large data caches, such as the Pentium and the PowerPC. Blowfish is not

suitable for applications, such as packet switching, with frequent key changes, or as a one-way hash

function. Its large memory requirement makes it infeasible for smart card applications.

Description of Blowfish

Blowfish is a 64-bit block cipher with a variable-length key. The algorithm consists of two parts: key

expansion and data encryption. Key expansion converts a key of up to 448 bits into several subkey

arrays totaling 4168 bytes.

Data encryption consists of a simple function iterated 16 times. Each round consists of a key-

dependent permutation, and a key- and data-dependent substitution. All operations are additions and

XORs on 32-bit words. The only additional operations are four indexed array data lookups per

round.

Blowfish uses a large number of subkeys. These keys must be precomputed before any data

encryption or decryption.

The P-array consists of 18 32-bit subkeys:

P1, P2,..., P18

Four 32-bit S-boxes have 256 entries each:

S1,0, S1,1,..., S1,255

Page 279 of 666

Applied Cryptography: Second Edition - Bruce Schneier

S2,0, S2,1,..., S2,255

S3,0, S3,1,..., S3,255

S4,0, S4,1,..., S4,255

The exact method used to calculate these subkeys will be described later in this section.

Figure 14.2 Blowfish.

Blowfish is a Feistel network (see Section 14.10) consisting of 16 rounds. The input is a 64-bit data

element, x. To encrypt:

Divide x into two 32-bit halves: xL, xR

For i = 1 to 16:

xL = xL Pi

xR = F(xL) xR

Swap xL and xR

Swap xL and xR (Undo the last swap.)

xR = xR P17

xL = xL P18

Recombine xL and xR

Figure 14.3 Function F.

Page 280 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Function F is as follows (see Figure 14.3):

Divide xL into four eight-bit quarters:

a, b, c, and d F(xL) = ((S1,a + S2,b mod 232) S3,c) + S4,d mod 232

Decryption is exactly the same as encryption, except that P1, P2,..., P18 are used in the reverse order.

Implementations of Blowfish that require the fastest speeds should unroll the loop and ensure that all

subkeys are stored in cache. See [568] for details.

The subkeys are calculated using the Blowfish algorithm. The exact method follows.

(1) Initialize first the P-array and then the four S-boxes, in order, with a fixed string. This

string consists of the hexadecimal digits of p.

(2) XOR P1 with the first 32 bits of the key, XOR P2 with the second 32-bits of the key,

and so on for all bits of the key (up to P18). Repeatedly cycle through the key bits until the entire

P-array has been XORed with key bits.

(3) Encrypt the all-zero string with the Blowfish algorithm, using the subkeys described

in steps (1) and (2).

(4) Replace P1 and P2 with the output of step (3).

(5) Encrypt the output of step (3) using the Blowfish algorithm with the modified subkeys.

(6) Replace P3 and P4 with the output of step (5).

(7) Continue the process, replacing all elements of the P-array, and then all four S-boxes

in order, with the output of the continuously changing Blowfish algorithm.

In total, 521 iterations are required to generate all required subkeys. Applications can store the

subkeysâ€”thereâ€™s no need to execute this derivation process multiple times.

Security of Blowfish

Serge Vaudenay examined Blowfish with known S-boxes and r rounds; a differential attack can

recover the P-array with 28r + 1 chosen plaintexts [1568]. For certain weak keys that generate bad S-

boxes (the odds of getting them randomly are 1 in 214), the same attack requires only 24r + 1 chosen

plaintexts to recover the P-array. With unknown S-boxes this attack can detect whether a weak key is

being used, but cannot determine what it is (neither the S-boxes nor the P-array). This attack only

works against reduced-round variants; it is completely ineffective against 16-round Blowfish.

Of course, the discovery of weak keys is significant, even though they seem impossible to exploit. A

weak key is one in which two entries for a given S-box are identical. There is no way to check for

weak keys before doing the key expansion. If you are worried, you have to do the key expansion and

check for identical S-box entries. I donâ€™t think this is necessary, though.

I know of no successful cryptanalysis against Blowfish. To be safe, do not implement Blowfish with a

reduced number of rounds.

Kent Marsh Ltd. has incorporated Blowfish in their FolderBolt security product for Microsoft

Windows and Macintosh. It is also part of Nautilus and PGPfone.

Page 281 of 666

Applied Cryptography: Second Edition - Bruce Schneier

14.4 SAFER

SAFER K-64 stands for Secure And Fast Encryption Routine with a Key of 64 bits [1009]. James

Massey produced this nonproprietary algorithm for Cylink Corp. and it is incorporated into some of

their products. The government of Singapore is planning to use this algorithmâ€”with a 128-bit key

[1010]â€”for a wide variety of applications. There are no patent, copyright, or other restrictions on its

use.

The algorithm has a block and key size of 64 bits. It is not a Feistel network like DES (see Section

14.10), but an iterated block cipher: The same function is applied for some number of rounds. Each

round uses two 64-bit subkeys, and the algorithm only uses operations on bytes.

Description of SAFER K-64

The plaintext block is divided into eight byte-length sub-blocks: B1, B2,..., B7, B8. Then the sub-blocks

go through r rounds. Finally, an output transformation is applied to the sub-blocks. Each round uses

two subkeys: K2i - 1 and K2i.

Figure 14.4 shows one round of SAFER K-64. First, sub-blocks are either XORed or added with bytes

of subkey K2i - 1. Then, the eight sub-blocks are subjected to one of two nonlinear transformations:

y = 45x mod 257. (If x = 128, then y = 0.)

y = log45 x. (If x = 0, then y = 128.)

These are operations in the finite field GF(257), and 45 is a primitive element in that field. In practical

implementations of SAFER K-64, it is quicker to implement this in a lookup table than to calculate

new results all the time.

Then, sub-blocks are either XORed or added with bytes of subkey K2r. The results of this operation

are fed through three layers of linear operations designed to increase the avalanche effect. Each

operation is called a Pseudo-Hadamard Transform (PHT). If the inputs to a PHT are a1 and a2, then

the outputs are:

b1 = (2a1 + a2) mod 256

b2 = (a1 + a2) mod 256

After r rounds, there is a final output transformation. This is the same as the first step of each round.

B1, B4, B5, and B8 are XORed with the corresponding bytes of the last subkey, and B2, B3, B6, and B7

are added to the corresponding bytes of the last subkey. The result is the ciphertext.

Page 282 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Figure 14.4 One round of SAFER.

Decryption is the reverse process: the output transformation (with subtraction instead of addition),

then r reverse rounds. The Inverse PHT (IPHT) is:

a1 = (b1 â€“ b2) mod 256

a2 = (â€“b1 + 2b2) mod 256

Massey recommends 6 rounds, but you can increase that if you want greater security.

Generating subkeys is easy. The first subkey, K1, is simply the user key. Subsequent subkeys are

generated by the following procedure:

Ki+1 = (K1 <<< 3i) + ci

The symbol â€ś<<<â€ť is a left circular shift or a left rotation. The rotation is byte by byte, and ci is a

round constant. If cij is the jth byte of the ith round constant, then you can calculate all of the round

constants by the formula

cij = 4545^((9i + j) mod 256) mod 257 mod 257

Generally, these values are stored in a table.

SAFER K-128

This alternate key schedule was developed by the Ministry of Home Affairs in Singapore, and then

incorporated into SAFER by Massey [1010]. It uses two keys, Ka and Kb, each 64-bits long. The trick

is to generate two subkey sequences in parallel, and then alternate subkeys from each sequence. This

means that if you choose Ka = Kb, then the 128-bit key is compatible with the 64-bit key Ka.

Security of SAFER K-64

Massey showed that SAFER K-64 is immune to differential cryptanalysis after 8 rounds and is

adequately secure against the attack after 6 rounds. After only 3 rounds linear cryptanalysis is

ineffective against this algorithm [1010].

Knudsen found a weakness in the key schedule: For virtually every key, there exists at least one (and

sometimes as many as nine) other key that encrypts some different plaintext to identical ciphertexts

[862]. The number of different plaintexts that encrypt to identical ciphertexts after 6 rounds is

Page 283 of 666

Applied Cryptography: Second Edition - Bruce Schneier

anywhere from 222 to 228. While this attack may not impact SAFERâ€™s security when used as an

encryption algorithm, it greatly reduces its security when used as a one-way hash function. In any

case, Knudsen recommends at least 8 rounds.

SAFER was designed for Cylink, and Cylink is tainted by the NSA [80]. I recommend years of intense

cryptanalysis before using SAFER in any form.

14.5 3-Way

3-Way is a block cipher designed by Joan Daemen [402,410]. It has a 96-bit block length and key

length, and is designed to be very efficient in hardware.

3-Way is not a Feistel network, but it is an iterated block cipher. 3-Way can have n rounds; Daemen

recommends 11.

Description of 3-Way

The algorithm is simple to describe. To encrypt a plaintext block, x:

For i = 0 to n â€“ 1

x = x XOR Ki

x = theta (x)

x = pi â€“ 1 (x)

x = gamma (x)

x = pi â€“ 2 (x)

x = x Kn

x = theta (x)

The functions are:

â€” theta(x) is a linear substitution functionâ€”basically a bunch of circular shifts and

XORs.

â€” piâ€“1(x) and piâ€“2(x) are simple permutations.

â€” gamma(x) is a nonlinear substitution function. This is the step that gives 3-Way its

name; it is the parallel execution of the substitution step on 3-bit blocks of the input.

Decryption is similar to encryption, except that the bits of the input have to be reversed and the bits of

the output have to be reversed. Code to implement 3-Way can be found in the back of this book.

So far, there has been no successful cryptanalysis of 3-Way. The algorithm is unpatented.

14.6 Crab

This algorithm was developed by Burt Kaliski and Matt Robshaw of RSA Laboratories [810]. The

idea behind Crab is to use techniques from one-way hash functions to make a fast encryption

algorithm. Hence, Crab is very similar to MD5, and this section assumes you are familiar with Section

18.5.

Crab has a very large block: 1024 bytes. Since Crab is presented more as a research contribution than

a real algorithm, no definitive key-generation routines are presented. The authors suggest a method

that could turn an 80-bit key into three requisite subkeys, although the algorithm could easily accept

Page 284 of 666

Applied Cryptography: Second Edition - Bruce Schneier

variable-length keys.

Crab uses two sets of large subkeys:

A permutation of the numbers 0 through 255: P 0, P1, P2,..., P255.

A 2048-entry array of 32-bit numbers: S0, S1, S2,..., S2047.

These subkeys must all be calculated before encryption or decryption.

To encrypt a 1024-byte block X:

(1) Divide X into 256 32-bit sub-blocks: X0, X1, X2,..., X255.

(2) Permute the sub-blocks of X according to P.

(3) For r = 0 to 3

For g = 0 to 63

A = X(4g)<<<2r

B = X(4g+1)<<<2r

C = X(4g+2)<<<2r

D = X(4g+3)<<<2r

For step s = 0 to 7

A = A (B + fr(B,C,D) + S512r+8 g+s)

TEMP = D

D=C

C=B

B = A <<< 5

A = TEMP

X(4g)<<<2r = A

X(4g+1)<<<2r = B

X(4g+2)<<<2r = C

X(4g+3)<<<2r = D

(4) Recombine X0, X1, X2,..., X255 to form the ciphertext.

The functions fr(B,C,D) are similar to those used in MD5:

f0(B,C,D) = (B Î› C) Î˝ ((Â¬ B) Î› D)

f1(B,C,D) = (B Î› D) Î˝ (C Î› (Â¬ D))

f2(B,C,D) = B C D

f3(B,C,D) = C (B Î˝ (Â¬ D))

Decryption is the reverse process.

Generating the subkeys is a large task. Here is how the permutation array, P, could be generated from

an 80-bit key, K.

(1) Initialize K0, K1, K2,..., K9 with the 10 bytes of K.

(2) For i = 10 to 255

Page 285 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Ki = Ki - 2 Ki - 6 Ki - 7 Ki - 10

(3) For i = 0 to 255, Pi = i

(4) m = 0

(5) For j = 0 to 1

For i = 256 to 1 step -1

m = (K256 - i + K257 - i) mod i

K257 - i = K257 - i <<< 3

Swap Pi and Pi - 1

The S-array of 2048 32-bit words could be generated in a similar manner, either from the same 80-bit

key or from another key. The authors caution that these details should â€śbe viewed as motivational;

there may very well be alternative schemes which are both more efficient and offer improved

securityâ€ť [810].

Crab was proposed as a testbed of new ideas and not as a working algorithm. It uses many of the

same techniques as MD5. Biham has argued that a very large block size makes an algorithm easier to

cryptanalyze [160]. On the other hand, Crab may make efficient use of a very large key. In such a

case, â€śeasier to cryptanalyzeâ€ť might not mean much.

14.7 SXAL8/MBAL

This is a 64-bit block algorithm from Japan [769]. SXAL8 is the basic algorithm; MBAL is an

expanded version with a variable block length. Since MBAL does some clever things internally, the

authors claim that they can get adequate security with only a few rounds. With a block length of 1024

bytes, MBAL is about 70 times faster than DES. Unfortunately, [1174] shows that MBAL is

susceptible to differential cryptanalysis, and [865] shows that it is susceptible to linear cryptanalysis.

14.8 RC5

RC5 is a block cipher with a variety of parameters: block size, key size, and number of rounds. It was

invented by Ron Rivest and analyzed by RSA Laboratories [1324,1325].

There are three operations: XOR, addition, and rotations. Rotations are constant-time operations on

most processors and variable rotations are a nonlinear function. These rotations, which depend on

both the key and the data, are the interesting operation.

RC5 has a variable-length block, but this example will focus on a 64-bit data block. Encryption uses

2r + 2 key-dependent 32-bit wordsâ€”S0, S1, S2,..., S2r + 1â€”where r is the number of rounds. Weâ€™ll

generate those words later. To encrypt, first divide the plaintext block into two 32-bit words: A and B.

(RC5 assumes a little-endian convention for packing bytes into words: The first byte goes into the

low-order bit positions of register A, etc.) Then:

A = A + S0

B = B + S1

For i = 1 to r:

A = ((A B) <<< B) + S2i

B = ((B A) <<< A) + S2i + 1

The output is in the registers A and B.

Page 286 of 666

Applied Cryptography: Second Edition - Bruce Schneier

Decryption is just as easy. Divide the plaintext block into two words, A and B, and then:

For i = r down to 1:

B = ((B â€“ S2i + 1) >>> A) A

A = ((A â€“ S2i) >>> B) B

B = B â€“ S1

A = A â€“ S0

The symbol â€ś>>>â€ť is a right circular shift. Of course, all addition and subtraction are mod 232.

Creating the array of keys is more complicated, but also straightforward. First, copy the bytes of the

key into an array, L, of c 32-bit words, padding the final word with zeros if necessary. Then, initialize

an array, S, using a linear congruential generator mod 232:

S0 = P

for i = 1 to 2(r + 1) â€“ 1:

Si = (Si - 1 + Q) mod 232

P = 0xb7e15163 and Q = 0x9e3779b9; these constants are based on the binary representation of e and

phi.

Finally, mix L into S:

ńňđ. 12 |