<<

. 12
( 29)



>>



Page 261 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Description of LOKI91

The mechanics of LOKI91 are similar to DES (see Figure 13.8). The data block is then divided into a
left half and a right half and goes through 16 rounds, much like DES. In each round, the right half is
first XORed with a piece of the key, then sent through an expansion permutation (see Table 13.1).

The 48-bit output is divided into four 12-bit blocks, and each block is sent through an S-box
substitution. The S-box substitution is as follows: Take each 12-bit input; use the 2 left-most bits and
the 2 right-most bits to form the number r, and the 8 innermost bits and form the number c. The
output of the S-box, O, is as follows:

0xff) & 0xff)31 mod Pr
O(r,c) = (c + ((r*17)




Figure 13.8 LOKI91.

Pr is given in Table 13.2.

Then, the four 8-bit outputs are recombined to form a single 32-bit number and sent through the
permutation described in Table 13.3. Finally, the right half is XORed with the left half to become the
new left half, and the left half becomes the new right half. After 16 rounds, the block is again XORed
with the key to produce the ciphertext.

The subkeys are generated from the key in a straightforward manner. The 64-bit key is split into a
left half and a right half. In each round, the subkey is the left half. This left half is then rotated 12 or
13 bits to the left, and then every two rounds the left and right halves are exchanged. As with DES,
the same algorithm can be used for both encryption and decryption, with some modification in how
the subkeys are used.

Table 13.1
Expansion Permutation
4, 3, 2, 1, 32, 31, 20, 29, 28, 27, 26, 25,
28, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17,
20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9,
12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1




Page 262 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Table 13.2
Pr

r: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
Pr: 375, 379, 391, 395, 397, 415, 419, 425, 433, 445, 451, 463, 471, 477, 487, 499


Cryptanalysis of LOKI91

Knudsen attempted to cryptanalyze LOKI91 [854, 858], but found it secure against differential
cryptanalysis. However, he found a related-key chosen-plaintext attack that reduces the complexity of
a brute-force search by almost a factor of four. This attack exploits a weakness in the key schedule
and may also apply if the algorithm is used as a one-way hash function (see Section 18.11).

Another attack on related keys can break LOKI91 with 232 chosen-key chosen plaintexts, or 248
chosen-key known plaintexts [158]. The attack is independent of the number of rounds of the
algorithm. (In the same paper, Biham breaks LOKI89 with 217 chosen-key chosen plaintexts or 233
known-key known plaintexts using related-key cryptanalysis.) It™s easy to make LOKI91 resistant to
this attack; avoid the simple key schedule.


Patents and Licenses

LOKI is not patented. Anyone can implement the algorithm and use it. The source code
implementation in this book is copyrighted by the University of New South Wales. Anyone interested
in using this implementation (or their other implementation, which is several orders of magnitude
faster) in a commercial product should contact Director CITRAD, Department of Computer Science,
University College, UNSW, Australian Defense Force Academy, Canberra ACT 2600, Australia;
FAX: +61 6 268 8581.

13.7 Khufu and Khafre

In 1990 Ralph Merkle proposed two algorithms. The basic design principles behind them are [1071]:

1. DES™s 56-bit key size is too small. Considering the negligible cost of increasing the key
size (computer memory is cheap and plentiful), it should be increased.
2. DES™s extensive use of permutations, while suitable for hardware implementations, is
very difficult to implement in software. The faster software implementations of DES implement
the permutations by table lookup. Table lookup can provide the same “diffusion”
characteristics as permutation and can be much more flexible.

Table 13.3
P-Box Permutation
32, 24, 16, 8, 31, 23, 15, 7, 30, 22, 14, 6, 29, 21, 13, 5,
28, 20, 12, 4, 27, 19, 11, 3, 26, 18, 10, 2, 25, 17, 9, 1


3. The S-boxes in DES are small, with only 64 4-bit entries per box. Now that memory is
larger, S-boxes should grow. Moreover, all eight S-boxes are used simultaneously. While this is
suitable for hardware, it seems like an unreasonable restriction in software. A larger S-box size



Page 263 of 666
Applied Cryptography: Second Edition - Bruce Schneier



and sequential (rather than parallel) S-box usage should be employed.
4. The initial and final permutations in DES are widely viewed as cryptographically
pointless and should be discarded.
5. All the faster implementations of DES precompute the keys for each round. Given this
fact, there is no reason not to make this computation more complicated.
6. Unlike DES, the S-box design criteria should be public.

To this list, Merkle would probably now add “resistant to differential cryptanalysis and to linear
attacks, ” but those attacks were still unknown at the time.

Khufu

Khufu is a 64-bit block cipher. The 64-bit plaintext is first divided into two 32-bit halves, L and R.
First, both halves are XORed with some key material. Then, they are subjected to a series of rounds
similar to DES. In each round, the least significant byte of L is used as the input to an S-box. Each S-
box has 8 input bits and 32 output bits. The selected 32-bit entry in the S-box is then XORed with R. L
is then rotated some multiple of 8 bits, L and R are swapped, and the round ends. The S-box itself is
not static, but changes every 8 rounds. Finally, after the last round, L and R are XORed with more
key material, and then combined to form the ciphertext block.

Although parts of the key are XORed with the encryption block at the beginning and end of the
algorithm, the primary purpose of the key is to generate the S-boxes. These S-boxes are secret and, in
essence, part of the key. Khufu calls for a total key size of 512 bits (64 bytes) and gives an algorithm
for generating S-boxes from the key. The number of rounds for the algorithm is left open. Merkle
mentioned that 8-round Khufu is susceptible to a chosen-plaintext attack and recommended 16, 24, or
32 rounds [1071]. (He restricted the choice of rounds to a multiple of eight.)

Because Khufu has key-dependent and secret S-boxes, it is resistant to differential cryptanalysis.
There is a differential attack against 16-round Khufu that recovers the key after 231 chosen plaintexts
[611], but it cannot be extended to more rounds. If brute-force is the best way to attack Khufu, it is
impressively secure. A 512-bit key gives a complexity of 2512”inconceivable under any circumstances.

Khafre

Khafre is the second of two cryptosystems proposed by Merkle [1071]. (Khufu and Khafre are names
of Egyptian pharaohs.) It is similar in design to Khufu, except that it was designed for applications
without precomputation time. The S-boxes are not key-dependent. Instead, Khafre uses fixed S-boxes.
And the key is XORed with the encryption block not only before the first round and after the last
round, but also after every 8 rounds of encryption.

Merkle speculated that key sizes of 64- or 128-bits would be used for Khafre and that more rounds of
encryption would be required for Khafre than for Khufu. This, combined with the fact that each
round of Khafre is more complex than for Khufu, makes Khafre slower. In compensation, Khafre
does not require any precomputation and will encrypt small amounts of data more quickly.

In 1990 Biham and Shamir turned their differential cryptanalysis techniques against Khafre [170].
They were able to break 16-round Khafre with a chosen-plaintext attack using about 1500 different
encryptions. It took about an hour, using their personal computer. Converting that to a known-
plaintext attack would require about 238 encryptions. Khafre with 24 rounds can be broken by a
chosen-plaintext attack using 253 encryptions, and a known-plaintext attack using 259 encryptions.

Patents


Page 264 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Both Khufu and Khafre are patented [1072]. Source code for the algorithms are in the patent. Anyone
interested in licensing either or both algorithms should contact Director of Licensing, Xerox
Corporation, P.O. Box 1600, Stamford, CT, 06904-1600.

13.8 RC2

RC2 is a variable-key-size encryption algorithm designed by Ron Rivest for RSA Data Security, Inc.
(RSADSI). Apparently, “RC” stands for “Ron™s Code, ” although it officially stands for “Rivest
Cipher.” (RC3 was broken at RSADSI during development; RC1 never got further than Rivest™s
notebook.) It is proprietary, and its details have not been published. Don™t think for a minute that this
helps security. RC2 has already appeared in commercial products. As far as I know, RC2 has not
been patented and is only protected as a trade secret.

RC2 is a variable-key-size 64-bit block cipher, designed to be a replacement for DES. According to the
company, software implementations of RC2 are three times faster than DES. The algorithm accepts a
variable-length key, from 0 bytes to the maximum string length the computer system supports;
encryption speed is independent of key size. This key is preprocessed to yield a key-dependent table of
128 bytes. So the number of effectively different keys is 21024. RC2 has no S-boxes [805]; the two
operations are “mix” and “mash, ” and one is chosen in each round. According to their literature
[1334]:

... RC2 is not an iterative block cipher. This suggests that RC2 offers more protection
against differential and linear cryptanalysis than other block ciphers which have relied
for their security on copying the design of DES.


RSADSI™s refusal to make RC2 public casts doubt on their claims. They are willing to provide details
of the algorithm to most anyone willing to sign a nondisclosure agreement, and have claimed to allow
cryptanalysts to publish any negative results they find. I don™t know of any cryptanalyst outside the
employ of the company who studied it, since it would amount to doing their analysis work for them.

Still, Ron Rivest is not the usual snake-oil peddler. He™s a respected and competent cryptographer. I
would put a fair degree of trust in the algorithm, even though I haven™t personally inspected the code.
RC4, once the proprietary intellectual property of RSADSI, was posted to the Internet (see Section
17.1), and it™s probably just a matter of time before RC2 is posted as well.

An agreement between the Software Publishers Association (SPA) and the U.S. government gave RC2
and RC4 (see Section 17.1) special export status (see Section 25.14). Products that implement one of
these two algorithms have a much simpler export approval process, provided that the keys are no
more than 40 bits long.

Is a 40-bit key enough? There are a total of one trillion possible keys. Assuming that brute force is the
most efficient method of cryptanalysis (a big assumption, considering that the algorithm has never
been published), and assuming that a brute-force cryptanalysis chip can test one million keys per
second, it will take him 12.7 days to find the correct key. One thousand machines working in parallel
can produce the key in twenty minutes.

RSA Data Security, Inc., maintains that while encryption and decryption are quick, exhaustive key
search is not. A significant amount of time is spent setting up the key schedule. While this time is
negligible when encrypting and decrypting messages, it is not when trying every possible key.




Page 265 of 666
Applied Cryptography: Second Edition - Bruce Schneier




The U.S. government would never allow export of any algorithm it couldn™t, at least in theory, break.
They could create a magnetic tape or CD of a specific plaintext block encrypted with every possible
key. To break a given message, they could just run the tape and compare the ciphertext blocks in the
message with the ciphertext blocks on the tape. If there is a match, they could try the candidate key
and see if the message makes any sense. If they choose a common plaintext block (all zeros, the ASCII
characters for a space, etc.), this method should work. The storage requirement for a 64-bit plaintext
block encrypted with all 1012 possible keys is 8 terabytes”certainly possible.

For information on licensing RC2, contact RSADSI (see Section 25.4).

13.9 IDEA

The first incarnation of the IDEA cipher, by Xuejia Lai and James Massey, surfaced in 1990 [929]. It
was called PES (Proposed Encryption Standard). The next year, after Biham and Shamir™s
demonstrated differential cryptanalysis, the authors strengthened their cipher against the attack and
called the new algorithm IPES (Improved Proposed Encryption Standard) [931,924]. IPES changed
its name to IDEA (International Data Encryption Algorithm) in 1992 [925].

IDEA is based on some impressive theoretical foundations and, although cryptanalysis has made some
progress against reduced-round variants, the algorithm still seems strong. In my opinion, it is the best
and most secure block algorithm available to the public at this time.

The future of IDEA is not yet clear. There has been no rush to adopt it as a replacement to DES,
partly because it is patented and must be licensed for commercial applications, and partly because
people are still waiting to see how well the algorithm fares during the coming years of cryptanalysis.
Its current claim to fame is that it is part of PGP (see Section 24.12).

Overview of IDEA

IDEA is a block cipher; it operates on 64-bit plaintext blocks. The key is 128 bits long. The same
algorithm is used for both encryption and decryption.

As with all the other block ciphers we™ve seen, IDEA uses both confusion and diffusion. The design
philosophy behind the algorithm is one of “mixing operations from different algebraic groups.” Three
algebraic groups are being mixed, and they are all easily implemented in both hardware and
software:

” XOR
” Addition modulo 216
” Multiplication modulo 216 + 1. (This operation can be viewed as IDEA™s S-box.)

All these operations (and these are the only operations in the algorithm”there are no bit-level
permutations) operate on 16-bit sub-blocks. This algorithm is even efficient on 16-bit processors.

Description of IDEA

Figure 13.9 is an overview of IDEA. The 64-bit data block is divided into four 16-bit sub-blocks: X1,
X2, X3, and X4. These four sub-blocks become the input to the first round of the algorithm. There are
eight rounds total. In each round the four sub-blocks are XORed, added, and multiplied with one
another and with six 16-bit subkeys. Between rounds, the second and third sub-blocks are swapped.



Page 266 of 666
Applied Cryptography: Second Edition - Bruce Schneier



Finally, the four sub-blocks are combined with four subkeys in an output transformation.

In each round, the sequence of events is as follows:

(1) Multiply X1 and the first subkey.
(2) Add X2 and the second subkey.
(3) Add X3 and the third subkey.
(4) Multiply X4 and the fourth subkey.
(5) XOR the results of steps (1) and (3).
(6) XOR the results of steps (2) and (4).
(7) Multiply the results of step (5) with the fifth subkey.
(8) Add the results of steps (6) and (7).
(9) Multiply the results of step (8) with the sixth subkey.
(10) Add the results of steps (7) and (9).




Figure 13.9 IDEA.

(11) XOR the results of steps (1) and (9).
(12) XOR the results of steps (3) and (9).
(13) XOR the results of steps (2) and (10).
(14) XOR the results of steps (4) and (10).

The output of the round is the four sub-blocks that are the results of steps (11), (12), (13), and (14).
Swap the two inner blocks (except for the last round) and that™s the input to the next round.

After the eighth round, there is a final output transformation:

(1) Multiply X1 and the first subkey.
(2) Add X2 and the second subkey.
(3) Add X3 and the third subkey.
(4) Multiply X4 and the fourth subkey.




Page 267 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Finally, the four sub-blocks are reattached to produce the ciphertext.

Creating the subkeys is also easy. The algorithm uses 52 of them (six for each of the eight rounds and
four more for the output transformation). First, the 128-bit key is divided into eight 16-bit subkeys.
These are the first eight subkeys for the algorithm (the six for the first round, and the first two for the
second round). Then, the key is rotated 25 bits to the left and again divided into eight subkeys. The
first four are used in round 2; the last four are used in round 3. The key is rotated another 25 bits to
the left for the next eight subkeys, and so on until the end of the algorithm.


Decryption is exactly the same, except that the subkeys are reversed and slightly different. The
decryption subkeys are either the additive or multiplicative inverses of the encryption subkeys. (For
the purposes of IDEA, the all-zero sub-block is considered to represent 216 = “ 1 for multiplication
modulo 216 + 1; thus the multiplicative inverse of 0 is 0.) Calculating these takes some doing, but you
only have to do it once for each decryption key. Table 13.4 shows the encryption subkeys and the
corresponding decryption subkeys.

Speed of IDEA

Current software implementations of IDEA are about twice as fast as DES. IDEA on a 33 megahertz
386 machine encrypts data at 880 kilobits per second, and 2400 kilobits per second on a 66 megahertz
486 machine. You might think IDEA should be faster, but multiplications aren™t cheap. To multiply
two 32-bit numbers on a 486 requires 40 clock cycles (10 on a Pentium).

A VLSI implementation of PES encrypts data at 55 megabits per second at 25 megahertz [208, 398].
Another VLSI chip developed at ETH Zurich, consisting of 251, 000 transistors on a chip 107.8
square millimeters, encrypts data using the IDEA algorithm at a 177 megabit-per-second data rate
when clocked at 25 megahertz [926, 207, 397].

Table 13.4
IDEA Encryption and Decryption Subkeys
Round Encryption Subkeys Decryption Subkeys

Z1(1) Z2(1) Z3(1) Z4(1) Z5(1) Z6(1) Z1(9) - 1 “Z2(9) “Z3(9) Z4(9) - 1 Z5(8) Z6(8)
1st

Z1(2) Z2(2) Z3(2) Z4(2) Z5(2) Z6(2) Z1(8) - 1 “Z3(8) “Z2(8) Z4(8) - 1 Z5(7) Z6(7)
2nd

Z1(3) Z2(3) Z3(3) Z4(3) Z5(3) Z6(3) Z1(7) - 1 “Z3(7) “Z2(7) Z4(7) - 1 Z5(6) Z6(6)
3rd

Z1(4) Z2(4) Z3(4) Z4(4) Z5(4) Z6(4) Z1(6) - 1 “Z3(6) “Z2(6) Z4(6) - 1 Z5(5) Z6(5)
4th

Z1(5) Z2(5) Z3(5) Z4(5) Z5(5) Z6(5) Z1(5) - 1 “Z3(5) “Z2(5) Z4(5) - 1 Z55(4) Z6(4)
5th

Z1(6) Z2(6) Z3(6) Z4(6) Z5(6) Z6(6) Z1(4) - 1 “Z3(4) “Z2(4) Z4(4) - 1 Z5(3) Z6(3)
6th

Z1(7) Z2(7) Z3(7) Z4(7) Z5(7) Z6(7) Z1(3) - 1 “Z3(3) “Z2(3) Z4(3) - 1 Z5(2) Z6(2)
7th

Z1(8) Z2(8) Z3(8) Z4(8) Z5(8) Z6(8) Z1(2) - 1 “Z3(2) “Z2(2) Z4(2) - 1 Z5(1) Z6(1)
8th

Z1(9) Z2(9) Z3(9) Z4(9) Z1(1) - 1 “Z2(1) “Z3(1) Z4(1) - 1
output
transformation




Page 268 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Cryptanalysis of IDEA

IDEA™s key length is 128 bits”over twice as long as DES. Assuming that a brute-force attack is the
most efficient, it would require 2128(1038) encryptions to recover the key. Design a chip that can test a
billion keys per second and throw a billion of them at the problem, and it will still take 10 13 years”
that™s longer than the age of the universe. An array of 1024 such chips can find the key in a day, but
there aren™t enough silicon atoms in the universe to build such a machine. Now we™re getting
somewhere”although I™d keep my eye on the dark matter debate.

Perhaps brute force isn™t the best way to attack IDEA. The algorithm is still too new for any definitive
cryptanalytic results. The designers have done their best to make the algorithm immune to differential
cryptanalysis; they defined the concept of a Markov cipher and showed that resistance to differential
cryptanalysis can be modeled and quantified [931, 925]. (Figure 13.10 shows the original PES
algorithm to be contrasted with the IDEA algorithm of Figure 13.9 which was strengthened against
differential cryptanalysis. It™s amazing how a few subtle changes can make such a big difference.) In
[925], Lai argued (he gave evidence, not a proof) that IDEA is immune to differential cryptanalysis
after only 4 of its 8 rounds. According to Biham, his related-key cryptanalytic attack doesn™t work
against IDEA, either [160].

Willi Meier examined the three algebraic operations of IDEA, and pointed out that while they are
incompatible, there are instances where they can be simplified in such a way as to facilitate
cryptanalysis some percentage of the time [1050]. His attack is more efficient than brute-force for 2-
round IDEA (242 operations), but less efficient for 3-round IDEA or higher. Normal IDEA, with 8
rounds, is safe.

Joan Daemen discovered a class of weak keys for IDEA [406, 409]. These are not weak keys in the
sense of the DES weak keys; that is, the encryption function is self-inverse. They are weak in the sense
that if they are used, an attacker can easily identify them in a chosen-plaintext attack. For example, a
weak key is (in hex):

0000, 0000, 0x 00, 0000, 0000, 000x,xxxx,x000


The number at the positions of “x” can be any number. If this key is used, the bit-wise XOR of certain
plaintext pairs guarantees the bit-wise XOR of the resultant ciphertext pairs.

In any case, the chance of accidentally generating one of these weak keys is very small: one in 296.
There is no danger if you choose keys at random. And it is easy to modify IDEA so that it doesn™t have
any weak keys: XOR every subkey with the value 0x0dae [409].

I know of no other cryptanalytic results against IDEA, although many people have tried.

IDEA Modes of Operation and Variants

IDEA can work within any block cipher mode discussed in Chapter 9. Any double-IDEA
implementation would be susceptible to the same meet-in-the-middle attack as DES (see Section 15.1).
However, because IDEA™s key length is more than double DES™s, the attack is impractical. It would
require a storage space of 64*2128 bits, or 1039 bytes. Maybe there™s enough matter in the universe to
create a memory device that large, but I doubt it.




Page 269 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Figure 13.10 PES.

If you™re worried about parallel universes as well, use a triple-IDEA implementation (see Section
15.2):

C = EK3(DK2(EK1(P)))

It is immune to the meet-in-the-middle attack.

There™s also no reason why you can™t implement IDEA with independent subkeys, especially if you
have key-management tools to handle the longer key. IDEA needs a total of 52 16-bit keys, for a total
key length of 832 bits. This variant is definitely more secure, but no one knows by how much.

A naïve variation might double the block size. The algorithm would work just as well with 32-bit sub-
blocks instead of 16-bit sub-blocks, and a 256-bit key. Encryption would be quicker and security
would increase 232 times. Or would it? The theory behind the algorithm hinges on the fact that 216 + 1
is prime; 232 + 1 is not. Perhaps the algorithm could be modified to work, but it would have very
different security properties. Lai says it would be difficult to make it work [926].

While IDEA appears to be significantly more secure than DES, it isn™t always easy to substitute one
for the other in an existing application. If your database and message templates are hardwired to
accept a 64-bit key, it may be impossible to implement IDEA™s 128-bit key.

For those applications, generate a 128-bit key by concatenating the 64-bit key with itself. Remember
that IDEA is weakened considerably by this modification.

If you are more concerned with speed than security, you might consider a variant of IDEA with fewer
rounds. Currently the best attack against IDEA is faster than brute force only for 2.5 rounds or less
[1050]; 4 round IDEA would be twice as fast and, as far as I know, just as secure.

Caveat Emptor




Page 270 of 666
Applied Cryptography: Second Edition - Bruce Schneier




IDEA is a relatively new algorithm, and many questions remain. Is IDEA a group? (Lai thinks not
[926].) Are there any still-undiscovered ways of breaking this cipher? IDEA has a firm theoretical
basis, but time and time again secure-looking algorithms have fallen to new forms of cryptanalysis.
Several academic and military groups have cryptanalyzed IDEA. None of them has gone public about
any successes they might have had. One might”someday.

Patents and Licenses

IDEA is patented in Europe and the United States [1012, 1013]. The patent is held by Ascom-Tech
AG. No license fee is required for non-commercial use. Commercial users interested in licensing the
algorithm should contact Ascom Systec AG, Dept CMVV, Gewerbepark, CH-5506, M¤genwil,
Switzerland; +41 64 56 59 83; Fax:+41 64 56 59 90; idea@ascom.ch.

13.10 MMB

A complaint against IDEA, that it uses a 64-bit encryption block, was addressed by Joan Daemen in
an algorithm called MMB (Modular Multiplication-based Block cipher) [385, 405, 406]. MMB is
based on the same basic theory as IDEA: mixing operations of different algebraic groups. MMB is an
iterative algorithm that mainly consists of linear steps (XOR and key applications) and the parallel
applications of four large nonlinear invertible substitutions. These substitutions are determined by a
multiplication modulo 232 “ 1 with constant factors. The result is an algorithm that has both a 128-bit
key and a 128-bit block size.

MMB operates on 32-bit sub-blocks of text (x0, x1, x2, x3) and 32-bit sub-blocks of key (k0, k1, k2, k3).
This makes the algorithm well suited for implementation on modern, 32-bit processors. A nonlinear
function, f, is applied six times alternating with XORing. Here it is (all index operations are mod 4):

xi = xi ki, for i = 0 to 3
f(x0,x1,x2,x3)
xi = xi ki + 1, for i = 0 to 3
f(x0,x1,x2,x3)
xi = xi ki + 2, for i = 0 to 3
f(x0,x1,x2,x3)
xi = xi ki, for i = 0 to 3
f(x0,x1,x2,x3)
xi = xi ki + 1, for i = 0 to 3
f(x0,x1,x2,x3)
xi = xi ki + 2, for i = 0 to 3
f(x0,x1,x2,x3)

The function f has three steps:

(1) xi = ci * xi, for i = 0 to 3 (If the input to the multiplication is all 1s, the output is also all
1s.)
(2) If the least significant bit of x0 = 1, then x0 = x0 C. If the least significant byte of x3 =
0, then x3 = x3 C.




Page 271 of 666
Applied Cryptography: Second Edition - Bruce Schneier



(3) xi = xi “ 1 xi xi + 1, for i = 0 to 3

All index operations are mod 4. The multiplication operation in step (1) is modulo 232 “ 1. For
the purposes of the algorithm, if the second operand is 232 “ 1, then the result is 232 “ 1. The
various constants are:

C = 2aaaaaaa
c0 = 025f1cdb
c1 = 2 * c0
c2 = 23 * c0
c3 = 27 * c0

The constant C is the “simplest” constant with a high ternary weight, a least-significant bit of zero,
and no circular symmetry. The constant c0 has certain other characteristics. The constants c1, c2, and
c3 are shifted versions of c0, preventing attacks based on symmetry. See [405] for more details.

Decryption is the reverse process. Steps (2) and (3) are their own inverse. Step (1) uses ci-1 instead of
ci. The value of c0-1 is 0dad4694.


Security of MMB

The design of MMB ensures that each round has considerable diffusion independent of the key. In
IDEA, the amount of diffusion is to some extent dependent on the particular subkeys. MMB was also
designed not to have any weak keys as IDEA has.

MMB is dead [402]. Although no cryptanalysis has been published, this is true for several reasons.
First, it was not designed to be resistant to linear cryptanalysis. The multiplication factors were
chosen to be resistant to differential cryptanalysis, but the algorithm™s authors were unaware of linear
cryptanalysis.

Second, Eli Biham has an effective chosen-key attack [160], which exploits the fact that all rounds are
identical and that the key schedule is just a cyclic shift by 32 bits. Third, even though MMB would be
very efficient in software, the algorithm would be less efficient than DES in hardware.

Daemen suggests that anyone interested in improving MMB should first do an analysis of modular
multiplication with respect to linear cryptanalysis and choose a new multiplication factor, and then
make the constant C different for each round [402]. Then, improve the key scheduling by adding
constants to the round keys to remove the bias. He™s not going to do it; he designed 3-Way instead (see
Section 14.5).

13.11 CA-1.1

CA is a block cipher built on cellular automata, designed by Howard Gutowitz [677, 678, 679]. It
encrypts plaintext in 384-bit blocks and has a 1088-bit key (it™s really two keys, a 1024-bit key and a
64-bit key). Because of the nature of cellular automata, the algorithm is most efficient when
implemented in massively parallel integrated circuits.




Page 272 of 666
Applied Cryptography: Second Edition - Bruce Schneier




CA-1.1 uses both reversible and irreversible cellular automaton rules. Under a reversible rule, each
state of the lattice comes from a unique predecessor state, while under an irreversible rule, each state
can have many predecessor states. During encryption, irreversible rules are iterated backward in
time. To go backward from a given state, one of the possible predecessor states is selected at random.
This process can be repeated many times. Backward iteration thus serves to mix random information
with the message information. CA-1.1 uses a particular kind of partially linear irreversible rule,
which is such that a random predecessor state for any given state can be rapidly built. Reversible
rules are also used for some stages of encryption.

The reversible rules (simple parallel permutations on sub-blocks of the state) are nonlinear. The
irreversible rules are derived entirely from information in the key, while the reversible rules depend
both on key information and on the random information inserted during the stages of encryption with
irreversible rules.

CA-1.1 is built around a block-link structure. That is, the processing of the message block is partially
segregated from the processing of the stream of random information inserted during encryption. This
random information serves to link stages of encryption together. It can also be used to chain together
a ciphertext stream. The information in the link is generated as part of encryption.

Because CA-1.1 is a new algorithm, it is too early to make any pronouncements on its security.
Gutowitz discusses some possible attacks, including differential cryptanalysis, but is unable to break
the algorithm. As an incentive, Gutowitz has offered a $1000 prize to “the first person who develops a
tractable procedure to break CA-1.1.”

CA-1.1 is patented [678], but is available free for non-commercial use. Anyone interested in either
licensing the algorithm or in the cryptanalysis prize should contact Howard Gutowitz, ESPCI,
Laboratoire d™’lectronique, 10 rue Vauquelin, 75005 Paris, France.

13.12 Skipjack

Skipjack is the NSA-developed encryption algorithm for the Clipper and Capstone chips (see Sections
24.16 and 24.17). Since the algorithm is classified Secret, its details have never been published. It will
only be implemented in tamperproof hardware.

The algorithm is classified Secret, not because that enhances its security, but because the NSA doesn™t
want Skipjack being used without the Clipper key-escrow mechanism. They don™t want the algorithm
implemented in software and spread around the world.

Is Skipjack secure? If the NSA wants to produce a secure algorithm, they presumably can. On the
other hand, if the NSA wants to design an algorithm with a trapdoor, they can do that as well.

Here™s what has been published [1154, 462].

” It™s an iterative block cipher.
” The block size is 64 bits.
” It has an 80-bit key.
” It can be used in ECB, CBC, 64-bit OFB, or 1-, 8-, 16-, 32- or 64-bit CFB modes.
” There are 32 rounds of processing per single encrypt or decrypt operation.
” NSA started the design in 1985 and completed the evaluation in 1990.

The documentation for the Mykotronx Clipper chip says that the latency for the Skipjack algorithm is
64 clock cycles. This means that each round consists of two clock cycles: presumably one for the S-box



Page 273 of 666
Applied Cryptography: Second Edition - Bruce Schneier



substitution and another for the final XOR at the end of the round. (Remember: permutations take no
time in hardware.) The Mykotronx documentation calls this two-clock-cycle operation a “G-box, ”
and the whole thing a “shift.” (Some part of the G-box is called an “F-table, ” probably a table of
constants but maybe a table of functions.)

I heard a rumor that Skipjack uses 16 S-boxes, and another that the total memory requirement for
storing the S-boxes is 128 bytes. It is unlikely that both of these rumors are true.

Another rumor implies that Skipjack™s rounds, unlike DES™s, do not operate on half of the block size.
This, combined with the notion of “shifts, ” an inadvertent statement made at Crypto ™94 that
Skipjack has “a 48-bit internal structure, ” implies that it is similar in design to SHA (see Section
18.7) but with four 16-bit sub-blocks: three sub-blocks go through a key-dependent one-way function
to produce 16 bits, which are XORed with the remaining sub-block; then the whole block is circularly
shifted 16 bits to become the input to the next round, or shift. This also implies 128 bytes of S-box
data. I suspect that the S-boxes are key-dependent.

The structure of Skipjack is probably similar to DES. The NSA realizes that their tamperproof
hardware will be reverse-engineered eventually; they won™t risk any advanced cryptographic
techniques.

The fact that the NSA is planning to use the Skipjack algorithm to encrypt their Defense Messaging
System (DMS) implies that the algorithm is secure. To convince the skeptics, NIST allowed a panel of
“respected experts from outside the government...access to the confidential details of the algorithm to
assess its capabilities and publicly report its findings” [812].

The preliminary report of these experts [262] (there never was a final report, and probably never will
be) concluded that:

Under an assumption that the cost of processing power is halved every 18 months, it will
be 36 years before the difficulty of breaking Skipjack by exhaustive search will be equal to
the difficulty of breaking DES today. Thus, there is no significant risk that Skipjack will
be broken by exhaustive search in the next 30“40 years.

There is no significant risk that Skipjack can be broken through a shortcut method of
attack, including differential cryptanalysis. There are no weak keys; there is no
complementation property. The experts, not having time to evaluate the algorithm to any
great extent, instead evaluated NSA™s own design and evaluation process.

The strength of Skipjack against a cryptanalytic attack does not depend on the secrecy of
the algorithm.

Of course, the panelists did not look at the algorithm long enough to come to any conclusions
themselves. All they could do was to look at the results that the NSA showed to them.

One unanswered question is whether the Skipjack keyspace is flat (see Section 8.2). Even if Skipjack
has no weak keys in the DES sense, some artifact of the key-scheduling process could make some keys
stronger than others. Skipjack could have 270 strong keys, far more than DES; the odds of choosing
one of those strong keys at random would still be about 1 in 1000. Personally, I think the Skipjack
keyspace is flat, but the fact that no one has ever said this publicly is worrisome.

Skipjack is patented, but the patent is being withheld from distribution by a patent secrecy agreement
[1122]. The patent will only be issued when and if the Skipjack algorithm is successfully reverse-




Page 274 of 666
Applied Cryptography: Second Edition - Bruce Schneier



engineered. This gives the government the best of both worlds: the protection of a patent and the
confidentiality of a trade secret.


Chapter 14
Still Other Block Ciphers
14.1 GOST

GOST is a block algorithm from the former Soviet Union [655,1393]. “GOST” is an acronym for
“Gosudarstvennyi Standard,” or Government Standard, sort of similar to a FIPS, except that it can
(and does) refer to just about any kind of standard. (Actually, the full name is Gosudarstvennyi
Standard Soyuza SSR, or Government Standard of the Union of Soviet Socialist Republics.) This
standard is number 28147-89. The Government Committee for Standards of the USSR authorized the
standard, whoever they were.

I don™t know whether GOST 28147-89 was used for classified traffic or just for civilian encryption. A
remark at its beginning states that the algorithm “satisfies all cryptographic requirements and not
limits the grade of information to be protected.” I have heard claims that it was initially used for very
high-grade communications, including classified military communications, but I have no
confirmation.

Description of GOST

GOST is a 64-bit block algorithm with a 256-bit key. GOST also has some additional key material
that will be discussed later. The algorithm iterates a simple encryption algorithm for 32 rounds.

To encrypt, first break the text up into a left half, L. and a right half, R. The subkey for round i is Ki.
A round, i, of GOST is:

Li = Ri - 1
Ri = Li -1 f(Ri - 1, Ki)

Figure 14.1 is a single round of GOST. Function f is straightforward. First, the right half and the ith
subkey are added modulo 232. The result is broken into eight 4-bit chunks, and each chunk becomes
the input to a different S-box. There are eight different S-boxes in GOST; the first 4 bits go into the
first S-box, the second 4 bits go into the second S-box, and so on. Each S-box is a permutation of the
numbers 0 through 15. For example, an S-box might be:




Page 275 of 666
Applied Cryptography: Second Edition - Bruce Schneier



Figure 14.1 One round of GOST.

7, 10, 2, 4, 15, 9, 0, 3, 6, 12, 5, 13, 1, 8, 11

In this case, if the input to the S-box is 0, the output is 7. If the input is 1, the output is 10, and so on.
All eight S-boxes are different; these are considered additional key material. The S-boxes are to be
kept secret.

The outputs of the eight S-boxes are recombined into a 32-bit word, then the entire word undergoes
an 11-bit left circular shift. Finally, the result XORed to the left half to become the new right half, and
the right half becomes the new left half. Do this 32 times and you™re done.

The subkeys are generated simply. The 256-bit key is divided into eight 32-bit blocks: k1, k2,..., k8.
Each round uses a different subkey, as shown in Table 14.1. Decryption is the same as encryption
with the order of the kis reversed.

The GOST standard does not discuss how to generate the S-boxes, only that they are somehow
supplied [655]. This has led to speculation that some Soviet organization would supply good S-boxes
to those organizations it liked and bad S-boxes to those organizations it wished to eavesdrop on. This
may very well be true, but further conversations with a GOST chip manufacturer within Russia
offered another alternative. He generated the S-box permutations himself, using a random-number
generator.

Table 14.1
Use of GOST Subkeys in Different Rounds
Round: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Subkey: 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8
Round: 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Subkey: 1 2 3 4 5 6 7 8 8 7 6 5 4 3 2 1


More recently, a set of S-boxes used in an application for the Central Bank of the Russian Federation
surfaced. These S-boxes are also used in the GOSTone-way hash function (see section 18.11)
[657].They are listed in Table 14.2.


Cryptanalysis of GOST

These are the major differences between DES and GOST.

” DES has a complicated procedure for generating the subkeys from the keys. GOST has
a very simple procedure.
” DES has a 56-bit key; GOST has a 256-bit key. If you add in the secret S-box
permutations, GOST has a total of about 610 bits of secret information.




Page 276 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Table 14.2
GOST S-Boxes
S-box 1:
4 10 9 2 13 8 0 14 6 11 1 12 7 15 5 3
S-box 2:
14 11 4 12 6 13 15 10 2 3 8 1 0 7 5 9
S-box 3:
5 8 1 13 10 3 4 2 14 15 12 7 6 0 9 11
S-box 4:
7 13 10 1 0 8 9 15 14 4 6 12 11 2 5 3
S-box 5:
6 12 7 1 5 15 13 8 4 10 9 14 0 3 11 2
S-box 6:
4 11 10 0 7 2 1 13 3 6 8 5 9 12 15 14
S-box 7:
13 11 4 1 3 15 5 9 0 10 14 7 6 8 2 12
S-box 8:
1 15 13 0 5 7 10 4 9 2 3 14 6 11 8 12



” The S-boxes in DES have 6-bit inputs and 4-bit outputs; the S-boxes in GOST have 4-
bit inputs and outputs. Both algorithms have eight S-boxes, but an S-box in GOST is one-fourth
the size of an S-box in DES.
” DES has an irregular permutation, called a P-box; GOST uses an 11-bit left circular
shift.
” DES has 16 rounds; GOST has 32 rounds.

If there is no better way to break GOST other than brute force, it is a very secure algorithm. GOST
has a 256-bit key”longer if you count the secret S-boxes. Against differential and linear
cryptanalysis, GOST is probably stronger than DES. Although the random S-boxes in GOST are
probably weaker than the fixed S-boxes in DES, their secrecy adds to GOST™s resistance against
differential and linear attacks. Also, both of these attacks depend on the number of rounds: the more
rounds, the more difficult the attack. GOST has twice as many rounds as DES; this alone probably
makes both differential and linear cryptanalysis infeasible.

The other parts of GOST are either on par or worse than DES. GOST doesn™t have the same
expansion permutation that DES has. Deleting this permutation from DES weakens it by reducing the
avalanche effect; it is reasonable to believe that GOST is weaker for not having it. GOST™s use of
addition instead is no less secure than DES™s XOR.

The greatest difference between them seems to be GOST™s cyclic shift instead of a permutation. The
DES permutation increases the avalanche effect. In GOST a change in one input bit affects one S-box
in one round, which then affects two S-boxes in the next round, three the round after that, and so on.
GOST requires 8 rounds before a single change in an input affects every output bit; DES only
requires 5 rounds. This is certainly a weakness. But remember: GOST has 32 rounds to DES™s 16.

GOST™s designers tried to achieve a balance between efficiency and security. They modified DES™s



Page 277 of 666
Applied Cryptography: Second Edition - Bruce Schneier



basic design to create an algorithm that is better suited for software implementation. They seem to
have been less sure of their algorithm™s security, and have tried to compensate by making the key
length very large, keeping the S-boxes secret, and doubling the number of iterations. Whether their
efforts have resulted in an algorithm more secure than DES remains to be seen.

14.2 CAST

CAST was designed in Canada by Carlisle Adams and Stafford Tavares [10,7]. They claim that the
name refers to their design procedure and should conjure up images of randomness, but note the
authors™ initials. The example CAST algorithm uses a 64-bit block size and a 64-bit key.

The structure of CAST should be familiar. The algorithm uses six S-boxes with an 8-bit input and a
32-bit output. Construction of these S-boxes is implementation-dependent and complicated; see the
references for details.

To encrypt, first divide the plaintext block into a left half and a right half. The algorithm has 8
rounds. In each round the right half is combined with some key material using function f and then
XORed with the left half to form the new right half. The original right half (before the round)
becomes the new left half. After 8 rounds (don™t switch the left and right halves after the eighth
round), the two halves are concatenated to form the ciphertext.

Function f is simple:

(1) Divide the 32-bit input into four 8-bit quarters: a, b, c, d.
(2) Divide the 16-bit subkey into two 8-bit halves: e, f.
(3) Process a through S-box 1, b through S-box 2, c through S-box 3, d through S-box 4, e
through S-box 5, and f through S-box 6.
(4) XOR the six S-box outputs together to get the final 32-bit output.

Alternatively, the 32-bit input can be XORed with 32 bits of key, divided into four 8-bit quarters,
processed through the S-boxes, and then XORed together [7]. N rounds of this appears to be as secure
as N + 2 rounds of the other option.

The 16-bit subkey for each round is easily calculated from the 64-bit key. If k1, k2,..., k8 are the 8 bytes
of the key, then the subkeys for each round are:

Round 1: k1, k2
Round 2: k3, k4
Round 3: k5, k6
Round 4: k7, k8
Round 5: k4, k3
Round 6: k2, k1
Round 7: k8, k7
Round 8: k6, k5

The strength of this algorithm lies in its S-boxes. CAST does not have fixed S-boxes; new ones are
constructed for each application. Design criteria are in [10]; bent functions are the S-box columns,
selected for a number of desirable S-box properties (see Section 14.10). Once a set of S-boxes has been



Page 278 of 666
Applied Cryptography: Second Edition - Bruce Schneier



constructed for a given implementation of CAST, they are fixed for all time. The S-boxes are
implementation-dependent, but not key-dependent.

It was shown in [10] that CAST is resistant to differential cryptanalysis and in [728] that CAST is
resistant to linear cryptanalysis. There is no known way to break CAST other than brute force.

Northern Telecom is using CAST in their Entrust security software package for Macintoshes, PCs,
and UNIX workstations. The particular S-boxes they chose are not public. The Canadian government
is evaluating CAST as a new encryption standard. CAST is patent-pending.

14.3 Blowfish

Blowfish is an algorithm of my own design, intended for implementation on large microprocessors
[1388,1389]. The algorithm is unpatented, and the C code in the back of this book is in the public
domain. I designed Blowfish to meet the following design criteria.

1. Fast. Blowfish encrypts data on 32-bit microprocessors at a rate of 26 clock cycles per
byte.
2. Compact. Blowfish can run in less than 5K of memory.
3. Simple. Blowfish uses only simple operations: addition, XORs, and table lookups on
32-bit operands. Its design is easy to analyze which makes it resistant to implementation errors
[1391].
4. Variably Secure. Blowfish™s key length is variable and can be as long as 448 bits.


Blowfish is optimized for applications where the key does not change often, like a communications
link or an automatic file encryptor. It is significantly faster than DES when implemented on 32-bit
microprocessors with large data caches, such as the Pentium and the PowerPC. Blowfish is not
suitable for applications, such as packet switching, with frequent key changes, or as a one-way hash
function. Its large memory requirement makes it infeasible for smart card applications.

Description of Blowfish

Blowfish is a 64-bit block cipher with a variable-length key. The algorithm consists of two parts: key
expansion and data encryption. Key expansion converts a key of up to 448 bits into several subkey
arrays totaling 4168 bytes.

Data encryption consists of a simple function iterated 16 times. Each round consists of a key-
dependent permutation, and a key- and data-dependent substitution. All operations are additions and
XORs on 32-bit words. The only additional operations are four indexed array data lookups per
round.

Blowfish uses a large number of subkeys. These keys must be precomputed before any data
encryption or decryption.

The P-array consists of 18 32-bit subkeys:

P1, P2,..., P18

Four 32-bit S-boxes have 256 entries each:

S1,0, S1,1,..., S1,255



Page 279 of 666
Applied Cryptography: Second Edition - Bruce Schneier



S2,0, S2,1,..., S2,255
S3,0, S3,1,..., S3,255
S4,0, S4,1,..., S4,255

The exact method used to calculate these subkeys will be described later in this section.




Figure 14.2 Blowfish.

Blowfish is a Feistel network (see Section 14.10) consisting of 16 rounds. The input is a 64-bit data
element, x. To encrypt:

Divide x into two 32-bit halves: xL, xR
For i = 1 to 16:
xL = xL Pi
xR = F(xL) xR
Swap xL and xR
Swap xL and xR (Undo the last swap.)
xR = xR P17
xL = xL P18
Recombine xL and xR




Figure 14.3 Function F.




Page 280 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Function F is as follows (see Figure 14.3):

Divide xL into four eight-bit quarters:
a, b, c, and d F(xL) = ((S1,a + S2,b mod 232) S3,c) + S4,d mod 232

Decryption is exactly the same as encryption, except that P1, P2,..., P18 are used in the reverse order.

Implementations of Blowfish that require the fastest speeds should unroll the loop and ensure that all
subkeys are stored in cache. See [568] for details.

The subkeys are calculated using the Blowfish algorithm. The exact method follows.

(1) Initialize first the P-array and then the four S-boxes, in order, with a fixed string. This
string consists of the hexadecimal digits of p.
(2) XOR P1 with the first 32 bits of the key, XOR P2 with the second 32-bits of the key,
and so on for all bits of the key (up to P18). Repeatedly cycle through the key bits until the entire
P-array has been XORed with key bits.
(3) Encrypt the all-zero string with the Blowfish algorithm, using the subkeys described
in steps (1) and (2).
(4) Replace P1 and P2 with the output of step (3).
(5) Encrypt the output of step (3) using the Blowfish algorithm with the modified subkeys.
(6) Replace P3 and P4 with the output of step (5).
(7) Continue the process, replacing all elements of the P-array, and then all four S-boxes
in order, with the output of the continuously changing Blowfish algorithm.

In total, 521 iterations are required to generate all required subkeys. Applications can store the
subkeys”there™s no need to execute this derivation process multiple times.

Security of Blowfish

Serge Vaudenay examined Blowfish with known S-boxes and r rounds; a differential attack can
recover the P-array with 28r + 1 chosen plaintexts [1568]. For certain weak keys that generate bad S-
boxes (the odds of getting them randomly are 1 in 214), the same attack requires only 24r + 1 chosen
plaintexts to recover the P-array. With unknown S-boxes this attack can detect whether a weak key is
being used, but cannot determine what it is (neither the S-boxes nor the P-array). This attack only
works against reduced-round variants; it is completely ineffective against 16-round Blowfish.

Of course, the discovery of weak keys is significant, even though they seem impossible to exploit. A
weak key is one in which two entries for a given S-box are identical. There is no way to check for
weak keys before doing the key expansion. If you are worried, you have to do the key expansion and
check for identical S-box entries. I don™t think this is necessary, though.

I know of no successful cryptanalysis against Blowfish. To be safe, do not implement Blowfish with a
reduced number of rounds.

Kent Marsh Ltd. has incorporated Blowfish in their FolderBolt security product for Microsoft
Windows and Macintosh. It is also part of Nautilus and PGPfone.




Page 281 of 666
Applied Cryptography: Second Edition - Bruce Schneier




14.4 SAFER

SAFER K-64 stands for Secure And Fast Encryption Routine with a Key of 64 bits [1009]. James
Massey produced this nonproprietary algorithm for Cylink Corp. and it is incorporated into some of
their products. The government of Singapore is planning to use this algorithm”with a 128-bit key
[1010]”for a wide variety of applications. There are no patent, copyright, or other restrictions on its
use.

The algorithm has a block and key size of 64 bits. It is not a Feistel network like DES (see Section
14.10), but an iterated block cipher: The same function is applied for some number of rounds. Each
round uses two 64-bit subkeys, and the algorithm only uses operations on bytes.

Description of SAFER K-64

The plaintext block is divided into eight byte-length sub-blocks: B1, B2,..., B7, B8. Then the sub-blocks
go through r rounds. Finally, an output transformation is applied to the sub-blocks. Each round uses
two subkeys: K2i - 1 and K2i.

Figure 14.4 shows one round of SAFER K-64. First, sub-blocks are either XORed or added with bytes
of subkey K2i - 1. Then, the eight sub-blocks are subjected to one of two nonlinear transformations:

y = 45x mod 257. (If x = 128, then y = 0.)
y = log45 x. (If x = 0, then y = 128.)


These are operations in the finite field GF(257), and 45 is a primitive element in that field. In practical
implementations of SAFER K-64, it is quicker to implement this in a lookup table than to calculate
new results all the time.

Then, sub-blocks are either XORed or added with bytes of subkey K2r. The results of this operation
are fed through three layers of linear operations designed to increase the avalanche effect. Each
operation is called a Pseudo-Hadamard Transform (PHT). If the inputs to a PHT are a1 and a2, then
the outputs are:

b1 = (2a1 + a2) mod 256
b2 = (a1 + a2) mod 256

After r rounds, there is a final output transformation. This is the same as the first step of each round.
B1, B4, B5, and B8 are XORed with the corresponding bytes of the last subkey, and B2, B3, B6, and B7
are added to the corresponding bytes of the last subkey. The result is the ciphertext.




Page 282 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Figure 14.4 One round of SAFER.

Decryption is the reverse process: the output transformation (with subtraction instead of addition),
then r reverse rounds. The Inverse PHT (IPHT) is:

a1 = (b1 “ b2) mod 256
a2 = (“b1 + 2b2) mod 256

Massey recommends 6 rounds, but you can increase that if you want greater security.

Generating subkeys is easy. The first subkey, K1, is simply the user key. Subsequent subkeys are
generated by the following procedure:

Ki+1 = (K1 <<< 3i) + ci

The symbol “<<<” is a left circular shift or a left rotation. The rotation is byte by byte, and ci is a
round constant. If cij is the jth byte of the ith round constant, then you can calculate all of the round
constants by the formula

cij = 4545^((9i + j) mod 256) mod 257 mod 257

Generally, these values are stored in a table.

SAFER K-128

This alternate key schedule was developed by the Ministry of Home Affairs in Singapore, and then
incorporated into SAFER by Massey [1010]. It uses two keys, Ka and Kb, each 64-bits long. The trick
is to generate two subkey sequences in parallel, and then alternate subkeys from each sequence. This
means that if you choose Ka = Kb, then the 128-bit key is compatible with the 64-bit key Ka.

Security of SAFER K-64

Massey showed that SAFER K-64 is immune to differential cryptanalysis after 8 rounds and is
adequately secure against the attack after 6 rounds. After only 3 rounds linear cryptanalysis is
ineffective against this algorithm [1010].

Knudsen found a weakness in the key schedule: For virtually every key, there exists at least one (and
sometimes as many as nine) other key that encrypts some different plaintext to identical ciphertexts
[862]. The number of different plaintexts that encrypt to identical ciphertexts after 6 rounds is




Page 283 of 666
Applied Cryptography: Second Edition - Bruce Schneier



anywhere from 222 to 228. While this attack may not impact SAFER™s security when used as an
encryption algorithm, it greatly reduces its security when used as a one-way hash function. In any
case, Knudsen recommends at least 8 rounds.

SAFER was designed for Cylink, and Cylink is tainted by the NSA [80]. I recommend years of intense
cryptanalysis before using SAFER in any form.

14.5 3-Way

3-Way is a block cipher designed by Joan Daemen [402,410]. It has a 96-bit block length and key
length, and is designed to be very efficient in hardware.

3-Way is not a Feistel network, but it is an iterated block cipher. 3-Way can have n rounds; Daemen
recommends 11.

Description of 3-Way

The algorithm is simple to describe. To encrypt a plaintext block, x:

For i = 0 to n “ 1
x = x XOR Ki
x = theta (x)
x = pi “ 1 (x)
x = gamma (x)
x = pi “ 2 (x)
x = x Kn
x = theta (x)

The functions are:

” theta(x) is a linear substitution function”basically a bunch of circular shifts and
XORs.
” pi“1(x) and pi“2(x) are simple permutations.
” gamma(x) is a nonlinear substitution function. This is the step that gives 3-Way its
name; it is the parallel execution of the substitution step on 3-bit blocks of the input.

Decryption is similar to encryption, except that the bits of the input have to be reversed and the bits of
the output have to be reversed. Code to implement 3-Way can be found in the back of this book.

So far, there has been no successful cryptanalysis of 3-Way. The algorithm is unpatented.

14.6 Crab

This algorithm was developed by Burt Kaliski and Matt Robshaw of RSA Laboratories [810]. The
idea behind Crab is to use techniques from one-way hash functions to make a fast encryption
algorithm. Hence, Crab is very similar to MD5, and this section assumes you are familiar with Section
18.5.

Crab has a very large block: 1024 bytes. Since Crab is presented more as a research contribution than
a real algorithm, no definitive key-generation routines are presented. The authors suggest a method
that could turn an 80-bit key into three requisite subkeys, although the algorithm could easily accept




Page 284 of 666
Applied Cryptography: Second Edition - Bruce Schneier



variable-length keys.

Crab uses two sets of large subkeys:

A permutation of the numbers 0 through 255: P 0, P1, P2,..., P255.
A 2048-entry array of 32-bit numbers: S0, S1, S2,..., S2047.


These subkeys must all be calculated before encryption or decryption.

To encrypt a 1024-byte block X:

(1) Divide X into 256 32-bit sub-blocks: X0, X1, X2,..., X255.
(2) Permute the sub-blocks of X according to P.
(3) For r = 0 to 3
For g = 0 to 63
A = X(4g)<<<2r
B = X(4g+1)<<<2r
C = X(4g+2)<<<2r
D = X(4g+3)<<<2r
For step s = 0 to 7
A = A (B + fr(B,C,D) + S512r+8 g+s)
TEMP = D
D=C
C=B
B = A <<< 5
A = TEMP
X(4g)<<<2r = A
X(4g+1)<<<2r = B
X(4g+2)<<<2r = C
X(4g+3)<<<2r = D
(4) Recombine X0, X1, X2,..., X255 to form the ciphertext.

The functions fr(B,C,D) are similar to those used in MD5:

f0(B,C,D) = (B Λ C) ν ((¬ B) Λ D)
f1(B,C,D) = (B Λ D) ν (C Λ (¬ D))
f2(B,C,D) = B C D
f3(B,C,D) = C (B ν (¬ D))

Decryption is the reverse process.

Generating the subkeys is a large task. Here is how the permutation array, P, could be generated from
an 80-bit key, K.

(1) Initialize K0, K1, K2,..., K9 with the 10 bytes of K.
(2) For i = 10 to 255



Page 285 of 666
Applied Cryptography: Second Edition - Bruce Schneier



Ki = Ki - 2 Ki - 6 Ki - 7 Ki - 10
(3) For i = 0 to 255, Pi = i
(4) m = 0
(5) For j = 0 to 1
For i = 256 to 1 step -1
m = (K256 - i + K257 - i) mod i
K257 - i = K257 - i <<< 3
Swap Pi and Pi - 1

The S-array of 2048 32-bit words could be generated in a similar manner, either from the same 80-bit
key or from another key. The authors caution that these details should “be viewed as motivational;
there may very well be alternative schemes which are both more efficient and offer improved
security” [810].

Crab was proposed as a testbed of new ideas and not as a working algorithm. It uses many of the
same techniques as MD5. Biham has argued that a very large block size makes an algorithm easier to
cryptanalyze [160]. On the other hand, Crab may make efficient use of a very large key. In such a
case, “easier to cryptanalyze” might not mean much.

14.7 SXAL8/MBAL

This is a 64-bit block algorithm from Japan [769]. SXAL8 is the basic algorithm; MBAL is an
expanded version with a variable block length. Since MBAL does some clever things internally, the
authors claim that they can get adequate security with only a few rounds. With a block length of 1024
bytes, MBAL is about 70 times faster than DES. Unfortunately, [1174] shows that MBAL is
susceptible to differential cryptanalysis, and [865] shows that it is susceptible to linear cryptanalysis.

14.8 RC5

RC5 is a block cipher with a variety of parameters: block size, key size, and number of rounds. It was
invented by Ron Rivest and analyzed by RSA Laboratories [1324,1325].

There are three operations: XOR, addition, and rotations. Rotations are constant-time operations on
most processors and variable rotations are a nonlinear function. These rotations, which depend on
both the key and the data, are the interesting operation.

RC5 has a variable-length block, but this example will focus on a 64-bit data block. Encryption uses
2r + 2 key-dependent 32-bit words”S0, S1, S2,..., S2r + 1”where r is the number of rounds. We™ll
generate those words later. To encrypt, first divide the plaintext block into two 32-bit words: A and B.
(RC5 assumes a little-endian convention for packing bytes into words: The first byte goes into the
low-order bit positions of register A, etc.) Then:

A = A + S0
B = B + S1
For i = 1 to r:
A = ((A B) <<< B) + S2i
B = ((B A) <<< A) + S2i + 1

The output is in the registers A and B.




Page 286 of 666
Applied Cryptography: Second Edition - Bruce Schneier




Decryption is just as easy. Divide the plaintext block into two words, A and B, and then:

For i = r down to 1:
B = ((B “ S2i + 1) >>> A) A
A = ((A “ S2i) >>> B) B
B = B “ S1
A = A “ S0

The symbol “>>>” is a right circular shift. Of course, all addition and subtraction are mod 232.

Creating the array of keys is more complicated, but also straightforward. First, copy the bytes of the
key into an array, L, of c 32-bit words, padding the final word with zeros if necessary. Then, initialize
an array, S, using a linear congruential generator mod 232:

S0 = P
for i = 1 to 2(r + 1) “ 1:
Si = (Si - 1 + Q) mod 232

P = 0xb7e15163 and Q = 0x9e3779b9; these constants are based on the binary representation of e and
phi.

Finally, mix L into S:

<<

. 12
( 29)



>>