zp = Cp^dp (mod p) and zq = Cq^dq (mod q).

The desired solution is given by

Cd (mod N) = (a·zp + b·zq) (mod N). (7.15)

To see that this actually works, consider the case where N = 33, p = 11,
q = 3, d = 7 (then e = 3, but we do not need the encryption exponent
here). Suppose we want to decrypt C = 5, that is, we want to determine
5^7 (mod 33). We have

dp = 7 (mod 10) = 7 and dq = 7 (mod 2) = 1

and we find that a = 12 and b = 22 satisfy the required conditions given
in (7.13) and (7.14), respectively. Then

Cp = 5 (mod 11) = 5 and Cq = 5 (mod 3) = 2

and, therefore,

zp = Cp^dp = 5^7 = 3 (mod 11) and zq = Cq^dq = 2^1 = 2 (mod 3). (7.16)

Finally, we have

Cd (mod N) = 5^7 = 12·3 + 22·2 (mod 33) = 14,

which shows that, at least for this simple example, (7.15) holds.
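As a sanity check, the CRT computation above can be sketched in Python; the helper name crt_decrypt and the derivation of a and b via modular inverses are our own illustration, not from the text:

```python
# Sketch of RSA decryption via the CRT, as in (7.15). Values
# reproduce the example in the text: N = 33, p = 11, q = 3, d = 7.
def crt_decrypt(C, d, p, q):
    N = p * q
    dp, dq = d % (p - 1), d % (q - 1)
    zp = pow(C % p, dp, p)
    zq = pow(C % q, dq, q)
    a = q * pow(q, -1, p)   # a = 1 (mod p), a = 0 (mod q); here a = 12
    b = p * pow(p, -1, q)   # b = 0 (mod p), b = 1 (mod q); here b = 22
    return (a * zp + b * zq) % N

print(crt_decrypt(5, 7, 11, 3))   # 14, matching 5^7 (mod 33)
```

The modular inverses (three-argument pow with a negative exponent) require Python 3.8 or later.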

Another significant speedup in modular exponentiation is provided by
Montgomery multiplication [103]. Our discussion here follows the brief but
excellent description given in [24].
Suppose we want to compute ab (mod N). The most expensive part of
this operation is the modular reduction, which, in the naive approach, requires
a division. However, in some cases, modular reduction can be accomplished
without division.
One case where the modular reduction is easy occurs when the modulus
is of the special form N = m^k - 1 for some m and k. Suppose we have a
modulus N of this form and we want to determine, say, ab = c (mod N).
Then there exist c0 and c1 such that c = c1·m^k + c0, with 0 ≤ c0 < m^k. If
we can find such c0 and c1, then we have

c = c1·m^k + c0
  = c1·m^k - c1 + c1 + c0
  = c1(m^k - 1) + c1 + c0
  = c1 + c0 (mod (m^k - 1)).

That is, if we can find integers c0 and c1 such that c = c1·m^k + c0, then we
have c (mod N) = c0 + c1, provided N = m^k - 1. In some cases, it is easy to
find the required co and c1. Consider, for example, 3089 (mod 99). For this
example we have
3089 = 30·100 + 89
     = 30·100 - 30 + 30 + 89
     = 30(100 - 1) + (30 + 89)
     = 30·99 + 119 = 119 (mod 99).

Here, we need one final step where we subtract 99 from 119 to obtain the
desired result, namely, 3089 = 20 (mod 99). Provided that c = ab satisfies
the condition c < N^2 (which will be the case if a < N and b < N), the desired
result is either c0 + c1 or, if this sum is greater than N, the desired result
is c0 + c1 - N. In this latter case, we say that an extra reduction is required.
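A minimal sketch of this division-free reduction, assuming c < N^2 so that at most one extra subtraction is needed (the function name is ours):

```python
# Sketch of the special-form reduction for N = m**k - 1: split
# c = c1*m**k + c0 and use c = c1 + c0 (mod N), with one "extra
# reduction" (a subtraction) if the sum reaches N. Assumes c < N**2.
def reduce_special(c, m, k):
    N = m**k - 1
    c1, c0 = divmod(c, m**k)   # digit splitting when m is the base
    r = c1 + c0
    return r - N if r >= N else r

print(reduce_special(3089, 10, 2))   # 20, matching 3089 (mod 99)
```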
The Montgomery multiplication algorithm is somewhat analogous to the
process in the previous paragraph, but it works for any modulus N. Again,
suppose we want to compute
ab (mod N). (7.17)

In the Montgomery algorithm, we choose R = 2^k, where k is large enough
so that we have R > N, and gcd(R, N) = 1. Since R is a power of two,
determining any result modulo R is trivial, at least for a computer, where
numbers are in binary. Also, since R and N are relatively prime, we can
find N' and R' such that

RR' - NN' = 1.

Now, instead of dealing directly with a and b we work with the numbers
a' = aR (mod N) and b' = bR (mod N). We say that a' and b' are in
Montgomery form. Converting a and b to Montgomery form appears to be a
step backwards, since in (7.17) we only have a single mod N operation, and
we now have two such operations (at least). However, dealing with numbers
in Montgomery form will actually prove to be highly advantageous when doing
modular exponentiation, where repeated multiplication is required. We
return to this point below, but for now we simply want to show that we can
efficiently multiply two numbers in Montgomery form and obtain a result in
Montgomery form.
Observe that

a'b' = abR^2.

We would like this result to be in Montgomery form, that is, we want to
have abR (mod N), not abR^2. Since RR' = 1 (mod N), multiplication by R'
yields abR^2·R' = abR (mod N), that is, we can obtain the desired result
by multiplying by R' and reducing the result modulo N. However, we want
to avoid mod N operations, if possible. Therefore, what we chiefly need
is an efficient method to convert abR^2 to abR (mod N). The Montgomery
algorithm provides just such a method.
Let X = abR^2 and compute

m = (X (mod R)) · N' (mod R), (7.18)

which is efficient, since all mod R operations are efficient. Next, let

x = (X + mN)/R (7.19)

and return x, unless x ≥ N, in which case return x - N; that is, an extra
reduction may be required.
We now want to verify that the algorithm in the previous paragraph
gives us abR (mod N). To see that this is the case, first observe that m
is the product of N' and the remainder that results when X is divided by R.
Also, from the definition of N' we have NN' = -1 (mod R). Consequently,
X + mN = X - (X (mod R)) = 0 (mod R) and, therefore, X + mN is divisible by R.
Furthermore, since R = 2^k, this division is, in binary, simply a shift by k
bits and, consequently, the division in (7.19) is trivial to compute. It follows
that xR = X + mN = X (mod N) and, therefore, xRR' = XR' (mod N).
Finally, from the definition of R' we have RR' = 1 (mod N) so that

x = xRR' = XR' = abR^2·R' = abR (mod N),
as desired.
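The steps above can be sketched as follows; the function name mont_mul and the choice k = 7 are ours:

```python
# Sketch of Montgomery multiplication per (7.18) and (7.19), with
# R = 2**k. Inputs a_m and b_m are assumed to be in Montgomery form.
def mont_mul(a_m, b_m, N, k):
    R = 1 << k
    N_prime = (-pow(N, -1, R)) % R   # NN' = -1 (mod R)
    X = a_m * b_m
    m = ((X % R) * N_prime) % R      # (7.18): only mod-R arithmetic
    x = (X + m * N) >> k             # (7.19): exact division by R is a shift
    return x - N if x >= N else x    # extra reduction

N, k = 79, 7                         # R = 128 > N and gcd(128, 79) = 1
R = 1 << k
a_m, b_m = (61 * R) % N, (5 * R) % N
print(mont_mul(a_m, b_m, N, k) == (61 * 5 * R) % N)   # True
```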
An example should clarify the Montgomery algorithm [24]. Suppose that
we have N = 79 and a = 61 and b = 5. Since humans prefer powers of 10
to powers of two, and this example is intended for human consumption, we
choose R = 10^2 = 100. Then

a' = 61·100 = 17 (mod 79) and b' = 5·100 = 26 (mod 79).

Via the Euclidean Algorithm, we find

64·100 - 81·79 = 1,

which implies R' = 64 and N' = 81.
In Montgomery form, we have

a' = aR (mod N) = 17 and b' = bR (mod N) = 26

and we want to determine abR (mod N), which is in Montgomery form.
From (7.18) we compute X = 17·26 = 442, and

m = (X (mod R)) · N' (mod R)
  = (442 (mod 100)) · 81 (mod 100)
  = 42·81 = 3402 = 2 (mod 100).

Then from (7.19) we have

x = (X + mN)/R
  = (442 + (2·79))/100
  = 600/100 = 6.

It is easily verified that this is the correct result, since

abR = 61·5·100 = 30,500 = 6 (mod 79).
Conversion from Montgomery form into regular (non-Montgomery) form
is straightforward, at the cost of one mod N reduction. Given abR (mod N),
since RR' = 1 (mod N), we have

abRR' (mod N) = ab (mod N).

In the example above, R' = 64 and we have

(abR)R' = 6·64 = 384 = 68 (mod 79).

We can directly verify that this is the correct answer since

ab (mod N) = 61·5 = 305 = 68 (mod 79).

Montgomery multiplication is certainly more work than it is worth in the
simple example considered above. However, suppose that instead of computing
ab (mod N), we want to compute a^d (mod N). Then to use the
Montgomery algorithm, we must pay the price of converting a into Montgomery
form, but having done so, all of the multiplications required in the
computation of a^d (mod N) can be computed using (7.18) and (7.19) (and
extra reductions, as required), without any expensive division operations.
The final result must be converted from Montgomery form back into
non-Montgomery form, which requires one additional mod N operation. The
bottom line is that only two expensive mod N operations are required, since
the multiplications are all computed in Montgomery form which only requires
efficient mod R operations. With respect to the timing attacks discussed
below, the extra reduction step provides a crucial timing difference that an
attacker can exploit in some circumstances.
Other tricks are also used to speed up modular exponentiation. Of these,
the sliding window and Karatsuba multiplication are the most significant.
A sliding window is a straightforward time-memory trade-off applied to the
repeated squaring algorithm. That is, instead of processing each bit individ-
ually, we process the bits in blocks (say, blocks of five consecutive bits) and
use pre-computed tables containing the required factors.
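To illustrate the windowing idea, here is a simplified fixed-width variant in Python (OpenSSL's actual sliding-window code is more elaborate; the function name and width are our own choices):

```python
# Sketch of windowed exponentiation: process the exponent d in
# w-bit blocks, using a precomputed table of x**0 .. x**(2**w - 1)
# (mod N) so each block costs w squarings plus one multiply.
def window_pow(x, d, N, w=5):
    table = [1]
    for _ in range(1, 1 << w):
        table.append((table[-1] * x) % N)
    bits = bin(d)[2:]
    bits = bits.zfill(-(-len(bits) // w) * w)   # pad to a multiple of w
    result = 1
    for i in range(0, len(bits), w):
        for _ in range(w):
            result = (result * result) % N      # w squarings
        result = (result * table[int(bits[i:i+w], 2)]) % N
    return result

print(window_pow(5, 7, 33))   # 14, same as pow(5, 7, 33)
```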
Karatsuba multiplication [76] is the most efficient method to multiply two
numbers with the same number of digits, assuming that addition is much
cheaper than multiplication. The work factor for Karatsuba multiplication is
on the order of n^(log2 3) = n^1.585 multiplications, where n is the number of bits
in each of the numbers to be multiplied, whereas normal "long multiplication"
has a work factor on the order of n^2.
The Karatsuba algorithm is based on a simple observation. The naive
approach to computing the product (a0 + a1·10)(b0 + b1·10) is

(a0 + a1·10)(b0 + b1·10) = a0·b0 + (a0·b1 + a1·b0)·10 + a1·b1·10^2,

which requires four multiplications to determine the coefficients of the powers
of ten. However, the same can be accomplished with just three multiplications,
since

(a0 + a1·10)(b0 + b1·10)
  = a0·b0 + [(a0 + a1)(b0 + b1) - a0·b0 - a1·b1]·10 + a1·b1·10^2 (7.20)

and this is the essential idea behind Karatsuba multiplication.
The Karatsuba technique can be used for numbers of any magnitude. For
example, suppose that we want to find the product

(c0 + c1·10 + c2·10^2 + c3·10^3)(d0 + d1·10 + d2·10^2 + d3·10^3).

We can rewrite the first term as

(c0 + c1·10) + (c2 + c3·10)·10^2 = C0 + C1·10^2,

where C0 = c0 + c1·10 and C1 = c2 + c3·10. Similarly, we can rewrite the
second term as

(d0 + d1·10) + (d2 + d3·10)·10^2 = D0 + D1·10^2,

where D0 = d0 + d1·10 and D1 = d2 + d3·10. In this case, the Karatsuba
product is given by

(C0 + C1·10^2)(D0 + D1·10^2)
  = C0·D0 + [(C0 + C1)(D0 + D1) - C0·D0 - C1·D1]·10^2 + C1·D1·10^4.

Here, the three products involving the Ci and Di are computed as in (7.20).
Consequently, given any product, we can recursively apply the Karatsuba
multiplication technique. At each step in the recursion, three multiplications
are required, and the numbers are half as big as at the previous step. A
straightforward analysis yields the claimed work factor of n^1.585.
Note that the Karatsuba algorithm holds if the base 10 (or 10^2) is replaced
by any other base. Also, the algorithm is most efficient if the two numbers
to be multiplied are of about the same magnitude.
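The recursion can be sketched as follows, splitting in base 10 as in the text (real implementations split on machine words; the function name is ours):

```python
# Sketch of recursive Karatsuba multiplication of nonnegative
# integers: split each factor at half the digit length, then combine
# three recursive products as in (7.20).
def karatsuba(x, y):
    if x < 10 or y < 10:
        return x * y
    half = max(len(str(x)), len(str(y))) // 2
    base = 10 ** half
    x1, x0 = divmod(x, base)
    y1, y0 = divmod(y, base)
    low = karatsuba(x0, y0)
    high = karatsuba(x1, y1)
    # the cross term reuses low and high, so only three multiplications
    mid = karatsuba(x0 + x1, y0 + y1) - low - high
    return high * base**2 + mid * base + low

print(karatsuba(1234, 5678))   # 7006652, same as 1234 * 5678
```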
At this point, we have more than enough background to discuss the
three timing attacks mentioned above. First, we consider Kocher's attack,
which only applies to systems that use repeated squaring, but not CRT
or Montgomery multiplication. Kocher's attack has been successfully applied
to smartcards. Then we discuss Schindler's method, which can be used
when CRT and Montgomery multiplication are employed. Finally, we present
the justifiably famous Brumley-Boneh attack, which succeeds against RSA
as implemented in a version of OpenSSL in a realistic scenario (over a network).
The OpenSSL implementation of RSA is highly optimized, using CRT,
Montgomery multiplication, sliding windows and Karatsuba's algorithm. As
of this writing, the Brumley-Boneh attack stands as the greatest success in
the relatively young field of timing attacks. We note in passing that timing
attacks have recently been directed at symmetric ciphers [la] but, to date,
these have proven far less of a realistic threat than timing attacks on public
key cryptosystems.

Kocher's Attack
The basic idea behind Kocher's timing attack [85] is elegant, yet reasonably
straightforward. Suppose that the repeated squaring algorithm in Table 7.4
is used for modular exponentiation in RSA. Also, suppose that the time
taken by the multiplication operation, s = s·x (mod N) in Table 7.4, varies
depending on the values of s and x. Furthermore, we assume the attacker
is able to determine the timings that will occur, given particular values of s
and x.

Given this scenario, Kocher views the problem as a signal detection problem,
where the "signal" consists of the timing variations (which are dependent
on the unknown private key bits di, for i = 1, 2, . . . , n). The signal is corrupted
by "noise," which is the result of the remaining unknown private key bits di. The
objective is to recover the bits di one (or a few) at a time, beginning with
the first unknown bit, d1. In practice, it is not necessary to recover all of the
bits, since an algorithm due to Coppersmith [31] is feasible once a sufficient
number of the high-order bits of d are known.
Suppose we have successfully determined bits d0, d1, . . . , dk-1 and we want
to determine bit dk. Then we randomly select several ciphertexts, say, Cj,
for j = 0, 1, 2, . . . , m - 1, and for each we obtain the timing T(Cj) for the
decryption Cj^d (mod N). For each of these ciphertext values, we can precisely
emulate the repeated squaring algorithm in Table 7.4 for i = 1, 2, . . . , k - 1,
and at the i = k step we can emulate both of the possible bit values, dk = 0
and dk = 1. Then we tabulate the differences between the measured timing
and both of the emulated results. Kocher's crucial observation is that the
variance of the differences will be smaller for the correct choice of dk than for
the incorrect choice.
For example, suppose we are trying to obtain a private key that is only
eight bits in length. Then

d = (d0, d1, d2, d3, d4, d5, d6, d7) with d0 = 1.

Furthermore, suppose that we are certain that

d0d1d2d3 ∈ {1010, 1001}.

Then we generate some number of random ciphertexts Cj, and for each, we
obtain the corresponding timing T(Cj). We can emulate the first four steps
of the repeated squaring algorithm for both

d0d1d2d3 = 1010 and d0d1d2d3 = 1001

for each of these ciphertexts. For a given timing T(Cj), let tℓ be the actual
time taken in step ℓ for the squaring and multiplying steps of the repeated
squaring algorithm. That is, tℓ includes the timing of s = s^2 (mod N) and,
if dℓ = 1, it also includes s = s·Cj (mod N) (see the algorithm in Table 7.4).
Also, let t̃ℓ be the time obtained when emulating the square and multiply
steps for an assumed private exponent bit dℓ. For m > ℓ, define the shorthand
notation

t̃ℓ...m = t̃ℓ + t̃ℓ+1 + · · · + t̃m.

Of course, this depends on the precise bits emulated, but to simplify the
notation we do not explicitly state this dependence (it should be clear from
context).
Now suppose we select four ciphertexts, C0, C1, C2, C3, and we obtain the
timing results in Table 7.5. In this example we see that for d0d1d2d3 = 1010
we have a mean timing of

E(T(Cj) - t̃0...3) = (7 + 6 + 6 + 5)/4 = 6,

while the corresponding variance is

var(T(Cj) - t̃0...3) = (1^2 + 0^2 + 0^2 + (-1)^2)/4 = 1/2.

On the other hand, for d0d1d2d3 = 1001, we have

E(T(Cj) - t̃0...3) = 6,

but the variance is

var(T(Cj) - t̃0...3) = ((-1)^2 + 1^2 + (-1)^2 + 1^2)/4 = 1.

Although the mean is the same in both cases, Kocher's attack tells us that
the smaller variance indicates that d0d1d2d3 = 1010 is the correct answer.
But this begs the question of why we should observe a smaller variance in
the case of a correct guess for d0d1d2d3.

Table 7.5: Timings and Emulated Timings

j   T(Cj)   Emulate 1010   Emulate 1001
0    11          4              6
1    11          5              4
2    12          6              7
3    13          8              6
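The variance test can be checked numerically, using illustrative timings chosen to match the statistics quoted in the text (mean 6 in both cases, variance 1/2 versus 1); the exact entries of Table 7.5 may differ:

```python
# Sketch of Kocher's variance comparison: the guess with the smaller
# variance of (measured timing - emulated timing) is the correct one.
from statistics import mean, pvariance

T = [11, 11, 12, 13]          # measured decryption timings T(Cj)
emu_1010 = [4, 5, 6, 8]       # emulated timings for d0d1d2d3 = 1010
emu_1001 = [6, 4, 7, 6]       # emulated timings for d0d1d2d3 = 1001

for label, emu in (("1010", emu_1010), ("1001", emu_1001)):
    diffs = [t - e for t, e in zip(T, emu)]
    print(label, mean(diffs), pvariance(diffs))
# The smaller variance (1/2 for 1010 versus 1 for 1001) identifies
# the correct bits, even though the means are equal.
```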

Consider T(Cj), the timing of a particular computation Cj^d (mod N) in
Table 7.5. As above, for this T(Cj), let t̃ℓ be the emulated timing for the
square and multiply steps corresponding to the ℓth bit of the private exponent.
Also, let tℓ be the actual timing of the square and multiply steps
corresponding to the ℓth bit of the private exponent. Let u include all timing
not accounted for in the tℓ. The value u can be viewed as representing
the measurement "error". In the example above, we assumed the private
exponent d is eight bits, so for this case

T(Cj) = t0 + t1 + t2 + · · · + t7 + u.
Now suppose that the high-order bits of d are d0d1d2d3 = 1010. Then for
the timing T(Cj) we have

var(T(Cj) - t̃0...3) = var(t4) + var(t5) + var(t6) + var(t7) + var(u),

since t̃ℓ = tℓ for ℓ = 0, 1, 2, 3 and, consequently, there is no variance due
to these emulated timings t̃ℓ. Note that here, we are assuming the tℓ are
independent and that the measurement error u is independent of the tℓ, which
appear to be valid assumptions. If we denote the common variance of each tℓ
by var(t), we have

var(T(Cj) - t̃0...3) = 4·var(t) + var(u).

However, if d0d1d2d3 = 1010, but we emulate d0d1d2d3 = 1001, then
from the point of the first dj that is in error, our emulation will fail, giving
us essentially random timing results. In this case, the first emulation error
occurs at d2 so that we find

var(T(Cj) - t̃0...3) = var(t2 - t̃2) + var(t3 - t̃3) + var(t4) + var(t5)
                      + var(t6) + var(t7) + var(u)
                    > 4·var(t) + var(u),

since the emulated timings t̃2 and t̃3 can vary from the actual timings t2
and t3, respectively.
Although conceptually simple, Kocher's technique gives a powerful and
practical approach to conducting a timing attack on an RSA implementation
that uses repeated squaring (but not CRT or Montgomery multiplication).
For the attack to succeed, the variance of the error term u must not vary too
greatly between the different cases that are tested. Assuming that a simple
repeated squaring algorithm is employed, this would almost certainly be the
case since u only includes loop overhead and timing error. For more advanced
modular exponentiation techniques, var(u) could differ greatly for different
emulated bits, effectively masking the timing information needed to recover
the bits of d.
The amount of data required for Kocher's attack (that is, the number of
chosen decryptions that must be timed) depends on the error term u. Note
that timings can be reused as bits of d are determined, since, given additional
bits of d, only the emulation steps need to change. Therefore, the required
number of timings is not nearly as daunting as it might appear at first blush.
The major limitation to Kocher's attack is that repeated squaring, without
CRT or Montgomery multiplication, is only used in RSA implementations
in highly resource-constrained environments, such as smartcards. In [85],
Kocher argues that his timing attack, as discussed in this section, should
work for RSA implementations that employ CRT. However, Schindler [129]
(among others) disputes this assertion. The next two timing attacks we
discuss will succeed against RSA implementations that utilize more highly
optimized modular exponentiation techniques.

Schindler's Attack
Schindler [129] gives a timing attack that succeeds against RSA implementations
that employ repeated squaring and both CRT and Montgomery multiplication
(but not both Karatsuba multiplication and long multiplication).
First, we describe the precise modular exponentiation scenario for which
Schindler's attack will succeed. Then we discuss Schindler's attack in some
detail.
We assume that the Montgomery multiplication algorithm is implemented
as given in Table 7.6. The repeated squaring algorithm using Montgomery
multiplication is given in Table 7.7.

Table 7.6: Montgomery Multiplication

// Find Montgomery product a'b',
// where a' = aR (mod N) and b' = bR (mod N)
// Given RR' - NN' = 1
Montgomery(a', b')
    z = a'b'
    r = (z (mod R)) · N' (mod R)
    s = (z + rN)/R
    if s ≥ N then
        s = s - N // extra reduction
    end if
    return(s)
end Montgomery

Table 7.7: Repeated Squaring with Montgomery Multiplication

// Find y = x^d (mod N),
// where d = (d0, d1, d2, . . . , dn-1) with d0 = 1
t' = xR (mod N) // Montgomery form
s' = t'
for i = 1 to n - 1
    s' = Montgomery(s', s')
    if di == 1 then
        s' = Montgomery(s', t')
    end if
next i
t = s'R' (mod N) // convert to non-Montgomery form
return(t)
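As a cross-check, the two tables can be combined into a Python sketch; the names are ours, and a CRT implementation would run this once mod p and once mod q before recombining:

```python
# Sketch of Tables 7.6 and 7.7: repeated squaring in which every
# multiply is a Montgomery multiply, with R = 2**k.
def montgomery(a_m, b_m, N, k, N_prime):
    R = 1 << k
    z = a_m * b_m
    r = ((z % R) * N_prime) % R
    s = (z + r * N) >> k
    return s - N if s >= N else s     # extra reduction

def mont_pow(x, d, N, k):
    R = 1 << k
    N_prime = (-pow(N, -1, R)) % R    # NN' = -1 (mod R)
    R_prime = pow(R, -1, N)           # RR' = 1 (mod N)
    t = (x * R) % N                   # Montgomery form
    s = t
    for bit in bin(d)[3:]:            # leading bit d0 = 1 gives s = t
        s = montgomery(s, s, N, k, N_prime)
        if bit == "1":
            s = montgomery(s, t, N, k, N_prime)
    return (s * R_prime) % N          # back to non-Montgomery form

print(mont_pow(5, 7, 33, 6) == pow(5, 7, 33))   # True
```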

Suppose that the RSA system we want to attack uses the repeated squaring
algorithm in Table 7.7 (which relies on the Montgomery multiplication
algorithm in Table 7.6). Also, suppose that the RSA system uses CRT. Then
for each mod N reduction, where N = pq, we compute a mod p reduction
and a mod q reduction, using the algorithm in Table 7.7 for both. We
combine these two results to obtain the desired mod N reduction, as discussed
above in Section 7.4.1. We assume that the attacker is able to choose
ciphertext messages Cj and accurately time the decryption, that is, the
computation Cj^d (mod N). Of course, the objective is to determine the private
key d.
Schindler's timing attack [129] takes advantage of the extra reduction step
in the Montgomery algorithm. Schindler derives precise probabilities that an
extra reduction occurs when using the Montgomery algorithm. Suppose that
we compute Montgomery(a', B) using the algorithm in Table 7.6, assuming
that a' = aR (mod N) and B is randomly selected in {0, 1, 2, . . . , N - 1}.

Then Schindler shows that for each application of the Montgomery algorithm,
the probability of an extra reduction is

P(extra reduction in Montgomery(a', B)) = a'/(2R). (7.21)
This gives us a useful probability for an extra reduction in the "multiply" step
of the repeated squaring algorithm in Table 7.7. For the "square" step, where
the element to be squared, say B, is selected at random in {0, 1, 2, . . . , N - 1},
Schindler is able to show that

P(extra reduction in Montgomery(B, B)) = N/(3R). (7.22)
When computing a modular exponentiation a^d (mod N) using the CRT
approach, we first compute a^dp (mod p) using the repeated squaring algorithm
in Table 7.7, where dp = d (mod (p - 1)). Suppose that when
computing a^dp (mod p), we have k0 multiply steps and k1 squaring steps.
Note that k0 and k1 depend only on dp and, therefore, only on d and p, and
not on a. Since the probability (7.21) holds for each multiply, and the probability
(7.22) holds for each square, the expected number of extra reductions
when computing a^dp (mod p) is

k0 · (a' (mod p))/(2R) + k1 · p/(3R). (7.23)
As a function of a', the expression in (7.23) is piecewise linear; more
precisely, it is a linear function with discontinuities at integer multiples of p.
Qualitatively, the graph of (7.23) is similar to that in Figure 7.2 (see Problem 2).
Note that the total number of extra reductions in the calculation
of Cd (mod N) also includes extra reductions due to the factor q. Nevertheless,
there would still be a discontinuity in the total number of extra
reductions at every integer multiple of p (and also q).

Figure 7.2: Expected number of extra reductions.

The idea behind Schindler's timing attack follows directly from the graph
in Figure 7.2. Suppose we select ciphertexts C0 and C1, with C0 < C1.
Let T(C0) and T(C1) be the timing measurements for the decryption of C0
and C1, respectively. Assuming that timing differences are dominated by the
number of extra reductions, we would generally expect T(C1) - T(C0) to be
relatively small, since the number of extra reductions grows slowly (linearly)
from C0 to C1.
However, suppose that C0 and C1 bracket a multiple of p. For example,
suppose that we have C0 = 2p - k and C1 = 2p + ℓ, where k and ℓ are
reasonably small. Then due to the discontinuity in the number of extra
reductions at 2p, the expected difference T(C0) - T(C1) would be relatively
large. Therefore, we can select an initial value z and an offset Δ and let

Cℓ = z + ℓΔ, for ℓ = 0, 1, 2, . . . .

Then we compute

T(Cℓ) - T(Cℓ+1), for ℓ = 0, 1, 2, . . . ,

using the chosen ciphertexts Cℓ. Eventually, we will have Cℓ < kp < Cℓ+1 for
some k and ℓ, and when this occurs we should detect a significant increase
in T(Cℓ) - T(Cℓ+1). Once we have bracketed kp in this manner, we can simply
compute gcd(n, N) for every

n ∈ {z + ℓΔ, z + ℓΔ + 1, z + ℓΔ + 2, . . . , z + (ℓ + 1)Δ}.

If kp is actually in the interval, we find it following this approach, since we
have gcd(kp, N) = p, while for other values in the interval, gcd(n, N) = 1.
Of course, a similar statement holds if we happen to bracket a multiple of q
instead of p.
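A toy sketch of the bracket-and-gcd step may help. Here timing_model is a stand-in that merely mimics the sawtooth shape of Figure 7.2 (a real attack would measure decryption times), and all parameters are illustrative:

```python
# Toy sketch of the bracket-and-gcd step of Schindler's attack.
from math import gcd

p, q = 1009, 1013
N = p * q
z, delta = 2 * p - 200, 50      # start just below 2p, step by delta

def timing_model(C):
    # stand-in for a measured decryption time: by (7.23), the extra
    # reductions grow linearly in C (mod p), then drop sharply just
    # past each multiple of p
    return (C % p) / p

C = [z + l * delta for l in range(10)]
for l in range(9):
    if timing_model(C[l]) - timing_model(C[l + 1]) > 0.5:   # big jump
        # the bracket [C_l, C_{l+1}] contains a multiple of p, so a
        # gcd scan of the interval recovers the factor
        n = next(n for n in range(C[l], C[l + 1] + 1) if gcd(n, N) > 1)
        print(gcd(n, N) == p)   # True
```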

There are several possible refinements to this attack. For example, since
the graph in Figure 7.2 represents the expected value (i.e., the average
behavior), we would want to test several nearby values before deciding whether
we had bracketed a multiple of p or not. Also, once we have bracketed a
multiple of p, we could use a binary search approach to reduce the size of
the interval over which we need to compute the gcds. Determining an optimal
size for the increment and good initial starting points are also important
issues. These topics are discussed in Schindler's paper [129]. Schindler also
gives a detailed analysis of his attack.
It is important to note that in [129], Schindler does not apply his attack
to any real-world RSA implementation. Instead, he simulates an RSA
decryption routine that uses repeated squaring and Montgomery multiplication
as described in this section and he gives empirical results showing that his
attack succeeds in every case tested. Also, it is interesting to note that whereas
Kocher's timing attack recovers the bits of the private key one (or a few) at
a time, Schindler's attack recovers the private key essentially all at once.
Next, we present a timing attack that builds on Schindler's work. This
attack, which is due to Brumley and Boneh, succeeds against a sophisticated
real-world implementation of RSA.

Brumley-Boneh Attack
Brumley and Boneh [22] consider a timing attack against RSA as implemented
in OpenSSL. The attack they develop is practical and sufficiently robust that
it can recover a private key over a network that includes several routers and
switches between the endpoints, which introduce significant random timing variation.
The RSA implementation in OpenSSL is highly optimized, using CRT
with repeated squaring, Montgomery multiplication and a sliding window
(with a window of size five for a 1024-bit modulus). In addition, the
OpenSSL implementation of RSA employs Karatsuba multiplication to
compute the product xy whenever x and y consist of the same number of words,
and it uses ordinary "long multiplication" when x and y are not of the same
word-size. Repeated squaring, CRT, Montgomery multiplication, sliding
window and Karatsuba multiplication are all discussed in Section 7.4.1, above.
Kocher's original RSA timing attack [85] does not work when CRT is
employed, and Schindler's timing attack [129] does not succeed when Karatsuba
multiplication is used (below, it will become clear why Schindler's attack
fails in this case). Consequently, Brumley and Boneh had to significantly
extend Schindler's approach to successfully attack OpenSSL. Their attack is
undoubtedly the most advanced practical timing attack developed to date.
In the OpenSSL implementation of modular exponentiation, there are
two algorithmic issues that create significant timing differences. First, there

are the extra reductions in the Montgomery algorithm. This is precisely the
issue that Schindler exploits in his attack [129]. Second, the use of Karatsuba
and normal multiplication creates significant timing differences. However, the
timing attack is greatly complicated by the fact that these two timing effects
tend to counteract each other.
Suppose we want to decrypt C using the OpenSSL implementation of
RSA. When the Montgomery form of C is close to p , but less than p , then
the number of extra reductions will be large, as can be seen from Figure 7.2,
and therefore the decryption time will increase. On the other hand, if the
Montgomery form of C is slightly larger than p , then the number of extra re-
ductions will be relatively small and the decryption time will decrease (again,
see Figure 7.2).
When the Montgomery form of C is slightly less than p , then many of the
multiplication operations in (7.6) will involve numbers that are of about the
same magnitude. Consequently, Karatsuba multiplication will predominate
in this case, which reduces the time as compared to normal multiplication.
On the other hand, when the Montgomery form of C slightly exceeds p ,
then C is small (due to the mod p reduction) so that more of the multipli-
cation operations in (7.6) will involve numbers of significantly differing size.
Consequently, the slower normal multiplication routine will predominate.
The Brumley-Boneh attack relies on the fact that these two effects (extra
reductions and normal versus Karatsuba multiplication) each dominate during
different parts of the attack. This implies that Schindler's attack could
not be used to recover those bits where the Karatsuba versus normal
multiplication timing effect dominates. Therefore, Schindler's timing attack cannot
succeed against the OpenSSL implementation of RSA.
Building on Schindler's work, Brumley and Boneh were able to develop a
timing attack against RSA decryption in OpenSSL. Chosen ciphertext messages
are decrypted and timing information is obtained. This timing information
is used to recover a factor p of the modulus N, where N = pq with p < q.
Unlike Schindler's attack (but similar to Kocher's attack), the Brumley-Boneh
attack recovers the unknown bits of p = (p0, p1, . . . , pn), where p0 = 1,
one at a time, from the most significant bit to the least significant bit (in our
notation, p1 to pn). It is not necessary to recover all of the bits of p, since
given half of the bits, an algorithm due to Coppersmith [31] can be used to
efficiently compute the factorization of N. Of course, the private key d is
easily obtained from p, q and the public encryption exponent e.
The Brumley-Boneh attack [22] can be summarized as follows:

1. Suppose that bits p1, p2, . . . , pi-1 of p have been determined. Let

C0 = (p0, p1, . . . , pi-1, 0, . . . , 0),

that is, C0 consists of the known high-order bits of p with the remaining
bits all set to 0. Similarly, let

C1 = (p0, p1, . . . , pi-1, 1, 0, . . . , 0).

If the unknown bit pi is 1, then we have C0 < C1 ≤ p; otherwise, we
have C0 ≤ p < C1.

2. For C0 and C1, the decryption times T(C0) and T(C1), respectively, are
measured and we let Δ = |T(C0) - T(C1)|. If C0 ≤ p < C1, then Δ
will be "large", indicating that pi = 0. If C0 < C1 ≤ p, then Δ will be
"small," and we infer that pi = 1. Previous values of Δ are used to set
thresholds for "large" and "small." Note that this presumes that either
the extra reductions or the multiplication (normal versus Karatsuba)
predominates at each step. For the 0 bits of p, at those steps where
the extra reductions predominate, we have T(C0) - T(C1) > 0 (as
indicated in Figure 7.2), while for those steps where the multiplication
effect predominates, we have T(C0) - T(C1) < 0 (as discussed above).

3. This process is repeated to successively obtain bits pi+1, pi+2, pi+3, . . .
of p, until half of the bits of p have been recovered. Then Coppersmith's
algorithm is used to determine p and q, from which d is easily computed.

One complication that arises in this attack is due to the use of sliding windows
in OpenSSL, since it greatly reduces the number of multiplications, and
thereby the amount of timing information available. In [22], statistical methods
are used to compensate for the smaller number of multiplications by C
due to sliding windows (in comparison to repeated squaring), as well as for
the effects of a networked environment, where timings are inherently less
accurate. This compensation is accomplished by performing multiple decryptions
for each bit of p. The decryption time is measured for a neighborhood
of values, C, C + 1, . . . , C + k, and for each Cj in the neighborhood, the
decryption time is measured repeatedly to obtain an average time. While
this requires more decryptions, sufficient timing information can be accumulated
to exploit small timing differences, in spite of the sliding window and
network-induced timing variations.
The neighborhood size and the number of times a decryption is repeated
must be large enough to yield a Δ value which strongly indicates the private
key bit. Otherwise, the Δ values corresponding to private key bits 1 and 0
will have roughly the same order of magnitude and it will be impossible to
discern the bit correctly with a high probability. In Brumley and Boneh's
attack [22], private keys corresponding to 1024-bit moduli were recovered,
using about 1,433,600 chosen ciphertexts over a realistic network that
included several routers and switches. Each attack took about two hours to
complete.
It is not too difficult to simulate this timing attack. For example, in [150],
the Brumley-Boneh attack was simulated and, using a sample size of seven
and a neighborhood size of 3200, many bits of one factor of a 1024-bit modulus
were recovered.

Preventing RSA Timing Attacks
A strong defense against timing attacks is provided by RSA blinding, which
is implemented as follows. To decrypt ciphertext C, we first compute the
intermediate value Y = r^e C (mod N), where r is a randomly selected value
and e is the RSA encryption exponent. Then Y is decrypted in the usual way,
followed by multiplication by r^{-1} (mod N). This yields the desired result

r^{-1} Y^d = r^{-1} (r^e C)^d = r^{-1} r C^d = C^d (mod N).

Since r is random, Y is random and measuring the decryption time for Y
does not reveal any information about the private key d. It is important that
a new random r be used for every decryption.
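The blinding procedure can be sketched in a few lines of Python. This is an illustrative sketch, not production code; the function name blinded_decrypt is ours, and the sanity check uses the toy parameters N = 33, e = 3, d = 7 from the example earlier in the chapter.

```python
import secrets
from math import gcd

def blinded_decrypt(C, d, e, N):
    """RSA decryption with blinding: decrypt r^e * C, then strip off r."""
    while True:
        r = secrets.randbelow(N - 2) + 2   # fresh random r for every decryption
        if gcd(r, N) == 1:
            break
    Y = (pow(r, e, N) * C) % N        # blind: Y = r^e * C (mod N)
    X = pow(Y, d, N)                  # decrypt blinded value: X = r * C^d (mod N)
    return (X * pow(r, -1, N)) % N    # unblind: multiply by r^{-1} (mod N)

# toy check with the chapter's parameters: decrypting C = 5 yields 14
assert blinded_decrypt(5, 7, 3, 33) == 14
```

Because the random factor cancels exactly, the result matches an unblinded decryption while the value actually exponentiated is unrelated to C.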
Instead of blinding, an alternative would be to always carry out the extra
reduction in the Montgomery algorithm where, if no extra reduction is
required, a dummy extra reduction is used. This approach is championed by
Schindler [129]. In addition, it is possible to use Karatsuba multiplication in
every case. While these modifications would seem to stifle any timing attack,
Brumley and Boneh [22] argue against this approach, since, for example, the
dummy extra reduction might be optimized into oblivion by an optimizing
compiler.
Another approach that has been suggested [18] is to "quantize" RSA decryption,
that is, to make all decryptions take some multiple of a specified
amount of time (a time "quantum"). Brumley and Boneh [22] note that
for this to be completely effective, all decryptions must take the maximum
amount of time of any decryption, so the performance penalty might be
substantial.
RSA blinding is the preferred method to prevent timing attacks. The
drawbacks include a slight performance penalty and the need for a reasonably
good source of randomness to generate the blinding factors.

Timing Attacks Conclusion
Timing attacks vividly illustrate that when analyzing the strength of a cryp-
tosystem, all aspects must be considered. In particular, it is not sufficient
for a cipher to be mathematically secure, or even secure against “standard”
cryptanalytic attacks. Attackers will always look for the weakest link, and
they are not obliged to play by any set of presumed rules.

Timing attacks are just one example of a general class of attacks known
as side-channel attacks. A side channel is a source of information that,
based solely on an analysis of the underlying algorithm, is not supposed
to be available to the attacker. Side-channel attacks have been developed
which rely on power analysis, fault analysis and electromagnetic fields (EMF).
These types of attacks have been very significant recently, particularly in
the design of smartcards. Side-channel attacks will undoubtedly continue to
play an important role in the design and implementation of systems that use
public key cryptography. As mentioned above, timing attacks have recently
been developed for symmetric ciphers [12], but, so far at least, these attacks
appear to be considerably less practical than timing attacks on public key
cryptosystems.

Glitching Attack
In some situations, it is possible to induce a "glitch" or error in an RSA computation.
For example, if a smartcard is abused in some way, it might flip a
bit or cause some other type of error in the resulting computation. Any system
that is in the attacker's possession is potentially subject to such a glitch.
An NGSCB "trusted computing" system is one non-smartcard example of
such a system [142].
Surprisingly, in some RSA implementations, a single glitch can enable an
attacker to factor the modulus, and thereby recover the private key. Specifi-
cally, an RSA implementation that employs the CRT (as discussed above) is
potentially subject to a glitching attack.
Suppose that an RSA signature is computed for the message M in a
system that uses CRT. Then the signature is computed as follows. First,

M_p = M (mod p) and M_q = M (mod q),

followed by

x_p = M_p^{d_p} (mod p) and x_q = M_q^{d_q} (mod q),

where d_p = d (mod (p - 1)) and d_q = d (mod (q - 1)). The desired signature
is given by
S = M^d (mod N) = (a x_p + b x_q) (mod N),
where the constant a satisfies

a = 1 (mod p) and a = 0 (mod q)

and b satisfies
b = 0 (mod p ) and b = 1 (mod q ) .

Now suppose that this system is subject to glitching, and the attacker
forces an error in the computation. Suppose that this error occurs so that x'_q
is computed instead of x_q, but x_p is correct. That is, the glitch forces an
error in the computation M_q = M (mod q) or x_q = M_q^{d_q} (mod q), but the
remaining computations are unaffected. Then

S' = (a x_p + b x'_q) (mod N)

is returned instead of S.
The attacker can easily verify that the "signed" value S' is incorrect, since we
have (S')^e (mod N) ≠ M. But, since x_p = M_p^{d_p} (mod p), by the definitions
of a and b,

S' = x_p = M_p^{d_p} (mod p).

It follows (see Problem 15) that

(S')^e = M (mod p).

Then the attacker can compute (S')^e - M, and (S')^e - M is a multiple of the
factor p. Also, since x'_q ≠ M_q^{d_q} (mod q), by the definitions of a and b,

(S')^e ≠ M (mod q),

which implies that (S')^e - M is not a multiple of q and therefore not a multiple
of N. Consequently, the attacker can compute gcd(N, (S')^e - M) to reveal a
nontrivial factor of N.
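The whole attack can be demonstrated in a few lines. This is a sketch with our own naming: glitched_sign simulates a fault by corrupting the mod-q half of a CRT signature, and the attacker then factors N using only public values and the faulty signature.

```python
from math import gcd

def glitched_sign(M, d, p, q):
    """CRT signing where a simulated fault corrupts the mod-q computation."""
    N = p * q
    xp = pow(M % p, d % (p - 1), p)
    xq_bad = (pow(M % q, d % (q - 1), q) + 1) % q   # the "glitch" in x_q
    a = q * pow(q, -1, p)
    b = p * pow(p, -1, q)
    return (a * xp + b * xq_bad) % N

# attacker's view: only N, e, M, and the faulty signature S'
p, q, e, d = 11, 3, 3, 7
N, M = p * q, 5
S_bad = glitched_sign(M, d, p, q)
factor = gcd(N, (pow(S_bad, e, N) - M) % N)   # a nontrivial factor of N
assert 1 < factor < N and N % factor == 0
```

Note that the attacker never needs to know where or how the fault occurred, only that the mod-p half of the computation survived intact.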
The bottom line here is that a single glitch can break RSA in certain
realistic implementations. Boneh [19] points out that random faults can also
be used to attack many RSA implementations that do not employ CRT.

Implementation Attacks Conclusions
RSA has proven to be remarkably robust. Having been carefully scrutinized
by many researchers, the underlying algorithm has remained secure since its
invention more than three decades ago [19]. Timing attacks and glitching
attacks are among the very few known realistic attacks on sound implementations
of RSA, and these do not result from any weakness in the underlying
algorithms, and, furthermore, there are straightforward defenses against such
attacks. Undoubtedly it is for these reasons that RSA is the de facto standard
in public key cryptography, and it appears likely to remain so for the
foreseeable future.

7.5 Summary
Factoring and discrete log algorithms represent fundamental attacks on the
most popular public key systems. Advances in factoring or computing discrete
logarithms could significantly change the nature of public key cryptography.
At the least, advances in this area would require that larger parameters be
used to achieve the same level of security. It is also conceivable that a break-
through (such as quantum computers) could render entire classes of public
key systems vulnerable.
Timing and glitching attacks represent cutting-edge attacks, where cryp-
tography is attacked indirectly. These attacks have proved to be extremely
important recently, and there is every indication that this trend will continue.
It is important to be aware of the overall system in which cryptography is
employed, since seemingly extraneous issues can lead to devastating attacks.

7.6 Problems
1. Construct a simple example (other than the one given in the text) to
illustrate the Montgomery multiplication algorithm as discussed in Sec-
tion 7.4.1.
2. Let p = 123 and R = 128. For each x = 0, 1, 2, ..., p, let f(x) be
the number of extra reductions that occur when using the algorithm in
Table 7.7 to compute x^31 (mod p). Plot f(x).
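For experimenting with problems like this one, a standard formulation of Montgomery multiplication can be sketched as follows. This is our own sketch and may differ in details from Table 7.7; the convention assumed here is that the final conditional subtraction counts as the "extra reduction."

```python
def mont_mul(x, y, n, R, n_prime, counter):
    """Montgomery product x*y*R^{-1} (mod n); counts extra reductions."""
    t = x * y
    m = (t * n_prime) % R
    u = (t + m * n) // R          # exact division, since t + m*n = 0 (mod R)
    if u >= n:                    # the "extra reduction" step
        counter[0] += 1
        u -= n
    return u

def mont_exp_extra_reductions(x, exp, n, R):
    """Number of extra reductions when computing x^exp (mod n)."""
    n_prime = (-pow(n, -1, R)) % R     # n * n_prime = -1 (mod R)
    count = [0]
    xbar = (x * R) % n                 # convert x to Montgomery form
    ybar = R % n                       # Montgomery form of 1
    for bit in bin(exp)[2:]:           # left-to-right square and multiply
        ybar = mont_mul(ybar, ybar, n, R, n_prime, count)
        if bit == '1':
            ybar = mont_mul(ybar, xbar, n, R, n_prime, count)
    final = mont_mul(ybar, 1, n, R, n_prime, [0])   # convert out of Montgomery form
    assert final == pow(x, exp, n)     # sanity check against built-in modexp
    return count[0]
```

Calling mont_exp_extra_reductions(x, 31, 123, 128) for each x then gives the data needed for the plot.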
3. Consider the congruence of squares in (7.1).
a. Show that we cannot be "unlucky" when x ≢ ±y (mod N). That
is, if x ≢ ±y (mod N), then the congruence x^2 = y^2 (mod N)
must reveal a nontrivial factor of N.
b. Suppose x^2 = y^2 (mod N) but x = ±y (mod N). Why can we not factor N?
4. Suppose that in Dixon's Algorithm or the quadratic sieve we are unlucky,
that is, x - y and x + y do not reveal a factor of N. Is it necessary
to start over and redo all of the work?
5. Empirically estimate the probability that for a given pair x and y that
satisfy x^2 = y^2 (mod N), we have x ≢ ±y (mod N).
6. Suppose that the prime p divides Q(x), where Q(x) is defined in (7.6).
Show that p divides Q(x + kp) for all integers k ≠ 0.
7. For Dixon's Algorithm or the quadratic sieve, when determining the
factor base, we should exclude any primes p for which (N/p) ≠ 1, where
(N/p) is the Legendre symbol [95]. Why is this the case?

8. Consider the RSA public key cryptosystem. Suppose that the best
available attack is to factor the modulus N , and the best available
factoring algorithm is the number field sieve. Also assume that the best
available attack on a symmetric cipher is an exhaustive key search.

a. A 1024-bit modulus N provides roughly the same security as a
symmetric key of what length?
b. A 2048-bit modulus N provides roughly the same security as a
symmetric key of what length?
c. What size of modulus N is required to have security roughly comparable
to a 256-bit symmetric key?

9. The algorithm described in Section 7.3.2 is actually "giant-step baby-step,"
since the giant steps are done first. Is there any advantage or
disadvantage to doing the baby steps first and the giant steps second?

10. Compute 3^{10i} (mod 101) for i = 1, 2, ..., 10 and compare your results
to the second row in Table 7.3. Explain.

11. Consider the index calculus method of computing the discrete logarithm.

a. Show that it is possible to find the logarithms, base g, of the
elements in the factor base by solving a system of linear equations.
Hint: Let {p_0, p_1, ..., p_{n-1}} be the elements of the factor base.
Randomly select k ∈ {0, 1, 2, ..., p - 2}, compute y = g^k (mod p)
and try to write y as a product of elements in the factor base, that
is,

y = p_0^{c_0} p_1^{c_1} p_2^{c_2} ... p_{n-1}^{c_{n-1}},

where each c_i ≥ 0. Take log, base g, of both sides.
b. Let g = 6 and p = 229 and let the factor base consist of the prime
numbers less than 12. Select random values of k as in part a,
until you obtain a system of linear equations that can be solved
to determine the logarithms, base g, of the elements in the factor
base. Solve the system to find the logarithms.
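Gathering the relations for part b can be automated. This is a sketch with our own helper names; solving the resulting linear system mod p - 1 is left as the exercise intends.

```python
import random

FACTOR_BASE = [2, 3, 5, 7, 11]   # primes less than 12, as in part b

def trial_factor(y, base):
    """Exponent vector of y over the factor base, or None if y is not smooth."""
    exps = []
    for prime in base:
        e = 0
        while y % prime == 0:
            y //= prime
            e += 1
        exps.append(e)
    return exps if y == 1 else None

def find_relations(g, p, base, count):
    """Collect relations g^k = base[0]^c_0 * ... * base[n-1]^c_{n-1} (mod p)."""
    relations = []
    while len(relations) < count:
        k = random.randrange(1, p - 1)
        vec = trial_factor(pow(g, k, p), base)
        if vec is not None:
            relations.append((k, vec))
    return relations

# sanity check: each collected relation really holds modulo p = 229
for k, vec in find_relations(6, 229, FACTOR_BASE, 6):
    prod = 1
    for prime, c in zip(FACTOR_BASE, vec):
        prod = prod * pow(prime, c, 229) % 229
    assert prod == pow(6, k, 229)
```

Taking log base g of each relation turns the exponent vectors into linear equations in the unknown logarithms of 2, 3, 5, 7 and 11.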

12. Recall the repeated squaring, Montgomery multiplication and CRT
methods for efficient modular exponentiation, which are discussed in
Section 7.4.1.

a. Compute 5^{37} (mod 33) by repeated squaring.

b. Compute 5^{37} (mod 33) using repeated squaring, with Montgomery
multiplication and CRT. How many extra reductions occur?

13. Let a = 31 and b = 25.

a. Find ab (mod 79) directly.
b. Find ab (mod 79) using the Montgomery algorithm.

14. Use two iterations of the Karatsuba algorithm to compute the product
337 · 521. Clearly show the intermediate steps.
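A hand computation like this can be checked against a small recursive sketch of Karatsuba multiplication. The base-10 digit splitting and the function name are our own choices for illustration.

```python
def karatsuba(x, y, n_digits):
    """Karatsuba multiplication of two n_digits-digit base-10 numbers."""
    if n_digits == 1:
        return x * y
    m = n_digits // 2
    hi_x, lo_x = divmod(x, 10 ** m)       # split x into high and low halves
    hi_y, lo_y = divmod(y, 10 ** m)
    a = karatsuba(hi_x, hi_y, n_digits - m)                        # high product
    c = karatsuba(lo_x, lo_y, m)                                   # low product
    b = karatsuba(hi_x + lo_x, hi_y + lo_y, n_digits - m) - a - c  # cross term
    return a * 10 ** (2 * m) + b * 10 ** m + c

# three recursive products per level instead of four
assert karatsuba(337, 521, 3) == 337 * 521
```

The key saving is that the cross term b is obtained from a single product of sums, so each level needs three multiplications rather than four.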

15. Suppose that the RSA signature is computed using CRT as follows.
Let M be the message,

M_p = M (mod p) and M_q = M (mod q)

x_p = M_p^{d_p} (mod p) and x_q = M_q^{d_q} (mod q),

where d_p = d (mod (p - 1)) and d_q = d (mod (q - 1)). Then the
signature is given by

S = M^d (mod N) = (a x_p + b x_q) (mod N),

where the constant a satisfies

a = 1 (mod p) and a = 0 (mod q)

and b satisfies

b = 0 (mod p ) and b = 1 (mod q ) .

Suppose that due to a glitch, x'_q ≠ x_q is computed but x_p is computed
correctly. Let
S' = (a x_p + b x'_q) (mod N).
Show that
(S')^e = M (mod p).

16. Write a computer program to implement the following timing attack
on RSA [160]. Assume that the repeated squaring algorithm is used
(without CRT, Montgomery multiplication or a sliding window), and
the decryption exponent is of the form

d = (d_0 d_1 d_2 ... d_{n-1})_2,

where d_0 = 1.

i. Trudy believes she can recover d one bit at a time. To accomplish
this, Trudy chooses messages Y_i, where Y_i^3 < N, and has Alice
decrypt each of them. For each i, let y_i be the time required to
decrypt Y_i. Trudy computes ȳ, the average of the times y_i.
ii. Trudy then chooses messages Z_i, where Z_i^2 < N < Z_i^3, and has
Alice sign each of them. For each i, let z_i be the time required to
sign Z_i. Trudy computes the average z̄ of the times z_i.
iii. If d_1 = 1, then z_i > y_i for each i. On the other hand, if d_1 = 0,
then z_i ≈ y_i for each i. Thus if z̄ is sufficiently larger than ȳ, Trudy
deduces that d_1 = 1. Otherwise, she concludes that d_1 = 0.
iv. Having recovered d_1, Trudy uses an analogous process to find d_2,
where, in this case, the Y_i and Z_i are chosen to satisfy different
criteria, depending on the recovered value of d_1. Once d_2 is known,
Trudy proceeds in similar fashion to recover additional bits of d.
Use your program to answer the following questions.
a. Verify that the attack can be used to recover private key bits d_1
and d_2 for the case where the modulus is N = 36,355,783, the
encryption exponent is e = 3, and the decryption exponent is
given by d = 24,229,147. Also, show that you can recover bits d_1
and d_2 for N = 13,789,777, e = 3, and d = 9,188,011.
b. What percentage of the time does this method work? Is it a practical
attack?
Hint: For part a, instead of trying to actually time the program, you
can "cheat" and simply count the number of modular reduction steps
that occur.
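Per the hint, one way to "time" a decryption is to count the modular reductions performed during repeated squaring. This sketch uses our own naming, and counting one reduction whenever an intermediate result exceeds N is an assumption about the implementation being modeled.

```python
def modexp_count_reductions(x, d, N):
    """Repeated squaring that counts the modular reductions performed."""
    result, reductions = 1, 0
    for bit in bin(d)[2:]:              # most significant bit of d first
        result = result * result
        if result >= N:                 # reduction after squaring
            result %= N
            reductions += 1
        if bit == '1':
            result = result * x
            if result >= N:             # reduction after multiplication
                result %= N
                reductions += 1
    return result, reductions

# reduction count serves as the timing proxy for the attack
S, r = modexp_count_reductions(5, 24229147, 36355783)
assert S == pow(5, 24229147, 36355783)
```

Comparing the average reduction counts for the Y_i and Z_i messages then plays the role of comparing ȳ and z̄.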

17. Suppose Alice's public key is (N, e) = (667, 3).

a. Find Alice's private key d.
b. Encrypt M = 17.
c. Decrypt the result of part b using the blinding factor r. Show
all intermediate steps.
18. Suppose that the work factors for factoring algorithms A through E are
given by the following functions, where N is the integer to be factored.

Algorithm Work Factor
A f(N) = N
B f(N) =
C f(N) = 2^{log_2 log_2 N}
D f(N) = 2^{kN}
E f(N) = 2^{(log_2 N)^{1/2} (log_2 log_2 N)^{1/2}}

Which of the algorithms A through E have a polynomial work factor,
which have an exponential work factor and which have a subexponential
work factor? Recall that the work is determined as a function of the
number of bits in N, that is, as a function of x = log_2(N).

19. Suppose that in Kocher's timing attack, we obtain the timings T(C_j)
and the emulated timings t_{0...2} for d_0 d_1 d_2 ∈ {100, 101, 110, 111}, as given
in the table below.

100 101 110 111
  5   7   5   8
  4   7   4   1
  1   6   4   7
  2   8   5   2
 10   6   8   8
 11   5   7   7
  1   1   6   5
  7   1   2   3

a. What is the most likely value of d_0 d_1 d_2 and why?
b. Why does this attack not succeed if CRT or Montgomery multiplication
is used?

20. Suppose that for Schindler's timing attack, we obtain the following
timing data.

C_j    80 85 90 95 100 105 110 115 120 125
T(C_j) 50 52 51 56  60  50  52  55  63  55

a. Which interval is most likely to contain a factor p of N?
b. Suppose N = 12,423. For every n in the interval you selected in
part a, compute gcd(n, N). Use this information to factor N.

21. In the Brumley-Boneh attack, the bits are recovered one at a time.
Suppose that the following timing information, with a threshold value
of Δ = 10, was used to recover private key bits d_1 through d_9.

Bit     1  2  3  4  5  6  7  8  9
T(C_0) 98 96 90 85 96 90 80 73 78
T(C_1) 91 84 75 88 80 94 95 85 84

a. What values for the private key bits d_1 through d_9 were recovered?
b. For which bits does the extra reduction in the Montgomery algorithm
dominate, and for which bits does the normal versus Karatsuba
multiplication effect dominate?

22. Suppose Alice's public key is (N, e) = (33, 3). Then Alice's private key
is d = 7. As discussed in the example on page 337, we can use the Chinese
Remainder Theorem (CRT) method of modular exponentiation,
with a = 12, b = 22, d_p = 7 and d_q = 1. Suppose M = 5. Then to
sign M, we compute M_p = 5 and M_q = 2 and, as in (7.16), we compute

x_p = M_p^{d_p} = 5^7 = 3 (mod 11) and x_q = M_q^{d_q} = 2^1 = 2 (mod 3).

Finally, we compute the signature as

S = M^d (mod N) = 5^7 = (12 · 3 + 22 · 2) (mod 33) = 14.

Suppose that an attacker forces a glitch error in the computation so
that x'_q = 1 is computed instead of x_q = 2, but all other intermediate
quantities are computed correctly.

a. Find S', the "signature" that is computed using x'_q instead of x_q.
How would the attacker know that an error has occurred?
b. Determine the factors of N from S'.

MD5 Tables
The following tables are contained in this appendix. A brief description of
each table is provided.

Table A-1 contains the step constants for the MD5 hash algorithm. The
precise values of these constants are not needed to understand the MD5
attack described in the text, but they are necessary to implement the
algorithm or the attack.

Tables A-2 and A-3 give Wang's output differential for the first message
block, M_0. The input differential can be deduced from this table and it
is also given in (5.47) in Section 5.4. In these tables we use a compact
notation for sums of powers. This notation is also used, for example, in
Table 5.11 in the text and it is defined on page 239.

Tables A-4 and A-5 give Wang's output differential for the second message
block, M_1. The input differential can be deduced from this table
and it is also given in (5.48) in Section 5.4. In these tables we use a
compact notation for sums of powers. This notation is also used, for
example, in Table 5.11 in the text and it is defined on page 239.

Table A-6 contains the sufficient conditions for the first message block,
M_0, that are satisfied deterministically in Stevens' attack. Note that
the conditions on the Q_i, for i = 0, 1, ..., 15, are satisfied by single-step
modifications, while the conditions on the Q_i, for i = 16, 17, ..., 20, are
satisfied by multi-step modifications, as discussed in Section 5.4.5. In
this table, a 0 indicates that the particular bit must be a 0, and a 1
indicates the bit is a 1. The character "^" indicates that the bit must
equal the bit in the corresponding position of the preceding row, while
a "!" indicates that the bit must not equal the bit in the corresponding
position of the preceding row, and a "." indicates that there is no
restriction on the bit.


Table A-7 contains the sufficient conditions for the first message block,
M_0, that are satisfied probabilistically in Stevens' attack. That is, for
each putative solution, these conditions are tested. If any of these conditions
fail, the putative solution is discarded. In this table, we have
the restriction that I, J, K ∈ {0, 1}, with I ≠ K.

Table A-8 gives information analogous to Table A-6 for M_1, the second
message block. See the description of Table A-6, above, for more
information on this table.

Table A-9 gives information analogous to Table A-7 for M_1, the second
message block. See the description of Table A-7, above, for more
information on this table.

Table A-1: MD5 Step Constants

 j  K_j          j  K_j          j  K_j          j  K_j
 0  0xd76aa478  16  0xf61e2562  32  0xfffa3942  48  0xf4292244
 1  0xe8c7b756  17  0xc040b340  33  0x8771f681  49  0x432aff97
 2  0x242070db  18  0x265e5a51  34  0x6d9d6122  50  0xab9423a7
 3  0xc1bdceee  19  0xe9b6c7aa  35  0xfde5380c  51  0xfc93a039
 4  0xf57c0faf  20  0xd62f105d  36  0xa4beea44  52  0x655b59c3
 5  0x4787c62a  21  0x02441453  37  0x4bdecfa9  53  0x8f0ccc92
 6  0xa8304613  22  0xd8a1e681  38  0xf6bb4b60  54  0xffeff47d
 7  0xfd469501  23  0xe7d3fbc8  39  0xbebfbc70  55  0x85845dd1
 8  0x698098d8  24  0x21e1cde6  40  0x289b7ec6  56  0x6fa87e4f
 9  0x8b44f7af  25  0xc33707d6  41  0xeaa127fa  57  0xfe2ce6e0
10  0xffff5bb1  26  0xf4d50d87  42  0xd4ef3085  58  0xa3014314
11  0x895cd7be  27  0x455a14ed  43  0x04881d05  59  0x4e0811a1
12  0x6b901122  28  0xa9e3e905  44  0xd9d4d039  60  0xf7537e82
13  0xfd987193  29  0xfcefa3f8  45  0xe6db99e5  61  0xbd3af235
14  0xa679438e  30  0x676f02d9  46  0x1fa27cf8  62  0x2ad7d2bb
15  0x49b40821  31  0x8d2a4c8a  47  0xc4ac5665  63  0xeb86d391


. 14
( 16)