a direct clue on how to choose an optimal block-size. Before getting into details about how to cleverly choose the block-sizes, let us outline the remaining steps in the error correction process, as this will illustrate the efficiency gain we may expect from an intelligent partitioning strategy. The details and some theoretical results about the process have been summarized in Chap. 3, but we briefly repeat them here for the convenience of the reader.

Having split the string into blocks of equal size k, Alice and Bob publicly compare parity bits of each block. Obviously, one error will change the parity, and in general, any odd number of errors will be discovered by observing disagreeing parities. However, two or any larger even number of errors will remain undetected with this method, which is why further stages of the process follow once the initial correction has been completed. Let us describe the general correction step by an example block with one indicated error, i.e., unequal parity detected by public comparison. This block is then searched for the error using a standard bisective search, which discloses a further log(k) parities of sub-blocks. To spot and correct remaining errors in the string, such as those present in blocks with an even number of errors, Alice and Bob repeat the randomization and partitioning steps several times with increasing block-sizes.
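As an illustration, the block parity comparison and bisective search can be sketched as follows. This is a minimal sketch of the idea, not the implementation of [2]; the function names are ours, and for brevity both strings are held locally instead of exchanging parities over a public channel.

```python
# Sketch of one correction stage: compare block parities and fix one error
# per mismatching block via bisective search. Alice holds `a`, Bob holds `b`.

def parity(bits):
    """Parity of a list of bits (0 or 1)."""
    return sum(bits) % 2

def bisect_correct(a, b, lo, hi):
    """Locate and fix one error in b[lo:hi] by bisective search.
    Assumes the block has odd error weight, so its parities disagree."""
    if hi - lo == 1:
        b[lo] = a[lo]                      # error found: correct the bit
        return lo
    mid = (lo + hi) // 2
    if parity(a[lo:mid]) != parity(b[lo:mid]):
        return bisect_correct(a, b, lo, mid)
    return bisect_correct(a, b, mid, hi)

def correct_stage(a, b, k):
    """Compare parities of blocks of size k; search odd-weight blocks."""
    corrected = []
    for lo in range(0, len(a), k):
        hi = min(lo + k, len(a))
        if parity(a[lo:hi]) != parity(b[lo:hi]):
            corrected.append(bisect_correct(a, b, lo, hi))
    return corrected
```

Note that a block containing two errors passes the parity check unnoticed, which is exactly why the later stages with fresh partitions are needed.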

Since the error correction up to this point may still have missed some errors, Alice and Bob continue by comparing parities of random subsets of bits they publicly agree on. Upon a parity mismatch, a bisective search similar to the one above is performed to find and erase the error. In order to avoid information leaking to the adversary, the last bit of each random subset can be discarded. This deletion can also be done after comparing parities of blocks in the previous steps, for the same reason.

The point at which almost all errors have been removed is detected by counting the number of successful comparisons since the last error was seen. After a sufficient number of successful trials (20 is the number proposed in [2]), the strings are accepted as identical, regarding the probability of errors remaining undetected as negligible.

4 Adaptive Cascade 51

The protocol Cascade is based on this procedure and has been introduced in a later paper [5], which presented improvements to the original reconciliation protocol sketched above. Among the changes is the replacement of the bit-deletion step by a smarter strategy for the sake of detecting more errors faster, so the task of information leakage reduction is shifted to the privacy amplification stage. The name stems from the strategy of increasing the sizes of blocks in the first stages of the protocol. Although a comprehensive theoretical analysis of the protocol is provided, the authors of [5], as well as those of [2], abstain from an analytical treatment of block-size choices. Nevertheless, a simple heuristic based on estimating the error frequency by direct comparison of a random sample of bits is provided in [2]. These bits have to be sacrificed for the sake of privacy too, if that approach is adopted. Our main concern in subsequent sections will thus be an optimal choice of block-sizes in the initial stage of the protocol. Since the increase of block-sizes may be taken to be exponential (doubled block-sizes in stage two, quadruple-size blocks in stage three of Cascade, and so on), an optimal choice of the initial block-size, in terms of deriving deterministic or probabilistic models, may save us from too many stages of Cascade after the initial one. This concern has been pointed out in [2] already, since the block parity comparison approach becomes much less efficient the larger the blocks grow. The idea of estimating the error frequency from a random sample is interesting though, as this is already some kind of adaption to the current situation rather than fixing a constant block-size once and for all. In the following, we show how this idea can be amended to spare the need for sacrificing too much key material.

4.2.1 Sources of Errors

It is worthwhile to spend a few words on natural sources of errors (considering an adversary as a rather unnatural source of such phenomena), since this is the type of errors we need to model for an improved estimate of optimal block-sizes. A simple systematic error pattern arises from photon emitters and detectors slowly running out of synchrony. It is easy to imagine that if a photon emitter sends photons with frequency f and the corresponding detector samples with frequency f + ε for some small ε > 0, then we will observe an oscillatory pattern of error bursts, i.e., we do not sample anything at all over repeating periods of time. Furthermore, environmental conditions (heat, vibration, etc.) may have an impact on the transmission medium and hence influence the error rate. Since photon emitters and detectors are devices sensitive to calibration errors, increased vibrations due to traffic peak hours may distort the setup and hence increase the error rate. The form of such perturbations may be modeled well, and is perhaps even periodic to some extent, hence can be captured by partially deterministic models as described below.

Further sources of errors may be related to the quantum transmission medium. Sending photons over fiber-optic lines, we may observe error frequencies that are essentially different from those seen on free-space links. The latter are subject to various environmental distortions, including weather conditions (fog, rain, snow, etc.), air pollution, or radiation. Kinks or points where fiber-optic cables are joined by splicing or cleaving may increase photon loss and hence the error rate. Damage caused while laying the cables is sometimes unavoidable and needs to be taken into account as another source of error. Common to both variants is a damping that depends on the length of the channel: the longer the line gets, the fewer photons make it through. A comprehensive overview of system losses is provided in [17].

52 S. Rass and C. Kollmitzer

Fig. 4.1 Example distribution of random errors over the raw key r. Bits marked in gray are erroneous and thus not carried over to the sifted key s

4.3 Adaptive Initial Block-Size Selection

The original Cascade protocol, based on the ideas sketched above, offers room for improvement in terms of its initial block-size selection strategy. We shall not deal with adaptive choices of block-sizes in stages of Cascade later than the first one, so the term block-size henceforth refers to the initial block-size, unless stated otherwise. In [2], the choice of block-size is based on a prior estimate of the error frequency so as to achieve an expected number of one error per block; why not incorporate the posterior knowledge from a finished protocol run in future executions? This appears reasonable, since we can assume the link to be geographically static, and hence undergoing the same sort of distortions regularly. We may thus be able to continuously improve our error estimate over the lifetime of the link devices. Let us first summarize what information is available after one execution of the protocol: call r ∈ {0, 1}* the raw key being the input for the Cascade protocol. After completion of the error correction, and before the privacy amplification, we have a string s of the same length as r, but with the errors corrected. Checking for each bit in r whether or not it has made it into the corrected string s without a change gives rise to a sample from a binary-valued random variable or, more specifically, a binary-valued stochastic chain, where the discrete time refers to the index of the bits in r. Since r is itself finite, so is the time horizon of the stochastic chain, and for simplicity, we may consider it a simple random variable in the first instance. Call this random

variable $X \in \{0, 1\}$; then we observe

$$X_i = \begin{cases} 1, & \text{if the } i\text{th bit in } r \text{ became corrected and has thus changed in } s,\\ 0, & \text{otherwise.} \end{cases}$$

Figure 4.1 shows an example of the distribution of errors over a bit string.
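For concreteness, the sample X_1, …, X_n and the empirical error rate can be derived from r and s as follows (a minimal sketch in our own notation; the bit strings are illustrative):

```python
# Sketch: derive the indicator sample X_i and the empirical error rate
# from the raw key r and the corrected string s.

def error_indicators(r, s):
    """X_i = 1 iff the ith bit of r was changed during error correction."""
    assert len(r) == len(s)
    return [int(ri != si) for ri, si in zip(r, s)]

r = [0, 1, 1, 0, 1, 0, 1, 1]
s = [0, 1, 0, 0, 1, 0, 1, 0]      # bits at indices 2 and 7 were corrected
X = error_indicators(r, s)        # [0, 0, 1, 0, 0, 0, 0, 1]
error_rate = sum(X) / len(X)      # empirical error frequency: 0.25
```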


Our goal is replacing the static error frequency estimate at the beginning of the

Cascade protocol by a dynamic one, which takes into account information from

previous protocol executions. A variety of different solutions can be imagined,

and we shall summarize a selection here, restricting ourselves to two high-level

approaches:

1. Choose the block-size at the beginning and leave it constant over the whole execution time of Cascade's first stage, resulting in an equidistant partitioning. The initial choice is made using past knowledge about error distributions. We refer to this approach as with fixed initial block-size. This variant is suitable if the error rate remains constant over short periods, but varies in the long run (due to slow seasonal variations). The resulting scheme will hence be the choice either for quick establishment of short keys or in situations where conditions can artificially be kept almost constant over a long time. We describe two sub-variants of how to incorporate past experience in order to reach an improved idea of how much error to expect in future executions.

2. If the error rates vary frequently within the duration of the protocol execution, then we may adapt the block-size in real time during the first stage. The resulting model can either be a partially deterministic one, if the error rates exhibit repeating patterns (i.e., some kind of periodicity), or completely stochastic, if no such underlying pattern can be identified. In case we are dealing with fully non-deterministic scattering of errors, we propose using Bayesian statistics for adapting the error-rate model to the particular environmental conditions of a link. In that sense, we can endow Cascade with learning capabilities in order to self-adapt to changing conditions. We refer to this variant as with dynamic initial block-size.

Among our main concerns will be efficiency, since quantum cryptographic key exchange protocols can be expected to run endlessly, continuously establishing key material for ad hoc usage. Efficiency of the protocols is hence a major issue. Let us start with the simpler of the methods outlined above, which chooses the block-size to be constant over the whole execution time. Fine-tuning of any of the dynamic models described here can be done by simulations, comparing the generated scattering to the true error patterns. It is thus useful to describe how error patterns can be simulated using the dynamic variants. We do this below.

4.4 Fixed Initial Block-Size

Treat the vector of realizations of X_i as a sample and call λ its arithmetic mean. Moreover, assume that from n previous executions of the protocol, we have the data λ_1, …, λ_n available. Each λ_i represents the average number of errors in the ith execution of the protocol. This average number is called the error rate. Furthermore, it is also the maximum likelihood estimator for the parameter of the Poisson and the exponential distribution, which will become useful later. The set of (consecutive) λ_i can be considered as a time series, and our wish is to predict its next value, to use it as our error estimate for running the Cascade protocol. A considerable amount of literature exists on smoothing and trend estimation in time series, and we refer the interested reader to [6, 40] for an introduction and overview. Here, we shall only mention three popular variants, leaving the reader the freedom to apply more sophisticated methods.

To reduce the random distortions on a time series (smoothing), we may for

instance

1. use sliding centrality measures, like the sliding mean or median, or

2. use exponential smoothing, taking into account all past measurements, or

3. if the state (in terms of error frequency) of the quantum channel can be approximated by a (linear) dynamical system, then the Kalman filter may be used as a predictor for the error rates in future executions. Similarly, the particle filter is an option, if a stochastic model underneath the error distribution can be set up effectively. However, both options strongly rely on the accuracy of the model and presume thorough analysis and knowledge of the system dynamics. Furthermore, the particle filter may come at high computational cost if it is to be implemented effectively.
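A sliding-median predictor (the first variant above) is only a few lines; the sketch below, with illustrative rates, shows how a single extreme observation leaves the estimate unaffected:

```python
# Sketch of the sliding-median predictor: estimate the next error rate
# as the median over the last n = 2k + 1 observed rates.

from statistics import median

def sliding_median_predict(rates, k):
    """Predict the next error rate from the last 2k + 1 rates."""
    window = rates[-(2 * k + 1):]
    return median(window)

rates = [0.05, 0.04, 0.06, 0.30, 0.05]    # 0.30: a short-lived outlier
print(sliding_median_predict(rates, 2))   # median of the window -> 0.05
```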

Applying other techniques is possible, but beyond the scope of this chapter; we leave this to the reader, consulting the cited references. Coming back to the first of the above three proposals, the arithmetic mean appears not to be the optimal choice, as it is very sensitive to outliers. Outliers in our setup would be short-term deviations of the error frequency, caused by conditions becoming almost ideal for a short while or by strong environmental influences like vibrations, heat, or similar for a limited period of time. Both events may happen on a regular basis and should hence not be considered as outliers. However, the arithmetic mean estimate would react strongly to them, and the resulting block-size could be smaller than needed. Detection and elimination of outliers is a highly nontrivial task, and no panacea solution is likely to exist.

A sliding median is more robust against outliers and offers a second appealing

feature, as the window size has a particularly simple interpretation if the median is

used: suppose that our window is of size n = 2k + 1 for an integer k ≥ 1, then

we consider an extreme event to be an outlier, unless it was observed at least k + 1

times over the last 2k + 1 executions of the protocol. For example, if n = 21, then

any extreme situation is considered exceptional (an outlier), unless it occurred at

least 11 times during the last 21 executions. If an expert can somehow estimate the

likelihood of such extremal conditions (by using extreme value probability models),

then the window size can be chosen to handle that probability and hence become stable against outliers. Implementing more complex and powerful models like ARMA

(autoregressive moving average) is possible, but the quality of results is highly

dependent on the expertise of the engineer, as the modeling becomes involved for

these techniques. See [31] for an introduction and details.

If a sliding centrality measure is employed, then the resulting formulas for choosing the block-size at the beginning are simple, as we only have to take the median (or mean, or any other centrality measure of choice) over the last n available samples X_i.

Exponential smoothing [6] does include all past observations. The predicted value $\hat{\lambda}_{n+1}$ is found as $\hat{\lambda}_{n+1} = \mu \lambda_n + (1 - \mu)\hat{\lambda}_{n-1}$, where $0 < \mu < 1$ is the smoothing parameter and $\hat{\lambda}_1, \ldots, \hat{\lambda}_{n-1}$ are the so-far smoothed values, setting the first $\hat{\lambda}_1 = \lambda_1$. The reader will instantly verify that this corresponds to a weighted sum of observations with exponentially decaying weights.
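Exponential smoothing as a rate predictor can be sketched as follows; we use the common recursive textbook form, in which each smoothed value feeds into the next, and the rates are illustrative:

```python
# Sketch of exponential smoothing: the last smoothed value serves as the
# one-step-ahead prediction of the error rate.

def exp_smooth(rates, mu):
    """Return the smoothed series; its last entry predicts the next rate."""
    smoothed = [rates[0]]                  # set the first smoothed value
    for lam in rates[1:]:
        smoothed.append(mu * lam + (1 - mu) * smoothed[-1])
    return smoothed

rates = [0.05, 0.07, 0.04, 0.06]
prediction = exp_smooth(rates, 0.5)[-1]    # weighted sum, decaying weights
```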

A further alternative is offered by decision theory: call the error rate λ and assume it to be random; then our problem is to find the best decision on which λ to choose next before re-running the protocol. Why not perform a Bayesian estimation of it? A fully natural approach to updating λ is considering it to be drawn from a random variable Λ, whose distribution is to be estimated from our given information. Again, let the information available be the average error rate known from the last protocol execution. Furthermore, for the sake of analytical simplicity and efficiency, let us model the random variable Λ using a Gamma distribution with hyper-parameters a, b > 0, which are to be specified by an expert prior to any protocol runs.

Recall that we update the distribution of Λ after each protocol execution, which provides us with only a single sample, namely the average error rate. Assuming that this rate is Poisson-distributed, the likelihood function for a single sample is trivially a Poisson density. The reader familiar with Bayesian decision theory will instantly recognize the Gamma distribution as a conjugate prior for a Poissonian likelihood function, in which case a Bayesian update can efficiently be computed [37]. To make

this precise, updating the Gamma distribution $G(a, b)$ with density

$$f(\lambda \mid a, b) = \begin{cases} \dfrac{b^a}{\Gamma(a)}\,\lambda^{a-1} e^{-b\lambda}, & \text{if } \lambda \ge 0,\\ 0, & \text{if } \lambda < 0, \end{cases}$$

to account for the information z (being the average number of errors derived from the most recent measurements) gives the posterior distribution

$$G(a + z,\; b + 1), \tag{4.1}$$

so the update amounts to a simple change of the parameters. Using a quadratic loss function, i.e., penalizing a deviation of the estimator $\hat{\lambda}$ from its true value $\lambda$ by the functional $(\hat{\lambda} - \lambda)^2$, the Bayesian estimator $\hat{\lambda}$ minimizing the loss we can expect under the posterior estimate of the parameter (given by (4.1)) is just the expectation of the (posterior) Gamma distribution and is found as

$$\hat{\lambda} = E[\Lambda \mid z] = \frac{a + z}{b + 1},$$

referring to the parameters of the posterior distribution. A popular argument against conjugate priors is their inflexibility. A straightforward remedy is using a mixture of conjugate priors for a close approximation of a more satisfactory prior, but although the set of mixtures of priors can be shown to be dense in the set of probability distributions (with respect to the Prohorov topology) [37, Theorem 3.4.3], its usage introduces the need for normalization after the Bayesian update. This means that we are back at the lengthy calculations involved when using Monte Carlo methods (see [37, 15] for an introduction) for evaluating the integrals occurring in the expressions, and hence this option may be taken if one is willing to trade efficiency for accuracy.
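The conjugate Gamma-Poisson update and the quadratic-loss estimator above amount to very little code; the hyper-parameters and observations below are purely illustrative:

```python
# Sketch of the conjugate update: the prior G(a, b) becomes G(a + z, b + 1)
# after observing the average error count z, and the Bayes estimator under
# quadratic loss is the posterior mean.

def bayes_update(a, b, z):
    """One Bayesian update step; returns the posterior hyper-parameters."""
    return a + z, b + 1

def bayes_estimate(a, b):
    """Posterior mean of G(a, b), i.e., the quadratic-loss estimator."""
    return a / b

a, b = 2.0, 1.0              # prior chosen by an expert (illustrative)
for z in [3, 5, 4]:          # average error counts from three protocol runs
    a, b = bayes_update(a, b, z)
print(bayes_estimate(a, b))  # (2 + 3 + 5 + 4) / (1 + 3) = 3.5
```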

4.5 Dynamic Initial Block-Size

If changes in the error frequency can exhibit regular (perhaps periodic) behavior,

then we may approximate these rates using deterministic models that we learn from

the available information. Consequently, we may set up a stochastic Poisson process

[19, 24] with deterministic intensity function (i.e., an inhomogeneous process) or

use a fully non-deterministic model for the rates, giving a Cox process model for the

error scattering. Each approach is expanded below, starting with the determination

of a model for given empirical error rates.

4.5.1 Deterministic Error-Rate Models

Due to the structure of the Cascade error correction method, we have blocks with errors located inside, and the number of these is available after the protocol terminates. Enumerate the blocks by i = 1, 2, …, N and call t_i the time stamp of the transmission at which the middle bit of the ith block has been sent. Let λ_i denote the number of errors within the ith block divided by the length (time span) of the ith block; then we are left with a classical curve-fitting problem for a given data set (t_i, λ_i), for i = 1, …, N. This can be solved by standard techniques. Updating an existing estimate Λ upon arrival of new measurements is computationally efficient, since the formula of Sherman and Morrison [39] applies. Let the fitted model be a function Λ ∈ span(g_1, …, g_m) for m linearly independent base functions g_1, …, g_m : ℝ_+ → ℝ. Possible choices for the functions g_i include the polynomial base 1, x, x², …, x^(m−1) or the trigonometric base 1, sin(x), cos(x), sin(2x), cos(2x), …, which corresponds to an approximation similar to a truncated Fourier series. Our goal is finding a function Λ(t) which approximates the time-dependent variation of errors, assuming that this function is characteristic for the error distribution on a link. Figure 4.8 displays an example.

Approximation of the given point set can be done with respect to any metric, but we propose using the Euclidean distance for the sake of an efficient update upon new data. The technique is better known as least-squares fitting. The standard least-squares approximation problem is the following: given a set of points (t_i, λ_i) for i = 1, 2, …, N, find a vector of coefficients μ = (μ_1, …, μ_m) which minimizes the squared distance functional

$$Q(\mu) = \sum_{i=1}^{N} [\lambda_i - \Lambda(t_i)]^2 = \sum_{i=1}^{N} \Big[\lambda_i - \sum_{j=1}^{m} \mu_j g_j(t_i)\Big]^2,$$


where

$$\Lambda(t) = \mu^T g = \sum_{i=1}^{m} \mu_i g_i(t), \tag{4.2}$$

with $g = (g_1, \ldots, g_m)$.

The solution μ of a least-squares fitting problem is then found by setting the first-order partial derivatives of Q with respect to μ_1, …, μ_m to zero. Solving for μ then gives the well-known expression

$$\mu = (A^T A)^{-1} A^T \lambda, \tag{4.3}$$

where the matrix A is called the design matrix and is defined as

$$A = \begin{pmatrix} g_1(t_1) & g_2(t_1) & \cdots & g_m(t_1) \\ g_1(t_2) & g_2(t_2) & \cdots & g_m(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ g_1(t_N) & g_2(t_N) & \cdots & g_m(t_N) \end{pmatrix}.$$
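A minimal sketch of the least-squares fit (4.3) with the polynomial base, using only plain Python and Gaussian elimination on the normal equations; the data values are illustrative:

```python
# Sketch: build the design matrix A and solve (A^T A) mu = A^T lambda.

def design_matrix(ts, m):
    """Design matrix A with A[i][j] = g_j(t_i) = t_i ** j (polynomial base)."""
    return [[t ** j for j in range(m)] for t in ts]

def solve_normal_equations(A, lam):
    """Return mu = (A^T A)^{-1} A^T lambda via Gaussian elimination."""
    n, m = len(A), len(A[0])
    AtA = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(m)]
           for i in range(m)]
    Atl = [sum(A[r][i] * lam[r] for r in range(n)) for i in range(m)]
    for col in range(m):                      # elimination with pivoting
        piv = max(range(col, m), key=lambda r: abs(AtA[r][col]))
        AtA[col], AtA[piv] = AtA[piv], AtA[col]
        Atl[col], Atl[piv] = Atl[piv], Atl[col]
        for r in range(col + 1, m):
            f = AtA[r][col] / AtA[col][col]
            for c in range(col, m):
                AtA[r][c] -= f * AtA[col][c]
            Atl[r] -= f * Atl[col]
    mu = [0.0] * m
    for r in range(m - 1, -1, -1):            # back substitution
        s = sum(AtA[r][c] * mu[c] for c in range(r + 1, m))
        mu[r] = (Atl[r] - s) / AtA[r][r]
    return mu

ts = [0.0, 1.0, 2.0, 3.0]
lam = [1.0, 3.0, 5.0, 7.0]                    # rates on the line 1 + 2t
mu = solve_normal_equations(design_matrix(ts, 2), lam)
```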

Incorporating new information from the (n+1)th execution of the error correction protocol amounts to updating the currently estimated model Λ(t) with the new sample z := λ_{n+1}. Let us reuse a piece of notation from statistics and write Λ(t|z) for the posterior model, which incorporates the information z that we gathered from the last protocol execution. Consequently, we may call Λ(t) the prior model. The corresponding design matrices differ only by one additional row referring to z, so denoting it by A_z, we have the posterior model design block matrix

$$A_z := \begin{pmatrix} A \\ z^T \end{pmatrix},$$

with A being the design matrix of the prior model. For A_z, we have

$$A_z^T A_z = \begin{pmatrix} A^T & z \end{pmatrix} \begin{pmatrix} A \\ z^T \end{pmatrix} = A^T A + z z^T,$$

which shows that taking the new information z into account amounts to a rank-one update of $A^T A$. This particular type of modification permits the application of the Sherman-Morrison formula [39], which efficiently creates the inverse of $A_z^T A_z$ from the (known) inverse of $A^T A$. Setting $B := (A^T A)^{-1}$ gives [34]

$$(A_z^T A_z)^{-1} = B - \frac{B z z^T B}{1 + z^T B z},$$

provided that the denominator does not vanish. The posterior model Λ(t|z) is hence found by using Eqs. (4.3) and (4.2) as

$$\Lambda(t \mid z) = \Big[(A_z^T A_z)^{-1} A_z^T (\underbrace{\lambda_1, \ldots, \lambda_n}_{\text{previous information}}, \underbrace{z}_{=\lambda_{n+1}})^T\Big]^T g.$$
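The rank-one update can be sketched directly; in the sketch we write v for the appended design row, to keep it apart from the new observation z, and the matrices are illustrative:

```python
# Sketch of the Sherman-Morrison update: given B = (A^T A)^{-1}, the inverse
# after the rank-one change A^T A + v v^T is  B - (B v v^T B)/(1 + v^T B v).

def sherman_morrison(B, v):
    """Update the (symmetric) inverse B after a rank-one change v v^T."""
    m = len(B)
    Bv = [sum(B[i][j] * v[j] for j in range(m)) for i in range(m)]
    denom = 1.0 + sum(v[i] * Bv[i] for i in range(m))
    return [[B[i][j] - Bv[i] * Bv[j] / denom for j in range(m)]
            for i in range(m)]

B = [[0.5, 0.0], [0.0, 0.5]]          # inverse of A^T A = diag(2, 2)
B_new = sherman_morrison(B, [1.0, 1.0])
# B_new inverts diag(2, 2) + [1,1][1,1]^T = [[3, 1], [1, 3]]
```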

The general technique applies for any model which is linear in the parameter vector, so the two choices above, as well as exponential base functions, may be used equally well, but may exhibit significant differences in terms of performance. An oscillatory behavior may be better approximated by a trigonometric base, for instance. It is thus up to the engineer to do the fine-tuning of the model for a specific link under consideration.

4.5.2 Going Beyond Least Squares

Among the assumptions underlying our method is homoscedasticity of the error rate's errors. Stated differently, we may think of error rates being described by a model f + ε having random errors ε with constant second moment (variance), whereas f may describe an underlying trend model. This, however, may not necessarily be the case, since given a time series of error rates from the Cascade protocol, we may identify certain patterns of increased error rates following dense error patterns, or vice versa. If error rates exhibit dependencies on their history, the assumption of homoscedasticity is violated, and we may find a heteroscedastic model more appropriate. Figure 4.2 illustrates the difference between the two, showing a heteroscedastic model with linearly increasing variances. Specifically, an autoregressive conditional heteroscedastic (ARCH) model or its generalized version, the GARCH model, may be a promising candidate for a successful error model. Loosely speaking, a stochastic process (modeling the error rates) follows an ARCH model if its expectation (conditional on its past) is zero, but its variance, being conditioned on the history of the process, nontrivially depends on the sigma field generated by the past observations [3]. This means that we explicitly model variations in the second moment of the error. A generalized ARCH model arises if an ARCH model is extended with dependencies on the variance of the error itself. Deterministic trends with time-dependent variances can be incorporated with this method. A well-written introduction is found in [18].

Taking a different route, we may switch from the simple models we described to a more flexible and general method, known as generalized linear models (see Chap. 7). Intuitively, one may think of such models as a uniform approach to several regression techniques, and the idea for our context is briefly sketched as follows: assume that the error rates (response) Λ follow a distribution that belongs to the exponential family (see [37] for a definition). This intensity Λ is written as a linear combination η of the influencing factors (predictors). The so-called link function g describes how the expected error rate EΛ is related to η, i.e., η = g(EΛ). In principle, any monotone, continuous, and differentiable link function can be used, but convenient choices and standard recommendations are given in the literature (see [30] for an introduction, as well as [16, 14, 28] for a more application-oriented


Fig. 4.2 Example error rates (empirical error rate over time) with linear trend and constant/increasing measurement variances: (a) homoscedasticity, (b) heteroscedasticity

presentation). This is particularly interesting in our case, since error rates can be classified as counting data, in which case the Poisson distribution is the natural candidate. This distribution is in the exponential family and can thus be used with generalized linear models, or generalized linear mixed models (GLMMs), if further flexibility is demanded.

It should be noted, however, that the mentioned extensions to the classical least-squares method often do not permit closed-form analytical solutions, in which case we are stuck with intricate numerics, potentially cutting down the efficiency of the methods.

4.5.3 Error Rates as a Stochastic Process

Assuming errors to happen independently from each other at constant intensity λ, we may think of the event of an error as a random variable following a Poisson distribution with parameter λ. The temporal spread of errors over the raw key is then a process with independent increments, also following a Poisson distribution with intensity parameter proportional to the length of the block. To make this precise, let us briefly review the properties of a Poisson process for the convenience of the reader. A stochastic process is a family {X(t) : t ∈ T} of random variables, indexed by a variable t. A standard convention interprets t as the time, in which case T = ℝ_+, i.e., t ≥ 0. Depending on whether the variable t is continuous or discrete, we distinguish stochastic processes from stochastic chains, respectively. Moreover, the process is called discrete if each X(t) is discrete, for otherwise it is called a continuous stochastic process/chain, or simply a stochastic process/chain for short. Following the presentation of [38], a Poisson process is a family of discrete counting measures {N(t) : t ≥ 0} which satisfy the following three conditions:

1. N(0) = 0.
2. The process has independent increments.
3. The number of events in any interval of length t is Poisson-distributed with mean λt. That is, for all s, t ≥ 0,

$$\Pr\{N(s + t) - N(s) = n\} = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}, \quad n = 0, 1, \ldots.$$

Our Poisson process model N(t) will count the total number of errors in the bit string at time t. Since our blocks should be chosen such that the expected additional number of errors after taking a time-step Δt is only 1, we are particularly interested in the distribution of inter-arrival times between two errors. It is well known that the distribution of the time between two errors is exponential with mean λ^{-1} if that parameter is constant [38]. However, this is an advantage and may be a drawback at the same time: the exponential distribution (together with its discrete counterpart, the geometric distribution) is the only one which enjoys being memoryless, that is, the probability of observing another n′ error-free trials after having seen n of them already is independent of n, or formally [33],

$$\Pr\{N > n + n' \mid N > n\} = \Pr\{N > n'\}.$$

Since the inter-arrival times between two errors are exponentially distributed with mean λ^{-1}, they do exhibit the memoryless property, which may appear unnatural in our setting when considering the physical sources of error. On the bright side, however, we explicitly know the expected number of errors within a time span Δt, which is λ · Δt. Assuming that the bits come in at frequency f, then if n denotes the number of arriving bits within Δt, we have f · Δt = n, and the block-size in terms of bits is the solution of the equation λn/f = 1 and thus comes to

$$\text{initial block-size } n \approx \frac{f}{\lambda}, \tag{4.4}$$

which is the block-size (in bits) that the Poisson process gives us. Observe that we have a constant block-size again, the only difference to the previous Cascade variant being that it originates from a probabilistic model rather than a time series smoothing approach, because the maximum likelihood estimator for the intensity parameter of a Poisson distribution is precisely the arithmetic mean. This is in accordance with our previous assumption on the availability of time series representing the average number of errors in several executions of the protocol. Hence, it can be seen as a justification of the original proposal of the developers of Cascade [2], but offers no real advantage unless we can cleverly adapt λ.

4.5.3.1 Simulation of a Poisson Process

Knowing that the inter-arrival times between errors are exponentially distributed with parameter λ makes the simulation of error patterns particularly easy for constant λ, since we only have to draw from the exponential distribution. This can be done efficiently using the inversion method [36] as follows:

1. Draw U from a uniform distribution over the unit interval.
2. Set the random sample X := −(1/λ) ln(U); then X has an exponential distribution with parameter λ.
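This inversion-method simulation for a constant rate λ can be sketched as follows (parameter values illustrative):

```python
# Sketch: simulate error positions of a homogeneous Poisson process by
# summing exponential inter-arrival times drawn via inversion.

import math
import random

def simulate_errors(lam, horizon, rng=random.random):
    """Error times in (0, horizon) for a Poisson process with rate lam."""
    times, t = [], 0.0
    while True:
        u = rng()                       # uniform on the unit interval
        t += -math.log(u) / lam         # exponential inter-arrival time
        if t >= horizon:
            return times
        times.append(t)

random.seed(1)
errors = simulate_errors(lam=5.0, horizon=10.0)
# expected number of errors: lam * horizon = 50
```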

Figure 4.3 shows an example of errors simulated with this procedure. The impression of a uniform distribution of the n events occurring within the interval (0, T) at times t_1, …, t_n is indeed correct, as it can be shown that, given that N(T) = n errors have occurred up to time T, the times t_1, …, t_n have the same distribution as the order statistics corresponding to n independent random variables uniformly distributed on (0, T) [38].

Fig. 4.3 Example of a Poisson process error pattern with constant error rate. Vertical lines mark

errors observed over the horizontal time axis

Coming back to the criticism, considering the physical sources of errors in photon transmission makes a constant error rate with independent errors across adjacent intervals appear unnatural. A more natural model would consider a non-constant error-rate parameter λ(t), in which case, however, we lose the exponential distribution of the time between two errors. A Poisson process with time-dependent (but deterministic) intensity parameter λ(t) is called inhomogeneous and becomes a Cox process if the intensity parameter is replaced by a random variable. This type of model is discussed in the next section.

Let us consider the case of a deterministic intensity parameter λ(t) first. We have prepared the basics for this in Sect. 4.5.1. Suppose that the error rate has been approximated by a model λ ∈ span(g_1, …, g_m) of integrable base functions g_i and that this model provides a satisfactory approximation of the Poisson rates over time. Following [38], by setting

$$\mu(t) := \int_0^t \lambda(s)\,ds,$$

one can show that

$$\Pr\{N(t + s) - N(t) = n\} = \exp(-[\mu(t + s) - \mu(t)])\,\frac{[\mu(t + s) - \mu(t)]^n}{n!},$$

62 S. Rass and C. Kollmitzer

t+s

which is again a Poisson distribution with rates μ(t +s)’μ(t) = t »(„ )d„ . Sim-

ilarly as for standard Poisson processes, one can further deduce that the probability

of one counted error within the interval (t, t + ”t) is

Pr {one counted error within(t, t + ”t)} = »(t)”t + o(”t),

so as before, we may approximate the block-size in bits from the time span ”t

according to the solution of »(t)”t = 1, which by calling f the bit frequency again

(see above) gives

f

,

block-size ≈

»(t)

neglecting the terms o(”t).

4.5.3.2 Simulation of an Inhomogeneous Poisson Process

We follow in our presentation the lines of [22], remarking that the underlying

method is better known as rejection method (cf. [36]). Let W be a bounded set over

which we wish to simulate an inhomogeneous Poisson process. In our case, this

will be the output bit string of the photon detection devices. In the ¬rst step of the

simulation, a standard (homogeneous) Poisson process is simulated with constant

intensity parameter »— = maxt∈W »(t), where we assume »— ¬nite. This is trivially

satis¬ed in our setup if the base functions for »(t) are continuous and the length

of the bit string is ¬nite, so W is compact. The number of points in this pattern

will be higher than the number in the ¬nal pattern, which arises by independent

location-dependent thinning based on the thinning function p(t) = »(t)/»— [38].

The decision about whether or not a point from the ¬rst simulation is retained in

the ¬nal output is made by taking p(t) as its survival probability. In other words,

a point xn is retained with probability p(xn ) independently of what happens to the

other points. If the function »(t) is such that its maximum is dif¬cult to determine,

then any larger value may be used instead, resulting in a less ef¬cient procedure,

however, as more points will be rejected in that case. Figure 4.4 shows a simulated

error pattern for an inhomogeneous point process with linear trend »(t) = at for

some a > 0.

4.5.4 Using Bayesian Networks and Cox Processes

Cox processes may be considered as a generalization of inhomogeneous Poisson

processes in the sense of having their intensity parameter stochastic rather than

deterministic. For that reason, they are sometimes termed doubly stochastic Poisson

processes [19, 22], and the ¬rst task for setting up a Cox process model will be

¬nding a probabilistic model for the random intensity parameter Λ. Since errors in

our model are due to random distortions, it appears natural to consider temperature,

4 Adaptive Cascade 63

0 T

(a) Error pattern with linearly increasing intensity parameter

Error count

0 T

Time

(b) Frequency of errors over time

Fig. 4.4 Inhomogeneous Poisson process error pattern with linear trend. The histogram displays

the number of errors per time-slot, showing a (linear) increase of errors over time

humidity, vibration, etc., as natural factors having in¬‚uence on the error rate, as

discussed in Chap. 7. For a concrete link, one may be able to quantify a number of

additional aspects being relevant for the error rate, and an intuitive modeling tech-

nique for setting up joint probability distributions with known dependency relations

among them is offered by the ¬eld of probabilistic networks (we refer the reader to

[4, 10, 26, 32] and references therein for details). Call Z 1 , . . . , Z k a set of random

variables representing environmental conditions on which the error rate » depends

on. A statistical analysis of the in¬‚uence of temperature (Z 1 ), humidity (Z 2 ), as well

as other environmental conditions, is provided in Chap. 7; see also [27]. Among

these, dependency relations may exist which are either known or unknown. Even

if they are unknown, a variety of learning strategies exist, taking existing data for

inferring independency statements. Having set up a graph with arcs representing

dependency between the nodes Z 1 , . . . , Z k , one can sample from the so-modeled

joint probability distributions using a variety of different techniques, as shown in

the cited literature. The appeal of this modeling approach is its ability to capture the

situation at hand almost directly, as we can translate several factors that have been

identi¬ed to be of importance into a model using a widely automated procedure.

It should be noticed, however, that many of the existing learning procedures for

Bayesian networks (see [21] for instance), as well as inference procedures are com-

putationally hard [9, 11, 12], even NP-complete [8]. Therefore, heuristic procedures

for tackling this problem have been developed; see [25]. A restriction that applies

to Bayesian network modeling is the acyclicity of graph models, meaning that no

circular dependencies are allowed to exist like Z 1 ’ Z 2 ’ Z 3 ’ Z 1 for instance.

Relaxing the acyclicity assumption yields dependency networks [20]. If the direc-

tionality of the dependency is unknown, then an undirected link may be chosen, in

which case we are dealing with chain graphs [7] if the network contains no partially

64 S. Rass and C. Kollmitzer

directed cycles or with Markov networks if all links are undirected. Each of these

comes with a comprehensive theory of learning its structure and doing inference.

As an example, consider temperature T , humidity H , and vibration V as in¬‚u-

ence factors, then a simple Bayesian network modeling the joint distribution of all

four variables (i.e., including the error density f Λ ) is depicted in Fig. 4.5. Although

the speci¬cation of the given Bayesian network requires knowledge of the distribu-

tions of T, H, V , as well as the distribution of Λ conditional on T, H, V , its real

power lies in its learning capabilities. In other words, the error distribution condi-

tional on the identi¬ed input variables can be updated during repeated protocol exe-

cutions upon measurements of T, H, V . This adds further ¬‚exibility to our model,

as repeated Bayesian updates of an estimator are consistent under some regularity

conditions [1, 13].

Fig. 4.5 Example Bayesian network model for random error rate Λ. Nodes are random variables

with (conditional) densities assigned. Dependencies are modeled with directed arcs

The probabilistic network shown in Fig. 4.5 is a model for a simple random

variable, whereas in our setting we consider the error rate to undergo seasonal varia-

tions (where the term seasonal may be understood as short- as well as long-duration

changes). Consequently, we may set up a Poisson process with intensity parameter

distributed according to a density induced by a Bayesian network.

A Cox point process model is set up in two steps [22]:

• Modeling of a stationary non-negative-valued stochastic process Λ(t).

• Given a realization of the process Λ(t) de¬ning the local error rates over time, the

values corresponding to this realization form an inhomogeneous Poisson process

with intensity parameter Λ(t).

4 Adaptive Cascade 65

4.5.4.1 Simulating Errors with Cox Processes

Simulation of a Cox process is straightforward by drawing a realization from Λ(t)

and simulating an inhomogeneous Poisson process with this time-dependent error

rate.

4.6 Examples

Let us compare some of our proposals in terms of quality of the block-size selec-

tion strategy. Assume that the errors are scattered across a bit string, following an

inhomogeneous Poisson distribution with an unknown intensity function roughly1

having the shape depicted in Fig. 4.6. The horizontal axis covers a duration of [0, T ],

one may think of the shape re¬‚ecting periods of high and low error rates, induced

by vibrations due to traf¬c peak hours if T is set to 24 h. Extending the example

further by including seasonal temperature variations or similar is left to the reader.

Error rate

0 T

Duration

Fig. 4.6 Temporal variations of error rate (example)

Cascade with adaptive but a priori ¬xed initial block-sizes takes a constant block-

size according to the median of the last w error rates, and take the block-size recip-

rocal to the (median-)estimated intensity parameter », according to expression (4.4),

where the rate for one protocol execution is estimated in the maximum likelihood

sense as

number of errors

.

»=

ˆ (4.5)

total number of bits

Clearly, taking constant block-sizes in the Cascade protocol implicitly assumes

constant error rates over the whole duration of the protocol, so the Cascade variant

with ¬xed initial block-size is best suited for slow variations of », such as being the

1 None of the examples provided here is related to a real-life quantum cryptographic system, in

order to adhere to intellectual property rights of the related data sets. Hence, all diagrams are given

without explicit scales. All the data that have been used for the experiments described here come

from simulations based on real-life system information.

66 S. Rass and C. Kollmitzer

case if we take T = 24 h in Fig. 4.6. We simulate 500 consecutive executions of

the Cascade protocol over the duration [0, T ] with error rates varying as shown in

Fig. 4.6 and a sliding median smoothing with window size w = 11.

In each execution, the ¬nal string is randomly changed (bit-¬‚ip to introduce an

error) according to the (true) error pattern induced by the Poisson process, with

intensity parameter varying according to the function in Fig. 4.6. Taking the median

over the last 11 estimators given by (4.5), one can determine the (constant) block-

size and count the number of errors falling into each block along the full length

of the output string. Counting the number of blocks with 1, 2, 3, . . . errors, we end

up with a histogram, which for our example looks like shown in Fig. 4.7 (notice

that the absolute numbers were normalized to represent empirical probabilities of

encountering blocks with 1, 2, 3, . . . errors).

0.4

0.3

Probability

0.2

0.1

0

0 5 10 15

Number of errors per block

Fig. 4.7 Example probabilities for blocks with 1, 2, 3, . . . errors for Cascade error correction with

¬xed initial block-size

For evaluating the dynamic variant of the protocol, let the function in Fig. 4.6

cover the duration of one single protocol execution within [0, T ]. Simulation of an

inhomogeneous Poisson process according to the rejection method sketched above

and calculating the average number of errors in equidistantly distributed bins across

the period [0, T ] give a set of points for which an approximating function Λ(t) is

sought. Figure 4.8 shows an example where this has been done and a polynomial

model (solid line) has been ¬tted. Figure 4.9 displays the empirical probability of

encountering blocks with 1, 2, 3, . . . errors, showing that the variations in the local

error rates can indeed be smoothed by our proposal. For comparison, Fig. 4.10 dis-

plays the distribution of the number of errors within blocks if a constant block-size

is used. In the example, we took the maximum likelihood estimator from the sim-

ulated inhomogeneous Poisson process and derived the constant block-size using

formula (4.4).

4.7 Summary

Our results show that adapting the block-size to variations of the local error rate

is indeed worthwhile, since it signi¬cantly increases the ef¬ciency of the error

correction by reducing the number of bits that become revealed to the adversary

4 Adaptive Cascade 67

Error rate

local error estimate

exact intensity

fitted intensity function model

0 T

Time

Fig. 4.8 Example of estimating local error rates and ¬tting a model to the empirical data

0.4

0.3

Probability

0.2

0.1

0

0 5 10 15 20

Number of errors per block

Fig. 4.9 Example empirical probabilities for blocks with 1, 2, 3, . . . errors for Cascade with

dynamic initial block-size

during the Cascade protocol. This leads to a considerable improvement of the QKD

in terms of ef¬ciency.

Combining the described models is possible by mixing a deterministic trend with

a stochastic variational part to capture regular and predictable variations of the error

rate which undergoes random perturbations through environmental in¬‚uences. This

can be seen as an extension to the least-squares model we proposed in Sect. 4.5.1,

whereas it is known that the least-squares ¬t is a best linear unbiased estimator

(BLUE; this is the Gauss“Markov theorem [23]) and becomes a best unbiased esti-

mator (BUE) if the data enjoy a normal distribution. Hence, a combined approach is

most suitable if the error is non-Gaussian, in which case a Bayesian model may be

a promising alternative. A different view on the problem of partitioning a string into

blocks such that the number of errors per block is minimized may also open the area

68 S. Rass and C. Kollmitzer

0.25

0.2

Probability

0.15

0.1

0.05

0

0 5 10 15 20

Number or errors per block

Fig. 4.10 Example empirical probabilities for blocks with 1, 2, 3, . . . errors for Cascade with ¬xed

initial block-size but strongly varying local error rates

for techniques known from designs of experiments. See [29, 35] for an introduction

and overview.

References

1. Barron, A., Schervish, M.J., Wasserman, L.: The consistency of posterior distributions in non-

parametric problems. Ann. Stat. 27(2), 536“561 (1999) 64

2. Bennett, C.H., Bessette, F., Brassard, G., Salvail, L., Smolin, J.: Experimental quantum cryp-

tography. J. Cryptol. 5, 3“28 (1992) 49, 50, 51, 52, 61

3. Bollerslev, T., Engle, R.F., Nelson, D.B.: Handbook of Econometrics, Vol. IV, Chap. 49:

ARCH Models, pp. 2959“3038. Elsevier Science B.V., Amsterdam (1994) 58

4. Borgelt, C., Kruse, R.: Graphical Models “ Methods for Data Analysis and Mining. John

Wiley & Sons, UK (2002) 63

5. Brassard, G., Salvail, L.: Secret-key reconciliation by public discussion. In: Heile-Beth, T (ed.)

EUROCRYPT. Springer, New York, pp. 410“423 (1993) 51

6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, New York

(1996) 54, 55

7. Buntine, W.L.: Chain graphs for learning. In: Besnard, P. and Hanks, S. (eds.) Uncertainty in

Arti¬cial Intelligence, Morgan Kaufmann, San Francisco, CA., pp. 46“54 (1995) 63

8. Chickering, D.M.: Learning bayesian networks is NP-complete. In: D. Fisher, H.J. Lenz (eds.)

Learning from Data: Arti¬cial Intelligence and Statistics V, Chap. 12, pp. 121“130. Springer-

Verlag New York (1996) 63

9. Cooper, G.F.: The computational complexity of probabilistic inference using bayesian belief

networks (research note). Artif. Intell. 42(2“3), 393“405 (1990) 63

10. Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and

Expert Systems. Springer, New York (1999) 63

11. Dagum, P., Chavez, R.M.: Approximating probabilistic inference in bayesian belief net-

works. IEEE Trans. Pattern Anal. Mach. Intell. 15(3), 246“255 (1993). DOI http://

dx.doi.org/10.1109/34.204906 63

12. Dagum, P., Luby, M.: Approximating probabilistic inference in bayesian belief networks is

NP-hard. Artif. Intell. 60(1), 141“153 (1993) 63

13. Diaconis, P., Freedman, D.: On the consistency of bayes estimates. Ann. Stat. 14(1), 1“26

(1986) 64

14. Dobson, A.J.: An introduction to generalized linear models, 2nd edn. Chapman & Hall, CRC

(2002) 58

15. Doucet, A. (ed.): Sequential Monte Carlo Methods in Practice. Springer, New York (2001) 56

4 Adaptive Cascade 69

16. Faraway, J.J.: Extending the Linear Model with R. Chapman & Hall/CRC, Boca Ration (2006) 58

17. Gilbert, G., Hamrick, M.: Practical quantum cryptography: A comprehensive analysis (part

one) (2000). URL http://www.citebase.org/abstract?id=oai:arXiv.org:quant-ph/00%09027 50, 52

18. Gouri´ roux, C.: ARCH Models and Financial Applications. Springer, New York (1997) 58

e

19. Grandell, J.: Doubly Stochastic Poisson Processes. Springer, New York (1976) 56, 62

20. Heckerman, D., Chickering, D.M., Meek, C., Rounthwaite, R., Myers Kadie, C.: Dependency

networks for inference, collaborative ¬ltering, and data visualization. J. Mach. Learn. Res. 1,

49“75 (2000) 63

21. Heckerman, D., Geiger, D., Chickering, D.M.: Learning bayesian networks: The combination

of knowledge and statistical data. In: KDD Workshop, pp. 85“96 (1994) 63

22. Illian, J., Penttinen, A., Stoyan, H., Stoyan, D.: Statistical Analysis and Modeling of Spatial

Point Patterns. Wiley, Chichestor (2008) 62, 64

23. Kariya, T., Kurata, H.: Generalized Least Squares. Wiley, Chichestor (2004) 67

24. Kingman, J.: Poisson Processes. Oxford Science Publications, Oxford, UK (1993) 56

25. Larra˜ aga, P., Poza, M., Murga, R., Kuijpers, C.: Structure learning of bayesian networks by

n

genetic algorithms: A performance analysis of control parameters. IEEE J. Pattern An. Mach.

Intell. 18(9), 912“926 (1996) 63

26. Lauritzen, S.L.: Graphical Models. Oxford Statistical Science Series 17. Oxford Science Pub-

lications, New York (1996) 63

27. Lessiak, K., Kollmitzer, C., Schauer, S., Pilz, J., Rass, S.: Statistical analysis of QKD networks

in real-life environments. In: Proceedings of the Third International Conference on Quantum,

Nano and Micro Technologies. IEEE Computer Society (2009) 109“114 63

28. Lindsey, J.K.: Applying Generalized Linear Models. Springer, New York (1997) 58

29. Mason, R.L.: Statistical Design and Analysis of Experiments with Applications to Engineering

and Science. Series in Probability and Statistics. Wiley, New York (2003) 68

30. McCullagh, P., Nelder, J.: Generalized Linear Models, 2nd edn. Monographs on Statistics and

Applied Probability 37. Chapman & Hall, London (1989) 58

31. Pankratz, A.: Forecasting with Univariate Box-Jenkins Models. Wiley, New York (1983) 54

32. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference.

Morgan Kaufmann Publishers Inc., San Francisco, CA (1988) 63

33. Pitman, J.: Probability. Springer, New York (2000) 60

34. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, 2nd

edn. Cambridge University Press, New York (1992) 57

35. Rasch, D., Herrend¨ rfer, G., Bock, J., Busch, K.: Verfahrensbibliothek, Versuchsplanung und

o

-auswertung. Deutscher Landwirtschaftsverlag, Berlin (1978) 68

36. Ripley, B.D.: Stochastic Simulation. Wiley, New York (1987) 61, 62

37. Robert, C.P.: The Bayesian Choice. Springer-Verlag, New York (2001) 55, 56, 58

38. Ross, S.M.: Stochastic Processes. Series in Probability and Mathematical Statistics. Wiley,

New York (1983) 60, 61, 62

39. Sherman, J., Morrison, W.J.: Adjustment of an inverse matrix corresponding to a change in

one element of a given matrix. Ann. Math. Statistics 21, 124“127 (1950) 56, 57

ˇ

40. Stulajter, F.: Predictions in Time Series Using Regression Models. Springer, New York (2002) 54

Chapter 5

Attack Strategies on QKD Protocols

S. Schauer

In the following we will describe a number of different attack strategies on Quantum

Key Distribution protocols. First, we will discuss QKD protocols based on ideal

sources, mainly using single photons as in [2, 1, 11] but also entanglement-based

protocols as, for example, in [15]. Additionally, we will deal with the security of

QKD protocols in realistic environments as they are described in Chap. 6. Regarding

realistic sources, due to the physical limitations of the apparatus some loopholes

were identi¬ed, which could lead to a successful attack on such protocols. As we

will see, special countermeasures are implemented in the protocols to close those

loopholes and guarantee an unconditional secure communication.

The security discussions are limited to individual attacks, since they are the sim-

plest and most studied class of attack strategies on QKD protocols. The key point of

an individual attack is that the eavesdropper, Eve, interacts with each signal coming

from Alice separately. Some strategies rely on the fact that Eve is able to postpone

her interaction with the signal (i.e., the measurement) until after the sifting and error

correction “ or as long as she wants “ to obtain the maximum amount of information

from the public discussion. Alternatively, Eve measures the state instantly and uses

the information from sifting and error correction later on.

The individual attacks stand in opposition to the collective attacks, where Eve

prepares an ancilla state for each signal coming from Alice and lets it interact with

the signal. Eve passes the signal on to Bob and keeps the ancilla at her side. When

all signals are transferred Eve waits to obtain as much information as possible from

the public communication to perform the best measurement on all of her ancillae. It

has been proven that the same security bound de¬ned for QKD protocols also holds

if collective attacks are applied [8, 7].

The most general version of collective attacks are coherent attacks. In this kind

of attack Eve is allowed to perform any quantum operation on the signal in transit

and use any possible ancilla state. Due to this vast amount of possibilities for Eve the

attack cannot be parametrized and brute force optimization is not possible. Anyhow,

S. Schauer (B)

Safety & Security Department, Quantum Technologies, AIT Austrian Institute of Technology

GmbH, Lakeside B01A, 9020 Klagenfurt, Austria, stefan.schauer@ait.ac.at;

http://www.ait.ac.at

Schauer, S.: Attack Strategies on QKD Protocols. Lect. Notes Phys. 797, 71“95 (2010)

c Springer-Verlag Berlin Heidelberg 2010

DOI 10.1007/978-3-642-04831-9 5

72 S. Schauer

bounds for information and theoretical security have been found which are equal to

the bounds for collective attacks [34].

5.1 Introduction

In the discussions about the security of QKD protocols the amount of information

Eve has about the secret key shared between Alice and Bob is of great interest.

Every attack on a QKD protocol is characterized by this amount of information and

Alice™s and Bob™s goal is to minimize it. Usually, Eve™s information about the secret

key consists of two parts: the information she obtains from her measurement on the

signal in transit and the information about Alice™s and Bob™s choice of bases. The

second part tells Eve whether she made a correct measurement for the respective

qubit or not. The best way to express this fact is to use the conditional probabil-

ity p(s|m). Here, s is the value of the bit Alice sent and m is Eve™s measurement

outcome. The value p(s|m) can easily be computed using the probabilities p(m|s),

i.e., the probability that Eve obtains the result m if Alice originally sent the bit s. To

achieve that the following formula is used:

p(m|s)

.

p s|m = (5.1)

p(m|s )

s

The next interesting value is the probability that Eve obtains the same result as

Alice, which is the collision probability

2

Pc s|m = p s|m . (5.2)

s

Further, the expected collision probability over all possible measurement out-

comes m of Eve is described as

Pc = p m Pc s|m . (5.3)

m

To quantify Eve™s amount of information on Alice™s bit two entropies are used:

the Shannon entropy H and the Renyi entropy R (cf. also “Sect. 2.3”). Since the

information depends on Eve™s measurement result, the conditional versions of these

two entropies are used, i.e., H (S|M) and R(S|M), respectively. The Renyi entropy

can be computed directly from the collision probability Pc (s|m)

2

R S|M = m = ’ log Pc s|m = ’ log p s|m (5.4)

s

and is averaged over the respective probabilities of Eve™s measurement results, i.e.,

R S|M = p m R S|M = m (5.5)

m

5 Attack Strategies on QKD Protocols 73

Similarly, the Shannon entropy is de¬ned as

H S|M = m = ’ p s|m log p s|m (5.6)

s

and also averaged over the probabilities of Eve™s results

H S|M = p m H S|M = m . (5.7)

m

The Shannon entropy H is an estimator of the uncertainty of a probability dis-

tribution and thus the variation of the Shannon entropy can be interpreted as the

information gain. For the a priori probability distribution X and the a posteriori

distribution Y the information gain is I = H (X ) ’ H (Y ). This can be used to

describe the amount of information Eve obtains by applying a speci¬c attack. Here

Eve has no a priori information about the secret key and thus H = 1, since Alice

chooses her bit string at random. Therefore, the amount of information gained by

Eve is I = 1 ’ H (S|M), which will be used constantly in the following sections.

Another important question is how much key material has to be discarded to min-

imize Eve™s knowledge about the key. This amount is called the discarded fraction

„ and is computed using the expected collision probability

1

„ = 1 + log Pc n . (5.8)

Following from this equation a bit string of length n has to be reduced by n„ bits

during privacy ampli¬cation to leave Eve with at most 1 Shannon bit of information

on the whole secret key, no matter its length [24].

5.2 Attack Strategies in an Ideal Environment

5.2.1 Intercept and Resend

5.2.1.1 Naive Intercept and Resend

The most intuitive kind of an individual attack is the intercept and resend (I&R)

attack. The main intention for Eve is to get hold of each photon coming from Alice

and measuring it in some prede¬ned basis. According to her result Eve prepares a

new photon and forwards it to the legitimate receiver, Bob. In detail, Alice™s qubit

will either be in the horizontal/vertical or the diagonal basis, explicitly one of the

four states |H , |V , |+ , or |’ , where

1

1

|+ = √ |H + |V |’ = √ |H ’ |V . (5.9)

2 2

74 S. Schauer

If Alice sends a 0, she will either encode it into |H or |+ with equal proba-

bility. Eve, unaware of Alice™s encoding, will choose randomly between the H/V

and the +/’ basis. Thus, she will obtain a correct result in case Alice sent |H

and Eve measured in the H/V basis or Alice sent |+ and Eve measured in the

+/’ basis, respectively. Any other combination will result in a completely random

measurement outcome. This leads to the scheme in Fig. 5.1.

1 1

Bit s = 0

2 2

+

H Alice™s basis choice

1 1 1 1

2 2 2 2

Eve™s basis

+/ ’ +/ ’

H/V H/V

choice

1 1 1 1

1 1

0 0

2 2 2 2

Value of m

+ ’ + ’

H V H V

Fig. 5.1 Decision tree for the naive I&R attack strategy

For now, we will also assume that Eve does not listen to any public commu-

nication between Alice and Bob. Therefore, she will not know in which case her

measurement was wrong. The conditional probabilities p(m|s) for the four possible

results are then

3 2

1 3

1

+ ·1=

p m = |H |s = 0 = p m = |+ |s = 0 =

2 2 8

(5.10)

3 2

1 1 1

p m = |V |s = 0 = p m = |’ |s = 0 = + ·0=

2 2 8

and equally for p(m|s = 1). For the conditional probabilities p(s|m) the sum

s p(m|s) = 2 and thus we get p(s|m) = 2 p(m|s). Further, the collision prob-

1

ability in the naive version of the I&R attack is

2 2

3 1 5

Pc s|m = |H = + = (5.11)

4 4 8

and similarly for m = |V , m = |+ , and m = |’ which results in the average

collision probability

2 2

1 1 1 3 5

Pc = Pc s|m = 4 + = . (5.12)

4 4 4 4 8

m

5 Attack Strategies on QKD Protocols 75

From the collision probability the discarded fraction can be computed as

1 ’ log Pc which is „ 0.322. Thus, only one-third of the key has to be discarded

to guarantee that Eve has less than one bit of information on the whole key.

Looking at the Renyi entropy for m = |H , we get

5

= ’ log Pc s|m = |H = ’ log = 3 ’ log 5.

R S|M = |H (5.13)

8

Since the Renyi entropy for m = |V , m = |+ , and m = |’ is the same and all

four results are equally probable, the average Renyi entropy is

1 1

R S|M = R S|M = m = 4 3 ’ log 5 = 3 ’ log 5. (5.14)

4 4

m

The second interesting entropy is the conditional Shannon entropy H (S|M =

|H )

3 31 1

H S|M = |H = ’ log ’ log = 0.811, (5.15)

4 44 4

which is equal to the other entropies H (S|M = |V ), H (S|M = |+ ), H (S|M =

|’ ), such that

1 1 3 31 1

H S|M = H S|M = m = 4 ’ log ’ log = 0.811.

4 4 4 44 4

m

(5.16)

The total information Eve will have in the end about each bit is 1 ’ H S|M

0.2, which is rather poor for Eve. Therefore, Eve will use another strategy, which

gives her more information. One possibility is to use another measurement basis,

e.g., the Breidbart basis.

5.2.1.2 Intercept and Resend in the Breidbart Basis

As pointed out above, we did not take into account that Eve is able to listen to

the public discussion of Alice and Bob during the error correcting and sifting stage

in the naive I&R attack. If Eve listens to the public discussion she can obtain more

information since she knows in which cases she did a correct measurement. Further-

more, for this attack strategy Eve restricts herself to only one measurement basis and

gains more information compared to the naive I&R attack. The measurement basis

in this attack is the •± basis, which is

|•+ = a|H + b|V |•’ = b|H ’ a|V . (5.17)

Here, a = sin ± and b = cos ± depend on some angle ±. Using this basis for

Eve™s measurement we get a slightly different decision tree (cf. Fig. 5.2).

76 S. Schauer

Fig. 5.2 Decision tree for the I&R attack strategy in the Breidbart basis

If Alice sends a 0 encoded as |H , Eve will obtain |•+ with probability a 2 and

|•’ with probability b2 . If Alice encodes 0 as |+ , Eve will obtain |•+ with prob-

ability c2 = 1 (a + b)2 and |•’ with probability d 2 = 1 (a ’ b)2 , which is a direct

2 2

result from the measurement of |+ in the |•± -basis. This gives the conditional

probabilities

12

p m = |•+ , H/V |s = 0 = a =p m = |•’ , H/V |s = 1

2

1

p m = |•’ , H/V |s = 0 = b2 = p m = |•+ , H/V |s = 1

2 . (5.18)

1

p m = |•+ , +/’ |s = 0 = c2 = p m = |•’ , ’/+ |s = 1

2

1

p m = |•’ , +/’ |s = 0 = d 2 = p m = |•+ , ’/+ |s = 1

2

Since a 2 + b2 = c2 + d 2 = 1, the sum s p(m|s) = 1 and thus the conditional

2

probabilities are p (s|m) = 2 p(m|s). This gives a collision probability of

2 2

Pc s|m = |•+ , H/V = a2 + b2 = a 4 + b4 , (5.19)

which leads to the average collision probability (using some simple algebra)

1 1 1 3

Pc = 2a 4 + 2b4 + 2c4 + 2d 4 = a 4 + b4 + c4 + d 4 = . (5.20)

4 4 2 4

Based on this result, the discarded fraction „ can be computed as described in

Eq. 5.8 which leads to „ 0.585. Thus, more than half of the bits of the secret key

have to be discarded to minimize Eve™s information to at most one bit.

It has to be stressed that the average collision probability is independent of the

angle ± chosen by Eve. Similarly, also the Renyi entropy

R S|M = |•+ , H/V = ’ log(a 4 + b4 ) (5.21)

5 Attack Strategies on QKD Protocols 77

and consequently the average Renyi entropy

12 1

R S|M = a + b2 ’ log a 4 + b4 + c2 + d 2 ’ log c4 + d 4

2 2 (5.22)

= 2 ’ log 3

is independent of the angle ±. This is not the case for the Shannon entropy, since

H S|M = |•+ , H/V = ’a 2 log a 2 ’ b2 log b2 (5.23)

and further

12

H S|M = a + b2 ’a 2 log a 2 ’ b2 log b2

2

1 (5.24)

+ c2 + d 2 ’c2 log c2 ’ d 2 log d 2

2

= ’ a 2 log a 2 ’ b2 log b2 .

To make Eve™s errors in the H/V and +/’ basis equally probable she uses

± = 3π . In this case the basis •± is called the Breidbart basis, as introduced in [3].

8

As pointed out above, the average collision probability and the Renyi entropy are

independent of the angle ± and it only affects the Shannon entropy, which is in that

case

3π 3π 3π 3π

H S|M = ’ sin2 ’ cos2 = 0, 61.

log2 sin2 log2 cos2 (5.25)

8 8 8 8

Thus, Eve learns 1 ’ H (S|M) 0.4 bits from every bit sent by Alice. Compared

to the intuitive and rather simple approach described above Eve gains much more

information with a measurement in the Breidbart basis (cf. Eq. 5.16). Nevertheless,

the amount of information is not yet optimal. As we will see in the next section, Eve

is able to increase her information about Alice™s bits by using the H/V and +/’

bases again and listening to the communication between Alice and Bob.

5.2.1.3 Full Intercept and Resend

The most successful I&R attack is a mixture of the two aforementioned strategies. In

this attack Eve randomly chooses between the H/V and the +/’ basis to measure

the signals coming from Alice. She forwards the results she obtains to Bob and

listens to the public communication between Alice and Bob during the sifting phase.

If Alice sends a 0 encoded as |H , Eve will either measure it in the H/V or +/’

basis. As we have already seen, if Eve uses the H/V basis she will obtain |H with

certainty and introduce no error. Otherwise, she will obtain |+ or |’ with equal

probability (cf. Fig. 5.3).

78 S. Schauer

Fig. 5.3 Decision tree for the full I&R attack strategy

Comparing the decision tree with the one from the naive I&R attack, it is easy to

see that Eve can eliminate two events for s = 0, i.e., if she measured |V and Alice

used the H/V basis and |’ and Alice used the +/’ basis. These two events occur

with probability p = 0 which increases Eve™s information. In detail, the probabili-

ties p(m|s) are

2

1

1

·1= = p m = |+ , +/’ |s = 0

p m = |H , H/V |s = 0 =

2 4

2

1

p m = |V , H/V |s = 0 = · 0 = 0 = p m = |’ , +/’ |s = 0

2

,

3

1 1

p m = |+ , H/V |s = 0 = = = p m = |H , +/’ |s = 0

2 8

3

1 1

p m = |’ , H/V |s = 0 = = = p m = |V , +/’ |s = 0

2 8

(5.26)

and we get similar values for s = 1. For the sum s p(m|s) we obtain 1 , such that

4

p(s|m) = 4 p(m|s). This results in the collision probabilities 1 if Eve chooses the

correct basis for her measurement and 1 if she chooses a basis different from Alice™s

2

preparation. Thus, the average collision probability is

1 1 1 3

Pc = +4 + = . (5.27)

4 16 4 4

Calculating the discarded fraction „ , we get „ 0.585 as for the attack in the

Breidbart basis (cf. Sect. 5.2.1.2).

For the Renyi entropy we obtain R(S|M = m) = ’ log 1 = 0 whenever Eve™s

choice of the basis is correct and R(S|M = m) = ’ log 0.5 = 1 otherwise. The

average Renyi entropy is then

1