

using a decision-theoretic approach. Moreover, the authors of [17] do not provide
a direct clue on how to choose an optimal block-size. Before getting into details
about how to cleverly choose the block-sizes, let us outline the remaining steps in
the error correction process, as this will illustrate the efficiency gain we may expect
from an intelligent partitioning strategy. The details and some theoretical results
about the process have been summarized in Chap. 3, but we briefly repeat them here
for convenience of the reader.
Having split the string into blocks of equal size k, Alice and Bob publicly com-
pare parity bits of each block. Obviously, one error will change the parity, and in
general, any odd number of errors will be discovered by observing disagreeing par-
ities. However, two or any larger even number of errors will remain undetected with
this method, which is why further stages of the process are to follow, once the initial
correction has been completed. Let us describe the general correction step by an
example block with one indicated error, i.e., unequal parity by public comparison.
Then this block is searched for the error using a standard bisective search, which
discloses a further lot of log(k) parities of sub-blocks. To spot and correct remaining
errors in the string, such as present in blocks with an even number of errors in them,
Alice and Bob repeat the randomization and partitioning steps, several times with
increasing block-sizes.
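As an illustration of the correction step, the following sketch (our own; the function names are not part of the protocol specification in [2]) locates a single error in a block by publicly comparing parities of halves. For a block of size k, about log2(k) parities are disclosed:

```python
def parity(bits):
    """Parity (0 or 1) of a list of bits."""
    return sum(bits) % 2

def bisective_search(alice_block, bob_block):
    """Locate the single differing bit in a block whose overall parities
    disagree, by repeatedly halving; returns (index, parities disclosed)."""
    lo, hi = 0, len(alice_block)
    disclosed = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        disclosed += 1
        # Alice and Bob publicly compare the parity of the left half.
        if parity(alice_block[lo:mid]) != parity(bob_block[lo:mid]):
            hi = mid  # parities disagree: the error is in the left half
        else:
            lo = mid  # parities agree: the error must be in the right half
    return lo, disclosed
```

For instance, two 8-bit blocks differing only in position 5 are resolved after disclosing three parities.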
Since the error correction up to now may be ineffective, as still having missed
some errors, Alice and Bob continue by comparing parities of random subsets of
bits they publicly agree on. Upon parity mismatch, a bisective search similar to the
one above is performed to find and erase the error. In order to avoid information leaking
to the adversary, the last bit from each random subset can be discarded. This deletion
can also be done after comparing parities of blocks in the previous steps for the same
reason.
The point at which almost all errors have been removed is detected by counting
the number of successful comparisons after having seen the last error. After a
sufficient number of successful trials (20 is the number proposed in [2]), the strings
are accepted as identical, regarding the probability of errors remaining undetected
as negligible.
4 Adaptive Cascade 51

The protocol Cascade is based on this procedure and has been introduced in a
later paper [5], which presented improvements to the original reconciliation pro-
tocol sketched above. Among the changes is the replacement of the bit-deletion
step by a smarter strategy for the sake of detecting more errors faster, so the task
of information leakage reduction is shifted to the privacy amplification stage. The
naming stems from the strategy of increasing sizes of blocks in the first stages of
the protocol. Although a comprehensive theoretical analysis of the protocol is pro-
vided, the authors of [5], as well as of [2], abstain from an analytical treatment of
block-size choices. Nevertheless, a simple heuristic based on estimating the error
frequency by direct comparison of a random sample of bits is provided in [2]. These
bits have to be sacrificed for the sake of privacy too, if that approach is adopted. Our
main concern in subsequent sections will thus be an optimal choice of block-sizes
in the initial stage of the protocol. Since the increase of block-sizes may be taken
exponential (taking double sizes in stage two, quadruple sizes in stage three of
Cascade, and so on), the optimal choice of block-size in terms of deriving determin-
istic or probabilistic models may save us from too many stages of Cascade after the
initial one. This concern has been pointed out in [2] already, since the block parity
comparison approach becomes much less efficient the larger the blocks grow. The
idea of estimating the error frequency from a random sample is interesting though,
as this is already some kind of adaptation to the current situation rather than taking a
constant block-size once and for all. In the following, we show how this idea can be
amended to spare the need for sacrificing too much key material.

4.2.1 Sources of Errors

It is worthwhile to say a few words about natural sources of errors (considering an
adversary as a rather unnatural source of such phenomena), since this is the type of
errors we need to model for an improved estimate of optimal block-sizes. A simple
systematic error pattern arises from photon emitters and detectors slowly running
out of synchrony. It is easy to imagine that if a photon emitter sends photons with
frequency f and the corresponding detector samples with frequency f + ε for some
small ε > 0, then we will observe an oscillatory pattern of error bursts, i.e., we do
not sample anything at all over repeating periods of time. Furthermore, environmen-
tal conditions (heat, vibration, etc.) may have an impact on the transmission medium
and hence influence the error rate. Since photon emitters and detectors are devices
sensitive to calibration errors, increased vibrations due to traffic peak hours may
distort the setup and hence increase the error rate. The form of such perturbations
may be modeled well, and is perhaps even periodic to some extent, hence can be
captured by partially deterministic models as described below.
Further sources of errors may be related to the quantum transmission medium.
Sending photons over fiber-optic lines, we may observe error frequencies that are
essentially different from those seen on free-space links. The latter are subject to
various environmental distortions, including weather conditions (fog, rain, snow,
52 S. Rass and C. Kollmitzer

etc.), air pollution, or radiation. Kinks or points where fiber-optic cables are joined
by splicing or cleaving may increase photon loss and hence the error rate. Damage
caused during laying the cables is sometimes unavoidable and needs to be taken
into account as another source of error. Common to both variants is a damping that
depends on the length of the channel: the longer the line gets, the fewer photons
make it through. A comprehensive overview of system losses is
provided in [17].

Fig. 4.1 Example distribution of random errors over the raw key r. Bits marked in gray are
erroneous and thus not carried over to the sifted key s

4.3 Adaptive Initial Block-Size Selection

The original Cascade protocol, based on the ideas sketched above, offers room for
improvement in terms of its initial block-size selection strategy. We shall not deal
with adaptive choices of block-sizes in later stages of Cascade than the first one,
so the term block-size henceforth refers to the initial block-size, unless stated other-
wise. In [2], the choice of block-size is based on a prior estimate of error frequency
so as to achieve an expected number of one error per block; why not incorporate
the posterior knowledge from a finished protocol run in future executions? This
appears reasonable, since we can assume the link to be geographically static, so
undergoing the same sort of distortions regularly. We hence may be able to contin-
uously improve our error estimate over the lifetime of the link devices. Let us first
summarize what information is available after one execution of the protocol: call
r ∈ {0, 1}* the raw key being the input for the Cascade protocol. After completion
of the error correction, and before the privacy amplification, we have a string s of the
same length as r, but with errors being corrected. Checking for each bit in r whether
or not it has made it into the corrected string s without a change gives rise to a
sample from a binary-valued random variable or, more specifically, a binary-valued
stochastic chain, where the discrete time refers to the index of the bits in r. Since
r is itself finite, so is the time horizon of the stochastic chain, and for simplicity,
we may consider it a simple random variable in the first instance. Call this random
variable X ∈ {0, 1}; then we observe

    X_i = 1, if the ith bit in r became corrected and has thus changed in s,
          0, otherwise.

Figure 4.1 shows an example of the distribution of errors over a bit string.

Our goal is replacing the static error frequency estimate at the beginning of the
Cascade protocol by a dynamic one, which takes into account information from
previous protocol executions. A variety of different solutions can be imagined,
and we shall summarize a selection here, restricting ourselves to two high-level
approaches:

1. Choose the block-size at the beginning and leave it constant over the whole exe-
cution time of Cascade's first stage, resulting in an equidistant partitioning. The
initial choice is made using past knowledge about error distributions. We refer to
this approach as with fixed initial block-size. This variant is suitable if the error
rate remains constant over short periods, but varies in the long run (due to slow
seasonal variations). The resulting scheme will hence be the choice either for
quick establishment of short keys or in situations where conditions can artifi-
cially be kept almost constant over a long time. We describe two sub-variants of
how to incorporate past experience in order to reach an improved idea of how
much error to expect in future executions.
2. If the error rates are frequently varying within the duration of the protocol execu-
tion, then we may adapt it in real time during the first stage. The resulting model
can either be a partially deterministic one, if the error rates exhibit repeating pat-
terns (i.e., some kind of periodicity), or completely stochastic, if no such underly-
ing pattern can be identified. In case we are dealing with fully non-deterministic
scattering of errors, we propose using Bayesian statistics for adapting the error-
rate model to the particular environmental conditions of a link. In that sense, we
can endow Cascade with learning capabilities in order to self-adapt to changing
conditions. We refer to this variant as with dynamic initial block-size.

Among our main concerns will be efficiency, since quantum cryptographic key
exchange protocols can be expected to run endlessly, continuously establishing key
material for ad hoc usage. Efficiency of the protocols is hence a major issue. Let
us start with the simpler of the methods outlined above, choosing its block-size
constant over the whole execution time. Fine-tuning of any of the dynamic models
described here can be done by simulations, comparing the generated scattering to the
true error patterns. It is thus useful to describe how error patterns can be simulated
using the dynamic variants. We do this below.

4.4 Fixed Initial Block-Size

Treat the vector of realizations of X_i as a sample and call λ its arithmetic mean.
Moreover, assume that from n previous executions of the protocol, we have the
data λ_1, ..., λ_n available. Each λ_i represents the average number of errors in the ith
execution of the protocol. This average number is called the error rate. Furthermore,
this is also the maximum likelihood estimator for the parameter of the Poisson and
exponential distribution, which will become useful later. The set of (consecutive) λ_i
can be considered as a time series, and our wish is to predict its next value, to use

it as our error estimate for running the Cascade protocol. A considerable amount
of literature exists on smoothing and trend estimation in time series, and we refer
the interested reader to [6, 40] for an introduction and overview. Here, we shall
only mention three popular variants, leaving the reader the freedom to apply more
sophisticated methods.
To reduce the random distortions on a time series (smoothing), we may, for instance,

1. use sliding centrality measures, like the sliding mean or median, or
2. use exponential smoothing, taking into account all past measurements, or
3. if the state (in terms of error frequency) of the quantum channel can be approx-
imated by a (linear) dynamical system, then the Kalman filter may be used as a
predictor for the error rates in future executions. Similarly, the particle filter is
an option, if a stochastic model underneath the error distribution can be set up
effectively. However, both options strongly rely on the accuracy of the model and
presume thorough analysis and knowledge of the system dynamics. Furthermore,
the particle filter may come at high computational cost if it is to be implemented.

Applying other techniques is possible, but beyond the scope of this chapter. We
leave this to the reader, consulting the cited references. Coming back to the first
among the above three proposals, the arithmetic mean appears to be not the optimal
choice, as it is very sensitive to outliers. Outliers in our setup would be short-
term deviations of the error frequency by conditions becoming almost ideal for a
short while or by suffering from strong environmental influences like vibrations,
heat, or similar for a limited period of time. Both events may happen on a regular
basis and should hence not be considered as outliers. However, the arithmetic mean
estimate would strongly react to this, and the resulting block-size could be smaller
than needed. Detection and elimination of outliers is a highly nontrivial task, and no
panacea solution is likely to exist.
A sliding median is more robust against outliers and offers a second appealing
feature, as the window size has a particularly simple interpretation if the median is
used: suppose that our window is of size n = 2k + 1 for an integer k ≥ 1, then
we consider an extreme event to be an outlier, unless it was observed at least k + 1
times over the last 2k + 1 executions of the protocol. For example, if n = 21, then
any extreme situation is considered exceptional (an outlier), unless it occurred at
least 11 times during the last 21 executions. If an expert can somehow estimate the
likelihood of such extremal conditions (by using extreme value probability models),
then the window size can be chosen to handle that probability and hence become sta-
ble against outliers. Implementing more complex and powerful models like ARMA
(autoregressive moving average) is possible, but the quality of results is highly
dependent on the expertise of the engineer, as the modeling becomes involved for
these techniques. See [31] for an introduction and details.
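The sliding-median rule above admits a minimal sketch (names are ours): with window size n = 2k + 1, an extreme value survives the smoothing only if it occupies a majority of the window, i.e., appears at least k + 1 times among the last 2k + 1 observations.

```python
def sliding_median(rates, k):
    """Smooth a series of error rates with a sliding median of window
    size n = 2k + 1 (truncated at the boundaries). An isolated extreme
    value is suppressed unless it fills a majority of its window."""
    out = []
    for i in range(len(rates)):
        lo = max(0, i - k)
        hi = min(len(rates), i + k + 1)
        window = sorted(rates[lo:hi])
        out.append(window[len(window) // 2])  # median of the window
    return out
```

A single spike in an otherwise flat series is removed, whereas a level shift persisting over most of the window survives.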
If a sliding centrality measure is employed, then the resulting formulas for
choosing the block-size at the beginning are simple, as we only have to take the
median (or mean or any other centrality measure of choice) over the last n available
samples λ_i.
Exponential smoothing [6] does include all past observations. The predicted
value λ̂_{n+1} is found as λ̂_{n+1} = µλ_n + (1 − µ)λ̂_n, where 0 < µ < 1 is the smooth-
ing parameter and λ̂_1, ..., λ̂_n are the so-far smoothed values, setting the first
λ̂_1 = λ_1. The reader will instantly verify that this corresponds to a weighted sum of
observations with exponentially decaying weights.
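As a sketch, the recursion λ̂_{n+1} = µλ_n + (1 − µ)λ̂_n with λ̂_1 = λ_1 can be computed in a single pass over the observed rates (the function name is ours):

```python
def exp_smooth(rates, mu):
    """One-step-ahead prediction via exponential smoothing:
    smoothed = mu * rates[i] + (1 - mu) * smoothed, initialized with
    the first observation; returns the prediction for the next rate."""
    smoothed = rates[0]           # lambda_hat_1 = lambda_1
    for r in rates[1:]:
        smoothed = mu * r + (1 - mu) * smoothed
    return smoothed
```

Unrolling the loop shows the exponentially decaying weights: the prediction is µλ_n + µ(1 − µ)λ_{n−1} + µ(1 − µ)²λ_{n−2} + ....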
A further alternative is offered by decision theory: call the error rate λ and assume
it to be random, then our problem is to find the best decision on what λ to choose
next before re-running the protocol. Why not perform a Bayesian estimation of it?
A fully natural approach to updating λ is considering it to be drawn from a random
variable Λ, whose distribution is to be estimated upon our given information. Again,
let the information available be the average error rate known from the last protocol
execution. Furthermore, for the sake of analytical simplicity and efficiency, let us
model the random variable Λ using a Gamma distribution with hyper-parameters
a, b > 0, which are to be specified by an expert prior to any protocol runs.
Recall that we update the distribution of Λ after each protocol execution, which
provides us with only a single sample, being the average error rate. Assuming that
this rate is Poisson-distributed, the likelihood function for a single sample is trivially
a Poisson density. The reader familiar with Bayesian decision theory will instantly
recognize the Gamma distribution as a conjugate prior for a Poissonian likelihood
function, in which case a Bayesian update can ef¬ciently be computed [37]. To make
this precise, updating the Gamma distribution G(a, b) with density

    f(λ | a, b) = (b^a / Γ(a)) λ^(a−1) e^(−bλ),  if λ ≥ 0,
                  0,                              if λ < 0,

to account for the information z (being the average number of errors derived from
the most recent measurements) gives the posterior distribution

G(a + z, b + 1), (4.1)

so the update amounts to a simple change of the parameters. Using a quadratic loss
function, i.e., penalizing a deviation of the estimator λ̂ from its true value λ by the
functional (λ̂ − λ)², the Bayesian estimator λ̂ minimizing the loss we can expect
under the posterior estimate of the parameter (given by (4.1)) is just the expectation
of the (posterior) Gamma distribution and found as

    λ̂ = E[Λ|z] = (a + z)/(b + 1),

referring to the parameters of the posterior distribution. A popular argument against
conjugate priors is their inflexibility. A straightforward remedy is using a mixture of
conjugate priors for a close approximation of a more satisfactory prior, but although
the set of mixtures of priors can be shown to be dense in the set of probability

distributions (with respect to the Prohorov topology) [37, Theorem 3.4.3], its usage
introduces the need for normalization after the Bayesian update. This means that we
are back at the lengthy calculations involved when using Monte Carlo methods (see
[37, 15] for an introduction) for evaluating the integrals occurring in the expressions,
and hence this option may be taken if one is willing to trade efficiency for accuracy.
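The conjugate Gamma-Poisson update described in this section reduces to simple arithmetic; a minimal sketch (function names are ours):

```python
def gamma_poisson_update(a, b, z):
    """Conjugate Bayesian update: prior G(a, b), observed average error
    count z (Poissonian likelihood), posterior G(a + z, b + 1), cf. (4.1)."""
    return a + z, b + 1

def bayes_estimate(a, b):
    """Bayesian estimator under quadratic loss: the posterior mean a / b
    of the Gamma distribution G(a, b)."""
    return a / b
```

Repeated protocol runs simply chain these updates, so the whole learning step costs two additions per execution.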

4.5 Dynamic Initial Block-Size

If changes in the error frequency can exhibit regular (perhaps periodic) behavior,
then we may approximate these rates using deterministic models that we learn from
the available information. Consequently, we may set up a stochastic Poisson process
[19, 24] with deterministic intensity function (i.e., an inhomogeneous process) or
use a fully non-deterministic model for the rates, giving a Cox process model for the
error scattering. Each approach is expanded below, starting with the determination
of a model for given empirical error rates.

4.5.1 Deterministic Error-Rate Models
Due to the structure of the Cascade error correction method, we have blocks
with errors located inside, and the number of these is available after the proto-
col terminates. Enumerate the blocks by i = 1, 2, ..., N and call t_i the time
stamp of the transmission where the middle bit in the ith block has been sent.
Let λ_i denote the number of errors within the ith block divided by the length
(time span) of the ith block; then we are left with a classical curve-fitting prob-
lem upon a given data set (t_i, λ_i), for i = 1, ..., N. This can be solved by stan-
dard techniques. Updating an existing estimate Λ upon arrival of new measure-
ments is computationally efficient, since the formula of Sherman and Morrison
[39] applies. Let the fitted model be a function Λ ∈ span(g_1, ..., g_m) for m lin-
early independent base functions g_1, ..., g_m : R+ → R. Possible choices for the
functions g_i include the polynomial base 1, x, x², ..., x^(m−1) or the trigonometric
base 1, sin(x), cos(x), sin(2x), cos(2x), ..., which corresponds to an approximation
similar to a truncated Fourier series. Our goal is finding a function Λ(t), which
approximates the time-dependent variation of errors, assuming that this function is
characteristic for the error distribution on a link. Figure 4.8 displays an example.
Approximation of the given point set can be done with respect to any metric, but
we propose using the Euclidean distance for the sake of an efficient update upon
new data. The technique is better known as least-squares fitting. The standard least-
squares approximation problem is the following: given a set of points (t_i, λ_i) for
i = 1, 2, ..., N, find a vector of coefficients µ = (µ_1, ..., µ_m) which minimizes
the squared distance functional

    Q(µ) = Σ_{i=1}^N [λ_i − Λ(t_i)]² = Σ_{i=1}^N [λ_i − Σ_{j=1}^m µ_j g_j(t_i)]²,
where the model is

    Λ(t) = µ^T g = Σ_{i=1}^m µ_i g_i(t),    (4.2)

with g = (g_1, ..., g_m).
The solution µ to a least-squares fitting problem is then found by setting the
first-order partial derivatives of Q with respect to µ_1, ..., µ_m to zero. Solving for µ
then gives the well-known expression

    µ = (A^T A)^(−1) A^T λ,    (4.3)

where the matrix A is called the design matrix and defined as

    A = ( g_1(t_1)  g_2(t_1)  ...  g_m(t_1) )
        ( g_1(t_2)  g_2(t_2)  ...  g_m(t_2) )
        (   ...       ...          ...      )
        ( g_1(t_N)  g_2(t_N)  ...  g_m(t_N) ).
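A minimal illustration of the fit via the normal equations (A^T A)µ = A^T λ, using a small Gaussian elimination instead of a linear-algebra library (all names are ours, and the solver is only adequate for small m):

```python
def design_matrix(ts, basis):
    """Design matrix A with A[i][j] = g_j(t_i)."""
    return [[g(t) for g in basis] for t in ts]

def normal_solve(A, y):
    """Solve the normal equations (A^T A) mu = A^T y by Gaussian
    elimination with partial pivoting (illustrative, for small m)."""
    m = len(A[0])
    # Build M = A^T A and v = A^T y.
    M = [[sum(A[i][p] * A[i][q] for i in range(len(A))) for q in range(m)]
         for p in range(m)]
    v = [sum(A[i][p] * y[i] for i in range(len(A))) for p in range(m)]
    # Forward elimination.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        v[col], v[piv] = v[piv], v[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m):
                M[r][c] -= f * M[col][c]
            v[r] -= f * v[col]
    # Back substitution.
    mu = [0.0] * m
    for r in range(m - 1, -1, -1):
        mu[r] = (v[r] - sum(M[r][c] * mu[c] for c in range(r + 1, m))) / M[r][r]
    return mu
```

Fitting, e.g., the basis {1, t} to rates generated by λ(t) = 2 + 3t recovers the coefficients (2, 3).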

Incorporating new information from the (n + 1)th execution of the error correc-
tion protocol amounts to updating the currently estimated model Λ(t) with the new
sample z := λ_{n+1}. Let us reuse a piece of notation from statistics and write Λ(t|z)
for the posterior model, which incorporates the information z that we gathered from
the last protocol execution. Consequently, we may call Λ(t) the prior model. The
corresponding design matrices differ only by one additional row referring to z, so
denoting it by A_z, we have the posterior model design block matrix

    A_z := ( A   )
           ( z^T )

with A being the design matrix of the prior model. For A_z, we have

    A_z^T A_z = A^T A + z z^T,

which shows that taking the new information z into account amounts to a rank-one
update of A^T A. This particular type of modification permits the application of the
Sherman-Morrison formula [39], which efficiently creates the inverse of A_z^T A_z from
the (known) inverse of A^T A. Setting B := (A^T A)^(−1) gives [34]

    (A_z^T A_z)^(−1) = B − (B z z^T B) / (1 + z^T B z),

provided that the denominator does not vanish. The posterior model Λ(t|z) is hence
found by using Eqs. (4.3) and (4.2) as

    Λ(t|z) = [(A_z^T A_z)^(−1) A_z^T (λ_1, ..., λ_n, z)^T]^T g.
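The rank-one update via the Sherman-Morrison formula can be sketched as follows, with B = (A^T A)^(−1) and z the new design row (a sketch with our own names, not tied to any particular library):

```python
def sherman_morrison(B, z):
    """Given B = (A^T A)^{-1} and a new design row z, return
    (A^T A + z z^T)^{-1} = B - (B z z^T B) / (1 + z^T B z)
    without re-inverting the full matrix."""
    m = len(B)
    Bz = [sum(B[i][j] * z[j] for j in range(m)) for i in range(m)]   # B z
    zB = [sum(z[i] * B[i][j] for i in range(m)) for j in range(m)]   # z^T B
    denom = 1.0 + sum(z[j] * Bz[j] for j in range(m))                # 1 + z^T B z
    return [[B[i][j] - Bz[i] * zB[j] / denom for j in range(m)]
            for i in range(m)]
```

With B the identity and z = (1, 0), the update inverts I + zz^T = diag(2, 1), giving diag(1/2, 1), as a direct inversion confirms.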


The general technique applies for any model which is linear in the parameter
vector, so the two choices above as well as exponential base functions may be
used equally well, but may exhibit significant differences in terms of performance.
An oscillatory behavior may be better approximated by a trigonometric base, for
instance. It is thus up to the engineer to do the fine-tuning of the model for a specific
link under consideration.

4.5.2 Going Beyond Least Squares

Among the assumptions underneath our method is homoscedasticity of the error
rate's errors. Stated differently, we may think of error rates being described by
a model f + ε having random errors ε with constant second moment (variance),
whereas f may describe an underlying trend model. This, however, may not nec-
essarily be the case, since given a time series of error rates from the Cascade pro-
tocol, we may identify certain patterns of increased error rates, following dense
error patterns, or vice versa. If error rates exhibit dependencies on their history,
the assumption of homoscedasticity is violated and we may ¬nd a heteroscedas-
tic model more appropriate. Figure 4.2 illustrates the difference between the two,
showing a heteroscedastic model with linearly increasing variances. Specifically, an
autoregressive conditional heteroscedastic (ARCH) or its generalized version, the
GARCH model, may be promising candidates for a successful error model. Loosely
speaking, a stochastic process (modeling the error rates) follows an ARCH model,
if its expectation (conditional on its past) is zero, but its variance, being conditioned
on the history of the process, nontrivially depends on the sigma field generated by
the past observations [3]. This means that we explicitly model variations in the sec-
ond moment of the error. A generalized ARCH model arises if an ARCH model is
extended with dependencies on the variance of the error itself. Deterministic trends
with time-dependent variances can be incorporated with this method. A well-written
introduction is found in [18].
Taking a different route, we may switch from the simple models we described
to a more flexible and general method, known as generalized linear models (see
Chap. 7). Intuitively, one may think of such models as a uniform approach to several
regression techniques, and the idea for our context is briefly sketched as follows:
assume that the error rates (response) Λ follow a distribution that belongs to the
exponential family (see [37] for a definition). This intensity Λ is written as a linear
combination η of the influencing factors (predictors). The so-called link function
g describes how the expected error rate EΛ is related to η, i.e., η = g(EΛ). In
principle, any monotone continuous and differentiable link function can be used,
but convenient choices and standard recommendations are given in the literature
(see [30] for an introduction, as well as [16, 14, 28] for a more application-oriented

Fig. 4.2 Example error rates with linear trend and constant/increasing measurement variances:
(a) homoscedasticity, (b) heteroscedasticity (vertical axes: empirical error rate)

presentation). This is particularly interesting in our case, since error rates can be
classified as counting data, in which case the Poisson distribution is the natural
candidate. This distribution is in the exponential family and can thus be used with
generalized linear models, or generalized linear mixed models (GLMMs), if further
flexibility is demanded.
It should be noted, however, that the mentioned extensions to the classical least-
squares method often do not permit closed form analytical solutions, in which case
we are stuck with intricate numerics, potentially cutting down the efficiency of the
scheme.
4.5.3 Error Rates as a Stochastic Process

Assuming errors to happen independently from each other at constant intensity λ,
we may think of the event of an error as a random variable following a Poisson
distribution with parameter λ. The temporal spread of errors over the raw key is then
a process with independent increments also following a Poisson distribution with
intensity parameter proportional to the length of the block. To make this precise, let
us brie¬‚y review the properties of a Poisson process for convenience of the reader.
A stochastic process is a family {X (t) : t ∈ T } of random variables, indexed by

a variable t. A standard convention interprets t as the time, in which case T =
R+ , i.e., t ≥ 0. Depending on whether the variable t is continuous or discrete,
we distinguish stochastic processes from stochastic chains, respectively. Moreover,
the process is called discrete if each X(t) is discrete; otherwise, it is called a
continuous stochastic process/chain or simply a stochastic process/chain for short.
Following the presentation of [38], a Poisson process is a family of discrete counting
measures {N (t) : t ≥ 0}, which satisfy the following three conditions:
1. N (0) = 0.
2. The process has independent increments.
3. The number of events in any interval of length t is Poisson-distributed with mean
λt. That is, for all s, t ≥ 0,

    Pr{N(s + t) − N(s) = n} = e^(−λt) (λt)^n / n!,    n = 0, 1, ....
Our Poisson process model N(t) will count the total number of errors in the bit
string at time t. Since our blocks should be chosen such that the expected additional
number of errors after taking a time-step Δt is only 1, we are particularly interested
in the distribution of inter-arrival times between two errors. It is well known that
the distribution of the time between two errors is exponential with mean λ^(−1) if that
parameter is constant [38]. However, this is an advantage and may be a drawback
at the same time: the exponential distribution (together with its discrete counterpart,
the geometric distribution) is the only one which enjoys being memoryless, that is,
the probability of observing another n errors after having seen n' errors already is
independent of n', or formally [33],

    Pr{N > n + n' | N > n'} = Pr{N > n}.

Since the inter-arrival times between two errors are exponentially distributed with
mean λ^(−1), they do exhibit the memoryless property, which may appear unnatural
in our setting when considering the physical sources of error. On the bright side,
however, we explicitly know the expected number of errors within time span Δt,
which is λ · Δt. Assuming that the bits come in at frequency f, then if n denotes
the number of arriving bits within Δt, we have f · Δt = n, and the block-size in
terms of bits is the solution of the equation λn/f = 1 and thus comes to

    initial block-size n ≈ f/λ,    (4.4)

which is the block-size (in bits) that the Poisson process gives us. Observe that we
have a constant block-size again, with the only difference to the previous Cascade
variant being that it originates from a probabilistic model rather than a time series
smoothing approach, because the maximum likelihood estimator for the intensity
parameter of a Poisson distribution is precisely the arithmetic mean. This is in accor-
dance with our previous assumption on the availability of time series representing
the average number of errors in several executions of the protocol. Hence, it can be
seen as justification of the original proposal of the developers of Cascade [2], but
offers no real advantage unless we can cleverly adapt λ.

Simulation of a Poisson Process
Knowing that the inter-arrival times between errors are exponentially distributed
with parameter λ makes the simulation of error patterns particularly easy for con-
stant λ, since we only have to draw from the exponential distribution. This can be
done efficiently using the inversion method [36] as follows:

1. Draw U from a uniform distribution over the unit interval.
2. Set the random sample X := −(1/λ) ln(U); then X has an exponential distribution
   with parameter λ.
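A sketch of the two steps above, using Python's standard random generator (we draw from 1 − U, which is likewise uniform, to avoid ln(0)); the function name is ours:

```python
import math
import random

def simulate_errors(lam, horizon, seed=0):
    """Simulate error positions of a homogeneous Poisson process with
    constant rate lam on (0, horizon): inter-arrival times are drawn
    by the inversion method, X := -(1/lam) * ln(1 - U)."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += -math.log(1.0 - rng.random()) / lam   # exponential inter-arrival
        if t >= horizon:
            return times
        times.append(t)
```

The number of simulated errors on (0, T) then has expectation λT, and the positions, conditioned on their count, look uniformly scattered.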
Figure 4.3 shows an example of errors simulated with this procedure. The impres-
sion of a uniform distribution of the n events occurring within the interval (0, T) at
times t_1, ..., t_n is indeed correct: it can be shown that, given that N(T) = n errors
have occurred up to time T, the arrival times t_1, ..., t_n have the same distribution as
the order statistics corresponding to n independent random variables uniformly
distributed on (0, T) [38].

Fig. 4.3 Example of a Poisson process error pattern with constant error rate. Vertical lines mark
errors observed over the horizontal time axis

Coming back to the criticism, considering the physical sources of errors in photon
transmission makes a constant error rate with independent errors across adjacent
intervals appear unnatural. A more natural model would consider a non-constant
error-rate parameter λ(t), in which case, however, we lose the exponential distribu-
tion of the times between two errors. A Poisson process with time-dependent (but deterministic)
intensity parameter »(t) is called inhomogeneous and becomes a Cox process if the
intensity parameter is replaced by a random variable. This type of model is discussed
in the next section.
Let us consider the case of a deterministic intensity parameter λ(t) first. We
have prepared the basics for this in Sect. 4.5.1. Suppose that the error rate has been
approximated by a model λ ∈ span(g_1, ..., g_m) of integrable base functions g_i and
that this model provides a satisfactory approximation of the Poisson rates over time.
Following [38], by setting

    µ(t) := ∫_0^t λ(s) ds,

one can show that

    Pr{N(t + s) − N(t) = n} = exp(−[µ(t + s) − µ(t)]) · [µ(t + s) − µ(t)]^n / n!,

which is again a Poisson distribution with rate µ(t + s) − µ(t) = ∫_t^{t+s} λ(τ) dτ. Sim-
ilarly as for standard Poisson processes, one can further deduce that the probability
of one counted error within the interval (t, t + Δt) is

    Pr{one counted error within (t, t + Δt)} = λ(t)Δt + o(Δt),

so as before, we may approximate the block-size in bits from the time span Δt
according to the solution of λ(t)Δt = 1, which by calling f the bit frequency again
(see above) gives

    block-size ≈ f/λ(t),

neglecting the terms o(Δt).

Simulation of an Inhomogeneous Poisson Process
We follow in our presentation the lines of [22], remarking that the underlying method is better known as the rejection method (cf. [36]). Let W be a bounded set over which we wish to simulate an inhomogeneous Poisson process. In our case, this will be the output bit string of the photon detection devices. In the first step of the simulation, a standard (homogeneous) Poisson process is simulated with constant intensity parameter λ* = max_{t∈W} λ(t), where we assume λ* finite. This is trivially satisfied in our setup if the base functions for λ(t) are continuous and the length of the bit string is finite, so W is compact. The number of points in this pattern will be higher than the number in the final pattern, which arises by independent location-dependent thinning based on the thinning function p(t) = λ(t)/λ* [38]. The decision about whether or not a point from the first simulation is retained in the final output is made by taking p(t) as its survival probability. In other words, a point x_n is retained with probability p(x_n) independently of what happens to the other points. If the function λ(t) is such that its maximum is difficult to determine, then any larger value may be used instead, resulting in a less efficient procedure, however, as more points will be rejected in that case. Figure 4.4 shows a simulated error pattern for an inhomogeneous point process with linear trend λ(t) = at for some a > 0.
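The thinning procedure just described can be sketched as follows, here with the linear trend λ(t) = at of Fig. 4.4; the function name and seed are our own:

```python
import random

def thinned_poisson(lam, lam_max, T, rng=random.Random(7)):
    """Rejection (thinning) method: simulate a homogeneous process at rate
    lam_max on (0, T) and keep point t with survival probability lam(t)/lam_max."""
    points, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)         # homogeneous inter-arrival times
        if t >= T:
            break
        if rng.random() < lam(t) / lam_max:   # location-dependent thinning
            points.append(t)
    return points

a, T = 0.2, 24.0
errors = thinned_poisson(lambda t: a * t, lam_max=a * T, T=T)
```

Choosing lam_max larger than the true maximum stays correct but rejects more candidate points, mirroring the efficiency remark above.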

4.5.4 Using Bayesian Networks and Cox Processes
Cox processes may be considered as a generalization of inhomogeneous Poisson processes in the sense of having their intensity parameter stochastic rather than deterministic. For that reason, they are sometimes termed doubly stochastic Poisson processes [19, 22], and the first task for setting up a Cox process model will be finding a probabilistic model for the random intensity parameter Λ. Since errors in
our model are due to random distortions, it appears natural to consider temperature,

(a) Error pattern with linearly increasing intensity parameter
(b) Frequency of errors over time

Fig. 4.4 Inhomogeneous Poisson process error pattern with linear trend. The histogram displays
the number of errors per time-slot, showing a (linear) increase of errors over time

humidity, vibration, etc., as natural factors having influence on the error rate, as discussed in Chap. 7. For a concrete link, one may be able to quantify a number of additional aspects being relevant for the error rate, and an intuitive modeling technique for setting up joint probability distributions with known dependency relations among them is offered by the field of probabilistic networks (we refer the reader to [4, 10, 26, 32] and references therein for details). Call Z_1, ..., Z_k a set of random variables representing environmental conditions on which the error rate λ depends. A statistical analysis of the influence of temperature (Z_1), humidity (Z_2), as well as other environmental conditions, is provided in Chap. 7; see also [27]. Among these, dependency relations may exist which are either known or unknown. Even if they are unknown, a variety of learning strategies exist, taking existing data for inferring independence statements. Having set up a graph with arcs representing dependency between the nodes Z_1, ..., Z_k, one can sample from the so-modeled joint probability distribution using a variety of different techniques, as shown in the cited literature. The appeal of this modeling approach is its ability to capture the situation at hand almost directly, as we can translate several factors that have been identified to be of importance into a model using a widely automated procedure. It should be noticed, however, that many of the existing learning procedures for Bayesian networks (see [21] for instance), as well as inference procedures, are computationally hard [9, 11, 12], even NP-complete [8]. Therefore, heuristic procedures for tackling this problem have been developed; see [25]. A restriction that applies to Bayesian network modeling is the acyclicity of graph models, meaning that no circular dependencies are allowed to exist, like Z_1 → Z_2 → Z_3 → Z_1 for instance. Relaxing the acyclicity assumption yields dependency networks [20]. If the directionality of the dependency is unknown, then an undirected link may be chosen, in which case we are dealing with chain graphs [7] if the network contains no partially

directed cycles or with Markov networks if all links are undirected. Each of these
comes with a comprehensive theory of learning its structure and doing inference.
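As an illustration only — the network structure and every distribution and coefficient below are invented for the sketch, not fitted to the data of Chap. 7 — ancestral sampling from such a model might look like:

```python
import random

rng = random.Random(3)

def sample_error_rate():
    """Ancestral sampling from a toy network T -> H and (T, H, V) -> Lambda.
    All distributions and coefficients here are illustrative assumptions."""
    T = rng.gauss(15.0, 5.0)                                    # temperature (deg C)
    H = min(max(rng.gauss(60.0 - 0.5 * T, 10.0), 0.0), 100.0)   # humidity, depends on T
    V = rng.expovariate(1.0)                                    # vibration level
    lam = 0.01 * (1.0 + 0.02 * abs(T - 15.0) + 0.005 * H + 0.1 * V)
    return lam

rates = [sample_error_rate() for _ in range(1000)]
```

Each variable is drawn conditional on its parents in topological order, which is exactly the sampling scheme the cited literature describes for directed acyclic models.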
As an example, consider temperature T, humidity H, and vibration V as influence factors; then a simple Bayesian network modeling the joint distribution of all four variables (i.e., including the error density f_Λ) is depicted in Fig. 4.5. Although the specification of the given Bayesian network requires knowledge of the distributions of T, H, V, as well as the distribution of Λ conditional on T, H, V, its real power lies in its learning capabilities. In other words, the error distribution conditional on the identified input variables can be updated during repeated protocol executions upon measurements of T, H, V. This adds further flexibility to our model, as repeated Bayesian updates of an estimator are consistent under some regularity conditions [1, 13].

Fig. 4.5 Example Bayesian network model for random error rate Λ. Nodes are random variables
with (conditional) densities assigned. Dependencies are modeled with directed arcs

The probabilistic network shown in Fig. 4.5 is a model for a simple random variable, whereas in our setting we consider the error rate to undergo seasonal variations (where the term seasonal may be understood as short- as well as long-duration changes). Consequently, we may set up a Poisson process with intensity parameter distributed according to a density induced by a Bayesian network.
A Cox point process model is set up in two steps [22]:

• Modeling of a stationary non-negative-valued stochastic process Λ(t).
• Given a realization of the process Λ(t) defining the local error rates over time, the error events corresponding to this realization form an inhomogeneous Poisson process with intensity parameter Λ(t).
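The two steps can be combined into one routine. The random intensity below (a randomly parametrized, non-negative sinusoid) is purely illustrative; any non-negative stochastic process Λ(t) — for instance one induced by the Bayesian network above — could take its place:

```python
import math
import random

def simulate_cox(T, rng=random.Random(11)):
    """Two-step Cox simulation: (1) draw a realization of the random
    intensity Lambda(t); (2) run an inhomogeneous Poisson process with
    that realization via thinning."""
    # Step 1: an illustrative random intensity (this concrete choice is
    # an assumption of the sketch): base + amp*sin(...) >= 0 since amp <= base.
    base = rng.uniform(0.2, 0.5)
    amp = rng.uniform(0.0, base)
    lam = lambda t: base + amp * math.sin(2 * math.pi * t / T)
    lam_max = base + amp
    # Step 2: thinning against the realized intensity.
    points, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t >= T:
            break
        if rng.random() < lam(t) / lam_max:
            points.append(t)
    return points

errors = simulate_cox(T=24.0)
```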
Simulating Errors with Cox Processes
Simulation of a Cox process is straightforward by drawing a realization from Λ(t) and simulating an inhomogeneous Poisson process with this time-dependent error rate.

4.6 Examples

Let us compare some of our proposals in terms of quality of the block-size selection strategy. Assume that the errors are scattered across a bit string, following an inhomogeneous Poisson distribution with an unknown intensity function roughly1 having the shape depicted in Fig. 4.6. The horizontal axis covers a duration of [0, T]; one may think of the shape reflecting periods of high and low error rates, induced by vibrations due to traffic peak hours if T is set to 24 h. Extending the example further by including seasonal temperature variations or similar is left to the reader.

Fig. 4.6 Temporal variations of error rate (example)

Cascade with adaptive but a priori fixed initial block-sizes takes a constant block-size according to the median of the last w error rates, setting the block-size reciprocal to the (median-)estimated intensity parameter λ, according to expression (4.4), where the rate for one protocol execution is estimated in the maximum likelihood sense as

λ̂ = (number of errors) / (total number of bits).   (4.5)
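A minimal sketch of this block-size rule, with λ̂ measured in errors per bit so that its reciprocal is already a block-size in bits (an assumption of this sketch; the function name is our own):

```python
import statistics

def adaptive_block_size(error_counts, bit_counts, w=11):
    """Median of the last w maximum-likelihood rate estimates (4.5);
    the block-size is set reciprocal to the estimated rate (cf. (4.4))."""
    rates = [e / n for e, n in zip(error_counts[-w:], bit_counts[-w:])]
    lam_hat = statistics.median(rates)
    return max(1, round(1 / lam_hat))

print(adaptive_block_size([2, 4, 3], [100, 100, 100]))
```

The median makes the estimate robust against single outlier executions, which is why it is preferred over the mean here.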
Clearly, taking constant block-sizes in the Cascade protocol implicitly assumes constant error rates over the whole duration of the protocol, so the Cascade variant with fixed initial block-size is best suited for slow variations of λ, such as is the

1 None of the examples provided here is related to a real-life quantum cryptographic system, in
order to adhere to intellectual property rights of the related data sets. Hence, all diagrams are given
without explicit scales. All the data that have been used for the experiments described here come
from simulations based on real-life system information.

case if we take T = 24 h in Fig. 4.6. We simulate 500 consecutive executions of the Cascade protocol over the duration [0, T] with error rates varying as shown in Fig. 4.6 and a sliding median smoothing with window size w = 11.
In each execution, the final string is randomly changed (bit-flips to introduce errors) according to the (true) error pattern induced by the Poisson process, with intensity parameter varying according to the function in Fig. 4.6. Taking the median over the last 11 estimators given by (4.5), one can determine the (constant) block-size and count the number of errors falling into each block along the full length of the output string. Counting the number of blocks with 1, 2, 3, ... errors, we end up with a histogram, which for our example looks as shown in Fig. 4.7 (notice that the absolute numbers were normalized to represent empirical probabilities of encountering blocks with 1, 2, 3, ... errors).
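The per-block counting and normalization described here can be sketched as follows (names are our own):

```python
from collections import Counter

def error_histogram(error_positions, block_size):
    """Empirical probability of encountering a block with 1, 2, 3, ... errors."""
    per_block = Counter(pos // block_size for pos in error_positions)
    counts = Counter(per_block.values())   # {number of errors: number of blocks}
    total = sum(counts.values())
    return {k: v / total for k, v in sorted(counts.items())}

hist = error_histogram([3, 5, 17, 18, 19, 40], block_size=10)
```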






Fig. 4.7 Example probabilities for blocks with 1, 2, 3, ... errors for Cascade error correction with fixed initial block-size

For evaluating the dynamic variant of the protocol, let the function in Fig. 4.6 cover the duration of one single protocol execution within [0, T]. Simulating an inhomogeneous Poisson process according to the rejection method sketched above and calculating the average number of errors in equidistantly distributed bins across the period [0, T] gives a set of points for which an approximating function Λ(t) is sought. Figure 4.8 shows an example where this has been done and a polynomial model (solid line) has been fitted. Figure 4.9 displays the empirical probability of encountering blocks with 1, 2, 3, ... errors, showing that the variations in the local error rates can indeed be smoothed by our proposal. For comparison, Fig. 4.10 displays the distribution of the number of errors within blocks if a constant block-size is used. In the example, we took the maximum likelihood estimator from the simulated inhomogeneous Poisson process and derived the constant block-size using formula (4.4).

4.7 Summary
Our results show that adapting the block-size to variations of the local error rate is indeed worthwhile, since it significantly increases the efficiency of the error correction by reducing the number of bits that become revealed to the adversary

(Curves: local error estimate; exact intensity; fitted intensity function model)
Fig. 4.8 Example of estimating local error rates and fitting a model to the empirical data






Fig. 4.9 Example empirical probabilities for blocks with 1, 2, 3, . . . errors for Cascade with
dynamic initial block-size

during the Cascade protocol. This leads to a considerable improvement of the QKD
in terms of ef¬ciency.
Combining the described models is possible by mixing a deterministic trend with a stochastic variational part to capture regular and predictable variations of the error rate which undergoes random perturbations through environmental influences. This can be seen as an extension to the least-squares model we proposed in Sect. 4.5.1, whereas it is known that the least-squares fit is a best linear unbiased estimator (BLUE; this is the Gauss–Markov theorem [23]) and becomes a best unbiased estimator (BUE) if the data enjoy a normal distribution. Hence, a combined approach is most suitable if the error is non-Gaussian, in which case a Bayesian model may be a promising alternative. A different view on the problem of partitioning a string into blocks such that the number of errors per block is minimized may also open the area







Fig. 4.10 Example empirical probabilities for blocks with 1, 2, 3, ... errors for Cascade with fixed initial block-size but strongly varying local error rates

for techniques known from designs of experiments. See [29, 35] for an introduction
and overview.

References

1. Barron, A., Schervish, M.J., Wasserman, L.: The consistency of posterior distributions in nonparametric problems. Ann. Stat. 27(2), 536–561 (1999)
2. Bennett, C.H., Bessette, F., Brassard, G., Salvail, L., Smolin, J.: Experimental quantum cryptography. J. Cryptol. 5, 3–28 (1992)
3. Bollerslev, T., Engle, R.F., Nelson, D.B.: Handbook of Econometrics, Vol. IV, Chap. 49: ARCH Models, pp. 2959–3038. Elsevier Science B.V., Amsterdam (1994)
4. Borgelt, C., Kruse, R.: Graphical Models – Methods for Data Analysis and Mining. John Wiley & Sons, UK (2002)
5. Brassard, G., Salvail, L.: Secret-key reconciliation by public discussion. In: Helleseth, T. (ed.) EUROCRYPT. Springer, New York, pp. 410–423 (1993)
6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting. Springer, New York (1996)
7. Buntine, W.L.: Chain graphs for learning. In: Besnard, P., Hanks, S. (eds.) Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Francisco, CA, pp. 46–54 (1995)
8. Chickering, D.M.: Learning Bayesian networks is NP-complete. In: Fisher, D., Lenz, H.J. (eds.) Learning from Data: Artificial Intelligence and Statistics V, Chap. 12, pp. 121–130. Springer-Verlag, New York (1996)
9. Cooper, G.F.: The computational complexity of probabilistic inference using Bayesian belief networks (research note). Artif. Intell. 42(2–3), 393–405 (1990)
10. Cowell, R.G., Dawid, A.P., Lauritzen, S.L., Spiegelhalter, D.J.: Probabilistic Networks and Expert Systems. Springer, New York (1999)
11. Dagum, P., Chavez, R.M.: Approximating probabilistic inference in Bayesian belief networks. IEEE Trans. Pattern Anal. Mach. Intell. 15(3), 246–255 (1993). DOI 10.1109/34.204906
12. Dagum, P., Luby, M.: Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artif. Intell. 60(1), 141–153 (1993)
13. Diaconis, P., Freedman, D.: On the consistency of Bayes estimates. Ann. Stat. 14(1), 1–26 (1986)
14. Dobson, A.J.: An Introduction to Generalized Linear Models, 2nd edn. Chapman & Hall/CRC (2002)
15. Doucet, A. (ed.): Sequential Monte Carlo Methods in Practice. Springer, New York (2001)
16. Faraway, J.J.: Extending the Linear Model with R. Chapman & Hall/CRC, Boca Raton (2006)
17. Gilbert, G., Hamrick, M.: Practical quantum cryptography: A comprehensive analysis (part one) (2000). URL http://www.citebase.org/abstract?id=oai:arXiv.org:quant-ph/0009027
18. Gouriéroux, C.: ARCH Models and Financial Applications. Springer, New York (1997)
19. Grandell, J.: Doubly Stochastic Poisson Processes. Springer, New York (1976)
20. Heckerman, D., Chickering, D.M., Meek, C., Rounthwaite, R., Myers Kadie, C.: Dependency networks for inference, collaborative filtering, and data visualization. J. Mach. Learn. Res. 1, 49–75 (2000)
21. Heckerman, D., Geiger, D., Chickering, D.M.: Learning Bayesian networks: The combination of knowledge and statistical data. In: KDD Workshop, pp. 85–96 (1994)
22. Illian, J., Penttinen, A., Stoyan, H., Stoyan, D.: Statistical Analysis and Modelling of Spatial Point Patterns. Wiley, Chichester (2008)
23. Kariya, T., Kurata, H.: Generalized Least Squares. Wiley, Chichester (2004)
24. Kingman, J.: Poisson Processes. Oxford Science Publications, Oxford, UK (1993)
25. Larrañaga, P., Poza, M., Murga, R., Kuijpers, C.: Structure learning of Bayesian networks by genetic algorithms: A performance analysis of control parameters. IEEE Trans. Pattern Anal. Mach. Intell. 18(9), 912–926 (1996)
26. Lauritzen, S.L.: Graphical Models. Oxford Statistical Science Series 17. Oxford Science Publications, New York (1996)
27. Lessiak, K., Kollmitzer, C., Schauer, S., Pilz, J., Rass, S.: Statistical analysis of QKD networks in real-life environments. In: Proceedings of the Third International Conference on Quantum, Nano and Micro Technologies, pp. 109–114. IEEE Computer Society (2009)
28. Lindsey, J.K.: Applying Generalized Linear Models. Springer, New York (1997)
29. Mason, R.L.: Statistical Design and Analysis of Experiments with Applications to Engineering and Science. Series in Probability and Statistics. Wiley, New York (2003)
30. McCullagh, P., Nelder, J.: Generalized Linear Models, 2nd edn. Monographs on Statistics and Applied Probability 37. Chapman & Hall, London (1989)
31. Pankratz, A.: Forecasting with Univariate Box–Jenkins Models. Wiley, New York (1983)
32. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann Publishers Inc., San Francisco, CA (1988)
33. Pitman, J.: Probability. Springer, New York (2000)
34. Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P.: Numerical Recipes in C, 2nd edn. Cambridge University Press, New York (1992)
35. Rasch, D., Herrendörfer, G., Bock, J., Busch, K.: Verfahrensbibliothek, Versuchsplanung und -auswertung. Deutscher Landwirtschaftsverlag, Berlin (1978)
36. Ripley, B.D.: Stochastic Simulation. Wiley, New York (1987)
37. Robert, C.P.: The Bayesian Choice. Springer-Verlag, New York (2001)
38. Ross, S.M.: Stochastic Processes. Series in Probability and Mathematical Statistics. Wiley, New York (1983)
39. Sherman, J., Morrison, W.J.: Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Stat. 21, 124–127 (1950)
40. Stulajter, F.: Predictions in Time Series Using Regression Models. Springer, New York (2002)
Chapter 5
Attack Strategies on QKD Protocols

S. Schauer

In the following we will describe a number of different attack strategies on Quantum Key Distribution protocols. First, we will discuss QKD protocols based on ideal sources, mainly using single photons as in [2, 1, 11] but also entanglement-based protocols as, for example, in [15]. Additionally, we will deal with the security of QKD protocols in realistic environments as they are described in Chap. 6. Regarding realistic sources, due to the physical limitations of the apparatus some loopholes were identified, which could lead to a successful attack on such protocols. As we will see, special countermeasures are implemented in the protocols to close those loopholes and guarantee unconditionally secure communication.
The security discussions are limited to individual attacks, since they are the simplest and most studied class of attack strategies on QKD protocols. The key point of an individual attack is that the eavesdropper, Eve, interacts with each signal coming from Alice separately. Some strategies rely on the fact that Eve is able to postpone her interaction with the signal (i.e., the measurement) until after the sifting and error correction – or as long as she wants – to obtain the maximum amount of information from the public discussion. Alternatively, Eve measures the state instantly and uses the information from sifting and error correction later on.
The individual attacks stand in opposition to the collective attacks, where Eve
prepares an ancilla state for each signal coming from Alice and lets it interact with
the signal. Eve passes the signal on to Bob and keeps the ancilla at her side. When
all signals are transferred Eve waits to obtain as much information as possible from
the public communication to perform the best measurement on all of her ancillae. It
has been proven that the same security bound defined for QKD protocols also holds if collective attacks are applied [8, 7].
The most general version of collective attacks are coherent attacks. In this kind
of attack Eve is allowed to perform any quantum operation on the signal in transit
and use any possible ancilla state. Due to this vast number of possibilities for Eve, the attack cannot be parametrized and brute-force optimization is not possible. Anyhow,

S. Schauer (B)
Safety & Security Department, Quantum Technologies, AIT Austrian Institute of Technology
GmbH, Lakeside B01A, 9020 Klagenfurt, Austria, stefan.schauer@ait.ac.at;

Schauer, S.: Attack Strategies on QKD Protocols. Lect. Notes Phys. 797, 71–95 (2010)
© Springer-Verlag Berlin Heidelberg 2010
DOI 10.1007/978-3-642-04831-9_5
72 S. Schauer

bounds for information-theoretic security have been found which are equal to the bounds for collective attacks [34].

5.1 Introduction

In the discussions about the security of QKD protocols, the amount of information Eve has about the secret key shared between Alice and Bob is of great interest. Every attack on a QKD protocol is characterized by this amount of information, and Alice's and Bob's goal is to minimize it. Usually, Eve's information about the secret key consists of two parts: the information she obtains from her measurement on the signal in transit and the information about Alice's and Bob's choice of bases. The second part tells Eve whether she made a correct measurement for the respective qubit or not. The best way to express this fact is to use the conditional probability p(s|m). Here, s is the value of the bit Alice sent and m is Eve's measurement outcome. The value p(s|m) can easily be computed using the probabilities p(m|s), i.e., the probability that Eve obtains the result m if Alice originally sent the bit s. To achieve that, the following formula is used:

p(s|m) = p(m|s) / Σ_{s'} p(m|s').   (5.1)

The next interesting value is the probability that Eve obtains the same result as Alice, which is the collision probability

Pc(s|m) = Σ_s p(s|m)².   (5.2)

Further, the expected collision probability over all possible measurement outcomes m of Eve is described as

Pc = Σ_m p(m) Pc(s|m).   (5.3)

To quantify Eve's amount of information on Alice's bit, two entropies are used: the Shannon entropy H and the Renyi entropy R (cf. also Sect. 2.3). Since the information depends on Eve's measurement result, the conditional versions of these two entropies are used, i.e., H(S|M) and R(S|M), respectively. The Renyi entropy can be computed directly from the collision probability Pc(s|m),

R(S|M = m) = − log Pc(s|m) = − log Σ_s p(s|m)²,   (5.4)

and is averaged over the respective probabilities of Eve's measurement results, i.e.,

R(S|M) = Σ_m p(m) R(S|M = m).   (5.5)

Similarly, the Shannon entropy is defined as

H(S|M = m) = − Σ_s p(s|m) log p(s|m)   (5.6)

and also averaged over the probabilities of Eve's results:

H(S|M) = Σ_m p(m) H(S|M = m).   (5.7)
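Equations (5.1)–(5.7) can be evaluated mechanically from a table of likelihoods p(m|s), assuming a uniform prior on s as in the text; the helper names below are our own:

```python
import math

def posteriors(p_m_given_s):
    """Eq. (5.1) with uniform prior: p(s|m) = p(m|s) / sum_{s'} p(m|s').
    p_m_given_s maps each outcome m to the pair (p(m|s=0), p(m|s=1))."""
    return {m: (p0 / (p0 + p1), p1 / (p0 + p1))
            for m, (p0, p1) in p_m_given_s.items()}

def expected_collision(p_m_given_s):
    """Eqs. (5.2)-(5.3): Pc = sum_m p(m) sum_s p(s|m)^2, with p(m) = (p0+p1)/2."""
    post = posteriors(p_m_given_s)
    return sum(0.5 * (p0 + p1) * (post[m][0] ** 2 + post[m][1] ** 2)
               for m, (p0, p1) in p_m_given_s.items())

def avg_shannon(p_m_given_s):
    """Eqs. (5.6)-(5.7): H(S|M) = sum_m p(m) H(S|M=m)."""
    post = posteriors(p_m_given_s)
    h = lambda dist: -sum(q * math.log2(q) for q in dist if q > 0)
    return sum(0.5 * (p0 + p1) * h(post[m])
               for m, (p0, p1) in p_m_given_s.items())
```

Fed with the p(m|s) table of an attack, these helpers reproduce the collision probabilities and entropies computed by hand in the following sections.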

The Shannon entropy H is an estimator of the uncertainty of a probability distribution, and thus the variation of the Shannon entropy can be interpreted as the information gain. For the a priori probability distribution X and the a posteriori distribution Y, the information gain is I = H(X) − H(Y). This can be used to describe the amount of information Eve obtains by applying a specific attack. Here Eve has no a priori information about the secret key, and thus H = 1, since Alice chooses her bit string at random. Therefore, the amount of information gained by Eve is I = 1 − H(S|M), which will be used constantly in the following sections.
Another important question is how much key material has to be discarded to minimize Eve's knowledge about the key. This amount is called the discarded fraction τ and is computed using the expected collision probability:

τ = 1 + log₂ Pc.   (5.8)

Following from this equation, a bit string of length n has to be reduced by nτ bits during privacy amplification to leave Eve with at most 1 Shannon bit of information on the whole secret key, no matter its length [24].
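Equation (5.8) in code, applied to the two collision probabilities that appear in this chapter (5/8 for the naive I&R attack, 3/4 for the Breidbart and full I&R attacks); the function name is our own:

```python
import math

def discarded_fraction(pc):
    """Eq. (5.8): tau = 1 + log2(Pc); a string of n bits is shortened
    by about n*tau bits during privacy amplification."""
    return 1 + math.log2(pc)

print(discarded_fraction(5 / 8))   # naive I&R
print(discarded_fraction(3 / 4))   # Breidbart / full I&R
```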

5.2 Attack Strategies in an Ideal Environment

5.2.1 Intercept and Resend

Naive Intercept and Resend
The most intuitive kind of individual attack is the intercept and resend (I&R) attack. The main intention for Eve is to get hold of each photon coming from Alice and measure it in some predefined basis. According to her result, Eve prepares a new photon and forwards it to the legitimate receiver, Bob. In detail, Alice's qubit will either be in the horizontal/vertical or the diagonal basis, explicitly one of the four states |H⟩, |V⟩, |+⟩, or |−⟩, where

|+⟩ = (1/√2)(|H⟩ + |V⟩),   |−⟩ = (1/√2)(|H⟩ − |V⟩).   (5.9)
74 S. Schauer

If Alice sends a 0, she will either encode it into |H⟩ or |+⟩ with equal probability. Eve, unaware of Alice's encoding, will choose randomly between the H/V and the +/− basis. Thus, she will obtain a correct result in case Alice sent |H⟩ and Eve measured in the H/V basis or Alice sent |+⟩ and Eve measured in the +/− basis, respectively. Any other combination will result in a completely random measurement outcome. This leads to the scheme in Fig. 5.1.

1 1
Bit s = 0
2 2

H Alice™s basis choice

1 1 1 1
2 2 2 2

Eve™s basis
+/ ’ +/ ’
1 1 1 1
1 1
0 0
2 2 2 2
Value of m

+ ’ + ’

Fig. 5.1 Decision tree for the naive I&R attack strategy

For now, we will also assume that Eve does not listen to any public communication between Alice and Bob. Therefore, she will not know in which case her measurement was wrong. The conditional probabilities p(m|s) for the four possible results are then

p(m = |H⟩|s = 0) = p(m = |+⟩|s = 0) = (1/2)² · 1 + (1/2)³ = 3/8,
p(m = |V⟩|s = 0) = p(m = |−⟩|s = 0) = (1/2)² · 0 + (1/2)³ = 1/8,   (5.10)

and equally for p(m|s = 1). For the conditional probabilities p(s|m), the sum Σ_s p(m|s) = 1/2 and thus we get p(s|m) = 2p(m|s). Further, the collision probability in the naive version of the I&R attack is

Pc(s|m = |H⟩) = (3/4)² + (1/4)² = 5/8   (5.11)

and similarly for m = |V⟩, m = |+⟩, and m = |−⟩, which results in the average collision probability

Pc = Σ_m p(m) Pc(s|m) = 4 · (1/4) · [(3/4)² + (1/4)²] = 5/8.   (5.12)
5 Attack Strategies on QKD Protocols 75

From the collision probability the discarded fraction can be computed as 1 + log₂ Pc, which gives τ ≈ 0.322. Thus, only about one-third of the key has to be discarded to guarantee that Eve has less than one bit of information on the whole key.
Looking at the Renyi entropy for m = |H⟩, we get

R(S|M = |H⟩) = − log Pc(s|m = |H⟩) = − log(5/8) = 3 − log 5.   (5.13)

Since the Renyi entropy for m = |V⟩, m = |+⟩, and m = |−⟩ is the same and all four results are equally probable, the average Renyi entropy is

R(S|M) = Σ_m p(m) R(S|M = m) = 4 · (1/4) · (3 − log 5) = 3 − log 5.   (5.14)

The second interesting entropy is the conditional Shannon entropy H(S|M = |H⟩),

H(S|M = |H⟩) = −(3/4) log(3/4) − (1/4) log(1/4) ≈ 0.811,   (5.15)

which is equal to the other entropies H(S|M = |V⟩), H(S|M = |+⟩), H(S|M = |−⟩), such that

H(S|M) = Σ_m p(m) H(S|M = m) = 4 · (1/4) · [−(3/4) log(3/4) − (1/4) log(1/4)] ≈ 0.811.   (5.16)

The total information Eve will have in the end about each bit is 1 − H(S|M) ≈ 0.2, which is rather poor for Eve. Therefore, Eve will use another strategy, which gives her more information. One possibility is to use another measurement basis, e.g., the Breidbart basis.

Intercept and Resend in the Breidbart Basis
As pointed out above, we did not take into account that Eve is able to listen to the public discussion of Alice and Bob during the error correcting and sifting stage in the naive I&R attack. If Eve listens to the public discussion she can obtain more information, since she knows in which cases she did a correct measurement. Furthermore, for this attack strategy Eve restricts herself to only one measurement basis and gains more information compared to the naive I&R attack. The measurement basis in this attack is the φ_α basis, which is

|φ+⟩ = a|H⟩ + b|V⟩,   |φ−⟩ = b|H⟩ − a|V⟩.   (5.17)

Here, a = sin α and b = cos α depend on some angle α. Using this basis for Eve's measurement we get a slightly different decision tree (cf. Fig. 5.2).

Fig. 5.2 Decision tree for the I&R attack strategy in the Breidbart basis

If Alice sends a 0 encoded as |H⟩, Eve will obtain |φ+⟩ with probability a² and |φ−⟩ with probability b². If Alice encodes 0 as |+⟩, Eve will obtain |φ+⟩ with probability c² = (1/2)(a + b)² and |φ−⟩ with probability d² = (1/2)(a − b)², which is a direct result of measuring |+⟩ in the φ_α basis. This gives the conditional probabilities

p(m = (|φ+⟩, H/V)|s = 0) = (1/2)a² = p(m = (|φ−⟩, H/V)|s = 1),
p(m = (|φ−⟩, H/V)|s = 0) = (1/2)b² = p(m = (|φ+⟩, H/V)|s = 1),
p(m = (|φ+⟩, +/−)|s = 0) = (1/2)c² = p(m = (|φ−⟩, +/−)|s = 1),
p(m = (|φ−⟩, +/−)|s = 0) = (1/2)d² = p(m = (|φ+⟩, +/−)|s = 1).   (5.18)

Since a² + b² = c² + d² = 1, the sum Σ_s p(m|s) = 1/2 and thus the conditional probabilities are p(s|m) = 2p(m|s). This gives a collision probability of

Pc(s|m = (|φ+⟩, H/V)) = (a²)² + (b²)² = a⁴ + b⁴,   (5.19)

which leads to the average collision probability (using some simple algebra)

Pc = (1/4)(2a⁴ + 2b⁴ + 2c⁴ + 2d⁴) = (1/2)(a⁴ + b⁴ + c⁴ + d⁴) = 3/4.   (5.20)

Based on this result, the discarded fraction τ can be computed as described in Eq. (5.8), which leads to τ ≈ 0.585. Thus, more than half of the bits of the secret key have to be discarded to minimize Eve's information to at most one bit.
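A quick numerical check of (5.20), (5.8), and (5.25) — including the claimed independence of the average collision probability from α (a sketch; the function name is our own):

```python
import math

def breidbart_stats(alpha):
    """Pc per Eq. (5.20), tau per Eq. (5.8) and H(S|M) per Eqs. (5.23)/(5.25)
    for Eve measuring in the phi_alpha basis."""
    a, b = math.sin(alpha), math.cos(alpha)
    a2, b2 = a * a, b * b
    c2, d2 = 0.5 * (a + b) ** 2, 0.5 * (a - b) ** 2
    pc = 0.5 * (a2**2 + b2**2 + c2**2 + d2**2)
    tau = 1 + math.log2(pc)
    shannon = -(a2 * math.log2(a2) + b2 * math.log2(b2))
    return pc, tau, shannon

pc, tau, h = breidbart_stats(3 * math.pi / 8)   # the Breidbart angle
```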
It has to be stressed that the average collision probability is independent of the angle α chosen by Eve. Similarly, also the Renyi entropy

R(S|M = (|φ+⟩, H/V)) = − log(a⁴ + b⁴)   (5.21)

and consequently the average Renyi entropy

R(S|M) = (1/2)(a² + b²)(− log(a⁴ + b⁴)) + (1/2)(c² + d²)(− log(c⁴ + d⁴)) = 2 − log 3   (5.22)

is independent of the angle α. This is not the case for the Shannon entropy, since

H(S|M = (|φ+⟩, H/V)) = −a² log a² − b² log b²   (5.23)

and further

H(S|M) = (1/2)(a² + b²)(−a² log a² − b² log b²) + (1/2)(c² + d²)(−c² log c² − d² log d²)
       = −a² log a² − b² log b².   (5.24)

To make Eve's errors in the H/V and +/− basis equally probable she uses α = 3π/8. In this case the basis φ_α is called the Breidbart basis, as introduced in [3]. As pointed out above, the average collision probability and the Renyi entropy are independent of the angle α; it only affects the Shannon entropy, which in that case is

H(S|M) = − sin²(3π/8) log₂ sin²(3π/8) − cos²(3π/8) log₂ cos²(3π/8) ≈ 0.601.   (5.25)

Thus, Eve learns 1 − H(S|M) ≈ 0.4 bits from every bit sent by Alice. Compared to the intuitive and rather simple approach described above, Eve gains much more information with a measurement in the Breidbart basis (cf. Eq. (5.16)). Nevertheless, the amount of information is not yet optimal. As we will see in the next section, Eve is able to increase her information about Alice's bits by using the H/V and +/− bases again and listening to the communication between Alice and Bob.

Full Intercept and Resend
The most successful I&R attack is a mixture of the two aforementioned strategies. In this attack Eve randomly chooses between the H/V and the +/− basis to measure the signals coming from Alice. She forwards the results she obtains to Bob and listens to the public communication between Alice and Bob during the sifting phase. If Alice sends a 0 encoded as |H⟩, Eve will either measure it in the H/V or +/− basis. As we have already seen, if Eve uses the H/V basis she will obtain |H⟩ with certainty and introduce no error. Otherwise, she will obtain |+⟩ or |−⟩ with equal probability (cf. Fig. 5.3).

Fig. 5.3 Decision tree for the full I&R attack strategy

Comparing the decision tree with the one from the naive I&R attack, it is easy to see that Eve can eliminate two events for s = 0, i.e., if she measured |V⟩ while Alice used the H/V basis, and |−⟩ while Alice used the +/− basis. These two events occur with probability p = 0, which increases Eve's information. In detail, the probabilities p(m|s) are

p(m = (|H⟩, H/V)|s = 0) = (1/2)(1/2) · 1 = 1/4 = p(m = (|+⟩, +/−)|s = 0),
p(m = (|V⟩, H/V)|s = 0) = (1/2)(1/2) · 0 = 0 = p(m = (|−⟩, +/−)|s = 0),
p(m = (|+⟩, H/V)|s = 0) = (1/2)(1/2)(1/2) = 1/8 = p(m = (|H⟩, +/−)|s = 0),
p(m = (|−⟩, H/V)|s = 0) = (1/2)(1/2)(1/2) = 1/8 = p(m = (|V⟩, +/−)|s = 0),   (5.26)
and we get similar values for s = 1. For the sum Σ_s p(m|s) we obtain 1/4, such that p(s|m) = 4p(m|s). This results in the collision probability 1 if Eve chooses the correct basis for her measurement and 1/2 if she chooses a basis different from Alice's preparation. Thus, the average collision probability is

Pc = 4 · (1/8 · 1) + 4 · (1/8 · 1/2) = 3/4.   (5.27)
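Equation (5.27) as a one-line check, using the outcome probabilities derived above:

```python
# p(m) = 1/8 for each of the 8 outcomes m = (Eve's result, Alice's basis);
# the collision probability is 1 for the 4 correct-basis outcomes and 1/2
# for the 4 wrong-basis ones, reproducing Eq. (5.27).
pc_full = 4 * (1/8) * 1.0 + 4 * (1/8) * 0.5
print(pc_full)
```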

Calculating the discarded fraction τ, we get τ ≈ 0.585 as for the attack in the Breidbart basis (cf. Sect.
For the Renyi entropy we obtain R(S|M = m) = − log 1 = 0 whenever Eve's choice of the basis is correct and R(S|M = m) = − log(1/2) = 1 otherwise. The average Renyi entropy is then


